The foundation of the forensic record. Training data evidence feeds directly into output analysis and web monitoring. One continuous chain from model training through public deployment.
The Problem

Courts are dismissing cases for lack of evidence.

In 2025, two federal judges sided with AI companies — not because training was authorized, but because plaintiffs hadn't produced forensic evidence that specific works were in the data. One judge noted that plaintiffs with better evidence will often prevail. The evidentiary bar isn't impossibly high. It just hasn't been met.

Current Litigation Evidence Gap
What courts asked for
Evidence of specific works in training data with documented methodology and market impact
What plaintiffs provided
Side-by-side screenshots and general claims about publicly known training datasets
What VN delivers
Three independent investigative approaches — assembled into documented evidence packages with transparent methodology and stated confidence levels
The Methodology

Three investigative approaches.
One evidence package.

No single approach produces definitive evidence on its own. We layer independent methods to build the most complete forensic picture available.

Dataset Provenance Investigation

We investigate training data supply chains directly — tracing whether your copyrighted works appear in documented datasets, crawl indices, and disclosed data sources. Direct evidence of inclusion, not inference.

  • Search across documented training datasets and crawl archives
  • URL-level identification of your works in data pipelines
  • Supply chain mapping from original source through dataset to model
Dataset Provenance Search Results
TRAINING DATASET — MATCH FOUND
dataset/00042/000428391.jpg
Source URL: example-portfolio.com/works/image-042.jpg
CRAWL INDEX — MATCH FOUND
crawl-2023-14/segments/...warc.gz
Crawl date: March 2023 · Content hash verified
CHAIN: Your work → Source URL → Training dataset → Model
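The supply-chain tracing described above can be sketched as a simple matching pass. This is a hedged illustration, not VN's actual tooling: it assumes a hypothetical flattened index format in which each record exposes a source URL and a content hash. Real crawl archives (WARC files, dataset manifests) require dedicated parsers.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Content hash used to confirm a byte-identical match."""
    return hashlib.sha256(data).hexdigest()

def find_matches(work_bytes: bytes, work_urls: set[str],
                 index_records: list[dict]) -> list[dict]:
    """Return index records that match the work by source URL or content hash.

    `index_records` is a hypothetical flattened view of a crawl or dataset
    index; each record carries a source URL and a SHA-256 content hash.
    """
    work_hash = sha256_hex(work_bytes)
    matches = []
    for rec in index_records:
        url_hit = rec.get("source_url") in work_urls
        hash_hit = rec.get("sha256") == work_hash
        if url_hit or hash_hit:
            matches.append({**rec, "url_match": url_hit, "hash_match": hash_hit})
    return matches

# Toy demonstration with fabricated records (illustrative only).
work = b"example image bytes"
records = [
    {"source_url": "https://example-portfolio.com/works/image-042.jpg",
     "sha256": sha256_hex(work), "dataset": "toy-dataset", "crawl": "crawl-2023-14"},
    {"source_url": "https://other.example/unrelated.jpg",
     "sha256": sha256_hex(b"other"), "dataset": "toy-dataset", "crawl": "crawl-2023-14"},
]
hits = find_matches(work, {"https://example-portfolio.com/works/image-042.jpg"}, records)
```

A match on both URL and content hash is the strongest form of the chain above: the same bytes, traced from the original source into the dataset.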

Output Elicitation & Analysis

We prompt models to reproduce your work, document everything, and score each output against your source material and a control set of similar content that wasn't in training. If the model reconstructs your work with significantly higher fidelity than the controls, that gap is the signal.

  • Elicitation protocols tailored to each asset
  • Control-set comparison against non-member content
  • Full audit trail with stated confidence levels
  • Tested across all major generative AI platforms
Sample Output Elicitation Scan
Midjourney: 87% match
DALL·E 3: 62% match
Stable Diffusion: 91% match
Imagen 3: 34% match
Flux Pro: 71% match
Illustrative example. Actual results vary by asset and model.
Statistical Membership Signal
Your asset: 0.94
Control (similar): 0.41
Control (random): 0.22
Anomalous reconstruction signal detected.
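The control-set comparison reduces to a standard anomaly test: is the asset's reconstruction fidelity an outlier relative to non-member controls? A minimal sketch using a z-score, with illustrative scores like those in the sample panel above and a hypothetical threshold of 3 (real protocols calibrate thresholds per asset class):

```python
from statistics import mean, stdev

def membership_signal(asset_score: float, control_scores: list[float],
                      z_threshold: float = 3.0):
    """Compare the asset's reconstruction fidelity to a control distribution.

    Returns (z_score, anomalous). Scores are any bounded similarity metric
    in [0, 1]; the threshold is an illustrative choice, not a fixed standard.
    """
    mu, sigma = mean(control_scores), stdev(control_scores)
    z = (asset_score - mu) / sigma
    return z, z > z_threshold

# Fabricated control scores for non-member content (illustrative only).
controls = [0.41, 0.38, 0.44, 0.22, 0.35, 0.40]
z, anomalous = membership_signal(0.94, controls)
```

A score of 0.94 against controls clustered near 0.4 sits many standard deviations out, which is what "anomalous reconstruction signal" means in practice.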

Open-Weight Model Analysis

We conduct white-box analysis wherever model weights are publicly available. Direct access to model parameters supports detection methods that API-only access to closed-source models can't match, and is particularly effective on fine-tuned models and community checkpoints.

  • White-box analysis of publicly available model weights
  • Deeper detection sensitivity than API-based approaches
  • Covers major open-source models, fine-tunes, and community checkpoints
Model Analysis Coverage
Open-source models: full white-box analysis
Community fine-tunes & LoRAs: full white-box analysis
Closed-source models: API-based methods
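One reason white-box access matters: per-token loss is directly computable from the model's logits, and memorized samples tend to score unusually low loss. A toy sketch with a three-token vocabulary and fabricated logits, no real model involved:

```python
import math

def nll(logits_per_token: list[list[float]], token_ids: list[int]) -> float:
    """Average negative log-likelihood of a token sequence under the model.

    Computable only with white-box access (or an API that exposes logprobs):
    each step needs the full logit vector, not just the sampled output.
    """
    total = 0.0
    for logits, tok in zip(logits_per_token, token_ids):
        log_z = math.log(sum(math.exp(x) for x in logits))  # softmax normalizer
        total += log_z - logits[tok]  # -log p(token)
    return total / len(token_ids)

# Fabricated logits: a "memorized" sequence the model strongly predicts,
# vs. a control sequence the model is indifferent about.
member_logits = [[5.0, 0.0, 0.0], [0.0, 5.0, 0.0]]
member_tokens = [0, 1]
control_logits = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
control_tokens = [0, 1]

member_nll = nll(member_logits, member_tokens)    # near zero
control_nll = nll(control_logits, control_tokens)  # about ln(3)
```

Loss-based membership tests build on exactly this quantity, typically calibrated against a reference model or a control corpus.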
How We Measure
Structural similarity: pixel-level comparison
Semantic embeddings: meaning-level analysis
Perceptual hashing: fingerprint matching
Our Commitment

We tell you exactly what we can establish. And what we can't.

Forensic integrity matters more than marketing claims. This is how we position our findings.

What we don't claim

No methodology — ours or anyone else's — can prove with absolute certainty that a specific work was in a training set. The science is advancing rapidly, but honest limitations exist. We document them.

What we deliver

The strongest available body of evidence from multiple independent approaches. More substantive than what plaintiffs have recently brought to court, and documented with transparent methodology so your legal team knows exactly what it's working with.

Get Started

Start with a forensic assessment.

A targeted analysis of your highest-priority assets against major AI models — combining dataset investigation with technical analysis. Delivered as a documented report with full methodology, stated confidence levels, and clear next steps for your legal team.

Request Forensic Assessment

Next: AI Detection →