The foundation of the forensic record. Training data evidence feeds directly into output analysis and web monitoring. One continuous chain from model training through public deployment.
The Problem

Courts are dismissing cases for lack of evidence.

In 2025, two federal judges sided with AI companies — not because training was authorized, but because plaintiffs hadn't produced forensic evidence that specific works were in the data. One judge noted that plaintiffs with better evidence will often prevail. The evidentiary bar isn't impossibly high. It just hasn't been met.

Current Litigation Evidence Gap
What courts asked for
Evidence of specific works in training data with documented methodology and market impact
What plaintiffs provided
Side-by-side screenshots and general claims about publicly known training datasets
What VN delivers
Three independent investigative approaches — assembled into documented evidence packages with transparent methodology and stated confidence levels
The Methodology

Three investigative approaches.
One evidence package.

No single approach produces definitive evidence on its own. We layer independent methods to build the most complete forensic picture available.

Dataset Provenance Investigation

We investigate training data supply chains directly — tracing whether your copyrighted works appear in documented datasets, crawl indices, and disclosed data sources. Direct evidence of inclusion, not inference.

  • Search across documented training datasets and crawl archives
  • URL-level identification of your works in data pipelines
  • Supply chain mapping from original source through dataset to model
Dataset Provenance Search Results
TRAINING DATASET — MATCH FOUND
dataset/00042/000428391.jpg
Source URL: example-portfolio.com/works/image-042.jpg
CRAWL INDEX — MATCH FOUND
crawl-2023-14/segments/...warc.gz
Crawl date: March 2023 · Content hash verified
CHAIN: Your work → Source URL → Training dataset → Model
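The supply-chain tracing described above can be sketched as a simple matching pass. This is a hedged illustration, not VN's actual tooling: it assumes a hypothetical flattened index format in which each record exposes a source URL and a content hash. Real crawl archives (WARC files, dataset manifests) require dedicated parsers.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Content hash used to confirm a byte-identical match."""
    return hashlib.sha256(data).hexdigest()

def find_matches(work_bytes: bytes, work_urls: set[str],
                 index_records: list[dict]) -> list[dict]:
    """Return index records that match the work by source URL or content hash.

    `index_records` is a hypothetical flattened view of a crawl or dataset
    index; each record carries a source URL and a SHA-256 content hash.
    """
    work_hash = sha256_hex(work_bytes)
    matches = []
    for rec in index_records:
        url_hit = rec.get("source_url") in work_urls
        hash_hit = rec.get("sha256") == work_hash
        if url_hit or hash_hit:
            matches.append({**rec, "url_match": url_hit, "hash_match": hash_hit})
    return matches

# Toy demonstration with fabricated records (illustrative only).
work = b"example image bytes"
records = [
    {"source_url": "https://example-portfolio.com/works/image-042.jpg",
     "sha256": sha256_hex(work), "dataset": "toy-dataset", "crawl": "crawl-2023-14"},
    {"source_url": "https://other.example/unrelated.jpg",
     "sha256": sha256_hex(b"other"), "dataset": "toy-dataset", "crawl": "crawl-2023-14"},
]
hits = find_matches(work, {"https://example-portfolio.com/works/image-042.jpg"}, records)
```

A match on both URL and content hash is the strongest form of the chain above: the same bytes, traced from the original source into the dataset.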

Output Elicitation & Analysis

We prompt models to reproduce your work, document everything, and score each output against your source material and a control set of similar content that wasn't in training. If the model reconstructs your work with significantly higher fidelity than the controls, that gap is the signal.

  • Elicitation protocols tailored to each asset
  • Control-set comparison against non-member content
  • Full audit trail with stated confidence levels
  • Tested across all major generative AI platforms
Sample Output Elicitation Scan
Midjourney: 87% match
DALL·E 3: 62% match
Stable Diffusion: 91% match
Imagen 3: 34% match
Flux Pro: 71% match
Illustrative example. Actual results vary by asset and model.
Statistical Membership Signal
Your asset: 0.94
Control (similar): 0.41
Control (random): 0.22
Anomalous reconstruction signal detected.
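The control-set comparison reduces to a standard anomaly test: is the asset's reconstruction fidelity an outlier relative to non-member controls? A minimal sketch using a z-score, with illustrative scores like those in the sample panel above and a hypothetical threshold of 3 (real protocols calibrate thresholds per asset class):

```python
from statistics import mean, stdev

def membership_signal(asset_score: float, control_scores: list[float],
                      z_threshold: float = 3.0):
    """Compare the asset's reconstruction fidelity to a control distribution.

    Returns (z_score, anomalous). Scores are any bounded similarity metric
    in [0, 1]; the threshold is an illustrative choice, not a fixed standard.
    """
    mu, sigma = mean(control_scores), stdev(control_scores)
    z = (asset_score - mu) / sigma
    return z, z > z_threshold

# Fabricated control scores for non-member content (illustrative only).
controls = [0.41, 0.38, 0.44, 0.22, 0.35, 0.40]
z, anomalous = membership_signal(0.94, controls)
```

A score of 0.94 against controls clustered near 0.4 sits many standard deviations out, which is what "anomalous reconstruction signal" means in practice.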

Open-Weight Model Analysis

We conduct white-box analysis wherever model weights are publicly available. Direct access to model parameters supports detection methods that API-only access to closed-source models can't match, and is particularly effective on fine-tuned models and community checkpoints.

  • White-box analysis of publicly available model weights
  • Deeper detection sensitivity than API-based approaches
  • Covers major open-source models, fine-tunes, and community checkpoints
Model Analysis Coverage
Open-source models: full white-box analysis
Community fine-tunes & LoRAs: full white-box analysis
Closed-source models: API-based methods
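One reason white-box access matters: per-token loss is directly computable from the model's logits, and memorized samples tend to score unusually low loss. A toy sketch with a three-token vocabulary and fabricated logits, no real model involved:

```python
import math

def nll(logits_per_token: list[list[float]], token_ids: list[int]) -> float:
    """Average negative log-likelihood of a token sequence under the model.

    Computable only with white-box access (or an API that exposes logprobs):
    each step needs the full logit vector, not just the sampled output.
    """
    total = 0.0
    for logits, tok in zip(logits_per_token, token_ids):
        log_z = math.log(sum(math.exp(x) for x in logits))  # softmax normalizer
        total += log_z - logits[tok]  # -log p(token)
    return total / len(token_ids)

# Fabricated logits: a "memorized" sequence the model strongly predicts,
# vs. a control sequence the model is indifferent about.
member_logits = [[5.0, 0.0, 0.0], [0.0, 5.0, 0.0]]
member_tokens = [0, 1]
control_logits = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
control_tokens = [0, 1]

member_nll = nll(member_logits, member_tokens)    # near zero
control_nll = nll(control_logits, control_tokens)  # about ln(3)
```

Loss-based membership tests build on exactly this quantity, typically calibrated against a reference model or a control corpus.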
How We Measure
Structural similarity: pixel-level comparison
Semantic embeddings: meaning-level analysis
Perceptual hashing: fingerprint matching
Our Commitment

We tell you exactly what we can establish. And what we can't.

Forensic integrity matters more than marketing claims. This is how we position our findings.

What we don't claim

No methodology — ours or anyone else's — can prove with absolute certainty that a specific work was in a training set. The science is advancing rapidly, but honest limitations exist. We document them.

What we deliver

The strongest available body of evidence from multiple independent approaches. More substantive than what plaintiffs have recently brought to court, and documented with transparent methodology so your legal team knows exactly what it's working with.

Get Started

Start with a forensic assessment.

A targeted analysis of your highest-priority assets against major AI models — combining dataset investigation with technical analysis. Delivered as a documented report with full methodology, stated confidence levels, and clear next steps for your legal team.

Request Forensic Assessment

Next: AI Detection →