● NYT v. OpenAI · billions at stake
● Getty v. Stability AI · ongoing
● Authors Guild class action · filed
● Anthropic settlement · $1.5B
● Reddit licensing deal · $60M/yr
● UMG v. Anthropic · filed
● Universal Music v. Suno · filed
● Thomson Reuters v. Ross · ruling

License the work.
Keep the models honest.

AITrainingMart connects the people who made the internet with the companies training on it: real licenses, real payouts, and a real audit trail. The Amazon for AI training content.

Join the waitlist

$6.3B

AI training data market by 2028

CAGR 22%

70+

Active AI copyright lawsuits

US + EU, 2024-26

$1.5B

Anthropic class-action settlement

Largest to date

AI is being built on a legal fault line.

The largest models were trained by scraping first and asking never. The bill is now coming due: in courtrooms, in creator revenue, and in the slow enclosure of the open web.

For AI labs

Existential legal exposure

70+ active suits. Discovery obligations. Training pipelines subpoenaed. One bad ruling can force a full model retrain.

For creators

Work scraped, nothing earned

Decades of writing, photos, code, and music ingested without consent, credit, or a dollar in return.

For the web

Robots.txt is breaking the internet

Open publishing collapses as every site walls off. Everyone loses the commons that made AI possible.
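The walling-off described above happens through ordinary robots.txt rules. A minimal sketch with Python's standard-library `urllib.robotparser`, assuming an illustrative robots.txt that disallows OpenAI's `GPTBot` crawler while leaving every other agent alone; the rules and the example.com URL are made up for the demonstration.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: blocks one AI crawler, allows everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def crawler_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if `user_agent` may fetch `url` under the given robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/articles/2025/essay.html"
    for agent in ("GPTBot", "Mozilla/5.0"):
        print(agent, crawler_allowed(agent, url))
```

robots.txt is advisory: it only keeps out crawlers that choose to honor it.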

One licensed marketplace.
Both sides whole.

Creators list what they own. AI companies buy what they need. Every transaction is a signed license, a verifiable receipt, and a royalty stream that keeps paying as the model keeps earning.

Creators

List & license

Photos

Articles

Code

Audio

AITM

Rights engine

Watermark audit

Provenance chain

Smart contracts

Royalty ledger

AI companies

Train & deploy

Pre-train corpora

SFT datasets

RLHF prefs

Eval sets

Signed licenses

Every purchase creates a cryptographically signed usage contract.

Provenance built-in

C2PA + on-chain manifest. Defensible audit trail in any jurisdiction.

Royalties that compound

Creators keep earning when their data trains downstream models.

One integration

Dataset delivery to S3, GCS, R2, or your training cluster via API.
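A minimal sketch of what a signed license record and a hash-linked provenance chain could look like, using only Python's standard library. Assumptions, all hypothetical: the field names (`license_id`, `buyer`, `dataset`, `scope`); HMAC-SHA256 with a shared platform key as a stand-in for a public-key signature (an HMAC proves integrity only to parties holding the key); and a plain in-memory list in place of a C2PA or on-chain manifest.

```python
import hashlib
import hmac
import json

PLATFORM_KEY = b"demo-key-not-for-production"  # hypothetical shared secret
GENESIS = "0" * 64                             # "previous hash" of the first receipt


def canonical(record: dict) -> bytes:
    """Serialize deterministically so the same record always hashes the same."""
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode()


def sign(record: dict, key: bytes = PLATFORM_KEY) -> str:
    """HMAC-SHA256 tag over the canonical record (stand-in for a real signature)."""
    return hmac.new(key, canonical(record), hashlib.sha256).hexdigest()


def verify(record: dict, tag: str, key: bytes = PLATFORM_KEY) -> bool:
    return hmac.compare_digest(sign(record, key), tag)


def append_receipt(chain: list, license_record: dict) -> dict:
    """Append a receipt whose hash commits to the previous receipt."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    body = {"license": license_record, "signature": sign(license_record), "prev": prev_hash}
    receipt = {**body, "hash": hashlib.sha256(canonical(body)).hexdigest()}
    chain.append(receipt)
    return receipt


def chain_is_intact(chain: list) -> bool:
    """Recompute every link; editing any receipt breaks the chain."""
    prev_hash = GENESIS
    for receipt in chain:
        body = {k: receipt[k] for k in ("license", "signature", "prev")}
        if receipt["prev"] != prev_hash:
            return False
        if hashlib.sha256(canonical(body)).hexdigest() != receipt["hash"]:
            return False
        if not verify(receipt["license"], receipt["signature"]):
            return False
        prev_hash = receipt["hash"]
    return True


if __name__ == "__main__":
    chain = []
    append_receipt(chain, {"license_id": "L-001", "buyer": "lab-a", "dataset": "photos-2024", "scope": "fine-tune"})
    append_receipt(chain, {"license_id": "L-002", "buyer": "lab-b", "dataset": "news-archive", "scope": "pre-train"})
    print(chain_is_intact(chain))          # True
    chain[0]["license"]["scope"] = "pre-train"  # tamper with an earlier receipt
    print(chain_is_intact(chain))          # False
```

Because each receipt hashes its predecessor, rewriting one license after the fact invalidates every later link, which is what makes the trail auditable.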

Four products.
One end-to-end rights stack.

From the moment a creator uploads a file to the moment a model deploys in production, every step has an owner, a receipt, and a royalty.

01 · FLAGSHIP

★ All-in-one platform

Core Marketplace

The exchange. List a dataset, discover a corpus, sign a license, and settle a payout, all in one flow. Granular by modality, rights window, and exclusivity.

Explore the marketplace

At a glance

Modalities

Text · Image · Audio · Code · Video

Pricing

Fixed, auction, or royalty

Settlement

ACH · Wire · USDC
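Listing granularity by modality, rights window, and exclusivity can be modelled as a small record plus a permission check. A minimal sketch, assuming hypothetical names (`Listing`, `Grant`, `permits`, `grant`) and simplified rules: a use is allowed only if its purpose is listed and the date falls inside the rights window, and an exclusive grant may not overlap any other grant. Pricing (fixed, auction, royalty) and settlement are left out.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class Grant:
    buyer: str
    start: date
    end: date
    exclusive: bool


@dataclass
class Listing:
    modality: str            # e.g. "text", "image", "audio", "code", "video"
    allowed_uses: frozenset  # e.g. frozenset({"pre-train", "fine-tune"})
    window_start: date       # rights window, inclusive on both ends
    window_end: date
    grants: list = field(default_factory=list)

    def permits(self, use: str, on: date) -> bool:
        """True if `use` is licensed and `on` falls inside the rights window."""
        return use in self.allowed_uses and self.window_start <= on <= self.window_end

    def grant(self, new: Grant) -> bool:
        """Record `new` unless it leaves the window or conflicts with exclusivity."""
        if not (self.window_start <= new.start <= new.end <= self.window_end):
            return False
        for g in self.grants:
            overlaps = new.start <= g.end and g.start <= new.end
            if overlaps and (g.exclusive or new.exclusive):
                return False
        self.grants.append(new)
        return True


if __name__ == "__main__":
    listing = Listing("image", frozenset({"fine-tune"}), date(2026, 1, 1), date(2027, 12, 31))
    print(listing.permits("fine-tune", date(2026, 6, 1)))  # True
    print(listing.permits("pre-train", date(2026, 6, 1)))  # False: use not licensed
    print(listing.grant(Grant("lab-a", date(2026, 1, 1), date(2026, 12, 31), exclusive=True)))  # True
    print(listing.grant(Grant("lab-b", date(2026, 6, 1), date(2027, 6, 1), exclusive=False)))   # False: overlaps exclusive
    print(listing.grant(Grant("lab-b", date(2027, 1, 1), date(2027, 6, 1), exclusive=False)))   # True
```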

02 · MODULE

PreprocessX

Turn raw archives into training-grade corpora. Dedup, PII strip, language ID, toxicity filter, doc-quality scoring.

Throughput

4.2B tok/hr

Filters

42 pre-built

Output

JSONL · Parquet
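PreprocessX-style stages can be pictured as a chain of filters over documents. A minimal sketch with simplified stand-ins for each stage: exact deduplication by SHA-256 of normalized text (production pipelines usually add near-duplicate detection such as MinHash), regex masking of email addresses and phone-like numbers as the PII step, and a toy quality score (share of alphabetic characters). Language ID and toxicity filtering are omitted because they need trained models; the function names and threshold are hypothetical.

```python
import hashlib
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variants hash identically."""
    return " ".join(text.lower().split())


def strip_pii(text: str) -> str:
    """Mask email addresses and phone-like digit runs."""
    return PHONE.sub("[PHONE]", EMAIL.sub("[EMAIL]", text))


def quality_score(text: str) -> float:
    """Toy score: fraction of non-space characters that are letters."""
    chars = [c for c in text if not c.isspace()]
    return sum(c.isalpha() for c in chars) / len(chars) if chars else 0.0


def preprocess(docs, min_quality: float = 0.6):
    """Yield cleaned documents: dedup -> PII strip -> quality gate."""
    seen = set()
    for doc in docs:
        digest = hashlib.sha256(normalize(doc).encode()).hexdigest()
        if digest in seen:
            continue  # exact duplicate after normalization
        seen.add(digest)
        cleaned = strip_pii(doc)
        if quality_score(cleaned) >= min_quality:
            yield cleaned


if __name__ == "__main__":
    docs = [
        "Contact me at jane@example.com for the full essay.",
        "contact me at  JANE@example.com for the full essay.",  # duplicate once normalized
        "$$$ 1234 5678 %%% ####",                               # low quality
    ]
    print(list(preprocess(docs)))  # ['Contact me at [EMAIL] for the full essay.']
```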

03 · MODULE

HAIperTuneX

Fine-tune on licensed data without standing up the infra. Point at a model card, pick a corpus, hit run.

Engine

customLLM runtime

Modes

Instruct · preference · eval

Audit

Per-sample lineage
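Per-sample lineage means every example a fine-tuning run consumes can be traced back to its license. A minimal sketch, assuming each sample carries hypothetical `sample_id` and `license_id` fields and that lineage is written as one JSON line per consumed sample; the training step is a placeholder, not the customLLM runtime.

```python
import io
import json


def train_with_lineage(samples, run_id: str, batch_size: int, log) -> None:
    """Iterate over samples in batches, writing one lineage line per sample."""
    for step, start in enumerate(range(0, len(samples), batch_size)):
        batch = samples[start:start + batch_size]
        # ... a real run would compute the loss and update weights here ...
        for s in batch:
            log.write(json.dumps({
                "run_id": run_id,
                "step": step,
                "sample_id": s["sample_id"],
                "license_id": s["license_id"],
            }) + "\n")


def licenses_used(log_text: str, run_id: str) -> set:
    """Answer 'which licenses did this run touch?' from the lineage log."""
    return {
        rec["license_id"]
        for rec in map(json.loads, log_text.splitlines())
        if rec["run_id"] == run_id
    }


if __name__ == "__main__":
    samples = [
        {"sample_id": "s1", "license_id": "L-001", "text": "..."},
        {"sample_id": "s2", "license_id": "L-002", "text": "..."},
        {"sample_id": "s3", "license_id": "L-001", "text": "..."},
    ]
    buf = io.StringIO()
    train_with_lineage(samples, run_id="ft-run-7", batch_size=2, log=buf)
    print(sorted(licenses_used(buf.getvalue(), "ft-run-7")))  # ['L-001', 'L-002']
```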

04 · API

Data Pipeline API

The programmatic spine. Stream licensed corpora directly into your training cluster. Rights-scoped, usage-logged.

Latency

p50 38ms

Protocols

REST · gRPC

SLA

99.95%
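Rights-scoped, usage-logged streaming could work roughly like this on the client side. A minimal sketch, assuming the stream delivers JSONL records with hypothetical `license_id`, `allowed_uses`, and `text` fields; a local list of lines stands in for the network stream, since the endpoint and its response format are assumptions. Records whose license doesn't cover the requested use are skipped, and tokens are tallied per license for the usage log (whitespace splitting is a crude stand-in for a real tokenizer).

```python
import json
from collections import Counter
from typing import Iterable, Iterator


def stream_licensed(lines: Iterable[str], use: str, usage: Counter) -> Iterator[str]:
    """Yield texts whose license covers `use`; count tokens per license in `usage`."""
    for line in lines:
        if not line.strip():
            continue
        rec = json.loads(line)
        if use not in rec["allowed_uses"]:
            continue  # out of scope for this job: never reaches the trainer
        usage[rec["license_id"]] += len(rec["text"].split())
        yield rec["text"]


if __name__ == "__main__":
    fake_stream = [
        '{"license_id": "L-001", "allowed_uses": ["pre-train", "fine-tune"], "text": "a licensed paragraph"}',
        '{"license_id": "L-002", "allowed_uses": ["fine-tune"], "text": "fine-tune only text"}',
        '{"license_id": "L-001", "allowed_uses": ["pre-train", "fine-tune"], "text": "more text"}',
    ]
    usage = Counter()
    texts = list(stream_licensed(fake_stream, use="pre-train", usage=usage))
    print(texts)  # ['a licensed paragraph', 'more text']
    print(usage)  # Counter({'L-001': 5})
```

Filtering at the stream boundary means an out-of-scope record is never handed to the training job at all, rather than being removed afterwards.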

Built for both sides of the table.

For creators

Get paid when your work teaches a machine.

One-click listing

Upload a folder, an RSS feed, or a GitHub repo. We handle the format.

You set the terms

Exclusive or non-exclusive. Pre-train or fine-tune only. Flat fee or royalty.

Compounding income

Earn every time a model derived from your work ships an update.
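Flat fee versus royalty, and income that recurs with each model update, can be compared with a few lines of arithmetic. A minimal sketch, assuming a hypothetical royalty rule in which every model release funds a fixed pool that is split pro rata by each licensor's share of the licensed tokens in that release; the pool size, token counts, and flat fee are invented for illustration and are not AITrainingMart pricing.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def royalty_payouts(release_token_totals, creator_tokens: int, pool_per_release: Decimal):
    """Creator's payout for each model release that includes their work.

    Hypothetical rule: each release funds a fixed royalty pool, split pro rata
    by each licensor's share of the licensed tokens in that release.
    """
    payouts = []
    for total in release_token_totals:
        share = Decimal(creator_tokens) / Decimal(total)
        payouts.append((pool_per_release * share).quantize(CENT, rounding=ROUND_HALF_UP))
    return payouts


if __name__ == "__main__":
    flat_fee = Decimal("150.00")                               # hypothetical one-time offer
    releases = [1_000_000_000, 1_600_000_000, 2_000_000_000]   # licensed tokens per release
    payouts = royalty_payouts(releases, creator_tokens=2_000_000, pool_per_release=Decimal("50000"))
    print("per release:", [str(p) for p in payouts])           # ['100.00', '62.50', '50.00']
    print("royalty total:", sum(payouts, Decimal("0")), "vs flat fee:", flat_fee)
```

With these made-up numbers the creator's share shrinks as each release adds more licensed data, yet the cumulative royalty still passes the flat fee by the second release. Decimal is used so payouts round to exact cents.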

For AI companies

Train without the courtroom overhead.

Defensible corpora

Every token has a signed license. Discovery becomes a query.

Sourced on-spec

Request-for-data auctions: describe the gap, get curated bids.

Coverage you can price

Indemnity wrapper available: turn legal risk into a line item.

Talk to the platform team →
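"Discovery becomes a query" is easiest to see with a concrete table. A minimal sketch using Python's built-in `sqlite3`, assuming a hypothetical schema: a `licenses` table and a `training_tokens` table recording how many tokens from each license went into each model run. Table names, IDs, creators, and counts are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE licenses (
    license_id TEXT PRIMARY KEY,
    creator    TEXT NOT NULL,
    scope      TEXT NOT NULL          -- e.g. 'pre-train' or 'fine-tune'
);
CREATE TABLE training_tokens (
    model_run  TEXT NOT NULL,
    license_id TEXT NOT NULL REFERENCES licenses(license_id),
    tokens     INTEGER NOT NULL
);
INSERT INTO licenses VALUES
    ('L-001', 'photo-collective', 'fine-tune'),
    ('L-002', 'news-desk',        'pre-train'),
    ('L-003', 'indie-dev',        'pre-train');
INSERT INTO training_tokens VALUES
    ('model-v3', 'L-002', 1200000),
    ('model-v3', 'L-003',  300000),
    ('model-v4', 'L-001',   50000),
    ('model-v4', 'L-002', 1500000);
""")

# "Whose work went into model-v4, under what scope, and how much of it?"
rows = conn.execute("""
    SELECT l.creator, l.scope, SUM(t.tokens) AS tokens
    FROM training_tokens AS t
    JOIN licenses AS l USING (license_id)
    WHERE t.model_run = ?
    GROUP BY l.creator, l.scope
    ORDER BY tokens DESC
""", ("model-v4",)).fetchall()

for creator, scope, tokens in rows:
    print(f"{creator:18} {scope:10} {tokens:>10,}")
```

A discovery request that would otherwise mean reconstructing a scraping pipeline reduces to a parameterized query over records kept at purchase time.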

For developers

Build on the pipeline.

Programmatic access

REST + gRPC endpoints. Stream licensed corpora straight into your training job or notebook.

JSON / JSONL / Parquet

Structured output, schema-pinned. Load into Hugging Face datasets with one line.

Pay-per-token pricing

Metered by the million tokens. No seats, no commits - scale from notebook to cluster.
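Pay-per-token billing metered by the million tokens comes down to simple arithmetic. A minimal sketch, assuming a hypothetical price per million tokens and pro-rated billing for partial millions; no actual rates are stated on this page.

```python
from decimal import Decimal, ROUND_HALF_UP


def metered_cost(tokens: int, price_per_million: Decimal) -> Decimal:
    """Pro-rated cost in dollars for `tokens`, rounded to the cent."""
    cost = Decimal(tokens) * price_per_million / Decimal(1_000_000)
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    rate = Decimal("0.40")  # hypothetical $0.40 per million tokens
    for tokens in (250_000, 3_000_000_000):  # a notebook sample vs. a pre-training shard
        print(f"{tokens:>13,} tokens -> ${metered_cost(tokens, rate)}")
```

Pro-rating means a notebook experiment and a cluster-scale pull are billed on the same curve, with no minimum commitment.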

A category forming in real time.

$4.2B

AI TRAINING DATA TAM, 2026

The data that trains frontier models was worth roughly nothing a decade ago. By 2033 it is projected to be a $16B market growing at a 22% CAGR, and that’s only the clean, licensed slice. AITM is the exchange infrastructure that makes the slice legible, tradable, and auditable.

Licensed training-data TAM · USD B

▲ CAGR 22.0%

2026 · $4.2B
2027 · $5.1B
2028 · $6.3B
2029 · $7.6B
2030 · $9.3B
2031 · $11.4B
2032 · $13.9B
2033 · $16B
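The TAM chart is a compound-growth projection: value(year) = value(2026) × (1 + CAGR)^(year - 2026). A short check of that arithmetic, using the $4.2B 2026 base and 22% rate stated on this page; the function names are illustrative.

```python
def project(base: float, rate: float, start: int, end: int) -> dict:
    """Compound `base` forward at `rate` per year from `start` through `end`."""
    return {year: base * (1 + rate) ** (year - start) for year in range(start, end + 1)}


def implied_cagr(first: float, last: float, years: int) -> float:
    """Annual rate that turns `first` into `last` over `years` compounding periods."""
    return (last / first) ** (1 / years) - 1


if __name__ == "__main__":
    for year, value in project(4.2, 0.22, 2026, 2033).items():
        print(year, f"${value:.1f}B")
    print(f"implied CAGR for $4.2B -> $16B over 7 years: {implied_cagr(4.2, 16.0, 7):.1%}")
```

Compounding exactly at 22% reproduces the chart through 2031 and gives about $13.8B for 2032 and $16.9B for 2033; a $16B endpoint corresponds to a CAGR of roughly 21%.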

2.4M

Creators earning nothing on scraped works today

64%

Of top-100 sites now block AI crawlers

$1.1B

Published licensing deals in 2025 alone

3 of 4

Frontier labs under active discovery

The next era of AI will be built on consented data.
Get on the list.

Be the first to access AITrainingMart, for creators and AI companies alike. Private beta opens Q3 2026.
