AITrainingMart connects the people who made the internet with the companies training on it - under real licenses, with real payouts, and a real audit trail. The Amazon for AI training content.
The largest models were trained by scraping first and asking never. The bill is now coming due - in courtrooms, in creator revenue, and in the slow enclosure of the open web.
For AI labs
Existential legal exposure
70+ active suits. Discovery obligations. Training pipelines subpoenaed. One bad ruling can force a model retrain.
For creators
Work scraped, nothing earned
Decades of writing, photos, code, and music ingested without consent, credit, or a dollar in return.
For the web
Robots.txt is breaking the internet
Open publishing collapses as every site walls off. Everyone loses the commons that made AI possible.
One licensed marketplace. Both sides whole.
Creators list what they own. AI companies buy what they need. Every transaction is a signed license, a verifiable receipt, and a royalty stream that keeps paying as the model keeps earning.
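Mechanically, a "signed license" plus a "verifiable receipt" can be as simple as canonicalizing the license terms and attaching a tag that a holder of the verification key can check. This is a minimal sketch under stated assumptions: the terms are canonical JSON, and the tag is HMAC-SHA256 from Python's standard library. HMAC is a shared-secret MAC, not a public-key signature. A production rights engine would more plausibly use an asymmetric scheme such as Ed25519, so buyers and auditors can verify a receipt without holding the signing key. The key, the field names, and `sign_license` / `verify_license` are hypothetical.

```python
import hashlib
import hmac
import json

# Hypothetical shared secret. HMAC-SHA256 stands in for a real public-key signature.
PLATFORM_KEY = b"demo-secret-key"


def _canonical(terms: dict) -> bytes:
    """Serialize terms deterministically so signer and verifier hash identical bytes."""
    return json.dumps(terms, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_license(terms: dict, key: bytes = PLATFORM_KEY) -> dict:
    """Return the terms together with an HMAC-SHA256 tag over their canonical form."""
    tag = hmac.new(key, _canonical(terms), hashlib.sha256).hexdigest()
    return {"terms": terms, "signature": tag}


def verify_license(signed: dict, key: bytes = PLATFORM_KEY) -> bool:
    """Recompute the tag over the same canonical bytes and compare in constant time."""
    expected = hmac.new(key, _canonical(signed["terms"]), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signed["signature"])


# Hypothetical license terms; field names are illustrative only.
receipt = sign_license({
    "license_id": "lic-0001",
    "dataset": "photos-landscapes-v1",
    "buyer": "example-lab",
    "use": "pre-training",
    "exclusive": False,
    "expires": "2027-12-31",
})
print(verify_license(receipt))  # True

tampered = {**receipt, "terms": {**receipt["terms"], "exclusive": True}}
print(verify_license(tampered))  # False
```

Any edit to the terms, such as flipping `exclusive`, invalidates the receipt. That tamper-evidence is the property an audit trail depends on.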
Creators · List & license: Photos, Articles, Code, Audio
AITM · Rights engine: Watermark audit, Provenance chain, Smart contracts, Royalty ledger
AI companies · Train & deploy: Pre-train corpora, SFT datasets, RLHF preference data, Eval sets
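The royalty ledger can be read as a running balance per creator that is credited every time licensed revenue arrives. This is a minimal sketch that assumes each payout is split pro rata by contribution weight (for example, token or file counts) after a platform fee. The 20% fee, the weights, and the `RoyaltyLedger` class are hypothetical, not AITM's published terms. Amounts are integer cents. Leftover cents from rounding go to the largest fractional remainders, so the creator shares always add up exactly to the pool.

```python
from dataclasses import dataclass, field


@dataclass
class RoyaltyLedger:
    """Hypothetical pro-rata royalty ledger; balances are kept in integer cents."""

    platform_fee_bps: int = 2000  # assumed 20% platform fee, in basis points
    balances: dict[str, int] = field(default_factory=dict)  # creator -> cents owed

    def record_payout(self, amount_cents: int, weights: dict[str, int]) -> dict[str, int]:
        """Split one revenue event across creators in proportion to their weights.

        Rounding leftovers go to the creators with the largest fractional
        remainders (largest-remainder method), so shares sum exactly to the pool.
        """
        total = sum(weights.values())
        if total <= 0:
            raise ValueError("weights must sum to a positive number")
        pool = amount_cents * (10_000 - self.platform_fee_bps) // 10_000
        shares = {c: pool * w // total for c, w in weights.items()}
        leftover = pool - sum(shares.values())
        # Stable sort: ties keep the insertion order of `weights`.
        by_remainder = sorted(weights, key=lambda c: (pool * weights[c]) % total, reverse=True)
        for c in by_remainder[:leftover]:
            shares[c] += 1
        for c, cents in shares.items():
            self.balances[c] = self.balances.get(c, 0) + cents
        return shares


# Each downstream revenue event is one call; balances accumulate over time.
ledger = RoyaltyLedger()
print(ledger.record_payout(100_000, {"alice": 600, "bob": 300, "carol": 100}))
# {'alice': 48000, 'bob': 24000, 'carol': 8000}
print(ledger.record_payout(1_000, {"alice": 1, "bob": 1, "carol": 1}))
# {'alice': 267, 'bob': 267, 'carol': 266}
```

Integer cents and an explicit remainder rule keep payouts reproducible across audits. Floating-point splits can drift by fractions of a cent over many events.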
01
Signed licenses
Every purchase creates a cryptographically signed usage contract.
02
Provenance built-in
C2PA + on-chain manifest. Defensible audit trail in any jurisdiction.
03
Royalties that compound
Creators keep earning when their data trains downstream models.
04
One integration
Dataset delivery to S3, GCS, R2, or your training cluster via API.
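The "Provenance built-in" card pairs C2PA with an on-chain manifest. The common idea is a tamper-evident history: each event commits to the hash of the event before it, so rewriting any earlier step breaks every later link. This sketch shows only that generic hash-chain idea, using Python's `hashlib`. It does not implement the C2PA manifest format, and it does not write to any blockchain. The event types, field names, and bucket path are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body: dict) -> str:
    """SHA-256 over a canonical JSON encoding of one manifest entry body."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_event(chain: list[dict], event: dict) -> list[dict]:
    """Return a new chain with an entry that commits to the previous entry's hash."""
    prev = chain[-1]["hash"] if chain else GENESIS
    body = {"prev": prev, "event": event}
    return chain + [{**body, "hash": _digest(body)}]


def verify_chain(chain: list[dict]) -> bool:
    """Check every link: each hash matches its body and points at its predecessor."""
    prev = GENESIS
    for entry in chain:
        body = {"prev": entry["prev"], "event": entry["event"]}
        if entry["prev"] != prev or entry["hash"] != _digest(body):
            return False
        prev = entry["hash"]
    return True


chain: list[dict] = []
chain = append_event(chain, {"type": "upload", "file_sha256": hashlib.sha256(b"photo bytes").hexdigest()})
chain = append_event(chain, {"type": "license", "license_id": "lic-0001"})
chain = append_event(chain, {"type": "delivery", "target": "s3://example-bucket/corpus/"})
print(verify_chain(chain))  # True

chain[1]["event"]["license_id"] = "lic-9999"  # attempt to rewrite history
print(verify_chain(chain))  # False
```

Anchoring only the latest hash somewhere public is enough to pin the whole history, because every earlier entry is committed through the chain of links.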
Four products. One end-to-end rights stack.
From the moment a creator uploads a file to the moment a model deploys in production - every step has an owner, a receipt, and a royalty.
01 · FLAGSHIP
★ All-in-one platform
Core Marketplace
The exchange. List a dataset, discover a corpus, sign a license, settle a payout - all in one flow. Every license is scoped by modality, rights window, and exclusivity.
REST + gRPC endpoints. Stream licensed corpora straight into your training job or notebook.
02
JSON / JSONL / Parquet
Structured output, schema-pinned. Loads into Hugging Face Datasets with one line.
03
Pay-per-token pricing
Metered by the million tokens. No seats, no commits - scale from notebook to cluster.
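Metering by the million tokens means the bill scales linearly with volume, with no fixed seat or commitment term. This is a minimal sketch of that arithmetic, assuming a flat per-million-token rate. The $2.50 rate and the volumes are hypothetical. `Decimal` is used so that rounding to the cent stays exact.

```python
from decimal import Decimal, ROUND_HALF_UP


def metered_cost(tokens: int, usd_per_million: Decimal) -> Decimal:
    """Price a delivery linearly by the million tokens, rounded to the cent."""
    raw = Decimal(tokens) / Decimal(1_000_000) * usd_per_million
    return raw.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Hypothetical rate and volumes, for illustration only.
rate = Decimal("2.50")
print(metered_cost(40_000, rate))         # notebook-scale pull: 0.10
print(metered_cost(3_000_000_000, rate))  # cluster-scale pull: 7500.00
```

Because cost depends only on tokens delivered, the same formula covers a notebook experiment and a full training run.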
A category forming in real time.
$4.2B
AI TRAINING DATA TAM, 2026
The data that trains frontier models was worth roughly nothing a decade ago. By 2033 it is projected to be a $16B market growing at a 22% CAGR - and that’s only the clean, licensed slice. AITM is the exchange infrastructure that makes that slice legible, tradable, and auditable.
Licensed training-data TAM, USD · ▲ CAGR 22.0%
2026: $4.2B · 2027: $5.1B · 2028: $6.3B · 2029: $7.6B · 2030: $9.3B · 2031: $11.4B · 2032: $13.9B · 2033: $16B
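The chart follows the standard compounding formula: the value in year t equals base × (1 + CAGR)^t. This small sketch reproduces the series from the $4.2B 2026 base and the stated 22% rate. It also inverts the formula to recover the rate implied by two endpoints. The function names are illustrative.

```python
def project(base: float, cagr: float, years: int) -> list[float]:
    """Compound a base value forward: value_t = base * (1 + cagr) ** t, t = 0..years."""
    return [base * (1 + cagr) ** t for t in range(years + 1)]


def implied_cagr(start: float, end: float, years: int) -> float:
    """Invert the compounding formula: (end / start) ** (1 / years) - 1."""
    return (end / start) ** (1 / years) - 1


series = project(4.2, 0.22, 7)  # 2026 through 2033, USD billions
print([round(v, 1) for v in series])
# [4.2, 5.1, 6.3, 7.6, 9.3, 11.4, 13.8, 16.9]
print(f"{implied_cagr(4.2, 16.0, 7):.1%}")
# 21.1%
```

Compounding exactly 22% from $4.2B gives about $16.9B in 2033 and about $13.8B in 2032. A $16B endpoint implies a CAGR of roughly 21.1%. The headline figures and the stated rate therefore agree only approximately.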
2.4M
Creators earning nothing on scraped works today
64%
Of top-100 sites now block AI crawlers
$1.1B
Published licensing deals in 2025 alone
3 of 4
Frontier labs under active discovery
The next era of AI will be built on consented data. Get on the list.
Be first to access AITrainingMart - for creators and AI companies. Private beta opens Q3 2026.