
May 15, 2026 · 4 min read

Why your archive is worth more than you think

Most independent creators see their old work as a sunk cost. AI companies see it as raw material they need. The gap between those two views is the opportunity.

If you've been writing a newsletter for six years, or running a podcast since 2019, or putting essays on a personal website since college, you probably don't think of that body of work as a thing that has a price.

It feels more like compost. Old. Decaying. Mostly out of your head. You haven't looked at it in a while, and when you do, you mostly wince.

That's a mistake. Or more precisely — it was a defensible mistake in 2019, and it is a costly one in 2026.

What AI companies are actually buying

When OpenAI signs a content licensing deal with the Associated Press, or Reddit licenses its corpus to Google, the thing being purchased is not the next article. It is the backlog — twenty years of reporting, ten years of conversations, every single thing the AP archive contains.

The reason is dull but important: large language models are not trained on a steady drip of new material. They are trained on enormous static snapshots. A snapshot is more valuable when it contains years of high-quality human writing on a topic, because that gives the model the breadth it needs to generalize.

For a publication like the AP, "years of high-quality writing on a topic" describes their entire archive. For you — if you've been writing for any meaningful length of time on something you actually know about — it describes your archive too.

Why the price isn't zero

The instinct most creators have is to assume their work is too small to matter. There are billions of words on the internet. Why would anyone pay for yours?

Two reasons.

The first is that most of those billions of words are garbage. AI companies are very public about this. They are spending enormous amounts of money cleaning up scraped data, removing duplicates, filtering out spam, and throwing out anything that doesn't come from a real human with a track record. The supply of clean training data is not, in fact, unlimited. It is constrained, and it gets more constrained every month as more of the web becomes AI-generated.

The second is that specificity matters. A general-purpose model needs general-purpose data. A specialist model (for legal research, medical Q&A, niche technical work, a particular language or community) needs specialist data, and there isn't much of it. If you have spent a decade writing about something you know well, your archive is more valuable to a specialist model than a bulk scrape of any major news site.

What the price actually is

Pricing is not yet a solved problem. Most deals so far have been one-off contracts negotiated in private, and the numbers leak out in ranges rather than line items. Recent reporting suggests the AP got somewhere between $5 million and $10 million per year. Reddit got around $60 million per year from Google. Smaller publishers have signed for low six figures.

For a single independent creator the math is obviously different: you're not selling AP's archive, you're selling yours. But the per-word rates implied by those big deals are not enormous. A medium-size newsletter archive priced as if it were a scaled-down slice of AP data lands somewhere between $10,000 and $200,000, depending on size, exclusivity, and how specialized the content is.
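The scaling described above is simple arithmetic: divide a deal's annual price by the size of the corpus it covers to get an implied per-word rate, multiply by the size of your archive, then adjust for exclusivity and specialization. The Python sketch here makes that explicit. Every number in it is a hypothetical placeholder rather than a reported figure, and the two multipliers are illustrative stand-ins for the post's "exclusivity" and "how specialized" factors, not a formula any buyer is known to use.

```python
# A pro-rata pricing sketch. All inputs in the example are hypothetical
# placeholders, not figures reported for any real licensing deal.

def per_word_rate(annual_deal_usd: float, corpus_words: float) -> float:
    """Dollars per word per year implied by a large licensing deal."""
    return annual_deal_usd / corpus_words


def scaled_archive_value(
    archive_words: float,
    rate_per_word: float,
    exclusivity: float = 1.0,
    specialization: float = 1.0,
) -> float:
    """Scale an implied per-word rate down to a smaller archive.

    `exclusivity` and `specialization` are hypothetical multipliers
    (1.0 means no adjustment).
    """
    return archive_words * rate_per_word * exclusivity * specialization


if __name__ == "__main__":
    # Hypothetical: a $10M-per-year deal covering a 500-million-word corpus.
    rate = per_word_rate(10_000_000, 500_000_000)
    # Hypothetical: six years of weekly 1,500-word newsletter issues.
    words = 6 * 52 * 1_500
    print(f"implied rate: ${rate:.4f}/word")  # $0.0200/word
    print(f"base value: ${scaled_archive_value(words, rate):,.0f}")  # $9,360
    # Hypothetical 2x exclusivity premium and 3x specialization premium.
    print(f"adjusted: ${scaled_archive_value(words, rate, 2.0, 3.0):,.0f}")  # $56,160
```

The point of writing it out is that the result is driven almost entirely by two inputs you rarely get to see (the real deal price and the real corpus size), which is why published estimates arrive as wide ranges rather than single numbers.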

Whether that is the right price is a separate question. The point is that the price is not zero, and it is not theoretical. Cheques have been cut. The default for an independent creator today is not "I might someday get paid." The default is "I am being trained on, right now, for free, by everyone."

What changes when you list it

The first thing that changes when you list your archive is that it becomes a legal artifact instead of a vibe. There is a record of who owns it, what the terms are, and what was licensed.

That sounds bureaucratic, but it is actually the whole game. The reason AI companies pay the AP and not you is not that the AP's writing is so much better. It is that the AP can sue. You cannot: you don't have the lawyers, you don't have proof of ownership beyond your own word, and you don't even have a way to detect when your work has been used.

A platform like ArchiveBay closes that gap. You get verified proof of ownership. You get a public, immutable record every time a license is sold. And you get a price tag, in dollars, on a thing you have always been told had no commercial value.
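The post doesn't say how ArchiveBay stores its records, so what follows is a hypothetical sketch, not the platform's design. It shows one common way to make "a record of who owns it, what the terms are, and what was licensed" tamper-evident: fingerprint the licensed content with SHA-256, and have each license record include the hash of the record before it. The class and field names (`LicenseRecord`, `LicenseLog`, `terms`, and so on) are illustrative assumptions.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


def fingerprint(text: str) -> str:
    """SHA-256 digest of the licensed content: identifies *what* was
    licensed without publishing the content itself."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class LicenseRecord:
    owner: str         # who owns the archive (hypothetical field)
    licensee: str      # who bought the license
    terms: str         # free-text terms, e.g. "non-exclusive, training only"
    content_hash: str  # fingerprint() of the licensed content
    prev_hash: str     # record_hash() of the previous record, "" for the first

    def record_hash(self) -> str:
        # Hash a canonical JSON encoding so identical records hash identically.
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


class LicenseLog:
    """Append-only log in which each record commits to the one before it,
    so altering any earlier entry breaks every later link."""

    def __init__(self) -> None:
        self.records: list[LicenseRecord] = []

    def append(self, owner: str, licensee: str, terms: str, content: str) -> LicenseRecord:
        prev = self.records[-1].record_hash() if self.records else ""
        rec = LicenseRecord(owner, licensee, terms, fingerprint(content), prev)
        self.records.append(rec)
        return rec

    def verify(self) -> bool:
        # Walk the chain and check that every link points at its predecessor.
        prev = ""
        for rec in self.records:
            if rec.prev_hash != prev:
                return False
            prev = rec.record_hash()
        return True


if __name__ == "__main__":
    log = LicenseLog()
    log.append("Jane Writer", "Lab A", "non-exclusive, training only", "archive text")
    log.append("Jane Writer", "Lab B", "non-exclusive, training only", "archive text")
    print(log.verify())  # True
```

A hash chain on its own only makes tampering detectable; the "public" part depends on publishing the latest record hash somewhere the owner alone cannot quietly rewrite.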

The point isn't that you'll necessarily get rich. The point is that you'll stop pretending the work was free.