Clarity Compounded

Clarity That Grows With You.

jt@claritycompounded.com

Ghost in the Machine: How LLMs Are Profiting Off Writers Without Paying Them

Imagine spending years honing a voice. You write consistently, across tweets, essays, blog posts, maybe a book. Some pieces go viral and some don't, but it's your mind rendered into words, week after week, and the body of work accumulates into something recognizably yours.

Now imagine that anyone can open a chatbot and ask it to summarize your blog, generate ten tweets in your style, or explain your book better than you ever could. And it does, instantly, with no credit, no payment, and no acknowledgment that the core ideas, structure, and cadence all came from you.

This is not theoretical. This is the present tense of large language models like ChatGPT and Claude. They're extracting and internalizing massive amounts of publicly available human writing without consent, without compensation, and often without consequence. And writers are perhaps the most exposed class of creator in this new arrangement, because the very openness that made the internet useful for building an audience is what made their work easy to scrape.

The Illusion of "Free"

The internet trained us to publish for attention. Writers gave away thoughts for likes, threads for visibility, Substack posts for email subscribers, essays for the hope of one day landing a book deal or a speaking slot. But the value was never zero. The cost was time, intellect, and in many cases expertise that took a decade to earn.

And now, in a genuinely perverse twist, the internet's openness has become a feeding ground for AI systems that never ask "who wrote this?" but only "can we scrape this?" Public data becomes private capital. Open knowledge becomes proprietary training material. And your labor, your writing, becomes infrastructure for a trillion-dollar model you'll never profit from.

The Writer's Dilemma in the LLM Era

LLMs cannibalize the writer's value chain in three ways, and each one compounds the others.

The first is loss of revenue. Ask an LLM to summarize Atomic Habits or The War of Art and you'll get a surprisingly cogent, distilled, ready-to-digest version of a book that once sold for $15. Why buy the book? Now imagine this playing out across every book, every paid blog, every course, every newsletter. The backlist dies, the long tail vanishes, and the writer's residual income, already thin, quietly disappears.

The second is loss of attribution. LLMs don't credit sources. They remix, rephrase, and reassemble, but the DNA remains: a writer's phrasing, structure, and tone, all stripped of origin. Imagine someone quoting your idea, in your voice, without ever saying your name. Now imagine millions doing it at scale, invisibly, every day.

The third is loss of market differentiation. A writer's edge was once their uniqueness, the thing that made readers seek them out specifically. But LLMs flatten style. They democratize access while homogenizing output, and your voice becomes one of many, diluted, anonymized, and commodified. Worse still, others can now imitate you with a prompt.

The Core Problem: Your labor becomes infrastructure for a trillion-dollar model you'll never profit from.

Publishers' Silent Complicity

Traditional publishing should have been the firewall, but in many cases it's the sieve. Most publishers haven't updated copyright contracts to explicitly prevent AI training. DRM protections are weak, and book PDFs float across the internet, indexed and scraped like blog posts.

And worse, some publishing houses are selling access to their archives via quiet licensing deals, profiting from their authors' work without transparent revenue sharing. Writers, boxed into legacy contracts, have no recourse. The gatekeepers are either asleep or cashing checks while the gates burn.

How LLM Companies Rationalize It

Their stance is simple: the data was publicly available; we're not copying, we're learning patterns; we filtered out copyrighted material; it all falls under fair use. But fair use wasn't built for this scale. It wasn't designed for models that internalize entire corpora and then replicate them with minor variation.

This is not quoting a passage. This is internalizing a million passages, generating an output, and pretending no one deserves a slice of that value. It's technical legalism dressed up as progress.

What Must Change

The knee-jerk response is to scream "regulate," but vague regulation without an enforcement mechanism will always lag behind model training. We need layered, realistic, enforceable solutions, and here's what that could look like.

The first is a collective licensing body for writers, something like ASCAP for prose. Writers enroll their work, LLMs pay for access, and revenue is distributed back based on model usage, weight, or domain. It won't be perfect, but it's a starting point and a way to make "training" no longer a euphemism for theft.
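To make the ASCAP analogy concrete, the arithmetic of a pro-rata payout is simple: a licensing pool gets split in proportion to each writer's measured usage weight. A minimal sketch, where the pool size, the weights, and the writer labels are all invented for illustration:

```python
def distribute_pool(pool: float, usage: dict[str, float]) -> dict[str, float]:
    """Split a licensing pool pro rata by each writer's usage weight."""
    total = sum(usage.values())
    if total == 0:
        return {writer: 0.0 for writer in usage}
    return {writer: pool * weight / total for writer, weight in usage.items()}

# Hypothetical quarter: a $1M pool, weights drawn from whatever the
# licensing body measures (training inclusion, retrieval frequency, domain).
payouts = distribute_pool(1_000_000, {"essayist_a": 3.0, "novelist_b": 1.5, "blogger_c": 0.5})
```

The hard part isn't the division; it's agreeing on what the weights measure, which is exactly the kind of question a collective body exists to settle.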

The second is opt-in marketplaces for training material. Imagine an AI-native Substack where you publish, tag your work as trainable or not, set licensing terms if it is, and LLMs browse, pay, and use accordingly. Control moves back to the creator.
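A crude version of that control already exists today, though it is opt-out rather than opt-in and compliance is voluntary: OpenAI's GPTBot, Common Crawl's CCBot, and Google's Google-Extended all publish crawler user agents that a site's robots.txt can refuse.

```text
# robots.txt — ask known AI training crawlers to stay out
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /
```

An opt-in marketplace would invert this default: nothing is trainable until the writer tags it so and names a price.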

The third is digital watermarking for text. The tech exists to watermark images, videos, and music, and text needs the same treatment: invisible tags embedded into phrasing patterns or syntax choices. If LLM output mirrors the structure too closely, it gets flagged and attribution is enforced.
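One published line of research does roughly this: a keyed hash splits the vocabulary into a "green" half the generator quietly prefers, and a detector later tests whether a text leans suspiciously green. A toy sketch of the detection side, with the key and the word-cleaning rules invented for illustration:

```python
import hashlib

def is_green(word: str, key: str = "demo-key") -> bool:
    """A keyed hash assigns roughly half of all words to a 'green' list."""
    digest = hashlib.sha256((key + word.lower()).encode("utf-8")).digest()
    return digest[0] % 2 == 0

def green_fraction(text: str, key: str = "demo-key") -> float:
    """Fraction of words in `text` on the green list. Unmarked human text
    should hover near 0.5; a generator that prefers green words will
    score well above that baseline, which is what a detector flags."""
    words = [w.strip(".,;:!?\"'") for w in text.split()]
    words = [w for w in words if w]
    if not words:
        return 0.0
    return sum(is_green(w, key) for w in words) / len(words)
```

A real scheme needs statistical thresholds and robustness to paraphrase, but the principle is the same: the watermark lives in word choices, not in visible marks.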

The fourth is consumer-level transparency. Build it into the LLM interface: "This summary draws from published books and essays by real human writers. Here are the sources. Consider supporting them." Even that tiny gesture reshapes user behavior and brings creators back into the value loop.

The fifth is paygated summarization. If I ask ChatGPT to summarize Sapiens, I shouldn't get a free essay. I should get a message telling me to purchase the book or read excerpts via a licensed preview. LLMs should augment consumption, not replace it.

Why It All Matters

This isn't about resistance to technology. I love LLMs and I use them. But I also write, and I can see what's coming if we don't course-correct.

If we allow LLMs to ingest everything without cost, we will eventually kill the incentive to create anything worth ingesting. AI will eat itself, training on AI-generated sludge, and the golden era of human insight, raw, flawed, and genuinely beautiful, will vanish beneath layers of derivative noise.

If we want intelligent machines, we have to sustain human intelligence. And that starts by paying the people who built the internet's mind: writers like you, like me, and like the millions whose names no one sees but whose words built the scaffolding of modern AI.

What Comes Next

We're at a decision point. Do we let LLMs become the new robber barons, extracting value at scale with no redistribution? Or do we build something better, a world where AI amplifies creators instead of replacing them, where compensation isn't an afterthought but a first principle, and where attribution isn't optional but essential?

This isn't a technical problem. It's a moral one. And if the machines are learning from us, the least we can do is give them something worth learning. Starting with how to be fair.
