How Is AI Closing the Open Web in 2026?



A note on funding: CypherpunkGuide carries no surveillance advertising: no ad networks, tracking pixels, or sponsored content. It is funded by transparent streams: reader donations now; subscriptions and editorially aligned affiliate revenue later. We answer to our readers, not to advertisers.

The open web, the one where anyone could publish at a URL and anyone could find it by following a link, has spent two years being quietly re-plumbed around AI. The change is easy to miss because nothing was announced and no wall went up overnight. What happened instead is measurable in traffic logs. When Google shows an AI-generated answer, the share of searches that send a click to an outside site falls from roughly 15% to about 8% (Pew Research, 2025). Across thousands of news sites, search referrals dropped about a third over the year to late 2025 (Chartbeat data, reported by the Reuters Institute). And the AI tools that absorbed that attention refer back, in return, only about 0.1–0.5% of web visits.

So the web is being read more than ever — just not by people, and not in a way that returns to the source. Is the open web dying, then? The question is worth holding for a moment, because the answer decides what you do next — mourn it, optimize for the machines reading it, or build something they cannot enclose. “Dying” turns out to be both too dramatic and too comforting a word; the honest diagnosis is narrower, and it points toward construction rather than nostalgia or surrender.

A note on where I stand: this site is published the way this article will end up recommending — self-hosted, syndicated over RSS and open social protocols, dependent on no single platform to exist. That is not a victory lap. It is a small, practical experiment, and what follows is as honest about its limits as about its case.

The Web Is Being Read More Than Ever — Just Not by People

The change is not that the web is being crawled — it always was — but that the crawl no longer returns a visit: search was a directory that routed people to sources, while AI answer engines are a terminus that ends the journey at the answer. That distinction is the whole argument, and it is easy to get wrong.

It would be a mistake to point at the sheer volume of AI crawling as the harm. Some bots are extraordinarily extractive — one analysis put a major AI crawler’s ratio, in 2025, at over twenty thousand pages fetched for every visitor it referred back — but search engines have always crawled far more than they sent traffic to. Crawling more than you refer is normal; it is how indexing works. The number that matters is not the ratio but the severed return path: the page is read, the answer is served, and the reader never arrives.

That severance is what the cleaner figures isolate. The same search, with and without an AI summary, loses roughly half its outbound clicks (about 15% down to 8%), and only about 1% of users click the citations inside an AI answer (Pew Research, 2025). Zero-click searches — those that end without a visit anywhere — rose from about 56% to 69% in the year to May 2025 (Similarweb data, via Search Engine Roundtable). To be fair to the data, the broad referral decline has several causes at once — social platforms throttling outbound links, core-algorithm shifts, paywalls, audiences moving to apps — so the honest causal claim rests on the with-and-without-AI comparison, not on blaming AI for every lost visit.
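
The “roughly half” and zero-click figures are simple arithmetic on the numbers quoted above. A minimal Python sketch, with function and variable names chosen here purely for illustration, separates the relative drop from the percentage-point change, since the two are easy to confuse:

```python
def relative_change(before: float, after: float) -> float:
    """Fractional change from `before` to `after` (negative means a drop)."""
    return (after - before) / before


def point_change(before: float, after: float) -> float:
    """Difference between two percentages, in percentage points."""
    return after - before


# Approximate figures as quoted in the text (Pew Research; Similarweb).
click_share_without_ai = 15.0  # % of searches that send a click outside
click_share_with_ai = 8.0
zero_click_before = 56.0       # % of searches that end without any visit
zero_click_after = 69.0

click_drop = relative_change(click_share_without_ai, click_share_with_ai)
zero_click_rise = point_change(zero_click_before, zero_click_after)

print(f"Outbound clicks: {click_drop:.0%} relative change")
print(f"Zero-click searches: {zero_click_rise:+.0f} percentage points")
```

A fall from 15% to 8% is seven percentage points, but just under half of the original click share, which is why the text describes it as roughly halving.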

Aspect	The web as directory (search)	The web as terminus (AI answer)
What the crawl is for	Indexing, to route a human to the page	Ingestion, to synthesize an answer in place
What the reader gets	A list of sources to visit	A finished answer; the sources are optional
What the source gets back	A visit — attention, subscribers, revenue	A citation almost no one clicks (~1%)
The bargain	Allow indexing in exchange for discovery	Allow ingestion in exchange for ~0.1–0.5% referral

But People Are Choosing the Answer Box

The uncomfortable truth the enclosure story tends to skip is that most people prefer the answer to the ten blue links — and they are not wrong to. Any argument that treats readers as pure victims of AI search is going to misread why AI search won, and it will be easy to dismiss for exactly that reason.

The open web that users are leaving was, very often, a hostile place to land: search-optimized filler that buried the answer under a personal anecdote, ad units that shifted the text as it loaded, cookie banners, newsletter pop-ups, autoplay video. An answer that skips all of that is a genuine improvement in daily life, not a trick played on the gullible. Honesty about that is the price of being taken seriously.

But two things follow that the convenience does not cancel. First, the answer is assembled from work the answer engine did not do and, increasingly, neither pays for nor points back to — the cost is transferred to the people who wrote the underlying pages, with no mechanism to recover it. Second, and more corrosive in the long run, is the sustainability paradox: an answer box that starves its sources eventually has nothing fresh or true left to summarize. By mid-2025, researchers at Stanford, Imperial College and the Internet Archive estimated that 17.6% of newly published websites were entirely AI-generated — a subset of the roughly 35% that were AI-generated or AI-assisted (reported by Gizmodo, 2025); “slop” — low-quality content generated at scale by AI — was named Merriam-Webster’s word of the year for 2025. A web that increasingly reads and rewrites itself is a hall of mirrors. So the problem is not that users are wrong to want answers. It is that the current arrangement quietly spends down a commons that no one is refilling.

Every Response on Offer Is a Petition or a Palliative

The fixes currently on the table — block the crawlers, charge them, optimize for them, or sue them — share a hidden assumption: that a web you control can be restored by persuading the enclosers, or a new middleman, to behave. Each is worth taking seriously on its own terms, and each, taken seriously, falls short of returning control.

Blocking is the reflex, and the weakest. A robots.txt rule is a request, not a fence: it has no reliable legal force on its own, and a well-funded crawler can ignore it or simply relabel its traffic. The most effective version delegates the blocking to an intermediary — Cloudflare began blocking AI crawlers by default for new sites in July 2025 (Cloudflare) — which works, but only by moving the gate to Cloudflare.
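
To make “a request, not a fence” concrete, here is a minimal sketch using Python’s standard urllib.robotparser. The robots.txt contents and the example.org URL are hypothetical. The check runs entirely on the crawler’s side, so it binds only a crawler that chooses to run it, and a crawler that announces a different user-agent string falls through to the permissive rule:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: refuse one self-identified AI crawler,
# allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://example.org/essay.html"  # placeholder URL

# A cooperative crawler asks under its real name and is told "no".
gptbot_allowed = parser.can_fetch("GPTBot", url)

# The same software announcing a generic browser string is told "yes".
relabeled_allowed = parser.can_fetch("Mozilla/5.0", url)

print("GPTBot allowed:", gptbot_allowed)
print("Relabeled client allowed:", relabeled_allowed)
```

Nothing in this exchange is enforced by the server; the file only tells a cooperative client what the publisher would prefer.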

Charging is the response that looks most like a solution, and it deserves the strongest version of its case. Cloudflare’s “pay-per-crawl” marketplace (launched July 2025) and the large licensing deals — Reddit’s reported ~$60 million a year from Google, News Corp’s reported $250-million-plus, five-year pact with OpenAI — are not nothing. They are the first real compensation many publishers have ever had for machine reading, and a price signal where there had been only free extraction. The trouble is structural. Pay-per-crawl installs a central tollbooth as the interface between every site and every model; it concedes the principle that access is fine so long as you pay the gatekeeper; and it routes the money to the handful of publishers with the leverage to negotiate, leaving the independent web with a cheaper version of the same dependency. It is enclosure with a revenue-share — call it Enclosure 2.0 — not a web the publisher controls.

Optimizing for AI search (“generative engine optimization”) is adaptation that has already accepted the terms. And litigation, the most institutional path, is genuinely consequential. The copyright suits against OpenAI are live: a judge ordered the company to produce twenty million ChatGPT logs in January 2026 (National Law Review). The UK’s competition regulator now lets publishers opt out of Google’s AI summaries (Press Gazette, 2026). The EU AI Act’s remaining transparency obligations, including disclosure rules for AI-generated content, reach their enforcement phase in August 2026. These matter, and the next section is not an argument against them. But they are slow, jurisdiction-bound, and uncertain, and every one of them asks an institution to grant what cypherpunks have long said institutions do not grant out of beneficence.

Response	What it actually does	Honest upside	Why it is still a petition
robots.txt block	Politely asks bots not to read	Free, simple, widely honored by reputable bots	No legal force alone; ignored or relabeled by aggressive crawlers
Pay-per-crawl / licensing	Charges machines via a CDN or deal	First real compensation; a price signal	New central tollbooth; concedes “access if you pay”; favors incumbents
AI-search optimization	Formats content for the answer box	Some visibility inside the enclosure	Accepts the terms; you are now optimizing for the gatekeeper
Copyright litigation	Sues for training/use	Can reshape licensing and disclosure	Slow, jurisdictional, uncertain; asks institutions to grant control

The Open Web Was Already Enclosed — Which Is the Whole Point

Before mourning the open web, it is worth admitting it had already been enclosed once: for fifteen years, most discovery ran through a single search box and most of the money through one ad exchange. AI did not pave a free commons. It is the second enclosure, and it is taking the one consolation the first one left publishers — the referral visit.

That correction matters because it kills the nostalgia, and the nostalgia is the trap. The goal was never to restore the Google-era web, which was already someone else’s toll road. The useful instruction is older and was written down in 1993: privacy — and, it turns out, openness — is not something you petition institutions to grant, because privacy that depends on an institution’s goodwill is privacy it can revoke. The cypherpunk conclusion is to build the guarantee into a mechanism instead, which is the argument of the Cypherpunk Manifesto applied now to the web itself rather than to the message in transit.

The person who saw the shape of this earliest was Richard Stallman. His 1997 story The Right to Read imagined a near future in which the act of reading is metered and access to text is controlled by whoever owns the software — not a bad description of a web where reading is increasingly mediated by a service you query rather than pages you open. And his objection to that kind of service is precise rather than merely rhetorical. He argues that doing your computing on someone else’s server “inherently trashes your computing freedom,” because you cannot get a copy of a hosted AI and run it yourself; the only way to use it is on a machine you do not control. He is blunter still about its reliability — he refuses to call it intelligence at all, preferring “bullshit generator” for a system that, in his words, “generates output ‘with indifference to the truth.’” You need not adopt the polemic to keep the structural point, which is the cypherpunk one: a capability you cannot run, inspect, or fork is a capability someone else controls.

“Privacy that depends on an institution’s goodwill is privacy that the institution can revoke.” The same is now true of access, discovery, and reading. The durable version of any of them has to be a property of a mechanism, not a promise.

The Honest Sovereign-Web Revival Path

The constructive answer is not to quit the mainstream web tomorrow — almost no one can — but to build a parallel layer whose openness needs no company’s permission, starting with how you read and ending with how you publish. What makes this different from the usual “just use the decentralized web” advice is that it states the limits as plainly as the tools.

Reading sovereign comes first, because it is the easier half:

Follow sources directly with RSS — the open web’s surviving circulatory system. It is a feed you control, with no algorithm deciding what you see and no engagement metrics shaping what gets written; interest in it is climbing again as readers look for an exit from algorithmic feeds.
Read through Tor Browser for unintermediated access to the web and to onion services. This is load-bearing rather than fringe: the New York Times, the Guardian and Der Spiegel all run onion services for sources.
Answer your own questions with a local, open-weights model — the step most “decentralize everything” guides skip. Decentralizing your data accomplishes little if the only intelligence allowed to read it is a frontier model running on someone else’s GPUs, which is where the real concentration of power now sits. Running a smaller model locally, over indexes and feeds you choose, is what makes the read sovereign rather than merely relocated. Sovereignty over the model matters as much as sovereignty over the data.
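
As a rough illustration of how little machinery the first step needs, the sketch below reads item titles and links out of an RSS 2.0 document with Python’s standard library alone. The feed is a hypothetical one, inlined as a string so the example runs offline. A real reader would fetch the feed over HTTP and, because the standard-library XML parser is not hardened against malicious input, would want a safer parser for untrusted feeds.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed, inlined so the sketch needs no network access.
FEED_XML = """<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <link>https://example.org/</link>
    <description>A placeholder feed</description>
    <item>
      <title>First post</title>
      <link>https://example.org/first</link>
    </item>
    <item>
      <title>Second post</title>
      <link>https://example.org/second</link>
    </item>
  </channel>
</rss>
"""


def read_items(feed_xml: str) -> list[tuple[str, str]]:
    """Return (title, link) pairs for every <item> in an RSS 2.0 document."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        title = item.findtext("title", default="(untitled)")
        link = item.findtext("link", default="")
        items.append((title, link))
    return items


items = read_items(FEED_XML)
for title, link in items:
    print(f"{title}: {link}")
```

The order of the output is the order the publisher wrote the items in; no ranking step sits between the source and the reader.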

Publishing sovereign is the harder half, and the more important one:

Self-host on a cheap virtual server — a home no platform can delete.
Syndicate over RSS and open social protocols — Nostr and the Fediverse’s ActivityPub — where the relationship with your audience is not a platform’s to revoke.
Offer an onion mirror if your readers need to reach you from somewhere hostile.

None of this is frictionless, and pretending otherwise is how the last decade of “own your platform” advice lost credibility. So, the limits, stated plainly. The Fediverse has on the order of one to two million monthly active users, not a billion; it is a town, not a continent. IPFS trades real latency and a steep learning curve for its censorship resistance. Self-hosting is a privilege of time and skill before it is anything else. None of these replaces the reach of mainstream search or a large social platform, and anyone who tells you a migration is imminent is selling something.

Which is why building this layer is not a substitute for politics. Antitrust enforcement, statutory licensing, the kind of opt-out the UK regulator just won, public-interest search indexes, data trusts — all of it is worth fighting for, and code does not replace any of it. The sovereign web is the floor beneath that fight: the thing that holds when the law is slow or captured, so that you are not helpless in the meantime. The cypherpunk claim was never that software replaces collective action; it is that rights you can run yourself do not depend on winning the politics first.

This is not theoretical here. This site is self-hosted, posted to Nostr and the Fediverse, and reachable without any platform’s leave; we also watch the self-identifying AI crawlers arrive in our own server logs every day, as we noted in our manifesto primer. The honest report is that the sovereign web is a place to stand, not yet a place to win: it is smaller and slower to find than it would be if we optimized purely for the answer box. To be square about our own interests: an independent site can benefit from being cited in an AI answer, and we are not against that. The point is not to refuse the machines but to refuse dependence on a channel a company can close. Be citable; do not be captured. The same logic runs through the rest of our work on AI-scale surveillance, from the AI-age threat model to the permanence of your footprint (deletion no longer reaches a model’s training data), and through the identity checkpoints going up at the door of the web in the fight over age verification.
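
For readers who want to run the same check, here is a hedged sketch of counting self-identifying AI crawlers in an access log. It assumes the common “combined” log format, where the user agent is the last quoted field. The token list is illustrative and incomplete, and the sample lines (including the simplified user-agent strings) are invented for the example.

```python
import re
from collections import Counter

# Illustrative, non-exhaustive tokens that some AI crawlers announce.
AI_CRAWLER_TOKENS = ("GPTBot", "ClaudeBot", "CCBot", "PerplexityBot")

# Invented sample lines in "combined" log format (assumption: your server
# writes this format; adjust the pattern below if it does not).
SAMPLE_LOG = """\
203.0.113.5 - - [01/Jan/2026:00:00:01 +0000] "GET /a HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"
203.0.113.6 - - [01/Jan/2026:00:00:02 +0000] "GET /b HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"
198.51.100.7 - - [01/Jan/2026:00:00:03 +0000] "GET /a HTTP/1.1" 200 512 "https://example.com/" "Mozilla/5.0 (X11; Linux x86_64)"
203.0.113.5 - - [01/Jan/2026:00:00:04 +0000] "GET /c HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"
"""

# The user agent is the last double-quoted field on each line.
USER_AGENT_RE = re.compile(r'"([^"]*)"\s*$')


def count_ai_crawlers(log_text: str) -> Counter:
    """Count requests per crawler token found in the user-agent field."""
    counts = Counter()
    for line in log_text.splitlines():
        match = USER_AGENT_RE.search(line)
        if not match:
            continue  # skip lines that do not look like combined format
        user_agent = match.group(1)
        for token in AI_CRAWLER_TOKENS:
            if token in user_agent:
                counts[token] += 1
                break
    return counts


crawler_hits = count_ai_crawlers(SAMPLE_LOG)
for token, hits in crawler_hits.most_common():
    print(f"{token}: {hits}")
```

Like robots.txt, this only sees crawlers that announce themselves; relabeled traffic will not show up under these names.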

Layer	The move	What it gives you	The honest limit
Read: feeds	RSS reader you control	Discovery with no algorithm or metrics	You curate it yourself; no serendipity engine
Read: access	Tor Browser + onion services	Unintermediated, censorship-resistant reading	Slower; some sites mistreat Tor traffic
Read: intelligence	Local, open-weights model	Answers from software you run, not a logged service	Smaller and less capable than frontier models
Publish: hosting	Self-host on a cheap VPS	A home no platform can delete	Time, skill, and maintenance are on you
Publish: reach	Nostr + Fediverse + RSS	An audience tie no platform owns	~1–2M monthly active users, not mass reach

Bottom Line — Build, Don’t Petition

The open web is not dying of natural causes; it is being enclosed for the second time, and the most honest response is neither nostalgia nor adaptation but construction. Search was the first enclosure; the AI answer box is the second, and it removes the referral visit that the first one left behind. Readers genuinely prefer answers, which is exactly why the source web has to be defended on grounds of sustainability and sovereignty rather than sentiment. The petitions and palliatives on offer — blocking, charging, optimizing, suing — each leave a gatekeeper in charge. The older instruction is the durable one: build openness into mechanisms no one can revoke — the model as well as the data — and treat that as the floor under the law and collective action, not a replacement for them. It will not out-scale the answer box this year. It is, still, somewhere to stand when the walls finish going up.

Key Takeaways

The harm is a severed return path, not crawling itself: with an AI summary present, outbound clicks roughly halve (about 15% to 8%), and only ~1% of people click the citations inside an AI answer.
It is the second enclosure: the pre-AI web already ran through one search box and one ad exchange, so the goal is not nostalgia for the Google era but openness built into mechanisms no one can revoke.
Users are not wrong to prefer answers — but an answer box that starves its sources runs into a sustainability paradox, with ~17.6% of new sites already entirely AI-generated by mid-2025.
Every mainstream fix leaves a gatekeeper in charge: robots.txt has no legal force alone, pay-per-crawl installs a central tollbooth, optimization accepts the terms, and litigation is slow and jurisdictional.
Sovereignty needs the model, not just the data: decentralizing content while the only intelligence is a frontier model on someone else’s GPUs changes little — a local, open-weights model is what makes reading sovereign rather than relocated.

Frequently Asked Questions

Is AI really closing the open web, or just changing it?

Both, but the precise claim is narrower than “the web is dying.” Web content is more accessible to machines than ever; what is closing is the path back to humans and to the sites that produced it. When an AI answer appears, outbound clicks fall by roughly half, and the AI tools that captured that attention refer back only about 0.1–0.5% of web traffic. The web is being read more than ever — just not by people in a way that returns to the source. “Enclosure” fits better than “death”: the commons is still there, but access to its value is being walled and metered.

What is “pay-per-crawl,” and does it fix the problem?

Pay-per-crawl is a system — Cloudflare launched a marketplace for it in July 2025 — that lets a site charge AI crawlers for access instead of blocking them, alongside direct licensing deals between big publishers and AI companies. It is a real improvement on uncompensated scraping and the first money many publishers have seen for machine reading. But it does not return control of the web to publishers: it installs a central intermediary as the tollbooth between every site and every model, accepts the principle that access is fine if you pay the gatekeeper, and favors large publishers with negotiating leverage. It is enclosure with a revenue-share, not an open web.

Should I block AI crawlers with robots.txt?

You can, and reputable crawlers will usually honor it, but understand what it is: a request, not a fence. A robots.txt directive has no reliable legal force on its own, and aggressive or relabeled crawlers can ignore it. Blocking also trades away any chance of being cited in answers people do read. A more durable posture than block-or-allow is to stop depending on any single channel you do not control — publish where your audience tie cannot be revoked, and treat crawler policy as a tactic, not a strategy.

What is the “sovereign web,” realistically?

It is the set of ways to read and publish that do not depend on a company’s permission: RSS for feeds, Tor and onion services for access, locally run open-weights models for answers, and self-hosting plus open social protocols (Nostr, the Fediverse) for publishing. Realistically, it is niche today: the Fediverse has on the order of one to two million monthly active users, self-hosting demands time and skill, and IPFS trades latency for censorship resistance. It will not replace mainstream search or social this year. Its value is as resilient parallel infrastructure, a place to stand, not as a finished replacement.

Do I have to be technical to escape the answer-box web?

No, not to start. The lowest-effort, highest-return move is reclaiming how you read: install an RSS reader and follow sources directly, so an algorithm and an answer box stop deciding what reaches you. Tor Browser is a one-click download for unintermediated access. The more technical steps — running a local model, self-hosting, offering an onion mirror — are a ladder you can climb over time, not a prerequisite. The principle does not require code: prefer tools and protocols whose openness is a property of their design over services that merely promise to behave.

#	Source	URL	Archive
1	Pew Research — clicks fall when an AI summary appears (2025)	https://www.pewresearch.org/short-reads/2025/07/22/google-users-are-less-likely-to-click-on-links-when-an-ai-summary-appears-in-the-results/	https://web.archive.org/web/2025/https://www.pewresearch.org/short-reads/2025/07/22/google-users-are-less-likely-to-click-on-links-when-an-ai-summary-appears-in-the-results/
2	Reuters Institute — Journalism Trends 2026 (Chartbeat ~one-third referral decline)	https://reutersinstitute.politics.ox.ac.uk/journalism-media-and-technology-trends-and-predictions-2026	https://web.archive.org/web/2026/https://reutersinstitute.politics.ox.ac.uk/journalism-media-and-technology-trends-and-predictions-2026
3	Search Engine Roundtable — Similarweb zero-click 56%→69%	https://www.seroundtable.com/similarweb-google-zero-click-search-growth-39706.html	https://web.archive.org/web/2025/https://www.seroundtable.com/similarweb-google-zero-click-search-growth-39706.html
4	Cloudflare — controlling content use for AI training (July 2025)	https://blog.cloudflare.com/control-content-use-for-ai-training/	https://web.archive.org/web/2025/https://blog.cloudflare.com/control-content-use-for-ai-training/
5	Cloudflare — Pay Per Crawl changelog (2025-07-01)	https://developers.cloudflare.com/changelog/2025-07-01-pay-per-crawl/	https://web.archive.org/web/2025/https://developers.cloudflare.com/changelog/2025-07-01-pay-per-crawl/
6	The Decoder — Reddit–Google AI training deal (~$60M/yr)	https://the-decoder.com/reddit-signs-60-million-annual-training-data-deal-with-google/	https://web.archive.org/web/2024/https://the-decoder.com/reddit-signs-60-million-annual-training-data-deal-with-google/
7	Variety — News Corp–OpenAI licensing deal	https://variety.com/2024/digital/news/news-corp-openai-licensing-deal-1236013734/	https://web.archive.org/web/2024/https://variety.com/2024/digital/news/news-corp-openai-licensing-deal-1236013734/
8	Gizmodo — 17.6% of new sites entirely AI-generated (Stanford/Imperial/Internet Archive)	https://gizmodo.com/dead-internet-theory-is-17-of-the-way-to-becoming-reality-study-finds-2000751718	https://web.archive.org/web/2025/https://gizmodo.com/dead-internet-theory-is-17-of-the-way-to-becoming-reality-study-finds-2000751718
9	Merriam-Webster — Word of the Year 2025 (“slop”)	https://www.merriam-webster.com/wordplay/word-of-the-year	https://web.archive.org/web/20251201000000*/https://www.merriam-webster.com/wordplay/word-of-the-year
10	National Law Review — court orders 20M ChatGPT logs (Jan 2026)	https://natlawreview.com/article/openai-loses-privacy-gambit-20-million-chatgpt-logs-likely-headed-copyright	https://web.archive.org/web/2026/https://natlawreview.com/article/openai-loses-privacy-gambit-20-million-chatgpt-logs-likely-headed-copyright
11	Press Gazette — UK CMA publisher opt-out from AI summaries (Jan 2026)	https://pressgazette.co.uk/platforms/google-regulation-uk/	https://web.archive.org/web/2026/https://pressgazette.co.uk/platforms/google-regulation-uk/
12	GNU Project — Richard Stallman, The Right to Read (1997)	https://www.gnu.org/philosophy/right-to-read.en.html	https://web.archive.org/web/2025/https://www.gnu.org/philosophy/right-to-read.en.html
13	stallman.org — Reasons not to use ChatGPT	https://stallman.org/chatgpt.html	https://web.archive.org/web/2025/https://stallman.org/chatgpt.html
14	Tor Project — official site / onion services	https://www.torproject.org/	https://web.archive.org/web/2025/https://www.torproject.org/
