
I write under a pseudonym, so the attack in this article is the one I think about most. The old assumption behind every alias is simple: if I keep my name off the page, the gap between “Cora Aegis” and the person typing stays expensive to close. For two decades of digital life that assumption mostly held, because closing the gap meant a human reading thousands of posts by hand. Anonymity by omission — just leave the name out — was good enough for most people most of the time.
It is no longer good enough, and the reason is measured, not hypothetical. In a peer-reviewed study presented at ICLR 2024, Beyond Memorization, researchers at ETH Zurich showed that off-the-shelf language models infer attributes like location, income, and sex directly from ordinary Reddit text — reaching up to 85% top-1 accuracy, and up to 95.8% within their top three guesses. A 2026 follow-up preprint went from attributes to identity: an agentic model linked 67% of a set of Hacker News users to their real LinkedIn profiles at 90% precision — nine in ten of its positive matches were correct — for roughly one to four dollars per person. The friction that used to protect you — that linking accounts took a person hours — is the thing AI removed.
So what actually protects a pseudonym now? Not a delete button; the inference survives any single post you take down. You protect it the way you’d defend any system whose front door no longer locks: you stop treating “I didn’t say it” as a defense, and you start breaking the chain that turns scattered, harmless-looking signals into a name. Below is that chain, stage by stage, why on-chain Bitcoin privacy does not cover it, and the compartmentation that does.
| What looks harmless | What it actually leaks | How a model uses it |
|---|---|---|
| A reused username or writing tic | A link between two “separate” identities | Joins your accounts into one profile |
| “Good morning” timestamps, local slang | Your time zone and city | Narrows location without a stated address |
| A hobby, a commute, an employer hint | Income band, schedule, workplace | Cross-references against candidate profiles |
| A photo’s background or metadata | Exact place and time | Confirms a guess the text already suggested |
Anonymity Was Expensive to Break, Then AI Made It Cheap
Deanonymization is the work of linking a pseudonym or anonymous account back to a real identity — through correlation and inference across many small signals, not a single slip. The first thing to understand is that it did not get smarter so much as cheaper. The techniques — correlate accounts, infer unstated facts, match a writing style — are old; what changed is that a machine now does them at a per-person cost of a few dollars instead of a human’s billable hours. That price collapse is the whole story, because most anonymity was never cryptographically strong. It was protected by the fact that nobody could be bothered.
The numbers make the shift concrete. The ETH Zurich team’s Beyond Memorization (ICLR 2024) tested models against real Reddit profiles and found that simply writing naturally leaks enough for a model to guess where you live and what you earn — and that the usual mitigations, text anonymization tools and model “alignment,” did not reliably stop it. The 2026 preprint Large-scale online deanonymization with LLMs (which lists a researcher then at Anthropic among its authors, and is not yet peer-reviewed) pushed further: built as an autonomous agent, the system pulled clues from Hacker News comments, searched for matching people, and verified candidates against LinkedIn — landing 67% of users at 90% precision, with total experiment costs under two thousand dollars.
Read those two results together and the conclusion is uncomfortable but clear: the protection was the price, and the price is gone. A motivated adversary no longer needs to care about you specifically. They can run the attack against everyone in a forum and see who falls out.
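To make the headline figures concrete, here is a small arithmetic sketch. It assumes the 67% figure is read as recall (the share of users correctly linked) and applies it, together with the 90% precision and the one-to-four-dollar cost, to a hypothetical forum of 1,000 accounts. The cohort size and the function name are illustrative assumptions, not numbers from either paper.

```python
def expected_outcomes(cohort_size: int, recall: float, precision: float,
                      cost_low: float, cost_high: float) -> dict:
    """Translate recall/precision rates into head counts for one cohort."""
    true_links = cohort_size * recall          # users correctly named
    claimed_links = true_links / precision     # every name the system asserts
    false_links = claimed_links - true_links   # people wrongly named
    return {
        "true_links": round(true_links),
        "false_links": round(false_links),
        "cost_range_usd": (cohort_size * cost_low, cohort_size * cost_high),
    }


# Hypothetical forum of 1,000 accounts; the rates are the figures quoted above.
result = expected_outcomes(1000, recall=0.67, precision=0.90,
                           cost_low=1.0, cost_high=4.0)
print(result)
```

Under those assumptions the sweep names about 670 people correctly and about 74 wrongly, for a total spend in the low thousands of dollars. The wrongly named group matters too: a false match can expose an uninvolved person to the same harassment.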
The Deanonymization Chain: How a Machine Goes From Posts to a Name
Machine deanonymization runs as a three-stage chain — extract, search, verify — and you do not have to defeat all of it to be safe; you have to break any one link well enough to push your profile below the adversary’s effort budget. Seeing the chain as discrete stages is what turns a vague dread (“AI can find me”) into a defensible map, because each stage has a different weak point.
Stage one, extract and embed. The model reads your public writing and pulls out structured signal: a probable region from idioms and timestamps, an occupation from vocabulary, an income band from the things you mention buying, and — most durably — a linguistic fingerprint, the statistical shape of how you write. None of this requires you to have stated any of it. The ETH Zurich work is the evidence that this stage alone already exposes location, income, and sex from plain text.
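A minimal sketch of what a "linguistic fingerprint" can look like, assuming character trigram counts as the feature. Character n-grams are a long-standing stylometry feature, but this is an illustration only: the cited studies use language models, not this hand-built profile, and the function names here are hypothetical.

```python
import re
from collections import Counter


def trigram_profile(text: str) -> Counter:
    """Count overlapping character trigrams after light normalization."""
    # Lowercase and collapse whitespace so layout differences do not matter.
    normalized = re.sub(r"\s+", " ", text.lower()).strip()
    return Counter(normalized[i:i + 3] for i in range(len(normalized) - 2))


def top_habits(text: str, n: int = 5) -> list:
    """Return the n most frequent trigrams, a crude view of writing habits."""
    return trigram_profile(text).most_common(n)


sample = "Honestly, I reckon the node is fine. Honestly, I reckon so."
profile = trigram_profile(sample)
print(top_habits(sample))
```

Even in two short sentences, the repeated opener ("Honestly, I reckon") dominates the counts. Across thousands of posts, habits like that become a stable signal that never required the author to state anything.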
Stage two, search and rank. Those signals become a query against a pool of candidate identities — other platforms, public profiles, leaked datasets — and the system ranks who you are most likely to be. This is the step that scales: an embedding search over tens of thousands of candidates is cheap, and it degrades gracefully, narrowing rather than failing when the data is thin.
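The ranking step can be sketched with cosine similarity over sparse feature vectors. Everything below is a toy assumption: the attribute labels, the three candidate profiles, and the equal weights are invented for illustration, and a real system would compare learned embeddings rather than hand-labeled attributes.

```python
import math


def cosine(a: dict, b: dict) -> float:
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(query: dict, candidates: dict) -> list:
    """Return (name, score) pairs, best match first."""
    scored = [(name, cosine(query, vec)) for name, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical signals inferred from a pseudonymous account in stage one.
query = {"region:pnw": 1.0, "job:sysadmin": 1.0, "hobby:climbing": 1.0}

# Hypothetical public profiles to compare against.
candidates = {
    "profile_a": {"region:pnw": 1.0, "job:sysadmin": 1.0,
                  "hobby:climbing": 1.0, "hobby:chess": 1.0},
    "profile_b": {"region:pnw": 1.0, "job:teacher": 1.0},
    "profile_c": {"region:uk": 1.0, "job:teacher": 1.0},
}

ranking = rank_candidates(query, candidates)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

The sketch also shows the graceful degradation described above: a candidate that shares only one signal still gets a nonzero score and stays on the shortlist, so thin data narrows the pool instead of ending the search.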
Stage three, verify and link. A reasoning model takes the strongest candidates and cross-checks them until one survives: does this LinkedIn job history fit the hobbies mentioned in the forum comments, and does the timeline line up? In the 2026 preprint this is the agentic step that produced the Hacker News-to-LinkedIn match. It is also where a safety assumption gets tested: refusal training catches the blunt request (“deanonymize this person”) far more reliably than the same goal pursued through a chain of innocuous-looking subtasks.
The practical lesson is that the chain is strongest where you are most consistent. The same handle, the same turns of phrase, the same posting rhythm across contexts are what let stage two find a join. Inconsistency — deliberately introduced — is what breaks it.
Why a Perfect Bitcoin Alias Still Isn’t Anonymous
On-chain privacy and text-inference privacy are two different threat models, and tools that solve one do nothing for the other. CoinJoin, Silent Payments, and Monero protect the transaction graph; they do not touch the forum posts, support requests, and social replies that link your alias to you. This is the gap I see Bitcoin-privacy guidance miss most often: it treats anonymity as an on-chain property when, for a named pseudonym, the cheapest attack is entirely off-chain.
Consider the shape of it. You can break the link between your coins and your identity perfectly — coinjoined UTXOs, a fresh address per payment, no KYC anywhere. None of that matters if you also run a pseudonymous account where you describe your node setup, your time zone, and your opinions in a voice a model can match to your other writing. The chain in the previous section does not read the blockchain at all; it reads you. Chain analysis and text inference can even be run side by side — one clusters your transactions, the other attaches a person to the cluster — but you do not need the on-chain half for the off-chain half to work.
So the correct mental model is additive, not either/or. On-chain privacy is necessary and worth doing; it is simply not sufficient for someone whose threat model includes being named. If you maintain a Bitcoin pseudonym, the text-OPSEC in the next section is the half of the work that the privacy-coin conversation usually leaves out.
| Privacy technique | What it protects | What it does not touch |
|---|---|---|
| CoinJoin / Silent Payments | The on-chain transaction graph | Forum posts, writing style, timestamps |
| Monero / privacy coins | Amounts, sender, receiver on-chain | Off-chain text that names the spender |
| VPN / Tor | Network-layer IP correlation | What you actually write, anywhere |
| Account separation alone | The obvious name link | The inferable link from patterns |
Breaking the Chain: A Compartmentation Playbook for the AI Era
The defense that works is compartmentation aimed at the inference chain, not at any single post — making your contexts share as few linkable features as possible so stage two has nothing to join. Deletion is not on this list, because removing one post rarely removes the pattern that exposed you; prevention at the point of publication is the only control that fully holds.
- Separate identities, all the way down. A pseudonym is only as strong as its least-separated layer: different username, different email, different device or browser profile, different network. Shared infrastructure is the easiest join of all.
- Diversify the linguistic fingerprint. This is the defense most people skip. Vary register between identities — formal in one, casual in another — and avoid the signature phrases, emoji habits, and punctuation tics that a model uses to cluster your writing. Reusing a memorable turn of phrase across two accounts can undo every other precaution.
- Randomize timing. Posting on a fixed daily schedule in your real time zone is a location and routine signal. Spread activity, add jitter, and do not let your “anonymous” account keep office hours in your own city.
- Strip metadata before anything leaves your hands. EXIF location data in photos, author and revision fields in document properties, and a consistent ISP or IP range are confirmations an adversary is glad to use. Strip or mask them at the source.
- Retire pseudonyms on a schedule. An identity accumulates inferable history the longer it lives. For higher-risk personas, periodically retiring and re-establishing a handle resets the baseline an adversary has built.
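The "randomize timing" item can be sketched as a small scheduling helper. This is a minimal illustration under stated assumptions: the function name, the 18-hour default window, and the idea of queueing a finished draft for later publication are all hypothetical choices, not a recommendation from the sources cited here.

```python
import random
from datetime import datetime, timedelta, timezone
from typing import Optional


def jittered_time(intended: datetime, max_delay_hours: float = 18.0,
                  rng: Optional[random.Random] = None) -> datetime:
    """Return a publish time delayed by a random amount up to the window."""
    if max_delay_hours < 0:
        raise ValueError("max_delay_hours must be non-negative")
    # SystemRandom draws from the OS entropy source, so delays are not
    # reproducible from a guessable seed.
    rng = rng or random.SystemRandom()
    delay_seconds = rng.uniform(0, max_delay_hours * 3600)
    return intended + timedelta(seconds=delay_seconds)


# A draft finished at 09:00 UTC is queued for some later, unpredictable time.
draft_ready = datetime(2025, 1, 1, 9, 0, tzinfo=timezone.utc)
publish_at = jittered_time(draft_ready)
print(publish_at.isoformat())
```

A delay window shorter than a full day still leaves some skew toward your waking hours, so jitter smears the signal rather than erasing it. It only helps if the post is actually published by a scheduler and not by hand at the moment you happen to be online.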
None of these is exotic; together they are the difference between being the cheapest profile in a forum to resolve and being one the attack skips. For the tooling layer — a no-logs VPN, a separate mailbox, identity-separation utilities — the EFF’s Surveillance Self-Defense is a level-headed reference, and the principle is the same one this site applies to itself: use the smallest set of tools that actually break a link, and disclose them honestly rather than chase a checklist.
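As one concrete instance of the metadata item in the playbook, the sketch below drops APP1 segments from a JPEG byte stream, which is where EXIF data (including GPS coordinates) is stored. It is a simplified illustration, not a vetted tool: it assumes a well-formed file, the function name is hypothetical, and metadata can also live in other segments that it leaves untouched. For real use, a maintained metadata-removal tool is the safer choice.

```python
import struct

SOI = b"\xff\xd8"   # start-of-image marker
APP1 = 0xE1         # segment type that carries EXIF (and XMP) data
SOS = 0xDA          # start of scan: compressed image data follows


def strip_app1_segments(jpeg: bytes) -> bytes:
    """Return a copy of the JPEG with every APP1 segment removed."""
    if jpeg[:2] != SOI:
        raise ValueError("not a JPEG: missing SOI marker")
    out = bytearray(SOI)
    pos = 2
    while pos + 4 <= len(jpeg):
        if jpeg[pos] != 0xFF:
            raise ValueError(f"unexpected byte at offset {pos}")
        marker = jpeg[pos + 1]
        if marker == SOS:
            # Everything from here on is image data; copy it verbatim.
            out += jpeg[pos:]
            return bytes(out)
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        segment_end = pos + 2 + length
        if marker != APP1:
            out += jpeg[pos:segment_end]
        pos = segment_end
    out += jpeg[pos:]
    return bytes(out)


# Synthetic stand-in for a photo: SOI, one APP0, one APP1 (EXIF), scan, EOI.
sample = (SOI + b"\xff\xe0\x00\x04JF" + b"\xff\xe1\x00\x08Exif\x00\x00"
          + b"\xff\xda\x00\x02" + b"\x12\x34" + b"\xff\xd9")
cleaned = strip_app1_segments(sample)
print(b"Exif" in sample, b"Exif" in cleaned)
```

Note that dropping APP1 also removes the EXIF orientation tag, so some viewers may display the cleaned photo rotated. Visual content such as a recognizable background is untouched by any metadata stripping.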
Before AI, This Took a Human and a Lot of Time
It helps to be precise about what changed, because the headline cases everyone remembers were not AI at all — they were slow, manual, human work. The shift AI introduces is not a new capability so much as the removal of the cost and patience those cases used to require. Framing the older incidents honestly is the point: they show how much friction used to protect you, and therefore how much you lose when it disappears.
The streamer known as Dream was located in 2021 after fans matched a kitchen photo to a real-estate listing on Zillow — human eyes, a public database, no inference model in sight. The harassment campaign against the activist Keffals in 2022 ran on hand-collected OSINT and a forum’s collective effort, not a machine. The 2023 doxxing of students over a campus statement ran on manual archive research and paid targeted advertising. Every one of these took motivated people and real time. That was the tax that kept most pseudonyms safe: an adversary had to want it enough to spend hours.
The deanonymization chain removes the tax. What a forum mob once did to one target over days, an agent can now attempt against an entire community for a few dollars a head — and it does so without ever getting tired or bored. This also lands unevenly. Impersonation, fabricated intimate imagery, and the harassment-to-doxxing pipeline fall disproportionately on women and on anyone with a motivated antagonist, which makes inference resistance a matter of bodily and reputational safety, not only data hygiene. The protections in the previous section matter most for exactly the people the old, expensive version of this attack already targeted.
Bottom Line: How Much Compartmentation Do You Actually Need?
The right level of effort is the one that matches who you are protecting yourself from — there is no single setting, only a threat model.
- If you have no specific adversary: the highest-leverage moves are linguistic and temporal. Don’t reuse a distinctive handle or writing style across accounts you want kept apart, and don’t post as your “anonymous” identity on your own clock. Skip the heavier tooling until you have a reason.
- If you maintain a real pseudonym — a creator, a writer, anyone whose name and alias must not connect: compartment ruthlessly across device, network, and language, and assume the on-chain half of your privacy does nothing for the off-chain half.
- If you carry asymmetric risk — women facing harassment, activists, public-facing professionals: treat linguistic diversification and out-of-band verification as non-optional, and plan for identity retirement before you need it.
Across all three, the same truth holds that held before machines entered the picture: you cannot reliably delete your way to safety after the fact. You can only model the adversary you actually have, break the chain at the link you can afford to defend, and publish less of what a machine would be glad to keep.
Frequently Asked Questions
Can AI really deanonymize me from anonymous posts?
Often, yes. Anonymity by omission — leaving your name off a post — is weak against inference, because a model can derive location, employer, and other attributes from patterns in how and when you write, then match those signals against public profiles. In peer-reviewed testing (Staab et al., ICLR 2024) models inferred personal attributes from plain Reddit text at up to 85% top-1 accuracy. Strong unlinkability comes from compartmentation — separate usernames, devices, networks, and a varied writing style — not from withholding your name.
Does deleting my old posts stop inference?
Mostly no. Removing a single post rarely removes the pattern that exposed you, because the inference draws on consistent signals — your writing style, posting times, and recurring topics — spread across everything you have published. Deletion can reduce raw material at the margin, but the durable fix is preventing the linkable signal at the point of publication, not cleaning up afterward.
Do CoinJoin or a VPN protect me from this?
They protect a different layer. CoinJoin and privacy coins defend the on-chain transaction graph; a VPN or Tor defends network-level IP correlation. None of them touches the forum posts, support messages, and replies that a model reads to link a pseudonym to a person. They are worth using and simply not sufficient on their own — the text-OPSEC in this article is the complementary half.
What raises the cost of deanonymization the most?
Linguistic and contextual compartmentation. The deanonymization chain is strongest where you are most consistent, so the highest-leverage habit is to keep identities that must not connect from sharing a writing style, a posting schedule, or infrastructure. It is unglamorous and it is what actually raises an adversary’s cost above the few dollars the automated attack now requires.
| # | Source | URL | Archived |
|---|---|---|---|
| 1 | Staab et al. — “Beyond Memorization: Violating Privacy via Inference with Large Language Models” (ICLR 2024) | https://arxiv.org/abs/2310.07298 | https://web.archive.org/web/*/https://arxiv.org/abs/2310.07298 |
| 2 | Lermen et al. — “Large-scale online deanonymization with LLMs” (arXiv preprint, 2026) | https://arxiv.org/abs/2602.16800 | https://web.archive.org/web/*/https://arxiv.org/abs/2602.16800 |
| 3 | Simon Lermen — “Large-Scale Online Deanonymization” (author explainer, 2026) | https://simonlermen.substack.com/p/large-scale-online-deanonymization | https://web.archive.org/web/*/https://simonlermen.substack.com/p/large-scale-online-deanonymization |
| 4 | Electronic Frontier Foundation — Surveillance Self-Defense (threat-modeling and compartmentalization guides) | https://ssd.eff.org/ | https://web.archive.org/web/*/https://ssd.eff.org/ |
Two threads from elsewhere on this site connect here directly. The four assumptions AI breaks — with inference as one of them — are mapped in OPSEC in the AI Age: Rebuilding Your Threat Model, of which this article is the inference deep-dive. And because inference feeds on everything you have ever published, the audit of what actually survives deletion lives in How Permanent Is Your Social Media Footprint?. When the data being correlated was taken from an institution rather than posted by you, the related playbook is When the Government Leaks Your Data; for inference applied inside the workplace, see What Your Employer’s Slack Monitoring Actually Sees.