
Audit Your Own Post History the Way an AI Would (2026)

Cora Aegis
Privacy is the right; the tools are how we exercise it.
AI-Age OPSEC - This article is part of a series.
[Image: A woman with short silver hair and calm red eyes, lit from below by a wall of her own scattered posts (comment fragments, timestamps, a map pin and a small camera icon) converging into one outlined silhouette]

A note on funding: CypherpunkGuide carries no surveillance advertising — no ad networks, tracking pixels, or sponsored content. It is funded by transparent streams: reader donations now; subscription and editorially-aligned affiliate later. We answer to our readers, not to advertisers. The audit tool referenced below is free and open-source.

I write under a pseudonym, and the companion to this piece — AI Deanonymization: How Inference Undoes Your Anonymity — lays out how a model turns scattered posts into a name, and how to compartment going forward. This article is about the half that prevention can’t reach: the years of posts you have already published. That archive is sitting in public right now, and it is the exact corpus the attack reads. The honest question is not “what will I post carefully from now on” but “what does everything I’ve already said add up to” — and the only way to know is to look at it the way the machine does.

The good news is that you can. Your own export is something only you can pull, and reading it adversarially is a skill, not a secret. The bad news is the most natural way to do that reading — paste it into an AI and ask “what does this reveal about me?” — is also the single move most likely to make things worse. We’ll get to why. First, the thing you can’t feel from inside your own timeline.

The Mosaic Is the Part You Can’t Feel

The danger is not one careless post; it is the aggregate. Re-identification works by stacking many individually innocuous signals (a commute, a slang word, a timestamp) until they intersect at one person. This is the “mosaic effect,” and you cannot sense it from inside your own feed, because each tile looks harmless on its own.

The mosaic is old. In 2000, Latanya Sweeney showed that roughly 87% of Americans could be uniquely identified by just three public facts: ZIP code, gender, and date of birth. (That figure came from 1990 census data; a 2006 reanalysis put it nearer 63%, and the pattern holds either way.) In 2006, New York Times reporters named an “anonymous” AOL searcher from her query logs alone; in 2008, researchers re-identified Netflix users by cross-referencing the service’s “anonymized” ratings against public IMDb reviews. None of those used AI. They used aggregation.
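To see why a few harmless fields add up, it can help to count uniqueness directly. The sketch below uses invented toy records and a hypothetical helper, `unique_fraction`; it is not Sweeney's method or data, only the same idea in miniature: a record is "uniquely identified" when no other record shares its combination of fields.

```python
from collections import Counter

def unique_fraction(records, fields):
    """Share of records whose combination of `fields` appears exactly once."""
    keys = [tuple(r[f] for f in fields) for r in records]
    counts = Counter(keys)
    unique = sum(1 for k in keys if counts[k] == 1)
    return unique / len(keys) if keys else 0.0

# Invented toy data, not drawn from any real dataset.
people = [
    {"zip": "02139", "gender": "F", "dob": "1985-03-14"},
    {"zip": "02139", "gender": "F", "dob": "1990-07-02"},
    {"zip": "02139", "gender": "M", "dob": "1985-03-14"},
    {"zip": "02139", "gender": "F", "dob": "1985-03-14"},
    {"zip": "94110", "gender": "M", "dob": "1990-07-02"},
]

by_zip = unique_fraction(people, ["zip"])                    # one field alone
by_all = unique_fraction(people, ["zip", "gender", "dob"])   # three fields stacked
print(f"unique by ZIP only: {by_zip:.0%}; by all three: {by_all:.0%}")
```

In this toy table only one person in five is unique by ZIP code, while three in five are unique once gender and date of birth are stacked on top. No field changed; only the intersection did.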

What AI changed is the price. In a peer-reviewed study at ICLR 2024, Beyond Memorization, ETH Zurich researchers showed that off-the-shelf models infer attributes — location, occupation, sex, income — from ordinary Reddit text at roughly 85% top-1 accuracy averaged across eight attributes (with wide variation between them), at roughly 100× lower cost and 240× faster than human investigators. Newer work industrializes it: AutoProfiler (Du et al., ACL 2026) runs a four-agent pipeline that pulls a pseudonymous post history (via platform APIs) and assembles a profile automatically, “at web scale.” The point is not that any single post doxxes you. It is that a machine can now afford to read all of them, together, and notice the intersection you never could.

On X, the Leak Usually Isn’t the Words

On Reddit the mosaic is mostly text. On X it is mostly metadata — and a text-only mental model is dangerous reassurance. Your self-set location field, your posting times, your image EXIF, your outbound links, and who you reply to often say more than anything you actually wrote. A pseudonymous account can be careful about its sentences and still leak through the scaffolding around them. Posting-time concentration is the clearest example: if your “anonymous” account keeps office hours, the histogram of when you post quietly hands over your time zone and your waking life.

Images are worse than people think, in two layers. Most platforms strip EXIF GPS from public uploads — but not from every path (direct messages, some API and scheduling tools, and chat “file” modes can retain it), so older media is worth checking. And even when the GPS tag is gone, the picture itself geolocates: a 2024 study, Image-Based Geolocation Using Large Vision-Language Models, found that vision-language models place photos from visual content alone — winning 85.37% of GeoGuessr-style matchups over 50,000 images, sometimes to within 0.3 km. Stripping metadata is necessary; it is not the whole job.
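Checking older media for leftover metadata can be done offline. What follows is a minimal sketch, assuming ordinary JPEG files with a standard Exif segment; it reads only the tag ids in the first Exif directory and does not decode coordinates. The function names are my own, and a maintained tool such as exiftool or Pillow is the better choice for a full parse.

```python
import struct

GPS_IFD_TAG = 0x8825   # Exif tag that points to the GPS sub-directory
MODEL_TAG = 0x0110     # Exif tag holding the camera or phone model

def ifd0_tags(jpeg: bytes) -> set:
    """Return the tag ids in the first Exif directory of a JPEG, or an empty set."""
    if jpeg[:2] != b"\xff\xd8":          # missing JPEG start-of-image marker
        return set()
    pos = 2
    while pos + 4 <= len(jpeg) and jpeg[pos] == 0xFF:
        marker = jpeg[pos + 1]
        if marker == 0xDA:               # start of scan: image data follows
            break
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        payload = jpeg[pos + 4:pos + 2 + length]
        if marker == 0xE1 and payload.startswith(b"Exif\x00\x00"):
            return _read_ifd0(payload[6:])
        pos += 2 + length                # skip to the next segment
    return set()

def _read_ifd0(tiff: bytes) -> set:
    """Read only the tag ids of the first directory in a TIFF-structured payload."""
    if len(tiff) < 8 or tiff[:2] not in (b"II", b"MM"):
        return set()
    order = "<" if tiff[:2] == b"II" else ">"   # little- or big-endian
    (offset,) = struct.unpack(order + "I", tiff[4:8])
    if offset + 2 > len(tiff):
        return set()
    (count,) = struct.unpack(order + "H", tiff[offset:offset + 2])
    tags = set()
    for i in range(count):
        start = offset + 2 + 12 * i      # each directory entry is 12 bytes
        if start + 2 > len(tiff):
            break
        tags.add(struct.unpack(order + "H", tiff[start:start + 2])[0])
    return tags

def leaks(jpeg: bytes) -> dict:
    """Summarize which metadata markers are present in the image bytes."""
    tags = ifd0_tags(jpeg)
    return {
        "has_exif": bool(tags),
        "gps": GPS_IFD_TAG in tags,
        "device_model": MODEL_TAG in tags,
    }
```

On a real export you would call something like `leaks(Path("tweets_media/example.jpg").read_bytes())`, where the filename is a placeholder. A clean result here says nothing about what the picture itself shows, which is the second layer described above.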

| Metadata layer (mostly X) | What it quietly reveals | Where to look in your export |
| --- | --- | --- |
| Self-set “location” field | A real region, in your own words | profile.js / your bio |
| Posting timestamps | Time zone and daily routine | tweets.js created_at |
| Image EXIF + photo content | Exact place; device; even EXIF-free geolocation | tweets_media/ images |
| Outbound links | Your other sites and identities | URL entities in posts |
| Replies and mentions | The social graph that already knows you | mention entities |
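The posting-time signal is easy to measure on your own machine. This is a rough sketch under stated assumptions: that tweets.js is a JavaScript assignment wrapping a JSON array, that each item nests its fields under a `tweet` key, and that `created_at` looks like `Mon Mar 02 13:05:00 +0000 2026`. Check your own archive before trusting any of that; the function names are hypothetical.

```python
import json
from collections import Counter
from datetime import datetime, timezone

def load_archive_js(text):
    """Parse an archive .js file by skipping everything before the JSON array."""
    return json.loads(text[text.index("["):])

def hour_histogram(entries):
    """Count posts per UTC hour from each entry's created_at field."""
    hours = Counter()
    for entry in entries:
        tweet = entry.get("tweet", entry)   # assumed wrapper key
        stamp = datetime.strptime(tweet["created_at"], "%a %b %d %H:%M:%S %z %Y")
        hours[stamp.astimezone(timezone.utc).hour] += 1
    return hours

def busiest_window(hours, width=8):
    """Return (start_hour, share) for the densest span, wrapping past midnight."""
    total = sum(hours.values())
    if total == 0:
        return (0, 0.0)

    def load(start):
        return sum(hours[(start + i) % 24] for i in range(width))

    best = max(range(24), key=load)          # earliest start wins ties
    return (best, load(best) / total)

# Invented sample in the assumed layout.
sample = '''window.YTD.tweets.part0 = [
  {"tweet": {"created_at": "Mon Mar 02 13:05:00 +0000 2026", "full_text": "a"}},
  {"tweet": {"created_at": "Tue Mar 03 14:40:00 +0000 2026", "full_text": "b"}},
  {"tweet": {"created_at": "Wed Mar 04 15:10:00 +0000 2026", "full_text": "c"}},
  {"tweet": {"created_at": "Sat Mar 07 02:30:00 +0000 2026", "full_text": "d"}}
]'''

start, share = busiest_window(hour_histogram(load_archive_js(sample)))
print(f"{share:.0%} of posts fall in the 8 hours starting {start:02d}:00 UTC")
```

A high share inside one eight-hour block is the "office hours" pattern: the block's position suggests a time zone, and its width suggests a routine.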

Read Your Own History Like an Adversary

The audit is a deliberate inversion: stop reading your timeline as a person reminiscing and start reading it as a stranger hunting. Pull your full export, then go category by category asking not “is this embarrassing” but “does this narrow who I am.” You can request your data from Reddit (Settings → Privacy & Security → “Request a copy of your data”) and from X (Settings → Your account → “Download an archive of your data”). Both arrive as structured files you can read offline. Then work the categories below, and weigh weak signals, not just obvious ones, because the mosaic is built from the weak ones.

A useful discipline: judge each finding by risk contribution, not by how revealing it feels in isolation. Twenty-eight posts that each mention a neighborhood landmark are a bigger problem than one post that names your employer once, because the twenty-eight intersect. Look for clusters and consistency — the same handle, the same turns of phrase, the same 7 a.m. posting slot — because consistency is exactly what a later search-and-match stage uses to find a join.

| Category | What to search your own history for | How to soften it |
| --- | --- | --- |
| Location | Commutes, local events, “near the…”, neighborhood landmarks, geotagged photos | Generalize to region; strip/skip image EXIF; coarsen the bio field |
| Employer / income | Role + team size + tech stack, “we’re hiring,” salary or holdings hints | Drop the distinctive combination; avoid recruiting-from-your-account posts |
| Family | Kids’ ages and schools, partners, routines | Remove specifics; remember relatives didn’t consent to be findable |
| Schedule | Fixed daily times, “every weekday,” posting-time concentration | Vary timing; never run an alias on your real clock |
| Identity links | Reused handle, links to a personal site, device model in EXIF | Don’t reuse handles; remove outbound personal links; strip device tags |
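The category pass can start as a plain keyword count run locally. The patterns below are hypothetical starters I made up for illustration, not a complete or validated list, and the threshold of three is arbitrary. The point of the sketch is the last step: it reports clusters, because repetition is what intersects.

```python
import re
from collections import Counter

# Hypothetical starter patterns; tune them to your own vocabulary and city.
PATTERNS = {
    "location": re.compile(r"\bnear the\b|\bmy (?:street|neighborhood|block)\b", re.I),
    "employer": re.compile(r"\bwe['’]re hiring\b|\bmy (?:team|manager|employer)\b", re.I),
    "family": re.compile(r"\bmy (?:kid|son|daughter|wife|husband|partner)s?\b", re.I),
    "schedule": re.compile(r"\bevery (?:weekday|morning|monday)\b", re.I),
}

def category_hits(posts):
    """Count how many posts match each category at least once."""
    hits = Counter()
    for text in posts:
        for name, pattern in PATTERNS.items():
            if pattern.search(text):
                hits[name] += 1
    return hits

def clusters(hits, threshold=3):
    """Categories that recur at least `threshold` times, highest count first."""
    return [(name, n) for name, n in hits.most_common() if n >= threshold]

# Invented example posts.
posts = [
    "Coffee at the place near the old clock tower again",
    "Road closed near the stadium, took the long way",
    "Farmers market near the river is back",
    "We're hiring a second backend engineer",
    "Finally finished that book",
]

hits = category_hits(posts)
print(clusters(hits))
```

Here the single hiring post stays below the threshold while the three "near the" posts surface as a cluster, which mirrors the risk-contribution rule above. A keyword scan will miss anything phrased unusually, so treat an empty result as "nothing obvious" and not as "nothing there".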

The Privacy Audit That Deanonymizes You

Here is the trap, and almost no one names it. The obvious way to audit your history is to paste it into a capable AI and ask what it reveals. If the account you’re checking is a pseudonym you keep apart from your legal name — and the AI you ask is logged into your real identity — you have just handed one company both halves of the link you were protecting. The audit becomes the breach. Think it through. A cloud provider now holds, under your real-name account, the full post history of your “anonymous” persona, with a prompt that explicitly asks how the two connect. That association can surface later through a subpoena, a breach, or an insider — the precise failure you were auditing to prevent, except you created it yourself.

This does not mean cloud AI is forbidden. The risk is conditional. If you are auditing your real-name, public account, there is no anonymous identity to expose, so the deanonymization risk does not apply — though sending a full export to any cloud service still means a third party processes its contents under their terms, so check what yours holds first. The acute danger is specifically the pairing of an anonymous account with a real-name AI account. For that case, keep the analysis where no one else can see it.

| If you are auditing… | Cloud AI (real-name account) | Local model (offline) |
| --- | --- | --- |
| Your real-name / public account | No deanonymization risk; still review export contents first | Fine, just slower |
| A strict pseudonym you keep separate | Avoid: creates the real↔alias link | Recommended: nothing leaves your machine |

The clean version of this audit runs locally: an open-source, local-first tool that parses your export and reports, by category, what it leaks — without ever sending your posts anywhere, and without writing a profile of you to disk. (I built one for exactly this; the link will live here on release.) If you must use a cloud model on a sensitive account, prefer a service built for crypto payment and minimal-identity sign-up — within its own terms — over a mainstream account tied to your real name and card. As of June 2026, for instance, OpenRouter offers an OpenAI-compatible API that accepts USDC and needs only an email or a wallet, and Venice is privacy-first with a no-account, pay-per-request crypto path and an OpenAI-compatible API — both plug straight into this tool’s cloud option. None of this is true anonymity: a wallet, an email, or network metadata can still remain, your prompts still reach a third party (with a router like OpenRouter, the model provider behind it too), and these privacy claims are largely vendor-stated rather than independently audited — so check each provider’s current terms, and remember that running locally is the only path that sends nothing at all.

What to Do With What You Find

Resist the urge to mass-delete. Removing one post rarely removes the pattern that exposed you, and deletion is not erasure: archives, search caches, screenshots, and other people’s copies persist long after you hit the button. The higher-leverage move is to generalize and edit the highest-contribution items — turn “the 8:07 ferry from my neighborhood” into “my commute” — and then to change what you publish going forward. For the full picture of what actually survives a deletion, see How Permanent Is Your Social Media Footprint?; for the prevention side — compartmenting identities so the mosaic has nothing to join — the playbook is in AI Deanonymization, and the broader rebuild of assumptions is mapped in OPSEC in the AI Age.
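One way to find candidates for generalizing is to flag exact clock times, the "8:07" kind of detail. This is a small sketch with hypothetical helper names; it catches only `H:MM` style times, so landmarks, route names, and spelled-out times still need a human read.

```python
import re

# Matches clock times such as 8:07 or 18:30; deliberately narrow.
CLOCK_TIME = re.compile(r"\b(?:[01]?\d|2[0-3]):[0-5]\d\b")

def precise_times(text):
    """Return the exact clock times mentioned in a post."""
    return CLOCK_TIME.findall(text)

def still_specific(before, after):
    """True if the edited text keeps any clock time found in the original."""
    kept = set(precise_times(after))
    return any(t in kept for t in precise_times(before))

original = "the 8:07 ferry from my neighborhood"
edited = "my commute"

print(precise_times(original))           # times worth generalizing
print(still_specific(original, edited))  # did the edit remove them?
```

Running the same check on the text before and after an edit is a cheap way to confirm the specific detail is really gone and was not just moved to another sentence.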

It is worth being honest about the limits. An audit of your own export is a closed-set exercise: it sees what you provided, not the open world an adversary draws on (data brokers, breaches, the reply graph, your writing style across services). A 2025 study of 240 people (Wang et al.) found users judged which of their own snippets were risky only slightly better than chance, and their rewrites successfully reduced inference in just 28% of cases. So treat the audit as risk reduction, not a clean bill of health, and re-check after you edit, because a lower result on the re-run is the only proof an edit worked.

Who This Matters For Most

Inference resistance is data hygiene for most people and physical safety for some. The retroactive audit matters most for those an adversary is already motivated to find. Harassment-driven doxxing, impersonation, and fabricated imagery fall disproportionately on women, and the same retroactive exposure threatens abuse survivors, LGBTQ people in hostile environments, dissidents, and journalists’ sources — anyone for whom an old, forgotten post is a present-tense risk. The case studies in How Streamers Get Doxxed show the pattern in the open; if that is your threat model, the audit is not optional housekeeping but maintenance you schedule.

Frequently Asked Questions

How do I get my Reddit and X post history to audit?

Request an export from each platform. On Reddit: Settings → Privacy & Security → “Request a copy of your data,” which returns CSV files of your comments and posts. On X: Settings → Your account → “Download an archive of your data,” which returns a folder of tweets.js, account.js, profile.js, and a tweets_media image folder. Both let you read your full history offline, which is the safe way to analyze it — you never hand it to a third party just to look at it.
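Reading the export offline can be as simple as a local search over the CSV. The sketch below assumes the comments file has a header row with `id` and `body` columns; those column names and the sample rows are my assumptions for illustration, so check the header of your own file first.

```python
import csv
import io

def search_export(csv_text, term, column="body"):
    """Return rows whose text column contains `term`, case-insensitively."""
    reader = csv.DictReader(io.StringIO(csv_text))
    needle = term.lower()
    return [row for row in reader if needle in (row.get(column) or "").lower()]

# Invented sample in the assumed layout.
sample_csv = """id,date,subreddit,body
c1,2024-05-01 07:02:11 UTC,examplecity,"The ferry was late again, classic"
c2,2024-05-03 19:40:00 UTC,books,Just finished it
c3,2024-06-10 07:05:42 UTC,examplecity,"Ferry strike, so I biked in"
"""

matches = search_export(sample_csv, "ferry")
print([row["id"] for row in matches])
```

For a real file you would pass in something like `open("comments.csv", newline="", encoding="utf-8").read()`, with the filename adjusted to whatever your export contains. Nothing in this leaves your machine.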

Is it safe to ask ChatGPT or another cloud AI to check my posts?

It depends entirely on the account. If you are auditing your real-name or public profile, there is no anonymous identity to expose, so the deanonymization risk does not apply, though the provider still processes your export under its own terms. If you are auditing a pseudonym you keep separate from your legal name, sending its history to an AI logged in under your real identity links the two on that provider’s servers, which is the exact deanonymization you were trying to prevent. For that case, use a local, offline model, or a cloud account opened and paid for with as little identity attached as possible.

Should I just delete my old posts?

Usually not wholesale. Deleting one post rarely removes the pattern that exposed you, and deletion is not erasure — archives, caches, and screenshots persist, and platforms keep deleted content on their own servers for a window (Reddit, for instance, around 90 days) that legal process can still reach. The higher-leverage move is to generalize or edit the highest-risk items (a specific time and place becomes a vague one) and to change what you publish going forward. Re-audit afterward to confirm the change actually lowered your exposure.

Can’t I just strip EXIF from my photos and be done?

Strip EXIF — it’s necessary — but it is not sufficient. Vision-language models can geolocate a photo from its visual content alone, with no metadata at all (Liu et al., 2024, found accuracy to within 0.3 km in some cases). A storefront, a skyline, a transit sign, or a window view can place an image even after every tag is removed. Treat backgrounds, not just metadata, as part of what a picture discloses.

How accurate is AI at this, really?

Accurate enough to take seriously, and cheap enough to be run against everyone. Peer-reviewed work (Staab et al., ICLR 2024) put GPT-4 at roughly 85% top-1 accuracy averaged across eight attribute categories (with wide variation between them) from plain Reddit text; a 2026 preprint (not yet peer-reviewed) linked roughly two-thirds of a sample of Hacker News users to their real LinkedIn profiles at 90% precision for about one to four dollars each. The numbers vary by task and are not perfect — but the friction that used to protect you, a human spending hours, is gone.

| # | Source | URL | Archived |
| --- | --- | --- | --- |
| 1 | Staab et al., “Beyond Memorization: Violating Privacy via Inference with LLMs” (ICLR 2024) | https://arxiv.org/abs/2310.07298 | https://web.archive.org/web/*/https://arxiv.org/abs/2310.07298 |
| 2 | Du et al., “Automated Profile Inference with Language Model Agents” / AutoProfiler (ACL 2026 Findings) | https://arxiv.org/abs/2505.12402 | https://web.archive.org/web/*/https://arxiv.org/abs/2505.12402 |
| 3 | Lermen et al., “Large-scale online deanonymization with LLMs” (arXiv preprint, 2026) | https://arxiv.org/abs/2602.16800 | https://web.archive.org/web/*/https://arxiv.org/abs/2602.16800 |
| 4 | Liu et al., “Image-Based Geolocation Using Large Vision-Language Models” (2024) | https://arxiv.org/abs/2408.09474 | https://web.archive.org/web/*/https://arxiv.org/abs/2408.09474 |
| 5 | Wang et al., “Beyond PII: How Users Attempt to Estimate and Mitigate Implicit LLM Inference” (2025) | https://arxiv.org/abs/2509.12152 | https://web.archive.org/web/*/https://arxiv.org/abs/2509.12152 |
| 6 | Electronic Frontier Foundation, Surveillance Self-Defense | https://ssd.eff.org/ | https://web.archive.org/web/*/https://ssd.eff.org/ |