
A note on funding: CypherpunkGuide carries no surveillance advertising — no ad networks, tracking pixels, or sponsored content. It is funded by transparent streams: reader donations now; subscription and editorially-aligned affiliate later. We answer to our readers, not to advertisers.
I publish under a pseudonym, and I am a woman, so this is the threat I weigh before I record anything. The old assumption behind a familiar voice or face was that it authenticated itself: if your mother heard your voice on the phone, it was you, because forging it required your participation. That assumption is gone. The same biometric features you treat as proof of “you” — the timbre of your voice, the geometry of your face, even the rhythm of your writing — are now raw material a model can use to impersonate you, from samples you published yourself.
This is the fourth broken assumption from the AI-age threat model, and it deserves its own treatment because the defense is unusual: it is almost entirely preventive. You cannot recall a voice sample, and as we will see, you cannot reliably make a model forget one. So the work is front-loaded: what you release, and what you agree on in advance with the people who would be targeted through you. This article covers the dual nature of the problem, why it falls unevenly on women and on anyone who publishes under a name, the minimisation that lowers your exposure, and the full verification protocol that the previous article only promised.
Your Biometrics Became Logins and Targets at the Same Time
A credential is something that proves identity; an attack surface is something an adversary can exploit. Voice, face, and writing style are now both at once — the same features that vouch for you also let a model forge you. The collapse is recent and measured. Microsoft researchers showed in 2023 that their VALL-E model could synthesise a speaker’s voice from only a three-second sample; a handful of photos is enough for a convincing synthetic likeness; a corpus of your posts is enough to mimic how you write. None of this requires your cooperation beyond having published in the first place.
What makes this a credential problem and not just a forgery problem is that institutions started trusting biometrics precisely as they became cheap to fake. Banks deployed voiceprint phone authentication; families rely on a recognised voice; assistants unlock to a face. The U.S. Federal Trade Commission flagged the consequence directly, launching a Voice Cloning Challenge in November 2023 and publishing Approaches to Address AI-enabled Voice Cloning in April 2024. The thing that authenticates you is now the thing that compromises you.
| Your biometric | Trusted today as a credential by | Now also an attack surface because |
|---|---|---|
| Voice | Bank phone-ID, family trust, voice assistants | A ~3-second clip yields a convincing clone |
| Face | Photo-ID checks, social proof, device unlock | A handful of images yields a synthetic likeness |
| Writing style | “It sounds like them” | A corpus of posts enables style transfer |
The defensive consequence is that you should stop thinking of these as self-authenticating. A voice on the phone is no longer proof; a face in a video is no longer proof. Everything downstream in this article follows from accepting that.
Why This Lands Hardest on Women and Pseudonyms
This risk is not evenly distributed. Impersonation, fabricated intimate imagery, and voice-based fraud fall disproportionately on women and on anyone with a motivated harasser — which makes it a question of bodily and reputational sovereignty, not merely data hygiene. The evidence is consistent across sources. A 2019 Deeptrace study found 96% of deepfake videos were pornographic and that effectively all targeted individuals were women; a 2023 industry survey by the deepfake-tracking firm Security Hero put the pornographic share at 98%, with 99% of targets being women. These are tracking studies, not government data — but their direction is corroborated by harder reporting.
In December 2024, the American Sunlight Project found that roughly one in six women in the U.S. Congress — about 16% — had been depicted in non-consensual deepfake imagery, and that women were targeted some 70 times more often than men (first reported by The 19th). UN Women, reviewing the broader pattern, notes that more than half of deepfake victims in the United States contemplated suicide, and that digital violence routinely spills into offline harassment. The harm is not abstract reputational risk; it is targeted, gendered, and designed to silence.
For a pseudonymous creator the bind tightens into a contradiction. A named persona is built on voice and presence — a podcast, a talk, a face that makes the work feel human — yet every clean recording and every face-forward photo is also training data for someone who wants to impersonate that persona or attach it to my legal self. Minimisation, the first defense below, trades directly against reach. I will not pretend that tension away; I will show how to manage it instead of being managed by it.
Prevention First: Minimise the Samples You Publish
The first lever is minimisation: reduce the volume and clarity of the raw biometric samples you put into public, accepting that this is mitigation, not a cure. This is the same logic that governs AI-scale deanonymization — the cheapest attack reads what you already published, so the highest-leverage control is upstream of any takedown. A clone’s quality is bounded by its training material. Long, clean, solo recordings are the ideal sample; noisy, short, co-present audio is a poor one. You get to choose which you supply.
Concretely, that means separating the named persona’s media from high-fidelity biometric capture wherever you can, and stripping the metadata that pins a sample to a time and place. For a public creator the goal is not silence — it is deliberate degradation of sample quality relative to reach: co-hosted audio instead of solo monologue, an illustrated avatar carrying the named identity instead of a face tied to a legal name, and a hard refusal to let your voice double as an authentication factor.
| What you publish | The risk it creates | Lower-exposure alternative |
|---|---|---|
| Long, clean, solo voice recordings | A high-fidelity training sample | Shorter clips; co-hosted audio; ambient noise/music under voice |
| Face-forward photos tied to your legal name | A likeness and an identity link | An illustrated avatar for the named persona; keep any real face off the legal name |
| Voiceprint as a bank/login factor | A clone becomes a working credential | Disable voice authentication; use a non-biometric second factor |
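
The metadata-stripping step can be illustrated for one common case. The sketch below is a minimal example, assuming the file is an ordinary JPEG: it drops the APP1 (Exif/XMP), APP13 (IPTC) and COM segments, where capture time, GPS position and device details usually live, and leaves the image data untouched. The function name `strip_jpeg_metadata` and the choice of segments are illustrative assumptions, not a tool this article prescribes; other formats (PNG, HEIC, audio and video containers) store metadata differently and need their own handling.

```python
# Segment markers that commonly carry identifying metadata in a JPEG.
# 0xE1 = APP1 (Exif, XMP), 0xED = APP13 (IPTC), 0xFE = COM (free-text comment)
METADATA_MARKERS = {0xE1, 0xED, 0xFE}


def strip_jpeg_metadata(data: bytes) -> bytes:
    """Return a copy of a JPEG with Exif/XMP/IPTC/comment segments removed.

    Illustrative sketch only: it assumes a well-formed file and does not
    handle padding bytes or unusual marker layouts.
    """
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")

    out = bytearray(data[:2])  # keep the start-of-image marker
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a segment marker at offset {pos}")
        marker = data[pos + 1]

        if marker == 0xDA:
            # Start of scan: everything from here on is image data, copy it as-is.
            out += data[pos:]
            return bytes(out)

        # The 2-byte big-endian length counts itself but not the marker bytes.
        length = int.from_bytes(data[pos + 2:pos + 4], "big")
        if length < 2 or pos + 2 + length > len(data):
            raise ValueError(f"corrupt segment length at offset {pos}")

        if marker not in METADATA_MARKERS:
            out += data[pos:pos + 2 + length]  # keep non-metadata segments
        pos += 2 + length

    raise ValueError("no image data (SOS marker) found")
```

Keeping the remaining segments (for example APP0 and APP2) is deliberate, since those usually carry format and colour information, not provenance. Check the result with a metadata viewer before publishing.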
None of this is a cure, and saying otherwise would be dishonest. Samples already public stay public, and a determined adversary can work with poor material. Minimisation lowers the probability and the fidelity of a successful clone; it does not zero them. That is exactly why it is paired with the second lever, which assumes a clone will eventually exist.
The Verification Protocol, In Full
The second lever is pre-registered trust: agree, in advance and out of band, on a verification step with the people who could be targeted through you, so that a cloned voice cannot manufacture urgency. Most advice stops at “pick a family safe word.” That is the right instinct and an incomplete protocol. A safe word works not because it is secret but because it forces a second check, on a channel the attacker does not control, at the moment urgency is weaponised. Build the whole mechanism around that principle, not around a single shared phrase.
The design rule is simple: the verification must never travel on the same channel as the request. A cloned voice controls the inbound call; it does not control a callback to a number you already hold, or a private memory it was never trained on. Episodic memory — a specific shared moment, not a fact that could be posted anywhere — is the part of you a model cannot synthesise.
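
The design rule above can be written down as a small decision function. This is a sketch under stated assumptions: the `InboundRequest` type, the `choose_verification_route` name and the address-book layout are all hypothetical, invented here for illustration. The point it shows is that the route is chosen only from contact details you already hold, and that any number or link supplied inside the request is never used.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class InboundRequest:
    channel: str                    # how the request arrived, e.g. "call"
    claimed_sender: str             # who the caller or writer says they are
    reply_to: Optional[str] = None  # contact detail offered inside the request


def choose_verification_route(
    request: InboundRequest,
    address_book: Dict[str, Dict[str, str]],
) -> Optional[Tuple[str, str]]:
    """Pick (channel, address) for verifying a request, or None if impossible.

    `request.reply_to` is deliberately never read: whoever controls the
    inbound channel also controls whatever contact details it offers.
    """
    stored = address_book.get(request.claimed_sender, {})
    if not stored:
        return None  # no pre-registered contact, so nothing to verify against

    # Prefer a channel different from the one the request arrived on.
    for channel, address in stored.items():
        if channel != request.channel:
            return channel, address

    # Otherwise fall back to callback discipline: same kind of channel,
    # but started by you, to the address you already had stored.
    return next(iter(stored.items()))
```

For example, with an address book of `{"mum": {"call": "+15550100", "text": "+15550100"}}`, a call claiming to be from "mum" that asks you to ring a new number is verified by a text to the stored number, and the new number is ignored.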
| Protocol element | How to set it up | Why a clone can’t beat it |
|---|---|---|
| Out-of-band rule | Verify on a different channel than the request arrived on (a call → a text to a known number) | The clone controls one channel, not a second independent one |
| Lived-memory challenge | A question answered only from a shared experience, never posted; rotate it | Models synthesise voice, not private episodic memory |
| Callback discipline | Hang up; call back the number you already have stored | Defeats spoofed caller ID and time pressure |
| Duress signal | A pre-agreed word meaning “I am being coerced — comply and get help” | Covers the case where the person is real but compelled |
| Pseudonym extension | For pseudonymous contacts, pre-share a one-time token out of band, not tied to legal identity | Lets a pseudonym verify without de-pseudonymising |
That last row is the piece written for people like me, and the one no family-safe-word guide covers. If your trusted contacts know you only as a pseudonym, you cannot fall back on shared family history without breaking the wall between persona and person. A one-time verification token — exchanged once over an encrypted channel, used to bootstrap a rotating challenge — lets a network of pseudonymous collaborators authenticate each other without anyone learning a legal name. The protocol scales from a two-person household to a distributed activist or creator network precisely because it never depends on a shared legal identity, only on a shared secret established out of band.
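
The article does not specify how the one-time token bootstraps a rotating challenge. One plausible way, shown here as a minimal sketch and not as the author's mechanism, is an HMAC challenge-response: the token is exchanged once out of band, and each later verification uses a fresh random challenge, so a response overheard once cannot be replayed. All function names are hypothetical, and truncating the response to 8 hex characters (so it can be read aloud or typed) is an assumption that trades some guess-resistance for usability.

```python
import hashlib
import hmac
import secrets


def new_bootstrap_token() -> bytes:
    """Create the shared secret, exchanged once over an encrypted channel."""
    return secrets.token_bytes(32)


def new_challenge() -> str:
    """Create a fresh random challenge; the verifier sends a new one each time."""
    return secrets.token_hex(8)


def respond(token: bytes, challenge: str) -> str:
    """Compute the short response that proves knowledge of the token."""
    digest = hmac.new(token, challenge.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:8]  # shortened for human use (an assumption of this sketch)


def verify(token: bytes, challenge: str, response: str) -> bool:
    """Check a response in constant time against the expected value."""
    return hmac.compare_digest(respond(token, challenge), response)
```

Neither side reveals a legal name or any personal history at any point: the only thing proven is possession of the token. If a token may have leaked, discard it and exchange a new one out of band.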
“Just Delete It” Doesn’t Work, Which Is Why Prevention Is the Whole Game
Prevention carries the weight here because deletion cannot. Removing a voice or likeness from a trained model is, at production scale, still a research-stage capability, not a button you can press today, so the control that actually works is not releasing the sample. This is the same hand-off as the permanence of your published footprint: timing beats cleanup, because ingestion is continuous and removal is partial.
The research is honest about its own limits. MIT Technology Review reported in July 2025 that researchers can make a text-to-speech model “unlearn” a specific speaker, but the process takes days, slightly degrades the model’s permitted voices, and in the researchers’ own words “would need faster and more scalable solutions” for real use. So the accurate statement is not “deletion is impossible”; it is that unlearning works in the lab and is not yet practical at production scale. Treat any “remove my voice” offering as partial and forward-looking, not as an undo.
Which reorders everything. If the sample, once public, is effectively permanent, then the only fully effective control sits before publication — and the second-best control is the verification protocol that assumes the clone exists. Detection tools and takedown services have their place, but they are the outer, weakest ring. The inner rings — minimise, and pre-register trust — are the ones you control completely.
Key Takeaways
- Voice, face, and writing are now credentials and attack surfaces at once. Stop treating a recognised voice or face as self-authenticating proof.
- The defense is preventive, not reactive. A ~3-second clip can be enough to clone a voice; you cannot recall a sample, and unlearning is not yet production-ready.
- The threat is gendered. Synthetic intimate imagery and impersonation fall overwhelmingly on women and public pseudonyms — this is bodily and reputational sovereignty, not mere data hygiene.
- Minimise sample quality relative to reach. Co-hosted audio, avatars for the named persona, no voiceprint logins, stripped metadata.
- Pre-register an out-of-band verification step. Callback discipline, a lived-memory challenge, a duress signal, and — for pseudonyms — a one-time token that verifies without de-pseudonymising.
Frequently Asked Questions
Can AI really clone my voice from a short clip?
Yes. A 2023 Microsoft research model demonstrated voice synthesis from a three-second sample, and commercial tools now offer similar short-sample cloning. In a 2025 UC Berkeley study (Barrington & Farid, Scientific Reports), listeners mistook such clones for real voices roughly 80% of the time. The practical takeaway is to treat any clean, public recording of your voice as a usable sample, and to reduce how many of them exist.
Do family “safe words” actually work?
They work when they force a check on a channel the attacker doesn’t control — which is why the stronger version is a callback to a known number plus a question answered only from private, shared memory, not a single static phrase. A password can be guessed, overheard, or socially engineered; a rotating lived-memory challenge plus a duress signal is far more resilient. The phrase is the seed of the protocol, not the whole of it.
Can I remove my voice or face from AI models that already trained on it?
Not reliably, at scale, today. Researchers can make a model “unlearn” a speaker, but the process is slow, imperfect, and not yet deployed in production systems (per MIT Technology Review, 2025). Opt-outs and “do not train” signals mostly affect future ingestion where platforms honour them. Treat removal as partial and forward-looking — which is exactly why minimising what you publish matters more than any takedown.
Why frame this as a women’s issue specifically?
Because the data is lopsided. Tracking studies put women at the overwhelming majority of deepfake-pornography targets, and an American Sunlight Project study found about one in six women in Congress depicted in non-consensual imagery — roughly 70 times the rate for men. A defense that ignores who is actually targeted will under-protect the people most at risk, so the protocol here is built for the harasser-and-impersonation threat model, not only the fraud one.
What is the single most effective step?
Stop letting your voice or face act as an authentication factor — disable voiceprint banking and biometric “something you are” logins where a non-biometric second factor exists. It is the one move that removes a working credential from the attacker’s reach immediately, while minimisation and the verification protocol do the slower structural work.