Compliance · 6 min read

Is De-Identified Therapy Data Really Anonymous?

De-identified isn't the same as anonymous. Here's what HIPAA Safe Harbor removes, what it leaves behind, and why de-identified therapy transcripts can still raise real privacy concerns.

Not entirely. De-identification under HIPAA's Safe Harbor method removes 18 categories of identifiers (such as names, email addresses, dates, and record numbers), but the clinical content of a session remains, and studies have shown that de-identified data can sometimes be re-identified when combined with other datasets. "De-identified" lowers risk; it doesn't make data truly anonymous.

What HIPAA Safe Harbor actually removes

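The Safe Harbor method requires stripping 18 specific types of identifiers, including names, geographic subdivisions smaller than a state, all date elements (except the year) tied to an individual, phone numbers, email addresses, and medical record numbers. Once these are removed, and provided the organization has no actual knowledge that the remaining information could identify someone, the data is considered "de-identified" under HIPAA.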

| | De-identified | Truly anonymous |
|---|---|---|
| Direct identifiers removed | Yes | Yes |
| Re-identification possible | Sometimes, via linkage | No, by definition |
| Used to train AI | Frequently | Rarely the point |

Why de-identified ≠ anonymous for therapy

The content is still sensitive

A transcript stripped of names still contains the substance of a therapy session — trauma disclosures, relationship details, emotional patterns. For clinicians, the content is the confidential part, not just the name attached to it.
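To make this concrete, here is a minimal sketch of identifier stripping. It is not a compliant de-identification tool: the regex patterns, the `redact` helper, and the sample sentence are all hypothetical, and they cover only a handful of the 18 Safe Harbor categories. The point is simply what survives the redaction.

```python
import re

# Hypothetical patterns for a few identifier types. Real de-identification
# needs far more than regexes; this only illustrates the idea.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
}


def redact(text: str, known_names: list[str]) -> str:
    """Replace known names and pattern matches with placeholder tags."""
    for name in known_names:
        text = re.sub(re.escape(name), "[NAME]", text, flags=re.IGNORECASE)
    for tag, pattern in PATTERNS.items():
        text = pattern.sub(f"[{tag}]", text)
    return text


# Made-up example sentence, not real client data.
transcript = (
    "Jane Doe (jane.doe@example.com, 555-123-4567) said on 03/14/2024 that "
    "the panic attacks started after her brother's overdose."
)
redacted = redact(transcript, known_names=["Jane Doe"])
print(redacted)
```

In this toy example the name, email, phone number, and date are replaced with tags, but the clause about panic attacks and a brother's overdose passes through untouched. That remaining narrative is exactly the part a client would consider confidential.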

Re-identification is a known risk

Research has repeatedly shown that "anonymized" datasets can be re-identified when cross-referenced with other data. A landmark study found that 87% of the US population could be uniquely identified by just their 5-digit ZIP code, full date of birth, and sex (Sweeney, 2000). Rich clinical narratives can be even more distinctive.
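The linkage mechanism can be sketched in a few lines. Everything here is invented for illustration: the records, the field names, and the assumption that a released dataset kept ZIP code, birth date, and sex alongside a note. Safe Harbor itself restricts those particular fields, so treat this as a picture of how linkage works in general, where any combination of surviving details can play the same role.

```python
from collections import Counter

# Toy "de-identified" records that kept three quasi-identifiers (made up).
clinical = [
    {"zip": "02139", "birth_date": "1985-04-12", "sex": "F", "note": "trauma disclosure"},
    {"zip": "02139", "birth_date": "1990-09-30", "sex": "M", "note": "grief counseling"},
    {"zip": "02139", "birth_date": "1990-09-30", "sex": "M", "note": "sleep problems"},
]

# Toy public list with names attached (made up).
public = [
    {"name": "Alex Rivera", "zip": "02139", "birth_date": "1985-04-12", "sex": "F"},
    {"name": "Sam Lee", "zip": "02139", "birth_date": "1990-09-30", "sex": "M"},
]

QUASI_IDENTIFIERS = ("zip", "birth_date", "sex")


def key(record: dict) -> tuple:
    """Build the linkage key from the quasi-identifier fields."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


# How many clinical records share each quasi-identifier combination?
counts = Counter(key(r) for r in clinical)

# Index the public names by the same key.
names_by_key: dict[tuple, list[str]] = {}
for person in public:
    names_by_key.setdefault(key(person), []).append(person["name"])

# A record is re-identified when its key is unique in both datasets.
reidentified = [
    (names_by_key[key(r)][0], r["note"])
    for r in clinical
    if counts[key(r)] == 1 and len(names_by_key.get(key(r), [])) == 1
]
print(reidentified)
```

In this toy data, the one record with a unique combination is matched to a name, while the two records that share a combination are not. The more distinctive the surviving details, the fewer records share a key, which is why detailed narratives raise the risk.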

"De-coupled" is a vendor promise, not a law

When a platform says transcripts are "de-coupled and cannot be re-linked," that describes its internal process, and the promise is only as strong as the vendor's engineering and incentives. (See "Do AI scribes train on your therapy data?")

What this means for your consent process

If you use a tool that retains de-identified data, your client consent should say so plainly. Telling a client "it's anonymized" when it's technically "de-identified and used to train AI" risks misleading them. See client consent for AI note-taking for language that holds up.

Where Eclio stands

Eclio sidesteps the entire debate: we don't retain your transcripts to train AI, de-identified or otherwise. And our upcoming local mode keeps the transcript on your device, so there's no copy on a vendor's servers to de-identify in the first place.

Frequently Asked Questions

Is de-identified the same as anonymous?

No. De-identified data has direct identifiers removed but retains clinical content and can sometimes be re-identified. Truly anonymous data cannot be linked back to a person at all.

Can de-identified therapy transcripts be re-identified?

Potentially. Studies show anonymized datasets can be re-identified when combined with other data, and rich clinical narratives are highly distinctive.

Do I need to tell clients if my AI tool de-identifies and retains data?

Yes. Informed consent should accurately describe what happens to the data. Saying "anonymized" when data is de-identified and used for AI training can mislead clients.
