Confession: I Accidentally Published Victim Names
Then I spent six hours doing what the DOJ hasn’t done in four months.
I run epstein-data.com, a public research database hosting the full OCR text of 1.38 million documents from the DOJ’s Epstein Files production. On February 21, 2026, I discovered that the database, which had been live and searchable for weeks, contained the real names, phone numbers, and Social Security numbers of trafficking victims.
This is what happened, what I did about it, and what it taught me about the DOJ’s claim that “victim protection” required removing 67,784 documents from justice.gov.
How It Happened
On January 30, 2026, the DOJ published its final major release of Epstein Files: over 3 million pages, bringing the total to roughly 3.5 million. More than 500 attorneys had spent more than two months reviewing the material, working through Christmas and New Year’s, with a two-tier review process and a court order requiring the SDNY U.S. Attorney to personally certify that victim-identifying information would be redacted.
Within days, victims’ attorneys reported thousands of redaction failures on behalf of nearly 100 survivors. Full names, Social Security numbers, credit card numbers, dates of birth, all of it published to justice.gov. One FBI document listing 32 minor victims had only one of those names removed, leaving the other 31 exposed. One victim received death threats after her information was published. Another told the court: “I have never come forward! I am now being harassed by the media and others. This is devastating to my life.”
Victims’ attorneys Henderson and Edwards wrote to the court that this was “never a complex undertaking” because the DOJ “has possessed the names of victims that it promised to redact for months.” They called the January 30 release “what may be the single most egregious violation of victim privacy in one day in United States history.”
I had built my database from the same source files. If the DOJ had published victim names, so had I.
What I Found
Before I ever put these databases online, I did check for victim PII. I searched for obvious identifiers: names that had appeared in public court filings, known pseudonyms, patterns that looked like Social Security numbers. I wasn’t careless about this. But here’s the problem: I didn’t have the victim list. I didn’t know who these people were. You can’t search for names you don’t have.
The DOJ had the names. They’d had them for years. Victims’ attorneys provided them directly as part of the case. I was an independent researcher working with OCR text.
On February 21, I was preparing the Substack article on the 67,784 removed documents that I posted yesterday. To write it, I was reading removed documents from our OCR corpus and describing what the public was missing.
Around 7:40 AM, I found it. Victim names. Real names, not pseudonyms, sitting in prosecution correspondence. Not in some obscure corner of the corpus, but in routine case files that had been published to the DOJ’s website and mirrored to ours. Full names, phone numbers, dates of birth, home addresses, Social Security fragments. All searchable on my public site. My heart dropped, and I was nearly sick to my stomach.
I won’t say which documents, because some of this material is still unredacted on other sites that mirror the DOJ production. I don’t want to draw a map. What I can say is that several of the documents I was analyzing were contaminated with victim PII. It was bad.

How I Fixed It (Eventually)
I want to be honest about this part. It was not a clean process.
First I tried regex: pattern-matching rules for phone numbers, addresses, and victim surnames. I ran it on one of the longer documents. Of 54 modifications, 43 were wrong. The regex ate case numbers, redacted Epstein’s own address, and flagged unrelated place names because a victim shared a surname with them. I restored everything from backup.
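To make the failure mode concrete, here is a minimal sketch of the kind of pattern-matching pass I mean. The patterns, surname, and sample text are all hypothetical placeholders, not my actual rules or the actual documents: the point is that a broad address pattern cannot tell whose address it matched, and a bare surname pattern also matches place names that contain the same word.

```python
import re

# Hypothetical patterns and text, illustrating the failure mode only --
# not the actual rules or documents.
ADDRESS = re.compile(
    r"\b\d+\s+(?:(?:East|West|North|South)\s+)?\w+\s+(?:St(?:reet)?|Ave(?:nue)?|Rd|Road)\b"
)
SURNAME = re.compile(r"\bRivers\b")  # hypothetical victim surname

text = (
    "Search warrant executed at 9 East 71st Street. "
    "Report forwarded to the Three Rivers Police Department."
)

# Both "hits" are false positives for victim PII: the address belongs to
# the defendant, and "Rivers" here is part of a town name, not a person.
false_positives = ADDRESS.findall(text) + SURNAME.findall(text)
```

Every match here is syntactically valid and semantically wrong, which is exactly the 43-out-of-54 problem: the pattern fires, but only context tells you whether the match is victim PII.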
Then I tried named entity recognition, standard NLP software for detecting names in text. It drowned in OCR noise. Police booking forms with fragmented text produced thousands of false detections. Unusable. Restored from backup again.
I tried starting over with something simple: redacting school names. One school returned over 500 hits across the corpus, in court filings, news articles, and unrelated documents. The false-positive rate was catastrophic.
By 9 AM, I had three failed approaches and was seriously considering taking the whole database offline.
The breakthrough came from accepting what I couldn’t do. I couldn’t scan 2.77 million pages for “anything that looks like PII.” The error rate was too high. But now, for the first time, I had what I’d never had before: the actual names. The documents I’d stumbled across contained victim contact information, and that meant I could build a specific list and search the rest of the corpus for exactly those terms.
This is the part that should bother you about the DOJ’s approach. The hard part of my day was not having the names. I spent two hours flailing because I was trying to find PII I couldn’t identify. The moment I had the names, the problem became solvable. The DOJ had those names from day one. Victims’ attorneys had provided them years ago. They never had the problem I had. They only had the easy part.
The corpus-wide scan took 95 seconds. It returned over 4,000 hits across over 400 documents. The same names and phone numbers had been copied, forwarded, quoted, and referenced across hundreds of prosecution emails, FBI reports, and court filings. The contamination was far worse than I’d realized.
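A scan like that is small enough to sketch. This is not my production code; the directory layout, function name, and term list below are placeholders, and a real pass would work from the redaction log rather than printing hits. The core idea is just: once you have the exact terms, compile them into one alternation and walk the corpus.

```python
import os
import re

def scan_corpus(root: str, terms: list[str]) -> dict[str, list[str]]:
    """Map each file under `root` to the known-PII terms found in it.

    `terms` is the specific list of names/numbers to find -- the thing
    the DOJ had from day one and I only acquired that morning.
    """
    pattern = re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)
    hits: dict[str, list[str]] = {}
    for dirpath, _, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            with open(path, errors="ignore") as f:
                found = sorted(set(pattern.findall(f.read())))
            if found:
                hits[path] = found
    return hits
```

Note the contrast with the regex attempt: `re.escape` means the scan matches only the literal known terms, so the false-positive problem largely disappears, and an exact-string search over a few million pages of OCR text is fast enough to finish in minutes.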
From there, the redaction was methodical. Full names, phone numbers, and Social Security numbers got simple search-and-replace. Dates of birth and addresses needed context classification, because a date like “6/30/1987” could be a victim’s birthday or a settlement date, so I checked surrounding text for victim-context indicators. Rare surnames got a whitelist system for cases where non-victims shared the name.
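The date-of-birth step is the only genuinely ambiguous one, so here is a minimal sketch of the context-classification idea. The keyword list, window size, and sample text are assumptions for illustration, not my actual indicator list: the approach is simply to look at the text surrounding a date match before deciding to redact it.

```python
import re

# Hypothetical victim-context indicators -- the real list was built from
# the documents themselves.
VICTIM_CONTEXT = ("dob", "date of birth", "born", "victim", "minor")
DATE = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")

def classify_date(text: str, match: re.Match, window: int = 60) -> str:
    """Redact a date only if nearby text suggests it identifies a person."""
    lo = max(0, match.start() - window)
    context = text[lo : match.end() + window].lower()
    return "REDACT" if any(k in context for k in VICTIM_CONTEXT) else "KEEP"

# The same date string can be a birthday in one place and a settlement
# date in another; only the surrounding words distinguish them.
text = "DOB: 6/30/1987 ... settlement executed on 6/30/1987"
```

The rare-surname whitelist worked on the same principle in reverse: a name match was kept, not redacted, when its context showed it referred to a documented non-victim who happened to share the surname.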
I removed all matching text, then had Claude do a full sweep of every document that matched, in case other contextual clues could give someone a roadmap back to these victims.
By early afternoon, roughly six hours after the first discovery, every victim name, phone number, Social Security fragment, date of birth, and home address I could identify had been removed from over 400 documents across approximately 1,400 pages. Every original page was backed up before modification. Every redaction was logged. The database was redeployed and verified clean. Thousands of individual redactions across three passes, with a handful of false positives caught and excluded.
Not a single document was removed from public access.
What the DOJ Did Instead
3.5 million pages is a lot of material, and victim privacy wasn’t the DOJ’s only obligation. They were also reviewing for grand jury material, information that could compromise ongoing investigations, and content protected by court orders. That is real, legitimate work.
But their response to the victim PII failures was not targeted redaction. It was removal. The DOJ told the public it had taken down “several thousand documents” for victim-identifying information. Our scan of every EFTA URL on justice.gov found that the actual number was 67,784 complete PDF files, pulled from public access entirely, content and all.
We’ve now categorized every one of them. The vast majority appear to have contained no victim information at all. Our full analysis is in yesterday’s article, with the complete technical report on our research site.
What I Think
The DOJ had a real victim privacy problem. I know because I had the same one.
But removing 67,784 entire documents is not a victim privacy solution. It’s something else.
Targeted redaction works. I know because I did it, badly at first, then better. One person, one computer, one AI coding assistant, one morning. And the hardest part was not having the victim list. The DOJ has had that list for years. They have hundreds of attorneys, unlimited computing resources, and a congressional mandate. They could do what I did, better and faster.
They could put those 67,784 documents back online tomorrow, with victim names properly redacted, and it would take them days, not months.
They haven’t.
Our full inventory of all 67,784 removed documents and 23,989 size-changed documents is publicly available at Epstein-research-data on GitHub. The complete OCR text of every document cited in this article, with victim-identifying information properly redacted, is searchable at epstein-data.com.
This analysis relies on Claude Code running Opus 4.6, which can make mistakes.
