Party time! Introducing AIWhisperer. Feed massive files to AI with less data exposed
Plus three Perplexity Pro annual subscriptions and one You.com annual subscription up for grabs.
Yesterday you got the story about how to build a tool. Today the tool is the story. I released AIWhisperer: analyze thousands of pages with AI while reducing how much sensitive data leaves your computer.
And because I apparently have the business instincts of a labrador retriever, I’m also giving away:
🎁 3x Perplexity Pro annual subscriptions 🎁 1x You.com annual subscription
To win? Just be a subscriber this month. That’s it. Already subscribed? You’re in. Not yet? Fix that before January 30th. I’ll draw winners on January 31st.
Why am I doing this? Because you helped me hit 10,787 subscribers today. (I don’t do round numbers. I do exact counts and excessive footnotes.)
Yesterday’s article hit a nerve. Within hours, my inbox was filled with questions.
That article described a workflow. If a chatbot can’t read your PDF because it’s too big, convert it. Write a script to detect sensitive data. Replace it with placeholders. Save a mapping file. Upload the sanitized text. Download the results. Decode. Repeat.
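(For the curious, that script step boils down to something like this. A minimal sketch, not AIWhisperer’s actual code; the email pattern and file names are just illustrations.)

```python
import json
import re

# One illustrative pattern. Real detection needs many more rules
# (and an AI model for names), but the mechanics look like this.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def sanitize(text: str, mapping: dict) -> str:
    """Swap every email address for a stable placeholder."""
    def repl(match):
        value = match.group(0)
        if value not in mapping:
            mapping[value] = f"EMAIL_{len(mapping) + 1:03d}"
        return mapping[value]
    return EMAIL.sub(repl, text)

mapping = {}
text = open("dossier.txt", encoding="utf-8").read()
open("dossier_clean.txt", "w", encoding="utf-8").write(sanitize(text, mapping))

# The decoder ring. This file never leaves your computer.
with open("mapping.json", "w", encoding="utf-8") as f:
    json.dump(mapping, f)
```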
Simple, right? My friend Craig Silverman, cofounder of Indicator, agreed. He called me for a piece about the state of search, out soon on Indicator. (I told him: Google is still a friend, but I'm seeing others now.)
“I did understand it,” someone else wrote. “But I have no idea how to make it myself. I’m a lawyer, not a programmer. Can you just give me something I can run?”
My point was: let AI write the code for you. Don’t wrestle with it yourself. But magic has its price. Claude Code requires a $200-per-month Max subscription for heavy users. That’s $2,400 a year. For a tool. “I’m a freelance journalist,” one reader wrote. “I make $2,400 in a good month. I can’t spend that on a subscription.”
Fair.
One reader had somehow deleted their own documents while trying to process them. I still don't know how. I didn't ask.
A compliance officer emailed. She worked at a bank. Internal investigation. Documents to analyze. Couldn’t upload to cloud AI—regulatory reasons.
“Your article made it sound easy,” she wrote. “But I can’t write code via Claude or Gemini. I can’t install Python. I can’t do any of this. I just need something that works.”
She attached a screenshot of her desktop. Windows 10. Microsoft Edge. Outlook. Excel. No terminal. No command line. No development environment.
This is most people. This is normal people.
“Can you just give me a tool?”
I get it. The article was a tease. Here’s this amazing workflow that can save five days of work. Here’s how it works in theory. Here’s what I did. Good luck figuring it out yourself.
That's a recipe blog. Seventeen paragraphs about grandma before the ingredients. I’m doing the same thing right now.
So I packaged it. Tested it. Documented it. Put it on GitHub. Made it work even for people who don’t know what GitHub is. Working on a Windows and Mac version, in my second life.
It’s public domain. It’s a first version. If you are a coder, please help improve it. The tool tries to tackle two problems:
Problem 1: Too big to upload
Your files are too big to fail—but too big to upload.
ChatGPT: “Failed upload”
Claude.ai: “Files larger than 31 MB not supported”
Gemini: “File larger than 100 MB”
You compress the file. Adobe Acrobat shrinks it to 80 MB. Problem solved? No. Shrinking a PDF makes the file smaller. Not the text. The AI still has to read all 1,053,356 words.
Vacuum-packing a suitcase doesn’t reduce the number of shirts.
Even Google NotebookLM—which accepts up to 200 MB—has a hidden 500,000-word limit. It doesn’t tell you that. It just says “Error, try again.”
The tool slices the PDF into several parts and converts them to text.
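For coders who want the idea in code: a sketch of that slicing step using the pypdf library. The chunk size and file names are my assumptions, not the tool’s defaults; pick a size your chatbot accepts.

```python
from pypdf import PdfReader  # pip install pypdf

def pdf_to_chunks(path, words_per_chunk=100_000):
    """Extract the plain text from a PDF and split it into
    upload-sized pieces. The chunk size is illustrative."""
    reader = PdfReader(path)
    words = []
    for page in reader.pages:
        # extract_text() can return None on image-only pages
        words.extend((page.extract_text() or "").split())
    for i in range(0, len(words), words_per_chunk):
        yield " ".join(words[i:i + words_per_chunk])

for n, chunk in enumerate(pdf_to_chunks("investigation.pdf"), 1):
    with open(f"part_{n:02d}.txt", "w", encoding="utf-8") as f:
        f.write(chunk)
```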
Problem 2: Too sensitive to upload, too slow to run locally
Local AI would be safe, but it’s painfully slow—hours for what cloud AI does in minutes. That 4,713-page investigation? A local model takes days. And the results are worse—local models can’t match cloud performance.
So you upload to cloud AI anyway, unredacted, hoping for the best.
Your confidential data sits on infrastructure you don’t control. One breach, one subpoena, one rogue employee, and your source is exposed. Your client is compromised. Your career is over.
The middle path
AIWhisperer gives you a middle path: sanitize locally, analyze in the cloud, decode locally.
For big files: Convert your PDF to text. The tool strips out fonts, layout, images, compression tables. Everything the AI doesn’t need. A 170 MB PDF becomes 13.8 MB of text. Ninety-two percent smaller. Not a single sentence lost. Then it splits the text into chunks small enough to upload.
For confidential files: Replace names, phone numbers, email addresses, street addresses, and bank account numbers with placeholders. “Johannes van der Berg” becomes PERSON_001. The mapping stays on your computer. Upload the sanitized text to the AI. Get fast results. Decode back to real names locally.
The cloud AI sees structure. You see everything.
What it catches
Names (via AI language models + context patterns)
Locations (cities, addresses, and phrases like “te Antwerpen” [“in Antwerp”] or “richting Rotterdam” [“toward Rotterdam”])
Phone numbers
Email addresses
IBANs (European bank accounts)
Vehicles (makes and models)
Dates of birth
ID numbers
Six languages: Dutch, English, German, French, Italian, Spanish.
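For the pattern-based categories, think rules like these. Simplified for illustration; they are not the tool’s actual patterns, and names go through AI models rather than regexes.

```python
import re

# Simplified illustrations. Production rules need many more variants.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "IBAN":  re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{10,30}\b"),
    "PHONE": re.compile(r"(?<!\w)(?:\+31|0)\s?[1-9](?:[ -]?\d){8}\b"),  # Dutch numbers only
}

line = "Mail a.jansen@example.nl, call +31 6 12345678, IBAN NL91ABNA0417164300."
for label, pattern in PATTERNS.items():
    for match in pattern.finditer(line):
        print(label, "->", match.group(0))
```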
What it misses
Let me be direct about limitations.
Nicknames. “Big J” isn’t flagged as PERSON_001 unless you tell it.
Rare spellings. Unusual name formats may slip through.
Context clues. This is the big one. “The mayor of Rotterdam” identifies someone even with the name stripped. “Europe’s largest drug bust” narrows it down. The tool can’t strip meaning from your documents. Only you can decide what’s safe.
AIWhisperer reduces the risk of exposing confidential data. It doesn’t eliminate all risk. Nothing does.
Always check the sanitized output before uploading.
Why NotebookLM
You could upload your sanitized files to any AI. But NotebookLM has something the others don’t: source references.
When NotebookLM makes a claim, it shows you exactly where that claim came from. Little numbers in brackets: [1] [2] [3]. Click one, and you’re looking at the original passage in your uploaded document.
This matters for verification.
Say NotebookLM tells you: “PERSON_001 transferred funds to COMPANY_003 three days before the arrest [4].” You click [4]. There’s the source text. You can check whether the AI got it right or hallucinated.
ChatGPT doesn’t do this. Claude doesn’t do this. Gemini doesn’t do this. They summarize, but they don’t show their work.
For investigative work, showing your work is everything. You need to trace every claim back to the original document. NotebookLM lets you do that.
The workflow
Upload your sanitized files to NotebookLM. Don’t ask it to build your timeline directly.
Ask it to write you a prompt first.
“Give me a prompt I can use to create a comprehensive timeline from these five documents.”
NotebookLM analyzes your files and generates a prompt optimized for your specific documents. Legal files get a different prompt than financial records.
Then type: “Execute prompt.”
Three minutes later: a timeline. Dates, events, connections—all organized. Every claim tagged with source references you can verify.
NotebookLM has a “Data Table” button. Click it. Your timeline becomes a spreadsheet. Export to CSV. Run it through the decoder. Real names restored.
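Decoding is the simplest step of all. A sketch of what it amounts to, assuming the mapping file from the sketch above (file names again illustrative):

```python
import json

# Load the placeholder map saved during sanitization.
mapping = json.load(open("mapping.json", encoding="utf-8"))

with open("timeline.csv", encoding="utf-8") as f:
    decoded = f.read()

# The map stores real value -> placeholder, so swap each one back.
for real, placeholder in mapping.items():
    decoded = decoded.replace(placeholder, real)

with open("timeline_decoded.csv", "w", encoding="utf-8") as f:
    f.write(decoded)
```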
Twenty minutes total. Not five days.
Who needs this
Start simple: you have a 200-page PDF. ChatGPT chokes. AIWhisperer converts it to text, splits it, done. Works today. Public documents, no sensitivity, no problem.
Then there’s the harder case: confidential files. Source documents. Client data. Internal investigations. That’s where the sanitization layer comes in. Strip the names, analyze the structure, decode afterward. More steps, more protection. Use it when you need it. Be careful.
Journalists. A source hands you 2,000 pages of leaked documents. Somewhere in there is the story. Cloud AI finds patterns in minutes, but you can’t upload your source’s data as-is. Reduce what’s exposed first.
Lawyers. Opposing counsel dumps 50,000 pages of discovery on your desk. AI can help find the smoking gun—but client confidentiality means minimizing what reaches cloud servers.
Researchers. Court records, police files, medical studies. The data contains real people. Ethics boards don’t approve of uploading patient names to ChatGPT. So don’t. Use placeholders.
HR professionals. Internal investigations. Harassment complaints. Whistleblower reports. None of this can leave your laptop fully exposed.
Accountants. Client financials under audit. Bank statements, invoices, tax records. Your professional liability insurance doesn’t cover “uploaded to AI.”
Thank you!
AI is a flashlight. It shows you where to look. It doesn’t replace verification—you still check every connection in the original documents.
AIWhisperer solves the file-size problem by converting PDFs to lean text. It reduces, rather than removes, the confidentiality problem by stripping identifiers before upload; you still double-check the result. NotebookLM solves the verification problem by showing a source for every claim, and it lets you export a data table or timeline that you can decode with the tool.
It’s not perfect. Context leaks. Unique events identify people. You have to review the output.
But it’s better than waiting days for a local model. And it’s better than spending a week reading what AI could analyze in twenty minutes.
Thank you for reading. Thank you for subscribing. Thank you for making me release things I would have kept to myself.
Here’s to the next 10,787. 🥂
AIWhisperer is free and open source: github.com/voelspriet/aiwhisperer
🎁 3x Perplexity Pro annual subscriptions 🎁 1x You.com annual subscription
To win? Just be a subscriber this month. That’s it. Already subscribed? You’re in. Not yet? Fix that before January 30th. I’ll draw winners on January 31st.






