An employee in Hong Kong watched her CFO and several colleagues join a video call. Familiar faces. Familiar voices. She wired $25.6 million across 15 transactions. Every face on that call was synthetic.
That's the IMINT deepfake problem in one sentence. Not a future risk. Not a research paper. A line item on a forensic accountant's spreadsheet.
What "IMINT deepfake" actually means
Imagery intelligence used to be about pixels you trusted. A face on a body cam. A still from a war zone. A voice on a call. Now any of those can be generated end-to-end by a model that fits on a laptop. Detection is no longer a sub-discipline of OSINT — it's the part that decides whether anything else you do is worth the time.
The discipline isn't "hit a button, get a verdict." It's a stack: human heuristics on top of classifier outputs on top of provenance signals. Skip a layer and you're guessing.
The actual workflow
Real operators don't trust a single tool. The honest workflow looks like this, with a rough orchestration sketch after the list:
- Triage with a human eye first. Anatomical errors, weird ear-jaw asymmetry, gaze that drifts off-target, blinks that are too rare or too rhythmic.
- Cross-classifier sanity check. Run the file through at least two independent detectors with different training pipelines. If they disagree, that's data — not failure.
- Provenance check. Look for a Content Credentials manifest (C2PA). Note whether it's missing, broken, or signed by something you don't trust.
- Reverse-search for the seed. Find the original photo, the prompt, the LinkedIn headshot the model trained on. Synthetic media almost always has a mother.
- For audio: spectral and prosodic checks. Phoneme glitches, missing room reverb, lip-sync drift against the video.
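Here is that orchestration sketch. Everything in it is hypothetical: `DetectorResult` and the verdict strings are illustrative, not any vendor's API, and real adapters for the detectors covered later would populate the list.

```python
from dataclasses import dataclass

@dataclass
class DetectorResult:
    name: str          # which detector produced this result
    verdict: str       # "synthetic", "authentic", or "inconclusive"
    confidence: float  # 0.0-1.0, as reported by the tool

def triage(results: list[DetectorResult]) -> str:
    """Summarize independent detector outputs without averaging away disagreement."""
    verdicts = {r.verdict for r in results if r.verdict != "inconclusive"}
    if len(verdicts) > 1:
        # Detectors conflict: that's data, and it means a human looks next.
        return "DISAGREEMENT: escalate to manual review"
    if not verdicts:
        return "NO SIGNAL: lean on provenance and reverse search"
    v = verdicts.pop()
    strong = [r for r in results if r.verdict == v and r.confidence >= 0.9]
    if len(strong) >= 2:
        return f"CONVERGENT: {v} (2+ high-confidence detectors)"
    return f"WEAK LEAD: {v}, confirm via provenance and reverse search"
```

The design choice worth copying is the first branch: conflicting verdicts trigger escalation instead of being blended into a meaningless average.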
That's the reality. Anyone selling you a single-button "is this a deepfake?" answer is selling lottery tickets.
The visual tells worth knowing
Anatomy is still the easiest crack in most generations. Ears that don't match. Glasses that warp at the temple. Teeth that swap shape mid-blink. Hair near the jawline that flickers between frames. None of these are guarantees — they're prompts to look harder.
Eye-blink and gaze tracking is a longer-running heuristic. Real humans blink around 15–20 times per minute, with irregular gaps. Generators have gotten better at this, so it's now a clue, not a proof.
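If a landmark tracker gives you per-frame eye openness (eye aspect ratio, or EAR, is the usual proxy; extracting it is out of scope here), the statistics are a few lines of NumPy. A heuristic sketch with an assumed 0.2 closed-eye threshold:

```python
import numpy as np

def blink_stats(ear: np.ndarray, fps: float, threshold: float = 0.2):
    """Blink rate and regularity from a per-frame eye-aspect-ratio series."""
    closed = ear < threshold
    # A blink is a run of consecutive below-threshold frames; count run starts.
    starts = np.where(np.diff(closed.astype(int)) == 1)[0]
    minutes = len(ear) / fps / 60.0
    rate = len(starts) / minutes if minutes > 0 else 0.0
    # Human blinking is irregular: a very low coefficient of variation
    # (metronomic gaps) is itself a flag, not just an out-of-range rate.
    gaps = np.diff(starts) / fps
    cv = float(gaps.std() / gaps.mean()) if len(gaps) > 1 else float("nan")
    return rate, cv
```

A rate far outside the 15–20 per minute range, or a suspiciously low coefficient of variation, is a prompt to look harder, nothing more.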
Lighting consistency is more durable. A face composited onto a body almost always lights from a slightly different direction than the rest of the scene. Same with reflections — synthetic eyes often miss the catchlight pattern of the room.
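One cheap way to operationalize the lighting check: compare a crude illumination-direction estimate for the face crop against one for the rest of the frame. The gradient trick below assumes a roughly convex, matte surface, which a face approximately is; it's a screening heuristic of my own framing, not forensic-grade lighting analysis.

```python
import numpy as np

def dominant_light_angle(gray: np.ndarray) -> float:
    """Crude illumination-direction estimate for a grayscale patch.

    On a roughly convex, matte surface, intensity gradients point (on
    average) toward the light source. Image coordinates: y runs down.
    """
    gy, gx = np.gradient(gray.astype(float))
    return float(np.degrees(np.arctan2(gy.sum(), gx.sum())))
```

A gap of tens of degrees between the face estimate and the background estimate is a compositing flag worth escalating, not a finding.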
Lip-sync versus audio drift is the giveaway on dubbed deepfakes. Run the video at 0.25× speed and watch the corners of the mouth.
Then there's Intel's FakeCatcher, which reads remote photoplethysmography (rPPG), the micro blood-flow signal it samples at 32 points on a real face. Intel claimed 96% accuracy on real-time video. Caveat: 2025 research showed deepfake pipelines can now synthesize convincing heartbeats. The arms race got faster.
Audio is harder than you think
Voice cloning now needs under five seconds of training audio. The tells aren't where most people look.
Forget "robotic voice." That's gone. What remains, with a rough spectral check sketched after the list:
- Phoneme prosody anomalies — synthetic voices stress the wrong syllable, especially on uncommon words.
- Missing room reverb consistency — a real call recorded in a kitchen carries kitchen acoustics. A clone lifted from podcast audio carries podcast acoustics into the kitchen.
- Synthesis-edge artifacts — sibilants, plosives, and breath sounds that come out either cleaner than real speech or weirdly absent.
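Part of this is screenable in code. The sketch below is an assumed heuristic, not a published method: it profiles the share of each frame's energy in the sibilant band, on the theory that real room recordings are messy frame to frame while some clones are suspiciously smooth. It needs a sample rate above roughly 20 kHz to see the band at all.

```python
import numpy as np
from scipy.signal import stft

def sibilant_profile(audio: np.ndarray, sr: int):
    """Mean and spread of sibilant-band (4-10 kHz) energy share per frame."""
    f, t, Z = stft(audio, fs=sr, nperseg=1024)
    power = np.abs(Z) ** 2
    band = (f >= 4000) & (f <= 10000)
    # Share of each frame's energy sitting in the sibilant band.
    share = power[band].sum(axis=0) / (power.sum(axis=0) + 1e-12)
    return float(share.mean()), float(share.std())
```

An unusually low standard deviation earns the file a closer listen; it's never a verdict on its own.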
ElevenLabs' AI Speech Classifier is free and useful — but only against ElevenLabs-generated audio. Run it as a confirmation step, never a verdict. Pindrop Pulse is the call-center reference point — it claims 99.4% accuracy on synthetic voices in two seconds. Pindrop's 2025 Voice Intelligence Report recorded a 1,300% surge in deepfake fraud and a projected $44.5 billion in contact-center exposure for the year.
C2PA is not what most people think it is
This part trips up newsrooms constantly. C2PA Content Credentials are a chain-of-custody record. They tell you who claims to have made the file, with what tool, and what edits were applied — cryptographically signed. They do not tell you whether the content is true.
A deepfake generated in a tool that implements C2PA will arrive with a perfectly valid manifest declaring exactly that. The signature is honest. The content can still be a complete fabrication.
So treat C2PA as evidence, not verdict. A missing or broken manifest is a flag worth investigating. A present manifest is the start of a question, not the end of one.
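The check is scriptable. A minimal sketch, assuming the official c2patool CLI is installed on PATH and that its default JSON report is what you want captured:

```python
import json
import subprocess

def read_manifest(path: str):
    """Dump a file's C2PA manifest store via the official c2patool CLI.

    Returns parsed JSON, or None when there is no manifest (or the
    store fails to parse). Either outcome is evidence, not a verdict.
    """
    proc = subprocess.run(["c2patool", path], capture_output=True, text=True)
    if proc.returncode != 0:
        return None
    return json.loads(proc.stdout)
```

Log the result either way: "no manifest" and "valid manifest signed by a generation tool" are both evidence.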
The detector landscape, grouped by what the tools actually do
Enterprise multi-modal:
- Sensity AI — claims 98% accuracy, monitors 9,000+ sources, court-ready reporting. Strongest for institutional deployment.
- Reality Defender — Gartner-named market leader for 2025, with RealScan, RealCall and RealMeeting plugins for Zoom and Teams.
- Hive Detect — frame-by-frame results and a U.S. Department of Defense contract for offline deployment.
Free and consumer-grade:
- Deepware Scanner — free upload-and-scan for video. Simple, useful, nothing more.
- AI or Not (Optic) — fast single-image checks, no account.
- ElevenLabs Speech Classifier — narrow scope, free, fast.
Open-source and research:
- TrueMedia.org's open-sourced models — the nonprofit shut down in January 2025 and released its detection stack publicly after analyzing 60,000+ pieces of suspect election media.
- Hugging Face dffd and dffd-v2 classifiers — drop-in models for image-level GAN and diffusion artifacts.
Forensics for OSINT investigators:
- FotoForensics — error-level analysis, metadata, JPEG quirks. Old-school, still essential; a minimal ELA sketch follows this list.
- Microsoft Video Authenticator — frame-by-frame confidence scoring, originally distributed via Project Origin partners only.
- Content Credentials Verify — official C2PA inspector.
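Error-level analysis, the core of what FotoForensics automates, is simple enough to run yourself with Pillow: resave, diff, amplify.

```python
import io
from PIL import Image, ImageChops

def ela(path: str, quality: int = 90) -> Image.Image:
    """Error-level analysis: resave as JPEG, diff, amplify the residue."""
    original = Image.open(path).convert("RGB")
    buf = io.BytesIO()
    original.save(buf, "JPEG", quality=quality)
    buf.seek(0)
    resaved = Image.open(buf)
    diff = ImageChops.difference(original, resaved)
    # Differences are faint; stretch them to the full 0-255 range.
    peak = max(hi for _, hi in diff.getextrema())
    scale = 255.0 / max(peak, 1)
    return diff.point(lambda px: min(int(px * scale), 255))
```

Edited regions often recompress differently and light up in the output; heavy social-media recompression washes the signal out, so treat a flat result as inconclusive rather than clean.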
Why detectors disagree — and why that's fine
AI detectors are noisy. Different models were trained on different generators, different resolutions, different compression histories. Run the same file through four detectors and you'll get a 92%, an 81%, a "likely human" and a "cannot determine." That's not failure — that's the point.
Multi-method confirmation isn't a slogan. It's the only sane workflow. Pair classifier output with C2PA inspection, a reverse search, and old-fashioned human visual triage. If three independent methods converge, you have something. If only one does, you have a starting point.
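"Three independent methods" is worth encoding literally, because independence lives across method families, not tools: two classifiers agreeing is one method. A hypothetical evidence log, all names illustrative:

```python
from dataclasses import dataclass

# The four method families of the stack; tools within one family
# (e.g. two classifiers) never count as independent confirmation.
METHODS = {"classifier", "provenance", "reverse_search", "human_triage"}

@dataclass
class Evidence:
    method: str             # one of METHODS
    source: str             # tool name or analyst initials
    points_synthetic: bool  # which way this item leans
    note: str = ""

def convergence(log: list[Evidence]) -> str:
    known = [e for e in log if e.method in METHODS]
    synthetic = {e.method for e in known if e.points_synthetic}
    authentic = {e.method for e in known if not e.points_synthetic}
    if len(synthetic) >= 3:
        return "SYNTHETIC: defensible (3+ independent method families)"
    if len(authentic) >= 3:
        return "LIKELY AUTHENTIC: defensible"
    return "OPEN: keep collecting evidence"
```

The log doubles as the documentation that makes the final call defensible rather than merely confident.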
Operators worth following
Detection moves weekly. The signal you want comes from a small list:
- @hanyfarid — UC Berkeley professor, GetReal Labs co-founder, the dean of digital forensics.
- @sam_gregory — director at WITNESS, focused on the human-rights edge of synthetic media.
- @WITNESS_org — practical detection guidance for journalists in conflict zones.
- @bellingcat — the public verification workflow nobody else publishes.
- @realitydefender and @sensity_ai — vendor accounts, but their threat-intel posts are real signal.
- @disinfobytes — focused, daily case dumps.
Calibrate your panic
The 2024 election year was supposed to be the great deepfake apocalypse. It wasn't. The Knight First Amendment Institute found that less than 1% of fact-checked misinformation was AI-generated, and "cheap fakes" — slowed footage, jump-cut edits, mislabeled real videos — outnumbered deepfakes seven to one.
That doesn't mean the threat is overhyped. It means the threat is uneven. The Hong Kong $25M case wasn't political — it was financial. Police later attributed it to attackers who built deepfakes from public conference footage of the targeted executives. That's the actual front line: not viral political fakes, but targeted impersonation of executives, recruiters and family members. Pindrop's 2025 numbers tell the same story — fraud, not propaganda, is where the deepfake economy lives.
If you're an OSINT analyst, that's the calibration: spend less time chasing viral fakes and more time building a defensible verification stack you can run on any inbound media. Multi-classifier. Provenance-aware. Reverse-searchable. Documented.
The smart operator isn't the one with the best detector. It's the one who treats every video as guilty until three independent methods agree it isn't.
