If you do SOCMINT, you live on X. Whatever you call it — Twitter, the bird site, the everything app — this is still where breaking news, war footage, market panic, and political miscalculation hit the wire first. The platform is messier than it was in 2022, the data is harder to extract, and half the tools you trained on are dead. None of that has reduced its value. It has only raised the price of being good at it. This guide is the working brief: how the data is shaped, which operators still cut, which scrapers still breathe, and how to recover what the platform tries to bury.
The firehose hasn't moved
Every prediction that X would collapse and take its OSINT community with it has aged badly. Conflict watchers, financial trackers, security researchers and investigative reporters are still here because nowhere else aggregates the same density of timely signal. The Russo-Ukrainian war made that obvious — the open-source community on Twitter counted equipment losses, geolocated strikes and authenticated atrocity footage in something close to real time. Oryx, run by ex-Bellingcat analysts Stijn Mitzer and Joost Oliemans, became the citation-of-record on Russian materiel losses by methodically verifying photographs lifted from the same firehose. Bellingcat used it to evidence war crimes, and Russia returned the compliment by designating the group an undesirable organisation in 2022. A 2024 academic study captured almost two million OSINT-related tweets across 1,040 contributing accounts on the war alone. That is the gravity well you are working inside.
What sets X apart isn't the text — it's the metadata around the text. Each account is a graph: followers, replies, quotes, list memberships, community affiliations. Each post carries a stable conversation_id that lets you reconstruct full reply trees long after the original tweet is gone. Geotags, where present, are precise to coordinates. Even deletions leave fingerprints in caches, archives, and the timelines of accounts that interacted with the post. Treat the data model as your real target. The 280 characters are just the bait.
What broke in 2023, and what it costs now
Tradecraft changed because access changed. In early 2023 the free API died. By 2026 the official price list looks brutal: a Free tier with no useful search, a Basic tier near $200/month with a seven-day search window, a Pro tier at $5,000/month, and an Enterprise tier that, by credible breakdowns, opens at $42,000/month. One academic summarised the obvious: nobody on a faculty payroll can sustain that. February 2026 added a pay-per-use option at roughly $0.005 per post read, which helps small queries and does nothing for historical work.
The downstream casualties are worth memorising, because they tell you which workflows you can no longer trust. Politwoops, ProPublica's archive of politicians' deleted tweets, is effectively dead — the deletion-tracking endpoint was disabled and only a partial 2025 snapshot survives. Indiana University's Botometer is in archival mode, returning pre-calculated scores from data captured before June 2023, and explicitly cannot identify modern AI-generated bots. Hoaxy retired in July 2025 and was folded into a successor project called OSoMeNet. Twint is dead. snscrape works in fits and starts. If your last good tweet-investigation muscle memory dates from 2021, retire it.
Operators are the cheapest superpower
Before you spend a cent on tooling, master the search bar. Twitter (X) Advanced Search at twitter.com/search-advanced still parses the operator vocabulary that made the platform investigatable in the first place. The compound operators you should be able to type from muscle memory:
- from:user and to:user — pin posts to a specific author or addressee.
- since:YYYY-MM-DD and until:YYYY-MM-DD — close the date window.
- geocode:lat,long,radius — geographic filter, e.g. geocode:50.4501,30.5234,5km for a five-kilometre radius around Kyiv. Coverage thinned as the user base stopped sharing precise GPS, but where it still hits, it hits hard.
- filter:images, filter:videos, filter:links, filter:verified — narrow by content type.
- min_retweets: and min_faves: — surface viral content above thresholds, useful for tracking amplification curves.
- conversation_id: — pull an entire reply tree out of any tweet ID.
- lang: — restrict to a single language for cross-border investigations.
Stack them. A query like from:bellingcat geocode:48.0,37.8,50km filter:images since:2024-02-01 until:2024-03-01 returns image posts from one author inside a fifty-kilometre radius across one month — closer to a forensic database query than a social-media search. Bookmark a current operator reference such as the SocialRails advanced-search guide. The operator surface drifts quietly between releases, and the day you needed a deprecated flag is the wrong day to find out.
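Because the operators compose mechanically, the stacking pattern is easy to script once and reuse. A minimal sketch; the `build_query` helper and its parameter names are mine, not any real library, and it covers only the operators discussed above:

```python
from datetime import date

def build_query(author=None, geocode=None, media=None,
                since=None, until=None, min_faves=None):
    """Compose X advanced-search operators into a single query string."""
    parts = []
    if author:
        parts.append(f"from:{author}")
    if geocode:  # (lat, lon, radius) tuple, e.g. (48.0, 37.8, "50km")
        lat, lon, radius = geocode
        parts.append(f"geocode:{lat},{lon},{radius}")
    if media:  # "images", "videos", "links" or "verified"
        parts.append(f"filter:{media}")
    if since:
        parts.append(f"since:{since.isoformat()}")
    if until:
        parts.append(f"until:{until.isoformat()}")
    if min_faves is not None:
        parts.append(f"min_faves:{min_faves}")
    return " ".join(parts)

# The forensic query from the paragraph above, built programmatically:
q = build_query(author="bellingcat",
                geocode=(48.0, 37.8, "50km"),
                media="images",
                since=date(2024, 2, 1),
                until=date(2024, 3, 1))
print(q)
# from:bellingcat geocode:48.0,37.8,50km filter:images since:2024-02-01 until:2024-03-01
```

The payoff is repeatability: the same helper feeds the search box, a twscrape session, or a cron job, and your date windows stop drifting because a colleague fat-fingered a month.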
Scraping in a hostile environment
Without API budget, you scrape. Nitter — the open-source privacy-respecting front end — went mostly dark in early 2024 when X tightened its guest-token defences, then came back when maintainer Zedeus reactivated the project in February 2025. The most consistently maintained public instance is xcancel.com; nitter.poast.org and nitter.privacydev.net rotate in and out of working status. Hammer the public ones and you'll eat 429s within minutes. The right answer, repeated by everyone running an instance, is to host Nitter yourself the moment your work crosses from casual into systematic.
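While you still lean on public instances, build the retry discipline in rather than hammering. A sketch of the pattern — exponential backoff with jitter on a 429, plus instance rotation. The instance list mirrors the ones named above; the function names and the specific base/cap values are my assumptions, not anyone's published defaults:

```python
import random

# Public instances discussed above; rotate instead of hammering one.
INSTANCES = ["xcancel.com", "nitter.poast.org", "nitter.privacydev.net"]

def backoff_delay(attempt, base=2.0, cap=120.0):
    """Exponential backoff with full jitter after an HTTP 429.
    attempt 0 -> up to 2s, attempt 1 -> up to 4s, ... capped at 120s."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))

def next_instance(current):
    """Fail over to the next public instance after a rate-limit hit."""
    i = INSTANCES.index(current)
    return INSTANCES[(i + 1) % len(INSTANCES)]

wait = backoff_delay(2)  # somewhere in [0, 8) seconds
host = next_instance("xcancel.com")
```

The jitter matters: fixed retry intervals from many scrapers synchronise into request spikes, which is exactly the signature anti-scraping systems key on.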
For programmatic collection, Twint is a museum piece — the unauthenticated endpoints it relied on are all gone. The honest current shortlist: twscrape for authenticated scraping of search, profiles, followers, favouriters and retweeters; ntscraper as a Python wrapper around Nitter instances when you need quick one-shots. Neither is permanent — anti-scraping rules shift, sometimes weekly. Browser-side, TweetVacuum as a Chrome extension breaks the 3,200-tweet timeline ceiling by scrolling and harvesting from the rendered DOM, and DMI-TCAT remains the academic standard for building reproducible local archives.
Profiling a target account
You have a handle. What does the platform tell you about its owner without the owner knowing? Tinfoleak is still the most complete free dossier generator — devices, applications used, geolocation patterns, hashtags, mentions, topic clusters — built around a single username, coordinates, or keyword. SocialBearing covers timeline analytics, sentiment and per-tweet engagement; Foller.me profiles activity hours, dominant topics and interaction partners; AccountAnalysis by Luca Hammer surfaces posting cadence and the client apps the user actually posts from. Run twitterBFTD last — it scans the target's tweet history for domain names that have lapsed since they were posted, a quiet pivot into impersonation and brand-abuse exposure that pays off more often than it should.
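The activity-hours profiling that Foller.me and AccountAnalysis perform is easy to reproduce on your own harvested data, which matters when those services rate-limit or fold. A minimal sketch, assuming you already hold post timestamps as ISO-8601 strings from whatever scraper survived the week:

```python
from collections import Counter
from datetime import datetime

def activity_histogram(timestamps):
    """Bucket post times by UTC hour: a 24-slot posting-cadence
    fingerprint. A consistent quiet block hints at sleep hours,
    and therefore at the author's likely time zone."""
    hours = Counter(datetime.fromisoformat(ts).hour for ts in timestamps)
    return [hours.get(h, 0) for h in range(24)]

posts = ["2025-03-01T14:05:00", "2025-03-01T14:40:00",
         "2025-03-02T09:12:00", "2025-03-03T14:55:00"]
hist = activity_histogram(posts)
print(hist.index(max(hist)))  # peak posting hour (UTC) -> 14
```

Run the same histogram over two handles you suspect share an operator: matching quiet hours prove nothing alone, but mismatched ones are a cheap way to kill a bad hypothesis early.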
Tweet archaeology: recovering what they deleted
With Politwoops gone, deleted-tweet recovery is now a multi-tool craft. Build the reflex into your workflow. The Wayback Machine and archive.today store profile snapshots, individual tweets and full threads — for any account on your watchlist, a manual save-this-page on every newsworthy post is now standard practice rather than overkill. Google's site cache holds tweet text for hours past deletion. The conversation_id pivot lets you reconstruct a deleted post from the replies that quoted it before it disappeared. And lateral mirrors — Telegram channels, Discord servers, Mastodon bridges that auto-repost specific accounts — routinely hold deleted material longer than X itself does. The Bellingcat online investigation toolkit documents most of these patterns. Read it once a year; it changes.
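The Wayback Machine leg of this workflow is scriptable: the archive.org availability API returns the closest stored snapshot for any URL, and the conversation_id pivot is just a query string you can generate from a tweet ID. A minimal sketch; the helper names are mine, the availability endpoint is archive.org's documented one, and actually fetching the URL (with `requests` or similar) is left out:

```python
from urllib.parse import urlencode

def wayback_lookup_url(tweet_url, timestamp=None):
    """Build a Wayback Machine availability-API query for a (possibly
    deleted) tweet. GET the result to learn the closest snapshot."""
    params = {"url": tweet_url}
    if timestamp:  # "YYYYMMDD" -- prefer snapshots near this date
        params["timestamp"] = timestamp
    return "https://archive.org/wayback/available?" + urlencode(params)

def reply_tree_query(tweet_id):
    """Search query that surfaces surviving replies to a deleted post."""
    return f"conversation_id:{tweet_id}"

print(wayback_lookup_url("https://twitter.com/some_account/status/123",
                         timestamp="20240215"))
print(reply_tree_query("123"))  # conversation_id:123
```

Wire the lookup into your watchlist loop and the "save every newsworthy post" habit stops depending on a human remembering to click.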
Bots, CIB and the verified-blue problem
Coordinated inauthentic behaviour is harder to surface than in 2022, not because the signals changed but because the cheap detectors are gone. Look for the same things you always did: clusters of accounts with near-identical creation dates, synchronous posting histograms, near-duplicate bios, recycled avatars, identical client-app strings, and follower graphs that intersect more tightly than chance permits. Botometer scoring is frozen pre-2023, so the practical answer is to do the cluster analysis yourself on harvested data, eyeball account-creation timestamps, and reverse-image-search profile photos to flag generative-AI face composites. The 2024 OSINT-versus-BULLSHINT study showed how community-structure analysis on Twitter graphs separates genuine investigators from misleading content dressed up as intelligence — that lens is now table stakes for any conflict-related SOCMINT.
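Doing the cluster analysis yourself is less work than it sounds. A minimal sketch of one signal — creation-date clustering — over harvested account records; the `handle`/`created` field names are assumptions about your own export format, and the window and threshold are illustrative, not calibrated:

```python
from datetime import date

def creation_clusters(accounts, window_days=3, min_size=5):
    """Flag groups of accounts created within a narrow window of each
    other -- one classic coordinated-inauthentic-behaviour signal.
    Each account is a dict with a datetime.date under "created"."""
    if not accounts:
        return []
    accounts = sorted(accounts, key=lambda a: a["created"])
    clusters, current = [], [accounts[0]]
    for acct in accounts[1:]:
        if (acct["created"] - current[-1]["created"]).days <= window_days:
            current.append(acct)  # still inside the rolling window
        else:
            if len(current) >= min_size:
                clusters.append(current)
            current = [acct]
    if len(current) >= min_size:
        clusters.append(current)
    return clusters

# Six accounts registered on consecutive days stand out against
# one organically aged account:
batch = [{"handle": f"acct{i}", "created": date(2024, 1, 1 + i)}
         for i in range(6)]
organic = [{"handle": "old_hand", "created": date(2019, 6, 1)}]
flagged = creation_clusters(organic + batch)
print([a["handle"] for c in flagged for a in c])
```

Creation-date proximity alone is weak evidence — registration drives after news events produce innocent clusters — so treat a hit as a reason to pull the bios, avatars and client strings for the same group, not as a verdict.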
And forget the checkmark. The legacy verification system was replaced by paid X Premium subscriptions in 2023, so a blue tick is a billing receipt, not an identity claim. Pull legacy-verification metadata where it exists, the date Premium was activated, the handle-change history, and the account's behaviour in the weeks after subscription. That is how you separate the real public figure from the impostor renting the badge.
Pivoting outward
One tweet is rarely the whole investigation. The strongest cases pivot from a single post to the target's broader digital footprint. Run the handle through WhatsMyName or the Sherlock project to find it on the platforms the target forgot about; reverse-image-search the avatar and banner to surface older, less-curated identities. Once you have a graph, Maltego with its Twitter transforms and SpiderFoot remain the link-analysis canvases of choice for tying accounts to hashtags, domains and people across sources. GeoSocial Footprint renders unintentional location leaks; Birdwatcher gives you an offline workspace for harvested data when you need to work without touching the live site again. For corporate-grade monitoring, Brandwatch and Meltwater hold licensed firehose access — out of reach for independents, default issue for threat-intelligence teams.
Accounts that pay for the follow
The fastest skill upgrade is following people who already work this beat. The accounts cited again and again in conflict-OSINT literature: @bellingcat, @benjaminstrick, @aric_toler, @oryxspioenkop, @nrg8000, @hatless1der, @sector035 (whose weekly "Week in OSINT" remains the single best aggregator), @cyb_detective, @dutchosintguy, @osintcurious, @intelschool and @i_am_osint. Henk van Ess (@thaler3) publishes hands-on investigative method; @osinttechnical runs a steady stream of geopolitical analysis. Treat their public lists as research artefacts in their own right — a curated list is a fast on-ramp into a topic, region or threat actor.
Bottom line
SOCMINT on X is more expensive, more fragmented and more adversarial than it used to be. None of that means it is over. The core skills — operator-fluent search, disciplined archiving, behavioural profiling, structured pivoting — pay better now precisely because the easy automation is gone. Save every newsworthy post the moment it appears. Run your own Nitter instance. Memorise the operator surface. Keep a private list of which third-party tools actually worked this week. The platform changed the rules; the discipline kept the work. The firehose is still there. Drink from it carefully.
