Real-CATS
awaiting build
Academic Bitcoin Anti-Money-Laundering dataset, 90K labeled criminal addresses.
Rows in labels.db
0
Confidence
0.7
Refresh cadence
On academic release (irregular)
Schema
flat
About this source
Real-CATS is the Real Criminal Addresses for Targeted Studies dataset, published as part of an academic Bitcoin AML study. It collects ~90K Bitcoin addresses associated with criminal activity (Ransomware, Blackmail Scam, Darknet Market, Mixer, Theft, Tumbler) with category labels suitable for machine-learning training. We ingest only the criminal addresses (CB.tsv); the matched-benign sample (BB.tsv, 90K generic clean addresses) is intentionally not loaded because blanket benign labels add noise to /address attribution.
How we got the data
Cloned from sjdseu/Real-CATS on GitHub. CB.tsv parsed at load time; per-row category preserved as the badge on /address pages.
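The load step described above can be sketched roughly as follows. This is a minimal illustration, not the actual loader in tools/build_labels_db.py: the column names address and category, the Label tuple, and the parse_cb_tsv helper are all assumptions made for the example, and the real CB.tsv header may differ. The sample rows use placeholder strings rather than real addresses.

```python
import csv
import io
from typing import Iterable, Iterator, NamedTuple

SOURCE = "real-cats"
CONFIDENCE = 0.7  # source-level confidence stated on this page


class Label(NamedTuple):
    address: str
    category: str
    source: str
    confidence: float


def parse_cb_tsv(
    lines: Iterable[str],
    address_col: str = "address",    # assumed column name, not confirmed
    category_col: str = "category",  # assumed column name, not confirmed
) -> Iterator[Label]:
    """Yield one Label per TSV row, keeping the per-row category."""
    reader = csv.DictReader(lines, delimiter="\t")
    for row in reader:
        address = (row.get(address_col) or "").strip()
        category = (row.get(category_col) or "").strip()
        if not address:
            continue  # skip blank or malformed rows
        yield Label(address, category, SOURCE, CONFIDENCE)


# Placeholder rows standing in for CB.tsv content.
sample = (
    "address\tcategory\n"
    "addr-placeholder-1\tRansomware\n"
    "addr-placeholder-2\tMixer\n"
)
labels = list(parse_cb_tsv(io.StringIO(sample)))
```

Only the criminal file is passed to the parser; BB.tsv is simply never opened, which matches the decision not to load benign labels.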
Why this confidence
Research-derived, varies in citation quality. The dataset's value is in its breadth, but individual entries inherit the uncertainty of whichever upstream feed the authors aggregated. Lower than direct-disclosure (1.0) and curated WalletExplorer (0.85) but useful as corroboration for OFAC / OpenSanctions / WatchYourBack overlap.
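One way to picture how a 0.7 source acts as corroboration is to rank all label hits for an address by source confidence and treat the lower-ranked ones as supporting evidence. The sketch below assumes that rule; the aggregator's real logic is not described on this page. The source keys and the rank_labels helper are hypothetical, and only the three confidence values quoted above are used.

```python
from typing import List, Tuple

# Confidence values quoted on this page; the dictionary keys are hypothetical.
SOURCE_CONFIDENCE = {
    "direct-disclosure": 1.0,
    "walletexplorer": 0.85,
    "real-cats": 0.7,
}

Hit = Tuple[str, str]  # (source, category)


def rank_labels(hits: List[Hit]) -> List[Hit]:
    """Sort label hits for one address by source confidence, highest first."""
    return sorted(
        hits,
        key=lambda hit: SOURCE_CONFIDENCE.get(hit[0], 0.0),  # unknown sources rank last
        reverse=True,
    )


hits = [
    ("real-cats", "Darknet Market"),
    ("walletexplorer", "Darknet Market"),
]
ranked = rank_labels(hits)
primary, corroborating = ranked[0], ranked[1:]
```

Here the WalletExplorer hit becomes the primary label and the Real-CATS hit is kept as corroboration for the same category.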
License & attribution
Academic redistribution, citation requested. We retain the dataset citation in our docs/SOURCES.md; please cite the original authors if you use this data downstream.
Sample addresses
No rows from this source are loaded yet. Run
python tools/build_labels_db.py --update --only real-cats
to populate labels.db.
See the data live
Every row from this source is queryable through the /address aggregator and the JSON API.
Want this as a feed?
Same data drives the Address Monitoring API: real-time inflow / outflow events on these addresses as they confirm.