Real-CATS

awaiting build

Academic Bitcoin anti-money-laundering (AML) dataset with ~90K labeled criminal addresses.

Rows in labels.db: 0

Confidence: 0.7

Refresh cadence: On academic release (irregular)

Schema: flat

About this source

Real-CATS is the Real Criminal Addresses for Targeted Studies dataset, published as part of an academic Bitcoin AML study. It collects ~90K Bitcoin addresses associated with criminal activity (Ransomware, Blackmail Scam, Darknet Market, Mixer, Theft, Tumbler), each with a category label suitable for machine-learning training. We ingest only the criminal addresses (CB.tsv). The matched benign sample (BB.tsv, 90K generic clean addresses) is intentionally not loaded, because blanket benign labels would add noise, not signal, to /address attribution.
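The ingest rule above can be sketched as a file filter plus a category check. This is a minimal sketch, not our loader: it assumes both TSV files sit at the top level of a local checkout, and that the category strings inside CB.tsv match the six names listed on this page (the exact spellings in the file are an assumption). `files_to_ingest` and `is_known_category` are hypothetical helper names.

```python
from pathlib import Path

# Category names as listed on this page; the exact strings used inside
# CB.tsv are an assumption and may differ in case or spacing.
KNOWN_CATEGORIES = {
    "Ransomware",
    "Blackmail Scam",
    "Darknet Market",
    "Mixer",
    "Theft",
    "Tumbler",
}

# Only the criminal-address file is ingested; BB.tsv (benign sample) is skipped.
INGESTED_FILES = {"CB.tsv"}


def files_to_ingest(repo_dir: Path) -> list[Path]:
    """Return the top-level TSV files in a Real-CATS checkout that should be loaded."""
    return sorted(p for p in repo_dir.glob("*.tsv") if p.name in INGESTED_FILES)


def is_known_category(label: str) -> bool:
    """Case-insensitive, whitespace-tolerant check against the category list above."""
    normalized = label.strip().casefold()
    return any(normalized == c.casefold() for c in KNOWN_CATEGORIES)
```

Keeping the allowed file names in an explicit set makes the "CB.tsv only" decision visible in one place, so loading BB.tsv later would be a deliberate change rather than a side effect of globbing.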

How we got the data

We clone the sjdseu/Real-CATS repository from GitHub and parse CB.tsv at load time. Each row's category is preserved and shown as the badge on /address pages.
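Parsing CB.tsv while keeping each row's category might look like the sketch below. It uses only the standard `csv` module; the column positions (address first, category second) and the presence of a header row are assumptions about the file layout, exposed as parameters so they can be adjusted. `parse_cb_tsv` and `LabeledAddress` are hypothetical names, not part of the dataset or our tooling.

```python
import csv
from typing import Iterable, Iterator, NamedTuple


class LabeledAddress(NamedTuple):
    address: str
    category: str  # kept per row so it can be shown as the /address badge


def parse_cb_tsv(
    lines: Iterable[str],
    address_col: int = 0,
    category_col: int = 1,
    has_header: bool = True,
) -> Iterator[LabeledAddress]:
    """Yield (address, category) pairs from CB.tsv content.

    Column positions and the header row are assumptions about the
    real file; adjust them to match the actual CB.tsv.
    """
    reader = csv.reader(lines, delimiter="\t")
    if has_header:
        next(reader, None)  # skip the header row, if present
    for row in reader:
        if len(row) <= max(address_col, category_col):
            continue  # skip blank or truncated lines
        address = row[address_col].strip()
        category = row[category_col].strip()
        if address:
            yield LabeledAddress(address, category)
```

In use, pass an open file handle such as `open(path, newline="", encoding="utf-8")` so the `csv` module handles line endings itself.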

Why this confidence

Research-derived, with citation quality that varies from entry to entry. The dataset's value is its breadth, but each entry inherits the uncertainty of whichever upstream feed the authors aggregated it from. That places it below direct disclosure (1.0) and curated WalletExplorer (0.85), but it remains useful as corroboration when an address also appears in OFAC, OpenSanctions, or WatchYourBack.
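The corroboration use case amounts to a set overlap: a Real-CATS address carries more weight when another source lists it too. A minimal sketch, with source names as plain strings and in-memory sets standing in for the label store (both assumptions). `corroborated` is a hypothetical helper; it only finds the overlap and does not reproduce how the aggregator combines confidence scores.

```python
def corroborated(
    real_cats: set[str], other_sources: dict[str, set[str]]
) -> dict[str, list[str]]:
    """Map each Real-CATS address to the other sources that also list it.

    Addresses with no overlap are omitted from the result.
    """
    hits: dict[str, list[str]] = {}
    for addr in sorted(real_cats):
        sources = sorted(
            name for name, addrs in other_sources.items() if addr in addrs
        )
        if sources:
            hits[addr] = sources
    return hits
```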

License & attribution

Academic redistribution; citation requested. We keep the dataset citation in docs/SOURCES.md; please cite the original authors if you use this data downstream.

Sample addresses

labels.db hasn't been built with this source yet. Run python tools/build_labels_db.py --update --only real-cats to populate it.
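After that build, one way to confirm the source landed is to count its rows directly. This sketch assumes labels.db is a SQLite file with a table named `labels` and a `source` column holding `real-cats`; the table and column names are hypothetical, since this page only says the schema is flat. `count_source_rows` is likewise an illustrative name.

```python
import sqlite3


def count_source_rows(db_path: str, source: str = "real-cats") -> int:
    """Count rows contributed by one source.

    Assumes a `labels` table with a `source` column; both names are
    hypothetical and may not match the real labels.db schema.
    """
    conn = sqlite3.connect(db_path)
    try:
        (count,) = conn.execute(
            "SELECT COUNT(*) FROM labels WHERE source = ?", (source,)
        ).fetchone()
    finally:
        conn.close()  # sqlite3's context manager would not close the connection
    return count
```

A result of 0 after running the build command would match the "Rows in labels.db" figure shown above and suggest the source was not loaded.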

See the data live

Every row from this source is queryable through the /address aggregator and the JSON API.


Want this as a feed?

The same data drives the Address Monitoring API: real-time inflow and outflow events on these addresses as transactions confirm.
