peteromallet/dataclaw

2,046 stars · Last commit 2026-04-03

Agent harness to publish your history from Claude Code et al. as Huggingface datasets.

README preview

# DataClaw

> **This is a performance art project.** Anthropic built their models on the world's freely shared information, then introduced increasingly [dystopian data policies](https://www.anthropic.com/news/detecting-and-preventing-distillation-attacks) to stop anyone else from doing the same with their data - pulling up the ladder behind them. DataClaw lets you throw the ladder back down. The dataset it produces is yours to share.

Turn your Claude Code, Codex, and other coding-agent conversation history into structured data and publish it to Hugging Face with a single command. DataClaw parses session logs, redacts secrets and PII, and uploads the result as a ready-to-use dataset.

![DataClaw](dataclaw.jpeg)

Every export is tagged **`dataclaw`** on Hugging Face. Together, they may someday form a growing [distributed dataset](https://huggingface.co/datasets?other=dataclaw) of real-world human-AI coding collaboration.

## Give this to your agent

Paste this into Claude Code, Codex, or any coding agent:

```
Help me export my Claude Code, Codex, and other coding-agent conversation history to Hugging Face using DataClaw.
Install it, then walk me through the process.

STEP 1 - INSTALL
  pip install -U dataclaw

View full repository on GitHub →