peteromallet/dataclaw

2,086 stars · Last commit 2026-05-30

Agent harness to publish your history from Claude Code and other coding agents as Hugging Face datasets.

README preview

# DataClaw

> **This is a performance art project.** Anthropic built their models on the world's freely shared information, then introduced increasingly [dystopian data policies](https://www.anthropic.com/news/detecting-and-preventing-distillation-attacks) to stop anyone else from doing the same with their data - pulling up the ladder behind them. DataClaw lets you throw the ladder back down. The dataset it produces is yours to share.

Turn your Claude Code, Codex, and other coding-agent conversation history into structured data and publish it to Hugging Face with a single command. DataClaw parses session logs, redacts secrets and PII, and uploads the result as a ready-to-use dataset.
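The parse-and-redact step described above can be sketched roughly as follows. This is a minimal illustration, not DataClaw's actual implementation: the JSONL layout with `role` and `content` fields, the regex patterns, and the helper names `redact` and `parse_session` are all assumptions made for the example. Real session logs and real secret detection are likely more involved.

```python
import json
import re

# Hypothetical patterns for illustration only; a real redactor would need
# a much broader and more carefully tested set of rules.
SECRET_PATTERNS = [
    (re.compile(r"sk-[A-Za-z0-9_-]{20,}"), "[REDACTED_API_KEY]"),
    (re.compile(r"hf_[A-Za-z0-9]{20,}"), "[REDACTED_HF_TOKEN]"),
    (re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"), "[REDACTED_EMAIL]"),
]


def redact(text: str) -> str:
    """Replace every match of each pattern with its placeholder."""
    for pattern, placeholder in SECRET_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


def parse_session(jsonl_text: str) -> list[dict]:
    """Parse an assumed JSONL session log into redacted message records.

    Lines that are blank, not valid JSON, not JSON objects, or that lack
    a string "content" field are skipped.
    """
    records = []
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(event, dict):
            continue
        content = event.get("content")
        if isinstance(content, str):
            records.append(
                {"role": event.get("role", "unknown"), "content": redact(content)}
            )
    return records
```

Calling `parse_session` on the text of a log file returns a list of dictionaries that could then be written out as a dataset file. Pattern-based redaction of this kind tends to miss secrets it has no rule for, so reviewing the output before publishing is still advisable.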

![DataClaw](dataclaw.jpeg)

Every export is tagged **`dataclaw`** on Hugging Face. Together, they may someday form a growing [distributed dataset](https://huggingface.co/datasets?other=dataclaw) of real-world human-AI coding collaboration.

## Download for Mac

<p align="center">
  <a href="https://github.com/peteromallet/dataclaw/releases/latest/download/DataClaw-macOS-Apple-Silicon.dmg">
    <img alt="Download DataClaw for Apple Silicon Macs" src="https://img.shields.io/badge/Download%20for%20Mac-Apple%20Silicon-111111?style=for-the-badge&logo=apple&logoColor=white">
  </a>
  <a href="https://github.com/peteromallet/dataclaw/releases/latest">
    <img alt="View GitHub Releases" src="https://img.shields.io/badge/View%20Releases-GitHub-0969da?style=for-the-badge&logo=github&logoColor=white">
  </a>
</p>
