ajobi-uhc/seer

146 stars · Last commit 2026-02-08

Designed for interpretability researchers who want to do research on or with interpretability agents. It adds quality-of-life improvements and fixes some of the annoyances of using only Claude Code out of the box.

README preview

# Seer
A small, hackable library that makes it easier to do interpretability work with agents.

#### [Docs](https://ajobi-uhc.github.io/seer/)  
#### [Markdown docs for LLM](https://raw.githubusercontent.com/ajobi-uhc/seer/main/docs/llm-context.md)


## What is Seer?
Seer is a library for interpretability researchers who want to do research on or with agents. It simplifies use cases such as creating environments for agents, equipping an agent with your technique, and building on papers, and it fixes some of the annoyances of using Claude Code out of the box.

The core mechanism: you specify an environment (GitHub repos, files, dependencies), Seer launches it as a sandbox on Modal (GPU or CPU), and an agent operates within it via an IPython kernel.
With this setup you can see what the agent is doing as it runs, the agent can iteratively fix bugs and adjust its work, and you can spin up many sandboxes in parallel.
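
To make the mechanism above concrete, here is a minimal sketch that uses only the Python standard library. It does not use Seer or Modal: `EnvSpec`, `ToyKernel`, and `run_agent` are hypothetical names invented for illustration, and the "kernel" is a shared `exec` namespace standing in for a real IPython kernel. It shows why a persistent kernel lets an agent recover from a failed cell without losing earlier state, and how several independent sandboxes could run side by side.

```python
import traceback
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class EnvSpec:
    """Hypothetical environment description (not Seer's real schema)."""
    repos: list = field(default_factory=list)
    pip_packages: list = field(default_factory=list)
    gpu: Optional[str] = None  # None means CPU only


class ToyKernel:
    """Stand-in for an IPython kernel: all cells share one namespace."""

    def __init__(self, spec: EnvSpec):
        self.spec = spec
        self.namespace = {}
        self.log = []  # (cell source, "ok" or error text), visible as the run proceeds

    def run_cell(self, source: str) -> bool:
        try:
            exec(source, self.namespace)
            self.log.append((source, "ok"))
            return True
        except Exception:
            # A failed cell is recorded, but earlier variables survive.
            self.log.append((source, traceback.format_exc(limit=1)))
            return False


def run_agent(spec: EnvSpec, cells: list) -> ToyKernel:
    """Run a scripted 'agent' (a fixed list of cells) in a fresh kernel."""
    kernel = ToyKernel(spec)
    for cell in cells:
        kernel.run_cell(cell)
    return kernel


spec = EnvSpec(repos=["https://github.com/ajobi-uhc/seer"])
cells = [
    "xs = [1, 2, 3]",
    "total = sum(x for x in xz)",  # typo raises NameError
    "total = sum(x for x in xs)",  # the fix reuses xs from the first cell
]

# Three independent kernels, run in parallel threads.
with ThreadPoolExecutor(max_workers=3) as pool:
    kernels = list(pool.map(lambda _: run_agent(spec, cells), range(3)))

for k in kernels:
    print([status == "ok" for _, status in k.log], k.namespace["total"])
```

In a real setup the cells would come from the agent rather than a fixed list, and each kernel would live in its own remote sandbox rather than in a local thread.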

Seer is designed to be extensible: you can build on top of it to support complex techniques that you might want the agent to use, e.g. [giving an agent SAE tools to diff two Gemini checkpoints](https://ajobi-uhc.github.io/seer/experiments/05-checkpoint-diffing/) or [building a Petri-style auditing agent with whitebox tools](https://ajobi-uhc.github.io/seer/experiments/06-petri-harness/).


## When to use Seer
- **Exploratory investigations**: You have a hypothesis about a model's behavior but want to try many variations quickly without manually rerunning notebooks
    - Case study: [Hidden Preference](https://ajobi-uhc.github.io/seer/experiments/03-hidden-preference/) - investigate a model from Cywinski et al. ([link](https://arxiv.org/pdf/2510.01070)) that has been finetuned with a secret preference: it assumes the user it is talking to is female
- **Give agents access to your techniques**: Expose methods from your paper to the agent and measure how well it uses them across runs
