protectskills/MaliciousAgentSkillsBench

53 stars · Last commit 2026-05-30

A Security Benchmark for Claude Code Agent Skills

# "Do Not Mention This to the User": Detecting and Understanding Malicious Agent Skills

![License: MIT](https://img.shields.io/badge/license-MIT-green.svg)

This repository contains a security benchmark dataset and evaluation framework for Claude Code Agent Skills. The paper reports a **three-tiered, nested** dataset: a snapshot of **98,380 skills** (Tier 1) from two major platforms (skills.rest and skillsmp.com), of which **4,287 are statically flagged suspicious candidates** (Tier 2), of which **157 are behaviorally confirmed malicious skills** (Tier 3). The 157 confirmed skills are a verified **subset of** the 4,287 candidates, not a separate group, and the candidates are themselves a subset of the 98,380-skill snapshot.
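
The nesting described above can be made concrete with a small sketch. It is not part of the repository's code: `TierCounts`, `is_nested`, and `tier_rates` are hypothetical names introduced here for illustration, and the full snapshot is treated as the outermost tier. Only the three counts come from this README; the percentages are simple arithmetic on those counts, not figures quoted from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierCounts:
    """Skill counts per tier, as stated in this README (hypothetical container)."""
    snapshot: int    # all crawled skills (outermost tier)
    candidates: int  # statically flagged suspicious candidates (Tier 2)
    confirmed: int   # behaviorally confirmed malicious skills (Tier 3)


def is_nested(snapshot_ids: set, candidate_ids: set, confirmed_ids: set) -> bool:
    """Return True when confirmed is a subset of candidates, which is a subset of snapshot."""
    return confirmed_ids <= candidate_ids <= snapshot_ids


def tier_rates(counts: TierCounts) -> dict:
    """Percentage of each tier relative to a parent tier, rounded to 2 decimals."""
    return {
        "flagged_of_snapshot_pct": round(100 * counts.candidates / counts.snapshot, 2),
        "confirmed_of_flagged_pct": round(100 * counts.confirmed / counts.candidates, 2),
        "confirmed_of_snapshot_pct": round(100 * counts.confirmed / counts.snapshot, 2),
    }


# Counts taken from the README text above.
REPORTED = TierCounts(snapshot=98_380, candidates=4_287, confirmed=157)
RATES = tier_rates(REPORTED)

if __name__ == "__main__":
    for name, value in RATES.items():
        print(f"{name}: {value}%")
```

Because the tiers are nested, the confirmed skills should never be added to the candidate count; a check like `is_nested` on the actual skill identifiers (whatever column holds them in the CSV files) is one way to guard against double counting.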

## Project Structure

```
MaliciousAgentSkillsBench/
├── data/                           # Benchmark datasets
│   ├── malicious_skills.csv        # 157 malicious skill samples
│   └── skills_dataset.csv          # Ecosystem snapshot; see Data section
├── code/                           # Security analysis framework
│   ├── helper.py                   # Interactive reproduction CLI (main entry point)
│   ├── analyzer/                   # Optional LLM-assisted triage
│   ├── crawler/                    # Multi-platform data crawler (registry crawler)
│   ├── executor/                   # Dynamic execution in Docker sandbox (behavioral verification harness)
│   ├── scanner/                    # Static rule-based security scanner (static analysis rules)
│   ├── analysis/                   # RQ2 statistics: taxonomy counts + co-occurrence + hypothesis tests
```
