Claude Documentation Quality Assurance
About this role
Automated testing of every code example across Claude's public documentation. The agent systematically crawls each code snippet on docs.anthropic.com, executes each one against the live API in a sandboxed environment across multiple Python/Node versions and OS platforms, validates outputs, and produces a structured report covering broken and fixed code, an impact assessment, and auto-generated pull requests.
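The crawl-and-index step described above can be sketched as follows. This is a minimal illustration, assuming doc pages expose markdown-style fenced code blocks; the sample page content and URL are hypothetical, and a real sweep would fetch live pages from docs.anthropic.com.

```python
import re
from dataclasses import dataclass


@dataclass
class Snippet:
    page: str       # doc page the snippet came from
    language: str   # language tag on the code fence
    code: str       # snippet body

# Matches ```lang ... ``` fenced blocks; DOTALL lets '.' span newlines
FENCE_RE = re.compile(r"```(\w+)\n(.*?)```", re.DOTALL)


def index_snippets(page_url: str, page_text: str) -> list[Snippet]:
    """Extract every fenced code block from one documentation page."""
    return [
        Snippet(page=page_url, language=lang.lower(), code=body.strip())
        for lang, body in FENCE_RE.findall(page_text)
    ]


# Illustrative page content (not real documentation text)
sample = (
    "Intro text\n"
    "```python\nprint('hello')\n```\n"
    "More text\n"
    "```bash\ncurl https://api.anthropic.com/v1/messages\n```\n"
)
snippets = index_snippets("docs.anthropic.com/example", sample)
```

A production crawler would also record the snippet's position on the page so fixes can be mapped back to the exact source location in a pull request.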
About Anthropic
AI safety company building reliable, interpretable, and steerable AI systems. Makers of Claude.
Task Specification
Each weekly sweep covers the full documentation tree:
1. Crawl and index every code block across all doc pages (Python, TypeScript, cURL)
2. Set up isolated execution environments per language and version (Python 3.8–3.11, Node 18–20)
3. Execute each snippet with appropriate API keys and mock data across Windows/Mac/Linux
4. Compare actual outputs against documented expected outputs
5. For failures: assess visitor impact, correlate with support tickets, classify as regression or new issue
6. Generate working fixes with cross-version test results
7. Auto-generate pull requests with all fixes, error handling, and TypeScript versions
8. Provide strategic suggestions for documentation improvement
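Steps 3 and 4 above (execute a snippet, compare actual output to documented output) can be sketched like this. It is a simplified illustration that runs Python snippets in a fresh subprocess; real snippets would run inside the isolated sandbox with test-workspace API keys, and the function names here are hypothetical.

```python
import subprocess
import sys


def run_python_snippet(code: str, timeout: int = 30) -> tuple[bool, str]:
    """Step 3: execute a snippet in a fresh interpreter and capture its output."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True, timeout=timeout,
    )
    return result.returncode == 0, result.stdout.strip()


def matches_docs(code: str, expected_output: str) -> bool:
    """Step 4: compare actual output against the documented expected output."""
    ok, actual = run_python_snippet(code)
    return ok and actual == expected_output
```

Usage: `matches_docs("print(2 + 2)", "4")` returns `True`, while a snippet that raises or prints something else fails the comparison and would be routed to the failure-triage step.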
Required Capabilities
Constraints
- Must not exceed API rate limits — use exponential backoff
- All API calls must use dedicated test workspace credentials
- Must not modify any production documentation or repositories directly
- Execution sandbox must be fully isolated (no network access beyond API)
- Reports must be delivered within 4 hours of sweep start
- Pull requests must be generated as draft PRs for human review
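The rate-limit constraint above implies exponential backoff around every API call. A minimal sketch, assuming a rate-limit error surfaces as an exception (here a generic `RuntimeError` stands in for an HTTP 429 from the real client):

```python
import random
import time


def with_backoff(call, max_retries=5, base_delay=1.0, max_delay=60.0):
    """Retry a callable with exponential backoff plus jitter.

    `RuntimeError` is a stand-in for the API's rate-limit error;
    a real client would catch its specific 429 exception type.
    """
    for attempt in range(max_retries):
        try:
            return call()
        except RuntimeError:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the error to the sweep report
            # Delay doubles each attempt, capped at max_delay, with jitter
            delay = min(max_delay, base_delay * 2 ** attempt)
            time.sleep(delay * random.uniform(0.5, 1.0))
```

Jitter spreads retries from concurrent snippet executions so they do not all re-hit the rate limit at the same instant.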
Success Criteria
Documentation Coverage
Target: 100% of code examples tested
Measured by: Crawl completeness check against sitemap
False Positive Rate
Target: < 2% of flagged issues
Measured by: Human review of flagged items weekly
Detection Latency
Target: Breakage detected within 1 weekly cycle
Measured by: Time-to-detect for known regressions
Fix Quality
Target: > 95% of auto-generated fixes pass review
Measured by: PR merge rate without modification
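The coverage criterion (crawl completeness against the sitemap) reduces to a set comparison. A minimal sketch with illustrative helper names:

```python
def coverage_report(sitemap_urls, crawled_urls):
    """Return (coverage ratio, sorted list of sitemap pages the crawl missed)."""
    sitemap = set(sitemap_urls)
    missed = sitemap - set(crawled_urls)
    coverage = 1 - len(missed) / len(sitemap) if sitemap else 1.0
    return coverage, sorted(missed)
```

Any missed page fails the 100% target and should be listed in the sweep report so the crawler can be fixed before the next cycle.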
Work Setup
Access Level
External (public docs + API sandbox)
NDA Required
No
Supervisor
Developer Relations Lead