---
name: seo-audit
description: >
  Full website SEO audit with a machine-verified data pipeline. Uses Python scripts
  for crawling, hreflang, meta extraction, schema, images, and AI crawlers, then
  compiles the results into a structured report. Use for comprehensive SEO audits.
user-invokable: true
argument-hint: "[domain1] [sitemap1] [domain2] [sitemap2] ..."
allowed-tools:
  - Read
  - Grep
  - Glob
  - Bash
  - Write
  - Agent
---

# Full SEO Audit — Machine-Verified Data Pipeline

## Philosophy

This audit collects all data with Python scripts. AI does not interpret page content.
AI is used only to compile the final report, and every number in the report must be cited from the CSV/JSON data files.

## Process

### Step 1: Setup
```bash
mkdir -p ./reports/crawl-data
```

### Step 2: Machine Data Collection (per site, run in parallel)

For each site, run these Python scripts via Bash:

1. **Crawl & Broken Links**: `python3 ~/.claude/skills/seo-audit-crawl/scripts/crawl.py [domain] [sitemap] ./reports/crawl-data`
2. **Hreflang**: `python3 ~/.claude/skills/seo-audit-hreflang/scripts/hreflang.py [domain] [sitemap] ./reports/crawl-data`
3. **Meta Tags**: `python3 ~/.claude/skills/seo-audit-meta/scripts/meta.py [domain] [sitemap] ./reports/crawl-data`
4. **Schema**: `python3 ~/.claude/skills/seo-audit-schema/scripts/schema.py [domain] [sitemap] ./reports/crawl-data`
5. **Images**: `python3 ~/.claude/skills/seo-audit-images/scripts/images.py [domain] [sitemap] ./reports/crawl-data`
6. **AI Crawlers**: `python3 ~/.claude/skills/seo-audit-ai-crawlers/scripts/ai_crawlers.py [site-url] ./reports/crawl-data`
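
The six commands above could be launched together from a small Python driver. This is a minimal sketch, not part of the skill itself: `build_commands` and `run_site` are hypothetical helper names, and it assumes "in parallel" means running the scripts for one site concurrently and that `[site-url]` is passed in separately from `[domain]`.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor

SKILLS_DIR = os.path.expanduser("~/.claude/skills")
OUT_DIR = "./reports/crawl-data"

# (skill directory suffix, script file name, whether it takes a sitemap argument)
SCRIPTS = [
    ("crawl", "crawl.py", True),
    ("hreflang", "hreflang.py", True),
    ("meta", "meta.py", True),
    ("schema", "schema.py", True),
    ("images", "images.py", True),
    ("ai-crawlers", "ai_crawlers.py", False),
]


def build_commands(domain, sitemap, site_url, out_dir=OUT_DIR):
    """Return one argv list per Step 2 script for a single site."""
    commands = []
    for suffix, script, takes_sitemap in SCRIPTS:
        path = os.path.join(SKILLS_DIR, f"seo-audit-{suffix}", "scripts", script)
        # The AI crawlers script takes a site URL and no sitemap.
        args = [domain, sitemap, out_dir] if takes_sitemap else [site_url, out_dir]
        commands.append(["python3", path, *args])
    return commands


def run_site(domain, sitemap, site_url, max_workers=6):
    """Run all scripts for one site concurrently; return {script path: exit code}."""
    commands = build_commands(domain, sitemap, site_url)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        codes = pool.map(lambda cmd: subprocess.run(cmd).returncode, commands)
        return {cmd[1]: code for cmd, code in zip(commands, codes)}
```

Calling `run_site("example.com", "https://example.com/sitemap.xml", "https://example.com")` would start all six scripts and report each exit code, so a non-zero code can be surfaced in the report instead of silently missing data.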

### Step 3: Analytics (if MCP servers available)
- **Google Search Console**: use `mcp__gsc__get_performance_overview`, `mcp__gsc__get_search_analytics`
- **Google Analytics 4**: use `mcp__analytics-mcp__run_report`

### Step 4: Keywords (if keyword tracking CSV available)
- Read and analyze keyword ranking exports (e.g., from Mangools, Ahrefs, SEMrush)

### Step 5: Performance (if Chrome DevTools MCP available)
- Run Lighthouse audits and performance traces on key pages

### Step 6: Compile Report
- Read all CSV/JSON data files
- Generate a structured HTML or Markdown report
- All claims must cite the source CSV/JSON data
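
One way to keep every claim tied to its data file is to compute the number in code and carry the file name alongside it. The sketch below is illustrative only: the file name `crawl.csv` and the `url` and `status` columns are assumptions, since the actual output schema of the scripts is not described here.

```python
import csv
import io


def count_broken_links(csv_file, source_name):
    """Count rows with an HTTP status of 400 or higher and cite the source file."""
    rows = list(csv.DictReader(csv_file))
    broken = sum(1 for row in rows if int(row["status"]) >= 400)
    return {
        "claim": f"{broken} of {len(rows)} crawled URLs returned an error status",
        "source": source_name,
    }


# In-memory stand-in for a crawl output file (hypothetical columns).
sample = io.StringIO("url,status\n/a,200\n/b,404\n/c,500\n")
finding = count_broken_links(sample, "crawl.csv")
print(f'{finding["claim"]} (source: {finding["source"]})')
```

Because the sentence is generated from the rows themselves, the reported figure cannot drift from the data it cites.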

## Key Principles
1. **Machine-first**: Python scripts, not AI agents, collect the factual data
2. **CSV pipeline**: Each script outputs CSV/JSON, and the report is built from these files
3. **AI writes prose only**: Report compilation must cite numbers from the CSV/JSON files, never invent them
4. **Fact-check**: Verify key claims against source data after report generation

## Crawl Configuration

```
Max pages: 500
Respect robots.txt: Yes
Follow redirects: Yes (max 3 hops)
Timeout per page: 30 seconds
Concurrent requests: 5
Delay between requests: 1 second
```
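
These limits could be represented in Python roughly as follows. This is a sketch under assumptions: `CrawlConfig` and `select_urls` are hypothetical names, not the interface of the crawl scripts, and only the robots.txt and page-cap settings are applied here. It uses the standard library's `urllib.robotparser` to filter URLs.

```python
from dataclasses import dataclass
from urllib.robotparser import RobotFileParser


@dataclass(frozen=True)
class CrawlConfig:
    max_pages: int = 500
    respect_robots: bool = True
    max_redirect_hops: int = 3
    timeout_seconds: int = 30
    concurrent_requests: int = 5
    delay_seconds: float = 1.0


def select_urls(urls, robots_lines, config=CrawlConfig(), user_agent="*"):
    """Drop URLs disallowed by robots.txt, then cap the list at max_pages."""
    urls = list(urls)
    if config.respect_robots:
        parser = RobotFileParser()
        parser.parse(robots_lines)  # lines of an already-fetched robots.txt
        urls = [u for u in urls if parser.can_fetch(user_agent, u)]
    return urls[: config.max_pages]
```

Filtering before applying the cap means blocked paths do not consume part of the 500-page budget.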

## Scoring Weights

| Category | Weight |
|----------|--------|
| Technical SEO | 22% |
| Content Quality | 23% |
| On-Page SEO | 20% |
| Schema / Structured Data | 10% |
| Performance (CWV) | 10% |
| AI Search Readiness | 10% |
| Images | 5% |
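
The weights sum to 100%, so the overall score can be computed as a weighted average of per-category scores. The sketch below assumes each category is scored 0-100 and that all seven are present. How a category that could not be measured should be handled is not specified here, so this version simply raises an error.

```python
WEIGHTS = {
    "Technical SEO": 22,
    "Content Quality": 23,
    "On-Page SEO": 20,
    "Schema / Structured Data": 10,
    "Performance (CWV)": 10,
    "AI Search Readiness": 10,
    "Images": 5,
}


def overall_score(category_scores):
    """Weighted average of per-category scores, each on a 0-100 scale."""
    missing = set(WEIGHTS) - set(category_scores)
    if missing:
        raise ValueError(f"missing category scores: {sorted(missing)}")
    total = sum(WEIGHTS[name] * category_scores[name] for name in WEIGHTS)
    return total / sum(WEIGHTS.values())


# Example: every category at 100 except Technical SEO at 80.
scores = {name: 100 for name in WEIGHTS}
scores["Technical SEO"] = 80
print(overall_score(scores))
```

With these inputs the 20-point shortfall in a 22% category lowers the overall score by 4.4 points.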

## Report Structure

### Executive Summary
- Overall SEO Health Score (0-100)
- Business type detected
- Top 5 critical issues
- Top 5 quick wins

### Technical SEO
- Crawlability issues
- Indexability problems
- Security concerns
- Core Web Vitals status

### Content Quality
- E-E-A-T assessment
- Thin content pages
- Duplicate content issues
- Readability scores

### On-Page SEO
- Title tag issues
- Meta description problems
- Heading structure
- Internal linking gaps

### Schema & Structured Data
- Current implementation
- Validation errors
- Missing opportunities

### Performance
- LCP, INP, CLS scores
- Resource optimization needs
- Third-party script impact

### Images
- Missing alt text
- Oversized images
- Format recommendations

### AI Search Readiness
- Citability score
- Structural improvements
- Authority signals

## Priority Definitions

- **Critical**: Blocks indexing or causes penalties (fix immediately)
- **High**: Significantly impacts rankings (fix within 1 week)
- **Medium**: Optimization opportunity (fix within 1 month)
- **Low**: Nice to have (backlog)

## Error Handling

| Scenario | Action |
|----------|--------|
| URL unreachable | Report the error clearly. Do not guess site content. |
| robots.txt blocks crawling | Report which paths are blocked. Analyze only accessible pages. |
| Rate limiting (429 responses) | Back off and reduce concurrent requests. Report partial results. |
| Timeout on large sites (500+ pages) | Cap the crawl at the 500-page limit. Report findings for pages crawled. |
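
The rate-limiting row can be sketched as a small policy function. The doubling delay, the 60-second ceiling, and the halving of concurrency are assumptions chosen for illustration. The table only says to back off and reduce concurrent requests. A fuller version might also honor a `Retry-After` response header, which this sketch ignores.

```python
def backoff_delay(attempt, base_delay=1.0, max_delay=60.0):
    """Seconds to wait before retry number `attempt` (0-based), doubling each time."""
    return min(base_delay * (2 ** attempt), max_delay)


def reduced_concurrency(current):
    """Halve the number of concurrent requests, never dropping below one."""
    return max(1, current // 2)


def handle_response(status, attempt, concurrency):
    """Return (delay_seconds, new_concurrency) for a given response status."""
    if status == 429:
        return backoff_delay(attempt), reduced_concurrency(concurrency)
    return 0.0, concurrency
```

Starting from the configured 5 concurrent requests, a first 429 would drop concurrency to 2 and wait 1 second. Repeated 429s keep doubling the wait up to the ceiling.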
