Web App Testing

Name: Web App Testing
Author: Anthropic

Drives reliable end-to-end and integration testing of web apps, with patterns for flake-free assertions and setup.

View on GitHub Read SKILL.md

anthropics/skills 2026-06-08

145,091 GitHub stars

17,084 Forks

2026-05-29 Updated

Other License

The full SKILL.md

Synced June 1, 2026 — view latest on GitHub

---
name: webapp-testing
description: Toolkit for interacting with and testing local web applications using Playwright. Supports verifying frontend functionality, debugging UI behavior, capturing browser screenshots, and viewing browser logs.
license: Complete terms in LICENSE.txt
---

# Web Application Testing

To test local web applications, write native Python Playwright scripts.

**Helper Scripts Available**:
- `scripts/with_server.py` - Manages server lifecycle (supports multiple servers)

**Always run scripts with `--help` first** to see usage. DO NOT read the source until you try running the script first and find that a customized solution is abslutely necessary. These scripts can be very large and thus pollute your context window. They exist to be called directly as black-box scripts rather than ingested into your context window.

## Decision Tree: Choosing Your Approach

```
User task → Is it static HTML?
    ├─ Yes → Read HTML file directly to identify selectors
    │         ├─ Success → Write Playwright script using selectors
    │         └─ Fails/Incomplete → Treat as dynamic (below)
    │
    └─ No (dynamic webapp) → Is the server already running?
        ├─ No → Run: python scripts/with_server.py --help
        │        Then use the helper + write simplified Playwright script
        │
        └─ Yes → Reconnaissance-then-action:
            1. Navigate and wait for networkidle
            2. Take screenshot or inspect DOM
            3. Identify selectors from rendered state
            4. Execute actions with discovered selectors
```

## Example: Using with_server.py

To start a server, run `--help` first, then use the helper:

**Single server:**
```bash
python scripts/with_server.py --server "npm run dev" --port 5173 -- python your_automation.py
```

**Multiple servers (e.g., backend + frontend):**
```bash
python scripts/with_server.py \
  --server "cd backend && python server.py" --port 3000 \
  --server "cd frontend && npm run dev" --port 5173 \
  -- python your_automation.py
```

To create an automation script, include only Playwright logic (servers are managed automatically):
```python
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True) # Always launch chromium in headless mode
    page = browser.new_page()
    page.goto('http://localhost:5173') # Server already running and ready
    page.wait_for_load_state('networkidle') # CRITICAL: Wait for JS to execute
    # ... your automation logic
    browser.close()
```

## Reconnaissance-Then-Action Pattern

1. **Inspect rendered DOM**:
   ```python
   page.screenshot(path='/tmp/inspect.png', full_page=True)
   content = page.content()
   page.locator('button').all()
   ```

2. **Identify selectors** from inspection results

3. **Execute actions** using discovered selectors

## Common Pitfall

❌ **Don't** inspect the DOM before waiting for `networkidle` on dynamic apps
✅ **Do** wait for `page.wait_for_load_state('networkidle')` before inspection

## Best Practices

- **Use bundled scripts as black boxes** - To accomplish a task, consider whether one of the scripts available in `scripts/` can help. These scripts handle common, complex workflows reliably without cluttering the context window. Use `--help` to see usage, then invoke directly. 
- Use `sync_playwright()` for synchronous scripts
- Always close the browser when done
- Use descriptive selectors: `text=`, `role=`, CSS selectors, or IDs
- Add appropriate waits: `page.wait_for_selector()` or `page.wait_for_timeout()`

## Reference Files

- **examples/** - Examples showing common patterns:
  - `element_discovery.py` - Discovering buttons, links, and inputs on a page
  - `static_html_automation.py` - Using file:// URLs for local HTML
  - `console_logging.py` - Capturing console logs during automation

Install

Add Web App Testing to your agent

Pick your tool, then drop the file in or run the one-line fetch command.

1Drop this in

Project: .cursor/skills/webapp-testing.md

2Or fetch it from the repo

curl -fsSL https://raw.githubusercontent.com/anthropics/skills/main/skills/webapp-testing/SKILL.md -o .cursor/skills/webapp-testing.md

Restart Cursor. The agent now follows this skill on every relevant task.

1Drop this in

User-level: ~/.claude/skills/webapp-testing/SKILL.md

2Or fetch it from the repo

mkdir -p ~/.claude/skills/webapp-testing && curl -fsSL https://raw.githubusercontent.com/anthropics/skills/main/skills/webapp-testing/SKILL.md -o ~/.claude/skills/webapp-testing/SKILL.md

Claude Code auto-discovers skills in ~/.claude/skills/.

1Drop this in

Project: AGENTS.md (append the SKILL contents)

2Or fetch it from the repo

curl -fsSL https://raw.githubusercontent.com/anthropics/skills/main/skills/webapp-testing/SKILL.md >> AGENTS.md

Codex CLI reads AGENTS.md automatically from the project root.

1Drop this in

Project: .windsurf/rules/webapp-testing.md

2Or fetch it from the repo

mkdir -p .windsurf/rules && curl -fsSL https://raw.githubusercontent.com/anthropics/skills/main/skills/webapp-testing/SKILL.md -o .windsurf/rules/webapp-testing.md

Windsurf loads project rules on every Cascade run.

1Drop this in

Project: .github/copilot-instructions.md (append)

2Or fetch it from the repo

mkdir -p .github && curl -fsSL https://raw.githubusercontent.com/anthropics/skills/main/skills/webapp-testing/SKILL.md >> .github/copilot-instructions.md

Copilot reads .github/copilot-instructions.md as project-wide context.

1Drop this in

Project: .gemini/skills/webapp-testing.md

2Or fetch it from the repo

mkdir -p .gemini/skills && curl -fsSL https://raw.githubusercontent.com/anthropics/skills/main/skills/webapp-testing/SKILL.md -o .gemini/skills/webapp-testing.md

Gemini CLI auto-loads project skills on the next run.