Skills · Testing Review
Review skill · v1 · 12-point audit

Tests that catch real bugs.

Runs inside your agent. Flags 12 patterns AI tests get wrong by default. Scores the result A–F.

12 audit checks · 6 AI tools supported · 0 dependencies
agent — review-testing live
$ agent run --skill review-testing
▼ test review starting · 12 checks · target: tests/
behavior over implementation PASS
snapshot tests in Modal.test.tsx NO-OP
→ 3 tests assert nothing meaningful
missing error path for /api/user HIGH
keyboard nav covered PASS
focus returns on modal close PASS
mock at internal boundary SMELL
→ vi.mock('./userService') in Profile.test.tsx
score 68 / 100 C
→ 4 anti-patterns flagged. autopatch? [Y/n]
Works with
  • Cursor
  • Claude Code
  • Codex CLI
  • Windsurf
  • GitHub Copilot
  • Gemini CLI
What it catches

The tests that prove nothing.

review-testing / checks SKILL.md · 12 checks · behavior-first
checks/
no-op-tests.diff
− Modal.test.tsx · Before
test('renders without crashing', () => {
  render(<Modal />);
});

test('matches snapshot', () => {
  const { container } = render(<Modal />);
  expect(container).toMatchSnapshot();
});
+ Modal.test.tsx · After
test('shows confirmation when delete is confirmed', async () => {
  render(<DeleteModal item={mockItem} />);
  await userEvent.click(screen.getByRole('button', { name: /delete/i }));
  expect(screen.getByText(/item deleted/i)).toBeInTheDocument();
});
− passes whether code works or not · coverage theatre → + asserts the user-visible outcome
error-paths.diff
− Profile.test.tsx · Before
test('loads user profile', async () => {
  server.use(http.get('/api/user', () =>
    HttpResponse.json(mockUser)));
  render(<Profile />);
  expect(await screen.findByText(mockUser.name)).toBeInTheDocument();
});
+ Profile.test.tsx · After
test('shows error when profile fails to load', async () => {
  server.use(http.get('/api/user', () => HttpResponse.error()));
  render(<Profile />);
  expect(await screen.findByRole('alert'))
    .toHaveTextContent(/failed/i);
});
− only the happy path · 5xx, timeouts, empty: untested → + failure modes covered · alert is asserted
behavior.diff
− Counter.test.tsx · Before
test('calls setState on click', () => {
  const setState = vi.fn();
  handleClick(setState, 'new');
  expect(setState).toHaveBeenCalledWith('new');
});
+ Counter.test.tsx · After
test('increments displayed count when clicked', () => {
  render(<Counter />);
  fireEvent.click(screen.getByRole('button', { name: /increment/i }));
  expect(screen.getByText('Count: 1')).toBeInTheDocument();
});
− couples to setState · breaks on every refactor → + asserts what the user sees · survives refactors
mock-boundary.diff
− setup.ts · Before
// internal-boundary mock hides real failures
vi.mock('../api/userService', () => ({
  fetchUser: vi.fn().mockResolvedValue(mockUser)
}));
+ setup.ts · After
// MSW intercepts at the HTTP layer; the real code path runs
server.use(
  http.get('/api/user/:id', ({ params }) =>
    HttpResponse.json(mockUsers[params.id])
  )
);
− mocks own modules · passes when integration breaks → + intercepts at the network · real code path runs
Benchmarked

Proof, not vibes.

Modal Component brief · Claude Opus 4.7 · with skill vs. without · 2026-04-23
Mutation score: 86 (baseline 41, +45)
No-ops removed: 14 (average deleted per run)
Behavior assertions: 27 (average added per run)
Test grade: 90, A (baseline 68, C; +22)
Score bands: 0–49 · 50–89 · 90–100
Single-run comparison. Same model and prompt, scored on Lighthouse + an internal rubric.
Install

One file. Six tools. Zero ceremony.

One Markdown file, zero dependencies. Pick your tool below.

Cursor

1. Drop this in

Project: .cursor/skills/review-testing.md

2. Or fetch it directly
mkdir -p .cursor/skills && curl -fsSL https://webdeveloper.com/skills/review-testing/SKILL.md -o .cursor/skills/review-testing.md

Restart Cursor. The next time you ask the agent to review your tests, it’ll run the 12-point audit and grade the suite.

Claude Code

1. Drop this in

User-level: ~/.claude/skills/review-testing/SKILL.md

2. Or fetch it directly
mkdir -p ~/.claude/skills/review-testing && curl -fsSL https://webdeveloper.com/skills/review-testing/SKILL.md -o ~/.claude/skills/review-testing/SKILL.md

Claude Code auto-discovers skills in ~/.claude/skills/. Available across every project on this machine.

Codex CLI

1. Drop this in

Project: AGENTS.md (append the SKILL contents)

2. Or fetch it directly
curl -fsSL https://webdeveloper.com/skills/review-testing/SKILL.md >> AGENTS.md

Codex CLI reads AGENTS.md automatically when you run it from the project root.

Windsurf

1. Drop this in

Project: .windsurf/rules/review-testing.md

2. Or fetch it directly
mkdir -p .windsurf/rules && curl -fsSL https://webdeveloper.com/skills/review-testing/SKILL.md -o .windsurf/rules/review-testing.md

Windsurf loads project rules on every Cascade run.

GitHub Copilot

1. Drop this in

Project: .github/copilot-instructions.md (append)

2. Or fetch it directly
mkdir -p .github && curl -fsSL https://webdeveloper.com/skills/review-testing/SKILL.md >> .github/copilot-instructions.md

Copilot reads .github/copilot-instructions.md as project-wide context.

Gemini CLI

1. Drop this in

Project: .gemini/skills/review-testing.md

2. Or fetch it directly
mkdir -p .gemini/skills && curl -fsSL https://webdeveloper.com/skills/review-testing/SKILL.md -o .gemini/skills/review-testing.md

Gemini CLI auto-loads project skills on the next run.

The full SKILL.md

560 lines · plain Markdown · MIT-licensed
SKILL.md
---
name: review-testing
description: >-
  Write meaningful tests for AI-generated front-end code. Use after building
  a component, feature, or page to verify behavior, catch edge cases, validate
  accessibility contracts, and prevent regressions — not to inflate coverage
  numbers.
---

# Testing Review

AI coding tools generate tests that pass but prove nothing. They test that a
div renders, that a class name exists, that a function returns what it was
hardcoded to return. These tests create false confidence — they pass today and
still pass when the code is broken tomorrow.

This skill teaches the agent to write tests that catch real bugs: behavior
tests, edge case tests, accessibility contract tests, and error state tests.

## What to Test

### Test behavior, not implementation

```javascript
// Bad — tests implementation details
test('calls setState with new value', () => {
  const setState = vi.fn();
  handleClick(setState, 'new');
  expect(setState).toHaveBeenCalledWith('new');
});

// Good — tests what the user experiences
test('updates the displayed count when increment is clicked', () => {
  render(<Counter />);
  const button = screen.getByRole('button', { name: /increment/i });
  fireEvent.click(button);
  expect(screen.getByText('Count: 1')).toBeInTheDocument();
});
```

Implementation tests break when you refactor. Behavior tests break when the
feature is actually broken. Write the second kind.

### The testing priority pyramid

Test these in order — higher items catch more real bugs:

| Priority | What to test | Example |
|----------|-------------|---------|
| 1 | User flows | "User can submit the form and see confirmation" |
| 2 | Error handling | "Shows error message when API returns 500" |
| 3 | Edge cases | "Handles empty list, single item, 10,000 items" |
| 4 | Accessibility | "Modal traps focus and returns it on close" |
| 5 | State transitions | "Loading → success, loading → error" |
| 6 | Input validation | "Rejects invalid email, shows inline error" |

AI-generated tests cluster at or below the bottom of this list (input checks
and isolated unit tests for utilities). Real bugs cluster at the top (broken
user flows, missing error handling).

## Test Structure

### Arrange, Act, Assert — every test

```javascript
test('displays error when email is invalid', async () => {
  // Arrange — set up the component
  render(<SignupForm />);
  const emailInput = screen.getByLabelText(/email/i);
  const submitButton = screen.getByRole('button', { name: /sign up/i });

  // Act — perform the action
  await userEvent.type(emailInput, 'not-an-email');
  await userEvent.click(submitButton);

  // Assert — verify the outcome
  expect(screen.getByRole('alert')).toHaveTextContent(/valid email/i);
});
```

Every test should be readable in 10 seconds. If you need to study the test
to understand what it checks, it's too complex — split it.

### One assertion concept per test

```javascript
// Bad — testing multiple unrelated things
test('signup form works', () => {
  render(<SignupForm />);
  expect(screen.getByLabelText(/email/i)).toBeInTheDocument();
  expect(screen.getByLabelText(/password/i)).toBeInTheDocument();
  expect(screen.getByRole('button')).toBeEnabled();
  // ... 15 more assertions
});

// Good — each test has a clear purpose
test('disables submit button while form is submitting', () => { ... });
test('shows success message after valid submission', () => { ... });
test('shows field-level error for invalid email', () => { ... });
```

### Name tests as behavior descriptions

```javascript
// Bad — describes code, not behavior
test('handleSubmit', () => { ... });
test('renders correctly', () => { ... });
test('Component test 1', () => { ... });

// Good — describes what the user experiences
test('submitting valid form shows success confirmation', () => { ... });
test('pressing Escape closes the modal and returns focus', () => { ... });
test('empty search shows "no results" message', () => { ... });
```

The test name should read as a sentence that describes the expected behavior.
If the test fails, the name alone should tell you what's broken.
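
The naming rule above can be partly checked mechanically. The sketch below is an illustration only and is not part of the skill: `isVagueTestName` and its pattern list are hypothetical heuristics, tuned to the bad examples shown above, and they will miss vague names phrased differently.

```javascript
// Hypothetical heuristics: names that describe code rather than behavior.
const VAGUE_NAME_PATTERNS = [
  /^renders( correctly| without crashing)?$/i, // says nothing about the outcome
  /^matches snapshot$/i,
  /^(it )?works$/i,
  /^[a-z]+[A-Z]\w*$/, // a bare camelCase identifier such as handleSubmit
  /\btest \d+$/i,     // numbered placeholders such as "Component test 1"
];

// Returns true when the name matches any of the vague patterns.
function isVagueTestName(name) {
  const trimmed = name.trim();
  return VAGUE_NAME_PATTERNS.some((pattern) => pattern.test(trimmed));
}

const names = [
  'handleSubmit',
  'renders correctly',
  'Component test 1',
  'submitting valid form shows success confirmation',
  'pressing Escape closes the modal and returns focus',
];

const flagged = names.filter(isVagueTestName);
console.log(flagged); // the first three names
```

A check like this only catches the obvious cases; whether a name reads as a sentence about user-visible behavior still needs a human or agent review.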

## Error and Edge Case Tests

### Test every error path

For every happy path the AI generated, write the corresponding error test:

```javascript
// If this exists...
test('loads and displays user profile', async () => {
  server.use(http.get('/api/user', () => HttpResponse.json(mockUser)));
  render(<Profile />);
  expect(await screen.findByText(mockUser.name)).toBeInTheDocument();
});

// ...then these must also exist
test('shows error message when profile fails to load', async () => {
  server.use(http.get('/api/user', () => HttpResponse.error()));
  render(<Profile />);
  expect(await screen.findByRole('alert')).toHaveTextContent(/failed/i);
});

test('shows retry button on network error', async () => {
  server.use(http.get('/api/user', () => HttpResponse.error()));
  render(<Profile />);
  expect(await screen.findByRole('button', { name: /retry/i }))
    .toBeInTheDocument();
});
```

### Test boundary values

| Input | Boundaries to test |
|-------|-------------------|
| Text | Empty string, single char, max length, max + 1 |
| Numbers | 0, negative, min, max, min - 1, max + 1, NaN |
| Arrays | Empty, single item, many items (100+) |
| Dates | Past, today, future, invalid date, timezone edge |
| Files | 0 bytes, max size, max + 1, wrong type |

### Test loading states

```javascript
test('shows skeleton while data is loading', async () => {
  server.use(
    http.get('/api/data', async () => {
      await delay(100);
      return HttpResponse.json(mockData);
    })
  );
  render(<DataList />);

  expect(screen.getByTestId('skeleton')).toBeInTheDocument();
  expect(await screen.findByText(mockData[0].title)).toBeInTheDocument();
  expect(screen.queryByTestId('skeleton')).not.toBeInTheDocument();
});
```

## Accessibility Tests

### Test keyboard navigation

```javascript
test('menu items are navigable with arrow keys', async () => {
  render(<Menu items={['File', 'Edit', 'View']} />);
  const menu = screen.getByRole('menu');

  await userEvent.tab(); // Focus enters menu
  expect(screen.getByRole('menuitem', { name: 'File' })).toHaveFocus();

  await userEvent.keyboard('{ArrowDown}');
  expect(screen.getByRole('menuitem', { name: 'Edit' })).toHaveFocus();

  await userEvent.keyboard('{ArrowDown}');
  expect(screen.getByRole('menuitem', { name: 'View' })).toHaveFocus();
});
```

### Test focus management

```javascript
test('opening modal moves focus to first focusable element', async () => {
  render(<ModalTrigger />);
  await userEvent.click(screen.getByRole('button', { name: /open/i }));

  const dialog = screen.getByRole('dialog');
  expect(dialog).toBeInTheDocument();

  const closeButton = within(dialog).getByRole('button', { name: /close/i });
  expect(closeButton).toHaveFocus();
});

test('closing modal returns focus to the trigger', async () => {
  render(<ModalTrigger />);
  const trigger = screen.getByRole('button', { name: /open/i });
  await userEvent.click(trigger);

  await userEvent.keyboard('{Escape}');
  expect(trigger).toHaveFocus();
});
```

### Test ARIA states

```javascript
test('accordion toggles aria-expanded on click', async () => {
  render(<Accordion items={[{ title: 'FAQ', content: 'Answer' }]} />);
  const button = screen.getByRole('button', { name: /faq/i });

  expect(button).toHaveAttribute('aria-expanded', 'false');
  await userEvent.click(button);
  expect(button).toHaveAttribute('aria-expanded', 'true');
});
```

## Form Tests

### Test the full validation flow

```javascript
test('shows inline errors for each invalid field on submit', async () => {
  render(<RegistrationForm />);
  await userEvent.click(screen.getByRole('button', { name: /register/i }));

  expect(screen.getByText(/name is required/i)).toBeInTheDocument();
  expect(screen.getByText(/email is required/i)).toBeInTheDocument();
  expect(screen.getByText(/password is required/i)).toBeInTheDocument();
});

test('clears field error when user corrects the input', async () => {
  render(<RegistrationForm />);
  await userEvent.click(screen.getByRole('button', { name: /register/i }));

  const emailInput = screen.getByLabelText(/email/i);
  await userEvent.type(emailInput, 'user@example.com');

  expect(screen.queryByText(/email is required/i)).not.toBeInTheDocument();
});
```

### Test form submission states

```javascript
test('disables submit button and shows spinner during submission', async () => {
  server.use(http.post('/api/register', async () => {
    await delay(100);
    return HttpResponse.json({ success: true });
  }));

  render(<RegistrationForm />);
  await fillValidForm(); // hypothetical helper that types valid values into every field
  const button = screen.getByRole('button', { name: /register/i });

  await userEvent.click(button);
  expect(button).toBeDisabled();
  expect(button).toHaveAttribute('aria-busy', 'true');

  await waitFor(() => expect(button).toBeEnabled());
});
```

## Mocking Guidelines

### Mock at the network boundary, not internal modules

```javascript
// Bad — mocks internal implementation
vi.mock('../api/userService', () => ({
  fetchUser: vi.fn().mockResolvedValue(mockUser)
}));

// Good — mocks the network request
server.use(
  http.get('/api/user/:id', ({ params }) => {
    return HttpResponse.json(mockUsers[params.id]);
  })
);
```

Mocking internals makes tests pass even when the real integration is broken.
Mocking at the network boundary (with MSW or similar) tests the actual code
path.
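
To see why the boundary matters, here is a dependency-free sketch. It is an illustration under assumptions, not how MSW works internally: `fetchUser` takes a hypothetical `fetchImpl` parameter so the example runs without a browser or a server, and `createFakeFetch` stands in for an HTTP-level interceptor.

```javascript
// Hypothetical service module. The URL building and status handling below are
// exactly the code that an internal mock of fetchUser would skip.
async function fetchUser(id, fetchImpl) {
  const response = await fetchImpl(`/api/user/${id}`);
  if (!response.ok) {
    throw new Error(`Failed to load user ${id}: ${response.status}`);
  }
  return response.json();
}

// Stand-in for an HTTP-level interceptor: it only knows about URLs and
// responses, never about fetchUser itself.
function createFakeFetch(routes) {
  return async (url) => {
    const found = Object.prototype.hasOwnProperty.call(routes, url);
    return {
      ok: found,
      status: found ? 200 : 404,
      json: async () => (found ? routes[url] : null),
    };
  };
}

const fakeFetch = createFakeFetch({ '/api/user/42': { id: 42, name: 'Ada' } });

async function demo() {
  // Known route: the real URL building and parsing code runs end to end.
  const user = await fetchUser(42, fakeFetch);
  // Unknown route: the real error handling runs and surfaces the 404.
  const missing = await fetchUser(7, fakeFetch).catch((error) => error.message);
  return { user, missing };
}

demo().then((outcome) => console.log(outcome));
```

If a refactor changed the URL template inside `fetchUser`, the fake transport would answer 404 and the test would fail, whereas a `vi.mock` of the whole module would keep returning `mockUser`.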

### Don't mock what you're testing

```javascript
// Pointless — you're testing your mock, not the function
vi.mock('../utils/format', () => ({
  formatDate: vi.fn().mockReturnValue('Jan 1, 2026')
}));
test('formatDate returns formatted date', () => {
  expect(formatDate(new Date())).toBe('Jan 1, 2026'); // Always passes
});
```

## Test Maintenance

### Delete tests that test nothing

AI-generated test suites often include tests like:

```javascript
// Delete these — they prove nothing
test('renders without crashing', () => {
  render(<Component />);
});

test('matches snapshot', () => {
  const { container } = render(<Component />);
  expect(container).toMatchSnapshot();
});

test('has correct class name', () => {
  render(<Component />);
  expect(screen.getByTestId('wrapper')).toHaveClass('wrapper');
});
```

These tests pass when the component is broken. They fail when you make
harmless refactors. They're worse than no tests because they create false
confidence.
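
Two of these patterns can be spotted from the test body alone. The sketch below is a rough heuristic, not the skill's actual check: `classifyTest` is a hypothetical helper that works on test bodies as plain strings, and it does not detect the class-name case, which needs a look at what is being asserted.

```javascript
// Hypothetical heuristic: a test is a no-op when it has no assertion at all,
// or when every assertion it makes is a snapshot comparison.
function classifyTest(body) {
  const assertions = body.match(/\bexpect\s*\(/g) || [];
  if (assertions.length === 0) return 'no-assertion';

  const snapshots = body.match(/\.toMatch(Inline)?Snapshot\s*\(/g) || [];
  if (snapshots.length === assertions.length) return 'snapshot-only';

  return 'ok';
}

// Test bodies as source text, keyed by test name.
const samples = {
  'renders without crashing': 'render(<Component />);',
  'matches snapshot':
    'const { container } = render(<Component />); expect(container).toMatchSnapshot();',
  'shows error when profile fails to load':
    "render(<Profile />); expect(await screen.findByRole('alert')).toHaveTextContent(/failed/i);",
};

for (const [name, body] of Object.entries(samples)) {
  console.log(`${classifyTest(body)}: ${name}`);
}
```

A regex pass like this is cheap enough to run on every file; a parser-based check would be more robust against `expect` appearing in comments or strings.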

### Keep test files next to source files

```
components/
  Button/
    Button.tsx
    Button.test.tsx
  Modal/
    Modal.tsx
    Modal.test.tsx
```

Not in a separate `__tests__/` directory tree. Co-located tests are found
faster, updated alongside the component, and deleted when the component is
removed.
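
With co-located tests, a missing test file is easy to detect from paths alone. This is a small sketch assuming the `Name.tsx` / `Name.test.tsx` convention shown above; `findUntestedComponents` is a hypothetical helper and takes a plain list of paths rather than reading the file system.

```javascript
// Given a flat list of project paths, return component files with no sibling test file.
function findUntestedComponents(paths) {
  const allPaths = new Set(paths);
  return paths.filter((path) => {
    const isComponent = /\.tsx$/.test(path) && !/\.test\.tsx$/.test(path);
    return isComponent && !allPaths.has(path.replace(/\.tsx$/, '.test.tsx'));
  });
}

const projectPaths = [
  'components/Button/Button.tsx',
  'components/Button/Button.test.tsx',
  'components/Modal/Modal.tsx',
];

console.log(findUntestedComponents(projectPaths)); // ['components/Modal/Modal.tsx']
```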

## The Testing Checklist

After writing tests for any feature, verify:

- [ ] Every user flow has at least one integration test
- [ ] Every `fetch`/API call has both success and error tests
- [ ] Every form has validation and submission state tests
- [ ] Boundary values are tested (empty, min, max, overflow)
- [ ] Loading, error, and empty states are tested
- [ ] Keyboard navigation works (tab order, arrow keys, escape)
- [ ] Focus management is tested (modals, drawers, dropdowns)
- [ ] ARIA states toggle correctly (expanded, selected, checked)
- [ ] Tests use behavior descriptions as names
- [ ] No snapshot tests without clear behavioral assertions
- [ ] Mocks are at the network boundary, not internal modules
- [ ] No tests that pass when the feature is broken
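
The last item can be checked by hand in the spirit of mutation testing: break the code on purpose and see whether the test notices. The sketch below is a simplified illustration, not a mutation testing tool. `applyDiscount`, its broken copy, and the helper names are all hypothetical, and a "test" here is just a function that throws on failure.

```javascript
// Hypothetical function under test and a deliberately broken copy of it.
const applyDiscount = (price, percent) => price - (price * percent) / 100;
const brokenApplyDiscount = (price, percent) => price + (price * percent) / 100;

// Runs a test function against an implementation; true when it does not throw.
function passes(testFn, implementation) {
  try {
    testFn(implementation);
    return true;
  } catch {
    return false;
  }
}

// A test is meaningful only if it passes on the real code and fails on the broken copy.
function catchesBreakage(testFn, real, broken) {
  return passes(testFn, real) && !passes(testFn, broken);
}

// Weak test: only checks the return type, so the broken copy also passes.
const weakTest = (fn) => {
  if (typeof fn(100, 10) !== 'number') throw new Error('expected a number');
};

// Behavior test: checks the actual outcome.
const behaviorTest = (fn) => {
  if (fn(100, 10) !== 90) throw new Error('expected 10% off 100 to be 90');
};

console.log(catchesBreakage(weakTest, applyDiscount, brokenApplyDiscount));     // false
console.log(catchesBreakage(behaviorTest, applyDiscount, brokenApplyDiscount)); // true
```

Mutation testing tools automate the same idea by generating many small breakages and reporting the share that the suite catches.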

## Anti-Patterns

**Never do these:**

- Write `renders without crashing` as the only test — it proves nothing
- Use snapshot tests as a substitute for behavioral assertions — they
  generate noise, not confidence
- Mock the module you're testing — you're testing your mock
- Test CSS class names — they can change without breaking behavior
- Write tests after being told "add tests" without knowing what to test —
  test the behavior described in the feature spec, not random implementation
  details
- Use `getByTestId` as the default query — prefer `getByRole`,
  `getByLabelText`, `getByText` which test the accessible interface
- Copy the component's implementation logic into the test — you'll have two
  copies of the same bug
- Skip error path tests because "the happy path works" — errors are where
  bugs hide
- Write one giant test with 20 assertions — split into focused tests
- Leave `test.skip` or `test.todo` in the suite indefinitely — either write
  the test or delete the placeholder
Pair it

Stack it with the rest of the suite.

Review 07 · Pre-Deploy Review

26-point audit of error handling, debug artifacts, hallucinated APIs, and a11y smells. This is the broader pre-flight checklist.

Review 08 · Security Review

XSS prevention, input sanitization, secret exposure, CSP, CORS, auth token storage, CSRF.

Front-end 06 · Accessibility

WCAG 2.2 AA, keyboard nav, focus management, contrast, screen reader patterns, form accessibility.

Changelog

v1 · April 15, 2026
Initial skill covering behavior-first testing, test structure and naming, error and edge case coverage, accessibility contract tests, form validation flows, mocking guidelines, test maintenance, and the 12-item testing checklist.

FAQ

What kind of tests does this skill teach?

A 12-point review that promotes behavior-driven tests catching real bugs: user flow integration tests, error handling tests, boundary value tests, accessibility contract tests (keyboard navigation, focus management, ARIA states), form validation flow tests, and loading state transition tests. It explicitly flags low-value tests like snapshot tests and “renders without crashing” tests for removal.

Why are AI-generated tests often low quality?

AI tools generate tests that prove nothing: they test that a div renders, that a class name exists, or that a mocked function returns what it was mocked to return. These tests pass when the code is broken and fail when you make harmless refactors. This skill teaches the agent to test behavior instead of implementation details, creating tests that actually catch regressions.

How should mocking work in tests?

Mock at the network boundary (using MSW or similar), not at internal module boundaries. Mocking internal modules makes tests pass even when the real integration is broken. The skill covers when to mock, what to mock, and the common anti-pattern of mocking the module under test.

Which AI coding tools is this compatible with?

Cursor, Claude Code, Codex CLI, Windsurf, GitHub Copilot, and Gemini CLI. The skill is a single Markdown file and ships in the native format for each tool with one-click copy.

© 2026 WEB DEVELOPER / ALL RIGHTS RESERVED