Review skill · v1 · 12-point audit

Tests that catch real bugs.

Runs inside your agent. Flags 12 patterns AI tests get wrong by default. Scores the result A–F.


12 audit checks · 6 AI tools supported · 0 dependencies

agent — review-testing live

$ agent run --skill review-testing

▼ test review starting · 12 checks · target: tests/

behavior over implementation           PASS
snapshot tests in Modal.test.tsx       NO-OP
  → 3 tests assert nothing meaningful
missing error path for /api/user       HIGH
keyboard nav covered                   PASS
focus returns on modal close           PASS
mock at internal boundary              SMELL
  → vi.mock('./userService') in Profile.test.tsx

score 68 / 100                         C
→ 4 anti-patterns flagged. autopatch? [Y/n]

What it catches

The tests that prove nothing.

01 no-op-tests.diff 02 error-paths.diff 03 behavior.diff 04 mock-boundary.diff

Modal.test.tsx Before

test('renders without crashing', () => {
  render(<Modal />);
});

test('matches snapshot', () => {
  const { container } = render(<Modal />);
  expect(container).toMatchSnapshot();
});

Modal.test.tsx After

test('shows confirmation when delete is confirmed', async () => {
  render(<DeleteModal item={mockItem} />);
  await userEvent.click(screen.getByRole('button', { name: /delete/i }));
  expect(screen.getByText(/item deleted/i)).toBeInTheDocument();
});

Profile.test.tsx Before

test('loads user profile', async () => {
  server.use(http.get('/api/user', () => HttpResponse.json(mockUser)));
  render(<Profile />);
  expect(await screen.findByText(mockUser.name)).toBeInTheDocument();
});

Profile.test.tsx After

test('shows error when profile fails to load', async () => {
  server.use(http.get('/api/user', () => HttpResponse.error()));
  render(<Profile />);
  expect(await screen.findByRole('alert')).toHaveTextContent(/failed/i);
});

Counter.test.tsx Before

test('calls setState on click', () => {
  const setState = vi.fn();
  handleClick(setState, 'new');
  expect(setState).toHaveBeenCalledWith('new');
});

Counter.test.tsx After

test('increments displayed count when clicked', () => {
  render(<Counter />);
  fireEvent.click(screen.getByRole('button', { name: /increment/i }));
  expect(screen.getByText('Count: 1')).toBeInTheDocument();
});

setup.ts Before

// internal-boundary mock hides real failures
vi.mock('../api/userService', () => ({
  fetchUser: vi.fn().mockResolvedValue(mockUser)
}));

setup.ts After

// MSW intercepts at the HTTP layer; the real code path runs
server.use(
  http.get('/api/user/:id', ({ params }) =>
    HttpResponse.json(mockUsers[params.id])
  )
);

Benchmarked

Proof, not vibes.

Modal Component brief · Claude Opus 4.7 · with skill vs. without · 2026-04-23

Mutation score: 41 (baseline) → 86 (+45)

No-ops removed Avg deleted per run

Behavior assertions Avg added per run

Test grade: A vs. C (68 baseline), +22
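
The page does not say which tool produced the mutation score, so the following is only a sketch of the usual definition: the percentage of deliberately injected bugs ("mutants") that make the test suite fail. It hand-writes three mutants of a hypothetical `applyDiscount` function and scores a no-op suite against a behavior suite. Real mutation tools such as Stryker generate mutants automatically; every name here is illustrative.

```javascript
// Hypothetical function under test: 10% off orders of 100 or more.
const applyDiscount = (total) => (total >= 100 ? total - total / 10 : total);

// Hand-written mutants: each one is a small, deliberate bug.
const mutants = [
  (total) => (total > 100 ? total - total / 10 : total), // boundary: >= became >
  (total) => (total >= 100 ? total - total / 5 : total), // wrong rate
  (total) => total,                                      // discount removed
];

// A suite is a function that returns true when all of its checks pass.
const noOpSuite = (fn) => {
  fn(100); // "runs without crashing": nothing is asserted about the result
  return true;
};

const behaviorSuite = (fn) =>
  fn(99) === 99 && fn(100) === 90 && fn(200) === 180;

// Mutation score = percentage of mutants that make the suite fail ("killed").
function mutationScore(suite, allMutants) {
  const killed = allMutants.filter((mutant) => !suite(mutant)).length;
  return Math.round((killed / allMutants.length) * 100);
}

const noOpScore = mutationScore(noOpSuite, mutants);
const behaviorScore = mutationScore(behaviorSuite, mutants);
console.log({ noOpScore, behaviorScore }); // { noOpScore: 0, behaviorScore: 100 }
```

Both suites pass against the original function; only the behavior suite notices when the function is broken, which is what the score measures.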

Install

One file. Six tools. Zero ceremony.

One Markdown file, zero dependencies. Pick your tool below.

1. Drop this in

Project: .cursor/skills/review-testing.md

2. Or fetch it directly

mkdir -p .cursor/skills && curl -fsSL https://webdeveloper.com/skills/review-testing/SKILL.md -o .cursor/skills/review-testing.md

Restart Cursor. The next time you ask the agent to review your tests, it’ll run the 12-point audit and grade the suite.

1. Drop this in

User-level: ~/.claude/skills/review-testing/SKILL.md

2. Or fetch it directly

mkdir -p ~/.claude/skills/review-testing && curl -fsSL https://webdeveloper.com/skills/review-testing/SKILL.md -o ~/.claude/skills/review-testing/SKILL.md

Claude Code auto-discovers skills in ~/.claude/skills/. Available across every project on this machine.

1. Drop this in

Project: AGENTS.md (append the SKILL contents)

2. Or fetch it directly

curl -fsSL https://webdeveloper.com/skills/review-testing/SKILL.md >> AGENTS.md

Codex CLI reads AGENTS.md automatically when you run it from the project root.

1. Drop this in

Project: .windsurf/rules/review-testing.md

2. Or fetch it directly

mkdir -p .windsurf/rules && curl -fsSL https://webdeveloper.com/skills/review-testing/SKILL.md -o .windsurf/rules/review-testing.md

Windsurf loads project rules on every Cascade run.

1. Drop this in

Project: .github/copilot-instructions.md (append)

2. Or fetch it directly

mkdir -p .github && curl -fsSL https://webdeveloper.com/skills/review-testing/SKILL.md >> .github/copilot-instructions.md

Copilot reads .github/copilot-instructions.md as project-wide context.

1. Drop this in

Project: .gemini/skills/review-testing.md

2. Or fetch it directly

mkdir -p .gemini/skills && curl -fsSL https://webdeveloper.com/skills/review-testing/SKILL.md -o .gemini/skills/review-testing.md

Gemini CLI auto-loads project skills on the next run.

The full SKILL.md

560 lines · plain Markdown · MIT-licensed

---
name: review-testing
description: >-
  Write meaningful tests for AI-generated front-end code. Use after building
  a component, feature, or page to verify behavior, catch edge cases, validate
  accessibility contracts, and prevent regressions — not to inflate coverage
  numbers.
---

# Testing Review

AI coding tools generate tests that pass but prove nothing. They test that a
div renders, that a class name exists, that a function returns what it was
hardcoded to return. These tests create false confidence — they pass today and
still pass when the code is broken tomorrow.

This skill teaches the agent to write tests that catch real bugs: behavior
tests, edge case tests, accessibility contract tests, and error state tests.

## What to Test

### Test behavior, not implementation

```javascript
// Bad — tests implementation details
test('calls setState with new value', () => {
  const setState = vi.fn();
  handleClick(setState, 'new');
  expect(setState).toHaveBeenCalledWith('new');
});

// Good — tests what the user experiences
test('updates the displayed count when increment is clicked', () => {
  render(<Counter />);
  const button = screen.getByRole('button', { name: /increment/i });
  fireEvent.click(button);
  expect(screen.getByText('Count: 1')).toBeInTheDocument();
});
```

Implementation tests break when you refactor. Behavior tests break when the
feature is actually broken. Write the second kind.
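
To make the distinction concrete without a component framework, here is a minimal sketch using plain counter objects. `createCounterV1`, `createCounterV2`, `createCounterBroken`, and both check functions are hypothetical names. The refactored version keeps the same visible behavior but drops `setState`; the broken version still calls `setState` but never updates what the user sees.

```javascript
// Version 1 stores the count in a field and updates it through setState.
function createCounterV1() {
  const counter = {
    state: 0,
    setState(next) { counter.state = next; },
    increment() { counter.setState(counter.state + 1); },
    label() { return `Count: ${counter.state}`; },
  };
  return counter;
}

// Version 2 is a behavior-preserving refactor: same output, no setState.
function createCounterV2() {
  let clicks = 0;
  return {
    increment() { clicks += 1; },
    label() { return `Count: ${clicks}`; },
  };
}

// Broken version: setState is still called, but the label never updates.
function createCounterBroken() {
  const counter = createCounterV1();
  counter.label = () => 'Count: 0';
  return counter;
}

// Implementation check: asserts that an internal method was called.
function implementationCheck(counter) {
  if (typeof counter.setState !== 'function') return false; // nothing to spy on
  const calls = [];
  const realSetState = counter.setState;
  counter.setState = (next) => { calls.push(next); realSetState(next); };
  counter.increment();
  return calls.length === 1 && calls[0] === 1;
}

// Behavior check: asserts only what a user would see.
function behaviorCheck(counter) {
  counter.increment();
  return counter.label() === 'Count: 1';
}

const results = {
  v1: [implementationCheck(createCounterV1()), behaviorCheck(createCounterV1())],
  refactored: [implementationCheck(createCounterV2()), behaviorCheck(createCounterV2())],
  broken: [implementationCheck(createCounterBroken()), behaviorCheck(createCounterBroken())],
};
console.log(results);
// { v1: [ true, true ], refactored: [ false, true ], broken: [ true, false ] }
```

The implementation check fails on the harmless refactor and passes on the broken counter. The behavior check does the opposite in both cases.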

### The testing priority pyramid

Test these in order — higher items catch more real bugs:

| Priority | What to test | Example |
|----------|-------------|---------|
| 1 | User flows | "User can submit the form and see confirmation" |
| 2 | Error handling | "Shows error message when API returns 500" |
| 3 | Edge cases | "Handles empty list, single item, 10,000 items" |
| 4 | Accessibility | "Modal traps focus and returns it on close" |
| 5 | State transitions | "Loading → success, loading → error" |
| 6 | Input validation | "Rejects invalid email, shows inline error" |

AI-generated tests cluster at the bottom (unit tests for utils). Real bugs
cluster at the top (broken user flows, missing error handling).

## Test Structure

### Arrange, Act, Assert — every test

```javascript
test('displays error when email is invalid', async () => {
  // Arrange — set up the component
  render(<SignupForm />);
  const emailInput = screen.getByLabelText(/email/i);
  const submitButton = screen.getByRole('button', { name: /sign up/i });

  // Act — perform the action
  await userEvent.type(emailInput, 'not-an-email');
  await userEvent.click(submitButton);

  // Assert — verify the outcome
  expect(screen.getByRole('alert')).toHaveTextContent(/valid email/i);
});
```

Every test should be readable in 10 seconds. If you need to study the test
to understand what it checks, it's too complex — split it.

### One assertion concept per test

```javascript
// Bad — testing multiple unrelated things
test('signup form works', () => {
  render(<SignupForm />);
  expect(screen.getByLabelText(/email/i)).toBeInTheDocument();
  expect(screen.getByLabelText(/password/i)).toBeInTheDocument();
  expect(screen.getByRole('button')).toBeEnabled();
  // ... 15 more assertions
});

// Good — each test has a clear purpose
test('disables submit button while form is submitting', () => { ... });
test('shows success message after valid submission', () => { ... });
test('shows field-level error for invalid email', () => { ... });
```

### Name tests as behavior descriptions

```javascript
// Bad — describes code, not behavior
test('handleSubmit', () => { ... });
test('renders correctly', () => { ... });
test('Component test 1', () => { ... });

// Good — describes what the user experiences
test('submitting valid form shows success confirmation', () => { ... });
test('pressing Escape closes the modal and returns focus', () => { ... });
test('empty search shows "no results" message', () => { ... });
```

The test name should read as a sentence that describes the expected behavior.
If the test fails, the name alone should tell you what's broken.

## Error and Edge Case Tests

### Test every error path

For every happy path the AI generated, write the corresponding error test:

```javascript
// If this exists...
test('loads and displays user profile', async () => {
  server.use(http.get('/api/user', () => HttpResponse.json(mockUser)));
  render(<Profile />);
  expect(await screen.findByText(mockUser.name)).toBeInTheDocument();
});

// ...then these must also exist
test('shows error message when profile fails to load', async () => {
  server.use(http.get('/api/user', () => HttpResponse.error()));
  render(<Profile />);
  expect(await screen.findByRole('alert')).toHaveTextContent(/failed/i);
});

test('shows retry button on network error', async () => {
  server.use(http.get('/api/user', () => HttpResponse.error()));
  render(<Profile />);
  expect(await screen.findByRole('button', { name: /retry/i }))
    .toBeInTheDocument();
});
```

### Test boundary values

| Input | Boundaries to test |
|-------|-------------------|
| Text | Empty string, single char, max length, max + 1 |
| Numbers | 0, negative, min, max, min - 1, max + 1, NaN |
| Arrays | Empty, single item, many items (100+) |
| Dates | Past, today, future, invalid date, timezone edge |
| Files | 0 bytes, max size, max + 1, wrong type |
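
One way to turn a row of this table into tests is a table-driven loop. The sketch below does that for the Text row, assuming a hypothetical `isValidUsername` validator with a made-up 20-character limit. A second validator with an off-by-one bug shows which case catches it. In Vitest or Jest the same cases would typically go through `test.each`.

```javascript
const MAX_LENGTH = 20; // assumed limit, for illustration only

// Hypothetical validator: accepts 1 to MAX_LENGTH characters.
const isValidUsername = (value) =>
  value.length >= 1 && value.length <= MAX_LENGTH;

// Same validator with an off-by-one bug (< instead of <=).
const offByOne = (value) => value.length >= 1 && value.length < MAX_LENGTH;

// One case per boundary from the Text row of the table.
const cases = [
  { name: 'empty string', input: '', expected: false },
  { name: 'single char', input: 'a', expected: true },
  { name: 'max length', input: 'a'.repeat(MAX_LENGTH), expected: true },
  { name: 'max + 1', input: 'a'.repeat(MAX_LENGTH + 1), expected: false },
];

// Returns the names of the cases a validator gets wrong.
function failingCases(validator) {
  return cases
    .filter(({ input, expected }) => validator(input) !== expected)
    .map(({ name }) => name);
}

console.log(failingCases(isValidUsername)); // []
console.log(failingCases(offByOne));        // [ 'max length' ]
```

A suite that only tried a mid-length value such as `'alice'` would pass for both validators; the bug is only visible at the boundary.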

### Test loading states

```javascript
test('shows skeleton while data is loading', async () => {
  server.use(
    http.get('/api/data', async () => {
      await delay(100);
      return HttpResponse.json(mockData);
    })
  );
  render(<DataList />);

  expect(screen.getByTestId('skeleton')).toBeInTheDocument();
  expect(await screen.findByText(mockData[0].title)).toBeInTheDocument();
  expect(screen.queryByTestId('skeleton')).not.toBeInTheDocument();
});
```

## Accessibility Tests

### Test keyboard navigation

```javascript
test('menu items are navigable with arrow keys', async () => {
  render(<Menu items={['File', 'Edit', 'View']} />);
  const menu = screen.getByRole('menu');

  await userEvent.tab(); // Focus enters menu
  expect(screen.getByRole('menuitem', { name: 'File' })).toHaveFocus();

  await userEvent.keyboard('{ArrowDown}');
  expect(screen.getByRole('menuitem', { name: 'Edit' })).toHaveFocus();

  await userEvent.keyboard('{ArrowDown}');
  expect(screen.getByRole('menuitem', { name: 'View' })).toHaveFocus();
});
```

### Test focus management

```javascript
test('opening modal moves focus to first focusable element', async () => {
  render(<ModalTrigger />);
  await userEvent.click(screen.getByRole('button', { name: /open/i }));

  const dialog = screen.getByRole('dialog');
  expect(dialog).toBeInTheDocument();

  const closeButton = within(dialog).getByRole('button', { name: /close/i });
  expect(closeButton).toHaveFocus();
});

test('closing modal returns focus to the trigger', async () => {
  render(<ModalTrigger />);
  const trigger = screen.getByRole('button', { name: /open/i });
  await userEvent.click(trigger);

  await userEvent.keyboard('{Escape}');
  expect(trigger).toHaveFocus();
});
```

### Test ARIA states

```javascript
test('accordion toggles aria-expanded on click', async () => {
  render(<Accordion items={[{ title: 'FAQ', content: 'Answer' }]} />);
  const button = screen.getByRole('button', { name: /faq/i });

  expect(button).toHaveAttribute('aria-expanded', 'false');
  await userEvent.click(button);
  expect(button).toHaveAttribute('aria-expanded', 'true');
});
```

## Form Tests

### Test the full validation flow

```javascript
test('shows inline errors for each invalid field on submit', async () => {
  render(<RegistrationForm />);
  await userEvent.click(screen.getByRole('button', { name: /register/i }));

  expect(screen.getByText(/name is required/i)).toBeInTheDocument();
  expect(screen.getByText(/email is required/i)).toBeInTheDocument();
  expect(screen.getByText(/password is required/i)).toBeInTheDocument();
});

test('clears field error when user corrects the input', async () => {
  render(<RegistrationForm />);
  await userEvent.click(screen.getByRole('button', { name: /register/i }));

  const emailInput = screen.getByLabelText(/email/i);
  await userEvent.type(emailInput, 'user@example.com');

  expect(screen.queryByText(/email is required/i)).not.toBeInTheDocument();
});
```

### Test form submission states

```javascript
test('disables submit button and shows spinner during submission', async () => {
  server.use(http.post('/api/register', async () => {
    await delay(100);
    return HttpResponse.json({ success: true });
  }));

  render(<RegistrationForm />);
  await fillValidForm(); // assumed project helper that fills every field with valid values
  const button = screen.getByRole('button', { name: /register/i });

  await userEvent.click(button);
  expect(button).toBeDisabled();
  expect(button).toHaveAttribute('aria-busy', 'true');

  await waitFor(() => expect(button).toBeEnabled());
});
```

## Mocking Guidelines

### Mock at the network boundary, not internal modules

```javascript
// Bad — mocks internal implementation
vi.mock('../api/userService', () => ({
  fetchUser: vi.fn().mockResolvedValue(mockUser)
}));

// Good — mocks the network request
server.use(
  http.get('/api/user/:id', ({ params }) => {
    return HttpResponse.json(mockUsers[params.id]);
  })
);
```

Mocking internals makes tests pass even when the real integration is broken.
Mocking at the network boundary (with MSW or similar) tests the actual code
path.
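
The difference is easiest to see with a deliberately broken integration. The sketch below does not use MSW; a stand-in `fakeNetwork` function plays the role of the network layer, and `fetchUser`, `loadProfileName`, and the `/api/users/` typo are all hypothetical. Replacing the service hides the wrong URL, while faking only the network exposes it.

```javascript
// Hypothetical service with a real bug: the path should be /api/user/, not /api/users/.
async function fetchUser(fetchImpl, id) {
  const response = await fetchImpl(`/api/users/${id}`);
  if (!response.ok) throw new Error(`Request failed: ${response.status}`);
  return response.json();
}

// Hypothetical caller: returns the text a profile view would display.
async function loadProfileName(service, id) {
  try {
    const user = await service(id);
    return user.name;
  } catch {
    return 'Failed to load profile';
  }
}

// Stand-in for the network: it only knows the route the backend actually serves.
async function fakeNetwork(url) {
  if (url === '/api/user/1') {
    return { ok: true, status: 200, json: async () => ({ name: 'Ada' }) };
  }
  return { ok: false, status: 404, json: async () => ({}) };
}

async function run() {
  // Internal mock: replaces fetchUser entirely, so the buggy URL never runs.
  const mockedService = async () => ({ name: 'Ada' });
  const withInternalMock = await loadProfileName(mockedService, '1');

  // Network-boundary fake: the real fetchUser runs and requests the wrong route.
  const realService = (id) => fetchUser(fakeNetwork, id);
  const withNetworkFake = await loadProfileName(realService, '1');

  return { withInternalMock, withNetworkFake };
}

const outcome = run();
outcome.then((result) => console.log(result));
// { withInternalMock: 'Ada', withNetworkFake: 'Failed to load profile' }
```

A test built on the internal mock would report a working profile page even though every real request returns 404.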

### Don't mock what you're testing

```javascript
// Pointless — you're testing your mock, not the function
vi.mock('../utils/format', () => ({
  formatDate: vi.fn().mockReturnValue('Jan 1, 2026')
}));
test('formatDate returns formatted date', () => {
  expect(formatDate(new Date())).toBe('Jan 1, 2026'); // Always passes
});
```

## Test Maintenance

### Delete tests that test nothing

AI-generated test suites often include tests like:

```javascript
// Delete these — they prove nothing
test('renders without crashing', () => {
  render(<Component />);
});

test('matches snapshot', () => {
  const { container } = render(<Component />);
  expect(container).toMatchSnapshot();
});

test('has correct class name', () => {
  render(<Component />);
  expect(screen.getByTestId('wrapper')).toHaveClass('wrapper');
});
```

These tests pass when the component is broken. They fail when you make
harmless refactors. They're worse than no tests because they create false
confidence.

### Keep test files next to source files

```
components/
  Button/
    Button.tsx
    Button.test.tsx
  Modal/
    Modal.tsx
    Modal.test.tsx
```

Not in a separate `__tests__/` directory tree. Co-located tests are found
faster, updated alongside the component, and deleted when the component is
removed.

## The Testing Checklist

After writing tests for any feature, verify:

- [ ] Every user flow has at least one integration test
- [ ] Every `fetch`/API call has both success and error tests
- [ ] Every form has validation and submission state tests
- [ ] Boundary values are tested (empty, min, max, overflow)
- [ ] Loading, error, and empty states are tested
- [ ] Keyboard navigation works (tab order, arrow keys, escape)
- [ ] Focus management is tested (modals, drawers, dropdowns)
- [ ] ARIA states toggle correctly (expanded, selected, checked)
- [ ] Tests use behavior descriptions as names
- [ ] No snapshot tests without clear behavioral assertions
- [ ] Mocks are at the network boundary, not internal modules
- [ ] No tests that pass when the feature is broken

## Anti-Patterns

**Never do these:**

- Write `renders without crashing` as the only test — it proves nothing
- Use snapshot tests as a substitute for behavioral assertions — they
  generate noise, not confidence
- Mock the module you're testing — you're testing your mock
- Test CSS class names — they can change without breaking behavior
- Write tests after being told "add tests" without knowing what to test —
  test the behavior described in the feature spec, not random implementation
  details
- Use `getByTestId` as the default query — prefer `getByRole`,
  `getByLabelText`, `getByText` which test the accessible interface
- Copy the component's implementation logic into the test — you'll have two
  copies of the same bug
- Skip error path tests because "the happy path works" — errors are where
  bugs hide
- Write one giant test with 20 assertions — split into focused tests
- Leave `test.skip` or `test.todo` in the suite indefinitely — either write
  the test or delete the placeholder
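
The "two copies of the same bug" item is worth seeing once. In this sketch, `priceWithTax` is a hypothetical function with an intentional bug: it adds the rate instead of applying it as a percentage. The first check recomputes the expected value with the same formula and passes; the second uses a hand-computed literal and fails.

```javascript
// Hypothetical function with a bug: adds the rate instead of applying it as a percentage.
function priceWithTax(price, ratePercent) {
  return price + ratePercent; // should be price + (price * ratePercent) / 100
}

// Bad: the expected value is derived with the same (wrong) formula.
function mirroredCheck() {
  const price = 200;
  const ratePercent = 10;
  const expected = price + ratePercent; // copy of the implementation
  return priceWithTax(price, ratePercent) === expected;
}

// Good: the expected value is worked out by hand and written as a literal.
function literalCheck() {
  return priceWithTax(200, 10) === 220; // 10% of 200 is 20
}

console.log({ mirrored: mirroredCheck(), literal: literalCheck() });
// { mirrored: true, literal: false }
```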

Changelog

V1 April 15, 2026: Initial skill covering behavior-first testing, test structure and naming, error and edge case coverage, accessibility contract tests, form validation flows, mocking guidelines, test maintenance, and the 12-item testing checklist.

FAQ

What kind of tests does this skill teach?

A 12-point review that promotes behavior-driven tests catching real bugs: user flow integration tests, error handling tests, boundary value tests, accessibility contract tests (keyboard navigation, focus management, ARIA states), form validation flow tests, and loading state transition tests. It explicitly flags low-value tests like snapshot tests and “renders without crashing” tests for removal.

Why are AI-generated tests often low quality?

AI tools generate tests that prove nothing: they test that a div renders, that a class name exists, or that a mocked function returns what it was mocked to return. These tests pass when the code is broken and fail when you make harmless refactors. This skill teaches the agent to test behavior instead of implementation details, creating tests that actually catch regressions.

How should mocking work in tests?

Mock at the network boundary (using MSW or similar), not at internal module boundaries. Mocking internal modules makes tests pass even when the real integration is broken. The skill covers when to mock, what to mock, and the common anti-pattern of mocking the module under test.

Which AI coding tools is this compatible with?

Cursor, Claude Code, Codex CLI, Windsurf, GitHub Copilot, and Gemini CLI. The skill is a single Markdown file and ships in the native format for each tool with one-click copy.