# zudo-test-wisdom > Takazudo's frontend testing strategy guide for AI agents and developers --- # Decision Guide > Source: /pj/zudo-test/docs/decision-guide ## Quick Decision Table Use this table to determine the minimum testing level for your current task: | What Changed | Minimum Level | Why | |-------------|--------------|-----| | Pure logic / utility function | Level 1 | No DOM or CSS involvement | | Component props / state | Level 2 | Need simulated DOM to verify output | | Build config / template / SSG | Level 3 | Need to inspect built output files | | CSS / layout / visibility | Level 5 | CSS requires real rendering engine | | Interactive UI flow | Level 4 | Need real browser for user interactions | | Visual bug report | Level 5 | Must see computed styles + visual result | | "It's not showing" | Level 5 | Visibility is a visual property | | "It's still broken" (after test passed) | Next level up | Current level has blind spot for this bug | **"Minimum level" means the lowest level that can reliably catch the bug.** Using a lower level gives false confidence -- the test passes, but the bug remains. ## Decision Flowchart ```mermaid flowchart TD A[What are you verifying?] --> B{Is it pure logic?} B -->|Yes| L1[Level 1: Unit Test] B -->|No| C{Is it component behavior?} C -->|Yes| D{Does it involve CSS/visibility?} D -->|No| L2[Level 2: DOM Component Test] D -->|Yes| L5a[Level 5: Visual Verification] C -->|No| E{Is it build output?} E -->|Yes| L3[Level 3: Build Output Test] E -->|No| F{Is it interactive UI?} F -->|Yes| L4[Level 4: E2E Browser Test] F -->|No| G{Is it visual/CSS?} G -->|Yes| L5b[Level 5: Visual Verification] G -->|No| L1b[Level 1: Start with Unit Test] ``` ## Key Principle: CSS Always Needs Level 5 Any change involving CSS, layout, or visual appearance should default to Level 5. This is because: 1. **Level 1** (unit tests) -- has no DOM at all, cannot process CSS 2. 
**Level 2** (jsdom) -- has a DOM but no CSS engine; `getComputedStyle()` returns empty strings 3. **Level 3** (build output) -- checks file contents, not rendering 4. **Level 4** (Playwright) -- runs in a real browser but typically asserts on DOM state, not visual appearance Only Level 5 (verify-ui + headless-browser) can deterministically check computed style values and visually confirm the result. ## Escalation Triggers Move to the next level when: - Test passes but user says problem persists - You are testing logic but the bug might be visual - Lower-level test confirms data is correct but output looks wrong - You suspect a CSS or layout issue - Multiple lower-level tests pass but the feature does not work in the browser --- # Overview > Source: /pj/zudo-test/docs/overview Personal dev notes by [Takazudo](https://x.com/Takazudo). Not official testing documentation. Written for personal reference and AI-assisted coding. This site covers **frontend testing strategy** for AI-assisted development. The focus is on choosing the right testing approach when working with AI coding agents -- knowing when a unit test is sufficient, when you need browser-level verification, and how to avoid the common trap of false confidence from passing tests that do not actually verify what matters. 
## What This Covers These are the core topics documented here, all based on patterns extracted from real production projects: - **Testing Levels** -- The 5-level escalation ladder from unit tests to visual verification, each with increasing coverage and decreasing blind spots - **Decision Guide** -- Which testing level to use based on what changed, common failure patterns when the wrong level is chosen, and required behaviors for AI agents - **Real-World Patterns** -- Battle-tested Vitest configurations, Playwright E2E patterns, Tauri desktop app testing, and backend/Node.js testing approaches - **Tools Reference** -- Quick lookup of tools and commands for each testing level This guide is primarily **frontend-focused**, because the author's strength is in frontend development. Backend testing is covered as a supplementary topic -- specifically the patterns that emerge when frontend and backend are properly separated and each can be tested independently. ## The Core Insight The single most important concept in AI-assisted testing is **test level escalation**. When an AI agent fixes a bug and declares it resolved, the fix is only as reliable as the testing method used to verify it. A unit test can confirm logic is correct, but it cannot confirm that the user actually sees the result on screen. The gap between "logically correct" and "visually correct" is the most common source of false confidence. Testing should escalate, not repeat. When a lower-level test passes but the problem persists, move to the next level: 1. **Level 1** -- Unit/Logic Tests (vitest, jest) 2. **Level 2** -- DOM-based Component Tests (jsdom, Testing Library) 3. **Level 3** -- Build Output Verification (read built files) 4. **Level 4** -- E2E Browser Tests (Playwright, headless browser) 5. **Level 5** -- Deterministic + Visual Verification (verify-ui + headless browser) ## How to Use This Guide 1. 
Start with the [Decision Guide](../decision-guide/index.mdx) to determine which testing level fits your current task 2. Read the [Testing Levels](../testing-levels/index.mdx) section for detailed coverage of each level 3. Reference [Real-World Patterns](../real-world-patterns/index.mdx) for production-tested configurations 4. Keep the [Tools Reference](../tools-reference/index.mdx) open for quick lookup --- # Real-World Patterns > Source: /pj/zudo-test/docs/real-world-patterns ## Production-Tested Approaches The patterns in this section come from real production projects, not theoretical best practices. Each pattern has been used in shipped software and refined through actual bugs and failures. ## Source Projects | Project | Type | Key Testing Patterns | |---------|------|---------------------| | **zudo-text** | Tauri text editor | Mock backend adapter, console error monitoring, @interactive keyboard tests | | **zmod** | Web application | Production-build Playwright, CI image interception, sharded E2E | | **zudo-pattern-gen** | Pattern generator | Deterministic PNG rendering, Miniflare + D1/R2 integration | | **mdx-formatter** | CLI tool | Contract testing Rust via Vitest, idempotency invariant | ## Pattern Categories ### [Vitest Patterns](/pj/zudo-test-wisdom/docs/real-world-patterns/vitest-patterns) Workspace configurations, jsdom/happy-dom environments, contract testing, idempotency testing, and Miniflare integration testing. ### [Playwright Patterns](/pj/zudo-test-wisdom/docs/real-world-patterns/playwright-patterns) CI-safe test splitting, console error monitoring, image interception, production build verification, and sharded CI runs. ### [Tauri Testing](/pj/zudo-test-wisdom/docs/real-world-patterns/tauri-testing) WebKit-only rule, core crate pattern, backend bridge mocking, and the full 8-step escalation ladder for desktop apps. 
### [Backend & Node.js Testing](/pj/zudo-test-wisdom/docs/real-world-patterns/backend-testing)

Cloudflare Functions with Miniflare, HTTP API testing, fetch mocking with `vi.stubGlobal`, file system testing with temp directories, and key principles for separating frontend and backend test configs.

## Common Theme

Across all these projects, one theme emerges: **the testing approach must match the deployment target**. A web app needs browser-level testing. A CLI tool needs output verification. A Tauri app needs WebKit-specific testing. There is no universal test setup -- only appropriate test setups for specific contexts.

---

# The 5 Testing Levels

> Source: /pj/zudo-test/docs/testing-levels

## Overview

Frontend testing is not a single activity -- it is a spectrum of verification methods, each with different capabilities and blind spots. This section defines five distinct levels, ordered by the scope of what they can verify.

```mermaid
graph LR
  L1["Level 1: Unit/Logic"] --> L2["Level 2: DOM Component"]
  L2 --> L3["Level 3: Build Output"]
  L3 --> L4["Level 4: E2E Browser"]
  L4 --> L5["Level 5: Visual Verify"]
```

## Summary Table

| Level | Name | Tools | Can Verify | Blind Spots |
|-------|------|-------|-----------|-------------|
| 1 | Unit/Logic | vitest, jest | Pure functions, data transforms, state logic | DOM, CSS, rendering |
| 2 | DOM Component | vitest + jsdom, Testing Library | Component output, props, DOM structure | Visual rendering, CSS |
| 3 | Build Output | vitest on built files | SSG output, templates, bundler config | Runtime behavior, visuals |
| 4 | E2E Browser | Playwright, headless-browser | User interactions, navigation, full page | Subtle visual details |
| 5 | Deterministic + Visual | verify-ui + headless-browser | Computed styles, pixel-level rendering | Minimal blind spots |

## The Escalation Rule

When a test at the current level passes but the user reports the problem persists, do **not** re-run the same test. Escalate to the next level.
The levels are ordered by coverage breadth. Each higher level catches categories of bugs that lower levels structurally cannot detect. For example, a unit test cannot catch `overflow: hidden` hiding an element, because unit tests do not process CSS at all. ## Choosing the Right Level Not every task requires Level 5. The goal is to match the test level to the nature of the change: - **Logic changes** -- Level 1 is sufficient - **Component behavior** -- Level 2 covers it - **Build configuration** -- Level 3 targets it - **Interactive flows** -- Level 4 is needed - **Visual/CSS bugs** -- Level 5 is required See the [Decision Guide](/pj/zudo-test-wisdom/docs/decision-guide) for a detailed mapping table. --- # Tools Reference > Source: /pj/zudo-test/docs/tools-reference ## Tools by Testing Level | Level | Tools | Install | Run Command | |-------|-------|---------|-------------| | 1 | vitest | `pnpm add -D vitest` | `pnpm vitest` | | 1 | jest | `pnpm add -D jest` | `pnpm jest` | | 2 | vitest + jsdom | `pnpm add -D vitest jsdom` | `pnpm vitest` | | 2 | vitest + happy-dom | `pnpm add -D vitest happy-dom` | `pnpm vitest` | | 2 | @testing-library/react | `pnpm add -D @testing-library/react` | (used in tests) | | 3 | vitest (reading built files) | `pnpm add -D vitest` | `pnpm build && pnpm vitest --project build` | | 4 | Playwright | `pnpm add -D @playwright/test` | `npx playwright test` | | 4 | headless-browser | (Claude Code skill) | `/headless-browser` | | 5 | verify-ui | (Claude Code skill) | `/verify-ui` | | 5 | headless-browser | (Claude Code skill) | `/headless-browser` | ## Tool Capabilities Matrix | Capability | vitest | vitest+jsdom | Playwright | verify-ui | headless-browser | |-----------|--------|-------------|------------|-----------|-----------------| | Pure function testing | Yes | Yes | -- | -- | -- | | DOM structure | -- | Yes | Yes | -- | -- | | User events | -- | Yes | Yes | -- | -- | | CSS computed styles | -- | -- | Yes | **Yes** | -- | | Visual 
screenshots | -- | -- | Yes | -- | **Yes** |
| Console errors | -- | -- | Yes | -- | Yes |
| Multi-page navigation | -- | -- | Yes | -- | Yes |
| Build output | Yes | -- | -- | -- | -- |
| Responsive viewports | -- | -- | Yes | Yes | Yes |

## Vitest Configuration Quick Reference

### Minimal unit test setup

```typescript
// vitest.config.ts
import { defineConfig } from "vitest/config";

export default defineConfig({
  test: {
    include: ["src/**/*.test.ts"],
  },
});
```

### Component test setup (jsdom)

```typescript
// vitest.config.ts
import { defineConfig } from "vitest/config";

export default defineConfig({
  test: {
    environment: "jsdom",
    include: ["src/**/*.test.tsx"],
    setupFiles: ["./test-setup.ts"],
  },
});
```

### Workspace setup (multiple test types)

```typescript
// vitest.workspace.ts
import { defineWorkspace } from "vitest/config";

export default defineWorkspace([
  {
    test: { name: "unit", include: ["src/**/*.test.ts"], environment: "node" },
  },
  {
    test: {
      name: "component",
      include: ["src/**/*.test.tsx"],
      environment: "jsdom",
    },
  },
  {
    test: {
      name: "build",
      include: ["tests/build/**/*.test.ts"],
      environment: "node",
    },
  },
]);
```

## Playwright Configuration Quick Reference

### Basic setup

```typescript
// playwright.config.ts
import { defineConfig } from "@playwright/test";

export default defineConfig({
  testDir: "./e2e",
  use: {
    baseURL: "http://localhost:3000",
  },
  webServer: {
    command: "pnpm dev",
    port: 3000,
    reuseExistingServer: !process.env.CI,
  },
});
```

### WebKit-only (for Tauri)

```typescript
// playwright.config.ts
import { defineConfig, devices } from "@playwright/test";

export default defineConfig({
  projects: [
    { name: "webkit", use: { ...devices["Desktop Safari"] } },
  ],
});
```

### Production build testing

```typescript
// playwright.config.ts
import { defineConfig } from "@playwright/test";

export default defineConfig({
  webServer: {
    command: "pnpm build && pnpm preview",
    port: 4173,
    reuseExistingServer: !process.env.CI,
  },
});
```

## Claude Code Skill Commands

| Command | Level | Purpose |
|---------|-------|---------|
| `/headless-browser` | 4, 5 | Take screenshots, check console errors, interact with pages |
| `/verify-ui` | 5 | Assert computed CSS values deterministically |
| `/test-wisdom` | -- | Get guidance on which testing level to use |

---

# /CLAUDE.md

> Source: /pj/zudo-test/docs/claude-md/root

**Path:** `CLAUDE.md`

# zudo-test-wisdom

Takazudo's frontend testing strategy guide, built with zudo-doc (Astro, MDX,
Tailwind CSS v4). ## Commands ```bash pnpm dev # Start Astro dev server pnpm build # Build static site to dist/ pnpm preview # Preview built site pnpm check # Astro type checking pnpm format:md # Format MDX files pnpm b4push # Pre-push validation (format + typecheck + build) ``` ## Content Structure - English (default): `src/content/docs/` -> `/docs/...` - Japanese: `src/content/docs-ja/` -> `/ja/docs/...` - Japanese docs mirror the English directory structure **Bilingual rule**: When creating or updating any doc page, ALWAYS update both the English (`docs/`) and Japanese (`docs-ja/`) versions in the same PR. Keep code blocks identical between languages -- only translate surrounding prose. **Exception**: Pages with `generated: true` in frontmatter (e.g., claude-resources auto-generated pages) do not require Japanese translations. ## Content Categories Top-level directories under `src/content/docs/`. Directories with header nav entries are mapped via `categoryMatch` in `src/config/settings.ts`: - `overview/` - Introduction and purpose of the testing guide - `testing-levels/` - The 5 testing levels from unit to visual verification - `decision-guide/` - Which level to use, common failure patterns, required behaviors - `real-world-patterns/` - Vitest patterns, Playwright E2E, Tauri app testing - `tools-reference/` - Quick reference of tools per testing level Auto-generated directories (no header nav entry, managed by claude-resources integration): - `claude-md/` - CLAUDE.md file documentation (`noPage: true`) - `claude-skills/` - Claude Skills documentation (`noPage: true`) ## Writing Docs All documentation files use `.mdx` format with YAML frontmatter. 
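A minimal frontmatter sketch (the title and values are illustrative; the fields themselves come from the schema documented below):

```mdx
---
title: Escalation Triggers
description: When to move up a testing level
sidebar_position: 2
---

## When to Escalate

Content starts with an h2, because the frontmatter `title` becomes the page h1.
```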
### Frontmatter Fields

Schema defined in `src/content.config.ts`:

| Field | Type | Required | Description |
|---|---|---|---|
| `title` | string | Yes | Page title, rendered as the page h1 |
| `description` | string | No | Subtitle displayed below the title |
| `sidebar_position` | number | No | Sort order within category (lower = higher). Always set this for predictable ordering |
| `sidebar_label` | string | No | Custom text for sidebar display (overrides `title`) |
| `generated` | boolean | No | Build-time generated content (skip translation) |

### Content Rules

- **No h1 in content**: The frontmatter `title` is automatically rendered as the page h1. Start your content with `## h2` headings.
- **Always set `sidebar_position`**: Without it, pages sort alphabetically, which is unpredictable.
- **Kebab-case file names**: Use `my-article.mdx`, not `myArticle.mdx`.

### Linking Between Docs

Use relative file paths with the `.mdx` extension:

```markdown
[Link text](./sibling-page.mdx)
[Link text](../other-category/page.mdx#anchor)
```

### Admonitions

Admonition components are available globally without imports.

### Navigation Structure

Navigation is filesystem-driven. Directory structure directly becomes sidebar navigation. Pages are ordered by `sidebar_position` (ascending). Category index pages (`index.mdx`) control category position.

### Content Creation Workflow

1. Create English `.mdx` file under `src/content/docs/` with `title` and `sidebar_position`
2. Write content starting with `## h2` headings (not `# h1`)
3. Create matching Japanese file under `src/content/docs-ja/`
4. Keep code blocks identical -- only translate prose
5.
Run `pnpm format:md` then `pnpm build` to verify ## Skills This repo contains test-related Claude Code skills under `.claude/skills/`: - `test-wisdom/` - Doc-lookup skill (**generated** by `pnpm setup:doc-skill`, gitignored -- do NOT track or edit directly) - `verify-ui/` - Deterministic CSS/computed-style verification (tracked in git) - `headless-browser/` - Headless browser screenshots and interaction (tracked in git, run `npm install` in its directory for playwright) Run `pnpm setup:doc-skill` to generate the test-wisdom skill AND symlink all skills to `~/.claude/skills/`. The script handles both the generated doc-lookup skill and the tracked skills in one step. ## Typography - Futura for page h1 titles and header site name (`font-futura` class) - Noto Sans JP for body text - Headings use font-weight 400 (normal), not bold ## Site Config - Base path: `/pj/zudo-test` - Settings: `src/config/settings.ts` ## CI/CD - PR checks: typecheck + build + Cloudflare Pages preview - Main deploy: build + Cloudflare Pages production + IFTTT notification - Secrets: CLOUDFLARE_ACCOUNT_ID, CLOUDFLARE_API_TOKEN, IFTTT_PROD_NOTIFY --- # The Common AI Failure Pattern > Source: /pj/zudo-test/docs/decision-guide/common-failure-pattern ## The Pattern The most frequent testing failure in AI-assisted development follows a predictable pattern: 1. User reports: **"It's not showing"** or **"It's still broken"** 2. AI agent writes a unit test or checks logic 3. Logic test passes -- the data is correct, the component returns the right JSX 4. AI agent declares: **"Fixed! The test passes."** 5. User reports: **"It's still not showing."** 6. AI agent re-runs the same test, gets the same passing result 7. Cycle repeats until the user loses trust ## Why It Happens The AI agent chose **Level 1 (unit test)** or **Level 2 (DOM test)** for a problem that requires **Level 5 (visual verification)**. The data and logic are correct. The component renders the right elements in the DOM tree. 
But a CSS rule somewhere in the ancestor chain makes the element invisible.

## Concrete Example

A developer asks the AI to add a notification banner. The AI creates the component:

```tsx
// NotificationBanner.tsx
export function NotificationBanner({ message }: { message: string }) {
  return <div className="notification-banner">{message}</div>;
}
```

And writes a test:

```tsx
// NotificationBanner.test.tsx
import { render, screen } from "@testing-library/react";
import { NotificationBanner } from "./NotificationBanner";

it("renders the message", () => {
  render(<NotificationBanner message="Update available" />);
  expect(screen.getByText("Update available")).toBeTruthy();
});
```

The test passes. But the user sees nothing on screen. Why?

```css
/* layout.css -- inherited from the page layout */
.main-content {
  overflow: hidden;
  max-height: 0;
  transition: max-height 0.3s ease;
}
.main-content.expanded {
  max-height: 1000px;
}
```

The notification banner is rendered inside `.main-content`, which has `max-height: 0` and `overflow: hidden` by default. The element exists in the DOM (Level 2 passes), but it is visually clipped to zero height (Level 5 would catch this).

## The Fix

When the user says something "is not showing," default to Level 5 verification. Check **computed styles** on the element and its ancestors, then take a **screenshot** to confirm visual state. Level 5 verification would reveal:

```
verify-ui result:
.main-content {
  overflow: hidden   // <-- clipping children
  max-height: 0px    // <-- zero height
}
.notification-banner {
  display: block     // present in DOM
                     // but parent clips it to invisible
}
```

## Other Variants of This Pattern

The `overflow: hidden` + `height: 0` pattern is just one variant. Other common causes:

| CSS Property | Effect | Level 2 Detects? | Level 5 Detects? |
|-------------|--------|-----------------|-----------------|
| `display: none` | Element removed from flow | No | Yes |
| `visibility: hidden` | Element invisible but takes space | No | Yes |
| `opacity: 0` | Element fully transparent | No | Yes |
| `z-index` stacking | Element behind another | No | Yes |
| `overflow: hidden` on ancestor | Content clipped | No | Yes |
| `transform: scale(0)` | Element shrunk to nothing | No | Yes |
| `position: absolute` + off-screen | Element positioned off viewport | No | Yes |

## Key Takeaway

**Never declare a visual bug fixed based on a logic test.** If the user says something is not visible, the test must verify visibility -- and that requires Level 5.

---

# Vitest Patterns

> Source: /pj/zudo-test/docs/real-world-patterns/vitest-patterns

## Workspace-Level Vitest Configs

Large projects often need different Vitest configurations for different test types. Use Vitest workspaces to manage this:

```typescript
// vitest.workspace.ts
import { defineWorkspace } from "vitest/config";

export default defineWorkspace([
  {
    test: {
      name: "unit",
      include: ["src/**/*.test.ts"],
      environment: "node",
    },
  },
  {
    test: {
      name: "component",
      include: ["src/**/*.test.tsx"],
      environment: "jsdom",
    },
  },
  {
    test: {
      name: "build",
      include: ["tests/build/**/*.test.ts"],
      environment: "node",
    },
  },
]);
```

Separate configs let you run fast unit tests independently from slower component or build tests: `vitest --project unit` vs `vitest --project component`.
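One way to expose these project filters as npm scripts (the script names are illustrative; the `pnpm build && ... --project build` pairing follows the build-test run command used elsewhere in this guide):

```json
{
  "scripts": {
    "test:unit": "vitest --project unit",
    "test:component": "vitest --project component",
    "test:build": "pnpm build && vitest --project build"
  }
}
```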
## jsdom and happy-dom Environments

Choose the right DOM environment for component tests:

```typescript
// vitest.config.ts for component tests
import { defineConfig } from "vitest/config";

export default defineConfig({
  test: {
    environment: "jsdom",
    // Or use happy-dom for faster execution:
    // environment: "happy-dom",
    globals: true,
    setupFiles: ["./test-setup.ts"],
  },
});
```

Per-file environment override when needed:

```typescript
// @vitest-environment jsdom
describe("DOM-dependent test", () => {
  it("manipulates the document", () => {
    document.body.innerHTML = '<div id="app">Hello</div>';
    expect(document.getElementById("app")?.textContent).toBe("Hello");
  });
});
```

## Separate Configs for Different Test Types

A pattern from mdx-formatter: separate configurations for unit tests, API tests, and function tests:

```
vitest.config.ts            # Default: unit tests
vitest.config.api.ts        # API integration tests
vitest.config.functions.ts  # Cloud function tests
```

```json
{
  "scripts": {
    "test": "vitest",
    "test:api": "vitest --config vitest.config.api.ts",
    "test:functions": "vitest --config vitest.config.functions.ts",
    "test:all": "vitest && vitest --config vitest.config.api.ts"
  }
}
```

## Contract Testing: Rust Engine via Vitest

From mdx-formatter: the Vitest suite serves as a contract test for the Rust formatting engine. The Node.js wrapper calls the Rust binary, and Vitest verifies the output matches expectations:

```typescript
// tests/contract.test.ts
import { execSync } from "node:child_process";

describe("Rust formatter contract", () => {
  it("formats basic MDX correctly", () => {
    const input = "# Hello\nSome text here";
    const result = execSync(`echo '${input}' | ./target/release/formatter`, {
      encoding: "utf-8",
    });
    expect(result.trim()).toBe("# Hello\n\nSome text here");
  });
});
```

Contract testing lets you verify a binary's behavior from a higher-level language. The Vitest suite acts as the specification -- if the Rust engine changes behavior, the contract tests catch it.
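One caveat on the contract test above: interpolating the fixture into `echo '${input}'` breaks as soon as the input contains a single quote. `execSync` accepts the input over stdin via its `input` option, which sidesteps shell quoting entirely. A minimal sketch, using `cat` as a stand-in for the real formatter binary:

```typescript
import { execSync } from "node:child_process";

// This fixture would break the echo-with-single-quotes pipeline.
const input = "# Hello\nIt's got a quote";

// Pass the input on stdin instead of interpolating it into the command.
// `cat` stands in for the formatter binary in this sketch.
const result = execSync("cat", { input, encoding: "utf-8" });
```

In the real test, `"cat"` would be replaced by the formatter invocation, e.g. the `./target/release/formatter` path used above.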
## Idempotency Testing

A powerful invariant for formatters and transformers: applying the operation twice should produce the same result as applying it once.

```typescript
// tests/idempotency.test.ts
import { readdirSync, readFileSync } from "node:fs";
import { join } from "node:path";
import { format } from "../src"; // the formatter under test (path illustrative)

const FIXTURES_DIR = join(__dirname, "fixtures");

describe("idempotency", () => {
  const fixtures = readdirSync(FIXTURES_DIR).filter((f) =>
    f.endsWith(".mdx")
  );

  for (const fixture of fixtures) {
    it(`is idempotent for ${fixture}`, () => {
      const input = readFileSync(join(FIXTURES_DIR, fixture), "utf-8");
      const firstPass = format(input);
      const secondPass = format(firstPass);
      expect(firstPass).toBe(secondPass);
    });
  }
});
```

## Miniflare + D1/R2 Integration Tests

From zudo-pattern-gen: testing Cloudflare Workers with local D1 database and R2 storage using Miniflare:

```typescript
// tests/integration.test.ts
import { Miniflare } from "miniflare";

describe("Worker with D1", () => {
  let mf: Miniflare;

  beforeAll(async () => {
    mf = new Miniflare({
      modules: true,
      script: `export default { async fetch(req, env) { /* ... */ } }`,
      d1Databases: ["DB"],
      r2Buckets: ["STORAGE"],
    });

    // Run migrations
    const db = await mf.getD1Database("DB");
    await db.exec(`
      CREATE TABLE IF NOT EXISTS patterns (
        id TEXT PRIMARY KEY,
        name TEXT NOT NULL,
        data TEXT NOT NULL
      )
    `);
  });

  it("stores and retrieves a pattern", async () => {
    const resp = await mf.dispatchFetch("http://localhost/api/patterns", {
      method: "POST",
      body: JSON.stringify({ name: "test", data: "{}" }),
    });
    expect(resp.status).toBe(201);

    const getResp = await mf.dispatchFetch("http://localhost/api/patterns");
    const patterns = await getResp.json();
    expect(patterns).toHaveLength(1);
    expect(patterns[0].name).toBe("test");
  });
});
```

Miniflare runs the same Workers runtime locally, so integration tests closely match production behavior. Combined with D1 and R2 bindings, you can test full data flows without deploying.
---

# Level 1: Unit/Logic Tests

> Source: /pj/zudo-test/docs/testing-levels/level-1-unit-tests

## What Level 1 Tests

Level 1 tests verify **pure logic** -- functions that take inputs and return outputs without touching the DOM, browser APIs, or visual rendering.

Typical targets:

- Utility functions (string manipulation, date formatting, math)
- Data transforms (API response mapping, normalization)
- State reducers and selectors
- Validation logic
- Business rules

## Tools

| Tool | Use Case |
|------|----------|
| **vitest** | Modern projects, Vite-based, fast HMR |
| **jest** | Legacy projects, CRA, widely supported |

## Example

```typescript
// utils/format-price.ts
export function formatPrice(cents: number): string {
  return `$${(cents / 100).toFixed(2)}`;
}
```

```typescript
// utils/format-price.test.ts
import { formatPrice } from "./format-price";

describe("formatPrice", () => {
  it("formats cents to dollar string", () => {
    expect(formatPrice(1299)).toBe("$12.99");
  });

  it("handles zero", () => {
    expect(formatPrice(0)).toBe("$0.00");
  });

  it("handles single-digit cents", () => {
    expect(formatPrice(5)).toBe("$0.05");
  });
});
```

## TDD Cycle

For logic changes, follow the standard TDD cycle:

1. Write a failing test that describes the expected behavior
2. Implement the minimum code to make the test pass
3. Verify the test is green
4. Refactor if needed
5. Repeat for each behavior

## Blind Spots

Level 1 tests are **blind to everything visual**. They cannot detect:

- Whether an element renders on screen
- CSS issues (overflow, z-index, opacity, display:none)
- Layout problems (element present in DOM but not visible)
- Browser-specific rendering behavior
- User interaction flows

Level 1 is the right choice when you are confident the bug is in logic, not in rendering. If there is any doubt about visibility or visual correctness, escalate to Level 4 or 5.

## When to Use Level 1

| Scenario | Level 1 Appropriate?
| |----------|---------------------| | Function returns wrong value | Yes | | Data transform is incorrect | Yes | | Validation rejects valid input | Yes | | Element not showing on screen | No -- use Level 4/5 | | Layout looks wrong | No -- use Level 5 | | Click handler not firing | No -- use Level 2 or 4 | --- # Required Testing Behavior > Source: /pj/zudo-test/docs/decision-guide/required-behavior ## The Five Rules Every AI agent working on frontend code must follow these five rules when testing: ### Rule 1: Declare Your Test Plan First Before writing any test or running any verification, state: - **What** you are testing - **Which level** you are using - **Why** that level is appropriate ``` Test plan: - Testing: notification banner visibility after CSS fix - Level: 5 (verify-ui + screenshot) - Reason: This is a visual bug — the element exists in DOM but user reports it's not visible. Need to check computed styles. ``` Declaring the test plan prevents the common mistake of defaulting to Level 1 out of habit. It forces conscious level selection. ### Rule 2: Match Test Level to Goal The test level must match what you are actually verifying: | If you are verifying... | Use at least... | |------------------------|----------------| | A function returns the right value | Level 1 | | A component renders correct elements | Level 2 | | Build output contains expected content | Level 3 | | A user flow works in the browser | Level 4 | | Something is visually correct | Level 5 | Do not use Level 1 for a Level 5 problem. The test will pass, but the bug will remain. ### Rule 3: Escalate When Lower Levels Pass But Problem Persists When a test passes but the user says the problem is not fixed: 1. **Do not** re-run the same test 2. **Do not** suggest the user clear their cache 3. **Do** escalate to the next testing level 4. If already at Level 4, escalate to Level 5 5. If at Level 5, investigate deeper (check ancestor elements, stacking context, etc.) 
**Never suggest "clear browser cache" or "hard refresh" as a solution.** If the user says it is still broken, the code is still broken. Investigate the actual cause. ### Rule 4: Default to Level 5 for UI/CSS Any task involving: - CSS changes - Layout modifications - Visibility issues - Visual appearance - Responsive design - Spacing, colors, fonts should default to Level 5 verification. Lower levels are structurally unable to verify CSS correctness. ### Rule 5: Report What Was NOT Tested After testing, explicitly state the blind spots: ``` Verification complete: - Tested: computed styles confirm banner has display:block, opacity:1, and parent has overflow:visible - Screenshot: banner is visible at top of page - NOT tested: responsive behavior at mobile breakpoints - NOT tested: animation transition timing ``` This transparency helps the user decide if additional testing is needed. ## Summary Checklist Before declaring any fix complete: - [ ] Test plan was declared before testing - [ ] Test level matches the nature of the change - [ ] If test passes but user reports failure, escalated to next level - [ ] For CSS/visual changes, used Level 5 - [ ] Reported what was and was not tested --- # Playwright Patterns > Source: /pj/zudo-test/docs/real-world-patterns/playwright-patterns ## CI-Safe vs @interactive Test Split Not all E2E tests can run in CI. 
Tests requiring keyboard shortcuts, clipboard access, or desktop-specific interactions should be tagged and split:

```typescript
// e2e/basic-navigation.spec.ts -- runs in CI
import { test, expect } from "@playwright/test";

test("loads the home page", async ({ page }) => {
  await page.goto("/");
  await expect(page.locator("h1")).toBeVisible();
});
```

```typescript
// e2e/keyboard-shortcuts.spec.ts -- only runs locally
import { test, expect } from "@playwright/test";

test("@interactive Ctrl+S saves document", async ({ page }) => {
  await page.goto("/editor");
  await page.keyboard.press("Control+KeyS");
  await expect(page.locator(".save-indicator")).toHaveText("Saved");
});
```

```typescript
// playwright.config.ts
import { defineConfig } from "@playwright/test";

export default defineConfig({
  projects: [
    {
      name: "ci",
      // The @interactive tag lives in the test title, so filter by title
      grepInvert: /@interactive/,
    },
    {
      name: "interactive",
      grep: /@interactive/,
    },
  ],
});
```

Run `npx playwright test --project=ci` in CI and `npx playwright test --project=interactive` locally when you need full keyboard/clipboard testing.

## Console Error Monitoring

Extend Playwright's test fixture to automatically fail on console errors:

```typescript
// e2e/fixtures.ts
import { test as base, expect } from "@playwright/test";

export const test = base.extend<{ consoleErrors: string[] }>({
  consoleErrors: async ({ page }, use) => {
    const errors: string[] = [];
    page.on("console", (msg) => {
      if (msg.type() === "error") {
        errors.push(msg.text());
      }
    });
    page.on("pageerror", (error) => {
      errors.push(error.message);
    });
    await use(errors);
    // Assert no console errors after each test
    expect(errors).toEqual([]);
  },
});
```

```typescript
// e2e/app.spec.ts
import { test } from "./fixtures";

test("home page has no console errors", async ({ page, consoleErrors }) => {
  await page.goto("/");
  await page.waitForLoadState("networkidle");
  // consoleErrors assertion happens automatically in fixture teardown
});
```

## CI Image Interception for Speed

In CI, network requests for large images slow down tests.
Intercept and replace them with tiny placeholders:

```typescript
// e2e/fixtures.ts
import { test as base } from "@playwright/test";

export const test = base.extend({
  page: async ({ page }, use) => {
    // Intercept image requests in CI
    if (process.env.CI) {
      await page.route("**/*.{png,jpg,jpeg,webp,gif}", (route) => {
        route.fulfill({
          status: 200,
          contentType: "image/png",
          // 1x1 transparent PNG
          body: Buffer.from(
            "iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mNk+M9QDwADhgGAWjR9awAAAABJRU5ErkJggg==",
            "base64"
          ),
        });
      });
    }
    await use(page);
  },
});
```

This pattern from zmod cut CI E2E test time by 40% by eliminating network latency for image assets.

## Production Build Verification

Test against the production build, not the dev server. This catches build-specific issues:

```typescript
// playwright.config.ts
import { defineConfig } from "@playwright/test";

export default defineConfig({
  webServer: {
    command: "npm run build && npm run preview",
    port: 4173,
    reuseExistingServer: !process.env.CI,
  },
  use: {
    baseURL: "http://localhost:4173",
  },
});
```

```typescript
// e2e/production.spec.ts
import { test, expect } from "@playwright/test";

test("production build serves all pages", async ({ page }) => {
  const urls = ["/", "/docs", "/about", "/contact"];
  for (const url of urls) {
    const response = await page.goto(url);
    expect(response?.status()).toBe(200);
  }
});

test("production build has no broken links", async ({ page }) => {
  await page.goto("/");
  // Collect hrefs before navigating, since navigation invalidates the locators
  const links = await page.locator("a[href^='/']").all();
  const hrefs: string[] = [];
  for (const link of links) {
    const href = await link.getAttribute("href");
    if (href) {
      hrefs.push(href);
    }
  }
  for (const href of hrefs) {
    const response = await page.goto(href);
    expect(response?.status()).toBe(200);
  }
});
```

## Sharded CI Runs

For large test suites, shard across multiple CI runners:

```yaml
# .github/workflows/e2e.yml
jobs:
  e2e:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        shard: [1/4, 2/4, 3/4, 4/4]
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npx playwright install --with-deps
      - run: npx playwright test --shard=${{ matrix.shard }}
      - uses: actions/upload-artifact@v4
        if: failure()
        with:
          name: playwright-report-${{ strategy.job-index }}
          path: playwright-report/
```

## Mock Backend Adapter for Frontend-Only E2E

When testing frontend
behavior independently from the real backend:

```typescript
// e2e/mocks/backend-adapter.ts
import type { Page } from "@playwright/test";

export async function mockBackend(page: Page) {
  await page.route("**/api/**", async (route) => {
    const url = new URL(route.request().url());
    const mocks: Record<string, unknown> = {
      "/api/user": { id: 1, name: "Test User", email: "test@example.com" },
      "/api/settings": { theme: "dark", language: "en" },
      "/api/documents": [
        { id: 1, title: "Doc 1" },
        { id: 2, title: "Doc 2" },
      ],
    };
    const mockData = mocks[url.pathname];
    if (mockData) {
      await route.fulfill({
        status: 200,
        contentType: "application/json",
        body: JSON.stringify(mockData),
      });
    } else {
      await route.continue();
    }
  });
}
```

```typescript
// e2e/frontend.spec.ts
import { test, expect } from "@playwright/test";
import { mockBackend } from "./mocks/backend-adapter";

test.beforeEach(async ({ page }) => {
  await mockBackend(page);
});

test("displays user name from mock API", async ({ page }) => {
  await page.goto("/dashboard");
  await expect(page.locator(".user-name")).toHaveText("Test User");
});
```

Mock backends are great for frontend-focused testing, but they do not replace integration tests against the real API. Use both: mocked for UI behavior, real for data flow.

---

# Level 2: DOM-based Component Tests

> Source: /pj/zudo-test/docs/testing-levels/level-2-dom-tests

## What Level 2 Tests

Level 2 tests verify **component behavior in a simulated DOM environment**. They can check that components render the right elements, respond to user events, and update state correctly -- all without a real browser.

Typical targets:

- Component rendering (does it output the right elements?)
- Conditional display (does it show/hide based on props or state?)
- Event handlers (does clicking trigger the right behavior?)
- Prop-driven behavior
- Component integration (parent-child communication)

## Tools

| Tool | Role |
|------|------|
| **vitest** | Test runner |
| **jsdom** or **happy-dom** | Simulated browser DOM environment |
| **@testing-library/react** | DOM queries and user event simulation |
| **@testing-library/preact** | For Preact projects |

## Setup

Configure vitest to use a DOM environment:

```typescript
// vitest.config.ts
import { defineConfig } from "vitest/config";

export default defineConfig({
  test: {
    environment: "jsdom", // or "happy-dom"
  },
});
```

**happy-dom** is faster than jsdom for most use cases. Use jsdom when you need broader browser API compatibility.

## Example

```tsx
// components/Toggle.tsx
import { useState } from "react";

export function Toggle({ label }: { label: string }) {
  const [on, setOn] = useState(false);
  return (
    <button onClick={() => setOn(!on)}>
      {label}: {on ? "ON" : "OFF"}
    </button>
  );
}
```

```tsx
// components/Toggle.test.tsx
import { describe, it, expect } from "vitest";
import { render, screen } from "@testing-library/react";
import userEvent from "@testing-library/user-event";
import { Toggle } from "./Toggle";

describe("Toggle", () => {
  it("renders with OFF state", () => {
    render(<Toggle label="Sound" />);
    expect(screen.getByText("Sound: OFF")).toBeTruthy();
  });

  it("toggles to ON on click", async () => {
    render(<Toggle label="Sound" />);
    await userEvent.click(screen.getByRole("button"));
    expect(screen.getByText("Sound: ON")).toBeTruthy();
  });
});
```

## Blind Spots

Level 2 tests use a **simulated** DOM, not a real browser. They cannot detect:

- CSS effects (the DOM has no CSS engine)
- Visual layout (elements may exist in DOM but be invisible via CSS)
- Browser-specific rendering
- Scroll behavior
- Animation and transition states
- Computed styles

The critical gap: an element can be present in the jsdom tree (Level 2 passes) while being completely invisible on screen due to CSS (Level 5 would catch this).

## When to Use Level 2

| Scenario | Level 2 Appropriate? |
|----------|---------------------|
| Component renders wrong text | Yes |
| Props not passed correctly | Yes |
| Click handler not updating state | Yes |
| Element present but not visible | No -- use Level 5 |
| CSS layout broken | No -- use Level 5 |
| Multi-page navigation flow | No -- use Level 4 |

---

# Tauri App Testing

> Source: /pj/zudo-test/docs/real-world-patterns/tauri-testing

## The WebKit-Only Rule

Tauri renders with WebKit (WKWebView on macOS, WebKitGTK on Linux; Windows uses WebView2). When writing Playwright E2E tests for a Tauri frontend, test against WebKit:

```typescript
// playwright.config.ts
import { defineConfig, devices } from "@playwright/test";

export default defineConfig({
  projects: [
    {
      name: "webkit",
      use: { ...devices["Desktop Safari"] },
    },
  ],
});
```

Do **not** test Tauri frontends with Chromium or Firefox in Playwright. The production app uses WebKit, so testing against other engines gives false confidence. A test passing in Chromium does not mean it works in the Tauri window.

## The Core Crate Pattern

Extract platform-independent business logic into a separate Rust crate that has no Tauri dependencies.
This crate can be tested with standard `cargo test` without needing a Tauri application context:

```
src-tauri/
  Cargo.toml        # depends on core + tauri
  src/
    main.rs         # Tauri setup, commands
    commands.rs     # #[tauri::command] handlers
core/
  Cargo.toml        # no Tauri dependency
  src/
    lib.rs
    settings.rs     # pure Rust logic
    file_ops.rs     # file operations
    transforms.rs   # data transforms
```

```toml
# core/Cargo.toml
[package]
name = "myapp-core"
version = "0.1.0"
edition = "2021"

[dependencies]
serde = { version = "1", features = ["derive"] }
serde_json = "1"
# No tauri dependency here
```

```rust
// core/src/settings.rs
use serde::{Deserialize, Serialize};

#[derive(Debug, Serialize, Deserialize, PartialEq)]
pub struct Settings {
    pub theme: String,
    pub font_size: u32,
}

impl Settings {
    pub fn with_theme(mut self, theme: &str) -> Self {
        self.theme = theme.to_string();
        self
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_with_theme() {
        let settings = Settings {
            theme: "light".to_string(),
            font_size: 14,
        };
        let updated = settings.with_theme("dark");
        assert_eq!(updated.theme, "dark");
        assert_eq!(updated.font_size, 14);
    }
}
```

The core crate pattern lets you run `cargo test` in CI without building the full Tauri app. This is fast, reliable, and catches logic bugs early.

## Backend Bridge Mock Adapter Pattern

In a Tauri app, the frontend communicates with the Rust backend through IPC commands.
For frontend testing, mock this bridge:

```typescript
// src/adapters/backend.ts -- the real adapter
import { invoke } from "@tauri-apps/api/core";

export interface Settings {
  theme: string;
  fontSize: number;
}

export interface BackendAdapter {
  getSettings(): Promise<Settings>;
  saveSettings(settings: Settings): Promise<void>;
  readFile(path: string): Promise<string>;
}

export const tauriBackend: BackendAdapter = {
  async getSettings() {
    return invoke("get_settings");
  },
  async saveSettings(settings) {
    return invoke("save_settings", { settings });
  },
  async readFile(path) {
    return invoke("read_file", { path });
  },
};
```

```typescript
// src/adapters/mock-backend.ts -- for testing
import type { BackendAdapter } from "./backend";

export function createMockBackend(
  overrides: Partial<BackendAdapter> = {}
): BackendAdapter {
  return {
    async getSettings() {
      return { theme: "dark", fontSize: 14 };
    },
    async saveSettings() {},
    async readFile() {
      return "mock file content";
    },
    ...overrides,
  };
}
```

```typescript
// In the app entry point
const backend =
  import.meta.env.MODE === "test" ? createMockBackend() : tauriBackend;
```

## The 8-Step Escalation Ladder

For Tauri apps, the escalation ladder extends beyond the standard 5 levels:

| Step | Method | What It Catches |
|------|--------|-----------------|
| 1 | `cargo test` on core crate | Pure Rust logic bugs |
| 2 | Vitest unit tests | Frontend logic bugs |
| 3 | Vitest + jsdom component tests | Component behavior bugs |
| 4 | Playwright WebKit (dev server) | Frontend rendering bugs |
| 5 | Playwright WebKit (production build) | Build-specific frontend bugs |
| 6 | verify-ui + headless-browser | CSS/visual bugs in frontend |
| 7 | Tauri dev mode manual test | IPC integration bugs |
| 8 | Tauri production build manual test | Full app packaging bugs |

Steps 1-6 are automatable and should be in CI. Steps 7-8 require the full Tauri application and are typically manual or require specialized CI with display servers.

### Step-by-Step Guide

**Steps 1-3: Fast, automatable, no browser needed**

```bash
# Step 1: Rust core logic
cd core && cargo test

# Step 2: Frontend unit tests
pnpm vitest --project unit

# Step 3: Frontend component tests
pnpm vitest --project component
```

**Steps 4-6: Need a browser, still automatable**

```bash
# Step 4: E2E against dev server (WebKit only)
pnpm dev &
npx playwright test --project=webkit

# Step 5: E2E against production build
pnpm build
pnpm preview &
npx playwright test --project=webkit

# Step 6: Visual verification
verify-ui --url http://localhost:4173 --selector ".app" --check "display: flex"
```

**Steps 7-8: Need full Tauri app**

```bash
# Step 7: Tauri dev mode
pnpm tauri dev
# Manual testing in the actual Tauri window

# Step 8: Production build
pnpm tauri build
# Test the built .dmg / .msi / .AppImage
```

## CI Configuration for Tauri Projects

```yaml
# .github/workflows/test.yml
jobs:
  rust-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: cd core && cargo test

  frontend-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: pnpm/action-setup@v4
      - run: pnpm install
      - run: pnpm vitest run

  e2e-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: pnpm/action-setup@v4
      - run: pnpm install
      - run: npx playwright install webkit --with-deps
      - run: pnpm build && pnpm preview &
      - run: npx playwright test --project=webkit
```

Note that Tauri E2E tests in CI require WebKit dependencies. On Ubuntu, this means `npx playwright install webkit --with-deps`. On macOS runners, WebKit is already available.

---

# Level 3: Build Output Verification

> Source: /pj/zudo-test/docs/testing-levels/level-3-build-output

## What Level 3 Tests

Level 3 tests verify **build output** -- the actual files produced by your build tool. Instead of testing source code directly, you run the build and inspect the results.

Typical targets:

- SSG (Static Site Generation) output HTML
- Template rendering results
- Bundler output (correct chunks, code splitting)
- Generated configuration files
- Build-time data transforms (MDX compilation, content collections)

## Tools

| Tool | Role |
|------|------|
| **vitest** | Test runner, reading and asserting on files |
| **fs/path** | Node.js file system APIs to read build output |
| **cheerio** | Parse and query HTML output |

## Example: SSG Output Verification

```typescript
// tests/build-output.test.ts
import { readFileSync } from "node:fs";
import { join } from "node:path";
import { describe, it, expect } from "vitest";

const DIST = join(__dirname, "../dist");

describe("build output", () => {
  it("generates index.html with correct title", () => {
    const html = readFileSync(join(DIST, "index.html"), "utf-8");
    expect(html).toContain("My Site");
  });

  it("generates sitemap.xml", () => {
    const sitemap = readFileSync(join(DIST, "sitemap.xml"), "utf-8");
    expect(sitemap).toContain("<urlset");
  });

  it("includes a charset meta tag in every page", () => {
    const pages = ["index.html", "about/index.html", "docs/index.html"];
    for (const page of pages) {
      const content = readFileSync(join(DIST, page), "utf-8");
      expect(content).toContain("<meta charset");
    }
  });
});
```

## Example: MDX Formatter Contract Testing

A real-world pattern from mdx-formatter: the Vitest suite tests the Rust formatting engine by running it on fixture files and comparing output:

```typescript
// tests/format.test.ts
import { describe, it, expect } from "vitest";
// `format` and `readFixture` are project helpers

describe("mdx formatting", () => {
  it("is idempotent", () => {
    const input = readFixture("sample.mdx");
    const first = format(input);
    const second = format(first);
    expect(first).toBe(second);
  });
});
```

**Idempotency testing** is a powerful invariant for any formatter or transformer: applying the operation twice should produce the same result as applying it once.

## Blind Spots

Level 3 tests verify file contents, not runtime behavior. They cannot detect:

- JavaScript runtime errors
- Client-side hydration issues
- Visual rendering problems
- Browser API interactions
- Dynamic content loaded after page load

## When to Use Level 3

| Scenario | Level 3 Appropriate? |
|----------|---------------------|
| SSG page missing from build | Yes |
| Wrong HTML structure in output | Yes |
| Bundle too large / wrong chunks | Yes |
| Hydration mismatch in browser | No -- use Level 4 |
| Page renders blank in browser | No -- use Level 4/5 |
| CSS not applied correctly | No -- use Level 5 |

---

# Backend & Node.js Testing

> Source: /pj/zudo-test/docs/real-world-patterns/backend-testing

## Frontend-Backend Separation Philosophy

The author's approach: strong in frontend, delegate backend implementation to AI. This makes testing strategy critical -- when you did not write the backend code yourself, tests are your primary verification that it works correctly.

The key principle is **separation of concerns enables testable backends**. When frontend and backend are properly separated:

- Each layer can be tested independently
- Frontend tests use MSW-like mocks to decouple from backend availability
- Backend tests run against real or emulated infrastructure without needing a browser
- Changes to one layer do not break the other layer's tests

This separation is not just an architectural nicety -- it is what makes AI-assisted backend development viable. The AI writes the implementation, and the tests verify it actually works.

## Cloudflare Functions with Miniflare

For projects using Cloudflare Workers/Functions with D1 (SQLite) databases and R2 object storage, Miniflare provides local emulation for integration testing.

### Test Environment Setup

Create a helper that spins up an isolated test environment with real D1 and R2 bindings:

```ts
// test/helpers/test-env.ts
import { readFileSync } from "node:fs";
import { Miniflare } from "miniflare";
import type { D1Database, R2Bucket } from "@cloudflare/workers-types";

export async function createTestEnv() {
  const mf = new Miniflare({
    modules: true,
    script: "",
    d1Databases: ["DB"],
    r2Buckets: ["BUCKET"],
  });
  const env = await mf.getBindings();

  // Run SQL migrations
  const migration = readFileSync("migrations/0001_init.sql", "utf-8");
  const db = env.DB as D1Database;
  await db.exec(migration);

  return { mf, env, db, bucket: env.BUCKET as R2Bucket };
}
```

### Test Data Factories

Use factory functions to create test data with sensible defaults:

```ts
// test/helpers/factories.ts
export interface Project {
  id: string;
  name: string;
  createdAt: string;
}

export function createProject(overrides: Partial<Project> = {}): Project {
  return {
    id: crypto.randomUUID(),
    name: "Test Project",
    createdAt: new Date().toISOString(),
    ...overrides,
  };
}
```

### Full CRUD Lifecycle Tests

Test the complete create-read-update-delete cycle against the emulated infrastructure:

```ts
import { describe, it, expect, beforeEach } from "vitest";
import { createTestEnv } from "./helpers/test-env"; // adjust paths to your layout
// `app` is the Hono application under test

describe("Projects API", () => {
  let env: any;
  let mf: any;

  beforeEach(async () => {
    ({ env, mf } = await createTestEnv());
  });

  it("creates and retrieves a project", async () => {
    const createRes = await app.request(
      new Request("http://localhost/api/projects", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ name: "My Project" }),
      }),
      {},
      env,
    );
    expect(createRes.status).toBe(201);
    const created = await createRes.json();

    const getRes = await app.request(
      new Request(`http://localhost/api/projects/${created.id}`),
      {},
      env,
    );
    expect(getRes.status).toBe(200);
    const fetched = await getRes.json();
    expect(fetched.name).toBe("My Project");
  });
});
```

The key pattern here is `app.request(req, {}, env)` -- this calls the Hono app directly with the Miniflare-provided bindings, bypassing HTTP entirely.

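The CRUD test above creates data through the API; you can also seed D1 directly and assert only on read endpoints. A minimal sketch -- the `projectInsert` helper, the `projects` table, and its column names are assumptions for illustration, not part of the real project:

```javascript
// test/helpers/seed.js -- hypothetical helper for seeding D1 in tests
function projectInsert(project) {
  // Build a parameterized INSERT for the assumed `projects` table
  return {
    sql: "INSERT INTO projects (id, name, created_at) VALUES (?, ?, ?)",
    params: [project.id, project.name, project.createdAt],
  };
}

// Usage inside a test (db is the Miniflare-provided D1 binding):
//   const project = createProject({ name: "Seeded" });
//   const { sql, params } = projectInsert(project);
//   await db.prepare(sql).bind(...params).run();
//   const res = await app.request(
//     new Request(`http://localhost/api/projects/${project.id}`), {}, env,
//   );
```

Keeping the SQL builder pure makes it trivially unit-testable, while the D1 call itself stays a one-liner in the integration test.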
### Separate Vitest Config

Backend tests need their own Vitest config with `environment: 'node'`:

```ts
// vitest.config.backend.ts
import { defineConfig } from "vitest/config";

export default defineConfig({
  test: {
    environment: "node",
    include: ["test/backend/**/*.test.ts"],
    testTimeout: 10000,
  },
});
```

## HTTP API Testing

For testing against live or deployed endpoints (staging, preview, production), use direct HTTP requests.

### Environment-Based URL Switching

```ts
// test/helpers/api-client.ts
function getBaseUrl(): string {
  if (process.env.TEST_API_URL) {
    return process.env.TEST_API_URL;
  }
  if (process.env.CF_PAGES_URL) {
    return process.env.CF_PAGES_URL;
  }
  return "http://localhost:8787";
}

const BASE_URL = getBaseUrl();

export function apiGet(path: string): Promise<Response> {
  return fetch(`${BASE_URL}${path}`, {
    headers: {
      Authorization: `Bearer ${process.env.TEST_API_TOKEN}`,
    },
  });
}
```

### Destructive Test Guards

Tests that modify data should be skipped in production:

```ts
const isProduction = process.env.TEST_ENV === "production";

describe("Admin API", () => {
  it.skipIf(isProduction)("deletes all test data", async () => {
    const res = await apiGet("/api/admin/reset-test-data");
    expect(res.status).toBe(200);
  });
});
```

### Network Timeouts

HTTP tests need longer timeouts than unit tests:

```ts
// vitest.config.http.ts
import { defineConfig } from "vitest/config";

export default defineConfig({
  test: {
    environment: "node",
    include: ["test/http/**/*.test.ts"],
    testTimeout: 30000,
  },
});
```

## HTTP Client Testing with Mocks

When testing code that makes HTTP requests (API clients, auth flows), mock `fetch` at the global level.

### The `vi.stubGlobal` Pattern

```ts
import { describe, it, expect, vi, beforeEach, afterEach } from "vitest";
import { ApiClient } from "../src/api-client"; // adjust path to your project

describe("ApiClient", () => {
  const fetchMock = vi.fn();

  beforeEach(() => {
    vi.stubGlobal("fetch", fetchMock);
  });

  afterEach(() => {
    // unstubAllGlobals (not restoreAllMocks) restores the real fetch
    vi.unstubAllGlobals();
  });

  it("sends auth header", async () => {
    fetchMock.mockResolvedValueOnce(
      new Response(JSON.stringify({ ok: true }), { status: 200 }),
    );

    const client = new ApiClient({ token: "test-token" });
    await client.get("/api/data");

    expect(fetchMock).toHaveBeenCalledWith(
      expect.stringContaining("/api/data"),
      expect.objectContaining({
        headers: expect.objectContaining({
          Authorization: "Bearer test-token",
        }),
      }),
    );
  });
});
```

### Auth Flow Testing

Test token refresh and retry logic by chaining mock responses:

```ts
it("retries with refreshed token on 401", async () => {
  // First call returns 401
  fetchMock.mockResolvedValueOnce(new Response(null, { status: 401 }));
  // Token refresh succeeds
  fetchMock.mockResolvedValueOnce(
    new Response(JSON.stringify({ token: "new-token" }), { status: 200 }),
  );
  // Retry with new token succeeds
  fetchMock.mockResolvedValueOnce(
    new Response(JSON.stringify({ data: "success" }), { status: 200 }),
  );

  const client = new ApiClient({ token: "old-token" });
  const result = await client.get("/api/data");

  expect(result.data).toBe("success");
  expect(fetchMock).toHaveBeenCalledTimes(3);
});
```

## File System Testing

For Node.js tools that read/write files, use temporary directories for isolation.

### Temp Directory Pattern

```ts
import { describe, it, expect, beforeEach, afterEach } from "vitest";
import { mkdtempSync, rmSync, writeFileSync, readFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";
import { processFile } from "../src/file-processor"; // adjust path to your project

describe("FileProcessor", () => {
  let tempDir: string;

  beforeEach(() => {
    tempDir = mkdtempSync(join(tmpdir(), "test-"));
  });

  afterEach(() => {
    rmSync(tempDir, { recursive: true, force: true });
  });

  it("processes and writes output file", () => {
    const inputPath = join(tempDir, "input.txt");
    const outputPath = join(tempDir, "output.txt");
    writeFileSync(inputPath, "hello world");

    processFile(inputPath, outputPath);

    expect(readFileSync(outputPath, "utf-8")).toBe("HELLO WORLD");
  });
});
```

The key points:

- `mkdtempSync` creates a unique directory per test run -- no collisions
- `afterEach` cleanup ensures no leftover files between tests
- All paths are relative to `tempDir` -- tests never touch the real filesystem

## Key Principles for Backend Testing

1. **Use `environment: 'node'`** in vitest config, not `jsdom`. Backend code does not need a DOM.
2. **Separate vitest configs** for frontend and backend tests. They need different environments, different timeouts, and often different setup files.
3. **Helper factories** for test data. Never hardcode test data inline -- use factory functions with sensible defaults and overrides.
4. **Environment variables** for configuration switching. Use `process.env` to switch between local, preview, and production URLs rather than hardcoding.
5. **Guard destructive tests** with `it.skipIf()`. Tests that delete data or reset state should never run against production.
6. **Longer timeouts for network tests**. Default 5-second timeouts are too short for HTTP integration tests. Use 30 seconds or more.

---

# Level 4: E2E Browser Tests

> Source: /pj/zudo-test/docs/testing-levels/level-4-e2e-browser

## What Level 4 Tests

Level 4 tests run in a **real browser** (or headless browser), verifying complete user flows from page load through interaction to final state. They catch runtime errors, navigation issues, and interaction bugs that simulated environments miss.

Typical targets:

- Full user workflows (login, form submission, navigation)
- Client-side routing
- API integration (with mocks or real endpoints)
- Console error detection
- Cross-page interactions
- Dynamic content loading

## Tools

| Tool | Role |
|------|------|
| **Playwright** | Full E2E test framework with multi-browser support |
| **headless-browser** | Quick verification script for page health checks |

## Example: Playwright E2E

```typescript
// e2e/navigation.spec.ts
import { test, expect } from "@playwright/test";

test("navigates from home to docs", async ({ page }) => {
  await page.goto("/");
  await page.click('a[href="/docs"]');
  await expect(page).toHaveURL("/docs");
  await expect(page.locator("h1")).toHaveText("Documentation");
});

test("no console errors on page load", async ({ page }) => {
  const errors: string[] = [];
  page.on("console", (msg) => {
    if (msg.type() === "error") errors.push(msg.text());
  });
  await page.goto("/");
  expect(errors).toHaveLength(0);
});
```

## Example: Headless Browser Quick Check

For rapid verification without a full test suite, use headless-browser to take a screenshot and check for errors:

```bash
# Quick page health check
node headless-check.js --url http://localhost:3000 --screenshot

# Check for console errors
node headless-check.js --url http://localhost:3000 --console-errors
```

## Console Error Monitoring

A powerful pattern from production projects: monitor console output during E2E tests to catch unexpected errors:

```typescript
// e2e/fixtures.ts
import { test as base } from "@playwright/test";

export const test = base.extend({
  page: async ({ page }, use) => {
    const errors: string[] = [];
    page.on("console", (msg) => {
      if (msg.type() === "error") {
        errors.push(msg.text());
      }
    });
    await use(page);
    if (errors.length > 0) {
      throw new Error(
        `Console errors detected:\n${errors.join("\n")}`
      );
    }
  },
});
```

## Blind Spots

Level 4 tests interact with real browser rendering but typically assert on **DOM state**, not visual appearance.

They may miss:

- Subtle CSS issues (off-by-one pixel, wrong color shade)
- Elements technically visible but visually overlapped
- Font rendering differences
- Responsive layout breakpoints
- Computed style values

Level 4 confirms "the element exists and is interactable." For "the element looks correct," escalate to Level 5.

## When to Use Level 4

| Scenario | Level 4 Appropriate? |
|----------|---------------------|
| Page navigation broken | Yes |
| Form submission fails | Yes |
| Dynamic content not loading | Yes |
| Console errors on page | Yes |
| Element positioned wrong | Partial -- Level 5 better |
| CSS color/spacing wrong | No -- use Level 5 |

---

# Level 5: Deterministic + Visual Verification

> Source: /pj/zudo-test/docs/testing-levels/level-5-visual-verification

## What Level 5 Tests

Level 5 combines **deterministic computed style assertions** with **visual screenshot verification** to catch bugs that all lower levels miss. This is the highest-confidence verification method for UI/CSS work.

The two complementary tools:

- **verify-ui** -- Extracts computed CSS values from a running page and asserts exact values (no LLM interpretation)
- **headless-browser** -- Takes screenshots for visual comparison and interaction testing

## Why Both Are Needed

Computed style checks alone can miss visual issues that come from element interaction (overlapping, stacking context). Screenshots alone rely on LLM interpretation which can have confirmation bias.

Together, they provide:

| Approach | Strength | Weakness |
|----------|----------|----------|
| verify-ui | Deterministic, exact values | Cannot see visual composition |
| headless-browser | Sees full visual result | LLM interpretation may have bias |
| Both combined | Deterministic + visual | Minimal blind spots |

## verify-ui Example

```bash
# Check computed styles of an element
verify-ui --url http://localhost:3000 \
  --selector ".hero-title" \
  --check "font-size: 48px" \
  --check "color: rgb(255, 255, 255)" \
  --check "display: block"
```

```bash
# Verify visibility
verify-ui --url http://localhost:3000 \
  --selector ".notification-banner" \
  --check "display: block" \
  --check "opacity: 1" \
  --check "visibility: visible"
```

## The Common Failure This Catches

Consider the classic scenario:

1. An AI agent adds a notification banner component
2. Unit test confirms the component renders (Level 1 passes)
3. DOM test confirms the element is in the tree (Level 2 passes)
4. But the parent container has `overflow: hidden` and `height: 0`
5. The banner exists in the DOM but is completely invisible

Level 5 catches this:

```bash
# verify-ui would reveal:
# .notification-banner parent has height: 0px, overflow: hidden
verify-ui --url http://localhost:3000 \
  --selector ".notification-container" \
  --check "height: auto" \
  --check "overflow: visible"
```

And the screenshot from headless-browser would visually confirm the banner is not visible.

## headless-browser for Visual Verification

```bash
# Full-page screenshot
node headless-check.js --url http://localhost:3000 --screenshot --full-page

# Screenshot of specific viewport
node headless-check.js --url http://localhost:3000 \
  --screenshot --width 375 --height 812  # iPhone viewport
```

## Combined Workflow

For UI/CSS changes, the recommended verification workflow is:

1. Make the CSS/layout change
2. Run **verify-ui** to assert exact computed values
3. Take a **screenshot** with headless-browser to visually confirm
4. If verify-ui passes but the screenshot looks wrong, investigate stacking/composition issues
5. If the screenshot looks right but verify-ui fails, update the expected values

This two-step approach eliminates both false positives (screenshot looks fine but values are wrong) and false negatives (values look right but visual composition is broken).

## Blind Spots

Level 5 has minimal blind spots, but some remain:

- Font rendering differences across operating systems
- Sub-pixel rendering variations
- Animation timing (mid-frame states)
- Browser-specific quirks not present in the test browser

## When to Use Level 5

| Scenario | Level 5 Appropriate? |
|----------|---------------------|
| CSS change not taking effect | Yes |
| Element not visible despite being in DOM | Yes |
| Layout spacing looks wrong | Yes |
| Color or font-size incorrect | Yes |
| Responsive breakpoint issue | Yes |
| Any time user says "it's still broken" after lower-level test passed | Yes |

---

# test-wisdom Skill

> Source: /pj/zudo-test/docs/overview/test-wisdom-skill

The `test-wisdom` skill is a Claude Code skill that indexes all frontend testing documentation articles in this site. It enables AI coding agents to quickly look up relevant testing patterns and techniques during development.

## What It Does

The skill maintains a documentation index that maps testing concepts to their articles. When invoked, it finds and reads the relevant article, then applies the recommended patterns.

The documentation index is generated from all MDX articles under `src/content/docs/` and `src/content/docs-ja/` (for Japanese). Each `.mdx` file has YAML frontmatter with `title` and `description` fields that help identify the right article to read.

## Installation

Run the setup script to create the skill and symlink it to your global Claude Code skills directory:

```bash
pnpm run setup:doc-skill
```

This creates the skill at `.claude/skills/test-wisdom/` and symlinks it to `~/.claude/skills/test-wisdom`.

## Usage

### Lookup Mode (default)

In any Claude Code session, invoke the skill with a topic keyword:

```
/test-wisdom vitest patterns
/test-wisdom playwright e2e
/test-wisdom testing level escalation
```

The skill will find the relevant article(s) from the documentation, read them, and apply the testing patterns when writing code.

### Update Mode (`-u` / `--update`)

When you have new information about testing and want to add or update documentation in this repo, use the `-u` flag:

```
/test-wisdom -u vitest
/test-wisdom --update playwright patterns
```

In update mode, the skill guides Claude to:

1. Ask what you learned or want to document
2. Search existing docs to find related articles
3. Create a new `.mdx` file or update an existing one
4. Update the corresponding Japanese translation under `docs-ja/`
5. Run `pnpm format:md` to format the new/changed files

## Skill Structure

```
.claude/skills/test-wisdom/
  SKILL.md      # Generated skill definition
  docs/         # Symlink to src/content/docs/
  docs-ja/      # Symlink to src/content/docs-ja/
```

The `SKILL.md` file is generated by the setup script. It contains the skill metadata and instructions for how Claude Code should use the documentation.

## How It Works

When you invoke `/test-wisdom <topic>`, Claude Code:

1. Finds the relevant article(s) from the `docs/` directory based on the topic
2. Reads only the specific article(s) needed -- it does not load all articles at once
3. Applies the information from the article when answering your question
4. Mentions the source article path so you can find it for further reading

Japanese documentation is available under `docs-ja/`. When working in Japanese or asking for Japanese content, the skill prefers articles from `docs-ja/`.

---

# Claude

> Source: /pj/zudo-test/docs/claude

Claude Code configuration reference.

## Resources

---

# headless-browser

> Source: /pj/zudo-test/docs/claude-skills/headless-browser

## File Structure

```
headless-browser/
├── SKILL.md
└── scripts/
    └── headless-check.js
```

# Headless Browser Skill

Browser automation with two efficiency tiers for optimal token usage.

## Decision Tree

```
Need browser automation?
|
+-- Just checking page health/errors/screenshot?
|   --> Tier 1: headless-check.js (fastest, lowest tokens)
|
+-- Need to interact (click, fill, navigate)?
|   --> Tier 2: custom Playwright script (medium tokens)
|
+-- Need persistent context, rich introspection, or very complex scenarios?
    --> MCP Playwright (highest capability, higher tokens)
```

---

## Tier 1: Lightweight Checks (headless-check.js)

**Best for:** Quick health checks, screenshot capture, error detection

**Script:** `$HOME/.claude/skills/headless-browser/scripts/headless-check.js`

### Commands

Basic check (recommended for error detection):

```bash
node $HOME/.claude/skills/headless-browser/scripts/headless-check.js --url <url> --no-block-resources
```

Quick check (faster, but may miss font/image errors):

```bash
node $HOME/.claude/skills/headless-browser/scripts/headless-check.js --url <url>
```

With screenshot:

```bash
node $HOME/.claude/skills/headless-browser/scripts/headless-check.js --url <url> --screenshot viewport --no-block-resources
node $HOME/.claude/skills/headless-browser/scripts/headless-check.js --url <url> --screenshot full --no-block-resources
```

Options:

- `--timeout <ms>` - Timeout (default: 15000)
- `--wait-until load|networkidle|domcontentloaded` - Wait strategy
- `--no-javascript` - Disable JavaScript
- `--no-block-resources` - Load all resources (recommended for accurate error detection)
- `--user-agent "..."` - Custom user agent

**Important:** Always use `--no-block-resources` when checking for errors. Without it, fonts and images are blocked for speed, which can cause false `net::ERR_FAILED` errors or miss real resource loading failures.

### Output

JSON with:

- `title`, `statusCode`, `finalUrl`, `durationMs`
- `hasErrors` - Boolean error indicator
- `console` - Console messages (truncated, collapsed)
- `pageErrors` - JavaScript errors
- `networkErrors` - Failed requests
- `metrics` - Performance timing
- `screenshot` - File path if captured

### Example Output

```
{
  "url": "https://example.com",
  "title": "Example Domain",
  "statusCode": 200,
  "durationMs": 1234,
  "hasErrors": false,
  "console": { "entries": [], "total": 0 },
  "pageErrors": [],
  "screenshot": {
    "path": "/Users/you/cclogs/my-project/headless-screenshots/screenshot-2025-01-28.png"
  }
}
```

---

## Tier 2: Interactive Operations (custom Playwright scripts)

**Best for:** Clicking, form filling, navigation, multi-step automation

**CRITICAL:** Playwright is installed in `$HOME/.claude/skills/headless-browser/node_modules/`. Scripts **MUST** be saved under `$HOME/.claude/skills/headless-browser/` (e.g., `$HOME/.claude/skills/headless-browser/tmp-browser-check.mjs`). **NEVER save to `/tmp/` or `$HOME/.claude/` root** — Node will fail with `ERR_MODULE_NOT_FOUND` because it cannot find the `playwright` package outside this directory tree.

### How to Use

Write a temporary `.mjs` script, save it as `$HOME/.claude/skills/headless-browser/tmp-browser-check.mjs`, and run it with `node`.

### Script Template

```javascript
// Save as $HOME/.claude/skills/headless-browser/tmp-browser-check.mjs
import { chromium } from 'playwright';

const browser = await chromium.launch();
const page = await browser.newPage({ viewport: { width: 1280, height: 800 } });
await page.goto('http://localhost:4321/some/page', { waitUntil: 'networkidle' });

// Interact
await page.locator('button:has-text("Submit")').click();
await page.waitForTimeout(500);

// Screenshot
const path = `${process.env.HOME}/cclogs/REPO/headless-screenshots/result.png`;
await page.screenshot({ path });
console.log('Screenshot:', path);

// Evaluate
const result = await page.evaluate(() => document.title);
console.log('Title:', result);

await browser.close();
```

### Running

```bash
node $HOME/.claude/skills/headless-browser/tmp-browser-check.mjs
```

### Common Operations

```javascript
// Click by selector
await page.locator('.my-button').click();

// Click by text
await page.locator('button:has-text("Save")').click();

// Fill an input
await page.locator('input[name="email"]').fill('test@example.com');

// Press a key
await page.keyboard.press('Enter');

// Wait for element
await page.locator('.result').waitFor({ state: 'visible', timeout: 5000 });

// Get computed style / check z-index
const zIndex = await page.evaluate(() => {
  return window.getComputedStyle(document.querySelector('.panel')).zIndex;
});

// Scroll to bottom
await page.evaluate(() => window.scrollTo(0, document.body.scrollHeight));

// Full page screenshot
await page.screenshot({ path: 'full.png', fullPage: true });
```

### Cleanup

Delete the temporary script after use:

```bash
rm -f $HOME/.claude/skills/headless-browser/tmp-browser-check.mjs
```

---

## When to Use What

| Task | Recommended Tier |
|------|------------------|
| Check if page loads | Tier 1 |
| Capture screenshot | Tier 1 |
| Check for console errors | Tier 1 + `--no-block-resources` |
| Check network failures | Tier 1 + `--no-block-resources` |
| Click a button | Tier 2 |
| Fill a form | Tier 2 |
| Navigate through pages | Tier 2 |
| Test login flow | Tier 2 |
| Extract text after interaction | Tier 2 |
| Complex stateful automation | MCP Playwright |
| Self-healing tests | MCP Playwright |
| Deep debugging with tracing | MCP Playwright / Chrome DevTools |

---

## CSS/Style Verification Guidelines

**For CSS/style verification, prefer `/verify-ui` which provides deterministic computed style checks before visual analysis.** The guidelines below apply when using `/headless-browser` directly for CSS checks.

### Theme Awareness

The target website may support light/dark themes. When checking CSS/style-related changes, capture screenshots in **both** color schemes. Use Playwright's `colorScheme` option:

```javascript
// Capture in both themes (assumes an existing `browser` from chromium.launch()
// and an `ssDir` screenshot output directory)
for (const scheme of ['light', 'dark']) {
  const page = await browser.newPage({
    viewport: { width: 1280, height: 800 },
    colorScheme: scheme,
  });
  await page.goto(url, { waitUntil: 'networkidle' });
  await page.screenshot({ path: `${ssDir}/check-${scheme}.png` });
  await page.close();
}
```

### Responsive Width Variations

When checking layout or fluid design, capture at multiple viewport widths to cover the design's breakpoints. The number and values of widths depend on the project — check the project's CSS/config for actual breakpoints (e.g., `@theme` breakpoints, Tailwind config, media queries) and capture at widths that test each transition point.

For example, a project with `sm: 640px`, `lg: 1024px`, `xl: 1280px` breakpoints needs captures at widths like 400px, 700px, 1100px, and 1300px to verify behavior on each side of each breakpoint.

If the project's breakpoints are unclear, ask the user which width variations matter for the layout being checked.

### Mandatory Visual Verification

**CRITICAL**: After capturing screenshots, you MUST read and carefully examine every captured PNG file using the Read tool. Do NOT report success without visually verifying the screenshots show the expected result.

Workflow:

1. Capture screenshots
2. **Read each screenshot with the Read tool**
3. **Carefully inspect** — check borders, spacing, alignment, color, contrast
4. Compare against what was requested
5. Only then report the result

Screenshots that are captured but not visually inspected are worthless. If you skip verification, you will miss problems and the user will have to point them out repeatedly.

## Best Practices

1. **Start with Tier 1** - If you just need to check if a page works, use headless-check.js
2. **Escalate to Tier 2** - When interactions are needed, write a custom Playwright script
3. **Save scripts under `$HOME/.claude/skills/headless-browser/`** - This is where `playwright` is installed as a node_module
4. **Clean up temp scripts** - Delete `$HOME/.claude/skills/headless-browser/tmp-*.mjs` after use
5. **Use `waitForTimeout` between actions** - Gives the page time to settle after interactions
6. **Capture both themes** - When checking CSS, use `colorScheme: 'light'` and `colorScheme: 'dark'`
7. **Capture at project breakpoints** - When checking layout, read the project's breakpoint config and capture widths that cover each transition
8. **Always visually verify** - Read every captured PNG with the Read tool before reporting

---

## Technical Notes

- Both tiers use Playwright's headless Chromium (installed in `$HOME/.claude/skills/headless-browser/node_modules/`)
- **Resource blocking:** By default, Tier 1 blocks images/fonts for speed.
Use `--no-block-resources` for accurate error detection - Tier 2 scripts must be saved under `$HOME/.claude/skills/headless-browser/` so Node module resolution can find the `playwright` package in its `node_modules/` - Screenshots saved to `$HOME/cclogs/{repo-name}/headless-screenshots/` (Tier 1) or custom path (Tier 2) - Both tiers are more token-efficient than MCP Playwright --- # verify-ui > Source: /pj/zudo-test/docs/claude-skills/verify-ui ## File Structure ``` verify-ui/ ├── SKILL.md └── scripts/ └── verify-styles.mjs ``` # Verify UI Verify that CSS/UI changes actually match what was requested. ## Core Principle The user asked you to do something ("add a border", "center the dialog", "make it full width on mobile"). After implementing, verify that **the specific thing they asked for** is actually working. Not a generic checklist — verify the requirement. ## Step 1: Clarify What to Verify **If the requirement is clear** — translate it to verifiable CSS properties: | User said | Verify these properties | |-----------|------------------------| | "add a border" | `border-style` (not `none`), `border-width` (not `0px`), `border-color` | | "center the dialog" | `margin` (should be `auto` or symmetric), bounding box position | | "full width on mobile" | `width` at narrow viewport, `max-width` | | "remove rounded corners" | `border-radius` (should be `0px`) | | "make text bigger" | `font-size` | | "fix the z-index" | `z-index`, stacking relative to other elements | **If the requirement is vague** ("check the result", "verify it looks good", "confirm it works") — **ask the user back**: > "What specifically should I verify? For example: is it about the border, the positioning, the spacing, the colors, or something else?" Do NOT proceed with a generic screenshot check when you don't know what you're looking for. That's how confirmation bias happens. 
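The requirement-to-property table above boils down to a data table plus a mechanical comparison. A hypothetical sketch of that idea, assuming computed styles have already been extracted in a browser run (this is illustrative only, not the actual `verify-styles.mjs` logic, and the check rules are examples):

```javascript
// Hypothetical: map a user requirement to property checks, then compare
// against computed style values. Rules and names are illustrative.
const checks = {
  'add a border': [
    { prop: 'border-style', bad: (v) => v === 'none', expected: 'not "none"' },
    { prop: 'border-width', bad: (v) => v === '0px', expected: 'not "0px"' },
  ],
  'remove rounded corners': [
    { prop: 'border-radius', bad: (v) => v !== '0px', expected: '0px' },
  ],
};

// `computed` would come from getComputedStyle() in a real browser run.
function verify(requirement, computed) {
  return checks[requirement].map(({ prop, bad, expected }) => {
    const value = computed[prop];
    const ok = !bad(value);
    return `[${ok ? 'PASS' : 'FAIL'}] ${prop}: ${value} (expected: ${expected})`;
  });
}

console.log(
  verify('add a border', { 'border-style': 'solid', 'border-width': '0px' }).join('\n')
);
// [PASS] border-style: solid (expected: not "none")
// [FAIL] border-width: 0px (expected: not "0px")
```

The point of this shape: once the requirement is expressed as property rules, the pass/fail decision is deterministic data comparison, with no screenshot interpretation involved.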
## Step 2: Extract Computed Styles Run the verification script targeting the element in question: ```bash LOGDIR=$(node $HOME/.claude/scripts/get-logdir.js) mkdir -p "$LOGDIR" node $HOME/.claude/skills/verify-ui/scripts/verify-styles.mjs "" "" "$LOGDIR/verify-ui" "" "" ``` **Detect viewport widths from the project's breakpoints:** ```bash grep -n "breakpoint" src/styles/global.css 2>/dev/null ``` Pick widths that test each side of each breakpoint. Default: `400,800,1200`. Default schemes: `light,dark`. **Parse the JSON output.** Find the properties relevant to the user's request and compare against expected values. ``` [PASS] border-style: solid (expected: not "none") [FAIL] border-width: 0px (expected: 1px) ← THIS IS THE PROBLEM ``` If the computed style check reveals the issue, fix it. No screenshot analysis needed — the data is deterministic. ## Step 3: Visual Confirmation (if computed styles pass) If computed styles look correct but the user's concern might be visual (layout, spacing, alignment), read the captured screenshots: 1. **Read** each screenshot with the Read tool 2. **Describe** what you see — specifically about the thing the user asked for 3. **Compare** against the requirement 4. **Report** whether it matches **NEVER say "looks correct" without stating what specific thing you checked and what you observed.** ## When to Use Multiple Widths/Themes - **Always** if the change involves responsive behavior (breakpoint-dependent styling) - **Always** if the change involves colors or borders (may be invisible in one theme) - **Not needed** for changes that are viewport/theme independent (e.g., changing font-weight) ## Anti-Patterns - **Generic "take a screenshot and verify"** — verify WHAT? If you don't know, ask. - **"Looks correct" after glancing at screenshot** — state what you checked. - **Running verification without knowing what you're looking for** — confirmation bias guaranteed. 
- **Checking only one viewport width when the change is responsive** — you'll miss breakpoint issues.
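The last anti-pattern is easy to avoid mechanically: derive the capture widths from the project's breakpoints, as described under Responsive Width Variations. A hypothetical helper sketching that rule, one width per region the breakpoints define (not part of the skill; the `margin` offset is an arbitrary assumption):

```javascript
// Hypothetical sketch: pick one capture width per region -- below the
// first breakpoint, between each adjacent pair, and above the last.
function captureWidths(breakpoints, margin = 60) {
  const bps = [...breakpoints].sort((a, b) => a - b);
  const widths = [bps[0] - margin]; // below the first breakpoint
  for (let i = 1; i < bps.length; i++) {
    widths.push(Math.round((bps[i - 1] + bps[i]) / 2)); // between each pair
  }
  widths.push(bps[bps.length - 1] + margin); // above the last
  return widths;
}

console.log(captureWidths([640, 1024, 1280]));
// [ 580, 832, 1152, 1340 ]
```

For the `sm: 640px`, `lg: 1024px`, `xl: 1280px` example from earlier, this yields four widths covering each side of each transition, which is the same coverage idea as the 400/700/1100/1300 set mentioned above; the exact values are a judgment call.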