Playwright Patterns

E2E testing patterns with Playwright for CI and production verification.

CI-Safe vs @interactive Test Split

Not all E2E tests can run in CI. Tests requiring keyboard shortcuts, clipboard access, or desktop-specific interactions should be tagged and split:

// e2e/basic-navigation.spec.ts -- runs in CI
import { test, expect } from "@playwright/test";

test("loads the home page", async ({ page }) => {
  await page.goto("/");
  await expect(page.locator("h1")).toBeVisible();
});

// e2e/keyboard-shortcuts.spec.ts -- only runs locally
import { test, expect } from "@playwright/test";

test("@interactive Ctrl+S saves document", async ({ page }) => {
  await page.goto("/editor");
  await page.keyboard.press("Control+KeyS");
  await expect(page.locator(".save-indicator")).toHaveText("Saved");
});

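// playwright.config.ts
import { defineConfig } from "@playwright/test";

export default defineConfig({
  projects: [
    {
      name: "ci",
      testMatch: /.*\.spec\.ts/,
      // The tag lives in the test title, so exclude it with grepInvert.
      // testIgnore only matches file paths and would never see the tag.
      grepInvert: /@interactive/,
    },
    {
      name: "interactive",
      testMatch: /.*\.spec\.ts/,
      // Run only tests whose title carries the @interactive tag.
      grep: /@interactive/,
    },
  ],
});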

💡 Tip

Run npx playwright test --project=ci in CI and npx playwright test --project=interactive locally when you need full keyboard/clipboard testing.
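Because the tag sits in the test title, the split is a title filter rather than a file filter. The sketch below is a simplified model of how a grep / grep-invert pair selects tests. The selectTitles helper and the TitleFilter type are hypothetical names for illustration, and Playwright's own matching considers more than the bare title, so treat this as a mental model only.

```typescript
// Simplified model: grep keeps matching titles, grepInvert drops them.
interface TitleFilter {
  grep?: RegExp;
  grepInvert?: RegExp;
}

function selectTitles(titles: string[], filter: TitleFilter): string[] {
  return titles.filter((title) => {
    if (filter.grep && !filter.grep.test(title)) return false;
    if (filter.grepInvert && filter.grepInvert.test(title)) return false;
    return true;
  });
}

// The two test titles from the specs above.
const titles = ["loads the home page", "@interactive Ctrl+S saves document"];

// CI-safe set: everything except @interactive.
const ciTitles = selectTitles(titles, { grepInvert: /@interactive/ });
// Local-only set: just the @interactive tests.
const interactiveTitles = selectTitles(titles, { grep: /@interactive/ });
```

A spec file can therefore mix CI-safe and @interactive tests freely; only the title decides which run picks a test up.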

Quarantining Flakes: The Retries-Asymmetry Trap

Beyond the CI-safe vs @interactive split, there is a second tag worth knowing: @flaky. It exists because of a subtle trap: CI and your local pre-push gate often run with different retry budgets, so a test can be green in one and red in the other.

The trap starts here:

// playwright.config.ts
import { defineConfig } from "@playwright/test";

export default defineConfig({
  // CI retries twice; local runs get zero retries.
  retries: process.env.CI ? 2 : 0,
});

With retries: 2 in CI, a test that passes on its second or third attempt is reported green. Run the exact same test on a local b4push gate with retries: 0 and it goes red on the first failure. The test did not change — only the retry budget did. This is the insight to internalize: “flaky” is gate-relative. A test is only as flaky as the strictest gate it has to clear.
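The gate-relative idea can be made concrete with a tiny model. This is a sketch, not Playwright's reporter logic: passesGate is a hypothetical helper, and it assumes a gate allows one initial attempt plus its configured number of retries.

```typescript
// Each entry is one attempt's outcome: true = passed, false = failed.
// A gate sees the first attempt plus at most `retries` further attempts.
function passesGate(attempts: boolean[], retries: number): boolean {
  const allowed = attempts.slice(0, retries + 1);
  return allowed.some((passed) => passed);
}

// Hypothetical flaky test: fails twice, then passes on the third attempt.
const flakyRun = [false, false, true];

const greenInCi = passesGate(flakyRun, 2); // CI budget: retries = 2
const greenLocally = passesGate(flakyRun, 0); // local gate: retries = 0
```

The same sequence of attempts is green under the CI budget and red under the local one, which is the asymmetry described above.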

When you have a known-flaky test that already lives on main, deleting it loses coverage. Instead, tag it @flaky in the title and quarantine it from the strict local gate without removing it:

# scripts/run-b4push.sh -- exclude @flaky from the strict local gate
CHROMIUM_INVERT="@interactive|@flaky"
WEBKIT_INVERT="@flaky"

# Chromium step: skip both @interactive and @flaky
pnpm test:e2e --project=chromium --grep-invert="$CHROMIUM_INVERT"

# WebKit @interactive step: run @interactive but still drop @flaky
pnpm test:e2e --project=webkit --grep="@interactive" --grep-invert="$WEBKIT_INVERT"

The Chromium step adds @flaky to its --grep-invert (alongside @interactive), and the WebKit @interactive step also excludes @flaky. The tests stay in the suite — CI still runs them and tolerates the occasional retry — but they no longer trip the zero-retry local gate.

⚠️ Warning

@flaky is a quarantine, not a permanent skip. Tag only tests that are already known-flaky on main; never tag a brand-new test to make a gate pass. When you fix the underlying race, remove the tag in the same PR — otherwise the list silently grows and you lose real coverage.

💡 Tip

Keep escape hatches for the local gate so a flaky machine never blocks a push: e.g. SKIP_E2E_WEBKIT=1 to skip just the WebKit pass, SKIP_E2E=1 to skip the whole E2E stage, and a RUN_FLAKY=1 opt-in to run the quarantined tests when verifying a fix.
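As a rough sketch, the escape hatches can be modeled as a function from environment variables to the steps the gate will run. The variable names come from the tip above; planLocalGate, toArgs, and the GateStep shape are hypothetical, and the assumption that RUN_FLAKY=1 simply stops excluding @flaky is one reading of "opt-in", not a confirmed behavior of the real script.

```typescript
type Env = Record<string, string | undefined>;

interface GateStep {
  project: "chromium" | "webkit";
  grep?: string;
  grepInvert?: string;
}

function planLocalGate(env: Env): GateStep[] {
  // SKIP_E2E=1 skips the whole E2E stage.
  if (env.SKIP_E2E === "1") return [];
  // Assumption: RUN_FLAKY=1 lifts the quarantine for this run.
  const quarantineFlaky = env.RUN_FLAKY !== "1";

  const chromium: GateStep = {
    project: "chromium",
    grepInvert: quarantineFlaky ? "@interactive|@flaky" : "@interactive",
  };
  const steps: GateStep[] = [chromium];

  // SKIP_E2E_WEBKIT=1 drops only the WebKit @interactive pass.
  if (env.SKIP_E2E_WEBKIT !== "1") {
    const webkit: GateStep = { project: "webkit", grep: "@interactive" };
    if (quarantineFlaky) webkit.grepInvert = "@flaky";
    steps.push(webkit);
  }
  return steps;
}

// Render one step as Playwright CLI flags.
function toArgs(step: GateStep): string[] {
  const args = [`--project=${step.project}`];
  if (step.grep) args.push(`--grep=${step.grep}`);
  if (step.grepInvert) args.push(`--grep-invert=${step.grepInvert}`);
  return args;
}
```

Keeping the decision in one pure function makes it easy to check that no combination of flags silently runs zero tests without someone having asked for that.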

Editor Input in E2E

Driving a code editor (CodeMirror, Monaco, ProseMirror, or any contenteditable) from Playwright is harder than page.fill(). If the editor has a vim mode, page.keyboard.type("hello") is a disaster: in normal mode every character is a command, so h moves the cursor left, e jumps to the end of the word, each l moves right, and o opens a new line in insert mode. None of it is inserted as text.

The reliable approach is to select all existing content via the DOM Selection API, then push the new content with page.keyboard.insertText(). insertText dispatches a synthetic input event that the editor handles directly, bypassing vim-mode command interpretation entirely:

// e2e/helpers.ts
import type { Page } from "@playwright/test";
import { expect } from "@playwright/test";
import os from "os";

// Platform-aware modifier: Meta on macOS, Control on Linux/Windows
export const mod = os.platform() === "darwin" ? "Meta" : "Control";

export async function setEditorContent(page: Page, content: string) {
  const editor = page.locator(".cm-content");
  await editor.waitFor({ timeout: 5000 });
  await editor.click();

  // Select all content via the DOM Selection API (works regardless of vim mode)
  await page.evaluate(() => {
    const el = document.querySelector(".cm-content");
    if (!el) return;
    const range = document.createRange();
    range.selectNodeContents(el);
    const sel = window.getSelection();
    sel?.removeAllRanges();
    sel?.addRange(range);
  });

  // insertText dispatches an input event the editor handles directly,
  // bypassing vim-mode command interpretation entirely.
  await page.keyboard.insertText(content);

  // Wait for the Lezer parse + decoration updates to land before asserting.
  const firstLine = content.split("\n").find((l) => l.trim()) || content;
  await expect(page.locator(".cm-content")).toContainText(firstLine.slice(0, 20), {
    timeout: 5000,
  });

  // Arbitrary timeout — acceptable ONLY because 500ms is the known auto-save
  // debounce constant. Features like split-pane read content back from the
  // backend, so the test must wait >= the debounce or it races the persist.
  await page.waitForTimeout(500);
}

The platform-aware mod helper lets the same spec drive editor shortcuts on macOS (Meta) and Linux/Windows (Control) without branching in every test.

⚠️ Warning

That waitForTimeout(500) is the legitimate exception to the usual “never use an arbitrary waitForTimeout” rule. An arbitrary wait is acceptable only when it is keyed to a known application constant — here, the 500ms auto-save debounce — and you document why in a comment. A bare waitForTimeout(500) with no rationale is still a flake waiting to happen; tie it to a real constant or replace it with a proper expect wait.
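One way to keep that rationale attached to the wait is to name the constant and wrap the call. This is a minimal sketch: AUTO_SAVE_DEBOUNCE_MS, waitForAutoSave, and the Waitable interface are hypothetical names, and the constant is assumed to mirror the app's 500ms debounce rather than being imported from it.

```typescript
// Assumed to mirror the application's auto-save debounce (500ms).
// Ideally this would be imported from the app so the two cannot drift.
const AUTO_SAVE_DEBOUNCE_MS = 500;

// Structural stand-in for the single Playwright Page method used here,
// so the helper accepts a real Page as well as a test double.
interface Waitable {
  waitForTimeout(ms: number): Promise<void>;
}

// Waits out the auto-save debounce so reads from the backend see the
// persisted content instead of racing it.
async function waitForAutoSave(page: Waitable): Promise<void> {
  await page.waitForTimeout(AUTO_SAVE_DEBOUNCE_MS);
}
```

A call site then reads as waitForAutoSave(page), which states the reason for the wait and leaves a single place to update if the debounce changes.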

Console Error Monitoring

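Extend Playwright’s test fixture to fail a test on console errors. Fixtures are set up only for tests that request them, so the check applies to tests that list consoleErrors in their arguments: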

// e2e/fixtures.ts
import { test as base, expect } from "@playwright/test";

export const test = base.extend<{ consoleErrors: string[] }>({
  consoleErrors: async ({ page }, use) => {
    const errors: string[] = [];

    page.on("console", (msg) => {
      if (msg.type() === "error") {
        errors.push(msg.text());
      }
    });

    page.on("pageerror", (error) => {
      errors.push(error.message);
    });

    await use(errors);

    // Assert no console errors after each test
    expect(errors).toEqual([]);
  },
});

export { expect };

// e2e/app.spec.ts
import { test, expect } from "./fixtures";

test("home page has no console errors", async ({ page, consoleErrors }) => {
  await page.goto("/");
  await page.waitForLoadState("networkidle");
  // consoleErrors assertion happens automatically in fixture teardown
});

Filtering benign errors with a curated allowlist

The expect(errors).toEqual([]) assertion above works on a pristine app — but real suites quickly hit a wall. There are almost always benign errors: framework dev warnings, third-party SDK noise, adapters that fail gracefully outside their real runtime. A strict empty-array assertion turns every one of those into a red test, and the usual reaction — loosening the check until it stops complaining — throws away the regression-catching value entirely.

The fix is an assertNoConsoleErrors() that filters a curated allowlist. The discipline that keeps it honest: every allowlist entry carries a why-comment justifying why that specific message is safe to ignore.

// e2e/helpers.ts
import { expect } from "@playwright/test";

export function assertNoConsoleErrors(errors: string[]) {
  const unexpected = errors.filter((msg) => {
    // React DevTools install nag — dev-only, not an app error.
    if (msg.includes("Download the React DevTools")) return false;
    // Favicon 404 — the mock server has no favicon; harmless.
    if (msg.includes("Failed to load resource") && msg.includes("favicon")) return false;
    // Tauri listen() fails in browser/mock mode: @tauri-apps/api's transformCallback
    // is undefined outside the WebView runtime. The error is caught internally and
    // the mock adapter registers its own in-memory listeners instead.
    if (msg.includes("Failed to register Tauri event listener")) return false;
    // React warns on an iframe rendered with src="" — known v1 limitation of the
    // preview pane when no URL is seeded; the iframe renders harmlessly.
    if (msg.includes('An empty string ("") was passed to the %s attribute') && msg.includes("src")) {
      return false;
    }
    return true;
  });
  expect(
    unexpected,
    `Unexpected console errors:\n${unexpected.join("\n")}`,
  ).toHaveLength(0);
}

⚠️ Warning

The why-comment on each entry is the load-bearing part, not bureaucratic ceremony. Without a rationale, an allowlist silently rots into “ignore everything”: months later nobody remembers whether an entry guards a real known-issue or was added to mute a genuine regression, so the safe move becomes never removing anything. A one-line why lets the next reader delete the entry the day its underlying cause is fixed — which is exactly when the allowlist should shrink, not grow.
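A possible variation, sketched here under assumptions, is to turn the why-comment into a required field so the type checker rejects an entry that has no rationale. AllowlistEntry, ALLOWLIST, and unexpectedErrors are hypothetical names; the two entries are copied from the helper above.

```typescript
interface AllowlistEntry {
  // Returns true when the message is the known-benign one.
  matches: (msg: string) => boolean;
  // Mandatory rationale: an entry without it does not compile.
  why: string;
}

const ALLOWLIST: AllowlistEntry[] = [
  {
    matches: (msg) => msg.includes("Download the React DevTools"),
    why: "React DevTools install nag; dev-only, not an app error.",
  },
  {
    matches: (msg) =>
      msg.includes("Failed to load resource") && msg.includes("favicon"),
    why: "Favicon 404; the mock server has no favicon.",
  },
];

// Keeps only the messages that no allowlist entry accounts for.
function unexpectedErrors(errors: string[]): string[] {
  return errors.filter(
    (msg) => !ALLOWLIST.some((entry) => entry.matches(msg)),
  );
}
```

The result can be fed to the same toHaveLength(0) assertion as before. Having the rationale as data also makes it possible to print it next to each match when auditing the list.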

CI Image Interception for Speed

In CI, network requests for large images slow down tests. Intercept and replace them with tiny placeholders:

// e2e/fixtures.ts
import { test as base } from "@playwright/test";

export const test = base.extend({
  page: async ({ page }, use) => {
    // Intercept image requests in CI
    if (process.env.CI) {
      await page.route("**/*.{png,jpg,jpeg,webp,gif}", async (route) => {
        // Await the fulfill so a failure surfaces in the test instead of
        // becoming an unhandled rejection.
        await route.fulfill({
          status: 200,
          contentType: "image/png",
          // 1x1 transparent PNG
          body: Buffer.from(
            "iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mNk+M9QDwADhgGAWjR9awAAAABJRU5ErkJggg==",
            "base64"
          ),
        });
      });
    }
    await use(page);
  },
});

📝 Note

This pattern from zmod cut CI E2E test time by 40% by eliminating network latency for image assets.
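A glob such as **/*.{png,jpg,jpeg,webp,gif} is matched against the full request URL, so an image served with a cache-busting query string may slip past it. A hedged alternative is a predicate that inspects only the path. isImageUrl and IMAGE_EXTENSIONS are hypothetical names, and the extension list is taken from the glob above.

```typescript
const IMAGE_EXTENSIONS = [".png", ".jpg", ".jpeg", ".webp", ".gif"];

// Looks only at the URL path, so "?v=3" style suffixes and upper-case
// extensions do not affect the result.
function isImageUrl(url: URL): boolean {
  const path = url.pathname.toLowerCase();
  return IMAGE_EXTENSIONS.some((ext) => path.endsWith(ext));
}
```

page.route accepts a function that receives a URL and returns a boolean in place of a glob, so this predicate should be usable as its first argument. Verify that against the Playwright version you run.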

Production Build Verification

Test against the production build, not the dev server. This catches build-specific issues:

// playwright.config.ts
import { defineConfig } from "@playwright/test";

export default defineConfig({
  webServer: {
    command: "npm run build && npm run preview",
    port: 4173,
    reuseExistingServer: !process.env.CI,
  },
  use: {
    baseURL: "http://localhost:4173",
  },
});

// e2e/production.spec.ts
import { test, expect } from "@playwright/test";

test("production build serves all pages", async ({ page }) => {
  const urls = ["/", "/docs", "/about", "/contact"];
  for (const url of urls) {
    const response = await page.goto(url);
    expect(response?.status()).toBe(200);
  }
});

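test("production build has no broken links", async ({ page }) => {
  await page.goto("/");
  // Read every href before navigating: locators re-resolve against the
  // current page, so reading them after page.goto() would query the wrong document.
  const hrefs = await page
    .locator("a[href^='/']")
    .evaluateAll((anchors) => anchors.map((a) => a.getAttribute("href")));
  // Visit each distinct link once.
  for (const href of new Set(hrefs)) {
    if (!href) continue;
    const response = await page.goto(href);
    expect(response?.status(), `status for ${href}`).toBe(200);
  }
});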
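When crawling links it helps to normalize the collected hrefs before visiting them. The sketch below is one way to do that: uniqueInternalPaths is a hypothetical helper that resolves each href against the site origin, drops anything that leaves that origin, and removes duplicates that differ only by fragment.

```typescript
function uniqueInternalPaths(
  hrefs: (string | null)[],
  siteOrigin: string,
): string[] {
  const base = new URL(siteOrigin);
  const seen = new Set<string>();
  for (const href of hrefs) {
    if (!href) continue;
    const url = new URL(href, base);
    // "//cdn.example.com/x" starts with "/" but points at another origin.
    if (url.origin !== base.origin) continue;
    // Ignore the fragment: "/docs" and "/docs#install" are the same page.
    seen.add(url.pathname + url.search);
  }
  return Array.from(seen);
}
```

The origin check matters because an attribute selector like a[href^='/'] also matches protocol-relative links, which would send the crawl to a third-party host.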

Sharded CI Runs

For large test suites, shard across multiple CI runners:

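# .github/workflows/e2e.yml
# Triggers are an example; adjust them to your branching model.
on: [push, pull_request]
jobs:
  e2e:
    runs-on: ubuntu-latest
    strategy:
      # Let the other shards finish when one of them fails.
      fail-fast: false
      matrix:
        shard: [1/4, 2/4, 3/4, 4/4]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
      # Install project dependencies before the browsers.
      - run: npm ci
      - run: npx playwright install --with-deps
      - run: npx playwright test --shard=${{ matrix.shard }}
      # Upload the HTML report only for shards that failed.
      - uses: actions/upload-artifact@v4
        if: failure()
        with:
          name: playwright-report-${{ strategy.job-index }}
          path: playwright-report/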
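Each matrix value is a current/total string handed to --shard. If the shard count changes often, the list can be generated rather than typed out. shardMatrix is a hypothetical helper sketched for that purpose.

```typescript
// Builds the "index/total" strings that --shard expects, 1-based.
function shardMatrix(total: number): string[] {
  if (!Number.isInteger(total) || total < 1) {
    throw new RangeError(`shard total must be a positive integer, got ${total}`);
  }
  return Array.from({ length: total }, (_, i) => `${i + 1}/${total}`);
}
```

For example, shardMatrix(4) yields the four values used in the workflow above.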

Mock Backend Adapter for Frontend-Only E2E

When testing frontend behavior independently from the real backend:

// e2e/mocks/backend-adapter.ts
import type { Page } from "@playwright/test";

export async function mockBackend(page: Page) {
  await page.route("**/api/**", async (route) => {
    const url = new URL(route.request().url());

    const mocks: Record<string, unknown> = {
      "/api/user": { id: 1, name: "Test User", email: "test@example.com" },
      "/api/settings": { theme: "dark", language: "en" },
      "/api/documents": [
        { id: 1, title: "Doc 1" },
        { id: 2, title: "Doc 2" },
      ],
    };

    const mockData = mocks[url.pathname];
    if (mockData) {
      await route.fulfill({
        status: 200,
        contentType: "application/json",
        body: JSON.stringify(mockData),
      });
    } else {
      await route.continue();
    }
  });
}

// e2e/frontend.spec.ts
import { test, expect } from "@playwright/test";
import { mockBackend } from "./mocks/backend-adapter";

test.beforeEach(async ({ page }) => {
  await mockBackend(page);
});

test("displays user name from mock API", async ({ page }) => {
  await page.goto("/dashboard");
  await expect(page.locator(".user-name")).toHaveText("Test User");
});
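The lookup inside the route handler can be pulled out into a pure function, which makes the mock table testable without a browser. In this sketch, MOCKS and resolveMock are hypothetical names, and keying entries by HTTP method as well as path is an added assumption; the adapter above keys by path alone.

```typescript
// Keys are "<METHOD> <pathname>"; values are the JSON bodies to return.
const MOCKS: Record<string, unknown> = {
  "GET /api/user": { id: 1, name: "Test User", email: "test@example.com" },
  "GET /api/settings": { theme: "dark", language: "en" },
  "GET /api/documents": [
    { id: 1, title: "Doc 1" },
    { id: 2, title: "Doc 2" },
  ],
};

// Returns the mock body, or undefined when the request is not mocked.
function resolveMock(method: string, pathname: string): unknown {
  const key = `${method.toUpperCase()} ${pathname}`;
  // Own-property check, so a legitimately falsy mock body (0, "", null)
  // still counts as mocked.
  return Object.prototype.hasOwnProperty.call(MOCKS, key)
    ? MOCKS[key]
    : undefined;
}
```

Inside the handler, route.request().method() and the parsed pathname would supply the two arguments, and an undefined result would fall through to the real network.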

⚠️ Warning

Mock backends are great for frontend-focused testing, but they do not replace integration tests against the real API. Use both: mocked for UI behavior, real for data flow.