Spin up a full local E2E environment (backend, frontend, docker services, Playwright UI):
    hogli test:e2e
This uses bin/mprocs-e2e.yaml under the hood. If you need to reset the E2E database, trigger the reset-db process in the phrocs UI.
To run tests against an already-running PostHog instance:
    LOGIN_USERNAME='[email protected]' LOGIN_PASSWORD="the-password" BASE_URL='http://localhost:8010' pnpm --filter=@posthog/playwright exec playwright test --ui
You might need to install Playwright first:
    pnpm --filter=@posthog/playwright exec playwright install
Use the /playwright-test skill to have Claude Code write and validate end-to-end tests for you.
It will explore the UI with Playwright MCP tools, plan the tests, implement them, and run them in a loop until they pass reliably (including a flakiness check with --repeat-each 10).
This suite is expensive as it runs the full stack and a real browser, and every spec costs PR runtime, runner credits, and a slice of the team's flake budget. Use these principles to decide whether a test is worth that cost.
E2E is uniquely suited to multi-step journeys where the frontend, network, backend, and datastores all have to agree at once. If a regression would surface in a Jest + kea test, a Storybook story, an API integration test, or a ClickHouse unit test, write it there instead: it is cheaper to run, easier to debug, and does not tie up an 8-vCPU runner.
- "Page renders", "button is present", "heading reads X", "tab is active": these are smoke checks, not E2E, and they belong in Jest or Storybook.
- Visual regressions belong in Storybook visual review.
- If a failure can be diagnosed without reading a backend log, the test probably didn't need the backend.
The suite stays small on purpose: the bigger it gets, the noisier the flake signal becomes, and we drift back into "ignore the red, it's probably flake". Treat adding a spec like adding a CI job. Justify it in the PR description (which cross-stack flow it covers, why a lower layer won't catch it, how it sits next to the existing specs, etc.). Reviewers should push back when that justification is thin. "Nice to have coverage" isn't enough, but "this flow has broken in prod and nothing below this layer would have caught it" is.
- Don't use CSS selectors; prefer accessibility roles (getByRole) or getByTestId(), which maps to data-attr in our config. Add data-attr to components if needed.
- Write fewer, longer tests that do multiple things. Split logical steps with test.step().
- Use page object models for common tasks and accessing common elements (see page-models/).
- After UI interactions, assert on UI changes; don't assert on network requests resolving.
- Never put conditional logic (if) in a test.
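As a rough illustration of the selector and page-object points above, here is a minimal sketch of a page object. The `LoginPage` class, the `login-email` and `login-password` test IDs, and the "Log in" button name are all hypothetical and not taken from page-models/. The `PageLike` and `LocatorLike` types are structural stand-ins for the handful of Playwright methods used, so the snippet is self-contained; in a real spec you would pass the `page` fixture instead.

```typescript
// Structural stand-ins for the parts of Playwright's Page/Locator used here.
// Assumption: in a real spec, the `page` fixture is passed in instead.
interface LocatorLike {
    click(): Promise<void>
    fill(value: string): Promise<void>
}
interface PageLike {
    getByRole(role: string, options?: { name?: string }): LocatorLike
    getByTestId(testId: string): LocatorLike
}

// Hypothetical page object: element lookups live in one place, and every
// lookup uses an accessibility role or a test ID rather than a CSS selector.
class LoginPage {
    private readonly page: PageLike

    constructor(page: PageLike) {
        this.page = page
    }

    // Test ID lookup (getByTestId maps to data-attr in this repo's config).
    get emailInput(): LocatorLike {
        return this.page.getByTestId('login-email')
    }

    get passwordInput(): LocatorLike {
        return this.page.getByTestId('login-password')
    }

    // Role-based lookup by accessible name.
    get submitButton(): LocatorLike {
        return this.page.getByRole('button', { name: 'Log in' })
    }

    // A common task wrapped once so specs don't repeat the individual steps.
    async logIn(email: string, password: string): Promise<void> {
        await this.emailInput.fill(email)
        await this.passwordInput.fill(password)
        await this.submitButton.click()
    }
}
```

In a spec, a test would construct `new LoginPage(page)`, wrap each logical phase in `test.step()`, and assert on visible UI once `logIn` resolves.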
Flaky tests are almost always due to not waiting for the right thing. Consider adding a better selector, an intermediate step like waiting for URL or page title to change, or waiting for a critical network request to complete.
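One way to wait for a critical network request is to start listening before triggering the action, so a fast response cannot slip past. The sketch below assumes a hypothetical helper name, `actAndWaitForResponse`, and uses structural stand-ins (`PageLike`, `ResponseLike`) for the Playwright `waitForResponse`, `url()`, and `ok()` methods so that it is self-contained; in a real spec you would pass the `page` fixture.

```typescript
// Structural stand-ins for the parts of Playwright's Page/Response used here.
interface ResponseLike {
    url(): string
    ok(): boolean
}
interface PageLike {
    waitForResponse(predicate: (response: ResponseLike) => boolean): Promise<ResponseLike>
}

// Hypothetical helper: register the wait first, then perform the action,
// then await the response. Registering after the action risks missing it.
async function actAndWaitForResponse(
    page: PageLike,
    urlPart: string,
    action: () => Promise<void>
): Promise<ResponseLike> {
    const responsePromise = page.waitForResponse(
        (response) => response.url().includes(urlPart) && response.ok()
    )
    await action()
    return responsePromise
}
```

This only makes the test wait for the right moment; the assertion that follows should still be on the resulting UI change rather than on the response itself.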
Loose selectors cause strict mode violations. If a selector matches multiple elements, Playwright will show all matches — use the output to narrow down:
    Error: locator.click: Error: strict mode violation: locator('text=Set a billing limit') resolved to 2 elements:
    1) <span class="LemonButton__content">Set a billing limit</span> aka getByTestId('billing-limit-input-wrapper-product_analytics').getByRole('button', { name: 'Set a billing limit' })
    2) <span class="LemonButton__content">Set a billing limit</span> aka getByTestId('billing-limit-input-wrapper-session_replay').getByRole('button', { name: 'Set a billing limit' })
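The "aka" hints in that output already point at a narrower locator: scope to one product's wrapper by test ID, then pick the button by role and name. A minimal sketch follows. The helper name `billingLimitButton` is hypothetical, and `PageLike` and `LocatorLike` are structural stand-ins for the two Playwright methods used, so the snippet is self-contained.

```typescript
// Structural stand-ins for the parts of Playwright's Page/Locator used here.
// Assumption: in a real spec, the `page` fixture is passed in instead.
interface LocatorLike {
    getByRole(role: string, options?: { name?: string }): LocatorLike
}
interface PageLike {
    getByTestId(testId: string): LocatorLike
}

// Hypothetical helper: the test ID narrows the search to a single product's
// wrapper, so the role + name lookup inside it resolves to exactly one button.
function billingLimitButton(page: PageLike, productKey: string): LocatorLike {
    return page
        .getByTestId(`billing-limit-input-wrapper-${productKey}`)
        .getByRole('button', { name: 'Set a billing limit' })
}
```

Calling `billingLimitButton(page, 'session_replay')` builds the second locator from the error output, and `'product_analytics'` builds the first.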