Shipped · AI · Developer Tools · 2025

QA Copilot

AI-powered QA system.

  • Self-maintaining: tests that heal
  • AI triage: noise filtered out
  • Faster release confidence

The problem

Manual QA is slow and inconsistent, and automated end-to-end tests are famously flaky — eroding trust until teams ignore failures and ship bugs anyway. QA becomes a bottleneck instead of a safety net.

Context

After building browser automation for Snappit, I found the leap to QA natural: the same resilience techniques that make automation reliable can make tests reliable. QA Copilot applies AI agents to the testing lifecycle itself.

Architecture

QA Copilot observes application behaviour to generate meaningful tests, uses AI to triage failures (real bug vs. flake), and self-heals brittle selectors. It plugs into CI so the safety net runs continuously.

  • Behaviour-driven test generation.
  • AI failure triage that separates signal from noise.
  • Self-healing selectors built on the Snappit recovery layer.
  • CI-native so confidence is continuous, not a gate.
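
The self-healing idea in the list above can be sketched as a fallback chain over candidate selectors. This is only an illustration, not the Snappit recovery layer itself: `Probe`, `HealResult`, and `resolveSelector` are hypothetical names, and the rule of accepting a selector only when it matches exactly one element is an assumption.

```typescript
// Hypothetical types: none of these names come from QA Copilot or Snappit.
// A probe reports how many elements currently match a selector.
type Probe = (selector: string) => Promise<number>;

interface HealResult {
  selector: string; // the selector that finally matched
  healed: boolean;  // true when a fallback replaced the primary selector
  tried: string[];  // every selector attempted, in order, for transparency
}

// Try the primary selector first, then each fallback in order. Accept the
// first one that matches exactly one element (assumed rule: an ambiguous
// match is not trusted).
async function resolveSelector(
  primary: string,
  fallbacks: string[],
  probe: Probe,
): Promise<HealResult | null> {
  const tried: string[] = [];
  for (const selector of [primary, ...fallbacks]) {
    tried.push(selector);
    if ((await probe(selector)) === 1) {
      return { selector, healed: selector !== primary, tried };
    }
  }
  return null; // nothing matched: report this as a real failure
}
```

With Playwright, the probe could plausibly be `(s) => page.locator(s).count()`. Returning the `tried` list keeps each heal auditable, so an engineer can see which selector was swapped in.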

Technical challenges

Trust

Teams have to trust the copilot’s judgement on what failed and why. Calibrating confidence and being transparent about reasoning is the whole game.
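
One common way to check whether reported confidence deserves trust is expected calibration error: group past verdicts by stated confidence and compare each group's average confidence with how often it was right. The sketch below assumes that engineers label verdicts as correct or not after the fact. `JudgedVerdict` and the ten-bin default are illustrative choices, not details from QA Copilot.

```typescript
// Hypothetical record of one past verdict and its later human review.
interface JudgedVerdict {
  confidence: number; // confidence the copilot reported, assumed to be in [0, 1]
  correct: boolean;   // whether an engineer later confirmed the verdict
}

// Expected calibration error with equal-width confidence bins.
// 0 means stated confidence matches observed accuracy in every bin.
function expectedCalibrationError(samples: JudgedVerdict[], bins = 10): number {
  if (samples.length === 0) return 0;
  const buckets = Array.from({ length: bins }, () => ({ conf: 0, correct: 0, n: 0 }));
  for (const s of samples) {
    // A confidence of exactly 1.0 is clamped into the last bin.
    const index = Math.min(bins - 1, Math.floor(s.confidence * bins));
    const bucket = buckets[index]!;
    bucket.conf += s.confidence;
    bucket.correct += s.correct ? 1 : 0;
    bucket.n += 1;
  }
  let ece = 0;
  for (const b of buckets) {
    if (b.n === 0) continue;
    // Gap between observed accuracy and mean stated confidence in this bin.
    const gap = Math.abs(b.correct / b.n - b.conf / b.n);
    ece += (b.n / samples.length) * gap;
  }
  return ece;
}
```

A rising value over time would suggest the copilot's stated confidence is drifting away from its actual hit rate.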

Flake vs. bug

Distinguishing a genuine regression from environmental flakiness is hard. Pairing AI triage with historical signal, such as how the same test behaved on earlier runs, makes the call defensible.
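
A minimal sketch of the historical half of that decision follows; the AI half is out of scope here. The data shapes, the 10-run minimum, and the 10% failure-rate threshold are all assumptions chosen for illustration, not values from QA Copilot.

```typescript
// Hypothetical shapes; the real data model is not described in this write-up.
interface RunRecord {
  passed: boolean;
  commit: string; // commit the run executed against
}

type Verdict = "likely-flake" | "likely-regression" | "inconclusive";

interface Triage {
  verdict: Verdict;
  reason: string; // human-readable justification shown to the engineer
}

// Classify a new failure using only the test's history on other commits.
// minRuns and flakeThreshold are illustrative defaults.
function triageFailure(
  history: RunRecord[],
  failingCommit: string,
  minRuns = 10,
  flakeThreshold = 0.1,
): Triage {
  const prior = history.filter((r) => r.commit !== failingCommit);
  if (prior.length < minRuns) {
    return { verdict: "inconclusive", reason: `only ${prior.length} prior runs` };
  }
  const failures = prior.filter((r) => !r.passed).length;
  if (failures / prior.length >= flakeThreshold) {
    return {
      verdict: "likely-flake",
      reason: `failed ${failures}/${prior.length} prior runs on other commits`,
    };
  }
  return {
    verdict: "likely-regression",
    reason: `stable before ${failingCommit} (${failures}/${prior.length} prior failures)`,
  };
}
```

Every verdict carries a `reason` string, so the call can be shown to an engineer and challenged instead of being taken on faith.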

Engineering decisions

Reuse the Snappit recovery layer

Self-healing selectors were already a solved problem in Snappit, so QA Copilot reuses that layer rather than rebuilding it. Composing my own platforms compounded their value.

AI assists, humans decide

The copilot proposes; engineers approve. That keeps it useful without overstepping trust.
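
The propose-then-approve split can be sketched as a gate that applies only changes an engineer has explicitly signed off. `Proposal`, `Decision`, and `applyApproved` are hypothetical names for illustration; how approvals are actually collected is not covered here.

```typescript
// Hypothetical shapes for a copilot suggestion and a human sign-off.
interface Proposal {
  id: string;
  kind: "selector-fix" | "new-test" | "flake-quarantine";
  summary: string;
  confidence: number; // copilot's self-reported confidence in [0, 1]
  reasoning: string;  // shown to the reviewer alongside the change
}

interface Decision {
  proposalId: string;
  approvedBy: string; // engineer who approved the change
}

// Apply only proposals with a matching approval. Unapproved proposals are
// left untouched, and decisions for unknown proposals are ignored.
function applyApproved<T>(
  proposals: Proposal[],
  decisions: Decision[],
  apply: (proposal: Proposal) => T,
): T[] {
  const approved = new Set(decisions.map((d) => d.proposalId));
  return proposals.filter((p) => approved.has(p.id)).map(apply);
}
```

Confidence does not bypass the gate in this sketch: even a high-confidence proposal waits for a decision, which matches the "humans decide" rule.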

Technologies

LLM Agents · Playwright · TypeScript · Node.js · Vector search

Results

Faster releases with higher confidence, and far less time lost to brittle test maintenance — QA that scales with the product instead of fighting it.

Lessons learned

  • Reusing your own platforms compounds leverage.
  • AI tools win on trust, earned through transparency.

What I’d improve today

  • Risk-based test prioritisation from production telemetry.
  • Auto-generated reproduction steps attached to every flagged failure.