Training Vertical AI Agents With Product Test Cases: Making an Agent Truly 'Get' a Product
Training Vertical AI Agents With Product Test Cases
Recently while leading the team on vertical AI agents, I ran into a familiar problem: a general-purpose model’s understanding of the business stays at the surface. Ask it “what fields should this user-profile page show,” and it gives a generic industry answer. Ask it “why does our paid-conversion funnel drop harder at step three than step two,” and it’s dead in the water. It’s not that the model is bad — it just doesn’t live inside our product.
In the same period we were pushing on product QA, and I had a realization: every test case we write is essentially translating product business rules, edge conditions, and expected behaviors into verifiable natural language. Isn’t that the ideal domain-knowledge corpus? So we merged Agent training and product testing into one task, and the payoff was bigger than expected.
Why Test Cases Are the Best Vehicle for Training Vertical Agents
Most teams’ first reflex when injecting domain knowledge is to throw documents at the agent. But product docs have three chronic problems: written late (only after a feature ships), written sloppily (only what, not why), and decoupled from code (changing code doesn’t necessarily change the doc). Test cases are the opposite — they have to be written before or alongside the feature, must cover both positive and negative paths, and must rigorously match real product behavior, otherwise the test simply doesn’t pass.
In other words, a high-quality test-case set is itself the executable specification of product behavior. When an Agent ingests this kind of corpus, it isn’t just learning facts — it’s learning the product’s judgment standards: what’s correct, what’s anomalous, what counts as an edge case. The information density is in a different league from reading a PRD.
How We Actually Did It
The flow is plain: have the Agent be the primary executor running tests, rather than human QA executing while the Agent passively learns afterward. For each case the Agent runs, it has to do four things — read and understand the case’s business intent, execute verification in the real environment, record the delta between actual behavior and expectation, and classify the delta (is it a bug, an unclear spec, or an outdated test case).
The key detail: we require the Agent, after each case finishes, to write a short summary in its own words covering “what this feature is doing, why it’s designed this way, what it’s coupled with upstream and downstream” — and persist it to the knowledge base. That step looks redundant, but it’s the source of the qualitative jump — the Agent stops being an executor and starts building a product mental model while running. After three hundred cases, its product comprehension surpasses that of a new hire who’s been in the role three months.
The Two-For-One Economic Logic
What surprised me most was the ROI structure. In the traditional path, training a domain Agent requires standalone corpus-curation cost, and doing QA requires standalone test-execution cost — the two workstreams don’t communicate. In our path, every second of test execution simultaneously produces two deliverables: a test report deliverable to product managers, and an expert Agent with continuously deepening product comprehension. The corpus isn’t an additional cost — it’s a natural byproduct of the testing work.
There’s also a hidden bonus: when test cases and the Agent knowledge base share the same source, drift between spec and implementation gets caught automatically. Documentation no longer has the “this was written for the previous version” problem, because it’s on the same pipeline as test execution.
Three Reusable Principles
Principle one: pick a knowledge vehicle that is itself a specification. Documents are bad corpus because they can drift from reality; test cases are good corpus because they have to match reality, otherwise they don’t run.
Principle two: make the Agent the executor, not the spectator. Passive ingestion produces dead knowledge; active execution accumulates transferable judgment. This step can’t be skipped.
Principle three: turn process artifacts into deliverables. Don’t pay separately for Agent training — find the workflow that simultaneously produces business deliverables and knowledge accumulation. That’s where ROI actually holds.
If you’re building an AI engineering team, take a look at our open-source Synapse framework — it integrates Multi-Agent collaboration, knowledge accumulation, and task execution into a reusable Harness, and the practice in this article runs on top of Synapse.