Take the Driver's Seat: Why Product Teams Need to Own AI Experimentation
Written by Madalina Turlea
11 Nov 2025
"We'll iterate later" is the most expensive lie in AI product development.
Here's what actually happens: Teams deploy a basic prompt, tell themselves they'll improve it once they have real data, and then watch as suboptimal AI bleeds money and damages user trust while they're too afraid to touch it because they don't know what will break.
The root cause? 91% of product managers can't test AI without engineering.
This is a fundamental problem in how AI products are being built today.
The Default Is Broken
AI ownership defaults to tech teams. This seems logical—it's technical, requires API keys, needs infrastructure. But this default is creating a critical bottleneck that's killing product velocity and quality.
Here's the typical flow:
- PM has an idea for an AI feature
- PM writes a rough prompt in a doc or Notion
- PM waits for engineering to have capacity
- Engineering implements the prompt
- PM can't test changes without going back to engineering
- Iteration becomes painful, so everyone defaults to "ship it and we'll improve later"
Average time from idea to tested concept: 3 weeks
This isn't engineering's fault. They're busy building the actual product. But it creates a structural problem: the people who understand the problem best can't experiment with solutions.
The Missing Context Problem
Here's what we've observed working with 10+ startups: when AI ownership defaults to tech teams, they're missing critical domain context.
Example 1: The Fintech Case
A fintech app needed AI to categorize financial transactions. Engineers built it based on general financial knowledge. It worked... sort of. Accuracy was around 65%.
When a financial analyst finally got to test different prompts, accuracy jumped to 87%. Why? The analyst knew the edge cases. They understood that "DoorDash" isn't a restaurant but a food delivery service, and that "Venmo" could mean either a social payment or a bill payment depending on context.
Context is gold for AI. The tech team can build the infrastructure, but the domain experts know what "good" looks like.
Example 2: The Mental Health App
A mental health app built AI-powered conversation analysis. Engineers built the initial prompts based on research papers and general psychology principles.
When a psychologist on the team finally got access to test different approaches, the quality of insights transformed. They knew which phrases indicated genuine distress versus common figures of speech. They understood cultural contexts. They knew what questions would actually help versus trigger defensive responses.
The product became fundamentally better when the right people could experiment.
What Product Teams Actually Need
PMs don't need to replace engineers. They need to validate before engineers build.
Think about how we approach product development:
- Design: We don't wait for developers to build every design variation. We prototype, test, iterate—then hand validated designs to engineering.
- Analytics: We don't ask engineers to run every query. We use tools like Mixpanel or Amplitude to explore data ourselves.
- A/B Testing: We don't wait for engineering to deploy every experiment. We use platforms that let us test variations independently.
Why should AI be different?
The Independence You Actually Need
Here's what systematic AI experimentation looks like when product teams can own it:
Day 1 - Monday Morning
PM has an idea for an AI-powered feature. Creates comprehensive test cases (50-200 scenarios) representing real user inputs, including edge cases.
Day 1 - Monday Afternoon
PM experiments with different prompts and models. Tests across GPT-4, Claude, Gemini, Llama. Sees performance data in real-time.
Day 2 - Tuesday
PM analyzes results. Finds that Claude Sonnet provides 85% accuracy at 40% lower cost than GPT-4. Documents failure modes and edge cases.
Day 3 - Wednesday
PM presents to engineering: "Here's the validated setup. 85% accuracy on 150 test cases. $X cost per 1000 requests. Here are the known edge cases. Ready to implement."
Engineering's response: "This is exactly what we need to build it right the first time."
Total time: 3 days instead of 3 weeks
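The Monday-to-Wednesday loop above boils down to one repeatable routine: run every test case through each candidate model, score the outputs, and tally accuracy and cost. Here's a minimal sketch in Python—`call_model` is a placeholder for whatever API client you actually use, and the per-request prices are invented for illustration:

```python
# Minimal prompt-evaluation loop: score each candidate model on the same
# test cases, then compare accuracy and cost side by side.

def call_model(model: str, prompt: str, user_input: str) -> str:
    # Placeholder: swap in your real API client (OpenAI, Anthropic, etc.).
    return "food_delivery"

TEST_CASES = [
    {"input": "DOORDASH*ORDER 8842", "expected": "food_delivery"},
    {"input": "VENMO PAYMENT - RENT", "expected": "bill_payment"},
]

# Hypothetical per-request costs in dollars, for illustration only.
COST_PER_REQUEST = {"model-a": 0.0012, "model-b": 0.0004}

def evaluate(model: str, prompt: str) -> dict:
    """Run every test case through one model and summarize the results."""
    correct = 0
    for case in TEST_CASES:
        output = call_model(model, prompt, case["input"])
        if output.strip() == case["expected"]:
            correct += 1
    return {
        "model": model,
        "accuracy": correct / len(TEST_CASES),
        "cost_per_1000": COST_PER_REQUEST[model] * 1000,
    }

results = [evaluate(m, "Categorize this transaction.") for m in COST_PER_REQUEST]
for r in sorted(results, key=lambda r: -r["accuracy"]):
    print(r)
```

In practice your test library would hold 50-200 cases and the scoring might be fuzzier than exact string match, but the shape—same cases, every model, numbers out the other end—is what turns "I think this works" into a Wednesday handoff.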
What This Enables
When product teams can experiment independently, everything changes:
1. Faster Validation
Move from "I think this could work" to "I know this works with 85% accuracy on 150 test cases" before engineering writes a single line of code.
2. Better Context
Domain experts contribute their knowledge directly. The financial analyst tests prompts for financial categorization. The psychologist shapes the mental health conversation AI. The legal expert refines the contract analysis prompts.
3. Data-Backed Decisions
No more debating which model to use based on hype. "GPT-4 is the best" becomes "Claude Sonnet gives us 87% accuracy at $0.40 per 1000 requests versus GPT-4's 85% accuracy at $1.20 per 1000 requests."
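The cost side of that comparison is simple arithmetic once you know your average token counts. A back-of-the-envelope helper (the token counts and per-million-token prices below are illustrative, not current list prices):

```python
# Back-of-the-envelope cost per 1,000 requests from per-token pricing.

def cost_per_1000_requests(input_tokens: int, output_tokens: int,
                           price_in_per_m: float, price_out_per_m: float) -> float:
    """Dollars per 1,000 requests, given average token counts per request
    and prices in dollars per million tokens."""
    per_request = (input_tokens * price_in_per_m +
                   output_tokens * price_out_per_m) / 1_000_000
    return per_request * 1000

# e.g. a categorization prompt averaging 300 input / 20 output tokens
print(cost_per_1000_requests(300, 20, 3.0, 15.0))
```

Run this once per candidate model with your real prompt's token counts and you get the "$0.40 versus $1.20 per 1000 requests" number without waiting for a bill.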
4. Confident Engineering Handoff
Engineers receive validated setups with clear performance expectations, cost projections, and documented edge cases. They can build with confidence instead of guessing.
5. Continuous Improvement
When PMs can test independently, iteration doesn't require sprints. Found a better prompt? Test it. New model released? Benchmark it. User pattern changes? Validate the fix.
The Cross-Functional Collaboration Model
The goal isn't to cut engineering out. It's to enable proper collaboration:
Product Manager
- Defines success criteria
- Creates comprehensive test cases
- Experiments with prompts and models
- Validates performance before engineering builds
- Monitors production performance
- Iterates based on real-world data
Domain Expert (Financial analyst, psychologist, legal expert, etc.)
- Contributes specialized knowledge
- Tests prompt variations
- Identifies edge cases
- Validates AI outputs for domain accuracy
Engineering
- Builds production infrastructure
- Implements validated AI setups
- Optimizes performance and latency
- Handles scaling and reliability
- Focuses on what they do best: building robust systems
The Result: Higher quality AI products built faster with better collaboration.
The Skills Product Teams Need
Here's what's required to experiment effectively with AI:
Not required:
- Engineering background
- Understanding of transformer architectures
- Ability to write code
- Knowledge of API infrastructure
Actually required:
- Clear understanding of your use case
- Ability to create representative test scenarios
- Critical thinking about what "good" looks like
- Basic understanding of prompting (learnable in hours)
Prompt engineering isn't rocket science. It's a learnable skill. The hard part isn't the technical implementation—it's understanding the problem well enough to test it properly.
That's why product teams, with their deep understanding of user needs and domain context, are actually better positioned to own AI experimentation than anyone else.
What Changes When Teams Can Experiment
We've seen this transformation repeatedly:
Before: "Can engineering test this other prompt?" "They're busy with the sprint." "Okay, we'll just ship what we have."
After: "I tested 5 prompt variations across 6 models on 100 test cases. Here's the optimal setup with performance data. Ready for engineering to implement."
Before: "Should we use GPT-4 or Claude?" "I don't know, let's go with GPT-4, everyone uses it."
After: "Here's data from testing both. Claude gives us better accuracy at lower cost for our specific use case."
Before: "The AI isn't working well in production." "We can't change it without engineering, and they're focused on the next sprint."
After: "I identified the failure mode, tested a fix, validated it works. Here's the updated prompt for engineering to deploy."
Breaking the Bottleneck
The future of AI product development isn't about making everyone learn to code. It's about giving the right people the right tools to contribute their expertise.
Just like Figma enabled designers to prototype without coding, and Amplitude enabled PMs to analyze data without SQL mastery, systematic AI experimentation platforms enable product teams to validate AI features without engineering bottlenecks.
Getting Started: The First Experiment
If you want to bring AI experimentation to your product team:
Step 1: Pick One Feature
Choose a single AI use case you're considering. Don't try to test everything at once.
Step 2: Build Your Test Library
Create 50-100 test cases representing real scenarios. Include edge cases, variations, and tricky inputs.
Step 3: Define Success
What does "good" look like? What accuracy do you need? What's an acceptable cost per request?
Step 4: Experiment
Test different prompts. Compare models. Measure performance on YOUR metrics, not generic benchmarks.
Step 5: Document and Share
Present findings to engineering with data, not opinions. "87% accuracy on 150 test cases" beats "I think this works."
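Step 3 deserves to be written down, not just felt. A small "definition of done" makes the Step 5 handoff unambiguous: explicit thresholds, applied mechanically to measured results. A sketch (the threshold numbers are placeholders for your own targets):

```python
# A small "definition of done" for an AI experiment: explicit thresholds
# applied to measured results. The numbers are examples; set your own.

SUCCESS_CRITERIA = {
    "min_accuracy": 0.85,        # minimum acceptable accuracy
    "max_cost_per_1000": 1.00,   # dollars per 1,000 requests
}

def meets_criteria(result: dict, criteria: dict = SUCCESS_CRITERIA) -> bool:
    """Return True only if a measured result satisfies every threshold."""
    return (result["accuracy"] >= criteria["min_accuracy"]
            and result["cost_per_1000"] <= criteria["max_cost_per_1000"])

measured = {"model": "candidate-a", "accuracy": 0.87, "cost_per_1000": 0.40}
print(meets_criteria(measured))  # True for this example
```

When the check passes, the handoff writes itself: the criteria, the measured numbers, and the test library go to engineering together.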
The Competitive Advantage
Teams that empower product people to own AI experimentation will build better AI products faster.
Why? Because they're leveraging the full team's expertise. Because they're validating before building. Because they're making data-driven decisions instead of engineering-constrained compromises.
The question isn't whether product teams should be able to experiment with AI.
The question is: can you afford the bottleneck of not empowering them?
Ready to empower your product team? Learn how to implement systematic AI experimentation with our free masterclass or explore our open-source framework for team-based AI development.