What is AI experimentation, and why do you need it?

By Madalina Turlea

14 May 2026

Companies have always struggled with innovation. Especially the ones whose products have been in the market for years, with large teams, deep process, and a lot to protect.

Innovation comes from creativity. And creativity gets killed in environments with high pressure, where mundane tasks eat up most of the day.

Neuroscience has a clear explanation for this. Creative insight tends to show up when the brain's default mode network is active: resting, but with the wheels still turning in the background. It's why your best ideas come in the shower, or on a walk with the dog. Not at your desk between two meetings.

Most teams don't have any of that time in their day. And then leadership wonders why the company stopped innovating.

So companies try to force innovation. Hackathons are one way. Dedicated "innovation" teams are another.

I've been in those hackathons. In one of my first roles, I joined every hackathon I could. In one of them, I got my first exposure to AI and ML and built my first AI product. I loved it. I didn't know it at the time, but that's when I first learned I thrive in startups, where it's just about the product, about pushing an idea forward and shaping how you bring it into the world.

I won that hackathon. I got the bug. And I spent years trying to recreate that energy in my day job, not just once a year in a two-day sprint.

Now companies and product leaders are trying to do the same thing with AI. Particularly with GenAI. Most still have an "AI team" or an ML team that takes in requests from different product teams and picks which ideas to implement.

There's a huge gap between what this "AI team" understands and what the product team building a specific feature actually needs. The AI team knows the tech: how to work with models. But they don't know the user. They don't carry the domain expertise. They don't have product sense for this specific feature.

For traditional ML, this worked. Iterating on a model was engineering work. Only AI engineers and data scientists could actually change what the model did.

But with GenAI, the way you shape AI behavior is plain English.

The programming language of GenAI is the same language your PMs, designers, and domain experts already use every day. Blocking them from direct access to the technology, by consolidating it into an AI team, or hiring a handful of AI engineers to drive your innovation, will fail miserably.

Why your team keeps defaulting to a chatbot

I've heard the same thing from a lot of product leaders: we're struggling to move our teams from traditional SaaS product building into AI-native product building.

That's because you haven't let them really play with this technology.

You haven't changed the environment or the process in which your team builds. You still think engineering should own the tech part, and product writes the specs. Then leadership pressure comes down: "add AI to search," "add agentic workflows for customer support." So you rush to add a chatbot to the product to show investors and clients that you innovate and you're AI-native.

And then you wonder why your team isn't building more. Why companies like Anthropic keep shipping great products while your chatbot hasn't transferred any of that magic to your own product.

When I started building my first AI feature, I was on the exact same path. I thought I needed to do a huge amount of work upfront. Scope the feature in detail. Prepare the data. Decide whether we'd fine-tune our own model or use a closed one.

I also started, unoriginally, with a chatbot UX. Because I thought that's what "AI-native" meant.

It was only when I built the first real AI feature at Lovelaice that I got the unlock. When I started AI experimentation, on my own, with my own idea.

I'll show you how you can do it too. It takes less than a day.

What one experiment actually looks like

A few months ago, we ran a first experiment in Lovelaice with the CPO of a B2C product. Entertainment space. Users come for fun and novelty.

He started where almost every product leader starts: he wanted to add a chatbot to help users get around the product and get more out of it. Classic first move.

We agreed to run a first experiment to see what could actually be done, how we'd evaluate the results, and what kind of quality he could expect. We set it up for his chatbot idea. But we also proposed a second version of AI for his product, an invisible one. A daily AI-generated challenge based on each user's profile, designed to bring them back for novelty and progress.

The experiment was simple. We pulled a few real user profiles and the product's knowledge base. We wrote the first prompt. We ran it across 7 models at the same time.

Then we read the responses together.

Within minutes, reading real AI responses on real user data, seeing how different models handled the same user profile in very different ways, he was hooked. He started noticing what was good, what was bad, where the AI failed, how it failed. He also started seeing how he could improve the prompt, what new iterations he wanted to run, what new ideas he hadn't considered.

From one starting idea of a chatbot, he left with four different AI features. All invisible. All high value. All with a clear path to validate and iterate.

He started his journey the way most product leaders start theirs: knowing AI could bring value to his product, business, and users, but not knowing where to start. Overwhelmed by the work he thought was required before he could even begin.

One experiment. 10-15 real AI responses. That's what it took.

That is AI experimentation.

What AI experimentation is, and what it isn't

AI experimentation is not running benchmarks. It's not a bigger eval suite. It's not picking the "best" model from a leaderboard. It's also not prototyping with AI tools like Lovable or Figma Make.

AI experimentation is a product thinking exercise. You take one idea for how AI could help your user. You write your first guess of a prompt. You gather a few real cases from your product. You run the prompt across several models at once. Then you read what the AI actually produced, carefully, with your product and user context in mind.

And in reading, you discover two things at the same time:

  • What your AI actually does on real data, not what you imagined it would do.
  • A world of possibilities you couldn't see before, because you had nothing concrete to react to.

This is how you build AI product sense. It's a different muscle from traditional product sense. Traditional product sense is built by shipping features and talking to users. AI product sense is built by reading what AI actually produces on your real data, again and again, and developing an intuition for what's possible.

You can't read about it. You can't watch someone else do it. You have to do it yourself.

The mistakes that keep teams stuck

Before I walk you through the process, here are the two traps I see most often.

Trap 1: you think you need engineering excellence to build something great with AI.

You imagine terabytes of data fed into an engine. Weeks of data annotation. Fine-tuning runs. PhD-level ML work. So you don't start. Or you hand the whole thing to engineering and wait.

This is killing your progress and your creativity. You don't need any of that to learn what AI can do for your product. You need one idea, one prompt, five test cases, and an afternoon.

Trap 2: you slap a chatbot on the product, call it "AI," and move on.

This is the other side of the same coin. You're not experimenting, you're performing "AI theatre." The chatbot ships. Users don't use it. Nobody on the team learns anything. And your roadmap keeps going the same way it did before AI existed.

Both traps come from the same root: nobody on the product team has actually read what AI produces on your real data.

How AI experimentation actually works

Here's the process. Step by step. You can do this today.

1. Start with a finite problem

One big mistake I see with product leaders and PMs: they start thinking of the most complex thing AI could possibly do. "We'll give AI access to all our data, and it'll deliver insights to users." It rarely works.

Pick one specific problem where you think AI could help your user. Not the universe of problems. One. It might be about data, or it might be a specific question, a specific flow, a specific moment in the user journey.

Finite is the unlock. Complexity comes later.

2. Write your best-guess prompt

This part is critical: the first prompt should not be written by an engineer. It should be written by someone who deeply understands the field in which the AI is used. The PM. The domain expert. The founder. The CPO.

Write it in plain English. Describe what you want the AI to do, in what tone, for which user, with what constraints.

Don't overthink it. Make your best guess. You can also try a few different phrasings or structures and run them all at once; you'll quickly see what actually makes a difference for your field.
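To make "a few different phrasings" concrete, here is a minimal sketch of two prompt variants kept as plain-text templates. The wording of the prompts, the field names, and the `render` helper are all hypothetical; the point is only that each variant stays readable by the PM or domain expert who wrote it.

```python
from string import Template

# Two hypothetical phrasings of the same instruction, written in plain English.
PROMPT_VARIANTS = {
    "v1_direct": Template(
        "You are a friendly guide for $product. "
        "Suggest one daily challenge for this user: $profile. "
        "Keep it under 40 words."
    ),
    "v2_structured": Template(
        "Product: $product\n"
        "User profile: $profile\n"
        "Task: propose one daily challenge that feels new to this user.\n"
        "Tone: playful. Length: under 40 words."
    ),
}


def render(variant: str, product: str, profile: str) -> str:
    """Fill one prompt variant with concrete values."""
    return PROMPT_VARIANTS[variant].substitute(product=product, profile=profile)


if __name__ == "__main__":
    for name in PROMPT_VARIANTS:
        print(f"--- {name} ---")
        print(render(name, "ExampleApp", "plays trivia on weekends"))
```

Keeping the variants side by side like this makes it easy to run them all in the same experiment and see which differences in wording actually matter.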

3. Engineer just enough data

Contrary to popular belief, you don't need to give the AI all your data. It's actually bad practice. Too much context can confuse the model and dilute quality.

Give it just enough to solve the job to be done. Start with your best guess. The results from the first run will tell you what's missing and what's noise.

This is where a lot of teams also discover a second world of possibilities: you can engineer the data you pass to the AI. Just enough context. Just the right personalization. The next iteration writes itself once you see the first run.
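As a rough sketch of what "just enough" can look like in practice, the snippet below picks a couple of fields out of a larger user record and turns them into a compact context string. The profile fields, the `RELEVANT_FIELDS` list, and the `build_context` helper are made up for illustration; your first guess at which fields matter will differ.

```python
from typing import Any

# Hypothetical full user record; most fields are irrelevant to the task.
FULL_PROFILE: dict[str, Any] = {
    "user_id": "u_123",
    "email": "person@example.com",
    "signup_date": "2024-03-01",
    "favorite_categories": ["trivia", "puzzles"],
    "last_completed_challenge": "5-day trivia streak",
    "billing_plan": "free",
    "device_log": ["ios", "ios", "web"],
}

# First best guess at what the model needs for a daily-challenge prompt.
RELEVANT_FIELDS = ["favorite_categories", "last_completed_challenge"]


def build_context(profile: dict[str, Any], fields: list[str]) -> str:
    """Return a compact text context containing only the selected fields."""
    lines = [f"{name}: {profile[name]}" for name in fields if name in profile]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_context(FULL_PROFILE, RELEVANT_FIELDS))
```

After the first run, adjusting the experiment is a one-line change: add a field the responses were clearly missing, or drop one that turned out to be noise.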

4. A warning before you go further: only build what you have expertise in

This is the anti-pattern that wastes the most time. Teams try to get AI to do a task they themselves don't know how to do.

Take a financial app that wants to give users recommendations on optimizing their finances. Unless you're an expert financial advisor, someone who knows what good financial practice looks like for each user's particular situation, you won't know a good recommendation from a mediocre one. Your standards will be low. And users can already get mediocre generic answers from ChatGPT. They don't need your product for that.

But if you are the expert, if you know what you'd tell a user in their exact situation, and you can compare that with what the AI says, then you know. You can judge. You can iterate.

AI amplifies expertise. Experts get 100x their own quality, at huge scale. Non-experts get mediocre, average training-data answers. Indistinguishable from ChatGPT.

If you don't know what a good result looks like, you shouldn't build that product.

5. Create 5 real test cases

Not made-up ones. As close as possible to the reality of your production data. It doesn't have to be in the exact final format yet, but close enough that your test is real.

Five is enough to start. You'll learn a huge amount from five. If you're pushing this to 30 before you've read the first five, you're stalling.

6. Run across multiple models: not to pick one, to learn

Another big mistake I see: teams treat models like commodities. "They're all the same." "Let's just use the latest one, it's the best."

I've heard all of it:

  • "Experts told us this is the best for our industry."
  • "We know model XYZ is the best because we ran a few manual tests."
  • "I benchmark LLMs in my head, I don't need a platform for it."

You're leaving enormous quality on the table when you do this. Models are not commodities. They interpret prompts differently, they handle ambiguity differently, they surface different nuances.

The reason you run across multiple models is not necessarily to crown a winner. It's because the differences between their responses reveal what's missing from your prompt. When Claude solves the problem one way and GPT another, the gap between them shows you the ambiguity in your instructions. The gap is the learning.
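Mechanically, running across models is just sending the same prompt to each one and laying the answers side by side. The sketch below assumes two stand-in functions, `model_a` and `model_b`, that fake a response so the code runs offline; in a real experiment each would wrap a call to a model provider, and none of these names come from an actual library.

```python
from typing import Callable


# Stand-ins for real model calls. In practice each function would call a
# provider's API; these stubs only exist so the sketch runs offline.
def model_a(prompt: str) -> str:
    return f"[model_a] Short answer to: {prompt}"


def model_b(prompt: str) -> str:
    return f"[model_b] Longer, more cautious answer to: {prompt}. It depends."


MODELS: dict[str, Callable[[str], str]] = {"model_a": model_a, "model_b": model_b}


def run_across_models(prompt: str) -> dict[str, str]:
    """Send the same prompt to every model and collect the responses."""
    return {name: call(prompt) for name, call in MODELS.items()}


if __name__ == "__main__":
    responses = run_across_models("Suggest one daily challenge for a trivia fan.")
    for name, text in responses.items():
        print(f"=== {name} ===\n{text}\n")
```

The output is deliberately unscored. The value comes from a person reading the responses next to each other and asking why they differ.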

7. Read every response

When you run your 5 test cases, on 2 different prompt versions, across 7 models, you'll have 70 answers to read through. That's a lot of answers. You might be tempted to skim them. Don't. Read them carefully and take the time to actually reason through them.

This is the step 90% of teams skip. They run the prompt, glance at a couple of outputs, and move on. They miss the entire point.

When you read carefully, in your product and user context, you'll notice:

  • What the AI got right intrinsically: parts of the problem it solved well without you asking.
  • What it got wrong, and why it got it wrong.
  • Nuances it surfaced that you hadn't even considered.
  • How different models interpreted the same instruction differently.
  • What your prompt is missing.

Write it down. Each note is either (a) a direction for your next prompt iteration, or (b) an eval criterion for later. Both are gold.
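If it helps to see the bookkeeping, here is one possible way to track what you have to read and what you wrote down. The case, prompt, and model labels are placeholders, and the `Note` structure with its two kinds is only an assumption about how you might organize notes, not a prescribed format.

```python
from dataclasses import dataclass
from itertools import product
from typing import Literal

test_cases = [f"case_{i}" for i in range(1, 6)]   # 5 real test cases
prompt_versions = ["prompt_v1", "prompt_v2"]       # 2 prompt versions
models = [f"model_{i}" for i in range(1, 8)]       # 7 models

# One entry per response to read: 5 * 2 * 7 = 70.
reading_queue = list(product(test_cases, prompt_versions, models))


@dataclass
class Note:
    case: str
    prompt_version: str
    model: str
    observation: str
    kind: Literal["prompt_direction", "eval_criterion"]


# Hypothetical observations written while reading.
notes = [
    Note("case_1", "prompt_v1", "model_3",
         "Ignored the user's favorite category", "prompt_direction"),
    Note("case_2", "prompt_v1", "model_5",
         "Challenge must be doable in one day", "eval_criterion"),
]

prompt_directions = [n.observation for n in notes if n.kind == "prompt_direction"]
eval_criteria = [n.observation for n in notes if n.kind == "eval_criterion"]

if __name__ == "__main__":
    print(f"{len(reading_queue)} responses to read")
    print("Next prompt iteration:", prompt_directions)
    print("Eval criteria for later:", eval_criteria)
```

A spreadsheet does the same job. What matters is that every note ends up in one of the two buckets.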

8. Iterate and watch the improvement

Now you rewrite the prompt based on what you saw. Run again. The jump in quality from a single iteration is often dramatic.

We've seen teams double their accuracy, from 40% to 80%, by iterating on the prompt alone, without changing the model. We've seen 50% to 93% in a single iteration when the error analysis was done well.
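For a sense of how numbers like these can be tracked, here is a small sketch that computes accuracy from human pass/fail judgments and compares two runs. The judgments in the example are invented, and "accuracy" here simply means the share of responses a reader marked as acceptable, which is an assumption about the metric rather than a fixed definition.

```python
def accuracy(results: list[bool]) -> float:
    """Share of test cases judged acceptable by a human reader."""
    return sum(results) / len(results)


def improvement(before: float, after: float) -> tuple[float, float]:
    """Return (percentage-point gain, ratio) between two accuracy values."""
    return (after - before) * 100, after / before


if __name__ == "__main__":
    # Hypothetical pass/fail judgments for the same five cases, before and
    # after one prompt iteration.
    first_run = [True, False, False, True, False]   # 2 of 5 acceptable
    second_run = [True, True, False, True, True]    # 4 of 5 acceptable
    before, after = accuracy(first_run), accuracy(second_run)
    points, ratio = improvement(before, after)
    print(f"{before:.0%} -> {after:.0%}: +{points:.0f} points, {ratio:.1f}x")
```

Reporting both the percentage-point gain and the ratio avoids ambiguity: going from 40% to 80% is a 40-point gain and a doubling, while 50% to 93% is a 43-point gain and roughly 1.9x.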

This is the most under-used lever in AI product building. Most teams, when they see bad results, switch models. They don't touch the prompt. That's backwards.

The bigger unlock: democratize this across your team

Here's what actually changes when a PM or a domain expert runs this process once: they come out of it with ideas they never could have written down in a PRD.

The CPO walked in wanting a chatbot. He walked out with four invisible AI features, because he'd seen how AI actually behaved on his users' data.

We've seen the same thing with students in our Maven course. They arrive with one idea, convinced they're solving a "problem of one" for a single imagined user. Through one experiment, they see how AI can solve the problem at scale, across different segments, in ways they hadn't even scoped.

At Lovelaice, we're building 10+ AI features right now. All invisibly integrated. None forced. Every single one of them emerged from this process: experimentation first, then opportunity, then feature. The result is AI that helps users reach an outcome at every stage of the workflow. Not AI for the sake of AI. AI aligned to a specific outcome we know our users want.

The pattern is consistent: the closer your product thinkers are to the AI, the better your AI gets and the more ideas your product generates.

Now imagine this at team scale. Every PM running their own experiments. Designers exploring AI as a material. Domain experts shaping prompts with their actual expertise. Each person who has unique insight into your product and your customers running experiments on real data.

That's how you build an AI-native team. Not by hiring more AI engineers. Not by spinning up a dedicated "AI team" and gatekeeping access. By decentralizing experimentation and letting product thinking do what it does best: discover what's valuable to the user.

If your AI is still gatekept in one team, separated from your product teams, you are missing out. Your AI is almost certainly not hitting the mark. And the creative potential that's sitting inside your organization right now is completely locked.

Bottom line

AI experimentation isn't a testing practice. It's the foundational practice for building AI-native products.

One idea. One prompt. Five real cases. Several models. Read every response. Iterate. That's it. That's where it starts.

If you've never done it, you can't fully comprehend what it unlocks. It's not like playing with another tool. Every experiment teaches you something different: about your product, your users, and the creative surface of AI you didn't know you had.

Start with one experiment. Then give that same experience to every product thinker on your team.

Madalina

