Wizard of Oz Testing for AI: No Code, Big Insights

Before building anything, we simulated a smart AI assistant for customer service agents using the Wizard of Oz method and a Figma prototype.


The goal: validate usefulness, tone, and timing—without writing code.

The result: actionable insights (and it was fun!).

When you're designing AI for customer service, you're not just automating tasks; you're intervening in human-to-human conversations. That's a high bar for usefulness, trust, and emotional intelligence.


Our client asked us to design an AI copilot for call center agents. The assistant needed to:

  • Answer agents’ questions on the fly, like an internal expert.

  • Actively listen to customer calls and suggest improvements based on tone, sentiment, and product knowledge.

It was an ambitious vision. But before diving into NLP models and backend logic, we wanted to test a fundamental question:

Would real support agents find this tool helpful, usable, and trustworthy during live customer calls?

How do you answer that? We turned to a Wizard of Oz (WoZ) approach, faking the AI experience with a scripted role-play, a Figma prototype, and a few good actors.

What is the Wizard of Oz Method?

The Wizard of Oz method is a design research technique where users interact with what they believe is a fully functioning system, when in reality a human is operating it behind the scenes.

Why it's perfect for AI: you can simulate intelligent, dynamic behavior long before the tech is actually built. Instead of burning cycles building AI we might have to throw away, we tested the illusion of intelligence. This let us validate the user experience early (timing, tone, helpfulness) without spending development dollars.

Our Test Setup: 3 Roles, 1 Figma Prototype

We used a live role-play format, with each session including:

  • Customer Service Agent (our test participant)

  • Customer and Wizard (a YYK UX team member who played the caller and operated the simulated AI)

  • Moderator (a second YYK UX team member)

Each person had a specific script and role to play.

Diagram showing the agent, customer, and moderator roles, plus communication flow. The customer/wizard types simulated AI responses into the Figma UI.

The Figma Prototype

We designed a realistic interface that mirrored what the finished product might look like:

  • A chat-style AI assistant (where agents could ask questions)

  • A copilot that pushed real-time tips, alerts, or tone feedback during a live customer call

  • Details like loading animations, typing indicators, and timestamps for realism

During the usability test, a YYK team member acted as both the customer calling our fake call center and the "AI" behind the scenes. This individual would manually display pre-written responses that mimicked AI coaching tips triggered by specific phrases or the tone of voice used by either party.
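To give a sense of how lightweight the "intelligence" really was, here is a minimal sketch (in Python) of the kind of cheat sheet a wizard might work from: trigger phrases mapped to pre-written coaching tips. The phrases and tips below are illustrative placeholders, not the actual script from our sessions.

  # Hypothetical sketch only: a wizard's cheat sheet mapping trigger phrases
  # to pre-written coaching tips. Entries are invented for illustration and
  # are not the actual study script.
  from typing import Optional

  CHEAT_SHEET = {
      "really frustrated": "Consider using an empathetic tone here.",
      "cancel my service": "Acknowledge the concern before offering options.",
      "slow streaming": "Suggest checking the device's connection speed.",
  }

  def find_tip(utterance: str) -> Optional[str]:
      """Return the first pre-written tip whose trigger phrase appears in the utterance."""
      lowered = utterance.lower()
      for phrase, tip in CHEAT_SHEET.items():
          if phrase in lowered:
              return tip
      return None

  # Example: the wizard hears a scripted emotional moment and pastes the matching tip.
  print(find_tip("I'm really frustrated about this device"))

In practice, the "lookup" was a person scanning a printed list and pasting the right tip into the prototype, but organizing the script this way keeps the wizard fast and consistent across sessions.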

Pro Tip: Screen overlays, clickthroughs, and smart framing can easily mimic high-fidelity interactivity and save time spent prototyping.

A sidebar AI assistant answers agent queries, while a dynamic panel offers proactive suggestions during a simulated customer call.

What We Tested

Our goal wasn't to test the AI tech; we were testing the experience.

We wanted to understand:

  • When and how agents would engage with the AI

  • How tone, phrasing, and delivery timing influenced trust

  • Which proactive suggestions helped, and which overwhelmed agents

  • Where the line was between "supportive" and "annoying"

Key Insights

1. Agents want help, but only at the right moment

If the AI interrupted during critical conversation moments, it felt like a distraction. But if it chimed in just after a key phrase, like "I'm really frustrated about this device," it was welcomed as supportive.

2. Trust is built in the first few interactions

When early responses were accurate and well-timed, agents leaned in. A weak or off-topic suggestion early on caused them to mentally “mute” the assistant for the rest of the call.

3. Tone and wording matter more than intelligence

A prompt like “Consider using an empathetic tone here” was helpful. But “You sound too blunt” was seen as insulting. Emotional nuance was as important as utility.

4. Agents don’t want constant input

Too many suggestions, even helpful ones, quickly became noise. Agents preferred one meaningful, actionable suggestion every few minutes, not a stream of reactive alerts.

Tips for Running Your Own Wizard of Oz Test

Whether you're prototyping AI, voice assistants, or automation tools, this method is gold for testing human-centered AI.

Build a strong scenario

Our scenario involved a customer calling in frustrated about their slow streaming issues. The call included scripted emotional moments to test the copilot’s sentiment analysis and tone suggestions.

Use savvy actors

Your "customer (wizard)" should throw realistic curveballs, so it's important to cast someone in the wizard role who can think on their feet and quickly control the prototype. They need sharp improvisation skills and familiarity with the product.

Use lightweight tools

Our Figma prototype included fewer than 10 screens and basic overlays. Most of the "magic" was just fast manual interaction behind the scenes.

Record and debrief

We captured video and transcripts of every session and followed up with the agent to unpack:

  • Which AI suggestions felt useful?

  • Which ones were ignored?

  • What else could the AI help with?

What We Gained

  • We learned exactly when and how agents want help

  • We uncovered trust signals and failure points

  • We avoided overbuilding features users didn’t want

  • We confirmed the value proposition with real user feedback—before writing backend code

AI for customer support is all about timing, tone, and trust – not just about access to an inhuman amount of information. Our Wizard of Oz test helped us simulate all three without writing any code.

Bottom line: Faking it allowed us to learn fast, iterate effectively, and prove value early.

If you’re designing an AI assistant—especially in high-stakes environments like customer support—this method will help you answer key questions before making expensive bets.


LET’S DISCUSS YOUR NEXT PROJECT


We look forward to hearing from you.

Rich Buttiglieri

Rich has proven experience in UX as a designer, developer, researcher, consultant, and leader. He brings a track record of building, maturing, and managing UX practices and has been instrumental in transforming organizations into user-centered cultures of curiosity.
