Defining the North Star AI vision for WhatsApp Business

The problem

Many AI capabilities, no unified vision

WhatsApp Business was at an inflection point: multiple AI capabilities were being built in parallel (a business-focused agent, a general assistant, several interface modalities), but there was no unified vision for how they should come together for small-business users.

The stakes: with no shared direction, teams were building competing AI experiences for a product serving millions of businesses — risking a fragmented, confusing product and wasted investment.

What I set out to achieve

Define what AI should be for small businesses

From the outset the goal wasn't to ship a feature — it was to set a North Star: an evidence-based answer to what AI should be for SMBs on WhatsApp, plus the framework to get the team there. Concretely, I set out to answer four questions:

How do SMBs perceive and differentiate the business agent from the general AI assistant?
Which form should AI take: traditional UI flows, voice agents, or messaging-based assistants?
What jobs-to-be-done are highest priority for AI across the SMB workflow?
Where does AI build trust vs. create anxiety for business owners?

The challenges I faced

Designing for a vision, not a feature

Vision-level ambiguity: the "right answer" wasn't yes/no — it was a strategic framework to guide years of product development.
Concept abstraction: testing future-state concepts (voice agents, avatar assistants) with SMBs who had no prior familiarity required careful stimulus design to avoid leading or confusing participants.
Cross-market validity: Mexico and India have very different commerce patterns, digital maturity, and language needs — yet findings had to converge into a single recommendation.
Multi-stakeholder alignment: results had to inform PM (roadmap), Design (modality), Engineering (infrastructure), and Leadership (investment) — each with different decision frames.
Competing internal visions: different teams held different hypotheses; the research had to provide objective evidence to resolve debates, not confirm any one team's priors.

How I resolved it · the approach

Comparative concept testing across modalities

Research & concept design

Framed the study around two levels: immediate BizAI opportunities (improve now) and broader AI opportunities across WhatsApp Business (build next).
Designed concept stimuli for three modalities: a traditional wizard-guided UI, a voice/avatar agent, and a messaging assistant.
Mapped jobs-to-be-done across the full SMB workflow: profile setup, catalog management, customer chat, ads/broadcast, CRM, and sales/support.
Structured a comparative evaluation — reacting to multiple modalities for the same task — instead of testing concepts in isolation.

Execution & synthesis

Conducted 16 generative interviews across the two markets, probing AI perceptions, trust, mental models, modality preference by task complexity, customization needs, and willingness to centralize into one assistant.
Surfaced the core confusion: SMBs conflate the business agent with the general assistant at entry points.
Mapped modality preference to task complexity: wizard flows for routine tasks; voice/avatar for complex "teaching moments"; users want to switch.
Synthesized the strategic recommendation: one unified AI assistant with tiered customization rather than multiple separate bots, organized so that each insight linked to a concrete monetization or retention pathway.

Concept-testing stimuli

The three modalities, as live screens

To test the modality decisions with SMBs, I prototyped each modality across three core tasks — built in Stitch, with the microanimations running live (the pulsing field highlight, the floating tip card). Pick a task, then compare how each modality handles it.

Prototype screens, one set per task, each shown in three modalities: Guided UI with floating tips · Voice / avatar agent · Meta AI business agent

All of these prototypes were vibe-coded using Stitch.

Synthesis · Prioritization hierarchy

From jobs-to-be-done to a build order the team can act on

I synthesized the interviews into a prioritization hierarchy for the four core jobs — creating a business profile, setting up the catalog, inbox management, and creating ads & broadcast messaging — scoring each by importance and difficulty and segmenting by SMB type. It's not just a map of what matters; it's an actionable build order: the high-importance, high-difficulty jobs are where AI should land first — and the priority shifts depending on who the business owner is.

Figure: Prioritization hierarchy. Each of the four jobs-to-be-done is scored by importance × difficulty and segmented by SMB type, turning research into a prioritized, owner-specific build order for the product team.

Underneath that hierarchy is the raw distribution — how individual SMBs rated each job. The four jobs are plotted by importance and difficulty, and the size of each bubble denotes the number of SMBs who voted for that job — so the highest-leverage targets are the large bubbles sitting high on both axes.

Figure: Bubble distribution of SMB ratings. Each of the four jobs-to-be-done is plotted by importance × difficulty; the size of each bubble denotes the number of SMBs who voted for that job.
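The scoring behind the hierarchy can be sketched in a few lines of Python. This is an illustrative sketch only: the 1-5 rating scale, the use of the importance × difficulty product as a single ranking score, the vote count as a tie-breaker, the segment label, and every number below are hypothetical placeholders, not data from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobRating:
    job: str
    segment: str      # hypothetical SMB segment label
    importance: int   # assumed 1-5 scale
    difficulty: int   # assumed 1-5 scale
    votes: int        # number of SMBs who voted for this job (bubble size)

    @property
    def score(self) -> int:
        # Assumed ranking score: importance multiplied by difficulty.
        return self.importance * self.difficulty


def build_order(ratings, segment):
    """Rank one segment's jobs by score, breaking ties by vote count."""
    in_segment = [r for r in ratings if r.segment == segment]
    return sorted(in_segment, key=lambda r: (r.score, r.votes), reverse=True)


# Placeholder values for illustration only; not results from the study.
RATINGS = [
    JobRating("Manage inbox", "solo seller", 3, 2, 3),
    JobRating("Create business profile", "solo seller", 5, 4, 6),
    JobRating("Create ads & broadcast", "solo seller", 2, 3, 2),
    JobRating("Set up catalog", "solo seller", 4, 4, 5),
]

for rank, rating in enumerate(build_order(RATINGS, "solo seller"), start=1):
    print(rank, rating.job, rating.score)
```

Ranking per segment rather than globally reflects the point that priority shifts with who the business owner is; a different segment would simply carry its own ratings and produce its own order.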

Designing the human–AI interaction

The interaction decisions I shaped

Beyond findings, this study made concrete decisions about how people would interact with the AI:

Modality by task complexity: wizard-guided UI for routine setup, voice/avatar for complex "teaching moments," with smart defaults and the ability to switch — not a single-modality bet.
Trust & mental models: a distinct, business-first agent identity to resolve the agent-vs-assistant confusion and reduce anxiety at the entry point.
Human-in-the-loop boundary: the AI assists but never "closes the deal" — hand-holding that lowers risk over autonomy that raises it.
Interaction architecture: one unified assistant with tiered customization (pre-trained default vs. custom) — a decision about the agent's behavioral model, not just a feature toggle.

Why it's design, not polish: task-dependent modality and the unified-assistant model were adopted as the monetizable architecture — the interaction decision and the business model were the same decision.

Framework · Modality × task complexity

Which modality wins for which task

The research pointed to a clear rule: modality should follow task complexity — with a smart default and the option to switch. Guided UI carries routine, structured setup; a chat assistant handles ongoing, conversational work; and a voice/avatar agent earns its place in the complex, novel "teaching moments." This framework maps the four core jobs to the modality that wins each.

Best-fit modality per task as complexity rises: guided UI → chat assistant → voice/avatar. The default just leads; users can still switch modality, and the AI hand-holds rather than acting autonomously.
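The rule of a smart default plus the freedom to switch can be expressed as a small lookup. This is a minimal sketch under stated assumptions: the three complexity tiers and all names (`Complexity`, `Modality`, `pick_modality`) are hypothetical labels for illustration, not part of any shipped product or API.

```python
from enum import Enum


class Modality(Enum):
    GUIDED_UI = "guided UI"
    CHAT_ASSISTANT = "chat assistant"
    VOICE_AVATAR = "voice/avatar agent"


class Complexity(Enum):
    ROUTINE = 1   # routine, structured setup
    ONGOING = 2   # ongoing, conversational work
    COMPLEX = 3   # complex, novel "teaching moments"


# Assumed mapping, following the framework: modality follows task complexity.
DEFAULT_MODALITY = {
    Complexity.ROUTINE: Modality.GUIDED_UI,
    Complexity.ONGOING: Modality.CHAT_ASSISTANT,
    Complexity.COMPLEX: Modality.VOICE_AVATAR,
}


def pick_modality(complexity, user_choice=None):
    """Return the user's explicit choice if given, otherwise the default."""
    if user_choice is not None:
        return user_choice
    return DEFAULT_MODALITY[complexity]


print(pick_modality(Complexity.ROUTINE).value)
print(pick_modality(Complexity.ROUTINE, Modality.VOICE_AVATAR).value)
```

The explicit `user_choice` always wins over the default, which mirrors the principle that the default only leads and the user stays in control.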

Design principles established

Principles the research turned into guidance

Task-dependent modality with smart defaults

Not a single-modality bet — AI form should follow task complexity, with sensible defaults and the ability to switch.

Business-first branding builds trust

Distinct naming anchored on outcomes ("talk to customers, get info, sell faster") resolves the agent-vs-assistant confusion and increases adoption.

One assistant, tiered customization

Default to pre-trained; offer custom options for power users — best UX and the clearest monetization path.

Hand-holding wins; autonomy that "closes the deal" doesn't

Keep humans in the loop for high-stakes moments; AI assists but doesn't close — this preserves trust.

Impact

The value it created — for users and the business

Value to users (SMBs)

A clearer AI experience: distinct, business-first branding that resolves the agent-vs-assistant confusion and lowers anxiety at the entry point.
The right modality for the task — guided setup for routine jobs, conversation for ongoing work, voice for complex moments — with the freedom to switch.
AI that hand-holds and keeps people in control rather than acting autonomously, which preserves trust.

Value to the business

Fed directly into the North Star AI vision and resolved competing internal bets — aligning PM, Design, Eng & Leadership on one direction.
Validated one unified assistant with tiered customization as both the best UX and the clearest monetization path.
Prioritized investment: profile + catalog setup identified as the highest-ROI AI bets (high importance, high difficulty).

How it added business value: the interaction decision and the business model became the same decision — a task-dependent, unified-assistant architecture that is at once the most usable experience and the most monetizable one, now guiding the AI roadmap for a product serving millions of businesses.

Reflection

What I learned

What worked: the two-tier recommendation structure made findings actionable across time horizons; cross-market convergence strengthened the recommendations; and framing every finding through business impact kept PM and leadership engaged.

What I'd do differently: add a quantitative validation phase to strengthen the "one assistant" recommendation, test with experienced users (all participants were new to the agent), and coordinate earlier with adjacent research.

What I learned: how to design research that shapes long-term vision (not just evaluate features), the power of comparative concept testing, and how to translate qualitative insight into strategic frameworks leadership can act on.

My responsibilities & artifacts

What I owned

Responsibilities

Designed the study around two strategic levels
Designed concept stimuli for three AI modalities
Built the JTBD prioritization across the SMB workflow
Ran comparative concept testing across two markets
Synthesized findings into a North Star framework & design principles
Aligned PM, Design, Engineering & Leadership on direction

Artifacts delivered

Comparative concept-test research design
Three-modality concept stimuli
Jobs-to-be-done map / prioritization hierarchy
North Star vision input + two-tier recommendations
Design principles set
Findings share-out to cross-functional stakeholders

What kind of design this was

Design research that set product & design direction

This was generative design research and product strategy — the upstream design work that decides what a product should become before any screen is finalized. As design researcher and strategist, I designed the concept stimuli for three AI modalities, mapped jobs-to-be-done across the SMB workflow, ran a comparative evaluation, and synthesized the findings into a North Star framework and a set of design principles that now guide modality, branding, and monetization decisions for WhatsApp Business AI.
