Measure trust in A/B test results, identify flaky experiments, and boost data quality. Launch this survey to spot issues and build a stronger testing culture.
What's Included
AI-Powered Questions
Intelligent follow-up questions based on responses
Automated Analysis
Real-time sentiment and insight detection
Smart Distribution
Target the right audience automatically
Detailed Reports
Comprehensive insights and recommendations
Sample Survey Items
Q1
Chat Message
Thank you for participating. Responses are confidential and analyzed in aggregate. Please be candid.
Q2
Multiple Choice
Which areas best describe your role? Select up to three.
Product Management
Engineering
Data Science / Analytics
Design / UX
Marketing / Growth
Operations / Support
Leadership / Strategy
Other
Q3
Multiple Choice
In the last 6 months, how often have you consumed or acted on A/B test results?
Weekly or more
1 to 3 times per month
A few times total
Not in the last 6 months
Never
Q4
Opinion Scale
Based on what you see and hear, how reliable are our A/B test results overall?
Range: 1 – 10
Min: Not reliable · Mid: Neutral · Max: Very reliable
Q5
Multiple Choice
What limits your use of A/B test results today? Select all that apply.
Hard to access results
Unsure how to interpret results
Don’t trust data quality
Not relevant to my work
No tests run in my area
Lack of time
Other
Q6
Multiple Choice
Would a brief primer on power, minimum detectable effect (MDE), and uncertainty be helpful to you?
Yes
Maybe
No
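For readers curious about the power, MDE, and uncertainty concepts that Q6 refers to, the sketch below shows a minimal pre-launch sample-size calculation for a two-proportion test. The baseline rate and MDE are hypothetical placeholders, not figures from any real experiment.

```python
# Minimal sketch: per-variant sample size needed to detect a given MDE with 80% power.
# Baseline rate and MDE are hypothetical placeholders.
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

baseline = 0.04   # assumed baseline conversion rate (4%)
mde = 0.005       # minimum detectable effect: +0.5 percentage points
alpha = 0.05      # two-sided significance level
power = 0.80      # desired statistical power

# Convert the absolute lift into Cohen's h, then solve for the per-variant sample size.
effect = proportion_effectsize(baseline + mde, baseline)
n_per_variant = NormalIndPower().solve_power(
    effect_size=effect, alpha=alpha, power=power, ratio=1.0, alternative="two-sided"
)
print(f"~{n_per_variant:,.0f} users per variant")
```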
Q7
Numeric
About how many distinct A/B tests did you work on or consume results from in the last 3 months?
Accepts a numeric value
Whole numbers only
Q8
Multiple Choice
Where are the tests you work with primarily run? Select all that apply.
Web
iOS app
Android app
Backend systems
Marketing channels (email/ads)
Other
Q9
Opinion Scale
How much do you currently trust the validity of our A/B test conclusions?
Range: 1 – 10
Min: Do not trust · Mid: Neutral · Max: Trust completely
Q10
Multiple Choice
Recently, have you observed flaky or inconsistent A/B outcomes on key metrics?
No
Yes, occasionally
Yes, frequently
Unsure
Q11
Long Text
Please share one or two recent examples of flakiness and what you think caused them.
Max 600 chars
Q12
Matrix
How often do the following contribute to flaky results in your area?
Columns: Never · Rarely · Sometimes · Often · Very often
Rows:
Traffic imbalance between variants
Incorrect sampling or targeting
Event instrumentation issues
Seasonality or external shocks
Peeking or early stopping
Interference between concurrent tests
Data pipeline lag or bugs
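The "traffic imbalance" row in Q12 is often diagnosed as a sample ratio mismatch (SRM). Below is a minimal sketch of an SRM check, assuming an intended 50/50 split; the assignment counts are hypothetical placeholders.

```python
# Minimal sketch: sample ratio mismatch (SRM) check against an intended 50/50 split.
# The assignment counts are hypothetical placeholders.
from scipy.stats import chisquare

control, treatment = 50_412, 49_288        # observed users per variant (example)
total = control + treatment
expected = [total * 0.5, total * 0.5]      # intended 50/50 allocation

stat, p_value = chisquare([control, treatment], f_exp=expected)
if p_value < 0.001:
    print(f"Possible SRM (p = {p_value:.2g}); investigate before trusting results.")
else:
    print(f"No SRM detected (p = {p_value:.2g}).")
```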
Q13
Dropdown
When deciding to ship based on a test, what effect size on the primary metric is typically meaningful for you?
Any positive change
At least 0.5 percentage points
At least 1 percentage point
At least 2 percentage points
At least 5 percentage points
It depends on context
Q14
Rating
How often do A/B results meaningfully change your team’s decisions?
Scale: 1 – 10 (stars)
Min: Never · Max: Very often
Q15
Opinion Scale
How clear is the communication of uncertainty (confidence intervals, p-values, power) in shipped reports?
Range: 1 – 10
Min: Not clear · Mid: Moderate · Max: Very clear
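One way to make the uncertainty Q15 asks about concrete is to report a confidence interval alongside the point estimate. The sketch below computes a normal-approximation (Wald) interval for a difference in conversion rates; the counts are hypothetical placeholders.

```python
# Minimal sketch: 95% Wald confidence interval for a difference in conversion rates.
# Counts are hypothetical placeholders, not real experiment data.
import math

conv_a, n_a = 1_930, 48_000    # control: conversions, users
conv_b, n_b = 2_110, 48_200    # treatment: conversions, users

p_a, p_b = conv_a / n_a, conv_b / n_b
diff = p_b - p_a
se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
lo, hi = diff - 1.96 * se, diff + 1.96 * se

print(f"Lift: {diff:+.2%} (95% CI {lo:+.2%} to {hi:+.2%})")
```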
Q16
Multiple Choice
Before launch, how often are MDE and power planned explicitly?
Always
Often
Sometimes
Rarely
Never
Unsure
Q17
Ranking
Rank the improvements that would most increase your trust in A/B test results.