A/B Test Calculator: The Technical Standard for Statistical Significance
The UXR Institute A/B Test Calculator is a precision instrument for quantitative research and behavioral validation. Built on a frequentist two-proportion z-test, the tool gives researchers the mathematical rigor needed for decision science: it evaluates statistical significance while helping to limit the risk of Type I (false positive) and Type II (false negative) errors reaching the product roadmap.
Frequentist · Two-prop Z-test
Evaluate statistical significance and test power — for pre-test planning or post-test analysis.
How to read results
- p-value — how surprising the result is if there is truly no difference.
- Power — how likely you are to detect an effect (shown two ways).
- Z-score — difference measured in "standard errors."
⚠️ Low data warning: When conversions are very low, the normal-approximation z-test can be unreliable.
🚨 Sample Ratio Mismatch (SRM): The visitor split is far from what you expected.
Outputs reported:
- CR Control (A) — conversions / visitors
- CR Variant (B) — conversions / visitors
- Relative uplift — (CRB − CRA) / CRA
- p-value
- Z-score — measured in standard deviations
- Test power — power of the two-prop z-test
- CI threshold power
Chart: Sampling distributions — Control (A) vs Variant (B)
| Quantity | Formula |
| --- | --- |
| Standard Error A | sqrt( CRa × (1−CRa) / Va ) |
| Standard Error B | sqrt( CRb × (1−CRb) / Vb ) |
| Std. Error of Difference | sqrt( SEa² + SEb² ) |
| Critical Z-value | at chosen confidence level |
| A CI Upper Threshold | CRa + Zcrit × SEa |
| A CI Lower Threshold | CRa − Zcrit × SEa |
| Alpha (α) | 1 − confidence level |
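The formula rows above translate directly into code. A minimal sketch using only Python's standard library (`statistics.NormalDist` supplies the critical z-value; the two-sided case puts α/2 in each tail):

```python
from math import sqrt
from statistics import NormalDist

def detailed_stats(conv_a, n_a, conv_b, n_b, confidence=0.95, two_sided=True):
    """Compute the detailed quantities from the formula table (sketch)."""
    cr_a, cr_b = conv_a / n_a, conv_b / n_b
    se_a = sqrt(cr_a * (1 - cr_a) / n_a)      # Standard Error A
    se_b = sqrt(cr_b * (1 - cr_b) / n_b)      # Standard Error B
    alpha = 1 - confidence                    # Alpha = 1 - confidence level
    tail = alpha / 2 if two_sided else alpha
    z_crit = NormalDist().inv_cdf(1 - tail)   # Critical Z-value
    return {
        "SE_a": se_a,
        "SE_b": se_b,
        "SE_diff": sqrt(se_a**2 + se_b**2),   # Std. Error of Difference
        "z_crit": z_crit,
        "alpha": alpha,
        "a_ci_upper": cr_a + z_crit * se_a,   # A CI Upper Threshold
        "a_ci_lower": cr_a - z_crit * se_a,   # A CI Lower Threshold
    }
```

For 95% confidence (two-sided) the critical value comes out at the familiar 1.96.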
What this calculator is for
Use this tool to answer two practical questions:
- Did the variant likely change behavior? The calculator estimates how surprising your observed difference would be if there were actually no difference between A and B.
- How confident can I be in the result? It reports p-value, z-score, and two power readouts so you can judge whether your study was capable of detecting the effect you care about.
This is a frequentist two-proportion z-test (for conversion rates: “converted” vs “didn’t convert”). It’s appropriate when each visitor has one chance to convert and visitors are independently assigned to A or B.
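The test itself is short to write down. A sketch using the unpooled standard error (the same form shown in the detailed-stats formulas); other tools may pool the rates under the null, which gives nearly identical results at typical sample sizes:

```python
from math import erf, sqrt

def norm_cdf(x: float) -> float:
    # Standard normal CDF via the error function (stdlib only).
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def two_prop_z_test(conv_a, n_a, conv_b, n_b, two_sided=True):
    """Two-proportion z-test on conversion counts; returns (z, p)."""
    cr_a, cr_b = conv_a / n_a, conv_b / n_b
    # Unpooled standard error of the difference: sqrt(SEa^2 + SEb^2)
    se = sqrt(cr_a * (1 - cr_a) / n_a + cr_b * (1 - cr_b) / n_b)
    z = (cr_b - cr_a) / se
    if two_sided:
        p = 2.0 * (1.0 - norm_cdf(abs(z)))
    else:
        p = 1.0 - norm_cdf(z)  # one-sided H1: B > A
    return z, p
```

For example, 200/10,000 conversions in A vs 240/10,000 in B yields z ≈ 1.93 and a two-sided p ≈ 0.054, just short of significance at the 95% level.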
Before you trust any result: check experiment quality
A/B math assumes the experiment was run cleanly. If these assumptions don’t hold, the p-value can be misleading (often too optimistic).
1. Random Assignment and Stable Exposure
People should land in A or B randomly and stay there. If you have cross-device issues, changing bucketing rules, or mid-test routing changes, results can be biased.
2. Sample Ratio Mismatch (SRM) warning
If you intended a 50/50 split but got something like 60/40, this tool will flag an SRM warning. SRM is a classic sign something is off (mis-bucketing, eligibility filtering, instrumentation gaps, bot traffic, caching, etc.). Treat SRM as an experiment validity issue—not just a stats detail.
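An SRM check can be quantified with a p-value of its own. A sketch using a two-sided normal approximation to the binomial (production systems often use a chi-square test instead); a very small p-value here means the split itself is suspect:

```python
from math import erf, sqrt

def srm_p_value(n_a, n_b, expected_share_a=0.5):
    """Two-sided p-value for the observed split vs the intended one,
    via a normal approximation to Binomial(n, expected_share_a)."""
    n = n_a + n_b
    mean = n * expected_share_a
    sd = sqrt(n * expected_share_a * (1 - expected_share_a))
    z = (n_a - mean) / sd
    return 2.0 * (1.0 - 0.5 * (1.0 + erf(abs(z) / sqrt(2.0))))
```

A 6,000/4,000 split against an intended 50/50 yields a p-value near zero: clear SRM, and a reason to debug the experiment before reading any metric.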
3. Low conversion counts
When conversions are very low, normal-approximation z-tests get shaky. The calculator warns when a cell has fewer than ~5 conversions. In that case, consider:
- running longer,
- aggregating a less-rare metric, or
- using an exact test / Bayesian approach.
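For the exact-test route, Fisher's exact test can be computed with nothing but the standard library. A sketch that enumerates all possible 2x2 tables with the same margins (fine for small counts; prefer an optimized library implementation for large samples):

```python
from math import comb

def fisher_exact_p(conv_a, n_a, conv_b, n_b):
    """Two-sided Fisher exact p-value for a 2x2 conversion table.
    Sums the probabilities of all tables at most as likely as observed."""
    n = n_a + n_b
    k = conv_a + conv_b                      # total conversions (fixed margin)
    total = comb(n, k)
    def pmf(x):                              # P(x conversions land in arm A)
        return comb(n_a, x) * comb(n_b, k - x) / total
    p_obs = pmf(conv_a)
    lo, hi = max(0, k - n_b), min(k, n_a)
    return sum(pmf(x) for x in range(lo, hi + 1)
               if pmf(x) <= p_obs * (1 + 1e-9))
```

Unlike the z-test, this makes no normal approximation, so it stays valid even with a handful of conversions per cell.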
What to enter
Visitors = number of unique users exposed to that variation (A or B).
Conversions = number of those visitors who achieved the outcome (e.g., “completed signup”). This calculator assumes each visitor is counted once per variation and your metric is binary (converted vs not).
How to read the core outputs
Conversion rate (CR)
CR is simply:
- CR(A) = conversionsA / visitorsA
- CR(B) = conversionsB / visitorsB
The “Relative uplift” is:
(CR(B) − CR(A)) / CR(A)
This is a useful UX framing (“how much better is B?”), but remember it can look huge when the baseline is small.
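In code these are one-liners (a trivial sketch; CR values are proportions in [0, 1]):

```python
def conversion_rate(conversions, visitors):
    # CR = conversions / visitors for one variation
    return conversions / visitors

def relative_uplift(cr_a, cr_b):
    # (CRb - CRa) / CRa: a 0.020 -> 0.024 move is a +20% relative uplift
    return (cr_b - cr_a) / cr_a
```

Note the baseline caveat in action: the same +0.004 absolute difference reads as +20% on a 2% baseline but only +2% on a 20% baseline.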
Z-score
The z-score is how many “standard errors” away your observed difference is from zero.
Bigger absolute z means the difference is less likely due to random chance.
A z-score is helpful for intuition:
- around 1.28 lines up with ~90% one-sided
- around 1.96 lines up with ~95% two-sided
- around 2.58 lines up with ~99% two-sided
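These rules of thumb can be checked numerically. A sketch converting a z-score to a p-value with the stdlib error function:

```python
from math import erf, sqrt

def z_to_p(z, two_sided=True):
    """Convert a z-score to a p-value under the standard normal."""
    upper_tail = 1.0 - 0.5 * (1.0 + erf(abs(z) / sqrt(2.0)))
    return 2.0 * upper_tail if two_sided else upper_tail

# Sanity checks against the thresholds above:
# z_to_p(1.28, two_sided=False) ~ 0.10
# z_to_p(1.96)                  ~ 0.05
# z_to_p(2.58)                  ~ 0.01
```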
p-value
The p-value is not the probability that your variant works.
It’s:
If there were truly no difference between A and B, how often would we see a result at least this extreme just by random fluctuation?
So:
p = 0.03 means “about 3% of the time, chance alone would produce a difference this large (or larger).”
Smaller p-values mean stronger evidence against “no effect.”
Confidence level and the “significant / not significant” label
Confidence determines the cutoff:
- 90% confidence ⇒ α = 0.10
- 95% confidence ⇒ α = 0.05
- 99% confidence ⇒ α = 0.01
This tool marks results “Statistically Significant” when:
- p < α
That label is a decision rule, not a guarantee of user impact.
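The label reduces to a single comparison (a sketch):

```python
def significance_label(p_value, confidence=0.95):
    alpha = 1.0 - confidence          # e.g. 95% confidence -> alpha = 0.05
    return "Statistically Significant" if p_value < alpha else "Not Significant"
```

The same p-value can flip labels as you move the confidence level: p = 0.07 is "significant" at 90% confidence but not at 95%, which is why the threshold should be chosen before the test, not after.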
One-sided vs two-sided (choose based on your study intent)
One-sided: “B is better than A”
Choose one-sided when:
- you only care whether the variant improves the metric, and
- you would not act on (or even interpret) a negative effect the same way.
Typical UX use case: a guided flow change intended to increase completion.
Two-sided: “B is different from A (better or worse)”
Choose two-sided when:
- any change matters (including harm), or
- you’re exploring, validating, or risk-managing.
Typical UX use case: high-stakes changes where regressions matter (trust, safety, revenue, retention).
If you’re unsure, default to two-sided.
Power: why you might see two different numbers
Many online calculators report “observed power,” but they don’t all mean the same thing. That’s why this tool shows two power readouts—both are valid, but they answer different questions.
1. Test Power (Z-test)
This is the probability your two-proportion z-test would declare significance if the true effect were the one you observed.
It aligns with the actual test decision rule.
Interpretation:
- ≥ 80% is often used as a planning target.
- If Test Power is low, “not significant” may simply mean “not enough data,” not “no effect.”
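A hedged sketch of this power calculation: it treats the observed rates as the true ones, uses the unpooled standard error, and drops the negligible opposite-tail term (exact conventions vary across tools):

```python
from math import sqrt
from statistics import NormalDist

def z_test_power(conv_a, n_a, conv_b, n_b, confidence=0.95, two_sided=True):
    """Approximate power of the two-proportion z-test if the true
    effect equals the observed one (unpooled SE; sketch only)."""
    nd = NormalDist()
    cr_a, cr_b = conv_a / n_a, conv_b / n_b
    se = sqrt(cr_a * (1 - cr_a) / n_a + cr_b * (1 - cr_b) / n_b)
    alpha = 1 - confidence
    z_crit = nd.inv_cdf(1 - (alpha / 2 if two_sided else alpha))
    z_effect = abs(cr_b - cr_a) / se
    return nd.cdf(z_effect - z_crit)  # opposite tail ignored
```

A borderline result like 200/10,000 vs 240/10,000 lands near 50% power: roughly a coin flip that a true effect of that size would be detected at this sample size.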
2. CI Threshold Power (common in other calculators)
This estimates the probability that B would clear a confidence bound around A (a “confidence interval threshold” framing). Some tools use this as their default “observed power.”
Interpretation:
- It often looks higher than Test Power, because it’s tied to a different rejection criterion.
- Useful if your team uses CI-style decision rules (“ship if B clears A’s bound”), but it’s not identical to the z-test power.
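One common way this framing is computed: the probability that B's sampled rate, treated as normally distributed around its observed value, clears A's upper confidence bound. A sketch; the exact bound convention (two-sided vs one-sided) is an assumption here and varies by tool:

```python
from math import sqrt
from statistics import NormalDist

def ci_threshold_power(conv_a, n_a, conv_b, n_b, confidence=0.95):
    """P(B's sampled rate clears A's upper CI bound), assuming
    B ~ Normal(CRb, SEb). One 'CI threshold' convention among several."""
    nd = NormalDist()
    cr_a, cr_b = conv_a / n_a, conv_b / n_b
    se_a = sqrt(cr_a * (1 - cr_a) / n_a)
    se_b = sqrt(cr_b * (1 - cr_b) / n_b)
    z_crit = nd.inv_cdf(1 - (1 - confidence) / 2)  # two-sided bound on A
    a_upper = cr_a + z_crit * se_a
    return 1.0 - nd.cdf((a_upper - cr_b) / se_b)
```

For 200/10,000 vs 240/10,000 this gives roughly 0.79, visibly higher than the z-test power for the same data, illustrating why the two readouts should not be used interchangeably.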
Bottom line: if you’re using the p-value/z-test to make decisions, treat Test Power as the primary power metric.
A UX-researcher-friendly way to interpret outcomes
Think of results as a combination of evidence strength and decision risk:
- Significant + high power: strong evidence; effect is likely real and the study had enough sensitivity.
- Not significant + low power: inconclusive; you may need more traffic, a higher-frequency metric, or a larger effect.
- Significant + low power: possible “lucky” detection; replicate or run longer if the decision is high impact.
- Huge uplift on tiny baseline: check absolute differences and confidence; relative metrics can overdramatize.
A/B testing is one input to product judgment. Pair it with:
- qualitative insights (why it happened),
- segmentation (who it helped/hurt),
- guardrail metrics (did it break anything).
Quick checklist before you share results
✅ No SRM warning (or you expected the split)
✅ Conversions aren’t extremely low
✅ Hypothesis (one- vs two-sided) matches the decision you’d actually make
✅ You’re looking at both significance and power
✅ You can explain the change in plain language (“what did users do differently, and why?”)