
ab-test-setup

Safe

Set Up Rigorous A/B Tests

Also available from: coreyhaines31

A/B tests often fail due to poor design, premature stopping, and invalid metrics. This skill enforces rigorous methodology with mandatory gates for hypothesis locking, metric definition, and sample size calculation before any test runs.

Supports: Claude Code (CC)
🥉 74 Bronze
1. Download the skill ZIP

2. Upload to Claude

Go to Settings → Capabilities → Skills → Upload skill

3. Enable it and get started

Try it

Using "ab-test-setup". Help me set up an A/B test for our checkout page

Expected result:

  • Step 1: Hypothesis Lock - Present your final hypothesis including: target audience, primary metric, expected direction of effect, and Minimum Detectable Effect (MDE). Ask: Is this the final hypothesis we are committing to?
  • Step 2: Assumptions Check - List assumptions about traffic stability, user independence, metric reliability, and randomization quality.
  • Step 3: Test Type Selection - Choose A/B (default), A/B/n, Multivariate, or Split URL based on your change complexity.
  • Step 4: Metrics Definition - Define your primary metric (mandatory), secondary metrics for context, and guardrail metrics that must not degrade.
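The four gates above can be sketched as a single locked test plan, for example as a frozen Python dataclass. The field names here are illustrative, not part of the skill:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)  # frozen: the plan cannot be mutated after the hypothesis lock
class ABTestPlan:
    hypothesis: str                 # Step 1: the locked hypothesis
    target_audience: str
    expected_direction: str         # "increase" or "decrease"
    mde_relative: float             # Minimum Detectable Effect, e.g. 0.05 = 5% relative lift
    assumptions: list[str] = field(default_factory=list)   # Step 2
    test_type: str = "A/B"          # Step 3: "A/B", "A/B/n", "MVT", or "Split URL"
    primary_metric: str = ""        # Step 4: exactly one primary metric
    secondary_metrics: list[str] = field(default_factory=list)
    guardrail_metrics: list[str] = field(default_factory=list)

plan = ABTestPlan(
    hypothesis="Simplifying the checkout form increases completion",
    target_audience="All new checkout visitors",
    expected_direction="increase",
    mde_relative=0.05,
    primary_metric="checkout_completion_rate",
    guardrail_metrics=["revenue_per_user", "page_load_time"],
)
```

Making the plan immutable mirrors the skill's hypothesis-lock gate: any change after this point means starting a new test plan, not editing the running one.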

Using "ab-test-setup". Is my hypothesis valid?

Expected result:

  • Valid hypothesis checklist:
  • ✓ Observation or evidence - Do you have data supporting this?
  • ✓ Single, specific change - Is the change clearly defined?
  • ✓ Directional expectation - Do you expect increase or decrease?
  • ✓ Defined audience - Who is being tested?
  • ✓ Measurable success criteria - What defines success?
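The checklist above can be mechanized as a small validator. This is a sketch; the dictionary keys and the single-change heuristic are assumptions, not part of the skill:

```python
def check_hypothesis(h: dict) -> list[str]:
    """Return the checklist items the hypothesis is still missing."""
    checks = {
        "observation or evidence": bool(h.get("evidence")),
        # crude single-change heuristic: a compound "X and Y" change is suspect
        "single, specific change": bool(h.get("change")) and " and " not in h.get("change", ""),
        "directional expectation": h.get("direction") in ("increase", "decrease"),
        "defined audience": bool(h.get("audience")),
        "measurable success criteria": bool(h.get("success_metric")),
    }
    return [name for name, ok in checks.items() if not ok]

missing = check_hypothesis({
    "evidence": "40% of users abandon at the payment step",
    "change": "show trust badges on the payment form",
    "direction": "increase",
    "audience": "new checkout visitors",
    # no "success_metric" yet -> the validator flags it
})
```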

Security Audit

Safe
v1 • 2/24/2026

All 12 static findings are false positives. The scanner detected benign A/B testing terminology (hypothesis, design, metrics, valid, peeking) and misinterpreted it as cryptographic/network security issues. This skill is a legitimate methodology guide for setting up rigorous A/B tests with statistical rigor. No actual security risks identified.

Files scanned: 1
Lines analyzed: 238
Findings: 0
Total audits: 1

No security issues found
Audited by: claude

Quality Score

Architecture: 38
Maintainability: 100
Content: 87
Community: 50
Security: 100
Spec compliance: 91

What you can build

Product Manager Validates Test Design

A product manager uses the skill to structure a new feature test, ensuring hypothesis is specific and metrics are defined before engineering begins.

Data Scientist Ensures Statistical Rigor

A data scientist applies the methodology to review a proposed experiment, checking sample size calculations and guardrail metrics.

Growth Engineer Plans Conversion Test

A growth engineer uses the skill to structure a landing page optimization test, locking hypothesis and calculating required traffic before launch.

Try these prompts

Basic Test Setup
Help me set up an A/B test. I have a user problem: [describe problem]. I want to test: [describe proposed change]. Guide me through the mandatory setup steps.
Hypothesis Validation
Review my hypothesis for an A/B test: [paste hypothesis]. Does it meet the quality checklist? What is missing or needs improvement?
Sample Size Calculation
Help me calculate sample size. My current conversion rate is [X]%. I want to detect a [Y]% relative lift. Significance level 95%, power 80%. What sample size do I need?
Execution Readiness Check
Run an execution readiness check for my A/B test. I have: hypothesis [paste], primary metric [name], sample size [number], duration [days]. What gates am I missing?
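The execution readiness check in the last prompt can be expressed as a simple pre-launch gate function. The gate names are illustrative, matching the placeholders in the prompt:

```python
REQUIRED_GATES = ("hypothesis", "primary_metric", "sample_size", "duration_days")

def missing_gates(test_config: dict) -> list[str]:
    """List mandatory gates that are absent or empty in the test config."""
    return [gate for gate in REQUIRED_GATES if not test_config.get(gate)]

cfg = {
    "hypothesis": "Trust badges increase checkout completion",
    "primary_metric": "checkout_completion_rate",
    "sample_size": 58_000,
    # "duration_days" not set -> this gate blocks launch
}
```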

Best practices

  • Lock your hypothesis and primary metric BEFORE any implementation work begins
  • Calculate sample size upfront and ensure you have enough traffic for the test duration
  • Use guardrail metrics to prevent harmful wins that damage user experience

Avoid

  • Starting a test without a frozen hypothesis - this leads to moving goalposts
  • Peeking at results early and stopping tests based on initial significance
  • Defining multiple primary metrics - this increases false positive risk

Frequently asked questions

What is the minimum traffic needed for an A/B test?
It depends on your baseline conversion rate and Minimum Detectable Effect. A typical test detecting a 5% relative lift on a 10% baseline rate needs roughly 58,000 visitors per variant at 95% significance and 80% power.
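A sketch of the standard two-proportion sample-size formula (two-sided z-test, stdlib only), applied to the example parameters above:

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_variant(baseline: float, relative_lift: float,
                            alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-variant sample size for a two-sided two-proportion z-test."""
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # e.g. 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)            # e.g. 0.84 for power = 0.80
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

# 10% baseline, 5% relative lift (i.e. 10.0% -> 10.5%)
n = sample_size_per_variant(0.10, 0.05)
```

Note the quadratic sensitivity to the effect size: halving the detectable lift roughly quadruples the required sample.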
Can I run multiple variants in one test?
Yes, but each additional variant requires more traffic. A/B/n tests need significantly more sample size than simple A/B tests. Consider if multiple variants are truly necessary or if sequential testing is more practical.
When should I stop an A/B test early?
Rarely. Early stopping based on peeking invalidates statistical guarantees. Only stop early for technical failures, severe guardrail violations, or if you have pre-registered an adaptive design with proper statistical correction.
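Why peeking invalidates the guarantees can be demonstrated with a short A/A simulation (no real difference between groups), checking significance at several interim looks. The simulation parameters are illustrative; the point is that the any-look false-positive rate lands well above the nominal 5%:

```python
import random
from statistics import NormalDist

random.seed(0)
norm = NormalDist()

def aa_test_with_peeking(n_total: int = 1000, peeks: int = 10) -> bool:
    """Simulate one A/A test; return True if ANY interim look is 'significant'."""
    a = [random.gauss(0, 1) for _ in range(n_total)]
    b = [random.gauss(0, 1) for _ in range(n_total)]
    step = n_total // peeks
    for n in range(step, n_total + 1, step):
        diff = sum(b[:n]) / n - sum(a[:n]) / n
        z = diff / (2 / n) ** 0.5            # known unit variance in each group
        p_value = 2 * (1 - norm.cdf(abs(z)))
        if p_value < 0.05:
            return True                       # a peeker would have stopped here
    return False

trials = 1000
false_positive_rate = sum(aa_test_with_peeking() for _ in range(trials)) / trials
# with 10 looks, the any-look false-positive rate is far above the nominal 0.05
```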
What is a guardrail metric?
A guardrail metric monitors that your test does not cause harm. Examples include: revenue per user, page load time, customer support tickets, or unsubscribe rate. If a guardrail fails, do not ship even if the primary metric wins.
How long should I run an A/B test?
Run tests for at least one full business cycle (typically 1-2 weeks) to account for weekday/weekend variations. Always run for the full calculated sample size, not just a fixed calendar duration.
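Combining the sample-size requirement with the full-business-cycle rule gives a minimum duration; a sketch with illustrative traffic numbers:

```python
import math

def min_test_days(sample_per_variant: int, variants: int,
                  daily_traffic: int, business_cycle_days: int = 7) -> int:
    """Days needed to fill the sample, rounded up to whole business cycles."""
    days_for_sample = math.ceil(sample_per_variant * variants / daily_traffic)
    cycles = max(1, math.ceil(days_for_sample / business_cycle_days))
    return cycles * business_cycle_days

# e.g. 58,000 per variant, 2 variants, 10,000 eligible visitors/day:
# 12 days of traffic are needed, rounded up to two full weeks
duration = min_test_days(58_000, 2, 10_000)
```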
What if my test shows inconclusive results?
Inconclusive results mean you did not detect a statistically significant difference. This is valuable learning - either your effect size is smaller than expected (need more traffic) or the change has no effect (consider a bolder change).

Developer details

File structure

📄 SKILL.md