ab-test-setup

Name: ab-test-setup
Author: sickn33

Safe

Set Up Rigorous A/B Tests

Also available from: coreyhaines31

A/B tests often fail due to poor design, premature stopping, and invalid metrics. This skill enforces rigorous methodology with mandatory gates for hypothesis locking, metric definition, and sample size calculation before any test runs.

Compatible with: Claude, Codex, Code (CC)

🥉 74 Bronze

Download the skill ZIP

Upload in Claude

Go to Settings → Capabilities → Skills → Upload skill

Turn it on and start using it

Test it

Using "ab-test-setup". Help me set up an A/B test for our checkout page

Expected result:

Step 1: Hypothesis Lock - Present your final hypothesis including: target audience, primary metric, expected direction of effect, and Minimum Detectable Effect (MDE). Ask: Is this the final hypothesis we are committing to?
Step 2: Assumptions Check - List assumptions about traffic stability, user independence, metric reliability, and randomization quality.
Step 3: Test Type Selection - Choose A/B (default), A/B/n, Multivariate, or Split URL based on your change complexity.
Step 4: Metrics Definition - Define your primary metric (mandatory), secondary metrics for context, and guardrail metrics that must not degrade.

Using "ab-test-setup". Is my hypothesis valid?

Expected result:

Valid hypothesis checklist:
✓ Observation or evidence - Do you have data supporting this?
✓ Single, specific change - Is the change clearly defined?
✓ Directional expectation - Do you expect increase or decrease?
✓ Defined audience - Who is being tested?
✓ Measurable success criteria - What defines success?
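
The checklist above can be sketched as a small data structure, assuming each item is captured as free text. The `Hypothesis` class, its field names, and `missing_items` are hypothetical names chosen for illustration; they are not part of the skill itself.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    # Field names are illustrative assumptions mapped to the checklist items.
    evidence: str          # observation or data motivating the test
    change: str            # the single, specific change being tested
    direction: str         # "increase" or "decrease"
    audience: str          # who is being tested
    success_criteria: str  # what defines success


def missing_items(h: Hypothesis) -> list[str]:
    """Return the checklist items that are empty or invalid."""
    missing = [name for name, value in vars(h).items() if not value.strip()]
    # A non-empty direction must still be one of the two allowed values.
    if h.direction.strip() and h.direction not in ("increase", "decrease"):
        missing.append("direction")
    return missing


# Hypothetical draft with two items still unanswered.
draft = Hypothesis(
    evidence="Drop-off observed at the coupon field",
    change="Move the coupon field below the order summary",
    direction="increase",
    audience="",
    success_criteria="",
)
print(missing_items(draft))  # ['audience', 'success_criteria']
```

A check like this only confirms that every item has an answer; whether the evidence is convincing or the change is truly single and specific still needs human review.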

Security audit

Safe

v1 • 2/24/2026

All 12 static findings are false positives. The scanner detected benign A/B testing terminology (hypothesis, design, metrics, valid, peeking) and misinterpreted it as cryptographic/network security issues. This skill is a legitimate methodology guide for setting up rigorous A/B tests with statistical rigor. No actual security risks identified.

Files scanned

238

Lines analyzed

Findings

Total audits

No security issues found

Audited by: claude

Quality score

Architecture

100

Maintainability

Content

Community

100

Security

Spec compliance

What you can build

Product Manager Validates Test Design

A product manager uses the skill to structure a new feature test, ensuring the hypothesis is specific and the metrics are defined before engineering begins.

Data Scientist Ensures Statistical Rigor

A data scientist applies the methodology to review a proposed experiment, checking sample size calculations and guardrail metrics.

Growth Engineer Plans Conversion Test

A growth engineer uses the skill to structure a landing page optimization test, locking hypothesis and calculating required traffic before launch.

Try these prompts

Basic Test Setup

Help me set up an A/B test. I have a user problem: [describe problem]. I want to test: [describe proposed change]. Guide me through the mandatory setup steps.

Hypothesis Validation

Review my hypothesis for an A/B test: [paste hypothesis]. Does it meet the quality checklist? What is missing or needs improvement?

Sample Size Calculation

Help me calculate sample size. My current conversion rate is [X]%. I want to detect a [Y]% relative lift. Confidence level 95% (α = 0.05), power 80%. What sample size do I need?

Execution Readiness Check

Run an execution readiness check for my A/B test. I have: hypothesis [paste], primary metric [name], sample size [number], duration [days]. What gates am I missing?
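
As a rough illustration of what a readiness check could verify, the sketch below tests the four inputs named in the prompt above. The dictionary keys and the `missing_gates` function are assumptions made for this example; the skill's actual gate list may differ.

```python
def missing_gates(plan: dict) -> list[str]:
    """Return the names of setup gates that the plan has not yet passed."""
    sample_size = plan.get("sample_size")
    duration = plan.get("duration_days")
    # Key names are hypothetical; each gate maps to one input from the prompt.
    gates = {
        "hypothesis_locked": plan.get("hypothesis_locked") is True,
        "primary_metric": bool(plan.get("primary_metric")),
        "sample_size": isinstance(sample_size, int) and sample_size > 0,
        "duration_days": isinstance(duration, int) and duration > 0,
    }
    return [name for name, passed in gates.items() if not passed]


# Hypothetical plan that has no duration yet.
plan = {
    "hypothesis_locked": True,
    "primary_metric": "checkout_conversion_rate",
    "sample_size": 20000,
}
print(missing_gates(plan))  # ['duration_days']
```

An empty list means every gate in this sketch has passed; any other result names what still needs to be decided before launch.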

Best practices

Lock your hypothesis and primary metric BEFORE any implementation work begins
Calculate sample size upfront and ensure you have enough traffic for the test duration
Use guardrail metrics to prevent harmful wins that damage user experience

Avoid

Starting a test without a frozen hypothesis - this leads to moving goalposts
Peeking at results early and stopping tests based on initial significance
Defining multiple primary metrics - this increases false positive risk
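
To see why several primary metrics raise the false positive risk, here is a minimal sketch that computes the chance of at least one false positive, assuming the metrics are statistically independent and each is tested at the same significance level. Real metrics are usually correlated, so treat the figures as an illustration rather than an exact rate; `family_wise_error_rate` is a name chosen for this example.

```python
def family_wise_error_rate(alpha: float, num_metrics: int) -> float:
    """Chance of at least one false positive across independent metrics."""
    return 1 - (1 - alpha) ** num_metrics


for k in (1, 3, 5):
    print(k, round(family_wise_error_rate(0.05, k), 3))
# 1 0.05
# 3 0.143
# 5 0.226
```

With five independent primary metrics at α = 0.05, roughly one test in four would show a spurious "win" even when the change has no effect.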

Frequently asked questions

What is the minimum traffic needed for an A/B test?

It depends on your baseline conversion rate and Minimum Detectable Effect (MDE). For example, detecting a 5% relative lift on a 10% baseline rate (10% to 10.5%) needs approximately 58,000 visitors per variant with a two-sided test at 95% confidence (α = 0.05) and 80% power.
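
One common way to estimate this is the normal-approximation formula for a two-sided test of two proportions. The sketch below assumes that formula and equal traffic per variant; the function name and parameters are illustrative, and a dedicated power-analysis tool may give slightly different numbers.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_variant(baseline, relative_lift, alpha=0.05, power=0.80):
    """Approximate visitors per variant for a two-sided two-proportion z-test."""
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


# 10% baseline conversion rate, 5% relative lift (10% -> 10.5%).
n = sample_size_per_variant(baseline=0.10, relative_lift=0.05)
print(n)
```

By this formula the example comes out a little under 58,000 visitors per variant. Because the required sample grows with the inverse square of the absolute difference, halving the detectable lift roughly quadruples the traffic needed.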

Can I run multiple variants in one test?

Yes, but each additional variant requires more traffic. A/B/n tests need a significantly larger sample than simple A/B tests. Consider whether multiple variants are truly necessary or whether running separate A/B tests one after another is more practical.
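
The traffic cost of extra variants can be sketched under two assumptions that are not stated in the original: each variant is compared only against control, and a Bonferroni correction splits the significance level across those comparisons. The function name and its parameters are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def per_arm_sample_size(p1, p2, num_variants, alpha=0.05, power=0.80):
    """Per-arm sample size with a Bonferroni-adjusted two-sided alpha."""
    comparisons = num_variants  # assumption: each variant vs. control only
    z_alpha = NormalDist().inv_cdf(1 - alpha / comparisons / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


for k in (1, 2, 3):
    n = per_arm_sample_size(0.10, 0.105, num_variants=k)
    total = n * (k + 1)  # k variants plus one control arm
    print(f"{k} variant(s): {n} per arm, {total} in total")
```

Total traffic rises for two reasons at once: there are more arms to fill, and each arm needs more visitors because the corrected threshold is stricter.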

When should I stop an A/B test early?

Rarely. Early stopping based on peeking invalidates statistical guarantees. Only stop early for technical failures, severe guardrail violations, or if you have pre-registered an adaptive design with proper statistical correction.

What is a guardrail metric?

A guardrail metric monitors that your test does not cause harm. Examples include: revenue per user, page load time, customer support tickets, or unsubscribe rate. If a guardrail fails, do not ship even if the primary metric wins.
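
The ship rule described above can be written as a short decision function. This is a sketch: `ship_decision` and the guardrail names in the example are hypothetical, and how each guardrail is judged as passed or failed is left to the analysis that produces the booleans.

```python
def ship_decision(primary_won: bool, guardrails: dict[str, bool]) -> str:
    """Guardrails take priority: any failure blocks shipping."""
    # guardrails maps a metric name to True (held) or False (degraded).
    failed = sorted(name for name, ok in guardrails.items() if not ok)
    if failed:
        return "do not ship: guardrail failed (" + ", ".join(failed) + ")"
    return "ship" if primary_won else "do not ship: primary metric did not win"


print(ship_decision(True, {"page_load_time": False, "unsubscribe_rate": True}))
# do not ship: guardrail failed (page_load_time)
```

Checking guardrails before the primary metric makes the priority explicit: a win on the primary metric is never enough on its own.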

How long should I run an A/B test?

Run tests for at least one full business cycle (typically 1-2 weeks) to account for weekday/weekend variations. Always run for the full calculated sample size, not just a fixed calendar duration.
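
Both conditions can be combined in a small planning helper: compute the days needed to reach the sample size, then round up to whole business cycles. The sketch assumes steady daily traffic split evenly across arms and a 7-day cycle; the function name and the example figures are hypothetical.

```python
from math import ceil


def planned_duration_days(sample_per_variant, num_arms, daily_visitors, cycle_days=7):
    """Days needed to reach the full sample, rounded up to whole cycles."""
    total_needed = sample_per_variant * num_arms
    days_for_sample = ceil(total_needed / daily_visitors)
    return ceil(days_for_sample / cycle_days) * cycle_days


# Hypothetical: 20,000 per variant, control plus one variant, 5,000 visitors/day.
print(planned_duration_days(20000, num_arms=2, daily_visitors=5000))  # 14
```

In this example the sample would be reached after 8 days, but the plan extends to 14 so that every weekday is represented equally.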

What if my test shows inconclusive results?

Inconclusive results mean you did not detect a statistically significant difference. This is valuable learning - either your effect size is smaller than expected (need more traffic) or the change has no effect (consider a bolder change).
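
A confidence interval helps tell these two cases apart, because it shows which effect sizes remain plausible. The sketch below assumes a standard two-sided two-proportion z-test; the function name and the example counts are hypothetical, not data from a real test.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_test(conv_a, n_a, conv_b, n_b, alpha=0.05):
    """Return the p-value and confidence interval for the difference B - A."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    # Pooled standard error for the hypothesis test.
    se_pooled = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Unpooled standard error for the interval around the observed difference.
    se_diff = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    margin = NormalDist().inv_cdf(1 - alpha / 2) * se_diff
    return p_value, (p_b - p_a - margin, p_b - p_a + margin)


# Hypothetical counts: 1,000 of 10,000 convert in A, 1,040 of 10,000 in B.
p_value, (low, high) = two_proportion_test(1000, 10000, 1040, 10000)
print(f"p-value: {p_value:.2f}, interval: [{low:.4f}, {high:.4f}]")
```

Here the interval spans zero, so the result is inconclusive. A wide interval suggests the test was underpowered, while a narrow interval around zero suggests the change has little effect.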

Developer details

Author

sickn33

License

MIT

Repository

https://github.com/sickn33/antigravity-awesome-skills/tree/main/skills/ab-test-setup

Ref

main

File structure

📄 SKILL.md