# Compare Code Implementations with a Judge Rubric

Teams often struggle to apply consistent criteria when choosing between similar code implementations. This skill provides a structured rubric that covers scoring, gate checks, trade-offs, and winner selection.

## Install

```bash
npx skillstore add 2389-research/judge
```

## Metadata

- Status: approved
- Slug: 2389-research-judge
- Version: 1.0.0
- Author: 2389-research
- GitHub username: 2389-research
- License: MIT
- Repository: https://github.com/2389-research/claude-plugins/tree/main/test-kitchen/skills/judge
- Ref: main
- Supported tools: Claude, Codex, Claude Code
- Risk level: low
- Quality score: 79
- Quality tier: bronze
- Public page: https://skillstore.pages.dev/skills/2389-research-judge
- Manifest: https://skillstore.pages.dev/api/skills/2389-research-judge/manifest

## Capabilities

- Scores implementations across fitness, complexity, readability, robustness, and maintainability.
- Creates gate checks for test results and design adherence.
- Applies hard gates for large fitness gaps and critical flaws.
- Produces a scorecard that compares multiple implementation variants.
- Summarizes winner rationale and trade-offs for rejected options.
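
To make the flow above concrete, here is a minimal Python sketch of a scorecard with gate checks and hard gates. It is not the skill's implementation: the 1 to 5 score range, the `FITNESS_GAP_LIMIT` threshold, the choice to let failed gate checks eliminate a candidate, and every name (`Candidate`, `judge`) are assumptions made for illustration. Only the five criteria and the idea of an unweighted integer total come from this page.

```python
from __future__ import annotations

from dataclasses import dataclass, field

CRITERIA = ("fitness", "complexity", "readability", "robustness", "maintainability")

# Assumptions for illustration only; the rubric's real scale and threshold may differ.
MIN_SCORE, MAX_SCORE = 1, 5
FITNESS_GAP_LIMIT = 2


@dataclass
class Candidate:
    name: str
    scores: dict[str, int]      # one integer score per criterion
    tests_pass: bool            # gate check: test results
    follows_design: bool        # gate check: design adherence
    critical_flaws: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject missing, non-integer, or out-of-range scores early.
        for criterion in CRITERIA:
            value = self.scores.get(criterion)
            if type(value) is not int or not MIN_SCORE <= value <= MAX_SCORE:
                raise ValueError(f"{self.name}: invalid {criterion} score {value!r}")

    @property
    def total(self) -> int:
        # Unweighted sum of the five criterion scores.
        return sum(self.scores[c] for c in CRITERIA)


def judge(candidates: list[Candidate]) -> tuple[Candidate | None, dict[str, str]]:
    """Return (winner, rejected), where rejected maps a name to the reason."""
    rejected: dict[str, str] = {}
    survivors: list[Candidate] = []
    for c in candidates:
        if not c.tests_pass:
            rejected[c.name] = "gate check: failing tests"
        elif not c.follows_design:
            rejected[c.name] = "gate check: departs from design"
        elif c.critical_flaws:
            rejected[c.name] = "hard gate: " + "; ".join(c.critical_flaws)
        else:
            survivors.append(c)

    # Hard gate: drop candidates whose fitness trails the best survivor too far.
    best_fitness = max((c.scores["fitness"] for c in survivors), default=0)
    eligible: list[Candidate] = []
    for c in survivors:
        if best_fitness - c.scores["fitness"] >= FITNESS_GAP_LIMIT:
            rejected[c.name] = "hard gate: fitness gap"
        else:
            eligible.append(c)

    # Highest total wins among the remaining candidates.
    winner = max(eligible, key=lambda c: c.total, default=None)
    for c in eligible:
        if c is not winner:
            rejected[c.name] = f"not selected: total {c.total} vs {winner.total}"
    return winner, rejected


impl_1 = Candidate("impl-1", dict(fitness=5, complexity=3, readability=4,
                                  robustness=4, maintainability=4), True, True)
impl_2 = Candidate("impl-2", dict(fitness=2, complexity=5, readability=5,
                                  robustness=5, maintainability=5), True, True)
winner, rejected = judge([impl_1, impl_2])
print(winner.name, winner.total)  # impl-1 20
print(rejected)                   # {'impl-2': 'hard gate: fitness gap'}
```

In this made-up example, impl-2 has the higher total (22 against 20) but is still rejected, because under the assumed threshold its fitness gap triggers a hard gate before totals are compared.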

## Use Cases

- Select the best generated implementation: Compare multiple generated code variants and choose the one that best satisfies the stated requirements.
- Review competing architecture approaches: Score different implementation strategies against maintainability, robustness, and purpose fit before merging work.
- Document selection decisions: Create a concise scorecard that explains why one implementation won and what trade-offs remain.

## Prompt Templates

### Score two implementations

```
Use the judge rubric to compare impl-1 and impl-2. Include gate checks, criterion scores, total scores, winner, and trade-offs.
```

### Evaluate three variants

```
Judge impl-1, impl-2, and impl-3 against the original requirements. Apply hard gates and explain the winning implementation.
```

### Compare different solution approaches

```
Use the judge rubric for variant-a and variant-b. Treat fitness gaps as valid solution differences, then identify the best approach.
```

### Audit a close decision

```
Re-score the candidate implementations with special attention to hidden complexity, future maintenance cost, and robustness under realistic load.
```

## Limitations

- It does not run tests or inspect files unless the host agent provides that context.
- It depends on accurate implementation context and test results from earlier workflow phases.
- It is designed for comparison workflows, not for standalone general code review.
- It uses integer rubric scores and does not support weighted scoring.

## Best Practices

- Provide original requirements, test results, and implementation summaries before invoking the skill.
- Use the same evidence standard for every implementation being compared.
- Record trade-offs even when one implementation clearly wins.
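
One way to enforce the first two practices is to check the evidence before invoking the skill. The sketch below is hypothetical: the skill does not prescribe an input schema, so the field names `test_results` and `summary` and the function `evidence_gaps` are assumptions chosen for illustration, and the sample inputs are invented.

```python
from __future__ import annotations

# Hypothetical field names; the skill does not prescribe an input schema.
PER_IMPLEMENTATION_FIELDS = ("test_results", "summary")


def evidence_gaps(requirements: str,
                  implementations: dict[str, dict[str, str]]) -> list[str]:
    """List what is missing before the comparison can be fair."""
    gaps: list[str] = []
    if not requirements.strip():
        gaps.append("original requirements are missing")
    # Hold every implementation to the same evidence standard.
    for name, evidence in implementations.items():
        for field_name in PER_IMPLEMENTATION_FIELDS:
            if not evidence.get(field_name, "").strip():
                gaps.append(f"{name}: {field_name} is missing")
    return gaps


gaps = evidence_gaps(
    "Parse CSV uploads",  # invented requirement text
    {
        "impl-1": {"test_results": "all tests passed", "summary": "streaming parser"},
        "impl-2": {"summary": "loads the whole file into memory"},
    },
)
print(gaps)  # ['impl-2: test_results is missing']
```

An empty list suggests the inputs are complete enough to compare; anything else is worth fixing before scoring begins.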

## Anti-Patterns

- Do not use it before implementation code and test outcomes are available.
- Do not treat equal total scores as interchangeable without reviewing hard gates.
- Do not skip rubric sections because one implementation appears obviously better.
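
The point about equal totals can be illustrated with a small tie-handling sketch. It assumes hard-gate failures are tracked per candidate as a list of reasons. The function name `resolve` and the result strings are hypothetical and do not come from the skill.

```python
from __future__ import annotations


def resolve(totals: dict[str, int],
            hard_gate_failures: dict[str, list[str]]) -> str:
    """Pick a winner by total, but only among candidates that pass every hard gate."""
    eligible = {name: total for name, total in totals.items()
                if not hard_gate_failures.get(name)}
    if not eligible:
        return "no winner: every candidate fails a hard gate"
    top = max(eligible.values())
    leaders = sorted(name for name, total in eligible.items() if total == top)
    if len(leaders) == 1:
        return f"winner: {leaders[0]}"
    # A genuine tie: totals alone cannot decide, so surface it for review.
    return "tie: " + ", ".join(leaders) + " (compare trade-offs before choosing)"


# Equal totals, but impl-2 has a hard-gate failure, so the two are not interchangeable.
print(resolve({"impl-1": 20, "impl-2": 20},
              {"impl-2": ["critical flaw: unbounded retry loop"]}))
# winner: impl-1
```

When no candidate fails a hard gate and the totals still match, the sketch reports a tie instead of picking arbitrarily, which leaves the decision to a trade-off review.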

## Security Audit

- Safe to publish: true
- Audited at: 2026-06-27T15:39:00.626+00:00
- Summary: Static analysis found several high-risk patterns, but review shows they are false positives from Markdown formatting, rubric text, and placeholder labels. No evidence found for command execution, credential theft, weak cryptography, Windows SAM access, reconnaissance, network use, or prompt injection.

## Stats

- Views: 158
- Downloads: 6
- Favorites: 0
- Popularity score: 0
