Remediation: clawbio_bench external audit findings (80/140 passing) #106

## Context

Sergey Kornilov (Biostochastics, LLC) built [clawbio_bench](https://github.com/biostochastics/clawbio_bench) v0.1.0, an independent audit suite that tests ClawBio skills along three dimensions: safety, correctness, and honesty. Run against commit `1481fb4`, it reported **80/140 tests passing (57.1%)**.

Full remediation plan: [REMEDIATION-PLAN.md](https://github.com/ClawBio/ClawBio/blob/main/REMEDIATION-PLAN.md)

## Scorecard

| Skill | Pass | Fail | Rate | Worst Finding |
|-------|------|------|------|---------------|
| bio-orchestrator | 41 | 13 | 75.9% | stub_silent, routed_wrong |
| equity-scorer | 3 | 12 | 20.0% | fst_mislabeled, heim_unbounded, edge_crash |
| nutrigx-advisor | 8 | 2 | 80.0% | snp_invalid, score_incorrect |
| pharmgx-reporter | 14 | 19 | 42.4% | correct_determinate, disclosure_failure |
| claw-metagenomics | 6 | 1 | 85.7% | exit_suppressed |
| fine-mapping | 4 | 12 | 25.0% | pathology_flagged |
| clinical-variant | 4 | 1 | 80.0% | report_structure_complete |
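The per-skill rates and the 80/140 headline can be re-derived from the pass/fail counts in the table. The snippet below is only a sanity check on the arithmetic; the dictionary layout and the `pass_rate` helper are illustrative and not part of clawbio_bench.

```python
# Pass/fail counts copied from the scorecard table above.
scorecard = {
    "bio-orchestrator": (41, 13),
    "equity-scorer": (3, 12),
    "nutrigx-advisor": (8, 2),
    "pharmgx-reporter": (14, 19),
    "claw-metagenomics": (6, 1),
    "fine-mapping": (4, 12),
    "clinical-variant": (4, 1),
}


def pass_rate(passed, failed):
    """Return the pass rate as a percentage rounded to one decimal place."""
    return round(100.0 * passed / (passed + failed), 1)


# Per-skill rates, keyed by skill name.
rates = {skill: pass_rate(p, f) for skill, (p, f) in scorecard.items()}

# Totals across all seven skills.
total_passed = sum(p for p, _ in scorecard.values())
total_tests = sum(p + f for p, f in scorecard.values())
overall_rate = pass_rate(total_passed, total_tests - total_passed)

print(f"{total_passed}/{total_tests} passing ({overall_rate}%)")
```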

## Fixes shipped

### Equity-scorer (was 3/15, 20%)

| Finding | Fix | Commit |
|---------|-----|--------|
| C-06 `fst_mislabeled` | Renamed all output labels from "Hudson FST" to "Nei's GST". Added Nei 1973 citation. | `f6076f5` |
| U-2/F-27 `heim_unbounded` | Added weight normalization (sum to 1.0), negative weight rejection, zero weight rejection, score clamping to [0, 100]. | `f6076f5` |
| `edge_crash` (9 tests) | Added 8 edge case tests. Core computation functions pass all edge cases. Remaining CLI-layer crashes need separate investigation. | `b8d9d6a` |
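A minimal sketch of the kind of guard the `heim_unbounded` fix describes: normalize weights to sum to 1.0, reject negative weights, and clamp the final score to [0, 100]. The function names are hypothetical and this is not the code in `f6076f5`. "Zero weight rejection" is read here as rejecting weights that sum to zero; the actual fix may instead reject any individual zero weight.

```python
import math


def normalize_weights(weights):
    """Scale non-negative weights so they sum to 1.0 (hypothetical helper)."""
    values = [float(w) for w in weights]
    if not values:
        raise ValueError("at least one weight is required")
    if any(math.isnan(w) or w < 0 for w in values):
        raise ValueError("weights must be non-negative numbers")
    total = sum(values)
    if total == 0:
        # Assumption: "zero weight rejection" means an all-zero weight vector.
        raise ValueError("weights must not all be zero")
    return [w / total for w in values]


def clamp_score(score, low=0.0, high=100.0):
    """Clamp a score into the closed interval [low, high]."""
    return max(low, min(high, score))


def weighted_score(components, weights):
    """Weighted mean of component scores, clamped to [0, 100]."""
    if len(components) != len(weights):
        raise ValueError("components and weights must have the same length")
    normalized = normalize_weights(weights)
    raw = sum(c * w for c, w in zip(components, normalized))
    return clamp_score(raw)
```

With normalization in place, the weights `[3, 1]` and `[0.75, 0.25]` give the same score, and an out-of-range component can no longer push the result past 100.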

### Fine-mapping (was 4/16, 25%)

| Finding | Fix | Commit |
|---------|-----|--------|
| Purity mean vs min | Changed `_purity()` from `np.mean` to `np.min` per Wang et al. 2020 section 3.2. | `c65d84a` |
| PIP formula | Verified correct: uses `1 - prod(1 - alpha)`. No change needed. | `c65d84a` |
| Input validation | Added `ValueError` contracts for `n <= 0`, `w <= 0`, NaN z-scores, `coverage` outside (0,1], `min_purity` outside [0,1]. SE <= 0 warning in sumstats loader. | `c65d84a` |
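The three rows above can be illustrated with a short NumPy sketch. It is not the code in `c65d84a`: the function names, signatures, and default values are assumptions, and the meaning of `w` is not stated in this issue, so it is validated only as a positive number. Purity is taken as the minimum absolute pairwise correlation within a credible set, and the PIP as `1 - prod(1 - alpha)` across single-effect components.

```python
import numpy as np


def purity(corr, cs_indices):
    """Minimum absolute pairwise correlation among variants in a credible set."""
    idx = np.asarray(cs_indices, dtype=int)
    if idx.size < 2:
        return 1.0  # a single-variant set is trivially pure
    sub = np.abs(np.asarray(corr, dtype=float)[np.ix_(idx, idx)])
    off_diagonal = sub[~np.eye(idx.size, dtype=bool)]
    return float(np.min(off_diagonal))  # np.mean here would overstate purity


def pip(alpha):
    """Per-variant PIP from an (L, p) matrix of single-effect probabilities."""
    alpha = np.asarray(alpha, dtype=float)
    return 1.0 - np.prod(1.0 - alpha, axis=0)


def validate_inputs(z, n, w, coverage=0.95, min_purity=0.5):
    """Raise ValueError on the invalid inputs listed in the table (defaults assumed)."""
    z = np.asarray(z, dtype=float)
    if n <= 0:
        raise ValueError("n must be positive")
    if w <= 0:
        raise ValueError("w must be positive")
    if np.isnan(z).any():
        raise ValueError("z-scores must not contain NaN")
    if not (0 < coverage <= 1):
        raise ValueError("coverage must be in (0, 1]")
    if not (0 <= min_purity <= 1):
        raise ValueError("min_purity must be in [0, 1]")
    return z
```

For a set whose pairwise absolute correlations are 0.9, 0.5, and 0.2, the mean is about 0.53 while the minimum is 0.2, so a mean-based purity can pass a set containing a weakly correlated pair that a min-based threshold of 0.5 would filter out.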

### Test counts after fixes

- equity-scorer: 36/36 passing
- fine-mapping: 76/76 passing

## Remaining tasks

- [ ] SuSiE null component (P0, architectural, needs work against susieR reference)
- [ ] PharmGx CPIC gaps, 19 failures (P1, need per-test breakdown from bench)
- [ ] Bio-orchestrator stub_silent + routing collision (P1)
- [ ] NutriGx empty input + allele mismatch (P1)
- [ ] Metagenomics exit suppression (P2)

## CI integration

A `scientific-audit` job has been added to CI. It runs `clawbio-bench --smoke` after the unit tests pass, and verdicts are uploaded as artifacts with 30-day retention.

## Benchmark repo

https://github.com/biostochastics/clawbio_bench is an independent external standard maintained by Sergey Kornilov. We are requesting contributor access to help extend coverage.

cc @camlloyd
