Remediation: clawbio_bench external audit findings (80/140 passing) #106

@manuelcorpas

Context

Sergey Kornilov (Biostochastics, LLC) built clawbio_bench v0.1.0, an independent audit suite that tests ClawBio skills along three dimensions: safety, correctness, and honesty. Run against commit 1481fb4, it reported 80 of 140 tests passing (57.1%).

Full remediation plan: REMEDIATION-PLAN.md

Scorecard

| Skill | Pass | Fail | Rate | Worst Finding |
|---|---|---|---|---|
| bio-orchestrator | 41 | 13 | 75.9% | stub_silent, routed_wrong |
| equity-scorer | 3 | 12 | 20.0% | fst_mislabeled, heim_unbounded, edge_crash |
| nutrigx-advisor | 8 | 2 | 80.0% | snp_invalid, score_incorrect |
| pharmgx-reporter | 14 | 19 | 42.4% | correct_determinate, disclosure_failure |
| claw-metagenomics | 6 | 1 | 85.7% | exit_suppressed |
| fine-mapping | 4 | 12 | 25.0% | pathology_flagged |
| clinical-variant | 4 | 1 | 80.0% | report_structure_complete |

Fixes shipped

Equity-scorer (was 3/15, 20%)

| Finding | Fix | Commit |
|---|---|---|
| C-06 fst_mislabeled | Renamed all output labels from "Hudson FST" to "Nei's GST". Added Nei 1973 citation. | f6076f5 |
| U-2/F-27 heim_unbounded | Added weight normalization (sum to 1.0), negative weight rejection, zero weight rejection, score clamping to [0, 100]. | f6076f5 |
| edge_crash (9 tests) | Added 8 edge case tests. Core computation functions pass all edge cases. Remaining CLI-layer crashes need separate investigation. | b8d9d6a |
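
For reviewers, the heim_unbounded fix can be illustrated with a minimal sketch. This is not the code from f6076f5: the function names `normalize_weights` and `heim_score`, the dictionary inputs, and the assumption that component values are on a 0 to 100 scale are all hypothetical. It also reads "zero weight rejection" as rejecting any individual zero weight, which is one possible interpretation.

```python
import math


def normalize_weights(weights):
    """Validate weights and rescale them so they sum to 1.0.

    Hypothetical helper: rejects empty, non-finite, negative, and zero weights.
    """
    if not weights:
        raise ValueError("at least one weight is required")
    for name, w in weights.items():
        if not math.isfinite(w):
            raise ValueError(f"weight {name!r} is not finite: {w}")
        if w < 0:
            raise ValueError(f"weight {name!r} is negative: {w}")
        if w == 0:
            raise ValueError(f"weight {name!r} is zero")
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def heim_score(components, weights):
    """Weighted sum of component values, clamped to [0, 100].

    Assumes (hypothetically) that each component is nominally on a 0-100 scale.
    """
    norm = normalize_weights(weights)
    raw = sum(norm[name] * components[name] for name in norm)
    # Clamp so that out-of-range components cannot push the score out of bounds.
    return min(100.0, max(0.0, raw))
```

With normalization in place, passing weights `{"a": 1, "b": 3}` is equivalent to passing `{"a": 0.25, "b": 0.75}`, and the clamp guarantees a bounded score even when a component value is out of range.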

Fine-mapping (was 4/16, 25%)

| Finding | Fix | Commit |
|---|---|---|
| Purity mean vs min | Changed _purity() from np.mean to np.min per Wang et al. 2020 section 3.2. | c65d84a |
| PIP formula | Verified correct: uses 1 - prod(1 - alpha). No change needed. | c65d84a |
| Input validation | Added ValueError contracts for n <= 0, w <= 0, NaN z-scores, coverage outside (0,1], min_purity outside [0,1]. SE <= 0 warning in sumstats loader. | c65d84a |
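
The three fine-mapping rows can be sketched as follows. This is an illustration under stated assumptions, not the code from c65d84a: the function names, the argument layout (an LD correlation matrix plus credible-set indices, and an `(L, p)` matrix of per-component inclusion probabilities `alpha`), and the error messages are hypothetical. The meaning of `w` is not stated in this issue, so the sketch only enforces the `w > 0` contract.

```python
import numpy as np


def purity(ld, cs_idx):
    """Minimum absolute pairwise correlation within a credible set."""
    ld = np.asarray(ld, dtype=float)
    idx = np.asarray(cs_idx, dtype=int)
    if idx.size == 0:
        raise ValueError("credible set is empty")
    if idx.size == 1:
        return 1.0  # a single variant is trivially pure
    sub = np.abs(ld[np.ix_(idx, idx)])
    upper = np.triu_indices(idx.size, k=1)  # off-diagonal pairs only
    return float(np.min(sub[upper]))  # min, not mean


def pip(alpha):
    """PIP_j = 1 - prod_l (1 - alpha[l, j]) for an (L, p) alpha matrix."""
    alpha = np.asarray(alpha, dtype=float)
    return 1.0 - np.prod(1.0 - alpha, axis=0)


def validate_inputs(n, w, z, coverage, min_purity):
    """Raise ValueError for the contract violations listed in the table."""
    if n <= 0:
        raise ValueError("n must be > 0")
    if w <= 0:
        raise ValueError("w must be > 0")
    if np.isnan(np.asarray(z, dtype=float)).any():
        raise ValueError("z-scores contain NaN")
    if not (0 < coverage <= 1):
        raise ValueError("coverage must be in (0, 1]")
    if not (0 <= min_purity <= 1):
        raise ValueError("min_purity must be in [0, 1]")
```

The difference between the two purity definitions matters when one pair of variants is weakly correlated: for pairwise absolute correlations 0.9, 0.5, and 0.2, the mean is about 0.53 while the minimum is 0.2, so a mean-based check can accept a set that a min-based threshold would reject.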

Test counts after fixes

  • equity-scorer: 36/36 passing
  • fine-mapping: 76/76 passing

Remaining tasks

  • SuSiE null component (P0, architectural, needs work against susieR reference)
  • PharmGx CPIC gaps, 19 failures (P1, need per-test breakdown from bench)
  • Bio-orchestrator stub_silent + routing collision (P1)
  • NutriGx empty input + allele mismatch (P1)
  • Metagenomics exit suppression (P2)

CI integration

A scientific-audit job has been added to CI; it runs clawbio-bench --smoke after the unit tests pass. Verdicts are uploaded as artifacts with 30-day retention.

Benchmark repo

https://github.com/biostochastics/clawbio_bench: an independent external standard maintained by Sergey Kornilov. We are requesting contributor access to help extend coverage.

cc @camlloyd