Status: Framework defined — awaiting clinical partner onboarding and prospective data collection.
This document defines the clinical validation programme for Trij's AI-assisted triage assessments. It serves as the protocol reference for ongoing validation activities.
- Measure the diagnostic accuracy of Trij's AI assessments against a clinician-adjudicated ground truth
- Compute per-condition sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV)
- Ensure equitable performance across demographic subgroups (Fitzpatrick skin tone, age, sex, image quality)
- Establish inter-rater agreement between AI and clinician reviewers
| Condition Category | Minimum Labelled Cases | Source |
|---|---|---|
| Dermatological (wounds, rashes, infections) | 1,000 | Prospective CHW field collection |
| Respiratory (pneumonia, bronchitis, COPD) | 1,000 | Prospective CHW field collection |
| Fever / Infectious (malaria, typhoid, UTI) | 1,000 | Prospective CHW field collection |
| Gastrointestinal (diarrhoea, dehydration) | 1,000 | Prospective CHW field collection |
| Neurological (stroke, meningitis, neuropathy) | 500 | Retrospective dataset + field collection |
| Nutritional (SAM, MAM, obesity) | 500 | Prospective CHW field collection |
| Eye / Ear (conjunctivitis, otitis media) | 500 | Prospective CHW field collection |
| Musculoskeletal (fracture, sprain) | 500 | Prospective CHW field collection |
| Total | 6,000 |
Each condition category must be stratified across:
| Stratification Variable | Levels |
|---|---|
| Fitzpatrick skin tone | I–VI (minimum 15% per range I–III and IV–VI) |
| Age group | 0–4, 5–17, 18–59, 60+ (minimum 10% each) |
| Sex | Male, Female, Other (minimum 25% each for Male/Female) |
| Image quality | Good, Acceptable, Poor (as rated by validateImageQuality()) |
| Device class | Budget (<$150), Mid-range ($150–$400), Premium (>$400) |
For each validation case, ground truth is established as follows:
- Clinician panel review: Each case reviewed independently by ≥2 licensed physicians
- Adjudication: Where panelists disagree, a third senior clinician adjudicates
- Reference standard: Final diagnosis is the majority/adjudicated clinician opinion, supported by:
- Laboratory results where available (culture, PCR, rapid diagnostic test)
- Radiological imaging reports where available
- Histopathology results where applicable
- Clinical response to treatment at follow-up
| Metric | Definition | Target |
|---|---|---|
| Sensitivity | TP / (TP + FN) | ≥85% |
| Specificity | TN / (TN + FP) | ≥90% |
| Positive Predictive Value (PPV) | TP / (TP + FP) | ≥80% |
| Negative Predictive Value (NPV) | TN / (TN + FN) | ≥92% |
| AUC-ROC | Area under receiver operating characteristic curve | ≥0.90 |
| Metric | Definition | Target |
|---|---|---|
| Urgency classification accuracy | % where AI urgency (RED/YELLOW/GREEN) matches clinician | ≥90% |
| Agreement with top-3 differential | % where true diagnosis is in AI top 3 | ≥95% |
| Mean confidence calibration error | ≤0.10 |
- Cohen's Kappa (κ) computed between AI and each clinician reviewer
- Target: κ ≥ 0.80 (substantial agreement)
Assessment data is collected through normal CHW usage. When a validation study is active:
- CHW completes a triage assessment in the app
- With patient consent, the case (de-identified images + structured findings) is queued for review
- Cases are batched and sent to the clinician review panel
- Clinicians review via a web-based review dashboard and record their diagnosis, urgency, and confidence
The validation pipeline runs as part of the CI/CD process:
flowchart LR
A[CHW Assessment] --> B[De-identify Case]
B --> C[Queue for Review]
C --> D[Clinician Panel Review]
D --> E[Adjudication]
E --> F[Compute Metrics]
F --> G[Update CLINICAL_VALIDATION.md]
G --> H[Publish Report]
Before any model update is deployed to production:
- Run validation on the held-out test set (20% of total labelled data)
- Verify metrics meet or exceed the current deployed model
- If metrics regress, flag for investigation — do not deploy
- Publish comparative results as a PR comment on the model update PR
No results published yet — validation is in the setup phase.
| Condition | Sensitivity | Specificity | PPV | NPV | AUC | n | Last Updated |
|---|---|---|---|---|---|---|---|
| TBD | — | — | — | — | — | 0 | — |
Results will be published here as they become available. See validation protocol above for methodology.
| Activity | Status | Lead | Target Completion |
|---|---|---|---|
| Define validation protocol | ✅ Complete | — | — |
| Recruit clinician reviewers (≥2 licensed physicians) | 🔴 Pending | — | Q3 2026 |
| Partner with clinical institutions | 🔴 Pending | — | Q3 2026 |
| Begin prospective data collection | 🔴 Pending | — | Q3 2026 |
| Interim analysis (n=500 per category) | 🔴 Pending | — | Q4 2026 |
| Full analysis (n=6,000) | 🔴 Pending | — | Q2 2027 |
| Publish validation results | 🔴 Pending | — | Q2 2027 |
Trij supports 7 languages. Each non-English language pack undergoes medical terminology review per the localisation review methodology.
| Language | Locale | Medical Review Status | Rating |
|---|---|---|---|
| English (source) | en-US | Source language | Certified |
| Spanish | es-ES | Pending | Draft |
| French | fr-FR | Pending | Draft |
| Swahili | sw-KE | Pending | Draft |
| Hindi | hi-IN | Pending | Draft |
| Portuguese (BR) | pt-BR | Pending | Draft |
| Arabic | ar-SA | Pending | Draft |
Ratings:
- Certified — All clinical strings reviewed and accurate
- Conditional — Critical strings reviewed, suitable for pilot
- Draft — Not yet reviewed, available for testing only
See docs/localisation-review/ for detailed findings per language.
- The labelled validation dataset is stored in a separate, access-controlled repository
- All clinician reviewers will sign confidentiality and data handling agreements
- The validation protocol is reviewed by an ethics committee / IRB where applicable
- Results are published as a scientific poster or manuscript upon completion
- STARD 2015 guidelines for diagnostic accuracy studies
- WHO Ethical Standards for Research Involving Human Subjects
- FDA Guidance on Clinical Decision Support Software
- CONSORT-AI extension for AI clinical trials