
nshkr-crucible

Here are 20 public repositories matching this topic...

Model evaluation harness for standardized benchmarking—comprehensive metrics (F1, BLEU, ROUGE, METEOR, BERTScore, pass@k), statistical analysis (confidence intervals, effect size, bootstrap CI, ANOVA), multi-model comparison, and report generation. Research-grade evaluation for LLM and ML experiments.

  • Updated Dec 6, 2025
  • Elixir

Metrics aggregation and alerting for ML experiments—multi-backend export (Prometheus, InfluxDB, Datadog, OpenTelemetry), advanced aggregations (percentiles, histograms, moving averages), threshold-based alerting with anomaly detection (z-score, IQR), and time-series storage. Research-grade observability for the NSAI ecosystem.

  • Updated Dec 6, 2025
  • Elixir

Dataset management library for ML experiments—loaders for SciFact, FEVER, GSM8K, HumanEval, MMLU, TruthfulQA, HellaSwag; git-like versioning with lineage tracking; transformation pipelines; quality validation with schema checks and duplicate detection; GenStage streaming for large datasets. Built for reproducible AI research.

  • Updated Dec 6, 2025
  • Elixir
