The LLM Evaluation Framework
Updated May 6, 2025 - Python
A one-stop repository for large language model (LLM) unlearning. Supports TOFU and MUSE, and provides an easily extensible framework for new datasets, evaluations, methods, and benchmarks.
LangFair is a Python library for conducting use-case-level LLM bias and fairness assessments.
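LangFair's own API is not shown here; the sketch below only illustrates the general shape of a use-case-level counterfactual fairness check. The names `generate` and `score_response` are hypothetical placeholders for your model call and whatever scalar metric (sentiment, toxicity, helpfulness) you care about.

```python
# Minimal sketch of a counterfactual-fairness check (generic illustration,
# not LangFair's actual API): send prompt pairs that differ only in a
# demographic attribute, score each response, and report the average gap.

from statistics import mean


def generate(prompt: str) -> str:
    # Placeholder: call your LLM here (e.g., an OpenAI or local-model client).
    raise NotImplementedError


def score_response(text: str) -> float:
    # Placeholder: any scalar metric, e.g., sentiment or helpfulness.
    raise NotImplementedError


def counterfactual_gap(prompt_pairs: list[tuple[str, str]]) -> float:
    """Average score difference across paired prompts (group A vs. group B)."""
    gaps = [
        score_response(generate(prompt_a)) - score_response(generate(prompt_b))
        for prompt_a, prompt_b in prompt_pairs
    ]
    return mean(gaps)


pairs = [
    ("Write a reference letter for a male software engineer.",
     "Write a reference letter for a female software engineer."),
]
# counterfactual_gap(pairs) near 0.0 suggests similar treatment of both groups.
```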
[ACL'24] A Knowledge-grounded Interactive Evaluation Framework for Large Language Models
Create an evaluation framework for your LLM-based app. Incorporate it into your test suite. Lay the monitoring foundation.
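As a rough illustration of "incorporate it into your test suite", the pytest sketch below runs a small set of golden cases against the app on every test run. `generate_answer` is a hypothetical placeholder for your application's generation call, and the substring check stands in for whichever metric you actually adopt.

```python
# Minimal sketch of wiring an LLM evaluation into a pytest suite (assumed
# names and cases; adapt the checks to your own app and metrics).

import pytest

GOLDEN_CASES = [
    {"question": "What is the capital of France?", "must_contain": "Paris"},
    {"question": "Who wrote Hamlet?", "must_contain": "Shakespeare"},
]


def generate_answer(question: str) -> str:
    # Placeholder: call your LLM-based app here.
    raise NotImplementedError


@pytest.mark.parametrize("case", GOLDEN_CASES)
def test_answer_contains_expected_fact(case):
    answer = generate_answer(case["question"])
    assert case["must_contain"].lower() in answer.lower()
```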
A set of auxiliary systems that provide an estimated confidence measure for the outputs generated by large language models.
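One common way to estimate confidence without access to model internals is self-consistency: sample several answers at nonzero temperature and treat the agreement rate of the majority answer as a confidence proxy. The sketch below is a generic illustration of that idea, not this repository's specific method; `sample_answer` is a hypothetical placeholder for your sampling call.

```python
# Self-consistency confidence sketch (generic illustration, assumed names).

from collections import Counter


def sample_answer(prompt: str) -> str:
    # Placeholder: one sampled completion from your LLM (temperature > 0).
    raise NotImplementedError


def self_consistency_confidence(prompt: str, n_samples: int = 10) -> tuple[str, float]:
    """Return the majority answer and the fraction of samples that agree with it."""
    answers = [sample_answer(prompt).strip().lower() for _ in range(n_samples)]
    answer, count = Counter(answers).most_common(1)[0]
    return answer, count / n_samples
```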
This repo contains a Streamlit application that provides a user-friendly interface for evaluating large language models (LLMs) using the beyondllm package.
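A minimal Streamlit sketch of that kind of interface is shown below. Only the Streamlit plumbing is real; the beyondllm evaluation call is stubbed out as the hypothetical `run_evaluation`, since its exact API is not reproduced here.

```python
# Minimal Streamlit sketch of an LLM evaluation interface.
# Run with: streamlit run app.py

import streamlit as st


def run_evaluation(prompt: str, response: str) -> dict:
    # Placeholder: compute metrics (e.g., relevance, groundedness) here,
    # for example via the beyondllm package.
    return {"relevance": None, "groundedness": None}


st.title("LLM Evaluation Dashboard")
prompt = st.text_area("Prompt")
response = st.text_area("Model response")

if st.button("Evaluate") and prompt and response:
    scores = run_evaluation(prompt, response)
    st.write(scores)
```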