- Updated Sep 21, 2023
- Jupyter Notebook
llm-evaluation
Here are 333 public repositories matching this topic...
This repository contains the lab work for the Coursera course "Generative AI with Large Language Models".
- Updated Dec 1, 2023
- Jupyter Notebook
Exploring the depths of LLMs 🚀
- Updated Dec 7, 2023
- Jupyter Notebook
Template for an AI application that extracts job information from a job description using OpenAI functions and LangChain
- Updated Dec 21, 2023
- Python
Mind Your Format: Towards Consistent Evaluation of In-Context Learning Improvements
- Updated Jan 18, 2024
- Jupyter Notebook
[EMNLP 2023] Beyond Factuality: A Comprehensive Evaluation of Large Language Models as Knowledge Generators
- Updated Jan 22, 2024
- Python
A simple GPT-based evaluation tool for multi-aspect, interpretable assessment of LLMs.
- Updated Jan 29, 2024
- Python
- Updated Feb 4, 2024
- Jupyter Notebook
Calibration Game is a game for getting better at identifying hallucinations in LLMs.
- Updated Feb 4, 2024
- CSS
Code and data for Koo et al.'s ACL 2024 paper "Benchmarking Cognitive Biases in Large Language Models as Evaluators"
- Updated Feb 16, 2024
- Jupyter Notebook
Upload, score, and visually compare multiple LLM-graded summaries simultaneously!
- Updated Mar 8, 2024
- JavaScript
Evaluating LLMs with CommonGen-Lite
- Updated Mar 21, 2024
- Python
Visualize LLM Evaluations for OpenAI Assistants
- Updated Mar 27, 2024
- TypeScript
A familiarization and learning project that uses the LangChain framework, LangSmith for tracing, OpenAI LLMs, and the Pinecone serverless vector database, written in Python with Jupyter Notebook.
- Updated Mar 29, 2024
- Jupyter Notebook
A comprehensive resource hub compiling all LLM papers accepted at the International Conference on Learning Representations (ICLR) 2024.
- Updated Apr 4, 2024
[Personalize@EACL 2024] LLM Agents in Interaction: Measuring Personality Consistency and Linguistic Alignment in Interacting Populations of Large Language Models.
- Updated Apr 8, 2024
- Python
LLM Security Platform Docs
- Updated Apr 9, 2024
- MDX
Evaluate LLMs on reasoning and RAG tasks using custom functions and datasets with LangChain
- Updated Apr 21, 2024
- Jupyter Notebook
FactScoreLite is an implementation of the FactScore metric, designed for detailed accuracy assessment in text generation. This package builds upon the framework provided by the original FactScore repository, which is no longer maintained and contains outdated functions.
- Updated Apr 25, 2024
- Python