llm-evaluation
Here are 333 public repositories matching this topic...
This repository contains the lab work for the Coursera course "Generative AI with Large Language Models".
Updated Dec 1, 2023 - Jupyter Notebook
Exploring the depths of LLMs 🚀
Updated Dec 7, 2023 - Jupyter Notebook
Template for an AI application that extracts job information from a job description using OpenAI functions and LangChain
Updated Dec 21, 2023 - Python
Mind Your Format: Towards Consistent Evaluation of In-Context Learning Improvements
Updated Jan 18, 2024 - Jupyter Notebook
[EMNLP 2023] Beyond Factuality: A Comprehensive Evaluation of Large Language Models as Knowledge Generators
Updated Jan 22, 2024 - Python
A simple GPT-based evaluation tool for multi-aspect, interpretable assessment of LLMs.
Updated Jan 29, 2024 - Python
Calibration Game is a game for getting better at identifying hallucinations in LLMs.
Updated Feb 4, 2024 - CSS
Code and data for Koo et al.'s ACL 2024 paper "Benchmarking Cognitive Biases in Large Language Models as Evaluators"
Updated Feb 16, 2024 - Jupyter Notebook
Upload, score, and visually compare multiple LLM-graded summaries simultaneously!
Updated Mar 8, 2024 - JavaScript
Evaluating LLMs with CommonGen-Lite
Updated Mar 21, 2024 - Python
Visualize LLM Evaluations for OpenAI Assistants
Updated Mar 27, 2024 - TypeScript
For familiarization and learning purposes: uses the LangChain framework, LangSmith for tracing, OpenAI LLM models, and a Pinecone serverless vector DB, in Jupyter Notebook and Python.
Updated Mar 29, 2024 - Jupyter Notebook
A comprehensive resource hub compiling all LLM papers accepted at the International Conference on Learning Representations (ICLR) in 2024.
Updated Apr 4, 2024
[Personalize@EACL 2024] LLM Agents in Interaction: Measuring Personality Consistency and Linguistic Alignment in Interacting Populations of Large Language Models.
Updated Apr 8, 2024 - Python
LLM Security Platform Docs
Updated Apr 9, 2024 - MDX
Evaluate LLMs and RAG pipelines using custom reasoning functions and datasets with LangChain
Updated Apr 21, 2024 - Jupyter Notebook
FactScoreLite is an implementation of the FactScore metric, designed for detailed accuracy assessment in text generation. This package builds upon the framework provided by the original FactScore repository, which is no longer maintained and contains outdated functions.
Updated Apr 25, 2024 - Python
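As a rough illustration of the idea behind FactScore-style evaluation (this is a hedged sketch, not the FactScoreLite API): a generation is decomposed into atomic facts, each fact is judged as supported or unsupported against a knowledge source, and the score is the fraction of supported facts, averaged over generations. The `factscore` function and its input format below are hypothetical.

```python
def factscore(judged_generations):
    """Compute a FactScore-style average precision of atomic facts.

    judged_generations: list of generations, where each generation is a
    list of booleans (True = the atomic fact was judged supported).
    """
    per_generation = [
        sum(facts) / len(facts)          # fraction of supported facts
        for facts in judged_generations
        if facts                          # skip generations with no extracted facts
    ]
    return sum(per_generation) / len(per_generation) if per_generation else 0.0

# Example: two generations, with 2/2 and 1/2 facts supported -> 0.75
score = factscore([[True, True], [True, False]])
```

In the real metric, the decomposition into atomic facts and the supported/unsupported judgments are themselves produced by models, which is the part these packages implement.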