Langtrace 🔍 is an open-source, OpenTelemetry-based end-to-end observability tool for LLM applications, providing real-time tracing, evaluations and metrics for popular LLMs, LLM frameworks, vector…

TypeScript 1,084 102 Updated Nov 17, 2025

BrainBlend-AI / atomic-agents

Building AI agents, atomically

Python 5,412 445 Updated Dec 24, 2025

RQLabsAI / SyntheticGenAgent

Generate an endless and diverse dataset using an agentic system!

Python 1 Updated Jun 13, 2025

Shubhamsaboo / awesome-llm-apps

Collection of awesome LLM apps with AI Agents and RAG using OpenAI, Anthropic, Gemini and open-source models.

Python 84,414 12,001 Updated Dec 19, 2025

goodmike31 / blog-agent-evals

1 Updated Jun 21, 2025

BerriAI / litellm

Python SDK, Proxy Server (AI Gateway) to call 100+ LLM APIs in OpenAI (or native) format, with cost tracking, guardrails, load balancing and logging. [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthr…

Python 32,984 5,146 Updated Dec 27, 2025

google / adk-samples

A collection of sample agents built with Agent Development Kit (ADK)

Python 7,847 2,099 Updated Dec 24, 2025

deepgram-devs / voice-agent-medical-assistant-demo

A Medical / Clinical Note Taking Demo Application using Deepgram Voice Agent API

TypeScript 10 12 Updated Jul 9, 2025

mlfoundations / dclm

DataComp for Language Models

HTML 1,402 129 Updated Sep 9, 2025

truera / trulens

Evaluation and Tracking for LLM Experiments and AI Agents

Python 2,997 238 Updated Dec 24, 2025

Arize-ai / phoenix

AI Observability & Evaluation

Jupyter Notebook 8,047 661 Updated Dec 27, 2025

goodmike31 / pl-asr-bigos-tools

Extensible toolkit for comprehensive evaluation of ASR systems. Currently supports benchmarking 29 system-model combinations for Polish using BIGOS datasets.

Python 11 2 Updated Mar 2, 2025

quotient-ai / judges

A small library of LLM judges

Python 310 31 Updated Jul 31, 2025

DataManagementLab / wikidbs-public

Python 9 Updated Jan 23, 2025

Michał Junczyk goodmike31

Lists (2)

🔮 Future ideas

✨ Inspiration

Stars

axon-rl / gem

TrustJudge / TrustJudge

webdataset / webdataset

OFA-Sys / AIR-Bench

AudioLLMs / AudioBench

NVIDIA / NeMo-speech-data-processor

TencentARC / AudioStory

Accenture / mcp-bench

GaiZhenbiao / ChuanhuChatGPT

dequelabs / axe-core

jestjs / jest

microsoft / playwright

Cloud-CV / EvalAI

haizelabs / verdict

llm-as-a-judge / Awesome-LLM-as-a-judge

ai-evals-course / recipe-chatbot

Scale3-Labs / langtrace