Langtrace 🔍 is an open-source, OpenTelemetry-based end-to-end observability tool for LLM applications, providing real-time tracing, evaluations, and metrics for popular LLMs, LLM frameworks, vector…

TypeScript 1,171 119 Updated Nov 17, 2025

BrainBlend-AI / atomic-agents

Building AI agents, atomically

Python 5,525 456 Updated Jan 3, 2026

RQLabsAI / SyntheticGenAgent

Generate an infinite and diversified dataset using an agent system!

Python 1 Updated Jun 13, 2025

Shubhamsaboo / awesome-llm-apps

Collection of awesome LLM apps with AI Agents and RAG using OpenAI, Anthropic, Gemini and open-source models.

Python 89,548 12,912 Updated Jan 24, 2026

goodmike31 / blog-agent-evals

1 Updated Jun 21, 2025

BerriAI / litellm

Python SDK, Proxy Server (AI Gateway) to call 100+ LLM APIs in OpenAI (or native) format, with cost tracking, guardrails, load balancing and logging. [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthr…

Python 34,511 5,475 Updated Jan 26, 2026

google / adk-samples

A collection of sample agents built with Agent Development Kit (ADK)

Python 8,225 2,205 Updated Jan 26, 2026

deepgram-devs / voice-agent-medical-assistant-demo

A Medical / Clinical Note Taking Demo Application using Deepgram Voice Agent API

TypeScript 12 13 Updated Jul 9, 2025

mlfoundations / dclm

DataComp for Language Models

HTML 1,409 129 Updated Sep 9, 2025

truera / trulens

Evaluation and Tracking for LLM Experiments and AI Agents

Python 3,057 246 Updated Jan 24, 2026

Arize-ai / phoenix

AI Observability & Evaluation

Jupyter Notebook 8,373 695 Updated Jan 26, 2026

goodmike31 / pl-asr-bigos-tools

Extendable toolkit for comprehensive evaluation of ASR systems. Currently supports benchmarking 29 system-model combinations for Polish using BIGOS datasets.

Python 11 2 Updated Mar 2, 2025

Michał Junczyk goodmike31

Lists (2)

🔮 Future ideas

✨ Inspiration

Stars

pyjanitor-devs / pyjanitor

anthropics / prompt-eng-interactive-tutorial

axon-rl / gem

TrustJudge / TrustJudge

webdataset / webdataset

OFA-Sys / AIR-Bench

AudioLLMs / AudioBench

NVIDIA / NeMo-speech-data-processor

TencentARC / AudioStory

Accenture / mcp-bench

GaiZhenbiao / ChuanhuChatGPT

dequelabs / axe-core

jestjs / jest

microsoft / playwright

Cloud-CV / EvalAI

haizelabs / verdict

llm-as-a-judge / Awesome-LLM-as-a-judge

ai-evals-course / recipe-chatbot

Scale3-Labs / langtrace