A Gym for Agentic LLMs

Python 410 27 Updated Dec 23, 2025

TrustJudge is a probabilistic evaluation framework that reduces score-comparison and pairwise transitivity inconsistencies in LLM-as-a-judge systems.

Python 38 1 Updated Sep 27, 2025

A high-performance Python-based I/O system for large (and small) deep learning problems, with strong support for PyTorch.

Python 2,940 227 Updated Jun 19, 2025

AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension

Python 124 6 Updated Dec 9, 2024

AudioBench: A Universal Benchmark for Audio Large Language Models

Python 289 14 Updated Jun 17, 2025

A toolkit for processing speech data and creating speech datasets

Python 194 38 Updated Sep 29, 2025

AudioStory: Generating Long-Form Narrative Audio with Large Language Models

Jupyter Notebook 292 18 Updated Sep 21, 2025

MCP-Bench: Benchmarking Tool-Using LLM Agents with Complex Real-World Tasks via MCP Servers

Python 414 46 Updated Oct 7, 2025

GUI for ChatGPT API and many LLMs. Supports agents, file-based QA, GPT finetuning and query with web search. All with a neat UI.

Python 15,414 2,267 Updated Aug 15, 2025

Accessibility engine for automated Web UI testing

JavaScript 6,776 849 Updated Dec 23, 2025

Delightful JavaScript Testing.

TypeScript 45,226 6,623 Updated Dec 12, 2025

Playwright is a framework for Web Testing and Automation. It allows testing Chromium, Firefox and WebKit with a single API.

TypeScript 80,558 4,943 Updated Dec 27, 2025

☁️ 🚀 📊 📈 Evaluating state of the art in AI

Python 1,970 931 Updated Dec 16, 2025

Inference-time scaling for LLMs-as-a-judge.

Jupyter Notebook 319 24 Updated Nov 5, 2025

Langtrace 🔍 is an open-source, Open Telemetry based end-to-end observability tool for LLM applications, providing real-time tracing, evaluations and metrics for popular LLMs, LLM frameworks, vector…

TypeScript 1,084 102 Updated Nov 17, 2025

Building AI agents, atomically

Python 5,412 445 Updated Dec 24, 2025

Generate an endless, diverse dataset using an agentic system!

Python 1 Updated Jun 13, 2025

Collection of awesome LLM apps with AI Agents and RAG using OpenAI, Anthropic, Gemini and opensource models.

Python 84,414 12,001 Updated Dec 19, 2025

Python SDK, Proxy Server (AI Gateway) to call 100+ LLM APIs in OpenAI (or native) format, with cost tracking, guardrails, load balancing and logging. [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthr…

Python 32,984 5,146 Updated Dec 27, 2025

A collection of sample agents built with Agent Development Kit (ADK)

Python 7,847 2,099 Updated Dec 24, 2025

A Medical / Clinical Note Taking Demo Application using Deepgram Voice Agent API

TypeScript 10 12 Updated Jul 9, 2025

DataComp for Language Models

HTML 1,402 129 Updated Sep 9, 2025

Evaluation and Tracking for LLM Experiments and AI Agents

Python 2,997 238 Updated Dec 24, 2025

AI Observability & Evaluation

Jupyter Notebook 8,047 661 Updated Dec 27, 2025

Extendable toolkit for comprehensive evaluation of ASR systems. Currently supports benchmarking 29 system-model combinations for Polish using BIGOS datasets.

Python 11 2 Updated Mar 2, 2025

A small library of LLM judges

Python 310 31 Updated Jul 31, 2025
Python 9 Updated Jan 23, 2025