A library for making RepE control vectors

Jupyter Notebook 689 56 Updated Sep 24, 2025

The official repo for the Dialz Python library - a toolkit for steering vector research.

Jupyter Notebook 22 2 Updated Jul 9, 2025

Delphi was the home of a temple to Phoebus Apollo, which famously had the inscription, 'Know Thyself.' This library lets language models know themselves through automated interpretability.

Python 240 55 Updated Feb 16, 2026

[ICLR 2025] This is the official implementation for the paper: "Large Language Models Meet Symbolic Provers for Logical Reasoning Evaluation"

Python 42 6 Updated Jun 11, 2025

Large language model and dataset for natural language to first-order logic translation

Jupyter Notebook 73 6 Updated Oct 25, 2023

Awesome things about LLM-powered agents. Papers / Repos / Blogs / ...

2,200 188 Updated Apr 30, 2025

This repo was created for the bachelor's project at VU - AI.

Python 1 Updated Jun 30, 2025

ACL2023 - AlignScore, a metric for factual consistency evaluation.

Python 152 30 Updated Mar 11, 2024
Jupyter Notebook 5 2 Updated Apr 14, 2025

Prefix-Tuning: Optimizing Continuous Prompts for Generation

Python 959 165 Updated Apr 26, 2024

Implementation of EMU for KG link prediction

Python 2 Updated May 5, 2025

Repository for "Scaling Evaluation-time Compute with Reasoning Models as Process Evaluators"

12 Updated Mar 25, 2025

This is the repository of our EMNLP 2024 Main conference paper "Revealing Personality Traits: A New Benchmark Dataset for Explainable Personality Recognition on Dialogues".

Python 9 4 Updated Dec 5, 2024

Official repository for ACL 2025 paper "ProcessBench: Identifying Process Errors in Mathematical Reasoning"

Python 184 17 Updated May 20, 2025

Apache Wayang is the first cross-platform data processing system.

Java 256 113 Updated Feb 20, 2026

Simple language-driven navigation tasks for studying compositional learning

204 27 Updated Nov 5, 2020

From Chain-of-Thought prompting to OpenAI o1 and DeepSeek-R1 🍓

3,539 198 Updated May 7, 2025

Codebase for "DAVE: Diagnostic benchmark for Audio Visual Evaluation" (NeurIPS 2025 Datasets & Benchmarks)

Python 4 1 Updated Dec 4, 2025
Jupyter Notebook 14 1 Updated Jan 15, 2026

Systematic evaluation framework that automatically rates overthinking behavior in large language models.

Shell 96 14 Updated May 16, 2025

Toolkit for linearizing PDFs for LLM datasets/training

Python 16,924 1,347 Updated Feb 19, 2026

Parsers for clinical trials data from clinicaltrials.gov

Python 8 Updated Jun 28, 2023

Fully open reproduction of DeepSeek-R1

Python 25,893 2,412 Updated Nov 24, 2025

A blueprint for AI development, focusing on applied examples of RAG, information extraction, analysis and fine-tuning in the age of LLMs and agents.

Jupyter Notebook 63 5 Updated Feb 6, 2025

A course on aligning smol models.

Jupyter Notebook 6,583 2,296 Updated Feb 6, 2026

Benchmarking Benchmark Leakage in Large Language Models

JavaScript 59 3 Updated May 20, 2024
Python 57 11 Updated May 10, 2021