View yangyangyang127's full-sized avatar

A summary of existing representative LLM text datasets.

1,426 140 Updated Oct 11, 2025

Build Your Own Local AI-Powered NSFC Proposal Writing Assistant

Python 45 12 Updated Dec 25, 2025
Python 21 2 Updated Mar 20, 2025

Henbio Pathogen Detection Toolkit

R 2 Updated Jul 9, 2025

Material Safety Data Sheets - Operator Procedures Prediction (aka MSDS-OPP)

Python 9 Updated Dec 27, 2020

Therapeutics Commons (TDC): Multimodal Foundation for Therapeutic Science

Jupyter Notebook 1,198 205 Updated Jul 13, 2025

A curated collection of papers, datasets, and resources on Scientific Datasets and Large Language Models (LLMs)

432 32 Updated Oct 3, 2025

Benchmark for evaluating capabilities of AI models to understand biological lab protocols

Python 6 1 Updated Apr 7, 2025

A topic-centric list of HQ open datasets.

72,434 11,114 Updated Jan 22, 2026

Measuring correlations between safety benchmarks and general AI capabilities benchmarks.

Python 11 3 Updated Oct 2, 2024
Python 5 2 Updated Aug 3, 2025

No fortress, purely open ground. OpenManus is Coming.

Python 53,579 9,416 Updated Jan 5, 2026

A programming framework for agentic AI

Python 53,922 8,140 Updated Jan 22, 2026
Python 5 Updated Jun 9, 2025

The official GitHub repository of the paper "Recent advances in large language model benchmarks against data contamination: From static to dynamic evaluation"

400 40 Updated Sep 13, 2025

Easy Data Preparation with the latest LLM-based Operators and Pipelines.

Python 2,692 169 Updated Jan 23, 2026

Large datasets for conversational AI

Python 1,381 176 Updated Nov 16, 2019

A collection of Tieba APIs ✨ usable for toolboxes, forum moderation, and data collection

Python 565 82 Updated Jan 20, 2026

Covid-19 Twitter dataset for non-commercial research use and pre-processing scripts - under active development

Jupyter Notebook 480 189 Updated Apr 17, 2023

Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

Python 66,485 8,089 Updated Jan 25, 2026

Align Anything: Training All-modality Model with Feedback

Python 4,625 509 Updated Nov 27, 2025
Python 20 1 Updated Jun 16, 2025

[ICLR Workshop 2025] Official source code for the paper "GuardReasoner: Towards Reasoning-based LLM Safeguards".

Python 164 18 Updated May 19, 2025

[S&P 2026] SoK: Evaluating Jailbreak Guardrails for Large Language Models

Python 35 4 Updated Dec 17, 2025

A Python library for guardrail models evaluation.

Python 30 6 Updated Oct 9, 2025

ShieldLM: Empowering LLMs as Aligned, Customizable and Explainable Safety Detectors [EMNLP 2024 Findings]

Python 221 10 Updated Sep 29, 2024

The RedPajama-Data repository contains code for preparing large datasets for training large language models.

Python 4,923 369 Updated Dec 7, 2024

[ICML 2025] Official repository for the paper "OR-Bench: An Over-Refusal Benchmark for Large Language Models"

Python 21 2 Updated Mar 4, 2025

Data processing for and with foundation models! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷

Python 5,780 318 Updated Jan 26, 2026