A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files

Python 9,745 1,536 Updated Jan 15, 2026

[ACL2025] STATE ToxiCN: A Benchmark for Span-level Target-Aware Toxicity Extraction in Chinese Hate Speech Detection

Jupyter Notebook 42 3 Updated Oct 25, 2025
Python 1 Updated Jul 1, 2025

Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.

Python 68,176 9,658 Updated Jan 16, 2026

Official repository for ICML 2024 paper "On Prompt-Driven Safeguarding for Large Language Models"

Python 103 8 Updated May 20, 2025

An LLM can Fool Itself: A Prompt-Based Adversarial Attack (ICLR 2024)

Python 109 18 Updated Jan 21, 2025

Collection of corpus data and lexicons: Chinese and English stop words, sentiment analysis, classification dictionaries (thesaurus), and sensitive-word lists (banned words, censored words).

33 6 Updated May 12, 2025

An RWKV management and startup tool: fully automated, only 8MB, and provides an interface compatible with the OpenAI API. RWKV is a large language model that is fully open source and available for c…

TypeScript 6,171 588 Updated Jan 11, 2026

⚡️ 80x faster Fasttext language detection out of the box | Split text by language

Python 283 9 Updated Sep 17, 2025

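A collection of industry practice articles on search, recommendation, advertising, user growth, and related topics (sources: Zhihu, Datafuntalk, technical WeChat official accounts)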

HTML 4,181 464 Updated Jan 17, 2026

The third application topic: Multimodal Large Language Model security

Jupyter Notebook 1 Updated Jan 14, 2026

Official repo for GPTFUZZER : Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts

Python 563 78 Updated Sep 24, 2024

Prompt Jailbreak Manual

3,082 315 Updated Dec 17, 2024

Mental health large language models (LLM x Mental Health): pre- and post-training, datasets, evaluation, deployment, and RAG, with InternLM / Qwen / Baichuan / DeepSeek / Mixtral / Llama / GLM series models

Python 1,669 213 Updated Aug 19, 2025

This repository mainly collects interview questions for NLP algorithm engineers

Jupyter Notebook 1 1 Updated Sep 17, 2022

The official Meta Llama 3 GitHub site

Python 29,174 3,504 Updated Jan 26, 2025

Python implementation of an N-gram language model with Laplace smoothing and sentence generation.

Python 87 27 Updated Feb 9, 2018

Question and Answer based on Anything.

Python 13,816 1,332 Updated Mar 24, 2025

Hanzi Decomposition Library allows Chinese characters to be broken down into radicals and components, which can be used as character shape features in machine l…

Python 407 59 Updated Dec 29, 2025

[EMNLP2025] "LightRAG: Simple and Fast Retrieval-Augmented Generation"

Python 27,377 3,894 Updated Jan 16, 2026

Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities.

Python 1,856 207 Updated Jan 16, 2025

Kolors Team

Python 4,591 353 Updated Nov 13, 2024

Baidu Baike dataset of 5 million entries

44 25 Updated Dec 1, 2023

Large-scale, Informative, and Diverse Multi-round Chat Data (and Models)

Python 2,763 137 Updated Mar 13, 2024

Yet Another Chinese Spelling Check Dataset (YACSC)

Python 19 3 Updated Oct 25, 2023

High-performance text tokenizer library

C 32 3 Updated Feb 2, 2024

Focal loss for multiple class classification

Python 83 17 Updated Oct 27, 2020

A PyTorch Implementation of Focal Loss.

Python 992 219 Updated Oct 16, 2019

pke_zh, Python keyphrase extraction for Chinese (zh). A tool for extracting Chinese keywords or key sentences; implements KeyBert, PositionRank, TopicRank, TextRank, and other algorithms, and works out of the box.

Python 215 33 Updated Mar 27, 2024