Stars
A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files
[ACL2025] STATE ToxiCN: A Benchmark for Span-level Target-Aware Toxicity Extraction in Chinese Hate Speech Detection
Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.
Official repository for ICML 2024 paper "On Prompt-Driven Safeguarding for Large Language Models"
An LLM can Fool Itself: A Prompt-Based Adversarial Attack (ICLR 2024)
A collection of corpus data and word lists: Chinese and English stop words, sentiment analysis lexicons, classification dictionaries (thesaurus), and sensitive-word lists (banned and censored words).
An RWKV management and startup tool: fully automated, only 8MB, with an interface compatible with the OpenAI API. RWKV is a large language model that is fully open source and available for c…
⚡️ 80x faster Fasttext language detection out of the box | Split text by language
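A collection of industry practice articles on search, recommendation, advertising, and user growth (sources: Zhihu, Datafuntalk, technical WeChat official accounts)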
The third application topic: Multimodal Large Language Model security
Official repo for GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts
Mental health large language models (LLM x Mental Health): pre- and post-training, dataset, evaluation, deployment, and RAG, with InternLM / Qwen / Baichuan / DeepSeek / Mixtral / LLama / GLM series models
This repository mainly records interview questions for NLP algorithm engineers
Python implementation of an N-gram language model with Laplace smoothing and sentence generation.
Question and Answer based on Anything.
Hanzi Decomposition Library: breaks Chinese characters down into radicals and components, which can be used as character shape features in machine l…
[EMNLP2025] "LightRAG: Simple and Fast Retrieval-Augmented Generation"
Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities.
Large-scale, Informative, and Diverse Multi-round Chat Data (and Models)
Yet Another Chinese Spelling Check Dataset (YACSC)
Focal loss for multiple class classification
A PyTorch Implementation of Focal Loss.
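pke_zh, Python keyphrase extraction for Chinese (zh). A tool for extracting Chinese keywords or key sentences that implements KeyBert, PositionRank, TopicRank, TextRank, and other algorithms, ready to use out of the box.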