- Seika-cho, Kyoto, Japan
- https://sites.google.com/view/shigashiyama
Stars
- Tools for evaluating the performance of MT metrics on data from recent WMT metrics shared tasks.
- BLEURT is a metric for Natural Language Generation based on transfer learning.
- A set of Python scripts for preprocessing the Wikidata JSON dump and running simple queries in an efficient manner.
- Transcription texts created on Minna de Honkoku (https://honkoku.org), a crowdsourced transcription platform for historical Japanese documents.
- This repository includes scripts for MQM error analysis, along with annotation results for English-to-Chinese translation.
- 🔥 Rankify: A Comprehensive Python Toolkit for Retrieval, Re-Ranking, and Retrieval-Augmented Generation 🔥. Our toolkit integrates 40 pre-retrieved benchmark datasets and supports 7+ retrieval techniques.
- 100+ Fine-tuning Tutorial Notebooks on Google Colab, Kaggle and more.
- Official inference framework for 1-bit LLMs
- Materials and source code for the NLP2025 tutorial "A Practical Introduction to Geographic Information and Language Processing"
- InstructDoc: A Dataset for Zero-Shot Generalization of Visual Document Understanding with Instructions (AAAI2024)
- NLP2024 Tutorial 3, "Learning by Building a Japanese Large Language Model": environment setup instructions and experimental source code
- CLI for loading Wikidata subsets (or all of it) into Elasticsearch
- ReFinED is an efficient and accurate entity linking (EL) system.
- Multiple NVIDIA GPUs or Apple Silicon for Large Language Model Inference?
- SikuBERT: a pre-trained language model for the Siku Quanshu (Complete Library in Four Sections)
- MultiLexNorm 2021 competition system from ÚFAL
- Evaluate your speech-to-text system with similarity measures such as word error rate (WER)
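
As a note on the last entry: word error rate is the word-level edit distance between a reference transcript and a hypothesis, normalized by the number of reference words. A minimal sketch in plain Python (the function name and example strings are illustrative, not taken from the repository itself):

```python
# Minimal WER sketch: word-level Levenshtein distance divided by the
# number of words in the reference (WER = (S + D + I) / N).

def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:  # avoid division by zero on an empty reference
        return float(bool(hyp))
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all remaining reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all remaining hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution or match
            )
    return dp[len(ref)][len(hyp)] / len(ref)

print(wer("the cat sat on the mat", "the cat sat on mat"))  # 1 deletion / 6 words ≈ 0.167
```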