Extract and build a translation dictionary for terminology across different .po files #1105
base: 3.13
…onaries Co-authored-by: josix <[email protected]>
I don’t think we need the create_focused_dictionary.py and extract_terminology.py scripts for the extraction. We can rely on @copilot’s LLM knowledge to identify the terms directly and update the CSV files accordingly, without any additional scripts.
Co-authored-by: josix <[email protected]>
I've removed the extract_terminology.py and create_focused_dictionary.py scripts as requested. The terminology dictionaries (terminology_dictionary.csv and focused_terminology_dictionary.csv) are now maintained via LLM knowledge rather than automated extraction scripts. The documentation has been updated to reflect this approach. Changes in commit f722995.
Re-generate the CSV files according to the new consolidation approach in TERMINOLOGY_DICTIONARY.md.
… approach Co-authored-by: josix <[email protected]>
Regenerated both CSV files according to the consolidation approach. The dictionaries now contain proper Python terminology with accurate Chinese translations (196 total terms, 118 focused terms), organized by category and priority. Changes in commit 2400cb0.
This PR implements a comprehensive terminology extraction system to help maintain consistent translations across the Python documentation project.
Overview
The implementation provides tools to extract key terms and their translations from all .po files in the repository, creating reference dictionaries that translators can use to ensure consistency.
Key Features
Tools Added
.scripts/extract_terminology.py
Main extraction script that:
.scripts/create_focused_dictionary.py
Curation script that:
Generated Dictionaries
terminology_dictionary.csv
Complete dictionary with columns: source_term, translated_term, frequency, files_count, source_file, directory, example_files
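The extraction step behind terminology_dictionary.csv can be sketched with the standard library alone. This is a minimal sketch, not the PR's extract_terminology.py: the tiny PO parser ignores msgctxt, plural forms and fuzzy flags; treating msgids of at most three words as candidate terms (MAX_TERM_WORDS) is a hypothetical heuristic; and reading source_file/directory as the first file a term appears in, and example_files as a semicolon-separated list of up to three files, is an assumption about what those columns hold. The names parse_po, build_dictionary and to_csv are illustrative.

```python
import csv
import io
import re
from collections import defaultdict
from pathlib import PurePosixPath

MAX_TERM_WORDS = 3  # hypothetical: msgids this short are treated as candidate terms
EXAMPLE_LIMIT = 3   # hypothetical: number of files listed in example_files
FIELDS = ["source_term", "translated_term", "frequency", "files_count",
          "source_file", "directory", "example_files"]
_ESCAPE = re.compile(r"\\(.)")


def unquote(literal: str) -> str:
    """Strip the quotes from a PO string literal and undo C-style escapes."""
    body = literal.strip()[1:-1]
    return _ESCAPE.sub(lambda m: {"n": "\n", "t": "\t"}.get(m.group(1), m.group(1)), body)


def parse_po(text: str) -> list[tuple[str, str]]:
    """Return translated (msgid, msgstr) pairs; ignores msgctxt, plurals and flags."""
    pairs = []
    msgid = msgstr = current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("msgid "):
            if msgid is not None and msgstr is not None:
                pairs.append((msgid, msgstr))
            msgid, msgstr, current = unquote(line[6:]), None, "msgid"
        elif line.startswith("msgstr ") and msgid is not None:
            msgstr, current = unquote(line[7:]), "msgstr"
        elif line.startswith('"') and current == "msgid":
            msgid += unquote(line)  # continuation line of a multi-line msgid
        elif line.startswith('"') and current == "msgstr":
            msgstr += unquote(line)  # continuation line of a multi-line msgstr
        else:
            current = None  # comments, blank lines, msgid_plural, msgstr[n], ...
    if msgid is not None and msgstr is not None:
        pairs.append((msgid, msgstr))
    # Drop the header entry (empty msgid) and untranslated entries (empty msgstr).
    return [(i, s) for i, s in pairs if i and s]


def build_dictionary(po_files: dict[str, str]) -> list[dict]:
    """Aggregate short msgid/msgstr pairs across files, keyed on the pair itself."""
    stats = defaultdict(lambda: {"frequency": 0, "files": []})
    for path, text in sorted(po_files.items()):
        for msgid, msgstr in parse_po(text):
            if len(msgid.split()) > MAX_TERM_WORDS:
                continue  # long msgids are sentences, not terms
            entry = stats[(msgid, msgstr)]
            entry["frequency"] += 1
            if path not in entry["files"]:
                entry["files"].append(path)
    rows = []
    for (source, target), entry in stats.items():
        first = PurePosixPath(entry["files"][0])
        rows.append({
            "source_term": source,
            "translated_term": target,
            "frequency": entry["frequency"],
            "files_count": len(entry["files"]),
            "source_file": first.name,
            "directory": str(first.parent),
            "example_files": ";".join(entry["files"][:EXAMPLE_LIMIT]),
        })
    rows.sort(key=lambda r: (-r["frequency"], r["source_term"]))
    return rows


def to_csv(rows: list[dict]) -> str:
    """Serialize dictionary rows with the column order listed in the PR."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    # Made-up sample input: path -> .po file contents.
    sample = {
        "library/functions.po": 'msgid ""\nmsgstr ""\n\nmsgid "iterator"\nmsgstr "疊代器"\n',
        "glossary.po": 'msgid "iterator"\nmsgstr "疊代器"\n\nmsgid "generator"\nmsgstr "產生器"\n',
    }
    print(to_csv(build_dictionary(sample)))
```

Because rows are keyed on the (msgid, msgstr) pair, a term translated two different ways shows up as two rows, which makes inconsistent translations easy to spot. To run it over a checkout, the input mapping could be built with `{str(p): p.read_text(encoding="utf-8") for p in Path(".").rglob("*.po")}` using `pathlib.Path`.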
focused_terminology_dictionary.csv
Curated dictionary with additional columns: priority, category
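Deriving the focused dictionary then amounts to filtering the complete one and tagging each row with a priority and a category. In the sketch below, the frequency thresholds in priority_for, the CATEGORIES keyword map and the min_frequency cut-off are hypothetical placeholders, since the PR does not spell out how priority and category were assigned; the sample rows are made up.

```python
import csv
import io

# Hypothetical category lookup; the real categories used in the PR are not documented here.
CATEGORIES = {
    "class": "Object-oriented", "method": "Object-oriented", "attribute": "Object-oriented",
    "iterator": "Data types", "generator": "Data types", "list": "Data types",
    "exception": "Exceptions",
}
PRIORITY_ORDER = {"high": 0, "medium": 1, "low": 2}


def priority_for(frequency: int, files_count: int) -> str:
    """Hypothetical thresholds: terms used often and in many files rank highest."""
    if frequency >= 50 and files_count >= 10:
        return "high"
    if frequency >= 10:
        return "medium"
    return "low"


def focus(complete_csv: str, min_frequency: int = 5) -> list[dict]:
    """Keep frequent rows from the complete dictionary and add priority/category columns."""
    rows = []
    for row in csv.DictReader(io.StringIO(complete_csv)):
        frequency = int(row["frequency"])
        files_count = int(row["files_count"])
        if frequency < min_frequency:
            continue  # rare terms are left out of the focused dictionary
        rows.append({**row,
                     "priority": priority_for(frequency, files_count),
                     "category": CATEGORIES.get(row["source_term"].lower(), "General")})
    # Sort by priority first, then group by category within each priority.
    rows.sort(key=lambda r: (PRIORITY_ORDER[r["priority"]], r["category"], r["source_term"]))
    return rows


def write_focused(rows: list[dict]) -> str:
    """Serialize focused rows; priority and category end up as the last two columns."""
    if not rows:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    complete = (
        "source_term,translated_term,frequency,files_count\n"
        "iterator,疊代器,120,40\n"
        "exception,例外,30,12\n"
        "walrus,海象,2,1\n"
    )
    print(write_focused(focus(complete)))
```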
Example high-priority terms:
Documentation
TERMINOLOGY_DICTIONARY.md: Comprehensive documentation covering usage, integration, and technical details
.scripts/README.md: Integration with existing translation tools
Benefits for Translators
Usage
The tools can be re-run as translations are updated to maintain current terminology references.
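Once the dictionaries exist, one way translators might use them is an automated consistency check: for every translated entry whose msgid mentions a dictionary term, confirm that the msgstr contains the expected translation. This is an illustrative sketch, not a tool from this PR; load_dictionary and find_inconsistencies are hypothetical names, and the entries are assumed to be already parsed into (msgid, msgstr) pairs, for example from the entries returned by polib's `pofile()`.

```python
import csv
import io
import re
from typing import Iterable


def load_dictionary(csv_text: str) -> dict[str, str]:
    """Map source_term -> translated_term from a dictionary CSV."""
    return {row["source_term"]: row["translated_term"]
            for row in csv.DictReader(io.StringIO(csv_text))}


def find_inconsistencies(entries: Iterable[tuple[str, str]],
                         terms: dict[str, str]) -> list[tuple[str, str, str]]:
    """Return (term, expected_translation, msgid) for translated entries whose msgid
    mentions the term but whose msgstr does not contain the expected translation."""
    # Whole-word, case-insensitive match on the English source text.
    patterns = {term: re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE) for term in terms}
    problems = []
    for msgid, msgstr in entries:
        if not msgstr:
            continue  # untranslated entries are not checked
        for term, expected in terms.items():
            if patterns[term].search(msgid) and expected not in msgstr:
                problems.append((term, expected, msgid))
    return problems


if __name__ == "__main__":
    dictionary = load_dictionary("source_term,translated_term\niterator,疊代器\ngenerator,產生器\n")
    entries = [
        ("Return an iterator.", "回傳一個疊代器。"),
        ("Create a generator.", "建立一個生成器。"),  # uses a different word for "generator"
        ("An untranslated iterator entry.", ""),
    ]
    for term, expected, msgid in find_inconsistencies(entries, dictionary):
        print(f"{msgid!r}: expected {expected!r} for {term!r}")
```

Matching uses a whole-word, case-insensitive search, so plural forms such as "iterators" are not matched; catching those would need a stemming step or an explicit list of variants per term.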
Fixes #1104.