


@sspanak (Owner) commented Oct 16, 2025

No description provided.

@sspanak sspanak self-assigned this Oct 16, 2025
@sspanak sspanak added the "languages" (Dictionary or language related issues) and "technical" (Refactoring without user-facing or functional changes) labels Oct 16, 2025
@sspanak sspanak force-pushed the enhanced-transcribed-language-scripts branch from 4b7f2be to 82e2122 October 19, 2025 10:17
@sspanak sspanak marked this pull request as ready for review October 19, 2025 10:17
@sspanak sspanak requested a review from Copilot October 19, 2025 10:17

Copilot AI left a comment


Pull Request Overview

This PR enhances support for transcribed languages and updates frequency handling across scripts and documentation.

  • Add YAML-based layout parsing and grouping for transcribed entries in normalize-transcribed.py
  • Add --prefer-higher flag and improved transcribed handling in inject-dictionary-frequencies.js
  • Increase max frequency from 999 to 9999 and update padding to 4 digits
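The padding change matters because frequency prefixes are compared as strings: once values can reach four digits, an unpadded 1000 sorts before 42. A minimal Python sketch, assuming a key made of the frequency followed directly by the word (the actual key layout in app/build-dictionaries.gradle is not shown here); `sort_key` and `PAD_WIDTH` are hypothetical names:

```python
# Hypothetical key format: zero-padded frequency followed by the word.
MAX_WORD_FREQUENCY = 9999  # new maximum from this PR
PAD_WIDTH = len(str(MAX_WORD_FREQUENCY))  # 4 digits


def sort_key(frequency: int, word: str) -> str:
    """Zero-pad the frequency so string order matches numeric order."""
    return f"{frequency:0{PAD_WIDTH}d}{word}"


entries = [(999, "alpha"), (1000, "beta"), (42, "gamma")]

# Without padding, keys compare character by character: "1000..." < "42..." < "999...".
unpadded = sorted(f"{freq}{word}" for freq, word in entries)
# With padding: "0042gamma" < "0999alpha" < "1000beta", i.e. numeric order.
padded = sorted(sort_key(freq, word) for freq, word in entries)

print(unpadded)  # ['1000beta', '42gamma', '999alpha']
print(padded)    # ['0042gamma', '0999alpha', '1000beta']
```

The pad width is derived from the maximum, so a future change to MAX_WORD_FREQUENCY would keep the two values in step.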

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.

Summary per file:

  • scripts/normalize-transcribed.py: Adds YAML layout parsing, groups by layout index pattern, and switches from 'chinese' to 'native' keys.
  • scripts/inject-dictionary-frequencies.js: Adds --prefer-higher flag, adjusts parsing for transcribed inputs, and changes keying for frequency lookups.
  • app/constants.gradle: Bumps MAX_WORD_FREQUENCY to 9999 to match new constraints.
  • app/build-dictionaries.gradle: Pads frequency prefixes to 4 digits for sorting consistency with new max.
  • CONTRIBUTING.md: Updates documented valid frequency range to 0–9999.
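A minimal sketch of a range check against the documented 0 to 9999 bounds; `parse_frequency` is a hypothetical helper, not a function from the PR, and the scripts may validate frequencies differently (or not at all):

```python
MAX_WORD_FREQUENCY = 9999  # documented upper bound after this PR


def parse_frequency(raw: str) -> int:
    """Parse a frequency field and reject values outside 0..MAX_WORD_FREQUENCY.

    Hypothetical helper for illustration only.
    """
    value = int(raw)  # raises ValueError for non-numeric input
    if not 0 <= value <= MAX_WORD_FREQUENCY:
        raise ValueError(f"frequency {value} is outside 0-{MAX_WORD_FREQUENCY}")
    return value


print(parse_frequency("1234"))  # 1234, above the old maximum of 999
```

With the old maximum of 999, an input such as 1234 would have been out of range; under the new bound it is accepted, while 10000 is still rejected.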


@sspanak sspanak force-pushed the enhanced-transcribed-language-scripts branch from 82e2122 to 79e3dd7 October 19, 2025 10:32
@sspanak sspanak requested a review from Copilot October 19, 2025 10:33

Copilot AI left a comment


Pull Request Overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.



matched = False
for symbol in sorted_symbols:
    if latin.startswith(symbol, i):
        index_seq.append(str(layout_dict[symbol]))

Copilot AI Oct 19, 2025


Concatenating numeric group indices without a delimiter can create ambiguous keys once indexes reach two digits (e.g., [1,10] vs [11,0] both become '110'). Use a non-ambiguous key, e.g., store integers and use a tuple.

Copilot uses AI. Check for mistakes.
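To illustrate the collision and the tuple-based fix, here is a hypothetical Python sketch. It assumes the matcher tries longer symbols first and advances past each match; `index_sequence` and the sample `layout` mapping are illustrative, not taken from normalize-transcribed.py:

```python
from __future__ import annotations

from collections import defaultdict
from typing import Optional


def index_sequence(latin: str, layout_dict: dict[str, int]) -> Optional[tuple[int, ...]]:
    """Greedy longest-match of `latin` against layout symbols.

    Returns the group indices as a tuple of ints, or None if some part of
    the input matches no symbol. Hypothetical sketch, not the PR's code.
    """
    # Assumption: longer symbols are tried first so e.g. "sh" beats "s".
    sorted_symbols = sorted(layout_dict, key=len, reverse=True)
    seq: list[int] = []
    i = 0
    while i < len(latin):
        for symbol in sorted_symbols:
            if latin.startswith(symbol, i):
                seq.append(layout_dict[symbol])
                i += len(symbol)
                break
        else:
            return None  # no symbol matched at position i
    return tuple(seq)


# Sample layout (illustrative): symbol -> group index.
layout = {"a": 1, "b": 10, "c": 11, "d": 0}

k1 = index_sequence("ab", layout)  # (1, 10)
k2 = index_sequence("cd", layout)  # (11, 0)

# String keys collide: both become "110".
joined1 = "".join(map(str, k1))
joined2 = "".join(map(str, k2))

# Tuple keys stay distinct, so grouping keeps the two words apart.
groups = defaultdict(list)
for word in ("ab", "cd"):
    groups[index_sequence(word, layout)].append(word)

print(joined1 == joined2)  # True
print(dict(groups))        # {(1, 10): ['ab'], (11, 0): ['cd']}
```

Joining the indices as strings would put "ab" and "cd" in the same group; keying on tuples of integers keeps them apart, which is what the comment recommends.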
with open(yaml_path, encoding='utf-8') as f:
    data = yaml.safe_load(f)

if "layout" not in data or not isinstance(data["layout"], list):

Copilot AI Oct 19, 2025


If the YAML file is empty, data will be None, and 'layout' in data will raise a TypeError; a top-level scalar such as a number fails the same way. Guard for dict first: if not isinstance(data, dict) or 'layout' not in data or not isinstance(data['layout'], list):

Suggested change
- if "layout" not in data or not isinstance(data["layout"], list):
+ if not isinstance(data, dict) or "layout" not in data or not isinstance(data["layout"], list):
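A self-contained sketch of the suggested guard, operating on the value `yaml.safe_load()` would return so that PyYAML is not needed to run it. `extract_layout` is a hypothetical name, and raising `ValueError` is an assumption; the body of the original `if` branch is not shown:

```python
from typing import Any


def extract_layout(data: Any) -> list:
    """Return data["layout"] if `data` is a mapping holding a list there.

    `data` stands for the result of yaml.safe_load(); an empty file yields None.
    Hypothetical helper; raising ValueError is an assumed error policy.
    """
    # isinstance(data, dict) is checked first, so None and scalars are
    # rejected here instead of raising TypeError on the `in` test.
    if not isinstance(data, dict) or "layout" not in data or not isinstance(data["layout"], list):
        raise ValueError("expected a top-level 'layout' list")
    return data["layout"]


# Without the guard, an empty YAML file fails on the membership test itself:
try:
    "layout" in None
except TypeError as exc:
    print("unguarded:", exc)

print(extract_layout({"layout": [["a", "b"], ["c"]]}))  # [['a', 'b'], ['c']]
```

Because `or` short-circuits, the `in` test and the `data["layout"]` lookup only run once `data` is known to be a dict.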

const parts = line.split(DELIMITER);
const word = parts[0].toLocaleLowerCase(locale);
let frequency = parts.length > 1 ? Number.parseInt(parts[1]) : 0;
const wordId = transcribed && parts.length >= 2 ? `${parts[0]}${parts[1]}` : parts[0].toLocaleLowerCase(locale);

Copilot AI Oct 19, 2025


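The composite key concatenates the first two columns without a separator, which can collide (e.g., 'a'+'bc' and 'ab'+'c' both give 'abc'). Join the columns with DELIMITER instead: since the fields come from line.split(DELIMITER), none of them can contain it, so the resulting key is unambiguous.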
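The collision and the suggested fix in a short Python sketch. The tab value for `DELIMITER` is an assumption (the review does not show the real value); the point holds for any delimiter, because fields produced by `line.split(DELIMITER)` cannot contain it, so a DELIMITER-joined key always splits back into the original fields. `concat_key` and `delimited_key` are hypothetical names:

```python
# Assumption: DELIMITER is a tab; the review does not show its actual value.
DELIMITER = "\t"


def concat_key(first: str, second: str) -> str:
    """Key pattern flagged in the review: fields joined with no separator."""
    return f"{first}{second}"


def delimited_key(first: str, second: str) -> str:
    """Suggested fix: join with the same DELIMITER used to split the line."""
    return f"{first}{DELIMITER}{second}"


line_a = f"a{DELIMITER}bc"
line_b = f"ab{DELIMITER}c"
fields_a = line_a.split(DELIMITER)  # ["a", "bc"]
fields_b = line_b.split(DELIMITER)  # ["ab", "c"]

print(concat_key(*fields_a) == concat_key(*fields_b))        # True: both are "abc"
print(delimited_key(*fields_a) == delimited_key(*fields_b))  # False: keys differ
```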

if (transcribed) {
    const parts = line.split(DELIMITER);
    word = parts[0];
    wordId = parts.length > 1 ? `${parts[0]}${parts[1]}` : parts[0];

Copilot AI Oct 19, 2025


Same collision risk as above when forming the composite key for dictionary lines. Use a separator like DELIMITER to avoid ambiguity.
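Because both the frequency lines and the dictionary lines build this key, one option is a single shared helper, so that a change to the key format cannot reach one reader and miss the other. A hypothetical Python sketch; `word_id`, the tab `DELIMITER`, and the sample entry are assumptions, and the locale-aware lower-casing done for non-transcribed words in the JavaScript is omitted:

```python
from __future__ import annotations

# Assumption: tab-separated lines; the script's real DELIMITER may differ.
DELIMITER = "\t"


def word_id(parts: list[str], transcribed: bool) -> str:
    """Single source of truth for the lookup key (hypothetical helper).

    Transcribed entries use their first two columns joined by DELIMITER;
    everything else uses the first column. Lower-casing is omitted here.
    """
    if transcribed and len(parts) >= 2:
        return DELIMITER.join(parts[:2])
    return parts[0]


# Placeholder: the same two leading columns, read once from each input file.
freq_key = word_id(["我", "wo"], transcribed=True)
dict_key = word_id(["我", "wo"], transcribed=True)

frequencies = {freq_key: 120}
print(frequencies.get(dict_key, 0))     # 120: both readers agree on the key
print(frequencies.get("我" + "wo", 0))  # 0: an old-style concatenated key misses
```

The second lookup shows the failure mode a shared helper guards against: if only one reader switched to the delimited key, lookups would silently fall back to the default instead of raising an error.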

@sspanak sspanak force-pushed the enhanced-transcribed-language-scripts branch from 79e3dd7 to e3148e5 October 19, 2025 10:44
@sspanak sspanak merged commit fd604aa into master Oct 19, 2025
5 checks passed
@sspanak sspanak deleted the enhanced-transcribed-language-scripts branch October 19, 2025 10:44


Projects

None yet


2 participants