RawTextCheck is a Python-based integration of LanguageTool, built with PyQt5. Its purpose is to proofread texts found in non-standard formats such as XML, Excel, CSV, and other structured files. These files often contain code elements, markup, or special characters that are not part of the actual text, but are essential for the logic or styling of scripts — especially in video games.
This tool was originally designed for fan translation of video games, but can be adapted to many other use cases where structured text needs linguistic analysis.
To analyze a file, a parser first extracts the relevant text lines. Several generic parsers are already included, and custom parsers can be plugged in for specific formats. The parser itself does not need to clean the text, as that can be handled by a dedicated filtering system afterward.
Each file is analyzed within the context of a project, which defines parameters such as:
- The language of the text
- A list of valid characters
- Words to flag as errors (e.g., banned words)
- Custom dictionary entries to ignore false positives
- Filters to exclude code fragments, either by defining specific tokens or start/end delimiters
- Replacement rules that substitute specific tokens with other values
RawTextCheck requires Java 17 or higher, which LanguageTool needs to run.
Make sure Java is installed and available in your system's PATH before using the tool.
Create a virtual environment:
```powershell
py -m venv venv
```
Activate it (PowerShell):
```powershell
.\venv\Scripts\Activate.ps1
```
Install dependencies:
```powershell
pip install -r requirements.txt
```
Run the project:
```powershell
py -m rawtextcheck.main
```
Build the project:
```powershell
pyinstaller rawtextcheck.spec
```

| Menu | Item | Description |
|---|---|---|
| Manage | Configure project | Opens the project management window. Use this to create your first project. See Project configuration. |
| Manage | Add google credentials | Add a Google API credentials file. Required for the Google Sheet parser. See this guide. |
| Preference | Language | Select the application language. The app must be restarted for changes to take effect. This does not affect the language used for text analysis. |
| Preference | Theme | Select the application theme. The app must be restarted for changes to take effect. |
- Project selection: Select which project to use for file analysis.
- File path input: Enter the path of the file to analyze. You can also drag and drop a file onto the app.
- Argument parser: Specify arguments for the parser. By default, the arguments defined in the project are used, but you can modify them for specific files if needed.
- Process button: Start the analysis. Processing may take some time. If LanguageTool is not yet initialized, analyzing a 500-line file may take 1–2 minutes.
- Raw Line: The raw line containing the selected error. You can insert a suggestion or edit the line directly, then apply the correction with the Apply modification button. This feature is not available for errors loaded from history: you must process the file first.
- Results table: Displays the analysis results. If a file has been analyzed previously, the most recent results will be loaded automatically.
The results table has 7 columns:
| Column | Description |
|---|---|
| Line Number | The line in the file where the error was found, or the line's ID |
| Line | The text content of the line |
| Error | The word(s) where the error was detected |
| Type | The LanguageTool error type |
| Category | The LanguageTool error category, useful for sorting |
| Explanation | A description of the error |
| Suggestion | Suggested correction(s) for the error |
Each column can be hidden by right-clicking and deselecting it in the Visibility menu. Hidden columns are remembered between sessions. Each column header can be clicked to sort errors.
| Action | Condition | Description |
|---|---|---|
| Delete | Always | Delete the selected row. The Delete key can also be used. |
| Copy text | Always | Copy the cleaned text from the Line column. |
| Copy error | Always | Copy the error. |
| Copy suggestion | When suggestions are available | Copy the chosen suggestion. |
| Add character to valid characters | Invalid character errors only | Adds the character to valid characters and removes all related errors. |
| Add this word to dictionary | Spelling errors only | Adds the word to the dictionary and removes all related spelling errors. |
| Remove word from the banword list | Banword errors only | Removes the word from the banword list and deletes all related errors. |
| Add {rule} to ignored rules | All other errors | Adds the LanguageTool rule to ignored rules and removes all related errors. |
- Create New Project: Specify a unique name, a language for analysis, and a parser. All settings can be changed later.
- Project selection: Select a project to load and edit its parameters.
- Delete: Delete the currently selected project.
- Import / Export: Import or export a project configuration. Importing overwrites all settings except the project name.
Dictionary
Words that will not generate spelling errors. Add words manually, or delete them via right-click or the Delete key.
Banwords
Words that will always generate errors, even if they are correct in your language.
Ignored codes
Codes to filter out of the text. Each code is either removed or, if its checkbox is checked, replaced by a space.
For example, given this line:
That's [c4]Yoko Fukunaga[c0].[r]Good, at least I can remember that much.
Adding [c4] and [c0] without the checkbox, and [r] with the checkbox checked, produces:
That's Yoko Fukunaga. Good, at least I can remember that much.
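The checkbox behaviour above amounts to a plain string substitution. The following is a minimal Python sketch, not RawTextCheck's own code. The function name `apply_ignored_codes` and the `{code: checkbox}` dictionary are hypothetical.

```python
def apply_ignored_codes(line: str, codes: dict[str, bool]) -> str:
    """Strip each ignored code from `line`.

    `codes` maps a code to its checkbox state: True replaces the code
    with a space, False removes it entirely.
    """
    for code, replace_with_space in codes.items():
        line = line.replace(code, " " if replace_with_space else "")
    return line


raw = "That's [c4]Yoko Fukunaga[c0].[r]Good, at least I can remember that much."
codes = {"[c4]": False, "[c0]": False, "[r]": True}
print(apply_ignored_codes(raw, codes))
```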
Ignored substrings
Filter text using start/end delimiters. Applied after Ignored codes.
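Using the same example, instead of adding every [xx] code individually, you can add [ as start and ] as end delimiter (without checkbox) to filter all of them at once. Because Ignored substrings are applied after Ignored codes, keeping [r] as an ignored code with its checkbox checked still turns it into a space: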
That's Yoko Fukunaga. Good, at least I can remember that much.
- Leave "start" blank to remove everything from the beginning of the line up to the "end" delimiter.
- Leave "end" blank to remove everything from the "start" delimiter to the end of the line.
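The delimiter filtering, including the blank-start and blank-end cases, can be sketched in Python. This is an illustration only. The function name is hypothetical. The sketch also assumes that a blank delimiter matches the *first* occurrence of the other one, and that an unmatched start delimiter leaves the rest of the line untouched.

```python
def apply_ignored_substrings(line: str, start: str, end: str,
                             replace_with_space: bool = False) -> str:
    """Remove every span from `start` to `end` (both included)."""
    filler = " " if replace_with_space else ""
    if not start:
        # Blank start: drop everything up to and including the first `end`.
        idx = line.find(end)
        return line if idx == -1 else filler + line[idx + len(end):]
    if not end:
        # Blank end: drop everything from the first `start` onward.
        idx = line.find(start)
        return line if idx == -1 else line[:idx] + filler
    result, pos = [], 0
    while True:
        i = line.find(start, pos)
        if i == -1:
            break
        j = line.find(end, i + len(start))
        if j == -1:
            break  # unmatched start: keep the rest of the line as-is
        result.append(line[pos:i])
        result.append(filler)
        pos = j + len(end)
    result.append(line[pos:])
    return "".join(result)


# Line after the Ignored codes step has already turned [r] into a space:
line = "That's [c4]Yoko Fukunaga[c0]. Good, at least I can remember that much."
print(apply_ignored_substrings(line, "[", "]"))
```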
Replace codes
Define substitutions where specific tokens are automatically replaced with another value.
For example, given:
That$s Y#oko Fukunaga.
Setting ' as replacement for $ and ō as replacement for #o produces:
That's Yōko Fukunaga.
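The replacement step in the example can be sketched as a sequence of `str.replace` calls. This is a hypothetical helper, not the tool's implementation. It does show one design point: when tokens overlap (for example `#` and `#o`), the order in which replacements run changes the result.

```python
def apply_replace_codes(line: str, replacements: dict[str, str]) -> str:
    """Replace every occurrence of each token with its configured value.

    Replacements run in dictionary insertion order, so overlapping
    tokens are resolved by whichever is defined first.
    """
    for token, value in replacements.items():
        line = line.replace(token, value)
    return line


print(apply_replace_codes("That$s Y#oko Fukunaga.", {"$": "'", "#o": "ō"}))
```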
Ignored grammar rules
LanguageTool rules to ignore globally for this project.
| Field | Description |
|---|---|
| Project name | Change the name of the project |
| Language | Change the language used for LanguageTool analysis |
| Parser | Select which parser to use |
| Argument for parser | Default arguments for the parser |
| Valid characters | Authorized characters. Checkboxes are available for the three common space characters |
Use Restore to revert unsaved changes, Save to save, or Save and Quit to save and close.
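Conceptually, the valid-characters check flags every character of a line that is not in the authorized set. The Python sketch below is a minimal illustration with a hypothetical function name, not RawTextCheck's code. The second call shows why whitespace has to be listed explicitly: a non-breaking space (U+00A0) looks like a regular space but is a different character.

```python
def find_invalid_characters(text: str, valid: str) -> list[tuple[int, str]]:
    """Return (position, character) for each character of `text` not in `valid`."""
    allowed = set(valid)
    return [(pos, char) for pos, char in enumerate(text) if char not in allowed]


print(find_invalid_characters("Yōko!", "Yko"))
print(find_invalid_characters("a\u00a0b", "ab "))
```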
Arguments must be written in this format:
arg1="value", arg2="value", arg3='value'
Both single and double quotes are accepted.
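One way to read this argument format is a small regular expression that accepts either quote style. This is a sketch under that assumption, not the tool's own parser; the name `parse_parser_args` is hypothetical. It does not validate separators, so any stray text between arguments is silently ignored.

```python
import re

# name = "value"  or  name = 'value'
_ARG_PATTERN = re.compile(r"""(\w+)\s*=\s*(?:"([^"]*)"|'([^']*)')""")


def parse_parser_args(text: str) -> dict[str, str]:
    """Turn 'arg1="value", arg2=\'value\'' into {"arg1": "value", "arg2": "value"}."""
    args = {}
    for match in _ARG_PATTERN.finditer(text):
        name, double_quoted, single_quoted = match.groups()
        # Only one of the two quote groups participates in each match.
        args[name] = double_quoted if double_quoted is not None else single_quoted
    return args


print(parse_parser_args("""arg1="value", arg2="value", arg3='value'"""))
```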
There are 7 built-in parsers:
Returns every non-empty line in the file. Useful when Ignored codes, Ignored substrings, and Replace codes are sufficient to extract the sentences you need.
| Argument | Required | Description |
|---|---|---|
| `beginText` | No | Text to start parsing from. Ignored if `beginLineNumber` is set |
| `endText` | No | Text to stop parsing at. Ignored if `endLineNumber` is set |
| `beginLineNumber` | No | Line number to start parsing from |
| `endLineNumber` | No | Line number to stop parsing at |
| `contains` | No | Text each line must contain. Separate multiple values with `\|` |
| `notContains` | No | Text each line must not contain. Separate multiple values with `\|` |
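The following Python generator sketches how these arguments could interact. It is an illustration, not the built-in parser. Several behaviours are assumptions:

- The `beginText` line is included and the `endText` line is excluded.
- Line-number bounds are inclusive.
- A line passes `contains` if it holds *any* of the listed values.

```python
from typing import Iterable, Iterator, Optional


def parse_text(
    lines: Iterable[str],
    beginText: Optional[str] = None,
    endText: Optional[str] = None,
    beginLineNumber: Optional[int] = None,
    endLineNumber: Optional[int] = None,
    contains: Optional[str] = None,
    notContains: Optional[str] = None,
) -> Iterator[tuple[int, str]]:
    """Yield (line_number, text) for each non-empty line that passes the filters."""
    required = contains.split("|") if contains else []
    forbidden = notContains.split("|") if notContains else []
    # Text markers are ignored when the matching line number is set.
    use_begin_text = beginLineNumber is None and beginText is not None
    use_end_text = endLineNumber is None and endText is not None
    started = not use_begin_text

    for number, raw in enumerate(lines, start=1):
        text = raw.rstrip("\r\n")
        if beginLineNumber is not None and number < beginLineNumber:
            continue
        if endLineNumber is not None and number > endLineNumber:
            break
        if not started:
            if beginText in text:  # assumption: start on the beginText line
                started = True
            else:
                continue
        if use_end_text and endText in text:
            break  # assumption: the endText line itself is excluded
        if not text.strip():
            continue
        if required and not any(value in text for value in required):
            continue  # assumption: ANY listed value satisfies `contains`
        if any(value in text for value in forbidden):
            continue
        yield number, text


sample = ["Header", "", "BEGIN", "Hello world", "# comment", "Bye", "END", "After"]
print(list(parse_text(sample, beginText="BEGIN", endText="END", notContains="#")))
```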
Returns every non-empty value from a specified column in a CSV file.
| Argument | Required | Description |
|---|---|---|
| `col` | Yes | Column number (1-based) |
| `colID` | No | Column to use as row identifier instead of line number |
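A CSV parser with these two arguments can be sketched with the standard `csv` module. This is a hypothetical illustration, not the bundled parser. It takes the row identifier from `csv.reader`'s `line_num` attribute, which counts physical lines read so far, so multi-line quoted cells report their last line. The sketch does not treat the first row as a header.

```python
import csv
import io
from typing import Iterator, Optional, TextIO


def parse_csv(stream: TextIO, col: int, colID: Optional[int] = None) -> Iterator[tuple[str, str]]:
    """Yield (identifier, text) for every non-empty cell in column `col` (1-based)."""
    reader = csv.reader(stream)
    for row in reader:
        if len(row) < col or not row[col - 1].strip():
            continue  # missing or empty cell
        if colID is not None and len(row) >= colID:
            identifier = row[colID - 1]
        else:
            identifier = str(reader.line_num)  # fall back to the line number
        yield identifier, row[col - 1]


data = "id,source,target\nL1,Hello,Bonjour\nL2,Bye,\n"
print(list(parse_csv(io.StringIO(data), col=3)))
print(list(parse_csv(io.StringIO(data), col=3, colID=1)))
```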
Returns every non-empty cell from a specified column in an Excel file.
| Argument | Required | Description |
|---|---|---|
| `col` | Yes | Column letter (e.g., D) |
| `colID` | No | Column letter to use as row identifier instead of row number |
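Because `col` and `colID` take column letters rather than numbers, a spreadsheet parser has to map them to indices. The following is a minimal pure-Python sketch of that conversion with a hypothetical function name. openpyxl provides an equivalent helper, `openpyxl.utils.column_index_from_string`.

```python
def column_letter_to_index(letters: str) -> int:
    """Convert an Excel-style column letter (A, D, AA, ...) to a 1-based index."""
    letters = letters.strip().upper()
    if not letters:
        raise ValueError("Empty column letter")
    index = 0
    for char in letters:
        if not "A" <= char <= "Z":
            raise ValueError(f"Invalid column letter: {letters!r}")
        # Base 26 with digits 1..26: column names have no zero digit.
        index = index * 26 + (ord(char) - ord("A") + 1)
    return index


for letters in ("A", "D", "Z", "AA", "BA"):
    print(letters, column_letter_to_index(letters))
```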
Returns every non-empty cell from a specified column in a Google Sheet. The file path must be the sheet URL. Credentials are required — see this guide.
| Argument | Required | Description |
|---|---|---|
| `col` | Yes | Column letter (e.g., D) |
| `colID` | No | Column letter to use as row identifier instead of row number |
Returns every non-empty value associated with a specified key, recursively across all objects in the JSON structure.
| Argument | Required | Description |
|---|---|---|
| `key` | Yes | The JSON key whose value to extract |
| `idKey` | No | Another key to use as row identifier instead of an auto-incremented index |
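The recursive search can be sketched as a generator that walks dicts and lists. This is an illustration, not the built-in parser. It assumes that only string values count as text and that the auto-incremented index counts returned values only.

```python
import json
from typing import Any, Iterator, Optional


def parse_json(data: Any, key: str, idKey: Optional[str] = None) -> Iterator[tuple[str, str]]:
    """Yield (identifier, value) for every non-empty string stored under `key`,
    searching dicts and lists recursively."""
    counter = 0  # auto-incremented index used when no idKey is available

    def walk(node: Any) -> Iterator[tuple[str, str]]:
        nonlocal counter
        if isinstance(node, dict):
            value = node.get(key)
            if isinstance(value, str) and value.strip():
                counter += 1
                identifier = node[idKey] if idKey is not None and idKey in node else counter
                yield str(identifier), value
            for child in node.values():
                yield from walk(child)
        elif isinstance(node, list):
            for item in node:
                yield from walk(item)

    yield from walk(data)


document = json.loads(
    '{"lines": [{"id": "a1", "text": "Hello"}, {"id": "a2", "text": ""},'
    ' {"nested": {"id": "a3", "text": "Bye"}}]}'
)
print(list(parse_json(document, "text")))
print(list(parse_json(document, "text", idKey="id")))
```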
Returns every non-empty translation string (msgstr) from a PO file.
| Argument | Required | Description |
|---|---|---|
| `id` | No | Row identifier: `line` (default) uses the line number, `msgid` uses the source string |
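The following Python sketch reads `msgstr` values from PO lines. It is a deliberately simplified, hypothetical reader, not the built-in parser. Its assumptions and limits:

- It handles `msgid`/`msgstr` with continuation lines.
- It ignores plural forms (`msgstr[n]`) and `msgctxt`.
- It skips the header entry (empty `msgid`).
- It uses the `msgstr` line as the "line number".
- It unescapes strings with `ast.literal_eval`, which only approximates PO's C-style escaping.

```python
import ast
from typing import Iterator


def parse_po(lines: list[str], id: str = "line") -> Iterator[tuple[str, str]]:
    """Yield (identifier, msgstr) for every entry with a non-empty translation.

    `id` mirrors the parser argument: "line" uses the msgstr line number,
    "msgid" uses the source string.
    """
    entries: list[dict] = []
    current = None
    field = None  # which string ("msgid" or "msgstr") continuation lines extend
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if line.startswith("msgid "):
            current = {"msgid": ast.literal_eval(line[6:]), "msgstr": "", "line": 0}
            entries.append(current)
            field = "msgid"
        elif line.startswith("msgstr ") and current is not None:
            current["msgstr"] = ast.literal_eval(line[7:])
            current["line"] = number
            field = "msgstr"
        elif line.startswith('"') and field is not None:
            # Continuation line: adjacent quoted strings are concatenated.
            current[field] += ast.literal_eval(line)
        else:
            field = None  # comments, blank lines, unsupported keywords
    for entry in entries:
        if entry["msgid"] and entry["msgstr"].strip():
            identifier = entry["msgid"] if id == "msgid" else str(entry["line"])
            yield identifier, entry["msgstr"]


po_lines = """# French translation
msgid ""
msgstr ""
"Language: fr"

#: intro.txt:3
msgid "Hello"
msgstr "Bonjour"

msgid "Untranslated"
msgstr ""

msgid "Two "
"lines"
msgstr "Deux "
"lignes"
""".splitlines()
print(list(parse_po(po_lines)))
print(list(parse_po(po_lines, id="msgid")))
```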
Returns non-empty text or attribute values from XML elements matching a given tag.
| Argument | Required | Description |
|---|---|---|
| `tag` | Yes | The XML element tag to extract |
| `attr` | No | Attribute name to extract instead of element text |
| `idAttr` | No | Attribute name to use as row identifier. Defaults to line number |
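The following Python sketch uses the standard `xml.etree.ElementTree` module. It is a hypothetical illustration, not the built-in parser. One known difference: ElementTree does not record source line numbers, so the fallback identifier here is the element's position among matching tags rather than a line number (lxml elements do expose a `sourceline` attribute if real line numbers are needed). `element.text` only covers the text before an element's first child.

```python
import xml.etree.ElementTree as ET
from typing import Iterator, Optional


def parse_xml(source: str, tag: str, attr: Optional[str] = None,
              idAttr: Optional[str] = None) -> Iterator[tuple[str, str]]:
    """Yield (identifier, text) for matching elements with non-empty content."""
    root = ET.fromstring(source)
    # root.iter() walks the whole tree in document order, root included.
    for position, element in enumerate(root.iter(tag), start=1):
        text = element.get(attr) if attr else element.text
        if not text or not text.strip():
            continue
        identifier = element.get(idAttr) if idAttr else None
        # Fallback: position among matching tags (not a line number).
        yield (identifier if identifier is not None else str(position)), text


sample = ('<strings><s id="greet">Hello</s><s id="empty"></s>'
          '<s id="bye" alt="Later">Bye</s></strings>')
print(list(parse_xml(sample, "s")))
print(list(parse_xml(sample, "s", idAttr="id")))
print(list(parse_xml(sample, "s", attr="alt")))
```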
If the built-in parsers are not sufficient, you can create your own.
A parser is a Python file implementing a specific interface. A template is available here. You can also start from any of the default parsers as a base.
To add a parser, place the Python file in the parsers folder. The parser name will match the filename — avoid using the same name as any built-in parser.
Note: All imports used by your parser must already be available in the RawTextCheck environment.
If your parser could be useful to others, consider sharing it in the community parsers repository.

