UnicodeDecodeError on Windows when running gita ll (threaded describe) #302

@FreezeSoul

Environment

  • OS: Windows 10 / 11
  • Python: CPython 3.x
  • gita version: (please fill in, e.g. 0.16.x)
  • Terminal encoding: GBK / UTF-8 (reproducible regardless)

Description

Running gita ll on Windows consistently crashes with a UnicodeDecodeError when multiple repositories are configured.

The error occurs while gita reads git output (commit messages) concurrently through a ThreadPoolExecutor. On Windows, subprocess output is decoded with the system default codec, so decoding fails whenever a commit message contains bytes that are invalid in that codec, for example UTF-8 encoded emoji or CJK characters decoded as GBK. Because the reads run in a thread pool, the first such failure aborts the whole command.

This issue is systematic, not repo-specific: once multiple repositories are involved, gita ll crashes reliably.
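The decoding mismatch can be reproduced without git or threads. The sketch below is illustrative only: `decode_output` is a hypothetical helper, not part of gita, and the sample bytes stand in for a UTF-8 commit message.

```python
def decode_output(raw: bytes, codec: str) -> str:
    """Decode raw subprocess bytes, reporting a failure instead of raising."""
    try:
        return raw.decode(codec)
    except UnicodeDecodeError as exc:
        return f"[decode failed] {exc.reason} at byte {exc.start}"


# A one-character CJK commit message as git would emit it (UTF-8 bytes).
raw = "中".encode("utf-8")

as_utf8 = decode_output(raw, "utf-8")  # decodes cleanly
as_gbk = decode_output(raw, "gbk")     # the three UTF-8 bytes are not valid GBK

print(as_utf8)
print(as_gbk)
```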


Traceback (excerpt)

UnicodeDecodeError: 'gbk' codec can't decode byte 0xaf in position XX
  File ".../gita/utils.py", in describe
  File ".../concurrent/futures/thread.py", in run
  File ".../gita/info.py", in get_commit_msg

Root Cause

The crash is caused by:

  1. describe() using ThreadPoolExecutor
  2. Per-repo info functions invoking subprocess.run(...)
  3. Subprocess output being decoded with the Windows default codec (gbk)
  4. Any non-decodable byte causing the entire threaded execution to fail

Exceptions raised inside the thread pool are not isolated per repository: executor.map re-raises a worker's exception as soon as its result is consumed, so one failing repository aborts the entire gita ll command.
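The propagation behavior can be seen in isolation. This is a minimal sketch, not gita's code: `fake_info` and `collect` are hypothetical stand-ins for a per-repo info function and for the loop in describe().

```python
from concurrent.futures import ThreadPoolExecutor


def fake_info(name: str) -> str:
    """Hypothetical per-repo info function; the repo named 'bad' fails to decode."""
    if name == "bad":
        # UTF-8 bytes decoded with the wrong codec, as on a GBK Windows system.
        return "中".encode("utf-8").decode("gbk")
    return f"{name}: ok"


def collect(names):
    """Consume executor.map the way an unguarded loop would."""
    lines, error = [], None
    with ThreadPoolExecutor(max_workers=2) as executor:
        try:
            for line in executor.map(fake_info, names):
                lines.append(line)
        except UnicodeDecodeError as exc:
            # The worker's exception is re-raised here, ending the iteration.
            error = exc
    return lines, error


lines, error = collect(["a", "bad", "c"])
print(lines, type(error).__name__)
```

Results are yielded in input order, so everything after the failing repository is lost even if its worker finished successfully.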


Proposed Fix (Minimal & Safe)

Wrap the per-repository formatting logic inside describe() with a safe wrapper so that:

  • The repository name causing the failure is identifiable
  • One failing repository does not crash the entire command
  • The fix is thread-safe and does not change behavior on Linux/macOS

Example patch:

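# Imports the patch relies on (the gita-internal `info` module is assumed to be in scope):
import multiprocessing
from concurrent.futures import ThreadPoolExecutor
from typing import Dict

def describe(repos: Dict[str, Dict[str, str]], no_colors: bool = False):
    if repos:
        truncator = info.Truncate()
        name_width = len(max(repos, key=len)) + 1
        funcs = info.get_info_funcs(no_colors=no_colors)

        def _safe_describe(name, name_width=name_width):
            # Format one repository; on any failure, return an error line
            # that names the repository instead of raising into the pool.
            try:
                return f"{name:<{name_width}}{' '.join(f(repos[name], truncator) for f in funcs)}"
            except Exception as e:
                return f"{name:<{name_width}}[ERROR] {e!r}"

        num_threads = min(multiprocessing.cpu_count(), len(repos))
        with ThreadPoolExecutor(max_workers=num_threads) as executor:
            # executor.map yields results in sorted-name order.
            for line in executor.map(_safe_describe, sorted(repos)):
                yield line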

This prevents a single Unicode decoding failure from crashing gita ll on Windows and makes the command robust in mixed-encoding environments.


Notes

  • This issue is rarely seen on Linux/macOS due to UTF-8 defaults
  • On Windows, it is easily reproducible with repositories containing emoji or CJK characters in commit messages
  • A more complete solution may also involve explicitly specifying encoding="utf-8", errors="replace" in all subprocess.run calls
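The last note could look roughly like the sketch below. `run_text` is a hypothetical helper, not an existing gita function; it assumes the command's output is UTF-8 and uses only documented subprocess.run parameters.

```python
import subprocess
import sys


def run_text(cmd):
    """Run cmd and return its stdout as text, never raising on undecodable bytes."""
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        encoding="utf-8",   # decode as UTF-8 instead of the locale default (e.g. gbk)
        errors="replace",   # undecodable bytes become U+FFFD instead of raising
    )
    return result.stdout.strip()


# Stand-in for a git call: a child process that emits one invalid UTF-8 byte.
child_code = "import sys; sys.stdout.buffer.write(b'ok \\xff\\n')"
output = run_text([sys.executable, "-c", child_code])
print(repr(output))
```

With errors="replace" a malformed byte degrades to a replacement character in the listing rather than surfacing as an exception in a worker thread.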

Expected Behavior

gita ll should continue listing repositories even if one repository fails to retrieve commit information, instead of aborting the entire command.
