Environment
- OS: Windows 10 / 11
- Python: CPython 3.x
- gita version: (please fill in, e.g. 0.16.x)
- Terminal encoding: GBK / UTF-8 (reproducible regardless)
Description
Running gita ll on Windows consistently crashes with a UnicodeDecodeError when multiple repositories are configured.
The error occurs while gita reads git output (commit messages) concurrently using ThreadPoolExecutor. On Windows, subprocess output is decoded with the locale's default code page (e.g. GBK) rather than UTF-8, so commit messages containing non-ASCII characters (e.g. emoji or CJK characters) can fail to decode. The exception is raised inside a worker thread and propagates out of the thread pool.
This issue is systematic, not repo-specific: once multiple repositories are involved, gita ll crashes reliably.
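The decoding failure itself can be reproduced in isolation. The sketch below assumes git writes the commit message as UTF-8 (git's default) and that the Windows locale code page is GBK; decode_or_error is a hypothetical demo helper, not part of gita. Even a CJK character that GBK can represent fails here, because the bytes coming from git are UTF-8:

```python
# Assumption: git emits the commit message as UTF-8 bytes (git's default),
# while Python on a Chinese-locale Windows machine decodes with GBK (cp936).
COMMIT_MSG = "fix \u4e2d"                 # "fix 中"
RAW = COMMIT_MSG.encode("utf-8")          # b'fix \xe4\xb8\xad'


def decode_or_error(data: bytes, encoding: str, errors: str = "strict") -> str:
    """Decode data; return the error text instead of raising (demo helper)."""
    try:
        return data.decode(encoding, errors=errors)
    except UnicodeDecodeError as exc:
        return f"UnicodeDecodeError: {exc}"


if __name__ == "__main__":
    print(ascii(decode_or_error(RAW, "utf-8")))                   # round-trips
    print(ascii(decode_or_error(RAW, "gbk")))                     # strict GBK fails
    print(ascii(decode_or_error(RAW, "gbk", errors="replace")))   # lossy, no crash
```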
Traceback (excerpt)
UnicodeDecodeError: 'gbk' codec can't decode byte 0xaf in position XX
File ".../gita/utils.py", in describe
File ".../concurrent/futures/thread.py", in run
File ".../gita/info.py", in get_commit_msg
Root Cause
The crash is caused by:
- describe() using ThreadPoolExecutor
- Per-repo info functions invoking subprocess.run(...)
- Subprocess output being decoded with the Windows default codec (gbk)
- Any non-decodable byte causing the entire threaded execution to fail
Because exceptions raised inside the thread pool are not isolated per repository, one failure aborts the entire gita ll command.
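The propagation behavior can be reproduced without git or Windows. In this sketch, describe_repo is a hypothetical stand-in for gita's per-repo info functions that raises a UnicodeDecodeError for one repo. Because executor.map re-raises a worker's exception when the consumer reaches that result, the generator stops and later repositories are never listed:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, Iterator, List, Optional, Tuple


def describe_repo(name: str) -> str:
    # Hypothetical stand-in for a per-repo info function; one repo fails to decode.
    if name == "bad-repo":
        raise UnicodeDecodeError("gbk", b"\xaf", 0, 1, "illegal multibyte sequence")
    return f"{name}: ok"


def describe_all(names: Iterable[str]) -> Iterator[str]:
    # Mirrors the unguarded pattern: no per-repo exception handling.
    with ThreadPoolExecutor(max_workers=2) as executor:
        for line in executor.map(describe_repo, sorted(names)):
            yield line


def run_demo() -> Tuple[List[str], Optional[BaseException]]:
    printed: List[str] = []
    try:
        for line in describe_all(["alpha", "bad-repo", "zeta"]):
            printed.append(line)
    except UnicodeDecodeError as exc:
        # The exception from the worker thread surfaces here, in the consumer.
        return printed, exc
    return printed, None


if __name__ == "__main__":
    lines, error = run_demo()
    print(lines)        # only repos sorted before "bad-repo" were listed
    print(repr(error))
```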
Proposed Fix (Minimal & Safe)
Wrap the per-repository formatting logic inside describe() with a safe wrapper so that:
- The repository name causing the failure is identifiable
- One failing repository does not crash the entire command
- The fix is thread-safe and does not change behavior on Linux/macOS
Example patch:
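```python
# In gita/utils.py (these imports may already be present in that module)
import multiprocessing
from concurrent.futures import ThreadPoolExecutor
from typing import Dict

from . import info


def describe(repos: Dict[str, Dict[str, str]], no_colors: bool = False):
    if repos:
        truncator = info.Truncate()
        name_width = len(max(repos, key=len)) + 1
        funcs = info.get_info_funcs(no_colors=no_colors)

        def _safe_describe(name, name_width=name_width):
            try:
                return f"{name:<{name_width}}{' '.join(f(repos[name], truncator) for f in funcs)}"
            except Exception as e:
                # Isolate the failure: report this repo instead of aborting all repos
                return f"{name:<{name_width}}[ERROR] {e!r}"

        num_threads = min(multiprocessing.cpu_count(), len(repos))
        with ThreadPoolExecutor(max_workers=num_threads) as executor:
            for line in executor.map(_safe_describe, sorted(repos)):
                yield line
```

This prevents a single Unicode decoding failure from crashing gita ll on Windows: the failing repository is reported with an [ERROR] marker and the remaining repositories are still listed, which makes the command robust in mixed-encoding environments.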
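A standalone version of the same wrapper pattern, runnable without gita, may help when testing the idea. branch_info, commit_msg and FUNCS are hypothetical stand-ins for gita's info functions, and the truncator argument is omitted:

```python
import multiprocessing
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterator, List


# Hypothetical stand-ins for gita's per-repo info functions.
def branch_info(repo: Dict[str, str]) -> str:
    return repo.get("branch", "main")


def commit_msg(repo: Dict[str, str]) -> str:
    if repo.get("broken"):
        raise UnicodeDecodeError("gbk", b"\xaf", 0, 1, "illegal multibyte sequence")
    return repo.get("msg", "")


FUNCS: List[Callable[[Dict[str, str]], str]] = [branch_info, commit_msg]


def describe(repos: Dict[str, Dict[str, str]]) -> Iterator[str]:
    if not repos:
        return
    name_width = len(max(repos, key=len)) + 1

    def _safe_describe(name: str) -> str:
        try:
            return f"{name:<{name_width}}{' '.join(f(repos[name]) for f in FUNCS)}"
        except Exception as e:
            # Only this repo's line is replaced; the other repos still render.
            return f"{name:<{name_width}}[ERROR] {e!r}"

    num_threads = min(multiprocessing.cpu_count(), len(repos))
    with ThreadPoolExecutor(max_workers=num_threads) as executor:
        yield from executor.map(_safe_describe, sorted(repos))


if __name__ == "__main__":
    demo = {"alpha": {"branch": "main", "msg": "init"}, "beta": {"broken": "1"}}
    for line in describe(demo):
        print(line)
```

Catching Exception inside the worker function keeps each repository's result independent, so executor.map never has an exception to re-raise.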
Notes
- This issue is rarely seen on Linux/macOS due to UTF-8 defaults
- On Windows, it is easily reproducible with repositories containing emoji or CJK characters in commit messages
- A more complete solution may also involve explicitly passing encoding="utf-8", errors="replace" to all subprocess.run calls
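A sketch of what an explicit-encoding call could look like. run_text is a hypothetical helper, not an existing gita function, and the demo uses a Python child process instead of git so it runs anywhere:

```python
import subprocess
import sys
from typing import List, Optional


def run_text(cmd: List[str], cwd: Optional[str] = None) -> str:
    """Run cmd and return stdout decoded as UTF-8 (hypothetical helper).

    Invalid bytes are replaced with U+FFFD instead of raising
    UnicodeDecodeError, so one odd commit message cannot abort the caller.
    """
    result = subprocess.run(
        cmd,
        cwd=cwd,
        capture_output=True,  # Python 3.7+
        encoding="utf-8",     # do not fall back to the locale code page (e.g. GBK)
        errors="replace",     # undecodable bytes become U+FFFD
        check=False,
    )
    return result.stdout


# Child process that writes UTF-8 text followed by one invalid byte (0xff).
DEMO_SCRIPT = (
    "import sys; "
    "sys.stdout.buffer.write('fix \\u4e2d '.encode('utf-8') + b'\\xff')"
)

if __name__ == "__main__":
    print(ascii(run_text([sys.executable, "-c", DEMO_SCRIPT])))
    # With git, a call might look like:
    # run_text(["git", "log", "-1", "--format=%s"], cwd="/path/to/repo")
```

Passing encoding (or errors) also opens the pipes in text mode, so a separate text=True is not needed.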
Expected Behavior
gita ll should continue listing repositories even if one repository fails to retrieve commit information, instead of aborting the entire command.