
@kofany commented Sep 10, 2025

Summary

This PR addresses critical UTF-8 display and input issues that occur in modern terminals like Ghostty when handling emoji with variation selectors and other complex Unicode sequences.

The problem shows up as display corruption: emoji sequences like 💕💋😘🥰💞❣️💓♥️♥️💓❣️🥰🥰😘💋💋 cause misplaced text and garbled output in side panels and input fields.

Key Changes

  • Enhanced UTF-8 processing: Added grapheme cluster detection using utf8proc when available
  • Fixed input handling: Preserved emoji variation selectors during text input and paste operations
  • Improved text measurement: Proper width calculation for complex Unicode sequences
  • Smart TRANSLIT bypass: Avoided problematic charset conversion for pure UTF-8 content
  • Backward compatibility: All changes fall back gracefully when utf8proc is unavailable
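The grapheme cluster detection with a utf8proc fallback can be pictured as one advance function behind a compile-time switch. This is an illustrative sketch, not the code added to src/core/utf8.c: `HAVE_UTF8PROC` and `advance_cluster()` are hypothetical names. With utf8proc it uses `utf8proc_iterate()` and `utf8proc_grapheme_break_stateful()`. Without it, it roughly approximates UAX #29 by attaching variation selectors, combining marks, skin-tone modifiers and ZWJ-joined code points to the preceding code point.

```c
#include <stddef.h>
#include <stdint.h>

#ifdef HAVE_UTF8PROC /* hypothetical build flag */
#include <utf8proc.h>

/* Return the byte length of the grapheme cluster starting at s. */
size_t advance_cluster(const char *s, size_t len)
{
    const utf8proc_uint8_t *p = (const utf8proc_uint8_t *)s;
    utf8proc_int32_t prev, cp, state = 0; /* state must start at 0 */
    utf8proc_ssize_t n;
    size_t pos;

    if (len == 0) return 0;
    n = utf8proc_iterate(p, (utf8proc_ssize_t)len, &prev);
    if (n <= 0) return 1; /* invalid byte: step over it */
    pos = (size_t)n;
    while (pos < len) {
        n = utf8proc_iterate(p + pos, (utf8proc_ssize_t)(len - pos), &cp);
        if (n <= 0 || utf8proc_grapheme_break_stateful(prev, cp, &state))
            break;
        pos += (size_t)n;
        prev = cp;
    }
    return pos;
}

#else /* fallback without utf8proc: a rough approximation */

/* Decode one code point; returns bytes used, or 0 on invalid input. */
static size_t decode(const unsigned char *s, size_t len, uint32_t *cp)
{
    size_t n, i;
    uint32_t c;

    if (len == 0) return 0;
    if (s[0] < 0x80) { *cp = s[0]; return 1; }
    if ((s[0] & 0xE0) == 0xC0)      { n = 2; c = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { n = 3; c = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { n = 4; c = s[0] & 0x07; }
    else return 0;
    if (n > len) return 0;
    for (i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;
        c = (c << 6) | (s[i] & 0x3F);
    }
    *cp = c;
    return n;
}

/* Code points that attach to the preceding one in this approximation. */
static int is_extender(uint32_t c)
{
    return (c >= 0x0300 && c <= 0x036F)    /* combining diacritical marks */
        || (c >= 0xFE00 && c <= 0xFE0F)    /* variation selectors VS1-VS16 */
        || (c >= 0x1F3FB && c <= 0x1F3FF) /* emoji skin-tone modifiers */
        || c == 0x20E3                     /* combining enclosing keycap */
        || c == 0x200D;                    /* zero width joiner */
}

size_t advance_cluster(const char *s, size_t len)
{
    const unsigned char *p = (const unsigned char *)s;
    uint32_t cp = 0, prev = 0;
    size_t n, pos;

    if (len == 0) return 0;
    n = decode(p, len, &prev);
    if (n == 0) return 1; /* invalid byte: step over it */
    pos = n;
    while (pos < len) {
        n = decode(p + pos, len - pos, &cp);
        /* keep going while cp attaches, or the previous code point was a ZWJ */
        if (n == 0 || !(is_extender(cp) || prev == 0x200D))
            break;
        pos += n;
        prev = cp;
    }
    return pos;
}
#endif
```

Only the fallback branch compiles without utf8proc. It misses rules such as regional-indicator pairs (flags) and Hangul jamo sequences, which is where the real library is worth depending on.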

Technical Details

  • src/core/utf8.c: New grapheme cluster advancement functions with utf8proc integration
  • src/fe-text/gui-entry.c: Fixed cursor positioning and text measurement for complex Unicode
  • src/fe-text/gui-readline.c: Enhanced paste processing to handle multi-codepoint sequences
  • src/core/recode.c: Bypass TRANSLIT when both input and output are valid UTF-8
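The recode.c change comes down to one decision: if the text is already valid UTF-8 and it is being converted from UTF-8 to UTF-8, there is nothing to transliterate, so the iconv `//TRANSLIT` target suffix (a GNU iconv extension) can be skipped. The following minimal sketch assumes that this is what "both input and output are valid UTF-8" means. `can_skip_translit()` and its helpers are hypothetical, not irssi functions.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* Case-insensitive string equality. */
static int eq_nocase(const char *a, const char *b)
{
    for (; *a && *b; a++, b++)
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
    return *a == *b;
}

/* Accept the common spellings of the UTF-8 charset name. */
static int charset_is_utf8(const char *cs)
{
    return cs != NULL && (eq_nocase(cs, "UTF-8") || eq_nocase(cs, "UTF8"));
}

/* Strict validation: rejects bad continuation bytes, truncated sequences,
 * overlong encodings, UTF-16 surrogates and values above U+10FFFF. */
static int utf8_is_valid(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char b = s[i];
        size_t n, k;
        uint32_t c, min;

        if (b < 0x80) { i++; continue; }
        if ((b & 0xE0) == 0xC0)      { n = 2; c = b & 0x1F; min = 0x80; }
        else if ((b & 0xF0) == 0xE0) { n = 3; c = b & 0x0F; min = 0x800; }
        else if ((b & 0xF8) == 0xF0) { n = 4; c = b & 0x07; min = 0x10000; }
        else return 0;
        if (len - i < n) return 0;
        for (k = 1; k < n; k++) {
            if ((s[i + k] & 0xC0) != 0x80) return 0;
            c = (c << 6) | (s[i + k] & 0x3F);
        }
        if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF)) return 0;
        i += n;
    }
    return 1;
}

/* Hypothetical decision: valid UTF-8 going to UTF-8 needs no //TRANSLIT pass. */
int can_skip_translit(const char *text, size_t len, const char *from, const char *to)
{
    return charset_is_utf8(from) && charset_is_utf8(to)
        && utf8_is_valid((const unsigned char *)text, len);
}
```

Invalid input still takes the normal conversion path, so malformed bytes keep whatever handling recode.c already applies to them.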

Test Case

The implementation was tested with the emoji sequence that originally caused issues:
💕💋😘🥰💞❣️💓♥️♥️💓❣️🥰🥰😘💋💋
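For a reproducible test, it helps to spell the sequence out as bytes so that the variation selectors cannot be lost in copy and paste. The following sketch writes the test string with explicit UTF-8 escapes and counts bytes, code points and U+FE0F variation selectors. The name `TEST_SEQ` and the counting helpers are illustrative, not taken from the PR.

```c
#include <stddef.h>
#include <string.h>

/* The test sequence with explicit UTF-8 byte escapes. */
static const char TEST_SEQ[] =
    "\xF0\x9F\x92\x95" "\xF0\x9F\x92\x8B" "\xF0\x9F\x98\x98" "\xF0\x9F\xA5\xB0" /* 💕💋😘🥰 */
    "\xF0\x9F\x92\x9E" "\xE2\x9D\xA3\xEF\xB8\x8F" "\xF0\x9F\x92\x93"           /* 💞❣️💓 */
    "\xE2\x99\xA5\xEF\xB8\x8F" "\xE2\x99\xA5\xEF\xB8\x8F"                       /* ♥️♥️ */
    "\xF0\x9F\x92\x93" "\xE2\x9D\xA3\xEF\xB8\x8F" "\xF0\x9F\xA5\xB0"           /* 💓❣️🥰 */
    "\xF0\x9F\xA5\xB0" "\xF0\x9F\x98\x98" "\xF0\x9F\x92\x8B" "\xF0\x9F\x92\x8B"; /* 🥰😘💋💋 */

/* Every byte that is not a continuation byte (10xxxxxx) starts a code point. */
static size_t count_codepoints(const char *s)
{
    size_t n = 0;
    for (; *s; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}

/* Count U+FE0F VARIATION SELECTOR-16, encoded as EF B8 8F. */
static size_t count_vs16(const char *s)
{
    size_t n = 0;
    const char *p = s;
    while ((p = strstr(p, "\xEF\xB8\x8F")) != NULL) {
        n++;
        p += 3;
    }
    return n;
}
```

The string is 72 bytes and 20 code points. Four of its 16 emoji (❣️ twice, ♥️ twice) are base + U+FE0F pairs. It contains no other joiners, so an implementation that handles clusters correctly should see 20 − 4 = 16 grapheme clusters.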

Compatibility

  • Maintains full backward compatibility with existing UTF-8 handling
  • Graceful fallback when utf8proc library is not available
  • No changes to existing APIs or user-facing behavior for standard text

This commit addresses critical UTF-8 display and input issues, particularly
with emoji variations and complex Unicode sequences that caused rendering
problems in modern terminals like Ghostty.

Key improvements:
- Enhanced utf8.c with grapheme cluster detection using utf8proc
- Fixed input field handling to preserve emoji variation selectors
- Improved text measurement and cursor positioning for complex Unicode
- Added smart paste processing for multi-codepoint characters
- Bypassed problematic TRANSLIT when not needed for UTF-8 content
- Enhanced GUI entry and readline to handle grapheme clusters properly
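One way variation selectors get lost during input is a filter that discards every zero-width or non-printable code point. The following sketch keeps U+FE0F and similar attaching code points when they follow a base character, and drops them only when they have nothing to attach to. `filter_paste()` and `is_attaching()` are hypothetical names, and the PR does not show how gui-readline.c actually implements this.

```c
#include <stddef.h>
#include <stdint.h>

/* Zero-width code points that only make sense after a base character. */
static int is_attaching(uint32_t c)
{
    return (c >= 0xFE00 && c <= 0xFE0F)    /* variation selectors */
        || c == 0x200D                      /* zero width joiner */
        || (c >= 0x0300 && c <= 0x036F)    /* combining diacritical marks */
        || (c >= 0x1F3FB && c <= 0x1F3FF); /* emoji skin-tone modifiers */
}

/* Copy pasted code points to out (room for n entries); returns how many were kept. */
size_t filter_paste(const uint32_t *in, size_t n, uint32_t *out)
{
    size_t i, m = 0;

    for (i = 0; i < n; i++) {
        uint32_t c = in[i];

        if ((c < 0x20 && c != '\n') || (c >= 0x7F && c < 0xA0))
            continue; /* drop C0/C1 control characters and DEL */
        if (is_attaching(c) && (m == 0 || out[m - 1] == '\n'))
            continue; /* stray selector or mark with no base to attach to */
        out[m++] = c; /* a VS16 after a base survives here */
    }
    return m;
}
```

A filter that treated U+FE0F as "non-printable" would turn ❣️ back into the text-presentation ❣, which is exactly the width mismatch described in the next commit.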

The implementation maintains full backward compatibility while providing
proper support for modern Unicode standards, fixing display corruption
that occurred with emoji sequences containing variation selectors.

Tested with emoji sequences like: 💕💋😘🥰💞❣️💓♥️♥️💓❣️🥰🥰😘💋💋

Addresses an issue where emoji with variation selectors (like ❣️, ♥️) were
incorrectly calculated as width 1 instead of width 2, causing display
overflow in modern terminals.

- Add variation selector detection (0xFE0F) in string_advance_with_grapheme_support()
- Apply special width handling for base emoji + variation selector combinations
- Ensures consistent width calculation with input field processing
- Fixes chat window overflow issues in terminals like Ghostty
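Here is the width rule from this commit written out as a sketch. A cluster that contains U+FE0F is given emoji presentation and counted as 2 columns, regardless of the width of its base code point. `cp_width()` is a rough, hypothetical stand-in for a real width table such as `wcwidth()` or utf8proc's `utf8proc_charwidth()`. `cluster_width()` and `naive_width()` are illustrative names, not the functions in the PR.

```c
#include <stddef.h>
#include <stdint.h>

/* Heavily simplified per-code-point width (assumption, not a real table). */
static int cp_width(uint32_t c)
{
    if (c == 0x200D || (c >= 0xFE00 && c <= 0xFE0F) || (c >= 0x0300 && c <= 0x036F))
        return 0; /* zero-width: ZWJ, variation selectors, combining marks */
    if (c >= 0x1F300 && c <= 0x1FAFF)
        return 2; /* most pictographic emoji */
    return 1;     /* everything else, including U+2763 and U+2665 */
}

/* Pre-fix behaviour: sum the widths of the individual code points. */
int naive_width(const uint32_t *cps, size_t n)
{
    int w = 0;
    size_t i;
    for (i = 0; i < n; i++)
        w += cp_width(cps[i]);
    return w;
}

/* Cluster-aware width: a VS16 anywhere after the base forces 2 columns. */
int cluster_width(const uint32_t *cps, size_t n)
{
    size_t i;
    if (n == 0)
        return 0;
    for (i = 1; i < n; i++)
        if (cps[i] == 0xFE0F)
            return 2;
    return cp_width(cps[0]);
}
```

Applied to the test sequence, the per-code-point sum gives 12 × 2 + 4 × 1 = 28 columns, while the cluster rule gives 16 × 2 = 32. That is four columns of drift, one per emoji carrying U+FE0F.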

This brings string_advance logic in line with unichar_array_advance_cluster
which already had proper variation selector handling.

Relocate is_combining_char() from gui-entry.c to src/core/utf8.c
as suggested in code review. This better organizes the codebase by
placing UTF-8 character classification logic with other UTF-8
utilities in the core module.

The function now uses is_utf8() instead of checking term_type
directly, which is the appropriate method for core layer code.
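The relocated classifier could look roughly like the sketch below in core code. The code point ranges are the standard Unicode combining-mark blocks plus the emoji-related modifiers this PR deals with. `is_combining_cp()` is a hypothetical name, and the `term_is_utf8` flag stands in for the `is_utf8()` call mentioned above so that the sketch compiles on its own.

```c
#include <stdint.h>

/* Stand-in for is_utf8(); a plain flag so this sketch is self-contained. */
static int term_is_utf8 = 1;

/* Does code point c combine with the character before it? */
int is_combining_cp(uint32_t c)
{
    if (!term_is_utf8)
        return 0; /* non-UTF-8 terminals: every character stands alone */
    return (c >= 0x0300 && c <= 0x036F)    /* Combining Diacritical Marks */
        || (c >= 0x1AB0 && c <= 0x1AFF)    /* ... Extended */
        || (c >= 0x1DC0 && c <= 0x1DFF)    /* ... Supplement */
        || (c >= 0x20D0 && c <= 0x20FF)    /* ... for Symbols */
        || (c >= 0xFE20 && c <= 0xFE2F)    /* Combining Half Marks */
        || (c >= 0xFE00 && c <= 0xFE0F)    /* Variation Selectors */
        || c == 0x200D                      /* Zero Width Joiner */
        || (c >= 0x1F3FB && c <= 0x1F3FF); /* emoji skin-tone modifiers */
}
```

Keeping the charset check behind a core-level helper, rather than reading a frontend setting like term_type, lets fe-text and any other frontend share the same classification.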

πŸ€– Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
@kofany requested a review from dwfreed on October 6, 2025 18:42