Releases: starwing/luautf8
luautf8 0.2.0 - Modernized API & New Truncation Features
🚨 Breaking Changes
This release modernizes the width-related APIs with breaking changes to parameter names and order. Most users are unaffected (if you pass only the required arguments), but please review the migration guide below if you use the ambi_is_double parameter.
API Changes
utf8.width() and utf8.widthindex() now use:
- Integer ambiwidth (1 or 2) instead of boolean ambi_is_double
- New optional byte range parameters i, j for substring operations
Old signatures (v0.1.x):
utf8.width(s[, ambi_is_double[, default]])
utf8.widthindex(s, width[, ambi_is_double[, default]])
New signatures (v0.2.0):
utf8.width(s[, i[, j[, ambiwidth[, default]]]])
utf8.widthindex(s, width[, i[, j[, ambiwidth[, default]]]])
Migration Guide
Most common usage (✅ no changes needed):
utf8.width("你好") -- Still works
utf8.widthindex("你好", 3) -- Still works
If you used ambi_is_double parameter:
-- Old (v0.1.x):
utf8.width(s, true) -- ambi_is_double=true → width 2
utf8.width(s, false) -- ambi_is_double=false → width 1
-- New (v0.2.0):
utf8.width(s, nil, nil, 2) -- ambiwidth=2
utf8.width(s, nil, nil, 1) -- ambiwidth=1 (or omit)
Parameter mapping:
- ambi_is_double = true → ambiwidth = 2
- ambi_is_double = false or nil → ambiwidth = 1 or omit
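For call sites that cannot all be updated at once, the parameter mapping can be wrapped in a small shim. This is only a sketch and not part of luautf8: `legacy_width` and `fake_width` are hypothetical names, and the width function is passed in as an argument so the snippet runs without the library installed. In real code you would presumably pass the module's own `width` function instead of the stand-in.

```lua
-- Hypothetical shim: accepts the v0.1.x argument order and forwards
-- the call using the v0.2.0 order (s, i, j, ambiwidth, default).
local function legacy_width(width_fn, s, ambi_is_double, default)
  local ambiwidth = ambi_is_double and 2 or 1  -- true -> 2, false/nil -> 1
  return width_fn(s, nil, nil, ambiwidth, default)
end

-- Stand-in for utf8.width that only records its arguments,
-- so the example is runnable without luautf8 installed.
local function fake_width(s, i, j, ambiwidth, default)
  return { s = s, i = i, j = j, ambiwidth = ambiwidth, default = default }
end

local call = legacy_width(fake_width, "abc", true)
print(call.ambiwidth)  --> 2
```

Passing the width function explicitly keeps the shim independent of how the module is loaded, which also makes it easy to test.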
✨ New Features
utf8.widthlimit() - Intelligent Width-Based Truncation
A unified function for measuring display width and finding safe truncation points in UTF-8 strings.
utf8.widthlimit(s, limit[, i[, j[, ambiwidth[, default]]]]) --> pos, remain
Features:
- Positive limit: Truncate from front (keep prefix)
- Negative limit: Truncate from back (keep suffix)
- Omit limit: Calculate display width of byte range
- Returns truncation position (safe character boundary) and remaining width
Examples:
-- Measure width of substring
local pos, width = utf8.widthlimit("你好world", nil, 1, 11)
-- pos=11, width=9
-- Truncate from front (keep prefix)
local pos, remain = utf8.widthlimit("hello world", 5)
-- pos=5, remain=0 → s:sub(1, pos) == "hello"
-- Truncate from back (keep suffix)
local pos, remain = utf8.widthlimit("/path/to/file.lua", -8)
-- pos=10, remain=0 → s:sub(pos) == "file.lua"
-- Handle fullwidth characters
local pos, remain = utf8.widthlimit("你好世界", 5)
-- pos=6, remain=1 (2 fullwidth chars fit, 1 width unused)
Use cases:
- Terminal output formatting
- Text truncation with ellipsis
- Column-width calculations
- Path shortening
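As an illustration of the ellipsis use case, the following sketch combines the measuring mode (limit omitted) with a positive limit. `truncate_with_ellipsis` and `ascii_widthlimit` are hypothetical names; the latter is an ASCII-only stand-in that mimics the return values described above, so the snippet runs without luautf8. It also assumes that omitting i and j makes the real function cover the whole string.

```lua
-- Hypothetical helper: shorten s to at most max_width columns,
-- appending "..." when something was cut off.
-- widthlimit is expected to behave like utf8.widthlimit as described above.
local function truncate_with_ellipsis(widthlimit, s, max_width)
  assert(max_width > 3, "max_width must leave room for the ellipsis")
  local _, total = widthlimit(s, nil)        -- limit omitted: measure only
  if total <= max_width then return s end
  local pos = widthlimit(s, max_width - 3)   -- positive limit: keep a prefix
  return s:sub(1, pos) .. "..."
end

-- ASCII-only stand-in (one byte = one column); it handles only a nil or
-- positive limit. Pass utf8.widthlimit instead for real UTF-8 text.
local function ascii_widthlimit(s, limit)
  if limit == nil then return #s, #s end
  local pos = math.min(limit, #s)
  return pos, limit - pos
end

print(truncate_with_ellipsis(ascii_widthlimit, "hello world", 8))  --> hello...
print(truncate_with_ellipsis(ascii_widthlimit, "hello", 8))        --> hello
```

Because the cut position comes from the width function rather than from a byte count, the same helper should never split a multi-byte character when given the real utf8.widthlimit.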
Enhanced Width Functions
Both utf8.width() and utf8.widthindex() now support byte range parameters for substring operations:
-- Calculate width of bytes 6-11
local width = utf8.width("hello你好world", 6, 11)
-- width=4 ("你好")
-- Find character at width 3 within bytes 6-11
local idx = utf8.widthindex("hello你好world", 3, 6, 11)
-- Search only within "你好" substring
Version Constant
Added the utf8.version constant (value "0.2.0").
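Since the constant is new in this release, its presence can serve as a rough feature check for code that has to run against both v0.1.x and v0.2.0. The sketch below uses a hypothetical helper, `has_new_width_api`, and plain tables standing in for the loaded module so that it runs anywhere.

```lua
-- Hypothetical helper: true when the module exposes the version
-- constant added in 0.2.0 (and therefore the new width signatures).
local function has_new_width_api(lib)
  return type(lib.version) == "string"
end

-- Stand-in tables instead of the real module, so this runs without luautf8.
local old_lib = {}                      -- v0.1.x: no version field
local new_lib = { version = "0.2.0" }   -- v0.2.0: version constant present

print(has_new_width_api(old_lib))  --> false
print(has_new_width_api(new_lib))  --> true
```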
📚 Documentation Improvements
- Rewritten API docs in Lua official manual style
- Consistent parameter naming:
  - s = string
  - i, j = byte positions (1-based, inclusive)
  - n = character index
  - ambiwidth = ambiguous-width handling (1 or 2)
- Comprehensive examples for all functions
- Fixed grammar and formatting throughout README
🧪 Testing
- Added extensive test coverage for utf8.widthlimit()
  - Basic truncation (positive/negative limits)
  - Fullwidth characters and mixed-width strings
  - Substring ranges and edge cases
  - Ambiguous-width character handling
- Updated existing tests for new API signatures
- All tests passing with 100% coverage
🔧 Technical Details
Why the API change?
- Consistency: Integer ambiwidth is more intuitive than boolean ambi_is_double
- Flexibility: Byte range parameters enable efficient substring width operations
- Clarity: "ambiwidth=2" is clearer than "ambi_is_double=true means width 2"
Impact assessment:
- Estimated affected users: <5% (most don't pass ambi_is_double)
- Breaking changes are caught immediately at runtime (wrong parameter count)
- Migration is straightforward (see guide above)
📦 Installation
LuaRocks:
luarocks install luautf8
Manual:
git clone https://github.com/starwing/luautf8.git
cd luautf8
# Build and install (see README for details)
🙏 Acknowledgments
Thanks to all users and contributors! Special thanks to the Unicode Consortium for test data and the Lua community for feedback.
Questions or issues? Please open an issue on GitHub.
📋 Full Changelog
- BREAKING: utf8.width() and utf8.widthindex() parameter order changed
- BREAKING: ambi_is_double (boolean) replaced with ambiwidth (integer)
- NEW: utf8.widthlimit() for intelligent width-based truncation
- NEW: Byte range parameters i, j for width functions
- NEW: utf8.version constant
- IMPROVED: Complete documentation rewrite with examples
- IMPROVED: Comprehensive test coverage for all new features
- FIXED: Various grammar and formatting issues in documentation
0.1.6
What's Changed
- Add 'normalize_nfc' and 'isnfc' functions by @alexdowad in #44
- Update to Unicode 15.1 by @data-man in #45
- Add new 'grapheme_indices' function by @alexdowad in #47
- Improve grammar, spelling, and formatting of README.md by @alexdowad in #50
- Fix bugs in NFC normalization code by @alexdowad in #51
- Explicitly include limits.h instead of transitively assuming it by @alerque in #55
Full Changelog: 0.1.5...0.1.6
Add `clean` and `isvalid` functions
- add clean, isvalid, invalidposition functions
- add fuzzing test
Thanks to @alexdowad
Update Unicode Standard to 15.0
0.1.4 release new version to luarocks
Bugfix Release
make a new release for #31, changes:
- update Unicode version to 14
- Fix compile error on CentOS6
Bugfix release
This is a bugfix release, as I don't have much time or many ideas for new features in this project.
release 0.1.1
Fix an issue with encoding/decoding large code points.
release 0.1.0-1
release 0.1.0