Releases: starwing/luautf8

luautf8 0.2.0

12 Nov 17:24

luautf8 0.2.0 - Modernized API & New Truncation Features

🚨 Breaking Changes

This release modernizes the width-related APIs with breaking changes to parameter names and order. Most users are unaffected (if you only pass the string argument), but please review the migration guide below if you use the ambi_is_double parameter.

API Changes

utf8.width() and utf8.widthindex() now use:

  • Integer ambiwidth (1 or 2) instead of boolean ambi_is_double
  • New optional byte range parameters i, j for substring operations

Old signatures (v0.1.x):

utf8.width(s[, ambi_is_double[, default]])
utf8.widthindex(s, width[, ambi_is_double[, default]])

New signatures (v0.2.0):

utf8.width(s[, i[, j[, ambiwidth[, default]]]])
utf8.widthindex(s, width[, i[, j[, ambiwidth[, default]]]])

Migration Guide

Most common usage (✅ no changes needed):

utf8.width("你好")              -- Still works
utf8.widthindex("你好", 3)      -- Still works

If you used ambi_is_double parameter:

-- Old (v0.1.x):
utf8.width(s, true)          -- ambi_is_double=true → width 2
utf8.width(s, false)         -- ambi_is_double=false → width 1

-- New (v0.2.0):
utf8.width(s, nil, nil, 2)   -- ambiwidth=2
utf8.width(s, nil, nil, 1)   -- ambiwidth=1 (or omit)

Parameter mapping:

  • ambi_is_double = true → ambiwidth = 2
  • ambi_is_double = false or nil → ambiwidth = 1 or omit
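
For codebases with many old-style call sites, a thin wrapper can apply this mapping in one place. The following is a minimal sketch, not part of luautf8: make_compat and the compat table are hypothetical names. It assumes the module loads as "lua-utf8" and that the new functions accept nil for i, j, and default (the utf8.width(s, nil, nil, 2) form above suggests this for i and j).

```lua
-- Hypothetical helper (not part of luautf8): wraps a 0.2.0-style library
-- table `u` so that 0.1.x-style calls keep working.
local function make_compat(u)
  local compat = {}

  -- Old signature: width(s[, ambi_is_double[, default]])
  function compat.width(s, ambi_is_double, default)
    -- true maps to ambiwidth 2; false or nil maps to ambiwidth 1
    return u.width(s, nil, nil, ambi_is_double and 2 or 1, default)
  end

  -- Old signature: widthindex(s, width[, ambi_is_double[, default]])
  function compat.widthindex(s, width, ambi_is_double, default)
    return u.widthindex(s, width, nil, nil, ambi_is_double and 2 or 1, default)
  end

  return compat
end

-- Usage, guarded so the snippet also loads where luautf8 is not installed.
-- Assumption: the module name is "lua-utf8".
local ok, utf8 = pcall(require, "lua-utf8")
if ok and utf8.version then -- utf8.version was added in 0.2.0
  local compat = make_compat(utf8)
  print(compat.width("你好", true))
end
```

Taking the library table as an argument keeps the wrapper easy to test against a stub and avoids patching the shared utf8 table.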

✨ New Features

utf8.widthlimit() - Intelligent Width-Based Truncation

A unified function for measuring display width and finding safe truncation points in UTF-8 strings.

utf8.widthlimit(s, limit[, i[, j[, ambiwidth[, default]]]]) --> pos, remain

Features:

  • Positive limit: Truncate from front (keep prefix)
  • Negative limit: Truncate from back (keep suffix)
  • Omit limit: Calculate display width of byte range
  • Returns truncation position (safe character boundary) and remaining width

Examples:

-- Measure width of substring
local pos, width = utf8.widthlimit("你好world", nil, 1, 11)
-- pos=11, width=9

-- Truncate from front (keep prefix)
local pos, remain = utf8.widthlimit("hello world", 5)
-- pos=5, remain=0 → s:sub(1, pos) == "hello"

-- Truncate from back (keep suffix)
local pos, remain = utf8.widthlimit("/path/to/file.lua", -8)
-- pos=10, remain=0 → s:sub(pos) == "file.lua"

-- Handle fullwidth characters
local pos, remain = utf8.widthlimit("你好世界", 5)
-- pos=6, remain=1 (2 fullwidth chars fit, 1 width unused)

Use cases:

  • Terminal output formatting
  • Text truncation with ellipsis
  • Column-width calculations
  • Path shortening
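
As an illustration of the ellipsis use case, here is a minimal sketch built only on the calls shown above. The truncate helper and its parameters are hypothetical, not part of luautf8. It uses an ASCII "..." marker so that the marker's width does not depend on the ambiwidth setting, and it assumes the module loads as "lua-utf8".

```lua
-- Hypothetical helper (not part of luautf8): shorten `s` to at most `max`
-- display columns, appending `ellipsis` when text is cut off.
-- `u` is the luautf8 library table (0.2.0 or later).
local function truncate(u, s, max, ellipsis)
  ellipsis = ellipsis or "..."
  if u.width(s) <= max then
    return s -- already fits, nothing to do
  end
  local room = max - u.width(ellipsis)
  assert(room >= 1, "max is too small to fit any text plus the ellipsis")
  -- Positive limit: keep a prefix; pos is a safe character boundary.
  local pos = u.widthlimit(s, room)
  return s:sub(1, pos) .. ellipsis
end

-- Usage, guarded so the snippet also loads where luautf8 is not installed.
-- Assumption: the module name is "lua-utf8".
local ok, utf8 = pcall(require, "lua-utf8")
if ok and utf8.widthlimit then
  print(truncate(utf8, "hello world", 8))
end
```

When a fullwidth character does not fit exactly, the result may be one column narrower than max; the remain value returned by utf8.widthlimit() could be used to pad it.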

Enhanced Width Functions

Both utf8.width() and utf8.widthindex() now support byte range parameters for substring operations:

-- Calculate width of bytes 6-11
local width = utf8.width("hello你好world", 6, 11)
-- width=4 ("你好")

-- Find character at width 3 within bytes 6-11
local idx = utf8.widthindex("hello你好world", 3, 6, 11)
-- Search only within "你好" substring

Version Constant

Added the utf8.version constant (value "0.2.0").
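
Code that has to run against both 0.1.x and 0.2.0 could branch on this constant. The sketch below is an assumption-laden illustration: version_at_least is a hypothetical helper, it assumes the constant is a "major.minor.patch" string, and it treats a missing constant (as in 0.1.x) as "older than any requested version".

```lua
-- Hypothetical helper (not part of luautf8): compare a "major.minor.patch"
-- version string against a minimum version. Returns false when `version`
-- is missing or not in that form.
local function version_at_least(version, major, minor, patch)
  if type(version) ~= "string" then
    return false
  end
  local a, b, c = version:match("^(%d+)%.(%d+)%.(%d+)")
  if not a then
    return false
  end
  a, b, c = tonumber(a), tonumber(b), tonumber(c)
  if a ~= major then return a > major end
  if b ~= minor then return b > minor end
  return c >= patch
end

-- Usage, guarded so the snippet also loads where luautf8 is not installed.
-- Assumption: the module name is "lua-utf8".
local ok, utf8 = pcall(require, "lua-utf8")
if ok then
  if version_at_least(utf8.version, 0, 2, 0) then
    print(utf8.width("你好", nil, nil, 2)) -- new-style call
  else
    print(utf8.width("你好", true))        -- old-style call
  end
end
```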


📚 Documentation Improvements

  • Rewritten API docs in Lua official manual style
  • Consistent parameter naming:
    • s = string
    • i, j = byte positions (1-based, inclusive)
    • n = character index
    • ambiwidth = ambiguous-width handling (1 or 2)
  • Comprehensive examples for all functions
  • Fixed grammar and formatting throughout README

🧪 Testing

  • Added extensive test coverage for utf8.widthlimit()
    • Basic truncation (positive/negative limits)
    • Fullwidth characters and mixed-width strings
    • Substring ranges and edge cases
    • Ambiguous-width character handling
  • Updated existing tests for new API signatures
  • All tests passing with 100% coverage

🔧 Technical Details

Why the API change?

  • Consistency: Integer ambiwidth is more intuitive than boolean ambi_is_double
  • Flexibility: Byte range parameters enable efficient substring width operations
  • Clarity: "ambiwidth=2" is clearer than "ambi_is_double=true means width 2"

Impact assessment:

  • Estimated affected users: <5% (most don't pass ambi_is_double)
  • Breaking changes are caught immediately at runtime (wrong parameter count)
  • Migration is straightforward (see guide above)

📦 Installation

LuaRocks:

luarocks install luautf8

Manual:

git clone https://github.com/starwing/luautf8.git
cd luautf8
# Build and install (see README for details)

🙏 Acknowledgments

Thanks to all users and contributors! Special thanks to the Unicode Consortium for test data and the Lua community for feedback.

Questions or issues? Please open an issue on GitHub.


📋 Full Changelog

  • BREAKING: utf8.width() and utf8.widthindex() parameter order changed
  • BREAKING: ambi_is_double (boolean) replaced with ambiwidth (integer)
  • NEW: utf8.widthlimit() for intelligent width-based truncation
  • NEW: Byte range parameters i, j for width functions
  • NEW: utf8.version constant
  • IMPROVED: Complete documentation rewrite with examples
  • IMPROVED: Comprehensive test coverage for all new features
  • FIXED: Various grammar and formatting issues in documentation

0.1.6

03 Jan 10:39

What's Changed

  • Add 'normalize_nfc' and 'isnfc' functions by @alexdowad in #44
  • Update to Unicode 15.1 by @data-man in #45
  • Add new 'grapheme_indices' function by @alexdowad in #47
  • Improve grammar, spelling, and formatting of README.md by @alexdowad in #50
  • Fix bugs in NFC normalization code by @alexdowad in #51
  • Explicitly include limits.h instead of transitively assuming it by @alerque in #55

Full Changelog: 0.1.5...0.1.6

Add `clean` and `isvalid` functions

01 Dec 14:59

  • add clean, isvalid, invalidposition functions
  • add fuzzing test

Thanks to @alexdowad

Update Unicode Standard to 15.0

01 Oct 14:37

0.1.4

Release a new version to LuaRocks.

Bugfix Release

29 Jul 14:51

A new release for #31. Changes:

  • Update Unicode version to 14
  • Fix compile error on CentOS 6

Bugfix release

05 Apr 14:47

This is a bugfix release, as I don't have much time or many ideas for new features in this project.

release 0.1.1

31 May 08:50

Fix an issue with encoding and decoding large code points.

release 0.1.0-1

14 May 15:23

release 0.1.0