pure-lua 5.3 regex library for Lua 5.3, Lua 5.1, LuaJIT
This library provides simple way to add UTF-8 support into your application.
local utf8 = require('.utf8'):init()
for k,v in pairs(utf8) do
string[k] = v
end
local str = "пыщпыщ ололоо я водитель нло"
print(str:find("(.л.+)н"))
-- 8 26 ололоо я водитель
print(str:gsub("ло+", "보라"))
-- пыщпыщ о보라보라 я водитель н보라 3
print(str:match("^п[лопыщ ]*я"))
-- пыщпыщ ололоо яThis library can be used as drop-in replacement for vanilla string library. It exports all vanilla functions under raw sub-object.
local utf8 = require('.utf8'):init()
local str = "пыщпыщ ололоо я водитель нло"
utf8.gsub(str, "ло+", "보라")
-- пыщпыщ о보라보라 я водитель н보라 3
utf8.raw.gsub(str, "ло+", "보라")
-- пыщпыщ о보라보라о я водитель н보라 3It also provides all functions from Lua 5.3 UTF-8 module except utf8.len (s [, i [, j]]). If you need to validate your strings use utf8.validate(str, byte_pos) or iterate over with utf8.validator.
Please note that library assumes regexes are valid UTF-8 strings, if you need to manipulate individual bytes use vanilla functions under utf8.raw.
Download repository to your project folder. (no rockspecs yet)
Examples assume library placed under utf8 subfolder not utf8.lua.
As of Lua 5.3 default utf8 module has precedence over user-provided. In this case you can specify full module path (.utf8).
Library is highly modular. You can provide your implementation for almost any function used. Library already has several back-ends:
- Runtime character class processing using hardcoded codepoint ranges or using native functions through
ffi. - Basic functions for working with UTF-8 characters have specializations for
ffi-enabled runtime and for tarantool.
Probably most interesting customizations are utf8.config.loadstring and utf8.config.cache if you want to precompile your regexes.
local utf8 = require('.utf8')
utf8.config = {
cache = my_smart_cache,
}
utf8:init()For lower and upper functions to work in environments where ffi cannot be used, you can specify substitution tables (data example)
local utf8 = require('.utf8')
utf8.config = {
conversion = {
uc_lc = utf8_uc_lc,
lc_uc = utf8_lc_uc
},
}
utf8:init()Customization is done before initialization. If you want, you can change configuration after init, it might work for everything but modules. All of them should be reloaded.
Please provide example script that causes error together with environment description and debug output. Debug output can be obtained like:
local utf8 = require('.utf8')
utf8.config = {
debug = utf8:require("util").debug
}
utf8:init()
-- your codeDefault logger used is io.write and can be changed by specifying logger = my_logger in configuration