A pure Erlang IDNA implementation following RFC 5891.
Current Unicode version: 17.0.0
- IDNA 2008 compliance with RFC 5891
- IDNA 2003 backward compatibility
- UTS #46 compatibility processing (Unicode Technical Standard #46)
- Full label validation:
  - NFC normalization check
  - Hyphen placement rules
  - Leading combining marks check
  - Contextual rules (CONTEXTJ/CONTEXTO)
  - Bidirectional text rules (RFC 5893)

Add to your rebar.config:

```erlang
{deps, [
    {idna, "7.1.0"}
]}.
```

Add to your mix.exs:

```elixir
defp deps do
  [
    {:idna, "~> 7.1"}
  ]
end
```

```erlang
%% Basic encoding
1> idna:encode("münchen.de").
"xn--mnchen-3ya.de"
2> idna:encode("βόλος.com").
"xn--nxasmm1c.com"
%% Japanese domain with UTS #46 processing
3> idna:encode("日本語.JP", [uts46]).
"xn--wgv71a119e.jp"
```

```erlang
%% Basic decoding
1> idna:decode("xn--mnchen-3ya.de").
"münchen.de"
2> idna:decode("xn--nxasmm1c.com").
"βόλος.com"
```

The `encode/2` and `decode/2` functions accept an options list:

| Option | Default | Description |
|---|---|---|
| `uts46` | `false` | Enable UTS #46 compatibility processing |
| `std3_rules` | `false` | Enforce STD3 ASCII rules |
| `transitional` | `false` | Use transitional processing (IDNA 2003 compatibility) |
| `strict` | `false` | Only use ASCII period (`.`) as label separator |

```erlang
%% UTS #46 processing normalizes and maps characters
1> idna:encode("Ⅷ.com", [uts46]).
"viii.com"
%% Transitional processing (ß → ss)
2> idna:encode("faß.de", [uts46, transitional]).
"fass.de"
%% Non-transitional (default) preserves ß
3> idna:encode("faß.de", [uts46]).
"xn--fa-hia.de"
%% STD3 rules reject certain characters
4> idna:encode("_example.com", [uts46, std3_rules]).
** exception exit: {invalid_codepoint,95}
```

| Function | Description |
|---|---|
| `encode/1,2` | Encode a Unicode domain name to ASCII (Punycode) |
| `decode/1,2` | Decode an ASCII domain name to Unicode |
| `alabel/1` | Convert a single label to ASCII form (A-label) |
| `ulabel/1` | Convert a single label to Unicode form (U-label) |

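The STD3 example above shows `idna:encode/2` signalling invalid input with an exit. Callers that prefer tagged tuples can wrap the call; the following is a minimal sketch, not part of the library. The module name `idna_safe` and the functions `safe_encode/1,2` and `wrap/1` are hypothetical, and the sketch assumes that other encoding failures are also raised as exits.

```erlang
-module(idna_safe).
-export([safe_encode/1, safe_encode/2, wrap/1]).

%% Hypothetical helper: encode with default options.
safe_encode(Domain) ->
    safe_encode(Domain, []).

%% Hypothetical helper: encode with an options list such as [uts46].
safe_encode(Domain, Opts) ->
    wrap(fun() -> idna:encode(Domain, Opts) end).

%% Run Fun and turn an exit into {error, Reason}.
%% Errors and throws are deliberately left untouched (assumption:
%% the library reports invalid input via exit only).
wrap(Fun) ->
    try Fun() of
        Value -> {ok, Value}
    catch
        exit:Reason -> {error, Reason}
    end.
```

Based on the shell session above, `idna_safe:safe_encode("_example.com", [uts46, std3_rules])` should yield `{error, {invalid_codepoint, 95}}` instead of terminating the calling process.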
Validation functions:

| Function | Description |
|---|---|
| `check_label/1,4` | Validate a domain label |
| `check_nfc/1` | Check NFC normalization |
| `check_hyphen/1` | Check hyphen placement rules |
| `check_context/1` | Check contextual rules |
| `check_initial_combiner/1` | Check for leading combining marks |
| `check_label_length/1` | Check label length (max 63 octets) |

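The validation functions operate on a single label, so checking a whole domain means splitting it first. The sketch below collects the labels that fail validation. It assumes that `idna:check_label/1` accepts a Unicode label as a character list, returns normally for a valid label, and exits otherwise; the README does not state its return value, so any normal return is treated as success. The module `idna_labels` and its functions are hypothetical.

```erlang
-module(idna_labels).
-export([invalid_labels/1, failures/2]).

%% Hypothetical helper: split Domain on "." and report every label
%% that idna:check_label/1 rejects, together with the exit reason.
invalid_labels(Domain) ->
    Labels = string:lexemes(Domain, "."),
    failures(fun idna:check_label/1, Labels).

%% Apply Check to each label. A normal return counts as valid
%% (assumption); an exit is recorded as {Label, Reason}.
failures(Check, Labels) ->
    lists:filtermap(
      fun(Label) ->
              try Check(Label) of
                  _ -> false
              catch
                  exit:Reason -> {true, {Label, Reason}}
              end
      end,
      Labels).
```

An empty result means every label passed. Note that `string:lexemes/2` drops empty labels, so a doubled separator such as `"a..b"` is not reported by this sketch.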
Older function names and their replacements:

| Function | Replacement |
|---|---|
| `to_ascii/1` | Use `encode/1` |
| `to_unicode/1` | Use `decode/1` |
| `from_ascii/1` | Use `decode/1` |
| `utf8_to_ascii/1` | Use `encode/1` |

Full API documentation is available on HexDocs.
Generate documentation locally:
```shell
rebar3 ex_doc
```

This library currently supports Unicode 17.0.0. To update to a new Unicode version, first download the data files, replacing VERSION with the target version (e.g., 17.0.0):

```shell
# Core Unicode data files
wget -O uc_spec/UnicodeData.txt https://www.unicode.org/Public/VERSION/ucd/UnicodeData.txt
wget -O uc_spec/ArabicShaping.txt https://www.unicode.org/Public/VERSION/ucd/ArabicShaping.txt
wget -O uc_spec/Scripts.txt https://www.unicode.org/Public/VERSION/ucd/Scripts.txt
# IDNA-specific files (path structure as of Unicode 17.0.0)
wget -O uc_spec/IdnaMappingTable.txt https://www.unicode.org/Public/VERSION/idna/IdnaMappingTable.txt
wget -O test/IdnaTestV2.txt https://www.unicode.org/Public/VERSION/idna/IdnaTestV2.txt
```

Use the kjd/idna Python tool:

```shell
git clone --depth 1 https://github.com/kjd/idna.git /tmp/kjd-idna
python3 /tmp/kjd-idna/tools/idna-data make-table --version VERSION > uc_spec/idna-table.txt
rm -rf /tmp/kjd-idna
```

If the tool needs additional files, use the `--source` option (run this before removing the clone):

```shell
python3 /tmp/kjd-idna/tools/idna-data make-table --version VERSION --source uc_spec > uc_spec/idna-table.txt
```

Regenerate the Erlang modules:

```shell
cd uc_spec
./gen_idnadata_mod.escript
./gen_idna_table_mod.escript
./gen_idna_mapping_mod.escript
cd ..
```

Run the test suite:

```shell
rebar3 eunit
```

MIT License - see LICENSE for details.

Contributions are welcome! Please feel free to submit a Pull Request.