unicode-script.js

Retrieve all Unicode script(s) used in a string.

Includes support for the Script_Extensions (scx) property, which covers characters that are "commonly used with more than one script, but with a limited number of scripts".

Based on Script_Extensions, this library can also return the augmented script set, which is used to decide whether a string is mixed-script or single-script. Mixed scripts can indicate suspicious user input.

Unicode version: 16.0.0 (September 2024)

Install

Use npm or your favorite package manager to install this module:

npm install --save unicode-script

Or use the ESM module directly from the browser.

Usage - Script

Each code point belongs to exactly one script, which may be one of the three special values Common, Inherited, or Unknown.

`unicodeScript(char)` / `unicodeScriptCode(char)`

// Single character

import { unicodeScript } from "unicode-script";
unicodeScript("ᴦ"); // "Greek"
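
To get a feel for the Script property without installing anything, the following sketch uses the regular expression property escapes built into JavaScript instead of this library. `sketchScriptOf` and `SAMPLE_SCRIPTS` are hypothetical names made up for illustration, the script list is only a small sample, and the engine's Unicode version may differ from the 16.0.0 data shipped here.

```javascript
// Illustrative subset only; unicode-script covers the full Unicode data.
const SAMPLE_SCRIPTS = ["Latin", "Greek", "Cyrillic", "Common", "Inherited"];

// Hypothetical helper (not part of unicode-script): returns the name of the
// sample script whose Script property matches `char`, or null if none does.
function sketchScriptOf(char) {
  for (const name of SAMPLE_SCRIPTS) {
    // \p{Script=...} tests the Script property of a single code point.
    if (new RegExp(`^\\p{Script=${name}}$`, "u").test(char)) {
      return name;
    }
  }
  return null;
}

console.log(sketchScriptOf("\u1D26")); // "Greek" (GREEK LETTER SMALL CAPITAL GAMMA)
console.log(sketchScriptOf("1"));      // "Common" (digits are shared by many scripts)
console.log(sketchScriptOf("\u0301")); // "Inherited" (COMBINING ACUTE ACCENT)
```

Because every code point has exactly one Script value, the order of the sample list does not affect the result.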

`unicodeScripts(string)` / `unicodeScriptCodes(string)`

import { unicodeScripts, unicodeScriptCodes } from "unicode-script";

// Set of all scripts of a string
unicodeScripts("СC"); // Set(2) { 'Cyrillic', 'Latin' }
unicodeScripts("𐱐"); // Set(1) { 'Unknown' }

// Get all scripts of string in ISO 15924 four-letter codes
unicodeScriptCodes("СC"); // Set(2) { 'Cyrl', 'Latn' }

Usage - Script Extensions

Each code point can belong to multiple script extensions.

`unicodeScriptExtensions(string)` / `unicodeScriptExtensionCodes(string)`

import { unicodeScriptExtensions } from "unicode-script";
unicodeScriptExtensions("॥");
// Set(23) {
//   'Bengali',
//   'Devanagari',
//   'Dogra',
//   'Grantha',
//   'Gujarati',
//   'Gunjala_Gondi',
//   'Gurmukhi',
//   'Gurung_Khema',
//   'Kannada',
//   'Khudawadi',
//   'Limbu',
//   'Mahajani',
//   'Malayalam',
//   'Masaram_Gondi',
//   'Nandinagari',
//   'Ol_Onal',
//   'Oriya',
//   'Sinhala',
//   'Syloti_Nagri',
//   'Takri',
//   'Tamil',
//   'Telugu',
//   'Tirhuta'
// }
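
The difference between Script and Script_Extensions can also be seen with the property escapes built into JavaScript regular expressions. This is a rough sketch that does not use this library: `sketchScriptExtensionsOf` is a hypothetical helper, it only probes the candidate scripts passed in, and the result depends on the Unicode version of the engine.

```javascript
// Hypothetical helper (not part of unicode-script): returns the subset of
// `candidates` that the engine lists in the Script_Extensions of `char`.
function sketchScriptExtensionsOf(char, candidates) {
  return new Set(
    candidates.filter((name) =>
      new RegExp(`^\\p{Script_Extensions=${name}}$`, "u").test(char)
    )
  );
}

const danda = "\u0965"; // U+0965 DEVANAGARI DOUBLE DANDA
const found = sketchScriptExtensionsOf(danda, ["Devanagari", "Tamil", "Latin"]);
console.log(found); // contains 'Devanagari' and 'Tamil', but not 'Latin'

// The Script property of the same character is still the single value Common.
console.log(/^\p{Script=Common}$/u.test(danda)); // true
```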

Usage - Augmented Scripts

Like script extensions, but adds the meta scripts used for Asian languages (Hanb, Jpan, Kore) and treats Common/Inherited values as ALL scripts.

`unicodeAugmentedScriptCodes(char)`

import { unicodeAugmentedScriptCodes } from "unicode-script";

unicodeAugmentedScriptCodes("ねガ"); // Set(3) { 'Hira', 'Kana', 'Jpan' }
unicodeAugmentedScriptCodes("1"); // Set(175) { 'Adlm', 'Aghb', 'Ahom', … }
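
The augmentation step can be sketched for a single character as follows, based on the rules given in UTS39 (section 5.1). This is not the library's code: `sketchAugment` and the `ALL_SCRIPTS` sentinel are hypothetical, and the sentinel is a simplification, since the library returns a full set of codes for Common/Inherited characters.

```javascript
// Sentinel standing in for "every script" (a simplification for this sketch).
const ALL_SCRIPTS = "ALL";

// Hypothetical helper: derives the augmented script set from the
// Script_Extensions codes of one character, following UTS39 section 5.1.
function sketchAugment(scxCodes) {
  // Common (Zyyy) and Inherited (Zinh) characters match every script.
  if (scxCodes.has("Zyyy") || scxCodes.has("Zinh")) return ALL_SCRIPTS;

  const augmented = new Set(scxCodes);
  if (scxCodes.has("Hani")) {
    for (const code of ["Hanb", "Jpan", "Kore"]) augmented.add(code);
  }
  if (scxCodes.has("Hira") || scxCodes.has("Kana")) augmented.add("Jpan");
  if (scxCodes.has("Hang")) augmented.add("Kore");
  if (scxCodes.has("Bopo")) augmented.add("Hanb");
  return augmented;
}

console.log(sketchAugment(new Set(["Hira"]))); // Set(2) { 'Hira', 'Jpan' }
console.log(sketchAugment(new Set(["Hani"]))); // Set(4) { 'Hani', 'Hanb', 'Jpan', 'Kore' }
console.log(sketchAugment(new Set(["Latn"]))); // Set(1) { 'Latn' }
console.log(sketchAugment(new Set(["Zyyy"]))); // ALL
```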

Usage - Resolved Script

The resolved script set is the intersection of the augmented script sets of all characters in the string.

`unicodeResolvedScriptCodes(string)`

import { unicodeResolvedScriptCodes } from "unicode-script";

unicodeResolvedScriptCodes("СігсӀе"); // Set(1) { 'Cyrl' }
unicodeResolvedScriptCodes("Сirсlе"); // Set(0) {}
unicodeResolvedScriptCodes("𝖢𝗂𝗋𝖼𝗅𝖾"); // Set(175) { 'Adlm', 'Aghb', 'Ahom', … }
unicodeResolvedScriptCodes("1"); // Set(175) { 'Adlm', 'Aghb', 'Ahom', … }
unicodeResolvedScriptCodes("ねガ"); // Set(3) { 'Hira', 'Kana', 'Jpan' }

Please note that the resolved script can contain multiple scripts, as per standard.
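
The intersection itself can be sketched with plain sets. The example below is a simplified illustration and not the library's implementation: `sketchResolve` and the `ALL_SCRIPTS` sentinel are hypothetical, and the per-character augmented sets are written out by hand instead of being looked up in Unicode data.

```javascript
// Sentinel for "all scripts" (the augmented set of Common/Inherited characters).
const ALL_SCRIPTS = "ALL";

// Hypothetical helper: intersects the per-character augmented script sets.
function sketchResolve(augmentedSets) {
  let resolved = ALL_SCRIPTS;
  for (const current of augmentedSets) {
    if (current === ALL_SCRIPTS) continue; // ALL never narrows the result
    resolved =
      resolved === ALL_SCRIPTS
        ? new Set(current)
        : new Set([...resolved].filter((code) => current.has(code)));
  }
  return resolved;
}

const cyrl = new Set(["Cyrl"]);
const latn = new Set(["Latn"]);

console.log(sketchResolve([cyrl, cyrl, cyrl]));        // Set(1) { 'Cyrl' }
console.log(sketchResolve([cyrl, latn, cyrl]));        // Set(0) {}
console.log(sketchResolve([latn, ALL_SCRIPTS, latn])); // Set(1) { 'Latn' }
console.log(sketchResolve([ALL_SCRIPTS]));             // ALL
```

An empty result means that no single script covers every character.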

Usage - Mixed-Script Detection

A string is mixed-script if its resolved script set is empty, and single-script otherwise.

`isMixedScript(string)` / `isSingleScript(string)`

import { isMixedScript, isSingleScript } from "unicode-script";

isMixedScript("СігсӀе"); // false
isMixedScript("Сirсlе"); // true
isMixedScript("𝖢𝗂𝗋𝖼𝗅𝖾"); // false
isMixedScript("1"); // false
isMixedScript("ねガ"); // false

isSingleScript("СігсӀе"); // true
isSingleScript("Сirсlе"); // false
isSingleScript("𝖢𝗂𝗋𝖼𝗅𝖾"); // true
isSingleScript("1"); // true
isSingleScript("ねガ"); // true

Please note that a single-script string might actually contain multiple scripts, as per standard (e.g. for Asian languages).
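
For a quick approximation of this check without the library, the idea can be expressed with built-in regular expression property escapes: a string counts as single-script if at least one script covers every character. This is only a sketch under several assumptions: `sketchIsSingleScript` and `CANDIDATE_SCRIPTS` are hypothetical, only three scripts are probed, the Hanb/Jpan/Kore augmentation is ignored, and results depend on the engine's Unicode version.

```javascript
// Illustrative subset; a real check needs every script plus Hanb/Jpan/Kore.
const CANDIDATE_SCRIPTS = ["Latin", "Cyrillic", "Greek"];

// Hypothetical helper: true if at least one candidate script covers every
// code point, with plain Common and Inherited characters matching any script.
function sketchIsSingleScript(text, candidates = CANDIDATE_SCRIPTS) {
  return candidates.some((name) =>
    new RegExp(
      `^[\\p{scx=${name}}\\p{scx=Common}\\p{scx=Inherited}]*$`,
      "u"
    ).test(text)
  );
}

// Escapes make the otherwise invisible difference explicit.
const allCyrillic = "\u0421\u0456\u0433\u0441\u04C0\u0435"; // Cyrillic lookalike of "Circle"
const spoofed = "\u0421ir\u0441l\u0435"; // Cyrillic С, с, е mixed with Latin i, r, l

console.log(sketchIsSingleScript(allCyrillic)); // true
console.log(sketchIsSingleScript(spoofed));     // false
console.log(sketchIsSingleScript("Circle 1"));  // true
```

Using `scx=Common` instead of `Script=Common` keeps characters with explicit script extensions from matching every script.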

List of All Scripts

Script names and short names can be retrieved like this:

import { listUnicodeScripts } from "unicode-script";
listUnicodeScripts(); // Set(172) { 'Adlam', 'Ahom', 'Anatolian_Hieroglyphs', … }

import { listUnicodeScriptCodes } from "unicode-script";
listUnicodeScriptCodes(); // Set(172) { 'Adlm', 'Aghb', 'Ahom', … }

import { listUnicodeAugmentedScriptCodes } from "unicode-script";
listUnicodeAugmentedScriptCodes(); // Set(3) { 'Hanb', 'Jpan', 'Kore' }

You can find a list of all Unicode scripts, with links to Wikipedia, at character.construction/scripts.

More Examples / JSDoc

See SPECS and DOCS.

Unicode Standards

UAX24
UTS39 - Mixed-Script Detection

Also See

Get the Unicode blocks of a string: unicode-block.js
Get the name of a character: unicode-name.js
Index created with: unicoder
Ruby implementation (same data & algorithms): unicode-scripts gem

MIT License

Copyright (C) 2024 Jan Lelis https://janlelis.com. Released under the MIT license.
Unicode data: https://www.unicode.org/copyright.html#Exhibit1
