Thanks to visit codestin.com
Credit goes to github.com

Skip to content

Ascertains whether a Unicode code point is 'safe' for the purposes of encoding binary data

License

Notifications You must be signed in to change notification settings

qntm/safe-code-point

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

63 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

safe-code-point

Package for determining whether the supplied code point is "safe". This module (well, code very much like it) was used to generate code points for Base65536, Base32768 and Base2048.

This package supports ES modules only.

Example

import SafeCodePoint from 'safe-code-point'

const safeCodePoint = await SafeCodePoint('17.0.0')
const numCodePoints = (1 << 16) + (1 << 20)

let numSafeCodePoints = 0
for (let codePoint = 0; codePoint < numCodePoints; codePoint++) {
  if (safeCodePoint(codePoint)) {
    numSafeCodePoints++
  }
}

console.log(numSafeCodePoints)

API

SafeCodePoint(version, options)

Returns a promise which resolves to a safeCodePoint function for the supplied version of Unicode.

Data is fetched from the Unicode website at run time. At the time of writing, supported version strings are a subset of those seen in that directory: '4.1.0' to '17.0.0'. (Earlier versions do not provide the data in the same consumable structure.)

options

An optional object with the keys acting as additional options. The options are as follows:

  • safeCategories: An object that determines whether a codepoint category is safe or not. The keys are the category, and the value is true if it is safe, and anything else if it is not safe. If not passed, the default parameters are shown in the example below.

Example:

const safeCodePoint = await SafeCodePoint('17.0.0', {
  safeCategories: {
    Ll: true, // Letter, Lowercase
    Lm: true, // Letter, Modifier
    Lo: true, // Letter, Other
    Lt: true, // Letter, Titlecase
    Lu: true, // Letter, Uppercase
    Me: false, // Mark, Enclosing
    Mn: false, // Mark, Nonspacing
    Mc: false, // Mark, Spacing Combining
    Nd: true, // Number, Decimal Digit
    Nl: true, // Number, Letter
    No: true, // Number, Other
    Cc: false, // Other, Control
    Cf: false, // Other, Format
    Cn: false, // Other, Not Assigned (no characters in the file have this property)
    Co: false, // Other, Private Use
    Cs: false, // Other, Surrogate
    Pe: false, // Punctuation, Close
    Pc: false, // Punctuation, Connector
    Pd: false, // Punctuation, Dash
    Pf: false, // Punctuation, Final quote (may behave like Ps or Pe depending on usage)
    Pi: false, // Punctuation, Initial quote (may behave like Ps or Pe depending on usage)
    Ps: false, // Punctuation, Open
    Po: false, // Punctuation, Other
    Zl: false, // Separator, Line
    Zp: false, // Separator, Paragraph
    Zs: false, // Separator, Space
    Sc: true, // Symbol, Currency
    Sm: true, // Symbol, Math
    Sk: true, // Symbol, Modifier
    So: true // Symbol, Other
  }
})

safeCodePoint(codePoint)

Returns a Boolean indicating whether the supplied code point is safe (is not a member of any unsafe Unicode General Categories, has a canonical combining class of 0 and survives all forms of normalization). codePoint should be an integer from 0 to 1114111 inclusive.

safeCodePoint.generalCategory(codepoint)

Returns the Unicode General Category of the supplied code point as a two-character string, e.g. "Lo" for "Letter, other".

safeCodePoint.wordBreak(codepoint)

Returns the Word_Break property value of the supplied code point, e.g. 'Numeric', 'ALetter' or (in most cases) undefined.

About

Ascertains whether a Unicode code point is 'safe' for the purposes of encoding binary data

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Contributors 3

  •  
  •  
  •