List of Unicode characters
This is a list of Unicode characters.
1 Character reference overview
See also: List of XML and HTML character entity references and Unicode input
An HTML or XML numeric character reference refers to a character by its Universal Character Set/Unicode code point, and uses the format
&#nnnn;
or
&#xhhhh;
where nnnn is the code point in decimal form, and hhhh is the code point in hexadecimal form. The x must be lowercase in XML documents. The nnnn or hhhh may be any number of digits and may include leading zeros. The hhhh may mix uppercase and lowercase, though uppercase is the usual style.
In contrast, a character entity reference refers to a character by the name of an entity which has the desired character as its replacement text. The entity must either be predefined (built into the markup language) or explicitly declared in a Document Type Definition (DTD). The format is the same as for any entity reference:
&name;
where name is the case-sensitive name of the entity. The semicolon is required.
2 Control codes
Main articles: Unicode control characters and C0 and C1 control codes
See also: ASCII control characters in the article ASCII
See also: Control Pictures
65 characters, including DEL but not SP. All belong to the common script.
3 Latin script
Main article: Latin script in Unicode
The Unicode Standard (version 7.0) classifies 1,338 characters as belonging to the Latin script.
3.1 Basic Latin
Main article: Basic Latin (Unicode block)
"Special characters" redirects here. For the Wikipedia editor's handbook page, see Help:Special characters.
See also: ASCII printable characters in the article ASCII
95 characters; the 52 alphabet characters belong to the Latin script. The remaining 43 belong to the common script.
The 33 characters classified as ASCII Punctuation & Symbols are also sometimes referred to as ASCII special characters. See Latin-1 Supplement and Unicode symbols for additional special characters.
3.2 Latin-1 Supplement
Main article: Latin-1 Supplement (Unicode block)
96 characters; the 62 letters and two ordinal indicators belong to the Latin script. The remaining 32 belong to the common script.
3.3 Latin Extended-A
Main article: Latin Extended-A
128 characters; all belong to the Latin script.
3.4 Latin Extended-B
Main article: Latin Extended-B
208 characters; all belong to the Latin script; 33 in the MES-2 subset.
3.5 Latin Extended Additional
Main article: Latin Extended Additional
256 characters; all belong to the Latin script; 23 in the MES-2 subset. For the rest, see Latin Extended Additional.
3.6 Additional Latin Extended
Latin Extended-C
Latin Extended-D
Latin Extended-E
4 Phonetic scripts
Main articles: Phonetic transcription and Phonetic symbols in Unicode
4.1 IPA Extensions
Main article: IPA Extensions
96 characters; all belong to the Latin script; three in the MES-2 subset. For the rest, see IPA Extensions.
4.2 Spacing modifier letters
Main article: Spacing Modifier Letters
80 characters; 15 in the MES-2 subset.
4.3 Phonetic Extensions
Phonetic Extensions
Phonetic Extensions Supplement
5 Combining Diacritical Marks
Main article: Combining character
Combining Diacritical Marks
Combining Diacritical Marks Supplement
Combining Diacritical Marks for Symbols
6 Greek and Coptic
Main article: Greek and Coptic
See also: Coptic (Unicode block)
144 code points; 135 assigned characters; 85 in the MES-2 subset.
6.1 Greek Extended
Main article: Greek Extended
For polytonic orthography. 256 code points; 233 assigned characters, all in the MES-2 subset (#670–902).
7 Cyrillic
Main articles: Cyrillic script in Unicode and Cyrillic (Unicode block)
See also: Glagolitic (Unicode block)
256 characters; 191 in the MES-2 subset.
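As a minimal sketch of the numeric character reference formats described in the character reference overview (&#nnnn; and &#xhhhh;), the following Python snippet builds both forms for a Cyrillic letter and decodes them again. The helper name numeric_refs is hypothetical and used only for illustration; html.unescape is Python's standard-library decoder and follows HTML rules, which are more lenient than XML's.

```python
import html

def numeric_refs(ch):
    """Return (decimal, hexadecimal) numeric character references for one character."""
    cp = ord(ch)  # the character's Unicode code point
    return f"&#{cp};", f"&#x{cp:X};"  # lowercase x, uppercase hex digits

zhe = "\u0416"  # CYRILLIC CAPITAL LETTER ZHE
dec_ref, hex_ref = numeric_refs(zhe)
print(dec_ref, hex_ref)  # &#1046; &#x416;

# Both forms decode to the same character; leading zeros are accepted.
assert html.unescape(dec_ref) == zhe
assert html.unescape(hex_ref) == zhe
assert html.unescape("&#x0416;") == zhe

# A character entity reference uses a name instead of a code point.
assert html.unescape("&amp;") == "&"
```

The hexadecimal form here is written with a lowercase x so that the same string would also be acceptable in an XML document.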
7.1 Cyrillic supplements
Cyrillic Supplement
Cyrillic Extended-A
Cyrillic Extended-B
8 Armenian
Armenian (Unicode block)
9 Semitic languages
Further information: Semitic languages
Arabic script in Unicode, including the Persian alphabet, Jawi alphabet and others
Arabic Supplement
Arabic Extended-A
Unicode and HTML for the Hebrew alphabet
Mandaic (Unicode block)
Samaritan (Unicode block)
Syriac (Unicode block)
Tifinagh (Unicode block)
10 Thaana
Thaana (Unicode block)
11 N'Ko
NKo (Unicode block)
12 Brahmic (Indic) scripts
Main article: Brahmic scripts in Unicode
The range from U+0900 to U+0DFF includes Devanagari, Bengali script, Gurmukhi, Gujarati script, Oriya script, Tamil script, Telugu script, Kannada script, Malayalam script, and the Sinhala alphabet.
Devanagari (Unicode block)
Vedic Extensions
Bengali (Unicode block)
Gurmukhi (Unicode block)
Gujarati (Unicode block)
Oriya (Unicode block)
Tamil (Unicode block)
Telugu (Unicode block)
Kannada (Unicode block)
Malayalam (Unicode block)
Sinhala (Unicode block)
Other Brahmic and Indic scripts in Unicode include:
Balinese (Unicode block)
Batak (Unicode block)
Buhid (Unicode block)
Hanunoo (Unicode block)
Khmer (Unicode block)
Khmer Symbols
Lao (Unicode block)
Lepcha (Unicode block)
Limbu (Unicode block)
Mon script Unicode
New Tai Lue (Unicode block)
Ol Chiki (Unicode block)
Sundanese (Unicode block)
Syloti Nagri (Unicode block)
Tagalog (Unicode block)
Tagbanwa (Unicode block)
Tai Le (Unicode block)
Tai Tham (Unicode block)
Thai (Unicode block)
Tibetan (Unicode block)
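The range from U+0900 to U+0DFF mentioned in this section can be illustrated with a small lookup. This is only a sketch: it assumes that the ten scripts named there each occupy one consecutive 128-code-point block in the listed order, which the text itself does not spell out, and the names BRAHMIC_BLOCKS and brahmic_block are hypothetical.

```python
# Scripts in code point order; assumption: each block spans 0x80 (128) code points.
BRAHMIC_BLOCKS = [
    "Devanagari", "Bengali", "Gurmukhi", "Gujarati", "Oriya",
    "Tamil", "Telugu", "Kannada", "Malayalam", "Sinhala",
]
RANGE_START, RANGE_END = 0x0900, 0x0DFF
BLOCK_SIZE = 0x80

def brahmic_block(ch):
    """Return the block name for a character in U+0900..U+0DFF, else None."""
    cp = ord(ch)
    if not RANGE_START <= cp <= RANGE_END:
        return None
    return BRAHMIC_BLOCKS[(cp - RANGE_START) // BLOCK_SIZE]

print(brahmic_block("\u0915"))  # Devanagari (DEVANAGARI LETTER KA)
print(brahmic_block("\u0BA4"))  # Tamil (TAMIL LETTER TA)
print(brahmic_block("A"))       # None
```

Ten blocks of 128 code points cover exactly 0x0900 through 0x0DFF, which is why simple integer division is enough here. Blocks outside this range, such as the other Brahmic and Indic scripts listed in this section, would need an explicit range table instead.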
13 Georgian
Georgian (Unicode block)
Georgian Supplement
14 Ethiopic
Ge'ez script Unicode
15 Native American scripts
Cherokee (Unicode block)
Unified Canadian Aboriginal Syllabics (Unicode block)
Unified Canadian Aboriginal Syllabics Extended
16 Mongolian
Mongolian (Unicode block)
17 Buginese
Buginese (Unicode block)
18 Unicode symbols
Main article: Unicode symbols
19 General Punctuation
Main article: General Punctuation
See also: Supplemental Punctuation
112 code points; 111 assigned characters; 24 in the MES-2 subset.
20 Superscripts and Subscripts
Main article: Superscripts and Subscripts (Unicode block)
21 Currency Symbols
Main article: Currency Symbols (Unicode block)
22 Letterlike Symbols
Main article: Letterlike Symbols (Unicode block)
23 Number Forms
Main article: Number Forms (Unicode block)
24 Arrows
Main articles: Arrow (symbol) and Arrows (Unicode block)
24.1 Supplemental Arrows
Supplemental Arrows-A
Supplemental Arrows-B
25 Mathematical Operators
Main articles: Mathematical operators and symbols in Unicode and Mathematical Operators
26 Miscellaneous Technical
Main article: Miscellaneous Technical
27 Optical Character Recognition
Optical Character Recognition (Unicode block)
28 Enclosed Alphanumerics
Main article: Enclosed Alphanumerics (Unicode block)
29 Box Drawing
Main article: Box Drawing (Unicode block)
30 Block Elements
Main article: Block Elements
See also: Box-drawing characters
31 Geometric Shapes
Main article: Geometric Shapes
32 Miscellaneous Symbols
Main article: Miscellaneous Symbols
33 Dingbats
Dingbats (Unicode block)
34 Braille Patterns
Braille Patterns
35 Miscellaneous Mathematical Symbols
Miscellaneous Mathematical Symbols-A
Miscellaneous Mathematical Symbols-B
36 Supplemental Mathematical Operators
Supplemental Mathematical Operators
37 Miscellaneous Symbols and Arrows
Miscellaneous Symbols and Arrows
38 Chinese, Japanese and Korean
CJK Radicals Supplement
Kangxi Radicals (Unicode block)
Ideographic Description Characters (Unicode block)
CJK Symbols and Punctuation
Hiragana (Unicode block)
Katakana (Unicode block)
Bopomofo (Unicode block)
Hangul Compatibility Jamo
List of hangul jamo
Kanbun (Unicode block)
CJK Unified Ideographs
Yi Syllables
39 Alphabetic Presentation Forms
Main article: Alphabetic Presentation Forms
40 Specials
Main article: Specials (Unicode block)
41 Ancient scripts
Ogham (Unicode block)
Runic (Unicode block)
Linear B Syllabary
Linear B Ideograms
Aegean Numbers (Unicode block)
Ancient Greek Numbers (Unicode block)
Ancient Symbols (Unicode block)
Phaistos Disc (Unicode block)
Lycian (Unicode block)
Carian (Unicode block)
Old Italic (Unicode block)
Gothic (Unicode block)
Ugaritic (Unicode block)
Old Persian (Unicode block)
Deseret (Unicode block)
Shavian (Unicode block)
Osmanya (Unicode block)
Cypriot Syllabary (Unicode block)
Imperial Aramaic (Unicode block)
Phoenician (Unicode block)
Lydian (Unicode block)
Meroitic Hieroglyphs (Unicode block)
Meroitic Cursive (Unicode block)
Kharoshthi (Unicode block)
Avestan (Unicode block)
Inscriptional Parthian (Unicode block)
Inscriptional Pahlavi (Unicode block)
Old Turkic (Unicode block)
Brahmi (Unicode block)
Kaithi (Unicode block)
Cuneiform (Unicode block)
Cuneiform Numbers and Punctuation
Egyptian Hieroglyphs (Unicode block)
42 Musical symbols
Modern
Byzantine
Ancient Greek
43 Emoji
Emoji
44 Alchemical symbols
Alchemical Symbols
45 Game symbols
Mahjong Tiles
Domino Tiles
Playing cards
46 See also
Comparison of Unicode encodings
Free software Unicode typefaces
GNU Unifont
List of Unicode radicals
List of Unicode fonts
List of typefaces
Typographic unit
Unicode Consortium
Unicode fallback font
Unicode typeface
Universal Character Set characters
47 References
[1] Deprecated as of Unicode version 5.2.0. U+0149 Latin small letter n preceded by apostrophe was encoded for use in Afrikaans. The character is deprecated, and its use is strongly discouraged. In nearly all cases it is better represented by a sequence of an apostrophe followed by n. pg. 208
Unicode 7.0 Character Code Charts, Unicode, Inc.
CWA 13873:2000 Multilingual European Subsets in ISO/IEC 10646-1, CEN Workshop Agreement 13873
Multilingual European Character Set 2 (MES-2) Rationale, Markus Kuhn, 1998
48 External links
Official web site of the Unicode Consortium (English)
decodeunicode.org Unicode-Wiki with images of
all 98,884 graphical unicode characters (German/English, full text search)
unicodinator.com a visual Unicode navigator
Letters with diacritical marks, grouped alphabetically, Pinyin.info
UTF-8 encoding table and Unicode characters
49 Text and image sources, contributors, and licenses
49.1 Text
List of Unicode characters Source: https://en.wikipedia.org/wiki/List_of_Unicode_characters?oldid=724462910 Contributors: Bryan
Derksen, Zundark, Michael Hardy, Cyde, Darkwind, Random832, Bearcat, Merovingian, DocWatson42, Chowbok, Utcursch, Keith Edkins, Beland, Pmanderson, Icairns, Hobart, Patricknoddy, Rich Farmbrough, Dbachmann, Violetriga, Kwamikagami, Wareh, Sasquatch,
VBGFscJUn3, Brainy J, Resipsa, Arthena, Sl, Mailer diablo, Bsadowski1, Gmaxwell, David Haslam, Waldir, DePiep, Koavf, Hans Genten,
MZMcBride, Gudeldar, DoubleBlue, Fish and karate, Gurch, Benlisquare, Theymos, Sceptre, Ste1n, IanManka, Pseudomonas, Jndrline,
E rulez, Farmanesh, NigelJ, Thespian, Saltmarsh, SmackBot, JeyP, Kazkaskazkasako, MaxSem, George Ho, Egsan Bacon, Alphathon,
Sspecter, Radagast83, Cybercobra, Einhanderkiller, Filan, Wthrower, Keahapana, Jack Waugh, Vanisaac, LSX, Conrad.Irwin, Salicyna,
Epbr123, DewiMorgan, Shahbaz Youse, Guy Macon, BigNate37, Rmsuperstar99, Spencer, Res2216restar, MER-C, Hydro, IIIIIIIII, Idiotkid, Sangak, JamesBWatson, Tedickey, Objectivesea, Dan Pelleg, DerHexer, J.delanoy, Trusilver, Inimino, Nigholith, Hbgarou, Belovedfreak, Liliana-60, Cmichael, , Leszek4444, VolkovBot, TreasuryTag, Aesopos, Fran Rogers, Tavix, Seraphim, Damrung,
Haseo9999, Falcon8765, Grzechooo, Winchelsea, Flyer22 Reborn, PolarBot, Sollbruchstelle, Myotis, Techman224, Kanonkas, RedAugust, Beeblebrox, ClueBot, Mild Bill Hiccup, Biglarrrr, Niceguyedc, Kansoku, Copyeditor42, Alexbot, PixelBot, IXella007, Computer97,
Versus22, Nafsadh, Jrooksjr, Unknownyetknown, Fieldday-sunday, BabelStone, Chamal N, Tide rolls, OlEnglish, Abjiklam, Gail, Yobot,
Timeroot, ArchonMagnus, GUnit594, BrownInSpace, Plappy doon-doon, Xgrimreapahx, Mahmudmasri, Materialscientist, CasperBraske,
The Banner, Swiftarrow9, Supersaiyan474, Tomdo08, Ashershow1, Deleyd, Coroboy, Alxeedo, , Xxz3phyrxx, Szymo1500,
Mono, Lotje,
, Dudy001, Kierany5, Moswento, Dcirovic, Moniqueque18, Thewolfchild, Mankarse, ClueBot NG, Frietjes, DiErrasa,
VH2, Wbm1058, BG19bot, Heikohaller, Fopnor, Toccata quarta, Gorobay, DPL bot, Res0lution, Erase99, Vogone, Dgrima, Mrjulesd,
, JPaestpreornJeolhlna, FanforClark12, Seagull123, DareKevil, R12a, Ljacqu, Dazitzel, Izkala, Iawbrooks and Anonymous: 157
49.2 Images
File:Unicode_logo.svg Source: https://upload.wikimedia.org/wikipedia/commons/a/ab/Unicode_logo.svg License: Public domain Contributors: de::Bild:Unicode logo.jpg Original artist: unbekannt (Transfered by mu cabbage/Original uploaded by Benji)
File:Writing_systems_worldwide.png Source: https://upload.wikimedia.org/wikipedia/commons/9/9d/Writing_systems_worldwide.png License: CC-BY-SA-3.0 Contributors: the English language Wikipedia (log). Based on File:BlankMap-World-v5-EU.png. Original artist: uploaded to Wikipedia by JWB.
49.3 Content license
Creative Commons Attribution-Share Alike 3.0