Handout - Utf 8 Encoding Explained (Step by Step For U+1f60a)

This handout explains how a Unicode code point, such as U+1F60A (😊), is converted into a UTF-8 byte sequence (F0 9F 98 8A). It outlines the differences between Unicode and UTF-8, the templates used for encoding, and provides a step-by-step algorithm for encoding a code point into UTF-8. Additionally, it includes examples, common pitfalls, and a quick reference for encoding various code points.

Uploaded by

senbeth11

UTF‑8 Encoding: Step‑by‑Step Handout (example: U+1F60A, 😊)
Goal: Explain clearly how a Unicode code point (U+1F60A) becomes the UTF‑8 byte sequence
F0 9F 98 8A. This is a self‑contained handout you can print or give to students.

1) Quick reminder: Unicode vs encodings

• Unicode assigns every character a code point (written U+xxxxxx, in hex). Example: U+1F60A is
the smiling‑face emoji (😊).
• UTF‑8 is one way to convert a Unicode code point into a sequence of bytes so computers can store or
send it.
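
As a quick check of this distinction, the sketch below uses Python's built-in `ord` and `str.encode` to show the code point and the UTF‑8 bytes of the same character. The variable names are illustrative only.

```python
# Code point vs. encoded bytes for the same character.
ch = "\U0001F60A"  # the smiling-face emoji

code_point = ord(ch)             # the Unicode code point, as an integer
utf8_bytes = ch.encode("utf-8")  # the UTF-8 byte sequence for that code point

code_point_label = f"U+{code_point:04X}"
utf8_hex = " ".join(f"{b:02X}" for b in utf8_bytes)

print(code_point_label)  # U+1F60A
print(utf8_hex)          # F0 9F 98 8A
```

One code point, four bytes: the code point identifies the character, and the bytes are how UTF‑8 stores it.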

2) UTF‑8 byte formats (templates)

UTF‑8 uses different templates depending on the code point value.

Bytes   Template (bits)                        Code point range
1       0xxxxxxx                               U+0000 .. U+007F     (7 bits)
2       110xxxxx 10xxxxxx                      U+0080 .. U+07FF     (5 + 6 = 11 bits)
3       1110xxxx 10xxxxxx 10xxxxxx             U+0800 .. U+FFFF     (4 + 6 + 6 = 16 bits)
4       11110xxx 10xxxxxx 10xxxxxx 10xxxxxx    U+10000 .. U+10FFFF  (3 + 6 + 6 + 6 = 21 bits)

Important rules:
• Continuation bytes always start with 10 (those are the 10xxxxxx bytes).
• In a multi‑byte sequence, the number of leading 1 bits in the first byte (followed by a 0) equals the
total number of bytes in the character (e.g. 11110 indicates 4 bytes total). A single‑byte character starts with 0.
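
The two rules and the range table can be sketched in Python as follows. The function names `utf8_length` and `byte_kind` are hypothetical helpers invented for this handout, not part of any library.

```python
def utf8_length(code_point: int) -> int:
    """Return how many UTF-8 bytes a code point needs, using the ranges in the table."""
    if code_point < 0 or code_point > 0x10FFFF:
        raise ValueError("code point outside U+0000..U+10FFFF")
    if code_point <= 0x7F:
        return 1
    if code_point <= 0x7FF:
        return 2
    if code_point <= 0xFFFF:
        return 3
    return 4


def byte_kind(byte: int) -> str:
    """Classify a single byte (0..255) by its leading bits."""
    if byte >> 7 == 0b0:
        return "single"        # 0xxxxxxx
    if byte >> 6 == 0b10:
        return "continuation"  # 10xxxxxx
    if byte >> 5 == 0b110:
        return "lead-2"        # 110xxxxx
    if byte >> 4 == 0b1110:
        return "lead-3"        # 1110xxxx
    if byte >> 3 == 0b11110:
        return "lead-4"        # 11110xxx
    return "invalid"           # no template starts with 11111


print(utf8_length(0x1F60A))  # 4
print(byte_kind(0xF0))       # lead-4
print(byte_kind(0x9F))       # continuation
```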

3) General UTF‑8 encoding algorithm (practical steps)

1. Get the Unicode code point (hex), e.g. U+1F60A.
2. Decide how many UTF‑8 bytes are needed using the ranges above.
3. Convert the hex code point to binary.
4. Pad the binary with leading zeros so the total number of bits equals the sum available in the chosen
template (7, 11, 16 or 21 bits).
5. Split the padded binary left to right into groups that match the x slot sizes in the template. For
example, for 4 bytes the groups are 3 | 6 | 6 | 6 bits.

6. Put each group into its place in the template. Add the fixed prefix bits (0, 110, 1110, or 11110
on the first byte, and 10 on each continuation byte).
7. Convert each resulting 8‑bit byte to hex — that gives the UTF‑8 byte sequence.
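
The steps above can be sketched in Python as shown below. This is a minimal teaching version, not a production encoder: `encode_utf8` is a hypothetical name, the bit string is handled as text to mirror the pencil-and-paper method, and the surrogate check (U+D800..U+DFFF) is an addition that the steps do not mention but that UTF‑8 requires.

```python
def encode_utf8(code_point: int) -> bytes:
    """Encode one code point with the template method (pad, split, fill)."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point outside U+0000..U+10FFFF")
    if 0xD800 <= code_point <= 0xDFFF:
        # Not covered by the steps above: surrogates are not valid in UTF-8.
        raise ValueError("surrogate code points cannot be encoded in UTF-8")

    # Step 2: choose the template.
    # (first-byte prefix, number of x slots in first byte, continuation bytes)
    if code_point <= 0x7F:
        prefix, first_bits, n_cont = 0b0, 7, 0
    elif code_point <= 0x7FF:
        prefix, first_bits, n_cont = 0b110, 5, 1
    elif code_point <= 0xFFFF:
        prefix, first_bits, n_cont = 0b1110, 4, 2
    else:
        prefix, first_bits, n_cont = 0b11110, 3, 3

    # Steps 3-4: convert to binary and pad on the left to 7, 11, 16 or 21 bits.
    total_bits = first_bits + 6 * n_cont
    bits = format(code_point, f"0{total_bits}b")

    # Step 5: split left to right into the first group plus 6-bit groups.
    groups = [bits[:first_bits]]
    for i in range(n_cont):
        start = first_bits + 6 * i
        groups.append(bits[start:start + 6])

    # Step 6: add the fixed prefix bits to each group.
    first = (prefix << first_bits) | int(groups[0], 2)
    rest = [0b10000000 | int(g, 2) for g in groups[1:]]
    return bytes([first] + rest)


# Step 7: show the bytes in hex.
print(" ".join(f"{b:02X}" for b in encode_utf8(0x1F60A)))  # F0 9F 98 8A
```

Comparing the result with Python's own `chr(cp).encode("utf-8")` is an easy way to check the sketch for any code point.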

4) Worked example: encode U+1F60A (😊)

Step A: Start value
- Unicode: U+1F60A (hex)
- Decimal: 128522

Step B: Convert to binary
- Hex 1F60A → digitwise: 1 F 6 0 A → binary nibbles: 0001 1111 0110 0000 1010.
- Written this way, the value occupies 20 bits (five hex digits × 4 bits). The UTF‑8 4‑byte template
needs 21 bits, so pad on the left with one 0 to make 21 bits.

21‑bit padded binary (grouped for clarity):

000 011111 011000 001010
^^^ ^^^^^^ ^^^^^^ ^^^^^^
3b  6b     6b     6b

(we grouped into 3 | 6 | 6 | 6 because the 4‑byte template has xxx + three xxxxxx groups)

Step C: Place groups into the UTF‑8 4‑byte template

Template bits:

11110xxx 10xxxxxx 10xxxxxx 10xxxxxx

Fill in the groups (left → right):
- xxx ← 000
- 1st xxxxxx ← 011111
- 2nd xxxxxx ← 011000
- 3rd xxxxxx ← 001010

So the bytes become (binary):

11110000 10011111 10011000 10001010

Breakdown:
- 11110000 → 0xF0
- 10011111 → 0x9F
- 10011000 → 0x98
- 10001010 → 0x8A

Result (UTF‑8 byte sequence): F0 9F 98 8A. These are exactly the bytes stored or sent for 😊 in UTF‑8.
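
The same worked example can be traced in Python. The sketch below reproduces the padded binary and the four groups as strings, then builds each byte with shifts and masks, which is how the packing is usually written in code. The variable names are illustrative.

```python
cp = 0x1F60A  # decimal 128522

# Step B: 21-bit padded binary and the 3 | 6 | 6 | 6 split.
padded = format(cp, "021b")
groups = [padded[0:3], padded[3:9], padded[9:15], padded[15:21]]

# Step C: the same split done with shifts and a 6-bit mask (0b111111).
byte1 = 0b11110000 | (cp >> 18)                # 11110xxx with the top 3 bits
byte2 = 0b10000000 | ((cp >> 12) & 0b111111)   # 10xxxxxx with the next 6 bits
byte3 = 0b10000000 | ((cp >> 6) & 0b111111)    # 10xxxxxx with the next 6 bits
byte4 = 0b10000000 | (cp & 0b111111)           # 10xxxxxx with the low 6 bits
encoded = bytes([byte1, byte2, byte3, byte4])

print(padded)                                  # 000011111011000001010
print(groups)                                  # ['000', '011111', '011000', '001010']
print(" ".join(f"{b:08b}" for b in encoded))   # 11110000 10011111 10011000 10001010
print(" ".join(f"{b:02X}" for b in encoded))   # F0 9F 98 8A
```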

5) Visual grid (how the bits are packed)

Code point (hex):    1    F    6    0    A
Nibbles (binary):    0001 1111 0110 0000 1010   (20 bits)
Pad to 21 bits:      0 0001 1111 0110 0000 1010  ->  000011111011000001010
Split into 3|6|6|6:  000 | 011111 | 011000 | 001010
UTF-8 template:      11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
Fill in groups:      11110000 10011111 10011000 10001010
                        ↓        ↓        ↓        ↓
Hex bytes:              F0       9F       98       8A

6) Extra notes & common pitfalls

• Padding is crucial. Always pad the code point on the left with zeros so the bit length matches the
template capacity (7, 11, 16, 21). Forgetting padding will misalign groups.
• Group left → right. Always take the most significant bits first and move to the least significant.
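• Continuation bytes always start with 10. If you see a byte that starts with 10, it is a continuation
(not the start) of a multi‑byte character.
• Endianness confusion: UTF‑8 is defined as a fixed sequence of bytes, so it has no big‑endian or
little‑endian variants the way UTF‑16 does. A byte‑order mark (EF BB BF) is never needed to determine byte
order; some files carry it only as an optional signature.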
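
The continuation-byte rule is useful in practice. As a small sketch (the helper name `count_code_points` is hypothetical, and the input is assumed to be valid UTF‑8), the number of characters in a byte string can be found by counting every byte that does not start with 10:

```python
def count_code_points(data: bytes) -> int:
    """Count code points in valid UTF-8 by skipping continuation bytes (10xxxxxx)."""
    return sum(1 for b in data if (b & 0b11000000) != 0b10000000)


# 'a' (1 byte), U+00A9 (2 bytes), U+20AC (3 bytes), U+1F60A (4 bytes)
sample = "a\u00A9\u20AC\U0001F60A".encode("utf-8")

print(len(sample))                # 10  (bytes)
print(count_code_points(sample))  # 4   (code points)
```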

7) Quick reference: how many bits per template

• 1 byte: 0xxxxxxx → 7 bits available (for U+0000..U+007F)
• 2 bytes: 110xxxxx 10xxxxxx → 5 + 6 = 11 bits (for U+0080..U+07FF)
• 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx → 4 + 6 + 6 = 16 bits (for U+0800..U+FFFF)
• 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx → 3 + 6 + 6 + 6 = 21 bits (for
U+10000..U+10FFFF)
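
These bit counts also explain where each range ends: n available bits can hold values up to 2^n - 1. A minimal sketch follows, with illustrative names. Note that the 21‑bit template could hold values up to 0x1FFFFF, but Unicode itself stops at U+10FFFF.

```python
# Number of x slots (payload bits) for each template length.
template_bits = {1: 7, 2: 11, 3: 16, 4: 21}

# Largest value that fits in n bits is 2**n - 1.
max_values = {n_bytes: (1 << bits) - 1 for n_bytes, bits in template_bits.items()}

for n_bytes, max_value in max_values.items():
    print(f"{n_bytes} byte(s): up to 0x{max_value:X}")
# 1 byte(s): up to 0x7F
# 2 byte(s): up to 0x7FF
# 3 byte(s): up to 0xFFFF
# 4 byte(s): up to 0x1FFFFF
```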

8) Small exercises (try to encode, answers below)

1. Encode U+00A9 (©) → expected UTF‑8 bytes?
2. Encode U+20AC (€) → expected UTF‑8 bytes?
3. Encode U+1F44D (👍 thumbs up) → expected UTF‑8 bytes?

Answers:
1. U+00A9 → hex C2 A9
2. U+20AC → hex E2 82 AC
3. U+1F44D → hex F0 9F 91 8D
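
To check the answers (or any other code point you try by hand), Python's built-in encoder can be used as a reference. The helper name `utf8_hex` is hypothetical.

```python
def utf8_hex(code_point: int) -> str:
    """Return the UTF-8 bytes of a code point as space-separated hex."""
    return " ".join(f"{b:02X}" for b in chr(code_point).encode("utf-8"))


# Expected answers from the exercises above.
exercises = {0x00A9: "C2 A9", 0x20AC: "E2 82 AC", 0x1F44D: "F0 9F 91 8D"}

for cp, expected in exercises.items():
    actual = utf8_hex(cp)
    status = "ok" if actual == expected else "MISMATCH"
    print(f"U+{cp:04X} -> {actual} ({status})")
# U+00A9 -> C2 A9 (ok)
# U+20AC -> E2 82 AC (ok)
# U+1F44D -> F0 9F 91 8D (ok)
```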

9) Short summary for students

• Unicode gives each character a code point (U+...).
• UTF‑8 packs those code points into 1–4 bytes using fixed templates.

• To encode: convert, pad, split into groups that match template x slots, insert into the template,
then convert each 8‑bit byte to hex.

