ASCII and Unicode
WITH MRSAEM
www.mrsaem.com | www.sirsaem.com | 1
Chapter 5
A Level
Processor Fundamentals
ASCII and Unicode
In the early days of computing, programmers would combine groups (sequences) of
0s and 1s to represent different things. For example, they might decide that
00000000 could be used to represent an A and 00000001 could be used to
represent a B and so on. The problem was that different programmers used their own
coding systems so the sequences meant different things to different people.
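The problem can be sketched in a few lines of Python. The two coding tables below are invented purely for illustration (they are not historical systems), and the `decode` helper is a hypothetical name. The same bit sequence produces a different message depending on which table the reader uses.

```python
# Two hypothetical coding tables, invented for illustration only.
SCHEME_ONE = {"00000000": "A", "00000001": "B"}
SCHEME_TWO = {"00000000": "B", "00000001": "A"}


def decode(bits, table):
    """Split a bit string into 8-bit groups and look each group up in the table."""
    groups = [bits[i:i + 8] for i in range(0, len(bits), 8)]
    return "".join(table[group] for group in groups)


message = "0000000000000001"
print(decode(message, SCHEME_ONE))  # AB
print(decode(message, SCHEME_TWO))  # BA
```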
To end this confusion, a standard was agreed upon for the representation of all
the keyboard characters, including the digits, and other commonly used functions
(for example, control codes such as carriage return).
This standard is called ASCII, the American Standard Code for Information
Interchange. A 7-bit code was agreed upon because 7 bits give 128 different
combinations, which is enough for the most commonly used characters.
Later, 8-bit extensions known as extended ASCII were introduced, allowing
for 256 characters.
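The arithmetic behind these figures can be checked with a short Python sketch. The helper name `ascii_bits` is an assumption made for this example; `ord` and `format` are standard Python built-ins.

```python
# Number of distinct codes available with 7 and 8 bits.
codes_7_bit = 2 ** 7
codes_8_bit = 2 ** 8


def ascii_bits(ch):
    """Return the 7-bit pattern for a single ASCII character."""
    code = ord(ch)
    if code >= codes_7_bit:
        raise ValueError(f"{ch!r} is not a 7-bit ASCII character")
    return format(code, "07b")


print(codes_7_bit, codes_8_bit)  # 128 256
print(ascii_bits("A"))           # 1000001
print(ascii_bits("a"))           # 1100001
```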
ASCII does have certain limitations
ASCII was for decades the standard method of converting keyboard and other characters into
binary codes. However, ASCII does have certain limitations:
● Even with 8 bits, 256 codes are not sufficient to represent all of the possible characters,
numbers and symbols.
● It was developed for English and therefore does not represent most of the other languages
and scripts in the world.
● Widespread use of the web made it more important to have a universal international coding
system.
● The range of platforms and programs has increased dramatically, with more developers from
around the world using a much wider range of characters.
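The first two limitations can be seen directly in Python. This is a minimal sketch: `fits_in_ascii` is a hypothetical helper written for this example, and it relies on Python's built-in "ascii" codec rejecting any character outside the 128 ASCII codes.

```python
def fits_in_ascii(text):
    """Return True if every character in text has an ASCII code."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


print(fits_in_ascii("hello"))   # True
print(fits_in_ascii("café"))    # False
print(fits_in_ascii("日本語"))  # False
```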
As a result, a new standard called Unicode has emerged
ASCII codes have been subsumed within Unicode: the first 128 Unicode codes are the same as
the ASCII codes, so a capital letter A is 65 in both ASCII and Unicode.
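A quick way to check this overlap in Python is shown below. It is only a sketch: the variable name `same` is chosen for this example, while `ord`, `chr`, `encode` and `decode` are standard Python.

```python
# ord() gives a character's Unicode code; chr() goes the other way.
print(ord("A"))  # 65
print(chr(65))   # A

# For an ASCII character, the ASCII and UTF-8 encodings are the same byte.
print("A".encode("ascii") == "A".encode("utf-8"))  # True

# Every one of the 128 ASCII codes decodes to the Unicode character
# that has the same number.
same = all(bytes([n]).decode("ascii") == chr(n) for n in range(128))
print(same)  # True
```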
Unicode also includes characters from most of the world's writing systems and even includes
classical and ancient scripts.
To represent these extra characters, 8 bits per character are not always enough. There are
two common encodings of Unicode in use today, UTF-8 and UTF-16. UTF-8 uses one to four
8-bit bytes per character, with ASCII characters taking a single byte. UTF-16 uses one or
two 16-bit units per character.
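The difference between the two encodings can be illustrated by counting bytes. In practice UTF-8 uses one to four bytes per character and UTF-16 uses one or two 16-bit units. The sketch below assumes a hypothetical helper named `encoded_sizes` and uses Python's "utf-16-be" codec so that no byte-order mark is added to the count.

```python
def encoded_sizes(ch):
    """Return (bytes in UTF-8, bytes in UTF-16) for one character."""
    # "utf-16-be" is used so that no byte-order mark is included.
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-be"))


print(encoded_sizes("A"))   # (1, 2)
print(encoded_sizes("é"))   # (2, 2)
print(encoded_sizes("€"))   # (3, 2)
print(encoded_sizes("😀"))  # (4, 4)
```

An ASCII character costs a single byte in UTF-8, while characters outside the first 65,536 codes need two 16-bit units in UTF-16.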