Chapter 3-Text Representation

Data is defined as formatted information that can take various forms, such as text, images, and audio. Standards exist for different types of data, such as ASCII for alphanumeric data and JPEG for images, to ensure convenience and efficiency. Binary codes are used to represent data, with various encoding systems like ASCII, Extended ASCII, and Unicode allowing for the representation of characters in digital formats.

Uploaded by

Resika Umayantha

TEXT REPRESENTATION

Chapter-03
WHAT IS DATA?
• Data is a piece of information, usually formatted in a
special way.
• This information may be in the form of text documents,
images, audio clips, software programs, etc.
EXAMPLES OF STANDARDS
Type of Data             Standards
Alphanumeric             ASCII, EBCDIC, Unicode
Image                    JPEG, GIF, PCX, TIFF
Motion picture           MPEG-2, QuickTime
Sound                    Sound Blaster, WAV, AU
Outline graphics/fonts   PostScript, TrueType, PDF

EXAMPLES OF STANDARDS

• Microsoft Word produces formatted text and creates documents in DOCX format.
• Apple Pages produces documents in PAGES format.
• Adobe Acrobat produces documents in PDF format.
• HTML, the markup language used for Web pages, produces documents in HTML format.
WHY STANDARDS?

• They exist because they are:
• Convenient
• Efficient
• Flexible
• Appropriate
• Etc.
BINARY CODES
• Example
NUMBER OF BITS IN BINARY CODES
• The number of possible bit patterns (symbols), M, that can be formed from N bits is given by:
M = 2^N
• Conversely, the number of bits needed to represent M symbols is given by:
N = log2 M ≈ 3.32 log10 M
(Note: N must be rounded up to the next integer.)

• Ex: for M = 26, what is the minimum number of bits?
N = log2 26 ≈ 3.32 × log10 26 ≈ 4.70, which rounds up to 5 bits
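
The relationship between the number of symbols and the number of bits can be checked with a short Python sketch. The function name bits_needed is an illustrative choice, not something defined in these slides; it computes the smallest N such that 2^N is at least M.

```python
import math

def bits_needed(num_symbols: int) -> int:
    """Smallest N with 2**N >= num_symbols (hypothetical helper name)."""
    if num_symbols < 1:
        raise ValueError("need at least one symbol")
    # For M >= 1, (M - 1).bit_length() equals ceil(log2(M)) without float rounding.
    return (num_symbols - 1).bit_length()

# The exact logarithm is not a whole number, so it must be rounded up.
print(round(math.log2(26), 2))   # 4.7
print(bits_needed(26))           # 5
print(2 ** bits_needed(26))      # 32 patterns available, 26 used
```

Integer arithmetic is used instead of math.ceil(math.log2(M)) so that exact powers of two are not affected by floating-point rounding.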
BITS AND BYTES

• All of the data stored and transmitted by digital devices is encoded as bits.
• Terminology related to bits and bytes is extensively used to describe storage
capacity and network access speed.
• The word bit, an abbreviation for binary digit, can be further abbreviated as
a lowercase b.
• A group of eight bits is called a byte and is usually abbreviated as an
uppercase B.
BITS AND BYTES

• When reading about digital devices, you’ll frequently encounter references such as 90 kilobits per second, 1.44 megabytes, 2.8 gigahertz, and 2 terabytes.
• Kilo, mega, giga, tera, and similar terms are used to quantify digital data.
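
As a rough illustration of how these quantities relate, the sketch below converts between bits and bytes. It assumes decimal prefixes (kilo = 1,000, mega = 1,000,000, and so on); some contexts use powers of 1,024 instead. The names PREFIXES, scale, and bits_to_bytes are made up for this example.

```python
BITS_PER_BYTE = 8

# Assumption: decimal (SI) prefixes. Some storage contexts use powers of 1024.
PREFIXES = {"kilo": 10**3, "mega": 10**6, "giga": 10**9, "tera": 10**12}

def scale(value: int, prefix: str) -> int:
    """Expand a prefixed quantity, e.g. 90 kilo -> 90000."""
    return value * PREFIXES[prefix]

def bits_to_bytes(bits: int) -> float:
    """Convert a number of bits to bytes (8 bits per byte)."""
    return bits / BITS_PER_BYTE

# 90 kilobits per second expressed in bytes per second
print(bits_to_bytes(scale(90, "kilo")))        # 11250.0

# 2 terabytes expressed in bits
print(scale(2, "tera") * BITS_PER_BYTE)        # 16000000000000
```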
BINARY CODED DECIMAL FORMAT
• It’s a way to represent decimal numbers directly in binary without converting the number as a whole to binary.
• Representing the 10 decimal digits (0 to 9) requires a 4-bit code. This coding is called Binary Coded Decimal (BCD).
• The BCD code is simply the 4-bit binary representation of each decimal digit.
• Six of the 16 possible 4-bit patterns (1010 to 1111) are not used.


EXAMPLE

• 7093 (base 10) = ? (in BCD)

7    0    9    3
0111 0000 1001 0011
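
The digit-by-digit conversion above can be sketched in Python. The function name to_bcd is an illustrative choice; it encodes each decimal digit separately as 4 bits, which is different from converting the whole number to binary.

```python
def to_bcd(number: int) -> str:
    """Encode a non-negative integer as BCD, one 4-bit group per decimal digit."""
    if number < 0:
        raise ValueError("this sketch handles non-negative integers only")
    # Each decimal digit is formatted as a zero-padded 4-bit binary string.
    return " ".join(format(int(digit), "04b") for digit in str(number))

print(to_bcd(7093))            # 0111 0000 1001 0011

# For comparison: converting the number as a whole needs fewer bits.
print(len(format(7093, "b")))  # 13
```

BCD uses 16 bits for 7093 while plain binary uses 13, which reflects the six unused patterns in every 4-bit group.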


ALPHANUMERIC DATA
• How do you handle alphanumeric data?
• Alphanumeric – consisting of both letters and
numerals
• Easy answer! Formulate a binary code to represent
each character.
– The 26 letters of the alphabet would need 5 bits for representation.
– But what about upper case and lower case, the digits, and special characters?
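
To see how the required code length grows as more character classes are added, the sketch below counts characters using Python's standard string module. Treating string.punctuation as the set of "special characters" is an assumption made for this example; the slides do not define that set.

```python
import string

def bits_for(count: int) -> int:
    """Smallest number of bits that gives at least `count` distinct patterns."""
    return (count - 1).bit_length()

letters_one_case = len(string.ascii_lowercase)              # 26
letters_and_digits = len(string.ascii_letters + string.digits)  # 52 + 10
# Assumption: string.punctuation stands in for "special characters".
with_specials = letters_and_digits + len(string.punctuation)

print(letters_one_case, bits_for(letters_one_case))      # 26 5
print(letters_and_digits, bits_for(letters_and_digits))  # 62 6
print(with_specials, bits_for(with_specials))            # 94 7
```

Under these assumptions, 5 bits stop being enough as soon as both cases and digits are included, and 7 bits are needed once punctuation is added.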
CODE SYSTEMS FOR ALPHANUMERIC DATA
• Various code systems are used to represent Alphanumeric symbols:
1. ASCII (American Standard Code for Information Interchange)
2. Extended ASCII
3. EBCDIC (Extended Binary Coded Decimal Interchange Code)
4. Unicode (Universal Code)
ASCII

• ASCII stands for American Standard Code for Information Interchange.
• The code uses 7 bits to encode 128 unique characters.
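
A few ASCII codes can be inspected with Python's built-in ord function. The helper name ascii_bits is hypothetical and only meant as a sketch of the 7-bit representation.

```python
def ascii_bits(ch: str) -> str:
    """Return the 7-bit ASCII pattern of a single character."""
    code = ord(ch)
    if code > 127:
        raise ValueError("not an ASCII character")
    return format(code, "07b")  # zero-padded to 7 bits

for ch in "Az9":
    print(ch, ord(ch), ascii_bits(ch))
# A 65 1000001
# z 122 1111010
# 9 57 0111001
```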
EXTENDED ASCII
• It was invented to make the bit-pattern length equal to 8 bits (one byte) by adding a bit to the left of the ASCII code representation.
Ex. If the ASCII code is 1111111, the Extended ASCII code is 01111111.

• Using eight bits instead of seven allows Extended ASCII to provide codes for 256 characters.
• The additional “Extended ASCII” codes start with a one-valued bit; these codes are not standard and vary in meaning among different manufacturers and equipment.
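
Both points can be illustrated with a brief Python sketch. The helper to_eight_bits is a made-up name. The two code pages compared, cp437 and cp1252, are chosen here only as examples of 8-bit extensions that ship with Python; the slides do not name specific ones.

```python
def to_eight_bits(seven_bit_code: str) -> str:
    """Pad a 7-bit ASCII pattern to 8 bits by adding a 0 on the left."""
    if len(seven_bit_code) != 7 or set(seven_bit_code) - {"0", "1"}:
        raise ValueError("expected a string of exactly seven 0/1 characters")
    return "0" + seven_bit_code

print(to_eight_bits("1111111"))  # 01111111

# A byte whose leftmost bit is 1 has no single agreed meaning:
high_byte = bytes([0b10000000])
decoded = {page: high_byte.decode(page) for page in ("cp437", "cp1252")}
for page, ch in decoded.items():
    print(page, ascii(ch))       # ascii() keeps the output printable anywhere
```

The same byte decodes to a different character under each code page, which is the kind of variation the last bullet describes.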
UNICODE

• Unicode can represent most of the world's characters in modern computer use, including technical symbols and special characters used in publishing.
• One Universal Code for every character
• no matter what the platform is
• no matter what the program is
• no matter what the language is

• It is a superset of ASCII
UNICODE
• The standard is maintained by the Unicode Consortium
• As of May 2019 the most recent version, Unicode 12.1, contains a
repertoire of 137,994 characters covering 150 modern and
historic scripts, as well as multiple symbol sets and emoji.
• Unicode can be implemented by different character encodings.
• The Unicode standard defines UTF-8, UTF-16, and UTF-32, and
several other encodings are in use.
• UTF - Unicode Transformation Format
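
The difference between the three encodings can be seen by encoding the same characters with Python's built-in codecs. The function name encoded_sizes and the choice of sample characters are illustrative; the big-endian codec variants are used so that no byte-order mark is added to the counts.

```python
def encoded_sizes(ch: str) -> dict:
    """Number of bytes one character occupies in each UTF encoding."""
    # The "-be" variants fix the byte order, so no byte-order mark is prepended.
    return {name: len(ch.encode(name))
            for name in ("utf-8", "utf-16-be", "utf-32-be")}

samples = ["A", "\u20ac", "\U0001F600"]  # Latin letter, euro sign, an emoji
for ch in samples:
    print(ascii(ch), encoded_sizes(ch))

# Unicode is a superset of ASCII: ASCII text is byte-for-byte identical in UTF-8.
print("A".encode("ascii") == "A".encode("utf-8"))  # True
```

UTF-8 and UTF-16 are variable-length, while UTF-32 always uses four bytes per character.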
