NCERT Computer Practical Book

The document is a textbook for Class XI on Informatics Practices for the academic year 2024-25, published by NCERT. It covers fundamental concepts of data handling, programming using Python, and database management using MySQL, aiming to equip students with essential skills in information and communication technology. The book includes hands-on examples, self-assessment activities, and group projects to enhance learning and problem-solving abilities.


INFORMATICS PRACTICES

Textbook for Class XI

2024-25



11149 – Informatics Practices
Textbook for Class XI
ISBN 978-93-5292-148-5

First Edition
August 2019 Shravana 1941
Reprinted
June 2021 Ashadha 1943
October 2022 Kartika 1944
March 2024 Chaitra 1946

PD 20T SU

ALL RIGHTS RESERVED
• No part of this publication may be reproduced, stored in a retrieval
system or transmitted, in any form or by any means, electronic,
mechanical, photocopying, recording or otherwise without the prior
permission of the publisher.
• This book is sold subject to the condition that it shall not, by way of
trade, be lent, re-sold, hired out or otherwise disposed of without
the publisher’s consent, in any form of binding or cover other than
that in which it is published.
• The correct price of this publication is the price printed on this page.
Any revised price indicated by a rubber stamp or by a sticker or
by any other means is incorrect and should be unacceptable.

© National Council of Educational Research and Training, 2019

205.00

OFFICES OF THE PUBLICATION DIVISION, NCERT
NCERT Campus, Sri Aurobindo Marg, New Delhi 110 016. Phone : 011-26562708
108, 100 Feet Road, Hosdakere Halli Extension, Banashankari III Stage, Bengaluru 560 085. Phone : 080-26725740
Navjivan Trust Building, P.O.Navjivan, Ahmedabad 380 014. Phone : 079-27541446
CWC Campus, Opp. Dhankal Bus Stop, Panihati, Kolkata 700 114. Phone : 033-25530454
CWC Complex, Maligaon, Guwahati 781 021. Phone : 0361-2674869

Publication Team
Head, Publication Division : Anup Kumar Rajput
Chief Editor : Shveta Uppal
Chief Production Officer : Arun Chitkara
Chief Business Manager (In charge) : Amitabh Kumar
Production Officer : Om Prakash
Cover Design and Layout : Meetu Sharma (Contractual)

Printed on 80 GSM paper with NCERT watermark

Published at the Publication Division by the Secretary, National Council
of Educational Research and Training, Sri Aurobindo Marg,
New Delhi 110016 and printed at Box Corugators and Offset Printers,
Plot No. 14A & B, Sector-1, Industrial Area, Govindpura, Bhopal- 462 023



Foreword
Information Technology has continuously been crossing the barriers
of access and communication and reaching more and more people.
The number of internet users in India has been on the rise. The
tremendous growth in computer science, telecommunications
and information technology has resulted in automation of various
tasks and contributed to the ease of living. Technology has made
continuous inroads into diverse areas — be it business, commerce,
science, sports, health, transportation or education. Today, we are
living in an interconnected world where computer based applications
influence the way we learn, communicate, commute, or even socialise.
With so many users of information and communication
technology (ICT), huge volumes of data are continuously generated
at an unprecedented rate. Many innovative business models are
being evolved which utilise such data to reach potential customers
in a more targeted way. Government agencies are also using data
to deliver services, fast-track the progress of different programmes,
strengthen accountability and make more informed decisions.
This has been creating better opportunities for our youth to enter
not only the field of technical education but also the world of work.
NCERT, for the first time, has developed a textbook on ‘Informatics
Practices’ to develop skill sets in students to make use of the
opportunities provided by ICT.
This book focuses on the fundamental concepts related to
handling of data while opening a window to the emerging areas of
data processing. It seeks to address the dual challenges of reducing
curricular load as well as introducing the latest development in the
field of ICT.
As an organisation committed to systemic reforms and continuous
improvement in the quality of its curricular material, NCERT
welcomes comments and suggestions to enable us to bring about
necessary changes in its further publications.

Hrushikesh Senapaty
Director
National Council of Educational Research and Training
New Delhi
July 2019



Preface
In the present education system of our country, specialised/discipline
based courses are introduced at the higher secondary stage. This stage
is crucial as well as challenging because of the transition from general
to discipline-based curriculum. The syllabus at this stage needs to have
sufficient rigour and depth while remaining mindful of the comprehension
level of the learners. Further, the textbook should not be heavily loaded
with content.
We are living in an era where information drives many of our
socio-economic decisions. Millions of people access the internet round
the clock to avail of various services, thereby generating vast amounts
of data. Processing of data is becoming a key skill with applications
across disciplines. Thus, the study of basic concepts of data handling
and analysis is becoming more and more desirable. There are courses
offered in the name of Computer Science, Information and Communication
Technology (ICT), Information Technology (IT), etc. by various boards and
schools up to the secondary stage, as an optional subject. These mainly
focus on using computers for word processing, presentation tools and
application software.
Informatics Practices (IP) at the higher secondary stage of school
education is also offered as an optional subject. At this stage, students
can take up IP with the aim of pursuing a career in data science or related
areas after going through professional courses at higher levels. Therefore,
at the higher secondary stage, the curriculum of IP introduces the basics
of database management systems and data processing. The book has eight
chapters covering the following broader themes:
• Basic understanding of computer systems and their evolution,
introduction to software and their categorisation, computer
memory, awareness of emerging trends in the field of information
and communication technology.
• Basic constructs of a program using Python programming
language — program structure, identifiers, variables, flow of control,
advanced data types like Lists and Dictionaries.
• Handling data using specialised Python library called NumPy — concept
of single and multi-dimensional Array.
• Concepts of data, database, and relational database management
system using MySQL. Structured query language — data definition,
data manipulation and data querying.
The Python programming language and NumPy are introduced using both
the interactive and script modes. A number of hands-on examples are given
in Python, NumPy and MySQL to gradually explain the methodology to
solve different types of problems and handle data. The programming and
database related examples as well as the exercises in those chapters are
required to be solved on a computer and verified against the given outputs.


The chapters in this book have two additional components: activities
for self assessment and ‘think and reflect’ to generate further interest in
the learner.
Group projects through case studies are proposed to solve complex
problems. Some exercises have been made in case-study form to promote
problem-finding and problem-solving skills.
These chapters have been written by involving practising teachers as
well as subject experts. They have been iteratively peer-reviewed, and several
iterations have resulted in this book. Thanks to the authors and reviewers
for their valuable contribution.
Comments and suggestions are welcome to make this endeavour par
excellence.

Dr. Rejaul Karim Barbhuiya
Assistant Professor,
Department of Education in
Science and Mathematics, NCERT



Textbook Development Committee

Members
Anuradha Khattar, Assistant Professor, Miranda House, University of
Delhi, Delhi
Chetna Khanna, Freelance Educationist, Delhi
Gurpreet Kaur, PGT (Computer Science), GD Goenka Public School, Delhi
Harita Ahuja, Assistant Professor, Acharya Narendra Dev College,
University of Delhi, Delhi
Mudasir Wani, Assistant Professor, Govt. Degree College for Women,
Srinagar, Jammu and Kashmir
Om Vikas, Professor (Retd.), Formerly Director, ABV-IIITM, Gwalior,
Madhya Pradesh
Priti Rai Jain, Assistant Professor, Miranda House, University of Delhi,
Delhi
Rinku Kumari, PGT (Computer Science), Kendriya Vidyalaya, Sainik Vihar,
Delhi
Sharanjit Kaur, Associate Professor, Acharya Narendra Dev College,
University of Delhi, Delhi
Tapasi Ray, Formerly Global IT Director, Huntsman Corporation, Singapore

Member-Coordinator
Rejaul Karim Barbhuiya, Assistant Professor, DESM, NCERT, Delhi



Acknowledgements

The National Council of Educational Research and Training acknowledges
the valuable contributions of the individuals and organisations involved in
the development of the Informatics Practices textbook for Class XI.
The council expresses its gratitude to the syllabus development team
including MPS Bhatia, Professor, Netaji Subhas Institute of Technology,
Delhi; T V Vijay Kumar, Professor, School of Computer and Systems
Sciences, Jawaharlal Nehru University, New Delhi; Zahid Raza, Associate
Professor, School of Computer and Systems Sciences, Jawaharlal Nehru
University, New Delhi; Vipul Shah, Principal Scientist, Tata Consultancy
Services, and the CSpathshala team; Smruti Ranjan Sarangi, Associate
Professor, Department of Computer Science and Engineering, Indian
Institute of Technology Delhi; Vikram Goyal, Associate Professor,
Indraprastha Institute of Information Technology (IIIT) Delhi; Vandana
Tyagi, PGT (Computer Science), Kendriya Vidyalaya, JNU, Delhi and Mamur
Ali, Assistant Professor, Central Institute of Educational Technology,
NCERT, New Delhi.
The council is thankful to the following resource persons for their
contribution in editing, reviewing, and refining the manuscript of this book:
D.N. Sansanwal, Retd. Professor, Devi Ahilya Vishwavidyalaya, Indore;
Veer Saini Dixit, Assistant Professor, Atma Ram Sanatan Dharma College,
University of Delhi, Delhi; Mukesh Kumar, Teacher, DPS RK Puram, Delhi;
Gautam Sarkar, Teacher, Modern School, Barakhamba Road, Delhi; Aswin
K. Dash, Teacher, Mother’s International School, Delhi; Nancy Sehgal,
Teacher, Mata Jai Kaur Public School, Delhi; Neelima Gupta, Professor,
Department of Computer Science, University of Delhi; Anamika Gupta,
Assistant Professor, Shaheed Sukhdev College of Business Studies,
University of Delhi. The council further acknowledges the contribution of
Anuja Krishn, freelance editor, for refining the language of the
chapters.
The council is grateful to Dinesh Kumar, Professor and Head, DESM
for his valuable cooperation and support throughout the development
of this book.
The council also gratefully acknowledges the contributions of Meetu
Sharma, Graphic Designer; Kanika Walecha, DTP Operator; Pooja, Junior
Project Fellow; Hari Darshan Lodhi and Junaid Ahmed, DTP Operator
(Contractual); Chanchal Chauhan, Proofreader (Contractual) and
Aishwarya Bhattacharyya, Assistant Editor (Contractual), in shaping
this book. The contributions of the office of the APC, DESM and
Publication Division, NCERT, New Delhi, in bringing out this book are
also duly acknowledged.



Contents
Foreword
Preface

Chapter 1 Computer System
1.1 Introduction to Computer System
1.2 Evolution of Computer
1.3 Computer Memory
1.4 Software

Chapter 2 Emerging Trends
2.1 Introduction to Emerging Trends
2.2 Artificial Intelligence (AI)
2.3 Big Data
2.4 Internet of Things (IoT)
2.5 Cloud Computing
2.6 Grid Computing
2.7 Blockchains

Chapter 3 Brief Overview of Python
3.1 Introduction to Python
3.2 Python Keywords
3.3 Identifiers
3.4 Variables
3.5 Data Types
3.6 Operators
3.7 Expressions
3.8 Input and Output
3.9 Debugging
3.10 Functions
3.11 if..else Statements
3.12 for Loop
3.13 Nested Loops

Chapter 4 Working with Lists and Dictionaries
4.1 Introduction to List
4.2 List Operations
4.3 Traversing a List
4.4 List Methods and Built-in Functions
4.5 List Manipulation
4.6 Introduction to Dictionaries
4.7 Traversing a Dictionary
4.8 Dictionary Methods and Built-in Functions
4.9 Manipulating Dictionaries

Chapter 5 Understanding Data
5.1 Introduction to Data
5.2 Data Collection
5.3 Data Storage
5.4 Data Processing
5.5 Statistical Techniques for Data Processing

Chapter 6 Introduction to NumPy
6.1 Introduction
6.2 Array
6.3 NumPy Array
6.4 Indexing and Slicing
6.5 Operations on Arrays
6.6 Concatenating Arrays
6.7 Reshaping Arrays
6.8 Splitting Arrays
6.9 Statistical Operations on Arrays
6.10 Loading Arrays from Files
6.11 Saving NumPy Arrays in Files on Disk

Chapter 7 Database Concepts
7.1 Introduction
7.2 File System
7.3 Database Management System
7.4 Relational Data Model
7.5 Keys in a Relational Database

Chapter 8 Introduction to Structured Query Language (SQL)
8.1 Introduction
8.2 Structured Query Language (SQL)
8.3 Data Types and Constraints in MySQL
8.4 SQL for Data Definition
8.5 SQL for Data Manipulation
8.6 SQL for Data Query
8.7 Data Updation and Deletion



Chapter 1
Computer System

“A computer would deserve to be called
intelligent if it could deceive a human into
believing that it was human.”
– Alan Turing

In this chapter
»» Introduction to Computer System
»» Evolution of Computer
»» Computer Memory
»» Software

1.1 Introduction to Computer System


A computer is an electronic device that can be
programmed to accept data (input), process it and
generate a result (output). A computer, together with
additional hardware and software, is called a
computer system.
A computer system primarily comprises a central
processing unit, memory, input/output devices, and
storage devices. All these components function together
as a single unit to deliver the desired output. A computer
system comes in various forms and sizes. It can vary
from a high-end server to a personal desktop, laptop,
tablet computer, or smartphone.
Figure 1.1 shows the block diagram of a computer
system. The directed lines represent the flow of data
and signals between the components.

[Figure 1.1: Components of a Computer System. Blocks: Input Device; Central Processing Unit (CPU), containing the Control Unit (CU) and the Arithmetic Logic Unit (ALU); Primary Memory; Secondary Storage Devices; Output Device]

1.1.1 Central Processing Unit (CPU)
The CPU is the electronic circuitry of a computer that
carries out the actual processing and is usually referred
to as the brain of the computer. It is also commonly
called the 'processor'. Physically, a CPU can
be placed on one or more microchips called integrated
circuits (IC). ICs are made of semiconductor materials.
The CPU is given instructions and data through
programs. It fetches the program and data
from the memory, performs arithmetic and logical
operations as per the given instructions, and stores the
result back in memory.
While processing, the CPU stores the data as well
as instructions in its local memory, called registers.
Registers are part of the CPU chip and they are limited
in size and number. Different registers are used for
storing data, instructions or intermediate results.
Other than the registers, the CPU has two main
components: the Arithmetic Logic Unit (ALU) and the Control
Unit (CU). The ALU performs all the arithmetic and logic
operations that need to be done as per the instructions in a
program. The CU controls sequential instruction execution,
interprets instructions and guides data flow through the
computer’s memory, ALU and input or output devices.
The CPU is also popularly known as the microprocessor.
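
The fetch, process and store cycle described above can be sketched as a toy machine in Python. This is only an illustration under stated assumptions: the instruction names (LOAD, ADD, SUB, STORE, HALT), the register names (PC, ACC, IR) and the memory layout are made up for this example and do not belong to any real CPU.

```python
# Toy stored-program machine. The instruction set is hypothetical,
# invented only to illustrate the fetch-decode-execute cycle.
def run(memory):
    """Execute the program held in `memory` and return the final memory."""
    # Registers: program counter, accumulator, instruction register.
    registers = {"PC": 0, "ACC": 0, "IR": None}
    while True:
        # Control unit: fetch the next instruction, then advance the counter.
        registers["IR"] = memory[registers["PC"]]
        registers["PC"] += 1
        opcode, operand = registers["IR"]      # decode
        if opcode == "LOAD":                   # copy a memory cell into ACC
            registers["ACC"] = memory[operand]
        elif opcode == "ADD":                  # ALU: addition
            registers["ACC"] += memory[operand]
        elif opcode == "SUB":                  # ALU: subtraction
            registers["ACC"] -= memory[operand]
        elif opcode == "STORE":                # write the result back to memory
            memory[operand] = registers["ACC"]
        elif opcode == "HALT":
            return memory
        else:
            raise ValueError(f"unknown opcode: {opcode}")


# Cells 0 to 3 hold instructions; cells 4 to 6 hold data.
program = [
    ("LOAD", 4),
    ("ADD", 5),
    ("STORE", 6),
    ("HALT", None),
    7,
    5,
    0,
]
result = run(program)
print(result[6])   # 12
```

Note that the program and its data sit in the same list, which mirrors the idea that both are held in memory while the CPU works on them.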
1.1.2 Input Devices
The devices through which data and control signals are sent
to a computer are called input devices. These
devices convert the input data into a digital form that is
acceptable to the computer system. Some examples of
input devices are the keyboard, mouse, scanner and touch
screen, as shown in Figure 1.2. Specially designed
braille keyboards are also available to help the visually
impaired enter data into a computer. Besides, we
can now enter data through voice; for example, we can
use Google voice search to search the web by speaking
the search string.

[Figure 1.2: Input Devices. Keyboard, Mouse, Scanner, Touch Screen]

Data entered through an input device is temporarily
stored in the main memory (also called RAM) of the
computer system. For future use,
the data as well as instructions are stored permanently
in additional storage locations called secondary memory.
1.1.3 Output Devices

A device that receives data from a computer system
for display, physical production, etc., is called an output
device. It converts digital information into human-understandable
form. Examples include the monitor, projector,
headphone, speaker and printer. Some output devices
are shown in Figure 1.3. A braille display monitor is
useful for a visually challenged person to understand
the textual output generated by computers.
A printer is the most commonly used device to get
output in physical (hardcopy) form. Three
commonly used types of printers are inkjet, laser and dot
matrix. Nowadays, there is a new type of printer
called the 3D printer, which is used to build a physical
replica of a digital 3D design. These printers are being
used in manufacturing industries to create prototypes
of products. Their usage is also being explored in the
medical field, particularly for developing body organs.

[Figure 1.3: Output Devices. Display monitor, Speaker, Printer, 3D printer]

1.2 Evolution of Computer


From the simple calculator to the powerful modern-day
data processor, computing devices have evolved in a
relatively short span of time. The evolution of computing
devices is shown through a timeline in Figure 1.5.
The Von Neumann architecture is shown in Figure
1.4. It consists of a Central Processing Unit (CPU)
for processing arithmetic and logical instructions, a
memory to store data and programs, input and output
devices, and communication channels to send and receive
data. The Electronic Discrete Variable Automatic
Computer (EDVAC) was among the first binary computers
designed on the Von Neumann architecture. The earlier
Electronic Numerical Integrator And Computer (ENIAC)
worked with decimal numbers and was originally
programmed by setting switches and cables.

[Figure 1.4: Von Neumann Architecture for the Computer]

During the 1970s, Large Scale Integration (LSI) of
electronic circuits allowed integration of a
complete CPU on a single chip, called a microprocessor.
Moore’s Law predicted exponential growth in the number
of transistors that could be assembled on a single
microchip. In the 1980s, the processing power of computers
increased exponentially by integrating around 3 million
components on a small-sized chip, termed Very
Large Scale Integration (VLSI). Further advancement in
technology has made it feasible to fabricate a high density
of transistors and other components (approx. 10^6
components) on a single IC, called Super Large Scale
Integration (SLSI), as shown in Figure 1.6.

A punched card is a piece of stiff paper that
stores digital data in the form of holes at
predefined positions.

[Figure 1.5: Timeline Showing Key Inventions in Computing Technology]
500 BC: Abacus. Computing is attributed to the invention of the ABACUS
almost 3000 years ago. It was a mechanical device capable of doing simple
arithmetic calculations only.
1642: Pascaline. Blaise Pascal invented a mechanical calculator known as
the Pascal calculator or Pascaline to do addition and subtraction of two
numbers directly, and multiplication and division through repeated
addition or subtraction.
1834: Analytical Engine. Charles Babbage invented the analytical engine, a
mechanical computing device for inputting, processing, storing and
displaying the output, which is considered to form the basis of modern
computers.
1890: Tabulating Machine. Herman Hollerith designed a tabulating machine
for summarising the data stored on punched cards. It is considered to be
the first step towards programming.
1937: Turing Machine. The Turing machine concept was a general purpose
programmable machine that was capable of solving any problem by executing
the program stored on the punched cards.
1945: EDVAC/ENIAC. John Von Neumann introduced the concept of the stored
program computer, which was capable of storing data as well as programs in
the memory. The EDVAC was designed on this concept, and the earlier ENIAC
was later adapted to it.
1947: Transistor. Vacuum tubes were replaced by transistors, developed at
Bell Labs using semiconductor materials.
1970: Integrated Circuit. An Integrated Circuit (IC) is a silicon chip
which contains an entire electronic circuit on a very small area. The size
of computers has drastically reduced because of ICs.

IBM introduced its first personal computer (PC) for
the home user in 1981, and Apple introduced Macintosh
machines in 1984. The popularity of the PC surged
with the introduction of Graphical User Interface (GUI)
based operating systems by Microsoft and others, in
place of computers with only a command line interface,
like UNIX or DOS. Around the 1990s, the growth of the World
Wide Web (WWW) further accelerated mass usage of
computers, and thereafter computers have become an
indispensable part of everyday life.

[Figure 1.6: Exponential Increase in Number of Transistors used in ICs Over Time. Log-scale plot of the number of transistors per integrated circuit against year (1940 to 2020), from the invention of the transistor through Intel microprocessors 4004, 8086, 286, 386, 486, Pentium, Pentium II, Pentium III, Pentium IV, Core 2 Duo and Core i7; the count doubles every 2 years]

Further, with the introduction of laptops, personal
computing was made portable to a great extent. This
was followed by smartphones, tablets and other
personal digital assistants. These devices have
leveraged the technological advancements in processor
miniaturisation, faster memory, high speed data and
connectivity mechanisms.
The next wave of computing devices includes
wearable gadgets such as smart watches, lenses,
headbands, headphones, etc. Further, smart appliances
are becoming a part of the Internet of Things (IoT), by
leveraging the power of artificial intelligence.

In 1965, Intel co-founder Gordon Moore introduced Moore’s
Law, which predicted that the number of transistors on a chip
would double every two years while the costs would be halved.
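
The doubling rule of Moore's Law mentioned in this section can be tried out with a few lines of Python. This is a rough sketch, not a model of real chips: the starting point (about 2,300 transistors for the Intel 4004 in 1971) is an assumption brought in for the example, and the function name projected_transistors is made up.

```python
def projected_transistors(year, base_year=1971, base_count=2300, doubling_years=2):
    """Project a transistor count, assuming one doubling every `doubling_years` years.

    The defaults (2,300 transistors in 1971) are an assumed starting point.
    """
    doublings = (year - base_year) / doubling_years
    return base_count * 2 ** doublings


for year in (1971, 1981, 1991, 2001, 2011):
    print(year, round(projected_transistors(year)))
```

Ten years mean five doublings, so each printed value is 32 times the previous one. This is why the curve in Figure 1.6 appears as a straight line on a logarithmic scale.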

1.3 Computer Memory


A computer system needs memory to store the data and
instructions for processing. Whenever we talk about
the “memory” of a computer system, we usually mean
the main or primary memory. The secondary
memory (also called storage device) is used to store
data, instructions and results permanently for
future use.

Think and Reflect
How do different components of a
computer communicate with each other?

1.3.1 Units of Memory
A computer system uses binary numbers to store and
process data. The binary digits 0 and 1, which are the
basic units of memory, are called bits. Further, these
bits are grouped together to form words. A 4-bit word
is called a nibble. Examples of nibbles are 1001, 1010,
0010, etc. A two-nibble word, i.e., an 8-bit word, is called a
byte, for example, 01000110, 01111100, 10000001, etc.
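
The grouping of bits into nibbles and bytes can be explored in Python's interactive mode. The short sketch below takes one of the byte examples given above and uses only built-in functions; the variable names are chosen for this illustration.

```python
byte = "01000110"              # one of the 8-bit examples from the text

# A byte is two nibbles: the left 4 bits and the right 4 bits.
high_nibble, low_nibble = byte[:4], byte[4:]
print(high_nibble, low_nibble)             # 0100 0110

# int(text, 2) reads the string as a base-2 (binary) number.
value = int(byte, 2)
print(value)                               # 70

# format(..., "08b") goes the other way, padding with zeros to 8 bits.
print(format(value, "08b"))                # 01000110

# n bits can represent 2**n distinct values.
print(2 ** 4, 2 ** 8)                      # 16 256
```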
Like any other standard unit, bytes are grouped
together to make bigger chunks or units
of memory. Table 1.1 shows different
measurement units for digital data stored in
computer memories.

Table 1.1 Measurement units for digital data
Unit             Description          Unit             Description
KB (Kilobyte)    1 KB = 1024 Bytes    PB (Petabyte)    1 PB = 1024 TB
MB (Megabyte)    1 MB = 1024 KB       EB (Exabyte)     1 EB = 1024 PB
GB (Gigabyte)    1 GB = 1024 MB       ZB (Zettabyte)   1 ZB = 1024 EB
TB (Terabyte)    1 TB = 1024 GB       YB (Yottabyte)   1 YB = 1024 ZB
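
Since every unit in Table 1.1 is 1024 times the previous one, converting between units is repeated multiplication or division by 1024. The following is a minimal sketch; the function names to_bytes and human_readable are made up for this example.

```python
UNITS = ["Bytes", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def to_bytes(amount, unit):
    """Convert `amount` of `unit` into bytes, with 1024 as the step between units."""
    return amount * 1024 ** UNITS.index(unit)


def human_readable(num_bytes):
    """Divide by 1024 until the value drops below 1024, then attach the unit."""
    value = float(num_bytes)
    for unit in UNITS:
        if value < 1024 or unit == UNITS[-1]:
            return f"{value:.2f} {unit}"
        value /= 1024


print(to_bytes(1, "KB"))          # 1024
print(to_bytes(1, "GB"))          # 1073741824
print(human_readable(5_000_000))  # 4.77 MB
```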

1.3.2 Types of Memory


Human beings memorise many things over a lifetime,
and recall from memory to make a decision or take
some action. However, we cannot rely on our memory
completely, so we make notes and store important
data and information using other media such as a
notebook, manual, journal or document for
long-term storage. Similarly, computers have two types of
memory, namely, primary memory and secondary
memory.
(A) Primary Memory
The primary memory is an essential component of a
computer system. Programs and data are loaded into the
primary memory before processing. The CPU interacts
directly with the primary memory to perform read/write
operations. It is of two types: (i) Random Access
Memory (RAM), and (ii) Read Only Memory (ROM).
RAM is volatile, i.e., it retains data only as long as
power is supplied to the computer. As soon as the
power supply is turned off, all the contents of RAM are
wiped out. It is used to store data temporarily while the
computer is working. Whenever the computer is started
or a software application is launched, the required
program and data are loaded into RAM for processing.
RAM is usually referred to as main memory and it is
faster than the secondary memory or storage devices.
On the other hand, ROM is non-volatile, which means its
contents are not lost even when the power is turned off.
It is used as a small but fast permanent storage for
contents which are rarely changed. For example, the
startup program (boot loader) that loads the operating
system into RAM is stored in ROM.

Think and Reflect
Suppose there is a computer with RAM but no secondary
storage. Can we install a software on that computer?

(B) Cache Memory
(B) Cache Memory
RAM is faster than secondary storage, but not as fast as
a computer processor. So, the CPU may have to slow down
while it waits for RAM. To speed up the operations of the
CPU, a very high speed memory, known as cache, is placed
between the CPU and the primary memory. It stores
copies of the data from frequently accessed primary
memory locations, thus reducing the average time
required to access data from primary memory. When
the CPU needs to access memory, it first examines the
cache. If the required data is found there, it is read from
the cache; otherwise the primary memory is accessed.
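
The "examine the cache first, otherwise go to primary memory" rule can be simulated in Python. This is a simplified sketch under stated assumptions: the class name CachedMemory is made up, a dictionary stands in for primary memory, and the cache discards the least recently used entry when it is full, which is one common policy and not something this chapter specifies.

```python
from collections import OrderedDict


class CachedMemory:
    """Primary memory fronted by a small cache (least recently used entry is evicted)."""

    def __init__(self, primary, cache_size=2):
        self.primary = primary            # dict: address -> value, stands in for RAM
        self.cache = OrderedDict()        # small, fast store of recently used cells
        self.cache_size = cache_size
        self.hits = 0
        self.misses = 0

    def read(self, address):
        if address in self.cache:         # examine the cache first
            self.hits += 1
            self.cache.move_to_end(address)
            return self.cache[address]
        self.misses += 1                  # otherwise go to primary memory
        value = self.primary[address]
        self.cache[address] = value       # keep a copy for next time
        if len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)    # drop the least recently used entry
        return value


memory = CachedMemory({0: "a", 1: "b", 2: "c"})
for address in [0, 1, 0, 0, 2, 1]:
    memory.read(address)
print(memory.hits, memory.misses)         # 2 4
```

The two hits are the repeated reads of address 0. The more often a program revisits the same locations, the larger the share of reads served by the cache.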
(C) Secondary Memory
Primary memory has limited storage capacity and
is either volatile (RAM) or read-only (ROM). Thus, a
computer system needs auxiliary or secondary memory
to permanently store the data or instructions for
future use. The secondary memory is non-volatile and
has larger storage capacity than primary memory. It
is slower and cheaper than the main memory, but it
cannot be accessed directly by the CPU. Contents of
secondary storage need to be first brought into the main
memory for the CPU to access. Examples of secondary
memory devices include Hard Disk Drive (HDD), CD/DVD,
Memory Card, etc., as shown in Figure 1.7.
However, these days, there are secondary storage
devices like the Solid-State Drive (SSD) which support very
fast data transfer speeds as compared to earlier HDDs.
Also, data transfer between computers has become
easier and simpler due to the availability of small-sized
and portable flash/pen drives.

[Figure 1.7: Storage Devices. Labelled item: Pen Drive]
1.3.3 Data Capturing, Storage, and Retrieval
To process the data, we need to first input or capture
the data. This is followed by its storage in a file or a
database so that it can be used in the future. Whenever
data is to be processed, it is first retrieved from the file/
database so that we can perform further actions on it.
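
The capture, store and retrieve sequence described above can be sketched with Python's built-in csv module. The rows, the file name sales.csv and the field names are made up for this illustration; a real application might capture data from a barcode reader and store it in a database instead.

```python
import csv
import os
import tempfile

# Capture: in a real system these rows might come from a keyboard or a barcode reader.
captured = [
    {"item": "notebook", "quantity": "3"},
    {"item": "pen", "quantity": "10"},
]

# Storage: write the captured rows to a CSV file for later use.
path = os.path.join(tempfile.mkdtemp(), "sales.csv")
with open(path, "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["item", "quantity"])
    writer.writeheader()
    writer.writerows(captured)

# Retrieval: read the rows back from the file before processing them.
with open(path, newline="") as f:
    retrieved = list(csv.DictReader(f))

total = sum(int(row["quantity"]) for row in retrieved)
print(total)   # 13
```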
(A) Data Capturing
It involves the process of gathering data from different
sources in digital form. Data may be captured using a
keyboard, barcode readers (used at shopping outlets, as
shown in Figure 1.8), remote sensors on earth-orbiting
satellites, etc. Comments and posts on
social media are also captured as data. Sometimes,
heterogeneity among data sources makes data capturing
a complex task.

[Figure 1.8: Capturing Data using Barcode Reader]

Activity 1.1
List all secondary storage devices available
in your school or home.

(B) Data Storage
It is the process of storing the captured data for
processing later. Nowadays, data is being produced at
a very high rate, and therefore data storage has become
a challenging task. However, the decrease in the cost
of digital storage devices has helped in simplifying
this task. There are numerous digital storage devices
available in the market, as shown in Figure 1.7.
Data keeps on increasing with time. Hence, the
storage devices also need to be upgraded periodically.
In large organisations, computers with larger and
faster storage, called data servers, are deployed to store
vast amounts of data. Such dedicated computers help
in processing data efficiently. However, the cost (both
hardware and software) of setting up a data server as
well as its maintenance is high, especially for small
organisations and startups.

Activity 1.2
Visit some places like a bank, automobile showroom,
shopping mall, tehsil office, etc., and find out 2–3 names
of tools/instruments used to capture data in
digital format.

(C) Data Retrieval
It involves fetching data from the storage devices for
processing as per the user requirement. As databases
grow, the challenges involved in searching and retrieving
the data in acceptable time also increase. Minimising
data access time is crucial for faster data processing.
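
One way to see why access time grows with the amount of data is to count how many records must be examined to find one of them. The sketch below compares a one-by-one scan with a lookup through a Python dictionary that acts as a simple index. The record layout and the function name linear_search are assumptions made for this example.

```python
records = [{"roll_no": n, "name": f"Student {n}"} for n in range(1, 1001)]


def linear_search(rows, roll_no):
    """Scan the rows one by one; return the match and the number of rows examined."""
    for examined, row in enumerate(rows, start=1):
        if row["roll_no"] == roll_no:
            return row, examined
    return None, len(rows)


# An index maps each key straight to its record.
index = {row["roll_no"]: row for row in records}

row, examined = linear_search(records, 1000)
print(row["name"], examined)        # Student 1000 1000
print(index[1000]["name"])          # Student 1000
```

The scan had to examine all 1000 records to reach the last one, while the dictionary lookup went to the record directly.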
1.3.4 Data Deletion and Recovery
One of the biggest threats associated with digital data is
its deletion. The storage devices can malfunction or crash
down resulting in the deletion of the stored data. Users
can accidentally erase data from storage devices, or a
hacker/malware can delete the digital data intentionally.
Deleting digitally stored data means changing the
details of data at bit level, which can be very time-consuming.
Therefore, when any data is simply deleted,
its address entry is marked as free, and that much
space is shown as empty to the user, without actually
deleting the data.

Activity 1.3
Explore possible ways of recovering deleted data or data from a corrupted device.

In case data gets deleted accidentally or corrupted,
there arises a need to recover the data. Recovery of the
data is possible only if the contents/memory space


marked as deleted have not been overwritten by some
other data. Data recovery is a process of retrieving
deleted, corrupted and lost data from secondary
storage devices.
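The idea that deletion only marks space as free, and that recovery works until that space is overwritten, can be illustrated with a small Python sketch. This is a toy model written for illustration: the class `SimpleDisk` and its methods are hypothetical names, and real file systems are far more elaborate.

```python
class SimpleDisk:
    """Toy model of a storage device (hypothetical, for illustration only)."""

    def __init__(self, num_blocks):
        self.blocks = [None] * num_blocks     # raw contents of each block
        self.table = {}                       # file name -> block number
        self.free = set(range(num_blocks))    # blocks shown as empty to the user

    def write(self, name, data):
        block = min(self.free)                # use the lowest-numbered free block
        self.free.remove(block)
        self.blocks[block] = data             # old contents, if any, are overwritten
        self.table[name] = block
        return block

    def delete(self, name):
        # Only the address entry is removed; the block contents stay as they are.
        block = self.table.pop(name)
        self.free.add(block)
        return block

    def recover(self, block):
        # Recovery is possible only while the block has not been reused.
        if block in self.free:
            return self.blocks[block]
        return None


disk = SimpleDisk(num_blocks=2)
block = disk.write("notes.txt", "my notes")
disk.delete("notes.txt")
recovered_before = disk.recover(block)     # contents are still on the disk
disk.write("photo.jpg", "image bytes")     # reuses the freed block
recovered_after = disk.recover(block)      # old contents have been overwritten
print(recovered_before, recovered_after)
```

In this sketch the first recovery attempt succeeds because nothing has been written since the deletion, while the second fails because the new file occupies the same block.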
There are usually two security concerns associated
with data. One is its deletion by some unauthorised
person or software. This concern can be mitigated
by limiting access to the computer system and using
passwords for user accounts and files, wherever
possible. There is also an option of encrypting files to
protect them from unwanted modification.
The other concern is related to unwanted recovery
of data by an unauthorised user/software. Many a time,
we discard our old, broken or malfunctioning storage
devices without taking care to delete data. We assume
that the contents of deleted files are permanently
removed. However, if these storage devices fall into the
hands of mischief-mongers, they can easily recover
data from such devices; this poses a threat to data
confidentiality. This concern can be mitigated by using
proper tools to delete or shred data before disposing of
any old or faulty storage device.

Activity 1.4
Create a test file and then delete it using Shift+Delete from the keyboard. Now recover the file using the methods you have explored in Activity 1.3.

1.4 Software
So far, we have studied the physical
components or the hardware of the computer system.
But the hardware is of no use on its own. Hardware
needs to be operated by a set of instructions. These
sets of instructions are referred to as software. It is that
component of a computer system, which we cannot
touch or view physically. It comprises the instructions
and data to be processed using the computer hardware.
The computer software and hardware complete any
task together.
The software comprises a set of instructions which
on execution deliver the desired outcome. In other
words, each software is written for some computational
purpose. Some examples of software include operating
systems like Ubuntu or Windows 7/10, word processing
tools like LibreOffice Writer or Microsoft Word, video
players like VLC Player, photo editors like Paint and
LibreOffice Draw. A document or image stored on the
hard disk or pen drive is referred to as a softcopy. Once
printed, the document or an image is called a hardcopy.

Hardware refers to the physical components of the computer system which can be seen and touched. For example, RAM, keyboard, printer, monitor, CPU etc. On the other hand, software is a set of instructions and data that makes hardware functional to complete the desired task.


1.4.1 Need of Software


The sole purpose of a software is to make computer
hardware useful and operational. A software knows
how to make different hardware components of a
computer work and communicate with each other as
well as with the end user. We cannot talk to or instruct
the hardware of a computer directly. Hence, software
acts as an interface between human users and
the hardware.
Depending on the mode of interaction with hardware
and functions to be performed, software can be broadly
classified into three categories viz. i) System software
ii) Programming tools and iii) Application software. The
categorisation of software is shown in Figure 1.9.
1.4.2 System Software
The software that provides the basic functionality
to operate a computer by interacting directly with its
constituent hardware is termed as system software. A
system software knows how to operate and use different
hardware components of a computer. It provides services
directly to the end user, or to some other software.
Examples of system software include operating systems,
system utilities, device drivers, etc.
(A) Operating System
As the name implies, operating system is a system
software that operates the computer. An operating
system is the most basic system software, without
which other software cannot work. The operating system
manages other application programs and provides
access and security to the users of the system. Some
of the popular operating systems are Windows, Linux,
Macintosh, Ubuntu, Fedora, Android, iOS, etc.
(B) System Utilities
Software used for maintenance and configuration of the
computer system is called system utility. Some system
utilities are shipped with the operating system, for
example disk defragmentation tool, formatting utility,
system restore utility, etc. Another set of utilities are
those which are not shipped with the operating system
but are required to improve the performance of the
system, for example, anti-virus software, disk cleaner
tool, disk compression software, etc.


(C) Device Drivers


Figure 1.9: Categorisation of Software

As the name signifies, the purpose of a device driver is to ensure
proper functioning of a particular device. When it comes to the overall
working of a computer system, the operating system does the work.
But every day new devices and components are being added to a
computer system. It is not possible for the operating system alone to
manage all of the existing and new peripherals, where each device
has diverse characteristics. The responsibility for overall control,
operation, and management of a particular device at the hardware level
is delegated to its device driver.
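The delegation described above can be pictured with a short Python sketch. It is only an analogy under stated assumptions: the classes `PrinterDriver`, `SpeakerDriver` and `OperatingSystem` are hypothetical and do not correspond to any real driver interface. The point is that the operating system calls the same method on every driver, and each driver hides its device-specific details.

```python
class PrinterDriver:
    """Hypothetical driver that knows how to talk to a printer."""

    def send(self, data):
        return f"printer: queued {len(data)} characters for printing"


class SpeakerDriver:
    """Hypothetical driver that knows how to talk to a speaker."""

    def send(self, data):
        return f"speaker: playing {data}"


class OperatingSystem:
    """Toy operating system that delegates device work to installed drivers."""

    def __init__(self):
        self.drivers = {}

    def install_driver(self, device_name, driver):
        self.drivers[device_name] = driver

    def output(self, device_name, data):
        # Without a driver, the operating system cannot operate the device.
        if device_name not in self.drivers:
            raise LookupError(f"no driver installed for {device_name}")
        return self.drivers[device_name].send(data)


computer = OperatingSystem()
computer.install_driver("printer", PrinterDriver())
computer.install_driver("speaker", SpeakerDriver())
print(computer.output("printer", "hello"))
print(computer.output("speaker", "song.mp3"))
```

Adding a new kind of device in this sketch needs only a new driver class; the `OperatingSystem` class itself does not change.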
The device driver acts as an interface between the
device and the operating system. It provides required
services by hiding the details of operations performed
at the hardware level of the device. Just like a language
translator, a device driver acts as a mediator between
the operating system and the attached device.

Activity 1.5
Ask your teacher to help you locate any two device drivers installed on your computer.

1.4.3 Application Software
The system software provides the core functionality of
the computer system. However, different users need the
computer system for different purposes depending upon
their requirements. Hence, a new category of software
is needed to cater to different requirements of the end-users.
This specific software that works on top of the
system software is termed as application software. There
are again two broad categories of application software:
general purpose and customised application software.

A computer system can work without application software, but it cannot work without system software. For example, we can use a computer even if no word processing software is installed, but if no operating system is installed, we cannot work on the computer. In other words, the use of computer is possible in the absence of application software.

(A) General Purpose Software
The application software developed for generic
applications, to cater to a bigger audience in general
is called general purpose software. Such ready-made
application software can be used by end users as per
their requirements. For example, spreadsheet tool
LibreOffice Calc can be used by any computer user to
do calculation or to create an account sheet. Adobe
Photoshop, GIMP, Mozilla web browser, iTunes, etc. fall
in the category of general purpose software.


(B) Customised Software


These are custom or tailor-made application software,
that are developed to meet the requirements of a specific
organisation or an individual. They are better suited to
the needs of an individual or an organisation, considering
that they are designed as per special requirements. Some
examples of customised software include websites,
school management software, accounting software, etc.
It is similar to buying a piece of cloth with specific colour
and fabric and getting it stitched as desired.

Activity 1.6
With the help of your teacher, install one application software in your computer.
1.4.4 Proprietary or Free and Open Source Software
Developers of some software allow public to freely use
their software along with source code with an aim to
improve further with each other’s help. Such software is
known as Free and Open Source Software (FOSS). For
example, the source code of operating system Ubuntu is
freely accessible for anyone with the required knowledge
to improve/add new functionality. More examples of
FOSS include Python, LibreOffice, OpenOffice, Mozilla
Firefox, etc. Sometimes, software are freely available
for use but source code may not be available. Such
software is called freeware. Examples of freeware are
Skype, Adobe Reader etc.
When software to be used has to be purchased from
the vendor who has the copyright of the software, then
it is a proprietary software. Examples of proprietary
software include Microsoft Windows, Tally, Quick Heal
etc. A software can be freeware or open source or
proprietary software depending upon the terms and
conditions of the person or group who has developed
and released that software.


Summary
• A computing device, also referred to as a computer,
processes the input data as per given instructions
to generate desired output.
• Computer processes data to generate information
whose further analysis and interpretation yields
knowledge.
• Computer system has four physical components
viz. i) CPU ii) Primary memory iii) Input device and
iv) Output device. They are referred to as hardware
of computer.
• Computer system has two types of primary
memories viz. i) RAM the volatile memory and ii)
ROM the non-volatile memory.
• Software is a set of instructions written to achieve
the desired task and is mainly categorised as
system software, programming tools and application
software.
• Hardware of a computer cannot function on its own.
It needs software to be operational or functional.
• Operating system is an interface between the user
and the computer and supervises the working of
computer system i.e. it monitors and controls the
hardware and software of the computer system.

Exercise
1. Name the software required to make a computer
functional. Write down its two primary functions.
2. What is the need of RAM? How does it differ from ROM?
3. What is the need for secondary memory?
4. Draw the block diagram of a computer system. Briefly
write about the functionality of each component.
5. Differentiate between proprietary software and freeware
software. Name two software of each type.
6. Mention any browsers used for browsing the internet.


7. Name the input/output device used to do the following:
a) To output audio
b) To enter textual data
c) To make hard copy of a text file
d) To display the data/information
e) To enter audio-based command
f) To build 3D models
g) To assist a visually impaired individual in entering
data
8. Identify the category (system, application, programming
tool) of the following software:
a) Compiler
b) Assembler
c) Ubuntu
d) Text editor
9. Convert the following into bytes:
a) 2 MB
b) 3.7 GB
c) 1.2 TB
10. What are the security threats involved when we throw
away electronic gadgets that are non-functional?
11. Write down the type of memory needed to do the following:
a) To store data permanently
b) To execute the program
c) To store the instructions which cannot be overwritten.



Chapter 2
Emerging Trends

“Computer science is no more about
computers than astronomy is about
telescopes”
- Edsger Dijkstra

In this chapter
»» Introduction to Emerging Trends
»» Artificial Intelligence (AI)
»» Big Data
»» Internet of Things (IoT)
»» Cloud Computing
»» Grid Computing
»» Blockchains

2.1 Introduction to Emerging Trends


Computers have been around for quite some time
now. New technologies and initiatives emerge
with each passing day. In order to understand the
existing technologies and have a better view of the
developments around us, we must keep an eye on
the emerging trends. Many new technologies are
introduced almost every day. Some of these do not
succeed and fade away over time. Some of these
new technologies prosper and persist over time,
gaining attention from users. Emerging trends
are the state-of-the-art technologies, which gain


popularity and set a new trend among users. In this
chapter, we will learn about some emerging trends
that will make a huge impact (in the future) on digital
economy and interaction in digital societies.

2.2 Artificial Intelligence (AI)


Have you ever wondered how maps in your smartphone
are able to guide you to take the fastest route to your
destination by analysing real time data, such as traffic
congestion? On uploading a photo on a social networking
site, has it ever happened that your friends in the
photograph were recognised and tagged automatically?
These are some of the examples of application of
Artificial Intelligence. The intelligent digital personal
assistants like Siri, Google Now, Cortana, Alexa are
all powered by AI. Artificial Intelligence endeavours to
simulate the natural intelligence of human beings into
machines, thus making them behave intelligently. An
intelligent machine is supposed to imitate some of the
cognitive functions of humans like learning, decision-making
and problem solving. In order to make machines
perform tasks with minimum human intervention, they
are programmed to create a knowledge base and make
decisions based on it. An AI system can also learn from
past experiences or outcomes to make new decisions.

A knowledge base is a store of information consisting of facts, assumptions and rules which an AI system can use for decision making.
2.2.1 Machine Learning
Machine Learning is a subfield of Artificial
Intelligence, wherein computers have the ability to learn
from data using statistical techniques, without being
explicitly programmed by a human being. It comprises
algorithms that use data to learn on their own and make
predictions. These algorithms, called models, are first
trained and tested using training data and testing
data, respectively. After successive trainings, once these
models are able to give results to an acceptable level of
accuracy, they are used to make predictions about new
and unknown data.
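The train, test and predict cycle described above can be seen in a very small Python sketch. Everything here is an assumption made for illustration: the data (hours studied and result) is made up, and the "model" is just a single threshold learnt from the training data, which is far simpler than the models used in practice.

```python
from statistics import mean


def train(examples):
    """Learn a threshold: the midpoint between the average 'fail' and 'pass' hours."""
    fail_hours = [hours for hours, label in examples if label == "fail"]
    pass_hours = [hours for hours, label in examples if label == "pass"]
    return (mean(fail_hours) + mean(pass_hours)) / 2


def predict(threshold, hours):
    """Predict a label for a new value using the learnt threshold."""
    return "pass" if hours >= threshold else "fail"


def accuracy(threshold, examples):
    """Fraction of examples for which the prediction matches the known label."""
    correct = sum(1 for hours, label in examples if predict(threshold, hours) == label)
    return correct / len(examples)


# Made-up data: (hours studied, result)
training_data = [(1, "fail"), (2, "fail"), (3, "fail"), (6, "pass"), (7, "pass"), (8, "pass")]
testing_data = [(2.5, "fail"), (6.5, "pass"), (9, "pass")]

threshold = train(training_data)                  # training step
test_accuracy = accuracy(threshold, testing_data)  # testing step
new_prediction = predict(threshold, 5)            # prediction on new, unseen data
print(threshold, test_accuracy, new_prediction)
```

Note that the threshold is not written by the programmer; it is computed from the training data, which is the sense in which the program "learns".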
2.2.2 Natural Language Processing (NLP)

Activity 2.1
Find out how NLP is helping differently-abled persons.

The predictive typing feature of search engine that
helps us by suggesting the next word in the sentence
while typing keywords and the spell checking features
are examples of Natural Language Processing (NLP).
It deals with the interaction between human and


computers using human
spoken languages, such as
Hindi, English, etc.
In fact it is possible to search the web or operate or
control our devices using our voice. All this has been made
possible by NLP. An NLP system can perform text-to-speech and
speech-to-text conversion as depicted in Figure 2.1.

Figure 2.1: Use of natural language processing
Machine translation is a rapidly emerging field where
machines are able to translate texts from one language
to another with fair amount of correctness. Another
emerging application area is automated customer
service where a computer software can interact with
customers to serve their queries or complaints.
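The predictive typing feature mentioned at the start of this section can be imitated, in a very simplified way, by counting which word most often follows another in some sample text. The sketch below is only an illustration: the sample sentences are made up, the function names are hypothetical, and real search engines use much more sophisticated language models.

```python
from collections import Counter, defaultdict


def build_model(sentences):
    """Count, for every word, the words that follow it in the sample sentences."""
    following = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for current, nxt in zip(words, words[1:]):
            following[current][nxt] += 1
    return following


def suggest(model, word):
    """Suggest the word seen most often after the given word, or None if unseen."""
    counts = model.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Made-up sample of earlier searches
sample = [
    "how to learn python",
    "how to install python",
    "how to learn sql",
    "learn python online",
]
model = build_model(sample)
print(suggest(model, "to"))      # word most often typed after "to"
print(suggest(model, "learn"))   # word most often typed after "learn"
```

With a larger sample of text the suggestions become more useful, which is why such features improve as more data becomes available.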
2.2.3 Immersive Experiences
With three-dimensional (3D) videography, the
joy of watching movies in theatres has reached
a new level. Video games are also being developed to
provide immersive experiences to the player. Immersive
experiences allow us to visualise, feel and react by
stimulating our senses. It enhances our interaction and
involvement, making them more realistic and engaging.
Immersive experiences have been used in the field of
training, such as driving simulators (Figure 2.2), flight
simulators and so on. Immersive experience can be
achieved using virtual reality and augmented reality.

Figure 2.2: Driving Simulator
(A) Virtual Reality
Figure 2.3: VR Headset

Everything that we experience in our reality is perceived
through our senses. From this came the idea that
if we can present our senses with made-up or non-real
information, our perception of reality would also
alter in response to that. Virtual Reality (VR) is a
three-dimensional, computer-generated situation that
simulates the real world. The user can explore that
environment by getting immersed in it and interacting
with the objects in it.
At present, it is achieved with the help of VR headsets.
In order to make the experience of VR more realistic, it
also supports other sensory information like sound, smell,
motion, temperature, etc. It is a comparatively new field


and has found its applications in gaming (Figure 2.3),
military training, medical procedures, entertainment,
social science and psychology, engineering and
other areas where simulation is needed for a better
understanding and learning.

Unlike Virtual Reality, Augmented Reality does not create something new; it just alters or augments the perception of the underlying physical world through additional information.

(B) Augmented Reality
The superimposition of computer-generated perceptual
information over the existing physical surroundings is
called Augmented Reality (AR). It adds components
of the digital world to the physical world, along with
the associated tactile and other sensory requirements,
thereby making the environment interactive and
digitally manipulable. Users can access information
about the nearest places with reference to their current
location. They can get information about places and
choose on the basis of user reviews. With the help of
location-based AR App, travellers can access real-time
information of historical places just by pointing their
camera viewfinder to subjects as depicted in Figure 2.4.
Location-based AR apps are major forms of AR apps.

Figure 2.4: Location based Augmented Reality
2.2.4 Robotics
A robot is basically a machine capable of carrying out one
or more tasks automatically with accuracy and precision.
Unlike other machines, a robot is programmable, which
means it can follow the instructions given through
computer programs. Robots were initially conceptualised
for doing repetitive industrial tasks that are boring or
stressful for humans or were labour-intensive. Sensors
are one of the prime components of a robot. Robots can
be of many types, such as wheeled robots, legged robots,
manipulators and humanoids. Robots that resemble
humans are known as humanoids. Robots are being
used in industries, medical science, bionics, scientific
research, military, etc.

Activity 2.2
Find out what role robots are playing in the medical field.

Robotics is an interdisciplinary branch of technology requiring applications of mechanical engineering, electronics, and computer science, among others. Robotics is primarily concerned with the design, fabrication, operation, and application of robots.

Some examples are:
• NASA’s Mars Exploration Rover (MER) mission is
a robotic space mission to study the planet
Mars (Figure 2.5).
• Sophia is a humanoid that uses artificial intelligence,
visual data processing, facial recognition and also
imitates human gestures and facial expressions, as
shown in Figure 2.6.
• A drone is an unmanned aircraft which can be
remotely controlled or can fly autonomously through


software-controlled flight plans in their embedded
systems, working in conjunction with onboard
sensors and GPS (Figure 2.7). Drones are being
used in many fields, such as journalism, filming
and aerial photography, shipping or delivery at short
distances, disaster management, search and rescue
operations, healthcare, geographic mapping and
structural safety inspections, agriculture, wildlife
monitoring or anti-poaching, besides law-enforcement and
border patrolling.

Figure 2.5: NASA’s Mars Exploration Rover (MER)
Figure 2.6: Sophia: a Humanoid
Figure 2.7: An unmanned aircraft

Think and Reflect
Can a drone be helpful in the event of a natural calamity?

2.3 Big Data


With technology making
an inroad into almost every
sphere of our lives, data
is being produced at a
colossal rate. Today, there
are over a billion Internet
users, and a majority of
the world’s web traffic is
coming from smartphones.
Figure 2.8 shows that at
the current pace, around
2.5 quintillion bytes of data
are created each day, and
the pace is increasing with
the continuous evolution of
the Internet of Things (IoT).
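To get a feel for this figure, the short calculation below converts 2.5 quintillion bytes per day into larger units and into a per-second rate. It assumes decimal units (1 TB = 10^12 bytes, 1 EB = 10^18 bytes) and treats the figure from the text as a constant rate over the day, which is only an approximation.

```python
BYTES_PER_DAY = 2.5e18            # 2.5 quintillion bytes, as quoted in the text
SECONDS_PER_DAY = 24 * 60 * 60    # 86,400 seconds

# Assumption: decimal units, 1 EB = 10**18 bytes and 1 TB = 10**12 bytes
exabytes_per_day = BYTES_PER_DAY / 10**18
bytes_per_second = BYTES_PER_DAY / SECONDS_PER_DAY
terabytes_per_second = bytes_per_second / 10**12

print(f"{exabytes_per_day} EB per day")
print(f"about {terabytes_per_second:.1f} TB every second")
```

In other words, the quoted figure corresponds to roughly 29 terabytes of new data every second.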
Figure 2.8: Sources of big data (numbers are approximate)

This results in the generation of data sets
of enormous volume and complexity called Big
Data. Such data cannot be processed and analysed
using traditional data


processing tools as the data is not only voluminous,
but also unstructured like our posts, instant messages
and chats, photographs that we share through various
sites, our tweets, blog articles, news items, opinion
polls and their comments, audio/video chats, etc.
Big data not only represents voluminous data, it also
involves various challenges like integration, storage,
analysis, searching, processing, transfer, querying and
visualisation of such data. Big data sometimes holds rich
information and knowledge which is of high business
value, and therefore there is a keen effort in developing
software and methods to process and analyse big data.

Think and Reflect
How are your digital activities contributing to generation of Big data?
2.3.1 Characteristics of Big Data
Big data exhibits the following five characteristics, shown in
Figure 2.9, that distinguish it from traditional data.
(A) Volume
The most prominent characteristic of big data is its
enormous size. If a particular data set is of such large
size that it is difficult to process it with traditional DBMS
tools, it can be termed as big data.
(B) Velocity
It represents the rate at which the data under
consideration is being generated and stored. Big data
has an exponentially higher rate of generation than
traditional data sets.
(C) Variety
It asserts that a data set has varied data, such as
structured, semi-structured and unstructured data.
Some examples are text, images, videos, web pages and
so on.

Figure 2.9: Characteristics of big data
(D) Veracity
Big data can be sometimes inconsistent, biased, noisy
or there can be abnormality in the data or issues
with the data collection methods. Veracity refers to
the trustworthiness of the data because processing
such incorrect data can give wrong results or mislead
the interpretations.
(E) Value
Big data is not just a big pile of data; it also
possesses hidden patterns and useful knowledge
which can be of high business value. But as there is a cost
of investment of resources in processing big data, we
should make a preliminary enquiry to see the potential


of the big data in terms of value discovery or else our
efforts could be in vain.
2.3.2 Data Analytics
Data analytics is the process of examining data sets
in order to draw conclusions about the information
they contain, with the aid of specialised systems
and software.
Data analytics technologies and techniques are
becoming popular day-by-day. They are used in
commercial industries to enable organisations to make
more informed business decisions. In the field of science
and technology, it can be useful for researchers to verify
or disprove scientific models, theories and hypotheses.
Pandas is a library of the programming language
Python that can be used as a tool to make data analysis
much simpler.
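As a small taste of what drawing conclusions from a data set means, the sketch below computes the average marks per subject from a few records. It deliberately uses only Python's standard library so that it runs without installing anything; a library like Pandas offers such grouping and summarising as ready-made operations. The records and the function name `average_by` are made up for illustration.

```python
from collections import defaultdict
from statistics import mean

# Made-up records for illustration
records = [
    {"student": "Asha", "subject": "Maths", "marks": 78},
    {"student": "Ravi", "subject": "Maths", "marks": 92},
    {"student": "Asha", "subject": "Science", "marks": 64},
    {"student": "Ravi", "subject": "Science", "marks": 70},
]


def average_by(rows, key, value):
    """Group rows by the given key and return the average of value for each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    return {name: mean(values) for name, values in groups.items()}


subject_average = average_by(records, "subject", "marks")
student_average = average_by(records, "student", "marks")
print(subject_average)
print(student_average)
```

The same function answers two different questions (which subject scores higher, which student scores higher) simply by changing the column used for grouping.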

2.4 Internet of Things (IoT)


The term computer network that we commonly use is
the network of computers. Such a network consists of a
laptop, desktop, server, or a portable device like tablet,
smartphone, smartwatch, etc., connected through
wire or wireless. We can communicate between these
devices using Internet or LAN. Now imagine what if
our bulbs, fans and refrigerator also became a part of
this network. How will they communicate with each
other, and what will they communicate? Think about
the advantages and tasks that can be accomplished
if all these devices with smart connectivity features
are able to communicate amongst themselves and
we are also able to communicate with them using computers
or smartphones!

Activity 2.3
Explore and list a few IoT devices available in the market.
Figure 2.10: Internet of Things (IoT)

The ‘Internet of Things’ is a network of devices that have
embedded hardware and software to communicate (connect and
exchange data) with other devices on the same network
as shown in Figure 2.10. At present, in a typical household,
many devices have advanced hardware (microcontrollers) and
software. These devices are used


in isolation from each other, with maximum human
intervention needed for operational directions and
input data. IoT tends to bring together these devices to
work in collaboration and assist each other in creating
an intelligent network of things. For example, if a
microwave oven, an air conditioner, door lock, CCTV
camera or other such devices are enabled to connect to
the Internet, we can access and remotely control them
on-the-go using our smartphone.
2.4.1 Web of Things (WoT)
Internet of Things allows us to interact with different
devices through Internet with the help of smartphones
or computers, thus creating a personal network. But to
interact with ‘n’ number of different devices, we need
to install ‘n’ different apps. Wouldn’t it be convenient
to have one interface to connect all the devices? The
web is already being used as a system to communicate
with each other. So, will it be possible to use the web
in such a way that all things can communicate with
each other in the most efficient manner by integrating
them together? Web of Things (WoT) allows the use of
web services to connect anything in the physical world,
besides human identities on web. It will pave way for
creating smart homes, smart offices, smart cities and
so on.

Activity 2.4
We use GPS to navigate outdoors. VPS is another emerging trend that uses Augmented Reality. Explore and find its other utilities.
2.4.2 Sensors
What happens when you hold your mobile vertically
or horizontally? The display also changes to vertical or
horizontal with respect to the way we hold our mobile.
This is possible with the help of two sensors, namely
accelerometer and gyroscope (gyro). The accelerometer
sensor in the mobile phones detects the orientation of
the phone. The gyroscope sensor tracks rotation or
twist of your hand and adds to the information supplied
by the accelerometer.
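How accelerometer readings might be turned into a screen orientation can be sketched as follows. This is a simplified illustration and not how any particular phone does it: the function name is hypothetical, and we assume the sensor reports the pull of gravity along the phone's short edge (x) and long edge (y) in m/s².

```python
def screen_orientation(ax, ay):
    """Guess the orientation from gravity readings along the phone's x and y axes.

    Assumption (for illustration): x runs along the short edge of the phone and
    y along the long edge. When the phone is held upright, gravity acts mostly
    along y; when it is turned on its side, gravity acts mostly along x.
    """
    if abs(ay) >= abs(ax):
        return "portrait"
    return "landscape"


upright = screen_orientation(ax=0.3, ay=9.7)    # phone held vertically
sideways = screen_orientation(ax=9.6, ay=0.5)   # phone turned on its side
print(upright, sideways)
```

A real phone combines such readings with gyroscope data and ignores brief jerks, so that the display does not flip with every small movement.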
Sensors are very commonly used for monitoring
and observing elements in real world applications. The
evolution of smart electronic sensors is contributing in
a large way to the evolution of IoT. It will lead to creation
of new sensor-based, intelligent systems.
A smart sensor is a device that takes input from
the physical environment and uses built-in computing
resources to perform predefined functions upon
detection of specific input and then processes data before
passing it on.
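The behaviour of a smart sensor can be sketched in a few lines of Python. The class below is hypothetical and uses made-up temperature readings: it smooths the raw input by averaging the most recent readings (processing) and raises an alert when the average crosses a limit (a predefined function), before passing the result on.

```python
class SmartTemperatureSensor:
    """Toy smart sensor (hypothetical): averages recent readings and flags high values."""

    def __init__(self, alert_above, window=3):
        self.alert_above = alert_above   # limit above which an alert is raised
        self.window = window             # number of recent readings to average
        self.readings = []

    def read(self, raw_value):
        # Take input from the environment and process it before passing it on.
        self.readings.append(raw_value)
        recent = self.readings[-self.window:]
        average = sum(recent) / len(recent)
        return {"average": round(average, 1), "alert": average > self.alert_above}


sensor = SmartTemperatureSensor(alert_above=40)
messages = [sensor.read(value) for value in [30, 35, 70, 75]]
for message in messages:
    print(message)
```

Because the sensor sends only the processed result, the rest of the network receives less data and can react to the alert directly.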


2.4.3 Smart Cities


With rapid urbanisation, the load on our cities is
increasing day-by-day, and there are challenges in
management of resources like land, water, waste, air
pollution, health and sanitation, traffic congestions,
public safety and security, besides the overall city
infrastructures including road, rail, bridge, electricity,
subways, disaster management, sports facilities, etc.
These challenges are forcing many city planners around
the world to look for smarter ways to manage them and
make cities sustainable and livable.

Think and Reflect
What are your ideas of transforming your city into a smart city?
The idea of a smart city as shown in Figure 2.11 makes
use of computer and communication technology along with
IoT, WoT to manage and distribute resources
efficiently. The smart building shown here
uses sensors to detect earthquake tremors
and then warn nearby buildings so that they
can prepare themselves accordingly. The smart
bridge uses wireless sensors to detect
any loose bolt, cable or crack. It alerts
concerned authorities through SMS. The smart tunnel
also uses wireless sensors to detect any leakage or
congestion in the tunnel. This information can be sent
as wireless signals across the network of sensor nodes
to a centralised computer for further analysis.

Figure 2.11: Smart City
Every sphere of life in a city like transportation
systems, power plants, water supply networks, waste
management, law enforcement, information systems,
schools, libraries, hospitals and other community
services work in unison to optimise the efficiency of city
operations and services.

2.5 Cloud Computing


Cloud computing is an emerging trend in the field of
information technology, where computer-based services
are delivered over the Internet or the cloud, for the ease


of their accessibility from anywhere using any smart
device. The services comprise software, hardware
(servers), databases, storage, etc. These resources are
provided by companies called cloud service providers
and are usually charged on a pay-per-use basis, like the way
we pay for electricity usage. We already use cloud
services while storing our pictures and files as backup
on the Internet, or hosting a website on the Internet. Through
cloud computing, a user can run a bigger application
or process a large amount of data without having the
required storage or processing power on their personal
computer as long as they are connected to the Internet.
Besides other numerous features, cloud computing
offers cost-effective, on-demand resources. A user can
avail need-based resources from the cloud at a very
reasonable cost.
2.5.1 Cloud Services
A better way to understand the cloud is to interpret
everything as a service. A service corresponds to any
facility provided by the cloud. There are three standard
models to categorise different computing services
delivered through cloud as shown in Figure 2.12. These
are Infrastructure as a Service (IaaS), Platform as a
Service (PaaS), and Software as a Service (SaaS).
(A) Infrastructure as a Service (IaaS)
The IaaS providers can offer different kinds of computing
infrastructure, such as servers, virtual machines (VM),
storage and backup facility, network components,
operating systems or any other hardware or software. Using
IaaS from the cloud, a user can use the hardware infrastructure
located at a remote location to configure, deploy and execute
any software application on that cloud infrastructure. They
can outsource the hardware and software on demand basis
and pay as per the usage, thereby saving the cost
of software, hardware and other infrastructures as well as the
cost of setting up, maintenance and security.

Figure 2.12: Cloud Computing Services


(B) Platform as a Service (PaaS)


Through this service, a user can install and execute
an application without worrying about the underlying
infrastructure and its setup. That is, PaaS provides
a platform or environment to develop, test, and deliver
software applications. Suppose we have developed a
web application using MySQL and Python. To run this
application online, we can avail a pre-configured Apache
server from cloud having MySQL and Python pre-installed.
Thus, we are not required to install MySQL
and Python on the cloud, nor do we need to configure
the web server (Apache, nginx). In PaaS, the user has
complete control over the deployed application and its
configuration. It provides a deployment environment
for developers at a much reduced cost, lessening the
complexity of buying and managing the underlying
hardware and software.

Activity 2.5
Name a few data centers in India along with the major
services that they provide.
(C) Software as a Service (SaaS)
SaaS provides on-demand access to application software,
usually requiring a licence or subscription by the
user. While using Google Docs, Microsoft Office 365,
Dropbox, etc., to edit a document online, we use SaaS
from cloud. A user is not concerned about installation
or configuration of the software application as long
as the required software is accessible. Like PaaS, a
user is provided access to the required configuration
settings of the application software that they are using
at present.
In all of the above standard service models, a user can
use on-demand infrastructure or platform or software
and is usually charged as per the usage, thereby
eliminating the need for a huge upfront investment by
a new or evolving organisation. In order to utilise and
harness the benefits of cloud computing, Government
of India has embarked upon an ambitious initiative,
‘GI Cloud’, which has been named ‘MeghRaj’
(https://cloud.gov.in).

2.6 Grid Computing


A grid is a computer network of geographically
dispersed and heterogeneous computational resources
as shown in Figure 2.13. Unlike cloud, whose primary
focus is to provide services, a grid is more application
specific and creates a sense of a virtual supercomputer


with an enormous processing power and storage. The
constituent resources are called nodes. These different
nodes temporarily come together to solve a single large
task and to reach a common goal.
Nowadays, countless computational nodes ranging
from hand-held mobile devices to personal computers and
workstations are connected to a Local Area Network (LAN)
or the Internet. Therefore, it is economically feasible to
reuse or utilise their resources like memory as well as
processing power. The grid provides an opportunity to
solve computationally intense scientific and research
problems without actually procuring costly hardware.

Think and Reflect
How can some of the emerging trends discussed in this
chapter be used as assistive tools for people with
disabilities?
Grid can be of two types — (i) Data grid, used to
manage large and distributed data having the required
multi-user access, and (ii) CPU or Processor grid, where
processing is moved from one PC to another as needed
or a large task is divided into subtasks, and allotted to
various nodes for parallel processing.
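The idea of a processor grid, where a large task is divided into subtasks and allotted to nodes, can be sketched in a few lines of Python. This is only an illustration on a single computer: the worker threads stand in for grid nodes, and the names partial_sum, chunks and chunk_size are our own assumptions. A real grid would use separate machines connected through middleware.

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sum(chunk):
    # Work done by one "node": add up its share of the numbers
    return sum(chunk)

numbers = list(range(1, 1001))          # the large task: add 1 to 1000
chunk_size = 250
chunks = [numbers[i:i + chunk_size]
          for i in range(0, len(numbers), chunk_size)]

# Four worker threads stand in for four grid nodes (an assumption)
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(partial_sum, chunks))

total = sum(results)   # combine the partial answers
print(results)
print(total)           # 500500
```

Each subtask is independent of the others, which is what makes it possible to hand them to different nodes at the same time.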
Grid computing is different from IaaS cloud
service. In case of IaaS cloud service, there
is a service provider who rents the required
infrastructure to the users. Whereas in grid
computing, multiple computing nodes join
together to solve a common computational
problem.
To set up a grid, by connecting numerous
nodes in terms of data as well as CPU, a
middleware is required to implement the
distributed processor architecture. The Globus
toolkit (http://toolkit.globus.org/toolkit) is
one such software toolkit used for building
grids, and it is open source. It includes
software for security, resource management,
data management, communication, fault
detection, etc.

Figure 2.13: Grid computing (users share their resources
through a Grid Resource Management System)

2.7 Blockchains
Traditionally, we perform digital transactions by storing
data in a centralised database and the transactions
performed are updated one by one on the database. That
is how the ticket booking websites or banks operate.
However, since all the data is stored on a central
location, there are chances of data being hacked or lost.
The blockchain technology works on the concept of
decentralised and shared database where each computer


has a copy of the database. A block can be thought of
as a secured chunk of data or valid transaction. Each
block has some data called its header, which is visible
to every other node, while only the owner has access to
the private data of the block. Such blocks form a chain
called blockchain as shown in Figure 2.14. We can
define blockchain as a system that allows a group of
connected computers to maintain a single updated and
secure ledger. Each computer or node that participates
in the blockchain receives a full copy of the database. It
maintains an ‘append only’ open ledger which is updated
only after all the nodes within the network authenticate
the transaction. Safety and security of the transactions
are ensured because all the members in the network
keep a copy of the blockchain and so it is not possible
for a single member of the network to make changes or
alter data.
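The idea of blocks forming a chain can be illustrated with a toy Python sketch. It is only a simplified model under our own assumptions: each block stores a fingerprint (hash) of its contents together with the hash of the previous block, so changing an old block breaks the chain. The names block_hash, add_block and is_valid are hypothetical, and real blockchains also need a network of nodes, verification by those nodes and digital signatures, none of which is shown here.

```python
import hashlib

def block_hash(index, data, previous_hash):
    # Fingerprint of a block's contents and its link to the previous block
    text = f"{index}|{data}|{previous_hash}"
    return hashlib.sha256(text.encode()).hexdigest()

def add_block(chain, data):
    # Append-only: a new block always points to the current last block
    previous_hash = chain[-1]["hash"] if chain else "0"
    index = len(chain)
    chain.append({"index": index, "data": data,
                  "previous_hash": previous_hash,
                  "hash": block_hash(index, data, previous_hash)})

def is_valid(chain):
    # Recompute every hash and check every link
    for i, block in enumerate(chain):
        expected_previous = chain[i - 1]["hash"] if i > 0 else "0"
        if block["previous_hash"] != expected_previous:
            return False
        if block["hash"] != block_hash(block["index"], block["data"],
                                       block["previous_hash"]):
            return False
    return True

ledger = []
add_block(ledger, "A pays B 10")
add_block(ledger, "B pays C 4")
print(is_valid(ledger))               # True

ledger[0]["data"] = "A pays B 1000"   # tamper with an old block
print(is_valid(ledger))               # False
```

Because every participant can repeat this check on their own copy, a change made by a single member is easy to detect.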

Figure 2.14: Blockchain technology (someone requests a
transaction; the request is broadcast to all nodes in the
network; if verified by all nodes, the block gets added to
the already existing chain of blocks; the transaction is
complete)

The most popular application of blockchain
technology is in digital currency. However, due to its
decentralised nature with openness and security,
blockchains are being seen as one of the ways to ensure
transparency, accountability and efficiency in business
as well as in governance systems.
For example, in healthcare, better data sharing
between healthcare providers would result in a higher
probability of accurate diagnosis, more effective
treatments, and the overall increased ability of healthcare
organisations to deliver cost-effective care. Another
potential application can be for land registration records,
to avoid various disputes arising out of land ownership
claims and encroachments. A blockchain based voting
system can solve the problem of vote alterations and
other issues. Since everything gets stored in the ledger,
voting can become more transparent and authentic. The
blockchain technology can be used in diverse sectors,
such as banking, media, telecom, travel and hospitality
and other areas.

Think and Reflect
Name any two areas other than those given where the
concept of blockchain technology can be useful.

Summary
• Artificial Intelligence endeavours to simulate the
natural intelligence of human beings into machines
thus making them intelligent.
• Machine learning comprises algorithms that use
data to learn on their own and make predictions.
• Natural language processing (NLP) facilitates
communicating with intelligent systems using a
natural language.
• Virtual reality allows a user to look at, explore, and
interact with the virtual surroundings, just like one
can do in the real world.
• The superimposition of computer-generated
perceptual information over the existing physical
surroundings is called augmented reality.
• Robotics can be defined as the science primarily
associated with the design, fabrication, theory, and
application of robots.
• Big data holds rich information and knowledge which
can be of high business value. Five characteristics
of big data are: Volume, Velocity, Variety, Veracity,
and Value.
• Data analytics is the process of examining data sets
in order to draw conclusions about the information
they contain.
• The Internet of Things (IoT) is a network of devices
that have an embedded hardware and software to
communicate (connect and exchange data) with
other devices on the same network.
• A sensor is a device that takes input from the
physical environment and uses built-in computing
resources to perform predefined functions upon
detection of specific input and then processes data
before passing it on.


• Cloud computing allows resources located at
remote locations to be made available to anyone
anywhere. Cloud services can be Infrastructure as
a Service (IaaS), Platform as a Service (PaaS), and
Software as a Service (SaaS).
• Blockchain technology uses a shared database of
chained blocks where copies of the database exist on
multiple computers.

Exercise
1. List some of the cloud-based services that you are using
at present.
2. What do you understand by the Internet of Things? List
some of its potential applications.
3. Write a short note on the following:
a) Cloud computing
b) Big data and its characteristics
4. Explain the following along with their applications.
a) Artificial Intelligence
b) Machine Learning
5. Differentiate between cloud computing and grid
computing with suitable examples.
6. Justify the following statement-
‘Storage of data is cost effective and time saving in cloud
computing.’
7. What is on-demand service? How is it provided in cloud
computing?
8. Write examples of the following:
a) Government provided cloud computing platform
b) Large scale private cloud service providers and the
services they provide
9. A company interested in cloud computing is looking for a
provider who offers a set of basic services such as virtual
server provisioning and on-demand storage that can be
combined into a platform for deploying and running
customised applications. What type of cloud computing
model fits these requirements?
a) Platform as a Service
b) Software as a Service
c) Infrastructure as a Service


10. Which is not one of the features of IoT devices?
a) Remotely controllable
b) Programmable
c) Can turn themselves off if necessary
d) All of the above
11. If Government plans to make a smart school by
applying IoT concepts, how can each of the following
be implemented in order to transform a school into IoT
enabled smart school?
a) e-textbooks
b) Smart boards
c) Online tests
d) Wi-Fi sensors on classroom doors
e) Sensors in buses to monitor their location
f) Wearables (watches or smart belts) for attendance
monitoring
12. Five friends plan to try a startup. However, they have
a limited budget and limited computer infrastructure.
How can they avail the benefits of cloud services to
launch their startup?
13. Governments provide various scholarships to students
of different classes. Prepare a report on how blockchain
technology can be used to promote accountability,
transparency and efficiency in distribution of
scholarships.
14. How are IoT and WoT related?
15. Match the following:
Column A                                                               Column B
You got a reminder to take medication                                  Smart Parking
You got an SMS alert that you forgot to lock the door                  Smart Wearable
You got an SMS alert that parking space is available near your block   Home Automation
You turned off your LED TV from your wrist watch                       Smart Health



Chapter 3
Brief Overview of Python

“Don't you hate code that's not properly
indented? Making it [indenting] part of
the syntax guarantees that all code is
properly indented.”
- G. van Rossum

In this chapter
»» Introduction to Python
»» Python Keywords
»» Identifiers
»» Variables
»» Data Types
»» Operators
»» Expressions
»» Input and Output
»» Debugging
»» Functions
»» if..else Statements
»» for Loop
»» Nested Loops

3.1 Introduction to Python
An ordered set of instructions or commands to be
executed by a computer is called a program. The
language used to specify this set of instructions
to the computer is called a programming language,
for example Python, C, C++, Java, etc.
This chapter gives a brief overview of Python
programming language. Python is a very popular
and easy to learn programming language, created
by Guido van Rossum in 1991. It is used in a
variety of fields, including software development,
web development, scientific computing, big data

and Artificial Intelligence. The programs given in this book
are written using Python 3.7.0. However, one can install
any version of Python 3 to follow the programs given.
3.1.1 Working with Python
To write and run (execute) a Python program, we need
to have a Python interpreter installed on our computer
or we can use any online Python interpreter. The
interpreter is also called Python shell. A sample screen
of Python interpreter is shown in Figure 3.1. Here, the
symbol >>> is called Python prompt, which indicates
that the interpreter is ready to receive instructions.
We can type commands or statements on this prompt
for execution.

Download Python
The latest version of Python is available on the
official website: https://www.python.org/

Figure 3.1: Python Interpreter or Shell

3.1.2 Execution Modes


There are two ways to run a program using the Python
interpreter:
a) Interactive mode
b) Script mode
(A) Interactive Mode
In the interactive mode, we can type a Python statement
on the >>> prompt directly. As soon as we press enter,
the interpreter executes the statement and displays the
result(s), as shown in Figure 3.2.
Working in the interactive mode is convenient for
testing a single line of code for instant execution. But in
the interactive mode, we cannot save the statements for
future use and we have to retype the statements to run
them again.

Figure 3.2: Python Interpreter in Interactive Mode
(B) Script Mode
In the script mode, we can write a Python program in
a file, save it and then use the interpreter to execute
the program from the file. Such program files have
a .py extension and they are also known as scripts.
Usually, beginners learn Python in interactive mode,
but for programs having more than a few lines, we
should always save the code in files for future use.
Python scripts can be created using any editor. Python
has a built-in editor called IDLE (Integrated Development
and Learning Environment) which can be used
to create programs. After opening the IDLE, we can
click File>New File to create a new file, then write our
program on that file and save it with a desired name.
By default, the Python scripts are saved in the Python
installation folder.

Figure 3.3: Python Code in Script Mode (prog3-1.py)

To execute a Python program in script mode,


a) Open the program using an editor, for example
IDLE as shown in Figure 3.3.
b) In IDLE, go to [Run]->[Run Module] to execute the
prog3-1.py as shown in Figure 3.4.
c) The output appears on shell as shown in Figure
3.5.

Figure 3.4: Execution of Python in Script mode using IDLE

Figure 3.5: Output of a Program prog3-1.py executed in Script Mode


3.2 Python Keywords
Keywords are reserved words. Each keyword has a
specific meaning to the Python interpreter. As Python
is case sensitive, keywords must be written exactly as
given in Table 3.1.
Table 3.1 Python keywords
False     class      finally    is         return
None      continue   for        lambda     try
True      def        from       nonlocal   while
and       del        global     not        with
as        elif       if         or         yield
assert    else       import     pass
break     except     in         raise
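The exact set of keywords depends on the version of Python installed (for instance, Python 3.7 and later also reserve async and await, which are not listed in Table 3.1). As a small sketch, the standard keyword module can be used to list the keywords of the interpreter we are running and to test a particular word:

```python
import keyword

# kwlist holds every reserved word known to the running interpreter
print(keyword.kwlist)
print(len(keyword.kwlist))

# Keywords are case sensitive
print(keyword.iskeyword("while"))   # True
print(keyword.iskeyword("While"))   # False
```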

3.3 Identifiers
In programming languages, identifiers are names used
to identify a variable, function, or other entities in a
program. The rules for naming an identifier in Python
are as follows:
• The name should begin with an uppercase or a
lowercase alphabet or an underscore sign (_). This
may be followed by any combination of characters
a-z, A-Z, 0-9 or underscore (_). Thus, an identifier
cannot start with a digit.
• It can be of any length. (However, it is preferred to
keep it short and meaningful).
• It should not be a keyword or reserved word given in
Table 3.1.
• We cannot use special symbols like !, @, #, $, %, etc.
in identifiers.
For example, to find the average of marks obtained
by a student in three subjects namely Maths, English,
Informatics Practices (IP), we can choose the identifiers
as marksMaths, marksEnglish, marksIP and avg
rather than a, b, c, or A, B, C, as such alphabets do not
give any clue about the data that variable refers to.
avg = (marksMaths + marksEnglish + marksIP)/3
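The naming rules above can be checked with a short sketch. The function name is_valid_identifier is our own (hypothetical); it combines the string method isidentifier(), which checks the character rules, with keyword.iskeyword(), which rejects reserved words. Note that isidentifier() also accepts letters from other scripts, so it is slightly more permissive than the a-z, A-Z rule stated above.

```python
import keyword

def is_valid_identifier(name):
    # isidentifier() checks the character rules;
    # iskeyword() rejects reserved words such as 'for'
    return name.isidentifier() and not keyword.iskeyword(name)

for candidate in ["marksMaths", "_total", "2ndMark", "marks%", "for"]:
    print(candidate, is_valid_identifier(candidate))
```

Only marksMaths and _total are reported as valid: 2ndMark starts with a digit, marks% contains a special symbol and for is a keyword.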
3.4 Variables
Variable is an identifier whose value can change. For
example, the variable age can have a different value for
different persons. A variable name should be unique in a
program. The value of a variable can be a string (for example,
‘b’, ‘Global Citizen’), a number (for example 10, 71, 80.52)
or any combination of alphanumeric characters (alphabets and
numbers, for example ‘b10’). In Python, we
can use an assignment statement to create new variables
and assign specific values to them.
gender = 'M'
message = "Keep Smiling"
price = 987.9
Variables must always be assigned values before
they are used in the program, otherwise it will lead
to an error. Wherever a variable name occurs in the
program, the interpreter replaces it with the value of
that particular variable.

Comments are used to add a remark or a note in the source
code. Comments are not executed by the interpreter. They
are added with the purpose of making the source code
easier for humans to understand. They are used primarily
to document the meaning and purpose of source code.
In Python, a single line comment starts with # (hash sign).
Everything following the # till the end of that line is
treated as a comment and the interpreter simply ignores it
while executing the statement.

Program 3-2 Write a Python program to find the sum
of two numbers.
#Program 3-2
#To find the sum of two given numbers
num1 = 10
num2 = 20
result = num1 + num2
print(result)
#print function in python displays the output
Output:
30
30
Program 3-3 Write a Python program to find the area
of a rectangle given that its length is 10
units and breadth is 20 units.
#Program 3-3
#To find the area of a rectangle
length = 10
breadth = 20
area = length * breadth
print(area)
Output:
200
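As stated earlier in this section, a variable must be assigned a value before it is used. The following sketch shows what happens otherwise; the variable names marks and bonus are only illustrative, and the try and except statements are used here simply to catch the error so that the rest of the program can continue.

```python
# 'marks' is assigned before use, so this works
marks = 82
print(marks + 5)     # 87

# 'bonus' was never assigned; using it raises NameError
try:
    print(bonus)
except NameError as error:
    print("Error:", error)

# Assigning again changes the value the name refers to
marks = 90
print(marks)         # 90
```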
3.5 Data Types
Every value belongs to a specific data type in Python.
Data type identifies the type of data which a variable
can hold and the operations that can be performed on
those data. Figure 3.6 lists the data types available
in Python.

Figure 3.6: Different Data Types in Python


3.5.1 Number
Number data type stores numerical values only. It is
further classified into three different types: int, float
and complex.
Table 3.2 Numeric data types
Type/Class   Description              Examples
int          integer numbers          -12, -3, 0, 123, 2
float        floating point numbers   -2.04, 4.0, 14.23
complex      complex numbers          3 + 4j, 2 - 2j

Boolean data type (bool) is a subtype of integer. It
is a unique data type, consisting of two constants, True
and False. Boolean True has the integer value 1 (non-zero).
Boolean False has the integer value 0.
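A minimal sketch of these points, which can be tried in script mode, is given below. The variable names flag and z are our own. It also shows that Python writes the imaginary part of a complex number with the letter j.

```python
flag = True
print(type(flag))              # <class 'bool'>
print(isinstance(flag, int))   # True: bool is a subtype of int
print(True + True)             # 2
print(False == 0)              # True

z = 3 + 4j                     # imaginary part is written with j
print(type(z))                 # <class 'complex'>
print(z.real, z.imag)          # 3.0 4.0
```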
Let us now try to execute a few statements in interactive
mode to determine the data type of a variable using the
built-in function type().
Example 3.1
>>> quantity = 10
>>> type(quantity)
<class 'int'>
>>> price = -1921.9
>>> type(price)
<class 'float'>
Variables of simple data types like integer, float, boolean
etc. hold single value. But such variables are not useful
to hold multiple data values, for example, names of the
months in a year, names of students in a class, names
and numbers in a phone book or the list of artefacts in a
museum. For this, Python provides sequence data types like
Strings, Lists, Tuples, and mapping data type Dictionaries.
3.5.2 Sequence
A Python sequence is an ordered collection of items,
where each item is indexed by an integer value. Three

types of sequence data types available in Python are
Strings, Lists and Tuples. A brief introduction to these
data types is as follows:
(A) String
String is a group of characters. These characters may be
alphabets, digits or special characters including spaces.
String values are enclosed either in single quotation
marks (for example ‘Hello’) or in double quotation marks
(for example “Hello”). The quotes are not a part of the
string, they are used to mark the beginning and end of
the string for the interpreter. For example,
>>> str1 = 'Hello Friend'
>>> str2 = "452"
We cannot perform numerical operations on strings,
even when the string contains a numeric value. For
example str2 is a numeric string.
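The restriction on numeric strings can be seen in the following sketch. The variable total is our own; the conversion uses the built-in function int().

```python
str2 = "452"

# Adding a number to a string is not allowed
try:
    total = str2 + 10
except TypeError as error:
    print("Error:", error)

# Convert the numeric string first, then do the arithmetic
total = int(str2) + 10
print(total)                 # 462

# len() counts characters, including spaces
print(len("Hello Friend"))   # 12
```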
(B) List
List is a sequence of items separated by commas and
items are enclosed in square brackets [ ]. Note that
items may be of different data types.

Example 3.2
#To create a list
>>> list1 = [5, 3.4, "New Delhi", "20C", 45]
#print the elements of the list list1
>>> list1
[5, 3.4, 'New Delhi', '20C', 45]
(C) Tuple
Tuple is a sequence of items separated by commas and
items are enclosed in parentheses ( ). This is unlike list,
where values are enclosed in brackets [ ]. Once created,
we cannot change items in the tuple. Similar to List,
items may be of different data types.
Example 3.3
#create a tuple tuple1
>>> tuple1 = (10, 20, "Apple", 3.4, 'a')
#print the elements of the tuple tuple1
>>> print(tuple1)
(10, 20, 'Apple', 3.4, 'a')
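The difference between a list and a tuple, that items of a tuple cannot be changed once it is created, can be sketched as follows. The names scores_list and scores_tuple are only illustrative.

```python
scores_list = [10, 20, 30]
scores_list[0] = 99          # lists can be changed in place
print(scores_list)           # [99, 20, 30]

scores_tuple = (10, 20, 30)
print(scores_tuple[1])       # 20 (index values start from 0)

try:
    scores_tuple[0] = 99     # tuples cannot be changed
except TypeError as error:
    print("Error:", error)

print(scores_tuple)          # (10, 20, 30)
```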

3.5.3 Mapping
Mapping is a data type in which each item is looked up by
a key rather than by an integer position. Currently,
there is only one standard mapping data type in Python
called Dictionary.


(A) Dictionary
Dictionary in Python holds data items in key-value pairs
and items are enclosed in curly brackets { }. Dictionaries
permit faster access to data. Every key is separated from
its value using a colon (:) sign. The key value pairs of
a dictionary can be accessed using the key. Keys are
usually of string type and their values can be of any data
type. In order to access any value in the dictionary, we
have to specify its key in square brackets [ ].
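As a further sketch, a dictionary can also be updated after it is created. The dictionary phone_book and its contents are hypothetical, used only for illustration; get() is a standard dictionary method that returns a default value when a key is missing.

```python
# Hypothetical phone book used only for illustration
phone_book = {"Asha": 9810, "Ravi": 9911}

print(phone_book["Asha"])        # 9810

phone_book["Meena"] = 9700       # add a new key-value pair
phone_book["Ravi"] = 9000        # change the value of an existing key
print(len(phone_book))           # 3

# A missing key in [ ] raises KeyError; get() returns a default instead
print(phone_book.get("Kiran", "not found"))   # not found
```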
Example 3.4
#create a dictionary
>>> dict1 = {'Fruit':'Apple',
'Climate':'Cold', 'Price(kg)':120}
>>> print(dict1)
{'Fruit': 'Apple', 'Climate': 'Cold',
'Price(kg)': 120}
#getting value by specifying a key
>>> print(dict1['Price(kg)'])
120

3.6 Operators
An operator is used to perform specific mathematical or
logical operation on values. The values that the operator
works on are called operands. For example, in the
expression 10 + num, the value 10, and the variable num
are operands and the + (plus) sign is an operator. Python
supports several kinds of operators; their categorisation
is briefly explained in this section.

Python compares two strings lexicographically (that is,
in dictionary order), using ASCII value of the characters.
If the first character of both strings is the same, the
second character is compared, and so on.
3.6.1 Arithmetic Operators
Python supports arithmetic operators (Table 3.3) to
perform the four basic arithmetic operations as well as
modular division, floor division and exponentiation.
'+' operator can also be used to concatenate two
strings on either side of the operator.
>>> str1 = "Hello"
>>> str2 = "India"
>>> str1 + str2
'HelloIndia'
'*' operator repeats the item on left side of the
operator if first operand is a string and second operand
is an integer value.
>>> str1 = 'India'
>>> str1 * 2
'IndiaIndia'


Table 3.3 Arithmetic operators in Python
Operator (Operation): Description, followed by Example (Try in Lab)

+ (Addition): Adds two numeric values on either side of the operator
>>> num1 = 5
>>> num2 = 6
>>> num1 + num2
11

- (Subtraction): Subtracts the operand on the right from the operand on the left
>>> num1 = 5
>>> num2 = 6
>>> num1 - num2
-1

* (Multiplication): Multiplies the two values on both sides of the operator
>>> num1 = 5
>>> num2 = 6
>>> num1 * num2
30

/ (Division): Divides the operand on the left by the operand on the right of the operator and returns the quotient
>>> num1 = 5
>>> num2 = 2
>>> num1 / num2
2.5

% (Modulus): Divides the operand on the left by the operand on the right and returns the remainder
>>> num1 = 13
>>> num2 = 5
>>> num1 % num2
3

// (Floor Division): Divides the operand on the left by the operand on the right and returns the quotient by removing the decimal part. It is sometimes also called integer division.
>>> num1 = 5
>>> num2 = 2
>>> num1 // num2
2
>>> num2 // num1
0

** (Exponent): Raise the base to the power of the exponent. That is, multiply the base as many times as given in the exponent
>>> num1 = 3
>>> num2 = 4
>>> num1 ** num2
81

Operators (+) and (*) work in a similar manner for other
sequence data types like lists and tuples.
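A brief sketch of + and * on lists and tuples is given below. The last line is an extra illustration of how floor division and modulus together give the quotient and the remainder.

```python
print([1, 2] + [3, 4])     # [1, 2, 3, 4]
print([0] * 3)             # [0, 0, 0]
print((1, 2) + (3,))       # (1, 2, 3)
print(("a", "b") * 2)      # ('a', 'b', 'a', 'b')

# 7 = 3 * 2 + 1, so the quotient is 3 and the remainder is 1
print(7 // 2, 7 % 2)       # 3 1
```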
3.6.2 Relational Operators
A relational operator compares the values of the operands
on either side of it and determines the relationship between
them. Consider the given Python variables num1 = 10,
num2 = 0, num3 = 10, str1 = "Good", str2 =
"Afternoon" for the following examples in Table 3.4:
Table 3.4 Relational operators in Python
Operator (Operation): Description, followed by Example (Try in Lab)

== (Equals to): If values of two operands are equal, then the condition is True, otherwise it is False.
>>> num1 == num2
False
>>> str1 == str2
False

!= (Not equal to): If values of two operands are not equal, then condition is True, otherwise it is False.
>>> num1 != num2
True
>>> str1 != str2
True
>>> num1 != num3
False

> (Greater than): If the value of the left operand is greater than the value of the right operand, then condition is True, otherwise it is False.
>>> num1 > num2
True
>>> str1 > str2
True

< (Less than): If the value of the left operand is less than the value of the right operand, the condition is True, otherwise it is False.
>>> num1 < num3
False
Similarly, there are other relational operators like <=
and >=.
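The remaining relational operators, and the way strings are compared, can be tried with the same variables. In this sketch the built-in function ord() is used to show the code value of a character; the extra strings "apple" and "apply" are our own examples.

```python
num1, num2, num3 = 10, 0, 10
str1, str2 = "Good", "Afternoon"

print(num1 <= num3)    # True, because 10 equals 10
print(num2 >= num1)    # False, because 0 is less than 10

# Strings are compared character by character using their codes
print(ord("G"), ord("A"))   # 71 65
print(str1 > str2)          # True, because 71 > 65
print("apple" < "apply")    # True: first four letters match, 'e' < 'y'
```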
3.6.3 Assignment Operators
Assignment operator assigns or changes the value of
the variable on its left, as shown in Table 3.5.
Table 3.5 Assignment operators in Python
Operator: Description, followed by Example (Try in Lab)

= : Assigns value from right side operand to left side operand
>>> num1 = 2
>>> num2 = num1
>>> num2
2
>>> country = 'India'
>>> country
'India'

+= : It adds the value of right side operand to the left side operand and assigns the result to the left side operand.
Note: x += y is same as x = x + y
>>> num1 = 10
>>> num2 = 2
>>> num1 += num2
>>> num1
12
>>> num2
2

-= : It subtracts the value of right side operand from the left side operand and assigns the result to left side operand.
Note: x -= y is same as x = x - y
>>> num1 = 10
>>> num2 = 2
>>> num1 -= num2
>>> num1
8
Similarly, there are other assignment operators like
*=, /=, %=, //=, and **=.
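The other assignment operators follow the same pattern. A short sketch, using a single variable n of our own choosing, is given below.

```python
n = 7
n *= 3      # same as n = n * 3
print(n)    # 21
n //= 2     # same as n = n // 2
print(n)    # 10
n %= 4      # same as n = n % 4
print(n)    # 2
n **= 3     # same as n = n ** 3
print(n)    # 8
n /= 4      # same as n = n / 4
print(n)    # 2.0
```

Note that the last result is a float, because the / operator always returns a floating point value.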
3.6.4 Logical Operators
There are three logical operators (Table 3.6) supported
by Python. These operators (and, or, not) are to be
written in lower case only. A logical operator evaluates
to either True or False based on the logical operands
on either side of it.


Table 3.6 Logical operators in Python
Operator (Operation): Description, followed by Example (Try in Lab)

and (Logical AND): If both operands are True, then condition becomes True
>>> num1 = 10
>>> num2 = -20
>>> num1 == 10 and num2 == -20
True
>>> num1 == 10 and num2 == 10
False

or (Logical OR): If any of the two operands are True, then condition becomes True
>>> num1 = 10
>>> num2 = 2
>>> num1 >= 10 or num2 >= 10
True
>>> num1 <= 5 or num2 >= 10
False

not (Logical NOT): Used to reverse the logical state of its operand
>>> num1 = 10
>>> not (num1 == 20)
True
>>> not (num1 == 10)
False

3.6.5 Membership Operators


Membership operator (Table 3.7) is used to check if a
value is a member of the given sequence or not.
Table 3.7 Membership operators in Python
Operator: Description, followed by Example (Try in Lab)

in : Returns True if the variable or value is found in the specified sequence and False otherwise
>>> numSeq = [1,2,3]
>>> 2 in numSeq
True
>>> '1' in numSeq
False
#'1' is a string while
#numSeq contains number 1.

not in : Returns True if the variable/value is not found in the specified sequence and False otherwise
>>> numSeq = [1,2,3]
>>> 10 not in numSeq
True
>>> 1 not in numSeq
False
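The membership operators also work with strings, tuples and dictionaries. In the following sketch the dictionary prices is a hypothetical example; note that for a dictionary, in checks the keys and not the values.

```python
print("Del" in "New Delhi")      # True: checks for a part of the string
print("x" not in "New Delhi")    # True
print(20 in (10, 20, 30))        # True

prices = {"Apple": 120, "Mango": 90}
print("Apple" in prices)         # True: keys are searched
print(120 in prices)             # False: values are not searched
```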

3.7 Expressions
An expression is defined as a combination of constants,
variables and operators. An expression always evaluates
to a value. A value or a standalone variable is also
considered as an expression but a standalone operator
is not an expression. Some examples of valid expressions
are given below.
(i) num - 20.4
(ii) 3.0 + 3.14
(iii) 23/3 - 5 * 7 * (14 - 2)
(iv) "Global" + "Citizen"


3.7.1 Precedence of Operators
So far we have seen different operators and examples
of their usage. When an expression contains more than
one operator, their precedence (order or hierarchy)
determines which operator should be applied first.
Higher precedence operator is evaluated before the
lower precedence operator. In the following example, '*'
and '/' have higher precedence than '+' and '-'.
Note:
a) Parentheses can be used to override the precedence of
operators. The expression within () is evaluated first.
b) For operators with equal precedence, the expression
is evaluated from left to right.
Example 3.5 How will Python evaluate the following
expression?
20 + 30 * 40
Solution:
#precedence of * is more than that of +
= 20 + 1200 #Step 1
= 1220 #Step 2
Example 3.6 How will Python evaluate the following
expression?
(20 + 30) * 40
Solution:
= (20 + 30) * 40 # Step 1
#using parenthesis(), we have forced
#precedence of + to be more than that of *
= 50 * 40 # Step 2
= 2000 # Step 3
Example 3.7 How will the following expression be
evaluated?
15.0 / 4.0 + (8 + 3.0)
Solution:
= 15.0 / 4.0 + (8.0 + 3.0) #Step 1
= 15.0 / 4.0 + 11.0 #Step 2
= 3.75 + 11.0 #Step 3
= 14.75 #Step 4
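The three worked examples can be verified by running them. The last two lines are an extra illustration of left to right evaluation for operators of equal precedence, and of how parentheses change the result.

```python
print(20 + 30 * 40)            # 1220
print((20 + 30) * 40)          # 2000
print(15.0 / 4.0 + (8 + 3.0))  # 14.75

print(100 / 10 * 2)            # 20.0: equal precedence, left to right
print(100 / (10 * 2))          # 5.0: parentheses evaluated first
```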

3.8 Input and Output


Sometimes, we need to enter data or enter choices into
a program. In Python, we have the input() function
for taking values entered by input device such as a
keyboard. The input() function prompts user to enter
data. It accepts all user input (whether alphabets,

numbers or special character) as string. The syntax for
input() is:
variable = input([Prompt])
Prompt is the string we may like to display on the
screen prior to taking the input, but it is optional. The
input() takes exactly what is typed from the keyboard,
converts it into a string and assigns it to the variable on
left hand side of the assignment operator (=).
Example 3.8
>>> fname = input("Enter your first name: ")
Enter your first name: Arnab
>>> age = input("Enter your age: ")
Enter your age: 19
The variable fname gets the string ‘Arnab’ as input.
Similarly, the variable age gets '19' as string. We can
change the datatype of the string data accepted from
user to an appropriate numeric value. For example, the
int() function will convert the accepted string to an
integer. If the entered string is non-numeric, an error
will be generated.
Example 3.9
#function int() to convert string to integer
>>> age = int(input("Enter your age: "))
Enter your age: 19
>>> type(age)
<class 'int'>
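Since int() generates an error for a non-numeric string, a program can guard against it. The following is a minimal sketch: the function to_age is hypothetical, and ordinary strings are passed to it in place of what input() would return, so that the code can run without a keyboard.

```python
def to_age(text):
    """Convert the text typed by a user into an integer age.

    Returns None when the text is not a whole number.
    """
    try:
        return int(text)
    except ValueError:
        return None

# Strings standing in for what input() would return
print(to_age("19"))        # 19
print(to_age("nineteen"))  # None

print("19" + "1")          # 191: strings are joined, not added
print(to_age("19") + 1)    # 20
```

In a real program, the call would look like to_age(input("Enter your age: ")).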
Python uses the print() function to output data
to standard output device — the screen. The function
print() evaluates the expression before displaying it
on the screen. The syntax for print() is:
print(value)
Example 3.10
Statement Output
print("Hello") Hello
print(10*2.5) 25.0

3.9 Debugging
Due to errors, a program may not execute or may
generate wrong output. These errors can be categorised as:
i) Syntax errors
ii) Logical errors
iii) Runtime errors

3.9.1 Syntax Errors
Like any programming language, Python has rules that
determine how a program is to be written. This is called
syntax. The interpreter can interpret a statement of a
program only if it is syntactically correct. For example,
parentheses must be in pairs, so the expression (10 +
12) is syntactically correct, whereas (7 + 11 is not due
to absence of right parenthesis. If any syntax error is
present, the interpreter shows error message(s) and
stops the execution there. Such errors need to be
removed before execution of the program.
3.9.2 Logical Errors
A logical error/bug (also called a semantic error) does not stop
execution, but the program behaves incorrectly and
produces undesired/wrong output. Since the program
interprets successfully even when logical errors are
present in it, it is sometimes difficult to identify these
errors.
For example, if we wish to find the average of two
numbers 10 and 12 and we write the code as 10 + 12/2,
it would run successfully and produce the result 16.0,
which is wrong. The correct code to find the average
should have been (10 + 12)/2 to get the output as 11.0.
3.9.3 Runtime Errors
A runtime error causes abnormal termination of a
program while it is executing. A runtime error occurs when the
statement is syntactically correct, but the interpreter
cannot execute it.
For example, suppose we have a statement with a division
operation in the program. If, by mistake, the denominator
value is zero, then it will give a runtime error like “division
by zero”.
The process of identifying and removing logical
errors and runtime errors is called debugging. We need
to debug a program so that it can run successfully and
generate the desired output.
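The three kinds of errors described in this section can be reproduced with the sketch below. It is only an illustration: compile() and try...except are used so that all three cases can run in one script, and neither is covered in this chapter. The variable names are assumptions.

```python
# 1. Syntax error: the right parenthesis is missing.
#    compile() checks the text without stopping the rest of this script.
syntax_ok = True
try:
    compile("(7 + 11", "<example>", "eval")
except SyntaxError:
    syntax_ok = False
    print("Syntax error detected")

# 2. Logical error: the code runs, but the average of 10 and 12 is wrong
wrong_average = 10 + 12 / 2       # 16.0
right_average = (10 + 12) / 2     # 11.0
print(wrong_average, right_average)

# 3. Runtime error: the statement is syntactically correct but cannot execute
runtime_failed = False
try:
    result = 25 / 0
except ZeroDivisionError as err:
    runtime_failed = True
    print("Runtime error:", err)
```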

3.10 Functions
A function refers to a set of statements or instructions
grouped under a name that perform specified tasks.
For repeated or routine tasks, we define a function. A
function is defined once and can be reused at multiple
places in a program by simply writing the function
name, i.e., by calling that function.
Suppose we have a program which needs to
calculate compound interest at multiple places. Now
instead of writing the formula to calculate the interest
every time, we can create a function called CalcCompInt
and inside that function we write the code to take
inputs (like interest rate, duration, principal), calculate
interest, and display output. We can simply call the
function by writing the function name CalcCompInt
whenever compound interest is to be computed and
thus reuse the code to save time and effort.
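A minimal sketch of such a function is given below. The name calc_comp_int, its parameters and the yearly-compounding formula CI = P × (1 + R/100)^T − P are assumptions made for illustration. Unlike the description above, this version receives its inputs as arguments and returns the interest instead of reading and printing inside the function. The def statement used to define a function is not covered in this chapter.

```python
def calc_comp_int(principal, rate, years):
    """Return the compound interest, compounded once a year (assumed)."""
    amount = principal * (1 + rate / 100) ** years
    return amount - principal

# The same function is reused for two different sets of values
interest1 = calc_comp_int(1000, 10, 2)
interest2 = calc_comp_int(5000, 8, 3)
print(round(interest1, 2))   # 210.0
print(round(interest2, 2))   # 1298.56
```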
Python has many predefined functions called built-in
functions. We have already used two built-in functions
print() and input(). A module is a Python file in
which multiple functions are grouped together. These
functions can be easily used in a Python program by
importing the module using the import command. Use
of built-in functions makes programming faster and
more efficient. To use a built-in function we must know the
following about that function:
• Function Name — name of the function.
• Arguments — While calling a function, we may pass
value(s), called argument, enclosed in parenthesis,
to the function. The function works based on these
values. A function may or may not have argument(s).
• Return Value − A function may or may not return one
or more values. A function performs operations on the
basis of argument (s) passed to it and the result is
passed back to the calling point. Some functions do
not return any value.
Let us consider the following Python program using
three built-in functions input(), int() and print():
#Calculate square of a number
num = int(input("Enter the first number"))
square = num * num
print("the square of", num, " is ", square)
Observe:
• Two built-in functions are used in the first statement,
int() and input(). The third statement has the function
print().
• The input function accepts an argument, "Enter the first
number". Argument(s) is the value(s) passed within
the parentheses.
• Similarly, the print function has four arguments, "the
square of", num, " is " and square, separated by
commas.
• The int function in the first statement takes as argument
the string entered by the user from the keyboard,
converts it into an integer and returns it. Thus the
return value from the int() function is an integer.
Some of the most commonly used built‑in
functions in Python are listed in Table 3.8 under four
broad categories.
Table 3.8 Some commonly used built-in functions in
Python

Input/Output    Datatype Conversion    Mathematical Functions    Other Functions
input()         bool()                 abs()                     __import__()
print()         chr()                  divmod()                  len()
                dict()                 max()                     range()
                float()                min()                     type()
                int()                  pow()
                list()                 sum()
                ord()
                set()
                str()
                tuple()
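A few of the functions listed in Table 3.8 are tried out in the sketch below. The argument values and variable names are chosen only for illustration.

```python
# Mathematical functions
quotient, remainder = divmod(17, 5)   # quotient 3, remainder 2
power = pow(2, 5)                     # 2 raised to 5
print(abs(-7))                        # 7
print(quotient, remainder)            # 3 2
print(power)                          # 32
print(max(4, 9, 2), min(4, 9, 2))     # 9 2

# Datatype conversion functions
code = ord('A')                       # character to its code
letter = chr(97)                      # code to its character
print(code, letter)                   # 65 a
print(float('3.5'))                   # 3.5
print(str(25) + '%')                  # 25%

# Other functions
print(len('Python'))                  # 6
print(type(2.0))                      # <class 'float'>
```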

3.11 if..else Statements


Usually statements in a program are executed one after
another. However, there are situations when we have
more than one option to choose from, based on the
outcome of certain conditions. This can be done using
if..else conditional statements. Conditional statements let
us write programs to do different tasks or take different
paths based on the outcome of the conditions.
There are three ways to write if..else statements:
• if statement executes the statement(s) inside if
when the condition is true.
Example 3.11
age = int(input("Enter your age "))
if age >= 18:    # use ':' to indicate end of condition
    print("Eligible to vote")

• if...else statement executes the statement(s)
inside if when the condition is true, otherwise
executes the statement(s) inside else (when the
condition is false).
#Program to subtract smaller number from the
#larger number and display the difference.
num1 = int(input("Enter first number: "))
num2 = int(input("Enter second number: "))
if num1 > num2:
    diff = num1 - num2
else:
    diff = num2 - num1
print("The difference of",num1,"and",num2,"is",diff)
Output:
Enter first number: 5
Enter second number: 6
The difference of 5 and 6 is 1
Note: Python uses indentation for block as well as for nested block
structures. Leading whitespace (spaces and tabs) at the beginning
of a statement is called indentation. In Python, the same level of
indentation associates statements into a single block of code. The
interpreter checks indentation levels very strictly and throws up
syntax errors if indentation is not correct. It is a common practice
to use a single tab for each level of indentation.
• if...elif...else is used to check multiple
conditions and execute statements accordingly.
elif is short for "else if". Python accepts only the keyword
elif here; writing elseif is a syntax error.

Example 3.12 Check whether a number is positive,
negative, or zero.
number = int(input("Enter a number: "))
if number > 0:
    print("Number is positive")
elif number < 0:
    print("Number is negative")
else:
    print("Number is zero")
When the conditional statements appear, the
Python interpreter executes code inside one block that
is selected based on the condition. Number of elif is
dependent on the number of conditions to be checked.
If the first condition is false, then the next condition
is checked, and so on. If one of the conditions is true,
then the corresponding indented block executes, and
the if statement terminates. After that, the statements
outside the if..else are executed or the program
terminates if there are no further statements.
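To illustrate that only the block of the first true condition is executed, the following sketch assigns a grade to a mark. The fixed mark of 72, the variable names and the grade boundaries are assumptions made for illustration.

```python
marks = 72

if marks >= 90:
    grade = 'A'
elif marks >= 75:
    grade = 'B'
elif marks >= 60:        # first true condition: this block runs
    grade = 'C'
elif marks >= 33:        # also true for 72, but never checked
    grade = 'D'
else:
    grade = 'E'

print("Grade:", grade)   # Grade: C
```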

3.12 For Loop


Sometimes we need to repeat certain things for a
particular number of times. For example, a program has
to display attendance for every student of a class. Here
the program has to execute the print statement for
every student. In programming, this kind of repetition
is called looping or iteration, and it is done using for
statement. The for statement is used to iterate over
a range of values or a sequence. The loop is executed
for each item in the range. The values can be numeric,
string, list, or tuple.
When all the items in the range are exhausted, the
statements within loop are not executed and Python
interpreter starts executing the statements immediately
following the for loop. While using for loop, we should
know in advance the number of times the loop will
execute.
Syntax of the for Loop:
for <control-variable> in <sequence/items in range>:
    <statements inside body of the loop>
Program 3-4 Program to print even numbers in a given sequence
using for loop.
#Program 3-4
#Print even numbers in the given sequence
numbers = [1,2,3,4,5,6,7,8,9,10]
for num in numbers:
    if (num % 2) == 0:
        print(num,'is an even Number')
Output:
2 is an even Number
4 is an even Number
6 is an even Number
8 is an even Number
10 is an even Number
Note: Body of the loop is indented with respect to the for statement.

3.12.1 The range() Function


The range() is a built-in function in Python. Syntax of
range() function is:
range([start], stop[, step])
It is used to generate a sequence of
integers from the given start value up to the stop value
(excluding the stop value), with a difference of the given
step value. If start value is not specified, by default
the sequence starts from 0. If step is also not specified, by
default the value is incremented by 1 in each iteration.
All parameters of range() function must be integers. The
step parameter can be a positive or a negative integer
excluding zero. In Python 3, range() does not itself return
a list, so list() is used to display the generated values
as a list.

Example 3.13
>>> list(range(10))
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
#start value is given as 2
>>> list(range(2, 10))
[2, 3, 4, 5, 6, 7, 8, 9]
#step value is 5 and start value is 0
>>> list(range(0, 30, 5))
[0, 5, 10, 15, 20, 25]
#step value is -1. Hence, decreasing
#sequence is generated
>>> list(range(0, -9, -1))
[0, -1, -2, -3, -4, -5, -6, -7, -8]
The function range() is often used in for loops for
generating a sequence of numbers.
Program 3-5 Program to print the multiples of 10 for
numbers in a given range.
#Program 3-5
#Print multiples of 10 for numbers in a range
for num in range(5):
    if num > 0:
        print(num * 10)
Output:
10
20
30
40
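range() is commonly combined with a for loop to accumulate a result. The sketch below, with variable names chosen for illustration, adds the even numbers from 2 to 10 using a step of 2 and then builds a decreasing sequence with a negative step.

```python
total = 0
for num in range(2, 11, 2):          # 2, 4, 6, 8, 10 (11 is excluded)
    total = total + num
print(total)                         # 30

countdown = list(range(5, 0, -1))    # stop value 0 is excluded
print(countdown)                     # [5, 4, 3, 2, 1]
```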

3.13 Nested Loops


A loop may contain another loop inside it. A loop inside
another loop is called a nested loop.
Program 3-6 Program to demonstrate working of nested for
loops.

#Program 3-6
#Demonstrate working of nested for loops
for var1 in range(3):
    print("Iteration " + str(var1 + 1) + " of outer loop")
    for var2 in range(2):    #nested loop
        print(var2 + 1)
    print("Out of inner loop")
print("Out of outer loop")
Output:
Iteration 1 of outer loop
1
2
Out of inner loop
Iteration 2 of outer loop
1
2
Out of inner loop
Iteration 3 of outer loop
1
2
Out of inner loop
Out of outer loop
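A common use of nested loops is to work through rows and columns. The following sketch prints a 3 x 3 multiplication table; the variable names row, col and line are chosen only for illustration.

```python
# Print a 3 x 3 multiplication table using nested for loops
for row in range(1, 4):
    line = ""
    for col in range(1, 4):      # inner loop runs fully for every row
        line = line + str(row * col) + " "
    print(line)
```

Each pass of the outer loop produces one row of the table, while the inner loop builds the three entries of that row.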

Summary
• Python is an open-source, high level, interpreter-
based language that can be used for a multitude of
scientific and non-scientific computing purposes.
• Comments are non-executable statements in a
program.
• An identifier is a user defined name given to a
variable or a constant in a program.
• Process of identifying and removing errors from a
computer program is called debugging.
• Trying to use a variable that has not been assigned
a value gives an error.
• There are several data types in Python — integer,
boolean, float, complex, string, list, tuple, sets,
None and dictionary.
• Operators are constructs that manipulate the value
of operands. Operators may be unary or binary.
• An expression is a combination of values, variables,
and operators.
• Python has input() function for taking user input.
• Python has print() function to output data to a
standard output device.
• The if statement is used for decision making.
• Looping allows sections of code to be executed
repeatedly under some condition.
• for statement can be used to iterate over a range
of values or a sequence.
• The statements within the body of for loop are
executed till the range of values is exhausted.

Exercise
1. Which of the following identifier names are invalid and
why?
a) Serial_no.
b) 1st_Room
c) Hundred$
d) Total Marks
e) Total_Marks
f) total-Marks
g) _Percentage
h) True

2. Write the corresponding Python assignment statements:
a) Assign 10 to variable length and 20 to variable
breadth.
b) Assign the average of values of variables length and
breadth to a variable sum.
c) Assign a list containing strings ‘Paper’, ‘Gel Pen’, and
‘Eraser’ to a variable stationery.
d) Assign the strings ‘Mohandas’, ‘Karamchand’, and
‘Gandhi’ to variables first, middle and last.
e) Assign the concatenated value of string variables
first, middle and last to variable fullname. Make sure
to incorporate blank spaces appropriately between
different parts of names.

3. Which data type will be used to represent the following
data values and why?
a) Number of months in a year
b) Resident of Delhi or not
c) Mobile number
d) Pocket money
e) Volume of a sphere
f) Perimeter of a square
g) Name of the student
h) Address of the student

4. Give the output of the following when num1 = 4, num2 =
3, num3 = 2
a) num1 += num2 + num3
b) print (num1)
c) num1 = num1 ** (num2 + num3)
d) print (num1)
e) num1 **= num2 + num3
f) num1 = '5' + '5'
g) print(num1)
h) print(4.00/(2.0+2.0))
i) num1 = 2+9*((3*12)-8)/10
j) print(num1)
k) num1 = float(10)
l) print (num1)
m) num1 = int('3.14')
n) print (num1)
o) print(10 != 9 and 20 >= 20)
p) print(5 % 10 + 10 < 50 and 29 <= 29)

5. Categorise the following as syntax error, logical error or
runtime error:
a) 25 / 0
b) num1 = 25; num2 = 0; num1/num2
6. Write a Python program to calculate the amount payable
if money has been lent on simple interest. Principal or
money lent = P, Rate = R% per annum and Time = T
years. Then Simple Interest (SI) = (P x R x T)/ 100.
Amount payable = Principal + SI.
P, R and T are given as input to the program.

7. Write a program to repeat the string "GOOD MORNING"
n times. Here n is an integer entered by the user.

8. Write a program to find the average of 3 numbers.

9. Write a program that asks the user to enter one's name
and age. Print out a message addressed to the user that
tells the user the year in which he/she will turn 100
years old.

10. What is the difference between else and elif construct
of if statement?

11. Find the output of the following program segments:
a) for i in range(20,30,2):
       print(i)
b) country = 'INDIA'
   for i in country:
       print (i)
c) i = 0; sum = 0
   while i < 9:
       if i % 4 == 0:
           sum = sum + i
       i = i + 2
   print (sum)

Case Study Based Question


Schools use “Student Management Information System”
(SMIS) to manage student related data. This system provides
facilities for:

• Recording and maintaining personal details of students.


• Maintaining marks scored in assessments and computing
results of students.
• Keeping track of student attendance, and
• Managing many other student-related data in the school.
Let us automate the same step by step.
Identify the personal details of students from your school
identity card and write a program to accept these details for
all students of your school and display them in this format.

Chapter 4
Working with Lists and Dictionaries

“Computer Science is a science of abstraction – creating the right model for
a problem and devising the appropriate mechanizable techniques to solve it.”
– A. Aho and J. Ullman

In this chapter
»» Introduction to List
»» List Operations
»» Traversing a List
»» List Methods and Built-in Functions
»» List Manipulation
»» Introduction to Dictionaries
»» Traversing a Dictionary
»» Dictionary Methods and Built-in Functions
»» Manipulating Dictionaries

4.1 Introduction to List
The data type list is an ordered sequence which is
mutable and made up of one or more elements. Unlike a
string which consists of only characters, a list can have
elements of different data types such as integer, float,
string, tuple or even another list. A list is very useful to
group elements of mixed data types. Elements of a list
are enclosed in square brackets and are separated by
commas.
Example 4.1
#list1 is the list of six even numbers
>>> list1 = [2,4,6,8,10,12]
>>> print(list1)
[2, 4, 6, 8, 10, 12]

#list2 is the list of vowels
>>> list2 = ['a','e','i','o','u']
>>> print(list2)
['a', 'e', 'i', 'o', 'u']

#list3 is the list of mixed data types


>>> list3 = [100,23.5,'Hello']
>>> print(list3)
[100, 23.5, 'Hello']

#list4 is the list of lists called nested list
>>> list4 =[['Physics',101],['Chemistry',202],
['Mathematics',303]]
>>> print(list4)
[['Physics', 101], ['Chemistry', 202],
['Mathematics', 303]]

4.1.1 Accessing Elements in a List


Each element in a list is accessed using a value called index.
The first index value is 0, the second index is 1 and so
on. Elements in the list are assigned index values in
increasing order starting from 0.
To access an element, use square brackets [] with
the index value of that element. We may also use
negative index values to access elements starting from
the last element in the list, which has index value -1.
#initializing a list named list1
>>> list1 = [2,4,6,8,10,12]
>>> list1[0] #returns first element of list1
2
>>> list1[3] #returns fourth element of list1
8
#Out of range index value for the list returns error
>>> list1[15]
IndexError: list index out of range
#an expression resulting in an integer index
>>> list1[1+4]
12
>>> list1[-1] #return first element from right
12
#length of the list1 is assigned to n
>>> n = len(list1)
>>> print(n)
6
#Get the last element of the list1
>>> list1[n-1]
12

#Get the first element of list1
>>> list1[-n]
2
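The relation between positive and negative index values can be checked with the sketch below: for a list of length n, index -k refers to the same element as index n - k. The loop variable k is an assumption made for illustration.

```python
list1 = [2, 4, 6, 8, 10, 12]
n = len(list1)

# Print each negative index, the matching positive index and both elements
for k in range(1, n + 1):
    print(-k, n - k, list1[-k], list1[n - k])
```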

4.1.2 Lists are Mutable


In Python, lists are mutable. It means that the contents
of the list can be changed after it has been created.
#List list1 of colors
>>> list1 = ['Red','Green','Blue','Orange']
#change/override the fourth element of list1
>>> list1[3] = 'Black'
>>> list1 #print the modified list list1
['Red', 'Green', 'Blue', 'Black']
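For contrast, a string cannot be changed in the same way. The sketch below is only an illustration; the try...except construct used to catch the error is not covered in this chapter.

```python
list1 = ['Red', 'Green', 'Blue']
list1[0] = 'Pink'             # allowed: lists are mutable
print(list1)                  # ['Pink', 'Green', 'Blue']

str1 = 'Red'
try:
    str1[0] = 'B'             # not allowed: strings are immutable
except TypeError as err:
    print("Error:", err)
print(str1)                   # Red
```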

4.2 List Operations


The data type list allows manipulation of its contents
through various operations as shown below.
4.2.1 Concatenation
Python allows us to join two or more lists using the
concatenation operator, denoted by the symbol +.
Note: Concatenation is the merging of two or more values. For
example, we can concatenate strings together.
#list1 is list of first five odd integers
>>> list1 = [1,3,5,7,9]
#list2 is list of first five even integers
>>> list2 = [2,4,6,8,10]
#Get elements of list1 followed by list2
>>> list1 + list2
[1, 3, 5, 7, 9, 2, 4, 6, 8, 10]
>>> list3 = ['Red','Green','Blue']
>>> list4 = ['Cyan', 'Magenta', 'Yellow'
,'Black']
>>> list3 + list4
['Red','Green','Blue','Cyan','Magenta',
'Yellow','Black']
Note that there is no change in the original lists, i.e.,
list1, list2, list3, list4 remain the same after the
concatenation operation. If we want to use the result of
two concatenated lists, we should use an assignment
operator.
For example,
#Join list2 at the end of list1
>>> newList = list1 + list2
>>> newList
[1, 3, 5, 7, 9, 2, 4, 6, 8, 10]
The concatenation operator '+' requires that
the operands should be of list type only. If we try to
concatenate a list with elements of some other data
type, TypeError occurs.

>>> list1 = [1,2,3]
>>> str1 = "abc"
>>> list1 + str1
TypeError: can only concatenate list (not
"str") to list

4.2.2 Repetition
Python allows us to replicate the contents of a list using
repetition operator depicted by symbol *.
>>> list1 = ['Hello']
#elements of list1 repeated 4 times
>>> list1 * 4
['Hello', 'Hello', 'Hello', 'Hello']

4.2.3 Membership
The membership operator in checks if the element
is present in the list and returns True, else returns
False.
>>> list1 = ['Red','Green','Blue']
>>> 'Green' in list1
True
>>> 'Cyan' in list1
False
The operator not in returns True if the
element is not present in the list, else it returns False.
>>> list1 = ['Red','Green','Blue']
>>> 'Cyan' not in list1
True
>>> 'Green' not in list1
False

4.2.4 Slicing
Slicing operations allow us to create new list by taking
out elements from an existing list.
>>> list1 =['Red','Green','Blue','Cyan',
'Magenta','Yellow','Black']
#sublist from indexes 2 to 5 of list1
>>> list1[2:6]
['Blue', 'Cyan', 'Magenta', 'Yellow']

#list1 is truncated to the end of the list


>>> list1[2:20] #second index is out of range
['Blue', 'Cyan', 'Magenta', 'Yellow',
'Black']

>>> list1[7:2] #first index > second index


[] #results in an empty list

#return sublist from index 0 to 4
>>> list1[:5] #first index missing
['Red','Green','Blue','Cyan','Magenta']

#slicing with a given step size


>>> list1[0:6:2]
['Red','Blue','Magenta']
#negative indexes
#elements at index -6,-5,-4,-3 are sliced
>>> list1[-6:-2]
['Green','Blue','Cyan','Magenta']

#both first and last index missing


>>> list1[::2] #step size 2 on entire list
['Red','Blue','Magenta','Black']

#Access list in the reverse order using
#negative step size
>>> list1[::-1]
['Black','Yellow','Magenta','Cyan','Blue',
'Green','Red']
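Since slicing creates a new list, changing the slice does not change the original list. A minimal sketch, with the variable name part chosen for illustration:

```python
list1 = ['Red', 'Green', 'Blue', 'Cyan']
part = list1[1:3]             # new list: ['Green', 'Blue']
part[0] = 'Black'             # changes only the new list

print(part)                   # ['Black', 'Blue']
print(list1)                  # ['Red', 'Green', 'Blue', 'Cyan']
```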

4.3 Traversing a List


We can access each element of the list or traverse a list
using a for loop or a while loop.
(A) List traversal using for loop:
>>> list1 = ['Red','Green','Blue','Yellow',
'Black']
>>> for item in list1:
        print(item)
Output:
Red
Green
Blue
Yellow
Black
Note: len(list1) returns the length or total number of elements of
list1. range(n) returns a sequence of numbers starting from 0,
increasing by 1 and ending at n-1 (one less than the specified
number n).
Another way of accessing the elements of the list is
using range() and len() functions:
>>> for i in range(len(list1)):
        print(list1[i])
Output:
Red
Green
Blue
Yellow
Black
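The opening sentence of this section also mentions the while loop. A minimal sketch of the same traversal using while is given below; the index variable i is an assumption made for illustration.

```python
list1 = ['Red', 'Green', 'Blue', 'Yellow', 'Black']
i = 0
while i < len(list1):         # stop after the last valid index
    print(list1[i])
    i = i + 1                 # move to the next index
```

Unlike the for loop, the while loop needs the index to be initialised and incremented explicitly; forgetting the increment would make the loop run forever.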

4.4 List Methods and Built-in Functions


The data type list has several built-in methods that
are useful in programming. Some of them are listed in
Table 4.1.

Table 4.1 Built-in functions for list manipulation


Method: len()
Description: Returns the length of the list passed as the argument
Example:
>>> list1 = [10,20,30,40,50]
>>> len(list1)
5

Method: list()
Description: Creates an empty list if no argument is passed
Example:
>>> list1 = list()
>>> list1
[]
Description: Creates a list if a sequence is passed as an argument
Example:
>>> str1 = 'aeiou'
>>> list1 = list(str1)
>>> list1
['a', 'e', 'i', 'o', 'u']

Method: append()
Description: Appends a single element passed as an argument at the end of the list
Example:
>>> list1 = [10,20,30,40]
>>> list1.append(50)
>>> list1
[10, 20, 30, 40, 50]
Description: A list can also be appended as an element to an existing list
Example:
>>> list1 = [10,20,30,40]
>>> list1.append([50,60])
>>> list1
[10, 20, 30, 40, [50, 60]]

Method: extend()
Description: Appends each element of the list passed as argument at the end of the given list
Example:
>>> list1 = [10,20,30]
>>> list2 = [40,50]
>>> list1.extend(list2)
>>> list1
[10, 20, 30, 40, 50]

Method: insert()
Description: Inserts an element at a particular index in the list
Example:
>>> list1 = [10,20,30,40,50]
#inserts element 25 at index value 2
>>> list1.insert(2,25)
>>> list1
[10, 20, 25, 30, 40, 50]
>>> list1.insert(0,100)
>>> list1
[100, 10, 20, 25, 30, 40, 50]

Method: count()
Description: Returns the number of times a given element appears in the list
Example:
>>> list1 = [10,20,30,10,40,10]
>>> list1.count(10)
3
>>> list1.count(90)
0

Method: index()
Description: Returns index of the first occurrence of the element in the list. If the element is not present, ValueError is generated
Example:
>>> list1 = [10,20,30,20,40,10]
>>> list1.index(20)
1
>>> list1.index(90)
ValueError: 90 is not in list

Method: remove()
Description: Removes the given element from the list. If the element is present multiple times, only the first occurrence is removed. If the element is not present, then ValueError is generated
Example:
>>> list1 = [10,20,30,40,50,30]
>>> list1.remove(30)
>>> list1
[10, 20, 40, 50, 30]
>>> list1.remove(90)
ValueError: list.remove(x): x not in list

Method: pop()
Description: Returns the element whose index is passed as argument to this function and also removes it from the list. If no argument is given, then it returns and removes the last element of the list
Example:
>>> list1 = [10,20,30,40,50,60]
>>> list1.pop(3)
40
>>> list1
[10, 20, 30, 50, 60]
>>> list1 = [10,20,30,40,50,60]
>>> list1.pop()
60
>>> list1
[10, 20, 30, 40, 50]

Method: reverse()
Description: Reverses the order of elements in the given list
Example:
>>> list1 = [34,66,12,89,28,99]
>>> list1.reverse()
>>> list1
[99, 28, 89, 12, 66, 34]
>>> list1 = ['Tiger','Zebra','Lion','Cat','Elephant','Dog']
>>> list1.reverse()
>>> list1
['Dog', 'Elephant', 'Cat', 'Lion', 'Zebra', 'Tiger']

Method: sort()
Description: Sorts the elements of the given list in place
Example:
>>> list1 = ['Tiger','Zebra','Lion','Cat','Elephant','Dog']
>>> list1.sort()
>>> list1
['Cat', 'Dog', 'Elephant', 'Lion', 'Tiger', 'Zebra']
>>> list1 = [34,66,12,89,28,99]
>>> list1.sort(reverse = True)
>>> list1
[99, 89, 66, 34, 28, 12]

Method: sorted()
Description: It takes a list as parameter and creates a new list consisting of the same elements but arranged in ascending order
Example:
>>> list1 = [23,45,11,67,85,56]
>>> list2 = sorted(list1)
>>> list1
[23, 45, 11, 67, 85, 56]
>>> list2
[11, 23, 45, 56, 67, 85]

Method: min()
Description: Returns minimum or smallest element of the list
Example:
>>> list1 = [34,12,63,39,92,44]
>>> min(list1)
12

Method: max()
Description: Returns maximum or largest element of the list
Example:
>>> max(list1)
92

Method: sum()
Description: Returns sum of the elements of the list
Example:
>>> sum(list1)
284
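Two pairs of entries in Table 4.1 are easy to confuse: append() and extend(), and sort() and sorted(). The sketch below places them side by side; the list names and values are chosen only for illustration.

```python
list1 = [10, 20, 30]
list1.append([40, 50])        # the whole list becomes one element
print(list1)                  # [10, 20, 30, [40, 50]]
print(len(list1))             # 4

list2 = [10, 20, 30]
list2.extend([40, 50])        # each element is added separately
print(list2)                  # [10, 20, 30, 40, 50]

list3 = [23, 45, 11]
list4 = sorted(list3)         # returns a new list; list3 is unchanged here
print(list3, list4)           # [23, 45, 11] [11, 23, 45]

result = list3.sort()         # sorts list3 in place and returns None
print(list3, result)          # [11, 23, 45] None
```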

4.5 List Manipulation


In this chapter, we have learnt to create a list and the different ways to
manipulate lists. In the following programs, we will apply the various list
manipulation methods.

Program 4-1 Write a program to allow user to perform any of the list operations
given in a menu. The menu is:
1. Append an element
2. Insert an element
3. Append a list to the given list
4. Modify an existing element
5. Delete an existing element from its position
6. Delete an existing element with a given value
7. Sort the list in the ascending order
8. Sort the list in descending order
9. Display the list.
#Program 4-1
#Menu driven program to do various list operations
myList = [22,4,16,38,13]    #myList having 5 elements
choice = 0
for attempt in range(3):    #the menu is shown three times
    print("The list 'myList' has the following elements", myList)
    print("Attempt number :", attempt + 1)
    print("\nL I S T O P E R A T I O N S")
    print(" 1. Append an element")
    print(" 2. Insert an element at the desired position")
    print(" 3. Append a list to the given list")
    print(" 4. Modify an existing element")
    print(" 5. Delete an existing element by its position")
    print(" 6. Delete an existing element by its value")
    print(" 7. Sort the list in ascending order")
    print(" 8. Sort the list in descending order")
    print(" 9. Display the list")
    choice = int(input("ENTER YOUR CHOICE (1-9): "))

    #append element
    if choice == 1:
        element = eval(input("Enter the element to be appended: "))
        myList.append(element)
        print("The element has been appended\n")

    #insert an element at desired position
    elif choice == 2:
        element = eval(input("Enter the element to be inserted: "))
        pos = int(input("Enter the position:"))
        myList.insert(pos,element)
        print("The element has been inserted\n")

    #append a list to the given list
    elif choice == 3:
        newList = eval(input("Enter the list to be appended: "))
        myList.extend(newList)
        print("The list has been appended\n")

    #modify an existing element
    elif choice == 4:
        i = int(input("Enter the position of the element to be modified: "))
        if i < len(myList):
            newElement = eval(input("Enter the new element: "))
            oldElement = myList[i]
            myList[i] = newElement
            print("The element",oldElement,"has been modified\n")
        else:
            print("Position of the element is more than the length of list")

    #delete an existing element by position
    elif choice == 5:
        i = int(input("Enter the position of the element to be deleted: "))
        if i < len(myList):
            element = myList.pop(i)
            print("The element",element,"has been deleted\n")
        else:
            print("\nPosition of the element is more than the length of list")

    #delete an existing element by value
    elif choice == 6:
        element = int(input("\nEnter the element to be deleted: "))
        if element in myList:
            myList.remove(element)
            print("\nThe element",element,"has been deleted\n")
        else:
            print("\nElement",element,"is not present in the list")

    #list in sorted order
    elif choice == 7:
        myList.sort()
        print("\nThe list has been sorted")

    #list in reverse sorted order
    elif choice == 8:
        myList.sort(reverse = True)
        print("\nThe list has been sorted in reverse order")

    #display the list
    elif choice == 9:
        print("\nThe list is:", myList)
    else:
        print("Choice is not valid")
Output:
The list 'myList' has the following elements [22, 4, 16, 38, 13]
Attempt number : 1
L I S T O P E R A T I O N S
1. Append an element
2. Insert an element at the desired position
3. Append a list to the given list
4. Modify an existing element
5. Delete an existing element by its position
6. Delete an existing element by its value
7. Sort the list in ascending order
8. Sort the list in descending order
9. Display the list
ENTER YOUR CHOICE (1-9): 8

The list has been sorted in reverse order


The list 'myList' has the following elements [38, 22, 16, 13, 4]
Attempt number : 2
L I S T O P E R A T I O N S
1. Append an element
2. Insert an element at the desired position
3. Append a list to the given list
4. Modify an existing element
5. Delete an existing element by its position
6. Delete an existing element by its value
7. Sort the list in ascending order
8. Sort the list in descending order
9. Display the list
ENTER YOUR CHOICE (1-9): 5
Enter the position of the element to be deleted: 2
The element 16 has been deleted

The list 'myList' has the following elements [38, 22, 13, 4]
Attempt number : 3
L I S T O P E R A T I O N S
1. Append an element
2. Insert an element at the desired position
3. Append a list to the given list
4. Modify an existing element
5. Delete an existing element by its position
6. Delete an existing element by its value
7. Sort the list in ascending order
8. Sort the list in descending order
9. Display the list
ENTER YOUR CHOICE (1-9): 10
Choice is not valid

Program 4-2 A program to calculate average marks
of n students where n is entered by
the user.
#Program 4-2
#create an empty list
list1 = []
print("How many students marks you want to enter: ")
n = int(input())
for i in range(0,n):
    print("Enter marks of student",(i+1),":")
    marks = int(input())
    #append marks in the list
    list1.append(marks)
#initialize total
total = 0
for marks in list1:
    #add marks to total
    total = total + marks
average = total / n
print("Average marks of",n,"students is:",average)
Output:
How many students marks you want to enter:
5
Enter marks of student 1:
45
Enter marks of student 2:
89
Enter marks of student 3:
79
Enter marks of student 4:
76
Enter marks of student 5:
55
Average marks of 5 students is: 68.8
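
The same average can also be obtained with the built-in functions sum() and len(). The following is only a minimal sketch, not the textbook's program: the marks are hard-coded from the sample run above instead of being read with input(), and the check for an empty list is an added assumption to avoid dividing by zero.

```python
# Marks copied from the sample run above (hard-coded for illustration)
list1 = [45, 89, 79, 76, 55]

if len(list1) > 0:
    # sum() adds all elements; len() gives the number of elements
    average = sum(list1) / len(list1)
    print("Average marks of", len(list1), "students is:", average)
else:
    print("No marks were entered")
```

Using len(list1) instead of a separate counter keeps the divisor in step with the number of marks actually stored.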

Program 4-3 Write a program to check if a number is
present in the list or not. If the number
is present, print the position of the
number. Print an appropriate message if
the number is not present in the list.
#Program 4-3
list1 = [] #Create an empty list
print("How many numbers do you want to enter in the list: ")
maximum = int(input())
print("Enter a list of numbers: ")
for i in range(0,maximum):
    n = int(input())
    list1.append(n) #append numbers to the list
num = int(input("Enter the number to be searched: "))

position = -1
for i in range(0, len(list1)):
    if list1[i] == num: #number is present
        position = i #save the index of the number
if position == -1:
    print("Number",num,"is not present in the list")
else:
    #index starts at 0, so add 1 to report the position
    print("Number",num,"is present at",position + 1, "position")
Output:
How many numbers do you want to enter in the list
5


Enter a list of numbers:


23
567
12
89
324
Enter the number to be searched:12
Number 12 is present at 3 position
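
Program 4-3 keeps a single position, so if the number occurs more than once only the last occurrence is reported. As a possible variation (a sketch with hard-coded sample values in place of input(), and a list named positions that is not part of the original program), every matching position can be collected:

```python
# Sample data from the run above (hard-coded for illustration)
list1 = [23, 567, 12, 89, 324]
num = 12

# Collect the position (counting from 1) of every match
positions = []
for i in range(len(list1)):
    if list1[i] == num:
        positions.append(i + 1)

if positions:
    print("Number", num, "is present at position(s):", positions)
else:
    print("Number", num, "is not present in the list")
```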

4.6 Introduction to Dictionaries

The data type dictionary falls under mapping. It is a
mapping between a set of keys and a set of values. The
key-value pair is called an item. A key is separated from
its value by a colon(:) and consecutive items are separated
by commas. Items in a dictionary are accessed by key
rather than by position. From Python 3.7 onwards, a
dictionary preserves the order in which the items were
inserted; in earlier versions the order was not guaranteed,
so we may not get back the data in the same order in
which we had entered it.
4.6.1 Creating a Dictionary
To create a dictionary, the items entered are separated
by commas and enclosed in curly braces. Each item is
a key-value pair, separated by a colon (:). The keys
in the dictionary must be unique and must be of an
immutable data type, i.e. number, string or tuple. The
values can be repeated and can be of any data type.
Example 4.2
#dict1 is an empty dictionary
>>> dict1 = {}
>>> dict1
{}
#dict3 is the dictionary that maps names of
#the students to marks in percentage
>>> dict3 = {'Mohan':95,'Ram':89,'Suhel':92,
'Sangeeta':85}
>>> dict3
{'Mohan': 95, 'Ram': 89, 'Suhel': 92,
'Sangeeta': 85}

4.6.2 Accessing Items in a Dictionary


We have already seen that the items of a sequence
(string, list and tuple) are accessed using a technique
called indexing. The items of a dictionary are accessed
via the keys rather than via their relative positions
or indices. Each key serves as the index and maps to
a value.


The following example shows how a dictionary
returns the value corresponding to the given key:
>>> dict3 = {'Mohan':95,'Ram':89,'Suhel':92,
'Sangeeta':85}
>>> dict3['Ram']
89
>>> dict3['Sangeeta']
85
#using unspecified key
>>> dict3['Shyam']
KeyError: 'Shyam'
In the above examples the key 'Ram' always maps to the
value 89 and key 'Sangeeta' always maps to the value
85. So the order of items does not matter. If the key is not
present in the dictionary we get KeyError.
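
A KeyError stops the program unless it is anticipated. The sketch below shows two common ways of guarding a lookup; the variable names result1 and result2 and the "not found" message are assumptions made for illustration, and try...except is standard Python that may not yet have been covered at this point in the book.

```python
dict3 = {'Mohan': 95, 'Ram': 89, 'Suhel': 92, 'Sangeeta': 85}
name = 'Shyam'

# Option 1: check that the key exists before indexing
if name in dict3:
    result1 = dict3[name]
else:
    result1 = "not found"

# Option 2: attempt the lookup and handle the KeyError
try:
    result2 = dict3[name]
except KeyError:
    result2 = "not found"

print(result1, result2)
```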

4.6.3 Membership Operation


The membership operator in checks if the key is present
in the dictionary and returns True, else it returns False.
>>> dict1 = {'Mohan':95,'Ram':89,'Suhel':92,
'Sangeeta':85}
>>> 'Suhel' in dict1
True
The not in operator returns True if the key is not
present in the dictionary, else it returns False.
>>> dict1 = {'Mohan':95,'Ram':89,'Suhel':92,
'Sangeeta':85}
>>> 'Suhel' not in dict1
False

4.6.4 Dictionaries are Mutable


Dictionaries are mutable which implies that the
contents of the dictionary can be changed after it has
been created.
(A) Adding a new item
We can add a new item to the dictionary as shown in
the following example:
>>> dict1 = {'Mohan':95,'Ram':89,'Suhel':92,
'Sangeeta':85}
>>> dict1['Meena'] = 78
>>> dict1
{'Mohan': 95, 'Ram': 89, 'Suhel': 92,
'Sangeeta': 85, 'Meena': 78}


(B) Modifying an existing item


The existing dictionary can be modified by just
overwriting the key-value pair. Example to modify a
given item in the dictionary:
>>> dict1 = {'Mohan':95,'Ram':89,'Suhel':92,
'Sangeeta':85}
#Marks of Suhel changed to 93.5
>>> dict1['Suhel'] = 93.5
>>> dict1
{'Mohan': 95, 'Ram': 89, 'Suhel': 93.5,
'Sangeeta': 85}

4.7 Traversing a Dictionary


We can access each item of the dictionary or traverse a
dictionary using for loop.
>>> dict1 = {'Mohan':95,'Ram':89,'Suhel':92,
'Sangeeta':85}
Method 1:
>>> for key in dict1:
        print(key,':',dict1[key])
Mohan : 95
Ram : 89
Suhel : 92
Sangeeta : 85
Method 2:
>>> for key,value in dict1.items():
        print(key,':',value)
Mohan : 95
Ram : 89
Suhel : 92
Sangeeta : 85
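
If the items are needed in alphabetical order of the keys rather than in the order of insertion, the keys can be passed through the built-in function sorted() before traversal. This is a minimal sketch; the list named ordered is only there to record the order visited and is not part of the text above.

```python
dict1 = {'Mohan': 95, 'Ram': 89, 'Suhel': 92, 'Sangeeta': 85}

ordered = []
# sorted(dict1) returns a new list of the keys in ascending order
for key in sorted(dict1):
    print(key, ':', dict1[key])
    ordered.append(key)
```

The dictionary itself is not changed; only the order of traversal differs.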

4.8 Dictionary Methods and Built-in Functions


Python provides many functions to work on dictionaries.
Table 4.2 lists some of the commonly used dictionary
methods.
Table 4.2 Built-in functions and methods for dictionary

Method: len()
Description: Returns the length or number of key-value pairs of the dictionary passed as the argument
Example:
>>> dict1 = {'Mohan':95,'Ram':89,'Suhel':92,'Sangeeta':85}
>>> len(dict1)
4

Method: dict()
Description: Creates a dictionary from a sequence of key-value pairs
Example:
>>> pair1 = [('Mohan',95),('Ram',89),('Suhel',92),('Sangeeta',85)]
>>> pair1
[('Mohan', 95), ('Ram', 89), ('Suhel', 92), ('Sangeeta', 85)]
>>> dict1 = dict(pair1)
>>> dict1
{'Mohan': 95, 'Ram': 89, 'Suhel': 92, 'Sangeeta': 85}

Method: keys()
Description: Returns a view (a dict_keys object) of the keys in the dictionary
Example:
>>> dict1 = {'Mohan':95,'Ram':89,'Suhel':92,'Sangeeta':85}
>>> dict1.keys()
dict_keys(['Mohan', 'Ram', 'Suhel', 'Sangeeta'])

Method: values()
Description: Returns a view (a dict_values object) of the values in the dictionary
Example:
>>> dict1 = {'Mohan':95,'Ram':89,'Suhel':92,'Sangeeta':85}
>>> dict1.values()
dict_values([95, 89, 92, 85])

Method: items()
Description: Returns a view (a dict_items object) of the (key, value) tuples in the dictionary
Example:
>>> dict1 = {'Mohan':95,'Ram':89,'Suhel':92,'Sangeeta':85}
>>> dict1.items()
dict_items([('Mohan', 95), ('Ram', 89), ('Suhel', 92), ('Sangeeta', 85)])

Method: get()
Description: Returns the value corresponding to the key passed as the argument. If the key is not present in the dictionary it will return None
Example:
>>> dict1 = {'Mohan':95,'Ram':89,'Suhel':92,'Sangeeta':85}
>>> dict1.get('Sangeeta')
85
>>> dict1.get('Sohan')
>>>

Method: update()
Description: Adds the key-value pairs of the dictionary passed as the argument to the given dictionary. If a key is already present, its value is overwritten
Example:
>>> dict1 = {'Mohan':95,'Ram':89,'Suhel':92,'Sangeeta':85}
>>> dict2 = {'Sohan':79,'Geeta':89}
>>> dict1.update(dict2)
>>> dict1
{'Mohan': 95, 'Ram': 89, 'Suhel': 92, 'Sangeeta': 85, 'Sohan': 79, 'Geeta': 89}
>>> dict2
{'Sohan': 79, 'Geeta': 89}

Method: clear()
Description: Deletes or clears all the items of the dictionary
Example:
>>> dict1 = {'Mohan':95,'Ram':89,'Suhel':92,'Sangeeta':85}
>>> dict1.clear()
>>> dict1
{}

Method: del (a statement, not a function)
Description: Deletes the item with the given key. To delete the dictionary from the memory we write: del Dict_name
Example:
>>> dict1 = {'Mohan':95,'Ram':89,'Suhel':92,'Sangeeta':85}
>>> del dict1['Ram']
>>> dict1
{'Mohan': 95, 'Suhel': 92, 'Sangeeta': 85}
>>> del dict1
>>> dict1
NameError: name 'dict1' is not defined
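
Two behaviours of these methods are easy to miss: get() accepts an optional second argument that is returned when the key is absent, and update() overwrites the value of a key that already exists. The following sketch illustrates both; the sample names and marks are chosen only for illustration.

```python
dict1 = {'Mohan': 95, 'Ram': 89}

# get() with a default: returns 0 instead of None when the key is missing
marks = dict1.get('Sohan', 0)

# update() adds 'Geeta' and overwrites the existing value for 'Ram'
dict1.update({'Ram': 91, 'Geeta': 89})

# keys() returns a view; list() converts it into a list
names = list(dict1.keys())

print(marks)
print(dict1)
print(names)
```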

4.9 Manipulating Dictionaries


In this chapter, we have learnt how to create a
dictionary and apply various methods to manipulate it.
The following examples show the application of those
manipulation methods on dictionaries.
(a) Create a dictionary ‘ODD’ of odd numbers between
1 and 10, where the key is the decimal number and
the value is the corresponding number in words.
>>> ODD = {1:'One',3:'Three',5:'Five',7:'Seven',9:'Nine'}
>>> ODD
{1: 'One', 3: 'Three', 5: 'Five', 7: 'Seven', 9: 'Nine'}

(b) Display the keys in dictionary ‘ODD’.


>>> ODD.keys()
dict_keys([1, 3, 5, 7, 9])

(c) Display the values in dictionary ‘ODD’.


>>> ODD.values()
dict_values(['One', 'Three', 'Five', 'Seven', 'Nine'])

(d) Display the items from dictionary ‘ODD’


>>> ODD.items()
dict_items([(1, 'One'), (3, 'Three'), (5, 'Five'), (7, 'Seven'), (9,
'Nine')])

(e) Find the length of the dictionary ‘ODD’.


>>> len(ODD)
5

(f) Check if 7 is present or not in dictionary ‘ODD’


>>> 7 in ODD
True


(g) Check if 2 is present or not in dictionary ‘ODD’


>>> 2 in ODD
False

(h) Retrieve the value corresponding to the key 9


>>> ODD.get(9)
'Nine'

(i) Delete the item corresponding to the key 9 from the dictionary ‘ODD’
>>> del ODD[9]
>>> ODD
{1: 'One', 3: 'Three', 5: 'Five', 7: 'Seven'}

Program 4-4 Write a program to enter names of n
employees and their salaries
as input and store them in a dictionary.
Here n is to be input by the user.
#Program 4-4
#Program to create a dictionary which stores names of employees
#and their salary
num = int(input("Enter the number of employees whose data to be stored: "))
employee = dict() #create an empty dictionary
for count in range(num): #repeat once for each employee
    name = input("Enter the name of the Employee: ")
    salary = int(input("Enter the salary: "))
    employee[name] = salary
print("\n\nEMPLOYEE_NAME\tSALARY")
for k in employee:
    print(k,'\t\t',employee[k])
Output:
Enter the number of employees whose data to be stored: 5
Enter the name of the Employee: 'Tarun'
Enter the salary: 12000
Enter the name of the Employee: 'Amina'
Enter the salary: 34000
Enter the name of the Employee: 'Joseph'
Enter the salary: 24000
Enter the name of the Employee: 'Rahul'
Enter the salary: 30000
Enter the name of the Employee: 'Zoya'
Enter the salary: 25000
EMPLOYEE_NAME SALARY
'Tarun' 12000
'Amina' 34000


'Joseph' 24000
'Rahul' 30000
'Zoya' 25000

Program 4-5 Write a program to count the number
of times a character appears in a given
string.
#Program 4-5
#Count the number of times a character appears in a given string
st = input("Enter a string: ")
dic = {} #creates an empty dictionary
for ch in st:
    if ch in dic: #if next character is already in dic
        dic[ch] += 1
    else:
        dic[ch] = 1 #if ch appears for the first time

for key in dic:
    print(key,':',dic[key])
Output:
Enter a string: HelloWorld
H : 1
e : 1
l : 3
o : 2
W : 1
r : 1
d : 1
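
The if...else test in Program 4-5 can be folded into a single statement with get(), which returns a default value when the key is not yet present. This is an alternative sketch rather than the textbook's program, with the input string hard-coded to match the sample run.

```python
st = "HelloWorld"   # hard-coded instead of input() for illustration
dic = {}
for ch in st:
    # get(ch, 0) gives the current count, or 0 for a first occurrence
    dic[ch] = dic.get(ch, 0) + 1

for key in dic:
    print(key, ':', dic[key])
```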

Program 4-6 Write a program to convert a number
entered by the user into its corresponding
number in words. For example, if the input
is 876 then the output should be ‘Eight
Seven Six’.
# Program 4-6
num = input("Enter any number: ") #number is stored as string
#numberNames is a dictionary of digits and corresponding number
#names
numberNames = {0:'Zero',1:'One',2:'Two',3:'Three',4:'Four',\
               5:'Five',6:'Six',7:'Seven',8:'Eight',9:'Nine'}

result = ''
for ch in num:
    key = int(ch) #converts character to integer
    value = numberNames[key]
    result = result + ' ' + value
print("The number is:",num)
print("The numberName is:",result)
Output:
Enter any number: 6512
The number is: 6512
The numberName is: Six Five One Two

Summary
• Lists are mutable sequences in Python, i.e. we can
change the elements of the list.
• Elements of a list are put in square brackets
separated by comma.
• List indexing is same as that of strings and starts at 0.
Two way indexing allows traversing the list in the
forward as well as in the backward direction.
• Operator + concatenates one list to the end of other
list.
• Operator * repeats the content of a list by
specified number of times.
• Membership operator in tells if an element is
present in the list or not and not in does the
opposite.
• Slicing is used to extract a part of the list.
• There are many list manipulation methods. A few
are: len(), list(), append(), extend(), insert(), count(),
index(), remove(), pop(), reverse(), sort(), sorted(),
min(), max(), sum().
• Dictionary is a mapping (non scalar) data type. It
is a collection of key-value pairs accessed by key
rather than by position; key-value pairs are put
inside curly braces.
• Each key is separated from its value by a colon.
• Keys are unique and act as the index.
• Keys are of immutable type but values can be
mutable.


Exercise
1. What will be the output of the following statements?

a) list1 = [12,32,65,26,80,10]
list1.sort()
print(list1)
b) list1 = [12,32,65,26,80,10]
sorted(list1)
print(list1)
c) list1 = [1,2,3,4,5,6,7,8,9,10]
list1[::-2]
list1[:3] + list1[3:]

d) list1 = [1,2,3,4,5]
list1[len(list1)-1]
2. Consider the following list myList. What will be
the elements of myList after each of the following
operations?
myList = [10,20,30,40]
a) myList.append([50,60])
b) myList.extend([80,90])
3. What will be the output of the following code segment?
myList = [1,2,3,4,5,6,7,8,9,10]
for i in range(0,len(myList)):
if i%2 == 0:
print(myList[i])
4. What will be the output of the following code segment?
a) myList = [1,2,3,4,5,6,7,8,9,10]
del myList[3:]
print(myList)

b) myList = [1,2,3,4,5,6,7,8,9,10]
del myList[:5]
print(myList)

c) myList = [1,2,3,4,5,6,7,8,9,10]
del myList[::2]
print(myList)
5. Differentiate between append() and extend() methods
of list.


6. Consider a list:
list1 = [6,7,8,9]
What is the difference between the following
operations on list1:
a) list1 * 2
b) list1 *= 2
c) list1 = list1 * 2
7. The record of a student (Name, Roll No, Marks in
five subjects and percentage of marks) is stored in
the following list:
stRecord = ['Raman','A-36',[56,98,99,72,69],
78.8]
Write Python statements to retrieve the following
information from the list stRecord.
a) Percentage of the student
b) Marks in the fifth subject
c) Maximum marks of the student
d) Roll No. of the student
e) Change the name of the student from
‘Raman’ to ‘Raghav’
8. Consider the following dictionary stateCapital:
stateCapital = {"Assam":"Guwahati",
"Bihar":"Patna","Maharashtra":"Mumbai",
"Rajasthan":"Jaipur"}
Find the output of the following statements:
a) print(stateCapital.get("Bihar"))
b) print(stateCapital.keys())
c) print(stateCapital.values())
d) print(stateCapital.items())
e) print(len(stateCapital))
f) print("Maharashtra" in stateCapital)
g) print(stateCapital.get("Assam"))
h) del stateCapital["Assam"]
print(stateCapital)

Programming Problems
1. Write a program to find the number of times an element
occurs in the list.


2. Write a program to read a list of n integers (positive
as well as negative). Create two new lists, one having
all positive numbers and the other having all negative
numbers from the given list. Print all three lists.
3. Write a program to find the largest and the second
largest elements in a given list of elements.
4. Write a program to read a list of n integers and find their
median.
Note: The median value of a list of values is the middle one
when they are arranged in order. If there are two middle values
then take their average.
Hint: Use an inbuilt function to sort the list.

5. Write a program to read a list of elements. Modify this


list so that it does not contain any duplicate elements i.e.
all elements occurring multiple times in the list should
appear only once.
6. Write a program to create a list of elements. Input an
element from the user that has to be inserted in the list.
Also input the position at which it is to be inserted.
7. Write a program to read elements of a list and do the
following.
a) The program should ask for the position of the
element to be deleted from the list and delete the
element at the desired position in the list.
b) The program should ask for the value of the element
to be deleted from the list and delete this value from
the list.
8. Write a Python program to find the highest 2 values in
a dictionary.
9. Write a Python program to create a dictionary from a
string ‘w3resource’ such that each individual character
makes a key and its index value for the first occurrence makes
the corresponding value in the dictionary.
Expected output : {'3': 1, 's': 4, 'r': 2, 'u': 6, 'w': 0, 'c': 8,
'e': 3, 'o': 5}
10. Write a program to input your friends’ names and their
phone numbers and store them in the dictionary as the
key-value pair. Perform the following operations on the
dictionary:
a) Display the Name and Phone number for all your
friends.
b) Add a new key-value pair in this dictionary and
display the modified dictionary


c) Delete a particular friend from the dictionary


d) Modify the phone number of an existing friend
e) Check if a friend is present in the dictionary or not
f) Display the dictionary in sorted order of names

Case Study Based Question


For the SMIS System given in Chapter 3, let us do the
following:
1. Write a program to take in the roll number, name and
percentage of marks for n students of Class X and do
the following:
• Accept details of the n students (n is the number
of students).
• Search details of a particular student on the basis
of roll number and display result.
• Display the result of all the students.
• Find the topper amongst them.
• Find the subject toppers amongst them.
(Hint: Use Dictionary, where the key can be roll number
and the value an immutable data type containing name
and percentage.)

Case Study
1. A bank is a financial institution which is involved in
borrowing and lending of money. With advancement
in technology, online banking, also known as internet
banking allows customers of a bank to conduct a range
of financial transactions through the bank’s website
anytime, anywhere. As part of initial investigation you
are suggested to:
• Collect a Bank’s application form. After careful
analysis of the form, identify the information
required for opening a savings account. Also
enquire about the rate of interest offered for a
savings account.
• The basic two operations performed on an account
are Deposit and Withdrawal. Write a menu driven
program that accepts either of the two choices
of Deposit and Withdrawal, then accepts an
amount, performs the transaction and accordingly
displays the balance. Remember every bank has
a requirement of minimum balance which needs
to be taken care of during withdrawal operations.


Enquire about the minimum balance required in
your bank.
• Collect the interest rates for opening a fixed
deposit in various slabs in a savings bank
account. Remember that rates may be different for
senior citizens.
Finally, write a menu driven program having the
following options (use functions and appropriate data
types):
• Open a savings bank account
• Deposit money
• Withdraw money
• Take details such as amount and period for a
Fixed Deposit and display its maturity amount for
a particular customer.
2. Participating in a quiz can be fun as it provides a
competitive element. Some educational institutes use
it as a tool to measure knowledge level, abilities and/
or skills of their pupils either on a general level or in
a specific field of study. Identify and analyse popular
quiz shows and write a Python program to create a quiz
that should also contain the following functionalities
besides the one identified by you as a result of
your analysis.
• Create an administrative user ID and password to
categorically add, modify or delete a question.
• Register the student before allowing her/him to
play a quiz.
• Allow selection of category based on subject area.
• Display questions as per the chosen category.
• Keep the score as the participant plays.
• Display final score.
3. Our heritage monuments are our assets. They are
a reflection of our rich and glorious past and an
inspiration for our future. UNESCO has identified some
of Indian heritage sites as World Heritage sites. Collect
the following information about these sites:
• What is the name of the site?
• Where is it located?
▪ District
▪ State


• When was it built?


• Who built it?
• Why was it built?
• Website link (if any)
Write a Python program to:
• Create an administrative user ID and password to
add, modify or delete an entered heritage site in the
list of sites.
• Display the list of world heritage sites in India.
• Search and display information of a world heritage
site entered by the user.
• Display the name(s) of world heritage site(s) on the
basis of the state input by the user.



Chapter 5
Understanding Data

In this chapter
»» Introduction to Data
»» Data Collection
»» Data Storage
»» Data Processing
»» Statistical Techniques for Data Processing

“Data is not information, Information is not knowledge,
Knowledge is not understanding, Understanding is not
wisdom.”
— Gary Schubert

5.1 Introduction to Data


Many a time, people take decisions based on
certain data or information. For example, while
choosing a college for getting admission, one
looks at placement data of previous years of that
college, educational qualification and experience
of the faculty members, laboratory and hostel
facilities, fees, etc. So we can say that identification
of a college is based on various data and their
analysis. Governments systematically collect
and record data about the population through
a process called census. Census data contains


valuable information which is helpful in planning and
formulating policies. Likewise, the coaching staff of a
sports team analyses previous performances of opponent
teams for making strategies. Banks maintain data about
the customers, their account details and transactions.
All these examples highlight the need of data in various
fields. Data are indeed crucial for decision making.
In the previous examples, one cannot make decisions
by looking at the data itself. In our example of choosing
a college, suppose the placement cell of the college has
maintained data of about 2000 students placed with
different companies at different salary packages in the
last 3 years. Looking at such data, one cannot make
any remark about the placement of students of that
college. The college processes and analyses this data
and the results are given in the placement brochure of
the college through summarisation as well as visuals for
easy understanding. Hence, data need to be gathered,
processed and analysed for making decisions.
A knowledge base is a store of information consisting of
facts, assumptions and rules which an AI system can use
for decision making.

In general, data is a collection of characters, numbers,
and other symbols that represents values of some
situations or variables. The word data is plural; its
singular is “datum”. Using computers, data are stored
in electronic forms because data processing becomes
faster and easier as compared to manual data processing
done by people. The Information and Communication
Technology (ICT) revolution led by computer, mobile and
Internet has resulted in generation of large volume of
data and at a very fast pace. The following list contains
some examples of data that we often come across.
• Name, age, gender, contact details, etc., of a person
• Transactions data generated through banking,
ticketing, shopping, etc. whether online or offline
• Images, graphics, animations, audio, video
• Documents and web pages
• Online posts, comments and messages
• Signals generated by sensors
• Satellite data including meteorological data,
communication data, earth observation data, etc.

5.1.1 Importance of Data


Human beings rely on data for making decisions.
Besides, large amount of data when processed with the
help of a computer, show us the possibilities or hidden


traits which are otherwise not visible to humans. When
one withdraws money from an ATM, the bank needs to debit
the withdrawn amount from the linked account. So the
bank needs to maintain data and update it as and when
required. The meteorological offices continuously keep
on monitoring satellite data for any upcoming cyclone
or heavy rain.
In a competitive business environment, it is important
for business organisations to continuously monitor and
analyse market behavior with respect to their products
and take actions accordingly. Besides, companies
identify customer demands as well as feedback, and
make changes in their products or services accordingly.
The dynamic pricing concept used by airlines and
railways is another example where they decide the price
based on the relationship between demand and supply.
The cab booking Apps increase or decrease the price
based on demand for cabs at a particular time. Certain
restaurants offer discounted prices (called happy hours);
they decide when and how much discount to offer by
analysing sales data at different time periods.
Besides business, following are some other scenarios
where data are also stored and analysed for making
decisions:
• The electronic voting machines are used for recording
the votes cast. Subsequently, the voting data from
all the machines are accumulated to declare election
results in a short time as compared to manual
counting of ballot papers.
• Scientists record data while doing experiments to
calculate and compare results.
• Pharmaceutical companies record data while trying
out a new medicine to see its effectiveness.
• Libraries maintain data about books in the library
and the membership of the library.
• The search engines give us results after analysing
large volume of data available on the websites across
World Wide Web (www).
• Weather alerts are generated by analysing data
received from various satellites.

5.1.2 Types of Data


As data come from different sources, they can be in
different formats. For example, an image is a collection

of pixels; a video is made up of frames; a fee slip is
made up of a few numeric and non-numeric entries; and
messages/chats are made up of texts, icons (emoticons)
and images/videos. Two broad categories in which data
can be classified on the basis of their format are:
(A) Structured Data
Data which is organised and can be recorded in a well
defined format is called structured data. Structured
data is usually stored in a computer in a tabular (in rows
and columns) format where each column represents
different data for a particular parameter called attribute/
characteristic/variable and each row represents data of
an observation for different attributes. Table 5.1 shows
structured data related to an inventory of kitchen items
maintained by a shop.

Activity 5.1
Observe Voter Identity cards of your family members and
identify the data fields under which data are organised.
Are they same for all?

Table 5.1 Structured data about kitchen items in a shop
ModelNo   ProductName       Unit Price   Discount(%)   Items_in_Inventory
ABC1      Water bottle      126          8             13
ABC2      Melamine Plates   320          5             45
ABC3      Dinner Set        4200         10            8
GH67      Jug               80           0             10
GH78      Table Spoon       120          5             14
GH81      Bucket            190          12            6
NK2       Kitchen Towel     25           0             32

Given this data, using a spreadsheet or other such
software, the shop owner can find out how many total
items are there by summing the column Items_in_
Inventory of Table 5.1. The owner of the shop can also
calculate the total value of all items in the inventory
by multiplying each entry of column 3 (Unit Price) with
the corresponding entry of column 5 (Items_in_Inventory)
and finding their sum.
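
The two calculations described above can be sketched in Python as well as in a spreadsheet. In the following example the rows of Table 5.1 are typed in as a list of tuples; the variable names (inventory, total_items, total_value) are chosen for illustration, and discounts are ignored, as in the description above.

```python
# Each tuple: (ModelNo, ProductName, Unit Price, Discount(%), Items_in_Inventory)
inventory = [
    ("ABC1", "Water bottle", 126, 8, 13),
    ("ABC2", "Melamine Plates", 320, 5, 45),
    ("ABC3", "Dinner Set", 4200, 10, 8),
    ("GH67", "Jug", 80, 0, 10),
    ("GH78", "Table Spoon", 120, 5, 14),
    ("GH81", "Bucket", 190, 12, 6),
    ("NK2", "Kitchen Towel", 25, 0, 32),
]

total_items = 0
total_value = 0
for row in inventory:
    unit_price = row[2]   # column 3
    quantity = row[4]     # column 5
    total_items = total_items + quantity
    total_value = total_value + unit_price * quantity

print("Total items in inventory:", total_items)
print("Total value of inventory:", total_value)
```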
Table 5.2 shows more examples of structured data
recorded for different attributes.
Table 5.2 Attributes maintained for different activities
Entity/Activities              Data Fields/Parameters/Attributes
Books at a shop                BookTitle, Author, Price, YearofPublication
Depositing fees in a school    StudentName, Class, RollNo, FeesAmount, DepositDate
Amount withdrawal from ATM     AccHolderName, AccountNo, TypeofAcc, DateofWithdrawal,
                               AmountWithdrawn, ATMid, TimeOfWithdrawal


(B) Unstructured Data


A newspaper contains various types of news items
which are also called data. But there is no fixed pattern
that a newspaper follows in placing news articles. One
day there might be three images of different sizes on
a page along with five news items and one or more
advertisements, while on another day there might be
one big image with three textual news items. So there is
no particular format nor any fixed structure for printing
news. Another example is the content of an email.
There is no fixed structure about how many lines or
paragraphs one has to write in an email or how many
files are to be attached with an email. In summary,
data which are not in the traditional row and column
structure are called unstructured data.

Think and Reflect
When we click a photograph using our digital or mobile
camera, does it have some metadata associated with it?

Examples of unstructured data include web pages
consisting of text as well as multimedia contents
(image, graphics, audio/video). Other examples include
text documents, business reports, books, audio/video
files, social media messages. Although there are ways
to process unstructured data, we are going to focus on
handling structured data only in this book.
Unstructured data are sometimes described with
the help of some other data called metadata. Metadata
is basically data about data. For example, we describe
different parts of an email as subject, recipient, main
body, attachment, etc. These are the metadata for the
email data. Likewise, we can have some metadata for an
image file as image size (in KB or MB), image type (for
example, JPEG, PNG), image resolution, etc.
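
As a rough illustration of metadata, the sketch below reads two simple properties of a file (its type, taken from the file extension, and its size) using Python's standard os module. The temporary file with dummy content merely stands in for a real image, and the dictionary keys are names chosen for this example.

```python
import os
import tempfile

# Create a small temporary file to stand in for a real image (dummy content)
with tempfile.NamedTemporaryFile(suffix=".png", delete=False) as f:
    f.write(b"x" * 2048)      # 2048 bytes = 2 KB
    path = f.name

# Metadata: data that describes the file rather than its content
metadata = {
    "type": os.path.splitext(path)[1],           # file extension
    "size_in_KB": os.path.getsize(path) / 1024,  # size on disk
}
print(metadata)

os.remove(path)   # clean up the temporary file
```

Properties such as image resolution are stored inside the image file itself and need an image-handling library to read, which is beyond this sketch.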

5.2 Data Collection


For processing data, we need to collect or gather data
first. We can then store the data in a file or database
for later use. Data collection here means identifying
already available data or collecting from the appropriate
sources. Suppose there are three different scenarios
where sales data in a grocery store are available:
• Sales data are available with the shopkeeper in a
diary or register. In this case we should enter the
data in a digital format for example, in a spreadsheet.
• Data are already available in a digital format, say in
a CSV (comma separated values) file.
• The shopkeeper has so far not recorded any data in
either form but wants to get a software developed for

maintaining sales data and accounts. The software
may be developed using a programming language such
as Python which can be used to store and retrieve data
from a CSV file or a database management system
like MySQL, which will be discussed further.

Think and Reflect
Identify attributes needed for creating an Aadhaar Card.

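
To give a feel for the CSV option mentioned in the scenarios above, here is a minimal sketch using Python's standard csv module. The sales records and column names are hypothetical, and io.StringIO is used in place of a file on disk so that the example is self-contained; with a real file one would use open() instead.

```python
import csv
import io

# Hypothetical sales records: (item, quantity, unit price)
sales = [("Rice", 2, 60), ("Sugar", 1, 45), ("Tea", 3, 120)]

# Store the records in CSV form; StringIO stands in for a file on disk
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["Item", "Quantity", "UnitPrice"])   # header row
writer.writerows(sales)

# Retrieve the data and compute the total sale amount
buffer.seek(0)
reader = csv.DictReader(buffer)
total = 0
for row in reader:
    # values read from CSV are strings, so convert before calculating
    total += int(row["Quantity"]) * int(row["UnitPrice"])

print("Total sales:", total)
```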
Data are continuously being generated at different
sources. Our interactions with digital medium are
continuously generating huge volumes of data. Hospitals
are collecting data about patients for improving their
services. Shopping malls are collecting data about the
items being purchased by people. On analysing such
data, suppose it appears that bedsheets and groceries
are frequently bought together. Hence, the shop owner
may decide to display bedsheets near the grocery section
in the mall to increase the sales. Likewise, a political
analyst may look at the data contained in the posts and
messages on a social media platform and analyse them to gauge
public opinion before an election. Organisations like
World Bank and International Monetary Fund (IMF) are
collecting data related to various economic parameters
from different countries for making economic forecasts.

5.3 Data Storage


Once we gather data and process them to get results, we
may not then simply discard the data. Rather, we would
like to store them for future use as well. Data storage
is the process of storing data on storage devices so that
data can be retrieved later. Nowadays, large volumes of
data are being generated at a very high rate. As a result,
data storage has become a challenging task. However,
the decrease in the cost of digital storage devices has
helped in simplifying this task. There are numerous
digital storage devices available in the market, like Hard
Disk Drive (HDD), Solid State Drive (SSD), CD/DVD,
Tape Drive, Pen Drive, Memory Card, etc.
We store data like images, documents, audios/
videos, etc. as files in our computers. Likewise, school/
hospital data are stored in data files. We use computers
to add, modify or delete data in these files or process
these data files to get results. However, file processing
has certain limitations, which can be overcome through
Database Management System (DBMS).

Think and Reflect
Is it necessary to store data in files before processing?


5.4 Data Processing


We are interested in understanding data as they hold
valuable facts and information that can be useful in our
decision making process. However, by looking at the vast
or large amount of data, one cannot arrive at a conclusion.
Rather, data need to be processed to get results and after
analysing those results, we make conclusions or decisions.

[Figure 5.1: Steps in Data Processing. Raw data (numbers/text/image) → Data Processing → Information (in the form of table/chart/text). Data Process Cycle: Input (Data Collection, Data Preparation, Data Entry) → Processing (Store, Retrieve, Classify, Update) → Output (Reports, Results, Processing System).]
We find automated data processing in situations like
online bill payment, registration of complaints, booking
tickets, etc. Figure 5.1 illustrates basic steps used to
process the data to get the output.
Figure 5.2 shows some tasks along with data,
processing and generated output/information.

Problem Statement: A website handling online filling of student details for a competitive examination and generating admit card
Inputs against which data are collected: Student details like name, address, qualification, marks, mobile number, photo and sign, center choice, online fee payment details like credit/debit card, net banking or other mode of payment, etc.
Processing: Processing of filled in details for correctness of data received, eligibility as per advertisement or not, fees paid or not, photo and signature uploaded or not. Then, generate a roll number and add this applicant in the list of eligible applicants.
Output: Examination Admit card specifying roll number, center address, date and time of test.

Problem Statement: A Bank handling withdrawals of cash through ATMs of its own branch
Inputs against which data are collected: ATM PIN number, account type, account number, card number, ATM location from where money was withdrawn, date and time, and amount to be withdrawn.
Processing: Checking for valid PIN number, existing bank balance, if satisfied, then deduction of amount from that account and counting of rupees and initiate printing of receipt
Output: Currency notes, printed slip with transaction details

Problem Statement: Issue of train ticket
Inputs against which data are collected: Journey start and end stations, date of journey, number of tickets required, class of travel (Sleeper/AC/other), berth preference (if any), passenger name(s) and age(s), mobile and email id, payment related details, etc.
Processing: Verify login details and check availability of berth in that class. If payment done, issue tickets and deduct that number from the total available tickets on that coach. Allocate PNR number and berths or generate a waiting number for that ticket.
Output: Generate ticket with berth and coach number, or issue ticket with a waiting list number

Figure 5.2: Data Based Problem Statements


5.5 Statistical Techniques for Data Processing


Given a set of data values, we need to process them to get
information. There are various techniques which help
us to have preliminary understanding about the data.
Summarisation methods are applied on tabular data
for its easy comprehension. Commonly used statistical
techniques for data summarisation are given below:
5.5.1 Measures of Central Tendency
A measure of central tendency is a single value that
gives us some idea about the data. Three most common
measures of central tendency are the mean, median,
and mode. Instead of looking at each individual data
value, we can calculate the mean, median and mode
of the data to get an idea about average, middle value
and frequency of occurrence of a particular value,
respectively. Selection of a measure of central tendency
depends on certain characteristics of data.
(A) Mean
Mean is simply the average of numeric values of an
attribute. Mean is also called average. Suppose there
are data on weight of 40 students in a class. Instead
of looking at each of the data values, we can calculate
the average to get an idea about the average weight of
students in that class.
Definition: Given n values x1, x2, x3,...xn, mean is
computed as
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

Example 5.1
Assume that height (in cm) of students in a class are as
follows [90,102,110,115,85,90,100,110,110]. Mean or
average height of the class is
$$\frac{90 + 102 + 110 + 115 + 85 + 90 + 100 + 110 + 110}{9} = \frac{912}{9} = 101.33 \text{ cm}$$
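The calculation in Example 5.1 can be reproduced in Python. The sketch below is only illustrative: the variable names are our own, and statistics.mean() from Python's standard library is used as a cross-check on the definition.

```python
from statistics import mean

heights = [90, 102, 110, 115, 85, 90, 100, 110, 110]  # heights in cm

total = sum(heights)     # sum of all values
n = len(heights)         # number of values
average = total / n      # mean computed directly from the definition

print(total, n)                     # 912 9
print(round(average, 2))            # 101.33
print(round(mean(heights), 2))      # 101.33 (same result from the library)
```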

Mean is not a suitable choice if there are outliers
in the data. To calculate mean, the outliers or extreme
values should first be removed from the given data, and
the mean is then calculated for the remaining data.
Note: An outlier is an exceptionally large or small value, in
comparison to other values of the data. Usually, outliers are
treated as errors since they can influence/affect the average or
other statistical calculations based on the data.


(B) Median
Median is also computed for a single attribute/variable
at a time. When all the values are sorted in ascending or
descending order, the middle value is called the Median.
When there is an odd number of values, the median is
the value at the middle position. If the list has an even
number of values, the median is the average of the two
middle values. Median represents the central value at
which the given data is equally divided into two parts.
Example 5.2
Consider the previous data of height of students used
in calculation of mean value. In order to compute the
median, the first step is to sort data in ascending or
descending order. We have sorted the height data in
ascending order as [85,90,90,100,102,110,110,110,
115]. As there are total 9 values (odd number), the
median is the value at position 5, that is 102 cm,
whether counted from left to right or from right to left.
Median represents the actual central value at which the
given data is equally divided into two parts.

Think and Reflect
Out of Mean and Median, which one is more sensitive to
outliers in the data?
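The sort-and-pick-the-middle rule for the median can be sketched as a small Python function. The function name middle_value is our own choice for illustration; statistics.median() from the standard library gives the same answer. The four-value list in the last line is a made-up example of the even case.

```python
from statistics import median

heights = [90, 102, 110, 115, 85, 90, 100, 110, 110]  # heights in cm

def middle_value(values):
    """Return the median: sort the values, then take the middle one(s)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # odd number of values: a single middle value exists
        return ordered[mid]
    # even number of values: average of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

print(middle_value(heights))               # 102
print(median(heights))                     # 102
print(middle_value([85, 90, 100, 110]))    # 95.0
```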
(C) Mode
The value that appears the greatest number of times in the
given data of an attribute/variable is called Mode. It is
computed on the basis of frequency of occurrence of
distinct values in the given data. A data set has no mode
if each value occurs only once. There may be multiple
modes in the data if more than one value has the same
highest frequency. Mode can be found for numeric as
well as non-numeric data.
Example 5.3
In the list of height of students, mode is 110 as its
frequency of occurrence in the list is 3, which is larger
than the frequency of rest of the values.
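Finding the mode amounts to counting how often each distinct value occurs. A minimal sketch using collections.Counter follows; the function name modes and the list of colours are our own illustrative choices. It returns every value that shares the highest frequency, and an empty list when each value occurs only once.

```python
from collections import Counter

def modes(values):
    """Return all values with the highest frequency (empty list if no mode)."""
    counts = Counter(values)           # value -> number of occurrences
    highest = max(counts.values())
    if highest == 1:
        return []                      # every value occurs once: no mode
    return [value for value, count in counts.items() if count == highest]

heights = [90, 102, 110, 115, 85, 90, 100, 110, 110]
print(modes(heights))                                      # [110]
print(modes(["red", "blue", "red", "white", "blue"]))      # ['red', 'blue']
print(modes([1, 2, 3]))                                    # []
```

The second call shows that the mode also works for non-numeric data and that a data set may have more than one mode.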
5.5.2 Measures of Variability
The measures of variability refer to the spread or variation
of the values around the mean. They are also called
measures of dispersion that indicate the degree of diversity
in a data set. They also indicate difference within the group.
Two different data sets can have the same mean, median
or mode but completely different levels of dispersion, or
vice versa. Common measures of dispersion or variability
are Range and Standard Deviation.


(A) Range
It is the difference between maximum and minimum
values of the data (the largest value minus the
smallest value). Range can be calculated only for
numerical data. It is a measure of dispersion and
tells about coverage/spread of data values, for
example, the difference in salaries of employees, marks
of a student, price of toys, etc. As range is calculated
based on the two extreme values, any outlier in the
data badly influences the result.
Let M be the largest or maximum value and S be the
smallest or minimum value in the data, then Range is
the difference between the two extreme values, i.e., M – S or
Maximum – Minimum.
Example 5.4
In the above example, minimum height value is 85 cm
and maximum height value is 115 cm. Hence range is
115-85 = 30 cm.
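The range is a one-line calculation in Python using the built-in max() and min() functions. The sketch below also appends one made-up extreme value (190 cm, not part of the original data) to show how strongly a single outlier changes the range.

```python
heights = [90, 102, 110, 115, 85, 90, 100, 110, 110]  # heights in cm

data_range = max(heights) - min(heights)
print(data_range)            # 30

# Hypothetical outlier added only for illustration
with_outlier = heights + [190]
range_with_outlier = max(with_outlier) - min(with_outlier)
print(range_with_outlier)    # 105
```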
(B) Standard deviation
Standard deviation refers to differences within the group
or set of data of a variable. Like Range, it also measures
the spread of data. However, unlike Range which only
uses two extreme values in the data, calculation of
standard deviation considers all the given data. It is
calculated as the positive square root of the average of
squared difference of each value from the mean value
of data. Smaller value of standard deviation means
data are less spread while a larger value of standard
deviation means data are more spread.
Given n values x1, x2, x3,...xn, and their mean x̄, the
standard deviation, represented as σ (Greek letter sigma),
is computed as
$$\sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}}$$

Example 5.5
Let us compute the standard deviation of the height
of nine students that we used while calculating
Mean. The Mean (x̄) was calculated to be 101.33 cm.
Subtract the mean from each value and take the square
of that difference. Dividing the sum of the squared values
by the total number of values and taking its square root
gives the standard deviation in data. See Table 5.3
for details.


Table 5.3 Standard deviation of height of 9 students

Height (x) in cm    x − x̄      (x − x̄)²
90                  -11.33     128.37
102                   0.67       0.45
110                   8.67      75.17
115                  13.67     186.87
85                  -16.33     266.67
90                  -11.33     128.37
100                  -1.33       1.77
110                   8.67      75.17
110                   8.67      75.17
n = 9, x̄ = 101.33   ∑(x − x̄) = 0.03   ∑(x − x̄)² ≈ 938.00

$$\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n} = \frac{938}{9} = 104.22$$

$$\sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}} = \sqrt{104.22} = 10.2 \text{ cm}$$
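The steps of Example 5.5 can be followed one by one in Python. This is a sketch with variable names of our own choosing. Note that the formula in the text divides by n, which matches statistics.pstdev() (population standard deviation) in the standard library; statistics.stdev() divides by n − 1 and would give a slightly larger value.

```python
from math import sqrt
from statistics import pstdev

heights = [90, 102, 110, 115, 85, 90, 100, 110, 110]  # heights in cm

n = len(heights)
mean_height = sum(heights) / n

# Squared difference of each value from the mean
squared_diffs = [(x - mean_height) ** 2 for x in heights]

variance = sum(squared_diffs) / n      # average of the squared differences
sigma = sqrt(variance)                 # positive square root

print(round(sum(squared_diffs), 2))    # 938.0
print(round(variance, 2))              # 104.22
print(round(sigma, 2))                 # 10.21
print(round(pstdev(heights), 2))       # 10.21
```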

Let us look at the following problems and select
the suitable statistical technique to be applied (Mean/
Median/Mode/Range/Standard Deviation):

Problem Statement                    Choose suitable statistical method
The management of a company wants to know about disparity in salaries of
all employees.

Teacher wants to know about the average performance of the whole class in
a test.

Compare height of residents of two cities

Find the dominant value from a set of values

Compare income of residents of two cities

Find the popular color for car after surveying the car owners of a small city.

It is important to understand statistical techniques
so that one can decide which statistical technique to
use to arrive at a decision. Different programming tools
are available for efficient analysis of large volumes of
data. These tools make use of statistical techniques for
data analysis. One such programming tool is Python
and it has libraries specially built for data processing
and analysis. We will be covering some of them in the
following chapters.


Summary
• Data refer to unorganised facts that can be processed
to generate meaningful result or information.
• Data can be structured or unstructured.
• Hard Disk, SSD, CD/DVD, Pen Drive, Memory
Card, etc. are some of the commonly used storage
devices.
• Data Processing cycle involves input and storage
of data, its processing and generating output.
• Summarizing data using statistical techniques aids
in revealing data characteristics.
• Mean, Median, Mode, Range, and Standard
Deviation are some of the statistical techniques
used for data summarisation.
• Mean is the average of given values.
• Median is the mid value when data are sorted in
ascending/descending order.
• Mode is the data value that appears most number
of times.
• Range is the difference between the maximum and
minimum values.
• Standard deviation is the positive square root of
the average of squared difference of each value
from the mean.

Exercise
1. Identify data required to be maintained to perform the
following services:
a) Declare exam results and print e-certificates
b) Register participants in an exhibition and issue
biometric ID cards
c) To search for an image by a search engine
d) To book an OPD appointment with a hospital in a
specific department
2. A school having 500 students wants to identify
beneficiaries of the merit-cum-means scholarship,
achieving more than 75% for two consecutive years
and having family income less than 5 lakh per annum.
Briefly describe data processing steps to be taken by the
school to prepare the list of beneficiaries.
3. A bank ‘xyz’ wants to know about its popularity among
the residents of a city ‘ABC’ on the basis of number of
bank accounts each family has and the average monthly
account balance of each person. Briefly describe the
steps to be taken for collecting data and what results
can be checked through processing of the collected data.
4. Identify type of data being collected/generated in the
following scenarios:
a) Recording a video
b) Marking attendance by teacher
c) Writing tweets
d) Filling an application form online
5. Consider the temperature (in Celsius) of 7 days of a week
as 34, 34, 27, 28, 27, 34, 34. Identify the appropriate
statistical technique to be used to calculate the following:
a) Find the average temperature.
b) Find the temperature Range of that week.
c) Find the standard deviation of the temperature.
6. A school teacher wants to analyse results. Identify the
appropriate statistical technique to be used along with
its justification for the following cases:
a) Teacher wants to compare performance in terms of
division secured by students in Class XII A and Class
XII B where each class strength is same.
b) Teacher has conducted five unit tests for that class in
months July to November and wants to compare the
class performance in these five months.
7. Suppose annual day of your school is to be celebrated.
The school has decided to felicitate those parents of the
students studying in classes XI and XII, who are the
alumni of the same school. In this context, answer the
following questions:
a) Which statistical technique should be used to find
out the number of students whose both parents are
alumni of this school?
b) How varied are the age of parents of the students of
that school?
8. For the annual day celebrations, the teacher is looking
for an anchor in a class of 42 students. The teacher would
make selection of an anchor on the basis of singing skill,
writing skill, as well as monitoring skill.
a) Which mode of data collection should be used?
b) How would you represent the skill of students
as data?


9. Differentiate between structured and unstructured data
giving one example.
10. The principal of a school wants to do following analysis
on the basis of food items procured and sold in the
canteen:
a) Compare the purchase and sale price of fruit juice
and biscuits.
b) Compare sales of fruit juice, biscuits and samosa.
c) Variation in sale price of fruit juices of different
companies for same quantity (in ml).
Create an appropriate dataset for these items (fruit juice,
biscuits, samosa) by listing their purchase price and
sale price. Apply basic statistical techniques to make
the comparisons.



Chapter 6
Introduction to NumPy

“The goal is to turn data into information,
and information into insight.”
— Carly Fiorina

In this chapter
»» Introduction
»» Array
»» NumPy Array
»» Indexing and Slicing
»» Operations on Arrays
»» Concatenating Arrays
»» Reshaping Arrays
»» Splitting Arrays
»» Statistical Operations on Arrays
»» Loading Arrays from Files
»» Saving NumPy Arrays in Files on Disk

6.1 Introduction
NumPy stands for ‘Numerical Python’. It is a
package for data analysis and scientific computing
with Python. NumPy uses a multidimensional
array object, and has functions and tools
for working with these arrays. The powerful
n-dimensional array in NumPy speeds up data
processing. NumPy can be easily interfaced with
other Python packages and provides tools for
integrating with other programming languages
like C, C++ etc.


Installing NumPy
NumPy can be installed by typing the following command:
pip install numpy

6.2 Array
We have learnt about various data types like list, tuple,
and dictionary. In this chapter we will discuss another
datatype ‘Array’. An array is a data type used to store
multiple values using a single identifier (variable name).
An array contains an ordered collection of data elements
where each element is of the same type and can be
referenced by its index (position).
The important characteristics of an array are:
• Each element of the array is of same data
type, though the values stored in them may be
different.
• The entire array is stored contiguously in
memory. This makes operations on array fast.
• Each element of the array is identified or
referred using the name of the Array along with
the index of that element, which is unique for
each element. The index of an element is an
integral value associated with the element,
based on the element’s position in the array.
For example, consider an array with 5 numbers:
[ 10, 9, 99, 71, 90 ]
Here, the 1st value in the array is 10 and has the
index value [0] associated with it; the 2nd value in the
array is 9 and has the index value [1] associated with
it, and so on. The last value (in this case the 5th value)
in this array has an index [4]. This is called zero based
indexing. This is very similar to the indexing of lists in
Python. The idea of arrays is so important that almost
all programming languages support it in one form or
another.

Contiguous memory allocation:
The memory space is divided into fixed-size positions
and each position is allocated to a single data item only.
Non-contiguous memory allocation:
The data are divided into several blocks, which are placed
in different parts of the memory according to the
availability of memory space.
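Since zero based indexing works the same way for Python lists, the five numbers used in this section can be explored with a plain list. This is a small illustrative sketch; the variable name numbers is our own.

```python
numbers = [10, 9, 99, 71, 90]

first = numbers[0]                    # 1st value has index 0
second = numbers[1]                   # 2nd value has index 1
last = numbers[len(numbers) - 1]      # 5th (last) value has index 4

print(first, second, last)            # 10 9 90

# Print every index along with the value stored at that position
for index, value in enumerate(numbers):
    print(index, value)
```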

6.3 NumPy Array


NumPy arrays are used to store lists of numerical data,
vectors and matrices. The NumPy library has a large set of
routines (built-in functions) for creating, manipulating,
and transforming NumPy arrays. Python language also
has an array data structure, but it is not as versatile,
efficient and useful as the NumPy array. The NumPy
array is officially called ndarray but commonly known
as array. In the rest of the chapter, we will be referring to
NumPy array whenever we use “array”. Following are a few
differences between list and array.
6.3.1 Difference Between List and Array
1. List: List can have elements of different data types, for example, [1,3.4, ‘hello’, ‘a@’].
   Array: All elements of an array are of same data type, for example, an array of floats may be: [1.2, 5.4, 2.7].

2. List: Elements of a list are not stored contiguously in memory.
   Array: Array elements are stored in contiguous memory locations. This makes operations on arrays faster than lists.

3. List: Lists do not support element wise operations, for example, addition, multiplication, etc. because elements may not be of same type.
   Array: Arrays support element wise operations. For example, if A1 is an array, it is possible to say A1/3 to divide each element of the array by 3.

4. List: Lists can contain objects of different datatypes, so Python must store the type information for every element along with its value. Thus lists take more space in memory and are less efficient.
   Array: NumPy array takes up less space in memory as compared to a list because arrays do not require to store datatype of each element separately.

5. List: List is a part of core Python.
   Array: Array (ndarray) is a part of NumPy library.
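The difference in element wise operations can be seen with a short experiment. The sketch below assumes NumPy is installed; the names values and a1 are our own. With a list, the * operator repeats the list and the / operator is not defined, so a loop or list comprehension is needed instead.

```python
import numpy as np

values = [3, 6, 9]          # a Python list
a1 = np.array(values)       # the same data as a NumPy array

# Arrays: the operation is applied to every element
print(a1 / 3)               # [1. 2. 3.]
print((a1 * 2).tolist())    # [6, 12, 18]

# Lists: * repeats the list instead of multiplying each element
print(values * 2)           # [3, 6, 9, 3, 6, 9]

# Lists: / with a number is not supported and raises TypeError
try:
    values / 3
    list_division_failed = False
except TypeError:
    list_division_failed = True
print(list_division_failed)  # True

# The list equivalent needs an explicit loop or comprehension
list_result = [v / 3 for v in values]
print(list_result)           # [1.0, 2.0, 3.0]
```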

6.3.2 Creation of NumPy Arrays from List


There are several ways to create arrays. To create an
array and to use its methods, first we need to import the
NumPy library.
#NumPy is loaded as np (we can assign any
#name), numpy must be written in lowercase
>>> import numpy as np
NumPy’s array() function converts a given list
into an array. For example,
#Create an array called array1 from the
#given list.
>>> array1 = np.array([10,20,30])

#Display the contents of the array


>>> array1
array([10, 20, 30])

• Creating a 1-D Array


An array with only single row of elements is called
1-D array. Let us try to create a 1-D array from
a list which contains numbers as well as strings.
>>> array2 = np.array([5,-7.4,'a',7.2])
>>> array2
array(['5', '-7.4', 'a', '7.2'],
      dtype='<U32')
Observe that since there is a string value in the
list, all integer and float values have been promoted to
string, while converting the list to array.
Note: <U32 means a Unicode string data type in which each
element can hold up to 32 characters.

A common mistake occurs while passing argument to
array() if we forget to put square brackets. Make sure
only a single argument containing list of values is passed.
#incorrect way
>>> a = np.array(1,2,3,4)
#correct way
>>> a = np.array([1,2,3,4])

• Creating a 2-D Array
We can create a two dimensional (2-D) array by
passing nested lists to the array() function.
Example 6.1
>>> array3 = np.array([[2.4,3],
                       [4.91,7],[0,-1]])
>>> array3
array([[ 2.4 ,  3.  ],
       [ 4.91,  7.  ],
       [ 0.  , -1.  ]])
Observe that the integers 3, 7, 0 and -1 have been
promoted to floats.
6.3.3 Attributes of NumPy Array
Some important attributes of a NumPy ndarray object are:
i) ndarray.ndim: gives the number of dimensions
of the array as an integer value. Arrays can be
1-D, 2-D or n-D. In this chapter, we shall focus
on 1-D and 2-D arrays only. NumPy calls the
dimensions as axes (plural of axis). Thus, a 2-D
array has two axes. The row-axis is called axis-0
and the column-axis is called axis-1. The number
of axes is also called the array’s rank.

Note: A list is called nested list when each element is a
list itself.
Example 6.2
>>> array1.ndim
1
>>> array3.ndim
2
ii) ndarray.shape: It gives the sequence of integers
indicating the size of the array for each dimension.
Example 6.3
# array1 is 1D-array, there is nothing
# after , in sequence
>>> array1.shape
(3,)
>>> array2.shape
(4,)
>>> array3.shape
(3, 2)


The output (3, 2) means array3 has 3 rows and 2
columns.
iii) ndarray.size: It gives the total number of
elements of the array. This is equal to the product
of the elements of shape.
Example 6.4
>>> array1.size
3
>>> array3.size
6
iv) ndarray.dtype: is the data type of the elements
of the array. All the elements of an array are of
same data type. Common data types are int32,
int64, float32, float64, U32, etc.
Example 6.5
>>> array1.dtype
dtype('int32')
>>> array2.dtype
dtype('<U32')
>>> array3.dtype
dtype('float64')
v) ndarray.itemsize: It specifies the size in bytes
of each element of the array. Data type int32 and
float32 means each element of the array occupies
32 bits in memory. 8 bits form a byte. Thus, an
array of elements of type int32 has itemsize 32/8=4
bytes. Likewise, int64/float64 means each item
has itemsize 64/8=8 bytes.
Example 6.6
>>> array1.itemsize
4 # memory allocated to integer
>>> array2.itemsize
128 # memory allocated to string
>>> array3.itemsize
8 #memory allocated to float type
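The attributes described above are related to each other, and a short sketch can make the relationships visible. It assumes NumPy is installed; the names a and b are our own. The default integer type (int32 or int64) depends on the platform and NumPy version, so the sketch states the type explicitly with dtype to get the same result everywhere.

```python
import numpy as np

a = np.array([[2.4, 3], [4.91, 7], [0, -1]])   # 3 rows, 2 columns of floats

print(a.ndim)        # 2
print(a.shape)       # (3, 2)
print(a.size)        # 6  (3 * 2, the product of the shape)
print(a.dtype)       # float64
print(a.itemsize)    # 8  (64 bits / 8 bits per byte)
print(a.nbytes)      # 48 (size * itemsize)

# Explicit dtype avoids the platform-dependent default integer type
b = np.array([10, 20, 30], dtype=np.int32)
print(b.dtype, b.itemsize)   # int32 4
```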

6.3.4 Other Ways of Creating NumPy Arrays


1. We can specify data type (integer, float, etc.) while
creating array using dtype as an argument to
array(). This will convert the data automatically
to the mentioned type. In the following example,
nested list of integers are passed to the array
function. Since data type has been declared
as float, the integers are converted to floating
point numbers.


>>> array4 = np.array( [ [1,2], [3,4] ],
                       dtype=float)
>>> array4
array([[1., 2.],
[3., 4.]])
2. We can create an array with all elements initialised
to 0 using the function zeros(). By default, the
data type of the array created by zeros() is float.
The following code will create an array with 3 rows
and 4 columns with each element set to 0.
>>> array5 = np.zeros((3,4))
>>> array5
array([[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.]])
3. We can create an array with all elements initialised
to 1 using the function ones(). By default, the
data type of the array created by ones() is float.
The following code will create an array with 3 rows
and 2 columns.
>>> array6 = np.ones((3,2))
>>> array6
array([[1., 1.],
[1., 1.],
[1., 1.]])
4. We can create an array with numbers in a given
range and sequence using the arange() function.
This function is analogous to the range() function
of Python.
>>> array7 = np.arange(6)
# an array of 6 elements is created with
# start value 0 and step size 1
>>> array7
array([0, 1, 2, 3, 4, 5])
# Creating an array with start value -2, end
# value 24 and step size 4
>>> array8 = np.arange( -2, 24, 4 )
>>> array8
array([-2, 2, 6, 10, 14, 18, 22])

6.4 Indexing and Slicing


NumPy arrays can be indexed, sliced and iterated over.

Think and Reflect
When we may require to create an array initialised to
zeros or ones?

6.4.1 Indexing
We have learnt about indexing single-dimensional
array in section 6.2. For 2-D arrays indexing for both
dimensions starts from 0, and each element is referenced
through two indexes i and j, where i represents the row
number and j represents the column number.


Table 6.1 Marks of students in different subjects

Name      Maths   English   Science
Ramesh    78      67        56
Vedika    76      75        47
Harun     84      59        60
Prasad    67      72        54

Consider Table 6.1 showing marks obtained by
students in three different subjects. Let us create an
array called marks to store marks given in three subjects
for four students given in this table. As there are 4
students (i.e. 4 rows) and 3 subjects (i.e. 3 columns),
the array marks will have the shape (4, 3). This array can
store 4*3 = 12 elements.
Here, marks[i,j] refers to the element at (i+1)th row
and (j+1)th column because the index values start at 0.
Thus marks[3,1] is the element in 4th row and second
column which is 72 (marks of Prasad in English).
# accesses the element in the 1st row in
# the 3rd column
>>> marks[0,2]
56
>>> marks[0,4]
IndexError: index 4 is out of bounds for axis 1 with size 3
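The statements above use the array marks without showing how it is built. One possible way, assuming NumPy is installed, is to pass the rows of Table 6.1 as a nested list; the comments naming the students are only for readability.

```python
import numpy as np

# One row per student, columns are Maths, English, Science
marks = np.array([[78, 67, 56],    # Ramesh
                  [76, 75, 47],    # Vedika
                  [84, 59, 60],    # Harun
                  [67, 72, 54]])   # Prasad

print(marks.shape)    # (4, 3)
print(marks[3, 1])    # 72 (Prasad, English)
print(marks[0, 2])    # 56 (Ramesh, Science)

# A column index of 4 does not exist, so NumPy raises IndexError
try:
    marks[0, 4]
    index_error_raised = False
except IndexError as err:
    index_error_raised = True
    print("IndexError:", err)
```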

6.4.2 Slicing
Sometimes we need to extract part of an array. This is
done through slicing. We can define which part of the
array to be sliced by specifying the start and end index
values using [start : end] along with the array name.
Example 6.7
>>> array8
array([-2, 2, 6, 10, 14, 18, 22])

# excludes the value at the end index


>>> array8[3:5]
array([10, 14])

# reverse the array


>>> array8[ : : -1]
array([22, 18, 14, 10, 6, 2, -2])


Now let us see how slicing is done for 2-D arrays.
For this, let us create a 2-D array called array9 having
3 rows and 4 columns.
>>> array9 = np.array([[ -7, 0, 10, 20],
[ -5, 1, 40, 200],
[ -1, 1, 4, 30]])

# access all the elements in the 3rd column


>>> array9[0:3,2]
array([10, 40, 4])

Note that we are specifying rows in the range 0:3
because the end value of the range is excluded.
# access elements of 2nd and 3rd row from 1st
# and 2nd column
>>> array9[1:3,0:2]
array([[-5, 1],
[-1, 1]])
If row indices are not specified, it means all the rows
are to be considered. Likewise, if column indices are
not specified, all the columns are to be considered.
Thus, the statement to access all the elements in the 3rd
column can also be written as:
>>>array9[:,2]
array([10, 40, 4])

6.5 Operations on Arrays


Once arrays are declared, we can access their elements
or perform certain operations on them. In the last section, we
learnt about accessing elements. This section describes
multiple operations that can be applied on arrays.
6.5.1 Arithmetic Operations
Arithmetic operations on NumPy arrays are fast and
simple. When we perform a basic arithmetic operation
like addition, subtraction, multiplication, division etc. on
two arrays, the operation is done on each corresponding
pair of elements. For instance, adding two arrays will
result in the first element in the first array to be added
to the first element in the second array, and so on.
Consider the following element-wise operations on two
arrays:
>>> array1 = np.array([[3,6],[4,2]])
>>> array2 = np.array([[10,20],[15,12]])


#Element-wise addition of two matrices.
>>> array1 + array2
array([[13, 26],
[19, 14]])

#Subtraction
>>> array1 - array2
array([[ -7, -14],
[-11, -10]])

#Multiplication
>>> array1 * array2
array([[ 30, 120],
[ 60, 24]])

#Matrix Multiplication
>>> array1 @ array2
array([[120, 132],
[ 70, 104]])

#Exponentiation
>>> array1 ** 3
array([[ 27, 216],
[ 64, 8]], dtype=int32)

#Division
>>> array2 / array1
array([[3.33333333, 3.33333333],
[3.75 , 6. ]])

#Element wise Remainder of Division (Modulo)
>>> array2 % array1
array([[1, 2],
[3, 0]], dtype=int32)
It is important to note that for element-wise
operations, size of both arrays must be same. That is,
array1.shape must be equal to array2.shape.
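What happens when the shapes differ can be checked with a small experiment. The sketch below assumes NumPy is installed; the arrays a and b are made up for illustration. As an aside beyond the rule stated above, NumPy also accepts some combinations of unequal shapes through a feature called broadcasting, the simplest case being an array combined with a single number.

```python
import numpy as np

a = np.array([[3, 6], [4, 2]])             # shape (2, 2)
b = np.array([[1, 2, 3], [4, 5, 6]])       # shape (2, 3)

# Shapes (2, 2) and (2, 3) do not match, so the addition fails
try:
    a + b
    mismatch_error = None
except ValueError as err:
    mismatch_error = err
    print("ValueError:", err)

# A single number is combined with every element of the array
plus_ten = a + 10
print(plus_ten.tolist())    # [[13, 16], [14, 12]]
```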
6.5.2 Transpose
Transposing an array turns its rows into columns and
columns into rows just like matrices in mathematics.
#Transpose
>>> array3 = np.array([[10,-7,0, 20],
[-5,1,200,40],[30,1,-1,4]])
>>> array3
array([[ 10, -7, 0, 20],
[ -5, 1, 200, 40],
[ 30, 1, -1, 4]])


# the original array does not change
>>> array3.transpose()
array([[ 10, -5, 30],
[ -7, 1, 1],
[ 0, 200, -1],
[ 20, 40, 4]])
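A few checks make the effect of transposing concrete. This sketch assumes NumPy is installed and reuses the values of array3 from above; the name transposed is our own. The attribute T is a shorthand that gives the same result as transpose().

```python
import numpy as np

array3 = np.array([[10, -7, 0, 20],
                   [-5, 1, 200, 40],
                   [30, 1, -1, 4]])

transposed = array3.transpose()

print(array3.shape)         # (3, 4)
print(transposed.shape)     # (4, 3)

# The element at row 1, column 2 moves to row 2, column 1
print(array3[1, 2], transposed[2, 1])          # 200 200

# array3.T is a shorthand for array3.transpose()
print(bool((array3.T == transposed).all()))    # True
```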

6.5.3 Sorting
Sorting is to arrange the elements of an array in
order, either ascending or descending. By
default, numpy does sorting in ascending order.
>>> array4 = np.array([1,0,2,-3,6,8,4,7])
>>> array4.sort()
>>> array4
array([-3, 0, 1, 2, 4, 6, 7, 8])
In 2-D array, sorting can be done along either of the
axes i.e., row-wise or column-wise. By default, sorting
is done row-wise (i.e., on axis = 1). It means to arrange
elements in each row in ascending order. When axis=0,
sorting is done column-wise, which means each column
is sorted in ascending order.
>>> array4 = np.array([[10,-7,0, 20],
[-5,1,200,40],[30,1,-1,4]])
>>> array4
array([[ 10, -7, 0, 20],
[ -5, 1, 200, 40],
[ 30, 1, -1, 4]])

#default is row-wise sorting


>>> array4.sort()
>>> array4
array([[ -7, 0, 10, 20],
[ -5, 1, 40, 200],
[ -1, 1, 4, 30]])
>>> array5 = np.array([[10,-7,0, 20],
[-5,1,200,40],[30,1,-1,4]])

#axis =0 means column-wise sorting


>>> array5.sort(axis=0)
>>> array5
array([[ -5, -7, -1, 4],
[ 10, 1, 0, 20],
[ 30, 1, 200, 40]])
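The sort() method used above changes the array itself. When the original order must be kept, the function np.sort() returns a sorted copy instead. The sketch below assumes NumPy is installed; obtaining descending order by reversing the sorted copy with a slice is one simple approach, not the only one.

```python
import numpy as np

array4 = np.array([1, 0, 2, -3, 6, 8, 4, 7])

ascending = np.sort(array4)       # sorted copy; array4 is left unchanged
descending = ascending[::-1]      # reverse the sorted copy

print(array4.tolist())            # [1, 0, 2, -3, 6, 8, 4, 7]
print(ascending.tolist())         # [-3, 0, 1, 2, 4, 6, 7, 8]
print(descending.tolist())        # [8, 7, 6, 4, 2, 1, 0, -3]
```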

6.6 Concatenating Arrays


Concatenation means joining two or more arrays.
Concatenating 1-D arrays means appending the
sequences one after another. The numpy.concatenate()
function can be used to concatenate two or more
2-D arrays either row-wise or column-wise. All the
dimensions of the arrays to be concatenated must match
exactly except for the dimension or axis along which
they need to be joined. Any mismatch in the dimensions
results in an error. By default, the concatenation of the
arrays happens along axis=0.
Example 6.8
>>> array1 = np.array([[10, 20], [-30,40]])
>>> array2 = np.zeros((2, 3), dtype=array1.dtype)

>>> array1
array([[ 10, 20],
[-30, 40]])

>>> array2
array([[0, 0, 0],
[0, 0, 0]])

>>> array1.shape
(2, 2)
>>> array2.shape
(2, 3)

>>> np.concatenate((array1,array2), axis=1)


array([[ 10, 20, 0, 0, 0],
[-30, 40, 0, 0, 0]])

>>> np.concatenate((array1,array2), axis=0)


Traceback (most recent call last):
File "<pyshell#3>", line 1, in <module>
np.concatenate((array1,array2), axis=0)
ValueError: all the input array dimensions
except for the concatenation axis must
match exactly
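For comparison with the failing call above, the sketch below joins two arrays along the default axis=0. This works because both arrays have the same number of columns. The array array_extra is hypothetical and is introduced only for this illustration.

```python
import numpy as np

array1 = np.array([[10, 20], [-30, 40]])

# array_extra is a hypothetical 1 x 2 array; it has the same
# number of columns as array1, so joining along axis=0 is allowed
array_extra = np.array([[5, 6]])

# axis=0 is the default, so the rows of array_extra are
# appended below the rows of array1
stacked = np.concatenate((array1, array_extra))

print(stacked.shape)
print(stacked)
```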

6.7 Reshaping Arrays


We can modify the shape of an array using the reshape()
function. Reshaping an array cannot be used to change
the total number of elements in the array. Attempting
to change the number of elements in the array using
reshape() results in an error.
Example 6.9
>>> array3 = np.arange(10,22)
>>> array3
array([10, 11, 12, 13, 14, 15, 16, 17, 18,
19, 20, 21])

>>> array3.reshape(3,4)
array([[10, 11, 12, 13],
[14, 15, 16, 17],
[18, 19, 20, 21]])

>>> array3.reshape(2,6)
array([[10, 11, 12, 13, 14, 15],
[16, 17, 18, 19, 20, 21]])
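The rule that reshape() cannot change the number of elements can be seen in a short sketch. The 12 elements of array3 fit a 4 x 3 shape, but not a 5 x 3 shape, which would need 15 elements. The variable names reshaped and reshape_failed are chosen only for this illustration.

```python
import numpy as np

array3 = np.arange(10, 22)        # 12 elements

# 4 x 3 = 12 elements, so this works
reshaped = array3.reshape(4, 3)

# 5 x 3 = 15 elements, so NumPy raises ValueError
try:
    array3.reshape(5, 3)
    reshape_failed = False
except ValueError:
    reshape_failed = True

print(reshaped.shape, reshape_failed)
```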

6.8 Splitting Arrays


We can split an array into two or more subarrays.
numpy.split() splits an array along the specified axis.
As a parameter to numpy.split(), we can specify either
a sequence of index values where the array is to be
split, or an integer N that indicates the number of
equal parts into which the array is to be split. By
default, numpy.split() splits along axis = 0.
Consider the array given below:
>>> array4
array([[ 10, -7, 0, 20],
[ -5, 1, 200, 40],
[ 30, 1, -1, 4],
[ 1, 2, 0, 4],
[ 0, 1, 0, 2]])

# [1,3] indicate the row indices on which
# to split the array
>>> first, second, third = np.split(array4,
[1, 3])

# array4 is split on the first row and
# stored on the sub-array first
>>> first
array([[10, -7, 0, 20]])

# array4 is split after the first row and
# upto the third row and stored on the
# sub-array second
>>> second
array([[ -5, 1, 200, 40],
[ 30, 1, -1, 4]])

# the remaining rows of array4 are stored
# on the sub-array third
>>> third
array([[1, 2, 0, 4],
[0, 1, 0, 2]])

#[1, 2], axis=1 give the column indices
#along which to split
>>> firstc, secondc, thirdc = np.split(array4,
[1, 2], axis=1)
>>> firstc
array([[10],
[-5],
[30],
[ 1],
[ 0]])

>>> secondc
array([[-7],
[ 1],
[ 1],
[ 2],
[ 1]])

>>> thirdc
array([[ 0, 20],
[200, 40],
[ -1, 4],
[ 0, 4],
[ 0, 2]])

# 2nd parameter 2 implies array is to be
# split in 2 equal parts; axis=1 means along
# the column axis
>>> firsthalf, secondhalf = np.split(array4, 2,
axis=1)
>>> firsthalf
array([[10, -7],
[-5, 1],
[30, 1],
[ 1, 2],
[ 0, 1]])

>>> secondhalf
array([[ 0, 20],
[200, 40],
[ -1, 4],
[ 0, 4],
[ 0, 2]])
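When an integer is given, the array must divide into that many equal parts along the chosen axis. The sketch below, offered as an illustration under that assumption, shows that the 5 rows of array4 cannot be split into 2 equal parts, while splitting at the row indices [1, 3] works.

```python
import numpy as np

array4 = np.array([[10, -7, 0, 20],
                   [-5, 1, 200, 40],
                   [30, 1, -1, 4],
                   [1, 2, 0, 4],
                   [0, 1, 0, 2]])

# Splitting at row indices 1 and 3 gives parts of 1, 2 and 2 rows
parts = np.split(array4, [1, 3])
part_shapes = [p.shape for p in parts]

# 5 rows cannot be divided into 2 equal parts,
# so np.split() raises ValueError
try:
    np.split(array4, 2)
    equal_split_failed = False
except ValueError:
    equal_split_failed = True

print(part_shapes, equal_split_failed)
```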

6.9 Statistical Operations on Arrays


NumPy provides functions to perform many useful
statistical operations on arrays. In this section, we will
apply the basic statistical techniques called descriptive
statistics that we have learnt in chapter 5.

Let us consider two arrays:
>>> arrayA = np.array([1,0,2,-3,6,8,4,7])
>>> arrayB = np.array([[3,6],[4,2]])
1. The max() function finds the maximum element
from an array.
# max element from the whole 1-D array
>>> arrayA.max()
8
# max element from the whole 2-D array
>>> arrayB.max()
6
# if axis=1, it gives the maximum of each row
>>> arrayB.max(axis=1)
array([6, 4])
# if axis=0, it gives the maximum of each column
>>> arrayB.max(axis=0)
array([4, 6])
2. The min() function finds the minimum element
from an array.
>>> arrayA.min()
-3
>>> arrayB.min()
2
>>> arrayB.min(axis=0)
array([3, 2])
3. The sum() function finds the sum of all elements
of an array.
>>> arrayA.sum()
25
>>> arrayB.sum()
15
#axis is used to specify the dimension
#along which the sum is to be made. Here axis = 1
#means the elements of each row are added up
>>> arrayB.sum(axis=1)
array([9, 6])
4. The mean() function finds the average of elements
of the array.
>>> arrayA.mean()
3.125
>>> arrayB.mean()
3.75
>>> arrayB.mean(axis=0)
array([3.5, 4. ])
>>> arrayB.mean(axis=1)
array([4.5, 3. ])
5. The std() function is used to find standard
deviation of an array of elements.
>>> arrayA.std()
3.550968177835448

>>> arrayB.std()
1.479019945774904

>>> arrayB.std(axis=0)
array([0.5, 2. ])

>>> arrayB.std(axis=1)
array([1.5, 1. ])
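The value returned by std() can be cross-checked by hand. By default, std() computes the population standard deviation, that is, the square root of the mean of the squared deviations from the mean. The sketch below repeats that calculation for arrayA; the names mean_a, manual_std and matches are chosen only for this illustration.

```python
import numpy as np

arrayA = np.array([1, 0, 2, -3, 6, 8, 4, 7])

# Step 1: mean of the elements
mean_a = arrayA.mean()

# Step 2: square root of the mean of squared deviations
manual_std = np.sqrt(((arrayA - mean_a) ** 2).mean())

# The manual result agrees with arrayA.std()
matches = bool(np.isclose(manual_std, arrayA.std()))

print(mean_a, manual_std, matches)
```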

6.10 Loading Arrays from Files


Sometimes, we may have data in files and we may need
to load that data in an array for processing.
numpy.loadtxt() and numpy.genfromtxt() are the two
functions that can be used to load data from text files.
The most commonly used file type to handle large amount
of data is called CSV (Comma Separated Values).
Each row in the text file must have the same number
of values in order to load data from a text file into a
numpy array. Let us say we have the following data in a
text file named data.txt stored in the folder C:/NCERT.

RollNo Marks1 Marks2 Marks3
1, 36, 18, 57
2, 22, 23, 45
3, 43, 51, 37
4, 41, 40, 60
5, 13, 18, 27
We can load the data from the data.txt file into an
array say, studentdata in the following manner:
6.10.1 Using numpy.loadtxt()
>>> studentdata = np.loadtxt('C:/NCERT/data.txt',
skiprows=1, delimiter=',', dtype = int)

>>> studentdata
array([[ 1, 36, 18, 57],
[ 2, 22, 23, 45],
[ 3, 43, 51, 37],
[ 4, 41, 40, 60],
[ 5, 13, 18, 27]])
In the above statement, first we specify the name
and path of the text file containing the data. Let us
understand some of the parameters that we pass in the
np.loadtxt() function:
• The parameter skiprows=1 indicates that the
first row is the header row and therefore we
need to skip it as we do not want to load it in
the array.
• The delimiter specifies the character that
separates the values, such as a comma, semicolon,
tab or space. By default, values are separated
on any whitespace (spaces or tabs).
• We can also specify the data type of the array
to be created through the dtype argument. By
default, dtype is float.
We can load each row or column of the data file into
different numpy arrays using the unpack parameter.
By default, unpack=False, which means we can extract
each row of data as a separate array. When unpack=True,
the returned array is transposed, which means we can
extract the columns as separate arrays.
# To import data into multiple NumPy arrays
# row wise. Values related to student1 in
# array stud1, student2 in array stud2 etc.
>>> stud1, stud2, stud3, stud4, stud5 = np.loadtxt(
'C:/NCERT/data.txt', skiprows=1,
delimiter=',', dtype = int)
>>> stud1
array([ 1, 36, 18, 57])
>>> stud2
array([ 2, 22, 23, 45]) # and so on

# Import data into multiple arrays column
# wise. Data in column RollNo will be put
# in array rollno, data in column Marks1
# will be put in array mks1 and so on.
>>> rollno, mks1, mks2, mks3 = np.loadtxt(
'C:/NCERT/data.txt', skiprows=1,
delimiter=',', unpack=True, dtype = int)
>>> rollno
array([1, 2, 3, 4, 5])

>>> mks1
array([36, 22, 43, 41, 13])

>>> mks2
array([18, 23, 51, 40, 18])

>>> mks3
array([57, 45, 37, 60, 27])

Note: .CSV files or comma separated values files are a
type of text files that have values separated by commas.
A CSV file stores tabular data in a text file. CSV files
can be loaded in NumPy arrays and their data can be
analyzed using these functions.
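The loadtxt() calls above can be tried without creating a file on disk. The sketch below is only an illustration: it assumes a shortened, three-row version of data.txt held in memory through io.StringIO, which loadtxt() accepts in place of a file name.

```python
import io
import numpy as np

# A shortened copy of data.txt kept in memory (assumed for
# illustration; the real file is read from C:/NCERT/data.txt)
text = ("RollNo,Marks1,Marks2,Marks3\n"
        "1,36,18,57\n"
        "2,22,23,45\n"
        "3,43,51,37\n")

# Load everything into one 2-D array, skipping the header row
studentdata = np.loadtxt(io.StringIO(text), skiprows=1,
                         delimiter=',', dtype=int)

# unpack=True transposes the result, giving one array per column
rollno, mks1, mks2, mks3 = np.loadtxt(io.StringIO(text), skiprows=1,
                                      delimiter=',', unpack=True,
                                      dtype=int)

print(studentdata.shape)
print(rollno, mks3)
```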

Activity 6.1
Can you write the command to load the data.txt
including the header row as well?

6.10.2 Using numpy.genfromtxt()
genfromtxt() is another function in NumPy to load data
from files. As compared to loadtxt(), genfromtxt()
can also handle missing values in the data file. Let us
look at the following file dataMissing.txt with some
missing values and some non-numeric data:
RollNo Marks1 Marks2 Marks3
1, 36, 18, 57
2, ab, 23, 45
3, 43, 51,
4, 41, 40, 60
5, 13, 18, 27

>>> dataarray = np.genfromtxt(
'C:/NCERT/dataMissing.txt', skip_header=1,
delimiter = ',')

>>> dataarray
array([[ 1., 36., 18., 57.],
[ 2., nan, 23., 45.],
[ 3., 43., 51., nan],
[ 4., 41., 40., 60.],
[ 5., 13., 18., 27.]])
The genfromtxt() function converts missing values
and character strings in numeric columns to nan. But if
we specify dtype as int, it converts the missing or other
non numeric values to -1. We can also convert these
missing values and character strings in the data files
to some specific value using the parameter
filling_values.
Example 6.10 Let us set the value of the missing or non
numeric data to -999:
>>> dataarray = np.genfromtxt(
'C:/NCERT/dataMissing.txt', skip_header=1,
delimiter=',', filling_values=-999,
dtype = int)

>>> dataarray
array([[ 1, 36, 18, 57],
[ 2, -999, 23, 45],
[ 3, 43, 51, -999],
[ 4, 41, 40, 60],
[ 5, 13, 18, 27]])

Activity 6.2
Can you create a datafile and import data into multiple
NumPy arrays column wise? (Hint: use unpack parameter)
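Instead of replacing missing values straight away, it may be useful to first find where they are. The sketch below is one possible approach, not part of the example above: it assumes a shortened copy of dataMissing.txt held in memory and uses np.isnan() to locate the nan entries produced by genfromtxt().

```python
import io
import numpy as np

# A shortened copy of dataMissing.txt kept in memory
# (assumed for illustration only)
text = ("RollNo,Marks1,Marks2,Marks3\n"
        "1,36,18,57\n"
        "2,ab,23,45\n"
        "3,43,51,\n"
        "4,41,40,60\n")

dataarray = np.genfromtxt(io.StringIO(text), skip_header=1,
                          delimiter=',')

# True wherever a value was missing or non-numeric
missing_mask = np.isnan(dataarray)

# Total number of missing values and the rows containing them
missing_count = int(missing_mask.sum())
rows_with_missing = np.where(missing_mask.any(axis=1))[0]

print(missing_count, rows_with_missing)
```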

6.11 Saving NumPy Arrays in Files on Disk
The savetxt() function is used to save a NumPy array
to a text file.
Example 6.11
>>> np.savetxt('C:/NCERT/testout.txt',
studentdata, delimiter=',', fmt='%i')
Note: We have used parameter fmt to specify the format in
which data are to be saved. The default format is '%.18e',
which writes each value as a floating point number in
scientific notation.
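A saved file can be read back to confirm what was written. The sketch below is an illustration under stated assumptions: it uses a two-row version of studentdata and writes to a temporary folder created with Python's tempfile module, instead of C:/NCERT, so that it can run on any computer.

```python
import os
import tempfile
import numpy as np

# A two-row version of studentdata (assumed for illustration)
studentdata = np.array([[1, 36, 18, 57],
                        [2, 22, 23, 45]])

# Write to a temporary folder that is removed afterwards
with tempfile.TemporaryDirectory() as folder:
    out_path = os.path.join(folder, 'testout.txt')
    np.savetxt(out_path, studentdata, delimiter=',', fmt='%i')

    # Read the raw text to see the effect of fmt='%i'
    with open(out_path) as f:
        saved_lines = f.read().splitlines()

    # Load the file again into an array
    reloaded = np.loadtxt(out_path, delimiter=',', dtype=int)

print(saved_lines)
print(reloaded)
```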

Summary
• Array is a data type that holds objects of same
datatype (numeric, textual, etc.). The elements of
an array are stored contiguously in memory. Each
element of an array has an index or position value.
• NumPy is a Python library for scientific computing
which stores data in a powerful n-dimensional
ndarray object for faster calculations.
• Each element of an array is referenced by the array
name along with the index of that element.
• numpy.array() is a function that returns an object
of type numpy.ndarray.
• All arithmetic operations can be performed on
arrays when shape of the two arrays is same.
• NumPy arrays are not expandable or extendable.
Once a numpy array is defined, the space it occupies
in memory is fixed and cannot be changed.
• numpy.split() slices apart an array into multiple
sub-arrays along an axis.
• numpy.concatenate() function can be used to
concatenate arrays.
• numpy.loadtxt() and numpy.genfromtxt() are
functions used to load data from files. The savetxt()
function is used to save a NumPy array to a
text file.

Exercise
1. What is NumPy? How can it be installed?
2. What is an array and how is it different from a list? What
is the name of the built-in array class in NumPy ?
3. What do you understand by rank of an ndarray?
4. Create the following NumPy arrays:
a) A 1-D array called zeros having 10 elements and
all the elements are set to zero.
b) A 1-D array called vowels having the elements ‘a’,
‘e’, ‘i’, ‘o’ and ‘u’.
c) A 2-D array called ones having 2 rows and 5
columns and all the elements are set to 1 and
dtype as int.
d) Use nested Python lists to create a 2-D array called
myarray1 having 3 rows and 3 columns and store
the following data:
2.7, -2, -19
0, 3.4, 99.9
10.6, 0, 13
e) A 2-D array called myarray2 using arange()
having 3 rows and 5 columns with start value = 4,
step size 4 and dtype as float.
5. Using the arrays created in Question 4 above, write
NumPy commands for the following:
a) Find the dimensions, shape, size, data type of the
items and itemsize of arrays zeros, vowels,
ones, myarray1 and myarray2.
b) Reshape the array ones to have all the 10 elements
in a single row.
c) Display the 2nd and 3rd element of the array vowels.
d) Display all elements in the 2nd and 3rd row of the
array myarray1.
e) Display the elements in the 1st and 2nd column of
the array myarray1.
f) Display the elements in the 1st column of the 2nd
and 3rd row of the array myarray1.
g) Reverse the array of vowels.
6. Using the arrays created in Question 4 above, write
NumPy commands for the following:

a) Divide all elements of array ones by 3.
b) Add the arrays myarray1 and myarray2.
c) Subtract myarray1 from myarray2 and store the
result in a new array.
d) Multiply myarray1 and myarray2 elementwise.
e) Do the matrix multiplication of myarray1 and
myarray2 and store the result in a new array
myarray3.
f) Divide myarray1 by myarray2.
g) Find the cube of all elements of myarray1 and
divide the resulting array by 2.
h) Find the square root of all elements of myarray2
and divide the resulting array by 2. The result
should be rounded to two places of decimals.
7. Using the arrays created in Question 4 above, write
NumPy commands for the following:
a) Find the transpose of ones and myarray2.
b) Sort the array vowels in reverse.
c) Sort the array myarray1 such that it brings the
lowest value of the column in the first row and so
on.
8. Using the arrays created in Question 4 above, write
NumPy commands for the following:
a) Use numpy.split() to split the array myarray2
into 5 arrays columnwise. Store your resulting
arrays in myarray2A, myarray2B, myarray2C,
myarray2D and myarray2E. Print the arrays
myarray2A, myarray2B, myarray2C, myarray2D
and myarray2E.
b) Split the array zeros at array index 2, 5, 7, 8 and
store the resulting arrays in zerosA, zerosB,
zerosC and zerosD and print them.
c) Concatenate the arrays myarray2A, myarray2B
and myarray2C into an array having 3 rows and 3
columns.
9. Create a 2-D array called myarray4 using arange()
having 14 rows and 3 columns with start value = -1
and step size 0.25. Split this array row wise into 3
equal parts and print the result.
10. Using the myarray4 created in the above questions,
write commands for the following:
a) Find the sum of all elements.
b) Find the sum of all elements row wise.

c) Find the sum of all elements column wise.
d) Find the max of all elements.
e) Find the min of all elements in each row.
f) Find the mean of all elements in each row.
g) Find the standard deviation column wise.

Case Study (Solved)


We have already learnt that a data set (or dataset) is a
collection of data. Usually a data set corresponds to the
contents of a database table, or a statistical data matrix,
where every column of the table represents a particular
variable, and each row corresponds to a member or an item
etc. A data set lists values for each of the variables, such as
height and weight of a student, for each row (item) of the data
set. Open data refers to information released in a publicly
accessible repository.
The Iris flower data set is an example of open data.
It is also called Fisher's Iris data set as this data set was
introduced by the British statistician and biologist Ronald
Fisher in 1936. The Iris data set consists of 50 samples from
each of the three species of the flower Iris (Iris setosa, Iris
virginica and Iris versicolor). Four features were measured
for each sample: the length and the width of the sepals and
petals, in centimeters. Based on the combination of these
four features, Fisher developed a model to distinguish the
species from one another. The full data set is freely available
on UCI Machine Learning Repository at
https://archive.ics.uci.edu/ml/datasets/iris.
We shall use the following smaller section of this data set
having 30 rows (10 rows for each of the three species). We
shall include a column for species number that has a value
1 for Iris setosa, 2 for Iris versicolor and 3 for Iris virginica.

Sepal Length  Sepal Width  Petal Length  Petal Width  Iris             Species No
5.1           3.5          1.4           0.2          Iris-setosa      1
4.9           3            1.4           0.2          Iris-setosa      1
4.7           3.2          1.3           0.2          Iris-setosa      1
4.6           3.1          1.5           0.2          Iris-setosa      1
5             3.6          1.4           0.2          Iris-setosa      1
5.4           3.9          1.7           0.4          Iris-setosa      1
4.6           3.4          1.4           0.3          Iris-setosa      1
5             3.4          1.5           0.2          Iris-setosa      1
4.4           2.9          1.4           0.2          Iris-setosa      1
4.9           3.1          1.5           0.1          Iris-setosa      1
5.5           2.6          4.4           1.2          Iris-versicolor  2
6.1           3            4.6           1.4          Iris-versicolor  2
5.8           2.6          4             1.2          Iris-versicolor  2
5             2.3          3.3           1            Iris-versicolor  2
5.6           2.7          4.2           1.3          Iris-versicolor  2
5.7           3            4.2           1.2          Iris-versicolor  2
5.7           2.9          4.2           1.3          Iris-versicolor  2
6.2           2.9          4.3           1.3          Iris-versicolor  2
5.1           2.5          3             1.1          Iris-versicolor  2
5.7           2.8          4.1           1.3          Iris-versicolor  2
6.9           3.1          5.4           2.1          Iris-virginica   3
6.7           3.1          5.6           2.4          Iris-virginica   3
6.9           3.1          5.1           2.3          Iris-virginica   3
5.8           2.7          5.1           1.9          Iris-virginica   3
6.8           3.2          5.9           2.3          Iris-virginica   3
6.7           3.3          5.7           2.5          Iris-virginica   3
6.7           3            5.2           2.3          Iris-virginica   3
6.3           2.5          5             1.9          Iris-virginica   3
6.5           3            5.2           2            Iris-virginica   3
6.2           3.4          5.4           2.3          Iris-virginica   3

You may type this using any text editor (Notepad, gEdit
or any other) as shown below and store the file with the
name Iris.txt. (In case you wish to work with the entire
dataset, you could download a .csv file for the same from
the Internet and save it as Iris.txt.) The headers are:
sepal length, sepal width, petal length, petal width, iris, Species No
5.1, 3.5, 1.4, 0.2, Iris-setosa, 1
4.9, 3, 1.4, 0.2, Iris-setosa, 1
4.7, 3.2, 1.3, 0.2, Iris-setosa, 1
4.6, 3.1, 1.5, 0.2, Iris-setosa, 1
5, 3.6, 1.4, 0.2, Iris-setosa, 1
5.4, 3.9, 1.7, 0.4, Iris-setosa, 1
4.6, 3.4, 1.4, 0.3, Iris-setosa, 1
5, 3.4, 1.5, 0.2, Iris-setosa, 1
4.4, 2.9, 1.4, 0.2, Iris-setosa, 1
4.9, 3.1, 1.5, 0.1, Iris-setosa, 1
5.5, 2.6, 4.4, 1.2, Iris-versicolor, 2
6.1, 3, 4.6, 1.4, Iris-versicolor, 2
5.8, 2.6, 4, 1.2, Iris-versicolor, 2
5, 2.3, 3.3, 1, Iris-versicolor, 2
5.6, 2.7, 4.2, 1.3, Iris-versicolor, 2
5.7, 3, 4.2, 1.2, Iris-versicolor, 2
5.7, 2.9, 4.2, 1.3, Iris-versicolor, 2
6.2, 2.9, 4.3, 1.3, Iris-versicolor, 2
5.1, 2.5, 3, 1.1, Iris-versicolor, 2
5.7, 2.8, 4.1, 1.3, Iris-versicolor, 2
6.9, 3.1, 5.4, 2.1, Iris-virginica, 3
6.7, 3.1, 5.6, 2.4, Iris-virginica, 3
6.9, 3.1, 5.1, 2.3, Iris-virginica, 3
5.8, 2.7, 5.1, 1.9, Iris-virginica, 3
6.8, 3.2, 5.9, 2.3, Iris-virginica, 3
6.7, 3.3, 5.7, 2.5, Iris-virginica, 3
6.7, 3, 5.2, 2.3, Iris-virginica, 3
6.3, 2.5, 5, 1.9, Iris-virginica, 3
6.5, 3, 5.2, 2, Iris-virginica, 3
6.2, 3.4, 5.4, 2.3, Iris-virginica, 3

1. Load the data in the file Iris.txt in a 2-D array called
iris.
2. Drop column whose index = 4 from the array iris.
3. Display the shape, dimensions and size of iris.
4. Split iris into three 2-D arrays, each array for a different
species. Call them iris1, iris2, iris3.
5. Print the three arrays iris1, iris2, iris3
6. Create a 1-D array header having elements "sepal
length", "sepal width", "petal length", "petal width",
"Species No" in that order.
7. Display the array header.
8. Find the max, min, mean and standard deviation for the
columns of the iris and store the results in the arrays
iris_max, iris_min, iris_avg, iris_std respectively.
The results must be rounded to not more than two
decimal places.
9. Similarly find the max, min, mean and standard deviation
for the columns of the iris1, iris2 and iris3 and
store the results in the arrays with appropriate names.
10. Check the minimum value for sepal length, sepal width,
petal length and petal width of the three species in
comparison to the minimum value of sepal length, sepal
width, petal length and petal width for the data set as a
whole and fill the table below with True if the species value
is greater than the dataset value and False otherwise.

Iris setosa Iris virginica Iris versicolor

sepal length

sepal width

petal length

petal width

11. Compare Iris setosa’s average sepal width to that of Iris
virginica.
12. Compare Iris setosa’s average petal length to that of Iris
virginica.
13. Compare Iris setosa’s average petal width to that of Iris
virginica.
14. Save the array iris_avg in a comma separated file
named IrisMeanValues.txt on the hard disk.
15. Save the arrays iris_max, iris_avg, iris_min in
a comma separated file named IrisStat.txt on the
hard disk.
Solutions to Case Study based Exercises
>>> import numpy as np

# Solution to Q1
>>> iris = np.genfromtxt('C:/NCERT/Iris.txt',
skip_header=1, delimiter=',', dtype = float)

# Solution to Q2
>>> iris = iris[0:30,[0,1,2,3,5]] # drop column 4
# Solution to Q3
>>> iris.shape
(30, 5)
>>> iris.ndim
2
>>> iris.size
150

# Solution to Q4
# Split into three arrays, each array for a different
# species
>>> iris1, iris2, iris3 = np.split(iris, [10,20],
axis=0)

# Solution to Q5
# Print the three arrays
>>> iris1
array([[5.1, 3.5, 1.4, 0.2, 1. ],
[4.9, 3. , 1.4, 0.2, 1. ],
[4.7, 3.2, 1.3, 0.2, 1. ],
[4.6, 3.1, 1.5, 0.2, 1. ],
[5. , 3.6, 1.4, 0.2, 1. ],
[5.4, 3.9, 1.7, 0.4, 1. ],
[4.6, 3.4, 1.4, 0.3, 1. ],
[5. , 3.4, 1.5, 0.2, 1. ],
[4.4, 2.9, 1.4, 0.2, 1. ],
[4.9, 3.1, 1.5, 0.1, 1. ]])

>>> iris2
array([[5.5, 2.6, 4.4, 1.2, 2. ],
[6.1, 3. , 4.6, 1.4, 2. ],
[5.8, 2.6, 4. , 1.2, 2. ],
[5. , 2.3, 3.3, 1. , 2. ],
[5.6, 2.7, 4.2, 1.3, 2. ],
[5.7, 3. , 4.2, 1.2, 2. ],
[5.7, 2.9, 4.2, 1.3, 2. ],
[6.2, 2.9, 4.3, 1.3, 2. ],
[5.1, 2.5, 3. , 1.1, 2. ],
[5.7, 2.8, 4.1, 1.3, 2. ]])

>>> iris3
array([[6.9, 3.1, 5.4, 2.1, 3. ],
[6.7, 3.1, 5.6, 2.4, 3. ],
[6.9, 3.1, 5.1, 2.3, 3. ],
[5.8, 2.7, 5.1, 1.9, 3. ],
[6.8, 3.2, 5.9, 2.3, 3. ],
[6.7, 3.3, 5.7, 2.5, 3. ],
[6.7, 3. , 5.2, 2.3, 3. ],
[6.3, 2.5, 5. , 1.9, 3. ],
[6.5, 3. , 5.2, 2. , 3. ],
[6.2, 3.4, 5.4, 2.3, 3. ]])

# Solution to Q6
>>> header = np.array(["sepal length", "sepal width",
"petal length", "petal width", "Species No"])

# Solution to Q7
>>> print(header)
['sepal length' 'sepal width' 'petal length' 'petal
width' 'Species No']

# Solution to Q8
# Stats for array iris
# Finds the max of the data for sepal length, sepal
# width, petal length, petal width, Species No
>>> iris_max = iris.max(axis=0)
>>> iris_max
array([6.9, 3.9, 5.9, 2.5, 3. ])

# Finds the min of the data for sepal length, sepal
# width, petal length, petal width, Species No
>>> iris_min = iris.min(axis=0)
>>> iris_min
array([4.4, 2.3, 1.3, 0.1, 1. ])

# Finds the mean of the data for sepal length, sepal
# width, petal length, petal width, Species No
>>> iris_avg = iris.mean(axis=0).round(2)
>>> iris_avg
array([5.68, 3.03, 3.61, 1.22, 2. ])

# Finds the standard deviation of the data for sepal
# length, sepal width, petal length, petal width,
# Species No
>>> iris_std = iris.std(axis=0).round(2)
>>> iris_std
array([0.76, 0.35, 1.65, 0.82, 0.82])

# Solution to Q9
>>> iris1_max = iris1.max(axis=0)
>>> iris1_max
array([5.4, 3.9, 1.7, 0.4, 1. ])

>>> iris2_max = iris2.max(axis=0)


>>> iris2_max
array([6.2, 3. , 4.6, 1.4, 2. ])

>>> iris3_max = iris3.max(axis=0)
>>> iris3_max
array([6.9, 3.4, 5.9, 2.5, 3. ])

>>> iris1_min = iris1.min(axis=0)


>>> iris1_min
array([4.4, 2.9, 1.3, 0.1, 1. ])
>>> iris2_min = iris2.min(axis=0)
>>> iris2_min
array([5. , 2.3, 3. , 1. , 2. ])

>>> iris3_min = iris3.min(axis=0)


>>> iris3_min
array([5.8, 2.5, 5. , 1.9, 3. ])

>>> iris1_avg = iris1.mean(axis=0)


>>> iris1_avg
array([4.86, 3.31, 1.45, 0.22, 1. ])

>>> iris2_avg = iris2.mean(axis=0)


>>> iris2_avg
array([5.64, 2.73, 4.03, 1.23, 2. ])

>>> iris3_avg = iris3.mean(axis=0)


>>> iris3_avg
array([6.55, 3.04, 5.36, 2.2 , 3. ])

>>> iris1_std = iris1.std(axis=0).round(2)


>>> iris1_std
array([0.28, 0.29, 0.1 , 0.07, 0. ])

>>> iris2_std = iris2.std(axis=0).round(2)


>>> iris2_std
array([0.36, 0.22, 0.47, 0.11, 0. ])

>>> iris3_std = iris3.std(axis=0).round(2)


>>> iris3_std
array([0.34, 0.25, 0.28, 0.2 , 0. ])

# Solution to Q10 (solve other parts on the same lines)


# min sepal length of each species Vs the min sepal
# length in the data set
>>> iris1_min[0] > iris_min[0] #sepal length
False

>>> iris2_min[0] > iris_min[0]
True
>>> iris3_min[0] > iris_min[0]
True

# Solution to Q11
# Compare Iris setosa (iris1) and Iris virginica (iris3,
# as Iris-virginica has Species No 3 in the data file)
>>> iris1_avg[1] > iris3_avg[1] #sepal width
True

# Solution to Q12
>>> iris1_avg[2] > iris3_avg[2] #petal length
False

# Solution to Q13
>>> iris1_avg[3] > iris3_avg[3] #petal width
False

# Solution to Q14
>>> np.savetxt('C:/NCERT/IrisMeanValues.txt',
iris_avg, delimiter = ',')

# Solution to Q15
>>> np.savetxt('C:/NCERT/IrisStat.txt',
(iris_max, iris_avg, iris_min), delimiter=',')
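The solution to Q4 relies on the rows being arranged species by species. As an alternative sketch, not the method used in the solutions above, the rows of each species can be selected by comparing the Species No column with the species number. The array below holds only six rows taken from the data set, chosen for brevity.

```python
import numpy as np

# Six rows from the data set (two per species), with columns:
# sepal length, sepal width, petal length, petal width, Species No
iris = np.array([[5.1, 3.5, 1.4, 0.2, 1],
                 [4.9, 3.0, 1.4, 0.2, 1],
                 [5.5, 2.6, 4.4, 1.2, 2],
                 [6.1, 3.0, 4.6, 1.4, 2],
                 [6.9, 3.1, 5.4, 2.1, 3],
                 [6.7, 3.1, 5.6, 2.4, 3]])

# Column 4 holds the species number of every row
species = iris[:, 4]

# Keep only the rows whose species number matches
iris1 = iris[species == 1]
iris2 = iris[species == 2]
iris3 = iris[species == 3]

# Column-wise mean for Iris setosa, rounded to two decimals
iris1_avg = iris1.mean(axis=0).round(2)

print(iris1.shape, iris2.shape, iris3.shape)
print(iris1_avg)
```

This selection does not depend on the order of the rows in the file.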

Chapter 7
Database Concepts

“Inconsistency of your mind… Can damage
your memory… Remove the inconsistent
data… And keep the original one !!!”
- Nisarga Jain

In this chapter
» Introduction
» File System
» Database Management System
» Relational Data Model
» Keys in a Relational Database

7.1 Introduction
After learning about the importance of data in the
previous chapter, we need to explore the methods
to store and manage data electronically. Let us
take an example of a school that maintains data
about its students, along with their attendance
record and guardian details.
The class teacher marks daily attendance of the
students in the attendance register. The teacher
records ‘P’ for present or ‘A’ for absent against
each student’s roll number on each working day.
If class strength is 50 and total working days in
a month are 26, the teacher needs to record
50 × 26 = 1300 entries manually in the register every
month. As the volume of data increases, manual data
entry becomes tedious. Following are some of the
limitations of manual record keeping in this example:
1) Entry of student details (Roll number and name)
in the new attendance register when the student is
promoted to the next class.
2) Writing student details on each month’s attendance
page where inconsistency may happen due to
incorrectly written names, skipped student
records, etc.
3) Loss of data in case attendance register is lost or
damaged.
4) Erroneous calculation while consolidating
attendance record manually.

Activity 7.1
Visit a few shops where records are maintained manually
and identify a few limitations of manual record keeping
faced by them.

The office staff also manually maintain Student
details viz. Roll Number, Name and Date of Birth
with respective guardian details viz. Guardian name,
Contact Number and Address. This is required for
correspondence with guardian regarding student
attendance and result.
Finding information from a huge volume of papers
or deleting/modifying an entry is a difficult task in pen
and paper based approach. To overcome the hassles
faced in manual record keeping, it is desirable to store
attendance record and student details on separate data
files on a computerized system, so that office staff and
teachers can:
1) Simply copy the student details to the new
attendance file from the old attendance file when
students are promoted to next class.
2) Find any data about student or guardian.
3) Add more details to existing data whenever a new
student joins the school.
4) Modify stored data like details of student or guardian
whenever required.
5) Remove/delete data whenever a student leaves the
school.

7.2 File System


A file can be understood as a container to store data in
a computer. Files can be stored on the storage device
of a computer system. Contents of a file can be texts,
computer program code, comma separated values
(CSV), etc. Likewise, pictures, audios/videos, web pages
are also files.
Files stored on a computer can be accessed directly
and searched for desired data. But to access data of a
file through software, for example, to display monthly
attendance report on school website, one has to write
computer programs to access data from files.
Continuing the example of attendance at school,
we need to store data about students and attendance
in two separate files. Table 7.1 shows the contents of
STUDENT file which has six columns, as detailed below:
RollNumber – Roll number of the student
SName – Name of the student
SDateofBirth – Date of birth of the student
GName – Name of the guardian
GPhone – Phone number of the student guardian
GAddress – Address of the guardian of the student
Table 7.1 STUDENT file maintained by office staff
RollNumber  SName         SDateofBirth  GName            GPhone      GAddress
1           Atharv Ahuja  2003-05-15    Amit Ahuja       5711492685  G-35, Ashok Vihar, Delhi
2           Daizy Bhutia  2002-02-28    Baichung Bhutia  3612967082  Flat no. 5, Darjeeling Appt., Shimla
3           Taleem Shah   2002-02-28    Himanshu Shah    4726309212  26/77, West Patel Nagar, Ahmedabad
4           John Dsouza   2003-08-18    Danny Dsouza                 S -13, Ashok Village, Daman
5           Ali Shah      2003-07-05    Himanshu Shah    4726309212  26/77, West Patel Nagar, Ahmedabad
6           Manika P.     2002-03-10    Sujata P.        3801923168  HNO-13, B- block, Preet Vihar, Madurai

Table 7.2 shows another file called ATTENDANCE
which has four columns, as detailed below:
AttendanceDate – Date for which attendance was
marked
RollNumber – Roll number of the student
SName – Name of the student
AttendanceStatus – Marked as P (present) or A (absent)

Table 7.2 ATTENDANCE file maintained by class teacher
AttendanceDate RollNumber SName AttendanceStatus
2018-09-01 1 Atharv Ahuja P
2018-09-01 2 Daizy Bhutia P
2018-09-01 3 Taleem Shah A
2018-09-01 4 John Dsouza P
2018-09-01 5 Ali Shah A
2018-09-01 6 Manika P. P
2018-09-02 1 Atharv Ahuja P
2018-09-02 2 Daizy Bhutia P
2018-09-02 3 Taleem Shah A
2018-09-02 4 John Dsouza A
2018-09-02 5 Ali Shah P
2018-09-02 6 Manika P. P

7.2.1 Limitations of a File System


A file system becomes difficult to handle when the number
of files increases and the volume of data also grows.
Following are some of the limitations of a file system:
(A) Difficulty in Access
Files themselves do not provide any mechanism to
retrieve data. Data maintained in a file system are
accessed through application programs. While writing
such programs, the developer may not anticipate all
the possible ways in which data may be accessed. So,
sometimes it is difficult to access data in the required
format and one has to write application program to
access data.
(B) Data Redundancy
Redundancy means same data are duplicated in
different places (files). In our example, student names
are maintained in both the files. Besides, in Table 7.1,
students with roll numbers 3 and 5 have same guardian
name and therefore same guardian name is maintained
twice. Both these are examples of redundancy which is
difficult to avoid in a file system. Redundancy leads to
excess storage use and may cause data inconsistency
also.
(C) Data Inconsistency
Data inconsistency occurs when the same data maintained
in different places do not match. If a student wants the
spelling of her name changed, it needs to be
changed in the SName column in both the files. Likewise, if
a student leaves school, the details need to be deleted
from both the files. As the files are being maintained by
different people, the changes may not happen in one of
the files. In that case, the student name will be different
(inconsistent) in the two files.
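The situation can be pictured with a minimal sketch in Python. Here the two files are assumed to be held as simple dictionaries keyed by roll number, and the corrected spelling is made up for illustration; neither detail comes from the school example itself.

```python
# Names as kept in the two separate files (illustrative assumption:
# each file is represented as a dictionary keyed by RollNumber).
student_file = {1: "Atharv Ahuja", 2: "Daizy Bhutia"}      # kept by the office
attendance_file = {1: "Atharv Ahuja", 2: "Daizy Bhutia"}   # kept by the class teacher

# The office changes a spelling (hypothetical new spelling),
# but the class teacher's file is not updated.
student_file[2] = "Daisy Bhutia"

# Roll numbers whose names no longer match across the two files.
mismatched = sorted(
    roll for roll in student_file
    if student_file[roll] != attendance_file[roll]
)
print(mismatched)
```

Nothing in the file system itself reports the mismatch; it is found only if someone writes a program like this one to compare the files.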
(D) Data Isolation
Both the files presented at Table 7.1 (STUDENT) and at
Table 7.2 (ATTENDANCE) are related to students. But
there is no link or mapping between them. The school
will have to write separate programs to access these two
files. This is because data mapping is not supported in
file system. In a more complex system where data files
are generated by different persons at different times, files
created in isolation may be of different formats.
In such a case, it is difficult to write new application
programs to retrieve data from different files maintained
at multiple places, as one has to understand the
underlying structure of each file as well.
(E) Data Dependence
Data are stored in a specific format or structure in a
file. If the structure or format itself is changed, all the
existing application programs accessing that file also
need to be changed. Otherwise, the programs may not
work correctly. This is called data dependence. Hence, updating
the structure of a data file requires modification in all
the application programs accessing that file.
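A small Python sketch may help to see data dependence at work. It assumes, only for illustration, that each attendance record is stored as one comma-separated line, and that a Section field is later added to the file; both are hypothetical details not taken from the example above.

```python
def attendance_status(record):
    """Read the status, assuming the fixed layout
    AttendanceDate,RollNumber,SName,AttendanceStatus."""
    fields = record.split(",")
    return fields[3]   # the program depends on the status being the 4th field

old_record = "2018-09-01,1,Atharv Ahuja,P"
# Hypothetical change of structure: a Section field is added before the status.
new_record = "2018-09-01,1,Atharv Ahuja,B,P"

old_result = attendance_status(old_record)   # reads the status correctly
new_result = attendance_status(new_record)   # now reads the section instead
print(old_result, new_result)
```

The program still runs after the structure changes, but it silently returns the wrong field, which is why every program using the file has to be modified.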
(F) Controlled Data Sharing
There can be different categories of users like teachers,
office staff and parents. Ideally, not every user should
be able to access all the data. As an example, guardians
and office staff should be able to see the student attendance
data but should not be able to modify/delete it. It means
these users should be given limited access (read only)
to the ATTENDANCE file. Only the teacher should be
able to update the attendance data. It is very difficult to
enforce this kind of access control in a file system while
accessing files through application programs.

7.3 Database Management System


Limitations faced in file system can be overcome by
storing the data in a database where data are logically
related. We can organise related data in a database so
that it can be managed in an efficient and easy way.


A database management system (DBMS), or database
system in short, is software that can be used to
create and manage databases. A DBMS lets users
create a database and store, manage, update/modify and
retrieve data from that database, either directly or through
application programs. Some examples of open source and
commercial DBMS include MySQL, Oracle, PostgreSQL,
SQL Server, Microsoft Access and MongoDB.
A database system hides certain details about
how data are actually stored and maintained. Thus,
it provides users with an abstract view of the data. A
database system has a set of programs through which
users or other programs can access, modify and retrieve
the stored data.
The DBMS serves as an interface between the
database and end users or application programs.
Retrieving data from a database through a special type of
commands is called querying the database. In addition,
users can modify the structure of the database itself
through a DBMS.
Note: Some database management systems include a
graphical user interface for users to create and manage
databases. Other database systems use a command line
interface that requires users to use programming
commands to create and manage databases.
Databases are widely used in various fields. Some
applications are given in Table 7.3.
Table 7.3 Use of Database in Real-life Applications
Application                        Database to maintain data about
Banking                            customer information, account details, loan details, transaction details, etc.
Crop Loan                          kisan credit card data, farmer’s personal data, land area and cultivation data, loan history, repayment data, etc.
Inventory Management               product details, customer information, order details, delivery data, etc.
Organisation Resource Management   employee records, salary details, department information, branch locations, etc.
Online Shopping                    items description, user login details, users preferences details, etc.

7.3.1 File System to DBMS


Let us revisit our school example where two data files
were maintained (Table 7.1 by office and Table 7.2 by
teacher). Let us now design a database to store data of
those two files. We know that tables in a database are
linked or related through one or more common columns
or fields. In our example, the STUDENT (Table 7.1) file
and ATTENDANCE (Table 7.2) file have RollNumber
and SName as common field names. In order to convert
these two files into a database, we need to incorporate
the following changes:
a) SName need not be maintained in the ATTENDANCE
file as it is already there in STUDENT. Details for a
student can be retrieved through the common field
RollNumber in both the files.
b) If two siblings are in the same class, then the same
guardian details (GName, GPhone and GAddress)
are maintained for both the siblings. We know this
is a redundancy, and by using a database we can
avoid it. So let us split the STUDENT file into two
files (STUDENT and GUARDIAN) so that each
guardian’s data are maintained only once.
c) Two or more guardians can have the same name.
So it will not be possible to identify which guardian
is related to which student. In such a case, we need
to create an additional column, say GUID (Guardian
ID), that will take a unique value for each record in
the GUARDIAN file. The column GUID will also be
kept in the STUDENT file for relating these two files.

High Cost is incurred while shifting from file system to DBMS:
• Purchasing sophisticated hardware and software.
• Training users for querying.
• Recurrent cost to take regular backup and perform recovery operations.

Note: We could also distinguish guardians by their phone
numbers. But a phone number can change, and therefore may not
truly distinguish a guardian.
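Change (a) above can be sketched in a few lines of Python. The rows are taken from Table 7.2; holding them as tuples and a dictionary is only an assumption made for this illustration, not how a DBMS stores data.

```python
# Rows of the ATTENDANCE file as kept by the class teacher (from Table 7.2):
# (AttendanceDate, RollNumber, SName, AttendanceStatus)
attendance_file = [
    ("2018-09-01", 1, "Atharv Ahuja", "P"),
    ("2018-09-01", 2, "Daizy Bhutia", "P"),
    ("2018-09-02", 1, "Atharv Ahuja", "P"),
    ("2018-09-02", 2, "Daizy Bhutia", "P"),
]

# Keep each name only once, keyed by RollNumber, and drop SName
# from the attendance records.
student = {roll: name for _, roll, name, _ in attendance_file}
attendance = [(date, roll, status) for date, roll, _, status in attendance_file]

def name_of(roll_number):
    """Look up a student's name through the common field RollNumber."""
    return student[roll_number]

names_stored_before = len(attendance_file)   # one copy of a name per attendance row
names_stored_after = len(student)            # one copy of a name per student
print(names_stored_before, names_stored_after, name_of(2))
```

The name is now stored once per student instead of once per day of attendance, and it can still be reached from any attendance record through RollNumber.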
Figure 7.1 shows the related data files for the
STUDENT, GUARDIAN and ATTENDANCE details. Note
that this is not the complete database schema since it
does not show any relationship among tables.

STUDENT        GUARDIAN    ATTENDANCE
RollNumber     GUID        AttendanceDate
SName          GName       RollNumber
SDateofBirth   GPhone      AttendanceStatus
GUID           GAddress

Figure 7.1: Record structure of three files in STUDENTATTENDANCE Database
The tables shown in Figure 7.1 are empty; they are
to be populated with actual data as shown in Tables 7.4,
7.5 and 7.6.
Table 7.4 Snapshot of STUDENT table
RollNumber   SName          SDateofBirth   GUID
1            Atharv Ahuja   2003-05-15     444444444444
2            Daizy Bhutia   2002-02-28     111111111111
3            Taleem Shah    2002-02-28
4            John Dsouza    2003-08-18     333333333333
5            Ali Shah       2003-07-05     101010101010
6            Manika P.      2002-03-10     466444444666

Table 7.5 Snapshot of GUARDIAN table


GUID GName GPhone GAddress
444444444444 Amit Ahuja 5711492685 G-35, Ashok Vihar, Delhi
111111111111 Baichung Bhutia 3612967082 Flat no. 5, Darjeeling Appt., Shimla
101010101010 Himanshu Shah 4726309212 26/77, West Patel Nagar, Ahmedabad
333333333333 Danny Dsouza S -13, Ashok Village, Daman
466444444666 Sujata P. 3801923168 HNO-13, B- block, Preet Vihar, Madurai

Table 7.6 Snapshot of ATTENDANCE table


Date RollNumber Status
2018-09-01 1 P
2018-09-01 2 P
2018-09-01 3 A
2018-09-01 4 P
2018-09-01 5 A
2018-09-01 6 P
2018-09-02 1 P
2018-09-02 2 P
2018-09-02 3 A
2018-09-02 4 A
2018-09-02 5 P
2018-09-02 6 P

Figure 7.2 shows a simplified database called
STUDENTATTENDANCE, which is used to maintain
data about the student, guardian and attendance. As
shown here, the DBMS maintains a single repository
of data at a centralized location and can be used by
multiple users (office staff, teacher) at the same time.
7.3.2 Key Concepts in DBMS
In order to efficiently manage data using a DBMS, let us
understand certain key terms:
(A) Database Schema
Database Schema is the design of a database. It is the
skeleton of the database that represents the structure
(table names and their fields/columns), the type of data
each column can hold, constraints on the data to be
stored (if any), and the relationships among the tables.

Database schema is also called the visual or logical
architecture as it tells us how the data are organised in
a database.

[Figure: The Teacher and the Office Staff each send a Query to the DBMS software and receive a Query Result. The DBMS software processes the query and accesses the database (Student, Guardian, Attendance) and its definition (Catalog).]
Figure 7.2: StudentAttendance Database Environment

(B) Data Constraint
Sometimes we put certain restrictions or limitations on
the type of data that can be inserted in one or more
columns of a table. This is done by specifying one or
more constraints on that column(s) while creating the
tables. For example, one can define the constraint that
the column mobile number can only have non-negative
integer values of exactly 10 digits. Since each student
shall have one unique roll number, we can put the NOT
NULL and UNIQUE constraints on the RollNumber
column. Constraints are used to ensure accuracy and
reliability of data in the database.
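The effect of such constraints can be tried out with a minimal sketch. It uses Python's built-in sqlite3 module in place of MySQL, and a hypothetical MobileNumber column stored as text with a CHECK condition; these choices are assumptions made only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # a temporary database held in memory

# NOT NULL and UNIQUE on RollNumber; the CHECK condition (an assumption for
# this sketch) allows only values made of exactly 10 digits in MobileNumber.
conn.execute("""
    CREATE TABLE STUDENT (
        RollNumber   INTEGER NOT NULL UNIQUE,
        SName        TEXT,
        MobileNumber TEXT CHECK (length(MobileNumber) = 10
                                 AND MobileNumber NOT GLOB '*[^0-9]*')
    )
""")

def try_insert(row):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO STUDENT VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

results = [
    try_insert((1, "Atharv Ahuja", "5711492685")),    # valid row
    try_insert((1, "Daizy Bhutia", "3612967082")),    # roll number repeated
    try_insert((None, "Taleem Shah", "4726309212")),  # roll number missing
    try_insert((4, "John Dsouza", "12345")),          # fewer than 10 digits
]
print(results)
```

Only the first row is stored. The other three are refused by the database itself, so no application program has to repeat these checks.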
(C) Meta-data or Data Dictionary
The database schema along with various constraints on the
data is stored by the DBMS in a database catalog or dictionary,
called meta-data. Meta-data is data about the data.
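As a hedged illustration, the sketch below looks at the catalog kept by SQLite through Python's sqlite3 module. SQLite is used here only as an assumption for convenience; other DBMS such as MySQL keep their catalog in a different form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE GUARDIAN (GUID TEXT NOT NULL UNIQUE, GName TEXT, GPhone TEXT, GAddress TEXT)"
)

# sqlite_master is SQLite's own catalog table: it describes the tables,
# it does not hold the data stored inside them.
catalog = conn.execute(
    "SELECT type, name, sql FROM sqlite_master WHERE type = 'table'"
).fetchall()

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk)
columns = [
    (name, col_type, bool(notnull))
    for _, name, col_type, notnull, _, _ in conn.execute("PRAGMA table_info(GUARDIAN)")
]
print(catalog[0][1], columns)
```

Even though no guardian has been entered yet, the catalog already records the table name, its columns, their data types and the NOT NULL constraint.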
(D) Database Instance
When we define the database structure or schema, the state
of the database is empty, i.e., no data entry is there. After
loading data, the state or snapshot of the database
at any given time is the database instance. We may
then retrieve data through queries or manipulate data
through insertion, modification or deletion. Thus, the
state of the database can change, and a database
schema can have many instances at different times.
(E) Query
A query is a request to a database for obtaining
information in a desired way. A query can be made to get
data from one table or from a combination of tables. For
example, “find names of all those students present on
Attendance Date 2000-01-02” is a query to the database.
To retrieve or manipulate data, the user needs to write
a query using a query language called Structured Query
Language (SQL), which is discussed in chapter 8.
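A query of this kind can be sketched as follows. The sketch assumes Python's sqlite3 module as the DBMS and loads only a few rows from Tables 7.4 and 7.6; the function name present_on is made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE STUDENT (RollNumber INTEGER, SName TEXT);
CREATE TABLE ATTENDANCE (AttendanceDate TEXT, RollNumber INTEGER, AttendanceStatus TEXT);
INSERT INTO STUDENT VALUES
    (3, 'Taleem Shah'), (4, 'John Dsouza'), (5, 'Ali Shah'), (6, 'Manika P.');
INSERT INTO ATTENDANCE VALUES
    ('2018-09-02', 3, 'A'), ('2018-09-02', 4, 'A'),
    ('2018-09-02', 5, 'P'), ('2018-09-02', 6, 'P');
""")

def present_on(date):
    """Names of students marked present (P) on the given date."""
    cursor = conn.execute("""
        SELECT S.SName
        FROM STUDENT S JOIN ATTENDANCE A ON S.RollNumber = A.RollNumber
        WHERE A.AttendanceDate = ? AND A.AttendanceStatus = 'P'
        ORDER BY S.RollNumber
    """, (date,))
    return [name for (name,) in cursor]

print(present_on("2018-09-02"))
```

The names come from STUDENT and the dates from ATTENDANCE, so this single query draws on a combination of two tables linked through RollNumber.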
(F) Data Manipulation
Modification of a database consists of three operations,
viz. Insertion, Deletion and Update. Suppose Rivaan joins
as a new student in the class; then the student details
need to be added in the STUDENT as well as the GUARDIAN
files of the Student Attendance database. This is called
the Insertion operation on the database. In case a student
leaves the school, his/her data as well as his/her
guardian details need to be removed from the STUDENT,
GUARDIAN and ATTENDANCE files. This
is called the Deletion operation on the database. Suppose
Atharv’s guardian has changed his mobile number; his
GPhone should then be updated in the GUARDIAN file. This is
called the Update operation on the database.
(G) Database Engine
Database engine is the underlying component or set of
programs used by a DBMS to create a database and handle
various queries for data retrieval and manipulation.

Limitations of DBMS
Increased Complexity: Use of DBMS increases the complexity of maintaining functionalities like security, consistency, sharing and integrity.
Increased data vulnerability: As data are stored centrally, it increases the chances of loss of data due to any failure of hardware or software. It can bring all operations to a halt for all the users.
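The three operations described under Data Manipulation can be sketched with Python's sqlite3 module, used here as an assumption in place of MySQL. The text only names the new student Rivaan, so his roll number, his guardian's details and the new phone number below are all made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE GUARDIAN (GUID TEXT, GName TEXT, GPhone TEXT);
CREATE TABLE STUDENT (RollNumber INTEGER, SName TEXT, GUID TEXT);
INSERT INTO GUARDIAN VALUES ('444444444444', 'Amit Ahuja', '5711492685');
INSERT INTO STUDENT VALUES (1, 'Atharv Ahuja', '444444444444');
""")

# Insertion: a new student joins (guardian details and roll number are hypothetical).
conn.execute("INSERT INTO GUARDIAN VALUES ('999999999999', 'Guardian of Rivaan', '9000000000')")
conn.execute("INSERT INTO STUDENT VALUES (7, 'Rivaan', '999999999999')")
count_after_insert = conn.execute("SELECT COUNT(*) FROM STUDENT").fetchone()[0]

# Update: Atharv's guardian changes his phone number (new number is hypothetical).
conn.execute("UPDATE GUARDIAN SET GPhone = '5700000000' WHERE GUID = '444444444444'")

# Deletion: Rivaan leaves the school, so his records are removed.
conn.execute("DELETE FROM STUDENT WHERE RollNumber = 7")
conn.execute("DELETE FROM GUARDIAN WHERE GUID = '999999999999'")

students = conn.execute("SELECT RollNumber, SName FROM STUDENT").fetchall()
phone = conn.execute(
    "SELECT GPhone FROM GUARDIAN WHERE GUID = '444444444444'"
).fetchone()[0]
print(count_after_insert, students, phone)
```

Each operation changes the state (instance) of the database, while the schema of the two tables stays the same throughout.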

7.4 Relational Data Model


Different types of DBMS are available and their
classification is done based on the underlying data model.
A data model describes the structure of the database,
including how data are defined and represented,
relationships among data, and the constraints. The most
commonly used data model is the Relational Data Model.
Other types of data models include object-oriented data
model, entity-relationship data model, document model
and hierarchical data model. This book discusses the
DBMS based on relational data model.


In the relational model, tables are called relations, and they
store data in columns. Each table can have
multiple columns, where each column name should be
unique. Each row in the table represents a
related set of values. For example, each row of Table 7.5
represents a particular guardian and has related values,
viz. the guardian’s ID with the guardian’s name, address
and phone number.
Thus, a table consists of a collection of relationships.
It is important to note here that relations in a database
are not independent tables, but are associated with each
other. For example, relation ATTENDANCE has attribute
RollNumber which links it with corresponding student
record in relation STUDENT. Similarly, attribute GUID
is placed with STUDENT table for extracting guardian
details of a particular student. If linking attributes are
not there in appropriate relations, it will not be possible
to keep the database in correct state and retrieve valid
information from the database.
Figure 7.3 shows the relational database Student
Attendance along with the three relations (tables)
STUDENT, ATTENDANCE and GUARDIAN.

Figure 7.3: Representing StudentAttendance Database using Relational Data Model

Table 7.7 Relation schemas along with their description for the Student Attendance database
Relation Scheme: STUDENT(RollNumber, SName, SDateofBirth, GUID)
Description of attributes:
RollNumber: unique id of the student
SName: name of the student
SDateofBirth: date of birth of the student
GUID: unique id of the guardian of the student
Relation Scheme: ATTENDANCE(AttendanceDate, RollNumber, AttendanceStatus)
Description of attributes:
AttendanceDate: date on which attendance is taken
RollNumber: roll number of the student
AttendanceStatus: whether present (P) or absent (A)
Note that the combination of AttendanceDate and RollNumber will be unique in each record of the table
Relation Scheme: GUARDIAN(GUID, GName, GPhone, GAddress)
Description of attributes:
GUID: unique id of the guardian
GName: name of the guardian
GPhone: contact number of the guardian
GAddress: contact address of the guardian


Each tuple (row) in a relation (table) corresponds
to data of a real world entity (for example, Student,
Guardian, and Attendance). In the GUARDIAN relation
(Table 7.5), each row represents the facts about the
guardian and each column name in the GUARDIAN table
is used to interpret the meaning of data stored under that
column. A database that is modeled on relational data
model concept is called Relational Database. Figure 7.4
shows relation GUARDIAN with some populated data.
Let us now understand the commonly used
terminologies in relational data model using Figure 7.4.
Relation GUARDIAN with 4 attributes/columns

GUID           GName             GPhone       GAddress
444444444444   Amit Ahuja        5711492685   G-35, Ashok Vihar, Delhi
111111111111   Baichung Bhutia   3612967082   Flat no. 5, Darjeeling Appt., Shimla
101010101010   Himanshu Shah     4726309212   26/77, West Patel Nagar, Ahmedabad
333333333333   Danny Dsouza                   S -13, Ashok Village, Daman
466444444666   Sujata P.         3801923168   HNO-13, B- block, Preet Vihar, Madurai
(The rows above together form the relation state; each row is a record/tuple/row.)

Facts about RELATION GUARDIAN:
1. Degree (Number of attributes) = 4
2. Cardinality (Number of rows/tuples/records) = 5
3. Relation is a flat file, i.e., each column has a single value and each record
has the same number of columns
Figure 7.4: Relation GUARDIAN with its Attributes and Tuples

i) ATTRIBUTE: Characteristics or parameters for
which data are to be stored in a relation. Simply
stated, the columns of a relation are the attributes,
which are also referred to as fields. For example, GUID,
GName, GPhone and GAddress are attributes of
relation GUARDIAN.
ii) TUPLE: Each row of data in a relation (table) is
called a tuple. In a table with n columns, a tuple is
a relationship between the n related values.
iii) DOMAIN: It is a set of values from which an
attribute can take a value in each row. Usually, a
data type is used to specify domain for an attribute.
For example, in STUDENT relation, the attribute
RollNumber takes integer values and hence its
domain is a set of integer values. Similarly, the set
of character strings constitutes the domain of the
attribute SName.

iv) DEGREE: The number of attributes in a relation
is called the Degree of the relation. For example,
relation GUARDIAN with four attributes is a relation
of degree 4.
v) CARDINALITY: The number of tuples in a relation
is called the Cardinality of the relation. For example,
the cardinality of relation GUARDIAN is 5 as there
are 5 tuples in the table.
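The terms above can be checked against Table 7.5 with a short Python sketch. Representing the relation as a tuple of attribute names and a list of rows is only an assumption for illustration, and the phone number that is not listed in the table is written as None.

```python
# The GUARDIAN relation of Table 7.5: attribute names plus a list of tuples.
attributes = ("GUID", "GName", "GPhone", "GAddress")
tuples = [
    ("444444444444", "Amit Ahuja", "5711492685", "G-35, Ashok Vihar, Delhi"),
    ("111111111111", "Baichung Bhutia", "3612967082", "Flat no. 5, Darjeeling Appt., Shimla"),
    ("101010101010", "Himanshu Shah", "4726309212", "26/77, West Patel Nagar, Ahmedabad"),
    ("333333333333", "Danny Dsouza", None, "S -13, Ashok Village, Daman"),  # phone not listed
    ("466444444666", "Sujata P.", "3801923168", "HNO-13, B- block, Preet Vihar, Madurai"),
]

degree = len(attributes)     # number of attributes (columns)
cardinality = len(tuples)    # number of tuples (rows)

# Every tuple has exactly one value for each attribute.
same_width = all(len(row) == degree for row in tuples)
print(degree, cardinality, same_width)
```

Adding or removing a guardian changes the cardinality, whereas the degree changes only if the structure of the relation itself is altered.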

7.4.1 Three Important Properties of a Relation


In the relational data model, the following three properties
are observed with respect to a relation, which make a
relation different from a data file or a simple table.
Property 1: imposes following rules on an attribute of
the relation.
• Each attribute in a relation has a unique name.
• Sequence of attributes in a relation is immaterial.
Property 2: governs following rules on a tuple of a
relation.
• Each tuple in a relation is distinct. For example, data
values in no two tuples of relation ATTENDANCE
can be identical for all the attributes. Thus, each
tuple of a relation must be uniquely identified by
its contents.
• Sequence of tuples in a relation is immaterial.
The tuples are not considered to be ordered, even
though they appear to be in tabular form.
Property 3: imposes following rules on the state of a
relation.
• All data values in an attribute must be from the
same domain (same data type).
• Each data value associated with an attribute
must be atomic (it cannot be divided further into
meaningful subparts). For example, GPhone of
relation GUARDIAN holds a ten-digit number, which
is indivisible.
• No attribute can have many data values in one
tuple. For example, a guardian cannot specify
multiple contact numbers under the GPhone attribute.
• A special value “NULL” is used to represent
values that are unknown or non-applicable to
certain attributes. For example, if a guardian does
not share his or her contact number with the
school authorities, then GPhone is set to NULL
(data unknown).
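Some of these properties can be pictured in Python by modelling a relation state as a set of tuples. This is only a rough analogy chosen for illustration, with a few rows borrowed from Tables 7.5 and 7.6; the helper name relation_state is hypothetical.

```python
def relation_state(rows):
    """Model a relation state as a set of tuples: unordered and without duplicates."""
    return frozenset(rows)

rows_a = [("2018-09-01", 1, "P"), ("2018-09-01", 2, "P"), ("2018-09-01", 3, "A")]
rows_b = [("2018-09-01", 3, "A"), ("2018-09-01", 1, "P"), ("2018-09-01", 2, "P")]

# Property 2: the sequence of tuples is immaterial.
same_state = relation_state(rows_a) == relation_state(rows_b)

# Property 2: a row identical in every attribute does not form a new tuple.
with_repeat = relation_state(rows_a + [("2018-09-01", 1, "P")])
size_unchanged = len(with_repeat) == len(rows_a)

# Property 3: NULL (None in Python) marks a value that is unknown,
# as for a guardian whose GPhone is not available.
guardian = ("333333333333", "Danny Dsouza", None)
phone_known = guardian[2] is not None
print(same_state, size_unchanged, phone_known)
```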


7.5 Keys in a Relational Database


The tuples within a relation must be distinct. It means no two
tuples in a table should have the same value for all attributes.
That is, there should be at least one attribute in which
data are distinct (unique) and not NULL. That way, we can
uniquely distinguish each tuple of a relation. So, the relational
data model imposes some restrictions or constraints on the
values of the attributes and on how the contents of one relation
can be referred to through another relation. These restrictions
are specified at the time of defining the database through
different types of keys as given below:
7.5.1 Candidate Key
A relation can have one or more attributes that take
distinct values. Any of these attributes can be used to
uniquely identify the tuples in the relation. Such attributes
are called candidate keys as each of them is a candidate
for the primary key.
As shown in Figure 7.4, the relation GUARDIAN has
four attributes out of which GUID and GPhone always take
unique values. No two guardians will have same phone
number or same GUID. Hence, these two attributes are the
candidate keys as they both are candidates for primary key.
7.5.2 Primary Key
Out of one or more candidate keys, the attribute chosen
by the database designer to uniquely identify the tuples
in a relation is called the primary key of that relation. The
remaining attributes in the list of candidate keys are called
the alternate keys.
In the relation GUARDIAN, suppose GUID is chosen as
primary key, then GPhone will be called the alternate key.
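The choice of GUID as primary key and GPhone as alternate key can be sketched as follows. The sketch assumes Python's sqlite3 module as the DBMS, declares the alternate key with UNIQUE, and uses made-up rows (marked hypothetical) to test what the relation accepts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# GUID is chosen as the primary key. GPhone, the other candidate key,
# is declared UNIQUE so that it keeps identifying guardians as an alternate key.
conn.execute("""
    CREATE TABLE GUARDIAN (
        GUID   TEXT PRIMARY KEY NOT NULL,
        GName  TEXT,
        GPhone TEXT UNIQUE
    )
""")
conn.execute("INSERT INTO GUARDIAN VALUES ('444444444444', 'Amit Ahuja', '5711492685')")

def rejected(row):
    """Return True if the relation refuses the row."""
    try:
        conn.execute("INSERT INTO GUARDIAN VALUES (?, ?, ?)", row)
        return False
    except sqlite3.IntegrityError:
        return True

same_guid = rejected(("444444444444", "Another Guardian", "9000000001"))    # hypothetical row
same_phone = rejected(("555555555555", "Another Guardian", "5711492685"))   # hypothetical row
same_name = rejected(("666666666666", "Amit Ahuja", "9000000002"))          # hypothetical row
print(same_guid, same_phone, same_name)
```

A repeated GUID or GPhone is refused, while a repeated GName is accepted, which is why GName is not a candidate key.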
7.5.3 Composite Primary Key
If no single attribute in a relation is able to uniquely
distinguish the tuples, then more than one attribute are
taken together as primary key. Such primary key consisting
of more than one attribute is called Composite Primary key.
In relation ATTENDANCE, RollNumber cannot be used
as primary key as the roll number of the same student will appear
in another row for a different date. Similarly, in relation
ATTENDANCE, AttendanceDate cannot be used as primary
key because the same date is repeated for each roll number.
However, the combination of these two attributes, RollNumber and
AttendanceDate, together would always have a unique value in the
ATTENDANCE table as on any working day, the attendance of a
student would be marked only once. Hence {RollNumber,
AttendanceDate} will make the composite primary key of the
ATTENDANCE relation.
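The reasoning above can be verified on a few rows of Table 7.6 with a small Python sketch. The helper is_unique is a hypothetical name introduced only for this illustration.

```python
def is_unique(rows, positions):
    """True if the values at the given column positions never repeat across rows."""
    seen = set()
    for row in rows:
        key = tuple(row[p] for p in positions)
        if key in seen:
            return False
        seen.add(key)
    return True

# (AttendanceDate, RollNumber, AttendanceStatus) rows taken from Table 7.6
attendance = [
    ("2018-09-01", 1, "P"), ("2018-09-01", 2, "P"),
    ("2018-09-02", 1, "P"), ("2018-09-02", 2, "P"),
]

date_alone = is_unique(attendance, [0])        # same date repeats for each student
roll_alone = is_unique(attendance, [1])        # same roll number repeats for each date
date_and_roll = is_unique(attendance, [0, 1])  # the pair never repeats
print(date_alone, roll_alone, date_and_roll)
```

Such a check on sample data can only rule an attribute out. Whether a combination will always be unique is decided by the database designer from the meaning of the data.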
7.5.4 Foreign Key
A foreign key is used to represent the relationship between
two relations. A foreign key is an attribute whose value
is derived from the primary key of another relation. This
means that any attribute of a relation (referencing),
which is used to refer to contents from another (referenced)
relation, becomes a foreign key if it refers to the primary key
of the referenced relation. The referencing relation is called
the Foreign Relation. In some cases, a foreign key can take a NULL
value if it is not part of the primary key of the foreign table.
The relation in which the referenced primary key is defined
is called primary relation or master relation.
In Figure 7.5, two foreign keys in Student Attendance
database are shown using schema diagram where the foreign
key is displayed as a directed arc (arrow) originating from it
and ending at the corresponding attribute of the primary
key of the referenced table. The underlined attributes make
the primary key of that table.

STUDENT RollNumber SName SDateofBirth GUID

GUARDIAN GUID GName GPhone GAddress

ATTENDANCE AttendanceDate RollNumber AttendanceStatus

Figure 7.5: StudentAttendance Database with the Primary and Foreign keys
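The behaviour of a foreign key can be tried out with a minimal sketch. It assumes Python's sqlite3 module as the DBMS (where foreign key checking has to be switched on explicitly) and shows only the link from STUDENT to GUARDIAN; the GUID 000000000000 is a made-up value that does not exist in GUARDIAN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite checks foreign keys only when this is on
conn.executescript("""
CREATE TABLE GUARDIAN (GUID TEXT PRIMARY KEY, GName TEXT);
CREATE TABLE STUDENT (
    RollNumber INTEGER PRIMARY KEY,
    SName      TEXT,
    GUID       TEXT REFERENCES GUARDIAN (GUID)   -- foreign key
);
INSERT INTO GUARDIAN VALUES ('444444444444', 'Amit Ahuja');
""")

def accepted(row):
    """Return True if the STUDENT relation accepts the row."""
    try:
        conn.execute("INSERT INTO STUDENT VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

known_guardian = accepted((1, "Atharv Ahuja", "444444444444"))
unknown_guardian = accepted((2, "Daizy Bhutia", "000000000000"))   # no such GUID in GUARDIAN
null_guardian = accepted((3, "Taleem Shah", None))                 # NULL foreign key
print(known_guardian, unknown_guardian, null_guardian)
```

A student can be linked only to a guardian who exists in the referenced relation, or to no guardian at all (NULL), since GUID is not part of the primary key of STUDENT.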

Summary
• A file in a file system is a container to store data in a
computer.
• A file system suffers from Data Redundancy, Data
Inconsistency, Data Isolation, Data Dependence and
lack of Controlled Data Sharing.
• Database Management System (DBMS) is a software
to create and manage databases. A database is a
collection of tables.
• Database schema is the design of a database.
• A database constraint is a restriction on the type of
data that can be inserted into the table.
• Database schema and database constraints are stored
in the database catalog.
• The snapshot of the database at any given
time is the database instance.
• A query is a request to a database for information
retrieval and data manipulation (insertion, deletion or
update). It is written in Structured Query Language
(SQL).
• Relational DBMS (RDBMS) is used to store data in
related tables. Rows and columns of a table are called
tuples and attributes respectively. A table is referred
to as a relation.
• Restrictions on data stored in an RDBMS are applied
by use of keys such as Candidate Key, Primary Key,
Composite Primary Key and Foreign Key.
• Primary key in a relation is used for unique identification
of tuples.
• Foreign key is used to relate two tables or relations.
• Each column in a table represents a feature (attribute)
of a record. Table stores the information for an entity
whereas a row represents a record.
• Each row in a table represents a record. A tuple is
a collection of attribute values that makes a record
unique.
• A tuple is a unique entity whereas attribute values can
be duplicate in the table.
• SQL is the standard language for RDBMS systems like
MySQL.

Exercise
1. Give the terms for each of the following:
a) Collection of logically related records.
b) DBMS creates a file that contains description about the
data stored in the database.
c) Attribute that can uniquely identify the tuples in a
relation.
d) Special value that is stored when actual data value is
unknown for an attribute.
e) An attribute which can uniquely identify tuples of the
table but is not defined as primary key of the table.
f) Software that is used to create, manipulate and maintain
a relational database.
2. Why are foreign keys allowed to have NULL values? Explain
with an example.


3. Differentiate between:
a) Database state and database schema
b) Primary key and foreign key
c) Degree and cardinality of a relation
4. Compared to a file system, how does a database management
system avoid redundancy in data through a database?
5. What are the limitations of file system that can be overcome
by a relational DBMS?
6. A school has a rule that each student must participate in a
sports activity. So each one should give only one preference
for sports activity. Suppose there are five students in a class,
each having a unique roll number. The class representative
has prepared a list of sports preferences as shown below.
Answer the following:
Table: Sports Preferences
Roll_no Preference
9 Cricket
13 Football
17 Badminton
17 Football
21 Hockey
24 NULL
NULL Kabaddi
a) Roll no 24 may not be interested in sports. Can a NULL
value be assigned to that student’s preference field?
b) Roll no 17 has given two sports preferences. Which
property of relational DBMS is violated here? Can we use
any constraint or key in the relational DBMS to check
against such violation, if any?
c) Kabaddi was not chosen by any student. Is it possible to
have this tuple in the Sports Preferences relation?
7. In another class having 2 sections, the two respective class
representatives have prepared 2 separate Sports Preferences
tables, as shown below:
Sports preference of section 1 (arranged on roll number column)
Table: Sports Preferences
Roll_no Sports
9 Cricket
13 Football
17 Badminton
21 Hockey
24 Cricket
Sports preference of section 2 (arranged on Sports name
column, and column order is also different)


Table: Sports Preferences


Sports Roll_no
Badminton 17
Cricket 9
Cricket 24
Football 13
Hockey 21

Are the states of both the relations equivalent?
Justify.
8. The school canteen wants to maintain records of items
available in the school canteen and generate bills when
students purchase any item from the canteen. The school
wants to create a canteen database to keep track of items
in the canteen and the items purchased by students.
Design a database by answering the following questions:
a) To store each item name along with its price, what
relation should be used? Decide appropriate attribute
names along with their data type. Each item and its
price should be stored only once. What restriction
should be used while defining the relation?
b) In order to generate bill, we should know the quantity
of an item purchased. Should this information be in
a new relation or a part of the previous relation? If
a new relation is required, decide appropriate name
and data type for attributes. Also, identify appropriate
primary key and foreign key so that the following two
restrictions are satisfied:
i) The same bill cannot be generated for different
orders.
ii) Bill can be generated only for available items in
the canteen.
c) The school wants to find out how many calories
students consume when they order an item. In which
relation should the attribute ‘calories’ be stored?
9. An organisation wants to create a database
EMP-DEPENDENT to maintain the following details about its
employees and their dependents.
EMPLOYEE(AadharNumber, Name, Address,
Department,EmployeeID)
DEPENDENT(EmployeeID, DependentName,
Relationship)
a) Name the attributes of EMPLOYEE, which can be
used as candidate keys.
b) The company wants to retrieve details of dependent
of a particular employee. Name the tables and the
key which are required to retrieve this detail.

c) What is the degree of the EMPLOYEE and DEPENDENT
relations?
10. School uniform is available at M/s Sheetal Private
Limited. They have maintained SCHOOL_UNIFORM
Database with two relations viz. UNIFORM and COST.
The following figure shows database schema and its state.

School Uniform Database

Attributes and Constraints

Table: UNIFORM
Attribute     UCode         UName      UColor
Constraints   Primary Key   Not Null   -

Table: COST
Attribute     UCode   Size   Price
Constraints   Composite Primary Key (UCode, Size); Price >0

Table: UNIFORM
UCode   UName   UColor
1       Shirt   White
2       Pant    Grey
3       Skirt   Grey
4       Tie     Blue
5       Socks   Blue
6       Belt    Blue

Table: COST
UCode   Size   Price
1       M      500
1       L      580
1       XL     620
2       M      810
2       L      890
2       XL     940
3       M      770
3       L      830
3       XL     910
4       S      150
4       L      170
5       S      180
5       L      210
6       M      110
6       L      140
6       XL     160

a) Can they insert the following tuples to the UNIFORM
Relation? Give reasons in support of your answer.
i) 7, Handkerchief, NULL
ii) 4, Ribbon, Red
iii) 8, NULL, White
b) Can they insert the following tuples to the COST
Relation? Give reasons in support of your answer.
i) 7, S, 0
ii) 9, XL, 100
11. In a multiplex, movies are screened in different
auditoriums. One movie can be shown in more than one
auditorium. In order to maintain the record of movies,
the multiplex maintains a relational database consisting
of two relations viz. MOVIE and AUDI respectively as
shown below:
Movie(Movie_ID, MovieName, ReleaseDate)
Audi(AudiNo, Movie_ID, Seats, ScreenType,
TicketPrice)

a) Is it correct to assign Movie_ID as the primary
key in the MOVIE relation? If no, then suggest an
appropriate primary key.
b) Is it correct to assign AudiNo as the primary key in
the AUDI relation? If no, then suggest appropriate
primary key.
c) Is there any foreign key in any of these relations?

Student Project Database

Table: STUDENT
Roll No   Name    Class   Section   Registration_ID
11        Mohan   XI      1         IP-101-15
12        Sohan   XI      2         IP-104-15
21        John    XII     1         CS-103-14
22        Meena   XII     2         CS-101-14
23        Juhi    XII     2         CS-101-10

Table: PROJECT ASSIGNED
Registration_ID   ProjectNo
IP-101-15         101
IP-104-15         103
CS-103-14         102
CS-101-14         105
CS-101-10         104

Table: PROJECT
ProjectNo   PName                SubmissionDate
101         Airline Database     12/01/2018
102         Library Database     12/01/2018
103         Employee Database    15/01/2018
104         Student Database     12/01/2018
105         Inventory Database   15/01/2018
106         Railway Database     15/01/2018

12. For the above given database STUDENT-PROJECT,
answer the following:
a) Name primary key of each table.
b) Find foreign key(s) in table PROJECT-ASSIGNED.
c) Is there any alternate key in table STUDENT? Give
justification for your answer.
d) Can a user assign duplicate value to the field RollNo
of STUDENT table? Justify.
13. For the above given database STUDENT-PROJECT, can
we perform the following operations?
a) Insert a student record with missing roll number
value.
b) Insert a student record with missing registration
number value.
c) Insert a project detail without submission-date.
d) Insert a record with registration ID IP-101-19 and
ProjectNo 206 in table PROJECT-ASSIGNED.



Chapter 8
Introduction to Structured Query Language (SQL)

“The most important motivation for the
research work that resulted in the relational
model was the objective of providing a sharp
and clear boundary between the logical and
physical aspects of database management.”
– E. F. Codd

In this chapter
» Introduction
» Structured Query Language (SQL)
» Data Types and Constraints in MySQL
» SQL for Data Definition
» SQL for Data Manipulation
» SQL for Data Query
» Data Updation and Deletion

8.1 Introduction
We have learnt about Relational Database
Management System (RDBMS) and its purpose in the
previous chapter. There are many RDBMS such
as MySQL, Microsoft SQL Server, PostgreSQL,
Oracle, etc. that allow us to create a database
consisting of relations and to link one or more
relations for efficient querying to store, retrieve
and manipulate data on that database. In this
chapter, we will learn how to create, populate and
query a database using MySQL.


8.2 Structured Query Language (SQL)


One has to write application programs to access data in
case of a file system. However, for database management
systems there are special kinds of programming
languages called query languages that can be used to
access data from the database. The Structured Query
Language (SQL) is the most popular query language
used by major relational database management systems
such as MySQL, ORACLE, SQL Server, etc.
SQL is easy to learn as the statements consist of
descriptive English words and are not case sensitive.
We can create and interact with a database using SQL
in an efficient and easy way. The benefit with SQL is
that we don’t have to specify how to get the data from
the database. Rather, we simply specify what is to be
retrieved, and SQL does the rest. Although called a query
language, SQL can do much more besides querying.
SQL provides statements for defining the structure of
the data, manipulating data in the database, declaring
constraints and retrieving data from the database in
various ways, depending on our requirements.
In this chapter, we will learn how to create a database
using MySQL as the RDBMS software. We will create a
database called StudentAttendance (Figure 7.5) that we
had identified in the previous chapter. We will also learn
how to populate the database with data, manipulate that
data and retrieve data from the database through SQL
queries.
8.2.1 Installing MySQL
MySQL is an open source RDBMS software which can
be easily downloaded from the official website
https://dev.mysql.com/downloads. After installing MySQL,
start the MySQL service. The appearance of the mysql> prompt
(Figure 8.1) means that MySQL is ready for us to enter
SQL statements.

Figure 8.1: MySQL Shell

A few rules to follow while writing SQL statements in
MySQL:
• SQL is case insensitive. That means name and NAME
are the same for SQL.
• Always end SQL statements with a semicolon (;).
• To enter multiline SQL statements, we don’t write
‘;’ after the first line. We press Enter to continue on
the next line. The prompt mysql> then changes to ‘->’,
indicating that the statement is continued on the next
line. After the last line, put ‘;’ and press Enter.

Activity 8.1
Explore LibreOffice Base and compare it with MySQL.

8.3 Data Types and Constraints in MySQL

We know that a database consists of one or more
relations and each relation (table) is made up of attributes
(columns). Each attribute has a data type. We can also
specify constraints for each attribute of a relation.

Activity 8.2
What are the other data types supported in
MySQL? Are there other variants of integer and
float data type?

8.3.1 Data type of Attribute
Data type indicates the type of data value that an
attribute can have. The data type of an attribute decides
the operations that can be performed on the data of
that attribute. For example, arithmetic operations can
be performed on numeric data but not on character
data. Commonly used data types in MySQL are numeric
types, date and time types, and string (character and
byte) types as shown in Table 8.1.

Think and Reflect
Can you think of an attribute for which
fixed length string is suitable?

Table 8.1 Commonly used data types in MySQL
Data type | Description
CHAR(n) | Specifies character type data of length n where n could be any value from 0 to 255. CHAR is of fixed length, which means that declaring CHAR(10) reserves space for 10 characters. If data does not have 10 characters (for example, ‘city’ has four characters), MySQL fills the remaining 6 characters with spaces padded on the right.
VARCHAR(n) | Specifies character type data of length n where n could be any value from 0 to 65535. But unlike CHAR, VARCHAR is a variable-length data type. That is, declaring VARCHAR(30) means a maximum of 30 characters can be stored but the actual allocated bytes will depend on the length of the entered string. So ‘city’ in VARCHAR(30) will occupy the space needed to store 4 characters only.
INT | INT specifies an integer value. Each INT value occupies 4 bytes of storage. The range of values allowed in integer type is -2147483648 to 2147483647. For values larger than that, we have to use BIGINT, which occupies 8 bytes.
FLOAT | Holds numbers with decimal points. Each FLOAT value occupies 4 bytes.
DATE | The DATE type is used for dates in 'YYYY-MM-DD' format. YYYY is the 4 digit year, MM is the 2 digit month and DD is the 2 digit day. The supported range is '1000-01-01' to '9999-12-31'.
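
The storage figures in Table 8.1 can be checked with a little arithmetic. The following is only a sketch in plain Python (it does not connect to MySQL): it derives the INT and BIGINT ranges from their byte sizes and imitates the right-padding that a fixed-length CHAR(10) column implies. The helper name signed_range is made up for this illustration.

```python
def signed_range(num_bytes):
    """Return (minimum, maximum) of a signed integer stored in num_bytes bytes."""
    bits = 8 * num_bytes
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1

int_min, int_max = signed_range(4)        # INT occupies 4 bytes
bigint_min, bigint_max = signed_range(8)  # BIGINT occupies 8 bytes
print(int_min, int_max)

# Imitate CHAR(10): 'city' has 4 characters, so 6 spaces are padded on the right.
padded = 'city'.ljust(10)
print(repr(padded), len(padded) - len('city'))
```

The same calculation with 8 bytes shows why BIGINT is needed once a value no longer fits in the INT range.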

8.3.2 Constraints
Constraints are certain types of restrictions on the data
values that an attribute can have. They are used to
ensure the accuracy and reliability of data. However, it
is not mandatory to define constraint for each attribute
of a table. Table 8.2 lists various SQL constraints.

Think and Reflect
Which two constraints when applied together
will produce a Primary Key constraint?

Table 8.2 Commonly used SQL Constraints
Constraint | Description
NOT NULL | Ensures that a column cannot have NULL values where NULL means missing/unknown/not applicable value.
UNIQUE | Ensures that all the values in a column are distinct/unique.
DEFAULT | A default value specified for the column if no value is provided.
PRIMARY KEY | The column which can uniquely identify each row or record in a table.
FOREIGN KEY | The column which refers to value of an attribute defined as primary key in another table.
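
To see these constraints at work without a MySQL server, one option is Python's built-in sqlite3 module. This is an assumption made for illustration only: SQLite is a different RDBMS, its error messages differ from MySQL's, and the table MEMBER and its columns are hypothetical. The sketch tries three inserts that break a rule and records whether each one is refused.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("""
    CREATE TABLE MEMBER (
        MemberId INTEGER PRIMARY KEY,
        MName    VARCHAR(20) NOT NULL,
        MPhone   CHAR(10) UNIQUE,
        MCity    VARCHAR(20) DEFAULT 'Delhi'
    )
""")
conn.execute(
    "INSERT INTO MEMBER (MemberId, MName, MPhone) VALUES (1, 'Asha', '9000000001')")

def rejected(sql):
    """Return True if the statement is refused because it breaks a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

null_name_rejected = rejected(
    "INSERT INTO MEMBER (MemberId, MName) VALUES (2, NULL)")            # NOT NULL
dup_phone_rejected = rejected(
    "INSERT INTO MEMBER (MemberId, MName, MPhone) VALUES (3, 'Ravi', '9000000001')")  # UNIQUE
dup_key_rejected = rejected(
    "INSERT INTO MEMBER (MemberId, MName) VALUES (1, 'Copy')")          # PRIMARY KEY
# MCity was not supplied for member 1, so the DEFAULT value is stored.
default_city = conn.execute(
    "SELECT MCity FROM MEMBER WHERE MemberId = 1").fetchone()[0]
print(null_name_rejected, dup_phone_rejected, dup_key_rejected, default_city)
```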

8.4 SQL for Data Definition


SQL provides commands for defining the relation
schemas, modifying relation schemas and deleting
relations. These commands are called the Data Definition
Language (DDL), through which the set of relations is
specified, including their schema, the data type for each
attribute, the constraints as well as the security and
access related authorisations.
Data definition starts with the CREATE statement. This
statement is used to create a database and its tables
(relations). Before creating a database, we should be
clear about the number of tables in the database, the
columns (attributes) in each table along with the data
type of each column. This is how we decide the relation
schema.
8.4.1 CREATE Database
To create a database, we use the CREATE DATABASE
statement as shown in the following syntax:
CREATE DATABASE databasename;


To create a database called StudentAttendance, we
will type the following command at the mysql prompt.
mysql> CREATE DATABASE StudentAttendance;
Query OK, 1 row affected (0.02 sec)
Note: In LINUX environment, names for database and tables
are case-sensitive whereas in WINDOWS, there is no such
differentiation. However, as a good practice, it is suggested to write
database or table name in the same letter cases that were used at
the time of their creation.
A DBMS can manage multiple databases on one
computer. Therefore, we need to select the database
that we want to use. Once the database is selected, we
can proceed with creating tables or querying data. Write
the following SQL statement for using the database:
mysql> USE StudentAttendance;
Database changed
Initially, the created database is empty. It can be
checked by using the SHOW TABLES command that lists
names of all the tables within a database.
mysql> SHOW TABLES;
Empty set (0.06 sec)

Activity 8.3
Type the statement SHOW DATABASES;. Does
it show the name of the StudentAttendance
database?

8.4.2 CREATE Table


After creating database StudentAttendance, we need
to define relations (create tables) in this database and
specify attributes for each relation along with data types
for each attribute. This is done using the CREATE TABLE
statement.
Syntax:
CREATE TABLE tablename(
attributename1 datatype constraint,
attributename2 datatype constraint,
:
attributenameN datatype constraint);
It is important to observe the following points with
respect to the Create Table statement:
• N is the degree of the relation, means there are N
columns in the table.
• Attribute name specifies the name of the column in
the table.
• Datatype specifies the type of data that an attribute
can hold.
• Constraint indicates the restrictions imposed on the
values of an attribute. By default, each attribute can
take NULL values except for the primary key.


Let us identify data types of the attributes of table
STUDENT along with their constraint, if any. Assuming
maximum students in a class to be 100 and values of
roll number in a sequence from 1 to 100, we know that
3 digits are sufficient to store values for the attribute
RollNumber. Hence, data type INT is appropriate for this
attribute. Total number of characters in student names
(SName) can differ. Assuming maximum characters in
a name as 20, we use VARCHAR(20) for SName column.
Data type for the attribute SDateofBirth is DATE and
supposing the school uses guardian’s 12 digit Aadhaar
number as GUID, we can declare GUID as CHAR (12)
since Aadhaar number is of fixed length and we are not
going to perform any mathematical operation on GUID.
Tables 8.3, 8.4 and 8.5 show the chosen data type and
constraint for each attribute of the relations STUDENT,
GUARDIAN and ATTENDANCE, respectively.
Table 8.3 Data types and constraints for the attributes of relation STUDENT
Attribute Name | Data expected to be stored | Data type | Constraint
RollNumber | Numeric value consisting of maximum 3 digits | INT | PRIMARY KEY
SName | Variable length string of maximum 20 characters | VARCHAR(20) | NOT NULL
SDateofBirth | Date value | DATE | NOT NULL
GUID | Numeric value consisting of 12 digits | CHAR(12) | FOREIGN KEY

Table 8.4 Data types and constraints for the attributes of relation GUARDIAN
Attribute Name | Data expected to be stored | Data type | Constraint
GUID | Numeric value consisting of 12 digit Aadhaar number | CHAR(12) | PRIMARY KEY
GName | Variable length string of maximum 20 characters | VARCHAR(20) | NOT NULL
GPhone | Numeric value consisting of 10 digits | CHAR(10) | NULL UNIQUE
GAddress | Variable length string of size 30 characters | VARCHAR(30) | NOT NULL

Table 8.5 Data types and constraints for the attributes of relation ATTENDANCE
Attribute Name | Data expected to be stored | Data type | Constraint
AttendanceDate | Date value | DATE | PRIMARY KEY*
RollNumber | Numeric value consisting of maximum 3 digits | INT | PRIMARY KEY*, FOREIGN KEY
AttendanceStatus | ‘P’ for present and ‘A’ for absent | CHAR(1) | NOT NULL
*means part of composite primary key

Once data types and constraints are identified, let us
create tables without specifying constraint along with
the attribute name for simplification. We will learn to
incorporate constraints on attributes in Section 8.4.4.


Example 8.1 Create table STUDENT.
mysql> CREATE TABLE STUDENT(
-> RollNumber INT,
-> SName VARCHAR(20),
-> SDateofBirth DATE,
-> GUID CHAR(12),
-> PRIMARY KEY (RollNumber));
Query OK, 0 rows affected (0.91 sec)
Note: ‘,’ is used to separate two attributes and each statement
terminates with a semi-colon (;). The symbol ‘->’ indicates line
continuation as SQL statement may not complete in a single line.

Think and Reflect
Can we have a CHAR or VARCHAR data type
for contact number (mobile, landline)?

8.4.3 DESCRIBE Table
We can view the structure of an already created table
using the DESCRIBE statement.
Syntax:
DESCRIBE tablename;

Activity 8.4
Create the other two relations GUARDIAN
and ATTENDANCE as per data types
given in Table 8.4 and 8.5, and view their
structures. Don't add any constraint in the
two tables.

MySQL also supports the short form DESC of DESCRIBE
to get description of table. To retrieve details about the
structure of relation STUDENT, we can write DESC or
DESCRIBE followed by table name:
mysql> DESC STUDENT;
+--------------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------------+-------------+------+-----+---------+-------+
| RollNumber | int | NO | PRI | NULL | |
| SName | varchar(20) | YES | | NULL | |
| SDateofBirth | date | YES | | NULL | |
| GUID | char(12) | YES | | NULL | |
+--------------+-------------+------+-----+---------+-------+
4 rows in set (0.06 sec)
The SHOW TABLES command will now list the table
STUDENT:
mysql> SHOW TABLES;
+------------------------------+
| Tables_in_studentattendance |
+------------------------------+
| student |
+------------------------------+
1 row in set (0.00 sec)
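
The create-then-inspect cycle can also be tried from Python. The sketch below assumes the built-in sqlite3 module as a stand-in for MySQL, so the inspection commands differ: SQLite has no DESCRIBE or SHOW TABLES, and PRAGMA table_info and the sqlite_master catalogue are used as rough counterparts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE STUDENT (
        RollNumber   INT,
        SName        VARCHAR(20),
        SDateofBirth DATE,
        GUID         CHAR(12),
        PRIMARY KEY (RollNumber)
    )
""")

# PRAGMA table_info is SQLite's rough counterpart of DESCRIBE.
# Each row is (cid, name, type, notnull, dflt_value, pk).
columns = conn.execute("PRAGMA table_info(STUDENT)").fetchall()
for cid, name, col_type, notnull, default, pk in columns:
    print(name, col_type, "PRI" if pk else "")

# sqlite_master lists the tables of the database, much like SHOW TABLES.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
print(tables)
```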

8.4.4 ALTER Table


After creating a table we may realize that we need to
add/remove an attribute or to modify the datatype of an
existing attribute or to add a constraint to an attribute. In all
such cases, we need to change or alter the structure of
the table by using the ALTER statement.
Syntax:
ALTER TABLE tablename ADD/MODIFY/DROP attribute1,
attribute2,..


(A) Add primary key to a relation


Let us now alter the tables created in Activity 8.4. The
below MySQL statement adds a primary key to the
GUARDIAN relation:
mysql> ALTER TABLE GUARDIAN ADD PRIMARY KEY (GUID);
Query OK, 0 rows affected (1.14 sec)
Records: 0 Duplicates: 0 Warnings: 0
Now let us add primary key to the ATTENDANCE
relation. The primary key of this relation is a composite
key made up of two attributes — AttendanceDate and
RollNumber.
mysql> ALTER TABLE ATTENDANCE
-> ADD PRIMARY KEY(AttendanceDate,
-> RollNumber);
Query OK, 0 rows affected (0.52 sec)
Records: 0 Duplicates: 0 Warnings: 0
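
The effect of a composite primary key can be sketched in Python with the built-in sqlite3 module (an assumption for illustration; SQLite does not support ALTER TABLE ... ADD PRIMARY KEY, so the key is declared while creating the table). The attendance dates below are made-up sample values. Only the pair (AttendanceDate, RollNumber) has to be unique; either value may repeat on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE ATTENDANCE (
        AttendanceDate   DATE,
        RollNumber       INT,
        AttendanceStatus CHAR(1),
        PRIMARY KEY (AttendanceDate, RollNumber)
    )
""")
conn.execute("INSERT INTO ATTENDANCE VALUES ('2018-09-01', 1, 'P')")
conn.execute("INSERT INTO ATTENDANCE VALUES ('2018-09-01', 2, 'A')")  # same date, new roll number
conn.execute("INSERT INTO ATTENDANCE VALUES ('2018-09-02', 1, 'P')")  # same roll number, new date

try:
    # The pair ('2018-09-01', 1) already exists, so this row is refused.
    conn.execute("INSERT INTO ATTENDANCE VALUES ('2018-09-01', 1, 'A')")
    duplicate_pair_rejected = False
except sqlite3.IntegrityError:
    duplicate_pair_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM ATTENDANCE").fetchone()[0]
print(duplicate_pair_rejected, row_count)
```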
(B) Add foreign key to a relation
Once primary keys are added the next step is to add
foreign keys to the relation (if any). A relation may have
multiple foreign keys and each foreign key is defined on
a single attribute. Following points need to be observed
while adding foreign key to a relation:
• The referenced relation must be already created.
• The referenced attribute must be a part of primary
key of the referenced relation.
• Data types and size of referenced and referencing
attributes must be same.
Syntax:
ALTER TABLE table_name ADD FOREIGN KEY(attribute
name) REFERENCES referenced_table_name
(attribute name);
Let us now add foreign key to the table STUDENT.
Table 8.3 shows that attribute GUID (the referencing
attribute) is a foreign key and it refers to attribute GUID
(the referenced attribute) of table GUARDIAN (Table 8.4).
Hence, STUDENT is the referencing table and GUARDIAN
is the referenced table.
mysql> ALTER TABLE STUDENT
-> ADD FOREIGN KEY(GUID) REFERENCES
-> GUARDIAN(GUID);
Query OK, 0 rows affected (0.75 sec)
Records: 0 Duplicates: 0 Warnings: 0

Think and Reflect
Name foreign keys in table ATTENDANCE
and STUDENT. Is there any foreign key in table
GUARDIAN?

(C) Add constraint UNIQUE to an existing attribute
In GUARDIAN table, attribute GPhone has a constraint
UNIQUE which means no two values in that column
should be same.
Syntax:
ALTER TABLE table_name ADD UNIQUE (attribute name);
Let us now add the constraint UNIQUE with attribute
GPhone of the table GUARDIAN as shown in Table 8.4.
mysql> ALTER TABLE GUARDIAN
-> ADD UNIQUE(GPhone);
Query OK, 0 rows affected (0.44 sec)
Records: 0 Duplicates: 0 Warnings: 0

Activity 8.5
Add foreign key in the ATTENDANCE
table (use fig. 8.1 to identify referencing and
referenced tables).

(D) Add an attribute to an existing table
Sometimes, we may need to add an additional attribute
in a table. It can be done using the syntax given below:
ALTER TABLE table_name ADD attribute_name DATATYPE;
Suppose the principal of the school has decided to
award scholarship to some needy students for which
income of the guardian must be known. But school has
not maintained income attribute with table GUARDIAN
so far. Therefore, the database designer now needs to
add a new attribute income of data type INT in the table
GUARDIAN.
mysql> ALTER TABLE GUARDIAN
-> ADD income INT;
Query OK, 0 rows affected (0.47 sec)
Records: 0 Duplicates: 0 Warnings: 0

Think and Reflect
What are the minimum and maximum income
values that can be entered in the income
attribute given the data type is INT?

(E) Modify datatype of an attribute
We can modify data types of the existing attributes of a
table using the following ALTER statement.
Syntax:
ALTER TABLE table_name MODIFY attribute DATATYPE;
Suppose we need to change the size of attribute
GAddress from VARCHAR(30) to VARCHAR(40) of the
GUARDIAN table. The MySQL statement will be:
mysql> ALTER TABLE GUARDIAN
-> MODIFY GAddress VARCHAR(40);
Query OK, 0 rows affected (0.11 sec)
Records: 0 Duplicates: 0 Warnings: 0

(F) Modify constraint of an attribute


When we create a table, by default each attribute can take
NULL values except for the attribute defined as primary
key. We can change an attribute’s constraint from NULL
to NOT NULL using the ALTER statement.
Syntax:
ALTER TABLE table_name MODIFY attribute DATATYPE
NOT NULL;
Note: We have to specify the data type of the attribute along with
constraint NOT NULL while using MODIFY.


To associate NOT NULL constraint with attribute
SName of table STUDENT (Table 8.3), we write the
following MySQL statement:
mysql> ALTER TABLE STUDENT
-> MODIFY SName VARCHAR(20) NOT NULL;
Query OK, 0 rows affected (0.47 sec)
Records: 0 Duplicates: 0 Warnings: 0

(G) Add default value to an attribute


If we want to specify default value for an attribute, then
use the following syntax:
ALTER TABLE table_name MODIFY attribute DATATYPE
DEFAULT default_value;
To set default value of SDateofBirth of STUDENT to
15th May 2000, we write the following statement. The
date value is enclosed in single quotes:
mysql> ALTER TABLE STUDENT
-> MODIFY SDateofBirth DATE DEFAULT
-> '2000-05-15';
Query OK, 0 rows affected (0.08 sec)
Records: 0 Duplicates: 0 Warnings: 0
Note: We have to specify the data type of the attribute along with
DEFAULT while using MODIFY.
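
A date used as a default value has to be written as a quoted string; without the quotes, SQL reads 2000-05-15 as a subtraction. The sketch below assumes Python's built-in sqlite3 module in place of MySQL. SQLite has no MODIFY clause, so the default is declared while creating the table, and the reduced STUDENT table here is only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Without quotes, 2000-05-15 is ordinary arithmetic: 2000 - 5 - 15.
unquoted = conn.execute("SELECT 2000-05-15").fetchone()[0]

conn.execute("""
    CREATE TABLE STUDENT (
        RollNumber   INT PRIMARY KEY,
        SName        VARCHAR(20) NOT NULL,
        SDateofBirth DATE DEFAULT '2000-05-15'
    )
""")
# No date of birth is supplied, so the default value is stored.
conn.execute("INSERT INTO STUDENT (RollNumber, SName) VALUES (1, 'Atharv Ahuja')")
stored = conn.execute(
    "SELECT SDateofBirth FROM STUDENT WHERE RollNumber = 1").fetchone()[0]
print(unquoted, stored)
```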

(H) Remove an attribute


Using ALTER, we can remove attributes from a table, as
shown in the syntax below:
ALTER TABLE table_name DROP attribute;
To remove the attribute income from the table
GUARDIAN (Table 8.4), we can write the following MySQL
statement:
mysql> ALTER TABLE GUARDIAN DROP income;
Query OK, 0 rows affected (0.42 sec)
Records: 0 Duplicates: 0 Warnings: 0
(I) Remove primary key from the table
While creating a table, we may have specified incorrect
primary key. In such case, we need to drop the existing
primary key of the table and add a new primary key.
Syntax:
ALTER TABLE table_name DROP PRIMARY KEY;
To remove primary key of table GUARDIAN (Table 8.4),
we write the following MySQL statement:
mysql> ALTER TABLE GUARDIAN DROP PRIMARY KEY;
Query OK, 0 rows affected (0.72 sec)
Records: 0 Duplicates: 0 Warnings: 0
Note: We have dropped primary key from GUARDIAN table, but
each table should have a primary key to maintain uniqueness.
Hence, we have to use ADD command to specify primary key for
the GUARDIAN table as shown in earlier examples.


8.4.5 DROP Statement


Sometimes a table in a database or the database itself
needs to be removed. We can use DROP statement to
remove a database or a table permanently from the
system. However, one should be very cautious while
using this statement as it cannot be undone.
Syntax to drop a table:
DROP TABLE table_name;
Syntax to drop a database:
DROP DATABASE database_name;

Cautions:
1) Using the Drop statement to remove a database will
ultimately remove all the tables within it.
2) DROP statement will remove the tables or database
created by you. Hence you may apply DROP statement at
the end of the chapter.
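
A safe way to watch DROP at work is to try it on a throwaway table. The sketch below assumes Python's built-in sqlite3 module with an in-memory database, so nothing created in MySQL is touched; the table name SCRATCH and the helper table_names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SCRATCH (Id INT)")  # hypothetical throwaway table
conn.execute("INSERT INTO SCRATCH VALUES (1)")

def table_names():
    """List the tables currently present in the database."""
    return [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]

before = table_names()
conn.execute("DROP TABLE SCRATCH")  # removes the structure together with its rows
after = table_names()
print(before, after)
```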

8.5 SQL for Data Manipulation


In the previous section, we created the database
StudentAttendance having three relations STUDENT,
GUARDIAN and ATTENDANCE. When we create a table,
only its structure is created but the table has no data.
To populate records in the table, INSERT statement is
used. Similarly, table records can be deleted or updated
using SQL data manipulation statements.
Data Manipulation using a database means either
retrieval (access) of existing data, insertion of new data,
removal of existing data or modification of existing data
in the database.
8.5.1 INSERTION of Records
INSERT INTO statement is used to insert new records in
a table. Its syntax is:
INSERT INTO tablename
VALUES(value 1, value 2,....);
Here, value 1 corresponds to attribute 1, value 2
corresponds to attribute 2 and so on. Note that we need
not specify attribute names in the INSERT statement if
there are exactly the same number of values in the INSERT
statement as the total number of attributes in the table.
Caution: While populating records in a table with foreign
key, ensure that records in referenced tables are already
populated.
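
The caution above can be observed directly. This is a sketch that assumes Python's built-in sqlite3 module instead of MySQL; SQLite checks foreign keys only after PRAGMA foreign_keys = ON, and the two tables are cut down to a few columns for brevity. A student row that points to a guardian who has not been inserted yet is refused, and the same row is accepted once the guardian exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.execute("CREATE TABLE GUARDIAN (GUID CHAR(12) PRIMARY KEY, GName VARCHAR(20))")
conn.execute("""
    CREATE TABLE STUDENT (
        RollNumber INT PRIMARY KEY,
        SName      VARCHAR(20),
        GUID       CHAR(12),
        FOREIGN KEY (GUID) REFERENCES GUARDIAN (GUID)
    )
""")

try:
    # The referenced guardian row does not exist yet, so this insert is refused.
    conn.execute("INSERT INTO STUDENT VALUES (1, 'Atharv Ahuja', '444444444444')")
    early_insert_rejected = False
except sqlite3.IntegrityError:
    early_insert_rejected = True

# Populate the referenced table first, then the referencing table.
conn.execute("INSERT INTO GUARDIAN VALUES ('444444444444', 'Amit Ahuja')")
conn.execute("INSERT INTO STUDENT VALUES (1, 'Atharv Ahuja', '444444444444')")
student_count = conn.execute("SELECT COUNT(*) FROM STUDENT").fetchone()[0]
print(early_insert_rejected, student_count)
```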


Let us insert some records in the StudentAttendance
database. We shall insert records in the GUARDIAN
table first as it does not have any foreign key. We are
going to insert the records given in Table 8.6.
Table 8.6 Records to be inserted into the GUARDIAN table
GUID | GName | GPhone | GAddress
444444444444 | Amit Ahuja | 5711492685 | G-35, Ashok Vihar, Delhi
111111111111 | Baichung Bhutia | 3612967082 | Flat no. 5, Darjeeling Appt., Shimla
101010101010 | Himanshu Shah | 4726309212 | 26/77, West Patel Nagar, Ahmedabad
333333333333 | Danny Dsouza | | S -13, Ashok Village, Daman
466444444666 | Sujata P. | 3801923168 | HNO-13, B- block, Preet Vihar, Madurai

The statement below inserts the first record in the
table.
mysql> INSERT INTO GUARDIAN
-> VALUES (444444444444, 'Amit Ahuja',
-> 5711492685, 'G-35, Ashok vihar, Delhi' );
Query OK, 1 row affected (0.01 sec)
We can use the SQL statement SELECT * from table_
name to view the inserted records. The SELECT statement
will be explained in next section.
mysql> SELECT * from GUARDIAN;
+--------------+-----------------+------------+-------------------------------+
| GUID | GName | Gphone | GAddress |
+--------------+-----------------+------------+-------------------------------+
| 444444444444 | Amit Ahuja | 5711492685 | G-35, Ashok vihar, Delhi |
+--------------+-----------------+------------+-------------------------------+
1 row in set (0.00 sec)
If we want to provide values only for some of the
attributes in a table (supposing other attributes having
NULL or any other default value), then we shall specify
the attribute name alongside each data value as shown
in the following syntax of INSERT INTO statement.
Syntax:
INSERT INTO tablename (column1, column2, ...)
VALUES (value1, value2, ...);
To insert the fourth record of Table 8.6 where GPhone
is not given, we need to insert values in the other three
fields (GPhone was set to NULL by default at the time
of table creation). In this case, we have to specify the
names of attributes in which we want to insert values.
The values must be given in the same order in which
attributes are written in INSERT command.
mysql> INSERT INTO GUARDIAN(GUID, GName, GAddress)
-> VALUES (333333333333, 'Danny Dsouza',
-> 'S -13, Ashok Village, Daman' );
Query OK, 1 row affected (0.03 sec)
Note: Text and date values must be enclosed in ‘ ’ (single quotes).

Activity 8.6
Write SQL statements to insert the remaining
3 rows of table 8.6 in table GUARDIAN.

mysql> SELECT * from GUARDIAN;
+--------------+--------------+------------+----------------------------------+
| GUID | GName | Gphone | GAddress |
+--------------+--------------+------------+----------------------------------+
| 333333333333 | Danny Dsouza | NULL | S -13, Ashok Village, Daman |
| 444444444444 | Amit Ahuja | 5711492685 | G-35, Ashok vihar, Delhi |
+--------------+--------------+------------+----------------------------------+
2 rows in set (0.00 sec)
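
Both forms of INSERT can be compared side by side in Python. The sketch assumes the built-in sqlite3 module in place of MySQL; the ? placeholders are a feature of the Python database interface, not part of the syntax shown in this section. A column left out of the column list is stored as NULL, which Python reports as None.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE GUARDIAN (
        GUID     CHAR(12) PRIMARY KEY,
        GName    VARCHAR(20) NOT NULL,
        GPhone   CHAR(10) UNIQUE,
        GAddress VARCHAR(30) NOT NULL
    )
""")
# Form 1: no column list, so a value is supplied for every column, in table order.
conn.execute(
    "INSERT INTO GUARDIAN VALUES (?, ?, ?, ?)",
    ("444444444444", "Amit Ahuja", "5711492685", "G-35, Ashok Vihar, Delhi"),
)
# Form 2: column list given and GPhone left out, so GPhone is stored as NULL.
conn.execute(
    "INSERT INTO GUARDIAN (GUID, GName, GAddress) VALUES (?, ?, ?)",
    ("333333333333", "Danny Dsouza", "S -13, Ashok Village, Daman"),
)
# Map each GUID to its phone number to inspect what was stored.
phones = dict(conn.execute("SELECT GUID, GPhone FROM GUARDIAN"))
print(phones)
```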
Let us now insert the records given in Table 8.7 into
the STUDENT table.
Table 8.7 Records to be inserted into the STUDENT table
RollNumber | SName | SDateofBirth | GUID
1 | Atharv Ahuja | 2003-05-15 | 444444444444
2 | Daizy Bhutia | 2002-02-28 | 111111111111
3 | Taleem Shah | 2002-02-28 |
4 | John Dsouza | 2003-08-18 | 333333333333
5 | Ali Shah | 2003-07-05 | 101010101010
6 | Manika P. | 2002-03-10 | 466444444666

To insert the first record of Table 8.7, we write the
following MySQL statement:
mysql> INSERT INTO STUDENT
-> VALUES(1,'Atharv Ahuja','2003-05-15',
-> 444444444444);
Query OK, 1 row affected (0.11 sec)
OR
mysql> INSERT INTO STUDENT (RollNumber, SName,
-> SDateofBirth, GUID)
-> VALUES (1,'Atharv Ahuja','2003-05-15',
-> 444444444444);
Query OK, 1 row affected (0.02 sec)
mysql> SELECT * from STUDENT;
+------------+--------------+--------------+--------------+
| RollNumber | SName | SDateofBirth | GUID |
+------------+--------------+--------------+--------------+
| 1 | Atharv Ahuja | 2003-05-15 | 444444444444 |
+------------+--------------+--------------+--------------+
1 row in set (0.00 sec)

Let us now insert the third record of Table 8.7 where
GUID is NULL. Recall that GUID is foreign key of this
table and thus can take NULL value. Hence, we can put
NULL value for GUID and insert the record by using the
following statement:
(Recall that Date is stored in “YYYY-MM-DD” format.)
mysql> INSERT INTO STUDENT
-> VALUES(3, 'Taleem Shah','2002-02-28',
-> NULL);
Query OK, 1 row affected (0.05 sec)
mysql> SELECT * from STUDENT;
+------------+--------------+--------------+--------------+
| RollNumber | SName | SDateofBirth | GUID |
+------------+--------------+--------------+--------------+
| 1 | Atharv Ahuja | 2003-05-15 | 444444444444 |
| 3 | Taleem Shah | 2002-02-28 | NULL |
+------------+--------------+--------------+--------------+
2 rows in set (0.00 sec)
Activity 8.7
Write SQL statements to insert the remaining
4 rows of table 8.7 in table STUDENT.

We had to write NULL in the above MySQL statement
because when not giving the column names, we need
to give values for all the columns. Otherwise, we have
to give names of attributes along with the values if we
need to insert data only for certain attributes, as shown
in the next query, which is an alternative way of inserting
the same record:
mysql> INSERT INTO STUDENT (RollNumber, SName,
-> SDateofBirth) VALUES (3, 'Taleem Shah',
-> '2002-02-28');
Query OK, 1 row affected (0.05 sec)
In the above statement we are informing DBMS
to insert the corresponding values for the mentioned
columns and GUID would be assigned NULL value.
mysql> SELECT * from STUDENT;
+------------+--------------+--------------+--------------+
| RollNumber | SName | SDateofBirth | GUID |
+------------+--------------+--------------+--------------+
| 1 | Atharv Ahuja | 2003-05-15 | 444444444444 |
| 3 | Taleem Shah | 2002-02-28 | NULL |
+------------+--------------+--------------+--------------+
2 rows in set (0.00 sec)

Think and Reflect
• Which of the above syntax should be
used when we are not sure of the order
(with respect to the column) in which
the values are to be inserted in the table?
• Can we insert two records with the
same roll number?

8.6 SQL for Data Query
So far we have learnt how to create database as well
as to store and manipulate data. We store data in a
database because it is easier to retrieve data
in future from databases in whatever way we want.
The Structured Query Language (SQL) has efficient
mechanisms to retrieve data stored in multiple tables
in a MySQL database (or any other RDBMS). The
user enters the SQL commands called queries where
the specific requirements for data to be retrieved are
provided. The SQL statement SELECT is used to retrieve
data from the tables in a database and is also called
query statement.


8.6.1 SELECT Statement


The SQL statement SELECT is used to retrieve data from
the tables in a database and the output is also displayed
in tabular form.
Syntax:
SELECT attribute1, attribute2, ...
FROM table_name
WHERE condition
Here, attribute1, attribute2, ... are the column names
of the table table_name from which we want to retrieve
data. The FROM clause is always written with SELECT
clause as it specifies the name of the table from which
data is to be retrieved. The WHERE clause is optional and
is used to retrieve data that meet specified condition(s).
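
The three parts of a SELECT statement can be tried out from Python as well. This sketch assumes the built-in sqlite3 module as a stand-in for MySQL and loads only two of the STUDENT records used earlier in the chapter. The attribute list decides which columns come back, and the WHERE condition decides which rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE STUDENT (RollNumber INT PRIMARY KEY, SName VARCHAR(20), SDateofBirth DATE)")
conn.executemany(
    "INSERT INTO STUDENT VALUES (?, ?, ?)",
    [(1, "Atharv Ahuja", "2003-05-15"),
     (3, "Taleem Shah", "2002-02-28")],
)
# Only the listed attributes are returned, and only for rows meeting the condition.
selected = conn.execute(
    "SELECT SName, SDateofBirth FROM STUDENT WHERE RollNumber = 3").fetchall()
# Without WHERE, every row of the table is returned.
everyone = conn.execute("SELECT SName FROM STUDENT").fetchall()
print(selected, len(everyone))
```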
Example 8.2 To display the name and date of birth of student
with roll number 1, we write the following query:
mysql> SELECT SName, SDateofBirth
-> FROM STUDENT
-> WHERE RollNumber = 1;
+--------------+--------------+
| SName | SDateofBirth |
+--------------+--------------+
| Atharv Ahuja | 2003-05-15 |
+--------------+--------------+
1 row in set (0.03 sec)

Think and Reflect
Can you think of examples from daily
life where storing and querying data
in a database can be helpful?

8.6.2 QUERYING using Database OFFICE
Different organisations maintain databases to
store data in the form of tables. Let us consider the
database OFFICE of an organisation that has many
related tables like EMPLOYEE, DEPARTMENT and
so on. Every EMPLOYEE in the database is assigned
to a DEPARTMENT and his/her Department number
(DeptId) is stored as a foreign key in the table EMPLOYEE.
Let us consider some data for the table ‘EMPLOYEE’ as
shown in Table 8.8 and apply the SELECT statement to
retrieve data:
Table 8.8 EMPLOYEE
EmpNo | Ename | Salary | Bonus | DeptId
101 | Aaliya | 10000 | 234 | D02
102 | Kritika | 60000 | 123 | D01
103 | Shabbir | 45000 | 566 | D01
104 | Gurpreet | 19000 | 565 | D04
105 | Joseph | 34000 | 875 | D03
106 | Sanya | 48000 | 695 | D02
107 | Vergese | 15000 | | D01
108 | Nachaobi | 29000 | | D05
109 | Daribha | 42000 | | D04
110 | Tanya | 50000 | 467 | D05

(A) Retrieve selected columns


The following query displays employee numbers of all
the employees:
mysql> SELECT EmpNo
-> FROM EMPLOYEE;
+-------+
| EmpNo |
+-------+
| 101 |
| 102 |
| 103 |
| 104 |
| 105 |
| 106 |
| 107 |
| 108 |
| 109 |
| 110 |
+-------+
10 rows in set (0.41 sec)
To display the employee number and employee name
of all the employees, we write the following query:
mysql> SELECT EmpNo, Ename
-> FROM EMPLOYEE;
+-------+----------+
| EmpNo | Ename |
+-------+----------+
| 101 | Aaliya |
| 102 | Kritika |
| 103 | Shabbir |
| 104 | Gurpreet |
| 105 | Joseph |
| 106 | Sanya |
| 107 | Vergese |
| 108 | Nachaobi |
| 109 | Daribha |
| 110 | Tanya |
+-------+----------+
10 rows in set (0.00 sec)
(B) Renaming of columns
In case we want to rename any column while displaying
the output, we can do so by using the alias keyword AS in the
query. For example, to display Employee name as Name in
the output for all the employees, we write:
mysql> SELECT EName AS Name
-> FROM EMPLOYEE;
+----------+
| Name |
+----------+
| Aaliya |
| Kritika |
| Shabbir |
| Gurpreet |
| Joseph |
| Sanya |
| Vergese |
| Nachaobi |
| Daribha |
| Tanya |
+----------+
10 rows in set (0.00 sec)

Example 8.3 Display names of all employees along with their
annual salary (Salary*12). While displaying query result,
rename EName as Name.
mysql> SELECT EName AS Name, Salary*12
-> FROM EMPLOYEE;
+----------+-----------+
| Name | Salary*12 |
+----------+-----------+
| Aaliya | 120000 |
| Kritika | 720000 |
| Shabbir | 540000 |
| Gurpreet | 228000 |
| Joseph | 408000 |
| Sanya | 576000 |
| Vergese | 180000 |
| Nachaobi | 348000 |
| Daribha | 504000 |
| Tanya | 600000 |
+----------+-----------+
10 rows in set (0.02 sec)
Observe that the output uses the expression Salary*12 as
the heading of the annual salary column. We can use an
alias to display that column as Annual Salary, as shown
below:
mysql> SELECT Ename AS Name, Salary*12 AS
-> 'Annual Salary'
-> FROM EMPLOYEE;
+----------+---------------+
| Name | Annual Salary |
+----------+---------------+
| Aaliya | 120000 |
| Kritika | 720000 |
| Shabbir | 540000 |
| Gurpreet | 228000 |
| Joseph | 408000 |
| Sanya | 576000 |
| Vergese | 180000 |
| Nachaobi | 348000 |
| Daribha | 504000 |
| Tanya | 600000 |
+----------+---------------+
10 rows in set (0.00 sec)

Note:
i) Annual Salary will not be added as a new column in the
database table. It is just for displaying the output of the
query.
ii) If an aliased column name has space as in the case of Annual
Salary, it should be enclosed in quotes as 'Annual Salary'.
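
The two points in the note above can be checked with a short script. This is only a sketch under stated assumptions: it uses Python's built-in sqlite3 module with an in-memory database as a stand-in for a MySQL server, and it loads just three rows and three columns of Table 8.8. The alias is written in double quotes here, which SQLite accepts for names that contain a space.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (EmpNo INTEGER, Ename TEXT, Salary INTEGER)")
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?, ?)",
    [(101, "Aaliya", 10000), (102, "Kritika", 60000), (103, "Shabbir", 45000)],
)

# The aliases rename only the columns of the query result.
cur = conn.execute('SELECT Ename AS Name, Salary*12 AS "Annual Salary" FROM EMPLOYEE')
headings = [col[0] for col in cur.description]
annual = dict(cur.fetchall())

# The stored table keeps its original columns; nothing new was added.
stored = [row[1] for row in conn.execute("PRAGMA table_info(EMPLOYEE)")]

print(headings)
print(annual)
print(stored)
```

The headings of the result carry the aliases, while the column list of the stored table is unchanged.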
(C) DISTINCT Clause
By default, SQL shows all the data retrieved through a
query, including any duplicate values. The SELECT
statement, when combined with the DISTINCT clause,
returns records without repetition (distinct records).
For example, while retrieving the department numbers of
employees, there can be duplicate values as many
employees are assigned to the same department. To
display each department number only once, we use
DISTINCT as shown below:
mysql> SELECT DISTINCT DeptId
-> FROM EMPLOYEE;
+--------+
| DeptId |
+--------+
| D02 |
| D01 |
| D04 |
| D03 |
| D05 |
+--------+
5 rows in set (0.03 sec)
(D) WHERE Clause
The WHERE clause is used to retrieve data that meet
some specified conditions. In the OFFICE database,
more than one employee can have the same salary. To
display the distinct salaries of the employees working in
department D01, we write the following query, in which
the WHERE clause specifies the condition that the
department number must be D01:
mysql> SELECT DISTINCT Salary
-> FROM EMPLOYEE
-> WHERE Deptid='D01';
As the column DeptId is of string type, its values are
enclosed in quotes ('D01').
+--------+
| Salary |
+--------+
| 60000 |
| 45000 |
| 15000 |
+--------+
3 rows in set (0.02 sec)
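
The effect of DISTINCT, alone and together with WHERE, can be reproduced with a short script. The sketch below assumes Python's built-in sqlite3 module as a stand-in for MySQL and keeps only the EmpNo, Salary and DeptId columns of Table 8.8. The results are sorted in Python only to make them easy to compare.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (EmpNo INTEGER, Salary INTEGER, DeptId TEXT)")
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?)", [
    (101, 10000, "D02"), (102, 60000, "D01"), (103, 45000, "D01"),
    (104, 19000, "D04"), (105, 34000, "D03"), (106, 48000, "D02"),
    (107, 15000, "D01"), (108, 29000, "D05"), (109, 42000, "D04"),
    (110, 50000, "D05"),
])

# Without DISTINCT there is one value per employee, duplicates included.
all_depts = [r[0] for r in conn.execute("SELECT DeptId FROM EMPLOYEE")]

# With DISTINCT each department number appears only once.
unique_depts = sorted(r[0] for r in conn.execute("SELECT DISTINCT DeptId FROM EMPLOYEE"))

# DISTINCT combined with a WHERE condition on a string column.
d01_salaries = sorted(
    r[0] for r in conn.execute("SELECT DISTINCT Salary FROM EMPLOYEE WHERE DeptId = 'D01'")
)

print(len(all_depts), unique_depts, d01_salaries)
```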

In the above example, we have used the = operator in the
WHERE clause. We can also use other relational operators
(<, <=, >, >=, !=) to specify conditions. The logical
operators AND, OR, and NOT are used with the WHERE clause
to combine multiple conditions.
Example 8.4 Display all the employees who are earning more
than 5000 and work in department with DeptId D04.
mysql> SELECT *
-> FROM EMPLOYEE
-> WHERE Salary > 5000 AND DeptId = 'D04';
+-------+----------+--------+-------+--------+
| EmpNo | Ename | Salary | Bonus | DeptId |
+-------+----------+--------+-------+--------+
| 104 | Gurpreet | 19000 | 565 | D04 |
| 109 | Daribha | 42000 | NULL | D04 |
+-------+----------+--------+-------+--------+
2 rows in set (0.00 sec)

Example 8.5 The following query displays records of all the
employees except Aaliya.
mysql> SELECT *
-> FROM EMPLOYEE
-> WHERE NOT Ename = 'Aaliya';
+-------+----------+--------+-------+--------+
| EmpNo | Ename | Salary | Bonus | DeptId |
+-------+----------+--------+-------+--------+
| 102 | Kritika | 60000 | 123 | D01 |
| 103 | Shabbir | 45000 | 566 | D01 |
| 104 | Gurpreet | 19000 | 565 | D04 |
| 105 | Joseph | 34000 | 875 | D03 |
| 106 | Sanya | 48000 | 695 | D02 |
| 107 | Vergese | 15000 | NULL | D01 |
| 108 | Nachaobi | 29000 | NULL | D05 |
| 109 | Daribha | 42000 | NULL | D04 |
| 110 | Tanya | 50000 | 467 | D05 |
+-------+----------+--------+-------+--------+
9 rows in set (0.00 sec)

Think and Reflect
What will happen if in the above query we write “Aaliya” as
“AALIYA” or “aaliya” or “AaLIYA”? Will the query generate
the same output or an error?
Example 8.6 The following query displays name and
department number of all those employees who are earning
salary between 20000 and 50000 (both values inclusive).
mysql> SELECT Ename, DeptId
-> FROM EMPLOYEE
-> WHERE Salary>=20000 AND Salary<=50000;
+----------+--------+
| Ename | DeptId |
+----------+--------+
| Shabbir | D01 |
| Joseph | D03 |
| Sanya | D02 |
| Nachaobi | D05 |
| Daribha | D04 |
| Tanya | D05 |
+----------+--------+
6 rows in set (0.00 sec)

Activity 8.8
Compare the output produced by the query in example 8.6
and the following query and differentiate between the OR
and AND operators.
SELECT *
FROM EMPLOYEE
WHERE Salary > 5000 OR DeptId= 20;

The above query defines a range that can also be
checked using the comparison operator BETWEEN.
mysql> SELECT Ename, DeptId
-> FROM EMPLOYEE
-> WHERE Salary BETWEEN 20000 AND 50000;
+----------+--------+
| Ename | DeptId |
+----------+--------+
| Shabbir | D01 |
| Joseph | D03 |
| Sanya | D02 |
| Nachaobi | D05 |
| Daribha | D04 |
| Tanya | D05 |
+----------+--------+
6 rows in set (0.03 sec)
Note: The BETWEEN operator defines the range of values within
which the column value must fall for the condition to be true.
Both boundary values are included in the range.
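
That BETWEEN is inclusive and matches the pair of comparisons joined by AND can be verified with a short script. The sketch below assumes Python's built-in sqlite3 module as a stand-in for MySQL and keeps only the Ename and Salary columns of Table 8.8. The helper function names_for is a hypothetical name introduced for this illustration. The last query replaces AND with OR to show how the two logical operators differ.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (Ename TEXT, Salary INTEGER)")
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?)", [
    ("Aaliya", 10000), ("Kritika", 60000), ("Shabbir", 45000),
    ("Gurpreet", 19000), ("Joseph", 34000), ("Sanya", 48000),
    ("Vergese", 15000), ("Nachaobi", 29000), ("Daribha", 42000),
    ("Tanya", 50000),
])

def names_for(condition):
    """Return the sorted names of employees satisfying the WHERE condition."""
    query = "SELECT Ename FROM EMPLOYEE WHERE " + condition
    return sorted(row[0] for row in conn.execute(query))

with_between = names_for("Salary BETWEEN 20000 AND 50000")
with_and = names_for("Salary >= 20000 AND Salary <= 50000")
# With OR a row needs to satisfy only one of the two comparisons.
with_or = names_for("Salary >= 20000 OR Salary <= 50000")

print(with_between)
print(with_between == with_and)
print(len(with_or))
```

Tanya, who earns exactly 50000, appears in the BETWEEN result because the boundary values are included. With OR, every salary satisfies at least one of the two comparisons, so all ten employees are selected.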
Example 8.7 The following query displays details of all the
employees who are working either in DeptId D01, D02 or
D04.
mysql> SELECT *
-> FROM EMPLOYEE
-> WHERE DeptId = 'D01' OR DeptId = 'D02' OR
-> DeptId = 'D04';
+-------+----------+--------+-------+--------+
| EmpNo | Ename | Salary | Bonus | DeptId |
+-------+----------+--------+-------+--------+
| 101 | Aaliya | 10000 | 234 | D02 |
| 102 | Kritika | 60000 | 123 | D01 |
| 103 | Shabbir | 45000 | 566 | D01 |
| 104 | Gurpreet | 19000 | 565 | D04 |
| 106 | Sanya | 48000 | 695 | D02 |
| 107 | Vergese | 15000 | NULL | D01 |
| 109 | Daribha | 42000 | NULL | D04 |
+-------+----------+--------+-------+--------+
7 rows in set (0.00 sec)
(E) Membership Operator IN
The IN operator compares a value with a set of values
and returns true if the value belongs to that set. The
above query can be rewritten using IN operator as
shown below:
mysql> SELECT *
-> FROM EMPLOYEE
-> WHERE DeptId IN ('D01', 'D02' , 'D04');
+-------+----------+--------+-------+--------+
| EmpNo | Ename | Salary | Bonus | DeptId |
+-------+----------+--------+-------+--------+
| 101 | Aaliya | 10000 | 234 | D02 |
| 102 | Kritika | 60000 | 123 | D01 |
| 103 | Shabbir | 45000 | 566 | D01 |
| 104 | Gurpreet | 19000 | 565 | D04 |
| 106 | Sanya | 48000 | 695 | D02 |
| 107 | Vergese | 15000 | NULL | D01 |
| 109 | Daribha | 42000 | NULL | D04 |
+-------+----------+--------+-------+--------+
7 rows in set (0.00 sec)

Example 8.8 The following query displays details of all the
employees except those working in department number D01
or D02.
mysql> SELECT *
-> FROM EMPLOYEE
-> WHERE DeptId NOT IN('D01', 'D02');
+-------+----------+--------+-------+--------+
| EmpNo | Ename | Salary | Bonus | DeptId |
+-------+----------+--------+-------+--------+
| 104 | Gurpreet | 19000 | 565 | D04 |
| 105 | Joseph | 34000 | 875 | D03 |
| 108 | Nachaobi | 29000 | NULL | D05 |
| 109 | Daribha | 42000 | NULL | D04 |
| 110 | Tanya | 50000 | 467 | D05 |
+-------+----------+--------+-------+--------+
5 rows in set (0.00 sec)
Note: Here we need to combine NOT with IN as we want to retrieve
all records except with DeptId D01 and D02.
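
The relationship between IN, a chain of OR conditions, and NOT IN can be tried out with a short script. This is a sketch only: it assumes Python's built-in sqlite3 module as a stand-in for MySQL, keeps just the EmpNo and DeptId columns of Table 8.8, and uses a hypothetical helper named emp_nos.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (EmpNo INTEGER, DeptId TEXT)")
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?)", [
    (101, "D02"), (102, "D01"), (103, "D01"), (104, "D04"), (105, "D03"),
    (106, "D02"), (107, "D01"), (108, "D05"), (109, "D04"), (110, "D05"),
])

def emp_nos(condition):
    """Return the sorted employee numbers satisfying the WHERE condition."""
    query = "SELECT EmpNo FROM EMPLOYEE WHERE " + condition
    return sorted(row[0] for row in conn.execute(query))

with_in = emp_nos("DeptId IN ('D01', 'D02', 'D04')")
with_or = emp_nos("DeptId = 'D01' OR DeptId = 'D02' OR DeptId = 'D04'")
with_not_in = emp_nos("DeptId NOT IN ('D01', 'D02')")

print(with_in)
print(with_in == with_or)
print(with_not_in)
```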
(F) ORDER BY Clause
ORDER BY clause is used to display data in an ordered
(arranged) form with respect to a specified column. By
default, ORDER BY displays records in ascending order of
the specified column’s values. To display the records in
descending order, the DESC (means descending) keyword
needs to be written with that column.
Example 8.9 The following query displays details of all the
employees in ascending order of their salaries.
mysql> SELECT *
-> FROM EMPLOYEE
-> ORDER BY Salary;
+-------+----------+--------+-------+--------+
| EmpNo | Ename | Salary | Bonus | DeptId |
+-------+----------+--------+-------+--------+
| 101 | Aaliya | 10000 | 234 | D02 |
| 107 | Vergese | 15000 | NULL | D01 |
| 104 | Gurpreet | 19000 | 565 | D04 |
| 108 | Nachaobi | 29000 | NULL | D05 |
| 105 | Joseph | 34000 | 875 | D03 |
| 109 | Daribha | 42000 | NULL | D04 |
| 103 | Shabbir | 45000 | 566 | D01 |
| 106 | Sanya | 48000 | 695 | D02 |
| 110 | Tanya | 50000 | 467 | D05 |
| 102 | Kritika | 60000 | 123 | D01 |
+-------+----------+--------+-------+--------+
10 rows in set (0.05 sec)

Example 8.10 The following query displays details of all the
employees in descending order of their salaries.
mysql> SELECT *
-> FROM EMPLOYEE
-> ORDER BY Salary DESC;
+-------+----------+--------+-------+--------+
| EmpNo | Ename | Salary | Bonus | DeptId |
+-------+----------+--------+-------+--------+
| 102 | Kritika | 60000 | 123 | D01 |
| 110 | Tanya | 50000 | 467 | D05 |
| 106 | Sanya | 48000 | 695 | D02 |
| 103 | Shabbir | 45000 | 566 | D01 |
| 109 | Daribha | 42000 | NULL | D04 |
| 105 | Joseph | 34000 | 875 | D03 |
| 108 | Nachaobi | 29000 | NULL | D05 |
| 104 | Gurpreet | 19000 | 565 | D04 |
| 107 | Vergese | 15000 | NULL | D01 |
| 101 | Aaliya | 10000 | 234 | D02 |
+-------+----------+--------+-------+--------+
10 rows in set (0.00 sec)
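
The two orderings can be compared side by side with a short script. The sketch below assumes Python's built-in sqlite3 module as a stand-in for MySQL and keeps only the Ename and Salary columns of Table 8.8. It also uses the keyword ASC, which is standard SQL for stating the default ascending order explicitly but is not used in the text above.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (Ename TEXT, Salary INTEGER)")
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?)", [
    ("Aaliya", 10000), ("Kritika", 60000), ("Shabbir", 45000),
    ("Gurpreet", 19000), ("Joseph", 34000), ("Sanya", 48000),
    ("Vergese", 15000), ("Nachaobi", 29000), ("Daribha", 42000),
    ("Tanya", 50000),
])

def names(query):
    """Return the first column of every row, in the order the query gives."""
    return [row[0] for row in conn.execute(query)]

ascending = names("SELECT Ename FROM EMPLOYEE ORDER BY Salary")
explicit_asc = names("SELECT Ename FROM EMPLOYEE ORDER BY Salary ASC")
descending = names("SELECT Ename FROM EMPLOYEE ORDER BY Salary DESC")

print(ascending)
print(descending)
```

Since no two employees in this table earn the same salary, the descending list is exactly the ascending list reversed.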
Activity 8.9
Execute the following two queries and find out what will
happen if we specify two columns in the ORDER BY clause:
SELECT *
FROM EMPLOYEE
ORDER BY Salary, Bonus;

SELECT *
FROM EMPLOYEE
ORDER BY Salary, Bonus DESC;

(G) Handling NULL Values
SQL supports a special value called NULL to represent
a missing or unknown value. For example, the village
column in a table called address will have no value for
cities. Hence, NULL is used to represent such unknown
values. It is important to note that NULL is different
from 0 (zero). Also, any arithmetic operation performed
with a NULL value gives NULL. For example, 5 + NULL =
NULL: because NULL is unknown, the result is also
unknown. To check for a NULL value in a column, we use
IS NULL.
Example 8.11 The following query displays details of all
those employees who have not been given a bonus. This
means that the Bonus column holds NULL for them.
mysql> SELECT *
-> FROM EMPLOYEE
-> WHERE Bonus IS NULL;
+-------+----------+--------+-------+--------+
| EmpNo | Ename | Salary | Bonus | DeptId |
+-------+----------+--------+-------+--------+
| 107 | Vergese | 15000 | NULL | D01 |
| 108 | Nachaobi | 29000 | NULL | D05 |
| 109 | Daribha | 42000 | NULL | D04 |
+-------+----------+--------+-------+--------+
3 rows in set (0.00 sec)
Example 8.12 The following query displays names of all the
employees who have been given a bonus. This implies that
the bonus column will not be blank.
mysql> SELECT EName
-> FROM EMPLOYEE
-> WHERE Bonus IS NOT NULL;
+----------+
| EName |
+----------+
| Aaliya |
| Kritika |
| Shabbir |
| Gurpreet |
| Joseph |
| Sanya |
| Tanya |
+----------+
7 rows in set (0.00 sec)
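
The behaviour of NULL described in this section can be observed with a short script. This is a sketch under stated assumptions: Python's built-in sqlite3 module stands in for MySQL, only the EmpNo, Salary and Bonus columns of Table 8.8 are loaded, and Python's None plays the role of NULL.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (EmpNo INTEGER, Salary INTEGER, Bonus INTEGER)")
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?)", [
    (101, 10000, 234), (102, 60000, 123), (103, 45000, 566),
    (104, 19000, 565), (105, 34000, 875), (106, 48000, 695),
    (107, 15000, None), (108, 29000, None), (109, 42000, None),
    (110, 50000, 467),
])

# Arithmetic with NULL gives NULL (returned to Python as None).
five_plus_null = conn.execute("SELECT 5 + NULL").fetchone()[0]
total_pay = dict(conn.execute("SELECT EmpNo, Salary + Bonus FROM EMPLOYEE"))

# Comparing with = NULL never succeeds; IS NULL is the correct test.
equals_null = conn.execute("SELECT EmpNo FROM EMPLOYEE WHERE Bonus = NULL").fetchall()
is_null = sorted(r[0] for r in conn.execute("SELECT EmpNo FROM EMPLOYEE WHERE Bonus IS NULL"))
is_not_null = conn.execute("SELECT COUNT(*) FROM EMPLOYEE WHERE Bonus IS NOT NULL").fetchone()[0]

print(five_plus_null, total_pay[101], total_pay[107])
print(equals_null, is_null, is_not_null)
```

The condition Bonus = NULL selects no rows at all, which is why IS NULL and IS NOT NULL are needed.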

(H) Substring pattern matching
Many times we come across situations where we do not
want to query by matching the exact text or value. Rather,
we are interested in matching only a few characters of
the column values, for example, to find names starting
with ‘T’ or pin codes starting with ‘60’. This is called
substring pattern matching. We cannot match such
patterns using the = operator as we are not looking for
an exact match. SQL provides the LIKE operator, which can
be used with the WHERE clause to search for a specified
pattern in a column.
The LIKE operator makes use of the following two
wild card characters:
• % (percent): used to represent zero, one, or multiple
characters
• _ (underscore): used to represent a single character

Example 8.13 The following query displays details of all those
employees whose name starts with 'K'.
mysql> SELECT *
-> FROM EMPLOYEE
-> WHERE Ename LIKE 'K%';

+-------+---------+--------+-------+--------+
| EmpNo | Ename | Salary | Bonus | DeptId |
+-------+---------+--------+-------+--------+
| 102 | Kritika | 60000 | 123 | D01 |
+-------+---------+--------+-------+--------+
1 row in set (0.00 sec)

Example 8.14 The following query displays details of all
those employees whose name ends with 'a'.
mysql> SELECT *
-> FROM EMPLOYEE
-> WHERE Ename LIKE '%a';
+-------+---------+--------+-------+--------+
| EmpNo | Ename | Salary | Bonus | DeptId |
+-------+---------+--------+-------+--------+
| 101 | Aaliya | 10000 | 234 | D02 |
| 102 | Kritika | 60000 | 123 | D01 |
| 106 | Sanya | 48000 | 695 | D02 |
| 109 | Daribha | 42000 | NULL | D04 |
| 110 | Tanya | 50000 | 467 | D05 |
+-------+---------+--------+-------+--------+
5 rows in set (0.00 sec)

Example 8.15 The following query displays details of all
those employees whose name consists of exactly 5 letters
and starts with any letter but has ‘ANYA’ after that.
mysql> SELECT *
-> FROM EMPLOYEE
-> WHERE Ename LIKE '_ANYA';
+-------+-------+--------+-------+--------+
| EmpNo | Ename | Salary | Bonus | DeptId |
+-------+-------+--------+-------+--------+
| 106 | Sanya | 48000 | 695 | D02 |
| 110 | Tanya | 50000 | 467 | D05 |
+-------+-------+--------+-------+--------+
2 rows in set (0.00 sec)

Think and Reflect
When we type the first letter of a contact name in the
contact list on our mobile phones, all the names
containing that character are displayed. Can you relate
an SQL statement to this process? List other real-life
situations where you can visualize an SQL statement in
operation.

Example 8.16 The following query displays names of all the
employees containing 'se' as a substring in name.
mysql> SELECT Ename
-> FROM EMPLOYEE
-> WHERE Ename LIKE '%se%';
+---------+
| Ename |
+---------+
| Joseph |
| Vergese |
+---------+
2 rows in set (0.00 sec)

Example 8.17 The following query displays names of all
employees containing 'a' as the second character.
mysql> SELECT EName
-> FROM EMPLOYEE
-> WHERE Ename LIKE '_a%';
+----------+
| EName |
+----------+
| Aaliya |
| Sanya |
| Nachaobi |
| Daribha |
| Tanya |
+----------+
5 rows in set (0.00 sec)
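
The five patterns used in Examples 8.13 to 8.17 can be run together with a short script. The sketch below assumes Python's built-in sqlite3 module as a stand-in for MySQL and loads only the names from Table 8.8. The helper matching is a hypothetical name introduced for this illustration. Like MySQL with its default settings, SQLite's LIKE ignores the case of ASCII letters, so '_ANYA' matches 'Sanya'.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (EmpNo INTEGER, Ename TEXT)")
names = ["Aaliya", "Kritika", "Shabbir", "Gurpreet", "Joseph",
         "Sanya", "Vergese", "Nachaobi", "Daribha", "Tanya"]
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?)",
                 [(101 + i, name) for i, name in enumerate(names)])

def matching(pattern):
    """Return the names that match the LIKE pattern, in EmpNo order."""
    query = "SELECT Ename FROM EMPLOYEE WHERE Ename LIKE ? ORDER BY EmpNo"
    return [row[0] for row in conn.execute(query, (pattern,))]

# % stands for zero or more characters, _ for exactly one character.
results = {p: matching(p) for p in ["K%", "%a", "_ANYA", "%se%", "_a%"]}
for pattern, found in results.items():
    print(pattern, found)
```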

8.7 Data Updation and Deletion
Updation and deletion of data are also part of SQL
data manipulation. In this section, we apply these two
data manipulation operations.
8.7.1 Data Updation
We may need to make changes in the value(s) of one or
more columns of existing records in a table. For example,
we may require some changes in address, phone number
or spelling of name, etc. The UPDATE statement is used to
make such modifications in the existing data.
Syntax:
UPDATE table_name
SET attribute1 = value1, attribute2 = value2, ...
WHERE condition;
The STUDENT table (Table 8.7) has a NULL value for GUID
for the student with roll number 3. Suppose the students
with roll numbers 3 and 5 are siblings. So, in the STUDENT
table, we need to set the GUID of the student with
roll number 3 to 101010101010. To update the GUID of a
particular row (record), we specify that record using
the WHERE clause, as shown below:
mysql> UPDATE STUDENT
-> SET GUID = 101010101010
-> WHERE RollNumber = 3;
Query OK, 1 row affected (0.06 sec)
Rows matched: 1 Changed: 1 Warnings: 0
We can then verify the updated data using the
statement SELECT * FROM STUDENT.
Caution: If we omit the WHERE clause in the UPDATE statement, then
the GUID of all the records will be changed to 101010101010.
We can also update values for more than one column
using the UPDATE statement. Suppose, the guardian
(Table 8.6) with GUID 466444444666 has requested to
change the Address to 'WZ - 68, Azad Avenue, Bijnour,
MP' and Phone number to '4817362092'.
mysql> UPDATE GUARDIAN
-> SET GAddress = 'WZ - 68, Azad Avenue,
-> Bijnour, MP', GPhone = 4817362092
-> WHERE GUID = 466444444666;
Query OK, 1 row affected (0.06 sec)
Rows matched: 1 Changed: 1 Warnings: 0
mysql> SELECT * FROM GUARDIAN ;
+------------+---------------+----------+------------------------------------+
|GUID |GName |Gphone |GAddress |
+------------+---------------+----------+------------------------------------+
|444444444444|Amit Ahuja |5711492685|G-35, Ashok vihar, Delhi |
|111111111111|Baichung Bhutia|3612967082|Flat no. 5, Darjeeling Appt., Shimla|
|101010101010|Himanshu Shah |4726309212|26/77, West Patel Nagar, Ahmedabad |
|333333333333|Danny Dsouza |NULL |S -13, Ashok Village, Daman |
|466444444666|Sujata P. |4817362092|WZ - 68, Azad Avenue, Bijnour, MP |
+------------+---------------+----------+------------------------------------+
5 rows in set (0.00 sec)
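
The role of the WHERE clause in an UPDATE statement can be seen by counting the rows each statement changes. The sketch below assumes Python's built-in sqlite3 module as a stand-in for MySQL and loads only three of the STUDENT rows, with GUID stored as text. The value '000000000000' is a made-up placeholder used only to show what an UPDATE without WHERE does.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STUDENT (RollNumber INTEGER PRIMARY KEY, SName TEXT, GUID TEXT)")
conn.executemany("INSERT INTO STUDENT VALUES (?, ?, ?)", [
    (1, "Atharv Ahuja", "444444444444"),
    (3, "Taleem Shah", None),            # GUID not yet known (NULL)
    (5, "Ali Shah", "101010101010"),
])

# With WHERE, only the record of roll number 3 is modified.
cur = conn.execute("UPDATE STUDENT SET GUID = '101010101010' WHERE RollNumber = 3")
changed_with_where = cur.rowcount
guids = dict(conn.execute("SELECT RollNumber, GUID FROM STUDENT"))

# Without WHERE, every record is modified (placeholder value, for illustration).
cur = conn.execute("UPDATE STUDENT SET GUID = '000000000000'")
changed_without_where = cur.rowcount

print(changed_with_where, guids)
print(changed_without_where)
```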

8.7.2 Data Deletion


The DELETE statement is used to delete one or more
record(s) from a table.
Syntax:
DELETE FROM table_name
WHERE condition;
Suppose the student with roll number 2 has left the
school. We can use the following MySQL statement to
delete that record from the STUDENT table.
mysql> DELETE FROM STUDENT WHERE RollNumber = 2;
Query OK, 1 row affected (0.06 sec)

mysql> SELECT * FROM STUDENT ;


+------------+--------------+--------------+--------------+
| RollNumber | SName | SDateofBirth | GUID |
+------------+--------------+--------------+--------------+
| 1 | Atharv Ahuja | 2003-05-15 | 444444444444 |
| 3 | Taleem Shah | 2002-02-28 | 101010101010 |
| 4 | John Dsouza | 2003-08-18 | 333333333333 |
| 5 | Ali Shah | 2003-07-05 | 101010101010 |
| 6 | Manika P. | 2002-03-10 | 466444444666 |
+------------+--------------+--------------+--------------+
5 rows in set (0.00 sec)

Caution: Like the UPDATE statement, we need to be careful to include
the WHERE clause while using the DELETE statement to delete records in a
table. Otherwise, all the records in the table will get deleted.
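
The caution above can be demonstrated on a throwaway table. This is a sketch only: it assumes Python's built-in sqlite3 module as a stand-in for MySQL, and the name given to roll number 2 is a hypothetical placeholder rather than data from the STUDENT table.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STUDENT (RollNumber INTEGER PRIMARY KEY, SName TEXT)")
conn.executemany("INSERT INTO STUDENT VALUES (?, ?)", [
    (1, "Atharv Ahuja"),
    (2, "Student Two"),   # hypothetical placeholder name
    (3, "Taleem Shah"),
])

# With WHERE, only the matching record is deleted.
cur = conn.execute("DELETE FROM STUDENT WHERE RollNumber = 2")
deleted = cur.rowcount
remaining = [r[0] for r in conn.execute("SELECT RollNumber FROM STUDENT ORDER BY RollNumber")]

# Without WHERE, every record in the table is deleted.
conn.execute("DELETE FROM STUDENT")
left = conn.execute("SELECT COUNT(*) FROM STUDENT").fetchone()[0]

print(deleted, remaining, left)
```

After the second statement the table still exists, but it holds no records.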

Summary
• A database is a collection of related tables. MySQL is a
‘relational’ DBMS. A table is a collection of rows and
columns, where each row is a record and each column
describes a feature of the records.
• SQL is the standard language for most RDBMS.
SQL is case insensitive.
• CREATE DATABASE statement is used to create a new
database.
• USE statement is used for making the specified
database as active database.
• CREATE TABLE statement is used to create a table.
• Every attribute in a CREATE TABLE statement must
have a name and a datatype.
• ALTER TABLE statement is used to make changes in
the structure of a table like adding, removing or
changing datatype of column(s).
• The DESC statement with table name shows the
structure of the table.
• INSERT INTO statement is used to insert record(s) in
a table.
• UPDATE statement is used to modify existing data in
a table.
• DELETE statement is used to delete records in a table.

• The SELECT statement is used to retrieve data from
one or more database tables.
• SELECT * FROM table_name displays data from all
the attributes of that table.
• The WHERE clause is used to enforce condition(s) in
a query.
• DISTINCT clause is used to eliminate repetition and
display the values only once.
• The BETWEEN operator defines the range of values
inclusive of boundary values.
• The IN operator selects values that match any value
in the given list of values.
• NULL values can be tested using IS NULL and IS
NOT NULL.
• ORDER BY clause is used to display the result of an
SQL query in ascending or descending order with
respect to specified attribute values. The default is
ascending order.
• The LIKE operator is used for pattern matching. % and _
are two wild card characters. The percent (%) symbol
is used to represent zero or more characters. The
underscore (_) symbol is used to represent a single
character.

Exercise
1. Match the following clauses with their respective
functions.
ALTER          Insert the values in a table
UPDATE         Restrictions on columns
DELETE         Table definition
INSERT INTO    Change the name of a column
CONSTRAINTS    Update existing information in a table
DESC           Delete an existing row from a table
CREATE         Create a database

2. Choose appropriate answer with respect to the following
code snippet.
CREATE TABLE student (
    name CHAR(30),
    student_id INT,
    gender CHAR(1),
    PRIMARY KEY (student_id)
);
a) What will be the degree of student table?
i) 30
ii) 1
iii) 3
iv) 4
b) What does ‘name’ represent in the above code snippet?
i) a table
ii) a row
iii) a column
iv) a database
c) What is true about the following SQL statement?
SelecT * fROM student;
i) Displays contents of table ‘student’
ii) Displays column names and contents of table
‘student’
iii) Results in error as improper case has been used
iv) Displays only the column names of table ‘student’
d) What will be the output of following query?
INSERT INTO student
VALUES (“Suhana”,109,’F’),
VALUES (“Rivaan”,102,’M’),
VALUES (“Atharv”,103,’M’),
VALUES (“Rishika”,105,’F’),
VALUES (“Garvit”,104,’M’),
VALUES (“Shaurya”,109,’M’);
i) Error
ii) No Error
iii) Depends on compiler
iv) Successful completion of the query
e) In the following query how many rows will be deleted?
DELETE student
WHERE student_id=109;
i) 1 row
ii) All the rows where student ID is equal to 109
iii) No row will be deleted
iv) 2 rows
3. Fill in the blanks:
a) ____________ declares that an index in one table is
related to that in another table.
i) Primary Key
ii) Foreign Key
iii) Composite Key
iv) Secondary Key
b) The symbol Asterisk (*) in a select query retrieves
____________.
i) All data from the table
ii) Data of primary key only

iii) NULL data
iv) None of the mentioned
4. Consider the following MOVIE database and answer the
SQL queries based on it.
MovieID  MovieName      Category   ReleaseDate  ProductionCost  BusinessCost
001      Hindi_Movie    Musical    2018-04-23   124500          130000
002      Tamil_Movie    Action     2016-05-17   112000          118000
003      English_Movie  Horror     2017-08-06   245000          360000
004      Bengali_Movie  Adventure  2017-01-04   72000           100000
005      Telugu_Movie   Action     -            100000          -
006      Punjabi_Movie  Comedy     -            30500           -

a) Retrieve movies information without mentioning their
column names.
b) List business done by the movies showing only
MovieID, MovieName and BusinessCost.
c) List the different categories of movies.
d) Find the net profit of each movie showing its ID, Name
and Net Profit.
(Hint: Net Profit = BusinessCost – ProductionCost)
Make sure that the new column name is labelled as
NetProfit. Is this column now a part of the MOVIE
relation? If not, what name is coined for such
columns? What can you say about the profit of a
movie which has not yet been released? Does your query
result show the profit as zero?
e) List all movies with ProductionCost greater than
80,000 and less than 1,25,000 showing ID, Name
and ProductionCost.
f) List all movies which fall in the category of Comedy
or Action.
g) List the movies which have not been released yet.
5. Suppose your school management has decided to conduct
cricket matches between students of class XI and Class
XII. Students of each class are asked to join any one
of the four teams — Team Titan, Team Rockers, Team
Magnet and Team Hurricane. During summer vacations,
various matches will be conducted between these teams.
Help your sports teacher to do the following:
a) Create a database “Sports”.
b) Create a table “TEAM” with following considerations:
i) It should have a column TeamID for storing an
integer value between 1 to 9, which refers to
unique identification of a team.
ii) Each TeamID should have its associated name
(TeamName), which should be a string of length
not less than 10 characters.

c) Using table level constraint, make TeamID as primary
key.
d) Show the structure of the table TEAM using SQL
command.
e) As per the preferences of the students four teams
were formed as given below. Insert these four rows in
TEAM table:
Row 1: (1, Team Titan)
Row 2: (2, Team Rockers)
Row 3: (3, Team Magnet)
Row 4: (4, Team Hurricane)
f) Show the contents of the table TEAM.
g) Now create another table MATCH_DETAILS as given
below and insert data as shown in the table. Choose
appropriate domains and constraints for each attribute.
Table: MATCH_DETAILS
MatchID  MatchDate   FirstTeamID  SecondTeamID  FirstTeamScore  SecondTeamScore
M1       2018-07-17  1            2             90              86
M2       2018-07-18  3            4             45              48
M3       2018-07-19  1            3             78              56
M4       2018-07-19  2            4             56              67
M5       2018-07-20  1            4             32              87
M6       2018-07-21  2            3             67              51

h) Use the foreign key constraint in the MATCH_DETAILS
table with reference to the TEAM table so that the
MATCH_DETAILS table records scores only of teams
existing in the TEAM table.
6. Using the sports database containing two relations
(TEAM, MATCH_DETAILS), answer the following
relational algebra queries.
a) Retrieve the MatchID of all those matches where both
the teams have scored > 70.
b) Retrieve the MatchID of all those matches where
FirstTeam has scored < 70 but SecondTeam has
scored > 70.
c) Find out the MatchID and date of matches played by
Team 1 and won by it.
d) Find out the MatchID of matches played by Team 2
and not won by it.
e) In the TEAM relation, change the name of the relation
to T_DATA. Also change the attributes TeamID and
TeamName to T_ID and T_NAME respectively.
7. Differentiate between the following commands:
a) ALTER and UPDATE
b) DELETE and DROP

8. Create a database called STUDENT_PROJECT having
the following tables. Choose appropriate data type and
apply necessary constraints.
Table: STUDENT
RollNo  Name  Stream  Section  RegistrationID

* The values in Stream column can be either Science, Commerce,
or Humanities.
* The values in Section column can be either I or II.

Table: PROJECT_ASSIGNED
RegistrationID  ProjectID  AssignDate

Table: PROJECT
ProjectID  ProjectName  SubmissionDate  TeamSize  GuideTeacher

a) Populate these tables with appropriate data.
b) Write SQL queries for the following.
c) Find the names of students in Science Stream.
d) What will be the primary keys of the three tables?
e) What are the foreign keys of the three relations?
f) Find the names of all the students who are studying in
the Commerce stream and are guided by the same teacher,
even if they are assigned different projects.
9. An organization ABC maintains a database EMP-DEPENDENT
to record the following details about its
employees and their dependents.
EMPLOYEE(AadhaarNo, Name, Address, Department,
EmpID)
DEPENDENT(EmpID, DependentName, Relationship)
Use the EMP-DEPENDENT database to answer the following
SQL queries:
a) Find the names of employees with their dependent
names.
b) Find employee details working in a department, say,
‘PRODUCTION’.
c) Find employee names having no dependent
d) Find names of employees working in a department,
say, ‘SALES’ and having exactly two dependents.
10. A shop called Wonderful Garments that sells school
uniforms maintains a database SCHOOL_UNIFORM as shown
below. It consists of two relations, UNIFORM and
PRICE. The shop made UniformCode (UCode) the primary key
of the UNIFORM relation. Further, it used UniformCode
(UCode) and Size as the composite key of the PRICE
relation. By analysing the database schema and database
state, specify SQL queries to rectify the following
anomalies.
a) The PRICE relation has an attribute named Price. In
order to avoid confusion, write an SQL query to change
the name of the relation PRICE to COST.
UNIFORM
UCode  UName  UColor
1      Shirt  White
2      Pant   Grey
3      Skirt  Grey
4      Tie    Blue
5      Socks  Blue
6      Belt   Blue

PRICE
UCode  Size  Price
1      M     500
1      L     580
1      XL    620
2      M     810
2      L     890
2      XL    940
3      M     770
3      L     830
3      XL    910
4      S     150
4      L     170
5      S     180
5      L     210
6      M     110
6      L     140
6      XL    160

b) M/S Wonderful Garments also keeps handkerchiefs
of red color, medium size, at ₹100 each. Insert this
record in COST table.
c) When you used the above query to insert data,
you were able to enter the values for handkerchief
without entering its details in the UNIFORM
relation. Make a provision so that the data can be
entered in COST table only if it is already there in
UNIFORM table.
d) Further, you should be able to assign a new UCode
to an item only if it has a valid UName. Write a
query to add appropriate constraint to the
SCHOOL_UNIFORM database.
e) ALTER table to add the constraint that price of an
item is always greater than zero.



Informatics Practices

Textbook for Class XII



12149 – INFORMATICS PRACTICES
Textbook for Class XII
ISBN 978-93-5292-361-8

First Edition
December 2020 Agrahayana 1942
Reprinted
January 2023 Pausha 1944
March 2024 Chaitra 1946

PD 50T SU

© National Council of Educational Research and Training, 2020

205.00

Printed on 80 GSM paper

Published at the Publication Division by the Secretary, National Council of
Educational Research and Training, Sri Aurobindo Marg, New Delhi 110016
and printed at Young Printing Press, S-119, Site-II, Harsha Compound,
Mohan Nagar Industrial Area, Ghaziabad (UP)

ALL RIGHTS RESERVED
• No part of this publication may be reproduced, stored in a
retrieval system or transmitted, in any form or by any means,
electronic, mechanical, photocopying, recording or otherwise
without the prior permission of the publisher.
• This book is sold subject to the condition that it shall not, by
way of trade, be lent, re-sold, hired out or otherwise disposed
of without the publisher’s consent, in any form of binding or
cover other than that in which it is published.
• The correct price of this publication is the price printed on
this page. Any revised price indicated by a rubber stamp or
by a sticker or by any other means is incorrect and should
be unacceptable.

OFFICES OF THE PUBLICATION DIVISION, NCERT
NCERT Campus, Sri Aurobindo Marg, New Delhi 110 016. Phone : 011-26562708
108, 100 Feet Road, Hosdakere Halli Extension, Banashankari III Stage, Bengaluru 560 085. Phone : 080-26725740
Navjivan Trust Building, P.O.Navjivan, Ahmedabad 380 014. Phone : 079-27541446
CWC Campus, Opp. Dhankal Bus Stop, Panihati, Kolkata 700 114. Phone : 033-25530454
CWC Complex, Maligaon, Guwahati 781 021. Phone : 0361-2674869

Publication Team
Head, Publication Division : Anup Kumar Rajput
Chief Editor : Shveta Uppal
Chief Production Officer : Arun Chitkara
Chief Business Manager (In charge) : Amitabh Kumar
Assistant Production Officer : Sunil Kumar
Cover and Layout : DTP Cell, Publication Division



Foreword
Information Technology has continuously been crossing the barriers of
access and communication and reaching more and more people. The number
of internet users in India has been on the rise. The tremendous growth in
computer science, telecommunications and information technology has
resulted in automation of various tasks and contributed to the ease of
living. Technology has made continuous inroads into diverse areas—be it
business, commerce, science, sports, health, transportation or education.
Today, we are living in an interconnected world where computer based
applications influence the way we learn, communicate, commute, or even
socialise.
With so many users of Information and Communication Technology
(ICT), huge volumes of data are continuously generated at an unprecedented
rate. Many innovative business models are being evolved which utilise such
data to reach potential customers in a more targeted way. Government
agencies are also using data to deliver services, fast-track the progress
of different programmes, strengthen accountability and make more
informed decisions. This has been creating better opportunities for our
youth, not only in the field of technical education but also in the
world of work. NCERT, for the first time, has developed a textbook on
‘Informatics Practices’ to develop skill sets in students to make use of the
opportunities provided by ICT.
This book focuses on the fundamental concepts related to handling of
data while opening a window to the emerging areas of data processing. It
seeks to address the dual challenges of reducing curricular load as well as
introducing the latest development in the field of ICT.
As an organisation committed to systemic reforms and continuous
improvement in the quality of its curricular material, NCERT welcomes
comments and suggestions to enable us to bring about necessary changes
in its further publications.


Hrushikesh Senapaty
Director
National Council of Educational Research and Training
New Delhi
August 2020



Preface
In the present education system of our country, specialised and
discipline based courses are introduced at the higher secondary stage.
This stage is crucial as well as challenging because of the transition
from general to discipline-based curriculum. The syllabus at this stage
needs to have sufficient rigour and depth while remaining mindful of the
comprehension level of the learners. Further, the textbook should not be
heavily loaded with content.
We are living in an era where information drives many of our
socio-economic decisions. Millions of people are accessing the internet round
the clock to avail various services, thereby generating vast amounts of
data. Processing of data is becoming a key skill with applications across the
disciplines. Thus, study of basic concepts of data handling and analysis
is becoming more and more desirable. There are courses offered in the
name of Computer Science, Information and Communication Technology
(ICT), Information Technology (IT), etc. by various boards and schools up
to secondary stage, as optional. These mainly focus on using computer for
word processing, presentation tools and application software.
Informatics Practices (IP) at the higher secondary stage of school
education is also offered as an optional subject. At this stage, students
can take up IP with the aim of pursuing a career in data science or related
areas after going through professional courses at higher levels. Therefore,
at higher secondary stage, the curriculum of IP introduces basics of
database management systems and data processing. The book has seven
chapters covering the following broader themes:
• SQL Queries: Querying database using the Structured Query
Language by applying SQL functions including aggregate functions.
• Data Handling: The popular Python library called Pandas has been
introduced. The important data structures of Pandas, Series and
DataFrame, have been covered in detail, and basic data handling
and data analysis using Pandas are included.
• Data Visualisation: The Pyplot module of the Matplotlib library is
introduced. It demonstrates how to generate high quality graphs
and charts from Python using the Pyplot tool.
• Internet and Web: An introduction to the concepts of computer
networks is given, followed by a brief overview of the Internet and
its applications. The concept of web, website, and its hosting
is also included.
• Societal Impact: Awareness of digital footprints, data privacy and
protection, cyber crime, etiquettes, copyright and plagiarism,
E-waste in a digital society and their implications on security,
privacy, piracy, ethics, values and health concerns.


Each chapter has two additional components: (i) activities and
(ii) think and reflect, for self-assessment while learning as well as to generate
further interest in the learner. A number of hands-on examples are given to
gradually explain the methodology for solving different types of problems across
the chapters. The programming examples as well as the exercises in the
chapters are required to be solved on a computer and verified against the given
outputs.
Box items are pinned inside the chapters either to explain related
concepts or to describe additional information related to the topic covered
in that section. However, these box-items are not to be assessed through
examinations.
Project Based Learning given at the end includes exemplar projects
related to real-world problems. Teachers are supposed to assign these or
similar projects to be developed in groups. Working in such projects may
promote peer-learning, team spirit and responsiveness.
The chapters have been written by involving practicing teachers as well
as subject experts. Several iterations have resulted in this book. Thanks
are due to the authors and reviewers for their valuable contribution. I would
like to place on record appreciation for Professor Om Vikas for leading the
review activities of the book as well as for his guidance and motivation
to the development team throughout. Comments and suggestions
are welcome.

Dr. Rejaul Karim Barbhuiya
Assistant Professor
Central Institute of Educational Technology
New Delhi
31 August 2020



Textbook Development Committee
Members
Anamika Gupta, Assistant Professor, Shaheed Sukhdev College of Business
Studies, University of Delhi
Anju Gupta, Freelance Educationist, Delhi
Anuradha Khattar, Assistant Professor, Miranda House, University of Delhi
Chetna Khanna, Freelance Educationist, Delhi
Harita Ahuja, Assistant Professor, Acharya Narendra Dev College,
University of Delhi
Mohini Arora, HOD (Computer Science), Air Force Golden Jubilee Institute,
Subroto Park, Delhi
Naeem Ahmad, Assistant Professor, Madanapalle Institute of Technology
and Science, Madanapalle, Andhra Pradesh
Naveen Gupta, PGT (Computer Science), St. Mark's Sr Sec Public School,
Meera Bagh, Delhi
Neeru Mittal, PGT (Computer Science), SRDAV Public School, Dayanand
Vihar, Delhi
Priti Rai Jain, Assistant Professor, Miranda House, University of Delhi
Sangita Chadha, HOD (Computer Science), Ambience Public School,
Safdarjung Enclave, Delhi
Sharanjit Kaur, Associate Professor, Acharya Narendra Dev College,
University of Delhi
Sugandha Gupta, Assistant Professor, Sri Guru Gobind Singh College of
Commerce, University of Delhi
Vineeta Garg, PGT (Computer Science), SRDAV Public School, Dayanand
Vihar, Delhi

Member-Coordinator
Rejaul Karim Barbhuiya, Assistant Professor, Central Institute of
Educational Technology, NCERT, Delhi



Acknowledgements
The National Council of Educational Research and Training acknowledges
the valuable contributions of the individuals and organisations involved in
the development of Informatics Practices textbook for Class XII.
The Council expresses its gratitude to the syllabus development team
including MPS Bhatia, Professor, Netaji Subhas Institute of Technology,
Delhi; T. V. Vijay Kumar, Professor, School of Computer and Systems
Sciences, Jawaharlal Nehru University, New Delhi; Zahid Raza, Associate
Professor, School of Computer and Systems Sciences, Jawaharlal Nehru
University, New Delhi; Vipul Shah, Principal Scientist, Tata Consultancy
Services, and the CSpathshala team; Aasim Zafar, Associate Professor,
Department of Computer Science, Aligarh Muslim University, Aligarh;
Faisal Anwer, Assistant Professor, Department of Computer Science,
Aligarh Muslim University, Aligarh; Smruti Ranjan Sarangi, Associate
Professor, Department of Computer Science and Engineering, Indian
Institute of Technology, Delhi; Vikram Goyal, Associate Professor,
Indraprastha Institute of Information Technology (IIIT), Delhi; and Mamur
Ali, Assistant Professor, Department of Teacher Training and Non-formal
Education (IASE), Faculty of Education, Jamia Millia Islamia, New Delhi.
The Council is thankful to the following resource persons for providing
valuable inputs in developing this book — D.N. Sansanwal, Retd. Professor,
Devi Ahilya Vishwavidyalaya, Indore; Veer Sain Dixit, Assistant Professor,
Atma Ram Sanatan Dharma College, University of Delhi; Mukesh Kumar,
DPS RK Puram, Delhi; Aswin K. Dash, Mother’s International School,
Delhi; Purvi Kumar, Co-ordinator, Computer Science Department, Ganga
International School, Rohtak Road, Delhi; Mudasir Wani, Assistant
Professor, Govt. College for Women, Nawakadal, Srinagar, Jammu and
Kashmir; Sajid Yousuf Bhat, Assistant Professor, University of Kashmir,
Jammu and Kashmir; Professor Om Vikas, Formerly Director, ABV-IIITM,
Gwalior, MP.
The Council is grateful to Sunita Farkya, Professor and Head, Department
of Education in Science and Mathematics, NCERT and Amarendra P. Behera,
Professor and Joint Director, CIET, NCERT for their valuable cooperation
and support throughout the development of this book.
The Council also gratefully acknowledges the contributions of
Meetu Sharma, Graphic Designer cum DTP Operator; Kanika Walecha,
DTP Operator; Pooja, Junior Project Fellow in shaping this book. The
contributions of the office of the APC, DESM and Publication Division,
NCERT, New Delhi, in bringing out this book are also duly acknowledged.
The Council also acknowledges the contribution of Ankeeta Bezboruah,
Assistant Editor (Contractual), Publication Division, NCERT for copy
editing this book. The efforts of Rajshree Saini, DTP Operator (Contractual),
Publication Division, NCERT are also acknowledged.



Contents
Foreword

Chapter 1 Querying and SQL Functions
1.1 Introduction
1.2 Functions in SQL
1.3 GROUP BY in SQL
1.4 Operations on Relations
1.5 Using Two Relations in a Query

Chapter 2 Data Handling using Pandas - I
2.1 Introduction to Python Libraries
2.2 Series
2.3 DataFrame
2.4 Importing and Exporting Data between CSV Files and DataFrames
2.5 Pandas Series Vs NumPy ndarray

Chapter 3 Data Handling using Pandas - II
3.1 Introduction
3.2 Descriptive Statistics
3.3 Data Aggregations
3.4 Sorting a DataFrame
3.5 Group by Functions
3.6 Altering the Index
3.7 Other DataFrame Operations
3.8 Handling Missing Values
3.9 Import and Export of Data between Pandas and MySQL

Chapter 4 Plotting Data using Matplotlib
4.1 Introduction
4.2 Plotting using Matplotlib
4.3 Customisation of Plots
4.4 The Pandas Plot Function (Pandas Visualisation)

Chapter 5 Internet and Web
5.1 Introduction to Computer Networks
5.2 Types of Networks
5.3 Network Devices
5.4 Networking Topologies
5.5 The Internet
5.6 Applications of Internet
5.7 Website
5.8 Web Page
5.9 Web Server
5.10 Hosting of a Website
5.11 Browser

Chapter 6 Societal Impacts
6.1 Introduction
6.2 Digital Footprints
6.3 Digital Society and Netizen
6.4 Data Protection
6.5 Creative Commons
6.6 Cyber Crime
6.7 Indian Information Technology Act (IT Act)
6.8 E-waste: Hazards and Management
6.9 Impact on Health

Chapter 7 Project Based Learning
7.1 Introduction
7.2 Approaches for Solving Projects
7.3 Teamwork
7.4 Project Descriptions



Chapter 1
Querying and SQL Functions

“Any unique image that you desire
probably already exists on the
internet or in some database... The
problem today is no longer how to
create the right image, but how to
find an already existing one”
— Lev Manovich

In this chapter
»» Introduction
»» Functions in SQL
»» Group By in SQL
»» Operations on Relations
»» Using Two Relations in a Query

1.1 Introduction
In Class XI, we have understood database
concepts and learnt how to create databases
using MySQL. We have also learnt how to
populate, manipulate and retrieve data from
a database using SQL queries.
In this chapter, we are going to learn
more SQL commands which are required
to perform various queries in a database.
We will understand how to use single row
functions and multiple row functions, arrange
records in ascending or descending order,
group records based on some criteria,
and work on multiple tables using SQL.
Let us create a database called
CARSHOWROOM, having the schema as


shown in Figure 1.1. It has the following four relations:


• INVENTORY: Stores car Id, name, price, model, year
of manufacturing, and fuel type for each car in the
inventory of the showroom,
• CUSTOMER: Stores customer Id, name, address,
phone number and email for each customer,
• SALE: Stores the invoice number, car Id, customer
Id, sale date, mode of payment, sales person’s
employee Id, and selling price of the car sold,
• EMPLOYEE: Stores employee Id, name, date of
birth, date of joining, designation, and salary of
each employee in the showroom.

Inventory: CarID, CarName, Price, Model, YearManufacture, FuelType
Customer: CustID, CustName, CustAdd, Phone, Email
Sale: InvoiceNo, CarID, CustID, SaleDate, PaymentMode, EmpID, SalePrice
Employee: EmpID, EmpName, DOB, DOJ, Designation, Salary

Figure 1.1: Schema diagram of database CARSHOWROOM


The records of the four relations are shown in Tables
1.1, 1.2, 1.3, and 1.4 respectively.
Table 1.1 INVENTORY
mysql> SELECT * FROM INVENTORY;
+-------+--------+-----------+-----------+-----------------+----------+
| CarId | CarName| Price | Model | YearManufacture | Fueltype |
+-------+--------+-----------+-----------+-----------------+----------+
| D001 | Car1 | 582613.00 | LXI | 2017 | Petrol |
| D002 | Car1 | 673112.00 | VXI | 2018 | Petrol |
| B001 | Car2 | 567031.00 | Sigma1.2 | 2019 | Petrol |
| B002 | Car2 | 647858.00 | Delta1.2 | 2018 | Petrol |
| E001 | Car3 | 355205.00 | 5 STR STD | 2017 | CNG |
| E002 | Car3 | 654914.00 | CARE | 2018 | CNG |
| S001 | Car4 | 514000.00 | LXI | 2017 | Petrol |
| S002 | Car4 | 614000.00 | VXI | 2018 | Petrol |
+-------+--------+-----------+-----------+-----------------+----------+
8 rows in set (0.00 sec)
Table 1.2 CUSTOMER
mysql> SELECT * FROM CUSTOMER;
+-------+------------+-----------------------+------------+-------------------+
|CustId | CustName | CustAdd | Phone | Email |
+-------+------------+-----------------------+------------+-------------------+
| C0001 |AmitSaha | L-10, Pitampura | 4564587852 |[email protected]|
| C0002 |Rehnuma | J-12, SAKET | 5527688761 |[email protected]|
| C0003 |CharviNayyar| 10/9, FF, Rohini | 6811635425 |[email protected]|
| C0004 |Gurpreet | A-10/2, SF, MayurVihar| 3511056125 |[email protected]|
+-------+------------+-----------------------+------------+-------------------+
4 rows in set (0.00 sec)
Table 1.3 SALE
mysql> SELECT * FROM SALE;
+-----------+-------+--------+------------+--------------+-------+-----------+
| InvoiceNo | CarId | CustId | SaleDate | PaymentMode |EmpID | SalePrice |
+-----------+-------+--------+------------+--------------+-------+-----------+
| I00001 | D001 | C0001 | 2019-01-24 | Credit Card | E004 | 613247.00 |
| I00002 | S001 | C0002 | 2018-12-12 | Online | E001 | 590321.00 |
| I00003 | S002 | C0004 | 2019-01-25 | Cheque | E010 | 604000.00 |
| I00004 | D002 | C0001 | 2018-10-15 | Bank Finance | E007 | 659982.00 |
| I00005 | E001 | C0003 | 2018-12-20 | Credit Card | E002 | 369310.00 |
| I00006 | S002 | C0002 | 2019-01-30 | Bank Finance | E007 | 620214.00 |
+-----------+-------+--------+------------+--------------+-------+-----------+
6 rows in set (0.00 sec)
Table 1.4 EMPLOYEE
mysql> SELECT * FROM EMPLOYEE;
+-------+----------+------------+------------+--------------+--------+
| EmpID | EmpName | DOB | DOJ | Designation | Salary |
+-------+----------+------------+------------+--------------+--------+
| E001 |Rushil | 1994-07-10 | 2017-12-12 | Salesman | 25550 |
| E002 |Sanjay | 1990-03-12 | 2016-06-05 | Salesman | 33100 |
| E003 |Zohar | 1975-08-30 | 1999-01-08 | Peon | 20000 |
| E004 |Arpit | 1989-06-06 | 2010-12-02 | Salesman | 39100 |
| E006 |Sanjucta | 1985-11-03 | 2012-07-01 | Receptionist | 27350 |
| E007 |Mayank | 1993-04-03 | 2017-01-01 | Salesman | 27352 |
| E010 |Rajkumar | 1987-02-26 | 2013-10-23 | Salesman | 31111 |
+-------+----------+------------+------------+--------------+--------+
7 rows in set (0.00 sec)


1.2 Functions in SQL


We know that a function is used to perform some
particular task and it returns zero or more values as a
result. Functions are useful while writing SQL queries
also. Functions can be applied to work on single or
multiple records (rows) of a table. Depending on their
application in one or multiple rows, SQL functions
are categorised as Single row functions and Aggregate
functions.
1.2.1 Single Row Functions
These are also known as Scalar functions. Single row
functions are applied on a single value and return
a single value. Figure 1.2 lists different single row
functions under three categories — Numeric (Math),
String, Date and Time.
Math functions accept numeric value as input, and
return a numeric value as a result. String functions
accept character value as input, and return either
character or numeric values as output. Date and
time functions accept date and time values as input,
and return numeric or string, or date and time values
as output.

Single Row Functions
Numeric Functions: POWER(), ROUND(), MOD()
String Functions: UCASE(), LCASE(), MID(), LENGTH(), LEFT(), RIGHT(), INSTR(), LTRIM(), RTRIM(), TRIM()
Date Functions: NOW(), DATE(), MONTH(), MONTHNAME(), YEAR(), DAY(), DAYNAME()

Figure 1.2: Three categories of single row functions in SQL


(A) Numeric Functions


Three commonly used numeric functions are POWER(),
ROUND() and MOD(). Their usage along with syntax is
given in Table 1.5.
Table 1.5 Math Functions
POWER(X,Y)
Description: Calculates X to the power Y. It can also be written as POW(X,Y).
Example: mysql> SELECT POWER(2,3);
Output: 8

ROUND(N,D)
Description: Rounds off number N to D number of decimal places. Note: If D=0 (or D is omitted), then it rounds off the number to the nearest integer.
Example: mysql> SELECT ROUND(2912.564, 1);
Output: 2912.6
Example: mysql> SELECT ROUND(283.2);
Output: 283

MOD(A, B)
Description: Returns the remainder after dividing number A by number B.
Example: mysql> SELECT MOD(21, 2);
Output: 1
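The three numeric functions of Table 1.5 can also be cross-checked outside MySQL. The following Python lines are only an illustrative sketch: the helper names sql_power, sql_round and sql_mod are made up for this example and are not part of MySQL or of any Python library. Python's decimal module is used so that halves round away from zero, as they do for exact values in MySQL.

```python
from decimal import Decimal, ROUND_HALF_UP

def sql_power(x, y):
    """Mimic POWER(X, Y): X raised to the power Y."""
    return x ** y

def sql_round(n, d=0):
    """Mimic ROUND(N, D): round N to D decimal places (D defaults to 0)."""
    step = Decimal(1).scaleb(-d)   # d=1 gives Decimal('0.1'), d=0 gives Decimal('1')
    return Decimal(str(n)).quantize(step, rounding=ROUND_HALF_UP)

def sql_mod(a, b):
    """Mimic MOD(A, B) for non-negative numbers: remainder of A divided by B."""
    return a % b

print(sql_power(2, 3))          # 8
print(sql_round(2912.564, 1))   # 2912.6
print(sql_round(283.2))         # 283
print(sql_mod(21, 2))           # 1
```

The printed values match the outputs listed in Table 1.5.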

Example 1.1
In order to increase sales, suppose the car dealer decides
to let customers pay the total amount in 10
easy EMIs (equal monthly installments). Assume that
EMIs are required to be in multiples of 1,000. For that,
the dealer wants to list the CarID and Price along with
the following data from the Inventory table:
a) Calculate GST as 12% of Price and display the result
after rounding it off to one decimal place.
mysql> SELECT ROUND(12/100*Price,1) "GST"
FROM INVENTORY;
+---------+
| GST |
+---------+
| 69913.6 |
| 80773.4 |
| 68043.7 |
| 77743.0 |
| 42624.6 |
| 78589.7 |
| 61680.0 |
| 73680.0 |
+---------+
8 rows in set (0.00 sec)
b) Add a new column FinalPrice to the table inventory,
which will have the value as the sum of Price and GST
(12% of Price).


mysql> ALTER TABLE INVENTORY ADD FinalPrice
Numeric(10,1);
Query OK, 8 rows affected (0.03 sec)
Records: 8 Duplicates: 0 Warnings: 0

mysql> UPDATE INVENTORY SET
FinalPrice=Price+Round(Price*12/100,1);
Query OK, 8 rows affected (0.01 sec)
Rows matched: 8 Changed: 8 Warnings: 0
mysql> SELECT * FROM INVENTORY;
+-------+---------+-----------+-----------+-----------------+----------+------------+
| CarId | CarName | Price     | Model     | YearManufacture | FuelType | FinalPrice |
+-------+---------+-----------+-----------+-----------------+----------+------------+
| D001  | Car1    | 582613.00 | LXI       |            2017 | Petrol   |   652526.6 |
| D002  | Car1    | 673112.00 | VXI       |            2018 | Petrol   |   753885.4 |
| B001  | Car2    | 567031.00 | Sigma1.2  |            2019 | Petrol   |   635074.7 |
| B002  | Car2    | 647858.00 | Delta1.2  |            2018 | Petrol   |   725601.0 |
| E001  | Car3    | 355205.00 | 5 STR STD |            2017 | CNG      |   397829.6 |
| E002  | Car3    | 654914.00 | CARE      |            2018 | CNG      |   733503.7 |
| S001  | Car4    | 514000.00 | LXI       |            2017 | Petrol   |   575680.0 |
| S002  | Car4    | 614000.00 | VXI       |            2018 | Petrol   |   687680.0 |
+-------+---------+-----------+-----------+-----------------+----------+------------+
8 rows in set (0.00 sec)
c) Calculate and display the amount to be paid
each month (in multiples of 1000) which is to be
calculated after dividing the FinalPrice of the car
into 10 instalments.
d) After dividing the amount into EMIs, find out the
remaining amount to be paid immediately, by
performing modular division.
The following SQL query can be used to solve the above
mentioned problems:
mysql> SELECT CarId, FinalPrice,
ROUND((FinalPrice-MOD(FinalPrice,10000))/10,0) "EMI",
MOD(FinalPrice,10000) "Remaining Amount"
FROM INVENTORY;
+-------+------------+-------+------------------+
| CarId | FinalPrice | EMI | Remaining Amount |
+-------+------------+-------+------------------+
| D001 | 652526.6 | 65000 | 2526.6 |
| D002 | 753885.4 | 75000 | 3885.4 |
| B001 | 635074.7 | 63000 | 5074.7 |
| B002 | 725601.0 | 72000 | 5601.0 |
| E001 | 397829.6 | 39000 | 7829.6 |
| E002 | 733503.7 | 73000 | 3503.7 |
| S001 | 575680.0 | 57000 | 5680.0 |
| S002 | 687680.0 | 68000 | 7680.0 |
+-------+------------+-------+------------------+
8 rows in set (0.00 sec)
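To see how the numbers in Example 1.1 fit together, the same arithmetic can be repeated by hand. The Python sketch below is only an illustration under stated assumptions: the function name emi_plan is hypothetical, and prices are passed as strings so that the decimal module can keep them exact. It follows the three steps of the example: add 12% GST rounded to one decimal place, take the remainder on division by 10000, and divide the rest into 10 instalments.

```python
from decimal import Decimal, ROUND_HALF_UP

def emi_plan(price):
    """Return (final_price, emi, remaining) for a car price given as a string."""
    price = Decimal(price)
    # GST: 12% of Price, rounded to one decimal place, like ROUND(Price*12/100, 1)
    gst = (price * 12 / 100).quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)
    final_price = price + gst
    # Amount to be paid immediately, like MOD(FinalPrice, 10000)
    remaining = final_price % 10000
    # Ten equal instalments of the rest, like ROUND((FinalPrice - remaining)/10, 0)
    emi = ((final_price - remaining) / 10).quantize(Decimal("1"), rounding=ROUND_HALF_UP)
    return final_price, emi, remaining

for car_id, price in [("D001", "582613.00"), ("E001", "355205.00")]:
    final_price, emi, remaining = emi_plan(price)
    print(car_id, f"{final_price:.1f}", emi, f"{remaining:.1f}")
# D001 652526.6 65000 2526.6
# E001 397829.6 39000 7829.6
```

Note that ten EMIs plus the remaining amount always add up to the FinalPrice, for example 10 × 65000 + 2526.6 = 652526.6 for car D001.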


Example 1.2
a) Let us now add a new column Commission to the
SALE table. The column Commission should have
a total length of 7 digits, of which 2 are decimal
places.
mysql> ALTER TABLE SALE ADD(Commission
Numeric(7,2));
Query OK, 6 rows affected (0.34 sec)
Records: 6 Duplicates: 0 Warnings: 0

b) Let us now calculate commission for sales agents
as 12 per cent of the SalePrice, insert the values
to the newly added column Commission and then
display records of the table SALE where commission
> 73000.
mysql> UPDATE SALE SET
Commission=12/100*SalePrice;
Query OK, 6 rows affected (0.06 sec)
Rows matched: 6 Changed: 6 Warnings: 0

mysql> SELECT * FROM SALE WHERE Commission > 73000;
+-----------+-------+--------+------------+--------------+-------+-----------+------------+
| invoiceno | carid | custid | saledate   | paymentmode  | empid | saleprice | Commission |
+-----------+-------+--------+------------+--------------+-------+-----------+------------+
| I00001    | D001  | C0001  | 2019-01-24 | Credit Card  | E004  | 613247.00 |   73589.64 |
| I00004    | D002  | C0001  | 2018-10-15 | Bank Finance | E007  | 659982.00 |   79197.84 |
| I00006    | S002  | C0002  | 2019-01-30 | Bank Finance | E007  | 620214.00 |   74425.68 |
+-----------+-------+--------+------------+--------------+-------+-----------+------------+
3 rows in set (0.02 sec)
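The commission values above can be verified with a few lines of Python. This is a minimal sketch, assuming the six SalePrice values of Table 1.3; the variable names sale_price, commission and high are chosen only for this illustration.

```python
from decimal import Decimal

# InvoiceNo -> SalePrice, copied from Table 1.3
sale_price = {
    "I00001": "613247.00", "I00002": "590321.00", "I00003": "604000.00",
    "I00004": "659982.00", "I00005": "369310.00", "I00006": "620214.00",
}

# Commission = 12 per cent of SalePrice, kept to two decimal places
commission = {
    inv: (Decimal(price) * 12 / 100).quantize(Decimal("0.01"))
    for inv, price in sale_price.items()
}

# Equivalent of: WHERE Commission > 73000
high = [inv for inv, value in commission.items() if value > 73000]

print(commission["I00001"])   # 73589.64
print(high)                   # ['I00001', 'I00004', 'I00006']
```

Only three of the six invoices cross the 73000 mark, which agrees with the three rows returned by the query.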

c) Display InvoiceNo, SalePrice and Commission such
that commission value is rounded off to zero decimal places.
mysql> SELECT InvoiceNo, SalePrice,
Round(Commission,0) FROM SALE;
+-----------+-----------+---------------------+
| InvoiceNo | SalePrice | Round(Commission,0) |
+-----------+-----------+---------------------+
| I00001    | 613247.00 |               73590 |
| I00002    | 590321.00 |               70839 |
| I00003    | 604000.00 |               72480 |
| I00004    | 659982.00 |               79198 |
| I00005    | 369310.00 |               44317 |
| I00006    | 620214.00 |               74426 |
+-----------+-----------+---------------------+
6 rows in set (0.00 sec)

Activity 1.1
Using the table SALE of CARSHOWROOM database, write SQL queries for the following:
a) Display the InvoiceNo and commission value rounded off to zero decimal places.
b) Display the details of SALE where payment mode is credit card.

(B) String Functions
String functions can perform various operations on
alphanumeric data which are stored in a table. They
can be used to change the case (uppercase to lowercase
or vice-versa), extract a substring, calculate the length
of a string and so on. String functions and their usage
are shown in Table 1.6.
Table 1.6 String Functions
UCASE(string) OR UPPER(string)
Description: Converts string into uppercase.
Example: mysql> SELECT UCASE("Informatics Practices");
Output: INFORMATICS PRACTICES

LOWER(string) OR LCASE(string)
Description: Converts string into lowercase.
Example: mysql> SELECT LOWER("Informatics Practices");
Output: informatics practices

MID(string, pos, n) OR SUBSTRING(string, pos, n) OR SUBSTR(string, pos, n)
Description: Returns a substring of size n starting from the specified position (pos) of the string. If n is not specified, it returns the substring from the position pos till end of the string.
Example: mysql> SELECT MID("Informatics", 3, 4);
Output: form
Example: mysql> SELECT MID('Informatics',7);
Output: atics

LENGTH(string)
Description: Returns the number of characters in the specified string.
Example: mysql> SELECT LENGTH("Informatics");
Output: 11

LEFT(string, N)
Description: Returns N number of characters from the left side of the string.
Example: mysql> SELECT LEFT("Computer", 4);
Output: Comp

RIGHT(string, N)
Description: Returns N number of characters from the right side of the string.
Example: mysql> SELECT RIGHT("SCIENCE", 3);
Output: NCE

INSTR(string, substring)
Description: Returns the position of the first occurrence of the substring in the given string. Returns 0, if the substring is not present in the string.
Example: mysql> SELECT INSTR("Informatics", "ma");
Output: 6

LTRIM(string)
Description: Returns the given string after removing leading white space characters.
Example: mysql> SELECT LENGTH("  DELHI"), LENGTH(LTRIM("  DELHI"));
Output:
+--------+--------+
|      7 |      5 |
+--------+--------+
1 row in set (0.00 sec)

RTRIM(string)
Description: Returns the given string after removing trailing white space characters.
Example: mysql> SELECT LENGTH("PEN  "), LENGTH(RTRIM("PEN  "));
Output:
+--------+--------+
|      5 |      3 |
+--------+--------+
1 row in set (0.00 sec)

TRIM(string)
Description: Returns the given string after removing both leading and trailing white space characters.
Example: mysql> SELECT LENGTH("  MADAM  "), LENGTH(TRIM("  MADAM  "));
Output:
+--------+--------+
|      9 |      5 |
+--------+--------+
1 row in set (0.00 sec)
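Readers who know Python string handling may find it helpful to compare it with the functions of Table 1.6. The sketch below is an illustration only: mid, left, right and instr are hypothetical helper names written for this comparison. The main point to notice is that SQL counts positions from 1, whereas Python counts from 0. Python comparisons are also case-sensitive, which may differ from MySQL depending on the collation in use.

```python
def mid(s, pos, n=None):
    """Mimic MID(string, pos, n) for pos >= 1 (SQL positions start at 1)."""
    start = pos - 1
    return s[start:] if n is None else s[start:start + n]

def left(s, n):
    """Mimic LEFT(string, N): first N characters."""
    return s[:n]

def right(s, n):
    """Mimic RIGHT(string, N): last N characters."""
    return s[-n:] if n > 0 else ""

def instr(s, sub):
    """Mimic INSTR: 1-based position of the first occurrence, or 0 if absent."""
    return s.find(sub) + 1

print("Informatics Practices".upper())   # INFORMATICS PRACTICES
print(mid("Informatics", 3, 4))          # form
print(mid("Informatics", 7))             # atics
print(len("Informatics"))                # 11
print(left("Computer", 4))               # Comp
print(right("SCIENCE", 3))               # NCE
print(instr("Informatics", "ma"))        # 6
print(len("  DELHI"), len("  DELHI".lstrip(" ")))      # 7 5
print(len("  MADAM  "), len("  MADAM  ".strip(" ")))   # 9 5
```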

Example 1.3
Let us use CUSTOMER relation shown in Table 1.2 to
understand the working of string functions.
a) Display customer name in lower case and customer
email in upper case from table CUSTOMER.
mysql> SELECT LOWER(CustName), UPPER(Email) FROM
CUSTOMER;
+-----------------+---------------------+
| LOWER(CustName) | UPPER(Email) |
+-----------------+---------------------+
| amitsaha | [email protected] |
| rehnuma | [email protected] |
| charvinayyar | [email protected] |
| gurpreet | [email protected] |
+-----------------+---------------------+
4 rows in set (0.00 sec)

Activity 1.2
Using the table INVENTORY from CARSHOWROOM database, write SQL queries for the following:
a) Convert the CarMake to uppercase if its value starts with the letter ‘B’.
b) If the length of the car’s model is greater than 4 then fetch the substring starting from position 3 till the end from attribute Model.

b) Display the length of the email and the part of the
email ID before the character ‘@’. Note: Do
not print ‘@’.
mysql> SELECT LENGTH(Email), LEFT(Email, INSTR(Email,
"@")-1) FROM CUSTOMER;
+---------------+----------------------------------+
| LENGTH(Email) | LEFT(Email, INSTR(Email, "@")-1) |
+---------------+----------------------------------+
|            19 | amitsaha2                        |
|            19 | rehnuma                          |
|            19 | charvi123                        |
|            19 | gur_singh                        |
+---------------+----------------------------------+
4 rows in set (0.03 sec)
The function INSTR will return the position of “@”
in the email address. So, to print the email ID without
“@”, we pass position - 1 as the number of characters to LEFT().
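The "position minus one" idea can be tried out in Python as well. The following is a small sketch under stated assumptions: the function name local_part and the addresses used (all on the reserved example.com domain) are made up for illustration and are not the records of the CUSTOMER table.

```python
def local_part(email):
    """Return the part of an email ID before '@' (empty string if '@' is absent)."""
    at_pos = email.find("@") + 1          # 1-based position, like INSTR(Email, "@")
    return email[:max(at_pos - 1, 0)]     # like LEFT(Email, at_pos - 1)

for email in ["charvi123@example.com", "gur_singh@example.com"]:
    print(len(email), local_part(email))
# 21 charvi123
# 21 gur_singh
```

If the "@" character is missing, INSTR gives 0 and the number of characters requested becomes negative; the sketch returns an empty string in that case.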


c) Let us assume that a four digit area code is reflected
in the mobile number starting from position number
3. For example, 2630 is the area code of mobile
number 4726309212. Now, write the SQL query to
display the area code of the customer living in Rohini.
mysql> SELECT MID(Phone,3,4) FROM CUSTOMER WHERE
CustAdd like '%Rohini%';
+----------------+
| MID(Phone,3,4) |
+----------------+
| 1163           |
+----------------+
1 row in set (0.00 sec)

d) Display emails after removing the domain name
extension ".com" from emails of the customers.
mysql> SELECT TRIM(".com" from Email) FROM
CUSTOMER;
+-------------------------+
| TRIM(".com" FROM Email) |
+-------------------------+
| amitsaha2@gmail         |
| rehnuma@hotmail         |
| charvi123@yahoo         |
| gur_singh@yahoo         |
+-------------------------+
4 rows in set (0.00 sec)

Activity 1.3
Using the table EMPLOYEE from CARSHOWROOM database, write SQL queries for the following:
a) Display employee name and the last 2 characters of the EmpId.
b) Display designation of employee and the position of character ‘e’ in designation, if present.

e) Display details of all the customers having yahoo
emails only.
mysql> SELECT * FROM CUSTOMER WHERE Email LIKE
"%yahoo%";
+-------+-------------+----------------------+-----------+--------------------+
|CustID | CustName | CustAdd | Phone | Email |
+-------+-------------+----------------------+-----------+--------------------+
|C0003 |CharviNayyar |10/9, FF, Rohini |6811635425 |[email protected] |
|C0004 |Gurpreet | A-10/2,SF, MayurVihar|3511056125 | [email protected]|
+-------+-------------+----------------------+-----------+--------------------+
2 rows in set (0.00 sec)

(C) Date and Time Functions


There are various functions that are used to perform
operations on date and time data. Some of the operations
include displaying the current date, extracting each
element of a date (day, month and year), displaying day
of the week and so on. Table 1.7 explains various date
and time functions.


Table 1.7 Date Functions


NOW()
Description: It returns the current system date and time.
Example: mysql> SELECT NOW();
Output: 2019-07-11 19:41:17

DATE()
Description: It returns the date part from the given date/time expression.
Example: mysql> SELECT DATE(NOW());
Output: 2019-07-11

MONTH(date)
Description: It returns the month in numeric form from the date.
Example: mysql> SELECT MONTH(NOW());
Output: 7

MONTHNAME(date)
Description: It returns the month name from the specified date.
Example: mysql> SELECT MONTHNAME("2003-11-28");
Output: November

YEAR(date)
Description: It returns the year from the date.
Example: mysql> SELECT YEAR("2003-10-03");
Output: 2003

DAY(date)
Description: It returns the day part from the date.
Example: mysql> SELECT DAY("2003-03-24");
Output: 24

DAYNAME(date)
Description: It returns the name of the day from the date.
Example: mysql> SELECT DAYNAME("2019-07-11");
Output: Thursday
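The date functions of Table 1.7 have close counterparts in Python's standard datetime module. The lines below are a rough comparison, not a replacement for the SQL functions; the variable names are chosen for this sketch, and the English month and day names assume that Python is running with its default locale.

```python
from datetime import date, datetime

d = date(2003, 11, 28)
month_number = d.month              # like MONTH("2003-11-28")
month_name = d.strftime("%B")       # like MONTHNAME("2003-11-28")
year = d.year                       # like YEAR("2003-11-28")
day = d.day                         # like DAY("2003-11-28")
day_name = date(2019, 7, 11).strftime("%A")   # like DAYNAME("2019-07-11")

print(month_number, month_name, year, day)    # 11 November 2003 28
print(day_name)                               # Thursday

now = datetime.now()                # like NOW(): current system date and time
print(now.date())                   # like DATE(NOW()): only the date part
```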

Example 1.4
Let us use the EMPLOYEE table of CARSHOWROOM
database to illustrate the working of some of the date
and time functions.
a) Select the day, month number and year of joining of
all employees.
mysql> SELECT DAY(DOJ), MONTH(DOJ), YEAR(DOJ) FROM
EMPLOYEE;
+----------+------------+-----------+
| DAY(DOJ) | MONTH(DOJ) | YEAR(DOJ) |
+----------+------------+-----------+
|       12 |         12 |      2017 |
|        5 |          6 |      2016 |
|        8 |          1 |      1999 |
|        2 |         12 |      2010 |
|        1 |          7 |      2012 |
|        1 |          1 |      2017 |
|       23 |         10 |      2013 |
+----------+------------+-----------+
7 rows in set (0.03 sec)

Activity 1.4
Using the table EMPLOYEE of CARSHOWROOM database, list the day of birth for all employees whose salary is more than 25000.

Think and Reflect
Can we use arithmetic operators (+, -, *, or /) on date functions?

b) If the date of joining is not a Sunday, then display it
in the following format "Wednesday, 26, November,
1979."
mysql> SELECT DAYNAME(DOJ), DAY(DOJ),
MONTHNAME(DOJ), YEAR(DOJ) FROM EMPLOYEE WHERE
DAYNAME(DOJ)!='Sunday';
+--------------+----------+----------------+-----------+
| DAYNAME(DOJ) | DAY(DOJ) | MONTHNAME(DOJ) | YEAR(DOJ) |
+--------------+----------+----------------+-----------+
| Tuesday      |       12 | December       |      2017 |
| Friday       |        8 | January        |      1999 |
| Thursday     |        2 | December       |      2010 |
| Wednesday    |       23 | October        |      2013 |
+--------------+----------+----------------+-----------+
4 rows in set (0.00 sec)
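
The behaviour of DAY(), MONTH(), YEAR(), DAYNAME() and MONTHNAME() can also be checked outside MySQL. The following is a minimal Python sketch, not MySQL code: it assumes Python's standard datetime module, and the helper names day_name, month_name and format_doj are hypothetical names chosen for illustration. The joining dates are rebuilt from the output of part (a).

```python
from datetime import date

# Names are spelled out so that the result does not depend on the system locale.
DAY_NAMES = ("Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday")
MONTH_NAMES = ("January", "February", "March", "April", "May", "June",
               "July", "August", "September", "October", "November",
               "December")


def day_name(d):
    """Rough counterpart of DAYNAME(date)."""
    return DAY_NAMES[d.weekday()]  # weekday() returns 0 for Monday


def month_name(d):
    """Rough counterpart of MONTHNAME(date)."""
    return MONTH_NAMES[d.month - 1]


def format_doj(d):
    """Build 'DayName, day, MonthName, year' from a date."""
    return f"{day_name(d)}, {d.day}, {month_name(d)}, {d.year}"


# Joining dates rebuilt from the DAY, MONTH and YEAR columns of part (a)
joining_dates = [date(2017, 12, 12), date(2016, 6, 5), date(1999, 1, 8),
                 date(2010, 12, 2), date(2012, 7, 1), date(2017, 1, 1),
                 date(2013, 10, 23)]

# Same filter as WHERE DAYNAME(DOJ) != 'Sunday'
not_sunday = [format_doj(d) for d in joining_dates if day_name(d) != "Sunday"]
for line in not_sunday:
    print(line)
```

The four lines printed correspond to the four rows returned by the query in part (b).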

1.2.2 Aggregate Functions
Aggregate functions are also called multiple row functions.
These functions work on a set of records as a whole,
and return a single value for each column of the records
on which the function is applied. Table 1.8 shows the
differences between single row functions and multiple
row functions. Table 1.9 describes some of the aggregate
functions along with their usage. Note that for SUM() and
AVG(), the column must be of numeric type.

Table 1.8 Differences between Single row and Multiple row Functions
Single row functions:
1. It operates on a single row at a time.
2. It returns one result per row.
3. It can be used in Select, Where, and Order by clause.
4. Math, String and Date functions are examples of single row functions.
Multiple row functions:
1. It operates on groups of rows.
2. It returns one result for a group of rows.
3. It can be used in the Select, Having and Order by clauses, but not in the Where clause.
4. Max(), Min(), Avg(), Sum(), Count() and Count(*) are examples of multiple row functions.

Table 1.9 Aggregate Functions in SQL


Function: MAX(column)
Description: Returns the largest value from the specified column.
Example: mysql> SELECT MAX(Price) FROM INVENTORY;
Output: 673112.00

Function: MIN(column)
Description: Returns the smallest value from the specified column.
Example: mysql> SELECT MIN(Price) FROM INVENTORY;
Output: 355205.00

Function: AVG(column)
Description: Returns the average of the values in the specified column.
Example: mysql> SELECT AVG(Price) FROM INVENTORY;
Output: 576091.625000

Function: SUM(column)
Description: Returns the sum of the values for the specified column.
Example: mysql> SELECT SUM(Price) FROM INVENTORY;
Output: 4608733.00

Function: COUNT(column)
Description: Returns the number of values in the specified column ignoring the NULL values.
Note: In this example, let us consider a MANAGER table having two attributes and four records.
Example: mysql> SELECT * from MANAGER;
Output:
+------+---------+
| MNO  | MEMNAME |
+------+---------+
|    1 | AMIT    |
|    2 | KAVREET |
|    3 | KAVITA  |
|    4 | NULL    |
+------+---------+
4 rows in set (0.00 sec)

mysql> SELECT COUNT(MEMNAME) FROM MANAGER;
Output:
+----------------+
| COUNT(MEMNAME) |
+----------------+
|              3 |
+----------------+
1 row in set (0.01 sec)

Function: COUNT(*)
Description: Returns the number of records in a table.
Note: In order to display the number of records that match a particular criterion in the table, we have to use COUNT(*) with WHERE clause.
Example: mysql> SELECT COUNT(*) from MANAGER;
Output:
+----------+
| count(*) |
+----------+
|        4 |
+----------+
1 row in set (0.00 sec)

Example 1.5
a) Display the total number of records from table
INVENTORY having a model as VXI.
mysql> SELECT COUNT(*) FROM INVENTORY WHERE
Model="VXI";
+----------+
| COUNT(*) |
+----------+
| 2 |
+----------+
1 row in set (0.00 sec)
b) Display the total number of different types of Models
available from table INVENTORY.
mysql> SELECT COUNT(DISTINCT Model) FROM
INVENTORY;
+-----------------------+
| COUNT(DISTINCT MODEL) |
+-----------------------+
|                     6 |
+-----------------------+
1 row in set (0.09 sec)

Activity 1.5
a) Find sum of Sale Price of the cars purchased by the
customer having ID C0001 from table SALE.
b) Find the maximum and minimum commission from the
SALE table.

c) Display the average price of all the cars with Model
LXI from table INVENTORY.
mysql> SELECT AVG(Price) FROM INVENTORY WHERE
Model="LXI";
+---------------+
| AVG(Price)    |
+---------------+
| 548306.500000 |
+---------------+
1 row in set (0.03 sec)
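
The difference between COUNT(column) and COUNT(*) described in Table 1.9 can be tried out with the following minimal sketch. It is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for MySQL, and it recreates the MANAGER table from Table 1.9. The variable names are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for illustration)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MANAGER (MNO INTEGER, MEMNAME TEXT)")
conn.executemany(
    "INSERT INTO MANAGER VALUES (?, ?)",
    [(1, "AMIT"), (2, "KAVREET"), (3, "KAVITA"), (4, None)],  # None is stored as NULL
)

# COUNT(column) ignores NULL values, COUNT(*) counts every row
count_names = conn.execute("SELECT COUNT(MEMNAME) FROM MANAGER").fetchone()[0]
count_rows = conn.execute("SELECT COUNT(*) FROM MANAGER").fetchone()[0]

# MAX and MIN applied on the numeric column MNO
max_mno, min_mno = conn.execute(
    "SELECT MAX(MNO), MIN(MNO) FROM MANAGER"
).fetchone()

# COUNT(*) with a WHERE clause counts only the rows that satisfy the condition
count_k = conn.execute(
    "SELECT COUNT(*) FROM MANAGER WHERE MEMNAME LIKE 'K%'"
).fetchone()[0]

print(count_names, count_rows, max_mno, min_mno, count_k)
conn.close()
```

Here COUNT(MEMNAME) gives 3 because one name is NULL, whereas COUNT(*) gives 4.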

1.3 GROUP BY in SQL
At times we need to fetch a group of rows on the
basis of common values in a column. This can be
done using a GROUP BY clause. It groups the rows
together that contain the same values in a specified
column. We can use the aggregate functions (COUNT,
MAX, MIN, AVG and SUM) to work on the grouped
values. The HAVING clause in SQL is used to specify
conditions on the groups formed by the GROUP BY clause.
Consider the SALE table from the CARSHOWROOM
database:
mysql> SELECT * FROM SALE;
+-----------+-------+--------+------------+--------------+-------+-----------+------------+
| InvoiceNo | CarId | CustId | SaleDate   | PaymentMode  | EmpID | SalePrice | Commission |
+-----------+-------+--------+------------+--------------+-------+-----------+------------+
| I00001    | D001  | C0001  | 2019-01-24 | Credit Card  | E004  | 613247.00 |   73589.64 |
| I00002    | S001  | C0002  | 2018-12-12 | Online       | E001  | 590321.00 |   70838.52 |
| I00003    | S002  | C0004  | 2019-01-25 | Cheque       | E010  | 604000.00 |   72480.00 |
| I00004    | D002  | C0001  | 2018-10-15 | Bank Finance | E007  | 659982.00 |   79197.84 |
| I00005    | E001  | C0003  | 2018-12-20 | Credit Card  | E002  | 369310.00 |   44317.20 |
| I00006    | S002  | C0002  | 2019-01-30 | Bank Finance | E007  | 620214.00 |   74425.68 |
+-----------+-------+--------+------------+--------------+-------+-----------+------------+
6 rows in set (0.11 sec)
CarID, CustID, SaleDate, PaymentMode, EmpID and
SalePrice are the columns in which more than one row
can have the same value. So, the GROUP BY clause can be
used on these columns to find the number of records of a
particular type (column), or to calculate the sum of the
price of each car type.
Example 1.6
a) Display the number of cars purchased by each
customer from the SALE table.
mysql> SELECT CustID, COUNT(*) "Number of Cars"
FROM SALE GROUP BY CustID;
+--------+----------------+
| CustID | Number of Cars |
+--------+----------------+
| C0001 | 2 |
| C0002 | 2 |
| C0003 | 1 |
| C0004 | 1 |
+--------+----------------+
4 rows in set (0.00 sec)

b) Display the customer Id and number of cars
purchased if the customer purchased more than 1
car from SALE table.
mysql> SELECT CustID, COUNT(*) FROM SALE GROUP BY
CustID HAVING Count(*)>1;
+--------+----------+
| CustID | COUNT(*) |
+--------+----------+
| C0001 | 2 |
| C0002 | 2 |
+--------+----------+
2 rows in set (0.30 sec)

c) Display the number of people in each category of
payment mode from the table SALE.
mysql> SELECT PaymentMode, COUNT(PaymentMode) FROM
SALE GROUP BY Paymentmode ORDER BY Paymentmode;
+--------------+--------------------+
| PaymentMode  | Count(PaymentMode) |
+--------------+--------------------+
| Bank Finance |                  2 |
| Cheque       |                  1 |
| Credit Card  |                  2 |
| Online       |                  1 |
+--------------+--------------------+
4 rows in set (0.00 sec)

Activity 1.6
a) List the total number of cars sold by each employee.
b) List the maximum sale made by each employee.

d) Display the PaymentMode and the number of payments
for each mode that has been used more than once.
mysql> SELECT PaymentMode, Count(PaymentMode) FROM
SALE GROUP BY Paymentmode HAVING COUNT(*)>1 ORDER
BY Paymentmode;
+--------------+--------------------+
| PaymentMode  | Count(PaymentMode) |
+--------------+--------------------+
| Bank Finance |                  2 |
| Credit Card  |                  2 |
+--------------+--------------------+
2 rows in set (0.00 sec)
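
The grouping queries of this section can be reproduced with the following minimal sketch. It is an illustration under stated assumptions: Python's built-in sqlite3 module with an in-memory SQLite database is used in place of MySQL, the column types are a guess made for this sketch, and the rows are copied from the SALE table shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL (assumption)
conn.execute(
    """CREATE TABLE SALE (
           InvoiceNo TEXT, CarId TEXT, CustId TEXT, SaleDate TEXT,
           PaymentMode TEXT, EmpID TEXT, SalePrice REAL, Commission REAL)"""
)
sale_rows = [
    ("I00001", "D001", "C0001", "2019-01-24", "Credit Card", "E004", 613247.00, 73589.64),
    ("I00002", "S001", "C0002", "2018-12-12", "Online", "E001", 590321.00, 70838.52),
    ("I00003", "S002", "C0004", "2019-01-25", "Cheque", "E010", 604000.00, 72480.00),
    ("I00004", "D002", "C0001", "2018-10-15", "Bank Finance", "E007", 659982.00, 79197.84),
    ("I00005", "E001", "C0003", "2018-12-20", "Credit Card", "E002", 369310.00, 44317.20),
    ("I00006", "S002", "C0002", "2019-01-30", "Bank Finance", "E007", 620214.00, 74425.68),
]
conn.executemany("INSERT INTO SALE VALUES (?, ?, ?, ?, ?, ?, ?, ?)", sale_rows)

# GROUP BY: one result row per customer
per_customer = conn.execute(
    "SELECT CustId, COUNT(*) FROM SALE GROUP BY CustId ORDER BY CustId"
).fetchall()

# HAVING: keep only the groups (customers) with more than one sale
repeat_customers = conn.execute(
    "SELECT CustId, COUNT(*) FROM SALE GROUP BY CustId "
    "HAVING COUNT(*) > 1 ORDER BY CustId"
).fetchall()

# Payment modes used more than once
popular_modes = conn.execute(
    "SELECT PaymentMode, COUNT(PaymentMode) FROM SALE GROUP BY PaymentMode "
    "HAVING COUNT(*) > 1 ORDER BY PaymentMode"
).fetchall()

print(per_customer)
print(repeat_customers)
print(popular_modes)
conn.close()
```

Note that WHERE filters individual rows before grouping, whereas HAVING filters the groups after the aggregate values have been computed.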

1.4 Operations on Relations


We can perform certain operations on relations like
Union, Intersection, and Set Difference to merge the
tuples of two tables. These three operations are binary
operations as they work upon two tables. Note here, that
these operations can only be applied if both the relations
have the same number of attributes, and corresponding
attributes in both tables have the same domain.
1.4.1 UNION (∪)
This operation is used to combine the selected rows of
two tables at a time. If some rows are the same in both
the tables, then the result of the Union operation will
show those rows only once. Figure 1.3 shows union of
two sets.
[Figure 1.3: Union of two sets (Music and Dance)]

Let us consider two relations DANCE and MUSIC
shown in Tables 1.10 and 1.11 respectively.
Table 1.10 DANCE
+------+--------+-------+
| SNo | Name | Class |
+------+--------+-------+
| 1| Aastha | 7A |
| 2| Mahira | 6A |
| 3| Mohit | 7B |
| 4| Sanjay | 7A |
+------+--------+-------+

Table 1.11 MUSIC
+------+---------+-------+
| SNo | Name | Class |
+------+---------+-------+
| 1| Mehak | 8A |
| 2| Mahira | 6A |
| 3| Lavanya | 7A |
| 4| Sanjay | 7A |
| 5| Abhay | 8A |
+------+---------+-------+
If we need the list of students participating in either
of the events, then we have to apply UNION operation
(represented by symbol ∪) on relations DANCE and MUSIC.
The output of UNION operation is shown in Table 1.12.
Table 1.12 DANCE ∪ MUSIC
+------+---------+-------+
| SNo  | Name    | Class |
+------+---------+-------+
|    1 | Aastha  | 7A    |
|    2 | Mahira  | 6A    |
|    3 | Mohit   | 7B    |
|    4 | Sanjay  | 7A    |
|    1 | Mehak   | 8A    |
|    3 | Lavanya | 7A    |
|    5 | Abhay   | 8A    |
+------+---------+-------+

1.4.2 INTERSECT (∩)
Intersect operation is used to get the common tuples
from two tables and is represented by the symbol ∩.
Figure 1.4 shows intersection of two sets.
[Figure 1.4: Intersection of two sets (Music and Dance)]

Suppose we have to display the list of students
who are participating in both the events (DANCE and
MUSIC), then intersection operation is to be applied on
these two tables. The output of INTERSECT operation is
shown in Table 1.13.
Table 1.13 DANCE ∩ MUSIC
+------+---------+-------+
| SNo | Name | Class |
+------+---------+-------+
| 2| Mahira | 6A |
| 4| Sanjay | 7A |
+------+---------+-------+

1.4.3 MINUS (-)
This operation is used to get tuples/rows which are
in the first table but not in the second table, and the
operation is represented by the symbol - (minus). Figure
1.5 shows minus operation (also called set difference)
between two sets.
[Figure 1.5: Difference of two sets (Music and Dance)]

Suppose, we want the list of students who are only
participating in MUSIC and not in DANCE event. Then,
we will use the MINUS operation, whose output is given
in Table 1.14.
Table 1.14 MUSIC - DANCE
+------+---------+-------+
| SNo | Name | Class |
+------+---------+-------+
| 1| Mehak | 8A |
| 3| Lavanya | 7A |
| 5| Abhay | 8A |
+------+---------+-------+
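
The three set operations above can be tried out with the following minimal sketch. It rests on stated assumptions: Python's built-in sqlite3 module with an in-memory SQLite database is used for illustration, and in SQLite the MINUS operation is written with the keyword EXCEPT. Support for these keywords differs between database systems and versions, so the syntax should be checked against the manual of the DBMS in use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite used for illustration (assumption)
for table in ("DANCE", "MUSIC"):
    conn.execute(f"CREATE TABLE {table} (SNo INTEGER, Name TEXT, Class TEXT)")

conn.executemany(
    "INSERT INTO DANCE VALUES (?, ?, ?)",
    [(1, "Aastha", "7A"), (2, "Mahira", "6A"), (3, "Mohit", "7B"), (4, "Sanjay", "7A")],
)
conn.executemany(
    "INSERT INTO MUSIC VALUES (?, ?, ?)",
    [(1, "Mehak", "8A"), (2, "Mahira", "6A"), (3, "Lavanya", "7A"),
     (4, "Sanjay", "7A"), (5, "Abhay", "8A")],
)

# UNION: rows present in either table, common rows appear only once
union_rows = conn.execute(
    "SELECT * FROM DANCE UNION SELECT * FROM MUSIC"
).fetchall()

# INTERSECT: rows present in both tables
common_rows = conn.execute(
    "SELECT * FROM DANCE INTERSECT SELECT * FROM MUSIC"
).fetchall()

# EXCEPT (MINUS): rows of MUSIC that are not in DANCE
music_only = conn.execute(
    "SELECT * FROM MUSIC EXCEPT SELECT * FROM DANCE"
).fetchall()

print(len(union_rows), sorted(common_rows), sorted(music_only))
conn.close()
```

The order of the rows returned by these operations is not fixed, so the sketch sorts the results before printing them.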

1.4.4 Cartesian Product


Cartesian product operation combines tuples from two
relations. It results in all pairs of rows from the two input
relations, regardless of whether or not they have the
same values on common attributes. It is denoted as ‘X’.
The degree of the resulting relation is calculated
as the sum of the degrees of both the relations under
consideration. The cardinality of the resulting relation is
calculated as the product of the cardinality of relations
on which cartesian product is applied. Let us use
the relations DANCE and MUSIC to show the output
of cartesian product. Note that both relations are of
degree 3. The cardinality of relations DANCE and MUSIC
is 4 and 5 respectively. Applying cartesian product on
these two relations will result in a relation of degree 6
and cardinality 20, as shown in Table 1.15.

Table 1.15 DANCE X MUSIC
+------+--------+-------+------+---------+-------+
| SNo | Name | Class| SNo | Name | Class|
+------+--------+-------+------+---------+-------+
| 1 | Aastha | 7A | 1 | Mehak | 8A |
| 2 | Mahira | 6A | 1 | Mehak | 8A |
| 3 | Mohit | 7B | 1 | Mehak | 8A |
| 4 | Sanjay | 7A | 1 | Mehak | 8A |
| 1 | Aastha | 7A | 2 | Mahira | 6A |
| 2 | Mahira | 6A | 2 | Mahira | 6A |
| 3 | Mohit | 7B | 2 | Mahira | 6A |
| 4 | Sanjay | 7A | 2 | Mahira | 6A |
| 1 | Aastha | 7A | 3 | Lavanya | 7A |
| 2 | Mahira | 6A | 3 | Lavanya | 7A |
| 3 | Mohit | 7B | 3 | Lavanya | 7A |
| 4 | Sanjay | 7A | 3 | Lavanya | 7A |
| 1 | Aastha | 7A | 4 | Sanjay | 7A |
| 2 | Mahira | 6A | 4 | Sanjay | 7A |
| 3 | Mohit | 7B | 4 | Sanjay | 7A |
| 4 | Sanjay | 7A | 4 | Sanjay | 7A |
| 1 | Aastha | 7A | 5 | Abhay | 8A |
| 2 | Mahira | 6A | 5 | Abhay | 8A |
| 3 | Mohit | 7B | 5 | Abhay | 8A |
| 4 | Sanjay | 7A | 5 | Abhay | 8A |
+------+--------+-------+------+---------+-------+
20 rows in set (0.03 sec)
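
The degree and cardinality of a cartesian product can be verified with the following minimal sketch. It assumes Python's built-in sqlite3 module with an in-memory SQLite database in place of MySQL. The names degree and cardinality are hypothetical variables used only in this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL (assumption)
for table in ("DANCE", "MUSIC"):
    conn.execute(f"CREATE TABLE {table} (SNo INTEGER, Name TEXT, Class TEXT)")

conn.executemany(
    "INSERT INTO DANCE VALUES (?, ?, ?)",
    [(1, "Aastha", "7A"), (2, "Mahira", "6A"), (3, "Mohit", "7B"), (4, "Sanjay", "7A")],
)
conn.executemany(
    "INSERT INTO MUSIC VALUES (?, ?, ?)",
    [(1, "Mehak", "8A"), (2, "Mahira", "6A"), (3, "Lavanya", "7A"),
     (4, "Sanjay", "7A"), (5, "Abhay", "8A")],
)

# Listing two tables in FROM, separated by a comma, gives their cartesian product
cursor = conn.execute("SELECT * FROM DANCE, MUSIC")
product_rows = cursor.fetchall()

degree = len(cursor.description)   # number of columns: 3 + 3
cardinality = len(product_rows)    # number of rows: 4 * 5

# Keeping only the pairs in which both tuples have the same Name
same_name = conn.execute(
    "SELECT * FROM DANCE D, MUSIC M WHERE D.Name = M.Name ORDER BY D.SNo"
).fetchall()

print(degree, cardinality, same_name)
conn.close()
```

The degree is the sum of the degrees (3 + 3 = 6) and the cardinality is the product of the cardinalities (4 x 5 = 20), as stated above.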

1.5 Using Two Relations in a Query


Till now, we have written queries in SQL using a single
relation only. In this section, we will learn to write
queries using two relations.
1.5.1 Cartesian product on two tables
From the previous section, we learnt that application
of operator cartesian product on two tables results
in a table having all combinations of tuples from the
underlying tables. When more than one table is to be
used in a query, then we must specify the table names
separated by commas in the FROM clause, as shown in
Example 1.7. On execution of such a query, the DBMS
(MySQL) will first apply cartesian product on specified
tables to have a single table. The following query of
Example 1.7 applies cartesian product on the two tables
DANCE and MUSIC:
Example 1.7
a) Display all possible combinations of tuples of
relations DANCE and MUSIC
mysql> SELECT * FROM DANCE, MUSIC;

As we are using SELECT * in the query, the output will
be the Table 1.15 having degree 6 and cardinality 20.
b) From all the possible combinations of tuples of
relations DANCE and MUSIC, display only those
rows such that the attribute name in both have the
same value.
mysql> SELECT * FROM DANCE D, MUSIC M WHERE D.Name
= M.Name;

Table 1.16 Tuples with same name


+------+--------+-------+------+--------+-------+
| Sno | Name | Class | Sno | Name | class |
+------+--------+-------+------+--------+-------+
| 2 | Mahira | 6A | 2 | Mahira | 6A |
| 4 | Sanjay | 7A | 4 | Sanjay | 7A |
+------+--------+-------+------+--------+-------+
2 rows in set (0.00 sec)

Note that in this query we have used table aliases
(D for DANCE and M for MUSIC), just like column
aliases to refer to tables by shortened names. It is
important to note that table alias is valid only for
current query and the original table name cannot be
used in the query if its alias is given in FROM clause.
1.5.2 JOIN on two tables
JOIN operation combines tuples from two tables on
specified conditions. This is unlike cartesian product,
which makes all possible combinations of tuples. While
using the JOIN clause of SQL, we specify conditions on
the related attributes of two tables within the FROM
clause. Usually, such an attribute is the primary key
in one table and foreign key in another table. Let us
create two tables UNIFORM (UCode, UName, UColor)
and COST (UCode, Size, Price) in the SchoolUniform
database. UCode is the Primary Key in table UNIFORM.
UCode and Size together form the Composite Key in table COST.
Therefore, UCode is a common attribute between the
two tables which can be used to fetch the common data
from both the tables. Hence, we need to define UCode as
foreign key in the COST table while creating this table.
Table 1.17 Uniform table
+-------+-------+--------+
| Ucode | Uname | Ucolor |
+-------+-------+--------+
| 1 | Shirt | White |
| 2 | Pant | Grey |
| 3 | Tie | Blue |
+-------+-------+--------+

Table 1.18 Cost table
+-------+------+-------+
| Ucode | Size | Price |
+-------+------+-------+
|     1 | L    |   580 |
|     1 | M    |   500 |
|     2 | L    |   890 |
|     2 | M    |   810 |
+-------+------+-------+
Example 1.8
List the UCode, UName, UColor, Size and Price of related
tuples of tables UNIFORM and COST.
The given query may be written in three different ways
as given below:
a) Using condition in where clause
mysql> SELECT * FROM UNIFORM U, COST C WHERE
U.UCode = C.UCode;
Table 1.19 Output of the query
+-------+-------+--------+-------+---------+-------+
| UCode | UName | UColor | Ucode | Size | Price |
+-------+-------+--------+-------+---------+-------+
| 1 | Shirt | White | 1 | L | 580 |
| 1 | Shirt | White | 1 | M | 500 |
| 2 | Pant | Grey | 2 | L | 890 |
| 2 | Pant | Grey | 2 | M | 810 |
+-------+-------+--------+-------+---------+-------+
4 rows in set (0.08 sec)

As the attribute Ucode is in both tables, we need
to use table alias to remove ambiguity. Hence, we
have qualified the attribute UCode with the table
aliases (declared in the FROM clause) in the WHERE
clause to indicate which table it belongs to.
b) Explicit use of JOIN clause
mysql> SELECT * FROM UNIFORM U JOIN COST C ON
U.Ucode=C.Ucode;

The output of the query is the same as shown
in Table 1.19. In this query, we have used JOIN
clause explicitly along with condition in From
clause. Hence, no condition needs to be given in
where clause.
c) Explicit use of NATURAL JOIN clause
The output of queries (a) and (b) shown in Table
1.19 has a repetitive column Ucode having exactly
the same values. This redundant column provides
no additional information. There is an extension
of JOIN operation called NATURAL JOIN which
works similar to JOIN clause in SQL, but removes
the redundant attribute. This operator can be used
to join the contents of two tables if there is a
common attribute (with the same name) in both the
tables. The above SQL query using NATURAL JOIN is
shown below:
mysql> SELECT * FROM UNIFORM NATURAL JOIN COST;
+-------+-------+--------+------+-------+
| UCode | UName | UColor | Size | Price |
+-------+-------+--------+------+-------+
| 1 | Shirt | White | L | 580 |
| 1 | Shirt | White | M | 500 |
| 2 | Pant | Grey | L | 890 |
| 2 | Pant | Grey | M | 810 |
+-------+-------+--------+------+-------+
4 rows in set (0.17 sec)
It is clear from the output that the result of this
query is same as that of queries written in (a) and (b),
except that the attribute Ucode appears only once.
Following are some of the points to be considered
while applying JOIN operations on two or more relations:
• If two tables are to be joined on equality condition
on the common attribute, then one may use JOIN
with ON clause or NATURAL JOIN in FROM clause.
If three tables are to be joined on equality condition,
then two JOIN or NATURAL JOIN are required.
• In general, N-1 joins are needed to combine N tables
on equality condition.
• With JOIN clause, we may use any relational
operators to combine tuples of two tables.
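
The three ways of joining the UNIFORM and COST tables discussed above can be compared with the following minimal sketch. It is an illustration under stated assumptions: Python's built-in sqlite3 module with an in-memory SQLite database is used in place of MySQL, and the column types and key declarations are chosen for this sketch from the description of the two tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL (assumption)
conn.execute(
    "CREATE TABLE UNIFORM (UCode INTEGER PRIMARY KEY, UName TEXT, UColor TEXT)"
)
conn.execute(
    """CREATE TABLE COST (
           UCode INTEGER, Size TEXT, Price INTEGER,
           PRIMARY KEY (UCode, Size),
           FOREIGN KEY (UCode) REFERENCES UNIFORM (UCode))"""
)
conn.executemany(
    "INSERT INTO UNIFORM VALUES (?, ?, ?)",
    [(1, "Shirt", "White"), (2, "Pant", "Grey"), (3, "Tie", "Blue")],
)
conn.executemany(
    "INSERT INTO COST VALUES (?, ?, ?)",
    [(1, "L", 580), (1, "M", 500), (2, "L", 890), (2, "M", 810)],
)

# (a) Cartesian product filtered by a condition in the WHERE clause
where_rows = conn.execute(
    "SELECT * FROM UNIFORM U, COST C WHERE U.UCode = C.UCode "
    "ORDER BY U.UCode, C.Size"
).fetchall()

# (b) Explicit JOIN with the condition given in the ON clause
join_rows = conn.execute(
    "SELECT * FROM UNIFORM U JOIN COST C ON U.UCode = C.UCode "
    "ORDER BY U.UCode, C.Size"
).fetchall()

# (c) NATURAL JOIN: the common column UCode appears only once
natural_rows = conn.execute(
    "SELECT * FROM UNIFORM NATURAL JOIN COST ORDER BY UCode, Size"
).fetchall()

print(where_rows == join_rows)   # forms (a) and (b) return the same rows
print(natural_rows)
conn.close()
```

The Tie (UCode 3) does not appear in any of the results because it has no matching row in the COST table.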

Summary
• A Function is used to perform a particular task
and return a value as a result.
• Single row functions work on a single row to
return a single value.
• Multiple row functions work on a set of records as
a whole and return a single value.
• Numeric functions perform operations on numeric
values and return numeric values.
• String functions perform operations on character
type values and return either character or numeric
values.
• Date and time functions allow us to deal with date
type data values.
• GROUP BY clause is used to group the rows
together that contain the same values in a specified
column. Some of the group functions are COUNT,
MAX, MIN, AVG and SUM.
• Join is an operation which is used to combine
rows from two or more tables based on one or
more common fields between them.

Exercise
1. Answer the following questions:
a) Define RDBMS. Name any two RDBMS software.
b) What is the purpose of the following clauses in a
select statement?
i) ORDER BY
ii) HAVING
c) Cite any two differences between Single_row
functions and Aggregate functions.
d) What do you understand by Cartesian Product?
e) Write the name of the functions to perform the
following operations:
i) To display the day like “Monday”, “Tuesday”,
from the date when India got independence.
ii) To display the specified number of characters
from a particular position of the given string.
iii) To display the name of the month in which
you were born.
iv) To display your name in capital letters.
2. Write the output produced by the following SQL
commands:
a) SELECT POW(2,3);
b) SELECT ROUND(123.2345, 2),
ROUND(342.9234,-1);
c) SELECT LENGTH("Informatics Practices");
d) SELECT YEAR("1979/11/26"),
MONTH("1979/11/26"),
DAY("1979/11/26"),
MONTHNAME("1979/11/26");
e) SELECT LEFT("INDIA",3), RIGHT("Computer
Science",4);
f) SELECT MID("Informatics",3,4),
SUBSTR("Practices",3);
3. Consider the following table named “Product”,
showing details of products being sold in a grocery
shop.

PCode  PName           UPrice  Manufacturer
P01    Washing Powder  120     Surf
P02    Tooth Paste     54      Colgate
P03    Soap            25      Lux
P04    Tooth Paste     65      Pepsodant
P05    Soap            38      Dove
P06    Shampoo         245     Dove

a) Write SQL queries for the following:
i. Create the table Product with appropriate
data types and constraints.
ii. Identify the primary key in Product.
iii. List the Product Code, Product name and
price in descending order of their product
name. If PName is the same then display the
data in ascending order of price.
iv. Add a new column Discount to the table
Product.
v. Calculate the value of the discount in the
table Product as 10 per cent of the UPrice
for all those products where the UPrice is
more than 100, otherwise the discount will
be 0.
vi. Increase the price by 12 per cent for all the
products manufactured by Dove.
vii. Display the total number of products
manufactured by each manufacturer.
b) Write the output(s) produced by executing the
following queries on the basis of the information
given above in the table Product:
i. SELECT PName, AVG(UPrice) FROM
Product GROUP BY Pname;
ii. SELECT DISTINCT Manufacturer FROM
Product;

iii. SELECT COUNT(DISTINCT PName) FROM
Product;
iv. SELECT PName, MAX(UPrice), MIN(UPrice)
FROM Product GROUP BY PName;
4. Using the CARSHOWROOM database given in the
chapter, write the SQL queries for the following:
a) Add a new column Discount in the INVENTORY
table.
b) Set appropriate discount values for all cars
keeping in mind the following:
(i) No discount is available on the LXI model.
(ii) VXI model gives a 10% discount.
(iii) A 12% discount is given on cars other than
LXI model and VXI model.
c) Display the name of the costliest car with fuel
type “Petrol”.
d) Calculate the average discount and total discount
available on Car4.
e) List the total number of cars having no discount.
5. Consider the following tables Student and Stream in
the Streams_of_Students database. The primary key
of the Stream table is StCode (stream code) which is
the foreign key in the Student table. The primary key
of the Student table is AdmNo (admission number).

AdmNo Name StCode


211 Jay NULL
241 Aditya S03
290 Diksha S01
333 Jasqueen S02
356 Vedika S01
380 Ashpreet S03

StCode Stream
S01 Science
S02 Commerce
S03 Humanities

Write SQL queries for the following:


a) Create the database Streams_Of_Students.

b) Create the table Student by choosing appropriate
data types based on the data given in the table.
c) Identify the Primary keys from tables Student
and Stream. Also, identify the foreign key from
the table Student.
d) Jay has now changed his stream to Humanities.
Write an appropriate SQL query to reflect this
change.
e) Display the names of students whose names end
with the character ‘a’. Also, arrange the students
in alphabetical order.
f) Display the names of students enrolled in Science
and Humanities stream, ordered by student name
in alphabetical order, then by admission number
in ascending order (for duplicate names).
g) List the number of students in each stream having
more than 1 student.
h) Display the names of students enrolled in
different streams, where students are arranged
in descending order of admission number.
i) Show the Cartesian product on the Student
and Stream table. Also mention the degree and
cardinality produced after applying the Cartesian
product.
j) Add a new column ‘TeacherIncharge’ in the
Stream table. Insert appropriate data in each row.
k) List the names of teachers and students.
l) If Cartesian product is again applied on Student
and Stream tables, what will be the degree and
cardinality of this modified table?



Chapter 2
Data Handling Using Pandas - I

“If you don't think carefully, you
might believe that programming
is just typing statements in a
programming language.”
— W. Cunningham

In this chapter
»» Introduction to Python Libraries
»» Series
»» DataFrame
»» Importing and Exporting Data between CSV Files and DataFrames
»» Pandas Series Vs NumPy ndarray

2.1 Introduction to Python Libraries
Python libraries contain a collection of built-in
modules that allow us to perform many
actions without writing detailed programs
for it. Each library in Python contains a large
number of modules that one can import and
use.
NumPy, Pandas and Matplotlib are three
well-established Python libraries for scientific
and analytical use. These libraries allow us
to manipulate, transform and visualise data
easily and efficiently.
NumPy, which stands for ‘Numerical
Python’, is a library we discussed in class
XI. Recall that, it is a package that can
be used for numerical data analysis and

2024-25

Chapter 2.indd 27 11/26/2020 12:32:46 PM


scientific computing. NumPy uses a multidimensional
array object and has functions and tools for working
with these arrays. Elements of an array stay together in
memory, hence, they can be quickly accessed.
PANDAS (PANel DAta) is a high-level data manipulation
tool used for analysing data. It is very easy to import
and export data using Pandas library which has a very
rich set of functions. It is built on packages like NumPy
and Matplotlib and gives us a single, convenient place
to do most of our data analysis and visualisation work.
Pandas has three important data structures, namely
Series, DataFrame and Panel, to make the process of
analysing data organised, effective and efficient. Note
that Panel is no longer available in recent versions of
Pandas.
The Matplotlib library in Python is used for plotting
graphs and visualisation. Using Matplotlib, with just a
few lines of code we can generate publication quality
plots, histograms, bar charts, scatterplots, etc. It is
also built on NumPy, and is designed to work well with
NumPy and Pandas.
You may wonder why Pandas is needed when
NumPy can be used for data analysis. Following are
some of the differences between Pandas and NumPy:
1. A NumPy array requires homogeneous data, while
a Pandas DataFrame can have different data types
(float, int, string, datetime, etc.).
2. Pandas has a simpler interface for operations like
file loading, plotting, selection, joining, GROUP
BY, which come very handy in data-processing
applications.
3. Pandas DataFrames (with column names) make it
very easy to keep track of data.
4. Pandas is used when data is in tabular format,
whereas NumPy is used for numeric array based
data manipulation.
2.1.1. Installing Pandas
Installing Pandas is very similar to installing NumPy. To
install Pandas from command line, we need to type in:

pip install pandas

Note that both NumPy and Pandas can be installed
only when Python is already installed on that system.
The same is true for other libraries of Python.

2.1.2. Data Structure in Pandas


A data structure is a collection of data values and
operations that can be applied to that data. It enables
efficient storage, retrieval and modification of the data.
For example, we have already worked with a data
structure ndarray in NumPy in Class XI. Recall the ease
with which we can store, access and update data using
a NumPy array. Two commonly used data structures in
Pandas that we will cover in this book are:
• Series
• DataFrame
2.2 Series
A Series is a one-dimensional array containing a
sequence of values of any data type (int, float, list,
string, etc) which by default have numeric data labels
starting from zero. The data label associated with a
particular value is called its index. We can also assign
values of other data types as index. We can imagine a
Pandas Series as a column in a spreadsheet. Example
of a series containing names of students is given below:
Index Value
0 Arnab
1 Samridhi
2 Ramit
3 Divyam
4 Kritika

2.2.1 Creation of Series


There are different ways in which a series can be created
in Pandas. To create or use series, we first need to import
the Pandas library.
(A) Creation of Series from Scalar Values
A Series can be created using scalar values as shown in
the example below:
>>> import pandas as pd #import Pandas with alias pd
>>> series1 = pd.Series([10,20,30]) #create a Series
>>> print(series1) #Display the series

Output:
0 10
1 20
2 30
dtype: int64

Observe that output is shown in two columns - the
index is on the left and the data value is on the right. If
we do not explicitly specify an index for the data values
while creating a series, then by default indices range
from 0 through N – 1. Here N is the number of data
elements.
We can also assign user-defined labels to the index
and use them to access elements of a Series. The
following example has a numeric index in random order.
>>> series2 = pd.Series(["Kavi","Shyam","Ravi"], index=[3,5,1])
>>> print(series2) #Display the series

Output:
3 Kavi
5 Shyam
1 Ravi
dtype: object

Activity 2.1
Create a series having names of any five famous monuments of
India and assign their States as index values.

Here, data values Kavi, Shyam and Ravi have index
values 3, 5 and 1, respectively. We can also use letters
or strings as indices, for example:
>>> series2 = pd.Series([2,3,4],index=["Feb","Mar","Apr"])
>>> print(series2) #Display the series

Output:
Feb 2
Mar 3
Apr 4
dtype: int64
Here, data values 2,3,4 have index values Feb, Mar
and Apr, respectively.

Think and Reflect
While importing Pandas, is it mandatory to always use pd as an
alias name? What would happen if we give any other name?

(B) Creation of Series from NumPy Arrays
We can create a series from a one-dimensional (1D)
NumPy array, as shown below:
>>> import numpy as np # import NumPy with alias np
>>> import pandas as pd
>>> array1 = np.array([1,2,3,4])
>>> series3 = pd.Series(array1)
>>> print(series3)

Output:
0 1
1 2
2 3
3 4
dtype: int32

The following example shows that we can use letters
or strings as indices:
>>> series4 = pd.Series(array1, index = ["Jan",
"Feb", "Mar", "Apr"])
>>> print(series4)
Jan 1
Feb 2
Mar 3
Apr 4
dtype: int32
When index labels are passed with the array, then
the length of the index and array must be of the same
size, else it will result in a ValueError. In the example
shown below, array1 contains 4 values whereas there
are only 3 indices, hence ValueError is displayed.
>>> series5 = pd.Series(array1, index = ["Jan",
"Feb", "Mar"])
ValueError: Length of passed values is 4, index
implies 3
(C) Creation of Series from Dictionary
Recall that Python dictionary has key: value pairs and
a value can be quickly retrieved when its key is known.
Dictionary keys can be used to construct an index for a
Series, as shown in the following example. Here, keys of
the dictionary dict1 become indices in the series.
>>> dict1 = {'India': 'NewDelhi', 'UK':
'London', 'Japan': 'Tokyo'}
>>> print(dict1) #Display the dictionary
{'India': 'NewDelhi', 'UK': 'London', 'Japan':
'Tokyo'}
>>> series8 = pd.Series(dict1)
>>> print(series8) #Display the series
India NewDelhi
UK London
Japan Tokyo
dtype: object
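
The different ways of creating a Series described above are put together in the following minimal sketch. It assumes that the Pandas library is installed. The variable names marks, months, capitals and mismatch_error are hypothetical names used only for this illustration.

```python
import pandas as pd

# Default index: positions 0 to N-1
marks = pd.Series([10, 20, 30])

# User-defined string labels as the index
months = pd.Series([2, 3, 4], index=["Feb", "Mar", "Apr"])

# Keys of a dictionary become the index of the Series
capitals = pd.Series({"India": "NewDelhi", "UK": "London", "Japan": "Tokyo"})

# Four values with only three index labels raise a ValueError
try:
    pd.Series([1, 2, 3, 4], index=["Jan", "Feb", "Mar"])
    mismatch_error = None
except ValueError as err:
    mismatch_error = str(err)

print(list(marks.index))      # default numeric labels
print(list(capitals.index))   # labels taken from the dictionary keys
print(mismatch_error)         # message describing the length mismatch
```

Catching the ValueError in this way lets a program report the mismatch instead of stopping abruptly.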

2.2.2 Accessing Elements of a Series


There are two common ways for accessing the elements
of a series: Indexing and Slicing.
(A) Indexing
Indexing in Series is similar to that for NumPy arrays,
and is used to access elements in a series. Indexes
are of two types: positional index and labelled index.
Positional index takes an integer value that corresponds
to its position in the series starting from 0, whereas
labelled index takes any user-defined label as index.

• The following example shows the usage of the positional
index for accessing a value from a Series.
>>> seriesNum = pd.Series([10,20,30])
>>> seriesNum[2]
30
Here, the value 30 is displayed for the positional
index 2.
When labels are specified, we can use labels as
indices while selecting values from a Series, as shown
below. Here, the value 3 is displayed for the labelled
index Mar.
>>> seriesMnths = pd.Series([2,3,4], index=["Feb","Mar","Apr"])
>>> seriesMnths["Mar"]
3
In the following example, value NewDelhi is
displayed for the labelled index India.
>>> seriesCapCntry = pd.Series(['NewDelhi',
'WashingtonDC', 'London', 'Paris'],
index=['India', 'USA', 'UK', 'France'])
>>> seriesCapCntry['India']
'NewDelhi'
We can also access an element of the series using
the positional index:
>>> seriesCapCntry[1]
'WashingtonDC'

Activity 2.2
Write the statement to get NewDelhi as output
using positional index.

More than one element of a series can be accessed
using a list of positional integers or a list of index
labels as shown in the following examples:

>>> seriesCapCntry[[3,2]]
France Paris
UK London
dtype: object

>>> seriesCapCntry[['UK','USA']]
UK London
USA WashingtonDC
dtype: object
The index values associated with the series can be
altered by assigning new index values as shown in
the following example:
>>> seriesCapCntry.index=[10,20,30,40]
>>> seriesCapCntry

10 NewDelhi
20 WashingtonDC
30 London
40 Paris
dtype: object
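When a series has string labels, an integer inside [ ] can be ambiguous, and recent versions of Pandas may issue a warning when it is treated as a position. The sketch below uses the standard indexers iloc[ ] (position based) and loc[ ] (label based), which this section does not otherwise use for Series, to state the intent explicitly. The variable names by_position, by_label and several are chosen only for illustration.

```python
import pandas as pd

seriesCapCntry = pd.Series(['NewDelhi', 'WashingtonDC', 'London', 'Paris'],
                           index=['India', 'USA', 'UK', 'France'])

by_position = seriesCapCntry.iloc[1]    # second element, counting from 0
by_label = seriesCapCntry.loc['USA']    # element whose label is 'USA'
several = seriesCapCntry.iloc[[3, 2]]   # a list of positions returns a Series

print(by_position, by_label)
print(several)
```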
(B) Slicing
Sometimes, we may need to extract a part of a series.
This can be done through slicing. This is similar to
slicing used with NumPy arrays. We can define which
part of the series is to be sliced by specifying the start
and end parameters [start:end] with the series name.
When we use positional indices for slicing, the value
at the end index position is excluded, i.e., only (end -
start) number of data values of the series are extracted.
Consider the following series seriesCapCntry:
>>> seriesCapCntry = pd.Series(['NewDelhi', 'WashingtonDC', 'London',
'Paris'], index=['India', 'USA', 'UK', 'France'])
>>> seriesCapCntry[1:3] #excludes the value at index position 3
USA WashingtonDC
UK London
dtype: object
As we can see in the above output, only the data
values at positions 1 and 2 are displayed. If labelled
indexes are used for slicing, then the value at the end index
label is also included in the output, for example:
>>> seriesCapCntry['USA' : 'France']

USA WashingtonDC
UK London
France Paris
dtype: object

We can also get the series in reverse order, for
example:
>>> seriesCapCntry[ : : -1]
France Paris
UK London
USA WashingtonDC
India NewDelhi
dtype: object
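The difference between the two kinds of slices can be checked by comparing the number of elements each one returns. This is a small sketch under the assumption that the explicit indexers iloc[ ] and loc[ ] are used to make the kind of slice unambiguous; the variable names are illustrative only.

```python
import pandas as pd

seriesCapCntry = pd.Series(['NewDelhi', 'WashingtonDC', 'London', 'Paris'],
                           index=['India', 'USA', 'UK', 'France'])

# Positional slice: the end position 3 is excluded
positional = seriesCapCntry.iloc[1:3]

# Label slice: the end label 'France' is included
labelled = seriesCapCntry.loc['USA':'France']

# A step of -1 walks the series from the last element to the first
reversed_series = seriesCapCntry.iloc[::-1]

print(len(positional), len(labelled))
print(reversed_series)
```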

We can also use slicing to modify the values of series
elements as shown in the following example:
>>> import numpy as np
>>> seriesAlph = pd.Series(np.arange(10,16,1),
index = ['a', 'b', 'c', 'd', 'e', 'f'])
>>> seriesAlph
a 10
b 11
c 12
d 13
e 14
f 15
dtype: int32

>>> seriesAlph[1:3] = 50
>>> seriesAlph
a 10
b 50
c 50
d 13
e 14
f 15
dtype: int32
Observe that updating the values in a series using
slicing also excludes the value at the end index position.
But, it changes the value at the end index label when
slicing is done using labels.
>>> seriesAlph['c':'e'] = 500
>>> seriesAlph
a 10
b 50
c 500
d 500
e 500
f 15
dtype: int32

2.2.3 Attributes of Series


We can access certain properties called attributes of
a series by using that property with the series name.
Table 2.1 lists some attributes of Pandas series
using seriesCapCntry as an example:

>>> seriesCapCntry
India NewDelhi
USA WashingtonDC
UK London
France Paris
dtype: object

Table 2.1 Attributes of Pandas Series

Attribute Name: name
Purpose: assigns a name to the Series
Example:
>>> seriesCapCntry.name = 'Capitals'
>>> print(seriesCapCntry)
India NewDelhi
USA WashingtonDC
UK London
France Paris
Name: Capitals, dtype: object

Attribute Name: index.name
Purpose: assigns a name to the index of the series
Example:
>>> seriesCapCntry.index.name = 'Countries'
>>> print(seriesCapCntry)
Countries
India NewDelhi
USA WashingtonDC
UK London
France Paris
Name: Capitals, dtype: object

Attribute Name: values
Purpose: returns the values in the series as an array
Example:
>>> print(seriesCapCntry.values)
['NewDelhi' 'WashingtonDC' 'London' 'Paris']

Attribute Name: size
Purpose: returns the number of values in the Series object
Example:
>>> print(seriesCapCntry.size)
4

Attribute Name: empty
Purpose: returns True if the series is empty, and False otherwise
Example:
>>> seriesCapCntry.empty
False
# Create an empty series
>>> seriesEmpt = pd.Series()
>>> seriesEmpt.empty
True

2.2.4 Methods of Series


In this section, we are going to discuss some of the
methods that are available for Pandas Series. Let us
consider the following series:
>>> seriesTenTwenty=pd.Series(np.arange( 10, 20, 1 ))
>>> print(seriesTenTwenty)
0 10
1 11
2 12
3 13
4 14
5 15
6 16
7 17
8 18
9 19
dtype: int32

Activity 2.3
Consider the following code:
>>> import pandas as pd
>>> import numpy as np
>>> s2=pd.Series([12,np.nan,10])
>>> print(s2)
Find output of the above code and write a Python
statement to count and display only non null values
in the above series.

Method: head(n)
Explanation: Returns the first n members of the series. If
the value for n is not passed, then by default n takes 5
and the first five members are displayed.
Example:
>>> seriesTenTwenty.head(2)
0 10
1 11
dtype: int32
>>> seriesTenTwenty.head()
0 10
1 11
2 12
3 13
4 14
dtype: int32

Method: count()
Explanation: Returns the number of non-NaN values in
the Series
Example:
>>> seriesTenTwenty.count()
10

Method: tail(n)
Explanation: Returns the last n members of the series. If
the value for n is not passed, then by default n takes 5
and the last five members are displayed.
Example:
>>> seriesTenTwenty.tail(2)
8 18
9 19
dtype: int32
>>> seriesTenTwenty.tail()
5 15
6 16
7 17
8 18
9 19
dtype: int32
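The difference between the size attribute and the count() method only shows up when a series contains missing values. The sketch below uses a small series with hypothetical sample data (seriesWithGap is a name chosen for illustration) to show the two side by side.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with two missing values
seriesWithGap = pd.Series([5, np.nan, 7, np.nan, 9, 11])

print(seriesWithGap.size)      # total number of elements, NaN included
print(seriesWithGap.count())   # number of non-NaN elements only
print(seriesWithGap.head(2))   # first two members
print(seriesWithGap.tail(2))   # last two members
```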

2.2.5 Mathematical Operations on Series


We have learnt in Class XI that if we perform basic
mathematical operations like addition, subtraction,
multiplication, division, etc., on two NumPy arrays,
the operation is done on each corresponding pair of
elements. Similarly, we can perform mathematical
operations on two series in Pandas.
While performing mathematical operations on series,
index matching is implemented and all missing values
are filled in with NaN by default.
Consider the following series: seriesA and seriesB
for understanding mathematical operations on series in
Pandas.
>>> seriesA = pd.Series([1,2,3,4,5], index =
['a', 'b', 'c', 'd', 'e'])

>>> seriesA
a 1
b 2
c 3
d 4
e 5
dtype: int64

>>> seriesB = pd.Series([10,20,-10,-50,100],
index = ['z', 'y', 'a', 'c', 'e'])
>>> seriesB
z 10
y 20
a -10
c -50
e 100
dtype: int64

(A) Addition of two Series


It can be done in two ways. In the first method, two
series are simply added together, as shown in the
following code. Table 2.2 shows the detailed values that
were matched while performing the addition. Note here
that the output of addition is NaN if one of the elements
or both elements have no value.
>>> seriesA + seriesB
a -9.0
b NaN
c -47.0
d NaN
e 105.0
y NaN
z NaN
dtype: float64

Table 2.2 Details of addition of two series

index   value from seriesA   value from seriesB   seriesA + seriesB
a       1                    -10                  -9.0
b       2                    (missing)            NaN
c       3                    -50                  -47.0
d       4                    (missing)            NaN
e       5                    100                  105.0
y       (missing)            20                   NaN
z       (missing)            10                   NaN

The second method is applied when we do not
want to have NaN values in the output. We can use
the series method add() and a parameter fill_value to
replace missing value with a specified value. That is,
calling seriesA.add(seriesB) is equivalent to calling
seriesA+seriesB, but add() allows explicit specification
of the fill value for any element in seriesA or seriesB
that might be missing, as shown in Table 2.3.

>>> seriesA.add(seriesB, fill_value=0)
a -9.0
b 2.0
c -47.0
d 4.0
e 105.0
y 20.0
z 10.0
dtype: float64

Table 2.3 Details of addition of two series using add() method

index   value from seriesA   value from seriesB   seriesA + seriesB
a       1                    -10                  -9.0
b       2                    0                    2.0
c       3                    -50                  -47.0
d       4                    0                    4.0
e       5                    100                  105.0
y       0                    20                   20.0
z       0                    10                   10.0

Activity 2.4
Draw two tables for subtraction similar to tables 2.2
and 2.3 showing the changes in the series elements and
corresponding output without replacing the missing
values, and after replacing the missing values with 1000.
Note that Table 2.2 shows the changes in the series
elements and corresponding output without replacing
the missing values, while Table 2.3 shows the changes
in the series elements and corresponding output after
replacing missing values by 0. Just like addition,
subtraction, multiplication and division can also be
done using the corresponding mathematical operators or
by explicitly calling the appropriate method.
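The two ways of adding series can be compared directly in a short script. The sketch below recreates seriesA and seriesB from this section; the names plain and filled are introduced here only for illustration.

```python
import pandas as pd

seriesA = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])
seriesB = pd.Series([10, 20, -10, -50, 100], index=['z', 'y', 'a', 'c', 'e'])

# Operator form: labels present in only one series give NaN
plain = seriesA + seriesB

# Method form: a missing value is treated as 0 before adding
filled = seriesA.add(seriesB, fill_value=0)

print(plain)
print(filled)
print(plain.isna().sum(), "labels had no partner in the other series")
```

The same pattern applies to sub(), mul() and div(), each of which accepts the fill_value parameter.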
(B) Subtraction of two Series
Again, it can be done in two different ways, as shown in
the following examples:
>>> seriesA - seriesB #using subtraction operator
a 11.0
b NaN
c 53.0
d NaN
e -95.0
y NaN
z NaN
dtype: float64

Activity 2.5
Draw two tables for multiplication similar to Tables 2.2
and 2.3 showing the changes in the series elements and
corresponding output without replacing the missing
values, and after replacing the missing values with 0.

Let us now replace the missing values with 1000
before subtracting seriesB from seriesA using explicit
subtraction method sub().

>>> seriesA.sub(seriesB, fill_value=1000)
# using fill value 1000 while making explicit
# call of the method
a 11.0
b -998.0
c 53.0
d -996.0
e -95.0
y 980.0
z 990.0
dtype: float64
(C) Multiplication of two Series
Again, it can be done in two different ways, as shown in
the following examples:
>>> seriesA * seriesB #using multiplication operator
a -10.0
b NaN
c -150.0
d NaN
e 500.0
y NaN
z NaN
dtype: float64
Let us now replace the missing values with 0 before
multiplication of seriesB with seriesA using explicit
multiplication method mul().
>>> seriesA.mul(seriesB, fill_value=0)
# using fill value 0 while making
# explicit call of the method
a -10.0
b 0.0
c -150.0
d 0.0
e 500.0
y 0.0
z 0.0
dtype: float64

Activity 2.6
Draw two tables for division similar to tables 2.2
and 2.3 showing the changes in the series elements and
corresponding output without replacing the missing
values, and after replacing the missing values with 0.
(D) Division of two Series
Again, it can be done in two different ways, as shown in
the following examples:
>>> seriesA/seriesB # using division operator
a -0.10
b NaN
c -0.06
d NaN
e 0.05
y NaN
z NaN
dtype: float64

Explicit call to a mathematical operation is preferred
when series may have missing values and we want to
replace them by a specific value to have a concrete
output in place of NaN.
Let us now replace the missing values with 0 before
dividing seriesA by seriesB using explicit division
method div().

>>> seriesA.div(seriesB, fill_value=0)
# using fill value 0 while making explicit
# call of the method
a -0.10
b inf
c -0.06
d inf
e 0.05
y 0.00
z 0.00
dtype: float64
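The inf values in the output above come from the fill value itself: labels b and d are missing in seriesB, so they are replaced by 0 and the division becomes 2/0 and 4/0. Labels y and z are missing in seriesA, so they become 0/20 and 0/10, which give 0. The following sketch locates the infinite results; the names quotient and infinite_labels are chosen only for illustration.

```python
import numpy as np
import pandas as pd

seriesA = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])
seriesB = pd.Series([10, 20, -10, -50, 100], index=['z', 'y', 'a', 'c', 'e'])

quotient = seriesA.div(seriesB, fill_value=0)
print(quotient)

# Labels whose divisor was filled with 0 end up as infinity
infinite_labels = list(quotient[np.isinf(quotient)].index)
print(infinite_labels)
```

For division, a fill value other than 0 may therefore be a better choice, depending on what a missing value is meant to represent.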

2.3 DataFrame
Sometimes we need to work on multiple columns at
a time, i.e., we need to process tabular data, for
example, the result of a class, items in a restaurant’s
menu, the reservation chart of a train, etc. Pandas stores
such tabular data using a DataFrame. A DataFrame is
a two-dimensional labelled data structure like a table
of MySQL. It contains rows and columns, and therefore
has both a row and column index. Each column can
have a different type of value such as numeric, string,
boolean, etc., as in tables of a database.

In the following example, the row indexes are 1, 2 and 3,
and the column indexes are State, Geographical Area
(sq Km) and Area under Very Dense Forests (sq Km):

     State    Geographical Area (sq Km)   Area under Very Dense Forests (sq Km)
1    Assam    78438                       2797
2    Delhi    1483                        6.72
3    Kerala   38852                       1663

2.3.1 Creation of DataFrame


There are a number of ways to create a DataFrame.
Some of them are listed in this section.
(A) Creation of an empty DataFrame
An empty DataFrame can be created as follows:

>>> import pandas as pd
>>> dFrameEmt = pd.DataFrame()
>>> dFrameEmt
Empty DataFrame
Columns: []
Index: []
(B) Creation of DataFrame from NumPy ndarrays
Consider the following three NumPy ndarrays. Let us
create a simple DataFrame without any column labels,
using a single ndarray:
>>> import numpy as np
>>> array1 = np.array([10,20,30])
>>> array2 = np.array([100,200,300])
>>> array3 = np.array([-10,-20,-30, -40])
>>> dFrame4 = pd.DataFrame(array1)
>>> dFrame4
0
0 10
1 20
2 30
We can create a DataFrame using more than one
ndarrays, as shown in the following example:
>>> dFrame5 = pd.DataFrame([array1, array3,
array2], columns=[ 'A', 'B', 'C', 'D'])
>>> dFrame5
A B C D
0 10 20 30 NaN
1 -10 -20 -30 -40.0
2 100 200 300 NaN

Think and Reflect
What would happen if we pass 3 columns or
5 columns instead of 4 in the above code?
What is the reason?
(C) Creation of DataFrame from List of Dictionaries
We can create DataFrame from a list of Dictionaries, for
example:
# Create list of dictionaries
>>> listDict = [{'a':10, 'b':20}, {'a':5,
'b':10, 'c':20}]
>>> dFrameListDict = pd.DataFrame(listDict)
>>> dFrameListDict
a b c
0 10 20 NaN
1 5 10 20.0
Here, the dictionary keys are taken as column
labels, and the values corresponding to each key are
taken as rows. There will be as many rows as the
number of dictionaries present in the list. In the above
example there are two dictionaries in the list. So, the
DataFrame consists of two rows. The number of columns
in a DataFrame is equal to the number of distinct keys
across all the dictionaries of the list. Hence, there are three
columns, as the keys 'a', 'b' and 'c' appear in the list.
Also, note that NaN (Not a Number) is inserted if a
corresponding value for a column is missing.
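How the column labels are derived can be verified with a list in which no single dictionary holds every key. The sketch below uses hypothetical data: each dictionary has only two keys, yet the DataFrame gets three columns because three distinct keys occur across the list.

```python
import pandas as pd

# Hypothetical data: each dictionary has two keys, three distinct keys overall
listDict = [{'a': 10, 'b': 20}, {'b': 5, 'c': 10}]

dFrame = pd.DataFrame(listDict)
print(dFrame)
print(dFrame.shape)   # (number of rows, number of columns)
```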
(D) Creation of DataFrame from Dictionary of Lists
DataFrames can also be created from a dictionary of
lists. Consider the following dictionary consisting of the
keys ‘State’, ‘GArea’ (geographical area) and ‘VDF’ (very
dense forest) and the corresponding values as list.
>>> dictForest = {'State': ['Assam', 'Delhi',
'Kerala'],
'GArea': [78438, 1483, 38852] ,
'VDF' : [2797, 6.72,1663]}
>>> dFrameForest= pd.DataFrame(dictForest)
>>> dFrameForest
State GArea VDF
0 Assam 78438 2797.00
1 Delhi 1483 6.72
2 Kerala 38852 1663.00
Note that dictionary keys become column labels by
default in a DataFrame, and each list supplies the values
of the corresponding column. Thus, a DataFrame can be
thought of as a dictionary of lists or a dictionary of series.
We can change the sequence of columns in a
DataFrame. This can be done by assigning a particular
sequence of the dictionary keys as columns parameter,
for example:
>>> dFrameForest1 = pd.DataFrame(dictForest,
columns = ['State','VDF', 'GArea'])
>>> dFrameForest1
State VDF GArea
0 Assam 2797.00 78438
1 Delhi 6.72 1483
2 Kerala 1663.00 38852
In the output, VDF is now displayed as the middle
column instead of last.
(E) Creation of DataFrame from Series
Consider the following three Series:
seriesA = pd.Series([1,2,3,4,5],
index = ['a', 'b', 'c', 'd', 'e'])
seriesB = pd.Series([1000,2000,-1000,-5000,1000],
index = ['a', 'b', 'c', 'd', 'e'])

seriesC = pd.Series([10,20,-10,-50,100],
index = ['z', 'y', 'a', 'c', 'e'])
We can create a DataFrame using a single series as
shown below:
>>> dFrame6 = pd.DataFrame(seriesA)
>>> dFrame6
0
a 1
b 2
c 3
d 4
e 5
Here, the DataFrame dFrame6 has as many rows
as the number of elements in the series, but
has only one column. To create a DataFrame using
more than one series, we need to pass multiple series in
a list as shown below:
>>> dFrame7 = pd.DataFrame([seriesA, seriesB])
>>> dFrame7
a b c d e
0 1 2 3 4 5
1 1000 2000 -1000 -5000 1000
Observe that the labels in the series object become
the column names in the DataFrame object and each
series becomes a row in the DataFrame. Now look at the
following example:

>>> dFrame8 = pd.DataFrame([seriesA, seriesC])


>>> dFrame8
a b c d e z y
0 1.0 2.0 3.0 4.0 5.0 NaN NaN
1 -10.0 NaN -50.0 NaN 100.0 10.0 20.0
Here, different series do not have the same set of
labels. But, the number of columns in a DataFrame
equals the number of distinct labels in all the series.
So, if a particular series does not have a corresponding
value for a label, NaN is inserted in the DataFrame column.
(F) Creation of DataFrame from Dictionary of Series
A dictionary of series can also be used to create a
DataFrame. For example, ResultSheet is a dictionary of
series containing marks of 5 students in three subjects.
The names of the students are the keys to the dictionary,
and the index values of the series are the subject names
as shown below:

>>> ResultSheet={
'Arnab': pd.Series([90, 91, 97],
index=['Maths','Science','Hindi']),
'Ramit': pd.Series([92, 81, 96],
index=['Maths','Science','Hindi']),
'Samridhi': pd.Series([89, 91, 88],
index=['Maths','Science','Hindi']),
'Riya': pd.Series([81, 71, 67],
index=['Maths','Science','Hindi']),
'Mallika': pd.Series([94, 95, 99],
index=['Maths','Science','Hindi'])}
>>> ResultDF = pd.DataFrame(ResultSheet)
>>> ResultDF
Arnab Ramit Samridhi Riya Mallika
Maths 90 92 89 81 94
Science 91 81 91 71 95
Hindi 97 96 88 67 99

Activity 2.7
Use the type function to check the datatypes of
ResultSheet and ResultDF. Are they the same?

The following output shows that every column in the
DataFrame is a Series:
>>> type(ResultDF.Arnab)
<class 'pandas.core.series.Series'>
When a DataFrame is created from a Dictionary of
Series, the resulting index or row labels are a union of all
series indexes used to create the DataFrame. For example:
dictForUnion = { 'Series1' :
pd.Series([1,2,3,4,5],
index = ['a', 'b', 'c', 'd', 'e']) ,
'Series2' :
pd.Series([10,20,-10,-50,100],
index = ['z', 'y', 'a', 'c', 'e']),
'Series3' :
pd.Series([10,20,-10,-50,100],
index = ['z', 'y', 'a', 'c', 'e']) }

>>> dFrameUnion = pd.DataFrame(dictForUnion)


>>> dFrameUnion

Series1 Series2 Series3
a 1.0 -10.0 -10.0
b 2.0 NaN NaN
c 3.0 -50.0 -50.0
d 4.0 NaN NaN
e 5.0 100.0 100.0
y NaN 20.0 20.0
z NaN 10.0 10.0

2.3.2 Operations on rows and columns in DataFrames


We can perform some basic operations on rows and
columns of a DataFrame like selection, deletion,
addition, and renaming, as discussed in this section.

(A) Adding a New Column to a DataFrame


We can easily add a new column to a DataFrame. Let
us consider the DataFrame ResultDF defined earlier. In
order to add a new column for another student ‘Preeti’,
we can write the following statement:
>>> ResultDF['Preeti']=[89,78,76]
>>> ResultDF

Arnab Ramit Samridhi Riya Mallika Preeti


Maths 90 92 89 81 94 89
Science 91 81 91 71 95 78
Hindi 97 96 88 67 99 76
Assigning values to a new column label that does not
exist will create a new column at the end. If the column
already exists in the DataFrame then the assignment
statement will update the values of the already existing
column, for example:
>>> ResultDF['Ramit']=[99, 98, 78]
>>> ResultDF
Arnab Ramit Samridhi Riya Mallika Preeti
Maths 90 99 89 81 94 89
Science 91 98 91 71 95 78
Hindi 97 78 88 67 99 76
We can also change data of an entire column to a
particular value in a DataFrame. For example, the
following statement sets marks=90 for all subjects for
the column name 'Arnab':

>>> ResultDF['Arnab']=90
>>> ResultDF
Arnab Ramit Samridhi Riya Mallika Preeti
Maths 90 99 89 81 94 89
Science 90 98 91 71 95 78
Hindi 90 78 88 67 99 76
(B) Adding a New Row to a DataFrame
We can add a new row to a DataFrame using the
DataFrame.loc[ ] method. Consider the DataFrame
ResultDF that has three rows for the three subjects –
Maths, Science and Hindi. Suppose, we need to add the
marks for English subject in ResultDF, we can use the
following statement:
>>> ResultDF
Arnab Ramit Samridhi Riya Mallika Preeti
Maths 90 92 89 81 94 89
Science 91 81 91 71 95 78
Hindi 97 96 88 67 99 76

>>> ResultDF.loc['English'] = [85, 86, 83, 80, 90, 89]


>>> ResultDF
Arnab Ramit Samridhi Riya Mallika Preeti
Maths 90 92 89 81 94 89
Science 91 81 91 71 95 78
Hindi 97 96 88 67 99 76
English 85 86 83 80 90 89
We cannot use this method to add a row of data with
already existing (duplicate) index value (label). In such
case, a row with this index label will be updated, for
example:
>>> ResultDF.loc['English'] = [95, 86, 95, 80, 95,99]
>>> ResultDF

Arnab Ramit Samridhi Riya Mallika Preeti


Maths 90 92 89 81 94 89
Science 91 81 91 71 95 78
Hindi 97 96 88 67 99 76
English 95 86 95 80 95 99
The DataFrame.loc[] method can also be used to change
the data values of a row to a particular value. For
example, the following statement sets marks in 'Maths'
for all columns to 0:
>>> ResultDF.loc['Maths']=0
>>> ResultDF
Arnab Ramit Samridhi Riya Mallika Preeti
Maths 0 0 0 0 0 0
Science 91 81 91 71 95 78
Hindi 97 96 88 67 99 76
English 95 86 95 80 95 99
Think and Reflect
Can you write a program to count the number of
rows and columns in a DataFrame?

If we try to add a row with fewer values than the
number of columns in the DataFrame, it results in a
ValueError, with the error message: ValueError:
Cannot set a row with mismatched columns.
Similarly, if we try to add a column with fewer values
than the number of rows in the DataFrame, it results
in a ValueError, with the error message: ValueError:
Length of values does not match length of index.
Further, we can set all values of a DataFrame to a
particular value, for example:
>>> ResultDF[: ] = 0 # Set all values in ResultDF to 0
>>> ResultDF
Arnab Ramit Samridhi Riya Mallika Preeti
Maths 0 0 0 0 0 0
Science 0 0 0 0 0 0
Hindi 0 0 0 0 0 0
English 0 0 0 0 0 0
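The row operations described in this section can be tried together in one short script. The sketch below uses a reduced, three-column version of ResultDF (an assumption made to keep the example small) and catches the ValueError raised when a row has too few values. The name row_error is introduced only for illustration.

```python
import pandas as pd

# A reduced version of ResultDF with three students
ResultDF = pd.DataFrame({'Arnab': [90, 91, 97],
                         'Ramit': [92, 81, 96],
                         'Riya': [81, 71, 67]},
                        index=['Maths', 'Science', 'Hindi'])

# A new label adds a row; one value is needed for every column
ResultDF.loc['English'] = [85, 86, 80]

# Two values for three columns: the assignment is rejected
try:
    ResultDF.loc['Sanskrit'] = [70, 75]
    row_error = False
except ValueError as err:
    row_error = True
    print("ValueError:", err)

# shape gives (number of rows, number of columns)
rows, columns = ResultDF.shape
print(rows, columns)
```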

(C) Deleting Rows or Columns from a DataFrame
We can use the DataFrame.drop() method to delete rows
and columns from a DataFrame. We need to specify the
names of the labels to be dropped and the axis from
which they need to be dropped. To delete a row, the
parameter axis is assigned the value 0 and for deleting
a column, the parameter axis is assigned the value 1.
Consider the following DataFrame:
>>> ResultDF
Arnab Ramit Samridhi Riya Mallika
Maths 90 92 89 81 94
Science 91 81 91 71 95
Hindi 97 96 88 67 99
English 95 86 95 80 95
The following example shows how to delete the row
with label 'Science':
>>> ResultDF = ResultDF.drop('Science', axis=0)
>>> ResultDF
Arnab Ramit Samridhi Riya Mallika
Maths 90 92 89 81 94
Hindi 97 96 88 67 99
English 95 86 95 80 95
The following example shows how to delete the
columns having labels 'Samridhi', 'Ramit' and 'Riya':
>>> ResultDF = ResultDF.drop(['Samridhi','Ramit','Riya'], axis=1)
>>> ResultDF
Arnab Mallika
Maths 90 94
Hindi 97 99
English 95 95
If the DataFrame has more than one row with the
same label, the DataFrame.drop() method will delete all
the matching rows from it. For example, consider the
following DataFrame:

>>> ResultDF
Arnab Ramit Samridhi Riya Mallika
Maths 90 92 89 81 94
Science 91 81 91 71 95
Hindi 97 96 88 67 99
Hindi 97 89 78 60 45
To remove the duplicate rows labelled ‘Hindi’, we
need to write the following statement:
>>> ResultDF= ResultDF.drop('Hindi', axis=0)
>>> ResultDF

Arnab Ramit Samridhi Riya Mallika


Maths 90 92 89 81 94
Science 91 81 91 71 95
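The examples above assign the result of drop() back to ResultDF because, by default, drop() returns a new DataFrame and leaves the original untouched. The sketch below illustrates this with a reduced, two-column version of ResultDF (an assumption made for brevity); the names withoutHindi and withoutRamit are illustrative only.

```python
import pandas as pd

# Reduced version of ResultDF with a duplicate row label 'Hindi'
ResultDF = pd.DataFrame({'Arnab': [90, 91, 97, 97],
                         'Ramit': [92, 81, 96, 89]},
                        index=['Maths', 'Science', 'Hindi', 'Hindi'])

withoutHindi = ResultDF.drop('Hindi', axis=0)   # removes both 'Hindi' rows
withoutRamit = ResultDF.drop('Ramit', axis=1)   # removes one column

print(withoutHindi)
print(withoutRamit)
print(ResultDF.shape)   # the original DataFrame is unchanged
```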
(D) Renaming Row Labels of a DataFrame
We can change the labels of rows and columns in a
DataFrame using the DataFrame.rename() method.
Consider the following DataFrame. To rename the row
indices Maths to Sub1, Science to Sub2, English to Sub3
and Hindi to Sub4 we can write the following statement:
>>> ResultDF
Arnab Ramit Samridhi Riya Mallika
Maths 90 92 89 81 94
Science 91 81 91 71 95
English 97 96 88 67 99
Hindi 97 89 78 60 45
>>> ResultDF=ResultDF.rename({'Maths':'Sub1',
'Science':'Sub2','English':'Sub3',
'Hindi':'Sub4'}, axis='index')
>>> print(ResultDF)
Arnab Ramit Samridhi Riya Mallika
Sub1 90 92 89 81 94
Sub2 91 81 91 71 95
Sub3 97 96 88 67 99
Sub4 97 89 78 60 45

Think and Reflect
What if in the rename function we pass a value
for a row label that does not exist?

The parameter axis='index' is used to specify that
the row label is to be changed. If no new label is passed
corresponding to an existing label, the existing row label
is left as it is, for example:
>>> ResultDF=ResultDF.rename({'Maths':'Sub1',
'Science':'Sub2','Hindi':'Sub4'}, axis='index')
>>> print(ResultDF)
Arnab Ramit Samridhi Riya Mallika
Sub1 90 92 89 81 94
Sub2 91 81 91 71 95
English 97 96 88 67 99
Sub4 97 89 78 60 45
(E) Renaming Column Labels of a DataFrame
To alter the column names of ResultDF we can again use
the rename() method, as shown below. The parameter
axis='columns' implies we want to change the column
labels:
>>> ResultDF=ResultDF.rename({'Arnab':'Student1','Ramit':'Student2',
'Samridhi':'Student3','Mallika':'Student4'},axis='columns')
>>> print(ResultDF)

Student1 Student2 Student3 Riya Student4
Maths 90 92 89 81 94
Science 91 81 91 71 95
English 97 96 88 67 99
Hindi 97 89 78 60 45
Note that the column Riya remains unchanged since
we did not pass any new label for it.
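Row and column labels can also be renamed in a single call by using the index and columns keyword parameters of rename() instead of the axis parameter. The following is a minimal sketch on a reduced version of ResultDF (an assumption made for brevity); the name renamed is illustrative only.

```python
import pandas as pd

ResultDF = pd.DataFrame({'Arnab': [90, 91], 'Riya': [81, 71]},
                        index=['Maths', 'Science'])

# Labels not mentioned in a mapping ('Science', 'Riya') are left as they are
renamed = ResultDF.rename(index={'Maths': 'Sub1'},
                          columns={'Arnab': 'Student1'})

print(renamed)
print(ResultDF)   # rename() returns a new DataFrame; the original is unchanged
```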
2.3.3 Accessing DataFrames Element through Indexing
Data elements in a DataFrame can be accessed using
indexing. There are two ways of indexing DataFrames:
Label based indexing and Boolean Indexing.

Think and Reflect
What would happen if the label or row index
passed is not present in the DataFrame?

(A) Label Based Indexing


There are several methods in Pandas to implement label
based indexing. DataFrame.loc[ ] is an important method
that is used for label based indexing with DataFrames.
Let us continue to use the ResultDF created earlier.
As shown in the following example, a single row label
returns the row as a Series.
>>> ResultDF
Arnab Ramit Samridhi Riya Mallika
Maths 90 92 89 81 94
Science 91 81 91 71 95
Hindi 97 96 88 67 99

>>> ResultDF.loc['Science']

Arnab 91
Ramit 81
Samridhi 91
Riya 71
Mallika 95
Name: Science, dtype: int64
Also, note that when the row label is passed as an
integer value, it is interpreted as a label of the index and
not as an integer position along the index, for example:
>>> dFrame10Multiples = pd.DataFrame([10,20,30,40,50])

>>> dFrame10Multiples.loc[2]
0 30
Name: 2, dtype: int64
When a single column label is passed, it returns the column
as a Series.
>>> ResultDF.loc[:,'Arnab']

Maths 90
Science 91
Hindi 97
Name: Arnab, dtype: int64
Also, we can obtain the same result, that is, the marks
of ‘Arnab’ in all the subjects, by using the command:
>>> print(ResultDF['Arnab'])
Maths 90
Science 91
Hindi 97
Name: Arnab, dtype: int64
To read more than one row from a DataFrame, a list
of row labels is used as shown below. Note that using [[]]
returns a DataFrame.
>>> ResultDF.loc[['Science', 'Hindi']]
Arnab Ramit Samridhi Riya Mallika
Science 91 81 91 71 95
Hindi 97 96 88 67 99
(B) Boolean Indexing
Boolean means a binary variable that can represent
either of the two states - True (indicated by 1) or False
(indicated by 0). In Boolean indexing, we can select
the subsets of data based on the actual values in the
DataFrame rather than their row/column labels. Thus,
we can use conditions on column names to filter data
values. Consider the DataFrame ResultDF, the following
statement displays True or False depending on whether
the data value satisfies the given condition or not.
>>> ResultDF.loc['Maths'] > 90
Arnab False
Ramit True
Samridhi False
Riya False
Mallika True
Name: Maths, dtype: bool
To check in which subjects ‘Arnab’ has scored more
than 90, we can write:
>>> ResultDF.loc[:,'Arnab']>90
Maths False
Science True
Hindi True
Name: Arnab, dtype: bool

2.3.4 Accessing DataFrames Element through Slicing


We can use slicing to select a subset of rows and/or
columns from a DataFrame. To retrieve a set of rows,
slicing can be used with row labels. For example:
>>> ResultDF.loc['Maths': 'Science']
Arnab Ramit Samridhi Riya Mallika
Maths 90 92 89 81 94
Science 91 81 91 71 95
Here, the rows with labels Maths and Science are
displayed. Note that in DataFrames, slicing with labels
is inclusive of the end value. We may use a slice of labels
with a column name to access values of those rows in that
column only. For example, the following statement
displays the rows with label Maths and Science, and
column with label Arnab:
>>> ResultDF.loc['Maths': 'Science', 'Arnab']
Maths 90
Science 91
Name: Arnab, dtype: int64

Activity 2.8
a) Using the DataFrame ResultDF, write the statement
to access Marks of Arnab in Maths.
b) Create a DataFrame having 5 rows and write the
statement to get the first 4 rows of it.

We may use a slice of labels with a slice of column
names to access values of those rows and columns:
>>> ResultDF.loc['Maths': 'Science', 'Arnab':'Samridhi']
Arnab Ramit Samridhi
Maths 90 92 89
Science 91 81 91

Alternatively, we may use a slice of labels with a list
of column names to access values of those rows and
columns:
>>> ResultDF.loc['Maths': 'Science',['Arnab','Samridhi']]
Arnab Samridhi
Maths 90 89
Science 91 91
Filtering Rows in DataFrames
In DataFrames, Boolean values like True (1) and False
(0) can be associated with indices. They can also be used
to filter the records using the DataFrame.loc[] method.
In order to select or omit particular row(s), we can use
a Boolean list specifying ‘True’ for the rows to be shown
and ‘False’ for the ones to be omitted in the output. For
example, in the following statement, row having index
as Science is omitted:
>>> ResultDF.loc[[True, False, True]]
Arnab Ramit Samridhi Riya Mallika
Maths 90 92 89 81 94
Hindi 97 96 88 67 99
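The Boolean list does not have to be typed by hand; it can be produced by a condition on the data. The sketch below is one way to do this, using a reduced, three-column version of ResultDF (an assumption made for brevity). The names mask, highScores and manual are chosen only for illustration.

```python
import pandas as pd

ResultDF = pd.DataFrame({'Arnab': [90, 91, 97],
                         'Ramit': [92, 81, 96],
                         'Riya': [81, 71, 67]},
                        index=['Maths', 'Science', 'Hindi'])

# The comparison yields one True/False value per row
mask = ResultDF['Arnab'] > 90

# Only rows where the mask is True are kept
highScores = ResultDF.loc[mask]

# Equivalent filtering with a hand-written Boolean list
manual = ResultDF.loc[[True, False, True]]

print(mask)
print(highScores)
print(manual)
```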

2.3.5 Joining, Merging and Concatenation of DataFrames
(A) Joining
We can use the pandas.DataFrame.append() method to
merge two DataFrames. It appends rows of the second
DataFrame at the end of the first DataFrame. Columns
not present in the first DataFrame are added as new
columns. Note that append() was deprecated in Pandas 1.4
and removed in Pandas 2.0, so the examples in this
section require an older version of Pandas. For example,
consider the two DataFrames, dFrame1 and dFrame2,
described below. Let us use the append() method to
append dFrame2 to dFrame1:

>>> dFrame1=pd.DataFrame([[1, 2, 3], [4, 5],


[6]], columns=['C1', 'C2', 'C3'], index=['R1',
'R2', 'R3'])
>>> dFrame1
C1 C2 C3
R1 1 2.0 3.0
R2 4 5.0 NaN
R3 6 NaN NaN

>>> dFrame2=pd.DataFrame([[10, 20], [30], [40,


50]], columns=['C2', 'C5'], index=['R4', 'R2',
'R5'])
>>> dFrame2
C2 C5
R4 10 20.0
R2 30 NaN
R5 40 50.0

>>> dFrame1=dFrame1.append(dFrame2)
>>> dFrame1
C1 C2 C3 C5
R1 1.0 2.0 3.0 NaN
R2 4.0 5.0 NaN NaN
R3 6.0 NaN NaN NaN
R4 NaN 10.0 NaN 20.0
R2 NaN 30.0 NaN NaN
R5 NaN 40.0 NaN 50.0
Alternatively, if we append dFrame1 to dFrame2, the
rows of dFrame2 precede the rows of dFrame1. To get
the column labels to appear in sorted order we can set the
parameter sort=True. The column labels shall appear in
unsorted order when the parameter sort = False.
# append dFrame1 to dFrame2
>>> dFrame2 =dFrame2.append(dFrame1,
sort=True)
>>> dFrame2
C1 C2 C3 C5
R4 NaN 10.0 NaN 20.0
R2 NaN 30.0 NaN NaN


R5 NaN 40.0 NaN 50.0


R1 1.0 2.0 3.0 NaN
R2 4.0 5.0 NaN NaN
R3 6.0 NaN NaN NaN
# append dFrame1 to dFrame2 with sort=False
>>> dFrame2 = dFrame2.append(dFrame1, sort=False)
>>> dFrame2
C2 C5 C1 C3
R4 10.0 20.0 NaN NaN
R2 30.0 NaN NaN NaN
R5 40.0 50.0 NaN NaN
R1 2.0 NaN 1.0 3.0
R2 5.0 NaN 4.0 NaN
R3 NaN NaN 6.0 NaN
The parameter verify_integrity of the append() method
may be set to True when we want to raise an error if the
row labels are duplicate. By default, verify_integrity =
False. That is why we could append the duplicate row
with label R2 when appending the two DataFrames, as
shown above.
The parameter ignore_index of the append() method may
be set to True when we do not want to use row index
labels. By default, ignore_index = False.
>>> dFrame1 = dFrame1.append(dFrame2, ignore_index=True)
>>> dFrame1
    C1    C2   C3    C5
0  1.0   2.0  3.0   NaN
1  4.0   5.0  NaN   NaN
2  6.0   NaN  NaN   NaN
3  NaN  10.0  NaN  20.0
4  NaN  30.0  NaN   NaN
5  NaN  40.0  NaN  50.0

Think and Reflect
How can you check whether a given DataFrame has any
missing value or not?

The append() method can also be used to append a
Series or a dictionary to a DataFrame.
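In Pandas 2.0 and later, DataFrame.append() is no longer available and pandas.concat() is the function used for the same task. The sketch below repeats the examples of this section with concat(), writing the missing values explicitly as NaN. The variable names combined, renumbered and duplicateFound are illustrative only.

```python
import pandas as pd

NaN = float('nan')   # explicit marker for the missing values
dFrame1 = pd.DataFrame([[1, 2, 3], [4, 5, NaN], [6, NaN, NaN]],
                       columns=['C1', 'C2', 'C3'], index=['R1', 'R2', 'R3'])
dFrame2 = pd.DataFrame([[10, 20], [30, NaN], [40, 50]],
                       columns=['C2', 'C5'], index=['R4', 'R2', 'R5'])

# Rows of dFrame2 are placed below the rows of dFrame1;
# columns missing from either DataFrame are filled with NaN
combined = pd.concat([dFrame1, dFrame2])
print(combined)

# ignore_index=True discards the row labels and numbers the rows 0, 1, 2, ...
renumbered = pd.concat([dFrame1, dFrame2], ignore_index=True)
print(renumbered)

# verify_integrity=True raises ValueError because the label R2 occurs twice
try:
    pd.concat([dFrame1, dFrame2], verify_integrity=True)
    duplicateFound = False
except ValueError:
    duplicateFound = True
print(duplicateFound)
```

Unlike append(), concat() takes a list of DataFrames, so more than two can be joined in a single call.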
2.3.6 Attributes of DataFrames
Like Series, we can access certain properties called
attributes of a DataFrame by using that property with
the DataFrame name. Table 2.4 lists some attributes of
Pandas DataFrame. We are going to use a part of the
data from a report called “STATE OF FOREST REPORT
2017”, Published by Forest Survey of India, accessible
at http://fsi.nic.in/forest-report-2017, as our example
data in this section.
As per this report, the geographical area, the area
under very dense forests, the area under moderately


dense forests, and the area under open forests (all in sq


km), in three States of India - Assam, Delhi and Kerala
are as shown in the following DataFrame ForestAreaDF:

>>> ForestArea = {
'Assam' :pd.Series([78438, 2797,
10192, 15116], index = ['GeoArea', 'VeryDense',
'ModeratelyDense', 'OpenForest']),
'Kerala' :pd.Series([ 38852, 1663,
9407, 9251], index = ['GeoArea' ,'VeryDense',
'ModeratelyDense', 'OpenForest']),
'Delhi' :pd.Series([1483, 6.72, 56.24,
129.45], index = ['GeoArea', 'VeryDense',
'ModeratelyDense', 'OpenForest'])}

>>> ForestAreaDF = pd.DataFrame(ForestArea)


>>> ForestAreaDF
Assam Kerala Delhi
GeoArea 78438 38852 1483.00
VeryDense 2797 1663 6.72
ModeratelyDense 10192 9407 56.24
OpenForest 15116 9251 129.45

Table 2.4 Some Attributes of Pandas DataFrame

Attribute Name: DataFrame.index
Purpose: to display row labels
Example:
>>> ForestAreaDF.index
Index(['GeoArea', 'VeryDense', 'ModeratelyDense', 'OpenForest'], dtype='object')

Attribute Name: DataFrame.columns
Purpose: to display column labels
Example:
>>> ForestAreaDF.columns
Index(['Assam', 'Kerala', 'Delhi'], dtype='object')

Attribute Name: DataFrame.dtypes
Purpose: to display data type of each column in the DataFrame
Example:
>>> ForestAreaDF.dtypes
Assam       int64
Kerala      int64
Delhi     float64
dtype: object

Attribute Name: DataFrame.values
Purpose: to display a NumPy ndarray having all the values in the DataFrame, without the axes labels
Example:
>>> ForestAreaDF.values
array([[7.8438e+04, 3.8852e+04, 1.4830e+03],
       [2.7970e+03, 1.6630e+03, 6.7200e+00],
       [1.0192e+04, 9.4070e+03, 5.6240e+01],
       [1.5116e+04, 9.2510e+03, 1.2945e+02]])

Attribute Name: DataFrame.shape
Purpose: to display a tuple representing the dimensionality of the DataFrame
Example:
>>> ForestAreaDF.shape
(4, 3)
It means ForestAreaDF has 4 rows and 3 columns.

Attribute Name: DataFrame.size
Purpose: to display the number of values (rows x columns) in the DataFrame
Example:
>>> ForestAreaDF.size
12
This means the ForestAreaDF has 12 values in it.

Attribute Name: DataFrame.T
Purpose: to transpose the DataFrame. Means, row indices and column labels of the DataFrame replace each other's position
Example:
>>> ForestAreaDF.T
        GeoArea  VeryDense  ModeratelyDense  OpenForest
Assam   78438.0    2797.00         10192.00    15116.00
Kerala  38852.0    1663.00          9407.00     9251.00
Delhi    1483.0       6.72            56.24      129.45

Attribute Name: DataFrame.head(n)
Purpose: to display the first n rows in the DataFrame
Example:
>>> ForestAreaDF.head(2)
           Assam  Kerala    Delhi
GeoArea    78438   38852  1483.00
VeryDense   2797    1663     6.72
displays the first 2 rows of the DataFrame ForestAreaDF. If the parameter n is not specified, by default it gives the first 5 rows of the DataFrame.

Attribute Name: DataFrame.tail(n)
Purpose: to display the last n rows in the DataFrame
Example:
>>> ForestAreaDF.tail(2)
                 Assam  Kerala   Delhi
ModeratelyDense  10192    9407   56.24
OpenForest       15116    9251  129.45
displays the last 2 rows of the DataFrame ForestAreaDF. If the parameter n is not specified, by default it gives the last 5 rows of the DataFrame.

Attribute Name: DataFrame.empty
Purpose: to return the value True if the DataFrame is empty and False otherwise
Example:
>>> ForestAreaDF.empty
False
>>> df=pd.DataFrame() #Create an empty DataFrame
>>> df.empty
True
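The attributes shape, size and T are related to one another, and the relation can be checked directly. The following is a small sketch using the ForestAreaDF data of this section; the names labels, rows, columns and transposed are introduced here only for illustration.

```python
import pandas as pd

labels = ['GeoArea', 'VeryDense', 'ModeratelyDense', 'OpenForest']
ForestAreaDF = pd.DataFrame({
    'Assam': pd.Series([78438, 2797, 10192, 15116], index=labels),
    'Kerala': pd.Series([38852, 1663, 9407, 9251], index=labels),
    'Delhi': pd.Series([1483, 6.72, 56.24, 129.45], index=labels)})

# shape is a (rows, columns) tuple; size is the product of the two
rows, columns = ForestAreaDF.shape
print(rows, columns, ForestAreaDF.size)

# Transposing swaps the row labels and the column labels,
# so the shape is reversed
transposed = ForestAreaDF.T
print(transposed.shape)
print(list(transposed.index))

# empty is True only when the DataFrame holds no values at all
print(ForestAreaDF.empty, pd.DataFrame().empty)
```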

2.4 Importing and Exporting Data between CSV Files and DataFrames
We can create a DataFrame by importing data from CSV
files where values are separated by commas. Similarly,
we can also store or export data in a DataFrame as a
.csv file.
2.4.1 Importing a CSV file to a DataFrame
Let us assume that we have the following data in a CSV file
named ResultData.csv stored in the folder C:/NCERT.
To practise the code as we progress, you are advised to
create this CSV file using a spreadsheet and save it on
your computer.
RollNo Name Eco Maths
1 Arnab 18 57
2 Kritika 23 45
3 Divyam 51 37
4 Vivaan 40 60
5 Aaroosh 18 27


We can load the data from the ResultData.csv file
into a DataFrame, say marks, using the Pandas read_csv()
function as shown below:
>>> marks = pd.read_csv("C:/NCERT/ResultData.csv", sep=",", header=0)
>>> marks
RollNo Name Eco Maths
0 1 Arnab 18 57
1 2 Kritika 23 45
2 3 Divyam 51 37
3 4 Vivaan 40 60
4 5 Aaroosh 18 27

• The first parameter to read_csv() is the name of
the comma separated data file along with its path.
• The parameter sep specifies whether the values are
separated by comma, semicolon, tab, or any other
character. The default value for sep is a comma.
• The parameter header specifies the number of the row
whose values are to be used as the column names. It
also marks the start of the data to be fetched. header=0
implies that column names are inferred from the first
line of the file. When no column names are passed
explicitly, the default behaviour is the same as header=0.
We can explicitly specify column names using the
parameter names while creating the DataFrame using
the read_csv() function. For example, in the following
statement, the names parameter is used to specify the
labels for columns of the DataFrame marks1:
>>> marks1 = pd.read_csv("C:/NCERT/ResultData1.csv", sep=",",
        names=['RNo', 'StudentName', 'Sub1', 'Sub2'])
>>> marks1
RNo StudentName Sub1 Sub2
0 1 Arnab 18 57
1 2 Kritika 23 45
2 3 Divyam 51 37
3 4 Vivaan 40 60
4 5 Aaroosh 18 27
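The effect of header and names can be tried without creating a file on disk, because read_csv() also accepts a file-like object. The sketch below is an illustration under that assumption: it keeps a few lines of the ResultData data in a string (csvText, a name made up here) and reads it three ways.

```python
import io
import pandas as pd

# First three records of ResultData.csv, with a header line
csvText = ("RollNo,Name,Eco,Maths\n"
           "1,Arnab,18,57\n"
           "2,Kritika,23,45\n"
           "3,Divyam,51,37\n")

# header=0: the first line supplies the column names
marks = pd.read_csv(io.StringIO(csvText), sep=",", header=0)
print(marks)

# header=0 together with names: the first line is skipped
# and our own labels are used in its place
marks1 = pd.read_csv(io.StringIO(csvText), sep=",", header=0,
                     names=['RNo', 'StudentName', 'Sub1', 'Sub2'])
print(marks1)

# names without header=0: the file is treated as having no header,
# so its first line is read as an ordinary data row
headerAsData = pd.read_csv(io.StringIO(csvText), sep=",",
                           names=['RNo', 'StudentName', 'Sub1', 'Sub2'])
print(headerAsData)
```

The third call shows why names alone suits a file that has no header line: with a header present, that line ends up among the records.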

2.4.2 Exporting a DataFrame to a CSV file
We can use the to_csv() function to save a DataFrame
to a text or csv file. For example, to save the DataFrame
ResultDF created in the previous section, we can use
the following statement:

>>> ResultDF
         Arnab  Ramit  Samridhi  Riya  Mallika
Maths       90     92        89    81       94
Science     91     81        91    71       95
Hindi       97     96        88    67       99

>>> ResultDF.to_csv(path_or_buf='C:/NCERT/resultout.csv', sep=',')

This creates a file by the name resultout.csv in the
folder C:/NCERT on the hard disk. When we open this
file in any text editor or a spreadsheet, we will find the
above data along with the row labels and the column
headers, separated by comma.

A Comma-Separated Value (CSV) file is a text file where
values are separated by comma. Each line represents
a record (row). Each row consists of one or more
fields (columns). They can be easily handled through
a spreadsheet application.

In case we do not want the column names to be saved
to the file we may use the parameter header=False.
Another parameter index=False is used when we do not
want the row labels to be written to the file on disk. For
example:
>>> ResultDF.to_csv('C:/NCERT/resultonly.txt', sep='@', header=False, index=False)

If we open the file resultonly.txt, we will find
the following contents:
90@92@89@81@94
91@81@91@71@95
97@96@88@67@99

Think and Reflect
What are the other parameters that can be used with the
read_csv() function? You may explore from
https://pandas.pydata.org.

2.5 Pandas Series Vs NumPy ndarray
Pandas supports non-unique index values. If an
operation that does not support duplicate index values
is attempted, an exception will be raised at that time.
A basic difference between Series and ndarray is that
operations between Series automatically align the data
based on the label. Thus, we can write computations
without considering whether all Series involved have
the same labels or not.

Think and Reflect
Besides comma, what are the other allowed characters
that can be used as a separator while creating a CSV
file from a DataFrame?

The result of an operation between unaligned Series
(i.e. where the corresponding labels of the series are not
the same or are not in the same order) will have the
union of the indexes involved. If a label is not found
in one Series or the other, the result will be marked as
missing (NaN). Being able to write code without doing
any explicit data alignment grants immense freedom
and flexibility in interactive data analysis and research.
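The difference can be seen by adding the same numbers once as Series and once as ndarrays. The following is a minimal sketch; the names seriesA, seriesB, total and byPosition, and the values in them, are made up for the illustration.

```python
import numpy as np
import pandas as pd

seriesA = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
seriesB = pd.Series([10, 20, 30], index=['c', 'b', 'd'])

# Series are added label by label. The result carries the union of the
# labels; 'a' and 'd' occur in only one Series, so their result is NaN
total = seriesA + seriesB
print(total)

# ndarrays have no labels: the values are added position by position
byPosition = np.array([1, 2, 3]) + np.array([10, 20, 30])
print(byPosition)
```

In total, the label 'c' gives 3 + 10 and the label 'b' gives 2 + 20, even though the two labels appear in a different order in the two Series.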


Table 2.5 Difference between Pandas Series and NumPy Arrays

| Pandas Series | NumPy Arrays |
|---|---|
| In series we can define our own labeled index to access elements of an array. These can be numbers or letters. | NumPy arrays are accessed by their integer position using numbers only. |
| The elements can be indexed in descending order also. | The indexing starts with zero for the first element and the index is fixed. |
| If two series are not aligned, NaN or missing values are generated. | There is no concept of NaN values and if there are no matching values in arrays, alignment fails. |
| Series require more memory. | NumPy occupies lesser memory. |

Summary
• NumPy, Pandas and Matplotlib are Python
libraries for scientific and analytical use.
• pip install pandas is the command to install
Pandas library.
• A data structure is a collection of data values
and the operations that can be applied to that
data. It enables efficient storage, retrieval and
modification to the data.
• Two main data structures in Pandas library
are Series and DataFrame. To use these
data structures, we first need to import the
Pandas library.
• A Series is a one-dimensional array containing a
sequence of values. Each value has a data label
associated with it also called its index.
• The two common ways of accessing the elements
of a series are Indexing and Slicing.
• There are two types of indexes: positional index
and labelled index. Positional index takes an
integer value that corresponds to its position in
the series starting from 0, whereas labelled index
takes any user-defined label as index.
• When positional indices are used for slicing, the
value at end index position is excluded, i.e., only
(end - start) number of data values of the series
are extracted. However, with labelled indexes the
value at the end index label is also included in
the output.
• All basic mathematical operations can be
performed on Series either by using the
operator or by using appropriate methods of the
Series object.
• While performing mathematical operations index
matching is implemented and if no matching
indexes are found during alignment, Pandas
returns NaN so that the operation does not fail.
• A DataFrame is a two-dimensional labeled data
structure like a spreadsheet. It contains rows
and columns and therefore has both a row and
column index.
• When using a dictionary to create a DataFrame,
keys of the Dictionary become the column labels
of the DataFrame. A DataFrame can be thought of
as a dictionary of lists/ Series (all Series/columns
sharing the same index label for a row).
• Data can be loaded in a DataFrame from a file on
the disk by using Pandas read_csv function.
• Data in a DataFrame can be written to a text
file on disk by using the pandas.DataFrame.to_
csv() function.
• DataFrame.T gives the transpose of a DataFrame.
• Pandas has a number of methods that support
label based indexing, but every label asked for
must be in the index, or a KeyError will be raised.
• DataFrame.loc[] is used for label based indexing
of rows in DataFrames.
• The pandas.DataFrame.append() method is used to
merge two DataFrames.
• Pandas supports non-unique index values. Only
if a particular operation that does not support
duplicate index values is attempted, an exception
is raised at that time.
• The basic difference between Pandas Series and
NumPy ndarray is that operations between Series
automatically align the data based on labels. Thus,
we can write computations without considering
whether all Series involved have the same labels or
not, whereas operations between ndarrays work by
position and raise an error if the shapes do not match.


Exercise
1. What is a Series and how is it different from a 1-D
array, a list and a dictionary?
2. What is a DataFrame and how is it different from a
2-D array?
3. How are DataFrames related to Series?
4. What do you understand by the size of (i) a Series,
(ii) a DataFrame?
5. Create the following Series and do the specified
operations:
a) EngAlph, having 26 elements with the alphabets
as values and default index values.
b) Vowels, having 5 elements with index labels ‘a’,
‘e’, ‘i’, ‘o’ and ‘u’ and all the five values set to zero.
Check if it is an empty series.
c) Friends, from a dictionary having roll numbers of
five of your friends as data and their first name
as keys.
d) MTseries, an empty Series. Check if it is an empty
series.
e) MonthDays, from a numpy array having the
number of days in the 12 months of a year. The
labels should be the month numbers from 1 to 12.
6. Using the Series created in Question 5, write
commands for the following:
a) Set all the values of Vowels to 10 and display the
Series.
b) Divide all values of Vowels by 2 and display the
Series.
c) Create another series Vowels1 having 5 elements
with index labels ‘a’, ‘e’, ‘i’, ‘o’ and ‘u’ having values
[2,5,6,3,8] respectively.
d) Add Vowels and Vowels1 and assign the result to
Vowels3.
e) Subtract, Multiply and Divide Vowels by Vowels1.
f) Alter the labels of Vowels1 to [‘A’, ‘E’, ‘I’, ‘O’, ‘U’].
7. Using the Series created in Question 5, write
commands for the following:
a) Find the dimensions, size and values of the Series
EngAlph, Vowels, Friends, MTseries, MonthDays.
b) Rename the Series MTseries as SeriesEmpty.
c) Name the index of the Series MonthDays as
monthno and that of Series Friends as Fname.


d) Display the 3rd and 2nd value of the Series
Friends, in that order.
e) Display the alphabets ‘e’ to ‘p’ from the Series
EngAlph.
f) Display the first 10 values in the Series EngAlph.
g) Display the last 10 values in the Series EngAlph.
h) Display the MTseries.
8. Using the Series created in Question 5, write
commands for the following:
a) Display the names of the months 3 through 7
from the Series MonthDays.
b) Display the Series MonthDays in reverse order.
9. Create the following DataFrame Sales containing
year wise sales figures for five sales persons in INR.
Use the years as column labels, and sales person
names as row labels.
2014 2015 2016 2017
Madhu 100.5 12000 20000 50000
Kusum 150.8 18000 50000 60000
Kinshuk 200.9 22000 70000 70000
Ankit 30000 30000 100000 80000
Shruti 40000 45000 125000 90000
10. Use the DataFrame created in Question 9 above to
do the following:
a) Display the row labels of Sales.
b) Display the column labels of Sales.
c) Display the data types of each column of Sales.
d) Display the dimensions, shape, size and values
of Sales.
e) Display the last two rows of Sales.
f) Display the first two columns of Sales.
g) Create a dictionary using the following data. Use
this dictionary to create a DataFrame Sales2.
2018
Madhu 160000
Kusum 110000
Kinshuk 500000
Ankit 340000
Shruti 900000
h) Check if Sales2 is empty or it contains data.


11. Use the DataFrame created in Question 9 above to
do the following:
a) Append the DataFrame Sales2 to the DataFrame
Sales.
b) Change the DataFrame Sales such that it becomes
its transpose.
c) Display the sales made by all sales persons in the
year 2017.
d) Display the sales made by Madhu and Ankit in
the year 2017 and 2018.
e) Display the sales made by Shruti in 2016.
f) Add data to Sales for salesman Sumeet where
the sales made are [196.2, 37800, 52000, 78438,
38852] in the years [2014, 2015, 2016, 2017,
2018] respectively.
g) Delete the data for the year 2014 from the
DataFrame Sales.
h) Delete the data for salesman Kinshuk from the
DataFrame Sales.
i) Change the name of the salesperson Ankit to
Vivaan and Madhu to Shailesh.
j) Update the sale made by Shailesh in 2018 to
100000.
k) Write the values of DataFrame Sales to a comma
separated file SalesFigures.csv on the disk. Do
not write the row labels and column labels.
l) Read the data in the file SalesFigures.csv into
a DataFrame SalesRetrieved and Display it.
Now update the row labels and column labels of
SalesRetrieved to be the same as that of Sales.



Chapter 3
Data Handling using Pandas - II

“We owe a lot to the Indians, who taught us how to count,
without which no worthwhile scientific discovery could
have been made.”
— Albert Einstein

In this chapter
»» Introduction
»» Descriptive Statistics
»» Data Aggregations
»» Sorting a DataFrame
»» GROUP BY Functions
»» Altering the Index
»» Other DataFrame Operations
»» Handling Missing Values
»» Import and Export of Data between Pandas and MySQL

3.1 Introduction
As discussed in the previous chapter, Pandas
is a well established Python Library used for
manipulation, processing and analysis of
data. We have already discussed the basic
operations on Series and DataFrame like
creating them and then accessing data from
them. Pandas provides more powerful and
useful functions for data analysis.
In this chapter, we will be working with
more advanced features of DataFrame like
sorting data, answering analytical questions
using the data, cleaning data and applying
different useful functions on the data. Below
is the example data on which we will be
applying the advanced features of Pandas.


Case Study
Let us consider the data of marks scored in unit tests
held in school. For each unit test, the marks scored by
all students of the class are recorded. Maximum marks
are 25 in each subject. The subjects are Maths, Science,
Social Studies (S.St.), Hindi, and English. For simplicity,
we assume there are 4 students in the class. Table 3.1
shows their marks in Unit Test 1, Unit Test 2 and
Unit Test 3.
Table 3.1 Case Study: Result
Name      Unit Test  Maths  Science  S.St.  Hindi  Eng
Raman         1        22      21      18     20    21
Raman         2        21      20      17     22    24
Raman         3        14      19      15     24    23
Zuhaire       1        20      17      22     24    19
Zuhaire       2        23      15      21     25    15
Zuhaire       3        22      18      19     23    13
Ashravy       1        23      19      20     15    22
Ashravy       2        24      22      24     17    21
Ashravy       3        12      25      19     21    23
Mishti        1        15      22      25     22    22
Mishti        2        18      21      25     24    23
Mishti        3        17      18      20     25    20

Let us store the data in a DataFrame, as shown in
Program 3.1:
Program 3-1 Store the Result data in a DataFrame called df, created from the dictionary marksUT.

>>> import pandas as pd
>>> marksUT= {'Name':['Raman','Raman','Raman','Zuhaire','Zuhaire','Zuhaire',
                      'Ashravy','Ashravy','Ashravy','Mishti','Mishti','Mishti'],
              'UT':[1,2,3,1,2,3,1,2,3,1,2,3],
              'Maths':[22,21,14,20,23,22,23,24,12,15,18,17],
              'Science':[21,20,19,17,15,18,19,22,25,22,21,18],
              'S.St':[18,17,15,22,21,19,20,24,19,25,25,20],
              'Hindi':[20,22,24,24,25,23,15,17,21,22,24,25],
              'Eng':[21,24,23,19,15,13,22,21,23,22,23,20]
             }
>>> df=pd.DataFrame(marksUT)
>>> print(df)


Name UT Maths Science S.St Hindi Eng


0 Raman 1 22 21 18 20 21
1 Raman 2 21 20 17 22 24
2 Raman 3 14 19 15 24 23
3 Zuhaire 1 20 17 22 24 19
4 Zuhaire 2 23 15 21 25 15
5 Zuhaire 3 22 18 19 23 13
6 Ashravy 1 23 19 20 15 22
7 Ashravy 2 24 22 24 17 21
8 Ashravy 3 12 25 19 21 23
9 Mishti 1 15 22 25 22 22
10 Mishti 2 18 21 25 24 23
11 Mishti 3 17 18 20 25 20

3.2 Descriptive Statistics


Descriptive Statistics are used to summarise the given
data. In other words, they refer to the methods which
are used to get some basic idea about the data.
In this section, we will be discussing descriptive
statistical methods that can be applied to a DataFrame.
These are max, min, count, sum, mean, median, mode,
quartiles, variance. In each case, we will consider the
above created DataFrame df.
3.2.1 Calculating Maximum Values
DataFrame.max() is used to calculate the maximum
values from the DataFrame, regardless of its data types.
The following statement outputs the maximum value of
each column of the DataFrame:
>>> print(df.max())
Name Zuhaire #Maximum value in name column
#(alphabetically)
UT 3 #Maximum value in column UT
Maths 24 #Maximum value in column Maths
Science 25 #Maximum value in column Science
S.St 25 #Maximum value in column S.St
Hindi 25 #Maximum value in column Hindi
Eng 24 #Maximum value in column Eng
dtype: object
If we want to output maximum value for the columns
having only numeric values, then we can set the
parameter numeric_only=True in the max() method, as
shown below:


>>> print(df.max(numeric_only=True))
UT 3
Maths 24
Science 25
S.St 25
Hindi 25
Eng 24
dtype: int64
Program 3-2 Write the statements to output the maximum marks obtained in each subject in Unit Test 2.

>>> dfUT2 = df[df.UT == 2]
>>> print('\nResult of Unit Test 2:\n\n',dfUT2)

Result of Unit Test 2:

       Name  UT  Maths  Science  S.St  Hindi  Eng
1     Raman   2     21       20    17     22   24
4   Zuhaire   2     23       15    21     25   15
7   Ashravy   2     24       22    24     17   21
10   Mishti   2     18       21    25     24   23

>>> print('\nMaximum Mark obtained in Each Subject in Unit Test 2: \n\n',dfUT2.max(numeric_only=True))

Maximum Mark obtained in Each Subject in Unit Test 2:

UT          2
Maths      24
Science    22
S.St       25
Hindi      25
Eng        24
dtype: int64

The output of Program 3.2 can also be achieved using the
following statements:
>>> dfUT2=df[df['UT']==2].max(numeric_only=True)
>>> print(dfUT2)

By default, the max() method finds the maximum
By default, the max() method finds the maximum
value of each column (which means, axis=0). However,
to find the maximum value of each row, we have to
specify axis = 1 as its argument.
#maximum marks for each student in each unit
#test among all the subjects
>>> df.max(axis=1, numeric_only=True)


0 22
1 24
2 24
3 24
4 25
5 23
6 23
7 24
8 25
9 25
10 25
11 25
dtype: int64

Note: The axis argument names the direction along which the
operation runs. With axis = 0 (the default case), max() moves
down the rows of each column and so gives one value per column
(column-wise output). With axis = 1, it moves across the columns
of each row and so gives one value per row (row-wise output).
Similar is the case with all statistical operations discussed
in this chapter.
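The two directions are easy to compare on a small table. The following sketch uses a cut-down DataFrame (small, a name made up here) holding Raman's marks in three subjects for the first two unit tests.

```python
import pandas as pd

small = pd.DataFrame({'Maths': [22, 21], 'Science': [21, 20], 'Eng': [21, 24]},
                     index=['UT1', 'UT2'])

# axis=0 (default): go down each column, one result per column
colMax = small.max(axis=0)
print(colMax)

# axis=1: go across each row, one result per row
rowMax = small.max(axis=1)
print(rowMax)
```

The result of axis=0 is labelled with the column names, while the result of axis=1 is labelled with the row labels.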

3.2.2 Calculating Minimum Values


DataFrame.min() is used to display the minimum values
from the DataFrame, regardless of the data types. That
is, it shows the minimum value of each column or row.
The following line of code outputs the minimum value of
each column of the DataFrame:
>>> print(df.min())
Name Ashravy
UT 1
Maths 12
Science 15
S.St 15
Hindi 15
Eng 13
dtype: object

Program 3-3 Write the statements to display the minimum marks obtained by a particular student ‘Mishti’ in all the unit tests for each subject.
>>> dfMishti = df.loc[df.Name == 'Mishti']
>>> print('\nMarks obtained by Mishti in all the Unit Tests \n\n',dfMishti)

Marks obtained by Mishti in all the Unit Tests

      Name  UT  Maths  Science  S.St  Hindi  Eng
9   Mishti   1     15       22    25     22   22
10  Mishti   2     18       21    25     24   23
11  Mishti   3     17       18    20     25   20

>>> print('\nMinimum Marks obtained by Mishti in each subject across the unit tests\n\n', dfMishti[['Maths','Science','S.St','Hindi','Eng']].min())

Minimum Marks obtained by Mishti in each subject across the unit tests

Maths      15
Science    18
S.St       20
Hindi      22
Eng        20
dtype: int64

The output of Program 3.3 can also be achieved using the
following statements:
>>> dfMishti=df[['Maths','Science','S.St','Hindi','Eng']][df.Name == 'Mishti'].min()
>>> print(dfMishti)
Note: Since we did not want to output the min value of
column UT, we mentioned all the other column names for
which minimum is to be calculated.

3.2.3 Calculating Sum of Values


DataFrame.sum() will display the sum of the values
from the DataFrame regardless of its datatype. The
following line of code outputs the sum of each column
of the DataFrame:
>>> print(df.sum())
Name
RamanRamanRamanZuhaireZuhaireZuhaireAshravyAsh...
UT 24
Maths 231
Science 237
S.St 245
Hindi 262
Eng 246
dtype: object

We may not be interested in summing text values. So,
to print the sum of a particular column, we need to
specify the column name in the call to the function sum().
The following statement prints the total marks of
subject mathematics:
>>> print(df['Maths'].sum())
231

To calculate total marks of a particular student, the
name of the student needs to be specified.
Program 3-4 Write the python statement to print the total marks secured by Raman in each subject.

>>> dfRaman=df[df['Name']=='Raman']
>>> print("Marks obtained by Raman in each test are:\n", dfRaman)
Marks obtained by Raman in each test are:
    Name  UT  Maths  Science  S.St  Hindi  Eng
0  Raman   1     22       21    18     20   21
1  Raman   2     21       20    17     22   24
2  Raman   3     14       19    15     24   23

>>> dfRaman[['Maths','Science','S.St','Hindi','Eng']].sum()
Maths      57
Science    60
S.St       50
Hindi      66
Eng        68
dtype: int64

#To print total marks scored by Raman in all
#subjects in each Unit Test
>>> dfRaman[['Maths','Science','S.St','Hindi','Eng']].sum(axis=1)
0    102
1    104
2     95
dtype: int64

Think and Reflect
Can you write a shortened code to get the output of
Program 3.4?

Activity 3.1
Write the python statements to print the sum of the
English marks scored by Mishti.

3.2.4 Calculating Number of Values


DataFrame.count() will display the total number of
values for each column or row of a DataFrame. By default
it counts the values in each column. To count the values
in each row, we need to use the argument axis=1, as shown
in Program 3.5 below.


>>> print(df.count())

Name 12
UT 12
Maths 12
Science 12
S.St 12
Hindi 12
Eng 12
dtype: int64

Program 3-5 Write a statement to count the number of values in a row.
>>> df.count(axis=1)
0 7
1 7
2 7
3 7
4 7
5 7
6 7
7 7
8 7
9 7
10 7
11 7
dtype: int64
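Every count above is the same because df has no missing values. count() counts only the values that are present, so the picture changes as soon as some value is NaN. The following is a small sketch with made-up data (scores is a hypothetical DataFrame, not part of the case study).

```python
import pandas as pd

NaN = float('nan')   # marks a missing value
scores = pd.DataFrame({'Maths': [22, NaN, 14],
                       'Science': [21, 20, NaN],
                       'Eng': [21, 24, 23]})

# Values present in each column (missing values are not counted)
perColumn = scores.count()
print(perColumn)

# Values present in each row
perRow = scores.count(axis=1)
print(perRow)

# len() gives the number of rows, whether or not values are missing
print(len(scores))
```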

3.2.5 Calculating Mean


DataFrame.mean() will display the mean (average) of
the values of each column of a DataFrame. It is only
applicable for numeric values.
>>> df.mean(numeric_only=True)
UT          2.000000
Maths      19.250000
Science    19.750000
S.St       20.416667
Hindi      21.833333
Eng        20.500000
dtype: float64

Program 3-6 Write the statements to get an average of marks obtained by Zuhaire in all the Unit Tests.

>>> dfZuhaire = df[df['Name']=='Zuhaire']
>>> dfZuhaireMarks = dfZuhaire.loc[:,'Maths':'Eng']
>>> print("Slicing of the DataFrame to get only the marks\n", dfZuhaireMarks)

Slicing of the DataFrame to get only the marks
   Maths  Science  S.St  Hindi  Eng
3     20       17    22     24   19
4     23       15    21     25   15
5     22       18    19     23   13

>>> print("Average of marks obtained by Zuhaire in all Unit Tests \n", dfZuhaireMarks.mean(axis=1))

Average of marks obtained by Zuhaire in all Unit Tests
3    20.4
4    19.8
5    19.0
dtype: float64
In the above output, 20.4 is the average of marks
obtained by Zuhaire in Unit Test 1. Similarly, 19.8 and
19.0 are the average of marks in Unit Test 2 and 3
respectively.

Think and Reflect
Try to write a short code to get the above output.
Remember to print the relevant headings of the output.
3.2.6 Calculating Median
DataFrame.median() will display the middle value of the
data. This function will display the median of the values
of each column of a DataFrame. It is only applicable for
numeric values.
>>> print(df.median(numeric_only=True))

UT          2.0
Maths      20.5
Science    19.5
S.St       20.0
Hindi      22.5
Eng        21.5
dtype: float64

Program 3-7 Write the statements to print the median marks of mathematics in UT1.

>>> dfMaths=df['Maths']
>>> dfMathsUT1=dfMaths[df.UT==1]
>>> print("Displaying the marks scored in Mathematics in UT1\n",dfMathsUT1)

Displaying the marks scored in Mathematics in UT1
0    22
3    20
6    23
9    15
Name: Maths, dtype: int64

>>> dfMathMedian=dfMathsUT1.median()
>>> print("Displaying the median of Mathematics in UT1\n",dfMathMedian)

Displaying the median of Mathematics in UT1
21.0
Here, the number of values is even, so there are two
middle values, i.e. 20 and 22. Hence, the median is the
average of 20 and 22.

Activity 3.2
Find the median of the values of the rows of
the DataFrame.
3.2.7 Calculating Mode
DataFrame.mode() will display the mode. The mode is
defined as the value that appears the most number of
times in the data. This function will display the mode of
each column or row of the DataFrame. To get the mode
of Hindi marks, the following statements can be used.
>>> df['Hindi']
0     20
1     22
2     24
3     24
4     25
5     23
6     15
7     17
8     21
9     22
10    24
11    25
Name: Hindi, dtype: int64
>>> df['Hindi'].mode()
0    24
dtype: int64

Activity 3.3
Calculate the mode of marks scored in Maths.

Note that 24 marks in Hindi have been scored three
times, while 25 marks have been scored twice, 23 marks
once, 22 marks twice, and 21, 20, 15 and 17 marks
once each.
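The tally given above can be produced with Series.value_counts(), which lists every distinct value with the number of times it occurs. The sketch below uses the Hindi marks of the case study; the Series tied is a made-up example showing that mode() returns more than one value when there is a tie.

```python
import pandas as pd

hindi = pd.Series([20, 22, 24, 24, 25, 23, 15, 17, 21, 22, 24, 25],
                  name='Hindi')

# How many times does each mark occur?
counts = hindi.value_counts()
print(counts)

# The mode is the mark with the highest count
modes = hindi.mode()
print(modes)

# With a tie, mode() returns all the most frequent values
tied = pd.Series([1, 1, 2, 2, 3]).mode()
print(tied)
```

This is why mode() returns a Series and not a single number.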
3.2.8 Calculating Quartile
DataFrame.quantile() is used to get the quartiles.
Quartiles divide the sorted values of each column (or row)
of the DataFrame into four parts: the first quartile is the
25% point (parameter q = .25), the second quartile is the
50% point (median), and the third quartile is the 75%
point (parameter q = .75). By default, it will display the
second quartile (median) of all numeric values.
>>> df.quantile(numeric_only=True) # by default, median is the output
UT 2.0
Maths 20.5
Science 19.5
S.St 20.0
Hindi 22.5
Eng 21.5
Name: 0.5, dtype: float64

>>> df.quantile(q=.25, numeric_only=True)
UT 1.00
Maths 16.50
Science 18.00
S.St 18.75
Hindi 20.75
Eng 19.75
Name: 0.25, dtype: float64

>>> df.quantile(q=.75, numeric_only=True)
UT 3.00
Maths 22.25
Science 21.25
S.St 22.50
Hindi 24.00
Eng 23.00
Name: 0.75, dtype: float64


Program 3-8 Write the statement to display the first and third quartiles of all subjects.

>>> dfSubject=df[['Maths','Science','S.St','Hindi','Eng']]
>>> print("Marks of all the subjects:\n",dfSubject)
Marks of all the subjects:


Maths Science S.St Hindi Eng
0 22 21 18 20 21
1 21 20 17 22 24
2 14 19 15 24 23
3 20 17 22 24 19
4 23 15 21 25 15
5 22 18 19 23 13
6 23 19 20 15 22
7 24 22 24 17 21
8 12 25 19 21 23
9 15 22 25 22 22
10 18 21 25 24 23
11 17 18 20 25 20

>>> dfQ=dfSubject.quantile([.25,.75])
>>> print("First and third quartiles of all the subjects:\n",dfQ)

First and third quartiles of all the subjects:


Maths Science S.St Hindi Eng
0.25 16.50 18.00 18.75 20.75 19.75

0.75 22.25 21.25 22.50 24.00 23.00

3.2.9 Calculating Variance


DataFrame.var() is used to display the variance. Variance
is the average of the squared differences from the mean.
By default, pandas computes the sample variance, i.e. the
sum of squared differences is divided by n-1 rather than n.
>>> df[['Maths','Science','S.St','Hindi','Eng']].var()
Maths 15.840909
Science 7.113636
S.St 9.901515
Hindi 9.969697
Eng 11.363636
dtype: float64

Activity 3.4
Find the variance and standard deviation of the following
scores on an exam: 92, 95, 85, 80, 75, 50.

3.2.10 Calculating Standard Deviation


DataFrame.std() returns the standard deviation of the
values. Standard deviation is calculated as the square
root of the variance.
>>> df[['Maths','Science','S.St','Hindi','Eng']].std()

Maths 3.980064
Science 2.667140
S.St 3.146667
Hindi 3.157483
Eng 3.370999
dtype: float64
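The Maths figures above can be cross-checked with the standard library. This is a small sketch, not part of the original program; it shows that the values printed by pandas correspond to the sample variance (division by n-1), while division by n gives a smaller number.

```python
import statistics

# Maths marks of the 12 rows of df
maths = [22, 21, 14, 20, 23, 22, 23, 24, 12, 15, 18, 17]

sample_var = statistics.variance(maths)        # divides by n - 1
population_var = statistics.pvariance(maths)   # divides by n
sample_std = statistics.stdev(maths)           # square root of sample_var

print(round(sample_var, 6), round(population_var, 6), round(sample_std, 6))
```

The first and third numbers agree with the Maths entries of var() and std() shown above.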
DataFrame.describe() function displays the
descriptive statistical values in a single command. These
values help us describe a set of data in a DataFrame.
>>> df.describe()
UT Maths Science S.St Hindi Eng
count 12.000000 12.000000 12.00000 12.000000 12.000000 12.000000
mean 2.000000 19.250000 19.75000 20.416667 21.833333 20.500000
std 0.852803 3.980064 2.66714 3.146667 3.157483 3.370999
min 1.000000 12.000000 15.00000 15.000000 15.000000 13.000000
25% 1.000000 16.500000 18.00000 18.750000 20.750000 19.750000
50% 2.000000 20.500000 19.50000 20.000000 22.500000 21.500000
75% 3.000000 22.250000 21.25000 22.500000 24.000000 23.000000
max 3.000000 24.000000 25.00000 25.000000 25.000000 24.000000

3.3 Data Aggregations


Aggregation means to transform the dataset and produce
a single numeric value from an array. Aggregation can
be applied to one or more columns together. Commonly
used aggregate functions are max(), min(), sum(), count(),
std() and var().

>>> df.aggregate('max')
Name Zuhaire # displaying the maximum of Name as well
UT 3
Maths 24
Science 25
S.St 25
Hindi 25
Eng 24
dtype: object

#To use multiple aggregate functions in a single statement
>>> df.aggregate(['max','count'])

Name UT Maths Science S.St Hindi Eng


max Zuhaire 3 24 25 25 25 24
count 12 12 12 12 12 12 12

>>> df['Maths'].aggregate(['max','min'])
max 24
min 12
Name: Maths, dtype: int64
Note: We can also use the parameter axis with the
aggregate function. By default, the value of axis is 0,
which means the function is applied down each column;
axis=1 applies it across each row.
#Using the above statement with axis=0 gives the same result
>>> df['Maths'].aggregate(['max','min'],axis=0)
max 24
min 12
Name: Maths, dtype: int64

#Total marks of Maths and Science obtained by each student.
#Use sum() with axis=1 (Row-wise summation)
>>> df[['Maths','Science']].aggregate('sum',axis=1)
0 43
1 41
2 33
3 37
4 38
5 40
6 42
7 46
8 37
9 37
10 39
11 35
dtype: int64
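The effect of the axis parameter is easiest to see on a very small DataFrame. The sketch below uses only Raman's three rows; the variable names marks, per_subject and per_row are chosen here for illustration.

```python
import pandas as pd

# First three rows of the marks data (Raman, UT 1 to 3), two subjects only
marks = pd.DataFrame({'Maths': [22, 21, 14], 'Science': [21, 20, 19]})

per_subject = marks.aggregate('sum', axis=0)   # one total per column
per_row = marks.aggregate('sum', axis=1)       # one total per row

print(per_subject)
print(per_row)
```

With axis=0 the result has one entry per subject, while with axis=1 it has one entry per row index.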


3.4 Sorting a DataFrame


Sorting refers to the arrangement of data elements in
a specified order, which can be either ascending or
descending. Pandas provides the sort_values() function to
sort the data values of a DataFrame. The syntax of the
function is as follows:
DataFrame.sort_values(by, axis=0, ascending=True)

Here, by is the column name (or list of column names) to
sort on, axis specifies what is sorted (0 to sort the rows,
1 to sort the columns) and ascending gives the order of
sorting (True or False). By default, rows are sorted in
ascending order of the values in the by column(s).
Consider a scenario, where the teacher is interested
in arranging a list according to the names of the students
or according to marks obtained in a particular subject.
In such cases, sorting can be used to obtain the desired
results. The following Python code sorts the data
in the DataFrame created in Program 3.1.
To sort the entire data on the basis of attribute
‘Name’, we use the following command:
#By default, sorting is done in ascending order.
>>> print(df.sort_values(by=['Name']))

Name UT Maths Science S.St Hindi Eng


6 Ashravy 1 23 19 20 15 22
7 Ashravy 2 24 22 24 17 21
8 Ashravy 3 12 25 19 21 23
9 Mishti 1 15 22 25 22 22
10 Mishti 2 18 21 25 24 23
11 Mishti 3 17 18 20 25 20
0 Raman 1 22 21 18 20 21
1 Raman 2 21 20 17 22 24
2 Raman 3 14 19 15 24 23
3 Zuhaire 1 20 17 22 24 19
4 Zuhaire 2 23 15 21 25 15
5 Zuhaire 3 22 18 19 23 13
Now, to obtain a sorted list of marks scored by all
students in Science in Unit Test 2, the following code
can be used:
# Get the data corresponding to Unit Test 2
>>> dfUT2 = df[df.UT == 2]
# Sort according to ascending order of marks in Science
>>> print(dfUT2.sort_values(by=['Science']))

Name UT Maths Science S.St Hindi Eng


4 Zuhaire 2 23 15 21 25 15
1 Raman 2 21 20 17 22 24
10 Mishti 2 18 21 25 24 23
7 Ashravy 2 24 22 24 17 21

Program 3-9 Write the statement which will sort the
marks in English in the DataFrame df
based on Unit Test 3, in descending order.
# Get the data corresponding to Unit Test 3
>>> dfUT3 = df[df.UT == 3]
# Sort according to descending order of marks in English
>>> print(dfUT3.sort_values(by=['Eng'],ascending=False))

Name UT Maths Science S.St Hindi Eng


2 Raman 3 14 19 15 24 23
8 Ashravy 3 12 25 19 21 23
11 Mishti 3 17 18 20 25 20
5 Zuhaire 3 22 18 19 23 13
A DataFrame can be sorted based on multiple
columns. The following code sorts the DataFrame
df based on marks in Science in Unit Test 3 in ascending
order. If marks in Science are the same, then sorting
will be done on the basis of marks in Hindi.
# Get the data corresponding to marks in Unit Test 3
>>> dfUT3 = df[df.UT == 3]
# Sort the data according to Science and then according to Hindi
>>> print(dfUT3.sort_values(by=['Science','Hindi']))

Name UT Maths Science S.St Hindi Eng


5 Zuhaire 3 22 18 19 23 13
11 Mishti 3 17 18 20 25 20
2 Raman 3 14 19 15 24 23
8 Ashravy 3 12 25 19 21 23
Here, we can see that the list is sorted on the basis
of marks in Science. Two students namely, Zuhaire and
Mishti have equal marks (18) in Science. Therefore for
them, sorting is done on the basis of marks in Hindi.
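The ascending parameter can also take a list with one value per sort column, so that ties can be broken in the opposite direction. The following sketch rebuilds only the Unit Test 3 rows and the columns needed; the name ordered is used here for illustration.

```python
import pandas as pd

# Unit Test 3 rows of the marks data, restricted to the columns used here
dfUT3 = pd.DataFrame(
    {'Name': ['Raman', 'Zuhaire', 'Ashravy', 'Mishti'],
     'Science': [19, 18, 25, 18],
     'Hindi': [24, 23, 21, 25]},
    index=[2, 5, 8, 11])

# Science ascending; ties broken by Hindi in descending order
ordered = dfUT3.sort_values(by=['Science', 'Hindi'], ascending=[True, False])
print(ordered)
```

Zuhaire and Mishti still come first because of their Science marks, but Mishti (25 in Hindi) is now listed before Zuhaire (23 in Hindi).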


3.5 GROUP BY Functions

In pandas, the DataFrame.groupby() function is used
to split the data into groups based on some criteria.
Pandas objects like a DataFrame can be split on any
of their axes. The groupby() function works on
a split-apply-combine strategy, which is shown below
as a 3-step process:
Step 1: Split the data into groups by creating a GroupBy
object from the original DataFrame.
Step 2: Apply the required function.
Step 3: Combine the results to form a new DataFrame.
To understand this better, let us consider the data
shown in the diagram given below. Here, we have a two-
column DataFrame (key, data). We need to find the sum
of the data column for a particular key, i.e. sum of all
the data elements with key A, B and C, respectively. To
do so, we first split the entire DataFrame into groups
by key column. Then, we apply the sum function on the
respective groups. Finally, we combine the results to
form a new DataFrame that contains the desired result.
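[Diagram: Split, Apply, Combine]
Split: the rows of the (key, data) DataFrame, A 0, B 5, C 10,
A 5, B 10, C 15, A 10, B 15, C 20, are separated by key into
A (0, 5, 10), B (5, 10, 15) and C (10, 15, 20).
Apply: Sum is applied to each group.
Combine: the results are A 15, B 30, C 45.
Figure 3.1: A DataFrame with two columns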
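The three steps can also be written out in plain Python, without pandas, to make the strategy concrete. This is only an illustrative sketch on the (key, data) values of Figure 3.1; the names rows, groups and result are chosen here.

```python
# The (key, data) rows of the two-column DataFrame in Figure 3.1
rows = [('A', 0), ('B', 5), ('C', 10),
        ('A', 5), ('B', 10), ('C', 15),
        ('A', 10), ('B', 15), ('C', 20)]

# Step 1 (split): collect the data values of each key
groups = {}
for key, value in rows:
    groups.setdefault(key, []).append(value)

# Steps 2 and 3 (apply, combine): sum each group into one result
result = {key: sum(values) for key, values in groups.items()}
print(result)
```

In pandas, the splitting, the summation and the combination of the results are all handled by the library.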
The following statements show how to apply the groupby()
function on our DataFrame df created in Program 3.1:
#Create a GroupBy object on Name of the student from DataFrame df
>>> g1=df.groupby('Name')

#Displaying the first entry from each group
>>> g1.first()
UT Maths Science S.St Hindi Eng
Name
Ashravy 1 23 19 20 15 22
Mishti 1 15 22 25 22 22
Raman 1 22 21 18 20 21
Zuhaire 1 20 17 22 24 19

#Displaying the size of each group


>>> g1.size()
Name
Ashravy 3
Mishti 3
Raman 3
Zuhaire 3
dtype: int64

#Displaying group data, i.e., group_name, row indexes
#corresponding to the group and their data type
>>> g1.groups
{'Ashravy': Int64Index([6, 7, 8], dtype='int64'),
'Mishti': Int64Index([9, 10, 11], dtype='int64'),
'Raman': Int64Index([0, 1, 2], dtype='int64'),
'Zuhaire': Int64Index([3, 4, 5], dtype='int64')}

#Printing data of a single group


>>> g1.get_group('Raman')
UT Maths Science S.St Hindi Eng
0 1 22 21 18 20 21
1 2 21 20 17 22 24
2 3 14 19 15 24 23

#Grouping with respect to multiple attributes
#Creating a GroupBy object on Name and UT
>>> g2=df.groupby(['Name', 'UT'])
>>> g2.first()
Maths Science S.St Hindi Eng
Name UT
Ashravy 1 23 19 20 15 22
2 24 22 24 17 21
3 12 25 19 21 23
Mishti 1 15 22 25 22 22
2 18 21 25 24 23
3 17 18 20 25 20
Raman 1 22 21 18 20 21
2 21 20 17 22 24
3 14 19 15 24 23
Zuhaire 1 20 17 22 24 19
2 23 15 21 25 15
3 22 18 19 23 13
The above statements show how we create groups by
splitting a DataFrame using groupby(). The next step is
to apply functions over the groups just created. This is
done using aggregation.
Aggregation is a process in which an aggregate
function is applied on each group created by
groupby(). It returns a single aggregated statistical value
corresponding to each group. It can be used to apply
multiple functions over an axis. By default, functions
are applied over columns. Aggregation can be performed
using the agg() or aggregate() function.

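#Calculating average marks scored by all students in each
#subject for each UT
#(in pandas 2.0 or later, the non-numeric Name column may
#need to be dropped before taking the mean)
>>> df.groupby(['UT']).aggregate('mean')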

Maths Science S.St Hindi Eng


UT
1 20.00 19.75 21.25 20.25 21.00
2 21.50 19.50 21.75 22.00 20.75
3 16.25 20.00 18.25 23.25 19.75

#Calculate average marks scored in Maths in each UT
>>> group1=df.groupby(['UT'])
>>> group1['Maths'].aggregate('mean')
UT
1 20.00
2 21.50
3 16.25
Name: Maths, dtype: float64
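Several columns and several aggregate functions can be combined in one statement by selecting the columns after grouping. The sketch below uses only the UT 1 and UT 2 marks of Maths and Science; the name summary is chosen here for illustration.

```python
import pandas as pd

# Maths and Science marks of the four students in UT 1 and UT 2
df = pd.DataFrame({
    'UT':      [1, 2, 1, 2, 1, 2, 1, 2],
    'Maths':   [22, 21, 20, 23, 23, 24, 15, 18],
    'Science': [21, 20, 17, 15, 19, 22, 22, 21]})

# Mean and maximum of each subject within each unit test
summary = df.groupby('UT')[['Maths', 'Science']].agg(['mean', 'max'])
print(summary)
```

The result has one row per UT and, for each subject, one column per aggregate function.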


Program 3-10 Write the python statements to print the
mean, variance, standard deviation and
quartile of the marks scored in Mathematics
by each student across the UTs.

>>> df.groupby(by='Name')['Maths'].agg(['mean','var','std','quantile'])
mean var std quantile
Name
Ashravy 19.666667 44.333333 6.658328 23.0
Mishti 16.666667 2.333333 1.527525 17.0
Raman 19.000000 19.000000 4.358899 21.0
Zuhaire 21.666667 2.333333 1.527525 22.0

Activity 3.5
Write the python statements to print average marks in
Science by all the students in each UT.
3.6 Altering the Index
We use indexing to access the elements of a DataFrame.
It is used for fast retrieval of data. By default, a numeric
index starting from 0 is created as a row index, as shown
below:
>>> df #With default Index
Name UT Maths Science S.St Hindi Eng
0 Raman 1 22 21 18 20 21
1 Raman 2 21 20 17 22 24
2 Raman 3 14 19 15 24 23
3 Zuhaire 1 20 17 22 24 19
4 Zuhaire 2 23 15 21 25 15
5 Zuhaire 3 22 18 19 23 13
6 Ashravy 1 23 19 20 15 22
7 Ashravy 2 24 22 24 17 21
8 Ashravy 3 12 25 19 21 23
9 Mishti 1 15 22 25 22 22
10 Mishti 2 18 21 25 24 23
11 Mishti 3 17 18 20 25 20

Here, the integer number in the first column


starting from 0 is the index. However, depending on our
requirements, we can select some other column to be
the index or we can add another index column.
When we slice the data, we get the original index
which is not continuous, e.g. when we select marks of
all students in Unit Test 1, we get the following result:
>>> dfUT1 = df[df.UT == 1]
>>> print(dfUT1)


Name UT Maths Science S.St Hindi Eng


0 Raman 1 22 21 18 20 21
3 Zuhaire 1 20 17 22 24 19
6 Ashravy 1 23 19 20 15 22
9 Mishti 1 15 22 25 22 22
Notice that the first column is a non-continuous
index because dfUT1 is a slice of the original data. We
can create a new continuous index alongside it using the
reset_index() function, as shown below:
>>> dfUT1.reset_index(inplace=True)
>>> print(dfUT1)
index Name UT Maths Science S.St Hindi Eng
0 0 Raman 1 22 21 18 20 21
1 3 Zuhaire 1 20 17 22 24 19
2 6 Ashravy 1 23 19 20 15 22
3 9 Mishti 1 15 22 25 22 22
A new continuous index is created while the original
one is also intact. We can drop the original index by
using the drop function, as shown below:
>>> dfUT1.drop(columns=['index'],inplace=True)
>>> print(dfUT1)

Name UT Maths Science S.St Hindi Eng


0 Raman 1 22 21 18 20 21
1 Zuhaire 1 20 17 22 24 19
2 Ashravy 1 23 19 20 15 22
3 Mishti 1 15 22 25 22 22
We can change the index to some other column of
the data.
>>> dfUT1.set_index('Name',inplace=True)
>>> print(dfUT1)
UT Maths Science S.St Hindi Eng
Name
Raman 1 22 21 18 20 21
Zuhaire 1 20 17 22 24 19
Ashravy 1 23 19 20 15 22
Mishti 1 15 22 25 22 22
We can revert to the previous index by using the
following statement:
>>> dfUT1.reset_index('Name', inplace = True)
>>> print(dfUT1)


Name UT Maths Science S.St Hindi Eng


0 Raman 1 22 21 18 20 21
1 Zuhaire 1 20 17 22 24 19
2 Ashravy 1 23 19 20 15 22
3 Mishti 1 15 22 25 22 22
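Creating a continuous index and discarding the old one can also be done in a single step with the drop parameter of reset_index(). The sketch below uses a reduced version of the marks data; the name renumbered is chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Raman', 'Raman', 'Zuhaire', 'Zuhaire'],
                   'UT': [1, 2, 1, 2],
                   'Maths': [22, 21, 20, 23]})

dfUT1 = df[df.UT == 1]                      # keeps the original labels 0 and 2
renumbered = dfUT1.reset_index(drop=True)   # new 0, 1 index, old one discarded

print(list(dfUT1.index), list(renumbered.index), list(renumbered.columns))
```

Since inplace is not used here, dfUT1 itself is left unchanged and the renumbered data is returned as a new DataFrame.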

3.7 Other DataFrame Operations


In this section, we will learn more techniques and
functions that can be used to manipulate and analyse
data in a DataFrame.
3.7.1 Reshaping Data
The way a dataset is arranged into rows and columns is
referred to as the shape of data. Reshaping data refers
to the process of changing the shape of the dataset
to make it suitable for some analysis problems. The
example given in the below section explains the utility
of reshaping the data.
For reshaping data, two basic functions are available
in Pandas, pivot and pivot_table. This section covers
them in detail.
(A) Pivot
The pivot function is used to reshape and create a new
DataFrame from the original one. Consider the following
example of sales and profit data of four stores: S1, S2,
S3 and S4 for the years 2016, 2017 and 2018.
Example 3.1
>>> import pandas as pd

>>> data={'Store':['S1','S4','S3','S1','S2','S3','S1','S2','S3'],
'Year':[2016,2016,2016,2017,2017,2017,2018,2018,2018],
'Total_sales(Rs)':[12000,330000,420000,20000,10000,450000,30000,11000,89000],
'Total_profit(Rs)':[1100,5500,21000,32000,9000,45000,3000,1900,23000]
}

>>> df=pd.DataFrame(data)
>>> print(df)

Store Year Total_sales(Rs) Total_profit(Rs)
0 S1 2016 12000 1100
1 S4 2016 330000 5500
2 S3 2016 420000 21000


3 S1 2017 20000 32000


4 S2 2017 10000 9000
5 S3 2017 450000 45000
6 S1 2018 30000 3000
7 S2 2018 11000 1900
8 S3 2018 89000 23000
Let us try to answer the following queries on the
above data.
1) What was the total sale of store S1 in all the years?
Python statements to perform this task will be
as follows:

# will get the data related to store S1
>>> S1df = df[df.Store=='S1']
#find the total of sales for Store S1
>>> S1df['Total_sales(Rs)'].sum()
62000
2) What is the maximum sale value by store S3 in
any year?

#will get the data related to store S3
>>> S3df = df[df.Store=='S3']
#find the maximum sale for Store S3
>>> S3df['Total_sales(Rs)'].max()
450000
3) Which store had the maximum total sale in all
the years?
>>> S1df = df[df.Store=='S1']
>>> S2df=df[df.Store == 'S2']
>>> S3df = df[df.Store=='S3']
>>> S4df = df[df.Store=='S4']
>>> S1total = S1df['Total_sales(Rs)'].sum()
>>> S2total = S2df['Total_sales(Rs)'].sum()
>>> S3total = S3df['Total_sales(Rs)'].sum()
>>> S4total = S4df['Total_sales(Rs)'].sum()
>>> max(S1total,S2total,S3total,S4total)
959000
Notice that we have to slice the data corresponding to
a particular store and then answer the query. Now, let
us reshape the data using pivot and see the difference.
>>> pivot1=df.pivot(index='Store',columns='Year',values='Total_sales(Rs)')


Here, index specifies the column that will act as the
index of the pivot table, columns specifies the column
whose values become the new column headers and values
specifies the column(s) whose values will be displayed.
In this particular case, store names will act as the index,
years will be the column headers and sales values will be
displayed as the values of the pivot table.
>>> print(pivot1)

Year 2016 2017 2018
Store
S1 12000.0 20000.0 30000.0
S2 NaN 10000.0 11000.0
S3 420000.0 450000.0 89000.0
S4 330000.0 NaN NaN
As can be seen above, the value of Total_sales(Rs)
for every row in the original table has been transferred
to the new table: pivot1, where each row has data of a
store and each column has data of a year. Those cells in
the new pivot table which do not have a matching entry
in the original one are filled with NaN. For instance, we
did not have values corresponding to sales of Store S2
in 2016, thus the appropriate cell in pivot1 is filled with
NaN.
Now the Python statements for the above queries will
be as follows:
1) What was the total sale of store S1 in all the years?
>>> pivot1.loc['S1'].sum()
2) What is the maximum sale value by store S3 in
any year?
>>> pivot1.loc['S3'].max()
3) Which store had the maximum total sale?
>>> S1total = pivot1.loc['S1'].sum()
>>> S2total = pivot1.loc['S2'].sum()
>>> S3total = pivot1.loc['S3'].sum()
>>> S4total = pivot1.loc['S4'].sum()
>>> max(S1total,S2total,S3total,S4total)
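The max() call above gives the largest total but not the name of the store. One possible way to obtain both from the pivoted data is sketched below; the names totals and best_store are chosen here for illustration.

```python
import pandas as pd

data = {'Store': ['S1', 'S4', 'S3', 'S1', 'S2', 'S3', 'S1', 'S2', 'S3'],
        'Year': [2016, 2016, 2016, 2017, 2017, 2017, 2018, 2018, 2018],
        'Total_sales(Rs)': [12000, 330000, 420000, 20000, 10000,
                            450000, 30000, 11000, 89000]}
df = pd.DataFrame(data)
pivot1 = df.pivot(index='Store', columns='Year', values='Total_sales(Rs)')

totals = pivot1.sum(axis=1)     # total per store; NaN cells are skipped
best_store = totals.idxmax()    # index label of the largest total
print(best_store, totals[best_store])
```

Summing with axis=1 adds the yearly columns of each row, so all four store totals are computed in one statement.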
We can notice that reshaping has transformed the
structure of the data, which makes it more readable
and easier to analyse.

Activity 3.6
Consider the data of unit test marks given in Program 3.1.
Write the python statements to print name wise UT
marks in mathematics.

(B) Pivoting by Multiple Columns
For pivoting by multiple columns, we need to specify
multiple column names in the values parameter of the
pivot() function. If we omit the values parameter, it will
display the pivoting for all the remaining columns.
>>> pivot2=df.pivot(index='Store',columns='Year',values=['Total_sales(Rs)','Total_profit(Rs)'])
>>> print(pivot2)
Total_sales(Rs) Total_profit(Rs)
Year 2016 2017 2018 2016 2017 2018
Store
S1 12000.0 20000.0 30000.0 1100.0 32000.0 3000.0
S2 NaN 10000.0 11000.0 NaN 9000.0 1900.0
S3 420000.0 450000.0 89000.0 21000.0 45000.0 23000.0
S4 330000.0 NaN NaN 5500.0 NaN NaN
Let us consider another example, where suppose we
have stock data corresponding to a store as:
>>> data={'Item':['Pen','Pen','Pencil','Pencil','Pen','Pen'],
'Color':['Red','Red','Black','Black','Blue','Blue'],
'Price(Rs)':[10,25,7,5,50,20],
'Units_in_stock':[50,10,47,34,55,14]
}
>>> df=pd.DataFrame(data)
>>> print(df)
Item Color Price(Rs) Units_in_stock
0 Pen Red 10 50
1 Pen Red 25 10
2 Pencil Black 7 47
3 Pencil Black 5 34
4 Pen Blue 50 55
5 Pen Blue 20 14
Now, let us assume, we have to reshape the above
table with Item as the index and Color as the column.
We will use pivot function as given below:
>>> pivot3=df.pivot(index='Item',columns='Color',values='Units_in_stock')
But this statement results in an error: “ValueError:
Index contains duplicate entries, cannot reshape”. This
is because duplicate data can’t be reshaped using pivot
function. Hence, before calling the pivot() function, we
need to ensure that our data do not have rows with
duplicate values for the specified columns. If we can’t
ensure this, we may have to use pivot_table function
instead.
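One possible way to check for such duplicates before calling pivot() is the duplicated() function. The sketch below uses the stock data from above and confirms that pivot() raises a ValueError for it; the names has_duplicates and pivot_failed are chosen here for illustration.

```python
import pandas as pd

data = {'Item': ['Pen', 'Pen', 'Pencil', 'Pencil', 'Pen', 'Pen'],
        'Color': ['Red', 'Red', 'Black', 'Black', 'Blue', 'Blue'],
        'Units_in_stock': [50, 10, 47, 34, 55, 14]}
df = pd.DataFrame(data)

# True if any (Item, Color) pair occurs in more than one row
has_duplicates = bool(df.duplicated(subset=['Item', 'Color'], keep=False).any())

try:
    df.pivot(index='Item', columns='Color', values='Units_in_stock')
    pivot_failed = False
except ValueError:
    pivot_failed = True     # duplicate (Item, Color) pairs cannot be reshaped

print(has_duplicates, pivot_failed)
```

Here every (Item, Color) combination appears twice, so the check reports duplicates and the pivot() call fails.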


(C) Pivot Table


The pivot_table function works like the pivot function,
but aggregates the values from rows with duplicate entries
for the specified columns. In other words, we can use
aggregate functions like min, max, mean, etc., wherever we
have duplicate entries. The default aggregate function is mean.
Syntax:
pandas.pivot_table(data, values=None,
index=None, columns=None, aggfunc='mean')
The parameter aggfunc can have values among sum,
max, min, len, np.mean, np.median.
We can apply index to multiple columns if we don't
have any unique column to act as index.
>>> df1 = df.pivot_table(index=['Item','Color'])
>>> print(df1)
Price(Rs) Units_in_stock
Item Color
Pen Blue 35.0 34.5
Red 17.5 30.0
Pencil Black 6.0 40.5

Please note that mean has been used as the default
aggregate function. The prices of the blue pen in the
original data are 50 and 20. Their mean, 35, is the price
of the blue pen in df1.
We can use multiple aggregate functions on the
data. The example below shows the use of the sum, max
and np.mean functions.
>>> import numpy as np
>>> pivot_table1=df.pivot_table(index='Item',columns='Color',values='Units_in_stock',aggfunc=[sum,max,np.mean])

>>> pivot_table1
sum max mean
Color Black Blue Red Black Blue Red Black Blue Red
Item
Pen NaN 69.0 60.0 NaN 55.0 50.0 NaN 34.5 30.0
Pencil 81.0 NaN NaN 47.0 NaN NaN 40.5 NaN NaN
Pivoting can also be done on multiple columns.
Further, different aggregate functions can be applied on
different columns. The following example demonstrates
pivoting on two columns - Price(Rs) and Units_in_stock.
It applies the len() function to the column Price(Rs) and
the mean() function to the column Units_in_stock. Note
that the aggregate function len returns the number of rows
corresponding to that entry.
>>> pivot_table1=df.pivot_table(index='Item',columns='Color',values=['Price(Rs)','Units_in_stock'],aggfunc={"Price(Rs)":len,"Units_in_stock":np.mean})

>>> pivot_table1
Price(Rs) Units_in_stock
Color Black Blue Red Black Blue Red
Item
Pen NaN 2.0 2.0 NaN 34.5 30.0
Pencil 2.0 NaN NaN 40.5 NaN NaN
Program 3-11 Write the statement to print the maximum
price of pen of each color.

>>> dfpen=df[df.Item=='Pen']
>>> pivot_redpen=dfpen.pivot_table(index='Item',columns=['Color'],values=['Price(Rs)'],aggfunc=[max])
>>> print(pivot_redpen)

max
Price(Rs)
Color Blue Red
Item
Pen 50 25

3.8 Handling Missing Values


As we know, a DataFrame can consist of many rows
(objects), where each row can have values for various
columns (attributes). If a value corresponding to a
column is not present, it is considered to be a missing
value. A missing value is denoted by NaN.
In real-world datasets, it is common for an object
to have some missing attributes. There may be several
reasons for that. In some cases, data was not collected
properly, resulting in missing data, e.g., some people did
not fill all the fields while taking a survey. Sometimes,
some attributes are not relevant to all. For example, if
a person is unemployed then the salary attribute will be
irrelevant and hence may not have been filled in.

Missing values create a lot of problems during data
analysis and have to be handled properly. The two
most common strategies for handling missing values
explained in this section are:
i) drop the object having missing values,
ii) fill or estimate the missing value
Let us refer to the previous case study given in Table
3.1. Suppose the students have now appeared for
Unit Test 4 also. But Raman could not appear for the
Science, Maths and English tests, and suppose there
is no possibility of a re-test. Therefore, marks obtained
by him corresponding to these subjects will be missing.
The dataset after Unit Test 4 is as shown in Table 3.2.
Note that the attributes ‘Science’, ‘Maths’ and ‘English’
have missing values in Unit Test 4 for Raman.
Table 3.2 Case study data after UT4
Result
Name/Subjects  Unit Test  Maths  Science  S.St.  Hindi  Eng
Raman          1          22     21       18     20     21
Raman          2          21     20       17     22     24
Raman          3          14     19       15     24     23
Raman          4                          19     18
Zuhaire        1          20     17       22     24     19
Zuhaire        2          23     15       21     25     15
Zuhaire        3          22     18       19     23     13
Zuhaire        4          19     20       17     19     16
Aashravy       1          23     19       20     15     22
Aashravy       2          24     22       24     17     21
Aashravy       3          12     25       19     21     23
Aashravy       4          15     20       20     20     17
Mishti         1          15     22       25     22     22
Mishti         2          18     21       25     24     23
Mishti         3          17     18       20     25     20
Mishti         4          14     20       19     20     18

To calculate the final result, teachers are asked to


submit the percentage of marks obtained by all students.
In the case of Raman, the Maths teacher decides to
compute the marks obtained in 3 tests and then find the
percentage of marks from the total score of 75 marks.
In a way, she decides to drop the marks of Unit Test 4.
However, the English teacher decides to give the same


marks to Raman in the 4th test as scored in the 3rd


test. Science teacher decides to give Raman zero marks
in the 4th test and then computes the percentage of
marks obtained. Following sections explain the code
for checking missing values and the code for replacing
those missing values with appropriate values.
3.8.1 Checking Missing Values
Pandas provides the function isnull() to check whether any
value is missing in the DataFrame. This function
checks every value and returns True where the value is
missing, otherwise it returns False.
The following code stores the data of marks of all
the Unit Tests in a DataFrame and checks whether the
DataFrame has missing values or not.
>>> import numpy as np
>>> import pandas as pd
>>> marksUT = {
'Name':['Raman','Raman','Raman','Raman','Zuhaire','Zuhaire','Zuhaire',
'Zuhaire','Ashravy','Ashravy','Ashravy','Ashravy','Mishti','Mishti',
'Mishti','Mishti'],
'UT':[1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4],
'Maths':[22,21,14,np.nan,20,23,22,19,23,24,12,15,15,18,17,14],
'Science':[21,20,19,np.nan,17,15,18,20,19,22,25,20,22,21,18,20],
'S.St':[18,17,15,19,22,21,19,17,20,24,19,20,25,25,20,19],
'Hindi':[20,22,24,18,24,25,23,21,15,17,21,20,22,24,25,20],
'Eng':[21,24,23,np.nan,19,15,13,16,22,21,23,17,22,23,20,18] }
>>> df = pd.DataFrame(marksUT)
>>> print(df.isnull())
Output of the above code will be
Name UT Maths Science S.St Hindi Eng
0 False False False False False False False
1 False False False False False False False
2 False False False False False False False
3 False False True True False False True
4 False False False False False False False
5 False False False False False False False
6 False False False False False False False
7 False False False False False False False
8 False False False False False False False
9 False False False False False False False
10 False False False False False False False
11 False False False False False False False
12 False False False False False False False
13 False False False False False False False
14 False False False False False False False
15 False False False False False False False

One can check for each individual attribute also,
e.g., the following statement checks whether attribute
‘Science’ has a missing value or not. It returns True for
each row where there is a missing value for attribute
‘Science’, and False otherwise.
>>> print(df['Science'].isnull())
0 False
1 False
2 False
3 True
4 False
5 False
6 False
7 False
8 False
9 False
10 False
11 False
12 False
13 False
14 False
15 False
Name: Science, dtype: bool
To check whether a column (attribute) has a missing
value in the entire dataset, any() function is used. It
returns True in case of missing value else returns False.
>>> print(df.isnull().any())
Name False
UT False
Maths True
Science True
S.St False
Hindi False
Eng True
dtype: bool

The function any() can be used for a particular
attribute also. The following statements return True
in case an attribute has a missing value, else they return
False.
>>> print(df['Science'].isnull().any())
True
>>> print(df['Hindi'].isnull().any())
False

To find the number of NaN values corresponding to


each attribute, one can use the sum() function along
with isnull() function, as shown below:
>>> print(df.isnull().sum())
Name 0
UT 0
Maths 1
Science 1
S.St 0
Hindi 0
Eng 1
dtype: int64

To find the total number of NaN in the whole dataset,


one can use df.isnull().sum().sum().
>>> print(df.isnull().sum().sum())
3

Program 3-12 Write a program to find the percentage of
marks scored by Raman in Hindi.
>>> dfRaman = df[df['Name']=='Raman']
>>> print('Marks Scored by Raman \n\n',dfRaman)

Marks Scored by Raman


Name UT Maths Science S.St Hindi Eng
0 Raman 1 22.0 21.0 18 20 21.0
1 Raman 2 21.0 20.0 17 22 24.0
2 Raman 3 14.0 19.0 15 24 23.0
3 Raman 4 NaN NaN 19 18 NaN

>>> dfHindi = dfRaman['Hindi']


>>> print("Marks Scored by Raman in Hindi \n\n",dfHindi)

Marks Scored by Raman in Hindi


0 20
1 22
2 24
3 18
Name: Hindi, dtype: int64
>>> row = len(dfHindi) # Number of Unit Tests held. Here row will be 4
>>> print("Percentage of Marks Scored by Raman in Hindi\n\n",(dfHindi.sum()*100)/(25*row),"%")
# denominator in the above formula represents the aggregate
# of marks of all tests. Here row is 4 tests and 25 is the
# maximum marks for one test

Percentage of Marks Scored by Raman in Hindi


84.0 %

Program 3-13 Write a python program to find the


percentage of marks obtained by Raman
in Maths subject.

>>> dfMaths = dfRaman['Maths']


>>> print("Marks Scored by Raman in Maths \n\n",dfMaths)
Marks Scored by Raman in Maths
0 22.0
1 21.0
2 14.0
3 NaN
Name: Maths, dtype: float64
>>> row = len(dfMaths) # here, row will be 4, the number of Unit Tests
>>> print("Percentage of Marks Scored by Raman in Maths\n\n", dfMaths.sum()*100/(25*row),"%")

Percentage of Marks Scored by Raman in Maths

57.0 %
Here, notice that Raman was absent in Unit Test 4 in
Maths. Because sum() skips NaN values while len() still
counts all four tests, the marks of the fourth test have
effectively been considered as 0.
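The difference between counting all tests and counting only the tests actually taken can be seen with len() and count(). The following sketch uses Raman's Maths marks; the variable names are chosen here for illustration.

```python
import numpy as np
import pandas as pd

maths = pd.Series([22, 21, 14, np.nan], name='Maths')   # Raman's Maths marks

tests_held = len(maths)        # counts every entry, including NaN
tests_taken = maths.count()    # counts only the non-missing values
total = maths.sum()            # NaN is skipped by default

print(total * 100 / (25 * tests_held))    # absent test treated as 0
print(total * 100 / (25 * tests_taken))   # absent test left out
```

Dividing by the number of tests held treats the missed test as zero marks, while dividing by the number of tests taken leaves it out of the percentage.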
3.8.2 Dropping Missing Values
Missing values can be handled by either dropping the
entire row having a missing value or replacing it with
an appropriate value.
Dropping will remove the entire row (object) having
the missing value(s). This strategy reduces the size of
the dataset used in data analysis, hence it should be used
only when few objects have missing values. The dropna()
function can be used to drop an entire row from the
DataFrame. For example, calling the dropna() function on
the previous example will remove the 4th row, which has
NaN values.

>>> df1 = df.dropna()


>>> print(df1)
Name UT Maths Science S.St Hindi Eng
0 Raman 1 22.0 21.0 18 20 21.0
1 Raman 2 21.0 20.0 17 22 24.0
2 Raman 3 14.0 19.0 15 24 23.0
4 Zuhaire 1 20.0 17.0 22 24 19.0
5 Zuhaire 2 23.0 15.0 21 25 15.0
6 Zuhaire 3 22.0 18.0 19 23 13.0
7 Zuhaire 4 19.0 20.0 17 21 16.0
8 Ashravy 1 23.0 19.0 20 15 22.0
9 Ashravy 2 24.0 22.0 24 17 21.0
10 Ashravy 3 12.0 25.0 19 21 23.0
11 Ashravy 4 15.0 20.0 20 20 17.0
12 Mishti 1 15.0 22.0 25 22 22.0
13 Mishti 2 18.0 21.0 25 24 23.0
14 Mishti 3 17.0 18.0 20 25 20.0
15 Mishti 4 14.0 20.0 19 20 18.0

Now, let us consider the following code:
# marks obtained by Raman in all the unit tests
>>> dfRaman=df[df.Name=='Raman']
# inplace=True makes changes in the original DataFrame,
# i.e., dfRaman here
>>> dfRaman.dropna(inplace=True,how='any')
>>> dfMaths = dfRaman['Maths'] # get the marks scored in Maths
>>> print("\nMarks Scored by Raman in Maths \n",dfMaths)
Marks Scored by Raman in Maths
0 22.0
1 21.0
2 14.0
Name: Maths, dtype: float64

>>> row = len(dfMaths)
>>> print("\nPercentage of Marks Scored by Raman in Maths\n")
>>> print(dfMaths.sum()*100/(25*row),"%")

Percentage of Marks Scored by Raman in Maths

76.0 %
Note that the number of rows in dfRaman is 3 after
using dropna. Hence percentage is computed from
marks obtained in 3 Unit Tests.
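Dropping the whole row also discards Raman's valid S.St marks of Unit Test 4. If only one subject is being analysed, the subset parameter of dropna() restricts the check to that column. The sketch below uses a reduced version of Raman's data; the names all_dropped and sst_rows are chosen here for illustration.

```python
import numpy as np
import pandas as pd

# Raman's rows, restricted to three subjects
dfRaman = pd.DataFrame({'UT': [1, 2, 3, 4],
                        'Maths': [22, 21, 14, np.nan],
                        'S.St': [18, 17, 15, 19],
                        'Eng': [21, 24, 23, np.nan]})

all_dropped = dfRaman.dropna()                 # drops UT 4 entirely
sst_rows = dfRaman.dropna(subset=['S.St'])     # checks only the S.St column

print(len(all_dropped), len(sst_rows))
print(sst_rows['S.St'].sum() * 100 / (25 * len(sst_rows)))
```

Since inplace is not used, both calls return new DataFrames and dfRaman itself still has four rows.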
3.8.3 Estimating Missing Values
Missing values can be filled by using estimations or
approximations, e.g., a value just before (or after) the
missing value, or the average/minimum/maximum of the
values of that attribute. In some cases, missing
values are replaced by zeros (or ones).
The fillna(num) function can be used to replace
missing value(s) by the value specified in num. For
example, fillna(0) replaces missing value by 0. Similarly
fillna(1) replaces missing value by 1. Following code
replaces missing values by 0 and computes the
percentage of marks scored by Raman in Science.
# Marks scored by Raman in all the subjects across the tests
>>> dfRaman = df.loc[df['Name']=='Raman']
>>> (row,col) = dfRaman.shape
>>> dfScience = dfRaman.loc[:,'Science']
>>> print("Marks Scored by Raman in Science\n\n",dfScience)

Marks Scored by Raman in Science

0 21.0
1 20.0
2 19.0
3 NaN
Name: Science, dtype: float64

>>> dfFillZeroScience = dfScience.fillna(0)
>>> print('\nMarks Scored by Raman in Science with Missing Values Replaced with Zero\n',dfFillZeroScience)

Marks Scored by Raman in Science with Missing Values Replaced with Zero
0 21.0
1 20.0
2 19.0
3 0.0
Name: Science, dtype: float64

>>> print("Percentage of Marks Scored by Raman in Science\n\n",dfFillZeroScience.sum()*100/(25*row),"%")

Percentage of Marks Scored by Raman in Science

60.0 %
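Besides a constant such as 0, the average of the attribute (one of the estimations mentioned at the start of this section) can be passed to fillna(). The sketch below assumes the Science marks shown above are held in a standalone Series named science; that name is only illustrative.

```python
import numpy as np
import pandas as pd

# Science marks out of 25 for four unit tests, the last one missing.
science = pd.Series([21.0, 20.0, 19.0, np.nan], name="Science")

# mean() skips NaN by default, so this is the average of the 3 known marks.
average = science.mean()

# Replace the missing mark by that average.
filled = science.fillna(average)

percentage = filled.sum() * 100 / (25 * len(filled))
print(filled.tolist(), percentage)
```

Filling with the average leaves the mean of the column unchanged, whereas filling with 0 pulls the percentage down.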
df.fillna(method='pad') replaces the missing
value by the value before it, while
df.fillna(method='bfill') replaces the missing value by the
value after it. The following code replaces
the missing value in Unit Test 4 of English by the
marks of Unit Test 3 and then computes the percentage
of marks obtained by Raman.
>>> dfEng = dfRaman.loc[:,'Eng']
>>> print("Marks Scored by Raman in English\n\n",dfEng)

Marks Scored by Raman in English
0 21.0
1 24.0
2 23.0
3 NaN
Name: Eng, dtype: float64

>>> dfFillPadEng = dfEng.fillna(method='pad')
>>> print('\nMarks Scored by Raman in English with Missing Values Replaced by Previous Test Marks\n',dfFillPadEng)

Marks Scored by Raman in English with Missing Values Replaced by Previous Test Marks
0 21.0
1 24.0
2 23.0
3 23.0
Name: Eng, dtype: float64

>>> print("Percentage of Marks Scored by Raman in English\n\n")
>>> print(dfFillPadEng.sum()*100/(25*row),"%")

Percentage of Marks Scored by Raman in English

91.0 %
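In recent pandas releases (2.1 onwards, as far as we are aware) passing method= to fillna() produces a deprecation warning, and the dedicated methods ffill() and bfill() are the suggested replacements. The sketch below assumes the English marks above are held in a standalone Series named eng and shows both directions.

```python
import numpy as np
import pandas as pd

# English marks from the example above, with Unit Test 4 missing.
eng = pd.Series([21.0, 24.0, 23.0, np.nan], name="Eng")

# ffill() copies the last known value forward (same effect as method='pad').
forward = eng.ffill()

# bfill() copies the next known value backward (same effect as method='bfill').
# The last value has nothing after it, so it stays NaN.
backward = eng.bfill()

percentage = forward.sum() * 100 / (25 * len(forward))
print(forward.tolist(), backward.isna().tolist(), percentage)
```

A backward fill cannot estimate a value that is missing at the very end of the data, just as a forward fill cannot estimate one at the very beginning.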
In this section, we have discussed various ways
of handling missing values. A missing value is a loss of
information, and replacing missing values by some
estimation will change the dataset. In all such cases,
the results of data analysis will not be the actual results
but only an approximation of them; how good the
approximation is depends on how the missing values
were estimated.

3.9 Import and Export of Data between Pandas and MySQL

So far, we have directly entered data and created
a DataFrame and learned how to analyse data in a
DataFrame. However, in actual scenarios, data need
not be typed or copy-pasted every time. Rather, data is
available most of the time in a file (text or csv) or in
a database. Thus, in real-world scenarios, we will be
required to bring data directly from a database and load
it into a DataFrame. This is called importing data from a
database. Likewise, after analysis, we will be required to
store data back to a database. This is called exporting
data to a database.
A DataFrame can be read from and written to a
MySQL database. To do this, a connection to the MySQL
database is required, made through the pymysql database
driver. The driver should be installed in the
Python environment using the following command:
pip install pymysql

sqlalchemy is a library used to interact with the
MySQL database by providing the required credentials.
This library can be installed using the following
command:
pip install sqlalchemy
Once it is installed, sqlalchemy provides a function
create_engine() that enables this connection to be
established. The string inside the function is known as
the connection string. The connection string is composed of
multiple parameters: the name of the driver through
which we want to establish the connection, username,
password, host, port number and finally the name of
the database. This function returns an engine
object based on the connection string. The syntax is
given below:
engine=create_engine('driver://username:password@host:port/name_of_database')

where,
driver = mysql+pymysql
username = user name of MySQL (normally it is root)
password = password of MySQL
host = machine on which MySQL is running (usually localhost)
port = port number of MySQL (the default port number is 3306)
name_of_database = your database
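The parts of the connection string can be assembled programmatically. The sketch below uses a hypothetical helper, make_connection_string(), which is not part of sqlalchemy or pymysql; it only builds the text that would later be passed to create_engine(). It also URL-encodes the password, on the understanding that characters such as '@' inside a password would otherwise be mistaken for the separator before the host.

```python
from urllib.parse import quote_plus


def make_connection_string(user, password, host, port, database,
                           driver="mysql+pymysql"):
    """Assemble driver://user:password@host:port/database.

    Hypothetical helper for illustration; not part of sqlalchemy.
    """
    # Encode characters such as '@' or '/' so that they are not
    # confused with the separators of the connection string.
    safe_password = quote_plus(password)
    return f"{driver}://{user}:{safe_password}@{host}:{port}/{database}"


# The password "p@ss" is a made-up value used only for this example.
url = make_connection_string("root", "p@ss", "localhost", 3306, "CARSHOWROOM")
print(url)
```

Keeping the five parts as separate arguments makes it easier to change only the database name or the port without retyping the whole string.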
In the following subsections, importing and exporting
data between Pandas and MySQL applications are
demonstrated. For this, we will use the same database
CARSHOWROOM and Table INVENTORY created in
Chapter 1 of this book.
mysql> use CARSHOWROOM ;
Database changed
mysql> select * from INVENTORY;
+-------+--------+-----------+-----------+-----------------+----------+
| CarId | CarName| Price | Model | YearManufacture | Fueltype |
+-------+--------+-----------+-----------+-----------------+----------+
| D001 | Car1 | 582613.00 | LXI | 2017 | Petrol |
| D002 | Car1 | 673112.00 | VXI | 2018 | Petrol |
| B001 | Car2 | 567031.00 | Sigma1.2 | 2019 | Petrol |
| B002 | Car2 | 647858.00 | Delta1.2 | 2018 | Petrol |
| E001 | Car3 | 355205.00 | 5 STR STD | 2017 | CNG |
| E002 | Car3 | 654914.00 | CARE | 2018 | CNG |
| S001 | Car4 | 514000.00 | LXI | 2017 | Petrol |
| S002 | Car4 | 614000.00 | VXI | 2018 | Petrol |
+-------+--------+-----------+-----------+-----------------+----------+
8 rows in set (0.00 sec)
3.9.1 Importing Data from MySQL to Pandas
Importing data from MySQL to pandas basically refers
to the process of reading a table from MySQL database
and loading it to a pandas DataFrame. After establishing
the connection, in order to fetch data from the table of
the database we have the following three functions:
1) pandas.read_sql_query(query,sql_conn)
It is used to read an sql query (query) into a
DataFrame using the connection identifier (sql_
conn) returned from the create_engine ().
2) pandas.read_sql_table(table_name,sql_conn)
It is used to read an sql table (table_name) into a
DataFrame using the connection identifier (sql_
conn).
3) pandas.read_sql(sql, sql_conn)
It is used to read either an sql query or an sql
table (sql) into a DataFrame using the connection
identifier (sql_conn).

>>> import pandas as pd
>>> import pymysql as py
>>> from sqlalchemy import create_engine
>>> engine=create_engine('mysql+pymysql://root:smsmb@localhost:3306/CARSHOWROOM')
>>> df = pd.read_sql_query('SELECT * FROM INVENTORY', engine)
>>> print(df)
  CarId CarName      Price      Model  YearManufacture Fueltype
0  D001    Car1  582613.00        LXI             2017   Petrol
1  D002    Car1  673112.00        VXI             2018   Petrol
2  B001    Car2  567031.00   Sigma1.2             2019   Petrol
3  B002    Car2  647858.00   Delta1.2             2018   Petrol
4  E001    Car3  355205.00  5 STR STD             2017      CNG
5  E002    Car3  654914.00       CARE             2018      CNG
6  S001    Car4  514000.00        LXI             2017   Petrol
7  S002    Car4  614000.00        VXI             2018   Petrol
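read_sql_query() accepts any SELECT statement, so the rows and columns can be filtered in the database before they reach pandas. The sketch below is an illustration under one loud assumption: an in-memory SQLite database (Python's built-in sqlite3 module) stands in for the MySQL server so that the example runs without any installation, and only three rows of INVENTORY are recreated.

```python
import sqlite3

import pandas as pd

# Assumption: an in-memory SQLite database stands in for the MySQL server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE INVENTORY (CarId TEXT, CarName TEXT, "
             "Price REAL, Fueltype TEXT)")
conn.executemany(
    "INSERT INTO INVENTORY VALUES (?, ?, ?, ?)",
    [("D001", "Car1", 582613.00, "Petrol"),
     ("E001", "Car3", 355205.00, "CNG"),
     ("E002", "Car3", 654914.00, "CNG")],
)
conn.commit()

# Only the rows and columns named in the query reach the DataFrame.
cng = pd.read_sql_query(
    "SELECT CarId, Price FROM INVENTORY "
    "WHERE Fueltype = 'CNG' ORDER BY CarId", conn)
print(cng)
conn.close()
```

With MySQL, the same query string would be passed together with the engine object returned by create_engine() instead of the SQLite connection.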
3.9.2 Exporting Data from Pandas to MySQL
Exporting data from Pandas to MySQL basically refers
to the process of writing a pandas DataFrame to a table
of a MySQL database. For this purpose, we have the
following function:
pandas.DataFrame.to_sql(table,sql_conn,if_exists="fail",index=False/True)
• table specifies the name of the table in which we
want to create or append DataFrame values. The
specified DataFrame is written to this table using
the connection identifier (sql_conn) returned from
create_engine().
• The parameter if_exists specifies the way data from
the DataFrame should be entered in the table. It
can have the following three values: “fail”, “replace”,
“append”.
o “fail” is the default value; a ValueError is raised
if the table already exists in the database.
o “replace” specifies that the existing table should
be dropped and created again with the contents
of the DataFrame.
o “append” specifies that the contents of the
DataFrame should be appended to the existing
table; the column names of the DataFrame must
match those of the table.
• index: by default index is True, which means the
DataFrame index will be copied to the MySQL table
as a column. If False, the DataFrame index is not
written.
#Code to write DataFrame df to database
>>> import pandas as pd
>>> import pymysql as py
>>> from sqlalchemy import create_engine
>>> engine=create_engine('mysql+pymysql://root:smsmb@localhost:3306/CARSHOWROOM')
>>> data={
'ShowRoomId':[1,2,3,4,5],
'Location':['Delhi','Bangalore','Mumbai','Chandigarh','Kerala']}
>>> df=pd.DataFrame(data)
>>> df.to_sql('showroom_info',engine,if_exists="replace",index=False)
After running this python script, a mysql table
with the name “showroom_info” will be created in the
database.
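The three values of if_exists can be compared by writing the same DataFrame several times. The sketch below again assumes an in-memory SQLite database in place of MySQL, so that it runs without a server; with MySQL the engine object would be passed instead of the SQLite connection.

```python
import sqlite3

import pandas as pd

# Assumption: in-memory SQLite replaces MySQL so no server is needed.
conn = sqlite3.connect(":memory:")
df = pd.DataFrame({"ShowRoomId": [1, 2], "Location": ["Delhi", "Mumbai"]})

# The first write creates the table.
df.to_sql("showroom_info", conn, index=False)

# The default if_exists='fail' refuses to touch an existing table.
try:
    df.to_sql("showroom_info", conn, index=False)
    failed = False
except ValueError:
    failed = True

# 'append' adds the rows below the existing ones.
df.to_sql("showroom_info", conn, if_exists="append", index=False)
after_append = pd.read_sql_query("SELECT * FROM showroom_info", conn)

# 'replace' discards the old table and writes the DataFrame afresh.
df.to_sql("showroom_info", conn, if_exists="replace", index=False)
after_replace = pd.read_sql_query("SELECT * FROM showroom_info", conn)

print(failed, len(after_append), len(after_replace))
conn.close()
```

Running a script twice with "append" therefore duplicates the rows, which is why "replace" is used in the example above.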

Summary
• Descriptive Statistics are used to quantitatively
summarise the given data.
• Pandas provides many statistical functions for
analysis of data. Some of the functions are max(),
min(), mean(), median(), mode(), std(), var(), etc.
• Sorting is used to arrange data in a specified
order, i.e. either ascending or descending.
• Indexes or labels of a row or column can be
changed in a DataFrame. This process is known
as Altering the index. Two functions reset_index
and set_index are used for that purpose.
• Missing values are a hindrance in data analysis
and must be handled properly.
• There are two main strategies for
handling missing data. Either the row (or column)
having a missing value is removed completely from
the analysis, or the missing value is replaced by some
appropriate value (which may be zero, one, an
average, etc.).
• The process of changing the structure of the DataFrame
is known as Reshaping. Pandas provides two basic
functions for this, pivot() and pivot_table().
• pymysql and sqlalchemy are two mandatory
libraries for facilitating import and export of data
between Pandas and MySQL. Before import and
export, a connection needs to be established from
python script to MySQL database.
• Importing data from MySQL to Pandas refers to
the process of fetching data from a MySQL table
or database to a pandas DataFrame.
• Exporting data from Pandas to MySQL refers to the
process of storing data from a pandas DataFrame
to a MySQL table or database.

Exercise
1. Write the statement to install the python connector to
connect MySQL i.e. pymysql.
2. Explain the difference between the pivot() and
pivot_table() functions.
3. What is sqlalchemy?
4. Can you sort a DataFrame with respect to multiple
columns?
5. What are missing values? What are the strategies to
handle them?
6. Define the following terms: Median, Standard
Deviation and variance.
7. What do you understand by the term MODE? Name
the function which is used to calculate it.
8. Write the purpose of Data aggregation.
9. Explain the concept of GROUP BY with the help of an
example.
10. Write the steps required to read data from a MySQL
database to a DataFrame.
11. Explain the importance of reshaping of data with an
example.

12. Why is estimation an important concept in
data analysis?
13. Assuming the given table: Product. Write the python
code for the following:
Item  Company   Rupees  USD
TV    LG        12000   700
TV    VIDEOCON  10000   650
TV    LG        15000   800
AC    SONY      14000   750

a) To create the data frame for the above table.


b) To add the new rows in the data frame.
c) To display the maximum price of LG TV.
d) To display the Sum of all products.
e) To display the median of the USD of Sony
products.
f) To sort the data according to the Rupees and
transfer the data to MySQL.
g) To transfer the new dataframe into the MySQL
with new values.
14. Write the python statement for the following question
on the basis of given dataset:

a) To create the above DataFrame.


b) To print the Degree and maximum marks in each
stream.
c) To fill the NaN with 76.
d) To set the index to Name.
e) To display the name and degree wise average
marks of each student.
f) To count the number of students in MBA.
g) To print the mode marks BCA.

Solved Case Study based on Open Datasets


The UCI data repository is a collection of open datasets,
available to the public for experimentation and research
purposes. ‘auto-mpg’ is one such open dataset.
It contains data related to fuel consumption by
automobiles in a city. Consumption is measured in
miles per gallon (mpg), hence the name of the dataset
is auto-mpg. The data has 398 rows (also known as
items or instances or objects) and nine columns
(also known as attributes).

The attributes are: mpg, cylinders, displacement,
horsepower, weight, acceleration, model year, origin and
car name. Three attributes, cylinders, model year
and origin, have categorical values; car name is a
string with a unique value for every row, while the
remaining five attributes have numeric values.
The data has been downloaded from the UCI data
repository available at http://archive.ics.uci.edu/
ml/machine-learning-databases/auto-mpg/.
Following are the exercises to analyse the data.
1) Load auto-mpg.data into a DataFrame autodf.
2) Give description of the generated DataFrame
autodf.
3) Display the first 10 rows of the DataFrame
autodf.
4) Find the attributes which have missing values.
Handle the missing values using following two
ways:
i. Replace the missing values by a value before
that.
ii. Remove the rows having missing values from
the original dataset
5) Print the details of the car which gave the
maximum mileage.
6) Find the average displacement of the car given
the number of cylinders.
7) What is the average number of cylinders in a car?
8) Determine the no. of cars with weight greater
than the average weight.

Chapter 4
Plotting Data using Matplotlib

“Human visual perception is the
most powerful of data interfaces
between computers and humans.”
— M. McIntyre

In this chapter
»» Introduction
»» Plotting using Matplotlib
»» Customisation of Plots
»» The Pandas Plot Function (Pandas Visualisation)

4.1 Introduction
We have learned how to organise and analyse
data and perform various statistical operations
on Pandas DataFrames. Likewise, in Class XI, we
have learned how to analyse numerical data using
NumPy. The results obtained after analysis are used
to make inferences or draw conclusions about data
as well as to make important business decisions.
Sometimes, it is not easy to infer by merely looking
at the results. In such cases, visualisation helps
in better understanding of results of the analysis.
Data visualisation means graphical or pictorial
representation of the data using graphs, charts,
etc. The purpose of plotting data is to visualise
variation or show relationships between variables.
Visualisation also helps to effectively communicate
information to intended users. Traffic symbols,
ultrasound reports, an atlas of maps, the speedometer
of a vehicle and tuners of instruments are a few examples
of visualisation that we come across in our daily lives.
Visualisation of data is effectively used in fields like
health, finance, science, mathematics, engineering, etc.
In this chapter, we will learn how to visualise data using
the Matplotlib library of Python by plotting charts such
as line, bar and scatter with respect to the various types
of data.

4.2 Plotting using Matplotlib


The Matplotlib library is used for creating static, animated
and interactive 2D plots or figures in Python. It can
be installed using the following pip command from the
command prompt:
pip install matplotlib
For plotting using Matplotlib, we need to import its
Pyplot module using the following command:
import matplotlib.pyplot as plt
Here, plt is an alias or an alternative name for
matplotlib.pyplot. We can use any other alias also.

Figure 4.1: Components of a plot


The pyplot module of matplotlib contains a collection
of functions that can be used to work on a plot. The
plot() function of the pyplot module is used to create a
figure. A figure is the overall window where the outputs
of pyplot functions are plotted. A figure contains a
plotting area, legend, axis labels, ticks, title, etc. (Figure
4.1). Each function makes some change to a figure, for
example, creates a figure, creates a plotting area in a
figure, plots some lines in a plotting area, decorates the
plot with labels, etc.
It is always expected that the data presented through
charts is easily understood. Hence, while presenting data
we should always give a chart title, label the axes of the
chart and provide a legend in case we have more than one
plotted data.
To plot x versus y, we can write plt.plot(x,y). The
show() function is used to display the figure created
using the plot() function.
Let us consider that in a city, the maximum temperature
of a day is recorded for three consecutive days. Program
4-1 demonstrates how to plot temperature values for
the given dates. The output generated is a line chart.
Program 4-1 Plotting temperature against date

import matplotlib.pyplot as plt


#list storing date in string format
date=["25/12","26/12","27/12"]
#list storing temperature values
temp=[8.5,10.5,6.8]
#create a figure plotting temp versus date
plt.plot(date, temp)
#show the figure
plt.show()

Figure 4.2: Line chart as output of Program 4-1

In Program 4-1, plot() is provided with two parameters,
which indicate values for the x-axis and y-axis, respectively.
The x and y ticks are displayed accordingly. As shown
in Figure 4.2, the plot() function by default plots a line
chart. We can click on the save button on the output
window and save the plot as an image. A figure can also
be saved by using the savefig() function. The name of the
file is passed to the function as a parameter.
For example: plt.savefig('x.png').
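Saving a figure from a script can be sketched as follows. The example makes two assumptions that are not part of Program 4-1: it selects the non-interactive "Agg" backend so that no window is needed, and it writes to a temporary folder under the illustrative file name temperature.png.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: draw off-screen, no window needed
import matplotlib.pyplot as plt

date = ["25/12", "26/12", "27/12"]
temp = [8.5, 10.5, 6.8]
plt.plot(date, temp)

# Save into a temporary folder; the file name is only an example.
path = os.path.join(tempfile.mkdtemp(), "temperature.png")
plt.savefig(path)
plt.close()

print(os.path.exists(path))
```

When a window is used, it is generally safer to call savefig() before show(), because the figure may no longer be available once the window has been closed.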
In the previous example, we used plot() function
to plot a line graph. There are different types of data
available for analysis. The plotting methods allow for a
handful of plot types other than the default line plot, as
listed in Table 4.1. Choice of plot is determined by the
type of data we have.
Table 4.1 List of Pyplot functions to plot different charts
plot(*args[, scalex, scaley, data])              Plot x versus y as lines and/or markers.
bar(x, height[, width, bottom, align, data])     Make a bar plot.
boxplot(x[, notch, sym, vert, whis, ...])        Make a box and whisker plot.
hist(x[, bins, range, density, weights, ...])    Plot a histogram.
pie(x[, explode, labels, colors, autopct, ...])  Plot a pie chart.
scatter(x, y[, s, c, marker, cmap, norm, ...])   A scatter plot of x versus y.

4.3 Customisation of Plots


Pyplot library gives us numerous functions, which can
be used to customise charts such as adding titles or
legends. Some of the customisation options are listed in
Table 4.2:
Table 4.2 List of Pyplot functions to customise plots
grid([b, which, axis])                Configure the grid lines.
legend(*args, **kwargs)               Place a legend on the axes.
savefig(*args, **kwargs)              Save the current figure.
show(*args, **kw)                     Display all figures.
title(label[, fontdict, loc, pad])    Set a title for the axes.
xlabel(xlabel[, fontdict, labelpad])  Set the label for the x-axis.
xticks([ticks, labels])               Get or set the current tick locations and labels of the x-axis.
ylabel(ylabel[, fontdict, labelpad])  Set the label for the y-axis.
yticks([ticks, labels])               Get or set the current tick locations and labels of the y-axis.

Program 4-2 Plotting a line chart of date versus temperature
by adding labels on the x and y axes, and adding a
title and grid to the chart.

import matplotlib.pyplot as plt


date=["25/12","26/12","27/12"]
temp=[8.5,10.5,6.8]
plt.plot(date, temp)
plt.xlabel("Date") #add the Label on x-axis
plt.ylabel("Temperature") #add the Label on y-axis
plt.title("Date wise Temperature") #add the title to the chart
plt.grid(True) #add gridlines to the background
plt.yticks(temp)
plt.show()

Figure 4.3: Line chart as output of Program 4-2

Think and Reflect
On providing a single list or array to the plot() function,
can matplotlib generate values for both the x and y axis?

In the above example, we have used the xlabel, ylabel,
title, grid and yticks functions. We can see that, compared
to Figure 4.2, Figure 4.3 conveys more meaning.
We will learn about customisation of other plots
in later sections.
4.3.1 Marker
We can make certain other changes to plots by passing
various parameters to the plot() function. In Figure
4.3, we plotted temperatures day-wise. It is also possible
to mark each point on the line with a marker.
A marker is any symbol that represents a data value
in a line chart or a scatter plot. Table 4.3 shows a list
of markers along with their corresponding symbol and
description. These markers can be used in program code:
Table 4.3 Some of the Matplotlib Markers
Marker Symbol Description Marker Symbol Description
“.” Point “8” octagon

“,” Pixel “s” square

“o” Circle “p” pentagon

“v” triangle_down “P” plus (filled)

“^” triangle_up “*” star

“<” triangle_left “h” hexagon1

“>” triangle_right “H” hexagon2

“1” tri_down “+” plus

“2” tri_up “x” x

“3” tri_left “X” x (filled)

“4” tri_right “D” diamond

4.3.2 Colour
It is also possible to format the plot further by changing
the colour of the plotted data. Table 4.4 shows the list of
colours that are supported. We can either use character
codes or the color names as values to the parameter
color in the plot().
Table 4.4 Colour abbreviations for plotting
Character Colour
‘b’ blue
‘g’ green
‘r’ red
‘c’ cyan
‘m’ magenta
‘y’ yellow
‘k’ black
‘w’ white
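The marker symbols of Table 4.3 and the colour codes of Table 4.4 can be given either as keyword arguments or combined in a short format string. The sketch below uses the temperature data of Program 4-1 and assumes the off-screen "Agg" backend; the second line, shifted up by one degree, is made up purely so that the two lines do not overlap.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: off-screen backend, no window needed
import matplotlib.pyplot as plt

date = ["25/12", "26/12", "27/12"]
temp = [8.5, 10.5, 6.8]

# Keyword form: colour by name, marker by symbol.
line_kw, = plt.plot(date, temp, color="red", marker="o")

# Short form: 'g' (green), '*' (star marker), '-' (solid line).
line_fmt, = plt.plot(date, [t + 1 for t in temp], "g*-")

print(line_kw.get_color(), line_kw.get_marker())
print(line_fmt.get_color(), line_fmt.get_marker(), line_fmt.get_linestyle())
plt.close()
```

If the format string contains a marker but no line symbol (for example "g*"), only the markers are drawn and the points are not joined.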

4.3.3 Linewidth and Line Style
The linewidth and linestyle properties can be used
to change the width and the style of the line chart.
Linewidth is specified in points. A larger number
gives a thicker line (the default in recent Matplotlib
versions is 1.5 points).
We can also set the line style of a line chart using
the linestyle parameter. It can take a string such as
"solid", "dotted", "dashed" or "dashdot". Let us write
Program 4-3 applying some of the customisations.

Program 4-3 Consider the average heights and weights of
persons aged 8 to 16 stored in the following
two lists:

height = [121.9,124.5,129.5,134.6,139.7,147.3,152.4,157.5,162.6]
weight = [19.7,21.3,23.5,25.9,28.5,32.1,35.7,39.6,43.2]
Let us plot a line chart where:
i. x axis will represent weight
ii. y axis will represent height
iii. x axis label should be “Weight in kg”
iv. y axis label should be “Height in cm”
v. colour of the line should be green
vi. use * as marker
vii. marker size should be 10
viii. the title of the chart should be “Average
weight with respect to average height”
ix. line style should be dashed
x. linewidth should be 2
import matplotlib.pyplot as plt
import pandas as pd
height=[121.9,124.5,129.5,134.6,139.7,147.3,152.4,157.5,162.6]
weight=[19.7,21.3,23.5,25.9,28.5,32.1,35.7,39.6,43.2]
df=pd.DataFrame({"height":height,"weight":weight})
#Set xlabel for the plot
plt.xlabel('Weight in kg')
#Set ylabel for the plot
plt.ylabel('Height in cm')
#Set chart title:
plt.title('Average weight with respect to average height')
#plot using marker '*', dashed line style and line colour as green
plt.plot(df.weight,df.height,marker='*',markersize=10,color='green',linewidth=2,linestyle='dashed')
plt.show()
In the above we created the DataFrame using 2 lists,
and in the plot function we have passed the height and
weight columns of the DataFrame. The output is shown
in Figure 4.4.
Continuous data are measured while discrete data are
obtained by counting. Height and weight are examples of
continuous data; they can be in decimals. The total number
of students in a class is discrete; it can never be in
decimals.

Figure 4.4: Line chart showing average weight against average height

4.4 The Pandas Plot Function (Pandas Visualisation)
In Programs 4-1 and 4-2, we learnt that the plot()
function of the pyplot module of matplotlib can be used
to plot a chart. However, starting from version 0.17.0,
Pandas objects Series and DataFrame come equipped
with their own .plot() methods. This plot() method is just
a simple wrapper around the plot() function of pyplot.
Thus, if we have a Series or DataFrame type object (let's
say 's' or 'df') we can call the plot method by writing:
s.plot() or df.plot()

The plot() method of Pandas accepts a considerable
number of arguments that can be used to plot a variety
of graphs. It allows customising different plot types by
supplying the kind keyword argument. The general
syntax is: df.plot(kind='...'), where kind accepts a string
indicating the type of plot, as listed in Table 4.5. In
addition, we can use the matplotlib.pyplot methods
and functions along with the plot() method of
Pandas objects.
Table 4.5 Arguments accepted by kind for different plots

kind = Plot type


line Line plot (default)

bar Vertical bar plot

barh Horizontal bar plot

hist Histogram

box Boxplot
area Area plot
pie Pie plot
scatter Scatter plot

In the previous chapters, we have learned to store
different types of data in a two dimensional format using
DataFrame. In the subsequent sections we will learn to
use plot() function to create various types of charts with
respect to the type of data stored in DataFrames.
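How the kind argument selects the chart type can be tried on a small DataFrame. The sketch below uses a few sample sales figures and the off-screen "Agg" backend (both assumptions made only for this illustration); it draws the same data with four values of kind from Table 4.5 and records the title of each chart.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: off-screen backend, no window needed
import matplotlib.pyplot as plt
import pandas as pd

# A few sample sales figures, used only to have something to draw.
df = pd.DataFrame({"Week 1": [5000, 5900, 6500],
                   "Week 2": [4000, 3000, 5000]})

drawn = []
for kind in ["line", "bar", "barh", "area"]:
    # plot() returns the matplotlib Axes on which the chart was drawn.
    ax = df.plot(kind=kind, title=kind)
    drawn.append(ax.get_title())
    plt.close(ax.figure)

print(drawn)
```

Since the returned object is an ordinary matplotlib Axes, pyplot functions such as plt.xlabel() and plt.title() continue to work on charts created through Pandas.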
4.4.1 Plotting a Line chart
A line plot is a graph that shows the frequency of data
along a number line. It is used to show continuous
data. A line plot is used to visualise growth or decline
in data over a time interval. We have already plotted line
charts through Programs 4-1 and 4-2. In this section,
we will learn to plot a line chart for data stored in a
DataFrame.

Activity 4.1
Create the MelaSales.csv using Python Pandas containing
data as shown in Table 4.6.

Program 4-4 Smile NGO has participated in a three week
cultural mela. Using Pandas, they have stored
the sales (in Rs) made day wise for every week
in a CSV file named “MelaSales.csv”, as shown
in Table 4.6.

Table 4.6 Day-wise mela sales data
Week 1  Week 2  Week 3
5000    4000    4000
5900    3000    5800
6500    5000    3500
3500    5500    2500
4000    3000    3000
5300    4300    5300
7900    5900    6000

Depict the sales for the three weeks using a line chart. It
should have the following:
i. Chart title as “Mela Sales Report”.
ii. x-axis label as “Days”.
iii. y-axis label as “Sales in Rs”.
iv. Line colours red for week 1, blue for week 2 and brown
for week 3.

import pandas as pd
import matplotlib.pyplot as plt
# reads "MelaSales.csv" to df by giving path to the file
df=pd.read_csv("MelaSales.csv")
#create a line plot of different color for each week
df.plot(kind='line', color=['red','blue','brown'])
# Set title to "Mela Sales Report"
plt.title('Mela Sales Report')
# Label x axis as "Days"
plt.xlabel('Days')
# Label y axis as "Sales in Rs"
plt.ylabel('Sales in Rs')
#Display the figure
plt.show()

Figure 4.5 displays a line plot as output for
Program 4-4. Note that the legend is displayed by default,
associating the colours with the plotted data.

Figure 4.5: Line plot showing mela sales figures

The line plot takes a numeric value to display on
the x axis and hence uses the index (row labels) of the
DataFrame in the above example. Thus, x tick values
are the index of the DataFrame df that contains data
stored in MelaSales.csv.
Customising Line Plot
We can substitute the ticks at x axis with a list of values
of our choice by using plt.xticks(ticks,label) where
ticks is a list of locations(locs) on x axis at which ticks
should be placed, label is a list of items to place at the
given ticks.
Program 4-5 Assuming the same CSV file, i.e., MelaSales.csv,
plot the line chart with the following
customisations:

Marker = "*"
Marker size = 10
linestyle = "--"
Linewidth = 3
import pandas as pd
import matplotlib.pyplot as plt
# MelaSales.csv is assumed to also contain the Day column of Table 4.7
df=pd.read_csv("MelaSales.csv")
#creates plot of different color for each week
df.plot(kind='line', color=['red','blue','brown'],marker="*",markersize=10,linewidth=3,linestyle="--")
plt.title('Mela Sales Report')
plt.xlabel('Days')
plt.ylabel('Sales in Rs')
#store converted index of DataFrame to a list
ticks = df.index.tolist()
#displays corresponding day on x axis
plt.xticks(ticks,df.Day)
plt.show()
Figure 4.6 is generated as output of Program 4-5
with xticks as Day names.

Figure 4.6: Mela sales figures with day names


4.4.2 Plotting Bar Chart
The line plot in Figure 4.6 shows that the sales for all
the weeks increased during the weekend. Other than
weekends, it also shows that the sales increased on
Wednesday for Week 1, on Thursday for Week 2 and on
Tuesday for Week 3.
But the lines are unable to efficiently depict
comparison between the weeks for which the sales data
is plotted. In order to show comparisons, we prefer bar
charts. Unlike line plots, bar charts can plot strings on
the x axis. To plot a bar chart, we will specify kind='bar'.
We can also specify the DataFrame columns to be used
as x and y axes.

Let us now add a column “Day” consisting of day
names to “MelaSales.csv” as shown in Table 4.7.

Table 4.7 Day-wise sales data along with day names
Week 1  Week 2  Week 3  Day
5000    4000    4000    Monday
5900    3000    5800    Tuesday
6500    5000    3500    Wednesday
3500    5500    2500    Thursday
4000    3000    3000    Friday
5300    4300    5300    Saturday
7900    5900    6000    Sunday

If we do not specify the column name for the x parameter
in plot(), the bar plot will plot all the columns of the
DataFrame with the index (row label) of the DataFrame on
the x axis, which is numeric, starting from 0.

Program 4-6 This program displays a bar plot for the
“MelaSales.csv” file with column Day on the
x axis, as shown in Figure 4.7.
import pandas as pd
import matplotlib.pyplot as plt
df= pd.read_csv('MelaSales.csv')
# plots a bar chart with the column "Day" as x axis and sets the title
df.plot(kind='bar',x='Day',title='Mela Sales Report')
#set ylabel
plt.ylabel('Sales in Rs')
plt.show()

Figure 4.7: A bar chart as output of Program 4-6

Customising Bar Chart
We can also customise the bar chart by adding certain
parameters to the plot function. We can control the
edgecolor of the bar, linestyle and linewidth. We can
also control the color of the bars. The following example
shows various customisations on the bar chart of
Figure 4.7.
Program 4-7 Let us write a Python script to display Bar plot
for the “MelaSales.csv” file with column Day on
x axis, and having the following customisation:
● Changing the color of each bar to red,
yellow and purple.
● Edgecolor to green
● Linewidth as 2
● Line style as "--"
import pandas as pd
import matplotlib.pyplot as plt
df= pd.read_csv('MelaSales.csv')
# plots a bar chart with the column "Day" as x axis
df.plot(kind='bar',x='Day',title='Mela Sales Report',
        color=['red','yellow','purple'],edgecolor='Green',
        linewidth=2,linestyle='--')
#set ylabel
plt.ylabel('Sales in Rs')
plt.show()

Figure 4.8: A bar chart as output of Program 4-7


4.4.3 Plotting Histogram


Histograms are column-charts, where each column
represents a range of values, and the height of a column
corresponds to how many values are in that range.
To make a histogram, the data is sorted into
"bins" and the number of data points in each bin is
counted. The height of each column in the histogram
is then proportional to the number of data points its
bin contains.

Bins are the number of intervals you want to divide
all of your data into, such that it can be displayed as
bars on a histogram.

The df.plot(kind='hist') function automatically selects
the size of the bins based on the spread of values in
the data.
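To see what sorting data into bins means in practice, the following plain-Python sketch counts a small list of heights into equal-width bins. The list of heights, the function name count_into_bins and the choice of five bins are assumptions made only for illustration; Pandas and Matplotlib do this work internally when a histogram is plotted.

```python
def count_into_bins(values, nbins):
    """Count how many values fall into each of nbins equal-width bins.

    Assumes the values are not all equal, so the bin width is non-zero.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / nbins          # width of every bin
    counts = [0] * nbins
    for v in values:
        # position of the bin that v falls into; the largest value
        # is placed in the last bin instead of opening a new one
        idx = min(int((v - lo) / width), nbins - 1)
        counts[idx] += 1
    return counts

heights = [60, 61, 63, 65, 61, 60]    # illustrative data
counts = count_into_bins(heights, 5)
print(counts)   # one count per bin, from the lowest range to the highest
```

Each number printed is the height of one column of the histogram. Changing the second argument changes how finely the same data is divided.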
Program 4-8
import pandas as pd
import matplotlib.pyplot as plt
data = {'Name':['Arnav', 'Sheela', 'Azhar', 'Bincy', 'Yash',
'Nazar'],
'Height' : [60,61,63,65,61,60],
'Weight' : [47,89,52,58,50,47]}
df=pd.DataFrame(data)
df.plot(kind='hist')
plt.show()

Think and Reflect
How can we make the bar chart of Figure 4.8 horizontal?

Program 4-8 displays the histogram corresponding
to all attributes having numeric values, i.e., ‘Height’
and ‘Weight’ attributes as shown in Figure 4.9. On the
basis of the height and weight values provided in the
DataFrame, plot() calculated the bin values.

Figure 4.9: A histogram as output of Program 4-8


It is also possible to set a value for the bins parameter,
for example,
df.plot(kind='hist',bins=20)
df.plot(kind='hist',bins=[18,19,20,21,22])
df.plot(kind='hist',bins=range(18,25))
Customising Histogram
Taking the same data as above, let us now see how the
histogram can be customised. Let us change the
edgecolor, which is the border of each bar, to green.
Also, let us change the line style to ":" and line width
to 2. Let us try another property called fill, which takes
boolean values. The default True means each bar will
be filled with color and False means each bar will be
empty. Another property called hatch can be used to fill
each bar with a pattern ('-', '+', 'x', '\\', '*', 'o', 'O', '.'). In
Program 4-9, we have used the hatch value as "o".
Program 4-9
import pandas as pd
import matplotlib.pyplot as plt
data = {'Name':['Arnav', 'Sheela', 'Azhar','Bincy','Yash',
'Nazar'],
'Height' : [60,61,63,65,61,60],
'Weight' : [47,89,52,58,50,47]}
df=pd.DataFrame(data)
df.plot(kind='hist',edgecolor='Green',linewidth=2,linestyle=':',
        fill=False,hatch='o')
plt.show()

Figure 4.10: Customised histogram as output of Program 4-9


Using Open Data


There are many websites that provide data freely for
anyone to download and do analysis, primarily for
educational purposes. These are called Open Data as
the data source is open to the public. Availability of
data for access and use promotes further analysis and
innovation. A lot of emphasis is being given to open data
to ensure transparency, accessibility and innovation.
“Open Government Data (OGD) Platform India”
(data.gov.in) is a platform for supporting the Open Data
initiative of the Government of India. Large datasets
on different projects and parameters are available on
the platform.
Let us consider a dataset called “Seasonal and Annual
Min/Max Temp Series - India from 1901 to 2017” from
the URL
https://data.gov.in/resources/seasonal-and-annual-minmax-temp-series-india-1901-2017.
Our aim is to plot the minimum and maximum
temperature and observe the number of times (frequency)
a particular temperature has occurred. We only need to
extract the 'ANNUAL - MIN' and 'ANNUAL - MAX' columns
from the file. Also, let us aim to display two Histogram plots:
i) Only for 'ANNUAL - MIN'
ii) For both 'ANNUAL - MIN' and 'ANNUAL - MAX'

Program 4-10

import pandas as pd
import matplotlib.pyplot as plt
#read the CSV file with specified columns
#usecols parameter to extract only two required columns
data=pd.read_csv("Min_Max_Seasonal_IMD_2017.csv",
usecols=['ANNUAL - MIN','ANNUAL - MAX'])
df=pd.DataFrame(data)
#plot histogram for 'ANNUAL - MIN'
df.plot(kind='hist',y='ANNUAL - MIN',
        title='Annual Minimum Temperature (1901-2017)')
plt.xlabel('Temperature')
plt.ylabel('Number of times')
#plot histogram for both 'ANNUAL - MIN' and 'ANNUAL - MAX'
df.plot(kind='hist',
        title='Annual Min and Max Temperature (1901-2017)',
        color=['blue','red'])
plt.xlabel('Temperature')
plt.ylabel('Number of times')
plt.show()

Figures 4.11 and 4.12 are produced as output
of Program 4-10.

Figure 4.11: Histogram for 'ANNUAL – MIN' and 'ANNUAL – MAX'

Figure 4.12: Histogram for 'ANNUAL – MIN'


Program 4-11 Plot a frequency polygon for the ‘ANNUAL - MIN’
column of the “Min/Max Temp” data
over the histogram depicting it.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
data=pd.read_csv("Min_Max_Seasonal_IMD_2017.csv",
usecols=['ANNUAL - MIN'])
df=pd.DataFrame(data)
#convert the 'ANNUAL - MIN' column into a numpy 1D array
minarray=np.array(df['ANNUAL - MIN'])
# Extract y (frequency) and edges (bins)
y,edges = np.histogram(minarray)
#calculate the midpoint for each bar on the histogram
mid = 0.5*(edges[1:]+ edges[:-1])
df.plot(kind='hist',y='ANNUAL - MIN')
plt.plot(mid,y,'-^')
plt.title('Annual Min Temperature plot(1901 - 2017)')
plt.xlabel('Temperature')
plt.show()


Figure 4.13: Output of Program 4-11
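The midpoint step of Program 4-11 can be checked by hand. np.histogram() returns one more edge than there are bars, and each midpoint is the average of two neighbouring edges. The plain-Python sketch below repeats that calculation on made-up edge values (an assumption for illustration, not values taken from the temperature file).

```python
# bin edges as np.histogram would return them
# (values assumed for illustration)
edges = [18.0, 19.0, 20.0, 21.0]

# pair each edge with the next one and take their average;
# this mirrors 0.5*(edges[1:] + edges[:-1]) from Program 4-11
mid = [0.5 * (left + right) for left, right in zip(edges[:-1], edges[1:])]
print(mid)   # one midpoint per bar, so one fewer than the edges
```

Plotting the frequencies against these midpoints places each point of the frequency polygon at the centre of its bar.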


4.4.4 Plotting Scatter Chart


A scatter chart is a two-dimensional data visualisation
method that uses dots to represent the values obtained
for two different variables —one plotted along the x-axis
and the other plotted along the y-axis.
Scatter plots are used when you want to show the
relationship between two variables. Scatter plots are
sometimes called correlation plots because they show
how two variables are correlated. Additionally, the size,
shape or color of the dot could represent a third (or even
a fourth) variable.
Program 4-12 Prayatna sells designer bags and wallets.
During the sales season, he gave discounts
ranging from 10% to 50% over a period of 5
weeks. He recorded his sales for each type
of discount in an array. Draw a scatter plot
to show a relationship between the discount
offered and sales made.
import numpy as np
import matplotlib.pyplot as plt
discount= np.array([10,20,30,40,50])
saleInRs=np.array([40000,45000,48000,50000,100000])
plt.scatter(x=discount,y=saleInRs)
plt.title('Sales Vs Discount')
plt.xlabel('Discount offered')
plt.ylabel('Sales in Rs')
plt.show()

Activity 4.2
What value does each
bubble on the plot at
Figure 4.14 represent?

Figure 4.14: Output of Program 4-12
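A scatter plot shows a relationship visually; the strength of that relationship can also be expressed as a number. The chapter does not define such a measure, so the sketch below brings in the Pearson correlation coefficient as an assumption. It is one common choice, and the function name pearson_r is hypothetical. The data are the discount and sales values of Program 4-12.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # how the two variables vary together
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # how much each variable varies on its own
    variation_x = sum((x - mean_x) ** 2 for x in xs)
    variation_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / sqrt(variation_x * variation_y)

discount = [10, 20, 30, 40, 50]
saleInRs = [40000, 45000, 48000, 50000, 100000]
r = pearson_r(discount, saleInRs)
print(round(r, 2))
```

A value close to 1 means the dots rise together along a straight line, a value close to -1 means one falls as the other rises, and a value near 0 means no straight-line relationship is visible.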


Customising Scatter chart


The size of the bubble can also be used to reflect a
value. For example, in Program 4-13, we have opted
for displaying the size of the bubble as 10 times the
discount, as shown in Figure 4.15. The colour and
markers can also be changed in the above plot by adding
the following statements:

Program 4-13

import numpy as np
import matplotlib.pyplot as plt
discount= np.array([10,20,30,40,50])
saleInRs=np.array([40000,45000,48000,50000,100000])
size=discount*10
plt.scatter(x=discount,y=saleInRs,s=size,color='red',linewidth=3,
            marker='*',edgecolor='blue')
plt.title('Sales Vs Discount')
plt.xlabel('Discount offered')
plt.ylabel('Sales in Rs')
plt.show()

Think and Reflect
What would happen if we use df.plot(kind='scatter')
instead of plt.scatter() in Program 4-13?

Figure 4.15: Scatter plot based on modified Program 4-13


4.4.5 Plotting Quartiles and Box plot


Suppose an entrance examination of 200 marks is
conducted at the national level, and Mahi has topped
the exam by scoring 120 marks. The result shows 100
percentile against Mahi’s name, which means all the
candidates excluding Mahi have scored less than Mahi.
To visualise this kind of data, we use quartiles.
Quartiles are the measures which divide the data
into four equal parts, and each part contains an equal
number of observations. Calculating quartiles requires
calculation of median. Quartiles are often used in
educational achievement data, sales and survey data
to divide populations into groups. For example, you can
use quartiles to find the top 25 percent of students in
that examination.
A Box Plot is the visual representation of the
statistical summary of a given data set. The summary
includes Minimum value, Quartile 1, Median (Quartile 2),
Quartile 3 and Maximum value. The whiskers are the
two lines outside the box that extend to the highest and
lowest values. It also helps in identifying the outliers.
An outlier is an observation that is numerically distant
from the rest of the data, as shown in Figure 4.16:

Figure 4.16: A Box Plot


Program 4-14 In order to assess the performance of students
of a class in the annual examination, the
class teacher stored marks of the students in
all the 5 subjects in a CSV file “Marks.csv”
as shown in Table 4.8. Plot the data using
boxplot and perform a comparative analysis
of performance in each subject.

Table 4.8 Marks obtained by students in five subjects

Name             English  Maths  Hindi  Science  Social_Studies
Rishika Batra    95       95     90     94       95
Waseem Ali       95       76     79     77       89
Kulpreet Singh   78       81     75     76       88
Annie Mathews    88       63     67     77       80
Shiksha          95       55     51     59       80
Naveen Gupta     82       55     63     56       74
Taleem Ahmed     73       49     54     60       77
Pragati Nigam    80       50     51     54       76
Usman Abbas      92       43     51     48       69
Gurpreet Kaur    60       43     55     52       71
Sameer Murthy    60       43     55     52       71
Angelina         78       33     39     48       68
Angad Bedi       62       43     51     48       54

Program 4-14
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
data= pd.read_csv('Marks.csv')
df= pd.DataFrame(data)
df.plot(kind='box')
#set title,xlabel,ylabel
plt.title('Performance Analysis')
plt.xlabel('Subjects')
plt.ylabel('Marks')
plt.show()

Think and Reflect
What would happen if the label or row index
passed is not present in the DataFrame?


Figure 4.17: A boxplot of “Marks.csv”


The distance between the box and the lower or upper
whisker is larger in some boxplots and smaller in others.
A shorter distance indicates small variation in the data,
and a longer distance indicates a wider spread, that is,
larger variation.
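The numbers behind a box plot can also be calculated directly. The sketch below uses the Maths marks of Table 4.8 and Python's statistics module to find the quartiles. The chapter does not state how outliers are decided; the rule used here, that a value lying more than 1.5 times the box length (Quartile 3 minus Quartile 1) beyond the box is an outlier, is a common convention and is given as an assumption.

```python
import statistics

# Maths column of Table 4.8
maths = [95, 76, 81, 63, 55, 55, 49, 50, 43, 43, 43, 33, 43]

# n=4 cuts the data into four parts and returns the three cut points;
# method='inclusive' treats the list as the complete data set
q1, median, q3 = statistics.quantiles(maths, n=4, method='inclusive')
box_length = q3 - q1      # also called the inter-quartile range

# assumed convention: values beyond 1.5 box lengths are outliers
lower_limit = q1 - 1.5 * box_length
upper_limit = q3 + 1.5 * box_length
outliers = [m for m in maths if m < lower_limit or m > upper_limit]

print(min(maths), q1, median, q3, max(maths))
print(outliers)
```

The first line printed is the five-value summary that the box and whiskers represent; the second lists the marks that would be drawn as separate points.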
Program 4-15 To keep improving their services, XYZ group
of hotels has asked all its three hotels to
get a feedback form filled by their customers
at the time of checkout. After getting ratings
on a scale of 1–5 on factors such as Food,
Service, Ambience, Activities and Distance from
tourist spots, they calculate the average rating
and store it in a CSV file. The data are given
in Table 4.9. This year, to award the best hotel,
they have decided to analyse the ratings of the
past 5 years for each of the hotels. Plot the data
using Boxplot.

Table 4.9 Year-wise average ratings on five parameters

Year   Sunny Bunny Resort   Happy Lucky Resort   Breezy Windy Resort
2014   4.75                 3                    4.5
2015   2.5                  4                    2
2016   3.5                  2.5                  3
2017   4                    2                    3.5
2018   1.5                  4.5                  1


Program 4-15
import pandas as pd
import matplotlib.pyplot as plt
#read the CSV file in 'data'
data= pd.read_csv('compareresort.csv')
#convert 'data' into a DataFrame 'df'
df= pd.DataFrame(data)
#plot a box plot for the DataFrame 'df' with a title
df.plot(kind='box',title='Compare Resorts')
#set xlabel,ylabel
plt.xlabel('Resorts')
plt.ylabel('Rating (5 years)')
#display the plot
plt.show()

Think and Reflect
Which of the three resorts should be awarded? Give
reasons.

Figure 4.18: A boxplot as output of Program 4-15.

Activity 4.3
Plot a pie to display the radius of the planets
and also give an appropriate title to the plot.

Customising Box plot


We can display the whisker in horizontal direction by
adding a parameter vert=False in the Program 4-15, as
shown in the following line of code. We can change the
color of the whisker as well. The output of the modified
Program is shown in Figure 4.19.
df.plot(kind='box',title='Compare Resorts',
color='red', vert=False)


Figure 4.19: The horizontal boxplot after modifying Program 4.15.

4.4.6 Plotting Pie Chart


Pie is a type of graph in which a circle is divided into
different sectors and each sector represents a part of
the whole. A pie plot is used to represent numerical
data proportionally. To plot a pie chart, either column
label y or 'subplots=True' should be set while using
df.plot(kind='pie'). If no column reference is passed and
subplots=True, a 'pie' plot is drawn for each numerical
column independently.
In Program 4-16, we have a DataFrame with
information about the planet's mass and radius. The
‘mass’ column is passed to the plot() function to get a
pie plot as shown in Figure 4.20.
Program 4-16

import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'mass': [0.330, 4.87 , 5.97],
'radius': [2439.7, 6051.8, 6378.1]},
index=['Mercury', 'Venus', 'Earth'])
df.plot(kind='pie',y='mass')
plt.show()


Figure 4.20: Pie chart as output of Program 4-16.


It is important to note that the default label names
are the index values of the DataFrame. The labels
shown in Figure 4.20 are the names of the planets, which
are the index values set in Program 4-16.
Program 4-17 Let us consider the dataset of Table 4.10
showing the forest cover of north eastern
states that contains geographical area and
corresponding forest cover in sq km along
with the names of the corresponding states.
Table 4.10 Forest cover of north eastern states
State GeoArea ForestCover
Arunachal Pradesh 83743 67353
Assam 78438 27692
Manipur 22327 17280
Meghalaya 22429 17321
Mizoram 21081 19240
Nagaland 16579 13464
Tripura 10486 8073

Program 4-17
import pandas as pd
import matplotlib.pyplot as plt
df=pd.DataFrame({'GeoArea':[83743,78438,22327,22429,21081,16579,10486],
                 'ForestCover':[67353,27692,17280,17321,19240,13464,8073]},
                index=['Arunachal Pradesh','Assam','Manipur','Meghalaya',
                       'Mizoram','Nagaland','Tripura'])
df.plot(kind='pie',y='ForestCover',
        title='Forest cover of North Eastern states',legend=False)
plt.show()
Think and Reflect
What effect did
‘legend= False’ in
Program 4.17 have on
the output?

Figure 4.21: Pie chart as output of Program 4.17

Customisation of pie chart


To customise the pie plot of Figure 4.21, we have added
the following two properties of pie chart in Program
4-18:
• explode: specifies the fraction of the radius by
which each wedge is pulled out (exploded) from the centre.
• autopct: displays the percentage of each wedge as a
label.
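The percentage shown on each wedge is the share of that value in the total of the column. As a rough check, the plain-Python sketch below works out these shares for the ForestCover values of Table 4.10 and formats them with the same "%.2f" pattern that is passed to autopct. The dictionary name and the loop are only for illustration; the pie plot performs this calculation itself.

```python
# ForestCover values from Table 4.10 (in sq km)
forest_cover = {
    'Arunachal Pradesh': 67353, 'Assam': 27692, 'Manipur': 17280,
    'Meghalaya': 17321, 'Mizoram': 19240, 'Nagaland': 13464,
    'Tripura': 8073,
}

total = sum(forest_cover.values())

# share of each state in the total, as a percentage
percentages = {state: 100 * area / total
               for state, area in forest_cover.items()}

for state, share in percentages.items():
    # "%.2f" keeps two digits after the decimal point
    print(state, "%.2f" % share)
```

Since every wedge is a share of the same total, the percentages always add up to 100.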

Program 4-18
import pandas as pd
import matplotlib.pyplot as plt
df=pd.DataFrame({'GeoArea':[83743,78438,22327,22429,21081,16579,10486],
                 'ForestCover':[67353,27692,17280,17321,19240,13464,8073]},
                index=['Arunachal Pradesh','Assam','Manipur','Meghalaya',
                       'Mizoram','Nagaland','Tripura'])
exp=[0.1,0,0,0,0.2,0,0]
#explode the first wedge to 0.1 level and the fifth to 0.2 level
c=['r','g','m','c','brown','pink','purple']
#change the color of each wedge
df.plot(kind='pie',y='ForestCover',
        title='Forest cover of North Eastern states',legend=False,
        explode=exp,autopct="%.2f",colors=c)
plt.show()

Figure 4.22: Pie chart as output of Program 4.18

Summary
• A plot is a graphical representation of a data set
which is also interchangeably known as a graph or
chart. It is used to show the relationship between
two or more variables.
• In order to be able to use Python’s Data
Visualisation library, we need to import the
pyplot module from Matplotlib library using the
following statement: import matplotlib.pyplot as
plt, where plt is an alias or an alternative name
for matplotlib.pyplot. You can keep any alias of
your choice.
• The pyplot module houses functions to create a
figure(plot), create a plotting area in a figure, plot
lines, bars, hist. etc., in a plotting area, decorate
the plot with labels, etc.


• The various components of a plot are: Title,
Legend, Ticks, x label, y label
• plt.plot() is used to build a plot, where plt is
an alias.
• plt.show() is used to display the figure, where
plt is an alias.
• plt.xlabel() and plt.ylabel() are used to set the x
and y label of the plot.
• plt.title() can be used to display the title of a plot.
• It is possible to plot data directly from the
DataFrame.
• Pandas has a built-in .plot() function as part of
the DataFrame class.
• The general format of plotting a DataFrame
is df.plot(kind = ' ') where df is the name of the
DataFrame and kind can be line, bar, hist,
scatter, box depending upon the type of plot to be
displayed.

Exercise
1. What is the purpose of the Matplotlib library?
2. What are some of the major components of any
graphs or plot?
3. Name the function which is used to save the plot.
4. Write short notes on different customisation options
available with any plot.
5. What is the purpose of a legend?
6. Define Pandas visualisation.
7. What is open data? Name any two websites from
which we can download open data.
8. Give an example of data comparison where we can
use the scatter plot.
9. Name the plot which displays the statistical summary.
Note: Give appropriate title, set xlabel and ylabel while
attempting the following questions.


10. Plot the following data using a line plot:

Day            1     2     3     4     5     6     7
Tickets sold   2000  2800  3000  2500  2300  2500  1000

• Before displaying the plot display “Monday,
Tuesday, Wednesday, Thursday, Friday,
Saturday, Sunday” in place of Day 1, 2, 3, 4,
5, 6, 7
• Change the color of the line to ‘Magenta’.
11. Collect data about colleges in Delhi University or any
other university of your choice and number of courses
they run for Science, Commerce and Humanities,
store it in a CSV file and present it using a bar plot.
12. Collect and store data related to the screen time of
students in your class separately for boys and girls
and present it using a boxplot.
13. Explain the findings of the boxplot of Figure 4.17 by
filling the following blanks:
a) The median for the five subjects is _____ , ______,
_______, ______, ______
b) The highest value for the five subjects is : _____ ,
______, _______, ______, ______
c) The lowest value for the five subjects is : _____ ,
______, _______, ______, ______
d) ______________ subject has two outliers with the
value ________ and ________
e) ______________ subject shows minimum variation
14. Collect the minimum and maximum temperature
of your city for a month and present it using a
histogram plot.
15. Conduct a class census by preparing a questionnaire.
The questionnaire should contain a minimum of
five questions. Questions should relate to students,
their family members, their class performance,
their health etc. Each student is required to fill
up the questionnaire. Compile the information in
numerical terms (in terms of percentage). Present the
information through a bar, scatter–diagram. (NCERT
Geography class IX, Page 60)


16. Visit data.gov.in, search for the following in the “catalogs”
option of the website:
• Final population Totals, India and states
• State Wise literacy rate
Download them and create a CSV file containing
population data and literacy rate of the respective
state. Also add a column Region to the CSV file
that should contain the values East, West, North
and South. Plot a scatter plot for each region where
X axis should be population and Y axis should be
Literacy rate. Change the marker to a diamond and
size as the square root of the literacy rate.
Group the data on the column region and display
a bar chart depicting average literacy rate for
each region.


Chapter 5
Internet and Web

“The internet could be a very
positive step towards education,
organisation and participation in a
meaningful society.”
— Noam Chomsky

In this chapter
»» Introduction to Computer Networks
»» Types of Networks
»» Network Devices
»» Networking Topologies
»» The Internet
»» Applications of Internet
»» Website
»» Web Page
»» Web Server
»» Hosting of a website
»» Browser

5.1 Introduction to Computer Networks
We are living in a connected world.
Information is being produced, exchanged,
and traced across the globe in real time. It's
possible as almost everyone and everything
in the digital world is interconnected through
one way or the other.
A group of two or more similar things
or people interconnected with each other
is called a network (Figure 5.1). Some of the
examples of networks in our everyday life
include:
• Social network
• Mobile network
• Network of computers
• Airlines, railway, banks, hospitals
networks.

Figure 5.1: Interconnection forming a social network
A computer network (Figure 5.2) is an interconnection
among two or more computers or computing devices.
Such interconnection allows computers to share data
and resources among each other. A basic network may
connect a few computers placed in a room.
The network size may vary from small to large
depending on the number of computers it connects.
A computer network can include different types of
hosts (also called nodes) like server, desktop, laptop,
cellular phones.

Activity 5.1
Identify some other networks in the real world.

Figure 5.2: A computer network


Apart from computers, networks include networking
devices like switch, router, modem, etc. Networking
devices are used to connect multiple computers in
different settings. For communication, data in a network
is divided into smaller chunks called packets. These

packets are then carried over a network. Devices in a
network can be connected either through wired media
like cables or wireless media like air.
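The idea of dividing data into packets can be sketched in a few lines of Python. In this simplified picture each packet carries only a sequence number and a chunk of the data, so that the receiver can put the chunks back in order even if they arrive shuffled. The function names, the packet size of 8 bytes and the sequence numbering are assumptions made for illustration; real packets also carry other information that is left out here.

```python
def make_packets(data, size):
    """Split data (bytes) into numbered chunks of at most size bytes."""
    return [(seq, data[start:start + size])
            for seq, start in enumerate(range(0, len(data), size))]

def reassemble(packets):
    """Put the chunks back in sequence order and join them."""
    return b"".join(chunk for _, chunk in sorted(packets))

message = b"Information is exchanged across the globe."
packets = make_packets(message, 8)

# pretend the packets arrived in the reverse order
received = list(reversed(packets))
print(reassemble(received) == message)
```

Because every chunk is numbered, the order in which the packets travel does not matter to the receiver.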
In a communication network, each device that is a
part of a network and that can receive, create, store or
send data to different network routes is called a node.
In the context of data communication, a node can be a
device such as a modem, hub, bridge, switch, router,
digital telephone handset, a printer, a computer or
a server.
Interconnectivity of computing devices in a network
allows us to exchange information simultaneously with
many parties through email, websites, audio/video
calls, etc. Network allows sharing of resources. For
example, a printer can be made available to multiple
computers through a network; a networked storage
can be accessed by multiple computers. People often
connect their devices through hotspot, thus forming a
small personal network.

Activity 5.2
Create a hotspot using a smartphone and
connect other devices to it.

5.2 Types of Networks


There are various types of computer networks ranging
from network of handheld devices (like mobile phones
or tablets) connected through Wi-Fi or Bluetooth within
a single room to the millions of computers spread across
the globe. Some are connected wirelessly while others are
connected through wires.
Based on the geographical area covered and
data transfer rate, computer networks are broadly
categorised as:
• LAN (Local Area Network)
• MAN (Metropolitan Area Network)
• WAN (Wide Area Network)

5.2.1 Local Area Network (LAN)


It is a network that connects computers, mobile phones,
tablet, mouse, printer, etc., placed at a limited distance.
The geographical area covered by a LAN can range from
a single room, a floor, an office having one or more
buildings in the same premise, laboratory, a school,
college, or university campus. The connectivity is done
by means of wires, Ethernet cables, fibre optics, or
Wi-Fi. A Local Area Network (LAN) is shown in
Figure 5.3.


Figure 5.3: A Local Area Network


LAN is comparatively secure as only authentic
users in the network can access other computers or
shared resources. Users can print documents using
a connected printer, upload or download documents
and software to and from the local server. Such LANs
provide short-range communication with high-speed
data transfer rates. These types of networks can
be extended up to 1 km. The data transfer rate in a LAN
is quite high, and usually varies from 10 Mbps (called
Ethernet) to 1000 Mbps (called Gigabit Ethernet), where
Mbps stands for Megabits per second. Ethernet is a set
of rules that decides how computers and other devices
connect with each other through cables in a local area
network or LAN.

Think and Reflect
Explore and find out the minimum internet
speed required to make a video call.

5.2.2 Metropolitan Area Network (MAN)
Metropolitan Area Network (MAN) is an extended form
of LAN which covers a larger geographical area like a
city or a town. Data transfer rate in MAN also ranges in
Mbps, but it is considerably less as compared to LAN.
Cable TV network or cable based broadband internet
services are examples of MAN. This kind of network

can be extended up to 30–40 km. Sometimes, many
LANs are connected together to form MAN, as shown in
Figure 5.4.

Figure 5.4: A Metropolitan Area Network

5.2.3 Wide Area Network (WAN)


Wide Area Network (WAN) connects computers and
other LANs and MANs, which are spread across
different geographical locations of a country or in
different countries or continents. A WAN could be
formed by connecting a LAN to other LANs (Figure 5.5)
via wired or wireless media. Large business, educational
and government organisations connect their different
branches in different locations across the world through
WAN. The Internet is the largest WAN that connects
billions of computers, smartphones and millions of
LANs from different continents.


Figure 5.5: A Wide Area Network

5.3 Network Devices


To communicate data through different transmission
media and to configure networks with different
functionality, we require different devices like Modem,
Hub, Switch, Repeater, Router, Gateway, etc. Let us
explore them in detail.
Think and Reflect
It is possible to access your bank account
from any part of the world. Find if the
bank’s network is a LAN, MAN, WAN or any
other type.

5.3.1 Modem
Modem stands for ‘MOdulator DEModulator’. It refers to
a device used for conversion between analog signals and
digital bits. We know computers store and process data
in terms of 0s and 1s. However, to transmit data from
a sender to a receiver, or while browsing the internet,
digital data are converted to an analog signal and the
medium (be it free-space or a physical media) carries
the signal to the receiver. There are modems connected
to both the source and destination nodes. The modem
at the sender’s end acts as a modulator that converts
the digital data into analog signals. The modem at the
receiver’s end acts as a demodulator that converts the
analog signals into digital data for the destination node
to understand. Figure 5.6 shows connectivity using
a modem.

Figure 5.6: Use of modem


5.3.2 Ethernet Card


Ethernet card, also known as Network Interface Card
(NIC in short), is a network adaptor used to set up a
wired network. It acts as an interface between computer
and the network. It is a circuit board mounted on the
motherboard of a computer as shown in Figure 5.7. The
Ethernet cable connects the computer to the network
through NIC. Ethernet cards can support data transfer
between 10 Mbps and 1 Gbps (1000 Mbps). Each NIC has
a MAC address, which helps in uniquely identifying the
computer on the network.

Figure 5.7: A Network Interface Card
5.3.3 Repeater
Data are carried in the form of signals over the cable.
These signals can travel a specified distance (usually
about 100 m). Signals lose their strength beyond this
limit and become weak. In such conditions, original
signals need to be regenerated.
A repeater is an analog device that works with signals
on the cables to which it is connected. The weakened
signal appearing on the cable is regenerated and put
back on the cable by a repeater.
5.3.4 Hub
An Ethernet hub (Figure 5.8) is a network device used
to connect different devices through wires. Data arriving
on any of the lines are sent out on all the others. The
limitation of a hub is that if data from two devices come
at the same time, they will collide.

Figure 5.8: A network hub with 8 ports


5.3.5 Switch


A switch is a networking device (Figure 5.9) that plays a
central role in a Local Area Network (LAN). Like a hub, a
network switch is used to connect multiple computers or
communicating devices. When data arrives, the switch
extracts the destination address from the data packet
and looks it up in a table to see where to send the packet.
Thus it sends signals to only selected devices instead of
sending to all. It can forward multiple packets at the
same time. A switch does not forward the signals which
are noisy or corrupted. It drops such signals and asks
the sender to resend it.
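The difference between a hub and a switch can be sketched with a small Python model. The hub sends incoming data out on every other port, while the switch looks up the destination address in its table and uses only the matching port. The port numbers, the short addresses and the function names are hypothetical. Sending to all other ports when the address is not in the table is also an assumption of this sketch, since the text does not describe that case.

```python
def hub_forward(in_port, ports):
    """A hub repeats the data on every port except the one it came from."""
    return [p for p in ports if p != in_port]

def switch_forward(dest_address, in_port, table, ports):
    """A switch sends the data only to the port of the destination.

    Assumption: if the address is not in the table, the data is sent
    to all other ports, as a hub would do.
    """
    if dest_address in table:
        return [table[dest_address]]
    return [p for p in ports if p != in_port]

ports = [1, 2, 3, 4]
# hypothetical table: destination address -> port number
table = {'AA:01': 1, 'BB:02': 2, 'CC:03': 3}

print(hub_forward(1, ports))                     # every other port
print(switch_forward('BB:02', 1, table, ports))  # only the matching port
```

Because the switch uses a single port for each packet, the other ports remain free to carry different packets at the same time.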

Figure 5.9: Cables connected to a network switch


Ethernet switches are common in homes and offices
to connect multiple devices, thus creating LANs or to
access the Internet.
5.3.6 Router
A router (Figure 5.10) is a network device that can receive
the data, analyse it and transmit it to other networks.
A router connects a local area network to the internet.
Compared to a hub or a switch, a router has advanced
capabilities as it can analyse the data being carried over
a network, decide or alter how it is packaged, and send
it to another network of a different type. For example,
suppose data has been divided into packets of a certain size
and these packets are to be carried over a different
type of network which cannot handle such big packets.
In such a case, the router repackages the data as smaller
packets and then sends them over the network.
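
The idea of repackaging data into smaller packets can be sketched in Python as follows. This is a simplified illustration: the function name repackage and the packet sizes 8 and 3 are assumed for the example, and real routers work on bytes with headers rather than on plain strings.

```python
def repackage(data, max_size):
    """Split data into packets of at most max_size characters."""
    return [data[i:i + max_size] for i in range(0, len(data), max_size)]

message = "INFORMATICS PRACTICES"

# Packets as carried on the first network (assumed limit: 8 characters).
big_packets = repackage(message, 8)

# The second network can carry only 3 characters per packet (assumed),
# so each packet is split again before being sent.
small_packets = []
for packet in big_packets:
    small_packets.extend(repackage(packet, 3))

print(big_packets)
print(small_packets)
print("".join(small_packets) == message)   # the data itself is unchanged
```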

Figure 5.10: A Router

An Internet service provider (ISP) is any organisation that
provides services for accessing the Internet.

A router can be wired or wireless. A wireless router
can provide Wi-Fi access to smartphones and other
devices. Usually, such routers also contain some ports
to provide wired Internet access. These days, home
Wi-Fi routers perform the dual task of a router and a
modem or switch. These routers connect to incoming
broadband lines, from ISP (Internet Service Provider),
and convert them to digital data for computing devices
to process.
5.3.7 Gateway
As the term “Gateway” suggests, it is a key access point
that acts as a “gate” between an organisation's network
and the outside world of the Internet (Figure 5.11).
Gateway serves as the entry and exit point of a network,
as all data coming in or going out of a network must
first pass through the gateway in order to use routing
paths. Besides routing data packets, gateways also
maintain information about the host network's internal
connection paths and the identified paths of other
remote networks. If a node from one network wants to
communicate with a node of a foreign network, it will
pass the data packet to the gateway, which then routes
it to the destination using the best possible route.
For simple Internet connectivity at homes, the
gateway is usually the Internet Service Provider that
provides access to the entire Internet. Generally, a router
is configured to work as a gateway device in computer
networks. A gateway can be implemented as software,
hardware, or a combination of both. This is because a
network gateway is placed at the edge of a network and
the firewall is usually integrated with it.

Activity 5.3
Find and list a few ISPs in your region.

[Diagram: two networks, 10.0.0.0/8 and 20.0.0.0/8, each with a server and PCs 1 to 5, joined by a gateway]

Figure 5.11: A network gateway
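
The decision of whether a data packet stays inside the network or is handed to the gateway can be sketched with Python's built-in ipaddress module. The network 10.0.0.0/8 is taken from Figure 5.11; the individual addresses and the function name next_hop are assumed for illustration only.

```python
import ipaddress

# The host's own network, as labelled in Figure 5.11.
local_network = ipaddress.ip_network("10.0.0.0/8")

def next_hop(destination):
    """Decide where a packet for the given destination should go."""
    dest = ipaddress.ip_address(destination)
    if dest in local_network:
        return "deliver within the local network"
    return "send to the gateway"

print(next_hop("10.0.0.5"))   # a node of the same network
print(next_hop("20.0.0.7"))   # a node of the foreign network
```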

5.4 Networking Topologies


We have already discussed that a number of computing
devices are connected together to form a Local Area
Network (LAN), and interconnections among millions of
LANs form the Internet. The arrangement of computers
and other peripherals in a network is called its topology.
Common network topologies are mesh, ring, bus, star
and tree.
5.4.1 Mesh Topology
In this networking topology, each communicating device
is connected with every other device in the network as
shown in Figure 5.12. Such a network can handle large
amounts of traffic since multiple nodes can transmit
data simultaneously. Also, such networks are more
reliable in the sense that even if a node goes down, it
does not cause any break in the transmission of data
between other nodes. This topology is also more secure
as compared to other topologies because each cable
between two nodes carries different data. However,
wiring is complex and cabling cost is high in creating
such networks, and there are many redundant or
unutilised connections.

Figure 5.12: A mesh topology
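
A fully connected mesh of n nodes needs n(n-1)/2 links, because every pair of nodes gets its own link. The short Python sketch below checks this by listing all the pairs. The node names and the function name mesh_links are assumed for the example.

```python
from itertools import combinations

def mesh_links(n):
    """Number of links in a fully connected mesh of n nodes."""
    return n * (n - 1) // 2

nodes = ["PC1", "PC2", "PC3", "PC4", "PC5"]

# Every pair of nodes needs one direct link.
links = list(combinations(nodes, 2))

print(len(links))                # links counted pair by pair
print(mesh_links(len(nodes)))    # links given by the formula
```

Both values come out equal, which also shows why cabling cost rises quickly as nodes are added.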

To build a fully-connected mesh topology of n nodes,
it requires n(n-1)/2 wires.

5.4.2 Ring Topology
In ring topology, each node is connected to two other
devices, one each on either side, as shown in Figure
5.13. The nodes connected with each other thus form a
ring. The link in a ring topology is unidirectional. Thus,
data can be transmitted in one direction only (clockwise
or counterclockwise).

Figure 5.13: A ring topology

5.4.3 Bus Topology


In bus topology (Figure 5.14), each communicating
device connects to a transmission medium, known as
bus. Data sent from a node are passed on to the bus
and hence are transmitted along the length of the bus in
both directions. That means data can be received by
any of the nodes connected to the bus.

Figure 5.14: A bus topology


In this topology, a single backbone wire called bus
is shared among the nodes, which makes it cheaper
and easier to maintain. Both ring and bus topologies are
considered to be less secure and less reliable.

Think and Reflect
How will a bus and ring topology behave in
case a node is down?

5.4.4 Star Topology
In star topology, each communicating device is connected
to a central node, which is a networking device like a
hub or a switch, as shown in Figure 5.15.
Star topology is considered very effective, efficient
and fast as each device is directly connected with the
central device. Although disturbance in one device
will not affect the rest of the network, any failure in
the central networking device may lead to the failure of
the complete network.

Figure 5.15: A star topology


The central node can be either a broadcasting device,
which means data will be transmitted to all the nodes in the
network, or a unicast device, which means the node can identify
the destination and forward data to that node only.
5.4.5 Tree or Hybrid Topology
It is a hierarchical topology, in which there are multiple
branches and each branch can have one or more basic
topologies like star, ring and bus. Such topologies are
usually realised in WANs where multiple LANs are connected.
Those LANs may be in the form of ring, bus or star. In Figure
5.16, a hybrid topology is shown connecting 4 star topologies
in bus.
In this type of network, data transmitted from source first
reaches the centralised device and from there the data passes
through every branch, where each branch can have links to
more nodes.

Figure 5.16: A hybrid topology
5.5 The Internet
The Internet is the global network of computing devices
including desktops, laptops, servers, tablets, mobile
phones, other handheld devices as well as peripheral
devices such as printers, scanners, etc. In addition, it
also consists of networking devices such as routers,
switches, gateways, etc. Today, smart electronic
appliances like TV, AC, refrigerator, fan, light, etc.,
can also communicate through the Internet. The list of
such smart devices is always increasing, e.g., drones,
vehicles, door locks, security cameras, etc.
The Internet is evolving every day. Computers
are connected to a modem either through a cable or
wirelessly (Wi-Fi). A modem, be it wired or wireless,
is connected to a local Internet Service Provider (ISP)
which then connects to a national network. Many such
ISPs connect together forming a regional network and
regional networks connect together forming a national
network, and such country-wise networks form the
Internet backbone.
The Internet today is a widespread network, and its
influence is no longer limited to the technical fields of
computer communications. It is being used by everyone
in society, as is evident from the increasing use of
online tools for education, creativity, entertainment,
socialisation and e-commerce.

5.6 Applications of Internet


Following are some of the broad areas or services
provided through the Internet:
• The World Wide Web (WWW)
• Electronic mail (Email)
• Chat
• Voice Over Internet Protocol (VoIP)

5.6.1 The World Wide Web (WWW)


The World Wide Web (WWW) or web in short, is an
ocean of information, stored in the form of trillions
of interlinked web pages and web resources. The
resources on the web can be shared or accessed through
the Internet.
Earlier, to access files residing in different computers,
one had to login individually to each computer through
the Internet. Besides, files in different computers were
sometimes in different formats, and it was difficult to
understand each other’s files and documents. Sir Tim
Berners-Lee, a British computer scientist, invented the
revolutionary World Wide Web in 1990 by defining three
fundamental technologies that led to the creation of the web:

• HTML — HyperText Markup Language or HTML is a
language which is used to design standardised Web
Pages so that the Web contents can be read and
understood from any computer across the globe. It
uses tags to define the way page content should be
displayed by the web browser. Basic structure of
every webpage is designed using HTML.
• URI — Uniform Resource Identifier or URI is a unique
identifier to identify a resource located on the web.
URI identifies a resource (hardware or software) either
by its location or by its name or by both.
URL is Uniform Resource Locator and provides
the location and mechanism (protocol) to access
the resource. Examples of URI identifying resources
using location (i.e., URL) are: https://www.mhrd.gov.in,
http://www.ncert.nic.in, http://www.airindia.in, etc.
URL is sometimes also called a web address.
However, it is not only the domain name, but contains
other information that completes a web address, as
depicted below:
Domain Name

http://www.ncert.nic.in/textbook/textbook.htm
URL

In the above URL, http is the protocol name; it can
be https, http, FTP, Telnet, etc. www is a subdomain.
ncert.nic.in is the domain name.
Note: These days it is not mandatory to mention the protocol
and subdomain while entering a URL. The browser
automatically prefixes them.
• HTTP — The HyperText Transfer Protocol is a set
of rules which is used to retrieve linked web pages
across the web. Its more secure and advanced version
is HTTPS.

Search engines like google.co.in, bing.com, duckduckgo.com,
in.yahoo.com, etc., can be used to search and retrieve
information when the address of the web page is not known.

Many people confuse the web with the Internet.
The Internet as we know is the huge global network
of interconnected computers, which may or may not
have any file or webpage to share with the world.
The web on the other hand is the interlinking of a
collection of web pages on these computers which are
accessible over the Internet. WWW today gives users
access to a vast collection of information created and
shared by people across the world. It is today the
most popular information retrieval system.
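
The parts of a URL described earlier in this section can be separated using the urlparse function of Python's standard urllib.parse module. The sketch below uses the NCERT URL from the text. Splitting the host name at the first dot to get the subdomain is a simplification assumed for this example and will not hold for every web address.

```python
from urllib.parse import urlparse

url = "http://www.ncert.nic.in/textbook/textbook.htm"
parts = urlparse(url)

print(parts.scheme)   # protocol name
print(parts.netloc)   # subdomain together with the domain name
print(parts.path)     # location of the page on the server

# Simplification: treat the text before the first dot as the subdomain.
subdomain, domain = parts.netloc.split(".", 1)
print(subdomain)
print(domain)
```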
5.6.2 Electronic Mail (Email)
Email is the short form of electronic mail. It is one of
the ways of sending and receiving message(s) using the
Internet. An email can be sent at any time to any number
of recipients anywhere. The message can be either
text entered directly into the email application or an
attached file (text, image, audio, video, etc.) stored on
secondary storage. An existing file can be sent as an
attachment with the email, so there is no need to type it again.
To use email service, one needs to register with an
email service provider by creating a mail account. These
services may be free or paid. Some of the popular email
service providers are Google (gmail), Yahoo (yahoo mail),
Microsoft (outlook), etc. However, many organisations
nowadays get customised business email addresses for
their staff using their own domain name. For example,
[email protected].
Following are some of the common facilities available
for an email user:
1. Creating an email, attaching files with an email,
saving an email as draft for mailing later. Creating
email is also termed as composing.
2. Sending and receiving mail. Same email can be sent
to multiple email addresses, simultaneously.
3. Sending the copy of mail, as carbon copy (cc) or
blind carbon copy (bcc).
4. Forwarding a received email to other user(s)
5. Filtering spam emails
6. Organising email in folders and sub folders
7. Creating and managing email ids of the people you
know.
8. Setting signature/footer to be inserted automatically
at the end of each email
9. Printing emails using a printer or saving as files.
10. Searching emails using email address or email
subject text
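
Some of the facilities listed above (composing, multiple recipients, carbon copy and attachments) can be tried out with the EmailMessage class of Python's standard email package. The sketch below only builds the message in memory and does not send it. All the addresses are placeholders on the reserved example.com domain, and the file name notes.txt is assumed.

```python
from email.message import EmailMessage

# Composing an email; every address here is a placeholder.
msg = EmailMessage()
msg["From"] = "teacher@example.com"
msg["To"] = "student1@example.com, student2@example.com"   # multiple recipients
msg["Cc"] = "principal@example.com"                        # carbon copy
msg["Subject"] = "Class notes"
msg.set_content("Please find the notes attached.")         # text typed directly

# Attaching existing text as a file instead of typing it again.
msg.add_attachment("Chapter 5: Internet and Web", filename="notes.txt")

print(msg["To"])
print(msg["Subject"])
for part in msg.iter_attachments():
    print(part.get_filename())
```

Actually sending the message would need an email service provider's server and login details, which are left out here.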
5.6.3 Chat
Chatting or Instant Messaging (IM) over the Internet
means communicating with people at different geographic
locations in real time through text message(s). It is a
forum where multiple people connect to each other,
to discuss their common interests. Two individuals
can also send messages instantly. The sender types
a message and sends it; the receiver immediately
receives the message and can read and reply through
text message. All this happens in real time, as if the
sender and receiver were sitting in the same place. For
a successful chat session, the communicating parties
should be online simultaneously, and use the same
chat application.
With ever increasing Internet speed, it is now possible
to send images, documents, audio and video as well through
instant messengers. The communicating
parties can also talk to each other through an audio call or
through a video call. Moreover, it is also possible to chat
through text, audio and video in a group. Thus, we can
have group chat or group calls.
Applications such as WhatsApp, Slack, Skype, Yahoo
Messenger, Google Talk, Facebook Messenger, Google
Hangout, etc., are examples of instant messengers.
Some of these applications support instant messaging
through all the modes — text, audio and video.
5.6.4 VoIP
Voice over Internet Protocol or VoIP, allows us to have
voice call (telephone service) over the Internet, i.e., the
voice transmission over a computer network rather
than through the regular telephone network. It is also
known as Internet Telephony or Broadband Telephony.
But to avail the phone service over the Internet, we
need to have an Internet connection with reasonably
good speed.
VoIP works on the simple principle of converting the
analogue voice signals into digital and then transmitting
them over the broadband line. There are two major
advantages of a VoIP—
• These services are either free or very economical,
so people use them to save on cost. That is why
these days even international calls are being made
using VoIP.
• VoIP call(s) can be received and made using IP phones
from any place having Internet access. Hence, VoIP
has increased the portability and functionality of the
voice calling system. Incoming phone calls can be
automatically routed to the VoIP phone as soon as it
is connected to the Internet.
A major disadvantage of VoIP is that its call quality is
dependent on Internet connection speed. Slow Internet
connection will lead to poor quality voice calls.
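
The conversion of an analogue voice signal into digital form, on which VoIP is based, can be sketched in Python as below. This is a rough illustration under stated assumptions: a 440 Hz sine wave stands in for the voice, the sampling rate of 8000 samples per second and the 256 levels are assumed values, and the function names digitise and tone are made up. Real VoIP software uses specialised encoders.

```python
import math

def digitise(signal, sample_rate, num_samples, levels=256):
    """Sample an analogue signal (a function of time giving values
    between -1 and 1) and convert each sample to an integer level."""
    samples = []
    for i in range(num_samples):
        t = i / sample_rate                            # time of this sample
        value = signal(t)                              # analogue value in [-1, 1]
        level = round((value + 1) / 2 * (levels - 1))  # integer 0 .. levels-1
        samples.append(level)
    return samples

def tone(t):
    """A 440 Hz sine wave standing in for a voice signal."""
    return math.sin(2 * math.pi * 440 * t)

digital = digitise(tone, sample_rate=8000, num_samples=8)
print(digital)          # list of integer levels
print(bytes(digital))   # the same samples as bytes, ready to be transmitted
```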

5.7 Website
Each one of us might have visited one or the other
website. A website in general contains information
organised in multiple pages about an organisation. A
website can also be created for a particular purpose,
theme or to provide a service.
A website (usually referred to as a site in short) is a
collection of web pages related through hyperlinks, and
saved on a web server. A visitor navigates from one page
to another by clicking on hyperlinks. Also, all the pages
of a website are integrated under one domain name
and have a common theme and template. For example,
the website of NCERT will have all the pages related
to NCERT, viz., textbooks, syllabus, events, resource
materials, etc., under one domain name and having a
common design theme. To access a website, one has
to type the address of the website (URL) in the address
bar of a browser, and press enter. The home page of the
website will be displayed.
5.7.1 Purpose of a Website
We are living in an Internet era where the whole world
is connected. A website’s purpose is to make the
information available to people at large. For example, a
company might like to advertise or sell its products, a
government organisation may like to publish circulars,
float tenders, invite applications or get feedback from
various stakeholders. A website is a means that helps
to communicate with people in a specific, transparent
and user friendly manner. Therefore, while developing
a website, the first question to ask is why the website is
being created, and what should be its pages so that it
serves the required purpose.
Basically, a website should be user friendly and
provide information to users with minimum effort. A
website should be designed keeping in mind different
categories of people that will be visiting the site. Some of
the common purposes for which websites are designed
are listed below:

• Selling products and delivering services
• Posting and finding information on the Internet
• Communicating with each other
• Entertainment purposes
• Disseminating contents and software

5.8 Web Page


A web page (also referred to as a page) is a document
on the WWW that is viewed in a web browser. Basic
structure of a web page is created using HTML (HyperText
Markup Language) and CSS (Cascading Style Sheets). A
web page is usually a part of a website and may contain
information in different forms, such as:
● text in the form of paragraphs, lists, tables, etc.
● images
● audio
● video
● software application
● other interactive contents

Activity 5.4
Visit NCERT, SWAYAM or any other website and note down URLs
of some of the specific pages of that website.

Additionally, various styling and formatting are
applied on a web page to make it attractive and organised.
Further, program codes called scripts are used to define
the manner in which the page will behave on different
actions. Scripts make a web page interactive. JavaScript
is the most popular and commonly used scripting
language. However, Python and PHP are also used to
apply scripting on a web page.
The first page of the website is called a home page.
It generally contains information and links to all the
related web pages. Each web page has a unique address
that is visible on the address bar. Hence if we want to
view a particular web page, its address has to be typed in
the address bar of the browser. The web pages that are
linked to form a website share a unique domain name.
For example, https://swayam.gov.in/ is a website by
the Government of India to deliver online courses for
School, College and University students and teachers. It
is a collection of multiple web pages that link to different
courses related information.
5.8.1 Static and Dynamic Web Pages
A web page can be static or dynamic. A static webpage
is one whose content always remains static, i.e., does
not change from person to person. When a web server
receives a request (from browser) for a static web page,
it just locates the page on its storage media and sends
it to the browser of the client. No additional processing
is performed on the page. Hence, a static web page
remains the same for all users until someone changes
its code manually.
Static web pages are generally written in HTML,
JavaScript and/or CSS and have the extension .htm
or .html.

[Diagram: STEP 1: HTTP Request; STEP 2: HTTP Response]

Figure 5.17: Working of a static web page

On the other hand, a dynamic web page is one in
which the content of the web page can be different for
different users. The difference in content may be because
of different choices made by the user. When a request for
a dynamic web page is made to the web server, it does
not simply retrieve the page and send it. Before sending
the requested web page, the server may perform some
additional processes like getting information from the
database, updating date and time, updating weather
information, etc. The content of such pages changes
frequently. They are more complex and thus take more
time to load than static web pages.
Dynamic web pages can be created using various
languages such as JavaScript, PHP, ASP.NET, Python,
Java, Ruby, etc. These are complex to construct and
design, as the code to perform the additional operations
has to be added. Such server side code allows the server
to change the page content each time the page is loaded.
Further, most dynamic pages are linked to databases
so that each time the page is loaded, the required
information from the databases is retrieved to update
the web page. A few common examples of dynamic web
pages are those web pages displaying the date, time,
and weather report or having e-commerce applications.
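
The contrast between a static and a dynamic web page can be sketched in Python as follows. This is not a real web server: the function names static_page and dynamic_page, the user names and the HTML content are all assumed, and the sketch only shows that a dynamic page is built afresh for each request.

```python
from datetime import datetime

STATIC_PAGE = "<html><body><h1>Welcome</h1></body></html>"

def static_page():
    """A static page: the same stored content is sent to every user."""
    return STATIC_PAGE

def dynamic_page(user_name, now):
    """A dynamic page: the HTML is built at the time of the request."""
    return (
        "<html><body>"
        f"<h1>Welcome, {user_name}</h1>"
        f"<p>Current date and time: {now:%d-%m-%Y %H:%M}</p>"
        "</body></html>"
    )

print(static_page())
print(dynamic_page("Asha", datetime.now()))
print(dynamic_page("Ravi", datetime.now()))
```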

[Diagram: STEP 1: HTTP Request; STEP 2: Calls an application program in response to the HTTP request; STEP 3: The program executes and produces HTML output; STEP 4: HTTP Response]

Figure 5.18: Working of a dynamic web page

5.9 Web Server


A web server is used to store and deliver the contents of
a website to clients such as a browser that request it. A
web server can be software or hardware.
When talking about a web server as computer
hardware, it stores web server software and a website's
contents (HTML pages, images, CSS stylesheets, and
JavaScript files). The server needs to be connected to
the Internet so that its contents can be made accessible
to others.
When talking about a web server as software, it
is a specialised program that understands URLs or
web addresses coming as requests from browsers, and
responds to those requests. The server is assigned a
unique domain name so that it can be accessed from
anywhere using the domain name. To develop and test
a website using a personal computer, we need to first
install a web server on that computer.
The web browser from the client computer sends a
request (HTTP request) for a page containing the desired
data or service. The web server then accepts, interprets,
searches and responds (HTTP response) to the request
made by the web browser. The requested web page is
then displayed in the browser of the client. If the server
is not able to locate the page, it sends a page containing
the error message (Error 404 – page not found) to the
client’s browser.
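
How a web server responds to a request can be pictured with the small Python sketch below. It is only a model of the idea: the dictionary PAGES stands in for the server's storage, and the paths, page contents and the function name handle_request are assumed for the example.

```python
# Hypothetical storage of a server: requested path -> page content.
PAGES = {
    "/index.html": "<h1>Home</h1>",
    "/textbook.html": "<h1>Textbooks</h1>",
}

def handle_request(path):
    """Return the status code and the page for the requested path."""
    if path in PAGES:
        return 200, PAGES[path]                      # page found
    return 404, "<h1>Error 404 - page not found</h1>"

print(handle_request("/index.html"))
print(handle_request("/missing.html"))
```

The status code 200 is the standard HTTP code for a successful response, just as 404 is the code for a page that cannot be located.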

5.10 Hosting of a Website


Web hosting is a service that allows us to put a website
or a web page onto the Internet, and make it a part of
the World Wide Web. Once a website is created on a
hardware server, we need to connect the server to the
Internet so that users across the globe can access it.
Alternatively, we can rent server resources (CPU, RAM, and
storage) from a cloud service provider and host our locally
created website there. This is done by uploading the
files constituting the website (HTML, CSS, JavaScript,
images, databases, etc.) from the local computer onto
the space allocated on the server. For this, we have
to avail the services of a web hosting service provider.
These services for using the server’s resources such as
RAM, hard disk, bandwidth, etc., are usually paid and
these resources can be increased or decreased as per
the load on the website.

Activity 5.5
Find out some of the Web hosting service
providers from both categories: free and paid.

A web server, whether it is a local server or a cloud
server, is assigned a unique numeric address called an IP
address when it is connected to the Internet. This IP
address needs to be mapped to a
textual name called the domain name of the website. This
is because it is not convenient for users to remember a
numeric IP address. Thus, for accessing a website, the
user enters the domain name (as part of the URL) in a
browser. The domain name has to be registered (purchased)
with an authorised agency.
5.10.1 How to host a website?
To host a website, follow the steps given below:
• Select the web hosting service provider that will provide
the web server space as well as related technologies
and services such as database, bandwidth, data
backup, firewall support, email service, etc. This has
to be done keeping in mind the features and services
that we want to offer through our website.
• Identify a domain name, which best suits our
requirement, and get it registered through domain
name Registrar.
• Once we get web space, create logins with appropriate
rights and note down IP address to manage web space.

• Upload the files in properly organised folders on the
allocated space.
• Get domain name mapped to the IP address of the
web server.
The domain name system (DNS) is a service that
does the mapping between domain name and IP
address. When the address of a website is entered in a
browser, the DNS finds out the IP address of the server
corresponding to the requested domain name and sends
the request to that server.
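
The mapping done by DNS can be imagined as a lookup table from domain names to IP addresses. The Python sketch below is a toy model only: the records are hypothetical, using the reserved example.com and example.org names and IP addresses from ranges set aside for documentation, and the function name resolve is assumed. The real DNS is a distributed service spread over many servers.

```python
# Hypothetical DNS records: domain name -> IP address of the web server.
DNS_RECORDS = {
    "www.example.com": "192.0.2.10",
    "www.example.org": "198.51.100.7",
}

def resolve(domain_name):
    """Return the IP address mapped to a domain name, or None if unknown."""
    # Domain names are not case sensitive, so compare in lower case.
    return DNS_RECORDS.get(domain_name.lower())

print(resolve("www.example.com"))
print(resolve("WWW.EXAMPLE.ORG"))
print(resolve("unknown.example"))
```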

5.11 Browser
A browser is a software application that helps us to view
the web page(s). In other words, it helps us to view the
data or information that is retrieved from various web
servers on the Internet. Some of the commonly used web
browsers are Google Chrome, Internet Explorer, Mozilla
Firefox, Opera, etc. A web browser essentially displays
the HTML documents which may include text, images,
audio, video and hyperlinks that help to navigate from
one web page to another.

[Browser logos: Mozilla Firefox, Microsoft Internet Explorer, Google Chrome, Opera, Apple Safari]

Figure 5.19: Some commonly used browsers

Mosaic was the first web browser developed by the
National Center for Supercomputing Applications (NCSA).

The initial web browsers like Mosaic used to support
HTML documents containing plain text (static website)
only, but nowadays with the advancement of technology,
modern web browsers allow us to view interactive and
dynamic websites. In addition to this, most modern
browsers allow a wide range of visual effects, use
encryption for advanced security and also have cookies
that can store the browser settings and data.
5.11.1 Browser Settings
Every web browser has got certain settings that define
the manner in which the browser will behave. These
settings may be with respect to privacy, search engine
preferences, download options, auto signature, autofill
and autocomplete feature, theme and much more. Each
browser application allows us to change or customise
its settings in a user friendly manner. Let’s learn how
to change the browser settings using the open source
browser, Mozilla Firefox.

Mozilla Firefox is an open source web browser which
is available free of cost and can be easily downloaded
from the Internet.

Open Mozilla Firefox, and on the top right corner of the
browser window, click the Menu button.

Figure 5.20: Mozilla Firefox Menu button


From the drop down button, select Options. The
preferences and Options window will be displayed in
the browser.


Figure 5.21: Preference and options page


On the left side, there are multiple Panels to choose
from: General, Home, Search, Privacy and Security and
Sync.
General Panel: Some of the options that the panel
contains are as follows:
• setting the default browser
• language and appearance of text
• downloading files and applications
• firefox update settings
• browsing and network settings
Home Panel: This panel contains options to set the
home page of the browser, browser window and tab
settings.
Search Panel: This panel contains options to edit the
settings of the search engine used by Firefox.
Privacy and Security Panel: This panel contains
options to secure the browser and data. It includes the
following:
• enhanced tracking protection
• forms and passwords
• history and address bar
• cookies and site data
• permission to view pop-up windows and install add-ons
Sync Panel: This panel contains options to set up and
manage a Firefox account which is needed to access all
services given by Mozilla.
Make the desired settings and close the browser
settings window. The changes made in the browser
settings will be applied.
5.11.2 Add-Ons and Plug-ins
Add-ons and plug-ins are the tools that help to extend
and modify the functionality of the browser. Both the
tools boost the performance of the browser, but are
different from each other.
A plug-in is a complete program or may be a
third-party software. For example, Flash and Java are
plug-ins. A Flash player is required to play a video in
the browser. A plug-in is a software that is installed on
the host computer and can be used by the browser for
multiple functionalities and can even be used by other
applications as well.
On the other hand, an add-on is not a complete program
and so is used to add only a particular functionality to
the browser. An add-on is also referred to as extension in
some browsers. Adding the functionality of a sound and
graphics card is an example of an add-on.

Figure 5.22: Add-ons and plug-ins

To add an extension, click the Options button on
the top right corner of the browser and select the Add-ons
option. Click the Extensions Panel option on the
left. On the right, options to Manage your Extensions
will appear. There will be a list of enabled, disabled and
recommended extensions. Make the desired selections
and close the add-ons window.
Similarly, to add plug-ins, click Plug-ins options on
the left side of the browser window. Make the desired
selections to enable or disable the required plug-ins.

Think and Reflect
Can we compare Add-ons and Plug-ins with
utility software?

5.11.3 Cookies
A cookie is a text file, containing a string of information,
which is transferred by the website to the browser when
we browse it. This string of information gets stored in the
form of a text file in the browser. The information stored
is retransmitted to the server to recognise the user, by
identifying pages that were visited, choices that were
made while browsing various menu(s) on a particular
website. It helps in customising the information that
will be displayed, for example the choice of language for
browsing, allowing the user to auto login, remembering
the shopping preference, displaying advertisements of
one’s interest, etc.

First cookie software was created in 1994 at Netscape,
for determining whether the person is a first time visitor
or a re-visitor of their site.

Cookies are usually harmless and they can’t access
information from the hard disk of a user or transmit
virus or malware. It is the browser on our computer
which stores and manages the cookies. However,
viruses can also be disguised as cookies and cause harm
to a computer. One can disable cookies by changing the
Privacy and Security settings of our browser.
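
The exchange of a cookie between a website and a browser can be sketched with the SimpleCookie class of Python's standard http.cookies module. The cookie names language and visits and their values are assumed for this example, and the browser's part is imitated by a plain string.

```python
from http.cookies import SimpleCookie

# Server side: prepare cookies to be sent to the browser.
server_cookie = SimpleCookie()
server_cookie["language"] = "hi"
server_cookie["visits"] = "1"
print(server_cookie.output())        # the Set-Cookie lines sent to the browser

# Browser side: on the next request the stored string is sent back.
cookie_string = "language=hi; visits=1"

# Server side again: read the returned cookie to recognise the user.
received = SimpleCookie()
received.load(cookie_string)
print(received["language"].value)    # preferred language of this user
visits = int(received["visits"].value) + 1
print(visits)                        # the user is now a re-visitor
```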

Summary
• A group of two or more similar things or people
interconnected with each other is called a network.
• A computer network is an interconnection
among two or more computers to share data and
resources.
• Devices in a network can be connected either
through wired or wireless media.

• Based on the geographical area covered and data
transfer rate, computer networks are broadly
categorised as LAN, MAN and WAN.
• The protocol or the set of rules that decide
functioning of a LAN is called Ethernet.
• Local Area Network (LAN) is a network that
connects digital devices placed at a limited
distance of up to 1 km.
• Metropolitan Area Network (MAN) is an extended
form of LAN which covers a larger geographical
area like a city or a town.
• Wide Area Network (WAN) connects computers and
other LANs and MANs, which are spread across
different geographical locations of a country or in
different countries or continents.
• A repeater is an electronic device that receives a
weak signal and regenerates it.
• Modem (MOdulator DEModulator) refers to any
such device used for conversion between analog
signals and digital bits.
• A hub is a network device used to connect
multiple devices to form a network or to connect
segment(s) of LAN.
• A switch is a networking device that filters network
traffic while connecting multiple computers or
communicating devices.
• A router is a network device that can receive the
data, analyse it and transmit to other networks.
• A gateway is a device that connects the
organisation’s network with the outside world of
the Internet.
• The physical organisation of computers, cables
and other peripherals in a network is called its
topology. Common network topologies are Bus,
Star, Tree, Mesh, etc.
• In bus topology, each communicating device
connects to a common central transmission
medium, known as bus.
• In star topology, each communicating device is
connected to a central node, which is a networking
device like a hub or a switch, through separate
cables.

• In tree topology, multiple star and bus topologies
are connected to a central cable, also called the
backbone of the network.
• In mesh topology, each communicating device is
connected with every other device in the network.
• The Internet is the largest WAN that connects
millions of computers across the globe.
• Some of the services provided through the Internet
are information sharing, communication, data
transfer, social networking, e-commerce, etc.
• A Uniform Resource Locator (URL) is a standard
naming convention used for accessing resources
over the Internet.
• Electronic mail is a means of sending and receiving
message(s) through the Internet.
• Chatting is communicating in real time using text
message(s).
• Voice over Internet Protocol (VoIP) allows you to
have voice calls over digital networks.
• A website is a collection of related web pages.
• A web page is a document that is viewed in a web
browser such as Google Chrome, Mozilla Firefox,
Opera, Internet Explorer, etc. It can be static or
dynamic.
• A static web page is one whose content does not
change for requests made by different people.
• A dynamic web page is one in which the content
of the web page displayed is different for different
users.
• A web server is a program or a computer that
provides services to other programs or computers
called clients.
• Web hosting is a service that allows you to post
the website created locally so that it is available
for all internet users across the globe.
• Every browser has got certain settings that define
the manner in which the browser will behave.
These settings may be with respect to privacy,
search engine preferences, download options,
auto signature, autofill and autocomplete feature
and much more.

• Add-ons and plug-ins are the tools that help
to extend and modify the functionality of the
browser.
• A cookie is a text file containing a string of
information which stores browsing information
on the hard disk of your computer.
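The difference between star and mesh topology summarised above can be made concrete by counting cables. The sketch below assumes a star in which every device has one cable to a separate central hub or switch, and a full mesh in which every pair of devices has its own cable. The function name `links_needed` is only illustrative.

```python
def links_needed(topology, n):
    """Return the number of cables (links) needed to connect n devices."""
    if topology == "star":
        return n                  # one cable from each device to the central hub/switch
    if topology == "mesh":
        return n * (n - 1) // 2   # one cable between every pair of devices
    raise ValueError("unknown topology: " + topology)

for n in (4, 10):
    print(n, "devices -> star:", links_needed("star", n),
          "mesh:", links_needed("mesh", n))
```

The mesh count grows much faster than the star count, which is one reason a full mesh is costly for large networks.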

Exercise
1. Fill in the blanks:
a) To transmit data for sharing on a network, it
has to be divided into smaller chunks called
______________________.
b) The set of rules that decide the functioning of a
network is called _______________.
c) A LAN can be extended up to a distance of
__________ km.
d) The ___________________ connects a local area
network to the internet.
e) The _____________ topology is of hierarchical
nature.
f) ____________________ is a standard naming
convention used for accessing resources over the
Internet.
g) ______________ is a collection of related web pages.
h) A _____________ is a computer that provides
services to other programs or computers.
2. Expand the following:
a) ARPANET
b) ISP
c) URL
3. Name the device for the following:
a) It stands for Modulator Demodulator
b) It regenerates the signals.
4. Differentiate between:
a) MAN and WAN
b) Website and web page
c) Router and Gateway

d) Bus and Star topology
e) Static and Dynamic web pages
5. Define a network. What is the need of forming a
network?
6. Give any two examples of networks.
7. Give any three applications on the Internet.
8. Name any two mail service providers.
9. Explain VoIP.
10. What is DNS?
11. Identify the type of topology from the following:
a) Each node is connected with the help of a single
cable.
b) Each node is connected to a central switching
device through independent cables.
12. Sahil, a Class X student, has just started
understanding the basics of Internet and web
technologies. He is a bit confused between the
terms “World Wide Web” and “Internet”. Help him
in understanding both the terms with the help of
suitable examples of each.
13. Murugan wants to send a report on his trip to the
North East to his mentor. The report contains images
and videos. How can he accomplish his task through
the Internet?
14. Mampi is planning to open a company that deals
with rural handicrafts. She wants to advertise about
handicrafts on a social platform. Which Internet
service should she use and why?
15. Ruhani wants to edit some privacy settings of her
browser. How can she accomplish her task?
16. Shubham wants to play a video in his browser but
he is not able to do so. A message on the screen
instructs him to install the Adobe Flash Player plug-
in. Help him to add it in his browser.
17. When Joe typed a URL in the address bar of his
browser, Error 404 was displayed. Why did this
happen? What can be done to avoid it?

Chapter 6 Societal Impacts

“I think computer viruses should
count as life. I think it says
something about human nature
that the only form of life we have
created so far is purely destructive.
We’ve created life in our own
image.”
— Stephen Hawking

In this chapter
»» Introduction
»» Digital Footprints
»» Digital Society and Netizen
»» Data Protection
»» Creative Commons
»» Cyber Crime
»» Indian Information Technology Act (IT Act)
»» E-waste: Hazards and Management
»» Impact on Health

6.1 Introduction
In recent years, the world around us has
seen a lot of changes due to the use of ‘Digital
Technologies’. These changes have made a
dramatic impact on our lives, making things
more convenient, faster, and easier to handle.
In the past, a letter would take days to reach,
and every recipient would get his or her own
copy and respond separately. Today, one can
send an email to more than one
person at a time. The instantaneous nature
of electronic communications has made us
more efficient and productive.
From the banking industry to aviation,
industrial production to e-commerce,
especially with regard to the delivery of their
goods and services, all are now dependent on the use
of computers and digital technologies. Applications
of digital technologies have redefined and evolved all
spheres of human activities. Today more and more people
are using digital technologies through smartphones,
computers, etc., with the help of high speed Internet.
Why did the digital technologies become so
widespread? The introduction of personal computers
(PCs) and Internet followed by smartphones has brought
these technologies to the common man.
While we reap the benefits of digital technologies,
these technologies can also be misused. Let’s look at
the impact of these technologies on our society and the
best practices that can ensure a productive and safe
digital environment for us.

6.2 Digital Footprints


Have you ever searched online for any information?
Have you ever purchased an online ticket, or responded
to your friend’s email, or checked the score of a
game online? Whenever we surf the Internet using
smartphones, tablets, computers, etc., we leave a
trail of data reflecting the activities performed by us
online, which is our digital footprint.
Our digital footprint can be created and used with
or without our knowledge. It includes websites we
visit, emails we send, and any information we submit
online, etc., along with the computer’s IP address,
location, and other device specific details. Such data
could be used for targeted advertisement or could
also be misused or exploited. Thus, it is good to be
aware of the data trail we might be leaving behind.
This awareness should make us cautious about what
we write, upload or download or even browse online.
There are two kinds of digital footprints we leave
behind. Active digital footprints include data
that we intentionally submit online. This would
include emails we write, or responses or posts we
make on different websites or mobile Apps, etc. The
digital data trail we leave online unintentionally is
called passive digital footprints. This includes the
data generated when we visit a website, use a mobile
App, browse the Internet, etc., as shown in Figure 6.1.

Figure 6.1: Exemplar web applications that result in digital footprints
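As an illustration of a passive digital footprint, the Python sketch below picks apart one line of the kind that many web servers record for every page request. The log line itself is invented (the IP address, page and device string are hypothetical), and real servers may record more or fewer details.

```python
import re

# A made-up web server log entry for a single page visit.
log_line = ('203.0.113.7 - - [10/Sep/2020:12:36:54 +0530] '
            '"GET /tickets/search?from=Delhi HTTP/1.1" 200 5123 '
            '"-" "Mozilla/5.0 (Android 10; Mobile)"')

# Named groups pick out the details that identify the visit.
pattern = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<page>\S+) [^"]*" '
    r'\d+ \d+ "[^"]*" "(?P<device>[^"]*)"'
)

footprint = pattern.match(log_line).groupdict()
for field, value in footprint.items():
    print(field, ":", value)
```

Even though the visitor typed nothing except a search, the entry reveals the IP address, the time of the visit, the page requested and the kind of device used.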

Everyone who is connected to the Internet may have
a digital footprint. With more usage, the trail grows. On
examining the browser settings, we can find out how it
stores our browsing history, cookies, passwords, auto
fills, and many other types of data.
Besides the browser, most of our digital footprints are
stored in servers where the applications are hosted.
We may not have access to remove or erase that data,
neither do we have any control on how that data will
be used. Therefore, once a data trail is generated, even
if we later try to erase data about our online activities,
the digital footprints still remain. There is no guarantee
that digital footprints will be fully eliminated from the
Internet. Therefore, we need to be more cautious while
being online! All our online activities leave a data trace
on the Internet as well as on the computing device that
we use. This can be used to trace the user, their location,
device and other usage details.

Think and Reflect
Can your digital footprints be used to judge your attitude and work ethics?

6.3 Digital Society and Netizen


As our society is inclined towards using more and
more digital technologies, we end up managing most
of our tasks digitally. In this era of digital society, our
daily activities like communication, social networking,
banking, shopping, entertainment, education,
transportation, etc., are increasingly being driven by
online transactions.

Activity 6.1
As a digital citizen, list various services that you avail online.

Digital society thus reflects the growing trend of
using digital technologies in all spheres of human
activities. But while online, all of us need to be aware
of how to conduct ourselves, how best to relate with
others and what ethics, morals and values to maintain.
Anyone who uses digital technology along with Internet
is a digital citizen or a netizen. Being a good netizen
means practicing safe, ethical and legal use of digital
technology. A responsible netizen must abide by
net etiquettes, communication etiquettes and social
media etiquettes.
6.3.1 Net Etiquettes
We follow certain etiquettes during our social
interactions. Similarly, we need to exhibit proper
manners and etiquettes while being online as shown
in Figure 6.2. One should be ethical, respectful and
responsible while surfing the Internet.

(A) Be Ethical
• No copyright violation: we should not
use copyrighted materials without the
permission of the creator or owner. As
an ethical digital citizen, we need to be
careful while streaming audio or video
or downloading images and files from
the Internet. We will learn more about
copyright in Section 6.4.
• Share the expertise: it is good to share
information and knowledge on the Internet
so that others can access it. However,
prior to sharing information, we need to
be sure that we have sufficient knowledge
on that topic. The information shared
should be true and unambiguous. Also,
in order to avoid redundant information,
we should verify that the information is
not available already on the Internet.

Figure 6.2: Net etiquettes

(B) Be Respectful
• Respect privacy: as good digital citizens we
have the right to privacy and the freedom of
personal expression. At the same time, we have
to understand that other digital citizens also
have the same rights and freedoms. Our personal
communication with a digital citizen may include
images, documents, files, etc., that are private
to both. We should respect this privacy and
should not share those images, documents, files,
etc., with any other digital citizen without each
other’s consent.
• Respect diversity: in a group or public forum,
we should respect the diversity of the people
in terms of knowledge, experience, culture and
other aspects.

While surfing the Internet, we should be cautious about our personal and confidential data.
√√ Think before sharing credentials with others on an online platform.
√√ Keep personal information safe and protected through passwords.

(C) Be Responsible
• Avoid cyber bullying: any insulting, degrading
or intimidating online behaviour like repeated
posting of rumours, giving threats online,
posting the victim’s personal information, sexual
harassment or comments aimed to publicly
ridicule a victim is termed as cyber bullying.
It implies repeatedly targeting someone with
intentions to hurt or embarrass. Perhaps new or
non-frequent users of the Internet feel that things
done online have no effect in the real world. We
need to realise that bullying online can have very
serious implications on the other person (victim).
Also, remember our actions can be traced back
using our digital footprints.
• Don’t feed the troll: an Internet troll is a person
who deliberately sows discord on the Internet by
starting quarrels or upsetting people, by posting
inflammatory or off topic messages in an online
community, just for amusement. Since trolls thrive
on attention, the best way to discourage trolls is
not to pay any attention to their comments.

Activity 6.2
Find out how to report about an abusive or inappropriate post or about a sender in a social network.

6.3.2 Communication Etiquettes


Digital communication includes email, texting, instant
messaging, talking on the cell phone, audio or video
conferencing, posting on forums, social networking
sites, etc. All these are great ways to connect with people
in order to exchange ideas, share data and knowledge.
Good communication over email, chat room and other
such forums requires a digital citizen to abide by the
communication etiquettes as shown in Figure 6.3.

Figure 6.3: Communication etiquettes (Be Precise: respect time, respect data limits; Be Polite; Be Credible)

Avoid Spam!!
On receiving junk email (called Spam), neither reply nor open any attachment in such email.

(A) Be Precise
• Respect time: we should not waste precious time
in responding to unnecessary emails or comments
unless they have some relevance for us. Also, we
should not always expect an instant response as
the recipient may have other priorities.
• Respect data limits: For concerns related to data
and bandwidth, very large attachments may be
avoided. Rather send compressed files or a link to
the files through cloud shared storage like Google
Drive, Microsoft OneDrive, Dropbox, etc.

(B) Be Polite
Whether the communication is synchronous (happening
in real time like chat, audio/video calls) or asynchronous
(like email, forum post or comments), we should be
polite and non-aggressive in our communication. We
should avoid being abusive even if we don’t agree with
others’ point of view.

No Permanent Deletion!!
We can post or comment anything on Internet, and delete it later.
√√ But remember, it cannot be permanently deleted. It is recorded in our Digital Footprint.
√√ This is how many culprits who spread hate, bully others or engage in criminal activities are traced and apprehended.

(C) Be Credible
We should be cautious while making a comment,
replying or writing an email or forum post as such acts
decide our credibility over a period of time. That is how
we decide to follow some particular person’s forum posts
while ignoring posts of other members of the forum. On
various discussion forums, we usually try to go through
the previous comments of a person and judge their
credibility before relying on that person’s comments.
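The advice on respecting data limits suggests sending compressed files instead of very large attachments. A minimal Python sketch of this idea, using the standard `zipfile` module, is given below. The file names and the sample text are hypothetical, and the saving depends on the type of file: text compresses well, while photos and videos are usually compressed already.

```python
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as folder:
    report_path = os.path.join(folder, "trip_report.txt")   # hypothetical file name
    archive_path = os.path.join(folder, "trip_report.zip")

    # Create a sample report with a lot of repeated text.
    with open(report_path, "w") as report:
        report.write("Day 1: travelled to the North East.\n" * 2000)

    # Compress the report into a ZIP archive.
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.write(report_path, arcname="trip_report.txt")

    original_size = os.path.getsize(report_path)
    compressed_size = os.path.getsize(archive_path)

print("Original size  :", original_size, "bytes")
print("Compressed size:", compressed_size, "bytes")
```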
6.3.3 Social Media Etiquettes
In the current digital era, we are familiar with different
kinds of social media and we may have an account on
Facebook, Google+, Twitter, Instagram, Pinterest,
or a YouTube channel. Social media are websites or
applications that enable their users to participate in
social networking by creating and sharing content with
others in the community. These platforms encourage
users to share their thoughts and experiences through
posts or pictures. In this way users can interact with
other online users of those social media apps or channels.
This is why the impact and outreach of social media has
grown exponentially. It has begun to shape the outcome
of politics, business, culture, education and more. In
social media too, there are certain etiquettes we need to
follow as shown in Figure 6.4.

Figure 6.4: Social media etiquettes (choose password wisely; know who you befriend; beware of fake information; think before you upload)

Don’t Meet Up!!
√√ Never arrange to meet an online friend because it may not be safe.
√√ No matter how genuine someone is appearing online, they might be pretending and hiding their real identity.

(A) Be Secure
• Choose password wisely: it is vital for social
network users. News of breaching or leakage of user
data from social network often attracts headlines.
Users should be wary of such possibilities and
must know how to safeguard themselves and
their accounts. The minimum one can do is to
have a strong and frequently changed password.
Never share personal credentials like username
and password with others.
• Know who you befriend: social networks usually
encourage connecting with users (making friends),
sometimes even those whom we don’t know or have
not met. However, we need to be careful while
befriending unknown people as their intentions
possibly could be malicious and unsafe.
• Beware of fake information: fake news, messages
and posts are common in social networks. As a
user, we should be aware of them. With experience,
we should be able to figure out whether a news item,
message or post is genuine or fake. Thus, we
should not blindly believe in everything that we
come across on such platforms; we should apply
our knowledge and experience to validate such
news, message or post.

Think and Reflect
Is having the same password for all your accounts on different websites safe?
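The advice to choose a password wisely can be tried out in Python. The sketch below checks a password against a few simple rules and also generates a random one with the standard `secrets` module. The rules (at least 12 characters, mixed character types) and the symbol set are assumptions chosen for illustration, not an official standard, and the function names are hypothetical.

```python
import secrets
import string

SYMBOLS = "!@#$%^&*"

def password_issues(password):
    """Return a list of reasons why the password looks weak (empty list = looks fine)."""
    issues = []
    if len(password) < 12:
        issues.append("shorter than 12 characters")
    if not any(ch.islower() for ch in password):
        issues.append("no lowercase letter")
    if not any(ch.isupper() for ch in password):
        issues.append("no uppercase letter")
    if not any(ch.isdigit() for ch in password):
        issues.append("no digit")
    if not any(ch in SYMBOLS for ch in password):
        issues.append("no symbol")
    return issues

def generate_password(length=16):
    """Pick random characters until the result passes every check above."""
    if length < 12:
        raise ValueError("length must be at least 12")
    alphabet = string.ascii_letters + string.digits + SYMBOLS
    while True:
        candidate = "".join(secrets.choice(alphabet) for _ in range(length))
        if not password_issues(candidate):
            return candidate

print(password_issues("technology"))          # a common dictionary word
print(password_issues(generate_password()))   # a freshly generated password
```

A password that passes these checks can still be unsafe if it is reused on many websites or shared with others.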

(B) Be Reliable
• Think before uploading: we can upload almost
anything on a social network. However, remember
that once uploaded, it may remain on the
remote server even if we delete the files. Hence we
need to be cautious while uploading or sending
sensitive or confidential files which have a bearing
on our privacy.

Play Safe!!
Think carefully before sharing personal photos.

6.4 Data Protection


In this digital age, data or information protection
is mainly about the privacy of data stored digitally.
Elements of data that can cause substantial harm,
embarrassment, inconvenience and unfairness to an
individual, if breached or compromised, are called sensitive
data. Examples of sensitive data include biometric
information, health information, financial information,
or other personal documents, images or audios or
videos. Privacy of sensitive data can be implemented by
encryption, authentication, and other secure methods
to ensure that such data is accessible only to the
authorised user and is for a legitimate purpose.
All over the world, each country has its own data
protection policies (laws). These policies are legal
documents that provide guidelines to the user on
processing, storage and transmission of sensitive
information. The motive behind implementation of
these policies is to ensure that sensitive information is
appropriately protected from modification or disclosure.

Activity 6.3
Suppose someone's email password is 'technology', which is weak. Can you suggest a stronger password?

Think and Reflect
Why should we always mention the source from which we got an idea or used resources (text, image, audio, video, etc.) to prepare a project or a writeup?
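Authentication is mentioned above as one of the methods for keeping sensitive data private. One common practice, sketched below in Python, is that a service does not store the password itself but only a salted hash of it, and compares hashes at login. This is a simplified illustration using the standard `hashlib` and `hmac` modules. The function names, the sample passwords and the iteration count are assumptions, not the method of any particular website.

```python
import hashlib
import hmac
import os

def hash_password(password, salt=None):
    """Return (salt, digest) for the given password using PBKDF2-HMAC-SHA256."""
    if salt is None:
        salt = os.urandom(16)   # a fresh random salt for each user
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)
    return salt, digest

def verify_password(password, salt, expected_digest):
    """Check a login attempt against the stored salt and digest."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected_digest)

# Only the salt and the digest would be stored, never the password itself.
salt, stored = hash_password("Tr1p-to-Shillong!")
print(verify_password("Tr1p-to-Shillong!", salt, stored))   # correct password
print(verify_password("technology", salt, stored))          # wrong password
```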
6.4.1 Intellectual Property Right (IPR)
When someone owns a house or a motorcycle, we
say that the person owns that property. Similarly,
if someone comes out with a new idea, this original
idea is that person’s intellectual property. Intellectual
Property refers to the inventions, literary and artistic
expressions, designs and symbols, names and logos.
The ownership of such concepts lies with the creator,
Executing IPR: say or the holder of the intellectual property. This enables
for a software the creator or copyright owner to earn recognition or
√√ Code of the financial benefit by using their creation or invention.
software will be Intellectual Property is legally protected through
protected by
a copyright copyrights, patents, trademarks,etc.
√√ Functional (A) Copyright
expression of Copyright grants legal rights to creators for their original
the idea will be
protected by a
works like writing, photograph, audio recordings, video,
patent sculptures, architectural works, computer software,
√√ The name and and other creative works like literary and artistic work.
logo of the Copyrights are automatically granted to creators and
software will come authors. Copyright law gives the copyright holder a set
under a registered of rights that they alone can avail legally. The rights
trademark include right to copy (reproduce) a work, right to create

2024-25

Chapter 6.indd 174 10/9/2020 12:36:54 PM


Societal Impacts 175

derivative works based upon it, right to distribute copies


of the work to the public, and right to publicly display
or perform the work. It prevents others from copying,
Activity 6.4
using or selling the work. For example, writer Rudyard
Kipling holds the copyright to his novel, ‘The Jungle Explore the following
Book’, which tells the story of Mowgli, the jungle boy. websites to know
about open/public
It would be an infringement of the writer’s copyright if licensing:
someone used parts of the novel without permission. To
(i) creativecommons.
use other’s copyrighted material, one needs to obtain a org for CC, and
license from them.
(ii) gnu.org for GNU
(B) Patent GPL.
A patent is usually granted for inventions. Unlike
copyright, the inventor needs to apply (file) for patenting
the invention. When a patent is granted, the owner gets
an exclusive right to prevent others from using, selling,
or distributing the protected invention. Patent gives full
control to the patentee to decide whether or how the
invention can be used by others. Thus it encourages
inventors to share their scientific or technological
findings with others. A patent protects an invention for
20 years, after which it can be freely used. Recognition
and/or financial benefit foster the right environment, and
provide motivation for more creativity and innovation.

Only the copyright owner of a work can enter into a license agreement.
(C) Trademark
Trademark includes any visual symbol, word, name,
design, slogan, label, etc., that distinguishes the
brand or commercial enterprise from other brands
or commercial enterprises. For example, no company
other than Nike can use the Nike brand to sell shoes or
clothes. It also prevents others from using a confusingly
similar mark, including words or phrases. For example,
confusing brands like “Nikke” cannot be used. However,
it may be possible to apply for the Nike trademark for
unrelated goods like notebooks.
6.4.2 Licensing
We have studied about copyright in the previous section.
Licensing and copyrights are two sides of the same coin.
A license is a type of contract or a permission agreement
in which the creator of an original work permits
someone to use the work, generally for some price;
whereas copyright is the set of legal rights of the creator for the
protection of original work of different types. Licensing
is the legal term used to describe the terms under which
people are allowed to use the copyrighted material. We
will limit our study to software licensing in this chapter.

End User License Agreement (EULA) contains the dos and don’ts with respect to the software being purchased. It covers all clauses of software purchase, viz., how many copies can be installed, whether source is available, whether it can be modified and redistributed and so on.

A software license is an agreement that provides
legally binding guidelines pertaining to the authorised
use of digital material. The digital material may include
any software or any form of art, literature, photos,
etc., in digital form. Any such resource posted on the
Internet constitutes intellectual property and must
be downloaded, used or distributed according to the
guidelines given in the license agreement. Failure to
follow such guidelines is considered as an infringement of
Intellectual Property Rights (IPR), and is a criminal offence.

Beware!!
√√ Plagiarism means using others’ work and not giving adequate citation for use.
√√ Copyright infringement means using another person’s work without permission or without paying for it, if it is being sold.
6.4.3 Violation of IPR
Violation of intellectual property right may happen in
one of the following ways:
(A) Plagiarism
With the availability of Internet, we can instantly copy
or share text, pictures and videos. Presenting someone
else’s idea or work as one’s own idea or work is called
plagiarism. If we copy some contents from Internet,
but do not mention the source or the original creator,
then it is considered as an act of plagiarism. Further, if
someone derives an idea or a product from an already
existing idea or product, but instead presents it as a
new idea, then also it is plagiarism. It is a serious ethical
offense and sometimes considered as an act of fraud.
Even if we take contents that are open for public use,
we should cite the author or source to avoid plagiarism.
(B) Copyright Infringement
Copyright infringement is when we use another person’s
work without obtaining their permission to use it, or
without paying for it if it is being sold. Suppose we
download an image from the Internet and use it in our
project. If the owner of the copyright of the image
does not permit its free usage, then using such an image,
even after giving reference of the image in our project, is a
violation of copyright. Just because it is on the Internet
does not mean that it is free for use. Hence, check the
copyright status of a creator’s work before using it to avoid
copyright infringement.

(C) Trademark Infringement
Trademark Infringement means unauthorised use of
others’ trademark on products and services. An owner
of a trademark may commence legal proceedings against
someone who infringes its registered trademark.
6.4.4 Public Access and Open Source Software
Copyright sometimes puts restrictions on the usage of
the copyrighted works by anyone else. If others are
allowed to use and build upon the existing work, it
will encourage collaboration and would result in new
innovations in the same direction. Licenses provide
rules and guidelines for others to use the existing work.
When authors share their copyrighted works with others
under public license, it allows others to use and even
modify the content. Open source licenses help others to
contribute to existing work or project without seeking
special individual permission to do so.

Remember
√√ CC licenses are a set of copyright licenses that give the recipients rights to copy, modify and redistribute the creative material, while giving the authors the liberty to decide the conditions of licensing.
√√ GPL is the most widely used free software license which grants the recipients rights to copy, modify and redistribute the software and ensures that the same rights are preserved in all derivative works.

The GNU General Public License (GPL) and the
Creative Commons (CC) are two popular categories of
public licenses. CC is used for all kinds of creative works
like websites, music, film, literature, etc. CC enables
the free distribution of an otherwise copyrighted work.
It is used when an author wants to give people the right
to share, use and build upon a work that they have
created. The GNU GPL is primarily designed for providing
a public licence for software. It is a free software
license, which provides end users the freedom to run,
study, share and modify the software, besides getting
regular updates.
Users or companies who distribute GPL licensed
works may charge a fee for copies or give them free of
charge. This distinguishes the GPL from the licenses of
freeware like Skype, Adobe Acrobat Reader,
etc., which allow copying for personal use but prohibit
commercial distribution, or proprietary licenses where
copying is prohibited by copyright law.
Many of the proprietary software products that we use are sold
commercially and their program code (source code) is
not shared or distributed. However, there is certain
software available freely for anyone, whose source
code is also open for anyone to access, modify, correct
and improve. Free and open source software (FOSS)
has a large community of users and developers who are
contributing continuously towards adding new features
or improving the existing features. For example, Linux
kernel-based operating systems like Ubuntu and Fedora
come under FOSS. Some of the popular FOSS tools are
office packages like LibreOffice, browsers like Mozilla
Firefox, etc.
Software piracy is the unauthorised use or distribution
of software. Those who purchase a license for a copy of
the software do not have the rights to make additional
copies without the permission of the copyright owner. It
amounts to copyright infringement regardless of whether
it is done for sale, for free distribution or for copier’s
own use. One should avoid software piracy. Using a
pirated software not only degrades the performance of a
computer system, but also affects the software industry
which in turn affects the economy of a country.

6.5 Creative Commons


Creative Commons is a non-profit organisation
(https://creativecommons.org/) that aims to build a publicly
accessible global platform where a range of creative and
academic works are shared freely. Anyone across the
globe can access them, share them, and even use them
for creating their own work out of it without infringing the
copyright or Intellectual Property rights of the owners.
In fact, it gives proper attribution to the owners.
The Creative Commons organisation provides
Creative Commons (CC) licenses free of charge. It allows
owners of a work to grant copyright permissions for their
creative and/or academic works in a free, simple and
standardised way. A CC license is a type of copyright
license that enables the free distribution of anybody’s
copyrighted work. This license is used when an author
wants to give others the right to share, use and extend
the work done by them. The work licensed under CC
is governed by the Copyright law and so applies to all
types of work including art, music, literature, dramatics,
movies, images, educational resources, photographs
and software. The CC Search feature of the online
platform makes the licensed material easier to find. The
author of the content is given full freedom to set up
conditions to use their work. The owner of a work can
combine these conditions to create six different types of
CC licenses, as listed in Table 6.1.


Table 6.1 Creative Commons (CC) Licenses


License name | Symbolic name | Description
Attribution | CC BY | This license lets others distribute, remix, tweak, and build upon your work, even commercially, as long as they credit you for the original creation.
Attribution-ShareAlike | CC BY-SA | This license lets others remix, tweak, and build upon your work even for commercial purposes, as long as they credit you and license their new creations under the identical terms.
Attribution-NoDerivs | CC BY-ND | This license lets others reuse the work for any purpose, including commercially; however, it cannot be shared with others in adapted form, and credit must be provided to you.
Attribution-NonCommercial | CC BY-NC | This license lets others remix, tweak, and build upon your work non-commercially. Their new works must also acknowledge you and be non-commercial.
Attribution-NonCommercial-ShareAlike | CC BY-NC-SA | This license lets others remix, tweak, and build upon your work non-commercially, as long as they credit you and license their new creations under the identical terms.
Attribution-NonCommercial-NoDerivs | CC BY-NC-ND | This license is the most restrictive of the six main licenses, only allowing others to download your works and share them with others as long as they credit you, but they can’t change them in any way or use them commercially.
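
The conditions in Table 6.1 can be written down as data. The following is only a minimal sketch in Python: the dictionary layout, the flag names and the function is_use_allowed are illustrative assumptions, not part of any official Creative Commons tool. Attribution is required by all six licenses, so it is not stored as a flag.

```python
# Hypothetical encoding of Table 6.1; the flag names are assumptions.
CC_LICENSES = {
    "CC BY":       {"commercial": True,  "adaptations": True,  "share_alike": False},
    "CC BY-SA":    {"commercial": True,  "adaptations": True,  "share_alike": True},
    "CC BY-ND":    {"commercial": True,  "adaptations": False, "share_alike": False},
    "CC BY-NC":    {"commercial": False, "adaptations": True,  "share_alike": False},
    "CC BY-NC-SA": {"commercial": False, "adaptations": True,  "share_alike": True},
    "CC BY-NC-ND": {"commercial": False, "adaptations": False, "share_alike": False},
}


def is_use_allowed(license_code, commercial, adapted):
    """Check a planned use against the license terms (credit is always required)."""
    terms = CC_LICENSES[license_code]
    if commercial and not terms["commercial"]:
        return False  # NonCommercial licenses forbid commercial use
    if adapted and not terms["adaptations"]:
        return False  # NoDerivs licenses forbid sharing adapted versions
    return True


# A school project that remixes a CC BY-NC image without selling it:
print(is_use_allowed("CC BY-NC", commercial=False, adapted=True))
```

The share_alike flag is kept in the data so that a reader can extend the sketch, for example to remind the user that a remix of a CC BY-SA work must carry the identical license.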

6.6 Cyber Crime


Criminal activities or offences carried out in a digital
environment can be considered as cyber crime. In such
crimes, either the computer itself is the target or the
computer is used as a tool to commit a crime. Cyber
crimes are carried out against an individual, a group,
an organisation or even a country, with the intent to
directly or indirectly cause physical harm, financial
loss or mental harassment. A cyber criminal attacks a
computer or a network to reach other computers in order
to disable or damage data or services. Apart from this,
a cyber criminal may spread viruses and other malware
in order to steal private and confidential data for
blackmailing and extortion. A computer virus is some
lines of malicious code that can copy itself and can
have a detrimental effect on computers by destroying
data or corrupting the system. Similarly, malware is
software specifically designed to gain unauthorised
access to computer systems. Criminal activities are
increasing alarmingly day by day, with frequent reports
of hacking, ransomware attacks, denial-of-service,
phishing, email fraud, banking fraud and identity theft.

Remember!!
Cyber crime is defined as a crime in which the computer
is the medium of crime (hacking, phishing, spamming), or
the computer is used as a tool to commit crimes
(extortion, data breaches, theft).

Activity 6.5
How can you unsubscribe from a mail group or block an
email sender?

6.6.1 Hacking
Hacking is the act of unauthorised access to a computer,
computer network or any digital system. Hackers usually
have technical expertise of the hardware and software.
They look for bugs to exploit and break into the system.
Hacking, when done with a positive intent, is called
ethical hacking. Such ethical hackers are known as
white hat hackers. They are specialists in finding
vulnerabilities or loopholes during testing of the
software. Thus, they help in improving the security of
the software. An ethical hacker may exploit a website in
order to discover its security loopholes or vulnerabilities,
and then reports the findings to the website owner. Thus,
ethical hacking actually prepares the owner against
any cyber attack.
A non-ethical hacker is the one who tries to gain
unauthorised access to computers or networks in order
to steal sensitive data with the intent to damage or
bring down systems. They are called black hat hackers
or crackers. Their primary focus is on security cracking
and data stealing. They use their skill for illegal or
malicious purposes. Such hackers try to break through
system securities for identity theft, monetary gain, to
bring a competitor or rival site down, to leak sensitive
information, etc.

Beware !!
Accepting links from untrusted emails can be hazardous,
as they may potentially contain a virus or link to a
malicious website. We should open an email link or
attachment only when it is from a trusted source and
doesn’t look doubtful.

6.6.2 Phishing and Fraud Emails
Phishing is an unlawful activity where fake websites or
emails that look original or authentic are presented to
the user to fraudulently collect sensitive and personal
details, particularly usernames, passwords, banking
and credit card details. The most common phishing
method is email spoofing, where a fake or
forged email address is used and the user presumes
it to be from an authentic source. So you might get an
email from an address that looks similar to that of your bank
or educational institution, asking for your information,
but if you look carefully you will see that the URL
is fake. Such emails often use the logos of the original
organisation, making them difficult to distinguish from
the real ones. Phishing attempts
through phone calls or text messages are also common
these days.
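
Looking carefully at a URL, as advised above, can be sketched in Python. This is only an illustrative check under stated assumptions: the function name belongs_to_domain and the domain bank.example are hypothetical, and a match does not prove that a link is safe. The sketch uses urlparse from Python's standard library to extract the host name and compares it with the domain you expect.

```python
from urllib.parse import urlparse


def belongs_to_domain(url, official_domain):
    """Return True only if the URL's host is the official domain or its sub-domain."""
    host = (urlparse(url).hostname or "").lower()
    official_domain = official_domain.lower()
    return host == official_domain or host.endswith("." + official_domain)


# bank.example is a made-up domain used only for illustration.
print(belongs_to_domain("https://www.bank.example/login", "bank.example"))
# A look-alike address that merely starts with the bank's name:
print(belongs_to_domain("http://bank.example.verify-login.test/renew", "bank.example"))
```

The check reads the host name from right to left, because a fake address can begin with the name of the real organisation and still belong to someone else.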
(A) Identity Theft
Identity thieves increasingly use personal information
stolen from computers or computer networks to commit
fraud. A user’s identifiable personal data, like
demographic details, email ID, banking credentials,
passport, PAN and Aadhaar number, are stolen and
misused by the hacker in the victim’s name. This
is one type of phishing attack where the intention is
largely monetary gain. There can be many ways in
which the criminal takes advantage of an individual’s
stolen identity. Given below are a few examples:
• Financial identity theft: when the stolen identity is
used for financial gain.
• Criminal identity theft: criminals use a victim’s
stolen identity to avoid detection of their
true identity.
• Medical identity theft: criminals can seek medical
drugs or treatment using a stolen identity.

Activity 6.6
Explore and find out how to file a complaint with the
cyber cell in your area.

6.6.3 Ransomware
This is another kind of cyber crime where the attacker
gains access to the computer and blocks the user from
accessing their data, usually by encrypting it. The attacker
blackmails the victim to pay for getting access to the
data, or sometimes threatens to publish personal and
sensitive information or photographs unless a ransom
is paid.
Ransomware can get downloaded when users
visit malicious or insecure websites or download
software from doubtful repositories. Some ransomware
is sent as email attachments in spam mails. It can
also reach our system when we click on a malicious
advertisement on the Internet.
6.6.4 Combatting and Preventing Cyber Crime
The challenges of cyber crime can be mitigated with
the twin approach of being alert and taking legal help.
The following points can be considered as safety measures
to reduce the risk of cyber crime:
• Take regular backup of important data.
• Use antivirus software and always keep it updated.
• Avoid installing pirated software. Always download
software from known and secure (HTTPS) sites.
• Always update the system software, which includes
the Internet browser and other application software.
• Do not visit or download anything from untrusted
websites.
• Usually the browser alerts users about doubtful
websites whose security certificate could not be
verified; avoid visiting such sites.
• Use a strong password for web login, and change
it periodically. Do not use the same password for
all the websites. Use different combinations
of alphanumeric characters including special
characters. Avoid common words or names
in passwords.
• While using someone else’s computer, don’t allow
the browser to save passwords or auto-fill data, and try
to browse in a private browser window.
• For an unknown site, do not agree to use cookies
when asked through a Yes/No option.
• Perform online transactions like shopping, ticketing,
and other such services only through well-known
and secure sites.
• Always secure the wireless network at home with
a strong password and change it regularly.

Digital signatures are the digital equivalent of a
paper certificate. Digital signatures work on a unique
digital ID issued by a Certifying Authority (CA) to
the user. Signing a document digitally means attaching
that user's identity, which can be used to authenticate.
A licensed Certifying Authority (CA), which has been
granted a license under Section 24 of the Indian
IT Act, 2000, can issue the digital signature.
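
The password advice in the safety measures above can be turned into a small check. The following Python sketch is only an illustration: the minimum length of 8 and the short list of common words are assumptions made for this example and are not stated in the text, and the function name password_problems is hypothetical.

```python
import string

# Assumed values for illustration; a real checker would use a far longer word list.
MIN_LENGTH = 8
COMMON_WORDS = {"password", "qwerty", "admin", "welcome", "india"}


def password_problems(password):
    """Return a list of weaknesses found in the password (empty if none)."""
    problems = []
    if len(password) < MIN_LENGTH:
        problems.append("too short")
    if not any(ch.isalpha() for ch in password):
        problems.append("no letters")
    if not any(ch.isdigit() for ch in password):
        problems.append("no digits")
    if not any(ch in string.punctuation for ch in password):
        problems.append("no special characters")
    lowered = password.lower()
    if any(word in lowered for word in COMMON_WORDS):
        problems.append("contains a common word")
    return problems


print(password_problems("password"))
print(password_problems("T7#kd9!vQx"))
```

An empty list means that none of the listed weaknesses were found; it does not guarantee that the password is impossible to guess.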

6.7 Indian Information Technology Act (IT Act)


With the growth of the Internet, many cases of cyber crimes,
frauds, cyber attacks and cyber bullying are reported.
The nature of fraudulent activities and crimes keeps
changing. To deal with such menaces, many countries
have come up with legal measures for protection of
sensitive personal data and to safeguard the rights
of Internet users. The Government of India’s
Information Technology Act, 2000 (also known as the IT
Act), amended in 2008, provides guidelines to the user
on the processing, storage and transmission of sensitive
information. In many Indian states, there are cyber
cells in police stations where one can report any cyber
crime. The Act provides a legal framework for electronic
governance by giving recognition to electronic records
and digital signatures. The Act outlines cyber crimes
and penalties for them.
The Cyber Appellate Tribunal has been established
to resolve disputes arising from cyber crime, such as
tampering with computer source documents, hacking
the computer system, using the password of another person,
publishing sensitive personal data of others without
their consent, etc. The Act is needed so that people can
perform transactions over the Internet through credit
cards without fear of misuse. The Act also empowers
government departments to accept filing,
creation and storage of official documents in the digital
format.

California Law University has identified non-functioning
cathode ray tubes (CRTs) from televisions and computer
monitors as hazardous.

6.8 E-waste: Hazards and Management


E-waste or Electronic waste includes electric or
electronic gadgets and devices that are no longer in use.
Hence, discarded computers, laptops, mobile phones,
televisions, tablets, music systems, speakers, printers,
scanners, etc. constitute e-waste when they are near or
at the end of their useful life.
E-waste is becoming one of the fastest growing
environmental hazards in the world today. The
increased use of electronic equipment has also caused
an exponential increase in the number of discarded
products. Lack of awareness and appropriate skill
to manage it has further worsened the problem. So,
Waste Electrical and Electronic Equipment (WEEE) is
becoming a major concern for all countries across the
world. Globally, e-waste constitutes more than 5 per
cent of the municipal solid waste. Therefore, it is very
important that e-waste is disposed of in such a manner
that it causes minimum damage to the environment
and society.

Leaching is the process of removing a substance from
another substance by passing water through it.

6.8.1 Impact of e-waste on environment
To some extent, e-waste is responsible for the degradation
of our environment. Whether it is emission of gases and
fumes into the atmosphere, discharge of liquid waste
into drains or disposal of solid e-waste materials, all of
this contributes to environmental pollution in some way
or the other.
When e-waste is carelessly thrown or dumped in
landfills or dumping grounds, certain elements or
metals used in production of electronic products cause
air, water and soil pollution. This is because when these
products come in contact with air and moisture, they
tend to leach. As a result, the harmful chemicals seep
into the soil, causing soil pollution. Further, when these
chemicals reach and contaminate the natural ground
water, it causes water pollution as the water becomes
unfit for humans, animals and even for agricultural use.
When dust particles loaded with heavy metals enter
the atmosphere, they cause air pollution as well.
6.8.2 Impact of e-waste on humans
As mentioned before, electrical or electronic devices
are manufactured using certain metals and other materials
like lead, beryllium, cadmium and plastics. Most of
these materials are difficult to recycle and are considered
to be toxic and carcinogenic (that is, they may cause
cancer). If e-waste is not disposed of in a proper manner,
it can be extremely harmful to
humans, plants, animals and the environment as
discussed below:
• One of the most widely used metals in electronic
devices (such as monitors and batteries) is lead.
When lead enters the human body through
contaminated food, water, air or soil, it causes lead
poisoning which affects the kidneys, brain and
central nervous system. Children are particularly
vulnerable to lead poisoning.
• When e-waste such as electronic circuit boards is
burnt for disposal, harmful substances contained in it,
such as beryllium, are released. Beryllium
causes skin diseases, allergies and an increased
risk of lung cancer. Burning of insulated wires to
extract copper can cause neurological disorders.
• Some of the electronic devices contain mercury
which causes respiratory disorders and
brain damage.
• The cadmium found in semiconductors and
resistors can damage kidneys, liver and bones.
• None of the electronic devices are manufactured
without using plastics. When this plastic reacts
with air and moisture, it passes harmful chemicals
into the soil and water resources. When consumed,
it damages the immune system of the body and also
causes various psychological problems like stress
and anxiety.

6.8.3 Management of e-waste


E-waste management is the efficient disposal of e-waste.
Although we cannot completely eliminate e-waste,
certain steps and measures have to be taken to reduce
harm to humans and the environment. Some of the
feasible methods of e-waste management are reduce,
reuse and recycle.
• Reduce: We should try to reduce the generation of
e-waste by purchasing the electronic or electrical
devices only according to our need. Also, they
should be used to their maximum capacity and
discarded only after their useful life has ended.
Good maintenance of electronic devices also
increases the life of the devices.
• Reuse: It is the process of re-using the electronic
or electric waste after slight modification. The
electronic equipment that is still functioning should
be donated or sold to someone who is still willing to
use it. The process of repairing old electronic goods and
re-selling them at lower prices is called refurbishing.
• Recycle: Recycling is the process of conversion of
electronic devices into something that can be used
again and again in some manner or the other. Only
those products should be recycled that cannot
be repaired, refurbished or re-used. To promote
recycling of e-waste many companies and NGOs
are providing door-to-door pick up facilities for
collecting the e-waste from homes and offices.
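
The order of preference described above (reuse a device where possible, and recycle only what cannot be repaired, refurbished or re-used) can be sketched as a small decision function. This is a simplified illustration; the function name e_waste_action and its two yes/no inputs are assumptions made for this example.

```python
def e_waste_action(still_working, can_be_repaired):
    """Suggest what to do with an old device, preferring reuse over recycling."""
    if still_working:
        return "Reuse: donate or sell it to someone who will use it"
    if can_be_repaired:
        return "Reuse: get it repaired or refurbished"
    return "Recycle: hand it over to an e-waste collection facility"


print(e_waste_action(still_working=False, can_be_repaired=True))
```

Reduce does not appear in the function because it applies before a purchase is made, not when a device is being discarded.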

6.8.4 E-waste Management in India


In India, the Environment (Protection) Act, 1986, has
been enacted to make those responsible for causing
any form of pollution pay for the damage done to
the natural environment. According to the “polluter
pays” principle of this Act, anyone causing any form of
pollution will pay for the damage caused. Any violation of the
provisions of this Act is liable for punishment.

Think and Reflect
Do you follow precautions to stay healthy, physically,
mentally as well as emotionally, while using
digital technologies?

The Central Pollution Control Board (CPCB) has
issued a formal set of guidelines for proper handling
and disposal of e-waste. According to these guidelines,
the manufacturer of any electronic equipment will be
“personally” responsible for the final safe disposal of the
product when it becomes an e-waste.
The Department of Information Technology (DIT),
Ministry of Communication and Information Technology,
has also issued a comprehensive technical guide on
“Environmental Management for Information Technology
Industry in India.” The industries have to follow these
guidelines for recycling and reuse of e-waste. In order to
make the consumers aware of the recycling of e-waste,
prominent smartphone and computer manufacturing
companies have started various recycling programs.

Device Safety: Ensures Good Health of a Computer System
√ Regularly clean it to keep the dust off. Use a liquid
solution specifically formulated for the cleaning of
electronic screens.
√ Wipe the monitor’s screen often using a regular
microfibre soft cloth (the one used for spectacles).
√ Keep it away from direct heat and sunlight, and put it
in a room with enough ventilation for air circulation.
√ Do not eat food or drink over the keyboard. Food
crumbs that fall into the gaps between the keys or
spilled liquid can cause issues to the devices.

6.9 Impact on Health

As digital technologies have penetrated into different
fields, we are spending more time in front of screens, be
it mobile, laptop, desktop, television, gaming console,
music or sound device. But interacting in an improper
posture can be bad for us, both physically and
mentally. Besides, spending too much time on the Internet
can be addictive and can have a negative impact on our
physical and psychological well-being.
However, these health concerns can be addressed
to some extent by taking care of the way we position
such devices and the posture we maintain.
Ergonomics is a branch of science that deals with
designing or arranging workplaces, including the
furniture, equipment and systems, so that they become
safe and comfortable for the user. Ergonomics helps us
in reducing the strain on our bodies, including the
fatigue and injuries due to prolonged use.
When we continuously look at the screen for
watching, typing, chatting or playing games, our eyes
are continuously exposed to the glare coming from the
screens. Looking at small handheld devices makes it
worse. Eye strain is a symptom commonly complained
of by users of digital devices. Ergonomically maintaining
the viewing distance and angle, along with the position,
can be of some help. Figure 6.5 shows the posture to
be maintained in order to avoid fatigue caused by
prolonged use of computer systems and other digital
devices. However, to get rid of dry, watering, or itchy
eyes, it is better to periodically focus on distant objects,
and take a break for outdoor activities.

Maintain a Balance!!
Enjoy the exciting world of digital devices in tandem
with other pursuits of thrilling sports and hobbies.
Online friends are good, but spending time with friends
in real life is very fulfilling. Often the wholesome
nature of real interactions cannot be compared to just
online social networking.

[Figure labels: viewing distance 19”-24”; viewing angle;
wrist straight; lumbar support for lower back; seat back
angle 90°; knee angle 90°; adjustable seat height 23”-28”;
feet on floor, footrest for shorter people]
Figure 6.5: Correct posture while sitting in front of a computer
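
The two numeric ranges labelled in Figure 6.5 can be used in a simple check of a workstation. The Python sketch below is only illustrative: the function name check_workstation is hypothetical, and the ranges are taken from the figure labels (viewing distance 19 to 24 inches, seat height 23 to 28 inches).

```python
# Ranges read from the labels of Figure 6.5, in inches.
VIEWING_DISTANCE_IN = (19, 24)
SEAT_HEIGHT_IN = (23, 28)


def check_workstation(viewing_distance, seat_height):
    """Return a list of adjustments suggested by the Figure 6.5 ranges."""
    advice = []
    low, high = VIEWING_DISTANCE_IN
    if not low <= viewing_distance <= high:
        advice.append(f"keep the screen {low}-{high} inches from your eyes")
    low, high = SEAT_HEIGHT_IN
    if not low <= seat_height <= high:
        advice.append(f"adjust the seat height to {low}-{high} inches")
    return advice


print(check_workstation(viewing_distance=12, seat_height=25))
```

An empty list means both measurements fall within the ranges shown in the figure.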

Bad posture, backaches, neck and shoulder pains
can be prevented by arranging the workspace as
recommended by ergonomics. Overuse of keyboards
(be it physical keyboard or touchscreen-based virtual
keyboard) not aligned ergonomically, can give rise to a
painful condition of wrists and fingers, and may require
medical help in the long run.
Stress, physical fatigue and obesity are the other
related impacts the body may face if one spends too
much time using digital devices.

Summary
• Digital footprint is the trail of data we leave behind
when we visit any website (or use any online
application or portal) to fill in data or perform
any transaction.
• A user of digital technology needs to follow certain
etiquettes like net-etiquettes, communication-
etiquettes and social media-etiquettes.
• Net-etiquette includes avoiding copyright
violations, respecting privacy and diversity of
users, and avoiding cyber bullies and cyber trolls,
besides sharing of expertise.
• Communication-etiquette requires us to be precise
and polite in our conversation so that we remain
credible through our remarks and comments.
• While using social media, one needs to take care
of security through passwords, be aware of fake
information and be careful while befriending
unknown people. Care must be taken while sharing
anything on social media, particularly our personal,
sensitive information, as it may create havoc
if mishandled.
• Intellectual Property Rights (IPR) help in data
protection through copyrights, patents and
trademarks. There are both ethical and legal
aspects of violating IPR. A good digital citizen
should avoid plagiarism, copyright infringement
and trademark infringement.
• Certain software is made available for free public
access. Free and Open Source Software (FOSS)
allows users not only to access but also to modify
(or improve) it.
• Cyber crimes include various criminal activities
carried out to steal data or to break down
important services. These include hacking,
spreading viruses or malware, sending phishing
or fraudulent emails, ransomware, etc.
• Excessive usage of digital devices has a negative
impact on our physical as well as psychological
well-being. Ergonomic positioning of devices as
well as our posture are important.

Exercise

1. After practicals, Atharv left the computer laboratory
but forgot to sign off from his email account. Later,
his classmate Revaan started using the same
computer. He is now logged in as Atharv. He sends
inflammatory email messages to a few of his classmates
using Atharv’s email account. Revaan’s activity is
an example of which of the following cyber crimes?
Justify your answer.
a) Hacking
b) Identity theft
c) Cyber bullying
d) Plagiarism
2. Rishika found a crumpled paper under her desk.
She picked it up and opened it. It contained some
text which was struck off thrice. But she could still
figure out easily that the struck off text was the email
ID and password of Garvit, her classmate. What is
ethically correct for Rishika to do?
a) Inform Garvit so that he may change his password.
b) Give the password of Garvit’s email ID to all other
classmates.
c) Use Garvit’s password to access his account.
3. Suhana was down with fever. So, she decided not to
go to school the next day. The next day, in the evening,
she called up her classmate, Shaurya, and enquired
about the computer class. She also requested him to
explain the concept. Shaurya said, “Ma’am taught us
how to use tuples in Python”. Further, he generously
said, “Give me some time, I will email you the material
which will help you to understand tuples in Python”.
Shaurya quickly downloaded a 2-minute clip from the
Internet explaining the concept of tuples in Python.
Using a video editor, he added the text “Prepared by
Shaurya” in the downloaded video clip. Then, he
emailed the modified video clip to Suhana. This act
of Shaurya is an example of:
a) Fair use
b) Hacking
c) Copyright infringement
d) Cyber crime
4. After a fight with your friend, you did the following
activities. Which of these activities is not an example
of cyber bullying?
a) You sent an email to your friend with a message
saying that “I am sorry”.
b) You sent a threatening message to your friend
saying “Do not try to call or talk to me”.
c) You created an embarrassing picture of your
friend and uploaded it to your account on a social
networking site.
5. Sourabh has to prepare a project on “Digital India
Initiatives”. He decides to get information from the
Internet. He downloads three web pages (webpage
1, webpage 2, webpage 3) containing information on
Digital India Initiatives. Which of the following steps
taken by Sourabh is an example of plagiarism or
copyright infringement? Give justification in support
of your answer.
a) He read a paragraph on “Digital India Initiatives”
from webpage 1 and rephrased it in his own
words. He finally pasted the rephrased paragraph
in his project.
b) He downloaded three images of “Digital India
Initiatives” from webpage 2. He made a collage for
his project using these images.
c) He downloaded “Digital India Initiative” icon from
web page 3 and pasted it on the front page of his
project report.
6. Match the following:
Column A | Column B
Plagiarism | Fakers, by offering special rewards or money prize, asked for personal information, such as bank account information
Hacking | Copy and paste information from the Internet into your report and then organise it
Credit card fraud | The trail that is created when a person uses the Internet
Digital Foot Print | Breaking into computers to read private emails and other files

7. You got the SMS shown below from your bank querying
a recent transaction. Answer the following:
a) Will you SMS your PIN to the given
contact number?
b) Will you call the bank helpline number to recheck
the validity of the SMS received?
8. Preeti celebrated her birthday with her family. She
was excited to share the moments with her friend
Himanshu. She uploaded selected images of her
birthday party on a social networking site so that
Himanshu can see them. After a few days, Preeti had a
fight with Himanshu. Next morning, she deleted her
birthday photographs from that social networking
site, so that Himanshu cannot access them. Later
in the evening, to her surprise, she saw that one
of the images which she had already deleted from
the social networking site was available with their
common friend Gayatri. She hurriedly asked
Gayatri, “Where did you get this picture from?”
Gayatri replied, “Himanshu forwarded this image a few
minutes back”.
Help Preeti to get answers for the following questions.
Give justification for your answers so that Preeti can
understand it clearly.
a) How could Himanshu access an image which I
had already deleted?
b) Can anybody else also access these deleted
images?
c) Had these images not been deleted from my digital
footprint?
9. The school offers wireless facility (wifi) to the Computer
Science students of Class XI. For communication,
the network security staff of the school have a
registered URL schoolwifi.edu. On 17 September
2017, the following email was mass distributed to
all the Computer Science students of Class XI. The
email claimed that the password of the students was
about to expire. Instructions were given to go to URL
to renew their password within 24 hours.
a) Do you find any discrepancy in this email?
b) What will happen if a student clicks on the
given URL?
c) Is the email an example of cyber crime? If yes,
then specify which type of cyber crime it is. Justify
your answer.
10. You are planning to go for a vacation. You surfed the
Internet to get answers for the following queries —
a) Weather conditions
b) Availability of air tickets and fares
c) Places to visit
d) Best hotel deals
Which of your above mentioned actions might have
created a digital footprint?
11. How would you recognise if one of your friends is
being cyber bullied?
a) Cite the online activities which would help you
detect that your friend is being cyber bullied.
b) What provisions does the IT Act, 2000 (amended in
2008) contain to combat such situations?
12. Write the differences between the following —
a) Copyrights and Patents
b) Plagiarism and Copyright infringement
c) Non-ethical hacking and Ethical hacking
d) Active and Passive footprints
e) Free software and Free and open source software
13. If you plan to use a short text from an article on the
web, what steps must you take in order to credit the
sources used?
14. When you search online for pictures, how will you
find pictures that are available in the free public
domain? How can those pictures be used in your
project without copyright violations?
15. Describe why it is important to secure your wireless
router at home. Search the Internet to find the rules
to create a reasonably secure password. Create an
imaginary password for your home router. Will you
share the password for your home router with the following
people? Justify your answer.
a) Parents
b) Friends
c) Neighbours
d) Home tutors
16. List down the steps you need to take in order to
ensure:
a) your computer is in good working condition for a
longer time.
b) smart and safe Internet surfing.
17. What is data privacy? What type of information do
the websites that you visit collect about you?
18. In the computer science class, Sunil and Jagdish
were assigned the following task by their teacher.
a) Sunil was asked to find information about “India,
a Nuclear power”. He was asked to use Google
Chrome browser and prepare his report using
Google Docs.
b) Jagdish was asked to find information about
“Digital India”. He was asked to use Mozilla Firefox
browser and prepare his report using Libre Office
Writer.
What is the difference between technologies used by
Sunil and Jagdish?
19. Cite examples depicting that you were a victim of
the following cyber crimes. Also, cite provisions in the IT Act
to deal with such cyber crimes.
a) Identity theft
b) Credit card account theft
20. Neerja is a student of Class XI. She has opted for
Computer Science. Neerja prepared the project
assigned to her. She mailed it to her teacher. The
snapshot of that email is shown below.
Find out which of the following email etiquettes
are missing in it. Justify your answer.
a) Subject of the mail
b) Formal greeting
c) Self-explanatory terms
d) Identity of the sender
e) Regards
21. Sumit got good marks in all the subjects. His father
gifted him a laptop. He would like to make Sumit
aware of health hazards associated with inappropriate
and excessive use of a laptop. Help his father to list the
points which he should discuss with Sumit.

Chapter 7 Project Based Learning

“An idea that is developed and put
into action is more important than an
idea that exists only as an idea.”
— Gautam Buddha

In this chapter
7.1 Introduction »» Introduction
Project based learning gives a thorough »» Approaches for
practical exposure to students regarding a Solving Projects
problem upon which the project is based. »» Teamwork
Through project based learning, students »» Project Descriptions
learn to organise their project and use their
time effectively for successful completion of
the project. Projects are developed generally
in groups where students can learn various
skills such as working together, problem
solving, decision making, and investigating
activities. Project based learning involves
steps such as analysing the problem,
dividing the problem into small modules,
applying the mechanism or method to solve
each module and then integrating the solution
of all the modules to arrive at the complete solution of
the problem. To solve a problem it is required that those
who work on it gather the relevant data and process it by
applying a particular method. Data may be collected as
per the requirement of the project in a particular format.
All the team members should associate themselves to
accomplish the task. After collecting the data, it should
be processed to solve the problem. The results should
be reported in a predetermined format.

7.2 Approaches for Solving Projects


The approach followed for the development and
completion of a project plays a pivotal role in project-
based learning. There are several approaches to execute
a project such as modular approach, top down approach
and bottom up approach. A structured or a modular
approach to a project means that a project is divided
into various manageable modules, and each of the
modules has a well-defined task to be performed with a
set of inputs. This would lead to a set of outputs which
when integrated leads to the desired outcome.
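The modular approach can be illustrated with a minimal Python sketch. The seminar registration data and all the function names below are hypothetical and are used only for illustration: each function is one module with a well-defined task, and a final function integrates their outputs.

```python
# Hypothetical example: a seminar registration project split into modules.

def collect_registrations():
    """Module 1: gather the input data (hard-coded here for illustration)."""
    return [
        {"name": "Asha", "fee_paid": 200},
        {"name": "Ravi", "fee_paid": 0},
        {"name": "Meena", "fee_paid": 200},
    ]

def count_paid(registrations):
    """Module 2: count the participants who have paid the fee."""
    return sum(1 for r in registrations if r["fee_paid"] > 0)

def total_fee(registrations):
    """Module 3: add up the fee collected."""
    return sum(r["fee_paid"] for r in registrations)

def build_report(registrations):
    """Integration step: combine the outputs of the modules."""
    return {
        "registered": len(registrations),
        "paid": count_paid(registrations),
        "fee_collected": total_fee(registrations),
    }

report = build_report(collect_registrations())
print(report)
```

Because each module has its own inputs and outputs, it can be written and tested by a different team member before the modules are integrated.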
Different steps involved in Project Based Learning
(Figure 7.1) are:
(1) Identification of a project: The project idea may
come through any real life situation. For example, one
could think of doing a project for organising a seminar.
One needs to understand the usefulness of the project
and its impact. Students must be encouraged to
undertake interdisciplinary projects.
(2) Defining a plan: Normally for any kind of project,
there are several project members involved in it. One
project leader has to be identified. The roles of project
leader and each project member have to be clearly
defined. Students who are performing a project must
be assigned with specific activities. The various tools
for executing these activities must be known. To obtain
a better solution, one should always think of the
extreme situations.
(3) Fixing of a time frame and processing: Every
project is time-bound. A student must
understand the importance of the time frame for completion
of the project. All the activities which are performed in
the project require a certain amount of time. Every
project must be well structured and at the same time it
must be flexible in its time frame.
Figure 7.1: Steps in project based learning (Identification of project, Defining a plan, Fixing time frame and processing, Guidance and monitoring, Outcome of project)

(4) Providing guidance and monitoring a project:
Many times, the participants in the project get stuck
on a particular process and find it impossible
to proceed further. In such a case, they need guidance,
which can be obtained from various resources such
as books, websites and experts in the field. While
the project leader should ensure that
the project is monitored, the guide teacher also helps
in monitoring it.
(5) Outcome of a project: One needs to understand
thoroughly the outcome of a project. The outcome can
be single, or it can be multiple. The output of a project
can be peer reviewed and can be modified as per the
feedback from the guide teacher or other users.

7.3 Teamwork
Many real life tasks are very complex and require a lot
of individuals to contribute in achieving them. An effort
made collectively by individuals to accomplish a task is
called teamwork.
For example, in many sports, there is a team of
players. These players play together to win a match.
Take the example of a cricket team. Even if
a bowler bowls a good ball, a wicket cannot be taken if the fielder
drops the catch. So, in order to
take a wicket, efforts of the bowler as well as of the fielders
are needed. To win a cricket match, contributions from
all the team members in all three areas (batting,
bowling and fielding) are required.
7.3.1 Components of Teamwork
Apart from technical proficiency, a wide variety of other
components make teamwork successful. A successful team comprises
skilled members with specific roles working to achieve the
goal.
(A) Communicate with Others
When a group of individuals perform one job, it is
necessary to have effective communication between
the members of the team. Such communication can
be done via e-mails, telephones, or by arranging group
meetings. This helps the team members to understand
each other and sort out their problems to achieve the
goal effectively.
(B) Listen to Others
It is necessary to understand the ideas of others while
executing a job together. This can be achieved when the
team members listen to each other in group meetings,
and follow steps that are agreed upon.
(C) Share with Others
Ideas, images and tools need to be shared with each
other in order to perform a job. Sharing is an important
component of teamwork. Any member of the team who is
well versed in a certain area should share the expertise
and experience with others to effectively achieve the
goal within the time frame.
(D) Respect for Others
Every member of the team must be treated respectfully.
All the thoughts and ideas that are put forth in the
group meetings should be respected and duly considered.
Not respecting the views of a particular member may
cause problems, and that team member may
not give their best.
(E) Help Others
A helping hand from every member is a key to success.
Sometimes, help from people who are not a part of the
team is also obtained in order to accomplish a job.
(F) Participate
All the team members must be encouraged by each
other to participate in completing the project and also
in discussions in group meetings. Also, every member
should participate actively so that they feel
valued in the team.

7.4 Project Descriptions


In this section, some examples of project works are
given, which can be taken up in groups under project
based learning. However, a group may choose any other
project in consultation with the guide teacher.
7.4.1 Project I: Online Shopping Platform
Description
Murugan plans to launch an online shopping platform,
‘APPAREL EASY’. He plans to have two broad categories
of merchandise: Men and Women. Under both
categories, Clothing, Footwear and Accessories will be
the sub-categories. Also, on his shopping platform, he
is planning to launch two mega events: Festive Sale (a
month before Diwali to Christmas) and End of Season Sale
(February and August). Murugan also wants to keep
a record of his monthly sales revenue and
category wise sales, with special focus on mega events.
A record should also be kept of discounts being offered
by the manufacturers, payment sites or any discount
offered as a promotional campaign by the APPAREL
EASY portal.
Specification
The details of the Men and Women apparels should be
stored in a data file with fields as Apparel Code, Name,
Category, Size, Price, Customer Name, Payment Mode,
Discount Code, etc.
If the Category is Men, then apparels can be Men’s
Trousers, Men’s Shirt, Men’s Jeans, Men’s T-shirt. If the
Category is Women, then the apparels can be Skirts,
Top, Pants, Jeans, Kurta, etc.
If the Payment Mode is Credit or Debit Card, then
the card number, name, CVV and validity should
be entered.
If the Payment Mode is Cash on Delivery (COD), then
no details need to be entered.
Randomly select the merchandise to be put on sale.
The selected merchandise should not be more than 70
per cent of the total merchandise.

The discount code can be either FEST (for Festive) or
EOS (for End of Season). The discounts for FEST will be
10 per cent and for EOS will be 15 per cent.
You need to visualise the data structure, keeping all the
requirements of Murugan in mind, and then implement
it using Python Pandas. Thereafter, you need to design
software to store details of the merchandise to put them
online for sale. At the same time, records of customers
visiting the e-commerce site and the number of
customers placing the order also have to be maintained.
The data collected should be plotted appropriately to
help Murugan make decisions for future marketing and
promotion strategies.
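One possible starting point for this project is sketched below. It is only an illustration under stated assumptions: the sample records, the variable names and the 60 per cent sale fraction are hypothetical, while the field names and the discount rates follow the specification above.

```python
import pandas as pd

# Hypothetical sample records; the field names follow the specification.
apparel = pd.DataFrame({
    "Apparel Code": ["M01", "M02", "W01", "W02", "W03"],
    "Name": ["Men's Shirt", "Men's Jeans", "Skirt", "Kurta", "Top"],
    "Category": ["Men", "Men", "Women", "Women", "Women"],
    "Size": ["L", "M", "S", "M", "L"],
    "Price": [1200.0, 1800.0, 900.0, 1500.0, 700.0],
})

# Discount rates (in per cent) given in the specification.
DISCOUNT_PERCENT = {"FEST": 10, "EOS": 15}

# Randomly pick the items to be put on sale. The fraction 0.6 is an
# assumed value chosen to stay below the 70 per cent limit.
on_sale = apparel.sample(frac=0.6, random_state=1)

def sale_price(price, discount_code):
    """Return the price after applying the discount for the given code."""
    return price * (100 - DISCOUNT_PERCENT[discount_code]) / 100

# The function works on a whole column as well as on a single number.
on_sale = on_sale.assign(FestPrice=sale_price(on_sale["Price"], "FEST"))
print(on_sale[["Apparel Code", "Name", "Price", "FestPrice"]])
```

Customer and payment details, such as Customer Name and Payment Mode, could be kept in a separate orders DataFrame that refers to the Apparel Code.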
7.4.2 Project II: Automating a Book Donation Camp
Description
Realising the importance of Reduce, Reuse and Recycle,
the Bookworm club every year organises a Book Donation
Camp. The Book Donation camp collects books and
notebooks. The volunteers assess the condition of the
books and categorise them as Fit, Needs mending, or
Unfit. The unfit books’ pages are used to create paper
bags and envelopes. The other categories of books are
resold at half the price. They accept notebooks that
have pages left in them. The pages are torn from the
notebooks, and are attractively bound to create a new
notebook and sold. They create a variety of recycled
objects and sell them. They want to create software
for this purpose and store details about the camp. To be
able to efficiently store, retrieve and visualise data, they
need to implement the following using Pandas.
Specification
The details of collections are stored in a CSV file with
column headings as Item category, Item ID, Item name,
Item type, Condition.
If the Item Category is Book, then the Item Type can
be either Academic or Non Academic, and the Item ID shall
be prefixed with a ‘B’. In case of Academic, the class shall
also be entered.
If the Item Category is Notebook, then the Item Type can
be Single Line, Four Line or Five Line, and the Item ID shall be
prefixed with an ‘N’.
Condition can only be Fit, Needs Mending or Unfit.
After the items are refurbished, the data are stored
in another CSV file containing the following column
headings: Item ID, Item name, Item Category, Quantity,
Price. Item Category can be Paper bags, Notebook or
Books. In case of books, Class is also to be entered.
Another CSV file is created to store orders; it
stores Item Category, Item name, Quantity and Price.
When an order is placed, the quantity in the refurbished-items
CSV file shall be updated.
To ensure effective decision making, it is required
that different data are plotted using appropriate plots
to show sales, items refurbished, and items collected.
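The rules in this specification can be checked with Pandas. The following is a minimal sketch, assuming hypothetical sample rows in place of the real CSV files; the variable names are also illustrative. It flags the collected items that break the Item ID or Condition rules, and reduces the refurbished quantity when an order is placed.

```python
import io
import pandas as pd

# Hypothetical rows standing in for the collections CSV file.
collections_csv = io.StringIO(
    "Item category,Item ID,Item name,Item type,Condition\n"
    "Book,B001,Science Class IX,Academic,Fit\n"
    "Book,N002,Short Stories,Non Academic,Needs Mending\n"
    "Notebook,N003,Maths notebook,Single line,Torn\n"
)
collected = pd.read_csv(collections_csv)

# Rule 1: Item ID starts with 'B' for books and 'N' for notebooks.
expected_prefix = collected["Item category"].map({"Book": "B", "Notebook": "N"})
id_ok = collected["Item ID"].str[0] == expected_prefix

# Rule 2: Condition must be one of the three allowed values.
condition_ok = collected["Condition"].isin(["Fit", "Needs Mending", "Unfit"])

# Rows that break at least one rule.
invalid = collected[~(id_ok & condition_ok)]
print(invalid)

# Hypothetical refurbished stock and a single order.
refurbished = pd.DataFrame({
    "Item id": ["R01", "R02"],
    "Item name": ["Paper bag", "Recycled notebook"],
    "Item Category": ["Paper bags", "Notebook"],
    "Quantity": [40, 12],
    "Price": [5, 30],
})
order = {"Item name": "Paper bag", "Quantity": 10}

# Placing the order reduces the stock of the ordered item.
ordered_item = refurbished["Item name"] == order["Item name"]
refurbished.loc[ordered_item, "Quantity"] -= order["Quantity"]
print(refurbished)
```

In the actual project, the updated DataFrame would be written back to the refurbished CSV file with `to_csv()`.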
7.4.3 Project III: A survey of the effect of social
networking sites on behaviour of teenagers
Description
With the Internet revolution, everyone today is
connected. Teenagers spend a good amount of time on
social networking sites, and this plays a vital role in their
behaviour. It is considered that excessive use of social
networking sites sometimes has a serious impact on
the mental health of individuals. A well-crafted survey
questionnaire can help in exploring and finding many
facts.
Specifications
1. Create a survey questionnaire using any of the freely
available online tools (such as Google Forms) and
store the responses in a CSV file.
2. Prepare some data analysis questions that you
expect the survey responses to answer.
3. Import the CSV file into a Pandas DataFrame.
4. Perform statistical computations such as mean,
median, etc., with respect to the identified questions.
5. Visualise the findings of the survey using appropriate
charts.
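Steps 3 and 4 above can be sketched as follows. The responses, the column names and the variable names are hypothetical; a real project would read the CSV file exported from the survey tool instead of the inline text used here.

```python
import io
import pandas as pd

# Hypothetical survey responses; real ones would come from the exported CSV.
responses_csv = io.StringIO(
    "Age,Hours per day,Platform\n"
    "14,2.0,Instagram\n"
    "15,3.5,YouTube\n"
    "16,1.0,Instagram\n"
    "17,4.5,WhatsApp\n"
    "15,3.0,Instagram\n"
)
survey = pd.read_csv(responses_csv)

# Statistical computations for the question
# "How much time do teenagers spend on social networking sites?"
mean_hours = survey["Hours per day"].mean()
median_hours = survey["Hours per day"].median()

# Number of respondents who named each platform.
platform_counts = survey["Platform"].value_counts()

print("Mean hours per day:", mean_hours)
print("Median hours per day:", median_hours)
print(platform_counts)
```

For Step 5, a Series such as `platform_counts` can be drawn as a chart with its `plot()` method, for example `platform_counts.plot(kind='bar')`.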
7.4.4 Project IV: Utilising an open data source to use
a national, state or district level Dataset
Description
Open Government Data (OGD) Platform India (www.data.gov.in)
is a platform for supporting the open data
initiative of the Government of India. From this platform,
let us consider the dataset “Special Tabulation on
Adolescent and youth population classified by various
parameters for India, States and Union Territories,
2011”. The dataset was contributed by the Ministry of
Home Affairs, Government of India, and released under
the National Data Sharing and Accessibility Policy (NDSAP).
The dataset was published on the portal on 07/09/2015.
Statistics of the Data Set:
Number of rows: 12168
Number of columns: 123
Descriptions of some of the columns are given below:
State: Serial numbers given to states
Area Name: Name of the states and union territories
Total/Rural/Urban: Data about the total, rural or urban areas of a state or UT
Adolescent and youth: Data for different age groups
Total Male: Total number of males
Total Female: Total number of females
SC-M: Total number of males of Scheduled Castes (SC)
SC-F: Total number of females of Scheduled Castes (SC)
ST-M: Total number of males of Scheduled Tribes (ST)
ST-F: Total number of females of Scheduled Tribes (ST)
Literates-M: Total number of literate males
Literates-F: Total number of literate females
LiteratesSC-M: Total number of literate males of Scheduled Castes (SC)
LiteratesSC-F: Total number of literate females of Scheduled Castes (SC)
LiteratesST-M: Total number of literate males of Scheduled Tribes (ST)
LiteratesST-F: Total number of literate females of Scheduled Tribes (ST)
Illiterates-M: Total number of illiterate males
Illiterates-F: Total number of illiterate females
IlliteratesSC-M: Total number of illiterate males of Scheduled Castes (SC)
IlliteratesSC-F: Total number of illiterate females of Scheduled Castes (SC)
IlliteratesST-M: Total number of illiterate males of Scheduled Tribes (ST)
IlliteratesST-F: Total number of illiterate females of Scheduled Tribes (ST)
MainWorker-M: Total number of main worker males
MainWorker-F: Total number of main worker females
MainWorkerSC-M: Total number of main worker males of Scheduled Castes (SC)
MainWorkerSC-F: Total number of main worker females of Scheduled Castes (SC)
MainWorkerST-M: Total number of main worker males of Scheduled Tribes (ST)
MainWorkerST-F: Total number of main worker females of Scheduled Tribes (ST)
MarginalWorker-M: Total number of marginal worker males
MarginalWorker-F: Total number of marginal worker females
MarginalWorkerSC-M: Total number of marginal worker males of Scheduled Castes (SC)
MarginalWorkerSC-F: Total number of marginal worker females of Scheduled Castes (SC)
MarginalWorkerST-M: Total number of marginal worker males of Scheduled Tribes (ST)
MarginalWorkerST-F: Total number of marginal worker females of Scheduled Tribes (ST)
Specifications
On such a large dataset, various types of questions
can be answered by analysing the data in different ways.
Following is a list of some of the possible queries that
can be answered by analysing the dataset:
1. What is the total population, total male population
and total female population aged 10 to 24 in India?
2. Which State or Union Territory in India has the
maximum number of illiterates in the youth ages?
3. What is the percentage of people working as a
marginal worker?
4. List the top 5 states or union territories which have
the maximum population working as a marginal
worker.
5. Compare the sex ratio of urban areas and rural
areas using an appropriate graph.
6. Which state has the highest and the lowest
percentage of literate Scheduled Tribes and
Scheduled Castes?
7. For each state, compare the no. of female marginal
workers with the no. of male marginal workers. Use
appropriate graphs.
8. What percentage of Scheduled Tribes lives in urban
areas? Draw a pie chart showing the proportion of
literate and illiterate Scheduled Tribes living in
urban areas.
9. What is the state wise ratio of literates vs. illiterates
in all age groups?

10. Which state is home to the maximum no. of ST in
India? Which state has the minimum no. of ST in
India?
11. For each state, find the no. of literate females and
no. of literate males. Draw a bar graph for the same.
Which state has the highest ratio of literate females
to literate males and which state has the lowest?
A project work can be carried out by taking any
4–5 of the above questions and any other similar
questions, and solving them step-by-step, with detailed
explanation and documentation. As an example, in the
following pages, we will solve the first question. This will
give us an idea about how the other questions are to be
answered.
Task 1: What is the total population, total male
population and total female population aged 10 to
24 in India?
Solution:
Prerequisite: we need to first download the CSV file
through the QR code given at the beginning of this
chapter.
Step 1: Read the CSV file in a DataFrame
Step 2: Check the shape of the DataFrame
Step 3: View the columns
Step 4: Filter data
a. Identify the columns that you want to use
for plotting
b. Identify the number of rows required for
plotting
Step 5: Create a new DataFrame containing the filtered
data
Step 6: Rename the columns for ease of use
Step 7: Group data as per the requirement
Step 8: Plot data as a barchart for the DataFrame
obtained in Step 7.
Let us now write the code for the above identified
steps:
Step 0: Import required libraries.
import pandas as pd
import matplotlib.pyplot as plt


Step 1: Read the CSV file in a DataFrame.


# Replace the file name with the path to the CSV file on your computer
# read_csv() already returns a DataFrame
df = pd.read_csv("PCA_AY_2011_Revised.csv")
Step 2: Check shape of the DataFrame.
print(df.shape)
We get the output showing the dataset contains
12168 rows and 123 columns.
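Steps 1 and 2 can be tried without the full dataset. The following minimal sketch reads a three-row sample typed inline (the names sample_csv and sample are illustrative) and unpacks the shape into rows and columns.

```python
import io
import pandas as pd

# A tiny inline CSV; read_csv() accepts a file path or a file-like object.
sample_csv = io.StringIO(
    "Area Name,Total/ Rural/ Urban,Total Population - Persons\n"
    "INDIA,Total,1210854977\n"
    "INDIA,Rural,833748852\n"
    "INDIA,Urban,377106125\n"
)
sample = pd.read_csv(sample_csv)

# shape is a tuple of the form (number of rows, number of columns).
rows, columns = sample.shape
print("Rows:", rows, "Columns:", columns)
```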
Step 3: Display the columns.
print(df.columns.values)
A part of the output produced for the 123 columns
is shown below:
['Table No.' 'State Code' 'District Code' 'Area Name'
'Total/ Rural/ Urban' 'Adolescent and youth categories'
'Total Population - Persons' 'Total Population - Males'
'Total Population - Females' 'Scheduled Caste - Persons'
'Scheduled Caste - Males' 'Scheduled Caste - Females'
...
'Scheduled Tribe Marginal Worker - Household Industry - Males'
'Scheduled Tribe Marginal Worker - Household Industry - Females'
'Scheduled Tribe Marginal Worker - Other Workers - Persons'
'Scheduled Tribe Marginal Worker - Other Workers - Males'
'Scheduled Tribe Marginal Worker - Other Workers - Females']
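With 123 columns, it is easier to search the column labels than to read the whole list. The sketch below is an illustration only: it builds an empty DataFrame (named demo here) carrying a few of the labels shown above and picks the labels that start with a given text.

```python
import pandas as pd

# An empty DataFrame carrying a few of the column labels listed above.
demo = pd.DataFrame(columns=[
    "Area Name",
    "Total Population - Persons",
    "Total Population - Males",
    "Total Population - Females",
    "Scheduled Caste - Persons",
])

# Keep only the labels that begin with 'Total Population'.
population_cols = [c for c in demo.columns if c.startswith("Total Population")]
print(population_cols)
```

The same list comprehension can be applied to `df.columns` of the full dataset.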
Step 4: Filter Data.
a. Identify the columns that you want to use for plotting.
For our analysis, we will consider only the columns
'Area Name', 'Total/ Rural/ Urban', 'Adolescent and
youth categories', 'Total Population - Persons',
'Total Population - Males' and 'Total Population - Females'.
b. Identify the number of rows required for plotting.
In order to decide the number of rows, we need to
check the values in the column 'Area Name':
print(df['Area Name'])
The following is the output:
0 INDIA
1 INDIA
2 INDIA
3 INDIA
4 INDIA
...

12163 District - South Andaman (03)
12164 District - South Andaman (03)
12165 District - South Andaman (03)
12166 District - South Andaman (03)
12167 District - South Andaman (03)
Name: Area Name, Length: 12168, dtype: object
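Printing a column of 12168 values shows only its first and last few rows. To see which areas are present, the distinct values can be listed instead. The following sketch uses a short hypothetical Series (named area here) in place of df['Area Name'].

```python
import pandas as pd

# Hypothetical miniature of the 'Area Name' column.
area = pd.Series(
    ["INDIA", "INDIA", "INDIA",
     "District - South Andaman (03)", "District - South Andaman (03)"],
    name="Area Name",
)

# Distinct values, in the order in which they first appear.
print(area.unique())

# Number of rows available for each value.
print(area.value_counts())
```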
Step 5: Create a new DataFrame containing the filtered
data.
Suppose we want to consider data for 'Area Name' =
'INDIA' only. Therefore, we shall create a new DataFrame
df1 containing only the filtered data, using the following
syntax:
df.loc[row selection, column selection]
df1=df.loc[(df['Area Name'] == 'INDIA'),
'Area Name':'Total Population - Females']
In the above statement, the condition df['Area Name'] == 'INDIA' is used to
select the required rows. We apply slicing on column
labels to select the columns starting from 'Area Name'
till 'Total Population - Females'.
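The row and column selection can be seen on a small DataFrame. In the sketch below, the DataFrame mini and all the numbers in it are made up for illustration. Note that, unlike slicing by position, a slice of column labels in loc includes both the start and the end label.

```python
import pandas as pd

# Hypothetical miniature of the dataset; the numbers are made up.
mini = pd.DataFrame({
    "State Code": [0, 0, 1],
    "Area Name": ["INDIA", "INDIA", "District - South Andaman (03)"],
    "Total/ Rural/ Urban": ["Total", "Rural", "Total"],
    "Total Population - Persons": [500, 300, 40],
    "Scheduled Caste - Persons": [50, 30, 4],
})

# Boolean Series with one True/False value per row.
mask = mini["Area Name"] == "INDIA"
print(mask)

# Rows where mask is True; columns from 'Area Name' up to and
# including 'Total Population - Persons'. copy() makes the result
# independent of the original DataFrame.
subset = mini.loc[mask, "Area Name":"Total Population - Persons"].copy()
print(subset)
```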
Step 6: The names of the columns in the DataFrame
are too long. The following statement can be used to
rename the columns.
df1.columns = ['Area', 'Class', 'Category',
'TotalPop', 'MalePop', 'FemalePop']
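Assigning a list to the columns attribute depends on the order of the columns. An alternative, shown here as a sketch on a hypothetical one-row DataFrame named frame, is the rename() method, which maps each old label to its new label and returns a new DataFrame.

```python
import pandas as pd

# Hypothetical one-row DataFrame using three of the original labels.
frame = pd.DataFrame({
    "Area Name": ["INDIA"],
    "Total/ Rural/ Urban": ["Total"],
    "Total Population - Persons": [1210854977],
})

# Map old labels to new labels; labels not in the mapping stay unchanged.
renamed = frame.rename(columns={
    "Area Name": "Area",
    "Total/ Rural/ Urban": "Class",
    "Total Population - Persons": "TotalPop",
})
print(renamed.columns.tolist())
```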
Step 7: Group data as per the requirement.
We decided to plot TotalPop, MalePop and FemalePop with
respect to Category. But, on inspecting the DataFrame
df1, we notice that the Category column contains
data under six different categories: '10-14', '15-19',
'20-24', 'Adolescent (10-19)', 'All Ages', 'Youth (15-24)'.
print(df1)
     Area  Class            Category    TotalPop    MalePop  FemalePop
 0  INDIA  Total            All Ages  1210854977  623270258  587584719
 1  INDIA  Total               10-14   132709212   69418835   63290377
 2  INDIA  Total               15-19   120526449   63982396   56544053
 3  INDIA  Total               20-24   111424222   57584693   53839529
 4  INDIA  Total  Adolescent (10-19)   253235661  133401231  119834430
 5  INDIA  Total       Youth (15-24)   231950671  121567089  110383582
 6  INDIA  Rural            All Ages   833748852  427781058  405967794
 7  INDIA  Rural               10-14    96804494   50488158   46316336
 8  INDIA  Rural               15-19    83902472   44570557   39331915
 9  INDIA  Rural               20-24    73835046   38138662   35696384
10  INDIA  Rural  Adolescent (10-19)   180706966   95058715   85648251
11  INDIA  Rural       Youth (15-24)   157737518   82709219   75028299
12  INDIA  Urban            All Ages   377106125  195489200  181616925
13  INDIA  Urban               10-14    35904718   18930677   16974041
14  INDIA  Urban               15-19    36623977   19411839   17212138
15  INDIA  Urban               20-24    37589176   19446031   18143145
16  INDIA  Urban  Adolescent (10-19)    72528695   38342516   34186179
17  INDIA  Urban       Youth (15-24)    74213153   38857870   35355283
Therefore, to plot TotalPop, MalePop and FemalePop, we
should group the rows by these six categories and find the
sum for each type of population. Note that each sum adds
the Total, Rural and Urban rows of a category, so it is twice
the value in the corresponding Total row. The groupby() function, when
applied on the column 'Category' of our DataFrame
df1, gives us the following result (numeric_only=True
restricts the sum to the numeric columns):
d = df1.groupby('Category').sum(numeric_only=True)
print(d)
                      TotalPop     MalePop   FemalePop
Category
10-14                265418424   138837670   126580754
15-19                241052898   127964792   113088106
20-24                222848444   115169386   107679058
Adolescent (10-19)   506471322   266802462   239668860
All Ages            2421709954  1246540516  1175169438
Youth (15-24)        463901342   243134178   220767164

We are interested only in the categories '10-14', '15-19'
and '20-24'. So, let us drop the remaining rows using
the following Python statement:
d = d.drop(['Adolescent (10-19)', 'All Ages', 'Youth (15-24)'], axis=0)
print(d)
           TotalPop    MalePop  FemalePop
Category
10-14     265418424  138837670  126580754
15-19     241052898  127964792  113088106
20-24     222848444  115169386  107679058
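The grouped sums add the Total, Rural and Urban rows of each category, so every figure is twice the value in the Total row. The sketch below retypes the 18 rows of df1 so that it runs on its own, repeats the grouping of Step 7, and then shows one possible alternative that keeps only the Total rows to arrive at the population aged 10 to 24. The names rows, value_cols, age_groups, total_only and answer are illustrative.

```python
import pandas as pd

# The 18 rows of df1 printed in Step 7, typed in by hand.
rows = [
    ("Total", "All Ages", 1210854977, 623270258, 587584719),
    ("Total", "10-14", 132709212, 69418835, 63290377),
    ("Total", "15-19", 120526449, 63982396, 56544053),
    ("Total", "20-24", 111424222, 57584693, 53839529),
    ("Total", "Adolescent (10-19)", 253235661, 133401231, 119834430),
    ("Total", "Youth (15-24)", 231950671, 121567089, 110383582),
    ("Rural", "All Ages", 833748852, 427781058, 405967794),
    ("Rural", "10-14", 96804494, 50488158, 46316336),
    ("Rural", "15-19", 83902472, 44570557, 39331915),
    ("Rural", "20-24", 73835046, 38138662, 35696384),
    ("Rural", "Adolescent (10-19)", 180706966, 95058715, 85648251),
    ("Rural", "Youth (15-24)", 157737518, 82709219, 75028299),
    ("Urban", "All Ages", 377106125, 195489200, 181616925),
    ("Urban", "10-14", 35904718, 18930677, 16974041),
    ("Urban", "15-19", 36623977, 19411839, 17212138),
    ("Urban", "20-24", 37589176, 19446031, 18143145),
    ("Urban", "Adolescent (10-19)", 72528695, 38342516, 34186179),
    ("Urban", "Youth (15-24)", 74213153, 38857870, 35355283),
]
value_cols = ["TotalPop", "MalePop", "FemalePop"]
df1 = pd.DataFrame(rows, columns=["Class", "Category"] + value_cols)
df1.insert(0, "Area", "INDIA")

age_groups = ["10-14", "15-19", "20-24"]

# Same grouping as in Step 7: sums over the Total, Rural and Urban rows.
d = df1.groupby("Category")[value_cols].sum().loc[age_groups]
print(d)

# Alternative: keep only the 'Total' rows so that nobody is counted twice.
total_only = df1[(df1["Class"] == "Total") & (df1["Category"].isin(age_groups))]
answer = total_only[value_cols].sum()
print(answer)
```

The grouped values are suitable for comparing the age groups with one another, while the sums in answer give the actual population counts for the age range 10 to 24.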

Step 8: Plot the data as a barchart for the DataFrame
obtained in Step 7.
d.plot(kind='bar')
plt.show()

The barchart shown in Figure 7.2 is produced as the
output. The value 1e8 marked at the top of the y-axis is a
scale factor. The tick labels are shown in scientific notation,
which Matplotlib uses for numbers outside a
specified range, so each label must be multiplied by 1e8
(that is, 10 raised to the power 8) to get the actual value.

Figure 7.2: Barchart showing population in different categories
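If full numbers are preferred over the 1e8 scale factor on the y-axis, the tick label format can be changed. The following sketch recreates the grouped values of Step 7 and uses the ticklabel_format() method of the axes; the y-axis label text is an assumed choice.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Grouped values obtained in Step 7.
d = pd.DataFrame(
    {
        "TotalPop": [265418424, 241052898, 222848444],
        "MalePop": [138837670, 127964792, 115169386],
        "FemalePop": [126580754, 113088106, 107679058],
    },
    index=pd.Index(["10-14", "15-19", "20-24"], name="Category"),
)

# plot() returns the Matplotlib axes on which the bars are drawn.
ax = d.plot(kind="bar")

# Show plain numbers on the y-axis instead of scientific notation.
ax.ticklabel_format(style="plain", axis="y")
ax.set_ylabel("Population")

plt.tight_layout()
plt.show()
```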
