
Data Structure Notes

The document provides an introduction to data structures, detailing basic terminology, types of data structures (static, dynamic, linear, non-linear), and their operations. It further explains arrays, their representation, address calculation, applications, and character strings in C, along with concepts of stacks, queues, and linked lists. Additionally, it touches on algorithm complexity and sparse matrices and vectors, emphasizing their storage and construction methods.


Unit-1

Introduction

Basic Terminology-
 Data structures are the programmatic way of storing data so that
it can be used efficiently. Almost every enterprise application
uses various types of data structures in one way or another.

Elementary data organization-

1) Static data structure :

 A data structure whose organizational characteristics are invariant
throughout its lifetime. Such structures are well supported by
high-level languages; familiar examples are arrays and records.
The prime features of static structures are:
(a) None of the structural information need be stored explicitly
within the elements - it is often held in a distinct logical/physical
header.
(b) The elements of an allocated structure are physically
contiguous, held in a single segment of memory.
(c) All descriptive information, other than the physical location
of the allocated structure, is determined by the structure definition.
(d) Relationships between elements do not change during the
lifetime of the structure.

2) Dynamic data structure :


 A data structure whose organizational characteristics may change
during its lifetime. The adaptability afforded by such structures, e.g.
linked lists, is often at the expense of decreased efficiency in
accessing elements of the structure.
Two main features distinguish dynamic structures from static data
structures-
 Firstly, it is no longer possible to infer all structural information
from a header; each data element will have to contain information
relating it logically to other elements of the structure.
 Secondly, using a single block of contiguous storage is often not
appropriate, and hence it is necessary to provide some storage
management scheme at run-time.

3) Linear data structure :

 In a linear data structure, data items are processed linearly,
that is, one by one in sequence. A linear structure is a data
structure for holding multiple items. The items can be added in
different ways, but once added they retain a relation to their
neighbours that does not change. For example: a bunch of numbers,
a bunch of strings, etc.
 Once an item is added, it stays in that position relative to the
other items that came before and after it. Linear data structures
include: Array, Linked List, Stack and Queue.

4) Non-Linear data structure :

 In a non-linear data structure, data items are not processed in a
linear fashion. Examples of non-linear data structures are graphs
and trees.

Data Structure Operations-


 The basic operations that are performed on data structures are as
follows:
Insertion: Insertion means addition of a new data element in a
data structure.
Deletion: Deletion means removal of a data element from a data
structure, if it is found.
Algorithm Complexity-
 Algorithmic complexity is concerned with how fast or slow a
particular algorithm performs. We define complexity as a numerical
function T(n) - time versus the input size n. We want to define the
time taken by an algorithm without depending on implementation
details, yet T(n) does depend on the implementation! A given
algorithm will take different amounts of time on the same inputs
depending on factors such as processor speed, instruction set, disk
speed and compiler. The way around this is to estimate the
efficiency of each algorithm asymptotically: we measure time T(n)
as the number of elementary "steps" (defined in any way), provided
each such step takes constant time.
Unit-2
Array
Array Definition-
 An array is a collection of data items, all of the same type, accessed
using a common name. A one-dimensional array is like a list; a two-
dimensional array is like a table. The C language places no limits on
the number of dimensions in an array, though specific
implementations may.

Representation And Analysis-

Single And Multidimensional Arrays-

Single-

Multidimensional-

Address Calculation-
Address Calculation in single (one)
Dimension Array:
 The address of an element of an array, say “A[ I ]”, is calculated
using the following formula:

Address of A [ I ] = B + W * ( I – LB )

Where,
B=Base_Address
W = Storage Size of one element stored in the array (in byte)
I = Subscript of element whose address is to be found
LB = Lower limit / Lower Bound of subscript, if not specified
assume 0 (zero).

Example:
Given the base address of an array B[1300…..1900] as 1020 and
size of each element is 2 bytes in the memory. Find the address
of B[1700].
Solution:
The given values are: B = 1020, LB = 1300, W = 2, I = 1700

Address of A [ I ] = B + W * ( I – LB )
=1020+2*(1700–1300)
=1020+2*400
=1020+800
= 1820 [Ans]

Address Calculation in Double (Two)


Dimensional Array:

 While storing the elements of a 2-D array in memory, these are


allocated contiguous memory locations. Therefore, a 2-D array must
be linearized so as to enable their storage. There are two
alternatives to achieve linearization: Row-Major and Column-Major.
 Address of an element of any array say “A[ I ][ J ]” is calculated in
two forms as given:
(1) Row Major System
(2) Column Major System

Row Major System:

 The address of a location in Row Major System is calculated using


the following formula:

Address of A [ I ][ J ] = B + W * [ N * ( I – Lr ) + ( J – Lc ) ]

Column Major System:


 The address of a location in Column Major System is calculated using
the following formula:

Address of A [ I ][ J ] Column Major Wise = B + W * [( I – Lr ) + M * ( J – Lc )]

Where,
B = Base address
I = Row subscript of element whose address is to be found
J = Column subscript of element whose address is to be found
W = Storage Size of one element stored in the array (in byte)
Lr = Lower limit of row/start row index of matrix, if not given
assume 0 (zero)
Lc = Lower limit of column/start column index of matrix, if not
given assume 0 (zero)
M = Number of row of the given matrix
N = Number of column of the given matrix

Important : Usually number of rows and columns of a matrix


are given ( like A[20][30] or A[40][60] ) but if it is given
as A[Lr … Ur, Lc … Uc]. In this case the number of rows
and columns are calculated using the following methods:

 Number of rows (M) will be calculated as = (Ur – Lr) + 1


 Number of columns (N) will be calculated as = (Uc – Lc) + 1

And rest of the process will remain same as per requirement


(Row Major Wise or Column Major Wise).

Examples:
Q 1. An array X [-15……….10, 15……………40] requires one byte of
storage. If beginning location is 1500 determine the location of X
[15][20].
Solution:
As you see here the number of rows and columns are not given in
the question. So they are calculated as:

Number of rows say M = (Ur – Lr) + 1 = [10 – (-15)] + 1 = 26

Number of columns say N = (Uc – Lc) + 1 = [40 – 15] + 1 = 26

(i) Column Major Wise Calculation of above equation

The given values are: B = 1500, W = 1 byte, I = 15, J = 20, Lr = -15,


Lc = 15, M = 26

Address of A [ I ][ J ] = B + W * [ ( I – Lr ) + M * ( J – Lc ) ]

=1500+1*[(15–(-15))+26*(20–15)]
=1500+1*[30+26*5]
=1500+1 * [160]
= 1660 [Ans]

(ii) Row Major Wise Calculation of above equation

The given values are: B = 1500, W = 1 byte, I = 15, J = 20, Lr = -15,


Lc = 15, N = 26

Address of A [ I ][ J ] = B + W * [ N * ( I – Lr ) + ( J – Lc ) ]

=1500+1*[26*(15–(-15))+(20–15)]
=1500+1*[26*30+5]
=1500+1*[780+5]
=1500+785
=2285[Ans]
Applications of Arrays-
 Arrays are used to implement mathematical vectors and matrices, as
well as other kinds of rectangular tables. Many databases, small and
large, consist of one-dimensional arrays whose elements
are records.
 Arrays are used to implement other data structures, such as
lists, heaps, hash tables, deques, queues and stacks.
 One or more large arrays are sometimes used to emulate in-
program dynamic memory allocation, particularly memory
pool allocation. Historically, this has sometimes been the only way to
allocate "dynamic memory" portably.
 Arrays can be used to determine partial or complete control flow in
programs, as a compact alternative to (otherwise repetitive)
multiple “if” statements. They are known in this context as control
tables and are used in conjunction with a purpose-built interpreter
whose control flow is altered according to values contained in the
array. The array may contain subroutine pointers (or relative
subroutine numbers that can be acted upon by SWITCH statements)
that direct the path of the execution.

Character String in C-
 Strings are actually one-dimensional arrays of characters terminated
by a null character '\0'. Thus a null-terminated string contains the
characters that comprise the string followed by a null.
 The following declaration and initialization create a string
consisting of the word "Hello". To hold the null character at the end
of the array, the size of the character array containing the string is
one more than the number of characters in the word "Hello."
 char greeting[6] = {'H', 'e', 'l', 'l', 'o', '\0'};
 If you follow the rule of array initialization then you can write the
above statement as follows −
 char greeting[] = "Hello";
 Following is the memory presentation of the above defined string in C/C++ −

   Index:    0     1     2     3     4     5
   Value:   'H'   'e'   'l'   'l'   'o'  '\0'
Character String Operation-
 A character string is a series of characters represented by bits of
code and organized into a single variable. This string variable
holding characters can be set to a specific length or analyzed by a
program to identify its length.
 A character string can play many roles in a computer program. For
example, a programmer can create an unpopulated character string
with a command in the load function of a program.
 A user event can input data into that character string. If the user
types in a word or phrase such as "hello world," the program can
then later read that character string and print it, display it on the
screen, reserve it for storage, etc.
 In modern programming, character strings are often involved in data
capture and data storage functions that take in names or other types
of information.
Array as Parameters-
 An array can also be passed to a method as an argument or parameter.
The method processes the array and returns output. Passing an array
as a parameter in C++ is as easy as passing any other value as a
parameter: just create a function that accepts an array as an
argument and then process it. The following demonstration will help
you understand how to pass an array as an argument in C++ programming.

#include <iostream>
using namespace std;

void show(int a[], int n)
{
    for (int i = 0; i < n; i++)
        cout << a[i] << " ";
}

int main()
{
    int arr[5] = {1, 2, 3, 4, 5};
    show(arr, 5);
    return 0;
}
Ordered List-
 The structure of an ordered list is a collection of items where each
item holds a relative position that is based upon some underlying
characteristic of the item. The ordering is typically either ascending
or descending and we assume that list items have a meaningful
comparison operation that is already defined.

Sparse Matrix and Vectors-


 Sparse matrices are represented by the SparseMatrix&lt;T&gt; class that
defines some additional methods and properties common to all
sparse matrices. Currently, only matrices in Compressed Sparse
Column (CSC) format are available.
How sparse matrices are stored
o There are many representations for sparse matrices. Each has
its advantages and disadvantages. Each has a set of applications
for which it is particularly well suited. The only sparse storage
format currently supported is Compressed Sparse Column
(CSC) format. The elements are stored in column-major order.
Only the indexes and values of the nonzero elements are
stored. It consists of four arrays:
 An array, values, containing the values of the nonzero matrix
elements.
 An array, rows, containing the row indices of the elements
of values.
 An array, pointerB, of length ColumnCount containing the index
into values and rows where the ith column begins.
 An array, pointerE, of length ColumnCount containing the index
into values and rows where the ith column ends.
Sparse matrices in CSC format are represented by
the SparseCompressedColumnMatrix&lt;T&gt; class. Other
representations (compressed sparse row format, coordinate
format, sparse diagonal, block sparse row format) may be
supported in the future.
Sparse vectors
o Sparse vectors are represented by the SparseVector&lt;T&gt; class.
Some special sparse vectors, like the rows and columns of a
sparse matrix, are represented by internal types.

Constructing sparse vectors


The SparseVector&lt;T&gt; class has no public constructors. Instead, use
an overload of the CreateSparse method to construct a new sparse
vector. This method has four overloads. The first overload takes one
argument: the length of the vector. The second overload takes an
additional argument: a Double value between 0 and 1 that specifies
the proportion of nonzero values that is expected. The third
overload is similar to the second, but takes as its second argument
an integer that specifies the capacity. All three overloads need the
element type to be specified as a generic type argument.
Unit-3
Stack and Queue and
Linked List
Static and Dynamic Data Structure-
 In Static data structure the size of the structure is fixed. The content
of the data structure can be modified but without changing the
memory space allocated to it.

Example of Static Data Structures: Array

 In a Dynamic data structure the size of the structure is not fixed
and can be modified during the operations performed on it. Dynamic
data structures are designed to facilitate change of data structures
at run time.

Example of Dynamic Data Structures: Linked List


Definition and concept of Stack, Queue
and Linked List-
Stack- In computer science, a stack is an abstract data type that
serves as a collection of elements, with two principal operations:
push, which adds an element to the collection, and. pop, which
removes the most recently added element that was not yet removed.
 It is a LIFO (Last In First Out) based data structure. It has one
open end, called the TOS (Top of Stack), where both insertions and
deletions take place.

Queue- Queue is a linear structure which follows a particular


order in which the operations are performed. The order
is First In First Out (FIFO). A good example of queue is any queue of
consumers for a resource where the consumer that came first is
served first.
The difference between stacks and queues is in removing. In a stack
we remove the item the most recently added; in a queue, we remove
the item the least recently added.

Linked List- A linked list is a linear data structure where


each element is a separate object. Each element (we will call it a
node) of a list comprises two items - the data and a reference
to the next node. The last node has a reference to null. The entry
point into a linked list is called the head of the list.
Algorithm and Applications of Stack and
Queue-
 Algorithm - A step-by-step problem-solving technique is called an
algorithm.
 In stack terminology, insertion operation is called PUSH operation
and removal operation is called POP operation.
Stack Representation

 A stack can be implemented by means of Array, Structure, Pointer,


and Linked List. Stack can either be a fixed size one or it may have a
sense of dynamic resizing.
Push Operation
 The process of putting a new data element onto stack is known as a
Push Operation. Push operation involves a series of steps –
 Step 1 − Checks if the stack is full.
 Step 2 − If the stack is full, produces an error and exit.
 Step 3 − If the stack is not full, increments top to point next empty
space.
 Step 4 − Adds data element to the stack location, where top is
pointing.
 Step 5 − Returns success.

Pop Operation
 Accessing the content while removing it from the stack, is known as
a Pop Operation. In an array implementation of pop() operation, the
data element is not actually removed, instead top is decremented to
a lower position in the stack to point to the next value. But in linked-
list implementation, pop() actually removes data element and
deallocates memory space.
 A Pop operation may involve the following steps −
 Step 1 − Checks if the stack is empty.

 Step 2 − If the stack is empty, produces an error and exit.


 Step 3 −If the stack is not empty, accesses the data element
at which top is pointing.
 Step 4 − Decreases the value of top by 1.
 Step 5 − Returns success.
Applications of Stack and Queue-
Application of Stack-

 The simplest application of a stack is to reverse a word. You


push a given word to stack - letter by letter - and then pop
letters from the stack.
 Another application is an "undo" mechanism in text editors;
this operation is accomplished by keeping all text changes in
a stack.
 Reversing a string
 Frames
 Polish notation

Reversing a string-
Pushing the characters of a string onto a stack and then popping
them yields the string reversed; e.g. "DURGA" comes back out as
"AGRUD" (the reverse of the string).

Frames- When the CPU keeps a process waiting, its state (frame) is
stored on a stack.

Polish Notation-
1. Prefix
2. Infix
3. Postfix
Converting from Infix to Postfix-(AB+)
 It uses the BEDMAS process to solve it.
 That is, to solve it we apply the operations in this sequence-
 B=Bracket

 E=Exponent

 D=Division /

 M=Multiplication *

 A=Addition +

 S=Subtraction -
Example- A/B+C-D*G

(A/B) +C-D*G
(AB/) +C-D*G Let AB/=P
P+C-(D*G)
P+C-(DG*) Let DG*=Q
(P+C)-Q
(PC+)-Q Let PC+=R
R-Q
RQ-
R DG* - (substituting Q = DG*)
PC+ DG* - (substituting R = PC+)
AB/C+DG*- Ans (substituting P = AB/)

Infix to Prefix- (+AB)


Q.
A/B+C-D*G
(/AB) +C-D*G Let /AB=P
P+C-D*G
P+C-(*DG) Let *DG=Q
P+C-Q
+PC-Q Let +PC=R
R-Q
-RQ
-+PCQ (substituting R = +PC)
-+/ABC*DG Ans (substituting P = /AB, Q = *DG)

Q. Trace the stack evaluation of 40 20 30 + -

Input   Stack        Output
40      40           --------
20      40 20        --------
30      40 20 30     --------
+       40 50        Pop(30), Pop(20); push 20 + 30 = 50
-       -10          Pop(50), Pop(40); push 40 - 50 = -10

Result: 40 - (20 + 30) = -10 Ans

Application of Queue-
 Queue, as the name suggests is used whenever we need to
manage any group of objects in an order in which the first
one coming in, also gets out first while the others wait for
their turn, like in the following scenarios:
 Serving requests on a single shared resource, like a printer,

CPU task scheduling etc.


 In real life scenarios, call center phone systems use queues

to hold people calling them in an order, until a service


representative is free.
 Handling of interrupts in real-time systems. The interrupts

are handled in the same order as they arrive i.e First come
first served.

Linked Stack & Queue-


 A linked stack is a linear list of elements commonly
implemented as a singly linked list whose start pointer
performs the role of the top pointer of a stack
 A linked queue is also a linear list of elements commonly
implemented as a singly linked list but with two pointers
viz., FRONT and REAR. The start pointer of the singly linked
list plays the role of FRONT while the pointer to the last node
is set to play the role of REAR.
Linked List Operations-
The three basic operations that can be performed on a list are:

» Creation: This operation is used to create constituent node as


and when required.

» Insertion: This operation is used to insert a new node in the


linked list.
 At the beginning of the list,
 At a certain position and
 At the end.

» Deletion: This operation is used to delete a node from the list.


 At the beginning of the list,
 At a certain position and
 At the end.

Other Linked list common operation :


» Traversing: It is a process of going through all the nodes of a
linked list from one end to the other.

» Concatenation: The process of appending the second list to the


end of the first list.
» Display: This operation is used to print each and every node’s
information.

Doubly Linked List-


 In computer science, a doubly linked list is a linked
data structure that consists of a set of sequentially linked
records called nodes. Each node contains two fields, called
links, that are references to the previous and to the next
node in the sequence of nodes.
Unit-4
Tree and Graph
Definition and Concept of Tree-

 A tree is a data structure made up of nodes or vertices and
edges without having any cycle. The tree with no nodes is
called the null or empty tree. A tree that is not empty
consists of a root node and potentially many levels of
additional nodes that form a hierarchy.
 In computer science, a tree is a widely used abstract data
type (ADT)—or data structure implementing this ADT—that
simulates a hierarchical tree structure, with a root value
and sub trees of children with a parent node, represented as
a set of linked nodes.

Terminology used in trees


Root
 The top node in a tree.
Child
 A node directly connected to another node when moving
away from the Root.
Parent
 The converse notion of a child.
Siblings
 A group of nodes with the same parent.

Descendant
 A node reachable by repeated proceeding from parent to
child.
Ancestor
 A node reachable by repeated proceeding from child to
parent.
Leaf (less commonly called External node)
 A node with no children.

Branch (Internal node)


 A node with at least one child.

Degree
 The number of subtrees of a node.

Edge
 The connection between one node and another.

Path
 A sequence of nodes and edges connecting a node with a
descendant.
Level
 The level of a node is defined by 1 + (the number of
connections between the node and the root).
Height of node
 The height of a node is the number of edges on the longest
path between that node and a leaf.
Height of tree
 The height of a tree is the height of its root node.

Depth
 The depth of a node is the number of edges from the tree's
root node to the node.
Forest
 A forest is a set of n ≥ 0 disjoint trees.
Binary Tree-
 A binary tree is a non-linear type of data structure consisting of
nodes in which each node has at most two children, which are
referred to as the left child and the right child.
Types of Binary Tree-
Full Binary Tree
 A Binary Tree is full if every node has 0 or 2 children. We can
also say a full binary tree is a binary tree in which all nodes
except leaves have two children.

Complete Binary Tree:


 A Binary Tree is complete Binary Tree if all levels are
completely filled except possibly the last level and the last
level has all keys as left as possible.
Perfect Binary Tree
 A Binary tree is Perfect Binary Tree in which all internal
nodes have two children and all leaves are at same level.

Balanced Binary Tree


 A binary tree is balanced if height of the tree is O(Log n)
where n is number of nodes. For Example, AVL tree maintain
O(Log n) height by making sure that the difference between
heights of left and right subtrees is at most 1. Red-Black trees
maintain O(Log n) height by making sure that the number of
black nodes on every root-to-leaf path is the same and there
are no adjacent red nodes. Balanced Binary Search trees are
performance wise good as they provide O(log n) time for
search, insert and delete.
A degenerate (or pathological) tree
 A Tree where every internal node has one child. Such trees
are performance-wise same as linked list.

Conversion of General Tree to Binary


Tree-
Steps to convert a general tree to binary-

 Use the root of the general tree as the root of the binary tree.
 Determine the first child of the root. This is the leftmost
node in the general tree at the next level.
 Insert this node. The child reference of the parent node
refers to this node .
 Continue finding the first child of each parent node and
insert it below the parent node with the child reference of
the parent to this node.
 When no more first children exist in the path just used, move
back to the parent of the last node entered and repeat the
above process. In other words, determine the first sibling of
the last node entered.
 Complete the tree for all nodes. In order to locate where the
node fits you must search for the first child at that level and
then follow the sibling references to a nil where the next
sibling can be inserted. The children of any sibling node can
be inserted by locating the parent and then
inserting the first child. Then the above process is repeated.

Tree Traversal-
 Traversal is a process to visit all the nodes of a tree and may
print their values too. Because, all nodes are connected via
edges (links) we always start from the root (head) node.
That is, we cannot randomly access a node in a tree. There
are three ways which we use to traverse a tree –
 In-order Traversal
 Pre-order Traversal
 Post-order Traversal

In-order Traversal
 In this traversal method, the left subtree is visited first, then
the root and later the right sub-tree. We should always
remember that every node may represent a subtree itself.
 If a binary tree is traversed in-order, the output will produce
sorted key values in an ascending order.

We start from A, and following in-order traversal, we move to its


left subtree B. B is also traversed in-order. The process goes on
until all the nodes are visited. The output of inorder traversal of
this tree will be −
D→B→E→A→F→C→G

Algorithm
Until all nodes are traversed −
Step 1 − Recursively traverse left subtree.
Step 2 − Visit root node.
Step 3 − Recursively traverse right subtree.
Pre-order Traversal
 In this traversal method, the root node is visited first, then
the left subtree and finally the right subtree.

We start from A, and following pre-order traversal, we first


visit A itself and then move to its left subtree B. B is also
traversed pre-order. The process goes on until all the nodes are
visited. The output of pre-order traversal of this tree will be −
A→B→D→E→C→F→G
Algorithm
Until all nodes are traversed −
Step 1 − Visit root node.
Step 2 − Recursively traverse left subtree.
Step 3 − Recursively traverse right subtree.

Post-order Traversal

 In this traversal method, the root node is visited last, hence


the name. First we traverse the left subtree, then the right
subtree and finally the root node.
We start from A, and following Post-order traversal, we first visit
the left subtree B. B is also traversed post-order. The process
goes on until all the nodes are visited. The output of post-order
traversal of this tree will be −
D→E→B→F→G→C→A
Algorithm
Until all nodes are traversed −
Step 1 − Recursively traverse left subtree.
Step 2 − Recursively traverse right subtree.
Step 3 − Visit root node.

Rotation of Tree-
Balance height tree-
 Also called AVL Tree.
 Stands for Adelson-Velskii and Landis.
 AVL trees are height balancing binary search tree. AVL tree
checks the height of the left and the right sub-trees and
assures that the difference is not more than 1. This
difference is called the Balance Factor.
 Here we see that the first tree is balanced and the next two
trees are not balanced-

In the second tree, the left subtree of C has height 2 and the
right subtree has height 0, so the difference is 2. In the third
tree, the right subtree of A has height 2 and the left is
missing, so it is 0, and the difference is 2 again. An AVL tree
permits the difference (balance factor) to be at most 1.

Balance Factor = height(left-sub tree) − height(right-sub tree)

 If the difference in the height of the left and right subtrees
is more than 1, the tree is balanced using some rotation
techniques.

AVL Rotations-
 To balance itself, an AVL tree may perform the following four
kinds of rotations –
 Left rotation
 Right rotation
 Left-Right rotation
 Right-Left rotation

The first two rotations are single rotations and the next two
rotations are double rotations. To have an unbalanced tree, we at
least need a tree of height 2. With this simple tree, let's
understand them one by one.
Left Rotation
If a tree becomes unbalanced, when a node is inserted into the
right subtree of the right subtree, then we perform a single left
rotation −

In our example, node A has become unbalanced as a node is


inserted in the right subtree of A's right subtree. We perform the
left rotation by making A the left subtree of B.
Right Rotation
AVL tree may become unbalanced, if a node is inserted in the left
subtree of the left subtree. The tree then needs a right rotation.

As depicted, the unbalanced node becomes the right child of its


left child by performing a right rotation.
Left-Right Rotation
Double rotations are slightly complex version of already
explained versions of rotations. To understand them better, we
should take note of each action performed while rotation. Let's
first check how to perform Left-Right rotation. A left-right
rotation is a combination of left rotation followed by right
rotation.

State Action

A node has been inserted into the right


subtree of the left subtree. This makes C an
unbalanced node. These scenarios cause
AVL tree to perform left-right rotation.

We first perform the left rotation on the left


subtree of C. This makes A, the left subtree
of B.

Node C is still unbalanced, however now, it


is because of the left-subtree of the left-
subtree.
We shall now right-rotate the tree,
making B the new root node of this
subtree. C now becomes the right subtree of
its own left subtree.

The tree is now balanced.

Right-Left Rotation
The second type of double rotation is Right-Left Rotation. It is a
combination of right rotation followed by left rotation.

State Action

A node has been inserted into the left


subtree of the right subtree. This makes A,
an unbalanced node with balance factor 2.

First, we perform the right rotation


along C node, making C the right subtree of
its own left subtree B. Now, B becomes the
right subtree of A.
Node A is still unbalanced because of the
right subtree of its right subtree and
requires a left rotation.

A left rotation is performed by making B the


new root node of the subtree. A becomes the
left subtree of its right subtree B.

The tree is now balanced.

Balanced Tree (B-Tree)-
 A B-tree is a self-balancing tree data structure that
keeps data sorted and allows searches, sequential access,
insertions, and deletions in logarithmic time. The B-tree is a
generalization of a binary search tree in that a node can have
more than two children. Unlike self-balancing binary
search trees, the B-tree is optimized for systems that read
and write large blocks of data. B-trees are a good example of
a data structure for external memory. They are commonly used
in databases and filesystems.
Graph-
 Graph is a non-linear type of data structure.
 A Graph is a pair of a sets (V,E) where, V is the set of vertices
and E is the set of edges, connecting the pair of vertices.
A B

C D E
In The above graph
V={A,B,C,D,E}
E={AB,AC,BD,CD,DE}
Types of Graph-
There are 2 types of Graph-
1. Directed Graph
2. Undirected Graph

Directed Graph- A graph in which every edge has a direction
is known as a directed graph.

A B

D C

Undirected Graph- A graph in which the edges have no direction
is known as an undirected graph.
A B

D C

Graph Terminology-
Weighted graph- A weighted graph is a graph in which
each branch is given a numerical weight. A weighted
graph is therefore a special type of labeled graph in which
the labels are numbers (which are usually taken to be
positive).
Unweighted graph- An unweighted graph is a graph in
which the branches carry no numerical weight.
Adjacent- If there is an edge between vertices A and B
then both A and B are said to be adjacent. In other words,
Two vertices A and B are said to be adjacent if there is an
edge whose end vertices are A and B.
Degree-Total number of edges connected to a vertex is
said to be degree of that vertex.
Incident edge-An edge is said to be incident on a vertex if
the vertex is one of the endpoint of that edge.
Isolated vertex- An isolated vertex is a vertex with degree
zero; that is, a vertex that is not an endpoint of any edge.
Path- A path is a sequence of alternating vertices and edges
that starts at a vertex and ends at a vertex such that each edge is
incident to its predecessor and successor vertex.
Self-loop- Self-loop is an edge with the end vertices the same
vertex.
Graph Representation-
 Graph data structure is represented using following
representation.
 Adjacency Matrix
 Incidence Matrix
 Adjacency List
Adjacency Matrix-
 Adjacency Matrix is a 2D array of size V x V, where V is the
number of vertices in a graph. Let the 2D array be adj[][]; a slot
adj[i][j] = 1 indicates that there is an edge from vertex i to vertex
j. Adjacency matrix for undirected graph is always symmetric.
Adjacency Matrix is also used to represent weighted graphs. If
adj[i][j] = w, then there is an edge from vertex i to vertex j with
weight w.
The adjacency matrix for the above example graph is:
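As a small C sketch (the 4-vertex graph here is illustrative, not the one from the figure), an undirected adjacency matrix can be built and printed like this:

```c
#include <stdio.h>

#define V 4  /* number of vertices in the illustrative graph */

/* Add an undirected edge; the matrix stays symmetric. */
void add_edge(int adj[V][V], int i, int j) {
    adj[i][j] = 1;
    adj[j][i] = 1;
}

/* Print the V x V matrix row by row. */
void print_matrix(int adj[V][V]) {
    for (int i = 0; i < V; i++) {
        for (int j = 0; j < V; j++)
            printf("%d ", adj[i][j]);
        printf("\n");
    }
}
```

For a weighted graph, add_edge would store the weight w instead of 1.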
Incidence matrix-
 Incidence matrix is that matrix which represents the graph
such that with the help of that matrix we can draw a graph.
This matrix can be denoted as [AC]. As in every matrix, there
are also rows and columns in the incidence matrix [AC]. The rows
of the matrix [AC] represent the number of nodes and the
column of the matrix [AC] represent the number of branches
in the given graph. If there are ‘n’ number of rows in a given
incidence matrix, that means in a graph there are ‘n’ number
of nodes. Similarly, if there are ‘m’ number of columns in that
given incidence matrix, that means in that graph there are ‘m’
number of branches.
In the above shown directed graph, there are 4 nodes and 6
branches. Thus the incidence matrix for the above graph will have
4 rows and 6 columns. The entries of an incidence matrix are always -1,
0, or +1. This matrix is always analogous to KCL (Kirchhoff's Current
Law). Thus from KCL we can derive that,
Type of branch                   Value
Outgoing branch from kth node     +1
Incoming branch to kth node       -1
Others                             0
Steps to Construct Incidence Matrix-
Following are the steps to draw the incidence matrix :-
1. If a given kth node has outgoing branch, then we will write +1.
2. If a given kth node has incoming branch, then we will write -1.
3. Rest other branches will be considered 0.

Example of Incidence Matrix-

For the graph shown above write its incidence matrix-


Adjacency List- In this representation, every vertex of the graph
contains a list of its adjacent vertices.
 An adjacency list is a collection of unordered lists used to
represent a finite graph. Each list describes the set of
neighbors of a vertex in the graph. This is one of several
commonly used representations of graphs for use in
computer programs.
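A minimal C sketch of an adjacency list, with one linked list of neighbors per vertex (names such as AdjNode and heads, and the vertex count, are assumptions for illustration):

```c
#include <stdlib.h>

#define V 4  /* number of vertices (illustrative) */

/* One entry in a vertex's neighbor list. */
struct AdjNode {
    int dest;
    struct AdjNode *next;
};

/* heads[u] points to the list of u's neighbors. */
struct AdjNode *heads[V];

/* Prepend v to u's list; call with (u,v) and (v,u) for an undirected edge. */
void add_edge(int u, int v) {
    struct AdjNode *n = malloc(sizeof *n);
    n->dest = v;
    n->next = heads[u];
    heads[u] = n;
}

/* Degree of u = length of its adjacency list. */
int degree(int u) {
    int d = 0;
    for (struct AdjNode *p = heads[u]; p; p = p->next)
        d++;
    return d;
}
```

Compared with the adjacency matrix, this uses space proportional to the number of edges rather than V x V.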

Graph Traversal-
 Graph traversal is a technique used for searching for a vertex in a
graph. Graph traversal is also used to decide the order of
vertices to be visited in the search process. A graph traversal
finds the edges to be used in the search process without
creating loops; that is, using graph traversal we visit all
vertices of the graph without getting into a looping path.

There are two graph traversal techniques, and they are as
follows...

 DFS (Depth First Search)
 BFS (Breadth First Search)
DFS (Depth First Search)-
DFS traversal of a graph produces a spanning tree as the final
result. A spanning tree is a graph without any loops. We use a Stack
data structure with a maximum size of the total number of vertices in
the graph to implement DFS traversal of a graph.

We use the following steps to implement DFS traversal...

 Step 1: Define a Stack of size equal to the total number of
vertices in the graph.
 Step 2: Select any vertex as the starting point for traversal. Visit
that vertex and push it on to the Stack.
 Step 3: Visit any one of the unvisited adjacent vertices of the vertex
which is at the top of the stack and push it on to the stack.
 Step 4: Repeat step 3 until there is no new vertex to be visited
from the vertex on top of the stack.
 Step 5: When there is no new vertex to be visited, then use
backtracking and pop one vertex from the stack.
 Step 6: Repeat steps 3, 4 and 5 until the stack becomes Empty.
 Step 7: When the stack becomes Empty, produce the final
spanning tree by removing unused edges from the graph.

Backtracking is coming back to the vertex from which we
came to the current vertex.

Example:
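The steps above can be sketched in C with an explicit stack; the adjacency-matrix representation and vertex count are assumptions for illustration:

```c
#define V 5  /* vertex count for the illustrative graph */

/* DFS with an explicit stack, following the steps above.
   Records the visit order and returns how many vertices were reached. */
int dfs(int adj[V][V], int start, int order[V]) {
    int visited[V] = {0};
    int stack[V], top = -1;
    int count = 0;

    visited[start] = 1;               /* Step 2: visit the start vertex */
    order[count++] = start;
    stack[++top] = start;             /* ...and push it */

    while (top >= 0) {                /* Step 6: until the stack is empty */
        int v = stack[top], next = -1;
        for (int u = 0; u < V; u++)   /* Step 3: find an unvisited neighbor */
            if (adj[v][u] && !visited[u]) { next = u; break; }
        if (next == -1) {
            top--;                    /* Step 5: backtrack by popping */
        } else {
            visited[next] = 1;
            order[count++] = next;
            stack[++top] = next;
        }
    }
    return count;
}
```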
BFS (Breadth First Search)-
BFS traversal of a graph produces a spanning tree as the final
result. A spanning tree is a graph without any loops. We use a Queue
data structure with a maximum size of the total number of vertices in
the graph to implement BFS traversal of a graph.

We use the following steps to implement BFS traversal...

 Step 1: Define a Queue of size equal to the total number of
vertices in the graph.
 Step 2: Select any vertex as the starting point for traversal. Visit
that vertex and insert it into the Queue.
 Step 3: Visit all the unvisited adjacent vertices of the vertex which
is at the front of the Queue and insert them into the Queue.
 Step 4: When there is no new vertex to be visited from the
vertex at the front of the Queue, then delete that vertex from the
Queue.
 Step 5: Repeat steps 3 and 4 until the queue becomes empty.
 Step 6: When the queue becomes Empty, produce the final
spanning tree by removing unused edges from the graph.
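The BFS steps can be sketched in C with a simple array queue (graph representation and vertex count are assumptions for illustration):

```c
#define V 5  /* vertex count for the illustrative graph */

/* BFS with a simple array queue, following the steps above. */
int bfs(int adj[V][V], int start, int order[V]) {
    int visited[V] = {0};
    int queue[V], front = 0, rear = 0;
    int count = 0;

    visited[start] = 1;               /* Step 2: visit the start vertex */
    order[count++] = start;
    queue[rear++] = start;            /* ...and enqueue it */

    while (front < rear) {            /* Step 5: until the queue is empty */
        int v = queue[front++];       /* Step 4: remove the front vertex */
        for (int u = 0; u < V; u++)   /* Step 3: visit all unvisited neighbors */
            if (adj[v][u] && !visited[u]) {
                visited[u] = 1;
                order[count++] = u;
                queue[rear++] = u;
            }
    }
    return count;
}
```

The only structural difference from the DFS sketch is the queue in place of the stack, which changes the visit order from depth-first to level-by-level.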
Spanning Tree-
 A spanning tree is a subset of Graph G which has all the
vertices covered with the minimum possible number of edges.
Hence, a spanning tree does not have cycles and it cannot be
disconnected. By this definition, we can draw the conclusion
that every connected and undirected Graph G has at least
one spanning tree.

We can find three spanning trees from one complete graph. A complete
undirected graph can have a maximum of n^(n-2) spanning trees,
where n is the number of nodes. In the above addressed example, n is
3, hence 3^(3-2) = 3 spanning trees are possible.

Shortest Path-
 The problem of finding the shortest path in a graph from
one vertex to another. "Shortest" may be least number
of edges, least total weight, etc.
Shortest path (A, C, E, D, F) between vertices A and F in the
weighted directed graph.

Transitive Closure-
 Given a directed graph, find out if a vertex j is reachable from
another vertex i for all vertex pairs (i, j) in the given graph.
Here reachable mean that there is a path from vertex i to
j. The reach-ability matrix is called transitive closure of a
graph.
For example, consider the graph below.

The transitive closure of the above graph is:

1 1 1 1
1 1 1 1
1 1 1 1
0 0 0 1
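One standard way to compute this reach-ability matrix is Warshall's algorithm (the source does not name a method, so this choice is an assumption). A C sketch:

```c
#define V 4  /* number of vertices */

/* Warshall's algorithm: on entry, reach[][] is the adjacency matrix
   with reach[i][i] = 1; on exit, reach[i][j] = 1 iff vertex j is
   reachable from vertex i. Runs in O(V^3). */
void transitive_closure(int reach[V][V]) {
    for (int k = 0; k < V; k++)            /* allow k as an intermediate */
        for (int i = 0; i < V; i++)
            for (int j = 0; j < V; j++)
                if (reach[i][k] && reach[k][j])
                    reach[i][j] = 1;
}
```

With a suitable 4-vertex digraph as input, the result is a matrix like the one shown above, where row 3 reaches only itself.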
Unit-5
Searching & Sorting
Sequential Search-
 Also Known as Linear search.
 Linear search is a very simple search algorithm. In this type
of search, a sequential search is made over all items one by
one. Every item is checked and if a match is found then that
particular item is returned, otherwise the search continues
till the end of the data collection.

Algorithm
Linear Search ( Array A, Value x)

Step 1: Set i to 1
Step 2: if i > n then go to step 7
Step 3: if A[i] = x then go to step 6
Step 4: Set i to i + 1
Step 5: Go to Step 2
Step 6: Print Element x Found at index i and go to step 8
Step 7: Print element not found
Step 8: Exit
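The algorithm above translates directly into C (using 0-based indexing instead of the 1-based steps):

```c
/* Linear search: return the index of x in a[0..n-1], or -1 if absent. */
int linear_search(const int a[], int n, int x) {
    for (int i = 0; i < n; i++)
        if (a[i] == x)
            return i;       /* Step 6: element found at index i */
    return -1;              /* Step 7: element not found */
}
```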
Binary Search-
 Binary search is a fast search algorithm with run-time
complexity of Ο(log n). This search algorithm works on the
principle of divide and conquer. For this algorithm to work
properly, the data collection should be in the sorted form.
 Binary search looks for a particular item by comparing the
middle most item of the collection. If a match occurs, then
the index of item is returned. If the middle item is greater
than the item, then the item is searched in the sub-array to
the left of the middle item. Otherwise, the item is searched
for in the sub-array to the right of the middle item. This
process continues on the sub-array as well until the size of
the subarray reduces to zero.
How Binary Search Works?
 For a binary search to work, it is mandatory for the target
array to be sorted. We shall learn the process of binary
search with a pictorial example. The following is our sorted
array and let us assume that we need to search the location
of value 31 using binary search.

First, we shall determine the middle of the array by using this formula −

mid = low + (high - low) / 2

Here it is, 0 + (9 - 0) / 2 = 4 (the integer value of 4.5). So, 4 is the mid
of the array.
Now we compare the value stored at location 4, with the value
being searched, i.e. 31. We find that the value at location 4 is 27,
which is not a match. As the value is greater than 27 and we have
a sorted array, so we also know that the target value must be in
the upper portion of the array.

We change our low to mid + 1 and find the new mid value again.
low = mid + 1
mid = low + (high - low) / 2
Our new mid is 7 now. We compare the value stored at location 7
with our target value 31.

The value stored at location 7 is not a match; rather, it is more
than what we are looking for. So, the value must be in the lower
part from this location.

Hence, we calculate the mid again. This time it is 5.

We compare the value stored at location 5 with our target value.
We find that it is a match.
We conclude that the target value 31 is stored at location 5.
Binary search halves the searchable items at each step and thus
drastically reduces the number of comparisons to be made.
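The walkthrough maps to a short iterative C function; the exact array contents are an assumption, chosen to be consistent with the values mentioned above (27 at index 4, 31 at index 5):

```c
/* Iterative binary search on a sorted array; returns the index of x
   or -1 if x is absent. Mid is computed as low + (high - low) / 2,
   as in the walkthrough. */
int binary_search(const int a[], int n, int x) {
    int low = 0, high = n - 1;
    while (low <= high) {
        int mid = low + (high - low) / 2;
        if (a[mid] == x)
            return mid;
        else if (a[mid] < x)
            low = mid + 1;   /* target lies in the upper half */
        else
            high = mid - 1;  /* target lies in the lower half */
    }
    return -1;
}
```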

Insertion Sort-
 This is an in-place comparison-based sorting algorithm.
Here, a sub-list is maintained which is always sorted. For
example, the lower part of an array is maintained to be
sorted. An element which is to be 'insert'ed in this sorted
sub-list, has to find its appropriate place and then it has to
be inserted there. Hence the name, insertion sort.
 The array is searched sequentially and unsorted items are
moved and inserted into the sorted sub-list (in the same
array). This algorithm is not suitable for large data sets as its
average and worst case complexity are of O(n²), where n is
the number of items.
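A minimal C sketch of insertion sort, as described above:

```c
/* Insertion sort: a[0..i-1] is the sorted sub-list; each pass inserts
   a[i] into its proper place within it. */
void insertion_sort(int a[], int n) {
    for (int i = 1; i < n; i++) {
        int key = a[i];
        int j = i - 1;
        while (j >= 0 && a[j] > key) {
            a[j + 1] = a[j];   /* shift larger elements one slot right */
            j--;
        }
        a[j + 1] = key;        /* insert into the sorted sub-list */
    }
}
```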

Selection Sort-
 Selection sort is a simple sorting algorithm. This sorting
algorithm is an in-place comparison-based algorithm in
which the list is divided into two parts, the sorted part at the
left end and the unsorted part at the right end. Initially, the
sorted part is empty and the unsorted part is the entire list.
 The smallest element is selected from the unsorted array
and swapped with the leftmost element, and that element
becomes a part of the sorted array. This process continues
moving unsorted array boundary by one element to the
right.
 This algorithm is not suitable for large data sets as its
average and worst case complexities are of O(n²), where n is
the number of items.
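A minimal C sketch of selection sort, following the description above:

```c
/* Selection sort: grow the sorted part on the left by repeatedly
   swapping the smallest unsorted element into position i. */
void selection_sort(int a[], int n) {
    for (int i = 0; i < n - 1; i++) {
        int min = i;
        for (int j = i + 1; j < n; j++)
            if (a[j] < a[min])
                min = j;           /* index of smallest unsorted element */
        if (min != i) {
            int t = a[i]; a[i] = a[min]; a[min] = t;
        }
    }
}
```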
Analysis of sorting algorithm-
Time complexity analysis –
We have discussed best, average and worst case complexity of
different sorting techniques with possible scenarios.
Comparison based sorting –
In comparison based sorting, elements of the array are compared
with each other to find the sorted array.
 Bubble sort and Insertion sort –
Average and worst case time complexity: O(n²)
Best case time complexity: O(n) when the array is already sorted.
 Selection sort –
Best, average and worst case time complexity: O(n²), which is
independent of the distribution of data.
 Merge sort –
Best, average and worst case time complexity: O(n log n), which is
independent of the distribution of data.
 Heap sort –
Best, average and worst case time complexity: O(n log n), which is
independent of the distribution of data.
 Quick sort –
It is a divide and conquer approach with recurrence relation:
 T(n) = T(k) + T(n-k-1) + cn

Worst case: when the array is sorted or reverse sorted, the
partition algorithm divides the array into two subarrays with 0
and n-1 elements. Therefore,

T(n) = T(0) + T(n-1) + cn

Solving this, we get T(n) = O(n²).

Best case and average case: on average, the partition
algorithm divides the array into two subarrays of equal size.
Therefore,

T(n) = 2T(n/2) + cn

Solving this, we get T(n) = O(n log n).

Non-comparison based sorting –
In non-comparison based sorting, elements of the array are not
compared with each other to find the sorted array.
 Radix sort –
Best, average and worst case time complexity: O(nk), where k is
the maximum number of digits in the elements of the array.
 Count sort –
Best, average and worst case time complexity: O(n+k), where k is
the size of the count array.
 Bucket sort –
Best and average case time complexity: O(n+k), where k is the
number of buckets.
Worst case time complexity: O(n²) if all elements belong to the
same bucket.
Lower bounds-
 The term lower bound is defined dually as an element
of K which is less than or equal to every element of S. A set
with a lower bound is said to be bounded from below by
that bound.
 For example, 5 is a lower bound for the set
{ 5, 8, 42, 34, 13934 }; so is 4; but 6 is not.

 Another example: for the set { 42 }, the number 42 is both an
upper bound and a lower bound; all other real numbers are
either an upper bound or a lower bound for that set.
 Every subset of the natural numbers has a lower bound,
since the natural numbers have a least element (0, or 1
depending on the exact definition of natural numbers).
 An infinite subset of the natural numbers cannot be
bounded from above.
Merge sort of linked List-
 Merge sort is often preferred for sorting a linked list. The
slow random-access performance of a linked list makes
some other algorithms (such as quicksort) perform poorly,
and others (such as heapsort) completely impossible.
 Let head be the first node of the linked list to be sorted and
headRef be the pointer to head. Note that we need a
reference to head in MergeSort() as the below
implementation changes next links to sort the linked lists
(not data at the nodes), so head node has to be changed if the
data at original head is not the smallest value in linked list.

MergeSort(headRef)
1) If head is NULL or there is only one element in the Linked
List
then return.
2) Else divide the linked list into two halves.
FrontBackSplit(head, &a, &b); /* a and b are two halves
*/
3) Sort the two halves a and b.
MergeSort(a);
MergeSort(b);
4) Merge the sorted a and b (using SortedMerge() discussed
here)
and update the head pointer using headRef.
*headRef = SortedMerge(a, b);
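The pseudocode above might be realized in C roughly as follows. The node layout is an assumption, and FrontBackSplit is sketched with the common slow/fast pointer technique, since the source does not show its body:

```c
#include <stddef.h>

/* Singly linked list node (layout is an assumption). */
struct LNode {
    int data;
    struct LNode *next;
};

/* FrontBackSplit via slow/fast pointers: splits 'source'
   (at least one node) into two halves. */
void front_back_split(struct LNode *source,
                      struct LNode **front, struct LNode **back) {
    struct LNode *slow = source, *fast = source->next;
    while (fast) {
        fast = fast->next;
        if (fast) { slow = slow->next; fast = fast->next; }
    }
    *front = source;
    *back = slow->next;   /* second half starts after the midpoint */
    slow->next = NULL;
}

/* SortedMerge: merge two sorted lists by relinking nodes, not copying data. */
struct LNode *sorted_merge(struct LNode *a, struct LNode *b) {
    if (!a) return b;
    if (!b) return a;
    if (a->data <= b->data) {
        a->next = sorted_merge(a->next, b);
        return a;
    }
    b->next = sorted_merge(a, b->next);
    return b;
}

/* MergeSort: takes a reference to head because the head node may change. */
void merge_sort(struct LNode **head_ref) {
    struct LNode *head = *head_ref, *a, *b;
    if (!head || !head->next) return;      /* step 1: 0 or 1 element */
    front_back_split(head, &a, &b);        /* step 2: divide into halves */
    merge_sort(&a);                        /* step 3: sort each half */
    merge_sort(&b);
    *head_ref = sorted_merge(a, b);        /* step 4: merge, update head */
}
```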

Quick sort-
 Quick sort is a highly efficient sorting algorithm and is based
on partitioning of array of data into smaller arrays. A large
array is partitioned into two arrays one of which holds
values smaller than the specified value, say pivot, based on
which the partition is made and another array holds values
greater than the pivot value.
 Quick sort partitions an array and then calls itself
recursively twice to sort the two resulting subarrays. This
algorithm is quite efficient for large-sized data sets, as its
average case complexity is O(n log n), though its worst case
complexity is O(n²), where n is the number of items.
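A C sketch using the common Lomuto partition scheme (the source does not fix a particular partition method, so this choice is an assumption):

```c
/* Lomuto partition: a[high] is the pivot; elements <= pivot end up
   on its left, and the pivot lands at its final sorted position. */
int partition(int a[], int low, int high) {
    int pivot = a[high];
    int i = low - 1;
    for (int j = low; j < high; j++)
        if (a[j] <= pivot) {
            i++;
            int t = a[i]; a[i] = a[j]; a[j] = t;
        }
    int t = a[i + 1]; a[i + 1] = a[high]; a[high] = t;
    return i + 1;
}

/* Quick sort: partition, then recurse on the two subarrays. */
void quick_sort(int a[], int low, int high) {
    if (low < high) {
        int p = partition(a, low, high);
        quick_sort(a, low, p - 1);
        quick_sort(a, p + 1, high);
    }
}
```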

File Structure
External Storage device-
 External storage comprises devices that store information
outside a computer. Such devices may be permanently
attached to the computer, may be removable or may use
removable media.

Types of external storage-


Magnetic storage

 magnetic tape
 floppy disk
 external hard disk drives
Optical storage

 CD
 DVD
 Blu-ray
Flash memory devices

 Memory card
 Memory stick
 USB drives
Files-
 A file is an object on a computer that
stores data, information, settings, or commands used with a
computer program. In a graphical user interface (GUI) such
as Microsoft Windows, files display as icons that relate to
the program that opens the file. For example, the picture is
an icon associated with Adobe Acrobat PDF files. If this file
was on your computer, double-clicking the icon in Windows
would open that file in Adobe Acrobat or the PDF reader
installed on the computer.

Sequential Organization-
 In sequential organization the records are placed
sequentially onto the storage media i.e. occupy consecutive
locations in the case of tape that means placing records
adjacent to each other.
 In addition the physical sequence of records is ordered on
some key called the primary key.
 Sequential organization is also possible in the case of DASD
such as a disk. Even though disk storage is really two
dimensional (cylinder x surface) it may be mapped down
into one dimensional memory.
 If the disk has c cylinders and s surfaces one possibility will
be to view disk memory as in figure.
 Using notation tij to represent the jth track of the ith surface,
the sequence is t11, t21, t31….ts1, t12, t22,…..ts2 etc.
 The sequential interpretation in the figure is particularly efficient
for batched update and retrieval, as the tracks are to be
accessed in order: all tracks on cylinder 1 followed by all
tracks on cylinder 2, etc. As a result, the read/write
heads are moved one cylinder at a time and this movement is
necessitated only once for every s tracks.
 Its main advantages are:
o It is easy to implement;
o It provides fast access to the next record using
lexicographic order.
 Its disadvantages:
o It is difficult to update - inserting a new record may
require moving a large proportion of the file;
o Random access is extremely slow.

Random Organization-
 Records are stored at random locations on the disk. This
randomization could be achieved by any of several
techniques: direct addressing, directory lookup, hashing.
Direct addressing: in direct addressing with equi-size records,
available disk space is divided out into nodes large enough to
hold a record. Numeric value of primary key is used to
determine the node into which a particular record is to be
stored.
Directory lookup: the index is not direct access type but is a
dense index maintained using a structure suitable for index
operations. Retrieving a record involves searching the index for
the record address and then accessing the record itself. The
storage management scheme will depend on whether fixed size
or variable size nodes are being used. It requires more accesses
for retrieval and update, since index searching will generally
require more than one access. In both direct addressing and
directory lookup, some provision must be made to handle
collisions.
Hashing: the available file space is divided into buckets and
slots. Some space may have to be set aside for an overflow area
in case chaining is being used to handle overflows. When
variable size records are present, the no. of slots per bucket will
be only rough indicator of no. of records a bucket can hold. The
actual no. will vary dynamically with the size of records in a
particular bucket. Random organization on the primary key
using any of the above three techniques overcomes the
difficulties of sequential organizations. Insertion, deletions
become easy. But batch processing of queries becomes
inefficient as records are not maintained in order of primary
key. Handling range queries becomes very inefficient except in
case of directory lookup.
Linked Organization-
 Linked organizations differ from sequential organizations
essentially in that the logical sequence of records is generally
different from the physical sequence.
 In sequential ith record is placed at location li, then the
i+1st record is placed at li + c where c is the length of ith
record or some fixed constant.
 In linked organization the next logical record is obtained by
following link value from present record. Linking in order of
increasing primary key eases insertion deletion.
 Searching for a particular record is difficult since no index is
available, so only sequential search possible.
 We can facilitate searching by maintaining indexes
corresponding to ranges of employee numbers, e.g. 501-700,
701-900. All records within the same range will be linked together
in a list.
 We can generalize this idea for secondary key level also. We
just set up indexes for each key and allow records to be in
more than one list. This leads to the multilist structure for
file representation.

Inverted File-
 Inverted files are similar to multilists. Multilists records with
the same key value are linked together with link information
being kept in individual record. In case of inverted files the
link information is kept in index itself.
 E.g. we assume that every key is dense. Since the index
entries are variable length, index maintenance becomes
complex for multilists. The benefit is that Boolean queries
require only one access per record satisfying the query.
Queries of the type k1=xx and k2=yy can be handled similarly by
intersecting two lists.
 The retrieval works in two steps. In the first step, the indexes
are processed to obtain a list of records satisfying the query
and in the second, these records are retrieved using the list.
The no. of disk accesses needed is equal to the no. of records
being retrieved + the no. to process the indexes.
 Inverted files represent one extreme of file organization in
which only the index structures are important. The records
themselves can be stored in any way.
 Inverted files may also result in space saving compared with
other file structures when record retrieval doesn’t require
retrieval of key fields. In this case key fields may be deleted
from the records unlike multilist structures.

Indexing Techniques-
 We know that data is stored in the form of records. Every
record has a key field, which helps it to be recognized
uniquely.
 Indexing is a data structure technique to efficiently retrieve
records from the database files based on some attributes on
which the indexing has been done. Indexing in database
systems is similar to what we see in books.
 Indexing is defined based on its indexing attributes. Indexing
can be of the following types −
 Primary Index − Primary index is defined on an ordered data
file. The data file is ordered on a key field. The key field is
generally the primary key of the relation.
 Secondary Index − Secondary index may be generated from a
field which is a candidate key and has a unique value in every
record, or a non-key with duplicate values.
 Clustering Index − Clustering index is defined on an ordered
data file. The data file is ordered on a non-key field.
