1 Algorithms Gate

NOTE: You will see many times how the correctness of algorithms is proved using induction, and also how constructing the inductive form leads to recursive algorithms. The relation between the concepts of induction and recursion is this: "Induction validates Recursion"

In general, to solve problems we need to model them mathematically to a certain level of
detail, which is what data structures help us with; the algorithm design techniques are then
strategies to handle the problem, a.k.a. finding the solution

Dynamic programming is about avoiding recomputation of overlapping subproblems so we don't waste
computational resources
AND in the Discrete Mathematics course they follow the C. L. Liu book (so we get the picture)
Typical Network representation

First step is to model the problem – to take only the necessary details
A Graph --- a big advantage is that the picture can be distorted without changing its meaning
Can we scale this algorithm to cover multiple airlines etc.?
A Scheduling Problem
Combinatorial explosion
Like car and automotive
Usually the O(n) one is thus what we focus on
Proper scaling of the problem input reveals this huge difference in efficiency
The size of the problems we can tackle for n² is much smaller than the size of problems we
can handle for n log n; n = k is essentially the crossover point where the efficiency takes over and
the better performance reveals itself
Since we are interested in orders of magnitude we broadly classify functions based on
this (the expression list)

Here the number of operations needed for each input size is given

Note that for 2^n we can't even get near the n = 100 input – a foreshadowing of NP-Hard
and all that

Efficiency --- computed in terms of basic operations

Running time -- computed as a function of its input size n


Usually the input size depends on the natural parameter of the problem

But for the following class of problems we need to be careful -- arithmetic functions involving
numbers
For arithmetic problems, the number of digits required for representation is the input size;
the number itself is not (e.g. for the number 100 the input size is not 100, it is 3)
The reason why we drop constants: depending on whether we take a swap or an assignment as
the basic operation, the total no. of basic operations changes, since

1 swap operation -> 3 assignment operations

But because we drop the constants and focus on the order, this doesn't really affect us.
Hence dropping the constants gives us this flexibility in the choice of basic operations
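A minimal Python sketch of that decomposition (the function name is just for illustration):

```python
# One swap of a[i] and a[j] decomposes into three assignments:
def swap(a, i, j):
    temp = a[i]    # assignment 1
    a[i] = a[j]    # assignment 2
    a[j] = temp    # assignment 3
```

Counting swaps or counting assignments only changes the total by the constant factor 3, which big-O ignores.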
Although it would make more sense to focus on the average case, identifying it is very hard
So remember the 2 parameters that you are
omitting -- c and n0
Essentially we are focusing on finding the upper bound
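For reference, the standard definition those two omitted parameters come from:

```latex
f(n) = O(g(n)) \iff \exists\, c > 0,\ n_0 \ge 0 \ \text{such that}\ f(n) \le c \cdot g(n) \ \text{for all}\ n \ge n_0
```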
Formally showing this
It is harder to establish the lower bound of a class of problems
We need n >= 2 for the lower bound; otherwise the n/2 * n/2 term that we added becomes a fraction and
we can't make the claim with that much confidence
We know that the for loop runs (n-1) times; we don't exactly know how many times the if is
executed, so we take it to be some c(n-1) times. Overall, when we focus on the order,
we see that it is O(n).

(BTW, the counter update is never counted separately; it is only used to count the number of
times the loop runs. The number of basic operations happening in the body of
the loop is what determines the total no. of steps. If the no. of body operations in each
iteration is a constant, say k, and the loop runs n times, then we can just write nk
instead of adding k+k+… n times. But if the no. of operations is k1 for the 1st iteration, k2 for the 2nd
iteration, etc., and the loop runs n times, then total operations = k1+k2+…+kn.)
This is all applicable for FOR loops; for WHILE loops, directly count the no. of body operations
based on how long it takes to violate the loop condition

Here don’t be like (n-1)(n-1) --- don’t multiply , when do we multiply , when the number of
Steps inside the Loop is constant for every iteration , say i loop happens n times and k
operations take place every time inside it so we do – k+k+ … n times = nk directly

here the no.of times j will execute inside i loop is different for each iteration , so we go back

to doing normal addition in situations like these.

For i=0, j runs (n-1) times

For i=1, j runs (n-2) times

etc., so (n-1) + (n-2) + … + 0 = n(n-1)/2, a.k.a. O(n²)
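A minimal Python sketch of such a triangular double loop (the function name is illustrative):

```python
# The inner loop runs (n-1), (n-2), ..., 0 times as i advances,
# so the total count is n(n-1)/2, i.e. O(n^2).
def count_steps(n):
    steps = 0
    for i in range(n):
        for j in range(i + 1, n):   # (n-1-i) iterations for this i
            steps += 1              # one "basic operation"
    return steps                    # equals n*(n-1)//2
```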


Obviously here also the total no. of steps happening inside the loop depends on how
many times n needs to be divided by 2 until n <= 1, which is log2(n); hence O(log n) is the
complexity of the problem
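A sketch of that halving loop in Python:

```python
# n is halved until n <= 1, so the body runs about log2(n) times.
def halving_steps(n):
    steps = 0
    while n > 1:
        n = n // 2
        steps += 1
    return steps    # roughly log2(n)
```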
DD sir’s explanation for this is still the best , that is what is written here
Solving this using the back-substitution (repeated substitution) method
Access is just computing the offset, but insertion and deletion require shifting the
elements, which in the worst case can mean shifting all n elements.

Access   -- Array: constant | List: linear
Insert   -- Array: linear   | List: constant
Delete   -- Array: linear   | List: constant
Exchange -- Array: constant | List: linear
(Array: constant for Access/Exchange, linear for Insert/Delete)
Insert & Delete take constant time for a list only if we are already at that position
Search takes linear time for both
Linear Search
Binary Search

Recursive function, so we need to write a recurrence and solve it
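A sketch of binary search in Python; the recurrence here is T(n) = T(n/2) + 1, which solves to O(log n):

```python
# Recursive binary search on a sorted array a, in the range a[lo:hi].
def binary_search(a, x, lo, hi):
    if lo >= hi:
        return -1                  # empty range: not found
    mid = (lo + hi) // 2
    if a[mid] == x:
        return mid
    elif a[mid] < x:
        return binary_search(a, x, mid + 1, hi)   # recurse on right half
    else:
        return binary_search(a, x, lo, mid)       # recurse on left half
```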


The main motivation for sorting comes from reducing the searching time
See, because the for loop goes forwards and the while loop goes backwards, the
intermediate steps happen. Say we have 31 43 65 15.

When we get to 15, the first swap takes place: 31 43 15 65. Because the while
loop is going backwards (nextpos = nextpos - 1), the second required swap for 15 also takes
place: 31 15 43 65, and then the 3rd required swap: 15 31 43 65, etc.
Finding the insertion position can be optimized using, say, binary search, but it doesn't matter,
as we know the shifting is what takes the real time
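A sketch of insertion sort matching that description (forward for loop, backward while loop):

```python
# Insertion sort: each new element is swapped backwards until it is in place.
def insertion_sort(a):
    for pos in range(1, len(a)):
        nextpos = pos
        while nextpos > 0 and a[nextpos] < a[nextpos - 1]:
            a[nextpos], a[nextpos - 1] = a[nextpos - 1], a[nextpos]
            nextpos = nextpos - 1    # the while loop walks backwards
    return a
```

insertion_sort([31, 43, 65, 15]) walks 15 back one swap at a time, exactly as in the trace above.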
For selection sort this was T(n) = n + T(n-1), giving n(n+1)/2
First we Divide
Now we merge
Formalising Theorems is done using Proof , Formalising Algorithms is done using Code
You see, the merging operation is iterative, while mergesort itself is a recursive algorithm
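A sketch of both pieces in Python: an iterative merge and the recursive mergesort around it:

```python
# Merge two already-sorted lists into one sorted list (iterative).
def merge(left, right):
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    return out + left[i:] + right[j:]   # one side may have leftovers

# Mergesort: divide, recursively sort, then merge (recursive).
def mergesort(a):
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    return merge(mergesort(a[:mid]), mergesort(a[mid:]))
```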
Obviously Set difference is also a variation of this
Hence walks in Quicksort with its pivot concept
A[l] is the pivot

If green is less than the pivot, swap yellow and green;

then swap yellow-1 and the pivot

Once the yellow–green swapping is done, the partitioning terminates, and the only thing
left to do is to move the pivot to the centre
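A Python sketch of that partition step (pivot at A[l]; "yellow" and "green" here are my names for the two scanning indices from the slides) plus the surrounding quicksort:

```python
# Partition a[l..r] around the pivot a[l]: "green" scans the array,
# "yellow" marks the boundary of the elements smaller than the pivot.
def partition(a, l, r):
    pivot = a[l]
    yellow = l + 1
    for green in range(l + 1, r + 1):
        if a[green] < pivot:
            a[yellow], a[green] = a[green], a[yellow]
            yellow += 1
    a[l], a[yellow - 1] = a[yellow - 1], a[l]   # move pivot to the centre
    return yellow - 1                           # final pivot position

def quicksort(a, l, r):
    if l < r:
        p = partition(a, l, r)
        quicksort(a, l, p - 1)
        quicksort(a, p + 1, r)
```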
Hence although Quicksort has a worst case of O(n²), in most of the cases that we will ever
encounter it gives O(n log n), and it sorts in place (no auxiliary array), hence it is better than Mergesort.

The worst case can occur due to a bad pivot choice


When we have small n, the simplicity of the naïve algorithms beats the efficiency of the O(n log n) algos
For each row (vertex) we skip the neighbours that we have already covered and focus on the
unmarked neighbours
A wasteful representation, as we are wasting space to store lots of 0s
So we have finished exploring 1; now we go back to the queue to see whether there is
anything left to be explored, and yes there is: 2, 3, 4 – basically 2 at the head of the queue, so
we explore 2
Likewise
So in BFS the discovered (yet to be explored) vertices are kept in a queue, and in DFS the suspended vertices are stored in a stack, which helps
us backtrack to the last visited one
Since the neighbours of 1, 2 and 3 are all visited, we step by step backtrack all the way to 4

Everything for 4 has also been visited, so we drop that too

We go to 5 and then 6 and 7 and likewise…


I think visited[i]=1 will also be a line
The numbering gives us the order in which we visited the vertices
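A minimal BFS sketch with a queue and the visited[] array mentioned above (adjacency-list input assumed):

```python
from collections import deque

# BFS from a source; adj maps each vertex to its list of neighbours.
def bfs(adj, source):
    visited = {v: 0 for v in adj}
    order = []                      # the numbering: order of visiting
    queue = deque([source])
    visited[source] = 1             # the visited[i] = 1 line noted above
    while queue:
        v = queue.popleft()         # explore the head of the queue
        order.append(v)
        for w in adj[v]:
            if not visited[w]:
                visited[w] = 1
                queue.append(w)
    return order
```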
Forward edges are edges that go forward (towards descendants) but are not in the tree

Backward edges are edges that go backwards (towards ancestors) but are not in the tree

Cross edges go neither forward nor backward, but sideways

As we can see, only the back edges are responsible for cycles
Strongly connected means every vertex is
reachable from every other vertex
And remember the defn of unsatisfiability – there is no assignment that gives a T value
Proof of the Statement

Recursive Topological ordering algo


indegree initialization

indegree calculation

choose the one with indegree = 0, enumerate it and delete it from G

update the neighbours' indegrees after deletion
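A Python sketch of that indegree-based topological ordering:

```python
# Topological ordering by repeatedly removing indegree-0 vertices.
# adj maps every vertex 1..n to its list of out-neighbours.
def topological_order(adj, n):
    indegree = {v: 0 for v in range(1, n + 1)}   # indegree initialization
    for v in adj:
        for w in adj[v]:
            indegree[w] += 1                     # indegree calculation
    order = []
    zero = [v for v in indegree if indegree[v] == 0]
    while zero:
        v = zero.pop()              # choose one with indegree 0
        order.append(v)             # enumerate it
        for w in adj[v]:            # "delete" v: update neighbours
            indegree[w] -= 1
            if indegree[w] == 0:
                zero.append(w)
    return order
```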
Why Longest Path = Min Number of Semesters --- the longest path gives the deepest
dependency chain, so it shows how long you have to wait before you can take the last course
in the chain
Recursive definition of longest path:
if indeg(k) = 0, there is no path into k, so LongestPath(k) = 0
if indeg(k) > 0, then LongestPath(k) = 1 + max over incoming neighbours j of LongestPath(j)
Which is the shortest path for unweighted graphs (obviously, because it visits all nodes 1 edge
away, then 2 edges away, etc.)
Basically, a greedy algorithm
And we will see the usage of heaps when we
do heaps
This is the reason why Dijkstra doesn't work for -ve edge weights: our
induction reasoning doesn't hold anymore; we might find a smaller path
to a burnt vertex later, a.k.a. the set of burnt vertices is no longer an invariant
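A sketch of Dijkstra with a heap; "burnt" here plays the role of the visited set:

```python
import heapq

# Dijkstra from a source; adj maps each vertex to a list of (neighbour, weight).
def dijkstra(adj, source):
    dist = {v: float('inf') for v in adj}
    dist[source] = 0
    burnt = set()
    heap = [(0, source)]
    while heap:
        d, v = heapq.heappop(heap)
        if v in burnt:
            continue                # stale heap entry, skip it
        burnt.add(v)                # invariant: dist[v] is now final
        for w, wt in adj[v]:
            if d + wt < dist[w]:
                dist[w] = d + wt
                heapq.heappush(heap, (dist[w], w))
    return dist
```

With a negative edge weight, the "burnt means final" invariant breaks, which is exactly the failure described above.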
Principle of Optimality
Essentially the Principle of Optimality
Dijkstra without the Visited[]
Blindly repeat (n-1) times

In Dijkstra we are choosing only unvisited vertices, but

in Bellman we update ALL edges n-1 times

The first for loop initializes the distances for all the vertices. The second, i = 1 to n-1, doesn't
correspond to a vertex; the idea is to blindly repeat the j weight-update operation n-1
times, because of our claim that the shortest path has at most n-1 edges. So from the
table that gets constructed, we will find some combination from source to destination that
gives us the shortest path.
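A sketch of Bellman-Ford as just described (edge-list input assumed):

```python
# Bellman-Ford: blindly relax every edge n-1 times.
# edges is a list of (u, v, weight) triples over vertices 1..n.
def bellman_ford(edges, n, source):
    dist = {v: float('inf') for v in range(1, n + 1)}
    dist[source] = 0
    for _ in range(n - 1):          # a shortest path has at most n-1 edges
        for u, v, w in edges:       # update ALL edges, every round
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    return dist
```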

Source Vertex – 1

In the first iteration, 2 and 8 get updated. In the 2nd iteration, 8's neighbour 7 and 2's
neighbour 6 get updated. In the 3rd iteration, 6 again gives us a better path to 3, and 6 itself
also gets updated, because we have found a better path 1-8-7-6 instead of 1-2-6.

You see, at each iteration you are taking the better of j-k and j-v-k. As a result, if j-v-k is better,
that is what gets stored; then when you compare j-k and j-v'-k again, you effectively get j-v-v'-k, and
likewise. This is how we get the best paths, like 1-8-7-6 etc. This is the Principle of
Optimality
After the final iteration we have the shortest paths to all the vertices from source 1:
1-2, 1-3, …, 1-8
Wk essentially restricts which vertices can
appear in between i and j:
W0 -- only direct edges
W1 -- vertex 1 can appear
W2 -- vertices 1 and 2 can appear, etc.
In general --
Wk[i,j] = min(Wk-1[i,j], Wk-1[i,k] + Wk-1[k,j])
We restrict which vertices can be used in the path, and keep the smallest total
weight
From W0 to W1 nothing changes, because no edge goes into 1, so allowing 1 as an intermediate
doesn't help reduce the distance for any pair
So on computing till W8 we get the all-pairs shortest paths
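A sketch of Floyd-Warshall computing the Wk matrices in place:

```python
# Floyd-Warshall: W starts as W0 (direct-edge weights, inf where no edge,
# 0 on the diagonal) and ends as the all-pairs shortest-path matrix.
def floyd_warshall(W, n):
    for k in range(n):              # now allow vertex k as an intermediate
        for i in range(n):
            for j in range(n):
                W[i][j] = min(W[i][j], W[i][k] + W[k][j])
    return W
```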

We should probably use Bellman-Ford in general because it has better efficiency


The same idea -- restricting the vertices that can appear, and seeing whether there is a path at all
In a tree there is a unique path between any two vertices;
adding an edge creates a cycle

Greedy strategies
Notice the difference between Prim's and Kruskal's: at this stage we have two disconnected
components
A local heuristic to decide
which edge to add next

Because for every greedy algorithm you have to prove, using a lemma, why it gives the global optimum: in the case of
Dijkstra, positive weights and the burnt-vertices-invariant idea
For Prim's algorithm, the minimum separator lemma
A very powerful claim:
it is "not enough" to just take any edge going across from U to V; we need to pick the smallest one –
that one will be in every MST

Because it's greedy (this exact mechanism is actually implemented in the local heuristic)
(because at the end, as we are visiting
all vertices, the overall smallest edge will
also get included)
PRIM'S with Heaps --

Essentially we are redefining "smaller" by imposing an ordering and then giving these two
conditions = handling duplicate edge weights
The while loop condition is "as long as we don't have n-1 edges", because as soon as we do, we
have made a tree

We cannot add this green 10 because it would form a cycle, so we discard it. We can add the
other 10, and likewise
If one endpoint of an edge is in one component and the other is in another component,
the two components are disjoint trees and the edge doesn't form a cycle. However, if both
the endpoints of the edge are in the same component, then it forms a cycle.

Hence we keep track of components to see which edges form a cycle

Say the edge is (2,3):

if component[4] was equal to component[3],
then now component[4] will be equal to
component[2] as well
MEME -- 5 operations:
MakeUnionFind
find()
merge()
members[]
size[]
Say initially Component[2] was 2; now if you merge it with 3, essentially
Component[2] becomes 3
So we need improvements:
adding members[] and size[]
MEME -- the main difference is merging:
instead of scanning the whole component array to merge,
we relabel via the members[] list, which reduces the complexity from
O(n) to O(size[])

On doing this, the amortized complexity of the merge() operation grows as log m
Amortized analysis takes a total amount of work and divides it across the entire lifespan: for e.g.,
here we are saying that over the entire span of m union() operations, each takes an average of
O(log m)
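A sketch of Union-Find with the members[] and size[] improvement (always relabel the smaller component):

```python
# Union-Find with explicit component labels, member lists and sizes.
class UnionFind:
    def __init__(self, n):                   # MakeUnionFind
        self.component = list(range(n))
        self.members = [[i] for i in range(n)]
        self.size = [1] * n

    def find(self, i):                       # O(1) lookup of the label
        return self.component[i]

    def merge(self, a, b):
        ca, cb = self.find(a), self.find(b)
        if ca == cb:
            return
        if self.size[ca] < self.size[cb]:    # relabel the smaller side:
            ca, cb = cb, ca                  # O(size[]) instead of O(n)
        for m in self.members[cb]:
            self.component[m] = ca
        self.members[ca] += self.members[cb]
        self.members[cb] = []
        self.size[ca] += self.size[cb]
```

Each element is relabelled only when its component at least doubles, so it is relabelled at most log m times – which is where the amortized O(log m) comes from.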
Kruskal’s with UnionFind() Datastructure has the same complexity as Prim’s with Heaps
Hence one or the other operation becomes a bottleneck in a 1-dimensional structure

Hence we move to a 2-dimensional structure


Hence we have to go through √N maxima, one per row

Each row is independent of the other rows, so we don't know where the max is
So:
Kruskal's gets O((m+n) log n) using Union-Find
Prim's gets O((m+n) log n) using a heap
Priority scheduling -- naive O(n²), sorted-row matrix O(N√N), with a heap O(N log N)
Dijkstra gets O((m+n) log n) using a heap
(You notice this calculation is the same as for the Towers of Hanoi)
We use this strategy because it is the most efficient way to remove the root
Because doing it individually is log N for each of the N values

Going leaf to root gets it down to O(N) from O(N log N)
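A sketch of bottom-up (leaf to root) heap building:

```python
# Restore the max-heap property below index i (array-embedded heap).
def sift_down(a, i, n):
    while 2 * i + 1 < n:
        child = 2 * i + 1
        if child + 1 < n and a[child + 1] > a[child]:
            child += 1                      # pick the larger child
        if a[i] >= a[child]:
            break
        a[i], a[child] = a[child], a[i]
        i = child

# Build a heap in O(N): leaves need no work, so start at the last internal node.
def build_heap(a):
    n = len(a)
    for i in range(n // 2 - 1, -1, -1):     # leaf to root
        sift_down(a, i, n)
```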

So now we have all the heap properties. MIND PALACE --

For a heap with N nodes:
N-1 edges (because it is a tree)
about N/2 leaves
log N + 1 levels
The basic operations of a heap are delete_max() and insert(). We need
update() for the Dijkstra algo. Now we will see the complexity for that…
These two arrays help us find where the update starts for Dijkstra's
Fixing your choice, your friend's ranking is just a permutation of
your ranking
In the original 12345 vs the friend's ranking 24315: in the original, 1 occurs before 2, but in the
friend's ranking 2 occurs before 1, so (1,2) is an inversion; similarly for all the other pairs. WE DO
NOT CARE how far apart they are: if A occurs before B in both rankings then there is no inversion,
otherwise there is an inversion

We count these inversions to get a measure of the similarity between the rankings
These are not 2 separate conditions but one sentence.
Hence, the size values of L and R
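A sketch of counting inversions during the merge step, which is what the sizes of L and R are used for:

```python
# Mergesort that also counts inversions, O(n log n) overall.
def sort_and_count(a):
    if len(a) <= 1:
        return a, 0
    mid = len(a) // 2
    L, inv_left = sort_and_count(a[:mid])
    R, inv_right = sort_and_count(a[mid:])
    out, i, j, inv = [], 0, 0, inv_left + inv_right
    while i < len(L) and j < len(R):
        if L[i] <= R[j]:
            out.append(L[i]); i += 1
        else:
            out.append(R[j]); j += 1
            inv += len(L) - i       # everything left in L is inverted with R[j]
    return out + L[i:] + R[j:], inv
```

For the friend's ranking above, sort_and_count([2, 4, 3, 1, 5]) reports 4 inversions: (1,2), (1,3), (1,4), (3,4).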
So this is just priority scheduling – why use binary (search) trees and not heaps? The problem with heaps is that if
we want additional constraints like "space the flights 3 mins apart etc.", then checking whether an insertion
violates the constraint is a problem, because a heap takes O(n) to find the predecessor and
successor, but binary search trees take only O(log n) to find them
Because that would mean moving n elements,
for every operation
LNR = inorder traversal (Left, Node, Right)

So we start at the root and go to the left; there is another left, so go left to 1; no more
left, so now we print the value and go to the right, which is 2; now this left has already been
done, so print the value and then go right, and so on
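A sketch of that LNR traversal (node objects with left/right/value fields are assumed):

```python
# Inorder (LNR) traversal of a binary tree: left subtree, node, right subtree.
def inorder(t):
    if t is None:
        return
    inorder(t.left)      # L: keep going left as far as possible
    print(t.value)       # N: print the value
    inorder(t.right)     # R: then handle the right subtree
```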

Here we do the recursive call on either child
Here we just put it in a while loop
BTW, the successor is the next bigger value, and the predecessor is the value just before
it.
Getting to minval(t.right)
y = t.parent
while y is not NIL and t is the right child of y:
  t becomes y (t moves up to y)
  y becomes the parent of y (the y cursor moves up)
For the nodes where we can't go down left, we go up from below, and where we turn is the
predecessor
If t = t.parent.left, that is, if the current node is the left child of its parent (this is how you read
tree code)
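A sketch of successor following exactly those steps (nodes with parent pointers assumed):

```python
# Smallest value in the subtree rooted at t.
def minval(t):
    while t.left is not None:
        t = t.left
    return t

# Successor of t in a BST with parent pointers.
def successor(t):
    if t.right is not None:
        return minval(t.right)       # case 1: go to minval(t.right)
    y = t.parent
    while y is not None and t is y.right:
        t = y                        # t moves up to y
        y = y.parent                 # the y cursor moves up
    return y                         # where we turn is the successor
```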
x-tree has children T_LR, T_R
y-tree has children T_LL, x
Which is essentially going to kill all our attempts at efficiency, making all of this
pointless.

To get around this, this is what we do –


Earliest finishing time first:
1. earliest start time -- FAIL
2. shortest interval -- FAIL
3. minimum overlap -- FAIL
4. earliest finish time -- ??
For overlapping sets, selecting the one with the earliest finishing time
Hence to prove correctness in this case we
don't need to show A and O are the same, just
that they are the same size

f(i) is the finishing time of booking i and

s(i) is the starting time of booking i
As you can see, here again the JUMP for the
induction literally came from the defn of the
greedy approach we are using
Otherwise, in the unsorted case, you will have to search n times for the 1st
allocation, n-1 times for the second
allocation, etc., leading to O(n²)
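A sketch of the "earliest finish time first" strategy, sorting once up front to avoid that O(n²):

```python
# Interval scheduling: pick bookings by earliest finish time.
# bookings is a list of (s, f) pairs: start and finish times.
def select_bookings(bookings):
    bookings = sorted(bookings, key=lambda b: b[1])   # sort by f(i): O(n log n)
    A, last_finish = [], float('-inf')
    for s, f in bookings:
        if s >= last_finish:        # compatible with everything chosen so far
            A.append((s, f))
            last_finish = f
    return A
```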
Earliest deadline first

If we take the shortest job t(1) first, we are going to get lateness; shortest job first doesn't give us
a better schedule
Obviously this is entirely based on the idea that there
are multiple optimal solns

So an inversion is when O contains the opposite of what our greedy strategy would do


This obeys the prefix principle but violates the "shorter codes should be assigned to more
frequent letters" principle

So we invert that
1. Always a complete tree (every node has 0 or 2 children) (to get a shorter tree)
2. A leaf labelled at greater depth has lower frequency (so ABL can be minimized)
3. Leaves at max depth occur in pairs
A leaf at greater depth has lower frequency
FIRST MERGE THEN SPLIT
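A sketch of Huffman coding following that order: first merge the two lowest frequencies repeatedly, then split (walk down the tree) to read off the codes:

```python
import heapq
from itertools import count

# Huffman codes for a dict mapping letter -> frequency.
def huffman_codes(freq):
    tie = count()    # tiebreaker so equal frequencies never compare nodes
    heap = [(f, next(tie), c, None, None) for c, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:                      # FIRST MERGE
        a = heapq.heappop(heap)
        b = heapq.heappop(heap)
        heapq.heappush(heap, (a[0] + b[0], next(tie), None, a, b))
    codes = {}
    def split(node, code):                    # THEN SPLIT
        _, _, c, left, right = node
        if c is not None:
            codes[c] = code or "0"            # single-letter edge case
        else:
            split(left, code + "0")
            split(right, code + "1")
    split(heap[0], "")
    return codes
```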
We can assume this is how S is
This kind of inductive definition is not restricted to numeric problems (like factorial); you can
also do it for structural problems (like insertion sort)
We solved this using the greedy strategy "earliest finish time
first" and showed global optimality with the "stays
ahead of optimal" argument -- now we will use DP
Power set of N: the greedy strategy cuts the exponential space down to a linear space
For dynamic programming, the size of the
table is the complexity
Wasteful computation

So instead of recomputing, we can just look the result up again from the table
As AA said, any kind of dynamic programming involves some table; that table, be it Pascal's
or Bellman-Ford's or Coin Change's etc., IS THE MEMORY TABLE. That is the whole point of
dynamic programming: anticipating what that memory table looks like
-- because if there were a cycle, a subproblem
would depend on itself, leading to infinite
recursion
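A tiny memoization sketch on Fibonacci; the cache is exactly that memory table:

```python
from functools import lru_cache

# Memoized Fibonacci: each subproblem is computed once and then looked up,
# turning an exponential recursion into a linear one.
@lru_cache(maxsize=None)
def fib(n):
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)
```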
So the general idea is -- from (0,0) to (m,n)
the number of paths is (m+n)Cm = (m+n)Cn
Hence Holes --divide the problem into 2 problems
Can also do Column by Column but will end up with the same values
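A sketch of the table fill, row by row (column by column would give the same values):

```python
# Count monotone grid paths from (0,0) to (m,n) by filling a DP table.
def grid_paths(m, n):
    T = [[1] * (n + 1) for _ in range(m + 1)]   # one path along each border
    for i in range(1, m + 1):                   # row by row
        for j in range(1, n + 1):
            T[i][j] = T[i - 1][j] + T[i][j - 1]
    return T[m][n]                              # equals (m+n)Cm
```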
The choice of Topological ordering is entirely upto us
Like here, for e.g., s doesn't match a, but "stra" starts from the 2nd position in the 2nd word,
which is fine; but in such a situation a CANNOT match anything after s. Hence a0 and b0, if
not equal, cannot both be part of the LCS
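A sketch of the LCS table built from exactly that case analysis:

```python
# Length of the longest common subsequence of a and b.
# T[i][j] = LCS length of a[i:] and b[j:].
def lcs_length(a, b):
    m, n = len(a), len(b)
    T = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:                    # both can be in the LCS
                T[i][j] = 1 + T[i + 1][j + 1]
            else:                               # at most one of them can be
                T[i][j] = max(T[i + 1][j], T[i][j + 1])
    return T[0][0]
```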
Need to find the optimal order for computation
We are skipping LP for now; we will get to it if we see questions on it
It has a Checking algorithm but not a Generating algorithm
Again, there is no simple way to write a generating algorithm; here we are looking for a checking
algorithm, but unless we know how to solve the problem, how do we check whether the
path is least-cost?

Common misunderstanding – TSP cannot be solved using single-source shortest path ideas,
because it is about finding the "shortest possible tour": visiting every city exactly once and then
coming back to the original city with the smallest possible cost. Now you see why it has no
suitable generating algo
A vertex cover of a graph is a set of vertices such that
every edge in the graph has at least one of its endpoints
in the set.
P--Polynomial time solvable
NP --Polynomial time checking
NP-hard -- every problem in NP can be reduced to these
NP-complete -- NP as well as NP-hard
NP-Hard problems are those to which every problem in NP reduces; for an NP-Hard problem
we may not even have a suitable (polynomial-time) checking algorithm

You might also like