Birla Institute of Technology and Science, Pilani
Department of Computer Science & Information Systems
BITS F464 - Machine Learning
I Semester 2020-21
9-Sep-20 Lab Sheet-04 – Hidden Markov Model
What is a Markov Model?
A Markov chain (model) describes a stochastic process in which the probability of the future
state depends only on the current state of the process and not on any of the states that preceded it;
in symbols, P(X_{t+1} | X_t, X_{t-1}, ..., X_0) = P(X_{t+1} | X_t).
Let's get into a simple example. Assume you want to model the future probability that your dog is
in one of three states given its current state. To do this we need to specify the state space, the
initial probabilities, and the transition probabilities.
Imagine you have a very lazy fat dog, so we define the state space as sleeping, eating, or
pooping. We will set the initial probabilities to 35%, 35%, and 30% respectively.
import numpy as np
import pandas as pd
import networkx as nx
import matplotlib.pyplot as plt
%matplotlib inline
# create state space and initial state probabilities
states = ['sleeping', 'eating', 'pooping']
pi = [0.35, 0.35, 0.3]
state_space = pd.Series(pi, index=states, name='states')
print(state_space)
print(state_space.sum())
The next step is to define the transition probabilities. They are simply the probabilities of staying
in the same state or moving to a different state given the current state.
# create transition matrix
# equals transition probability matrix of changing states given a state
# matrix is size (M x M) where M is number of states
q_df = pd.DataFrame(columns=states, index=states)
q_df.loc[states[0]] = [0.4, 0.2, 0.4]
q_df.loc[states[1]] = [0.45, 0.45, 0.1]
q_df.loc[states[2]] = [0.45, 0.25, 0.3]
print(q_df)
q = q_df.values
print('\n', q, q.shape, '\n')
print(q_df.sum(axis=1))
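To make the meaning of the transition matrix concrete, the following is a minimal sketch (not part of the original lab code) that simulates a short state sequence from the chain, using the states, pi, and q defined above; the random seed and sequence length are arbitrary choices.
# simulate a short state sequence from the Markov chain defined above
# (illustrative sketch; seed and number of steps are arbitrary choices)
np.random.seed(42)
def simulate_chain(pi, q, states, n_steps=10):
    # draw the first state from the initial distribution pi
    seq = [np.random.choice(states, p=pi)]
    for _ in range(n_steps - 1):
        # draw the next state from the row of q for the current state
        current_idx = states.index(seq[-1])
        seq.append(np.random.choice(states, p=q[current_idx].astype(float)))
    return seq
print(simulate_chain(pi, q, states))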
In our toy example, the dog's possible states are the nodes and the edges are the lines that
connect them. The transition probabilities are the edge weights; they represent the probability of
transitioning to a state given the current state. We need to create a dictionary object that holds our
edges and their weights.
from pprint import pprint
# create a function that maps transition probability dataframe
# to markov edges and weights
def _get_markov_edges(Q):
    edges = {}
    for col in Q.columns:
        for idx in Q.index:
            edges[(idx, col)] = Q.loc[idx, col]
    return edges
edges_wts = _get_markov_edges(q_df)
pprint(edges_wts)
Now create the nodes, the edges, and their transition probabilities.
# create graph object
G = nx.MultiDiGraph()
# nodes correspond to states
G.add_nodes_from(states)
print(f'Nodes:\n{G.nodes()}\n')
# edges represent transition probabilities
for k, v in edges_wts.items():
    tmp_origin, tmp_destination = k[0], k[1]
    G.add_edge(tmp_origin, tmp_destination, weight=v, label=v)
print(f'Edges:')
pprint(G.edges(data=True))
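To inspect the chain visually, one possible way to draw the graph with networkx and matplotlib is sketched below; the spring layout and styling are arbitrary choices, and other layouts (e.g. Graphviz) would work equally well.
# draw the state graph; node positions come from a spring layout (arbitrary choice)
pos = nx.spring_layout(G, seed=7)
nx.draw_networkx(G, pos, node_size=1500)
edge_labels = {(n1, n2): d['label'] for n1, n2, d in G.edges(data=True)}
nx.draw_networkx_edge_labels(G, pos, edge_labels=edge_labels)
plt.axis('off')
plt.show()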
Now you can observe that if you follow the edges from any node, they tell you the probability that
the dog will transition to another state. For example, if the dog is sleeping, there is a
40% chance it will keep sleeping, a 40% chance it will wake up and poop, and a 20%
chance it will wake up and eat.
What Makes a Markov Model Hidden?
Consider a situation where your dog is acting strangely and you want to model the probability
that your dog's behavior is due to sickness or is simply quirky.
This is where it gets a little more interesting. Now we create the emission (or
observation) probability matrix. This matrix is of size M × O, where M is the number of hidden states
and O is the number of possible observable states. The emission matrix gives the probability of
observing each observable state given that the process is currently in a particular hidden state.
Let's keep the same observable states from the previous example. The dog can be either
sleeping, eating, or pooping.
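The code below refers to hidden_states and to a hidden-state transition matrix a_df, so these need to be defined first. Based on the healthy/sick discussion that follows, here is a minimal sketch; the prior and transition probabilities used are illustrative assumptions, not values specified in the lab sheet.
# create hidden state space and initial (prior) probabilities
# NOTE: the prior and transition values below are assumed for illustration
hidden_states = ['healthy', 'sick']
pi = [0.5, 0.5]   # this overwrites the observable-state pi defined earlier
state_space = pd.Series(pi, index=hidden_states, name='states')
print(state_space)
print('\n', state_space.sum())
# create hidden-state transition matrix
# a or alpha = transition probability matrix of changing states given a state
# matrix is size (M x M) where M is number of hidden states
a_df = pd.DataFrame(columns=hidden_states, index=hidden_states)
a_df.loc[hidden_states[0]] = [0.7, 0.3]   # assumed values
a_df.loc[hidden_states[1]] = [0.4, 0.6]   # assumed values
print(a_df)
a = a_df.values
print('\n', a, a.shape, '\n')
print(a_df.sum(axis=1))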
# create matrix of observation (emission) probabilities
# b or beta = observation probabilities given state
# matrix is size (M x O) where M is number of states
# and O is number of different possible observations
observable_states = states
b_df = pd.DataFrame(columns=observable_states, index=hidden_states)
b_df.loc[hidden_states[0]] = [0.2, 0.6, 0.2]
b_df.loc[hidden_states[1]] = [0.4, 0.1, 0.5]
print(b_df)
b = b_df.values
print('\n', b, b.shape, '\n')
print(b_df.sum(axis=1))
Now we create the graph edges and the graph object.
# create graph edges and weights
hide_edges_wts = _get_markov_edges(a_df)
pprint(hide_edges_wts)
emit_edges_wts = _get_markov_edges(b_df)
pprint(emit_edges_wts)
# create graph object
G = nx.MultiDiGraph()
# nodes correspond to states
G.add_nodes_from(hidden_states)
print(f'Nodes:\n{G.nodes()}\n')
# edges represent hidden probabilities
for k, v in hide_edges_wts.items():
    tmp_origin, tmp_destination = k[0], k[1]
    G.add_edge(tmp_origin, tmp_destination, weight=v, label=v)
# edges represent emission probabilities
for k, v in emit_edges_wts.items():
    tmp_origin, tmp_destination = k[0], k[1]
    G.add_edge(tmp_origin, tmp_destination, weight=v, label=v)
print(f'Edges:')
pprint(G.edges(data=True))
The hidden Markov graph is a little more complex, but the principles are the same. For example,
if your dog is healthy there is a high probability (60%) of observing it eating, whereas if it is
sick that probability is only 10%.
Now, what if you needed to discern the health of your dog over time, given a sequence of
observations?
# observation sequence of dog's behaviors
# observations are encoded numerically
obs_map = {'sleeping':0, 'eating':1, 'pooping':2}
obs = np.array([1,1,2,1,0,1,2,1,0,2,2,0,1,0,1])
inv_obs_map = dict((v,k) for k, v in obs_map.items())
obs_seq = [inv_obs_map[v] for v in list(obs)]
print(pd.DataFrame(np.column_stack([obs, obs_seq]),
                   columns=['Obs_code', 'Obs_seq']))
Lab 04 Exercise (submit the code within the given time):
Now try the Viterbi algorithm on the above concept. Some details of the Viterbi algorithm are described
below:
The Viterbi algorithm is a dynamic programming algorithm for finding the most likely sequence of
hidden states (called the Viterbi path) that results in a given sequence of observed events, especially
in the context of Markov information sources and hidden Markov models (HMMs).
Using the Viterbi algorithm, you can identify the most likely sequence of hidden states given the
sequence of observations (described in this lab sheet).
Pseudocode of Viterbi algorithm
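For reference, here is one way the Viterbi recursion could be written in Python for the model built above. It is a minimal sketch, assuming the hidden-state priors pi, the transition matrix a, the emission matrix b, and the encoded observation sequence obs defined earlier; it is not necessarily the exact pseudocode intended by the lab sheet.
def viterbi(pi, a, b, obs):
    # number of hidden states and length of the observation sequence
    nStates = np.shape(b)[0]
    T = np.shape(obs)[0]
    # delta[s, t]: highest probability of any path ending in state s at time t
    # phi[s, t]: predecessor state that achieves delta[s, t]
    delta = np.zeros((nStates, T))
    phi = np.zeros((nStates, T), dtype=int)
    # initialization with the prior and the first observation
    delta[:, 0] = pi * b[:, obs[0]]
    # recursion: extend the best path into each state at every time step
    for t in range(1, T):
        for s in range(nStates):
            trans_p = delta[:, t - 1] * a[:, s]
            phi[s, t] = np.argmax(trans_p)
            delta[s, t] = np.max(trans_p) * b[s, obs[t]]
    # backtracking: recover the most likely state sequence
    path = np.zeros(T, dtype=int)
    path[T - 1] = np.argmax(delta[:, T - 1])
    for t in range(T - 2, -1, -1):
        path[t] = phi[path[t + 1], t + 1]
    return path, delta, phi

path, delta, phi = viterbi(np.array(pi), a.astype(float), b.astype(float), obs)
state_map = {0: 'healthy', 1: 'sick'}   # order follows hidden_states above
print('Most likely hidden state path:')
print([state_map[s] for s in path])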