
Introduction to
Artificial Intelligence
with Python

[Course overview figure — topics: Search, Knowledge, Uncertainty, Optimization, Learning, Neural Networks, Language]

Search

[Figure: a scrambled 15-puzzle as a motivating search problem]
Search Problems
agent
entity that perceives its environment
and acts upon that environment
state
a configuration of the agent and
its environment
[Figure: two scrambled 15-puzzle boards as example states]

initial state
the state in which the agent begins

[Figure: a particular scrambled board marked as the initial state]
actions
choices that can be made in a state
actions
ACTIONS(s) returns the set of actions that
can be executed in state s
[Figure: in a given 15-puzzle state, up to four tile-sliding actions are available]
transition model
a description of what state results from
performing any applicable action in any
state
transition model
RESULT(s, a) returns the state resulting from
performing action a in state s
[Figure: RESULT(s, a) applied to a 15-puzzle board — sliding a tile into the blank square yields the resulting board]
state space
the set of all states reachable from the
initial state by any sequence of actions
[Figure: the state space as a graph of boards connected by single moves]
goal test
way to determine whether a given state
is a goal state
path cost
numerical cost associated with a given path
[Figure: a graph of nodes A through M, drawn twice — once with varying edge costs (1 to 6) and once with every edge cost equal to 1]
Search Problems

• initial state
• actions
• transition model
• goal test
• path cost function
solution
a sequence of actions that leads from the
initial state to a goal state
optimal solution
a solution that has the lowest path cost
among all solutions
node
a data structure that keeps track of
- a state
- a parent (node that generated this node)
- an action (action applied to parent to get node)
- a path cost (from initial state to node)
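As a concrete sketch in Python, a node can be a small class whose fields mirror this definition (the class itself is not from the slides):

class Node:
    """Tracks a state plus the bookkeeping needed to reconstruct a solution."""
    def __init__(self, state, parent=None, action=None, path_cost=0):
        self.state = state          # a configuration of agent and environment
        self.parent = parent        # node that generated this node
        self.action = action        # action applied to parent to get here
        self.path_cost = path_cost  # cost from the initial state to this node

    def solution(self):
        """Follow parent pointers back to the root, returning the action sequence."""
        actions = []
        node = self
        while node.parent is not None:
            actions.append(node.action)
            node = node.parent
        return list(reversed(actions))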
Approach

• Start with a frontier that contains the initial state.


• Repeat:
• If the frontier is empty, then no solution.
• Remove a node from the frontier.
• If node contains goal state, return the solution.
• Expand node, add resulting nodes to the frontier.
Find a path from A to E.

[Worked example on a small graph with nodes A–F: the frontier starts as {A}; each step removes a node, checks it against the goal, and expands it, until E is removed from the frontier and the search succeeds]
What could go wrong?
[Worked example: with edges running in both directions, expanding B puts A back into the frontier — the frontier becomes {A, C, D} — so the search can shuttle between A and B forever]
Revised Approach
• Start with a frontier that contains the initial state.
• Start with an empty explored set.
• Repeat:
• If the frontier is empty, then no solution.
• Remove a node from the frontier.
• If node contains goal state, return the solution.
• Add the node to the explored set.
• Expand node, add resulting nodes to the frontier if they
aren't already in the frontier or the explored set.
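A minimal Python sketch of this revised approach, using the Node class above and assuming the problem supplies actions(state), result(state, action), and goal_test(state):

def graph_search(initial_state, goal_test, actions, result):
    """Generic search with an explored set; pops from the end, so it behaves depth-first."""
    frontier = [Node(initial_state)]
    explored = set()
    while frontier:
        node = frontier.pop()                  # last-in first-out
        if goal_test(node.state):
            return node.solution()
        explored.add(node.state)
        for action in actions(node.state):
            child = result(node.state, action)
            if child not in explored and all(n.state != child for n in frontier):
                frontier.append(Node(child, node, action, node.path_cost + 1))
    return None                                # frontier emptied: no solution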
stack
last-in first-out data type
Find a path from A to E.

[Worked example with a stack frontier: the explored set grows A, B, D, F, C — the search dives down one branch, hits dead ends at F and C, backtracks, and only then finds E]
Depth-First Search
depth-first search
search algorithm that always expands the
deepest node in the frontier
Breadth-First Search
breadth-first search
search algorithm that always expands the
shallowest node in the frontier
queue
first-in first-out data type
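In Python, collections.deque can serve as either frontier; which end you remove from decides the algorithm (a small illustrative sketch):

from collections import deque

frontier = deque()
frontier.append("A")
frontier.append("B")

frontier.pop()       # "B" — last-in first-out, a stack: depth-first search
frontier.popleft()   # "A" — first-in first-out, a queue: breadth-first search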
Find a path from A to E.

[Worked example with a queue frontier: the explored set grows A, B, C, D — the search fans out level by level, and expanding D adds both E and F, so the shallow goal E is found next]
Depth-First Search

[Figure, animated over many frames: a maze solved depth-first — the search commits to one corridor at a time, exploring long dead ends before backtracking, and the path it returns is not necessarily the shortest]
Breadth-First Search

[Figure, animated over many frames: the same maze solved breadth-first — the search spreads outward one step at a time in every direction, exploring more cells but guaranteeing a shortest path]
uninformed search
search strategy that uses no problem-
specific knowledge
informed search
search strategy that uses problem-specific
knowledge to find solutions more efficiently
greedy best-first search
search algorithm that expands the node
that is closest to the goal, as estimated by a
heuristic function h(n)
Heuristic function? Manhattan distance.

[Figure: in the maze, the Manhattan distance from a cell to the goal counts the vertical plus horizontal squares between them, ignoring walls]
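As a one-function sketch, Manhattan distance between grid cells in Python:

def manhattan_distance(cell, goal):
    """Heuristic h(n): vertical plus horizontal distance, ignoring walls."""
    (r1, c1), (r2, c2) = cell, goal
    return abs(r1 - r2) + abs(c1 - c2)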
Greedy Best-First Search

[Figure, animated over many frames: the maze with every open cell labeled by its h(n) value (1 to 16) — greedy best-first search always expands the frontier cell with the smallest heuristic value, heading toward the goal B while exploring far fewer cells than uninformed search]
Greedy Best-First Search

[Figure: a second maze where the heuristic misleads — following the smaller h(n) values leads the greedy search down a longer route, so the path it finds is not always optimal]
A* search
search algorithm that expands node with
lowest value of g(n) + h(n)

g(n) = cost to reach node


h(n) = estimated cost to goal
A* Search

[Figure, animated over many frames: the misleading maze reworked with A* — each explored cell is labeled g(n) + h(n), starting 1+16, 2+15, 3+14 along the lower corridor; once that route's totals climb past the alternatives, the cheaper totals along the upper route take over, and A* finishes on the optimal path, reaching the goal at 20+1]
A* search
optimal if
- h(n) is admissible (never overestimates the
true cost), and
- h(n) is consistent (for every node n and
successor n' with step cost c, h(n) ≤ h(n') + c)
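A compact A* sketch in Python, assuming neighbors(state) yields (action, next_state, step_cost) triples and h is a heuristic such as manhattan_distance (names illustrative):

import heapq
from itertools import count

def a_star(start, goal_test, neighbors, h):
    """Always expand the frontier node with the lowest g(n) + h(n)."""
    tie = count()                    # tiebreaker so states never get compared directly
    frontier = [(h(start), next(tie), 0, start, [])]
    best_g = {start: 0}
    while frontier:
        _, _, g, state, path = heapq.heappop(frontier)
        if goal_test(state):
            return path
        for action, nxt, cost in neighbors(state):
            new_g = g + cost
            if new_g < best_g.get(nxt, float("inf")):
                best_g[nxt] = new_g
                heapq.heappush(frontier, (new_g + h(nxt), next(tie), new_g, nxt, path + [action]))
    return None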
Adversarial Search

[Figure: a Tic-Tac-Toe position and the move that wins it — play against an opponent whose goals oppose yours]
Minimax

[Figure: three finished Tic-Tac-Toe boards scored -1 (O wins), 0 (draw), and 1 (X wins)]
Minimax

• MAX (X) aims to maximize score.


• MIN (O) aims to minimize score.
Game
• S0 : initial state
• PLAYER(s) : returns which player to move in state s
• ACTIONS(s) : returns legal moves in state s
• RESULT(s, a) : returns state after action a taken in state s
• TERMINAL(s) : checks if state s is a terminal state
• UTILITY(s) : final numerical value for terminal state s
Initial State
[Figure: the empty Tic-Tac-Toe board]

PLAYER(s)
PLAYER(empty board) = X
PLAYER(board after X's opening move) = O

ACTIONS(s)
[Figure: ACTIONS applied to a nearly full board returns the set of remaining empty squares]

RESULT(s, a)
[Figure: RESULT applied to a board and a move returns the board with that move played]

TERMINAL(s)
[Figure: TERMINAL is false for a game still in progress and true once a player has three in a row]

UTILITY(s)
[Figure: UTILITY is 1 for a board X has won and -1 for a board O has won]
[Worked example on Tic-Tac-Toe boards: with O to move, the MIN-VALUE of a state is the smallest MAX-VALUE among its successors; with X to move, MAX-VALUE is the largest MIN-VALUE among its successors — values 1, 0, and -1 propagate up from the terminal utilities]

[Figure: the same idea on an abstract tree — a MAX node above leaves 5, 3, 9 takes value 9; a MIN node choosing between MAX children worth 9 and 8 (= max of 2, 8) takes value 8]
Minimax

• Given a state s:
• MAX picks action a in ACTIONS(s) that produces
highest value of MIN-VALUE(RESULT(s, a))
• MIN picks action a in ACTIONS(s) that produces
smallest value of MAX-VALUE(RESULT(s, a))
Minimax

function MAX-VALUE(state):
    if TERMINAL(state):
        return UTILITY(state)
    v = -∞
    for action in ACTIONS(state):
        v = MAX(v, MIN-VALUE(RESULT(state, action)))
    return v

function MIN-VALUE(state):
    if TERMINAL(state):
        return UTILITY(state)
    v = ∞
    for action in ACTIONS(state):
        v = MIN(v, MAX-VALUE(RESULT(state, action)))
    return v
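A direct Python transcription of these two functions, assuming the game interface above (terminal, utility, actions, result) is available as plain Python functions:

import math

def max_value(state):
    if terminal(state):
        return utility(state)
    v = -math.inf
    for action in actions(state):
        v = max(v, min_value(result(state, action)))
    return v

def min_value(state):
    if terminal(state):
        return utility(state)
    v = math.inf
    for action in actions(state):
        v = min(v, max_value(result(state, action)))
    return v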
Optimizations

[Figure: a minimax tree pruned mid-evaluation — once the maximizer already has a child worth 4, a MIN sibling that reveals a leaf of 3 is bounded at ≤ 3, and one that reveals 2 is bounded at ≤ 2, so their remaining leaves never need to be examined]
Alpha-Beta Pruning
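A sketch of the same MAX-VALUE/MIN-VALUE pair with alpha-beta pruning folded in (replacing the plain versions above; same assumed game interface):

import math

def max_value(state, alpha=-math.inf, beta=math.inf):
    if terminal(state):
        return utility(state)
    v = -math.inf
    for action in actions(state):
        v = max(v, min_value(result(state, action), alpha, beta))
        if v >= beta:                 # the MIN player above will never allow this branch
            return v
        alpha = max(alpha, v)
    return v

def min_value(state, alpha=-math.inf, beta=math.inf):
    if terminal(state):
        return utility(state)
    v = math.inf
    for action in actions(state):
        v = min(v, max_value(result(state, action), alpha, beta))
        if v <= alpha:                # the MAX player above already has something better
            return v
        beta = min(beta, v)
    return v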
255,168
total possible Tic-Tac-Toe games
288,000,000,000
total possible chess games
after four moves each
10^29,000
total possible chess games
(lower bound)
Depth-Limited Minimax
evaluation function
function that estimates the expected utility
of the game from a given state
https://xkcd.com/832/
Search

Introduction to
Artificial Intelligence
with Python
Knowledge
knowledge-based agents
agents that reason by operating on
internal representations of knowledge
If it didn't rain, Harry visited Hagrid today.

Harry visited Hagrid or Dumbledore today, but not both.

Harry visited Dumbledore today.

Harry did not visit Hagrid today.

It rained today.
Logic
sentence
an assertion about the world
in a knowledge representation language
Propositional Logic
Proposition Symbols

P Q R
Logical Connectives

¬ not      ∧ and      ∨ or      → implication      ↔ biconditional
Not (¬)

P      ¬P
false  true
true   false

And (∧)

P      Q      P ∧ Q
false  false  false
false  true   false
true   false  false
true   true   true

Or (∨)

P      Q      P ∨ Q
false  false  false
false  true   true
true   false  true
true   true   true

Implication (→)

P      Q      P → Q
false  false  true
false  true   true
true   false  false
true   true   true

Biconditional (↔)

P      Q      P ↔ Q
false  false  true
false  true   false
true   false  false
true   true   true
model
assignment of a truth value to every
propositional symbol (a "possible world")
P: It is raining.
model
Q: It is a Tuesday.

{P = true, Q = false}
knowledge base
a set of sentences known by a
knowledge-based agent
Entailment

α ⊨ β

In every model in which sentence α is true,
sentence β is also true.
inference
the process of deriving new sentences
from old ones
P: It is a Tuesday.
Q: It is raining.
R: Harry will go for a run.

KB: ((P ∧ ¬Q) → R), P, ¬Q

Inference: R
Inference Algorithms
Does
KB ⊨ α
?
Model Checking
Model Checking

• To determine if KB ⊨ α:
• Enumerate all possible models.
• If in every model where KB is true, α is true, then
KB entails α.
• Otherwise, KB does not entail α.
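A minimal model-checking sketch in Python, representing KB and query as functions from a model (a dict of truth values) to a bool — this representation is my own simplification:

from itertools import product

def entails(kb, query, symbols):
    """KB ⊨ query iff query is true in every model where KB is true."""
    for values in product([False, True], repeat=len(symbols)):
        model = dict(zip(symbols, values))
        if kb(model) and not query(model):
            return False
    return True

# KB: ((P ∧ ¬Q) → R) ∧ P ∧ ¬Q;  query: R
kb = lambda m: ((not (m["P"] and not m["Q"])) or m["R"]) and m["P"] and not m["Q"]
query = lambda m: m["R"]
print(entails(kb, query, ["P", "Q", "R"]))   # True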
P: It is a Tuesday.  Q: It is raining.  R: Harry will go for a run.
KB: ((P ∧ ¬Q) → R), P, ¬Q
Query: R

P      Q      R      KB
false  false  false  false
false  false  true   false
false  true   false  false
false  true   true   false
true   false  false  false
true   false  true   true
true   true   false  false
true   true   true   false

The KB is true in exactly one model (P true, Q false, R true), and R is true there, so KB ⊨ R.
Knowledge Engineering

Clue

People         Rooms      Weapons
Col. Mustard   Ballroom   Knife
Prof. Plum     Kitchen    Revolver
Ms. Scarlet    Library    Wrench

[Figure: as cards are revealed, possibilities are crossed off the grid until one person, one room, and one weapon remain]
Clue
Propositional Symbols

mustard   ballroom   knife
plum      kitchen    revolver
scarlet   library    wrench
Clue
(mustard ∨ plum ∨ scarlet)
(ballroom ∨ kitchen ∨ library)
(knife ∨ revolver ∨ wrench)

¬plum
¬mustard ∨ ¬library ∨ ¬revolver
Logic Puzzles

• Gilderoy, Minerva, Pomona and Horace each belong
  to a different one of the four houses: Gryffindor,
  Hufflepuff, Ravenclaw, and Slytherin House.
• Gilderoy belongs to Gryffindor or Ravenclaw.
• Pomona does not belong in Slytherin.
• Minerva belongs to Gryffindor.
Logic Puzzles
Propositional Symbols
GilderoyGryffindor MinervaGryffindor
GilderoyHufflepuff MinervaHufflepuff
GilderoyRavenclaw MinervaRavenclaw
GilderoySlytherin MinervaSlytherin
PomonaGryffindor HoraceGryffindor
PomonaHufflepuff HoraceHufflepuff
PomonaRavenclaw HoraceRavenclaw
PomonaSlytherin HoraceSlytherin
Logic Puzzles
(PomonaSlytherin → ¬PomonaHufflepuff)

(MinervaRavenclaw → ¬GilderoyRavenclaw)

(GilderoyGryffindor ∨ GilderoyRavenclaw)
Mastermind

[Figure: a four-peg Mastermind board as another puzzle expressible in propositional logic]
Inference Rules
Modus Ponens
If it is raining, then Harry is inside.

It is raining.

Harry is inside.
Modus Ponens

α → β
α

β
And Elimination

Harry is friends with Ron and Hermione.

Harry is friends with Hermione.


And Elimination

α∧β

α
Double Negation Elimination

It is not true that Harry did not pass the test.

Harry passed the test.


Double Negation Elimination

¬(¬α)

α
Implication Elimination

If it is raining, then Harry is inside.

It is not raining or Harry is inside.


Implication Elimination

α → β

¬α ∨ β
Biconditional Elimination

It is raining if and only if Harry is inside.

If it is raining, then Harry is inside,


and if Harry is inside, then it is raining.
Biconditional Elimination

α ↔ β

(α → β) ∧ (β → α)
De Morgan's Law

It is not true that both


Harry and Ron passed the test.

Harry did not pass the test


or Ron did not pass the test.
De Morgan's Law

¬(α ∧ β)

¬α ∨ ¬β
De Morgan's Law

It is not true that


Harry or Ron passed the test.

Harry did not pass the test


and Ron did not pass the test.
De Morgan's Law

¬(α ∨ β)

¬α ∧ ¬β
Distributive Property

(α ∧ (β ∨ γ))

(α ∧ β) ∨ (α ∧ γ)
Distributive Property

(α ∨ (β ∧ γ))

(α ∨ β) ∧ (α ∨ γ)
Search Problems

• initial state
• actions
• transition model
• goal test
• path cost function
Theorem Proving

• initial state: starting knowledge base


• actions: inference rules
• transition model: new knowledge base after inference
• goal test: check statement we're trying to prove
• path cost function: number of steps in proof
Resolution
(Ron is in the Great Hall) ∨ (Hermione is in the library)

Ron is not in the Great Hall

Hermione is in the library


P ∨ Q
¬P

Q
P ∨ Q1 ∨ Q2 ∨ ...∨ Qn
¬P

Q1 ∨ Q2 ∨ ...∨ Qn
(Ron is in the Great Hall) ∨ (Hermione is in the library)

(Ron is not in the Great Hall) ∨ (Harry is sleeping)

(Hermione is in the library) ∨ (Harry is sleeping)


P ∨ Q
¬P ∨ R

Q∨R
P ∨ Q1 ∨ Q2 ∨ ...∨ Qn
¬P ∨ R1 ∨ R2 ∨ ...∨ Rm

Q1 ∨ Q2 ∨ ...∨ Qn ∨ R1 ∨ R2 ∨ ...∨ Rm
clause
a disjunction of literals

e.g. P ∨ Q ∨ R
conjunctive normal form
logical sentence that is a conjunction of
clauses

e.g. (A ∨ B ∨ C) ∧ (D ∨ ¬E) ∧ (F ∨ G)
Conversion to CNF
• Eliminate biconditionals
• turn (α ↔ β) into (α → β) ∧ (β → α)
• Eliminate implications
• turn (α → β) into ¬α ∨ β
• Move ¬ inwards using De Morgan's Laws
• e.g. turn ¬(α ∧ β) into ¬α ∨ ¬β
• Use distributive law to distribute ∨ wherever possible
Conversion to CNF
(P ∨ Q) → R
¬(P ∨ Q) ∨ R eliminate implication

(¬P ∧ ¬Q) ∨ R De Morgan's Law

(¬P ∨ R) ∧ (¬Q ∨ R) distributive law


Inference by Resolution
P ∨ Q
¬P ∨ R

(Q ∨ R)

P ∨ Q ∨ S
¬P ∨ R ∨ S

(Q ∨ S ∨ R ∨ S)

P ∨ Q ∨ S
¬P ∨ R ∨ S

(Q ∨ R ∨ S)

P
¬P

()
Inference by Resolution

• To determine if KB ⊨ α:
• Check if (KB ∧ ¬α) is a contradiction.
• If so, then KB ⊨ α.
• Otherwise, no entailment.
Inference by Resolution
• To determine if KB ⊨ α:
• Convert (KB ∧ ¬α) to Conjunctive Normal Form.
• Keep checking to see if we can use resolution to
produce a new clause.
• If ever we produce the empty clause (equivalent
to False), we have a contradiction, and KB ⊨ α.
• Otherwise, if we can't add new clauses, no
entailment.
Inference by Resolution

Does (A ∨ B) ∧ (¬B ∨ C) ∧ (¬C) entail A?

Assume the negation of the query and convert to clauses:
(A ∨ B) ∧ (¬B ∨ C) ∧ (¬C) ∧ (¬A)

(A ∨ B)   (¬B ∨ C)   (¬C)   (¬A)

Resolving (¬B ∨ C) with (¬C) produces (¬B);
resolving (A ∨ B) with (¬B) produces (A);
resolving (A) with (¬A) produces the empty clause ().

The empty clause is a contradiction, so KB ⊨ A.
First-Order Logic
Propositional Logic
Propositional Symbols
MinervaGryffindor
MinervaHufflepuff
MinervaRavenclaw
MinervaSlytherin

First-Order Logic
Constant Symbol Predicate Symbol
Minerva Person
Pomona House
Horace BelongsTo
Gilderoy
Gryffindor
Hufflepuff
Ravenclaw
Slytherin
First-Order Logic

Person(Minerva) Minerva is a person.

House(Gryffindor) Gryffindor is a house.

¬House(Minerva) Minerva is not a house.

BelongsTo(Minerva, Gryffindor)
Minerva belongs to Gryffindor.
Universal Quantification
Universal Quantification

∀x. BelongsTo(x, Gryffindor) →


¬BelongsTo(x, Hufflepuff)
For all objects x, if x belongs to Gryffindor,
then x does not belong to Hufflepuff.

Anyone in Gryffindor is not in Hufflepuff.


Existential Quantification
Existential Quantification

∃x. House(x) ∧ BelongsTo(Minerva, x)

There exists an object x such that


x is a house and Minerva belongs to x.

Minerva belongs to a house.


Existential Quantification

∀x. Person(x) → (∃y. House(y) ∧ BelongsTo(x, y))


For all objects x, if x is a person, then
there exists an object y such that
y is a house and x belongs to y.

Every person belongs to a house.


Knowledge

Introduction to
Artificial Intelligence
with Python

Uncertainty
Probability
Possible Worlds

P(ω)

0 ≤ P(ω) ≤ 1

∑ P(ω) = 1
ω∈Ω

[Figure: a fair six-sided die — each face has probability 1/6, so P(rolling any given number) = 1/6]
[Table: sums of two dice — rows give the first die's value, columns the second's]

      1   2   3   4   5   6
  1   2   3   4   5   6   7
  2   3   4   5   6   7   8
  3   4   5   6   7   8   9
  4   5   6   7   8   9  10
  5   6   7   8   9  10  11
  6   7   8   9  10  11  12

P(sum to 12) = 1/36
P(sum to 7) = 6/36 = 1/6
unconditional probability
degree of belief in a proposition
in the absence of any other evidence
conditional probability
degree of belief in a proposition
given some evidence that has already
been revealed
conditional probability

P(a | b)
P(rain today | rain yesterday)
P(route change | traffic conditions)
P(disease | test results)
P(a | b) = P(a ∧ b) / P(b)
P(sum 12 | first die shows 6)

P(first die shows 6) = 1/6
P(sum 12) = 1/36

P(sum 12 | first die shows 6) = (1/36) / (1/6) = 1/6
P(a | b) = P(a ∧ b) / P(b)

P(a ∧ b) = P(b) P(a | b)
P(a ∧ b) = P(a) P(b | a)
random variable
a variable in probability theory with a
domain of possible values it can take on
random variable
Roll

{1, 2, 3, 4, 5, 6}
random variable
Weather

{sun, cloud, rain, wind, snow}


random variable
Traffic

{none, light, heavy}


random variable
Flight

{on time, delayed, cancelled}


probability distribution
P(Flight = on time) = 0.6
P(Flight = delayed) = 0.3
P(Flight = cancelled) = 0.1
probability distribution
P(Flight) = ⟨0.6, 0.3, 0.1⟩
independence
the knowledge that one event occurs does
not affect the probability of the other event
independence
P(a ∧ b) = P(a)P(b | a)
independence
P(a ∧ b) = P(a)P(b)
independence

P(die₁ = 6 ∧ die₂ = 6) = P(die₁ = 6) P(die₂ = 6) = 1/6 · 1/6 = 1/36

But for a single die, P(roll = 6 ∧ roll = 4) ≠ P(roll = 6) P(roll = 4):
P(roll = 6 ∧ roll = 4) = P(roll = 6) P(roll = 4 | roll = 6) = 1/6 · 0 = 0
Bayes' Rule

P(a ∧ b) = P(b) P(a | b)
P(a ∧ b) = P(a) P(b | a)

so P(a) P(b | a) = P(b) P(a | b), and therefore

P(b | a) = P(a | b) P(b) / P(a)
[Figure: a cloudy morning (AM) followed by a rainy afternoon (PM)]

Given clouds in the morning,


what's the probability of rain in the afternoon?

• 80% of rainy afternoons start with cloudy


mornings.
• 40% of days have cloudy mornings.
• 10% of days have rainy afternoons.
P(rain | clouds) = P(clouds | rain) P(rain) / P(clouds)
                 = (0.8)(0.1) / 0.4
                 = 0.2
Knowing

P(cloudy morning | rainy afternoon)

we can calculate

P(rainy afternoon | cloudy morning)


Knowing

P(visible effect | unknown cause)

we can calculate

P(unknown cause | visible effect)


Knowing

P(medical test result | disease)

we can calculate

P(disease | medical test result)


Knowing

P(blurry text | counterfeit bill)

we can calculate

P(counterfeit bill | blurry text)


Joint Probability

AM: P(C = cloud) = 0.4, P(C = ¬cloud) = 0.6
PM: P(R = rain) = 0.1, P(R = ¬rain) = 0.9

Joint distribution:

              R = rain   R = ¬rain
C = cloud     0.08       0.32
C = ¬cloud    0.02       0.58

P(C | rain) = P(C, rain) / P(rain) = α P(C, rain)
            = α ⟨0.08, 0.02⟩ = ⟨0.8, 0.2⟩
Probability Rules
Negation

P( ¬a) = 1 − P(a)
Inclusion-Exclusion

P(a ∨ b) = P(a) + P(b) − P(a ∧ b)


Marginalization

P(a) = P(a, b) + P(a, ¬b)


Marginalization

P(X = xᵢ) = ∑ⱼ P(X = xᵢ, Y = yⱼ)

              R = rain   R = ¬rain
C = cloud     0.08       0.32
C = ¬cloud    0.02       0.58

P(C = cloud)
= P(C = cloud, R = rain) + P(C = cloud, R = ¬rain)
= 0.08 + 0.32
= 0.40
Conditioning

P(a) = P(a | b)P(b) + P(a | ¬b)P( ¬b)


Conditioning

P(X = xᵢ) = ∑ⱼ P(X = xᵢ | Y = yⱼ) P(Y = yⱼ)
Bayesian Networks
Bayesian network
data structure that represents the
dependencies among random variables
Bayesian network
• directed graph
• each node represents a random variable
• arrow from X to Y means X is a parent of Y
• each node X has probability distribution
P(X | Parents(X))
[Network: Rain → Maintenance; Rain, Maintenance → Train; Train → Appointment]

Rain {none, light, heavy}:
none   light   heavy
0.7    0.2     0.1

Maintenance {yes, no}, given Rain:
R       yes   no
none    0.4   0.6
light   0.2   0.8
heavy   0.1   0.9

Train {on time, delayed}, given Rain and Maintenance:
R       M     on time   delayed
none    yes   0.8       0.2
none    no    0.9       0.1
light   yes   0.6       0.4
light   no    0.7       0.3
heavy   yes   0.4       0.6
heavy   no    0.5       0.5

Appointment {attend, miss}, given Train:
T          attend   miss
on time    0.9      0.1
delayed    0.6      0.4
Computing Joint Probabilities

P(light)
P(light, no) = P(light) P(no | light)
P(light, no, delayed) = P(light) P(no | light) P(delayed | light, no)
P(light, no, delayed, miss) = P(light) P(no | light) P(delayed | light, no) P(miss | delayed)
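A sketch that encodes the four tables above as plain dicts and multiplies along the chain (the dict layout is my own; probabilities are from the slides):

p_rain = {"none": 0.7, "light": 0.2, "heavy": 0.1}
p_maint = {("none", "yes"): 0.4, ("none", "no"): 0.6,
           ("light", "yes"): 0.2, ("light", "no"): 0.8,
           ("heavy", "yes"): 0.1, ("heavy", "no"): 0.9}
p_train = {("none", "yes", "on time"): 0.8, ("none", "yes", "delayed"): 0.2,
           ("none", "no", "on time"): 0.9, ("none", "no", "delayed"): 0.1,
           ("light", "yes", "on time"): 0.6, ("light", "yes", "delayed"): 0.4,
           ("light", "no", "on time"): 0.7, ("light", "no", "delayed"): 0.3,
           ("heavy", "yes", "on time"): 0.4, ("heavy", "yes", "delayed"): 0.6,
           ("heavy", "no", "on time"): 0.5, ("heavy", "no", "delayed"): 0.5}
p_appt = {("on time", "attend"): 0.9, ("on time", "miss"): 0.1,
          ("delayed", "attend"): 0.6, ("delayed", "miss"): 0.4}

def joint(r, m, t, a):
    """P(R, M, T, A) = product of each node's CPT entry given its parents."""
    return p_rain[r] * p_maint[(r, m)] * p_train[(r, m, t)] * p_appt[(t, a)]

print(joint("light", "no", "delayed", "miss"))   # 0.2 * 0.8 * 0.3 * 0.4 = 0.0192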


Inference
Inference

• Query X: variable for which to compute distribution


• Evidence variables E: observed variables for event e
• Hidden variables Y: non-evidence, non-query variables.

• Goal: Calculate P(X | e)


P(Appointment | light, no)
= α P(Appointment, light, no)
= α [P(Appointment, light, no, on time) + P(Appointment, light, no, delayed)]
Inference by Enumeration

P(X | e) = α P(X, e) = α ∑y P(X, e, y)

X is the query variable.
e is the evidence.
y ranges over values of hidden variables.
α normalizes the result.
Approximate Inference

Sampling

[Worked example: sample each variable in topological order from its conditional distribution — e.g. R = none, then M = yes given R = none, then T = on time given (none, yes), then A = attend given on time — and repeat to build a collection of samples]

P(Train = on time)? Count the fraction of all samples in which T = on time.

P(Rain = light | Train = on time)? Discard every sample in which T ≠ on time, then count the fraction of the survivors in which R = light.
Rejection Sampling
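A rejection-sampling sketch for this network, reusing the CPT dicts from the joint-probability example above (random.choices is standard library):

import random

def sample_once():
    """Draw one (R, M, T, A) sample in topological order."""
    def draw(dist):
        outcomes, weights = zip(*dist.items())
        return random.choices(outcomes, weights=weights)[0]
    r = draw(p_rain)
    m = draw({k[1]: v for k, v in p_maint.items() if k[0] == r})
    t = draw({k[2]: v for k, v in p_train.items() if k[:2] == (r, m)})
    a = draw({k[1]: v for k, v in p_appt.items() if k[0] == t})
    return r, m, t, a

# Estimate P(Rain = light | Train = on time) by discarding inconsistent samples.
kept = [s for s in (sample_once() for _ in range(100_000)) if s[2] == "on time"]
print(sum(s[0] == "light" for s in kept) / len(kept))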
Likelihood Weighting
Likelihood Weighting

• Start by fixing the values for evidence variables.


• Sample the non-evidence variables using conditional
probabilities in the Bayesian Network.
• Weight each sample by its likelihood: the probability
of all of the evidence.
P(Rain = light | Train = on time)?

[Worked example: fix the evidence T = on time, sample only the non-evidence variables — e.g. R = light, M = yes, A = attend — and weight the sample by the probability of the evidence given its parents, here P(T = on time | light, yes) = 0.6]
Uncertainty over Time
Xt: Weather at time t
Markov assumption
the assumption that the current state
depends on only a finite fixed number of
previous states
Markov Chain
Markov chain
a sequence of random variables where the
distribution of each variable follows the
Markov assumption
Transition Model

                        Tomorrow (Xt+1)
                        sun     rain
Today (Xt)    sun       0.8     0.2
              rain      0.3     0.7

[Figure: a Markov chain X0 → X1 → X2 → X3 → X4 sampled from this model; the state labels sun/rain follow the weather example]
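A sketch that samples such a chain in Python (state names taken from the weather example):

import random

transitions = {"sun": {"sun": 0.8, "rain": 0.2},
               "rain": {"sun": 0.3, "rain": 0.7}}

def sample_chain(start, length):
    """Sample X0..X(length-1); each state depends only on the previous one."""
    chain = [start]
    for _ in range(length - 1):
        dist = transitions[chain[-1]]
        chain.append(random.choices(list(dist), weights=list(dist.values()))[0])
    return chain

print(sample_chain("sun", 5))   # e.g. ['sun', 'sun', 'sun', 'rain', 'rain']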
Sensor Models
Hidden State          Observation
robot's position      robot's sensor data
words spoken          audio waveforms
user engagement       website or app analytics
weather               umbrella
Hidden Markov Models
Hidden Markov Model
a Markov model for a system with hidden
states that generate some observed event
Sensor Model

                      Observation (Et)
                      umbrella   no umbrella
State (Xt)    sun     0.2        0.8
              rain    0.9        0.1

sensor Markov assumption
the assumption that the evidence variable
depends only on the corresponding state

[Figure: hidden states X0 … X4, each emitting an observation E0 … E4]
Task                       Definition
filtering                  given observations from start until now, calculate distribution for current state
prediction                 given observations from start until now, calculate distribution for a future state
smoothing                  given observations from start until now, calculate distribution for a past state
most likely explanation    given observations from start until now, calculate most likely sequence of states
Uncertainty

Introduction to
Artificial Intelligence
with Python

Optimization
optimization
choosing the best option from a set of
options
local search
search algorithms that maintain a single
node and search by moving to a
neighboring node
[Figure: an example placement problem — the current configuration has cost 17]

state-space landscape

• the objective function is what we try to maximize (its best value is the global maximum)
• the cost function is what we try to minimize (its best value is the global minimum)
• from the current state, local search considers only its neighbors
Hill Climbing

function HILL-CLIMB(problem):
    current = initial state of problem
    repeat:
        neighbor = highest valued neighbor of current
        if neighbor not better than current:
            return current
        current = neighbor
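A Python sketch of HILL-CLIMB, assuming the problem supplies neighbors(state) and value(state):

def hill_climb(initial, neighbors, value):
    """Move to the best neighbor until no neighbor improves on the current state."""
    current = initial
    while True:
        best = max(neighbors(current), key=value, default=None)
        if best is None or value(best) <= value(current):
            return current           # a local (not necessarily global) optimum
        current = best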
[Figure, animated: hill climbing improves the configuration's cost step by step — 17, 15, 13, 11, and finally 9]
[Figure: state-space landscapes annotated with the global maximum and local maxima, the global minimum and local minima, and flat regions — a flat local maximum and a shoulder]
Hill Climbing Variants

Variant             Definition
steepest-ascent     choose the highest-valued neighbor
stochastic          choose randomly from higher-valued neighbors
first-choice        choose the first higher-valued neighbor
random-restart      conduct hill climbing multiple times
local beam search   chooses the k highest-valued neighbors
Simulated Annealing
Simulated Annealing

• Early on, higher "temperature": more likely to accept


neighbors that are worse than current state
• Later on, lower "temperature": less likely to accept
neighbors that are worse than current state
Simulated Annealing

function SIMULATED-ANNEALING(problem, max):
    current = initial state of problem
    for t = 1 to max:
        T = TEMPERATURE(t)
        neighbor = random neighbor of current
        ΔE = how much better neighbor is than current
        if ΔE > 0:
            current = neighbor
        else, with probability e^(ΔE/T), set current = neighbor
    return current
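The same procedure as a Python sketch, with temperature as a caller-supplied cooling schedule (e.g. lambda t: 1 / t):

import math
import random

def simulated_annealing(initial, neighbors, value, max_steps, temperature):
    """Accept worse neighbors often early on (high T) and rarely later (low T)."""
    current = initial
    for t in range(1, max_steps + 1):
        T = temperature(t)
        neighbor = random.choice(neighbors(current))
        delta_e = value(neighbor) - value(current)
        if delta_e > 0 or random.random() < math.exp(delta_e / T):
            current = neighbor
    return current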
Traveling Salesman Problem
Linear Programming
Linear Programming

• Minimize a cost function c1x1 + c2x2 + ... + cnxn


• With constraints of form a1x1 + a2x2 + ... + anxn ≤ b
or of form a1x1 + a2x2 + ... + anxn = b
• With bounds for each variable li ≤ xi ≤ ui
Linear Programming Example
• Two machines X1 and X2. X1 costs $50/hour to run, X2
costs $80/hour to run. Goal is to minimize cost.
• X1 requires 5 units of labor per hour. X2 requires 2
units of labor per hour. Total of 20 units of labor to
spend.
• X1 produces 10 units of output per hour. X2 produces
12 units of output per hour. Company needs 90 units
of output.
Linear Programming Example

Cost function: 50x1 + 80x2
Labor constraint: 5x1 + 2x2 ≤ 20
Output constraint: 10x1 + 12x2 ≥ 90, rewritten as (−10x1) + (−12x2) ≤ −90
Linear Programming Algorithms

• Simplex
• Interior-Point
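This example maps directly onto scipy.optimize.linprog, which minimizes c·x subject to A_ub·x ≤ b_ub (the ≥ constraint is negated to fit that form; variables default to being nonnegative):

from scipy.optimize import linprog

result = linprog(
    c=[50, 80],                    # minimize 50*x1 + 80*x2
    A_ub=[[5, 2], [-10, -12]],     # 5*x1 + 2*x2 <= 20;  -10*x1 - 12*x2 <= -90
    b_ub=[20, -90],
)
print(result.x)                    # optimal hours to run each machine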
Constraint Satisfaction

Student   Taking classes   Exam slots
1         A, B, C          Monday
2         B, D, E          Tuesday
3         C, E, F          Wednesday
4         E, F, G
[Figure: the classes become nodes of a constraint graph A, B, C, D, E, F, G, with an edge between any two classes that share a student — those exams cannot be scheduled in the same slot]
Constraint Satisfaction Problem

• Set of variables {X1, X2, ..., Xn}


• Set of domains for each variable {D1, D2, ..., Dn}
• Set of constraints C
[Example: Sudoku as a CSP]

Variables: the empty cells, e.g. {(0, 2), (1, 1), (1, 2), (2, 0), ...}
Domains: {1, 2, 3, 4, 5, 6, 7, 8, 9} for each variable
Constraints: e.g. {(0, 2) ≠ (1, 1) ≠ (1, 2) ≠ (2, 0), ...}
[Example: exam scheduling as a CSP]

Variables: {A, B, C, D, E, F, G}
Domains: {Monday, Tuesday, Wednesday} for each variable
Constraints: {A≠B, A≠C, B≠C, B≠D, B≠E, C≠E, C≠F, D≠E, E≠F, E≠G, F≠G}
hard constraints
constraints that must be satisfied in a
correct solution
soft constraints
constraints that express some notion of
which solutions are preferred over others
unary constraint
constraint involving only one variable
unary constraint
{A ≠ Monday}
binary constraint
constraint involving two variables
binary constraint
{A ≠ B}
node consistency
when all the values in a variable's domain
satisfy the variable's unary constraints
A: {Mon, Tue, Wed}    B: {Mon, Tue, Wed}
Constraints: {A ≠ Mon, B ≠ Tue, B ≠ Mon, A ≠ B}

Enforcing node consistency removes Mon from A's domain (A ≠ Mon), then Tue and Mon from B's domain (B ≠ Tue, B ≠ Mon), leaving:

A: {Tue, Wed}    B: {Wed}
arc consistency
when all the values in a variable's domain
satisfy the variable's binary constraints
arc consistency
To make X arc-consistent with respect to Y,
remove elements from X's domain until every
choice for X has a possible choice for Y
A: {Tue, Wed}    B: {Wed}
Constraints: {A ≠ Mon, B ≠ Tue, B ≠ Mon, A ≠ B}

To make A arc-consistent with respect to B: choosing A = Wed leaves no possible choice for B (since A ≠ B), so Wed is removed from A's domain, leaving:

A: {Tue}    B: {Wed}
Arc Consistency

function REVISE(csp, X, Y):
    revised = false
    for x in X.domain:
        if no y in Y.domain satisfies constraint for (X, Y):
            delete x from X.domain
            revised = true
    return revised

function AC-3(csp):
    queue = all arcs in csp
    while queue non-empty:
        (X, Y) = DEQUEUE(queue)
        if REVISE(csp, X, Y):
            if size of X.domain == 0:
                return false
            for each Z in X.neighbors - {Y}:
                ENQUEUE(queue, (Z, X))
    return true
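A Python sketch of REVISE and AC-3 over a CSP represented as plain dicts — domains maps each variable to a set of values, neighbors maps each variable to the variables it shares a constraint with, and constraint(x, vx, y, vy) tests one pair (this representation is my own):

from collections import deque

def revise(domains, constraint, x, y):
    """Drop values of x that have no consistent partner left in y's domain."""
    revised = False
    for vx in set(domains[x]):
        if not any(constraint(x, vx, y, vy) for vy in domains[y]):
            domains[x].remove(vx)
            revised = True
    return revised

def ac3(domains, neighbors, constraint):
    queue = deque((x, y) for x in domains for y in neighbors[x])
    while queue:
        x, y = queue.popleft()
        if revise(domains, constraint, x, y):
            if not domains[x]:
                return False          # some domain emptied: no solution possible
            for z in neighbors[x] - {y}:
                queue.append((z, x))
    return True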
[Figure: the exam constraint graph with every variable's domain initially {Mon, Tue, Wed} — arc consistency alone cannot solve it, since every arc is already consistent]
Search Problems

• initial state
• actions
• transition model
• goal test
• path cost function
CSPs as Search Problems
• initial state: empty assignment (no variables)
• actions: add a {variable = value} to assignment
• transition model: shows how adding an assignment
changes the assignment
• goal test: check if all variables assigned and
constraints all satisfied
• path cost function: all paths have same cost
Backtracking Search

function BACKTRACK(assignment, csp):
    if assignment complete: return assignment
    var = SELECT-UNASSIGNED-VAR(assignment, csp)
    for value in DOMAIN-VALUES(var, assignment, csp):
        if value consistent with assignment:
            add {var = value} to assignment
            result = BACKTRACK(assignment, csp)
            if result ≠ failure: return result
            remove {var = value} from assignment
    return failure
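A Python sketch of BACKTRACK specialized to "neighboring variables must differ" constraints (domains and neighbors as dicts; the representation is my own):

def backtrack(assignment, domains, neighbors):
    """Depth-first assignment with undo on failure."""
    if len(assignment) == len(domains):
        return assignment
    var = next(v for v in domains if v not in assignment)
    for value in domains[var]:
        if all(assignment.get(n) != value for n in neighbors[var]):
            assignment[var] = value
            result = backtrack(assignment, domains, neighbors)
            if result is not None:
                return result
            del assignment[var]
    return None

domains = {v: ["Mon", "Tue", "Wed"] for v in "ABCDEFG"}
neighbors = {"A": {"B", "C"}, "B": {"A", "C", "D", "E"}, "C": {"A", "B", "E", "F"},
             "D": {"B", "E"}, "E": {"B", "C", "D", "F", "G"},
             "F": {"C", "E", "G"}, "G": {"E", "F"}}
print(backtrack({}, domains, neighbors))
# {'A': 'Mon', 'B': 'Tue', 'C': 'Wed', 'D': 'Wed', 'E': 'Mon', 'F': 'Tue', 'G': 'Wed'}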
[Worked example, frame by frame: values are tried in order and undone when they conflict — A = Mon, B = Mon fails so B = Tue, and after backtracking over D and E the search reaches the solution A = Mon, B = Tue, C = Wed, D = Wed, E = Mon, F = Tue, G = Wed]
Inference
[Worked example: interleaving inference with search — after assigning A = Mon and B = Tue, enforcing arc consistency prunes the remaining domains all the way down to C = Wed, E = Mon, D = Wed, F = Tue, G = Wed, with no further backtracking needed]
maintaining arc-consistency
algorithm for enforcing arc-consistency
every time we make a new assignment

When we make a new assignment to X, call
AC-3, starting with a queue of all arcs (Y, X)
where Y is a neighbor of X
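
A sketch of the AC-3 enforcement step in Python, under the same hypothetical csp interface as before (domains as sets, neighbors as an adjacency map, and a binary constraint check satisfies, all assumed names):

from collections import deque

def revise(csp, x, y):
    # make x arc-consistent with y: drop any value of x that
    # no value of y can be paired with
    revised = False
    for vx in list(csp.domains[x]):
        if not any(csp.satisfies(x, vx, y, vy) for vy in csp.domains[y]):
            csp.domains[x].remove(vx)
            revised = True
    return revised

def ac3(csp, arcs):
    # arcs: initial queue, e.g. all (y, x) where y neighbors a
    # newly assigned variable x
    queue = deque(arcs)
    while queue:
        x, y = queue.popleft()
        if revise(csp, x, y):
            if not csp.domains[x]:
                return False  # a domain emptied: dead end
            for z in csp.neighbors[x]:
                if z != y:
                    queue.append((z, x))  # recheck x's other neighbors
    return True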
function BACKTRACK(assignment, csp):
if assignment complete: return assignment
var = SELECT-UNASSIGNED-VAR(assignment, csp)
for value in DOMAIN-VALUES(var, assignment, csp):
if value consistent with assignment:
add {var = value} to assignment
inferences = INFERENCE(assignment, csp)
if inferences ≠ failure: add inferences to assignment
result = BACKTRACK(assignment, csp)
if result ≠ failure: return result
remove {var = value} and inferences from assignment
return failure
SELECT-UNASSIGNED-VAR

• minimum remaining values (MRV) heuristic: select
  the variable that has the smallest domain
• degree heuristic: select the variable that has the
  highest degree
Example: with A = Mon and B = Tue assigned, C's remaining domain is
{Wed} while D's is {Mon, Wed}; the MRV heuristic selects C, the
variable with the smallest remaining domain.
Example: with nothing yet assigned, every domain is still
{Mon, Tue, Wed}, so MRV cannot distinguish the variables; the degree
heuristic selects E, the variable with the most neighbors in the
constraint graph.
function BACKTRACK(assignment, csp):
if assignment complete: return assignment
var = SELECT-UNASSIGNED-VAR(assignment, csp)
for value in DOMAIN-VALUES(var, assignment, csp):
if value consistent with assignment:
add {var = value} to assignment
inferences = INFERENCE(assignment, csp)
if inferences ≠ failure: add inferences to assignment
result = BACKTRACK(assignment, csp)
if result ≠ failure: return result
remove {var = value} and inferences from assignment
return failure
DOMAIN-VALUES

• least-constraining values heuristic: return values in
  order by number of choices that are ruled out for
  neighboring variables
• try least-constraining values first
Example: with A = Mon and G = Wed assigned, C's remaining domain is
{Tue, Wed}. Choosing C = Tue would remove options from several of C's
unassigned neighbors, while C = Wed rules out fewer choices, so the
least-constraining value Wed is tried first, leading directly to the
solution A = Mon, B = Tue, C = Wed, D = Wed, E = Mon, F = Tue, G = Wed.
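
Both ordering heuristics fit in a few lines of Python; a sketch over the same hypothetical csp interface (the slides name the heuristics, not this code):

def select_unassigned_variable(assignment, csp):
    unassigned = [v for v in csp.variables if v not in assignment]
    # MRV: smallest remaining domain; ties broken by the degree
    # heuristic (most neighbors first)
    return min(unassigned,
               key=lambda v: (len(csp.domains[v]), -len(csp.neighbors[v])))

def domain_values(var, assignment, csp):
    def rules_out(value):
        # how many options this value removes from unassigned neighbors
        return sum(value in csp.domains[n]
                   for n in csp.neighbors[var] if n not in assignment)
    # least-constraining value first
    return sorted(csp.domains[var], key=rules_out)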
Problem Formulation

[constraint graph over variables A through G]

50x1 + 80x2
5x1 + 2x2 ≤ 20
(-10x1) + (-12x2) ≤ -90

Local Search | Linear Programming | Constraint Satisfaction

Optimization
Introduction to
Artificial Intelligence
with Python
Learning
Supervised Learning
supervised learning
given a data set of input-output pairs, learn
a function to map inputs to outputs
classification
supervised learning task of learning a
function mapping an input point to a
discrete category
Date      | Humidity (relative) | Pressure (sea level, mb) | Rain
January 1 | 93%                 | 999.7                    | Rain
January 2 | 49%                 | 1015.5                   | No Rain
January 3 | 79%                 | 1031.1                   | No Rain
January 4 | 65%                 | 984.9                    | Rain
January 5 | 90%                 | 975.2                    | Rain
f(humidity, pressure)
f(93, 999.7) = Rain
f(49, 1015.5) = No Rain
f(79, 1031.1) = No Rain
h(humidity, pressure)
[scatter plot: pressure vs. humidity, with each day plotted as a
Rain or No Rain point]
nearest-neighbor classification
algorithm that, given an input, chooses the
class of the nearest data point to that input
[scatter plot: a new input is classified by the single nearest
data point]
k-nearest-neighbor classification
algorithm that, given an input, chooses the
most common class out of the k nearest
data points to that input
[scatter plot: the k nearest data points vote on the class of a
new input]
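
Using the five days from the table above as training data, a k-nearest-neighbor classifier is one call in scikit-learn (the query point at the end is invented for illustration):

from sklearn.neighbors import KNeighborsClassifier

# (humidity, pressure) pairs and labels from the table above
X = [[93, 999.7], [49, 1015.5], [79, 1031.1], [65, 984.9], [90, 975.2]]
y = ["Rain", "No Rain", "No Rain", "Rain", "Rain"]

model = KNeighborsClassifier(n_neighbors=3)
model.fit(X, y)
print(model.predict([[85, 990.0]]))  # hypothetical new day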
x1 = Humidity
x2 = Pressure

h(x1, x2) = Rain if w0 + w1x1 + w2x2 ≥ 0
            No Rain otherwise

Weight Vector w: (w0, w1, w2)
Input Vector x: (1, x1, x2)
w · x = w0 + w1x1 + w2x2

hw(x) = 1 if w · x ≥ 0
        0 otherwise
perceptron learning rule
Given data point (x, y), update each weight according to:

wi = wi + α(y - hw(x)) × xi

that is: wi = wi + α(actual value - estimate) × xi
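
The update rule itself is one line of arithmetic; a minimal numpy sketch (the learning rate α and the bias-handling convention are assumptions, not from the slides):

import numpy as np

def perceptron_update(w, x, y, alpha=0.1):
    # w: weights (w0, w1, ..., wn); x: inputs (x1, ..., xn); y: 0 or 1
    x = np.insert(np.asarray(x, dtype=float), 0, 1.0)  # prepend x0 = 1
    estimate = 1 if np.dot(w, x) >= 0 else 0           # hw(x)
    return w + alpha * (y - estimate) * x              # wi += α(y - hw(x))·xi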
[plot: output (0 or 1) as a function of w · x; scatter plot showing
the resulting linear decision boundary between Rain and No Rain]
hard threshold: the output jumps from 0 to 1 exactly at w · x = 0

soft threshold: the output rises smoothly from 0 to 1 as w · x
increases, so it can be read as a probability
Support Vector Machines
maximum margin separator
boundary that maximizes the distance
between any of the data points
regression
supervised learning task of learning a
function mapping an input point to a
continuous value
f(advertising)
f(1200) = 5800
f(2800) = 13400
f(1800) = 8400
h(advertising)
[plot: sales vs. advertising, with a fitted regression line]
Evaluating Hypotheses
loss function
function that expresses how poorly our
hypothesis performs
0-1 loss function
L(actual, predicted) =
0 if actual = predicted,
1 otherwise
[scatter plot: under 0-1 loss, each misclassified point contributes 1
to the total loss and each correctly classified point contributes 0]
L1 loss function
L(actual, predicted) = | actual - predicted |
[plots: candidate regression lines compared by their total
|actual - predicted| loss]
L2 loss function
L(actual, predicted) = (actual - predicted)²
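
All three loss functions, written out as a direct transcription of the definitions above:

def zero_one_loss(actual, predicted):
    return 0 if actual == predicted else 1

def l1_loss(actual, predicted):
    return abs(actual - predicted)

def l2_loss(actual, predicted):
    return (actual - predicted) ** 2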
overfitting
a model that fits too closely to a particular
data set and therefore may fail to generalize
to future data
[plots: decision boundaries and regression curves that fit the
training data too closely, i.e. overfitting]
regularization
penalizing hypotheses that are more complex
to favor simpler, more general hypotheses

cost(h) = loss(h) + λ complexity(h)
holdout cross-validation
splitting data into a training set and a
test set, such that learning happens on the
training set and is evaluated on the test set
k-fold cross-validation
splitting data into k sets, and experimenting
k times, using each set as a test set once,
and using remaining data as training set
scikit-learn
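
Both validation schemes are one-liners in scikit-learn; a sketch using the bundled iris data set just to have something to split:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# holdout: learn on 60% of the data, evaluate on the held-out 40%
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4)
model = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)
print(model.score(X_test, y_test))

# k-fold: each of 5 folds serves as the test set once
print(cross_val_score(KNeighborsClassifier(n_neighbors=1), X, y, cv=5))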
Reinforcement Learning
reinforcement learning
given a set of rewards or punishments, learn
what actions to take in the future
[diagram: the Agent takes an action on the Environment; the
Environment returns a new state and a reward to the Agent]
Markov Decision Process
model for decision-making, representing
states, actions, and their rewards
Markov Chain

[diagram: a chain of states X0 → X1 → X2 → X3 → X4, extended so that
each state offers several available actions, each transition carrying
a reward r]
Markov Decision Process

• Set of states S
• Set of actions ACTIONS(s)
• Transition model P(s' | s, a)
• Reward function R(s, a, s')
Q-learning
method for learning a function Q(s, a),
estimate of the value of performing action a
in state s
Q-learning Overview

• Start with Q(s, a) = 0 for all s, a
• When we take an action and receive a reward:
  • Estimate the value of Q(s, a) based on current
    reward and expected future rewards
  • Update Q(s, a) to take into account old estimate as
    well as our new estimate
Q-learning

• Start with Q(s, a) = 0 for all s, a
• Every time we take an action a in state s and observe a
  reward r, we update:

  Q(s, a) ← Q(s, a) + α(new value estimate - old value estimate)

  which, filling in both estimates, becomes:

  Q(s, a) ← Q(s, a) + α((r + γ maxa' Q(s', a')) - Q(s, a))
Greedy Decision-Making

• When in state s, choose action a with highest Q(s, a)

Explore vs. Exploit
ε-greedy

• Set ε equal to how often we want to move randomly.
• With probability 1 - ε, choose estimated best move.
• With probability ε, choose a random move.
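
The update formula and the ε-greedy choice, sketched with a plain dict as the Q-table (the state/action types and the α, γ, ε values are all placeholders):

import random

def update(q, state, action, reward, next_state, actions,
           alpha=0.5, gamma=0.9):
    # Q(s,a) <- Q(s,a) + α((r + γ max_a' Q(s',a')) - Q(s,a))
    best_future = max((q.get((next_state, a), 0) for a in actions),
                      default=0)
    old = q.get((state, action), 0)
    q[(state, action)] = old + alpha * (reward + gamma * best_future - old)

def choose_action(q, state, actions, epsilon=0.1):
    if random.random() < epsilon:
        return random.choice(list(actions))               # explore
    return max(actions, key=lambda a: q.get((state, a), 0))  # exploit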
Nim
function approximation
approximating Q(s, a), often by a function
combining various features, rather than
storing one value for every state-action pair
Unsupervised Learning
unsupervised learning
given input data without any additional
feedback, learn patterns
Clustering
clustering
organizing a set of objects into groups in
such a way that similar objects tend to be in
the same group
Some Clustering Applications

• Genetic research
• Image segmentation
• Market research
• Medical imaging
• Social network analysis
k-means clustering
algorithm for clustering data based on
repeatedly assigning points to clusters and
updating those clusters' centers
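
scikit-learn ships an implementation; a sketch on six made-up 2-D points that fall into two obvious groups:

import numpy as np
from sklearn.cluster import KMeans

points = np.array([[1, 2], [1, 4], [1, 0],
                   [10, 2], [10, 4], [10, 0]])
kmeans = KMeans(n_clusters=2, n_init=10).fit(points)
print(kmeans.labels_)           # which cluster each point landed in
print(kmeans.cluster_centers_)  # the two cluster centers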
Learning

• Supervised Learning
• Reinforcement Learning
• Unsupervised Learning
Learning
Introduction to
Artificial Intelligence
with Python
Neural Networks
Neural Networks
• Neurons are connected to and receive electrical
  signals from other neurons.
• Neurons process input signals and can be activated.
artificial neural network
mathematical model for learning inspired by
biological neural networks
Artificial Neural Networks

• Model mathematical function from inputs to outputs
  based on the structure and parameters of the
  network.
• Allows for learning the network's parameters based
  on data.
h(x1, x2) = w0 + w1x1 + w2x2
step function
g(x) = 1 if x ≥ 0, else 0
[plot: output jumps from 0 to 1 at w · x = 0]

logistic sigmoid
g(x) = eˣ / (eˣ + 1)
[plot: output rises smoothly from 0 to 1]

rectified linear unit (ReLU)
g(x) = max(0, x)
[plot: output is 0 for negative inputs, then grows linearly]
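
All three activation functions in numpy, a direct transcription of the formulas above:

import numpy as np

def step(x):
    return np.where(x >= 0, 1, 0)

def sigmoid(x):
    return np.exp(x) / (np.exp(x) + 1)  # same as 1 / (1 + e^(-x))

def relu(x):
    return np.maximum(0, x)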
h(x1, x2) = g(w0 + w1x1 + w2x2)

[diagram: inputs x1 and x2, connected by weights w1 and w2, plus a
bias weight w0, to a single unit that computes g(w0 + w1x1 + w2x2)]
Or
x | y | f(x, y)
0 | 0 | 0
0 | 1 | 1
1 | 0 | 1
1 | 1 | 1
With w0 = -1 and w1 = w2 = 1, the unit computes
g(-1 + 1x1 + 1x2) with a step activation:

x1 = 0, x2 = 0 → g(-1) = 0
x1 = 1, x2 = 0 → g(0) = 1
x1 = 1, x2 = 1 → g(1) = 1

exactly the Or function.
And
x | y | f(x, y)
0 | 0 | 0
0 | 1 | 0
1 | 0 | 0
1 | 1 | 1
Lowering the bias to w0 = -2 gives g(-2 + 1x1 + 1x2):

x1 = 1, x2 = 1 → g(0) = 1
x1 = 1, x2 = 0 → g(-1) = 0

exactly the And function.
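
Checking both weight settings against the truth tables (the unit helper is just a convenience wrapper around the formula above):

def unit(w0, w1, w2):
    # single unit with a step activation: g(w0 + w1*x1 + w2*x2)
    return lambda x1, x2: 1 if w0 + w1 * x1 + w2 * x2 >= 0 else 0

or_gate = unit(-1, 1, 1)
and_gate = unit(-2, 1, 1)

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, or_gate(x1, x2), and_gate(x1, x2))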
[diagram: inputs humidity and pressure feeding a unit that outputs
the probability of rain; inputs advertising and month feeding a unit
that outputs predicted sales]
[diagram: inputs x1, x2 with weights w1, w2 feeding
g(w0 + w1x1 + w2x2); adding a third input x3 with weight w3 gives
g(w0 + w1x1 + w2x2 + w3x3)]

In general, for n inputs:

h(x1, …, xn) = g( Σi=1..n wixi + w0 )
gradient descent
algorithm for minimizing loss when training
neural network
Gradient Descent

• Start with a random choice of weights.
• Repeat:
  • Calculate the gradient based on all data points:
    direction that will lead to decreasing loss.
  • Update weights according to the gradient.
Stochastic Gradient Descent

• Start with a random choice of weights.
• Repeat:
  • Calculate the gradient based on one data point:
    direction that will lead to decreasing loss.
  • Update weights according to the gradient.
Mini-Batch Gradient Descent

• Start with a random choice of weights.
• Repeat:
  • Calculate the gradient based on one small batch:
    direction that will lead to decreasing loss.
  • Update weights according to the gradient.
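
All three variants differ only in how many points feed each gradient estimate. A sketch in which gradient is a caller-supplied function (batch_size=len(X) gives ordinary gradient descent, batch_size=1 gives SGD):

import numpy as np

def minibatch_gd(X, y, gradient, lr=0.01, batch_size=32, epochs=100):
    w = np.random.randn(X.shape[1])  # start with random weights
    for _ in range(epochs):
        order = np.random.permutation(len(X))
        for start in range(0, len(X), batch_size):
            batch = order[start:start + batch_size]
            # step opposite the gradient: the direction of decreasing loss
            w -= lr * gradient(w, X[batch], y[batch])
    return w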
[diagram: a network with four output units, rainy, sunny, cloudy,
snowy, producing a probability for each category, e.g.
0.1 rainy, 0.6 sunny, 0.2 cloudy, 0.1 snowy; the same structure can
instead output values for action 1 through action 4]
Perceptron

• Only capable of learning a linearly separable decision
  boundary.
multilayer neural network
artificial neural network with an input layer,
an output layer, and at least one hidden layer
backpropagation
algorithm for training neural networks with
hidden layers
Backpropagation
• Start with a random choice of weights.
• Repeat:
• Calculate error for output layer.
• For each layer, starting with output layer, and
moving inwards towards earliest hidden layer:
• Propagate error back one layer.
• Update weights.
deep neural network
neural network with multiple hidden layers
Overfitting
dropout
temporarily removing units — selected at
random — from a neural network to prevent
over-reliance on certain units
TensorFlow
playground.tensorflow.org
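
In TensorFlow's Keras API, a small network with one hidden layer and dropout is a few lines. A sketch only: the layer sizes, activations, and two-feature input shape are arbitrary choices, not from the slides:

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, input_shape=(2,), activation="relu"),
    tf.keras.layers.Dropout(0.5),  # randomly drop units while training
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
# model.fit(X_train, y_train, epochs=20)  # given some training data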
computer vision
computational methods for analyzing and
understanding digital images
[diagram: an image as a grid of pixels, each pixel a value, or three
values for red, green, and blue, ranging from 0 to 255]
image convolution
applying a filter that adds each pixel value
of an image to its neighbors, weighted
according to a kernel matrix
kernel (sharpen):
 0 -1  0
-1  5 -1
 0 -1  0

input image:
10 20 30 40
10 20 30 40
20 30 40 50
20 30 40 50

Sliding the kernel over each 3×3 neighborhood, multiplying each pixel
by the matching kernel entry and summing, yields the output:

10 20
40 50
kernel (edge detection):
-1 -1 -1
-1  8 -1
-1 -1 -1

On a uniform region (all pixels 20):
(20)(-1) + (20)(-1) + (20)(-1)
+ (20)(-1) + (20)(8) + (20)(-1)
+ (20)(-1) + (20)(-1) + (20)(-1)
= 0

On a boundary (a row of 20s above rows of 50s):
(20)(-1) + (20)(-1) + (20)(-1)
+ (50)(-1) + (50)(8) + (50)(-1)
+ (50)(-1) + (50)(-1) + (50)(-1)
= 90
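
A direct numpy implementation reproduces the first worked example. (This slides the kernel without flipping it; for these symmetric kernels that is identical to convolution.)

import numpy as np

def convolve2d(image, kernel):
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # weight each pixel in the window by the kernel and sum
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.array([[10, 20, 30, 40],
                  [10, 20, 30, 40],
                  [20, 30, 40, 50],
                  [20, 30, 40, 50]])
sharpen = np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]])
print(convolve2d(image, sharpen))  # [[10. 20.] [40. 50.]]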
pooling
reducing the size of an input by sampling
from regions in the input
max-pooling
pooling by choosing the maximum value in
each region
input:
30 40  80  90
20 50 100 110
 0 10  20  30
10 20  40  30

Taking the maximum of each 2×2 region gives the pooled output:

50 110
20  40
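
Max-pooling over 2×2 regions, verified against the example above:

import numpy as np

def max_pool(image, size=2):
    h, w = image.shape[0] // size, image.shape[1] // size
    # split the image into size×size tiles and take the max of each
    return image[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

image = np.array([[30, 40, 80, 90],
                  [20, 50, 100, 110],
                  [0, 10, 20, 30],
                  [10, 20, 40, 30]])
print(max_pool(image))  # [[ 50 110] [ 20  40]]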
convolutional neural network
neural networks that use convolution,
usually for analyzing images
[pipeline: convolution → pooling → flattening → network]

first convolution and pooling:  low-level features (edges, curves, shapes)
second convolution and pooling: high-level features (objects)
input network output
feed-forward neural network
neural network that has connections only in
one direction
recurrent neural network
neural network that generates output that
feeds back into its own inputs
[one-to-many: a single input (an image) produces a sequence of
outputs, a caption such as "group of people walking in front of a
building"]

[many-to-one: a sequence of inputs, fed through the network one at a
time, produces a single output]

[many-to-many: a sequence of inputs produces a sequence of outputs,
e.g. translating "She is in the library." into "她在圖書館"]
Neural Networks
Introduction to
Artificial Intelligence
with Python
Language
Natural Language Processing
• automatic summarization
• information extraction
• language identification
• machine translation
• named entity recognition
• speech recognition
• text classification
• word sense disambiguation
• ...
Syntax
"Just before nine o'clock Sherlock
Holmes stepped briskly into the room."
"Just before Sherlock Holmes nine
o'clock stepped briskly the room."
"I saw the man on the mountain
with a telescope."
Semantics
"Just before nine o'clock Sherlock
Holmes stepped briskly into the room."
"Sherlock Holmes stepped briskly into
the room just before nine o'clock."
"A few minutes before nine, Sherlock
Holmes walked quickly into the room."
"Colorless green ideas sleep furiously."
Natural Language Processing
Syntax
formal grammar
a system of rules for generating sentences
in a language
Context-Free Grammar

N   V   D   N
she saw the city

N → she | city | car | Harry | ...
D → the | a | an | ...
V → saw | ate | walked | ...
P → to | on | over | ...
ADJ → blue | busy | old | ...

NP → N | D N
[parse trees: NP → N covers "she"; NP → D N covers "the city"]

VP → V | V NP
[parse trees: VP → V covers "walked"; VP → V NP covers "saw the city"]

S → NP VP
[parse tree: S → NP VP covers "she saw the city"]
nltk
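
nltk can parse with exactly this kind of grammar; a sketch whose rule set mirrors the slides, trimmed to the words the sentence needs:

import nltk

grammar = nltk.CFG.fromstring("""
    S -> NP VP
    NP -> N | D N
    VP -> V | V NP
    D -> "the" | "a" | "an"
    N -> "she" | "city" | "car" | "Harry"
    V -> "saw" | "ate" | "walked"
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("she saw the city".split()):
    tree.pretty_print()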
n-gram
a contiguous sequence of n items
from a sample of text
character n-gram
a contiguous sequence of n characters
from a sample of text
word n-gram
a contiguous sequence of n words
from a sample of text
unigram
a contiguous sequence of 1 item
from a sample of text
bigram
a contiguous sequence of 2 items
from a sample of text
trigram
a contiguous sequence of 3 items
from a sample of text
"How often have I said to you that
when you have eliminated the
impossible whatever remains,
however improbable, must be the
truth?"
"How often have I said to you that
when you have eliminated the
impossible whatever remains,
however improbable, must be the
truth?"
"How often have I said to you that
when you have eliminated the
impossible whatever remains,
however improbable, must be the
truth?"
"How often have I said to you that
when you have eliminated the
impossible whatever remains,
however improbable, must be the
truth?"
"How often have I said to you that
when you have eliminated the
impossible whatever remains,
however improbable, must be the
truth?"
"How often have I said to you that
when you have eliminated the
impossible whatever remains,
however improbable, must be the
truth?"
"How often have I said to you that
when you have eliminated the
impossible whatever remains,
however improbable, must be the
truth?"
tokenization
the task of splitting a sequence of
characters into pieces (tokens)
word tokenization
the task of splitting a sequence of
characters into words
sentence tokenization
the task of splitting a sequence of
characters into sentences
"Whatever remains, however
improbable, must be the truth."
"Whatever remains, however
improbable, must be the truth."

["Whatever", "remains,", "however",


"improbable,", "must", "be", "the",
"truth."]
"Whatever remains, however
improbable, must be the truth."

["Whatever", "remains,", "however",


"improbable,", "must", "be", "the",
"truth."]
"Whatever remains, however
improbable, must be the truth."

["Whatever", "remains", "however",


"improbable", "must", "be", "the",
"truth"]
"Just before nine o'clock Sherlock
Holmes stepped briskly into the room."
"Just before nine o'clock Sherlock
Holmes stepped briskly into the room."
"He was dressed in a sombre yet rich
style, in black frock-coat, shining hat,
neat brown gaiters, and well-cut pearl-
grey trousers."
"He was dressed in a sombre yet rich
style, in black frock-coat, shining hat,
neat brown gaiters, and well-cut pearl-
grey trousers."
"I cannot waste time over this sort of
fantastic talk, Sherlock. If you can catch
the man, catch him, and let me know
when you have done it."
"I cannot waste time over this sort of
fantastic talk, Sherlock. If you can catch
the man, catch him, and let me know
when you have done it."
"I cannot waste time over this sort of
fantastic talk, Sherlock. If you can catch
the man, catch him, and let me know
when you have done it."
"I cannot waste time over this sort of
fantastic talk, Sherlock. If you can catch
the man, catch him, and let me know
when you have done it."
"I cannot waste time over this sort of
fantastic talk, Mr. Holmes. If you can
catch the man, catch him, and let me
know when you have done it."
"I cannot waste time over this sort of
fantastic talk, Mr. Holmes. If you can
catch the man, catch him, and let me
know when you have done it."
"I cannot waste time over this sort of
fantastic talk, Mr. Holmes. If you can
catch the man, catch him, and let me
know when you have done it."
"I cannot waste time over this sort of
fantastic talk, Mr. Holmes. If you can
catch the man, catch him, and let me
know when you have done it."
"I cannot waste time over this sort of
fantastic talk, Mr. Holmes. If you can
catch the man, catch him, and let me
know when you have done it."
"I cannot waste time over this sort of
fantastic talk, Mr. Holmes," he said. "If
you can catch the man, catch him, and
let me know when you have done it."
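
nltk's tokenizers handle these cases far better than a naive split; a sketch (the punkt model must be downloaded once first, and it generally treats "Mr." as an abbreviation rather than a sentence end):

import nltk
# nltk.download("punkt")  # one-time download of the tokenizer model

from nltk.tokenize import sent_tokenize, word_tokenize

text = ('I cannot waste time over this sort of fantastic talk, '
        'Mr. Holmes. If you can catch the man, catch him.')
print(sent_tokenize(text))  # "Mr." should not end the first sentence
print(word_tokenize(text))  # punctuation split into its own tokens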
Markov Models
Text Categorization
Inbox Spam
😀 🙁
"My grandson loved it! So much fun!"

"Product broke after a few days."

"One of the best games I've played in a


long time."

"Kind of cheap and flimsy, not worth it."


😀 "My grandson loved it! So much fun!"

🙁 "Product broke after a few days."

"One of the best games I've played in a


😀 long time."

🙁 "Kind of cheap and flimsy, not worth it."


😀 "My grandson loved it! So much fun!"

🙁 "Product broke after a few days."

"One of the best games I've played in a


😀 long time."

🙁 "Kind of cheap and flimsy, not worth it."


bag-of-words model
model that represents text as an unordered
collection of words
Naive Bayes
Bayes' Rule

P(b | a) = P(a | b) P(b) / P(a)
P(Positive), P(Negative)
written below as P(😀) and P(🙁)
"My grandson loved it!"
P(😀)
P(😀 | "my grandson loved it")
P(😀 | "my", "grandson", "loved", "it")
P(😀 | "my", "grandson", "loved", "it")
P(😀 | "my", "grandson", "loved", "it")

equal to

P("my", "grandson", "loved", "it" | 😀) P(😀)


P("my", "grandson", "loved", "it")
P(😀 | "my", "grandson", "loved", "it")

proportional to

P("my", "grandson", "loved", "it" | 😀) P(😀)


P(😀 | "my", "grandson", "loved", "it")

proportional to

P(😀, "my", "grandson", "loved", "it")


P(😀 | "my", "grandson", "loved", "it")

naively proportional to

P(😀)P("my" | 😀)P("grandson" | 😀)
P("loved" | 😀) P("it" | 😀)
P(😀) = number of positive samples / number of total samples

P("loved" | 😀) = number of positive samples with "loved"
                / number of positive samples
P(😀)P("my" | 😀)P("grandson" | 😀)
P("loved" | 😀) P("it" | 😀)

😀 🙁 😀 🙁
0.49 0.51 my 0.30 0.20

grandson 0.01 0.02

loved 0.32 0.08

it 0.30 0.40
P(😀)P("my" | 😀)P("grandson" | 😀)
P("loved" | 😀) P("it" | 😀)

😀 🙁 😀 🙁
0.49 0.51 my 0.30 0.20

grandson 0.01 0.02

loved 0.32 0.08

it 0.30 0.40
P(😀)P("my" | 😀)P("grandson" | 😀)
P("loved" | 😀) P("it" | 😀)

😀 🙁 😀 🙁
0.49 0.51 my 0.30 0.20

grandson 0.01 0.02


😀 0.00014112 loved 0.32 0.08

it 0.30 0.40
P(😀)P("my" | 😀)P("grandson" | 😀)
P("loved" | 😀) P("it" | 😀)

😀 🙁 😀 🙁
0.49 0.51 my 0.30 0.20

grandson 0.01 0.02


😀 0.00014112 loved 0.32 0.08

it 0.30 0.40
P(🙁)P("my" | 🙁)P("grandson" | 🙁)
P("loved" | 🙁) P("it" | 🙁)

😀 🙁 😀 🙁
0.49 0.51 my 0.30 0.20

grandson 0.01 0.02


😀 0.00014112 loved 0.32 0.08

it 0.30 0.40
P(🙁)P("my" | 🙁)P("grandson" | 🙁)
P("loved" | 🙁) P("it" | 🙁)

😀 🙁 😀 🙁
0.49 0.51 my 0.30 0.20

grandson 0.01 0.02


😀 0.00014112 loved 0.32 0.08

it 0.30 0.40
P(🙁)P("my" | 🙁)P("grandson" | 🙁)
P("loved" | 🙁) P("it" | 🙁)

😀 🙁 😀 🙁
0.49 0.51 my 0.30 0.20

grandson 0.01 0.02


😀 0.00014112 loved 0.32 0.08
🙁 0.00006528 it 0.30 0.40
P(🙁)P("my" | 🙁)P("grandson" | 🙁)
P("loved" | 🙁) P("it" | 🙁)

😀 🙁 😀 🙁
0.49 0.51 my 0.30 0.20

grandson 0.01 0.02


😀 0.00014112 loved 0.32 0.08
🙁 0.00006528 it 0.30 0.40
P(🙁)P("my" | 🙁)P("grandson" | 🙁)
P("loved" | 🙁) P("it" | 🙁)

😀 🙁 😀 🙁
0.49 0.51 my 0.30 0.20

grandson 0.01 0.02


😀 0.6837 loved 0.32 0.08
🙁 0.3163 it 0.30 0.40
P(🙁)P("my" | 🙁)P("grandson" | 🙁)
P("loved" | 🙁) P("it" | 🙁)

😀 🙁 😀 🙁
0.49 0.51 my 0.30 0.20

grandson 0.01 0.02

loved 0.32 0.08

it 0.30 0.40
P(🙁)P("my" | 🙁)P("grandson" | 🙁)
P("loved" | 🙁) P("it" | 🙁)

😀 🙁 😀 🙁
0.49 0.51 my 0.30 0.20

grandson 0.00 0.02

loved 0.32 0.08

it 0.30 0.40
P(🙁)P("my" | 🙁)P("grandson" | 🙁)
P("loved" | 🙁) P("it" | 🙁)

😀 🙁 😀 🙁
0.49 0.51 my 0.30 0.20

grandson 0.00 0.02


😀 0.00000000 loved 0.32 0.08
🙁 0.00006528 it 0.30 0.40
additive smoothing
adding a value α to each value in our
distribution to smooth the data
Laplace smoothing
adding 1 to each value in our distribution:
pretending we've seen each value one more
time than we actually have
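
Reproducing the computation above in Python (the probability tables are the slide's numbers; Laplace smoothing would add 1 to each count before these probabilities were estimated):

import math

p_class = {"😀": 0.49, "🙁": 0.51}
p_word = {
    "😀": {"my": 0.30, "grandson": 0.01, "loved": 0.32, "it": 0.30},
    "🙁": {"my": 0.20, "grandson": 0.02, "loved": 0.08, "it": 0.40},
}
words = ["my", "grandson", "loved", "it"]

scores = {c: p_class[c] * math.prod(p_word[c][w] for w in words)
          for c in p_class}
total = sum(scores.values())
for c in scores:
    print(c, scores[c], scores[c] / total)
# 😀 0.00014112 0.6837...   🙁 0.00006528 0.3163...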
information retrieval
the task of finding relevant documents in
response to a user query
topic modeling
models for discovering the topics for a set
of documents
term frequency
number of times a term appears in a
document
function words
words that have little meaning on their own,
but are used to grammatically connect
other words
function words
am, by, do, is, which, with, yet, ...
content words
words that carry meaning independently
content words
algorithm, category, computer, ...
inverse document frequency
measure of how common or rare a word is
across documents
inverse document frequency

idf(word) = log( TotalDocuments / NumDocumentsContaining(word) )
tf-idf
ranking of what words are important in a
document by multiplying term frequency
(TF) by inverse document frequency (IDF)
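
A minimal tf-idf computation over a toy corpus (the three documents are invented for the sketch):

import math

corpus = {
    "doc1": ["artificial", "intelligence", "with", "python"],
    "doc2": ["machine", "learning", "with", "python"],
    "doc3": ["natural", "language", "processing"],
}

def tf(word, doc):
    return corpus[doc].count(word)  # term frequency

def idf(word):
    containing = sum(word in words for words in corpus.values())
    return math.log(len(corpus) / containing)

print(tf("python", "doc1") * idf("python"))  # tf-idf of "python" in doc1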
Semantics
information extraction
the task of extracting knowledge from
documents
"When Facebook was founded in 2004, it began with a seemingly
innocuous mission: to connect friends. Some seven years and 800
million users later, the social network has taken over most aspects of
our personal and professional lives, and is fast becoming the dominant
communication platform of the future."
Harvard Business Review, 2011

"Remember, back when Amazon was founded in 1994, most people


thought his idea to sell books over this thing called the internet was
crazy. A lot of people had never even hard of the internet."
Business Insider, 2018
"When Facebook was founded in 2004, it began with a seemingly
innocuous mission: to connect friends. Some seven years and 800
million users later, the social network has taken over most aspects of
our personal and professional lives, and is fast becoming the dominant
communication platform of the future."
Harvard Business Review, 2011

"Remember, back when Amazon was founded in 1994, most people


thought his idea to sell books over this thing called the internet was
crazy. A lot of people had never even hard of the internet."
Business Insider, 2018
"When Facebook was founded in 2004, it began with a seemingly
innocuous mission: to connect friends. Some seven years and 800
million users later, the social network has taken over most aspects of
our personal and professional lives, and is fast becoming the dominant
communication platform of the future."
Harvard Business Review, 2011

"Remember, back when Amazon was founded in 1994, most people


thought his idea to sell books over this thing called the internet was
crazy. A lot of people had never even hard of the internet."
Business Insider, 2018
When {company} was founded in {year},
WordNet
Word Representation
"He wrote a book."

he [1, 0, 0, 0]
wrote [0, 1, 0, 0]
a [0, 0, 1, 0]
book [0, 0, 0, 1]
one-hot representation
representation of meaning as a vector with
a single 1, and with other values as 0
"He wrote a book."

he [1, 0, 0, 0]
wrote [0, 1, 0, 0]
a [0, 0, 1, 0]
book [0, 0, 0, 1]
"He wrote a book."

he [1, 0, 0, 0, 0, 0, 0, 0, ...]
wrote [0, 1, 0, 0, 0, 0, 0, ...]
a [0, 0, 1, 0, 0, 0, 0, 0, ...]
book [0, 0, 0, 1, 0, 0, 0, ...]
"He wrote a book."
"He authored a novel."
wrote [0, 1, 0, 0, 0, 0, 0, 0, 0]
authored [0, 0, 0, 0, 1, 0, 0, 0, 0]

book [0, 0, 0, 0, 0, 0, 1, 0, 0]
novel [0, 0, 0, 0, 0, 0, 0, 0, 1]
distributed representation
representation of meaning distributed
across multiple values
"He wrote a book."

he [-0.34, -0.08, 0.02, -0.18, 0.22, ...]

wrote [-0.27, 0.40, 0.00, -0.65, -0.15, ...]

a [-0.12, -0.25, 0.29, -0.09, 0.40, ...]

book [-0.23, -0.16, -0.05, -0.57, ...]


"You shall know a word
by the company it keeps."
J. R. Firth, 1957
for ____ he ate

for breakfast he ate
for lunch he ate
for dinner he ate

Words that appear in similar contexts, like breakfast, lunch, and
dinner here, should end up with similar vectors.
word2vec
model for generating word vectors
skip-gram architecture
neural network architecture for predicting
context words given a target word
[diagram: a network that, given a target word, predicts a context word]

[word-vector space: book, memoir, and novel cluster together, while
breakfast, lunch, and dinner form a separate cluster]
[vector arithmetic: the difference king - man captures the
relationship between the two words; adding it to woman lands near
queen, so king - man ≈ queen - woman]
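
Given any mapping from words to learned vectors (e.g. from word2vec), the analogy is vector arithmetic plus a nearest-vector lookup. A sketch; the vectors dict is assumed to exist:

import numpy as np

def cosine(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def closest_word(target, vectors):
    # the word whose vector points most nearly in target's direction
    return max(vectors, key=lambda w: cosine(vectors[w], target))

# with vectors: a dict mapping words to numpy arrays,
# closest_word(vectors["king"] - vectors["man"] + vectors["woman"], vectors)
# should come out near "queen"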
Language
Artificial Intelligence

• Search
• Knowledge
• Uncertainty
• Optimization
• Learning
• Neural Networks
• Language
Introduction to
Artificial Intelligence
with Python
