Dr. Firoz Anwar
Source: https://www.edureka.co/blog/what-is-data-science/
Source: https://www.dataquest.io/blog/what-is-data-science/
Source: https://data-flair.training/blogs/data-science-applications/
§ Time Series Analysis
§ Method of analysing time series data to extract meaningful patterns and characteristics of the data
§ Time Series Forecasting
§ Use of a model to predict future values based on previously observed data
§ Transport projects
§ Sydney Metro Project
§ Analysing/Estimating number of passengers
§ Peak/Off-Peak hour flow
§ Number of services
§ Westconnex Project
§ Analysing/Estimating number of transports
§ Peak/Off-Peak hour flow
§ Stock Price Prediction
§ Estimating Stock Price index
§ How is it different from Traditional Regression Analysis?
§ It is time dependent, so the basic assumption of a linear regression model, that the observations are independent, does not hold in this case.
§ It has seasonality trends, i.e. variations specific to a particular time frame. For example, if you look at the sales of a woollen jacket over time, you will invariably find higher sales in winter seasons.
§ Trend
A general direction in which
something is developing or changing.
§ Seasonality
Predictable pattern that recurs or
repeats over regular intervals,
typically within a year or less.
§ Irregular fluctuation
§ Variations that occur due to sudden
causes and are unpredictable.
Source: https://towardsdatascience.com/
§ Typical Steps
§ Understanding the Data
§ Hypothesis
§ Feature Extraction
§ Exploratory Data Analysis (EDA)
§ Forecasting with Multiple Models
§ Naive approach, Moving average, Simple exponential smoothing, Holt's linear trend model, Autoregressive Integrated Moving Average (ARIMA), SARIMAX, etc.
§ Model Evaluation
§ Mean Squared Error (MSE), Root Mean Squared Error (RMSE), etc.
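The forecasting and evaluation steps listed above can be sketched with the two simplest models, the naive approach and a moving average, scored by RMSE. This is a minimal illustration written for Python 3 (the slides themselves use Python 2 syntax). The passenger counts, the function names and the window size of 3 are assumptions made up for the example.

```python
import math

def naive_forecast(history):
    """Naive approach: the next value equals the last observed value."""
    return history[-1]

def moving_average_forecast(history, window=3):
    """Moving average: the next value is the mean of the last `window` values."""
    recent = history[-window:]
    return sum(recent) / len(recent)

def rmse(actual, predicted):
    """Root Mean Squared Error between two equal-length sequences."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def one_step_forecasts(series, forecaster, start):
    """Forecast series[t] from series[:t] for every t >= start."""
    return [forecaster(series[:t]) for t in range(start, len(series))]

# Hypothetical hourly passenger counts, invented for illustration only
passengers = [120, 130, 125, 140, 150, 145, 160, 170]
start = 3
actual = passengers[start:]
naive_pred = one_step_forecasts(passengers, naive_forecast, start)
ma_pred = one_step_forecasts(passengers, moving_average_forecast, start)

print("naive RMSE:", round(rmse(actual, naive_pred), 2))
print("moving average RMSE:", round(rmse(actual, ma_pred), 2))
```

The model with the lower RMSE on held-out observations would be preferred; the same comparison extends to the other models in the list.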
“Around 25-50 billion devices are
expected to be connected to the
Internet by 2020.” (Mahdavinejad et
al. 2017)
§ Network of Connected Devices
§ Interact with the environment
§ Data accessed
§ Configured/Manipulated
Source: https://www.channelfutures.com/
§ HTTPS/JSON
§ Plain text
§ Binary data
§ XML
§ Proprietary
§ Stream Data
§ Periodical data collection from device
§ API endpoints
§ Window-based descriptive statistics
§ Seasonal pattern
§ Trend pattern
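Window-based descriptive statistics on a stream of device readings can be sketched as follows. This is a minimal Python 3 illustration, not a method taken from the source. The readings, the function name `window_stats` and the window size are hypothetical.

```python
from collections import deque
from statistics import mean, pstdev

def window_stats(readings, window=3):
    """Yield (mean, population std) over a sliding window of the latest readings."""
    buf = deque(maxlen=window)      # old readings drop out automatically
    for value in readings:
        buf.append(value)
        if len(buf) == window:      # wait until the window is full
            yield mean(buf), pstdev(buf)

# Hypothetical temperature readings collected periodically from a device
temps = [20.0, 21.0, 22.0, 26.0, 21.0]
stats = list(window_stats(temps, window=3))
for m, s in stats:
    print(round(m, 2), round(s, 2))
```

A jump in the windowed mean or standard deviation is one simple signal that the stream has changed behaviour.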
Source: https://www.i-scoop.eu/internet-of-things-guide/
CGM (Continuous Glucose
Monitoring)
§ “Glucose Concentration can be Predicted Ahead in Time From Continuous Glucose Monitoring Sensor Time-Series” by Sparacino et al.
§ Parameter estimation
§ Weighted Linear Regression on
sampling window
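A weighted linear regression over a sampling window can be sketched as below. This is only an illustration in Python 3: the slide does not state the weighting scheme, so the exponentially decaying weights, the `forgetting` parameter, the function names and the sample glucose values are all assumptions.

```python
def weighted_linear_fit(times, values, forgetting=0.8):
    """Fit value = intercept + slope * time by weighted least squares.

    Assumption: sample i gets weight forgetting ** (n - 1 - i), so the
    newest sample has weight 1 and older samples count less.
    """
    n = len(times)
    weights = [forgetting ** (n - 1 - i) for i in range(n)]
    sw = sum(weights)
    st = sum(w * t for w, t in zip(weights, times))
    sy = sum(w * y for w, y in zip(weights, values))
    stt = sum(w * t * t for w, t in zip(weights, times))
    sty = sum(w * t * y for w, t, y in zip(weights, times, values))
    slope = (sw * sty - st * sy) / (sw * stt - st * st)
    intercept = (sy - slope * st) / sw
    return intercept, slope

def predict_ahead(times, values, horizon, forgetting=0.8):
    """Extrapolate the fitted line `horizon` time units past the last sample."""
    intercept, slope = weighted_linear_fit(times, values, forgetting)
    return intercept + slope * (times[-1] + horizon)

# Hypothetical readings: minutes since start, glucose in mg/dL
minutes = [0, 5, 10, 15]
glucose = [100.0, 105.0, 110.0, 115.0]
print(predict_ahead(minutes, glucose, horizon=30))
```

Refitting the line each time a new sample arrives turns this into a simple ahead-of-time predictor over the most recent window.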
Source: “Hands-On Artificial Intelligence for IoT” by Amita Kapoor
Source: Mahdavinejad, M.S. et al. “Machine learning for Internet of Things data analysis: A survey”. Digit. Commun. Netw. 2017
Source: https://cuppa.uic.edu/academics/upp/upp-programs/certificates/gsav-certificate/
House Price Prediction
§ Regression Model without Geo-data
§ Regression Model with Geo-data
Source: https://towardsdatascience.com/
§ Data Analysis is the heart of Data Science
§ Combination of various analytics/advanced analytics shows the bigger picture.
Introduction to Python Programming
Why Python Programming
§ Why Programming?
§ Programming is a tool to realise your data analysis ideas
§ Data Science relies on programming heavily (why?)
§ Why Python Programming?
§ Interpreted Programming Language
§ Can run interactively (natively interactive including terminal and IPython)
§ Other fancy stuff: Jupyter Notebook, python markdown, ...
§ Easy and flexible syntax
§ Powerful third-party package support
§ Convenient interface with other languages such as C/C++
The classic Hello, World! program
Python version
print "Hello, world!"
Java version
public class HelloWorldApp {
    public static void main(String[] args) {
        System.out.println("Hello, World!");
    }
}
C++ version
#include <iostream>
int main(){
    std::cout<<"Hello, World!"<<std::endl;
    return 0;
}
Basics of Python Programming
Output and input
The print command
The print command is used in Python to display messages
§ print "Hello, World" ✓
§ print 'Hello, World' ✓
§ print Hello, World ✗ Must enclose message with quotation marks
§ print "Hello, World ✗ Quotation marks do not match
§ Print "Hello, World" ✗ print should be lower case
Display on Multiple Lines
print "Hello, World"
print "I love programming"
print "I love Python"
## Hello, World
## I love programming
## I love Python
or
print "Hello, World\nI love programming\nI love Python"
## Hello, World
## I love programming
## I love Python
\n to start a new line
Display Special Characters
print "\"Hello, World\""
print "\'Hello, World\'"
## "Hello, World"
## 'Hello, World'
Display Variable Values
s = "Hello"
t = "World"
print s, t
## Hello World
Keyboard input method
Read a string
str = raw_input("Enter a string: ")
Read an integer
x = input("Enter an integer: ")
# read a string and convert it to int
x = raw_input("Enter an integer: ")
x = int(x)
Read a fractional number
x = input("Enter a number: ")
# read a string and convert it to float
x = raw_input("Enter a number: ")
x = float(x)
Variables, data types and operators
Variables
§ A variable is a name for data stored ‘in’ the program
§ A variable is automatically created by assignment
§ No need to declare variables (unlike C++ or Java)
variable = expression
§ Variable naming rules
§ Cannot use keywords (if, else, while, for, ...)
§ Must be one word and cannot contain spaces
§ First character must be a letter (upper or lower case) or ’_’
§ Cannot contain any characters other than letters, numbers and ’_’
§ Case sensitive: student and Student are distinct variables
Data types
§ A variable can be used to hold different types of data
§ Common data types used in Python
§ Whole Numbers (Integer): -5, 1, 3
§ Fractional Numbers (Float): 1.5, -0.8
§ Strings: "hello", "Room", "PYTHON"
§ Ordered Data: call by index
§ Lists: ['Australia', 'China', 'USA']
§ A list can contain mixed types: ['Australia', 'China', 2, 5.8]
§ Tuples: ('Australia', 'China', 'USA')
§ Tuples are immutable (can't change once created)
§ Unordered Data: call by key
§ Dictionaries: {'name': 'Jackson', 'Title': 'Dr', 'Age': 30}
§ Basically a collection of key-value pairs
§ Complex Data types, e.g. lists of lists
Examples of variable assignment
§ room=234 ✓ Assign an integer 234 to the variable named room.
§ room='234' ✓ Assign a string '234' to the variable named room.
§ room=[234,123] ✓ Assign a list to the variable named room; the list contains two integers: 234 and 123.
§ 234=room ✗ A variable can only be placed on the left side of the = operator.
§ 2room=234 ✗ Illegal variable name
Operators
§ Arithmetic Operators (+, -, *, /, %, **)
§ Order: ** > *, /, % > +, -; use () to change order, or always use ()!
§ Integer division vs float division
a=5/2 # a=2
b=5.0/2 # b=2.5
§ Relational Operators (>, >=, <, <=, ==, !=)
§ Used for variable comparison, e.g. numbers and strings
§ == (equality) vs = (assignment)
if a==b:
    print "a equals b"
§ Logical Operators (and, or, not)
§ Order: not > and > or; use ()
§ Arithmetic > Relational > Logical; use ()
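The precedence and division rules above can be checked with a few short expressions. The sketch is written for Python 3, where `/` always performs float division and `//` performs integer (floor) division; in the Python 2 used on the slides, `5/2` gives 2. The variable names are chosen only for this example.

```python
# Division in Python 3
true_div = 5 / 2         # 2.5: "/" is float division
floor_div = 5 // 2       # 2: "//" is integer (floor) division
remainder = 5 % 2        # 1
power = 2 ** 3           # 8

# ** binds tighter than *, which binds tighter than +
no_parens = 2 + 3 * 2 ** 2          # 2 + (3 * (2 ** 2)) = 14
with_parens = ((2 + 3) * 2) ** 2    # 100

# not binds tighter than and, which binds tighter than or
mixed = True or False and not True  # True or (False and (not True)) = True

print(true_div, floor_div, remainder, power)
print(no_parens, with_parens, mixed)
```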
Program structures
Control structures
Sequential
Example: Celsius to Fahrenheit converter
C = input("Enter a Celsius value: ")
F = 9.0/5*C+32
print C,"Celsius =",F,"Fahrenheit"
Conditional
Example: grade calculator I
mark = input("Enter your mark: ")
if mark<50:
    print "Fail"
else:
    print "Pass"
Example: grade calculator II
mark = input("Enter your mark: ")
if mark<50:
    print "Fail"
else:
    if mark<65:
        print "Pass"
    else:
        if mark<75:
            print "Credit"
        else:
            if mark<85:
                print "D"
            else:
                print "HD"
Example: grade calculator II (using elif statement)
mark = input("Enter your mark: ")
if mark<50:
    print "Fail"
elif mark<65:
    print "Pass"
elif mark<75:
    print "Credit"
elif mark<85:
    print "D"
else:
    print "HD"
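The same elif chain can be wrapped in a function so the grade boundaries are easy to check. This is a small sketch in Python 3 syntax; the function name `grade` is an assumption made for the example.

```python
def grade(mark):
    """Map a numeric mark to a grade using the boundaries from the slide."""
    if mark < 50:
        return "Fail"
    elif mark < 65:
        return "Pass"
    elif mark < 75:
        return "Credit"
    elif mark < 85:
        return "D"
    else:
        return "HD"

# Each boundary value belongs to the higher grade because the tests use "<"
for mark in [49, 50, 65, 75, 85]:
    print(mark, grade(mark))
```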
More about conditional structure
§ More complex conditions can be described by using logical operators (and, or, not)
if age>65 and income<10000:
    print "qualify for pensioner discount"
§ Correct indentation is important in if and all other Python code blocks!
✗ Body of if not indented
if mark<50:
print "Fail"
✓ Correct indentation
if mark<50:
    print "Fail"
else:
    print "Pass"
✗ else not aligned with if
if mark<50:
    print "Fail"
    else:
    print "Pass"
Don’t omit the ‘:’ at the end of an if statement
Iterative (loops)
Example: input validation
str = raw_input("Enter an 8-character string")
# len() returns the size of string,
# list, tuple, ...
while len(str)!=8:
    print "Input error"
    str = raw_input("Enter an 8-character string")
Example: range-based loop
Loops can be used to go through all items in a list
# print all items in the list
xlist = [1, 3, 5, 7, 9]
for x in xlist:
    print x,
print
## 1 3 5 7 9
# sum over all items in the list
xlist = [1, 3, 5, 7, 9]
sum = 0
for x in xlist:
    sum = sum + x
print "sum =", sum
## sum = 25
How about searching for a number in the list?
Or finding the maximum/minimum value?
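One possible answer to the two questions above, written for Python 3, is a loop that scans the list once. The function names `find_index` and `find_max` are chosen for this sketch; the built-ins `max()` and `min()` give the same results for the second question.

```python
def find_index(items, target):
    """Linear search: return the first index of target, or -1 if absent."""
    for i, item in enumerate(items):
        if item == target:
            return i
    return -1

def find_max(items):
    """Return the largest item by scanning the list once."""
    largest = items[0]
    for item in items[1:]:
        if item > largest:
            largest = item
    return largest

xlist = [1, 3, 5, 7, 9]
print(find_index(xlist, 7))     # 3
print(find_index(xlist, 4))     # -1
print(find_max(xlist))          # 9
print(max(xlist), min(xlist))   # 9 1
```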
Specifying range
range(stop)
range(start, stop[, step])
§ return a list of numbers (integers only!)
§ start: begin value (inclusive) of the list, 0 if omitted
§ stop: end value (exclusive) of the list
§ step (optional): interval between two values, 1 by default
§ similar to the Matlab ‘:’ operator (start:step:stop), except that Matlab includes the stop value
Examples:
# x=[0,1,2,...,9]
x = range(10)
x = range(0,10)
# x=[1,3,5,7,9]
x = range(1,10,2)
# x=[10,9,...1]
x = range(10, 0, -1)
# how about [10,9,...,0]?
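Because the stop value is exclusive, counting down to 0 needs a stop of -1. The sketch below is written for Python 3, where `range()` produces values lazily and `list()` is needed to display them; in the Python 2 used on the slides, `range()` returns the list directly.

```python
# Count down from 10 to 0 inclusive: stop must be one step past 0
countdown = list(range(10, -1, -1))
print(countdown)   # [10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

# The other examples, for comparison
first_ten = list(range(10))       # [0, 1, ..., 9]
odds = list(range(1, 10, 2))      # [1, 3, 5, 7, 9]
print(first_ten, odds)
```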
Get help
1. Use help() function: help(cumsum)
2. Use ?cmd or cmd? (in IPython): ?sum (type q to quit the help page)
3. Use cmd without brackets: range (with very limited text)
4. Use internet: most comprehensive way of getting help
Strings
String operations
# Basic manipulations
# create a string
s = "Hello world"
# Length of string
len(s) # = 11
# Type conversion
# string to number
s = "123.0"
s = float(s) # s = 123.0
s = "123"
s = int(s) # s = 123
# number to string
s = str(123) # s = "123"
# Type checks
s.isalpha() # letters
s.isdigit() # digits
# letters + digits
s.isalnum()
s.isspace() # spaces
s.isupper() # upper-case
s.islower() # lower-case
# Case conversion
s = "hEllo"
s = s.upper() # "HELLO"
s = s.lower() # "hello"
s = s.capitalize() # "Hello"
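The type checks and conversions above combine naturally when validating text before converting it. The following Python 3 sketch uses a hypothetical helper named `parse_mark`; it accepts only non-negative whole numbers, which is a simplification chosen for the example.

```python
def parse_mark(text):
    """Return the mark as an int if text holds only digits, else None."""
    text = text.strip()        # remove surrounding spaces
    if text.isdigit():         # True only if every character is a digit
        return int(text)
    return None

print(parse_mark(" 75 "))      # 75
print(parse_mark("7x"))        # None
print("hEllo".upper(), "hEllo".lower(), "hEllo".capitalize())
```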
Functions
§ A function is a stored procedure that performs some task
§ How to use functions?
§ Function definition: define the function
§ Function call: use the defined function elsewhere
§ Two types of functions in Python
§ Built-in (system-defined) functions
§ print(), input(), raw_input(), float(), int(), len(), ...
§ Must treat built-in function names as reserved words
§ User-defined Functions
Elements in functions
Define and use a function
Function definition
def add(a, b):
    c = a + b
    return c
Use a function
a = input("Enter the first number: ")
b = input("Enter the second number: ")
c = add(a, b) # Function call
print "sum =", c
§ the keyword def indicates the start of a function definition
§ indentation is used to indicate the content of the function
§ the return statement returns a value to the caller
§ multiple values can be returned as a tuple, e.g. return (a,b)
Why use functions?
# x = [0,1,2,3,4,5,6,7,8,9]
xlist = range(10)
# the following code prints the sum of xlist
sum = 0
for x in xlist:
    sum = sum + x
print "sum =", sum
# x = [1,3,5,7,9]
xlist = range(1, 10, 2)
# the following code prints the sum of xlist
sum = 0
for x in xlist:
    sum = sum + x
print "sum =", sum
## sum = 45
## sum = 25
Code reuse
# define the sum function
def sumFunc(xlist):
    sum = 0
    for x in xlist:
        sum = sum + x
    print "sum =", sum
# x = [0,1,2,3,4,5,6,7,8,9]
xlist = range(10)
# the following code prints the sum of xlist
sumFunc(xlist)
# x = [1,3,5,7,9]
xlist = range(1, 10, 2)
# the following code prints the sum of xlist
sumFunc(xlist)
## sum = 45
## sum = 25
Summary
§ A function can be defined once and used everywhere throughout the program
§ Enhance code reusability and maintainability
§ Avoid changing one place without updating other parts
§ Improve readability and create better structured programs
§ Function design considerations
§ What does the function do?
§ What input does the function take? (input arguments)
§ What result should the function return?
§ No return value: e.g. print the result inside the function
§ Return value required: e.g. return the calculation result to the caller of the function
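The last design consideration, printing inside the function versus returning a value, can be illustrated with a variant of the sum example that returns its result. The sketch is in Python 3 syntax, and the names `sum_list` and `sum_and_count` are assumptions made for the example.

```python
def sum_list(values):
    """Return the sum of values; the caller decides what to do with it."""
    total = 0
    for v in values:
        total = total + v
    return total

def sum_and_count(values):
    """Return two results at once as a tuple."""
    return sum_list(values), len(values)

# The returned value can be printed, stored, or used in further calculations
print("sum =", sum_list(range(10)))
total, count = sum_and_count(range(1, 10, 2))
print("mean =", total / count)
```

Returning the value keeps the function reusable in places where printing is not wanted, such as computing the mean here.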
More on data types
List and tuples
List
- A list is a collection of values
food = ["chicken", "beef", "egg", "milk"]
- ‘[’ and ’]’ are used to define the list
- items are separated by ’,’s
- A list item can be any object, even another list
Lists behave like arrays in C++ and Java and follow similar indexing rules.
List operations
Create a list
x = ['hello', 'world'] # list of two words
x = [] # an empty list
x = range(5) # x=[0,1,2,3,4]
# x = [0,1,4,9,16]
x = [i**2 for i in range(5)]
Search a list
x = ['hello', 'world']
# return the position of "world" in the list
pos = x.index("world") # pos = 1
# raises a ValueError if item not found
pos = x.index("work") # pos undefined
Modify a list
Initial: x = [0,1,2] (each statement below is applied to this initial list)
1. add elements to the end
x = x + [3] # x = [0,1,2,3]
x.append(3) # x = [0,1,2,3]
x = x + [3,4,5] # x = [0,1,2,3,4,5]
x.extend([3,4,5]) # x = [0,1,2,3,4,5]
2. insert element in the middle
x.insert(1,5) # x = [0,5,1,2]
3. add elements in the front
x.insert(0,3) # x = [3,0,1,2]
x = [3] + x # x = [3,0,1,2]
x = [3,4] + x # x = [3,4,0,1,2]
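The statements above are each shown against the initial list. Applied one after another they accumulate, as this Python 3 sketch shows. It also illustrates one difference between the two styles: `append`, `extend` and `insert` change the existing list, while `x + [...]` builds a new list. The variable names `y` and `alias` are chosen for the example.

```python
x = [0, 1, 2]
x.append(3)          # [0, 1, 2, 3]
x.extend([4, 5])     # [0, 1, 2, 3, 4, 5]
x.insert(1, 9)       # [0, 9, 1, 2, 3, 4, 5]
x.insert(0, 7)       # [7, 0, 9, 1, 2, 3, 4, 5]
print(x)

# "+" creates a new list, so another name bound to the old list is unaffected
y = [0, 1, 2]
alias = y
y = y + [3]
print(y, alias)      # [0, 1, 2, 3] [0, 1, 2]
```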
Iteration and List Comprehension
List items can be iterated in a loop
for x in range(5):
    print x
## 0
## 1
## 2
## 3
## 4
This is the same as
for x in [0,1,2,3,4]:
    print x
## 0
## 1
## 2
## 3
## 4
Return the list index in a loop
Z = ["Hello", "world", "Python"]
for i,x in enumerate(Z):
    print i, x
## 0 Hello
## 1 world
## 2 Python
List comprehension offers easy and natural ways to construct lists
squares = [x**2 for x in range(5)]
print squares
evens = [ x for x in range(10) if x % 2==0]
print evens
## [0, 1, 4, 9, 16]
## [0, 2, 4, 6, 8]
Tuples
§ Tuples are sequences that behave like lists
§ Unlike lists, tuples are immutable and can’t be changed
§ Tuples are defined by ‘(’ and ‘)’, lists use ‘[’ and ’]’
x = ('Jack', 'Smith', 'Lecturer', 'B', 1)
Note items can have different data types
§ Retrieve items of a tuple
§ x[0], x[1], ...: first, second, ... items of a tuple
§ Tuple Expansion:
(fName, lName, title, level, step) = x
equivalent to 5 separate assignments so that fName = 'Jack', lName = 'Smith', and so on.
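Tuple expansion and immutability can be tried out directly. The sketch below is written for Python 3; assigning to a tuple item raises a TypeError, which the example catches so that it runs to completion. The names `changed` and `as_list` are chosen for the example.

```python
x = ('Jack', 'Smith', 'Lecturer', 'B', 1)

# Tuple expansion: one assignment fills five variables
(fName, lName, title, level, step) = x
print(fName, lName, step)

# Tuples are immutable: item assignment raises TypeError
try:
    x[0] = 'Jill'
    changed = True
except TypeError:
    changed = False
print("tuple changed:", changed)

# A list built from the tuple is mutable
as_list = list(x)
as_list[0] = 'Jill'
print(as_list[0], x[0])
```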
Mutable vs Immutable Types
Dictionary
A collection is a group of values stored in a single variable
§ List, Tuple: collection of single values in order
x = ["Hello", "World", "Python"] # List
§ Dictionary: unordered collection of key-value pairs
Keys must be unique, and are case sensitive if they are strings
Create a dictionary
x = {"Hello":1, "World":2, "Python":2}
# the following defines the same dictionary
x = dict()
x["Hello"] = 1
x["World"] = 2
x["Python"] = 2
Modify a dictionary
# update the value for an existing key
x["Hello"] = 2
# add a new key-value pair
x["Programming"] = 5
# delete a key-value pair
del x["World"]
Dictionary example: counting words
# define a list of words
wordList = ["hello", "hello", "world", "python", "PYTHON", "Hello"]
wordDict = dict() # create an empty dictionary
for word in wordList:
    word = word.lower() # convert to lower case
    # add a new word to dictionary
    if word not in wordDict:
        wordDict[word] = 1
    else: # increment count for old word
        wordDict[word] = wordDict[word] + 1
print wordDict
## {'python': 2, 'world': 1, 'hello': 3}
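The same word count can be written more compactly with `dict.get()`, which supplies a default of 0 for words not yet seen, and checked against `collections.Counter` from the standard library. This is a sketch in Python 3 syntax; the function name `count_words` is an assumption made for the example.

```python
from collections import Counter

def count_words(words):
    """Count case-insensitive occurrences of each word."""
    counts = {}
    for word in words:
        key = word.lower()
        counts[key] = counts.get(key, 0) + 1   # 0 if key is new
    return counts

word_list = ["hello", "hello", "world", "python", "PYTHON", "Hello"]
counts = count_words(word_list)
print(counts)

# Counter produces the same counts in one line
same = Counter(w.lower() for w in word_list)
print(counts == same)
```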
Files
Open files
Files can be opened with the open() function
fid = open("mytext.txt")
open() returns a file identifier (stored in fid), a handle for further file operations
It raises an error (IOError) if the file does not exist
File reading
# read the entire file into the lines variable (as one string)
lines = fid.read()
# read the next line into line variable
line = fid.readline()
An open file must be closed by fid.close()
File read example
fid = open("mytext.txt")
# print each line of mytext.txt in a loop
for line in fid:
    print line
fid.close()
## First line
##
## Second line
##
## Three lines in total
Unpleasant extra line break: each line read from the file already ends with a newline, and print adds another. Use strip() to remove it:
fid = open("mytext.txt")
# print each line of mytext.txt in a loop
for line in fid:
    print line.strip()
fid.close()
## First line
## Second line
## Three lines in total
Write to files
Opening files for writing
fid = open("mydata.txt", "w")
§ creates a new file if mydata.txt does not exist
§ overwrites the old file if mydata.txt already exists
§ use "a" instead of "w" to append to mydata.txt instead of overwriting
Write to files
fid.write(line)
§ need to pay attention to "newlines"
§ print prints a new line automatically, write() does not
§ may have to use fid.write(line + '\n') in most cases
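Writing, appending and reading back can be combined in one short round trip. The sketch is written for Python 3 and uses the `with` statement, which closes the file automatically, in place of an explicit fid.close(). The temporary directory and the sample lines are assumptions made so that the example does not touch existing files.

```python
import os
import tempfile

lines = ["First line", "Second line", "Three lines in total"]
path = os.path.join(tempfile.mkdtemp(), "mydata.txt")

# "w" creates the file (or overwrites it); write() needs an explicit newline
with open(path, "w") as fid:
    for line in lines:
        fid.write(line + "\n")

# "a" appends to the end instead of overwriting
with open(path, "a") as fid:
    fid.write("Appended line\n")

# Read back, stripping the trailing newline from each line
with open(path) as fid:
    read_back = [line.strip() for line in fid]
print(read_back)
```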