DATA ENGINEERING: LECTURE 1
Dr Usman Qamar
There is an inherent meaning in everything. “Signs for people who can
see.”
Prof Usman Qamar
• Joined NUST in 2011
• Tenured Professor of Data Sciences
• Director, Knowledge and Data Science Centre, Centre of Excellence, NUST
• HoD Computer & Software Engineering
Education
• Post-Doc, University of Manchester, UK, 2011
• PhD, University of Manchester, UK, 2008
• MPhil, University of Manchester, UK, 2005
• MS, University of Manchester Institute of Science and Technology (UMIST), UK,
2003
• B.E, National University of Sciences and Technology (NUST), Pakistan, 2001
Previous Experience
• Office for National Statistics, London, UK
• Vodafone, UK
Prof Usman Qamar
• Associate Editor: Information Sciences, Computers in
Biology & Medicine, Applied Soft Computing,
Engineering Applications of Artificial Intelligence,
ACM Transactions on Asian & Low-Resource
Language Processing, Applied Intelligence
• Editorial Board: Neural Computing & Applications |
Informatics in Medicine Unlocked
• Academic Editor: PLOS One
Prof Dr Usman Qamar
www.kdrc.live/usman
AGENDA
Course Introduction
Course Details
What is Data Engineering?
WHAT IS DATA SCIENCE?
Potential of AI
DATA SCIENCE
Big Data Everywhere!
BIG DATA
Data that is TOO LARGE & TOO COMPLEX for conventional data tools
to capture, store and analyze.
• Shares traded on US stock markets each day: 7 billion
• Data generated in one flight from NY to London: 10 terabytes
• Number of tweets per day on Twitter: 400 million
• Number of ‘Likes’ each day on Facebook: 3 billion
The 3V’s of Big Data
VOLUME VARIETY VELOCITY
90% OF THE WORLD’S DATA WAS GENERATED IN THE LAST TWO YEARS
DATA ENGINEERING
Data on its own is useless unless you can make sense of it!
Increasing Intelligence
KNOWLEDGE DISCOVERY (KDD) PROCESS
Data mining is the core of the knowledge discovery process.
Stages, from raw data to knowledge:
• Databases
• Data Cleaning and Data Integration
• Data Warehouse
• Selection
• Task-relevant Data
• Data Mining
• Pattern Evaluation
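The KDD stages named on this slide (cleaning, integration, selection, mining, pattern evaluation) can be sketched as a chain of small functions. This is only an illustrative toy under stated assumptions: the function names, the `shop_a`/`shop_b` records and the frequency-count "mining" step are all hypothetical stand-ins, not part of the course material.

```python
from collections import Counter

def clean(records):
    """Data cleaning: drop records that contain missing values."""
    return [r for r in records if all(v is not None for v in r.values())]

def integrate(*sources):
    """Data integration: combine several cleaned sources into one store."""
    return [r for source in sources for r in source]

def select(warehouse, fields):
    """Selection: keep only the task-relevant fields of each record."""
    return [{f: r[f] for f in fields} for r in warehouse]

def mine(data, field):
    """Data mining (toy stand-in): count how often each value occurs."""
    return Counter(r[field] for r in data)

def evaluate(patterns, min_count):
    """Pattern evaluation: keep only patterns that occur often enough."""
    return {value: n for value, n in patterns.items() if n >= min_count}

# Two hypothetical source databases (made-up records for illustration).
shop_a = [{"customer": "c1", "item": "milk"}, {"customer": "c2", "item": None}]
shop_b = [{"customer": "c3", "item": "milk"}, {"customer": "c4", "item": "bread"}]

warehouse = integrate(clean(shop_a), clean(shop_b))   # data warehouse
task_data = select(warehouse, ["item"])               # task-relevant data
knowledge = evaluate(mine(task_data, "item"), min_count=2)
print(knowledge)
```

Each function consumes the output of the previous stage, which mirrors the bottom-to-top flow of the KDD diagram; a real pipeline would replace the counting step with an actual mining algorithm.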
DATA MINING
• Data mining (knowledge discovery from data)
• Extraction of interesting (non-trivial, implicit, previously
unknown and potentially useful) patterns or knowledge
from huge amounts of data
• Alternative names
• Knowledge discovery (mining) in databases (KDD),
knowledge extraction, data/pattern analysis, data
archeology, data dredging, information harvesting,
business intelligence, etc.
Unsupervised Learning
Natural partitioning by plotting data
Aim is to group unlabelled data, assigning each point a cluster label
Examples (clustering):
k-means,
SOM (Self-Organizing Map),
Expectation Maximization (EM),
etc.
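As a rough illustration of the first clustering example, here is a minimal k-means sketch (Lloyd's algorithm) in plain Python. The function name `kmeans`, its `init` and `seed` parameters, and the sample points are assumptions made for this sketch; the course labs use MATLAB and RapidMiner, whose built-in implementations will differ.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, init=None, iterations=100, seed=0):
    """Cluster `points` (a list of tuples) into k groups.

    Returns (labels, centroids). `init` optionally fixes the starting
    centroids; otherwise k data points are drawn at random.
    """
    rng = random.Random(seed)
    centroids = list(init) if init is not None else rng.sample(points, k)
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # an empty cluster keeps its old centroid
                centroids[c] = tuple(sum(d) / len(members) for d in zip(*members))
    return labels, centroids

# Two visibly separate groups of hypothetical 2-D points.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
labels, centroids = kmeans(points, k=2, init=[(1.0, 1.0), (8.0, 8.0)])
print(labels)
```

No class labels are supplied to the algorithm; the labels it returns are just cluster indices. With random initialisation the result can depend on the starting centroids, which is why the usage example fixes `init`.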
COURSE DETAILS
• Course Description
• Basic principles, techniques, tools and
applications of Data Engineering.
• The science of Data Engineering as the automatic
extraction of patterns representing knowledge
stored in large databases, data warehouses, and
other massive information repositories.
• The overlap with areas such as machine learning
and pattern recognition.
• The concepts of data pre-processing, cluster
analysis, classification and prediction, frequent
pattern mining and data warehousing.
COURSE RESOURCES
• Recommended Textbook:
COURSE RESOURCES
• Reference books:
MARKS DISTRIBUTION
Component          Count   Marks
Quizzes            6       10
Midterm            1       30
Assignments/Labs   4       5
Term Paper         1       10
Final              1       45
Total                      100
LABS/ASSIGNMENTS
• MATLAB (2x)
• RapidMiner (2x)
TERM PAPER
• Each student will be required to write a term
paper.
• Some research project ideas will become evident during
the course, while others may be proposed by the
students.
• The aim is to write a research paper that may be
published at a conference at the end of the course.