Lec1 - Intro To Data Engg

This document provides an overview of a data engineering lecture. It introduces the lecturer, Prof. Usman Qamar, and his background. It then discusses what data science and data engineering are, including how big data is growing exponentially. It outlines the knowledge discovery process and how data mining fits within it. Finally, it provides details about the course, including recommended textbooks, labs/assignments, term paper requirements, and marks distribution.

Uploaded by

Awais Imdad

DATA ENGINEERING: LECTURE 1

Dr Usman Qamar

There is an inherent meaning in everything. “Signs for people who can see.”
Prof Usman Qamar
• Joined NUST in 2011
• Tenured Professor of Data Sciences
• Director, Knowledge and Data Science Centre, Centre of Excellence, NUST
• HoD Computer & Software Engineering

Education
• Post-Doc, University of Manchester, UK, 2011
• PhD, University of Manchester, UK, 2008
• MPhil, University of Manchester, UK, 2005
• MS, University of Manchester Institute of Science and Technology (UMIST), UK, 2003
• B.E, National University of Sciences and Technology (NUST), Pakistan, 2001

Previous Experience
• Office for National Statistics, London, UK
• Vodafone, UK
Prof Usman Qamar
• Associate Editor: Information Sciences, Computers in Biology & Medicine, Applied Soft Computing, Engineering Applications of Artificial Intelligence, ACM Transactions on Asian & Low-Resource Language Processing, Applied Intelligence
• Editorial Board: Neural Computing & Applications, Informatics in Medicine Unlocked
• Academic Editor: PLOS One


Prof Dr Usman Qamar
www.kdrc.live/usman

AGENDA

Course Introduction

Course Details

What is Data Engineering?


WHAT IS DATA SCIENCE?

Potential of AI
Big Data Everywhere!

BIG DATA: Data that is TOO LARGE & TOO COMPLEX for conventional data tools to capture, store and analyze.

• Shares traded on US stock markets each day: 7 Billion
• Data generated in one flight from NY to London: 10 Terabytes
• Number of tweets per day on Twitter: 400 Million
• Number of ‘Likes’ each day on Facebook: 3 Billion
The 3V’s of Big Data

VOLUME VARIETY VELOCITY


90% OF THE WORLD’S DATA WAS GENERATED IN THE LAST TWO YEARS

DATA ENGINEERING

Data on its own is useless unless you can make sense of it!

Increasing Intelligence
KNOWLEDGE DISCOVERY (KDD) PROCESS

Data mining: core of the knowledge discovery process

[Figure: KDD process, read from bottom to top]
• Databases
• Data Cleaning and Data Integration
• Data Warehouse
• Selection
• Task-relevant Data
• Data Mining
• Pattern Evaluation
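
The stages of the KDD process can be sketched in a few lines of Python. This is only an illustrative toy, not part of the lecture material: the `sales` and `customers` data, the function names, and the choice of simple frequency counting as the "data mining" step are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical toy sources standing in for the "Databases" stage.
sales = [
    {"customer": 1, "item": "milk"},
    {"customer": 1, "item": "bread"},
    {"customer": 2, "item": "milk"},
    {"customer": 3, "item": None},  # dirty record with a missing value
    {"customer": 2, "item": "bread"},
    {"customer": 4, "item": "eggs"},
]
customers = {1: "Lahore", 2: "Lahore", 3: "Karachi", 4: "Karachi"}


def clean(records):
    """Data cleaning: drop records that contain missing values."""
    return [r for r in records if all(v is not None for v in r.values())]


def integrate(records, lookup):
    """Data integration: attach the city from the second source."""
    return [
        {**r, "city": lookup[r["customer"]]}
        for r in records
        if r["customer"] in lookup
    ]


def select(records, fields):
    """Selection: keep only the task-relevant attributes."""
    return [tuple(r[f] for f in fields) for r in records]


def mine(rows):
    """Data mining (assumed here to be plain frequency counting)."""
    return Counter(rows)


def evaluate(patterns, min_support):
    """Pattern evaluation: keep patterns seen at least min_support times."""
    return {p: n for p, n in patterns.items() if n >= min_support}


warehouse = integrate(clean(sales), customers)
task_data = select(warehouse, ("city", "item"))
patterns = evaluate(mine(task_data), min_support=2)
print(patterns)
```

Each function corresponds to one stage of the process, so replacing `mine` with a real algorithm (clustering, classification, frequent pattern mining) leaves the rest of the pipeline unchanged.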
DATA MINING

• Data mining (knowledge discovery from data)
• Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) patterns or knowledge from huge amounts of data
• Alternative names
• Knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, business intelligence, etc.
Unsupervised Learning
Natural partitioning by plotting data
• Aim is to assign group (cluster) labels to data that has no predefined class labels
• Examples: Clustering
  • k-means
  • SOM (Self-Organizing Map)
  • Expectation Maximization (EM)
  • etc.
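
As an illustration of clustering, here is a minimal sketch of k-means (Lloyd's algorithm) in plain Python. It is not taken from the lecture: the function name `kmeans`, its `init`, `max_iter` and `seed` parameters, and the sample points are assumptions chosen for the example, and a real lab would more likely use MATLAB or RapidMiner.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to the given point."""
    return min(range(len(centroids)), key=lambda i: sq_dist(point, centroids[i]))


def kmeans(points, k, init=None, max_iter=100, seed=0):
    """Cluster points into k groups; returns (centroids, labels).

    init: optional list of k starting centroids. If omitted, k distinct
    points are sampled at random (an assumption made for this sketch).
    """
    if init is not None:
        centroids = [tuple(c) for c in init]
    else:
        centroids = random.Random(seed).sample(points, k)

    for _ in range(max_iter):
        # Assignment step: group each point with its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)

        # Update step: move each centroid to the mean of its members.
        # An empty cluster keeps its previous centroid.
        updated = [
            tuple(sum(dim) / len(members) for dim in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:  # converged: nothing moved
            break
        centroids = updated

    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


# Two visually separate groups of 2-D points (made-up data).
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1),
          (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
centroids, labels = kmeans(points, k=2, init=[points[0], points[3]])
print(labels)
```

The labels returned are arbitrary cluster indices, not class labels known in advance, which is what distinguishes this from supervised classification.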
COURSE DETAILS

• Course Description
• Basic principles, techniques, tools and applications of Data Engineering.
• Science of Data Engineering as the automatic extraction of patterns representing knowledge stored in large databases, data warehouses, and other massive information repositories.
• About the overlap that exists with areas such as machine learning and pattern recognition.
• The concepts of data pre-processing, cluster analysis, classification and prediction, frequent pattern mining and data warehousing.
COURSE RESOURCES
• Recommended Textbook:
• Reference books:
MARKS DISTRIBUTION

Component          Count   Marks
Quizzes            6       10
Midterm            1       30
Assignments/Labs   4       5
Term Paper         1       10
Final              1       45
Total                      100
LABS/ASSIGNMENTS

• MATLAB (2x)

• RapidMiner (2x)
TERM PAPER

• Each student will be required to develop a term paper.
• Some research project ideas will become evident during the course while others may be envisaged by the students.
• The aim will be to write a research paper that may be published in a conference at the end of the course.
