Introduction to
BIG DATA
(Tools & Trends)
Md. Anower Perves
Assistant Professor
Department of CSE
Southeast University.
www.anowerperves.com
Outline
What is Big Data?
Why Big Data?
Big Data Tools
Big Data
Big data is an umbrella term for datasets that cannot
reasonably be handled by traditional computers or tools
due to their volume, velocity, and variety. This term is
also typically applied to technologies and strategies to
work with this type of data.
What is Big Data?
● There is a lot of data available
○ E.g. the Internet of Things
● We have the computing power
● We have the technology
● The goal is the same
○ To know
○ To explain
○ To predict
● The challenge is the full data lifecycle
Data, the wealth of our time
"Data is a precious thing and will last longer than
the systems themselves."
- Tim Berners-Lee
● Access to data is becoming the ultimate
competitive advantage
○ E.g. Google+ vs. Facebook
○ This is why many organizations try hard to give
us free things and keep us always logged
in (e.g. Gmail, Facebook, search engine
toolbars)
3 V’s !!!
[Figure: the three V's of big data]
● Data Volume: MB, GB, TB, PB
● Data Velocity: Batch, Periodic, Near Real Time, Real Time
● Data Variety: Table, Database, Photo, Web, Audio, Social,
Video, Unstructured, Mobile
Byte : one grain of rice
Kilobyte : a cup of rice
Megabyte : 8 bags of rice
Gigabyte : 3 semi trucks
Terabyte : 2 container ships
Petabyte : blankets Manhattan
Exabyte : blankets the West Coast states
Zettabyte : fills the Pacific Ocean
Yottabyte : an Earth-size rice ball
Scale labels on the slides, in order of appearance:
Hobbyist, Desktop, Internet, Big Data, The Future?
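The scale above climbs by a factor of 1000 at each step, from byte to yottabyte. As a rough illustration, the sketch below expresses a raw byte count in the largest fitting unit. It assumes decimal prefixes (1 KB = 1000 bytes, not the binary 1024), and the names `UNITS` and `format_bytes` are made up for this example.

```python
# Decimal (SI-style) unit names, one per factor of 1000.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def format_bytes(num_bytes: int) -> str:
    """Express a non-negative byte count in the largest unit that keeps the value >= 1."""
    exponent = 0
    # Climb one unit at a time while the count still fills the next unit.
    while exponent < len(UNITS) - 1 and num_bytes >= 1000 ** (exponent + 1):
        exponent += 1
    return f"{num_bytes / 1000 ** exponent:.1f} {UNITS[exponent]}"


if __name__ == "__main__":
    for power in range(0, 25, 3):
        print(format_bytes(10 ** power))
```

Counts beyond the yottabyte range stay in YB, since the list stops there.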
Why is Big Data hard?
● How to store?
○ Assuming 1 TB of disk per computer, it takes
1000 computers to store 1 PB
● How to move?
○ Assuming a 10 Gb/s network at full line rate, it
takes about 13 minutes to copy 1 TB, or about
9 days to copy 1 PB
● How to search?
○ Assuming each record is 1 KB and one machine
can process 1000 records per second, it needs
about 278 CPU hours (11.6 days) to process 1 TB
and about 32 CPU years to process 1 PB
● How to process?
○ How to convert existing algorithms to work at
large scale
○ How to create new algorithms
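The estimates above are simple divisions, so they are easy to recompute under different assumptions. The following is a minimal sketch, not a benchmark. It assumes decimal units (1 TB = 10^12 bytes, 1 KB = 1000 bytes), an ideal network with no protocol overhead, and a single machine scanning records at a fixed rate. The helper names are hypothetical.

```python
TB = 10 ** 12  # bytes, decimal units (assumption)
PB = 10 ** 15
SECONDS_PER_DAY = 86_400
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY


def machines_needed(dataset_bytes: int, disk_bytes_per_machine: int) -> int:
    """Number of machines required to hold the dataset (ceiling division)."""
    return -(-dataset_bytes // disk_bytes_per_machine)


def copy_seconds(dataset_bytes: int, link_bits_per_second: int) -> float:
    """Ideal transfer time: 8 bits per byte, no protocol overhead (assumption)."""
    return dataset_bytes * 8 / link_bits_per_second


def scan_seconds(dataset_bytes: int, record_bytes: int, records_per_second: int) -> float:
    """Time for one machine to touch every record once."""
    return dataset_bytes / record_bytes / records_per_second


if __name__ == "__main__":
    print("machines for 1 PB at 1 TB each:", machines_needed(PB, TB))
    for gbps in (1, 10):
        link = gbps * 10 ** 9
        print(f"{gbps} Gb/s: 1 TB in {copy_seconds(TB, link) / 3600:.2f} h,",
              f"1 PB in {copy_seconds(PB, link) / SECONDS_PER_DAY:.1f} days")
    print(f"scan 1 TB: {scan_seconds(TB, 1000, 1000) / SECONDS_PER_DAY:.1f} CPU days")
    print(f"scan 1 PB: {scan_seconds(PB, 1000, 1000) / SECONDS_PER_YEAR:.1f} CPU years")
```

Under these assumptions a 10 Gb/s link moves 1 TB in 800 seconds and a 1 Gb/s link in 8000 seconds. Real transfers are usually slower than line rate, so these figures are best read as lower bounds.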
Big Data Tools