
Big Data

Big Data refers to vast amounts of data that are challenging to process with traditional tools, characterized by the 5 V's: Volume, Velocity, Variety, Veracity, and Value. Two scaling approaches are vertical scaling (expanding a single machine) and horizontal scaling (using multiple machines). Technologies for Big Data include distributed file systems, NoSQL databases, large-scale machine learning, and processing frameworks such as Hadoop and Spark.

Uploaded by Jaith Vindinu
Copyright © All Rights Reserved

Big Data

Data in amounts measured in petabytes or exabytes, which is difficult to process using current
database management tools or traditional data processing applications.

The 5 main V’s of Big Data (illustrated with a large farm monitored by IoT sensors and cameras)

Volume

• Tens of thousands of IoT sensors and thousands of cameras are placed around a
massive farm, and this number is not expected to change soon. The audio and video streams
will be of high quality.

Velocity

• It is anticipated that these devices will capture and transmit millions of data points
per second.

Variety

• IoT sensors will capture environmental conditions such as temperature, humidity,
and light level, while cameras will capture audio and video data.

Veracity

• The sensors and cameras cannot be verified or authenticated in real time because the
overhead is too high, so the trustworthiness of the incoming data cannot be guaranteed.

Value

• The system will use AI to analyze this data and make real-time decisions about the
operation of the automated fans.
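To make the Volume and Velocity points concrete, here is a back-of-envelope estimate in Python. The rate of 1,000,000 data points per second (a lower bound for "millions") and the 100 bytes per data point are hypothetical assumptions, not figures from the scenario, and the estimate ignores the audio and video streams entirely.

```python
# Back-of-envelope estimate of raw sensor data volume.
# All numbers below are hypothetical assumptions, not figures from the scenario.

POINTS_PER_SECOND = 1_000_000   # assumed lower bound for "millions of data points per second"
BYTES_PER_POINT = 100           # assumed: timestamp + sensor ID + reading + metadata
SECONDS_PER_DAY = 24 * 60 * 60


def daily_volume_bytes(points_per_second, bytes_per_point):
    """Raw bytes produced per day at a constant ingest rate."""
    return points_per_second * bytes_per_point * SECONDS_PER_DAY


def to_terabytes(num_bytes):
    """Convert bytes to decimal terabytes (1 TB = 10**12 bytes)."""
    return num_bytes / 10**12


per_day = daily_volume_bytes(POINTS_PER_SECOND, BYTES_PER_POINT)
per_year = per_day * 365

print(f"Per day:  {to_terabytes(per_day):.2f} TB")          # Per day:  8.64 TB
print(f"Per year: {to_terabytes(per_year) / 1000:.2f} PB")  # Per year: 3.15 PB
```

Under these assumptions the sensor readings alone reach petabyte scale within a year, before any audio or video is counted.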

Two approaches to scaling Big Data systems

Vertical scaling – Enlarge a single machine (limited in how far it can grow, and expensive)

Horizontal scaling – Use many commodity machines to form computer clusters or grids.
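Horizontal scaling works by partitioning the data so that each machine processes its own share and only small partial results are combined. The sketch below simulates this on one computer: each thread stands in for a cluster node, and the function names (`partition`, `node_task`, `cluster_average`) are illustrative, not part of any framework.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(data, num_nodes):
    """Split data into num_nodes contiguous chunks of nearly equal size."""
    size, remainder = divmod(len(data), num_nodes)
    chunks, start = [], 0
    for i in range(num_nodes):
        end = start + size + (1 if i < remainder else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def node_task(chunk):
    """Work done independently on one simulated node: partial sum and count."""
    return sum(chunk), len(chunk)


def cluster_average(readings, num_nodes):
    """Scatter chunks to simulated nodes, then combine their partial results."""
    chunks = partition(readings, num_nodes)
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(node_task, chunks))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


print(cluster_average([20.0, 21.5, 19.5, 22.0, 23.0], num_nodes=2))  # 21.2
```

Scaling out here means raising `num_nodes` rather than buying a bigger machine; in a real cluster, frameworks such as Hadoop and Spark handle the distribution across separate computers.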

Features of Hadoop

• Storage unit: HDFS (Hadoop Distributed File System)
• Replication of data (redundancy)
• MapReduce (splits the data into parts, which improves load balancing and saves time)
• YARN (manages containers, provides fault tolerance, and acts as a cluster manager that can
handle Spark driver processes and executors)

Features of Spark

• Resilient Distributed Dataset, RDD (keeps data in RAM)
• Up to 100 times faster than Hadoop MapReduce for in-memory workloads, and more efficient
• Spark Core (distributes data processing across multiple computers while maintaining
efficiency and smoothness)
• Spark Streaming (processes real-time data)
• Spark SQL (runs queries directly on datasets)
• Spark ML (trains large-scale models on big data)
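The MapReduce model listed under Hadoop can be illustrated with the classic word-count example. This is a single-process Python sketch of the three phases (map, shuffle, reduce); it is not Hadoop's API, and in a real cluster each phase runs in parallel across many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(split):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the grouped values for each key into one result."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(splits):
    mapped = chain.from_iterable(map_phase(s) for s in splits)
    return reduce_phase(shuffle_phase(mapped))


splits = ["big data needs big clusters", "spark and hadoop process big data"]
result = word_count(splits)
print(result["big"], result["data"])  # 3 2
```

Each split could be mapped on a different machine, which is what lets MapReduce spread both the load and the time across a cluster.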

Weaknesses of Hadoop

• Relies on storing data on disk
• Data processing is slow
• Batch processing means waiting for each batch to complete, combining the results, and only
then producing the output
• Does not use RAM to hold data between processing steps

Weaknesses of Spark

• High memory consumption can lead to resource exhaustion
• Because Hadoop was introduced first, Spark is less mature than Hadoop
• When a small data process does not need memory, it is moved to disk, but Spark's disk
configuration is often insufficient for this because Spark focuses mainly on memory
• Insufficient disk usage can lead to inefficient use of resources
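The disk-versus-memory contrast above can be shown with a toy iterative job. In this simplified sketch, the disk-based version writes every intermediate result to a file and reads it back before the next step, while the in-memory version keeps intermediate results in a Python list. The halving `step` function and the JSON file format are arbitrary choices for illustration, not how either framework actually stores data.

```python
import json
import os
import tempfile


def step(values):
    """One iteration of a toy algorithm: halve every value."""
    return [v / 2 for v in values]


def iterate_on_disk(values, iterations, workdir):
    """Disk-based style: each iteration reads its input from disk and
    writes its output back to disk before the next iteration starts."""
    path = os.path.join(workdir, "stage.json")
    with open(path, "w") as f:
        json.dump(values, f)
    disk_writes = 1
    for _ in range(iterations):
        with open(path) as f:
            current = json.load(f)
        with open(path, "w") as f:
            json.dump(step(current), f)
        disk_writes += 1
    with open(path) as f:
        return json.load(f), disk_writes


def iterate_in_memory(values, iterations):
    """In-memory style: intermediate results stay in RAM between iterations."""
    current = values
    for _ in range(iterations):
        current = step(current)
    return current


with tempfile.TemporaryDirectory() as tmp:
    disk_result, writes = iterate_on_disk([8.0, 4.0], 3, tmp)
mem_result = iterate_in_memory([8.0, 4.0], 3)
print(disk_result, writes)  # [1.0, 0.5] 4
print(mem_result)           # [1.0, 0.5]
```

Both versions compute the same answer; the difference is the disk round trip on every iteration, which is the cost that in-memory processing avoids and also why it needs enough RAM to hold the data.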

Technologies for Big Data

Distributed File Systems – HDFS, Google File System (GFS)

Distributed/Parallel Programming (MapReduce Model)

NoSQL databases – MongoDB, Cassandra

Large-scale machine learning – Deep learning

Data warehouses / Data lakes
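Distributed file systems such as HDFS split files into blocks and store each block on several machines, which is the replication (redundancy) feature listed for Hadoop. The sketch below shows the idea with a tiny block size and simple round-robin placement. Real HDFS defaults to 128 MB blocks and a replication factor of 3, and uses a rack-aware placement policy rather than the hypothetical `place_replicas` function shown here.

```python
def split_into_blocks(data, block_size):
    """Cut a byte string into fixed-size blocks (the last block may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes using round-robin.
    Hypothetical simplification: HDFS itself uses rack-aware placement."""
    if replication > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    return {
        block_id: [nodes[(block_id + r) % len(nodes)] for r in range(replication)]
        for block_id in range(num_blocks)
    }


def readable_after_failure(placement, failed_nodes):
    """The file stays readable if every block has a replica on a live node."""
    return all(
        any(node not in failed_nodes for node in replicas)
        for replicas in placement.values()
    )


blocks = split_into_blocks(b"sensor readings from the farm", block_size=8)
nodes = ["node1", "node2", "node3", "node4"]
placement = place_replicas(len(blocks), nodes)
print(placement)
print(readable_after_failure(placement, {"node1", "node2"}))           # True
print(readable_after_failure(placement, {"node1", "node2", "node3"}))  # False
```

With three replicas on distinct nodes, any two node failures still leave a copy of every block, which is the redundancy that replication buys.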
