### Day 1 of Foundational Training ###
11 Topics to learn in 20 Days:
1) Business Analytics
2) Real-world scenarios, case studies
3) SQL/Databases/Data Modelling
4) Cloud service providers
5) Data Warehouse, Data Lake and their architecture
6) ETL (Extract, Transform, Load) - Informatica, Data Quality
7) Data Engineering - Big Data, PySpark, Kafka
8) Cloud Engineering - AWS, Azure, GCP, Data Visualization
9) Power BI
10) Adv. Python Programming
11) GitHub repositories
Data Analyst: someone who collects, cleans, and interprets data in order to solve a
particular problem.
They can work in different industries such as business, finance, justice,
science, medicine, government, etc.
Data Analysis: getting insights from data. Steps include;
1. Identify
2. Collect
3. Clean
4. Analyze
5. Represent/Interpret
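The clean / analyze / represent steps above can be sketched in plain Python; the records and field names below are invented purely for illustration:

```python
# Toy sketch of the analysis steps: collect -> clean -> analyze -> represent.
raw_records = [
    {"region": "north", "sales": "100"},
    {"region": "north", "sales": None},   # incomplete row to be cleaned out
    {"region": "south", "sales": "250"},
]

# Clean: drop incomplete rows and convert sales to numbers
clean = [
    {"region": r["region"], "sales": int(r["sales"])}
    for r in raw_records
    if r["sales"] is not None
]

# Analyze: total sales per region
totals = {}
for r in clean:
    totals[r["region"]] = totals.get(r["region"], 0) + r["sales"]

# Represent: print a simple summary
for region, total in sorted(totals.items()):
    print(f"{region}: {total}")
```

Real projects would use a library like pandas for the cleaning and aggregation, but the steps are the same.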
Types of data analysts;
1. Business Analyst
2. Market Research Analyst
3. Medical and Healthcare Analyst
Requirements for becoming a data analyst;
1. Database technologies
2. Programming languages (Python - used mainly for the cleaning step)
3. Visualization
4. Statistical and mathematical methods
5. Industry knowledge
6. Problem solving
Business Analyst: someone who looks into business insights.
Day-to-day responsibilities include understanding the strategies, goals, and
requirements of the business; creating financial models to support business
decisions; data visualization/representation (charts, pies); identifying and
prioritizing requirements; and analyzing large data sets with Excel, Microsoft
Power BI, SQL, Tableau, Python (Jupyter notebooks), etc.
DATA ENGINEERING
This includes building systems for collecting, storing, and analyzing data.
Organizations have the ability to collect massive amounts of data,
and they need the right people and technology to ensure the data is in a highly
usable state by the time it reaches the data scientists and analysts.
The difference between a data scientist and a data analyst is that a scientist
thinks about the future probabilities/possibilities of the data.
A data engineer works in a variety of settings that collect, store, and
manipulate data, converting raw data into usable information.
They make data accessible so that organizations can use it to evaluate and
optimize their performance.
Tasks;
1. Acquiring Data
2. Develop Algorithms
3. Build, test, and maintain data pipeline architectures
4. Ensure compliance with data governance and security policies
Typical flow in analysis project;
Define goal --> get the data --> clean the data --> enrich the data --> find
insights and visualize --> deploy ML --> Iterate
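The flow above can be sketched as a chain of step functions, each taking the previous step's output. All data and function names here are hypothetical, and the ML-deployment step is omitted to keep it small:

```python
from functools import reduce

def get_data(_):
    # Stand-in for pulling data from a source
    return [3, None, 7, 5]

def clean_data(data):
    # Drop missing values
    return [x for x in data if x is not None]

def enrich_data(data):
    # Enrich: tag each value with whether it is above the mean
    mean = sum(data) / len(data)
    return [(x, x > mean) for x in data]

def find_insights(data):
    # Summarize into a small insight dict
    return {"count": len(data), "above_mean": sum(1 for _, hi in data if hi)}

# Run the steps in order, feeding each one's output into the next
steps = [get_data, clean_data, enrich_data, find_insights]
result = reduce(lambda data, step: step(data), steps, None)
print(result)
```

The "iterate" part of the flow is just re-running this chain after adjusting a step.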
BI tools - application software which collects and processes large amounts of
unstructured data from internal and external systems, e.g., Power BI
Data Warehousing - process for collecting and managing data from varied sources to
provide meaningful business insights.
Data Pipeline - essentially the steps involved in aggregating,
organizing, and moving data.
Modern data pipelines automate many of the manual steps involved in
transforming and optimizing continuous data loads.
They are important for organizations that;
- rely on real-time data analysis
- store data in the cloud
- house data in multiple sources
elements of a data pipeline;
1. source
2. processing steps
3. destination
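A minimal sketch of these three elements using Python generators; the records and transformations are made up for illustration:

```python
def source():
    """Source: yield raw records one at a time."""
    yield from [" Alice ", "BOB", "", "carol"]

def process(records):
    """Processing steps: strip whitespace, drop empties, normalize case."""
    for rec in records:
        cleaned = rec.strip()
        if cleaned:
            yield cleaned.lower()

def destination(records):
    """Destination: collect into a list (a stand-in for a warehouse/lake)."""
    return list(records)

result = destination(process(source()))
print(result)  # ['alice', 'bob', 'carol']
```

Because generators process one record at a time, the same shape scales to streaming pipelines where the source never ends.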
characteristics when considering data pipeline;
1. continuous and extensible data processing
2. high availability and disaster recovery
3. the elasticity and agility of the cloud
4. self-service management
SNOWFLAKE is one of the most popular cloud data warehousing/pipelining services.
COMPUTATION COST IN CLOUD;
PAY AS YOU GO
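A toy pay-as-you-go estimate; the rates below are invented for illustration and are not any provider's real pricing:

```python
# Hypothetical usage and rates - you pay only for what you use
hours_used = 120          # compute hours in the billing period
rate_per_hour = 0.05      # made-up $/hour
storage_gb = 200
rate_per_gb_month = 0.02  # made-up $/GB-month

compute_cost = hours_used * rate_per_hour
storage_cost = storage_gb * rate_per_gb_month
total = compute_cost + storage_cost
print(f"estimated bill: ${total:.2f}")
```

The point of the model is that shutting down idle compute directly reduces the bill, unlike fixed on-premises hardware.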
5 Vs of Big Data;
1. Variety
2. Velocity
3. Volume
4. Value
5. Veracity
### Day 2 of FT ###
login credentials for google rps cloud; 18
Car example specifics; class = Car; attributes = cubic capacity, no. of seats;
functions = FWD, rear-wheel drive, parking;
child classes (by model name) = Hyundai, Toyota
class Car:
    def __init__(self, capacity, number_of_seats):
        # __init__ should only set attributes; returning a value raises TypeError
        self.capacity = capacity
        self.number_of_seats = number_of_seats

    def fwd(self):
        self.drive_type = 'FWD'
        return 'the car is a front-wheel drive'

    def rwd(self):
        self.drive_type = 'RWD'
        return 'the car is a rear-wheel drive'

    def park(self, spot):
        # store the spot under a different name so the method is not shadowed
        self.parking_spot = spot

class ModelName(Car):
    def __init__(self, capacity, number_of_seats, name):
        super().__init__(capacity, number_of_seats)
        self.name = name  # e.g. 'Hyundai' or 'Toyota'

hyundai = ModelName(1200, 5, 'Hyundai')
print(hyundai.name, hyundai.fwd())
Big Data: characterized by the 3 Vs;
Volume
Velocity
Variety
HDFS - Hadoop Distributed File System
Topics for Assessment:
1. Data Engineering, - done
2. Advanced Python, - done
3. Big Data, - done
4. Hadoop, - done
5. Spark, - done
6. PySpark, - done
7. Cloud Technology, - done
8. GCP, - done
9. AWS, - done
10. Azure - done