### Day 1 of Foundational Training ###
11 Topics to learn in 20 Days:
1) Business Analytics
2) Real-world scenarios, case studies
3) SQL/Databases/Data Modelling
4) Cloud service providers
5) Data Warehouse, Data Lake and their architecture
6) ETL (Extract, Transform, Load) - Informatica, Data Quality
7) Data Engineering - Big Data, PySpark, Kafka
8) Cloud Engineering - AWS, Azure, GCP, Data Visualization
9) Power BI
10) Adv. Python Programming
11) GitHub repositories
Data Analyst: someone who collects, cleans, and interprets data in order to solve a
particular problem.
They can work in different industries such as business, finance, justice,
science, medicine, government, etc.
Data Analysis: getting insights from data. Steps include;
1. Identify
2. Collect
3. Clean
4. Analyze
5. Represent/Interpret
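The clean / analyze / represent steps above can be sketched in plain Python; the records and field names below are invented purely for illustration:

```python
# Toy sketch of the analysis steps: collect -> clean -> analyze -> represent.
raw_records = [
    {"region": "north", "sales": "100"},
    {"region": "north", "sales": None},   # incomplete row to be cleaned out
    {"region": "south", "sales": "250"},
]

# Clean: drop incomplete rows and convert sales to numbers
clean = [
    {"region": r["region"], "sales": int(r["sales"])}
    for r in raw_records
    if r["sales"] is not None
]

# Analyze: total sales per region
totals = {}
for r in clean:
    totals[r["region"]] = totals.get(r["region"], 0) + r["sales"]

# Represent: print a simple summary
for region, total in sorted(totals.items()):
    print(f"{region}: {total}")
```

Real projects would use a library like pandas for the cleaning and aggregation, but the steps are the same.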
Types of data analysts;
1. Business Analyst
2. Market Research Analyst
3. Medical and Healthcare Analyst
Requirements for becoming a data analyst;
1. Database technologies
2. Programming languages (Python - used mainly for the cleaning step)
3. Visualization
4. Statistical and mathematical methods
5. Industry knowledge
6. Problem solving
Business Analyst: someone who looks into business insights.
Day-to-day responsibilities include understanding the strategies, goals, and
requirements of the business; creating financial models to support business
decisions; data visualization/representation (charts, pies); identifying and
prioritizing requirements; and analyzing large data sets with Excel, Microsoft
Power BI, SQL, Tableau, Python (Jupyter notebooks), etc.
DATA ENGINEERING
This includes building systems for collecting, storing, and analyzing data.
Organizations have the ability to collect massive amounts of data,
and they need the right people and technology to ensure the data is in a highly
usable state by the time it reaches the data scientists and analysts.
The difference between a data scientist and a data analyst is that a scientist
thinks about the future probabilities/possibilities of the data.
A data engineer works in a variety of settings that collect, store, and
manipulate data, converting raw data into usable information.
They make data accessible so that organizations can use it to evaluate and
optimize their performance.
Tasks;
1. Acquiring Data
2. Develop Algorithms
3. Build, test, and maintain data pipeline architectures
4. Ensure compliance with data governance and security policies
Typical flow in analysis project;
Define goal --> get the data --> clean the data --> enrich the data --> find
insights and visualize --> deploy ML --> Iterate
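The flow above can be sketched as a chain of step functions, each taking the previous step's output. All data and function names here are hypothetical, and the ML-deployment step is omitted to keep it small:

```python
from functools import reduce

def get_data(_):
    # Stand-in for pulling data from a source
    return [3, None, 7, 5]

def clean_data(data):
    # Drop missing values
    return [x for x in data if x is not None]

def enrich_data(data):
    # Enrich: tag each value with whether it is above the mean
    mean = sum(data) / len(data)
    return [(x, x > mean) for x in data]

def find_insights(data):
    # Summarize into a small insight dict
    return {"count": len(data), "above_mean": sum(1 for _, hi in data if hi)}

# Run the steps in order, feeding each one's output into the next
steps = [get_data, clean_data, enrich_data, find_insights]
result = reduce(lambda data, step: step(data), steps, None)
print(result)
```

The "iterate" part of the flow is just re-running this chain after adjusting a step.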
BI tools - application software which collects and processes large amounts of
unstructured data from internal and external systems, e.g., Power BI
Data Warehousing - process for collecting and managing data from varied sources to
provide meaningful business insights.
Data Pipeline - essentially the steps involved in aggregating,
organizing, and moving data.
Modern data pipelines automate many of the manual steps involved in
transforming and optimizing continuous data loads.
They are important for organizations that;
- rely on real-time data analysis
- store data in the cloud
- house data in multiple sources
elements of a data pipeline;
1. source
2. processing steps
3. destination
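A minimal sketch of these three elements using Python generators; the records and transformations are made up for illustration:

```python
def source():
    """Source: yield raw records one at a time."""
    yield from [" Alice ", "BOB", "", "carol"]

def process(records):
    """Processing steps: strip whitespace, drop empties, normalize case."""
    for rec in records:
        cleaned = rec.strip()
        if cleaned:
            yield cleaned.lower()

def destination(records):
    """Destination: collect into a list (a stand-in for a warehouse/lake)."""
    return list(records)

result = destination(process(source()))
print(result)  # ['alice', 'bob', 'carol']
```

Because generators process one record at a time, the same shape scales to streaming pipelines where the source never ends.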
characteristics when considering data pipeline;
1. continuous and extensible data processing
2. high availability and disaster recovery
3. the elasticity and agility of the cloud
4. self-service management
SNOWFLAKE is one of the most popular cloud data warehousing/pipelining services.
COMPUTATION COST IN CLOUD;
PAY AS YOU GO
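A toy pay-as-you-go estimate; the rates below are invented for illustration and are not any provider's real pricing:

```python
# Hypothetical usage and rates - you pay only for what you use
hours_used = 120          # compute hours in the billing period
rate_per_hour = 0.05      # made-up $/hour
storage_gb = 200
rate_per_gb_month = 0.02  # made-up $/GB-month

compute_cost = hours_used * rate_per_hour
storage_cost = storage_gb * rate_per_gb_month
total = compute_cost + storage_cost
print(f"estimated bill: ${total:.2f}")
```

The point of the model is that shutting down idle compute directly reduces the bill, unlike fixed on-premises hardware.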
5 Vs of Big Data;
1. Variety
2. Velocity
3. Volume
4. Value
5. Veracity
### Day 2 of FT ###
login credentials for google rps cloud; 18
Car example specifics; class = Car; attributes = cubic capacity, no. of seats;
functions = FWD, rear-wheel drive, parking;
child classes (by model name) = Hyundai, Toyota
class Car:
    def __init__(self, capacity, number_of_seats):
        # __init__ should only set attributes; returning a value raises TypeError
        self.capacity = capacity
        self.number_of_seats = number_of_seats

    def fwd(self):
        self.drive_type = 'FWD'
        return 'the car is a front-wheel drive'

    def rwd(self):
        self.drive_type = 'RWD'
        return 'the car is a rear-wheel drive'

    def park(self, spot):
        # store the spot under a different name so the method is not shadowed
        self.parking_spot = spot

class ModelName(Car):
    def __init__(self, capacity, number_of_seats, name):
        super().__init__(capacity, number_of_seats)
        self.name = name  # e.g. 'Hyundai' or 'Toyota'

hyundai = ModelName(1200, 5, 'Hyundai')
print(hyundai.name, hyundai.fwd())
Big Data: characterized by the 3 Vs;
Volume
Velocity
Variety
HDFS - Hadoop Distributed File System
Topics for Assessment:
1. Data Engineering, - done
2. Advanced Python, - done
3. Big Data, - done
4. Hadoop, - done
5. Spark, - done
6. PySpark, - done
7. Cloud Technology, - done
8. GCP, - done
9. AWS, - done
10. Azure - done