
A

REPORT ON
SUMMER INTERNSHIP
FOR
IV YEAR-I SEM
Submitted in partial fulfilment for the award of the degree of
BACHELOR OF TECHNOLOGY
IN

COMPUTER SCIENCE AND ENGINEERING


BY
PALLAPOTHU TEJASWINI
(21JD1A0584)

ALTERYX SPARKED DATA ANALYTICS PROCESS


AUTOMATION VIRTUAL INTERNSHIP

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING


ELURU COLLEGE OF ENGINEERING AND TECHNOLOGY
(JNTUK)
DUGGIRALA(V), PEDAVEGI(M), ELURU-534004
APPROVED BY AICTE-NEW DELHI & AFFILIATED TO JNTUK-KAKINADA
2024-2025

ELURU COLLEGE OF ENGINEERING AND TECHNOLOGY (JNTUK)
DUGGIRALA(V), PEDAVEGI(M), ELURU-534004
Affiliated to JNTUK, Kakinada & Approved By AICTE-New Delhi
Department of Computer Science and Engineering

CERTIFICATE

This is to certify that the Summer Internship Report entitled “Alteryx Sparked Data Analytics Process Automation Virtual Internship”, being submitted in partial fulfilment for the award of the degree of Bachelor of Technology in the Department of Computer Science and Engineering to the Jawaharlal Nehru Technological University Kakinada, is a record of bona fide work carried out by PALLAPOTHU TEJASWINI (21JD1A0584).

HEAD OF THE DEPARTMENT


Dr. S. Suresh, M.Tech., Ph.D.

EXTERNAL EXAMINER

DECLARATION

I hereby declare that the Summer Internship work entitled “Data Analytics Process Automation Virtual Internship”, submitted to JNTU Kakinada, is a record of the original work I did. This Summer Internship work is submitted in partial fulfilment for the degree of Bachelor of Technology in Specialization. The results embedded in this thesis have not been submitted to any other university or institute for the award of any degree.

PALLAPOTHU TEJASWINI
(21JD1A0584)

PROGRAM BOOK FOR

SUMMER INTERNSHIP
Name of the Student : PALLAPOTHU TEJASWINI
Name of the College : ELURU COLLEGE OF ENGINEERING & TECHNOLOGY

Registered No: 21JD1A0584

Period of Internship:

From: APRIL 2024

To: JUNE 2024

Name and Address of Intern / Organization: Eduskills, Supported By India Edu Program

ELURU COLLEGE OF ENGINEERING & TECHNOLOGY
Department of Computer Science
VISION-MISSION-PEOs

Institute Vision: Pioneering Professional Education through Quality.

Institute Mission:
IM1: To deliver quality education through good infrastructure, facilities and committed staff.
IM2: To train students as proficient, competent and socially responsible engineers.
IM3: To promote research and development activities among faculty and students for the betterment of society.

Department Vision: Empower the students of the Computer Science & Engineering Department to be technologically strong, innovative & global citizens maintaining human values.

Department Mission:
DM1: Inspire students to become self-motivated & problem-solving individuals.
DM2: Furnish students for a professional career with academic excellence and leadership skills.
DM3: Create a centre of excellence in Computer Science & Engineering.
DM4: Empower the youth & rural communities with computer education.

Program Educational Objectives (PEOs): Graduates of Information Technology are able to:
PEO1: Excel in a professional career through knowledge in mathematics and engineering principles.
PEO2: Able to pursue higher education and research.
PEO3: Communicate effectively, recognize, and incorporate societal needs in their professional endeavors.
PEO4: Adapt to technological advancements by continuous learning.
PROGRAM OUTCOMES (POs)

1. Engineering Knowledge: Apply the knowledge of mathematics, science, engineering fundamentals and an engineering specialization to the solution of complex engineering problems.
2. Problem analysis: Identify, formulate, review research literature, and analyze complex engineering problems reaching substantiated conclusions using first principles of mathematics, natural sciences, and engineering sciences.
3. Design/development of solutions: Design solutions for complex engineering problems and design system components or processes that meet the specified needs with appropriate consideration for public health and safety, and cultural, societal, and environmental considerations.
4. Conduct investigations of complex problems: Use research-based knowledge and research methods including design of experiments, analysis and interpretation of data, and synthesis of the information to provide valid conclusions.
5. Modern tool usage: Create, select, and apply appropriate techniques, resources, and modern engineering and IT tools including prediction and modeling to complex engineering activities with an understanding of the limitations.
6. The engineer and society: Apply reasoning informed by the contextual knowledge to assess societal, health, safety, legal and cultural issues and the consequent responsibilities relevant to the professional engineering practice.
7. Environment and sustainability: Understand the impact of the professional engineering solutions in societal and environmental contexts, and demonstrate the knowledge of, and need for, sustainable development.
8. Ethics: Apply ethical principles and commit to professional ethics and responsibilities and norms of the engineering practice.
9. Individual and team work: Function effectively as an individual, and as a member or leader in diverse teams, and in multidisciplinary settings.
10. Communication: Communicate effectively on complex engineering activities with the engineering community and with society at large, such as being able to comprehend and write effective reports and design documentation, make effective presentations, and give and receive clear instructions.
11. Project management and finance: Demonstrate knowledge and understanding of the engineering and management principles and apply these to one's own work, as a member and leader in a team, to manage projects and in multidisciplinary environments.
12. Life-long learning: Recognize the need for, and have the preparation and ability to engage in independent and life-long learning in the broadest context of technological change.

INTERNSHIP LOG
Internship with: AICTE
Duration: 23/04/2024 – 25/06/2024

DATA ANALYTICS PROCESS AUTOMATION VIRTUAL INTERNSHIP

S.No Date Program
1 23-04-2024 Introduction to Data Analytics Process Automation
2 24-04-2024 Understanding Of Data Analytics Process Automation
3 25-04-2024 Evolution of Data Analytics Process Automation
4 26-04-2024 Data Analytics Process Automation Overview
5 27-04-2024 Daily Test - 01
6 28-04-2024 Daily Test – 02
7 29-04-2024 Objectives Of Data Analytics Process Automation Internship
8 30-04-2024 Fundamentals Of Data Analytics Process Automation
9 31-04-2024 Data Analytics Process Automation Platform Overview
10 01-05-2024 Key Features Of Components
11 02-05-2024 Data Analytics Process Automation Studio
12 03-05-2024 Assignment – 01
13 04-05-2024 Data Analytics Process Automation orchestrator - 01
14 05-05-2024 Data Analytics Process Automation
15 06-05-2024 Daily Test – 03
16 07-05-2024 Daily Test – 04
17 08-05-2024 Scope And Learning Of Objectives
18 09-05-2024 Roles And Responsibilities
19 10-05-2024 Data Analytics Process Automation Briefing Session - 01
20 11-05-2024 Assignment – 02
21 12-05-2024 Project Overview
22 13-05-2024 Setting Up Data Analytics Process Automation Environment
23 14-05-2024 Basics Of Data Analytics Process Automation Studio
24 15-05-2024 Daily Test – 05
25 16-05-2024 Daily Test – 06
26 17-05-2024 Assignment – 03
27 18-05-2024 Building First Automation Process
28 19-05-2024 Data Analytics Process Automation Briefing Session - 02
29 20-05-2024 Data Analytics Process Automation Activities and Tasks

30 21-05-2024 Doubts Clearing Session – 01
31 22-05-2024 Core Activities and Tasks
32 23-05-2024 Variables and Data types
33 24-05-2024 Submission Of Finished Task – 01
34 25-05-2024 Daily Test – 07
35 26-05-2024 Daily Test - 08
36 27-05-2024 Control Flow Activities
37 28-05-2024 UI automation
38 29-05-2024 Assignment – 04
39 30-05-2024 Data Scraping and Data Extraction
40 31-05-2024 Data Analytics Process Automation Briefing Session – 03
41 01-06-2024 Data Analytics Process Automation Labs
42 02-06-2024 Doubts Clearing Session – 02
43 03-06-2024 Data Analytics Process Automation orchestrator – 02
44 04-06-2024 Automation Projects and Challenges In Implementation
45 05-06-2024 Daily Test – 09
46 06-06-2024 Daily Test – 10
47 07-06-2024 Design and Development Of Process
48 08-06-2024 Testing and debugging
49 09-06-2024 Submission Of Finished Task – 02
50 10-06-2024 Integration With Other Systems
51 11-06-2024 Submission Of Finished Task – 03
52 12-06-2024 Assignment – 05
53 13-06-2024 Doubts Clearing Session – 03
54 14-06-2024 Common Challenges in Data Analytics Process Automation Projects
55 15-06-2024 Solutions and Best Practices
56 16-06-2024 Data Analytics Process Automation Briefing Session – 04
57 17-06-2024 Case Study
58 18-06-2024 Internship Conclusion and Recommendations
59 19-06-2024 Key Takeaways
60 20-06-2024 Application Development
61 21-06-2024 Application Processing
62 22-06-2024 Final Assessment
63 23-06-2024 Process of Certification
64 24-06-2024 Career Paths In Data Analytics Process Automation

Module Contents

Module 1: Introduction to Data Analytics Process Automation Virtual Internship
1. Overview
2. Objectives

Module 2: Alteryx Foundational Micro-Credential
1. Overview of foundational concepts
2. Data preparation and blending
3. Analysis and reporting

Module 3: Machine Learning Fundamentals Micro-Credential
1. Introduction to machine learning
2. Key algorithms and techniques
3. Implementing machine learning in Alteryx

Module 4: Alteryx Designer Core Certification
1. Introduction to Alteryx Designer
2. Core concepts and tools

Introduction to Alteryx SparkED
1. History
2. Needs and uses

Introduction to Data Analytics
Machine Learning
HTML Code
Conclusion
Module 1: Introduction to Data Analytics
The internship was designed to equip participants with essential skills in data
analytics, focusing on automation and leveraging the capabilities of Alteryx
Designer. Throughout the program, several core courses were completed, including
Alteryx Designer Core, Micro Fundamentals, and Machine Learning Core Designer
courses. These courses collectively built a strong foundation in data analytics and
process automation, paving the way for proficiency in handling complex data
workflows and analytics tasks using Alteryx.
After completing this training, you should be able to:
 Understand the fundamental concepts and capabilities of Alteryx Designer.
 Develop and automate data workflows using Alteryx tools.
 Apply micro-level data processing techniques to enhance data quality and analytics.
 Utilize machine learning models within Alteryx for advanced data analysis and predictive modeling.
 Integrate various data sources and perform comprehensive data blending and preparation.
 Implement best practices in data analytics and process automation to improve efficiency and accuracy.

1.Overview
The Alteryx Data Analytics Automation Virtual Internship offers an immersive experience designed to equip participants with hands-on skills in data analytics and automation using the Alteryx platform. This program is ideal for individuals looking to enhance their capabilities in data manipulation, analysis, and workflow automation, preparing them for a successful career in data analytics.

2.Objectives
 Develop Advanced Data Preparation Skills: Gain proficiency in using Alteryx Designer to clean, transform, and prepare data for analysis. This includes mastering various tools and functions within Alteryx to automate data workflows, ensuring data accuracy, and minimizing manual intervention.
 Master Data Integration Techniques: Learn to integrate data from diverse sources such as databases, APIs, and flat files. The objective is to streamline data consolidation processes, making it easier to combine, enrich, and leverage data for comprehensive analysis.
 Enhance Analytical Capabilities: Utilize Alteryx's robust analytical tools to perform complex data analyses. This includes statistical analysis, predictive modeling, and spatial analysis to derive actionable insights and support data-driven decision-making processes.
 Automate Data Workflows: Focus on automating repetitive and time-consuming data tasks using Alteryx. Develop skills to create efficient workflows that can be scheduled and monitored, reducing the time required for data preparation and analysis.
 Understand Data Governance and Security: Learn best practices for data governance within the Alteryx platform. This includes understanding how to manage user access, ensure data quality, and comply with data security standards and regulations.
 Develop Problem-Solving Skills: Apply critical thinking and problem-solving skills to real-world business scenarios. Use Alteryx to identify issues, develop hypotheses, test solutions, and implement effective data-driven strategies.
 Improve Collaboration and Communication: Enhance the ability to work collaboratively with team members and stakeholders. Communicate findings and insights effectively using Alteryx's reporting tools and visualizations to support strategic business decisions.
 Explore Machine Learning and AI Integration: Gain exposure to integrating machine learning models and artificial intelligence within Alteryx workflows. Understand how to use Alteryx to train, validate, and deploy predictive models to solve business problems.
 Project Management and Execution: Develop project management skills by planning and executing data analytics projects from inception to completion. Learn to manage timelines, resources, and deliverables efficiently using Alteryx.

Module 2: Alteryx Foundational Micro-Credential
1.Overview of Foundational Concepts
The Alteryx Foundational Micro-Credential is designed to provide learners with a comprehensive understanding of the core concepts and functionalities of Alteryx Designer. This certification focuses on the essential skills required to efficiently utilize Alteryx for data analytics. Here are the fundamental concepts covered in this micro-credential:
Introduction to Alteryx Designer: The micro-credential begins with an overview of
Alteryx Designer, highlighting its user-friendly interface and its role in data analytics.
Learners are introduced to the workspace, including the canvas, tool palettes, and
configuration windows.
Data Input and Output: A critical aspect of data analytics is the ability to import and
export data. This section covers various data input tools that allow users to connect
to different data sources, such as Excel files, databases, and cloud services.
Additionally, it explains how to output data in various formats, ensuring seamless
integration with other systems.
Data Preparation and Blending: Data preparation is a key step in the analytics process. This concept involves using tools to clean, filter, sort, and join datasets. Learners gain hands-on experience with tools like Select, Filter, and Join, which help in transforming raw data into an analysis-ready format.
Data Transformation: Transformation tools in Alteryx enable users to
manipulate and reshape data. The micro-credential covers tools such as Formula,
Multi-Row Formula, and Transpose, which are essential for creating new data
fields, performing calculations, and restructuring data layouts.
Basic Analysis Tools: Alteryx provides several tools for basic data analysis. This section introduces tools like Summarize, which aggregates data, and the Frequency Table, which provides descriptive statistics. These tools are foundational for conducting preliminary data analyses.
Workflow design and automation: Learners are taught how to design efficient workflows that automate repetitive tasks. This includes understanding the importance of workflow organization, using containers to manage complex workflows, and scheduling workflows for automated execution.
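Alteryx builds these workflows visually, but the same input, prepare, summarize, and output pattern can be sketched in code. The following pandas sketch is only an illustrative analogue of such a workflow; the file sales.csv and its column names are hypothetical.

    # Illustrative pandas analogue of an Alteryx workflow (hypothetical file and columns).
    import pandas as pd

    df = pd.read_csv("sales.csv")                         # Input Data step
    df = df[["Region", "Product", "Revenue"]]             # Select: keep and order columns
    df = df[df["Revenue"] > 0]                            # Filter: keep only valid rows
    summary = (df.groupby("Region", as_index=False)       # Summarize: aggregate by group
                 .agg(TotalRevenue=("Revenue", "sum")))
    summary.to_csv("revenue_by_region.csv", index=False)  # Output Data step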

What Steps Are Involved in Data Preparation Processes?

Data preparation steps can vary depending on the industry or need, but typically consist of the following:

 Acquiring data: Determining what data is needed, gathering it, and establishing consistent access to build powerful, trusted analysis
 Exploring data: Evaluating the data's quality, examining its distribution, and analyzing the relationship between each variable to better understand how to compose an analysis (also referred to as data profiling)
 Cleansing data: Improving data quality and overall productivity by deleting unnecessary data, removing poor-quality data, or fixing inaccuracies to craft error-proof insights
 Transforming data: Formatting, orienting, aggregating, and enriching the datasets used in an analysis to produce more meaningful insights
While data preparation processes build upon each other in a serialized fashion, it’s
not always linear. The order of these steps might shift depending on the data and
questions being asked.
It’s common to revisit a previous data preparation step as new insights are
uncovered or new data sources are integrated into the process.
The entire data preparation process can be notoriously time-intensive,
iterative, and repetitive. That’s why it’s important to ensure the individual
steps taken can be easily understood, repeated, revisited, and revised so
analysts and data scientists can spend less time preparing and more time analyzing.
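As a rough illustration of these four steps outside of Alteryx, here is a minimal pandas sketch. The file orders.csv and its columns are hypothetical, and real preparation work is usually iterative rather than a single pass.

    # Acquire -> explore -> cleanse -> transform, sketched with pandas (hypothetical data).
    import pandas as pd

    df = pd.read_csv("orders.csv")                    # acquiring: establish access to the data

    print(df.describe())                              # exploring: distributions and summary stats
    print(df.isna().mean())                           # exploring: fraction of missing values

    df = df.drop_duplicates()                         # cleansing: remove duplicate records
    df = df.dropna(subset=["order_id"])               # cleansing: drop rows missing a key field

    df["order_date"] = pd.to_datetime(df["order_date"])             # transforming: fix formats
    monthly = (df.assign(month=df["order_date"].dt.to_period("M"))  # transforming: aggregate
                 .groupby("month", as_index=False)["amount"].sum())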

Data Preparation for Machine Learning

Machine learning is a type of artificial intelligence where algorithms, or models, use massive amounts of data to improve their performance. Both structured data and unstructured data are critical for training and validating machine learning algorithms that underpin any AI system or process. The rise of Big Data and cloud computing has exponentially increased the use cases and applications of AI, but having a lot of data isn't enough to create a successful machine learning model. Raw data is hard to integrate with the cloud and machine learning models because there are still anomalies and missing values that make the data hard to use or result in inaccurate models. Building accurate and trustworthy machine learning models requires a significant amount of data preparation.
According to a survey by Anaconda, data scientists spend 45% of their time on data preparation tasks, including loading and cleaning data. With self-service data preparation tools, data scientists and citizen data scientists can automate significant portions of the data preparation process to focus their time on higher-value data-science activities.

Data Preparation in the Cloud


With the rise of cloud data storage centers, including cloud data warehouses and
cloud data lakes, organizations are able to increase the accessibility and speed of
their data preparation and data analytics while also leveraging the power of the cloud
for improved security and governance. Historically, organizations stored their data in
on-premise data centers. These physical servers limit organizations’ ability to
scale their usage of data up or down on demand, cost large amounts of
money to operate, and often consume vast amounts of time, especially when
working with large datasets.

What Is Data Blending?


Data blending is the process of combining data from multiple sources to create an
actionable analytic dataset for business decision-making or for driving a specific
business process. This process allows organizations to obtain value from a variety
of sources and create deeper analyses. Data blending differs from data integration
and data warehousing in that its primary use is not to create a single version of the
truth that’s stored in data warehouses or other systems of record within an
organization. Instead, this process is conducted by a business or data analyst with
the goal of building an analytic dataset to help answer specific business questions.

Why Is Data Blending Important?
Data blending empowers a data analyst to incorporate data of any type or any
source into their analysis for faster, deeper business insights. Combining two or
more datasets often illuminates valuable information that might otherwise not be
discovered if the data wasn’t blended — information that provides a new perspective
that might lead to better business decisions. Traditionally, analysts have relied on VLOOKUPs, scripting, and multiple spreadsheets for constructing datasets, but this can be clunky and time-consuming. Utilizing manual processes or relying on data scientists to build analytical datasets is increasingly ineffective — it's not scalable with the number of ad-hoc requests analysts receive. Data blending building blocks speed up the process of constructing datasets and can help analysts and business leaders get more accurate answers.
To live at the forefront of innovation, data analysis must focus on high-level business questions rather than the minutiae of spreadsheets and manual SQL queries. Data blending can help analysts take full advantage of expanding roles, as well as the expansion of data needed to make critical business decisions.

While there are many different techniques for bringing data together, from inner and
outer joins to fuzzy matching and unions, data blending boils down to four simple
steps.

Preparing Data
The first step in gathering data is to ask what information might be helpful to answer the questions being asked. Identify pertinent datasets from various sources; a wide array of structures or file types can be used. Each data source that is included will need to share a common dimension in order to be combined. The ability to transform these different types into a common structure that allows for a meaningful blend, without manipulating the original data source, is something that modern analytics technology can do in an automated and repeatable way.

Blending Data
Combine the data from various sources and customize each join based on the common dimension to ensure the data blending is seamless. Think about the desired blended view and only include data that is essential to answer the questions being asked, as well as any fields that may give additional context to those answers. The resulting dataset should be easy to comprehend and explain to stakeholders.
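A minimal sketch of this idea in pandas, assuming two hypothetical sources that share a customer_id dimension:

    # Blend two sources on a common dimension, keeping only fields needed for the question.
    import pandas as pd

    crm = pd.read_csv("crm_contacts.csv")        # hypothetical source 1: CRM export
    web = pd.read_json("web_activity.json")      # hypothetical source 2: web analytics feed

    blended = crm.merge(                         # customize the join on the shared dimension
        web[["customer_id", "visits", "last_seen"]],
        on="customer_id",
        how="left",                              # keep every CRM contact, matched or not
    )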

3.Analysis and Reporting

Analysis
1.Predictive Analytics: Alteryx includes a suite of predictive tools based on the R programming language.

2.Spatial Analytics: Spatial analysis in Alteryx allows users to analyze geographic data.

3.Advanced Analytics: Beyond predictive analytics, Alteryx provides tools for machine learning, optimization and statistical analysis.

Reporting
1.Report Creation: Users can generate reports by combining text, tables, charts, maps and images.

2.Data Visualization: While Alteryx is not primarily a data visualization tool, it integrates well with platforms like Power BI.

Module 3: Machine Learning Fundamentals Micro-Credential
1.Introduction to Machine Learning: The Alteryx Machine Learning Fundamentals Micro-Credential is a certification designed to equip individuals with the foundational skills necessary to perform machine learning tasks using the Alteryx platform. This credential is suitable for data analysts, data scientists, and other professionals who want to enhance their machine learning capabilities within Alteryx's user-friendly environment. Here's an overview of what the micro-credential encompasses:

Objectives
The primary objective of the Alteryx Machine Learning Fundamentals Micro-Credential is to provide participants with a practical understanding of machine learning concepts and the ability to implement these concepts using Alteryx tools. The credential focuses on:

Understanding Machine Learning Concepts:


 Introduction to key machine learning principles.
 Distinguishing between supervised and unsupervised learning.
 Exploring various types of algorithms and their applications.

Data Preparation:
 Techniques for cleaning and preprocessing data.
 Handling missing values and outliers.
 Feature engineering and selection.

Building and Evaluating Models:


 Constructing machine learning models using Alteryx tools.
 Training and testing models.
 Evaluating model performance using metrics such as accuracy, precision, recall, and AUC-ROC curves (see the sketch below).
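For reference, these metrics can be computed with scikit-learn; the sketch below uses made-up labels and scores rather than the output of a real Alteryx model.

    # Evaluation metrics named above, on toy binary-classification outputs.
    from sklearn.metrics import (accuracy_score, precision_score,
                                 recall_score, roc_auc_score)

    y_true  = [0, 1, 1, 0, 1, 0, 1, 1]                    # ground-truth labels
    y_pred  = [0, 1, 0, 0, 1, 1, 1, 1]                    # hard class predictions
    y_score = [0.2, 0.9, 0.4, 0.1, 0.8, 0.6, 0.7, 0.95]   # predicted probabilities

    print("accuracy :", accuracy_score(y_true, y_pred))
    print("precision:", precision_score(y_true, y_pred))
    print("recall   :", recall_score(y_true, y_pred))
    print("AUC-ROC  :", roc_auc_score(y_true, y_score))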

Deploying Machine Learning Models:


 Integrating models into Alteryx workflows.
 Automating model retraining and deployment.
 Monitoring and updating models as necessary.

2.Key Algorithms and Techniques
Alteryx Machine Learning offers a robust set of tools and functionalities that enable users to perform a wide range of machine learning tasks. Here are some of the key algorithms and techniques available within Alteryx Machine Learning:

Supervised Learning Algorithms


1.Linear Regression:
 Used for predicting a continuous target variable based on one or more predictor variables.
 Useful for tasks like sales forecasting and trend analysis.
2.Logistic Regression:
 Used for binary classification problems, predicting the probability of a binary outcome.
 Commonly applied in scenarios like spam detection and customer churn prediction.

3.Decision Trees:
 Non-parametric algorithm used for classification and regression tasks.
 Easy to interpret and visualize, suitable for understanding the decision-making
process.

4.Random Forest:
 An ensemble method that builds multiple decision trees and combines their
outputs.
 Provides robust performance and reduces the risk of overfitting.
5.Support Vector Machines (SVM):
 Used for classification tasks, especially effective in high-dimensional spaces.
 Finds the hyperplane that best separates different classes.

6.Gradient Boosting Machines (GBM):


 An ensemble technique that builds models sequentially, with each new model correcting the errors of the previous ones.
 Includes algorithms like XGBoost and LightGBM, which are known for their high accuracy.
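Alteryx exposes these algorithms through its visual tools, but the same model families exist in scikit-learn. The sketch below fits three of the supervised algorithms above on synthetic data, purely for comparison.

    # Fit three of the supervised algorithms above on a synthetic dataset.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression
    from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

    X, y = make_classification(n_samples=500, n_features=10, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    for model in (LogisticRegression(max_iter=1000),
                  RandomForestClassifier(n_estimators=100, random_state=0),
                  GradientBoostingClassifier(random_state=0)):
        model.fit(X_train, y_train)                                # train on the training split
        print(type(model).__name__, model.score(X_test, y_test))  # test-set accuracy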

Unsupervised Learning Algorithms


1.K-Means Clustering:
 Partitions data into a predefined number of clusters based on feature similarity.
 Useful for customer segmentation and market basket analysis.

2.Hierarchical Clustering:
 Builds a tree-like structure of nested clusters by iteratively merging or splitting clusters.
 Helps in understanding the hierarchical relationships among data points.

3.Principal Component Analysis (PCA):


 A dimensionality reduction technique that transforms data into a set of
orthogonal components.
 Helps in reducing the complexity of data while retaining most of the variance.
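A comparable sketch for the unsupervised techniques, again with scikit-learn on synthetic data:

    # K-Means clustering and PCA on synthetic data.
    from sklearn.datasets import make_blobs
    from sklearn.cluster import KMeans
    from sklearn.decomposition import PCA

    X, _ = make_blobs(n_samples=300, centers=4, n_features=6, random_state=0)

    kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
    print(kmeans.labels_[:10])                   # cluster assignment for the first rows

    pca = PCA(n_components=2).fit(X)             # project onto 2 orthogonal components
    print(pca.explained_variance_ratio_)         # variance retained by each component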

Data Preprocessing Techniques


1.Data Cleaning:
 Handling missing values, outliers, and inconsistencies in the data.
 Ensures that the data is of high quality and suitable for analysis.

2.Feature Engineering:
 Creating new features from existing data to improve model performance.
 Includes techniques like polynomial features, interaction terms, and logarithmic transformations.

3.Normalization and Scaling:


 Adjusting the scales of features to ensure that they contribute equally to the
model.
 Techniques include min-max scaling and standardization.
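The three preprocessing techniques above might look like this in pandas and scikit-learn; the toy columns are hypothetical.

    # Cleaning, feature engineering, and scaling on a toy table.
    import numpy as np
    import pandas as pd
    from sklearn.preprocessing import MinMaxScaler, StandardScaler

    df = pd.DataFrame({"income": [40000.0, 52000.0, None, 61000.0],
                       "age":    [25.0, 38.0, 47.0, None]})

    # 1. Data cleaning: fill missing values with each column's median
    df = df.fillna(df.median(numeric_only=True))

    # 2. Feature engineering: a log transform and an interaction term
    df["log_income"] = np.log(df["income"])
    df["income_x_age"] = df["income"] * df["age"]

    # 3. Normalization and scaling: min-max scaling vs. standardization
    df["income_minmax"] = MinMaxScaler().fit_transform(df[["income"]]).ravel()
    df["income_std"] = StandardScaler().fit_transform(df[["income"]]).ravel()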

3.Implementing Machine Learning in Alteryx


With Alteryx Machine Learning, you or any business analyst can use AutoML software to uncover insights and build optimized machine learning models to predict future behavior. Alteryx Machine Learning guides you through the entire model building process, and you can use Designer to access any data source, file, application, or data type, including the data you already have in Designer. Alteryx Machine Learning will also analyze your data to ensure it's ready for modelling. If the software detects potential problems, it will provide instructions as well as notifications at the top of columns to help you solve them. It also provides a score for your data's health based on a few factors for each column, such as the:

 Fraction of missing values
 Number of outliers
 Target leakage
 Class

Along with data analysis, Alteryx Machine Learning also offers many useful features you can use to improve your machine learning skills and the quality of your predictions.

Education Mode:
Education mode provides contextual guides that explain any terms you might not be familiar with and can be toggled off and on as needed. Text explanations improve your knowledge of machine learning terms, which will help you develop your skills as you use Alteryx Machine Learning.

Correlation Matrix
The Correlation Matrix shows how two or more variables are related to each other, so you can understand how one variable might affect another. The matrix provides an intuitive visualization, with darker colors representing stronger correlations.
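Outside Alteryx, a similar view can be approximated with pandas; features.csv here stands for a hypothetical numeric dataset.

    # Pairwise correlations between variables, printed as a matrix.
    import pandas as pd

    df = pd.read_csv("features.csv")           # hypothetical numeric dataset
    corr = df.corr(numeric_only=True)          # correlation between every pair of columns
    print(corr)
    # In a notebook: corr.style.background_gradient()  # darker cells = stronger correlation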

Chord Diagram
The Chord Diagram shows you the connection between data points, especially when there are a large number of columns. The more lines connecting to one data point, the higher the correlation it has to the predicted outcome.

Model Ranking Leaderboard

Feature engineering options let you add columns to improve the accuracy and performance of your model. You can add columns with a few clicks, and Alteryx Machine Learning will update based on your selections. When you're done adding columns, Alteryx Machine Learning will recommend models for you to use and rank them in a leaderboard. All you have to do is click the model to run and review it.

Prediction Explanations

Prediction Explanations tell you how the prediction for a single row is explained by the features' values. This includes information on the performance of your models, insights, and features. You can use this information in Alteryx Machine Learning to help explain your model's results and modify the models to see how they'll react to new information. When you're all set, you can download the graphics from Alteryx Machine Learning.
Downloadable file types include:
 PDF
 Image
 PowerPoint

Module 4: Alteryx Designer Core Certification
1.Introduction to Alteryx Designer
Key Features of Alteryx Designer
1.Data Preparation and Blending:
Alteryx Designer excels at integrating data from multiple sources, including
databases, spreadsheets, cloud applications, and more. Users can clean, transform,
and join data from these diverse sources without needing advanced coding skills.
The platform supports a wide array of input and output formats, ensuring flexibility in
handling various data types.

2.Analytics and Reporting:


The platform provides robust tools for performing advanced analytics, including predictive, spatial, and statistical analysis. Users can create complex analytical
workflows using a simple drag-and-drop interface. Alteryx Designer includes a
comprehensive library of pre-built tools for tasks such as regression analysis,
predictive modelling, and time series forecasting.

3.Intuitive Workflow Interface:


One of the standout features of Alteryx Designer is its visual workflow interface.
Users can build data workflows by dragging and dropping tools onto a canvas and
connecting them to define the flow of data. This approach eliminates the need for
extensive coding, making it accessible to users with little or no programming
experience.

4.Automation and Reusability:


Workflows created in Alteryx Designer can be automated to run on a schedule,
allowing for the consistent and timely execution of data processes. Additionally,
workflows can be saved, shared, and reused across different projects, promoting
efficiency and collaboration within teams.

5.Integration and Extensibility:


Alteryx Designer integrates seamlessly with other Alteryx products and third-party
applications. It supports APIs and SDKs, enabling users to extend its functionality
and incorporate custom tools or external scripts. This makes it a versatile tool that
can fit into a wide range of data environments.

6.Visualization and Insights:


The platform includes tools for creating data visualizations that help in interpreting
and communicating insights. Users can generate interactive dashboards and reports,
which can be shared with stakeholders to support data-driven decision-making.

2.Core Concepts and Tools

Data Preparation:
 Intro to Data Analytics
 Formatting Data
 Sorting Data
 Filtering Data
 Sampling Data

Intro to Data Analytics:

Data analytics is the magic that transforms piles of data into valuable insights. It involves cleaning, organizing, and analysing information to uncover hidden patterns and trends. This knowledge empowers businesses to understand their customers, optimize operations, and make data-driven decisions for success.

Formatting Data:
SELECT Tool:
Designer makes it very easy to change the datatype at any point in the workflow. The Select tool displays the columns in your data set. Use the Select tool to change the datatype of those columns and reorder, rename, and drop columns from the data stream.

Sorting Data:
SORT Tool:
Organize your large data sets by sorting information in ascending or descending order. The Sort tool's configuration window is simple but powerful. Select the column to be sorted, then the order for sorting. You can select multiple columns for sorting in a single tool. The Sort tool will work on the first column listed in its configuration, then move on to the next.

Filtering Data:
FILTER Tool:
As you read more data into Designer, you will continue to increase the amount of data in your workflow. That's great because you have it all in one place, but it can get overwhelming when you are seeking specific information.
An extremely useful tool for dividing your data sets is the Filter tool. Using the Filter tool, you can create logical statements. The incoming data set is evaluated against that criteria and output to either the True anchor or the False anchor. The basic filter option helps to construct your criteria, but you can also use the custom filter option to create more complex statements.

Sampling Data:
SAMPLE Tool:
After sorting data, you may be interested in a subset of the data. The Sample tool provides options for selecting that subset. Use the radio button to select one of the configuration options, then set the value for N. The Sample tool's output will only include the specified data and drop the rest.
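For readers who think in code, the four tools above map loosely onto pandas operations. This is an analogy under hypothetical column names, not Alteryx's own API.

    # Rough pandas analogues of the Select, Sort, Filter, and Sample tools.
    import pandas as pd

    df = pd.read_csv("customers.csv")                       # hypothetical input

    # SELECT: keep, rename, and retype columns
    df = df[["id", "name", "spend"]].rename(columns={"spend": "Spend"})
    df["Spend"] = df["Spend"].astype(float)

    # SORT: first column ascending, then second descending
    df = df.sort_values(["name", "Spend"], ascending=[True, False])

    # FILTER: the boolean test plays the role of the True/False anchors
    true_anchor = df[df["Spend"] > 100]
    false_anchor = df[~(df["Spend"] > 100)]

    # SAMPLE: first N rows, or a random subset of N rows
    first_n = df.head(10)
    random_n = df.sample(n=10, random_state=0)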

Combining and Cleansing Data:


Basic Data Combining and Cleansing:
 Separating Data
 Removing Duplicates
 Blending Data
 Joining Data
Separating Data
TEXT TO COLUMNS Tool
Sometimes, information that should be separate is combined into a single cell,
making it difficult to filter, sort, or sample. In those instances, it makes sense to split
the information into columns or rows. If the values are separated by a delimiter, the
Text to Columns tool is a quick way to split data into columns or rows. A delimiter is a
character that divides values. Commas, spaces, tabs, pipes, and many other
characters can be used as a delimiter.

Removing Duplicates:
UNIQUE Tool:
Another common need when analysing data is finding the unique values within a
data set. The Unique tool will divide the data set into unique values and duplicate
values. Selecting a single column in the Unique tool's configuration window will
evaluate values in that column only. Selecting multiple columns will evaluate the
combination of values and determine if the combination is unique.

Blending Data:
UNION Tool
When inputting more than one data set into a single workflow, you will likely need to
combine those data sets. The Union tool combines data sets vertically by name, by position, or by manual configuration to align columns of data. The Union tool's input
anchor accepts multiple inputs and even includes an option to set a specific output
order.

JOIN Tool

If you need to combine data horizontally, the Join tool can utilize a common field or combine by position. If both incoming data sets share a common column, joining on that column can be used to match rows of data. Alternatively, if you are confident that the row order of the data sets matches, you can join by record position. This tool is very powerful and makes it easy to work with multiple data sources or combine disparate data streams.
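Again as a loose analogy, the four combining and cleansing tools resemble these pandas operations on small made-up tables:

    # Rough pandas analogues of Text to Columns, Unique, Union, and Join.
    import pandas as pd

    df = pd.DataFrame({"full_name": ["Ada|Lovelace", "Alan|Turing", "Ada|Lovelace"]})

    # TEXT TO COLUMNS: split one column into two on the "|" delimiter
    df[["first", "last"]] = df["full_name"].str.split("|", expand=True)

    # UNIQUE: separate unique rows (U anchor) from duplicate rows (D anchor)
    unique_rows = df.drop_duplicates(subset=["full_name"])
    duplicate_rows = df[df.duplicated(subset=["full_name"])]

    # UNION: stack two datasets vertically, aligning columns by name
    q1 = pd.DataFrame({"id": [1], "sales": [100]})
    q2 = pd.DataFrame({"id": [2], "sales": [250]})
    stacked = pd.concat([q1, q2], ignore_index=True)

    # JOIN: combine horizontally on a common field
    names = pd.DataFrame({"id": [1, 2], "name": ["Ada", "Alan"]})
    joined = stacked.merge(names, on="id", how="inner")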

Core Topics:
 Path to Core Certification:
 Date Time
 Rows vs Columns
 Functions
 Expressions
 Summarizing Data
DATETIME Tool:
Similar to the way some values need to be split in order to be as useful as possible, date time values need to be formatted properly in order to be most useful. Designer contains functions which can calculate time intervals without needing extra conversion calculations between units. In order to use those functions, date time values need to be formatted into a specific order. The Select tool is a go-to for changing datatypes, but the requirement that the characters be arranged in the correct order is something the Select tool cannot achieve. The DateTime tool, however, can easily convert string data into properly formatted date time data and vice versa. Simply select the direction of conversion and the format of the string value.
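The same conversion sketched in pandas, using the day-first date format that appears in this report's internship log:

    # String -> datetime -> interval math -> string, sketched with pandas.
    import pandas as pd

    s = pd.Series(["23-04-2024", "25-06-2024"])        # string dates, day-first

    dates = pd.to_datetime(s, format="%d-%m-%Y")       # convert strings to datetimes
    print((dates.max() - dates.min()).days)            # interval math, no extra conversion

    back = dates.dt.strftime("%d-%m-%Y")               # convert datetimes back to strings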

Rows vs Columns:
An important concept to keep in mind when using Designer is that rows are not
treated the same as columns. Unlike some spreadsheet programs where you can
specify an array, values are tied to the headers above them in Designer. This is why
you will see some tools which require data to be oriented in a particular way in order
to function. Rows are also referred to as records and columns can be referred to as
fields.

Functions:
Altering data is an integral part of Designer. One of the most powerful ways to alter your data is by applying functions. Designer includes a Function Library which is categorized to help you find the one you need. Some functions will require that the data be in a specific datatype, but others are agnostic. Regardless of which function you need, you can use it in any tool that has an expression editor. The Expression Editor is where you will construct your function by selecting the function you want to use and properly formatting it into a statement. All expression editors have a tab with the full Function Library, as well as a tab containing the "columns and constants". You can also choose which column to overwrite or create a new column with a name and datatype of your choosing.

Expressions:
FORMULA Tool:
While there are many tools which support the use of functions, the most common is the Formula tool. Using the Formula tool, you can utilize values from other columns to perform calculations, categorize, convert datatypes, format values, and much more. The only limitation is that values in the statement are limited to the current row being processed.
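A row-wise expression of this kind, sketched in pandas with hypothetical columns; as with the Formula tool, each expression sees only the current row's values.

    # Formula-tool-style row-wise expressions.
    import pandas as pd

    df = pd.DataFrame({"price": [12.0, 30.0], "qty": [3, 1]})

    df["total"] = df["price"] * df["qty"]                                  # calculation
    df["tier"] = df["total"].apply(lambda t: "high" if t > 25 else "low")  # categorization
    df["label"] = "Order: " + df["total"].astype(str)                      # formatting values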

ALTERYX SPARKED:

Alteryx SparkED is a free program that offers data analytics education to learners of
all levels, including students, career changers, and military veterans:
 Learners: Receive a free license to Alteryx Designer, access to online
courses, and opportunities to earn certifications.
 Educators: Get free curriculum materials and real-world data sets.
 Customers: Collaborate with SparkED on datathons and career outreach.

SparkED has helped over 170,000 learners in more than 50 countries develop in-
demand data analytics skills. The program can be integrated into many fields of
study, including finance, accounting, marketing, and supply chain.

The Alteryx SparkED program offers a variety of benefits to learners interested in


data analytics, including:
 Free software licenses: Learners receive free software licenses to use.

 Teaching tools: Learners receive teaching tools to help them learn.
 Learning experiences: Learners receive learning experiences to help them
solve problems with data.
 Financial assistance: Learners from diverse backgrounds can receive
financial assistance.
 Career experiences: Learners can gain career experiences.
Alteryx SparkED is a free data analytics education program from Alteryx, a data
analytics automation platform.
Alteryx is a data analytics automation platform that can help users collect, prepare,
and blend data. Some benefits of using Alteryx include:
 Increased efficiency: Alteryx's automation capabilities can help users
streamline data workflows
 Improved data quality: Alteryx's data preparation and blending capabilities
can help improve data quality
 Cost savings: Alteryx can help users save money
 Answers to complex business questions: Alteryx can help users find
answers to complex business questions

HISTORY OF ALTERYX:
Olivia Duane Adams (Libby) is a co-founder of Alteryx and the chief advocacy officer
(CAO). She is one of a few female founders to take a technology company public.
The other co-founders of Alteryx are Dean Stoecker, who is the executive chairman,
and Ned Harding, who was the original CTO.
Alteryx is a company that makes data analytics tools for businesses. The company
was launched in 2010 by Stoecker and Harding, who left their jobs in 1997 to start
Spatial Re-Engineering Consultants (SRC LLC).
Alteryx SparkED is an education program that offers free software licenses, teaching
tools, and learning experiences. The program is designed to help learners
understand data, question data, and solve with data.

Alteryx SparkED is a free program that helps learners acquire data analytics skills
and prepare for tech careers. It offers resources for:
 Learners: Free software licenses, teaching tools, and learning experiences to
help learners question, understand, and solve with data
 Educators: Teaching materials and a connection with other educators
 Customers: Opportunities to collaborate with SparkED on datathons and
career outreach
 Students: Opportunities to acquire Alteryx Micro-Credentials and
Certifications, network, and discover opportunities with potential employers
 Scholars: Financial assistance, career experiences, and other engagements
SparkED can be used in many fields of study, including: Finance, Accounting,
Marketing, and Supply chain.
SparkED has helped over 170,000 learners acquire data analytic skills. It partners
with higher education systems to provide a complete education package. SparkED
also partners with online-learning platforms like Datacamp and Udacity.
A data analytics process automation virtual internship helps participants learn how to
automate data processes, perform data analysis, and apply machine learning
techniques. The internship provides hands-on experience with data preparation,
blending, and cleansing, as well as an understanding of how to integrate various
data sources. Participants also learn the fundamentals of predictive analytics,
including statistical modeling, machine learning, and advanced data visualization.
Data analytics automation can help businesses in many ways, including:
 Faster results
Automated programs can process data faster than humans, which can help
businesses save time and money.
 Increased efficiency
Automation can help businesses increase efficiency by allowing employees to spend
more time on other tasks.
 Improved data accuracy
Automation can help businesses improve the accuracy of their data.
 Better scalability
Automation can help businesses scale their data operations and develop new ideas quickly.

Alteryx SparkED Virtual Internship is an exciting opportunity for students and young professionals to gain hands-on experience in data analytics and science. SparkED is Alteryx's analytics education program, designed to help learners develop data analytic skills and kickstart their careers.
Through the virtual internship, participants can expect to work on real-world projects,
collaborate with Alteryx experts, and develop skills in data preparation, analysis, and
visualization. The program aims to provide a comprehensive learning experience,
covering various aspects of data science, including machine learning, predictive
analytics, and data storytelling.
Some of the key benefits of the Alteryx SparkED Virtual Internship include:
- Hands-on experience: Work on real-world projects and develop practical skills in
data analytics and science.
- Mentorship: Collaborate with Alteryx experts and receive guidance and feedback on
your projects.
- Career development: Enhance your career prospects by developing in-demand
data analytic skills.
- Networking opportunities: Connect with like-minded professionals and Alteryx
experts, potentially leading to valuable connections and job opportunities.
To learn more about the Alteryx SparkED Virtual Internship and how to apply, I
recommend visiting the Alteryx website and exploring the SparkED program in more
detail.
The Alteryx SparkED Virtual Internship Program is a remote internship opportunity
offered by Alteryx, a data analytics software company. The program is designed to
provide students with hands-on experience in data analytics and process automation
using Alteryx tools.
Key Features:
* Focus: Data analytics and process automation using Alteryx software.
* Format: Virtual internship, allowing participation from anywhere.
* Duration: Typically lasts for a specific period, such as a semester or summer.
* Learning: Students learn to use Alteryx tools to create data workflows, manipulate
data, and automate tasks.
* Projects: Interns work on real-world projects to apply their skills and gain practical
experience.
* Mentorship: Students often receive guidance and mentorship from Alteryx
professionals.
Benefits for Participants:
* Skill Development: Gain valuable skills in data analysis, process automation, and
Alteryx software.

* Real-World Experience: Work on practical projects that simulate real-world
business challenges.
* Portfolio Enhancement: Build a strong portfolio to showcase your skills to potential
employers.
* Networking: Connect with Alteryx professionals and other interns in the field.
* Career Advancement: Increase your chances of landing a job in data analytics or a
related field.
Eligibility:
* Typically open to students pursuing degrees in relevant fields like computer
science, data science, business analytics, or related disciplines.
* May have specific requirements regarding academic standing or prior experience.
Overall, the Alteryx SparkED Virtual Internship Program offers a valuable opportunity
for students to develop in-demand data analytics skills, gain practical experience,
and enhance their career prospects.
The Alteryx SparkED Data Analytics Process Automation Virtual Internship is a
program designed to equip students with the skills and knowledge needed to
succeed in the field of data analytics. The program provides participants with hands-
on experience using Alteryx Designer, a powerful tool for data analysis and
automation.
The internship curriculum covers a wide range of topics, including data preparation,
transformation, analysis, and visualization. Participants will learn how to use Alteryx
Designer to create complex data workflows, automate repetitive tasks, and generate
actionable insights from data.
The internship also provides participants with the opportunity to earn Alteryx
Designer certification, which can enhance their credibility as data analysts and
increase their job prospects.

Introduction to Data Analytics
Data analytics is the process of using data to solve problems and find insights. It
involves collecting, organizing, and transforming data to make predictions, draw
conclusions, and inform decisions. Data analytics can be used to improve business
processes, foster growth, and improve decision-making.
Here are some things to know about data analytics:
 What it involves
Data analytics uses a variety of tools, technologies, and processes to analyze
data. It can include math, statistics, computer science, and other techniques.
 What it's used for
Data analytics can help businesses understand their performance, customer
behavior, and market trends. It can also help companies make better decisions by
using the data they generate from log files, web servers, social media, and more.
 Soft skills
Some soft skills that are useful for data analytics include:
 Analytical thinking and problem-solving
 Strong communication and presentation skills
 Attention to detail
 Critical thinking
 Adaptability

Data analytics focuses on analyzing past data to derive insights and make decisions based on historical trends. On the other hand, data science encompasses a broader scope, including data analysis, machine learning, predictive modelling, and more, to solve complex problems and uncover new insights from data.

The four types of analytics maturity


 Descriptive
 Diagnostic
 Predictive
 Prescriptive

We need data analytics because business data analytics collects, processes, and analyzes data to help make smart decisions. By looking at past data, businesses can predict what's coming next, helping them act before problems pop up.

The scope of data analytics


It includes different areas where data analysis can be used to gain valuable insights
and guide decision-making. Key areas include:
1. Business Intelligence: Data analytics helps businesses track performance, spot
trends, and predict future outcomes.
2. Healthcare: In healthcare, data analytics improves patient care, optimizes
operations, and manages costs. It can predict disease outbreaks, personalize
treatment plans, and enhance patient care.
3. Finance: Financial institutions use data analytics for assessing risk, detecting
fraud, and managing investments. It helps monitor transactions, spot unusual
activity, and make investment decisions.
4. Marketing: Marketers use data analytics to understand consumer preferences,
measure campaign success, and target specific audiences. This leads to more
effective marketing strategies and better returns on investment.
5. Supply Chain Management: Data analytics helps optimize supply chain
operations by predicting demand, managing inventory, and improving logistics. This
helps reduce costs and increase efficiency.
6. Education: In education, data analytics tracks student performance, personalizes
learning, and improves outcomes. It also aids in decision-making and resource
management for educational institutions.
7. Sports: Teams and coaches use data analytics to improve performance, plan
strategies, and monitor player statistics. This helps in gaining a competitive edge and
improving results.

Data analysis tools are software programs, applications, and other aids that
professionals use to analyze data sets in ways that characterize the big picture of the
information and provide usable information for meaningful insights, predictions, and
decision-making purposes.

Data analysis tools


 Data mining: Data mining helps users find the key characteristics of their
data so they can apply this knowledge to real-world problems, and data
mining software helps automate this process by looking for patterns and
trends within the data. Three common data mining software programs you may benefit from include the following.
 Data visualization: Data visualization is a powerful way to transform raw data
into meaningful and comprehensive visual representations. It provides us with
a way to understand complex data patterns, trends, and insights that people
might miss in text-based data. Data visualization tools help professionals
streamline the data visualization process. You can use these tools to visually
manipulate data and create basic to advanced graphical representations.
 Business intelligence: Data analysis is a powerful tool for understanding the
story data tells and using it to make informed decisions. Businesses can use
these insights to enhance their performance, improve customer satisfaction,
gain a competitive advantage, and benefit the overall health of their company.
Whether you are part of a small or large organization, learning how to
effectively utilize data analytics can help you take advantage of the wide
range of data-driven benefits.

1.RapidMiner

Primary use: Data mining

o RapidMiner is a comprehensive package for data mining and model development. This platform allows professionals to work with data at all stages, including preparation, visualization, and review. This can be beneficial for professionals who have data that isn't in raw format or that they have mined in the past.

2. Orange
Primary use: Data mining
o Orange is a package renowned for data visualization and analysis,
especially appreciated for its user-friendly, color-coordinated interface.
You can find a comprehensive selection of color-coded widgets for
functions like data input, cleaning, visualization, regression, and
clustering, which makes it a good choice for beginners or smaller
projects.

3. KNIME
Primary use: Data mining
o KNIME, short for KoNstanz Information MinEr, is a free and open-
source data cleaning and analysis tool that makes data mining
accessible even if you are a beginner. Along with data cleaning and
analysis software, KNIME has specialized algorithms for areas like
sentiment analysis and social network analysis.

4. Tableau
Primary use: Data visualization and business intelligence
o Tableau stands out as a leading data visualization software, widely
utilized in business analytics and intelligence.
Tableau is a popular data visualization tool because of its easy-to-use interface and powerful capabilities. Its software can connect with hundreds of different data sources and manipulate the information in many different visualization types.

5. Google Charts
Primary use: Data visualization
o Google Charts is a free online tool that excels at producing a wide array
of interactive and engaging data visualizations. Its design caters to user-
friendliness, offering a comprehensive selection of pre-set chart types
that can be embedded into web pages or applications

6. Datawrapper
Primary use: Data visualization
o Datawrapper is a tool primarily designed for creating online visuals, such
as charts and maps. Initially conceived for journalists reporting news
stories, its versatility makes it suitable for any professional in charge of
website management.

7. Microsoft Excel and Power BI


Primary use: Business intelligence
o Microsoft Excel, fundamentally a spreadsheet software, also has
noteworthy data analytics capabilities. Because of the wide enterprise-
level adoption of Microsoft products, many businesses find they already
have access to it.

8. Qlik
Primary use: Business intelligence
o Qlik is a global company designed to help businesses utilize data for
decision-making and problem-solving. It provides comprehensive, real-
time data integration and analytics solutions to turn data into valuable
insights.

9. Google Analytics
Primary use: Business intelligence
o Google Analytics is a tool that helps businesses understand how people
interact with their websites and apps. To use it, you add a
special JavaScript snippet to your web pages. This code collects
information when someone visits your website, like which pages they
see, what device they’re using, and how they found your site.

10. Spotfire
Primary use: Business intelligence
o TIBCO Spotfire is a user-friendly platform that transforms data into
actionable insights. It allows you to analyze historical and real-time data,
predict trends, and visualize results in a single, scalable platform.

Analytic Process Automation (APA) describes a unified platform for self-service data analytics: it makes data easily available and accessible to everyone in your organization, optimizes and automates data analytics and data science processes, and empowers your entire organization to develop skills and make informed decisions using machine learning (ML), artificial intelligence (AI), and predictive and prescriptive analytics. APA is not RPA or Business Process Automation (BPA), and it is not just another data tool. It is an automated, self-service data analytics platform that puts business outcomes first while empowering everyone in your organization to adopt a culture of analytics, allowing anyone to perform advanced analytics whether or not they know any code or are trained in data science.
Think of Analytic Process Automation as an all-in-one, supercharged data analytics machine. If all you want to do is clean up some data, you can do that. If you want to merge multiple different data types, you can do that too. If you want to automate month-long, tedious, and complex data tasks and push those outputs to decision-makers and business processes, you can do that as well.

And if you want advanced analytics that provide forward-looking insights you can use to produce better business outcomes, improve revenue, reduce spending, and transform your organization from the ground up, APA delivers that too.
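As a minimal pandas sketch of that clean-and-merge workflow (the file names and column names here are hypothetical, chosen only for illustration):

import pandas as pd

# Hypothetical inputs: two different data types (CSV and Excel)
sales = pd.read_csv('sales.csv')
customers = pd.read_excel('customers.xlsx')

# Clean up: drop duplicate rows and fill missing revenue values
sales = sales.drop_duplicates()
sales['revenue'] = sales['revenue'].fillna(0)

# Merge the two sources on a shared key and push the output onward
report = sales.merge(customers, on='customer_id', how='left')
report.to_csv('monthly_report.csv', index=False)

Scheduled to run automatically, a script like this replaces the tedious monthly copy-and-paste work described above.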

Top Benefits of Automated Data Analytics

• Increase speed of onboarding and processing data
• Identify data observability issues
• Accelerate time-to-insights
• Save time and costs
• Improve efficiency in decision-making
• Reduce the potential for errors
• Improve productivity and innovation
• Improve data quality and integrity

Data analytics process automation can have many advantages, including:

• Time savings: Automated programs can process data faster than humans, allowing you to get results faster.
• Improved accuracy: Automation can help eliminate human errors.
• Increased efficiency: Automation can free up employees' time for more valuable responsibilities.
• Better scalability: Automation can help you handle large amounts of data.
• More frequent insights: Automation can help you get insights more frequently.
• Data cleansing and enrichment: Automation can help you identify and correct errors in your data, and add relevant context or information to enhance the value of your data.
• Self-service data analytics: Automation can make data easily available and accessible to everyone in your organization.

Other benefits of automation include higher productivity, reliability, availability, increased performance, and reduced operating costs.

Here are some disadvantages of data analytics process automation:

• Initial setup: Automated data processing systems can be complex to set up and may require a significant investment in technology and expertise.
• Integration: Integrating automated systems with legacy systems can be challenging.
• Data security: Automated systems may be vulnerable to data security risks.
• Data quality: Poor data quality, such as missing fields or duplicated data, can lead to inaccurate analytics and poor decision-making.
• Adaptability: Automated tools may struggle with dynamic websites and may not be able to adapt to changing requirements.
• Contextual understanding: Automated tools may miss nuanced information.
• Monitoring: Continuous monitoring is needed to ensure data accuracy.

Other disadvantages of automation include job displacement and unemployment, reduced human interaction and customer experience, and dependency on technology with loss of human skills.

Data analytics automation is a technique that uses computer systems and processes to perform analytical tasks with little or no human intervention. It can be useful for many reasons, including:

• Time and money savings: Automation can save time and money by eliminating manual, repetitive tasks.
• Improved accuracy: Automation can improve the accuracy of data.
• Faster insights: Automation can provide insights faster.
• Frees up time: Automation can free up time for employees to focus on higher-level tasks, such as interpreting automated data.
• Better scalability: Automation can improve scalability.
• Competitive edge: Automation can help an organization gain a competitive edge by leading to new products and tweaks to existing ones.

Automation can be used for a variety of tasks, including data discovery, preparation, replication, and warehouse maintenance. It can be especially useful for big data. Automation can range from simple scripts to full-service tools that can perform exploratory data analysis, statistical analysis, and model selection.

Machine learning (ML) can automate many steps in the data analytics process, including the following (a short code sketch appears after the list):

• Data cleansing: ML algorithms can automatically detect and fix errors, inconsistencies, or missing data.
• Data transformation: ML models can automatically transform raw data into a more usable format.
• Feature engineering: ML can automate feature selection and engineering, which are essential for building predictive models.
• Predictive analytics: ML models can identify patterns, trends, and correlations in data to help predict future trends or events.
• Reducing bias: ML models can help surface and reduce unintended bias in data and models.
• Network architecture search: Some automation of neural-network design is possible, for example with the neural architecture search (NAS) method.
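As a minimal illustration of the cleansing and transformation steps above, the scikit-learn sketch below (the toy data is invented) builds a pipeline that imputes missing values, scales features, and fits a predictive model in one automated step:

import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Invented toy data with missing values for the imputer to repair
X = np.array([[1.0, 2.0], [np.nan, 3.0], [4.0, 5.0], [6.0, np.nan]])
y = np.array([0, 0, 1, 1])

pipeline = Pipeline([
    ('impute', SimpleImputer(strategy='mean')),  # data cleansing: fill missing values
    ('scale', StandardScaler()),                 # data transformation: standardize features
    ('model', LogisticRegression()),             # predictive analytics
])
pipeline.fit(X, y)
print(pipeline.predict(X))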
Data analytics automation can help businesses make better, more informed decisions by analyzing large sets of data quickly and efficiently. Here are some other benefits of data analytics automation:

• Faster work: Employees can use their time on high-value work.
• Improved customer satisfaction: Data analytics automation can help improve customer satisfaction.
• Free up time: Businesses can focus on creating and selling products and services.
Some of the most common scripting languages for process automation are Python,
Ruby, and PowerShell. Python is a versatile and easy-to-learn language that has
many libraries and frameworks for web development, data analysis, and machine
learning.

What it might involve:

• Data Extraction and Preparation: You might work with tools to gather data from various sources (databases, APIs, spreadsheets) and clean it to ensure accuracy and consistency. This could involve using scripting languages like Python with libraries like Pandas.
• Data Analysis and Visualization: You'll likely analyze data to identify trends, patterns, and insights. This might involve statistical analysis, predictive modeling, and creating dashboards or reports using tools like Tableau or Power BI.
• Process Automation: You'll learn how to automate repetitive tasks using Robotic Process Automation (RPA) software like UiPath, Automation Anywhere, or Blue Prism. This could include automating data entry, report generation, web scraping, or other workflows.
• Developing Automation Scripts: Depending on the internship, you might write scripts or code to automate more complex processes. This could involve using Python, Java, or other programming languages (a small Python report-generation sketch follows this list).
• Real-World Applications: You might work on projects related to customer service, finance, human resources, or supply chain management, applying your skills to solve real business problems.
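To make the report-generation idea concrete, here is a minimal pandas sketch; the input file and column names ('transactions.csv', 'category', 'amount') are hypothetical.

import pandas as pd

# Hypothetical input: any CSV with 'category' and 'amount' columns
df = pd.read_csv('transactions.csv')

# Summarise transactions per category
summary = df.groupby('category')['amount'].agg(['count', 'sum', 'mean'])

# Write an HTML report that a scheduler could regenerate and distribute daily
summary.to_html('daily_report.html')
print('Report written to daily_report.html')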
Where to find these internships:
• Company Websites: Check the career pages of companies known for data analytics and automation (e.g., Accenture, Deloitte, IBM, UiPath, Automation Anywhere).
• Job Boards: Search for "data analytics process automation internship" or related terms on sites like Indeed, LinkedIn, Glassdoor, and Monster.
• Internship Platforms: Explore platforms like Internshala, LetsIntern (India-specific), WayUp, and Chegg Internships.
• Networking: Attend online industry events, webinars, and connect with professionals on LinkedIn to learn about potential opportunities.
Tips for your search:
• Develop Relevant Skills: Build a foundation in data analysis (Excel, SQL, Python, R) and familiarize yourself with basic automation concepts.
• Highlight Your Interest: In your resume and cover letter, clearly express your enthusiasm for data analytics and process automation.
• Portfolio Projects: Create personal projects that demonstrate your abilities (e.g., automate a task in your daily life, analyze a public dataset).
• Practice Your Skills: Use online resources and practice platforms (like HackerRank or LeetCode) to improve your data analysis and coding skills.

PROCESS AUTOMATION CODE:

# Define the file name
file_name = "automated_file.txt"

# Define the text to write
text = "Hello! This is an automated file with text."

# Open the file in write mode (this creates the file if it doesn't exist)
with open(file_name, 'w') as file:
    file.write(text)

print(f"File '{file_name}' has been created and the message has been written.")
Output
File 'automated_file.txt' has been created and the message has been written.
After executing this code, you will find the file, with the text written to it, in the same directory as your Python file.

PYTHON CODE:
LOGISTIC REGRESSION USING MACHINE LEARNING
# Logistic Regression
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('poke2.csv')
X = dataset.iloc[:, [4, 6]].values
y = dataset.iloc[:, 15].values

# Encoding the categorical target labels as integers
from sklearn.preprocessing import LabelEncoder
labelencoder_y = LabelEncoder()
y = labelencoder_y.fit_transform(y)

# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

# Fitting Logistic Regression to the Training set
from sklearn.linear_model import LogisticRegression
classifier = LogisticRegression(random_state=0)
classifier.fit(X_train, y_train)

# Predicting the Test set results
y_pred = classifier.predict(X_test)

# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report
actual = y_test
predicted = y_pred
results = confusion_matrix(actual, predicted)
print('Confusion matrix:')
print(results)
print('Accuracy Score:', accuracy_score(actual, predicted))
print('Report:')
print(classification_report(actual, predicted))

# Visualising the Training set results
from matplotlib.colors import ListedColormap
X_set, y_set = X_train, y_train
X1, X2 = np.meshgrid(np.arange(start=X_set[:, 0].min() - 1, stop=X_set[:, 0].max() + 1, step=0.01),
                     np.arange(start=X_set[:, 1].min() - 1, stop=X_set[:, 1].max() + 1, step=0.01))
# Note: the two-colour map below only distinguishes the first two classes;
# with more classes, the extra labels share the last colour.
plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha=0.75, cmap=ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                c=ListedColormap(('red', 'green'))(i), label=j)
plt.title('Logistic Regression (Training set)')
plt.xlabel('Loan Amount')
plt.ylabel('Income')
plt.legend()
plt.show()

# Visualising the Test set results
from matplotlib.colors import ListedColormap
X_set, y_set = X_test, y_test
X1, X2 = np.meshgrid(np.arange(start=X_set[:, 0].min() - 1, stop=X_set[:, 0].max() + 1, step=0.01),
                     np.arange(start=X_set[:, 1].min() - 1, stop=X_set[:, 1].max() + 1, step=0.01))
plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha=0.75, cmap=ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                c=ListedColormap(('red', 'green'))(i), label=j)
plt.title('Logistic Regression (Test set)')
plt.xlabel('Loan Amount')
plt.ylabel('Income')
plt.legend()
plt.show()

OUTPUT
runfile('C:/Users/TEJASWINI/OneDrive/Documents/project/logistic2.py',
wdir='C:/Users/TEJASWINI/OneDrive/Documents/project')
Confusion matrix:
[[ 0 0 0 14 0 0 0 1]
[ 0 0 0 1 0 0 0 0]
[ 0 0 0 8 0 0 0 0]
[ 0 0 0 223 0 0 0 0]
[ 0 0 0 11 0 0 0 0]
[ 0 0 0 52 0 0 0 0]
[ 0 0 0 12 0 0 0 1]
[ 0 0 0 21 0 0 0 17]]

Accuracy Score: 0.6648199445983379


THE APPLICATION PROGRAM FILE WAS RUN IN ANACONDA NAVIGATOR (SPYDER)

[Figures: 'Logistic Regression (Training set)' and 'Logistic Regression (Test set)' decision-boundary plots]
You can build an HTML page for data analytics professionals completely from scratch, even without experience in coding or design. A template library makes it even simpler to visualize and set up your landing page.

HTML CODE:
DATA ANALYTICS AUTOMATION DASHBOARD

<!DOCTYPE html>
<html>
<head>
<title>Data Analytics Automation Dashboard</title>
<style>
body {
font-family: Arial, sans-serif;
}

.container {
width: 80%;
margin: 40px auto;
padding: 20px;
background-color: #f9f9f9;
border: 1px solid #ddd;
box-shadow: 0 0 10px rgba(0,0,0,0.1);
}

.header {
background-color: #333;
color: #fff;
padding: 10px;
text-align: center;
}

.section {
margin-bottom: 20px;
}

.section-header {
background-color: #f0f0f0;
padding: 10px;
border-bottom: 1px solid #ddd;
}

.button {
background-color: #4CAF50;
color: #fff;
padding: 10px 20px;
border: none;
border-radius: 5px;
cursor: pointer;
}

.button:hover {
background-color: #3e8e41;
}
</style>
</head>
<body>
<div class="container">
<div class="header">
<h1>Data Analytics Automation Dashboard</h1>
</div>

<div class="section">
<div class="section-header">
<h2>Data Source Selection</h2>
</div>
<form>
<label for="data-source">Select Data Source:</label>
<select id="data-source" name="data-source">
<option value="csv">CSV File</option>
<option value="database">Database</option>
<option value="api">API</option>
</select>
<button class="button">Submit</button>
</form>
</div>
<div class="section">
<div class="section-header">
<h2>Data Processing</h2>
</div>
<form>
<label for="data-processing">Select Data Processing
Task:</label>
<select id="data-processing" name="data-processing">
<option value="cleaning">Data Cleaning</option>
<option value="transformation">Data
Transformation</option>
<option value="analysis">Data Analysis</option>
</select>
<button class="button">Submit</button>
</form>
</div>
<div class="section">

<div class="section-header">
<h2>Data Visualization</h2>
</div>
<form>
<label for="data-visualization">Select Data Visualization
Type:</label>
<select id="data-visualization" name="data-
visualization">
<option value="bar-chart">Bar Chart</option>
<option value="line-chart">Line Chart</option>
<option value="scatter-plot">Scatter Plot</option>
</select>
<button class="button">Submit</button>
</form>
</div>
</div>
</body>
</html>

OUTPUT

[Screenshot: the Data Analytics Automation Dashboard rendered in a browser]
CERTIFICATES AND COMPLETION BADGES

[Images: internship completion certificates and badges]