Internship Document
REPORT ON
SUMMER INTERNSHIP
FOR
IV YEAR-I SEM
Submitted in partial fulfilment of the requirements for the award of the degree of
BACHELOR OF TECHNOLOGY
IN
COMPUTER SCIENCE AND ENGINEERING
ELURU COLLEGE OF ENGINEERING AND TECHNOLOGY (JNTUK)
DUGGIRALA (V), PEDAVEGI (M), ELURU - 534004
Affiliated to JNTUK, Kakinada & Approved by AICTE, New Delhi
Department of Computer Science and Engineering
CERTIFICATE
This is to certify that the Summer Internship Report entitled "Alteryx SparkED Data
Analytics Process Automation Virtual Internship" being submitted in partial
fulfilment of the requirements for the award of the degree of Bachelor of Technology
in the Department of Computer Science and Engineering to Jawaharlal Nehru
Technological University Kakinada is a record of bona fide work carried out by
PALLAPOTHU TEJASWINI (21JD1A0584).
EXTERNAL EXAMINER
DECLARATION
I hereby declare that the Summer Internship work entitled "Data Analytics Process
Automation Virtual Internship" submitted to JNTU Kakinada is a record of my original
work. This Summer Internship work is submitted in partial fulfilment of the
requirements for the degree of Bachelor of Technology. The results embodied in this
report have not been submitted to any other University or Institute for the award of
any degree.
PALLAPOTHU TEJASWINI
(21JD1A0584)
PROGRAM BOOK FOR
SUMMER INTERNSHIP
Name of the Student : PALLAPOTHU TEJASWINI
Name of the College : ELURU COLLEGE OF ENGINEERING & TECHNOLOGY
Period of Internship:
ELURU COLLEGE OF ENGINEERING & TECHNOLOGY
Department of Computer Science
VISION-MISSION-PEOs
Department Mission DM1: Inspire students to become self-motivated and
problem-solving individuals.
PROGRAM OUTCOMES (POs)
INTERNSHIP LOG
Internship with: AICTE
Duration: [23/04/2024 – 25/06/2024]
30 21-05-2024 Doubts Clearing Session – 01
31 22-05-2024 Core Activities and Tasks
32 23-05-2024 Variables and Data types
33 24-05-2024 Submission Of Finished Task – 01
34 25-05-2024 Daily Test – 07
35 26-05-2024 Daily Test - 08
36 27-05-2024 Control Flow Activities
37 28-05-2024 UI automation
38 29-05-2024 Assignment – 04
39 30-05-2024 Data Scraping and Data Extraction
40 31-05-2024 Data Analytics Process Automation Briefing Session – 03
41 01-06-2024 Data Analytics Process Automation Labs
42 02-06-2024 Doubts Clearing Session – 02
43 03-06-2024 Data Analytics Process Automation orchestrator – 02
44 04-06-2024 Automation Projects and Challenges In Implementation
45 05-06-2024 Daily Test – 09
46 06-06-2024 Daily Test – 10
47 07-06-2024 Design and Development of Process
48 08-06-2024 Testing and debugging
49 09-06-2024 Submission Of Finished Task – 02
50 10-06-2024 Integration With Other Systems
51 11-06-2024 Submission Of Finished Task – 03
52 12-06-2024 Assignment – 05
53 13-06-2024 Doubts Clearing Session – 03
54 14-06-2024 Common Challenges in Data Analytics Process Automation Projects
55 15-06-2024 Solutions and Best Practices
56 16-06-2024 Data Analytics Process Automation Briefing Session – 04
57 17-06-2024 Case Study
58 18-06-2024 Internship Conclusion and Recommendations
59 19-06-2024 Key Takeaways
60 20-06-2024 Application Development
61 21-06-2024 Application Processing
62 22-06-2024 Final Assessment
63 23-06-2024 Process of Certification
64 24-06-2024 Career Paths In Data Analytics Process Automation
Contents (with page numbers)
Module 1: Introduction to Data Analytics Process Automation Virtual Internship (11-12)
    1. Overview
    2. Objectives
Module 2: Alteryx Foundation Micro-Credential (13-17)
    1. Overview of foundational concepts
    2. Data preparation and blending
    3. Analysis and reporting
Module 3: Machine Learning Fundamentals Micro-Credential (18-23)
    1. Introduction to machine learning
    2. Key algorithms and techniques
    3. Implementing machine learning in Alteryx
Module 4: Alteryx Designer Core Certification (24-28)
    1. Introduction to Alteryx Designer
    2. Core concepts and tools
Introduction to Alteryx SparkED (29-33)
    1. History
    2. Needs and uses
Introduction to Data Analytics (34-42)
Machine Learning (43-49)
HTML code (50-53)
Conclusion (54)
Module 1: Introduction to Data Analytics
The internship was designed to equip participants with essential skills in data
analytics, focusing on automation and leveraging the capabilities of Alteryx
Designer. Throughout the program, several core courses were completed, including
Alteryx Designer Core, Micro Fundamentals, and Machine Learning Core Designer
courses. These courses collectively built a strong foundation in data analytics and
process automation, paving the way for proficiency in handling complex data
workflows and analytics tasks using Alteryx.
After completing this training, you should be able to:
- Understand the fundamental concepts and capabilities of Alteryx Designer.
- Develop and automate data workflows using Alteryx tools.
- Apply micro-level data processing techniques to enhance data quality and analytics.
- Utilize machine learning models within Alteryx for advanced data analysis and predictive modeling.
- Integrate various data sources and perform comprehensive data blending and preparation.
- Implement best practices in data analytics and process automation to improve overall efficiency.
1. Overview
The Alteryx Data Analytics Automation Virtual Internship offers an immersive
experience designed to equip participants with hands-on skills in data analytics and
automation using the Alteryx platform. This program is ideal for individuals looking to
enhance their capabilities in data manipulation, analysis, and workflow automation,
preparing them for a successful career in data analytics.
2. Objectives
- Develop Advanced Data Preparation Skills: Gain proficiency in using Alteryx
Designer to clean, transform, and prepare data for analysis. This includes mastering
various tools and functions within Alteryx to automate data workflows, ensuring data
accuracy, and minimizing manual intervention.
- Master Data Integration Techniques: Learn to integrate data from diverse sources
such as databases, APIs, and flat files. The objective is to streamline data
consolidation processes, making it easier to combine, enrich, and leverage data for
comprehensive analysis.
- Enhance Analytical Capabilities: Utilize Alteryx's robust analytical tools to perform
complex data analyses. This includes statistical analysis, predictive modeling, and
spatial analysis to derive actionable insights and support data-driven
decision-making processes.
- Automate Data Workflows: Focus on automating repetitive and time-consuming
data tasks using Alteryx. Develop skills to create efficient workflows that can be
scheduled and monitored, reducing the time required for data preparation and
analysis.
- Understand Data Governance and Security: Learn best practices for data
governance within the Alteryx platform. This includes understanding how to manage
user access, ensure data quality, and comply with data security standards and
regulations.
- Develop Problem-Solving Skills: Apply critical thinking and problem-solving skills
to real-world business scenarios. Use Alteryx to identify issues, develop hypotheses,
test solutions, and implement effective data-driven strategies.
- Improve Collaboration and Communication: Enhance the ability to work
collaboratively with team members and stakeholders. Communicate findings and
insights effectively using Alteryx's reporting tools and visualizations to support
strategic business decisions.
- Explore Machine Learning and AI Integration: Gain exposure to integrating
machine learning models and artificial intelligence within Alteryx workflows.
Understand how to use Alteryx to train, validate, and deploy predictive models to
solve business problems.
- Project Management and Execution: Develop project management skills by
planning and executing data analytics projects from inception to completion. Learn
to manage timelines, resources, and deliverables efficiently using Alteryx.
Module 2: Alteryx Foundational Micro-Credential
1.Overview of Foundational Concepts
The Alteryx Foundational Micro-Credential is designed to provide learners
with a comprehensive understanding of the core concepts and functionalities of
Alteryx Designer. This certification focuses on essential skills required to efficiently
utilize Alteryx for data analytics. Here are the fundamental concepts covered in this
micro-credential:
Introduction to Alteryx Designer: The micro-credential begins with an overview of
Alteryx Designer, highlighting its user-friendly interface and its role in data analytics.
Learners are introduced to the workspace, including the canvas, tool palettes, and
configuration windows.
Data Input and Output: A critical aspect of data analytics is the ability to import and
export data. This section covers various data input tools that allow users to connect
to different data sources, such as Excel files, databases, and cloud services.
Additionally, it explains how to output data in various formats, ensuring seamless
integration with other systems.
Data Preparation and Blending: Data preparation is a key step in the analytics
process. This concept involves using tools to clean, filter, sort, and join datasets.
Learners gain hands-on experience with tools like Select, Filter, and Join, which
help in transforming raw data into an analysis-ready format.
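Alteryx workflows are built visually rather than in code, but the same Select/Filter/Join logic can be sketched in pandas; the order and customer tables below are hypothetical:

```python
import pandas as pd

# Hypothetical sample data standing in for two Alteryx input streams.
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "customer_id": [10, 20, 10, 30],
    "amount": [250.0, 40.0, 125.0, 310.0],
})
customers = pd.DataFrame({
    "customer_id": [10, 20, 30],
    "region": ["East", "West", "East"],
})

# Select-tool analogue: keep and rename only the columns we need.
selected = orders[["order_id", "customer_id", "amount"]].rename(
    columns={"amount": "order_amount"})

# Filter-tool analogue: keep rows meeting a condition (orders of 100 or more).
filtered = selected[selected["order_amount"] >= 100]

# Join-tool analogue: enrich orders with the customer's region on the common key.
joined = filtered.merge(customers, on="customer_id", how="inner")
```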
Data Transformation: Transformation tools in Alteryx enable users to
manipulate and reshape data. The micro-credential covers tools such as Formula,
Multi-Row Formula, and Transpose, which are essential for creating new data
fields, performing calculations, and restructuring data layouts.
Basic Analysis Tools: Alteryx provides several tools for basic data analysis. This
section introduces tools like Summarize, which aggregates data, and the Frequency
Table, which provides descriptive statistics. These tools are foundational for
conducting preliminary data analyses.
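The Summarize tool's aggregation and a frequency table can be sketched with pandas groupby and value_counts; the sales figures are invented:

```python
import pandas as pd

# Hypothetical sales data to illustrate Summarize-style aggregation.
sales = pd.DataFrame({
    "region": ["East", "West", "East", "West", "East"],
    "amount": [100, 200, 150, 50, 250],
})

# Summarize-tool analogue: group by a category and aggregate.
summary = sales.groupby("region")["amount"].agg(["sum", "mean", "count"])

# Frequency-table analogue: counts per category value.
freq = sales["region"].value_counts()
```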
Workflow Design and Automation: Learners are taught how to design efficient
workflows that automate repetitive tasks. This includes understanding the
importance of workflow organization, using containers to manage complex
workflows, and scheduling workflows for automated execution.
What Steps Are Involved in Data Preparation Processes?
Data preparation steps can vary depending on the industry or need, but typically
consist of the following:
Machine learning is a type of artificial intelligence where algorithms, or models, use
massive amounts of data to improve their performance. Both structured data and
unstructured data are critical for training and validating machine learning algorithms
that underpin any AI system or process. The rise of Big Data and cloud computing
have exponentially increased the use cases and applications of AI, but having a lot of
data isn’t enough to create a successful machine learning model. Raw data is hard
to integrate with the cloud and machine learning models because there are still
anomalies and missing values that make the data hard to use or result in
inaccurate models. Building accurate and trustworthy machine learning
models requires a significant amount of data preparation.
According to a survey by Anaconda, data scientists spend 45% of their
time on data preparation tasks, including loading and cleaning data. With self-
service data preparation tools, data scientists and citizen data scientists can
automate significant portions of the data preparation process to focus their time on
higher-value data science activities.
Why Is Data Blending Important?
Data blending empowers a data analyst to incorporate data of any type or any
source into their analysis for faster, deeper business insights. Combining two or
more datasets often illuminates valuable information that might otherwise not be
discovered if the data wasn’t blended — information that provides a new perspective
that might lead to better business decisions. Traditionally, analysts have relied
on VLOOKUPs, scripting, and multiple spreadsheets for constructing datasets,
but this can be clunky and time consuming. Utilizing manual processes or relying
on data scientists to build analytical datasets is increasingly ineffective — it’s not
scalable with the number of ad-hoc requests analysts receive. Data blending
building blocks speed up the process of constructing datasets and can help analysts
and business leaders get more accurate answers
To live at the forefront of innovation, data analysis must focus on high-level
business questions rather than the minutiae of spreadsheets and manual SQL
queries. Data blending can help analysts take full advantage of their expanding
roles, as well as the expansion of data needed to make critical business decisions.
While there are many different techniques for bringing data together, from inner and
outer joins to fuzzy matching and unions, data blending boils down to four simple
steps.
Preparing Data
The first step in gathering data is to ask what information might be helpful to answer
the questions being asked. Identify pertinent datasets from various sources; a wide
array of structures or file types can be used. Each data source that is included will
need to share a common dimension in order to be combined.
The ability to transform these different types into a common structure that
allows for a meaningful blend, without manipulating the original data source, is
something that modern analytics technology can do in an automated and repeatable
way.
Blending Data
Combine the data from various sources and customize each join based on the
common dimension to ensure the data blending is seamless. Think about the
desired blended view and only include data that is essential to answer the questions
being asked, as well as any fields that may give additional context to those answers
when the analysis is reviewed. The resulting dataset should be easy to comprehend
and explain to stakeholders.
Reporting
1. Report Creation: users can generate reports by combining text, tables, charts,
maps, and images.
Module 3: Machine Learning Fundamentals Micro-Credential
1.Introduction to Machine Learning: The Alteryx Machine Learning
Fundamentals Micro-Credential is a certification designed to equip individuals with
the foundational skills necessary to perform machine learning tasks using the Alteryx
platform. This credential is suitable for data analysts, data scientists, and other
professionals who want to enhance their machine learning capabilities within
Alteryx's user-friendly environment. Here's an overview of what the micro-credential
encompasses:
Objectives
The primary objective of the Alteryx Machine Learning Fundamentals Micro-
Credential is to provide participants with a practical understanding of machine
learning concepts and the ability to implement these concepts using Alteryx tools.
The credential focuses on:
Data Preparation:
Techniques for cleaning and preprocessing data.
Handling missing values and outliers.
Feature engineering and selection.
2.Key Algorithms and Techniques
Alteryx Machine Learning offers a robust set of tools and functionalities that enable
users to perform a wide range of machine learning tasks. Here are some of the key
algorithms and techniques available within Alteryx Machine Learning:
3.Decision Trees:
Non-parametric algorithm used for classification and regression tasks.
Easy to interpret and visualize, suitable for understanding the decision-making
process.
4.Random Forest:
An ensemble method that builds multiple decision trees and combines their
outputs.
Provides robust performance and reduces the risk of overfitting.
5.Support Vector Machines (SVM):
Used for classification tasks, especially effective in high-dimensional spaces.
Finds the hyperplane that best separates different classes.
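Alteryx exposes these models through its visual interface, but the same algorithms can be sketched in a few lines of scikit-learn (assumed available here) on a tiny, made-up dataset:

```python
# Minimal sketch of decision tree vs. random forest, using scikit-learn
# on hypothetical data: [hours_studied, prior_score] -> pass (1) / fail (0).
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

X = [[1, 40], [2, 50], [3, 55], [6, 70], [7, 80], [8, 90]]
y = [0, 0, 0, 1, 1, 1]

# A single decision tree: easy to interpret, may overfit.
tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# A random forest: an ensemble of trees whose outputs are combined.
forest = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

tree_acc = tree.score(X, y)      # training accuracy
forest_acc = forest.score(X, y)  # training accuracy
```

On real data the forest's advantage shows up on held-out data, where averaging many trees reduces overfitting.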
Useful for customer segmentation and market basket analysis.
2. Hierarchical Clustering:
Builds a tree-like structure of nested clusters by iteratively merging or splitting
clusters.
Helps in understanding the hierarchical relationships among data points.
2. Feature Engineering:
Creating new features from existing data to improve model performance.
Includes techniques like polynomial features, interaction terms, and
logarithmic transformations.
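These three feature-engineering techniques can be sketched directly with pandas and NumPy; the column names and values below are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical raw feature table.
df = pd.DataFrame({"x1": [1.0, 2.0, 3.0], "x2": [10.0, 20.0, 30.0]})

# Polynomial feature: the square of x1.
df["x1_sq"] = df["x1"] ** 2

# Interaction term: the product of x1 and x2.
df["x1_x2"] = df["x1"] * df["x2"]

# Logarithmic transformation: log1p handles zero values safely.
df["log_x2"] = np.log1p(df["x2"])
```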
Fraction of missing values
Number of outliers
Target leakage
Class
Along with data analysis, Alteryx Machine Learning also offers many useful
features you can use to improve your machine learning skills and the quality of
your predictions.
Education Mode:
Education mode provides contextual guides that explain any terms you might not be
familiar with and can be toggled off and on as needed. Text explanations improve
your knowledge of machine learning terms, which will help you develop your skills as
you use Alteryx Machine Learning.
Correlation Matrix
The Correlation Matrix shows how two or more variables are related to each other,
so you can understand how one variable might affect another. The matrix provides
an intuitive visualization, with darker colors representing stronger correlations.
Chord Diagram
The Chord Diagram shows you the connections between data points, especially when
there are a large number of columns. The more lines connecting to one data point,
the higher the correlation it has to the predicted outcome.
Prediction Explanations
Prediction Explanations tell you how the prediction for a single row is explained by
the feature's values. This includes information on the performance of your models,
insights, and features. You can use this information in Alteryx Machine Learning to
help explain your model's results and modify the models to see how they'll react to
new information. When you're all set, you can download the graphics from Alteryx
Machine Learning.
Downloadable file types include:
PDF
Image
PowerPoint
Module 4: Alteryx Designer Core Certification
1. Introduction to Alteryx Designer
Key Features of Alteryx Designer
1.Data Preparation and Blending:
Alteryx Designer excels at integrating data from multiple sources, including
databases, spreadsheets, cloud applications, and more. Users can clean, transform,
and join data from these diverse sources without needing advanced coding skills.
The platform supports a wide array of input and output formats, ensuring flexibility in
handling various data types.
Data Preparation:
Intro to Data Analytics
Formatting Data
Sorting Data
Filtering Data
Sampling Data
Formatting Data:
SELECT Tool:
Designer makes it very easy to change the datatype at any point in the workflow. The
Select tool displays the columns in your data set. Use the Select tool to change the
datatype of those columns and to reorder, rename, and drop columns from the
data stream.
Sorting Data:
SORT Tool:
Organize your large data sets by sorting information in ascending or descending
order. The Sort tool's configuration window is simple but powerful. Select the column
to be sorted, then the order for sorting. You can select multiple columns for sorting in
a single tool. The Sort tool will work on the first column listed in its configuration,
then move on to the next.
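Alteryx's Select and Sort tools are visual, but the equivalent operations can be sketched in pandas (hypothetical columns, including an `unused` one to drop):

```python
import pandas as pd

df = pd.DataFrame({
    "id": ["1", "2", "3"],          # stored as strings
    "name": ["a", "c", "b"],
    "score": [88, 72, 95],
    "unused": [0, 0, 0],
})

# Select-tool analogue: change datatype, rename, drop, and reorder columns.
df = (df.astype({"id": "int64"})
        .rename(columns={"score": "final_score"})
        .drop(columns=["unused"])
        [["id", "final_score", "name"]])

# Sort-tool analogue: sort by the first column listed, then the next.
df = df.sort_values(by=["final_score", "name"], ascending=[False, True])
```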
Filtering Data:
FILTER Tool:
As you read more data into Designer, you will continue to increase the amount of
data in your workflow. That's great because you have it all in one place, but it can
get overwhelming when you are seeking specific information.
An extremely useful tool for dividing your data sets is the Filter tool. Using the Filter
tool, you can create logical statements. The incoming data set is evaluated against
that criteria and output to either the True anchor or the False anchor. The basic filter
option helps to construct your criteria, but you can also use the custom filter option
to create more complex statements.
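The Filter tool's two output anchors map naturally onto a boolean mask and its negation in pandas (made-up order data below):

```python
import pandas as pd

orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "amount": [250, 40, 125, 60],
})

# Filter-tool analogue: one logical statement, two output anchors.
mask = orders["amount"] >= 100
true_anchor = orders[mask]     # rows that meet the criteria
false_anchor = orders[~mask]   # everything else
```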
Sampling Data:
SAMPLE Tool:
After sorting data, you may be interested in a subset of the data. The Sample tool
provides options for selecting that subset. Use the radio button to select one of the
configuration options, then set the value for N. The Sample tool's output will only
include the specified data and drop the rest.
Removing Duplicates:
UNIQUE Tool:
Another common need when analysing data is finding the unique values within a
data set. The Unique tool will divide the data set into unique values and duplicate
values. Selecting a single column in the Unique tool's configuration window will
evaluate values in that column only. Selecting multiple columns will evaluate the
combination of values and determine if the combination is unique.
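The Sample tool's "first N rows" option and the Unique tool's single- vs. multi-column behaviour can be sketched with pandas (hypothetical data):

```python
import pandas as pd

df = pd.DataFrame({
    "customer": ["a", "a", "b", "c", "c", "c"],
    "city":     ["X", "X", "Y", "Z", "Z", "W"],
})

# Sample-tool analogue: keep only the first N rows.
first_3 = df.head(3)

# Unique-tool analogue on a single column: one row per customer.
unique_customers = df.drop_duplicates(subset=["customer"])

# Unique on a combination of columns: one row per (customer, city) pair.
unique_pairs = df.drop_duplicates(subset=["customer", "city"])
```

Note how the multi-column version keeps more rows: customer "c" appears with two different cities, so both combinations survive.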
Blending Data:
UNION Tool
When inputting more than one data set into a single workflow, you will likely need to
combine those data sets. The Union tool combines data sets vertically by name, by
position, or by manual configuration to align columns of data. The Union tool's input
anchor accepts multiple inputs and even includes an option to set a specific output
order.
JOIN Tool
If you need to combine data horizontally, the Join tool can utilize a common field or
combine by position. If both incoming data sets share a common column, joining on
that column can be used to match rows of data. Alternatively, if you are confident
that the row order of the data sets matches, you can join by record position. This tool
is very powerful and makes it easy to work with multiple data sources or combine
disparate data streams.
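The Union (vertical stack) and Join (horizontal combine on a common field) tools correspond to `pd.concat` and `pd.merge`; the monthly order tables below are invented:

```python
import pandas as pd

jan = pd.DataFrame({"order_id": [1, 2], "amount": [100, 200]})
feb = pd.DataFrame({"order_id": [3, 4], "amount": [150, 250]})
customers = pd.DataFrame({"order_id": [1, 2, 3, 4],
                          "customer": ["a", "b", "a", "c"]})

# Union-tool analogue: stack data sets vertically, aligned by column name.
all_orders = pd.concat([jan, feb], ignore_index=True)

# Join-tool analogue: combine horizontally on the common field.
enriched = all_orders.merge(customers, on="order_id", how="inner")
```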
Core Topics:
Path to Core Certification:
Date Time
Rows vs Columns
Functions
Expressions
Summarizing Data
DATETIME Tool:
Similar to the way some values need to be split in order to be as useful as possible,
date-time values need to be formatted properly in order to be most useful. Designer
contains functions which can calculate time intervals without needing extra
conversion calculations between units. In order to use those functions, date-time
values need to be formatted into a specific order. The Select tool is a go-to for
changing datatypes, but the requirement that the characters be arranged in the
correct order is something the Select tool cannot achieve. The DateTime tool,
however, can easily convert string data into properly formatted date-time data and
vice versa. Simply select the direction of conversion and the format of the string
value.
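The DateTime tool's string-to-datetime conversion (and back) can be sketched with pandas; the dates below echo the internship period but are otherwise arbitrary:

```python
import pandas as pd

# DateTime-tool analogue: convert string data into proper datetime values.
df = pd.DataFrame({"start": ["2024-04-23", "2024-05-01"],
                   "end":   ["2024-06-25", "2024-06-25"]})
df["start"] = pd.to_datetime(df["start"], format="%Y-%m-%d")
df["end"] = pd.to_datetime(df["end"], format="%Y-%m-%d")

# With real datetimes, interval math needs no extra unit conversions.
df["days"] = (df["end"] - df["start"]).dt.days

# And the other direction: datetime back to a formatted string.
df["start_str"] = df["start"].dt.strftime("%d/%m/%Y")
```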
Rows vs Columns:
An important concept to keep in mind when using Designer is that rows are not
treated the same as columns. Unlike some spreadsheet programs where you can
specify an array, values are tied to the headers above them in Designer. This is why
you will see some tools which require data to be oriented in a particular way in order
to function. Rows are also referred to as records and columns can be referred to as
fields.
Functions:
Altering data is an integral part of Designer. One of the most powerful ways to alter
your data is by applying functions. Designer includes a Function Library which is
categorized to help you find the one you need. Some functions will require that the
data be in a specific datatype, but others are agnostic. Regardless of which function
you need, you can use it in any tool that has an expression editor. The Expression
Editor is where you will construct your function by selecting the function you want to
use and properly formatting it into a statement. All expression editors have a tab with
the full Function Library, as well as a tab containing the columns and constants.
You can also choose which column to overwrite or create a new column with a name
and datatype of your choosing.
Expressions:
FORMULA Tool:
While there are many tools which support the use of functions, the most common is
the Formula tool. Using the Formula tool, you can utilize values from other columns
to perform calculations, categorize, convert datatypes, format values, and much
more. The only limitation is that values in the statement are limited to the current row
being processed.
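The Formula tool's row-at-a-time expressions (calculate, categorize, convert datatypes) can be sketched with pandas column expressions; the price/quantity data is hypothetical:

```python
import pandas as pd

df = pd.DataFrame({"price": [100.0, 250.0, 30.0],
                   "qty":   [2, 1, 5]})

# Formula-tool analogue: expressions see only the current row,
# writing results to a new column or overwriting an existing one.
df["total"] = df["price"] * df["qty"]                                    # calculation
df["tier"] = df["total"].apply(lambda t: "high" if t >= 200 else "low")  # categorize
df["price"] = df["price"].astype("int64")                                # overwrite with a new datatype
```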
ALTERYX SPARKED:
Alteryx SparkED is a free program that offers data analytics education to learners of
all levels, including students, career changers, and military veterans:
Learners: Receive a free license to Alteryx Designer, access to online
courses, and opportunities to earn certifications.
Educators: Get free curriculum materials and real-world data sets.
Customers: Collaborate with SparkED on datathons and career outreach.
SparkED has helped over 170,000 learners in more than 50 countries develop
in-demand data analytics skills. The program can be integrated into many fields of
study, including finance, accounting, marketing, and supply chain.
Teaching tools: Learners receive teaching tools to help them learn.
Learning experiences: Learners receive learning experiences to help them
solve problems with data.
Financial assistance: Learners from diverse backgrounds can receive
financial assistance.
Career experiences: Learners can gain career experiences.
Alteryx SparkED is a free data analytics education program from Alteryx, a data
analytics automation platform.
Alteryx is a data analytics automation platform that can help users collect, prepare,
and blend data. Some benefits of using Alteryx include:
Increased efficiency: Alteryx's automation capabilities can help users
streamline data workflows
Improved data quality: Alteryx's data preparation and blending capabilities
can help improve data quality
Cost savings: Alteryx can help users save money
Answers to complex business questions: Alteryx can help users find
answers to complex business questions
HISTORY OF ALTERYX:
Olivia Duane Adams (Libby) is a co-founder of Alteryx and the chief advocacy officer
(CAO). She is one of a few female founders to take a technology company public.
The other co-founders of Alteryx are Dean Stoecker, who is the executive chairman,
and Ned Harding, who was the original CTO.
Alteryx is a company that makes data analytics tools for businesses. Co-founders
Stoecker and Harding left their jobs in 1997 to start Spatial Re-Engineering
Consultants (SRC LLC), which relaunched as Alteryx in 2010.
Alteryx SparkED is an education program that offers free software licenses, teaching
tools, and learning experiences. The program is designed to help learners
understand data, question data, and solve with data.
Alteryx SparkED is a free program that helps learners acquire data analytics skills
and prepare for tech careers. It offers resources for:
Learners: Free software licenses, teaching tools, and learning experiences to
help learners question, understand, and solve with data
Educators: Teaching materials and a connection with other educators
Customers: Opportunities to collaborate with SparkED on datathons and
career outreach
Students: Opportunities to acquire Alteryx Micro-Credentials and
Certifications, network, and discover opportunities with potential employers
Scholars: Financial assistance, career experiences, and other engagements
SparkED can be used in many fields of study, including: Finance, Accounting,
Marketing, and Supply chain.
SparkED has helped over 170,000 learners acquire data analytic skills. It partners
with higher education systems to provide a complete education package. SparkED
also partners with online-learning platforms like Datacamp and Udacity.
A data analytics process automation virtual internship helps participants learn how to
automate data processes, perform data analysis, and apply machine learning
techniques. The internship provides hands-on experience with data preparation,
blending, and cleansing, as well as an understanding of how to integrate various
data sources. Participants also learn the fundamentals of predictive analytics,
including statistical modeling, machine learning, and advanced data visualization.
Data analytics automation can help businesses in many ways, including:
Faster results
Automated programs can process data faster than humans, which can help
businesses save time and money.
Increased efficiency
Automation can help businesses increase efficiency by allowing employees to spend
more time on other tasks.
Improved data accuracy
Automation can help businesses improve the accuracy of their data.
Better scalability
Automation can help businesses scale their data operations and expand new ideas
quickly.
Alteryx SparkED Virtual Internship is an exciting opportunity for students and young
professionals to gain hands-on experience in data analytics and science. SparkED is
Alteryx's analytics education program, designed to help learners develop data
analytic skills and kickstart their careers.
Through the virtual internship, participants can expect to work on real-world projects,
collaborate with Alteryx experts, and develop skills in data preparation, analysis, and
visualization. The program aims to provide a comprehensive learning experience,
covering various aspects of data science, including machine learning, predictive
analytics, and data storytelling.
Some of the key benefits of the Alteryx SparkED Virtual Internship include:
- Hands-on experience: Work on real-world projects and develop practical skills in
data analytics and science.
- Mentorship: Collaborate with Alteryx experts and receive guidance and feedback on
your projects.
- Career development: Enhance your career prospects by developing in-demand
data analytic skills.
- Networking opportunities: Connect with like-minded professionals and Alteryx
experts, potentially leading to valuable connections and job opportunities.
To learn more about the Alteryx SparkED Virtual Internship and how to apply, I
recommend visiting the Alteryx website and exploring the SparkED program in more
detail.
The Alteryx SparkED Virtual Internship Program is a remote internship opportunity
offered by Alteryx, a data analytics software company. The program is designed to
provide students with hands-on experience in data analytics and process automation
using Alteryx tools.
Key Features:
* Focus: Data analytics and process automation using Alteryx software.
* Format: Virtual internship, allowing participation from anywhere.
* Duration: Typically lasts for a specific period, such as a semester or summer.
* Learning: Students learn to use Alteryx tools to create data workflows, manipulate
data, and automate tasks.
* Projects: Interns work on real-world projects to apply their skills and gain practical
experience.
* Mentorship: Students often receive guidance and mentorship from Alteryx
professionals.
Benefits for Participants:
* Skill Development: Gain valuable skills in data analysis, process automation, and
Alteryx software.
* Real-World Experience: Work on practical projects that simulate real-world
business challenges.
* Portfolio Enhancement: Build a strong portfolio to showcase your skills to potential
employers.
* Networking: Connect with Alteryx professionals and other interns in the field.
* Career Advancement: Increase your chances of landing a job in data analytics or a
related field.
Eligibility:
* Typically open to students pursuing degrees in relevant fields like computer
science, data science, business analytics, or related disciplines.
* May have specific requirements regarding academic standing or prior experience.
Overall, the Alteryx SparkED Virtual Internship Program offers a valuable opportunity
for students to develop in-demand data analytics skills, gain practical experience,
and enhance their career prospects.
The Alteryx SparkED Data Analytics Process Automation Virtual Internship is a
program designed to equip students with the skills and knowledge needed to
succeed in the field of data analytics. The program provides participants with hands-
on experience using Alteryx Designer, a powerful tool for data analysis and
automation.
The internship curriculum covers a wide range of topics, including data preparation,
transformation, analysis, and visualization. Participants will learn how to use Alteryx
Designer to create complex data workflows, automate repetitive tasks, and generate
actionable insights from data.
The internship also provides participants with the opportunity to earn Alteryx
Designer certification, which can enhance their credibility as data analysts and
increase their job prospects.
Introduction to Data Analytics
Data analytics is the process of using data to solve problems and find insights. It
involves collecting, organizing, and transforming data to make predictions, draw
conclusions, and inform decisions. Data analytics can be used to improve business
processes, foster growth, and improve decision-making.
Here are some things to know about data analytics:
What it involves
Data analytics uses a variety of tools, technologies, and processes to analyze
data. It can include math, statistics, computer science, and other techniques.
What it's used for
Data analytics can help businesses understand their performance, customer
behavior, and market trends. It can also help companies make better decisions by
using the data they generate from log files, web servers, social media, and more.
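As a small illustration of that loop from raw records to a decision, here is a sketch in Python with pandas; the sales table and region names are invented for the example:

```python
import pandas as pd

# Invented sales log standing in for data from log files, web servers, etc.
sales = pd.DataFrame({
    "region": ["North", "South", "North", "South", "North"],
    "revenue": [120, 80, 150, 95, 130],
})

# Collect -> organize -> summarize: totals per region inform the decision
revenue_by_region = sales.groupby("region")["revenue"].sum()
best_region = revenue_by_region.idxmax()
print(revenue_by_region)
print(f"Strongest region this period: {best_region}")
```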
Soft skills
Some soft skills that are useful for data analytics include:
Analytical thinking and problem-solving
Strong communication and presentation skills
Attention to detail
Critical thinking
Adaptability
Data analytics focuses on analyzing past data to derive insights and make decisions based on historical trends. Data science, on the other hand, encompasses a broader scope, including data analysis, machine learning, predictive modelling, and more, to solve complex problems and uncover new insights from data.
We need data analytics because business data analytics collects, processes, and analyzes data to help make smart decisions. By looking at past data, businesses can predict what is coming next, helping them act before problems arise.
Data analysis tools are software programs, applications, and other aids that
professionals use to analyze data sets in ways that characterize the big picture of the
information and provide usable information for meaningful insights, predictions, and
decision-making purposes.
1. RapidMiner
2. Orange
Primary use: Data mining
o Orange is a package renowned for data visualization and analysis,
especially appreciated for its user-friendly, color-coordinated interface.
You can find a comprehensive selection of color-coded widgets for
functions like data input, cleaning, visualization, regression, and
clustering, which makes it a good choice for beginners or smaller
projects.
3. KNIME
Primary use: Data mining
o KNIME, short for KoNstanz Information MinEr, is a free and open-
source data cleaning and analysis tool that makes data mining
accessible even if you are a beginner. Along with data cleaning and
analysis software, KNIME has specialized algorithms for areas like
sentiment analysis and social network analysis.
4. Tableau
Primary use: Data visualization and business intelligence
o Tableau stands out as a leading data visualization software, widely
utilized in business analytics and intelligence.
Tableau is a popular data visualization tool thanks to its easy-to-use interface and powerful capabilities. Its software can connect with hundreds of different data sources and manipulate the information in many different visualization types.
5. Google Charts
Primary use: Data visualization
o Google Charts is a free online tool that excels at producing a wide array
of interactive and engaging data visualizations. Its design caters to user-
friendliness, offering a comprehensive selection of pre-set chart types
that can be embedded into web pages or applications.
6. Datawrapper
Primary use: Data visualization
o Datawrapper is a tool primarily designed for creating online visuals, such
as charts and maps. Initially conceived for journalists reporting news
stories, its versatility makes it suitable for any professional in charge of
website management.
8. Qlik
Primary use: Business intelligence
o Qlik is a global company whose solutions help businesses utilize data for decision-making and problem-solving. It provides comprehensive, real-time data integration and analytics to turn data into valuable insights.
9. Google Analytics
Primary use: Business intelligence
o Google Analytics is a tool that helps businesses understand how people
interact with their websites and apps. To use it, you add a special JavaScript snippet to your web pages. This code collects information when someone visits your website, like which pages they see, what device they're using, and how they found your site.
10. Spotfire
Primary use: Business intelligence
o TIBCO Spotfire is a user-friendly platform that transforms data into
actionable insights. It allows you to analyze historical and real-time data,
predict trends, and visualize results in a single, scalable platform.
APA isn’t Robotic Process Automation (RPA) or Business Process Automation (BPA), and it isn’t a data tool, either.
It’s an automated, self-service data analytics platform that focuses on business
outcomes first while empowering everyone in your organization to adopt a culture of
analytics. It allows you and anyone else in your organization to perform advanced
analytics whether or not they know any code and whether or not they’re trained in
data science. Analytic Process Automation (APA) describes a unified platform for
self-service data analytics that makes data easily available and accessible to
everyone in your organization, optimizes and automates data analytics and data
science processes, and empowers your entire organization to develop skills and
make informed decisions using machine learning (ML), artificial intelligence (AI), and
predictive and prescriptive analytics.
Think of Analytic Process Automation as an all-in-one supercharged data analytics
machine. If all you want to do is clean up some data, you can do that. If you want to
merge multiple different data types, you can do that, too. If you want to automate month-long, tedious, and complex data tasks and push those outputs to decision-makers and business processes, you can do that as well.
If you want advanced analytics that provide forward-looking insights you can use to
produce better business outcomes, improve revenue, reduce spending, and
transform your organization from the ground up, then you need APA.
Integration: Integrating automated systems with legacy systems can be
challenging.
Data security: Automated systems may be vulnerable to data security risks.
Data quality: Poor data quality, such as missing fields or duplicated data, can
lead to inaccurate analytics and poor decision-making.
Adaptability: Automated tools may struggle with dynamic websites and may
not be able to adapt to changing requirements.
Contextual understanding: Automated tools may miss nuanced information.
Monitoring: Continuous monitoring is needed to ensure data accuracy.
Other disadvantages of automation include job displacement and unemployment, reduced human interaction and customer experience, and dependency on technology with loss of human skills.
Data analytics automation is a technique that uses computer systems and processes
to perform analytical tasks with little or no human intervention. It can be useful for
many reasons, including:
Time and money savings: Automation can save time and money by
eliminating manual, repetitive tasks.
Improved accuracy: Automation can improve the accuracy of data.
Faster insights: Automation can provide insights faster.
Frees up time: Automation can free up time for employees to focus on
higher-level tasks, such as interpreting automated data.
Better scalability: Automation can improve scalability.
Competitive edge: Automation can help an organization gain a competitive
edge by leading to new products and tweaks to existing ones.
Automation can be used for a variety of tasks, including data discovery, preparation,
replication, and warehouse maintenance. It can be especially useful for big data.
Automation can range from simple scripts to full-service tools that can perform
exploratory data analysis, statistical analysis, and model selection.
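At the "simple script" end of that range, a short Python function (a hypothetical sketch using pandas) can run a hands-off exploratory pass over any table:

```python
import pandas as pd

def automated_eda(df: pd.DataFrame) -> dict:
    """Collect basic facts about a DataFrame with no human intervention."""
    return {
        "rows": len(df),
        "columns": list(df.columns),
        "missing_values": int(df.isna().sum().sum()),
        "numeric_summary": df.describe(),
    }

# Invented example data; any tabular dataset could be passed in
df = pd.DataFrame({"amount": [10.0, None, 30.0], "status": ["ok", "ok", "late"]})
report = automated_eda(df)
print(report["rows"], report["missing_values"])
```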
Machine learning (ML) can automate many steps in the data analytics process,
including:
Data cleansing: ML algorithms can automatically detect and fix errors,
inconsistencies, or missing data.
Data transformation: ML models can automatically transform raw data into a
more usable format.
Feature engineering: ML can automate feature selection and engineering,
which are essential for building predictive models.
Predictive analytics: ML models can identify patterns, trends, and
correlations in data to help predict future trends or events.
Reducing bias: ML models can help reduce unintended bias.
Network architecture search: Some automation of network architecture
searches is possible, such as with the neural architecture search (NAS)
method.
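A minimal sketch of several of these steps chained together, using scikit-learn's Pipeline on invented toy data (imputation for cleansing, scaling for transformation, logistic regression for prediction):

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Toy feature matrix with one missing value; labels are invented
X = np.array([[1.0, 2.0], [2.0, np.nan], [3.0, 6.0], [4.0, 8.0]])
y = np.array([0, 0, 1, 1])

# One reusable object automates cleansing, transformation, and prediction
model = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),
    ("scale", StandardScaler()),
    ("clf", LogisticRegression()),
])
model.fit(X, y)
print(model.predict([[1.0, 2.0]]), model.predict([[4.0, 8.0]]))
```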
Data analytics automation can help businesses make better, more informed
decisions by analyzing large sets of data quickly and efficiently. Here are some other
benefits of data analytics automation:
Faster work: Employees can use their time on high-value work.
Improved customer satisfaction: Data analytics automation can help
improve customer satisfaction.
Free up time: Businesses can focus on creating and selling products and
services.
Some of the most common scripting languages for process automation are Python,
Ruby, and PowerShell. Python is a versatile and easy-to-learn language that has
many libraries and frameworks for web development, data analysis, and machine
learning.
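As a tiny example of what such a Python automation script might look like (the task, sorting files into folders by extension, and the file names are invented for this sketch):

```python
from pathlib import Path
import shutil
import tempfile

def sort_by_extension(folder: Path) -> None:
    """Move every file into a subfolder named after its extension."""
    for item in list(folder.iterdir()):
        if item.is_file():
            target = folder / item.suffix.lstrip(".").lower()
            target.mkdir(exist_ok=True)
            shutil.move(str(item), str(target / item.name))

# Demo in a throwaway directory
workdir = Path(tempfile.mkdtemp())
for name in ("report.csv", "notes.txt", "data.csv"):
    (workdir / name).touch()
sort_by_extension(workdir)
print(sorted(p.name for p in (workdir / "csv").iterdir()))
```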
Working with RPA Tools: You might use robotic process automation tools such as UiPath, Automation Anywhere, or Blue Prism. This could include automating data entry, report generation, web scraping, or other workflows.
Developing Automation Scripts: Depending on the internship, you might
write scripts or code to automate more complex processes. This could involve
using Python, Java, or other programming languages.
Real-World Applications: You might work on projects related to customer
service, finance, human resources, or supply chain management, applying
your skills to solve real business problems.
Where to find these internships:
Company Websites: Check the career pages of companies known for data
analytics and automation (e.g., Accenture, Deloitte, IBM, UiPath, Automation
Anywhere).
Job Boards: Search for "data analytics process automation internship" or
related terms on sites like Indeed, LinkedIn, Glassdoor, and Monster.
Internship Platforms: Explore platforms like Internshala, LetsIntern (India-
specific), WayUp, and Chegg Internships.
Networking: Attend online industry events, webinars, and connect with
professionals on LinkedIn to learn about potential opportunities.
Tips for your search:
Develop Relevant Skills: Build a foundation in data analysis (Excel, SQL,
Python, R) and familiarize yourself with basic automation concepts.
Highlight Your Interest: In your resume and cover letter, clearly express your
enthusiasm for data analytics and process automation.
Portfolio Projects: Create personal projects that demonstrate your abilities
(e.g., automate a task in your daily life, analyze a public dataset).
Practice Your Skills: Use online resources and practice platforms (like
HackerRank or LeetCode) to improve your data analysis and coding skills.
# Example values; the message text below is illustrative
file_name = "automated_file.txt"
text = "This message was written automatically."

# Open the file in write mode (this will create the file if it doesn't exist)
with open(file_name, 'w') as file:
    file.write(text)
print(f"File '{file_name}' has been created and the message has been written.")
Output
File 'automated_file.txt' has been created and the message has been written.
After executing this code, you will find the file with text at the same location/directory
as your Python file.
PYTHON CODE:
LOGISTIC REGRESSION USING MACHINE LEARNING
# Logistic Regression
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# X (feature matrix) and y (target labels) are assumed to be loaded
# from the project dataset before this point

# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.5, random_state = 0)
# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

# Training the Logistic Regression classifier on the Training set
from sklearn.linear_model import LogisticRegression
classifier = LogisticRegression(random_state = 0)
classifier.fit(X_train, y_train)
# Visualising the Training set results
from matplotlib.colors import ListedColormap
X_set, y_set = X_train, y_train
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01),
                     np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01))
plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha = 0.75, cmap = ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                c = ListedColormap(('red', 'green'))(i), label = j)
plt.title('Logistic Regression (Training set)')
plt.xlabel('Loan Amount')
plt.ylabel('Income')
plt.legend()
plt.show()
# Visualising the Test set results reuses the plotting code above,
# with X_set, y_set = X_test, y_test
plt.title('Logistic Regression (Test set)')
plt.xlabel('Loan Amount')
plt.ylabel('Income')
plt.legend()
plt.show()

# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
print("Confusion matrix:")
print(confusion_matrix(y_test, classifier.predict(X_test)))
OUTPUT
runfile('C:/Users/TEJASWINI/OneDrive/Documents/project/logistic2.py',
wdir='C:/Users/TEJASWINI/OneDrive/Documents/project')
Confusion matrix:
[[ 0 0 0 14 0 0 0 1]
[ 0 0 0 1 0 0 0 0]
[ 0 0 0 8 0 0 0 0]
[ 0 0 0 223 0 0 0 0]
[ 0 0 0 11 0 0 0 0]
[ 0 0 0 52 0 0 0 0]
[ 0 0 0 12 0 0 0 1]
[ 0 0 0 21 0 0 0 17]]
You can build an HTML page for Data Analytics professionals completely from scratch, even without experience in coding or design. The template library makes it even simpler to visualize and set up your landing page.
HTML CODE:
DATA ANALYTICS AUTOMATION DASHBOARD
<!DOCTYPE html>
<html>
<head>
<title>Data Analytics Automation Dashboard</title>
<style>
body {
font-family: Arial, sans-serif;
}
.container {
width: 80%;
margin: 40px auto;
padding: 20px;
background-color: #f9f9f9;
border: 1px solid #ddd;
box-shadow: 0 0 10px rgba(0,0,0,0.1);
}
.header {
background-color: #333;
color: #fff;
padding: 10px;
text-align: center;
}
.section {
margin-bottom: 20px;
}
.section-header {
background-color: #f0f0f0;
padding: 10px;
border-bottom: 1px solid #ddd;
}
.button {
background-color: #4CAF50;
color: #fff;
padding: 10px 20px;
border: none;
border-radius: 5px;
cursor: pointer;
}
.button:hover {
background-color: #3e8e41;
}
</style>
</head>
<body>
<div class="container">
<div class="header">
<h1>Data Analytics Automation Dashboard</h1>
</div>
<div class="section">
<div class="section-header">
<h2>Data Source Selection</h2>
</div>
<form>
<label for="data-source">Select Data Source:</label>
<select id="data-source" name="data-source">
<option value="csv">CSV File</option>
<option value="database">Database</option>
<option value="api">API</option>
</select>
<button class="button">Submit</button>
</form>
</div>
<div class="section">
<div class="section-header">
<h2>Data Processing</h2>
</div>
<form>
<label for="data-processing">Select Data Processing Task:</label>
<select id="data-processing" name="data-processing">
<option value="cleaning">Data Cleaning</option>
<option value="transformation">Data Transformation</option>
<option value="analysis">Data Analysis</option>
</select>
<button class="button">Submit</button>
</form>
</div>
<div class="section">
<div class="section-header">
<h2>Data Visualization</h2>
</div>
<form>
<label for="data-visualization">Select Data Visualization Type:</label>
<select id="data-visualization" name="data-visualization">
<option value="bar-chart">Bar Chart</option>
<option value="line-chart">Line Chart</option>
<option value="scatter-plot">Scatter Plot</option>
</select>
<button class="button">Submit</button>
</form>
</div>
</div>
</body>
</html>
OUTPUT
CERTIFICATES OF COMPLETION AND BADGES