CHAPTER 2
The Games
We all love playing make believe, don’t we? As we discussed in
the last chapter, assume that you, dear reader, are the new
consultant at a company that works with the Alteryx Analytical
Platform.
This book is your mentor and here is our first problem to
solve!
Welcome Aboard!
[...]
The first question we are going to explore: Which country has
produced the best Freestyle Skiing results between the 2002 and
2006 Winter Games?
Assume each Gold is worth 3 points, Silver is worth 1.5 points,
and Bronze is worth 1 point.
Something important to recognize is that I am asking you for the
answer to a specific question. Once you've got some of the basics
down, we'll talk about making a generator so that you or your end
user can ask related questions. [...]
Thanks,
2.1 Tools & Concepts
In this chapter, we will cover the improved features for Alteryx
Designer 10.x and the various tools and concepts as mentioned
below:
Tools | Concepts
Browse | Importing Data
Comment | Viewing Data
Filter | Outputting Data
Formula | Identifying Desired Results
Input Data | Answers to Questions
Join | Tidy Datasets
Output Data | Normalized Datasets
Running Total | Creating Calculations
Sample | Combining Data
Select | Creating Data Subsets
Sort | Summarizing Data
Summarize | Organizing Data
Tool Container | Organizing Workflows
Transpose | Documenting Your Work
Union |
2.2 Improved Features
Several updates were implemented since Alteryx 10.0 was
released. Most of the tool icons got a new look, and some tools
got new features. New features for each tool will be covered in
their respective sections. Here, we will cover the new UI and
Alteryx Designer's updated features:
+ Reading and writing Excel files (.xlsx) is done using a new
Alteryx driver for Microsoft Excel.
+ The Predictive Tools Installer has been separated from the
Alteryx Designer install to allow for future updates to the
Predictive Tools and packages without the need for an
update to the Designer.
+ After running a workflow, clicking on the Input or Output
tools will automatically populate results (up to 1MB).
+ Tool palette is customizable to show only those categories
and tools you want to use.
+ The File menu is reorganized to be simpler and more
intuitive to use.
+ The enhanced integration between the Designer and
Gallery makes it easier to collaborate on workflows hosted
in a Gallery. From within the Designer, one can add and
maintain connections to a Gallery, open Gallery
workflows and edit them, and then save a new version of
the workflow back to the Gallery. Previous versions of the
workflow are maintained and can be retrieved via a
version history window in both the Gallery and the
Designer. Additionally, any workflow version can be
made the “published” version that users will see by
default.
+ In-Database support is added for these Data Platforms:
Amazon Redshift, Impala, Teradata, and Spark.
+ Changes to the Connect In-DB tool include: new
streamlined UI, new option to allow password decryption,
and new option for creating file-based In-Database
connections in order to simplify obtaining connection
information from IT Admin/DBA for server users.
+ Changes to the Data Stream In tool include: new option to
allow password decryption, and users can write out to a
permanent table when streaming data from an external
data source.
+ New Macro Input In-DB and Macro Output In-DB tools
have been added for In-Database processing so users can
build macros with In-Database.
+ New Dynamic Input In-DB and Dynamic Output In-DB
tools have been added for In-Database processing so users
can retrieve the underlying SQL query and other metadata
info being sent to the database or to the In-Database tools
in a workflow.
+ These connectors are now supported: Amazon Redshift
bulk load (write), Netsuite (read/write), Qlik (re...),
PostgreSQL 9.4 (read/write), SAP Hana (read/write),
Spark SQL (read).
+ Users can now read and write to JSON from the Input and
Output Data tools.
+ Users can now browse to their Hadoop distributed file
system and read/write to HDFS via the new connections
in the Input Data tool. (HDFS tools released in 9.5 have
been deprecated.)
+ Salesforce and Marketo connector tools built on the REST
API provide enhanced functionality. The existing SOAP-
based connectors are now deprecated.
+ MongoDB Input and Output tools now support version
3.0. Mongo input will also read from replica set members
including primary and secondary.
+ Users now have the ability to retrieve a list of sheet names
from an Excel (.xlsx) file and read the list via the Alteryx
XLSX driver.
+ Apache Avro support is no longer listed as being in beta.
+ Support for SQLite has now been upgraded to 3.8.9.
+ The Download tool now supports multithreading to
increase the speed at which data is retrieved. The
maximum number of connections that can be used in the
same tool is 32.
+ Authenticated proxy can now be enabled from User
Settings. The Download Tool, Amazon Redshift Bulk
loader, Amazon S3 Upload, and Amazon S3 Download
tools will function through a proxy server.
+ Alias Repository has been renamed to Alias Manager and
supports both standard as well as In-Database
connections.
+ A new option that provides information about the
performance of tools in a workflow is available. Select the
“Enable Performance Profiling” checkbox on the Runtime
tab of the workflow, and run the workflow to view the
percentage of time spent processing each tool in the
Results window. This option should only be used when
debugging a specific workflow, as it may decrease the
performance of the workflow slightly.
+ Sample workflows that demonstrate different configuration
options for one tool at a time are One Tool Examples.
Twenty new examples have been added, which can be
accessed via Help > Sample Workflows > One Tool Examples.
In/Out: Directory; Map Input.
Preparation: Auto Field; Generate Rows; Imputation;
RecordID; Tile; Unique.
Join: Append Fields; Find/Replace; Fuzzy Match; Make Group.
Parse: Date/Time; RegEx; Text To Columns; XML Parse.
Transform: Count Records; Cross Tab; Running Total;
Transpose.
+ The Block Until Done tool now has numbered outputs that
output “in order.”
+ The Map tool now has a zoom/bounds option that makes
the map zoom/pan to the reference file.
+ In the Alteryx Designer version 10.x, the Configure
Workflow and Results windows are split into two distinct
views.
Figure 2-1 - Workflow and Results windows
+ Configuration and Results windows can be dragged and
moved to different positions.
+ Every tool in Alteryx now has a Preview feature to show
a snapshot of the data it contains.
Figure 2-3 - Preview Data
2.3 Browse
The Browse tool gives us a tabular view
of the data in a data stream at the point
it is connected.
Figure 2-4 - Browse
Group | Input | Output
In/Out | Data stream | None
Note: It is very important for the development of workflows,
applications, and macros, but it should be disabled when
development is completed to improve speed.
An Action tool can connect to the Lightning Bolt Anchor to
modify how this tool works in apps and macros.
Results Window:
The Browse Results Window allows us to view the data that
was in the data stream during the last run.
The message option shows the number of records processed,
processing time, and any errors during processing.
Figure 2-5 - Browse, Results
The two icons on the right side (shown above) are Copy
and Save functions. They allow us to directly copy the data out of
the Browse tool or save it to a file.
Figure 2-6 - Browse, Field Names
Clicking on the down arrow shown above gets us a list of
all field names so we can select only the relevant fields to be
displayed. The checkboxes in this list allow us to select or
deselect every field.
The text shown after clicking on Cell Viewer depends on
what is selected in the records.
Figure 2-7 - Browse, Cell Viewer
Clicking Record # selects everything, allowing us to see the
information about each column. If a single column is selected, we
see metadata about that column. If a row is selected, we see how
much data the record has in it. If a cell is selected, we can see the
contents of that cell formatted with all line breaks.
Figure 2-8 - Browse, Side by Side
We have the ability to compare the dataset either Side-by-
Side (vertical compare) or Top-and-Bottom (horizontal compare)
by clicking on the new window button located at the top-right of
the results window.
Figure 2-9 - Browse, Top and Bottom
You will also notice that we have an Input button on the
Properties window side pane.
Figure 2-10 - Browse, Input
This is the Input window. It tells us information about each
field that comes into the tool: the field name, field type, total size
of each cell, the original source of the data, and a description of
the field.
2.4 Crosstab
The Crosstab tool creates a normalized
(more human-readable) dataset by
creating columns out of the rows of
data.
Figure 2-11 - Crosstab
Group | Input | Output
Transform | Any data stream | Data stream wider than input
Note: The Crosstab tool will convert all spaces and special
characters to underscores in the column headers.
An Action tool can connect to the Lightning Bolt Anchor to
modify how this tool works in apps and macros.
Properties Window:
The Cross Tab Properties window has four components,
as shown in the following figure.
Figure 2-12 - Crosstab, configuration
We notice the grey boxes that have arrows at the top-left
and bottom-right corners. These show us what the input and
output fields are. These boxes are standard for tools that have
input and output.
+ Grouping Fields allows us to select which fields we want to
group by in the resulting dataset. If nothing is selected, we
will only get a single record as output.
+ Header Field is the field we are splitting into multiple
columns.
+ Data Field is the field we want to put in each record for the
columns created by the Header Field.
+ Methodologies (Numeric Data Field) allows us to select the
type of aggregation method used if we have multiple data
entries that fit into the same cell of the resulting dataset.
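To make the four components concrete, the following is a minimal Python sketch of the same idea, not Alteryx's implementation. The function name `cross_tab`, the sample field names, and the choice of `sum` as the default aggregation are all assumptions made for illustration.

```python
from collections import defaultdict


def cross_tab(rows, group_fields, header_field, data_field, method=sum):
    """Pivot a list of dicts so each distinct header value becomes a column."""
    cells = defaultdict(lambda: defaultdict(list))
    for row in rows:
        key = tuple(row[f] for f in group_fields)
        # Spaces and special characters in headers become underscores.
        header = "".join(c if c.isalnum() else "_" for c in str(row[header_field]))
        cells[key][header].append(row[data_field])
    result = []
    for key, by_header in cells.items():
        record = dict(zip(group_fields, key))
        for header, values in by_header.items():
            # The methodology resolves several entries landing in one cell.
            record[header] = method(values)
        result.append(record)
    return result


# Hypothetical input: one row per medal entry.
medals = [
    {"Country": "A", "Medal Type": "Gold", "Count": 2},
    {"Country": "A", "Medal Type": "Gold", "Count": 1},
    {"Country": "A", "Medal Type": "Silver", "Count": 1},
    {"Country": "B", "Medal Type": "Top 3", "Count": 4},
]
wide = cross_tab(medals, ["Country"], "Medal Type", "Count")
ungrouped = cross_tab(medals, [], "Medal Type", "Count")
```

Here `wide` has one record per country, while `ungrouped` collapses to a single record because no grouping field was chosen.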
2.5 Comment
The Comment tool gives us the ability
to write notes on our workflows to
add additional information on the
data stream.
Figure 2-13 - Comment
Group | Input | Output
Documentation | None | None
Note: Comment is an annotative tool that helps give meaning to
developers using this workflow.
The Comment Properties window allows us to customize
the comment field that appears on the canvas.
Figure 2-14 - Comment, configuration
The configuration settings in the figure above create the
comment to the right of the window. We can edit text, shape, font
and color of the comment background, adjust alignment of the
text, or select an image to write over.
Using the settings in the Comment tool allows us to create
easily recognizable, distinct comments throughout our data
stream.
2.6 Filter
The Filter tool gives us the ability to
create a function that will split the
data row by row into either the true or
false outputs.
Figure 2-15 - Filter
Group | Input | Output
Preparation | Any data stream | T & F (see section below)
Note: The formula must evaluate to a True or False Boolean value
(null evaluates to False unless the formula is looking for nulls).
The formulas we create here can be arbitrarily complex, and thus
can significantly slow our data stream. “//” is the comment
character.
Application questions can be connected to the Top Black Question
Anchor to use those answers in this tool.
An Action tool can connect to the Lightning Bolt Anchor to modify
how this tool works in apps and macros.
Output T: This is the set of records where the formula evaluated
to true.
Output F: This is the set of records where the formula evaluated
to false.
A basic filter was added to the Filter tool. One can use this basic
filter to quickly construct a simple query on a single field in the
incoming data stream.
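The row-by-row split into T and F outputs, including the handling of nulls, can be sketched in Python as follows. This is an illustration only: `filter_rows`, the two expression functions, and the `Year` field are hypothetical names, and Python's `None` stands in for a null.

```python
def filter_rows(rows, expression):
    """Split rows into (true_output, false_output) using a per-row expression."""
    true_output, false_output = [], []
    for row in rows:
        result = expression(row)
        # Only an explicit True goes to T; False and None (null) both go to F.
        if result is True:
            true_output.append(row)
        else:
            false_output.append(row)
    return true_output, false_output


def year_at_least_2002(row):
    # A comparison against a null yields null rather than True.
    return None if row["Year"] is None else row["Year"] >= 2002


def year_is_null(row):
    # A formula that explicitly looks for nulls can send them to T.
    return row["Year"] is None


rows = [{"Year": 2002}, {"Year": 1998}, {"Year": None}]
recent, not_recent = filter_rows(rows, year_at_least_2002)
missing, present = filter_rows(rows, year_is_null)
```

With the first expression the null row lands in the F output; with the second it lands in T, because that expression is looking for nulls.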
Properties Window:
We see that there are two different ways to create a filter:
Basic Filter and Custom Filter. The regions for each are as shown
below.
Figure 2-16 - Filter, basic filter
+ Basic Filter: allows us to pick a field and operator, and
type in the value that the field should be compared to.
The options for the Basic Filter change depending on the
field type. The Basic Filter option allows us to easily create simple
filters as well as start building formulas before we know the
syntax.
As we enter values into the Basic Filter options, it populates the
Expression box at the bottom of the window. This helps us learn
the associated syntax.
+ Custom Filter: allows us to click on variables, functions, and
saved expressions to populate the expression, or to
type the formula directly. A sample custom filter is shown
below.
Figure 2-17 - Filter, custom filter
We will discuss creating formulas in the Formula tool and
throughout the exercises in this book. On an error, the red Error
symbol replaces the message symbol, as shown in the figure
below.
[Figure: Results window messages for Filter (5), showing "Parse Error at char(0): Unknown function"]
2.7 Formula
The Formula tool gives us the ability to
create a function that will be written to
a new column in our data.
Figure 2-19 - Formula
Group | Input | Output
Preparation | Any data stream | Augmented original data stream
Note: The formulas created here can be arbitrarily complex, and
thus can significantly slow the data stream. Ensure the created
output field has a field type that's compatible with the result being
created. We can use the formulas created higher in the list in
calculations lower in that list. “//” is the comment character.
Application questions can be connected to the Top Black Question
Anchor to use those answers in this tool.
An Action tool can connect to the Lightning Bolt Anchor to modify
how this tool works in apps and macros.
Output: The original data stream with one additional field for
each formula we create.
New functions have been added to the Formula tool: Starts With,
Ends With, and Contains.
The formula window looks similar to the filter window; in
fact, the expression building section here is identical to the
Custom Filter.
The top of the Formula Properties window has an option to
define multiple calculations, change the order of the fields,
and define the major metadata for the field that we are
creating.
Figure 2-20 - Formula, options
The arrows on the side move the created formula up and
down the list while the circle with the line in it removes the
highlighted formula.
Each of these output fields will have an associated formula
and will add a column to our data stream.
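As a rough illustration of an ordered list of formulas, each adding one column, here is a small Python sketch. It is not Alteryx's expression language: `apply_formulas` and the field names are hypothetical, and Python's `str.startswith` is used only as a stand-in for a Starts With style function.

```python
def apply_formulas(rows, formulas):
    """Apply an ordered list of (output_field, function) pairs to every row."""
    augmented = []
    for row in rows:
        new_row = dict(row)  # keep the original fields untouched
        for output_field, function in formulas:
            # Each formula sees the fields created by the formulas above it.
            new_row[output_field] = function(new_row)
        augmented.append(new_row)
    return augmented


# Hypothetical incoming data stream.
rows = [
    {"Country": "United States", "Gold": 1, "Silver": 2, "Bronze": 0},
    {"Country": "Norway", "Gold": 0, "Silver": 0, "Bronze": 0},
]
formulas = [
    ("Total", lambda r: r["Gold"] + r["Silver"] + r["Bronze"]),
    ("HasMedal", lambda r: r["Total"] > 0),  # uses the field defined above
    ("IsUnited", lambda r: r["Country"].startswith("United")),
]
result = apply_formulas(rows, formulas)
```

Moving `HasMedal` above `Total` in the list would fail with a `KeyError`, which is the Python analogue of why the order of formulas matters.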
Under the Variables Tab, we can see Fields and Constants.
The incoming data determines what is in the list of Fields, and the
Environmental Variables determine what the Constants are.
(Environmental Variables can be defined in the workflow
properties window; see Properties Window in Chapter 1 for more
information.)
Under the Functions tab, we see a tree structure. This
allows us to look for the functions needed by double-clicking on
them and moving them into the expressions window to work with.
The Saved Expressions Tab allows us to access recent and
saved expressions, as well as save our current expression for later
use.
2.8 Input Data
The Input Data tool gives us the ability
to import data from specific databases.
Figure 2-21 - Input Data
Group | Input | Output
In/Out | None | Data stream in initial data format
Note: This is the most common start to a data stream. We can use
full or relative file paths to files as well as database connections.
An Action tool can connect to the Lightning Bolt Anchor to modify
how this tool works in apps and macros.
A new option is available for relational database connections.
When enabled, the data is stored in a yxdb file on disk so that
data sources aren't hit repeatedly during workflow
development.
Properties Window:
The Input Data Property window has three main
components. The first is the field that shows our data connection.
When connected to a data source, we can see the address of the
file or database that we are connected to.
Figure 2-22 - Input Data, options
When we click on the drop-down arrow on the right-hand
side of the field, we see this menu and can use this option to
connect to data sources that are accessible. If present, a list of
recent connections is shown, allowing us to save connections.
This would appear under the Alias Link shown above.
The second main component of the window is the Options
section, where we can change the settings associated with the data
connection to modify exactly what we are connecting to. This will
allow us to modify many of the options that define the
connection.
The third component is the Preview, which gives us a view
of the first 100 records to help ensure that we are connected to the
correct data source.
Table or Query: This option needs to be called out separately
because it allows us to open a new connection window by
clicking on the “...” button. That menu looks similar to this,
depending on what we are connecting to.
The image below describes each of the tabs available when
clicking on the “...” button.
Figure 2-23 - Input Data, table and query options
2.9 Join
The Join tool gives us the ability to
combine two data streams by lining
up records based on matching fields.
Figure 2-24 - Join
Group | Input | Output
Join | See Input Left and Input Right | See Output Left, Output Join, and Output Right
Note: Join does not work like a join in SQL; this tool creates three
groups in order to perform a left, right, or full outer join. We need
a Union tool after the join to identify which outputs should be
brought together. (See example “Brains vs. Brawns.”) If we have
multiple fields that match each other, records are replicated from
the original data stream.
An Action tool can connect to the Lightning Bolt Anchor to modify
how this tool works in apps and macros.
Input Left: A data stream with at least one common field to Input
Right (the fields do not need to share a name).
Input Right: A data stream with at least one common field to Input
Left (the fields do not need to share a name).
Output Left: Data stream containing records that did not match
anything from the right input.
Output Join: Data stream containing records that match both left
and right inputs. Records may be replicated as a result of this
operation.
Output Right: Data stream containing records that did not match
anything from the left input.
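The three outputs, and the way a union of two of them reproduces a SQL-style left outer join, can be sketched in Python. This is a simplified model under stated assumptions: the `join` function, the sample field names (`Code`, `ISO`), and the use of plain list concatenation as a stand-in for the Union tool are all hypothetical.

```python
def join(left, right, left_key, right_key):
    """Return (left_only, joined, right_only) for two lists of dicts."""
    right_index = {}
    for r in right:
        right_index.setdefault(r[right_key], []).append(r)
    left_keys = {l[left_key] for l in left}

    left_only, joined = [], []
    for l in left:
        matches = right_index.get(l[left_key], [])
        if not matches:
            left_only.append(l)
        for r in matches:
            # One output record per matching pair, so records can replicate.
            joined.append({**l, **r})
    right_only = [r for r in right if r[right_key] not in left_keys]
    return left_only, joined, right_only


# Hypothetical inputs; the key fields do not share a name.
athletes = [{"Athlete": "A1", "Code": "US"}, {"Athlete": "A2", "Code": "XX"}]
results = [
    {"ISO": "US", "Year": 2002},
    {"ISO": "US", "Year": 2006},
    {"ISO": "FI", "Year": 2002},
]
left_only, joined, right_only = join(athletes, results, "Code", "ISO")

# A left outer join is the Join output unioned with the Left output.
left_outer = joined + left_only
```

Athlete A1 appears twice in `joined` because two right-hand records match it, which is the replication the Output Join description warns about.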
Properties Window:
The Join Properties window has two major components.
The top asks how we want to join the two data sets. We
can join by position or by specific fields. Most of the time, we
would be joining on specific fields because it allows greater
control over the join.
Figure 2-25 - Join Properties
The bottom allows us to define which fields will be in the output
as well as define the metadata for those fields (more on this when
we talk about the Select tool).
We can see that there are three images separating the top
and bottom of the window. These Venn diagrams show us what
will be in each of the three outputs. More succinctly, we can
consider the image below. The two inputs are the pink and blue
circles, and the three outputs are the pink, purple, and blue
shaded regions.
Figure 2-26 - Join Properties Venn Representation
2.10 Output Data
The Output Data tool allows us to
write the data stream out to a file or
database.
Figure 2-27 - Output Data
Group | Input | Output
In/Out | Any data stream | File or Database
Note: The output window has the ability to write to files or to
databases using SQL.
An Action tool can connect to the Lightning Bolt Anchor to modify
how this tool works in apps and macros.
Properties Window:
The top half of the Output Data Property window is very similar
to the Input Data window. Both allow us to navigate to a file or
database and set options related to the dataset.
Figure 2-28 - Output Data Properties
The difference is that at the bottom, there are some special
options that allow us to modify the way the metadata is written
based on the incoming data stream.
We can see here that there is a file format called an
“Alteryx database (*.yxdb).” Alteryx allows us to store data in
files specifically designed to work well with Alteryx. These file
types are native to Input and Output Data tools, so they do not
need to perform any conversion to connect to these files. We can
also use this file type to store both spatial and non-spatial data
together.
2.11 Running Total
The Running Total tool allows us to
create a running sum for a numeric
field in the incoming data stream.
Figure 2-29 - Running Total
Group | Input | Output
Transform | See details below | See details below
Note: Running Total produces the running sum of the data from
the top of the column down, so it is important to make sure the
data is properly ordered (see the Sort tool).
Input: Any data stream with at least one numeric field.
Output: The original data stream with additional columns called
RunTot_ for each of the selected “Create
Running Total” fields.
An Action tool can connect to the Lightning Bolt Anchor to modify
how this tool works in apps and macros.
Properties Window:
The Running Total Properties window has two components.
Figure 2-30 - Running Total Properties
The first is Group By, which allows us to define a field or
set of fields; a separate running sum is kept for each unique
combination of values in our Group By fields.
The second is the selection of which fields we want to
create a running total on.
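A grouped running sum can be sketched in a few lines of Python. This is an illustration rather than the tool's implementation: the `running_total` function and the sample fields are hypothetical, and naming the new column `RunTot_` followed by the field name is an assumption based on the prefix mentioned above.

```python
def running_total(rows, field, group_by=()):
    """Add a running sum of `field`, restarted for each Group By combination."""
    totals = {}
    output = []
    for row in rows:  # top of the column down, so input order matters
        key = tuple(row[g] for g in group_by)
        totals[key] = totals.get(key, 0) + row[field]
        output.append({**row, "RunTot_" + field: totals[key]})
    return output


# Hypothetical, already sorted input.
rows = [
    {"Country": "A", "Medals": 2},
    {"Country": "B", "Medals": 1},
    {"Country": "A", "Medals": 3},
]
overall = [r["RunTot_Medals"] for r in running_total(rows, "Medals")]
per_country = [
    r["RunTot_Medals"] for r in running_total(rows, "Medals", ["Country"])
]
```

Without a Group By field the sum accumulates across all rows; with `Country` as the Group By field each country keeps its own total.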
2.12 Sample
The Sample tool allows us to work
with a subset of data.
Figure 2-31 - Sample
Group | Input | Output
Preparation | Any data stream | See below
Note: This is useful for limiting the amount of data we run
through our data stream when we are testing, creating different
samples of our dataset, and skipping header or footer
information that may exist in our data.
Output: The original data stream with potentially modified
metadata and truncated fields.
An Action tool can connect to the Lightning Bolt Anchor to modify
how this tool works in apps and macros.
Properties Window:
The Sample Properties window has three different settings.
Figure 2-32 - Sample Properties
The first of these settings is the type of sampling that is needed:
First N Records: the first N records in our data stream.
Last N Records: the last N records in our data stream.
Skip 1st N Records: all but the first N records in our data stream.
1 of every N Records: create groups of N records based on the order,
and take one record from each group.
Random 1 in N Chance for each Record: every record has a 1/N
chance of being kept.
First N% of Records: the first N percent of records in our data
stream.
The second setting is the N to be used in the sampling.
The third is the ability to select the fields we want to group the
sampling by. In the scenario pictured above, this filtering will
keep the first 100 records for each date in the data.
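The deterministic sampling modes can be approximated with Python list slicing, as in the sketch below. The function names and sample data are hypothetical, and taking the first record of each group of N for "1 of every N" is an assumption; the random and percentage modes are left out.

```python
def first_n(rows, n):
    return rows[:n]


def last_n(rows, n):
    return rows[-n:] if n else []


def skip_first_n(rows, n):
    return rows[n:]


def one_of_every_n(rows, n):
    # Assumption: the first record of each consecutive group of n is kept.
    return rows[::n]


def first_n_per_group(rows, n, group_field):
    """Keep the first n records for each distinct value of group_field."""
    seen = {}
    kept = []
    for row in rows:
        key = row[group_field]
        seen[key] = seen.get(key, 0) + 1
        if seen[key] <= n:
            kept.append(row)
    return kept


# Hypothetical data: five records spread over two dates.
rows = [
    {"Date": "d1", "Id": 1},
    {"Date": "d1", "Id": 2},
    {"Date": "d2", "Id": 3},
    {"Date": "d1", "Id": 4},
    {"Date": "d2", "Id": 5},
]


def ids(sample):
    return [r["Id"] for r in sample]
```

For example, `ids(first_n_per_group(rows, 1, "Date"))` keeps one record per date, which mirrors grouping the sample by a date field.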
2.13 Select
The Select tool allows us to modify
metadata associated with the data
stream, including the order of
columns.
Group | Input | Output
Preparation | Any data stream | See details below
Select is used after every data connection and periodically
throughout the data stream to ensure everything in the data
stream is in the right format, named appropriately, and
necessary. Use this tool to drop fields that are no longer needed
to save space.
Output: The original data stream with potentially fewer fields
with modified metadata.
An Action tool can connect to the Lightning Bolt Anchor to modify
how this tool works in apps and macros.
The Select Properties window has multiple important tasks
associated with the maintenance of data and metadata. The
window shown below shows us a list of every field name coming
in so we can reset the information, if needed.
Figure 2-34 - Select Configuration
In this case, the Date field has been renamed to Sales Date,
and the Sales field was converted to a float type and given a
description saying that it is the Total Cash Sales. The red cells
above indicate that the metadata in the cell has been modified.
There are two other things we can do with the fields
without going into the options menu. The first is checking and
unchecking the boxes at the left of each row in order to drop that
field from the data stream. The second is reordering the fields so
that the columns are in a different order downstream. This can be
accomplished by selecting a field and clicking the up and down
arrows or by right-clicking in the space to the left of the
checkmark and dragging the field up or down in the list.
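Renaming, retyping, dropping, and reordering fields can all be expressed as one ordered specification, as in this small Python sketch. The `select` function and the `Notes` field are hypothetical; the `Date` to `Sales Date` rename and the float conversion of `Sales` follow the scenario described above.

```python
def select(rows, spec):
    """Keep only the fields listed in spec, in that order.

    spec is an ordered list of (incoming_name, outgoing_name, converter).
    """
    return [
        {new_name: convert(row[old_name]) for old_name, new_name, convert in spec}
        for row in rows
    ]


# Hypothetical incoming record where every value arrives as a string.
rows = [{"Date": "2015-01-31", "Sales": "19.5", "Notes": "n/a"}]
spec = [
    ("Sales", "Sales", float),      # retype to a float
    ("Date", "Sales Date", str),    # rename, and move after Sales
    # "Notes" is not listed, so it is dropped from the stream
]
selected = select(rows, spec)
```

Because the specification is a list, its order is also the column order downstream.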
In addition to the two fields that are part of the incoming data
stream, there is a special field called “Unknown” that acts like a
placeholder for all new fields that come into this tool.
Select Options Menu:
The options menu allows us to systematically modify the fields.
Figure 2-35 - Select Options 1
Save/Load allows us to create or load a Field Type File (yxft). Field
Type Files are metadata files that can be used to appropriately
define or redefine the columns in our data stream.
Save Field Configuration creates a new yxft file.
Load Field Names imports field names from an existing yxft file.
Load Field Names & Types imports field names and the type of field
that should be allocated.
Other options are as shown in the figure below.
Figure 2-36 - Select Options 2
Select allows us to select or deselect all fields.
Sort has four primary methods of ordering fields.
Sort on Original Field Name will alphabetically sort our fields in
either ascending or descending order based on the field names
that came in from the data stream.
Sort on New Field Name will alphabetically sort our fields in either
ascending or descending order based on the field names that
leave the tool.
Sort on Field Type will group all fields that have the same data
type together.
Revert To Incoming Field Order will clear the ordering of fields.
If we have selected fields, Move allows us to group them all at
the top of the data field list.
Add Prefix to Field Names will allow us to add a prefix to all fields
or all selected fields.
Add Suffix to Field Names will allow us to add a suffix to all fields
or all selected fields.
Remove Prefix will allow us to remove a common prefix between
selected fields.
Remove Suffix will allow us to remove a common suffix between
selected fields.
Clear All Renames will remove all renaming that has been defined
for this select.
Clear Highlighted Renames will remove all renaming in the
highlighted (selected) fields.
Revert All To Original Type & Size will remove all changes to the
field types or allocated data sizes.
Revert Highlighted To Original Type & Size will remove all changes
to the field types or allocated data sizes for highlighted fields.
Forget all Missing Fields will remove this tool's metadata about
fields that are no longer coming into the tool from the data
stream.
Forget Highlighted Missing Fields will remove this tool's metadata
about fields that are no longer coming into the tool from the data
stream that are highlighted.
Type: Type is an important thing to know because each type has
different attributes and means different things to other tools. Each
of the types is described in Appendix D.
2.14 Sort
The Sort tool allows us to reorder
records.
Figure 2-37 - Sort
Group | Input | Output
Preparation | Any data stream | See below
Note: Sorting is most important when we are working with
calculations that consider multiple rows or ordering data for
normalized consumption.
An Action tool can be connected to the Lightning Bolt Anchor to
modify how this tool works in apps and macros.
Output: The original data stream with records sorted in a
different order.
Properties Window:
The Sort Properties Configuration window allows us to select one
or more fields by name, and Ascending or Descending for each,
to determine an order to our records. We can change the order of
these sorts by moving them up and down the list.
Figure 2-38 - Sort Configuration
By checking the Use Dictionary Order option, we can select the
dictionary order that should be applied to sort the data when
appropriate.
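Sorting on several fields, each with its own direction, can be sketched in Python by relying on the stability of the built-in sort. The `sort_rows` function and the sample fields are hypothetical, and the dictionary order option is not modeled here.

```python
def sort_rows(rows, sort_spec):
    """Sort by a list of (field, ascending) pairs given in priority order."""
    result = list(rows)
    # Python's sort is stable, so sorting by the lowest-priority field first
    # and the highest-priority field last yields the combined ordering.
    for field, ascending in reversed(sort_spec):
        result.sort(key=lambda r: r[field], reverse=not ascending)
    return result


# Hypothetical records.
rows = [
    {"Country": "B", "Year": 2002},
    {"Country": "A", "Year": 2006},
    {"Country": "A", "Year": 2002},
    {"Country": "B", "Year": 2006},
]
ordered = sort_rows(rows, [("Country", True), ("Year", False)])
```

Swapping the two entries in the list changes which field dominates, just as moving a sort up or down the list does in the configuration window.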
2.15 Summarize
The Summarize tool allows us to
summarize data in our data stream.
Figure 2-39 - Summarize
Group | Input | Output
Transform | See below | See below
Note: When summarizing data, it may be necessary to
reexamine the underlying calculations because aggregating
those calculations may not make sense. Summarizing a single
field using Group By is a good way to get a unique list of the
data. Running complex analysis like geocoding is often more
efficient to do on a summarized list and then join back onto the
full dataset.
An Action tool can be connected to the Lightning Bolt Anchor to
modify how this tool works in apps and macros.
Input: Any data stream with too granular a level of detail.
Output: A summarized data stream with a less granular level
of detail.
Properties Window:
The Summarize Configuration window has two basic components.
Figure 2-40 - Summarize Configuration
The Fields list shows each of the incoming fields, and the
Actions list shows the fields created in this tool.
The select button at the top-right allows us to select fields
in a systematic way so that if we want to take the sum of all of
our Numeric fields, we can select them all and add them in one
step.
When we have something in the fields list selected, we can
click on the Add button, which shows the drop-down menu
shown above. It lists every operation that can be used to
aggregate the data using Summarize.
We will not be going through this menu in detail, as it is a
list of aggregation methods; however, Group By needs to be given
special attention,
When we use Group By on fields in the summary, we will
end up with one line item for each combination of field elements
that we grouped by. If this is unclear, it should make more sense
as we go through exercises.
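The Group By behavior can be sketched outside Alteryx. The Python below is only an illustration, not how the Summarize tool is implemented; the sample records and the `summarize` helper are hypothetical. It shows that grouping by one field yields one row per unique value, while grouping by two fields yields one row per unique combination.

```python
from collections import defaultdict

# Hypothetical sample records, invented for illustration only.
records = [
    {"Country": "Country A", "Year": "2002", "Gold": 1, "Silver": 0, "Bronze": 0},
    {"Country": "Country A", "Year": "2006", "Gold": 1, "Silver": 0, "Bronze": 1},
    {"Country": "Country B", "Year": "2006", "Gold": 1, "Silver": 0, "Bronze": 0},
]

def summarize(rows, group_by, sum_fields):
    """Return one row per unique combination of the group_by fields."""
    totals = defaultdict(lambda: defaultdict(int))
    for row in rows:
        key = tuple(row[field] for field in group_by)
        for field in sum_fields:
            # Prefix mirrors the Sum_ naming that Summarize applies.
            totals[key]["Sum_" + field] += row[field]
    return [dict(zip(group_by, key), **sums) for key, sums in totals.items()]

# One row per country.
by_country = summarize(records, ["Country"], ["Gold", "Silver", "Bronze"])
# One row per (country, year) combination.
by_country_year = summarize(records, ["Country", "Year"], ["Gold"])
```

With the three sample records, `by_country` has two rows and `by_country_year` has three, because Country A appears in two different years.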
2.16 Tool Container
The Tool Container tool allows us to group tools together for
clarity and allows the tools to be disabled when unnecessary.
Group | Input | Output
Documentation | None | None
Figure 2-41 - Tool Container
Note: Click and drag tools onto the box to put them into the tool
container. Tool containers make it much easier to navigate our
data stream because they allow us to consider a series of tools
as a single unit. If we click on the arrow at the top-right corner,
it will collapse the box without disabling it.
An Action tool can be connected to the Lightning Bolt Anchor to
modify how this tool works in apps and macros.
Properties Window:
The Tool Container Configuration window allows us to
customize the text and formatting of the container as well as
disable the tools inside it.
Figure 2-42 - Tool Container Configuration
The Disabled option allows us to turn off sections of our
data stream. This is typically used in testing data and application
building.
2.17 Transpose
The Transpose tool allows us to de-normalize data.
Group | Input | Output
Transform | See below | See below
Figure 2-43 - Transpose
Note: This tool converts all spaces and special characters in the
titles into underscores in each record.
An Action tool can be connected to the Lightning Bolt Anchor to
modify how this tool works in apps and macros.
Input: Any data stream with multiple fields that need to be
combined into rows.
Output: A data stream with additional rows in order to
consolidate the columns. The columns consolidate into two
columns: Name, which is a column with each of the former column
names in it, and Value, which is a column with each of the data
values.
Properties Window:
The Transpose Configuration window has three elements, as
shown in the figure below.
Figure 2-44 - Transpose Configuration
Key Fields allows us to select the fields that will be
maintained after the transposition.
Data Fields allows us to select the fields that will be
combined following the transposition.
The drop-down at the bottom allows us to change the
message behavior if there are missing fields in the incoming data
stream.
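To make the roles of Key Fields and Data Fields concrete, here is a minimal Python sketch of the reshaping the Transpose tool is described as performing. It is an illustration under assumptions: the `transpose` helper and the sample row are hypothetical, and only the Name and Value output columns come from the text above.

```python
def transpose(rows, key_fields, data_fields):
    """Emit one output row per (input row, data field) pair.

    Key fields are repeated on every output row; each data field
    becomes a Name/Value pair.
    """
    out = []
    for row in rows:
        for field in data_fields:
            new_row = {key: row[key] for key in key_fields}
            new_row["Name"] = field       # the former column name
            new_row["Value"] = row[field] # the value that column held
            out.append(new_row)
    return out

# Hypothetical single-record input.
wide = [{"Athlete": "Athlete A", "Gold": 1, "Silver": 0, "Bronze": 2}]
tall = transpose(wide, key_fields=["Athlete"], data_fields=["Gold", "Silver", "Bronze"])
```

One input record with three data fields becomes three output records, each carrying the same key field value.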
2.18 Union
The Union tool allows us to append records together one after
another from multiple data sources.
Group | Input | Output
Join | See below | See below
Figure 2-45 - Union
Note: The order in which we connect data streams to this tool's
input will determine the default order in which they are combined.
In this tool, naming the incoming connections is often helpful.
An Action tool can be connected to the Lightning Bolt Anchor to
modify how this tool works in apps and macros.
Input: Multiple data streams that should be combined by
adding the records from one set to the end of the others.
Output: A data stream that has the records from multiple data
streams.
The Union Configuration window has three core elements.
The first drop-down allows us to change the method that
is used to align the columns from the different inputs.
Figure 2-46 - Union Configuration by Name
Auto Config by Name makes the union align the fields that
have the same name. This is best used if we know that our data
will always be named the same way.
Figure 2-47 - Union Configuration by Position
Auto Config by Position makes the union align the fields by
the column number. This is best used if we know our data will
always be in the same order but may have different (or no) field
names.
In both of these options, we see the same Properties
section asking what should happen when the fields differ. The
first drop-down allows us to change the behavior between an
error, a warning, and nothing. The second drop-down allows us
to decide if we want all fields or only the fields matched from all
inputs to be in the output.
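The difference between aligning by name and aligning by position can be sketched in Python. This is a hedged illustration, not the Union tool's implementation: both helper functions and the sample streams are hypothetical, and filling unmatched fields with `None` is an assumption standing in for the "all fields" output option.

```python
def union_by_name(*streams):
    """Stack rows and align values by field name.

    A field missing from a stream is filled with None (assumed behavior).
    """
    fields = []
    for stream in streams:
        for row in stream:
            for name in row:
                if name not in fields:
                    fields.append(name)
    return [{f: row.get(f) for f in fields} for stream in streams for row in stream]

def union_by_position(header, *streams):
    """Stack rows and align values by column number, ignoring source names."""
    return [dict(zip(header, row)) for stream in streams for row in stream]

# Hypothetical inputs: same fields, different column order.
first = [{"Country": "Country A", "Gold": 1}]
second = [{"Gold": 2, "Country": "Country B"}]
named = union_by_name(first, second)

# Hypothetical inputs: same column order, no usable field names.
positional = union_by_position(["Country", "Gold"], [("Country A", 1)], [("Country B", 2)])
```

Both calls produce the same two records here; they differ only in what they rely on, matching names in the first case and matching column order in the second.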
Figure 2-48 - Union Configuration Manual Configuration
Manually Configure Fields is the third option in the first
drop-down. It allows us to select exactly which fields are
brought together by manipulating the Output Columns portion
of the configuration window. This is best used if we know that
our fields may be named differently or may be in a different order.
The final component is the Output Order. It allows us to
set the order of the records by choosing the order in which the
data streams are combined.
2.19 Freestyle
We want to get all of our new hires a basic understanding of
Alteryx as quickly as possible.
We center the basic training around the most important sporting
event in the world, which should need no introduction.
I'll be asking you a few questions and walking you through
examples until I feel like you are ready [...]
The first question we are going to explore is: Which country has
produced the best Freestyle skiing results from the 2002 and 2006
Winter Games?
Assume each Gold is worth 3 points, Silver is worth 1.5 points
and Bronze is worth 1 point.
Something important to recognize is that I am asking you for the
answer to a very specific question. Once you have some of the
basics down, we will talk about making a generalized tool for you
or your end user to ask related questions. For now, just
understand that when you are asked about a specific answer, they
are going to only want the results.
[...] show you how this works.
Thanks,
Let's start building a workflow that will answer our
question. We are going to start with a blank canvas and save it as
Freestyle Skiing. Next, bring in an Input Data tool so that we can
connect to data.
Figure 2-49 - Freestyle Skiing, Data Input (select the sheet
name "Athletes" and click OK)
Now navigate to where the data files are unpackaged, and
connect to the file All Medals.xlsx in Chapter 2 - The Games >
Medals. For downloading the data associated with this book,
please refer to the letter to the reader on this chapter's first page.
When connected, we see this window pop up. Click on
Athletes$ then OK to connect to the Athletes sheet in the All
Medals Excel file. This is shown in the figure above.
Best practice is to put a Select and a Browse after every input.
* Browse helps us check the data at the time of import. This
ensures that the data we are getting is correct.
* Select allows us to make sure that the fields are in the right
format from the beginning.
Figure 2-50 - Freestyle Skiing, Select configuration
If we click on Select, we should see that our fields are in
different types than the above image. Change them to match
what is shown.
Now that we have the data and the fields are the right
type, the first thing we should do is filter the data. We always
want to limit the data as soon as possible, since this will speed up
our data stream and prevent memory errors by limiting the
information.
Best practice is to remove data as soon as it is no longer
needed.
It makes sense that the first step in filtering would be to
bring in the Filter tool; however, if we are not familiar with the
data set and we have not run it, we may not have enough
information to filter properly. In this case, we want to run the
module so that there's data in the Browse tool for us to work with.
Figure 2-51 - Freestyle Skiing, populate Browse (click Run to
populate Browse)
We can see that when the module finishes running, we get
a pop-up window that lets us know how long it took to run and
if there were errors.
Figure 2-52 - Freestyle Skiing, message after Running
Feel free to click on the Don't show this message again check
box before closing if the pop-up window is distracting.
Now, we can start thinking about the filter. We know that
we are only interested in freestyle skiing results for the 2002 and
2006 games. So the first thing we are going to filter is the sport to
"freestyle skiing." If we look at the Browse tool, we first see
freestyle skiing at row 5818, identified by the string Freestyle
Skiing.
Figure 2-53 - Freestyle Skiing, Browse configuration
This is the crucial piece of information we didn't have
before. Now we know exactly what we need to look for in
our data, so we can create the filter.
Drag a Filter tool after the Select, and make sure there is a
connection between the Select output and the Filter input. This
time, we will use the Basic Filter builder. Set the field drop-down
to Sport, and type Freestyle Skiing into the text box like we see
below.
Figure 2-54 - Freestyle Skiing, Browse configurations
Notice that the Expression says [Sport] = "Freestyle
Skiing". This is because field names are in brackets, and string
values are in quotes. What is happening here is that for each
record, we test to see if the value in Sport is exactly Freestyle
Skiing. If it is, then True; if it isn't, then False.
The next thing we want to do is create a filter to keep only
2002 and 2006. If we look at the Select tool on the previous page,
we will see that the Year field is a string. This is fine; we need to
remember that when we are writing the filter formula. Drag a
new Filter tool onto the canvas, and make sure that the first
Filter's true (T) output is connected to the new Filter's input.
Figure 2-55 - Freestyle Skiing, Filter configurations - Functions
(both OR logic operators)
This time, we are going to create the filter logic on our own
using the Custom Filter option. We know from the previous filter
that [] = "" is the syntax for filtering a
string field, so creating the first half of the filter is not hard: it is
[Year] = "2002". What we need to do now is make sure 2006
is also kept.
One way we could do this is by using logical operators.
Logical operators are terms that allow us to combine two or more
Boolean (true or false) values to create a single Boolean from the
two. The three Boolean operators that we will be discussing are
and, or, and not.
AND: if both the value to the left and the value to the right are
true, then true.
OR: if either the value to the left or the value to the right or both
are true, then true.
NOT: if a value is true then false; if the value is false, then true.
We can see in the Functions Tab that we have the option to
use the Boolean OR - Keyword or the Boolean OR ||. There is no
computational difference between using the keyword or the
double vertical bar symbol. Both options are available for our
convenience. For those not used to programming, the keyword
OR is much easier to remember and use, but for those who
program, double vertical bars (||) is a common standard they
may be used to.
Figure 2-56 - Freestyle Skiing, Custom filter
See Appendix F for examples of Boolean logic.
Now that we know about logical operators, we can finally
finish putting the filter together. We can use the formula [Year] =
“2002” OR [Year] = “2006” in order to filter out this data.
Note: We could have combined both of these filters together by
using the following: [Sport] = "Freestyle Skiing" AND ([Year] =
"2002" OR [Year] = "2006")
The parentheses allow us to change the order of
operations so that this formula reads “Freestyle Skiing in the
years 2002 or 2006” instead of “Freestyle Skiing in 2002 or
anything in 2006.”
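The effect of the parentheses can be checked with a short Python sketch. This is an analogy rather than Alteryx syntax: it assumes, as the reading above implies, that AND is evaluated before OR, which is also how Python's `and` and `or` behave. The function names and the "Biathlon" test value are hypothetical.

```python
def keep_with_parens(sport, year):
    # "Freestyle Skiing in the years 2002 or 2006"
    return sport == "Freestyle Skiing" and (year == "2002" or year == "2006")

def keep_without_parens(sport, year):
    # Without parentheses, `and` binds first, so this reads as
    # "Freestyle Skiing in 2002, or anything in 2006".
    return sport == "Freestyle Skiing" and year == "2002" or year == "2006"

# A record from a different sport in 2006 separates the two versions.
wrongly_kept = keep_without_parens("Biathlon", "2006")
correctly_dropped = keep_with_parens("Biathlon", "2006")
```

The two versions agree on every Freestyle Skiing record; they only disagree on records from other sports in 2006, which is exactly the kind of error that is easy to miss in a Browse.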
We now have removed all of the information we don't
need in order to answer the question. But we have the problem
that the data is too granular. We know who the athlete was and
in which year they won their medal(s). We should bring in a
Summarize Tool in order to bring the data up to the country level.
If we add Country using Group By, and Gold, Silver, and Bronze
using Sum, we will get a list of countries and their total medal
count for Freestyle skiing for 2002 and 2006. Place a Browse Tool,
and run the module to see what we have so far:
Figure 2-57 - Freestyle Skiing, Summary configuration
Best practice is to place a Browse tool after every tool that
transforms data into a significantly different shape. Summarize
is one of these tools.
Figure 2-58 - Freestyle Skiing, Browse configuration after
Summarize
We can see that we have four columns with the total
counts of gold, silver, and bronze medals listed for each of the 12
countries that won freestyle skiing medals during 2002 and 2006.
Notice the fields are titled Sum_ followed by the original
field name. Alteryx is making sure we know the method used to
summarize the data.
The next thing we need to do is determine which country
was the best. If we look back at the email, we can see that best is
defined as a function of the medals won; 3 points for gold, 1.5
points for silver and 1 point for bronze.
Bring a Formula Tool onto the canvas following the
Summarize tool, and we are going to create a calculation called
Score that has the Type Double, with the formula [Score] =
3*[Sum_Gold] + 1.5*[Sum_Silver] + [Sum_Bronze]
Figure 2-59 - Freestyle Skiing, Formula configuration
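As a quick check of the arithmetic, the Score formula can be written as a small Python function. The weights come from the email; the `score` function name and the medal counts passed to it are hypothetical and are not results from the dataset.

```python
def score(sum_gold, sum_silver, sum_bronze):
    """Weighted medal score: 3 per gold, 1.5 per silver, 1 per bronze."""
    return 3 * sum_gold + 1.5 * sum_silver + sum_bronze

# Hypothetical medal counts, for illustration only.
example = score(sum_gold=1, sum_silver=2, sum_bronze=3)  # 3 + 3.0 + 3
```

Because silver is weighted 1.5, the result is a floating-point number, which is why the Formula tool field is given the Type Double.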
We can now add another Browse after the Formula Tool
to see what the data looks like.
Figure 2-60 - Freestyle Skiing, Browse configuration after formula
We see there is a new field called Score, but the dataset is
unordered and contains multiple unnecessary values. We can
also see that Australia has the highest score and therefore is the
answer to the original question. But for good practice, we are
going to continue to build this workflow so that no interpretation
is needed.
This process is going to take four steps:
1. Reorder the data based on the score field.
2. Select only the top-scoring country.
3. Remove all data other than the name of the best country.
4. Browse that data.
Like we discussed, the Sort Tool is how we reorder the
data and will be our first step. We will set up our data in a
descending order based on Score, like we see here.
Figure 2-61 - Freestyle Skiing, Sort configuration
Next, we need only the first record, so we are going to use
the Sample Tool to keep only the Top 1 Record coming out of the
Sort.
Figure 2-62 - Freestyle Skiing, Sample configuration
We know we have fields we no longer need, so we can use a
Select Tool to eliminate everything that is not the country name.
Figure 2-63 - Freestyle Skiing, Selection after Sample
Finally, we can put a Browse tool at the end and run the
workflow to see the results.
Figure 2-64 - Freestyle Skiing, Browse after Select
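The Sort, Sample, and Select steps above can be sketched as three lines of Python. This is an illustration only; the country names and scores below are hypothetical placeholders, not values from the medals data.

```python
# Hypothetical scored records standing in for the Formula tool's output.
scored = [
    {"Country": "Country A", "Sum_Gold": 1, "Score": 4.5},
    {"Country": "Country B", "Sum_Gold": 2, "Score": 7.0},
    {"Country": "Country C", "Sum_Gold": 0, "Score": 2.5},
]

# Sort: descending by Score.
ranked = sorted(scored, key=lambda row: row["Score"], reverse=True)
# Sample: keep only the top 1 record.
top_one = ranked[:1]
# Select: keep only the Country field.
answer = [{"Country": row["Country"]} for row in top_one]
```

Each step only narrows what the previous one produced, so the final record needs no interpretation by the reader.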
We could have stopped when we first saw Australia had
the highest score in the previous Browse tool. The reason we did
not is that when we are performing an analysis, we want our
results to be perfectly repeatable. If we had interpreted the
previous Browse tool incorrectly, then there would be no way of
finding out why an error occurred. This is a problem because it
makes the individual analyst entirely responsible for the answer,
and anyone who checked the results could easily find the correct
answer where we mistakenly picked the wrong one. Finishing
the workflow in this way affords us two benefits:
* We would have a second verification that the answer was
what we expected.
* Repeatability of the result so we can point to a single issue
in the data preparation process that needs fixing instead
of not being able to fix it at all.
Here is how the workflow looks when complete:
For the next thing, we are going to cover a question that requires
you to produce a dataset instead of a specific answer.
The goal of most data manipulation is to get the data in a more
useable format. Typically, there are two [...]
Thanks,
This process must include at least four steps:
1) Import the dataset.
2) Transpose the dataset.
3) Make sure the fields are named correctly.
4) Export the dataset.
However, we are going to make the data cleaner and employ best
practices. So our process is:
1) Import the data.
2) Browse the data.
3) Make sure the data has the right type.
4) Transpose the data.
5) Browse the restructured data.
6) Make sure the fields are named correctly.
7) Remove records that say no medals were won.
8) Browse the data that will be exported.
9) Export the dataset.
Let's create a new workflow and save it as Let's Tidy Things Up.
We need to import the same data that we used in the last
example. Bring an Input tool onto the canvas, navigate to where
we saved this book's data, and connect to the file All Medals.xlsx
in Chapter 2 - The Games > Medals.
Figure 2-66 - Let's Tidy Things Up, Data Input
Now we will put a Browse and a Select tool following
the Input Tool.
Figure 2-67 - Medals data
All of the fields are in appropriate types for what we are
trying to do, so we can move directly to the transposition.
Let’s run the Module to see how the data is structured.
Figure 2-68 - Let's Tidy Things Up, Browse configuration
The Transpose tool takes normalized data and de-normalizes
it. If we take the data stream coming out of Select and
pass it into a Transpose, we can make the data tidier.
Figure 2-69 - Let's Tidy Things Up, Transpose configuration
We want to keep all of the fields as they are except for
Gold, Silver and Bronze. So we select all but those three fields
under Key Fields and we select Gold, Silver and Bronze under the
Data Fields. If we had wanted to drop a field entirely (say,
Closing Ceremony Date), we could leave it unchecked in both
lists.
Best practice is to always include a Browse after a tool that
modifies the structure of a data stream. Transpose is one of these
tools. Let's add a Browse to the end of the data stream and run it
to see what we have.
Figure 2-70 - Let's Tidy Things Up, Browse configuration
If we compare the top three records from the new Browse
to the one that came out of the Input in figure 2-66, we see that
we have two fields called Name and Value and no longer have the
fields Gold, Silver, and Bronze. We also notice that from Athlete to
Sport, all fields are identical to the first three records in the
original dataset. This is because we replicated them for each
column we transposed.
This is one of the reasons that tidy data is not particularly
human readable but is highly computer readable. Since all of the
information is displayed in each record and there is only a single
column to work on, interactive front-end software can work very
fast with the data.
Figure 2-71 - Let's Tidy Things Up, Select configuration
Making this data truly tidy would mean we need to
rename Name and Value to names that give better context to the
field. Add a Select tool, and rename the Name and Value
fields Medal Type and Medal Count, respectively.
We know we have rows that say zero medals were won by
looking at the values in the last Browse tool we created. We are
going to filter those data points out by adding a Filter tool after
the Select.
Our goal is to filter out any records that have zero medals.
We are filtering on a numeric field for the first time, which means
we should use the Basic Filter to learn about the syntax. The
configuration is as shown in the following figure.
Figure 2-72 - Let's Tidy Things Up, Filter configuration
We can see that if we select Medal Count, we have
different options in the operator drop-down. This is because
numeric fields allow different comparison methods than string
fields.
We want to select greater than (">") and type "0" in the
text box. When we look at the Expression below, we see that it
says [Medal Count] > 0. This is because we do not put numeric
values in quotes. Alteryx recognized that when we selected a
numeric field in the basic filter drop-down, the "0" we typed in
meant the number 0 and not the string 0, so it put the numeric
value into the formula.
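The distinction between the number 0 and the string "0" can be seen in a short Python sketch. This is an analogy, not Alteryx behavior: Python is stricter and raises an error when a number is compared with a string, whereas the text only says that Alteryx chose the numeric value for us. The sample rows are hypothetical.

```python
# Hypothetical tidy rows after the rename step.
rows = [
    {"Medal Type": "Gold", "Medal Count": 1},
    {"Medal Type": "Silver", "Medal Count": 0},
    {"Medal Type": "Bronze", "Medal Count": 2},
]

# Numeric comparison: the 0 is written without quotes.
kept = [row for row in rows if row["Medal Count"] > 0]

# Comparing a number with the string "0" is a type mismatch in Python.
try:
    [row for row in rows if row["Medal Count"] > "0"]
    string_comparison_worked = True
except TypeError:
    string_comparison_worked = False
```

Only the Gold and Bronze rows survive the numeric filter, and the quoted comparison fails outright, which is a useful reminder to match the literal to the field type.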
The last step involves two tools: the Browse tool and the
Output Data tool.
Best practice dictates that we put a Browse before every
data output so that we do not need to open the file to make sure
we created it correctly.
We now add a Browse tool to the end of the data stream
and also add an Output Data tool. We are going to write the file
to the same folder we have saved the Let's Tidy Things Up.yxmd.
Figure 2-73 - Let's Tidy Things Up, Output Data configuration
(specify location and file name)
To do this, we are going to type .\Tidy Medal Data.csv in
the text box labeled Write to File or Database.
We just used a relative file path, which allows us to
reference files in relation to where we currently are. Some basics
of relative paths: ".\" means the current folder, "..\"
means the parent folder (the folder that our current folder is in),
and ".\Folder Name\" will move our file into a folder below where
we have the workflow.
We do not necessarily need to use relative paths, but if we
are sharing Alteryx files, it is very beneficial to do so. We can use
absolute paths (full file locations) by pasting them into this box
or navigating to them in the File Browse option.
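How the three relative forms resolve can be sketched with Python's standard `pathlib` module. This is an illustration under assumptions: the workflow folder `C:\Training\Chapter 2`, the `Output` subfolder, and the `resolve` helper are all hypothetical, and the sketch only mimics the path arithmetic, not what Alteryx does internally.

```python
from pathlib import PureWindowsPath

# Hypothetical folder in which the workflow file is saved.
workflow_dir = PureWindowsPath(r"C:\Training\Chapter 2")

def resolve(relative):
    """Resolve a relative Windows path against workflow_dir."""
    parts = list(workflow_dir.parts)
    # pathlib drops "." components, so only ".." needs special handling.
    for piece in PureWindowsPath(relative).parts:
        if piece == "..":
            if len(parts) > 1:  # never pop the drive root
                parts.pop()
        else:
            parts.append(piece)
    return PureWindowsPath(*parts)

same_folder = resolve(r".\Tidy Medal Data.csv")
parent_folder = resolve(r"..\Tidy Medal Data.csv")
sub_folder = resolve(r".\Output\Tidy Medal Data.csv")
```

Because every result is computed from the workflow's own location, moving the whole folder to another machine keeps the references valid, which is the benefit of relative paths when sharing files.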
If we run the module, we can see what the transformed
dataset looks like. This ensures that the information written into
the .csv was correct.
Figure 2-74 - Let's Tidy Things Up, Final Browse configuration
Here is how the Let's Tidy Things Up data stream looks on
completion.
Figure 2-75 - Let's Tidy Things Up, Data stream after completion
2.21 Modern History
Great!
Now that you are getting the sense of tidy data, let's go in the
opposite direction and create a normalized dataset.
How about we create a nice table with countries alphabetically
on the left, a column for each year in the dataset (ordered from
longest ago to most recent) and a historical total medal count in
the cross section?
Should be fun,
Notice that there is considerably less context built into this
email. We often get very sparse information from people, and
they will assume we know the context. In this case, it was
assumed we were talking about the medal data that we have been
working with during the training so far.
This is a much more complicated process than the last
exercise, but that is only because the data was set up very well
for what we were doing last time, and it isn’t here.
We are going to be connecting to the same data source that
we have been using, but we are going to use a shortcut in the
connection process. Open a new workflow and save it as Modern
History.yxmd, but make sure that Let's Tidy Things Up.yxmd is still
open.
Click on the data input in Let's Tidy Things Up and copy it.
Move over to the Modern History canvas, and paste what we
have copied. We see that the input has been copied over and we
do not need to recreate the connection.
Figure 2-76 - Modern History, Input Data
Best practice will once again bring in Browse and Select
tools. But since we know from past experience what the data
looks like and how it is read in, we will move directly into the
next step.
Figure 2-77 - Modern History, Summarize configuration
We know that this data is too granular for our desired
result, so we can summarize it. Based on the email, we know the
only information we will need in the end is the country, year, and
something to do with the medals. So when using the Summarize
tool, we can group by the Country and Year fields and take the
sum of each of the medal counts to take our first step down this
path.
Figure 2-78 - Modern History, Formula configuration
([Sum_Gold] + [Sum_Silver] + [Sum_Bronze])
We now add a formula that creates a Total Medal count by
adding the gold, silver and bronze fields for each record.
(Remember that we used a Summarize tool so we should have a
Browse tool.)
Figure 2-79 - Modern History Select Configuration
We can now add a Select tool that will allow us to
keep only the Country, Year, and Total Medals fields, which we
will use to create the table.
We know we need a historical medal count, which means
we are going to need to take the running total along with the
country and year. But because Running Total is a tool where
order matters, we need to sort the data.
We can sort the Country and Year in ascending order to
help us in two places: Initially, this will help because we are
creating the order for the Running Total, but it will also help us
with the order of records and columns when we normalize the
data set.
Figure 2-80 - Modern History Sort
Configuration
Now that we have ordered the data, we can create the
Running Total for each country across years. To do this, we Group
By Country and Create Running Total on Total Medals. What this
will do is create the running sum of Total Medals down the data
set (as time increases) and have that count restart every time a
new country shows up.
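A grouped running total can be sketched in a few lines of Python. This is only an illustration of the idea, not the Running Total tool itself: the `running_total` helper and the sample rows are hypothetical, and the rows are assumed to be already sorted by Country and Year, as the Sort step guarantees.

```python
def running_total(rows, group_field, value_field):
    """Add a RunTot_ field that accumulates value_field within each group."""
    totals = {}
    out = []
    for row in rows:
        key = row[group_field]
        # A group seen for the first time starts from 0.
        totals[key] = totals.get(key, 0) + row[value_field]
        out.append({**row, "RunTot_" + value_field: totals[key]})
    return out

# Hypothetical rows, pre-sorted by Country then Year.
rows = [
    {"Country": "Country A", "Year": "2008", "Total Medals": 1},
    {"Country": "Country A", "Year": "2012", "Total Medals": 1},
    {"Country": "Country B", "Year": "2000", "Total Medals": 5},
    {"Country": "Country B", "Year": "2008", "Total Medals": 2},
]
with_totals = running_total(rows, "Country", "Total Medals")
```

The running value climbs 1, 2 for Country A and then restarts at 5 for Country B. If the years were out of order within a country, the final total would be unchanged but the intermediate values would be attached to the wrong years, which is why the sort comes first.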
Let's take a look at what we have created so we can get a
better sense of what the process so far has done.
Figure 2-82 - Modern History Browse Configuration after Total
If we add a Browse tool and run the workflow, we can see
that we have an alphabetical list of countries with a record for
every year they won a medal. We can also see the year is
increasing as we move down the list within a country. We then
see the Total Medal count for that year and the running total for
medals that the country has won going from one year to the next
in a field called RunTot_Total Medals.
Figure 2-83 - Modern History Cross Tab Configuration
The next step in this process is to convert the data into a
Cross Tab. If we add the Cross Tab tool to the end of the data
stream and apply the settings in the above image, we will be close
to our goal.
Let's add a Browse tool and see what we have so far.
Figure 2-84 - Modern History Browse Configuration Cross Tab
The result in the image is close but not exactly what we
wanted. We get the correct running totals in the years that each
country won medals; however, we get nulls in the years that they
did not.
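Why the nulls appear can be seen in a small Python sketch of a cross tab. This is an illustration, not the Cross Tab tool's implementation: the `cross_tab` helper and the sample rows are hypothetical, and Python's `None` stands in for an Alteryx null.

```python
def cross_tab(rows, group_field, header_field, value_field):
    """Pivot rows so each distinct header value becomes its own column."""
    headers = sorted({row[header_field] for row in rows})
    table = {}
    for row in rows:
        # Every group starts with None in every column.
        cells = table.setdefault(row[group_field], {h: None for h in headers})
        cells[row[header_field]] = row[value_field]
    return [{group_field: group, **cells} for group, cells in table.items()]

# Hypothetical running totals; Country A has no record for 2000.
rows = [
    {"Country": "Country A", "Year": "2008", "RunTot": 1},
    {"Country": "Country A", "Year": "2012", "RunTot": 2},
    {"Country": "Country B", "Year": "2000", "RunTot": 5},
    {"Country": "Country B", "Year": "2012", "RunTot": 8},
]
wide = cross_tab(rows, "Country", "Year", "RunTot")
```

A cell is only filled when a matching input record exists, so any year in which a country won nothing has no record to pivot and is left as null.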
What we need to do now is create a series of formulas to
replace the nulls with zero or the previous value as appropriate.
Since we need to create formulas, we are going to need the
Formula tool; but this time, we are going to need to create seven
similar calculations because we need to replace the values in
seven different fields.
Let's think through these formulas. We want to change the
cell only if it is null. If the column we are fixing is 2000, then it
should be replaced with 0, and if the column is not 2000, it should
be replaced with whatever is in the previous fixed column.
For those familiar with conditional statements, the syntax
for an if-then statement is:
IF b1 THEN x ELSEIF b2 THEN y ELSE z ENDIF
For those unfamiliar with conditional statements, the
concept is: Given a true or false (Boolean) expression, the
calculation should do one of two things. The logic is if something
is true, then do that; else, if the previous is false and something
else is true, do the second option; else, do the default.
The other thing we need to know in writing these formulas
is the test to see if something is null. The function used is:
IsNull(x)
Both of these pieces of syntax are under the Functions tab in the
Formula tool if we need to reference them.
The formulae that we need are:
2000 Fixed: IF IsNull([2000]) THEN 0 ELSE [2000] ENDIF
2002 Fixed: IF IsNull([2002]) THEN [2000 Fixed] ELSE [2002] ENDIF
2004 Fixed: IF IsNull([2004]) THEN [2002 Fixed] ELSE [2004] ENDIF
2006 Fixed: IF IsNull([2006]) THEN [2004 Fixed] ELSE [2006] ENDIF
2008 Fixed: IF IsNull([2008]) THEN [2006 Fixed] ELSE [2008] ENDIF
2010 Fixed: IF IsNull([2010]) THEN [2008 Fixed] ELSE [2010] ENDIF
2012 Fixed: IF IsNull([2012]) THEN [2010 Fixed] ELSE [2012] ENDIF
Figure 2-85 - Modern History Formula List
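The chain of null-replacement formulas amounts to carrying the last known value forward, starting from 0. The Python sketch below illustrates that logic under assumptions: the `fill_forward` helper and the sample row are hypothetical, `None` stands in for null, and the " Fixed" suffix on the output names is assumed from the figure rather than confirmed.

```python
def fill_forward(row, years):
    """Replace nulls with 0 (first column) or the previous fixed value."""
    fixed = {}
    previous = 0  # what a null in the first year column becomes
    for year in years:
        value = row.get(year)
        # Equivalent to: IF IsNull(value) THEN previous ELSE value ENDIF
        previous = previous if value is None else value
        fixed[year + " Fixed"] = previous
    return fixed

years = ["2000", "2002", "2004", "2006", "2008", "2010", "2012"]
# Hypothetical cross-tab row with gaps.
sparse = {"2000": None, "2002": None, "2004": 3, "2006": None,
          "2008": 7, "2010": None, "2012": 9}
dense = fill_forward(sparse, years)
```

Each fixed column depends on the fixed column before it, not the original one, which is why a gap of several consecutive nulls still carries the right total forward.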
Add a Formula tool to the end of the data stream, and add the
seven formulas we see here with corresponding field names. We
can also add a Browse tool after that to see what we have created.
We can see that we have two sets of fields: those with the
original sparse data, and those with the new dense data.
Figure 2-86 - Modern History Formula List
The next thing that we need to do is remove and rename
the columns that we have, so add a Select tool to the end of the
data stream.
Figure 2-87 - Modern Select Configuration after Formula
Now the data has finished being prepped. We need to
output it, which we know because we were asked for a data set
and not a specific answer. We should add a Browse tool and an
Output Data tool to end the data flow. Save the output as
Historical Medal Count.csv.
Figure 2-88 - Modern History Output
After doing these steps, the final workflow is as shown in
the figure below.
Figure 2-89 - Modern History Data Stream When Complete
2.22 Brains vs Brawns
Awesome!
We only have one more basic skill to go over before we test to see how much you
know.
Combining data.
I have been working with the medals dataset for a while and it is interesting to see how
it compares to different metrics.
I think we should compare the medal counts to Nobel Laureates from each of the
countries. Let's put together the data to see what the relationship between the count of medals
and Nobel Laureates was since 2000. (We will map the country of Nobel Laureate birth
to the country that won the medal.)
Thanks,
Since we are combining data, let’s revisit the analogy
presented in the preface. When we look at a river, we see there
are tributaries all along its length. Each of these tributaries may
have gone through different terrain and could have started as
very different sources. When they come together, they add
whatever they carried along with them into the river they form.
To relate it to the task at hand, tributaries are branches of
our data stream that come together, and when they come
together, we have a richer data stream because we have the
information that comes from everything contributing to it.
We are going to start by prepping the medals data and
preparing them to be joined. We'll create a table with two
columns called Country and Medal Count.
In order to do this, we are going to take the following
steps:
1. Import data.
2. Transpose and rename the columns so that the data
is tidy.
3. Filter out the 0 medal records.
4. Summarize the data so that we only have one
record per country and the total medal count.
5. Rename the medal count column Total Medal Count.
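As a rough illustration of what steps 2 through 5 do to the records, here is a small Python sketch. The wide layout (one column per medal type) and the sample rows are made up for this example and will not match the actual medals file; only the shape of the transformation is the point.

```python
from collections import defaultdict

# Hypothetical wide input: one column per medal type (assumed layout).
wide_rows = [
    {"Country": "Norway", "Gold": 2, "Silver": 0, "Bronze": 1},
    {"Country": "Canada", "Gold": 0, "Silver": 1, "Bronze": 0},
    {"Country": "Norway", "Gold": 1, "Silver": 1, "Bronze": 0},
]

# Step 2: transpose so each record holds one Country, one Medal, one count.
tidy = [
    {"Country": row["Country"], "Medal": name, "Medal Count": value}
    for row in wide_rows
    for name, value in row.items()
    if name != "Country"
]

# Step 3: filter out the 0 medal records.
non_zero = [rec for rec in tidy if rec["Medal Count"] != 0]

# Step 4: summarize to one record per country with the summed count.
totals = defaultdict(int)
for rec in non_zero:
    totals[rec["Country"]] += rec["Medal Count"]

# Step 5: present the sum under the name Total Medal Count.
medals = [
    {"Country": country, "Total Medal Count": count}
    for country, count in sorted(totals.items())
]
```

With this sample input, `medals` holds one record for Canada (1 medal) and one for Norway (5 medals).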
Since we have covered the tools and the concepts used in
this exercise in previous exercises, the overall flow should look
familiar. Please rebuild the following workflow with the
following configurations.
The properties windows for each of these tools as well as
the data stream that is produced are shown in the following
figures.
Figure 2-90 - Brains vs Brawns data stream
Figure 2-91 - Initial Steps - Select
Figure 2-92 - Initial Steps - Transpose
Figure 2-93 - Initial Steps - Select
Figure 2-94 - Initial Steps - Filter
Figure 2-95 - Initial Steps - Summarize
Figure 2-96 - Initial Steps - Sort
Figure 2-97 - Initial Steps - Select
Now that we have the data in the above stream prepared
to be combined, we should prepare the other contributing data
stream.
Let us open the file called Nobel Laureates.csv in the folder
Chapter 2 - The Games > Nobel Laureates. (Remember that we
should always bring in a Browse and Select tool with an input.)
Figure 2-98 - Running unrelated analysis simultaneously
Notice that we now have two completely separate
workflows. This ability is often a useful feature because we can
run unrelated analyses at the same time, which aids in testing
and in conditional application development.
If we run the workflow, we can look at the structure of the
Nobel Laureates dataset. Here, we want to make sure that the
field we plan on joining (Birth Country) is of the same type as
Country in the medal data stream.
Figure 2-99 - Nobel Laureates - Browse Configuration
Figure 2-100 - Nobel Laureates - Select Configuration
Now that we know what the data structure is and that it
parallels the medals file, we can start our preparation for the join.
Figure 2-101 - Nobel Laureates - Preparing to join Medals
We know we want to limit this data to years starting in
2000. One way we can do this is to convert Year to a Double type
and set up a filter to be [Year] >= 2000.
Since we only need to know the total number of
Nobel Laureates for each country of birth, we can
summarize the data.
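The type conversion, filter, and summarize steps described above can be sketched in Python as follows. The sample records are invented for illustration and are not taken from Nobel Laureates.csv; Year is stored as text here on the assumption that it arrives as a string, which is why it is converted before the comparison.

```python
from collections import Counter

# Made-up sample records; Year is text, as it might be on import.
laureates = [
    {"Year": "1999", "Birth Country": "Germany"},
    {"Year": "2000", "Birth Country": "Japan"},
    {"Year": "2008", "Birth Country": "Japan"},
    {"Year": "2010", "Birth Country": "Canada"},
]

# Convert Year to a number, then keep records where [Year] >= 2000.
recent = [rec for rec in laureates if float(rec["Year"]) >= 2000]

# Summarize: group by Birth Country and count the records in each group.
counts = Counter(rec["Birth Country"] for rec in recent)
nobel = [
    {"Birth Country": country, "Count": count}
    for country, count in sorted(counts.items())
]
```

Comparing the text values directly would compare them character by character rather than numerically, so converting first is the safer habit.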
Figure 2-102 - Nobel Laureates - Browse after Summarize
Figure 2-103 - Nobel Laureates - vis a vis All Medals
We see that we have a list of countries and a count of the
number of Nobel Laureates. However, it is unclear what the
number is because the field is called Count. We should rename it
Total Nobel Laureates.
Figure 2-104 - Nobel Laureates - Select Configuration
We now have two data streams ready to be merged. We
want to align the two datasets so that matching countries from
each of the data streams share the same record, which means we
want to join the data. Because we don't want to lose any data
points if we have countries in one dataset but not the other, we
will want to union the three outputs from the join into a single
data stream.
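To make the idea of three join outputs concrete, here is a minimal Python sketch. The country totals are invented, and the three lists are only an analogy for the Left, Join, and Right outputs of the Join tool; stacking them reproduces what the union achieves.

```python
# Invented totals, keyed by country name.
medals = {"Norway": 5, "Canada": 1}
nobel = {"Canada": 1, "Japan": 2}


def record(country, birth_country):
    """Build one output record; None stands in for a null cell."""
    return {
        "Country": country,
        "Total Medal Count": medals.get(country),
        "Birth Country": birth_country,
        "Total Nobel Laureates": nobel.get(birth_country),
    }


# Left output: medal countries with no matching birth country.
left_only = [record(c, None) for c in medals if c not in nobel]
# Join output: countries present in both streams.
joined = [record(c, c) for c in medals if c in nobel]
# Right output: birth countries with no matching medal country.
right_only = [record(None, c) for c in nobel if c not in medals]

# Union of the three outputs: no country from either side is lost.
outer = left_only + joined + right_only
```

Every country appears exactly once in `outer`, and records that came from only one side carry nulls in the fields from the other side.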
Figure 2-105 - Nobel Laureates - Join Configuration
We want to join on the Country field from the Left (Input L)
with the Birth Country field from the Right (Input R).
It is important in this instance that we keep both joining
fields because we intend to combine all three outputs in the next
step. However, if this were not our intention, we could have
removed the joining field from one of the two inputs.
Best practice is to give useful names to every connection
that enters a multiple connection anchor.
Thus, we can see in the following image that we have
relabeled the connections from #1, #2, and #3 to Left, Join, and
Right.
Figure 2-106 - Nobel Laureates - Union Configuration and Output Stream
Since we are doing a union of three output streams of a
Join tool, we know that we will have matching column names.
This allows us to use the Auto Config by Name setting for the
Union tool and leave the rest of the defaults.
We need to add a Browse tool again as we have just altered
the structure of the data. This is to make sure the data looks the
way we expect. Notice that we are doing this after the Union and
not the Join. That is because when we are combining the three
output streams of a Join tool using a Union, we are performing a
single logical step called an outer join. Because this is a single
step, we know that we should check both tools if an issue arises.
We are getting close to our goal; however, the data stream
is also starting to get complex. So we should take a minute to
annotate what we have so it will be easier to follow later. We are
going to add Tool Containers and Comments to the two
contributing data streams so we can easily identify different parts
of this data stream. We can create the comments and containers
like we see in the next image.
Figure 2-107 - Nobel Laureates - Comments and Containers
We can now drag the appropriate tools into the tool
containers so the data stream is easier to understand.
Summarizing Medal count and preparing for Join
Summarizing Nobel Laureate count and preparing for Join
Figure 2-108 - Nobel Laureates - Output Stream with Comments
Looking at the data stream this way is helpful, but if we
click on the arrows at the top-right corner, we can condense what
we are looking at.
Figure 2-109 - Nobel Laureates - Simplified Output Stream
Now we can easily see the medal count preparation and
the Nobel Laureate count preparation as two single processes
instead of a series of tools. Now that we have made the data
stream easier to understand, we should finish building the
workflow.
Figure 2-110 - Nobel Laureates - Browse Configuration
We can observe from the Browse that matched countries have
names in both the Country and Birth Country fields, while
unmatched records have a null in one of them.
Let's create a conditional formula that allows us to convert the
two columns with nulls into a single column that always has a
Country name.
Add a Formula tool to the end of the data stream that creates a
field called Countries with the expression: IF IsNull([Country])
THEN [Birth Country] ELSE [Country] ENDIF. This takes
the Country value unless it is null, in which case it takes the
Birth Country value.
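The Countries formula is a simple fallback between two fields, which can be sketched in Python like this. The function name `countries` and the sample records are hypothetical and exist only to illustrate the logic, with `None` again standing in for null.

```python
def countries(rec):
    """Analogy for:
    IF IsNull([Country]) THEN [Birth Country] ELSE [Country] ENDIF
    """
    country = rec.get("Country")
    if country is None:
        return rec.get("Birth Country")
    return country


# Invented records covering the three cases produced by the outer join.
left_rec = {"Country": "Norway", "Birth Country": None}
join_rec = {"Country": "Canada", "Birth Country": "Canada"}
right_rec = {"Country": None, "Birth Country": "Japan"}
names = [countries(r) for r in (left_rec, join_rec, right_rec)]
```

Because every record from the outer join has at least one of the two fields populated, the result is never null.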
Now we only need to clean up the data and export it to a
.csv file. Add a Select tool with the following configuration, and
export the file to Brains vs Brawns.csv.
Figure 2-111 - Nobel Laureates - Select Configuration
The final workflow is as shown in the following figure.