AB INITIO
( Day -2 )
A Practical
Introduction to
Ab Initio
Software:
Part 2
Simple
Components
Component Organizer
Click on Component
Organizer Button
The Graph Model
The Graph Model: Naming the
Pieces
Components
Datasets
Dataset
Flows
The Graph Model: Some Details
Ports
Record format metadata
Expression metadata
Components
Components may run on any computer
running the Co>Operating System.
Different components do different jobs.
The particular work a component
accomplishes depends upon its parameter
settings.
Some parameters are data
transformations, that is, business rules
applied to one or more inputs to produce
the required output.
Datasets
A dataset is a source or destination of
data. It can be a simple file, a database
table, a SAS dataset, ...
Datasets may reside on any machine
running the Co>Operating System.
Datasets may reside on other machines if
connected by FTP or database
middleware.
Data is always described by record format
metadata (termed “DML”).
Dataset : Simple components
• These are the main
dataset components,
used for source data
and for storing
results in files.
Input file
Input File represents data records read as
input to a graph from one or multiple
serial files or from a multifile.
Description Tab :
URL : Path of the file where data is stored.
( file: / mfile: )
Partition : Ad hoc multifile (changing depth)
Access Tab :
File Handling
File Protection
Ports : DML
Dataset: Records and Fields
A dataset is made up of records; a record
consists of fields.
Analogous database terms are rows and
columns.
Each line below is a record; each column
within a line is a field:
0345John Smith
0212Sam Spade
0322Elvis Jones
0492Sue West
0121Mary Forth
0221Bill Black
Sources of Record Format
Metadata
Record formats can be generated
from:
• Database catalogs
• COBOL copybooks
• Other third-party products
• SAS datasets
One can always resort to manual
entry!
Viewing Component Properties
Double click on a
component to bring
up its Properties Page
Viewing Port Properties
DML
Record Format Metadata in
Graphical Form
0345John Smith
0212Sam Spade
0322Elvis Jones
0492Sue West
0121Mary Forth
0221Bill Black
DML Types
Fixed length
Delimiter
Mixed
Editing Types in GDE
Field name Field type Field length
DML : Fixed length
record
decimal(4) id;
string(6) first_name;
string(6) last_name;
string(1) new_line;
end
DML : Delimited
record
decimal("|") id;
string("|") first_name;
string("|") last_name;
string("\n") new_line;
end
DML : Mixed
record
decimal(4) id;
string("|") first_name;
string("|") last_name;
string(1) new_line;
end
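To make the fixed-length and delimited layouts concrete, here is a minimal Python sketch (not Ab Initio code) of how a reader could split one record under each description. The helper names `parse_fixed` and `parse_delimited` and the sample lines are assumptions made for illustration.

```python
def parse_fixed(line, widths):
    """Split a line into fields of the given fixed widths."""
    fields, pos = [], 0
    for width in widths:
        fields.append(line[pos:pos + width])
        pos += width
    return fields


def parse_delimited(line, delimiters):
    """Split a line where each field ends at its own delimiter."""
    fields, rest = [], line
    for delim in delimiters:
        value, rest = rest.split(delim, 1)
        fields.append(value)
    return fields


# Fixed layout: decimal(4) id; string(6) first_name;
# string(6) last_name; string(1) new_line
fixed = parse_fixed("0345John  Smith \n", [4, 6, 6, 1])

# Delimited layout: the first three fields end at "|",
# the last one ends at the newline (hypothetical sample line)
delimited = parse_delimited("345|John|Smith|\n", ["|", "|", "|", "\n"])
```

A mixed layout would combine both ideas: some fields consumed by width, others consumed up to their delimiter.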
Field Names
Names consist of letters, digits, and
underscores:
a … z, A … Z, 0 … 9, _
Note: No spaces, hyphens, $'s, #'s, %'s
Case does matter! ABC and abc are
different!
Some words are reserved (record, end,
date, …)
Field Type and Field Length
• There are several built-in types available
via the drop-down menu. This course
uses three types: string, decimal (for all
numbers), and date.
• A date type requires a format specifier
that is an exact representation of the date
(e.g., “MM-DD-YYYY”).
• A field length is either a number for fixed-
length fields, or the delimiter that
terminates the field for variable-length
fields.
What Data Can Be Described?
There are both fixed-size and
variable-length types.
ASCII, EBCDIC, UNICODE character
sets are supported.
Supported types can represent
strings, numbers, binary numbers,
packed decimals, dates …
Complex data formats can consist of
nested records, vectors, ...
Access to Field Characteristics
Some aspects of field descriptions
(e.g., date formats) must be
accessed via the attribute pane.
To see additional attributes, use the
'Attributes' item on the Record
Format Editor's View menu or use the
Attributes button.
More Record Format Editing
View… Attributes
Length can be a delimiter string
Field Type drop-down
Date format goes here
Expressions in DML
Computations are expressed in the
algebraic syntax of C.
Field names act as variables.
Arithmetic operators: +, -, *, ...
Comparison operators: >, <, ==, !=, ...
Many built-in functions: string_concat,
string_trim, today, date_day_of_week, …
(but field sequence dependency)
Output file
Output File represents data records written
as output from a graph into one or
multiple serial files or a multifile.
Description Tab :
URL : Path of the file where data is stored.
( file: / mfile: )
Partition : Ad hoc multifile (changing depth)
Access Tab :
File Handling
File Protection
Port : DML
Intermediate file
Intermediate File represents one or
multiple serial files or a multifile of
intermediate results that a graph writes
during execution, and saves for your
review after execution.
Description Tab :
URL : Path of the file where data is stored.
( file: / mfile: )
Partition : Ad hoc multifile (changing depth)
Access Tab :
File Handling
File Protection
Port : DML
Viewing Data
1. Right click on dataset.
2. Select “View Data...”
The View Data Panel
Evaluating Expressions from
View Data
Type in an expression...
…or use the expression editor
Expression Editor
Fields Functions Operators
Expression text
Exercise : Writing DML
Open a new graph and create an Input File.
The data file data1.dat contains the following data:
Rao,Sunita,20031223,24000,\n
Shinde,Sachin,19931029,32000,\n
Sharma,Sunil,19941102,19000,\n
Use the Record Format Editor to create a
description of this data:
last_name
first_name
joining_date
salary
Then use View Data to verify the description is
correct.
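As a cross-check for the exercise, the Python sketch below (not DML) shows the values a correct description should yield for the sample rows. It assumes every field, including the last, is terminated by a comma, that joining_date is in YYYYMMDD form, and that salary is a whole number; `parse_record` is a hypothetical helper.

```python
from datetime import datetime

SAMPLE = (
    "Rao,Sunita,20031223,24000,\n"
    "Shinde,Sachin,19931029,32000,\n"
    "Sharma,Sunil,19941102,19000,\n"
)


def parse_record(line):
    """Split one comma-terminated record into named, typed fields."""
    last_name, first_name, joining_date, salary, _newline = line.split(",")
    return {
        "last_name": last_name,
        "first_name": first_name,
        # Assumed date layout: YYYYMMDD
        "joining_date": datetime.strptime(joining_date, "%Y%m%d").date(),
        "salary": int(salary),
    }


records = [parse_record(line) for line in SAMPLE.splitlines(keepends=True)]
```

If View Data shows names, dates, and salaries that line up with these parsed values, the record format is likely correct.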
Simple components
These components
do not have any
parameters.
Trash
Trash ends a flow by accepting all
the data records in it and discarding
them.
Replicate
Replicate arbitrarily combines all the
data records it receives into a single
flow and writes a copy of that flow to
each of its output flows.
Component: Gather Logs
Reads logging records from
multiple flows connected to the
input port and writes them to the
specified 'log file' outside of the
application's transactional context.
Database Components
These components
work with third-party
databases to read,
manipulate, and save
data in tables.
Note: The available parameters depend on the database and on the utility
used to connect to it.
Database Configuration (.dbc)
dbms: oracle ## Required. Do not change
db_version: 9.2.0.1 ## Required. Enter the Oracle version number
db_home: /etl_test/u01/app/oracle/product/9.2.0.1 ## ORACLE_HOME
db_name: @RDMETL ## Connect string
db_nodes: localhost
user: abinitio1 ## Or use a variable to avoid hard coding - ${MY_USER}
password: abinitio ## Can be encrypted from 2.12 onwards
Input Table
Input Table unloads data records from a
database into an Ab Initio graph, allowing
you to specify as the source either a
database table, or an SQL statement that
selects data records from one or more
tables.
Output Table
Output Table loads data records from a
graph into a database, letting you specify
the records' destination either directly as a
single database table, or through an SQL
statement that inserts data records into
one or more tables.
Run SQL
Run SQL executes SQL statements in a
database and writes confirmation
messages to the log port.
You can use Run SQL to perform database
operations such as table or index creation.
Exercise: Input Table / Run SQL
Create a DBC file to connect to the database
Create table temp1 with columns
id number
first_name varchar2(10)
last_name varchar2(10)
Create index on id column
Insert dummy data in the table using
database insert statements
View data in GDE using Input Table.
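The database side of this exercise can be rehearsed outside the GDE. The sketch below uses Python's built-in sqlite3 module with an in-memory database as a stand-in for the real Oracle instance, which is an assumption; the index name `temp1_id_idx` and the dummy rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for the real target database
conn = sqlite3.connect(":memory:")

# Table from the exercise; SQLite accepts the Oracle-style type names
conn.execute(
    "CREATE TABLE temp1 ("
    "id NUMBER, first_name VARCHAR2(10), last_name VARCHAR2(10))"
)

# Index on the id column (index name is hypothetical)
conn.execute("CREATE INDEX temp1_id_idx ON temp1 (id)")

# Dummy data inserted with plain INSERT statements
conn.executemany(
    "INSERT INTO temp1 (id, first_name, last_name) VALUES (?, ?, ?)",
    [(1, "John", "Smith"), (2, "Sam", "Spade")],
)
conn.commit()

# The kind of SELECT an Input Table could use as its source
rows = conn.execute(
    "SELECT id, first_name, last_name FROM temp1 ORDER BY id"
).fetchall()
```

In the graph itself, the CREATE statements would go through Run SQL and the SELECT through Input Table.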
Update table
Update Table executes UPDATE, INSERT or
DELETE statements in embedded SQL
format to modify a table in a database, and
writes status information to the log port.
Port SQL is associated with the parameter
updateSqlOnceOnly.
• true: executed only once, at the start of the
component's execution.
• false: executed first at the start, and re-executed
immediately after each commit.
Update Table : Working
It works like a merge (upsert) command:
The statements are applied to the
incoming records as follows. For each
record:
• The statement referenced by updateSqlFile is
attempted first. If the statement can be
successfully applied to the current record, it is
executed, and the statement referenced by
insertSqlFile is skipped.
• If the updateSqlFile statement cannot be
applied to the current record, the statement
referenced by insertSqlFile is attempted.
Note that updateSqlFile and insertSqlFile need not be files: the SQL
statements can be embedded in the component directly.
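The update-then-insert order described above can be sketched in Python with sqlite3. This is an illustration, not the component's implementation: it assumes "cannot be applied" means the UPDATE matched no rows, and `apply_record`, the table contents, and the SQL text are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE temp1 (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)"
)
conn.execute("INSERT INTO temp1 VALUES (1, 'John', 'Smith')")

# Stand-ins for the statements referenced by updateSqlFile and insertSqlFile
UPDATE_SQL = (
    "UPDATE temp1 SET first_name = :first_name, last_name = :last_name "
    "WHERE id = :id"
)
INSERT_SQL = (
    "INSERT INTO temp1 (id, first_name, last_name) "
    "VALUES (:id, :first_name, :last_name)"
)


def apply_record(record):
    """Try the update first; fall back to the insert if no row matched."""
    cursor = conn.execute(UPDATE_SQL, record)
    if cursor.rowcount > 0:
        return "updated"  # the insert statement is skipped
    conn.execute(INSERT_SQL, record)
    return "inserted"


actions = [
    apply_record({"id": 1, "first_name": "Jon", "last_name": "Smith"}),
    apply_record({"id": 2, "first_name": "Sam", "last_name": "Spade"}),
]
rows = conn.execute(
    "SELECT id, first_name, last_name FROM temp1 ORDER BY id"
).fetchall()
```

The first record matches an existing id and is updated; the second has no match and is inserted.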
Simple Components
In these
components the
record format
metadata does
not change from
input to output
The Filter by Expression
For each record on the input port the
'select_expr' parameter is evaluated.
• If 'select_expr' evaluates true (non-zero), the
input record is written to the 'out' port exactly
as the input was read.
• If 'select_expr' evaluates false (zero), the
record is written to the 'deselect' port.
The 'out' port, which carries the records
meeting the 'select_expr' criteria, must be
connected downstream.
The 'deselect' port may optionally be
used.
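The routing rule can be pictured with a short Python sketch (not Ab Initio code). The function name `filter_by_expression`, the sample records, and the expression `id < 100` are assumptions for illustration.

```python
def filter_by_expression(records, select_expr):
    """Route each record to 'out' or 'deselect' without changing it."""
    out, deselect = [], []
    for record in records:
        if select_expr(record):
            out.append(record)       # expression true: goes to 'out'
        else:
            deselect.append(record)  # expression false: goes to 'deselect'
    return out, deselect


records = [{"id": 45}, {"id": 345}, {"id": 100}]
out, deselect = filter_by_expression(records, lambda rec: rec["id"] < 100)
```

Every input record ends up on exactly one of the two ports, and its format is unchanged.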
Filter Data (Selection)
1. Push “Run” button.
2. View monitoring information.
3. View output data.
Expression Parameter
The Sort Component
Reads records from input port, sorts
them by key, and writes the result
on the output port.
Keys
A key identifies a single field or set
of fields (a composite key) used to
organize a dataset in some way.
Single field: {id}
Multiple fields: {last_name;
first_name}
Modifiers: {id descending}
Used for sorting, grouping,
partitioning.
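The three key forms above map naturally onto sort keys in Python. The sketch below is an analogy only; the sample records are taken from the earlier name list, and the variable names are hypothetical.

```python
from operator import itemgetter

records = [
    {"id": 345, "first_name": "John", "last_name": "Smith"},
    {"id": 212, "first_name": "Sam", "last_name": "Spade"},
    {"id": 322, "first_name": "Elvis", "last_name": "Jones"},
]

# Single field: {id}
by_id = sorted(records, key=itemgetter("id"))

# Multiple fields: {last_name; first_name}
by_name = sorted(records, key=itemgetter("last_name", "first_name"))

# Modifier: {id descending}
by_id_desc = sorted(records, key=itemgetter("id"), reverse=True)
```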
Sorting
Sorting - The Key Specifier Editor
Exercise : Sorting & Filtering
Read data1.dat
Sort data according to the id field
(asc. /desc.)
Save records having id <100 in
outfile1.dat
Save records having id >= 100 in
table temp1
Run the application and examine the
resulting data.
More Complex Components
In these
components the
record format
metadata typically
changes (goes
through a
transformation)
from input to
output
Reformat
Reads records from input port,
reformats each according to a
transform function (optional in the case
of the Reformat Component), and
writes the result records to the output
(out0) port.
Additional output ports (out1, ...) can
be created by adjusting the count
parameter.
Transformation Functions
A transform function specifies the
business rules used to create the
output record.
Each field of the output record must
successfully be assigned a value.
Partial output records are not
allowed!
The Transform Editor is used to
create a transform function in a
graphical manner.
Data Transformation
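Input: id,last_name,first_name,j_date,salary
Reformat:
• id+1000000 gives n_id
• Combine last_name and first_name into full_name
• Change format of j_date to DD/MM/YYYY, giving n_date
• Remove salary
Output: n_id,full_name,n_date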
The Transform Function Editor
Text DML: Transform Function
Syntax
Transform Functions look like:
output-variables :: name ( input-variables ) =
begin
assignments;
end;
Assignments look like:
output-variable.field :: expression;
The Transform Function in Text
Format
out :: reformat (in) =
begin
out.id :: in.id + 1000000;
out.last_name :: string_concat("Mac", in.last_name);
end;
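For readers more used to general-purpose languages, the same two rules can be written as a Python function. This is an analogy, not DML; the dictionary-based record and the sample input are assumptions.

```python
def reformat(in_rec):
    """Build the output record from the input record, one rule per field."""
    return {
        "id": in_rec["id"] + 1000000,              # out.id :: in.id + 1000000;
        "last_name": "Mac" + in_rec["last_name"],  # string_concat("Mac", ...)
    }


out = reformat({"id": 345, "last_name": "Smith"})
```

Each output field gets exactly one value, mirroring the rule that partial output records are not allowed.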
A Look Inside the Reformat Component
Input fields: a b c
Output fields: x y z
out :: trans(in) =
begin
out.x :: in.b - 1;
out.y :: in.a;
out.z :: fn(in.c);
end;
1. A record arrives at the input port: 9 45 QF
2. The record is read into the component.
3. The transformation function is evaluated.
4. Since every rule within the transform function succeeds, a result
record is produced: 44 9 RG
5. The result record is written to the output port of the component.
Exercise : Reformat Data
Create a new graph (use input file data1.dat)
• id|city|name|salary
Remove records for employees having
id > 500
Add a new field named city to the
output (default value 'Mumbai')
Change the delimiter from "|" to ";"
Increase salary by 25% for employees
having id < 100
Run the graph and examine the results.
Rollup
Rollup generates data records that
summarize groups of data records.
By default, Rollup reads grouped
(sorted) records from the input port,
aggregates them as indicated by key
and transform parameters, and writes
the resulting aggregate record on the
out port.
Data Aggregation
Input:
0345Smith Bristol 56
0212Spade London 8
0322Jones Compton 12
0492West London 23
0121Forth Bristol 7
0221Black New York 42
Output:
Bristol 63
Compton 12
London 31
New York 42
Data Aggregation of
Sorted/Grouped Input
Input (sorted/grouped by city):
0345Smith Bristol 56
0121Forth Bristol 7
0322Jones Compton 12
0212Spade London 8
0492West London 23
0221Black New York 42
Output:
Bristol 63
Compton 12
London 31
New York 42
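The aggregation of sorted input shown above can be reproduced with a small Python sketch (not Ab Initio code). It relies on the input already being grouped by city, just as Rollup does by default. The tuple layout and the field name `amount` are assumptions.

```python
from itertools import groupby
from operator import itemgetter

# (id, name, city, amount), already sorted/grouped by city
records = [
    ("0345", "Smith", "Bristol", 56),
    ("0121", "Forth", "Bristol", 7),
    ("0322", "Jones", "Compton", 12),
    ("0212", "Spade", "London", 8),
    ("0492", "West", "London", 23),
    ("0221", "Black", "New York", 42),
]

city = itemgetter(2)
amount = itemgetter(3)

# One summary record per run of consecutive records with the same key
totals = [
    (key, sum(amount(rec) for rec in group))
    for key, group in groupby(records, key=city)
]
```

Like the default Rollup, `groupby` only merges adjacent records with equal keys, which is why the input must be grouped first.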
Built-in Functions for Rollup
The following aggregation functions
are predefined and are only available
in the rollup component:
avg
max
count
min
first
product
last
sum
Rollup Wizard
Note the use of an aggregation function in the expression
Exercise : Rollup Data
For the data above, find the maximum and
minimum points for each city name.
Save the aggregation results in separate
fields (max_pt, min_pt)
Run the application and examine the
results.
The Join Component
Join performs a join of inputs.
Join types are inner, outer, and
semi-joins with multiple flows of
data records.
By default, the inputs to join must be
sorted and an inner join is computed.
Joining Data
in0:
0345Smith Bristol 56
0212Spade London 8
0322Jones Compton 12
0492West London 23
0121Forth Bristol 7
0221Black New York 42
in1:
0322970402 1242.50
0345970924 923.75
0121961211 12392.00
0492971123 234.12
0666950616 2312.10
out:
0345Bristol 561997/09/24
0212London 81900/01/01
0322Compton 121997/04/02
0492London 231997/11/23
0121Bristol 71996/12/11
0221New York 421900/01/01
Joining Sorted Data on the 'id' field
in0 (sorted by id):
0121Forth Bristol 7
0212Spade London 8
0221Black New York 42
0322Jones Compton 12
0345Smith Bristol 56
0492West London 23
in1 (sorted by id):
0121961211 12392.00
0322970402 1242.50
0345970924 923.75
0492971123 234.12
0666950616 2312.10
out:
0121Bristol 71996/12/11
0212London 81900/01/01
...
Building the Output Record
in0:
record
decimal(4) id;
string(6) name;
string(8) city;
decimal(3) amount;
end
in1:
record
decimal(4) id;
date("YYMMDD") dt;
decimal(9.2) cost;
end
out:
record
decimal(4) id;
string(8) city;
decimal(3) amount;
date("YYYY/MM/DD") dt;
end
What if the in1 record is missing?
in0:
record
decimal(4) id;
string(6) name;
string(8) city;
decimal(3) amount;
end
in1:
record
decimal(4) id;
date("YYMMDD") dt; ???
decimal(9.2) cost;
end
out:
record
decimal(4) id;
string(8) city;
decimal(3) amount;
date("YYYY/MM/DD") dt;
end
Prioritized Assignment
Destination Priority Source
out.dt :1: in1.dt;
out.dt :2: "1900/01/01";
In DML, a missing value (say, if there is
no 'in1' record) causes an assignment
to fail.
If an assignment for a left hand side
fails, the next priority assignment is
tried. There must be one successful
assignment for each output field.
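The fallback behaviour can be mimicked in Python as a rough sketch. Here a missing in1 record is modelled as `None`, and a failed assignment as an exception. The helper names `assign_by_priority` and `out_dt` are hypothetical, and the date reformatting from YYMMDD to YYYY/MM/DD is left out for brevity.

```python
def assign_by_priority(rules):
    """Return the value of the first rule that succeeds, in priority order."""
    for rule in rules:
        try:
            return rule()
        except (TypeError, KeyError):
            continue  # this rule failed; try the next priority
    raise ValueError("no rule succeeded for this output field")


def out_dt(in1):
    return assign_by_priority([
        lambda: in1["dt"],     # out.dt :1: in1.dt;
        lambda: "1900/01/01",  # out.dt :2: "1900/01/01";
    ])


with_match = out_dt({"dt": "1997/09/24"})
without_match = out_dt(None)
```

If every rule for a field fails, the sketch raises an error, which corresponds to the requirement that each output field needs one successful assignment.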
Assigning Priorities to Business
Rules
Resulting display when out.dt is
selected
Joining
A Look Inside the Join Component*
*join-type = Full Outer join
in0 fields: a b c
in1 fields: a q r
out fields: a x q
Align inputs by key a
out :: join(in0, in1) =
begin
out.a :: in0.a;
out.x :1: in1.r + 20;
out.x :2: in0.b + 10;
out.q :1: in1.q;
out.q :2: "XX";
end;
1. Records arrive at the inputs of the Join: G 234 42 on in0 and
G NY 4 on in1.
2. The input records are read into the Join.
3. The input key fields are compared.
4. The aligned records are passed to the transformation function.
5. The transformation engine evaluates based on the inputs.
6. A result record is emitted and written out: G 24 NY
7. New records arrive at the inputs of the Join: H 79 23 on in0 and
K IL 8 on in1.
8. Again, they are read into the Join component.
9. The input key fields are compared.
10. The aligned records are passed to the transformation function:
H 79 23 has no in1 partner, so it is passed alone while K IL 8
waits at the input.
11. The transformation engine evaluates based on the inputs.
12. A result record is generated and written out: H 89 XX
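The walkthrough above can be replayed with a compact Python sketch of a merge-style full outer join over sorted inputs. It is an illustration, not the component's implementation: it assumes unique key values on each input, models a missing record as `None`, and simply skips pairs with no in0 record because `out.a` has only one rule. The function names are hypothetical.

```python
def align_by_key(in0, in1, key):
    """Yield (left, right) pairs from two key-sorted lists; None = missing."""
    i = j = 0
    while i < len(in0) or j < len(in1):
        left = in0[i] if i < len(in0) else None
        right = in1[j] if j < len(in1) else None
        if right is None or (left is not None and left[key] < right[key]):
            yield left, None
            i += 1
        elif left is None or right[key] < left[key]:
            yield None, right
            j += 1
        else:
            yield left, right
            i += 1
            j += 1


def transform(in0, in1):
    """Python rendering of the prioritized rules in the join transform."""
    return {
        "a": in0["a"],
        "x": in1["r"] + 20 if in1 is not None else in0["b"] + 10,
        "q": in1["q"] if in1 is not None else "XX",
    }


in0 = [{"a": "G", "b": 234, "c": 42}, {"a": "H", "b": 79, "c": 23}]
in1 = [{"a": "G", "q": "NY", "r": 4}, {"a": "K", "q": "IL", "r": 8}]

results = [
    transform(left, right)
    for left, right in align_by_key(in0, in1, "a")
    if left is not None  # assumption: skip pairs where out.a cannot be assigned
]
```

The two results match the records written out in steps 6 and 12.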
Exercise: Join Data
Study different joins
• Inner
• Full Outer
• Explicit
Record required parameter
If a port does not have a record with a key
value that matches the current key value,
and you set the record-required parameter
for that port to:
• false - Join calls the transform function with NULL
for the corresponding argument.
• true - Join does not call the transform function at
all for the current key value.
The GDE Debugger
The GDE has a built-in debugging
capability.
To enable the Debugger, select
Debugger: Enable Debugger
The Debugger Toolbar
Enable Debugger
Add Watcher File
Remove All Watchers
Isolate Components
The GDE Debugger
To add a Watcher File, select a flow and
click Add Watcher
To remove a Watcher File, click Remove
All Watchers
To isolate a set of components, select the
components to be isolated. Watcher Files
will automatically be placed into the graph
by the Debugger.
END
( Day -2 )