Ab initio Interview Questions
Q. What is a surrogate key?
Answer: A surrogate key is a system-generated sequential number that acts as a primary key.
Q. What are the differences between Ab Initio and Informatica?
Answer: Informatica and Ab Initio both support parallelism, but Informatica supports only one type of parallelism while Ab Initio supports three:
Component parallelism
Data parallelism
Pipeline parallelism
Ab Initio does not ship with a scheduler the way Informatica does; you schedule graphs through scripts or run them manually.
Ab Initio supports different types of text files, meaning you can read the same file with different record structures, which is not possible in Informatica. Ab Initio is also generally considered more user friendly than Informatica.
Informatica is an engine-based ETL tool; its power lies in its transformation engine, and the code it generates after development cannot be seen or modified.
Ab Initio is a code-based ETL tool; it generates ksh or bat scripts, which can be modified to achieve goals that cannot be handled by the ETL tool itself.
Initial ramp-up time with Ab Initio is quick compared to Informatica; when it comes to standardization and tuning, both probably fall into the same bucket.
Ab Initio does not need a dedicated administrator; a UNIX or NT admin will suffice, whereas Informatica needs a dedicated administrator.
With Ab Initio you can read data with multiple delimiters in a given record, whereas Informatica forces all fields to be delimited by one standard delimiter.
Error handling: in Ab Initio you can attach error and reject files to each transformation and capture and analyze the messages and data separately. Informatica has one huge log, which is inefficient when working on a large process with numerous points of failure.
Q. What is the difference between rollup and scan?
Answer: Rollup produces one summary record per key group; it cannot generate cumulative (running) summary records. To generate cumulative summary records we use Scan.
Q. Why do we go for Ab Initio?
Answer: Ab Initio is designed to support the largest and most complex business applications.
We can easily develop applications for business requirements using the GDE.
Data processing is very fast and efficient compared to other ETL tools.
It is available on both Windows NT and UNIX.
Q. What is the difference between partitioning with key and round robin?
Answer:
PARTITION BY KEY:
Here we have to specify the key on which the partitioning occurs. All records with the same key value land in the same partition, so it is useful for key-dependent parallelism; how well balanced the partitions are depends on the distribution of key values.
PARTITION BY ROUND ROBIN:
Here the records are distributed sequentially, spreading data evenly in blocksize chunks across the output partitions. It is not key based and results in well-balanced data, especially with a blocksize of 1. It is useful for record-independent parallelism.
Q. How to create a surrogate key using Ab Initio?
Answer: A key is a field or set of fields
that uniquely identifies a record in a file
or table.
A natural key is a key that is meaningful
in some business or real-world sense.
For example, a social security number
for a person, or a serial number for a
piece of equipment, is a natural key.
A surrogate key is a field that is added
to a record, either to replace the natural
key or in addition to it, and has no
business meaning. Surrogate keys are
frequently added to records when
populating a data warehouse, to help
isolate the records in the warehouse
from changes to the natural keys by
outside processes.
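As an illustration, here is a minimal reformat transform sketch that assigns a surrogate key using the built-in next_in_sequence() function. The field names are hypothetical, and on a multifile the per-partition sequence would have to be offset by the partition number to stay globally unique.
out :: reformat(in) =
begin
  /* next_in_sequence() returns 1, 2, 3, ... within each partition */
  out.surrogate_key :: next_in_sequence();
  /* copy the remaining same-named fields straight through */
  out.* :: in.*;
end;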
Q. What are the most commonly used components in an Ab Initio graph?
Answer:
input file / output file
input table / output table
lookup / lookup_local
reformat
gather / concatenate
join
run sql
join with db
compression components
filter by expression
sort (single or multiple keys)
rollup
partition by expression / partition by key
Q. How do we handle a DML that changes dynamically?
Answer: There are many ways to handle DMLs that change dynamically within a single file. Some of the suitable methods are to use a conditional DML, or to call vector functionality while calling the DMLs.
Q. What is meant by limit and ramp in Ab Initio? In which situations are they used?
Answer: Limit and ramp are the variables used to set the reject tolerance for a particular graph. This is one of the options for the reject-threshold property; limit and ramp values must be supplied when this option is enabled. The graph stops execution when the number of rejected records exceeds the following formula:
limit + (ramp * number_of_records_processed)
The default value is 0.0.
The limit parameter contains an integer that represents a number of reject events. The ramp parameter contains a real number that represents a rate of reject events per record processed.
Typical limit and ramp settings:
Limit = 0, Ramp = 0.0 : Abort on any error
Limit = 50, Ramp = 0.0 : Abort after 50 errors
Limit = 1, Ramp = 0.01 : Abort if more than 2 in 100 records cause errors
Limit = 1, Ramp = 1 : Never abort
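As a worked example of the formula above (the numbers are only illustrative):
limit = 1, ramp = 0.01, records processed = 1000
tolerance = 1 + (0.01 * 1000) = 11 rejects; the 12th reject aborts the graph
limit = 50, ramp = 0.0
tolerance = 50 rejects, regardless of how many records have been processed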
Q. What are data mapping and data modeling?
Answer: Data mapping deals with the transformation of the extracted data at FIELD level, i.e. the transformation of a source field to a target field is specified by the mapping defined on the target field. The data mapping is specified during the cleansing of the data to be loaded. Data modeling, by contrast, deals with designing the structure of the data itself: the entities, attributes and relationships (for example, the tables of a warehouse schema).
For example:
source:
string(35) name = "Siva Krishna ";
target:
string("01") nm = NULL(""); /* maximum length is string(35) */
Then we can have a mapping like: straight move, trimming the leading or trailing spaces.
The above mapping specifies the transformation of the field nm.
Q. What is the difference between a DB config (.dbc) and a .cfg file?
Answer: A .dbc file has the information required for Ab Initio to connect to the database to extract or load tables or views, while a .cfg file is the table configuration file created by db_config and used by components such as Load DB Table.
Q. What is meant by a layout?
Answer: A layout is a list of host and directory locations, usually given by the URL of a file or multifile. If a layout has multiple locations but is not a multifile, the layout is a list of URLs called a custom layout.
A program component's layout is the list of hosts and directories in which the component runs.
A dataset component's layout is the list of hosts and directories in which the data resides. Layouts are set on the Properties Layout tab.
The layout defines the level of parallelism. Parallelism is achieved by partitioning data and computation across processors.
Q.What are Cartesian joins?
Answer: A Cartesian join will get you a
Cartesian product. A Cartesian join is
when you join every row of one table to
every row of another table. You can also
get one by joining every row of a table to
every row of itself.
Q.What is the function you would use to
transfer a string into a decimal?
Answer: For converting a string to a
decimal we need to typecast it using the
following syntax,
out.decimal_field :: ( decimal(
size_of_decimal ) ) string_field;
The above statement converts the
string to decimal and populates it to the
decimal field in output.
Q. How do we handle a DML that changes dynamically?
Answer: There are many ways to handle DMLs that change dynamically within a single file. Some of the suitable methods are to use a conditional DML, or to call vector functionality while calling the DMLs. We can also use the MULTIREFORMAT component to handle dynamically changing DMLs.
Q. Explain the differences between API and utility mode?
Answer: API and UTILITY are the two possible interfaces for connecting to databases to perform certain user-specific tasks. These interfaces allow the user to access or use certain functions (provided by the database vendor) to perform operations on the databases. The functionality of each of these interfaces depends on the database.
API mode has more flexibility but is often considered slower than UTILITY mode; the trade-off is between performance and functionality.
Q. What are the uses of the is_valid and is_defined functions?
Answer:
is_valid and is_defined are predefined functions.
is_valid(): tests whether a value is valid. The is_valid function returns:
The value 1 if expr is a valid data item.
The value 0 otherwise.
If expr is a record type that has field validity checking functions, the is_valid function calls each field validity checking function. The is_valid function returns 0 if any field validity checking function returns 0 or NULL.
Examples:
is_valid(1) => 1
is_valid("oao") => 1
is_valid((decimal(8))"1,000") => 0
is_valid((date("YYYYMMDD"))"19960504") => 1
is_valid((date("YYYYMMDD"))"abcdefgh") => 0
is_valid((date("YYYY MMM DD"))"1996 May 04") => 1
is_valid((date("YYYY MMM DD"))"1996*May&04") => 0
is_defined(): tests whether an expression is not NULL. The is_defined function returns:
The value 1 if expr evaluates to a non-NULL value.
The value 0 otherwise.
The inverse of is_defined is is_null.
Q. What is meant by merge join and hash join? Where are they used in Ab Initio?
Answer: The command line syntax for the Join component consists of two commands. The first one calls the component, and is one of two commands:
mp merge join : to process sorted input
mp hash join : to process unsorted input
Q. What is the difference between a sandbox and the EME? Can we perform checkin and checkout through the sandbox? Can anybody explain checkin and checkout?
Answer: Sandboxes are work areas used to develop, test or run code associated with a given project. Only one version of the code can be held within the sandbox at any time.
The EME Datastore contains all versions of the code that have been checked into it. A particular sandbox is associated with only one project, whereas a project can be checked out to a number of sandboxes.
Q. What are graph parameters?
Answer: Graph parameters are parameters added to a particular graph. You can add them by selecting Edit > Parameters from the menu.
Here is an example of using graph parameters: if you want to run the same graph for n files in a directory, you can assign a graph parameter to the input file name and supply the parameter value from a script before invoking the graph, as in the sketch below.
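A minimal ksh sketch of this, assuming the graph has been deployed as abc.ksh and its INPUT_FILE parameter is set up to take its value from the environment (both names are hypothetical):
for f in /data/in/*.dat
do
  export INPUT_FILE="$f"     # value picked up by the $INPUT_FILE graph parameter
  ./abc.ksh || exit 1        # stop the loop on the first failure
done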
Q. How to schedule graphs in Ab Initio, like workflow scheduling in Informatica? And where must we use Unix shell scripting in Ab Initio?
Q. How to improve the performance of graphs in Ab Initio? Give some examples or tips.
Answer: There are many ways to improve the performance of graphs in Ab Initio. Here are a few points:
Use an MFS system, partitioning with Partition by Round-robin.
If there is a large amount of data, use lookup_local rather than lookup where possible.
Take out unnecessary components like Filter by Expression; instead provide the filtering in Reformat/Join/Rollup.
Use Gather instead of Concatenate.
Tune max-core for optimal performance.
Try to avoid unnecessary phases.
Go parallel as soon as possible using Ab Initio partitioning techniques.
Once data is partitioned, do not bring it back to serial and then back to parallel; repartition instead.
For small processing jobs, serial may be better than parallel.
Do not access large files across NFS; use the FTP component instead.
Use Ad Hoc MFS to read many serial files in parallel, and use the Concatenate component.
Using phase breaks lets you allocate more memory to individual components and make your graph run faster.
Use a checkpoint after the sort rather than landing the data on disk.
Use the in-memory feature of Join and Rollup.
Best performance is gained when components can work in memory within MAX-CORE.
MAX-CORE for Sort is calculated by finding the size of the input data file.
For an in-memory join, the memory needed equals the non-driving data size plus overhead. If the in-memory join cannot fit its non-driving inputs in the provided MAX-CORE, it drops all the inputs to disk and the in-memory option no longer makes sense.
Use Rollup and Filter by Expression as early as possible to reduce the number of records.
When joining a very small dataset to a very large dataset, it is more efficient to broadcast the small dataset to the MFS using the Broadcast component, or to use the small file as a lookup.
Use MFS; use Round-robin partitioning or load balancing if you are not joining or rolling up.
Filter the data at the beginning of the graph.
Take out unnecessary components like Filter by Expression; instead use a select expression in Join, Rollup, Reformat etc.
Use lookups instead of joins if you are joining a small table to a large table.
Take out old components and use new ones, for example Join instead of match merge.
Use Gather instead of Concatenate.
Use phasing if you have too many components.
Tune max-core for optimal performance.
Avoid sorting data by using the in-memory option of Join for smaller datasets.
Use an Ab Initio layout instead of the database default to achieve parallel loads.
Change the AB_REPORT parameter to increase the monitoring duration.
Use catalogs for reusability.
Use Sort after a partition component instead of before it.
Partition the data as early as possible and departition it as late as possible.
Filter unwanted fields/records as early as possible.
Try to avoid the usage of the Join with DB component.
Q. How does the force_error function work? If we set never abort in Reformat, will force_error stop the graph or will it continue to process the next set of records?
Answer: force_error, as the name suggests, forces an error when a given condition is not met. The function can be used as per the requirement.
If you want to stop execution of the graph when a specific condition is not met (say you have to reconcile input and output record counts and the graph should fail if the input record count is not the same as the output record count), then set the reject-threshold to "Abort on first reject" so that the graph stops.
Note: force_error directs all the records meeting the condition to the reject port, with the error message going to the error port.
In certain special circumstances you can also treat the reject port as an additional data flow path leaving the component. When using force_error to direct valid records to the reject port for separate processing, you must remember that invalid records will also be sent there.
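A minimal sketch of a reformat rule using force_error; the field names and the condition are hypothetical:
out :: reformat(in) =
begin
  out.cust_id :: in.cust_id;
  /* valid amounts pass through; a negative amount raises an error and, depending
     on the reject-threshold setting, is rejected or aborts the graph */
  out.amount  :: if (in.amount >= 0) in.amount
                 else force_error("negative amount rejected");
end;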
Q. What are the most commonly used components in an Ab Initio graph? Can anybody give a practical example of a transformation of data, say customer data in a credit card company, into meaningful output based on business rules?
Answer: The most commonly used components in any Ab Initio project are:
input file / output file
input table / output table
lookup file
reformat, gather, join, run sql, join with db, compression components, sort, trash, partition by expression, partition by key, concatenate
Q. How to work with parameterized graphs?
Answer: One of the main purposes of parameterized graphs is that if we need to run the same graph n times for different files, we set up graph parameters like $INPUT_FILE, $OUTPUT_FILE etc. and supply the values for these in Edit > Parameters. These parameters are substituted at run time. We can set different types of parameters, like positional, keyword, local etc.
The idea is that instead of maintaining different versions of the same graph, we can maintain one version for different files.
Q. What is the use of the unused port in the Join component?
Answer: While joining two input flows, records that match the join condition go to the output port, and we can get the records that do not meet the join condition at the unused ports.
Q. What is meant by dedup sort with a null key?
Answer: If we do not use any key in the sort component while using dedup sort, then the output depends on the keep parameter, because the whole input is treated as one group:
first : only the first record is kept
last : only the last record is kept
unique_only : there will be no records in the output file
Q. Can anyone tell me what happens when a graph runs? i.e. the Co>Operating System is on the host and we are running the graph from somewhere else. How does the Co>Operating System interact with the native OS?
Answer: The Co>Operating System is layered on top of the native OS.
When a graph is executed it is deployed using the host settings and a connection method like rexec, telnet, rsh or rlogin; this is how the graph interacts with the Co>Operating System.
Whenever you press the Run button in the GDE, the GDE generates a script, and the generated script is transferred to the host specified in your GDE run settings. The Co>Operating System then interprets this script and executes it on different machines (if required) as sub-processes. After completion of each sub-process, the sub-processes return status codes to the main process, and the main process in turn returns the error or success code of the job to the GDE.
Q. What is the difference between conventional loading and direct loading? When is each used in real time?
Answer:
Conventional load: before loading the data, all the table constraints are checked against the data.
Direct load (faster loading): all the constraints are disabled and the data is loaded directly. Later the data is checked against the table constraints and the bad data is not indexed.
API mode corresponds to conventional loading; utility mode corresponds to direct loading.
Q. Explain the environment variables with an example.
Answer: Environment variables serve as global variables in a Unix environment. They are used for passing values from one shell/process to another. They are inherited by Ab Initio as sandbox variables/graph parameters, for example:
AI_SORT_MAX_CORE
AI_HOME
AI_SERIAL
AI_MFS etc.
To see which variables exist in your Unix shell, find out the naming convention and type a command such as env | grep AI_. This will give you a list of the matching variables set in the shell. You can refer to the graph parameters/components to see how these variables are used inside Ab Initio.
Q. How to find the number of arguments defined in a graph?
Answer: $* gives the list of shell arguments.
Then what are $# and $? ...
$# : the number of positional parameters
$? : the exit status of the last executed command
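A small ksh illustration of these variables (the script name is hypothetical):
#!/bin/ksh
# save as show_args.ksh and run:  ./show_args.ksh a b c
echo "all arguments       : $*"
echo "number of arguments : $#"
ls /tmp > /dev/null
echo "exit status of ls   : $?"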
Q. How many inputs does the Join component support?
Answer: Join supports a maximum of 60 inputs and a minimum of 2 inputs.
Q. What is max-core? Which components use MAX_CORE?
Answer: The value of the MAX_CORE parameter determines the maximum amount of memory, in bytes, that a given component will use. If the component is running in parallel, the value of MAX_CORE represents the maximum memory usage per partition.
If MAX_CORE is set too low, the component runs slower than expected. Too high, and the component uses too many machine resources and slows down dramatically.
The max-core parameter can be defined in the following components:
SCAN / in-memory SCAN
ROLLUP / in-memory ROLLUP
in-memory JOIN
SORT
Whenever these components are used with the parameter set to "In memory: Inputs need not be sorted", a max-core value must be specified.
Q. What does dependency analysis mean in Ab Initio?
Answer:
Dependency analysis analyses a project for the dependencies within and between graphs. The EME examines the project and develops a survey tracing how data is transformed and transferred, field by field, from component to component.
Dependency analysis has two basic steps: translation and analysis.
Analysis level: in the checkin wizard's advanced options, the analysis level can be specified as one of the following:
None: no dependency analysis is performed during the checkin.
Translation only: the graph being checked in is translated to datastore format but no error checking is done. This is the minimum requirement during checkin.
Translation with checking (default): along with the translation, errors that would interfere with dependency analysis are checked for. These include:
Absolute paths
Undefined parameters
DML syntax errors
Parameter references to objects that cannot be resolved
Wrong substitution syntax in parameter definitions
Full dependency analysis: full dependency analysis is done during checkin. It is not recommended because it takes a long time and can in turn delay the checkin process.
What to analyse:
All files: analyse all files in the project.
All unanalysed files: analyse all files that have been changed, or which depend on or are required by files that have changed, since the last time they were analysed.
Only my checked-in files: all files checked in by you are analysed if they have not been analysed before.
Only the file specified: apply analysis to the specified file only.
Q. What is the difference between a .dbc and a .cfg file?
Answer: A .cfg file is for the remote connection and a .dbc file is for connecting to the database.
A .cfg file contains:
The name of the remote machine
The username/password to be used while connecting to the database
The location of the operating system on the remote machine
The connection method
A .dbc file contains:
The database name
The database version
The userid/password
The database character set, and some more.
Q. What are graph parameters?
Answer: There are 2 types of graph parameters in Ab Initio:
1. Local parameters
2. Formal parameters (parameters supplied at run time)
Q. How many types of joins are there in Ab Initio?
Answer: Join matches records based on a match key for its inputs; the Join component has out, unused, reject and log ports.
Inner joins:
The most common case is when join-type is Inner Join. In this case, if each input port contains a record with the same value for the key fields, the transform function is called and an output record is produced.
If some of the input flows have more than one record with that key value, the transform function is called multiple times, once for each possible combination of records, taken one from each input port. Whenever a particular key value does not have a matching record on every input port and Inner Join is specified, the transform function is not called, and all incoming records with that key value are sent to the unused ports.
Full outer joins:
Another common case is when join-type is Full Outer Join: if each input port has a record with a matching key value, Join does the same thing it does for an Inner Join. If some input ports do not have records with matching key values, Join applies the transform function anyway, with NULL substituted for the missing records; the missing records are in effect ignored. With an Outer Join, the transform function typically requires additional rules (compared to an Inner Join) to handle the possibility of NULL inputs.
Explicit joins:
The final case is when join-type is Explicit. This setting allows you to specify True or False for the record-requiredn parameter of each inn port. The settings you choose determine when Join calls the transform function.
The join-type and record-requiredn parameters:
In the product documentation, two intersecting ovals represent the key values of the records on the two input ports, in0 and in1. For each possible setting of join-type, or (if join-type is Explicit) combination of settings for record-requiredn, the shaded region of the corresponding diagram represents the inputs for which Join calls the transform. Join ignores the records whose key values fall in the white regions, and consequently those records go to the unused port.
Q. What is a semi-join?
Answer: A left semi-join on two input files connected to ports in0 and in1 is like an Inner Join where the dedup0 parameter is set to "Do not dedup this input" but dedup1 is set to "Dedup this input before joining". Duplicates are removed only from the in1 port, that is, from Input File 2.
Semi-joins can also be achieved by using the Join component with the Join Type parameter set to explicit join and the record-required0 / record-required1 parameters set one to true and the other to false, depending on whether you require a left outer or right outer join.
In Ab Initio there are 3 types of join:
1. inner join, 2. outer join and 3. semi join.
For an inner join the record-requiredn parameter is true for all in ports.
For an outer join it is false for all the in ports.
If you want a semi join, set record-requiredn to true for the required port and false for the other ports.
Q. How do we run sequences of jobs? For example, the output of job A is the input to job B; how do we coordinate the jobs?
Answer: By writing wrapper scripts we can control the sequence of execution of more than one job, as in the sketch below.
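A minimal wrapper-script sketch, assuming the two graphs have been deployed as job_a.ksh and job_b.ksh (hypothetical names):
#!/bin/ksh
# run job A; only if it succeeds, run job B, which reads job A's output
./job_a.ksh
if [ $? -ne 0 ]; then
  echo "job_a failed, not starting job_b" >&2
  exit 1
fi
./job_b.ksh
exit $?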
Q. How would you do performance tuning for an already built graph? Can you give some examples?
Answer: Examples:
1) Suppose a Sort is used in front of a Merge component; there is no use in adding the Sort, because Merge already sorts its inputs.
2) Use a lookup instead of a Join or Merge component where appropriate.
3) Suppose we want to join the data coming from 2 files and we don't want duplicates; we can use a union function instead of adding an additional component to remove duplicates.
Q. What is the relation between the EME, the GDE and the Co>Operating system?
Answer: EME stands for Enterprise Metadata Environment, GDE for Graphical Development Environment, and the Co>Operating system can be thought of as the Ab Initio server. The relation between them is as follows: the Co>Operating system is the Ab Initio server, installed on a particular OS platform called the native OS. The EME is just like the repository in Informatica; it holds the metadata, transformations, db config files, and source and target information. The GDE is the end-user environment where we develop the graphs (mappings, just like in Informatica). The designer uses the GDE to design graphs and saves them to the EME or to a sandbox; the sandbox is on the user side whereas the EME is on the server side.
Q. When do we use dynamic DML?
Answer: Dynamic DML is used when the input metadata can change. Example: at different times, different input files with different DMLs are received for processing. In that case we can use a flag in the DML; the flag is read first from the input file, and according to the flag the corresponding DML is used.
Q. Explain the differences between Replicate and Broadcast?
Answer: Replicate arbitrarily combines the records from its input flows into a single flow and writes a copy of that flow to each component connected to its output port. Broadcast is a partition component that copies each input record to every flow connected to its output port. Consider one example: if an input file contains 4 records and the level of parallelism is 3, then Replicate gives 4 records to each component connected to its out port, whereas Broadcast delivers 12 records in total across the partitions connected to its out port.
Q.How do you truncate a table?
Answer: From Ab Initio, run the Run SQL component using the DDL "truncate table", or use the Truncate Table component in Ab Initio.
Q. How to get a DML using utilities in UNIX?
Answer: By using the m_db gendml command with the database configuration (.dbc) file and the -table option, for example: m_db gendml <dbc file> -table <table name>.
Q. Explain the difference between REFORMAT and REDEFINE FORMAT?
Answer: Reformat changes the record format by adding or deleting fields in the DML record; the length of the record can change.
Redefine Format copies its input flow to its output port without any transform. Redefine is used to rename the fields in the DML, but the length of the record should not change.
Q.How to work with parameterized
graphs?
Answer: Parameterized graphs specify everything through parameters, i.e. the locations of input/output files, DMLs, etc.
Q. What is the driving port? When do you use it?
Answer: When you set the sorted-input parameter of the JOIN component to "In memory: Input need not be sorted", you can specify the driving port. Generally the driving port is used to improve performance in a graph.
The driving input is the largest input; all other inputs are read into memory.
For example, suppose the largest input to be joined is on the in1 port. Specify a port number of 1 as the value of the driving parameter; the component then reads all the other inputs to the join (for example in0 and in2) into memory. The default is 0, which specifies that the driving input is on port in0.
Join improves performance by loading all records from all inputs except the driving input into main memory.
The driving port in Join supplies the data that drives the join; that means every record from the driving port is compared against the data from the non-driving ports.
We set the driving port to the larger dataset so that the non-driving data, which is smaller, can be kept in main memory, speeding up the operation.
Q. How can we test Ab Initio graphs manually and through automation?
Answer: Running a graph through the GDE is a manual test; running a graph using its deployed script is an automated test.
Q. What is the difference between partitioning with key and round robin?
Answer: Partition by Key (hash partition): this is a partitioning technique used to partition data when the keys are diverse. If a key value is present in large volume then there can be large data skew, but this method is used more often for parallel data processing.
Round robin partitioning is another partitioning technique that uniformly distributes the data over the destination data partitions. The skew is zero when the number of records is divisible by the number of partitions. A real-life example is how a pack of 52 cards is distributed among 4 players in a round-robin manner.
Q. What are skew and skew measurement?
Answer: Skew is the measure of how unevenly data flows to each partition:
skew of a partition = (partition size - average partition size) * 100 / (size of the largest partition)
For example, suppose the input comes from 4 partitions of sizes 100 MB, 200 MB, 300 MB and 400 MB (1 GB in total). The average partition size is 1000 MB / 4 = 250 MB and the largest partition is 400 MB, so the skew of the 100 MB partition is (100 - 250) * 100 / 400 = -37.5%. The same calculation can be repeated for the 200, 300 and 400 MB partitions. A skew close to zero is desirable; skew is an indirect measure of how well a graph uses its partitions.
Q. What is the error called 'depth not equal'?
Answer: When two components are linked together and their layouts do not match, this problem can occur during the compilation of the graph. A solution is to use a partitioning component in between where the layout changes.
Q. What is the function you would use to transfer a string into a decimal?
Answer: For converting a string to a decimal we need to typecast it using the following syntax:
out.decimal_field :: (decimal(size_of_decimal)) string_field;
The above statement converts the string to decimal and populates the decimal field in the output.
Q.Which one is faster for processing
fixed length dmls or delimited dmls and
why?
Answer: Fixed length, because for a delimited DML the engine has to check for the delimiter every time, whereas for a fixed-length DML the length is taken directly.
Q. What kinds of layouts does Ab Initio support?
Answer: Ab Initio supports two kinds of layouts: serial layout and multi (parallel) layout.
In Ab Initio the layout tells which component should run where, and it also gives the level of parallelism.
For a serial layout the level of parallelism is 1; for a multi layout the level of parallelism depends on the data partitioning.
Q. How can you run a graph infinitely?
Answer: To run a graph infinitely, the end script of the graph should call the .ksh file of the graph. Thus if the name of the graph is abc.mp, then in the end script of the graph there should be a call to abc.ksh; the graph will then run infinitely. Alternatively, run the deployed script in an infinite loop, as in the sketch below.
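Equivalently, as a sketch, the deployed script can simply be wrapped in an infinite shell loop (abc.ksh is a hypothetical name):
#!/bin/ksh
# keep re-running the deployed graph; stop the loop if a run fails
while true
do
  ./abc.ksh || break
done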
Q. What are local and formal parameters?
Answer: Both are graph-level parameters, but for a local parameter you need to initialize the value at the time of declaration, whereas a formal parameter need not be initialized; you will be prompted for its value at the time of running the graph.
A local parameter is like a local variable in the C language, whereas a formal parameter is like a command line argument that we pass at run time.
Q. What are BROADCAST and REPLICATE?
Answer: Broadcast can do everything that Replicate does; Broadcast can also send a single serial file to an MFS without splitting it, making multiple copies of the single file across the multifile. Replicate arbitrarily combines the data it receives into a single flow and writes a copy of that flow to each of its output flows.
Replicate generates multiple straight flows as output, whereas Broadcast produces a single fan-out flow. Replicate improves component parallelism, whereas Broadcast improves data parallelism.
Broadcast takes data from multiple inputs, combines it and sends it to all the output ports.
E.g. you have 2 incoming flows (this can be data parallelism or component parallelism) into a Broadcast component, one with 10 records and the other with 20 records. Then every outgoing flow (it can be any number of flows) will have 10 + 20 = 30 records.
Replicate replicates the data for a particular partition and sends it out to multiple out ports of the component, but maintains the partition integrity.
E.g. your incoming flow to Replicate has a data parallelism level of 2, with one partition having 10 records and the other having 20 records. Now suppose you have 3 output flows from Replicate; then each flow will have 2 data partitions with 10 and 20 records respectively.
Q. What is the importance of the EME in Ab Initio?
Answer: The EME is a repository in Ab Initio; it is used for checkin and checkout of graphs and also maintains graph versions.
Q. What is m_dump?
Answer: m_dump is a Co>Operating system command that we use to view data from the command prompt. The m_dump command prints the data in a formatted way.
Q. What is the syntax of the m_dump command?
Answer: Typically m_dump <record format (DML) file> <data file> [options], i.e. you give it the DML describing the data and the file or multifile to display.
Q. What are the differences between different GDE versions (1.10, 1.11, 1.12, 1.13 and 1.15)? What are the differences between different versions of the Co>Op?
Answer: 1.10 is a non-key version and the rest are key versions. A lot of components were added and revised in the later versions.
Q. How to run the graph without the GDE?
Answer: In the run directory a graph can be deployed as a .ksh file. This .ksh file can then be run at the command prompt as:
ksh <graph_name>.ksh
Q. What is the difference between a DML expression and an XFR expression?
Answer: A DML expression means the Ab Initio DML is stored or saved in a file; the DML describes the data in terms of record formats, contains expressions that perform simple computations, and describes data in terms of keys that specify grouping or non-grouping. In other words, DML files are non-embedded record format files.
An .xfr is simply a non-embedded transform file. A transform function expresses business rules, local variables and statements, as well as the connections between these elements and the input and output fields.
Q. How does MAX-CORE work?
Answer: Max-core is the temporary memory used, for example, to sort records. Max-core is a value (in bytes/KB); whenever a component is executed it will use up to the amount of memory we specify for its execution. In short, max-core is the maximum memory that can be used by a component in its execution.
Q. What is $mpjret? Where is it used in Ab Initio?
Answer: $mpjret is the return value of the shell command "mp run" that executes an Ab Initio graph. It is generally treated as the graph execution status return value.
Q. What is the latest version available in Ab Initio?
Answer: The latest version of the GDE is 1.15 and of the Co>Operating system is 2.14.
Q. What is meant by the Co>Operating system and why is it special for Ab Initio?
Answer: "Co>Operating system" itself means a lot; it is not merely an engine or interpreter. As the name says, it is an operating system that co-exists with another operating system. In layman's terms, Ab Initio, unlike other applications, does not sit as a layer on top of an OS. It has quite a lot of operating-system-level capabilities of its own, such as multifiles and memory management, and in this way it integrates completely with the host OS and works jointly on the available hardware resources. This synergy with the OS optimizes the utilization of the available hardware. Unlike other applications (including most other ETL tools) it does not work as a layer that merely interprets commands. That is the major difference from other ETL tools, and it is the reason why Ab Initio is much faster than other ETL tools, and obviously much more expensive as well.
Q. How to take input data from an Excel sheet?
Answer: There is a Read Excel component that reads the Excel file either from the host or from the local drive; the DML will be a default one. Through the Read Excel component in $AB_HOME we can read Excel files directly.
Q. How will you test a .dbc file from the command prompt?
Answer: You can test a .dbc file from the command prompt (Unix) using the m_db test command, which checks the database connection and reports information such as the database version and user.
Q.Which one is faster for processing
fixed length dmls or delimited dmls and
why?
Answer: Fixed-length DMLs are faster because the data is read directly by length without any comparisons, whereas with a delimited DML every character has to be compared against the delimiter, which causes delays.
Q. What are the continuous components in Ab Initio?
Answer: Continuous components are used to create graphs that produce useful output while running continuously. Examples: Continuous Rollup, Continuous Update, Batch Subscribe.
Q. How can I calculate the total memory requirement of a graph?
Answer: You can roughly calculate the memory requirement as follows:
Each partition of a component uses approximately 8 MB plus its max-core (if any).
Add the size of the lookup files used in the phase (if multiple components use the same lookup, only count it once), then multiply by the degree of parallelism.
Add up all the components in a phase; that is how much memory is used in that phase.
Add the size of the input and output datasets. The total memory requirement of a graph is at least that of the largest-memory phase in the graph.
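A rough worked example using hypothetical numbers: a phase containing a 4-way parallel Sort with max-core 100 MB, a 4-way Reformat, and one 50 MB lookup file:
Sort      : 4 partitions x (8 MB + 100 MB) = 432 MB
Reformat  : 4 partitions x  8 MB           =  32 MB
Lookup    :                                   50 MB
Phase total                                ~ 514 MB
Adding the sizes of the input and output datasets, the graph as a whole needs at least as much memory as its largest phase.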
Q. What is a multistage component?
Answer: Multistage components are transform components in which the records are transformed in five stages: input selection, temporary record initialization, processing, finalization and output selection.
Examples of multistage components are:
Rollup
Scan
Normalize
Denormalize Sorted
Q. What is the use of Aggregate when we have Rollup? We know the Rollup component in Ab Initio is used to summarize groups of data records; then where do we use Aggregate?
Answer: Rollup gives better control over record selection, grouping and aggregation compared to Aggregate; Rollup is essentially an updated version of Aggregate. When Rollup is in template mode it has aggregation functions available, so it is better to go for Rollup.
Q. Phase versus checkpoint?
Answer: Difference between a phase and a checkpoint:
Phases are used to break up a graph so that it does not use up all the memory. Phasing limits the number of active components and thus reduces the number of components running in parallel, which improves performance. Phases make possible the effective utilization of resources such as memory, disk space and CPU. So when we have memory-consuming components in a straight flow and the data in the flow is in the millions, we can separate that processing into its own phase so that more CPU is allocated to it and the whole process takes less time.
Temporary files created during a phase are deleted after completion of that phase.
Don't put a phase break after Replicate or Sort, across all-to-all flows, or on temporary files.
Checkpoints are used for the purpose of recovery. They are like save points: they are required if we need to restart the graph from the last saved phase recovery file (a phase break with checkpoint) when it fails unexpectedly.
At job start, output datasets are copied into temporary files, and after the completion of checkpointing all datasets and the job state are copied into temporary files, so if any failure occurs the job can be rerun from the last committed checkpoint.
The use of phase breaks that include checkpoints degrades performance somewhat but ensures a save-point run.
The major difference between the two is that phasing deletes the intermediate files made at the end of each phase as soon as it enters the next phase, whereas checkpointing stores these intermediate files until the end of the graph. Thus we can easily use the intermediate files to restart the process from where it failed, which cannot be done with phasing alone.
We can have phases without checkpoints, but we cannot assign checkpoints without phases.
Q. In Ab Initio, how can you display records between 50 and 75?
Answer: Suppose the input dataset has 100 records and I want the records between 50 and 75; then use m_dump with -start 50 -end 75.
For serial and MFS files the components can be used in many ways:
1. Filter by Expression: use next_in_sequence() > 50 && next_in_sequence() < 75.
2. We can also use multiple LEADING RECORDS components to meet the requirement.
If you have access to the Co>Op you can try an alternative. Say the input file is file1; use the Run Program component in the GDE and write the command:
sed -n '50,75p' file1 > file2
Q. What is the order of evaluation of parameters?
Answer: When you run a graph, parameters are evaluated in the following order:
The host setup script is run.
Common (i.e. included) sandbox parameters are evaluated.
Sandbox parameters are evaluated.
The project-start.ksh script is run.
Graph parameters are evaluated.
The graph Start Script is run.
The processes are then executed, in parallel where the component layouts allow it: lookup files are loaded, the graph metadata is checked, the input/output file paths are checked, and the graph runs in order of phase 0, phase 1, phase 2, ...
Q. How do you convert a 4-way MFS to an 8-way MFS?
Answer: By repartitioning the data; we can use any partitioning method to repartition. The partitioning methods are:
Partition by Round-robin
Broadcast
Partition by Key
Partition by Expression
Partition by Range
Partition by Percentage
Partition by Load Balance
Q. For data parallelism we can use partition components, and for component parallelism we can use the Replicate component. Which component(s) can we use for pipeline parallelism?
Answer: When a connected sequence of components on the same branch of a graph executes concurrently, that is called pipeline parallelism.
Components like Reformat, where we distribute the input flow to multiple output flows using output_index depending on some selection criteria and process those output flows simultaneously, create pipeline parallelism.
But components like Sort, where the entire input must be read before a single record is written to the output, cannot achieve pipeline parallelism.
Q. What is meant by fancing in Ab Initio?
Answer: The phrase "ab initio" means "from the beginning". Did you mean "fanning"? "fan-in"? "fan-out"?
Q. How to retrieve data from a database to a source file; which component is used for this?
Answer: To unload (retrieve) data from a database such as DB2, Informix or Oracle, we have components like Input Table and Unload DB Table; using these two components we can unload data from the database.
Q. What is the use of Aggregate when we have Rollup? We know the Rollup component in Ab Initio is used to summarize groups of data records; then where do we use Aggregate?
Answer: Aggregate and Rollup can both summarize data, but Rollup is much more convenient to use, and a Rollup is much more explanatory about how a particular summarization is being done compared to Aggregate. Rollup can also do other things like input and output filtering of records.
Q. What kinds of layouts does Ab Initio support?
Answer: Basically there are serial and parallel layouts supported by Ab Initio, and a graph can have both at the same time. The parallel layout depends on the degree of data parallelism: if the multifile system is 4-way parallel, then a component in a graph can run 4-way parallel if its layout is defined to match the degree of parallelism.
Q. How can you run a graph infinitely?
Answer: To run a graph infinitely, the end script of the graph should call the .ksh file of the graph. Thus if the name of the graph is abc.mp, then in the end script there should be a call to abc.ksh; this way the graph will run infinitely.
Q.How do you add default rules in
transformer?
Answer: Double-click the transform parameter on the Parameters tab of the component properties; this opens the Transform Editor. In the Transform Editor, click the Edit menu and select Add Default Rules from the drop-down. It shows two options: 1) Match Names 2) Wildcard.
Q. Do you know what a local lookup is?
Answer: If your lookup file is a multifile partitioned/sorted on a particular key, then the lookup_local function can be used instead of the lookup function call. It is local to a particular partition, depending on the key.
A lookup file consists of data records that can be held in main memory. This makes the transform function retrieve records much faster than retrieving them from disk, and allows the transform component to process the data records of multiple files quickly.
Q. What is the difference between a lookup file and a lookup, with a relevant example?
Answer: Generally a lookup file represents one or more serial (flat) files. The amount of data is small enough to be held in memory, which allows transform functions to retrieve records much more quickly than they could from disk.
A lookup is a component of an Ab Initio graph where we can store data and retrieve it using a key parameter; a lookup file is the physical file where the data for the lookup is stored.
Q. How to handle a DML that changes dynamically in Ab Initio?
Answer: If the DML changes dynamically, then both the DML and the XFR have to be passed as graph-level parameters at run time; alternatively use parameterization, a conditional record format, or metadata-driven processing.
Q. Explain what a lookup is?
Answer: A lookup is basically a keyed dataset. It can be used to map values according to the data present in a particular file (serial or multifile). The dataset can be static or dynamic (for example, when the lookup file is generated in a previous phase and used as a lookup file in the current phase). Sometimes hash joins can be replaced by a Reformat plus a lookup, if one of the inputs to the join contains a small number of records with a slim record length.
Ab Initio has built-in functions to retrieve values using the key for the lookup, as in the sketch below.
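A minimal sketch of looking up a value inside a Reformat transform; the lookup file label CUST_LKP, its key, and the field names are all hypothetical:
out :: reformat(in) =
begin
  out.cust_id   :: in.cust_id;
  /* guard with lookup_count so a missing key does not produce NULL */
  out.cust_name :: if (lookup_count("CUST_LKP", in.cust_id) > 0)
                     lookup("CUST_LKP", in.cust_id).cust_name
                   else "UNKNOWN";
end;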
Q. What is a ramp limit?
Answer: The limit parameter contains an integer that represents a number of reject events. The ramp parameter contains a real number that represents a rate of reject events per record processed.
Number of bad records allowed = limit + (number of records * ramp).
Ramp is basically a percentage value (from 0 to 1); the two together provide the threshold value of bad records.
Q.Have you worked with packages?
Answer: Multistage transform components use packages by default. However, a user can create his own set of functions in a transform function and include it in other transform functions.
Q. Have you used the Rollup component? Describe how.
Answer: If the user wants to group records on particular field values then Rollup is the best way to do that. Rollup is a multi-stage transform function and it contains the following mandatory functions:
1. initialize
2. rollup
3. finalize
You also need to declare a temporary variable if you want to get counts for a particular group.
For each group, Rollup first calls the initialize function once, followed by the rollup function for each of the records in the group, and finally the finalize function once at the end of the last rollup call.
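A minimal expanded-mode rollup sketch that counts the records in each group; the field names are hypothetical:
type temporary_type =
record
  decimal(10) cnt;
end;

temp :: initialize(in) =
begin
  temp.cnt :: 0;
end;

temp :: rollup(temp, in) =
begin
  temp.cnt :: temp.cnt + 1;   /* one more record seen in this group */
end;

out :: finalize(temp, in) =
begin
  out.key :: in.key;
  out.cnt :: temp.cnt;        /* emit one summary record per group */
end;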
Q. How do you add default rules in the transformer?
Answer: In the case of Reformat, if the destination field names are the same as or a subset of the source fields, then there is no need to write anything in the reformat XFR, unless you want a real transform other than reducing the set of fields or splitting the flow into a number of flows to achieve the functionality.
1) If it is not already displayed, display the Transform Editor grid.
2) Click the Business Rules tab if it is not already displayed.
3) Select Edit > Add Default Rules.
Add Default Rules opens the Add Default Rules dialog. Select one of the following: Match Names, which generates a set of rules that copies input fields to output fields with the same name; or Use Wildcard (.*) Rule, which generates one rule that copies input fields to output fields with the same name.
Q. What is the difference between partitioning with key and round robin?
Answer: Partition by Key (hash partition): this is a partitioning technique used to partition data when the keys are diverse. If a key value is present in large volume then there can be large data skew, but this method is used more often for parallel data processing.
Round robin partitioning is another partitioning technique that uniformly distributes the data over the destination data partitions. The skew is zero when the number of records is divisible by the number of partitions. A real-life example is how a pack of 52 cards is distributed among 4 players in a round-robin manner.
If you take some 30 cards at random from a 52-card pack and partition by the card colour as the key (red or black), the number of cards in each partition may vary a lot; with round robin we distribute by block size, so the variation is limited to the block size.
Partition by Key distributes according to the key value. Partition by Round Robin distributes a predefined number of records to one flow, then the same number of records to the next flow, and so on; after the last flow it resumes the pattern and distributes the records almost evenly. This pattern is called round-robin fashion.
Q. How do you truncate a table? (Each candidate usually gives only 1 of the several ways to do this.)
Answer: From Ab Initio, run the Run SQL component using the DDL "truncate table", or use the Truncate Table component in Ab Initio.
There are several ways to do it:
1. Probably the easiest way is to use Truncate Table.
2. Run SQL or Update Table can be used to do the same thing.
3. Run Program.
Q. Have you ever encountered an error called "depth not equal"? (This occurs when you extensively create graphs; it is a trick question.)
Answer: When two components are linked together and their layouts do not match, this problem can occur during the compilation of the graph. A solution is to use a partitioning component in between where the layout changes.
We have talked about a situation where you have linked two components, each of them having a different layout. Think about a situation where the component on the left-hand side is linked to a serial dataset and on the right-hand side the downstream component is linked to a multifile. Layout is propagated from neighbours, so without a partitioning component the jump in depth cannot be achieved, and you need a partitioning component to resolve this depth discrepancy.
Q. What is the function you would use to transfer a string into a decimal?
Answer: In this case no specific function is required if the size of the string and the decimal are the same; just use a decimal cast with the size in the transform function and that will suffice. For example, if the source field is defined as string(8) and the destination as decimal(8) (say the field name is field1):
out.field1 :: (decimal(8)) in.field1;
If the destination field size is smaller than the input, then string_substring can be used, like the following (say the destination field is decimal(5)):
out.field1 :: (decimal(5)) string_lrtrim(string_substring(in.field1, 1, 5));
/* string_lrtrim is used to trim leading and trailing spaces */
Q. How many kinds of parallelism are there in Ab Initio? Please give a definition of each.
Answer: There are 3 kinds of parallelism:
1) Data parallelism
2) Component parallelism
3) Pipeline parallelism
When the data is divided into small chunks and processed on different partitions simultaneously, we call it data parallelism.
When different components work on different data sets at the same time, it is called component parallelism.
When a graph uses multiple connected components to work on the same data simultaneously, each processing records as they arrive, we call it pipeline parallelism.
Q. What is a multi directory?
Answer: A multi directory is a parallel
directory that is composed of individual
directories, typically on different disks
or computers. The individual directories
are partitions of the multi directory.
Each multi directory contains one
control directory and one or more data
directories. Multi files are stored in multi
directories.
Q. What is a multifile?
Answer: A multifile is a parallel file that is composed of individual files, typically on different disks or computers. The individual files are partitions of the multifile. Each multifile contains one control partition and one or more data partitions. Multifiles are stored in distributed directories called multi directories.
The data in a multifile is usually divided across partitions by one of these methods:
Random or round-robin partitioning
Partitioning based on ranges or functions
Replication or broadcast, in which each partition is an identical copy of the serial data.
Q. What is meant by GDE and SDE? What is the purpose of GDE and SDE?
Answer:
GDE - Graphical Development Environment: used for developing graphs.
SDE - Shell Development Environment: used for developing Korn shell scripts on the Co>Operating system.
Q. What is the difference between Rollup and Scan?
Answer:
Rollup component: Rollup evaluates a group of input records that have the same key and then generates data records that either summarize each group or select certain information from each group.
The Rollup component can evaluate in two ways: 1. Template mode 2. Expanded mode.
1. Template mode: this mode evaluates using built-in aggregation functions such as sum, min, max, count, avg, product, first, last.
2. Expanded mode: this mode evaluates using user-defined functions (without the built-in aggregation functions), i.e. a temporary type and the initialize, rollup and finalize functions in the transform.
Scan generates a series of cumulative summary records, such as successive year-to-date totals, for groups of data records. Scan produces intermediate summary records.
In short, Rollup is for group-by style aggregation and Scan is for successive (running) totals: when we need to produce running summaries we use Scan, and when we need to aggregate data we use Rollup.
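A small illustration of the difference, with hypothetical data grouped by the key cust:
Input (key cust, field amount):   A 10, A 20, A 5, B 7, B 3
Rollup (sum by cust)  ->  A 35
                          B 10
Scan (running sum)    ->  A 10, A 30, A 35
                          B 7,  B 10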
Q. What is the runtime behavior of Rollup?
Answer: Rollup supports two modes:
1. Template mode: this mode evaluates using built-in aggregation functions such as sum, min, max, count, avg, product, first, last.
2. Expanded mode: this mode evaluates using user-defined functions (without the built-in aggregation functions), i.e. a temporary type and the initialize, finalize and rollup functions in the transform.
The Rollup component's behavior differs depending on whether its input is sorted or unsorted.
When the Rollup input is sorted: when you set the sorted-input parameter to "Input must be sorted or grouped" (the default), Rollup requires data records grouped according to the key parameter. If you need to group the records, use Sort with the same key specifier that you use for Rollup. Rollup then produces sorted output on its output port.
When the Rollup input is unsorted: when you set the sorted-input parameter to "In memory: Input need not be sorted", Rollup accepts ungrouped input and groups all records according to the key parameter. It does not produce sorted output.
Q. How do you roll back in Ab Initio?
Answer: Ab Initio has very good recovery options for failures at run time and interruptions at development time.
Development time: you get a recovery graph file if an interruption occurs at development time.
At run time: you get a recovery file if a failure occurs during execution of the graph, and you can restart the execution. The recovery file has the last checkpoint information and restarts from the last checkpoint onwards.
You can also roll back Ab Initio graphs from the command line:
m_rollback -d (given the job's recovery file) deletes all intermediate files and checkpoints.
Q. What is the internal execution process of an Ab Initio graph in the Ab Initio Co>Operating system while the graph is running?
Answer: Normally the Ab Initio Co>Operating system checks that the GDE and Co>Operating system versions are compatible, and if you use any lookup files in the graph it checks their layouts (lookup layout checking).
The graph has input and output files, and the system checks whether their paths are correct. The sequence of checks done while running a graph is:
Check lookup file layouts.
Check the metadata (whether the data types used are correct and everything related), i.e. DML checking on a per-component basis.
Check input files.
Check output files.
Check each component's layout.
Finally, the flow of the processes is assigned (e.g. straight flows).
Q.What does dependency analysis
mean in Ab-Initio?
Answer: Dependency analysis answers questions about data lineage, that is, where the data comes from, which applications produce it and which applications depend on it.
Q.What is meant by Fencing in Ab-
Initio?
Answer: In the software world, fencing means controlling jobs on a priority basis.
In Ab-Initio it actually refers to customized phase breaking.
A well-fenced graph means that, no matter what the source data volume is, the process will not end up in deadlocks.
It actually limits the number of simultaneous processes.
In Ab-Initio you sometimes need to fence a job to stop its schedule; fencing is nothing but changing the priority of a particular job.
Q.What is the function of fuse
component?
Answer: Fuse combines multiple input
flows into a single output flow by
applying a transform function to
corresponding records of each flow.
Runtime behavior of Fuse:
Fuse applies a transform function to
corresponding records of each input
flow. The first time the transform
function executes, it uses the first
record of each flow. The second time
the transform
function executes, it uses the second
record of each flow, and so on. Fuse
sends the result of the transform
function to the out port.
The component works as follows. The
component tries to read from each of its
input flows.
* If all of its input flows are finished,
Fuse exits.
* Otherwise, Fuse reads one record from
each still-unfinished input port and a
NULL from each finished input port.
Q.What is data skew? How can you eliminate data skew while using partition by key?
Answer: The skew of a data or flow partition is the amount by which its size deviates from the average partition size, expressed as a percentage of the largest partition:
skew = (partition size - average partition size) * 100 / (size of largest partition)
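For example (using the figures from the skew question later in this document), a 100 MB partition, when the average partition size is 250 MB and the largest partition is 500 MB, has a skew of (100 - 250) * 100 / 500 = -30%. When partitioning by key, skew can be reduced by choosing a partition key with many distinct, evenly distributed values, so that no single key value dominates one partition.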
Q.What is $mpjret? Where it is used in
ab-Initio?
Answer:
$mpjret gives the status of a graph.
You can use $mpjret in the end script, for example:
if [ $mpjret -eq 0 ]
then
  echo "success"
else
  mailx -s "[graph_name] failed" mail_id
fi
Q.What are primary keys and foreign
keys?
Answer: In an RDBMS, the relationship between two tables is represented as a primary key and foreign key relationship. The primary key table is the parent table and the foreign key table is the child table. The criterion for relating the two tables is that there must be a matching column.
Q.What is an outer join?
Answer: An outer join is used when one
wants to select all the records from a
port – whether it has satisfied the join
criteria or not.
If you want to see all the records of one input file regardless of whether there is a matching record in the other file, then it is an outer join.
Q.What are Cartesian joins?
Answer: joins two tables without a join
key. Key should be {}.
A Cartesian join will get you a Cartesian
product. A Cartesian join is when you
join every row of one table to every row
of another table. You can also get one
by joining every row of a table to every
row of itself.
Q.What is the difference between a DB
config and a CFG file?
Answer: A .dbc file has the information
required for Ab Initio to connect to the
database to extract or load tables or
views. While .CFG file is the table
configuration file created by db_config
while using components like Load DB
Table.
Both DBC and CFG files are used for database connectivity and serve basically the same purpose. The only difference is that a .cfg file is used for the Informix database, whereas .dbc files are used for other databases such as Oracle or SQL Server.
Q.What is the difference between local and formal parameters?
Answer: Both are graph-level parameters, but a local parameter must be given its value at the time of declaration, whereas a formal parameter need not be initialized; you are prompted for its value at the time the graph is run.
Q.How will you test a dbc file from
command prompt ??
Answer: try “m_db test myfile.dbc”
Q.Explain the difference between the
“truncate” and “delete” commands ?
Answer. Truncate: it is a DDL command, used to remove all rows from a table or cluster. Since it is a DDL command it auto-commits, so a rollback cannot be performed. It is faster than delete.
Delete: it is a DML command, generally used to delete records from a table. A rollback can be performed to retrieve the deleted rows. To make the deletion permanent, the "commit" command must be issued.
Q.How many components are there in
your most complicated graph?
Answer: This is a tricky question,
number of component in a graph has
nothing to do with the level of
knowledge a person has. On the
contrary, a proper, standardized and modular parametric approach will reduce the number of components to very few. In a well-thought-out modular and parametric design, most graphs will have 3-4 components, each doing a particular task and then calling another set of graphs to do the next, and so on. This way the total number of distinct graphs comes down drastically, and support and maintenance become much simpler. The bottom line is that there are a lot of other things to plan rather than just adding components.
Q.Do you know what a local lookup is?
Answer: This function is similar to a
lookup…the difference being that this
function returns NULL when there is no
record having the value that has been
mentioned in the arguments of the
function.
If it finds a matching record, it returns the complete record, that is, all the fields along with their values corresponding to the expression mentioned in the lookup_local function.
e.g.: lookup_local("LOOKUP_FILE", 81) -> NULL
if the key on which the lookup file is partitioned does not hold the value mentioned.
Local lookup files are small files that can be accommodated in physical memory for use in transforms. Details like country code/country, currency code/currency and forex rate/value can be kept in a lookup file and mapped during transformations. Lookup files are not connected to any component of the graph but are available to Reformat for mapping.
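As a hedged sketch only (the lookup file name COUNTRY_LOOKUP and its fields are hypothetical, and this relies on field access on a NULL lookup result yielding NULL), a transform might guard a lookup_local call like this:

out.country_name :: first_defined(
    lookup_local("COUNTRY_LOOKUP", in.country_code).country_name,  /* NULL when no matching record */
    "UNKNOWN");                                                     /* fallback value when not found */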
Q.How to Improve Performance of
graphs in Ab initio?
Give some examples or tips.
Ans. There are many ways to improve the performance of graphs in Ab-Initio. A few points from my side:
1. Use a multifile system (MFS), partitioning the data, for example with Partition by Round-robin.
2. Use lookup local rather than lookup when the lookup data is large.
3. Take out unnecessary components such as Filter by Expression; instead apply the condition in Reformat/Join/Rollup.
4. Use Gather instead of Concatenate.
5. Tune max-core for optimal performance.
6. Try to avoid too many phases.
There are many ways the performance
of the graph can be improved.
1) Use a limited number of components
in a particular phase
2) Use optimum value of max core
values for sort and join components
3) Minimise the number of sort
components
4) Minimise sorted join component and
if possible replace them by in-memory
join/hash join
5) Use only required fields in the sort,
reformat, join components
6) Use phasing/flow buffers in case of
merge, sorted joins
7) If the two inputs are huge then use
sorted join, otherwise use hash join with
proper driving port
8) For large dataset don’t use broadcast
as partitioner
9) Minimise the use of regular expression functions like re_index in the transform functions
10) Avoid repartitioning of data
unnecessarily
Q.Describe the process steps you
would perform when defragmenting a
data table. This table contains mission
critical data ?
Answer: There are several ways to do this:
1) Move the table within the same or another tablespace and rebuild all the indexes on the table. ALTER TABLE ... MOVE reclaims the fragmented space in the table; then run ANALYZE TABLE table_name COMPUTE STATISTICS to capture the updated statistics.
2) A reorg can also be done by taking a dump (export) of the table, truncating the table and importing the dump back into the table.
Q.How do we handle a DML that changes dynamically?
Answer: There are many ways to handle DMLs that change dynamically within a single file. Some of the suitable methods are to use a conditional DML, or to use the vector functionality when calling the DMLs.
Q.What are the graph parameters?
Answer: There are 2 types of graph parameters in Ab-Initio:
1. Local parameters
2. Formal parameters (parameters whose values are supplied at runtime).
Q.What does the word Ab Initio mean?
Answer: The word Ab Initio means "from the beginning".
Q.What is a ramp limit?
Answer: Limit and Ramp.
For most graph components we can manually set the error threshold limit, after which the graph exits. Normally there are three levels of thresholds. "Never Exit" and "Exit on First Occurrence" are self-explanatory and represent the two extremes. The third one is Limit along with Ramp: Limit sets the maximum number of rejects, whereas Ramp is expressed in terms of the percentage of processed records. For example, a ramp value of 5 means that if less than 5% of the total records are rejected, the graph continues running; if the rejects cross the ramp, it comes out of the graph. Typically development starts with Never Exit, followed by Ramp, and finally in production Exit on First Occurrence. On a case-by-case basis Ramp can be used in production, but it is definitely not a desired approach.
Q.Difference between conventional
loading and direct loading ? when it is
used in real time ?
Answer:
Conventional Load:
Before loading the data all the Table
constraints will be checked against the
data.
Direct load:(Faster Loading)
All the Constraints will be disabled. Data
will be loaded directly. Later the data
will be checked against the table
constraints and the bad data won’t be
indexed.
API mode performs conventional loading; Utility mode performs direct loading.
Q.How do you do unit testing in Ab-Initio? How do you execute Ab-Initio graphs, and how do you increase the performance of Ab-Initio graphs?
Answer:
The Ab-Initio Co>Operating System runs a graph as multiple processes executing simultaneously; this is the primary source of performance. In addition, follow the actions given below:
1. For data separators, mostly use "\307" and "\007" instead of "~", "," and other special characters, and avoid delimiters that can occur in the data, because Ab-Initio has predefined these data separators.
2. Avoid repeated aggregation in graphs. Calculate the required aggregation once, store the value in a file or parameter, and then use this parameter wherever it is required.
3. Avoid an excessive number of components in a graph, and limit the max-core components in a graph.
4. Do not write any kind of looping statements in the start script.
5. Mostly use flat files as sources.
Q.How would you do performance
tuning for already built graph?
Answer: Steps for performance tuning of an already built graph:
Understand the functionality of the graph.
Modularize (i.e. check for dependencies among components).
Add phasing.
Check for correct parallelism.
Check any DB components (i.e. take only the required data from the DB instead of taking the whole table, which consumes more time and memory).
Q.What is .abinitiorc ? What it contain?
Answer: .abinitiorc is a file which contains the credentials to connect to the host, such as:
1) Host IP
2) User name
3) Password, etc.
It is a configuration file for Ab-Initio, found in the user's home directory and in $AB_HOME/Config. It sets the Ab-Initio home path, configuration variables (AB_WORK_DIR, AB_DATA_DIR, etc.), login information (id, encrypted password), login methods for execution hosts (EME host, etc.), and so on.
Q.Why might you create a stored
procedure with the ‘with recompile’
option?
Answer: Recompile is useful when the
tables referenced by the stored
procedure undergoes a lot of
modification/deletion/addition of data.
Due to the heavy modification activity the execution plan becomes outdated, and hence the stored procedure's performance goes down. If we create the stored procedure with the recompile option, SQL Server won't cache a plan for this stored procedure and it will be recompiled every time it is run.
Q.What is the purpose of having stored
procedures in a database?
Answer: The main purpose of stored procedures is to reduce network traffic: all the SQL statements execute on the database server, so execution is faster.
In Ab-Initio we use the Run SQL and Join with DB components to run stored procedures.
Q.What is meant by the Co>Operating System and why is it special for Ab-Initio?
Answer:
The Co>Operating System is layered on top of the native operating system. It converts the Ab-Initio-specific code into a format which UNIX/Windows can understand and feeds it to the native operating system, which carries out the task.
Q.Which component is used to retrieve data from a database?
Answer: To unload (retrieve) data from a database such as DB2, Informix or Oracle, we have components like Input Table and Unload DB Table; by using these two components we can unload data from the database.
The Input Table component uses the following parameters:
1)db_config file(which contains
credentials to interface with Database)
2)Database Types
3)SQL file (which contains sql queries to
unload data from table(s)).
Q.How do you execute a graph from start to end? How do you run a graph on a non-Ab-Initio system?
Answer: There are several ways to do this:
1. You can run the components phase by phase, according to the phases you defined.
2. You can also run the graph outside the GDE through the ksh or sh scripts created when the graph is deployed.
Q.What is Join With DB?
Answer: Join with DB Component joins
records from the flow or flows
connected to its in port with records
read directly from a database, and
outputs new records containing data based on the transform function.
Q.How do you truncate a table?
Answer: Use Truncate Table component
to truncate a table from DB in Ab-Initio.
Truncate Table Component has the
following parameters:
1)db_config file(which contains
credentials to interface with Database)
2)Database Types
3)SQL file (which contains sql queries to
truncate table(s)).
Q.Can we load multiple files?
Answer: Yes, we can load multiple files in Ab-Initio.
Q.What is the syntax of m_dump
command?
Answer: The m_dump command prints data in a formatted way.
The general syntax is:
m_dump <metadata (.dml file)> <data file> [action]
e.g.
m_dump emp.dml emp.dat -start 10 -end 20
This prints records 10 to 20 from the emp.dat file.
Q.How to Create Surrogate Key using
Ab-Initio?
Answer: A surrogate key is a
substitution for the natural primary key.
It is just a unique identifier or number for each record, like the ROWID of an Oracle table.
Surrogate keys can be created using
1)next_in_sequence
2)this_partition
3)no_of_partitions
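For illustration only, a common pattern combines the three functions listed above into one expression that stays unique across partitions; verify the exact spelling of the partition-count function (often written number_of_partitions()) against your Co>Operating System version.

/* next_in_sequence() numbers records within a partition; multiplying by the partition
   count and adding the partition index keeps the values unique across all partitions */
out.surrogate_key :: (next_in_sequence() - 1) * number_of_partitions() + this_partition() + 1;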
Q.Can any one give me an example of
real-time start script in the graph?
Answer: The start script is a script which gets executed before the graph execution starts. If we want to export the values of parameters to the graph, we can set them in the start script; when the graph is then run, those values are exported to the graph.
Q.What is the difference between
sandbox and EME, can we perform
checkin and checkout through
sandbox/ Can anybody explain checkin
and checkout?
Answer. Sandboxes are work areas
used to develop, test or run code
associated with a given project. Only
one version of the code can be held
within the sandbox at any time.
The EME Datastore contains all versions
of the code that have been checked into
it. A particular sandbox is associated
with only one Project where as a Project
can be checked out to a number of
sandboxes.
Q.What is skew and skew
measurement?
Answer: Skew is the measure of how evenly data flows to each partition.
Suppose the input comes from 4 files and the total size is 1 GB:
1 GB = (100 MB + 200 MB + 300 MB + 500 MB)
Average partition size = 1000 MB / 4 = 250 MB
Skew of the 100 MB partition = (100 - 250) * 100 / 500 = -30%; calculate the 200 MB, 300 MB and 500 MB partitions in the same way.
Low skew (partitions close to the average size) is always desirable. Skew is an indirect measure of graph performance.
Q.What is the latest version that is
available in Ab-initio?
Answer: The latest version of the GDE is 1.15 and the Co>Operating System is 2.14.
Q.What is the Difference between DML
Expression and XFR Expression ?
Answer: The main difference between DML and XFR is that
DML represents the record format (metadata) of the data, whereas
XFR represents the transform functions, which contain the business rules.
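As a hedged illustration (the file names and field names below are hypothetical), a .dml file describes the record format while a .xfr file holds the transform:

/* customer.dml - record format (metadata) */
record
  string(",") cust_id;
  decimal(",") amount;
  string("\n") status;
end;

/* zero_inactive.xfr - transform function carrying a business rule */
out :: reformat(in) =
begin
  out.cust_id :: in.cust_id;
  out.amount  :: if (in.status == "A") in.amount else 0;
  out.status  :: in.status;
end;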
Q.What are the most commonly used components in an Ab-Initio graph? Can anybody give a practical example of a transformation of data, say customer data in a credit card company, into meaningful output based on business rules?
Answer: The most commonly used components in any Ab-Initio project are:
Input File / Output File
Input Table / Output Table
Lookup File
Reformat, Gather, Join, Run SQL, Join with DB, compress components, Sort, Trash, Partition by Expression, Partition by Key, Concatenate.
Q.Have you used rollup component?
Describe how ?
Answer: Rollup component can be used
in different number of ways. It basically
acts on a group of records based on a
certain key.
The simplest application would be to
count the number of records in a certain
file or table.
In this case there would not be any "key" associated with it. A temporary variable would be created, e.g. 'temp.count', which would be incremented with every record that flows through the transform (since there is no key here, all the records are treated as one group), like temp.count = temp.count + 1.
The Rollup component can also be used to discard duplicates from a group; Rollup basically acts as the Dedup component in this case.
Q.How to work with parameterized
graphs?
Answer: One of the main purposes of parameterized graphs is that if we need to run the same graph n times for different files, we set up graph parameters like $INPUT_FILE and $OUTPUT_FILE and supply their values under Edit > Parameters. These parameters are substituted at run time. We can set different types of parameters, such as positional, keyword and local parameters.
The idea here is that instead of maintaining different versions of the same graph, we maintain one version for different files.
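As a rough, hedged sketch only (the file names are hypothetical, and whether an exported environment value is picked up depends on how the graph parameter is declared, so treat the mechanism shown here as an assumption), a wrapper might supply the values before calling the deployed graph script:

#!/bin/ksh
# supply the input/output file parameters for this run, then call the deployed graph script
export INPUT_FILE=/data/in/customers_20190101.dat
export OUTPUT_FILE=/data/out/customers_clean.dat
./process_customers.ksh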
Q.How does MAX-CORE work?
Answer: Max-core is a value (specified in KB). Whenever a component is executed, it uses that much memory for its execution.
Q.What does layout means in terms of
Ab Initio?
Answer: Before you can run an Ab Initio
graph, you must specify layouts to
describe the following to the
Co>Operating System:
The location of files
The number and locations of the
partitions of multifiles
The number of, and the locations in
which, the partitions of program
components execute
A layout is one of the following:
A URL that specifies the location of a
serial file
A URL that specifies the location of the
control partition of a multifile
A list of URLs that specifies the locations
of:
The partitions of an ad hoc multifile
The working directories of a
program component
Every component in a graph — both
dataset and program components —
has a layout. Some graphs use one
layout throughout; others use several
layouts and repartition data as needed
for processing by a greater or lesser
number of processors.
During execution, a graph writes various
files in the layouts of some or all of the
components in it. For example:
An Intermediate File component writes to
disk all the data that passes through it.
A phase break, checkpoint, or watcher
writes to disk, in the layout of the
component downstream from it, all the
data passing through it.
A buffered flow writes data to disk, in the
layout of the component downstream
from it, when its buffers overflow.
Many program components — Sort is one
example — write, then read and remove,
temporary files in their layouts.
A checkpoint in a continuous graph
writes files in the layout of every
component as it moves through the
graph.
Q.Can we load multiple files?
Answer: Loading multiple files, from my perspective, means writing into more than one file at a time. If this is the case for you, Ab-Initio provides a component called Write Multiple Files (in the Dataset component group) which can write multiple files at a time. The files to be written must be local files, i.e. they should reside on your local machine. For more information on this component, read the help file.
Q.How would you do performance
tuning for already built graph ? Can you
let me know some examples?
Answer: Some examples:
1) Suppose a Sort component is used in front of a Merge component; there is no use in adding the Sort, because the sorting behavior is already built into Merge.
2) Use a lookup instead of a Join or Merge component where the reference data is small.
3) Suppose we want to join the data coming from 2 files and we don't want duplicates; we can use a union-style operation instead of adding an additional duplicate-remover component.
Q.Which one is faster for processing
fixed length dmls or delimited dmls and
why ?
Answer: Fixed-length DMLs are faster, because the data is read directly using the known length without any comparisons, whereas with delimited DMLs every character has to be compared against the delimiter, which causes delays.
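As a small illustrative sketch (the field names are hypothetical), the same two fields could be described either way:

record decimal(8) id; string(20) name; end;       /* fixed length: 28 bytes per record */
record decimal(",") id; string("\n") name; end;   /* delimited: comma and newline delimiters */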
Q.What is the function you would use to
transfer a string into a decimal?
Answer: For converting a string to a
decimal we need to typecast it using the
following syntax,
out.decimal_field :: ( decimal(
size_of_decimal ) ) string_field;
The above statement converts the
string to decimal and populates it to the
decimal field in output.
Q.What is the importance of EME in ab
initio?
Answer: EME is a repository in Ab-Initio; it is used for check-in and checkout of graphs and also maintains graph versions.
Q.How do you add default rules in
transformer?
Answer: Double click on the transform
parameter of parameter tab page of
component properties, it will open
transform editor. In the transform editor
click on the Edit menu and then select
Add Default Rules from the dropdown. It
will show two options – 1) Match
Names 2) Wildcard.
Q.What is data mapping and data
modeling?
Answer: data mapping deals with the
transformation of the extracted data at
FIELD level i.e. the transformation of the
source field to target field is specified
by the mapping defined on the target
field. The data mapping is specified
during the cleansing of the data to be
loaded.
For Example:
source;
string(35) name = “Siva Krishna “;
target;
string(“01”) nm=NULL(“”);/*(maximum
length is string(35))*/
Then we can have a mapping like: straight move, trimming the leading or trailing spaces.
The above mapping specifies the
transformation of the field nm.
Q.What are the continuous components in Ab-Initio?
Answer: Continuous components are used to create graphs that produce usable output while running continuously.
Ex: Continuous Rollup, Continuous Update, Batch Subscribe.
Q.How do you add default rules in
transformer?
Answer: Click on the transform, then go to Edit, then click Add Default Rules.
In Ab-Initio there is also a concept called rule priority, in which you can assign a priority to the rules in a transform.
For example:
Output.var1 :1: input.var1 + 10
Output.var1 :2: 100
This example shows that the output variable is assigned the input variable + 10, or, if the input variable does not have a value, the default value 100 is assigned to the output variable. The numbers 1 and 2 represent the priority.
Q.How do we run a sequence of jobs where, for example, the output of job A is the input of job B? How do we coordinate the jobs?
Answer: By writing wrapper scripts we can control the sequence of execution of more than one job.
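A minimal ksh sketch of such a wrapper (the script names job_a.ksh and job_b.ksh are hypothetical); job B runs only if job A, whose output it consumes, succeeds:

#!/bin/ksh
# run job A first; its output file is the input of job B
./job_a.ksh
if [ $? -ne 0 ]
then
    echo "job_a.ksh failed; not starting job_b.ksh"
    exit 1
fi
./job_b.ksh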
Q.What are Broadcast and Replicate?
Answer: Broadcast – Takes data from
multiple inputs, combines it and sends it
to all the output ports.
Eg – You have 2 incoming flows (This
can be data parallelism or component
parallelism) on Broadcast component,
one with 10 records & other with 20
records. Then on all the outgoing flows
(it can be any number of flows) will have
10 + 20 = 30 records
Replicate – It replicates the data for a
particular partition and send it out to
multiple out ports of the component, but
maintains the partition integrity.
Eg – Your incoming flow to Replicate has a data parallelism level of 2, with one partition having 10 records and the other having 20 records. Now suppose you have 3 output flows from Replicate. Then each flow will have 2 data partitions with 10 and 20 records respectively.
Q.When using multiple DML statements
to perform a single unit of work, is it
preferable to use implicit or explicit
transactions, and why.
Answer: Explicit transactions are preferable, because all the DML statements in the unit of work can then be committed or rolled back together; with implicit transactions the database handles (commits) each statement individually.
Q.What kinds of layouts does Ab-Initio support?
Answer: Basically there are serial and
parallel layouts supported by AbInitio. A
graph can have both at the same time.
The parallel one depends on the degree
of data parallelism. If the multi-file
system is 4-way parallel then a
component in a graph can run 4 way
parallel if the layout is defined such as
it’s same as the degree of parallelism.
Q.What is the difference between look-
up file and look-up, with a relevant
example?
Answer: A lookup is a component of
abinitio graph where we can store data
and retrieve it by using a key parameter.
A lookup file is the physical file where
the data for the lookup is stored.
Q.How will you test a dbc file from
command prompt?
Answer: A .dbc file can be tested using the m_db command,
e.g.: m_db test <dbc file name>
Q.Can we merge two graphs?
Answer: You can not merge two ab-
Initio graphs. You can use the output of
one graph as input for another. You can
also copy/paste the contents between
graphs.
Q.Explain the differences between api
and utility mode?
Answer: api and Utility are Database
Interfaces.
api use SQL where table constrains are
checked against the data before loading
data into Database.
Utility uses Bulk Loading where table
constraints are disabled first and data
loaded into Database and then table
constraints are checked against data.
Data loading using Utility mode is faster compared to API mode. If a crash occurs while loading data into the database, we can commit or roll back in API mode, but in Utility mode the whole load has to be rerun.
Q.How to Schedule Graphs in Ab-
Initio,like work flow Schedule in
Informatica? And where we must use
Unix shell scripting in Ab-Initio?
Answer: We can use Autosys, Control-
M, or any other external scheduler to
schedule graphs in Ab-Initio.
We can take care of dependencies in many ways. For example, if scripts should run sequentially, we can arrange for this in Autosys, or we can create a wrapper script that runs the commands one after another (nohup command1.ksh; nohup command2.ksh; etc.). We can even create a special graph in Ab-Initio to execute individual scripts as needed.
Q.What is Environment project in Ab-
Initio?
Answer: Environment project is a
special public project that exists in
every Ab-Initio environment. It contains
all the environment parameters required
by the private or public projects which
constitute AI Standard Environment.
Q.What is Component Folding?What is
the use of it?
Answer: Component Folding is a new
feature by which Co>operating System
combines a group of components and
runs them as a single process.
Component Folding improves the
performance of graph.
Pre-Requirements for Component
Folding
The components must be foldable.
They must be in same phase and
layout.
Components must be connected via
straight flow
Q.How do you Debug a graph ,If an error
occurs while running?
Answer: There are many ways to debug
a graph. we can use
Debugger
File Watcher
Intermediate File for debugging
purpose.
Q.What do you mean by $RUN?
Answer: This is a parameter variable that contains the path of the project sandbox's run directory. Instead of using a hard-coded value, use this parameter; it is the default sandbox run-directory parameter.
fin ------> top-level directory ($AI_PROJECT)
|-- mp ------> second-level directory ($MP)
|-- xfr ------> second-level directory ($XFR)
|-- run ------> second-level directory ($RUN)
|-- dml ------> second-level directory ($DML)
Q.What is the importance of EME in ab-
Initio?
Answer: EME is a repository in Ab-Initio; it is used for check-in and checkout of graphs and also maintains graph versions.
EME is the source code control system of the Ab-Initio world. It is the repository where all sandbox-related (project-related) code and graph versions are maintained; we check graphs out, modify them and check them back in accordingly. A lock is put on an object once it is accessed by any user.
Q.What is difference between sandbox
parameters and graph parameters?
Answer: Sandbox parameters are common parameters for the project; they are accessible throughout the project. Graph parameters are used within a graph and cannot be accessed from other graphs; they are called local parameters.
Q.How do you connect EME to Ab-Initio
Server?
Answer: There are several ways of connecting to the EME:
Set AB_AIR_ROOT.
Connect to the EME data-store from the GDE.
Log in to the EME web interface.
Use the air command from the command line.
Q.What is use of co>operating system
between GDE and Host?
Answer: The Co>Operating System is the heart behind the GDE. It always refers to the host settings, environment variables and functions while running graphs through the GDE, and it interfaces the connection-setting information between the host and the GDE.
Q.What is a sandbox and what is it used for?
Answer: Sandbox is a directory
structure of which each directory level is
assigned a variable name, is used to
manage check-in and checkout of
repository based objects such as mp,
run, dml, db, xfr and sql (graphs, graph
ksh files, wrapper scripts, dml files, xfr
files, dbc files, sql files.)
Fin ------> top-level directory ($AI_PROJECT)
|-- mp ------> second-level directory ($AI_MP)
|-- xfr ------> second-level directory ($AI_XFR)
|-- run ------> second-level directory ($AI_RUN)
|-- dml ------> second-level directory ($AI_DML)
Sandbox contains various directories,
which is used for specific purpose only.
The mp directory is used for storing
data mapping details about between
sources and targets or components and
the file extension must be *.mp. The xfr
directory denotes purpose of stores the
transform files and the file extension
must be *.xfr. The dml directory is used
for storing all meta-data information of
data with Ab-Initio supported data types
and the file extensions must be *.dml.
The run directory contains only the
graph’s shell script (korn shell script)
files that are created after deploying the
graph.
In short, the sandbox stores all of these kinds of project files.
Q.What is mean by EME Data Store and
what is use of EME Data Store in
Enterprise world?
Answer: The EME Data Store (Enterprise Meta Environment data store) is the enterprise repository; it contains any number of projects (sandboxes), which share metadata between them. The sandbox project objects (mp, run, db, xfr, dml) can easily be managed through check-in and checkout of the repository objects.
Mode:
In the EME Data-store Mode box of the
EME Data-store Settings dialog, choose
one of the following:
Source Code Control — This is the
recommended setting. When you set a
data-store to this mode, you must check
out a project in order to work on it. This
prevents multiple users from making
conflicting changes to a project.
Full Access — This setting is strongly
not recommended. It is for advanced
users only. It allows you to edit a project
in the data-store without checking it out.
Save Script When Graph Saved to
Sandbox
In the EME Data-store Settings dialog,
select this option to have the GDE save
the script it generates for a graph when
you save the graph. The script lets you
run the graph without the GDE if, for
example, you relocate the project.