Dataset and Fileset

Dataset stores data in its native format and supports either one input or output link with no reject links. It processes data in parallel by default and stores data in the repository. The descriptor file contains schema details and data file addresses, while the data file stores data in native format. Control and header files reside in the operating system. Pipeline parallelism allows data exchange between stages as soon as it is available without waiting for the entire record set. Partitioning parallelism partitions the entire record set into smaller sets processed on different nodes. A file set stores data similar to a sequential file but preserves the partitioning scheme, allowing you to view data in the defined partition order.

Uploaded by

tab12345
Copyright
© Attribution Non-Commercial (BY-NC)
DATASET

A Dataset stores data in the native format (for example, in files with the .ds extension). The Dataset stage is a file stage used for staging data when we design dependent jobs. It supports one input link or one output link, and it has no reject links. By default, a Dataset is processed in parallel. A Dataset stores its data inside the Repository (that is, inside DataStage), and it consists of multiple files:
a) Descriptor file
b) Data file
c) Control file
d) Header files
The descriptor file holds the schema details and the address of the data. The data file holds the data in native format. The control and header files reside in the operating system.

Pipeline and partitioning
Pipeline parallelism means that as soon as data is available between stages (in pipes or links), it can be exchanged between them without waiting for the entire record set to be read. Partitioning parallelism means that the entire record set is partitioned into smaller sets that are processed on different nodes (logical processors).
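
The two ideas can be illustrated with a small Python sketch. This is only an analogy, not DataStage code: the function names (read_rows, to_upper, partition_by_key) and the sample records are hypothetical. Generators model the record-at-a-time hand-off between stages (they do not run concurrently), and a modulus on the key stands in for whatever partitioning method a real job would use.

```python
from collections import defaultdict

def read_rows(rows):
    """Source stage: yields one record at a time instead of a full list."""
    for row in rows:
        yield row

def to_upper(records):
    """Downstream stage: can process a record as soon as it arrives."""
    for rec in records:
        yield {**rec, "name": rec["name"].upper()}

def partition_by_key(records, key, num_nodes):
    """Split the record set into num_nodes smaller sets (key modulus)."""
    partitions = defaultdict(list)
    for rec in records:
        partitions[rec[key] % num_nodes].append(rec)
    return [partitions[i] for i in range(num_nodes)]

# Hypothetical sample data
rows = [{"id": i, "name": n} for i, n in enumerate(["ann", "bob", "cy", "dee"])]

# Pipeline: the first transformed record is available before the rest are read
pipeline = to_upper(read_rows(rows))
first = next(pipeline)

# Partitioning: each smaller set could be handed to a different node
parts = partition_by_key(rows, "id", num_nodes=2)
```

Here first is produced after only one source record has been read, and parts holds two smaller record sets that could be processed independently.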

File set
1) It stores data in a format similar to a sequential file.
2) The only advantage of using a file set over a sequential file is that it preserves the partitioning scheme.
3) You can view the data, but only in the order defined by the partitioning scheme.
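
The effect of preserving the partitioning scheme can be sketched in Python. This is a toy model under stated assumptions, not the actual on-disk layout of a file set: the dictionary structure, the part0/part1 names, and the functions write_file_set and view are all hypothetical, and a simple modulus stands in for the partitioning method.

```python
def hash_partition(records, num_partitions):
    """Assign each record to a partition by key modulus (illustrative only)."""
    parts = [[] for _ in range(num_partitions)]
    for rec in records:
        parts[rec["id"] % num_partitions].append(rec)
    return parts

def write_file_set(parts):
    """Keep one data list per partition plus an index that names them in order."""
    names = [f"part{i}" for i in range(len(parts))]
    return {
        "partitions": names,
        "data": {name: list(part) for name, part in zip(names, parts)},
    }

def view(file_set):
    """Read records back partition by partition, in the stored order."""
    for name in file_set["partitions"]:
        for rec in file_set["data"][name]:
            yield name, rec

# Hypothetical sample data: ids 0..4 split across two partitions
records = [{"id": i} for i in range(5)]
file_set = write_file_set(hash_partition(records, 2))
viewed_ids = [rec["id"] for _, rec in view(file_set)]
```

Viewing the toy file set returns the records grouped by partition rather than in their original input order, which mirrors point 3 above.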
