Checklist Best Practices Known Issues v4


Checklist 1 (S.No, Applicable Phase, Title, Checklist Y/N, Comment)

1. [Analysis] Limitations in tech stack technology tools identified and documented?
   Comment: Any limitations with the existing tech stack have to be compensated with workaround logic and additional manual effort, causing delays.

2. [Analysis] Was the discovery phase done to finalize the inventory or scope with the client?
   Comment: Without a final inventory, project development pipelines may be impacted at any phase of the project, leading to significant rework.

3. [Analysis] DB objects inventory categorized based on size?
   Comment: Need to know the number of large-volume objects being handled throughout the engagement to align with data load and delivery plans.

4. [Analysis] End-to-end lineage of source scripts identified and finalized?
   Comment: Without proper lineage, source scripts might be missed, leading to rework and data validation mismatches.

5. [Analysis] Validating the identified lineage against the scheduling tool sequence flow?
   Comment: This may give insight into missing information on lineage and dependencies.

7. [Analysis] Data availability in lower environments during project execution agreed?
   Comment: Lower environments (dev, QA) need sufficient data for testing purposes to avoid data mismatches during later stages of the project.

8. [Analysis] Is the priority of work planned, i.e. which objects/subject areas are to be worked on in which phases?
   Comment: Sharing information between teams on data anomalies, data inconsistencies, etc. saves a lot of time.

9. [Analysis] Data validation criteria and accepted deviation defined and agreed with the customer?
   Comment: To ensure data quality, the accepted deviation from source data should be finalized. See the validation sketch after this checklist.

10. [Analysis] Plan in place to address source-to-target limitations and exceptions?

11. [Analysis] Analysis done on load management (target DB) between existing apps running in prod and the new development work?
    Comment: Any new development work should not affect the apps already running in production. Work should be segregated accordingly.

12. [Analysis] Availability of required documentation for the apps being migrated?
    Comment: Agree with the customer on the approach and on deviations arising from document unavailability.

15. [Design] Validation approach identified for UT, SIT, UAT?
    Comment: Without a proper testing plan, data quality might be compromised.

17. [Design] Decided on lift-and-shift migration or rearchitecting as part of the shift?
    Comment: All performance issues may not be addressed in a lift and shift.

18. [Design] Performance optimization approach/plan identified?
    Comment: This will be helpful in the long run to reduce job run times in production.

19. [Design] Provisioning server configuration based on the analysed data volume in scope?

20. [Design] Performance optimization techniques identified and implemented for execution engines/clusters?

21. [Execution] Identifying code reviewers and establishing a code review approval process before moving code to higher environments?

22. [Execution] DBA approval received for the projected load on prod servers?
    Comment: To avoid load on Teradata leading to ID blocks, prior permission from the DBA is necessary before running the load.
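As a hedged illustration of the validation items above (rows 9 and 15), the sketch below compares row counts and an order-independent checksum between a source extract and the migrated target table in PySpark. The table names, column handling and hashing approach are illustrative assumptions, not part of the original checklist.

# Minimal PySpark sketch: row-count and checksum comparison between a source
# extract and the migrated target table. Table names are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("migration-validation").getOrCreate()

src = spark.table("source_extract.customer")   # assumed source snapshot
tgt = spark.table("target_lake.customer")      # assumed migrated table

# 1. Row-count comparison.
src_count, tgt_count = src.count(), tgt.count()
print(f"source={src_count}, target={tgt_count}, diff={src_count - tgt_count}")

# 2. Order-independent checksum: hash each row, then sum the hashes, so row
#    ordering differences between the two systems do not matter.
def table_checksum(df, cols):
    hashed = df.select(F.xxhash64(*[F.col(c).cast("string") for c in cols]).alias("h"))
    return hashed.agg(F.sum(F.col("h").cast("decimal(38,0)")).alias("cs")).collect()[0]["cs"]

common_cols = sorted(set(src.columns) & set(tgt.columns))
print("checksums match:", table_checksum(src, common_cols) == table_checksum(tgt, common_cols))

Any accepted deviation agreed with the customer (for example a tolerated row-count difference) can be applied to these numbers before flagging a mismatch.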
Checklist 2 (S.No, Applicable Phase, Title, Checklist Y/N, Comment)

1. [Analysis] Datatype compatibility checks done between source and target DB?
   Comment: This helps in reducing data quality issues.

2. [Analysis] Access level validation on the source side (inclusive of all drill-down levels)?

3. [Analysis] Access validation on the target side (some targets may need delete privilege)?

4. [Analysis] Scheduling / data migration window identified for data migration?
   Comment: Avoiding business peak time.

5. [Analysis] Datatype limitations (like BLOB/CLOB/BINARY/FLOAT) specific to the target execution engine?
   Comment: Databricks is an example: FLOAT values converted to DOUBLE can have precision round-off issues.

7. [Analysis] Catch-up load time frame discussed, considering Go-Live deadlines?
   Comment: The catch-up load needs to be completed in time, otherwise it may delay Go-Live.

8. [Analysis] Identification of current job execution timings to perform data validation when comparing against daily refreshing data?
   Comment: Need to identify a proper window for data validation after the load, as prod data is refreshed on a daily basis.

10. [Analysis] Defining equivalents for non-compatible datatypes between source and target?
    Comment: This activity should be performed during the initial phase of the project.

11. [Analysis] Concurrency and performance optimization parameter checks on the source side to increase throughput efficiency?
    Comment: Having source batch IDs that can handle large volumes and parallel executions.

12. [Analysis] Agreed on retaining leading/trailing spaces during migration?
    Comment: To avoid reporting tool issues during later stages of the project.

14. [Analysis] Decommission plan for bridging tables (used in more than one application/warehouse)?
    Comment: Warehouses may go live one by one, which will impact the data in these tables.

15. [Analysis] Identifying the number of encrypted-column tables involved in the migration of sensitive data?
    Comment: If we are migrating from a non-prod environment, the data needs to be decrypted and re-encrypted in the lower environment. Test scripts also require changes.

16. [Analysis] Tables which have hard deletes in the source DB (daily/weekly/monthly)?
    Comment: Needs additional effort to run the script in the target DB after data migration/catch-up.

20. [Design] View migration analysis and approach finalized?
    Comment: Drill-down views.

21. [Design] Partitioning required and applicable?
    Comment: This will enable faster data offload/migration.

22. [Design] Data retention policy for offloaded historical data (in ADLS / storage layers)?
    Comment: Cost consideration.

23. [Design] Data migration strategy identified for large-volume tables?
    Comment: Splitting into years/months to avoid issues in case of file-system storage.

24. [Design] Incremental / catch-up load strategy and execution duration analysis?
    Comment: Avoid delays during Go-Live.

25. [Design] Plan to test end-to-end data validation across data lake layers?
    Comment: Avoid data reloads at a later stage of the project.

26. [Design] Server / hardware sizing based on the volume in scope?
    Comment: The app server should have sufficient space to copy data, and the copied data should be removed once the full load is done.

28. [Design] Is there any sensitive data, and what is the pre-defined testing approach for these tables' data?

29. [Execution] Identified approach for null / blank value handling between source and target?
    Comment: Helps in reducing data quality issues, as nulls sometimes get converted to blanks when the source value is not recognized by the tool.

30. [Execution] Identified approach for any newline or ASCII character handling between source and target?
    Comment: Helps in reducing data quality issues. A newline causes data to be shifted to the next line, causing count and data mismatches.

31. [Execution] Foreign-language character compatibility check between source and target?
    Comment: Not all foreign languages are supported in the target, so samples need to be validated.

32. [Execution] Validation of data migration from views traversing multiple layers?
    Comment: Data and datatype mismatch validation.

33. [Execution] Usage of appropriate delimiters and field separator values to avoid data mismatch and data inconsistency issues?
    Comment: A field delimiter present in the data causes a data shift, so a delimiter combination needs to be identified that will not be present in the data. See the ingestion sketch after this checklist.

34. [Execution] Availability of data in lower environments to perform data validation against stagnant data?
    Comment: Sufficient data needs to be available in the dev and QA environments for proper testing.

35. [Execution] Case sensitivity checks in DDLs and source data between source and target DBs?
    Comment: This will help in avoiding column name and data mismatch issues.
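Several of the Execution-phase items above (null/blank handling, newline characters, delimiter choice) come down to how extracted files are written and read. The PySpark sketch below shows one way to make those choices explicit when reading a delimited extract; the file path, separator character and null token are assumptions for illustration.

# Hedged sketch: read a delimited extract with explicit separator, quoting,
# null-token and multiline settings so that embedded newlines, empty strings
# and separator characters inside the data do not shift columns.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("extract-ingestion").getOrCreate()

df = (spark.read
      .option("sep", "\u0001")         # assumed: a control character unlikely to appear in the data
      .option("quote", '"')            # quote fields that contain the separator
      .option("escape", '"')
      .option("multiLine", "true")     # keep embedded newlines inside quoted fields
      .option("nullValue", "\\N")      # assumed null token agreed with the extraction team
      .option("emptyValue", "")        # keep genuine empty strings distinct from nulls
      .option("header", "true")
      .csv("/mnt/raw/customer_extract/"))   # hypothetical landing path

df.printSchema()

The same separator, quoting and null-token choices should be used on the extraction side so that counts and values reconcile during validation.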
Checklist 3 (S.No, Applicable Phase, Title, Checklist Y/N, Comment)

1. [Analysis] Is there a need to merge multiple scripts into a single script based on an identified pattern?

2. [Analysis] Do scripts that belong to different layers require different implementations?
   Comment: Some layer-specific changes, e.g. DWL layer scripts adhere to a different naming convention compared to ACQ layer scripts.

3. [Analysis] Handling of DB-specific constructs in the target DB (volatile tables)?
   Comment: In Teradata, volatile tables are temporary tables. There are various ways to implement the functionality of a volatile table in the target, e.g. as views or normal tables. See the sketch after this checklist.

4. [Analysis] Will additional custom code be required on top of the converted scripts?
   Comment: The target environment may require pre-processing and post-processing of tables before executing the actual SQL transaction.

5. [Analysis] Availability of source objects that are non-compatible with the target?
   Comment: Certain source-specific objects such as record error, sys and activity tables will not be applicable in the target environment.

6. [Analysis] Analysis of complex scripts and finding performance improvement patterns?
   Comment: Avoid multi-joins and implement techniques to rewrite existing queries in a more optimized way.

7. [Analysis] Analyzing execution time across each layer and reporting performance issues in case of a performance lag compared to the current DB?
   Comment: This will help to locate and identify the important factors that cause delays in execution and reduce performance.

8. [Analysis] Cost estimation for new object deployment in the target or reporting DB?
   Comment: For example, Azure SQL DB.

9. [Design] Alignment with the customer on the naming convention to be followed for containers/files/folders?
   Comment: In the case of storage like ADLS Gen2 or Delta Lake.

10. [Design] Approach identified and pattern analysis done on key handling?
    Comment: Surrogate keys.

11. [Design] Data alignment validation based on the different tools used for data loading (historical/incremental)?

12. [Design] Analysis of and agreement on data processing requirements for BI layers?
    Comment: High-concurrency and low-concurrency data.

13. [Execution] Validating and mapping key function or workaround availability between source and target?

14. [Execution] Converted output code review by an SME to meet customer standards and expectations?
    Comment: The coding structure, standards and naming conventions of the converted code need to be evaluated by subject matter experts.

15. [Execution] Avoiding hardcoded parameters in scripts to eliminate manual updates of key variables and connection parameters?
    Comment: This will reduce the manual changes to be made on top of the converted scripts, and scripts can be reused across applications. Parameterization is also shown in the sketch after this checklist.

16. [Execution] Processing time validation across each data lake layer and validating the requirement for maintaining history in each layer?
    Comment: Based on the data retention requirements across layers as per the architecture.

17. [Execution] Options considered to enhance cluster performance during script execution?
    Comment: Photon accelerator in Databricks; cluster fine-tuning options to enhance performance and parallel executions across teams/projects.
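As a hedged sketch of the volatile-table workaround (row 3) and the no-hardcoding item (row 15) above, the snippet below mimics a Teradata volatile table with a session-scoped Spark temporary view and takes run-time parameters from Databricks widgets instead of hardcoding them. The widget names, schema and table names are illustrative assumptions, not part of the original checklist.

# Sketch: a session-scoped temporary view is one way to mimic a Teradata
# volatile table in Databricks; widgets avoid hardcoded parameters.
# dbutils is available inside Databricks notebooks; names are illustrative.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Run-time parameters supplied via widgets instead of being hardcoded.
dbutils.widgets.text("target_schema", "dwl")
dbutils.widgets.text("load_date", "2024-01-01")
target_schema = dbutils.widgets.get("target_schema")
load_date = dbutils.widgets.get("load_date")

# Equivalent of a volatile/temporary work table: visible only to this session
# and gone when the session ends.
spark.sql(f"""
    CREATE OR REPLACE TEMPORARY VIEW stg_orders AS
    SELECT order_id, customer_id, order_amount
    FROM {target_schema}.orders
    WHERE order_date = DATE'{load_date}'
""")

spark.table("stg_orders").show(5)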
Best practices (S.No, Title, Comment)

1. Key tables identification during the initial phase of the project
   Key tables identification is a significant factor for the project timeline. Defining proper keys and implementing a surrogate-key approach during the initial phase of the project reduces rework and pipeline issues. A hedged surrogate-key sketch follows.
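The sketch below is one hedged illustration of a surrogate-key approach: deriving a deterministic key by hashing the business-key columns so that reloads and parallel pipelines produce the same value. Table and column names are assumptions.

# Sketch: deterministic surrogate key derived from the business-key columns,
# so reloads produce the same key value. Names are illustrative.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

orders = spark.table("acq.orders")   # hypothetical source table

orders_with_sk = orders.withColumn(
    "order_sk",
    F.sha2(F.concat_ws("||", F.col("order_id").cast("string"),
                             F.col("source_system").cast("string")), 256)
)
orders_with_sk.select("order_sk", "order_id", "source_system").show(5, truncate=False)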

2. Partition columns identification
   Partitioning the tables by date allows the ETL pipeline to target only the partitions/folders that need to be processed, greatly improving read performance. This applies to both full-load and incremental ingestion patterns. Partitioning also helps with Delta table management scenarios such as running the OPTIMIZE command at the partition level, as in the sketch below.
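A hedged sketch of this pattern: writing a Delta table partitioned by a date column and running OPTIMIZE against only the partition that was loaded. The table names, column name and partition value are assumptions.

# Sketch: write a Delta table partitioned by a date column, then compact small
# files only for the partition just loaded. Names/values are illustrative.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.table("staging.sales_extract")       # hypothetical staged data

(df.write
   .format("delta")
   .mode("append")
   .partitionBy("load_date")                     # partition column agreed with the customer
   .saveAsTable("dwl.sales"))

# OPTIMIZE scoped to a single partition (Databricks Delta).
spark.sql("OPTIMIZE dwl.sales WHERE load_date = '2024-01-01'")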

3. Handling tables with larger volumes during historical data offload
   Identify a data breakdown policy. Break large tables into smaller chunks of data, for example by year, before the history load to avoid putting too much load on the Teradata server. Also check the capacity of Teradata for data extraction while batch jobs are executing. A chunked-extraction sketch follows.
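One hedged way to implement the year-wise breakdown is to push a year filter into each extraction query so every chunk is a separate, bounded read from Teradata. The JDBC URL, driver setup, credentials, table names and year range below are placeholders for illustration.

# Sketch: offload a large history table in year-sized chunks so a single huge
# extract does not overload the Teradata server. Connection details are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

jdbc_url = "jdbc:teradata://<host>/DATABASE=edw"    # placeholder connection string

for year in range(2015, 2024):
    chunk = (spark.read.format("jdbc")
             .option("url", jdbc_url)
             .option("driver", "com.teradata.jdbc.TeraDriver")  # driver jar assumed on the cluster
             .option("user", "<user>")                          # placeholder credentials
             .option("password", "<password>")
             .option("query", f"SELECT * FROM edw.sales_history "
                              f"WHERE EXTRACT(YEAR FROM sale_date) = {year}")
             .load())
    chunk.write.format("delta").mode("append").saveAsTable("hist.sales_history")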

4. Using cloud advantages to enable faster data load options
   For history data loads, TPT is a slower process than NOS. During client engagements, the team needs to ensure Teradata Vantage is enabled for leveraging NOS capabilities for faster history data loads. The configurations and server permissions required for this should also be considered, along with the app server storage configuration for copying intermediate data.

5. Understanding the extraction scheduling window
   The Teradata server should have the capacity to support large-volume data offloads and to extract tables/views in parallel. A runtime window should be clearly set for history data loads so that batch executions are not affected and slowness issues are mitigated. Scheduled jobs should also be monitored.
6. Incremental strategy for data offloads
   Change data feed (CDF) is a feature for Delta tables that require incremental loads. CDF allows data changes to be identified efficiently in the form of INSERT, UPDATE, MERGE and DELETE operations against the base table. Setting proper incremental columns and the right filter conditions reduces DB overload. A hedged CDF sketch follows.
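As a hedged sketch of the CDF usage described above: enabling change data feed on a Delta table and reading only the changes produced after a known version. The table name and version number are assumptions.

# Sketch: enable Delta change data feed on a base table and read only the
# changes recorded after a given version. Names and versions are illustrative.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Enable CDF on the base table (one-time table property change).
spark.sql("ALTER TABLE dwl.orders SET TBLPROPERTIES (delta.enableChangeDataFeed = true)")

# Read inserts/updates/deletes produced after version 42 of the table.
changes = (spark.read.format("delta")
           .option("readChangeFeed", "true")
           .option("startingVersion", 42)
           .table("dwl.orders"))

changes.select("order_id", "_change_type", "_commit_version").show(5)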

7. Identified and agreed data retention policy
   The operational reason for implementing a data retention policy is proper data backup to aid recovery in the event of data loss. Setting data retention policies for inactive (deleted) data and enforcing them for different tables using both the Delta VACUUM feature and the ADLS soft-delete feature saves cost. A hedged retention sketch follows.
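A hedged illustration of enforcing the Delta side of such a policy: setting a deleted-file retention property on a table and then vacuuming files older than that window. The 30-day value and table name are assumptions to be agreed with the customer.

# Sketch: enforce a retention window for deleted data on a Delta table.
# The 30-day value is an assumption; the actual policy is agreed with the customer.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

spark.sql("""
    ALTER TABLE dwl.orders
    SET TBLPROPERTIES (delta.deletedFileRetentionDuration = 'interval 30 days')
""")

# Physically remove data files no longer referenced and older than 30 days (720 hours).
spark.sql("VACUUM dwl.orders RETAIN 720 HOURS")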
8. Optimization done using Databricks
   Periodically run ANALYZE TABLE ... COMPUTE STATISTICS to make sure the Spark optimizer has an accurate understanding of the data distribution of Delta tables. This specifically helps AQE (Adaptive Query Execution) make better optimization decisions at execution time, as in the sketch below.
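A hedged sketch of this practice: computing table and column statistics for a Delta table and making the AQE setting explicit. The table name is an assumption.

# Sketch: refresh table- and column-level statistics so the optimizer and AQE
# can make better decisions. Table name is illustrative.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Collect table-level and column-level statistics.
spark.sql("ANALYZE TABLE dwl.sales COMPUTE STATISTICS FOR ALL COLUMNS")

# AQE is on by default in recent runtimes; this makes the setting explicit.
spark.conf.set("spark.sql.adaptive.enabled", "true")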
