• Outline current data challenges and desired outcomes from data lineage
• Define Specific, Measurable, Achievable, Relevant, and Time-bound critical
success factors against which the POV's success will be measured.
• Examples of Success Metrics:
• Time saved on data issue troubleshooting.
• Improved data quality metrics (e.g., accuracy, completeness).
• Reduced regulatory compliance risks.
• Faster impact analysis for data changes.
• Increased trust in data assets for decision-making
• Identify the most critical data elements and systems that will be the focus of the
POV
• Decide whether you need technical lineage (physical movement) or business
lineage (logical flow), and the level of detail required
• Select specific scenarios where data lineage can deliver immediate and
measurable value, like troubleshooting a persistent data quality issue or
demonstrating compliance for a key report.
• Assess Existing Infrastructure: Evaluate your current data sources, processing
systems, and storage solutions involved in the chosen data workflows.
• Gather Relevant Metadata: Identify and collect existing metadata about the
prioritised data assets, including schemas, data types, transformation rules, and
data quality metrics. Ensure Data Quality: Implement data quality controls to
ensure the accuracy and consistency of the data used in the POV, as the lineage
will only be as reliable as the underlying data
• Implement data lineage tracking. Focus on Automated Lineage: Leverage
automated tracking capabilities to minimise manual effort and ensure real-time
or near real-time updates to lineage information
• Assess the impact of potential changes or actual data quality issues by
leveraging the lineage information
• OpenLineage, Collibra
1. Time, Quality, Changes
2. Regulatory and Trust for decision making
3. Find Critical data elements or systems to focus on for POV
4. Technical or business plan
5. Scenarios where data lineage can deliver immediate, measurable value
6. Implement points one and two on prioritised data elements after collection of
metadata, which includes schema, data types, transformational rules and data
quality metrics
7. Automated tracking using openlineage or collabora