Follow-Up Data Centric Questions / Thought Speak
by Sandy and Arjun – Revision 11/24/2021
Data Lake > Data Hub Approach for AI / ML
1) Will a Data Lake be provided to us, or will we be responsible for creating it? Either way, a Data Lake needs to exist, and all data contained within it will need to be harmonized. The data lake model for this type of solution is built on the idea that the data needs to be in one place, truly accessible, and organized. The issue with Data Lakes, in our experience, is that AI is less about the volume of data and more about the signal contained within the Data Lake. Data Lakes are attractive to companies looking to put information in one place and then tap into it for business intelligence, analytics, and big data, but that promise rarely plays out in practice; thus, we are recommending that a Data Hub be rationalized. Data Lakes are typically built for big data and batch processing, whereas AI and machine learning models need more data flow and third-party connectivity. With a Data Lake we can analyze the lake, but you, as information query consumers, may not find all the signals needed for a built AI solution to learn over time. This is a rallying point for many of our customers: corporate enterprises realize that they must connect to more than their own data to enable their algorithms to provide centralized intelligence for information consumption. The data lake model, by default, does not consider AI and the ability to learn; it needs to adapt into something that enables intelligent systems to evolve. Thus, the recommendation to stand up a centralized Data Hub if one does not already exist (a sketch of the harmonization layer such a hub provides follows this item). Stefanini holds the capabilities and industry experts to assist in materializing this requirement. For the underlying solution requirement, this is a fundamental piece that cannot be overlooked.
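As a point of reference, below is a minimal sketch, assuming hypothetical record shapes and field names, of the harmonization layer a centralized Data Hub would provide: records arriving from the internal Data Lake and from a third-party feed are mapped into one common schema so downstream AI models consume a single, consistent signal.

# Minimal sketch (hypothetical names) of a Data Hub harmonization layer:
# records from an internal Data Lake and a third-party feed are mapped
# into one common schema before any AI/ML consumption.
from dataclasses import dataclass
from typing import Iterable

@dataclass
class HubRecord:              # common schema shared by all hub consumers
    source: str
    entity_id: str
    metric: str
    value: float

def from_data_lake(rows: Iterable[dict]) -> list[HubRecord]:
    # internal lake rows use their own column names; map them to the hub schema
    return [HubRecord("data_lake", r["id"], r["kpi_name"], float(r["kpi_value"]))
            for r in rows]

def from_third_party(rows: Iterable[dict]) -> list[HubRecord]:
    # the external feed arrives in a different shape; harmonize it the same way
    return [HubRecord("partner_feed", r["ref"], r["measure"], float(r["amount"]))
            for r in rows]

hub = from_data_lake([{"id": "C001", "kpi_name": "orders", "kpi_value": "12"}]) \
    + from_third_party([{"ref": "C001", "measure": "web_visits", "amount": "87"}])
print(hub)  # one harmonized view, regardless of origin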
2) Will a Data Warehouse be available for us to connect to for consumption into the centralized Data Hub, or will we be building it? If one exists, what are the current ETL processes for the respective Source Systems? We need to understand the complete Information Delivery Process Flow from Source through Transformation to Target (a minimal ETL sketch follows this item).
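Below is a minimal ETL sketch, using hypothetical table and column names and an in-memory SQLite database as a stand-in for the real source and target systems, to illustrate the Source-to-Transformation-to-Target flow we need documented for each source system.

# Minimal ETL sketch (hypothetical tables/columns, SQLite as a stand-in).
import sqlite3

def extract(conn):
    # Source: pull raw rows from a source-system table
    return conn.execute("SELECT customer_id, amount, currency FROM raw_orders").fetchall()

def transform(rows):
    # Transformation: normalize currency to EUR and drop invalid amounts
    fx = {"EUR": 1.0, "USD": 0.92}
    return [(cid, amt * fx.get(cur, 1.0)) for cid, amt, cur in rows if amt is not None]

def load(conn, rows):
    # Target: write harmonized rows into the Data Hub / warehouse table
    conn.executemany("INSERT INTO hub_orders (customer_id, amount_eur) VALUES (?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_orders (customer_id TEXT, amount REAL, currency TEXT)")
conn.execute("CREATE TABLE hub_orders (customer_id TEXT, amount_eur REAL)")
conn.execute("INSERT INTO raw_orders VALUES ('C001', 100.0, 'USD')")
load(conn, transform(extract(conn)))
print(conn.execute("SELECT * FROM hub_orders").fetchall())  # [('C001', 92.0)]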
3) The designed algorithm and its respective output(s) shall be consumed and ingested via syndication into the centralized Data Hub, from which AI modelling output shall occur (see the sketch after this item). Is the team fundamentally aligned on this architecture?
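The sketch below, with hypothetical producer and payload names, illustrates what syndication of algorithm output into the centralized Data Hub could look like: each output is wrapped in a timestamped envelope before ingestion. In practice the hub endpoint would be a message bus, API, or landing table rather than an in-memory list.

# Minimal sketch (hypothetical names) of syndicating model output into the Data Hub.
import json, datetime

def syndicate_to_hub(model_name: str, outputs: dict, hub: list) -> None:
    # Wrap the model output in a versioned, timestamped envelope before ingestion.
    envelope = {
        "producer": model_name,
        "produced_at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "payload": outputs,
    }
    hub.append(json.dumps(envelope))

data_hub = []  # stand-in for the real Data Hub ingestion endpoint
syndicate_to_hub("demand_forecast_v1", {"C001": 120, "C002": 75}, data_hub)
print(data_hub[0])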
4) Data Quality control needs to be established. What are the precise metrics for the current Data Quality state across all the Information Delivery systems within the Ferrari landscape? (A minimal sketch of such metrics follows this item.) Stefanini holds the capabilities and industry experts to assist in materializing this requirement. For the underlying solution requirement, this is a fundamental piece that cannot be overlooked, or the value-add of the overall solution reduces to zero.
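To make "precise metrics" concrete, the sketch below, using hypothetical fields and illustrative sample records, computes the kind of per-system Data Quality measures we would expect to baseline: completeness, validity, and duplication rates.

# Minimal sketch (hypothetical fields, illustrative records only) of baseline
# Data Quality metrics per Information Delivery system.
records = [
    {"vin": "ZFF001", "model": "SF90", "price": 450000},
    {"vin": "ZFF002", "model": None,   "price": 300000},
    {"vin": "ZFF001", "model": "SF90", "price": -1},      # duplicate key, bad price
]

total = len(records)
completeness = sum(all(v is not None for v in r.values()) for r in records) / total
validity     = sum(isinstance(r["price"], (int, float)) and r["price"] > 0 for r in records) / total
duplication  = 1 - len({r["vin"] for r in records}) / total

print(f"completeness={completeness:.0%} validity={validity:.0%} duplication={duplication:.0%}")
# e.g. completeness=67% validity=67% duplication=33%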