Author: Corey Biehl
The proliferation of data has been a tremendous enabler for AI initiatives. With so many different types and formats of data available, however, its true value can be hard to determine. The goal for most companies, though, is simply to make better decisions, faster. That’s where data and AI intersect with business value.
With all the buzz around AI capabilities now, we wanted to discuss a strategic vision for bringing your data into alignment with the goals your organization has for AI.
There are three primary approaches to aligning your data with the AI that utilizes it: Bottom Up, Top Down, and Meet in the Middle. We have assisted organizations with each of these three, and we have also used AI to help identify which data sources will deliver the greatest impact if cleansed and validated first, deriving extra value from these efforts.
There are three high-level areas of focus for AI implementation and model building within your business.
- Data Quality & Governance
- AI Testing Frameworks
- Data Model Architecture
Data Quality & Governance
Data’s usefulness is determined by its quality. A good data quality initiative improves the original data without any loss of context, and it centers on five main dimensions: Accuracy, Completeness, Timeliness, Relevancy & Reliability.
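To make these dimensions concrete, here is a minimal sketch of what automated checks against a few of them might look like. It assumes a pandas DataFrame and hypothetical column names (customer_id, order_total, updated_at); the actual rules would come from your own governance standards.

```python
# Minimal sketch of automated data quality checks.
# Column names and rules are hypothetical examples.
import pandas as pd

def quality_report(df: pd.DataFrame, max_age_days: int = 7) -> dict:
    """Score a dataset against several of the quality dimensions above."""
    now = pd.Timestamp.now(tz="UTC")
    return {
        # Completeness: share of non-null values across all columns
        "completeness": float(df.notna().mean().mean()),
        # Accuracy (proxy): share of rows passing a simple business rule
        "accuracy": float((df["order_total"] >= 0).mean()),
        # Timeliness: share of rows refreshed within the freshness window
        "timeliness": float(
            (now - pd.to_datetime(df["updated_at"], utc=True)
             <= pd.Timedelta(days=max_age_days)).mean()
        ),
        # Reliability (proxy): primary key contains no duplicates
        "reliability": float(not df["customer_id"].duplicated().any()),
    }
```

A report like this can be run on a schedule and trended over time, turning the quality dimensions from abstract principles into measurable targets.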
Data Governance, on the other hand, defines the necessary processes and supporting roles that ensure data is of the highest quality. This includes defining roles and responsibilities, aligning them to support governance processes, and establishing clearly defined ownership.
All these characteristics apply to data that is being used for AI implementation and models. In coming articles, we will discuss in depth the types of frameworks that can be used for each of the organizational approaches to data preparation.
Each of the three approaches to data alignment (Bottom Up, Top Down, and Meet in the Middle) has different priorities when it comes to accelerating data availability. The tooling and governance postures need to reflect the timeframe and motivation for each approach.
[Read More: Key Considerations for Selecting Data Governance Tools]
AI Testing Frameworks
An AI Testing Framework is necessary to ensure that data and solutions can be evaluated as usage scales. The sheer volume of data points, and the need to ensure testing scopes are wide enough to provide meaningful results, mean that manual testing will be unable to support this new AI-driven world.
Automated testing frameworks bring speed and scale, ensuring quick feedback is delivered to the development teams. This provides additional safeguards for the AI solution’s scope and often accelerates its delivery schedule. Examples of this approach include the following (a brief sketch follows the list):
- Automated logging and auditing
- Data regression tests and model monitoring
- CI/CD and ML Ops to enable A/B testing between data pipeline and model versions
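As one illustration of the data regression idea above, here is a minimal sketch of a drift check that a CI/CD pipeline could run before promoting a new data pipeline or model version. It assumes scipy is available, and the feature name, threshold, and sample data are hypothetical; in practice the baseline and candidate samples would be loaded from versioned snapshots.

```python
# Minimal sketch of an automated data regression / drift check
# suitable for running in a CI/CD pipeline. Thresholds are assumed.
import numpy as np
from scipy.stats import ks_2samp

DRIFT_P_VALUE = 0.01  # assumed threshold; tune per feature

def no_drift(baseline: np.ndarray, candidate: np.ndarray) -> bool:
    """Return True when the candidate sample is statistically consistent
    with the baseline under a two-sample Kolmogorov-Smirnov test."""
    _, p_value = ks_2samp(baseline, candidate)
    return p_value >= DRIFT_P_VALUE

def test_order_total_has_not_drifted():
    # In a real pipeline these would come from versioned data snapshots.
    baseline = np.random.default_rng(0).normal(100, 15, 5_000)
    candidate = np.random.default_rng(1).normal(100, 15, 5_000)
    assert no_drift(baseline, candidate), "order_total distribution drifted"
```

Checks like this give development teams a fast, repeatable signal that the data feeding a model still looks the way the model expects, without waiting on manual review.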
Data Model Architecture
Before investing heavily in AI, there are also questions related to data architecture which need to be addressed. Does your organization have the right data architecture in place to spring into AI and deliver meaningful insights? How big is the gap between today’s architecture and the one needed to support AI?
Aligning AI Implementation & Data
There are three primary approaches to aligning your data with the AI that utilizes it:
Bottom Up
Create a framework for quality data governance. Apply these practices to organize data into logical schemas.
Pros: Builds a foundation of quality data for the entire organization to use.
Cons: Can take a very long time to develop enough data structure to meet the needs of modelling and data science.
Top Down
Run rapid build-and-test cycles of AI using raw, un-governed data. Create value with a successful model, then use that value to drive data governance imperatives.
Pros: Fast time to value. Allows you to lead the market with AI innovation.
Cons: Data may shift and cause model drift. Changing data may require continuous relearning of data sources by the data science team.
Meet in the Middle
Create and follow data governance best practices. Create curated data sources. Where data science and AI work begins, create dedicated data lakes and govern them with tools that allow contiguity between data warehouses and data science data lakes.
Pros: Creates a separation between primary data sources, such as a data warehouse, and the modified data sources needed for data science, reducing confusion and simplifying governance.
Cons: Pushes the boundaries of current tools and governance practices, requiring specialized skills to bring data sources and data science needs into alignment.
Upcoming articles related to our AI Data & Analytics series will explain in detail the technologies and approaches that we use to achieve the best results for your organization. Our goal is to maximize your ability to accelerate AI implementation and innovation while also retaining a controlled, well organized data landscape.
RevGen has a proven track record of delivering Analytics and Data Insights solutions for clients. If your organization is currently considering improving your Data or Analytics services, please contact us for an initial consultation on which solutions could work for your organization or visit our Artificial Intelligence page to learn more.
Anne Lifton is a Principal Architect of Data Science and Artificial Intelligence at RevGen. She has over 10 years of experience in building, deploying, and managing the lifecycle of data science models across several industries and all three major cloud platforms.
Corey Biehl is a technology leader in RevGen’s Analytics & Insights practice. He is passionate about designing and developing data and analytic solutions that make a difference.