Scaled AI Data Volume, Quality, And Model Degradation


Machine learning models perform noticeably worse when poor data quality introduces mistakes, bias, and inconsistencies that spread across the pipeline, lowering accuracy and reliability. Research from the University of Amsterdam in the Netherlands finds that key quality parameters, including accuracy, completeness, and consistency, strongly affect predictive power.

 

According to the report, training models on faulty data can produce inaccurate results that impair business operations, causing monetary losses and reputational harm. In high-stakes industries such as banking and healthcare, even small degradations from poor data quality can lead to expensive or harmful business decisions, limiting the reliability of, and trust in, AI systems at scale.

 

Infrastructure constraints and poor data quality are two of the most costly hidden burdens companies must manage. By 2025, large enterprises will need to handle almost twice as much data as they do now, averaging more than 65 petabytes, according to a Hitachi Vantara analysis. Yet 75% of IT executives worry that their current infrastructure, constrained by slow, unreliable, and restricted data access, won't scale to meet these demands, with an immediate effect on AI's efficacy. These difficulties lead to ineffective decision-making, wasted time, and higher operating expenses.

 

Editorial Director Matthew DeMello spoke with Sunitha Rao, SVP of the Hybrid Cloud Business at Hitachi Vantara, on a recent episode of the “AI in Business” podcast. They discussed the data and infrastructure issues that arise when scaling AI and how to build dependable, durable processes to overcome them.

 

This article distills two crucial insights for any firm looking to scale AI successfully:

Optimizing data for performance and reliability: Emphasizing data governance, freshness, and quality, with checks in place for PII, anomalies, and redundancy, strengthens workflows and prevents expensive mistakes.
Prioritizing sustainable, intelligent, and monitored AI workflows: Setting relevant SLOs and allocating workloads effectively maximizes performance, affordability, and sustainability.
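The first insight, quality gates for PII, anomalies, and redundancy, can be sketched as a simple pre-training gate. The field names, regex patterns, and thresholds below are illustrative assumptions, not part of any Hitachi Vantara product:

```python
import re

# Hypothetical pre-training data quality gate: de-duplication,
# completeness screening, and PII quarantine. Patterns and the
# completeness threshold are illustrative assumptions.

PII_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),     # US-SSN-like identifiers
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),   # email-like strings
]

def quality_gate(records, completeness_threshold=0.95):
    """Return (clean_records, report) after redundancy, completeness,
    and PII checks over a list of {"text": ...} records."""
    seen, clean, pii_hits, incomplete = set(), [], 0, 0
    for rec in records:
        text = rec.get("text", "")
        if not text:                 # completeness check
            incomplete += 1
            continue
        if text in seen:             # redundancy check
            continue
        seen.add(text)
        if any(p.search(text) for p in PII_PATTERNS):
            pii_hits += 1            # quarantine PII rather than train on it
            continue
        clean.append(rec)
    total = len(records)
    completeness = (total - incomplete) / total if total else 1.0
    report = {
        "completeness": completeness,
        "passed_gate": completeness >= completeness_threshold,
        "pii_quarantined": pii_hits,
        "kept": len(clean),
    }
    return clean, report
```

In practice each branch would route records to a quarantine store for review rather than silently dropping them, so the gate's pass/fail rate itself becomes a trackable metric.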

 

Guest: Hitachi Vantara’s SVP of Hybrid Cloud Business, Sunitha Rao

Expertise in Cloud Computing, Storage Virtualization, and Business Strategy

Brief Recognition: At Hitachi Vantara, Sunitha spearheads innovation and strategic growth in delivering transformative cloud solutions. She previously worked at Nimble Solutions and NetApp. She holds a Master of Business Administration from the Indian Institute of Management in India.


Data Optimization For Reliability And Performance

Sunitha opens the conversation by outlining several major obstacles to scaling AI, starting with steep infrastructure requirements. Unstructured data is frequently scattered across silos, posing complex governance and compliance challenges. She points out that simply adding more GPUs or data centers is insufficient, because hardware shortages and limits on power, cooling, and sustainability quickly create bottlenecks.

 

Distributed workloads demand low-latency, high-bandwidth networks, while traditional storage systems struggle with AI read/write patterns and need unified, scalable replacements. Hybrid and multi-cloud setups also require optimized MLOps pipelines.

 

Finally, Rao emphasizes that with expenses rising, ESG alignment and a clear return on investment are critical. Closing these gaps requires strong leadership and AI-ready platforms with integrated MLOps, auto-tiering, storage, and elastic compute.

Rao goes on to stress that bad data is very expensive in AI, particularly at scale, summarizing the problem as “garbage in, expensive garbage out.”

 

She notes that the issue can be contained by putting strong workflows in place as early as possible that:

  • Examine the ceilings and floors for errors: Understand the full extent of your data’s errors.
  • Handle noisy or duplicated data: Identify and control redundant or irrelevant inputs.
  • Track gradient variance: Verify that datasets do not destabilize model training.
  • Ensure high-quality data structures: Clean, varied, de-duplicated data improves performance, particularly when the data is not distributed.
  • Address bias and safety: Skewed or low-quality data can raise costs, propagate leaks, and heighten security threats during train/test cycles.
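The gradient-variance check in the list above could be implemented as a running monitor over per-step gradient norms; this sketch uses Welford's online variance algorithm, and the threshold is an assumption to be tuned per model:

```python
class GradientVarianceMonitor:
    """Track the running variance of gradient norms with Welford's
    online algorithm and flag training instability when the variance
    exceeds a threshold. The default threshold is illustrative."""

    def __init__(self, threshold=10.0):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0       # sum of squared deviations from the mean
        self.threshold = threshold

    def update(self, grad_norm):
        # One-pass update: numerically stable, no history kept.
        self.n += 1
        delta = grad_norm - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (grad_norm - self.mean)

    @property
    def variance(self):
        # Sample variance; zero until we have at least two observations.
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0

    def unstable(self):
        return self.variance > self.threshold
```

Hooked into a training loop, a single `update()` per step gives an early-warning signal long before loss curves visibly diverge.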

 

Rao continues by outlining the need to enhance AI data pipelines by prioritizing quality over quantity:

Instead of constructing larger haystacks, we ought to consider ways to improve the system’s needles. At that point, you will enhance the data flow degradation aspect. I believe that the quality gates and the freshness of the data must be taken into account. Take streaming ETL, for example:

 

Schema checks, anomaly detection, and, for instance, personally identifiable information are necessary to determine the type of data being utilized. We are considering adopting the PII data service for this reason. The main goal is to examine how to close these quality gaps, consider adding more pauses prior to training and serving, and consider how to avoid skewing the data while still creating a smooth workflow.
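The streaming-ETL gates Rao describes, schema checks, anomaly detection, and PII screening before training and serving, might look like the following sketch. The schema, regex, and z-score cutoff are illustrative assumptions, not a reference to any specific PII data service:

```python
import re
from statistics import mean, stdev

# Illustrative streaming-ETL quality gates: schema validation,
# z-score anomaly filtering, and PII redaction. All names and
# thresholds are assumptions for the sketch.

SCHEMA = {"user_id": int, "amount": float, "note": str}
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def schema_ok(event):
    return all(isinstance(event.get(k), t) for k, t in SCHEMA.items())

def redact_pii(event):
    event = dict(event)
    event["note"] = EMAIL.sub("[PII]", event["note"])
    return event

def anomaly_free(history, value, z_cutoff=3.0):
    # Flag values more than z_cutoff standard deviations from the
    # running mean; accept everything until we have enough history.
    if len(history) < 2:
        return True
    mu, sigma = mean(history), stdev(history)
    return sigma == 0 or abs(value - mu) / sigma <= z_cutoff

def gate(stream):
    """Yield only events that pass schema, anomaly, and PII gates."""
    history = []
    for event in stream:
        if not schema_ok(event):
            continue                  # pause/quarantine malformed events
        if not anomaly_free(history, event["amount"]):
            continue                  # quarantine outliers for review
        history.append(event["amount"])
        yield redact_pii(event)
```

Because the gate is a generator, it drops into an existing streaming pipeline between ingestion and the training/serving stages without changing the surrounding workflow.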

Sunitha Rao, SVP of Hitachi Vantara’s Hybrid Cloud Business

 

Prioritize Intelligent, Monitored, And Sustainable AI Workflows

Rao discusses the significance of service-level objectives (SLOs), reproducibility, and monitoring in AI workflows. For early detection, she emphasizes continuous dataset tracking, root-cause alerts and playbooks, and the transition from outdated threshold-based scripts to self-learning models that adapt continuously.

 

Tracking versions of datasets, features, models, and code underpins rebuilding models, learning from mistakes, and methodically resolving problems. Finally, she stresses that reliable, durable, and continuously improving AI infrastructure requires meaningful SLOs that are established, tracked, and whose breaches are proactively addressed; SLOs should not be limited to basic measures like latency.
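Version tracking of this kind can be as lightweight as fingerprinting each artifact and storing the fingerprints together per run; this minimal sketch uses content hashes, and all names are illustrative:

```python
import hashlib
import json

# Minimal lineage sketch: fingerprint dataset, feature spec, model
# config, and code version together so any run can be reproduced or
# diffed later. Field names are illustrative assumptions.

def fingerprint(obj):
    """Stable SHA-256 fingerprint of a JSON-serializable artifact."""
    blob = json.dumps(obj, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()[:12]

def record_run(dataset, feature_spec, model_config, code_version):
    """Return a lineage record for one training run."""
    return {
        "dataset": fingerprint(dataset),
        "features": fingerprint(feature_spec),
        "model": fingerprint(model_config),
        "code": code_version,   # e.g. a git commit hash
    }
```

Comparing two records then pinpoints exactly which ingredient changed between a good run and a degraded one, which is the prerequisite for the root-cause playbooks described above.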

 

SLOs, the commitments defined for each workflow to prevent degradation, have become fundamental to AI infrastructure, according to Sunitha. They give clients a foundation for understanding what can be consistently delivered across data pipelines, training, and serving.

 

Once these objectives are established, attention turns to improving results and ensuring the smooth operation of batch workflows, vector storage pipelines, retrieval systems, and offline/online operations. She highlights the need for periodic KPIs to monitor metrics like pass/fail rates, training-to-serving skew, and data freshness. By watching these signals, teams can determine where degradation starts and put the necessary controls in place to guarantee dependable and effective AI systems.
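A simple way to operationalize the metrics above is a table of SLO predicates evaluated against each metrics snapshot; the thresholds here are illustrative assumptions, not recommendations:

```python
# Illustrative SLO checker for the KPIs discussed above.
# Metric names and thresholds are assumptions for the sketch.

SLOS = {
    "gate_pass_rate": lambda v: v >= 0.98,     # share of records passing quality gates
    "freshness_seconds": lambda v: v <= 3600,  # data no older than one hour
    "train_serve_skew": lambda v: v <= 0.05,   # feature-distribution drift bound
}

def evaluate_slos(metrics):
    """Return the list of breached SLO names for a metrics snapshot."""
    return [name for name, ok in SLOS.items()
            if name in metrics and not ok(metrics[name])]
```

Running the check on every pipeline heartbeat turns breaches into alerts with a named objective attached, rather than a bare latency graph.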

 

Finally, Sunitha discusses how crucial it is to map workloads to the appropriate execution venues, whether on-premises, public cloud, edge, or hybrid, since this choice affects performance, compliance, sustainability, ROI, and investments. Where data lives shapes the design of carbon-conscious operations, tiered storage, and power-efficient infrastructure.
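That placement decision can start as a small rule table before maturing into a cost model; this hypothetical sketch routes a workload by the constraints named above, with rules that are assumptions, not recommendations:

```python
# Hypothetical rule-based workload placement: route each workload to
# on-prem, edge, public cloud, or hybrid based on compliance, latency,
# and burstiness. The rule order and keys are illustrative assumptions.

def place_workload(w):
    """Return an execution venue for a workload described as a dict."""
    if w.get("regulated_data"):
        return "on-prem"        # compliance keeps regulated data local
    if w.get("latency_ms", 1000) < 20:
        return "edge"           # tight latency budgets favor the edge
    if w.get("bursty"):
        return "public-cloud"   # elastic compute absorbs bursts cheaply
    return "hybrid"             # default: balance cost and flexibility
```

Rule order matters: compliance outranks latency, which outranks cost, mirroring the priority Rao assigns to these constraints.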