Data quality is the foundation of ethical and sustainable Artificial Intelligence (AI) development, yet many organizations still prioritize technological advancement over establishing responsible frameworks for data use and long-term sustainability. That is what Hitachi Vantara reported in its latest study, “How AI is Shifting Data’s Foundation.” As AI advances at an astonishing pace, businesses are finding their traditional data infrastructure pushed to its limits. With technology evolving faster than ever, IT leaders are under immense pressure to keep up: businesses and customers now demand faster, more accurate, and more reliable AI-driven insights.
But how are IT leaders responding to this relentless pace of change?
To explore the challenges and opportunities that businesses now face, Hitachi Vantara surveyed 1,200 IT decision-makers from large organizations across 15 countries. The results are published in a report titled “The Hitachi Vantara State of Data Infrastructure Survey.” Among the findings: 76% of IT leaders revealed that more than half of their stored data is unstructured. The report reinforces the critical role that data infrastructure and data management play in overall data quality and in the ability to drive positive AI outcomes. These insights paint a clear picture of the evolving demands on data infrastructure and of how critical it is for IT leaders to ensure their data systems can handle the surge of information that AI requires.
One of the most striking revelations from this research?
65% of large organizations have set a clear priority for their GenAI strategies: bigger, more general-purpose LLMs rather than smaller, specialized ones. This shift is happening even though larger models require significantly more resources to train, often consuming up to 100 times the energy of standard models. While IT leaders recognize that data quality is essential for AI success, many still fail to prioritize it in practice. This gap between awareness and execution presents a significant challenge for organizations looking to take full advantage of the data they collect and store.
Sustainability and Ethics in AI Development: A Critical Focus for the Future
It is impossible to think about AI technologies, including LLMs and other NLP systems, without essential considerations for responsible AI model development. However, Hitachi Vantara found that 60% of IT leaders take a different approach to technology innovation than to ethical guidelines: according to the 2024 survey, most prioritize technological capabilities over ethical frameworks and policies. This approach often means ethical concerns are addressed only after the technology is already in place. Additionally, 29% of organizations still do not factor sustainability into their AI strategies, despite growing calls to put sustainability at the center of AI efforts.
The lack of a sustainability focus in AI development is short-sighted, especially as regulations around AI sustainability are being introduced globally.
Data infrastructure built without sustainability in mind may need to be completely overhauled to comply with future regulatory requirements. In the U.S., new sustainability reporting mandates will require large public companies to disclose their environmental impact starting in fiscal year 2025, with full implementation across all public companies expected by 2026/2027.
Similarly, in Europe, the EU’s upcoming Artificial Intelligence Act includes ethical guidelines for AI development, placing emphasis on transparency, accountability, and sustainability.
This regulatory pressure is growing, but many organizations still lack clear standards or guidance on what constitutes “sustainable AI.” About a third (34%) of IT leaders report that the absence of such standards is a barrier to implementing ethical and sustainable AI practices. As new laws and ethical frameworks shape the AI landscape, organizations must prioritize sustainability and ethics alongside technological advancement. Failure to do so could result in increased costs, regulatory compliance challenges, and a loss of trust from consumers and stakeholders.
The Explosion of Data and the Growing Need for Quality
The sheer volume of data generated and stored by large organizations is expanding at a dizzying rate. Last year, IT leaders expected data storage to double within two years.
Yet, the reality is that data storage needs have almost tripled!
Today, the average large organization holds around 150 petabytes (PB) of data, an amount so vast it is the equivalent of storing every film ever made worldwide nearly 200 times over, in 4K. And this is just the beginning: by the end of 2026, businesses expect to be storing over 300 PB of data.
This explosive growth in data presents a unique opportunity for AI-powered insights and innovations. But it also presents a massive challenge. For AI to function at its best, high-quality data is essential. And the truth is, without it, the results can be subpar — or worse, misleading.
Data Quality Is Critical for AI, Yet It’s Often Overlooked
Despite the clear link between data quality and successful AI implementation, there is a surprising disconnect between what IT leaders know and what they do. According to the research, 38% of IT decision-makers identify data quality as one of the top factors for implementing AI technologies, such as Generative AI (GenAI). However, only a few are actively prioritizing it.
This disconnect is concerning because high-quality data is the foundation of AI success. The research highlights that data quality is the second-highest concern for IT leaders after securing data, with 37% of them flagging it as a key challenge when deploying AI solutions. In some countries like India, the focus on data quality is even stronger, with 58% of IT leaders ranking it as a priority.
The Struggles with AI Accuracy: A Data Quality Crisis
The rapid adoption of AI technologies has been accompanied by mixed results. IT leaders report AI implementation success rates ranging from 76% (when using free models) to 85% (when collaborating with global systems integrators). However, AI accuracy remains low: only 42% of AI results are considered accurate by IT decision-makers, and one in five AI models are described as “hallucinating,” producing largely irrelevant or erroneous outcomes.
The current state of data management reveals a significant disconnect, with only 38% of respondents reporting that data is consistently available when needed.
Even more concerning, just 33% believe that the majority of outputs from their AI models are accurate. Meanwhile, a staggering 80% acknowledge that most of their data is unstructured, a challenge that grows more complex as data volumes continue to surge.
Despite these challenges, few organizations are taking active steps to improve data quality.
Nearly half (47%) do not tag data for effective visualization, only 37% are focused on enhancing the quality of training data to better explain AI outputs, and over a quarter (26%) fail to review their datasets for accuracy and reliability. This lack of attention to data quality could exacerbate risks as reliance on AI and data-driven decision-making increases.
Worse still, only 36% of IT leaders trust their AI models more than half of the time. This lack of trust in AI outputs is a serious concern, especially as AI becomes more deeply integrated into decision-making processes across industries.
Why Data Quality Is So Important
To better understand why data quality is so vital to AI’s success, we need to explore what “quality data” actually means in the context of AI.
This is crucial for AI development teams because the quality and accessibility of data directly impact the performance, accuracy, and reliability of AI models. When data is not readily available or properly structured, AI systems may struggle to generate meaningful insights or deliver accurate results, leading to ineffective or biased outputs.
Unstructured data, which makes up a large portion of many organizations’ datasets, is particularly challenging because it requires additional effort to process and interpret correctly. Training AI models on poor-quality or incomplete data can lead to flawed predictions and decisions, reducing trust in the AI system and its value.
Furthermore, AI teams need high-quality, well-tagged, and well-maintained datasets to ensure that their models’ outputs can be explained transparently. This is especially important as AI becomes more integrated into decision-making processes where explainability is essential for ethical, legal, and operational reasons.
Without a focused effort on improving data quality—such as tagging data for visualization, refining training datasets, and reviewing data for accuracy—AI development teams risk creating systems that are inefficient, biased, and could harm the organization’s reputation and objectives. Therefore, addressing data management challenges is not just a technical issue, but a foundational element for building effective, trustworthy AI systems.
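To make the ideas of tagging data and reviewing it for accuracy concrete, here is a minimal sketch in Python of what those two practices can look like. Everything in it, the record fields, the tags, the completeness rule, is a hypothetical illustration rather than anything prescribed by the Hitachi Vantara report.

```python
"""Minimal sketch of data tagging and a completeness review.
All field names, tags, and rules are hypothetical illustrations."""
from dataclasses import dataclass, field
from datetime import date

@dataclass
class TaggedRecord:
    payload: dict                               # the raw data row
    tags: dict = field(default_factory=dict)    # metadata for discoverability

def tag_record(record: TaggedRecord, source: str, owner: str) -> None:
    """Attach provenance tags so the record can be traced and visualized."""
    record.tags.update({
        "source": source,
        "owner": owner,
        "reviewed_on": date.today().isoformat(),
    })

def review_completeness(records: list[TaggedRecord], required: set[str]) -> float:
    """Return the share of records whose payload fills every required field."""
    if not records:
        return 0.0
    complete = sum(
        1 for r in records
        if required <= r.payload.keys()
        and all(r.payload[k] not in (None, "") for k in required)
    )
    return complete / len(records)

if __name__ == "__main__":
    rows = [
        TaggedRecord({"customer_id": 1, "email": "a@example.com"}),
        TaggedRecord({"customer_id": 2, "email": ""}),  # incomplete row
    ]
    for row in rows:
        tag_record(row, source="crm_export", owner="data-team")
    print(f"complete records: {review_completeness(rows, {'customer_id', 'email'}):.0%}")
```

Even a review pass this simple surfaces incomplete records before they reach a training pipeline, which is exactly the kind of basic hygiene the survey suggests many organizations skip.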
For organizations, then, quality data isn’t just about quantity or size; it’s about the accuracy, completeness, and relevance of the data used to train AI models.
- Accurate Data Leads to Reliable Insights: AI models are only as good as the data they are trained on. Inaccurate, incomplete, or outdated data can lead to flawed conclusions, biased outcomes, or systems that simply don’t work as expected. For example, in sectors like finance and healthcare, poor data quality could result in missed opportunities or even dangerous missteps — like false fraud detection or incorrect medical diagnoses.
- Data Completeness Prevents Gaps in AI Analysis: AI relies on patterns in data to make predictions. If critical pieces of data are missing or incomplete, AI systems can miss key insights, leaving gaps in analysis that could affect decisions. In large organizations where data is siloed or fragmented across different departments, ensuring that all data sources are unified and accurate becomes a critical task.
- Bias-Free Data Is Key to Ethical AI: AI systems trained on biased data can perpetuate discrimination, whether in hiring practices, loan approvals, or other business operations. High-quality data means addressing biases and ensuring that training sets are representative, diverse, and inclusive. Without proper data governance, organizations risk developing AI systems that are not only inaccurate but also ethically flawed.
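As a concrete illustration of the representativeness point above, the following Python sketch compares group shares in a training set against an assumed reference population. The group labels, population shares, and tolerance are all hypothetical; real bias audits go much further than this.

```python
"""Sketch: checking whether a training set is representative of a
reference population. Labels, shares, and tolerance are hypothetical."""
from collections import Counter

def group_shares(labels: list[str]) -> dict[str, float]:
    """Share of each group among the training labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}

def representation_gaps(train_labels: list[str],
                        reference: dict[str, float],
                        tolerance: float = 0.05) -> dict[str, float]:
    """Return groups whose training share deviates from the reference
    population share by more than `tolerance` (positive = over-represented)."""
    shares = group_shares(train_labels)
    return {
        group: shares.get(group, 0.0) - ref_share
        for group, ref_share in reference.items()
        if abs(shares.get(group, 0.0) - ref_share) > tolerance
    }

if __name__ == "__main__":
    train = ["group_a"] * 80 + ["group_b"] * 20       # skewed sample
    population = {"group_a": 0.6, "group_b": 0.4}     # assumed true shares
    # Flags both groups: group_a over-represented, group_b under-represented.
    print(representation_gaps(train, population))
```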
Strategies for IT Leaders to Prioritize Data Quality
As AI becomes more critical to business operations, IT leaders must take deliberate steps to ensure that data quality is at the forefront of their AI strategies.
Here are a few ways they can address this issue:
1. Implement Robust Data Governance Frameworks
Data governance involves creating a framework of policies and procedures to ensure data is accurate, secure, and compliant. By defining ownership, stewardship, and accountability for data quality, organizations can ensure that data remains clean, consistent, and trustworthy.
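Governance is first an organizational discipline, but parts of it can be made machine-checkable. The sketch below shows one hypothetical way to encode ownership, stewardship, and retention as a record that tooling can validate; the fields and policy rules are illustrative assumptions, not an established standard.

```python
"""Sketch: a machine-readable governance record so data ownership and
stewardship are explicit. Fields and rules are illustrative assumptions."""
from dataclasses import dataclass

@dataclass(frozen=True)
class GovernanceRecord:
    dataset: str          # logical dataset name
    owner: str            # accountable business owner
    steward: str          # team responsible for day-to-day data quality
    classification: str   # e.g. "public", "internal", "restricted"
    retention_days: int   # how long the data may be kept

def policy_violations(record: GovernanceRecord) -> list[str]:
    """Return governance-policy violations (empty list means compliant)."""
    issues = []
    if record.classification not in {"public", "internal", "restricted"}:
        issues.append(f"unknown classification: {record.classification}")
    if record.retention_days <= 0:
        issues.append("retention period must be positive")
    if not record.owner or not record.steward:
        issues.append("owner and steward must both be assigned")
    return issues

if __name__ == "__main__":
    record = GovernanceRecord(
        dataset="customer_profiles",
        owner="head-of-sales",
        steward="data-platform-team",
        classification="restricted",
        retention_days=365,
    )
    print(policy_violations(record) or "compliant")
```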
2. Invest in Data Quality Tools and Platforms
Advanced data management and quality tools are essential for identifying and correcting data errors before they can affect AI outputs. Investing in AI-powered data quality platforms can automate error detection, data cleansing, and data validation, making it easier to maintain high-quality data.
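The report does not name specific tools, but the kinds of checks such platforms automate can be sketched in a few lines of Python with pandas. The column names and domain rules below are hypothetical examples of error detection and cleansing, not a particular vendor’s implementation.

```python
"""Sketch: automated error detection and cleansing on tabular data with
pandas. Column names and domain rules are hypothetical examples."""
import pandas as pd

def quality_report(df: pd.DataFrame) -> dict:
    """Detect common data errors before they reach an AI pipeline."""
    return {
        "rows": len(df),
        "null_rate": df.isna().mean().to_dict(),       # per-column missing share
        "duplicate_rows": int(df.duplicated().sum()),
        "negative_ages": int((df["age"] < 0).sum()),   # domain rule: age >= 0
    }

def cleanse(df: pd.DataFrame) -> pd.DataFrame:
    """Apply simple, explainable fixes; anything stricter should go to a
    data steward for review rather than being silently dropped."""
    out = df.drop_duplicates()
    out = out[out["age"] >= 0].copy()                  # remove impossible values
    out["email"] = out["email"].str.strip().str.lower()
    return out

if __name__ == "__main__":
    raw = pd.DataFrame({
        "age": [34, -1, 34],
        "email": [" Ana@Example.com", "bo@example.com", " Ana@Example.com"],
    })
    print(quality_report(raw))
    print(cleanse(raw))
```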
3. Ensure Data Standardization Across the Organization
Data standardization ensures that data is formatted consistently, reducing discrepancies and making it easier to integrate data from different departments and systems. A unified data approach helps prevent issues that could arise from fragmented or inconsistent data sources.
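Here is a minimal sketch of what standardization looks like in code, assuming a handful of date formats and country spellings that hypothetical source systems might produce:

```python
"""Sketch: standardizing dates and country names so records from different
departments can be merged. The source formats are assumptions."""
from datetime import datetime

# Date formats assumed to appear in different source systems.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%m-%d-%Y")

# Free-text country spellings mapped to ISO 3166-1 alpha-2 codes.
COUNTRY_ALIASES = {
    "usa": "US", "united states": "US", "u.s.": "US",
    "uk": "GB", "united kingdom": "GB",
}

def standardize_date(value: str) -> str:
    """Normalize any known date format to ISO 8601 (YYYY-MM-DD)."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {value!r}")

def standardize_country(value: str) -> str:
    """Map a free-text country name to a canonical code."""
    key = value.strip().lower()
    return COUNTRY_ALIASES.get(key, key.upper())

if __name__ == "__main__":
    print(standardize_date("31/12/2025"))        # -> 2025-12-31
    print(standardize_country("United States"))  # -> US
```

Agreeing on one canonical format per field, as above, is what makes later joins across departmental systems reliable.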
4. Establish Continuous Data Monitoring and Feedback Loops
Data quality isn’t a one-time task. It requires ongoing monitoring, validation, and optimization. Establishing feedback loops that regularly assess the quality of data used for AI models will help organizations stay ahead of potential issues and ensure their AI systems remain accurate and reliable.
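As a rough sketch of such a feedback loop, the Python snippet below periodically measures a quality score and raises an alert when it drops below a threshold. The metric, threshold, and alert channel are all placeholders for whatever an organization actually monitors.

```python
"""Sketch: a continuous data-quality monitoring loop. The metric,
threshold, and alert channel are hypothetical placeholders."""
import random
import time

QUALITY_THRESHOLD = 0.95   # minimum acceptable share of valid records

def measure_quality() -> float:
    """Placeholder: in practice this would query the data platform for
    completeness/validity metrics; here we simulate a score."""
    return random.uniform(0.90, 1.00)

def alert(message: str) -> None:
    """Placeholder alert channel (pager, chat webhook, ticket, ...)."""
    print(f"[ALERT] {message}")

def monitor(interval_seconds: float, cycles: int) -> None:
    """Periodically measure quality and feed problems back to the team."""
    for _ in range(cycles):
        score = measure_quality()
        print(f"data quality score: {score:.3f}")
        if score < QUALITY_THRESHOLD:
            alert(f"score {score:.3f} below threshold {QUALITY_THRESHOLD}")
        time.sleep(interval_seconds)

if __name__ == "__main__":
    monitor(interval_seconds=1.0, cycles=3)
```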
The Importance of Trusted Third-Party Support for AI Initiatives
As organizations progress with AI initiatives, many IT leaders are recognizing the critical need for trusted third-party partners to support key areas of their infrastructure. According to a recent survey, CIOs and CISOs face growing challenges in several areas:
- Hardware: To meet the demands of AI, hardware must be secure, available around the clock, and efficient in supporting sustainability goals. 22% of IT leaders report needing assistance in developing scalable, future-proof hardware solutions.
- Data Storage and Processing: Effective data storage solutions are essential for ensuring data security, accessibility, and sustainability. 41% of leaders require help managing ROT (redundant, obsolete, or trivial) data storage and preparing data, while 25% seek expertise in optimizing data processing.
- Software: Secure and resilient software is critical for mitigating cyber risks and ensuring seamless access to data. 31% of IT leaders turn to third-party support for developing AI models that are both effective and secure.
- Skilled Staff: The skills gap remains a significant challenge, with 50% of leaders addressing it through hands-on experimentation and 36% relying on self-teaching to build AI expertise within their teams.
Given these challenges, partnering with trusted third-party experts can help CIOs and CISOs accelerate their AI initiatives while ensuring robust security, efficiency, and scalability across hardware, software, and data management.
Conclusion: Bridging the Gap Between Data Quality and AI Success
As AI transforms industries and business operations, the need for high-quality data has never been more critical. IT leaders know that data quality is essential for successful AI implementation, yet many still fail to prioritize it effectively. This gap between recognition and action is one of the key barriers to AI success.
In 2025, implementing comprehensive data governance strategies, investing in advanced data quality tools, and ensuring data consistency and completeness will be top priorities for IT leaders. Success depends on how well leaders unlock the true potential of AI with reliable and trustworthy data infrastructure platforms such as those from Hitachi Vantara. Organizations that get data quality right will be better positioned to leverage AI’s full capabilities, driving business growth, efficiency, and innovation in an increasingly data-driven world.