You’re ready to take your organization to the next level with AI – but how ready is your data?

The amount of data generated each year has been growing exponentially over the past decade, driven by the proliferation of digital devices, internet connectivity, social media, cloud computing and other technological innovations. According to a report by Statista, the volume of data created, captured, copied and consumed globally has grown from about 2 zettabytes (ZB) in 2010 to an expected 149.3 ZB this year.

We’re looking at a scenario where the volume of data will more than double every four years, on average. This raises an important question: can you trust the data stored by your organization to deliver the AI results you’re looking for?

Do you know where your data is?

Without proper data management, AI cannot deliver its full potential. And to manage data properly, you first need to understand it.

One way of doing this is to classify data according to how accessible it is and how often it’s used in your organization. Using this classification, we can categorize data into three types: active data, passive data and dark data.

Active data is relevant to your organization’s current needs. It’s accessed often for analysis, decision-making, reporting or communication. It’s usually stored in databases, data warehouses or cloud platforms so it can be retrieved and worked with easily. Passive data, on the other hand, is what you’ll find backed up in less-accessible storage devices. It may have potential value, but it’s not actively used and is typically kept for future reference or compliance reasons.

Dark data can come from sources such as sensors, logs, emails, documents, images and videos. It is unknown, unused or forgotten by the organization. Dark data may contain hidden insights – or risks – but these are not visible or accessible because dark data is not stored, catalogued or analyzed in a systematic way.
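The three categories can be made concrete with a small sketch. It assumes each data asset carries two pieces of metadata: whether it appears in a catalog, and when it was last accessed. The `DataAsset` class, the `classify` function and the 90-day window are hypothetical choices for illustration, not an industry standard.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Hypothetical cut-off: assets accessed within the last 90 days count as active.
ACTIVE_WINDOW = timedelta(days=90)


@dataclass
class DataAsset:
    name: str
    catalogued: bool               # is the asset recorded in a catalog or inventory?
    last_accessed: Optional[date]  # None if no access has been recorded


def classify(asset: DataAsset, today: date) -> str:
    """Return 'active', 'passive' or 'dark' for a single asset."""
    # Data the organization has no record of is dark, whatever its age.
    if not asset.catalogued:
        return "dark"
    # Known data that is in regular use is active.
    if asset.last_accessed is not None and today - asset.last_accessed <= ACTIVE_WINDOW:
        return "active"
    # Known data that is kept but rarely touched is passive.
    return "passive"
```

In practice the threshold would depend on how your organization works; the point is that the classification only becomes possible once access and catalog metadata exist.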

According to Statista, between 2010 and 2020, the share of active data increased only slightly, from 8% to 9%; the share of passive data fell considerably, from 71% to 36%; and the share of dark data more than doubled, from 21% to 55%. These trends suggest that the growth of data generation has outpaced the growth of data storage and analysis, leaving a large amount of data unaccounted for or unused by organizations.

Unanalyzed, untapped, unsafe: dark data is the most significant risk to AI

Data can provide valuable insights that prompt you to think smarter and act faster to create business value. But it is not without risk.

When using data for AI, you need to understand a broad spectrum of risks, including privacy breaches, ethical dilemmas and compliance issues.

Of the data types discussed above, dark data poses the biggest risk to AI, for three main reasons:

  1. Dark data may contain valuable information that could improve the performance or accuracy of AI models – but, because you don’t know it’s there or can’t access it easily, you don’t use it. This means AI may miss out on important features or patterns that could enhance its learning or prediction capabilities.
  2. Lack of visibility and governance means that AI may also inadvertently expose or compromise confidential data. If AI uses dark data that contains sensitive or personal information that is not protected or regulated, you may have to deal with legal or ethical issues that arise should you violate the privacy or security of individuals or organizations.
  3. You’ve heard the saying “garbage in, garbage out”, and it applies to data and AI, too. Dark data may contain erroneous or outdated information that could impair the quality or reliability of AI models. Without data management and maintenance, information is not verified or updated – and AI may incorporate or propagate inaccurate or obsolete data that could lead to faulty or misleading outcomes.
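The second and third risks lend themselves to simple automated checks before data reaches a model. The sketch below is illustrative only: `review_record`, the one-year freshness threshold and the email pattern are assumptions, and a real screening process would need to cover many more kinds of personal data than email addresses.

```python
import re
from datetime import date, timedelta
from typing import List

# Deliberately simple pattern: it catches common address shapes, not every valid email.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Hypothetical freshness threshold: records older than a year are treated as stale.
MAX_AGE = timedelta(days=365)


def review_record(text: str, last_updated: date, today: date) -> List[str]:
    """Return a list of risk flags for one record; an empty list means no flags."""
    flags = []
    # Risk 2: the record may contain personal information.
    if EMAIL_PATTERN.search(text):
        flags.append("possible_personal_data")
    # Risk 3: the record may be outdated.
    if today - last_updated > MAX_AGE:
        flags.append("stale")
    return flags
```

A flagged record is not necessarily unusable; the flags only mark where a person or a stricter tool should take a closer look.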

Given these risks, it’s essential to ensure that your data is properly stored, catalogued, analyzed and used for AI purposes.

The four essentials of an AI-ready data strategy

AI readiness starts with data readiness: your ability to collect, store, manage, analyze and use data effectively. Data readiness involves several dimensions, such as data quality, data accessibility, data integration, data security, data ethics and data culture.

To maximize the value of all data in your organization and minimize risk, you need a data strategy that covers these four elements:

  • Data discovery: Identify and locate all the data sources in your organization, including those that are currently hidden, neglected or forgotten. Use tools such as data catalogs, data quality assessments and data lineage analysis to help you discover and document your data assets.
  • Data governance: Establish and enforce policies and standards for how data is collected, stored, accessed, shared and used. To protect and regulate your data assets, explore best practices for data security, data privacy, data ethics and data compliance.
  • Data analysis: To transform your data into meaningful insights that can inform your business decisions, objectives and actions, use tools such as data visualization, data mining and data analytics to better understand your data assets.
  • Data utilization: Integrate and apply your data insights to your AI models and systems, but don’t stop there. You also need to monitor and evaluate the performance and impact of these models. Data pipelines, platforms and feedback loops will help you optimize and improve your data assets.
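As a starting point for the data discovery element, the following sketch builds a basic inventory of the files under a folder and totals their size by file type. It is a minimal illustration that assumes the data sits on a file system you can read; the function names `inventory` and `summarize_by_type` are hypothetical, and a dedicated data catalog tool would record far more (owners, lineage, sensitivity).

```python
from collections import Counter
from pathlib import Path
from typing import Dict, List


def inventory(root: str) -> List[dict]:
    """List every file under root with its type, size and last-modified time."""
    records = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            stat = path.stat()
            records.append({
                "path": str(path),
                "suffix": path.suffix.lower() or "(none)",
                "size_bytes": stat.st_size,
                "modified": stat.st_mtime,  # seconds since the epoch
            })
    return records


def summarize_by_type(records: List[dict]) -> Dict[str, int]:
    """Total size in bytes per file type, to show where the data volume sits."""
    totals = Counter()
    for record in records:
        totals[record["suffix"]] += record["size_bytes"]
    return dict(totals)
```

Files that turn up here with old modification times and no known owner are natural candidates for the dark data category described earlier.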

By implementing a data strategy that covers these four aspects, you can ensure that your organization is ready for AI and can manage the risks of dark data.

Conclusion

There is no doubt that AI is transforming the world of business and creating new opportunities everywhere. However, to successfully leverage AI, you need a robust data strategy that ensures the quality, security and governance of your data assets.

To use the power of AI to create value and impact for your customers, employees, stakeholders and society, align your data strategy with your business goals. Then, invest in the right data infrastructure and tools, establish clear policies and standards, foster a data-driven culture and mindset, and continuously monitor and improve data performance and outcomes to ensure that your data is always AI-ready.

WHAT TO DO NEXT
Read more about NTT DATA’s data-driven intelligence for success to benefit from our expertise in data strategy and governance, data infrastructure, AI and more.