Quality Data Drives the success of Machine Learning and Artificial Intelligence

Last Updated on July 18, 2020 by Editorial Team

Author(s): Mohua Sen

For an AI/ML application to perform analysis and generate insights, you need to provide useful, quality data.

History regards the 16th century as the period in which Western civilization rose to prominence. During this time, Spain and Portugal explored the Indian Ocean and opened worldwide oceanic trade routes, and Vasco da Gama was given permission by the Indian Sultans to settle in the wealthy Bengal Sultanate. Large parts of the New World became Spanish and Portuguese colonies, and as the Portuguese became the masters of Asia’s and Africa’s Indian Ocean trade, the Spanish opened trade across the Pacific Ocean, linking the Americas with India.

Another link, this one between minds and machines, also traces back to this era. The French philosopher, scientist, and metaphysician René Descartes (1596–1650) imagined a world in which machines could make decisions. Then, in 1956, the American computer and cognitive scientist John McCarthy coined the term Artificial Intelligence (AI), which he defined as “the science and engineering of making intelligent machines.” AI is the ability of a computer program or a machine to think and learn.

Today, in 2020, AI is used widely across sectors. Whether it is helping organizations make well-considered decisions, handling something as routine as sorting our emails, or assisting credit risk managers and detecting financial fraud, this branch of technology, teamed up with advanced data analytics, has all the makings of a revolutionary effect.

These AI scenarios show the technology’s remarkable computational power, but in practice, working applications begin with data. Data is the foundation of every advanced analytics algorithm, and those algorithms are the backbone of AI/ML models. The data must be supplied in the form the algorithm understands. The main function of an AI/ML algorithm is to unlock the hidden information in the data, so if the data is of poor quality, the algorithm will produce incorrect insights. This can end in revenue loss for the project or organization. A Forrester report, “AI Experiences A Reality Check,” indicates that data quality is one of the biggest challenges to achieving the desired results from AI/ML systems in enterprises. According to Forrester, most organizations lack a clear understanding of the right data needed for ML models, and hence businesses often struggle to prepare data as needed.

Human beings learn from experience. When I was younger, touching a hot plate with my finger taught me, through perception, how to deal with it in the future. Machines, in contrast, follow instructions. They need to be programmed to do things. For example, a car manufacturer has machines that put different parts together; they are programmed, and they simply follow instructions. Machine learning ties the two together: learning from experience and following instructions. The only difference is that the experience comes from data (“learning from data”), so we need good-quality data to make it effective. And to control the quality of data, one needs rules in place. So how is good data defined?

When describing good data quality, we should focus on its most important dimensions. Not every dimension is relevant to every field, but one should understand them clearly when setting out to improve the quality of data.

Completeness: The level at which desired data attributes are supplied. Your data does not need to be 100% complete, but you need to keep the focus on a few questions: Are any values missing? Is data being captured to the full extent at the source? For data to be useful, you need to see the whole picture, not just part of it. For example, all employees have a location.

Accuracy: The degree to which data matches the agreed source. For example, the initial base salary reflects the amount on the contract.

Uniqueness: The extent to which data is stored in one place and not duplicated, e.g., there must not be multiple records for the same employee. Each record should be unique based on a given criterion; otherwise, the risk of accessing outdated information increases.

Integrity: Data is traceable back to the source. It is the extent to which data adheres to defined business rules, accepted values, and accepted formats, e.g., employee gender is F, M, or U.

Consistency: The extent to which identical data has the same value wherever it is stored or displayed. For example, the aggregated base salary by cost center is consistent between systems.

Timeliness: Does the data represent reality from the required point in time? The data should be refreshed when values change, within an acceptable system lag, e.g., base salary updated within x days of a promotion.
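As a rough illustration, some of the dimensions above can be turned into simple rule checks. The sketch below is only one possible way to do it, using plain Python and a hypothetical employee dataset. The field names, the sample records, and the 7-day lag threshold (standing in for "x days") are all assumptions made for this example. Accuracy and consistency are left out because checking them would require a second reference system to compare against.

```python
from datetime import date

# Hypothetical employee records; field names and values are illustrative only.
employees = [
    {"id": 1, "location": "New York", "gender": "F",
     "promoted_on": date(2020, 6, 1), "salary_updated_on": date(2020, 6, 3)},
    {"id": 2, "location": None, "gender": "M",
     "promoted_on": None, "salary_updated_on": None},
    {"id": 2, "location": "London", "gender": "X",
     "promoted_on": date(2020, 5, 1), "salary_updated_on": date(2020, 7, 1)},
]

ALLOWED_GENDERS = {"F", "M", "U"}  # accepted values from the integrity example
MAX_LAG_DAYS = 7                   # assumed stand-in for the "x days" threshold


def completeness(records, field):
    """Completeness: share of records where the field is populated."""
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)


def duplicate_ids(records):
    """Uniqueness: IDs that appear on more than one record."""
    seen, dupes = set(), set()
    for r in records:
        if r["id"] in seen:
            dupes.add(r["id"])
        seen.add(r["id"])
    return dupes


def invalid_genders(records):
    """Integrity: records whose gender is outside the accepted values."""
    return [r for r in records if r["gender"] not in ALLOWED_GENDERS]


def stale_salary_updates(records, max_lag_days=MAX_LAG_DAYS):
    """Timeliness: records whose salary update lagged the promotion too long."""
    stale = []
    for r in records:
        if r["promoted_on"] and r["salary_updated_on"]:
            lag = (r["salary_updated_on"] - r["promoted_on"]).days
            if lag > max_lag_days:
                stale.append(r)
    return stale
```

On the sample records, each function flags the defect planted for it: one missing location, one duplicated ID, one gender code outside F/M/U, and one salary update that arrived well after the promotion.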

So, to have good-quality data, a data quality assessment should be performed at the outset to check the data against these dimensions, and a remediation process should then be put in place to prevent data issues at the source. According to research, inaccurate or incomplete data can lead to a 20% drop in productivity, while companies that focused on high-quality data saw a revenue increase of around 20%.

High-quality data is the need of the hour, and every organization should establish a data quality assessment process at the source itself so that all downstream applications receive healthy data. Otherwise, the far-reaching influence of AI/ML models may be overlooked or delayed because of poor data quality. In this competitive era, data quality and master data management are among the most important means of reducing cost. We should remember the 1–10–100 rule: “It costs: $1 to verify the accuracy of data at the point of entry, $10 to correct or clean up data in batch form, and $100 (or more) per record if nothing is done at the initial level”.
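To make the arithmetic of the 1–10–100 rule concrete, here is a small sketch that applies the three per-record costs quoted above to a batch of bad records. The stage names and the count of 5,000 records are hypothetical, chosen only for illustration. Since the rule says "$100 (or more)", the last figure is best read as a lower bound.

```python
# Per-record costs in dollars, taken from the 1-10-100 rule quoted above.
# The stage names are labels invented for this example.
COST_PER_RECORD = {
    "verify_at_entry": 1,
    "clean_in_batch": 10,
    "do_nothing": 100,  # "$100 (or more)": treat as a lower bound
}


def remediation_cost(bad_records, stage):
    """Estimated cost of dealing with `bad_records` at the given stage."""
    return bad_records * COST_PER_RECORD[stage]


bad_records = 5_000  # hypothetical number of problematic records
for stage in COST_PER_RECORD:
    print(f"{stage}: ${remediation_cost(bad_records, stage):,}")
```

For 5,000 records this works out to $5,000 at the point of entry, $50,000 in batch cleanup, and at least $500,000 if nothing is done, which is the hundredfold gap the rule describes.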

References

  1. Vadime Elisseeff (1998). The Silk Roads: Highways of Culture and Commerce. Berghahn Books. ISBN 978-1-57181-221-6.
  2. Nanda, J. N. (2005). Bengal: The Unique State. Concept Publishing Company. p. 10. ISBN 978-81-8069-149-2. “Bengal […] was rich in the production and export of grain, salt, fruit, liquors and wines, precious metals, and ornaments besides the output of its handlooms in silk and cotton. Europe referred to Bengal as the richest country to trade with.”
  3. “Portuguese, The.” Banglapedia. en.banglapedia.org. Archived from the original on 1 April 2017.
  4. “Portal: Modern history.” Wikipedia. en.wikipedia.org/wiki/Portal:Modern_history
  5. “16th century.” Wikipedia. en.wikipedia.org.
  6. “What is AI? / Basic Questions.” jmc.stanford.edu/artificial-intelligence.
  7. “Artificial intelligence.” Simple English Wikipedia. simple.wikipedia.org/wiki/Artificial_intelligence
  8. “Data Is The Foundation For Artificial Intelligence.” www.forbes.com. Oct 2018.
  9. “The 5 Key Reasons Why Data Quality Is So Important.” cerasis.com/data-quality.
  10. “The Cost of Quality: The 1–10–100 Rule.” www.makingstrategyhappen.com.
  11. “Forrester Infographic: AI Experiences A Reality Check.” www.forrester.com/report/. May 2019.


Quality Data Drives the success of Machine Learning and Artificial Intelligence was originally published in Towards AI - Multidisciplinary Science Journal on Medium.

Published via Towards AI
