Data-Centric AI: Decoding the Hype
Last Updated on March 24, 2022 by Editorial Team
Author(s): Paul Dovidavicius
Originally published on Towards AI.
From a model-centric focus to data-centric approaches in AI
Data and its impact on the quality of ML-based solutions was the subject of a session by Andrew Ng (Zagatti, 2021). He argued for a larger investment in data preparation and, together with his team, demonstrated that improving the quality of existing data can be as effective as collecting three times as much data.
Different data, the same model
As AI practitioners, we know the components that make up a solution:
“AI System = Code + Data, where code means model/algorithm”
This means we can improve a solution by improving the code, by improving the data, or by doing both. What is the best way to strike the right balance in order to succeed?
When the data is freely available, through public datasets or Kaggle for instance, a more model-centric approach is understandable: we are dealing with more or less well-behaved data, so improving a solution means focusing on the only element that can be tweaked and changed, the code. However, what we see in the industry is a very different story. Andrew Ng expressed a viewpoint I completely agree with: until now, the model-centric approach has had a significant impact on the tooling available to data science and ML teams.
Data-centric vs. model-centric
In my opinion, a solid AI solution requires balancing the model-centric and the data-centric perspectives; yet I believe the data side holds the greater share of the value. Happily, this viewpoint does not rest on intuition alone: Andrew Ng and his team chose to demonstrate it with experiments on real-world data. But first, let's clarify what it means to be model-centric and data-centric. In a model-centric approach, the data is held fixed and the code (the model) is iterated on. In a data-centric approach, the model is held fixed and the quality of the data is improved.
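To make the "different data, the same model" idea concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the one-dimensional toy data, the nearest-centroid classifier, and the function names are mine, not something taken from the session. The model code stays untouched; only the training labels change.

```python
# Minimal sketch: hold the model (code) fixed and change only the data.
# All data and names below are made up for illustration.

def fit_centroids(samples):
    """Fit a nearest-centroid model: one mean feature value per label."""
    sums, counts = {}, {}
    for x, label in samples:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict(centroids, x):
    """Return the label whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))


def accuracy(centroids, samples):
    """Fraction of samples whose predicted label matches the given label."""
    correct = sum(1 for x, label in samples if predict(centroids, x) == label)
    return correct / len(samples)


# Training set with three label errors: 1.0, 1.5 and 2.0 belong to class 0.
noisy_train = [(1.0, 1), (1.5, 1), (2.0, 1), (2.5, 0), (3.0, 0),
               (5.0, 1), (5.5, 1), (6.0, 1), (6.5, 1), (7.0, 1)]

# The same examples after the labels are corrected.
clean_train = [(x, 0) if x <= 3.0 else (x, 1) for x, _ in noisy_train]

test_set = [(0.5, 0), (2.0, 0), (3.8, 0), (4.2, 1), (6.0, 1)]

acc_noisy = accuracy(fit_centroids(noisy_train), test_set)
acc_clean = accuracy(fit_centroids(clean_train), test_set)
print(f"noisy labels: {acc_noisy:.2f}, cleaned labels: {acc_clean:.2f}")
# prints "noisy labels: 0.80, cleaned labels: 1.00"
```

On this toy data, the wrong labels pull the class-1 centroid toward the class-0 region, so a borderline test point is misclassified. Fixing the labels removes the error without a single change to the model.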
One of the examples presented during the session was defect detection in steel sheets: given a series of photos of steel sheets, build the best model to recognize defects during the manufacturing process. The baseline system was a computer vision model with well-tuned hyper-parameters, and the goal was to achieve 90 percent accuracy. How can that be accomplished?
Improving the baseline to 90 percent proved impossible with the model-centric approach, which included network architecture search and the use of state-of-the-art architectures. The data-centric approach consisted of identifying and cleaning noisy and inconsistent labels. The findings are as follows:
For steel sheet defect detection, the baseline accuracy was 76.2%. The model-centric approach added +0%, while the data-centric approach added +16.9% (Dario, 2021). Read as percentage points, that gain puts the data-centric result at 93.1%, above the 90 percent target.
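The session's actual labeling procedure is not described here, so the following is only a sketch of one common way to spot inconsistent labels: compare what several annotators said about the same photo and send disagreements back for review. The example ids, label names, and the `review_labels` helper are hypothetical.

```python
from collections import Counter


def review_labels(annotations):
    """Split examples into consistently labeled ones and ones to re-label.

    `annotations` maps an example id to the list of labels given by
    different annotators (an assumed input format for this sketch).
    """
    agreed, to_review = {}, []
    for example_id, labels in annotations.items():
        label, votes = Counter(labels).most_common(1)[0]
        if votes == len(labels):
            agreed[example_id] = label      # every annotator agrees
        else:
            to_review.append(example_id)    # at least one disagreement
    return agreed, to_review


# Hypothetical annotations for four steel-sheet photos.
annotations = {
    "sheet_001": ["scratch", "scratch", "scratch"],
    "sheet_002": ["scratch", "dent", "scratch"],
    "sheet_003": ["ok", "ok", "ok"],
    "sheet_004": ["dent", "ok", "scratch"],
}

agreed, to_review = review_labels(annotations)
print("consistent:", agreed)
print("send back for review:", to_review)
```

Requiring full agreement is a deliberately strict choice. A majority vote would keep more examples, at the cost of letting some ambiguous ones through.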
The benefits of adopting a data-centric approach are not limited to computer vision; they also apply to other areas such as natural language processing (NLP) and tabular and time-series data.
Why is it important to switch from a model to a data-centric approach?
Data is extremely important in AI development, and adopting a strategy that prioritizes obtaining high-quality data is critical. After all, useful data is not just noisy, it is also hard to come by and extremely expensive to get. We should treat data for AI the way we would treat the materials for building a house: by caring about quality. Selecting the right model and hyper-parameters still matters for generalizable, performant results, but those choices only pay off when the models are trained on high-quality data. Clean, de-noised datasets therefore become the fundamental differentiator. Semi-supervised learning techniques can be highly beneficial for detecting and correcting inconsistencies, and synthetic data can be used to produce and simulate more events to aid with generalization issues.
Data is one of the most expensive assets today, because of the infrastructure involved, the number of people dedicated to it, and how rarely it is acquired under optimal conditions. Data quality must be maintained and improved at every stage of AI development, and each stage will, by definition, require its own frameworks and tools. Don't forget that this must also be delivered and measured on a continual basis, which makes MLOps a valuable ally in achieving a suitable and successful data-centric paradigm in AI solution development.
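As a rough illustration of detecting label inconsistencies automatically, the sketch below uses a small set of trusted (manually verified) examples to guess labels for the rest with a 1-nearest-neighbor rule and flags every example whose given label disagrees with the guess. This is a simple stand-in heuristic, not a full semi-supervised method, and the data, labels, and function names are assumptions made for the example.

```python
def nearest_label(trusted, x):
    """Label of the trusted example closest to x (1-nearest-neighbor)."""
    _, label = min(trusted, key=lambda item: abs(item[0] - x))
    return label


def flag_suspects(trusted, candidates):
    """Return (x, given_label, guessed_label) for every disagreement."""
    suspects = []
    for x, given in candidates:
        guess = nearest_label(trusted, x)
        if guess != given:
            suspects.append((x, given, guess))
    return suspects


# Small trusted set: one made-up feature value per example, plus its label.
trusted = [(1.0, "ok"), (2.0, "ok"), (8.0, "defect"), (9.0, "defect")]

# Larger pool whose labels have not been verified.
candidates = [(1.4, "ok"), (2.2, "defect"), (8.6, "defect"), (7.9, "ok")]

suspects = flag_suspects(trusted, candidates)
for x, given, guess in suspects:
    print(f"x={x}: labeled {given!r}, neighbors suggest {guess!r}")
```

A flagged example is only a candidate for review. The given label may well be right and the trusted set too small, so a human should make the final call.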
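Measuring data quality on a continual basis can start very small. The sketch below computes a few metrics for each incoming batch and compares them against thresholds, the kind of check that could run as one step of a pipeline. The record format, the metric choices, the threshold values, and the function names are all assumptions for illustration, not part of any specific MLOps tool.

```python
def data_quality_report(rows, label_key="label"):
    """Compute simple quality metrics for a batch of dict records."""
    rows_with_missing = sum(
        1 for row in rows if any(value is None for value in row.values())
    )

    seen, duplicate_rows = set(), 0
    for row in rows:
        # Order the fields by name so identical records compare equal.
        key = tuple(sorted(row.items(), key=lambda kv: kv[0]))
        if key in seen:
            duplicate_rows += 1
        seen.add(key)

    label_counts = {}
    for row in rows:
        label = row.get(label_key)
        label_counts[label] = label_counts.get(label, 0) + 1

    return {
        "rows": len(rows),
        "rows_with_missing": rows_with_missing,
        "duplicate_rows": duplicate_rows,
        "label_counts": label_counts,
    }


def check_thresholds(report, max_missing_ratio=0.1, max_duplicates=0):
    """Return a list of failures; an empty list means the batch passes."""
    failures = []
    rows = report["rows"]
    if rows and report["rows_with_missing"] / rows > max_missing_ratio:
        failures.append("too many rows with missing values")
    if report["duplicate_rows"] > max_duplicates:
        failures.append("duplicate rows found")
    return failures


# Hypothetical batch of records for the steel-sheet example.
batch = [
    {"thickness_mm": 2.0, "label": "ok"},
    {"thickness_mm": None, "label": "defect"},
    {"thickness_mm": 2.0, "label": "ok"},
    {"thickness_mm": 3.1, "label": "defect"},
]

report = data_quality_report(batch)
failures = check_thresholds(report)
print(report)
print(failures)
```

Tracking the same report for every batch over time is what turns a one-off cleanup into a measurement: a sudden jump in missing values or a shift in the label counts becomes visible before it reaches the model.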
References
Hajij, M., Zamzmi, G., Ramamurthy, K. N., & Saenz, A. G. (2021). Data-Centric AI Requires Rethinking Data Notion. arXiv preprint arXiv:2110.02491.
Zagatti, G. A., Ng, S. K., & Bressan, S. (2021). A Data Warehouse of Wi-Fi Sessions for Contact Tracing and Outbreak Investigation. In Transactions on Large-Scale Data- and Knowledge-Centered Systems XLVIII (pp. 85–104). Springer, Berlin, Heidelberg.
Rafi Karlansik (2021). Need for data-centric ML platforms. Available at: https://databricks.com/blog/2021/06/23/need-for-data-centric-ml-platforms.html
Fabiana Clemente (2019). From model-centric to data-centric. Available at: https://towardsdatascience.com/from-model-centric-to-data-centric-4beb8ef50475
Dario Radecic (2021). Data-centric vs Model-centric AI? The Answer is clear. Available at:
Published via Towards AI