Airflow is on the Cloud | ELT Pipeline Orchestration With Airflow & AWS
Author(s): Kaan Boke Ph.D. Originally published on Towards AI. Photo by James Wheeler: https://www.pexels.com/photo/symmetrical-photography-of-clouds-covered-blue-sky-1486974/ You will see the ELT pipeline with Airflow orchestration. You will learn to load data to the PostgreSQL database directly from AWS S3. You'll do everything with the …
Do Not Curse Your Machine Learning Models When They Are Not Performing Well in Real-time - Instead, Do This
Author(s): Suhas Maddali Originally published on Towards AI. While the performance of machine learning models can seem extremely good on the test data, failing to understand the chances of them not performing well on real-time data can cause a lot of loss …
Amazing SQL Queries for Data Science
Author(s): Ashbab khan Originally published on Towards AI. SQL is the essential language for developers, engineers, and data professionals. Intermediate knowledge of SQL gives you an edge in your data science career. Photo by Shubham Dhage on Unsplash So in …
YouTube Dislikes Prediction in Real-time - Working With a Combination of Data; A Practical Guide
Author(s): Nafiu Originally published on Towards AI. Hi everyone, this is a practical guide to a fascinating topic; today, we will discuss how you can work with a combination of mixed data. Well, we have all gone through it when we go …
How to Maximize ML Project Success with Efficient Scoping? | MLOps 5
Author(s): Akhil Theerthala Originally published on Towards AI. How to Maximize ML Project Success with Efficient Scoping? | MLOps 5 In our past articles of this series, we have seen many things. We started our journey by looking at the lifecycle of …
Diagnosing the Stubborn Mediocrity of the Western Bulldogs
Author(s): Ranganath Venkataraman Originally published on Towards AI. Using data to get insight into why my supported Australian Football League team perennially languishes and the corresponding lessons learned Photo by Tingey Injury Law Firm on Unsplash Note: this article reflects my …
From Detection to Correction: How to Keep Your Production Data Clean and Reliable
Author(s): Youssef Hosni Originally published on Towards AI. Table of Contents: In Production ML, data quality is everything. No matter how great your models or algorithms are, if the data you feed them is garbage, you'll get garbage results. But how can …
3 Efficient Ways to Filter a Pandas DataFrame Column by Substring
Author(s): Byron Dolon Originally published on Towards AI. How to quickly filter string columns in Pandas for Machine Learning pre-processing Used with permission from ohmintyartz The Pandas library is used extensively not only for crunching numbers but also for working with text …
From Synonyms to GPT-3: The Ultimate Guide to Text Augmentation for Improving Minority Class Labels in NLP
Author(s): Harshmeet Singh Chandhok Originally published on Towards AI. Image by vectorjuice on Freepik “In NLP, we often encounter problems related to class imbalance, where minority classes are underrepresented in the training data. Text augmentation techniques can help address this issue, improving …
US Department of Education's DataLab: A Data Scientist's Guide
Author(s): Adam Ross Nelson Originally published on Towards AI. An Insider's Guide to Navigating and Sharing Educational Insights As a data scientist, one of the fascinating aspects of our work is that it is, by nature, replicable. This characteristic promotes transparency and …
FineTuning Local Large Language Models on Your Data Using LangChain
Author(s): Serop Baghdadlian Originally published on Towards AI. Stop sending your private data through OpenAI API! Use local and secure LLMs like GPT4All-J from LangChain instead. Photo by Annie Spratt on Unsplash The recent introduction of ChatGPT and other large language models …
Simplifying MongoDB for Data Scientists: Essential Commands You Should Know
Author(s): Gaurav Nair Originally published on Towards AI. A Guide to NoSQL Fundamentals and MongoDB Commands for Beginners Image source: https://www.mongodb.com/brand-resources Table of Contents Introduction What is NoSQL? Limitations of RDBMS and the need for NoSQL SQL vs NoSQL MongoDB How do …
Three Insidious Data Fallacies to Recognise in the Age of AI
Author(s): John Adeojo Originally published on Towards AI. A Brief Examination of Data Fallacies and Their Influence on Decisions Image by Author: Generated with Midjourney In the age of AI, business leaders are becoming increasingly aware of the merits of a data-driven …
Data Lifecycle in Production: Defining and Collecting useful data.
Author(s): Akhil Theerthala Originally published on Towards AI. Photo by Mika Baumeister / Unsplash Recently, I have worked on an MLOps series, where I briefly discussed the different steps involved in the lifecycle of a Machine learning project. We started with the …
Box Plot, Violin Plot, Ridgeline Plot - Oh My
Author(s): Adam Ross Nelson Originally published on Towards AI. How, when, and why to use these lesser-appreciated plots I say move over mere histograms! Make room for the triumvirate of data visualization tools that often fly under the radar. Box plots, violin …