Demystifying Time Series Outliers: 3/4

Author(s): Andrea Ianni

Originally published on Towards AI.

Breaking Down the Series and Doing Some Cleanup

Finally, we’ve arrived at the long-anticipated third episode. Now that we have passed the halfway mark of our saga, with a hint of sentimentality, let’s revisit where we left off.

In the first episode, the thunder of the pandemic introduced us to our blond hero. We tracked the evolution of his rise amid a downpour of tweets. We noticed a discord in the data and thus sought to ‘straighten it out’ by taking the simplest and most direct path possible (it always starts this way).

Rovella and the Rebel Data

import pandas as pd
import numpy as np

link = ''
tweets = pd.read_csv(link, sep=';', decimal=',', index_col='date', parse_dates=['date'])
tweets_series = tweets['target']

In the second episode, we wielded the electric saw and pruned our outliers, aiming to purify the dataset. Unfortunately, we encountered an unpleasant surprise: things didn’t pan out as anticipated. We stumbled into the quintessential trap for those hunting outliers, inadvertently discarding crucial data along with the unwanted noise.

Handling Outliers and the Unpleasant Surprise
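To make that trap concrete, here is a minimal sketch on synthetic data. Everything in it is an assumption made for illustration: the weekly series, the size of the recurring peaks, and the global z-score rule with a threshold of 2 are stand-ins, not the data or the method used in the second episode.

```python
import numpy as np
import pandas as pd

# Hypothetical weekly series (NOT the article's tweet data):
# a rising trend plus a legitimate peak every 4th week that grows with the trend.
dates = pd.date_range("2020-01-05", periods=104, freq="W")
t = np.arange(104)
values = 10 + 0.5 * t
values[t % 4 == 0] *= 3.0  # recurring peaks: real signal, not noise
series = pd.Series(values, index=dates)

# Naive global rule (an assumed stand-in): flag points more than
# 2 standard deviations away from the overall mean.
z = (series - series.mean()) / series.std()
flagged = series[z.abs() > 2]

print(flagged)
```

In this toy setup there is no noise at all, yet the rule still flags points: they are the most recent recurring peaks, which stand out only because the trend has lifted them. Pruning them would discard exactly the kind of structure we care about.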

Let’s set aside the modeling anxiety for a moment and look at the series: it’s clear there is a trend (Nicolò’s fame has grown over the years), and it’s equally clear there is an oscillatory behavior with a pattern that, if visible to the naked eye,… Read the full blog for free on Medium.
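As a rough illustration of what separating a trend from an oscillating pattern can look like, here is a sketch of a classical additive decomposition written with pandas only. The synthetic daily series, the 7-day period, and the choice of a centered moving average are all assumptions for the example; the excerpt above does not say which decomposition the article goes on to use.

```python
import numpy as np
import pandas as pd

# Hypothetical daily series standing in for the tweet counts:
# linear trend + weekly cycle + a little noise.
rng = np.random.default_rng(0)
n, period = 140, 7
idx = pd.date_range("2021-01-01", periods=n, freq="D")
t = np.arange(n)
weekly = np.array([3.0, 1.0, 0.0, -1.0, -2.0, -1.0, 0.0])  # sums to zero
series = pd.Series(
    20 + 0.2 * t + weekly[t % period] + rng.normal(0, 0.1, n), index=idx
)

# 1) Trend: centered moving average over one full period.
trend = series.rolling(window=period, center=True).mean()

# 2) Seasonal: average the detrended values by position in the cycle,
#    then center the profile so it sums to zero.
detrended = series - trend
seasonal_profile = detrended.groupby(t % period).mean()
seasonal_profile -= seasonal_profile.mean()
seasonal = pd.Series(seasonal_profile.to_numpy()[t % period], index=idx)

# 3) Residual: what remains once trend and seasonality are removed.
#    The first and last 3 values are NaN because the centered window is incomplete there.
residual = series - trend - seasonal

print(seasonal_profile.round(2))
```

Once the series is split this way, outlier hunting can be done on the residual rather than on the raw values, so that trend and recurring peaks are no longer mistaken for anomalies.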


Published via Towards AI
