Demystifying Time Series Outliers: 3/4
Author(s): Andrea Ianni
Originally published on Towards AI.
Breaking Down the Series and Doing Some Cleanup
Finally, we've arrived at the long-anticipated third episode. Having passed, with a hint of sentiment, the halfway mark of our saga, let's revisit where we last stood.
In the first episode, the thunder of the pandemic introduced us to our blond hero. We tracked the evolution of his rise amid a downpour of tweets. We noticed a discord in the data and thus sought to "straighten it out" by taking the simplest and most direct path possible (it always starts this way).
Rovella and the Rebel Data
pub.towardsai.net
import pandas as pd
import numpy as np
link = 'https://raw.githubusercontent.com/ianni-phd/Datasets/main/rovella_tweets.csv'
tweets = pd.read_csv(link, sep=';', decimal=',', index_col='date', parse_dates=['date'])
tweets_series = tweets['target']
In the second episode, we wielded the electric saw and pruned our outliers, aiming to purify the dataset. Unfortunately, we encountered an unpleasant surprise: things didn't pan out as anticipated. We stumbled into the quintessential trap for those hunting outliers, inadvertently discarding crucial data along with the unwanted noise.
Handling Outliers and the Unpleasant Surprise
medium.com
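To make that trap concrete, here is a minimal sketch on synthetic data (not the Rovella tweets). The threshold rules, the window of 15, and every variable name are assumptions made for illustration, not the method used in the second episode. It shows how a global z-score rule on a trending series tends to flag the legitimate extremes of the trend while missing the real spikes, whereas scoring deviations from a local baseline catches the spikes instead.

```python
import numpy as np
import pandas as pd

# Synthetic series (an assumption, NOT the tweet data): rising trend plus noise.
rng = np.random.default_rng(0)
idx = pd.date_range("2020-01-01", periods=200, freq="D")
series = pd.Series(np.linspace(0, 100, 200) + rng.normal(0, 2, 200), index=idx)

# Inject two genuine anomalies early in the series.
series.iloc[20] += 30
series.iloc[40] += 30

# Naive rule: distance from the global mean, in global standard deviations.
z_global = (series - series.mean()) / series.std()
flagged_global = z_global.abs() > 1.5

# Local rule: distance from a rolling median, scaled by a robust spread (MAD).
baseline = series.rolling(window=15, center=True, min_periods=1).median()
resid = series - baseline
robust_scale = 1.4826 * (resid - resid.median()).abs().median()
flagged_local = (resid.abs() / robust_scale) > 5

print("global rule flags:", int(flagged_global.sum()), "points")
print("local rule flags: ", int(flagged_local.sum()), "points")
print("injected spikes caught by global rule:", bool(flagged_global.iloc[[20, 40]].any()))
print("injected spikes caught by local rule: ", bool(flagged_local.iloc[[20, 40]].all()))
```

The global rule mostly "discovers" the start and end of the trend, which is exactly the crucial data we would not want to throw away.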
Let's set aside the modeling anxiety for a moment and look at the series: it's clear there is a trend (Nicolò's fame has grown over the years), and it's equally clear there is an oscillatory behavior with a pattern that, if visible to the naked eye,… Read the full blog for free on Medium.
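As a rough illustration of what "breaking down the series" can look like, here is a minimal sketch of a classical additive decomposition (trend + seasonal + residual) written with plain pandas. It runs on a synthetic series, not the tweet data. The weekly period of 7, the moving-average trend, and all variable names are assumptions for the example, and the full article may well use a different technique or library.

```python
import numpy as np
import pandas as pd

# Synthetic daily series (an assumption, NOT the tweet data):
# linear trend + weekly cycle + noise.
rng = np.random.default_rng(42)
period = 7                      # assumed cycle length
n = period * 30
idx = pd.date_range("2021-01-01", periods=n, freq="D")
t = np.arange(n)
series = pd.Series(
    0.2 * t + 5 * np.sin(2 * np.pi * t / period) + rng.normal(0, 0.5, n),
    index=idx,
)

# 1) Trend: centered moving average over one full period, so the cycle averages out.
trend = series.rolling(window=period, center=True).mean()

# 2) Seasonal: mean detrended value at each position in the cycle, centered on zero.
detrended = series - trend
position = pd.Series(t % period, index=idx)
seasonal_profile = detrended.groupby(position).mean()
seasonal_profile -= seasonal_profile.mean()
seasonal = position.map(seasonal_profile)

# 3) Residual: whatever trend and seasonality do not explain.
residual = series - trend - seasonal

print(seasonal_profile.round(2))
print("residual std:", round(residual.dropna().std(), 3))
```

Once trend and seasonality are separated out, the residual is a more natural place to look for anomalies than the raw series, since a point is then judged against what was expected at that moment rather than against the global average.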