Anomaly Detection with MIDAS

Last Updated on July 20, 2023 by Editorial Team

Author(s): Nunzio Logallo

Originally published on Towards AI.

How can we detect anomalies more accurately and faster?

Anomaly detection in graphs is a critical problem with applications such as intrusion detection, fake ratings, and financial fraud. To limit the effect of malicious activity as early as possible, we need to detect anomalies in real time: as each edge arrives, we must decide whether it is anomalous. Existing methods process edge streams in an online manner but can miss a large amount of suspicious activity. In contrast, MIDAS detects microcluster anomalies (suddenly arriving groups of suspiciously similar edges) in edge streams, using constant time and memory per edge and providing theoretical bounds on the false positive probability.

MIDAS is a project made by Siddharth Bhatia, Bryan Hooi, Minji Yoon, Kijung Shin, and Christos Faloutsos.

The main contributions of MIDAS are:
1. Streaming Microcluster Detection: a novel streaming approach for detecting microcluster anomalies;
2. Theoretical Guarantee: a bound on the false positive probability of MIDAS;
3. Effectiveness: in experiments, MIDAS outperforms the baseline approaches by 42%-48% in accuracy (in terms of AUC) and processes the data 162–644 times faster.

Compared with previous approaches for detecting anomalies in edge streams, MIDAS adds two features, microcluster detection and a guarantee on the false positive probability, while keeping the capabilities of the earlier approaches.

Comparison of relevant edge stream anomaly detection approaches — Source: the MIDAS repository

Algorithm

Two algorithms are proposed: MIDAS and MIDAS-R. Here is an overview:

1. Streaming Hypothesis Testing Approach: the core of MIDAS, which uses streaming data structures within a hypothesis-testing framework so that guarantees on the false positive probability can be obtained;
2. Detection and Guarantees: the decision procedure for determining whether an incoming edge is anomalous, together with the resulting guarantees on the false positive probability;
3. Incorporating Relations: this is where MIDAS-R comes into play, incorporating temporal and spatial relationships between edges.
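To make the streaming hypothesis-testing idea more concrete, here is a minimal Python sketch of how an edge could be scored. It assumes the chi-squared statistic described in the MIDAS paper, score = (a - s/t)^2 * t^2 / (s * (t - 1)), where a is the count of the edge in the current time tick, s is its total count so far, and t is the current tick. The class names, the hash scheme, and the default sketch sizes are our own illustrative choices, not the repository's API. The sketch also assumes integer node ids and non-decreasing integer timestamps starting at 1.

```python
import random


class CountMinSketch:
    """Fixed-size approximate counter: memory does not grow with the stream."""

    PRIME = 2_147_483_647  # Mersenne prime used by the (assumed) hash family

    def __init__(self, rows=2, cols=1024, seed=0):
        rng = random.Random(seed)
        self.cols = cols
        # One (a, b) pair per row defines one hash function.
        self.params = [(rng.randrange(1, self.PRIME), rng.randrange(self.PRIME))
                       for _ in range(rows)]
        self.table = [[0.0] * cols for _ in range(rows)]

    def _slots(self, src, dst):
        key = src * 1_000_003 + dst  # combine the two node ids into one key
        for row, (a, b) in enumerate(self.params):
            yield row, ((a * key + b) % self.PRIME) % self.cols

    def add(self, src, dst):
        for row, col in self._slots(src, dst):
            self.table[row][col] += 1.0

    def query(self, src, dst):
        # Collisions can only inflate a cell, so the minimum is the best estimate.
        return min(self.table[row][col] for row, col in self._slots(src, dst))

    def clear(self):
        for row in self.table:
            for i in range(len(row)):
                row[i] = 0.0


class MidasSketch:
    """Illustrative edge scorer; names and defaults are assumptions."""

    def __init__(self, rows=2, cols=1024):
        self.current = CountMinSketch(rows, cols, seed=1)  # counts in this tick
        self.total = CountMinSketch(rows, cols, seed=1)    # counts since the start
        self.tick = 1

    def score(self, src, dst, t):
        if t > self.tick:  # a new time tick begins: forget the per-tick counts
            self.current.clear()
            self.tick = t
        self.current.add(src, dst)
        self.total.add(src, dst)
        a = self.current.query(src, dst)
        s = self.total.query(src, dst)
        if s == 0 or t <= 1:
            return 0.0
        # Chi-squared statistic: current-tick count vs. the historical mean s / t.
        return (a - s / t) ** 2 * t ** 2 / (s * (t - 1))


midas = MidasSketch()
# A steady edge (1 -> 2) appears once per tick for 10 ticks.
steady = [midas.score(1, 2, t) for t in range(1, 11)]
# At tick 11 the same edge suddenly arrives 50 times.
burst = [midas.score(1, 2, 11) for _ in range(50)]
print(max(steady), burst[-1])
```

In this toy stream the steady edge never scores above zero, while the score climbs quickly during the burst, which is the microcluster behavior MIDAS is designed to flag. Each update touches a fixed number of sketch cells, so time and memory per edge stay constant regardless of stream length.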

If you want to learn more about the algorithm, please visit the MIDAS repository.

Accuracy

ROC for DARPA dataset — Source: the MIDAS repository

In the graph above, which plots the ROC curves for MIDAS, MIDAS-R, and SedanSpot (the baseline anomaly detection approach), we can see that MIDAS is 42% more accurate than the baseline and also runs significantly faster (644×).
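For readers who want to reproduce this kind of comparison on their own scores, the area under the ROC curve can be computed directly from per-edge anomaly scores and ground-truth labels. The sketch below is a simple pairwise version written for clarity, not speed. The function name and the toy labels and scores are hypothetical and are not taken from the DARPA dataset.

```python
def roc_auc(labels, scores):
    """Probability that a random anomalous edge outscores a random normal edge."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one anomalous and one normal edge")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # anomaly ranked above the normal edge
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 marks an anomalous edge, 0 a normal one.
labels = [0, 0, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.2, 0.9]
auc = roc_auc(labels, scores)
print(auc)
```

This pairwise loop is quadratic in the number of edges, so for large streams a rank-based implementation from an established library would be the practical choice.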

Average Precision Score vs. running time of MIDAS and MIDAS-R — Source: the MIDAS repository

In the graph above, which plots the average precision score against running time, we see that MIDAS is 27% more precise than the baseline, while MIDAS-R is 29% more precise and achieves the highest average precision score. Both MIDAS and MIDAS-R therefore outperform the other anomaly detection approaches for edge streams.

Scalability

The graph above shows the scalability of MIDAS and MIDAS-R. It relates processing time to the number of edges and confirms that both algorithms scale well as the stream grows. Both MIDAS and MIDAS-R allow real-time anomaly detection, processing 4M edges within 0.5s.

Real-World Effectiveness

Correspondence between anomalies detected by MIDAS and major security-related events in TwitterSecurity — Source: the MIDAS repository

Finally, we compare MIDAS, MIDAS-R, and SedanSpot by their anomaly scores on a real-world example, the TwitterSecurity dataset. The graph above plots anomaly scores by day from May to September 2014. For MIDAS, the peaks in anomaly score coincide with significant events in the TwitterSecurity timeline. In contrast, SedanSpot simply outputs many high anomaly scores, which leads to a low AUC.

Other use cases

Consider an application of MIDAS in the manufacturing sector, where many machines are interconnected as a graph. If these machines behave abnormally, the result can be cost overruns from excess power consumption and wasted raw materials. An anomaly detection algorithm like MIDAS can detect such behavior in real time, reducing or preventing losses. There are many more applications: for example, detecting fake accounts on social networks like Twitter and Facebook, where people or bots create false identities. MIDAS can also help detect fake news by deciding whether an article is genuine or just clickbait.

Conclusion and References

MIDAS and MIDAS-R make anomaly detection in edge streams faster and more accurate while maintaining high scalability and real-world effectiveness.

If you want to learn more about MIDAS, check the MIDAS repository where you can find examples and a getting started guide. If you have any questions, please don’t hesitate to contact Siddharth Bhatia.

Siddharth Bhatia, Bryan Hooi, Minji Yoon, Kijung Shin and Christos Faloutsos. “MIDAS: Microcluster-Based Detector of Anomalies in Edge Streams.” AAAI Conference on Artificial Intelligence (AAAI), 2020. https://arxiv.org/abs/1911.04464
