Anomaly Detection with MIDAS

Last Updated on July 20, 2023 by Editorial Team

Author(s): Nunzio Logallo

Originally published on Towards AI.

How can we detect anomalies more accurately and faster?

Anomaly detection in graphs is a critical problem: it means finding strange behaviors in systems, such as network intrusions, fake ratings, and financial fraud. To minimize the effect of malicious activity, we need to detect anomalies in real time, scoring each incoming edge as it arrives and deciding whether it is anomalous. Existing methods that process edge streams in an online manner can miss a large amount of suspicious activity. In contrast, MIDAS detects microcluster anomalies (suddenly arriving groups of suspiciously similar edges) in edge streams using constant time and memory, and provides theoretical bounds on the false positive probability.

MIDAS was developed by Siddharth Bhatia, Bryan Hooi, Minji Yoon, Kijung Shin, and Christos Faloutsos.

The main contributions of MIDAS are:
1. Streaming Microcluster Detection: a novel streaming approach for detecting microcluster anomalies;
2. Theoretical Guarantees: a bound on the false positive probability of MIDAS;
3. Effectiveness: experimental results show that MIDAS outperforms the baseline approaches by 42%-48% in accuracy (measured by AUC) and processes the data 162–644 times faster.

Compared to previous approaches for detecting anomalies in edge streams, MIDAS adds microcluster detection and a guarantee on the false positive probability, while keeping the properties that the earlier approaches already offer.

Comparison of relevant edge stream anomaly detection approaches — Source: the MIDAS repository

Algorithm

Two algorithms are proposed: MIDAS and MIDAS-R.
Here is an overview:

  1. Streaming Hypothesis Testing Approach: the basis of MIDAS, which uses streaming data structures within a hypothesis testing framework so that guarantees on the false positive probability can be obtained;
  2. Detection and Guarantees: the decision procedure for determining whether a point is anomalous, together with the resulting guarantees on the false positive probability;
  3. Incorporating Relations: this is where MIDAS-R comes into play, incorporating temporal and spatial relationships between edges.
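
The first step can be made more concrete with a small sketch. The code below is a minimal illustration, not the authors' implementation: it keeps two Count-Min Sketches per edge (a count over all ticks so far and a count for the current tick) and applies the chi-squared style score described in the MIDAS paper, (a - s/t)² · t² / (s · (t - 1)). The class names `CountMinSketch` and `MidasSketch`, the salted use of Python's built-in `hash`, and the table sizes are assumptions made for this example.

```python
import random


class CountMinSketch:
    """Tiny Count-Min Sketch: approximate counts in fixed memory."""

    def __init__(self, rows=2, cols=1024, seed=0):
        rng = random.Random(seed)
        self.cols = cols
        # One random salt per row stands in for an independent hash function.
        self.salts = [rng.getrandbits(32) for _ in range(rows)]
        self.table = [[0.0] * cols for _ in range(rows)]

    def _cells(self, key):
        return [(r, hash((salt, key)) % self.cols)
                for r, salt in enumerate(self.salts)]

    def add(self, key, amount=1.0):
        for r, c in self._cells(key):
            self.table[r][c] += amount

    def query(self, key):
        # The minimum over rows is the least overestimated count.
        return min(self.table[r][c] for r, c in self._cells(key))

    def clear(self):
        for row in self.table:
            for c in range(self.cols):
                row[c] = 0.0


class MidasSketch:
    """Illustrative MIDAS-style scorer (hypothetical class, basic variant only)."""

    def __init__(self, rows=2, cols=1024):
        # Same seed, so both sketches map an edge to the same cells.
        self.total = CountMinSketch(rows, cols, seed=1)    # s: count over all ticks
        self.current = CountMinSketch(rows, cols, seed=1)  # a: count in current tick
        self.tick = 1

    def score(self, u, v, t):
        """Score edge (u, v) arriving at integer tick t (ticks start at 1)."""
        if t > self.tick:
            self.current.clear()  # a new tick starts with empty current counts
            self.tick = t
        self.current.add((u, v))
        self.total.add((u, v))
        a = self.current.query((u, v))
        s = self.total.query((u, v))
        if s == 0 or t <= 1:
            return 0.0  # no history yet to compare against
        # Large when the current-tick count deviates from the mean rate s / t.
        return (a - s / t) ** 2 * t ** 2 / (s * (t - 1))


detector = MidasSketch()
# One edge per tick for ticks 1..9, then a burst of 50 identical edges at tick 10.
steady = [detector.score(1, 2, t) for t in range(1, 10)]
burst = [detector.score(1, 2, 10) for _ in range(50)]
print(max(steady), burst[-1])
```

Each edge costs a fixed number of hash lookups and the tables never grow, which is what the constant time and memory claim refers to. The sketch covers only the basic MIDAS score; the temporal and spatial relations used by MIDAS-R are not modeled here.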

If you want to learn more about the algorithm, please visit the MIDAS repository.

Accuracy

ROC for DARPA dataset — Source: the MIDAS repository

In the graph above, which plots the ROC curves for MIDAS, MIDAS-R, and SedanSpot (the baseline anomaly detection approach), we can see that MIDAS is 42% more accurate than the baseline and also runs significantly faster (644×).
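
For readers unfamiliar with the metric behind a ROC curve, here is a minimal sketch of how the area under it (AUC) can be computed from per-edge anomaly scores and ground-truth labels. It uses the pairwise interpretation of AUC: the probability that a randomly chosen anomalous edge receives a higher score than a randomly chosen normal one. The function name `roc_auc` and the sample data are made up for illustration and are not taken from the DARPA dataset.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (anomaly, normal) pairs ranked correctly.

    labels: 1 for anomalous edges, 0 for normal edges.
    scores: anomaly scores, higher means more suspicious.
    Ties between an anomaly and a normal edge count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one anomalous and one normal edge")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores for four edges, two of them labeled anomalous.
example_labels = [0, 0, 1, 1]
example_scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(example_labels, example_scores))  # 0.75
```

The nested loop is quadratic and only meant for small examples; a rank-based implementation would be the usual choice for streams with millions of edges.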

Average Precision Score vs. running time of MIDAS and MIDAS-R — Source: the MIDAS repository

In the graph above, which plots average precision score against running time, we see that MIDAS is 27% more precise than the baseline, while MIDAS-R is 29% more precise and achieves the highest average precision score. Both MIDAS and MIDAS-R outperform the other edge stream anomaly detection approaches shown.
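
As a reminder of what the average precision score measures, the sketch below ranks edges by anomaly score and averages the precision obtained at each correctly flagged anomaly. It is a simplified version that does not treat tied scores specially; the function name `average_precision` and the sample data are assumptions made for this example.

```python
def average_precision(labels, scores):
    """Mean of precision@k over the ranks k where an anomaly is found.

    labels: 1 for anomalous edges, 0 for normal edges.
    scores: anomaly scores, higher means more suspicious.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one anomalous edge")
    # Rank edges from most to least suspicious.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank  # precision among the top `rank` edges
    return precision_sum / total_pos


# Hypothetical scores for four edges, two of them labeled anomalous.
ap = average_precision([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(round(ap, 4))  # 0.8333
```

A detector that ranks every anomaly above every normal edge scores 1.0; false alarms near the top of the ranking pull the value down.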

Scalability

Scalability of MIDAS and MIDAS-R compared to the number of edges — Source: the MIDAS repository

The graph above shows the scalability of MIDAS and MIDAS-R. It confirms that both algorithms keep scaling as the number of edges increases, which is consistent with their constant processing time per edge. Both MIDAS and MIDAS-R allow real-time anomaly detection, processing 4M edges within 0.5s.

Real-World Effectiveness

Correspondence between anomalies detected by MIDAS and major security-related events in TwitterSecurity — Source: the MIDAS repository

Finally, we compare MIDAS, MIDAS-R, and SedanSpot by their anomaly scores on a real-world dataset, TwitterSecurity. The graph above plots anomaly scores by day, from May to September 2014. For MIDAS, the peaks in anomaly score coincide with significant events in the TwitterSecurity timeline. In contrast, SedanSpot outputs many high anomaly scores throughout, which leads to a low AUC.

Other use cases

Consider one application of MIDAS in the manufacturing sector, where many working machines are interconnected as a graph. If these machines behave strangely, the result can be cost overruns in power consumption and wasted raw materials. An anomaly detection algorithm like MIDAS can detect this behavior in real time, reducing or preventing losses. There are many more applications of MIDAS, for example as a detector of fake accounts on social networks like Twitter and Facebook, where people or bots create false identities. MIDAS can also help in detecting fake news, deciding whether an article is genuine or just clickbait.

Conclusion and References

MIDAS and MIDAS-R make the detection of anomalies in edge streams faster and more accurate, while maintaining high scalability and real-world effectiveness.

If you want to learn more about MIDAS, check the MIDAS repository, where you can find examples and a getting started guide. If you have any questions, please don’t hesitate to contact Siddharth Bhatia.

Siddharth Bhatia, Bryan Hooi, Minji Yoon, Kijung Shin and Christos Faloutsos. “MIDAS: Microcluster-Based Detector of Anomalies in Edge Streams.” AAAI Conference on Artificial Intelligence (AAAI), 2020. https://arxiv.org/abs/1911.04464
