Statistics > Computation
[Submitted on 2 Jan 2025]
Title:An Efficient Outlier Detection Algorithm for Data Streaming
View PDF HTML (experimental)Abstract:The nature of modern data is increasingly real-time, making outlier detection crucial in any data-related field, such as finance for fraud detection and healthcare for monitoring patient vitals. Traditional outlier detection methods, such as the Local Outlier Factor (LOF) algorithm, struggle with real-time data due to the need for extensive recalculations with each new data point, limiting their application in real-time environments. While the Incremental LOF (ILOF) algorithm has been developed to tackle the challenges of online anomaly detection, it remains computationally expensive when processing large streams of data points, and its detection performance may degrade after a certain threshold of points have streamed in. In this paper, we propose a novel approach to enhance the efficiency of LOF algorithms for online anomaly detection, named the Efficient Incremental LOF (EILOF) algorithm. The EILOF algorithm only computes the LOF scores of new points without altering the LOF scores of existing data points. Although exact LOF scores have not yet been computed for the existing points in the new algorithm, datasets often contain noise, and minor deviations in LOF score calculations do not necessarily degrade detection performance. In fact, such deviations can sometimes enhance outlier detection. We systematically tested this approach on both simulated and real-world datasets, demonstrating that EILOF outperforms ILOF as the volume of streaming data increases across various scenarios. The EILOF algorithm not only significantly reduces computational costs, but also systematically improves detection accuracy when the number of additional points increases compared to the ILOF algorithm.
Current browse context:
stat.CO
References & Citations
Bibliographic and Citation Tools
Bibliographic Explorer (What is the Explorer?)
Connected Papers (What is Connected Papers?)
Litmaps (What is Litmaps?)
scite Smart Citations (What are Smart Citations?)
Code, Data and Media Associated with this Article
alphaXiv (What is alphaXiv?)
CatalyzeX Code Finder for Papers (What is CatalyzeX?)
DagsHub (What is DagsHub?)
Gotit.pub (What is GotitPub?)
Hugging Face (What is Huggingface?)
Papers with Code (What is Papers with Code?)
ScienceCast (What is ScienceCast?)
Demos
Recommenders and Search Tools
Influence Flower (What are Influence Flowers?)
CORE Recommender (What is CORE?)
arXivLabs: experimental projects with community collaborators
arXivLabs is a framework that allows collaborators to develop and share new arXiv features directly on our website.
Both individuals and organizations that work with arXivLabs have embraced and accepted our values of openness, community, excellence, and user data privacy. arXiv is committed to these values and only works with partners that adhere to them.
Have an idea for a project that will add value for arXiv's community? Learn more about arXivLabs.