Ensemble anomaly detection from multi-resolution trajectory features

Shin Ando, Theerasak Thanomphongphan, Yoichi Seki, Einoshin Suzuki

Research output: Contribution to journalArticle

5 Citations (Scopus)

Abstract

The numerical, sequential observation of behaviors, such as trajectories, have become an important subject for data mining and knowledge discovery research. Processing the raw observation into representative features of the behaviors involves an implicit choice of time-scale and resolution, which critically affect the final output of the mining techniques. The choice is associated with the parameters of data-processing, e.g., smoothing and segmentation, which unintuitively yet strongly influence the intrinsic structure of the numerical data. Data mining techniques generally require users to provide an appropriately processed input, but selecting a resolution is an arduous task that may require an expensive, manual examination of outputs between different settings. In this paper, we propose a novel ensemble framework for aggregating outcomes in different settings of scale and resolution parameters for an anomaly detection task. Such a task is difficult for existing ensemble approaches based on weighted combination because: (a) evaluating and weighing an output requires training samples of anomalies which are generally unavailable, (b) the detectability of anomalies can depend on the resolution, i.e., the distinction from normal instances may only be apparent within a small, selective range of parameters. In the proposed framework, predictions based on different resolutions are aggregated to construct meta-feature representations of the behavior instances. The meta-features provide the discriminative information for conducting a clustering-based anomaly detection. In the proposed framework, two interrelated tasks of the behavior analysis: processing the numerical data and discovering anomalous patterns, are addressed jointly, providing an intuitive alternative for a knowledge-intensive parameter selection. We also design an efficient clustering-based anomaly detection algorithm which reduces the computational burden of mining at multiple resolutions. We conduct an empirical study of the proposed framework using real-world trajectory data. It shows that the proposed framework achieves a significant improvement over the conventional ensemble approach.

Original languageEnglish
Pages (from-to)39-83
Number of pages45
JournalData Mining and Knowledge Discovery
Volume29
Issue number1
DOIs
Publication statusPublished - Jan 1 2013

Fingerprint

Data mining
Trajectories
Weighing
Processing

All Science Journal Classification (ASJC) codes

  • Information Systems
  • Computer Science Applications
  • Computer Networks and Communications

Cite this

Ensemble anomaly detection from multi-resolution trajectory features. / Ando, Shin; Thanomphongphan, Theerasak; Seki, Yoichi; Suzuki, Einoshin.

In: Data Mining and Knowledge Discovery, Vol. 29, No. 1, 01.01.2013, p. 39-83.

Research output: Contribution to journalArticle

Ando, Shin ; Thanomphongphan, Theerasak ; Seki, Yoichi ; Suzuki, Einoshin. / Ensemble anomaly detection from multi-resolution trajectory features. In: Data Mining and Knowledge Discovery. 2013 ; Vol. 29, No. 1. pp. 39-83.
@article{3cf1a92c02714fbbaaad978741654d05,
title = "Ensemble anomaly detection from multi-resolution trajectory features",
abstract = "The numerical, sequential observation of behaviors, such as trajectories, have become an important subject for data mining and knowledge discovery research. Processing the raw observation into representative features of the behaviors involves an implicit choice of time-scale and resolution, which critically affect the final output of the mining techniques. The choice is associated with the parameters of data-processing, e.g., smoothing and segmentation, which unintuitively yet strongly influence the intrinsic structure of the numerical data. Data mining techniques generally require users to provide an appropriately processed input, but selecting a resolution is an arduous task that may require an expensive, manual examination of outputs between different settings. In this paper, we propose a novel ensemble framework for aggregating outcomes in different settings of scale and resolution parameters for an anomaly detection task. Such a task is difficult for existing ensemble approaches based on weighted combination because: (a) evaluating and weighing an output requires training samples of anomalies which are generally unavailable, (b) the detectability of anomalies can depend on the resolution, i.e., the distinction from normal instances may only be apparent within a small, selective range of parameters. In the proposed framework, predictions based on different resolutions are aggregated to construct meta-feature representations of the behavior instances. The meta-features provide the discriminative information for conducting a clustering-based anomaly detection. In the proposed framework, two interrelated tasks of the behavior analysis: processing the numerical data and discovering anomalous patterns, are addressed jointly, providing an intuitive alternative for a knowledge-intensive parameter selection. We also design an efficient clustering-based anomaly detection algorithm which reduces the computational burden of mining at multiple resolutions. We conduct an empirical study of the proposed framework using real-world trajectory data. It shows that the proposed framework achieves a significant improvement over the conventional ensemble approach.",
author = "Shin Ando and Theerasak Thanomphongphan and Yoichi Seki and Einoshin Suzuki",
year = "2013",
month = "1",
day = "1",
doi = "10.1007/s10618-013-0334-x",
language = "English",
volume = "29",
pages = "39--83",
journal = "Data Mining and Knowledge Discovery",
issn = "1384-5810",
publisher = "Springer Netherlands",
number = "1",

}

TY - JOUR

T1 - Ensemble anomaly detection from multi-resolution trajectory features

AU - Ando, Shin

AU - Thanomphongphan, Theerasak

AU - Seki, Yoichi

AU - Suzuki, Einoshin

PY - 2013/1/1

Y1 - 2013/1/1

N2 - The numerical, sequential observation of behaviors, such as trajectories, have become an important subject for data mining and knowledge discovery research. Processing the raw observation into representative features of the behaviors involves an implicit choice of time-scale and resolution, which critically affect the final output of the mining techniques. The choice is associated with the parameters of data-processing, e.g., smoothing and segmentation, which unintuitively yet strongly influence the intrinsic structure of the numerical data. Data mining techniques generally require users to provide an appropriately processed input, but selecting a resolution is an arduous task that may require an expensive, manual examination of outputs between different settings. In this paper, we propose a novel ensemble framework for aggregating outcomes in different settings of scale and resolution parameters for an anomaly detection task. Such a task is difficult for existing ensemble approaches based on weighted combination because: (a) evaluating and weighing an output requires training samples of anomalies which are generally unavailable, (b) the detectability of anomalies can depend on the resolution, i.e., the distinction from normal instances may only be apparent within a small, selective range of parameters. In the proposed framework, predictions based on different resolutions are aggregated to construct meta-feature representations of the behavior instances. The meta-features provide the discriminative information for conducting a clustering-based anomaly detection. In the proposed framework, two interrelated tasks of the behavior analysis: processing the numerical data and discovering anomalous patterns, are addressed jointly, providing an intuitive alternative for a knowledge-intensive parameter selection. We also design an efficient clustering-based anomaly detection algorithm which reduces the computational burden of mining at multiple resolutions. We conduct an empirical study of the proposed framework using real-world trajectory data. It shows that the proposed framework achieves a significant improvement over the conventional ensemble approach.

AB - The numerical, sequential observation of behaviors, such as trajectories, have become an important subject for data mining and knowledge discovery research. Processing the raw observation into representative features of the behaviors involves an implicit choice of time-scale and resolution, which critically affect the final output of the mining techniques. The choice is associated with the parameters of data-processing, e.g., smoothing and segmentation, which unintuitively yet strongly influence the intrinsic structure of the numerical data. Data mining techniques generally require users to provide an appropriately processed input, but selecting a resolution is an arduous task that may require an expensive, manual examination of outputs between different settings. In this paper, we propose a novel ensemble framework for aggregating outcomes in different settings of scale and resolution parameters for an anomaly detection task. Such a task is difficult for existing ensemble approaches based on weighted combination because: (a) evaluating and weighing an output requires training samples of anomalies which are generally unavailable, (b) the detectability of anomalies can depend on the resolution, i.e., the distinction from normal instances may only be apparent within a small, selective range of parameters. In the proposed framework, predictions based on different resolutions are aggregated to construct meta-feature representations of the behavior instances. The meta-features provide the discriminative information for conducting a clustering-based anomaly detection. In the proposed framework, two interrelated tasks of the behavior analysis: processing the numerical data and discovering anomalous patterns, are addressed jointly, providing an intuitive alternative for a knowledge-intensive parameter selection. We also design an efficient clustering-based anomaly detection algorithm which reduces the computational burden of mining at multiple resolutions. We conduct an empirical study of the proposed framework using real-world trajectory data. It shows that the proposed framework achieves a significant improvement over the conventional ensemble approach.

UR - http://www.scopus.com/inward/record.url?scp=84921067372&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84921067372&partnerID=8YFLogxK

U2 - 10.1007/s10618-013-0334-x

DO - 10.1007/s10618-013-0334-x

M3 - Article

VL - 29

SP - 39

EP - 83

JO - Data Mining and Knowledge Discovery

JF - Data Mining and Knowledge Discovery

SN - 1384-5810

IS - 1

ER -