Unsupervised spam detection based on string alienness measures

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

13 Citations (Scopus)

Abstract

We propose an unsupervised method for detecting spam documents from a given set of documents, based on equivalence relations on strings. We give three measures for quantifying the alienness (i.e. how different they are from others) of substrings within the documents. A document is then classified as spam if it contains a substring that is in an equivalence class with a high degree of alienness. The proposed method is unsupervised, language independent, and scalable. Computational experiments conducted on data collected from Japanese web forums show that the method successfully discovers spams.
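The pipeline described in the abstract (group substrings into equivalence classes, score each class, flag documents that contain a high-scoring class) can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the abstract does not state which equivalence relation or which three alienness measures the paper uses, so this sketch groups substrings by their set of end positions and uses a hypothetical score (number of occurrences times the length of the longest member). The function names, `min_len`, and `threshold` are likewise hypothetical, and the brute-force enumeration is quadratic per document, so it says nothing about how the paper achieves scalability.

```python
from collections import defaultdict

def equivalence_classes(docs, min_len=4):
    # Record, for every substring of length >= min_len, where it ends.
    ends = defaultdict(set)
    for d, text in enumerate(docs):
        for i in range(len(text)):
            for j in range(i + min_len, len(text) + 1):
                ends[text[i:j]].add((d, j))
    # Assumed relation: substrings with identical end-position sets
    # belong to the same class.
    classes = defaultdict(list)
    for sub, where in ends.items():
        classes[frozenset(where)].append(sub)
    return classes

def toy_alienness(where, members):
    # Hypothetical score, not one of the paper's measures:
    # occurrence count times the length of the longest member.
    return len(where) * max(len(m) for m in members)

def flag_spam(docs, threshold, min_len=4):
    # A document is flagged if it contains a substring from any class
    # whose score reaches the threshold.
    flagged = set()
    for where, members in equivalence_classes(docs, min_len).items():
        if toy_alienness(where, members) >= threshold:
            flagged.update(d for d, _ in where)
    return sorted(flagged)

docs = [
    "lunch was great today",
    "buy cheap pills now at example dot com",
    "anyone seen the game",
    "hello! buy cheap pills now at example dot com",
]
print(flag_spam(docs, threshold=60))
```

In this made-up data the long passage repeated across documents 1 and 3 is the only class that reaches the threshold, so those two documents are flagged. No labels or language-specific tokenization are involved, which matches the unsupervised, language-independent framing of the abstract.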

Original language: English
Title of host publication: Discovery Science - 10th International Conference, DS 2007, Proceedings
Pages: 161-172
Number of pages: 12
Publication status: Published - Dec 1 2007
Event: 10th International Conference on Discovery Science, DS 2007 - Sendai, Japan
Duration: Oct 1 2007 → Oct 4 2007

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 4755 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 10th International Conference on Discovery Science, DS 2007
Country: Japan
City: Sendai
Period: 10/1/07 → 10/4/07

Fingerprint

Equivalence classes
Spam
Strings
Experiments
Equivalence relation
Equivalence class
Computational Experiments

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

Narisawa, K., Bannai, H., Hatano, K., & Takeda, M. (2007). Unsupervised spam detection based on string alienness measures. In Discovery Science - 10th International Conference, DS 2007, Proceedings (pp. 161-172). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 4755 LNAI).

@inproceedings{a699ea7e00d94562ab74fe39c670a142,
title = "Unsupervised spam detection based on string alienness measures",
abstract = "We propose an unsupervised method for detecting spam documents from a given set of documents, based on equivalence relations on strings. We give three measures for quantifying the alienness (i.e. how different they are from others) of substrings within the documents. A document is then classified as spam if it contains a substring that is in an equivalence class with a high degree of alienness. The proposed method is unsupervised, language independent, and scalable. Computational experiments conducted on data collected from Japanese web forums show that the method successfully discovers spams.",
author = "Kazuyuki Narisawa and Hideo Bannai and Kohei Hatano and Masayuki Takeda",
year = "2007",
month = "12",
day = "1",
language = "English",
isbn = "9783540754879",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
pages = "161--172",
booktitle = "Discovery Science - 10th International Conference, DS 2007, Proceedings",

}

TY - GEN

T1 - Unsupervised spam detection based on string alienness measures

AU - Narisawa, Kazuyuki

AU - Bannai, Hideo

AU - Hatano, Kohei

AU - Takeda, Masayuki

PY - 2007/12/1

Y1 - 2007/12/1

AB - We propose an unsupervised method for detecting spam documents from a given set of documents, based on equivalence relations on strings. We give three measures for quantifying the alienness (i.e. how different they are from others) of substrings within the documents. A document is then classified as spam if it contains a substring that is in an equivalence class with a high degree of alienness. The proposed method is unsupervised, language independent, and scalable. Computational experiments conducted on data collected from Japanese web forums show that the method successfully discovers spams.

UR - http://www.scopus.com/inward/record.url?scp=38149047603&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=38149047603&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:38149047603

SN - 9783540754879

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 161

EP - 172

BT - Discovery Science - 10th International Conference, DS 2007, Proceedings

ER -