Plagiarism detection using document similarity based on distributed representation

Kensuke Baba, Tetsuya Nakatoh, Toshiro Minami

Research output: Contribution to journal (Conference article)

2 Citations (Scopus)

Abstract

Plagiarism detection in documents requires accurate methods. Plagiarism detection is generally implemented on the basis of the similarity between documents. This paper evaluates the validity of using a distributed representation of words to define document similarity. It proposes a plagiarism detection method based on the local maximal value of the length of the longest common subsequence (LCS), with weights defined by a distributed representation. The proposed method and two other straightforward methods, one based on the simple length of the LCS and one based on the local maximal value of the LCS with no weight, are applied to the dataset of a plagiarism detection competition. The experimental results show that the proposed method is useful in applications that require strict detection of complex plagiarism.
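The three similarity measures compared in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes that the "local maximal value" of the LCS is a Smith-Waterman-style local score, in which each skipped token costs a hypothetical penalty `gap` and the score is floored at 0. It also assumes that the distributed representation enters as the per-pair match weight: the cosine similarity of two word vectors, used only when it reaches a hypothetical `threshold`. The functions `lcs_length`, `local_max_lcs`, `exact_match`, and `embedding_match` are illustrative names, and the 2-d vectors are invented toy data standing in for vectors from a trained embedding model.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence

# A match-score function returns a weight for a token pair, or None when the
# two tokens are not allowed to be aligned.
ScoreFn = Callable[[str, str], Optional[float]]


def lcs_length(a: List[str], b: List[str]) -> int:
    """Baseline 1: plain LCS length of two token sequences (O(len(a)*len(b)) DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            cur[j] = prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1])
        prev = cur
    return prev[-1]


def exact_match(x: str, y: str) -> Optional[float]:
    """Unweighted score: identical tokens weigh 1; anything else cannot align."""
    return 1.0 if x == y else None


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(p * q for p, q in zip(u, v))
    norm = math.sqrt(sum(p * p for p in u)) * math.sqrt(sum(q * q for q in v))
    return dot / norm if norm else 0.0


def embedding_match(vectors: Dict[str, Sequence[float]], threshold: float = 0.5) -> ScoreFn:
    """Assumed weighting: identical tokens weigh 1; distinct tokens that both have
    vectors weigh their cosine similarity, but only if it reaches `threshold`."""
    def score(x: str, y: str) -> Optional[float]:
        if x == y:
            return 1.0
        if x in vectors and y in vectors:
            sim = cosine(vectors[x], vectors[y])
            return sim if sim >= threshold else None
        return None
    return score


def local_max_lcs(a: List[str], b: List[str], score: ScoreFn, gap: float = 0.5) -> float:
    """Smith-Waterman-style local score: the best weighted common-subsequence run
    between any two substrings; each skipped token costs `gap`, floor is 0."""
    best = 0.0
    prev = [0.0] * (len(b) + 1)
    for x in a:
        cur = [0.0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            # Skip a token of `a` (from above) or of `b` (from the left).
            h = max(0.0, prev[j] - gap, cur[j - 1] - gap)
            s = score(x, y)
            if s is not None:
                # Extend the run diagonally with the pair's match weight.
                h = max(h, prev[j - 1] + s)
            cur[j] = h
            best = max(best, h)
        prev = cur
    return best


if __name__ == "__main__":
    source = "the cat sat on the mat".split()
    suspicious = "a kitten sat on the rug today".split()
    # Toy 2-d vectors, invented for illustration only.
    toy_vectors = {"cat": [1.0, 0.1], "kitten": [0.9, 0.2],
                   "mat": [0.1, 1.0], "rug": [0.2, 0.9]}
    print(lcs_length(source, suspicious))                          # 3
    print(local_max_lcs(source, suspicious, exact_match))          # 3.0
    print(round(local_max_lcs(source, suspicious,
                              embedding_match(toy_vectors)), 3))   # 4.986
```

In the toy run, the plain and unweighted local measures see only the exact overlap "sat on the". The weighted variant also credits the paraphrased pairs "cat"/"kitten" and "mat"/"rug", which illustrates why a distributed representation can help with disguised plagiarism. The gap penalty and threshold values here are arbitrary and would need tuning.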

Original language: English
Pages (from-to): 382-387
Number of pages: 6
Journal: Procedia Computer Science
Volume: 111
DOI: 10.1016/j.procs.2017.06.038
Publication status: Published, Jan 1 2017
Event: 8th International Conference on Advances in Information Technology, IAIT 2016, Macao
Duration: Dec 19 2016 to Dec 22 2016

All Science Journal Classification (ASJC) codes

  • Computer Science (all)

Cite this

Plagiarism detection using document similarity based on distributed representation. / Baba, Kensuke; Nakatoh, Tetsuya; Minami, Toshiro.

In: Procedia Computer Science, Vol. 111, 01.01.2017, p. 382-387.


Baba, Kensuke; Nakatoh, Tetsuya; Minami, Toshiro. / Plagiarism detection using document similarity based on distributed representation. In: Procedia Computer Science. 2017; Vol. 111. pp. 382-387.
@article{748095dbe7b64e7e9f4b688a8fe02382,
title = "Plagiarism detection using document similarity based on distributed representation",
author = "Kensuke Baba and Tetsuya Nakatoh and Toshiro Minami",
year = "2017",
month = "1",
day = "1",
doi = "10.1016/j.procs.2017.06.038",
language = "English",
volume = "111",
pages = "382--387",
journal = "Procedia Computer Science",
issn = "1877-0509",
publisher = "Elsevier BV",

}

TY - JOUR

T1 - Plagiarism detection using document similarity based on distributed representation

AU - Baba, Kensuke

AU - Nakatoh, Tetsuya

AU - Minami, Toshiro

PY - 2017/1/1

Y1 - 2017/1/1

UR - http://www.scopus.com/inward/record.url?scp=85029367850&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85029367850&partnerID=8YFLogxK

U2 - 10.1016/j.procs.2017.06.038

DO - 10.1016/j.procs.2017.06.038

M3 - Conference article

AN - SCOPUS:85029367850

VL - 111

SP - 382

EP - 387

JO - Procedia Computer Science

JF - Procedia Computer Science

SN - 1877-0509

ER -