A method for fine-grained document alignment using structural information

Naoki Tsujio, Toshiyuki Shimizu, Masatoshi Yoshikawa

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

It is useful to understand how the parts of related documents correspond to one another, for example between a conference paper and its revised version published as a journal paper, or between different versions of a document. However, it is hard to associate corresponding parts that have been heavily modified using only the similarity of their content. We propose a method for aligning documents that considers not only content information but also the structural information in the documents. Our method consists of three steps: baseline alignment considering document order, merging, and swapping. In our evaluation experiments, we used papers that had been presented at a domestic conference and an international conference, and obtained their alignments using several methods. The results revealed the effectiveness of using document structures.

Original language: English
Title of host publication: Web Technologies and Applications - 16th Asia-Pacific Web Conference, APWeb 2014, Proceedings
Publisher: Springer Verlag
Pages: 201-211
Number of pages: 11
ISBN (Print): 9783319111155
Publication status: Published - 2014
Externally published: Yes
Event: 16th Asia-Pacific Web Conference on Web Technologies and Applications, APWeb 2014 - Changsha, China
Duration: Sep 5 2014 → Sep 7 2014

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 8709 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 16th Asia-Pacific Web Conference on Web Technologies and Applications, APWeb 2014
Country/Territory: China
City: Changsha
Period: 9/5/14 → 9/7/14

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science (all)
