Testbed for information extraction from deep web

Yasuhiro Yamada, Nick Craswell, Tetsuya Nakatoh, Sachio Hirokawa

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

29 Citations (Scopus)

Abstract

Search results generated by searchable databases are served dynamically and are far larger in volume than the static documents on the Web. These results pages have been referred to as the Deep Web [1]. To integrate results across different searchable databases, we need to extract the target data from their results pages. We propose a testbed for information extraction from search results. We chose 100 databases randomly from 114,540 pages with search forms, so the databases cover a good variety. From these we selected 51 databases whose results pages include URLs and manually identified the target information to be extracted. We also suggest evaluation measures for comparing extraction methods and methods for extending the target data.
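The abstract does not say which evaluation measures the authors suggest. As a rough illustration only, the sketch below assumes the common choice of precision and recall, computed by comparing the URLs an extraction method returns for one results page against the manually identified target URLs. The function name `precision_recall` and the example URLs are hypothetical and do not come from the paper.

```python
def precision_recall(extracted, target):
    """Compare extracted items with manually identified target items.

    Assumption: items are compared by exact string match and duplicates
    are ignored. The paper's own measures may differ.
    """
    extracted_set = set(extracted)
    target_set = set(target)
    correct = extracted_set & target_set

    # Precision: fraction of extracted items that are real targets.
    precision = len(correct) / len(extracted_set) if extracted_set else 0.0
    # Recall: fraction of targets that the method managed to extract.
    recall = len(correct) / len(target_set) if target_set else 0.0
    return precision, recall


# Hypothetical results page: four target URLs, three URLs extracted.
target_urls = [
    "http://example.org/item/1",
    "http://example.org/item/2",
    "http://example.org/item/3",
    "http://example.org/item/4",
]
extracted_urls = [
    "http://example.org/item/1",
    "http://example.org/item/2",
    "http://example.org/help",  # navigation link, not a target
]

precision, recall = precision_recall(extracted_urls, target_urls)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Averaging such per-page scores over the 51 selected databases would give one way to compare extraction methods, again under the assumptions stated above.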

Original language: English
Title of host publication: Proceedings of the 13th International World Wide Web Conference on Alternate Track, Papers and Posters, WWW Alt. 2004
Publisher: Association for Computing Machinery, Inc
Pages: 346-347
Number of pages: 2
ISBN (Electronic): 1581139128, 9781581139129
DOIs: https://doi.org/10.1145/1013367.1013468
Publication status: Published - May 19 2004
Event: 13th International World Wide Web Conference on Alternate Track, Papers and Posters, WWW Alt. 2004 - New York, United States
Duration: May 19 2004 to May 21 2004

Publication series

Name: Proceedings of the 13th International World Wide Web Conference on Alternate Track, Papers and Posters, WWW Alt. 2004

Other

Other: 13th International World Wide Web Conference on Alternate Track, Papers and Posters, WWW Alt. 2004
Country: United States
City: New York
Period: 5/19/04 to 5/21/04

Fingerprint

Testbeds
Websites

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Software

Cite this

Yamada, Y., Craswell, N., Nakatoh, T., & Hirokawa, S. (2004). Testbed for information extraction from deep web. In Proceedings of the 13th International World Wide Web Conference on Alternate Track, Papers and Posters, WWW Alt. 2004 (pp. 346-347). (Proceedings of the 13th International World Wide Web Conference on Alternate Track, Papers and Posters, WWW Alt. 2004). Association for Computing Machinery, Inc. https://doi.org/10.1145/1013367.1013468

Testbed for information extraction from deep web. / Yamada, Yasuhiro; Craswell, Nick; Nakatoh, Tetsuya; Hirokawa, Sachio.

Proceedings of the 13th International World Wide Web Conference on Alternate Track, Papers and Posters, WWW Alt. 2004. Association for Computing Machinery, Inc, 2004. p. 346-347 (Proceedings of the 13th International World Wide Web Conference on Alternate Track, Papers and Posters, WWW Alt. 2004).

Yamada, Y, Craswell, N, Nakatoh, T & Hirokawa, S 2004, Testbed for information extraction from deep web. in Proceedings of the 13th International World Wide Web Conference on Alternate Track, Papers and Posters, WWW Alt. 2004. Proceedings of the 13th International World Wide Web Conference on Alternate Track, Papers and Posters, WWW Alt. 2004, Association for Computing Machinery, Inc, pp. 346-347, 13th International World Wide Web Conference on Alternate Track, Papers and Posters, WWW Alt. 2004, New York, United States, 5/19/04. https://doi.org/10.1145/1013367.1013468
Yamada Y, Craswell N, Nakatoh T, Hirokawa S. Testbed for information extraction from deep web. In Proceedings of the 13th International World Wide Web Conference on Alternate Track, Papers and Posters, WWW Alt. 2004. Association for Computing Machinery, Inc. 2004. p. 346-347. (Proceedings of the 13th International World Wide Web Conference on Alternate Track, Papers and Posters, WWW Alt. 2004). https://doi.org/10.1145/1013367.1013468
Yamada, Yasuhiro ; Craswell, Nick ; Nakatoh, Tetsuya ; Hirokawa, Sachio. / Testbed for information extraction from deep web. Proceedings of the 13th International World Wide Web Conference on Alternate Track, Papers and Posters, WWW Alt. 2004. Association for Computing Machinery, Inc, 2004. pp. 346-347 (Proceedings of the 13th International World Wide Web Conference on Alternate Track, Papers and Posters, WWW Alt. 2004).
@inproceedings{226dfece8c6d457982d21044835627f4,
title = "Testbed for information extraction from deep web",
abstract = "Search results generated by searchable databases are served dynamically and far larger than the static documents on the Web. These results pages have been referred to as the Deep Web [1]. We need to extract the target data in results pages to integrate them on different searchable databases. We propose a testbed for information extraction from search results. We chose 100 databases randomly from 114,540 pages with search forms. Therefore, these databases have a good variety. We selected 51 databases which include URLs in a results page and manually identify target information to be extracted. We also suggest evaluation measures for comparing extraction methods and methods for extending the target data.",
author = "Yasuhiro Yamada and Nick Craswell and Tetsuya Nakatoh and Sachio Hirokawa",
year = "2004",
month = "5",
day = "19",
doi = "10.1145/1013367.1013468",
language = "English",
series = "Proceedings of the 13th International World Wide Web Conference on Alternate Track, Papers and Posters, WWW Alt. 2004",
publisher = "Association for Computing Machinery, Inc",
pages = "346--347",
booktitle = "Proceedings of the 13th International World Wide Web Conference on Alternate Track, Papers and Posters, WWW Alt. 2004",

}

TY - GEN

T1 - Testbed for information extraction from deep web

AU - Yamada, Yasuhiro

AU - Craswell, Nick

AU - Nakatoh, Tetsuya

AU - Hirokawa, Sachio

PY - 2004/5/19

Y1 - 2004/5/19

N2 - Search results generated by searchable databases are served dynamically and far larger than the static documents on the Web. These results pages have been referred to as the Deep Web [1]. We need to extract the target data in results pages to integrate them on different searchable databases. We propose a testbed for information extraction from search results. We chose 100 databases randomly from 114,540 pages with search forms. Therefore, these databases have a good variety. We selected 51 databases which include URLs in a results page and manually identify target information to be extracted. We also suggest evaluation measures for comparing extraction methods and methods for extending the target data.

AB - Search results generated by searchable databases are served dynamically and far larger than the static documents on the Web. These results pages have been referred to as the Deep Web [1]. We need to extract the target data in results pages to integrate them on different searchable databases. We propose a testbed for information extraction from search results. We chose 100 databases randomly from 114,540 pages with search forms. Therefore, these databases have a good variety. We selected 51 databases which include URLs in a results page and manually identify target information to be extracted. We also suggest evaluation measures for comparing extraction methods and methods for extending the target data.

UR - http://www.scopus.com/inward/record.url?scp=84880089492&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84880089492&partnerID=8YFLogxK

U2 - 10.1145/1013367.1013468

DO - 10.1145/1013367.1013468

M3 - Conference contribution

AN - SCOPUS:84880089492

T3 - Proceedings of the 13th International World Wide Web Conference on Alternate Track, Papers and Posters, WWW Alt. 2004

SP - 346

EP - 347

BT - Proceedings of the 13th International World Wide Web Conference on Alternate Track, Papers and Posters, WWW Alt. 2004

PB - Association for Computing Machinery, Inc

ER -