Testbed for information extraction from Deep Web

Yasuhiro Yamada, Tetsuya Nakatoh, Nick Craswell, Sachio Hirokawa

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

Search results generated by searchable databases are served dynamically and are far larger than the static documents on the Web. These results pages have been referred to as the Deep Web [1]. We need to extract the target data from results pages in order to integrate them across different searchable databases. We propose a testbed for information extraction from search results. We chose 100 databases randomly from 114,540 pages with search forms, so these databases have a good variety. We selected 51 databases which include URLs in a results page and manually identified the target information to be extracted. We also suggest evaluation measures for comparing extraction methods and methods for extending the target data.

Original language: English
Title of host publication: Thirteenth International World Wide Web Conference Proceedings, WWW2004
Pages: 1078-1079
Number of pages: 2
Publication status: Published - Dec 1 2004
Event: Thirteenth International World Wide Web Conference Proceedings, WWW2004 - New York, NY, United States
Duration: May 17 2004 to May 22 2004

Publication series

Name: Thirteenth International World Wide Web Conference Proceedings, WWW2004

Other

Other: Thirteenth International World Wide Web Conference Proceedings, WWW2004
Country: United States
City: New York, NY
Period: 5/17/04 to 5/22/04

Fingerprint

Testbeds
Websites

All Science Journal Classification (ASJC) codes

  • Engineering(all)

Cite this

Yamada, Y., Nakatoh, T., Craswell, N., & Hirokawa, S. (2004). Testbed for information extraction from Deep Web. In Thirteenth International World Wide Web Conference Proceedings, WWW2004 (pp. 1078-1079). (Thirteenth International World Wide Web Conference Proceedings, WWW2004).

@inproceedings{a129fa240ff346ed88608e912c8ed372,
title = "Testbed for information extraction from Deep Web",
abstract = "Search results generated by searchable databases are served dynamically and are far larger than the static documents on the Web. These results pages have been referred to as the Deep Web [1]. We need to extract the target data from results pages in order to integrate them across different searchable databases. We propose a testbed for information extraction from search results. We chose 100 databases randomly from 114,540 pages with search forms, so these databases have a good variety. We selected 51 databases which include URLs in a results page and manually identified the target information to be extracted. We also suggest evaluation measures for comparing extraction methods and methods for extending the target data.",
author = "Yasuhiro Yamada and Tetsuya Nakatoh and Nick Craswell and Sachio Hirokawa",
year = "2004",
month = "12",
day = "1",
language = "English",
isbn = "158113844X",
series = "Thirteenth International World Wide Web Conference Proceedings, WWW2004",
pages = "1078--1079",
booktitle = "Thirteenth International World Wide Web Conference Proceedings, WWW2004",

}

TY - GEN

T1 - Testbed for information extraction from Deep Web

AU - Yamada, Yasuhiro

AU - Nakatoh, Tetsuya

AU - Craswell, Nick

AU - Hirokawa, Sachio

PY - 2004/12/1

Y1 - 2004/12/1

AB - Search results generated by searchable databases are served dynamically and are far larger than the static documents on the Web. These results pages have been referred to as the Deep Web [1]. We need to extract the target data from results pages in order to integrate them across different searchable databases. We propose a testbed for information extraction from search results. We chose 100 databases randomly from 114,540 pages with search forms, so these databases have a good variety. We selected 51 databases which include URLs in a results page and manually identified the target information to be extracted. We also suggest evaluation measures for comparing extraction methods and methods for extending the target data.

UR - http://www.scopus.com/inward/record.url?scp=19944381223&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=19944381223&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:19944381223

SN - 158113844X

T3 - Thirteenth International World Wide Web Conference Proceedings, WWW2004

SP - 1078

EP - 1079

BT - Thirteenth International World Wide Web Conference Proceedings, WWW2004

ER -