Paper submission deadline extended: January 21st, 2014
The Fourth International Workshop on “Data Extraction and Object Search” (DEOS 2014) will take place as a satellite event of WWW 2014 in Seoul, South Korea, on April 7th, 2014. Web data extraction is witnessing a renaissance. In an increasing number of applications such as price intelligence or predictive analytics, the value of data-driven approaches has been conclusively proven. However, the necessary data is often available only as HTML, e.g., in the form of competitors’ online shops that can serve as sources for pricing and offer data. DEOS is a regular forum for researchers and practitioners to present and discuss ongoing work on data extraction and object search for products, events, reviews, and other types of structured data on the web.
This year’s DEOS focuses on the challenges in scaling data extraction to the variety and volume of different data sources available only as HTML on the web. Classical data extraction has been largely site-specific, requiring some manual supervision for every site. Where data is to be sourced from more than a handful of websites, this approach fails. To address this challenge, we are witnessing a paradigm shift in data extraction away from manual supervision by experts. This shift has seen two primary directions emerge: Some approaches have turned to crowdsourcing, considering how to allow non-experts to provide the necessary per-site supervision. Others employ automatic entity extraction to replace human annotation of the data to be extracted, together with techniques to deal with the noise in such automatic annotations. Either direction poses major challenges for existing data extraction technology and requires changes to it. In this workshop, we bring together researchers from both directions.
This is the fourth edition of DEOS: the first was held in Como in 2010, the second in Vienna in 2011, and the third in Oxford in 2013. The workshop is supported by the ERC DIADEM grant and the Oxford Martin School. There is a small amount of travel support available from the sponsors (contact: deos2014 at easychair.org).
Topics of Interest
The aim of the workshop is not only to share innovative ideas and results on the topics above but also to create a community of interest that flourishes during and after the workshop. The community will create a web of connected resources that captures the knowledge and interests of the attendees, and that can be shared and extended. We will use social networking as well as data publishing and curation tools to collect and maintain a body of knowledge related to the workshop, which includes not only the proceedings but also the discussed topics, materials, links, and datasets. This will keep attendees in touch and extend the community to other interested people.
Denilson Barbosa (University of Alberta)
Michael Benedikt (Oxford University)
Kalina Bontcheva (Univ. Sheffield)
Rui Cai (Microsoft Research Asia)
Stefano Ceri (Politecnico di Milano)
Alexandra Cristea (Warwick Univ.)
Sergio Flesca (Univ. Cosenza)
Wolfgang Gatterbauer (CMU)
Anna Gentile (Univ. Sheffield)
Xiaonan Guo (Oxford University)
Giovanni Grasso (Oxford University)
Scott Hale (Oxford Internet Institute)
Tamir Hassan (Univ. Konstanz)
Jun Hong (Univ. of Belfast)
Alberto Laender (Universidade Federal de Minas Gerais)
Frederick Lochovsky (HKUST)
Thomas Lukasiewicz (Oxford University)
Roberto Navigli (La Sapienza, Rome)
Pierre Senellart (Telekom Paris)
Christian Schallhart (Oxford University)
Davood Rafiei (University of Alberta)
Sebastien Richard (EXALEAD)
Mike Rosner (Univ. Malta)
Weifeng Su (BNU-HKBU United International College)
Oleg V. Ukhno (Yandex)
Paola Velardi (Università di Roma La Sapienza)
Ce Zhang (Univ. Washington)