
ROSeAnn - Reconciling Opinions of Semantic Annotators

Named Entity Recognisers (NERs) can be used to enrich both text and Web documents with semantic annotations. While annotators originally focused on a few standard entity types such as Person, Organisation, and Location, the ecosystem is becoming increasingly diverse, with recognition capabilities ranging from generic to specialised entity types. Both the overlap and the diversity in annotator vocabularies motivate the need for managing and integrating semantic annotations: allowing users to see the results of multiple annotators and to merge them into a unified solution.

ROSeAnn is a system for the management of semantic annotations. It provides users with a unified view over the opinions of multiple independent annotators on both text and Web documents, and it allows users to understand and reconcile conflicts between annotations via ontology-aware aggregation. ROSeAnn incorporates both supervised aggregation via Maximum-Entropy Markov Models (MEMM), appropriate when representative training data is available, and an unsupervised method based on the notion of weighted-repair (WR).

ROSeAnn supports annotation of plain text, (online/offline) Web, and PDF documents.


Screenshots

Screenshots of ROSeAnn performing text annotation, Web annotation, and PDF annotation.

Data and Tools

ROSeAnn's source code will soon be available as open source. However, if you would like to compare your own or other annotators with ROSeAnn, here you can find some useful tools that will help you replicate our results. ROSeAnn has been evaluated against a dataset compiled from documents in the Reuters and MUC7 corpora. Both corpora are proprietary and we cannot redistribute them, so we provide only the annotations and a reference (i.e., the unique id) to the actual document in each corpus. The Fox and NETagger corpora are in GATE format, so you can easily visualise the annotations. Best of luck!
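As an example, here is a minimal sketch of how one of the GATE-format files could be inspected programmatically with the standard GATE Embedded API. The file name below is only a placeholder, and depending on the file the annotations may live in named annotation sets rather than the default set.

    import gate.Annotation;
    import gate.Document;
    import gate.Factory;
    import gate.Gate;

    import java.io.File;

    // Minimal sketch: load one GATE-format file and print its annotations.
    // "fox-001.xml" is a placeholder file name, not a file we distribute.
    public class InspectAnnotations {
        public static void main(String[] args) throws Exception {
            Gate.init(); // initialise GATE Embedded
            Document doc = Factory.newDocument(new File("fox-001.xml").toURI().toURL());
            // Default annotation set; named sets are reachable via doc.getNamedAnnotationSets().
            for (Annotation a : doc.getAnnotations()) {
                System.out.printf("%s [%d,%d]%n",
                        a.getType(),
                        a.getStartNode().getOffset(),
                        a.getEndNode().getOffset());
            }
            Factory.deleteResource(doc); // release the GATE resource
        }
    }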


Experimental Evaluation

We carried out an extensive experimental evaluation of our aggregation methods in order to (i) motivate our work and (ii) demonstrate the value of aggregation. In the following, we refer to two systems as System X and Aggregator Y to comply with their user agreements, which do not allow comparative/benchmarking evaluations.

The following table reports on the number of conflicts arising from the combination of individual annotations. In particular, the table reports, for each corpus:

  1. The number of annotated spans.
  2. The number of basic conflicts (BC), i.e., how many times a span is annotated with a concept C while some competent annotator (i.e., one claiming to know C) does not annotate the span.
  3. The number of strong conflicts (SC), i.e., how many times a span is annotated with concepts that can be inferred to be logically disjoint w.r.t. the merged ontology.
  4. The number of spans containing BCs.
  5. The number of spans containing SCs.


Summary of logical conflicts.
Corpus     Annotated spans   # of BCs   # of SCs   # of BC spans   # of SC spans
Reuters    36737             21639      2937       15654           1981
MUC7       55172             36756      3501       26174           2483
Fox        798               943        185        605             68
NETagger   1493              1486       179        1195            83
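To make the BC and SC definitions above concrete, here is a toy sketch of the two checks for a single span. This is not ROSeAnn's implementation: the vocabulary map, the per-annotator labels (one concept per annotator, for simplicity), and the disjointness set are stand-ins for what ROSeAnn derives from the annotators and the merged ontology.

    import java.util.Map;
    import java.util.Set;

    // Toy sketch of the two conflict checks for a single span.
    // 'vocabulary' maps each annotator to the concepts it claims to know;
    // 'labels' maps each annotator to the concept it assigned to this span
    //  (annotators that did not annotate the span are simply absent);
    // 'disjoint' lists concept pairs inferred to be disjoint in the merged ontology.
    public class ConflictChecks {

        // Basic conflict: some annotator labels the span with C, but a competent
        // annotator (one claiming to know C) produced no annotation for the span.
        static boolean hasBasicConflict(Map<String, Set<String>> vocabulary,
                                        Map<String, String> labels) {
            for (String concept : labels.values()) {
                for (Map.Entry<String, Set<String>> e : vocabulary.entrySet()) {
                    boolean competent = e.getValue().contains(concept);
                    boolean silent = !labels.containsKey(e.getKey());
                    if (competent && silent) {
                        return true;
                    }
                }
            }
            return false;
        }

        // Strong conflict: the span carries two concepts that are logically disjoint.
        static boolean hasStrongConflict(Map<String, String> labels,
                                         Set<Set<String>> disjoint) {
            for (String c1 : labels.values()) {
                for (String c2 : labels.values()) {
                    if (!c1.equals(c2) && disjoint.contains(Set.of(c1, c2))) {
                        return true;
                    }
                }
            }
            return false;
        }
    }

In the counts above, these checks would be run once per merged span over the annotations contributed by all annotators.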

We have also tracked span conflicts, e.g., annotators agreeing on a concept but disagreeing on the span itself. The following table reports, for each corpus, the number and type of span conflicts, in particular:

  1. The number of annotated spans.
  2. The number of times all annotators agree on the span (but might disagree on the concept).
  3. The number of times annotators produce fully contained spans.
  4. The number of times annotators produce overlapping spans.
  5. The total number of conflicting spans (i.e., either contained or overlapping).


Summary of span conflicts.
Corpus     Tot. spans   Same span   Containment   Overlap   Conflicts
Reuters    275364       221636      52186         1542      53728
MUC7       368053       276966      87874         3213      91087
Fox        14451        10616       3738          97        3835
NETagger   15027        8496        6170          361       6531
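The span relations counted above reduce to simple offset comparisons. Here is a minimal sketch, assuming spans are represented as half-open character-offset intervals [start, end); this representation is an assumption made for illustration only.

    // Minimal sketch of the span relations, assuming character offsets
    // and half-open intervals [start, end).
    record Span(int start, int end) {

        boolean sameAs(Span other) {
            return start == other.start && end == other.end;
        }

        // One span fully contained in the other (and not identical).
        boolean containment(Span other) {
            return !sameAs(other)
                    && ((start >= other.start && end <= other.end)
                     || (other.start >= start && other.end <= end));
        }

        // Spans intersect, but neither is identical to nor contained in the other.
        boolean overlap(Span other) {
            return start < other.end && other.start < end
                    && !sameAs(other) && !containment(other);
        }

        boolean conflictsWith(Span other) {
            return containment(other) || overlap(other);
        }
    }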

Although one might expect conflicts to be rare, the results show that they are extremely frequent. A chief source of conflict is that annotators have very limited ability to use context to resolve ambiguity in meaning. Bloomberg sometimes refers to a company and sometimes to a person; Jaguar can refer to an animal, a car, or a sports team; Chelsea can refer to a place, a person, or a sports team; Notting Hill can refer to a place or a movie. Different annotators tend to favour particular resolutions of these ambiguities.

Note that the number of conflicts is restricted by the limited overlap in the vocabularies and limited recall of the annotators. For example, it is very rare for three annotators to be mutually strongly conflicting on the same span, since it is unlikely that all three will simultaneously annotate the span.

Overall, the results show both the need for annotator integration and the possibility of using conflict and agreement signals in constructing superior aggregate annotations. The results are not meant to indicate an intrinsic pattern of who is correct on which concepts. Results can vary as different datasets are used; further, these annotators are frequently modified, with the modifications impacting both their vocabularies and their scope.


A first set of experiments assessed the current performance of the individual annotators. A second set of experiments quantified the benefit of aggregation over individual annotators. We then compared ROSeAnn's aggregation techniques with those of state-of-the-art competitors. Finally, we tested how ROSeAnn scales with an increasing number of individual annotators to be reconciled.

Individual Annotators

Charts: performance of each individual annotator on the Reuters, MUC7, NETagger, and Fox corpora.

Individual vs Aggregated

Charts: ROSeAnn's aggregated annotations compared with each individual annotator on the Reuters, MUC7, NETagger, and Fox corpora. The individual annotators are:

  • Extractiv
  • Lupedia
  • NETagger
  • OpenCalais
  • Saplo
  • DBPedia Spotlight
  • StanfordNER
  • System X
  • Wikimeta
  • Yahoo! CA
  • Zemanta

Comparative Evaluation

Charts: ROSeAnn compared with two competitor aggregators, Aggregator Y and Fox, on the Reuters, MUC7, NETagger, and Fox corpora.

Performance

Charts: performance of WR computation, MEMM inference, and MEMM training.

API

ROSeAnn can be embedded into your application via a Java API, or it can be accessed through a Web service endpoint.
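For orientation, here is a sketch of how such a Web service could be called with the standard java.net.http client. The endpoint URL, the content type, and the response handling are placeholders, so please refer to the Web service documentation for the actual details.

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    // Sketch of calling a ROSeAnn-style annotation endpoint. The URL and the
    // request body are placeholders; consult the web service documentation for
    // the actual endpoint, parameters, and response format.
    public class WebServiceExample {
        public static void main(String[] args) throws Exception {
            String endpoint = "http://example.org/roseann/annotate"; // placeholder URL
            String text = "Bloomberg reported that Jaguar sales rose in Chelsea.";

            HttpRequest request = HttpRequest.newBuilder(URI.create(endpoint))
                    .header("Content-Type", "text/plain")
                    .POST(HttpRequest.BodyPublishers.ofString(text))
                    .build();

            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());

            System.out.println(response.body()); // aggregated annotations, format per the docs
        }
    }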

If none of the above suits your needs, you can use ROSeAnn in the old-fashioned way, i.e., by downloading the entire suite as a single archive (tar.gz, tar.bz2, zip). Have a look here for instructions on how to install it.


Tutorial


References

  • L. Chen, S. Ortona, G. Orsi. ROSeAnn: Taming Online Semantic Annotators. WWW 2014 (Developer Track).
  • L. Chen, S. Ortona, G. Orsi, and M. Benedikt. Aggregating Semantic Annotators. PVLDB 2013 (Full paper).
  • L. Chen, S. Ortona, G. Orsi, and M. Benedikt. ROSeAnn: Reconciling Opinions of Semantic Annotators. PVLDB 2013 (Demo).

Contacts

Unless otherwise specified: name dot surname at cs dot ox dot ac dot uk.