ROSeAnn Java API

ROSeAnn provides a Java API enabling programmers to easily integrate its annotation and reconciliation features in Java programs via Maven. If you are not familiar with Maven, please have a look at http://maven.apache.org/.

Installation and Configuration

System Requirements

ROSeAnn requires a working Java 7 environment and Maven. In our Maven repository you will find all packages along with their source code, the documentation and all test cases. If you want to use the DOM annotation functionalities, you will need a working Mozilla Firefox installation (up to version 24.0).

Configuration

The ROSeAnn Maven/Nexus repository can be accessed at:

using the following credentials:

Add the following dependency to your pom.xml file:

And you are now ready to go! Read our Javadoc documentation and learn how to use ROSeAnn.

Example Project

We have prepared an already-configured example Maven project. Do not hesitate to contact us for any configuration problems.

Configuration Files

ROSeAnn requires three configuration files, which must be located in the /conf folder under the root folder of your project.

  • Configuration.xml. This is the main configuration file for ROSeAnn, specifying, e.g., the mapping ontology and the annotators to invoke. Here is an example configuration:

    The element materialized_folder_path specifies the location of the materialised ontology. The element aggregator_pool contains the set of reconciliation methods to be invoked by ROSeAnn (see below how to specify your reconciliation algorithm). The visualisation element holds the visualisation settings. For example, if only_clean for web is set to true, only those annotations that do not span multiple DOM nodes are visualised; if only_clean for PDF is set to true, only those annotations that do not span multiple lines are visualised.

  • TextannotatorConfiguration.xml. Provides the configuration of each annotator.

    The configuration specifies a set of annotator elements. For each of them we can specify the URL endpoint (when the annotator is a web service), a set of keys to be used during the invocation (when needed), a set of annotator-specific name-value parameters (please refer to the documentation of each annotator) and an element ontology_prefix, i.e., the namespace prefix assigned to each concept from the annotator's vocabulary. The element parallel_annotator specifies a set of annotators to invoke and a timeout.
    IMPORTANT: only those annotators that are specified in the parallel annotator's pool will be invoked by ROSeAnn (you can add your own annotator by defining a class that extends the AnnotatorAdapter abstract class).

  • AggregatorConfiguration.xml. This file provides the configuration for reconciliation algorithms. Here's an example:

    For each aggregator we can specify a set of parameters. In the example above we specify for MEMM the location of the folder containing the trained models used for prediction.
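
To make the only_clean setting more concrete, here is a minimal sketch of the filtering rule it describes. It is not ROSeAnn's code: the Span class, the representation of DOM text nodes (or PDF lines) as character ranges, and the method name keepClean are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a character range [start, end) in the document text. */
class Span {
    final int start;
    final int end;

    Span(int start, int end) {
        this.start = start;
        this.end = end;
    }
}

class OnlyCleanFilter {
    /**
     * Keeps the annotations whose span lies entirely inside one node range,
     * i.e. drops annotations that cross a node (or, for PDF, a line) boundary.
     */
    static List<Span> keepClean(List<Span> annotations, List<Span> nodes) {
        List<Span> clean = new ArrayList<Span>();
        for (Span a : annotations) {
            for (Span n : nodes) {
                if (a.start >= n.start && a.end <= n.end) {
                    clean.add(a);
                    break; // one containing node is enough
                }
            }
        }
        return clean;
    }
}
```

With two nodes covering characters 0-10 and 10-20, an annotation over 8-12 crosses the boundary and would be dropped, while annotations over 2-5 and 10-20 would be kept.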

NOTE: all these configuration files must be present in your project. The example project contains all of them with the default parameters.
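
The parallel_annotator element pairs a pool of annotators with a timeout. The following sketch shows one way such an invocation can work using the standard java.util.concurrent classes. It is only an illustration under stated assumptions: the class ParallelInvocationSketch, the modelling of an annotator as a Callable returning a String, and the policy of silently skipping slow or failing annotators are ours, not ROSeAnn's actual behaviour.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class ParallelInvocationSketch {
    /**
     * Runs every annotator task in parallel and returns the results of those
     * that finished within the timeout, keyed by annotator name.
     * Tasks that time out or throw are skipped.
     */
    static Map<String, String> invokeAll(Map<String, Callable<String>> annotators,
                                         long timeoutMillis) {
        List<String> names = new ArrayList<String>();
        List<Callable<String>> tasks = new ArrayList<Callable<String>>();
        for (Map.Entry<String, Callable<String>> e : annotators.entrySet()) {
            names.add(e.getKey());
            tasks.add(e.getValue());
        }
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, tasks.size()));
        Map<String, String> completed = new LinkedHashMap<String, String>();
        try {
            // invokeAll cancels every task still running when the timeout expires.
            List<Future<String>> futures =
                    pool.invokeAll(tasks, timeoutMillis, TimeUnit.MILLISECONDS);
            for (int i = 0; i < futures.size(); i++) {
                Future<String> f = futures.get(i);
                if (f.isCancelled()) {
                    continue; // this annotator timed out
                }
                try {
                    completed.put(names.get(i), f.get());
                } catch (ExecutionException e) {
                    // this annotator threw an exception: skip it
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        } finally {
            pool.shutdownNow();
        }
        return completed;
    }
}
```

The names of the annotators that completed are the keys of the returned map, which is the kind of information a reconciliation step needs in order to know which annotators actually expressed an opinion.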

The Ontology

ROSeAnn's reconciliation algorithms rely on a user-supplied ontology. The default ROSeAnn ontology is lightweight and contains only subsumption, equivalence and disjointness axioms, so ROSeAnn works on a materialized version of it to improve performance during reconciliation. An OWL/Lite reasoner materializes the subsumption, equivalence and disjointness relationships between concepts, and the results are kept as static files. These files contain all the constraints specified in the ontology (e.g., there is a file that specifies, for each class, its superclasses in the ontology).

The location of the folder containing the materialized ontology files can be specified in the configuration file; if the default value (materializedOntology) is used, ROSeAnn uses its default ontology. You can find the materialized files of the default ROSeAnn ontology in the example project, under the path src/main/resources/uk/ac/ox/cs/diadem/roseann/ontology/materializedOntology. ROSeAnn also provides a method that creates the materialized ontology folder from an ontology file. A single call is enough, given your destination folder and your OWL ontology file:

ROSeAnn roseann = new ROSeAnn();
// Destination folder for the materialized ontology files
File file = new File("your_folder_path");
// Note: new URI(...) throws the checked URISyntaxException
roseann.createMaterializedOntologyFolder(file, new URI("your_owl_file_path.owl"));

Once the materialized files have been created, you can set that folder as the one to be used by ROSeAnn:

roseann.setMaterializedOntologyFolder(file);

Note that, at the moment, ROSeAnn works only with the materialized version of the ontology; on-the-fly reasoning has not been implemented yet. Adding an extensible reasoning framework is one of the highest-priority features for the next releases of ROSeAnn.
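
To illustrate what materializing subsumption means, the sketch below computes, for each class, all of its direct and inherited superclasses from the direct ones. It is a simplified stand-in for what a reasoner produces, not ROSeAnn's implementation: the class SubsumptionMaterializer, the in-memory map representation and the concept names in the usage note are assumptions, and the format of the actual materialized files is not reproduced here.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

class SubsumptionMaterializer {
    /**
     * Given the direct superclasses of each class, returns all (direct and
     * inherited) superclasses of each class, i.e. the transitive closure.
     */
    static Map<String, Set<String>> materialize(Map<String, Set<String>> directSupers) {
        Map<String, Set<String>> all = new HashMap<String, Set<String>>();
        for (String cls : directSupers.keySet()) {
            Set<String> seen = new LinkedHashSet<String>();
            Deque<String> todo = new ArrayDeque<String>(directSupers.get(cls));
            while (!todo.isEmpty()) {
                String sup = todo.pop();
                // seen.add returns false for classes already visited,
                // which also protects against cycles in the hierarchy.
                if (seen.add(sup) && directSupers.containsKey(sup)) {
                    todo.addAll(directSupers.get(sup));
                }
            }
            all.put(cls, seen);
        }
        return all;
    }
}
```

For instance, if a hypothetical City class is declared a subclass of Location, and Location a subclass of Entity, the materialized entry for City lists both Location and Entity, so no reasoning is needed at reconciliation time.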

IMPORTANT: In order to create the materialized ontology, your ontology file must specify a prefix named global, which is the namespace prefix of all the global concepts in the ontology (the concepts to which the concepts of all individual annotators are mapped). Also, the TextannotatorConfiguration.xml file must specify an element ontology_prefix for each annotator that will be invoked by ROSeAnn; this is the namespace prefix for the concepts identified by that individual annotator. The materialized files follow a specific naming pattern, which cannot be modified.

Reconciliation Algorithms

The core of ROSeAnn is its reconciliation algorithms. By default ROSeAnn comes with two of them: Weighted Repair (wr), a database repair algorithm, and MEMM (memm), a Maximum Entropy Markov Model.

You can specify your own reconciliation algorithm by writing a class that extends the abstract class AbstractAggregator and by adding that class to the Configuration.xml file. AbstractAggregator requires the implementation of only one method, reconcile, which takes as input a set of annotations, the annotated text and the set of individual annotators that completed, and returns a set of annotations (the reconciled annotations).

MEMM is a machine learning algorithm that requires a trained model in order to run. The default trained models can be found in the example project under the path src/main/resources/uk/ac/ox/cs/diadem/roseann/aggregation/memmagg/trainedModel, but you can use your own trained models by setting the MEMM parameter in the AggregatorConfiguration.xml file.
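
As an illustration of the shape of a reconcile method as described above, here is a self-contained sketch of a very simple strategy, majority voting. It is neither Weighted Repair nor MEMM, and it does not use ROSeAnn's real types: SimpleAnnotation and MajorityVoteSketch are hypothetical stand-ins, and the exact signature of AbstractAggregator.reconcile should be taken from the Javadoc.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for ROSeAnn's annotation type. */
class SimpleAnnotation {
    final String concept;
    final String annotator;
    final int start;
    final int end;

    SimpleAnnotation(String concept, String annotator, int start, int end) {
        this.concept = concept;
        this.annotator = annotator;
        this.start = start;
        this.end = end;
    }

    /** Two annotations express the same opinion if concept and span coincide. */
    String key() {
        return concept + "@" + start + "-" + end;
    }
}

class MajorityVoteSketch {
    /**
     * Keeps one annotation per (concept, span) proposed by more than half of
     * the annotators that completed. The text is unused by this simple rule
     * and is kept only to mirror the inputs described in the documentation.
     */
    static Set<SimpleAnnotation> reconcile(Set<SimpleAnnotation> annotations,
                                           String text,
                                           Set<String> completedAnnotators) {
        Map<String, Set<String>> supporters = new LinkedHashMap<String, Set<String>>();
        Map<String, SimpleAnnotation> representative = new HashMap<String, SimpleAnnotation>();
        for (SimpleAnnotation a : annotations) {
            if (!completedAnnotators.contains(a.annotator)) {
                continue; // ignore annotators that did not complete
            }
            String key = a.key();
            if (!supporters.containsKey(key)) {
                supporters.put(key, new HashSet<String>());
                representative.put(key, a);
            }
            supporters.get(key).add(a.annotator);
        }
        Set<SimpleAnnotation> reconciled = new LinkedHashSet<SimpleAnnotation>();
        for (Map.Entry<String, Set<String>> e : supporters.entrySet()) {
            // strict majority of the completed annotators
            if (2 * e.getValue().size() > completedAnnotators.size()) {
                reconciled.add(representative.get(e.getKey()));
            }
        }
        return reconciled;
    }
}
```

Unlike the built-in algorithms, this rule ignores the ontology entirely; a real aggregator would typically also use the subsumption and disjointness constraints when deciding which opinions agree or conflict.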

Getting Started

Now that you know all the major components of ROSeAnn, you are ready to go! Follow the installation steps, download our example project and write your own code.

Initialize a ROSeAnn object with the parameters specified in the configuration files:

ROSeAnn roseann = new ROSeAnn();

Then, get an annotated document model:

AnnotatedDocumentModel model = roseann.annotateEntityPlainText("your input text", true);

The second parameter, a boolean, indicates whether the reconciliation algorithms should also be invoked. The object returned is an AnnotatedDocumentModel, which contains a document enriched with annotations (and conflict information).

You can iterate over the annotations in the following way:

Set<Annotation> allAnnotations = model.getAllAnnotations();
for (Annotation annotation : allAnnotations) {
    System.out.println("Id-->"+annotation.getId());
    System.out.println("Concept-->"+annotation.getConcept());
    System.out.println("Annotator-->"+annotation.getOriginAnnotator());
    System.out.println("Start-->"+annotation.getStart());
    System.out.println("End-->"+annotation.getEnd());
}

And you can retrieve information about conflicts as follows:

Set<Conflict> allConflicts = model.getAllConflicts();
for (Conflict conflict : allConflicts) {
    System.out.println("Id-->"+conflict.getId());
    System.out.println("Type-->"+conflict.getType());
    System.out.println("Start-->"+conflict.getStart());
    System.out.println("End-->"+conflict.getEnd());
}

Finally, you can visualize all the information via our intuitive GUI:

ROSeAnn.visualizeAnnotatedDocument(model);

Congratulations! You now know the basics of ROSeAnn. But remember, ROSeAnn offers much more than what is described above: you can annotate your PDF or HTML documents, retrieve the annotations for a specific concept or annotator, and export your annotated documents along with all metadata. If you want to know more about ROSeAnn, have a look at our Javadoc or download the source code of the class uk.ac.ox.cs.diadem.roseann.ROSeAnnClient, which contains many more examples of how to use ROSeAnn.

The GUI

System Requirements

In order to annotate and visualise HTML documents, you must have a working Mozilla Firefox with version up to 24.0 (newer versions might work but have not been tested).

The ROSeAnn API also offers a GUI to visualise annotated documents and to annotate new ones on the fly. The GUI can be invoked by visualising an AnnotatedDocumentModel object (see the example above) or launched directly:

ROSeAnnGUI.launch();

This launches the GUI, in which you can annotate new documents (text, PDF and HTML documents are supported so far). If you download our project and launch the GUI from inside the project, you will find some sample documents (already annotated) that can be visualized. Please read our demo paper for detailed information about the GUI.

For the Bravest

If you have tried all the functionalities available in ROSeAnn and you are interested in extending it, download our source code! We strongly believe in Open Source, and our code can be downloaded directly from the Maven repository. If you have any questions or ideas on how to improve ROSeAnn, please do not hesitate to contact us.
