[BioNLP] Evaluating the Cocoa NER annotator against anatomical entities, diseases and organisms

S. V. Ramanan ramanan.sv at gmail.com
Wed Dec 19 16:16:57 EST 2012

Dear BioNLP list members,

We evaluated the performance of the Cocoa NER annotator
(http://npjoint.com) against three corpora:
1. the Colorado Richly Annotated Full Text (CRAFT) corpus
2. the Anatomical Entity Mention (AnEM) corpus
3. the Arizona Disease (AZDC) corpus
for anatomical entities, diseases and organisms.

The annotator now supports an extended annotation mode, in which nested
entity mentions, as well as fine-grained descriptions of (for example)
anatomical entities, are tagged; this mode is what made the evaluation
possible.

The results (for entity overlap) against the *CRAFT corpus* are:

Entity               Ontology       P       R       F
======               ========      ===     ===     ===
Cell                  CL           0.93    0.90    0.91
Cellular_component   GO_CC         0.76    0.76    0.73
Organism            NCBITaxon      0.92    0.95    0.93

For the *AnEM corpus*, we find:

Entity               Ontology       P       R       F
======               ========      ===     ===     ===
Cell                  CARO         0.90    0.76    0.83
Cellular_component   GO_CC         0.63    0.67    0.65
All entities            *          0.82    0.79    0.81

The performance against the *Arizona Disease corpus* was:

Entity               Ontology       P       R       F
======               ========      ===     ===     ===
Disease               UMLS         0.83    0.86    0.85

In all cases, P = precision, R = recall, and F = F-measure.
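For readers who want to reproduce these figures, the sketch below shows how
precision, recall and F-measure can be computed under an entity-overlap
criterion (spans count as matched if they share at least one character).
This is an illustrative assumption; the actual evaluation scripts linked
below may use a different matching rule.

```python
# Sketch of span-overlap P/R/F scoring. The "any character overlap"
# matching criterion is an assumption for illustration only.

def overlaps(a, b):
    """True if half-open spans (start, end) share at least one character."""
    return a[0] < b[1] and b[0] < a[1]

def prf(gold, predicted):
    """Return (precision, recall, F-measure) under overlap matching."""
    tp = sum(1 for p in predicted if any(overlaps(p, g) for g in gold))
    fn = sum(1 for g in gold if not any(overlaps(g, p) for p in predicted))
    fp = len(predicted) - tp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = (2 * precision * recall / (precision + recall)
         if precision + recall else 0.0)
    return precision, recall, f
```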

Some notes on the evaluation process, as well as the evaluation  
scripts, are available at:


The annotations from Cocoa (in BRAT stand-off format) are available at:


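Since the annotations are distributed in BRAT stand-off format, a minimal
sketch of reading the entity ("T") lines from a .ann file may be useful;
the function name and the handling of discontinuous spans (taking the
outer bounds) are choices made here for illustration.

```python
# Minimal sketch of parsing entity lines from BRAT stand-off (.ann) text.
# Relation, event and note lines (R, E, #, etc.) are skipped.

def parse_brat_entities(text):
    """Yield (id, type, start, end, surface) tuples for entity lines."""
    for line in text.splitlines():
        if not line.startswith("T"):
            continue  # not an entity annotation
        tid, info, surface = line.split("\t", 2)
        etype, offsets = info.split(" ", 1)
        # Discontinuous spans use ';' separators; take outer bounds here.
        nums = [int(x) for part in offsets.split(";") for x in part.split()]
        yield tid, etype, min(nums[::2]), max(nums[1::2]), surface
```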
Notes on getting extended Cocoa annotations through a Web API are at:


We welcome and appreciate comments/criticism from the BioNLP community.

S. V. Ramanan
ramanan.sv at gmail.com
ramanan at npjoint.com
