[BioNLP] Cocoa evaluation against the MLEE corpus

S. V. Ramanan ramanan.sv at gmail.com
Sun Dec 23 14:26:01 EST 2012

Dear BioNLP list members,

In his response to a previous post on the evaluation of Cocoa against
the AnEM corpus, Sampo Pyysalo suggested:

> you might be interested in looking also at the
> MLEE corpus, which marks a superset of the AnEM entity types:
> http://www.nactem.ac.uk/MLEE/

The MLEE corpus seems like a good opportunity to re-evaluate Cocoa, as
- it is a totally unseen sample (for me)
- MLEE and AnEM were annotated by the same people
- the domain is narrow (angiogenesis)

Annotations against MLEE were retrieved using the Cocoa Web API, with  
*zero* changes made to the codebase used for previous evaluations  
against CRAFT, AnEM and AZDC. The results (for entity overlap)  
against the MLEE corpus are:

Entity                 P       R       F
======                ===     ===     ===
Molecule             0.80    0.82    0.81
Organism             0.77    0.89    0.82
Anatomical           0.84    0.86    0.85
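
For readers unfamiliar with overlap-based scoring, here is a minimal
sketch of one common relaxed-matching definition. It is not the script
used for the numbers above: the exact matching criterion is not spelled
out in this post, and the span representation and the function names
(overlaps, overlap_prf) are assumptions made for illustration only.

```python
from typing import List, Tuple

# Hypothetical span representation: (start, end, label), end exclusive.
Span = Tuple[int, int, str]


def overlaps(a: Span, b: Span) -> bool:
    """True if the two character ranges share at least one position."""
    return a[0] < b[1] and b[0] < a[1]


def overlap_prf(gold: List[Span], pred: List[Span]) -> Tuple[float, float, float]:
    """Precision, recall and F-score under relaxed (overlap) matching.

    Assumption: a predicted span counts as correct if it overlaps any
    gold span with the same label; a gold span counts as found if any
    predicted span with the same label overlaps it.
    """
    def hit(x: Span, others: List[Span]) -> bool:
        return any(x[2] == o[2] and overlaps(x, o) for o in others)

    correct_pred = sum(1 for p in pred if hit(p, gold))
    found_gold = sum(1 for g in gold if hit(g, pred))

    precision = correct_pred / len(pred) if pred else 0.0
    recall = found_gold / len(gold) if gold else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


# Toy data, invented for illustration (not taken from MLEE).
gold = [(0, 5, "Cell"), (10, 20, "Cell"), (30, 35, "Cell")]
pred = [(2, 7, "Cell"), (40, 45, "Cell")]
p, r, f = overlap_prf(gold, pred)
print(f"P={p:.2f} R={r:.2f} F={f:.2f}")
```

Counting precision over predicted spans and recall over gold spans
separately keeps the two measures well defined when one predicted span
overlaps several gold spans, or the reverse.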

Please note that, as with the AnEM evaluation, the "anatomical"
entity evaluation was done against a single lumped category covering
all sub-categories; we are still working on sub-categorizing
anatomical entities bigger than a cell (tissue and up). 'Molecule'
matches additionally require the subcategories (roughly
'protein/gene' and 'chemical') to agree between the matched entities.
There are no subcategories for 'Organism' in the MLEE corpus.
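
The label handling described above can be sketched as follows. This is
an illustration under stated assumptions, not the evaluation code: the
label strings in ANATOMICAL_SUBTYPES and the molecule labels in the
usage lines are assumed names for the corpus types, and normalize and
labels_match are hypothetical helpers.

```python
# Assumed (partial) set of anatomical sub-category labels; the real
# corpus defines more types than are listed here.
ANATOMICAL_SUBTYPES = {"Cell", "Cellular_component", "Tissue", "Organ"}


def normalize(label: str) -> str:
    """Collapse every anatomical sub-category into one lumped label.

    Molecule sub-categories and Organism are left unchanged, so a
    molecule match still requires the same sub-category on both sides.
    """
    return "Anatomical" if label in ANATOMICAL_SUBTYPES else label


def labels_match(gold_label: str, pred_label: str) -> bool:
    """Compare two labels after lumping the anatomical sub-categories."""
    return normalize(gold_label) == normalize(pred_label)


# Anatomical sub-categories are interchangeable in the lumped evaluation.
print(labels_match("Cell", "Tissue"))
# Molecule sub-categories must agree (label names are assumptions).
print(labels_match("Gene_or_gene_product", "Drug_or_compound"))
```

For the per-subcategory results (Cell, Cellular_component), the same
comparison would be made on the raw labels, without the lumping step.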

As with the AnEM evaluation, we also looked at the individual Cell  
and Cellular_component subcategories. The results are (overlap, all  

Entity                 P       R       F
======                ===     ===     ===
Cell                  0.92    0.90    0.91
Cellular_component    0.40    0.61    0.49

While performance on the Cell subcategory seems acceptable,
Cellular_component entities were identified poorly; surprisingly,
precision was especially low (0.40).

Some preliminary notes on this evaluation can be found at:


We welcome feedback and criticism from the BioNLP community.


S.V. Ramanan
ramanan.sv at gmail.com
ramanan at npjoint.com

