[BioNLP] New Paper on Recognition of Chemical Entities

Peter Corbett peter.corbett at linguamatics.com
Fri Apr 20 05:00:56 EDT 2012

On 19/04/12 18:19, Phil Gooch wrote:
> Hi Ulf
> Thanks for this. Unfortunately I don't have access to the full paper.
> Can I ask: is the 68.1% F1 measure calculated using strict (exact
> boundary match) or lenient (some overlap allowed) criteria?

No access here either.

I think there's a bigger issue with evaluation here. I've previously 
reported F scores as high as 83.2% for chemical NER (strict boundary 
match): http://www.biomedcentral.com/1471-2105/9/S11/S4/
I think a lot depends on:

a) What the source text for the evaluation corpus was.
b) Exactly which chemical named entities were being annotated.
c) How well-defined the annotation task was; i.e. how extensive the 
guidelines were.
d) How good the inter-annotator agreement was.
e) Whether the software was developed for the corpus - i.e. whether 
development sets were annotated with the same guidelines as the test data.
f) Whether the training set was annotated with the same guidelines as 
the test set (e.g. by cross validation).

Given all of these, it's not hard to see how F scores might go up or 
down by 20% or so depending on evaluation conditions. Really, we need a 
BioCreative for chemical NER.
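
To make the strict vs lenient question concrete, here is a minimal sketch of 
how the matching criterion alone can move the score. The spans are invented 
character offsets and the function names are my own for illustration; none 
of this comes from the paper under discussion or from any particular 
evaluation toolkit.

```python
def f1_score(gold, predicted, match):
    """F1 over entity spans; `match(g, p)` decides whether a pair counts as a hit."""
    # Precision counts predictions that match some gold span;
    # recall counts gold spans that are matched by some prediction.
    hits_pred = sum(1 for p in predicted if any(match(g, p) for g in gold))
    hits_gold = sum(1 for g in gold if any(match(g, p) for p in predicted))
    precision = hits_pred / len(predicted) if predicted else 0.0
    recall = hits_gold / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def strict(g, p):
    # Exact boundary match: both offsets must agree.
    return g == p


def lenient(g, p):
    # Any overlap between half-open (start, end) offsets counts.
    return g[0] < p[1] and p[0] < g[1]


# Hypothetical toy data: (start, end) character offsets of chemical mentions.
gold = [(0, 7), (12, 20), (25, 31), (40, 52)]
predicted = [(0, 7), (12, 18), (26, 31), (60, 65)]

strict_f1 = f1_score(gold, predicted, strict)
lenient_f1 = f1_score(gold, predicted, lenient)
print(f"strict F1 = {strict_f1:.2f}, lenient F1 = {lenient_f1:.2f}")
```

On this toy data the same predictions score 0.25 under strict matching and 
0.75 under lenient matching, because two of the four predictions are off by 
a boundary. Real corpora will not show a gap this large, but the direction of 
the effect is the same.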

(Incidentally, F is a perverse metric, as precision-recall curves are 
typically the mirror image of F score contours, which means the F score 
tends to peak where precision and recall are roughly equal and falls off 
on either side. So another point is: g) whether the software tried to 
balance precision and recall. But that's just a pet peeve of mine.)
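
A rough sketch of point g), under a loudly stated assumption: the operating 
points below lie on a made-up straight-line trade-off (precision + recall = 
1.4), chosen only to keep the arithmetic easy. Real precision-recall curves 
are not straight lines, so treat this as an illustration of the shape of the 
effect and nothing more.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical operating points of one system at different thresholds,
# all lying on the assumed trade-off line precision + recall = 1.4.
operating_points = [(0.9, 0.5), (0.8, 0.6), (0.7, 0.7), (0.6, 0.8), (0.5, 0.9)]

scores = [(p, r, f1(p, r)) for p, r in operating_points]
best = max(scores, key=lambda row: row[2])

for p, r, f in scores:
    print(f"P = {p:.1f}  R = {r:.1f}  F1 = {f:.3f}")
print(f"best F1 at P = {best[0]:.1f}, R = {best[1]:.1f}")
```

Under that assumption the balanced point (0.7, 0.7) gets the highest F1, and 
the same system tuned for high precision or high recall scores several 
points lower without being any better or worse at the underlying task.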

Peter Corbett
