[BioNLP] New Paper on Recognition of Chemical Entities

Ulf Leser leser at informatik.hu-berlin.de
Fri Apr 20 17:57:40 EDT 2012


Thanks for the many comments on our recent paper on chemical named
entity recognition (and on chemical NER in general). We decided to pool 
answers to some of the questions that came up.

> I think there's a bigger issue with evaluation here. I've reported F
> scores as high as 83.2% on chemistry before (strict boundary match):

We fully agree that results using limited gold standard corpora should
always be considered with care. Nevertheless, we think it is fair to
compare different tools on the same corpora, and we did our best to
consider as many such corpora as possible (we use the SCAI & IUPAC 
corpus from SCAI and the DDI corpus, though this one probably shouldn't 
be considered as a gold standard in the usual sense). It is a pity that, 
in chemical NER as in many other interesting biomedical IE tasks, so few 
corpora become publicly available, rendering published results difficult
to compare.

> Given all of these, it's not hard to see how F scores might go up or
> down by 20% or so depending on evaluation conditions. Really, we
> need

The point of our work (as of many others) is to compare different tools 
on the same corpus using the same evaluation scheme. Of course, such
comparisons might produce different results on different corpora.

> a BioCreative for chemical NER.


> From a quick check, the experiments were evaluated using exact match
>  criteria: the error analysis recognizes a category of partial
> matches (12% of FN, 30% of FP). I believe Peter is correct that the

We indeed evaluate using strict matching. We also provide our evaluation
script along with ChemSpot, so experiments with other criteria shouldn't
be too difficult.
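To illustrate how much the matching criterion alone can move the numbers, here
is a minimal sketch (not ChemSpot's actual evaluation script; the spans below
are invented) that scores the same predictions once under strict boundary
matching and once under overlap-based matching:

```python
# Hypothetical illustration: strict vs. overlap-based NER scoring.
# The gold/predicted character spans are made up for this example.

def prf(tp, fp, fn):
    """Precision, recall, F1 from raw counts."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

def evaluate(gold, pred, strict=True):
    """Score predicted (start, end) spans against gold spans.

    strict=True  -> exact boundary match required (our setting)
    strict=False -> any character overlap counts as a hit
    """
    matched_gold = set()
    tp = fp = 0
    for span in pred:
        if strict:
            hit = span in gold and span not in matched_gold
            if hit:
                matched_gold.add(span)
        else:
            hit = False
            for g in gold:
                # half-open intervals overlap iff each starts before the other ends
                if g not in matched_gold and span[0] < g[1] and g[0] < span[1]:
                    matched_gold.add(g)
                    hit = True
                    break
        if hit:
            tp += 1
        else:
            fp += 1
    fn = len(gold) - len(matched_gold)
    return prf(tp, fp, fn)

gold = {(0, 18), (25, 40), (50, 61)}  # gold chemical mentions
pred = [(0, 18), (25, 35), (70, 80)]  # (25, 35) overlaps gold only partially

print(evaluate(gold, pred, strict=True))
print(evaluate(gold, pred, strict=False))
```

The single partial match (25, 35) counts as both a false positive and a false
negative under strict matching but as a true positive under overlap matching,
so F1 doubles between the two runs of this toy example.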

> differences in performance compared to previous work have causes
> beyond matching criteria. The results reported for OSCAR4
> (45.7/76.5/57.3% p/r/f) suggest semantic scope mismatch may be one
> factor.

As in Kolarik et al. (2008) and Hettne et al. (2009), the standard
configuration of OSCAR3/4 is used. As far as we know, there are no
differences concerning the evaluation scope. We also disregard
"MODIFIER" entities contained in the evaluation corpus. Maybe the
inconsistency of published results is caused by different / more recent
versions of OSCAR, which perform better on the corpora of Corbett and
Copestake (2008) but not on the corpus of Kolarik et al. (2008).

> To illustrate, there are roughly 3797 IUPAC names in SCAI and ~4102
> Chemical compounds in Sciborg. All other classes of chemicals
> namely, drugs, reactions, enzymes, chemical words are so few in
> number that most modelling techniques will not converge, even if they
> did, the model is likely to be ordinary at best. I am not sure why
> these other classes are defenestrated during annotation.

The main corpus used for evaluation in our work (the SCAI corpus) contains
483 IUPAC and partial-IUPAC entities and 723 non-IUPAC entities. This is
indeed a small corpus compared to the figures you give for Sciborg, but the
Sciborg corpus is not available (not even on request). We would have loved
to test on this corpus.

> If we do go for a shared task or anything like BioCreative, I reckon
> it is better take a fresh stock rather than dwell on the past and
> keep re-jigging the same old corpora again and again.


Best wishes,
