[BioNLP] New Paper on Recognition of Chemical Entities
leser at informatik.hu-berlin.de
Fri Apr 20 17:57:40 EDT 2012
Thanks for the many comments on our recent paper on chemical named
entity recognition (and on chemical NER in general). We decided to pool
answers to some of the questions that came up.
> I think there's a bigger issue with evaluation here. I've reported F
> scores as high as 83.2% on chemistry before (strict boundary match):
We fully agree that results using limited gold standard corpora should
always be considered with care. Nevertheless, we think it is fair to
compare different tools on the same corpora, and we did our best to
consider as many such corpora as possible (we use the SCAI & IUPAC
corpus from SCAI and the DDI corpus, though this one probably shouldn't
be considered as a gold standard in the usual sense). It is a pity that,
in chemical NER as in many other interesting biomedical IE tasks, so few
corpora become publicly available, rendering published results
difficult to compare.
> Given all of these, it's not hard to see how F scores might go up or
> down by 20% or so depending on evaluation conditions. Really, we
The point of our work (as in many others) is to compare different tools
on the same corpus using the same evaluation scheme. Of course, such
comparisons might produce different results on different corpora.
> a BioCreative for chemical NER.
> From a quick check, the experiments were evaluated using exact match
> criteria: the error analysis recognizes a category of partial
> matches (12% of FN, 30% of FP). I believe Peter is correct that the
We indeed evaluate using strict matching. We also provide our evaluation
script along with ChemSpot, so experiments with other criteria shouldn't
be too difficult.
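For readers who want to experiment with other criteria, the strict (exact
boundary) matching scheme discussed above can be sketched in a few lines.
This is a minimal illustration with made-up spans, not the actual
evaluation script shipped with ChemSpot:

```python
# Minimal sketch of strict-match NER evaluation (hypothetical data,
# not the ChemSpot evaluation script itself).

def evaluate_strict(gold, predicted):
    """Score predicted entity spans against gold spans.

    Each span is a (start, end) character-offset tuple. Under strict
    matching, a prediction counts as a true positive only if both
    boundaries match a gold span exactly; a partial overlap counts
    as both a false positive and a false negative.
    """
    gold_set, pred_set = set(gold), set(predicted)
    tp = len(gold_set & pred_set)  # exact boundary matches only
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Example: one exact match and one boundary mismatch.
gold = [(0, 9), (15, 32)]
pred = [(0, 9), (15, 30)]
print(evaluate_strict(gold, pred))  # (0.5, 0.5, 0.5)
```

Relaxing the criterion (e.g. counting any overlap as a match) only
requires changing how true positives are identified, which is why
scores can shift so much between evaluation conditions.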
> differences in performance compared to previous work have causes
> beyond matching criteria. The results reported for OSCAR4
> (45.7/76.5/57.3% p/r/f) suggest semantic scope mismatch may be one
As in Kolarik et al. (2008) and Hettne et al. (2009), the standard
configuration of OSCAR3/4 is used. As far as we know, there are no
differences concerning the evaluation scope. We also disregard
"MODIFIER" entities contained in the evaluation corpus. Maybe the
inconsistency of published results is caused by different / more recent
versions of OSCAR, which perform better on the corpora of Corbett and
Copestake (2008) but not on the corpus of Kolarik et al. (2008).
> To illustrate, there are roughly 3797 IUPAC names in SCAI and ~4102
> Chemical compounds in Sciborg. All other classes of chemicals
> namely, drugs, reactions, enzymes, chemical words are so few in
> number that most modelling techniques will not converge, even if they
> did, the model is likely to be ordinary at best. I am not sure why
> these other classes are defenestrated during annotation.
The main corpus used for evaluation in our work (the SCAI corpus)
contains 483 IUPAC and partial IUPAC entities and 723 non-IUPAC
entities. Indeed, this is a small corpus compared to the figures you
give for Sciborg, but the Sciborg corpus is not available (not even on
request). We would have loved to test on this corpus.
> If we do go for a shared task or anything like BioCreative, I reckon
> it is better to take fresh stock rather than dwell on the past and
> keep re-jigging the same old corpora again and again.
More information about the BioNLP mailing list