[BioNLP] New Paper on Recognition of Chemical Entities

Ulf Leser leser at informatik.hu-berlin.de
Sat Apr 14 08:25:05 EDT 2012


ChemSpot: A Hybrid System for Chemical Named Entity Recognition
     Tim Rocktäschel
     Michael Weidlich
     Ulf Leser

Motivation: The accurate identification of chemicals in text is 
important for many applications, including computer-assisted 
reconstruction of metabolic networks or retrieval of information about 
substances in drug development. However, the diversity of naming 
conventions and traditions for such molecules makes this task highly 
complex, so it should be supported by computational tools.

Results: We present ChemSpot, a named entity recognition tool for 
identifying mentions of chemicals in natural language texts, including 
trivial names, drugs, abbreviations, molecular formulas and IUPAC 
entities. Since the different classes of relevant entities have rather 
different naming characteristics, ChemSpot uses a hybrid approach 
combining a Conditional Random Field with a dictionary. It achieves an 
F1 measure of 68.1% on the SCAI corpus, outperforming the only other 
freely available chemical named entity recognition tool, OSCAR4, by 10.8 
percentage points.
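
To illustrate what combining a statistical tagger with a dictionary can 
look like, here is a minimal sketch. It is not ChemSpot's code and does 
not claim to reproduce its method: the function names, the toy lexicon, 
the precedence rule (tagger spans win over overlapping dictionary hits) 
and the exact-span F1 are all assumptions made for this example. The 
tagger output is passed in as precomputed spans, standing in for a 
trained Conditional Random Field.

```python
import re
from typing import Iterable, List, Set, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def dictionary_spans(text: str, lexicon: Set[str]) -> List[Span]:
    """Return case-insensitive whole-word matches of lexicon entries.

    Uses \\b word boundaries, so terms that begin or end with
    punctuation would need different boundary handling.
    """
    spans = []
    for term in lexicon:
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            spans.append((match.start(), match.end()))
    return sorted(spans)


def merge_spans(primary: List[Span], secondary: List[Span]) -> List[Span]:
    """Keep every primary span; add secondary spans that overlap none.

    Assumption for this sketch: the primary source (here, the tagger)
    takes precedence whenever two spans overlap.
    """
    merged = set(primary)
    for start, end in secondary:
        if all(end <= p_start or start >= p_end for p_start, p_end in primary):
            merged.add((start, end))
    return sorted(merged)


def f1_score(predicted: Iterable[Span], gold: Iterable[Span]) -> float:
    """Exact-match F1: a prediction counts only if both offsets agree."""
    pred, ref = set(predicted), set(gold)
    true_positives = len(pred & ref)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(ref)
    return 2 * precision * recall / (precision + recall)


text = "Patients received aspirin and 1,2-dichloroethane."

# Stand-in for tagger output: the span of the systematic name.
name = "1,2-dichloroethane"
tagger_spans = [(text.index(name), text.index(name) + len(name))]

# Toy lexicon (hypothetical); a real dictionary would be far larger.
lexicon_spans = dictionary_spans(text, {"aspirin"})

combined = merge_spans(tagger_spans, lexicon_spans)
for start, end in combined:
    print(start, end, text[start:end])
```

The merge step is where a hybrid design has to decide which source to 
trust when the two disagree; the rule above is only one simple choice.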

Availability: ChemSpot is freely available at: 

Best wishes,
