[BioNLP] The release of GENETAG in BioC

Islamaj, Rezarta (NIH/NLM/NCBI) [F] islamaj at ncbi.nlm.nih.gov
Thu Mar 19 13:02:38 PDT 2015

The GENETAG corpus contains 20K sentences with manually annotated gene/protein names. The first 15K sentences were used for the BioCreative 1 (Task 1A) competition in 2004, and the remaining 5K sentences were used as test data for the BioCreative II (Gene Mention Task) competition in 2005. Since then, this set has become one of the most widely used corpora for the development of gene/protein recognition tools.

This corpus has now been converted to BioC format and is available for download <http://sourceforge.net/projects/bioc/files/GeneTag.zip/download> at the BioC website (bioc.sourceforge.net).

Feel free to contact us if you have any questions.
Thank you,
Rezarta, Don and John

Rezarta Islamaj Dogan

National Center for Biotechnology Information
National Library of Medicine
Tel: (301) 435 8769
Rezarta.Islamaj at nih.gov
