[BioNLP] The release of GENETAG in BioC

Islamaj, Rezarta (NIH/NLM/NCBI) [F] islamaj at ncbi.nlm.nih.gov
Thu Mar 19 13:02:38 PDT 2015


The GENETAG corpus contains 20K sentences with manually annotated gene/protein names. The first 15K sentences were used for the BioCreative 1 (Task 1A) competition in 2004, and the remaining 5K sentences were used as test data for the BioCreative II (Gene Mention Task) competition in 2005. Since then, this set has become one of the most widely used corpora for the development of gene/protein recognition tools.

This corpus has now been converted to BioC format and is available for download at the BioC website (bioc.sourceforge.net): <http://sourceforge.net/projects/bioc/files/GeneTag.zip/download>

Feel free to contact us if you have any questions.
Thank you,
Rezarta, Don and John



----------------------------------------------------------
Rezarta Islamaj Dogan
http://www.ncbi.nlm.nih.gov/CBBresearch/Dogan/

National Center for Biotechnology Information
National Library of Medicine
Tel: (301) 435 8769
Rezarta.Islamaj at nih.gov

