[BioNLP] papers related to using NLP to improve concept identification?

Ahmed Abdeen Hamed ahmed.elmasri at gmail.com
Mon Nov 7 16:00:21 EST 2011


Thank you for sharing this really neat tool.

I would also like to add the following to the list of questions: how do you
consume the annotations? This is clearly a REST web service, and I am sure it
returns an XML representation of the results. How do you get a list of
entities and their types programmatically, without a human clicking a
mouse?
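One way to answer this, sketched under assumptions: the snippet below parses a Spotlight-style XML response with Python's standard `xml.etree.ElementTree`. The element and attribute names (`Resource`, `URI`, `surfaceForm`, `offset`, `types`) are assumptions about the response format, and the sample values are illustrative, not real output. Check both against an actual response. The service may also offer JSON (e.g. via an `Accept: application/json` header), which would avoid XML parsing altogether.

```python
import xml.etree.ElementTree as ET

# Illustrative response only: the element/attribute names are an assumption
# about the Spotlight XML format, and the values are made up.
SAMPLE_XML = """<Annotation text="... malaria parasite ..." confidence="0" support="0">
  <Resources>
    <Resource URI="http://dbpedia.org/resource/Malaria"
              surfaceForm="malaria" offset="59" types=""/>
    <Resource URI="http://dbpedia.org/resource/Plasmodium"
              surfaceForm="malaria parasite" offset="115" types="DBpedia:Species"/>
  </Resources>
</Annotation>"""


def extract_entities(xml_text):
    """Return (surface form, offset, URI, types) for each annotated resource."""
    root = ET.fromstring(xml_text)
    entities = []
    for res in root.iter("Resource"):
        # types is assumed to be a comma-separated list, possibly empty
        types = [t for t in res.get("types", "").split(",") if t]
        entities.append(
            (res.get("surfaceForm"), int(res.get("offset")), res.get("URI"), types)
        )
    return entities


for entity in extract_entities(SAMPLE_XML):
    print(entity)
```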

Thanks very much!
-Ahmed



On Mon, Nov 7, 2011 at 12:19 PM, Phil Gooch <philgooch at gmail.com> wrote:

> This looks pretty cool. It seems like you might be using automatic
> redirects from wikipedia to help construct the mappings to the correct
> DBpedia URI, e.g. 'malaria parasite' redirects to the article about
> Plasmodium. Would that be a correct assumption?
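Phil's guess can be illustrated with a minimal sketch, assuming a plain Python dict of Wikipedia redirects. In practice such a table would have to be extracted from a Wikipedia or DBpedia dump. Whether Spotlight actually works this way is exactly what he is asking. The table, `resolve` and `to_dbpedia_uri` are hypothetical.

```python
# Hypothetical redirect table (redirect title -> target title). The first entry
# follows Phil's example; the second is invented to show a redirect chain.
REDIRECTS = {
    "Malaria parasite": "Plasmodium",
    "Malarial parasite": "Malaria parasite",
}

DBPEDIA_PREFIX = "http://dbpedia.org/resource/"


def resolve(title, redirects, max_hops=10):
    """Follow redirects until reaching a non-redirect title (or a cycle/hop limit)."""
    seen = set()
    while title in redirects and title not in seen and len(seen) < max_hops:
        seen.add(title)
        title = redirects[title]
    return title


def to_dbpedia_uri(surface_form, redirects):
    # Wikipedia titles start with an upper-case letter and use '_' in URIs
    title = surface_form[:1].upper() + surface_form[1:]
    return DBPEDIA_PREFIX + resolve(title, redirects).replace(" ", "_")


print(to_dbpedia_uri("malaria parasite", REDIRECTS))
# prints http://dbpedia.org/resource/Plasmodium
```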
>
> How well does this tool work with large texts? One problem I've found with
> many biomedical concept identification systems is that they work best with
> text of the order of Medline abstracts or smaller, but struggle with larger
> documents. POS tagging and chunking can help here by pre-chunking large
> texts into domain-congruent noun and verb phrases, each of which is then
> submitted to the concept identifier (by parallel threads, to further improve
> performance).
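Phil's suggestion can be sketched as follows, under loud assumptions: a regex sentence splitter stands in for a real POS-based chunker, and `annotate` is a hypothetical toy dictionary lookup standing in for the concept identifier. The part that carries over is the bookkeeping: each chunk keeps its start offset, so annotations can be mapped back to document positions after the chunks are processed in parallel with `concurrent.futures.ThreadPoolExecutor`.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a real concept identifier (e.g. a REST call).
TOY_DICTIONARY = {"malaria", "erythrocytes"}


def sentence_chunks(text):
    """Return (start offset, chunk) pairs, one per sentence.

    A crude stand-in for POS-based noun/verb-phrase chunking; it will also
    split on abbreviations such as 'e.g.'.
    """
    return [(m.start(), m.group()) for m in re.finditer(r"\S[^.!?]*[.!?]?", text)]


def annotate(chunk):
    """Toy annotator: (offset within chunk, term) for each dictionary word."""
    return [(m.start(), m.group()) for m in re.finditer(r"\w+", chunk)
            if m.group().lower() in TOY_DICTIONARY]


def annotate_large_text(text, max_workers=4):
    chunks = sentence_chunks(text)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, so results line up with chunks
        results = list(pool.map(lambda chunk: annotate(chunk[1]), chunks))
    # Shift chunk-local offsets back to document offsets
    return [(start + offset, term)
            for (start, _), hits in zip(chunks, results)
            for offset, term in hits]


doc = "Malaria is common. It infects erythrocytes."
print(annotate_large_text(doc))
# prints [(0, 'Malaria'), (30, 'erythrocytes')]
```

Threads mainly pay off when the identifier is a remote service (I/O-bound). For CPU-bound local annotation in CPython, a process pool would be the usual choice.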
>
> Phil
>
>
> On Mon, Nov 7, 2011 at 4:46 PM, Pablo Mendes <pablomendes at gmail.com> wrote:
>
>>
>> Hi Ning,
>> Our tool, DBpedia Spotlight, does such "concept identification". We
>> identify both Named Entities and more abstract concepts (fire, water). Our
>> target knowledge base is not UMLS, but DBpedia (a knowledge base extracted
>> from Wikipedia). Mappings from DBpedia to many other conceptual models
>> exist. One could be built for UMLS as well. Or you could use a corpus
>> annotated with UMLS concepts in order to "teach" our tool to annotate with
>> UMLS concepts.
>>
>> Please take a look:
>> http://spotlight.dbpedia.org/demo/index.html
>>
>> Taking your example text, the results don't seem too bad, although there
>> is one blatant error and one debatable boundary-detection issue among the
>> five or so successful annotations:
>>
>> http://spotlight.dbpedia.org/rest/annotate?confidence=0&support=0&text=Merozoite%20Surface%20Protein%201%20is%20expressed%20on%20the%20surface%20of%20malaria%20merozoites%20and%20is%20important%20for%20invasion%20of%20the%20malaria%20parasite%20into%20erythrocytes.
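The query string in the URL above can be built programmatically rather than by hand. This is a minimal sketch with Python's `urllib.parse`. `annotate_url` is a hypothetical helper, and only the `confidence`, `support` and `text` parameters from the example URL are used. Fetching the result would then be a matter of passing the URL to e.g. `urllib.request.urlopen`, assuming the 2011 endpoint is still available.

```python
from urllib.parse import quote, urlencode

# Endpoint from the 2011 message; it may no longer be live.
ENDPOINT = "http://spotlight.dbpedia.org/rest/annotate"

TEXT = ("Merozoite Surface Protein 1 is expressed on the surface of malaria "
        "merozoites and is important for invasion of the malaria parasite "
        "into erythrocytes.")


def annotate_url(text, confidence=0, support=0):
    # quote_via=quote encodes spaces as %20; the default (quote_plus) uses '+'
    query = urlencode({"confidence": confidence, "support": support, "text": text},
                      quote_via=quote)
    return ENDPOINT + "?" + query


print(annotate_url(TEXT))
```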
>>
>> Our simplest approach does not rely on much NLP. It simply uses a
>> dictionary to recognize terms, and for disambiguation it relies on a model
>> of concepts in a multidimensional space of words (a VSM built from
>> Wikipedia). Our paper:
>> Pablo Mendes, Max Jakob, Andrés García-Silva and Christian Bizer. DBpedia
>> Spotlight: Shedding Light on the Web of Documents. In the Proceedings of
>> the 7th International Conference on Semantic Systems (I-Semantics). Graz,
>> Austria, 7–9 September 2011.
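The dictionary-plus-VSM approach described above can be illustrated with a toy sketch: spot terms by dictionary lookup, then, for each ambiguous term, pick the candidate concept whose context vector has the highest cosine similarity with the words of the input text. The concept names, vectors and dictionary are invented, and real systems use weighted (e.g. TF-IDF-style) vectors and multi-word spotting. This is not Spotlight's actual implementation.

```python
import math
import re
from collections import Counter

# Hypothetical data. In Spotlight the dictionary and concept vectors come from
# Wikipedia; these concept names and counts are invented for illustration.
DICTIONARY = {"invasion": ["CellInvasion", "MilitaryInvasion"]}
CONCEPT_VECTORS = {
    "CellInvasion": Counter({"cell": 3, "parasite": 2, "erythrocytes": 2, "protein": 1}),
    "MilitaryInvasion": Counter({"army": 3, "troops": 2, "territory": 2}),
}


def cosine(a, b):
    """Cosine similarity of two sparse bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def annotate(text):
    tokens = re.findall(r"\w+", text.lower())
    context = Counter(tokens)  # the whole text serves as the context vector
    annotations = {}
    for token in tokens:
        candidates = DICTIONARY.get(token)  # spotting: single-token lookup
        if candidates:
            # disambiguation: pick the candidate closest to the context
            annotations[token] = max(
                candidates, key=lambda c: cosine(context, CONCEPT_VECTORS[c]))
    return annotations


print(annotate("Merozoite Surface Protein 1 ... invasion of the malaria parasite "
               "into erythrocytes."))
# prints {'invasion': 'CellInvasion'}
```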
>>
>> However, our best-performing solutions use POS tagging, shallow parsing,
>> NER, etc. We are currently evaluating a number of these.
>>
>> Best,
>> Pablo
>>
>>
>> On Sat, Nov 5, 2011 at 8:46 PM, Ning Kang <emukang at gmail.com> wrote:
>>
>>> Hi, Bob,
>>>
>>> Thank you for your quick reply. I know that many concept
>>> identification systems search the document for words found in UMLS and,
>>> based on each word's context, assign the correct concept IDs and
>>> semantic groups to the words in the document.
>>>
>>> For example, given the sentence "*Merozoite Surface Protein 1 is
>>> expressed on the surface of malaria merozoites and is important for
>>> invasion of the malaria parasite into erythrocytes.*", a concept
>>> identification system will find the following concepts:
>>>
>>> ------------------------------------------------------------------------------------
>>> Content      | Start | End | Concept ID | Concept name        | Semantic type
>>> -------------+-------+-----+------------+---------------------+--------------
>>> Merozoite    |     0 |   9 |     444659 | Merozoites          | 204
>>> expressed    |    31 |  40 |    1171362 | protein expression  | 45
>>> expressed    |    31 |  40 |    1515670 | mRNA Expression     | 45
>>> malaria      |    59 |  66 |      24530 | Malaria             | 47
>>> merozoites   |    67 |  77 |     444659 | Merozoites          | 204
>>> invasion     |    99 | 107 |    1269955 | tumor cell invasion | 33
>>> invasion     |    99 | 107 |    2699153 | Cell Invasion       | 46
>>> malaria      |   115 | 122 |      24530 | Malaria             | 47
>>> parasite     |   123 | 131 |      30498 | Parasites           | 204
>>> erythrocytes |   137 | 149 |      14792 | Erythrocytes        | 25
>>> ------------------------------------------------------------------------------------
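The start and end positions in this table appear to be 0-based character offsets with an exclusive end, so `text[start:end]` is the matched word. A quick check in Python. `find_spans` is a hypothetical helper that does a case-sensitive search for every occurrence of a term.

```python
SENTENCE = ("Merozoite Surface Protein 1 is expressed on the surface of malaria "
            "merozoites and is important for invasion of the malaria parasite "
            "into erythrocytes.")


def find_spans(text, term):
    """Return (start, end) for every case-sensitive occurrence; end is exclusive."""
    spans = []
    start = text.find(term)
    while start != -1:
        spans.append((start, start + len(term)))
        start = text.find(term, start + 1)
    return spans


for term in ["Merozoite", "expressed", "malaria", "merozoites",
             "invasion", "parasite", "erythrocytes"]:
    print(term, find_spans(SENTENCE, term))
```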
>>>
>>> I would like to know whether NLP (POS tagging, chunking) can help concept
>>> identification and, if so, how much it improves performance.
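One common way to quantify such an improvement is to compare each system's output with a gold-standard annotation using exact-match precision, recall and F1. A minimal sketch: annotations are (start, end, concept id) triples, and the gold set and the two hypothetical system outputs below are made up for illustration, not measured results.

```python
def prf(predicted, gold):
    """Exact-match precision, recall and F1 over (start, end, concept_id) triples."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Made-up data: a gold standard and two hypothetical system outputs.
gold = [(0, 9, 444659), (59, 66, 24530), (67, 77, 444659), (137, 149, 14792)]
system_a = [(0, 9, 444659), (59, 66, 24530), (99, 107, 1269955), (137, 149, 14792)]
system_b = [(0, 9, 444659), (59, 66, 24530), (67, 77, 444659)]

for name, output in [("system A", system_a), ("system B", system_b)]:
    p, r, f = prf(output, gold)
    print(f"{name}: P={p:.2f} R={r:.2f} F1={f:.2f}")
```

Exact matching penalizes boundary differences heavily; a lenient overlap-based match is a common alternative when boundary detection is the main thing that differs between systems.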
>>>
>>> Thanks.
>>>
>>> On Sat, Nov 5, 2011 at 8:25 PM, Bob Futrelle <bob.futrelle at gmail.com> wrote:
>>>
>>>> "concept" is a very broad term. What are you thinking of more
>>>> specifically?
>>>> A few examples would help.
>>>>
>>>> - Bob Futrelle
>>>>   BioNLP.org
>>>>
>>>> On Sat, Nov 5, 2011 at 3:08 PM, Ning Kang <emukang at gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Does anyone know of papers on using NLP to improve concept
>>>>> identification?
>>>>>
>>>>> For example, using POS tagging or chunking to improve the performance of
>>>>> concept identification systems, or comparing the performance of
>>>>> concept identification systems with and without NLP as a preprocessing
>>>>> step?
>>>>>
>>>>> Thank you very much.
>>>>>
>>>>> Ning Kang
>>>>>
>>>>> _______________________________________________
>>>>> BioNLP mailing list
>>>>> BioNLP at lists.ccs.neu.edu
>>>>> https://lists.ccs.neu.edu/bin/listinfo/bionlp
>>>>> The BioNLP website: http://www.bionlp.org
>>>>>
>>>>>
>>>>
>>>
>>
>