====== List of existing resources useful for natural language processing in the pharmacovigilance domain ======

Each resource in the table is described by the following fields:

  * **source**: type of source, e.g., scientific paper, abstract, drug leaflet, patient forum, tweet
  * **lang**: languages, as a comma-separated list of two-letter ISO 639-1 codes
  * **description**: short characterization of the corpus
  * **noteworthiness**: any notable feature of the dataset
  * **NER**: are entities annotated, and if so, for which entity types
  * **linking**: is entity linking provided, and to which ontologies
  * **REL**: are relations annotated, and in which style:
    * IE = information extraction style: between entity instances (one per pair of entity spans);
    * KB = knowledge-base style: between entities (one per text and pair of [linked] entities);
    * CL = text classification style: presence of a relation between entity types (one per text and pair of entity types); if only one type of relation is considered, this is a binary text classification task.
  * **REL list**: if REL is not null, the list of annotated relations
  * **format**: CoNLL, BRAT, etc.
  * **size**: number of language units, such as documents, sentences, or words (not megabytes)
  * **publication**: reference to a publication (preferably peer-reviewed rather than a preprint)
  * **URL**: URL where the dataset can be downloaded or is described

^ name ^ source ^ lang ^ description ^ noteworthiness ^ NER ^ linking ^ REL ^ REL list ^ format ^ size ^ publication ^ URL ^
^ TLC | patient forum | de | dataset annotated with layman expressions: Fachterm (technical term), Laienbegriff (layman term), Abkürzung (abbreviation) | | layman terms, including their associated technical terms; technical terms paired with a more layman term | no | no | | BRAT | 4000 documents | https://www.aclweb.org/anthology/2020.lrec-1.759/ | http://macss.dfki.de/data/LREC2020/TLC_v01.tar.gz |

resources/existing.txt · Last modified: 2021/05/12 17:11 by pz
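To make the three REL styles (IE, KB, CL) described in the field list more concrete, here is a minimal Python sketch that derives all three kinds of records from one toy annotated text. It is only an illustration under stated assumptions: the `Mention` and `Relation` classes, the `causes` relation, and the `DRUG:`/`ADE:` concept IDs are hypothetical placeholders, not the annotation format of TLC or of any other resource in the table.

```python
from dataclasses import dataclass

# Hypothetical, minimal data model for illustration only; it is not the
# annotation format of TLC or of any other resource in the table.
@dataclass(frozen=True)
class Mention:
    start: int    # character offset where the span begins
    end: int      # character offset where the span ends (exclusive)
    etype: str    # entity type, e.g. "Drug" or "ADE"
    concept: str  # linked concept ID (placeholder strings, not real ontology codes)

@dataclass(frozen=True)
class Relation:
    rtype: str     # relation type (hypothetical "causes" below)
    head: Mention
    tail: Mention

def span(text, surface, etype, concept, from_pos=0):
    """Locate `surface` in `text` (searching from `from_pos`) and wrap it as a Mention."""
    start = text.index(surface, from_pos)
    return Mention(start, start + len(surface), etype, concept)

def ie_style(relations):
    """IE style: one record per annotated pair of entity spans."""
    return {(r.rtype, (r.head.start, r.head.end), (r.tail.start, r.tail.end))
            for r in relations}

def kb_style(relations):
    """KB style: one record per pair of linked entities; span offsets are dropped."""
    return {(r.rtype, r.head.concept, r.tail.concept) for r in relations}

def cl_style(relations):
    """CL style: which (relation, entity type, entity type) combinations occur in the text."""
    return {(r.rtype, r.head.etype, r.tail.etype) for r in relations}

text = "Aspirin gave me a headache. After more aspirin the headache returned."
drug1 = span(text, "Aspirin", "Drug", "DRUG:aspirin")
ade1 = span(text, "headache", "ADE", "ADE:headache")
drug2 = span(text, "aspirin", "Drug", "DRUG:aspirin")
ade2 = span(text, "headache", "ADE", "ADE:headache", from_pos=ade1.end)
rels = [Relation("causes", drug1, ade1), Relation("causes", drug2, ade2)]

print(len(ie_style(rels)))  # 2: two distinct span pairs
print(kb_style(rels))       # {('causes', 'DRUG:aspirin', 'ADE:headache')}
print(cl_style(rels))       # {('causes', 'Drug', 'ADE')}
```

The two span-level annotations collapse to a single KB record because both mentions link to the same pair of concepts, and to a single CL label because both connect a Drug to an ADE. Since only one relation type is annotated in this toy example, the CL view reduces to one yes/no label per text, i.e., the binary text classification case mentioned above.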