====== List of existing resources useful for natural language processing in the pharmacovigilance domain ======

  * **source**: type of source, e.g., scientific paper, abstract, drug leaflet, patient forum, tweet
  * **lang**: languages, as a comma-separated list of 2-letter ISO codes
  * **description**: short characterization of the corpus
  * **noteworthiness**: any specific feature of this dataset
  * **NER**: whether entities are annotated, and for which types of entities
  * **linking**: whether entity linking is provided, and to which ontologies
  * **REL**: whether relations are annotated, and in which style:
    * IE = information extraction style: between entity instances (one per pair of entity spans)
    * KB = knowledge-base style: between entities (one per text and pair of [linked] entities)
    * CL = text classification style: presence of a relation between entity types (one per text and pair of entity types); if only one type of relation is considered, this is a binary text classification task
  * **REL list**: if REL is non-null, list of annotated relations
  * **format**: CoNLL, BRAT, etc.
  * **size**: number of language units such as documents, sentences, words (please no megabytes)
  * **publication**: reference to a publication (peer-reviewed rather than preprint)
  * **URL**: URL where the dataset can be downloaded or is described

^ name ^ source ^ lang ^ description ^ noteworthiness ^ NER ^ linking ^ REL ^ REL list ^ format ^ size ^ publication ^ URL ^
^ TLC | patient forum | de | dataset annotated with layman expressions: Fachterm (technical term), Laienbegriff (layman term), Abkürzung (abbreviation) | | layman terms, including their associated technical terms; technical term with a rather layman term | no | no | | BRAT | 4000 documents | https://www.aclweb.org/anthology/2020.lrec-1.759/ | http://macss.dfki.de/data/LREC2020/TLC_v01.tar.gz |