source: type of source, e.g., scientific paper, abstract, drug leaflet, patient forum, tweet, etc.
lang = languages: comma-separated list of two-letter ISO 639-1 codes (e.g., en, fr)
description: short characterization of the corpus
noteworthiness: any specific feature of this dataset
NER: are entities annotated, and if so, for which entity types
linking: is entity linking provided, and if so, to which ontologies
REL: are relations annotated, and if so, in which style:
IE = information extraction style: between entity instances (one per pair of entity spans),
KB = knowledge-base style: between entities (one per text and pair of [linked] entities),
CL = text classification style: presence of a relation between entity types (one per text and pair of entity types); if only one type of relation is considered, this is a binary text classification task
REL list: if REL is non-null, the list of annotated relation types
format: CoNLL, BRAT, etc.
size: number of language units, such as documents, sentences, or words (not megabytes)
publication: reference to a publication (preferably peer-reviewed rather than a preprint)
URL: URL where the dataset can be downloaded or is described
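The fields above can be captured as one record per corpus. The following is a minimal Python sketch, not an existing tool: the class name `CorpusDescription`, the Python field names, and the checks in `validate` are hypothetical. It assumes that REL holds a single style label (IE, KB or CL) or None, and that REL list must be filled whenever REL is non-null.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Allowed REL values in this sketch (assumption: one style label per corpus).
REL_STYLES = {"IE", "KB", "CL"}


@dataclass
class CorpusDescription:
    """One corpus description following the template fields (hypothetical class)."""
    source: str
    lang: str                      # comma-separated ISO 639-1 codes, e.g. "en,fr"
    description: str = ""
    noteworthiness: str = ""
    ner: Optional[str] = None      # annotated entity types, or None if no NER
    linking: Optional[str] = None  # target ontologies, or None if no linking
    rel: Optional[str] = None      # "IE", "KB", "CL", or None if no relations
    rel_list: List[str] = field(default_factory=list)
    format: str = ""               # e.g. "CoNLL", "BRAT"
    size: str = ""                 # count of language units, not megabytes
    publication: str = ""
    url: str = ""

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the record passes."""
        problems = []
        # Each language code must be exactly two lowercase ASCII letters.
        for code in self.lang.split(","):
            code = code.strip()
            if not (len(code) == 2 and code.isascii() and code.isalpha() and code.islower()):
                problems.append(f"lang: {code!r} is not a two-letter code")
        # REL, when present, must be a known style and come with a REL list.
        if self.rel is not None:
            if self.rel not in REL_STYLES:
                problems.append(f"REL: {self.rel!r} is not one of IE, KB, CL")
            if not self.rel_list:
                problems.append("REL list: required when REL is non-null")
        return problems


# Hypothetical example record (placeholder values, not a real corpus).
example = CorpusDescription(
    source="tweet",
    lang="en, fr",
    ner="Drug, Disease",
    rel="IE",
)
print(example.validate())
```

Here the example record is flagged only because REL is set while REL list is empty. Keeping both checks in one method mirrors the template's rule that REL list depends on REL.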
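To make the difference between the IE, KB and CL relation styles concrete, here is a minimal Python sketch. It assumes that each IE-style relation instance records its text, a relation label, and two entity spans, each carrying an entity type and a linked identifier. The names `Mention`, `RelationInstance`, `to_kb_style` and `to_cl_style`, as well as the identifiers `D1`, `D2` and `X1`, are hypothetical. KB-style and CL-style annotations are derived by collapsing IE-style instances, first by linked entity and then by entity type.

```python
from dataclasses import dataclass
from typing import Iterable, Set, Tuple


@dataclass(frozen=True)
class Mention:
    """An entity span in a text, with its type and linked identifier (hypothetical)."""
    start: int
    end: int
    entity_type: str
    entity_id: str


@dataclass(frozen=True)
class RelationInstance:
    """IE style: one relation between a pair of entity spans."""
    text_id: str
    relation: str
    head: Mention
    tail: Mention


def to_kb_style(instances: Iterable[RelationInstance]) -> Set[Tuple[str, str, str, str]]:
    """KB style: one relation per text and pair of linked entities."""
    return {(r.text_id, r.relation, r.head.entity_id, r.tail.entity_id) for r in instances}


def to_cl_style(instances: Iterable[RelationInstance]) -> Set[Tuple[str, str, str, str]]:
    """CL style: one label per text and pair of entity types."""
    return {(r.text_id, r.relation, r.head.entity_type, r.tail.entity_type) for r in instances}


# Hypothetical text "t1": two mentions of drug D1 and one of drug D2, each related to disease X1.
drug_d1_a = Mention(0, 7, "Drug", "D1")
drug_d1_b = Mention(40, 47, "Drug", "D1")
drug_d2 = Mention(60, 69, "Drug", "D2")
disease_x1 = Mention(20, 28, "Disease", "X1")

instances = [
    RelationInstance("t1", "treats", drug_d1_a, disease_x1),
    RelationInstance("t1", "treats", drug_d1_b, disease_x1),
    RelationInstance("t1", "treats", drug_d2, disease_x1),
]

print(len(instances), len(to_kb_style(instances)), len(to_cl_style(instances)))
```

This prints `3 2 1`. The three IE instances collapse to two KB relations, (D1, X1) and (D2, X1), and to a single CL label for the (Drug, Disease) type pair. Because only the relation "treats" is annotated here, the CL view reduces to a binary decision per text and type pair.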