CORD-19 SciSpaCy Entity Dataset
Description
Biomedical entities extracted from the CORD-19 dataset (2020-08-28 and 2020-09-28 releases) using pretrained models from the SciSpaCy project: NER models trained against CRAFT, JNLPBA, BC5CDR, and BioNLP, and NERL (named entity recognition and linking) models for UMLS, MeSH, GO, HPO, and RxNorm. The entities are provided as structured Parquet files. The dataset may be useful for downstream tasks such as entity linking and relationship extraction. The work was carried out using Dask on the Saturn Cloud platform as a joint effort between Elsevier Labs and Saturn Cloud. Dataset available at: s3://els-labs-website/cord19-scispacy-entities/
Steps to reproduce
Jupyter notebooks to reproduce the dataset are available at https://github.com/sujitpal/saturn-scispacy; please follow the instructions in the README.md file. The dataset is available as Parquet files (requester pays network charges for downloads) at: s3://els-saturn-scispacy/cord19-scispacy-entities/
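Because the bucket is requester-pays, a plain anonymous read will likely be refused. The following is a minimal sketch, not the authors' own loading code, of how the Parquet files might be opened lazily with Dask. It assumes `dask` and `s3fs` are installed and that AWS credentials are configured locally. The helper names (`requester_pays_options`, `dataset_path`, `load_entities`) and the `subdir` argument are hypothetical; the layout of folders under the bucket prefix is not described here and should be checked against the README.

```python
from typing import Any, Dict

# Bucket prefix as quoted in the steps above.
DATASET_ROOT = "s3://els-saturn-scispacy/cord19-scispacy-entities/"


def requester_pays_options() -> Dict[str, Any]:
    """Options forwarded to s3fs so that the caller's AWS account is billed."""
    return {"requester_pays": True}


def dataset_path(subdir: str = "") -> str:
    """Join an optional sub-folder (hypothetical) onto the dataset root."""
    if not subdir:
        return DATASET_ROOT
    return DATASET_ROOT.rstrip("/") + "/" + subdir.strip("/")


def load_entities(subdir: str = ""):
    """Return a lazy Dask DataFrame over the Parquet files; no data is
    downloaded until a computation such as .head() is triggered."""
    import dask.dataframe as dd  # imported lazily; needs dask and s3fs

    return dd.read_parquet(
        dataset_path(subdir),
        storage_options=requester_pays_options(),
    )
```

Calling `load_entities().head()` would then fetch only the first partition, which keeps transfer charges small while inspecting the schema.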
Additional metadata for Elsevier datasets
Date the data was collected: 2020-08-28T07:00:00.000Z