Skip to main content
Elsevier Data Repository

Datasets within this collection

Filter Results
1970 2024
31 results
  • Data for report "Artificial Intelligence: How knowledge is created, transferred, and used"
    There are the underlying data for our report "Artificial Intelligence: How knowledge is created, transferred, and used", published 2018. Data can be used to construct the graphs used in the report.
    • Dataset
  • October 2023 data-update for "Updated science-wide author databases of standardized citation indicators"
    Citation metrics are widely used and misused. We have created a publicly available database of top-cited scientists that provides standardized information on citations, h-index, co-authorship adjusted hm-index, citations to papers in different authorship positions and a composite indicator (c-score). Separate data are shown for career-long and, separately, for single recent year impact. Metrics with and without self-citations and ratio of citations to citing papers are given. Scientists are classified into 22 scientific fields and 174 sub-fields according to the standard Science-Metrix classification. Field- and subfield-specific percentiles are also provided for all scientists with at least 5 papers. Career-long data are updated to end-of-2022 and single recent year data pertain to citations received during calendar year 2022. The selection is based on the top 100,000 scientists by c-score (with and without self-citations) or a percentile rank of 2% or above in the sub-field. This version (6) is based on the October 1, 2023 snapshot from Scopus, updated to end of citation year 2022. This work uses Scopus data provided by Elsevier through ICSR Lab ( Calculations were performed using all Scopus author profiles as of October 1, 2023. If an author is not on the list it is simply because the composite indicator value was not high enough to appear on the list. It does not mean that the author does not do good work. PLEASE ALSO NOTE THAT THE DATABASE HAS BEEN PUBLISHED IN AN ARCHIVAL FORM AND WILL NOT BE CHANGED. The published version reflects Scopus author profiles at the time of calculation. We thus advise authors to ensure that their Scopus profiles are accurate. REQUESTS FOR CORRECIONS OF THE SCOPUS DATA (INCLUDING CORRECTIONS IN AFFILIATIONS) SHOULD NOT BE SENT TO US. They should be sent directly to Scopus, preferably by use of the Scopus to ORCID feedback wizard ( so that the correct data can be used in any future annual updates of the citation indicator databases. The c-score focuses on impact (citations) rather than productivity (number of publications) and it also incorporates information on co-authorship and author positions (single, first, last author). If you have additional questions, please read the 3 associated PLoS Biology papers that explain the development, validation and use of these metrics and databases. (, and Finally, we alert users that all citation metrics have limitations and their use should be tempered and judicious. For more reading, we refer to the Leiden manifesto:
    • Dataset
  • COVID-19: Public health, and societal and psychological impacts datasets
    • Collection
  • COVID-19: Epidemiology & infectious modelling datasets
    • Collection
  • COVID-19: Genetics, genomics & molecular structure datasets
    • Collection
  • COVID-19: Vaccine, prevention, diagnosis & treatment datasets
    • Collection
  • Mendeley Data FAIRest Datasets
    • Collection
  • Elsevier OA CC-BY Corpus
    This is a corpus of 40k (40,001) open access (OA) CC-BY articles from across Elsevier’s journals represent the first cross-discipline research of data at this scale to support NLP and ML research. This dataset was released to support the development of ML and NLP models targeting science articles from across all research domains. While the release builds on other datasets designed for specific domains and tasks, it will allow for similar datasets to be derived or for the development of models which can be applied and tested across domains.
    • Dataset
  • ChEMU dataset for information extraction from chemical patents
    The discovery of new chemical compounds and their synthesis process is of great importance to the chemical industry. Patent documents contain critical and timely information about newly discovered chemical compounds, providing a rich resource for chemical research in both academia and industry. Chemical patents are often the initial venues where a new chemical compound is disclosed. Only a small proportion of chemical compounds are ever published in journals and these publications can be delayed by up to 3 years after the patent disclosure. In addition, chemical patent documents usually contain unique information, such as reaction steps and experimental conditions for compound synthesis and mode of action. These details are crucial for the understanding of compound prior art, and provide a means for novelty checking and validation. Due to the high volume of chemical patents, approaches that enable automatic information extraction from these patents are in demand. To develop natural language processing methods for large-scale mining of chemical information from patent texts, a corpus is created providing chemical patent snippets and annotated entities and reaction steps.
    • Dataset
  • The researcher journey through a gender lens
    Data underlying the analyses in chapters 1, 2, 3, and 5 of the report "The researcher journey through a gender lens" (, which provides an analysis of the researcher journey, analysed using a gender lens. Data on authors, grantees and patent applicants pertain to researchers active during two periods, 16 geographies, and 26 subject areas and 11 sub-fields of medicine. Theses data are provided at the aggregated level.
    • Dataset