August 2021 data-update for "Updated science-wide author databases of standardized citation indicators"
Citation metrics are widely used and misused. We have created a publicly available database of over 100,000 top-scientists that provides standardized information on citations, h-index, co-authorship adjusted hm-index, citations to papers in different authorship positions and a composite indicator. Separate data are shown for career-long and single year impact. Metrics with and without self-citations and ratio of citations to citing papers are given. Scientists are classified into 22 scientific fields and 176 sub-fields. Field- and subfield-specific percentiles are also provided for all scientists who have published at least 5 papers. Career-long data are updated to end-of-2020. The selection is based on the top 100,000 by c-score (with and without self-citations) or a percentile rank of 2% or above. The dataset and code provides an update to previously released version 1 data under https://doi.org/10.17632/btchxktzyw.1; The version 2 dataset is based on the May 06, 2020 snapshot from Scopus and is updated to citation year 2019 available at https://doi.org/10.17632/btchxktzyw.2 This version (3) is based on the Aug 01, 2021 snapshot from Scopus and is updated to citation year 2020.
Steps to reproduce
Code is provided with the dataset and runs on the ICSR Lab data sharing platform (https://www.elsevier.com/icsr/icsrlab) using Scopus data. It is written in python (pyspark) and can be used with other datasets on any pyspark platform.
Additional metadata for Elsevier datasets
|Date the data was collected