September 2022 data-update for "Updated science-wide author databases of standardized citation indicators"

Published: 10 October 2022 | Version 4 | DOI: 10.17632/btchxktzyw.4


Citation metrics are widely used and misused. We have created a publicly available database of top-cited scientists that provides standardized information on citations, h-index, co-authorship-adjusted hm-index, citations to papers in different authorship positions, and a composite indicator (c-score). Separate data are shown for career-long impact and for single recent year impact. Metrics with and without self-citations and the ratio of citations to citing papers are given. Scientists are classified into 22 scientific fields and 176 sub-fields. Field- and subfield-specific percentiles are also provided for all scientists with at least 5 papers.

Career-long data are updated to the end of 2021, and single recent year data pertain to citations received during calendar year 2021. The selection is based on the top 100,000 scientists by c-score (with and without self-citations) or a percentile rank of 2% or above in the sub-field. 195,605 scientists are included in the career-long database and 200,409 scientists are included in the single recent year dataset. This version (4) is based on the September 1, 2022 snapshot from Scopus, updated to the end of citation year 2021. This work uses Scopus data provided by Elsevier through ICSR Lab. Calculations were performed using all Scopus author profiles as of September 1, 2022.

If an author is not on the list, it is simply because the composite indicator value was not high enough to appear on the list. It does not mean that the author does not do good work. Please also note that the database has been published in an archival form and will not be changed. The published version accurately reflects Scopus author profiles at the time of calculation. Some authors may not appear on the list if their Scopus profile was inaccurate (missing publications and citations) at the time of calculation. We thus advise authors to ensure that their Scopus profiles are accurate. Requests for corrections of the Scopus data should not be sent to us. They should be sent directly to Scopus, preferably by use of the Scopus to ORCID feedback wizard, so that the correct data can be used in any future annual updates of the citation indicator databases.

The c-score focuses on impact (citations) rather than productivity (number of publications), and it also incorporates information on co-authorship and author positions (single, first, last author). If you have additional questions, please read the 3 associated papers published in PLoS Biology that explain the development, validation and use of these metrics and databases. Finally, we alert users that all citation metrics have limitations and their use should be tempered and judicious. For more reading, we refer to the Leiden manifesto.


Steps to reproduce

Code is provided with the dataset and runs on the ICSR Lab data sharing platform using Scopus data. It is written in Python (PySpark) and can be used with other datasets on any PySpark platform.
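
As a rough illustration of the selection rule stated in the description (top 100,000 by c-score, or top 2% within the sub-field), the following is a minimal plain-Python sketch. It is not the PySpark code provided with the dataset. The function name `select_top`, the record keys `id`, `subfield` and `c_score`, and the rounding of the sub-field cutoff are all assumptions made for illustration. The sketch handles a single c-score variant, whereas the description applies the rule both with and without self-citations, and it ignores ties.

```python
import math
from collections import defaultdict


def select_top(scientists, top_n=100_000, subfield_fraction=0.02):
    """Return the ids of authors meeting either selection criterion.

    `scientists` is a list of dicts with hypothetical keys
    'id', 'subfield' and 'c_score' (names assumed for this sketch).
    """
    # Sort once, highest c-score first.
    by_score = sorted(scientists, key=lambda s: s["c_score"], reverse=True)

    # Criterion 1: among the top_n authors overall.
    selected = {s["id"] for s in by_score[:top_n]}

    # Criterion 2: within the top fraction of the author's own sub-field.
    groups = defaultdict(list)
    for s in by_score:
        groups[s["subfield"]].append(s)  # each group stays sorted descending

    for members in groups.values():
        # Rounding up is an assumption; the source does not state the rule.
        cutoff = math.ceil(len(members) * subfield_fraction)
        selected.update(s["id"] for s in members[:cutoff])

    return selected


if __name__ == "__main__":
    toy = [
        {"id": "a", "subfield": "X", "c_score": 5.0},
        {"id": "b", "subfield": "X", "c_score": 4.0},
        {"id": "c", "subfield": "Y", "c_score": 1.0},
        {"id": "d", "subfield": "Y", "c_score": 0.5},
    ]
    # Toy thresholds so that both criteria are visible on four records.
    print(sorted(select_top(toy, top_n=1, subfield_fraction=0.5)))
```

In the toy run, author `c` has a low c-score overall but is still selected through the sub-field criterion, which is the purpose of the second rule: authors in fields with lower citation density remain represented.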


Stanford University