September 2022 data-update for "Updated science-wide author databases of standardized citation indicators"
See file 28oct2022_v5_update_release_notes.txt below for detailed explanation of differences between versions 5 and 4. They both use the same data but version 5 has more appropriate subfield assignment. Citation metrics are widely used and misused. We have created a publicly available database of top-cited scientists that provides standardized information on citations, h-index, co-authorship adjusted hm-index, citations to papers in different authorship positions and a composite indicator (c-score). Separate data are shown for career-long and, separately, for single recent year impact. Metrics with and without self-citations and ratio of citations to citing papers are given. Scientists are classified into 22 scientific fields and 174 sub-fields. Field- and subfield-specific percentiles are also provided for all scientists with at least 5 papers. Career-long data are updated to end-of-2021 and single recent year data pertain to citations received during calendar year 2021. The selection is based on the top 100,000 scientists by c-score (with and without self-citations) or a percentile rank of 2% or above in the sub-field. This version (5) is based on the Sept 1, 2022 snapshot from Scopus, updated to end of citation year 2021. This work uses Scopus data provided by Elsevier through ICSR Lab (https://www.elsevier.com/icsr/icsrlab). Calculations were performed using all Scopus author profiles as of September 1, 2022. If an author is not on the list it is simply because the composite indicator value was not high enough to appear on the list. It does not mean that the author does not do good work. PLEASE ALSO NOTE THAT THE DATABASE HAS BEEN PUBLISHED IN AN ARCHIVAL FORM AND WILL NOT BE CHANGED. The published version reflects Scopus author profiles at the time of calculation. We thus advise authors to ensure that their Scopus profiles are accurate. REQUESTS FOR CORRECIONS OF THE SCOPUS DATA (INCLUDING CORRECTIONS IN AFFILIATIONS) SHOULD NOT BE SENT TO US. They should be sent directly to Scopus, preferably by use of the Scopus to ORCID feedback wizard (https://orcid.scopusfeedback.com/) so that the correct data can be used in any future annual updates of the citation indicator databases. The c-score focuses on impact (citations) rather than productivity (number of publications) and it also incorporates information on co-authorship and author positions (single, first, last author). If you have additional questions, please read the 3 associated PLoS Biology papers that explain the development, validation and use of these metrics and databases. (https://doi.org/10.1371/journal.pbio.1002501, https://doi.org/10.1371/journal.pbio.3000384 and https://doi.org/10.1371/journal.pbio.3000918). Finally, we alert users that all citation metrics have limitations and their use should be tempered and judicious. For more reading, we refer to the Leiden manifesto: https://www.nature.com/articles/520429a
Steps to reproduce
Code is provided with the dataset and runs on the ICSR Lab data sharing platform (https://www.elsevier.com/icsr/icsrlab) using Scopus data. It is written in python (pyspark) and can be used with other datasets on any pyspark platform.
Additional metadata for Elsevier datasets
|Date the data was collected||2022-09-01T00:00:00.000Z|