
September 2022 data-update for "Updated science-wide author databases of standardized citation indicators"

DOI: 10.17632/btchxktzyw.4 Version 4 | Published: 10 Oct 2022
Contributor(s):
  • John P.A. Ioannidis
    Unspecified
    Departments of Medicine, of Health Research and Policy, of Biomedical Data Science, and of Statistics, and Meta-Research Innovation Center at Stanford (METRICS), Stanford University, Stanford, CA, USA

Description of this data

Citation metrics are widely used and misused. We have created a publicly available database of top-cited scientists that provides standardized information on citations, h-index, co-authorship-adjusted hm-index, citations to papers in different authorship positions, and a composite indicator (c-score). Data are shown separately for career-long and for single-recent-year impact. Metrics with and without self-citations, and the ratio of citations to citing papers, are given. Scientists are classified into 22 scientific fields and 176 sub-fields. Field- and subfield-specific percentiles are also provided for all scientists with at least 5 papers. Career-long data are updated to end-of-2021, and single-recent-year data pertain to citations received during calendar year 2021. The selection is based on the top 100,000 scientists by c-score (with and without self-citations) or a percentile rank of 2% or above in the sub-field. 195,605 scientists are included in the career-long database and 200,409 scientists in the single-recent-year dataset. This version (4) is based on the September 1, 2022 snapshot from Scopus, updated to the end of citation year 2021.

This work uses Scopus data provided by Elsevier through ICSR Lab (https://www.elsevier.com/icsr/icsrlab). Calculations were performed using all Scopus author profiles as of September 1, 2022. If an author is not on the list, it is simply because the composite indicator value was not high enough to appear on the list; it does not mean that the author does not do good work. Please also note that the database has been published in an archival form and will not be changed. The published version accurately reflects Scopus author profiles at the time of calculation. Some authors may not appear on the list if their Scopus profile was inaccurate (missing publications and citations) at the time of calculation. We therefore advise authors to ensure that their Scopus profiles are accurate.

Requests for corrections of the Scopus data should not be sent to us. They should be sent directly to Scopus, preferably via the Scopus-to-ORCID feedback wizard (https://orcid.scopusfeedback.com/), so that the correct data can be used in any future annual updates of the citation indicator databases.

The c-score focuses on impact (citations) rather than productivity (number of publications), and it also incorporates information on co-authorship and author positions (single, first, last author). If you have additional questions, please read the three associated papers published in PLoS Biology that explain the development, validation, and use of these metrics and databases (https://doi.org/10.1371/journal.pbio.1002501, https://doi.org/10.1371/journal.pbio.3000384, and https://doi.org/10.1371/journal.pbio.3000918).

Finally, we alert users that all citation metrics have limitations and their use should be tempered and judicious. For more reading, we refer to the Leiden Manifesto: https://www.nature.com/articles/520429a
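The c-score combines several log-transformed citation indicators into one number. As a rough sketch only (the exact formula is defined in the associated PLoS Biology papers; the six-indicator layout, the log rescaling by per-indicator maxima, and all numbers below are assumptions for illustration, with the maxima conceptually corresponding to the Table_3 "maxlog" files), such a composite might be computed as:

```python
import math

def c_score(indicators, maxima):
    """Hypothetical composite: sum of log-transformed indicator values,
    each rescaled by the maximum log value observed across all authors,
    so every component contributes a value between 0 and 1."""
    return sum(
        math.log(1 + x) / math.log(1 + m)
        for x, m in zip(indicators, maxima)
    )

# Hypothetical author: total citations, h-index, hm-index, and citations
# to single-, single+first-, and single+first+last-authored papers.
author = [12000, 45, 30, 800, 2500, 6000]
maxima = [300000, 250, 90, 50000, 90000, 150000]
score = c_score(author, maxima)
```

Because each component is capped at 1, a six-indicator composite of this shape ranges from 0 to 6, which makes scores comparable across authors with very different citation volumes.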

Steps to reproduce

Code is provided with the dataset and runs on the ICSR Lab data sharing platform (https://www.elsevier.com/icsr/icsrlab) using Scopus data. It is written in Python (PySpark) and can be used with other datasets on any PySpark platform.

Experimental data files

  • xlsx
    Table_1_Authors_career_2021_pubs_since_1788_wopp_extracted_202209.xlsx
    77 MB
  • xlsx
    Table_1_Authors_singleyr_2021_pubs_since_1788_wopp_extracted_202209.xlsx
    70 MB
  • xlsx
    Table_2_field_subfield_thresholds_career_2021_pubs_since_1788_wopp_extracted_202209.xlsx
    42 KB
  • xlsx
    Table_2_field_subfield_thresholds_singleyr_2021_pubs_since_1788_wopp_extracted_202209.xlsx
    40 KB
  • xlsx
    Table_3_maxlog_career_2021_pubs_since_1788_wopp_extracted_202209.xlsx
    5 KB
  • xlsx
    Table_3_maxlog_singleyr_2021_pubs_since_1788_wopp_extracted_202209.xlsx
    5 KB
  • py
    top cited scholars - base v07f.py
    57 KB
  • py
    top cited scholars - run - 202209 - 07f - career.py
    21 KB
  • py
    top cited scholars - run - 202209 - 07f - singleyr 2021 citations to career up to y.py
    21 KB
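The Table_1 spreadsheets can be explored with ordinary tools such as pandas. The sketch below uses a hypothetical miniature of the layout — the column names `authfull`, `sm-subfield-1`, and `c (ns)` are assumptions and should be checked against the actual header row — to show a typical filtering step: ranking authors within each subfield by c-score without self-citations.

```python
import pandas as pd

# Hypothetical miniature of the Table_1 layout; verify the real column
# names against the xlsx header row before adapting this.
df = pd.DataFrame({
    "authfull": ["Smith, A.", "Jones, B.", "Lee, C."],
    "sm-subfield-1": ["Oncology", "Oncology", "Optics"],
    "c (ns)": [3.9, 3.1, 4.2],
})

# Rank within each subfield by c-score (no self-citations), highest first,
# and keep the top-ranked author per subfield.
df["subfield_rank"] = df.groupby("sm-subfield-1")["c (ns)"].rank(ascending=False)
top = df[df["subfield_rank"] == 1]
```

Against the real data one would start from `pd.read_excel(...)` on the Table_1 file instead of the inline frame; note the career-long spreadsheet is ~77 MB, so loading it takes a while.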

Licence

Attribution-NonCommercial 3.0 Unported

Categories

Bibliometrics

September 2022 data-update for "Updated science-wide author databases of standardized citation indicators"

DOI: 10.17632/btchxktzyw.5 Version 5 | Published: 3 Nov 2022
Contributor(s):
  • John P.A. Ioannidis
    Unspecified
    Departments of Medicine, of Health Research and Policy, of Biomedical Data Science, and of Statistics, and Meta-Research Innovation Center at Stanford (METRICS), Stanford University, Stanford, CA, USA

Description of this data

See the file 28oct2022_v5_update_release_notes.txt below for a detailed explanation of the differences between versions 5 and 4. Both use the same data, but version 5 has more appropriate subfield assignments.

Citation metrics are widely used and misused. We have created a publicly available database of top-cited scientists that provides standardized information on citations, h-index, co-authorship-adjusted hm-index, citations to papers in different authorship positions, and a composite indicator (c-score). Data are shown separately for career-long and for single-recent-year impact. Metrics with and without self-citations, and the ratio of citations to citing papers, are given. Scientists are classified into 22 scientific fields and 174 sub-fields. Field- and subfield-specific percentiles are also provided for all scientists with at least 5 papers. Career-long data are updated to end-of-2021, and single-recent-year data pertain to citations received during calendar year 2021. The selection is based on the top 100,000 scientists by c-score (with and without self-citations) or a percentile rank of 2% or above in the sub-field. This version (5) is based on the September 1, 2022 snapshot from Scopus, updated to the end of citation year 2021.

This work uses Scopus data provided by Elsevier through ICSR Lab (https://www.elsevier.com/icsr/icsrlab). Calculations were performed using all Scopus author profiles as of September 1, 2022. If an author is not on the list, it is simply because the composite indicator value was not high enough to appear on the list; it does not mean that the author does not do good work. Please also note that the database has been published in an archival form and will not be changed. The published version reflects Scopus author profiles at the time of calculation. We therefore advise authors to ensure that their Scopus profiles are accurate.

Requests for corrections of the Scopus data (including corrections in affiliations) should not be sent to us. They should be sent directly to Scopus, preferably via the Scopus-to-ORCID feedback wizard (https://orcid.scopusfeedback.com/), so that the correct data can be used in any future annual updates of the citation indicator databases.

The c-score focuses on impact (citations) rather than productivity (number of publications), and it also incorporates information on co-authorship and author positions (single, first, last author). If you have additional questions, please read the three associated PLoS Biology papers that explain the development, validation, and use of these metrics and databases (https://doi.org/10.1371/journal.pbio.1002501, https://doi.org/10.1371/journal.pbio.3000384, and https://doi.org/10.1371/journal.pbio.3000918).

Finally, we alert users that all citation metrics have limitations and their use should be tempered and judicious. For more reading, we refer to the Leiden Manifesto: https://www.nature.com/articles/520429a

Steps to reproduce

Code is provided with the dataset and runs on the ICSR Lab data sharing platform (https://www.elsevier.com/icsr/icsrlab) using Scopus data. It is written in Python (PySpark) and can be used with other datasets on any PySpark platform.

Experimental data files

  • txt
    28oct2022_v5_update_release_notes.txt
    2 KB
  • xlsx
    Table_1_Authors_career_2021_pubs_since_1788_wopp_extracted_202209b.xlsx
    77 MB
  • xlsx
    Table_1_Authors_singleyr_2021_pubs_since_1788_wopp_extracted_202209b.xlsx
    76 MB
  • xlsx
    Table_2_field_subfield_thresholds_career_2021_pubs_since_1788_wopp_extracted_202209b.xlsx
    45 KB
  • xlsx
    Table_2_field_subfield_thresholds_singleyr_2021_pubs_since_1788_wopp_extracted_202209b.xlsx
    43 KB
  • xlsx
    Table_3_maxlog_career_2021_pubs_since_1788_wopp_extracted_202209b.xlsx
    5 KB
  • xlsx
    Table_3_maxlog_singleyr_2021_pubs_since_1788_wopp_extracted_202209b.xlsx
    5 KB
  • py
    top cited scholars - base v07g.py
    57 KB
  • py
    top cited scholars - run - 202209 - 07g - career.py
    21 KB
  • py
    top cited scholars - run - 202209 - 07g - singleyr 2022 citations to career up to y.py
    21 KB

Licence

Attribution-NonCommercial 3.0 Unported

Categories

Bibliometrics