Datasets Comparison
Version 2
Supplementary Data for "Differential correction of gender imbalance for top-cited scientists across scientific subfields over time"
Description
A look at gender imbalance amongst top-cited authors. Here, the term "breakdown" means the following set of aggregate counts: author_count_total, author_count_top_2_pct, author_count_top_2_pct_sy, author_count_total_male, author_count_top_2_pct_male, author_count_top_2_pct_sy_male, author_count_total_female, author_count_top_2_pct_female, author_count_top_2_pct_sy_female, author_count_total_unknown, author_count_top_2_pct_unknown, author_count_top_2_pct_sy_unknown.
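To make the breakdown concrete, here is a minimal plain-Python sketch (not the PySpark code shipped with the dataset) that tallies the twelve counts for one group of authors. The record layout, one (gender, in_top_2_pct, in_top_2_pct_sy) tuple per author, and the names GENDERS, FIELDS and breakdown are hypothetical; the counts would presumably be tallied separately for each cohort and subfield.

```python
from collections import Counter

GENDERS = ("male", "female", "unknown")

# Field names in the order given in the breakdown list above.
FIELDS = ["author_count_total", "author_count_top_2_pct", "author_count_top_2_pct_sy"] + [
    f"author_count_{level}_{g}"
    for g in GENDERS
    for level in ("total", "top_2_pct", "top_2_pct_sy")
]


def breakdown(records):
    """Tally the twelve breakdown counts for one group of authors.

    Each record is a hypothetical (gender, in_top_2_pct, in_top_2_pct_sy) tuple,
    where the two flags mark membership in the career-long and the
    single-year top-2% lists (assumed layout).
    """
    counts = Counter()
    for gender, in_top, in_top_sy in records:
        if gender not in GENDERS:
            raise ValueError(f"unexpected gender label: {gender!r}")
        counts["author_count_total"] += 1
        counts[f"author_count_total_{gender}"] += 1
        if in_top:
            counts["author_count_top_2_pct"] += 1
            counts[f"author_count_top_2_pct_{gender}"] += 1
        if in_top_sy:
            counts["author_count_top_2_pct_sy"] += 1
            counts[f"author_count_top_2_pct_sy_{gender}"] += 1
    # Counter returns 0 for missing keys, so every field is present in the result.
    return {field: counts[field] for field in FIELDS}


records = [
    ("female", True, False),
    ("male", True, True),
    ("unknown", False, True),
    ("male", False, False),
]
print(breakdown(records)["author_count_top_2_pct_sy"])  # → 2
```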
Femaleprop: the number of female authors in the top 2% of cited authors (based on entire-career citations) divided by the number of all genderized authors in that top 2%.
Femalepropsy: the same proportion as Femaleprop, calculated for 2021 only, i.e. the number of female authors in the top 2% of cited authors in 2021 divided by the number of all genderized authors in the top 2% of cited authors in 2021.
Difference: Femalepropsy - Femaleprop
Fmtoppropensity: for each cohort and subfield, the product
(total male authors / total female authors) * (female authors in the top 2% of cited authors in 2021 / male authors in the top 2% of cited authors in 2021)
or, more simply, the overall male-to-female ratio multiplied by the female-to-male ratio among authors in the top 2% of cited authors in 2021.
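The four computed fields above can be expressed directly in terms of the breakdown counts. The following is a minimal plain-Python sketch, not the dataset's PySpark code. Two assumptions are made that the description does not state outright: "genderized" is read as male plus female (unknown-gender authors are excluded), and the "top 2pct in 2021" counts are taken to be the author_count_top_2_pct_sy_* fields. The helper names _ratio and computed_fields are hypothetical.

```python
def _ratio(numerator, denominator):
    """Return numerator / denominator, or None when the denominator is zero."""
    return None if denominator == 0 else numerator / denominator


def computed_fields(b):
    """Compute Femaleprop, Femalepropsy, Difference and Fmtoppropensity.

    `b` is a dict of breakdown counts for one cohort/subfield group.
    Assumptions (not confirmed by the dataset description):
      * "genderized" = male + female, so unknown-gender authors are excluded;
      * the *_sy_* counts are the 2021 (single-year) top-2% counts.
    """
    top_f = b["author_count_top_2_pct_female"]
    top_m = b["author_count_top_2_pct_male"]
    sy_f = b["author_count_top_2_pct_sy_female"]
    sy_m = b["author_count_top_2_pct_sy_male"]

    femaleprop = _ratio(top_f, top_f + top_m)
    femalepropsy = _ratio(sy_f, sy_f + sy_m)
    difference = (
        None if femaleprop is None or femalepropsy is None
        else femalepropsy - femaleprop
    )

    # (total male / total female) * (2021 top-2% female / 2021 top-2% male)
    male_to_female = _ratio(b["author_count_total_male"], b["author_count_total_female"])
    top_female_to_male = _ratio(sy_f, sy_m)
    fmtoppropensity = (
        None if male_to_female is None or top_female_to_male is None
        else male_to_female * top_female_to_male
    )

    return {
        "femaleprop": femaleprop,
        "femalepropsy": femalepropsy,
        "difference": difference,
        "fmtoppropensity": fmtoppropensity,
    }


example = {
    "author_count_total_male": 600,
    "author_count_total_female": 300,
    "author_count_top_2_pct_male": 15,
    "author_count_top_2_pct_female": 5,
    "author_count_top_2_pct_sy_male": 10,
    "author_count_top_2_pct_sy_female": 10,
}
print(computed_fields(example))
# → {'femaleprop': 0.25, 'femalepropsy': 0.5, 'difference': 0.25, 'fmtoppropensity': 2.0}
```

Algebraically, (M / F) * (f / m) equals (f / F) / (m / M), where M and F are the total male and female authors and m and f their counts in the 2021 top 2%. Fmtoppropensity is therefore the rate at which female authors reach the 2021 top 2% relative to the rate for male authors, with 1 indicating parity. Returning None for empty denominators is a design choice of this sketch.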
Steps to reproduce
Code is provided with the dataset and runs on the ICSR Lab data sharing platform (https://www.elsevier.com/icsr/icsrlab) using Scopus data. It is written in Python (PySpark) and can be adapted to other datasets on any PySpark platform.
Institutions
Elsevier BV, Stanford University
Categories
Bibliometrics
Related Links
Licence
Version 3
Supplementary Data for "Differential correction of gender imbalance for top-cited scientists across scientific subfields over time"
Description
A look at gender imbalance amongst top-cited authors. Here, the term "breakdown" means the following set of aggregate counts: author_count_total, author_count_top_2_pct, author_count_top_2_pct_sy, author_count_total_male, author_count_top_2_pct_male, author_count_top_2_pct_sy_male, author_count_total_female, author_count_top_2_pct_female, author_count_top_2_pct_sy_female, author_count_total_unknown, author_count_top_2_pct_unknown, author_count_top_2_pct_sy_unknown.
The analysis produces the following computed fields:
Femaleprop: the number of female authors in the top 2% of cited authors (based on entire-career citations) divided by the number of all genderized authors in that top 2%.
Femalepropsy: the same proportion as Femaleprop, calculated for 2021 only, i.e. the number of female authors in the top 2% of cited authors in 2021 divided by the number of all genderized authors in the top 2% of cited authors in 2021.
Difference: Femalepropsy - Femaleprop
Fmtoppropensity: for each cohort and subfield, the product
(total male authors / total female authors) * (female authors in the top 2% of cited authors in 2021 / male authors in the top 2% of cited authors in 2021)
or, more simply, the overall male-to-female ratio multiplied by the female-to-male ratio among authors in the top 2% of cited authors in 2021.
Steps to reproduce
Code is provided with the dataset. The underlying datasets are generated by the code in the related link, which runs on the ICSR Lab data sharing platform (https://www.elsevier.com/icsr/icsrlab) using Scopus data. It is written in Python (PySpark) and can be adapted to other datasets on any PySpark platform.
Institutions
Elsevier BV, Stanford University
Categories
Bibliometrics
Related Links
Licence
Creative Commons Attribution 4.0 International