Supplementary Data for "Differential correction of gender imbalance for top-cited scientists across scientific subfields over time"
An analysis of gender imbalance among top-cited authors. The term "breakdown" as used here means the aggregate counts in the following list: author_count_total, author_count_top_2_pct, author_count_top_2_pct_sy, author_count_total_male, author_count_top_2_pct_male, author_count_top_2_pct_sy_male, author_count_total_female, author_count_top_2_pct_female, author_count_top_2_pct_sy_female, author_count_total_unknown, author_count_top_2_pct_unknown, author_count_top_2_pct_sy_unknown.

The analysis produces the following computed fields:

Femaleprop: The number of female authors in the top 2% of cited authors over all genderized authors in the top 2% of cited authors, for the entire career.

Femalepropsy: The number of female authors in the top 2% of cited authors in 2021 over all genderized authors in the top 2% of cited authors, calculated for 2021 only.

Difference: Femalepropsy - Femaleprop.

Fmtoppropensity: For each cohort and subfield, the product (total male authors / total female authors) * (total female authors in the top 2% of cited authors in 2021 / total male authors in the top 2% of cited authors in 2021). More simply: the total male-to-female ratio multiplied by the female-to-male ratio among authors in the top 2% of cited authors in 2021.
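The computed fields above can be illustrated with a minimal sketch in plain Python. This is not the code shipped with the dataset (which is PySpark); it only restates the definitions. It assumes that "genderized authors" means the male and female counts summed (unknown excluded), that the `_sy` suffix marks the 2021 single-year counts, and that a breakdown row is available as a dictionary keyed by the count names listed above. The function name `gender_metrics` and the `ratio` helper are hypothetical.

```python
def gender_metrics(row):
    """Derive the four computed fields from one breakdown row (a dict of counts)."""

    def ratio(numerator, denominator):
        # Return None instead of raising when a denominator is zero.
        return numerator / denominator if denominator else None

    female_top = row["author_count_top_2_pct_female"]
    male_top = row["author_count_top_2_pct_male"]
    female_top_sy = row["author_count_top_2_pct_sy_female"]
    male_top_sy = row["author_count_top_2_pct_sy_male"]

    # Assumption: genderized authors = male + female, unknown excluded.
    femaleprop = ratio(female_top, female_top + male_top)
    femalepropsy = ratio(female_top_sy, female_top_sy + male_top_sy)

    difference = None
    if femaleprop is not None and femalepropsy is not None:
        difference = femalepropsy - femaleprop

    # Total male-to-female ratio times female-to-male ratio in the 2021 top 2%.
    male_to_female = ratio(row["author_count_total_male"],
                           row["author_count_total_female"])
    female_to_male_top_sy = ratio(female_top_sy, male_top_sy)
    fmtoppropensity = None
    if male_to_female is not None and female_to_male_top_sy is not None:
        fmtoppropensity = male_to_female * female_to_male_top_sy

    return {
        "Femaleprop": femaleprop,
        "Femalepropsy": femalepropsy,
        "Difference": difference,
        "Fmtoppropensity": fmtoppropensity,
    }


# Made-up counts for one cohort and subfield, for illustration only.
example_row = {
    "author_count_total_male": 600,
    "author_count_total_female": 400,
    "author_count_top_2_pct_male": 30,
    "author_count_top_2_pct_female": 10,
    "author_count_top_2_pct_sy_male": 24,
    "author_count_top_2_pct_sy_female": 16,
}
example_metrics = gender_metrics(example_row)
```

Under this reading, a Fmtoppropensity near 1 would mean women appear in the 2021 top 2% roughly in proportion to their share of all authors in that cohort and subfield, and values below 1 would mean they are under-represented there.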
Steps to reproduce
Code is provided with the dataset. The underlying datasets are generated by the code in the related link, which runs on the ICSR Lab data sharing platform (https://www.elsevier.com/icsr/icsrlab) using Scopus data. The code is written in Python (PySpark) and can be used with other datasets on any PySpark platform.
Additional metadata for Elsevier datasets
Date the data was collected: 2022-09-01T00:00:00.000Z