Datasets for analysis of co-occurrence of cell lines, basal media and supplementation in Open Access biomedical literature
This dataset contains three key pieces of data:
1. journal-list-issn.csv: a list of the journal names and ISSNs to which our corpus was limited.
2. mediaQueriesMendeley.csv: a list of the 39 distinct queries we used to search the corpus, each referencing one of 27 unique basal media.
3. The folder 'Open Access sentences': 4 partitioned parquet files that together comprise a dataframe of 15,424 sentences, each of which appeared in one of the listed journals and contained a hit for one of the basal media. The dataframe has three columns: 'sentence', 'pii', 'year'.
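As a minimal sketch of how the sentence partitions might be read back into a single dataframe, the Python snippet below uses pandas. It rests on several assumptions that the description does not confirm: that the partition files end in '.parquet', that a parquet engine such as pyarrow or fastparquet is installed, and that the folder sits in the working directory. The helper names (load_sentences, check_columns, sentences_per_year) are hypothetical and are not part of the dataset.

```python
from pathlib import Path

import pandas as pd

# Column names as given in the dataset description.
EXPECTED_COLUMNS = ["sentence", "pii", "year"]


def load_sentences(folder):
    """Concatenate every parquet partition found in `folder`.

    Assumption: partitions use the '.parquet' extension and a parquet
    engine (pyarrow or fastparquet) is available to pandas.
    """
    parts = sorted(Path(folder).glob("*.parquet"))
    if not parts:
        raise FileNotFoundError(f"no .parquet files found in {folder!r}")
    frames = [pd.read_parquet(p) for p in parts]
    return pd.concat(frames, ignore_index=True)


def check_columns(df):
    """Return the expected columns that are missing from `df`."""
    return [c for c in EXPECTED_COLUMNS if c not in df.columns]


def sentences_per_year(df):
    """Count sentences per value of 'year', sorted by year."""
    return df.groupby("year").size().sort_index()
```

A typical call would be load_sentences("Open Access sentences"), followed by check_columns on the result. If the description holds, the combined dataframe should have 15,424 rows and check_columns should return an empty list.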