Predictive Models on the 2013 NCDB Colon Cancer Data

Name: Predictive Models on the 2013 NCDB Colon Cancer Data
Creator: Grey Leonard
Published: 2021-05-04T09:03:58.735Z
Keywords: Health Sciences


doi:10.17632/jg44fgspzk.1


Published: 4 May 2021 | Version 1 | DOI: 10.17632/jg44fgspzk.1

Contributor:

Grey Leonard

Description

The attached file contains R code that documents the full analysis: loading and cleaning the data, selecting variables, imputing missing values, creating training and test sets, and building and evaluating models. The code also produces the graphs and tables used to evaluate the data and the models. The goal was to build a logistic regression model to predict outcomes after surgery for colon cancer and to compare its performance with machine learning algorithms. Three models were built and compared with logistic regression: an XGBoost model, a Random Forest model, and an XGBoost model trained on data oversampled with SMOTE. Overall, the machine learning algorithms achieved a higher AUC than logistic regression.
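
For readers unfamiliar with the two techniques named in the description, the following is a minimal sketch of how AUC can be computed and how SMOTE-style oversampling generates synthetic minority-class rows. It is written in Python for illustration only; the attached file is R and is not publicly available, so nothing here is taken from the author's code. The function names (auc, smote), the parameters (n_new, k, seed), and the toy numbers are all assumptions made up for this example.

```python
import random


def auc(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) formulation.

    labels: 0/1 outcomes; scores: predicted risks (higher means more likely 1).
    A tie between a positive and a negative score counts as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def smote(minority, n_new, k=5, seed=0):
    """SMOTE-style oversampling (hypothetical helper, numeric features only).

    Each synthetic row is a random point on the line segment between a
    minority-class row and one of its k nearest minority-class neighbours
    (Euclidean distance).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority-class rows")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [row for j, row in enumerate(minority) if j != i]
        others.sort(key=lambda row: sum((a - b) ** 2 for a, b in zip(base, row)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Made-up outcomes and predicted risks from two hypothetical models.
y = [0, 0, 1, 1]
model_a = [0.1, 0.4, 0.35, 0.8]
model_b = [0.1, 0.2, 0.6, 0.9]
print("AUC model A:", auc(y, model_a))
print("AUC model B:", auc(y, model_b))

# Made-up minority-class rows with two numeric features each.
new_rows = smote([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]], n_new=4, k=2)
print("synthetic rows:", new_rows)
```

In a workflow like the one described, oversampling would normally be applied to the training set only, so that the test set used for the AUC comparison still reflects the original class balance. Whether the author's code does this is not stated on this page.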

Files are not publicly available

You can contact the author to request the files

Steps to reproduce

Run the code in R with access to the 2013 Colon section of the National Cancer Database (NCDB).

Institutions

Elsevier BV

Additional metadata for Elsevier datasets

Date the data was collected

2013-01-01T06:00:00.000Z
