Predictive Models on the 2013 NCDB Colon Cancer Data
The attached file contains R code that covers the full modeling workflow: loading and cleaning the data, selecting variables, imputing missing values, creating training and test sets, and building and evaluating models. The code also generates the graphs and tables used for data and model evaluation. The goal was to build a logistic regression model to predict outcomes after surgery for colon cancer and to compare its performance with that of machine learning algorithms. Three such models were built and compared with logistic regression: an XGBoost model, a Random Forest model, and an XGBoost model trained on data oversampled with SMOTE. Overall, the machine learning algorithms achieved a higher AUC than logistic regression.
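The baseline part of this workflow (train/test split, logistic regression, AUC on held-out data) can be sketched in base R as follows. This is only an illustration under stated assumptions, not the author's code: the data are simulated because the NCDB files are not publicly available, the column names `age`, `stage` and `outcome` are hypothetical, and `auc_rank` is a helper written for this example. Imputation, XGBoost, Random Forest and SMOTE are not shown.

```r
# Illustrative sketch only: synthetic data stand in for the NCDB data.
set.seed(42)
n <- 1000

# Hypothetical predictors (names are assumptions, not NCDB variable names)
dat <- data.frame(
  age   = rnorm(n, mean = 65, sd = 10),
  stage = sample(1:4, n, replace = TRUE)
)

# Hypothetical binary outcome drawn from a known logistic relationship
lin <- -6 + 0.05 * dat$age + 0.6 * dat$stage
dat$outcome <- rbinom(n, size = 1, prob = plogis(lin))

# Random 70/30 split into training and test sets
train_idx <- sample(seq_len(n), size = floor(0.7 * n))
train <- dat[train_idx, ]
test  <- dat[-train_idx, ]

# Logistic regression fitted on the training set
fit <- glm(outcome ~ age + stage, data = train, family = binomial())

# Predicted probabilities for the held-out test set
pred <- predict(fit, newdata = test, type = "response")

# AUC via the Mann-Whitney rank-sum identity (ties get average ranks)
auc_rank <- function(labels, scores) {
  r <- rank(scores)
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

auc <- auc_rank(test$outcome, pred)
print(auc)
```

Scoring each competing model on the same test set with the same AUC function keeps the comparison with logistic regression like for like.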
Files are not publicly available
You can contact the author to request the files
Steps to reproduce
Run this code in R with access to the 2013 colon section of the National Cancer Database (NCDB).
Additional metadata for Elsevier datasets
|Date the data was collected|2013-01-01T06:00:00.000Z|