Title: Data Fusion Methods for Studying Racial Disparities in Cancer Mortality
Abstract: Despite significant reductions in cancer mortality over the past three decades, racial disparities in cancer-specific mortality persist. These disparities are related to a combination of patient and provider factors, including access to care, socioeconomic status, quality of care, and comorbidities. To study them over time, we turn to national cancer surveillance databases; however, no single database collects all of the required information. The National Cancer Institute’s Surveillance, Epidemiology, and End Results (SEER) registry lacks important potential confounders such as hospital type, insurance status, and comorbidities, so disparity estimates based on SEER alone cannot adjust for them. The National Cancer Database (NCDB) provides those variables but excludes cause-of-death information, making it impossible to study cancer-specific mortality with NCDB alone. Integrating data from both sources allows us to study associations between race and cancer-specific mortality over time, adjusted for important confounders.
To analyze racial disparities in mortality from head and neck cancer, we develop methods for data fusion, a particularly challenging data integration scenario in which the probability of observing complete data is zero for every subject: the outcome of interest is collected in one dataset, variables of interest are collected in another, and the two sources share only a limited set of overlapping variables. We propose a doubly robust method that provides unbiased estimates of the regression parameters of a survival model if either the data source model or the model of the unobserved covariates is specified correctly. Through simulation studies, we examine the properties of our method under correctly specified and misspecified models, and we then apply it to study racial disparities in mortality from head and neck cancer.
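The data fusion structure described above can be made concrete with a small sketch. It illustrates only the missingness pattern, not the proposed estimator. All variable names and values are hypothetical and are not taken from SEER or NCDB. Source A stands in for a registry that records the outcome, and source B for one that records the additional confounders.

```python
# Hypothetical toy records; names and values are illustrative only.
SHARED = ("age", "race", "stage")           # overlapping variables in both sources
OUTCOME = "cause_of_death"                  # assumed to be recorded only in source A
EXTRA = ("insurance", "comorbidity_score")  # assumed to be recorded only in source B

source_a = [
    {"age": 61, "race": "Black", "stage": 3, "cause_of_death": "cancer"},
    {"age": 54, "race": "White", "stage": 2, "cause_of_death": "other"},
]
source_b = [
    {"age": 58, "race": "Black", "stage": 2, "insurance": "Medicaid", "comorbidity_score": 1},
    {"age": 70, "race": "White", "stage": 4, "insurance": "Medicare", "comorbidity_score": 2},
]


def stack(records_a, records_b):
    """Stack both sources into one table; None marks a variable a source never collects."""
    columns = SHARED + (OUTCOME,) + EXTRA
    fused = []
    for label, records in (("A", records_a), ("B", records_b)):
        for rec in records:
            row = {col: rec.get(col) for col in columns}
            row["source"] = label
            fused.append(row)
    return fused


fused = stack(source_a, source_b)

# A row is complete only if it has the outcome AND every source-B-only variable.
complete = [r for r in fused if all(r[c] is not None for c in (OUTCOME,) + EXTRA)]
print(len(fused), "stacked rows;", len(complete), "complete rows")
```

In this toy table no row is complete by construction, which mirrors the zero probability of observing complete data. Only the shared variables link the two sources, so any analysis has to rely on models for the source membership or for the unobserved variables.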