Portfolio Assignment – Data

Introduction

Glioblastoma multiforme (GBM) is a highly fatal primary brain tumor. Fewer than 5% of patients survive past 5 years, and this figure has not improved in the past 3 decades (Tamimi & Juweid, 2017). The challenges in improving this outcome relate to the tumor's mutational heterogeneity and infiltrative nature. Growing interest in the epidemiological landscape of the disease has been equally relevant. Early studies in Greece and Korea observed rises in the incidence rate of GBM despite population-based corrections (Lee et al., 2010; Gousias et al., 2009). More recently, England and France reported rises in GBM incidence from national registry data collected by the UK Office for National Statistics and Santé Publique France, respectively (Philips et al., 2018a; Phonegate, 2019). The English report noted rises in frontal and temporal lobe GBM incidence and increased overall incidence across all ages. Recent studies in Finland and Malta reported similar increases in GBM incidence (Korja et al., 2019; Grech et al., 2020). However, other groups have not found such increases, including the Central Brain Tumor Registry of the United States (CBTRUS). Still others have acknowledged that these trends are less clear than claimed (Ostrom et al., 2018; Philips et al., 2018b).

Proposed explanations for this rise in GBM incidence include aging populations, cell phone use, air pollution and overdiagnosis of the disease (Ostrom et al., 2014; Calderón-Garcidueñas et al., 2008). However, there is no clear consensus on whether the trend is real, and some epidemiologists have advised against further discussion of its causes. Consequently, there is a critical need to evaluate the validity of these trends so that GBM can be explored in a public health context.

Question

In a letter to the editor, Philips et al. (2018b) commented on the lack of increased GBM incidence in US data. They attributed this difference to under-projection of certain age groups in the US 2000 standard population dataset. To test this claim, I hypothesize that GBM incidence in the US will show rises similar to those in Europe and Asia when calculated from NIH SEER cancer reports and US Census data from 2010 to 2017. This retrospective analysis will also explore age-specific trends and lesion-site trends over time. The goal is to determine whether the more specific findings in the literature also occur in US populations.

Data Sources

The analysis below uses both the NIH Surveillance, Epidemiology and End Results Program's SEER 21 Cancer Dataset and the US Census Bureau's 2010-2019 National Population Dataset.

SEER 21 Cancer Dataset: This dataset was collected as part of the Surveillance, Epidemiology and End Results Program, a division within the NIH's National Cancer Institute. The dataset records 11,135,914 tumors of varying types diagnosed from 2000-2017 and is updated every spring. Participating locations that provided data for the national registry included San Francisco, Connecticut, Detroit, Hawaii, Iowa, New Mexico, Seattle, Utah, Atlanta, San Jose-Monterey, Los Angeles, the Alaska Native Registry, Rural Georgia, California excluding SF/SJM/LA, Kentucky, Louisiana, New Jersey, Georgia excluding ATL/RG, Idaho, New York and Massachusetts. Each reported case included qualitative data (such as tumor type, location, treatment, patient race, sex and vital status) and quantitative data (such as tumor size, lesion number and survival months). In total, over 100 variables were included in the dataset. However, when filtered for GBM cases, most of these variables either contained no information or duplicated other variables. After these variables were removed, 10 remained. Qualitative: Patient Race, Sex, Vital Status, Site of Lesion and Cancer Type. Quantitative: Year of Diagnosis, Age, Survival Months, Tumor Count and Tumor Size. One of the main challenges was that data could only be acquired through the NIH SEER*Stat software, which only worked on Windows and required a request to download. Beyond the widespread missing data already mentioned, even the 10 selected variables had large inconsistencies. Tumor count was not recorded until 2013, and race was recorded only as White, Black or Other. Nevertheless, lesion site and patient age were the primary focus of the analysis of this dataset.
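
The GBM filtering and variable selection described above can be sketched in Python; the analysis itself was done in R. This minimal sketch assumes the SEER*Stat case listing has been exported to CSV and its columns renamed to the hypothetical names in `KEEP`. The `"Glioblastoma"` label and the `year_dx` column are likewise assumptions, not SEER's actual variable names.

```python
import csv
import io

# Hypothetical column names for an exported SEER*Stat case listing. Real exports
# use SEER's own variable labels, which would first be renamed to these.
KEEP = [
    "race", "sex", "vital_status", "site", "cancer_type",
    "year_dx", "age", "survival_months", "tumor_count", "tumor_size",
]


def filter_gbm(rows, first_year=2010, last_year=2017, label="Glioblastoma"):
    """Keep GBM cases diagnosed in [first_year, last_year], restricted to KEEP."""
    out = []
    for row in rows:
        if row["cancer_type"] != label:
            continue  # not a GBM case
        if first_year <= int(row["year_dx"]) <= last_year:
            # Drop every column outside the 10 retained variables.
            out.append({k: row.get(k, "") for k in KEEP})
    return out


# Illustrative rows only (not SEER data): one GBM case in range, one GBM case
# outside 2010-2017, and one non-GBM tumor. The empty tumor_count mimics the
# variable not being recorded before 2013.
sample = io.StringIO(
    "cancer_type,year_dx,age,site,race,sex,vital_status,"
    "survival_months,tumor_count,tumor_size,unused\n"
    "Glioblastoma,2012,64,Frontal lobe,White,Female,Dead,14,,45,x\n"
    "Glioblastoma,2005,70,Temporal lobe,Black,Male,Dead,9,,38,x\n"
    "Meningioma,2015,55,Meninges,Other,Female,Alive,60,1,20,x\n"
)
cases = filter_gbm(csv.DictReader(sample))
```

Only the 2012 GBM case survives, and the extra `unused` column is dropped.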

2010-2019 National Population Dataset: This data was generated as part of the Census Bureau's Population Estimates Program. It contains only counts of the US population, stratified by age and sex. These estimates are released annually. They start from the 2010 Census results and incorporate known birth, death and migration statistics to estimate the population before the next census in 2020. The datasets can be downloaded as CSV files from the Census Bureau website, along with documentation of the estimation methodology. However, the datasets were difficult to locate because of how the website is organized. The data also needed further filtering and binning before it could be combined with the SEER 21 data.
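
A minimal sketch of the 5-year binning follows. It assumes the Census file has already been reduced to one row per single year of age, with a population estimate for each year. The row layout and the `999` code for an all-ages total row are assumptions about the file and should be checked against the actual CSV. The age groups mirror the SEER convention of <1, 1-4, 5-year groups and 85+.

```python
from collections import defaultdict


def age_bin(age):
    """Map a single year of age onto SEER-style groups: <1, 1-4, 5-9, ..., 80-84, 85+."""
    if age < 1:
        return "<1"
    if age < 5:
        return "1-4"
    if age >= 85:
        return "85+"
    lo = (age // 5) * 5
    return f"{lo}-{lo + 4}"


def bin_population(rows, years=range(2010, 2018), total_code=999):
    """Sum single-year-of-age estimates into {(year, age group): population}.

    rows: iterable of (age, {year: population}) pairs. total_code is an assumed
    code for an all-ages total row, skipped so it is not counted twice.
    """
    totals = defaultdict(int)
    for age, by_year in rows:
        if age == total_code:
            continue
        group = age_bin(age)
        for year in years:
            totals[(year, group)] += by_year[year]
    return dict(totals)


# Illustrative numbers only, not Census estimates.
rows = [(60, {2010: 100}), (64, {2010: 50}), (65, {2010: 10}), (999, {2010: 1000})]
binned = bin_population(rows, years=[2010])
```

Keying the result by `(year, age group)` makes it easy to line up with SEER case counts grouped the same way.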

Analysis Tool

All data manipulation and visualization were performed in R using RStudio. This tool was selected for three main advantages: it can restructure data, it offers a large array of libraries for choosing visualizations that best fit the data, and every aspect of a visualization can be customized. Its disadvantages are that users must be comfortable coding in R and familiar with vector and matrix manipulation. They must also keep their code and data labeling highly organized. Consequently, R requires substantial upfront learning time. That investment may be less worthwhile for smaller projects, where a tool with more pre-built user interfaces would save time.

Results

Both the SEER 21 and Census datasets were read into R. The SEER 21 data was first filtered for the glioblastoma tumor type and then subset to the years 2010-2017, the only period covered by both the SEER 21 data and the 2010-2019 Census data. Cases were then counted for each year to obtain the total number of glioblastoma cases, and these counts were further stratified by age in 5-year bins. The Census data was likewise filtered to 2010-2017 and binned into the same 5-year age groups as the SEER 21 data. Case counts were then divided by the corresponding population counts to calculate the percent incidence of GBM for each year. Figure 1, Panel A shows the change in incidence rate in the whole US population over time. Overall, incidence increased over time, with a plateau in 2012-2014 and a decline in 2017. Figure 1, Panel B shows the relative change in each year's incidence rate compared with the previous year. Every year showed some increase in incidence except 2013 and 2017.
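
The division and year-over-year comparison behind Figure 1 can be sketched as follows. The function names are hypothetical, the dictionaries are keyed by year, and the counts are invented for illustration. The `per` parameter sets the scale: `per=100` gives a percentage, as in the text, and `per=100_000` gives the conventional rate per 100,000.

```python
def incidence(cases, population, per=100_000):
    """Cases per `per` people for each year present in both mappings."""
    return {y: cases[y] * per / population[y] for y in sorted(cases) if y in population}


def relative_change(rates):
    """Fractional change of each year's rate versus the previous year."""
    years = sorted(rates)
    return {y: (rates[y] - rates[p]) / rates[p] for p, y in zip(years, years[1:])}


def bar_color(change):
    """Teal for an increase over the previous year, red otherwise (zero counts as red)."""
    return "teal" if change > 0 else "red"


# Illustrative counts only, not SEER or Census figures.
cases = {2010: 300, 2011: 330, 2012: 320}
population = {2010: 10_000_000, 2011: 10_000_000, 2012: 10_000_000}
rates = incidence(cases, population)  # per 100,000
changes = relative_change(rates)      # no entry for the first year
colors = {y: bar_color(c) for y, c in changes.items()}
```

One caution applies to the denominator. The SEER 21 registries cover only part of the US, so dividing their case counts by the national population mostly tracks relative changes but understates absolute incidence.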

Figure 1. GBM Incidence in the US from 2010-2017 Based on Reported GBM Cases from SEER 21 and US Census Population Estimates

(A) Line graph of total GBM incidence rate from 2010-2017 based upon total population data. (B) Bar plots of relative change in incidence rate. Teal bars represent increased incidence from the previous year while red bars represent decreased incidence.

Looking more closely at age-stratified incidence, Figure 2, Panel A shows the change in GBM incidence over time, broken down into 5-year age bins. Individuals younger than 30 have very low incidence relative to older age groups, which makes appreciable shifts in incidence difficult to observe. Ages in the 60s-80s show some of the highest incidence, but also some of the greatest variability in reported incidence rate. Interestingly, incidence appears to fall in many older age groups in 2017. The overall population showed a clear increasing trend. By contrast, age-stratified comparisons with the previous year were highly variable, with no clear trend across age groups (Figure 2, Panels B-S).
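
Panels B-S of Figure 2 repeat the same year-over-year comparison within each age bin. The sketch below uses a hypothetical function name and invented numbers. It also shows one design choice that the low counts in younger bins force. When the previous year's rate is zero, a relative change is undefined, so it is returned as `None` rather than raising a division error.

```python
from collections import Counter


def stratified_relative_change(case_strata, population):
    """Year-over-year relative change in incidence within each stratum.

    case_strata: iterable of (year, stratum) pairs, one per GBM case.
    population: {(year, stratum): population}.
    Returns {stratum: {year: change}}; the change is None when the previous
    year's rate is zero, because a relative change from zero is undefined.
    """
    counts = Counter(case_strata)  # missing (year, stratum) pairs count as 0
    result = {}
    for stratum in sorted({s for _, s in population}):
        years = sorted(y for y, s in population if s == stratum)
        rates = {y: counts[(y, stratum)] / population[(y, stratum)] for y in years}
        result[stratum] = {
            y: None if rates[p] == 0 else (rates[y] - rates[p]) / rates[p]
            for p, y in zip(years, years[1:])
        }
    return result


# Illustrative numbers only: an older bin with a few cases, a younger bin with almost none.
population = {(2010, "60-64"): 1000, (2011, "60-64"): 1000,
              (2010, "20-24"): 1000, (2011, "20-24"): 1000}
cases = [(2010, "60-64")] * 2 + [(2011, "60-64")] * 3 + [(2011, "20-24")]
changes = stratified_relative_change(cases, population)
```

With so few cases per bin, a single extra case swings the relative change by tens of percent. This is one reason the per-age bar plots look noisy.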

Figure 2. Age-Stratified Incidence Rate of GBM from 2010-2017 Based on SEER 21 Cases and US Census Population Estimates

(A) Line graph of age stratified GBM incidence rate from 2010-2017 based upon 5-year binned population data. (B-S) Bar plots of relative change in incidence rate. Teal bars represent increased incidence from the previous year while red bars represent decreased incidence. GBM cases which occurred for individuals <1 year old were removed due to the rarity of occurrence in most years.

To assess trends by lesion location, incidence rates were re-stratified by lesion site. In Figure 3, Panel A, most locations had very low incidence because GBM rarely arises at these sites. For most sites, little change in incidence was observed. In contrast, Figure 3, Panels A, E and J show that both frontal and temporal lobe lesions rose in incidence, with a sharp drop in 2014. Frontal lobe tumors, but not temporal lobe tumors, then decreased steadily from 2016 to 2017.
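
For the site-stratified rates, one natural reading is that each site's cases are divided by the total population for that year. Under that reading, the per-site rates add up to the overall rate. The sketch below assumes that denominator; the function name and data are hypothetical.

```python
from collections import Counter


def site_incidence(cases, population, per=100_000):
    """Incidence per `per` people by lesion site and year.

    cases: iterable of (year, site) pairs, one per GBM case.
    population: {year: total population}. Every site shares the same yearly
    denominator, so the per-site rates for a year sum to that year's overall rate.
    """
    counts = Counter(cases)  # missing (year, site) pairs count as 0
    sites = sorted({s for _, s in counts})
    return {
        site: {y: counts[(y, site)] * per / population[y] for y in sorted(population)}
        for site in sites
    }


# Illustrative numbers only.
cases = [(2010, "Frontal lobe")] * 3 + [(2010, "Temporal lobe")] * 2 + [(2011, "Frontal lobe")] * 4
population = {2010: 1_000_000, 2011: 1_000_000}
rates = site_incidence(cases, population)
```

The resulting per-site, per-year rates can then be fed to the same year-over-year comparison used for the age bins.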

Figure 3. Lesion Location-Stratified Incidence Rate of GBM from 2010-2017 Based on SEER 21 Cases and US Census Population Estimates

(A) Line graph of lesion location stratified GBM incidence rate from 2010-2017 based upon 5-year binned population data. (B-S) Bar plots of relative change in incidence rate. Teal bars represent increased incidence from the previous year while red bars represent decreased incidence. NOS represents lesions that were not otherwise specified.

Discussion

The increase did not reach the levels claimed in some areas, such as France, where new GBM cases reportedly rose fourfold (Phonegate, 2019). Even so, the total-population analysis supported the hypothesis that using US Census data would reveal an increasing trend in US GBM incidence, unlike what had been reported by CBTRUS. However, this support comes with several caveats. First, overall GBM incidence in the US is very low, at only about 3 cases per 100,000 individuals (Tamimi & Juweid, 2017). Consequently, small absolute deviations in the incidence rate appear large in relative terms because the baseline is so low. Second, although the SEER 21 data included thousands of cases, its restriction to 21 registries meant that not all GBM cases in the US were captured. CBTRUS analyzed more cases by combining SEER data with data from additional registries (Ostrom et al., 2018). Those data were not used here because CBTRUS requires a formal request and a project description before releasing data. Finally, as shown in Figure 2, the increase seen in the total population disappeared when the population was stratified by age, unlike what was reported in European analyses.
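
The point about a low baseline can be made concrete with a rough interval around a yearly rate. This sketch treats a year's case count as Poisson and uses the normal approximation for a 95% interval; the function name is hypothetical. The figures of 300 cases in 10 million people are invented to match the roughly 3 per 100,000 baseline cited above. With only a few hundred cases, a rise from 3.0 to 3.1 per 100,000 is a relative change of about 3.3%. Yet it lies inside the interval of roughly 2.66 to 3.34 for the earlier year.

```python
import math


def rate_with_ci(cases, population, per=100_000, z=1.96):
    """Rate per `per` people with an approximate 95% confidence interval.

    Treats the case count as Poisson, so its standard error is sqrt(cases).
    The normal approximation is rough when counts are small.
    """
    rate = cases * per / population
    half_width = z * math.sqrt(cases) * per / population
    return rate, rate - half_width, rate + half_width


# Illustrative numbers only: about 3 cases per 100,000.
rate, lo, hi = rate_with_ci(300, 10_000_000)
next_year = 3.1
relative = (next_year - rate) / rate  # about 0.033, i.e. a 3.3% rise
```

More cases narrow the interval in proportion to `sqrt(cases)`. Pooling registries, as CBTRUS does, therefore makes small year-to-year shifts easier to distinguish from noise.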

Of interest, the total-population trend showed a plateau in 2016 followed by a decline in 2017 (Figure 1). The age-stratified incidence rates in Figure 2 showed similar drops in 2017 in older age groups, the population in which GBM is most commonly diagnosed (Tamimi & Juweid, 2017). This period coincides with the release of the 2016 WHO update to the classification of CNS tumors. The update better defined the criteria for diagnosing gliomas and introduced molecular testing into the primary diagnostic workflow (Louis et al., 2016). It could therefore be speculated that this drop in incidence supports the notion that overdiagnosis was a primary driver of the observed rise. This interpretation is consistent with the fact that many GBM incidence studies retrospectively analyze data from before, or right up until, the implementation of the new classification. It will therefore be important to reanalyze these trends with more recent data, once the new WHO standards are in more common practice.

Finally, the lesion-location trends in frontal and temporal lobe tumors were notable because these are the most common sites of GBM (Tamimi & Juweid, 2017). Frontal lobe tumors fell from 2016 to 2017, in line with the overall trend, whereas temporal lobe tumors continued to rise (Figure 3). It would be interesting to understand whether this trend in temporal lobe glioblastoma has a biological explanation. One potential hypothesis involves HPV, which has been associated with worse prognosis in GBM (Vidone et al., 2014), and its proposed predilection for temporal lobe inflammatory lesions in the brain. Because these lesions more often affect younger adults, this relationship may be easily lost in analysis, given the low incidence of GBM at younger ages. Age-stratified study of temporal lobe lesions may therefore prove valuable. So may studies that better capture medical history, such as HPV status.

Overall, studying GBM incidence trends is difficult because the disease's incidence is naturally low. However, some experts have claimed that the issue does not merit discussion. The difficulty of treating GBM and the need to better understand its etiology warrant deeper analysis nonetheless. More importantly, working with the SEER 21 dataset showed that data coding and reporting remain messy. More stringent logging is needed for the data to reflect trends accurately, especially in rare diseases.

References

Calderón-Garcidueñas, L., Solt, A. C., Henríquez-Roldán, C., Torres-Jardón, R., Nuse, B., Herritt, L., Villarreal-Calderón, R., Osnaya, N., Stone, I., García, R., Brooks, D. M., González-Maciel, A., Reynoso-Robles, R., Delgado-Chávez, R., & Reed, W. (2008). Long-term Air Pollution Exposure Is Associated with Neuroinflammation, an Altered Innate Immune Response, Disruption of the Blood-Brain Barrier, Ultrafine Particulate Deposition, and Accumulation of Amyloid β-42 and α-Synuclein in Children and Young Adults. Toxicologic Pathology. https://doi.org/10.1177/0192623307313011

Gousias, K., Markou, M., Voulgaris, S., Goussia, A., Voulgari, P., Bai, M., Polyzoidis, K., Kyritsis, A., & Alamanos, Y. (2009). Descriptive Epidemiology of Cerebral Gliomas in Northwest Greece and Study of Potential Predisposing Factors, 2005–2007. Neuroepidemiology, 33(2), 89–95. https://doi.org/10.1159/000222090

Grech, N., Dalli, T., Mizzi, S., Meilak, L., Calleja, N., & Zrinzo, A. (2020). Rising Incidence of Glioblastoma Multiforme in a Well-Defined Population. Cureus, 12(5). https://www.cureus.com/articles/31024-rising-incidence-of-glioblastoma-multiforme-in-a-well-defined-population

Korja, M., Raj, R., Seppä, K., Luostarinen, T., Malila, N., Seppälä, M., Mäenpää, H., & Pitkäniemi, J. (2019). Glioblastoma survival is improving despite increasing incidence rates: A nationwide study between 2000 and 2013 in Finland. Neuro-Oncology, 21(3), 370–379. https://doi.org/10.1093/neuonc/noy164

Lee, C. H., Jung, K. W., Yoo, H., Park, S., & Lee, S. H. (2010). Epidemiology of Primary Brain and Central Nervous System Tumors in Korea. Journal of Korean Neurosurgical Society, 48(2), 145–152. https://doi.org/10.3340/jkns.2010.48.2.145

Louis, D. N., Perry, A., Reifenberger, G., von Deimling, A., Figarella-Branger, D., Cavenee, W. K., Ohgaki, H., Wiestler, O. D., Kleihues, P., & Ellison, D. W. (2016). The 2016 World Health Organization Classification of Tumors of the Central Nervous System: a summary. Acta Neuropathologica, 131(6), 803–820. https://doi.org/10.1007/s00401-016-1545-1

Ostrom, Q. T., Bauchet, L., Davis, F. G., Deltour, I., Fisher, J. L., Langer, C. E., Pekmezci, M., Schwartzbaum, J. A., Turner, M. C., Walsh, K. M., Wrensch, M. R., & Barnholtz-Sloan, J. S. (2014). The epidemiology of glioma in adults: A “state of the science” review. Neuro-Oncology, 16(7), 896–913. https://doi.org/10.1093/neuonc/nou087

Ostrom, Q. T., Gittleman, H., Truitt, G., Boscia, A., Kruchko, C., & Barnholtz-Sloan, J. S. (2018). CBTRUS Statistical Report: Primary Brain and Other Central Nervous System Tumors Diagnosed in the United States in 2011-2015. Neuro-Oncology, 20(suppl_4), iv1–iv86. https://doi.org/10.1093/neuonc/noy131

Philips, A., Henshaw, D. L., Lamburn, G., & O’Carroll, M. J. (2018a). Brain Tumours: Rise in Glioblastoma Multiforme Incidence in England 1995–2015 Suggests an Adverse Environmental or Lifestyle Factor. Journal of Environmental and Public Health, 2018, e7910754. https://doi.org/10.1155/2018/7910754

Philips, A., Henshaw, D. L., Lamburn, G., & O’Carroll, M. J. (2018b, June 25). Authors’ Comment on “Brain Tumours: Rise in Glioblastoma Multiforme Incidence in England 1995–2015 Suggests an Adverse Environmental or Lifestyle Factor” (Vol. 2018, p. e2170208) [Letter to the Editor]. Hindawi. https://doi.org/10.1155/2018/2170208

Phonegate, E. (2019, November 18). [Press release] Brain cancers: 4 times more new cases of glioblastoma in 2018 according to Public Health France. Phonegate Alert. https://www.phonegatealert.org/en/press-release-brain-cancers-4-times-more-new-cases-of-glioblastoma-in-2018-according-to-public-health-france

Tamimi, A. F., & Juweid, M. (2017). Epidemiology and Outcome of Glioblastoma. In S. De Vleeschouwer (Ed.), Glioblastoma. Codon Publications. http://www.ncbi.nlm.nih.gov/books/NBK470003/

Vidone, M., Alessandrini, F., Marucci, G., Farnedi, A., de Biase, D., Ricceri, F., Calabrese, C., Kurelac, I., Porcelli, A. M., Cricca, M., & Gasparre, G. (2014). Evidence of association of human papillomavirus with prognosis worsening in glioblastoma multiforme. Neuro-Oncology, 16(2), 298–302. https://doi.org/10.1093/neuonc/not140