Preface

Early detection and prevention of disease offer numerous benefits to both our health and society. Often, the earlier a disease is detected, the higher the likelihood of successful cure or management. Managing a disease in its early stages can significantly reduce its impact on a patient's quality of life and decrease healthcare costs. To detect a disease early, disease screening has become a popular tool. This method aims to determine the likelihood that a given patient has a particular disease by applying medical procedures or tests to check the major risk factors, even in patients without obvious symptoms of the disease. While disease screening primarily focuses on individual patients, disease surveillance aims to detect disease outbreaks early within a given population. For example, our society faces constant threats from bioterrorist attacks and pandemic influenza. It is thus important to monitor the incidence of infectious diseases continuously and detect their outbreaks promptly. This allows governments and individuals to implement timely disease control and prevention measures, minimizing the impact of these diseases. This book introduces some recent analytic methodologies and software packages developed for effective disease screening and disease surveillance.

My exploration of disease screening was motivated by an experience around 2010 when I analyzed a dataset from the Framingham Heart Study (FHS). The FHS primarily aims to identify major risk factors for cardiovascular diseases (CVDs), and numerous CVD risk factors have been recognized since the study's inception in 1948, including smoking, high blood pressure, obesity, high cholesterol levels, physical inactivity, and more. During my data analysis, a pivotal question emerged: Could the identified CVD risk factors be used to predict the likelihood of a severe CVD, such as stroke, for individual patients?
Statistically, this translates into a sequential decision-making problem, for which the relevant statistical tools are statistical process control (SPC) charts. However, traditional SPC charts, developed primarily for monitoring production lines in manufacturing, assume that process observations are independent and identically distributed when the process is in-control (IC), and they are designed for monitoring a single sequential process. In the context of disease screening, observed data of a patient's disease risk factors would rarely be independent and identically distributed over time, and treating each patient's observed data as a process yields many processes, one per patient. Both features make traditional SPC charts unsuitable. Recognizing the importance of the disease screening problem, I dedicated much of the past decade to addressing this issue. This endeavor led to the development of a series of new concepts and methods by my research team. The central methodology, termed the Dynamic Screening System (DySS), operates as follows. First, the regular longitudinal pattern of disease risk factors is estimated from a pre-collected dataset representing the population without the target disease. Second, a patient's observed pattern of disease risk factors is cross-sectionally compared with the estimated regular longitudinal pattern at each observation time. Third, the cumulative difference between the two patterns up to the current time is used to determine the patient's disease status at that time. DySS uses all historical data of the patient in its decision-making and effectively accommodates the complex data structure, including time-varying data distribution.

In the summer of 2013, upon joining the University of Florida (UF), I started to work on the pressing issue of disease surveillance due to its paramount importance in public health.
Disease incidence data are typically collected sequentially over time and across multiple locations or regions, constituting spatio-temporal data. Similar to disease screening, disease surveillance is a sequential decision-making problem. However, its complexity arises from the intricate spatio-temporal data structure, encompassing seasonality, temporal/spatial variation, data correlation, and complicated data distributions. Common disease reporting and surveillance systems incorporate conventional SPC charts such as the cumulative sum (CUSUM) and exponentially weighted moving average (EWMA) charts. Additionally, retrospective methods like scan tests and generalized linear modeling approaches are employed for routine surveillance. Unfortunately, these methods often prove ineffective or unreliable due to their inability to handle the sequential nature of the problem or their restrictive model assumptions (cf. Section 2.7 and Chapters 7 and 8). Over the past decade, my research team has devoted significant effort to this domain, resulting in the development of several novel analytic methods for disease surveillance. Our initial method operates as follows. First, a nonparametric spatio-temporal modeling approach is employed to estimate the regular spatio-temporal pattern of disease incidence rates from observed data in a baseline time interval (e.g., a previous year without outbreaks). Second, the new spatial data collected at the current time are compared with the estimated regular pattern and decorrelated with all previous data. Third, an SPC chart is applied to the decorrelated data to determine whether a disease outbreak has occurred by the current time. Modified versions of this method have been developed to incorporate covariate information and accommodate specific spatial features of disease outbreaks. These methods handle the complex structure of observed data and have demonstrated effectiveness in disease surveillance.
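
To make the three steps above concrete, here is a minimal Python sketch. It is a simplified illustration under stated assumptions, not the method developed by the author's team and not the interface of the SpTe2M package: the regular pattern is estimated region by region with a Gaussian-kernel smoother over time only (no spatial smoothing), the decorrelation step is skipped entirely, and the chart is a conventional upward CUSUM. All function names, the bandwidth, the allowance, and the control limit are hypothetical choices made for illustration.

```python
import math
import random


def kernel_smooth(values, bandwidth=3.0):
    """Gaussian-kernel (Nadaraya-Watson) smoother over the time index."""
    n = len(values)
    smoothed = []
    for t in range(n):
        weights = [math.exp(-0.5 * ((t - s) / bandwidth) ** 2) for s in range(n)]
        total = sum(weights)
        smoothed.append(sum(w * v for w, v in zip(weights, values)) / total)
    return smoothed


def estimate_regular_pattern(baseline):
    """Step 1 (simplified): smoothed mean curve and residual SD per region.

    baseline[r][t] is the incidence rate of region r at baseline time t.
    """
    pattern = []
    for series in baseline:
        mean_curve = kernel_smooth(series)
        resid = [y - m for y, m in zip(series, mean_curve)]
        sd = math.sqrt(sum(e * e for e in resid) / (len(resid) - 1))
        pattern.append((mean_curve, sd))
    return pattern


def cusum_monitor(new_data, pattern, allowance=0.5, limit=4.0):
    """Steps 2 and 3 (simplified): standardize, then run an upward CUSUM.

    Assumes new time t lines up with baseline time t (same point in the
    season) and that new_data has no more time points than the baseline.
    Decorrelation with previous data is NOT performed in this sketch.
    Returns (first signal time or None, list of charting statistics).
    """
    n_regions = len(new_data)
    n_times = len(new_data[0])
    stat, path, signal = 0.0, [], None
    for t in range(n_times):
        z = [(new_data[r][t] - pattern[r][0][t]) / pattern[r][1]
             for r in range(n_regions)]
        # Sum of standardized residuals divided by sqrt(n_regions); this has
        # unit variance only if the regions are independent.
        z_bar = sum(z) / math.sqrt(n_regions)
        stat = max(0.0, stat + z_bar - allowance)
        path.append(stat)
        if signal is None and stat > limit:
            signal = t
    return signal, path


if __name__ == "__main__":
    random.seed(1)
    weeks, regions = 52, 4

    def simulate(shift_start=None, shift=0.0):
        """Seasonal weekly rates with an optional upward mean shift."""
        data = []
        for _ in range(regions):
            row = []
            for t in range(weeks):
                mean = 10.0 + 3.0 * math.sin(2.0 * math.pi * t / 52.0)
                if shift_start is not None and t >= shift_start:
                    mean += shift
                row.append(mean + random.gauss(0.0, 1.0))
            data.append(row)
        return data

    baseline = simulate()                          # year without outbreaks
    current = simulate(shift_start=30, shift=2.0)  # outbreak from week 30
    signal, _ = cusum_monitor(current, estimate_regular_pattern(baseline))
    print("first signal at week index:", signal)
```

Comparing each new observation with the baseline estimate at the same time index is what lets the chart ignore ordinary seasonality. In real incidence data, temporal and spatial correlation would distort the false-alarm rate of this simplified chart, which is why the method described above includes a decorrelation step.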

As discussed earlier, both disease screening and disease surveillance are challenging sequential decision-making problems, and traditional SPC charts cannot address them reliably. Consequently, disease screening and disease surveillance emerge as crucial applications of SPC, demanding the development of new methods tailored to their specific requirements. Fortuitously, my research journey in SPC began in 1998, allowing me to contribute to several key areas within the field. Notable contributions include advancements in nonparametric process monitoring (e.g., Qiu and Hawkins 2001, Qiu 2018), monitoring correlated data (e.g., Qiu et al. 2020a, Xue and Qiu 2021), dynamic process monitoring (e.g., Qiu and Xiang 2014, Xie and Qiu 2023a), profile monitoring (e.g., Qiu et al. 2010, Zhou and Qiu 2022), and more. For a comprehensive description of SPC and some SPC charts developed by my research group, see the book Qiu (2014). This experience has proven invaluable in my exploration of disease screening and disease surveillance, providing a foundation for tailoring SPC methodologies to the distinctive challenges of these critical areas of public health.

The book comprises nine chapters. Chapter 1 gives a concise introduction that sets the stage for understanding the challenges posed by disease screening and surveillance problems. Chapter 2 covers fundamental statistical concepts and methods commonly employed in data modeling and analysis. Given that disease screening and surveillance involve sequential decision-making, Chapter 3 introduces essential SPC concepts and methods, which are a major statistical tool for such problems. Chapters 4-6 focus on recent developments in DySS methods tailored for effective disease screening.
Chapter 4 covers univariate and multivariate DySS methods based on direct monitoring of observed disease risk factors, while Chapter 5 introduces methods based on disease risk quantification and sequential monitoring of quantified disease risks. The practical implementation of DySS methods by the R package DySS is detailed in Chapter 6. Chapters 7-9 shift the focus to disease surveillance. Chapter 7 explores traditional methods utilizing the Knox test, scan statistics, and generalized linear modeling. Chapter 8 presents recent methods developed by my research team based on nonparametric spatio-temporal data modeling and monitoring. The implementation of these methods is demonstrated using the R package SpTe2M in Chapter 9.

This book can serve as a primary textbook for a one-semester course focused on disease screening and/or disease surveillance, tailored for graduate students in biostatistics, bioinformatics, health data science, and related disciplines. Additionally, the book can be used as a supplementary textbook for courses covering analytic methods and tools relevant to medical and public health studies. Its content is designed to be accessible and beneficial for medical and public health researchers and practitioners. By introducing recent analytic tools for disease screening and surveillance, the book equips readers with methods that can be easily implemented using the accompanying R packages DySS and SpTe2M.

I extend my sincere gratitude to my current and former students and collaborators, Drs. Jun Li, Dongdong Xiang, Kai Yang, Lu You, and Jingnan Zhang, whose dedicated efforts, stimulating discussions, and constructive comments have played an invaluable role in the completion of this book. Their patience and insights have been indispensable. I express my deep appreciation to Dr. Xiulin Xie and Mr. Zibo Tian, who generously dedicated their time to reading the entire book manuscript and diligently corrected numerous typos and mistakes.

Completing this book has been a three-year journey, and I owe a debt of gratitude to my wife, Yan, for providing unwavering help and support. Her efforts in managing household responsibilities and caring for our two sons, Andrew and Alan, allowed me to focus on this project. I extend my heartfelt thanks to my family for their love and constant support throughout this endeavor.

Peihua Qiu
Gainesville, Florida
November 2023