A machine learning-derived polygenic risk score reveals that healthy lifestyle counteracts obesity-related mortality
To the best of our knowledge, this is the first study to evaluate the joint association of the genetic risk of obesity and lifestyle factors with all-cause mortality. The results of this population-based cohort study show that meeting all four healthy lifestyle factors is associated with a reduced risk of death among individuals with a high genetic risk of obesity.
We developed a machine learning (ML) model to quantitatively assess the genetic risk of obesity. The model generated the OPRS from the data of a large population, the UK Biobank cohort, and we validated it using physical examination data from an external test cohort at Nanfang Hospital. The model is based on a stacked ML algorithm. Stacked models generally achieve higher predictive performance and better generalizability than single models and can reduce overfitting. In the external test cohort, model performance declined moderately. This decline may be partially attributable to differences in genetic background between the European population and the current test cohort, including potential heterogeneity in allele frequency distributions and SNP effect sizes. Furthermore, systematic disparities in lifestyle, cultural context, socioeconomic environment, and healthcare systems may contribute to a lower tendency toward obesity in Chinese populations than in European populations at similar levels of genetic risk, thereby affecting the model's generalizability. Nevertheless, the OPRS remained significantly and positively correlated with obesity phenotypes in the external test cohort. This suggests that the genetic risk it captures is at least partly transferable across populations.
A key feature of our ML-derived OPRS is its methodological difference from classical polygenic risk scores (PRS). Both aim to aggregate genetic susceptibility. However, our stacked ML model integrates multiple algorithms and is specifically designed to capture non-linear and complex interactions among SNPs, which may improve predictive performance. In terms of predictive ability, the OPRS achieved AUCs of 0.621, 0.616, and 0.565 in the training, internal, and external test sets, respectively. These values are comparable to those of recently reported obesity PRS models23,24,25, whose AUCs range from 0.55 to 0.62. This indicates that, despite its more complex architecture, the OPRS performs similarly to conventional PRS in terms of discriminative accuracy. A distinctive advantage of our model, however, is its transferability across ethnicities, as supported by external validation in a Chinese cohort. This feature is often lacking in classical PRS derived from single-ancestry populations. We acknowledge that our model requires more computational resources during training than classical weighted-sum PRS. Its individual-level prediction, however, is fast and automatable. Once generated, the OPRS can serve as a static, long-term risk indicator, facilitating its translation and application in clinical settings. As an artificial intelligence-derived marker quantifying the genetic risk of obesity, the OPRS retains considerable value for precision prevention, despite its moderate discriminative power at the individual level in external validation. Its primary utility lies in population risk stratification. As a proactive screening tool, it can identify individuals with high genetic susceptibility and elevated mortality risk, including those not currently diagnosed with obesity, thereby enabling early and targeted risk assessment.
Importantly, the OPRS supports personalized intervention. It provides individuals with actionable risk information, together with the motivational evidence from this study that adherence to a healthy lifestyle may substantially mitigate this inherited risk. Thus, the OPRS may serve as an effective tool for motivating behavioral change and can guide clinicians in offering tailored recommendations for obesity risk management. To minimize the burden on clinicians, we envision a user-friendly, web-based OPRS calculator or the integration of the model into electronic health record (EHR) systems to streamline its application. Future prospective studies should validate these pathways in diverse populations to facilitate broader implementation in clinical guidelines. However, deployment of the OPRS requires stringent data protection measures to safeguard participant privacy, including compliance with regulations such as the General Data Protection Regulation (GDPR). Additionally, to minimize psychological harm, risk communication should be framed in terms of modifiable lifestyle factors, so that it empowers individuals rather than inducing fatalism. Finally, ongoing efforts to validate the model in different ethnic groups will help mitigate bias and promote equitable access to genetic risk assessment.
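For comparison with the classical approach mentioned above, a weighted-sum PRS for individual i is usually computed as PRS_i = Σ_j β_j g_ij. Here g_ij ∈ {0, 1, 2} is the effect-allele dosage at SNP j and β_j is its GWAS effect size. The AUC reported for any score equals the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half. The Python sketch below computes both on a hypothetical toy dataset. The effect sizes, genotypes, and labels are invented for illustration and are not from this study. The stacked ML model replaces the linear sum with a learned, non-linear combination of SNPs, but its AUC is evaluated in the same way.

```python
from itertools import product


def weighted_sum_prs(genotypes, betas):
    """Classical PRS: sum of effect-allele dosages (0, 1 or 2) times per-SNP effect sizes."""
    if len(genotypes) != len(betas):
        raise ValueError("one effect size per SNP is required")
    return sum(g * b for g, b in zip(genotypes, betas))


def auc(scores, labels):
    """AUC as the Mann-Whitney probability that a case outscores a control (ties count 0.5)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case_score, control_score in product(cases, controls):
        if case_score > control_score:
            wins += 1.0
        elif case_score == control_score:
            wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical toy data: 3 SNPs, 6 individuals (label 1 = obese, 0 = not obese).
betas = [0.10, 0.05, 0.20]
genotypes = [[2, 1, 0], [0, 0, 1], [1, 2, 2], [0, 1, 0], [2, 2, 1], [1, 0, 0]]
labels = [1, 0, 1, 0, 0, 1]

scores = [weighted_sum_prs(g, betas) for g in genotypes]
print([round(s, 2) for s in scores])  # → [0.25, 0.2, 0.6, 0.05, 0.5, 0.1]
print(round(auc(scores, labels), 3))  # → 0.667 (6 of 9 case-control pairs ranked correctly)
```

An AUC of 0.5 corresponds to chance. An external-set AUC of 0.565 therefore means that a case outranks a control in roughly 56.5% of case-control pairs. This is consistent with the "moderate discriminative power at the individual level" noted above.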
The 508 genes constituting the OPRS were significantly enriched across multiple biological layers. In terms of biological processes, they converged on three core axes: transcriptional regulation, central nervous system function, and metabolic signaling. At the cellular component level, they localized to key structures such as neuronal synapses and chromatin. In terms of molecular function, they were primarily concentrated in transcription factor and kinase activities (Supplementary Fig. 13). These results suggest that the genetic risk captured by the OPRS may contribute to obesity pathogenesis through alterations in central nervous system signaling. This is consistent with previous findings that genes implicated by obesity-associated variants are predominantly expressed in the central nervous system26,27.
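This section does not state which statistical test underlies the enrichment results in Supplementary Fig. 13. A common choice in Gene Ontology over-representation analysis is the one-sided hypergeometric test, which is sketched below in Python as an assumption rather than as the study's method. The test asks how likely it is to see at least k term-annotated genes in a query list of n genes drawn from a universe of N genes, K of which carry the annotation. The background size, term size, and overlap in the example are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(background, term_size, list_size, overlap):
    """One-sided P(X >= overlap) for X ~ Hypergeometric(background, term_size, list_size).

    background: number of genes in the reference universe (N)
    term_size:  genes in the universe annotated to the term (K)
    list_size:  genes in the query list, e.g. genes mapped from score SNPs (n)
    overlap:    query genes annotated to the term (k)
    """
    total = comb(background, list_size)
    upper = min(term_size, list_size)
    # Exact integer arithmetic for the upper tail; a single division at the end.
    tail = sum(
        comb(term_size, x) * comb(background - term_size, list_size - x)
        for x in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 20,000 background genes, a term with 200 genes,
# a 508-gene query list, and 15 query genes annotated to the term.
expected = 508 * 200 / 20000  # overlap expected by chance (5.08)
p = hypergeom_enrichment_p(20000, 200, 508, 15)
print(f"expected overlap {expected:.2f}, observed 15, one-sided p = {p:.2e}")
```

In practice, thousands of terms are tested at once, so the resulting p-values are normally adjusted for multiple testing (for example, with the Benjamini-Hochberg procedure) before a term is called significantly enriched.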
Several studies have evaluated the risk of obesity and its complications using polygenic risk scores. A study of a BMI-related PRS in a Korean population showed that a higher PRS was associated with a higher incidence of obesity and related diseases28. Another study of a BMI-related PRS constructed using data from the US population showed that participants with a high PRS had a higher mortality risk than those with a low PRS29. Neither study examined the impact of lifestyle. One study used the UK Biobank to assess the associations of genetic risk, lifestyle, and their interaction with obesity and obesity-related morbidities (ORMs)30. It found that after adjustment for measured BMI, the association of the PRS with ORMs was null. In contrast, we found that the OPRS independently predicted mortality risk, irrespective of measured BMI. To our knowledge, our study is the first to use ML methods to construct an OPRS for assessing all-cause mortality risk and to examine the impact of lifestyle.
Among individuals with normal weight and those with overweight, we observed a linear relationship between the OPRS and all-cause mortality. This finding is particularly useful because it allows clinicians to quantitatively assess the mortality risk associated with a genetic predisposition to obesity. In the internal test cohort, individuals with obesity showed a non-linear relationship between the OPRS and all-cause mortality. One possible explanation is that at low OPRS values, fewer risk variants contribute to cumulative risk and their effect on mortality is smaller, so a larger sample size would be required to detect a difference. Another possible explanation is that at low OPRS values, non-genetic factors are the primary drivers of obesity, and obesity driven by these factors also contributes to an elevated risk of all-cause mortality29. Additionally, restricting the study population to individuals with obesity may introduce collider bias. Obesity is influenced by both the OPRS and other obesity-related risk factors. Conditioning on it can therefore induce a spurious negative association between the two. If these risk factors are also linked to mortality, this could result in an increased risk of death at lower OPRS levels. A high OPRS was associated with an increased risk of all-cause mortality, which helps identify high-risk individuals. As the prevalence of obesity continues to rise, this trend is becoming increasingly difficult to contain. Therefore, when medical resources are limited, prioritizing high-risk individuals is crucial for maximizing health benefits.
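The collider-bias argument can be illustrated with a small simulation. The Python sketch below assumes, purely for illustration, that a genetic score G and a non-genetic risk factor E are independent standard normal variables. It further assumes that obesity occurs when G + E exceeds a threshold. The function names, threshold, and sample size are hypothetical and do not reflect the study's data or model. Restricting the sample to the simulated obese group produces a clearly negative correlation between G and E, even though the two are essentially uncorrelated in the full cohort.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient, written out to avoid external dependencies."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate_collider(n=20000, threshold=1.0, seed=1):
    """Hypothetical model: G and E are independent; 'obesity' occurs when G + E > threshold.

    Returns the G-E correlation in the whole cohort and within the obese subgroup.
    """
    rng = random.Random(seed)
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]
    e = [rng.gauss(0.0, 1.0) for _ in range(n)]
    obese = [gi + ei > threshold for gi, ei in zip(g, e)]
    g_obese = [gi for gi, is_obese in zip(g, obese) if is_obese]
    e_obese = [ei for ei, is_obese in zip(e, obese) if is_obese]
    return pearson(g, e), pearson(g_obese, e_obese)


r_all, r_obese = simulate_collider()
print(f"whole cohort r = {r_all:.3f}; obese-only r = {r_obese:.3f}")
```

Suppose E also raised mortality in this toy setting. Obese individuals with a low G would then tend to carry more E, which is one way an apparently elevated death rate could emerge at low OPRS values.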
Individuals with a high genetic risk of obesity tended to have an increased risk of all-cause mortality even when they met all four healthy lifestyle factors. Although this residual excess risk was not statistically significant, it suggests that a healthy lifestyle may not completely counterbalance the increased mortality risk associated with a high genetic risk of obesity. Consistently, a UK Biobank study showed that adherence to four healthy lifestyle factors did not completely offset the health risks of obesity31. Nevertheless, even if it cannot fully offset the effects of genetic risk for obesity, a healthy lifestyle should still be encouraged to mitigate the associated mortality risk. Many studies have emphasized the benefits of maintaining a normal weight in preventing premature death32,33,34,35,36. Therefore, to reduce mortality risk in individuals with a high genetic risk of obesity, it is important to promote healthy lifestyle habits and the maintenance of a normal weight.
This study has several limitations. First, causal inferences cannot be drawn because of the observational nature of the study. Second, the model was developed using data collected in Europe and externally validated in a Chinese cohort, which may have introduced dataset shift between the development and validation populations. Third, the use of self-reported data for exposures such as smoking, alcohol consumption, physical activity, and diet may have introduced measurement error. Such error may have led to underestimation of the true associations of these exposures and to residual confounding. Furthermore, even after adjustment for a range of covariates, residual confounding by factors such as socioeconomic status, mental health status, and dietary patterns cannot be ruled out. Fourth, our analyses of the interactions between genetic risk and lifestyle factors, and of the protective effects of a healthy lifestyle, were conducted entirely within the UK Biobank cohort. The lifestyle definitions used were also based on a Western context. Because our external Chinese validation cohort lacked comparable lifestyle data, we could not examine whether the observed lifestyle benefits generalize across different cultural and environmental settings. Future studies that incorporate culturally adapted lifestyle assessments in diverse populations are needed to confirm the universality of these interactions. More broadly, further prospective studies in different populations are needed to validate the health effects of the OPRS. These studies should consider factors such as cost-effectiveness, resource allocation, and acceptance in clinical settings, which are critical for its clinical application.
In summary, we established and validated a reliable and practical genetics-based ML model and used it to construct the OPRS. A high OPRS was associated with an increased risk of all-cause mortality. Among individuals with a high genetic risk of obesity, adherence to a healthy lifestyle was associated with a lower all-cause mortality risk. These findings can help identify individuals with a high genetic risk of obesity and provide them with appropriate management advice, with the aim of reducing their mortality risk. To further advance the clinical application of the OPRS, future studies should include trans-ethnic validation across diverse populations to establish generalizability. Prospective interventional trials are also needed to evaluate the efficacy of targeted interventions in high-risk individuals identified by the OPRS.
