The digitalization of health care leads to the accumulation of large amounts of biomedical data, which are used in clinical research to uncover therapies, treatments, and novel biomarkers. One important set of tools in clinical research is time-to-event analysis. These methods are designed for censored data, in which the exact time of an event is unknown for some subjects because the event does not necessarily occur during the observation period. Such biomedical and clinical datasets are typically collected centrally at a single institution and then analyzed using statistical methods or machine learning. Gathering larger amounts of data therefore requires sharing data with a central institution. However, current privacy regulations make it difficult to share sensitive data with other institutions and pool them at a central site. Federated learning was recently introduced to address this issue. It enables the application of machine learning to geographically distributed datasets: the raw data of each institution stay local, and only model parameters or summary statistics are shared with a central aggregator. Despite recent advances in this field, there are still only a few accessible and privacy-preserving solutions for biomedical research, especially for time-to-event analysis.
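To illustrate how summary statistics can replace raw data in a federated time-to-event analysis, the following minimal Python sketch computes a Kaplan-Meier survival curve from per-site event and censoring counts. It is an illustration under stated assumptions, not the implementation used in Partea or FeatureCloud: the function names `local_summary` and `aggregate_km` and the toy data are hypothetical, and privacy-enhancing technologies are omitted entirely, so the shared counts are sent in the clear.

```python
from collections import Counter


def local_summary(times, events):
    """Runs at each site: count events and censorings per observed time.

    Only these counts leave the site, never the individual records.
    (Hypothetical helper for illustration.)
    """
    deaths, censored = Counter(), Counter()
    for t, e in zip(times, events):
        if e:
            deaths[t] += 1
        else:
            censored[t] += 1
    return deaths, censored


def aggregate_km(summaries):
    """Runs at the aggregator: sum the counts, then apply the
    Kaplan-Meier product-limit estimator to the pooled counts."""
    deaths, censored = Counter(), Counter()
    for d, c in summaries:
        deaths.update(d)      # Counter.update adds counts
        censored.update(c)

    at_risk = sum(deaths.values()) + sum(censored.values())
    survival = 1.0
    curve = []
    for t in sorted(set(deaths) | set(censored)):
        d = deaths[t]
        if d > 0:
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Subjects with an event or censoring at t leave the risk set.
        at_risk -= d + censored[t]
    return curve


# Toy data for two sites (event flag 1 = event observed, 0 = censored).
site_a = local_summary([2, 3, 5], [1, 0, 1])
site_b = local_summary([2, 4, 5], [1, 1, 0])
federated_curve = aggregate_km([site_a, site_b])
```

Because the Kaplan-Meier estimator depends on the data only through the number of events and the number at risk at each time point, the curve obtained from the summed counts matches the curve computed on the pooled records. Note that exact per-time counts can still reveal information about individuals at small sites, which is one motivation for adding privacy-enhancing technologies on top of such a scheme.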
The results of this cumulative dissertation are based on three main publications. The first publication introduces Partea, a platform for privacy-aware time-to-event analysis. Partea incorporates the most commonly employed time-to-event techniques and makes them accessible through a graphical user interface without requiring any programming expertise. The second publication describes FeatureCloud, a federated learning platform that goes beyond time-to-event analysis and enables both the use and development of federated learning algorithms by providing the necessary infrastructure. Finally, in the third publication, FeatureCloud was used to develop and evaluate a federated survival support vector machine for the analysis of distributed time-to-event data.
The methods and tools developed in this work extend existing approaches for analyzing time-to-event data on decentralized datasets and are directly accessible to researchers, statisticians, and clinicians. Furthermore, the dissertation demonstrates that federated learning algorithms can achieve accuracy on distributed datasets comparable to that of the original algorithms, which operate only on centrally collected data. By providing a broader set of algorithms, implementing privacy-enhancing technologies, and offering user-friendly interfaces, the results of this dissertation increase the accessibility of federated learning in biomedical and clinical research environments and lower the barriers posed by complex federated learning infrastructures.