
Federated limma remove batch effect (fedRBE)





Architecture overview

The Federated Limma Remove Batch Effect (fedRBE) tool is a federated implementation of the limma removeBatchEffect algorithm, developed within the FeatureCloud platform. It enables privacy-preserving batch effect correction: raw data stay decentralized at each participant, and Secure Multiparty Computation (SMPC) is used for secure data aggregation.
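
To illustrate the kind of secure aggregation SMPC provides, here is a minimal sketch of additive secret sharing. It is a generic textbook illustration, not FeatureCloud's actual SMPC protocol; the modulus PRIME, the fixed-point factor SCALE, and the functions share and secure_sum are hypothetical names chosen for this example. Each client splits its local value into random shares that reveal nothing on their own; only the total across all clients can be recovered.

```python
import secrets
from typing import List

# Hypothetical parameters for this illustration (not taken from fedRBE/FeatureCloud).
PRIME = 2**61 - 1  # modulus for the additive shares
SCALE = 10**6      # fixed-point factor so real numbers can be shared as integers


def encode(x: float) -> int:
    """Map a real number to a field element using fixed-point encoding."""
    return round(x * SCALE) % PRIME


def decode(v: int) -> float:
    """Map a field element back to a (possibly negative) real number."""
    if v > PRIME // 2:
        v -= PRIME
    return v / SCALE


def share(value: float, n_parties: int) -> List[int]:
    """Split one value into n random additive shares that sum to it mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((encode(value) - sum(shares)) % PRIME)
    return shares


def secure_sum(values: List[float]) -> float:
    """Sum one private value per client without revealing any single value."""
    n = len(values)
    # all_shares[i][j]: the share that client i sends to party j
    all_shares = [share(v, n) for v in values]
    # Each party j adds up only the shares it received; this looks random on its own.
    partial_sums = [sum(all_shares[i][j] for i in range(n)) % PRIME for j in range(n)]
    # Combining all partial sums reveals only the total.
    return decode(sum(partial_sums) % PRIME)


print(secure_sum([1.5, -2.25, 10.0]))  # 9.25
```

In a real deployment the shares travel over the network between participants; the sketch keeps everything in one process for clarity.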

fedRBE allows multiple participants to collaboratively remove batch effects from their data without sharing raw data. Following limma's removeBatchEffect approach, it removes non-biological variation arising from sources such as different labs, time points, or technologies. The tool supports several input data formats and integrates with the FeatureCloud platform for workflow management.
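
The following is a minimal, dependency-free sketch of how a limma-style removeBatchEffect fit can be split across clients. It assumes the common federated linear-regression scheme in which each client computes X^T X and X^T Y on its own data, only these aggregates are summed (in fedRBE, via SMPC), and each client then subtracts the estimated batch component locally. Batch columns use sum-to-zero coding, as limma's removeBatchEffect does. All function names are hypothetical, and biological covariates (which would be extra, preserved design columns) are omitted for brevity.

```python
from typing import List

Matrix = List[List[float]]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def mat_add(a: Matrix, b: Matrix) -> Matrix:
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def solve(a: Matrix, b: Matrix) -> Matrix:
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [ra[:] + rb[:] for ra, rb in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        pv = m[col][col]
        m[col] = [v / pv for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [v - factor * w for v, w in zip(m[r], m[col])]
    return [row[n:] for row in m]


def batch_columns(batch: int, n_batches: int) -> List[float]:
    """Sum-to-zero coding: the last batch gets -1 in every batch column."""
    if batch == n_batches - 1:
        return [-1.0] * (n_batches - 1)
    return [1.0 if j == batch else 0.0 for j in range(n_batches - 1)]


def client_design(n_samples: int, batch: int, n_batches: int) -> Matrix:
    """Intercept column followed by the batch columns (one row per sample)."""
    return [[1.0] + batch_columns(batch, n_batches) for _ in range(n_samples)]


def local_statistics(x: Matrix, y: Matrix):
    """Computed on the client: only X^T X and X^T Y leave the site."""
    xt = transpose(x)
    return matmul(xt, x), matmul(xt, y)


def remove_batch(y: Matrix, x: Matrix, beta: Matrix, n_keep: int) -> Matrix:
    """Subtract only the fitted batch part; the first n_keep columns are preserved."""
    batch_fit = matmul([row[n_keep:] for row in x], beta[n_keep:])
    return [[v - f for v, f in zip(yr, fr)] for yr, fr in zip(y, batch_fit)]


# Toy data: two clients (= two batches), samples x genes, one gene.
clients = [[[1.0], [2.0], [3.0]], [[11.0], [12.0], [13.0]]]
designs = [client_design(len(y), k, len(clients)) for k, y in enumerate(clients)]

# Aggregation step: plain sums here; in fedRBE this is where SMPC is used.
stats = [local_statistics(x, y) for x, y in zip(designs, clients)]
xtx, xty = stats[0]
for a, b in stats[1:]:
    xtx, xty = mat_add(xtx, a), mat_add(xty, b)

beta = solve(xtx, xty)  # shared coefficients, one column per gene
corrected = [remove_batch(y, x, beta, n_keep=1) for x, y in zip(designs, clients)]
print(corrected)  # [[[6.0], [7.0], [8.0]], [[6.0], [7.0], [8.0]]]
```

After correction both batches share the same mean, while the differences between samples within each batch are kept. The actual fedRBE implementation may differ in details such as covariate handling, missing values, and the reference batch.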

[Figure: fedRBE app states. Source: ArXiv 2412.05894]

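The repository serves two main purposes: it provides the fedRBE app itself (see the batchcorrection README), and it contains the code and data needed to reproduce the analyses in our ArXiv preprint.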

[Figure: fedRBE architecture. Source: ArXiv 2412.05894]

You can access and use the fedRBE app directly on FeatureCloud.

For detailed usage instructions and a comprehensive overview of the workflow, refer to the How To Guide. For implementation details, see the README.


Installation

Prerequisites

Before installing fedRBE, ensure you have the following installed:

  1. Docker (see the official Docker installation instructions)
  2. Python 3.8+ (see the official Python installation instructions)
  3. Python dependencies:
    pip install -r requirements.txt
    

Additional requirements depend on the use case:

For Windows users, we recommend using WSL (Windows Subsystem for Linux).
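
Before running anything, a short Python check can confirm the two system-level prerequisites listed above. This is a hypothetical helper, not part of the fedRBE repository; it only checks the interpreter version and whether a docker executable is on the PATH, not whether the Docker daemon is running or whether the Python dependencies are installed.

```python
import shutil
import sys
from typing import List, Tuple


def check_prerequisites(min_python: Tuple[int, int] = (3, 8)) -> List[str]:
    """Return human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info[:2] < min_python:
        found = ".".join(map(str, sys.version_info[:3]))
        wanted = ".".join(map(str, min_python))
        problems.append(f"Python {wanted}+ required, found {found}")
    if shutil.which("docker") is None:
        problems.append("'docker' executable not found on PATH")
    return problems


for issue in check_prerequisites():
    print("Missing prerequisite:", issue)
```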

Clone the repository

If you want to run the simulations locally, clone the repository (or check Quick Start below):

git clone https://github.com/Freddsle/fedRBE.git
cd fedRBE

This downloads the repository, including example files and simulation scripts, to your local machine.


Usage

Quick start

If you simply want to try out fedRBE quickly:

  1. Make sure all prerequisites are installed.
  2. Run the sample experiment script:
    python3 run_sample_experiment.py
    

    This runs fedRBE on an example simulated dataset.

For use with your own (non-sample) data, more detailed instructions are available:

  1. For step-by-step instructions on starting a collaboration across multiple machines, refer to the How To Guide.
  2. For step-by-step instructions on simulating a collaboration in a test environment, refer to the Local Test Guide.

Glossary & further resources

For more advanced configurations and detailed explanations, see the app README and the ArXiv preprint.

If you encounter difficulties, please check the Troubleshooting section below or open an issue on the GitHub repository.


Input and Output

For details on file preparation and formats, the config file, and the outputs, refer to the How To Guide.

In summary, you need two main inputs and one optional file:

[Figure: Input files required for fedRBE.]

Output files include:


Configuration

fedRBE is configured through the config.yml file, which controls data formats, normalization methods, and other essential parameters.

Example config.yml:

   flimmaBatchCorrection:
      data_filename: "expression_data_client1.csv"
      expression_file_flag: False
      index_col: "GeneIDs"
      covariates: ["Pyr"]
      separator: ","
      design_separator: ","
      normalizationMethod: "log2(x+1)"
      smpc: True
      min_samples: 2
      position: 1
      reference_batch: ""

For a comprehensive list of configuration options, refer to the Configuration section in the batchcorrection README.
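
As an illustration of how some of these keys interact, the sketch below reads an expression table using separator and index_col and applies the log2(x+1) transformation named by normalizationMethod. It is a hypothetical stand-in, not fedRBE's own loader: the config is given as a Python dict mirroring the YAML above (parsing the YAML file itself would need a library such as PyYAML), and it assumes features are rows, samples are columns, and there are no missing values.

```python
import csv
import io
import math
from typing import Dict, List, Tuple

# Mirrors the relevant keys of the config.yml example above.
config = {
    "index_col": "GeneIDs",
    "separator": ",",
    "normalizationMethod": "log2(x+1)",
}


def load_expression(text: str, cfg: dict) -> Tuple[List[str], Dict[str, List[float]]]:
    """Parse a delimited table into (sample names, {feature id: values})."""
    reader = csv.reader(io.StringIO(text), delimiter=cfg["separator"])
    header = next(reader)
    id_pos = header.index(cfg["index_col"])
    samples = [h for i, h in enumerate(header) if i != id_pos]
    data = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        values = [float(v) for i, v in enumerate(row) if i != id_pos]
        if cfg.get("normalizationMethod") == "log2(x+1)":
            values = [math.log2(v + 1) for v in values]
        data[row[id_pos]] = values
    return samples, data


table = "GeneIDs,s1,s2\nG1,0,3\nG2,1,7\n"
samples, data = load_expression(table, config)
print(samples, data)  # ['s1', 's2'] {'G1': [0.0, 2.0], 'G2': [1.0, 3.0]}
```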


Reproducing the paper

This repository includes all necessary code and data to reproduce the analyses presented in our ArXiv preprint.

For detailed instructions on reproducing the paper, refer to the Reproducibility Guide.


Single-machine simulation

To simulate a federated workflow on a single machine using provided sample data:

Option 1: Using the helper Python script

If you just want to run fedRBE with sample data, please refer to the Quick Start.

Option 2: Using the FeatureCloud Simulation framework

Refer to the Local Test Guide for instructions on running a simulation with any correctly formatted test data.
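
When preparing data for a single-machine simulation, a combined table has to be split so that each simulated client holds only its own samples. The hypothetical helper below sketches that partitioning step for an in-memory table, assuming (for illustration) that each simulated client holds the samples of one batch: given a sample-to-batch mapping, it returns one CSV text per batch, keeping the feature ID column. The file names and folder layout expected by the FeatureCloud test environment are described in the Local Test Guide and are not reproduced here.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List


def split_by_batch(header: List[str], rows: List[List[str]],
                   sample_batch: Dict[str, str], id_col: str = "GeneIDs") -> Dict[str, str]:
    """Return {batch label: CSV text} containing only that batch's sample columns."""
    id_pos = header.index(id_col)
    columns = defaultdict(list)  # batch label -> column indices of its samples
    for i, name in enumerate(header):
        if i != id_pos:
            columns[sample_batch[name]].append(i)
    out = {}
    for batch, idx in columns.items():
        buf = io.StringIO()
        writer = csv.writer(buf, lineterminator="\n")
        writer.writerow([header[id_pos]] + [header[i] for i in idx])
        for row in rows:
            writer.writerow([row[id_pos]] + [row[i] for i in idx])
        out[batch] = buf.getvalue()
    return out


header = ["GeneIDs", "s1", "s2", "s3"]
rows = [["G1", "5", "6", "9"], ["G2", "1", "2", "4"]]
clients = split_by_batch(header, rows, {"s1": "lab_A", "s2": "lab_A", "s3": "lab_B"})
print(clients["lab_B"])
```

Each returned text could then be written to the input folder of the corresponding simulated client.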


Troubleshooting

Encountering issues? Common problems, their solutions, and detailed troubleshooting tips are covered in the How To Guide.

License

This project is licensed under the Apache License 2.0.


How to cite

If you use fedRBE in your research, please cite our ArXiv preprint:

Burankova, Y., Klemm, J., Lohmann, J.J., Taheri, A., Probul, N., Baumbach, J. and Zolotareva, O., 2024. FedRBE–a decentralized privacy-preserving federated batch effect correction tool for omics data based on limma. arXiv preprint arXiv:2412.05894.

   @misc{burankova2024fedrbedecentralizedprivacypreserving,
         title={FedRBE -- a decentralized privacy-preserving federated batch effect correction tool for omics data based on limma}, 
         author={Yuliya Burankova and Julian Klemm and Jens J. G. Lohmann and Ahmad Taheri and Niklas Probul and Jan Baumbach and Olga Zolotareva},
         year={2024},
         eprint={2412.05894},
         archivePrefix={arXiv},
         primaryClass={q-bio.QM},
         url={https://arxiv.org/abs/2412.05894}, 
   }

Contact information

For questions, issues, or support, please open an issue on the GitHub repository.