fedRBE


fedRBE - FeatureCloud

A federated implementation of the limma removeBatchEffect method. Supports normalization, various input formats, multiple batches per client and secure computation.




Overview

fedRBE applies limma’s batch effect removal in a federated setting: data remains with the client, and only summary information is shared. Multiple input formats and normalization methods are supported. For advanced parameters, see the Configuration section.


Prerequisites and setup

Before using fedRBE, ensure:

  1. Docker is installed (FeatureCloud prerequisites).
  2. FeatureCloud CLI:
    pip install featurecloud
    featurecloud controller start
    
  3. App Image:
    • For linux/amd64:
      # pull the pre-built image
      featurecloud app download featurecloud.ai/bcorrect
      

      or directly via Docker

      docker pull featurecloud.ai/bcorrect:latest
      
    • Alternatively, if you are using an ARM architecture (e.g., Mac M-series), you may need to build the image locally:
      docker build . -t featurecloud.ai/bcorrect:latest
      

      or build the image from GitHub locally:

       cd batchcorrection
       docker build . -t featurecloud.ai/bcorrect:latest
      

The app image provided in the FeatureCloud Docker registry is built for the linux/amd64 platform. If you are using a MacBook with an M-series chip, or any other device not compatible with linux/amd64, please build the image locally.
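
The choice between pulling and building can be sketched in a few lines of Python. This is a minimal illustration, not part of fedRBE: the helper name `image_command` is hypothetical, and it assumes that `platform.machine()` reports `x86_64` or `AMD64` on hosts compatible with linux/amd64. It only returns the two Docker commands already shown above.

```python
import platform

IMAGE = "featurecloud.ai/bcorrect:latest"


def image_command(machine: str) -> str:
    """Return the command from this guide that suits the given architecture."""
    # x86_64 / AMD64 are the names usually reported on amd64-compatible hosts.
    if machine.lower() in {"x86_64", "amd64"}:
        return f"docker pull {IMAGE}"
    # Anything else (e.g. arm64 on Mac M-series): build the image locally.
    return f"docker build . -t {IMAGE}"


if __name__ == "__main__":
    print(image_command(platform.machine()))
```

The build command has to be run from the directory that contains the app's Dockerfile.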


Usage

Simulating a federated Workflow Locally

To test how fedRBE behaves with multiple datasets on one machine:

  1. Ensure the full repository, including sample data, is cloned and is the current working directory:
    git clone https://github.com/Freddsle/fedRBE.git
    cd fedRBE
    
  2. Start the FeatureCloud Controller with the correct input folder:
    featurecloud controller start --data-dir=./evaluation_data/simulated/mild_imbalanced/before/
    
  3. Run a Sample Experiment:
    # if you have the controller running in a different folder, stop it first
    # featurecloud controller stop 
    featurecloud test start --app-image=featurecloud.ai/bcorrect:latest --client-dirs=lab1,lab2,lab3
    

    Alternatively, you can start the experiment from the frontend:

    Select 3 clients and set their paths to lab1, lab2, and lab3, respectively.

    Use featurecloud.ai/bcorrect:latest as the app image.

This runs an experiment bundled with the app, illustrating how fedRBE works. Besides the app itself, the repository includes all experiments performed with it.
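
Before starting the test run, it can help to confirm that the client folders exist under the controller's data directory. The following Python sketch does that check. The function name `missing_client_inputs` is hypothetical, and the assumption that every client folder holds a `config.yml` is taken from the upload step of the federated workflow rather than verified against the sample data.

```python
from pathlib import Path


def missing_client_inputs(data_dir, clients, required=("config.yml",)):
    """List client folders or required files that are absent under data_dir."""
    base = Path(data_dir)
    missing = []
    for client in clients:
        client_dir = base / client
        if not client_dir.is_dir():
            # The whole client folder is missing.
            missing.append(client)
            continue
        for name in required:
            if not (client_dir / name).is_file():
                missing.append(f"{client}/{name}")
    return missing


if __name__ == "__main__":
    problems = missing_client_inputs(
        "./evaluation_data/simulated/mild_imbalanced/before/",
        ["lab1", "lab2", "lab3"],
    )
    print("missing:", problems if problems else "nothing")
```

An empty result means all three folders passed to `--client-dirs` are present and contain the assumed files.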

Running a true federated workflow

For an actual multi-party setting:

  1. Create a Project in FeatureCloud and invite at least 3 clients.
  2. Clients Join with Tokens provided by the coordinator.
  3. Each Client uploads their data and config.yml to their local FeatureCloud instance.
  4. Start the Project: fedRBE runs securely, never sharing raw data.

See HOW TO GUIDE for guidance on creating and joining projects.


Input requirements

For details, see the Configuration section.
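
The two data layouts selected by `expression_file_flag` in the Configuration section hold the same matrix in different orientations. The Python sketch below illustrates this by reading either layout into a features x samples structure. It is a simplified illustration, not the app's loader: the function name `read_features_by_samples` is hypothetical, the first column is assumed to hold the row labels (the `index_col` option is not modelled), and missing values are not handled.

```python
import csv
import io


def read_features_by_samples(text, separator="\t", expression_file_flag=True):
    """Parse a delimited matrix; return (features, samples, values) as features x samples."""
    rows = list(csv.reader(io.StringIO(text), delimiter=separator))
    header, body = rows[0], rows[1:]
    col_labels = header[1:]                 # labels from the header row
    row_labels = [row[0] for row in body]   # labels from the first column
    values = [[float(v) for v in row[1:]] for row in body]
    if not expression_file_flag:
        # samples x features on disk: transpose to features x samples.
        values = [list(col) for col in zip(*values)]
        row_labels, col_labels = col_labels, row_labels
    return row_labels, col_labels, values


expression_layout = "feature\ts1\ts2\nP1\t1\t2\nP2\t3\t4\n"
sample_layout = "sample\tP1\tP2\ns1\t1\t3\ns2\t2\t4\n"
a = read_features_by_samples(expression_layout, expression_file_flag=True)
b = read_features_by_samples(sample_layout, expression_file_flag=False)
print(a == b)
```

Both calls describe the same two features measured in the same two samples, so the parsed results are identical.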


Outputs

Each client after completion receives:

Note: Output files use the same separator defined in config.yml.


Configuration (config.yml)

Upload a config.yml alongside your data. Adjust parameters as needed:

flimmaBatchCorrection:
  data_filename: "lab_A_protein_groups_matrix.tsv"
    # Main data file: either features x samples or samples x features.

  design_filename: "lab_A_design.tsv"
    # Optional design matrix: samples x covariates.
    # Must have first column as sample indices.
    # it is read in the following way:
    # pd.read_csv(design_file_path, sep=seperator, index_col=0)
    # should therefore be in the format samples x covariates
    # with the first column being the sample indices

  expression_file_flag: True
    # True: data_file = features (rows) x samples (columns)
    # False: data_file = samples (rows) x features (columns)
    # format: boolean

  index_col: "sample"
    # If expression_file_flag True: index_col is the feature column name.
    # If expression_file_flag False: index_col is the sample column name.
    # If not given, defaults apply - the index is taken from the 0th column for
    # expression files and generated automatically for samples x features datafiles
    # format: str or int, int is interpreted as the column index (starting from 0)

  covariates: ["Pyr"]
    # Covariates included in the linear model.
    # If no design file, covariates must be present as features in the data file.

  separator: "\t"
    # Separator for main data file.

  design_separator: "\t"
    # Separator for design file.

  batch_col: "batch"
    # Column name in the design file that contains batch information 
    # (if multiple batches present in one client).
    # If not given, all client data is considered as one batch.
    # format: str

  normalizationMethod: "log2(x+1)"
    # Normalization: "log2(x+1)" or None.
    # If None, no normalization is applied.
    # More options will be available in future versions.

  smpc: True
    # Enable secure multiparty computation for privacy-preserving aggregation.
    # For more information see https://featurecloud.ai/assets/developer_documentation/privacy_preserving_techniques.html#smpc-secure-multiparty-computation

  min_samples: 5      # format: int
    # Minimum samples per feature required. Adjusted for privacy if needed.
    # If fewer than min_samples samples are present for a feature,
    # the client will not send any information about that feature.
    # Please note that the min_samples actually used might be different,
    # as for privacy reasons min_samples = max(min_samples, len(design.columns)+1).
    # This ensures that a sent Xty matrix always has more samples
    # than features, so that neither X nor y can be reconstructed from the Xty matrix.

  position: 1      # format: int
    # Defines client order. The last client in order is the reference batch.
    # Example:
    #  C1(position=0), C2(position=2), C3(position=1) -> Order: C1, C3, C2 (C2 is reference).
    # If empty/None, the order is random, making the batch correction run non-deterministic.

  reference_batch: ""
    # Explicitly set a reference batch (string) or leave empty.
    # Conflicts in ordering/reference will halt execution.
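
Two rules from the comments above can be sketched in Python: the privacy adjustment of `min_samples` and the client ordering derived from `position`. The function names are hypothetical and this is not the app's code. The ordering sketch assumes every client sets a distinct integer `position`; the random-order case and conflicts with `reference_batch` are not modelled.

```python
def effective_min_samples(min_samples: int, n_design_columns: int) -> int:
    """Apply the documented rule max(min_samples, len(design.columns) + 1)."""
    return max(min_samples, n_design_columns + 1)


def client_order(positions: dict) -> list:
    """Sort client names by position; the last entry is the reference batch."""
    return sorted(positions, key=positions.get)


# Example from the comments: C1(position=0), C2(position=2), C3(position=1).
order = client_order({"C1": 0, "C2": 2, "C3": 1})
reference = order[-1]
print(order, reference)

# A design matrix with 7 columns raises a configured minimum of 5 to 8.
print(effective_min_samples(5, 7))
```

With the positions from the example, the order is C1, C3, C2, and C2 becomes the reference batch.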

FeatureCloud App states

The app has the following states:

[Figure: fedRBE app states]


Additional resources

License

This project is licensed under the Apache License 2.0.

How to cite

If you use fedRBE in your research, please cite our arXiv preprint:

Burankova, Y., Klemm, J., Lohmann, J.J., Taheri, A., Probul, N., Baumbach, J. and Zolotareva, O., 2024. FedRBE–a decentralized privacy-preserving federated batch effect correction tool for omics data based on limma. arXiv preprint arXiv:2412.05894.

   @misc{burankova2024fedrbedecentralizedprivacypreserving,
         title={FedRBE -- a decentralized privacy-preserving federated batch effect correction tool for omics data based on limma}, 
         author={Yuliya Burankova and Julian Klemm and Jens J. G. Lohmann and Ahmad Taheri and Niklas Probul and Jan Baumbach and Olga Zolotareva},
         year={2024},
         eprint={2412.05894},
         archivePrefix={arXiv},
         primaryClass={q-bio.QM},
         url={https://arxiv.org/abs/2412.05894}, 
   }