A federated implementation of the limma removeBatchEffect method. Supports normalization, various input formats, multiple batches per client, and secure computation.
fedRBE applies limma's batch effect removal in a federated setting: data remains with the client, and only summary information is shared. Multiple input formats and normalization methods are supported. For advanced parameters, see the Configuration section.
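For intuition, here is a minimal centralized sketch (in NumPy) of the linear-model batch removal that limma's `removeBatchEffect` performs: fit a model with sum-to-zero batch coding and subtract the fitted batch component. This is an illustration of the underlying idea only, not fedRBE's federated code; the function name and coding details here are assumptions.

```python
import numpy as np

def remove_batch_effect(Y, batch):
    """Illustrative centralized batch removal in the style of limma.

    Y: features x samples matrix; batch: per-sample batch labels.
    """
    levels = sorted(set(batch))
    # sum-to-zero (contr.sum-style) coding for the batch factor
    B = np.zeros((len(batch), len(levels) - 1))
    for j, lev in enumerate(levels[:-1]):
        B[:, j] = [1.0 if b == lev else (-1.0 if b == levels[-1] else 0.0)
                   for b in batch]
    # design: intercept + batch columns
    X = np.column_stack([np.ones(len(batch)), B])
    # least-squares fit per feature (features are columns of Y.T)
    beta, *_ = np.linalg.lstsq(X, Y.T, rcond=None)
    # subtract only the batch component, keeping the rest of the signal
    return Y - (B @ beta[1:1 + B.shape[1]]).T
```

After correction, the per-batch means of each feature coincide, which is exactly the effect fedRBE reproduces without pooling the raw data.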
Before using fedRBE, ensure the FeatureCloud controller is installed and running:
```bash
pip install featurecloud
featurecloud controller start
```
Then pull the pre-built image:

```bash
featurecloud app download featurecloud.ai/bcorrect
```

or directly via Docker:

```bash
docker pull featurecloud.ai/bcorrect:latest
```
or build the image locally from the GitHub repository:

```bash
cd batchcorrection
docker build . -t featurecloud.ai/bcorrect:latest
```
The app image provided in the FeatureCloud Docker registry is built for the linux/amd64 platform. If you are using a MacBook with an M-series chip or any other device not compatible with linux/amd64, please build the image locally.
To test how fedRBE behaves with multiple datasets on one machine:
```bash
git clone https://github.com/Freddsle/fedRBE.git
cd fedRBE
# if you have the controller running in a different folder, stop it first:
# featurecloud controller stop
featurecloud controller start --data-dir=./evaluation_data/simulated/mild_imbalanced/before/
featurecloud test start --app-image=featurecloud.ai/bcorrect:latest --client-dirs=lab1,lab2,lab3
```
Alternatively, you can start the experiment from the frontend: select 3 clients, set lab1, lab2, and lab3 as their respective paths, and use featurecloud.ai/bcorrect:latest as the app image.
This runs an experiment bundled with the app, illustrating how fedRBE works. The repository contains not only the app itself but also all experiments performed with it.
For an actual multi-party setting, each participant uploads their data and config.yml to their local FeatureCloud instance. fedRBE runs securely, never sharing raw data. See the HOW TO GUIDE for guidance on creating and joining projects.
- `config.yml`: Configuration file controlling formats, normalization, and additional parameters. For details, see the Configuration section.
After completion, each client receives:

- `only_batch_corrected_data.csv`: Batch-corrected features.
- `report.txt`: A report of the run.

Note: Output files use the same separator defined in `config.yml`.
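As an illustration, the corrected matrix can be loaded with pandas using that separator. The in-memory example below stands in for the output file; in practice, replace the `StringIO` with the path to `only_batch_corrected_data.csv` and use the separator from your `config.yml`.

```python
import io
import pandas as pd

# Illustrative stand-in for only_batch_corrected_data.csv with sep="\t";
# the feature and sample names here are made up.
example = io.StringIO("sample\ts1\ts2\nfeat1\t0.1\t0.2\nfeat2\t0.3\t0.4")
corrected = pd.read_csv(example, sep="\t", index_col=0)
print(corrected.shape)  # (2, 2): features x samples
```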
Upload a `config.yml` alongside your data. Adjust parameters as needed:
```yaml
flimmaBatchCorrection:
  data_filename: "lab_A_protein_groups_matrix.tsv"
  # Main data file: either features x samples or samples x features.
  design_filename: "lab_A_design.tsv"
  # Optional design matrix: samples x covariates,
  # with the first column containing the sample indices.
  # It is read as follows:
  # pd.read_csv(design_file_path, sep=separator, index_col=0)
  expression_file_flag: True
  # True: data_file = features (rows) x samples (columns)
  # False: data_file = samples (rows) x features (columns)
  # format: boolean
  index_col: "sample"
  # If expression_file_flag is True: index_col is the feature column name.
  # If expression_file_flag is False: index_col is the sample column name.
  # If not given, defaults apply: the index is taken from the 0th column for
  # expression files and generated automatically for samples x features data files.
  # format: str or int; an int is interpreted as the column index (starting from 0)
  covariates: ["Pyr"]
  # Covariates included in the linear model.
  # If no design file is given, covariates must be present as features in the data file.
  separator: "\t"
  # Separator for the main data file.
  design_separator: "\t"
  # Separator for the design file.
  batch_col: "batch"
  # Column name in the design file that contains batch information
  # (if multiple batches are present in one client).
  # If not given, all client data is considered as one batch.
  # format: str
  normalizationMethod: "log2(x+1)"
  # Normalization: "log2(x+1)" or None.
  # If None, no normalization is applied.
  # More options will be available in future versions.
  smpc: True
  # Enable secure multiparty computation for privacy-preserving aggregation.
  # For more information see https://featurecloud.ai/assets/developer_documentation/privacy_preserving_techniques.html#smpc-secure-multiparty-computation
  min_samples: 5 # format: int
  # Minimum number of samples per feature required; adjusted for privacy if needed.
  # If fewer than min_samples samples are present for a feature,
  # the client will not send any information about that feature.
  # Note that the actually used min_samples might differ,
  # as for privacy reasons min_samples = max(min_samples, len(design.columns) + 1).
  # This ensures that a sent Xty matrix always has more samples
  # than features, so that neither X nor y can be reconstructed from it.
  position: 1 # format: int
  # Defines the client order. The last client in the order is the reference batch.
  # Example:
  # C1(position=0), C2(position=2), C3(position=1) -> Order: C1, C3, C2 (C2 is the reference).
  # If empty/None, the order is random, making the batch correction non-deterministic.
  reference_batch: ""
  # Explicitly set a reference batch (string) or leave empty.
  # Conflicts in ordering/reference will halt execution.
```
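The privacy floor on `min_samples` described in the comments above can be sketched as follows; the helper name is illustrative, not part of the app's API.

```python
# Hypothetical helper illustrating the privacy adjustment of min_samples:
# the effective value is raised so that a shared Xty matrix always has
# more samples than design columns, preventing reconstruction of X or y.
def effective_min_samples(min_samples: int, n_design_columns: int) -> int:
    return max(min_samples, n_design_columns + 1)

print(effective_min_samples(5, 3))  # 5 (floor not triggered)
print(effective_min_samples(2, 4))  # 5 (raised to n_design_columns + 1)
```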
The app has the following states:
This project is licensed under the Apache License 2.0.
If you use fedRBE in your research, please cite our arXiv preprint:

Burankova, Y., Klemm, J., Lohmann, J.J., Taheri, A., Probul, N., Baumbach, J. and Zolotareva, O., 2024. FedRBE: a decentralized privacy-preserving federated batch effect correction tool for omics data based on limma. arXiv preprint arXiv:2412.05894.
```bibtex
@misc{burankova2024fedrbedecentralizedprivacypreserving,
  title={FedRBE -- a decentralized privacy-preserving federated batch effect correction tool for omics data based on limma},
  author={Yuliya Burankova and Julian Klemm and Jens J. G. Lohmann and Ahmad Taheri and Niklas Probul and Jan Baumbach and Olga Zolotareva},
  year={2024},
  eprint={2412.05894},
  archivePrefix={arXiv},
  primaryClass={q-bio.QM},
  url={https://arxiv.org/abs/2412.05894},
}
```