This guide is for beginners who want a quick way to start using fedRBE and test its functionality.
For technical details, advanced usage, and implementation specifics, refer to the main README file.
fedRBE allows you to remove batch effects from data in a federated manner, ensuring data privacy.
For a more formal description and details, see the fedRBE preprint on arXiv.
Prerequisites (see README for details):
For installation and setup details, see the main README.
Below is a simplified workflow for using fedRBE:
config.yml file.
You need two main inputs:
config.yml for custom settings
Minimal Example Directory Structure:
client_folder/
├─ config.yml
├─ expression_data.csv
├─ design.csv
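Before starting a run, it can help to confirm that a client folder actually holds the expected inputs. The following is a minimal sketch and not part of fedRBE: the helper name `missing_inputs` is hypothetical, and the default file names are taken from the example structure above. Add `design.csv` to `required` only if your setup relies on a separate design file.

```python
from pathlib import Path


def missing_inputs(folder, required=("config.yml", "expression_data.csv")):
    """Return the names in `required` that are not files inside `folder`.

    Hypothetical helper for illustration; fedRBE does not ship this function.
    """
    base = Path(folder)
    return [name for name in required if not (base / name).is_file()]


if __name__ == "__main__":
    # Adjust the folder and file names to your own setup.
    missing = missing_inputs("client_folder")
    if missing:
        print("Missing inputs:", ", ".join(missing))
    else:
        print("All expected inputs are present.")
```

An empty result means every listed file was found in the folder.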
If you want to simulate a federated workflow on a single machine, you can use the provided sample data and test script. In this case, you need to create at least three folders, each with the sample data and a config.yml file (for example, clientA, clientB, and clientC folders).
Example config.yml snippet:
flimmaBatchCorrection:
  data_filename: "expression_data_client1.csv"
  expression_file_flag: False # True if the file is features (rows) x samples (columns), False if samples x features
  index_col: "GeneIDs" # Column name to use as index
  covariates: ["Pyr"] # Covariate column names to include in the design matrix
  separator: "," # Separator used in the data file
  design_separator: "," # Separator used in the design file
  normalizationMethod: "log2(x+1)" # Normalization method or log transformation
  smpc: True # Recommended to set to True
  min_samples: 2 # Minimum number of samples to include a feature
  position: 1 # Position of the client (first, second, third, etc.)
  reference_batch: "" # Leave empty to use the default reference (the last client by position)
For more details on the config.yml parameters, see the main README.
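When several client folders are needed (for example, for a local simulation), the per-client config.yml files differ mainly in data_filename and position. The sketch below generates them from a template. It is an illustration only: `write_client_configs` is a hypothetical helper, the template simply copies the example keys and placeholder values shown above, and numbering positions from 1 is an assumption based on that example.

```python
from pathlib import Path

# Template mirroring the example keys above; all values are placeholders.
CONFIG_TEMPLATE = """\
flimmaBatchCorrection:
  data_filename: "{data_filename}"
  expression_file_flag: False
  index_col: "GeneIDs"
  covariates: ["Pyr"]
  separator: ","
  design_separator: ","
  normalizationMethod: "log2(x+1)"
  smpc: True
  min_samples: 2
  position: {position}
  reference_batch: ""
"""


def write_client_configs(root, clients):
    """Create one folder per client under `root` and write its config.yml.

    `clients` maps a folder name to that client's data file name.
    Positions are assigned 1, 2, 3, ... in the order the clients are listed
    (assumption: positions start at 1, as in the example above).
    """
    written = []
    for position, (name, data_filename) in enumerate(clients.items(), start=1):
        folder = Path(root) / name
        folder.mkdir(parents=True, exist_ok=True)
        config_path = folder / "config.yml"
        config_path.write_text(
            CONFIG_TEMPLATE.format(data_filename=data_filename, position=position),
            encoding="utf-8",
        )
        written.append(config_path)
    return written
```

For instance, `write_client_configs("simulation", {"clientA": "expression_data_client1.csv", "clientB": "expression_data_client2.csv", "clientC": "expression_data_client3.csv"})` would create three folders under `simulation/`, each with its own config.yml. The data files still have to be copied into each folder separately.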
Scenario: Three clients (A, B, and C) collaborate on a federated analysis. Video tutorial: link.
Place expression_data_client.csv and config.yml in a local folder.
Adjust the config.yml parameters as needed (e.g., change data_filename to match the correct file name).
Add a design.csv file with batch information and specify the batch column name in the config.yml parameter batch_col.
After completion, each client finds:
only_batch_corrected_data.csv: The batch-corrected expression data.
report.txt: Details on excluded features, beta values, and the design matrix used.
If you’d like to test everything on one machine, you can run the provided sample data and test script. This simulates multiple clients locally, so you can see the federated workflow in action without needing multiple machines.
For instructions, see the Local Test Simulation guide.
Ensure config.yml and data files are in the same directory.
Ensure expression_file_flag and index_col are set correctly based on your data orientation.
Check report.txt and logs.
Depending on how your data is structured, you must correctly set expression_file_flag in your config.yml:
If your file is features (rows) x samples (columns):
expression_file_flag: True and index_col: <feature_id_column>
If your file is samples (rows) x features (columns):
expression_file_flag: False and index_col: <sample_id_column>
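To make the two orientations concrete, the following sketch reads a delimited table and always returns it as features x samples, transposing when the flag says that rows are samples. It follows the convention described above but is an illustration only, not fedRBE's own loader; the function name `read_as_features_by_samples` and the toy column names are hypothetical.

```python
import csv
import io


def read_as_features_by_samples(handle, expression_file_flag, index_col, separator=","):
    """Read a delimited table and return (feature_ids, sample_ids, rows).

    `rows[i][j]` is the raw string value of feature i in sample j.
    True means features are rows; False means samples are rows.
    """
    reader = csv.DictReader(handle, delimiter=separator)
    records = list(reader)
    fieldnames = reader.fieldnames or []
    if index_col not in fieldnames:
        raise ValueError(f"index_col {index_col!r} not found in header")
    row_ids = [rec[index_col] for rec in records]
    col_ids = [name for name in fieldnames if name != index_col]
    matrix = [[rec[col] for col in col_ids] for rec in records]
    if expression_file_flag:
        # Rows are already features, nothing to transpose.
        return row_ids, col_ids, matrix
    # Rows are samples: transpose so that rows become features.
    transposed = [list(column) for column in zip(*matrix)]
    return col_ids, row_ids, transposed


if __name__ == "__main__":
    # The same toy data written in both orientations.
    features_x_samples = io.StringIO("GeneIDs,s1,s2\ngeneA,1,2\ngeneB,3,4\n")
    samples_x_features = io.StringIO("sample,geneA,geneB\ns1,1,3\ns2,2,4\n")
    print(read_as_features_by_samples(features_x_samples, True, "GeneIDs"))
    print(read_as_features_by_samples(samples_x_features, False, "sample"))
```

Both calls in the usage part describe the same data, so they yield the same result. A wrong flag would instead swap feature and sample identifiers, which is worth checking when results look implausible.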
If you have additional covariates (e.g., age, treatment type) that might influence your data, you can include them either directly in the design_filename file or list them in your config.yml under covariates. If no separate design file is provided, these covariates must exist as features in the main data file.
Example:
covariates: ["Age", "Treatment"]
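A quick pre-flight check can catch covariate names that are listed in the config but absent from the file that is supposed to contain them. The sketch below is a hedged illustration: both helper names are hypothetical, and it assumes the design file stores one covariate per column with the names in the header row. Without a design file, pass the feature names of the main data file instead.

```python
import csv
import io


def header_columns(handle, separator=","):
    """Return the column names from the first line of a delimited file."""
    return next(csv.reader(handle, delimiter=separator), [])


def missing_covariates(covariates, available_names):
    """Return the covariates that do not appear among `available_names`."""
    available = set(available_names)
    return [name for name in covariates if name not in available]


if __name__ == "__main__":
    # Toy design file; the column names are made up for this example.
    design = io.StringIO("sample,batch,Age,Treatment\ns1,b1,34,drug\n")
    columns = header_columns(design)
    print(missing_covariates(["Age", "Treatment"], columns))
```

An empty list means every configured covariate was found.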
fedRBE needs a reference batch to align the other batches against. By default, if no reference_batch is set, it uses the last client in the positional order defined by the position parameter. If neither reference_batch nor position is set, it may choose a batch at random, resulting in non-deterministic runs.
Example:
position: 2
reference_batch: ""
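The selection rule described above can be sketched as a small function. This models the documented behaviour rather than fedRBE's actual code: `pick_reference_client` is a hypothetical name, and treating a non-empty reference_batch as taking priority over position is an assumption.

```python
def pick_reference_client(clients):
    """Pick the reference client following the rule described above.

    `clients` is a list of dicts with keys "name", "position" (int or None)
    and "reference_batch" (str, empty when unset).
    Returns the chosen client's name, or None when nothing is set and the
    choice would therefore not be deterministic.
    """
    explicit = [c for c in clients if c.get("reference_batch")]
    if explicit:
        # Assumption: an explicitly set reference_batch takes priority.
        return explicit[0]["name"]
    positioned = [c for c in clients if c.get("position") is not None]
    if positioned:
        # Documented default: the last client in positional order.
        return max(positioned, key=lambda c: c["position"])["name"]
    return None


if __name__ == "__main__":
    clients = [
        {"name": "clientA", "position": 1, "reference_batch": ""},
        {"name": "clientB", "position": 2, "reference_batch": ""},
        {"name": "clientC", "position": 3, "reference_batch": ""},
    ]
    print(pick_reference_client(clients))
```

Giving every client a distinct position keeps the default choice reproducible across runs.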