This guide provides step-by-step instructions to reproduce the analyses and results from the fedRBE preprint. It leverages the utility scripts and data provided in this repository to demonstrate both centralized and federated batch effect correction using limma and fedRBE, respectively.
Before you begin, ensure you have the following installed and configured:
You can install the necessary packages using Conda/Mamba, or manually via the pip and R requirements files.
We suggest using Conda/Mamba for a consistent environment setup.
Using the provided environment.yml is the recommended method, as it sets up both Python and R dependencies in a single environment.
Clone the Repository:
git clone https://github.com/your-username/your-repo.git
cd your-repo
Create and Activate the Mamba/Conda Environment:
mamba env create -f environment.yml
mamba activate fedRBE
Using the pip and R requirements files: if you prefer not to use Mamba, you can install the Python and R dependencies separately.
Clone the Repository:
git clone https://github.com/your-username/your-repo.git
cd your-repo
Set Up Python Environment:
Create a Virtual Environment:
python3 -m venv fedrbe_env
source fedrbe_env/bin/activate # On Windows: fedrbe_env\Scripts\activate
Upgrade pip:
pip install --upgrade pip
Install Python Dependencies:
pip install -r requirements.txt
Set Up R Environment:
Install R Packages:
Open R and run the following commands:
install.packages("remotes") # If not already installed
remotes::install_deps("requirements_r.txt", repos = "http://cran.rstudio.com/")
Alternatively, you can install the packages listed in requirements_r.txt with a script:
Rscript install_packages.R
where install_packages.R contains:
packages <- readLines("requirements_r.txt")
install.packages(packages, repos = "http://cran.rstudio.com/")
Note: Ensure you have an active internet connection for installing R packages.
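To quickly confirm that the R side of the setup works, you can check that the key package used throughout the evaluation, limma, loads correctly. This is only a hedged sanity check; adjust the package list to match what `requirements_r.txt` actually pins:

```r
# Sanity check: confirm that the key R dependencies load.
# Adjust this vector to match the packages listed in requirements_r.txt.
required <- c("limma")
missing <- required[!vapply(required, requireNamespace, logical(1), quietly = TRUE)]
if (length(missing) > 0) {
  stop("Missing R packages: ", paste(missing, collapse = ", "))
}
message("All required R packages are available.")
```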
Understanding the repository layout helps in navigating the files and scripts.
fedRBE/
├── README.md # General repository overview
├── batchcorrection/ # fedRBE FeatureCloud app
├── evaluation_data/ # Data used for evaluation
│ ├── microarray/ # Microarray datasets
│ │ ├── before/ # Uncorrected data with structure needed to run the app
│ │ ├── after/ # Corrected data
│ │ └── 01_Preprocessing_and_RBE.ipynb # Data preparation notebook with centralized removeBatchEffect run
│ ├── microbiome_v2/ # Microbiome datasets with similar structure as microarray
│ ├── proteomics/ # Proteomics datasets
│ ├── proteomics_multibatch/ # Multi-batch proteomics datasets (several batches)
│ └── simulated/ # Simulated datasets
├── evaluation_utils/ # Utility scripts for evaluations
│ ├── analyse_fedvscentral.py
│ ├── debugging_analyse_experiments.py
│ ├── evaluation_funcs.R
│ ├── featurecloud_api_extension.py
│ ├── fedRBE_simulation_scrip_simdata.py
│ ├── filtering.R
│ ├── get_federated_corrected_data.py
│ ├── plots_eda.R
│ ├── run_sample_experiment.py
│ ├── simulation_func.R
│ ├── upset_plot.py
│ └── utils_analyse.py
├── evaluation/ # Main evaluation scripts to produce results and figures
│ ├── eval_simulation/ # Evaluations on simulated data
│ ├── evaluation_microarray.ipynb # Evaluation of microarray datasets
│ ├── evaluation_microbiome.ipynb
│ ├── evaluation_proteomics.ipynb
└── [other directories/files]
This section guides you through running both federated and centralized batch effect corrections and comparing their results.
To simulate a federated workflow on a single machine using provided sample data:
python3 ./evaluation_utils/run_sample_experiment.py
What this does: it launches a local, simulated federated run of fedRBE with several virtual clients on the provided sample data, so you can verify your setup without needing multiple machines.
Use the provided utility script to perform federated batch effect correction on your datasets.
python3 ./evaluation_utils/get_federated_corrected_data.py
Steps Performed by the Script: it reads the uncorrected input data from `evaluation_data/[dataset]/before/`, simulates the participating clients, and runs the federated correction on each dataset.
Output: the federated corrected data is written to `evaluation_data/after/federated/`.
Note: The script may take some time to complete, depending on the dataset size and the number of clients.
Note 2: The microarray data processing is commented out in the script because it requires more than 16 GB of RAM. To run the correction on the microarray datasets, uncomment the corresponding lines in `get_federated_corrected_data.py` (lines 248-287).
Customization: to correct your own datasets, place them in `evaluation_data/[dataset]/before/`, following the structure of the existing datasets (a hedged illustration follows).
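The exact per-client file names and config format are defined by the existing `before/` folders and the fedRBE app documentation, so mirror those. The R sketch below is only a hedged illustration of splitting a combined dataset into per-client subfolders; all file and column names are placeholders:

```r
# Hedged illustration only: split a combined dataset into per-client folders.
# File and column names are placeholders; copy the layout of an existing before/ folder.
expr <- read.delim("expression.tsv", row.names = 1, check.names = FALSE)  # features x samples
meta <- read.delim("metadata.tsv")  # one row per sample, with 'sample' and 'center' columns

for (center in unique(meta$center)) {
  center_samples <- meta$sample[meta$center == center]
  out_dir <- file.path("evaluation_data", "my_dataset", "before", center)
  dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)

  # Expression values for this client's samples only
  write.table(expr[, center_samples, drop = FALSE],
              file.path(out_dir, "expression.tsv"), sep = "\t", quote = FALSE)

  # Matching sample annotations (batch, covariates) for this client
  write.table(meta[meta$center == center, ],
              file.path(out_dir, "design.tsv"), sep = "\t", quote = FALSE, row.names = FALSE)
}
```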
Perform centralized batch effect correction using limma's `removeBatchEffect` for comparison.
Navigate to the dataset directory:
cd evaluation_data/[dataset_name]
Run the data preprocessing and centralized correction inside the notebook: the code is located in the `*central_RBE.ipynb` Jupyter notebooks in the `evaluation_data/[dataset]/` directory.
Output: corrected data in `evaluation_data/[dataset]/after/` for each dataset.
Note: The preprocessing steps and centralized correction are already implemented in the provided notebooks; you can skip this step entirely and use the provided corrected data.
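For orientation, the centralized reference correction in these notebooks boils down to a call to limma's removeBatchEffect. The sketch below is a minimal, hedged version of that idea with placeholder object and file names, not the notebooks' exact code:

```r
library(limma)

# Placeholder inputs: a features x samples matrix and per-sample annotations
expr <- as.matrix(read.delim("expression.tsv", row.names = 1, check.names = FALSE))
meta <- read.delim("metadata.tsv")        # assumed to contain 'batch' and 'condition' columns

batch  <- factor(meta$batch)
design <- model.matrix(~ condition, data = meta)  # covariates whose signal should be preserved

# Centralized batch effect correction
corrected <- removeBatchEffect(expr, batch = batch, design = design)
write.table(corrected, "expression_corrected.tsv", sep = "\t", quote = FALSE)
```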
Use the provided script to analyze and compare the results of federated and centralized batch effect corrections.
python3 ./evaluation_utils/analyse_fedvscentral.py
What This Does: it compares the federated corrected data against the centralized limma results for each dataset and summarizes how closely they agree.
Output: `fed_vc_cent_results.tsv` in the `evaluation_data/` directory.
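Conceptually, the comparison checks how closely the federated and centralized corrected matrices agree. The R sketch below illustrates that idea with placeholder file paths; the actual analysis is implemented in `analyse_fedvscentral.py`:

```r
# Hedged illustration of the federated-vs-centralized comparison (placeholder paths)
fed  <- as.matrix(read.delim("after/federated/expression_corrected.tsv",
                             row.names = 1, check.names = FALSE))
cent <- as.matrix(read.delim("after/expression_corrected.tsv",
                             row.names = 1, check.names = FALSE))

# Compare only the features and samples present in both results
features <- intersect(rownames(fed), rownames(cent))
samples  <- intersect(colnames(fed), colnames(cent))
diff <- fed[features, samples] - cent[features, samples]

# Summarize the agreement between the two corrections
c(max_abs_diff = max(abs(diff)), mean_abs_diff = mean(abs(diff)))
```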
To reproduce the tables and figures from the preprint, run the provided Jupyter notebooks in the `evaluation/` directory.
This repository includes several utility scripts for data processing, analysis, and visualization, located in `evaluation_utils/`.
`get_federated_corrected_data.py`: Automates the federated batch effect correction process using fedRBE.
`analyse_fedvscentral.py`: Compares the results of federated and centralized batch effect corrections.
`featurecloud_api_extension.py`: Extends the FeatureCloud API to support custom workflows and simulations.
`filtering.R`: Provides the filters needed for data preprocessing before centralized batch effect correction with limma's `removeBatchEffect`.
`plots_eda.R`: Provides functions to generate plots that visualize data distributions and the effect of the corrections (see the sketch after this list).
`upset_plot.py`: Generates UpSet plots to visualize intersections and overlaps in datasets or features.
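As an illustration of the kind of plot involved (not the repository's exact plotting code), a PCA of samples colored by batch shows whether batch structure dominates before correction and is reduced afterwards:

```r
# Hedged sketch: PCA of samples colored by batch, using base R only
plot_pca_by_batch <- function(expr, batch, main = "PCA by batch") {
  # expr: features x samples matrix; batch: one label per sample (column)
  batch <- factor(batch)
  pca <- prcomp(t(expr))
  plot(pca$x[, 1], pca$x[, 2], col = as.integer(batch), pch = 19,
       xlab = "PC1", ylab = "PC2", main = main)
  legend("topright", legend = levels(batch), col = seq_along(levels(batch)), pch = 19)
}

# Usage: compare the uncorrected and corrected data side by side
# plot_pca_by_batch(expr_uncorrected, batch, "Before correction")
# plot_pca_by_batch(expr_corrected,   batch, "After correction")
```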
Encountering issues? Below are common problems and their solutions:
Missing or misplaced input data: if a script cannot find its input, check that the uncorrected data is present in the `evaluation_data/[dataset]/before/` directory. For unresolved issues, consider reaching out via the GitHub Issues page.
For questions, issues, or support, please open an issue on the repository's GitHub Issues page.