fedRBE


Reproduce the fedRBE Preprint

This guide provides step-by-step instructions to reproduce the analyses and results from the fedRBE preprint. It leverages the utility scripts and data provided in this repository to demonstrate both centralized and federated batch effect correction using limma and fedRBE, respectively.

Prerequisites and setup

Before you begin, ensure you have the following installed and configured:

  1. Docker: Essential for containerizing applications. Install Docker.
  2. Git: For cloning the repository. Install Git.
  3. FeatureCloud CLI: Get and configure the FeatureCloud CLI using our installation guide.
  4. Python 3.8+: Required for running Python scripts.
  5. R: Required for running R scripts. Install R.

Install required packages

You can install the necessary packages using Conda/Mamba or manually via pip and R requirements files.

We suggest using Conda/Mamba for a consistent environment setup.

Option 1: Using mamba with environment.yml

This is the recommended method as it sets up both Python and R dependencies in a single environment.

  1. Clone the Repository:

    git clone https://github.com/your-username/your-repo.git
    cd your-repo
    
  2. Create and Activate the Mamba/Conda Environment:

    mamba env create -f environment.yml
    mamba activate fedRBE
    

Option 2: Using pip and R requirements files

If you prefer not to use Mamba, you can install Python and R dependencies separately.

  1. Clone the Repository:

    git clone https://github.com/your-username/your-repo.git
    cd your-repo
    
  2. Set Up Python Environment:

    • Create a Virtual Environment:

      python3 -m venv fedrbe_env
      source fedrbe_env/bin/activate  # On Windows: fedrbe_env\Scripts\activate
      
    • Upgrade pip:

      pip install --upgrade pip
      
    • Install Python Dependencies:

      pip install -r requirements.txt
      
  3. Set Up R Environment:

    • Install R Packages:

      Open R and run the following commands:

      # Install the packages listed one per line in requirements_r.txt
      install.packages(readLines("requirements_r.txt"), repos = "https://cran.rstudio.com/")
      

      Alternatively, you can use the requirements_r.txt with a script:

      Rscript install_packages.R
      

      Where install_packages.R contains:

      packages <- readLines("requirements_r.txt")
      install.packages(packages, repos = "http://cran.rstudio.com/")
      

      Note: Ensure you have an active internet connection for installing R packages.


Repository structure

Understanding the repository layout helps in navigating the files and scripts.

fedRBE/
├── README.md                                   # General repository overview
├── batchcorrection/                            # fedRBE FeatureCloud app
├── evaluation_data/                            # Data used for evaluation
│   ├── microarray/                             # Microarray datasets
│   │   ├── before/                             # Uncorrected data with the structure needed to run the app
│   │   ├── after/                              # Corrected data
│   │   └── 01_Preprocessing_and_RBE.ipynb      # Data preparation notebook with centralized removeBatchEffect run
│   ├── microbiome_v2/                          # Microbiome datasets, structured like microarray
│   ├── proteomics/                             # Proteomics datasets
│   ├── proteomics_multibatch/                  # Multi-batch proteomics datasets (several batches)
│   └── simulated/                              # Simulated datasets
├── evaluation_utils/                           # Utility scripts for evaluations
│   ├── analyse_fedvscentral.py
│   ├── debugging_analyse_experiments.py
│   ├── evaluation_funcs.R
│   ├── featurecloud_api_extension.py
│   ├── fedRBE_simulation_scrip_simdata.py
│   ├── filtering.R
│   ├── get_federated_corrected_data.py
│   ├── plots_eda.R
│   ├── run_sample_experiment.py
│   ├── simulation_func.R
│   ├── upset_plot.py
│   └── utils_analyse.py
├── evaluation/                                 # Main evaluation scripts to produce results and figures
│   ├── eval_simulation/                        # Evaluations on simulated data
│   ├── evaluation_microarray.ipynb             # Evaluation of microarray datasets
│   ├── evaluation_microbiome.ipynb             # Evaluation of microbiome datasets
│   └── evaluation_proteomics.ipynb             # Evaluation of proteomics datasets
└── [other directories/files]

Running the analysis

This section guides you through running both federated and centralized batch effect corrections and comparing their results.

1. Running a sample federated experiment

To simulate a federated workflow on a single machine using provided sample data:

python3 ./evaluation_utils/run_sample_experiment.py

2. Obtaining federated corrected data

Use the provided utility script to perform federated batch effect correction on your datasets.

python3 ./evaluation_utils/get_federated_corrected_data.py

Steps Performed by the Script:

  1. Sets up multiple clients: Simulates clients based on the datasets in evaluation_data/[dataset]/before/.
  2. Runs fedRBE on each client: Applies federated batch effect correction using the FeatureCloud testing environment.
  3. Aggregates results: Combines corrected data securely.
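Conceptually, a federated least-squares fit of this kind can combine per-client summary statistics instead of raw data: each client shares only its local XᵀX and Xᵀy, and summing these yields the same coefficients as fitting on the pooled data. The sketch below is illustrative only (the names are hypothetical and it is not the fedRBE implementation):

```python
import numpy as np

def client_statistics(X, y):
    """Each client computes sufficient statistics on its local data only."""
    return X.T @ X, X.T @ y

def aggregate_and_solve(stats):
    """The coordinator sums the statistics and solves the normal equations."""
    XtX = sum(s[0] for s in stats)
    Xty = sum(s[1] for s in stats)
    return np.linalg.solve(XtX, Xty)

# Two simulated clients sharing a design (intercept + batch indicator)
rng = np.random.default_rng(0)
X1 = np.column_stack([np.ones(5), np.zeros(5)])   # client 1: batch 0
X2 = np.column_stack([np.ones(5), np.ones(5)])    # client 2: batch 1
y1, y2 = rng.normal(0, 1, 5), rng.normal(2, 1, 5)

beta_fed = aggregate_and_solve([client_statistics(X1, y1),
                                client_statistics(X2, y2)])

# Identical to ordinary least squares on the pooled data
X, y = np.vstack([X1, X2]), np.concatenate([y1, y2])
beta_pooled, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(beta_fed, beta_pooled))  # True
```

Only the aggregated matrices leave a client, which is what makes this style of computation privacy-preserving.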

Note: The script may take some time to complete, depending on the dataset size and the number of clients.

Note 2: Processing of the microarray data is commented out in the script because it requires more than 16 GB of RAM. To run the correction on the microarray datasets, uncomment the corresponding lines in get_federated_corrected_data.py (lines 248-287).

3. Obtaining centrally corrected data

Perform centralized batch effect correction using limma’s removeBatchEffect for comparison.

  1. Navigate to the dataset directory:

    cd evaluation_data/[dataset_name]
    
  2. Run the data preprocessing and centralized correction notebook.

The code is located in the *central_RBE.ipynb Jupyter notebooks in the evaluation_data/[dataset]/ directories.

Note: The preprocessing steps and centralized correction are already implemented in the provided notebooks. You can skip this step entirely and use the provided corrected data.
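For intuition, removeBatchEffect with a single batch factor and no covariates reduces to fitting, per feature, a linear model with sum-to-zero batch coding and subtracting the estimated batch effects; equivalently, each sample's batch mean is subtracted and the unweighted mean of the batch means is added back. A minimal NumPy sketch of that special case (illustrative only; the notebooks use the actual limma implementation):

```python
import numpy as np

def remove_batch_effect_simple(data, batch):
    """data: features x samples array; batch: per-sample batch labels.
    Subtract each sample's batch mean and add back the unweighted mean
    of the batch means (sum-to-zero batch coding, no covariates)."""
    data = np.asarray(data, dtype=float)
    batch = np.asarray(batch)
    levels = np.unique(batch)
    # Per-feature mean within each batch, shape (features, n_batches)
    means = np.stack([data[:, batch == b].mean(axis=1) for b in levels], axis=1)
    grand = means.mean(axis=1, keepdims=True)  # unweighted mean of batch means
    corrected = data.copy()
    for j, b in enumerate(levels):
        corrected[:, batch == b] -= means[:, [j]] - grand
    return corrected

# Demo: one feature, two batches with a constant offset
data = np.array([[1.0, 2.0, 5.0, 6.0]])
batch = np.array(["A", "A", "B", "B"])
corrected = remove_batch_effect_simple(data, batch)
print(corrected)  # per-batch means are now equal (both 3.5)
```

Within-batch variation is preserved; only the batch-level shifts are removed, which is exactly why the biological signal of interest survives the correction.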

4. Comparing federated and central corrections

Use the provided script to analyze and compare the results of federated and centralized batch effect corrections.

python3 ./evaluation_utils/analyse_fedvscentral.py

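In spirit, such a comparison aligns the federated and centrally corrected matrices on their shared features and samples and summarizes the element-wise differences. A minimal sketch of that idea (the function name and toy data are hypothetical, not the script's actual interface):

```python
import numpy as np
import pandas as pd

def compare_corrections(fed: pd.DataFrame, central: pd.DataFrame) -> dict:
    """Align on shared features (rows) and samples (columns), then
    report element-wise agreement between the two corrections."""
    rows = fed.index.intersection(central.index)
    cols = fed.columns.intersection(central.columns)
    diff = (fed.loc[rows, cols] - central.loc[rows, cols]).to_numpy()
    return {
        "n_features": len(rows),
        "n_samples": len(cols),
        "max_abs_diff": float(np.abs(diff).max()),
        "mean_abs_diff": float(np.abs(diff).mean()),
    }

# Toy example: the federated result differs only by numerical noise
fed = pd.DataFrame([[1.0, 2.0], [3.0, 4.0]],
                   index=["g1", "g2"], columns=["s1", "s2"])
central = fed + 1e-9
print(compare_corrections(fed, central)["max_abs_diff"] < 1e-6)  # True
```

A small maximum absolute difference indicates that the federated correction reproduces the centralized one up to numerical precision.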

5. Produce tables and figures

To reproduce the tables and figures from the preprint, run the provided Jupyter notebooks in the evaluation/ directory.

Utility scripts overview

This repository includes several utility scripts, located in evaluation_utils/, that facilitate data processing, analysis, and visualization.

Troubleshooting

If you encounter issues, consider reaching out via the GitHub Issues page.

Contact information

For questions, issues, or support, please open an issue on the GitHub Issues page.