Reproducible Bioinformatics

Welcome to the Reproducible Bioinformatics project

The aim of the Reproducible Bioinformatics project is to create easy-to-use bioinformatics workflows that fulfill the following rules (Sandve et al. PLoS Comp Biol. 2013):

For Every Result, Keep Track of How It Was Produced
Avoid Manual Data Manipulation Steps
Archive the Exact Versions of All External Programs Used
Version Control All Custom Scripts
Record All Intermediate Results, When Possible in Standardized Formats
For Analyses That Include Randomness, Note Underlying Random Seeds
Always Store Raw Data behind Plots
Generate Hierarchical Analysis Output, Allowing Layers of Increasing Detail to Be Inspected
Connect Textual Statements to Underlying Results
Provide Public Access to Scripts, Runs, and Results
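
As an illustration of the first, third and sixth rules, the sketch below records how a result was produced: the command, the random seed, and the versions of the external programs. It is a minimal example under stated assumptions, not code from docker4seq or rCASC. The names make_run_record and draw_values, the command line, and the tool name "example-aligner" are all hypothetical.

```python
import json
import platform
import random
from datetime import datetime, timezone


def make_run_record(command, seed, tool_versions):
    """Collect the facts needed to reproduce one analysis step (hypothetical helper)."""
    return {
        "command": list(command),              # how the result was produced
        "seed": seed,                          # random seed used by the step
        "tool_versions": dict(tool_versions),  # exact versions of external programs
        "python_version": platform.python_version(),
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
    }


def draw_values(seed, n=3):
    """Stand-in for an analysis step that involves randomness."""
    rng = random.Random(seed)  # a private generator, seeded explicitly
    return [rng.random() for _ in range(n)]


# Hypothetical command and tool name, used only for illustration.
record = make_run_record(
    command=["python", "cluster.py", "--input", "counts.csv"],
    seed=42,
    tool_versions={"example-aligner": "1.0.0"},
)

# Serialize the record so it can be stored next to the results.
record_json = json.dumps(record, indent=2, sort_keys=True)
```

Storing such a record next to each output file means that a rerun with the same seed and the same tool versions can be compared directly with the original result.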

The paper on the SeqBox project is published in Bioinformatics (Beccuti et al. 2018).

The paper on the Reproducible Bioinformatics project is published in BMC Bioinformatics (Kulkarni et al. 2018).

The paper on rCASC: reproducible classification analysis of single-cell sequencing data is published in GigaScience (Alessandri et al. 2019).

The registration page for the 7th edition of the RNAseq and Single Cell RNAseq workshop is now also available on the Elixir training platform.

The Reproducible Bioinformatics project

Reproducible Bioinformatics is a non-profit and open-source project.

We are a group of bioinformaticians interested in simplifying the use of bioinformatics tools for biologists with or without scripting ability. At the same time, we are interested in providing robust and reproducible workflows.

For this reason we have developed the docker4seq and the rCASC packages.

Currently, five workflows are available:

RNAseq workflow (docker4seq package)
- Tutorial
miRNAseq workflow (docker4seq package)
- Tutorial
ChIPseq workflow (docker4seq package)
- Tutorial
Single-cell RNAseq workflow (rCASC package)
- Tutorial
circular RNA workflow (docker4seq package)
- Tutorial

Under development are:

PDX workflow: variant calling in patient-derived xenografts (PDX) from RNAseq and EXOMEseq data
Metagenomics workflow

All workflows are controlled by a set of R functions, part of the docker4seq package, and the algorithms used are all encapsulated in Docker images stored in the docker.io/repbioinfo repository.
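
The general pattern of a function that controls an algorithm packaged in a Docker image can be sketched as follows. This is a hedged illustration in Python, not the docker4seq implementation, which is written in R. The function build_docker_command, the host folder /home/user/fastq and the /data mount point are assumptions made for the example. The image name docker.io/repbioinfo/ubuntu is the one mentioned on this page, and the docker flags used (run, --rm, -v) are standard Docker CLI options.

```python
import shlex


def build_docker_command(image, data_dir, program, args=()):
    """Return the argument list that runs `program` inside `image`.

    Hypothetical helper: the host folder `data_dir` is mounted at /data
    in the container so the program can read input and write results.
    """
    return [
        "docker", "run",
        "--rm",                     # remove the container when the job ends
        "-v", f"{data_dir}:/data",  # share the data folder with the container
        image,
        program, *args,
    ]


# Build (but do not execute) a command that lists the mounted folder.
cmd = build_docker_command(
    image="docker.io/repbioinfo/ubuntu",
    data_dir="/home/user/fastq",
    program="ls",
    args=["/data"],
)

# A shell-safe string form, useful for logging exactly what was run.
command_line = shlex.join(cmd)
```

Passing cmd to subprocess.run(cmd, check=True) would execute the job on a machine where Docker is installed. Logging command_line alongside the results keeps track of how each output was produced.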

4SeqGUI is the GUI that can be used to control docker4seq functionalities.

Video tutorials for 4SeqGUI:

HowTo run a full RNAseq analysis

HowTo run a full miRNAseq analysis

News

Sparsely-connected autoencoder (SCA) for single cell RNAseq data mining

This tool uncovers hidden features associated with scRNAseq data. We implemented two new metrics, QCC (Quality Control of Cluster) and QCM (Quality Control of Model), which quantify the ability of SCA to reconstruct valuable cell clusters and evaluate the quality of the neural network achievements, respectively. Our data indicate that the SCA encoded space, derived from different experimentally validated data (TF targets, miRNA targets, kinase targets, and cancer-related immune signatures), can be used to grasp single cell cluster-specific functional features. In our implementation, SCA efficacy comes from its ability to reconstruct only specific clusters, thus indicating only those clusters where the SCA encoding space is a key element for cell aggregation. SCA analysis is implemented as a module in the rCASC framework and is supported by a GUI to simplify its usage for biologists and medical personnel.

Alessandri et al. NPJ Syst Biol Appl. 2021

Available as part of the rCASC package

The SeqBox Project

Short-read sequencing technology has been in use for more than a decade. However, the analysis of RNAseq and ChIPseq data is still computationally demanding, and simple access to raw data does not guarantee that results are reproducible between laboratories. To address these two aspects, we developed SeqBox, a cheap, efficient and reproducible RNAseq/ChIPseq hardware/software solution based on the NUC6I7KYK mini-PC (an Intel consumer gaming computer with a fast processor and a high-performance SSD) and the Docker container platform. In SeqBox, the analysis of RNAseq and ChIPseq data is supported by a friendly GUI, which gives scientists with or without scripting experience access to fast and reproducible analyses.

More information on SeqBox characteristics and cost is available at www.seqbox.com

How to be part of the Reproducible Bioinformatics project

Any bioinformatician interested in embedding specific applications in the available workflows, or in developing a new workflow, is asked to embed the application(s) in a Docker image, save it in a public repository, and configure one or more R functions that can be used to interact with the Docker image.

Steps required to submit a new application/workflow:

Edit the skeleton.R function and the ubuntu docker image (docker.io/repbioinfo/ubuntu) to create the new application.
- Please have a look at: Controlling jobs in a docker image, a brief tutorial.
Create a public docker repository for the docker image, e.g. at docker.com.
Create a workflow.Rmd vignette using RStudio and publish it via RStudio. As an example of a vignette, see the docker4seq vignette.
Once the Docker image, the function(s) and the vignette are ready, please fill in this submission form.
- We will test the code and incorporate it in the docker4seq package.
- Maintainers will be responsible for the maintenance of their application(s).