Welcome to the Reproducible Bioinformatics project

The aim of the Reproducible Bioinformatics project is to create easy-to-use bioinformatics workflows that fulfill the following rules (Sandve et al. PLoS Comp Biol. 2013):

  1. For Every Result, Keep Track of How It Was Produced
  2. Avoid Manual Data Manipulation Steps
  3. Archive the Exact Versions of All External Programs Used
  4. Version Control All Custom Scripts
  5. Record All Intermediate Results, When Possible in Standardized Formats
  6. For Analyses That Include Randomness, Note Underlying Random Seeds
  7. Always Store Raw Data behind Plots
  8. Generate Hierarchical Analysis Output, Allowing Layers of Increasing Detail to Be Inspected
  9. Connect Textual Statements to Underlying Results
  10. Provide Public Access to Scripts, Runs, and Results
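In practice, rules 3, 6 and 7 can be as simple as writing a small record next to each result. The sketch below is a minimal illustration in Python, not part of docker4seq: the helper `write_provenance`, its JSON layout and the image tag used in the example are hypothetical. It runs a toy random step with a fixed seed and saves the pinned Docker image reference, the seed, the parameters and the raw values together.

```python
import json
import random
from pathlib import Path


def write_provenance(out_dir, image, seed, params):
    """Run a seeded toy step and save a provenance record next to its result.

    Hypothetical sketch: the record layout is an assumption, not a docker4seq format.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # A dedicated generator: the seed alone determines the values drawn below.
    rng = random.Random(seed)
    # Stand-in for any analysis step that involves randomness.
    values = sorted(rng.sample(range(1000), k=params["k"]))
    record = {
        "image": image,    # exact tool version, e.g. a pinned image tag (rule 3)
        "seed": seed,      # random seed used for this run (rule 6)
        "params": params,  # parameters needed to re-run the step (rule 1)
        "result": values,  # raw values behind any later plot (rule 7)
    }
    (out / "provenance.json").write_text(json.dumps(record, indent=2))
    return record
```

Re-running with the same seed and image reference reproduces the same values. Pinning the image with an explicit tag or digest, rather than a moving tag such as `latest`, is what makes rule 3 checkable later.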

The paper describing the Reproducible Bioinformatics project is published in BMC Bioinformatics (Kulkarni et al. 2018).

The Reproducible Bioinformatics project

Reproducible Bioinformatics is a non-profit and open-source project.

We are a group of bioinformaticians interested in simplifying the use of bioinformatics tools for biologists, with or without scripting experience. At the same time, we aim to provide robust and reproducible workflows.

For this reason, we developed the docker4seq package.

Three workflows are currently available in the stable version of the docker4seq package:

  • RNAseq workflow
  • miRNAseq workflow
  • ChIPseq workflow

The following workflows are under development:

  • PDX workflow: variant calling in patient-derived xenografts (PDX) from RNAseq and EXOMEseq data
  • Single cell analysis workflow
  • Metagenomics workflow

All workflows are controlled by a set of R functions, part of the docker4seq package, and the algorithms they use are all encapsulated in Docker images stored in the docker.io/repbioinfo repository.

More info on docker4seq here

4SeqGUI is a graphical user interface (GUI) for controlling docker4seq functionalities.

Video tutorials for 4SeqGUI:

HowTo run a full RNAseq analysis

HowTo run a full miRNAseq analysis

HowTo run a full ChIPseq analysis

How to be part of the Reproducible Bioinformatics project

Any bioinformatician interested in embedding specific applications in the available workflows, or in developing a new workflow, is asked to package the application(s) in a Docker image, save it in a public repository, and write one or more R functions that interact with the Docker image.

Steps required to submit a new application/workflow:

  • Edit the skeleton.R function and the Ubuntu Docker image (docker.io/repbioinfo/ubuntu) to create the new application.
  • Create a public Docker repository for the Docker image, e.g. at docker.com.
  • Create a workflow.Rmd vignette using RStudio and publish it via RStudio. For an example of a vignette, see the docker4seq vignette.
  • Once the Docker image, the function(s) and the vignette are ready, please fill in this submission form.
    • We will test and incorporate the code in the docker4seq package.
    • Maintainers will be responsible for the maintenance of their application(s).
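The pattern behind these steps, a Docker image plus a thin function that calls it, can be sketched as follows. docker4seq implements its wrappers in R; the Python below only illustrates the same idea, and `build_docker_command`, `run_tool` and the `/data` mount point are hypothetical names, not the skeleton.R interface. It assumes Docker is installed and uses only the standard `docker run` options `--rm`, `-v` and `-w`.

```python
import subprocess
from pathlib import Path


def build_docker_command(image, data_dir, tool_args, container_dir="/data"):
    """Return the argv for running a containerized tool on a host folder.

    Hypothetical helper: it mirrors the idea of an R wrapper such as skeleton.R,
    not its actual code.
    """
    # docker -v treats a non-absolute source as a named volume, so use an absolute path.
    host_dir = str(Path(data_dir).resolve())
    return [
        "docker", "run", "--rm",              # remove the container when it exits
        "-v", f"{host_dir}:{container_dir}",  # share the input/output folder with the container
        "-w", container_dir,                  # run the tool inside the shared folder
        image,
        *tool_args,
    ]


def run_tool(image, data_dir, tool_args):
    """Run the containerized tool; raise CalledProcessError if it exits with an error."""
    cmd = build_docker_command(image, data_dir, tool_args)
    subprocess.run(cmd, check=True)
    return cmd
```

For example, `run_tool("docker.io/repbioinfo/ubuntu", "./my_project", ["ls", "-l"])` would list the shared folder from inside the container. Keeping command construction separate from execution makes the exact command easy to log alongside the results.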

If you are interested in participating in the project, or if you need more information, please contact info@reproducible-bioinformatics.org

The SeqBox Project

Short-read sequencing technology has been in use for more than a decade. However, the analysis of RNAseq and ChIPseq data is still computationally demanding, and simple access to raw data does not guarantee reproducibility of results between laboratories. To address these two issues, we developed SeqBox, a cheap, efficient and reproducible RNAseq/ChIPseq hardware/software solution based on the NUC6I7KYK mini-PC (an Intel consumer gaming computer with a fast processor and a high-performance SSD) and the Docker container platform. In SeqBox, the analysis of RNAseq and ChIPseq data is supported by a user-friendly GUI, giving scientists with or without scripting experience access to fast and reproducible analyses.

More information on SeqBox characteristics and costs is available at www.seqbox.com

rCASC

rCASC (reproducible Classification Analysis of Single Cell Sequencing Data) is part of this project and provides single-cell analysis functionalities that follow the reproducibility rules described by Sandve et al. (PLoS Comp Biol. 2013). rCASC is designed to provide a complete workflow for cell-subpopulation discovery.

More info on rCASC can be found here