degli Studi di Torino

Reproducible Bioinformatics Project:
  • Our project aims to create bioinformatics tools that make complex bioinformatics pipelines easy to reproduce, and that suit both bioinformaticians and biologists without scripting knowledge:
    • docker4seq: an R package that handles complex genomics pipelines via Docker images.
    • 4SeqGUI: a GUI for docker4seq.
    • SeqBox: RNAseq/ChIPseq reproducible analysis on a consumer game computer.

  • This project addresses reproducibility by following the rules suggested by Sandve and coworkers (PLoS Comput Biol, 2013):
    • Rule 1: For Every Result, Keep Track of How It Was Produced
      • Every action is stored as a script
    • Rule 2: Avoid Manual Data Manipulation Steps
      • All data manipulation is done through R scripts
    • Rule 3: Archive the Exact Versions of All External Programs Used
      • Docker containers are frozen and stored under a version code
    • Rule 4: Version Control All Custom Scripts
    • Rule 5: Record All Intermediate Results, When Possible in Standardized Formats
      • Users can choose to store intermediate results
    • Rule 6: For Analyses That Include Randomness, Note Underlying Random Seeds
      • This is handled within the Docker containers whenever randomness is required
    • Rule 7: Always Store Raw Data behind Plots
    • Rule 8: Generate Hierarchical Analysis Output, Allowing Layers of Increasing Detail to Be Inspected
    • Rule 9: Connect Textual Statements to Underlying Results
    • Rule 10: Provide Public Access to Scripts, Runs, and Results
      • The combination of raw data and the scripts that drive the Docker containers guarantees the reproducibility of results
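
As an illustration of Rules 1, 3 and 6 above, the following is a minimal Python sketch of a run record that stores the container version, the command, the random seed and a checksum of the input. It is not part of docker4seq and does not reflect its API: the image name example/rnaseq:2017.01, the script name analysis.R and all function names are hypothetical and chosen only for this example.

```python
import hashlib
import json
import random


def is_pinned(image):
    """Return True when the image reference carries an explicit tag other than 'latest'."""
    name, sep, tag = image.rpartition(":")
    # No colon means no tag at all; a '/' after the last colon means the
    # colon belonged to a registry port (e.g. localhost:5000/img), not a tag.
    return bool(sep) and bool(name) and "/" not in tag and tag not in ("", "latest")


def build_run_record(image, command, seed, input_bytes):
    """Collect what is needed to repeat a run: image version, command, seed, input checksum."""
    if not is_pinned(image):
        raise ValueError(f"image {image!r} is not pinned to an explicit version tag")
    return {
        "image": image,
        "command": list(command),
        "seed": seed,
        "input_sha256": hashlib.sha256(input_bytes).hexdigest(),
    }


def seeded_sample(seed, population, k):
    """Draw k items with a dedicated generator so the same seed gives the same draw."""
    rng = random.Random(seed)
    return rng.sample(population, k)


if __name__ == "__main__":
    # Hypothetical image and script names, used only for illustration.
    record = build_run_record(
        "example/rnaseq:2017.01", ["Rscript", "analysis.R"], 42, b"ACGT\n"
    )
    # Writing the record as sorted JSON keeps it easy to diff and to put under version control.
    print(json.dumps(record, indent=2, sort_keys=True))
```

Storing such a record next to the results is one way to tie each output to the exact container version and seed that produced it; the record format shown here is an assumption, not a project convention.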

Contacts: Raffaele Calogero