Reproducible data workflows 2024

15/02/2024 to 16/02/2024

Bioinformatics room (CRG Training centre)

This course teaches the fundamental concepts and practical guidelines for making everyday data generation and management tasks fit into reproducible scientific workflows. It emphasizes the importance of open data formats and recommends Markdown for documentation. Participants will also learn how to use Gitlab and Github, two data collaboration platforms, to track and manage data and documentation from different interfaces such as the command line, IDEs and the web browser. Git's underlying version control capabilities will be covered in detail during the hands-on sessions.

Topics covered:

  • Reproducibility 
  • Open data formats 
  • Markdown 
  • IDE 
  • Git 
  • Gitlab 

What NOT to expect:

  • Although the CLI will also be used, desktop or web browser GUIs will be preferred
  • We will not cover advanced Git or Github/Gitlab usage

Program: 

 Day 1
1. Reproducibility in Science. Why is there a reproducibility crisis? 
2. FAIR. What is it? How is it related to reproducibility? 
3. Outline of a reproducible data workflow 
       - Data collaboration platforms: Gitlab, Github 
       - Quick introduction to Gitlab
       - Editors and IDEs. Showcase with VSCode
4. Open Data Formats. Examples and importance 
       - CSV and TSV. Using them from everyday software
5. Markdown            
       - Why? 
       - Syntax hands-on 
       - Using it from everyday software 
6. Conversion from/to other formats: Pandoc

 Day 2
1. VCS. An introduction 
2. Git. Basic concepts: branches, HEAD, commits, etc.
3. Gitlab: Git commands from the web interface, the CLI and the IDE
      - Branches, Merge 
4. Gitlab: Collaborative editing. Merge requests (called pull requests in Github)
5. Github 
      - Handling different remotes (Gitlab, Github) 
      - Github pages 

Instructors: Toni Hermoso, Emyr James, Clemente Borges, Emilio Palumbo & Luca Cozzuto 

Dates: 15th and 16th Feb 2024 (10:30-13:30) 

Location: Bioinformatics room (CRG Training Centre) - In person

Maximum number of participants: 16

Level: Beginners

Registration deadline: 7th Feb 2024 - Extended!

Registration HERE

For any information, please send an email to CRG Training and Academic office (TAO): training@crg.eu