1 Introduction

The UvA is fully committed to the principles of Open Science, which emphasize transparency and reproducibility in research. Open Science means making scientific research freely accessible to everyone in society so that research results are widely available, and it follows the FAIR (Findable, Accessible, Interoperable, Reusable) principles. To practice Open Science in every project, whether it is a grant, a publication or a student project, four general guiding rules apply.

It is crucial to:

  • Develop a data management plan at the beginning of your project
  • Maintain good data management practices during the project
  • Create a well-organized, self-contained archive at the project’s conclusion
  • Share it with the scientific community in a data repository

To further support good data management practices, IBED provides a Data Management Bonus of 1000 euro to PhD candidates and postdocs who, by the end of their project, succeed in delivering such a well-organized and self-contained archive. To support research at scale, this page also describes the computational resources available to IBED, providing both guidelines and resources (Storage, Archiving, HPC). We conclude with a list of courses and training offered by the UvA, including courses on good data management and computational practices. This document is tailored to IBED; for more generic information on resources available at faculty level, see the Data Analytics and Statistics Hub (DASH).

2 Data Management Plan

Almost all funding agencies require a Data Management Plan (DMP). To prepare your DMP, we recommend creating an account at DMPonline. The template of your grant (e.g. NWO) is most likely available within the platform; alternatively, you can use the template of the funding organization. Prepare your DMP and invite the IBED data steward (Johannes De Groeve) to review it and check whether it is in line with the FNWI RDM policy and the institutional RDM protocol. For examples of successful DMPs within IBED, please check here.

Typical questions and topics which are addressed in a DMP include:

  • How will data be collected
  • How will data be stored and backed up
  • How will data be processed and analyzed
  • How will data be shared and disseminated
  • How will data be preserved
  • Data formats and standards
  • Metadata standards
  • Version control procedures
  • Is a budget required for data management activities
  • Who is responsible for which data management tasks
  • Who will have access to the data, and which security procedures apply (e.g. for personal data)

Answering these questions streamlines decisions about the computational resources you need during the project, including which type(s) of institutional storage are most suitable and what your High Performance Computing (HPC) needs are. It also helps you define a clear project directory structure. DMPs are often created only when funders require them. However, whether you are a student or a PI, it is always worth thinking about the above questions in advance.

This will help with:

  • Working in a structured manner
    • avoid having your files and data scattered without naming conventions or file organisation
  • Saving time that is otherwise lost because
    • you lost track of which dataset or code is the right one
    • you forgot the meaning of your columns
    • you do not remember the analysis workflow
  • Sharing your research
    • with your PI
    • with external collaborators
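One simple way to avoid forgetting the meaning of your columns is to keep a small data dictionary next to each dataset. The sketch below writes one as a CSV file with one row per column. The column names, descriptions and the file name data_dictionary.csv are hypothetical examples, not an IBED or FNWI standard; replace them with whatever your DMP and metadata standard prescribe.

```python
import csv
from pathlib import Path

# Hypothetical example columns; replace with the columns of your own dataset.
DATA_DICTIONARY = [
    {"column": "site_id", "description": "Unique identifier of the sampling site", "unit": "", "type": "string"},
    {"column": "date", "description": "Sampling date (ISO 8601, YYYY-MM-DD)", "unit": "", "type": "date"},
    {"column": "temp_c", "description": "Water temperature at time of sampling", "unit": "degrees Celsius", "type": "float"},
]

FIELDS = ["column", "description", "unit", "type"]


def write_data_dictionary(entries, path):
    """Write one row per dataset column so its meaning travels with the data."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)  # create e.g. data/input if missing
    with path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(entries)
    return path
```

For example, write_data_dictionary(DATA_DICTIONARY, "data/input/data_dictionary.csv") stores the dictionary next to the raw data, so it ends up in the final archive together with the dataset it describes.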

3 Project Setup

A common issue that most students and PIs encounter is how to keep their project organized. The starting point is to define a project directory structure for each individual project unit (e.g. a manuscript or a BSc/MSc thesis). The directory structure, and what counts as a project unit, follow from your project proposal’s roadmap and your DMP. While project units can change over time, it is good to have an initial starting point for basic organisation. Here we provide some basic examples. To set up a basic R project, see the following tutorial [TO BE CREATED].

Basic project directory structure

Example 1

.
├── README.md
├── code
│   ├── 0_data_preparation.Rmd
│   └── 1_analysis.Rmd
├── data
│   ├── input
│   └── output
├── docs
│   ├── manuscript.docx
│   ├── figs
│   └── tabs
└── project_name.Rproj

Example 2

.
├── README.md
├── code
│   ├── 0_data_preparation.Rmd
│   └── 1_analysis.Rmd
├── data
│   ├── raw
│   └── processed
├── docs
│   ├── manuscript.docx
│   ├── figs
│   └── tabs
└── project_name.Rproj
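To avoid recreating these folders by hand for every project unit, you can script the skeleton. The sketch below builds the layout of Example 1 with Python's pathlib; the function name create_project and the README text are hypothetical. It does not create the .Rproj file, which RStudio normally generates when you create a new project in an existing directory.

```python
from pathlib import Path

# Folder layout of Example 1; use "data/raw" and "data/processed" instead for Example 2.
SUBDIRS = [
    "code",
    "data/input",
    "data/output",
    "docs/figs",
    "docs/tabs",
]


def create_project(root, project_name):
    """Create the skeleton of a project unit without overwriting existing files."""
    root = Path(root)
    for sub in SUBDIRS:
        (root / sub).mkdir(parents=True, exist_ok=True)
    readme = root / "README.md"
    if not readme.exists():  # never overwrite an existing README
        readme.write_text(
            f"# {project_name}\n\nDescribe the project, data sources and workflow here.\n",
            encoding="utf-8",
        )
    # Return all created paths, relative to the project root, for a quick check.
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*"))
```

Because every mkdir call uses exist_ok=True, running the function again on an existing project is harmless.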

4 Data Storage

To be in line with institutional policy, researchers need to ensure, prior to publication, that research data is accessible through an institutional cloud storage solution. Depending on needs such as size, speed, resources and programmatic accessibility, different storage solutions are available to IBED staff and students. Below we list the storage resources that can be used for ongoing research.

4.1 Research Drive

  • When to use: Project-specific research data; ideal for collaboration. Provided by UvA IT Services.
  • Features: Store and share files/data with internal and external collaborators; (programmatically) accessible from computing resources.
  • Access: Requested by PIs or coordinators via the TOPdesk self-service portal.
  • Cost: FREE
  • Documentation:

4.2 OneDrive

  • When to use: Personal file storage in Microsoft 365 cloud.
  • Access: Through a browser, or through Windows File Explorer on UvA computers.
  • Features: File backup, 1 TB limit.
  • Cost: FREE

4.3 Tape SURF

  • When to use: Long-term storage for infrequently accessed data (project-paid).
  • Access: SURF service desk.
  • Features: Long-term data storage in compressed units, 10 TB for 5 years at 150 euros per year.
  • Cost: 150 euros per year (750 euros for 5 years)
  • Documentation: More Info

4.4 Tape IBED

  • When to use: Long-term storage for infrequently accessed data (non-project).
  • Access: Request via Computational Support IBED (see IBED tape archive usage and policy)
  • Features: Long-term data storage in compressed units.
  • Cost: FREE up to 5 TB
  • Documentation: More Info

4.5 IBED PhD/Postdoc Data Archive

  • When to use: Centralized storage for PhD/Postdoc data archives.
  • Access: The data manager provides access once a data archive has been submitted.
  • Features: PI can download and share files stored in the archive.
  • Cost: FREE
  • Documentation: More Info

5 Data Publication

At the close of a project (e.g. a student project or a publication), data and code must be published in a data repository following the Open Science and FAIR principles. Many general-purpose and domain-specific repositories exist (see below).

An important general-purpose repository is Figshare, for which the UvA provides a UvA/AUAS institutional account to every staff member. Note that BSc and MSc students do not have access to the institutional Figshare. BSc and MSc students can create their own accounts and upload files of up to 5 GB for open data and materials. However, we strongly recommend that researchers invite students to a “project” created via their institutional account. This way, the research data is accessible to the PI in a public repository owned by the UvA, and there are no storage limitations.
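Before uploading, it helps to bundle the project into a single, self-contained archive that others can verify. The sketch below writes a SHA-256 checksum manifest into the project folder and then zips the folder with shutil.make_archive. The manifest name MANIFEST.sha256 and the function names are our own hypothetical choices, not a Figshare or UvA requirement; the manifest lines use the "checksum, two spaces, path" layout that the GNU sha256sum -c command can check. Write the archive outside the project folder, otherwise the zip would try to include itself.

```python
import hashlib
import shutil
from pathlib import Path


def sha256sum(path, chunk_size=1 << 20):
    """Compute the SHA-256 checksum of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_archive(project_dir, archive_base):
    """Write MANIFEST.sha256 into the project, then zip the whole project directory.

    archive_base is the archive path without extension; it should lie
    outside project_dir.
    """
    project_dir = Path(project_dir)
    manifest = project_dir / "MANIFEST.sha256"
    lines = []
    for path in sorted(project_dir.rglob("*")):
        if path.is_file() and path != manifest:
            rel = path.relative_to(project_dir).as_posix()
            lines.append(f"{sha256sum(path)}  {rel}")
    manifest.write_text("\n".join(lines) + "\n", encoding="utf-8")
    # make_archive returns the file name of the created .zip archive
    return shutil.make_archive(str(archive_base), "zip", root_dir=project_dir)
```

Anyone who downloads and unpacks the archive can then confirm that no file was corrupted or changed by comparing checksums against the manifest.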

5.1 Figshare

5.2 (Domain-Specific) Repositories

  • Zenodo: General purpose, 50 GB limit, offers options for single or multiple file uploads.
  • Datadryad: All disciplines, 300 GB limit, USD 120 base price.
  • NCBI: For biomedical and genomic information.
  • Pangaea: Earth science data, free but asks for a contribution.
  • Movebank: Movement data.
  • Paleobiodb: Paleo data.
  • TRY-db: Plant traits.
  • re3data: Search all available scientific repositories.
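A quick size check can tell you early which general-purpose repository fits your dataset. The sketch below sums the file sizes in a directory and compares the total with the limits listed above. It assumes that each limit caps the total upload size and that 1 GB means 10^9 bytes; repositories may define their limits differently (e.g. per file or per record), so confirm the current policy before uploading. The function names are hypothetical.

```python
from pathlib import Path

# Limits as listed above, in GB; assumed here to cap the total dataset size.
LIMITS_GB = {"Zenodo": 50, "Datadryad": 300}


def total_size_bytes(directory):
    """Sum the sizes of all files below `directory`."""
    return sum(p.stat().st_size for p in Path(directory).rglob("*") if p.is_file())


def repositories_that_fit(directory, limits_gb=LIMITS_GB):
    """Return, sorted by name, the repositories whose limit the dataset does not exceed."""
    size = total_size_bytes(directory)
    # 1 GB is taken as 10**9 bytes in this sketch.
    return sorted(name for name, gb in limits_gb.items() if size <= gb * 10**9)
```

For example, repositories_that_fit("data/output") lists the candidate repositories for your processed data; an empty list is a signal to contact the data steward about alternatives.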

6 HPC

There are several options available to IBED staff and students who need extra computing capacity. We distinguish two virtual research environments (the UvA Virtual Research Environment and SURF Research Cloud) and three clusters (Crunchomics, Snellius, and Spider / the Grid). See the full list below.

Before applying for HPC resources, consider the following questions:

  • Do you know how to use HPC systems?
  • Do you need GPU or CPU (or both)?
  • Do you know exactly what you need?
  • How quickly do you need to get started and for how long?
  • Do you have specific requirements regarding large datasets?
  • Does it have to be completely free, or do you have project money?

The Computational Support Team can help you decide which option to use. You should then contact the help desk of the chosen service to get access and to troubleshoot specific problems.

6.1 Virtual Research Environment

  • What is it: Cloud-based, lightweight work environment provided by UvA IT Services.
  • Features:
    • Fully customizable in terms of computing power, storage capacity, and tools.
    • Suitable for maintaining a service or tool.
    • Runs on Linux or Windows on Microsoft Azure, and may later be available on SURF Cloud and Amazon AWS.
  • Access: Requested by PIs or coordinators via the TOPdesk self-service portal.
  • Cost: Probably free for small projects; large projects may have to pay.
  • Documentation: more info

6.2 Research Cloud

  • What is it: SURF collaborative environment and portal to different cloud providers.
  • Features:
    • Everything done through a workspace.
    • Can connect to data on Research Drive.
  • Access: Contact the SURF support desk or send an email to .
  • Cost: Paid from the project budget if possible; e-infra grants are available for students.
  • Documentation: more info

6.3 Crunchomics

  • What is it: FNWI (FEOIG) service for running heavy calculations.
  • Features:
    • CPUs, GPUs, large memory nodes.
    • Group creation and collaboration.
  • Access: Email Wim de Leeuw with your UvAnetID.
  • Cost: Available and free for IBED and SILS.
  • Documentation: more info

6.4 Snellius

  • What is it: SURF service providing access to the Dutch national supercomputer.
  • Features: CPUs, GPUs, large symmetric multi-processing nodes.
  • Access: Through SURF request portal.
  • Cost: Researchers can apply for computing time, data services, and support. Large scale research projects need to apply via NWO.
  • Documentation: more info and tutorial GPU access

6.5 Spider and the Grid

  • What is it: Data processing platforms at SURF for highly parallel jobs on distributed resources.
  • Features: Suitable for large, structured datasets.
  • Access: Contact the SURF support desk or send an email to .
  • Documentation: more info
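Highly parallel workloads are often run as many independent tasks, each processing one input file. On clusters that use the SLURM scheduler (Snellius does; check the documentation of the other platforms, and note that the Grid uses its own middleware), a job array such as sbatch --array=0-9 job.sh starts ten tasks, and SLURM sets the environment variable SLURM_ARRAY_TASK_ID to a different value in each. The Python sketch below shows how a script can use that variable to pick its own input; select_input and the line-counting process step are hypothetical placeholders for your analysis.

```python
import os


def select_input(files, task_id=None):
    """Pick the file this array task should process.

    SLURM sets SLURM_ARRAY_TASK_ID for every task of a job array; sorting
    the file list makes the task-to-file mapping reproducible.
    """
    if task_id is None:
        task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "0"))
    files = sorted(files)
    if not 0 <= task_id < len(files):
        raise IndexError(f"task id {task_id} out of range for {len(files)} files")
    return files[task_id]


def process(path):
    """Hypothetical analysis step: count the lines of one input file."""
    with open(path, encoding="utf-8") as fh:
        return sum(1 for _ in fh)
```

In a batch script, each task would call select_input on the same list of input files (e.g. all CSV files in data/input) and run process on the result, so the array range should match the number of files.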

7 Code Versioning

Imagine you have a piece of code and want to track its changes without losing the original version. The conventional approach is to save scripts as new files, often labeled with indicators like ‘v0’ or a timestamp. Git offers a more seamless way to version your code without manually managing separate version files: it not only tracks the changes made to your files but also provides tools to document those changes (commit messages). Although Git was developed for versioning code, it can also handle versioning of smaller datasets. Because Git records changes line by line, it works best with plain-text formats (e.g., csv, fasta), which platforms such as GitHub and GitLab can also display and compare.
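Large files are where Git versioning of data stops being practical; GitHub, for instance, rejects files larger than 100 MiB. The sketch below lists files above a size threshold in a repository folder (skipping Git's own .git directory) and turns them into root-anchored .gitignore entries, so that such data stays in institutional storage rather than in the repository. The 50 MiB default threshold and the function names are our own assumptions for illustration.

```python
from pathlib import Path

# Assumed threshold for this sketch: 50 MiB, well below GitHub's 100 MiB hard limit.
DEFAULT_THRESHOLD = 50 * 1024**2


def large_files(repo_dir, threshold=DEFAULT_THRESHOLD):
    """List files (as relative paths) larger than `threshold` bytes, ignoring .git."""
    repo_dir = Path(repo_dir)
    hits = []
    for path in repo_dir.rglob("*"):
        rel = path.relative_to(repo_dir)
        if ".git" in rel.parts or not path.is_file():
            continue
        if path.stat().st_size > threshold:
            hits.append(rel.as_posix())
    return sorted(hits)


def gitignore_lines(paths):
    """Turn relative paths into .gitignore patterns anchored at the repository root."""
    return [f"/{p}" for p in paths]
```

The leading slash in each pattern makes Git ignore exactly that file at that location, rather than any file with the same name elsewhere in the project.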

9 Additional Tutorials

To suggest improvements to this documentation, please submit an issue or contact the IBED Computational Support Team.