Generated omics data

DNA methylation data and RNA-seq data were generated within the Biobank-based Integrative Omics Studies Consortium (http://wiki.bbmri.nl/wiki/BIOS_start-).

The data come from six biobanks:

  1. Cohort on Diabetes and Atherosclerosis Maastricht (CODAM, n~180) (http://doi.org/10.1111/j.1365-2362.2010.02418.x),
  2. LifeLines (LL, n~700) (http://doi.org/10.1136/bmjopen-2014-006772),
  3. The Leiden Longevity Study (LLS, n~600) (http://doi.org/10.1111/j.1532-5415.2009.02381.x),
  4. The Rotterdam Study (RS, n~600) (http://doi.org/10.1007/s10654-011-9610-5),
  5. The Netherlands Twin Registry (NTR, n~1800) (http://doi.org/10.1017/thg.2012.140) and
  6. The Prospective ALS Study Netherlands (PAN, n~180) (http://doi.org/10.1136/jnnp.2011.244939).

Sample identity of DNA methylation and gene expression data was confirmed using genotype data.

Data were generated by the Human Genotyping Facility (HugeF) of ErasmusMC, the Netherlands.

Available datasets

Currently, RNA-sequencing and DNA methylation datasets containing unrelated individuals with measurements that pass quality control are available, both per biobank and combined. A metabolomic dataset has been created containing the overlapping individuals for whom RNA-sequencing or DNA methylation data are available. Furthermore, BBMRIomics provides a convenience function to import whole-genome sequencing or imputed genotypes.

Loading the data

Load a specific dataset with bbmri.data(name_dataset). Use data(package="BBMRIomics") to list the datasets currently available. Once a dataset is loaded, check the name of the loaded object with ls() and inspect it by typing that name in the console; this automatically calls the built-in show method, which prints a summary of the dataset's contents.

## [1] "counts"         "RP3_MDB_USRPWD"
## class: RangedSummarizedExperiment 
## dim: 56515 420 
## metadata(4): creationDate author BBMRIomicsVersion note
## assays(1): data
## rownames(56515): ENSG00000000003 ENSG00000000005 ... ENSG00000270182
##   ENSG00000270184
## rowData names(1): gene_id
## colnames(420): BIOSF616D007 BIOSE9E0432D ... BIOS6A609DF4 BIOS319C71B4
## colData names(113): uuid biobank_id ... Sampling_Date flowcell
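The loading steps described above can be sketched as follows. This is a minimal sketch, assuming BBMRIomics is already installed in the current R library. name_dataset is the placeholder used in the text, not a real dataset name, so the bbmri.data() call is left commented out. The variables have_pkg and available are introduced here purely for illustration.

```r
# Sketch only: BBMRIomics must already be installed in the current R library.
have_pkg <- requireNamespace("BBMRIomics", quietly = TRUE)

if (have_pkg) {
  library(BBMRIomics)

  # List the datasets the package currently provides.
  available <- data(package = "BBMRIomics")$results
  print(available[, c("Item", "Title"), drop = FALSE])

  # Load one of them; name_dataset is a placeholder for an Item listed above.
  # bbmri.data(name_dataset)

  # Check which objects are now in the workspace.
  print(ls())
} else {
  message("BBMRIomics is not installed; skipping the loading example.")
}
```

After a real dataset has been loaded, typing the name of the loaded object (counts in the output shown above) prints its summary.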

SummarizedExperiment

The datasets are stored as R/Bioconductor SummarizedExperiment objects.

Brief summary from the SummarizedExperiment vignette:

The SummarizedExperiment class is used to store rectangular matrices of experimental results, which are commonly produced by sequencing and microarray experiments. Each object stores observations of one or more samples, along with additional meta-data describing both the observations (features) and samples (phenotypes).
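To make the layout described above concrete, here is a small sketch that builds a toy SummarizedExperiment and reads back each of its parts. All values and names are made up for illustration. The assay name data and the column names gene_id and biobank_id merely mirror those shown in the printed summary earlier, and the object toy is hypothetical.

```r
library(SummarizedExperiment)

# Toy data: 3 features x 2 samples; all values and names are made up.
mat <- matrix(c(5, 0, 12, 7, 3, 9), nrow = 3,
              dimnames = list(c("feature1", "feature2", "feature3"),
                              c("sampleA", "sampleB")))

toy <- SummarizedExperiment(
  assays   = list(data = mat),
  rowData  = DataFrame(gene_id = rownames(mat)),
  colData  = DataFrame(biobank_id = c("X", "Y"), row.names = colnames(mat)),
  metadata = list(note = "toy example")
)

assay(toy, "data")   # the measurement matrix (features x samples)
rowData(toy)         # feature annotation, one row per feature
colData(toy)         # sample annotation (phenotypes), one row per sample
metadata(toy)$note   # experiment-level metadata
```

The same accessors should apply to a dataset loaded from BBMRIomics, since those are stored as SummarizedExperiment objects.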

A key aspect of the SummarizedExperiment class is the coordination of the meta-data and assays when subsetting. For example, if you want to exclude a given sample, you can do so for both the meta-data and the assay in one operation, which ensures that the meta-data and observed data remain in sync. Improperly accounting for meta-data and observational data has resulted in a number of incorrect results and retractions, so this is a very desirable property.
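The coordinated subsetting described above can be illustrated with a toy object. This is a sketch with simulated counts. The sample names, gene names and the biobank column are hypothetical and not taken from the BIOS data.

```r
library(SummarizedExperiment)

# Simulated counts: 4 genes x 3 samples (values are random, names are made up).
set.seed(1)
counts <- matrix(rpois(12, lambda = 10), nrow = 4,
                 dimnames = list(paste0("gene", 1:4), paste0("sample", 1:3)))
pheno <- DataFrame(biobank = c("A", "A", "B"), row.names = colnames(counts))

se <- SummarizedExperiment(assays = list(data = counts), colData = pheno)

# Exclude one sample: the assay column and the colData row are dropped together.
se_sub <- se[, colnames(se) != "sample2"]

dim(se_sub)
colData(se_sub)

# The assay and the sample annotation still describe the same samples.
stopifnot(identical(colnames(assay(se_sub)), rownames(colData(se_sub))))
```

Because a single indexing operation acts on the whole object, there is no separate phenotype table that could be left out of step with the measurements.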

SummarizedExperiment is in many ways similar to the historical ExpressionSet, the main distinction being that SummarizedExperiment is more flexible in its row information, allowing both GRanges-based row descriptions and those given by arbitrary DataFrames. This makes it ideally suited to a variety of experiments, particularly sequencing-based experiments such as RNA-Seq and ChIP-Seq.
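The two kinds of row information mentioned above can be sketched side by side. The feature names, labels and genomic coordinates below are invented for illustration only.

```r
library(SummarizedExperiment)  # also attaches GenomicRanges and IRanges

mat <- matrix(1:6, nrow = 3,
              dimnames = list(paste0("f", 1:3), c("s1", "s2")))

# Rows described by an arbitrary DataFrame.
se_df <- SummarizedExperiment(assays = list(data = mat),
                              rowData = DataFrame(label = c("a", "b", "c")))

# Rows described by genomic ranges (coordinates are made up).
gr <- GRanges(seqnames = "chr1",
              ranges = IRanges(start = c(100, 200, 300), width = 50),
              strand = "+")
names(gr) <- rownames(mat)
se_gr <- SummarizedExperiment(assays = list(data = mat), rowRanges = gr)

class(se_df)
class(se_gr)

# Range-based row information allows coordinate queries on the whole object.
hits <- subsetByOverlaps(se_gr, GRanges("chr1", IRanges(190, 260)))
rownames(hits)
```

Supplying rowRanges yields a RangedSummarizedExperiment, the class reported in the printed summary of the loaded dataset earlier in this document.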

For more information on the SummarizedExperiment data structure, see:

  1. Vignette
  2. Tutorial
  3. Paper