The BIOS project has generated RNA-sequencing and DNA methylation data for over 4000 individuals. As part of these data, GoNL imputed genotypes were generated from existing genotypes and several phenotypes/demographic variables were collected for the same set of samples. A relational, SQL-based (Postgres) metadatabase (MDb) was created to store the large-scale multiple-omic data collected, in a structured way. Metadata and quantifications from the RP4 metabolomics project were also added to this database.

An export of this metadatabase is available through the BBMRIomics package as a dataset. This dataset contains a data.frame for each of the tables and views in the database.

##  [1] "allphenotypes"                 "cellcounts"                   
##  [3] "dna_sample"                    "freeze1methylation"           
##  [5] "freeze1rnaseq"                 "freeze2methylation"           
##  [7] "freeze2rnaseq"                 "getfastq"                     
##  [9] "getidat"                       "getids"                       
## [11] "getimputations"                "getmethylationruns"           
## [13] "getrelations"                  "getrnaseqruns"                
## [15] "gwas"                          "imputation"                   
## [17] "methylation_450k_freeze"       "methylation_450k_run"         
## [19] "methylationsamplesheet"        "minimalphenotypes"            
## [21] "nightingale_quantification"    "nightingale_run"              
## [23] "person"                        "persontogwas_includingmztwins"
## [25] "relation"                      "rna_freeze"                   
## [27] "rna_merged_run"                "rna_run"                      
## [29] "rna_sample"                    "rnaseqsamplesheet"            
## [31] "RP3_MDB_USRPWD"                "visit"

Alternatively, if you have access to a locally running instance of the database the functions described below may be used to access the database directly.

MDb contents

The MDb contains as much meta-information as possible from all samples and datatypes: location of (raw) data on srm, md5 checksum verification, quality control information, links between the different identifiers used (person_id, dna_id, etc) and phenotype information. The data has been seperated into a number of entities, as described below:

Table: Description:
person Information about persons (including associated ids)
relation Relationship information between persons
gwas Information about GWAS runs
imputation Information about preformed genotype imputations
visit Phenotypes and other information related to the collection of samples
dna_sample Information about DNA samples
methylation_450k_run Information about Illumina 450k methylation array runs
methylation_450k_freeze Which methylation runs are included in which data freezes (and freeze subsets)
rna_sample Information about RNA samples
rna_run Information about RNAseq runs
rna_merged_run Which RNAseq runs are included in merged RNA runs
rna_freeze Which RNAseq runs are included in which data freezes (and freeze subsets)
nightingale_run Information regarding nightingale runs
nightingale_quantification Metabolomics quantification measurements

The listTables function can be used to retrieve a list of table names from a locally running instance as well:

Available views

Views are predefined SQL queries which can be used to extract a subset of the available information from the database. The names of the available views can be retrieved from a local instance using the listViews-function:

Retrieving views and querying the database

To retrieve a view from a local instance the getSQLview-function can be used. Note that view names are not case sensitive.

We can always add views if necessary; please contact Leon Mei.

You can also query the tables from a local instance directly using the runQuery-function. This function is just a wrapper around the dbGetQuery-function from the RPostgreSQL-package, so that package (or any other API which interacts with postgres) can also be used directly.

Database versioning

The database is built from data and SQL scripts stored on the LUMC git server. To retrieve the hash of the commit used to built the local database, the mdbVersion-function can be used. This hash can be seen as the version of the database.