The BIOS project has generated RNA-sequencing and DNA methylation data for over 4000 individuals. In addition, GoNL-imputed genotypes were generated from existing genotypes, and several phenotypes and demographic variables were collected for the same set of samples. A relational, SQL-based (Postgres) metadatabase (MDb) was created to store the collected large-scale multi-omics data in a structured way. Metadata and quantifications from the RP4 metabolomics project were also added to this database.
An export of this metadatabase is available through the BBMRIomics package as a dataset. This dataset contains a data.frame for each of the tables and views in the database.
## [1] "allphenotypes" "cellcounts"
## [3] "dna_sample" "freeze1methylation"
## [5] "freeze1rnaseq" "freeze2methylation"
## [7] "freeze2rnaseq" "getfastq"
## [9] "getidat" "getids"
## [11] "getimputations" "getmethylationruns"
## [13] "getrelations" "getrnaseqruns"
## [15] "gwas" "imputation"
## [17] "methylation_450k_freeze" "methylation_450k_run"
## [19] "methylationsamplesheet" "minimalphenotypes"
## [21] "nightingale_quantification" "nightingale_run"
## [23] "person" "persontogwas_includingmztwins"
## [25] "relation" "rna_freeze"
## [27] "rna_merged_run" "rna_run"
## [29] "rna_sample" "rnaseqsamplesheet"
## [31] "RP3_MDB_USRPWD" "visit"
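As a rough illustration of how such an export might be inspected, the sketch below uses a toy stand-in: a named list of two small data.frames. The table names `person` and `visit` appear in the listing above, but collecting them in a list called `mdb_export`, as well as all columns and values, are assumptions made for this example and do not reflect real MDb content.

```r
# Toy stand-in for the exported dataset: a named list of data.frames.
# Table names come from the listing above; columns and values are
# hypothetical placeholders, not real MDb content.
mdb_export <- list(
  person = data.frame(person_id = 1:3, sex = c("F", "M", "F")),
  visit  = data.frame(visit_id = 1:4, person_id = c(1, 1, 2, 3))
)

# Names of the tables/views available in the export
names(mdb_export)

# Number of rows and columns per table
dims <- t(sapply(mdb_export, dim))
colnames(dims) <- c("rows", "columns")
dims
```

A quick overview like this helps to check which tables are present and how large they are before working with any of them.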
Alternatively, if you have access to a locally running instance of the database, the functions described below may be used to access the database directly.
The MDb contains as much meta-information as possible for all samples and data types: the location of (raw) data on srm, md5 checksum verification, quality control information, links between the different identifiers used (person_id, dna_id, etc.), and phenotype information. The data have been separated into a number of entities, as described below:
Table: | Description: |
---|---|
person | Information about persons (including associated ids) |
relation | Relationship information between persons |
gwas | Information about GWAS runs |
imputation | Information about performed genotype imputations |
visit | Phenotypes and other information related to the collection of samples |
dna_sample | Information about DNA samples |
methylation_450k_run | Information about Illumina 450k methylation array runs |
methylation_450k_freeze | Which methylation runs are included in which data freezes (and freeze subsets) |
rna_sample | Information about RNA samples |
rna_run | Information about RNAseq runs |
rna_merged_run | Which RNAseq runs are included in merged RNA runs |
rna_freeze | Which RNAseq runs are included in which data freezes (and freeze subsets) |
nightingale_run | Information about Nightingale runs |
nightingale_quantification | Metabolomics quantification measurements |
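Because the entities are linked through shared identifiers, information from different tables can be combined with a join. The sketch below shows this with base R's `merge` on two toy data.frames. That `dna_sample` carries a `person_id` column is an assumption, and the `biobank` column and all values are hypothetical.

```r
# Toy tables; only the identifier names person_id and dna_id are taken
# from the text, everything else is a hypothetical placeholder.
person <- data.frame(
  person_id = c(1, 2, 3),
  biobank   = c("A", "A", "B")   # hypothetical column
)
dna_sample <- data.frame(
  dna_id    = c(10, 11, 12),
  person_id = c(1, 1, 3)         # assumed link to the person table
)

# Inner join: one row per DNA sample, annotated with person-level information
linked <- merge(person, dna_sample, by = "person_id")
linked

# Persons without any DNA sample
no_dna <- setdiff(person$person_id, dna_sample$person_id)
no_dna
```

An inner join drops persons without a DNA sample; passing `all.x = TRUE` to `merge` would keep them with missing values instead.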
The `listTables` function can also be used to retrieve a list of table names from a locally running instance.
Views are predefined SQL queries which can be used to extract a subset of the available information from the database. The names of the available views can be retrieved from a local instance using the `listViews` function.
To retrieve a view from a local instance, the `getSQLview` function can be used. Note that view names are not case sensitive.
We can always add views if necessary; please contact Leon Mei.
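When working with the exported data.frames rather than a local instance, a similar case-insensitive lookup can be mimicked in plain R. The following is only a sketch: `get_view_ci` is a hypothetical helper, not part of the package, and the `views` list with its columns is a toy stand-in that borrows two view names from the export listing.

```r
# Hypothetical stand-in for two exported views (names from the listing;
# columns and values are placeholders)
views <- list(
  getids   = data.frame(person_id = 1:2),
  getfastq = data.frame(run_id = 1:3)
)

# Look up a view by name, ignoring case (hypothetical helper)
get_view_ci <- function(views, name) {
  idx <- match(tolower(name), tolower(names(views)))
  if (is.na(idx)) stop("Unknown view: ", name)
  views[[idx]]
}

# Mixed-case names resolve to the same view
nrow(get_view_ci(views, "getIDs"))
```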
You can also query the tables of a local instance directly using the `runQuery` function. This function is just a wrapper around the `dbGetQuery` function from the `RPostgreSQL` package, so that package (or any other API that interacts with Postgres) can also be used directly.
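To make the idea of such a wrapper concrete, here is a minimal sketch of what a thin layer around `dbGetQuery` could look like. It is not the actual `runQuery` implementation: `build_select` and `run_query_sketch` are hypothetical names, and all connection arguments are placeholders that would have to match your local instance. The wrapper is only defined here, not called, since calling it requires a running database.

```r
# Build a simple SELECT statement for a given table (hypothetical helper).
# The table name is validated against a simple pattern, because
# identifiers cannot be passed as query parameters.
build_select <- function(table, limit = 10L) {
  stopifnot(grepl("^[A-Za-z_][A-Za-z0-9_]*$", table))
  sprintf("SELECT * FROM %s LIMIT %d;", table, as.integer(limit))
}

# Sketch of a dbGetQuery wrapper; NOT the actual runQuery implementation.
# All connection arguments are placeholders.
run_query_sketch <- function(query, dbname, host, port, user, password) {
  con <- DBI::dbConnect(RPostgreSQL::PostgreSQL(),
                        dbname = dbname, host = host, port = port,
                        user = user, password = password)
  on.exit(DBI::dbDisconnect(con), add = TRUE)  # always close the connection
  DBI::dbGetQuery(con, query)
}

# Query text for the first five rows of the person table
build_select("person", limit = 5)
```

Opening and closing the connection inside the wrapper keeps each call independent, at the cost of reconnecting for every query.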
The database is built from data and SQL scripts stored on the LUMC git server. To retrieve the hash of the commit used to build the local database, the `mdbVersion` function can be used. This hash can be seen as the version of the database.