The aim of the BIOS consortium is to generate RNA sequencing and DNA methylation data for 4000 individuals who were selected because array-based genotypes or whole-genome sequencing data (GoNL) were already available for them. In addition, the generated data must pass a set of quality control metrics.

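The output below summarizes the extraction of the RNA sequencing and DNA methylation samples that pass quality control. The final Venn counts break the individuals down by QC-passing data:

- 4212 have both RNA sequencing and DNA methylation data.
- 215 have RNA sequencing only.
- 1860 have DNA methylation only.
- 92 have neither.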

## No username and password provided for the MDB use stored views!
## [1] 6379   14
## [1] 4427    9
## [1] 6072    7
## [1] 0
## [1] 0
## [1] 0
##  rna dnam 
## 4427 6072
##   rna dnam Counts
## 1   0    0     92
## 2   0    1   1860
## 3   1    0    215
## 4   1    1   4212
## attr(,"class")
## [1] "VennCounts"
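The Venn counts above record, for each individual, whether a QC-passing RNA sequencing sample and a QC-passing DNA methylation sample are available. What follows is a minimal base-R sketch of the same tabulation. The vectors `all_ids`, `rna_ids` and `dnam_ids` and their toy values are hypothetical. The original output was produced with a `VennCounts` object (the class returned by limma's `vennCounts`), which this sketch does not use.

```r
# Toy inputs (hypothetical, not BIOS data): all individuals and those whose
# RNA sequencing or DNA methylation sample passed quality control.
all_ids  <- c("A", "B", "C", "D", "E")
rna_ids  <- c("B", "D")
dnam_ids <- c("C", "D", "E")

# One row per individual with 0/1 indicators, like the rna/dnam columns above.
membership <- data.frame(
  ids  = all_ids,
  rna  = as.integer(all_ids %in% rna_ids),
  dnam = as.integer(all_ids %in% dnam_ids),
  stringsAsFactors = FALSE
)

# Count individuals per rna/dnam combination; fixing the factor levels keeps
# empty combinations in the table with a count of zero.
venn <- as.data.frame(
  table(rna  = factor(membership$rna,  levels = 0:1),
        dnam = factor(membership$dnam, levels = 0:1)),
  responseName = "Counts"
)
print(venn)

# Individuals with both data types are the candidates for a joint freeze.
both <- membership$ids[membership$rna == 1 & membership$dnam == 1]
print(both)
```

Note that `as.data.frame` on a table varies the first factor fastest, so the row order differs from the `VennCounts` printout. The counts per combination are the same.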

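Another requirement for Freeze 2 is a maximal set of unrelated individuals for whom both RNA sequencing and DNA methylation data could be generated. The output below does two things. First, it summarizes the known relation types between individuals and groups related individuals into families (`family_id`). It then prints a few example families of increasing size.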

## [1] TRUE
## [1] 4262
## [1] 3558
## 
##           2nd degree family genetical 1st degree family 
##                           6                           8 
##                   has child          has dizygotic twin 
##                         534                        1028 
##        has monozygotic twin                  has parent 
##                        1881                         548 
##   has repeated measurements                     has sib 
##                          88                          99 
##  inferred 1st degree family 
##                          70
## [1] 1616
## famSizes
##    2    4    6    8   10   12   17   18   21   22   26   28 
## 1375  171   13    3   22   19    1    6    1    1    1    3
## [1] "1" "2" "3" "4" "6" "7"
##           ids         uuid biobank_id gonl_id gwas_id
## 17 CODAM-2037 BIOS71A89511      CODAM        2037
## 73 CODAM-2240 BIOS6601F3E0      CODAM        2240
##                 relation_type relation_id family_id
## 17 inferred 1st degree family  CODAM-2240         1
## 73 inferred 1st degree family  CODAM-2037         1
## [1] "9"  "10" "11" "12" "13" "14"
##                ids        uuid biobank_id  gonl_id gwas_id relation_type
## 956 LL-LLDeep_1600 BIOS82666E4         LL gonl-56a         has child
## 960 LL-LLDeep_1603 BIOS855C858         LL gonl-56b         has child
## 981 LL-LLDeep_1619 BIOS9542DD0         LL gonl-56c        has parent
## 982 LL-LLDeep_1619 BIOS9542DD0         LL gonl-56c        has parent
##        relation_id family_id
## 956 LL-LLDeep_1619         9
## 960 LL-LLDeep_1619         9
## 981 LL-LLDeep_1600         9
## 982 LL-LLDeep_1603         9
## [1] "165" "194" "308" "325" "383" "387"
##                           ids         uuid biobank_id gonl_id gwas_id
## 2148         NTR-A1083C-10146 BIOS7E1BFC2D        NTR       10175
## 2149         NTR-A1083C-10146 BIOS7E1BFC2D        NTR       10175
## 2150         NTR-A1083C-10146 BIOS7E1BFC2D        NTR       10175
## 2151 NTR-A1083C-NTR15215-8666 BIOS59AC7812        NTR       10175
## 2152 NTR-A1083C-NTR15215-8666 BIOS59AC7812        NTR       10175
## 2153 NTR-A1083C-NTR15215-8666 BIOS59AC7812        NTR       10175
## 2154         NTR-A1083D-10175 BIOS0AC3A8CD        NTR       10175
## 2155         NTR-A1083D-10175 BIOS0AC3A8CD        NTR       10175
## 2156         NTR-A1083D-10175 BIOS0AC3A8CD        NTR       10175
## 2157 NTR-A1083D-NTR15589-8860 BIOS063C99A7        NTR       10175
## 2158 NTR-A1083D-NTR15589-8860 BIOS063C99A7        NTR       10175
## 2159 NTR-A1083D-NTR15589-8860 BIOS063C99A7        NTR       10175
##                  relation_type              relation_id family_id
## 2148      has monozygotic twin         NTR-A1083D-10175       165
## 2149      has monozygotic twin NTR-A1083D-NTR15589-8860       165
## 2150 has repeated measurements NTR-A1083C-NTR15215-8666       165
## 2151      has monozygotic twin         NTR-A1083D-10175       165
## 2152      has monozygotic twin NTR-A1083D-NTR15589-8860       165
## 2153 has repeated measurements         NTR-A1083C-10146       165
## 2154      has monozygotic twin         NTR-A1083C-10146       165
## 2155      has monozygotic twin NTR-A1083C-NTR15215-8666       165
## 2156 has repeated measurements NTR-A1083D-NTR15589-8860       165
## 2157      has monozygotic twin         NTR-A1083C-10146       165
## 2158      has monozygotic twin NTR-A1083C-NTR15215-8666       165
## 2159 has repeated measurements         NTR-A1083D-10175       165
## [1] "261" "692" "738"
##                              ids         uuid biobank_id  gonl_id  gwas_id
## 2368 NTR-A118A-NTR01371-06D07229 BIOS29A0CC56        NTR gonl-96a 06D07229
## 2369 NTR-A118A-NTR01371-06D07229 BIOS29A0CC56        NTR gonl-96a 06D07229
## 2370 NTR-A118A-NTR01371-06D07229 BIOS29A0CC56        NTR gonl-96a 06D07229
## 2371 NTR-A118A-NTR01371-06D07229 BIOS29A0CC56        NTR gonl-96a 06D07229
## 2372 NTR-A118B-NTR01373-06D07230 BIOS13D06DE0        NTR gonl-96b 06D07230
## 2373 NTR-A118B-NTR01373-06D07230 BIOS13D06DE0        NTR gonl-96b 06D07230
## 2374 NTR-A118B-NTR01373-06D07230 BIOS13D06DE0        NTR gonl-96b 06D07230
## 2375 NTR-A118B-NTR01373-06D07230 BIOS13D06DE0        NTR gonl-96b 06D07230
## 2376               NTR-A118C-829 BIOS34C8B8A5        NTR gonl-96c    10375
## 2377               NTR-A118C-829 BIOS34C8B8A5        NTR gonl-96c    10375
## 2378               NTR-A118C-829 BIOS34C8B8A5        NTR gonl-96c    10375
## 2379               NTR-A118C-829 BIOS34C8B8A5        NTR gonl-96c    10375
## 2380               NTR-A118C-829 BIOS34C8B8A5        NTR gonl-96c    10375
## 2381   NTR-A118C-NT0027644-10375 BIOS125DBF1F        NTR gonl-96c    10375
## 2382   NTR-A118C-NT0027644-10375 BIOS125DBF1F        NTR gonl-96c    10375
## 2383   NTR-A118C-NT0027644-10375 BIOS125DBF1F        NTR gonl-96c    10375
## 2384   NTR-A118C-NT0027644-10375 BIOS125DBF1F        NTR gonl-96c    10375
## 2385   NTR-A118C-NT0027644-10375 BIOS125DBF1F        NTR gonl-96c    10375
## 2386   NTR-A118D-NT0027645-10376 BIOSD15C4FC7        NTR         10375
## 2387   NTR-A118D-NT0027645-10376 BIOSD15C4FC7        NTR         10375
## 2388   NTR-A118D-NT0027645-10376 BIOSD15C4FC7        NTR         10375
## 2389   NTR-A118D-NT0027645-10376 BIOSD15C4FC7        NTR         10375
## 2390   NTR-A118D-NT0027645-10376 BIOSD15C4FC7        NTR         10375
## 2391      NTR-A118D-NTR00927-607 BIOSA249C90B        NTR         10375
## 2392      NTR-A118D-NTR00927-607 BIOSA249C90B        NTR         10375
## 2393      NTR-A118D-NTR00927-607 BIOSA249C90B        NTR         10375
## 2394      NTR-A118D-NTR00927-607 BIOSA249C90B        NTR         10375
## 2395      NTR-A118D-NTR00927-607 BIOSA249C90B        NTR         10375
##                  relation_type                 relation_id family_id
## 2368                 has child               NTR-A118C-829       261
## 2369                 has child   NTR-A118C-NT0027644-10375       261
## 2370                 has child   NTR-A118D-NT0027645-10376       261
## 2371                 has child      NTR-A118D-NTR00927-607       261
## 2372                 has child               NTR-A118C-829       261
## 2373                 has child   NTR-A118C-NT0027644-10375       261
## 2374                 has child   NTR-A118D-NT0027645-10376       261
## 2375                 has child      NTR-A118D-NTR00927-607       261
## 2376      has monozygotic twin   NTR-A118D-NT0027645-10376       261
## 2377      has monozygotic twin      NTR-A118D-NTR00927-607       261
## 2378                has parent NTR-A118A-NTR01371-06D07229       261
## 2379                has parent NTR-A118B-NTR01373-06D07230       261
## 2380 has repeated measurements   NTR-A118C-NT0027644-10375       261
## 2381      has monozygotic twin   NTR-A118D-NT0027645-10376       261
## 2382      has monozygotic twin      NTR-A118D-NTR00927-607       261
## 2383                has parent NTR-A118A-NTR01371-06D07229       261
## 2384                has parent NTR-A118B-NTR01373-06D07230       261
## 2385 has repeated measurements               NTR-A118C-829       261
## 2386      has monozygotic twin               NTR-A118C-829       261
## 2387      has monozygotic twin   NTR-A118C-NT0027644-10375       261
## 2388                has parent NTR-A118A-NTR01371-06D07229       261
## 2389                has parent NTR-A118B-NTR01373-06D07230       261
## 2390 has repeated measurements      NTR-A118D-NTR00927-607       261
## 2391      has monozygotic twin               NTR-A118C-829       261
## 2392      has monozygotic twin   NTR-A118C-NT0027644-10375       261
## 2393                has parent NTR-A118A-NTR01371-06D07229       261
## 2394                has parent NTR-A118B-NTR01373-06D07230       261
## 2395 has repeated measurements   NTR-A118D-NT0027645-10376       261
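The `family_id` column above gives one identifier to every group of individuals linked by any chain of relations. Family 261, for example, combines two parents, monozygotic twins and repeated measurements. One way to derive such identifiers is to treat each relation as an edge of an undirected graph and label its connected components. The base-R sketch below does this by label propagation. The `relations` table, its toy rows and the helper `assign_families` are hypothetical, and the original identifiers may have been computed differently.

```r
# Toy relation table (hypothetical): one row per ordered pair, as above.
relations <- data.frame(
  ids         = c("P1", "P2", "C1", "C1", "T1", "T2"),
  relation_id = c("C1", "C1", "P1", "P2", "T2", "T1"),
  stringsAsFactors = FALSE
)

# Label the connected components of the undirected relation graph.
assign_families <- function(from, to) {
  nodes <- sort(unique(c(from, to)))
  label <- setNames(seq_along(nodes), nodes)  # each node starts on its own
  repeat {
    changed <- FALSE
    for (i in seq_along(from)) {
      # Both endpoints of an edge take the smaller of their two labels.
      m <- min(label[[from[i]]], label[[to[i]]])
      if (label[[from[i]]] != m || label[[to[i]]] != m) {
        label[[from[i]]] <- m
        label[[to[i]]]   <- m
        changed <- TRUE
      }
    }
    if (!changed) break  # labels are stable: every edge joins equal labels
  }
  # Renumber the components consecutively as 1, 2, ...
  setNames(match(label, unique(label)), nodes)
}

fam <- assign_families(relations$ids, relations$relation_id)
relations$family_id <- unname(fam[relations$ids])
print(relations)

# Number of relation rows per family, tabulated like famSizes above.
print(table(table(relations$family_id)))
```

A graph library would do the same in one call. The loop keeps the sketch free of dependencies.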

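Next, we select the maximal set of unrelated individuals. Consider a GoNL trio in which all three members have both DNA methylation and RNA sequencing data. There we choose the two parents rather than the child: the parents are unrelated to each other, whereas the child is related to both, so this maximizes the number of individuals.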
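Within a family, the selection described above amounts to finding a largest subset of members in which no two individuals are related. This is a maximum independent set of the relation graph. Because the families shown above are small, an exhaustive search is feasible.

The sketch assumes a hypothetical trio and a helper `max_unrelated`, neither of which is part of the original analysis. Relations such as monozygotic twins or repeated measurements would be treated as edges in the same way. When several subsets of the same size exist, the helper simply returns the first one found. Any preference rule used in the actual Freeze 2 selection (for instance based on data availability) is therefore not reproduced.

```r
# Hypothetical family: a GoNL-style trio in which every member has both
# RNA sequencing and DNA methylation data. Each row is a related pair.
members <- c("father", "mother", "child")
related <- rbind(c("father", "child"),
                 c("mother", "child"))

# Return a largest subset of `members` that contains no related pair.
# Exhaustive search from the largest subset size downwards (2^n subsets),
# so this is only practical for small families.
max_unrelated <- function(members, related) {
  is_unrelated <- function(s) {
    !any(related[, 1] %in% s & related[, 2] %in% s)
  }
  for (k in rev(seq_along(members))) {
    for (s in combn(members, k, simplify = FALSE)) {
      if (is_unrelated(s)) return(s)
    }
  }
  character(0)
}

selected <- max_unrelated(members, related)
print(selected)  # the two parents
```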

The RNA or DNA methylation Freeze 2 set is then extended with unrelated individuals who have only RNA sequencing or only DNA methylation data.

## [1] 3405
## [1] 3405
## [1] 3530    9
## [1] 4386    7
##          ids     run_id_rna       run_id_dnam
## 1 CODAM-2001 BD1NYRACXX-5-1 8667053102_R05C02
## 2 CODAM-2002 AD10W1ACXX-4-1 8667053157_R01C02
## 3 CODAM-2013 AD10W1ACXX-4-2 8655685053_R04C02
## 4 CODAM-2016 BD1NYRACXX-5-3 8655685094_R01C01
## 5 CODAM-2017 AD10W1ACXX-4-3 8667053076_R02C01
## 6 CODAM-2020 BD1NYRACXX-5-4 8655685021_R04C02
## [1] 3405    3
## character(0)
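The extension with single-omics individuals, described above, can be sketched as a greedy pass over the candidates. Each candidate is added only if it is not related to anyone already selected. The inputs and the helper `extend_unrelated` below are hypothetical. A greedy pass depends on the candidate order, so it is not guaranteed to add the largest possible number of individuals.

```r
# Hypothetical inputs: unrelated individuals with both data types, candidates
# with only RNA sequencing data, and a matrix of related pairs.
selected_both <- c("A", "B")
rna_only      <- c("C", "D", "E")
related <- rbind(c("A", "C"),   # C is related to an already selected person
                 c("D", "E"))   # D and E are related to each other

# Greedily add candidates that are not related to anyone already selected.
extend_unrelated <- function(selected, candidates, related) {
  for (cand in candidates) {
    # Everyone linked to the candidate, whichever column the pair lists it in.
    partners <- c(related[related[, 1] == cand, 2],
                  related[related[, 2] == cand, 1])
    if (!any(partners %in% selected)) {
      selected <- c(selected, cand)
    }
  }
  selected
}

rna_freeze <- extend_unrelated(selected_both, rna_only, related)
print(rna_freeze)
```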