The package MethylAid is used for sample-level quality control. MethylAid is designed specifically for quality control of large sets of Illumina Human DNA methylation array data, e.g., epigenome-wide association studies (EWAS) using the 450k or 850k (EPIC) arrays. Intensities can be extracted from IDAT files in batches and/or in parallel to reduce memory load and/or overcome long run times. Going from IDAT files to the interactive web application requires two function calls: summarize and visualize. For more information see (Iterson et al. 2014).
path450k <- file.path(VM_BASE_DATA, "IlluminaHumanMethylation450k")
samplesheets <- getView("methylationSamplesheet")
samplesheets <- samplesheets[!duplicated(samplesheets$run_id),]
samplesheets <- samplesheets[order(samplesheets$ids),]
head(samplesheets)
dim(samplesheets)
# Write one comma-separated sample sheet per biobank
for (biobank in c("CODAM", "LL", "LLS", "NTR", "PAN", "RS")) {
  write.table(subset(samplesheets, biobank_id == biobank),
              file = file.path(path450k, "450k", biobank, "sample_sheet.txt"),
              sep = ",", row.names = FALSE, quote = FALSE)
}

# Add the IDAT basename (path without the channel suffix) for each array
samplesheets$Basename <- with(samplesheets,
                              file.path(path450k, "raw", Sentrix_Barcode,
                                        paste(Sentrix_Barcode, Sentrix_Position, sep = "_")))
Next, we check whether all IDAT files exist or whether any are missing on the VM.
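The check itself is not shown here, so the following is only a minimal sketch of how it could be done. It assumes the usual Illumina naming convention, in which each array has a green and a red channel file named after its basename with the suffixes _Grn.idat and _Red.idat. The helper find_missing_idats and the demo file names are hypothetical and not part of MethylAid.

```r
# Hypothetical helper (not part of MethylAid): return the IDAT files that are absent.
# Assumes each basename has two files: <Basename>_Grn.idat and <Basename>_Red.idat.
find_missing_idats <- function(basenames) {
  files <- c(paste0(basenames, "_Grn.idat"), paste0(basenames, "_Red.idat"))
  files[!file.exists(files)]
}

# Self-contained demonstration in a temporary directory (made-up array names).
demo_dir <- file.path(tempdir(), "idat_demo")
dir.create(demo_dir, showWarnings = FALSE)
demo_basenames <- file.path(demo_dir, c("1234567890_R01C01", "1234567890_R02C01"))

# Create both channels for the first array, but only the green channel for the second.
invisible(file.create(paste0(demo_basenames[1], c("_Grn.idat", "_Red.idat"))))
invisible(file.create(paste0(demo_basenames[2], "_Grn.idat")))

# Only the red channel of the second array is reported as missing.
missing_idats <- find_missing_idats(demo_basenames)
print(basename(missing_idats))
```

In this workflow the same helper could be applied to samplesheets$Basename; an empty result would mean that every IDAT file is present on the VM.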
Now we are ready to run MethylAid summarization.
For each biobank we extract all quality control probe intensities using the MethylAid function summarize. Since we have a VM with multiple cores, we can run the summarization in parallel. For each biobank we generate an RData file called MethylAid and store it in the biobank's data directory.
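library(MethylAid)
library(BiocParallel)

# Register a multicore back-end with 10 workers and logging enabled
register(MulticoreParam(10, log = TRUE))
tmp <- sapply(c("CODAM", "LL", "LLS", "NTR", "PAN", "RS"), function(biobank) {
  samplesheet <- samplesheets[samplesheets$biobank_id == biobank, ]
  # Summarize in batches of 100 samples and store the result in the biobank's directory
  summarize(samplesheet, batchSize = 100, BPPARAM = bpparam(),
            file = file.path(path450k, "450k", biobank, "MethylAid"))
})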
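To make the effect of the batch size more concrete, the sketch below shows one way a set of samples can be partitioned into consecutive batches of at most 100. It is an illustration of the idea only; the helper make_batches is hypothetical, and MethylAid's internal batching may be implemented differently.

```r
# Hypothetical helper: split n samples into consecutive batches of at most batch_size.
make_batches <- function(n, batch_size) {
  idx <- seq_len(n)
  split(idx, ceiling(idx / batch_size))
}

# Example: 250 samples with a batch size of 100 give three batches (100, 100, 50).
batches <- make_batches(250, 100)
print(lengths(batches))
```

Smaller batches keep fewer arrays in memory at once, at the cost of more read rounds per biobank.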
See Apps for the interactive sample-level quality control.
Iterson, M. van, E. W. Tobi, R. C. Slieker, W. den Hollander, R. Luijk, P. E. Slagboom, and B. T. Heijmans. 2014. “MethylAid: visual and interactive quality control of large Illumina 450k datasets.” Bioinformatics 30 (23): 3435–7.