SNP Groups based on multiple diseases

In the interpretation of MFM results, rather than reporting posterior probabilities for each SNP model, we focus on SNP groups that we construct such that SNPs in the same group are in LD, have a similar effect on disease, and rarely appear together in a disease model. SNP groups and their construction for one disease are detailed in the GUESSFM Tags/Groups vignette.

SNPs with marginal posterior probability of inclusion > 0.001 are placed in groups such that those in the same group can be substituted for one another in a model. It follows that the criteria for two SNPs to be in the same group are:

  1. SNPs are in LD (pairwise \(r^2 > 0.05\), pairwise \(r > 0\)).

  2. SNPs are rarely selected together in models; the marginal posterior probability that both are included in a model is \(< 0.01\).

We hierarchically cluster SNPs within each disease according to \(r^2 \times \mathrm{sign}(r)\) using complete linkage, and group SNPs by cutting the tree according to the above two criteria. We then identify overlapping groups defined in different diseases, and merge or split groups when they meet these criteria. This algorithm is implemented in the group.multi function in GUESSFM.
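The clustering step can be illustrated with a minimal base R sketch. This is not the group.multi implementation: the correlation matrix is a made-up toy example with hypothetical SNP labels, the conversion of the signed similarity to a distance (one minus the similarity) is an assumption, and the cut height of 0.5 is chosen only for illustration.

```r
# Toy signed correlation matrix for five hypothetical SNPs (not real data).
snps <- paste0("snp", 1:5)
r <- matrix(c( 1.0,  0.9,   0.8,  -0.7,  0.1,
               0.9,  1.0,   0.85, -0.6,  0.0,
               0.8,  0.85,  1.0,  -0.65, 0.1,
              -0.7, -0.6,  -0.65,  1.0,  0.2,
               0.1,  0.0,   0.1,   0.2,  1.0),
            nrow = 5, byrow = TRUE, dimnames = list(snps, snps))

# Signed similarity r^2 * sign(r): negatively correlated pairs get a
# negative similarity, so they end up far apart in the tree.
s <- r^2 * sign(r)

# Assumption: use 1 - similarity as the distance for clustering.
d <- as.dist(1 - s)

# Complete linkage: the height at which two clusters merge is the largest
# pairwise distance between their members.
hc <- hclust(d, method = "complete")

# Illustrative cut height: with complete linkage, every pair within a
# resulting cluster has distance <= 0.5, i.e. signed similarity >= 0.5.
toy_groups <- cutree(hc, h = 0.5)
split(names(toy_groups), toy_groups)
```

In this toy example the three positively correlated SNPs (snp1, snp2, snp3) form one cluster, while snp4 (negatively correlated with them) and snp5 (nearly uncorrelated with everything) are left as singletons. The second criterion, on joint inclusion in models, is not represented in this sketch.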

Example

Continuing with the data simulated in the MFM Introduction vignette, the group.multi function requires the list of GUESSFM output (SM2) and the SnpMatrix (Gm) from which \(r^2\) is calculated.

library(MFM) # contains objects Gm, SM2
#> Warning: replacing previous import 'data.table::melt' by 'reshape::melt'
#> when loading 'GUESSFM'
library(GUESSFM) # contains function group.multi
# Gm is the SnpMatrix for controls and both diseases. For LD calculations we use only the controls,
# identified as c0 below, i.e. Gm[c0,]
Gm
#> A SnpMatrix with  9000 rows and  26 columns
#> Row names:  control.1 ... case2.3000 
#> Col names:  rs12722563 ... rs41295159
c0 <- grep("^control\\.", rownames(Gm)) # identify controls in Gm SnpMatrix (row names start with "control.")
info <- group.multi(SM2,Gm[c0,]) 
Sgroups <- info$groups@.Data

The info object is a list with three elements:

  1. summary: a data.frame with one row of summary statistics per group

  2. groups: the constructed SNP groups; a groups object whose elements are in the same order as the rows of summary

  3. r2: the calculated \(r^2\) matrix

The SNP groups are identified as

Sgroups
#> [[1]]
#> [1] "rs11597367" "rs11594656" "rs35285258"
#> 
#> [[2]]
#> [1] "rs56382813" "rs41295105" "rs41295079" "rs62626325" "rs3118475" 
#> [6] "rs62626317" "rs41295055"
#> 
#> [[3]]
#> [1] "rs12722522" "rs7909519"  "rs12722508" "rs41295049" "rs61839660"
#> [6] "rs12722496" "rs12722495"

These three groups either match or are subsets of our previously identified SNP groups C, D, and A, as given in snpGroups.

# check: list SNPs in each previous group that are absent from the corresponding new group

setdiff(snpGroups[["C"]],Sgroups[[1]])
#> character(0)
setdiff(snpGroups[["D"]],Sgroups[[2]])
#> character(0)
setdiff(snpGroups[["A"]],Sgroups[[3]])
#> [1] "rs12722563"
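Because setdiff only reports differences in one direction, a two-way comparison can make the "match or subset" relationship explicit. The helper below is a hypothetical sketch (compare_groups is not part of MFM or GUESSFM), shown with made-up SNP labels.

```r
# Hypothetical helper: describe how a new SNP group relates to a reference group.
compare_groups <- function(reference, new) {
  missing_from_new <- setdiff(reference, new)  # in reference, not in new
  extra_in_new <- setdiff(new, reference)      # in new, not in reference
  relation <- if (length(missing_from_new) == 0 && length(extra_in_new) == 0) {
    "match"
  } else if (length(extra_in_new) == 0) {
    "new is a subset of reference"
  } else if (length(missing_from_new) == 0) {
    "reference is a subset of new"
  } else {
    "partial overlap"
  }
  list(relation = relation,
       missing_from_new = missing_from_new,
       extra_in_new = extra_in_new)
}

# Toy example with made-up SNP labels (not the vignette data).
res <- compare_groups(reference = c("snpA", "snpB", "snpC"),
                      new = c("snpA", "snpB"))
res$relation
res$missing_from_new
```

In the vignette session, the same helper could be applied as compare_groups(snpGroups[["A"]], Sgroups[[3]]), assuming both objects are character vectors of SNP names as in the output above.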