NEMoE.Rmd
Nutrition-Enterotype Mixture of Expert Model(NEMoE) is an R package that facilitates discover latent classes shaped by nutrition intake that makes relationship between gut microbiome and health outcome different.
The methods use a regularized mixture of experts model framework identify such latent classes and using EM algorithm to fitting the parameters in the model. The overall workflow of NEMoE are describe in the following figure:
NEMoE use the nutrition data, microbiome data and health outcome as input to identify latent class based on the nutrients intake as well as the corresponding diet related microbiome signatures of disease.
In this vignette, we go through a 16S microbiome data of Parkinson’s disease to identify the latent classes shaped by nutrition intake and finding the microbiome signatures for each latent class.
The PD
dataset contain data from a gut
microbiome-Parkinson’s disease study, the nutrition intake of each
individual is also provided from food frequency questionnaire. We
provide two type of the data(i.e. phyloseq object and a list) to
construct input of NEMoE.
In this example, we will use NEMoE to identify two latent class of different nutrition pattern with altered microbiome-PD relationships and identify the related nutrition features for the latent class and microbiome signatures for each latent class.
data("PD")
NEMoE
object from listFirst we build the NEMoE
object, the input data to build
NEMoE
have three part: microbiome data, nutrition data and
health outcome. The nutrition data can be either matrix or dataframe
which each row is a sample and each column is a nutrient. The microbiome
data can be either a list of table that contain several microbiome
matrix where each row is a sample and each column is a taxa (at the
level) or a phyloseq
. The response is a vector of different
health state.
In the following example, we build NEMoE
object from
list of microbiome data.
Microbiome = PD$data_list$Microbiome
Nutrition = PD$data_list$Nutrition
Response = PD$data_list$Response
NEMoE object can also incorporate the parameters used in the fitting,
including K
,lambda1
, lambda2
,
alpha1
, alpha2
and cvParams
.
NEMoE = NEMoE_buildFromList(Microbiome, Nutrition, Response, K = 2,
lambda1 = c(0.005, 0.01, 0.016, 0.023, 0.025),
lambda2 = 0.02, alpha1 = 0.5, alpha2 = 0.5,
cvParams = createCVList(g1 = 10, shrink = 0.4,
track = F))
NEMoE
objectNow using the fitNEMoE
function, we can fit
NEMoE
object with the specified parameters in the
NEMoE
object.
NEMoE = fitNEMoE(NEMoE)
#> Fitting NEMoE....
#> it: 0, PLL0:-499.79
#> it: 1, PLL:-498.912
#> it: 2, PLL:-498.135
#> it: 3, PLL:-497.429
#> it: 4, PLL:-496.733
#> it: 5, PLL:-495.999
#> it: 6, PLL:-495.196
#> it: 7, PLL:-494.288
#> it: 8, PLL:-493.227
#> it: 9, PLL:-491.953
#> it: 10, PLL:-490.378
#> it: 11, PLL:-488.453
#> it: 12, PLL:-486.091
#> it: 13, PLL:-483.381
#> it: 14, PLL:-480.661
#> it: 15, PLL:-478.076
#> it: 16, PLL:-475.773
#> it: 17, PLL:-473.822
#> it: 18, PLL:-472.299
#> it: 19, PLL:-471.153
#> it: 20, PLL:-470.36
#> it: 21, PLL:-469.868
#> it: 22, PLL:-469.542
#> it: 23, PLL:-469.326
#> it: 24, PLL:-469.179
#> it: 25, PLL:-469.077
#> it: 26, PLL:-469.008
#> it: 27, PLL:-468.962
#> it: 28, PLL:-468.93
#> it: 29, PLL:-468.906
#> it: 30, PLL:-468.888
#> it: 31, PLL:-468.872
#> it: 32, PLL:-468.859
#> it: 33, PLL:-468.848
#> it: 34, PLL:-468.838
#> it: 35, PLL:-468.829
#> it: 36, PLL:-468.821
#> it: 37, PLL:-468.813
#> it: 38, PLL:-468.806
#> it: 39, PLL:-468.797
#> it: 40, PLL:-468.789
#> it: 41, PLL:-468.778
#> it: 42, PLL:-468.766
#> it: 43, PLL:-468.752
#> it: 44, PLL:-468.733
#> it: 45, PLL:-468.714
#> it: 46, PLL:-468.696
#> it: 47, PLL:-468.679
#> it: 48, PLL:-468.663
#> it: 49, PLL:-468.648
#> it: 50, PLL:-468.632
#> it: 51, PLL:-468.617
#> it: 52, PLL:-468.602
#> it: 53, PLL:-468.588
#> it: 54, PLL:-468.573
#> it: 55, PLL:-468.56
#> it: 56, PLL:-468.549
#> it: 57, PLL:-468.539
#> it: 58, PLL:-468.531
#> it: 59, PLL:-468.525
#> it: 60, PLL:-468.52
#> it: 61, PLL:-468.516
#> it: 62, PLL:-468.514
#> it: 63, PLL:-468.513
#> it: 64, PLL:-468.513
#> it: 65, PLL:-468.513
#> it: 66, PLL:-468.513
#> it: 67, PLL:-468.513
#> it: 68, PLL:-468.513
The fitted log-likelihood can be get from getLL
function.
getLL(NEMoE)
#> LL_obs PLL_obs LL_complete PLL_complete
#> Phylum -90.21069 -117.7364 -147.4908 -175.0165
#> Order -83.14621 -114.5389 -137.9087 -169.3014
#> Family -82.83030 -115.6088 -139.6559 -172.4345
#> Genus -78.56857 -113.8025 -144.2888 -179.5227
#> ASV -64.70312 -109.2637 -133.2256 -177.7862
#> All -399.45888 -468.5129 -702.5698 -771.6239
The corresponding coefficients in gating function, i.e. the effect
size of each nutrition variables of their contribution to the different
latent class can be obtained from getCoef
function.
coef.Gating <- getCoef(NEMoE)$coef.gating
These coefficients of gating network can be plotted by
plotGating
function.
p_list <- plotGating(NEMoE)
print(p_list[[1]])
print(p_list[[2]])
The corresponding coefficients in experts network, i.e. the effect
size of each microbiome features of their contribution to health outcome
(PD state here) in different latent class can be obtained from
getCoef
function.
coef.Experts <- getCoef(NEMoE)$coef.experts
These coefficients of experts network can be plotted by
plotGating
function.
p_list <- plotExperts(NEMoE)
print(p_list[[1]] + ggtitle("Coefficients of Phylum level") + theme(axis.text.x = element_text(angle = 45)))
print(p_list[[2]] + ggtitle("Coefficients of Order level") + theme(axis.text.x = element_text(angle = 45)))
print(p_list[[3]] + ggtitle("Coefficients of Family level") + theme(axis.text.x = element_text(angle = 45)))
print(p_list[[4]] + ggtitle("Coefficients of Genus level") + theme(axis.text.x = element_text(angle = 45, size = 5)))
print(p_list[[5]] + ggtitle("Coefficients of ASV level") + theme(axis.text.x = element_blank()))
NEMoE
NEMoE
can also built based on single level data, we
illustrated it using Genus
level of the Microbiome-PD
dataset.
Microbiome_gen <- Microbiome$Genus
NEMoE_gen = NEMoE_buildFromList(Microbiome_gen, Nutrition, Response, K = 2,
lambda1 = 0.028, lambda2 = 0.02,
alpha1 = 0.5, alpha2 = 0.5)
NEMoE_gen <- fitNEMoE(NEMoE_gen, restart_it = 20)
#> Fitting NEMoE....
#> it: 0, PLL0:-101.423
#> it: 1, PLL:-101.359
#> it: 2, PLL:-101.294
#> it: 3, PLL:-101.228
#> it: 4, PLL:-101.163
#> it: 5, PLL:-101.1
#> it: 6, PLL:-101.032
#> it: 7, PLL:-100.957
#> it: 8, PLL:-100.871
#> it: 9, PLL:-100.768
#> it: 10, PLL:-100.634
#> it: 11, PLL:-100.453
#> it: 12, PLL:-100.222
#> it: 13, PLL:-99.9147
#> it: 14, PLL:-99.5009
#> it: 15, PLL:-98.9517
#> it: 16, PLL:-98.2935
#> it: 17, PLL:-97.5981
#> it: 18, PLL:-96.9549
#> it: 19, PLL:-96.4377
#> it: 20, PLL:-96.048
#> it: 21, PLL:-95.7756
#> it: 22, PLL:-95.5959
#> it: 23, PLL:-95.4771
#> it: 24, PLL:-95.3956
#> it: 25, PLL:-95.3385
#> it: 26, PLL:-95.2966
#> it: 27, PLL:-95.2642
#> it: 28, PLL:-95.2377
#> it: 29, PLL:-95.2153
#> it: 30, PLL:-95.1957
#> it: 31, PLL:-95.1782
#> it: 32, PLL:-95.1622
#> it: 33, PLL:-95.1478
#> it: 34, PLL:-95.1354
#> it: 35, PLL:-95.1246
#> it: 36, PLL:-95.1153
#> it: 37, PLL:-95.1073
#> it: 38, PLL:-95.1003
#> it: 39, PLL:-95.0943
#> it: 40, PLL:-95.0891
#> it: 41, PLL:-95.0846
#> it: 42, PLL:-95.0807
#> it: 43, PLL:-95.0774
#> it: 44, PLL:-95.0745
#> it: 45, PLL:-95.072
#> it: 46, PLL:-95.0699
#> it: 47, PLL:-95.0681
#> it: 48, PLL:-95.0666
#> it: 49, PLL:-95.0652
#> it: 50, PLL:-95.0641
#> it: 51, PLL:-95.0631
#> it: 52, PLL:-95.0623
#> it: 53, PLL:-95.0616
#> it: 54, PLL:-95.0611
#> it: 55, PLL:-95.0606
#> it: 56, PLL:-95.0602
#> it: 57, PLL:-95.0598
#> it: 58, PLL:-95.0596
#> it: 59, PLL:-95.0593
#> it: 60, PLL:-95.0592
#> it: 61, PLL:-95.059
#> it: 62, PLL:-95.0589
#> it: 63, PLL:-95.0588
#> it: 64, PLL:-95.0587
#> it: 65, PLL:-95.0587
#> it: 66, PLL:-95.0586
#> it: 67, PLL:-95.0586
#> it: 68, PLL:-95.0586
p_list <- plotGating(NEMoE_gen)
print(p_list[[1]])
print(p_list[[2]])
p <- plotExperts(NEMoE_gen)[[1]]
p + theme(axis.text.x = element_text(angle = 45, size = 5))
NEMoE
object from phyloseqIn parallel, NEMoE
object can be built from
phyloseq
object from the microbiome counts data.
NEMoE
wrapper functions for TSS normalization and
transformations of the counts data. Also which levels of the data
incorporated in the data can be determined by user.
sample_PD <- sample_data(PD$ps)
Response <- sample_PD$PD
Nutrition <- sample_PD[,-ncol(sample_PD)]
In the following example, we first TSS normalized data, then filter
the taxa with prevalance larger than 0.7 (i.e. the taxa non-zero in more
than 70% of the sample) and variance larger than 5e-5. Then the data are
further tranformed using arcsin transformation and zscored. The
resulting Microbiome data is to used built NEMoE
object.
NEMoE_ps <- NEMoE_buildFromPhyloseq(ps = PD$ps, Nutrition = scale(Nutrition),
Response = Response, K = 2,
lambda1 = c(0.005, 0.014, 0.016, 0.023, 0.025),
lambda2 = 0.02, alpha1 = 0.5, alpha2 = 0.5,
filtParam = list(prev = 0.7, var = 1e-5),
transParam = list(method = "asin"),
taxLevel = c("Phylum","Order","Family",
"Genus","ASV"))
The NEMoE
object can also be fitted and visualized as it
is in the previous section.
NEMoE_ps <- fitNEMoE(NEMoE_ps)
#> Fitting NEMoE....
#> it: 0, PLL0:-494.62
#> it: 1, PLL:-493.94
#> it: 2, PLL:-493.284
#> it: 3, PLL:-492.686
#> it: 4, PLL:-492.094
#> it: 5, PLL:-491.481
#> it: 6, PLL:-490.83
#> it: 7, PLL:-490.129
#> it: 8, PLL:-489.357
#> it: 9, PLL:-488.485
#> it: 10, PLL:-487.511
#> it: 11, PLL:-486.401
#> it: 12, PLL:-485.14
#> it: 13, PLL:-483.711
#> it: 14, PLL:-482.088
#> it: 15, PLL:-480.307
#> it: 16, PLL:-478.45
#> it: 17, PLL:-476.66
#> it: 18, PLL:-475.071
#> it: 19, PLL:-473.713
#> it: 20, PLL:-472.585
#> it: 21, PLL:-471.669
#> it: 22, PLL:-470.943
#> it: 23, PLL:-470.389
#> it: 24, PLL:-469.979
#> it: 25, PLL:-469.67
#> it: 26, PLL:-469.436
#> it: 27, PLL:-469.256
#> it: 28, PLL:-469.116
#> it: 29, PLL:-469.004
#> it: 30, PLL:-468.932
#> it: 31, PLL:-468.885
#> it: 32, PLL:-468.852
#> it: 33, PLL:-468.827
#> it: 34, PLL:-468.81
#> it: 35, PLL:-468.797
#> it: 36, PLL:-468.789
#> it: 37, PLL:-468.782
#> it: 38, PLL:-468.777
#> it: 39, PLL:-468.773
#> it: 40, PLL:-468.77
#> it: 41, PLL:-468.767
#> it: 42, PLL:-468.766
#> it: 43, PLL:-468.764
#> it: 44, PLL:-468.763
#> it: 45, PLL:-468.762
#> it: 46, PLL:-468.761
#> it: 47, PLL:-468.761
#> it: 48, PLL:-468.761
#> it: 49, PLL:-468.76
#> it: 50, PLL:-468.76
#> it: 51, PLL:-468.76
#> it: 52, PLL:-468.76
p_list <- plotGating(NEMoE_ps)
print(p_list[[1]])
print(p_list[[2]])
p_list <- plotExperts(NEMoE_ps)
print(p_list[[1]] + ggtitle("Coefficients of Phylum level") + theme(axis.text.x = element_text(angle = 45)))
print(p_list[[2]] + ggtitle("Coefficients of Order level") + theme(axis.text.x = element_text(angle = 45)))
print(p_list[[3]] + ggtitle("Coefficients of Family level") + theme(axis.text.x = element_text(angle = 45)))
print(p_list[[4]] + ggtitle("Coefficients of Genus level") + theme(axis.text.x = element_text(angle = 45, size = 5)))
print(p_list[[5]] + ggtitle("Coefficients of ASV level") + theme(axis.text.x = element_blank()))
We provide many metric including statistics such as AIC
,
BIC
and ICL
and cross validation metric such
as cross validation accuracy and AUC.
calcCriterion(NEMoE, "all")
#> AIC BIC ICL1 ICL2 eBIC mAIC mBIC
#> [1,] -994.9178 -1301.066 -1601.14 -1907.288 -1668.063 -1034.918 -1328.765
#> mICL1 mICL2 accuracy D.square TPR TNR F1
#> [1,] -1641.14 -1934.987 0.6845588 0.1405786 0.7790186 0.5601094 0.7341976
#> auc
#> [1,] 0.7696813
Also the one can use this procedure to select parameters.
NEMoE <- cvNEMoE(NEMoE)
The selected parameters obtained from the cross validation result can be used to fitting the model.
lambda1_choose <- NEMoE@cvResult$lambda1_choose
lambda2_choose <- NEMoE@cvResult$lambda2_choose
NEMoE <- setParam(NEMoE, lambda1 = lambda1_choose, lambda2 = lambda2_choose)
NEMoE <- fitNEMoE(NEMoE)
#> Fitting NEMoE....
#> it: 0, PLL0:-501.013
#> it: 1, PLL:-500.212
#> it: 2, PLL:-499.472
#> it: 3, PLL:-498.733
#> it: 4, PLL:-497.929
#> it: 5, PLL:-497.022
#> it: 6, PLL:-496.004
#> it: 7, PLL:-494.839
#> it: 8, PLL:-493.45
#> it: 9, PLL:-491.807
#> it: 10, PLL:-489.977
#> it: 11, PLL:-488.07
#> it: 12, PLL:-486.185
#> it: 13, PLL:-484.433
#> it: 14, PLL:-482.941
#> it: 15, PLL:-481.805
#> it: 16, PLL:-480.969
#> it: 17, PLL:-480.343
#> it: 18, PLL:-479.852
#> it: 19, PLL:-479.398
#> it: 20, PLL:-478.804
#> it: 21, PLL:-478.021
#> it: 22, PLL:-477.311
#> it: 23, PLL:-476.739
#> it: 24, PLL:-476.297
#> it: 25, PLL:-476.022
#> it: 26, PLL:-475.841
#> it: 27, PLL:-475.714
#> it: 28, PLL:-475.621
#> it: 29, PLL:-475.549
#> it: 30, PLL:-475.487
#> it: 31, PLL:-475.443
#> it: 32, PLL:-475.407
#> it: 33, PLL:-475.374
#> it: 34, PLL:-475.341
#> it: 35, PLL:-475.309
#> it: 36, PLL:-475.274
#> it: 37, PLL:-475.236
#> it: 38, PLL:-475.192
#> it: 39, PLL:-475.141
#> it: 40, PLL:-475.084
#> it: 41, PLL:-475.019
#> it: 42, PLL:-474.971
#> it: 43, PLL:-474.924
#> it: 44, PLL:-474.879
#> it: 45, PLL:-474.837
#> it: 46, PLL:-474.808
#> it: 47, PLL:-474.785
#> it: 48, PLL:-474.763
#> it: 49, PLL:-474.742
#> it: 50, PLL:-474.72
#> it: 51, PLL:-474.698
#> it: 52, PLL:-474.676
#> it: 53, PLL:-474.651
#> it: 54, PLL:-474.622
#> it: 55, PLL:-474.589
#> it: 56, PLL:-474.547
#> it: 57, PLL:-474.499
#> it: 58, PLL:-474.451
#> it: 59, PLL:-474.417
#> it: 60, PLL:-474.387
#> it: 61, PLL:-474.36
#> it: 62, PLL:-474.333
#> it: 63, PLL:-474.302
#> it: 64, PLL:-474.27
#> it: 65, PLL:-474.233
#> it: 66, PLL:-474.189
#> it: 67, PLL:-474.134
#> it: 68, PLL:-474.066
#> it: 69, PLL:-474.012
#> it: 70, PLL:-473.974
#> it: 71, PLL:-473.947
#> it: 72, PLL:-473.927
#> it: 73, PLL:-473.912
#> it: 74, PLL:-473.902
#> it: 75, PLL:-473.895
#> it: 76, PLL:-473.89
#> it: 77, PLL:-473.888
#> it: 78, PLL:-473.886
#> it: 79, PLL:-473.885
#> it: 80, PLL:-473.885
#> it: 81, PLL:-473.884
#> it: 82, PLL:-473.884
#> it: 83, PLL:-473.884
#> it: 84, PLL:-473.884
#> it: 85, PLL:-473.884
p_list <- plotGating(NEMoE_ps)
print(p_list[[1]])
print(p_list[[2]])
p_list <- plotExperts(NEMoE)
print(p_list[[1]] + ggtitle("Coefficients of Phylum level") + theme(axis.text.x = element_text(angle = 45)))
print(p_list[[2]] + ggtitle("Coefficients of Order level") + theme(axis.text.x = element_text(angle = 45)))
print(p_list[[3]] + ggtitle("Coefficients of Family level") + theme(axis.text.x = element_text(angle = 45)))
print(p_list[[4]] + ggtitle("Coefficients of Genus level") + theme(axis.text.x = element_text(angle = 45, size = 5)))
print(p_list[[5]] + ggtitle("Coefficients of ASV level") + theme(axis.text.x = element_blank()))
sessionInfo()
#> R version 4.1.0 (2021-05-18)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 22000)
#>
#> Matrix products: default
#>
#> locale:
#> [1] LC_COLLATE=Chinese (Simplified)_China.936
#> [2] LC_CTYPE=Chinese (Simplified)_China.936
#> [3] LC_MONETARY=Chinese (Simplified)_China.936
#> [4] LC_NUMERIC=C
#> [5] LC_TIME=Chinese (Simplified)_China.936
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] phyloseq_1.36.0 NEMoE_1.1.0 ggplot2_3.3.6
#>
#> loaded via a namespace (and not attached):
#> [1] nlme_3.1-153 bitops_1.0-7 fs_1.5.0
#> [4] rprojroot_2.0.2 GenomeInfoDb_1.28.4 tools_4.1.0
#> [7] bslib_0.3.0 utf8_1.2.2 R6_2.5.1
#> [10] vegan_2.5-7 DBI_1.1.3 BiocGenerics_0.38.0
#> [13] mgcv_1.8-36 colorspace_2.0-2 permute_0.9-5
#> [16] rhdf5filters_1.4.0 ade4_1.7-17 withr_2.5.0
#> [19] tidyselect_1.1.1 compiler_4.1.0 glmnet_4.1-2
#> [22] textshaping_0.3.5 cli_3.0.1 Biobase_2.52.0
#> [25] desc_1.3.0 labeling_0.4.2 sass_0.4.0
#> [28] scales_1.2.1 pkgdown_1.6.1 systemfonts_1.0.2
#> [31] stringr_1.4.0 digest_0.6.27 rmarkdown_2.10
#> [34] XVector_0.32.0 pkgconfig_2.0.3 htmltools_0.5.2
#> [37] highr_0.9 fastmap_1.1.0 rlang_1.0.4
#> [40] rstudioapi_0.13 shape_1.4.6 jquerylib_0.1.4
#> [43] generics_0.1.3 farver_2.1.1 jsonlite_1.7.2
#> [46] mclust_5.4.7 BiocParallel_1.26.2 dplyr_1.0.7
#> [49] RCurl_1.98-1.4 magrittr_2.0.3 GenomeInfoDbData_1.2.6
#> [52] biomformat_1.20.0 Matrix_1.3-4 Rcpp_1.0.7
#> [55] munsell_0.5.0 S4Vectors_0.30.0 Rhdf5lib_1.14.2
#> [58] fansi_1.0.3 ape_5.5 lifecycle_1.0.1
#> [61] stringi_1.7.4 yaml_2.2.1 MASS_7.3-54
#> [64] zlibbioc_1.38.0 rhdf5_2.36.0 plyr_1.8.6
#> [67] grid_4.1.0 parallel_4.1.0 crayon_1.5.1
#> [70] lattice_0.20-44 Biostrings_2.60.2 splines_4.1.0
#> [73] multtest_2.48.0 knitr_1.38 pillar_1.8.1
#> [76] igraph_1.2.6 reshape2_1.4.4 codetools_0.2-18
#> [79] stats4_4.1.0 glue_1.6.2 evaluate_0.15
#> [82] data.table_1.14.0 vctrs_0.4.1 foreach_1.5.2
#> [85] gtable_0.3.0 purrr_0.3.4 assertthat_0.2.1
#> [88] cachem_1.0.6 xfun_0.30 ragg_1.1.3
#> [91] survival_3.2-13 tibble_3.1.8 iterators_1.0.14
#> [94] memoise_2.0.1 IRanges_2.26.0 cluster_2.1.2
#> [97] ROCR_1.0-11