NEMoE.RmdNutrition-Enterotype Mixture of Expert Model(NEMoE) is an R package that facilitates discover latent classes shaped by nutrition intake that makes relationship between gut microbiome and health outcome different.
The methods use a regularized mixture of experts model framework identify such latent classes and using EM algorithm to fitting the parameters in the model. The overall workflow of NEMoE are describe in the following figure:
NEMoE use
the nutrition data, microbiome data and health outcome as input to
identify latent class based on the nutrients intake as well as the
corresponding diet related microbiome signatures of disease.
In this vignette, we go through a 16S microbiome data of Parkinson’s disease to identify the latent classes shaped by nutrition intake and finding the microbiome signatures for each latent class.
The PD dataset contain data from a gut
microbiome-Parkinson’s disease study, the nutrition intake of each
individual is also provided from food frequency questionnaire. We
provide two type of the data(i.e. phyloseq object and a list) to
construct input of NEMoE.
In this example, we will use NEMoE to identify two latent class of different nutrition pattern with altered microbiome-PD relationships and identify the related nutrition features for the latent class and microbiome signatures for each latent class.
data("PD")NEMoE object from listFirst we build the NEMoE object, the input data to build
NEMoE have three part: microbiome data, nutrition data and
health outcome. The nutrition data can be either matrix or dataframe
which each row is a sample and each column is a nutrient. The microbiome
data can be either a list of table that contain several microbiome
matrix where each row is a sample and each column is a taxa (at the
level) or a phyloseq. The response is a vector of different
health state.
In the following example, we build NEMoE object from
list of microbiome data.
Microbiome = PD$data_list$Microbiome
Nutrition = PD$data_list$Nutrition
Response = PD$data_list$ResponseNEMoE object can also incorporate the parameters used in the fitting,
including K,lambda1, lambda2,
alpha1, alpha2 and cvParams.
NEMoE = NEMoE_buildFromList(Microbiome, Nutrition, Response, K = 2,
lambda1 = c(0.005, 0.01, 0.016, 0.023, 0.025),
lambda2 = 0.02, alpha1 = 0.5, alpha2 = 0.5,
cvParams = createCVList(g1 = 10, shrink = 0.4,
track = F))NEMoE objectNow using the fitNEMoE function, we can fit
NEMoE object with the specified parameters in the
NEMoE object.
NEMoE = fitNEMoE(NEMoE)
#> Fitting NEMoE....
#> it: 0, PLL0:-499.79
#> it: 1, PLL:-498.912
#> it: 2, PLL:-498.135
#> it: 3, PLL:-497.429
#> it: 4, PLL:-496.733
#> it: 5, PLL:-495.999
#> it: 6, PLL:-495.196
#> it: 7, PLL:-494.288
#> it: 8, PLL:-493.227
#> it: 9, PLL:-491.953
#> it: 10, PLL:-490.378
#> it: 11, PLL:-488.453
#> it: 12, PLL:-486.091
#> it: 13, PLL:-483.381
#> it: 14, PLL:-480.661
#> it: 15, PLL:-478.076
#> it: 16, PLL:-475.773
#> it: 17, PLL:-473.822
#> it: 18, PLL:-472.299
#> it: 19, PLL:-471.153
#> it: 20, PLL:-470.36
#> it: 21, PLL:-469.868
#> it: 22, PLL:-469.542
#> it: 23, PLL:-469.326
#> it: 24, PLL:-469.179
#> it: 25, PLL:-469.077
#> it: 26, PLL:-469.008
#> it: 27, PLL:-468.962
#> it: 28, PLL:-468.93
#> it: 29, PLL:-468.906
#> it: 30, PLL:-468.888
#> it: 31, PLL:-468.872
#> it: 32, PLL:-468.859
#> it: 33, PLL:-468.848
#> it: 34, PLL:-468.838
#> it: 35, PLL:-468.829
#> it: 36, PLL:-468.821
#> it: 37, PLL:-468.813
#> it: 38, PLL:-468.806
#> it: 39, PLL:-468.797
#> it: 40, PLL:-468.789
#> it: 41, PLL:-468.778
#> it: 42, PLL:-468.766
#> it: 43, PLL:-468.752
#> it: 44, PLL:-468.733
#> it: 45, PLL:-468.714
#> it: 46, PLL:-468.696
#> it: 47, PLL:-468.679
#> it: 48, PLL:-468.663
#> it: 49, PLL:-468.648
#> it: 50, PLL:-468.632
#> it: 51, PLL:-468.617
#> it: 52, PLL:-468.602
#> it: 53, PLL:-468.588
#> it: 54, PLL:-468.573
#> it: 55, PLL:-468.56
#> it: 56, PLL:-468.549
#> it: 57, PLL:-468.539
#> it: 58, PLL:-468.531
#> it: 59, PLL:-468.525
#> it: 60, PLL:-468.52
#> it: 61, PLL:-468.516
#> it: 62, PLL:-468.514
#> it: 63, PLL:-468.513
#> it: 64, PLL:-468.513
#> it: 65, PLL:-468.513
#> it: 66, PLL:-468.513
#> it: 67, PLL:-468.513
#> it: 68, PLL:-468.513The fitted log-likelihood can be get from getLL
function.
getLL(NEMoE)
#> LL_obs PLL_obs LL_complete PLL_complete
#> Phylum -90.21069 -117.7364 -147.4908 -175.0165
#> Order -83.14621 -114.5389 -137.9087 -169.3014
#> Family -82.83030 -115.6088 -139.6559 -172.4345
#> Genus -78.56857 -113.8025 -144.2888 -179.5227
#> ASV -64.70312 -109.2637 -133.2256 -177.7862
#> All -399.45888 -468.5129 -702.5698 -771.6239The corresponding coefficients in gating function, i.e. the effect
size of each nutrition variables of their contribution to the different
latent class can be obtained from getCoef function.
coef.Gating <- getCoef(NEMoE)$coef.gatingThese coefficients of gating network can be plotted by
plotGating function.
p_list <- plotGating(NEMoE)
print(p_list[[1]])
print(p_list[[2]])
The corresponding coefficients in experts network, i.e. the effect
size of each microbiome features of their contribution to health outcome
(PD state here) in different latent class can be obtained from
getCoef function.
coef.Experts <- getCoef(NEMoE)$coef.expertsThese coefficients of experts network can be plotted by
plotGating function.
p_list <- plotExperts(NEMoE)
print(p_list[[1]] + ggtitle("Coefficients of Phylum level") + theme(axis.text.x = element_text(angle = 45))) 
print(p_list[[2]] + ggtitle("Coefficients of Order level") + theme(axis.text.x = element_text(angle = 45))) 
print(p_list[[3]] + ggtitle("Coefficients of Family level") + theme(axis.text.x = element_text(angle = 45))) 
print(p_list[[4]] + ggtitle("Coefficients of Genus level") + theme(axis.text.x = element_text(angle = 45, size = 5))) 
print(p_list[[5]] + ggtitle("Coefficients of ASV level") + theme(axis.text.x = element_blank())) 
NEMoE
NEMoE can also built based on single level data, we
illustrated it using Genus level of the Microbiome-PD
dataset.
Microbiome_gen <- Microbiome$Genus
NEMoE_gen = NEMoE_buildFromList(Microbiome_gen, Nutrition, Response, K = 2,
lambda1 = 0.028, lambda2 = 0.02,
alpha1 = 0.5, alpha2 = 0.5)
NEMoE_gen <- fitNEMoE(NEMoE_gen, restart_it = 20)
#> Fitting NEMoE....
#> it: 0, PLL0:-101.423
#> it: 1, PLL:-101.359
#> it: 2, PLL:-101.294
#> it: 3, PLL:-101.228
#> it: 4, PLL:-101.163
#> it: 5, PLL:-101.1
#> it: 6, PLL:-101.032
#> it: 7, PLL:-100.957
#> it: 8, PLL:-100.871
#> it: 9, PLL:-100.768
#> it: 10, PLL:-100.634
#> it: 11, PLL:-100.453
#> it: 12, PLL:-100.222
#> it: 13, PLL:-99.9147
#> it: 14, PLL:-99.5009
#> it: 15, PLL:-98.9517
#> it: 16, PLL:-98.2935
#> it: 17, PLL:-97.5981
#> it: 18, PLL:-96.9549
#> it: 19, PLL:-96.4377
#> it: 20, PLL:-96.048
#> it: 21, PLL:-95.7756
#> it: 22, PLL:-95.5959
#> it: 23, PLL:-95.4771
#> it: 24, PLL:-95.3956
#> it: 25, PLL:-95.3385
#> it: 26, PLL:-95.2966
#> it: 27, PLL:-95.2642
#> it: 28, PLL:-95.2377
#> it: 29, PLL:-95.2153
#> it: 30, PLL:-95.1957
#> it: 31, PLL:-95.1782
#> it: 32, PLL:-95.1622
#> it: 33, PLL:-95.1478
#> it: 34, PLL:-95.1354
#> it: 35, PLL:-95.1246
#> it: 36, PLL:-95.1153
#> it: 37, PLL:-95.1073
#> it: 38, PLL:-95.1003
#> it: 39, PLL:-95.0943
#> it: 40, PLL:-95.0891
#> it: 41, PLL:-95.0846
#> it: 42, PLL:-95.0807
#> it: 43, PLL:-95.0774
#> it: 44, PLL:-95.0745
#> it: 45, PLL:-95.072
#> it: 46, PLL:-95.0699
#> it: 47, PLL:-95.0681
#> it: 48, PLL:-95.0666
#> it: 49, PLL:-95.0652
#> it: 50, PLL:-95.0641
#> it: 51, PLL:-95.0631
#> it: 52, PLL:-95.0623
#> it: 53, PLL:-95.0616
#> it: 54, PLL:-95.0611
#> it: 55, PLL:-95.0606
#> it: 56, PLL:-95.0602
#> it: 57, PLL:-95.0598
#> it: 58, PLL:-95.0596
#> it: 59, PLL:-95.0593
#> it: 60, PLL:-95.0592
#> it: 61, PLL:-95.059
#> it: 62, PLL:-95.0589
#> it: 63, PLL:-95.0588
#> it: 64, PLL:-95.0587
#> it: 65, PLL:-95.0587
#> it: 66, PLL:-95.0586
#> it: 67, PLL:-95.0586
#> it: 68, PLL:-95.0586
p_list <- plotGating(NEMoE_gen)
print(p_list[[1]])
print(p_list[[2]])
p <- plotExperts(NEMoE_gen)[[1]]
p + theme(axis.text.x = element_text(angle = 45, size = 5))
NEMoE object from phyloseqIn parallel, NEMoE object can be built from
phyloseq object from the microbiome counts data.
NEMoE wrapper functions for TSS normalization and
transformations of the counts data. Also which levels of the data
incorporated in the data can be determined by user.
sample_PD <- sample_data(PD$ps)
Response <- sample_PD$PD
Nutrition <- sample_PD[,-ncol(sample_PD)]In the following example, we first TSS normalized data, then filter
the taxa with prevalance larger than 0.7 (i.e. the taxa non-zero in more
than 70% of the sample) and variance larger than 5e-5. Then the data are
further tranformed using arcsin transformation and zscored. The
resulting Microbiome data is to used built NEMoE
object.
NEMoE_ps <- NEMoE_buildFromPhyloseq(ps = PD$ps, Nutrition = scale(Nutrition),
Response = Response, K = 2,
lambda1 = c(0.005, 0.014, 0.016, 0.023, 0.025),
lambda2 = 0.02, alpha1 = 0.5, alpha2 = 0.5,
filtParam = list(prev = 0.7, var = 1e-5),
transParam = list(method = "asin"),
taxLevel = c("Phylum","Order","Family",
"Genus","ASV"))The NEMoE object can also be fitted and visualized as it
is in the previous section.
NEMoE_ps <- fitNEMoE(NEMoE_ps)
#> Fitting NEMoE....
#> it: 0, PLL0:-494.62
#> it: 1, PLL:-493.94
#> it: 2, PLL:-493.284
#> it: 3, PLL:-492.686
#> it: 4, PLL:-492.094
#> it: 5, PLL:-491.481
#> it: 6, PLL:-490.83
#> it: 7, PLL:-490.129
#> it: 8, PLL:-489.357
#> it: 9, PLL:-488.485
#> it: 10, PLL:-487.511
#> it: 11, PLL:-486.401
#> it: 12, PLL:-485.14
#> it: 13, PLL:-483.711
#> it: 14, PLL:-482.088
#> it: 15, PLL:-480.307
#> it: 16, PLL:-478.45
#> it: 17, PLL:-476.66
#> it: 18, PLL:-475.071
#> it: 19, PLL:-473.713
#> it: 20, PLL:-472.585
#> it: 21, PLL:-471.669
#> it: 22, PLL:-470.943
#> it: 23, PLL:-470.389
#> it: 24, PLL:-469.979
#> it: 25, PLL:-469.67
#> it: 26, PLL:-469.436
#> it: 27, PLL:-469.256
#> it: 28, PLL:-469.116
#> it: 29, PLL:-469.004
#> it: 30, PLL:-468.932
#> it: 31, PLL:-468.885
#> it: 32, PLL:-468.852
#> it: 33, PLL:-468.827
#> it: 34, PLL:-468.81
#> it: 35, PLL:-468.797
#> it: 36, PLL:-468.789
#> it: 37, PLL:-468.782
#> it: 38, PLL:-468.777
#> it: 39, PLL:-468.773
#> it: 40, PLL:-468.77
#> it: 41, PLL:-468.767
#> it: 42, PLL:-468.766
#> it: 43, PLL:-468.764
#> it: 44, PLL:-468.763
#> it: 45, PLL:-468.762
#> it: 46, PLL:-468.761
#> it: 47, PLL:-468.761
#> it: 48, PLL:-468.761
#> it: 49, PLL:-468.76
#> it: 50, PLL:-468.76
#> it: 51, PLL:-468.76
#> it: 52, PLL:-468.76
p_list <- plotGating(NEMoE_ps)
print(p_list[[1]])
print(p_list[[2]])
p_list <- plotExperts(NEMoE_ps)
print(p_list[[1]] + ggtitle("Coefficients of Phylum level") + theme(axis.text.x = element_text(angle = 45))) 
print(p_list[[2]] + ggtitle("Coefficients of Order level") + theme(axis.text.x = element_text(angle = 45))) 
print(p_list[[3]] + ggtitle("Coefficients of Family level") + theme(axis.text.x = element_text(angle = 45))) 
print(p_list[[4]] + ggtitle("Coefficients of Genus level") + theme(axis.text.x = element_text(angle = 45, size = 5))) 
print(p_list[[5]] + ggtitle("Coefficients of ASV level") + theme(axis.text.x = element_blank())) 
We provide many metric including statistics such as AIC,
BIC and ICL and cross validation metric such
as cross validation accuracy and AUC.
calcCriterion(NEMoE, "all")
#> AIC BIC ICL1 ICL2 eBIC mAIC mBIC
#> [1,] -994.9178 -1301.066 -1601.14 -1907.288 -1668.063 -1034.918 -1328.765
#> mICL1 mICL2 accuracy D.square TPR TNR F1
#> [1,] -1641.14 -1934.987 0.6845588 0.1405786 0.7790186 0.5601094 0.7341976
#> auc
#> [1,] 0.7696813Also the one can use this procedure to select parameters.
NEMoE <- cvNEMoE(NEMoE)The selected parameters obtained from the cross validation result can be used to fitting the model.
lambda1_choose <- NEMoE@cvResult$lambda1_choose
lambda2_choose <- NEMoE@cvResult$lambda2_choose
NEMoE <- setParam(NEMoE, lambda1 = lambda1_choose, lambda2 = lambda2_choose)
NEMoE <- fitNEMoE(NEMoE)
#> Fitting NEMoE....
#> it: 0, PLL0:-501.013
#> it: 1, PLL:-500.212
#> it: 2, PLL:-499.472
#> it: 3, PLL:-498.733
#> it: 4, PLL:-497.929
#> it: 5, PLL:-497.022
#> it: 6, PLL:-496.004
#> it: 7, PLL:-494.839
#> it: 8, PLL:-493.45
#> it: 9, PLL:-491.807
#> it: 10, PLL:-489.977
#> it: 11, PLL:-488.07
#> it: 12, PLL:-486.185
#> it: 13, PLL:-484.433
#> it: 14, PLL:-482.941
#> it: 15, PLL:-481.805
#> it: 16, PLL:-480.969
#> it: 17, PLL:-480.343
#> it: 18, PLL:-479.852
#> it: 19, PLL:-479.398
#> it: 20, PLL:-478.804
#> it: 21, PLL:-478.021
#> it: 22, PLL:-477.311
#> it: 23, PLL:-476.739
#> it: 24, PLL:-476.297
#> it: 25, PLL:-476.022
#> it: 26, PLL:-475.841
#> it: 27, PLL:-475.714
#> it: 28, PLL:-475.621
#> it: 29, PLL:-475.549
#> it: 30, PLL:-475.487
#> it: 31, PLL:-475.443
#> it: 32, PLL:-475.407
#> it: 33, PLL:-475.374
#> it: 34, PLL:-475.341
#> it: 35, PLL:-475.309
#> it: 36, PLL:-475.274
#> it: 37, PLL:-475.236
#> it: 38, PLL:-475.192
#> it: 39, PLL:-475.141
#> it: 40, PLL:-475.084
#> it: 41, PLL:-475.019
#> it: 42, PLL:-474.971
#> it: 43, PLL:-474.924
#> it: 44, PLL:-474.879
#> it: 45, PLL:-474.837
#> it: 46, PLL:-474.808
#> it: 47, PLL:-474.785
#> it: 48, PLL:-474.763
#> it: 49, PLL:-474.742
#> it: 50, PLL:-474.72
#> it: 51, PLL:-474.698
#> it: 52, PLL:-474.676
#> it: 53, PLL:-474.651
#> it: 54, PLL:-474.622
#> it: 55, PLL:-474.589
#> it: 56, PLL:-474.547
#> it: 57, PLL:-474.499
#> it: 58, PLL:-474.451
#> it: 59, PLL:-474.417
#> it: 60, PLL:-474.387
#> it: 61, PLL:-474.36
#> it: 62, PLL:-474.333
#> it: 63, PLL:-474.302
#> it: 64, PLL:-474.27
#> it: 65, PLL:-474.233
#> it: 66, PLL:-474.189
#> it: 67, PLL:-474.134
#> it: 68, PLL:-474.066
#> it: 69, PLL:-474.012
#> it: 70, PLL:-473.974
#> it: 71, PLL:-473.947
#> it: 72, PLL:-473.927
#> it: 73, PLL:-473.912
#> it: 74, PLL:-473.902
#> it: 75, PLL:-473.895
#> it: 76, PLL:-473.89
#> it: 77, PLL:-473.888
#> it: 78, PLL:-473.886
#> it: 79, PLL:-473.885
#> it: 80, PLL:-473.885
#> it: 81, PLL:-473.884
#> it: 82, PLL:-473.884
#> it: 83, PLL:-473.884
#> it: 84, PLL:-473.884
#> it: 85, PLL:-473.884
p_list <- plotGating(NEMoE_ps)
print(p_list[[1]])
print(p_list[[2]])
p_list <- plotExperts(NEMoE)
print(p_list[[1]] + ggtitle("Coefficients of Phylum level") + theme(axis.text.x = element_text(angle = 45))) 
print(p_list[[2]] + ggtitle("Coefficients of Order level") + theme(axis.text.x = element_text(angle = 45))) 
print(p_list[[3]] + ggtitle("Coefficients of Family level") + theme(axis.text.x = element_text(angle = 45))) 
print(p_list[[4]] + ggtitle("Coefficients of Genus level") + theme(axis.text.x = element_text(angle = 45, size = 5))) 
print(p_list[[5]] + ggtitle("Coefficients of ASV level") + theme(axis.text.x = element_blank())) 
sessionInfo()
#> R version 4.1.0 (2021-05-18)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 22000)
#>
#> Matrix products: default
#>
#> locale:
#> [1] LC_COLLATE=Chinese (Simplified)_China.936
#> [2] LC_CTYPE=Chinese (Simplified)_China.936
#> [3] LC_MONETARY=Chinese (Simplified)_China.936
#> [4] LC_NUMERIC=C
#> [5] LC_TIME=Chinese (Simplified)_China.936
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] phyloseq_1.36.0 NEMoE_1.1.0 ggplot2_3.3.6
#>
#> loaded via a namespace (and not attached):
#> [1] nlme_3.1-153 bitops_1.0-7 fs_1.5.0
#> [4] rprojroot_2.0.2 GenomeInfoDb_1.28.4 tools_4.1.0
#> [7] bslib_0.3.0 utf8_1.2.2 R6_2.5.1
#> [10] vegan_2.5-7 DBI_1.1.3 BiocGenerics_0.38.0
#> [13] mgcv_1.8-36 colorspace_2.0-2 permute_0.9-5
#> [16] rhdf5filters_1.4.0 ade4_1.7-17 withr_2.5.0
#> [19] tidyselect_1.1.1 compiler_4.1.0 glmnet_4.1-2
#> [22] textshaping_0.3.5 cli_3.0.1 Biobase_2.52.0
#> [25] desc_1.3.0 labeling_0.4.2 sass_0.4.0
#> [28] scales_1.2.1 pkgdown_1.6.1 systemfonts_1.0.2
#> [31] stringr_1.4.0 digest_0.6.27 rmarkdown_2.10
#> [34] XVector_0.32.0 pkgconfig_2.0.3 htmltools_0.5.2
#> [37] highr_0.9 fastmap_1.1.0 rlang_1.0.4
#> [40] rstudioapi_0.13 shape_1.4.6 jquerylib_0.1.4
#> [43] generics_0.1.3 farver_2.1.1 jsonlite_1.7.2
#> [46] mclust_5.4.7 BiocParallel_1.26.2 dplyr_1.0.7
#> [49] RCurl_1.98-1.4 magrittr_2.0.3 GenomeInfoDbData_1.2.6
#> [52] biomformat_1.20.0 Matrix_1.3-4 Rcpp_1.0.7
#> [55] munsell_0.5.0 S4Vectors_0.30.0 Rhdf5lib_1.14.2
#> [58] fansi_1.0.3 ape_5.5 lifecycle_1.0.1
#> [61] stringi_1.7.4 yaml_2.2.1 MASS_7.3-54
#> [64] zlibbioc_1.38.0 rhdf5_2.36.0 plyr_1.8.6
#> [67] grid_4.1.0 parallel_4.1.0 crayon_1.5.1
#> [70] lattice_0.20-44 Biostrings_2.60.2 splines_4.1.0
#> [73] multtest_2.48.0 knitr_1.38 pillar_1.8.1
#> [76] igraph_1.2.6 reshape2_1.4.4 codetools_0.2-18
#> [79] stats4_4.1.0 glue_1.6.2 evaluate_0.15
#> [82] data.table_1.14.0 vctrs_0.4.1 foreach_1.5.2
#> [85] gtable_0.3.0 purrr_0.3.4 assertthat_0.2.1
#> [88] cachem_1.0.6 xfun_0.30 ragg_1.1.3
#> [91] survival_3.2-13 tibble_3.1.8 iterators_1.0.14
#> [94] memoise_2.0.1 IRanges_2.26.0 cluster_2.1.2
#> [97] ROCR_1.0-11