Introduction

Nutrition-Enterotype Mixture of Expert Model(NEMoE) is an R package that facilitates discover latent classes shaped by nutrition intake that makes relationship between gut microbiome and health outcome different.

The methods use a regularized mixture of experts model framework identify such latent classes and using EM algorithm to fitting the parameters in the model. The overall workflow of NEMoE are describe in the following figure:

workflow NEMoE use the nutrition data, microbiome data and health outcome as input to identify latent class based on the nutrients intake as well as the corresponding diet related microbiome signatures of disease.

In this vignette, we go through a 16S microbiome data of Parkinson’s disease to identify the latent classes shaped by nutrition intake and finding the microbiome signatures for each latent class.

Gut microbiome and Parkinson’s disease

The PD dataset contain data from a gut microbiome-Parkinson’s disease study, the nutrition intake of each individual is also provided from food frequency questionnaire. We provide two type of the data(i.e. phyloseq object and a list) to construct input of NEMoE.

In this example, we will use NEMoE to identify two latent class of different nutrition pattern with altered microbiome-PD relationships and identify the related nutrition features for the latent class and microbiome signatures for each latent class.

data("PD")

Build the NEMoE object from list

First we build the NEMoE object, the input data to build NEMoE have three part: microbiome data, nutrition data and health outcome. The nutrition data can be either matrix or dataframe which each row is a sample and each column is a nutrient. The microbiome data can be either a list of table that contain several microbiome matrix where each row is a sample and each column is a taxa (at the level) or a phyloseq. The response is a vector of different health state.

In the following example, we build NEMoE object from list of microbiome data.

Microbiome = PD$data_list$Microbiome
Nutrition = PD$data_list$Nutrition
Response = PD$data_list$Response

NEMoE object can also incorporate the parameters used in the fitting, including K,lambda1, lambda2, alpha1, alpha2 and cvParams.

NEMoE = NEMoE_buildFromList(Microbiome, Nutrition, Response, K = 2, 
                            lambda1 = c(0.005, 0.01, 0.016, 0.023, 0.025),
                            lambda2 = 0.02, alpha1 = 0.5, alpha2 = 0.5,
                            cvParams = createCVList(g1 = 10, shrink = 0.4,
                                                    track = F))

Fit the NEMoE object

Now using the fitNEMoE function, we can fit NEMoE object with the specified parameters in the NEMoE object.

NEMoE = fitNEMoE(NEMoE)
#> Fitting NEMoE.... 
#> it: 0, PLL0:-499.79
#> it: 1, PLL:-498.912
#> it: 2, PLL:-498.135
#> it: 3, PLL:-497.429
#> it: 4, PLL:-496.733
#> it: 5, PLL:-495.999
#> it: 6, PLL:-495.196
#> it: 7, PLL:-494.288
#> it: 8, PLL:-493.227
#> it: 9, PLL:-491.953
#> it: 10, PLL:-490.378
#> it: 11, PLL:-488.453
#> it: 12, PLL:-486.091
#> it: 13, PLL:-483.381
#> it: 14, PLL:-480.661
#> it: 15, PLL:-478.076
#> it: 16, PLL:-475.773
#> it: 17, PLL:-473.822
#> it: 18, PLL:-472.299
#> it: 19, PLL:-471.153
#> it: 20, PLL:-470.36
#> it: 21, PLL:-469.868
#> it: 22, PLL:-469.542
#> it: 23, PLL:-469.326
#> it: 24, PLL:-469.179
#> it: 25, PLL:-469.077
#> it: 26, PLL:-469.008
#> it: 27, PLL:-468.962
#> it: 28, PLL:-468.93
#> it: 29, PLL:-468.906
#> it: 30, PLL:-468.888
#> it: 31, PLL:-468.872
#> it: 32, PLL:-468.859
#> it: 33, PLL:-468.848
#> it: 34, PLL:-468.838
#> it: 35, PLL:-468.829
#> it: 36, PLL:-468.821
#> it: 37, PLL:-468.813
#> it: 38, PLL:-468.806
#> it: 39, PLL:-468.797
#> it: 40, PLL:-468.789
#> it: 41, PLL:-468.778
#> it: 42, PLL:-468.766
#> it: 43, PLL:-468.752
#> it: 44, PLL:-468.733
#> it: 45, PLL:-468.714
#> it: 46, PLL:-468.696
#> it: 47, PLL:-468.679
#> it: 48, PLL:-468.663
#> it: 49, PLL:-468.648
#> it: 50, PLL:-468.632
#> it: 51, PLL:-468.617
#> it: 52, PLL:-468.602
#> it: 53, PLL:-468.588
#> it: 54, PLL:-468.573
#> it: 55, PLL:-468.56
#> it: 56, PLL:-468.549
#> it: 57, PLL:-468.539
#> it: 58, PLL:-468.531
#> it: 59, PLL:-468.525
#> it: 60, PLL:-468.52
#> it: 61, PLL:-468.516
#> it: 62, PLL:-468.514
#> it: 63, PLL:-468.513
#> it: 64, PLL:-468.513
#> it: 65, PLL:-468.513
#> it: 66, PLL:-468.513
#> it: 67, PLL:-468.513
#> it: 68, PLL:-468.513

The fitted log-likelihood can be get from getLL function.

getLL(NEMoE)
#>            LL_obs   PLL_obs LL_complete PLL_complete
#> Phylum  -90.21069 -117.7364   -147.4908    -175.0165
#> Order   -83.14621 -114.5389   -137.9087    -169.3014
#> Family  -82.83030 -115.6088   -139.6559    -172.4345
#> Genus   -78.56857 -113.8025   -144.2888    -179.5227
#> ASV     -64.70312 -109.2637   -133.2256    -177.7862
#> All    -399.45888 -468.5129   -702.5698    -771.6239

The corresponding coefficients in gating function, i.e. the effect size of each nutrition variables of their contribution to the different latent class can be obtained from getCoef function.

coef.Gating <- getCoef(NEMoE)$coef.gating

These coefficients of gating network can be plotted by plotGating function.

p_list <- plotGating(NEMoE)
print(p_list[[1]])

print(p_list[[2]])

The corresponding coefficients in experts network, i.e. the effect size of each microbiome features of their contribution to health outcome (PD state here) in different latent class can be obtained from getCoef function.

coef.Experts <- getCoef(NEMoE)$coef.experts

These coefficients of experts network can be plotted by plotGating function.

p_list <- plotExperts(NEMoE)
print(p_list[[1]] + ggtitle("Coefficients of Phylum level") + theme(axis.text.x = element_text(angle = 45))) 

print(p_list[[2]] + ggtitle("Coefficients of Order level") + theme(axis.text.x = element_text(angle = 45))) 

print(p_list[[3]] + ggtitle("Coefficients of Family level") + theme(axis.text.x = element_text(angle = 45))) 

print(p_list[[4]] + ggtitle("Coefficients of Genus level") + theme(axis.text.x = element_text(angle = 45, size = 5))) 

print(p_list[[5]] + ggtitle("Coefficients of ASV level") + theme(axis.text.x = element_blank())) 

Single level NEMoE

NEMoE can also built based on single level data, we illustrated it using Genus level of the Microbiome-PD dataset.

Microbiome_gen <- Microbiome$Genus
NEMoE_gen = NEMoE_buildFromList(Microbiome_gen, Nutrition, Response, K = 2, 
                            lambda1 = 0.028, lambda2 = 0.02,
                            alpha1 = 0.5, alpha2 = 0.5)
NEMoE_gen <- fitNEMoE(NEMoE_gen, restart_it = 20)
#> Fitting NEMoE.... 
#> it: 0, PLL0:-101.423
#> it: 1, PLL:-101.359
#> it: 2, PLL:-101.294
#> it: 3, PLL:-101.228
#> it: 4, PLL:-101.163
#> it: 5, PLL:-101.1
#> it: 6, PLL:-101.032
#> it: 7, PLL:-100.957
#> it: 8, PLL:-100.871
#> it: 9, PLL:-100.768
#> it: 10, PLL:-100.634
#> it: 11, PLL:-100.453
#> it: 12, PLL:-100.222
#> it: 13, PLL:-99.9147
#> it: 14, PLL:-99.5009
#> it: 15, PLL:-98.9517
#> it: 16, PLL:-98.2935
#> it: 17, PLL:-97.5981
#> it: 18, PLL:-96.9549
#> it: 19, PLL:-96.4377
#> it: 20, PLL:-96.048
#> it: 21, PLL:-95.7756
#> it: 22, PLL:-95.5959
#> it: 23, PLL:-95.4771
#> it: 24, PLL:-95.3956
#> it: 25, PLL:-95.3385
#> it: 26, PLL:-95.2966
#> it: 27, PLL:-95.2642
#> it: 28, PLL:-95.2377
#> it: 29, PLL:-95.2153
#> it: 30, PLL:-95.1957
#> it: 31, PLL:-95.1782
#> it: 32, PLL:-95.1622
#> it: 33, PLL:-95.1478
#> it: 34, PLL:-95.1354
#> it: 35, PLL:-95.1246
#> it: 36, PLL:-95.1153
#> it: 37, PLL:-95.1073
#> it: 38, PLL:-95.1003
#> it: 39, PLL:-95.0943
#> it: 40, PLL:-95.0891
#> it: 41, PLL:-95.0846
#> it: 42, PLL:-95.0807
#> it: 43, PLL:-95.0774
#> it: 44, PLL:-95.0745
#> it: 45, PLL:-95.072
#> it: 46, PLL:-95.0699
#> it: 47, PLL:-95.0681
#> it: 48, PLL:-95.0666
#> it: 49, PLL:-95.0652
#> it: 50, PLL:-95.0641
#> it: 51, PLL:-95.0631
#> it: 52, PLL:-95.0623
#> it: 53, PLL:-95.0616
#> it: 54, PLL:-95.0611
#> it: 55, PLL:-95.0606
#> it: 56, PLL:-95.0602
#> it: 57, PLL:-95.0598
#> it: 58, PLL:-95.0596
#> it: 59, PLL:-95.0593
#> it: 60, PLL:-95.0592
#> it: 61, PLL:-95.059
#> it: 62, PLL:-95.0589
#> it: 63, PLL:-95.0588
#> it: 64, PLL:-95.0587
#> it: 65, PLL:-95.0587
#> it: 66, PLL:-95.0586
#> it: 67, PLL:-95.0586
#> it: 68, PLL:-95.0586
p_list <- plotGating(NEMoE_gen)
print(p_list[[1]])

print(p_list[[2]])

p <- plotExperts(NEMoE_gen)[[1]]
p + theme(axis.text.x = element_text(angle = 45, size = 5))

Build the NEMoE object from phyloseq

In parallel, NEMoE object can be built from phyloseq object from the microbiome counts data. NEMoE wrapper functions for TSS normalization and transformations of the counts data. Also which levels of the data incorporated in the data can be determined by user.

sample_PD <- sample_data(PD$ps)
Response <- sample_PD$PD
Nutrition <- sample_PD[,-ncol(sample_PD)]

In the following example, we first TSS normalized data, then filter the taxa with prevalance larger than 0.7 (i.e. the taxa non-zero in more than 70% of the sample) and variance larger than 5e-5. Then the data are further tranformed using arcsin transformation and zscored. The resulting Microbiome data is to used built NEMoE object.

NEMoE_ps <- NEMoE_buildFromPhyloseq(ps = PD$ps, Nutrition = scale(Nutrition),
                                 Response = Response,  K = 2, 
                                 lambda1 = c(0.005, 0.014, 0.016, 0.023, 0.025),
                                 lambda2 = 0.02, alpha1 = 0.5, alpha2 = 0.5,
                                 filtParam = list(prev = 0.7, var = 1e-5),
                                 transParam = list(method = "asin"),
                                 taxLevel = c("Phylum","Order","Family",
                                              "Genus","ASV"))

The NEMoE object can also be fitted and visualized as it is in the previous section.

NEMoE_ps <- fitNEMoE(NEMoE_ps)
#> Fitting NEMoE.... 
#> it: 0, PLL0:-494.62
#> it: 1, PLL:-493.94
#> it: 2, PLL:-493.284
#> it: 3, PLL:-492.686
#> it: 4, PLL:-492.094
#> it: 5, PLL:-491.481
#> it: 6, PLL:-490.83
#> it: 7, PLL:-490.129
#> it: 8, PLL:-489.357
#> it: 9, PLL:-488.485
#> it: 10, PLL:-487.511
#> it: 11, PLL:-486.401
#> it: 12, PLL:-485.14
#> it: 13, PLL:-483.711
#> it: 14, PLL:-482.088
#> it: 15, PLL:-480.307
#> it: 16, PLL:-478.45
#> it: 17, PLL:-476.66
#> it: 18, PLL:-475.071
#> it: 19, PLL:-473.713
#> it: 20, PLL:-472.585
#> it: 21, PLL:-471.669
#> it: 22, PLL:-470.943
#> it: 23, PLL:-470.389
#> it: 24, PLL:-469.979
#> it: 25, PLL:-469.67
#> it: 26, PLL:-469.436
#> it: 27, PLL:-469.256
#> it: 28, PLL:-469.116
#> it: 29, PLL:-469.004
#> it: 30, PLL:-468.932
#> it: 31, PLL:-468.885
#> it: 32, PLL:-468.852
#> it: 33, PLL:-468.827
#> it: 34, PLL:-468.81
#> it: 35, PLL:-468.797
#> it: 36, PLL:-468.789
#> it: 37, PLL:-468.782
#> it: 38, PLL:-468.777
#> it: 39, PLL:-468.773
#> it: 40, PLL:-468.77
#> it: 41, PLL:-468.767
#> it: 42, PLL:-468.766
#> it: 43, PLL:-468.764
#> it: 44, PLL:-468.763
#> it: 45, PLL:-468.762
#> it: 46, PLL:-468.761
#> it: 47, PLL:-468.761
#> it: 48, PLL:-468.761
#> it: 49, PLL:-468.76
#> it: 50, PLL:-468.76
#> it: 51, PLL:-468.76
#> it: 52, PLL:-468.76
p_list <- plotGating(NEMoE_ps)
print(p_list[[1]])

print(p_list[[2]])

p_list <- plotExperts(NEMoE_ps)
print(p_list[[1]] + ggtitle("Coefficients of Phylum level") + theme(axis.text.x = element_text(angle = 45))) 

print(p_list[[2]] + ggtitle("Coefficients of Order level") + theme(axis.text.x = element_text(angle = 45))) 

print(p_list[[3]] + ggtitle("Coefficients of Family level") + theme(axis.text.x = element_text(angle = 45))) 

print(p_list[[4]] + ggtitle("Coefficients of Genus level") + theme(axis.text.x = element_text(angle = 45, size = 5))) 

print(p_list[[5]] + ggtitle("Coefficients of ASV level") + theme(axis.text.x = element_blank())) 

Evaluation and cross validation of NEMoE

We provide many metric including statistics such as AIC, BIC and ICL and cross validation metric such as cross validation accuracy and AUC.

calcCriterion(NEMoE, "all")
#>            AIC       BIC     ICL1      ICL2      eBIC      mAIC      mBIC
#> [1,] -994.9178 -1301.066 -1601.14 -1907.288 -1668.063 -1034.918 -1328.765
#>         mICL1     mICL2  accuracy  D.square       TPR       TNR        F1
#> [1,] -1641.14 -1934.987 0.6845588 0.1405786 0.7790186 0.5601094 0.7341976
#>            auc
#> [1,] 0.7696813

Also the one can use this procedure to select parameters.

NEMoE <- cvNEMoE(NEMoE)

The selected parameters obtained from the cross validation result can be used to fitting the model.

lambda1_choose <- NEMoE@cvResult$lambda1_choose
lambda2_choose <- NEMoE@cvResult$lambda2_choose
NEMoE <- setParam(NEMoE, lambda1 = lambda1_choose, lambda2 = lambda2_choose)
NEMoE <- fitNEMoE(NEMoE)
#> Fitting NEMoE.... 
#> it: 0, PLL0:-501.013
#> it: 1, PLL:-500.212
#> it: 2, PLL:-499.472
#> it: 3, PLL:-498.733
#> it: 4, PLL:-497.929
#> it: 5, PLL:-497.022
#> it: 6, PLL:-496.004
#> it: 7, PLL:-494.839
#> it: 8, PLL:-493.45
#> it: 9, PLL:-491.807
#> it: 10, PLL:-489.977
#> it: 11, PLL:-488.07
#> it: 12, PLL:-486.185
#> it: 13, PLL:-484.433
#> it: 14, PLL:-482.941
#> it: 15, PLL:-481.805
#> it: 16, PLL:-480.969
#> it: 17, PLL:-480.343
#> it: 18, PLL:-479.852
#> it: 19, PLL:-479.398
#> it: 20, PLL:-478.804
#> it: 21, PLL:-478.021
#> it: 22, PLL:-477.311
#> it: 23, PLL:-476.739
#> it: 24, PLL:-476.297
#> it: 25, PLL:-476.022
#> it: 26, PLL:-475.841
#> it: 27, PLL:-475.714
#> it: 28, PLL:-475.621
#> it: 29, PLL:-475.549
#> it: 30, PLL:-475.487
#> it: 31, PLL:-475.443
#> it: 32, PLL:-475.407
#> it: 33, PLL:-475.374
#> it: 34, PLL:-475.341
#> it: 35, PLL:-475.309
#> it: 36, PLL:-475.274
#> it: 37, PLL:-475.236
#> it: 38, PLL:-475.192
#> it: 39, PLL:-475.141
#> it: 40, PLL:-475.084
#> it: 41, PLL:-475.019
#> it: 42, PLL:-474.971
#> it: 43, PLL:-474.924
#> it: 44, PLL:-474.879
#> it: 45, PLL:-474.837
#> it: 46, PLL:-474.808
#> it: 47, PLL:-474.785
#> it: 48, PLL:-474.763
#> it: 49, PLL:-474.742
#> it: 50, PLL:-474.72
#> it: 51, PLL:-474.698
#> it: 52, PLL:-474.676
#> it: 53, PLL:-474.651
#> it: 54, PLL:-474.622
#> it: 55, PLL:-474.589
#> it: 56, PLL:-474.547
#> it: 57, PLL:-474.499
#> it: 58, PLL:-474.451
#> it: 59, PLL:-474.417
#> it: 60, PLL:-474.387
#> it: 61, PLL:-474.36
#> it: 62, PLL:-474.333
#> it: 63, PLL:-474.302
#> it: 64, PLL:-474.27
#> it: 65, PLL:-474.233
#> it: 66, PLL:-474.189
#> it: 67, PLL:-474.134
#> it: 68, PLL:-474.066
#> it: 69, PLL:-474.012
#> it: 70, PLL:-473.974
#> it: 71, PLL:-473.947
#> it: 72, PLL:-473.927
#> it: 73, PLL:-473.912
#> it: 74, PLL:-473.902
#> it: 75, PLL:-473.895
#> it: 76, PLL:-473.89
#> it: 77, PLL:-473.888
#> it: 78, PLL:-473.886
#> it: 79, PLL:-473.885
#> it: 80, PLL:-473.885
#> it: 81, PLL:-473.884
#> it: 82, PLL:-473.884
#> it: 83, PLL:-473.884
#> it: 84, PLL:-473.884
#> it: 85, PLL:-473.884
p_list <- plotGating(NEMoE_ps)
print(p_list[[1]])

print(p_list[[2]])

p_list <- plotExperts(NEMoE)
print(p_list[[1]] + ggtitle("Coefficients of Phylum level") + theme(axis.text.x = element_text(angle = 45))) 

print(p_list[[2]] + ggtitle("Coefficients of Order level") + theme(axis.text.x = element_text(angle = 45))) 

print(p_list[[3]] + ggtitle("Coefficients of Family level") + theme(axis.text.x = element_text(angle = 45))) 

print(p_list[[4]] + ggtitle("Coefficients of Genus level") + theme(axis.text.x = element_text(angle = 45, size = 5))) 

print(p_list[[5]] + ggtitle("Coefficients of ASV level") + theme(axis.text.x = element_blank())) 

Misc

sessionInfo()
#> R version 4.1.0 (2021-05-18)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 22000)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=Chinese (Simplified)_China.936 
#> [2] LC_CTYPE=Chinese (Simplified)_China.936   
#> [3] LC_MONETARY=Chinese (Simplified)_China.936
#> [4] LC_NUMERIC=C                              
#> [5] LC_TIME=Chinese (Simplified)_China.936    
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] phyloseq_1.36.0 NEMoE_1.1.0     ggplot2_3.3.6  
#> 
#> loaded via a namespace (and not attached):
#>  [1] nlme_3.1-153           bitops_1.0-7           fs_1.5.0              
#>  [4] rprojroot_2.0.2        GenomeInfoDb_1.28.4    tools_4.1.0           
#>  [7] bslib_0.3.0            utf8_1.2.2             R6_2.5.1              
#> [10] vegan_2.5-7            DBI_1.1.3              BiocGenerics_0.38.0   
#> [13] mgcv_1.8-36            colorspace_2.0-2       permute_0.9-5         
#> [16] rhdf5filters_1.4.0     ade4_1.7-17            withr_2.5.0           
#> [19] tidyselect_1.1.1       compiler_4.1.0         glmnet_4.1-2          
#> [22] textshaping_0.3.5      cli_3.0.1              Biobase_2.52.0        
#> [25] desc_1.3.0             labeling_0.4.2         sass_0.4.0            
#> [28] scales_1.2.1           pkgdown_1.6.1          systemfonts_1.0.2     
#> [31] stringr_1.4.0          digest_0.6.27          rmarkdown_2.10        
#> [34] XVector_0.32.0         pkgconfig_2.0.3        htmltools_0.5.2       
#> [37] highr_0.9              fastmap_1.1.0          rlang_1.0.4           
#> [40] rstudioapi_0.13        shape_1.4.6            jquerylib_0.1.4       
#> [43] generics_0.1.3         farver_2.1.1           jsonlite_1.7.2        
#> [46] mclust_5.4.7           BiocParallel_1.26.2    dplyr_1.0.7           
#> [49] RCurl_1.98-1.4         magrittr_2.0.3         GenomeInfoDbData_1.2.6
#> [52] biomformat_1.20.0      Matrix_1.3-4           Rcpp_1.0.7            
#> [55] munsell_0.5.0          S4Vectors_0.30.0       Rhdf5lib_1.14.2       
#> [58] fansi_1.0.3            ape_5.5                lifecycle_1.0.1       
#> [61] stringi_1.7.4          yaml_2.2.1             MASS_7.3-54           
#> [64] zlibbioc_1.38.0        rhdf5_2.36.0           plyr_1.8.6            
#> [67] grid_4.1.0             parallel_4.1.0         crayon_1.5.1          
#> [70] lattice_0.20-44        Biostrings_2.60.2      splines_4.1.0         
#> [73] multtest_2.48.0        knitr_1.38             pillar_1.8.1          
#> [76] igraph_1.2.6           reshape2_1.4.4         codetools_0.2-18      
#> [79] stats4_4.1.0           glue_1.6.2             evaluate_0.15         
#> [82] data.table_1.14.0      vctrs_0.4.1            foreach_1.5.2         
#> [85] gtable_0.3.0           purrr_0.3.4            assertthat_0.2.1      
#> [88] cachem_1.0.6           xfun_0.30              ragg_1.1.3            
#> [91] survival_3.2-13        tibble_3.1.8           iterators_1.0.14      
#> [94] memoise_2.0.1          IRanges_2.26.0         cluster_2.1.2         
#> [97] ROCR_1.0-11