This function generates data from a mixture distribution, with a user-specified sample size and number of variables.

genNEMoE(
  n = NULL,
  p = NULL,
  q = 30,
  K0 = 2,
  Sigma = NULL,
  eta = 0.5,
  c_g = 1,
  c_e = 1,
  s1 = 3,
  s2 = 4,
  p_L = c(10, 20, 50),
  fix_X = NULL,
  gen_Micro = "zinLDA",
  prev_filt = 0.3,
  var_filt = 1e-06,
  method = "comp",
  scale = TRUE,
  link = "probit",
  beta_max = 100,
  ...
)

Arguments

n

Number of samples to generate. If NULL (the default), 200 is used.

p

Number of variables for the experts network input. If NULL (the default), 30 is used.

q

Number of variables for the gating network. Defaults to 30.

K0

Number of latent classes (mixture components) in the dataset. Defaults to 2.

Sigma

Covariance matrix for the gating network input. If NULL, the identity matrix is used.

eta

Separation coefficient used when generating data for the gating network. Defaults to 0.5.

c_g

Signal-strength coefficient used when generating data for the gating network. Defaults to 1.

c_e

Signal-strength coefficient used when generating data for the experts network. Defaults to 1.

s1

Number of non-zero coefficients in the experts network input. Defaults to 3.

s2

Number of non-zero coefficients in the gating network input. Defaults to 4.

p_L

A numeric vector of length (L-1); each entry gives the number of variables at the corresponding level. Defaults to c(10, 20, 50).

fix_X

Fixed microbiome input matrix. If NULL, the matrix is generated using the zinLDA model.

gen_Micro

A character string indicating which model is used to generate the microbiome data. One of "zinLDA" (zero-inflated latent Dirichlet allocation model), "dm" (Dirichlet-multinomial model) or "mgauss" (multivariate Gaussian model). Defaults to "zinLDA".
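To make the "dm" option more concrete, the following is a minimal base R sketch of Dirichlet-multinomial sampling in general. It is not taken from genNEMoE; the helper rdirichlet1 and the values of n, p, depth and alpha are hypothetical and chosen only for illustration.

```r
set.seed(1)
n <- 5                 # number of samples (hypothetical)
p <- 8                 # number of taxa (hypothetical)
depth <- 1000          # reads per sample (hypothetical)
alpha <- rep(0.5, p)   # Dirichlet concentration parameters (hypothetical)

# Draw one Dirichlet vector by normalising independent Gamma draws.
rdirichlet1 <- function(alpha) {
  g <- rgamma(length(alpha), shape = alpha, rate = 1)
  g / sum(g)
}

# For each sample: draw its own proportions, then multinomial counts.
# vapply() returns a p x n matrix, so transpose to samples x taxa.
counts <- t(vapply(
  seq_len(n),
  function(i) as.numeric(rmultinom(1, size = depth, prob = rdirichlet1(alpha))),
  numeric(p)
))

dim(counts)       # samples in rows, taxa in columns
rowSums(counts)   # every row sums to the sequencing depth
```

Because each sample gets its own Dirichlet draw, the counts are more dispersed across samples than under a plain multinomial with fixed proportions.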

prev_filt

Prevalence threshold for the variables selected to have non-zero coefficients. Defaults to 0.3.

var_filt

Variance threshold for the variables selected to have non-zero coefficients. Defaults to 1e-6.
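As an illustration of what prevalence and variance thresholds of this kind do, here is a minimal base R sketch on toy data. It assumes prevalence means the fraction of samples in which a variable is non-zero and that both thresholds are applied with >=; neither detail is confirmed by this page, and the toy matrix is hypothetical.

```r
set.seed(1)
n <- 20; p <- 6   # toy dimensions (hypothetical)

# Toy zero-inflated counts; column 1 is kept positive so no row sums to zero.
counts <- matrix(rpois(n * p, lambda = 2) * rbinom(n * p, 1, 0.6), nrow = n)
counts[, 1] <- counts[, 1] + 1
X <- counts / rowSums(counts)   # relative abundances

prev_filt <- 0.3
var_filt <- 1e-06

prevalence <- colMeans(X > 0)   # assumed definition: share of non-zero samples
variance <- apply(X, 2, var)    # per-variable sample variance

# Variables eligible to carry a non-zero coefficient under these assumptions.
eligible <- which(prevalence >= prev_filt & variance >= var_filt)
eligible
```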

method

The transformation applied to the microbiome data when constructing the relationship in the experts network. If method = "comp", compositional data are used. If method = "asin", arcsine-transformed compositional data are used. If method = "clr", centered log-ratio transformed compositional data are used. Defaults to "comp".
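The three options can be sketched in base R as follows. This is only an illustration of the standard transformations, not the code used by genNEMoE: it assumes "asin" refers to the common arcsine square-root transform, and the pseudocount added before taking logs is a hypothetical choice to avoid log(0).

```r
# Toy count matrix: 2 samples x 4 taxa (hypothetical data).
X <- matrix(c(10, 0, 30, 60,
               5, 5, 40, 50), nrow = 2, byrow = TRUE)

# "comp": relative abundances, each row sums to 1.
comp <- X / rowSums(X)

# "asin": arcsine square-root transform of the proportions (assumed form).
asin_x <- asin(sqrt(comp))

# "clr": centered log-ratio; subtract each row's mean log value.
pseudo <- 1e-6                      # hypothetical pseudocount
log_x <- log(comp + pseudo)
clr_x <- log_x - rowMeans(log_x)    # recycling subtracts row i's mean from row i

rowSums(comp)    # each row sums to 1
rowSums(clr_x)   # each row sums to (numerically) 0
```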

scale

Logical value indicating whether to use scaled coefficients. Defaults to TRUE.

link

The link function used for generating the response y. If link = "probit", a mixture of probit models is used. If link = "logit", a mixture of logistic models is used. Defaults to "probit".
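The difference between the two links can be sketched in base R. This is a simplified illustration under stated assumptions, not the package's implementation: each sample is assigned a latent class at random, its linear predictor uses that class's coefficient vector, and the dimensions and coefficients are hypothetical.

```r
set.seed(1)
n <- 6; p <- 3; K <- 2   # toy dimensions (hypothetical)

X <- matrix(rnorm(n * p), n, p)       # expert inputs
beta <- matrix(rnorm(p * K), p, K)    # one coefficient column per latent class
latent <- sample(seq_len(K), n, replace = TRUE)

# Linear predictor: row i of X times the coefficient column of its own class.
lin_pred <- rowSums(X * t(beta[, latent]))

y_prob_probit <- pnorm(lin_pred)    # probit link: standard normal CDF
y_prob_logit <- plogis(lin_pred)    # logit link: logistic CDF

# Binary response drawn from the probit probabilities.
y <- rbinom(n, size = 1, prob = y_prob_probit)
```

Both links map the same linear predictor to a probability in (0, 1); they differ only in the shape of the curve, with the logistic having heavier tails.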

beta_max

Maximal number of coefficients for the experts network. Defaults to 100.

...

Other parameters that can be passed to genNEMoE, e.g. the zinLDA parameters (K = 5, Alpha = 10, Pi = 0.4, a = 0.05, b = 10).

Value

A list containing the generated microbiome data X, nutrition data W, health response y, coefficients of the experts network beta, coefficients of the gating network gamma, simulated observed logits pi, simulated latent group latent, and simulated response probability y_prob.

Examples

dat <- genNEMoE(n = 10, p = 10000, q = 30)