Imputing gene expression values using a CPOP model

impute_cpop(cpop_result, x1, x2, newx)

Arguments

cpop_result

A fitted CPOP model, as returned by cpop_model.

x1

Original feature data matrix 1.

x2

Original feature data matrix 2.

newx

New feature data matrix, in the same original feature space as x1 and x2, containing the missing values to impute.
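
Before calling impute_cpop, it can help to check which columns of newx actually contain missing values. The following is a minimal base-R sketch on a small made-up matrix; `toy_newx` and its values are hypothetical and not part of the package.

```r
## Hypothetical toy matrix standing in for newx: 4 samples, 3 features.
## matrix() fills column by column, so X02 is entirely missing
## and X03 has a single missing entry.
toy_newx = matrix(
  c(1.2, 0.7, 3.1, 2.4,
    NA,  NA,  NA,  NA,
    0.5, NA,  1.9, 2.2),
  nrow = 4,
  dimnames = list(NULL, c("X01", "X02", "X03"))
)

## Number of missing values in each column
na_counts = colSums(is.na(toy_newx))

## Columns with at least one missing value
cols_with_na = names(na_counts)[na_counts > 0]

## Columns in which every value is missing
cols_all_na = names(na_counts)[na_counts == nrow(toy_newx)]

print(na_counts)
print(cols_with_na)
print(cols_all_na)
```

The same three lines of bookkeeping apply unchanged to a real newx matrix, as long as it has column names.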

Value

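The newx data with its missing values imputed. In the example below, this result is passed directly to predict_cpop as newx.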
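
To give a feel for what imputing a missing feature from reference data can look like, here is a deliberately simple sketch that fills each missing entry with the average of that feature's column means in two reference matrices. It is an illustration only and is not a description of the method impute_cpop uses; `naive_mean_impute` and the toy matrices are hypothetical names introduced for this sketch.

```r
## Hypothetical helper for illustration: NOT the algorithm used by impute_cpop.
naive_mean_impute = function(ref1, ref2, newx) {
  ## All three matrices must share the same features in the same order
  stopifnot(identical(colnames(ref1), colnames(newx)),
            identical(colnames(ref2), colnames(newx)))
  ## One fill value per feature: average of the two reference column means
  fill = (colMeans(ref1, na.rm = TRUE) + colMeans(ref2, na.rm = TRUE)) / 2
  out = newx
  for (j in seq_len(ncol(out))) {
    missing_rows = is.na(out[, j])
    out[missing_rows, j] = fill[j]
  }
  out
}

## Toy reference matrices and a toy new matrix with one missing column
feature_names = c("X01", "X02")
ref1 = matrix(c(1, 3, 10, 20), nrow = 2, dimnames = list(NULL, feature_names))
ref2 = matrix(c(5, 7, 30, 40), nrow = 2, dimnames = list(NULL, feature_names))
toy_newx = matrix(c(2, 4, NA, NA), nrow = 2, dimnames = list(NULL, feature_names))

toy_imputed = naive_mean_impute(ref1, ref2, toy_newx)
print(toy_imputed)
```

Observed values in `toy_newx` are left untouched; only the missing entries are replaced.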

Examples

data(cpop_data_binary, package = 'CPOP')
## Loading simulated matrices and vectors
x1 = cpop_data_binary$x1
x2 = cpop_data_binary$x2
x3 = cpop_data_binary$x3
y1 = cpop_data_binary$y1
y2 = cpop_data_binary$y2
y3 = cpop_data_binary$y3
set.seed(1)
cpop_result = cpop_model(x1 = x1, x2 = x2, y1 = y1, y2 = y2, alpha = 0.1, n_features = 10)
#> Absolute colMeans difference will be used as the weights for CPOP
#> Fitting CPOP model using alpha = 0.1
#> Based on previous alpha, 0 features are kept 
#> CPOP1 - Step 01: Number of selected features: 0 out of 190
#> CPOP1 - Step 02: Number of selected features: 43 out of 190
#> 10 features was reached. 
#> A total of 43 features were selected. 
#> Removing sources of collinearity gives 18 features. 
#> 10 features was reached. 
#> A total of 18 features were selected. 
#> CPOP2 - Sign: Step 01: Number of leftover features: 12 out of 18
#> The sign matrix between the two data:
#>     
#>      -1 0 1
#>   -1  0 0 3
#>   0   0 0 0
#>   1   3 0 0
#> CPOP2 - Sign: Step 02: Number of leftover features: 12 out of 18
#> The sign matrix between the two data:
#>     
#>      -1 0 1
#>   -1  0 0 0
#>   0   0 0 0
#>   1   0 0 0
cpop_result
#> CPOP model with  12 features 
#> # A tibble: 13 × 3
#>    coef_name     coef1   coef2
#>    <chr>         <dbl>   <dbl>
#>  1 (Intercept)  0       0     
#>  2 X01--X02    -0.305  -0.216 
#>  3 X01--X03    -0.139  -0.109 
#>  4 X01--X06    -0.284  -0.193 
#>  5 X01--X07    -0.216  -0.150 
#>  6 X01--X09    -0.745  -0.382 
#>  7 X01--X11    -0.372  -0.264 
#>  8 X01--X13    -0.319  -0.205 
#>  9 X01--X14    -0.0488 -0.138 
#> 10 X01--X17    -0.176  -0.0962
#> 11 X01--X18    -0.338  -0.260 
#> 12 X01--X19    -0.0219 -0.247 
#> 13 X04--X20     0.481   0.286 
x3_pred_result = predict_cpop(cpop_result, newx = x3)
head(x3_pred_result)
#> # A tibble: 6 × 6
#>   samples cpop_model1 cpop_model2 cpop_model_avg cpop_model_avg_prob
#>   <chr>         <dbl>       <dbl>          <dbl>               <dbl>
#> 1 1             0.437     -0.107          0.165                0.540
#> 2 2            -0.824     -0.0719        -0.448                0.394
#> 3 3             0.248     -0.115          0.0664               0.516
#> 4 4            -0.877     -0.198         -0.537                0.372
#> 5 5             0.955      0.671          0.813                0.692
#> 6 6            -0.623     -1.05          -0.836                0.304
#> # … with 1 more variable: cpop_model_avg_class <chr>
## Introduce a column of missing values in a new matrix, x4.
x4 = x3
x4[,2] = NA
## Without imputation, the prediction function does not work properly on x4.
## Instead, it prompts the user to impute the data first.
## head(predict_cpop(cpop_result, newx = x4))
## impute_cpop can impute the missing values in x4 before this matrix is converted into z4.
x4_imp = impute_cpop(cpop_result, x1 = x1, x2 = x2, newx = x4)
x4_pred_result = predict_cpop(cpop_result, newx = x4_imp)
head(x4_pred_result)
#> # A tibble: 6 × 6
#>   samples cpop_model1 cpop_model2 cpop_model_avg cpop_model_avg_prob
#>   <chr>         <dbl>       <dbl>          <dbl>               <dbl>
#> 1 1            -0.192     -0.553          -0.373               0.409
#> 2 2            -1.63      -0.646          -1.14                0.254
#> 3 3             0.470      0.0415          0.256               0.563
#> 4 4            -1.41      -0.574          -0.991               0.278
#> 5 5             0.416      0.290           0.353               0.587
#> 6 6            -1.32      -1.54           -1.43                0.193
#> # … with 1 more variable: cpop_model_avg_class <chr>
## Compare predicted probabilities from the complete data (x3) and the imputed data (x4)
plot(x3_pred_result$cpop_model_avg_prob, x4_pred_result$cpop_model_avg_prob)
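
A numeric summary can complement the scatter plot. The sketch below uses only the six cpop_model_avg_prob values printed by head() above, for x3 and for the imputed x4. The variable names are ad hoc, and on real data one would use the full columns of the two prediction tibbles rather than six hand-copied values.

```r
## First six cpop_model_avg_prob values, copied from the printed output above
prob_x3 = c(0.540, 0.394, 0.516, 0.372, 0.692, 0.304)
prob_x4_imputed = c(0.409, 0.254, 0.563, 0.278, 0.587, 0.193)

## Pearson correlation between the two sets of predicted probabilities
prob_cor = cor(prob_x3, prob_x4_imputed)

## Mean absolute difference in predicted probability
prob_mad = mean(abs(prob_x3 - prob_x4_imputed))

print(round(prob_cor, 2))
print(round(prob_mad, 3))
```

A high correlation together with a small mean absolute difference would suggest that imputing the missing column changed the predictions only modestly. Six samples are far too few to draw a firm conclusion.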