5  Procedure 4

The transferability of biomarkers from one patient population to another is often difficult to assess. Here we present three options for first assessing and then building transferable models from gene expression data.

5.1 Intro

5.1.1 Criss-cross validate

Suppose you want to assess how biomarkers selected from one population relate to another. Typically, you would build a model on one dataset using some cross-validation strategy and then use it to predict the outcome of the patients in the other cohort. Because this is a mechanical procedure, it can be automated, which is what criss-cross validation does: it performs a cross-validated model-building procedure on one dataset or group of patients and then applies the resulting model to each of the other datasets you have collected. This is repeated for each of the n datasets, so every dataset serves as the training set once.
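To make the train-on-one, test-on-the-rest idea concrete, here is a minimal base R sketch. It is not the ClassifyR implementation: the cohorts are simulated, the gene names (`geneA`, `geneB`) and helper functions (`simulate_cohort`, `accuracy_on`) are hypothetical, and a plain logistic regression stands in for the cross-validated model-building step.

```r
# Toy criss-cross validation in base R (illustrative only; simulated data).
set.seed(1)

# Simulate one cohort: two informative features and a binary outcome.
# `shift` mimics a cohort-specific offset (e.g. a batch effect).
simulate_cohort <- function(n, shift = 0) {
  outcome <- factor(rep(c("Control", "AR"), each = n / 2),
                    levels = c("Control", "AR"))
  signal <- ifelse(outcome == "AR", 1.5, 0)
  data.frame(
    geneA = rnorm(n, mean = signal + shift),
    geneB = rnorm(n, mean = -signal + shift),
    outcome = outcome
  )
}

cohorts <- list(
  study1 = simulate_cohort(60),
  study2 = simulate_cohort(60, shift = 0.3),
  study3 = simulate_cohort(60, shift = -0.3)
)

# Fit a logistic regression on `train` and return its accuracy on `test`.
accuracy_on <- function(train, test) {
  fit <- glm(outcome ~ geneA + geneB, data = train, family = binomial)
  prob_AR <- predict(fit, newdata = test, type = "response")
  predicted <- ifelse(prob_AR > 0.5, "AR", "Control")
  mean(predicted == test$outcome)
}

# Rows: training cohort; columns: test cohort.
# The diagonal is resubstitution accuracy here and is therefore optimistic.
acc <- sapply(cohorts, function(test) {
  sapply(cohorts, function(train) accuracy_on(train, test))
})
round(acc, 2)
```

Reading along a row shows how well a model trained on one cohort transfers to each of the others; a row with a high diagonal but low off-diagonal values suggests the model does not transfer well.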

We will use the recently published PROMAD database as a quick and easy way to collect

5.2 Next thing

library(ClassifyR)
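# Load the PROMAD sample: a list of measurement tables and a matching list of outcomes, one per dataset.
promad_data <- readRDS("data/procedure4/PROMAD_sample.Rds")
# Train on each dataset in turn and test on the other datasets, using an SVM classifier on 4 cores.
ccv <- crissCrossValidate(measurements = promad_data$measurements,
                          outcome = promad_data$outcome,
                          classifier = "SVM",
                          nCores = 4)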
Warning in .local(measurements, ...): Unsafe feature names in input data.
Converted into safe names.
# Plot how well the model from each training dataset performs on each test dataset.
crissCrossPlot(ccv)

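library(TOP)
# Select 50 features for the "AR - Control" contrast across all datasets.
selected_features <- TOP::filterFeatures(x_list = promad_data$measurements,
                                         y_list = promad_data$outcome,
                                         contrast = "AR - Control",
                                         nFeatures = 50)
# Keep only the selected features (columns) in every dataset.
Data_temp <- lapply(promad_data$measurements, function(x) x[, selected_features])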
Warning in merge.data.frame(x, y, by = "gene", all = TRUE, no.dups = TRUE):
column names 't.x', 't.y' are duplicated in the result
# Fit the TOP model on the filtered datasets and their outcomes.
topModel <- TOP::TOP_model(x_list = Data_temp, y_list = promad_data$outcome)
Calculating Pairwise Ratios of Features
Calculating Fold Changes of Pairwise Ratios
Calculating Final Weights
Fitting final lasso model

more text