Precision Pathways for Sample Prediction Based on Prediction Confidence.

Precision pathways allow the evaluation of various permutations of multiomics or multiview data. A sample is predicted by a particular assay if it was consistently predicted as one class during cross-validation. Otherwise, it is passed on to subsequent assays/tiers for prediction. Balanced accuracy is used to evaluate overall prediction performance, and sample-specific accuracy is used for individual-level evaluation.
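
As a rough illustration of the overall metric, balanced accuracy is commonly computed as the mean of the per-class recalls. The sketch below uses base R only and made-up labels; the variable names are hypothetical and this is not necessarily how the package computes the metric internally.

```r
# Hypothetical true and predicted classes for six samples (illustrative only).
actual    <- factor(c("A", "A", "A", "A", "B", "B"))
predicted <- factor(c("A", "A", "A", "B", "B", "B"), levels = levels(actual))

# Per-class recall: the fraction of each true class that was predicted correctly.
recall <- tapply(predicted == actual, actual, mean)

# Balanced accuracy: the unweighted mean of the per-class recalls.
balancedAccuracy <- mean(recall)

# Plain accuracy, for comparison; it is dominated by the larger class.
accuracy <- mean(predicted == actual)
```

Here class A has a recall of 3/4 and class B a recall of 2/2, so the balanced accuracy is 0.875, whereas plain accuracy is 5/6.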

Usage

# S4 method for class 'MultiAssayExperimentOrList'
precisionPathwaysTrain(
  measurements,
  class,
  useFeatures = NULL,
  maxMissingProp = 0,
  topNvariance = NULL,
  fixedAssays = "clinical",
  confidenceCutoff = 0.8,
  minAssaySamples = 10,
  nFeatures = 20,
  selectionMethod = setNames(c("none", rep("t-test", length(measurements))),
    c("clinical", names(measurements))),
  classifier = setNames(c("elasticNetGLM", rep("randomForest", length(measurements))),
    c("clinical", names(measurements))),
  nFolds = 5,
  nRepeats = 20,
  nCores = 1
)

# S4 method for class 'PrecisionPathways,MultiAssayExperimentOrList'
precisionPathwaysPredict(pathways, measurements, class)

Arguments

measurements: Either a MultiAssayExperiment or a list of the basic tabular objects containing the data.
class: If a MultiAssayExperiment, a column name in colData(measurements) with the classes. If measurements is a list of tabular data, may also be a vector of classes.
useFeatures: Default: NULL (i.e. use all provided features). A named list of features to use for each assay. If the input data is a single table, this can simply be a vector of feature names. For any assays not in the named list, all of their features are used. "clinical" is also a valid assay name and refers to the clinical data table. This allows variables such as spike-in RNAs, sample IDs, sample acquisition dates, etc., which are not relevant for outcome prediction, to be excluded.
maxMissingProp: Default: 0.0. A proportion less than 1 which is the maximum tolerated proportion of missingness for a feature to be retained for modelling.
topNvariance: Default: NULL. An integer number of most variable features per assay to subset to. Assays with fewer features won't be reduced in size.
fixedAssays: A character vector of assay names specifying any assays which must be at the beginning of the pathway.
confidenceCutoff: The minimum confidence of predictions for a sample to be predicted by a particular assay. If a sample was predicted to belong to a particular class a proportion \(p\) of the time, then the confidence is \(2 \times |p - 0.5|\).
minAssaySamples: An integer specifying the minimum number of samples a tier may have. If a subsequent tier would have fewer than this number of samples, the samples are incorporated into the current tier.
nFeatures: Default: 20. The number of features to consider during feature selection, if feature selection is done.
selectionMethod: A named character vector of feature selection methods to use for the assays, one for each. The names must correspond to names of measurements.
classifier: A named character vector of modelling methods to use for the assays, one for each. The names must correspond to names of measurements.
nFolds: A numeric specifying the number of folds to use for cross-validation.
nRepeats: A numeric specifying the number of repeats or permutations to use for cross-validation.
nCores: A numeric specifying the number of cores used if the user wants to use parallelisation.
pathways: An object of class PrecisionPathways, created by precisionPathwaysTrain, to be used for predicting on a new data set.
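
The confidence measure described for confidenceCutoff can be sketched in base R as follows. The prediction matrix and its sample names are made up for illustration, and comparing with >= is an assumption based on the cutoff being described as a minimum; this is not the package's internal code.

```r
# Hypothetical predictions from 10 cross-validation repeats for four samples
# (rows are samples, columns are repeats). Illustrative data only.
predictions <- rbind(
  S1 = rep("A", 10),
  S2 = c(rep("A", 8), rep("B", 2)),
  S3 = c(rep("A", 5), rep("B", 5)),
  S4 = rep("B", 10)
)

# Proportion of repeats in which each sample was predicted as class "A".
p <- rowMeans(predictions == "A")

# Confidence as defined for confidenceCutoff: 2 * |p - 0.5|.
confidence <- 2 * abs(p - 0.5)

# Samples meeting the cutoff would be predicted by the current assay;
# the rest would be passed on to the next assay/tier.
confidenceCutoff <- 0.8
retained <- names(confidence)[confidence >= confidenceCutoff]
passedOn <- setdiff(names(confidence), retained)
```

In this sketch S1 and S4 are retained, since they were always predicted as the same class, while S2 and S3 fall below the cutoff and are passed on. Consistent prediction of either class gives high confidence, because the measure is symmetric around p = 0.5.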

Value

An object of class PrecisionPathways, which is essentially a named list that other plotting and tabulating functions can use.

Examples

# To be determined.