Precision Pathways for Sample Prediction Based on Prediction Confidence.
precisionPathways.Rd
Precision pathways allow the evaluation of various permutations of multiomics or multiview data. A sample is predicted by a particular assay if it was consistently predicted as a particular class during cross-validation. Otherwise, it is passed on to subsequent assays/tiers for prediction. Balanced accuracy is used to evaluate overall prediction performance, and sample-specific accuracy is used for individual-level evaluation.
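The two evaluation measures named above can be illustrated in base R. This is a minimal sketch using the conventional definitions (balanced accuracy as the mean of per-class recalls, sample-specific accuracy as the proportion of repeats in which a sample was predicted correctly). It is not taken from the package's source code, and the toy data and the helper name `balancedAccuracy` are invented for illustration.

```r
# Toy data (invented): true classes and one predicted class per sample.
actual <- factor(c("A", "A", "A", "A", "B", "B"))
predicted <- factor(c("A", "A", "A", "B", "B", "A"), levels = levels(actual))

# Balanced accuracy: the mean of the per-class recalls, so that a small
# class counts as much as a large one.
balancedAccuracy <- function(actual, predicted) {
  recalls <- sapply(levels(actual), function(cl) {
    mean(predicted[actual == cl] == cl)
  })
  mean(recalls)
}
overall <- balancedAccuracy(actual, predicted)

# Sample-specific accuracy: rows are samples, columns are repeats of
# cross-validation (toy values, invented).
repeatedPredictions <- rbind(
  s1 = c("A", "A", "A", "A"),
  s2 = c("A", "B", "A", "A"),
  s3 = c("B", "B", "A", "A")
)
actualClasses <- c(s1 = "A", s2 = "A", s3 = "B")

# Each row is compared with that sample's true class, then averaged.
sampleAccuracy <- rowMeans(repeatedPredictions == actualClasses)

print(overall)
print(sampleAccuracy)
```

Here class A has a recall of 3/4 and class B a recall of 1/2, so the balanced accuracy is their mean, whereas plain accuracy would weight class A more heavily.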
Usage
# S4 method for class 'MultiAssayExperimentOrList'
precisionPathwaysTrain(
  measurements,
  class,
  useFeatures = NULL,
  maxMissingProp = 0,
  topNvariance = NULL,
  fixedAssays = "clinical",
  confidenceCutoff = 0.8,
  minAssaySamples = 10,
  nFeatures = 20,
  selectionMethod = setNames(c("none", rep("t-test", length(measurements))),
    c("clinical", names(measurements))),
  classifier = setNames(c("elasticNetGLM", rep("randomForest", length(measurements))),
    c("clinical", names(measurements))),
  nFolds = 5,
  nRepeats = 20,
  nCores = 1
)
# S4 method for class 'PrecisionPathways,MultiAssayExperimentOrList'
precisionPathwaysPredict(pathways, measurements, class)
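The defaults for selectionMethod and classifier in the usage above are built with setNames, which can be hard to read at a glance. As a minimal sketch, assuming a hypothetical data set with two assays named "RNA" and "protein" (names invented for illustration, standing in for names(measurements)), the default expressions evaluate as follows:

```r
# Hypothetical assay names, standing in for names(measurements).
assayNames <- c("RNA", "protein")

# The same expressions as the defaults in the usage, with
# length(measurements) and names(measurements) replaced by the stand-ins.
selectionMethod <- setNames(
  c("none", rep("t-test", length(assayNames))),
  c("clinical", assayNames)
)
classifier <- setNames(
  c("elasticNetGLM", rep("randomForest", length(assayNames))),
  c("clinical", assayNames)
)

print(selectionMethod)
print(classifier)
```

Under these assumptions the clinical table gets no feature selection and an elastic net GLM, while every other assay gets a t-test selection and a random forest. A different choice for one assay can be made by passing a named vector of the same shape with that element changed.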
Arguments
- measurements
Either a MultiAssayExperiment or a list of the basic tabular objects containing the data.
- class
If measurements is a MultiAssayExperiment, a column name in colData(measurements) with the classes. If measurements is a list of tabular data, may also be a vector of classes.
- useFeatures
Default: NULL (i.e. use all provided features). A named list of features to use. Alternatively, if the input data is a single table, this can just be a vector of feature names. For any assays not in the named list, all of their features are used. "clinical" is also a valid assay name and refers to the clinical data table. This allows for the avoidance of variables such as spike-in RNAs, sample IDs, sample acquisition dates, etc. which are not relevant for outcome prediction.
- maxMissingProp
Default: 0.0. A proportion less than 1 which is the maximum tolerated proportion of missingness for a feature to be retained for modelling.
- topNvariance
Default: NULL. An integer number of most variable features per assay to subset to. Assays with fewer features won't be reduced in size.
- fixedAssays
A character vector of assay names specifying any assays which must be at the beginning of the pathway.
- confidenceCutoff
The minimum confidence of predictions for a sample to be predicted by a particular assay. If a sample was predicted to belong to a particular class a proportion \(p\) times, then the confidence is \(2 \times |p - 0.5|\).
- minAssaySamples
An integer specifying the minimum number of samples a tier may have. If a subsequent tier would have fewer than this number of samples, the samples are incorporated into the current tier.
- nFeatures
Default: 20. The number of features to consider during feature selection, if feature selection is done.
- selectionMethod
A named character vector of feature selection methods to use for the assays, one for each. The names must correspond to names of measurements.
- classifier
A named character vector of modelling methods to use for the assays, one for each. The names must correspond to names of measurements.
- nFolds
A numeric specifying the number of folds to use for cross-validation.
- nRepeats
A numeric specifying the number of repeats or permutations to use for cross-validation.
- nCores
A numeric specifying the number of cores used if the user wants to use parallelisation.
- pathways
A set of pathways created by precisionPathwaysTrain, which is an object of class PrecisionPathways, to be used for predicting on a new data set.
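The interaction between confidenceCutoff and minAssaySamples described in the arguments can be sketched in base R. This is only one reading of those two descriptions, not the package's implementation: the helper `assignTier` and the toy proportions are hypothetical, and treating the cutoff as inclusive (>=) is an assumption based on the word "minimum".

```r
# Hypothetical helper (not part of the package): decide which samples a tier
# keeps and which it passes on, following the argument descriptions.
assignTier <- function(p, confidenceCutoff = 0.8, minAssaySamples = 10) {
  # Confidence as defined for confidenceCutoff: 2 * |p - 0.5|.
  confidence <- 2 * abs(p - 0.5)
  # Assumption: a sample exactly at the cutoff counts as confident.
  confident <- confidence >= confidenceCutoff
  # If the next tier would have too few samples, keep them in this tier.
  if (sum(!confident) < minAssaySamples) {
    confident[] <- TRUE
  }
  list(
    confidence = confidence,
    current = names(p)[confident],
    passedOn = names(p)[!confident]
  )
}

# Toy proportions (invented): how often each sample was predicted as one
# particular class across repeats of cross-validation.
p <- c(s1 = 1.00, s2 = 0.95, s3 = 0.60, s4 = 0.05, s5 = 0.50)

# With a small minimum tier size, the two uncertain samples are passed on.
smallMinimum <- assignTier(p, minAssaySamples = 2)
# With the default minimum of 10, two samples are too few for a new tier,
# so all five stay in the current tier.
defaultMinimum <- assignTier(p)

print(smallMinimum$current)
print(smallMinimum$passedOn)
print(defaultMinimum$current)
```

Note that a sample predicted as a class only 5% of the time (s4) is as confident as one predicted 95% of the time (s2), because the confidence measures distance from an even split rather than agreement with one class.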