A function to perform pairwise cross validation
This function performs cross-validation and model prediction on datasets in a pairwise manner.
Usage
crissCrossValidate(
measurements,
outcomes,
nFeatures = 20,
selectionMethod = "auto",
selectionOptimisation = "Resubstitution",
trainType = c("modelTrain", "modelTest"),
performanceType = "auto",
doRandomFeatures = FALSE,
runTOP = FALSE,
classifier = "auto",
nFolds = 5,
nRepeats = 20,
nCores = 1,
verbose = 0
)
Arguments
- measurements
A list of either DataFrame, data.frame or matrix class measurements.
- outcomes
A list of vectors that respectively correspond to the outcomes of the samples in the measurements list. Factors should be coded such that the control class is the first level.
- nFeatures
The number of features to be used for modelling.
- selectionMethod
Default: "auto". A character keyword of the feature selection algorithm to be used. If "auto", t-test (two categories) / F-test (three or more categories) ranking and top nFeatures optimisation is done. Otherwise, the ranking method is per-feature Cox proportional hazards p-value.
- selectionOptimisation
A character value, one of "Resubstitution", "Nested CV" or "none", specifying the approach used to optimise nFeatures.
- trainType
Default: "modelTrain". A keyword specifying whether a fully trained model is used to make predictions on the test set or if only the feature identifiers are chosen using the training data set and a number of training-predictions are made by cross-validation in the test set.
- performanceType
Default: "auto". If "auto", then balanced accuracy is used for classification or C-index for survival. Otherwise, any one of the options described in calcPerformance may be specified.
- doRandomFeatures
Default: FALSE. Whether to perform random feature selection to establish a baseline performance. Either FALSE or TRUE is permitted.
- runTOP
Default: FALSE. If TRUE, perform the Transferable Omics Prediction (TOP) procedure in a leave-one-dataset-out manner.
- classifier
Default: "auto". A character keyword of the modelling algorithm to be used. If "auto", then a random forest is used for a classification task or Cox proportional hazards model for a survival task.
- nFolds
A numeric specifying the number of folds to use for cross-validation.
- nRepeats
A numeric specifying the number of repeats or permutations to use for cross-validation.
- nCores
A numeric specifying the number of cores used if the user wants to use parallelisation.
- verbose
Default: 0. A number between 0 and 3 for the amount of progress messages to give. A higher number will produce more messages.
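As a rough illustration of how the arguments above fit together, the following sketch builds two small simulated data sets and wraps a call to crissCrossValidate. The helper names (makeDataset, runCrissCross), the data set names and the simulated values are hypothetical and exist only for this example. It also assumes that samples are stored in rows, that features are in columns, and that both lists carry matching data set names. Only arguments listed under Usage are passed.

```r
# Hypothetical helper: simulate one data set with two classes.
# Assumption: rows are samples and columns are features.
makeDataset <- function(nSamples, nFeatures = 30) {
  # The control class ("Healthy") is deliberately the first factor level.
  classes <- factor(rep(c("Healthy", "Disease"), each = nSamples / 2),
                    levels = c("Healthy", "Disease"))
  values <- matrix(rnorm(nSamples * nFeatures), nrow = nSamples,
                   dimnames = list(paste0("Sample", seq_len(nSamples)),
                                   paste0("Gene", seq_len(nFeatures))))
  # Shift the first five features in the disease class to create some signal.
  isDisease <- classes == "Disease"
  values[isDisease, 1:5] <- values[isDisease, 1:5] + 2
  list(measurements = values, outcomes = classes)
}

set.seed(1)
datasetA <- makeDataset(40)
datasetB <- makeDataset(60)

# One list of measurements and one list of outcomes, in the same order.
measurementsList <- list("Dataset A" = datasetA$measurements,
                         "Dataset B" = datasetB$measurements)
outcomesList <- list("Dataset A" = datasetA$outcomes,
                     "Dataset B" = datasetB$outcomes)

# Hypothetical wrapper, not run automatically: requires ClassifyR to be installed.
runCrissCross <- function() {
  ClassifyR::crissCrossValidate(
    measurementsList, outcomesList,
    nFeatures = 5,            # number of features used for modelling
    trainType = "modelTrain", # the documented default
    nFolds = 2, nRepeats = 2, # kept small so the sketch finishes quickly
    nCores = 1, verbose = 0
  )
}
# result <- runCrissCross()
```

Small values of nFolds and nRepeats are used here only to keep the run time short; the defaults shown under Usage are 5 and 20.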