Skip to contents

Pair-wise overlaps can be done for two types of analyses. Firstly, each cross-validation iteration can be considered within a single classification. This explores the feature ranking stability. Secondly, the overlap may be considered between different classification results. This approach compares the feature ranking commonality between different results. Two types of commonality are possible to analyse. One summary is the average pair-wise overlap between all possible pairs of results. The second kind of summary is the pair-wise overlap of each level of the comparison factor that is not the reference level against the reference level. The overlaps are converted to percentages and plotted as lineplots.

Usage

# S4 method for ClassifyResult
rankingPlot(results, ...)

# S4 method for list
rankingPlot(
  results,
  topRanked = seq(10, 100, 10),
  comparison = "within",
  referenceLevel = NULL,
  characteristicsList = list(),
  orderingList = list(),
  sizesList = list(lineWidth = 1, pointSize = 2, legendLinesPointsSize = 1, fonts = c(24,
    16, 12, 12, 12, 16)),
  lineColours = NULL,
  xLabelPositions = seq(10, 100, 10),
  yMax = 100,
  title = if (comparison[1] == "within") "Feature Ranking Stability" else
    "Feature Ranking Commonality",
  yLabel = if (is.null(referenceLevel)) "Average Common Features (%)" else
    paste("Average Common Features with", referenceLevel, "(%)"),
  margin = grid::unit(c(1, 1, 1, 1), "lines"),
  showLegend = TRUE,
  parallelParams = bpparam()
)

Arguments

results

A list of ClassifyResult objects.

...

Not used by end user.

topRanked

A sequence of thresholds of number of the best features to use for overlapping.

comparison

Default: "within". The aspect of the experimental design to compare. Can be any characteristic that all results share or special value "within" to compared between all pairwise iterations of cross-validation.

referenceLevel

The level of the comparison factor to use as the reference to compare each non-reference level to. If NULL, then each level has the average pairwise overlap calculated to all other levels.

characteristicsList

A named list of characteristics. The name must be one of "lineColour", "pointType", "row" or "column". The value of each element must be a characteristic name, as stored in the "characteristic" column of the results' characteristics table.

orderingList

An optional named list. Any of the variables specified to characteristicsList can be the name of an element of this list and the value of the element is the order in which the factor should be presented in.

sizesList

Default: lineWidth = 1, pointSize = 2, legendLinesPointsSize = 1, fonts = c(24, 16, 12, 12, 12, 16). A list which must contain elements named lineWidth, pointSize, legendLinesPointsSize and fonts. The first three specify the size of lines and points in the graph, as well as in the plot legend. fonts is a vector of length 6. The first element is the size of the title text. The second element is the size of the axes titles. The third element is the size of the axes values. The fourth element is the size of the legends' titles. The fifth element is the font size of the legend labels. The sixth element is the font size of the titles of grouped plots, if any are produced. Each list element must numeric.

lineColours

A vector of colours for different levels of the line colouring parameter, if one is specified by characteristicsList[["lineColour"]]. If none are specified but, characteristicsList[["lineColour"]] is, an automatically-generated palette will be used.

xLabelPositions

Locations where to put labels on the x-axis.

yMax

The maximum value of the percentage to plot.

title

An overall title for the plot.

yLabel

Label to be used for the y-axis of overlap percentages.

margin

The margin to have around the plot.

showLegend

If TRUE, a legend is plotted next to the plot. If FALSE, it is hidden.

parallelParams

An object of class MulticoreParam or SnowParam.

Value

An object of class ggplot and a plot on the current graphics device, if plot is TRUE.

Details

If comparison is "within", then the feature selection overlaps are compared within a particular analysis. The result will inform how stable the selections are between different iterations of cross-validation for a particular analysis. Otherwise, the comparison is between different cross-validation runs, and this gives an indication about how common are the features being selected by different classifications.

Calculating all pair-wise set overlaps for a large cross-validation result can be time-consuming. This stage can be done on multiple CPUs by providing the relevant options to parallelParams.

Author

Dario Strbenac

Examples


  predicted <- DataFrame(sample = sample(10, 100, replace = TRUE),
                          permutation = rep(1:2, each = 50),
                          class = rep(c("Healthy", "Cancer"), each = 50))
  actual <- factor(rep(c("Healthy", "Cancer"), each = 5))
  allFeatures <- sapply(1:100, function(index) paste(sample(LETTERS, 3), collapse = ''))
  rankList <- list(allFeatures[1:100], allFeatures[c(15:6, 1:5, 16:100)],
                   allFeatures[c(1:9, 11, 10, 12:100)], allFeatures[c(1:50, 61:100, 60:51)])
  result1 <- ClassifyResult(DataFrame(characteristic = c("Data Set", "Selection Name", "Classifier Name", "Cross-validation"),
                            value = c("Melanoma", "t-test", "Diagonal LDA", "2 Permutations, 2 Folds")),
                            LETTERS[1:10], allFeatures, rankList,
                            list(rankList[[1]][1:15], rankList[[2]][1:15],
                                 rankList[[3]][1:10], rankList[[4]][1:10]),
                            list(function(oracle){}), NULL,
                            predicted, actual)
  
  predicted[, "class"] <- sample(predicted[, "class"])
  rankList <- list(allFeatures[1:100], allFeatures[c(sample(20), 21:100)],
  allFeatures[c(1:9, 11, 10, 12:100)], allFeatures[c(1:50, 60:51, 61:100)])
  result2 <- ClassifyResult(DataFrame(characteristic = c("Data Set", "Selection Name", "Classifier Name",
                                                         "Cross-validations"),
                            value = c("Melanoma", "t-test", "Random Forest", "2 Permutations, 2 Folds")),
                            LETTERS[1:10], allFeatures, rankList,
                            list(rankList[[1]][1:15], rankList[[2]][1:15],
                                 rankList[[3]][1:10], rankList[[4]][1:10]),
                            list(function(oracle){}), NULL,
                            predicted, actual)
                            
  rankingPlot(list(result1, result2), characteristicsList = list(pointType = "Classifier Name"))