An object containing a dataset and methods for evaluating analytical tasks against ground truths for the dataset.
Public fields
data: The data.
evidence: The supporting evidence for the data.
metrics: The metric for evaluating tasks against the gold standards.
cachePath: The path to the data cache.
dataSource: The data repository that the data were retrieved from.
dataSourceID: The dataset ID for dataSource.
evidenceSource: The data repository that the supporting evidence was retrieved from.
evidenceSourceID: The dataset ID for evidenceSource.
splitIndices: Indices for cross-validation.
splitSeed: The seed used to generate the split indices.
verbose: Set the verbosity of Trio. Defaults to FALSE.
description: A description of the dataset.
name: The name of the Trio object, as defined in Curated Trio Datasets.
Methods
Method new()
Create a Trio object
Usage
Trio$new(
datasetID = NULL,
data = NULL,
dataLoader = NULL,
evidenceID = NULL,
evidence = NULL,
evidenceColumns = NULL,
evidenceLoader = NULL,
task = NULL,
metrics = NULL,
cachePath = FALSE,
verbose = FALSE
)
Arguments
datasetID: A string specifying a dataset, either a name from curated-trio-data or a format string of the form source:source_id.
data: An object to use as the Trio dataset.
dataLoader: A custom loading function that takes the path of a downloaded file and returns a single dataset, ready to be used in evaluation tasks.
evidenceID: If datasetID is not an ID from the curated trio datasets spreadsheet, then a format string of the form source:source_id indicating the file to obtain the supporting evidence from.
evidence: A named list of lists. The top-level list is named by task type. Each lower-level list has length two and is named "evidence" and "metrics". The "evidence" component has supporting evidence and the "metrics" component has a character vector of metric names (corresponding to the names of the list provided to the metrics parameter).
evidenceColumns: If evidenceID is not NULL, then the columns of the table containing the supporting evidence.
evidenceLoader: Alternative to evidence and evidenceColumns. Extract the evidence in a flexible way.
task: If evidenceColumns or evidenceLoader is specified, a character vector of length 1 naming the task the evidence is for.
metrics: A named list of metric functions.
cachePath: The path to the data cache.
verbose: Set the verbosity of Trio. Defaults to FALSE.
description: A description of the dataset.
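The nested shape expected by the evidence argument can be easier to see written out. The following is a minimal sketch in base R, assuming a hypothetical task type called "classification" and a hypothetical metric called "accuracy"; neither name comes from the package, and the commented-out Trio$new() call is illustrative only.

```r
# Hypothetical ground-truth labels for a hypothetical "classification" task.
truth <- factor(c("a", "b", "a", "b"))

# Hypothetical metric in the f(x, y, ...) form: x is the truth, y the output.
accuracy <- function(x, y, ...) mean(x == y)

# Named list of metric functions, as described for the metrics argument.
metrics <- list(accuracy = accuracy)

# Top level is named by task type; each entry is a length-two list
# named "evidence" and "metrics".
evidence <- list(
  classification = list(
    evidence = truth,
    metrics = c("accuracy")
  )
)

# Sanity check: every metric name referenced by the evidence
# should exist in the metrics list.
unknown <- setdiff(evidence$classification$metrics, names(metrics))
stopifnot(length(unknown) == 0)

# Not run (requires the package and a dataset; my_data is hypothetical):
# trio <- Trio$new(data = my_data, evidence = evidence, metrics = metrics)
```

Keeping the metric names in evidence as plain strings means the same metric function can be shared across several task types.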
Method addEvidence()
Add supporting evidence to the Trio.
Arguments
name: A string specifying the name of the supporting evidence.
evidence: The supporting evidence. An object to be compared or a function to be run on the data.
metrics: A list of one or more metric names used to compare the supporting evidence (gold standard) with the input being evaluated.
args: A named list of parameters and values to be passed to the function.
Method addMetric()
Add a metric to the Trio.
Arguments
name: A string specifying the name of the metric.
metric: The metric. A function that compares the input being evaluated with the gold standard. It should be of the form f(x, y, ...), where x is the "truth" and y is the output to be evaluated. If the desired metric does not have this form, supply a wrapper function around it.
args: A named list of parameters and values to be passed to the function.
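The f(x, y, ...) convention and the wrapper suggestion can be sketched in base R as follows. The function names rmse, bias_pred_first, and bias are hypothetical and exist only for illustration; bias_pred_first stands in for any existing metric whose argument order is the reverse of what Trio expects.

```r
# A metric already in the documented form: x is the truth, y the output.
rmse <- function(x, y, ...) {
  sqrt(mean((x - y)^2))
}

# Hypothetical existing metric that takes the prediction first.
bias_pred_first <- function(predicted, actual) {
  mean(predicted - actual)
}

# Wrapper that adapts it to f(x, y, ...) by swapping the arguments.
bias <- function(x, y, ...) {
  bias_pred_first(predicted = y, actual = x)
}

truth <- c(1, 2, 3)
output <- c(1, 2, 5)

print(rmse(truth, output))
print(bias(truth, output))
```

An asymmetric metric such as the signed bias is used here because swapping the arguments changes its sign, which is the situation a wrapper guards against.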
Method split()
Create cross-validation indices.
Usage
Trio$split(
y,
n_fold = 5L,
n_repeat = 1L,
stratify = TRUE,
seed = NULL,
overwrite = FALSE,
...
)
Arguments
y: A variable to use for stratified sampling (e.g. supporting evidence). If stratify is FALSE, a vector the length of the data.
n_fold: Number of folds. Defaults to 5L.
n_repeat: Number of repeats. Defaults to 1L.
stratify: If TRUE, uses stratified sampling. Defaults to TRUE.
seed: An optional seed for split generation. Defaults to NULL. If NULL, the seed is set to the current time.
overwrite: If TRUE, overwrites the current split. Defaults to FALSE.
...: Additional arguments passed to splitTools::create_folds.
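Because extra arguments are forwarded to splitTools::create_folds, it can help to see what that function returns on its own. The sketch below calls it directly on a toy label vector. The correspondence it suggests (n_fold to k, n_repeat to m_rep, stratify to type = "stratified") is an assumption about how Trio$split() uses the function, not something this page confirms.

```r
# Toy labels: 20 cases and 30 controls (hypothetical data).
y <- rep(c("case", "control"), times = c(20, 30))

folds <- NULL
if (requireNamespace("splitTools", quietly = TRUE)) {
  # Stratified 5-fold split with a fixed seed for reproducibility.
  folds <- splitTools::create_folds(
    y,
    k = 5,
    type = "stratified",
    m_rep = 1,
    seed = 42
  )

  # By default each element holds the in-sample (training) indices.
  print(lengths(folds))

  # Class balance within the first fold's in-sample indices.
  print(table(y[folds[[1]]]))
}
```

Fixing the seed makes the folds reproducible across runs, which matches the purpose of the splitSeed field described above.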
Method writeCTD()
Write the Trio metadata to the Curated Trio Datasets sheet.
Usage
Trio$writeCTD(
name,
email = NULL,
githubPat = NULL,
description = NULL,
figshareUrl = NULL,
datasetFileName = NULL,
evidenceFileName = NULL,
dataType = NULL,
skipMd5Check = FALSE
)
Arguments
name: The name of the dataset to be added.
email: Required. Email address of the contributor for dataset update notifications.
githubPat: Optional GitHub Personal Access Token. If not provided and not set in the environment, the user is prompted.
description: Optional description of the dataset. If not provided and not set, the user is prompted.
figshareUrl: Optional URL to the Figshare dataset. If not provided, the user is prompted.
datasetFileName: Optional name of the dataset file in Figshare. If not provided, the user is prompted to select one.
evidenceFileName: Optional name of the evidence file in Figshare. If not provided, the user is prompted to select one.
dataType: Optional type of data. Must be one of: "omics", "clinical", "spatial", "other". If not provided, the user is prompted.
skipMd5Check: Optional boolean to skip MD5 verification. Defaults to FALSE.
Examples
trio <- Trio$new("figshare:26054188/47112109", cachePath = tempdir())
#> ✔ Reading from Curated Trio Datasets.
#> ✔ Range ''Datasets''.
#> has no supporting evidence for this dataset.
#> ℹ Please add your own supporting evidence for evaluation.