An object containing a dataset and methods for evaluating analytical tasks against ground truths for the dataset.
Public fields
data
The data
evidence
The supporting evidence for the data
metrics
The metrics for evaluating tasks against the gold standards
cachePath
The path to the data cache
dataSource
The data repository that the data were retrieved from
dataSourceID
The dataset ID for dataSource.
evidenceSource
The data repository that the supporting evidence was retrieved from.
evidenceSourceID
The dataset ID for evidenceSource.
splitIndices
Indices for cross-validation
splitSeed
The seed used to generate the split indices
verbose
Set the verbosity of Trio. Defaults to FALSE.
description
A description of the dataset.
name
The name of the Trio object, as defined in Curated Trio Datasets.
Methods
Method new()
Create a Trio object
Usage
Trio$new(
datasetID = NULL,
data = NULL,
dataLoader = NULL,
evidenceID = NULL,
evidence = NULL,
evidenceColumns = NULL,
evidenceLoader = NULL,
task = NULL,
metrics = NULL,
cachePath = FALSE,
verbose = FALSE
)
Arguments
datasetID
A string specifying a dataset, either a name from curated-trio-data or a format string of the form source:source_id.
data
An object to use as the Trio dataset.
dataLoader
A custom loading function that takes the path of a downloaded file and returns a single dataset, ready to be used in evaluation tasks.
evidenceID
If datasetID is not an ID from the curated trio datasets spreadsheet, then a format string of the form source:source_id indicating the file to obtain the supporting evidence from.
evidence
A named list of lists. The top-level list is named by task type. Each lower-level list has length two, with elements named "evidence" and "metrics". The "evidence" component holds the supporting evidence and the "metrics" component holds a character vector of metric names (corresponding to the names of the list provided to the metrics parameter).
evidenceColumns
If evidenceID is not NULL, then the columns of the table containing the supporting evidence.
evidenceLoader
Alternative to evidence and evidenceColumns. Extract the evidence in a flexible way.
task
If evidenceColumns or evidenceLoader is specified, a character vector of length 1 naming the task the evidence is for.
metrics
A named list of metric functions.
cachePath
The path to the data cache
verbose
Set the verbosity of Trio. Defaults to FALSE.
description
A description of the dataset.
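The nested shape described for the evidence argument can be easier to see written out. The following is a minimal sketch only: the task name "classification", the label values, and the metric name "accuracy" are hypothetical and do not come from the package.

```r
# Hypothetical metric list: names here are what the "metrics" vectors refer to.
metrics <- list(
  accuracy = function(x, y, ...) mean(x == y)
)

# Hypothetical evidence list: top level named by task type, each entry a
# length-two list with elements "evidence" and "metrics".
evidence <- list(
  classification = list(
    evidence = c("A", "B", "A"),   # the supporting evidence (made-up labels)
    metrics  = c("accuracy")       # names of entries in `metrics`
  )
)

# Sanity checks on the structure described above.
identical(names(evidence$classification), c("evidence", "metrics"))
all(evidence$classification$metrics %in% names(metrics))
```

Lists built this way would then presumably be supplied through the evidence and metrics arguments of Trio$new(), together with data or datasetID.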
Method addEvidence()
Add supporting evidence to the Trio.
Arguments
name
A string specifying the name of the supporting evidence.
evidence
The supporting evidence. An object to be compared or a function to be run on the data.
metrics
A list of one or more metric names used to compare the gold standard with the input to evaluate.
args
A named list of parameters and values to be passed to the function.
Method addMetric()
Add a metric to the Trio.
Arguments
name
A string specifying the name of the metric.
metric
The metric. A function to be run on the input to evaluate to compare it with the gold standard. It should be of the form f(x, y, ...), where x is the "truth" and y is the output to be evaluated. Otherwise, supply a wrapper function around the desired metric.
args
A named list of parameters and values to be passed to the function.
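As an illustration of the f(x, y, ...) form, here is a minimal sketch of one metric written directly in that form and one wrapper around a function with a different argument order. The names accuracy, rmsePredFirst, and rmse are hypothetical and are not part of the package.

```r
# Metric already in the expected form: x is the truth, y is the output to evaluate.
accuracy <- function(x, y, ...) {
  mean(x == y)
}

# A hypothetical existing function that takes the prediction first.
rmsePredFirst <- function(pred, truth) {
  sqrt(mean((pred - truth)^2))
}

# Wrapper that reorders the arguments so the truth comes first.
rmse <- function(x, y, ...) {
  rmsePredFirst(pred = y, truth = x)
}

truth <- c("a", "b", "a", "b")
pred  <- c("a", "b", "b", "b")
accuracy(truth, pred)
#> [1] 0.75

# Given an existing Trio object `trio`, either function could then be
# registered using the arguments documented above, for example:
# trio$addMetric(name = "accuracy", metric = accuracy)
```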
Method split()
Create cross-validation indices.
Usage
Trio$split(
y,
n_fold = 5L,
n_repeat = 1L,
stratify = TRUE,
seed = NULL,
overwrite = FALSE,
...
)
Arguments
y
A variable to use for stratified sampling (e.g. supporting evidence). If stratify is FALSE, a vector the length of the data.
n_fold
Number of folds. Defaults to 5L.
n_repeat
Number of repeats. Defaults to 1L.
stratify
If TRUE, uses stratified sampling. Defaults to TRUE.
seed
An optional seed for split generation. Defaults to NULL. If NULL, the seed is set to the current time.
overwrite
If TRUE, overwrites the current split. Defaults to FALSE.
...
Additional arguments passed to splitTools::create_folds.
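Because additional arguments are forwarded to splitTools::create_folds, it can help to see what that function returns on its own. The sketch below calls create_folds directly on a made-up label vector. How Trio$split() maps n_fold, n_repeat, and stratify onto create_folds arguments is not stated here, so the correspondence to k, m_rep, and type is an assumption.

```r
library(splitTools)

# Made-up labels: 30 of class "a" and 20 of class "b".
y <- rep(c("a", "b"), times = c(30, 20))

# Five stratified folds, one repeat, with a fixed seed for reproducibility.
folds <- create_folds(y, k = 5, m_rep = 1, type = "stratified", seed = 1)

# One element per fold; by default each element holds the in-sample
# (training) row indices for that fold.
length(folds)
#> [1] 5
```

Passing invert = TRUE to create_folds returns the out-of-sample indices instead, which may be more convenient when only the held-out rows are needed.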
Method writeCTD()
Write the Trio metadata to the Curated Trio Datasets sheet.
Usage
Trio$writeCTD(
name,
email = NULL,
githubPat = NULL,
description = NULL,
figshareUrl = NULL,
datasetFileName = NULL,
evidenceFileName = NULL,
dataType = NULL,
skipMd5Check = FALSE
)
Arguments
name
The name of the dataset to be added.
email
Required. Email address of the contributor for dataset update notifications.
githubPat
Optional GitHub Personal Access Token. If not provided and not set in environment, will prompt user.
description
Optional description of the dataset. If not provided and not set, will prompt user.
figshareUrl
Optional URL to the Figshare dataset. If not provided, will prompt user.
datasetFileName
Optional name of the dataset file in Figshare. If not provided, will prompt user for selection.
evidenceFileName
Optional name of the evidence file in Figshare. If not provided, will prompt user for selection.
dataType
Optional type of data. Must be one of: "omics", "clinical", "spatial", "other". If not provided, will prompt user.
skipMd5Check
Optional boolean to skip MD5 verification. Defaults to FALSE.
Examples
trio <- Trio$new("figshare:26054188/47112109", cachePath = tempdir())
#> Warning: has no supporting evidence for this dataset.
#> ℹ Please add your own supporting evidence for evaluation.