An object containing a dataset and methods for evaluating analytical tasks against ground truths for the dataset.
Public fields
data
The data
auxData
The auxiliary data (such as gold standards) attached to the dataset.
metrics
The metrics used to evaluate tasks against the gold standards.
cachePath
The path to the data cache
dataSource
The data repository that the data were retrieved from
dataSourceID
The dataset ID within the dataSource repository.
splitIndices
Indices for cross-validation
splitSeed
The seed used to generate the split indices
verbose
Set the verbosity of Trio. Defaults to FALSE.
Methods
Method new()
Create a Trio object
Usage
Trio$new(
datasetID = NULL,
data = NULL,
dataLoader = NULL,
cachePath = FALSE,
verbose = FALSE
)
Arguments
datasetID
A string specifying a dataset: either a name from curated-trio-data or a string of the form source:source_id (for example, figshare:26054188/47112109).
data
An object to use as the Trio dataset.
dataLoader
A custom loading function that takes the path of a downloaded file and returns a single dataset, ready to be used in evaluation tasks.
cachePath
The path to the data cache
verbose
Set the verbosity of Trio. Defaults to FALSE.
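The datasetID and dataLoader arguments can be illustrated with a small, self-contained sketch. The helper parse_dataset_id() is hypothetical: it only shows how a string of the form source:source_id splits at its first colon, and Trio's own parsing may differ. read_csv_dataset() is an example loader that follows the documented contract (take a file path, return one dataset); the CSV layout, with row names in the first column, is an assumption.

```r
# Hypothetical helper: split a "source:source_id" string at its first colon.
# Strings without a colon are treated here as curated-trio-data names.
# This mirrors the documented format but is not Trio's own parser.
parse_dataset_id <- function(id) {
  pos <- as.integer(regexpr(":", id, fixed = TRUE))
  if (pos == -1L) {
    return(list(source = NA_character_, source_id = id))
  }
  list(
    source = substr(id, 1L, pos - 1L),
    source_id = substring(id, pos + 1L)
  )
}

# Example dataLoader: takes the path of a downloaded file and returns a
# single dataset. Assumes (hypothetically) a CSV whose first column holds
# row names.
read_csv_dataset <- function(path) {
  utils::read.csv(path, row.names = 1)
}

parsed <- parse_dataset_id("figshare:26054188/47112109")
print(parsed$source)     # "figshare"
print(parsed$source_id)  # "26054188/47112109"

# Round-trip a small data frame through a temporary CSV file
tmp <- tempfile(fileext = ".csv")
utils::write.csv(data.frame(a = 1:3, b = c(2.5, 3.5, 4.5),
                            row.names = c("s1", "s2", "s3")), tmp)
loaded <- read_csv_dataset(tmp)
print(dim(loaded))       # 3 2
```

A loader like this would then be supplied as dataLoader = read_csv_dataset when calling Trio$new().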
Method addAuxData()
Add a gold standard to the Trio.
Arguments
name
A string specifying the name of the gold standard.
auxData
The auxiliary data. An object to be compared or a function to be run on the data.
metrics
A list of one or more metric names used to compare the gold standard with the input to evaluate.
args
A named list of parameters and values to be passed to the function.
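Because auxData may be either a fixed object or a function run on the data (with args given as a named list), the sketch below shows one way such a gold standard could be resolved and then scored by name. resolve_aux_data(), evaluate_against(), the metric registry, and the use of do.call() are all hypothetical illustrations, not Trio's internals; the character vector of metric names only mirrors the metrics argument.

```r
# Hypothetical: turn an auxData entry into a concrete gold standard.
# A function is called on the data with extra arguments from `args`;
# any other object is used as-is.
resolve_aux_data <- function(aux_data, data, args = list()) {
  if (is.function(aux_data)) {
    do.call(aux_data, c(list(data), args))
  } else {
    aux_data
  }
}

# Hypothetical: score the input against the gold standard with each
# named metric, looked up in a registry of metric functions f(x, y, ...).
evaluate_against <- function(gold, input, metric_names, registry) {
  vapply(metric_names, function(nm) registry[[nm]](gold, input), numeric(1))
}

dat <- data.frame(g = c(5, 1, 4, 2, 3))

# Gold standard computed from the data: indices of the top-k rows by g
top_k <- function(d, k) order(d$g, decreasing = TRUE)[seq_len(k)]
gold <- resolve_aux_data(top_k, dat, args = list(k = 2))
print(gold)  # 1 3

registry <- list(
  overlap = function(x, y, ...) length(intersect(x, y)) / length(x)
)
scores <- evaluate_against(gold, input = c(1, 5), "overlap", registry)
print(scores)  # named numeric: overlap = 0.5
```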
Method addMetric()
Add a metric to the Trio.
Arguments
name
A string specifying the name of the metric.
metric
The metric: a function run on the input to evaluate in order to compare it with the gold standard. It should have the form f(x, y, ...), where x is the "truth" and y is the output to be evaluated. For a metric with a different signature, pass a wrapper function instead.
args
A named list of parameters and values to be passed to the function.
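The required form f(x, y, ...) can be shown with two small metrics. rmse() already has that shape, while the hypothetical score_predictions(pred, truth, tol) takes its arguments in the opposite order and is adapted with a wrapper. Forwarding args through do.call() is an assumption about how extra parameters might reach the metric, not documented Trio behaviour.

```r
# Already of the form f(x, y, ...): x is the truth, y the evaluated output
rmse <- function(x, y, ...) {
  sqrt(mean((x - y)^2))
}

# Hypothetical metric with a different signature (prediction first)
score_predictions <- function(pred, truth, tol = 0) {
  mean(abs(pred - truth) <= tol)
}

# Wrapper that adapts it to f(x, y, ...), passing extra parameters through
score_wrapper <- function(x, y, ...) {
  score_predictions(pred = y, truth = x, ...)
}

truth <- c(1, 2, 3)
output <- c(1, 2, 5)

print(rmse(truth, output))  # sqrt(4/3), about 1.1547

# Extra parameters, as they might be supplied through `args` (assumed)
args <- list(tol = 0.5)
print(do.call(score_wrapper, c(list(truth, output), args)))  # 2/3
```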
Method split()
Create cross-validation indices.
Arguments
y
A variable to use for stratified sampling. If stratify is FALSE, any vector with the same length as the data.
n_fold
Number of folds. Defaults to 5L.
n_repeat
Number of repeats. Defaults to 1L.
stratify
If TRUE, uses stratified sampling. Defaults to TRUE.
seed
An optional seed for split generation. Defaults to NULL. If NULL, the seed is set to the current time.
overwrite
If TRUE, overwrites the current split. Defaults to FALSE.
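The behaviour these arguments describe (stratified folds, repeats, and a seed that falls back to the current time) can be sketched in plain R. make_folds() is a hypothetical, generic implementation of repeated stratified k-fold assignment; it is not the code behind split(), and its return shape (one fold-ID vector per repeat, plus the seed used) is an assumption.

```r
# Hypothetical sketch of repeated (optionally stratified) k-fold assignment.
# Returns a list with one integer vector of fold IDs (1..n_fold) per repeat,
# plus the seed that was used.
make_folds <- function(y, n_fold = 5L, n_repeat = 1L,
                       stratify = TRUE, seed = NULL) {
  if (is.null(seed)) {
    # Fall back to the current time, as the documentation describes
    seed <- as.integer(Sys.time())
  }
  set.seed(seed)
  n <- length(y)
  # Without stratification, every observation belongs to a single group
  groups <- if (stratify) as.factor(y) else factor(rep(1L, n))
  folds <- lapply(seq_len(n_repeat), function(r) {
    fold_id <- integer(n)
    for (idx in split(seq_len(n), groups)) {
      # Shuffle within each group, then deal folds out round-robin so
      # each fold receives a near-equal share of every class
      shuffled <- idx[sample.int(length(idx))]
      fold_id[shuffled] <- rep_len(seq_len(n_fold), length(idx))
    }
    fold_id
  })
  list(folds = folds, seed = seed)
}

y <- rep(c("case", "control"), times = c(10, 20))
res <- make_folds(y, n_fold = 5L, n_repeat = 2L, seed = 42L)

# Each fold holds 2 cases and 4 controls in every repeat
print(table(res$folds[[1]], y))
```

Dealing folds round-robin within each class keeps the class proportions of y roughly constant across folds, which is the purpose of stratified sampling; with stratify = FALSE the same procedure reduces to an ordinary shuffled k-fold split.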
Examples
trio <- Trio$new("figshare:26054188/47112109", cachePath = tempdir())