public class DatasetUtil
extends java.lang.Object
Constructor and Description |
---|
DatasetUtil() |
Modifier and Type | Method and Description |
---|---|
static java.util.ArrayList<Vector> |
appendBias(java.util.Collection<? extends Vector> dataset)
Appends a bias (constant 1.0) to the end of each Vector in the dataset;
the original dataset is unmodified.
|
static java.util.ArrayList<Vector> |
appendBias(java.util.Collection<? extends Vector> dataset,
double biasValue)
Appends "biasValue" to the end of each Vector in the dataset;
the original dataset is unmodified.
|
static <EntryType> MultiCollection<EntryType> |
asMultiCollection(java.util.Collection<EntryType> collection)
Takes a collection and returns a multi-collection version of that
collection.
|
static void |
assertDimensionalitiesAllEqual(java.lang.Iterable<? extends Vectorizable> data)
Asserts that all of the dimensionalities of the vectors in the
given set of data are the same.
|
static void |
assertInputDimensionalitiesAllEqual(java.lang.Iterable<? extends InputOutputPair<? extends Vectorizable,?>> data)
Asserts that all of the dimensionalities of the input vectors in the
given set of input-output pairs are the same.
|
static void |
assertInputDimensionalitiesAllEqual(java.lang.Iterable<? extends InputOutputPair<? extends Vectorizable,?>> data,
int dimensionality)
Asserts that all of the dimensionalities of the input vectors in the
given set of input-output pairs equal the given dimensionality.
|
static java.util.Collection<Vector> |
asVectorCollection(java.util.Collection<? extends Vectorizable> collection)
Takes a collection of
Vectorizable objects and returns a
collection of Vector objects of the same size. |
static Matrix |
computeOuterProductDataMatrix(java.util.ArrayList<? extends Vector> data)
Computes the outer-product Matrix of the given set of data:
XXt = [ x1 x2 ...
|
static double |
computeOutputMean(java.util.Collection<? extends InputOutputPair<?,? extends java.lang.Number>> data)
Computes the mean of the output data.
|
static double |
computeOutputVariance(java.util.Collection<? extends InputOutputPair<?,? extends java.lang.Number>> data)
Computes the variance of the output of a given set of input-output pairs.
|
static double |
computeWeightedOutputMean(java.util.Collection<? extends InputOutputPair<?,? extends java.lang.Number>> data)
Computes the weighted mean of the output data.
|
static <OutputType> DataDistribution<OutputType> |
countOutputValues(java.lang.Iterable<? extends InputOutputPair<?,? extends OutputType>> data)
Creates a data histogram over the output values from the given data.
|
static java.util.ArrayList<java.util.ArrayList<java.lang.Double>> |
decoupleVectorDataset(java.util.Collection<? extends Vector> dataset)
Takes a dataset of M-dimensional Vectors and turns it into M
datasets of Doubles.
|
static java.util.ArrayList<java.util.ArrayList<InputOutputPair<java.lang.Double,java.lang.Double>>> |
decoupleVectorPairDataset(java.util.Collection<? extends InputOutputPair<? extends Vector,? extends Vector>> dataset)
Takes a set of equal-dimension Vector-Vector InputOutputPairs and
turns them into a collection of Double-Double InputOutputPairs.
|
static <OutputType> java.util.Set<OutputType> |
findUniqueOutputs(java.lang.Iterable<? extends InputOutputPair<?,? extends OutputType>> data)
Creates a set containing the unique output values from the given data.
|
static int |
getDimensionality(java.lang.Iterable<? extends Vectorizable> data)
Gets the dimensionality of the vectors in the given set of data.
|
static int |
getInputDimensionality(java.lang.Iterable<? extends InputOutputPair<? extends Vectorizable,?>> data)
Gets the dimensionality of the input vectors in the given set of input-output
pairs.
|
static double |
getWeight(InputOutputPair<?,?> pair)
Gets the weight of a given input-output pair.
|
static double |
getWeight(TargetEstimatePair<?,?> pair)
Gets the weight of a given target-estimate pair.
|
static <InputType> java.util.List<InputType> |
inputsList(java.lang.Iterable<? extends InputOutputPair<? extends InputType,?>> data)
Creates a list containing all of the input values from the given data.
|
static <OutputType> java.util.List<OutputType> |
outputsList(java.lang.Iterable<? extends InputOutputPair<?,? extends OutputType>> data)
Creates a list containing all of the output values from the given data.
|
static <DataType> DefaultPair<java.util.LinkedList<DataType>,java.util.LinkedList<DataType>> |
splitDatasets(java.util.Collection<? extends InputOutputPair<? extends DataType,java.lang.Boolean>> data)
Splits a dataset of input-output pairs into two datasets, one for the
inputs that have a "true" output and another for the inputs that have
a "false" output.
|
static <InputType,CategoryType> java.util.Map<CategoryType,java.util.List<InputType>> |
splitOnOutput(java.lang.Iterable<? extends InputOutputPair<? extends InputType,? extends CategoryType>> data)
Splits a dataset according to its output value (usually a category) so
that all the inputs for that category are given in a list.
|
static double |
sumWeights(java.util.Collection<? extends InputOutputPair<?,?>> data)
Gets the sum of the weights of the elements of the
dataset.
|
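
The vector-reshaping helpers summarized above (appendBias and decoupleVectorDataset) can be illustrated without the library. The sketch below is not the DatasetUtil implementation and does not call it. It mimics the documented behavior using double[] as a stand-in for Vector; the class name DatasetSketch, the method name decouple, and both method bodies are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

// Hypothetical stand-alone sketch; double[] stands in for the library's Vector type.
public class DatasetSketch {

    // Returns new arrays with biasValue appended; the input arrays are not modified.
    public static ArrayList<double[]> appendBias(Collection<double[]> dataset, double biasValue) {
        ArrayList<double[]> result = new ArrayList<double[]>(dataset.size());
        for (double[] x : dataset) {
            double[] extended = Arrays.copyOf(x, x.length + 1);
            extended[x.length] = biasValue;
            result.add(extended);
        }
        return result;
    }

    // Turns N vectors of dimension M into M lists holding N doubles each.
    public static ArrayList<ArrayList<Double>> decouple(Collection<double[]> dataset) {
        ArrayList<ArrayList<Double>> columns = new ArrayList<ArrayList<Double>>();
        int m = -1;
        for (double[] x : dataset) {
            if (m < 0) {
                // The first vector fixes the dimensionality M.
                m = x.length;
                for (int i = 0; i < m; i++) {
                    columns.add(new ArrayList<Double>());
                }
            } else if (x.length != m) {
                throw new IllegalArgumentException("All vectors must have the same dimensionality");
            }
            for (int i = 0; i < m; i++) {
                columns.get(i).add(x[i]);
            }
        }
        return columns;
    }

    public static void main(String[] args) {
        List<double[]> data = Arrays.asList(new double[] {1.0, 2.0}, new double[] {3.0, 4.0});
        System.out.println(Arrays.toString(appendBias(data, 1.0).get(0))); // [1.0, 2.0, 1.0]
        System.out.println(decouple(data)); // [[1.0, 3.0], [2.0, 4.0]]
    }
}
```

Copying each array before appending mirrors the documented guarantee that the original dataset is unmodified.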
public static java.util.ArrayList<Vector> appendBias(java.util.Collection<? extends Vector> dataset)
dataset - Dataset to append a bias term to. The Vectors can be of different dimensionality.

public static java.util.ArrayList<Vector> appendBias(java.util.Collection<? extends Vector> dataset, double biasValue)
dataset - Dataset to append a bias term to. The Vectors can be of different dimensionality.
biasValue - Bias value to append to the samples.

public static java.util.ArrayList<java.util.ArrayList<InputOutputPair<java.lang.Double,java.lang.Double>>> decoupleVectorPairDataset(java.util.Collection<? extends InputOutputPair<? extends Vector,? extends Vector>> dataset)
dataset - Collection of Vector-Vector InputOutputPairs. All Vectors (both inputs and outputs) must have equal dimension.

public static java.util.ArrayList<java.util.ArrayList<java.lang.Double>> decoupleVectorDataset(java.util.Collection<? extends Vector> dataset)
dataset - M-dimensional Vectors. Throws IllegalArgumentException if the Vectors do not all have the same dimensionality.

public static <DataType> DefaultPair<java.util.LinkedList<DataType>,java.util.LinkedList<DataType>> splitDatasets(java.util.Collection<? extends InputOutputPair<? extends DataType,java.lang.Boolean>> data)
DataType - The type of the data.
data - Collection of InputOutputPairs to split according to the output flag.

public static <InputType,CategoryType> java.util.Map<CategoryType,java.util.List<InputType>> splitOnOutput(java.lang.Iterable<? extends InputOutputPair<? extends InputType,? extends CategoryType>> data)
InputType - The type of the input values.
CategoryType - The type of the output values.
data - The input-output pairs to split.

public static Matrix computeOuterProductDataMatrix(java.util.ArrayList<? extends Vector> data)
data - Input dataset where each of "N" Vectors has dimension of "M".

public static double computeOutputMean(java.util.Collection<? extends InputOutputPair<?,? extends java.lang.Number>> data)
data - The data to compute the mean of the output.

public static double computeWeightedOutputMean(java.util.Collection<? extends InputOutputPair<?,? extends java.lang.Number>> data)
data - The data to compute the mean of the output.

public static double computeOutputVariance(java.util.Collection<? extends InputOutputPair<?,? extends java.lang.Number>> data)
data - The data.

public static <OutputType> java.util.Set<OutputType> findUniqueOutputs(java.lang.Iterable<? extends InputOutputPair<?,? extends OutputType>> data)
OutputType - The type of the output values.
data - The data to collect the unique output values from.

public static <OutputType> DataDistribution<OutputType> countOutputValues(java.lang.Iterable<? extends InputOutputPair<?,? extends OutputType>> data)
OutputType - The type of the output values.
data - The data to collect the output values from.

public static <InputType> java.util.List<InputType> inputsList(java.lang.Iterable<? extends InputOutputPair<? extends InputType,?>> data)
InputType - The type of the input values.
data - The data to collect the input values from.

public static <OutputType> java.util.List<OutputType> outputsList(java.lang.Iterable<? extends InputOutputPair<?,? extends OutputType>> data)
OutputType - The type of the output values.
data - The data to collect the output values from.

public static <EntryType> MultiCollection<EntryType> asMultiCollection(java.util.Collection<EntryType> collection)
EntryType - The entry type of the collection.
collection - A collection.

public static java.util.Collection<Vector> asVectorCollection(java.util.Collection<? extends Vectorizable> collection)
Takes a collection of Vectorizable objects and returns a collection of Vector objects of the same size.
collection - The collection of Vectorizable objects to convert.
Returns: The collection of Vector objects.

public static int getInputDimensionality(java.lang.Iterable<? extends InputOutputPair<? extends Vectorizable,?>> data)
data - The data to find the input dimensionality of.

public static void assertInputDimensionalitiesAllEqual(java.lang.Iterable<? extends InputOutputPair<? extends Vectorizable,?>> data)
data - A collection of input-output pairs.
DimensionalityMismatchException - If the dimensionalities are not all equal.

public static void assertInputDimensionalitiesAllEqual(java.lang.Iterable<? extends InputOutputPair<? extends Vectorizable,?>> data, int dimensionality)
data - A collection of input-output pairs.
dimensionality - The dimensionality that all the inputs must have.
DimensionalityMismatchException - If the dimensionalities are not all equal.

public static int getDimensionality(java.lang.Iterable<? extends Vectorizable> data)
data - The data to find the dimensionality of.

public static void assertDimensionalitiesAllEqual(java.lang.Iterable<? extends Vectorizable> data)
data - A collection of data.
DimensionalityMismatchException - If the dimensionalities are not all equal.

public static double getWeight(InputOutputPair<?,?> pair)
Gets the weight of a given input-output pair. If the pair implements the WeightedInputOutputPair interface, then it casts it to retrieve its weight. Otherwise, it returns 1.0.
pair - The pair to get the weight of.

public static double getWeight(TargetEstimatePair<?,?> pair)
Gets the weight of a given target-estimate pair. If the pair implements the WeightedTargetEstimatePair interface, then it casts it to retrieve its weight. Otherwise, it returns 1.0.
pair - The pair to get the weight of.

public static double sumWeights(java.util.Collection<? extends InputOutputPair<?,?>> data)
data - The dataset to compute the sum of the weights of.
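
The weight-handling rule described for getWeight and sumWeights (use the pair's own weight when it is a weighted pair, otherwise fall back to 1.0) can be sketched in plain Java. This is a minimal illustration that does not use the library: the interfaces Pair and WeightedPair are hypothetical stand-ins for InputOutputPair and WeightedInputOutputPair, and the class name WeightSketch and the factory methods pair and weighted are assumptions.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical stand-alone sketch of the default-weight rule; not the library code.
public class WeightSketch {

    // Stand-in for InputOutputPair (assumed shape).
    public interface Pair<I, O> {
        I getInput();
        O getOutput();
    }

    // Stand-in for WeightedInputOutputPair (assumed shape).
    public interface WeightedPair<I, O> extends Pair<I, O> {
        double getWeight();
    }

    // Creates an unweighted pair.
    public static <I, O> Pair<I, O> pair(final I in, final O out) {
        return new Pair<I, O>() {
            public I getInput() { return in; }
            public O getOutput() { return out; }
        };
    }

    // Creates a pair that carries an explicit weight.
    public static <I, O> WeightedPair<I, O> weighted(final I in, final O out, final double w) {
        return new WeightedPair<I, O>() {
            public I getInput() { return in; }
            public O getOutput() { return out; }
            public double getWeight() { return w; }
        };
    }

    // Returns the pair's weight if it is weighted; otherwise returns 1.0.
    public static double getWeight(Pair<?, ?> pair) {
        if (pair instanceof WeightedPair) {
            return ((WeightedPair<?, ?>) pair).getWeight();
        }
        return 1.0;
    }

    // Sums the weights of all pairs, counting unweighted pairs as 1.0 each.
    public static double sumWeights(Collection<? extends Pair<?, ?>> data) {
        double sum = 0.0;
        for (Pair<?, ?> p : data) {
            sum += getWeight(p);
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Pair<String, Double>> data = new ArrayList<Pair<String, Double>>();
        data.add(pair("a", 2.0));          // unweighted, counts as 1.0
        data.add(weighted("b", 4.0, 3.0)); // explicit weight 3.0
        System.out.println(sumWeights(data)); // 4.0
    }
}
```

Defaulting to 1.0 lets weighted and unweighted pairs be mixed in one dataset, so an entirely unweighted dataset sums to its element count.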