weka.classifiers
Class BVDecomposeSegCVSub

java.lang.Object
  extended by weka.classifiers.BVDecomposeSegCVSub
All Implemented Interfaces:
OptionHandler, TechnicalInformationHandler

public class BVDecomposeSegCVSub
extends java.lang.Object
implements OptionHandler, TechnicalInformationHandler

This class performs a bias-variance decomposition on any classifier, using the sub-sampled cross-validation procedure specified in (1).
The Kohavi and Wolpert definition of bias and variance is specified in (2).
The Webb definition of bias and variance is specified in (3).

Geoffrey I. Webb, Paul Conilione (2002). Estimating bias and variance from data. School of Computer Science and Software Engineering, Victoria, Australia.

Ron Kohavi, David H. Wolpert: Bias Plus Variance Decomposition for Zero-One Loss Functions. In: Machine Learning: Proceedings of the Thirteenth International Conference, 275-283, 1996.

Geoffrey I. Webb (2000). MultiBoosting: A Technique for Combining Boosting and Wagging. Machine Learning. 40(2):159-196.

BibTeX:

 @misc{Webb2002,
    address = {School of Computer Science and Software Engineering, Victoria, Australia},
    author = {Geoffrey I. Webb and Paul Conilione},
    institution = {Monash University},
    title = {Estimating bias and variance from data},
    year = {2002},
    PDF = {http://www.csse.monash.edu.au/\~webb/Files/WebbConilione04.pdf}
 }
 
 @inproceedings{Kohavi1996,
    author = {Ron Kohavi and David H. Wolpert},
    booktitle = {Machine Learning: Proceedings of the Thirteenth International Conference},
    editor = {Lorenza Saitta},
    pages = {275-283},
    publisher = {Morgan Kaufmann},
    title = {Bias Plus Variance Decomposition for Zero-One Loss Functions},
    year = {1996},
    PS = {http://robotics.stanford.edu/\~ronnyk/biasVar.ps}
 }
 
 @article{Webb2000,
    author = {Geoffrey I. Webb},
    journal = {Machine Learning},
    number = {2},
    pages = {159-196},
    title = {MultiBoosting: A Technique for Combining Boosting and Wagging},
    volume = {40},
    year = {2000}
 }
 

Valid options are:

 -c <class index>
  The index of the class attribute.
  (default last)
 -D
  Turn on debugging output.
 -l <num>
  The number of times each instance is classified.
  (default 10)
 -p <proportion of objects in common>
  The average proportion of instances common between any two training sets
 -s <seed>
  The random number seed used.
 -t <name of arff file>
  The name of the arff file used for the decomposition.
 -T <number of instances in training set>
  The number of instances in the training set.
 -W <classifier class name>
  Full class name of the learner used in the decomposition.
  eg: weka.classifiers.bayes.NaiveBayes
 
 Options specific to learner weka.classifiers.rules.ZeroR:
 
 -D
  If set, classifier is run in debug mode and
  may output additional info to the console
Options after -- are passed to the designated sub-learner.
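The `--` convention can be illustrated with a minimal sketch. `OptionSplitSketch` is a hypothetical helper written for this page, not part of Weka, and `data.arff` is a placeholder file name. The sketch shows why the separator matters. In `-t data.arff -W weka.classifiers.rules.ZeroR -- -D`, the trailing `-D` is intended for ZeroR, not for this class's own `-D` debugging flag.

```java
import java.util.Arrays;

/** Hypothetical helper: splits a command line at the first "--". */
public final class OptionSplitSketch {

    private OptionSplitSketch() {}

    /** Options before the first "--" (all options if there is no separator). */
    public static String[] mainOptions(String[] args) {
        int sep = indexOfSeparator(args);
        return Arrays.copyOfRange(args, 0, sep < 0 ? args.length : sep);
    }

    /** Options after the first "--"; these would be handed to the sub-learner. */
    public static String[] subLearnerOptions(String[] args) {
        int sep = indexOfSeparator(args);
        return sep < 0 ? new String[0] : Arrays.copyOfRange(args, sep + 1, args.length);
    }

    /** Position of the first "--", or -1 if it is absent. */
    private static int indexOfSeparator(String[] args) {
        for (int i = 0; i < args.length; i++) {
            if ("--".equals(args[i])) {
                return i;
            }
        }
        return -1;
    }
}
```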

Version:
$Revision: 1.6 $
Author:
Paul Conilione (paulc4321@yahoo.com.au)

Constructor Summary
BVDecomposeSegCVSub()
           
 
Method Summary
 void decompose()
          Carry out the bias-variance decomposition using the sub-sampled cross-validation method.
 java.util.Vector findCentralTendencies(double[] predProbs)
          Finds the central tendency, given the classifications for an instance.
 Classifier getClassifier()
          Gets the classifier being analysed
 int getClassifyIterations()
          Gets the number of times an instance is classified
 int getClassIndex()
          Get the index (starting from 1) of the attribute used as the class.
 java.lang.String getDataFileName()
          Get the name of the data file used for the decomposition
 boolean getDebug()
          Gets whether debugging is turned on
 double getError()
          Get the calculated error rate
 double getKWBias()
          Get the calculated bias squared according to the Kohavi and Wolpert definition
 double getKWSigma()
          Get the calculated sigma according to the Kohavi and Wolpert definition
 double getKWVariance()
          Get the calculated variance according to the Kohavi and Wolpert definition
 java.lang.String[] getOptions()
          Gets the current settings of this BVDecomposeSegCVSub object.
 double getP()
          Get the proportion of instances that are common between two training sets.
 int getSeed()
          Gets the random number seed
 TechnicalInformation getTechnicalInformation()
          Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.
 int getTrainSize()
          Get the training size
 double getWBias()
          Get the calculated bias according to the Webb definition
 double getWVariance()
          Get the calculated variance according to the Webb definition
 java.lang.String globalInfo()
          Returns a string describing this object
 java.util.Enumeration listOptions()
          Returns an enumeration describing the available options.
static void main(java.lang.String[] args)
          Test method for this class
 void randomize(int[] index, java.util.Random random)
          Accepts an array of ints and randomises the order of its values, using the given random number generator.
 void setClassifier(Classifier newClassifier)
          Sets the classifier being analysed
 void setClassifyIterations(int classifyIterations)
          Sets the number of times an instance is classified
 void setClassIndex(int classIndex)
          Sets the index (starting from 1) of the attribute used as the class.
 void setDataFileName(java.lang.String dataFileName)
          Sets the name of the dataset file.
 void setDebug(boolean debug)
          Sets debugging mode
 void setOptions(java.lang.String[] options)
          Sets the OptionHandler's options using the given list.
 void setP(double proportion)
          Set the proportion of instances that are common between two training sets used to train a classifier.
 void setSeed(int seed)
          Sets the random number seed
 void setTrainSize(int size)
          Set the training size.
 java.lang.String toString()
          Returns description of the bias-variance decomposition results.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

BVDecomposeSegCVSub

public BVDecomposeSegCVSub()
Method Detail

globalInfo

public java.lang.String globalInfo()
Returns a string describing this object

Returns:
a description of this class suitable for display in the Explorer/Experimenter GUI

getTechnicalInformation

public TechnicalInformation getTechnicalInformation()
Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.

Specified by:
getTechnicalInformation in interface TechnicalInformationHandler
Returns:
the technical information about this class

listOptions

public java.util.Enumeration listOptions()
Returns an enumeration describing the available options.

Specified by:
listOptions in interface OptionHandler
Returns:
an enumeration of all the available options.

setOptions

public void setOptions(java.lang.String[] options)
                throws java.lang.Exception
Sets the OptionHandler's options using the given list. All options will be set (or reset) during this call (i.e. incremental setting of options is not possible).

Valid options are:

 -c <class index>
  The index of the class attribute.
  (default last)
 -D
  Turn on debugging output.
 -l <num>
  The number of times each instance is classified.
  (default 10)
 -p <proportion of objects in common>
  The average proportion of instances common between any two training sets
 -s <seed>
  The random number seed used.
 -t <name of arff file>
  The name of the arff file used for the decomposition.
 -T <number of instances in training set>
  The number of instances in the training set.
 -W <classifier class name>
  Full class name of the learner used in the decomposition.
  eg: weka.classifiers.bayes.NaiveBayes
 
 Options specific to learner weka.classifiers.rules.ZeroR:
 
 -D
  If set, classifier is run in debug mode and
  may output additional info to the console

Specified by:
setOptions in interface OptionHandler
Parameters:
options - the list of options as an array of strings
Throws:
java.lang.Exception - if an option is not supported

getOptions

public java.lang.String[] getOptions()
Gets the current settings of this BVDecomposeSegCVSub object.

Specified by:
getOptions in interface OptionHandler
Returns:
an array of strings suitable for passing to setOptions

setClassifier

public void setClassifier(Classifier newClassifier)
Sets the classifier being analysed

Parameters:
newClassifier - the Classifier to use.

getClassifier

public Classifier getClassifier()
Gets the classifier being analysed

Returns:
the classifier being analysed.

setDebug

public void setDebug(boolean debug)
Sets debugging mode

Parameters:
debug - true if debug output should be printed

getDebug

public boolean getDebug()
Gets whether debugging is turned on

Returns:
true if debugging output is on

setSeed

public void setSeed(int seed)
Sets the random number seed

Parameters:
seed - the random number seed

getSeed

public int getSeed()
Gets the random number seed

Returns:
the random number seed

setClassifyIterations

public void setClassifyIterations(int classifyIterations)
Sets the number of times an instance is classified

Parameters:
classifyIterations - number of times an instance is classified

getClassifyIterations

public int getClassifyIterations()
Gets the number of times an instance is classified

Returns:
the maximum number of times an instance is classified

setDataFileName

public void setDataFileName(java.lang.String dataFileName)
Sets the name of the dataset file.

Parameters:
dataFileName - name of dataset file.

getDataFileName

public java.lang.String getDataFileName()
Get the name of the data file used for the decomposition

Returns:
the name of the data file

getClassIndex

public int getClassIndex()
Get the index (starting from 1) of the attribute used as the class.

Returns:
the index of the class attribute

setClassIndex

public void setClassIndex(int classIndex)
Sets the index (starting from 1) of the attribute used as the class.

Parameters:
classIndex - the index (starting from 1) of the class attribute
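Because this class counts attributes from 1, while Weka's `Instances.setClassIndex` counts from 0, a conversion is needed at some point. The sketch below is hypothetical (`ClassIndexSketch` is not a Weka class). It also assumes that a value of 0 or less means "not set" and maps it to the last attribute, mirroring the documented `-c` default.

```java
/** Hypothetical helper: converts the 1-based class index used by -c to a 0-based one. */
public final class ClassIndexSketch {

    private ClassIndexSketch() {}

    /**
     * @param oneBased      1-based class index; a value <= 0 is assumed to mean "not set"
     * @param numAttributes number of attributes in the data set
     * @return the 0-based index, or the last attribute when no index was set
     */
    public static int toZeroBased(int oneBased, int numAttributes) {
        if (numAttributes <= 0) {
            throw new IllegalArgumentException("data set has no attributes");
        }
        if (oneBased <= 0) {
            return numAttributes - 1; // assumed default: the last attribute
        }
        if (oneBased > numAttributes) {
            throw new IllegalArgumentException("class index out of range: " + oneBased);
        }
        return oneBased - 1;
    }
}
```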

getKWBias

public double getKWBias()
Get the calculated bias squared according to the Kohavi and Wolpert definition

Returns:
the bias squared

getWBias

public double getWBias()
Get the calculated bias according to the Webb definition

Returns:
the bias

getKWVariance

public double getKWVariance()
Get the calculated variance according to the Kohavi and Wolpert definition

Returns:
the variance

getWVariance

public double getWVariance()
Get the calculated variance according to the Webb definition

Returns:
the variance according to Webb
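Under the Webb definition (3), a wrong prediction counts towards bias when it equals the central tendency, and towards variance when it deviates from it, so bias + variance equals the error rate. The sketch below computes these figures for one instance from how often each class was predicted (`counts`) and the index of the true class. `WebbSketch` is hypothetical. This page does not say how the class breaks ties between central tendencies, so the sketch averages over the tied classes (an assumption).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the Webb bias/variance split for a single instance. */
public final class WebbSketch {

    private WebbSketch() {}

    /** Returns {error, bias, variance}; counts[y] = times class y was predicted. */
    public static double[] decompose(int[] counts, int trueClass) {
        int total = 0;
        int max = 0;
        for (int c : counts) {
            total += c;
            max = Math.max(max, c);
        }
        if (total == 0) {
            throw new IllegalArgumentException("instance was never classified");
        }
        // Central tendencies: every class predicted the maximal number of times.
        List<Integer> central = new ArrayList<>();
        for (int y = 0; y < counts.length; y++) {
            if (counts[y] == max) {
                central.add(y);
            }
        }
        double error = (double) (total - counts[trueClass]) / total;
        // Bias: wrong predictions that coincide with the central tendency.
        // Assumption: with ties, average this quantity over the tied classes.
        double bias = 0.0;
        for (int ct : central) {
            if (ct != trueClass) {
                bias += (double) counts[ct] / total;
            }
        }
        bias /= central.size();
        // Variance: the remaining error, from predictions that deviate from it.
        double variance = error - bias;
        return new double[] {error, bias, variance};
    }
}
```

For counts {2, 5, 3} and true class index 0, the sketch returns an error of 0.8, a bias of 0.5 and a variance of 0.3 (up to floating-point rounding). Averaging per-instance values over the data set gives overall figures; this page does not say exactly how the class weights instances.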

getKWSigma

public double getKWSigma()
Get the calculated sigma according to the Kohavi and Wolpert definition

Returns:
the sigma
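Kohavi and Wolpert (2) define, for an instance x with true-class distribution P(Y = y | x) and prediction distribution P(Ŷ = y | x): bias² = 1/2 Σ_y (P(Y = y | x) - P(Ŷ = y | x))², variance = 1/2 (1 - Σ_y P(Ŷ = y | x)²) and sigma = 1/2 (1 - Σ_y P(Y = y | x)²). Their sum is the expected zero-one loss. The sketch below evaluates these terms for one instance. `KohaviWolpertSketch` is a hypothetical helper, and estimating P(Ŷ = y | x) from prediction counts is an assumption about how such figures would be obtained from data.

```java
/** Hypothetical sketch of the Kohavi-Wolpert zero-one loss decomposition for one instance. */
public final class KohaviWolpertSketch {

    private KohaviWolpertSketch() {}

    /** Estimates P(prediction = y | x) from how often each class was predicted. */
    public static double[] toDistribution(int[] counts) {
        int total = 0;
        for (int c : counts) {
            total += c;
        }
        if (total == 0) {
            throw new IllegalArgumentException("no classifications");
        }
        double[] dist = new double[counts.length];
        for (int y = 0; y < counts.length; y++) {
            dist[y] = (double) counts[y] / total;
        }
        return dist;
    }

    /** Returns {biasSquared, variance, sigma} for target and predicted distributions. */
    public static double[] decompose(double[] target, double[] predicted) {
        if (target.length != predicted.length) {
            throw new IllegalArgumentException("distributions differ in length");
        }
        double diffSq = 0.0;
        double predSq = 0.0;
        double targetSq = 0.0;
        for (int y = 0; y < target.length; y++) {
            double d = target[y] - predicted[y];
            diffSq += d * d;
            predSq += predicted[y] * predicted[y];
            targetSq += target[y] * target[y];
        }
        double biasSquared = 0.5 * diffSq;
        double variance = 0.5 * (1.0 - predSq);
        double sigma = 0.5 * (1.0 - targetSq); // 0 when the target is a single observed label
        return new double[] {biasSquared, variance, sigma};
    }
}
```

With a single observed label the target is a point mass, so sigma is 0 and bias² + variance equals the error rate. For example, counts {2, 5, 3} with true class index 1 give bias² 0.19, variance 0.31 and an error of 0.5.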

setTrainSize

public void setTrainSize(int size)
Set the training size.

Parameters:
size - the size of the training set

getTrainSize

public int getTrainSize()
Get the training size

Returns:
the size of the training set

setP

public void setP(double proportion)
Set the proportion of instances that are common between two training sets used to train a classifier.

Parameters:
proportion - the proportion of instances that are common between training sets.
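The -p and -T settings interact. In plain k-fold cross-validation, each training set consists of k - 1 of the k folds and any two training sets share k - 2 folds, so p = (k - 2)/(k - 1). The sketch below works through that relationship. It is an illustration only: the assumption that training sets are built from equal-sized folds is ours, and the exact sub-sampling procedure is the one given in (1). `OverlapSketch` is hypothetical.

```java
/**
 * Hypothetical sketch: overlap p between two training sets when each set
 * is k - 1 of k equal-sized folds (illustrative assumption).
 */
public final class OverlapSketch {

    private OverlapSketch() {}

    /** Two training sets share k - 2 of their k - 1 folds. */
    public static double overlapForFolds(int k) {
        if (k < 2) {
            throw new IllegalArgumentException("need at least 2 folds");
        }
        return (k - 2.0) / (k - 1.0);
    }

    /** Inverts p = (k - 2) / (k - 1); the result need not be a whole number. */
    public static double foldsForOverlap(double p) {
        if (p < 0.0 || p >= 1.0) {
            throw new IllegalArgumentException("p must lie in [0, 1)");
        }
        return (2.0 - p) / (1.0 - p);
    }

    /** Instances that k folds must cover so that k - 1 folds hold trainSize instances. */
    public static double poolSize(int trainSize, double k) {
        return trainSize * k / (k - 1.0);
    }
}
```

For p = 0.5 this gives k = 3, and a training size of 100 would then need a pool of 150 instances.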

getP

public double getP()
Get the proportion of instances that are common between two training sets.

Returns:
the proportion

getError

public double getError()
Get the calculated error rate

Returns:
the error rate

decompose

public void decompose()
               throws java.lang.Exception
Carry out the bias-variance decomposition using the sub-sampled cross-validation method.

Throws:
java.lang.Exception - if the decomposition couldn't be carried out

findCentralTendencies

public java.util.Vector findCentralTendencies(double[] predProbs)
Finds the central tendency, given the classifications for an instance, where the central tendency is defined as the class most commonly selected for that instance.

For example, suppose instance 'x' can be assigned one of 3 classes, y = {1, 2, 3}, and is classified 10 times: 2 times as '1', 5 times as '2' and 3 times as '3'. The central tendency is then '2'.

Note, however, that this method returns a list of all classes that received the highest number of classifications. When several classes tie for the largest count, all of them are returned. For example, if 'x' is classified as '1' 4 times, as '2' 4 times and as '3' 2 times, then both '1' and '2' are returned.

Parameters:
predProbs - the array of classifications for a single instance.
Returns:
a Vector containing Integer objects which store the class(es) that are the central tendency.
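The tie-aware search described above can be done in a single pass. The sketch is a hypothetical re-implementation (`CentralTendencySketch` is not Weka code) that returns a `List<Integer>` of 0-based class indices instead of a `java.util.Vector`. It assumes `counts[y]` holds how many times class y was predicted. Exact `==` comparison is safe for such whole-number counts but could miss near-ties if fractional probabilities were passed instead.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a tie-aware central-tendency search. */
public final class CentralTendencySketch {

    private CentralTendencySketch() {}

    /** Returns the 0-based indices of every class with the highest count. */
    public static List<Integer> findCentralTendencies(double[] counts) {
        List<Integer> result = new ArrayList<>();
        double max = Double.NEGATIVE_INFINITY;
        for (int y = 0; y < counts.length; y++) {
            if (counts[y] > max) {
                // New maximum: previous candidates are no longer central.
                max = counts[y];
                result.clear();
                result.add(y);
            } else if (counts[y] == max) {
                // Tie with the current maximum: keep both classes.
                result.add(y);
            }
        }
        return result;
    }
}
```

With counts {2, 5, 3} it returns [1] (class '2' in the example above), and with {4, 4, 2} it returns [0, 1].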

toString

public java.lang.String toString()
Returns description of the bias-variance decomposition results.

Overrides:
toString in class java.lang.Object
Returns:
the bias-variance decomposition results as a string

main

public static void main(java.lang.String[] args)
Test method for this class

Parameters:
args - the command line arguments

randomize

public final void randomize(int[] index,
                            java.util.Random random)
Accepts an array of ints and randomises the order of its values in place, using the given random number generator.

Parameters:
index - the array of integers to shuffle
random - the random number generator to use
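This page does not say which shuffling algorithm `randomize` uses. One standard choice is the Fisher-Yates shuffle, sketched below as a hypothetical static helper (`ShuffleSketch` is not Weka code). Passing a `java.util.Random` built from a fixed seed makes the resulting order reproducible, which would let runs with the same `-s` value be compared (assuming the class seeds its generator from that option).

```java
import java.util.Random;

/** Hypothetical sketch: in-place Fisher-Yates shuffle of an int array. */
public final class ShuffleSketch {

    private ShuffleSketch() {}

    /** Shuffles index in place; the same seeded Random yields the same order. */
    public static void randomize(int[] index, Random random) {
        for (int i = index.length - 1; i > 0; i--) {
            int j = random.nextInt(i + 1); // uniform position in [0, i]
            int tmp = index[i];
            index[i] = index[j];
            index[j] = tmp;
        }
    }
}
```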