weka.core.neighboursearch
Class KDTree

java.lang.Object
  extended by weka.core.neighboursearch.NearestNeighbourSearch
      extended by weka.core.neighboursearch.KDTree
All Implemented Interfaces:
java.io.Serializable, AdditionalMeasureProducer, OptionHandler, TechnicalInformationHandler

public class KDTree
extends NearestNeighbourSearch
implements TechnicalInformationHandler

Class implementing the KDTree search algorithm for nearest neighbour search.
The connection to dataset is only a reference. For the tree structure the indexes are stored in an array.
Building the tree:
If a node has <maximal-inst-number> (option -L) instances no further splitting is done. Also if the split would leave one side empty, the branch is not split any further even if the instances in the resulting node are more than <maximal-inst-number> instances.
**PLEASE NOTE:** The algorithm can not handle missing values, so it is advisable to run ReplaceMissingValues filter if there are any missing values in the dataset.

For more information see:

Jerome H. Friedman, Jon Luis Bentley, Raphael Ari Finkel (1977). An Algorithm for Finding Best Matches in Logarithmic Expected Time. ACM Transactions on Mathematics Software. 3(3):209-226.

Andrew Moore (1991). A tutorial on kd-trees.

BibTeX:

 @article{Friedman1977,
    author = {Jerome H. Friedman and Jon Luis Bentley and Raphael Ari Finkel},
    journal = {ACM Transactions on Mathematics Software},
    month = {September},
    number = {3},
    pages = {209-226},
    title = {An Algorithm for Finding Best Matches in Logarithmic Expected Time},
    volume = {3},
    year = {1977}
 }
 
 @techreport{Moore1991,
    author = {Andrew Moore},
    booktitle = {University of Cambridge Computer Laboratory Technical Report No. 209},
    howpublished = {Extract from PhD Thesis},
    title = {A tutorial on kd-trees},
    year = {1991},
    HTTP = {Available from http://www.autonlab.org/autonweb/14665.html}
 }
 

Valid options are:

 -S <classname and options>
  Node splitting method to use.
  (default: weka.core.neighboursearch.kdtrees.SlidingMidPointOfWidestSide)
 -W <value>
  Set minimal width of a box
  (default: 1.0E-2).
 -L
  Maximal number of instances in a leaf
  (default: 40).
 -N
  Normalizing will be done
  (Select dimension for split, with normalising to universe).

Version:
$Revision: 1.1 $
Author:
Gabi Schmidberger (gabi[at-the-rate]cs[dot]waikato[dot]ac[dot]nz), Malcolm Ware (mfw4[at-the-rate]cs[dot]waikato[dot]ac[dot]nz), Ashraf M. Kibriya (amk14[at-the-rate]cs[dot]waikato[dot]ac[dot]nz)
See Also:
Serialized Form

Field Summary
static int MAX
          The index of MAX value in attributes' range array.
static int MIN
          The index of MIN value in attributes' range array.
static int WIDTH
          The index of WIDTH (MAX-MIN) value in attributes' range array.
 
Constructor Summary
KDTree()
          Creates a new instance of KDTree.
KDTree(Instances insts)
          Creates a new instance of KDTree.
 
Method Summary
 void addInstanceInfo(Instance instance)
          Adds one instance to KDTree loosly.
 void assignSubToCenters(KDTreeNode node, Instances centers, int[] centList, int[] assignments)
          Assigns instances of this node to center.
 void centerInstances(Instances centers, int[] assignments, double pc)
          Assigns instances to centers using KDTree.
 java.util.Enumeration enumerateMeasures()
          Returns an enumeration of the additional measure names.
 DistanceFunction getDistanceFunction()
          returns the distance function currently in use.
 double[] getDistances()
          Returns the distances to the kNearest or 1 nearest neighbour currently found with either the kNearestNeighbours or the nearestNeighbour method.
 int getMaxInstInLeaf()
          Get the maximum number of instances in a leaf.
 double getMeasure(java.lang.String additionalMeasureName)
          Returns the value of the named measure.
 double getMinBoxRelWidth()
          Gets the minimum relative box width.
 KDTreeNodeSplitter getNodeSplitter()
          Returns the splitting method currently in use to split the nodes of the KDTree.
 boolean getNormalizeNodeWidth()
          Gets the normalize flag.
 java.lang.String[] getOptions()
          Gets the current settings of KDtree.
 TechnicalInformation getTechnicalInformation()
          Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.
 java.lang.String globalInfo()
          Returns a string describing this nearest neighbour search algorithm.
 Instances kNearestNeighbours(Instance target, int k)
          Returns the k nearest neighbours of the supplied instance.
 java.util.Enumeration listOptions()
          Returns an enumeration describing the available options.
 java.lang.String maxInstInLeafTipText()
          Tip text for this property.
 double measureMaxDepth()
          Returns the depth of the tree.
 double measureNumLeaves()
          Returns the number of leaves.
 double measureTreeSize()
          Returns the size of the tree.
 java.lang.String minBoxRelWidthTipText()
          Tip text for this property.
 Instance nearestNeighbour(Instance target)
          Returns the nearest neighbour of the supplied target instance.
 java.lang.String nodeSplitterTipText()
          Returns the tip text for this property.
 java.lang.String normalizeNodeWidthTipText()
          Tip text for this property.
 void setDistanceFunction(DistanceFunction df)
          sets the distance function to use for nearest neighbour search.
 void setInstances(Instances instances)
          Builds the KDTree on the given set of instances.
 void setMaxInstInLeaf(int i)
          Sets the maximum number of instances in a leaf.
 void setMeasurePerformance(boolean measurePerformance)
          Sets whether to calculate the performance statistics or not.
 void setMinBoxRelWidth(double i)
          Sets the minimum relative box width.
 void setNodeSplitter(KDTreeNodeSplitter splitter)
          Sets the splitting method to use to split the nodes of the KDTree.
 void setNormalizeNodeWidth(boolean n)
          Sets the flag for normalizing the widths of a KDTree Node by the width of the dimension in the universe.
 void setOptions(java.lang.String[] options)
          Parses a given list of options.
 void update(Instance instance)
          Adds one instance to the KDTree.
 
Methods inherited from class weka.core.neighboursearch.NearestNeighbourSearch
combSort11, distanceFunctionTipText, getInstances, getMeasurePerformance, getPerformanceStats, measurePerformanceTipText, quickSort
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

MIN

public static final int MIN
The index of MIN value in attributes' range array.

See Also:
Constant Field Values

MAX

public static final int MAX
The index of MAX value in attributes' range array.

See Also:
Constant Field Values

WIDTH

public static final int WIDTH
The index of WIDTH (MAX-MIN) value in attributes' range array.

See Also:
Constant Field Values
Constructor Detail

KDTree

public KDTree()
Creates a new instance of KDTree.


KDTree

public KDTree(Instances insts)
Creates a new instance of KDTree. It also builds the tree on supplied set of Instances.

Parameters:
insts - The instances/points on which the BallTree should be built on.
Method Detail

getTechnicalInformation

public TechnicalInformation getTechnicalInformation()
Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.

Specified by:
getTechnicalInformation in interface TechnicalInformationHandler
Returns:
the technical information about this class

kNearestNeighbours

public Instances kNearestNeighbours(Instance target,
                                    int k)
                             throws java.lang.Exception
Returns the k nearest neighbours of the supplied instance. >k neighbours are returned if there are more than one neighbours at the kth boundary.

Specified by:
kNearestNeighbours in class NearestNeighbourSearch
Parameters:
target - The instance to find the nearest neighbours for.
k - The number of neighbours to find.
Returns:
The k nearest neighbours (or >k if more there are than one neighbours at the kth boundary).
Throws:
java.lang.Exception - if the nearest neighbour could not be found.

nearestNeighbour

public Instance nearestNeighbour(Instance target)
                          throws java.lang.Exception
Returns the nearest neighbour of the supplied target instance.

Specified by:
nearestNeighbour in class NearestNeighbourSearch
Parameters:
target - The instance to find the nearest neighbour for.
Returns:
The nearest neighbour from among the previously supplied training instances.
Throws:
java.lang.Exception - if the neighbours could not be found.

getDistances

public double[] getDistances()
                      throws java.lang.Exception
Returns the distances to the kNearest or 1 nearest neighbour currently found with either the kNearestNeighbours or the nearestNeighbour method.

Specified by:
getDistances in class NearestNeighbourSearch
Returns:
array containing the distances of the nearestNeighbours. The length and ordering of the array is the same as that of the instances returned by nearestNeighbour functions.
Throws:
java.lang.Exception - if called before calling kNearestNeighbours or nearestNeighbours.

setInstances

public void setInstances(Instances instances)
                  throws java.lang.Exception
Builds the KDTree on the given set of instances.

Overrides:
setInstances in class NearestNeighbourSearch
Parameters:
instances - The insts on which the KDTree is to be built.
Throws:
java.lang.Exception - If some error occurs while building the KDTree

update

public void update(Instance instance)
            throws java.lang.Exception
Adds one instance to the KDTree. This updates the KDTree structure to take into account the newly added training instance.

Specified by:
update in class NearestNeighbourSearch
Parameters:
instance - the instance to be added. Usually the newly added instance in the training set.
Throws:
java.lang.Exception - If the instance cannot be added.

addInstanceInfo

public void addInstanceInfo(Instance instance)
Adds one instance to KDTree loosly. It only changes the ranges in EuclideanDistance, and does not affect the structure of the KDTree.

Overrides:
addInstanceInfo in class NearestNeighbourSearch
Parameters:
instance - the new instance. Usually this is the test instance supplied to update the range of attributes in the distance function.

measureTreeSize

public double measureTreeSize()
Returns the size of the tree.

Returns:
the size of the tree

measureNumLeaves

public double measureNumLeaves()
Returns the number of leaves.

Returns:
the number of leaves

measureMaxDepth

public double measureMaxDepth()
Returns the depth of the tree.

Returns:
The depth of the tree

enumerateMeasures

public java.util.Enumeration enumerateMeasures()
Returns an enumeration of the additional measure names.

Specified by:
enumerateMeasures in interface AdditionalMeasureProducer
Overrides:
enumerateMeasures in class NearestNeighbourSearch
Returns:
an enumeration of the measure names

getMeasure

public double getMeasure(java.lang.String additionalMeasureName)
Returns the value of the named measure.

Specified by:
getMeasure in interface AdditionalMeasureProducer
Overrides:
getMeasure in class NearestNeighbourSearch
Parameters:
additionalMeasureName - the name of the measure to query for its value.
Returns:
The value of the named measure
Throws:
java.lang.IllegalArgumentException - If the named measure is not supported.

setMeasurePerformance

public void setMeasurePerformance(boolean measurePerformance)
Sets whether to calculate the performance statistics or not.

Overrides:
setMeasurePerformance in class NearestNeighbourSearch
Parameters:
measurePerformance - Should be true if performance statistics are to be measured.

centerInstances

public void centerInstances(Instances centers,
                            int[] assignments,
                            double pc)
                     throws java.lang.Exception
Assigns instances to centers using KDTree.

Parameters:
centers - the current centers
assignments - the centerindex for each instance
pc - the threshold value for pruning.
Throws:
java.lang.Exception - If there is some problem assigning instances to centers.

assignSubToCenters

public void assignSubToCenters(KDTreeNode node,
                               Instances centers,
                               int[] centList,
                               int[] assignments)
                        throws java.lang.Exception
Assigns instances of this node to center. Center to be assign to is decided by the distance function.

Parameters:
node - The KDTreeNode whose instances are to be assigned.
centers - all the input centers
centList - the list of centers to work with
assignments - index list of last assignments
Throws:
java.lang.Exception - If there is error assigning the instances.

minBoxRelWidthTipText

public java.lang.String minBoxRelWidthTipText()
Tip text for this property.

Returns:
the tip text for this property

setMinBoxRelWidth

public void setMinBoxRelWidth(double i)
Sets the minimum relative box width.

Parameters:
i - the minimum relative box width

getMinBoxRelWidth

public double getMinBoxRelWidth()
Gets the minimum relative box width.

Returns:
the minimum relative box width

maxInstInLeafTipText

public java.lang.String maxInstInLeafTipText()
Tip text for this property.

Returns:
the tip text for this property

setMaxInstInLeaf

public void setMaxInstInLeaf(int i)
Sets the maximum number of instances in a leaf.

Parameters:
i - the maximum number of instances in a leaf

getMaxInstInLeaf

public int getMaxInstInLeaf()
Get the maximum number of instances in a leaf.

Returns:
the maximum number of instances in a leaf

normalizeNodeWidthTipText

public java.lang.String normalizeNodeWidthTipText()
Tip text for this property.

Returns:
the tip text for this property

setNormalizeNodeWidth

public void setNormalizeNodeWidth(boolean n)
Sets the flag for normalizing the widths of a KDTree Node by the width of the dimension in the universe.

Parameters:
n - true to use normalizing.

getNormalizeNodeWidth

public boolean getNormalizeNodeWidth()
Gets the normalize flag.

Returns:
True if normalizing

getDistanceFunction

public DistanceFunction getDistanceFunction()
returns the distance function currently in use.

Overrides:
getDistanceFunction in class NearestNeighbourSearch
Returns:
the distance function

setDistanceFunction

public void setDistanceFunction(DistanceFunction df)
                         throws java.lang.Exception
sets the distance function to use for nearest neighbour search.

Overrides:
setDistanceFunction in class NearestNeighbourSearch
Parameters:
df - the distance function to use
Throws:
java.lang.Exception - if not EuclideanDistance

nodeSplitterTipText

public java.lang.String nodeSplitterTipText()
Returns the tip text for this property.

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

getNodeSplitter

public KDTreeNodeSplitter getNodeSplitter()
Returns the splitting method currently in use to split the nodes of the KDTree.

Returns:
The KDTreeNodeSplitter currently in use.

setNodeSplitter

public void setNodeSplitter(KDTreeNodeSplitter splitter)
Sets the splitting method to use to split the nodes of the KDTree.

Parameters:
splitter - The KDTreeNodeSplitter to use.

globalInfo

public java.lang.String globalInfo()
Returns a string describing this nearest neighbour search algorithm.

Overrides:
globalInfo in class NearestNeighbourSearch
Returns:
a description of the algorithm for displaying in the explorer/experimenter gui

listOptions

public java.util.Enumeration listOptions()
Returns an enumeration describing the available options.

Specified by:
listOptions in interface OptionHandler
Overrides:
listOptions in class NearestNeighbourSearch
Returns:
an enumeration of all the available options.

setOptions

public void setOptions(java.lang.String[] options)
                throws java.lang.Exception
Parses a given list of options.

Valid options are:

 -S <classname and options>
  Node splitting method to use.
  (default: weka.core.neighboursearch.kdtrees.SlidingMidPointOfWidestSide)
 -W <value>
  Set minimal width of a box
  (default: 1.0E-2).
 -L
  Maximal number of instances in a leaf
  (default: 40).
 -N
  Normalizing will be done
  (Select dimension for split, with normalising to universe).

Specified by:
setOptions in interface OptionHandler
Overrides:
setOptions in class NearestNeighbourSearch
Parameters:
options - the list of options as an array of strings
Throws:
java.lang.Exception - if an option is not supported

getOptions

public java.lang.String[] getOptions()
Gets the current settings of KDtree.

Specified by:
getOptions in interface OptionHandler
Overrides:
getOptions in class NearestNeighbourSearch
Returns:
an array of strings suitable for passing to setOptions