weka.filters.unsupervised.attribute
Class InterquartileRange

java.lang.Object
  extended by weka.filters.Filter
      extended by weka.filters.SimpleFilter
          extended by weka.filters.SimpleBatchFilter
              extended by weka.filters.unsupervised.attribute.InterquartileRange
All Implemented Interfaces:
java.io.Serializable, CapabilitiesHandler, OptionHandler

public class InterquartileRange
extends SimpleBatchFilter

A filter for detecting outliers and extreme values based on interquartile ranges. The filter skips the class attribute.

Outliers:
Q3 + OF*IQR < x <= Q3 + EVF*IQR
or
Q1 - EVF*IQR <= x < Q1 - OF*IQR

Extreme values:
x > Q3 + EVF*IQR
or
x < Q1 - EVF*IQR

Key:
Q1 = 25% quartile
Q3 = 75% quartile
IQR = Interquartile Range, Q3 - Q1
OF = Outlier Factor
EVF = Extreme Value Factor
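
The thresholds above can be sketched in plain Java. This is only an illustration of the inequalities, not Weka's implementation: the class `IqrLabelSketch`, the enum `Label`, and the method `classify` are hypothetical names, and the sketch assumes EVF >= OF.

```java
/** Hypothetical helper, not part of Weka: applies the documented thresholds to one value. */
final class IqrLabelSketch {
    enum Label { NORMAL, OUTLIER, EXTREME }

    static Label classify(double x, double q1, double q3, double of, double evf) {
        double iqr = q3 - q1;
        // Extreme values lie strictly beyond the EVF fences.
        if (x > q3 + evf * iqr || x < q1 - evf * iqr) {
            return Label.EXTREME;
        }
        // Outliers lie strictly beyond the OF fences but not beyond the EVF fences.
        if (x > q3 + of * iqr || x < q1 - of * iqr) {
            return Label.OUTLIER;
        }
        return Label.NORMAL;
    }

    public static void main(String[] args) {
        // Q1 = 10 and Q3 = 20 give IQR = 10; with OF = 3 and EVF = 6
        // the upper fences are 50 (outlier) and 80 (extreme value).
        System.out.println(classify(25, 10, 20, 3, 6));
        System.out.println(classify(60, 10, 20, 3, 6));
        System.out.println(classify(90, 10, 20, 3, 6));
    }
}
```

A value exactly on the EVF fence (80 in this example) still counts as an outlier, because the extreme-value condition is a strict inequality.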

Valid options are:

 -D
  Turns on output of debugging information.
 -R <col1,col2-col4,...>
  Specifies the list of columns on which to base outlier/extreme
  value detection. An instance that is an outlier/extreme value in
  at least one of those attributes is tagged accordingly.
  'first' and 'last' are valid indexes.
  (default: none)
 -O <num>
  The factor for outlier detection.
  (default: 3)
 -E <num>
  The factor for extreme values detection.
  (default: 2*Outlier Factor)
 -E-as-O
  Tags extreme values also as outliers.
  (default: off)
 -P
  Generates Outlier/ExtremeValue pair for each numeric attribute in
  the range, not just a single indicator pair for all the attributes.
  (default: off)
 -M
  Generates an additional attribute 'Offset' per Outlier/ExtremeValue
  pair, containing the multiplier by which the value is offset from
  the median:
     value = median + 'multiplier' * IQR
  Note: implicitly sets '-P'. (default: off)
Thanks to Dale for a few brainstorming sessions.
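
The 'Offset' relation given for -M, value = median + 'multiplier' * IQR, can be solved for the multiplier. The sketch below does only that rearrangement. The names `OffsetSketch` and `offsetMultiplier` are hypothetical, and rejecting a zero IQR is an assumption of this sketch. The documentation does not say how the filter handles that case.

```java
/** Hypothetical helper, not part of Weka: solves value = median + multiplier * IQR. */
final class OffsetSketch {
    static double offsetMultiplier(double value, double median, double iqr) {
        // Assumption of this sketch: a zero IQR is rejected rather than producing Infinity/NaN.
        if (iqr == 0.0) {
            throw new IllegalArgumentException("IQR must be non-zero");
        }
        return (value - median) / iqr;
    }

    public static void main(String[] args) {
        // With median = 15 and IQR = 10, the value 35 lies two IQRs above the median.
        System.out.println(offsetMultiplier(35, 15, 10));
    }
}
```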

Version:
$Revision: 1.2 $
Author:
Dale Fletcher (dale at cs dot waikato dot ac dot nz), fracpete (fracpete at waikato dot ac dot nz)
See Also:
Serialized Form

Field Summary
static int NON_NUMERIC
          indicator for non-numeric attributes
 
Constructor Summary
InterquartileRange()
           
 
Method Summary
 java.lang.String attributeIndicesTipText()
          Returns the tip text for this property
 java.lang.String detectionPerAttributeTipText()
          Returns the tip text for this property
 java.lang.String extremeValuesAsOutliersTipText()
          Returns the tip text for this property
 java.lang.String extremeValuesFactorTipText()
          Returns the tip text for this property
 java.lang.String getAttributeIndices()
          Gets the current range selection
 Capabilities getCapabilities()
          Returns the Capabilities of this filter.
 boolean getDetectionPerAttribute()
          Gets whether an Outlier/ExtremeValue attribute pair is generated for each numeric attribute ("true") or just one pair for all numeric attributes together ("false").
 boolean getExtremeValuesAsOutliers()
          Get whether extreme values are also tagged as outliers.
 double getExtremeValuesFactor()
          Gets the factor for determining the thresholds for extreme values.
 java.lang.String[] getOptions()
          Gets the current settings of the filter.
 double getOutlierFactor()
          Gets the factor for determining the thresholds for outliers.
 boolean getOutputOffsetMultiplier()
          Gets whether an additional attribute "Offset" is generated per Outlier/ExtremeValue attribute pair, holding the multiplier by which the value is offset from the median: value = median + 'multiplier' * IQR.
 java.lang.String globalInfo()
          Returns a string describing this filter
 java.util.Enumeration listOptions()
          Returns an enumeration describing the available options.
static void main(java.lang.String[] args)
          Main method for testing this class.
 java.lang.String outlierFactorTipText()
          Returns the tip text for this property
 java.lang.String outputOffsetMultiplierTipText()
          Returns the tip text for this property
 void setAttributeIndices(java.lang.String value)
          Sets which attributes are to be used for interquartile calculations and outlier/extreme value detection (only numeric attributes among the selection will be used).
 void setAttributeIndicesArray(int[] value)
          Sets which attributes are to be used for interquartile calculations and outlier/extreme value detection (only numeric attributes among the selection will be used).
 void setDetectionPerAttribute(boolean value)
          Set whether an Outlier/ExtremeValue attribute pair is generated for each numeric attribute ("true") or just one pair for all numeric attributes together ("false").
 void setExtremeValuesAsOutliers(boolean value)
          Set whether extreme values are also tagged as outliers.
 void setExtremeValuesFactor(double value)
          Sets the factor for determining the thresholds for extreme values.
 void setOptions(java.lang.String[] options)
          Parses a list of options for this object.
 void setOutlierFactor(double value)
          Sets the factor for determining the thresholds for outliers.
 void setOutputOffsetMultiplier(boolean value)
          Set whether an additional attribute "Offset" is generated per Outlier/ExtremeValue attribute pair, holding the multiplier by which the value is offset from the median: value = median + 'multiplier' * IQR.
 
Methods inherited from class weka.filters.SimpleBatchFilter
batchFinished, input
 
Methods inherited from class weka.filters.SimpleFilter
debugTipText, getDebug, setDebug, setInputFormat
 
Methods inherited from class weka.filters.Filter
batchFilterFile, filterFile, getCapabilities, getOutputFormat, isFirstBatchDone, isNewBatch, isOutputFormatDefined, makeCopies, makeCopy, numPendingOutput, output, outputPeek, toString, useFilter, wekaStaticWrapper
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

NON_NUMERIC

public static final int NON_NUMERIC
indicator for non-numeric attributes

See Also:
Constant Field Values
Constructor Detail

InterquartileRange

public InterquartileRange()
Method Detail

globalInfo

public java.lang.String globalInfo()
Returns a string describing this filter

Specified by:
globalInfo in class SimpleFilter
Returns:
a description of the filter suitable for displaying in the explorer/experimenter gui

listOptions

public java.util.Enumeration listOptions()
Returns an enumeration describing the available options.

Specified by:
listOptions in interface OptionHandler
Overrides:
listOptions in class SimpleFilter
Returns:
an enumeration of all the available options.

setOptions

public void setOptions(java.lang.String[] options)
                throws java.lang.Exception
Parses a list of options for this object.

Valid options are:

 -D
  Turns on output of debugging information.
 -R <col1,col2-col4,...>
  Specifies the list of columns on which to base outlier/extreme
  value detection. An instance that is an outlier/extreme value in
  at least one of those attributes is tagged accordingly.
  'first' and 'last' are valid indexes.
  (default: none)
 -O <num>
  The factor for outlier detection.
  (default: 3)
 -E <num>
  The factor for extreme values detection.
  (default: 2*Outlier Factor)
 -E-as-O
  Tags extreme values also as outliers.
  (default: off)
 -P
  Generates Outlier/ExtremeValue pair for each numeric attribute in
  the range, not just a single indicator pair for all the attributes.
  (default: off)
 -M
  Generates an additional attribute 'Offset' per Outlier/ExtremeValue
  pair, containing the multiplier by which the value is offset from
  the median:
     value = median + 'multiplier' * IQR
  Note: implicitly sets '-P'. (default: off)

Specified by:
setOptions in interface OptionHandler
Overrides:
setOptions in class SimpleFilter
Parameters:
options - the list of options as an array of strings
Throws:
java.lang.Exception - if an option is not supported
See Also:
SimpleFilter.reset()

getOptions

public java.lang.String[] getOptions()
Gets the current settings of the filter.

Specified by:
getOptions in interface OptionHandler
Overrides:
getOptions in class SimpleFilter
Returns:
an array of strings suitable for passing to setOptions
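
An options array of the kind accepted by setOptions and returned by getOptions might be assembled as follows. The flags are the ones listed under setOptions. The class `OptionsSketch` and its `build` method are hypothetical, and the exact order and number formatting that getOptions produces are not stated in this documentation.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of Weka: assembles the documented flags into a String[]. */
final class OptionsSketch {
    static String[] build(String range, double outlierFactor, double extremeValuesFactor,
                          boolean extremeAsOutlier, boolean perAttribute, boolean offset) {
        List<String> opts = new ArrayList<>();
        opts.add("-R");
        opts.add(range);
        opts.add("-O");
        opts.add(Double.toString(outlierFactor));
        opts.add("-E");
        opts.add(Double.toString(extremeValuesFactor));
        // Flag options carry no argument; they are present only when switched on.
        if (extremeAsOutlier) opts.add("-E-as-O");
        if (perAttribute) opts.add("-P");
        if (offset) opts.add("-M");
        return opts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] options = build("first-last", 3.0, 6.0, false, true, false);
        System.out.println(String.join(" ", options));
    }
}
```

Each flag and each argument is a separate array element, which is the shape `setOptions(java.lang.String[] options)` expects.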

attributeIndicesTipText

public java.lang.String attributeIndicesTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

getAttributeIndices

public java.lang.String getAttributeIndices()
Gets the current range selection

Returns:
a string containing a comma separated list of ranges

setAttributeIndices

public void setAttributeIndices(java.lang.String value)
Sets which attributes are to be used for interquartile calculations and outlier/extreme value detection (only numeric attributes among the selection will be used).

Parameters:
value - a string representing the list of attributes. Since the string will typically come from a user, attributes are indexed from 1.
e.g. first-3,5,6-last
Throws:
java.lang.IllegalArgumentException - if an invalid range list is supplied

setAttributeIndicesArray

public void setAttributeIndicesArray(int[] value)
Sets which attributes are to be used for interquartile calculations and outlier/extreme value detection (only numeric attributes among the selection will be used).

Parameters:
value - an array containing indexes of attributes to work on. Since the array will typically come from a program, attributes are indexed from 0.
Throws:
java.lang.IllegalArgumentException - if an invalid set of ranges is supplied
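
The two setters differ in index base: the string form is 1-based, and the array form is 0-based. The sketch below shows how a range list such as first-3,5,6-last could map onto 0-based indexes. It is a simplified stand-in, not Weka's own range parser: `RangeSketch`, `resolve`, and `toZeroBased` are hypothetical names, and details such as duplicate handling or inverted ranges may differ in the real filter.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of Weka: expands a 1-based range list to 0-based indexes. */
final class RangeSketch {
    /** Resolves one token ("first", "last", or a 1-based number) to a 0-based index. */
    static int resolve(String token, int numAttributes) {
        String t = token.trim();
        if (t.equalsIgnoreCase("first")) return 0;
        if (t.equalsIgnoreCase("last")) return numAttributes - 1;
        int oneBased = Integer.parseInt(t); // NumberFormatException is an IllegalArgumentException
        if (oneBased < 1 || oneBased > numAttributes) {
            throw new IllegalArgumentException("Index out of range: " + token);
        }
        return oneBased - 1;
    }

    static int[] toZeroBased(String rangeList, int numAttributes) {
        List<Integer> result = new ArrayList<>();
        for (String part : rangeList.split(",")) {
            String[] ends = part.split("-");
            if (ends.length < 1 || ends.length > 2) {
                throw new IllegalArgumentException("Invalid range: " + part);
            }
            int from = resolve(ends[0], numAttributes);
            int to = ends.length == 2 ? resolve(ends[1], numAttributes) : from;
            if (from > to) {
                throw new IllegalArgumentException("Invalid range: " + part);
            }
            for (int i = from; i <= to; i++) {
                result.add(i);
            }
        }
        return result.stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        // For a dataset with 8 attributes, "first-3,5,6-last" selects every attribute but the fourth.
        System.out.println(java.util.Arrays.toString(toZeroBased("first-3,5,6-last", 8)));
    }
}
```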

outlierFactorTipText

public java.lang.String outlierFactorTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setOutlierFactor

public void setOutlierFactor(double value)
Sets the factor for determining the thresholds for outliers.

Parameters:
value - the factor.

getOutlierFactor

public double getOutlierFactor()
Gets the factor for determining the thresholds for outliers.

Returns:
the factor.

extremeValuesFactorTipText

public java.lang.String extremeValuesFactorTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setExtremeValuesFactor

public void setExtremeValuesFactor(double value)
Sets the factor for determining the thresholds for extreme values.

Parameters:
value - the factor.

getExtremeValuesFactor

public double getExtremeValuesFactor()
Gets the factor for determining the thresholds for extreme values.

Returns:
the factor.

extremeValuesAsOutliersTipText

public java.lang.String extremeValuesAsOutliersTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setExtremeValuesAsOutliers

public void setExtremeValuesAsOutliers(boolean value)
Set whether extreme values are also tagged as outliers.

Parameters:
value - whether or not to tag extreme values also as outliers.

getExtremeValuesAsOutliers

public boolean getExtremeValuesAsOutliers()
Get whether extreme values are also tagged as outliers.

Returns:
true if extreme values are also tagged as outliers.
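
By the class-level definitions, the outlier band stops at the extreme-value threshold, so an extreme value is not an outlier unless this property is switched on. A minimal sketch of the resulting Outlier/ExtremeValue pair follows. `TagSketch` and `tags` are hypothetical names, and the boolean pair stands in for the two indicator attributes.

```java
/** Hypothetical helper, not part of Weka: derives the Outlier/ExtremeValue pair for one value. */
final class TagSketch {
    /** Returns {outlierTag, extremeValueTag}. */
    static boolean[] tags(boolean isOutlier, boolean isExtreme, boolean extremeValuesAsOutliers) {
        // With the property on, an extreme value also raises the outlier tag.
        boolean outlierTag = isOutlier || (extremeValuesAsOutliers && isExtreme);
        return new boolean[] { outlierTag, isExtreme };
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(tags(false, true, false)));
        System.out.println(java.util.Arrays.toString(tags(false, true, true)));
    }
}
```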

detectionPerAttributeTipText

public java.lang.String detectionPerAttributeTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setDetectionPerAttribute

public void setDetectionPerAttribute(boolean value)
Set whether an Outlier/ExtremeValue attribute pair is generated for each numeric attribute ("true") or just one pair for all numeric attributes together ("false").

Parameters:
value - whether or not to generate indicator attribute pairs for each numeric attribute.

getDetectionPerAttribute

public boolean getDetectionPerAttribute()
Gets whether an Outlier/ExtremeValue attribute pair is generated for each numeric attribute ("true") or just one pair for all numeric attributes together ("false").

Returns:
true if indicator attribute pairs are generated for each numeric attribute.
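
When this property is false, a single indicator pair covers all selected attributes, and an instance is tagged if at least one attribute flags it. That combination is a logical OR over the per-attribute results, sketched below. `PairSketch` and `combined` are hypothetical names used only for illustration.

```java
/** Hypothetical helper, not part of Weka: collapses per-attribute flags into one indicator. */
final class PairSketch {
    static boolean combined(boolean[] perAttributeFlags) {
        // One flagged attribute is enough to tag the whole instance.
        for (boolean flagged : perAttributeFlags) {
            if (flagged) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Only the second attribute flags the instance, yet the single indicator is set.
        System.out.println(combined(new boolean[] { false, true, false }));
    }
}
```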

outputOffsetMultiplierTipText

public java.lang.String outputOffsetMultiplierTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setOutputOffsetMultiplier

public void setOutputOffsetMultiplier(boolean value)
Set whether an additional attribute "Offset" is generated per Outlier/ExtremeValue attribute pair, holding the multiplier by which the value is offset from the median: value = median + 'multiplier' * IQR.

Parameters:
value - whether or not to generate the additional attribute.

getOutputOffsetMultiplier

public boolean getOutputOffsetMultiplier()
Gets whether an additional attribute "Offset" is generated per Outlier/ExtremeValue attribute pair, holding the multiplier by which the value is offset from the median: value = median + 'multiplier' * IQR.

Returns:
true if the additional attribute is generated.

getCapabilities

public Capabilities getCapabilities()
Returns the Capabilities of this filter.

Specified by:
getCapabilities in interface CapabilitiesHandler
Overrides:
getCapabilities in class Filter
Returns:
the capabilities of this object
See Also:
Capabilities

main

public static void main(java.lang.String[] args)
Main method for testing this class.

Parameters:
args - should contain arguments to the filter: use -h for help