Class ChiSquareStat

java.lang.Object
org.bzdev.math.stats.Statistic
org.bzdev.math.stats.ChiSquareStat

public class ChiSquareStat extends Statistic
Class representing a ChiSquare statistic.

This class supports two forms of the $\chi^2$ statistic.

  • $\chi^2 = \sum_{i}\frac{(x_i - E_i)^2}{E_i}$. This form is appropriate when the data points $x_i$ are counts where there is an expected value $E_i$.
  • $\chi^2 = \sum_i \frac{(x_i - E_i)^2}{\sigma_i^2}$. This form is appropriate when the data points $x_i$ are measurements with a standard error $\sigma_i$ and a true or expected value $E_i$.
As a general rule, only one of these forms would be used.
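
The two sums can be evaluated directly from their definitions. The following is a minimal sketch in plain Java, independent of the library; the class name ChiSquareForms and its method names are hypothetical and are not part of org.bzdev. It is meant only to illustrate what each form measures, not how ChiSquareStat is implemented.

```java
// Illustrative helper (hypothetical, not part of org.bzdev): evaluates the two
// chi-square sums directly from their definitions.
final class ChiSquareForms {

    /** Count form: sum over i of (x_i - E_i)^2 / E_i. */
    static double countForm(double[] x, double[] expected) {
        if (x.length != expected.length) {
            throw new IllegalArgumentException("array lengths differ");
        }
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            double diff = x[i] - expected[i];
            sum += diff * diff / expected[i];
        }
        return sum;
    }

    /** Measurement form: sum over i of (x_i - E_i)^2 / sigma_i^2. */
    static double measurementForm(double[] x, double[] expected, double[] sigma) {
        if (x.length != expected.length || x.length != sigma.length) {
            throw new IllegalArgumentException("array lengths differ");
        }
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            double ratio = (x[i] - expected[i]) / sigma[i];
            sum += ratio * ratio;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Observed counts against a uniform expectation of 20 per bin.
        double[] counts = {18, 22, 20};
        double[] expectedCounts = {20, 20, 20};
        System.out.println("count form: " + countForm(counts, expectedCounts));

        // Measurements with individual standard errors.
        double[] measured = {10.5, 9.0};
        double[] trueValues = {10.0, 10.0};
        double[] sigma = {0.5, 2.0};
        System.out.println("measurement form: "
                           + measurementForm(measured, trueValues, sigma));
    }
}
```

The only difference between the two forms is the denominator: the expected count in the first, the squared standard error in the second.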
  • Nested Class Summary

    Nested classes/interfaces inherited from class org.bzdev.math.stats.Statistic

    Statistic.PValueMode
  • Constructor Summary

    Constructors
    Constructor
    Description
    ChiSquareStat()
    Constructor.
    ChiSquareStat(double[] data, double expected)
    Constructor given a data-point array and a corresponding true (or expected) value for the data points.
    ChiSquareStat(double[] data, double[] expected)
    Constructor given a data-point array and an array of the corresponding true (or expected) values for the data points.
    ChiSquareStat(double[] data, double[] expected, double sigma)
    Constructor given a data-point array, an array of the corresponding true (or expected) values, and the standard deviation for the data points.
    ChiSquareStat(double[] data, double[] expected, double[] sigma)
    Constructor given a data-point array, an array of the corresponding true (or expected) values, and an array of the corresponding standard deviations for each data point.
    ChiSquareStat(double[] data, double expected, double sigma)
    Constructor given a data-point array, a corresponding true (or expected) value for the data points, and the standard deviation for the data points.
    ChiSquareStat(double chisq, long n, long constraints)
    Constructor given an explicit $\chi^2$ value.
    ChiSquareStat(int[][] data)
    Constructor for categorical data.
  • Method Summary

    Modifier and Type
    Method
    Description
    void
    add(double d, double e)
    Add a data point.
    void
    add(double d, double e, double sigma)
    Add a data point with a specified standard error.
    void
    freeze()
    Disallow additional data.
    long
    getConstraints()
    Get the constraint count.
    long
    getDegreesOfFreedom()
    Get the number of degrees of freedom for this statistic.
    ProbDistribution
    getDistribution()
    Get the probability distribution for this statistic.
    ProbDistribution
    getDistribution(double lambda)
    Get a noncentral distribution for this statistic.
    double
    getNCParameter(double diff)
    Get the noncentrality parameter appropriate for this statistic.
    double
    getNCParameter(double... diffs)
    Get the noncentrality parameter appropriate for this statistic given multiple differences.
    double
    getValue()
    Get the value of this statistic.
    boolean
    isFrozen()
    Determine if this statistic is frozen.
    void
    setConstraints(long constraints)
    Set the constraint count.
    void
    setDegreesOfFreedom(long degreesOfFreedom)
    Set the number of degrees of freedom. The argument is the number of degrees of freedom appropriate for the value currently returned by size().
    long
    size()
    Get the number of values used to compute this statistic.

    Methods inherited from class org.bzdev.math.stats.Statistic

    getBeta, getBeta, getCriticalValue, getPower, getPower, getPValue, optimalValue

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Constructor Details

    • ChiSquareStat

      public ChiSquareStat()
      Constructor. The constraint count is the number of parameters that are used to determine the true or expected values and that are themselves determined from the data.
    • ChiSquareStat

      public ChiSquareStat(double chisq, long n, long constraints)
      Constructor given an explicit $\chi^2$ value. The number of degrees of freedom is (n - constraints). If this value is negative, additional data will have to be added before the methods returning the probability distribution or the number of degrees of freedom can be used.

      This constructor is intended for analysis purposes or for cases in which one might wish to save and restore previously computed statistics.

      Parameters:
      chisq - the $\chi^2$ value
      n - the size of the data set
      constraints - the number of constraints.
    • ChiSquareStat

      public ChiSquareStat(double[] data, double[] expected)
      Constructor given a data-point array and an array of the corresponding true (or expected) values for the data points.
      Parameters:
      data - an array giving a collection of data points
      expected - the actual or expected values corresponding to each data point
    • ChiSquareStat

      public ChiSquareStat(double[] data, double expected)
      Constructor given a data-point array and a corresponding true (or expected) value for the data points.
      Parameters:
      data - an array giving a collection of data points
      expected - the actual or expected value for each data point (the same value for all)
    • ChiSquareStat

      public ChiSquareStat(double[] data, double[] expected, double[] sigma)
      Constructor given a data-point array, an array of the corresponding true (or expected) values, and an array of the corresponding standard deviations for each data point.
      Parameters:
      data - an array giving a collection of data points
      expected - the actual or expected values corresponding to each data point
      sigma - the standard error for each data point
    • ChiSquareStat

      public ChiSquareStat(double[] data, double[] expected, double sigma)
      Constructor given a data-point array, an array of the corresponding true (or expected) values, and the standard deviation for the data points.
      Parameters:
      data - an array giving a collection of data points
      expected - the actual or expected values corresponding to each data point
      sigma - the standard error for each data point (the same value for all)
    • ChiSquareStat

      public ChiSquareStat(double[] data, double expected, double sigma)
      Constructor given a data-point array, a corresponding true (or expected) value for the data points, and the standard deviation for the data points.
      Parameters:
      data - an array giving a collection of data points
      expected - the actual or expected value for each data point (the same value for all)
      sigma - the standard deviation for each data point (the same value for all)
    • ChiSquareStat

      public ChiSquareStat(int[][] data)
      Constructor for categorical data. This test determines if two categories are independent. Each category is modeled by a fixed number of elements specified by an index, and a matrix provides counts of entries that are associated with a pair of elements, one from each category. One category is represented by the rows of the matrix and the other by its columns.

      One must not use one of the "add" methods after this constructor is used, nor change the number of degrees of freedom, nor set the number of constraints - otherwise an exception will be thrown.

      Parameters:
      data - an N by M matrix of counts
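
As an illustration of the kind of test described above, the following plain-Java sketch computes the standard Pearson statistic for independence from a matrix of counts. It assumes the usual textbook construction, in which the expected count for a cell is (row total × column total) / grand total and the number of degrees of freedom is (N - 1)(M - 1); this page does not state that ChiSquareStat uses exactly these formulas. The class name IndependenceSketch is hypothetical.

```java
// Hypothetical sketch (not part of org.bzdev): Pearson chi-square test of
// independence for an N by M matrix of counts.
final class IndependenceSketch {

    /** Sum over all cells of (observed - expected)^2 / expected. */
    static double chiSquare(int[][] counts) {
        int rows = counts.length;
        int cols = counts[0].length;
        double[] rowSum = new double[rows];
        double[] colSum = new double[cols];
        double total = 0.0;
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                rowSum[i] += counts[i][j];
                colSum[j] += counts[i][j];
                total += counts[i][j];
            }
        }
        double chisq = 0.0;
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                // Expected count if the two categories were independent.
                double expected = rowSum[i] * colSum[j] / total;
                double diff = counts[i][j] - expected;
                chisq += diff * diff / expected;
            }
        }
        return chisq;
    }

    /** Textbook degrees of freedom for an N by M table: (N - 1)(M - 1). */
    static long degreesOfFreedom(int[][] counts) {
        return (long) (counts.length - 1) * (counts[0].length - 1);
    }

    public static void main(String[] args) {
        int[][] table = {{10, 20}, {20, 10}};
        System.out.println("chi-square = " + chiSquare(table)
                           + ", degrees of freedom = " + degreesOfFreedom(table));
    }
}
```

A table whose rows are proportional to one another, such as {{10, 20}, {20, 40}}, yields a statistic of zero under this construction, because every observed count then equals its expected count.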
  • Method Details

    • freeze

      public void freeze()
      Disallow additional data.
    • isFrozen

      public boolean isFrozen()
      Determine if this statistic is frozen.
      Returns:
      true if no more data can be added; false otherwise
    • setConstraints

      public void setConstraints(long constraints) throws IllegalArgumentException, IllegalStateException
      Set the constraint count. When the constraint count is set to a specific value, the number of degrees of freedom will be equal to size()-getConstraints(). The constraint count may be modified if setDegreesOfFreedom(long) is called.
      Parameters:
      constraints - the number of constraints
      Throws:
      IllegalArgumentException - the argument was out of range
      IllegalStateException - this method may not be called given the constructor used
    • getConstraints

      public long getConstraints()
      Get the constraint count.
      Returns:
      the constraint count
    • size

      public long size()
      Get the number of values used to compute this statistic.
      Returns:
      the number of values
    • getDegreesOfFreedom

      public long getDegreesOfFreedom()
      Get the number of degrees of freedom for this statistic.
      Returns:
      the number of degrees of freedom.
      Throws:
      IllegalStateException - more data points must be added before this method is called
    • setDegreesOfFreedom

      public void setDegreesOfFreedom(long degreesOfFreedom) throws IllegalArgumentException, IllegalStateException
      Set the number of degrees of freedom. The argument is the number of degrees of freedom appropriate for the value currently returned by size(). If N additional entries are added, the number of degrees of freedom will increase by N. If this behavior is not appropriate, this method must be called after all entries have been added. Calling this method may alter the constraint count.
      Parameters:
      degreesOfFreedom - the number of degrees of freedom
      Throws:
      IllegalArgumentException - the argument was out of range
      IllegalStateException - this method may not be called given the constructor used
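
The relationship between the entry count, the constraint count, and the number of degrees of freedom can be modeled with a few lines of bookkeeping. The sketch below is a hypothetical stand-alone model (the class DofModel is not part of org.bzdev) that follows the rules stated above: the number of degrees of freedom equals size() - getConstraints(), setting the degrees of freedom adjusts the constraint count, and each added entry raises the degrees of freedom by one. The argument ranges it enforces are assumptions, since this page does not say which values are out of range.

```java
// Hypothetical bookkeeping model (not part of org.bzdev) of how size,
// constraints, and degrees of freedom are described as interacting.
final class DofModel {
    private long size = 0;
    private long constraints = 0;

    /** Record one more entry; the degrees of freedom grow by one. */
    void addEntry() {
        size++;
    }

    long size() {
        return size;
    }

    long getConstraints() {
        return constraints;
    }

    /** Assumed range: the constraint count must be non-negative. */
    void setConstraints(long constraints) {
        if (constraints < 0) {
            throw new IllegalArgumentException("negative constraint count");
        }
        this.constraints = constraints;
    }

    /** Assumed range: 0 <= degreesOfFreedom <= size. Alters the constraint count. */
    void setDegreesOfFreedom(long degreesOfFreedom) {
        if (degreesOfFreedom < 0 || degreesOfFreedom > size) {
            throw new IllegalArgumentException("degrees of freedom out of range");
        }
        constraints = size - degreesOfFreedom;
    }

    /** Degrees of freedom = size - constraints; fails if more data is needed. */
    long getDegreesOfFreedom() {
        long dof = size - constraints;
        if (dof < 0) {
            throw new IllegalStateException("more data points must be added");
        }
        return dof;
    }

    public static void main(String[] args) {
        DofModel model = new DofModel();
        for (int i = 0; i < 5; i++) {
            model.addEntry();
        }
        model.setConstraints(2);
        System.out.println("dof after 5 entries, 2 constraints: "
                           + model.getDegreesOfFreedom());
        model.setDegreesOfFreedom(4);
        System.out.println("constraints after setDegreesOfFreedom(4): "
                           + model.getConstraints());
    }
}
```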
    • add

      public void add(double d, double e) throws IllegalStateException
      Add a data point.
      Parameters:
      d - a data point
      e - the true (or expected) value for the data point
      Throws:
      IllegalStateException - this method may not be called given the constructor used
    • add

      public void add(double d, double e, double sigma) throws IllegalStateException
      Add a data point with a specified standard error.
      Parameters:
      d - a data point
      e - the true (or expected) value for the data point
      sigma - the standard error for the data point
      Throws:
      IllegalStateException - this method may not be called given the constructor used
    • getValue

      public double getValue() throws IllegalStateException
      Description copied from class: Statistic
      Get the value of this statistic.
      Specified by:
      getValue in class Statistic
      Returns:
      the value of this statistic
      Throws:
      IllegalStateException - the value cannot be computed (for example, because data has not yet been entered)
    • getDistribution

      public ProbDistribution getDistribution() throws IllegalStateException
      Description copied from class: Statistic
      Get the probability distribution for this statistic. The distribution is the distribution for the statistic, not the distribution for the data the statistic describes.
      Specified by:
      getDistribution in class Statistic
      Returns:
      the probability distribution
      Throws:
      IllegalStateException - the value cannot be computed (for example, because data has not yet been entered)
    • getDistribution

      public ProbDistribution getDistribution(double lambda) throws IllegalArgumentException, IllegalStateException
      Get a noncentral distribution for this statistic. The definition of λ provided by ChiSquareDistr is such that, if k independent, normally distributed random variables have means $\mu_i$ and variances equal to 1.0, then the distribution of the sum of the squares of these random variables is a noncentral $\chi^2$ distribution with k degrees of freedom and a noncentrality parameter λ. For the statistic provided by this class, each of these random variables is the ratio of a difference to a quantity that is essentially a standard deviation. The difference is the difference between a value $X_i$ and an expected value $E_i$, and the standard deviation is the standard deviation of $X_i$. The $\chi^2$ distribution assumes that these ratios have a normal distribution with unit variance and a mean of 0. For a noncentral $\chi^2$ distribution, the ratios are assumed to have a non-zero mean. The noncentrality parameter λ is defined by $\lambda = \sum_{i=1}^k \mu_i^2$, where $\mu_i$ is the mean value of the ith ratio.

      If there are k degrees of freedom, but additional terms in the $\chi^2$ sum, it is possible to define k variables $z_i$ so that $\chi^2 = z_1^2 + \dots + z_k^2$, where the mean of each $z_i$ is zero and its standard deviation is 1. The same value for λ will be obtained. All the $x_i$ variables must satisfy (n-k) linear equations that act as constraints, where n is the total number of terms.

      Overrides:
      getDistribution in class Statistic
      Parameters:
      lambda - the noncentrality parameter
      Returns:
      the (noncentral) probability distribution
      Throws:
      IllegalArgumentException - the argument is not allowed for this statistic
      IllegalStateException - the state of this statistic does not allow this function to return a meaningful value (e.g., because not enough data has been provided)
    • getNCParameter

      public double getNCParameter(double diff)
      Get the noncentrality parameter appropriate for this statistic. The argument is equal to the difference $E'_i - E_i$, where $E'_i$ is the actual value and $E_i$ is the value used when the constructor was called. This offset is assumed to be the same for all i. If we set $Y_i = X_i - E'_i$ and $\delta_i = E'_i - E_i$, then $(X_i - E_i)/\sigma_i = (X_i - E_i - \delta_i + \delta_i)/\sigma_i = (X_i - E'_i + \delta_i)/\sigma_i = (Y_i + \delta_i)/\sigma_i$. The standard deviation of $X_i$ is $\sigma_i$ and hence the standard deviation of $Y_i$ is also $\sigma_i$. Consequently, the variance of $(Y_i + \delta_i)/\sigma_i$ is 1.0 and the corresponding mean value is $\mu_i = \delta_i / \sigma_i$. The distribution of $\sum_{i=1}^k\left(\frac{Y_i + \delta_i}{\sigma_i}\right)^2$ is thus a noncentral $\chi^2$ distribution with a noncentrality parameter $\lambda = \sum_{i=1}^k \mu_i^2$.

      This method computes λ for the case where the differences $\delta_i$ are identical. If the number of variables is larger than the number of degrees of freedom, the user must ensure that constraining all the $\delta_i$ values so that they are equal does not violate the constraint equations that reduce the number of degrees of freedom.

      Overrides:
      getNCParameter in class Statistic
      Parameters:
      diff - the difference between the mean value of the random variables used in the sum of squares and their expected values (the same difference for all)
      Returns:
      the noncentrality parameter
      Throws:
      IllegalStateException - additional data was added after the constructor was called or the number of constraints was not zero
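
The computation described above can be sketched in a few lines of plain Java. The sketch assumes the standard definition of the noncentrality parameter as the sum of the squared means, with each mean equal to δ/σ for the corresponding data point; the class NoncentralitySketch and its lambda methods are hypothetical and are not part of org.bzdev. It covers both a single common difference and one difference per data point, and it assumes no constraints.

```java
// Hypothetical sketch (not part of org.bzdev): noncentrality parameter for
// unit-variance ratios (Y_i + delta_i) / sigma_i, assuming the standard
// definition lambda = sum of mu_i^2 with mu_i = delta_i / sigma_i.
final class NoncentralitySketch {

    /** One difference per data point. */
    static double lambda(double[] diffs, double[] sigma) {
        if (diffs.length != sigma.length) {
            throw new IllegalArgumentException("array lengths differ");
        }
        double sum = 0.0;
        for (int i = 0; i < diffs.length; i++) {
            double mu = diffs[i] / sigma[i]; // mean of the ith ratio
            sum += mu * mu;
        }
        return sum;
    }

    /** The same difference for every data point. */
    static double lambda(double diff, double[] sigma) {
        double sum = 0.0;
        for (double s : sigma) {
            double mu = diff / s;
            sum += mu * mu;
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] sigma = {0.5, 1.0, 2.0};
        System.out.println("common difference: " + lambda(1.0, sigma));
        System.out.println("per-point differences: "
                           + lambda(new double[] {1.0, -2.0, 0.0}, sigma));
    }
}
```

Because each term is squared, the sign of a difference does not affect λ, and data points with small standard errors contribute the most.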
    • getNCParameter

      public double getNCParameter(double... diffs)
      Get the noncentrality parameter appropriate for this statistic given multiple differences. The ith argument is equal to the difference $E'_i - E_i$, where $E'_i$ is the desired value and $E_i$ is the value used when the constructor was called.

      If the number of variables is larger than the number of degrees of freedom, the user must ensure that the choice of arguments is consistent with the constraint equations that reduce the number of degrees of freedom.

      If there is only a single argument or an array of length 1, the argument or array value will be passed to getNCParameter(double).

      Overrides:
      getNCParameter in class Statistic
      Parameters:
      diffs - the differences between the mean values of the random variables used in the sum of squares and their expected values
      Returns:
      the noncentrality parameter.
      Throws:
      IllegalStateException - additional data was added after the constructor was called or the number of constraints was not zero