public class DynamicFloatBin1D extends QuantileFloatBin1D
The data filled into a DynamicFloatBin1D is internally preserved in the
bin. As a consequence, this bin can compute more than only basic statistics.
On the other hand, if you add huge amounts of elements, you may run out
of memory (each element takes at least 4 bytes). If these drawbacks matter, consider
using StaticFloatBin1D, which overcomes them at the expense of limited
functionality.
This class is fully thread safe (all public methods are synchronized). Thus, you can have one or more threads adding to the bin as well as one or more threads reading and viewing the statistics of the bin while it is filled. For high performance, add data in large chunks (buffers) via method addAllOf rather than piecewise via method add.
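To make the locking pattern concrete, here is a minimal sketch of a hypothetical SynchronizedBinSketch class (not the library's implementation) backed by a plain float[] array. Every public method is synchronized, so writers and readers can run concurrently; addAllOfFromTo acquires the lock once per chunk, whereas add acquires it once per element, which is why chunked adding is cheaper.

```java
import java.util.Arrays;

/**
 * Hypothetical minimal bin (not the library's code) illustrating the
 * "all public methods are synchronized" design described above.
 */
class SynchronizedBinSketch {
    private float[] elements = new float[16];
    private int size = 0;

    // One lock acquisition per element: simple, but slow for many elements.
    public synchronized void add(float element) {
        ensureCapacity(size + 1);
        elements[size++] = element;
    }

    // One lock acquisition per chunk: copies buffer[from..to] (both inclusive).
    public synchronized void addAllOfFromTo(float[] buffer, int from, int to) {
        int n = to - from + 1;
        ensureCapacity(size + n);
        System.arraycopy(buffer, from, elements, size, n);
        size += n;
    }

    public synchronized int size() {
        return size;
    }

    // Readers take the same lock, so they always see a consistent state.
    public synchronized float sum() {
        float s = 0f;
        for (int i = 0; i < size; i++) s += elements[i];
        return s;
    }

    // Grows the backing array geometrically; only called while holding the lock.
    private void ensureCapacity(int min) {
        if (min > elements.length) {
            elements = Arrays.copyOf(elements, Math.max(min, 2 * elements.length));
        }
    }
}
```

In the library itself the chunked call would be addAllOf or addAllOfFromTo with a FloatArrayList; the float[] signature here is an assumption of the sketch.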
If your favourite statistics measure is not directly provided by this class,
check out FloatDescriptive in combination with the methods elements() and
sortedElements().
Implementation: Lazy evaluation, caching, incremental maintenance.
See Also: FloatDescriptive, Serialized Form

Constructor | Description |
---|---|
DynamicFloatBin1D() | Constructs and returns an empty bin; implicitly calls setFixedOrder(false). |
Modifier and Type | Method | Description |
---|---|---|
void | add(float element) | Adds the specified element to the receiver. |
void | addAllOfFromTo(FloatArrayList list, int from, int to) | Adds the part of the specified list between indexes from (inclusive) and to (inclusive) to the receiver. |
float | aggregate(FloatFloatFunction aggr, FloatFunction f) | Applies a function to each element and aggregates the results. |
void | clear() | Removes all elements from the receiver. |
Object | clone() | Returns a deep copy of the receiver. |
float | correlation(DynamicFloatBin1D other) | Returns the correlation of two bins, which is corr(x,y) = covariance(x,y) / (stdDev(x)*stdDev(y)) (Pearson's correlation coefficient). |
float | covariance(DynamicFloatBin1D other) | Returns the covariance of two bins, which is cov(x,y) = (1/size()) * Sum((x[i]-mean(x)) * (y[i]-mean(y))). |
FloatArrayList | elements() | Returns a copy of the currently stored elements. |
boolean | equals(Object object) | Returns whether two bins are equal. |
void | frequencies(FloatArrayList distinctElements, IntArrayList frequencies) | Computes the frequency (number of occurrences, count) of each distinct element. |
int | getMaxOrderForSumOfPowers() | Returns Integer.MAX_VALUE, the maximum order k for which sums of powers are retrievable. |
int | getMinOrderForSumOfPowers() | Returns Integer.MIN_VALUE, the minimum order k for which sums of powers are retrievable. |
boolean | isRebinnable() | Returns true. |
float | max() | Returns the maximum. |
float | min() | Returns the minimum. |
float | moment(int k, float c) | Returns the moment of k-th order with value c, which is Sum( (x[i]-c)^k ) / size(). |
float | quantile(float phi) | Returns the exact phi-quantile; that is, the smallest contained element elem for which a fraction phi of the elements are less than elem. |
float | quantileInverse(float element) | Returns exactly how many percent of the elements contained in the receiver are <= element. |
FloatArrayList | quantiles(FloatArrayList percentages) | Returns the exact quantiles of the specified percentages. |
boolean | removeAllOf(FloatArrayList list) | Removes from the receiver all elements that are contained in the specified list. |
void | sample(int n, boolean withReplacement, FloatRandomEngine randomGenerator, FloatBuffer buffer) | Uniformly samples (chooses) n random elements with or without replacement from the contained elements and adds them to the given buffer. |
DynamicFloatBin1D | sampleBootstrap(DynamicFloatBin1D other, int resamples, FloatRandomEngine randomGenerator, FloatBinBinFunction1D function) | Generic bootstrap resampling. |
void | setFixedOrder(boolean fixedOrder) | Determines whether the receiver's internally preserved elements may be reordered or not. |
int | size() | Returns the number of elements contained in the receiver. |
FloatArrayList | sortedElements() | Returns a copy of the currently stored elements, sorted ascending. |
void | standardize(float mean, float standardDeviation) | Modifies the receiver to be standardized. |
float | sum() | Returns the sum of all elements, which is Sum( x[i] ). |
float | sumOfInversions() | Returns the sum of inversions, which is Sum( 1 / x[i] ). |
float | sumOfLogarithms() | Returns the sum of logarithms, which is Sum( Log(x[i]) ). |
float | sumOfPowers(int k) | Returns the k-th order sum of powers, which is Sum( x[i]^k ). |
float | sumOfSquares() | Returns the sum of squares, which is Sum( x[i] * x[i] ). |
String | toString() | Returns a String representation of the receiver. |
void | trim(int s, int l) | Removes the s smallest and l largest elements from the receiver. |
float | trimmedMean(int s, int l) | Returns the trimmed mean. |
void | trimToSize() | Trims the capacity of the receiver to be the receiver's current size. |
Inherited methods (from QuantileFloatBin1D and its superclasses):
compareWith, median, sizeOfRange, splitApproximately (two overloads)
geometricMean, harmonicMean, hasSumOfInversions, hasSumOfLogarithms, hasSumOfPowers, kurtosis, product, skew
addAllOf, buffered, mean, rms, standardDeviation, standardError, variance
public DynamicFloatBin1D()
Constructs and returns an empty bin; implicitly calls setFixedOrder(false).

public void add(float element)
Adds the specified element to the receiver.
Overrides: add in class StaticFloatBin1D
Parameters:
element - element to be appended.

public void addAllOfFromTo(FloatArrayList list, int from, int to)
Adds the part of the specified list between indexes from (inclusive) and to (inclusive) to the receiver.
Overrides: addAllOfFromTo in class QuantileFloatBin1D
Parameters:
list - the list of which elements shall be added.
from - the index of the first element to be added (inclusive).
to - the index of the last element to be added (inclusive).
Throws:
IndexOutOfBoundsException - if list.size()>0 && (from<0 || from>to || to>=list.size()).

public float aggregate(FloatFloatFunction aggr, FloatFunction f)
Applies a function to each element and aggregates the results.
Example:
    // bin contains the elements 0 1 2 3
    // Sum( x[i]*x[i] )
    bin.aggregate(FloatFunctions.plus, FloatFunctions.square); --> 14
For further examples, see the package doc.
Parameters:
aggr - an aggregation function taking as first argument the current aggregation and as second argument the transformed current element.
f - a function transforming the current element.
See Also: FloatFunctions
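The contract of aggregate can be sketched over a plain float[] as follows. FloatBinaryOp and FloatUnaryOp are hypothetical stand-ins for FloatFloatFunction and FloatFunction, and returning NaN for an empty input is an assumption of this sketch, not documented behavior.

```java
/** Hypothetical float-specialised functional interfaces (java.util.function has none). */
interface FloatBinaryOp { float apply(float a, float b); }
interface FloatUnaryOp { float apply(float x); }

class AggregateSketch {
    /**
     * Transforms each element with f and folds the results left to right with aggr:
     * aggr(...aggr(aggr(f(x0), f(x1)), f(x2))..., f(x[n-1])).
     * Assumption: an empty input yields NaN.
     */
    static float aggregate(float[] elements, FloatBinaryOp aggr, FloatUnaryOp f) {
        if (elements.length == 0) return Float.NaN;
        float acc = f.apply(elements[0]);
        for (int i = 1; i < elements.length; i++) {
            acc = aggr.apply(acc, f.apply(elements[i]));
        }
        return acc;
    }
}
```

With aggr = addition and f = squaring, this reproduces the Sum( x[i]*x[i] ) example: 0 + 1 + 4 + 9 = 14.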
public void clear()
Removes all elements from the receiver.
Overrides: clear in class QuantileFloatBin1D

public Object clone()
Returns a deep copy of the receiver.
Overrides: clone in class QuantileFloatBin1D
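public float correlation(DynamicFloatBin1D other)
Returns the correlation of two bins, which is corr(x,y) = covariance(x,y) / (stdDev(x)*stdDev(y)) (Pearson's correlation coefficient).
Parameters:
other - the bin to compare with.
Throws:
IllegalArgumentException - if size() != other.size().

public float covariance(DynamicFloatBin1D other)
Returns the covariance of two bins, which is cov(x,y) = (1/size()) * Sum((x[i]-mean(x)) * (y[i]-mean(y))).
Parameters:
other - the bin to compare with.
Throws:
IllegalArgumentException - if size() != other.size().

public FloatArrayList elements()
Returns a copy of the currently stored elements.
See Also: setFixedOrder(boolean)

public boolean equals(Object object)
Returns whether two bins are equal. Definition of equality for multisets: A and B are equal <=> A is a superset of B and B is a superset of A. (Elements must occur the same number of times; order is irrelevant.)
Overrides: equals in class AbstractFloatBin1D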
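For reference, a minimal sketch of the covariance and correlation formulas over plain float[] arrays (the class name CovarianceSketch is hypothetical). It accumulates in double and, as an assumption, also normalizes the standard deviations by 1/n, so that the ratio is Pearson's correlation coefficient; the normalization used by the library's own stdDev() is not stated here.

```java
class CovarianceSketch {
    static double mean(float[] x) {
        double s = 0;
        for (float v : x) s += v;
        return s / x.length;
    }

    /** cov(x,y) = (1/n) * Sum((x[i]-mean(x)) * (y[i]-mean(y))). */
    static double covariance(float[] x, float[] y) {
        if (x.length != y.length) throw new IllegalArgumentException("size() != other.size()");
        double mx = mean(x), my = mean(y), s = 0;
        for (int i = 0; i < x.length; i++) s += (x[i] - mx) * (y[i] - my);
        return s / x.length;
    }

    /** Pearson's r; assumption: 1/n standard deviations, matching covariance(). */
    static double correlation(float[] x, float[] y) {
        double sx = Math.sqrt(covariance(x, x)); // population standard deviation of x
        double sy = Math.sqrt(covariance(y, y)); // population standard deviation of y
        return covariance(x, y) / (sx * sy);
    }
}
```

For x = (1,2,3) and y = (2,4,6) the covariance is 4/3 and the correlation is 1, as expected for a perfect linear relationship.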
public void frequencies(FloatArrayList distinctElements, IntArrayList frequencies)
Computes the frequency (number of occurrences, count) of each distinct element. Distinct elements are filled into distinctElements, starting at index 0. The frequency of each distinct element is filled into frequencies, starting at index 0. Further, both distinctElements and frequencies are sorted ascending by element (in sync, of course). As a result, the smallest distinct element (and its frequency) can be found at index 0, the second smallest distinct element (and its frequency) at index 1, ..., and the largest distinct element (and its frequency) at index distinctElements.size()-1.
Example:
    elements = (8,7,6,6,7) --> distinctElements = (6,7,8), frequencies = (2,2,1)
Parameters:
distinctElements - a list to be filled with the distinct elements; can have any size.
frequencies - a list to be filled with the frequencies; can have any size; set this parameter to null to ignore it.

public int getMaxOrderForSumOfPowers()
Returns Integer.MAX_VALUE, the maximum order k for which sums of powers are retrievable.
Overrides: getMaxOrderForSumOfPowers in class MightyStaticFloatBin1D
See Also: MightyStaticFloatBin1D.hasSumOfPowers(int), sumOfPowers(int)
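The sorted, in-sync layout described for frequencies falls out naturally from sorting a copy of the elements and run-length encoding it. The sketch below is illustrative only: FrequenciesSketch is hypothetical, it uses java.util lists instead of FloatArrayList and IntArrayList, and clearing both output lists first is an assumption. Like the documented method, it accepts null for the frequencies list.

```java
import java.util.Arrays;
import java.util.List;

class FrequenciesSketch {
    /**
     * Sorts a copy, then run-length encodes it so that distinct.get(i) and
     * counts.get(i) stay in sync, ascending by element. Assumption: both lists
     * are cleared first. Pass counts == null to collect only distinct elements.
     */
    static void frequencies(float[] elements, List<Float> distinct, List<Integer> counts) {
        distinct.clear();
        if (counts != null) counts.clear();
        float[] sorted = elements.clone();
        Arrays.sort(sorted);
        int i = 0;
        while (i < sorted.length) {
            int j = i;
            // advance j past the run of elements equal to sorted[i]
            while (j < sorted.length && Float.compare(sorted[j], sorted[i]) == 0) j++;
            distinct.add(sorted[i]);
            if (counts != null) counts.add(j - i);
            i = j;
        }
    }
}
```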
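public int getMinOrderForSumOfPowers()
Returns Integer.MIN_VALUE, the minimum order k for which sums of powers are retrievable.
Overrides: getMinOrderForSumOfPowers in class MightyStaticFloatBin1D
See Also: MightyStaticFloatBin1D.hasSumOfPowers(int), sumOfPowers(int)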
public boolean isRebinnable()
Returns true.
Overrides: isRebinnable in class StaticFloatBin1D

public float max()
Returns the maximum.
Overrides: max in class StaticFloatBin1D

public float min()
Returns the minimum.
Overrides: min in class StaticFloatBin1D
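public float moment(int k, float c)
Returns the moment of k-th order with value c, which is Sum( (x[i]-c)^k ) / size().
Overrides: moment in class MightyStaticFloatBin1D
Parameters:
k - the order; any integer, which can be less than zero, zero, or greater than zero.
c - any number.

public float quantile(float phi)
Returns the exact phi-quantile; that is, the smallest contained element elem for which a fraction phi of the elements are less than elem.
Overrides: quantile in class QuantileFloatBin1D
Parameters:
phi - must satisfy 0 < phi < 1.

public float quantileInverse(float element)
Returns exactly how many percent of the elements contained in the receiver are <= element.
Overrides: quantileInverse in class QuantileFloatBin1D
Parameters:
element - the element to search for.

public FloatArrayList quantiles(FloatArrayList percentages)
Returns the exact quantiles of the specified percentages.
Overrides: quantiles in class QuantileFloatBin1D
Parameters:
percentages - the percentages for which quantiles are to be computed. Each percentage must be in the interval (0.0,1.0]. percentages must be sorted ascending.

public boolean removeAllOf(FloatArrayList list)
Removes from the receiver all elements that are contained in the specified list.
Parameters:
list - the elements to be removed.
Returns:
true if the receiver changed as a result of the call.

public void sample(int n, boolean withReplacement, FloatRandomEngine randomGenerator, FloatBuffer buffer)
Uniformly samples (chooses) n random elements with or without replacement from the contained elements and adds them to the given buffer.
Parameters:
n - the number of elements to choose.
withReplacement - true samples with replacement, otherwise samples without replacement.
randomGenerator - a random number generator. Set this parameter to null to use a default random number generator seeded with the current time.
buffer - the buffer to which chosen elements will be added.
Throws:
IllegalArgumentException - if !withReplacement && n > size().
See Also: buffered, cern.jet.random.tfloat.sampling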
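A sketch of the two sampling modes over a float[] may help. SampleSketch is a hypothetical class, java.util.Random stands in for FloatRandomEngine, and the partial Fisher-Yates shuffle used for sampling without replacement is a swapped-in technique, not necessarily what cern.jet.random.tfloat.sampling does. It mirrors the documented IllegalArgumentException when !withReplacement && n > size().

```java
import java.util.Random;

class SampleSketch {
    /**
     * Returns n elements chosen uniformly at random. With replacement, each pick is
     * an independent uniform index; without replacement, a partial Fisher-Yates
     * shuffle on a copy guarantees that no element is picked twice.
     */
    static float[] sample(float[] elements, int n, boolean withReplacement, Random rnd) {
        if (!withReplacement && n > elements.length) {
            throw new IllegalArgumentException("n > size() when sampling without replacement");
        }
        float[] out = new float[n];
        if (withReplacement) {
            for (int i = 0; i < n; i++) out[i] = elements[rnd.nextInt(elements.length)];
        } else {
            float[] pool = elements.clone();
            for (int i = 0; i < n; i++) {
                int j = i + rnd.nextInt(pool.length - i); // pick from the not-yet-chosen suffix
                float tmp = pool[i];
                pool[i] = pool[j];
                pool[j] = tmp;
                out[i] = pool[i];
            }
        }
        return out;
    }
}
```

Drawing all size() elements without replacement therefore yields a random permutation of the contents.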
public DynamicFloatBin1D sampleBootstrap(DynamicFloatBin1D other, int resamples, FloatRandomEngine randomGenerator, FloatBinBinFunction1D function)
Generic bootstrap resampling. Executes resamples resampling steps; in each step, a bootstrap sample of the receiver (b1) and a bootstrap sample of other (b2) are drawn with replacement, and the result of function.apply(b1, b2) is added to an auxiliary bootstrap bin b3. Finally returns the auxiliary bootstrap bin b3, from which the measure of interest can be read off.
Background:
Also see a more in-depth discussion on bootstrapping and related randomization methods. The classical statistical test for comparing the means of two samples is the t-test. Unfortunately, this test assumes that the two samples each come from a normal distribution and that these distributions have the same standard deviation. Quite often, however, data has a distribution that is non-normal in many ways. In particular, distributions are often asymmetric. For such data, the t-test may produce misleading results and should thus not be used. Sometimes asymmetric data can be transformed into normally distributed data by taking, e.g., the logarithm, and the t-test will then produce valid results, but this still requires postulating a certain distribution underlying the data, which is often not warranted because too little is known about the data composition.
Bootstrap resampling of mean differences (and other differences) is a robust replacement for the t-test and does not require assumptions about the actual distribution of the data. The idea of bootstrapping is quite simple: simulation. The only assumption required is that the two samples a and b are representative of the underlying distribution with respect to the statistic that is being tested; this assumption is, of course, implicit in all statistical tests. We can now generate many further samples that correspond to the two given ones by sampling with replacement. This process is called resampling. A resample can (and usually will) have a different mean than the original one, and by drawing hundreds or thousands of such resamples ar from a and br from b, we can compute the so-called bootstrap distribution of all the differences "mean of ar minus mean of br", that is, a bootstrap bin filled with the differences. Now we can compute what fraction of these differences is, say, greater than zero. Let's assume we have computed 1000 resamples of both a and b and found that only 8 of the differences were greater than zero. Then 8/1000 = 0.008 is the p-value (probability) for the hypothesis that the mean of the distribution underlying a is actually larger than the mean of the distribution underlying b. From this bootstrap test, we can clearly reject the hypothesis.
Instead of mean differences, we can also use other differences, for example median differences.
Instead of p-values, we can also read arbitrary confidence intervals from the bootstrap bin. For example, if 90% of all bootstrap differences (mean of a minus mean of b) lie to the left of the value -3.5, then this difference is -3.5 or smaller with probability 0.9; equivalently, a one-sided 90% confidence interval for the reverse difference (mean of b minus mean of a) is (3.5,infinity), i.e., that difference is 3.5 or larger with probability 0.9.
Sometimes we would like to compare not only means and medians, but also the variability (spread) of two samples. The conventional method for this is the F-test, which compares the standard deviations. It is related to the t-test and, like the latter, assumes that the two samples come from a normal distribution. The F-test is very sensitive to deviations from normality. Instead, we can again resort to the more robust bootstrap resampling and compare a measure of spread, for example the inter-quartile range. This way we compute a bootstrap resampling of inter-quartile range differences in order to arrive at a test for inequality of variability.
Example:
    // v1, v2 - the two samples to compare against each other
    float[] v1 = { 1, 2, 3, 4, 5, 6, 7, 8, 9,10, 21,22,23,24,25,26,27,28,29,30,31};
    float[] v2 = {10,11,12,13,14,15,16,17,18,19, 20,30,31,32,33,34,35,36,37,38,39};
    DynamicFloatBin1D X = new DynamicFloatBin1D();
    DynamicFloatBin1D Y = new DynamicFloatBin1D();
    X.addAllOf(new FloatArrayList(v1));
    Y.addAllOf(new FloatArrayList(v2));

    // bootstrap resampling of differences of means:
    FloatBinBinFunction1D meanDiff = new FloatBinBinFunction1D() {
        public float apply(DynamicFloatBin1D x, DynamicFloatBin1D y) { return x.mean() - y.mean(); }
    };
    // bootstrap resampling of differences of medians:
    FloatBinBinFunction1D medianDiff = new FloatBinBinFunction1D() {
        public float apply(DynamicFloatBin1D x, DynamicFloatBin1D y) { return x.median() - y.median(); }
    };
    // bootstrap resampling of differences of inter-quartile ranges:
    FloatBinBinFunction1D iqrDiff = new FloatBinBinFunction1D() {
        public float apply(DynamicFloatBin1D x, DynamicFloatBin1D y) {
            return (x.quantile(0.75f) - x.quantile(0.25f)) - (y.quantile(0.75f) - y.quantile(0.25f));
        }
    };

    // null: use a default random number generator seeded with the current time.
    // Pass meanDiff, medianDiff or iqrDiff to select the statistic.
    DynamicFloatBin1D boot = X.sampleBootstrap(Y, 1000, null, meanDiff);
    System.out.println("p-value=" + (boot.aggregate(FloatFunctions.plus, FloatFunctions.greater(0)) / boot.size()));
    System.out.println("left 90% confidence interval = (" + boot.quantile(0.9f) + ",infinity)");

    --> sample output as originally recorded (values vary between runs because resampling is random):
    // bootstrap resampling of differences of means:
    p-value=0.0080
    left 90% confidence interval = (-3.571428571428573,infinity)
    // bootstrap resampling of differences of medians:
    p-value=0.36
    left 90% confidence interval = (5.0,infinity)
    // bootstrap resampling of differences of inter-quartile ranges:
    p-value=0.5699
    left 90% confidence interval = (5.0,infinity)
Parameters:
other - the other bin to compare the receiver against.
resamples - the number of times resampling shall be done.
randomGenerator - a random number generator. Set this parameter to null to use a default random number generator seeded with the current time.
function - a difference function comparing two samples; takes as first argument a sample of this and as second argument a sample of other.
See Also: GenericPermuting.permutation(long,int)
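The resampling loop behind sampleBootstrap can be sketched without the library as follows. SampleComparison is a hypothetical stand-in for FloatBinBinFunction1D, java.util.Random replaces FloatRandomEngine, and the bootstrap bin is returned as a plain double[]; the p-value is then the fraction of differences greater than zero, as in the Background discussion.

```java
import java.util.Random;

/** Hypothetical comparison function, playing the role of FloatBinBinFunction1D. */
interface SampleComparison { double apply(float[] a, float[] b); }

class BootstrapSketch {
    /** Draws a.length elements uniformly with replacement from a (one bootstrap sample). */
    static float[] resample(float[] a, Random rnd) {
        float[] r = new float[a.length];
        for (int i = 0; i < a.length; i++) r[i] = a[rnd.nextInt(a.length)];
        return r;
    }

    static double mean(float[] x) {
        double s = 0;
        for (float v : x) s += v;
        return s / x.length;
    }

    /** Bootstrap distribution: one value of f(resample(a), resample(b)) per step. */
    static double[] bootstrap(float[] a, float[] b, int resamples, Random rnd, SampleComparison f) {
        double[] boot = new double[resamples];
        for (int i = 0; i < resamples; i++) {
            boot[i] = f.apply(resample(a, rnd), resample(b, rnd));
        }
        return boot;
    }

    /** Fraction of bootstrap differences greater than zero (the p-value in the text). */
    static double fractionGreaterThanZero(double[] boot) {
        int count = 0;
        for (double d : boot) if (d > 0) count++;
        return (double) count / boot.length;
    }
}
```

A usage sketch with mean differences: bootstrap(a, b, 1000, new Random(42), (x, y) -> mean(x) - mean(y)), followed by fractionGreaterThanZero on the result.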
public void setFixedOrder(boolean fixedOrder)
Determines whether the receiver's internally preserved elements may be reordered or not. Naturally, if fixedOrder is set to true, you should not have added any elements to the receiver yet; it should be empty.
public int size()
Returns the number of elements contained in the receiver.
Overrides: size in class StaticFloatBin1D
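public FloatArrayList sortedElements()
Returns a copy of the currently stored elements, sorted ascending.
See Also: setFixedOrder(boolean)

public void standardize(float mean, float standardDeviation)
Modifies the receiver to be standardized, i.e., each element x[i] is replaced by (x[i]-mean)/standardDeviation.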
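Assuming the usual definition of standardization, which the parameter names suggest, the in-place update amounts to the loop below. StandardizeSketch is a hypothetical helper operating on a float[], not the library's code.

```java
class StandardizeSketch {
    /** In-place standardization (assumed definition): x[i] = (x[i] - mean) / standardDeviation. */
    static void standardize(float[] x, float mean, float standardDeviation) {
        for (int i = 0; i < x.length; i++) {
            x[i] = (x[i] - mean) / standardDeviation;
        }
    }
}
```

Passing the sample's own mean and standard deviation yields data with mean 0 and standard deviation 1; for (2,4,6) with mean 4 and standard deviation 2 the result is (-1,0,1).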
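public float sum()
Returns the sum of all elements, which is Sum( x[i] ).
Overrides: sum in class StaticFloatBin1D

public float sumOfInversions()
Returns the sum of inversions, which is Sum( 1 / x[i] ).
Overrides: sumOfInversions in class MightyStaticFloatBin1D
See Also: MightyStaticFloatBin1D.hasSumOfInversions()

public float sumOfLogarithms()
Returns the sum of logarithms, which is Sum( Log(x[i]) ).
Overrides: sumOfLogarithms in class MightyStaticFloatBin1D
See Also: MightyStaticFloatBin1D.hasSumOfLogarithms()

public float sumOfPowers(int k)
Returns the k-th order sum of powers, which is Sum( x[i]^k ).
Overrides: sumOfPowers in class MightyStaticFloatBin1D
Parameters:
k - the order of the powers.
See Also: MightyStaticFloatBin1D.hasSumOfPowers(int)

public float sumOfSquares()
Returns the sum of squares, which is Sum( x[i] * x[i] ).
Overrides: sumOfSquares in class StaticFloatBin1D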
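The sums above can be sketched directly from their formulas. SumsSketch is hypothetical; it accumulates in double, and reading Log as the natural logarithm is an assumption. Note that sumOfInversions is simply the sum of powers of order k = -1, which is consistent with arbitrary (including negative) orders being retrievable here.

```java
class SumsSketch {
    /** Sum(x[i]^k) for any integer k, accumulated in double. */
    static double sumOfPowers(float[] x, int k) {
        double s = 0;
        for (float v : x) s += Math.pow(v, k);
        return s;
    }

    /** Sum(1 / x[i]), i.e. the sum of powers of order -1. */
    static double sumOfInversions(float[] x) {
        return sumOfPowers(x, -1);
    }

    /** Sum(log(x[i])); assumption: natural logarithm. */
    static double sumOfLogarithms(float[] x) {
        double s = 0;
        for (float v : x) s += Math.log(v);
        return s;
    }
}
```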
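public String toString()
Returns a String representation of the receiver.
Overrides: toString in class QuantileFloatBin1D

public void trim(int s, int l)
Removes the s smallest and l largest elements from the receiver.
Parameters:
s - the number of smallest elements to trim away (s >= 0).
l - the number of largest elements to trim away (l >= 0).

public float trimmedMean(int s, int l)
Returns the trimmed mean, i.e., the mean of the elements that remain after the s smallest and l largest elements are disregarded.
Parameters:
s - the number of smallest elements to trim away (s >= 0).
l - the number of largest elements to trim away (l >= 0).

public void trimToSize()
Trims the capacity of the receiver to be the receiver's current size. Releases any superfluous internal memory. An application can use this operation to minimize the storage of the receiver. Does not affect functionality.
Overrides: trimToSize in class AbstractFloatBin1D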
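A minimal sketch of trimmedMean over a float[] (TrimmedMeanSketch is hypothetical): sort a copy, skip the s smallest and l largest elements, and average the rest. Rejecting s + l >= size() is an assumption of the sketch; the library's behavior in that case is not documented here.

```java
import java.util.Arrays;

class TrimmedMeanSketch {
    /** Mean of the elements left after disregarding the s smallest and l largest. */
    static double trimmedMean(float[] x, int s, int l) {
        // Assumption: at least one element must remain.
        if (s < 0 || l < 0 || s + l >= x.length) {
            throw new IllegalArgumentException("need s >= 0, l >= 0 and s + l < size()");
        }
        float[] sorted = x.clone();
        Arrays.sort(sorted);
        double sum = 0;
        for (int i = s; i < sorted.length - l; i++) sum += sorted[i];
        return sum / (sorted.length - s - l);
    }
}
```

For (100,1,3,2,4) with s = l = 1, the outliers 1 and 100 are dropped and the result is the mean of (2,3,4), i.e. 3.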
Jump to the Parallel Colt Homepage