dbrvstatdiff (1) - Linux Manuals
dbrvstatdiff: evaluate statistical differences between two random variables
NAME
dbrvstatdiff - evaluate statistical differences between two random variables
SYNOPSIS
dbrvstatdiff [-f format] [-c ConfRating] [-h HypothesizedDifference] m1c sd1c n1c m2c sd2c n2c
OR
dbrvstatdiff [-f format] [-c ConfRating] m1c n1c m2c n2c
DESCRIPTION
Produce statistics on the difference of sets of random variables. If a hypothesized difference is given (with "-h"), to does a Student's t-test.Random variables are specified by:
- "m1c", "m2c"
- The column names of means of random variables.
- "sd1c", "sd2c"
- The column names of standard deviations of random variables.
- "n1c", "n2c"
- Counts of number of samples for each random variable
These values can be computed with dbcolstats.
Creates up to ten new columns:
- "diff"
- The difference of RV 2 - RV 1.
- "diff_pct"
- The percentage difference (RV2-RV1)/1
- "diff_conf_{half,low,high}" and "diff_conf_pct_{half,low,high}"
- The half half confidence intervals and low and high values for absolute and relative confidence.
- "t_test"
- The T-test value for the given hypothesized difference.
- "t_test_result"
- Given the confidence rating, does the test pass? Will be either ``rejected'' or ``not-rejected''.
- "t_test_break"
- The hypothesized value that is break-even point for the T-test.
- "t_test_break_pct"
- Break-even point as a percent of m1c.
Confidence intervals are not printed if standard deviations are not provided. Confidence intervals assume normal distributions with common variances.
T-tests are only computed if a hypothesized difference is provided. Hypothesized differences should be proceeded by <=, >=, =. T-tests assume normal distributions with common variances.
OPTIONS
- -c FRACTION or --confidence FRACTION
- Specify FRACTION for the confidence interval. Defaults to 0.95 for a 95% confidence factor (alpha = 0.05).
- -f FORMAT or --format FORMAT
- Specify a printf(3)-style format for output statistics. Defaults to "%.5g".
- -h DIFF or --hypothesis DIFF
- Specify the hypothesized difference as "DIFF", where "DIFF" is something like "<=0" or ">=0", etc.
This module also supports the standard fsdb options:
- -d
- Enable debugging output.
- -i or --input InputSource
- Read from InputSource, typically a file name, or "-" for standard input, or (if in Perl) a IO::Handle, Fsdb::IO or Fsdb::BoundedQueue objects.
- -o or --output OutputDestination
- Write to OutputDestination, typically a file name, or "-" for standard output, or (if in Perl) a IO::Handle, Fsdb::IO or Fsdb::BoundedQueue objects.
- --autorun or --noautorun
- By default, programs process automatically, but Fsdb::Filter objects in Perl do not run until you invoke the run() method. The "--(no)autorun" option controls that behavior within Perl.
- --help
- Show help.
- --man
- Show full manual.
SAMPLE USAGE
Input:
#fsdb title mean2 stddev2 n2 mean1 stddev1 n1 example6.12 0.17 0.0020 5 0.22 0.0010 4
Command:
cat data.fsdb | dbrvstatdiff mean2 stddev2 n2 mean1 stddev1 n1
Output:
#fsdb title mean2 stddev2 n2 mean1 stddev1 n1 diff diff_pct diff_conf_half diff_conf_low diff_conf_high diff_conf_pct_half diff_conf_pct_low diff_conf_pct_high example6.12 0.17 0.0020 5 0.22 0.0010 4 0.05 29.412 0.0026138 0.047386 0.052614 1.5375 27.874 30.949 # | dbrvstatdiff mean2 stddev2 n2 mean1 stddev1 n1
Input 2:
(example 7.10 from Scheaffer and McClave):
#fsdb title x2 sd2 n2 x1 sd1 n1 example7.10 9 35.22 24.44 9 31.56 20.03
Command 2:
dbrvstatdiff -h '<=0' x2 sd2 n2 x1 sd1 n1
Output 2:
#fsdb title n1 x1 sd1 n2 x2 sd2 diff diff_pct diff_conf_half diff_conf_low diff_conf_high diff_conf_pct_half diff_conf_pct_low diff_conf_pct_high t_test t_test_result example7.10 9 35.22 24.44 9 31.56 20.03 3.66 0.11597 4.7125 -1.0525 8.3725 0.14932 -0.033348 0.26529 1.6465 not-rejected # | /global/us/edu/ucla/cs/ficus/users/johnh/BIN/DB/dbrvstatdiff -h <=0 x2 sd2 n2 x1 sd1 n1
Case 3:
A common use case is to have one file with a set of trials from two experiments, and to use dbrvstatdiff to see if they are different.Input 3:
#fsdb case trial value a 1 1 a 2 1.1 a 3 0.9 a 4 1 a 5 1.1 b 1 2 b 2 2.1 b 3 1.9 b 4 2 b 5 1.9
Command 3:
cat two_trial.fsdb | dbmultistats -k case value | dbcolcopylast mean stddev n | dbrow '_case eq "b"' | dbrvstatdiff -h '=0' mean stddev n copylast_mean copylast_stddev copylast_n | dblistize
Output 3:
#fsdb -R C case mean stddev pct_rsd conf_range conf_low conf_high conf_pct sum sum_squared min max n copylast_mean copylast_stddev copylast_n diff diff_pct diff_conf_half diff_conf_low diff_conf_high diff_conf_pct_half diff_conf_pct_low diff_conf_pct_high t_test t_test_result t_test_break t_test_break_pct case: b mean: 1.98 stddev: 0.083666 pct_rsd: 4.2256 conf_range: 0.10387 conf_low: 1.8761 conf_high: 2.0839 conf_pct: 0.95 sum: 9.9 sum_squared: 19.63 min: 1.9 max: 2.1 n: 5 copylast_mean: 1.02 copylast_stddev: 0.083666 copylast_n: 5 diff: -0.96 diff_pct: -48.485 diff_conf_half: 0.12202 diff_conf_low: -1.082 diff_conf_high: -0.83798 diff_conf_pct_half: 6.1627 diff_conf_pct_low: -54.648 diff_conf_pct_high: -42.322 t_test: -18.142 t_test_result: rejected t_test_break: -1.082 t_test_break_pct: -54.648 # | dbmultistats -k case value # | dbcolcopylast mean stddev n # | dbrow _case eq "b" # | dbrvstatdiff -h =0 mean stddev n copylast_mean copylast_stddev copylast_n # | dbfilealter -R C
(So one cannot say that they are statistically equal.)
AUTHOR and COPYRIGHT
Copyright (C) 1991-2015 by John Heidemann <johnh [at] isi.edu>This program is distributed under terms of the GNU general public license, version 2. See the file COPYING with the distribution for details.