Filewatcher File Search File Search
Catalog
Content Search
» » » » » libpdl-stats-perl_0.6.2-1_mipsel.deb » Content »
pkg://libpdl-stats-perl_0.6.2-1_mipsel.deb:335810/usr/share/man/man3/  info  control  downloads

libpdl-stats-perl - collection of statistics modules in Perl Data Language…  more info»

PDL::Stats::GLM.3pm.gz

GLM(3pm)         User Contributed Perl Documentation        GLM(3pm)



NAME
       PDL::Stats::GLM -- general and generalized linear modeling
       methods such as ANOVA, linear regression, PCA, and logistic
       regression.

DESCRIPTION
       The terms FUNCTIONS and METHODS are arbitrarily used to refer
       to methods that are threadable and methods that are NOT
       threadable, respectively. FUNCTIONS except ols_t support bad
       value. PDL::Slatec strongly recommended for most METHODS, and
       it is required for logistic.

       P-values, where appropriate, are provided if PDL::GSL::CDF is
       installed.

SYNOPSIS
           use PDL::LiteF;
           use PDL::NiceSlice;
           use PDL::Stats::GLM;

           # do a multiple linear regression and plot the residuals

           my $y = pdl( 8, 7, 7, 0, 2, 5, 0 );

           my $x = pdl( [ 0, 1, 2, 3, 4, 5, 6 ],        # linear component
                        [ 0, 1, 4, 9, 16, 25, 36 ] );   # quadratic component

           my %m  = $y->ols( $x, {plot=>1} );

           print "$_\t$m{$_}\n" for (sort keys %m);

FUNCTIONS
   fill_m
         Signature: (a(n); float+ [o]b(n))

       Replaces bad values with sample mean. Mean is set to 0 if all
       obs are bad. Can be done inplace.

            perldl> p $data
            [
             [  5 BAD   2 BAD]
             [  7   3   7 BAD]
            ]

            perldl> p $data->fill_m
            [
             [      5     3.5       2     3.5]
             [      7       3       7 5.66667]
            ]

       The output pdl badflag is cleared.

   fill_rand
         Signature: (a(n); [o]b(n))

       Replaces bad values with random sample (with replacement) of
       good observations from the same variable. Can be done
       inplace.

           perldl> p $data
           [
            [  5 BAD   2 BAD]
            [  7   3   7 BAD]
           ]

           perldl> p $data->fill_rand

           [
            [5 2 2 5]
            [7 3 7 7]
           ]

       The output pdl badflag is cleared.

   dev_m
         Signature: (a(n); float+ [o]b(n))

       Replaces values with deviations from the mean. Can be done
       inplace.

       dev_m does handle bad values.  It will set the bad-value flag
       of all output piddles if the flag is set for any of the input
       piddles.

   stddz
         Signature: (a(n); float+ [o]b(n))

       Standardize ie replace values with z_scores based on sample
       standard deviation from the mean (replace with 0s if
       stdv==0). Can be done inplace.

       stddz does handle bad values.  It will set the bad-value flag
       of all output piddles if the flag is set for any of the input
       piddles.

   sse
         Signature: (a(n); b(n); float+ [o]c())

       Sum of squared errors between actual and predicted values.

       sse does handle bad values.  It will set the bad-value flag
       of all output piddles if the flag is set for any of the input
       piddles.

   mse
         Signature: (a(n); b(n); float+ [o]c())

       Mean of squared errors between actual and predicted values,
       ie variance around predicted value.

       mse does handle bad values.  It will set the bad-value flag
       of all output piddles if the flag is set for any of the input
       piddles.

   rmse
         Signature: (a(n); b(n); float+ [o]c())

       Root mean squared error, ie stdv around predicted value.

       rmse does handle bad values.  It will set the bad-value flag
       of all output piddles if the flag is set for any of the input
       piddles.

   pred_logistic
         Signature: (a(n,m); b(m); float+ [o]c(n))

       Calculates predicted prob value for logistic regression.

           # glue constant then apply coeff returned by the logistic method

           $pred = $x->glue(1,ones($x->dim(0)))->pred_logistic( $m{b} );

       pred_logistic does handle bad values.  It will set the bad-
       value flag of all output piddles if the flag is set for any
       of the input piddles.

   d0
         Signature: (a(n); float+ [o]c())

           my $d0 = $y->d0();

       Null deviance for logistic regression.

       d0 does handle bad values.  It will set the bad-value flag of
       all output piddles if the flag is set for any of the input
       piddles.

   dm
         Signature: (a(n); b(n); float+ [o]c())

           my $dm = $y->dm( $y_pred );

             # null deviance
           my $d0 = $y->dm( ones($y->nelem) * $y->avg );

       Model deviance for logistic regression.

       dm does handle bad values.  It will set the bad-value flag of
       all output piddles if the flag is set for any of the input
       piddles.

   dvrs
         Signature: (a(); b(); float+ [o]c())

       Deviance residual for logistic regression.

       dvrs does handle bad values.  It will set the bad-value flag
       of all output piddles if the flag is set for any of the input
       piddles.

   ols_t
       Threaded version of ordinary least squares regression (ols).
       The price of threading was losing significance tests for
       coefficients (but see r2_change). The fitting function was
       shamelessly copied then modified from PDL::Fit::Linfit. Uses
       PDL::Slatec when possible but otherwise uses PDL::MatrixOps.
       Intercept is LAST of coeff if CONST => 1.

       ols_t does not handle bad values. consider fill_m or
       fill_rand if there are bad values.

       Default options (case insensitive):

           CONST   => 1,

       Usage:

           # DV, 2 person's ratings for top-10 box office movies
           # ascending sorted by box office numbers

           perldl> p $y = qsort ceil( random(10, 2)*5 )
           [
            [1 1 2 4 4 4 4 5 5 5]
            [1 2 2 2 3 3 3 3 5 5]
           ]

           # model with 2 IVs, a linear and a quadratic trend component

           perldl> $x = cat sequence(10), sequence(10)**2

           # suppose our novice modeler thinks this creates 3 different models
           # for predicting movie ratings

           perldl> p $x = cat $x, $x * 2, $x * 3
           [
            [
             [ 0  1  2  3  4  5  6  7  8  9]
             [ 0  1  4  9 16 25 36 49 64 81]
            ]
            [
             [  0   2   4   6   8  10  12  14  16  18]
             [  0   2   8  18  32  50  72  98 128 162]
            ]
            [
             [  0   3   6   9  12  15  18  21  24  27]
             [  0   3  12  27  48  75 108 147 192 243]
            ]
           ]

           perldl> p $x->info
           PDL: Double D [10,2,3]

           # insert a dummy dim between IV and the dim (model) to be threaded

           perldl> %m = $y->ols_t( $x->dummy(2) )

           perldl> p "$_\t$m{$_}\n" for (sort keys %m)

           # 2 persons' ratings, eached fitted with 3 "different" models

           F
           [
            [ 38.314159  25.087209]
            [ 38.314159  25.087209]
            [ 38.314159  25.087209]
           ]

           # df is the same across dv and iv models

           F_df    [2 7]
           F_p
           [
            [0.00016967051 0.00064215074]
            [0.00016967051 0.00064215074]
            [0.00016967051 0.00064215074]
           ]

           R2
           [
            [ 0.9162963 0.87756762]
            [ 0.9162963 0.87756762]
            [ 0.9162963 0.87756762]
           ]

           b
           [  # linear      quadratic     constant
            [
             [  0.99015152 -0.056818182   0.66363636]    # person 1
             [  0.18939394  0.022727273          1.4]    # person 2
            ]
            [
             [  0.49507576 -0.028409091   0.66363636]
             [  0.09469697  0.011363636          1.4]
            ]
            [
             [  0.33005051 -0.018939394   0.66363636]
             [ 0.063131313 0.0075757576          1.4]
            ]
           ]

           # our novice modeler realizes at this point that
           # the 3 models only differ in the scaling of the IV coefficients

           ss_model
           [
            [ 20.616667  13.075758]
            [ 20.616667  13.075758]
            [ 20.616667  13.075758]
           ]

           ss_residual
           [
            [ 1.8833333  1.8242424]
            [ 1.8833333  1.8242424]
            [ 1.8833333  1.8242424]
           ]

           ss_total        [22.5 14.9]
           y_pred
           [
            [
             [0.66363636  1.5969697  2.4166667  3.1227273  ...  4.9727273]
           ...

   r2_change
       Significance test for the incremental change in R2 when new
       variable(s) are added to an ols regression model. Returns the
       change stats as well as stats for both models. Based on
       ols_t. (One way to make up for the lack of significance tests
       for coeffs in ols_t).

       Default options (case insensitive):

           CONST   => 1,

       Usage:

           # suppose these are two persons' ratings for top 10 box office movies
           # ascending sorted by box office

           perldl> p $y = qsort ceil(random(10, 2) * 5)
           [
            [1 1 2 2 2 3 4 4 4 4]
            [1 2 2 3 3 3 4 4 5 5]
           ]

           # first IV is a simple linear trend

           perldl> p $x1 = sequence 10
           [0 1 2 3 4 5 6 7 8 9]

           # the modeler wonders if adding a quadratic trend improves the fit

           perldl> p $x2 = sequence(10) ** 2
           [0 1 4 9 16 25 36 49 64 81]

           # two difference models are given in two pdls
           # each as would be pass on to ols_t
           # the 1st model includes only linear trend
           # the 2nd model includes linear and quadratic trends
           # when necessary use dummy dim so both models have the same ndims

           perldl> %c = $y->r2_change( $x1->dummy(1), cat($x1, $x2) )

           perldl> p "$_\t$c{$_}\n" for (sort keys %c)
             #              person 1   person 2
           F_change        [0.72164948 0.071283096]
             # df same for both persons
           F_df    [1 7]
           F_p     [0.42370145 0.79717232]
           R2_change       [0.0085966043 0.00048562549]
           model0  HASH(0x8c10828)
           model1  HASH(0x8c135c8)

           # the answer here is no.

METHODS
   anova
       Analysis of variance. Uses type III sum of squares for
       unbalanced data.

       anova supports bad value in the dependent variable.

       Default options (case insensitive):

           V      => 1,       # carps if bad value in dv
           IVNM   => [],      # auto filled as ['IV_0', 'IV_1', ... ]
           PLOT   => 1,       # plots highest order effect
                              # can set plot_means options here

       Usage:

           # suppose this is ratings for 12 apples

           perldl> p $y = qsort ceil( random(12)*5 )
           [1 1 2 2 2 3 3 4 4 4 5 5]

           # IV for types of apple

           perldl> p $a = sequence(12) % 3 + 1
           [1 2 3 1 2 3 1 2 3 1 2 3]

           # IV for whether we baked the apple

           perldl> @b = qw( y y y y y y n n n n n n )

           perldl> %m = $y->anova( $a, \@b, { IVNM=>['apple', 'bake'] } )

           perldl> p "$_\t$m{$_}\n" for (sort keys %m)
           # apple # m
           [
            [2.5   3 3.5]
           ]

           # apple # se
           [
            [0.64549722 0.91287093 0.64549722]
           ]

           # apple ~ bake # m
           [
            [1.5 1.5 2.5]
            [3.5 4.5 4.5]
           ]

           # apple ~ bake # se
           [
            [0.5 0.5 0.5]
            [0.5 0.5 0.5]
           ]

           # bake # m
           [
            [ 1.8333333  4.1666667]
           ]

           # bake # se
           [
            [0.30731815 0.30731815]
           ]

           F       7.6
           F_df    [5 6]
           F_p     0.0141586545851857
           ms_model        3.8
           ms_residual     0.5
           ss_model        19
           ss_residual     3
           ss_total        22
           | apple | F     2
           | apple | F_df  [2 6]
           | apple | F_p   0.216
           | apple | ms    1
           | apple | ss    2
           | apple ~ bake | F      0.666666666666667
           | apple ~ bake | F_df   [2 6]
           | apple ~ bake | F_p    0.54770848985725
           | apple ~ bake | ms     0.333333333333334
           | apple ~ bake | ss     0.666666666666667
           | bake | F      32.6666666666667
           | bake | F_df   [1 6]
           | bake | F_p    0.00124263849516693
           | bake | ms     16.3333333333333
           | bake | ss     16.3333333333333

   anova_rptd
       Repeated measures and mixed model anova. Uses type III sum of
       squares. The standard error (se) for the means are based on
       the relevant mean squared error from the anova, ie it is
       pooled across levels of the effect.

       anova_rptd supports bad value in the dependent variable. It
       automatically removes bad data listwise, ie remove a
       subject's data if there is any cell missing for the subject.

       Default options (case insensitive):

           V      => 1,    # carps if bad value in dv
           IVNM   => [],   # auto filled as ['IV_0', 'IV_1', ... ]
           BTWN   => [],   # indices of between-subject IVs (matches IVNM indices)
           PLOT   => 1,    # plots highest order effect
                           # see plot_means() for more options

       Usage:

           Some fictional data: recall_w_beer_and_wings.txt

           Subject Beer    Wings   Recall
           Alex    1       1       8
           Alex    1       2       9
           Alex    1       3       12
           Alex    2       1       7
           Alex    2       2       9
           Alex    2       3       12
           Brian   1       1       12
           Brian   1       2       13
           Brian   1       3       14
           Brian   2       1       9
           Brian   2       2       8
           Brian   2       3       14
           ...

             # rtable allows text only in 1st row and col
           my ($data, $idv, $subj) = rtable 'recall_w_beer_and_wings.txt';

           my ($b, $w, $dv) = $data->dog;
             # subj and IVs can be 1d pdl or @ ref
             # subj must be the first argument
           my %m = $dv->anova_rptd( $subj, $b, $w, {ivnm=>['Beer', 'Wings']} );

           print "$_\t$m{$_}\n" for (sort keys %m);

           # Beer # m
           [
            [ 10.916667  8.9166667]
           ]

           # Beer # se
           [
            [ 0.4614791  0.4614791]
           ]

           # Beer ~ Wings # m
           [
            [   10     7]
            [ 10.5  9.25]
            [12.25  10.5]
           ]

           # Beer ~ Wings # se
           [
            [0.89170561 0.89170561]
            [0.89170561 0.89170561]
            [0.89170561 0.89170561]
           ]

           # Wings # m
           [
            [   8.5  9.875 11.375]
           ]

           # Wings # se
           [
            [0.67571978 0.67571978 0.67571978]
           ]

           ss_residual 19.0833333333333
           ss_subject  24.8333333333333
           ss_total    133.833333333333
           | Beer | F  9.39130434782609
           | Beer | F_p        0.0547977008378944
           | Beer | df 1
           | Beer | ms 24
           | Beer | ss 24
           | Beer || err df    3
           | Beer || err ms    2.55555555555556
           | Beer || err ss    7.66666666666667
           | Beer ~ Wings | F  0.510917030567687
           | Beer ~ Wings | F_p        0.623881438624431
           | Beer ~ Wings | df 2
           | Beer ~ Wings | ms 1.625
           | Beer ~ Wings | ss 3.25000000000001
           | Beer ~ Wings || err df    6
           | Beer ~ Wings || err ms    3.18055555555555
           | Beer ~ Wings || err ss    19.0833333333333
           | Wings | F 4.52851711026616
           | Wings | F_p       0.0632754786153548
           | Wings | df        2
           | Wings | ms        16.5416666666667
           | Wings | ss        33.0833333333333
           | Wings || err df   6
           | Wings || err ms   3.65277777777778
           | Wings || err ss   21.9166666666667

       For mixed model anova, ie when there are between-subject IVs
       involved, feed the IVs as above, but specify in BTWN which
       IVs are between-subject. For example, if we had added age as
       a between-subject IV in the above example, we would do

           my %m = $dv->anova_rptd( $subj, $age, $b, $w,
                                  { ivnm=>['Age', 'Beer', 'Wings'], btwn=>[0] });

   dummy_code
       Dummy coding of nominal variable (perl @ ref or 1d pdl) for
       use in regression.

           perldl> @a = qw(a a a b b b c c c)
           perldl> p $a = dummy_code(\@a)
           [
            [1 1 1 0 0 0 0 0 0]
            [0 0 0 1 1 1 0 0 0]
           ]

   effect_code
       Unweighted effect coding of nominal variable (perl @ ref or
       1d pdl) for use in regression. returns in @ context coded pdl
       and % ref to level - pdl->dim(1) index.

           my @var = qw( a a a b b b c c c );
           my ($var_e, $map) = effect_code( \@var );

           print $var_e . $var_e->info . "\n";

           [
            [ 1  1  1  0  0  0 -1 -1 -1]
            [ 0  0  0  1  1  1 -1 -1 -1]
           ]
           PDL: Double D [9,2]

           print "$_\t$map->{$_}\n" for (sort keys %$map)
           a       0
           b       1
           c       2

   effect_code_w
       Weighted effect code for nominal variable. returns in @
       context coded pdl and % ref to level - pdl->dim(1) index.

           perldl> @a = qw( a a b b b c c )
           perldl> p $a = effect_code_w(\@a)
           [
            [   1    1    0    0    0   -1   -1]
            [   0    0    1    1    1 -1.5 -1.5]
           ]

   interaction_code
       Returns the coded interaction term for effect-coded
       variables.

           perldl> $a = sequence(6) > 2
           perldl> p $a = $a->effect_code
           [
            [ 1  1  1 -1 -1 -1]
           ]

           perldl> $b = pdl( qw( 0 1 2 0 1 2 ) )
           perldl> p $b = $b->effect_code
           [
            [ 1  0 -1  1  0 -1]
            [ 0  1 -1  0  1 -1]
           ]

           perldl> p $ab = interaction_code( $a, $b )
           [
            [ 1  0 -1 -1 -0  1]
            [ 0  1 -1 -0 -1  1]
           ]

   ols
       Ordinary least squares regression, aka linear regression.
       Unlike ols_t, ols returns the full model in list context with
       various stats, but is not threadable.

       IVs ($x) should be pdl dims $y->nelem or $y->nelem x n_iv. Do
       not supply the constant vector in $x. Intercept is
       automatically added and returned as LAST of the coeffs if
       CONST=>1. Returns full model in list context and coeff in
       scalar context.

       Default options (case insensitive):

           CONST  => 1,
           PLOT   => 1,   # see plot_residuals() for plot options

       Usage:

           # suppose this is a person's ratings for top 10 box office movies
           # ascending sorted by box office

           perldl> p $y = qsort ceil( random(10) * 5 )
           [1 1 2 2 2 2 4 4 5 5]

           # construct IV with linear and quadratic component

           perldl> p $x = cat sequence(10), sequence(10)**2
           [
            [ 0  1  2  3  4  5  6  7  8  9]
            [ 0  1  4  9 16 25 36 49 64 81]
           ]

           perldl> %m = $y->ols( $x )

           perldl> p "$_\t$m{$_}\n" for (sort keys %m)

           F       40.4225352112676
           F_df    [2 7]
           F_p     0.000142834216344756
           R2      0.920314253647587

           # coeff  linear     quadratic  constant

           b       [0.21212121 0.03030303 0.98181818]
           b_p     [0.32800118 0.20303404 0.039910509]
           b_se    [0.20174693 0.021579989 0.38987581]
           b_t     [ 1.0514223   1.404219  2.5182844]
           ss_model        19.8787878787879
           ss_residual     1.72121212121212
           ss_total        21.6
           y_pred  [0.98181818  1.2242424  1.5272727  ...  4.6181818  5.3454545]

   ols_rptd
       Repeated measures linear regression (Lorch & Myers, 1990; Van
       den Noortgate & Onghena, 2006). Handles purely within-subject
       design for now. See t/stats_ols_rptd.t for an example using
       the Lorch and Myers data.

       Usage:

           # This is the example from Lorch and Myers (1990),
           # a study on how characteristics of sentences affected reading time
           # Three within-subject IVs:
           # SP -- serial position of sentence
           # WORDS -- number of words in sentence
           # NEW -- number of new arguments in sentence

           # $subj can be 1D pdl or @ ref and must be the first argument
           # IV can be 1D @ ref or pdl
           # 1D @ ref is effect coded internally into pdl
           # pdl is left as is

           my %r = $rt->ols_rptd( $subj, $sp, $words, $new );

           print "$_\t$r{$_}\n" for (sort keys %r);

           (ss_residual)   58.3754646504336
           (ss_subject)    51.8590337714286
           (ss_total)  405.188241771429
           #      SP        WORDS      NEW
           F   [  7.208473  61.354153  1.0243311]
           F_p [0.025006181 2.619081e-05 0.33792837]
           coeff   [0.33337285 0.45858933 0.15162986]
           df  [1 1 1]
           df_err  [9 9 9]
           ms  [ 18.450705  73.813294 0.57026483]
           ms_err  [ 2.5595857  1.2030692 0.55671923]
           ss  [ 18.450705  73.813294 0.57026483]
           ss_err  [ 23.036272  10.827623  5.0104731]

   logistic
       Logistic regression with maximum likelihood estimation using
       PDL::Fit::LM (requires PDL::Slatec. Hence loaded with
       "require" in the sub instead of "use" at the beginning).

       IVs ($x) should be pdl dims $y->nelem or $y->nelem x n_iv. Do
       not supply the constant vector in $x. It is included in the
       model and returned as LAST of coeff. Returns full model in
       list context and coeff in scalar context.

       The significance tests are likelihood ratio tests (-2LL
       deviance) tests. IV significance is tested by comparing
       deviances between the reduced model (ie with the IV in
       question removed) and the full model.

       ***NOTE: the results here are qualitatively similar to but
       not identical with results from R, because different
       algorithms are used for the nonlinear parameter fit. Use with
       discretion***

       Default options (case insensitive):

           INITP => zeroes( $x->dim(1) + 1 ),    # n_iv + 1
           MAXIT => 1000,
           EPS   => 1e-7,

       Usage:

           # suppose this is whether a person had rented 10 movies

           perldl> p $y = ushort( random(10)*2 )
           [0 0 0 1 1 0 0 1 1 1]

           # IV 1 is box office ranking

           perldl> p $x1 = sequence(10)
           [0 1 2 3 4 5 6 7 8 9]

           # IV 2 is whether the movie is action- or chick-flick

           perldl> p $x2 = sequence(10) % 2
           [0 1 0 1 0 1 0 1 0 1]

           # concatenate the IVs together

           perldl> p $x = cat $x1, $x2
           [
            [0 1 2 3 4 5 6 7 8 9]
            [0 1 0 1 0 1 0 1 0 1]
           ]

           perldl> %m = $y->logistic( $x )

           perldl> p "$_\t$m{$_}\n" for (sort keys %m)

           D0  13.8629436111989
           Dm  9.8627829791575
           Dm_chisq    4.00016063204141
           Dm_df       2
           Dm_p        0.135324414081692
             #  ranking    genre      constant
           b   [0.41127706 0.53876358 -2.1201285]
           b_chisq     [ 3.5974504 0.16835559  2.8577151]
           b_p [0.057868258  0.6815774 0.090936587]
           iter        12
           y_pred      [0.10715577 0.23683909 ... 0.76316091 0.89284423]

   pca
       Principal component analysis. Based on corr instead of cov
       (bad values are ignored pair-wise. OK when bad values are few
       but otherwise probably should fill_m etc before pca). Use
       PDL::Slatec::eigsys() if installed, otherwise use
       PDL::MatrixOps::eigens_sym().

       Default options (case insensitive):

           CORR  => 1,     # boolean. use correlation or covariance
           PLOT  => 1,     # calls plot_screes by default
                           # can set plot_screes options here

       Usage:

           my $d = qsort random 10, 5;      # 10 obs on 5 variables
           my %r = $d->pca( \%opt );
           print "$_\t$r{$_}\n" for (keys %r);

           eigenvalue    # variance accounted for by each component
           [4.70192 0.199604 0.0471421 0.0372981 0.0140346]

           eigenvector   # dim var x comp. weights for mapping variables to component
           [
            [ -0.451251  -0.440696  -0.457628  -0.451491  -0.434618]
            [ -0.274551   0.582455   0.131494   0.255261  -0.709168]
            [   0.43282   0.500662  -0.139209  -0.735144 -0.0467834]
            [  0.693634  -0.428171   0.125114   0.128145  -0.550879]
            [  0.229202   0.180393  -0.859217     0.4173  0.0503155]
           ]

           loadings      # dim var x comp. correlation between variable and component
           [
            [ -0.978489  -0.955601  -0.992316   -0.97901  -0.942421]
            [ -0.122661   0.260224  0.0587476   0.114043  -0.316836]
            [ 0.0939749   0.108705 -0.0302253  -0.159616 -0.0101577]
            [   0.13396 -0.0826915  0.0241629  0.0247483   -0.10639]
            [  0.027153  0.0213708  -0.101789  0.0494365 0.00596076]
           ]

           pct_var       # percent variance accounted for by each component
           [0.940384 0.0399209 0.00942842 0.00745963 0.00280691]

       Plot scores along the first two components,

           $d->plot_scores( $r{eigenvector} );

   pca_sorti
       Determine by which vars a component is best represented.
       Descending sort vars by size of association with that
       component. Returns sorted var and relevant component indices.

       Default options (case insensitive):

           NCOMP => 10,     # maximum number of components to consider

       Usage:

             # let's see if we replicated the Osgood et al. (1957) study
           perldl> ($data, $idv, $ido) = rtable 'osgood_exp.csv', {v=>0}

             # select a subset of var to do pca
           perldl> $ind = which_id $idv, [qw( ACTIVE BASS BRIGHT CALM FAST GOOD HAPPY HARD LARGE HEAVY )]
           perldl> $data = $data( ,$ind)->sever
           perldl> @$idv = @$idv[list $ind]

           perldl> %m = $data->pca

           perldl> ($iv, $ic) = $m{loadings}->pca_sorti()

           perldl> p "$idv->[$_]\t" . $m{loadings}->($_,$ic)->flat . "\n" for (list $iv)

                    #   COMP0     COMP1    COMP2    COMP3
           HAPPY       [0.860191 0.364911 0.174372 -0.10484]
           GOOD        [0.848694 0.303652 0.198378 -0.115177]
           CALM        [0.821177 -0.130542 0.396215 -0.125368]
           BRIGHT      [0.78303 0.232808 -0.0534081 -0.0528796]
           HEAVY       [-0.623036 0.454826 0.50447 0.073007]
           HARD        [-0.679179 0.0505568 0.384467 0.165608]
           ACTIVE      [-0.161098 0.760778 -0.44893 -0.0888592]
           FAST        [-0.196042 0.71479 -0.471355 0.00460276]
           LARGE       [-0.241994 0.594644 0.634703 -0.00618055]
           BASS        [-0.621213 -0.124918 0.0605367 -0.765184]

   plot_means
       Plots means anova style. Can handle up to 4-way interactions
       (ie 4D pdl).

       Default options (case insensitive):

           IVNM  => ['IV_0', 'IV_1', 'IV_2', 'IV_3'],
           DVNM  => 'DV',
           AUTO  => 1,       # auto set dims to be on x-axis, line, panel
                             # if set 0, dim 0 goes on x-axis, dim 1 as lines
                             # dim 2+ as panels
             # see PDL::Graphics::PGPLOT::Window for next options
           WIN   => undef,   # pgwin object. not closed here if passed
                             # allows comparing multiple lines in same plot
                             # set env before passing WIN
           DEV   => '/xs',         # open and close dev for plotting if no WIN
                                   # defaults to '/png' in Windows
           SIZE  => 640,           # individual square panel size in pixels
           SYMBL => [0, 4, 7, 11],

       Usage:

             # see anova for mean / se pdl structure
           $mean->plot_means( $se, {IVNM=>['apple', 'bake']} );

       Or like this:

           $m{'# apple ~ bake # m'}->plot_means;

   plot_residuals
       Plots residuals against predicted values.

       Usage:

           $y->plot_residuals( $y_pred, { dev=>'/png' } );

       Default options (case insensitive):

            # see PDL::Graphics::PGPLOT::Window for more info
           WIN   => undef,  # pgwin object. not closed here if passed
                            # allows comparing multiple lines in same plot
                            # set env before passing WIN
           DEV   => '/xs',  # open and close dev for plotting if no WIN
                            # defaults to '/png' in Windows
           SIZE  => 640,    # plot size in pixels
           COLOR => 1,

   plot_scores
       Plots standardized original and PCA transformed scores
       against two components. (Thank you, Bob MacCallum, for the
       documentation suggestion that led to this function.)

       Default options (case insensitive):

         CORR  => 1,      # boolean. PCA was based on correlation or covariance
         COMP  => [0,1],  # indices to components to plot
           # see PDL::Graphics::PGPLOT::Window for next options
         WIN   => undef,  # pgwin object. not closed here if passed
                          # allows comparing multiple lines in same plot
                          # set env before passing WIN
         DEV   => '/xs',  # open and close dev for plotting if no WIN
                          # defaults to '/png' in Windows
         SIZE  => 640,    # plot size in pixels
         COLOR => [1,2],  # color for original and rotated scores

       Usage:

         my %p = $data->pca();
         $data->plot_scores( $p{eigenvector}, \%opt );

   plot_screes
       Scree plot. Plots proportion of variance accounted for by PCA
       components.

       Default options (case insensitive):

         NCOMP => 20,     # max number of components to plot
         CUT   => 0,      # set to plot cutoff line after this many components
                          # undef to plot suggested cutoff line for NCOMP comps
          # see PDL::Graphics::PGPLOT::Window for next options
         WIN   => undef,  # pgwin object. not closed here if passed
                          # allows comparing multiple lines in same plot
                          # set env before passing WIN
         DEV   => '/xs',  # open and close dev for plotting if no WIN
                          # defaults to '/png' in Windows
         SIZE  => 640,    # plot size in pixels
         COLOR => 1,

       Usage:

         # variance should be in descending order

         $pca{var}->plot_screes( {ncomp=>16} );

       Or, because NCOMP is used so often, it is allowed a shortcut,

         $pca{var}->plot_screes( 16 );

SEE ALSO
       PDL::Fit::Linfit

       PDL::Fit::LM

REFERENCES
       Cohen, J., Cohen, P., West, S.G., & Aiken, L.S. (2003).
       Applied Multiple Regression/correlation Analysis for the
       Behavioral Sciences (3rd ed.). Mahwah, NJ: Lawrence Erlbaum
       Associates Publishers.

       Hosmer, D.W., & Lemeshow, S. (2000). Applied Logistic
       Regression (2nd ed.). New York, NY: Wiley-Interscience.

       Lorch, R.F., & Myers, J.L. (1990). Regression analyses of
       repeated measures data in cognitive research. Journal of
       Experimental Psychology: Learning, Memory, & Cognition, 16,
       149-157.

       Osgood C.E., Suci, G.J., & Tannenbaum, P.H. (1957). The
       Measurement of Meaning. Champaign, IL: University of Illinois
       Press.

       Rutherford, A. (2001). Introducing Anova and Ancova: A GLM
       Approach (1st ed.). Thousand Oaks, CA: Sage Publications.

       Shlens, J. (2009). A Tutorial on Principal Component
       Analysis. Retrieved April 10, 2011 from
       http://citeseerx.ist.psu.edu/

       The GLM procedure: unbalanced ANOVA for two-way design with
       interaction. (2008). SAS/STAT(R) 9.2 User's Guide. Retrieved
       June 18, 2009 from http://support.sas.com/

       Van den Noortgatea, W., & Onghenaa, P. (2006). Analysing
       repeated measures data in cognitive research: A comment on
       regression coefficient analyses. European Journal of
       Cognitive Psychology, 18, 937-952.

AUTHOR
       Copyright (C) 2009 Maggie J. Xiong <maggiexyz
       users.sourceforge.net>

       All rights reserved. There is no warranty. You are allowed to
       redistribute this software / documentation as described in
       the file COPYING in the PDL distribution.



perl v5.14.2                 2012-06-07                     GLM(3pm)
Results 1 - 1 of 1
Help - FTP Sites List - Software Dir.
Search over 15 billion files
© 1997-2017 FileWatcher.com