magic                  package:mgcv                  R Documentation

_S_t_a_b_l_e _M_u_l_t_i_p_l_e _S_m_o_o_t_h_i_n_g _P_a_r_a_m_e_t_e_r _E_s_t_i_m_a_t_i_o_n _b_y _G_C_V _o_r _U_B_R_E, 
_w_i_t_h _o_p_t_i_o_n_a_l _f_i_x_e_d _p_e_n_a_l_t_y

_D_e_s_c_r_i_p_t_i_o_n:

     Function to efficiently estimate smoothing parameters in
     generalized ridge regression problems with multiple (quadratic)
     penalties, by GCV or UBRE. The function uses Newton's method in
     multiple dimensions, backed up by steepest descent, to iteratively
     adjust the smoothing parameter for each penalty (one penalty may
     have its smoothing parameter fixed at unity).

     For maximal numerical stability the method is based on orthogonal
     decomposition methods, and attempts to deal with numerical rank
     deficiency gracefully using a truncated singular value
     decomposition approach.

_U_s_a_g_e:

     magic(y,X,sp,S,off,rank=NULL,H=NULL,C=NULL,w=NULL,
           gamma=1,scale=1,gcv=TRUE,ridge.parameter=NULL,
           control=list(maxit=50,tol=1e-6,step.half=25,
                   rank.tol=.Machine$double.eps^0.5))

_A_r_g_u_m_e_n_t_s:

       y: is the response data vector.

       X: is the model matrix.

      sp: is the array of smoothing parameters multiplying the penalty
          matrices stored in 'S'. Any that are negative are
          autoinitialized; otherwise they are taken as supplying
          starting values. A supplied starting value will be reset to a
          default starting value if the gradient of the GCV/UBRE score
          is too small at the supplied value.

       S: is a list of penalty matrices. 'S[[i]]' is the ith penalty
          matrix. Note that it is not stored as a full matrix. Instead
          it is stored as the smallest square matrix including all the
          non-zero elements of the penalty matrix. Element 1,1 of
          'S[[i]]' occupies element 'off[i]', 'off[i]' of the ith
          penalty matrix. Each 'S[[i]]' must be positive
          semi-definite.

     off: is an array whose ith element gives the index of the first
          parameter in the parameter vector that is penalized by the
          penalty involving 'S[[i]]'.

    rank: is an array specifying the ranks of the penalties. This is
          useful, but not essential, for forming square roots of the
          penalty matrices.

       H: is the optional offset penalty, i.e. a penalty with a
          smoothing parameter fixed at 1. This is useful for allowing
          regularization of the estimation process, fixed smoothing
          penalties, etc.

       C: is the optional matrix specifying any linear equality
          constraints on the fitting problem. If b is the parameter
          vector then the parameters are forced to satisfy Cb=0.

       w: the regression weights. If this is a matrix then it is taken
          as being the square root of the inverse of the covariance
          matrix of 'y', specifically V_y^{-1}=w'w. If 'w' is an array
          then it is taken as the diagonal of this matrix, or simply
          the weight for each element of 'y'.

   gamma: is an inflation factor for the model degrees of freedom in
          the GCV or UBRE score.

   scale: is the scale parameter for use with UBRE.

     gcv: should be set to 'TRUE' if GCV is to be used, 'FALSE' for
          UBRE.

ridge.parameter: It is sometimes useful to apply a ridge penalty to the
          fitting problem, penalizing the parameters in the
          constrained space directly. Setting this parameter to a value
          greater than zero will cause such a penalty to be used, with
          the magnitude given by the parameter value.

 control: is a list of iteration control constants with the following
          elements:

     _m_a_x_i_t The maximum number of iterations of the magic algorithm to
          allow.

     _t_o_l The tolerance to use in judging convergence.

     _s_t_e_p._h_a_l_f If a trial step fails then the method tries halving it
          up to a maximum of  'step.half' times.

     _r_a_n_k._t_o_l is a constant used to test for numerical rank deficiency
          of the problem.  Basically any singular value less than
          'rank_tol' multiplied by the largest singular value of  the 
          problem is set to zero.
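     The compact storage of 'S' can be made concrete with a small
     example. The sketch below expands one compact block 'S[[i]]' into
     the full p by p penalty matrix using its offset 'off[i]'. The
     helper 'embed_penalty' is hypothetical and is not an mgcv function.
     The second-difference block is just one example of a positive
     semi-definite penalty.

```r
# Hypothetical helper (not an mgcv function): place a compact k x k
# penalty block Si into a p x p zero matrix so that Si[1, 1] lands at
# position off_i, off_i, as described for arguments 'S' and 'off'.
embed_penalty <- function(Si, off_i, p) {
  k <- nrow(Si)
  idx <- off_i:(off_i + k - 1)
  stopifnot(max(idx) <= p)                # block must fit inside p x p
  full <- matrix(0, p, p)
  full[idx, idx] <- Si
  full
}

# A 3 x 3 second-difference penalty acting on parameters 2:4 of a
# 5-parameter model.
D <- diff(diag(3), differences = 2)       # 1 x 3 matrix: (1, -2, 1)
S1 <- crossprod(D)                        # D'D is positive semi-definite
S1.full <- embed_penalty(S1, off_i = 2, p = 5)
S1.full
```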

_D_e_t_a_i_l_s:

     The method is a computationally efficient means of applying GCV or
     UBRE (often approximately AIC) to the problem of smoothing
     parameter selection in generalized ridge regression problems of
     the form:

 minimise || W (Xb-y) ||^2 + b'Hb + theta_1 b'S_1 b + theta_2 b'S_2 b + . . .

     possibly subject to constraints Cb=0. X is a design matrix, b a
     parameter vector, y a data vector, W a weight matrix, S_i a
     positive semi-definite matrix of coefficients defining the ith
     penalty with associated smoothing parameter theta_i, H the
     positive semi-definite offset penalty matrix, and C a matrix of
     coefficients defining any linear equality constraints on the
     problem. X need not be of full column rank.
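     For fixed theta_i, the minimization above is a penalized least
     squares problem. The base R sketch below assumes W = I. It removes
     the constraints by writing b = Z b_z, where the columns of Z span
     the null space of C, and then solves the normal equations. Several
     points are assumptions of the sketch and not part of magic:

       - the helper 'fit_fixed_sp' is hypothetical;
       - its 'S' argument holds full p by p penalty matrices rather than
         the compact blocks that magic takes;
       - solve() is used only for brevity, whereas magic relies on
         orthogonal decompositions for stability.

```r
# Hypothetical helper: for FIXED theta, minimise
#   ||Xb - y||^2 + b'Hb + sum_i theta_i b'S_i b   subject to Cb = 0
# (W = I). Constraints are absorbed by b = Z bz, with Z a basis for the
# null space of C taken from a QR decomposition of t(C).
fit_fixed_sp <- function(y, X, S, theta, C = NULL, H = NULL) {
  p <- ncol(X)
  P <- if (is.null(H)) matrix(0, p, p) else H
  for (i in seq_along(S)) P <- P + theta[i] * S[[i]]   # total penalty
  if (is.null(C)) {
    Z <- diag(p)
  } else {
    m <- nrow(C)                                       # assumes C has full row rank
    Z <- qr.Q(qr(t(C)), complete = TRUE)[, -(1:m), drop = FALSE]
  }
  XZ <- X %*% Z
  bz <- solve(crossprod(XZ) + t(Z) %*% P %*% Z, crossprod(XZ, y))
  drop(Z %*% bz)                                       # back to original parameters
}

set.seed(3)
n <- 50; x <- sort(runif(n))
X <- cbind(1, x, x^2, x^3)
y <- sin(2 * pi * x) + rnorm(n, sd = 0.2)
S1 <- diag(c(0, 0, 1, 1))                 # penalize the quadratic and cubic terms
C <- matrix(c(0, 1, 1, 1), nrow = 1)      # constraint: b2 + b3 + b4 = 0
b.free <- fit_fixed_sp(y, X, list(S1), theta = 0)
b.con <- fit_fixed_sp(y, X, list(S1), theta = 1, C = C)
```

     With theta = 0 and no constraints the sketch reduces to ordinary
     least squares. With 'C' supplied, the returned b satisfies Cb = 0
     up to rounding error.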

     The theta_i are chosen to minimize either the GCV score:


                V_g = n ||W(y-Ay)||^2/[tr(I - g A)]^2


     or the UBRE score:


             V_u = ||W(y-Ay)||^2/n - 2 s tr(I - g A)/n + s


     where g is 'gamma', the inflation factor for degrees of freedom
     (usually set to 1), and s is 'scale', the scale parameter. A is
     the hat matrix (influence matrix) for the fitting problem (i.e. the
     matrix mapping data to fitted values). Dependence of the scores on
     the smoothing parameters is through A.
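     For small problems, the two scores can be evaluated directly by
     forming A explicitly. The sketch below assumes a single penalty,
     W = I and H = 0. The helper 'scores' is hypothetical. Forming the
     n by n matrix A is exactly the expensive step that magic's
     decomposition-based approach avoids.

```r
# Hypothetical helper: GCV and UBRE scores for one fixed theta
# (single penalty S, W = I, H = 0), following the formulas above.
scores <- function(y, X, S, theta, g = 1, s = 1) {
  n <- length(y)
  A <- X %*% solve(crossprod(X) + theta * S, t(X))  # hat (influence) matrix
  rss <- sum((y - A %*% y)^2)                       # ||y - Ay||^2
  trIA <- n - g * sum(diag(A))                      # tr(I - gA)
  c(gcv = n * rss / trIA^2,
    ubre = rss / n - 2 * s * trIA / n + s)
}

set.seed(4)
n <- 100; x <- sort(runif(n))
X <- cbind(1, cos(outer(x, 1:8) * pi))   # cosine basis, 9 columns
S <- diag(c(0, (1:8)^4))                 # heavier penalty on rougher terms
y <- sin(2 * pi * x) + rnorm(n, sd = 0.3)
log.theta <- seq(-12, 4, by = 0.5)
V <- sapply(log.theta, function(lt) scores(y, X, S, exp(lt))[["gcv"]])
log.theta[which.min(V)]                  # crude grid estimate of log(theta)
```

     A coarse grid search like this is a useful check. However, it
     scales poorly with the number of smoothing parameters, which is why
     magic instead updates the log(theta_i) using derivatives.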

     The method operates by Newton or steepest descent updates of the
     logs of the theta_i. A key aspect of the method is stable and
     economical calculation of the first and second derivatives of the
     scores w.r.t. the log smoothing parameters. Because the GCV/UBRE
     scores are flat w.r.t. very large or very small theta_i, it is
     important to get good starting parameters, and to be careful not
     to step into a flat region of the smoothing parameter space. For
     this reason the algorithm rescales any Newton step that would
     result in a log(theta_i) change of more than 5. Newton steps are
     only used if the Hessian of the GCV/UBRE score is positive
     definite; otherwise steepest descent is used. Similarly, steepest
     descent is used if the Newton step has to be contracted too far
     (indicating that the quadratic model underlying Newton is poor).
     All initial steepest descent steps are scaled so that their largest
     component is 1. Regardless of how a step is calculated, it is never
     expanded if it is successful (to avoid flat portions of the
     objective), but steps are successively halved if they do not
     decrease the GCV/UBRE score, until they do, or the direction is
     deemed to have failed. (Given the smoothing parameters, the optimal
     b parameters are easily found.)
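     The step logic just described can be sketched for a single
     smoothing parameter. This is a simplified illustration and not
     magic's implementation:

       - derivatives of the GCV score w.r.t. rho = log(theta) are
         approximated by finite differences, whereas magic computes them
         exactly;
       - the switch to steepest descent after excessive Newton step
         contraction is omitted;
       - the convergence test is a simple relative-change rule;
       - W = I and H = 0 are assumed;
       - all function names are hypothetical.

```r
# Hypothetical helper: GCV score as a function of rho = log(theta),
# assuming one penalty, W = I and H = 0.
gcv_score <- function(rho, y, X, S) {
  n <- length(y)
  A <- X %*% solve(crossprod(X) + exp(rho) * S, t(X))   # hat matrix
  n * sum((y - A %*% y)^2) / (n - sum(diag(A)))^2
}

# Hypothetical 1-D version of the update rules described above.
search_rho <- function(f, rho, maxit = 50, tol = 1e-6, step.half = 25,
                       h = 1e-3) {
  v <- f(rho)
  for (it in 1:maxit) {
    fp <- f(rho + h); fm <- f(rho - h)
    g1 <- (fp - fm) / (2 * h)                  # finite-difference gradient
    g2 <- (fp - 2 * v + fm) / h^2              # finite-difference Hessian
    # Newton step if the Hessian is positive, else a unit steepest descent step
    step <- if (g2 > 0) -g1 / g2 else -sign(g1)
    if (abs(step) > 5) step <- 5 * sign(step)  # limit change in log(theta) to 5
    ok <- FALSE
    for (k in 0:step.half) {                   # halve until the score decreases
      v.new <- f(rho + step)
      if (v.new < v) { ok <- TRUE; break }
      step <- step / 2
    }
    if (!ok) break                             # direction deemed to have failed
    converged <- abs(v - v.new) < tol * (1 + abs(v))
    rho <- rho + step                          # successful steps are never expanded
    v <- v.new
    if (converged) break
  }
  list(rho = rho, score = v, iter = it)
}

set.seed(5)
n <- 100; x <- sort(runif(n))
X <- cbind(1, cos(outer(x, 1:8) * pi))
S <- diag(c(0, (1:8)^4))
y <- sin(2 * pi * x) + rnorm(n, sd = 0.3)
f <- function(rho) gcv_score(rho, y, X, S)
fit <- search_rho(f, rho = 0)
fit
```

     With several smoothing parameters, the same rules apply to the
     vector of log(theta_i). In that case the Hessian is tested for
     positive definiteness, and a steepest descent step is scaled so
     that its largest component is 1.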

     The method is coded in 'C' with matrix factorizations performed
     using LINPACK and LAPACK routines.

_V_a_l_u_e:

     The function returns a list with the following items:

       b: The best fit parameters given the estimated smoothing
          parameters.

   scale: the estimated (GCV) or supplied (UBRE) scale parameter.

   score: the minimized GCV or UBRE score.

      sp: an array of the estimated smoothing parameters.

      rV: a factored form of the parameter covariance matrix. The
          (Bayesian) covariance matrix of the parameters 'b' is given
          by 'rV%*%t(rV)*scale'.

gcv.info: is a list of information about the performance of the method
          with the following elements:

     _f_u_l_l._r_a_n_k The apparent rank of the problem: number of parameters
          less number of equality constraints.

     _r_a_n_k The estimated actual rank of the problem (at the final
          iteration of the method).

     _f_u_l_l_y._c_o_n_v_e_r_g_e_d is 'TRUE' if the method converged by satisfying
          the convergence criteria, and 'FALSE' if it coverged  by
          failing to decrease the score along the search direction.

     _h_e_s_s._p_o_s._d_e_f is 'TRUE' if the hessian of the UBRE or GCV score was
          positive definite at convergence.

     _i_t_e_r is the number of Newton/Steepest descent iterations taken.

     _s_c_o_r_e._c_a_l_l_s is the number of times that the GCV/UBRE score had to
          be evaluated.

     _r_m_s._g_r_a_d is the root mean square of the gradient of the UBRE/GCV
          score w.r.t. the smoothing parameters. 

     Note that some further useful quantities can be obtained using
     'magic.post.proc'.
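     The factored covariance can be illustrated for an unweighted,
     unconstrained problem with fixed theta. The sketch uses a Cholesky
     factor R of X'X + theta S, so that rV = R^{-1} and rV%*%t(rV) =
     (X'X + theta S)^{-1}. Two points are assumptions of the sketch.
     First, magic builds 'rV' from its own decompositions, so only the
     product is comparable, not the individual entries. Second, the
     scale estimate rss/(n - tr(A)) used here is not taken from this
     page.

```r
# Sketch: a factor rV with rV %*% t(rV) = (X'X + theta S)^{-1}, and the
# resulting Bayesian covariance matrix and standard errors for b.
set.seed(6)
n <- 60; x <- sort(runif(n))
X <- cbind(1, cos(outer(x, 1:5) * pi))
S <- diag(c(0, (1:5)^4))
y <- sin(2 * pi * x) + rnorm(n, sd = 0.2)
theta <- 0.01                                       # smoothing parameter, taken as fixed
M <- crossprod(X) + theta * S
R <- chol(M)                                        # M = R'R, R upper triangular
rV <- backsolve(R, diag(ncol(X)))                   # rV = R^{-1}
b <- drop(rV %*% crossprod(rV, crossprod(X, y)))    # b = M^{-1} X'y
XrV <- X %*% rV                                     # A = XrV %*% t(XrV)
rss <- sum((y - X %*% b)^2)
scale <- rss / (n - sum(XrV^2))                     # assumed estimate: rss / (n - tr(A))
Vb <- rV %*% t(rV) * scale                          # covariance, as in 'rV%*%t(rV)*scale'
se <- sqrt(diag(Vb))
se
```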

_A_u_t_h_o_r(_s):

     Simon N. Wood simon@stats.gla.ac.uk

_R_e_f_e_r_e_n_c_e_s:

     Wood, S.N. (2004) Stable and efficient multiple smoothing
     parameter estimation for generalized additive models. J. Amer.
     Statist. Ass. 99:637-686

     <URL: http://www.stats.gla.ac.uk/~simon/>

_S_e_e _A_l_s_o:

     'magic.post.proc', 'mgcv', 'gam',

_E_x_a_m_p_l_e_s:

     library(mgcv)
     set.seed(1);n<-400;sig2<-4
     x0 <- runif(n, 0, 1);x1 <- runif(n, 0, 1)
     x2 <- runif(n, 0, 1);x3 <- runif(n, 0, 1)
     f <- 2 * sin(pi * x0)
     f <- f + exp(2 * x1) - 3.75887
     f <- f+0.2*x2^11*(10*(1-x2))^6+10*(10*x2)^3*(1-x2)^10-1.396
     e <- rnorm(n, 0, sqrt(sig2))
     y <- f + e
     # set up additive model
     G<-gam(y~s(x0)+s(x1)+s(x2)+s(x3),fit=FALSE)
     # fit using magic
     mgfit<-magic(G$y,G$X,G$sp,G$S,G$off,G$rank,C=G$C)
     # and fit using gam as consistency check
     b<-gam(G=G)
     mgfit$sp;b$sp  # compare smoothing parameter estimates
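     # extract e.d.f. per parameter
     edf <- magic.post.proc(G$X, mgfit, G$w)$edf
     # termwise e.d.f.s: term i's coefficients start at G$off[i] and
     # span ncol(G$S[[i]]) columns
     twedf <- numeric(4)
     for (i in 1:4) twedf[i] <- sum(edf[G$off[i]:(G$off[i] + ncol(G$S[[i]]) - 1)])
     twedf; b$edf  # compare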
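     The e.d.f. compared above can also be illustrated without mgcv.
     The sketch assumes the usual definition for an unweighted,
     unconstrained penalized fit: the per-parameter e.d.f. are the
     diagonal elements of F = (X'X + S_theta)^{-1} X'X, where S_theta is
     the sum of the theta_i S_i. They therefore sum to tr(A). Termwise
     e.d.f. are sums over each term's block of coefficients. The
     two-term cosine basis and the smoothing parameter values are made
     up for the illustration.

```r
# Sketch: per-parameter and termwise e.d.f. for a two-term penalized fit
# with fixed smoothing parameters (no weights, no constraints).
set.seed(7)
n <- 80; x <- runif(n); z <- runif(n)
X <- cbind(1, cos(outer(x, 1:4) * pi), cos(outer(z, 1:4) * pi))
S1 <- diag(c(0, (1:4)^4, rep(0, 4)))       # penalty for the x term (columns 2:5)
S2 <- diag(c(0, rep(0, 4), (1:4)^4))       # penalty for the z term (columns 6:9)
M <- crossprod(X) + 0.01 * S1 + 1 * S2     # theta_1 = 0.01, theta_2 = 1
Fmat <- solve(M, crossprod(X))             # F = M^{-1} X'X
edf <- diag(Fmat)                          # per-parameter e.d.f.
twedf <- c(x = sum(edf[2:5]), z = sum(edf[6:9]))   # termwise e.d.f.
A <- X %*% solve(M, t(X))                  # hat matrix
twedf
```

     The unpenalized intercept contributes exactly one e.d.f. The more
     heavily penalized z term has fewer e.d.f. than the x term.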

