empinf                 package:boot                 R Documentation

_E_m_p_i_r_i_c_a_l _I_n_f_l_u_e_n_c_e _V_a_l_u_e_s

_D_e_s_c_r_i_p_t_i_o_n:

     This function calculates the empirical influence values for a
     statistic applied to a data set.  It allows four types of
     calculation, namely the infinitesimal jackknife (using numerical
     differentiation), the usual jackknife estimates, the "positive"
     jackknife estimates and a method which estimates the empirical
     influence values using regression of bootstrap replicates of the
     statistic.  All methods can be used with one or more samples.

_U_s_a_g_e:

     empinf(boot.out=NULL, data=NULL, statistic=NULL, 
            type=NULL, stype="w", index=1, t=NULL,
            strata=rep(1, n), eps=0.001, ...)

_A_r_g_u_m_e_n_t_s:

boot.out: A bootstrap object created by the function 'boot'.  If 'type'
          is '"reg"' then this argument is required.  For any of the
          other types it is optional.  If it is supplied when optional
          then the values of 'data', 'statistic', 'stype', and
          'strata' are taken from the components of 'boot.out' and any
          values passed to 'empinf' directly are ignored.

    data: A vector, matrix or data frame containing the data for which
          empirical influence values are required.  It is a required
          argument if 'boot.out' is not supplied.  If 'boot.out' is
          supplied then 'data' is set to 'boot.out$data' and any value
          supplied is ignored.   

statistic: The statistic for which empirical influence values are
          required.  It must be a function of at least two arguments,
          the data set and a vector of weights, frequencies or
          indices.  The nature of the second argument is given by the
          value of 'stype'.  Any other arguments that it takes must be
          supplied to 'empinf' and will be passed to 'statistic'
          unchanged.  This is a required argument if 'boot.out' is not
          supplied; otherwise its value is taken from 'boot.out' and
          any value supplied here is ignored.

    type: The calculation type to be used for the empirical influence
          values.  Possible values of 'type' are '"inf"'
          (infinitesimal jackknife), '"jack"' (usual jackknife),
          '"pos"' (positive jackknife), and '"reg"' (regression
          estimation).  The default value depends on the other
          arguments.  If 't' is supplied then the default value of
          'type' is '"reg"' and 'boot.out' should be present so that
          its frequency array can be found.  If 't' is not supplied
          then if 'stype' is '"w"', the default value of 'type' is
          '"inf"'; otherwise, if 'boot.out' is present the default is
          '"reg"'.  If none of these conditions apply then the default
          is '"jack"'.  Note that it is an error for 'type' to be
          '"reg"' if 'boot.out' is missing or to be '"inf"' if 'stype'
          is not '"w"'.

   stype: A character variable giving the nature of the second argument
          to 'statistic'. It can take on three values: '"w"' (weights),
          '"f"' (frequencies), or '"i"' (indices).  If 'boot.out' is
          supplied the value of 'stype' is set to  'boot.out$stype' and
          any value supplied here is ignored.  Otherwise it is an
          optional argument which defaults to '"w"'.  If 'type' is
          '"inf"' then 'stype'  MUST be '"w"'. 

   index: An integer giving the position of the variable of interest in
          the output of 'statistic'. 

       t: A vector of length 'boot.out$R' which gives the bootstrap
          replicates of the statistic of interest.  't' is used only
          when 'type' is '"reg"' and it defaults to 'boot.out$t[,index]'.

  strata: An integer vector or a factor specifying the strata for
          multi-sample problems.  If 'boot.out' is supplied the value
          of 'strata' is set to 'boot.out$strata'.  Otherwise it is an
          optional argument whose default corresponds to the
          single-sample situation.

     eps: This argument is used only if 'type' is '"inf"'.  In that
          case the value of epsilon to be used for numerical
          differentiation will be 'eps' divided by the number of
          observations in 'data'. 

     ...: Any other arguments that 'statistic' takes.  They will be
          passed unchanged to 'statistic' every time that it is called. 

_D_e_t_a_i_l_s:

     If 'type' is '"inf"' then numerical differentiation is used to
     approximate the empirical influence values.  This makes sense only
     for statistics which are written in weighted form (i.e. 'stype' is
     '"w"').  If 'type' is '"jack"' then the usual leave-one-out
     jackknife estimates of the empirical influence are  returned.  If
     'type' is '"pos"' then the positive (include-one-twice) jackknife
     values are used.  If 'type' is '"reg"' then a bootstrap object
     must be supplied. The regression method then works by regressing
     the bootstrap replicates of 'statistic' on the frequency array
     from which they were derived.  The  bootstrap frequency array is
     obtained through a call to 'boot.array'.  Further details of the
     methods are given in Section 2.7 of Davison and Hinkley (1997).

     Empirical influence values are used frequently in nonparametric
     bootstrap applications, so many other functions call 'empinf'
     when they need them.  Examples of their use include nonparametric
     delta estimates of variance, BCa intervals, and linear
     approximations to statistics for use as control variates.  They
     are also used for antithetic bootstrap resampling.

_V_a_l_u_e:

     A vector of the empirical influence values of 'statistic' applied
     to 'data'.  The values will be in the same order as the
     observations in 'data'.

_W_a_r_n_i_n_g:

     All arguments to 'empinf' must be passed using the 'name=value'
     convention.  If this is not followed then unpredictable errors can
     occur.

_R_e_f_e_r_e_n_c_e_s:

     Davison, A.C. and Hinkley, D.V. (1997)  _Bootstrap Methods and
     Their Application_. Cambridge University Press.

     Efron, B. (1982) _The Jackknife, the Bootstrap and Other
     Resampling Plans_. CBMS-NSF Regional Conference Series in Applied
     Mathematics, *38*, SIAM.

     Fernholz, L.T. (1983) _von Mises Calculus for Statistical
     Functionals_. Lecture Notes in Statistics, *19*, Springer-Verlag.

_S_e_e _A_l_s_o:

     'boot', 'boot.array', 'boot.ci', 'control', 'jack.after.boot',
     'linear.approx', 'var.linear'

_E_x_a_m_p_l_e_s:

     # The empirical influence values for the ratio of means in 
     # the city data.
     ratio <- function(d, w) sum(d$x * w)/sum(d$u * w)
     empinf(data=city,statistic=ratio)
     city.boot <- boot(city,ratio,499,stype="w")
     empinf(boot.out=city.boot,type="reg")
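
     # A further sketch, not part of the original examples: the four
     # calculation types side by side for the city ratio.  The names
     # ratio.w, city.b, L.* and X are illustrative only.
     library(boot)
     ratio.w <- function(d, w) sum(d$x * w)/sum(d$u * w)
     set.seed(1)                       # only to make the sketch repeatable
     city.b <- boot(city, ratio.w, R=499, stype="w")
     L.inf  <- empinf(data=city, statistic=ratio.w, type="inf")
     L.jack <- empinf(data=city, statistic=ratio.w, type="jack")
     L.pos  <- empinf(data=city, statistic=ratio.w, type="pos")
     L.reg  <- empinf(boot.out=city.b, type="reg")
     cbind(inf=L.inf, jack=L.jack, pos=L.pos, reg=L.reg)

     # The regression values can be approximated by hand, ASSUMING the
     # method amounts to least squares of the replicates on the
     # resampling proportions (one column dropped because each row
     # sums to one), followed by centring the coefficients.
     X <- boot.array(city.b)/nrow(city)
     b <- coef(lm(city.b$t[,1] ~ X[,-1]))[-1]
     L.hand <- c(0, b) - mean(c(0, b))
     all.equal(as.vector(L.reg), as.vector(L.hand))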

     # A statistic that may be of interest in the difference of means
     # problem is the t-statistic for testing equality of means.  In 
     # the bootstrap we get replicates of the difference of means and 
     # the variance of that statistic and then want to use this output
     # to get the empirical influence values of the t-statistic.
     grav1 <- gravity[as.numeric(gravity[,2])>=7,]
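     grav.fun <- function(dat, w)
     {    strata <- tapply(dat[, 2], as.numeric(dat[, 2]))  # stratum index of each row
          d <- dat[, 1]
          ns <- tabulate(strata)                  # stratum sizes
          w <- w/tapply(w, strata, sum)[strata]   # normalize weights within strata
          mns <- tapply(d * w, strata, sum)       # weighted stratum means
          mn2 <- tapply(d * d * w, strata, sum)   # weighted second moments
          s2hat <- sum((mn2 - mns^2)/ns)          # variance estimate for the difference
          c(mns[2]-mns[1],s2hat)
     }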

     grav.boot <- boot(grav1, grav.fun, R=499, stype="w", strata=grav1[,2])

     # Since the statistic of interest is a function of the bootstrap
     # statistics, we must calculate the bootstrap replicates and pass
     # them to empinf using the t argument.
     grav.z <- (grav.boot$t[,1]-grav.boot$t0[1])/sqrt(grav.boot$t[,2])
     empinf(boot.out=grav.boot,t=grav.z)

