dpill               package:KernSmooth               R Documentation

_S_e_l_e_c_t _a _B_a_n_d_w_i_d_t_h _f_o_r _L_o_c_a_l _L_i_n_e_a_r _R_e_g_r_e_s_s_i_o_n

_D_e_s_c_r_i_p_t_i_o_n:

     Use direct plug-in methodology to select the bandwidth of a local
     linear Gaussian kernel regression estimate, as described by
     Ruppert, Sheather and Wand (1995).

_U_s_a_g_e:

     dpill(x, y, blockmax=5, divisor=20, trim=0.01, proptrun=0.05, 
           gridsize=401, range.x=<<see below>>, truncate=TRUE)

_A_r_g_u_m_e_n_t_s:

       x: vector of x data. Missing values are not accepted. 

       y: vector of y data. This must be same length as 'x', and
          missing values are not accepted. 

blockmax: the maximum number of blocks of the data for construction of
          an initial parametric estimate.  

 divisor: the value that the sample size is divided by to determine a
          lower limit on the number of blocks of the data for
          construction of an initial parametric estimate. 

    trim: the proportion of the sample trimmed from each end in the 'x'
          direction before application of the plug-in methodology. 

proptrun: the proportion of the range of 'x' at each end truncated in
          the functional estimates. 

gridsize: number of equally-spaced grid points over which the function
          is to be estimated. 

 range.x: vector containing the minimum and maximum values of 'x' at
          which to compute the estimate. For density estimation the
          default is the minimum and maximum data values with 5% of the
          range added to each end. For regression estimation the
          default is the minimum and maximum data values. 

truncate: logical flag: if 'TRUE', data with 'x' values outside the
          range specified by 'range.x' are ignored. 

_D_e_t_a_i_l_s:

     The direct plug-in approach, where unknown functionals that appear
     in expressions for the asymptotically optimal bandwidths are
     replaced by kernel estimates, is used. The kernel is the standard
     normal density. Least squares quartic fits over blocks of data are
     used to  obtain an initial estimate. Mallow's Cp is used to select
     the number of blocks.

_V_a_l_u_e:

     the selected bandwidth.

_W_a_r_n_i_n_g:

     If there are severe irregularities (i.e. outliers, sparse regions)
     in the 'x' values then the local polynomial smooths required for
     the bandwidth selection algorithm may become degenerate and the
     function will crash. Outliers in the 'y' direction may lead to
     deterioration of the quality of the selected bandwidth.

_R_e_f_e_r_e_n_c_e_s:

     Ruppert, D., Sheather, S. J. and Wand, M. P. (1995). An effective
     bandwidth selector for local least squares regression. _Journal of
     the American Statistical Association_, *90*, 1257-1270.

     Wand, M. P. and Jones, M. C. (1995). _Kernel Smoothing._ Chapman
     and Hall, London.

_S_e_e _A_l_s_o:

     'ksmooth', 'locpoly'.

_E_x_a_m_p_l_e_s:

     data(geyser, package="MASS")
     x <- geyser$duration
     y <- geyser$waiting
     plot(x, y)
     h <- dpill(x, y)
     fit <- locpoly(x, y, bandwidth=h)
     lines(fit)

