cut                   package:base                   R Documentation

_C_o_n_v_e_r_t _N_u_m_e_r_i_c _t_o _F_a_c_t_o_r

_D_e_s_c_r_i_p_t_i_o_n:

     'cut' divides the range of 'x' into intervals and codes the values
     in 'x' according to the interval into which they fall. The
     leftmost interval corresponds to level one, the next leftmost to
     level two, and so on.

_U_s_a_g_e:

     cut(x, ...)

     ## Default S3 method:
     cut(x, breaks, labels = NULL,
         include.lowest = FALSE, right = TRUE, dig.lab = 3, ...)

_A_r_g_u_m_e_n_t_s:

       x: a numeric vector which is to be converted to a factor by
          cutting.

  breaks: either a vector of cut points or a single number giving the
          number of intervals into which 'x' is to be cut.

  labels: labels for the levels of the resulting factor.  By default,
          labels are constructed using '"(a,b]"' interval notation. If
          'labels = FALSE', simple integer codes are returned instead
          of a factor.

include.lowest: logical, indicating if an 'x[i]' equal to the lowest
          (or highest, for 'right = FALSE') 'breaks' value should be
          included.

   right: logical, indicating if the intervals should be closed on the
          right (and open on the left) or vice versa.

 dig.lab: integer which is used when labels are not given. It
          determines the number of digits used in formatting the break
          numbers.

     ...: further arguments passed to or from other methods.
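
     The effect of 'right' and 'include.lowest' on values that
     coincide with a break can be sketched as follows. The data and
     the variable names are made up for illustration; every value is
     deliberately placed on a break.

```r
x  <- c(0, 1, 2, 3)   # illustrative data: every value sits on a break
br <- 0:3

# Default (right = TRUE): intervals (0,1], (1,2], (2,3].
# The lowest break, 0, belongs to no interval and becomes NA.
codes_default <- as.integer(cut(x, breaks = br))

# include.lowest = TRUE closes the first interval: [0,1], (1,2], (2,3]
f_lowest     <- cut(x, breaks = br, include.lowest = TRUE)
codes_lowest <- as.integer(f_lowest)

# right = FALSE: intervals [0,1), [1,2), [2,3).
# Now the highest break, 3, is the value left out.
codes_left <- as.integer(cut(x, breaks = br, right = FALSE))

# right = FALSE with include.lowest = TRUE closes the last interval
f_both     <- cut(x, breaks = br, right = FALSE, include.lowest = TRUE)
codes_both <- as.integer(f_both)

print(rbind(codes_default, codes_lowest, codes_left, codes_both))
print(levels(f_lowest))
print(levels(f_both))
```

     Note that with 'right = FALSE', 'include.lowest' acts on the
     highest break rather than the lowest.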

_D_e_t_a_i_l_s:

     If 'labels' is specified, its values are used to name the factor
     levels. Otherwise, the level labels are constructed as
     '"(b1, b2]"', '"(b2, b3]"' etc. for 'right = TRUE' and as
     '"[b1, b2)"', ... for 'right = FALSE'. When labels are
     constructed this way, 'dig.lab' indicates how many digits are
     used in formatting the numbers 'b1', 'b2', ....
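
     As a small sketch of the two labelling modes, the following
     compares user-supplied labels with constructed ones at two
     'dig.lab' settings. The values, breaks and label names are
     invented for illustration.

```r
x <- c(1, 5, 9)                       # illustrative values

# User-supplied labels: one label per interval
f_named <- cut(x, breaks = c(0, 3, 6, 10),
               labels = c("low", "mid", "high"))
print(f_named)

# Constructed labels: dig.lab controls how the breaks are formatted
y  <- c(0.5, 1.5)
br <- c(0, 1.23456, 2)
f_dig3 <- cut(y, breaks = br)                # default dig.lab = 3
f_dig5 <- cut(y, breaks = br, dig.lab = 5)
print(levels(f_dig3))
print(levels(f_dig5))

# Only the labels differ; the underlying codes are the same
print(identical(as.integer(f_dig3), as.integer(f_dig5)))
```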

_V_a_l_u_e:

     A 'factor' is returned, unless 'labels = FALSE', in which case
     only the integer level codes are returned.
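
     A minimal illustration of the two return types, using made-up
     data in which the last value lies outside the breaks:

```r
x <- c(0.5, 1.5, 2.5, 3.5)            # 3.5 lies outside the breaks 0:3

f     <- cut(x, breaks = 0:3)                  # a factor
codes <- cut(x, breaks = 0:3, labels = FALSE)  # plain integer codes

print(is.factor(f))
print(is.integer(codes))
print(codes)                          # values outside the breaks give NA

# The integer codes agree with the internal codes of the factor
print(all(codes == as.integer(f), na.rm = TRUE))
```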

_N_o_t_e:

     Instead of 'table(cut(x, br))', 'hist(x, br, plot = FALSE)' is
     more efficient and less memory hungry.  Instead of 'cut(*, labels
     = FALSE)', 'findInterval()' is more efficient.
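
     The alternatives named above use different interval conventions
     by default, so arguments have to be matched for the results to
     agree. The sketch below assumes that all values lie strictly
     inside the range of the breaks; the seed and data are arbitrary.

```r
set.seed(1)                           # arbitrary seed, for reproducibility
x  <- rnorm(1000)
br <- -6:6                            # assumed to cover all of x

# hist() defaults to right = TRUE and include.lowest = TRUE
counts_cut  <- as.vector(table(cut(x, breaks = br, include.lowest = TRUE)))
counts_hist <- hist(x, breaks = br, plot = FALSE)$counts
print(all(counts_cut == counts_hist))

# findInterval() uses left-closed intervals, i.e. right = FALSE in cut()
codes_cut <- cut(x, breaks = br, right = FALSE, labels = FALSE)
codes_fi  <- findInterval(x, br)
print(all(codes_cut == codes_fi))
```

     For values outside the breaks the two differ: 'cut' gives 'NA',
     whereas 'findInterval' returns 0 or 'length(br)'.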

_R_e_f_e_r_e_n_c_e_s:

     Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) _The New S
     Language_. Wadsworth & Brooks/Cole.

_S_e_e _A_l_s_o:

     'split' for splitting a variable according to a group factor;
     'factor', 'tabulate', 'table', 'findInterval()'.

_E_x_a_m_p_l_e_s:

     Z <- rnorm(10000)
     table(cut(Z, breaks = -6:6))
     sum(table(cut(Z, breaks = -6:6, labels = FALSE)))
     sum(hist(Z, breaks = -6:6, plot = FALSE)$counts)

     cut(rep(1, 5), 4)  #-- degenerate case: all values equal
     tx0 <- c(9, 4, 6, 5, 3, 10, 5, 3, 5)
     x <- rep(0:8, tx0)
     stopifnot(table(x) == tx0)

     table(cut(x, breaks = 8))
     table(cut(x, breaks = 3*(-2:5)))
     table(cut(x, breaks = 3*(-2:5), right = FALSE))

     ##--- some values OUTSIDE the breaks :
     table(cx  <- cut(x, breaks = 2*(0:4)))
     table(cxl <- cut(x, breaks = 2*(0:4), right = FALSE))
     which(is.na(cx));  x[is.na(cx)]  #-- the first 9 values (all 0)
     which(is.na(cxl)); x[is.na(cxl)] #-- the last  5 values (all 8)

     ## Label construction:
     y <- rnorm(100)
     table(cut(y, breaks = pi/3*(-3:3)))
     table(cut(y, breaks = pi/3*(-3:3), dig.lab=4))

     table(cut(y, breaks = 1*(-3:3), dig.lab = 4))
     # extra digits do no harm here: the breaks are integers
     table(cut(y, breaks = 1*(-3:3), right = FALSE))
     # same counts as above, since no 'y' is exactly an integer

