tapply                 package:base                 R Documentation

_A_p_p_l_y _a _F_u_n_c_t_i_o_n _O_v_e_r _a "_R_a_g_g_e_d" _A_r_r_a_y

_D_e_s_c_r_i_p_t_i_o_n:

     Apply a function to each cell of a ragged array, that is, to each
     (non-empty) group of values given by a unique combination of the
     levels of certain factors.

_U_s_a_g_e:

     tapply(X, INDEX, FUN = NULL, ..., simplify = TRUE)

_A_r_g_u_m_e_n_t_s:

       X: an atomic object, typically a vector.

   INDEX: list of factors, each of same length as 'X'.

     FUN: the function to be applied.  In the case of functions like
          '+', '%*%', etc., the function name must be quoted.  If 'FUN'
          is 'NULL', 'tapply' returns a vector which can be used to
          subscript the multi-way array 'tapply' normally produces.

     ...: optional arguments to 'FUN'.

simplify: If 'FALSE', 'tapply' always returns an array of mode
          '"list"'.  If 'TRUE' (the default), then if 'FUN' always
          returns a scalar, 'tapply' returns an array with the mode of
          the scalar.
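
     A minimal sketch of the 'FUN' forms described above.  The data 'x'
     and the factor 'g' are made up for illustration and are not part of
     the original page.

```r
## Illustrative data (an assumption): four values in two groups.
x <- c(10, 20, 30, 40)
g <- factor(c("b", "a", "b", "a"))

## FUN = NULL: one cell index per element of x.
idx <- tapply(x, list(g))

## Usual call: one sum per level of g.
sums <- tapply(x, list(g), sum)

## The index vector subscripts the array of results, giving
## each observation the sum of its own group.
per_obs <- sums[idx]

## A function may also be given by its quoted name.
maxs <- tapply(x, list(g), "max")
```

     Here 'idx' can be used to spread a per-group result back over the
     original observations without an explicit merge.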

_V_a_l_u_e:

     When 'FUN' is present, 'tapply' calls 'FUN' for each cell that has
     any data in it.  If 'FUN' returns a single atomic value for each
     such cell (e.g., functions 'mean' or 'var') and 'simplify' is
     'TRUE', 'tapply' returns a multi-way array containing the values.
     The array has the same number of dimensions as 'INDEX' has
     components; the number of levels in a dimension is the number of
     levels ('nlevels()') in the corresponding component of 'INDEX'.
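
     As a small sketch of the shape rule just stated, using two made-up
     factors 'f1' and 'f2' (assumptions for illustration only):

```r
## Illustrative data (an assumption): six values, two grouping factors.
x  <- 1:6
f1 <- factor(c("a", "a", "b", "b", "c", "c"))
f2 <- factor(c("u", "v", "u", "v", "u", "u"))

res <- tapply(x, list(f1, f2), sum)

## One dimension per component of INDEX,
## with extents nlevels(f1) and nlevels(f2).
dim(res)

## The combination ("c", "v") has no data, so FUN is not
## called for it and the cell is NA.
res["c", "v"]
res["c", "u"]
```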

     Note that, contrary to S, 'tapply' with 'simplify = TRUE' always
     returns an array, possibly 1-dimensional.

     If 'FUN' does not return a single atomic value, 'tapply' returns
     an array of mode 'list' whose components are the values of the
     individual calls to 'FUN', i.e., the result is a list with a 'dim'
     attribute.
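
     The list-mode result can be inspected as follows.  This is a sketch
     with made-up data 'x' and 'g', not code from the original page.

```r
## Illustrative data (an assumption): six values in two groups.
x <- 1:6
g <- factor(c("a", "a", "a", "b", "b", "b"))

## range() returns two values per cell, so the result
## cannot be simplified to an atomic array.
r <- tapply(x, list(g), range)
mode(r)      # "list"
dim(r)       # a 1-dimensional array with one cell per level
r[["a"]]     # the value of the call to range() for level "a"

## simplify = FALSE gives a list array even for scalar results.
s <- tapply(x, list(g), sum, simplify = FALSE)
is.list(s)
```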

_R_e_f_e_r_e_n_c_e_s:

     Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) _The New S
     Language_. Wadsworth & Brooks/Cole.

_S_e_e _A_l_s_o:

     the convenience functions 'by' and 'aggregate' (using 'tapply');
     'apply', 'lapply' with its versions 'sapply' and 'mapply'.

_E_x_a_m_p_l_e_s:

     require(stats)
     ## five binomial draws (size 32, prob 0.4), used as a grouping factor
     groups <- as.factor(rbinom(n = 5, size = 32, prob = 0.4))
     tapply(groups, groups, length) #- is almost the same as
     table(groups)

     ## contingency table from data.frame : array with named dimnames
     tapply(warpbreaks$breaks, warpbreaks[,-1], sum)
     tapply(warpbreaks$breaks, warpbreaks[, 3, drop = FALSE], sum)

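     ## levels 4 and 5 are unused, so their cells are empty
     n <- 17; fac <- factor(rep(1:3, length.out = n), levels = 1:5)
     table(fac)
     tapply(1:n, fac, sum)                   # NA for the empty cells
     tapply(1:n, fac, sum, simplify = FALSE) # list array, NULL for the empty cells
     tapply(1:n, fac, range)
     tapply(1:n, fac, quantile)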

     ## example of ... argument: find quarterly means
     tapply(presidents, cycle(presidents), mean, na.rm = TRUE)

     ind <- list(c(1, 2, 2), c("A", "A", "B"))
     table(ind)
     tapply(1:3, ind) #-> the split vector
     tapply(1:3, ind, sum)

