pyspark.Accumulator
class pyspark.Accumulator(aid: int, value: T, accum_param: pyspark.accumulators.AccumulatorParam[T])[source]

A shared variable that can be accumulated, i.e., has a commutative and associative “add” operation. Worker tasks on a Spark cluster can add values to an Accumulator with the += operator, but only the driver program is allowed to access its value, using value. Updates from the workers get propagated automatically to the driver program.
While SparkContext supports accumulators for primitive data types like int and float, users can also define accumulators for custom types by providing a custom AccumulatorParam object. Refer to its doctest for an example; a sketch along those lines also follows the examples below.

Examples
>>> a = sc.accumulator(1)
>>> a.value
1
>>> a.value = 2
>>> a.value
2
>>> a += 5
>>> a.value
7
>>> sc.accumulator(1.0).value
1.0
>>> sc.accumulator(1j).value
1j
>>> rdd = sc.parallelize([1,2,3])
>>> def f(x):
...     global a
...     a += x
...
>>> rdd.foreach(f)
>>> a.value
13
>>> b = sc.accumulator(0)
>>> def g(x):
...     b.add(x)
...
>>> rdd.foreach(g)
>>> b.value
6

>>> rdd.map(lambda x: a.value).collect()
Traceback (most recent call last):
    ...
Py4JJavaError: ...

>>> def h(x):
...     global a
...     a.value = 7
...
>>> rdd.foreach(h)
Traceback (most recent call last):
    ...
Py4JJavaError: ...

>>> sc.accumulator([1.0, 2.0, 3.0])
Traceback (most recent call last):
    ...
TypeError: ...
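
As a minimal sketch of the custom-type case described above (assuming sc is an active SparkContext, as in the doctests here), a custom accumulator for lists of floats might be defined as follows. The VectorAccumulatorParam name is illustrative; zero() and addInPlace() are the two methods a custom AccumulatorParam provides.

>>> from pyspark.accumulators import AccumulatorParam
>>> class VectorAccumulatorParam(AccumulatorParam):
...     def zero(self, value):
...         # Identity element: a list of zeros with the same length as `value`.
...         return [0.0] * len(value)
...     def addInPlace(self, val1, val2):
...         # Merge two partial results element-wise.
...         for i in range(len(val1)):
...             val1[i] += val2[i]
...         return val1
...
>>> va = sc.accumulator([1.0, 2.0, 3.0], VectorAccumulatorParam())
>>> va.value
[1.0, 2.0, 3.0]
>>> rdd = sc.parallelize([1, 2, 3])
>>> def g(x):
...     global va
...     va += [x] * 3
...
>>> rdd.foreach(g)
>>> va.value
[7.0, 8.0, 9.0]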
Methods
add(term)
    Adds a term to this accumulator’s value
Attributes
value
    Get the accumulator’s value; only usable in driver program