pyspark.sql.functions.array_join

pyspark.sql.functions.array_join(col, delimiter, null_replacement=None)
Array function: Returns a string column by concatenating the elements of the input array column using the delimiter. Null values within the array can be replaced with a specified string through the null_replacement argument. If null_replacement is not set, null values are ignored.

New in version 2.4.0.

Changed in version 3.4.0: Supports Spark Connect.

Parameters
col : Column or str
    The input column containing the arrays to be joined.
delimiter : str
    The string to be used as the delimiter when joining the array elements.
null_replacement : str, optional
    The string to replace null values within the array. If not set, null values are ignored.
 
Returns
Column
    A new column of string type, where each value is the result of joining the corresponding array from the input column.
 
Examples

Example 1: Basic usage of array_join function.

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([(["a", "b", "c"],), (["a", "b"],)], ['data'])
>>> df.select(sf.array_join(df.data, ",")).show()
+-------------------+
|array_join(data, ,)|
+-------------------+
|              a,b,c|
|                a,b|
+-------------------+

Example 2: Usage of array_join function with null_replacement argument.

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([(["a", None, "c"],)], ['data'])
>>> df.select(sf.array_join(df.data, ",", "NULL")).show()
+-------------------------+
|array_join(data, ,, NULL)|
+-------------------------+
|                 a,NULL,c|
+-------------------------+

Example 3: Usage of array_join function without null_replacement argument.

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([(["a", None, "c"],)], ['data'])
>>> df.select(sf.array_join(df.data, ",")).show()
+-------------------+
|array_join(data, ,)|
+-------------------+
|                a,c|
+-------------------+

Example 4: Usage of array_join function with an array that is null.

>>> from pyspark.sql import functions as sf
>>> from pyspark.sql.types import StructType, StructField, ArrayType, StringType
>>> schema = StructType([StructField("data", ArrayType(StringType()), True)])
>>> df = spark.createDataFrame([(None,)], schema)
>>> df.select(sf.array_join(df.data, ",")).show()
+-------------------+
|array_join(data, ,)|
+-------------------+
|               NULL|
+-------------------+

Example 5: Usage of array_join function with an array containing only null values.

>>> from pyspark.sql import functions as sf
>>> from pyspark.sql.types import StructType, StructField, ArrayType, StringType
>>> schema = StructType([StructField("data", ArrayType(StringType()), True)])
>>> df = spark.createDataFrame([([None, None],)], schema)
>>> df.select(sf.array_join(df.data, ",", "NULL")).show()
+-------------------------+
|array_join(data, ,, NULL)|
+-------------------------+
|                NULL,NULL|
+-------------------------+
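
A brief additional sketch: because the col parameter accepts either a Column or a str, the array column can also be referenced by name rather than as a Column object. This assumes the same spark session and data layout as Example 1; the joined variable name is only illustrative, and the result should be equivalent to the output shown in Example 1.

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([(["a", "b", "c"],), (["a", "b"],)], ['data'])
>>> # Pass the column name as a string instead of a Column object.
>>> joined = df.select(sf.array_join("data", ","))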