pyspark.sql.functions.string_agg

pyspark.sql.functions.string_agg(col, delimiter=None)

Aggregate function: returns the concatenation of non-null input values, separated by the delimiter.

An alias of listagg().

New in version 4.0.0.

Parameters
col : Column or column name

target column to compute on.

delimiter : Column, literal string or bytes, optional

the delimiter placed between consecutive values. The default value is None, in which case the values are concatenated with no separator.

Returns
Column

a column containing the concatenated values, or NULL if every input value is NULL.
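The semantics described above can be modeled in plain Python. The sketch below uses a hypothetical helper, `string_agg_model`, which is not part of PySpark and is not Spark's implementation. It assumes, as Examples 1 and 4 show, that a None delimiter behaves like an empty separator and that an input with no non-null values yields None (SQL NULL).

```python
from typing import Iterable, Optional, Union

Value = Union[str, bytes]


def string_agg_model(values: Iterable[Optional[Value]],
                     delimiter: Optional[Value] = None) -> Optional[Value]:
    """Hypothetical reference model of string_agg; not part of PySpark."""
    # Skip nulls, as string_agg ignores NULL inputs.
    present = [v for v in values if v is not None]
    if not present:
        # No non-null inputs: the aggregate result is NULL.
        return None
    if delimiter is None:
        # Assumption: a None delimiter acts as an empty separator of the
        # same type as the values ('' for str, b'' for bytes).
        delimiter = present[0][:0]
    return delimiter.join(present)


print(string_agg_model(['a', 'b', None, 'c']))        # abc
print(string_agg_model(['a', 'b', None, 'c'], ', '))  # a, b, c
print(string_agg_model([None, None]))                 # None
```

The same helper also accepts bytes values with a bytes delimiter; mixing a str delimiter with bytes values raises a TypeError in this sketch, which is not meant to reproduce Spark's own type checking.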

Examples

Example 1: Using the string_agg function

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([('a',), ('b',), (None,), ('c',)], ['strings'])
>>> df.select(sf.string_agg('strings')).show()
+-------------------------+
|string_agg(strings, NULL)|
+-------------------------+
|                      abc|
+-------------------------+

Example 2: Using the string_agg function with a delimiter

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([('a',), ('b',), (None,), ('c',)], ['strings'])
>>> df.select(sf.string_agg('strings', ', ')).show()
+-----------------------+
|string_agg(strings, , )|
+-----------------------+
|                a, b, c|
+-----------------------+

Example 3: Using the string_agg function with a binary column and a binary delimiter

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([(b'\x01',), (b'\x02',), (None,), (b'\x03',)], ['bytes'])
>>> df.select(sf.string_agg('bytes', b'B')).show()
+------------------------+
|string_agg(bytes, X'42')|
+------------------------+
|        [01 42 02 42 03]|
+------------------------+
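In the output of Example 3, `show()` displays the binary result as its bytes in hexadecimal inside square brackets. The plain-Python illustration below (not Spark code) rebuilds the same byte string by joining the non-null inputs b'\x01', b'\x02' and b'\x03' with the delimiter b'B', whose byte value is 0x42.

```python
# Inputs from Example 3; the None row is skipped by the aggregate.
values = [b'\x01', b'\x02', None, b'\x03']
delimiter = b'B'  # the same single byte as b'\x42'

# Join the non-null values with the delimiter between them.
joined = delimiter.join(v for v in values if v is not None)

# bytes.hex(' ') renders each byte as two hex digits separated by spaces.
print(joined.hex(' '))  # 01 42 02 42 03
```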

Example 4: Using the string_agg function on a column whose values are all None

>>> from pyspark.sql import functions as sf
>>> from pyspark.sql.types import StructType, StructField, StringType
>>> schema = StructType([StructField("strings", StringType(), True)])
>>> df = spark.createDataFrame([(None,), (None,), (None,), (None,)], schema=schema)
>>> df.select(sf.string_agg('strings')).show()
+-------------------------+
|string_agg(strings, NULL)|
+-------------------------+
|                     NULL|
+-------------------------+
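string_agg is often applied per group, for example `df.groupBy('key').agg(sf.string_agg('value', ','))`. The pure-Python sketch below models that per-group behavior with a hypothetical helper, `group_string_agg_model`; it is not Spark code. It concatenates values in input order, which is an assumption: after a shuffle, Spark generally does not guarantee the order of rows within a group, so the order of the concatenated values may differ between runs.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Optional, Tuple


def group_string_agg_model(
    rows: Iterable[Tuple[Hashable, Optional[str]]],
    delimiter: Optional[str] = None,
) -> Dict[Hashable, Optional[str]]:
    """Hypothetical per-group model of string_agg over (key, value) pairs."""
    groups: Dict[Hashable, List[Optional[str]]] = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)

    # A None delimiter is modeled as an empty separator.
    sep = '' if delimiter is None else delimiter
    result: Dict[Hashable, Optional[str]] = {}
    for key, values in groups.items():
        present = [v for v in values if v is not None]
        # A group whose values are all None yields None, mirroring Example 4.
        result[key] = sep.join(present) if present else None
    return result


rows = [('x', 'a'), ('y', None), ('x', 'b'), ('x', None), ('y', None), ('z', 'c')]
print(group_string_agg_model(rows, ','))  # {'x': 'a,b', 'y': None, 'z': 'c'}
```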