pyspark.sql.functions.string_agg
- pyspark.sql.functions.string_agg(col, delimiter=None)
Aggregate function: returns the concatenation of non-null input values, separated by the delimiter.
An alias of listagg().
New in version 4.0.0.
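- Parameters
col : Column or column name
target column to compute on.
delimiter : Column, literal string or bytes, optional
the delimiter to separate the values. The default value is None.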
- Returns
Column
the column for computed results.
Examples
Example 1: Using string_agg function
>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([('a',), ('b',), (None,), ('c',)], ['strings'])
>>> df.select(sf.string_agg('strings')).show()
+-------------------------+
|string_agg(strings, NULL)|
+-------------------------+
|                      abc|
+-------------------------+
Example 2: Using string_agg function with a delimiter
>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([('a',), ('b',), (None,), ('c',)], ['strings'])
>>> df.select(sf.string_agg('strings', ', ')).show()
+-----------------------+
|string_agg(strings, , )|
+-----------------------+
|                a, b, c|
+-----------------------+
Example 3: Using string_agg function with a binary column and delimiter
>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([(b'\x01',), (b'\x02',), (None,), (b'\x03',)], ['bytes'])
>>> df.select(sf.string_agg('bytes', b'B')).show()
+------------------------+
|string_agg(bytes, X'42')|
+------------------------+
|        [01 42 02 42 03]|
+------------------------+
Example 4: Using string_agg function on a column with all None values
>>> from pyspark.sql import functions as sf
>>> from pyspark.sql.types import StructType, StructField, StringType
>>> schema = StructType([StructField("strings", StringType(), True)])
>>> df = spark.createDataFrame([(None,), (None,), (None,), (None,)], schema=schema)
>>> df.select(sf.string_agg('strings')).show()
+-------------------------+
|string_agg(strings, NULL)|
+-------------------------+
|                     NULL|
+-------------------------+
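The null handling shown in the examples above can be summarized with a small plain-Python model. This is only a sketch for reasoning about the results, not Spark's implementation: `string_agg_model` is a hypothetical helper name, and it assumes that a `None` delimiter behaves like an empty separator (as Example 1 suggests) and that an input with no non-null values yields `None` (as Example 4 suggests).

```python
from typing import Iterable, Optional, Union

StrOrBytes = Union[str, bytes]


def string_agg_model(
    values: Iterable[Optional[StrOrBytes]],
    delimiter: Optional[StrOrBytes] = None,
) -> Optional[StrOrBytes]:
    """Hypothetical model of string_agg semantics; not part of PySpark."""
    # Null inputs are skipped entirely, so they add no extra delimiter.
    non_null = [v for v in values if v is not None]

    # With no non-null inputs the aggregate is NULL (None here).
    if not non_null:
        return None

    # Assumption: a missing delimiter acts as an empty separator
    # of the same type as the data ('' for str, b'' for bytes).
    if delimiter is None:
        delimiter = type(non_null[0])()

    return delimiter.join(non_null)


print(string_agg_model(['a', 'b', None, 'c']))                    # abc
print(string_agg_model(['a', 'b', None, 'c'], ', '))              # a, b, c
print(string_agg_model([b'\x01', b'\x02', None, b'\x03'], b'B'))  # b'\x01B\x02B\x03'
print(string_agg_model([None, None, None, None]))                 # None
```

The model concatenates values in list order. It says nothing about the order in which rows of a distributed DataFrame reach the aggregate, which it does not attempt to capture.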