pyspark.sql.functions.stack#

pyspark.sql.functions.stack(*cols)[source]#

Separates col1, …, colk into n rows. Uses column names col0, col1, etc. by default unless specified otherwise.

New in version 3.5.0.

Parameters

colsColumn or column name: the first element should be a literal int for the number of rows to be separated, and the remaining are input elements to be separated.

Examples

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([(1, 2, 3)], ['a', 'b', 'c'])
>>> df.select('*', sf.stack(sf.lit(2), df.a, df.b, 'c')).show()
+---+---+---+----+----+
|  a|  b|  c|col0|col1|
+---+---+---+----+----+
|  1|  2|  3|   1|   2|
|  1|  2|  3|   3|NULL|
+---+---+---+----+----+

>>> df.select('*', sf.stack(sf.lit(2), df.a, df.b, 'c').alias('x', 'y')).show()
+---+---+---+---+----+
|  a|  b|  c|  x|   y|
+---+---+---+---+----+
|  1|  2|  3|  1|   2|
|  1|  2|  3|  3|NULL|
+---+---+---+---+----+

>>> df.select('*', sf.stack(sf.lit(3), df.a, df.b, 'c')).show()
+---+---+---+----+
|  a|  b|  c|col0|
+---+---+---+----+
|  1|  2|  3|   1|
|  1|  2|  3|   2|
|  1|  2|  3|   3|
+---+---+---+----+

>>> df.select('*', sf.stack(sf.lit(4), df.a, df.b, 'c')).show()
+---+---+---+----+
|  a|  b|  c|col0|
+---+---+---+----+
|  1|  2|  3|   1|
|  1|  2|  3|   2|
|  1|  2|  3|   3|
|  1|  2|  3|NULL|
+---+---+---+----+