pyspark.sql.functions.array_contains

pyspark.sql.functions.array_contains(col, value)

Collection function: Returns a boolean indicating whether the array contains the given value. The result is null if the array is null, true if the array contains the given value, and false otherwise.

New in version 1.5.0.

Changed in version 3.4.0: Supports Spark Connect.

Parameters
col : Column or str

The target column containing the arrays.

value

The value or column to check for in the array.

Returns
Column

A new Column of Boolean type, where each value indicates whether the corresponding array from the input column contains the specified value.
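The return rules described above can be modeled per row in plain Python. This is a minimal sketch for reasoning about results, not Spark's implementation: `array_contains_model` is a hypothetical helper, `None` stands in for SQL NULL, and the two branches marked as assumptions go beyond the description above (they follow SQL three-valued logic and should be verified against your Spark version).

```python
from typing import Any, Optional, Sequence


def array_contains_model(arr: Optional[Sequence[Any]], value: Any) -> Optional[bool]:
    """Model the result for a single row; None stands for SQL NULL."""
    # A null array yields null (stated in the description above).
    if arr is None:
        return None
    # Assumption: a null search value also yields null.
    if value is None:
        return None
    saw_null = False
    for element in arr:
        if element is None:
            # Null elements never match, but remember that one was seen.
            saw_null = True
        elif element == value:
            return True
    # Assumption: with no match and at least one null element, the
    # result is null rather than false.
    return None if saw_null else False


# The same rows used in Examples 1, 3, and 4, searched for "a".
rows = (["a", "b", "c"], [], None, ["a", None, "c"])
results = [array_contains_model(arr, "a") for arr in rows]
```

Here `results` is `[True, False, None, True]`, which lines up with the outputs shown in the examples for those rows.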

Examples

Example 1: Basic usage of array_contains function.

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([(["a", "b", "c"],), ([],)], ['data'])
>>> df.select(sf.array_contains(df.data, "a").alias("contains_a")).show()
+----------+
|contains_a|
+----------+
|      true|
|     false|
+----------+

Example 2: Usage of array_contains function with a column.

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([(["a", "b", "c"], "c"),
...                            (["c", "d", "e"], "d"),
...                            (["e", "a", "c"], "b")], ["data", "item"])
>>> df.select(sf.array_contains(df.data, sf.col("item"))
...   .alias("data_contains_item")).show()
+------------------+
|data_contains_item|
+------------------+
|              true|
|              true|
|             false|
+------------------+

Example 3: Usage of array_contains function with a null array.

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([(None,), (["a", "b", "c"],)], ['data'])
>>> df.select(sf.array_contains(df.data, "a").alias("contains_a")).show()
+----------+
|contains_a|
+----------+
|      NULL|
|      true|
+----------+

Example 4: Usage of array_contains with an array column containing null values.

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([(["a", None, "c"],)], ['data'])
>>> df.select(sf.array_contains(df.data, "a").alias("contains_a")).show()
+----------+
|contains_a|
+----------+
|      true|
+----------+