pyspark.sql.functions.regr_avgx

pyspark.sql.functions.regr_avgx(y, x)

Aggregate function: returns the average of the independent variable x over the pairs in a group where both y and x are non-null. Here y is the dependent variable and x is the independent variable.

New in version 3.5.0.

Parameters
y : Column or column name

the dependent variable.

x : Column or column name

the independent variable.

Returns
Column

the average of the independent variable for non-null pairs in a group, or NULL if the group contains no pair in which both values are non-null.

Examples

Example 1: All pairs are non-null

>>> import pyspark.sql.functions as sf
>>> df = spark.sql("SELECT * FROM VALUES (1, 2), (2, 2), (2, 3), (2, 4) AS tab(y, x)")
>>> df.select(sf.regr_avgx("y", "x"), sf.avg("x")).show()
+---------------+------+
|regr_avgx(y, x)|avg(x)|
+---------------+------+
|           2.75|  2.75|
+---------------+------+

Example 2: All pairs’ x values are null

>>> import pyspark.sql.functions as sf
>>> df = spark.sql("SELECT * FROM VALUES (1, null) AS tab(y, x)")
>>> df.select(sf.regr_avgx("y", "x"), sf.avg("x")).show()
+---------------+------+
|regr_avgx(y, x)|avg(x)|
+---------------+------+
|           NULL|  NULL|
+---------------+------+

Example 3: All pairs’ y values are null

>>> import pyspark.sql.functions as sf
>>> df = spark.sql("SELECT * FROM VALUES (null, 1) AS tab(y, x)")
>>> df.select(sf.regr_avgx("y", "x"), sf.avg("x")).show()
+---------------+------+
|regr_avgx(y, x)|avg(x)|
+---------------+------+
|           NULL|   1.0|
+---------------+------+

Example 4: Some pairs’ x values are null

>>> import pyspark.sql.functions as sf
>>> df = spark.sql("SELECT * FROM VALUES (1, 2), (2, null), (2, 3), (2, 4) AS tab(y, x)")
>>> df.select(sf.regr_avgx("y", "x"), sf.avg("x")).show()
+---------------+------+
|regr_avgx(y, x)|avg(x)|
+---------------+------+
|            3.0|   3.0|
+---------------+------+

Example 5: Some pairs’ x or y values are null

>>> import pyspark.sql.functions as sf
>>> df = spark.sql("SELECT * FROM VALUES (1, 2), (2, null), (null, 3), (2, 4) AS tab(y, x)")
>>> df.select(sf.regr_avgx("y", "x"), sf.avg("x")).show()
+---------------+------+
|regr_avgx(y, x)|avg(x)|
+---------------+------+
|            3.0|   3.0|
+---------------+------+
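
The null handling shown in the examples above can be summarized with a small pure-Python model. This is only an illustrative sketch of the semantics, not Spark's implementation: the names `regr_avgx_model` and `avg_model` are hypothetical helpers, and Python's `None` stands in for SQL NULL. It uses a dataset where the two aggregates differ, because a row with a null y is dropped by `regr_avgx` but still counted by `avg(x)`.

```python
from typing import Iterable, Optional, Tuple

# A (y, x) pair; None stands in for SQL NULL.
Pair = Tuple[Optional[float], Optional[float]]


def regr_avgx_model(pairs: Iterable[Pair]) -> Optional[float]:
    """Hypothetical model: average x over pairs where both y and x are non-null."""
    xs = [x for y, x in pairs if y is not None and x is not None]
    # No qualifying pair: the result is None, mirroring the NULL in Examples 2 and 3.
    return sum(xs) / len(xs) if xs else None


def avg_model(values: Iterable[Optional[float]]) -> Optional[float]:
    """Hypothetical model of avg(x): skips null x values and ignores y entirely."""
    kept = [v for v in values if v is not None]
    return sum(kept) / len(kept) if kept else None


rows = [(1, 2), (None, 9), (2, 4)]
print(regr_avgx_model(rows))           # 3.0: the (None, 9) pair is excluded
print(avg_model(x for _, x in rows))   # 5.0: 9 is still counted
```

In Example 5 the two columns happen to agree only because the excluded x value (3) equals the mean of the remaining ones.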