pyspark.sql.functions.regr_intercept

pyspark.sql.functions.regr_intercept(y, x)

Aggregate function: returns the intercept of the univariate linear regression line for non-null pairs in a group, where y is the dependent variable and x is the independent variable. A pair is ignored if either its y value or its x value is null.
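For reference, the standard SQL definition of REGR_INTERCEPT (AVG(y) minus REGR_SLOPE(y, x) times AVG(x)), which we assume this function follows, is the ordinary least-squares intercept computed over the n pairs in which both values are non-null:

```latex
\operatorname{regr\_intercept}(y, x) = \bar{y} - \hat{\beta}_1 \bar{x},
\qquad
\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
```

Here \bar{x} and \bar{y} are the means over those n pairs. For the data in Example 1 below, \bar{x} = \bar{y} = 2.5 and \hat{\beta}_1 = 5 / 5 = 1, so the intercept is 2.5 - 1 \cdot 2.5 = 0. Under this definition the result is NULL when n = 0; we also assume it is NULL when all x values are equal (zero denominator), a case the examples below do not show.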

New in version 3.5.0.

Parameters
y : Column or str

the dependent variable.

x : Column or str

the independent variable.

Returns
Column

the intercept of the univariate linear regression line for non-null pairs in a group.
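The null handling and the returned value can be cross-checked against a small plain-Python reference. This is a sketch under the standard SQL least-squares definition, not PySpark code: `regr_intercept_reference` is a hypothetical helper, and returning `None` for a constant x column is an assumption. Note the argument order: each tuple is `(y, x)`, matching `regr_intercept(y, x)` and the `tab(y, x)` column order in the examples; swapping the two generally gives a different line.

```python
from typing import Optional, Sequence, Tuple


def regr_intercept_reference(
    pairs: Sequence[Tuple[Optional[float], Optional[float]]],
) -> Optional[float]:
    """Least-squares intercept of y = a + b*x over (y, x) pairs.

    Hypothetical helper for illustration only; not part of PySpark.
    """
    # Keep only pairs where both y and x are non-null.
    clean = [(y, x) for y, x in pairs if y is not None and x is not None]
    n = len(clean)
    if n == 0:
        return None  # no usable pairs -> NULL, as in Examples 2 and 3
    x_mean = sum(x for _, x in clean) / n
    y_mean = sum(y for y, _ in clean) / n
    # Centered co-moment of (x, y) and second moment of x; the 1/n factors
    # of COVAR_POP and VAR_POP cancel in the slope, so they are omitted.
    sxy = sum((x - x_mean) * (y - y_mean) for y, x in clean)
    sxx = sum((x - x_mean) * (x - x_mean) for _, x in clean)
    if sxx == 0:
        return None  # assumption: constant x -> undefined slope -> NULL
    slope = sxy / sxx
    return y_mean - slope * x_mean


# Points on the line y = 2x + 1, given as (y, x) pairs.
print(regr_intercept_reference([(3, 1), (5, 2), (7, 3)]))  # 1.0
```

Feeding it the data from Examples 1 to 5 reproduces 0.0, NULL (`None`), NULL, 0.0 and 0.0 respectively.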

Examples

Example 1: All pairs are non-null

>>> import pyspark.sql.functions as sf
>>> df = spark.sql("SELECT * FROM VALUES (1, 1), (2, 2), (3, 3), (4, 4) AS tab(y, x)")
>>> df.select(sf.regr_intercept("y", "x")).show()
+--------------------+
|regr_intercept(y, x)|
+--------------------+
|                 0.0|
+--------------------+

Example 2: All pairs’ x values are null

>>> import pyspark.sql.functions as sf
>>> df = spark.sql("SELECT * FROM VALUES (1, null) AS tab(y, x)")
>>> df.select(sf.regr_intercept("y", "x")).show()
+--------------------+
|regr_intercept(y, x)|
+--------------------+
|                NULL|
+--------------------+

Example 3: All pairs’ y values are null

>>> import pyspark.sql.functions as sf
>>> df = spark.sql("SELECT * FROM VALUES (null, 1) AS tab(y, x)")
>>> df.select(sf.regr_intercept("y", "x")).show()
+--------------------+
|regr_intercept(y, x)|
+--------------------+
|                NULL|
+--------------------+

Example 4: Some pairs’ x values are null

>>> import pyspark.sql.functions as sf
>>> df = spark.sql("SELECT * FROM VALUES (1, 1), (2, null), (3, 3), (4, 4) AS tab(y, x)")
>>> df.select(sf.regr_intercept("y", "x")).show()
+--------------------+
|regr_intercept(y, x)|
+--------------------+
|                 0.0|
+--------------------+

Example 5: Some pairs’ x or y values are null

>>> import pyspark.sql.functions as sf
>>> df = spark.sql("SELECT * FROM VALUES (1, 1), (2, null), (null, 3), (4, 4) AS tab(y, x)")
>>> df.select(sf.regr_intercept("y", "x")).show()
+--------------------+
|regr_intercept(y, x)|
+--------------------+
|                 0.0|
+--------------------+
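Because this is an aggregate, it is evaluated once per group and sees rows one at a time. The sketch below illustrates that per-group, single-pass evaluation in plain Python using Welford-style running means and co-moments. It is an illustration only and makes no claim about how Spark implements the function; `InterceptState` and `grouped_regr_intercept` are hypothetical names, rows are `(group_key, y, x)` tuples, and returning `None` for a constant x is the same assumption as above.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Optional, Tuple


class InterceptState:
    """Running moments for one group (hypothetical helper, not PySpark API)."""

    __slots__ = ("n", "x_mean", "y_mean", "cxy", "m2x")

    def __init__(self) -> None:
        self.n = 0
        self.x_mean = 0.0
        self.y_mean = 0.0
        self.cxy = 0.0  # running sum of (x - x_mean) * (y - y_mean)
        self.m2x = 0.0  # running sum of (x - x_mean) ** 2

    def update(self, y: Optional[float], x: Optional[float]) -> None:
        if y is None or x is None:
            return  # a row with a null in either column does not contribute
        self.n += 1
        dx = x - self.x_mean  # deviation from the *old* x mean
        self.x_mean += dx / self.n
        self.y_mean += (y - self.y_mean) / self.n
        # Welford-style updates: old x deviation times new deviations.
        self.cxy += dx * (y - self.y_mean)
        self.m2x += dx * (x - self.x_mean)

    def result(self) -> Optional[float]:
        if self.n == 0 or self.m2x == 0.0:
            return None  # no non-null pairs, or constant x (assumption)
        return self.y_mean - (self.cxy / self.m2x) * self.x_mean


def grouped_regr_intercept(
    rows: Iterable[Tuple[Hashable, Optional[float], Optional[float]]],
) -> Dict[Hashable, Optional[float]]:
    """Intercept per group key, in a single pass over (key, y, x) rows."""
    states: Dict[Hashable, InterceptState] = defaultdict(InterceptState)
    for key, y, x in rows:
        states[key].update(y, x)  # every key gets a state, even all-null ones
    return {key: state.result() for key, state in states.items()}


rows = [
    ("a", 3, 1), ("a", 5, 2), ("a", 7, 3),  # y = 2x + 1
    ("b", 1, None),                         # no non-null pairs
    ("c", 2, 5), ("c", 4, 5),               # constant x
]
print(grouped_regr_intercept(rows))  # {'a': 1.0, 'b': None, 'c': None}
```

Group `b` still appears in the output with a `None` result, in line with Example 2, where a group with no non-null pairs yields NULL rather than disappearing.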