pyspark.pandas.Series.corr
- Series.corr(other, method='pearson', min_periods=None)
- Compute correlation with other Series, excluding missing values.
- New in version 3.3.0.
- Parameters
- other : Series
- The Series with which to compute the correlation.
- method : {'pearson', 'spearman', 'kendall'}
- pearson : standard correlation coefficient
- spearman : Spearman rank correlation
- kendall : Kendall Tau correlation coefficient
- Changed in version 3.4.0: support 'kendall' for method parameter
- min_periods : int, optional
- Minimum number of observations needed to have a valid result.
- New in version 3.4.0.
 
- Returns
- correlation : float
 
- Notes
- The complexity of Kendall correlation is O(#row * #row); if the dataset is too large, sampling ahead of the correlation computation is recommended.
- Examples

```
>>> import numpy as np
>>> import pyspark.pandas as ps
>>> df = ps.DataFrame({'s1': [.2, .0, .6, .2],
...                    's2': [.3, .6, .0, .1]})
>>> s1 = df.s1
>>> s2 = df.s2
>>> s1.corr(s2, method='pearson')
-0.85106...
>>> s1.corr(s2, method='spearman')
-0.94868...
>>> s1.corr(s2, method='kendall')
-0.91287...
>>> s1 = ps.Series([1, np.nan, 2, 1, 1, 2, 3])
>>> s2 = ps.Series([3, 4, 1, 1, 5])
>>> with ps.option_context("compute.ops_on_diff_frames", True):
...     s1.corr(s2, method="pearson")
-0.52223...
>>> with ps.option_context("compute.ops_on_diff_frames", True):
...     s1.corr(s2, method="spearman")
-0.54433...
>>> with ps.option_context("compute.ops_on_diff_frames", True):
...     s1.corr(s2, method="kendall")
-0.51639...
>>> with ps.option_context("compute.ops_on_diff_frames", True):
...     s1.corr(s2, method="kendall", min_periods=5)
nan
```
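The sampling recommendation in the Notes follows from Kendall's pairwise definition: every pair of rows is compared, so cost grows quadratically with row count. As a minimal illustration, here is a plain NumPy sketch (not pyspark.pandas code) of a naive Kendall tau over a fixed-seed subsample; `kendall_tau` is a hypothetical helper written for this example, not a library function.

```python
import numpy as np

def kendall_tau(x, y):
    """Naive Kendall tau: compares every pair of rows, hence O(n^2) work."""
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        # Sign of the pairwise differences against all later rows.
        s = np.sign(x[i] - x[i + 1:]) * np.sign(y[i] - y[i + 1:])
        concordant += np.sum(s > 0)
        discordant += np.sum(s < 0)
    return (concordant - discordant) / (n * (n - 1) / 2)

rng = np.random.default_rng(42)
x = rng.normal(size=5000)
y = x + rng.normal(scale=1.0, size=5000)  # positively correlated with x

# The full computation would touch ~12.5M pairs; a 10% subsample
# touches ~125k pairs while keeping the estimate close.
idx = rng.choice(len(x), size=500, replace=False)
approx = kendall_tau(x[idx], y[idx])
```

The same idea applies at Spark scale: drawing a sample with a fixed seed before calling `corr(..., method='kendall')` trades a small amount of estimation error for a quadratic reduction in work.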