| corr {SparkR} | R Documentation |
Computes the Pearson Correlation Coefficient for two Columns.
Calculates the correlation of two columns of a SparkDataFrame. Currently only supports the Pearson Correlation Coefficient. For Spearman Correlation, consider using RDD methods found in MLlib's Statistics.
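For orientation, the quantity described above can be reproduced locally in base R. This is a minimal sketch of the textbook Pearson formula, not SparkR's implementation. The vectors `x` and `y` and the helper `pearson()` are hypothetical names introduced only for illustration.

```r
# Hypothetical sample vectors, for illustration only.
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)

# Pearson r: sum of products of the centered values, divided by the
# square root of the product of the two sums of squared deviations.
pearson <- function(x, y) {
  dx <- x - mean(x)
  dy <- y - mean(y)
  sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
}

r <- pearson(x, y)

# Cross-check against base R's stats::cor(), which defaults to Pearson.
stopifnot(isTRUE(all.equal(r, stats::cor(x, y, method = "pearson"))))
```

For data small enough to collect to the driver, `stats::cor(x, y, method = "spearman")` is one possible local alternative for the Spearman coefficient.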
corr(x, ...)

## S4 method for signature 'Column'
corr(x, col2)

## S4 method for signature 'SparkDataFrame'
corr(x, colName1, colName2, method = "pearson")
x |
a Column or a SparkDataFrame. |
... |
additional argument(s): col2 when x is a Column; colName1, colName2 and, optionally, method when x is a SparkDataFrame. |
col2 |
a (second) Column. |
colName1 |
the name of the first column |
colName2 |
the name of the second column |
method |
Optional. A character string specifying the method for calculating the correlation. Currently only "pearson" is supported. |
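For the SparkDataFrame method, the Pearson Correlation Coefficient as a Double. For the Column method, a Column expression that yields the coefficient when evaluated, for example inside select or agg.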
corr since 1.6.0
Other math_funcs: acos, asin, atan2, atan, bin, bround, cbrt, ceil, conv, cosh, cos, covar_pop, cov, expm1, exp, factorial, floor, hex, hypot, log10, log1p, log2, log, pmod, rint, round, shiftLeft, shiftRightUnsigned, shiftRight, signum, sinh, sin, sqrt, tanh, tan, toDegrees, toRadians, unhex
Other stat functions: approxQuantile, cov, crosstab, freqItems, sampleBy
## Not run: corr(df$c, df$d)
## Not run:
##D df <- read.json("/path/to/file.json")
##D r <- corr(df, "title", "gender")
##D r <- corr(df, "title", "gender", method = "pearson")
## End(Not run)
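The two signatures can also be compared side by side. The sketch below assumes SparkR 2.x or later and a local Spark installation. It uses R's built-in `mtcars` data set and its `mpg` and `hp` columns purely for illustration; they are not part of the original examples.

```r
library(SparkR)

# Assumption: a local Spark installation is available to SparkR.
sparkR.session(master = "local[1]")

# Built-in R data set, converted to a SparkDataFrame for illustration.
df <- createDataFrame(mtcars)

# SparkDataFrame signature: returns the coefficient directly as a number.
r_df <- corr(df, "mpg", "hp")

# Column signature: builds an aggregate Column expression, so it has to be
# evaluated through select() and brought back with collect().
r_col <- collect(select(df, corr(df$mpg, df$hp)))[[1]]

sparkR.stop()
```

Both `r_df` and `r_col` should agree with `stats::cor(mtcars$mpg, mtcars$hp)` up to floating-point tolerance.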