| dapplyCollect {SparkR} | R Documentation |
Apply a function to each partition of a SparkDataFrame and collect the result back to R as a data.frame.
dapplyCollect(x, func)

## S4 method for signature 'SparkDataFrame,function'
dapplyCollect(x, func)
x |
A SparkDataFrame |
func |
A function to be applied to each partition of the SparkDataFrame. func should have only one parameter, to which an R data.frame corresponding to each partition will be passed. The output of func should be an R data.frame. |
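The contract for func can be sketched locally in base R, without a Spark session. The helper emulate_dapply_collect and its num_partitions argument are hypothetical names introduced only for this illustration. They are not part of SparkR, and the contiguous row chunks are only an assumed stand-in for how Spark might partition the data.

```r
# Hypothetical local stand-in for dapplyCollect(); it does not use Spark.
emulate_dapply_collect <- function(local_df, func, num_partitions = 2L) {
  # Assign rows to contiguous chunks that play the role of partitions.
  ids <- ceiling(seq_len(nrow(local_df)) * num_partitions / nrow(local_df))
  parts <- split(local_df, ids)
  # func receives one R data.frame per chunk and must return a data.frame.
  results <- lapply(parts, func)
  # Row-bind the per-chunk results into a single local data.frame.
  out <- do.call(rbind, results)
  rownames(out) <- NULL
  out
}

local_df <- data.frame(a = 1:3, b = c(1, 2, 3), c = c("1", "2", "3"),
                       stringsAsFactors = FALSE)

# Keep rows where a > 1 and add a column d, one chunk at a time.
ldf <- emulate_dapply_collect(local_df, function(x) {
  y <- x[x$a > 1, ]
  y$d <- y$a + 1L
  y
})
print(ldf)
```

Because func sees only one chunk at a time, it should not rely on values from other partitions, and it should return a data.frame with the same columns for every chunk, including empty ones, so that the pieces can be combined.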
dapplyCollect since 2.0.0
Other SparkDataFrame functions: SparkDataFrame-class,
agg, arrange,
as.data.frame, attach,
cache, coalesce,
collect, colnames,
coltypes,
createOrReplaceTempView,
crossJoin, dapply,
describe, dim,
distinct, dropDuplicates,
dropna, drop,
dtypes, except,
explain, filter,
first, gapplyCollect,
gapply, getNumPartitions,
group_by, head,
histogram, insertInto,
intersect, isLocal,
join, limit,
merge, mutate,
ncol, nrow,
persist, printSchema,
randomSplit, rbind,
registerTempTable, rename,
repartition, sample,
saveAsTable, schema,
selectExpr, select,
showDF, show,
storageLevel, str,
subset, take,
union, unpersist,
withColumn, with,
write.df, write.jdbc,
write.json, write.orc,
write.parquet, write.text
## Not run:
##D df <- createDataFrame(iris)
##D ldf <- dapplyCollect(df, function(x) { x })
##D
##D # filter and add a column
##D df <- createDataFrame(
##D list(list(1L, 1, "1"), list(2L, 2, "2"), list(3L, 3, "3")),
##D c("a", "b", "c"))
##D ldf <- dapplyCollect(
##D df,
##D function(x) {
##D y <- x[x[1] > 1, ]
##D y <- cbind(y, d = y[[1]] + 1L)
##D })
##D # the result
##D # a b c d
##D # 2 2 2 3
##D # 3 3 3 4
## End(Not run)