| repartition {SparkR} | R Documentation |
The following options for repartition are possible:
1. Return a new SparkDataFrame that has exactly numPartitions partitions.
2. Return a new SparkDataFrame hash partitioned by
the given column(s) into numPartitions partitions.
3. Return a new SparkDataFrame hash partitioned by the given column(s),
using spark.sql.shuffle.partitions as the number of partitions.
repartition(x, ...)

## S4 method for signature 'SparkDataFrame'
repartition(x, numPartitions = NULL, col = NULL, ...)
| x | a SparkDataFrame. |
| ... | additional column(s) to be used in the partitioning. |
| numPartitions | the number of partitions to use. |
| col | the column by which the partitioning will be performed. |
repartition since 1.4.0
Other SparkDataFrame functions: SparkDataFrame-class,
agg, arrange,
as.data.frame, attach,
cache, coalesce,
collect, colnames,
coltypes,
createOrReplaceTempView,
crossJoin, dapplyCollect,
dapply, describe,
dim, distinct,
dropDuplicates, dropna,
drop, dtypes,
except, explain,
filter, first,
gapplyCollect, gapply,
getNumPartitions, group_by,
head, histogram,
insertInto, intersect,
isLocal, join,
limit, merge,
mutate, ncol,
nrow, persist,
printSchema, randomSplit,
rbind, registerTempTable,
rename, sample,
saveAsTable, schema,
selectExpr, select,
showDF, show,
storageLevel, str,
subset, take,
union, unpersist,
withColumn, with,
write.df, write.jdbc,
write.json, write.orc,
write.parquet, write.text
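## Not run:
sparkR.session()
path <- "path/to/file.json"
df <- read.json(path)
# Option 1: exactly 2 partitions (positional, then named argument)
newDF <- repartition(df, 2L)
newDF <- repartition(df, numPartitions = 2L)
# Option 3: hash partition by col1 and col2, using spark.sql.shuffle.partitions
newDF <- repartition(df, col = df$"col1", df$"col2")
# Option 2: hash partition by col1 and col2 into 3 partitions
newDF <- repartition(df, 3L, col = df$"col1", df$"col2")
## End(Not run)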
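
As a further sketch, the partition count produced by each option can be checked with getNumPartitions (listed among the related functions above). This assumes a local Spark installation that sparkR.session() can start. The built-in mtcars data set and the variable names below are illustrative choices, not part of the original examples.

```r
library(SparkR)

# Assumption: Spark is installed locally and can be started from R.
sparkR.session()

# mtcars ships with R; it is used here only to have some data to partition.
df <- createDataFrame(mtcars)

# Option 1: request exactly 4 partitions.
byCount <- repartition(df, 4L)
nCount <- getNumPartitions(byCount)
print(nCount)

# Option 2: hash partition by cyl and gear into 3 partitions.
byCols <- repartition(df, 3L, col = df$cyl, df$gear)
nCols <- getNumPartitions(byCols)
print(nCols)

# Option 3: hash partition by cyl without giving a count. The result depends
# on spark.sql.shuffle.partitions and, in newer Spark versions, may be
# reduced by adaptive query execution, so no fixed value is assumed here.
byDefault <- repartition(df, col = df$cyl)
nDefault <- getNumPartitions(byDefault)
print(nDefault)
```

With an explicit numPartitions, the first two counts are expected to match the requested values (4 and 3). The third varies with the session configuration.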