Spark DataFrame groupBy and aggregation throws NegativeArraySizeException - exception, apache-spark, dataframe

I am running the following query on a Spark DataFrame:

import org.apache.spark.sql.functions.count

input
  .select("id")
  .groupBy("id")
  .agg(count("*").as("count"))

It fails with a java.lang.NegativeArraySizeException:

at org.apache.spark.unsafe.types.UTF8String.getBytes(UTF8String.java:234)
at org.apache.spark.unsafe.types.UTF8String.toString(UTF8String.java:827)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificMutableProjection.apply(Unknown Source)
at org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator$$anonfun$generateProcessRow$1.apply(TungstenAggregationIterator.scala:276)
at org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator$$anonfun$generateProcessRow$1.apply(TungstenAggregationIterator.scala:273)
at org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.processInputs(TungstenAggregationIterator.scala:533)

Answers:

0 for answer #1

The following should work:

input.groupBy("id").count()
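
For anyone who wants to compare the two forms side by side, here is a minimal sketch. It assumes Spark 2.x or later (it uses SparkSession, whereas the stack trace in the question looks like an older release), a local master, and a small made-up `input` DataFrame standing in for the real data. It does not reproduce the exception; it only shows that both expressions give the same per-id counts on well-formed data.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.count

object GroupByCountSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session is enough for this illustration.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("groupby-count-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical stand-in for the question's `input` DataFrame.
    val input = Seq("a", "b", "a", "c", "a").toDF("id")

    // Form used in the question: explicit aggregate expression.
    val viaAgg = input
      .select("id")
      .groupBy("id")
      .agg(count("*").as("count"))

    // Form suggested in the answer: built-in count on the grouped data.
    val viaCount = input.groupBy("id").count()

    // Both DataFrames have the columns `id` and `count`.
    viaAgg.orderBy("id").show()
    viaCount.orderBy("id").show()

    spark.stop()
  }
}
```

If both forms fail the same way on the real data, the problem is more likely in the contents of the `id` column than in how the count is written, but that is a guess based only on the stack trace above.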