Pig error on SUM function - hadoop, apache-pig, hadoop2

I have data like this:

store  trn_date  dept_id sale_amt
1    2014-12-14 101   10007655
1    2014-12-14 101   10007654
1    2014-12-14 101   10007544
6    2014-12-14 104   100086544
8    2014-12-14 101   1000000
9    2014-12-14 106   1000000

I want to get the sum of sale_amt. To do that, I am doing the following.

First, I load the data using:

table = LOAD 'table' USING org.apache.hcatalog.pig.HCatLoader();

Then I group the data by store, tran_date, dept_id:

grp_table = GROUP table BY (store, tran_date, dept_id);

Finally, I try to get the SUM of sale_amt using:

grp_gen = FOREACH grp_table GENERATE
FLATTEN(group) AS (store, tran_date, dept_id),
SUM(table.sale_amt) AS tota_sale_amt;

but I get the error below:

================================================================================
Pig Stack Trace
---------------
ERROR 2103: Problem doing work on Longs

org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing (Name: grouped_all: Local Rearrange[tuple]{tuple}(false) - scope-1317 Operator Key: scope-1317): org.apache.pig.backend.executionengine.ExecException: ERROR 2103: Problem doing work on Longs
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:289)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNextTuple(POLocalRearrange.java:263)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.processOnePackageOutput(PigCombiner.java:183)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:161)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:51)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171)
at org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1645)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1611)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1462)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:700)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:770)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2103: Problem doing work on Longs
at org.apache.pig.builtin.AlgebraicLongMathBase.doTupleWork(AlgebraicLongMathBase.java:84)
at org.apache.pig.builtin.AlgebraicLongMathBase$Intermediate.exec(AlgebraicLongMathBase.java:108)
at org.apache.pig.builtin.AlgebraicLongMathBase$Intermediate.exec(AlgebraicLongMathBase.java:102)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:330)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNextTuple(POUserFunc.java:369)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:333)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:378)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:298)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:281)
Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Number
at org.apache.pig.builtin.AlgebraicLongMathBase.doTupleWork(AlgebraicLongMathBase.java:77)
================================================================================

Since I am reading the table with the HCatalog loader and the column's data type in the Hive table is string, I also tried casting in the script, but I still get the same error.

Answers:

0 for answer #1

I don't have HCatalog installed on my system, so I tried this with a plain file, but the approach and code below should work for you.

1. SUM works only with the data types int, long, float, double, bigdecimal, biginteger, or bytearray cast as double. It looks like your sale_amt column is a string, so you need to cast this column to long or double before using the SUM function.

2. Don't use store as a variable name, because it is a reserved keyword in Pig; you will have to rename this variable, otherwise you will get an error. I renamed the variable to "stores".

Example:

table:

1    2014-12-14   101   10007655
1    2014-12-14   101   10007654
1    2014-12-14   101   10007544
6    2014-12-14   104   100086544
8    2014-12-14   101   1000000
9    2014-12-14   106   1000000

PigScript:

A = LOAD 'table' USING PigStorage() AS (store:chararray,trn_date:chararray,dept_id:chararray,sale_amt:chararray);
B = FOREACH A GENERATE $0 AS stores,trn_date,dept_id,(long)sale_amt; --Renamed the variable store to stores and typecasted the sale_amt to long.
C = GROUP B BY (stores,trn_date,dept_id);
D = FOREACH C GENERATE FLATTEN(group),SUM(B.sale_amt);
DUMP D;

Output:

(1,2014-12-14,101,30022853)
(6,2014-12-14,104,100086544)
(8,2014-12-14,101,1000000)
(9,2014-12-14,106,1000000)
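
The same two steps (rename store, cast sale_amt before grouping) could be applied to the HCatLoader load from the question roughly as follows. This is an untested sketch, not a confirmed fix: it assumes the Hive table is named 'table' and that its columns arrive in the order store, tran_date, dept_id, sale_amt. The aliases raw, typed, grp, totals and the field name total_sale_amt are made up for illustration. The asker reports that casting alone did not remove the error, so the result may differ in that environment.

```pig
-- Assumption: table name and column order are taken from the question.
raw = LOAD 'table' USING org.apache.hcatalog.pig.HCatLoader();

-- Positional references ($0..$3) avoid writing the reserved word "store";
-- sale_amt is cast to long before the GROUP so that SUM receives numbers.
typed = FOREACH raw GENERATE
            $0 AS stores,
            $1 AS tran_date,
            $2 AS dept_id,
            (long)$3 AS sale_amt;

grp = GROUP typed BY (stores, tran_date, dept_id);

totals = FOREACH grp GENERATE
             FLATTEN(group) AS (stores, tran_date, dept_id),
             SUM(typed.sale_amt) AS total_sale_amt;

DUMP totals;
```

Running DESCRIBE typed; before the GROUP is a quick way to check that sale_amt really has type long at that point.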