0 votes
1 view
in Big Data Hadoop & Spark by (11.5k points)

Is there a Hive query to quickly find table size (i.e. number of rows) without launching a time-consuming MapReduce job? (Which is why I want to avoid COUNT(*).)

I tried DESCRIBE EXTENDED, but that yielded numRows=0 which is obviously not correct.

1 Answer

0 votes
by (32.3k points)
edited by

And for non-partitioned tables, “tblproperties” will give the size:

  • To get all the properties:

show tblproperties yourTableName

  • To show just the raw data size:

show tblproperties yourTableName("rawDataSize")

If the table is partitioned here is a quick command for you:

 hive> ANALYZE TABLE ops_bc_log PARTITION(day) COMPUTE STATISTICS noscan;

output is

Partition logdata.ops_bc_log{day=20140523} stats: [numFiles=37, numRows=26095186, totalSize=654249957, rawDataSize=58080809507]

Partition logdata.ops_bc_log{day=20140521} stats: [numFiles=30, numRows=21363807, totalSize=564014889, rawDataSize=47556570705]

Partition logdata.ops_bc_log{day=20140524} stats: [numFiles=35, numRows=25210367, totalSize=631424507, rawDataSize=56083164109]

Partition logdata.ops_bc_log{day=20140522} stats: [numFiles=37, numRows=26295075, totalSize=657113440, rawDataSize=58496087068]

If you want to know more about Hive, then do check out this awesome video tutorial:

Welcome to Intellipaat Community. Get your technical queries answered by top developers !


Categories

...