Intellipaat Back

Explore Courses Blog Tutorials Interview Questions
0 votes
2 views
in Big Data Hadoop & Spark by (11.4k points)

Is there a Hive query to quickly find table size (i.e. number of rows) without launching a time-consuming MapReduce job? (Which is why I want to avoid COUNT(*).)

I tried DESCRIBE EXTENDED, but that yielded numRows=0 which is obviously not correct.

2 Answers

0 votes
by (32.3k points)
edited by

And for non-partitioned tables, “tblproperties” will give the size:

  • To get all the properties:

show tblproperties yourTableName

  • To show just the raw data size:

show tblproperties yourTableName("rawDataSize")

If the table is partitioned here is a quick command for you:

 hive> ANALYZE TABLE ops_bc_log PARTITION(day) COMPUTE STATISTICS noscan;

output is

Partition logdata.ops_bc_log{day=20140523} stats: [numFiles=37, numRows=26095186, totalSize=654249957, rawDataSize=58080809507]

Partition logdata.ops_bc_log{day=20140521} stats: [numFiles=30, numRows=21363807, totalSize=564014889, rawDataSize=47556570705]

Partition logdata.ops_bc_log{day=20140524} stats: [numFiles=35, numRows=25210367, totalSize=631424507, rawDataSize=56083164109]

Partition logdata.ops_bc_log{day=20140522} stats: [numFiles=37, numRows=26295075, totalSize=657113440, rawDataSize=58496087068]

If you want to know more about Hive, then do check out this awesome video tutorial:

0 votes
by (1.2k points)

There are numerous approaches using which we can find the number of rows without the need to execute very time consuming code, like:

  1. We can use the table properties:

SHOW TABLEPROPERTIES table_name;

This will result in the properties that includes the `rawDataSize` attribute which will allow you to get data size, but won’t be returning you the exact number of rows

  1. Additionally, we can also use more resource-intensive technique like ANALYZE TABLE

ANALYZE TABLE table_name COMPUTE STATISTICS;

Once this command is up and running, then execute 

DESCRIBE FORMATTED table_name

This will return your the rows counts

As there is no direct method to get the row count directly, but using table properties, HDFS commands, and by analyzing the table statistics we can reduce the time spent in using COUNT query

31k questions

32.8k answers

501 comments

693 users

Browse Categories

...