
Can someone tell me what metadata is? What is the difference between internal tables and external tables in Hive?

2 Answers

by (13.2k points)

Hive keeps track of its state in a relational database (the metastore) on its master node. If you have a partitioned table, its partitions are recorded in that database, which lets Hive use the list of partitions without going to the file system to find them. Information of this kind is the 'metadata'. When you drop a table, Hive removes its metadata and returns to the state it was in before the table was created. If you run the query 'select * from jay' after you drop jay, Hive will tell you the table doesn't exist, because you have told Hive to forget that table.
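A minimal sketch of the "forget the table" behaviour, assuming a drastically simplified metastore modelled as a Python dict from table name to schema. The names ToyMetastore, resolve and TableNotFound are hypothetical and are not part of Hive; the error text only imitates the wording of Hive's message.

```python
# Hypothetical toy model, not Hive's real API: the metastore is reduced to a
# dict mapping table names to column schemas.

class TableNotFound(Exception):
    """Raised when a table name is not registered in the toy metastore."""


class ToyMetastore:
    def __init__(self):
        self.tables = {}  # table name -> list of (column name, column type)

    def create_table(self, name, columns):
        # CREATE TABLE records the schema as metadata.
        self.tables[name] = list(columns)

    def drop_table(self, name):
        # DROP TABLE makes the metastore "forget" the table.
        del self.tables[name]

    def has_table(self, name):
        return name in self.tables

    def resolve(self, name):
        # A query such as SELECT * FROM jay must resolve the name here first.
        if name not in self.tables:
            raise TableNotFound(f"Table not found '{name}'")
        return self.tables[name]


ms = ToyMetastore()
ms.create_table("jay", [("foo", "string")])
print(ms.resolve("jay"))  # [('foo', 'string')]
ms.drop_table("jay")
try:
    ms.resolve("jay")
except TableNotFound as err:
    print(err)  # Table not found 'jay'
```

The point of the sketch is that the lookup fails because the metadata entry is gone, not because anything was checked on the file system.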

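So, internal (managed) tables are used when we want Hive to fully manage both the table and its data, because dropping an internal table also deletes its data. External tables are used when the data is also used outside of Hive, because dropping an external table leaves the data in place.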

by (33.1k points)

Hive keeps a relational database on the master node that it uses to keep track of state. For example, when you run CREATE TABLE FOO(foo string) LOCATION 'hdfs:///tmp/';, the table schema and its location are saved in this database.

If you have a partitioned table, the partitions are saved in the database (this allows Hive to use the list of partitions without going to the file system to find them, etc.). These kinds of things are 'metadata'.
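To illustrate why storing partitions in the database helps, here is a minimal sketch, assuming a toy metastore that keeps each table's base directory and partition values in Python dicts. ToyMetastore, partition_paths and the paths under hdfs:///warehouse/logs are hypothetical; only the key=value directory naming follows Hive's convention for partition directories.

```python
# Hypothetical toy model, not Hive's real API: partition values live in the
# metastore, so partition pruning needs no file-system listing.

class ToyMetastore:
    def __init__(self):
        self.locations = {}   # table name -> base directory
        self.partitions = {}  # table name -> list of partition specs

    def create_table(self, name, location):
        self.locations[name] = location
        self.partitions[name] = []

    def add_partition(self, name, spec):
        # Roughly what ALTER TABLE ... ADD PARTITION records as metadata.
        self.partitions[name].append(dict(spec))

    def partition_paths(self, name, predicate):
        # Choose directories from metadata alone; the file system is never listed.
        base = self.locations[name]
        return [
            base + "/" + "/".join(f"{key}={value}" for key, value in spec.items())
            for spec in self.partitions[name]
            if predicate(spec)
        ]


ms = ToyMetastore()
ms.create_table("logs", "hdfs:///warehouse/logs")
for day in ("2024-01-01", "2024-01-02", "2024-01-03"):
    ms.add_partition("logs", {"dt": day})

# Like WHERE dt >= '2024-01-02' on a table PARTITIONED BY (dt string).
paths = ms.partition_paths("logs", lambda spec: spec["dt"] >= "2024-01-02")
print(paths)
# ['hdfs:///warehouse/logs/dt=2024-01-02', 'hdfs:///warehouse/logs/dt=2024-01-03']
```

Real Hive does much more (statistics, file formats, many partition keys), but the design idea is the same: answering "which partitions exist?" from the database is far cheaper than walking directories in HDFS.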

When you drop an internal table, Hive drops the data and also drops the metadata.

When you drop an external table, Hive drops only the metadata. This means Hive no longer knows about that data; it does not touch the data itself.
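The drop behaviour described above can be sketched as follows, assuming a toy model in which a dict of paths stands in for HDFS and another dict stands in for the metastore. ToyHive and its methods are hypothetical names for illustration only, not Hive's API.

```python
# Hypothetical toy model, not Hive's real API: "metastore" holds table metadata,
# "files" stands in for HDFS (path -> contents).

class ToyHive:
    def __init__(self):
        self.metastore = {}  # table name -> {"location": str, "external": bool}
        self.files = {}      # file path -> file contents

    def create_table(self, name, location, external=False):
        self.metastore[name] = {"location": location, "external": external}

    def write(self, name, filename, data):
        # Any writer (Hive or another tool) puts files under the table location.
        location = self.metastore[name]["location"]
        self.files[f"{location}/{filename}"] = data

    def drop_table(self, name):
        meta = self.metastore.pop(name)  # the metadata is always removed
        if not meta["external"]:
            # Internal (managed) table: the data under its location is deleted too.
            prefix = meta["location"] + "/"
            for path in [p for p in self.files if p.startswith(prefix)]:
                del self.files[path]
        # External table: the files are left untouched.


hive = ToyHive()
hive.create_table("managed_t", "/warehouse/managed_t")
hive.create_table("external_t", "/data/external_t", external=True)
hive.write("managed_t", "part-0", "a,b")
hive.write("external_t", "part-0", "c,d")

hive.drop_table("managed_t")
hive.drop_table("external_t")
print(sorted(hive.files))  # ['/data/external_t/part-0']
print(hive.metastore)      # {}
```

After both drops Hive knows about neither table, but only the external table's file is still present, which is why external tables suit data that other tools also read.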

