Back

Explore Courses Blog Tutorials Interview Questions
0 votes
2 views
in Big Data Hadoop & Spark by (11.4k points)

I have a data file which is in .txt format. I am using the file to load data into Hive tables. When I load the file in a table like

CREATE TABLE test_details_txt(
visit_id INT,
store_id SMALLINT) STORED AS TEXTFILE;

 

the data is loaded correctly using

LOAD DATA LOCAL INPATH '/home/user/test_details.txt' INTO TABLE test_details_txt;


and I can run a SELECT * FROM test_details_txt; on the table in Hive.

However If I try to load the data in a table that is

CREATE TABLE test_details_txt(
visit_id INT,
store_id SMALLINT) STORED AS ORC; 


I receive the following error on trying to run a SELECT:

Failed with exception java.io.IOException:java.io.IOException: Malformed ORC file hdfs://master:6000/user/hive/warehouse/test.db/transaction_details/test_details.txt. Invalid postscript.

While loading the data using above LOAD statement I do not receive any error or exception.

1 Answer

0 votes
by (32.3k points)
edited by

The ORC(Optimized Row Columnar) file format gives a highly efficient way to store data in Hive. It was created to overcome the limitations of the other Hive file formats. Usage of ORC files in Hive increases the performance of reading, writing, and processing data.

LOAD DATA is used to copy the files to hive datafiles. So, in this case, if you are loading the input file /home/user/test_details.txt into an ORC table, it is required to be in ORC format.

A possible workaround is to create a temporary table that is STORED AS TEXT,  LOAD DATA into it, and at last copy data to the ORC table.

Example:

Storing as a Text file:

CREATE TABLE test_details_txt( visit_id INT, store_id SMALLINT) STORED AS TEXTFILE;

CREATE TABLE test_details_orc( visit_id INT, store_id SMALLINT) STORED AS ORC;

Load into Text table:

LOAD DATA LOCAL INPATH '/home/user/test_details.txt' INTO TABLE test_details_txt;

Copy to ORC table:

INSERT INTO TABLE test_details_orc SELECT * FROM test_details_txt;

If you want to know more about Hive, then do check out this awesome video tutorial:

Related questions

Browse Categories

...