
I want to create a Hive table from some nested JSON data and run queries on it. Is this even possible?

I've gotten as far as uploading the JSON file to S3 and launching an EMR instance, but I don't know what to type in the Hive console to turn the JSON file into a Hive table.

Does anyone have an example command to get me started? I can't find anything useful with Google.

1 Answer


You can use a JSON SerDe. Create a table whose columns mirror the structure of the JSON records.

Then upload the JSON file to the table's location path, with one complete JSON object per line, and make sure Hive has permission to read it. After that you are good to go.

For example:

Let data.json be:

{"X": 134, "Y": 55, "labels": ["L1", "L2"]}
{"X": 11, "Y": 166, "labels": ["L1", "L3", "L4"]}

Now, create a table:

CREATE TABLE json_table (  -- json_table is a placeholder table name
    X INT,
    Y INT,
    labels ARRAY<STRING>
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS TEXTFILE
LOCATION 'path/to/table';
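
Once the table exists, it can be queried like any other Hive table. The following is only a sketch: it assumes the table above was created under the placeholder name json_table, and the second half uses a hypothetical table nested_points and a hypothetical nested record (neither is part of the example above) to show how a nested JSON object could map to a STRUCT column.

```sql
-- Assumes the example table was created as json_table (placeholder name).

-- Plain columns and array access: first label and number of labels per row.
SELECT X, Y, labels[0] AS first_label, size(labels) AS label_count
FROM json_table;

-- One output row per array element, using LATERAL VIEW with explode().
SELECT t.X, t.Y, l.label
FROM json_table t
LATERAL VIEW explode(t.labels) l AS label;

-- Hypothetical nested record, one per line in the data file:
--   {"id": 1, "point": {"X": 134, "Y": 55}, "labels": ["L1", "L2"]}
-- A nested JSON object maps to a STRUCT; nested_points is a placeholder name.
CREATE TABLE nested_points (
    id INT,
    point STRUCT<X:INT, Y:INT>,
    labels ARRAY<STRING>
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS TEXTFILE
LOCATION 'path/to/table';

-- Struct fields are read with dot notation.
SELECT id, point.X, point.Y
FROM nested_points;
```

Depending on the Hive installation, the hive-hcatalog-core jar that provides this SerDe may have to be registered with ADD JAR before the CREATE TABLE statement will run.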
