
Can I think of an ORC file as similar to a CSV file, with column headings and rows of data? If so, can I read it into a simple pandas dataframe? I am not familiar with tools like Hadoop or Spark. Do I need to understand them just to see the contents of a local ORC file in Python?

The filename is someFile.snappy.orc

I see online that spark.read.orc('someFile.snappy.orc') works, but even after import pyspark it throws an error.

1 Answer


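You do not need Spark or Hadoop to read a local ORC file; pyarrow can load it straight into a pandas dataframe. Open the file in binary mode, because ORC is a binary format:

import pandas as pd
import pyarrow.orc as orc

filename = 'someFile.snappy.orc'

# ORC is a binary format, so the file must be opened with 'rb'
with open(filename, 'rb') as file:
    data = orc.ORCFile(file)
    df = data.read().to_pandas()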

 

