I have a CSV file that I want to read with Pandas library in Python.
In this table when we encounter a new item (e.g. items Nr. 1393 or 1654 in the example below) we first have a 4 column row metadata and after that several 100 column rows as real data associated to that item. Then it happens again for the next item and so on.
The table is like this:
1,1393,0,0
1,1393,1,22,55,63,...
1,1393,5,32,43,163,...
.
.
.
2,1654,0,0
2,1654,8,95,96,142,...
2,1654,21,31,364,9,...
.
.
.
So the problems are:
- Some rows have different sizes than others
- We do not have headers and can not create it as the first row has 4 entries and second one 100 entries
- My CSV file is huge (about 10G)!
Any suggestion which helps me to organize my data in Pandas or any other Python library is highly appreciated.
PS: BTW, anybody knows how to manage it in Tableau?