
I have Python code that filters data by a specific column and writes multiple CSV files.

Here is my main csv file:

Name,    City,      Email

john     cty_1      [email protected]

jack     cty_1      [email protected]

...

Ross     cty_2      [email protected]

Rachel   cty_2      [email protected]

...

My Python code currently produces a separate CSV file for each city. The existing code is:

from itertools import groupby
import csv

with open('filtered_final.csv') as csv_file:
    reader = csv.reader(csv_file)
    next(reader)  # skip header

    # Group by column (city)
    lst = sorted(reader, key=lambda x: x[1])
    groups = groupby(lst, key=lambda x: x[1])

    # Write file for each city
    for k, g in groups:
        filename = k[21:] + '.csv'
        with open(filename, 'w', newline='') as fout:
            csv_output = csv.writer(fout)
            csv_output.writerow(["Name", "City", "Email"])  # header
            for line in g:
                csv_output.writerow(line)

I want to remove the "City" column from each of the new CSV files.

1 Answer


If your data is small enough to fit in RAM, you can read the entire file with pandas and do a groupby on the "City" column while keeping only the other two columns:

import pandas as pd

df = pd.read_csv('filtered_final.csv')

for city, data in df[['Name', 'Email']].groupby(df['City']):
    data.to_csv(f'{city}_data.csv', index=False)
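If you would rather stay with the csv and itertools.groupby approach from the question, a minimal sketch is to write only the name and email fields of each row. The function name split_by_city and the out_dir parameter are hypothetical, introduced only for illustration. The sketch assumes every row has exactly three fields (Name, City, Email) and uses the full city value as the file name, which differs from the slicing in the question.

```python
import csv
from itertools import groupby
from operator import itemgetter
from pathlib import Path


def split_by_city(src_path, out_dir="."):
    """Write one CSV per city, keeping only the Name and Email columns.

    Hypothetical helper: assumes each row is exactly (Name, City, Email).
    """
    with open(src_path, newline="") as csv_file:
        reader = csv.reader(csv_file)
        next(reader, None)  # skip the header row, if there is one
        # groupby only groups adjacent rows, so sort by city first
        rows = sorted(reader, key=itemgetter(1))

    written = []
    for city, group in groupby(rows, key=itemgetter(1)):
        # Assumption: the full city value is used as the file name
        out_path = Path(out_dir) / f"{city}.csv"
        with open(out_path, "w", newline="") as fout:
            writer = csv.writer(fout)
            writer.writerow(["Name", "Email"])  # header without City
            for name, _city, email in group:
                writer.writerow([name, email])
        written.append(out_path)
    return written
```

Calling split_by_city('filtered_final.csv') would write the per-city files into the current directory. Like the pandas version, this sketch loads all rows into memory before sorting.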
