I have four CSV files that are inputs to the Python script in Azure ML, but the module has only two inputs for dataframes and a third for a zip file. I tried putting the CSV files in a zipped folder and connecting it to the third input of the script, but that did not work either:

Image of workspace

I would like to know how to read multiple CSV files in the Python script.

1 Answer

Here's some more detail on the approach others have outlined above. Try replacing the code currently in the "Execute Python Script" module with the following:

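import os
import pandas as pd

def azureml_main(dataframe1=None, dataframe2=None):
    # List the working directory so the extracted zip contents appear in the output log
    print(os.listdir('.'))
    # Return an empty dataframe as a placeholder output
    return pd.DataFrame([])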

After running the experiment, click on the module. There should be a "View output log" link now in the right-hand bar. I get something like the following:

[Information]         Started in [C:\temp]
[Information]         Running in [C:\temp]
[Information]         Executing 4af67c05ba02417a980f6a16e84e61dc with inputs [] and generating outputs ['.maml.oport1']
[Information]         Extracting Script Bundle.zip to .\Script Bundle
[Information]         File Name                                             Modified             Size
[Information]         temp.csv                                       2016-05-06 13:16:56           52
[Information]         [ READING ] 0:00:00
[Information]         ['4af67c05ba02417a980f6a16e84e61dc.py', 'Script Bundle', 'Script Bundle.zip']

This tells me that the contents of my zip file have been extracted to the C:\temp\Script Bundle folder. In my case the zip file contained just one CSV file, temp.csv; your output would probably list four files. If you zipped a folder containing your four files, the file paths will be one level deeper. You can use os.listdir() to explore the directory structure further if necessary.
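
If the files turn out to be nested, a recursive listing may be quicker than repeated os.listdir() calls. The following is only a sketch: find_csv_files is a hypothetical helper name, and the default folder name "Script Bundle" is taken from the log above, so adjust it if your log shows something different.

```python
import os

def find_csv_files(root="Script Bundle"):
    """Return the sorted paths of every .csv file under root, at any depth."""
    matches = []
    # os.walk visits root and all of its subfolders
    for folder, _subfolders, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(".csv"):
                matches.append(os.path.join(folder, name))
    return sorted(matches)
```

Calling print(find_csv_files()) inside azureml_main should write the paths to the same output log. An empty list suggests the folder name or location differs from what is assumed here.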

Once you think you know the full filepaths for your CSV files, edit your Execute Python Script module's code to load them, e.g.:

import pandas as pd

def azureml_main(dataframe1=None, dataframe2=None):
    # Read a CSV from the folder where the zip file was extracted
    df = pd.read_csv('C:/temp/Script Bundle/temp.csv')
    # ...load other files and merge into a single dataframe...
    return df
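
One possible way to fill in the "load other files and merge" step is sketched below. It assumes all four files share the same columns, so their rows can simply be stacked with pd.concat; if they instead need to be joined on a key column, pd.merge would be the tool to reach for. The helper load_csvs, the source_file column, and the names file1.csv through file4.csv are hypothetical placeholders; the folder path is the one from the log above.

```python
import os
import pandas as pd

def load_csvs(folder, filenames):
    """Read each named CSV in folder and stack the rows into one dataframe."""
    frames = []
    for name in filenames:
        frame = pd.read_csv(os.path.join(folder, name))
        frame["source_file"] = name  # record which file each row came from
        frames.append(frame)
    # ignore_index gives the combined dataframe a fresh 0..n-1 index
    return pd.concat(frames, ignore_index=True)

def azureml_main(dataframe1=None, dataframe2=None):
    # Hypothetical file names: replace them with the names shown in your output log
    names = ["file1.csv", "file2.csv", "file3.csv", "file4.csv"]
    return load_csvs("C:/temp/Script Bundle", names)
```

The source_file column is optional. It is included here only so that rows can still be traced back to their original file after concatenation.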

...