I'm using the first 100 lines of the Enron Email Dataset for my experiment in Azure ML Studio; however, the Saved Dataset object is being populated with an odd ~4.8K lines instead of 100. That must be the known "Inaccurate column separation on string data containing commas" issue, which I understand.
However, when I use the same dataset locally in a Python project and/or in an Azure ML Jupyter notebook (the same dataset imported from ML Studio, not imported separately into the notebook), the number of lines is read correctly and the further logic also works fine:
from azureml import Workspace
ws = Workspace()  # credentials picked up from the local azureml settings file
ds = ws.datasets['The Enron Email Dataset (Minimal)']
emails_df = ds.to_dataframe()  # correctly yields 100 rows
import pandas as pd
# Reading the raw CSV locally also yields exactly 100 records
emails_df = pd.read_csv('C:/enron-email-dataset/emails.csv', nrows=100)
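To illustrate what I think is happening (a minimal sketch with synthetic data, not the actual Enron file): email bodies contain commas and embedded newlines inside quoted fields, so a naive line-based reader counts far more "rows" than there are records, while pandas respects the CSV quoting:

```python
import io
import pandas as pd

# Synthetic CSV where one quoted field spans two physical lines
csv_text = (
    'file,message\n'
    'doc1,"Hello,\nplease see the attached, thanks."\n'
    'doc2,"Second message, one line."\n'
)

# Naive splitting breaks inside the quoted message body
naive_lines = csv_text.strip().split('\n')

# pandas honors the quotes, so the multi-line field stays one record
df = pd.read_csv(io.StringIO(csv_text))

print(len(naive_lines))  # 4 physical lines (header + 3)
print(len(df))           # 2 actual records
```

This matches the symptom: 100 records with multi-line bodies can easily expand to thousands of physical lines under naive splitting.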
And here is how the dataset visualization looks in Azure ML Studio:
It's clear that the data gets mangled when it moves from Saved Datasets into an experiment, but my question is: what would be the best way to work around this? Reading the dataset from Azure Blob Storage directly inside my Python code, perhaps?
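For what it's worth, here is a sketch of the workaround I have in mind: skip Studio's tabular parser entirely and let pandas do the quote-aware parsing. The parsing helper below is runnable as-is; the Blob Storage fetch in the comments uses the legacy `azure.storage.blob` SDK, and the account, container, and blob names are placeholders, not real values:

```python
import io
import pandas as pd

def load_emails(source, nrows=100):
    """Parse a CSV source (path, URL, or file-like object) with pandas'
    quote-aware reader and return only the first nrows records."""
    return pd.read_csv(source, nrows=nrows)

# Inside an Execute Python Script module, the raw bytes could be fetched
# directly from Blob Storage and handed to the helper above, e.g.
# (placeholder names, legacy azure-storage SDK):
#
# from azure.storage.blob import BlockBlobService
# svc = BlockBlobService(account_name='myaccount', account_key='...')
# raw = svc.get_blob_to_bytes('mycontainer', 'emails.csv').content
# emails_df = load_emails(io.BytesIO(raw))
```

Since pandas sees the original bytes rather than Studio's pre-split rows, the commas and newlines inside message bodies would no longer inflate the row count.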