
I am working in a Jupyter notebook on a Windows PC. I read a CSV file into a dataframe that I am calling tran, using the code below:

tran = pd.read_csv("https://raw.githubusercontent.com/m1ngle/TRCount/main/TRCountUS.csv")

When I look at the data types for my dataframe, everything looks fine:

tran.dtypes

FIPS             int64
State           object
YMTF             int64
MTFPer         float64
YFTM             int64
FTMPer         float64
YNB              int64
NBPer          float64
YTR              int64
YTRper         float64
NoTR             int64
NoTRPer        float64
DK               int64
DKPer          float64
DNAns            int64
DNAPer         float64
TotSurveyed      int64
StatePop         int64
TRPop            int64
dtype: object

But when I try to work with the float64 columns, I get the error shown below:

tran['MTFPer'].dtype

KeyError                                  Traceback (most recent call last)
~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2645             try:
-> 2646                 return self._engine.get_loc(key)
   2647             except KeyError:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'MTFPer'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-85-2bd9c012f223> in <module>
----> 1 tran['MTFPer'].dtype

~\anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
   2798             if self.columns.nlevels > 1:
   2799                 return self._getitem_multilevel(key)
-> 2800             indexer = self.columns.get_loc(key)
   2801             if is_integer(indexer):
   2802                 indexer = [indexer]

~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2646                 return self._engine.get_loc(key)
   2647             except KeyError:
-> 2648                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2649         indexer = self.get_indexer([key], method=method, tolerance=tolerance)
   2650         if indexer.ndim > 1 or indexer.size > 1:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'MTFPer'

But this error does not occur when I work with the int64 columns:

tran['YMTF'].dtype

dtype('int64')

1 Answer


When reading the .csv file, pandas kept the whitespace that surrounds some of the column names in the header. When I ran tran[' MTFPer '].dtype instead of tran['MTFPer'].dtype, pandas gave me the correct answer.

You need to clean up the column names like this:

tran.columns = [c.strip() for c in tran.columns]
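To see why the lookup fails and that stripping fixes it, here is a minimal sketch. It uses a small made-up CSV string (csv_text is a hypothetical stand-in, not the real TRCountUS.csv) whose header pads one name with spaces, the way the answer above describes. It also uses tran.columns.str.strip(), which should be equivalent to the list comprehension shown above.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real file: the second header name is
# padded with spaces, as described in the answer above.
csv_text = "FIPS, MTFPer ,YMTF\n1,0.25,10\n2,0.50,20\n"
tran = pd.read_csv(io.StringIO(csv_text))

# repr() puts quotes around each name, so hidden whitespace becomes visible.
raw_columns = list(tran.columns)
print([repr(c) for c in raw_columns])

# Strip leading and trailing whitespace from every column name.
tran.columns = tran.columns.str.strip()

# The lookup that raised KeyError before now succeeds.
mtf_dtype = tran["MTFPer"].dtype
print(mtf_dtype)
```

Printing the names with repr() before cleaning is a quick way to check whether a KeyError on a column that "obviously exists" is caused by stray whitespace.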
