Back

Explore Courses Blog Tutorials Interview Questions
0 votes
1 view
in Data Science by (47.6k points)
edited by

In this data file, the United States is broken up into four regions using the "REGION" column.

Create a query that finds the counties that belong to regions 1 or 2, whose name starts with 'Washington', and whose POPESTIMATE2015 was greater than their POPESTIMATE 2014.

This function should return a 5x2 DataFrame with the columns = ['STNAME', 'CITY NAME'] and the same index ID as the census_df (sorted ascending by index).

You'll find a description of my data in the following picture:-

enter image description here

def answer_eight():

    counties=census_df[census_df['SUMLEV']==50]

    regions = counties[(counties[counties['REGION']==1]) | (counties[counties['REGION']==2])]

    washingtons = regions[regions[regions['COUNTY']].str.startswith("Washington")]

    grew = washingtons[washingtons[washingtons['POPESTIMATE2015']]>washingtons[washingtons['POPESTIMATES2014']]]

    return grew[grew['STNAME'],grew['COUNTY']]

outcome = answer_eight()

assert outcome.shape == (5,2)

assert list (outcome.columns)== ['STNAME','CTYNAME']

print(tabulate(outcome, headers=["index"]+list(outcome.columns),tablefmt="orgtbl"))

ERROR

ValueError                                Traceback (most recent call last)

<ipython-input-77-546e58ae1c85> in <module>()

      6     return grew[grew['STNAME'],grew['COUNTY']]

      7 

----> 8 outcome = answer_eight()

      9 assert outcome.shape == (5,2)

     10 assert list (outcome.columns)== ['STNAME','CTYNAME']

<ipython-input-77-546e58ae1c85> in answer_eight()

      1 def answer_eight():

      2     counties=census_df[census_df['SUMLEV']==50]

----> 3     regions = counties[(counties[counties['REGION']==1]) | (counties[counties['REGION']==2])]

      4     washingtons = regions[regions[regions['COUNTY']].str.startswith("Washington")]

      5     grew = washingtons[washingtons[washingtons['POPESTIMATE2015']]>washingtons[washingtons['POPESTIMATES2014']]]

/opt/conda/lib/python3.5/site-packages/pandas/core/frame.py in __getitem__(self, key)

   1991             return self._getitem_array(key)

   1992         elif isinstance(key, DataFrame):

-> 1993             return self._getitem_frame(key)

   1994         elif is_mi_columns:

   1995             return self._getitem_multilevel(key)

/opt/conda/lib/python3.5/site-packages/pandas/core/frame.py in _getitem_frame(self, key)

   2066     def _getitem_frame(self, key):

   2067         if key.values.size and not com.is_bool_dtype(key.values):

-> 2068             raise ValueError('Must pass DataFrame with boolean values only')

   2069         return self.where(key)

   2070 

ValueError: Must pass DataFrame with boolean values only

I am clueless. Where am I going wrong?

Thanks

1 Answer

0 votes
by (41.4k points)
edited by

You should pass only the condition when you compare a series or column in df with a scalar to produce a boolean mask. Do not pass the df multiple times.

def answer_eight(): counties=census_df[census_df['SUMLEV']==50]

regions = counties[(counties['REGION']==1]) | (counties['REGION']==2])]

washingtons = regions[regions['COUNTY'].str.startswith("Washington")]

grew = washingtons[washingtons['POPESTIMATE2015']>washingtons['POPESTIMATES2014']] return grew[['STNAME','COUNTY']]

If you wish to learn more about Data Science, visit the Data Science tutorial and Data Science courses by Intellipaat.

Welcome to Intellipaat Community. Get your technical queries answered by top developers!

28.4k questions

29.7k answers

500 comments

94k users

Browse Categories

...