Intellipaat Back

Explore Courses Blog Tutorials Interview Questions
0 votes
2 views
in Data Science by (47.6k points)
edited by

In this data file, the United States is broken up into four regions using the "REGION" column.

Create a query that finds the counties that belong to regions 1 or 2, whose name starts with 'Washington', and whose POPESTIMATE2015 was greater than their POPESTIMATE 2014.

This function should return a 5x2 DataFrame with the columns = ['STNAME', 'CITY NAME'] and the same index ID as the census_df (sorted ascending by index).

You'll find a description of my data in the following picture:-

enter image description here

def answer_eight():

    counties=census_df[census_df['SUMLEV']==50]

    regions = counties[(counties[counties['REGION']==1]) | (counties[counties['REGION']==2])]

    washingtons = regions[regions[regions['COUNTY']].str.startswith("Washington")]

    grew = washingtons[washingtons[washingtons['POPESTIMATE2015']]>washingtons[washingtons['POPESTIMATES2014']]]

    return grew[grew['STNAME'],grew['COUNTY']]

outcome = answer_eight()

assert outcome.shape == (5,2)

assert list (outcome.columns)== ['STNAME','CTYNAME']

print(tabulate(outcome, headers=["index"]+list(outcome.columns),tablefmt="orgtbl"))

ERROR

ValueError                                Traceback (most recent call last)

<ipython-input-77-546e58ae1c85> in <module>()

      6     return grew[grew['STNAME'],grew['COUNTY']]

      7 

----> 8 outcome = answer_eight()

      9 assert outcome.shape == (5,2)

     10 assert list (outcome.columns)== ['STNAME','CTYNAME']

<ipython-input-77-546e58ae1c85> in answer_eight()

      1 def answer_eight():

      2     counties=census_df[census_df['SUMLEV']==50]

----> 3     regions = counties[(counties[counties['REGION']==1]) | (counties[counties['REGION']==2])]

      4     washingtons = regions[regions[regions['COUNTY']].str.startswith("Washington")]

      5     grew = washingtons[washingtons[washingtons['POPESTIMATE2015']]>washingtons[washingtons['POPESTIMATES2014']]]

/opt/conda/lib/python3.5/site-packages/pandas/core/frame.py in __getitem__(self, key)

   1991             return self._getitem_array(key)

   1992         elif isinstance(key, DataFrame):

-> 1993             return self._getitem_frame(key)

   1994         elif is_mi_columns:

   1995             return self._getitem_multilevel(key)

/opt/conda/lib/python3.5/site-packages/pandas/core/frame.py in _getitem_frame(self, key)

   2066     def _getitem_frame(self, key):

   2067         if key.values.size and not com.is_bool_dtype(key.values):

-> 2068             raise ValueError('Must pass DataFrame with boolean values only')

   2069         return self.where(key)

   2070 

ValueError: Must pass DataFrame with boolean values only

I am clueless. Where am I going wrong?

Thanks

1 Answer

0 votes
by (41.4k points)
edited by

You should pass only the condition when you compare a series or column in df with a scalar to produce a boolean mask. Do not pass the df multiple times.

def answer_eight(): counties=census_df[census_df['SUMLEV']==50]

regions = counties[(counties['REGION']==1]) | (counties['REGION']==2])]

washingtons = regions[regions['COUNTY'].str.startswith("Washington")]

grew = washingtons[washingtons['POPESTIMATE2015']>washingtons['POPESTIMATES2014']] return grew[['STNAME','COUNTY']]

If you wish to learn more about Data Science, visit the Data Science tutorial and Data Science courses by Intellipaat.

31k questions

32.9k answers

507 comments

693 users

...