Back

Explore Courses Blog Tutorials Interview Questions
0 votes
2 views
in Data Science by (17.6k points)

I am trying to find a more pandorable way to get all rows of a DataFrame past a certain value in the a certain column (the Quarter column in this case).

I want to slice a DataFrame of GDP statistics to get all rows past the first quarter of 2000 (2000q1). Currently, I'm doing this by getting the index number of the value in the GDP_df["Quarter"] column that equals 2000q1 (see below). This seems way too convoluted and there must be an easier, simpler, more idiomatic way to achieve this. Any ideas?

Current Method:

def get_GDP_df():

    GDP_df = pd.read_excel(

        "gdplev.xls", 

        names=["Quarter", "GDP in 2009 dollars"], 

        parse_cols = "E,G", skiprows = 7)

    year_2000 = GDP_df.index[GDP_df["Quarter"] == '2000q1'].tolist()[0]

    GDP_df["Growth"] = (GDP_df["GDP in 2009 dollars"]

        .pct_change()

        .apply(lambda x: f"{round((x * 100), 2)}%"))

    GDP_df = GDP_df[year_2000:]

    return GDP_df

Output:

image

Also, after the DataFrame has been sliced, the indices now start at 212. Is there a method to renumber the indices so they start at 0 or 1?

1 Answer

0 votes
by (41.4k points)

Use the following code for more simplicity:

year_2000 = (GDP_df["Quarter"] == '2000q1').idxmax()

GDP_df["Growth"] = (GDP_df["GDP in 2009 dollars"]

  .pct_change()

  .mul(100)

  .round(2)

  .apply(lambda x: f"{x}%"))

return GDP_df.loc[year_2000:]

Browse Categories

...