Back

Explore Courses Blog Tutorials Interview Questions
0 votes
4 views
in Data Science by (17.6k points)

I am using pandas as a db substitute as I have multiple databases (oracle, mssql, etc) and I am unable to make a sequence of commands to a SQL equivalent.

I have a table loaded in a DataFrame with some columns:

YEARMONTH, CLIENTCODE, SIZE, .... etc etc

In SQL, to count the amount of different clients per year would be:

SELECT count(distinct CLIENTCODE) FROM table GROUP BY YEARMONTH;

And the result would be

201301    5000

201302    13245

How can I do that in pandas?

2 Answers

0 votes
by (41.4k points)

Use this:

table.groupby('YEARMONTH').CLIENTCODE.nunique()

Example:

In [2]: table

Out[2]: 

   CLIENTCODE  YEARMONTH

0           1     201301

1           1     201301

2           2     201301

3           1     201302

4           2     201302

5           2     201302

6           3     201302

In [3]: table.groupby('YEARMONTH').CLIENTCODE.nunique()

Out[3]: 

YEARMONTH

201301       2

201302       3

0 votes
by (106k points)

To count the pandas equivalent is much simple, let's say your dataframe name is daat and column name is YEARMONTH

daat.YEARMONTH.value_counts()

Related questions

Browse Categories

...