Back

Explore Courses Blog Tutorials Interview Questions
0 votes
2 views
in Data Science by (18.4k points)

I started my Python journey a few weeks ago. My main goal is to be able to automatize some aspects of my daily work routine, and in the future to dive more on Data Science. For now, I'm trying to make a script that will check if the values present on a certain column from sheet1 are also present on sheet2. I managed to get here:

I am new to data science and I am trying to compare two sheets and come to know whether sheet 1 values are present in sheet2. To know that I used the below code.

import pandas as pd

from pandas import ExcelWriter

from pandas import ExcelFile

pd.set_option('display.max_columns', 500)

df = pd.ExcelFile('C:\\Users\\Andre\\Desktop\\Scraps\\abacus.xlsx')

sheet1 = pd.read_excel(df, 'Sheet1')

sheet2 = pd.read_excel(df, 'Sheet2')

print(type(df))

print(sheet1)

print(sheet2)

df['Ref'] = df.lookup(df.index, df[sheet2['Ref']])

I know that my last line is incorrect, but it also states that 'ExcelFile' object has no attribute 'lookup', so I'm not managing to find a path to explore. Someone can point me a direction?

I know my code is not completely correct. It even throws me an error telling 

'ExcelFile' object has no attribute 'lookup'

Can anyone help me?

1 Answer

0 votes
by (36.8k points)

Use the code below:

df = pd.ExcelFile('C:\\Users\\Andre\\Desktop\\Scraps\\abacus.xlsx')

sheet1 = pd.read_excel(df, 'Sheet1')

sheet2 = pd.read_excel(df, 'Sheet2')

In the above code, sheet1 and sheet2 are the data frames which represent sheets.

Compare the sheet1 column to sheet2 column

In [1898]: sheet1                                                                                                                                                                                           

Out[1898]: 

   id_col1 id_col2  name  age  sex

0      101      1M   NaN   21  NaN

1      101      3M   NaN   21    M

2      102      1M  Mark   25  NaN

In [1899]: sheet2                                                                                                                                                                                           

Out[1899]: 

   id_col1 id_col2   name   age   sex

0      101      1M  Steve   NaN     M

1      101      2M    NaN   NaN     M

2      101      3M  Steve  25.0  None

3      102      1M    Ria  25.0     M

4      102      2M   Anie  22.0     F

The column id_col1 are present in both sheet1 and sheet2, so we can compare both the sheets.

In [1900]: sheet1['id_col1'].isin(sheet2['id_col1'])                                                                                                                                                        

Out[1900]: 

0    True

1    True

2    True

Name: id_col1, dtype: bool

You can use for loop to check all the columns

In [1902]: for col in sheet1.columns.tolist(): 

      ...:     print(sheet1[col].isin(sheet2[col])) 

      ...:                                                                                                                                                                                                  

0    True

1    True

2    True

Name: id_col1, dtype: bool

0    True

1    True

2    True

Name: id_col2, dtype: bool

0     True

1     True

2    False

Name: name, dtype: bool

0    False

1    False

2     True

Name: age, dtype: bool

0    False

1     True

2    False

Name: sex, dtype: bool

 If you are a beginner and want to know more about Data Science the do check out the Data Science course

Browse Categories

...