How to Change Column Data Types in Pandas?

Changing the data type of DataFrame columns is one of the fundamental steps in data preprocessing with pandas. Whether you are converting between numeric types, cleaning imported data, or trying to reduce memory usage, choosing an appropriate data type supports correct analysis and efficient computation. This article discusses several ways to change data types in Pandas.

Table of Contents:

What is a Data Type in Pandas?
Methods to Change the Data Type of a Single Column in Pandas
Method to Change the Data Type of Multiple Columns in Pandas
Efficient Data Type Conversion in Pandas
Conclusion

What is a Data Type in Pandas?

A data type (dtype) in Pandas specifies the kind of data a column holds, e.g., integers, floats, strings, or dates. An appropriate choice of data type improves memory and processing efficiency. For instance, an int32 value takes 4 bytes and an int64 value takes 8 bytes. Strings, by contrast, are usually stored with the object data type: each value is a separate Python object that occupies roughly 50-100 bytes including its metadata, so the memory used varies from value to value instead of being fixed by the data type.
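To make the fixed-width sizes concrete, here is a minimal sketch that inspects column dtypes and the number of bytes each numeric value takes. The DataFrame and its column names (small, large) are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
df = pd.DataFrame({
    "small": np.array([1, 2, 3], dtype="int32"),
    "large": np.array([10, 20, 30], dtype="int64"),
})

# Each column reports its own dtype.
print(df.dtypes)

# Fixed-width numeric types use a constant number of bytes per value.
int32_bytes = np.dtype("int32").itemsize
int64_bytes = np.dtype("int64").itemsize
print(int32_bytes, int64_bytes)
```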

You can change column types in Pandas in two ways:

Change the data type of all the columns together.
Change the data type of a single column separately.

Methods to Change the Data Type of a Single Column in Pandas

There are several ways to change the data type of a single column in a DataFrame. The .astype() method converts a column to any specific data type you name. The pd.to_numeric() function converts a column to a numeric type. Finally, pd.to_datetime() converts a column to a datetime type.

Method 1: Using .astype() function in Pandas

The .astype() method converts a column to a specific type. It is effective and easy to use when you are certain the conversion will succeed, e.g., from numeric strings to int or float. If the column contains values that cannot be converted (e.g., a non-numeric string in an otherwise numeric column), it raises an error.

When to use: Use .astype() when you need to force a specific data type for a column, such as int, and you know that the data is uniform.

Example: Applying .astype(int) to col1, a column of numeric strings, changes its data type from object to an integer type.
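A minimal sketch of this conversion follows. The sample values are an assumption made for illustration; only the column name col1 comes from the text above.

```python
import pandas as pd

# Hypothetical sample data: numeric strings stored with the object dtype.
df = pd.DataFrame({"col1": ["1", "2", "3"]}, dtype="object")
print(df["col1"].dtype)  # dtype before the conversion

# .astype(int) converts every value and raises if any value is not numeric.
df["col1"] = df["col1"].astype(int)
print(df["col1"].dtype)  # dtype after the conversion
```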

Method 2: Using pd.to_numeric() method in Pandas

pd.to_numeric() is a more robust conversion function that comes with error handling built in through its errors parameter. It is most useful when a column holds a mix of values, where some can be converted to a number ('10') and others are invalid ('invalid').

When to use: This method is ideal when working with datasets that may have noise or errors in numeric columns.

Example: Applying pd.to_numeric() to col1, a column of integer strings, converts it to int64. By default, pd.to_numeric() returns either int64 or float64, depending on the data supplied.
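The following sketch shows the basic call on clean data. The sample values are hypothetical; only the column name col1 is taken from the text.

```python
import pandas as pd

# Hypothetical sample data: integer-like strings stored with the object dtype.
df = pd.DataFrame({"col1": ["10", "20", "30"]}, dtype="object")

# pd.to_numeric() parses each string and picks a numeric dtype for the result.
df["col1"] = pd.to_numeric(df["col1"])
print(df["col1"].dtype)
```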

Method 3: Using pd.to_datetime() in Pandas

The pd.to_datetime() function converts string or numeric columns into datetime values. It is widely used with time-series data, e.g., logs, purchase history, or event timestamps. pd.to_datetime() is flexible and supports multiple date formats. With errors='coerce', it also handles non-date or invalid strings by converting them into NaT (Not a Time); by default, such values raise an error.

When to use: Use this approach when the column is needed for date-based operations, such as filtering by date ranges or aggregating by time intervals.

Example: Applying pd.to_datetime() to a date column of date strings changes its data type to a datetime type.
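As a sketch of this conversion, assuming ISO-formatted date strings (the sample values and the raw series are hypothetical):

```python
import pandas as pd

# Hypothetical sample data: ISO-formatted date strings.
df = pd.DataFrame({"date": ["2024-01-15", "2024-02-20", "2024-03-25"]})

# Parse the strings into datetime values.
df["date"] = pd.to_datetime(df["date"])
print(df["date"].dtype)

# With errors="coerce", values that cannot be parsed become NaT.
raw = pd.Series(["2024-01-15", "not a date"])
parsed = pd.to_datetime(raw, errors="coerce")
print(parsed)
```

Once the column is a datetime type, accessors such as df["date"].dt.year become available for date-based filtering and grouping.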

Method to Change the Data Type of Multiple Columns in Pandas

The convert_dtypes() method inspects every column in the DataFrame and converts each one to the best matching data type that supports missing values (pd.NA), e.g., object columns holding integers become Int64, columns holding text become string, and columns holding True/False become boolean. The choice is based on the values in each column, not on available memory, and the method does not create categorical columns or parse numbers out of strings. Use this approach when you want a quick, implicit conversion without having to specify a type for every column.

Example: Calling convert_dtypes() on a DataFrame converts all of its columns in one call, instead of changing the data type of each column one by one.
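A minimal sketch of the one-call conversion, assuming a DataFrame whose columns all start out with the object dtype. The column names a, b, and c are hypothetical.

```python
import pandas as pd

# Hypothetical sample data; every column starts out with the object dtype.
df = pd.DataFrame(
    {"a": [1, 2, 3], "b": ["x", "y", "z"], "c": [True, False, True]},
    dtype="object",
)
print(df.dtypes)

# convert_dtypes() returns a new DataFrame with inferred nullable dtypes.
converted = df.convert_dtypes()
print(converted.dtypes)
```

Note that convert_dtypes() returns a new DataFrame, so the result has to be assigned to a variable to be kept.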

Efficient Data Type Conversion in Pandas

So far, we have covered several methods for changing the data type of columns in Pandas. One of them, pd.to_numeric(), accepts extra parameters that make conversion more memory-efficient and more tolerant of bad values. These parameters are errors= and downcast=. Let us explore both in detail.

Error Handling by .to_numeric() function in Pandas

A column might contain values that cannot be converted to numbers, for example, string data like 'intellipaat'. If we use pd.to_numeric() on such a column, it raises an error by default. To prevent this, pd.to_numeric() takes an errors argument that lets you turn non-numeric values into NaN or return the input unchanged.

The errors parameter can take the following values:

errors='ignore' returns the input unchanged if the conversion fails. This option is deprecated since pandas 2.2; catching the exception explicitly is recommended instead.
errors='raise' (default) raises an error if the conversion fails.
errors='coerce' replaces invalid values with NaN.

Example: With errors='coerce', the invalid value becomes NaN and the valid values are converted. With errors='ignore', the column is returned unchanged.
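The sketch below illustrates the errors parameter on a hypothetical series containing one invalid value. It shows 'coerce' and the default 'raise'; because 'ignore' is deprecated in recent pandas versions, the try/except block stands in for it here, which is a substitution and not the original listing.

```python
import pandas as pd

# Hypothetical sample data with one value that is not a number.
s = pd.Series(["1", "2", "intellipaat", "4"], dtype="object")

# errors="coerce": the invalid value becomes NaN, the rest are converted.
coerced = pd.to_numeric(s, errors="coerce")
print(coerced)

# errors="raise" (the default): the invalid value raises a ValueError.
# Catching the exception and keeping the original values plays the role
# that errors="ignore" used to play.
raised = False
try:
    result = pd.to_numeric(s, errors="raise")
except ValueError as exc:
    raised = True
    result = s
    print("conversion failed:", exc)
```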

Downcasting in Pandas

Downcasting means shrinking a numeric type (such as int64 to int8) to conserve memory. By default, pd.to_numeric() returns int64 or float64. If memory usage is critical, the downcast parameter lets you request the smallest type that can hold the values. This is especially helpful with large datasets where you know the values fit in the range of the smaller data type.

Example: A column that only stores the integers 1 to 4 can be downcast from int64 to int8. Keeping it as int64 is unnecessary and wastes memory.
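A minimal sketch of downcasting, assuming a column named col1 that holds the integers 1 to 4 as described above. The int64 starting dtype is set explicitly so the example behaves the same on every platform.

```python
import pandas as pd

# Sample data: the integers 1 to 4, stored explicitly as int64.
df = pd.DataFrame({"col1": [1, 2, 3, 4]}, dtype="int64")
print(df["col1"].dtype)

# downcast="integer" requests the smallest signed integer type that fits.
df["col1"] = pd.to_numeric(df["col1"], downcast="integer")
print(df["col1"].dtype)
```

Other accepted values for downcast are "signed", "unsigned", and "float".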

Using category dtype for memory optimization

Converting a string column with many repeated values to the category dtype reduces memory usage, because each distinct value is stored once and the rows hold small integer codes.

Example: Converting the 'city' column from strings to the category dtype saves memory.
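As a sketch of the saving, assuming a city column with a few values repeated many times (the city names and row count are hypothetical):

```python
import pandas as pd

# Hypothetical sample data: three distinct cities repeated many times.
df = pd.DataFrame({"city": ["Delhi", "Mumbai", "Delhi", "Pune", "Mumbai"] * 1000})

# deep=True counts the memory held by the string values themselves.
before = df["city"].memory_usage(deep=True)

# Store each distinct city once and keep integer codes per row.
df["city"] = df["city"].astype("category")
after = df["city"].memory_usage(deep=True)

print(f"before: {before} bytes, after: {after} bytes")
```

The benefit shrinks as the share of distinct values grows, so a column of mostly unique strings is a poor candidate for this dtype.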

Changing Data Types Conditionally using .apply() or .loc[]

Modify column data types based on specific conditions using .apply() or .loc[].

Example: The strings in 'numeric_column' are converted to integers only where values exist, and nulls are preserved.
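The sketch below shows one way each approach might look. The sample data and the helper to_int_if_present are hypothetical, and the use of the nullable Int64 dtype in the first option is an assumption made so that integers and missing values can share one column.

```python
import pandas as pd

# Hypothetical sample data; None marks a missing entry. The object dtype
# lets the column hold strings, integers, and nulls side by side.
df = pd.DataFrame({"numeric_column": ["1", "2", None, "4"]}, dtype="object")

def to_int_if_present(value):
    """Convert a present value to int and leave missing values missing."""
    return int(value) if pd.notna(value) else pd.NA

# Option 1: .apply() converts value by value; the nullable Int64 dtype
# can hold integers alongside missing values.
as_nullable_int = df["numeric_column"].apply(to_int_if_present).astype("Int64")
print(as_nullable_int)

# Option 2: .loc[] with a boolean mask rewrites only the non-null rows.
mask = df["numeric_column"].notna()
df.loc[mask, "numeric_column"] = df.loc[mask, "numeric_column"].astype(int)
print(df["numeric_column"].tolist())
```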

Comparing memory usage before and after type conversion

Analyse memory usage to validate optimisation impact.

Example: Printing the memory usage before and after a conversion lets you compare the effect.
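A minimal sketch of such a comparison, assuming a single int64 column whose values are small enough to fit in int8 (the column name and values are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: the values 1..100 stored as int64.
df = pd.DataFrame({"col1": np.arange(1, 101, dtype="int64")})

# Total bytes used by the DataFrame, including the index.
before = df.memory_usage(deep=True).sum()

# Shrink the column to the smallest integer type that fits its values.
df["col1"] = pd.to_numeric(df["col1"], downcast="integer")
after = df.memory_usage(deep=True).sum()

print(f"before: {before} bytes, after: {after} bytes")
```

df.info(memory_usage="deep") prints a similar total together with the dtype of every column.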

Conclusion

Changing column data types in pandas is a crucial skill for effective data preprocessing in Python. Whether you need to transform a single column or multiple columns, methods like .astype(), pd.to_numeric(), pd.to_datetime(), and convert_dtypes() do it in a single line of code. Optional arguments such as errors= and downcast= add error handling and memory optimisation. With these methods, you can keep your data accurate and your analysis efficient.

Changing column data types in Pandas – FAQs

Q1. How to change the column data type to int in Pandas?

You can use .astype(int) or the pd.to_numeric function to convert the data type to int.

Q2. How to change multiple columns data types in Pandas?

To change the data type of multiple columns in Pandas, call the convert_dtypes() method on the DataFrame.
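When you want to name the target type of each column yourself, DataFrame.astype() also accepts a dict that maps column names to types. A minimal sketch, with hypothetical column names and values:

```python
import pandas as pd

# Hypothetical sample data with three columns to convert.
df = pd.DataFrame({"a": ["1", "2"], "b": ["1.5", "2.5"], "c": [1, 0]})

# Map each column name to its target dtype and convert them in one call.
df = df.astype({"a": "int64", "b": "float64", "c": "bool"})
print(df.dtypes)
```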

Q3. How do I change the data type of some values within a column?

Use .loc[] or .apply() to selectively change data types in a column.

Q4. How to change the datatype of a column to strings in Pandas?

You can use .astype(str) to change the datatype of a column to a string.

Q5. How to change column datatype in Pandas?

You can change the data type of a column in Pandas using .astype(), pd.to_numeric(), pd.to_datetime(), or the DataFrame's convert_dtypes() method.

Q6. How to convert string to datetime in pandas?

To convert a string to a datetime in pandas, you can use this command: pd.to_datetime(your_string_column).
