When writing SQL queries, you may encounter the common practice of using SELECT * to retrieve all columns from a table. This approach is easy but can lead to significant performance, security, and maintainability issues, especially when databases contain more data. In this blog, let us explore why SELECT * is harmful and other alternatives to improve performance.
Table of Contents:
Why is SELECT * Used in SQL?
SELECT * is commonly used in SQL to display all columns from a table without listing each column individually. It provides a shortcut way of querying all available data, which makes it convenient while doing operations on databases
Let us create a table called your_table and insert some values into it, which can be used as an example to demonstrate why SELECT * is harmful in SQL.
CREATE TABLE your_table (
id INT AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(50),
age INT
);
INSERT INTO your_table (name, age)
SELECT
CONCAT('User', id) AS name,
FLOOR(18 + (RAND() * 42)) AS age
FROM
(SELECT @row := @row + 1 AS id FROM
(SELECT 0 UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4) t1,
(SELECT 0 UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4) t2,
(SELECT 0 UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4) t3,
(SELECT 0 UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4) t4,
(SELECT @row := 0) init) temp
LIMIT 1000;
SELECT * FROM your_table
Output:
The table will have 1k rows…
Reason Why Select * is Considered Harmful?
Let us explore problems that arise while using SELECT * in SQL
Problem 1: Unnecessary I/O
Unnecessary I/O refers to any input or output operation like reading or writing data that is not important to the user.
Example:
The above-created table called your_table contains 1000 rows which leads to inefficient data retrieval which increases disk reads and takes a long query execution time so using SELECT * you are retrieving all columns even if you do not need all that data.
Problem 2: Increased Network Traffic
Increased network traffic occurs when more data than necessary is transmitted over a network which leads to overcrowded or blocking of data, slower speeds, and higher cost
Example:
In the above created your_table, using SELECT * to fetch unnecessary data impacts your database and also puts stress on your network, while querying large amounts of data leads to slowing down of applications if they rely on real-time data
Problem 3: More Memory Usage
When a program is consuming a larger part of the computer’s RAM it leads to slower performance, system instability, or even crashes.
Example:
In the above created your_table, executing SELECT * retrieves unnecessary columns which increases the memory usage of your computer. If your computer loads a large amount of data into memory, unnecessary columns will increase memory consumption which slow down your application.
Problem 4: Security Risks
Security Risks are accidental exposure of sensitive data in the table to unauthorized users.
Example:
If your_table also contained columns like security_number or password_number and someone runs the SELECT *, they could retrieve this sensitive information causing a security breach.
Problem 5: Dependency on Column Order in Result
Depending on the specific column order in the query’s result sets a dependency. Changing database structure like adding or reordering columns can break applications that assume a fixed order. This practice reduces code maintainability and increases the risk of errors.
Example:
If a new column of phone_number is added to your_table, SELECT * will return it, and if the application isn’t expecting this new column it causes errors.
Problem 6: Readability Concerns
Readability concerns are concerns that make queries harder to understand. Anyone reading the query would have to inspect the table structure to know what data is being fetched making code harder to maintain.
Example:
Suppose you are fetching only name and age from your_table using SELECT *, it’s unclear to anyone reading the query what the real output looks like.
Troubleshooting Tips
- Avoid using SELECT * for retrieving the columns. instead, specify the specific columns you need. This will improve performance by fetching only the required data.
- Mention the columns you need: This minimizes the amount of data being retrieved which results in increasing disk I/O and memory usage.
- If you are working with a large dataset, use LIMIT and OFFSET to keep and check on results and avoid fetching the entire table at once.
- Explicitly Select only necessary columns. This will reduce the chance of exposing data.
- If your query is long and contains multiple columns you can use comments to explain your query operations.
- Use Indexing to index your queried columns as this reduces the time taken to retrieve your column.
Aspects | SELECT * | Specific Column Selection |
Data Retrieval | Retrieves all columns from the table, even if not needed | Retrieves only specific columns that ignore data not needed by to user |
Performance | Performance is slow as more data is fetched which leads to slower operation of queries | Performance is high as only required data is being fetched resulting in faster operation of queries |
Resource Consumption | Consumption of resources are high due to high memory usage and network usage. | Consumption of resources are low due to less memory usage and network usage. |
Network Traffic | High network traffic as the entire result is sent over the network including unnecessary columns. | Lower network traffic as only the required columns are sent over the network. |
Readability | Lower readability as the query doesn’t specify which columns are needed. | Higher readability as the query specifies the required column. |
Security | Less security as exposing sensitive data. | High security as only necessary columns are exposed. |
Flexibility | Lower Flexibility as even unnecessary columns are selected which can affect performance. | High Flexibility as you can select only specified columns that are needed which increases performance. |
Best Practices
- Specify Columns: You need to mention the columns you need in your queries as it helps in reducing unnecessary data fetching and improves performance.
- Use Aliases When working with multiple tables you can use aliases or short names given to a table for better readability.
- Avoid SELECT *:. When there are structural changes in your table it is always better to define the specific columns you need externally.
- Use Index: Make sure you are indexing columns that are used in WHERE, JOINS, or ORDER BY clauses
- Limit the Use of Subqueries: Reduce the use of nested subqueries and rewrite the nested queries by using JOIN operations for better performance.
Conclusion
It is very easy to use SELECT * with small tables but there is also a risk of using it with large tables as it affects the processing time of the query and overall performance of the query using best practices can help you to be safe from this situation and can lead to better performance
To learn more about SQL functions, check out this SQL course and also explore SQL Interview Questions prepared by industry experts.
Why Is SELECT * Considered Harmful in SQL? – FAQs
1. Why is it bad to use SELECT *?
SELECT * is bad to use as you retrieve unnecessary data that will be of no use.
2. What does SELECT * do in SQL?
SELECT * retrieves all columns in your table, even which you don’t need.
3. What is the difference between SELECT * and SELECT?
The main difference between SELECT * and SELECT is that- SELECT * retrieves all the columns while SELECT 1 will retrieve all the rows showing 1 value in one column.
4. Is SELECT * a DDL or DML statement?
SELECT *is a DML (Data Manipulation Language) statement.
5. What is the use of * in the SELECT query?
An asterisk “*” is used in the query to return all columns that exist in the table.