zaro

What is UNION operation?

Published in SQL Operations 4 mins read

The UNION operation in SQL is a powerful set operator used to combine the result sets of two or more SELECT statements into a single result set. This allows you to consolidate data from different tables or different queries that have a similar structure, providing a unified view of the information.

Understanding the Core Functionality

At its heart, the UNION operator works by vertically stacking rows from multiple queries. For the UNION operation to succeed and produce meaningful results, specific rules must be strictly followed:

  • Same Number of Columns: Every SELECT statement involved in the UNION must retrieve the exact same number of columns.
  • Similar Data Types: The corresponding columns in each SELECT statement must have compatible or similar data types. For instance, if the first column in the first SELECT statement is a text type, the first column in the second SELECT statement should also be a text type (e.g., VARCHAR, TEXT).
  • Same Order of Columns: The columns in each SELECT statement must appear in the same order. Even if data types are compatible, their position matters for correct merging.

The UNION operator automatically removes duplicate rows from the final result set, presenting only distinct records.

SQL UNION Syntax

The basic syntax for using the UNION operator is straightforward:

SELECT column1, column2, ...
FROM table1
WHERE condition1
UNION
SELECT column1, column2, ...
FROM table2
WHERE condition2;

You can combine more than two SELECT statements by simply adding more UNION keywords between them.

Key Characteristics and Rules

When working with UNION, consider these important aspects:

  • Duplicate Removal: By default, UNION eliminates duplicate rows. If both SELECT statements return identical rows, only one instance of that row will appear in the final combined result.

  • Column Aliases: Column names in the final result set are typically taken from the first SELECT statement. If you want specific column names, define aliases in the first SELECT statement.

  • ORDER BY Clause: An ORDER BY clause can be used to sort the entire combined result set, but it must be placed at the very end of the last SELECT statement.

    SELECT FirstName, LastName FROM Customers
    UNION
    SELECT EmployeeName, EmployeeSurname FROM Employees
    ORDER BY FirstName; -- Applies to the combined result

UNION vs. UNION ALL

It's crucial to understand the distinction between UNION and UNION ALL, as they serve slightly different purposes:

Feature UNION UNION ALL
Duplicate Rows Automatically removes duplicate rows. Includes all rows, including duplicates.
Performance Generally slower due to the duplicate check overhead. Generally faster as it doesn't check for duplicates.
Use Case When you need a distinct list of combined records. When you need all records, even if they are repeated across queries.

Practical Applications

The UNION operation is incredibly useful in various scenarios:

  • Consolidating Reports: Merging customer lists from different sales regions or product catalogs from various departments into a single, cohesive report.
  • Cross-Table Queries: Combining similar data (e.g., names, IDs, dates) from tables that represent different entities but share common attributes.
  • Data Archiving: Combining current data with archived data to get a complete historical view.
  • Simplified Data Retrieval: Avoiding complex JOIN operations when data resides in separate tables but needs to be presented uniformly.

For example, if you have customer contact information split between an OnlineCustomers table and a StoreCustomers table, you could use UNION to get a single list of all customer emails:

SELECT EmailAddress FROM OnlineCustomers
WHERE SubscriptionStatus = 'Active'
UNION
SELECT Email FROM StoreCustomers
WHERE OptInMarketing = TRUE;

This would provide a distinct list of all active online customers and store customers who have opted into marketing, removing any duplicates.

The UNION operator is a fundamental tool in relational database management systems for effectively combining and managing data across different query results.