
What is Union vs Intersection in Python?

Published in Python Set Operations

In Python, union and intersection are fundamental set operations used to combine or compare collections of unique elements. While union gathers all distinct elements from two or more sets, intersection identifies only the elements that are common to all sets involved.

These operations are primarily performed on Python's built-in set data type, which is an unordered collection of unique, hashable objects. The set itself is mutable; frozenset is its immutable counterpart. Understanding these operations is crucial for efficient data manipulation and analysis.

Understanding Python Sets

Before diving into union and intersection, it's important to grasp what Python sets are. A set is like a mathematical set; it does not allow duplicate elements and is unordered. This uniqueness property makes sets ideal for performing operations like union, intersection, and difference.

# Creating two sets
my_set = {1, 2, 3, 4, 5}
another_set = {4, 5, 6, 7, 8}

print(f"Set 1: {my_set}")
print(f"Set 2: {another_set}")
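The uniqueness and hashability rules described above can be checked directly. The following is a minimal sketch; the variable names (ids, letters, pairs) are illustrative only and do not come from the original examples.

```python
# Duplicates in a literal or in the source iterable are dropped silently.
ids = {1, 2, 2, 3, 3, 3}
letters = set("mississippi")

print(sorted(ids))      # [1, 2, 3]
print(sorted(letters))  # ['i', 'm', 'p', 's']

# {} creates an empty dict; use set() for an empty set.
empty = set()
print(type({}).__name__, type(empty).__name__)  # dict set

# Elements must be hashable, so a list cannot be a set element.
error_message = ""
try:
    bad = {[1, 2], [3, 4]}
except TypeError as exc:
    error_message = str(exc)
    print(f"TypeError: {error_message}")

# A tuple of hashable values is itself hashable and works fine.
pairs = {(1, 2), (3, 4)}
print((1, 2) in pairs)  # True
```

Sorting before printing sidesteps the fact that a set has no guaranteed display order.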

Union in Python

The union of two or more sets is a new set containing every element that appears in at least one of them. An element that occurs in several input sets appears only once in the result.

How Union Works

Imagine you have two groups of friends. The union operation would create a new list that includes every unique friend from both groups, without listing anyone twice.

Syntax and Examples

Python offers two primary ways to perform a union:

  1. Using the union() method:

    set_a = {1, 2, 3, 4}
    set_b = {3, 4, 5, 6}
    
    # Using the union() method
    union_set_result = set_a.union(set_b)
    print(f"Union using method: {union_set_result}")
    # Expected output: {1, 2, 3, 4, 5, 6}

  2. Using the | operator: This is the same symbol as bitwise OR, but on sets it performs a union. It is more concise, though unlike union() it requires both operands to be sets.

    set_c = {'apple', 'banana', 'orange'}
    set_d = {'banana', 'grape', 'kiwi'}
    
    # Using the | operator
    union_operator_result = set_c | set_d
    print(f"Union using operator: {union_operator_result}")
    # Expected output: {'grape', 'kiwi', 'apple', 'banana', 'orange'} (order may vary)
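Both forms extend beyond two sets, and they differ in what they accept. The sketch below uses made-up team names to show union() taking several arguments and a plain list, the | operator rejecting a list, and the in-place variant |=.

```python
web_team = {"ana", "ben", "chloe"}
api_team = {"ben", "dev"}
ops_team = {"chloe", "eli"}

# union() accepts any number of arguments; the operator chains the same way.
everyone = web_team.union(api_team, ops_team)
assert everyone == web_team | api_team | ops_team

# union() also takes arbitrary iterables, such as a list.
with_contractors = web_team.union(["fay", "ana"])

# The | operator requires both operands to be sets.
operator_rejects_list = False
try:
    web_team | ["fay", "ana"]
except TypeError:
    operator_rejects_list = True

# Neither form modifies its inputs; |= (or update()) changes a set in place.
roster = set(web_team)
roster |= api_team

print(sorted(everyone))          # ['ana', 'ben', 'chloe', 'dev', 'eli']
print(sorted(with_contractors))  # ['ana', 'ben', 'chloe', 'fay']
print(operator_rejects_list)     # True
print(sorted(roster))            # ['ana', 'ben', 'chloe', 'dev']
```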

Intersection in Python

The intersection of two or more sets is a new set containing only the elements present in every one of those sets. If the sets share no elements, the result is the empty set.

How Intersection Works

Consider two lists of skills, one per job applicant. An intersection operation would tell you which skills appear on both lists.

Syntax and Examples

Python provides two main ways to find the intersection:

  1. Using the intersection() method:

    set_x = {10, 20, 30, 40}
    set_y = {30, 40, 50, 60}
    
    # Using the intersection() method
    intersection_set_result = set_x.intersection(set_y)
    print(f"Intersection using method: {intersection_set_result}")
    # Expected output: {40, 30} (order may vary)
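
  2. Using the & operator: This is the same symbol as bitwise AND, but on sets it performs an intersection. It is more concise, though unlike intersection() it requires both operands to be sets.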

    set_fruits_1 = {'apple', 'banana', 'orange', 'grape'}
    set_fruits_2 = {'banana', 'kiwi', 'grape', 'pineapple'}
    
    # Using the & operator
    intersection_operator_result = set_fruits_1 & set_fruits_2
    print(f"Intersection using operator: {intersection_operator_result}")
    # Expected output: {'grape', 'banana'} (order may vary)
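Intersection also works across more than two sets, and it can legitimately come back empty. The following sketch, with hypothetical skill groups, shows both cases along with isdisjoint(), which answers "is there any overlap at all?" without building a result set.

```python
python_devs = {"ana", "ben", "chloe", "dev"}
sql_devs = {"ben", "chloe", "eli"}
rust_devs = {"chloe", "fay"}

# Only elements present in every set survive.
all_three = python_devs & sql_devs & rust_devs
assert all_three == python_devs.intersection(sql_devs, rust_devs)

# intersection() accepts any iterable, not just sets.
on_call = python_devs.intersection(["dev", "eli", "zoe"])

# No shared elements yields the empty set, which prints as set().
no_overlap = rust_devs & {"ben", "eli"}

print(all_three)                            # {'chloe'}
print(on_call)                              # {'dev'}
print(no_overlap)                           # set()
print(rust_devs.isdisjoint({"ben", "eli"}))  # True
```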

Key Differences Summarized

Here's a table comparing union and intersection operations in Python:

| Feature | Union (\| or .union()) | Intersection (& or .intersection()) |
| :----------- | :----------------------------------------------------------- | :-------------------------------------------------------------------- |
| Definition | Combines the elements of all input sets, removing duplicates. | Finds the elements that all input sets have in common. |
| Result | A new set containing every unique element from all input sets. | A new set containing only the elements present in all input sets. |
| Size | Always at least as large as the largest input set. | Never larger than the smallest input set; may be empty. |
| Analogy | "Everyone who attended either party A or party B." | "People who attended both party A and party B." |
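The size relationships in the table can be verified with a few assertions. This is a small sketch on arbitrary example values; the last two checks cover the edge case where one set is a subset of the other.

```python
a = {1, 2, 3, 4}
b = {3, 4, 5}

combined = a | b   # {1, 2, 3, 4, 5}
common = a & b     # {3, 4}

# Union is at least as large as the largest input.
assert len(combined) >= max(len(a), len(b))

# Intersection is no larger than the smallest input.
assert len(common) <= min(len(a), len(b))

# Inclusion-exclusion ties the two sizes together.
assert len(combined) == len(a) + len(b) - len(common)

# When one set is a subset of the other, the bounds are met exactly.
c = {1, 2}
assert a | c == a
assert a & c == c

print(len(a), len(b), len(combined), len(common))  # 4 3 5 2
```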

Related Set Operation: Difference

While not the primary focus of "union vs intersection," difference is a closely related operation: it yields the elements present in one set but not in the other.

Python's difference() method or - operator returns the elements of the first set that do not appear in the second. The operation is not symmetric, so swapping the operands generally gives a different result.

set_nums_1 = {1, 2, 3, 4, 5}
set_nums_2 = {4, 5, 6, 7, 8}

# Elements in set_nums_1 but not in set_nums_2
difference_result = set_nums_1.difference(set_nums_2)
print(f"Difference (set1 - set2): {difference_result}")
# Expected output: {1, 2, 3}

# Elements in set_nums_2 but not in set_nums_1
difference_result_reverse = set_nums_2 - set_nums_1
print(f"Difference (set2 - set1): {difference_result_reverse}")
# Expected output: {8, 6, 7} (order may vary)
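Difference, intersection, and union fit together: the two one-sided differences plus the intersection partition the union. The sketch below checks that relationship on the same values used above, under new illustrative names (left, right).

```python
left = {1, 2, 3, 4, 5}
right = {4, 5, 6, 7, 8}

only_left = left - right    # {1, 2, 3}
only_right = right - left   # {6, 7, 8}
shared = left & right       # {4, 5}

# Operand order matters for difference.
assert only_left != only_right

# The three pieces do not overlap, and together they rebuild the union.
assert only_left.isdisjoint(shared) and only_right.isdisjoint(shared)
assert only_left | shared | only_right == left | right

# Each original set is its own difference plus the shared part.
assert only_left | shared == left

# difference() accepts several iterables and removes all of their elements.
remaining = left.difference([1], (2, 3))
print(sorted(remaining))  # [4, 5]
```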

Practical Insights and Applications

Understanding union and intersection is vital for:

  • Data Cleaning: Identifying unique records (union) or finding duplicate entries across datasets (intersection).
  • Database Operations: Mirroring SQL's UNION and INTERSECT operators, which likewise return distinct rows by default.
  • Feature Engineering: Combining or filtering features in machine learning datasets.
  • Permissions and Access Control: Determining combined user permissions (union) or common access rights (intersection).
  • Network Analysis: Finding common nodes between different sub-networks.
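As one illustration of the permissions use case above, the following sketch computes combined and common rights across roles. The role names, the role_permissions mapping, and both helper functions are hypothetical and exist only for this example.

```python
# Hypothetical role-to-permission mapping, for illustration only.
role_permissions = {
    "editor": {"read", "write", "comment"},
    "reviewer": {"read", "comment", "approve"},
    "auditor": {"read", "export"},
}


def combined_permissions(roles):
    """Everything a user holding all of these roles may do (union)."""
    return set().union(*(role_permissions[role] for role in roles))


def common_permissions(roles):
    """Only what every one of these roles allows (intersection)."""
    permission_sets = [role_permissions[role] for role in roles]
    if not permission_sets:
        return set()
    return set.intersection(*permission_sets)


print(sorted(combined_permissions(["editor", "reviewer"])))
# ['approve', 'comment', 'read', 'write']

print(sorted(common_permissions(["editor", "reviewer", "auditor"])))
# ['read']
```

Starting the union from an empty set() keeps combined_permissions safe when the list of roles is empty; the intersection helper needs an explicit guard because set.intersection requires at least one set.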

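Because sets are hash-based, membership tests take constant time on average, so union and intersection are usually much faster than equivalent nested loops over lists. Using them lets Python programmers write cleaner, faster code for managing and analyzing collections of unique data.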