
How to Apply a Function to a Column in Pandas


Applying a function to a specific column is a common data manipulation task in pandas. The most direct approach is to select the column by name, which gives you a pandas Series, and call the .apply() method on it. The method passes each element of the column to your function and collects the return values into a new Series.

More generally, apply() can run a function along an axis of a DataFrame or over the elements of a Series. DataFrame.apply() hands your function an entire row or column at a time, so when the goal is to process each value within one column, selecting that column first is the simplest route.

Here's the breakdown:

  1. Select the target column: Access the column using square bracket notation (e.g., df['Column_Name']). This returns a pandas Series.
  2. Call the .apply() method: Use .apply(your_function) on the selected Series. your_function will be executed for each element in that Series.

Practical Example

Let's say you have a DataFrame with a 'Price' column that needs a discount applied and a 'Temperature_F' column that needs converting to Celsius.

import pandas as pd

# Create a sample DataFrame
data = {'Product': ['A', 'B', 'C', 'D'],
        'Price': [100, 150, 200, 50],
        'Temperature_F': [32, 50, 68, 86]}
df = pd.DataFrame(data)

print("Original DataFrame:")
print(df)

# --- Method 1: Using a defined function ---

def apply_discount(price):
  """Applies a 10% discount if price is over 100, otherwise no discount."""
  if price > 100:
    return price * 0.9
  return price

# Apply the function to the 'Price' column
df['Price_After_Discount'] = df['Price'].apply(apply_discount)

# --- Method 2: Using a lambda function ---

# Function to convert Fahrenheit to Celsius
# Formula: (F - 32) * 5/9
df['Temperature_C'] = df['Temperature_F'].apply(lambda f: (f - 32) * 5/9)

print("\nDataFrame after applying functions:")
print(df)

In this example:

  • We selected the 'Price' column (df['Price']) and applied the apply_discount function to each value.
  • We selected the 'Temperature_F' column (df['Temperature_F']) and applied a lambda function to convert Fahrenheit to Celsius for each value.

The results are assigned to new columns, but you could also overwrite the existing column if desired (df['Price'] = df['Price'].apply(apply_discount)).

Function Types for apply()

You can use various types of functions with .apply() on a Series:

  • Defined Functions: Use a standard Python def function, as shown with apply_discount. This is useful for more complex logic.
  • Lambda Functions: Use an anonymous lambda function for simple, inline operations, as shown with the temperature conversion.
  • Built-in Functions: Apply built-in functions like str.lower for string manipulation.
# Example applying a built-in string method
df['Product_Lower'] = df['Product'].apply(str.lower)
print("\nDataFrame with Product in lowercase:")
print(df)
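
Functions that take more than one parameter can also be used. The sketch below assumes a hypothetical variant of apply_discount with threshold and rate parameters (names chosen here for illustration, not part of the original example) and relies on Series.apply() forwarding extra keyword arguments to the function on every call.

```python
import pandas as pd

def apply_discount(price, threshold=100, rate=0.10):
    """Discount prices above `threshold` by `rate`; leave others unchanged.

    `threshold` and `rate` are illustrative parameters added for this sketch.
    """
    if price > threshold:
        return price * (1 - rate)
    return price

prices = pd.Series([100, 150, 200, 50], name="Price")

# Keyword arguments after the function are passed through to apply_discount.
discounted = prices.apply(apply_discount, threshold=120, rate=0.25)
print(discounted)
```

Positional extras can be supplied with the args parameter instead, for example prices.apply(apply_discount, args=(120, 0.25)).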

When to Use apply() vs. Vectorization

While .apply() is flexible and easy to use, especially with complex row-wise or element-wise logic that isn't easily "vectorized," it's generally not the most performant option for simple mathematical operations or string manipulations on large datasets.

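  • Vectorized Operations: For simple operations (addition, subtraction, multiplication, division, comparisons), pandas offers optimized vectorized operations that work on the whole column at once and are typically much faster than .apply(). Common string methods such as .str.lower() and .str.contains() are also available; their speed advantage is usually smaller, but they handle missing values and read more clearly.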
    • Example: df['Price_Doubled'] = df['Price'] * 2 (Vectorized) vs. df['Price'].apply(lambda x: x * 2) (apply)
    • Example: df['Product_Upper'] = df['Product'].str.upper() (Vectorized) vs. df['Product'].apply(lambda x: x.upper()) (apply)
  • apply() Method: Best suited for functions that cannot be easily expressed using vectorized operations or built-in pandas/NumPy functions. This includes multi-branch logic that is awkward to write as column-level expressions, or calls into external libraries for each value.
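
Simple conditionals often have a vectorized form too. As a rough sketch, the 10% discount rule from the earlier example can be written with numpy.where, which picks between two column-level expressions based on a boolean mask. The variable names below are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Price": [100, 150, 200, 50]})

def apply_discount(price):
    """Element-wise version: 10% off when the price is over 100."""
    return price * 0.9 if price > 100 else price

# Element by element: one Python function call per value.
via_apply = df["Price"].apply(apply_discount)

# Vectorized: the condition and both branches are computed on whole columns.
via_where = pd.Series(
    np.where(df["Price"] > 100, df["Price"] * 0.9, df["Price"]),
    index=df.index,
)

# Both approaches should produce the same values.
print(np.allclose(via_apply, via_where))
```

On a four-row frame the difference is negligible; the vectorized form tends to pay off as the column grows.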


| Method | Description | Use Case | Performance on Large Data |
|---|---|---|---|
| df['col'].apply() | Applies a function to each element of a selected column (Series). | Complex element-wise logic, calling external libraries, complex conditions. | Slower |
| Vectorized Ops | Using pandas/NumPy built-in operations directly on the column (Series). | Simple math, comparisons, common string manipulations, aggregation. | Faster |
| df.apply(..., axis=...) | Applies a function to rows (axis=1) or columns (axis=0) of the DataFrame. | Applying functions that operate on entire rows or columns as units. | Varies, often slower than vectorized |
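
To illustrate the DataFrame-level form, here is a minimal sketch using the same sample data. The describe function and the Label column are hypothetical names introduced for this example: with axis=1 the function receives one row at a time as a Series, and with axis=0 it receives one column at a time.

```python
import pandas as pd

df = pd.DataFrame({
    "Product": ["A", "B", "C", "D"],
    "Price": [100, 150, 200, 50],
    "Temperature_F": [32, 50, 68, 86],
})

def describe(row):
    """Build a label from two fields of a single row."""
    return f"{row['Product']}: {row['Price']}"

# axis=1: the function is called once per row.
df["Label"] = df.apply(describe, axis=1)

# axis=0: the function is called once per column and receives the whole column.
ranges = df[["Price", "Temperature_F"]].apply(lambda col: col.max() - col.min())

print(df[["Product", "Price", "Label"]])
print(ranges)
```

When only one column is involved, selecting it and calling .apply() on the Series is usually simpler than going through the DataFrame.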

In summary, to apply a function to a specific column element by element, select the column as a Series and use the .apply() method. For better performance on simple tasks, investigate if a vectorized operation is available.