zaro

How to calculate proximity matrix?

Published in Proximity Matrix Calculation

To calculate a proximity matrix, you determine the distance between every pair of objects in your dataset using a distance function.

A proximity matrix is a square matrix that stores the pairwise similarity or dissimilarity (distance) between all objects in a dataset. It's a fundamental step in various data analysis techniques, such as clustering.

Calculating the Proximity Matrix

The core of calculating a proximity matrix involves applying a distance function between each possible pair of objects in your dataset.

Here’s how it works:

  1. Identify Your Objects: Let's say you have a dataset with n objects (e.g., data points, samples, items).
  2. Choose a Distance Function: Select a function that quantifies how different two objects are. Euclidean distance is a common default. Other options include Manhattan distance, Minkowski distance, or measures suited to the data type. For text vectors, cosine similarity is often used and turned into a dissimilarity as 1 − cosine similarity.
  3. Calculate Pairwise Distances: For every unique pair of objects (object i and object j, where i is not equal to j), calculate the distance between them using the chosen distance function.
  4. Populate the Matrix: Store the calculated distance values in the proximity matrix.
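The four steps above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: it assumes each object is a numeric feature vector of equal length and that the distance function is symmetric. The names `proximity_matrix`, `euclidean`, and `manhattan` are hypothetical helpers, not part of any library.

```python
import math
from typing import Callable, List, Sequence

Point = Sequence[float]

def euclidean(a: Point, b: Point) -> float:
    """Straight-line (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a: Point, b: Point) -> float:
    """Sum of absolute coordinate differences (L1 distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))

def proximity_matrix(objects: Sequence[Point],
                     distance: Callable[[Point, Point], float] = euclidean
                     ) -> List[List[float]]:
    """Return an n x n matrix whose cell [i][j] is distance(objects[i], objects[j])."""
    n = len(objects)
    # Start from zeros: the diagonal d(O_i, O_i) stays 0 for a distance.
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):  # every unique pair with i < j
            d = distance(objects[i], objects[j])
            # Assumes a symmetric distance, so the mirrored cell gets the same value.
            matrix[i][j] = d
            matrix[j][i] = d
    return matrix

# Hypothetical example data: three 2-D points.
points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
for row in proximity_matrix(points):
    print(row)
# [0.0, 5.0, 10.0]
# [5.0, 0.0, 5.0]
# [10.0, 5.0, 0.0]
```

Passing `manhattan` as the second argument swaps the metric without touching the loop. Computing only the pairs with i < j and mirroring each value halves the number of distance calls, which only works because the assumed distance is symmetric.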

Matrix Structure

For a dataset with n objects, the proximity matrix will be an n × n matrix. Let's denote the objects as O₁, O₂, ..., Oᵢ, ..., Oⱼ, ..., Oₙ.

The value in the cell at row i and column j typically represents the distance between object Oᵢ and object Oⱼ.

Rows and columns are both indexed by O₁ through Oₙ:

$$
\begin{bmatrix}
d(O_1, O_1) & d(O_1, O_2) & \cdots & d(O_1, O_j) & \cdots & d(O_1, O_n) \\
d(O_2, O_1) & d(O_2, O_2) & \cdots & d(O_2, O_j) & \cdots & d(O_2, O_n) \\
\vdots & \vdots & & \vdots & & \vdots \\
d(O_i, O_1) & d(O_i, O_2) & \cdots & d(O_i, O_j) & \cdots & d(O_i, O_n) \\
\vdots & \vdots & & \vdots & & \vdots \\
d(O_n, O_1) & d(O_n, O_2) & \cdots & d(O_n, O_j) & \cdots & d(O_n, O_n)
\end{bmatrix}
$$

Where d(Oᵢ, Oⱼ) is the distance calculated between object Oᵢ and object Oⱼ using the chosen distance function.
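As a small worked example, assume three hypothetical objects in two dimensions, O₁ = (0, 0), O₂ = (3, 4) and O₃ = (6, 8), compared with Euclidean distance. The three unique pairs give:

$$
d(O_1, O_2) = \sqrt{(3-0)^2 + (4-0)^2} = \sqrt{25} = 5, \quad
d(O_1, O_3) = \sqrt{(6-0)^2 + (8-0)^2} = \sqrt{100} = 10, \quad
d(O_2, O_3) = \sqrt{(6-3)^2 + (8-4)^2} = \sqrt{25} = 5
$$

Placing each value at row i, column j (and, by symmetry, at row j, column i) yields the 3 × 3 proximity matrix:

$$
\begin{bmatrix}
0 & 5 & 10 \\
5 & 0 & 5 \\
10 & 5 & 0
\end{bmatrix}
$$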

Key Characteristics

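  • Diagonal Elements: The distance between an object and itself, d(Oᵢ, Oᵢ), is 0. If the matrix stores similarities instead of distances, the diagonal holds the maximum similarity (for example, 1 for cosine similarity).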
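  • Symmetry: For many common distance functions (like Euclidean), the distance from Oᵢ to Oⱼ equals the distance from Oⱼ to Oᵢ, so d(Oᵢ, Oⱼ) = d(Oⱼ, Oᵢ) and the matrix is symmetric. In that case only the n(n − 1)/2 entries above the diagonal need to be computed; the entries below it are mirror copies.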
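Both characteristics can be checked programmatically. The sketch below assumes the matrix is a list of lists of floats and uses a small tolerance to absorb floating-point rounding; `has_zero_diagonal`, `is_symmetric`, and `cosine_similarity` are hypothetical helper names. It also shows that a similarity matrix (here cosine similarity) has 1 rather than 0 on its diagonal, and that 1 − cosine similarity turns it into a dissimilarity with a zero diagonal.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]

def has_zero_diagonal(m: Matrix, tol: float = 1e-9) -> bool:
    """True if every diagonal entry m[i][i] is 0 within tolerance."""
    return all(abs(m[i][i]) <= tol for i in range(len(m)))

def is_symmetric(m: Matrix, tol: float = 1e-9) -> bool:
    """True if m[i][j] equals m[j][i] within tolerance for every pair."""
    n = len(m)
    return all(abs(m[i][j] - m[j][i]) <= tol
               for i in range(n) for j in range(i + 1, n))

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical example vectors.
vectors = [(1.0, 0.0), (1.0, 1.0), (0.0, 2.0)]
n = len(vectors)
similarity = [[cosine_similarity(vectors[i], vectors[j]) for j in range(n)]
              for i in range(n)]
# Convert similarity to dissimilarity: vectors pointing the same way map to 0.
cosine_distance = [[1.0 - s for s in row] for row in similarity]

print(has_zero_diagonal(similarity))       # False: the diagonal holds ~1.0
print(has_zero_diagonal(cosine_distance))  # True
print(is_symmetric(cosine_distance))       # True
```

The tolerance matters: for (1, 1) the computed self-similarity is slightly below 1 because of rounding in the square roots, so an exact `== 0` check on the converted diagonal could fail.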

In summary, the proximity matrix's values are calculated by applying a distance function between each pair of objects, providing a quantitative measure of how 'close' or 'far apart' they are in the feature space.