Sunday, 30 March 2025

VECTOR_DISTANCE

What is VECTOR_DISTANCE?
The VECTOR_DISTANCE function calculates the distance between two vectors, passed as the arguments expr1 and expr2. Depending on the context, the vectors can represent various types of data, such as images, text, or numbers. For every metric listed below, a smaller result means the two vectors are more similar.
Key Points:
    Purpose: Calculates the distance between two vectors.
    Optional Metric: You can specify a distance metric. If not specified:
        The default metric is Cosine Distance for non-binary vectors.
        For binary vectors, the default is Hamming Distance.
The general form is VECTOR_DISTANCE(expr1, expr2 [, metric]), where metric is one of the keywords listed under Distance Metrics Available.
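A minimal Python sketch of the default-metric rule described above. It reimplements the math only and is not the database API: the helper names (vector_distance, cosine_distance, hamming_distance) and the binary flag are hypothetical, and binary vectors are assumed to be plain lists of 0/1 bits rather than packed bytes.

```python
import math


def cosine_distance(a, b):
    # 1 minus cosine similarity; zero vectors have no defined angle
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (na * nb)


def hamming_distance(a, b):
    # Number of positions where the two bit lists differ
    return sum(1 for x, y in zip(a, b) if x != y)


def vector_distance(a, b, metric=None, binary=False):
    """Hypothetical stand-in: with no metric given, use HAMMING for
    binary vectors and COSINE for everything else."""
    if metric is None:
        metric = "HAMMING" if binary else "COSINE"
    if metric == "COSINE":
        return cosine_distance(a, b)
    if metric == "HAMMING":
        return hamming_distance(a, b)
    raise ValueError(f"metric not covered by this sketch: {metric}")
```

An explicit metric always wins over the default, so vector_distance(a, b, "COSINE", binary=True) still computes cosine distance.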
Shorthand Functions for Common Distance Metrics
For the most common metrics, shorthand functions offer a more compact alternative to VECTOR_DISTANCE. Each one is equivalent to calling VECTOR_DISTANCE with the corresponding metric keyword; for example, L2_DISTANCE(expr1, expr2) returns the same value as VECTOR_DISTANCE(expr1, expr2, EUCLIDEAN). The one sign difference is INNER_PRODUCT, noted below.
Here are the shorthand functions:
    L1_DISTANCE: Manhattan (L1) distance.
    L2_DISTANCE: Euclidean (L2) distance.
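    COSINE_DISTANCE: Cosine distance (1 minus cosine similarity).
    INNER_PRODUCT: Dot product of the two vectors. Note the sign: INNER_PRODUCT(expr1, expr2) equals -1 * VECTOR_DISTANCE(expr1, expr2, DOT), so larger values mean more similar vectors.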
    HAMMING_DISTANCE: Hamming distance for binary vectors.
    JACCARD_DISTANCE: Jaccard distance for binary vectors.
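To make the shorthand-to-metric mapping concrete, here is a small Python sketch. vector_distance is a hypothetical, simplified stand-in for the SQL function with an explicit metric, and each shorthand wrapper just fixes that argument. Binary vectors are assumed to be lists of 0/1 bits; the point to notice is that inner_product undoes the negation of the DOT metric.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def vector_distance(a, b, metric):
    # Simplified stand-in for VECTOR_DISTANCE(expr1, expr2, metric)
    if metric == "MANHATTAN":
        return sum(abs(x - y) for x, y in zip(a, b))
    if metric == "EUCLIDEAN":
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    if metric == "COSINE":
        return 1.0 - _dot(a, b) / (math.sqrt(_dot(a, a)) * math.sqrt(_dot(b, b)))
    if metric == "DOT":
        return -_dot(a, b)  # negated dot product
    if metric == "HAMMING":
        return sum(1 for x, y in zip(a, b) if x != y)
    if metric == "JACCARD":
        both = sum(1 for x, y in zip(a, b) if x and y)
        either = sum(1 for x, y in zip(a, b) if x or y)
        if either == 0:
            raise ValueError("Jaccard distance is undefined for two all-zero vectors")
        return 1.0 - both / either
    raise ValueError(f"unknown metric: {metric}")


# Hypothetical shorthand wrappers: each fixes the metric argument.
def l1_distance(a, b):
    return vector_distance(a, b, "MANHATTAN")


def l2_distance(a, b):
    return vector_distance(a, b, "EUCLIDEAN")


def cosine_distance(a, b):
    return vector_distance(a, b, "COSINE")


def inner_product(a, b):
    # Plain dot product: -1 * VECTOR_DISTANCE(a, b, DOT)
    return -vector_distance(a, b, "DOT")


def hamming_distance(a, b):
    return vector_distance(a, b, "HAMMING")


def jaccard_distance(a, b):
    return vector_distance(a, b, "JACCARD")
```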

Distance Metrics Available:
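    COSINE: Cosine distance, 1 minus the cosine of the angle between the two vectors. It ignores vector magnitude, which makes it useful for high-dimensional data like text.
    DOT: Calculates the negated dot product of two vectors. Because the product is negated, a smaller (more negative) result means the vectors are more similar.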
    EUCLIDEAN: Measures the straight-line (L2) distance between two vectors, commonly used in spatial data.
    EUCLIDEAN_SQUARED: The squared Euclidean distance (no square root). It is cheaper to compute and ranks results in the same order as EUCLIDEAN, which is often all a nearest-neighbor search needs.
    HAMMING: Counts the number of differing dimensions between two binary vectors, typically used in error correction.
    MANHATTAN: Also known as L1 distance, calculates the sum of absolute differences between vector components, useful for grid-based problems.
    JACCARD: Measures dissimilarity between binary vectors as 1 minus the ratio of their intersection (bits set in both vectors) to their union (bits set in either vector).
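The difference between EUCLIDEAN and EUCLIDEAN_SQUARED is easiest to see in code. In this Python sketch (function names are illustrative, not a library API), the square root is monotonic, so sorting candidates by either function gives the same nearest-first order; only the reported distance values differ.

```python
import math


def euclidean_squared(a, b):
    # Sum of squared component differences, no square root
    return sum((x - y) ** 2 for x, y in zip(a, b))


def euclidean(a, b):
    # Straight-line (L2) distance
    return math.sqrt(euclidean_squared(a, b))


def nearest(query, candidates, dist):
    # Candidates ordered from closest to farthest under `dist`
    return sorted(candidates, key=lambda c: dist(query, c))


query = [0, 0]
candidates = [[3, 4], [1, 1], [0, 2]]
by_l2 = nearest(query, candidates, euclidean)
by_l2_squared = nearest(query, candidates, euclidean_squared)
```

Both orderings here are [[1, 1], [0, 2], [3, 4]], while the distances to [3, 4] are 5.0 and 25 respectively.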
    
Shorthand Operators for Distance Metrics:
Instead of calling a function, you can place a shorthand operator between the two vector expressions. The operators compute the same values as the functions they stand for and are especially handy when writing queries or performing similarity searches:
    <->: Equivalent to L2_DISTANCE (Euclidean distance).
    <=>: Equivalent to COSINE_DISTANCE (Cosine distance).
    <#>: Equivalent to -1 * INNER_PRODUCT (Negative dot product).
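Python has no <->, <=> or <#> operators, so this sketch models them as a hypothetical lookup table from operator symbol to the computation it stands for. It shows in particular that <#> returns the negated dot product, the same value as VECTOR_DISTANCE with the DOT metric.

```python
import math


def l2_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))


# Hypothetical table: operator symbol -> equivalent distance computation
OPERATORS = {
    "<->": l2_distance,
    "<=>": cosine_distance,
    "<#>": lambda a, b: -1 * inner_product(a, b),
}


def apply_operator(op, a, b):
    # Evaluate `a op b` by looking up the equivalent function
    return OPERATORS[op](a, b)
```

Sorting candidate vectors by apply_operator("<=>", query, v) in ascending order mirrors a cosine similarity search that returns the closest matches first.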