KEMBAR78
[API Proposal]: TensorPrimitives.HammingDistance · Issue #102980 · dotnet/runtime · GitHub
Skip to content

[API Proposal]: TensorPrimitives.HammingDistance #102980

@stephentoub

Description

@stephentoub

Background and motivation

Given two vectors/lists/strings, Hamming distance is the number of positions at which the two vectors differ. This has a variety of uses (https://en.wikipedia.org/wiki/Hamming_distance), but when focusing on sequences of bits, it's commonly used with embedding vectors that have been quantized down to a single bit per element, with the Hamming distance between two embeddings then used as the similarity/distance metric. When used as such, Hamming distance is computable as the popcount of the xor of the bits.

API Proposal

namespace System.Numerics.Tensors;

public static class TensorPrimitives
{
    // Existing
    public static void PopCount<T>(ReadOnlySpan<T> x, Span<T> destination);
    
    // New
+   public static long PopCount<T>(ReadOnlySpan<T> x) where T : IBinaryInteger<T>;
+   public static int HammingDistance<T>(ReadOnlySpan<T> x, ReadOnlySpan<T> y); // computes number of differing elements
+   public static long HammingBitDistance<T>(ReadOnlySpan<T> x, ReadOnlySpan<T> y) where T : IBinaryInteger<T>; // computes number of differing bits
}

(I've included the new PopCount overload as it felt strange to have a method that effectively did the same thing for the xor of two inputs but not for a single input. It's also useful in its own right.)

API Usage

ReadOnlySpan<byte> singleBitEmbedding1 = ...;
ReadOnlySpan<byte> singleBitEmbedding2 = ...;
long distance = TensorPrimitives.HammingBitDistance(singleBitEmbedding1, singleBitEmbedding2);

Alternative Designs

  • Different names to distinguish bits from elements?
  • OperationStatus / Try-based PopCount that could accomodate continugous memory with more than 2^63 bits.
  • Another HammingDistance overload accepting an IEqualityComparer (or a generic constrained to one).

Risks

n/a

Metadata

Metadata

Assignees

Labels

api-approvedAPI was approved in API review, it can be implementedapi-ready-for-reviewAPI is ready for review, it is NOT ready for implementationarea-System.Numerics.Tensorsin-prThere is an active PR which will close this issue when it is merged

Type

No type

Projects

No projects

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions