-
Notifications
You must be signed in to change notification settings - Fork 5.2k
Closed
Labels
api-approvedAPI was approved in API review, it can be implementedAPI was approved in API review, it can be implementedapi-ready-for-reviewAPI is ready for review, it is NOT ready for implementationAPI is ready for review, it is NOT ready for implementationarea-System.Numerics.Tensorsin-prThere is an active PR which will close this issue when it is mergedThere is an active PR which will close this issue when it is merged
Milestone
Description
Background and motivation
Given two vectors/lists/strings, Hamming distance is the number of positions at which the two vectors differ. This has a variety of uses (https://en.wikipedia.org/wiki/Hamming_distance), but when focusing on sequences of bits, it's commonly used with embedding vectors that have been quantized down to a single bit per element, with the Hamming distance between two embeddings then used as the similarity/distance metric. When used as such, Hamming distance is computable as the popcount of the xor of the bits.
API Proposal
namespace System.Numerics.Tensors;
public static class TensorPrimitives
{
// Existing
public static void PopCount<T>(ReadOnlySpan<T> x, Span<T> destination);
// New
+ public static long PopCount<T>(ReadOnlySpan<T> x) where T : IBinaryInteger<T>;
+ public static int HammingDistance<T>(ReadOnlySpan<T> x, ReadOnlySpan<T> y); // computes number of differing elements
+ public static long HammingBitDistance<T>(ReadOnlySpan<T> x, ReadOnlySpan<T> y) where T : IBinaryInteger<T>; // computes number of differing bits
}(I've included the new PopCount overload as it felt strange to have a method that effectively did the same thing for the xor of two inputs but not for a single input. It's also useful in its own right.)
API Usage
ReadOnlySpan<byte> singleBitEmbedding1 = ...;
ReadOnlySpan<byte> singleBitEmbedding2 = ...;
long distance = TensorPrimitives.HammingBitDistance(singleBitEmbedding1, singleBitEmbedding2);Alternative Designs
- Different names to distinguish bits from elements?
- OperationStatus / Try-based PopCount that could accomodate continugous memory with more than 2^63 bits.
- Another HammingDistance overload accepting an IEqualityComparer (or a generic constrained to one).
Risks
n/a
vcsjones, neon-sunset, colejohnson66 and PaulusParssinen
Metadata
Metadata
Assignees
Labels
api-approvedAPI was approved in API review, it can be implementedAPI was approved in API review, it can be implementedapi-ready-for-reviewAPI is ready for review, it is NOT ready for implementationAPI is ready for review, it is NOT ready for implementationarea-System.Numerics.Tensorsin-prThere is an active PR which will close this issue when it is mergedThere is an active PR which will close this issue when it is merged