Perform a quantized matrix multiplication of a by the matrix b with bias add and relu fusion.
tf.raw_ops.QuantizedMatMulWithBiasAndRelu(
a,
b,
bias,
min_a,
max_a,
min_b,
max_b,
Toutput=tf.dtypes.qint32,
transpose_a=False,
transpose_b=False,
input_quant_mode='MIN_FIRST',
name=None
)
The inputs must be two-dimensional matrices and 1D bias vector. And the inner
dimension of a (after being transposed if transpose_a is non-zero) must
match the outer dimension of b (after being transposed if transposed_b is
non-zero). Then do broadcast add operation with bias values on the matrix
multiplication result. The bias size must match inner dimension of b. Then do
relu activation to get non-negative result.
Args:
a: A Tensor. Must be one of the following types: qint8, quint8, qint32, qint16, quint16.
A matrix to be multiplied. Must be a two-dimensional tensor of type quint8.
b: A Tensor. Must be one of the following types: qint8, quint8, qint32, qint16, quint16.
A matrix to be multiplied and must be a two-dimensional tensor of type qint8.
bias: A Tensor of type float32.
A 1D bias tensor with size matching with inner dimension of b (after being
transposed if transposed_b is non-zero).
min_a: A Tensor of type float32.
The float value that the lowest quantized a value represents.
max_a: A Tensor of type float32.
The float value that the highest quantized a value represents.
min_b: A Tensor of type float32.
The float value that the lowest quantized b value represents.
max_b: A Tensor of type float32.
The float value that the highest quantized b value represents.
Toutput: An optional tf.DType from: tf.qint8, tf.quint8, tf.qint32, tf.qint16, tf.quint16. Defaults to tf.qint32.
transpose_a: An optional bool. Defaults to False.
If true, a is transposed before multiplication.
transpose_b: An optional bool. Defaults to False.
If true, b is transposed before multiplication.
input_quant_mode: An optional string from: "MIN_FIRST", "SCALED". Defaults to "MIN_FIRST".
Input data quantization mode. Either MIN_FIRST(default) or SCALED.
name: A name for the operation (optional).
Returns:
A tuple of Tensor objects (out, min_out, max_out).
out: A `Tensor` of type `Toutput`.
min_out: A `Tensor` of type `float32`.
max_out: A `Tensor` of type `float32`.