feat: lfilter with filterbanks support by yoyolicoris · Pull Request #1587 · pytorch/audio · GitHub

Conversation

@yoyolicoris
Contributor

@yoyolicoris yoyolicoris commented Jun 18, 2021

resolves #1528, relates to #1561

This PR adds the ability to apply multiple filters in F.lfilter.
When the input shape is (..., time) and the shape of the filter coefficients is (number_of_filters, filter_order), the output shape will be (..., number_of_filters, time), i.e. the input shape plus one additional dimension coming from the filters.

Before feeding to the C++ backend, the Python frontend first duplicates the input number_of_filters times and stacks the copies to form a shape of (..., number_of_filters, time).

The C++ backend assumes the shapes of waveform and *_coeffs are (batch, number_of_filters, time) and (number_of_filters, filter_order), respectively (previously they were (batch, time) and (filter_order,)). The different filters are applied in parallel along the second dimension of waveform.
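The shape handling described above can be sketched in isolation (a standalone illustration of the described semantics, not torchaudio's actual code path):

```python
import torch

# Standalone illustration of the frontend shape handling described above
# (not torchaudio's actual code path).
waveform = torch.randn(3, 2, 1000)       # input: (..., time)
num_filters = 4                          # i.e. number_of_filters
shape = waveform.size()

# The frontend flattens the leading dims, adds a singleton filter dim,
# then repeats the signal once per filter:
x = waveform.reshape(-1, 1, shape[-1])   # (batch, 1, time)
x = x.repeat(1, num_filters, 1)          # (batch, num_filters, time)

# After filtering, the result is reshaped back, gaining one dimension:
out = x.reshape(shape[:-1] + (num_filters, shape[-1]))
```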

cc @mthrok

@yoyolicoris yoyolicoris force-pushed the feat/lfilter-filter-banks branch from 5c239cb to e889232 Compare June 21, 2021 02:07
Contributor

@mthrok mthrok left a comment


Overall, it looks good. The semantics around the filter bank look straightforward, though the documentation can be improved.

If I understand correctly, the implementation cannot handle the case where one wants to apply different filters per channel (though I am not sure if that's a valid use case). Is that correct?

Returns:
Tensor: Waveform with dimension of ``(..., time)``.
Tensor: Waveform with dimension of ``(..., *, time)``.
Contributor


I think the shape and filter-bank semantics (noted as * in the current form) need more clarification.
Specifically, the distinction between the cases of 1D coeffs and 2D coeffs should be explained
(even though I think the use of * here is very sophisticated).

Contributor Author

@yoyolicoris yoyolicoris Jun 22, 2021


Sounds reasonable. How about

Suggested change
Tensor: Waveform with dimension of ``(..., *, time)``.
Tensor: Waveform with dimension of either ``(..., *, time)`` if ``a_coeffs`` and ``b_coeffs`` are 2D Tensors,
or ``(..., time)`` otherwise.

Contributor


This looks good, but can we simply replace the * with something like num_filters?

(And in the docstring of a_coeffs, b_coeffs, can we explicitly say either 1D with shape (num_order + 1) or 2D with shape (num_filters, num_order + 1)?)

The thing is, the ellipsis is used in many docstrings, but the asterisk is unique to this function, so I think explicitly mentioning the difference between 1D and 2D is easier to understand.
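The 1D-vs-2D contract under discussion could be illustrated roughly as follows (num_filters and num_order follow the naming suggested in this thread; this is a shape-only sketch, not torchaudio's docstring):

```python
import torch

# Shape-only sketch of the two coefficient cases (num_filters / num_order
# follow the naming suggested in this review thread).
num_order, num_filters = 2, 4
x = torch.randn(8, 2, 1000)                       # (..., time)

# 1D coeffs, shape (num_order + 1,): output keeps the input shape (..., time)
coeffs_1d = torch.tensor([1.0, -0.5, 0.1])

# 2D coeffs, shape (num_filters, num_order + 1):
# output gains a filter dimension -> (..., num_filters, time)
coeffs_2d = torch.randn(num_filters, num_order + 1)
out_2d_shape = x.shape[:-1] + (num_filters, x.shape[-1])
```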

@yoyolicoris
Contributor Author

Overall, it looks good. The semantic around filter bank looks straightforward though documentation can be improved.

If I understand correctly, the implementation cannot handle the case where one wants to apply different filters per channel. (though I am not sure if that's valid use-case.) Is that correct?

No, it cannot apply different filters per channel. The same set of filters is applied to all signals, regardless of channel.

Contributor

@mthrok mthrok left a comment


Hi @yoyololicon

Sorry for not getting back sooner.
I think this PR is good, but if you can consider the comments, that would be great.
Thanks!

waveform = waveform.reshape(-1, 1, shape[-1])
if a_coeffs.ndim > 1:
    shape = shape[:-1] + (a_coeffs.shape[0], shape[-1])
    waveform = waveform.repeat(1, a_coeffs.shape[0], 1)
Contributor


Instead of repeating the waveform manually and computing the shape manually, how about overwriting shape after repeating? I think that will be slightly easier to understand.

waveform = waveform.repeat(...)
shape = waveform.size()


@yoyolicoris yoyolicoris requested a review from mthrok June 28, 2021 15:02
@yoyolicoris
Contributor Author

Benchmarks

I have managed to build the sources with CUDA support using a Docker container.
The benchmarks were conducted on a Linux machine with an i5 4790K, 32 GB of RAM, and an RTX 3070.
The results indicate that the runtime on CUDA is roughly doubled.

Before

CPU

[-------------- IIR filter --------------]
                   |  forward  |  backward
1 threads: -------------------------------
      [32, 256]    |    307.2  |    808.0 
      [32, 1024]   |    614.9  |   1512.9 
      [32, 4096]   |   2036.1  |   5199.6 
      [64, 256]    |    399.5  |    992.8 
      [64, 1024]   |   1027.6  |   2417.2 
      [64, 4096]   |   7028.7  |  13454.1 
      [128, 256]   |    624.4  |   1493.7 
      [128, 1024]  |   1986.3  |   4805.7 
      [128, 4096]  |  17420.1  |  31806.2 
2 threads: -------------------------------
      [32, 256]    |    291.6  |    813.7 
      [32, 1024]   |    531.0  |   1417.3 
      [32, 4096]   |   1754.9  |   4138.1 
      [64, 256]    |    357.4  |    968.6 
      [64, 1024]   |    797.2  |   1973.9 
      [64, 4096]   |   6914.0  |  13116.0 
      [128, 256]   |    535.3  |   1398.3 
      [128, 1024]  |   1522.3  |   4060.9 
      [128, 4096]  |  17863.6  |  30650.4 
4 threads: -------------------------------
      [32, 256]    |    268.3  |    822.9 
      [32, 1024]   |    491.3  |   1395.4 
      [32, 4096]   |   1539.7  |   4215.9 
      [64, 256]    |    347.6  |    996.4 
      [64, 1024]   |    708.5  |   1801.9 
      [64, 4096]   |   7666.7  |  13461.7 
      [128, 256]   |    479.9  |   1306.3 
      [128, 1024]  |   1428.4  |   4137.1 
      [128, 4096]  |  18735.3  |  31495.1 

Times are in microseconds (us).

CUDA

[-------------- IIR filter --------------]
                   |  forward  |  backward
1 threads: -------------------------------
      [32, 256]    |    10.7   |    19.6  
      [32, 1024]   |    41.7   |    76.5  
      [32, 4096]   |   159.8   |   297.6  
      [64, 256]    |    10.3   |    19.7  
      [64, 1024]   |    41.8   |    75.9  
      [64, 4096]   |   157.9   |   299.2  
      [128, 256]   |    10.5   |    19.8  
      [128, 1024]  |    41.3   |    75.9  
      [128, 4096]  |   164.3   |   301.0  
2 threads: -------------------------------
      [32, 256]    |    10.6   |    19.6  
      [32, 1024]   |    40.2   |    77.6  
      [32, 4096]   |   161.9   |   302.6  
      [64, 256]    |    10.7   |    19.9  
      [64, 1024]   |    40.0   |    75.6  
      [64, 4096]   |   159.8   |   299.9  
      [128, 256]   |    10.9   |    19.8  
      [128, 1024]  |    41.4   |    75.2  
      [128, 4096]  |   170.6   |   301.2  
4 threads: -------------------------------
      [32, 256]    |    10.8   |    19.6  
      [32, 1024]   |    40.3   |    76.8  
      [32, 4096]   |   160.5   |   301.7  
      [64, 256]    |    10.6   |    19.7  
      [64, 1024]   |    40.3   |    75.5  
      [64, 4096]   |   163.7   |   302.8  
      [128, 256]   |    10.7   |    19.8  
      [128, 1024]  |    42.8   |    75.9  
      [128, 4096]  |   165.5   |   299.8  

Times are in milliseconds (ms).

After

CPU

[-------------- IIR filter --------------]
                   |  forward  |  backward
1 threads: -------------------------------
      [32, 256]    |    322.1  |    835.5 
      [32, 1024]   |    635.2  |   1538.1 
      [32, 4096]   |   2164.4  |   5424.1 
      [64, 256]    |    410.9  |   1026.8 
      [64, 1024]   |   1010.5  |   2299.0 
      [64, 4096]   |   6981.6  |  12721.7 
      [128, 256]   |    616.4  |   1451.6 
      [128, 1024]  |   1886.5  |   4810.8 
      [128, 4096]  |  17436.0  |  31758.8 
2 threads: -------------------------------
      [32, 256]    |    305.4  |    854.4 
      [32, 1024]   |    550.3  |   1464.6 
      [32, 4096]   |   1733.3  |   4914.4 
      [64, 256]    |    370.5  |   1003.6 
      [64, 1024]   |    791.4  |   1927.2 
      [64, 4096]   |   6829.2  |  12608.5 
      [128, 256]   |    536.6  |   1383.8 
      [128, 1024]  |   1429.7  |   4541.2 
      [128, 4096]  |  17876.0  |  31664.5 
4 threads: -------------------------------
      [32, 256]    |    283.0  |    850.6 
      [32, 1024]   |    492.4  |   1446.2 
      [32, 4096]   |   1586.0  |   4792.5 
      [64, 256]    |    362.8  |    978.7 
      [64, 1024]   |    735.8  |   1842.1 
      [64, 4096]   |   7331.0  |  12645.4 
      [128, 256]   |    492.1  |   1351.4 
      [128, 1024]  |   1265.6  |   3954.7 
      [128, 4096]  |  18502.5  |  33391.6 

Times are in microseconds (us).

CUDA

[-------------- IIR filter --------------]
                   |  forward  |  backward
1 threads: -------------------------------
      [32, 256]    |    19.6   |    37.9  
      [32, 1024]   |    77.4   |   148.7  
      [32, 4096]   |   314.3   |   596.4  
      [64, 256]    |    19.7   |    38.5  
      [64, 1024]   |    82.8   |   147.6  
      [64, 4096]   |   328.1   |   589.7  
      [128, 256]   |    20.5   |    38.0  
      [128, 1024]  |    82.3   |   148.7  
      [128, 4096]  |   323.4   |   595.7  
2 threads: -------------------------------
      [32, 256]    |    19.9   |    37.9  
      [32, 1024]   |    77.4   |   150.9  
      [32, 4096]   |   308.5   |   597.6  
      [64, 256]    |    19.7   |    38.6  
      [64, 1024]   |    82.7   |   147.6  
      [64, 4096]   |   330.5   |   590.1  
      [128, 256]   |    20.8   |    38.2  
      [128, 1024]  |    80.9   |   149.3  
      [128, 4096]  |   330.8   |   595.9  
4 threads: -------------------------------
      [32, 256]    |    19.8   |    37.9  
      [32, 1024]   |    78.2   |   150.8  
      [32, 4096]   |   309.5   |   598.0  
      [64, 256]    |    20.1   |    38.1  
      [64, 1024]   |    82.6   |   148.9  
      [64, 4096]   |   323.4   |   589.2  
      [128, 256]   |    20.9   |    38.2  
      [128, 1024]  |    81.4   |   148.7  
      [128, 4096]  |   322.0   |   596.0  

Times are in milliseconds (ms).

@mthrok
Contributor

mthrok commented Jul 2, 2021

Hi @yoyololicon

Thanks for the benchmark.
The shapes in the table, are they the size of input batch?
I am also curious to learn the performance difference of multiple filters.

What do you think is the feasibility of splitting CPU/CUDA code?
I assume that the extra overhead for CUDA case is coming from the newly added for-loop.

BTW, I found an attempt to use RNN in #226. Although it is also pointed out that RNN without activation would not improve the performance as much. https://discuss.pytorch.org/t/vanillarnn-without-activation-function/26507/2. I wonder if this could help us in custom CUDA lfilter code.

@yoyolicoris
Contributor Author

yoyolicoris commented Jul 3, 2021

Hi @mthrok ,

Thanks for the benchmark.
The shapes in the table, are they the size of input batch?
I am also curious to learn the performance difference of multiple filters.

Yes, they are the sizes of the input. I can do the multiple-filters case later.

What do you think is the feasibility of splitting CPU/CUDA code?
I assume that the extra overhead for CUDA case is coming from the newly added for-loop.

The overhead concern is technically true, although the actual implementation replaces addmv and a for-loop with bmm and some extra transpose operations.
To completely avoid these overheads, I think writing custom CUDA kernels is needed.
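A toy version of that idea, batching the per-filter feedback product at each time step (here einsum stands in for the bmm-plus-transpose formulation mentioned above; illustrative only, not the actual kernel):

```python
import torch

def iir_bank_recursion(x, a):
    """Toy sketch of the recursive (feedback) part of an IIR filter bank,
    computing one batched product per time step instead of addmv in a loop.
    Not torchaudio's actual kernel.
    x: (batch, num_filters, time); a: (num_filters, order + 1), a[:, 0] == 1.
    """
    batch, nf, T = x.shape
    order = a.shape[1] - 1
    y = torch.zeros(batch, nf, T + order)        # zero-padded past samples
    a_rev = a.flip(1)[:, :-1]                    # reversed feedback taps (nf, order)
    for t in range(T):
        window = y[:, :, t:t + order]            # (batch, nf, order)
        # batched per-filter dot product (bmm with transposes would also work)
        feedback = torch.einsum('bfo,fo->bf', window, a_rev)
        y[:, :, t + order] = x[:, :, t] - feedback
    return y[:, :, order:]

# impulse response of y[t] = x[t] + 0.5 * y[t-1], as a 1-filter "bank"
imp = torch.zeros(1, 1, 5)
imp[0, 0, 0] = 1.0
resp = iir_bank_recursion(imp, torch.tensor([[1.0, -0.5]]))
```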

BTW, I found an attempt to use RNN in #226. Although it is also pointed out that RNN without activation would not improve the performance as much. https://discuss.pytorch.org/t/vanillarnn-without-activation-function/26507/2. I wonder if this could help us in custom CUDA lfilter code.

Hmm, I can't see much potential in using an RNN for our case right now, because an RNN is strictly a first-order IIR and cannot generalize to arbitrary order.
I do see some possibility of using an RNN if we can implement lfilter in a state-space formulation, but I haven't looked into it carefully yet.
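For reference, the state-space direction mentioned here amounts to rewriting an order-N recursion as a first-order update on a state vector; a minimal all-pole sketch (illustrative only, not a proposed implementation):

```python
import torch

def allpole_state_space(x, a):
    """Order-N all-pole filter as a first-order recurrence on a state vector,
    the form an RNN-style kernel could exploit. Illustrative sketch only.
    x: (time,); a: (order + 1,), a[0] == 1.
    """
    order = a.shape[0] - 1
    state = torch.zeros(order)                   # [y[t-1], ..., y[t-order]]
    out = []
    for xt in x:
        yt = xt - torch.dot(a[1:], state)        # y[t] = x[t] - sum_k a_k y[t-k]
        out.append(yt)
        state = torch.cat([yt.reshape(1), state[:-1]])
    return torch.stack(out)

# impulse response of y[t] = x[t] + 0.5 * y[t-1]
imp = torch.zeros(6)
imp[0] = 1.0
resp = allpole_state_space(imp, torch.tensor([1.0, -0.5]))
```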

@mthrok
Contributor

mthrok commented Jul 12, 2021

@yoyololicon Sorry for not getting back to you. I am very busy these days and I should be able to get back to this at the end of this week.

@mthrok
Contributor

mthrok commented Jul 21, 2021

@yoyololicon Again, really sorry for taking a long time. Let's continue to the next step.

assert shape == waveform.size() == output_waveform.size()
assert input_shape == waveform.size()
assert target_shape == output_waveform.size()

Contributor


Follow-up: we should add tests to verify that the resulting signals are the same whether multiple filters are applied individually or as a bank.
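The requested consistency test could look roughly like this, using a toy direct-form filter-bank implementation in place of torchaudio.functional.lfilter (names and tolerances are placeholders):

```python
import torch

def lfilter_bank_toy(x, b, a):
    """Toy direct-form IIR over a filter bank (not torchaudio's kernel).
    x: (time,); b, a: (num_filters, order + 1), a[:, 0] == 1.
    Returns (num_filters, time), all filters advanced in lockstep."""
    nf, taps = a.shape
    T = x.shape[0]
    y = torch.zeros(nf, T)
    for t in range(T):
        acc = torch.zeros(nf)
        for k in range(taps):
            if t - k >= 0:
                acc = acc + b[:, k] * x[t - k]        # feedforward taps
                if k > 0:
                    acc = acc - a[:, k] * y[:, t - k]  # feedback taps
        y[:, t] = acc
    return y

# consistency check: row i of the bank output should match filtering
# with filter i alone
x = torch.randn(32)
b = torch.tensor([[0.2, 0.3], [0.5, -0.1]])
a = torch.tensor([[1.0, -0.4], [1.0, 0.2]])
bank = lfilter_bank_toy(x, b, a)
alone = torch.stack([lfilter_bank_toy(x, b[i:i + 1], a[i:i + 1])[0] for i in range(2)])
assert torch.allclose(bank, alone, atol=1e-5)
```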

Contributor Author


Got it, I will add batch consistency tests and also benchmarks of multiple filters in later commits.

@mthrok mthrok merged commit aa0dd03 into pytorch:master Jul 21, 2021
@yoyolicoris yoyolicoris deleted the feat/lfilter-filter-banks branch July 21, 2021 23:19
nateanl pushed a commit to nateanl/audio that referenced this pull request Jul 28, 2021

Successfully merging this pull request may close these issues.

Support multiple filters in lfilter
