KEMBAR78
Add TorchScript-able "info" func to sox_io backend by mthrok · Pull Request #728 · pytorch/audio · GitHub
Skip to content

Conversation

@mthrok
Copy link
Contributor

@mthrok mthrok commented Jun 18, 2020

This is a part of PRs to add new "sox_io" backend #726, and depends on #718.

This PR adds info function to "sox_io" backend, which allows users to fetch some metadata of an audio file.
At this moment, the information retrieved are;

  • Number of samples in the audio file
  • Sampling rate
  • Number of channels

This implementation ended up being aligned with what was suggested in #618 (comment).

Compared to the original "sox" backend, which exposed all sox_signalinfo_t and sox_encodinginfo_t, this is limited but it was not helpful to expose all of sox internals either. For the details of these structures, see sox_signalinfo_t and sox_encodinginfo_t.

@mthrok mthrok changed the title Add TorchScript-able info func to sox_io backend Add TorchScript-able "info" func to sox_io backend Jun 18, 2020
@mthrok mthrok force-pushed the get-info branch 2 times, most recently from 9b76f9f to 5e970ee Compare June 18, 2020 19:10
@mthrok mthrok mentioned this pull request Jun 18, 2020
6 tasks
@codecov
Copy link

codecov bot commented Jun 19, 2020

Codecov Report

Merging #728 into master will increase coverage by 0.02%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #728      +/-   ##
==========================================
+ Coverage   88.97%   88.99%   +0.02%     
==========================================
  Files          31       32       +1     
  Lines        2522     2527       +5     
==========================================
+ Hits         2244     2249       +5     
  Misses        278      278              
Impacted Files Coverage Δ
torchaudio/backend/sox_io_backend.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f8eac89...74459d9. Read the comment docs.

Copy link
Contributor

@vincentqb vincentqb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM


# 4. Build codecs
build_tools/setup_helpers/build_third_party.sh
# build_tools/setup_helpers/build_third_party.sh
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Can you elaborate on why you are changing this default mechanic as part of this pr?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The static build of libsox (which is what we ship as binary distribution) does not contain codecs for ogg/vorbis.
Using libsox-fmt-all we can test ogg/vorbis types too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we can still build third parties by using BUILD_SOX=1, but by default, we link with the static build. Is that correct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When BUILD_SOX=0 _torchaudio is linked against a libsox found in the system. static/dynamic depends on what is found.

torchaudio.set_audio_backend(be)


class TempDirMixin:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not for this PR: just as is done below in a prior pr, what's the advantage of not inheriting PytorchTestCase here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid inheritance of PytorchTestCase multiple times at the test module.
TempDirMixin is designed to be composable so that each test case can choose use or not to use.
This decision should not collide with the fact we are requiring each test case to inherit PytorchTestCase.

Comment on lines +22 to +24
['float32', 'int32', 'int16', 'uint8'],
[8000, 16000],
[1, 2],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this is nice :) is there a way to have pass keyword arguments with parameterized?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not know. Please refer to the doc https://pypi.org/project/parameterized/

self.assertEqual(output, expected, rtol=rtol, atol=atol)

@unittest.skipIf(_not_available('apply-cmvn-sliding'), '`apply-cmvn-sliding` not available')
@common_utils.skipIfNoExec('apply-cmvn-sliding')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: again, i'm not sure why this is changing as part of this pr?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So as to avoid two duplicated implementation.

kaldi_compatibility_impl had this original version of skipIfNoExec implementation (which is _not_available func + unittest.skipIf), and now that implementation is promoted to common utility because the new sox_io test uses it too.

@mthrok mthrok merged commit 88fccd1 into pytorch:master Jun 19, 2020
@mthrok mthrok deleted the get-info branch June 19, 2020 20:22
@mthrok
Copy link
Contributor Author

mthrok commented Jun 19, 2020

thanks!

mthrok added a commit to mthrok/audio that referenced this pull request Jun 22, 2020
In pytorch#728, linux unit test switches to libsox provided by apt.
For CPU jobs this is fine because all the job steps share the same Docker container,
but on CPU job, each job step runs a script in a new Docker container, so
libsox installed in a step is not available to the subsequent steps.

To fix this, this PR moves the installation of libsox and sox to Docker build.
seemethere pushed a commit that referenced this pull request Jun 23, 2020
In #728, linux unit test switches to libsox provided by apt.
For CPU jobs this is fine because all the job steps share the same Docker container,
but on CPU job, each job step runs a script in a new Docker container, so
libsox installed in a step is not available to the subsequent steps.

To fix this, this PR moves the installation of libsox and sox to Docker build.
mthrok added a commit that referenced this pull request Jun 25, 2020
This is a part of PRs to add new "sox_io" backend. #726 and depends on #718 and #728 .

This PR adds `load` function to "sox_io" backend, which is  tested on the following audio formats;
 - `wav`
 - `mp3`
 - `flac`
 - `ogg/vorbis` *

By default, "sox_io" backend returns Tensor with `float32` dtype and the shape of `[channel, time]`. The samples are normalized to fit in the range of `[-1.0, 1.0]`.

Unlike existing "sox" backend, the new `load` function can handle WAV file natively, when the input format is WAV with integer type, (such as 32-bit signed integer, 16-bit signed integer and 8-bit unsigned integer) by providing `normalize=False`, this function can return integer Tensor, where the samples are expressed within the whole range of the corresponding dtype, that is, `int32` tensor for `32-bit PCM`, `int16` for `16-bit PCM` and `uint8` for `8-bit PCM`. This behavior follows [scipy.io.wavfile.read](https://docs.scipy.org/doc/scipy/reference/generated/scipy.io.wavfile.read.html). `normalize` parameter has no effect for other formats and the load function always return normalized value with `float32` Tensor.

__* Note__ The current binary distribution of torchaudio does not contain `ogg/vorbis` and `opus` codecs. To handle these files, one needs to build torchaudio from the source with proper codecs in the system.

__Note 2__ Since this PR, `scipy` becomes required module for running test.
mthrok added a commit that referenced this pull request Jul 1, 2020
This is a part of PRs to add new "sox_io" backend. #726 and depends on #718, #728 and #731.

This PR adds `save` function to "sox_io" backend, which can save Tensor to a file with the following audio formats;
 - `wav`
 - `mp3`
 - `flac`
 - `ogg/vorbis`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants