KEMBAR78
Remove duplicate --bits parameter while invoking sox by imaginary-person · Pull Request #1287 · pytorch/audio · GitHub
Skip to content

Conversation

@imaginary-person
Copy link
Contributor

--bits is being provided twice by mistake & sox is unable to handle it well & produces a warning.
For example,

sox -V3 --no-dither -R --bits 16 --rate 8000 --null --channels 1 --bits 16 --encoding gsm /tmp/tmp08b_xoao/torchaudio_unittest.backend.sox_io.info_test.TestInfo.test_gsm/data.gsm synth 1 sawtooth 1
sox:      SoX v14.4.2

Input File     : '' (null)
Channels       : 1
Sample Rate    : 8000
Precision      : 16-bit
Endian Type    : little
Reverse Nibbles: no
Reverse Bits   : no

sox WARN formats: gsm can't encode GSM to 16-bit

Get latest changes from main repo
Remove duplicate --bits command line parameter. It's producing a warning with sox.
@vincentqb
Copy link
Contributor

Good catch, thanks!

Interestingly, removing the first --bits but not the second still raises the warning.

❯ sox -V3 --no-dither -R --bits 16 --rate 8000 --null --channels 1 --encoding gsm example.gsm synth 1 sawtooth 1       
sox:      SoX v

Input File     : '' (null)
Channels       : 1
Sample Rate    : 8000
Precision      : 16-bit
Endian Type    : little
Reverse Nibbles: no
Reverse Bits   : no

sox INFO sox: Overwriting `example.gsm'

Output File    : 'example.gsm'
Channels       : 1
Sample Rate    : 8000
Precision      : 16-bit
Sample Encoding: GSM
Comment        : 'Processed by SoX'

sox INFO sox: effects chain: input         8000Hz  1 channels
sox INFO sox: effects chain: synth         8000Hz  1 channels
sox INFO sox: effects chain: output        8000Hz  1 channels
❯ sox -V3 --no-dither -R --rate 8000 --null --channels 1 --bits 16 --encoding gsm example.gsm synth 1 sawtooth 1          

sox:      SoX v

Input File     : '' (null)
Channels       : 1
Sample Rate    : 8000
Precision      : 32-bit

sox INFO sox: Overwriting `example.gsm'
sox WARN formats: gsm can't encode GSM to 16-bit

Output File    : 'example.gsm'
Channels       : 1
Sample Rate    : 8000
Precision      : 16-bit
Sample Encoding: GSM
Comment        : 'Processed by SoX'

sox INFO sox: effects chain: input         8000Hz  1 channels
sox INFO sox: effects chain: synth         8000Hz  1 channels
sox INFO sox: effects chain: output        8000Hz  1 channels

Also, the documentation highlights that --bits is not applicable to codec like GSM and MP3.

@vincentqb vincentqb merged commit 9395030 into pytorch:master Feb 19, 2021
@imaginary-person
Copy link
Contributor Author

imaginary-person commented Feb 19, 2021

Also, the documentation highlights that --bits is not applicable to codec like GSM and MP3.

Thank you! Even with that change, for .gsm files, libsox is still encoding the length field in the file header as 0. I'd look closely at its source-code to figure out the reason.

Interestingly, removing the first --bits but not the second still raises the warning.

I noticed that too, but I haven't looked at the sox source code yet to ascertain the reason.

@mthrok
Copy link
Contributor

mthrok commented Feb 19, 2021

Actually, it is valid to have two --bits parameters, because the first one is for input and the second one is for output.
The question here is either if it is meaningful to give --bits to input. Because the input is synthetic data generated within sox command, so the precision is simply determined by the internal buffer type that sox uses, which is 32-bit signed integer.

The second --bits parameter is for output, which will change the bit depth from internal type to the specified output type. So if we are to remove one of them, then we should be retaining the second one.

@mthrok
Copy link
Contributor

mthrok commented Feb 19, 2021

@vincentqb Can you revert the merge?

vincentqb added a commit to vincentqb/audio that referenced this pull request Feb 19, 2021
@imaginary-person
Copy link
Contributor Author

Uh, oh! Sorry for the inconvenience!

@imaginary-person
Copy link
Contributor Author

imaginary-person commented Feb 19, 2021

The second --bits parameter is for output, which will change the bit depth from internal type to the specified output type. So if we are to remove one of them, then we should be retaining the second one.

@mthrok, if the second one is removed, then the first one applies to both the input & the output.

sox -V3 --no-dither -R --bits 16  --rate 8000 --null --channels 1  data.flac  synth 1 sawtooth 1
sox:      SoX v14.4.2

Input File     : '' (null)
Channels       : 1
Sample Rate    : 8000
Precision      : 16-bit
Endian Type    : little
Reverse Nibbles: no
Reverse Bits   : no

sox INFO sox: Overwriting `data.flac'
sox INFO flac: encoding at 16 bits per sample

Output File    : 'data.flac'
Channels       : 1
Sample Rate    : 8000
Precision      : 16-bit
Sample Encoding: 16-bit FLAC
Endian Type    : little
Reverse Nibbles: no
Reverse Bits   : no
Comment        : 'Processed by SoX'

sox INFO sox: effects chain: input         8000Hz  1 channels
sox INFO sox: effects chain: synth         8000Hz  1 channels
sox INFO sox: effects chain: output        8000Hz  1 channels

If even the first one is removed (default bit-depth),

sox -V3 --no-dither -R   --rate 8000 --null --channels 1  data.flac  synth 1 sawtooth 1
sox:      SoX v14.4.2

Input File     : '' (null)
Channels       : 1
Sample Rate    : 8000
Precision      : 32-bit

sox INFO sox: Overwriting `data.flac'
sox INFO flac: encoding at 24 bits per sample

Output File    : 'data.flac'
Channels       : 1
Sample Rate    : 8000
Precision      : 24-bit
Sample Encoding: 24-bit FLAC
Endian Type    : little
Reverse Nibbles: no
Reverse Bits   : no
Comment        : 'Processed by SoX'

sox INFO sox: effects chain: input         8000Hz  1 channels
sox INFO sox: effects chain: synth         8000Hz  1 channels
sox INFO sox: effects chain: output        8000Hz  1 channels

@mthrok
Copy link
Contributor

mthrok commented Feb 19, 2021

@mthrok, if the second one is removed, then the first one applies to both the input & the output.

Notice that the message is INFO, whereas the original one was WARN.

As you can see, the input (generated data) has 32-bit precision, but when you convert it to flac, flac can retain up to 24 bits so the precision is lowered. So sox is telling you that it is going to user lower precision. But this operation is valid as that's the specification of flac format.

However, when you were providing --bits 16 to GSM output, you were doing invalid operation, so sox was warning you that it ignores the option you provided. In #1277, where you found the warning, you were explicitly providing bit_depth=16, where it should have been bit_depth=None.

@mthrok
Copy link
Contributor

mthrok commented Feb 19, 2021

@imaginary-person
With that, I am fine with removing the first --bits option, because the implementation of the helper function looks a bit strange, even though the result should be identical. (If removing the first --bits causes existing tests to fail, that can mean these tests are wrong. I started writing all the test fixture without deep understanding of audio formats / libsox internal mechanism, so I am happy to see any of them fail and fix them.)

@imaginary-person
Copy link
Contributor Author

imaginary-person commented Feb 19, 2021

However, when you were providing --bits 16 to GSM output, you were doing invalid operation

Yes, I realized that when Vincent provided a reference in the documentation.

You're right that the second --bits option should be retained, as it only affects the output.
I just meant that the --bits option only at the first position is affecting both the input & the output,
so it's causing efficiency issues but not correctness issues.
I wasn't referring to the INFO message.

mthrok pushed a commit that referenced this pull request Feb 20, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants