Voice synthesis on ISR

Page 33/36

By gdx

Enlighted (5022)

08-12-2021, 11:30

Nice improvement! I can't wait to see applications of this, especially in enhanced Konami games.

By Grauw

Ascended (10317)

08-12-2021, 14:08

ARTRAG wrote:

Since the first harmonic in the wave sample should be the one with the highest energy, optimising the rotation to minimise the MSE between successive wave samples is probably, in the DFT domain, almost equivalent to choosing the phase of the first harmonic of the previous sample as the linear offset for the second.
This relates to your method, but I cannot fully predict the effect of using the first bin of the previous sample as reference, compared to using the first bin of the current sample itself.

I think using the previous frame as reference will amount to the same thing, since for all the consecutive frames the reference will be the same: the phase of the very first frame is taken and carried over.

The reason why I like using -½π as the reference phase is that 1. it looks good in waveform scopes (like in Realfun 3), and 2. if a wave plays from the beginning, it will curve upwards from zero rather than starting at a top. In my imagination it could give the sound a smoother attack without (subtle?) tick, although I don’t know whether that’s actually a thing.

And indeed, for waves that are more a noisy approximation than a clear tonal waveform, where the fundamental doesn’t really play a significant role, maybe minimising the MSE will work better. For those my current algorithm fails in the pitch detection, before phase adjustment comes into play, so I couldn’t test anything like that.
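
A minimal sketch of the first-bin alignment discussed in this post, assuming a single-cycle wave stored as a list of samples and an integer circular rotation (no fractional shift). The function names are hypothetical and this is not taken from either encoder. In the cosine convention used by the DFT, a sine that rises from zero has a first-bin phase of -½π, which is why that value is the default reference here.

```python
import cmath
import math

def first_bin_phase(wave):
    """Phase of DFT bin 1 (the fundamental) of a single-cycle wave."""
    n = len(wave)
    bin1 = sum(x * cmath.exp(-2j * math.pi * k / n) for k, x in enumerate(wave))
    return cmath.phase(bin1)

def align_to_reference(wave, ref_phase=-math.pi / 2):
    """Rotate the wave so its fundamental lands as close as possible to ref_phase."""
    n = len(wave)
    # Advancing the wave by one sample raises the first-bin phase by 2*pi/n.
    shift = round((ref_phase - first_bin_phase(wave)) * n / (2 * math.pi)) % n
    return wave[shift:] + wave[:shift]

# Example: a 32-sample wave whose fundamental is offset by 5 samples.
wave = [math.sin(2 * math.pi * (k + 5) / 32) + 0.3 * math.sin(4 * math.pi * (k + 5) / 32)
        for k in range(32)]
aligned = align_to_reference(wave)
```

Because the rotation is restricted to whole samples, the sample values themselves are unchanged; only their order is.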

By ARTRAG

Enlighted (6701)

08-12-2021, 23:14

I've added phase optimisation to the encoder, using both the MSE between frames and the phase of the first bin. The results are practically identical, although the MSE variant seems slightly better.
What I had to change is the player: the previous version reset the sample phase at each period write, which cancelled the phase optimisation. The phase must not be reset.

I'm compiling a new version right now, to be released soon.
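
The MSE-based variant mentioned in this post can be sketched as a brute-force search over all circular rotations of the current frame. This is only an illustration under the assumption that both frames are equal-length lists of samples; the function name is hypothetical and the released encoder may do this differently.

```python
def best_rotation(prev, cur):
    """Return (rotated_cur, shift) minimising the MSE against the previous frame."""
    n = len(cur)

    def mse(shift):
        # Mean squared error between prev and cur rotated left by `shift` samples.
        return sum((prev[k] - cur[(k + shift) % n]) ** 2 for k in range(n)) / n

    shift = min(range(n), key=mse)
    return cur[shift:] + cur[:shift], shift
```

For 32-sample SCC waves this is only 32 candidate rotations per frame, so the exhaustive search is cheap.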

By ARTRAG

Enlighted (6701)

09-12-2021, 00:22

Here it is
https://github.com/artrag/voicenc_scc/releases

Let me know how it goes
Look into the .m file for parameters
Now you need -N to use NTSC timings (earlier it was -50)

By ARTRAG

Enlighted (6701)

09-12-2021, 09:15

@Grauw
If you want to go deeper into pitch detection, look at the free MATLAB toolbox sap-voicebox-master, available here:

http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html

Look for the function v_fxpefac(); it has a bibliography in its comments.

By Huey

Prophet (2681)

09-12-2021, 10:29

Good work guys!
Going to look into this this Christmas!

By ARTRAG

Enlighted (6701)

09-12-2021, 11:57

@GostwriteP,@Huey
NB: the SCC player needs to NOT reset the sample phase, or the phase optimisation will be lost.
I had to change this detail in my player (after banging my head quite a few times on why the optimisation wasn't working ;-).
Take care of this detail in the TT player when you test the new data.

By Grauw

Ascended (10317)

09-12-2021, 13:18

Or if you really want to reset the phase at the start of playback, only enable it (test register bit 5) the first time the frequency is set in the series.
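
A rough sketch of that write order, modelled in Python rather than Z80 code. It assumes, as stated above, that bit 5 of the test register makes a frequency write reset the phase. The register names are symbolic placeholders, not real SCC addresses, and the function is hypothetical.

```python
PHASE_RESET_ON_FREQ_WRITE = 1 << 5  # test register bit 5, as described in the post

def frequency_writes(periods):
    """Return the (register, value) writes for a series of 12-bit period updates."""
    writes = []
    for index, period in enumerate(periods):
        first = index == 0
        if first:
            # Enable phase reset only for the first frequency write of the series.
            writes.append(("TEST", PHASE_RESET_ON_FREQ_WRITE))
        writes.append(("FREQ_LO", period & 0xFF))
        writes.append(("FREQ_HI", (period >> 8) & 0x0F))
        if first:
            # Disable it again so later period writes keep the running phase.
            writes.append(("TEST", 0))
    return writes
```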

By Grauw

Ascended (10317)

09-12-2021, 13:15

ARTRAG wrote:

If you want to go deeper into pitch detection, look at the free MATLAB toolbox sap-voicebox-master from here. Look for the function v_fxpefac(); it has a bibliography in its comments.

Thanks for the pointer, I’ll check it out. So far I’ve read some papers on time-domain methods: I’ve read the McLeod paper and I will read the YIN one next. My current implementation uses a library with several algorithms; the YIN one has worked best for me so far, but it is still not perfect. And none of them apply any post-processing step that considers the previous and next frames when no pitch is found, so I want to try to make some improvements in that area.
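
One possible form of such a post-processing step, sketched under the assumption that the detector returns one pitch per frame and `None` where it finds nothing: short gaps are filled by linear interpolation between the neighbouring frames. The function name and the `max_gap` parameter are hypothetical.

```python
def fill_pitch_gaps(pitches, max_gap=3):
    """Fill runs of None no longer than max_gap by interpolating between neighbours."""
    filled = list(pitches)
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] is not None:
            i += 1
            continue
        start = i
        while i < n and filled[i] is None:
            i += 1
        gap = i - start
        # Only fill gaps that have a detected pitch on both sides.
        if start > 0 and i < n and gap <= max_gap:
            left, right = filled[start - 1], filled[i]
            for k in range(gap):
                filled[start + k] = left + (right - left) * (k + 1) / (gap + 1)
    return filled
```

Longer gaps are left alone, on the assumption that they are more likely to be genuinely unvoiced or silent.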

But I’m also interested in solutions using windowed spectrum analysis; if there are peaks at frequencies which are related by integer multiples (a common GCD), it seems like those would be good candidates. This relies on the fact that most sounds have harmonics, which can be used to improve the accuracy of the detection.
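
A naive sketch of the common-divisor idea, assuming the spectral peaks have already been picked and are given as a list of frequencies in Hz. It tries the lowest peak divided by 1, 2, 3, ... as the fundamental and accepts the first candidate for which every peak sits near an integer multiple. The function name, `max_divisor`, and `tolerance` are all hypothetical choices for illustration.

```python
def estimate_fundamental(peaks, max_divisor=8, tolerance=0.03):
    """Return the largest f0 for which all peaks are near-integer multiples, or None."""
    lowest = min(peaks)
    for divisor in range(1, max_divisor + 1):
        f0 = lowest / divisor
        # Accept f0 if every peak's harmonic number is within `tolerance` of an integer.
        if all(abs(p / f0 - round(p / f0)) <= tolerance for p in peaks):
            return f0
    return None
```

For example, peaks near 440, 660 and 880 Hz are not multiples of 440 Hz but are all multiples of 220 Hz, so the lowest peak is treated as the second harmonic.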

By ARTRAG

Enlighted (6701)

10-12-2021, 10:39

I'm using the functions in voicebox. There are two implementations for pitch tracking:

v_fxpefac(), based on the PEFAC algorithm from:
[1] S. Gonzalez and M. Brookes, "PEFAC - a pitch estimation algorithm robust to high levels of noise", IEEE Trans. Audio, Speech, Language Processing, 22 (2): 518-530, Feb. 2014. doi: 10.1109/TASLP.2013.2295918.
[2] S. Gonzalez and M. Brookes, "A pitch estimation filter robust to high levels of noise (PEFAC)", Proc. EUSIPCO, Aug. 2011.

and v_fxrapt(), based on:
[1] D. Talkin, "A Robust Algorithm for Pitch Tracking (RAPT)", in "Speech Coding & Synthesis", W. B. Kleijn and K. K. Paliwal, eds., Elsevier, ISBN 0444821694, 1995.

The former has an implementation with a simpler interface to my parameters, so I've used that one and it works fine. The source for the encoder is here:
https://github.com/artrag/voicenc_scc/blob/master/tt_voicenc...
