Voice synthesis on ISR

Page 35/36

By ARTRAG

Enlighted (6845)

13-12-2021, 13:14

Sure, look at the link https://github.com/artrag/voicenc_scc
The released files are slightly older than the files in the repository, but the optimisations only affect the speed of the search; the results should be the same.

By ARTRAG

Enlighted (6845)

13-12-2021, 16:40

I'm still struggling with how to "emulate" the SCC output in the Matlab script.
This means that the actual data sounds far better on a real MSX than in the Matlab encoder with the option "p" (playback).

What I've added is a window that shows the phase correction of the samples as an animation, with the option "w" (waves).
You can see the original wave in the upper third of the window, the wave rotated according to the chosen optimisation method in the middle, and the previous rotated wave in the lower part of the window.

By GhostwriterP

Hero (663)

13-12-2021, 21:34

ARTRAG wrote:

Much cleaner now! Great amazing work!

It is a great improvement on the encoder, so credit goes mainly to you ;)

ARTRAG wrote:

Yea! Are you ready ? ;-) You are applying effects to sampled speech too.
The "Awesome" sample is modulated in real time.

Well, the features on samples have been slimmed down a lot. In the previous version it was possible to put some gain on a sample, or to modulate another wave using the sample as modulator, etc. But that is just not very practical in the re-player (as that one will not do any on-the-fly calculations). So currently it is only possible to play back the sample with or without using the pitch information, and to slow down the samples a bit using the 8 timers.
Playback ignoring the pitch data is used for the "yeah" samples in the last part. This gives a bit of a robot voice / autotune effect.

ARTRAG wrote:

Can you also use samples as if they were instruments?

Yes, samples can be used as instruments. If the pitch data is used, the pitch is recalculated based on the deviation from the C-4 note. So if the sampled pitch is a C, you can use this sample as an instrument that stays in tune, at least within a certain range (not at the extremes).
But for instruments it is also possible to ignore the pitch information in the sample and just write the pitch directly. This way the instrument is in perfect pitch. The only thing to keep in mind is that you need to add any vibrato manually again, because that information will all be skipped. I specifically intended the latter option for instruments; it is also faster, as there is no need to recalculate the pitch every time!
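To make the two modes concrete, here is a minimal Python sketch, not the actual re-player code. It assumes the commonly documented SCC relation f = clock / (32 × (period + 1)) with a 3579545 Hz clock, and it assumes "C-4" means roughly 261.63 Hz; the tracker's octave naming may differ. The function names are made up for illustration.

```python
SCC_CLOCK_HZ = 3579545  # assumed SCC input clock
C4_HZ = 261.63          # assumed frequency of the "C-4" reference note


def scc_period(freq_hz):
    """Convert a frequency to a 12-bit SCC period value (assumed formula)."""
    period = round(SCC_CLOCK_HZ / (32 * freq_hz)) - 1
    return max(0, min(0xFFF, period))


def repitched_frequency(sample_pitch_hz, note_hz, reference_hz=C4_HZ):
    """Mode 1: keep the pitch contour stored in the sample, scaled by how far
    the played note deviates from the reference note."""
    return sample_pitch_hz * (note_hz / reference_hz)


def fixed_pitch_period(note_hz):
    """Mode 2: ignore the sample's pitch data and write the note directly."""
    return scc_period(note_hz)
```

In the first mode every frame needs a recalculation, which is why the second mode is cheaper; in the second mode the period is constant for the whole note, so any vibrato has to be added back by hand.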

The new encoder picks up many more types of samples, and at much better quality than before, kudos! Especially with the -o1 option. With -o2 I feel the results are not that good (the waveforms seem to "shiver" back and forth a lot), but I have not looked into this in any detail yet. Hopefully I will have some time next week to explore this further.

PS: OK, I quickly threw together something using only samples Snap. Not my best work but just to get the idea ;)

By Manuel

Ascended (18794)

13-12-2021, 21:42

Next thing we know, there's a MOD player for SCC :P

By ARTRAG

Enlighted (6845)

13-12-2021, 22:24

Quote:

PS: OK, I quickly threw together something using only samples Snap. Not my best work but just to get the idea ;)

It is very good !!! :) :) :)
Do you need any specific feature I could try to add to the encoder in order to allow you to use samples as instruments?
I am a total noob with music, but if you explain to me what you need I can try to implement it.

By Grauw

Ascended (10583)

13-12-2021, 23:16

GhostwriterP wrote:

If the pitch data is used, the pitch is recalculated based on the deviation from the C-4 note.

Is it worth making the reference note / frequency configurable? Then if the sample is not C-4 one could enter the actual note rather than transposing the notes to compensate.

This could also be done in the sample generation though. @ARTRAG Maybe this is an interesting command line option, to allow a frequency offset to be specified (in cents).
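As a rough illustration of what such an offset would do, assuming it is given in cents (1200 cents per octave), the conversion in both directions could look like this. The function names are hypothetical and do not refer to any existing encoder option.

```python
import math


def apply_cents(freq_hz, cents):
    """Shift a frequency by an offset in cents (100 cents = one semitone)."""
    return freq_hz * 2.0 ** (cents / 1200.0)


def cents_between(freq_hz, reference_hz):
    """Offset in cents needed to move reference_hz onto freq_hz."""
    return 1200.0 * math.log2(freq_hz / reference_hz)
```

So a sample recorded at some pitch other than C-4 could either be shifted by `cents_between(C4, recorded)` at encoding time, or the player could take the recorded note as its reference instead.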

GhostwriterP wrote:

But for instruments it is also possible to ignore the pitch information in the sample and just write the pitch directly.

That’s nice, it will also remove the small ticks due to repeated changes to the frequency, giving a cleaner sound. Even if there is a check that frequency is only set when it changes, the automatically determined frequency will have small instabilities. This gives the user manual control over that.
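A small sketch of the point about instabilities, using made-up period values: even with a "write only when changed" check, a detected pitch that jitters by one step still causes a register write on almost every frame, while a directly written pitch needs a single write.

```python
def count_register_writes(periods):
    """Count the frames that would actually write the period register when
    writes are skipped for unchanged values."""
    writes = 0
    last = None
    for period in periods:
        if period != last:  # only write when the value changes
            writes += 1
            last = period
    return writes


jittery = [253, 254, 253, 253, 254]  # hypothetical auto-detected periods
steady = [253] * 5                   # pitch written directly by the user
```

Here `count_register_writes(jittery)` gives 4 and `count_register_writes(steady)` gives 1.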

GhostwriterP wrote:

PS: OK, I quickly threw together something using only samples Snap. Not my best work but just to get the idea ;)

Cool!

By gdx

Enlighted (5514)

14-12-2021, 01:50

I would like to try the new trial version of Realfun 3 (the one in the video).

By ARTRAG

Enlighted (6845)

14-12-2021, 08:27

Just wondering about improving the unvoiced segments of the speech.
Now I encode them like the voiced frames, only the pitch comes from the maximum of their spectrum.

Could it be worth using simple sampling at 60×32 Hz or 50×32 Hz?
It would result in a useful bandwidth of about 800-900 Hz.
Not a lot, but maybe better than treating them as periodic with the period taken from the frequency of their spectral maximum.
I can add an option to the encoder for that...
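For reference, the arithmetic behind that estimate, as a sketch that assumes one full 32-byte waveform is uploaded per video interrupt and played through exactly once per frame:

```python
WAVE_LEN = 32  # samples in one SCC waveform


def raw_sampling(frame_rate_hz, wave_len=WAVE_LEN):
    """Return (sample rate, Nyquist bandwidth) in Hz for one waveform
    per video frame."""
    sample_rate = frame_rate_hz * wave_len
    return sample_rate, sample_rate / 2
```

This gives 1920 Hz sampling (960 Hz bandwidth) at 60 Hz and 1600 Hz sampling (800 Hz bandwidth) at 50 Hz, in line with the rough figure above.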

By Grauw

Ascended (10583)

14-12-2021, 11:32

About unvoiced, uff, I couldn’t say, I haven’t experimented with it enough yet.

My initial thought would be to look at the top n spectrum maxima, and find the frequency at which the SCC can represent the most maxima with the waveform. That is, the frequency where the power of the maximum plus the power of its harmonic frequencies is the highest. But I'm not sure to what degree this can work for unvoiced.
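A very rough sketch of that idea in Python, not tested on real speech data. It assumes the spectral peaks have already been extracted as (frequency, power) pairs, that a 32-sample waveform can usefully hold about 15 harmonics, and that a peak counts as "on" a harmonic when it is within 3% of it; all three are assumptions for illustration.

```python
def best_fundamental(peaks, candidates, n_harmonics=15, tolerance=0.03):
    """Pick the candidate fundamental whose harmonics capture the most
    peak power.

    peaks      -- list of (freq_hz, power) spectral maxima
    candidates -- fundamentals to try; on ties the earliest one wins, so
                  list higher frequencies first to avoid subharmonics
    """
    def score(f0):
        total = 0.0
        for freq, power in peaks:
            k = round(freq / f0)  # nearest harmonic number
            if 1 <= k <= n_harmonics and abs(freq - k * f0) <= tolerance * freq:
                total += power
        return total

    return max(candidates, key=score)


peaks = [(200, 1.0), (400, 0.5), (600, 0.25), (310, 0.3)]  # made-up data
print(best_fundamental(peaks, [310, 200, 150]))
```

With this made-up data, 200 Hz wins because three of the four peaks fall on its harmonics. For truly noise-like frames the peaks may not line up on any harmonic series, which is the doubt raised above.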

Your suggestion could also be good…

@gdx I suggest you look and ask in this thread. The threads intermingle a bit on this topic, but this one is primarily about the sampling technique.

By ARTRAG

Enlighted (6845)

14-12-2021, 11:49

Perhaps someone will have a better idea after looking at the real data that now has to be represented using SCC features.

https://github.com/artrag/voicenc_scc/blob/master/unvoiced%2...

The unvoiced tracts are those where the lower black line (voiced probability) goes below 50%.
There are two kinds of examples in this picture:
- The one at about 2.75 s, where the spectrum is concentrated at 5 kHz
- The two at about 2.4 s and 3.9 s, where the spectrum is low-pass and has low energy
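For anyone who wants to pick those tracts out of the data, a minimal sketch of the 50% rule, assuming the voiced probability is available as one value per frame (the function name is made up):

```python
def unvoiced_segments(voiced_prob, threshold=0.5):
    """Return (start, end) frame ranges, end exclusive, where the voiced
    probability stays below the threshold."""
    segments = []
    start = None
    for i, p in enumerate(voiced_prob):
        if p < threshold and start is None:
            start = i                    # an unvoiced run begins
        elif p >= threshold and start is not None:
            segments.append((start, i))  # the run ended on the previous frame
            start = None
    if start is not None:                # run continues to the last frame
        segments.append((start, len(voiced_prob)))
    return segments
```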

Proposals ?
