#39771 - 30/11/2001 01:37
Re: Navigation project startup
[Re: tonyc]
|
member
Registered: 21/07/1999
Posts: 140
Loc: Helsinki, Finland
|
I was just asking if Kim was using Flite or something else. If he had gotten Flite to output to /dev/audio, he would have solved one of my current problems and I would have begged to see what he did. :)
As Rob said, I'm using PCM samples created offline with AT&T Labs Natural Voices, which is the best TTS I've ever heard and which can be freely used.
For the navigation system, it works fairly well. I mostly need numbers anyhow, and I can easily represent any decimal or floating-point number between 0 and 999 in high quality with a single 400KB WAV file. For other speech prompts I need to generate them manually one by one.
Kim
|
#39772 - 30/11/2001 05:21
Re: Navigation project startup
[Re: tonyc]
|
carpal tunnel
Registered: 13/07/2000
Posts: 4180
Loc: Cambridge, England
|
...the 4608 byte restriction on /dev/audio. Flite writes 256 bytes at a time.
dd obs=4608 ?
Peter
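For anyone who would rather do the reblocking inside the program than pipe through dd, here is a minimal sketch of the same idea: small writes (such as Flite's 256-byte ones) are collected until a full 4608-byte block is available. The names `struct reblock`, `reblock_write` and `reblock_flush` are hypothetical, and padding the last partial block with silence is an assumption of this sketch, not something dd does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

#define AUDIO_BLOCK 4608  /* block size /dev/audio wants, per the thread */

/* Hypothetical helper: collects small writes into full blocks. */
struct reblock {
    int fd;                          /* destination, e.g. /dev/audio */
    size_t used;                     /* bytes currently buffered */
    unsigned char buf[AUDIO_BLOCK];
};

static void reblock_init(struct reblock *rb, int fd)
{
    assert(rb != NULL);
    rb->fd = fd;
    rb->used = 0;
}

/* Write n bytes, retrying after short writes. Returns 0, or -1 on error. */
static int write_all(int fd, const unsigned char *p, size_t n)
{
    while (n > 0) {
        ssize_t w = write(fd, p, n);
        if (w <= 0)
            return -1;
        p += w;
        n -= (size_t)w;
    }
    return 0;
}

/* Append len bytes; each time the buffer fills, emit one full block. */
static int reblock_write(struct reblock *rb, const void *data, size_t len)
{
    const unsigned char *p = data;
    while (len > 0) {
        size_t room = AUDIO_BLOCK - rb->used;
        size_t n = len < room ? len : room;
        memcpy(rb->buf + rb->used, p, n);
        rb->used += n;
        p += n;
        len -= n;
        if (rb->used == AUDIO_BLOCK) {
            if (write_all(rb->fd, rb->buf, AUDIO_BLOCK) < 0)
                return -1;
            rb->used = 0;
        }
    }
    return 0;
}

/* Assumption of this sketch: pad the last partial block with zeros (silence). */
static int reblock_flush(struct reblock *rb)
{
    if (rb->used == 0)
        return 0;
    memset(rb->buf + rb->used, 0, AUDIO_BLOCK - rb->used);
    rb->used = 0;
    return write_all(rb->fd, rb->buf, AUDIO_BLOCK);
}
```

The synthesizer's output would go through `reblock_write()`, with one `reblock_flush()` at the end of each utterance.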
|
#39773 - 30/11/2001 08:05
Re: Navigation project startup
[Re: kim]
|
carpal tunnel
Registered: 21/05/1999
Posts: 5335
Loc: Cambridge UK
|
As Rob said, I'm using PCM samples created offline with AT&T Labs Natural Voices, which is the best TTS I've ever heard and which can be freely used.
It can be used freely nn times a day from their web site, but a single user desktop licence (with which the PCM output cannot be distributed) costs $49. More useful commercial licences are somewhat more expensive, but still excellent value for money considering the quality.
Rob
|
#39774 - 30/11/2001 09:18
Re: Navigation project startup
[Re: peter]
|
carpal tunnel
Registered: 27/06/1999
Posts: 7058
Loc: Pittsburgh, PA
|
Yeah... even when I do
dd if=file.wav of=/dev/audio obs=4608
I don't get any sound at all. So I'm thinking it has something to do with the Sound Overlay kernel which is currently installed on my Empeg. I'm going to try to back that out and see if I still get silent output. I just remembered this morning that I had installed that kernel patch. Not sure if that's the problem.
Assuming the WAV file isn't silence, I should get sound output when I run that dd command, right? There aren't any ioctls I have to call to change the volume or select the PCM source, are there?
|
#39775 - 30/11/2001 11:41
Re: Navigation project startup
[Re: tonyc]
|
member
Registered: 21/07/1999
Posts: 140
Loc: Helsinki, Finland
|
So I'm thinking it has something to do with the Sound Overlay kernel which is currently
If you open /dev/audio without the O_SYNC flag, there is no difference in how the audio is output.
Assuming the WAV file isn't silence, I should get sound output when I run that dd command, right? There aren't any ioctls I have to call to change the volume or select the PCM source, are there?
There are several. And if you try this while the player application is not running, the soft audio mute is likely enabled.
After you've opened /dev/audio, try this:
/* Needs <fcntl.h>, <unistd.h>, <sys/ioctl.h> and <linux/soundcard.h>. */
int iMixer = open( "/dev/mixer", O_RDONLY );
int iSource = SOUND_MASK_PCM;
int iFlags = 0; // source not muted
int iSAM = 0; // SAM is off
int iVolume = 100 | (100 << 8); // left channel in the low byte, right in the next
ioctl( iMixer, _IOW( 'm', 0, int ), &iSource ); // set source
ioctl( iMixer, _IOW( 'm', 1, int ), &iFlags ); // set flags
ioctl( iMixer, _IOW( 'm', 15, int ), &iSAM ); // set Soft Audio Mute
ioctl( iMixer, MIXER_WRITE( SOUND_MIXER_VOLUME ), &iVolume ); // set master volume
close( iMixer );
Kim
|
#39776 - 30/11/2001 11:50
Re: Navigation project startup
[Re: kim]
|
carpal tunnel
Registered: 27/06/1999
Posts: 7058
Loc: Pittsburgh, PA
|
Aaaaaaah I think it is that soft audio mute... I was confusing that with cell-phone mute which I don't have tied to anything... I didn't know that SAM was enabled when the player exits.
That's the problem. Thanks muchly.
What the hell is soft audio mute supposed to do anyway?
|
#39777 - 30/11/2001 14:34
Re: Navigation project startup
[Re: kim]
|
enthusiast
Registered: 11/11/2000
Posts: 202
Loc: Boston, MA
|
OK, so you know I would be posting on this given recent events.
L&H's website also has a free converter for text-to-speech with a female voice. IMHO, the AT&T Labs version sounds much better but at least you have another resource. I can't wait to get my hands on some of this technology to play with.
Greg
|
#39778 - 30/11/2001 15:12
Re: Navigation project startup
[Re: kim]
|
carpal tunnel
Registered: 27/06/1999
Posts: 7058
Loc: Pittsburgh, PA
|
Okay Kim, you seem to have a handle on this sound stuff... explain this to me. I can now hear output, but I can't set any of the parameters the way the pcmplay example does... So 8 kHz wave files come out sounding like a record playing super fast. In order to get pcmplay to work, I have to comment out all of the mixer ioctls which set the format, frequency, etc... Otherwise I get something like this:
ioctl(SNDCTL_DSP_SETFMT): Invalid argument
Here's the section I had to comment out to get it to play anything:
format = AFMT_S16_LE;
if (ioctl(fd, SNDCTL_DSP_SETFMT, &format) == -1) {
    perror("ioctl(SNDCTL_DSP_SETFMT)");
    return -1;
}
if (format != AFMT_S16_LE) {
    fprintf(stderr, "AFMT_S16_LE not available\n");
    return -1;
}
stereo = 1;
if (ioctl(fd, SNDCTL_DSP_STEREO, &stereo) == -1) {
    perror("ioctl(SNDCTL_DSP_STEREO)");
    return -1;
}
if (!stereo) {
    fprintf(stderr, "stereo selection failed\n");
    return -1;
}
speed = 44100;
if (ioctl(fd, SNDCTL_DSP_SPEED, &speed) == -1) {
    perror("ioctl(SNDCTL_DSP_SPEED)");
    return -1;
}
if (speed != 44100) {
    fprintf(stderr, "sample speed 44100 not available (closest %u)\n", speed);
    return -1;
}
Any idea what could be happening here?
|
#39779 - 30/11/2001 17:00
Re: Navigation project startup
[Re: tonyc]
|
member
Registered: 21/07/1999
Posts: 140
Loc: Helsinki, Finland
|
Okay Kim, you seem to have a handle on this sound stuff... explain this to me. I can now hear output, but I can't set any of the parameters the way the pcmplay example does...
The pcmplay example is obsolete...
So 8 kHz wave files come out sounding like a record playing super fast. In order to get pcmplay to work, I have to comment out all of the mixer ioctls which set the format, frequency, etc...
The audio input is locked at 44.1 kHz, 16-bit stereo, little-endian signed, with a buffer size of 4608 bytes (the size of one MPEG frame). If the program only outputs PCM at 8 kHz, you need to convert it to 44.1 kHz yourself.
Kim
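As a worked check of where 4608 comes from: an MPEG-1 Layer III frame decodes to 1152 samples per channel, and 1152 x 2 channels x 2 bytes is 4608, which is presumably what "size of one MPEG frame" refers to. A small sketch of that arithmetic, assuming the 44.1 kHz rate used later in the thread (the function names are made up for illustration):

```c
#include <assert.h>

enum {
    SAMPLE_RATE      = 44100, /* Hz, assumed from later posts in the thread */
    CHANNELS         = 2,     /* stereo */
    BYTES_PER_SAMPLE = 2,     /* 16-bit */
    BLOCK_BYTES      = 4608   /* buffer size given in the post above */
};

/* Sample frames (one left plus one right sample) held by one block. */
static int frames_per_block(void)
{
    return BLOCK_BYTES / (CHANNELS * BYTES_PER_SAMPLE);
}

/* Playback time of one block in microseconds, rounded down. */
static long block_duration_us(void)
{
    return 1000000L * frames_per_block() / SAMPLE_RATE;
}
```

So each block written to /dev/audio carries 1152 sample frames, or roughly 26 ms of sound.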
|
#39780 - 30/11/2001 17:09
Re: Navigation project startup
[Re: kim]
|
carpal tunnel
Registered: 27/06/1999
Posts: 7058
Loc: Pittsburgh, PA
|
The audio input is locked at 44.1 kHz, 16-bit stereo, little-endian signed, with a buffer size of 4608 bytes (the size of one MPEG frame).
Doh! Is that new or has it always been that way? The pcmplay example would have one believe you can select mono/stereo, 11/22/44 KHz, etc... I knew the size of the buffer was locked...
This is a bit disappointing to my plans to get text-to-speech on the player. Sigh.
|
#39781 - 30/11/2001 17:22
Re: Navigation project startup
[Re: tonyc]
|
carpal tunnel
Registered: 27/06/1999
Posts: 7058
Loc: Pittsburgh, PA
|
So, continuing the thoughts in my last post...
If it's indeed impossible to change sample rates, etc... would it be possible to make some kernel modifications to enable playing formats other than 16-bit 44.1 kHz stereo? I mean, this is REALLY limiting for a product whose primary purpose is as an audio player! I know this locking was chosen to keep the visuals in sync with the audio, but does that mean there is no way to keep the visuals in sync and still accept other formats?
|
#39782 - 30/11/2001 17:53
Re: Navigation project startup
[Re: tonyc]
|
carpal tunnel
Registered: 20/12/1999
Posts: 31597
Loc: Seattle, WA
|
That sounds more like a DSP limitation than a kernel limitation to me. Mind you, I know nothing about this stuff, it's just an observation off the top of my head. If that's true, then any translation/resampling work would have to be done in software.
|
#39783 - 30/11/2001 20:58
Re: Navigation project startup
[Re: tfabris]
|
carpal tunnel
Registered: 27/06/1999
Posts: 7058
Loc: Pittsburgh, PA
|
Yeah, it appears there's no way to play anything other than 44.1 kHz stereo. Sigh. So I have to figure out a way to do some conversion. I have zero experience in this. Anyone have any idea how one would "upsample" an 8000 or 11000 Hz mono wave to 44.1 kHz stereo? After about half an hour of looking I couldn't find any way in the flite code to output at 44.1 kHz or in stereo. So I'd have to take its output and upsample it before it gets written... I just have no clue how this would be done. This is bound to be very inefficient. CRAP. I am so angry about this limitation. Even though I know why it's there, I wish there was a second audio device that didn't have such stringent requirements.
The comments in empeg_audio3.c state "wishlist: sample rate adjustment with antialiasing filters." If this would allow us to write non-standard sample rate files to the audio device, then I hope this wish comes true. Hugo, are you listening? Or maybe there are some other smart people out there who could handle something like this, since the likelihood of this becoming an official feature is rather slim. Sigh.
|
#39784 - 01/12/2001 03:26
Re: Navigation project startup
[Re: tonyc]
|
carpal tunnel
Registered: 21/05/1999
Posts: 5335
Loc: Cambridge UK
|
That's the rate supported by the hardware. If it isn't convenient you can resample in software - it's not such a great programming challenge really (i.e. I'm sure someone on here knows how to do it - we had to!).
Rob
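For readers wondering what resampling in software can look like, below is a minimal sketch using plain linear interpolation between neighbouring input samples. It is not the empeg player's method (which is not described in this thread); the function name `resample_linear` is made up, and the sketch applies no filtering, so expect lower quality than a proper resampler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Resample mono 16-bit PCM from in_rate to out_rate by linear interpolation.
 * Writes at most out_cap samples to out and returns how many were written.
 */
static size_t resample_linear(const int16_t *in, size_t in_len, unsigned in_rate,
                              int16_t *out, size_t out_cap, unsigned out_rate)
{
    assert(in_rate > 0 && out_rate > 0);
    size_t n = 0;
    while (n < out_cap) {
        /* Position of output sample n on the input time axis,
           split into a whole index and a fraction in units of 1/out_rate. */
        uint64_t pos = (uint64_t)n * in_rate;
        size_t idx = (size_t)(pos / out_rate);
        int64_t frac = (int64_t)(pos % out_rate);
        if (idx >= in_len)
            break;
        int32_t a = in[idx];
        /* Past the last input sample there is no neighbour, so hold the value. */
        int32_t b = (idx + 1 < in_len) ? in[idx + 1] : a;
        out[n++] = (int16_t)(a + (int64_t)(b - a) * frac / (int64_t)out_rate);
    }
    return n;
}
```

With 8000 Hz input and 44100 Hz output, 80 input samples (10 ms) yield 441 output samples. The mono result would still need to be duplicated into left and right channels before being written to /dev/audio.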
|
#39785 - 01/12/2001 03:28
Re: Navigation project startup
[Re: tonyc]
|
carpal tunnel
Registered: 21/05/1999
Posts: 5335
Loc: Cambridge UK
|
|
#39786 - 01/12/2001 11:21
Re: Navigation project startup
[Re: rob]
|
carpal tunnel
Registered: 27/06/1999
Posts: 7058
Loc: Pittsburgh, PA
|
Thanks for the link, Rob. Now, does anyone have any pointers on how to convert from mono to stereo? I want to do sample rate conversion and mono->stereo conversion in real-time so I'll have to graft this resampling code in, but I also need something to switch it to stereo.. any ideas?
|
#39787 - 01/12/2001 11:57
Re: Navigation project startup
[Re: tonyc]
|
enthusiast
Registered: 24/08/2001
Posts: 344
Loc: France, Champagne
|
A question for Smu:
Will you consider localizing the audio prompts, or will they be English only?
_________________________
Empeg IIa - 10 Gb - Red Fascia -
Tuner, the day is coming
- I Will Strike From the Grey -
|
#39788 - 01/12/2001 12:24
Re: Navigation project startup
[Re: tonyc]
|
member
Registered: 21/07/1999
Posts: 140
Loc: Helsinki, Finland
|
Now, does anyone have any pointers on how to convert from mono to stereo?
Erm... Just output the same sample twice.
Each sample is one 16-bit integer for one channel. For stereo, just output the same 16-bit integer twice in a row (for the left and right channels).
Sample rate conversion from 11 kHz to 22 kHz or 44 kHz is very easy: since the rates are integer multiples, you only need to output the same sample two or four times. 8 kHz is more difficult, as 44.1 kHz is not an integer multiple of it, so you'd need to filter it upwards, which is more expensive.
For instance, to convert 11 kHz mono sound to 44 kHz stereo sound, you just output the same sample 8 times (x 4 for sample rate conversion, x 2 for stereo).
Kim
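The repeat-each-sample approach described above can be sketched as follows, assuming "11 kHz" means 11025 Hz so that the factor to 44100 Hz is exactly 4. The function name `mono11k_to_stereo44k` is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Convert 11.025 kHz mono to 44.1 kHz interleaved stereo by repetition.
 * out must have room for in_len * 8 samples. Returns samples written.
 */
static size_t mono11k_to_stereo44k(const int16_t *in, size_t in_len, int16_t *out)
{
    assert(in != NULL && out != NULL);
    size_t o = 0;
    for (size_t i = 0; i < in_len; i++) {
        for (int rep = 0; rep < 4; rep++) {  /* x4 for the sample rate */
            out[o++] = in[i];                /* left channel  */
            out[o++] = in[i];                /* right channel */
        }
    }
    return o;
}
```

Since every input sample becomes 8 output samples (16 bytes), one 4608-byte block for /dev/audio corresponds to 288 input samples.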
|
#39789 - 01/12/2001 12:51
Re: Navigation project startup
[Re: kim]
|
carpal tunnel
Registered: 27/06/1999
Posts: 7058
Loc: Pittsburgh, PA
|
Thanks for the excellent information, Kim. I figured stereo would be something like that, but I wasn't sure if it was a block of left samples followed by a block of right samples, or whether channels alternated on each sample. Thanks for the clarification.
The Flite software only comes with an 8 kHz voice right now. Future versions might have 11 kHz voices. I am going to try to mix in some of the upsampling code that Rob pointed me to and then try to hack in the mono-to-stereo conversion on my own. After all of that work, I can only pray that it performs fast enough to allow for somewhat real-time text-to-speech. Hey, I've been wanting to dig into a more low-level project on the Empeg, so this isn't totally a bad thing. Should be fun.
|
#39790 - 01/12/2001 13:48
Re: Navigation project startup
[Re: tonyc]
|
carpal tunnel
Registered: 12/11/2001
Posts: 7738
Loc: Toronto, CANADA
|
Don't forget to include a hidden option for taunts and other insults. That way you can trigger them from the remote without having them visible on the screen. Passengers will get a kick out of it. :)
Bruno
|
#39791 - 01/12/2001 17:38
Re: Navigation project startup
[Re: Nosferatu]
|
old hand
Registered: 30/07/2000
Posts: 879
Loc: Germany (Ruhrgebiet)
|
Hi.
I don't intend to do the sound output stuff myself (yet), so this will depend on co-developers. Anyhow, my nav project is advancing _really_ slowly at the moment; my diploma thesis is top priority for me.
However, I would like the software to have at least English and German speech output. No street names though, unless there is realtime TTS software that is completely free (BSD-style license or something like that).
cu,
sven
_________________________
proud owner of MkII 40GB & MkIIa 60GB both lit by God and HiJacked by Lord
|
#39792 - 02/12/2001 08:16
Re: Navigation project startup
[Re: smu]
|
addict
Registered: 22/07/1999
Posts: 453
Loc: Florida
|
Sven,
It might be a good idea to incorporate some IVR (Interactive Voice Response) tactics in the design of your Nav. Mainly, allowing the user to record his/her own prompts. This works for the majority of your grammar (North, South, 10 Miles, etc), and allows for dynamic data like street names. We've been doing this in the IVR industry for decades. If you want, email me off-list and I can give you some ideas.
Jason
_________________________
_~= Dearing =~_ Gettin' back into it thanks to slimrio!
|
#39793 - 02/12/2001 15:11
Re: Navigation project startup
[Re: hybrid8]
|
member
Registered: 19/12/1999
Posts: 117
|
Along with that little red button under the gear shift knob?
|
#39794 - 02/12/2001 17:24
Re: Navigation project startup
[Re: Dearing]
|
old hand
Registered: 30/07/2000
Posts: 879
Loc: Germany (Ruhrgebiet)
|
Hi Jason.
My plans go in that direction, but as of now, I am too far away from that stage of development to actually think about the deeper program structures. But I promise to keep all that in mind.
cu,
sven
_________________________
proud owner of MkII 40GB & MkIIa 60GB both lit by God and HiJacked by Lord
|
#39795 - 13/12/2001 20:47
Re: Navigation project startup
[Re: kim]
|
carpal tunnel
Registered: 29/08/2000
Posts: 14493
Loc: Canada
|
The existing Linux/unix "sox" program is excellent for performing rate/channel/format conversions to/from just about any sound file format.
"Sound eXchange : universal sound sample translator"
|
#39796 - 14/12/2001 08:20
Re: Navigation project startup
[Re: mlord]
|
carpal tunnel
Registered: 27/06/1999
Posts: 7058
Loc: Pittsburgh, PA
|
Oh.... maybe I'll steal some of the code from there instead of rolling my own. It looks like SoX is made for converting *files*, whereas I really want to convert the samples as they're generated by the text-to-speech program. I guess I can find the relevant pieces of code and hammer them into the TTS software's source somewhere.
Certainly not as elementary as a /dev/audio that accepts other sample rates, but certainly doable.
|
#39797 - 14/12/2001 10:34
Re: Navigation project startup
[Re: tonyc]
|
carpal tunnel
Registered: 29/08/2000
Posts: 14493
Loc: Canada
|
Sox will work in a pipe as well, but the buffering may or may not match your needs.
|
#39798 - 14/12/2001 13:21
Re: Navigation project startup
[Re: mlord]
|
carpal tunnel
Registered: 27/06/1999
Posts: 7058
Loc: Pittsburgh, PA
|
Ah... I will toy with it this weekend probably, then.
|
#39799 - 02/02/2002 20:39
Re: Navigation project startup
[Re: kim]
|
enthusiast
Registered: 14/09/2000
Posts: 363
|
Kinda thinking aloud here:
44100 / 8000 is almost 5.5 (5.5125, to be exact).
8000 * 5.5 = 44000
Close enough that the playback speed would be off by only 0.23%. Probably not enough to make a difference for TTS.
Offhand, I don't know how to do the half in there. If you read in pairs of samples and output 11 at a time, how do you do it?
A: input = (-14, 120) output = (-14, -14, -14, -14, -14, -14, 120, 120, 120, 120, 120)
B: input = (-14, 120) output = (-14, -14, -14, -14, -14, 53, 120, 120, 120, 120, 120)
C: none of the above... something better.
At least A and B would be pretty easy to program and not very CPU intensive.
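Options A and B can be written down directly; the sketch below takes one pair of 8 kHz samples and produces the 11 output samples listed above. The function names are made up for illustration, and the output is still mono, so each value would have to be written twice for stereo.

```c
#include <assert.h>
#include <stdint.h>

/* Option A: six copies of the first sample, then five of the second. */
static void expand_pair_hold(int16_t s0, int16_t s1, int16_t out[11])
{
    assert(out != 0);
    for (int i = 0; i < 6; i++)
        out[i] = s0;
    for (int i = 6; i < 11; i++)
        out[i] = s1;
}

/* Option B: five copies of each sample with their average in the middle. */
static void expand_pair_mid(int16_t s0, int16_t s1, int16_t out[11])
{
    assert(out != 0);
    for (int i = 0; i < 5; i++)
        out[i] = s0;
    out[5] = (int16_t)(((int32_t)s0 + (int32_t)s1) / 2);
    for (int i = 6; i < 11; i++)
        out[i] = s1;
}
```

Both variants turn 2 input samples into 11, so 8000 samples per second become 44000, which is the 0.23% speed error worked out above.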
|
#39800 - 06/02/2002 19:02
Re: Navigation project startup
[Re: smu]
|
Pooh-Bah
Registered: 09/09/1999
Posts: 1721
Loc: San Jose, CA
|
Smu,
Is the navigation project going well? If you guys would like, I can contribute to the project in the way of maps. I have several contacts over at NavTech. While a deal directly with NavTech is not likely to be struck, there are backchannels and other methods for obtaining the license cheaply.
If there is already a source for the maps, then I'll let this slide.
Let me know.
Calvin
|