SpeechToText procedure

Converts an audio stream to plain text by transcribing the speech detected in the audio stream.

Parameters

Configuration

The following table summarizes the configuration properties (access credentials) you must set in order to use this AI task.

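| Property | Amazon WS | Baidu AI | Google Cloud AI | IBM Watson | Microsoft Azure | SAP Leonardo | Tencent AI |
|---|---|---|---|---|---|---|---|
| Id | - | 百度语音 (Baidu Speech) | - | - | - | - | 语音识别 (Speech Recognition) |
| Key | Polly | 百度语音 (Baidu Speech) | Cloud Speech API | SpeechToText API | Speech API or Bing Speech API (backward compatible) | - | 语音识别 (Speech Recognition) |
| SecretKey | Polly | 百度语音 (Baidu Speech) | - | - | - | - | - |
| Username | - | - | - | SpeechToText API | - | - | - |
| Password | - | - | - | SpeechToText API | - | - | - |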

Sample

For a sample spoken audio input, the table below shows the transcription returned by each provider and the time taken to process it.

| Provider | Output | Benchmark (processing time) |
|---|---|---|
| Amazon WS | "The first question that comes to mind is, What is the nexus you Nexus is a tool that automatically generate software programs such as applications for Windows, the Web and smart devices, which are always at the forefront of technological evolution." | 202627 ms |
| Baidu AI | GXAI4101 - Parameter '&Locale' is malformed. Expected values: [Chinese (Simplified, Mainland), Mandarin (Simplified, Mainland), Cantonese (Traditional, Hong Kong)] | N/A |
| Google Cloud AI | "The first question that comes to mind is what is Genesis. The Nexus is a tool that automatically generate software program such as applications for Windows." | 6986 ms |
| IBM Watson | "The first question that comes to mind is. What is your nexus. Next is a tool that automatically generate software programs such as applications for windows the web and smart devices which are always at the forefront technological evolution." | 8682 ms |
| Microsoft Azure | "The first question that comes to mind is what is genexus." | 3412 ms |
| SAP Leonardo | N/A | N/A |
| Tencent AI | GXAI5000 - External provider raise an error. Reason '[-2147483635] system busy, please try again later' | N/A |

Considerations

Short transcriptions

Only short audio (up to 15 seconds) and short utterances are transcribed. As a consequence of the second condition, the text output can be "incomplete" with respect to the audio input, because the transcription stops at the first "silence mark" (as Microsoft Azure does, for example). The aim is to identify a voice command.
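
As a rough illustration of the 15-second limit, the following Python sketch checks the duration of a WAV file before it is sent for transcription. It is not part of GeneXusAI: the function names and the use of Python's standard `wave` module are assumptions made for this example, and it only handles uncompressed WAV input.

```python
import wave

MAX_SECONDS = 15.0  # limit stated in this section


def wav_duration_seconds(path):
    """Return the duration of a WAV file in seconds (hypothetical helper)."""
    with wave.open(path, "rb") as audio:
        # Duration = number of frames / frames per second.
        return audio.getnframes() / float(audio.getframerate())


def is_short_enough(path, limit=MAX_SECONDS):
    """True when the audio fits within the short-transcription limit."""
    return wav_duration_seconds(path) <= limit
```

A check like this only covers the length condition; it cannot tell whether the audio contains a pause that would end the transcription early.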

Chinese providers

The Chinese providers (Baidu AI and Tencent AI) only support audio spoken in Chinese. If the audio is in another (unknown) language, a GXAI5000 error is raised.

For example, for a sample audio spoken in Chinese, you get the results detailed in the table below.

| Provider | Output | Benchmark (processing time) |
|---|---|---|
| Baidu AI | "提出的第一个问题是一个自动生成软件程序的工具,例如应用程序,智能设备始终处于技术发展的最前沿。" | 101354 ms |
| Tencent AI | "提出的第一个问题是什么仅三十一个自动生成软件程序的工具例如应用程序可智能设备始终处于技术发展的最前沿" | 98457 ms |

Amazon provider

The audio file must be uploaded to Amazon S3. If you have set the Storage Provider property with your Amazon credentials, any audio is automatically stored in your S3 bucket to be processed. Otherwise, you must provide a URL that matches one of the following patterns:
+ http://{bucket}.s3.amazonaws.com/{path/to/filename.ext}
+ http://{bucket}.s3-{region}.amazonaws.com/{path/to/filename.ext}
+ http://s3.amazonaws.com/{bucket}/{path/to/filename.ext}
+ http://s3-{region}.amazonaws.com/{bucket}/{path/to/filename.ext}
The {region} must match the region of your access credentials; it can be omitted only when your region is 'us-east-1'.
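
The URL patterns above can be checked before calling the task. The sketch below shows one possible way to do it in Python. The `parse_s3_url` helper, its regular expressions and the bucket and region character rules they imply are assumptions made for illustration, not the validation GeneXusAI performs.

```python
import re

# Virtual-hosted style: http://{bucket}.s3[-{region}].amazonaws.com/{key}
_VIRTUAL_HOSTED = re.compile(
    r"^http://(?P<bucket>[^/.]+(?:\.[^/.]+)*)\.s3(?:-(?P<region>[a-z0-9-]+))?"
    r"\.amazonaws\.com/(?P<key>.+)$"
)
# Path style: http://s3[-{region}].amazonaws.com/{bucket}/{key}
_PATH_STYLE = re.compile(
    r"^http://s3(?:-(?P<region>[a-z0-9-]+))?\.amazonaws\.com/"
    r"(?P<bucket>[^/]+)/(?P<key>.+)$"
)


def parse_s3_url(url, credentials_region="us-east-1"):
    """Return (bucket, region, key), or None if the URL is not acceptable.

    Hypothetical helper: a URL without a region is treated as 'us-east-1',
    and the region must equal the region of the access credentials.
    """
    for pattern in (_PATH_STYLE, _VIRTUAL_HOSTED):
        match = pattern.match(url)
        if match:
            break
    else:
        return None  # none of the four documented patterns matched
    region = match.group("region") or "us-east-1"
    if region != credentials_region:
        return None  # region mismatch with the credentials
    return match.group("bucket"), region, match.group("key")
```

For instance, `parse_s3_url("http://s3.amazonaws.com/my-bucket/a.wav", "eu-west-1")` returns `None`, because a URL without a region is only valid for 'us-east-1' credentials.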

Notes

  • Input audio format depends on the provider type.
    - Amazon WS supports mp3, mp4, wav and flac.
    - Baidu AI supports pcm, wav, and amr.
    - IBM Watson supports mp3, mp4, wav, ogg, flac and webm (GeneXus 16 Upgrade 0 only supports mp3).
    - Microsoft Azure supports wav only.
    - Google Cloud AI supports mp3, wav and ogg.
    - Tencent AI supports wav only.
    Use this site to verify your audio has an appropriate mime-type.

  • Audio format conversion can be done by using an external tool.
    GeneXusAI integrates this conversion automatically, as an experimental feature, if you follow these steps:
    1) Download the ffmpeg tool for your server's operating system (e.g. Linux or Windows).
    2) Attach the binary file to your knowledge base as a File object.
    3) Set the Extract for {gen} Generator property to True, where '{gen}' is your working generator (Java, .NET or .NET Core).
    4) Set the {gen} Generator Extract Directory property to "Resources".
    5) Ensure that the extracted binary file has execution permission where the web application is running.
    6) Pass an &audio input to this task without worrying about its format.
    Keep in mind that enabling this feature can degrade performance.

    IMPORTANT: As an experimental feature, no support is provided and it is subject to breaking changes without notice. Use it at your own risk.

  • Microsoft's Bing Speech API will be deprecated and its credentials must be updated to Speech API before October 2019.
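
The format list in the first note can be turned into a simple pre-check, and the external conversion mentioned in the second note can be done with an ffmpeg command line. The Python sketch below is only an illustration: the helper names are hypothetical, it assumes the file extension reflects the real audio format (which is not guaranteed), and it only builds the ffmpeg command (`ffmpeg -y -i input output`) without running it. SAP Leonardo is left out because the note does not list its formats.

```python
from pathlib import Path

# Formats per provider, copied from the note above.
# (GeneXus 16 Upgrade 0 only supports mp3 for IBM Watson.)
SUPPORTED_FORMATS = {
    "Amazon WS": {"mp3", "mp4", "wav", "flac"},
    "Baidu AI": {"pcm", "wav", "amr"},
    "IBM Watson": {"mp3", "mp4", "wav", "ogg", "flac", "webm"},
    "Microsoft Azure": {"wav"},
    "Google Cloud AI": {"mp3", "wav", "ogg"},
    "Tencent AI": {"wav"},
}


def is_supported(provider, filename):
    """True when the file extension is in the provider's format list."""
    extension = Path(filename).suffix.lower().lstrip(".")
    return extension in SUPPORTED_FORMATS[provider]


def ffmpeg_convert_command(source, target_extension="wav"):
    """Build (not run) an ffmpeg command that converts `source`.

    ffmpeg infers the output format from the target file extension;
    -y overwrites an existing output file.
    """
    target = str(Path(source).with_suffix("." + target_extension))
    return ["ffmpeg", "-y", "-i", source, target]
```

The returned list can be passed to `subprocess.run` on a machine where ffmpeg is installed. wav is used as the default target because it is the only format that appears in most of the provider lists above (Google Cloud AI, IBM Watson, Amazon WS, Baidu AI, Microsoft Azure and Tencent AI all accept it).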

Scope

Platforms     Web (.NET, .NET Core, Java), Smart Devices (Android, iOS)
Connectivity  Online

Availability

This procedure is available as of GeneXus 16.
