SpeechToText procedure

Converts an audio stream to plain text by transcribing the speech detected in the audio stream.

Parameters

Configuration

The following table summarizes the configuration properties (access credentials) you must set for each provider in order to use this AI task.

Property   Amazon WS  Google Cloud AI   IBM Watson        Microsoft Azure  SAP Leonardo  Tencent AI
Id         -          -                 -                 -                -             语音识别 (Speech Recognition)
Key        Polly      Cloud Speech API  SpeechToText API  Bing Speech API  -             语音识别 (Speech Recognition)
SecretKey  Polly      -                 -                 -                -             -
Username   -          -                 SpeechToText API  -                -             -
Password   -          -                 SpeechToText API  -                -             -

Sample

For the same spoken audio input, the table below shows the transcription returned by each provider and the time it took to process it.

Provider Output Benchmark
Amazon WS "The first question that comes to mind is, What is the nexus you Nexus is a tool that automatically generate software programs such as applications for Windows, the Web and smart devices, which are always at the forefront of technological evolution." 202627ms
Google Cloud AI "The first question that comes to mind is what is Genesis. The Nexus is a tool that automatically generate software program such as applications for Windows." 6986ms
IBM Watson "The first question that comes to mind is. What is your nexus. Next is a tool that automatically generate software programs such as applications for windows the web and smart devices which are always at the forefront technological evolution." 8682ms
Microsoft Azure  "The first question that comes to mind is what is genexus." 3412ms
SAP Leonardo N/A N/A
Tencent AI "GXAI5000 - External provider raise an error. Reason '[-2147483635] system busy, please try again later'" N/A

Notes

  • Only short spoken audio is supported (up to 15 seconds).
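
The 15-second limit can be checked before calling the task. The following is a minimal sketch for wav input only, using Python's standard wave module; the function names and the MAX_SECONDS constant are hypothetical and are not part of GeneXusAI.

```python
import wave

MAX_SECONDS = 15  # limit stated in the note above


def wav_duration_seconds(source):
    """Return the duration of a wav file (path or binary file object) in seconds."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_short_enough(source):
    """True when the audio does not exceed the 15-second limit."""
    return wav_duration_seconds(source) <= MAX_SECONDS
```

Other formats would need a different way of reading the duration, which this sketch does not cover.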

  • Some providers only support short utterances (e.g. Microsoft). For this reason, the text output can be "incomplete" with respect to the audio input. The transcription is made up to the first "silence mark", which is used for identifying a voice command.

  • Input audio format depends on the provider type.
    - Amazon WS supports mp3, mp4, wav and flac.
    - IBM Watson supports mp3, mp4, wav, ogg, flac and webm (GeneXus 16 Upgrade 0 only supports mp3)
    - Microsoft Azure supports wav only.
    - Google Cloud AI supports mp3, wav and ogg.
    - Tencent AI supports wav only.
    Use this site to verify your audio has an appropriate mime-type.
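
The format list above can be expressed as a lookup table. The sketch below assumes the file extension reflects the actual audio format, which is only an approximation of a real mime-type check; the names SUPPORTED_FORMATS, is_supported and providers_for are hypothetical.

```python
from pathlib import Path

# Formats per provider, copied from the list above.
SUPPORTED_FORMATS = {
    "Amazon WS": {"mp3", "mp4", "wav", "flac"},
    "IBM Watson": {"mp3", "mp4", "wav", "ogg", "flac", "webm"},
    "Microsoft Azure": {"wav"},
    "Google Cloud AI": {"mp3", "wav", "ogg"},
    "Tencent AI": {"wav"},
}


def is_supported(provider, filename):
    """True if the file extension appears in the provider's format list."""
    extension = Path(filename).suffix.lower().lstrip(".")
    return extension in SUPPORTED_FORMATS.get(provider, set())


def providers_for(filename):
    """Return the providers (sorted by name) that accept this file's extension."""
    return sorted(p for p in SUPPORTED_FORMATS if is_supported(p, filename))
```

For instance, providers_for("clip.ogg") lists only the two providers whose entries above include ogg.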

  • Audio format conversion can be done by using an external tool.
    GeneXusAI can perform the conversion automatically, as an experimental feature, if you follow these steps:
    1) Download the ffmpeg tool for your server's operating system (e.g. Linux or Windows).
    2) Attach the binary file to your knowledge base as a File object.
    3) Set the Extract for {gen} Generator property to True, where '{gen}' is your working generator (Java, .NET or .NET Core).
    4) Set the {gen} Generator Extract Directory property to the "Resources" value.
    5) Ensure that the extracted binary file has execution permission where the web application is running.
    6) Give an &audio input to this task without worrying about its format.
    Note that enabling this feature can degrade performance.

    IMPORTANT: As an experimental feature, no support is provided and it is subject to breaking changes without notice. Use it at your own risk.
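
If you prefer to convert the audio yourself with an external tool, the conversion could look roughly like the sketch below. It is an illustration only and does not describe how GeneXusAI invokes ffmpeg internally. The function names and the "Resources/ffmpeg" path are assumptions; -y (overwrite output) and -i (input file) are standard ffmpeg options, and ffmpeg infers the output format from the target extension.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(ffmpeg_path, source, target_format="wav"):
    """Build the ffmpeg argument list that converts source to target_format."""
    source = Path(source)
    target = source.with_suffix("." + target_format)
    if target == source:
        raise ValueError("source is already in the target format")
    # -y overwrites an existing output file; -i names the input file.
    return [str(ffmpeg_path), "-y", "-i", str(source), str(target)]


def convert(ffmpeg_path, source, target_format="wav"):
    """Run ffmpeg and return the path of the converted file."""
    command = build_ffmpeg_command(ffmpeg_path, source, target_format)
    # check=True raises CalledProcessError if ffmpeg exits with an error.
    subprocess.run(command, check=True, capture_output=True)
    return Path(command[-1])
```

A call such as convert("Resources/ffmpeg", "speech.mp3") would write speech.wav next to the input, provided the binary exists at that (assumed) location and is executable.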

  • Considerations when using Amazon as a provider:
    1) The benchmark time is higher than for other providers because the processing is asynchronous: the status is polled until the result is available.
    2) The audio file must be uploaded to Amazon S3. If you have set the Storage Provider property with your Amazon credentials, any audio is automatically stored in your S3 bucket to be processed. Otherwise, you must provide a URL that follows one of these patterns:
    + http://{bucket}.s3.amazonaws.com/{path/to/filename.ext}
    + http://{bucket}.s3-{region}.amazonaws.com/{path/to/filename.ext}
    + http://s3.amazonaws.com/{bucket}/{path/to/filename.ext}
    + http://s3-{region}.amazonaws.com/{bucket}/{path/to/filename.ext}
    The {region} must match the region of your access credentials (it can be omitted only when your region is 'us-east-1').
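
The four URL patterns and the region rule can be checked before calling the task. This is a minimal sketch that follows the list literally (http scheme only); the regular expressions, the bucket-name character set and the function names are assumptions made for illustration, not GeneXusAI's own validation.

```python
import re

# Path-style: http://s3[-{region}].amazonaws.com/{bucket}/{key}
PATH_STYLE = re.compile(
    r"http://s3(?:-(?P<region>[a-z0-9-]+))?\.amazonaws\.com/"
    r"(?P<bucket>[^/]+)/(?P<key>.+)"
)
# Virtual-hosted style: http://{bucket}.s3[-{region}].amazonaws.com/{key}
VIRTUAL_HOSTED = re.compile(
    r"http://(?P<bucket>[a-z0-9][a-z0-9.-]*)\.s3(?:-(?P<region>[a-z0-9-]+))?"
    r"\.amazonaws\.com/(?P<key>.+)"
)


def parse_s3_url(url):
    """Return bucket, region and key for a matching URL, or None."""
    match = PATH_STYLE.fullmatch(url) or VIRTUAL_HOSTED.fullmatch(url)
    if match is None:
        return None
    parts = match.groupdict()
    # An omitted region is only valid for 'us-east-1', per the note above.
    parts["region"] = parts["region"] or "us-east-1"
    return parts


def matches_region(url, credentials_region):
    """True if the URL is well-formed and its region equals the credentials' region."""
    parts = parse_s3_url(url)
    return parts is not None and parts["region"] == credentials_region
```

With this sketch, a URL without a region is treated as 'us-east-1', so it is rejected for credentials in any other region.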

  • The Tencent provider only supports audio spoken in Chinese. For this reason, it raises a GXAI5000 error when the audio is provided in another language. For example, if you provide this audio sample (in Chinese), you will get the following result:
    "提出的第一个问题是什么仅三十一个自动生成软件程序的工具例如应用程序可智能设备始终处于技术发展的最前沿".

Scope

Platforms  Web (.NET, .NET Core, Java), Smart Devices (Android, iOS)
Connectivity  Online

Availability

This procedure is available as of GeneXus 16.

See also
