SpeechToText procedure


Converts an audio stream to plain text by transcribing the speech detected in the audio stream.



The following table summarizes the configuration properties that you must set before calling this AI task.

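| Property | Google Cloud AI       | IBM Watson            | Microsoft Azure      | SAP Leonardo |
|----------|-----------------------|-----------------------|----------------------|--------------|
| Key      | From Cloud Speech API | From SpeechToText API | From Bing Speech API | -            |
| Username | -                     | From SpeechToText API | -                    | -            |
| Password | -                     | From SpeechToText API | -                    | -            |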
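
As an illustration of the table above, the following minimal sketch checks that a configuration holds every property listed for a provider before the task is called. It assumes that each property listed for a provider is required at the same time, which the table implies but does not state outright. The names REQUIRED_PROPERTIES and missing_properties are hypothetical and are not GeneXusAI identifiers.

```python
# Properties per provider, transcribed from the configuration table.
# Assumption: every property listed for a provider is required together.
REQUIRED_PROPERTIES = {
    "Google Cloud AI": ("Key",),
    "IBM Watson": ("Key", "Username", "Password"),
    "Microsoft Azure": ("Key",),
    "SAP Leonardo": (),  # the table lists no properties for this provider
}


def missing_properties(provider, config):
    """Return the listed properties that are absent or empty in config."""
    if provider not in REQUIRED_PROPERTIES:
        raise ValueError(f"Unknown provider: {provider}")
    return [name for name in REQUIRED_PROPERTIES[provider] if not config.get(name)]


# Example: an IBM Watson configuration that only carries a key.
print(missing_properties("IBM Watson", {"Key": "abc"}))
```

For the IBM Watson example, the call reports Username and Password as missing.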


For a sample spoken audio input, the table below shows the transcription produced by each provider and the time taken to process it.

| Provider        | Output | Benchmark |
|-----------------|--------|-----------|
| Google Cloud AI | "The first question that comes to mind is what is Genesis. The Nexus is a tool that automatically generate software program such as applications for Windows." | 6986 ms |
| IBM Watson      | "The first question that comes to mind is. What is your nexus. Next is a tool that automatically generate software programs such as applications for windows the web and smart devices which are always at the forefront technological evolution." | 8682 ms |
| Microsoft Azure | "The first question that comes to mind is what is genexus." | 3412 ms |
| SAP Leonardo    | N/A    | N/A       |
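
The source does not say how these timings were measured. As a rough sketch only, a wall-clock measurement in milliseconds around a transcription call could look like the following. Both timed_ms and fake_transcribe are hypothetical names; fake_transcribe is a placeholder standing in for a real provider call.

```python
import time


def timed_ms(func, *args, **kwargs):
    """Call func and return (result, elapsed wall-clock time in milliseconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed_ms = round((time.perf_counter() - start) * 1000)
    return result, elapsed_ms


def fake_transcribe(audio_bytes):
    # Placeholder for a provider call; sleeps to simulate processing time.
    time.sleep(0.05)
    return "hello"


text, elapsed = timed_ms(fake_transcribe, b"")
print(text, f"{elapsed} ms")
```

A measurement like this includes network latency, so results vary between runs and environments.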


  • Only short spoken audio is supported (up to 15 seconds).

  • Some providers (e.g. Microsoft) only support short utterances. As a result, the text output can be incomplete with respect to the audio input: the transcription stops at the first "silence mark", which is used to delimit a voice command.

  • Input audio format depends on the provider type.
    - IBM Watson supports mp3, mp4, wav, ogg, flac and webm (GeneXus 16 Upgrade 0 only supports mp3)
    - Microsoft Azure supports wav only.
    - Google Cloud AI supports mp3, wav and ogg.
    Use this site to verify that your audio has an appropriate MIME type.

  • Audio format conversion can be done with an external tool.
    GeneXusAI can also perform the conversion automatically, as an experimental feature, if you follow these steps:
    1) Download the ffmpeg tool for your server's operating system (i.e. Linux or Windows).
    2) Attach the binary file to your knowledge base as a File object.
    3) Set the Extract for {gen} Generator property to True, where '{gen}' is your working generator (Java, .NET or .NET Core).
    4) Set the {gen} Generator Extract Directory property to "Resources".
    5) Ensure that the extracted binary file has execution permission where the web application is running.
    6) Pass an &audio input to this task without worrying about its format.
    Keep in mind that enabling this feature can degrade performance.

    IMPORTANT: As an experimental feature, no support is provided and it is subject to breaking changes without notice. Use it at your own risk.
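
For the external-tool route mentioned above, the following sketch checks a file's extension against the formats listed for each provider and, when needed, builds an ffmpeg command that converts it to wav (the one format all three listed providers accept). This is not how GeneXusAI performs its built-in conversion, which the source does not describe. The function names are hypothetical, and the sketch assumes an ffmpeg binary is reachable on the PATH.

```python
import os
import shutil
import subprocess

# Formats per provider, copied from the notes above (lowercase extensions).
SUPPORTED_FORMATS = {
    "IBM Watson": {"mp3", "mp4", "wav", "ogg", "flac", "webm"},
    "Microsoft Azure": {"wav"},
    "Google Cloud AI": {"mp3", "wav", "ogg"},
}


def needs_conversion(provider, audio_path):
    """Return True when the file extension is not listed for the provider."""
    ext = os.path.splitext(audio_path)[1].lstrip(".").lower()
    return ext not in SUPPORTED_FORMATS[provider]


def ffmpeg_command(audio_path, target_ext="wav", ffmpeg_bin="ffmpeg"):
    """Build the ffmpeg argument list and the resulting output path."""
    output_path = f"{os.path.splitext(audio_path)[0]}.{target_ext}"
    # -y overwrites an existing output file; -i names the input file.
    return [ffmpeg_bin, "-y", "-i", audio_path, output_path], output_path


def convert_if_needed(provider, audio_path):
    """Return a path to audio in a format listed for the provider."""
    if not needs_conversion(provider, audio_path):
        return audio_path
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    command, output_path = ffmpeg_command(audio_path)
    subprocess.run(command, check=True)  # raises if ffmpeg exits with an error
    return output_path
```

The extension check is only a heuristic: a file's extension does not guarantee its actual encoding, so verifying the MIME type as noted above remains worthwhile.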


Platforms     Web (.NET, .NET Core, Java), Smart Devices (Android, iOS)
Connectivity  Online


This procedure is available as of GeneXus 16.
