Train procedure

Creates and starts training a custom model from a given dataset.

Parameters

Configuration

The following table summarizes the configuration properties (access credentials) you must set in order to use this AI task.

  Provider Type   Key
  Alibaba         -
  Amazon          -
  Baidu           -
  Google          Service Account JSON
  IBM             Visual Recognition Key
  Microsoft       Custom Vision Training Key
  SAP             -
  Tencent         -
Tencent -

Sample

Suppose you want to create a model to classify different types of flowers.

First, you must get your tagged data. In this case, we will use Mamaev's Flowers Recognition dataset.

Then, you must provide a GeneXus 'generator' object that satisfies two conditions:
1) It returns a collection of the Data data type.
2) It allows pagination through two input parameters: page number and page size.

In this context, you have two alternatives:

  1. Use a Data Provider
    For example, if you load the dataset into a Transaction object, you can create a Data Provider object that uses the Skip/Count clauses.
    Properties:
      Output = Data
      Collection = True
    
    Rules:
      parm(in:&pageNum, in:&pageSize);
    
    Source:
      Data [COUNT = &pageSize] [SKIP = (&pageNum - 1) * &pageSize]
      {
          Input
          {
             Features
             {
                  Value = TransactionBlobAttribute
             }
          }
          Output
          {
             Label = TransactionStringCategory
          }
      }
    
  2. Use a Procedure
    For example, if your dataset is in a directory and every image file name follows the format '{category}_{index}.png', you can scan the directory with the following Procedure object.
    Rules:
      parm(in:&pageNum, in:&pageSize, out:&DataCollection);
    
    Source:
      &i = 0
      &BTM = (&pageNum - 1) * &pageSize + 1
      &TOP = &pageSize * &pageNum
      &directory.Source = !"{path}/dataset" 
      for &file in &directory.GetFiles()
         &i += 1
         do case
            case &i > &TOP
               exit // upper index in range [&BTM,&TOP]
            case &i < &BTM
               // skip - lower index in range [&BTM,&TOP]
            otherwise
              &mediaFilePath = &file.GetAbsoluteName()
              &mediaCategory = &file.GetName().ReplaceRegEx(!"_\d+\.png$",!"") // e.g. "cat1_0001.png" --> "cat1"
              &data = new()
              &data.Input.Features.Add(&mediaFilePath)
              &data.Output.Label = &mediaCategory
              &dataCollection.Add(&data)
         endCase
      endFor
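
The paging arithmetic and the category extraction used by the Procedure above can be checked outside GeneXus. The following is a minimal Python sketch, not GeneXus code: the function names are hypothetical, and it assumes the file list comes back in the same order on every call, which the page-by-page scan relies on.

```python
import re


def page_bounds(page_num, page_size):
    """Return the 1-based inclusive index range (btm, top) for a page."""
    btm = (page_num - 1) * page_size + 1
    top = page_size * page_num
    return btm, top


def category_of(file_name):
    """Strip the trailing '_{index}.png' to recover the category."""
    return re.sub(r"_\d+\.png$", "", file_name)


def load_page(file_names, page_num, page_size):
    """Return (file name, category) pairs for the requested page only."""
    btm, top = page_bounds(page_num, page_size)
    page = []
    for i, name in enumerate(file_names, start=1):
        if i > top:
            break  # past the upper index of the range
        if i < btm:
            continue  # before the lower index of the range
        page.append((name, category_of(name)))
    return page
```

A page number beyond the end of the dataset yields an empty list, which mirrors the Procedure returning an empty collection.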

Finally, you can define your model and start the training process:

&definition = new()

// define model name
&definition.Name = !"Flowers model"

// define model dataset (link to generator object previously defined)
&definition.Dataset.Loader = link(MyGeneratorObject)

// define model input
&feature = new()
&feature.Name = !"IMAGE"
&feature.Type = DataInputType.Media
&definition.Input.Features.Add(&feature)

// define model output
&definition.Output.Type = DataOutputType.Label

// call train process
&Model = GeneXusAI.Custom.Train(&definition, &provider, &Messages)

Note: Do not forget to include at least 10 samples per class: 8 for training, 1 for testing and 1 for validation.
If your dataset does not have enough samples, it will most probably not satisfy these conditions. You can use the Purpose field of the Data data type to meet them when you code your generator object.
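
As an illustration of the per-class minimum described in the note, the following Python sketch (not GeneXus code) finds classes with fewer than 10 samples and assigns one sample to testing, one to validation and the rest to training. The function names and the purpose strings are hypothetical; the actual values accepted by the Purpose field are not specified here.

```python
from collections import Counter

MIN_PER_CLASS = 10  # 8 training + 1 testing + 1 validation


def classes_below_minimum(labels, minimum=MIN_PER_CLASS):
    """Return the sorted class labels that have fewer than `minimum` samples."""
    counts = Counter(labels)
    return sorted(label for label, n in counts.items() if n < minimum)


def assign_purposes(n_samples):
    """Return one purpose per sample of a single class (hypothetical names)."""
    if n_samples < MIN_PER_CLASS:
        raise ValueError("at least %d samples per class are required" % MIN_PER_CLASS)
    # One sample for testing, one for validation, everything else for training.
    return ["testing", "validation"] + ["training"] * (n_samples - 2)
```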

Requirements

Google provider

  1. If you set the Provider's Key property to the file path of your service-account.json file, that file must be accessible from the web app.

  2. You must install the OpenSSL command-line tool.
    Verify the installation by running the following command in a command-line interface.
    > openssl version
    If you are working on Windows, do not forget to add the directory containing openssl.exe to your PATH environment variable.

  3. You must set the Storage Provider property to 'Google Cloud Storage' and set the associated properties according to your service-account.json file. Ensure your bucket region is 'us-central1'; otherwise, you must create a bucket in that region and recreate your service-account.json file.

Notes

  • This process runs silently (on the provider's server). You can poll the training status by calling the Check procedure.
  • The training time may vary depending on your input dataset.
  • It is essential that your 'generator' object follow these two rules:
    1) It returns a collection of the Data data type.
    2) It allows pagination through two input parameters: page number and page size.
    Also, your 'generator' object must be reachable from your main object because it will be called dynamically by GeneXusAI.
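
The polling mentioned in the first note follows a generic pattern, sketched below in Python (not GeneXus code). The `check_status` callable is a hypothetical stand-in for a call to the Check procedure, whose actual signature and return values are not described here; the interval and attempt limit are arbitrary assumptions.

```python
import time


def wait_until_trained(check_status, interval_seconds=30.0, max_attempts=120,
                       sleep=time.sleep):
    """Call check_status() until it returns True or the attempts run out.

    Returns the attempt number that succeeded, or None on giving up.
    """
    for attempt in range(1, max_attempts + 1):
        if check_status():
            return attempt
        if attempt < max_attempts:
            sleep(interval_seconds)  # wait before polling again
    return None
```

Bounding the number of attempts keeps the caller from waiting forever if training fails or never completes.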

Scope

Platforms  Web (.NET, .NET Core, Java), SmartDevices (Android, iOS)
Connectivity  Online

Availability

This procedure is available as of GeneXus 16 upgrade 6.
