Creates and starts training a custom model from a given dataset.
The following table summarizes the configuration properties (access credentials) you must set in order to use this AI task.
ProviderType | Key |
Alibaba | - |
Amazon | - |
Baidu | - |
Google | Service Account JSON |
IBM | Visual Recognition Key |
Microsoft | Custom Vision Training Key |
SAP | - |
Tencent | - |
Suppose you want to create a model to classify different types of flowers.
First, you must get your tagged data. In this case, we will use Mamaev's Flowers Recognition dataset.
Then, you must provide a GeneXus 'generator' object that satisfies two conditions:
1) It returns a collection of the Data data type.
2) It allows pagination through two input parameters: page number and page size.
In this context, you have two alternatives:
- Use a Data Provider
For example, if you load the dataset into a Transaction object, you can create a Data Provider object that paginates with the Skip/Count clauses.
Properties:
Output = Data
Collection = True
Rules:
parm(in:&pageNum, in:&pageSize);
Variables:
&pageSize: Numeric(4.0)
&pageNum: Numeric(4.0)
&inputMediaBlob: Blob
&outputCategory: VarChar(40)
Source:
DataCollection [COUNT = &pageSize] [SKIP = (&pageNum - 1) * &pageSize]
{
    Dummy [NoOutput]
    {
        // get the dataset attributes involved
        &inputMediaBlob = TransactionImage
        &outputCategory = TransactionCategory.ToString()
        // load item
        Data
        {
            Input
            {
                Features
                {
                    Value = &inputMediaBlob
                }
            }
            Output
            {
                Label = &outputCategory
            }
        }
    }
}
- Use a Procedure
For example, if your dataset is in a directory and every image file name follows the format '{category}_{index}.png', you can scan the directory with the following Procedure object.
Rules:
parm(in:&pageNum, in:&pageSize, out:&DataCollection);
Variables:
&pageSize: Numeric(4.0)
&pageNum: Numeric(4.0)
&i: Numeric(4.0)
&BTM: Numeric(4.0)
&TOP: Numeric(4.0)
&directory: Directory
&file: File
&mediaFilePath: VarChar(512)
&mediaCategory: VarChar(32)
&data: Data, GeneXusAI.Custom
&DataCollection: Data, GeneXusAI.Custom (collection)
Source:
&i = 0
&BTM = (&pageNum - 1) * &pageSize + 1
&TOP = &pageSize * &pageNum
&directory.Source = !"{path}/dataset"
// look at every image in the directory
for &file in &directory.GetFiles()
    &i += 1
    do case
        case &i > &TOP // past the upper bound of range [&BTM,&TOP]: stop scanning
            exit
        case &i < &BTM // before the lower bound of range [&BTM,&TOP]
            // skip
        otherwise
            // get the dataset values involved
            &mediaFilePath = &file.GetAbsoluteName()
            // e.g. "cat1_0001.png" --> "cat1"
            &mediaCategory = &file.GetName().ReplaceRegEx(!"_\d+\.png$", !"")
            // load item
            &data = new()
            &data.Input.Features.Add(&mediaFilePath)
            &data.Output.Label = &mediaCategory
            &DataCollection.Add(&data)
    endCase
endFor
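The paging arithmetic and the file-name parsing used by the Procedure above can be checked outside GeneXus. The following Python sketch is only an illustration, not part of GeneXusAI: the function names page_bounds, category_of and load_page are hypothetical, and it assumes file names in the '{category}_{index}.png' format described earlier.

```python
import re


def page_bounds(page_num, page_size):
    """Return the 1-based inclusive range (btm, top) covered by a page.

    Mirrors &BTM = (&pageNum - 1) * &pageSize + 1 and &TOP = &pageSize * &pageNum.
    """
    btm = (page_num - 1) * page_size + 1
    top = page_size * page_num
    return btm, top


def category_of(file_name):
    """Strip the '_{index}.png' suffix, e.g. 'cat1_0001.png' -> 'cat1'."""
    return re.sub(r"_\d+\.png$", "", file_name)


def load_page(file_names, page_num, page_size):
    """Return (file name, category) pairs for the requested page only."""
    btm, top = page_bounds(page_num, page_size)
    # Convert the 1-based inclusive range into a 0-based Python slice.
    return [(name, category_of(name)) for name in file_names[btm - 1:top]]
```

With a page size of 10, page 3 covers items 21 to 30, and a page past the end of the directory listing yields an empty collection.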
Finally, you can define your model and start the training process as follows:
&definition = new()
// define model name
&definition.Name = !"Flowers model"
// define model dataset (link to generator object previously defined)
&definition.Dataset.Loader = link(MyGeneratorObject)
// define model input
&feature = new()
&feature.Name = !"IMAGE"
&feature.Type = DataInputType.Media
&definition.Input.Features.Add(&feature)
// define model output
&definition.Output.Type = DataOutputType.Label
// call train process
&Model = GeneXusAI.Custom.Train(&definition, &provider, &Messages)
Note: Do not forget to include at least 10 samples per class: 8 for training, 1 for testing, and 1 for validation.
If your dataset does not have enough samples, it will most probably not satisfy these conditions. You can use the Purpose field of the Data data type to control how samples are assigned when you code your generator object.
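One way to reason about the 8/1/1 requirement is to count the samples per class and give each sample a purpose before loading it. The Python sketch below is a hedged illustration only: the purpose names "training", "testing" and "validation" are placeholder strings (the actual values accepted by the Purpose field are not listed here), and the functions undersized_classes and assign_purposes are hypothetical helpers, not GeneXusAI calls.

```python
from collections import Counter, defaultdict

# 8 training + 1 testing + 1 validation, as stated in the note above.
MIN_PER_CLASS = 10


def undersized_classes(labels):
    """Return the sorted class labels that have fewer than MIN_PER_CLASS samples."""
    counts = Counter(labels)
    return sorted(label for label, n in counts.items() if n < MIN_PER_CLASS)


def assign_purposes(labels):
    """Assign a purpose to each sample so every block of 10 per class is split 8/1/1."""
    seen = defaultdict(int)  # number of samples already seen for each class
    purposes = []
    for label in labels:
        position = seen[label] % 10
        seen[label] += 1
        if position == 8:
            purposes.append("testing")      # placeholder purpose name
        elif position == 9:
            purposes.append("validation")   # placeholder purpose name
        else:
            purposes.append("training")     # placeholder purpose name
    return purposes
```

A class reported by undersized_classes cannot meet the minimum split, so it needs more samples before training starts.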
- Your service-account.json file must be accessible from the web app if you set the Provider's Key property to its file path.
- You must install the OpenSSL command-line tool.
Check that the installation succeeded by typing the following command in the command-line interface.
> openssl version
If you are working on Windows, do not forget to add the directory containing openssl.exe to your PATH environment variable.
- You must set the Storage Provider property to 'Google Cloud Storage' and set the associated properties according to your service-account.json file. Ensure your bucket region is 'us-central1'; otherwise, you must create a bucket in that region and recreate your service-account.json file.
- This process runs silently (on the provider's server). You can poll the training status by calling the GeneXus Cognitive API - Check procedure.
- The training time may vary depending on your input dataset.
- It is very important that your 'generator' object follows these two rules:
1) It returns a collection of the Data data type.
2) It allows pagination through two input parameters: page number and page size.
Also, your 'generator' object must be reachable from your main object because it will be dynamically called from GeneXusAI.
This procedure is available as of GeneXus 16 upgrade 6.