The custom format (a JSON file with the .custom extension that matches a particular schema) gives you full control over the ingestion process. The caller is responsible for all chunking and for assigning metadata to each chunk. The minimal set of values is as follows:
{
[] // List of items
},
Each item is as shown below:
{
"pageContent": "some text", // mandatory
"metadata": // mandatory
{
"name": "name", // mandatory
"description": "some description", // mandatory
"source": "absolute URL" // mandatory
// key value for extra metadata (optional)
}
}
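Since the caller builds every item by hand, it can help to check the mandatory fields before uploading. The following is a minimal sketch, not part of the product: validate_item is a hypothetical helper, and it assumes that pageContent must be a non-empty string and that an "absolute URL" means a URL with both a scheme and a host.

```python
from urllib.parse import urlparse

# Mandatory metadata keys, as listed in the item layout above.
REQUIRED_METADATA_KEYS = ("name", "description", "source")


def validate_item(item: dict) -> list[str]:
    """Return a list of problems found in one item (an empty list means OK)."""
    problems = []

    # Assumption: an empty or whitespace-only pageContent is treated as invalid.
    content = item.get("pageContent")
    if not isinstance(content, str) or not content.strip():
        problems.append("pageContent must be a non-empty string")

    metadata = item.get("metadata")
    if not isinstance(metadata, dict):
        problems.append("metadata must be an object")
        return problems

    for key in REQUIRED_METADATA_KEYS:
        value = metadata.get(key)
        if not isinstance(value, str) or not value:
            problems.append(f"metadata.{key} is mandatory")

    # Assumption: "absolute URL" is checked as "has a scheme and a host".
    source = metadata.get("source")
    if isinstance(source, str) and source:
        parsed = urlparse(source)
        if not (parsed.scheme and parsed.netloc):
            problems.append("metadata.source must be an absolute URL")

    return problems
```

Running validate_item over every item and refusing to upload when any list is non-empty catches missing fields locally rather than during ingestion.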
Once the document is uploaded, the ingestion process skips its usual text-based chunking, so all related information must be provided in the .custom file. Each item is ingested and stored in the vector store as one chunk, using its pageContent field with the associated metadata.
Note that if you use extra metadata elements, you must fill in the associated key/value grid or use the API to upload them accordingly. Also, because you control the pageContent value, make sure each one does not exceed the maximum input length for the selected embeddings model.
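As an illustration of preparing such a file, the sketch below turns pre-chunked text into items and writes them to disk. Several things here are assumptions rather than documented behavior: build_items and write_custom_file are hypothetical helper names, MAX_CHARS is a placeholder (embeddings limits are usually expressed in tokens, so a character count is only a rough proxy), and the file is written as a top-level JSON array of items, which should be checked against the sample file.

```python
import json
from pathlib import Path

# Hypothetical limit; replace it with the real limit of your embeddings model.
MAX_CHARS = 2000


def build_items(chunks, name, description, source, extra=None, max_chars=MAX_CHARS):
    """Build one item per chunk, rejecting chunks longer than max_chars."""
    items = []
    for index, text in enumerate(chunks):
        if len(text) > max_chars:
            raise ValueError(
                f"chunk {index} has {len(text)} characters, limit is {max_chars}"
            )
        # Extra metadata goes first so the mandatory keys cannot be overwritten.
        metadata = {
            **(extra or {}),
            "name": name,
            "description": description,
            "source": source,
        }
        items.append({"pageContent": text, "metadata": metadata})
    return items


def write_custom_file(items, path):
    """Serialize the items as JSON into a file that ends with .custom."""
    path = Path(path)
    if path.suffix != ".custom":
        raise ValueError("file name must end with .custom")
    # Assumption: the top level of the file is a JSON array of items.
    path.write_text(json.dumps(items, ensure_ascii=False, indent=2), encoding="utf-8")
    return path
```

A call might look like write_custom_file(build_items(chunks, "manual", "User manual", "https://example.com/manual"), "manual.custom"), where chunks is a list of strings produced by your own chunking logic and the URL is a placeholder.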
View a sample here.
.web File Format