By default, the following file formats are supported for upload:
Category | Extension
Text | .txt
Portable | .pdf², .md, .msg², .org², .rst², .csv¹, .tsv
General | .eml², .html², .xml
Microsoft Office | .doc², .docx², .ppt², .pptx², .xls, .xlsx, .rtf
Open Document | .odt², .odp², .ods²
Ebook | .epub²
JSON | .json, .jsonl
Images | .png, .jpeg/.jpg, .tiff, .bmp, .gif
Custom | .custom, .web
¹ When using the legacy provider, the CSV separator must be a comma (,), not a semicolon (;).
² Supports the pageNumber element when using the GEAI provider.
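Footnote 1 means that a semicolon-delimited CSV export (common in locales that use a decimal comma) has to be converted before it is ingested through the legacy provider. The following is a minimal Python sketch using only the standard `csv` module. The function names `semicolon_to_comma` and `convert_csv_file` are hypothetical helpers, not part of the platform, and the sketch assumes the input really is semicolon-delimited and UTF-8 encoded.

```python
import csv
import io


def semicolon_to_comma(text: str) -> str:
    """Re-serialize semicolon-delimited CSV text as comma-delimited CSV.

    Fields that contain commas (for example "3,50" written with a decimal
    comma) are quoted by csv.writer, so their values survive unchanged.
    """
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=";")
    out = io.StringIO(newline="")
    writer = csv.writer(out, lineterminator="\n")  # default delimiter is ","
    writer.writerows(reader)
    return out.getvalue()


def convert_csv_file(src_path: str, dst_path: str) -> None:
    """Hypothetical file wrapper: write a comma-delimited copy of src_path."""
    with open(src_path, encoding="utf-8", newline="") as src:
        converted = semicolon_to_comma(src.read())
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(converted)
```

For example, `semicolon_to_comma('a;b\n1;2\n')` returns `'a,b\n1,2\n'`. Going through a CSV reader and writer, rather than a plain text replace of `;` with `,`, keeps quoted fields and embedded commas intact.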
Note that plain-text files such as .csv and .txt are expected to be UTF-8 encoded.
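A quick way to catch encoding problems before uploading is to check whether the raw bytes decode as UTF-8 and, if not, transcode them. The Python sketch below is an illustration only: `is_utf8` and `to_utf8` are hypothetical helpers, and the `cp1252` fallback is an assumption about how the file was produced, since only the file's author knows its real encoding.

```python
def is_utf8(data: bytes) -> bool:
    """Return True if the raw bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def to_utf8(data: bytes, fallback_encoding: str = "cp1252") -> bytes:
    """Return UTF-8 bytes, transcoding from fallback_encoding when needed.

    fallback_encoding is an assumption: a wrong guess can produce incorrect
    characters, and bytes that are invalid in the fallback encoding as well
    raise UnicodeDecodeError.
    """
    if is_utf8(data):
        return data  # already UTF-8, leave untouched
    return data.decode(fallback_encoding).encode("utf-8")
```

To fix a file on disk, read it in binary mode (`open(path, "rb").read()`), pass the bytes through `to_utf8`, and write the result back in binary mode.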
For on-premises installations, you can extend the list of supported file extensions by changing the FILE_TYPES parameter.
- .custom: use it to manually configure the chunks and metadata yourself.
- .web: use it to crawl a website.
- New extensions: doc, ppt, xls, msg, org, rtf, rst, tsv, eml, tiff, bmp, epub.
- New options when processing csv and xls* files.
Check the system parameters detailed for the RAG module:
Revision | Parameter | Value
2 | FILE_TYPES | txt,pdf,docx,pptx,xlsx,odt,odp,ods,xlsx,epub,json,jsonl,csv,java,cs,py,js,ts,xml,html,web,custom,md
5 | FILE_TYPES | txt,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,rtf,rst,epub,json,jsonl,csv,tsv,java,cs,py,js,ts,web,custom,md,png,gif,jpeg,jpg,mp3,mp4,msg,org,tiff,bmp,eml,xml,html
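Because FILE_TYPES is a plain comma-separated list of extensions without leading dots, it can be handled with ordinary string operations. The Python sketch below, using only the two revision values from the table above, parses a FILE_TYPES value, checks whether a file name's extension is in it, and compares revisions. `parse_file_types` and `is_supported` are hypothetical helpers for local pre-upload checks; they do not reproduce how the platform itself validates uploads, and case-insensitive matching is an assumption.

```python
import os

# FILE_TYPES values copied from the revision table.
FILE_TYPES_REV2 = "txt,pdf,docx,pptx,xlsx,odt,odp,ods,xlsx,epub,json,jsonl,csv,java,cs,py,js,ts,xml,html,web,custom,md"
FILE_TYPES_REV5 = "txt,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,rtf,rst,epub,json,jsonl,csv,tsv,java,cs,py,js,ts,web,custom,md,png,gif,jpeg,jpg,mp3,mp4,msg,org,tiff,bmp,eml,xml,html"


def parse_file_types(value: str) -> set[str]:
    """Split a comma-separated FILE_TYPES value into lowercase extensions.

    Empty items are ignored and duplicates collapse into one set entry.
    """
    return {item.strip().lower() for item in value.split(",") if item.strip()}


def is_supported(filename: str, file_types: set[str]) -> bool:
    """Return True if the file's extension (without the dot) is listed.

    Matching is case-insensitive here, which is an assumption.
    """
    ext = os.path.splitext(filename)[1].lstrip(".").lower()
    return ext in file_types


if __name__ == "__main__":
    rev2 = parse_file_types(FILE_TYPES_REV2)
    rev5 = parse_file_types(FILE_TYPES_REV5)
    print("added in revision 5:", sorted(rev5 - rev2))
    print("removed in revision 5:", sorted(rev2 - rev5))
    print(is_supported("Report.PDF", rev5))
```

Comparing the two sets shows that revision 5 adds 17 extensions (among them doc, ppt, xls, msg and eml) and removes none; the duplicated xlsx in revision 2 collapses into a single entry.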
Ingestion Provider