Choose this strategy by setting the query parameter strategy to “default”, i.e. adding ?strategy=default at the end of the URL (or by simply leaving the parameter out).
The fastest and cheapest strategy.
Extracts text from files, and makes no attempt to extract anything more.
This makes it a great fit for text-heavy documents such as long contracts, but a poor fit for documents that are full of tables when you need the information those tables contain.
Outputs text (i.e. you should expect to receive a string back).
Choose this strategy by setting the query parameter strategy to “vision”, i.e. adding ?strategy=vision at the end of the URL.
Can read everything in a file, including tables, images, watermarks, and text.
Outputs text (i.e. you should expect to receive a string back).
This is a great option for documents which contain tables, and documents where the structure of the information on the page is important for human-level understanding (for example: invoices, receipts, and bills-of-lading).
Choose this strategy by setting the query parameter strategy to “sota”, i.e. adding ?strategy=sota at the end of the URL.
Can read everything in a file, including tables, images, watermarks, and text.
Outputs text, HTML, Markdown, JSON, and Chunks. All formats are computed at runtime, so you can choose later which one to access.
This is a great option for documents which contain tables, and documents where the structure of the information on the page is important for human-level understanding (for example: invoices, receipts, and bills-of-lading).
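As a rough illustration of how the strategy parameter is attached to a request URL, here is a minimal Python sketch. The helper name with_strategy and the endpoint https://api.example.com/v1/parse are hypothetical placeholders, not part of the documented API; only the parameter name strategy and its three values (default, vision, sota) come from the descriptions above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# The three strategy values described above.
VALID_STRATEGIES = {"default", "vision", "sota"}


def with_strategy(url: str, strategy: str = "default") -> str:
    """Return `url` with its `strategy` query parameter set to `strategy`.

    Hypothetical helper for illustration: it only builds the URL and
    does not send any request.
    """
    if strategy not in VALID_STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy!r}")

    parts = urlsplit(url)
    # Keep every existing query parameter except a previous `strategy`.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "strategy"
    ]
    query.append(("strategy", strategy))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Placeholder endpoint; substitute the real URL of the service.
print(with_strategy("https://api.example.com/v1/parse", "vision"))
# https://api.example.com/v1/parse?strategy=vision
```

Validating the value before building the URL is a design choice of this sketch: it catches a misspelled strategy name locally instead of relying on however the service handles unknown values.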