Mistral AI has launched Mistral OCR 3, its newest optical character recognition service, which powers the company's Document AI stack. The model, named mistral-ocr-2512, is built to extract interleaved text and images from PDFs and other documents while preserving structure. It is priced aggressively at $2 per 1,000 pages, with a 50% discount when used through the Batch API.
What Is Mistral OCR 3 Optimized For?
Mistral OCR 3 targets typical enterprise document workloads. The model is tuned for forms, scanned documents, complex tables, and handwriting. It is evaluated on internal benchmarks drawn from real business use cases, where it achieves a 74% overall win rate over Mistral OCR 2 across these document categories, using a fuzzy match metric against ground truth.
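To make the win rate figure concrete, here is a minimal sketch of how such a number can be computed. Mistral has not published its exact fuzzy match metric, so difflib's similarity ratio is used purely as a stand-in. The function names and the choice to count ties as non-wins are assumptions made for illustration.

```python
from difflib import SequenceMatcher


def fuzzy_score(prediction: str, truth: str) -> float:
    """Similarity in [0, 1]; a stand-in for the unpublished fuzzy match metric."""
    return SequenceMatcher(None, prediction, truth).ratio()


def win_rate(new_outputs, old_outputs, truths) -> float:
    """Fraction of documents where the new model scores strictly higher.

    Ties count as non-wins here, which is an assumption, not Mistral's rule.
    """
    if not truths:
        raise ValueError("need at least one document")
    if not (len(new_outputs) == len(old_outputs) == len(truths)):
        raise ValueError("all three lists must have the same length")
    wins = sum(
        fuzzy_score(new, truth) > fuzzy_score(old, truth)
        for new, old, truth in zip(new_outputs, old_outputs, truths)
    )
    return wins / len(truths)


if __name__ == "__main__":
    truths = ["Invoice 42", "Total: 10"]
    old = ["Invoice 4Z", "Total: 10"]   # one character misread in the first document
    new = ["Invoice 42", "Total: 10"]   # both documents read correctly
    print(win_rate(new, old, truths))
```

In this toy example the new model wins on the first document and ties on the second, so the win rate is one win out of two documents.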
The model outputs markdown that preserves document layout, and when table formatting is enabled, it enriches the output with HTML based table representations. This combination gives downstream systems both the content and the structural information needed for retrieval pipelines, analytics, and agent workflows.
Role in Mistral Document AI
OCR 3 sits inside Mistral Document AI, the company's document processing capability that combines OCR with structured data extraction and Document QnA.
It now powers the Document AI Playground in Mistral AI Studio. In this interface, users upload PDFs or images and get back either clean text or structured JSON without writing code. The same underlying OCR pipeline is available via the public API, which allows teams to move from interactive exploration to production workloads without changing the core model.
Inputs, Outputs, And Structure
The OCR processor accepts multiple document formats through a single API. The document field can point to:
- document_url for PDFs, pptx, docx and more
- image_url for image types such as png, jpeg or avif
- Uploaded or base64 encoded PDFs or images through the same schema
This is documented in the OCR Processor section of Mistral's Document AI docs.
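As a rough illustration of the input options above, the sketch below builds a JSON request body without sending it. The model name, document_url, image_url, and table_format come from this article. The exact nesting, including the "type" key, and the helper name build_ocr_request are assumptions that should be checked against the OCR Processor docs.

```python
import json
from typing import Optional

OCR_MODEL = "mistral-ocr-2512"  # model name quoted in the article


def build_ocr_request(url: str, is_image: bool = False,
                      table_format: Optional[str] = None) -> dict:
    """Build a request body for the OCR endpoint.

    The body layout is an assumption for illustration, not a verified schema.
    """
    kind = "image_url" if is_image else "document_url"
    body = {
        "model": OCR_MODEL,
        # Assumed shape: a "type" discriminator plus a field of the same name.
        "document": {"type": kind, kind: url},
    }
    if table_format is not None:
        body["table_format"] = table_format  # e.g. "html" for HTML tables
    return body


if __name__ == "__main__":
    request = build_ocr_request("https://example.com/report.pdf", table_format="html")
    print(json.dumps(request, indent=2))
```

Keeping payload construction separate from the HTTP call makes it easy to reuse the same body for interactive requests and for batch jobs.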
The response is a JSON object with a pages array. Each page contains an index, a markdown string, a list of images, a list of tables when table_format="html" is used, detected hyperlinks, optional header and footer fields when header or footer extraction is enabled, and a dimensions object with the page size. There is also a document_annotation field for structured annotations and a usage_info block for accounting information.
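A minimal sketch of consuming that response shape: it joins the per-page markdown into one string, ordered by page index. It relies only on the pages, index, and markdown fields named above. The helper name document_markdown and the sample data are made up for illustration.

```python
def document_markdown(response: dict) -> str:
    """Concatenate page markdown in page order, separated by blank lines."""
    pages = sorted(response.get("pages", []), key=lambda page: page["index"])
    return "\n\n".join(page["markdown"] for page in pages)


if __name__ == "__main__":
    # Hand-written sample response; real responses carry more fields per page.
    sample = {
        "pages": [
            {"index": 1, "markdown": "## Terms\nNet 30."},
            {"index": 0, "markdown": "# Invoice\nAcme Corp."},
        ]
    }
    print(document_markdown(sample))
```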
When images and HTML tables are extracted, the markdown includes placeholders for them, such as [tbl-3.html](tbl-3.html) for a table. These placeholders are mapped back to the actual content using the images and tables arrays in the response, which simplifies downstream reconstruction.
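The mapping step can be sketched as follows for table placeholders. This assumes each entry in the tables array carries an "id" matching the placeholder name and a "content" field holding the HTML. Both key names and the helper inline_tables are assumptions, not confirmed schema. Image placeholders are left untouched here, although the same lookup approach would apply to the images array.

```python
import re

# Matches [name](name) where both parts are identical and the link is not
# preceded by "!", so markdown image references are skipped.
TABLE_PLACEHOLDER = re.compile(r"(?<!!)\[([^\]]+)\]\(\1\)")


def inline_tables(page: dict) -> str:
    """Replace table placeholders in a page's markdown with the table HTML."""
    # Assumed entry shape: {"id": "tbl-3.html", "content": "<table>...</table>"}
    tables = {table["id"]: table["content"] for table in page.get("tables", [])}

    def swap(match: re.Match) -> str:
        # Leave the placeholder as-is when no matching table entry exists.
        return tables.get(match.group(1), match.group(0))

    return TABLE_PLACEHOLDER.sub(swap, page["markdown"])


if __name__ == "__main__":
    page = {
        "markdown": "Quarterly results\n\n[tbl-3.html](tbl-3.html)\n\nEnd of page.",
        "tables": [{"id": "tbl-3.html", "content": "<table><tr><td>1</td></tr></table>"}],
    }
    print(inline_tables(page))
```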
Upgrades Over Mistral OCR 2
Mistral OCR 3 introduces several concrete upgrades relative to OCR 2. The public release notes emphasize four main areas.
- Handwriting: Mistral OCR 3 more accurately interprets cursive, mixed content annotations, and handwritten text placed on top of printed templates.
- Forms: It improves detection of boxes, labels, and handwritten entries in dense layouts such as invoices, receipts, compliance forms, and government documents.
- Scanned and complex documents: The model is more robust to compression artifacts, skew, distortion, low DPI, and background noise in scanned pages.
- Complex tables: It reconstructs table structures with headers, merged cells, multi row blocks, and column hierarchies, and it can return HTML tables with proper colspan and rowspan attributes so that layout is preserved.
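To show why colspan and rowspan matter downstream, the sketch below expands an HTML table with merged cells into a plain rectangular grid using only the Python standard library. It is generic HTML handling rather than anything specific to Mistral's output, and the sample table is hand-written.

```python
from html.parser import HTMLParser


class _CellCollector(HTMLParser):
    """Collect cells row by row, recording each cell's span attributes."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            a = dict(attrs)
            self._cell = {
                "text": "",
                "colspan": int(a.get("colspan") or 1),
                "rowspan": int(a.get("rowspan") or 1),
            }

    def handle_data(self, data):
        if self._cell is not None:
            self._cell["text"] += data

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            if not self.rows:          # tolerate a cell that appears before any <tr>
                self.rows.append([])
            self.rows[-1].append(self._cell)
            self._cell = None


def expand_table(html: str) -> list:
    """Return a rectangular grid where merged cells repeat their text."""
    parser = _CellCollector()
    parser.feed(html)
    grid = {}
    for r, row in enumerate(parser.rows):
        c = 0
        for cell in row:
            while (r, c) in grid:      # skip slots filled by a rowspan from above
                c += 1
            for dr in range(cell["rowspan"]):
                for dc in range(cell["colspan"]):
                    grid[(r + dr, c + dc)] = cell["text"].strip()
            c += cell["colspan"]
    if not grid:
        return []
    n_rows = max(r for r, _ in grid) + 1
    n_cols = max(c for _, c in grid) + 1
    return [[grid.get((r, c), "") for c in range(n_cols)] for r in range(n_rows)]


if __name__ == "__main__":
    html = (
        "<table><tr><th rowspan='2'>Region</th><th colspan='2'>Sales</th></tr>"
        "<tr><th>Q1</th><th>Q2</th></tr>"
        "<tr><td>EU</td><td>10</td><td>12</td></tr></table>"
    )
    for row in expand_table(html):
        print(row)
```

Repeating the merged cell's text in every slot it covers keeps each row self-describing, which is convenient when rows are later chunked for retrieval.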

Pricing, Batch Inference, And Annotations
The OCR 3 model card lists pricing at $2 per 1,000 pages for standard OCR and $3 per 1,000 annotated pages when structured annotations are used.
Mistral also exposes OCR 3 through its Batch Inference API /v1/batch, which is documented under the batching section of the platform. Batch processing halves the effective OCR price to $1 per 1,000 pages by applying a 50% discount to jobs that run through the batch pipeline.
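The pricing arithmetic can be sketched as a small estimator using the rates quoted above. It assumes cost scales linearly with page count, which the article does not state explicitly. Because no batch price is given for annotated pages, the helper refuses that combination rather than guessing. The function name estimate_cost is made up for illustration.

```python
from decimal import Decimal

STANDARD_PER_1K = Decimal("2")    # USD per 1,000 pages, standard OCR
ANNOTATED_PER_1K = Decimal("3")   # USD per 1,000 annotated pages
BATCH_DISCOUNT = Decimal("0.5")   # 50% off through the Batch API


def estimate_cost(pages: int, annotated: bool = False, batch: bool = False) -> Decimal:
    """Estimate USD cost, assuming linear pro-rating per page (an assumption)."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    if annotated and batch:
        # The article only quotes a batch price for standard OCR.
        raise ValueError("no batch price is quoted for annotated pages")
    rate = ANNOTATED_PER_1K if annotated else STANDARD_PER_1K
    if batch:
        rate = rate * (1 - BATCH_DISCOUNT)
    return rate * pages / 1000


if __name__ == "__main__":
    print(estimate_cost(250_000))              # standard OCR
    print(estimate_cost(250_000, batch=True))  # same volume through the Batch API
```

For 250,000 pages this works out to $500 for standard OCR and $250 through the Batch API.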
The model integrates with two important features on the same endpoint, Annotations (Structured) and BBox Extraction. These allow developers to attach schema driven labels to regions of a document and get bounding boxes for text and other elements, which is useful when mapping content into downstream systems or UI overlays.
Key Takeaways
- Model and role: Mistral OCR 3, named mistral-ocr-2512, is the new OCR service that powers Mistral's Document AI stack for page based document understanding.
- Accuracy gains: On internal benchmarks covering forms, scanned documents, complex tables, and handwriting, OCR 3 achieves a 74% overall win rate over Mistral OCR 2, and Mistral positions it as state of the art against both traditional and AI native OCR systems.
- Structured outputs for RAG: The service extracts interleaved text and embedded images and returns markdown enriched with HTML reconstructed tables, preserving layout and table structure so outputs can feed directly into RAG, agents, and search pipelines with minimal extra parsing.
- API and document formats: Developers access OCR 3 via the /v1/ocr endpoint or SDK, passing PDFs as document_url and images such as png or jpeg as image_url, and can enable options like HTML table output, header or footer extraction, and base64 images in the response.
- Pricing and batch processing: OCR 3 is priced at $2 per 1,000 pages and $3 per 1,000 annotated pages, and when used through the Batch API the effective price for standard OCR drops to $1 per 1,000 pages for large scale processing.
