Mistral AI Releases OCR 3: A Smaller Optical Character Recognition (OCR) Model for Structured Document AI at Scale

Mistral AI has released Mistral OCR 3, its latest optical character recognition service that powers the company’s Document AI stack. The model, named mistral-ocr-2512, is built to extract interleaved text and images from PDFs and other documents while preserving structure, and it does this at an aggressive price of $2 per 1,000 pages, with a 50% discount when used through the Batch API.

What Is Mistral OCR 3 Optimized For?

Mistral OCR 3 targets typical enterprise document workloads. The model is tuned for forms, scanned documents, complex tables, and handwriting. It is evaluated on internal benchmarks drawn from real business use cases, where it achieves a 74% overall win rate over Mistral OCR 2 across these document categories using a fuzzy match metric against ground truth.
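To make the idea of a win rate under a fuzzy match metric concrete, here is a minimal sketch. Mistral does not specify the exact metric, so `difflib.SequenceMatcher` is only a stand-in, and the function names and the tie-handling rule are assumptions made for illustration.

```python
from difflib import SequenceMatcher


def fuzzy_score(prediction: str, truth: str) -> float:
    """Similarity in [0, 1]; a stand-in for the unspecified fuzzy match metric."""
    return SequenceMatcher(None, prediction, truth).ratio()


def win_rate(new_outputs, old_outputs, truths) -> float:
    """Share of decided documents where the new model scores strictly higher.

    Ties are left out of the denominator, which is an assumption; the
    article does not say how ties are counted.
    """
    wins = losses = 0
    for new, old, truth in zip(new_outputs, old_outputs, truths):
        new_score = fuzzy_score(new, truth)
        old_score = fuzzy_score(old, truth)
        if new_score > old_score:
            wins += 1
        elif old_score > new_score:
            losses += 1
    decided = wins + losses
    return wins / decided if decided else 0.0
```

With three documents where the newer output is closer to ground truth on two and further on one, `win_rate` returns two thirds.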

The model outputs markdown that preserves document layout, and when table formatting is enabled, it enriches the output with HTML-based table representations. This combination gives downstream systems both the content and the structural information needed for retrieval pipelines, analytics, and agent workflows.

Role in Mistral Document AI

OCR 3 sits inside Mistral Document AI, the company’s document processing capability that combines OCR with structured data extraction and Document QnA.

It now powers the Document AI Playground in Mistral AI Studio. In this interface, users upload PDFs or images and get back either clean text or structured JSON without writing code. The same underlying OCR pipeline is available via the public API, which allows teams to move from interactive exploration to production workloads without changing the core model.

Inputs, Outputs, And Structure

The OCR processor accepts multiple document formats through a single API. The document field can point to:

document_url for PDFs, pptx, docx and more
image_url for image types such as png, jpeg or avif
Uploaded or base64 encoded PDFs or images through the same schema

This is documented in the OCR Processor section of Mistral’s Document AI docs.
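As a rough illustration of the input options listed above, the sketch below builds a request body for the /v1/ocr endpoint without sending it. The helper names, the extension-based routing, the "type" discriminator key, and the data URL form for base64 PDFs are assumptions for illustration; check the OCR Processor docs for the authoritative schema.

```python
import base64
from typing import Optional

MODEL = "mistral-ocr-2512"  # model name given in the article
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".avif")


def build_ocr_request(url: str, table_format: Optional[str] = None) -> dict:
    """Build a JSON-serializable request body (hypothetical helper).

    Routes image-like URLs to image_url and everything else to document_url.
    """
    path = url.split("?", 1)[0].lower()  # ignore any query string
    if path.endswith(IMAGE_EXTENSIONS):
        document = {"type": "image_url", "image_url": url}
    else:
        document = {"type": "document_url", "document_url": url}
    body = {"model": MODEL, "document": document}
    if table_format is not None:
        body["table_format"] = table_format  # e.g. "html", as in the article
    return body


def pdf_bytes_to_data_url(pdf_bytes: bytes) -> str:
    """Encode raw PDF bytes as a base64 data URL (assumed to be accepted)."""
    encoded = base64.b64encode(pdf_bytes).decode("ascii")
    return "data:application/pdf;base64," + encoded
```

A body built this way can be serialized with `json.dumps` and posted by whatever HTTP client or SDK the project already uses.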

The response is a JSON object with a pages array. Each page contains an index, a markdown string, a list of images, a list of tables when table_format="html" is used, detected hyperlinks, optional header and footer fields when header or footer extraction is enabled, and a dimensions object with the page size. There is also a document_annotation field for structured annotations and a usage_info block for accounting information.

When images and HTML tables are extracted, the markdown includes placeholders such as ![img-0.jpeg](img-0.jpeg) and [tbl-3.html](tbl-3.html). These placeholders are mapped back to actual content using the images and tables arrays in the response, which simplifies downstream reconstruction.
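The placeholder mapping described above can be sketched as a small substitution pass over one page. This is a minimal illustration that works on plain dictionaries; the per-table "id" and "content" keys and the `inline_tables` helper are assumptions, not confirmed field names.

```python
import re

# Matches ![img-0.jpeg](img-0.jpeg) and [tbl-3.html](tbl-3.html).
# Group 1 is the id; the link target must repeat the same id.
PLACEHOLDER = re.compile(r"!?\[([^\]]+)\]\(\1\)")


def inline_tables(page: dict) -> str:
    """Return the page markdown with table placeholders replaced by HTML.

    Image placeholders and unknown ids are left untouched.
    """
    tables = {t["id"]: t["content"] for t in (page.get("tables") or [])}

    def substitute(match):
        is_image = match.group(0).startswith("!")
        ref_id = match.group(1)
        if not is_image and ref_id in tables:
            return tables[ref_id]
        return match.group(0)

    return PLACEHOLDER.sub(substitute, page["markdown"])
```

Keeping image placeholders as they are lets a later step decide whether to embed base64 data or write the images to disk and keep the relative links.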

Upgrades Over Mistral OCR 2

Mistral OCR 3 introduces several concrete upgrades relative to OCR 2. The public release notes emphasize four main areas.

Handwriting: Mistral OCR 3 more accurately interprets cursive, mixed content annotations, and handwritten text placed on top of printed templates.
Forms: It improves detection of boxes, labels, and handwritten entries in dense layouts such as invoices, receipts, compliance forms, and government documents.
Scanned and complex documents: The model is more robust to compression artifacts, skew, distortion, low DPI, and background noise in scanned pages.
Complex tables: It reconstructs table structures with headers, merged cells, multi-row blocks, and column hierarchies, and it can return HTML tables with correct colspan and rowspan attributes so that layout is preserved.
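To show why colspan and rowspan matter downstream, the following sketch expands an HTML table into a rectangular grid using only the Python standard library. It is an illustrative consumer of such output, not part of Mistral's tooling; the class and function names are made up here, and nested tables are not handled.

```python
from html.parser import HTMLParser


class _CellCollector(HTMLParser):
    """Collects rows of (text, rowspan, colspan) tuples from one HTML table."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._cell = None  # [text_parts, rowspan, colspan] while inside td/th

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            if not self.rows:  # tolerate a cell that appears before any <tr>
                self.rows.append([])
            a = dict(attrs)
            self._cell = [[], int(a.get("rowspan") or 1), int(a.get("colspan") or 1)]

    def handle_data(self, data):
        if self._cell is not None:
            self._cell[0].append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            parts, rowspan, colspan = self._cell
            self.rows[-1].append(("".join(parts).strip(), rowspan, colspan))
            self._cell = None


def table_to_grid(html: str):
    """Expand merged cells so every grid slot holds its cell's text."""
    parser = _CellCollector()
    parser.feed(html)
    parser.close()
    filled = {}  # (row, col) -> text
    for r, row in enumerate(parser.rows):
        c = 0
        for text, rowspan, colspan in row:
            while (r, c) in filled:  # skip slots claimed by earlier rowspans
                c += 1
            for dr in range(rowspan):
                for dc in range(colspan):
                    filled[(r + dr, c + dc)] = text
            c += colspan
    if not filled:
        return []
    n_rows = max(r for r, _ in filled) + 1
    n_cols = max(c for _, c in filled) + 1
    return [[filled.get((r, c), "") for c in range(n_cols)] for r in range(n_rows)]
```

Repeating the merged cell's text in every slot it covers is one design choice among several; it keeps each row self-describing, which tends to suit CSV export and retrieval chunking.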

https://mistral.ai/news/mistral-ocr-3

Pricing, Batch Inference, And Annotations

The OCR 3 model card lists pricing at $2 per 1,000 pages for standard OCR and $3 per 1,000 annotated pages when structured annotations are used.

Mistral also exposes OCR 3 through its Batch Inference API /v1/batch, which is documented under the batching section of the platform. Batch processing halves the effective OCR price to $1 per 1,000 pages by applying a 50% discount to jobs that run through the batch pipeline.
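The pricing arithmetic above can be written as a small estimator. The rates come from the article; the function name is hypothetical, and applying the batch discount to annotated pages is an assumption, since the article only states the discounted price for standard OCR.

```python
STANDARD_USD_PER_1K = 2.0   # standard OCR, per 1,000 pages
ANNOTATED_USD_PER_1K = 3.0  # annotated pages, per 1,000 pages
BATCH_DISCOUNT = 0.5        # 50% off through the Batch API


def estimate_cost(pages: int, annotated: bool = False, batch: bool = False) -> float:
    """Estimate the OCR cost in USD for a number of pages.

    Assumption: the batch discount is applied to annotated pages as well.
    """
    rate = ANNOTATED_USD_PER_1K if annotated else STANDARD_USD_PER_1K
    if batch:
        rate *= 1 - BATCH_DISCOUNT
    return pages / 1000 * rate
```

For example, 250,000 standard pages come to $500 at the regular rate and $250 through the batch pipeline.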

The model integrates with two important features on the same endpoint, Annotations – Structured and BBox Extraction. These allow developers to attach schema-driven labels to regions of a document and get bounding boxes for text and other elements, which is useful when mapping content into downstream systems or UI overlays.

Key Takeaways

Model and role: Mistral OCR 3, named mistral-ocr-2512, is the new OCR service that powers Mistral’s Document AI stack for page-based document understanding.
Accuracy gains: On internal benchmarks covering forms, scanned documents, complex tables, and handwriting, OCR 3 achieves a 74% overall win rate over Mistral OCR 2, and Mistral positions it as state of the art against both traditional and AI-native OCR systems.
Structured outputs for RAG: The service extracts interleaved text and embedded images and returns markdown enriched with HTML reconstructed tables, preserving layout and table structure so outputs can feed directly into RAG, agents, and search pipelines with minimal extra parsing.
API and document formats: Developers access OCR 3 via the /v1/ocr endpoint or SDK, passing PDFs as document_url and images such as png or jpeg as image_url, and can enable options like HTML table output, header or footer extraction, and base64 images in the response.
Pricing and batch processing: OCR 3 is priced at $2 per 1,000 pages and $3 per 1,000 annotated pages, and when used through the Batch API the effective price for standard OCR drops to $1 per 1,000 pages for large-scale processing.

Michal Sutter is a data science professional with a Master of Science in Data Science from the University of Padova. With a solid foundation in statistical analysis, machine learning, and data engineering, Michal excels at transforming complex datasets into actionable insights.