
Nougat is a visual transformer model from Meta AI that converts document images, including complex math equations, into structured text, offering advances in academic paper parsing.
You can now try out nougat-base in the Clarifai Platform and access it through the API.
Table of Contents
- Introduction
- Model Architecture
- Running Nougat model with Python
- Running Nougat model with Javascript
- Best Use Cases
Introduction
Nougat is a visual transformer model developed by researchers at Meta AI that can convert images of document pages into structured text. It takes a scanned image of a document page as input and outputs text in a lightweight markup language.
The key advantage of Nougat is that it relies solely on the document image and does not need any OCR text. This allows it to recover semantic structure, such as math equations, correctly. It is trained on millions of academic papers from arXiv and PubMed to learn the patterns of research paper formatting and language.
Model Architecture
Nougat uses a visual transformer encoder-decoder architecture. The encoder uses a Swin Transformer to encode the document image into latent embeddings. The Swin Transformer processes the image hierarchically using shifted windows. The decoder then generates the output text tokens autoregressively, using self-attention over the tokens generated so far and cross-attention over the encoder outputs.
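To make the autoregressive step concrete, here is a minimal sketch of a decoding loop in plain Python. It is not Nougat's implementation: greedy selection is an assumption (the post does not say which decoding strategy is used), and `toy_scores` is a hypothetical stand-in for the real decoder, which would score the next token from the encoder outputs and the tokens generated so far.

```python
from typing import Callable, List, Sequence


def greedy_decode(
    encoder_states: Sequence[float],
    next_token_scores: Callable[[Sequence[float], List[int]], List[float]],
    bos_id: int,
    eos_id: int,
    max_len: int = 16,
) -> List[int]:
    """Generate token ids one at a time, feeding each choice back in."""
    tokens = [bos_id]
    for _ in range(max_len):
        # The decoder sees the encoder outputs plus everything generated so far.
        scores = next_token_scores(encoder_states, tokens)
        # Greedy choice (an assumption): take the highest-scoring token id.
        next_id = max(range(len(scores)), key=scores.__getitem__)
        tokens.append(next_id)
        if next_id == eos_id:
            break
    return tokens


def toy_scores(encoder_states: Sequence[float], tokens: List[int]) -> List[float]:
    """Hypothetical stand-in for a decoder: emits ids 2, 3, then EOS (id 1)."""
    vocab_size = 4
    script = [2, 3, 1]
    step = len(tokens) - 1  # number of tokens generated after the start token
    target = script[min(step, len(script) - 1)]
    return [1.0 if i == target else 0.0 for i in range(vocab_size)]


decoded = greedy_decode([0.1, 0.2], toy_scores, bos_id=0, eos_id=1)
print(decoded)
```

The loop stops either when the end-of-sequence id is produced or when `max_len` tokens have been generated, which is the usual safeguard against a decoder that never terminates.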
Running Nougat model with Python
You can run Nougat with Clarifai's Python SDK in just a few lines of code. To get started, sign up for Clarifai and get your Personal Access Token (PAT) by following the instructions here.
Export your PAT as an environment variable:
export CLARIFAI_PAT={your personal access token}
Check out the code below to run the model:
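The following is a minimal sketch rather than the official listing. It assumes the `clarifai` package is installed, that its `Model` class accepts a model URL, and that `predict_by_url` returns the generated text in the first output's text field. The model URL is assumed to be the canonical form of the demo link in this post, and the function name `extract_markup` is hypothetical.

```python
import os

# Assumption: canonical URL of the nougat-base model page on Clarifai.
MODEL_URL = "https://clarifai.com/facebook/nougat/models/nougat-base"


def extract_markup(image_url: str) -> str:
    """Send one document-page image to Nougat and return the generated markup."""
    # Fail early with a clear message if the token was not exported.
    if not os.environ.get("CLARIFAI_PAT"):
        raise RuntimeError("Set the CLARIFAI_PAT environment variable first")

    # Imported lazily so this module loads even where the SDK is not installed.
    from clarifai.client.model import Model

    model = Model(MODEL_URL)  # the SDK reads CLARIFAI_PAT from the environment
    prediction = model.predict_by_url(image_url, input_type="image")
    # Assumption: the markup is carried in the first output's raw text field.
    return prediction.outputs[0].data.text.raw
```

A call such as `print(extract_markup(page_image_url))`, where `page_image_url` points to a publicly reachable image of a document page, would print the markup returned for that page.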
Running Nougat model with Javascript
You can also run it with our Javascript client:
You can also run Nougat using other Clarifai client libraries such as Java, cURL, NodeJS, and PHP.
Model Demo in the Clarifai Platform:
Try out the Nougat model here: https://clarifai.com/facebook/nougat/models/nougat-base
Best Use Cases
The Nougat model has a wide range of applications in the field of document understanding and extraction. Some key use cases include:
- Research Paper Parsing: Nougat can accurately parse research papers, extracting text, tables, figures, and equations from document images. This capability is essential for making the information in research papers more accessible for various applications.
- Data Extraction: The model's ability to convert document images into structured text makes it useful for extracting valuable data from academic papers, which can be used for research, analysis, and data-driven decision-making.
- Summarization: Nougat can be integrated into text summarization pipelines to extract and summarize the content of research papers automatically, saving time and effort for researchers.
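As a rough illustration of the data extraction use case, the sketch below pulls section headings and display equations out of a markup string. It assumes the output uses Markdown-style `#` headings and `\[ ... \]` delimiters for display math. The function name `split_markup` and the sample text are made up for the example and are not real model output.

```python
import re
from typing import Dict, List


def split_markup(markup: str) -> Dict[str, List[str]]:
    """Collect headings and display equations from lightweight markup."""
    # Headings: one or more '#' at the start of a line, then the title text.
    headings = [
        m.group(1).strip()
        for m in re.finditer(r"^#+\s+(.*)$", markup, flags=re.MULTILINE)
    ]
    # Display math: anything between \[ and \], possibly spanning lines.
    equations = [
        m.group(1).strip()
        for m in re.finditer(r"\\\[(.*?)\\\]", markup, flags=re.DOTALL)
    ]
    return {"headings": headings, "equations": equations}


# Made-up sample in the assumed output style.
sample = "# Results\n\nThe loss is\n\\[ L = \\sum_i (y_i - x_i)^2 \\]\n\n## Discussion\n"
parts = split_markup(sample)
print(parts["headings"])
print(parts["equations"])
```

The extracted headings and equations can then be passed to downstream steps such as indexing or a summarization pipeline.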