Microsoft Research has released OptiMind, an AI-based system that converts natural language descriptions of complex decision problems into mathematical formulations that optimization solvers can execute. It targets a long-standing bottleneck in operations research, where translating business intent into mixed-integer linear programs typically requires expert modelers and days of work.
What OptiMind Is and What It Outputs
OptiMind-SFT is a specialized 20B parameter Mixture of Experts model in the gpt-oss transformer family. About 3.6B parameters are active per token, so inference cost is closer to that of a mid-sized model while retaining high capacity. The context length is 128,000 tokens, which allows long specifications and multi-step reasoning traces within a single request.
The model takes a natural language description of an optimization problem as input. The output is a mathematical formulation along with executable Python code that uses GurobiPy. The generated script defines decision variables, constraints, and an objective, calls the Gurobi solver, and prints the optimal objective value and decisions.
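The shape of such a generated script can be sketched with a toy knapsack problem. GurobiPy requires a licensed solver, so this stand-in enumerates the binary decisions with the standard library; the item data is invented for illustration, and the comments map each part to what a real generated GurobiPy script would contain (variables, constraints, objective, solve, report).

```python
from itertools import product

# Toy instance: choose items to maximize value under a weight budget.
values = [10, 13, 7, 8]   # hypothetical objective coefficients
weights = [4, 6, 3, 5]    # hypothetical constraint coefficients
budget = 10               # right-hand side of the single constraint

best_obj, best_x = None, None
# Decision variables: x[i] in {0, 1}. A generated script would instead call
# model.addVar(vtype=GRB.BINARY) for each item and model.addConstr(...) once.
for x in product((0, 1), repeat=len(values)):
    if sum(w * xi for w, xi in zip(weights, x)) <= budget:  # weight constraint
        obj = sum(v * xi for v, xi in zip(values, x))       # objective value
        if best_obj is None or obj > best_obj:
            best_obj, best_x = obj, x

# A generated script would call model.optimize() and read model.ObjVal / var.X.
print("optimal objective:", best_obj)
print("decisions:", best_x)
```

On this toy instance the search picks the first two items, matching what the solver-backed script would report.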
OptiMind acts as a formulation layer between domain experts and standard MILP solvers. It does not replace the solver; it generates the MILP that the solver will optimize.
Architecture, Training Setup, and Datasets
The base model is `openai/gpt-oss-20b`, fine-tuned into `microsoft/OptiMind-SFT` using cleaned optimization datasets. The architecture is a Mixture of Experts transformer, with routing that activates a subset of experts per token. The model is released under the MIT license.
Training uses 8 NVIDIA B200 GPUs, and inference and evaluation in the reference setup use 8 NVIDIA H100 GPUs. Reported fine-tuning time is about 8 hours. For general use, the team recommends at least 32 GB of GPU memory on hardware such as A100, H100, or B200.
For supervised fine-tuning, the research team constructs cleaned versions of OR-Instruct and OptMATH-Train. For testing, they use expert-validated and re-cleaned versions of IndustryOR, Mamo-Complex, and OptMATH. These benchmarks cover hard formulation tasks where existing models often reach only 20 to 50 percent accuracy on the original noisy versions.
Class-Based Error Analysis and Data Cleaning
A key technical idea in OptiMind is to combine optimization expertise with LLM training. The research team classifies problems from OR-Instruct and OptMATH into 53 seed classes, for example set cover, flow shop scheduling, or the traveling salesman problem.
For each class, they run the gpt-oss-20b base model on a sample of problems and select instances where the model output disagrees with the ground truth. Optimization experts inspect these items, identify the recurring formulation errors, and write short error descriptions and preventive hints. These hints describe correct constraints, variable bounds, or modeling techniques, such as the correct Miller-Tucker-Zemlin constraints for TSP.
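As a concrete example of the kind of modeling fact such a hint encodes, the standard Miller-Tucker-Zemlin subtour-elimination constraints for an n-city TSP introduce order variables $u_i$ alongside the arc variables $x_{ij}$ (this is the textbook formulation, not a quote from the OptiMind hints):

```latex
u_i - u_j + n\,x_{ij} \le n - 1
    \quad \text{for all } i \ne j,\; i, j \in \{2, \dots, n\},
\qquad 2 \le u_i \le n \ \text{for } i \in \{2, \dots, n\}
```

A classic formulation mistake these constraints guard against is also imposing them on the start city $i = 1$ or $j = 1$, which cuts off every feasible tour.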
The research team then uses a semi-automated pipeline. They regenerate solutions with a larger model that is prompted with the class-specific hints, apply majority voting across samples to improve solution quality, and drop items that remain inconsistent. They also detect missing parameters and ambiguous statements and regenerate problem descriptions when needed. The result is a cleaned training corpus that is better aligned with correct mathematical formulations.
Inference Pipeline, Hints, and Test-Time Scaling
At inference time, OptiMind behaves as a multi-stage system, not just a single prompt. The default pipeline first classifies each test instance into one of the 53 optimization classes used during error analysis. It then augments the prompt with the error summary and hint pairs associated with that class.
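This classify-then-augment step can be sketched as below. The class names, hint text, and keyword classifier are all hypothetical stand-ins; the actual pipeline uses the model itself to classify and draws on expert-written hints.

```python
# Hypothetical per-class hints, in the spirit of OptiMind's 53 seed classes.
CLASS_HINTS = {
    "tsp": "Use Miller-Tucker-Zemlin constraints; exclude the start city "
           "from the subtour-elimination inequalities.",
    "set_cover": "Minimize total cost; every element must be covered by "
                 "at least one selected set.",
}

def classify(problem_text):
    # Stand-in classifier: the real system prompts the LLM to pick a class.
    return "tsp" if "tour" in problem_text.lower() else "set_cover"

def build_prompt(problem_text):
    cls = classify(problem_text)
    return (f"Problem class: {cls}\n"
            f"Known pitfalls and hints: {CLASS_HINTS[cls]}\n\n"
            f"Problem:\n{problem_text}\n\n"
            f"Write the MILP formulation and the GurobiPy code.")

print(build_prompt("Find the shortest tour visiting all 5 warehouses."))
```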
The model then generates a reasoning trace, the mathematical formulation, and the GurobiPy code. When more compute is available, the system can apply self-consistency with majority voting. It generates multiple candidate scripts, executes them, and selects the solution that appears most often within set numerical tolerances.
A multi-turn correction mode can also be enabled. The system runs the generated code, captures solver logs or execution errors, feeds this feedback back to the model, and lets the model revise the formulation and code over several rounds. This fixes some modeling and coding errors at the cost of higher latency.
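The correction loop can be sketched as follows. Here `generate` is a stub that stands in for the model call, hard-wired to produce a buggy script first and a repaired one once it sees the error; the real loop would feed the captured log into the LLM instead.

```python
def run_script(code):
    """Execute a candidate script, returning (success, log-or-error)."""
    try:
        scope = {}
        exec(code, scope)
        return True, str(scope.get("result"))
    except Exception as exc:          # captured like a solver/runtime log
        return False, f"{type(exc).__name__}: {exc}"

def generate(feedback):
    # Stand-in for the LLM: first attempt has a bug, the retry repairs it.
    if feedback is None:
        return "result = objective_value"   # NameError on purpose
    return "result = 42"                    # 'revised' formulation and code

feedback = None
for round_no in range(3):                   # bounded number of rounds
    code = generate(feedback)
    ok, log = run_script(code)
    if ok:
        print(f"solved in round {round_no + 1}: {log}")
        break
    feedback = log                          # feed the error back to the model
```

Bounding the number of rounds keeps the latency cost of this mode predictable.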
Quantitative Gains on Optimization Benchmarks
On cleaned versions of IndustryOR, Mamo-Complex, and OptMATH, the OptiMind framework significantly improves solution accuracy. The fine-tuned model improves formulation accuracy by 20.7 percent across multiple optimization benchmarks, with further gains when test-time scaling techniques such as self-consistency and multi-turn feedback are applied.
Across these benchmarks, OptiMind improves absolute accuracy over the gpt-oss-20b base model and outperforms other open-source models of similar or larger size. It reaches performance that is competitive with proprietary frontier models such as o4-mini and GPT-5 under the evaluation settings.
These results depend on careful cleaning of both training and test data. The research team reports that many apparent model errors on the original benchmarks actually came from missing data, ambiguous descriptions, or incorrect reference solutions, and that re-cleaning can lift apparent accuracy for a fixed model from about 40 to 60 percent into the 70 to 90 percent range on the corrected sets.
Key Takeaways
- OptiMind is a 20B parameter Mixture of Experts transformer in the gpt-oss family that takes natural language optimization problems as input and outputs both a mathematical formulation and executable GurobiPy code, with about 3.6B parameters activated per token and a 128,000 token context length.
- The model is fine-tuned from `openai/gpt-oss-20b` on cleaned optimization datasets such as OR-Instruct and OptMATH, and evaluated on expert-validated benchmarks including IndustryOR and Mamo-Complex, focusing on mixed-integer linear programming formulations.
- OptiMind uses class-based error analysis and expert-written hints for 53 optimization classes, then applies these hints both in data cleaning and at inference time, which systematically reduces common modeling errors in generated MILPs.
- The framework improves formulation accuracy by 20.7 percent across multiple optimization benchmarks compared to the base model, and with test-time scaling techniques such as self-consistency and multi-turn feedback it reaches performance that is competitive with larger proprietary systems.
- OptiMind-SFT is released as `microsoft/OptiMind-SFT` on Hugging Face and as `microsoft-optimind-sft` in Azure AI Foundry, where it can be served via SGLang as an OpenAI-compatible endpoint, enabling practical integration into decision-support pipelines for supply chains, manufacturing, logistics, and scheduling.
Check out the Model Weights and Technical details.
Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence media platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts over 2 million monthly views, illustrating its popularity among audiences.