
Automated metrics for machine translation evaluation have come a long way, with neural metrics like COMET (Rei et al. 2020) and BLEURT (Sellam et al. 2020) leading the charge in improving translation quality assessment. These metrics have shown significant advances, particularly in their ability to correlate with human judgments, surpassing traditional metrics like BLEU (Papineni et al. 2002). Nonetheless, these metrics, while powerful, have their limitations: they provide only a single sentence-level score, leaving translation errors hidden beneath the surface.
In an era where large language models (LLMs) have revolutionized natural language processing, researchers have started to use them for more granular translation error assessment (Fernandes et al. 2023, Kocmi et al. 2023). This involves not just evaluating the translation as a whole but also pinpointing and categorizing specific errors, providing a deeper and more insightful view into translation quality.
This is where XCOMET makes its grand entrance. XCOMET is a cutting-edge, open-source metric designed to bridge the gap between these two evaluation approaches. It brings together the best of both worlds by combining sentence-level evaluation and error span detection capabilities. The result? State-of-the-art performance across all types of evaluation, from sentence-level to system-level, while also highlighting and categorizing error spans, enriching the quality assessment process.
But what sets XCOMET apart from the rest? Here’s a closer look at what makes it a game-changer:
- Detailed Error Analysis: Unlike traditional metrics that offer just one score, XCOMET digs deeper by identifying and categorizing specific translation errors. This fine-grained approach provides a more comprehensive understanding of the quality of the translation.
- Strong Performance: XCOMET has been rigorously tested and outperforms widely used neural metrics and generative LLM-based approaches. It sets a new standard for evaluation metrics, demonstrating its superiority in all relevant evaluation vectors.
- Robustness and Reliability: The XCOMET suite of metrics excels at identifying critical errors and hallucinations, making it a reliable choice for evaluating translation quality, even in challenging scenarios.
- Versatility: XCOMET is a unified metric that accommodates all modes of evaluation, whether you have a reference, need quality estimation, or even when a source isn’t provided. This flexibility sets it apart and makes it a valuable tool for translation evaluation.
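To make the idea of "highlighting and categorizing error spans" concrete, here is a minimal sketch of how span-level output could be represented and rendered next to a translation. The `ErrorSpan` class, the character-offset convention, and the `minor`/`major`/`critical` labels are assumptions made for illustration; they are not XCOMET's actual output format.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ErrorSpan:
    """Hypothetical container for one detected error (not XCOMET's own type)."""
    start: int     # character offset where the error begins (inclusive)
    end: int       # character offset where the error ends (exclusive)
    severity: str  # assumed label set: "minor", "major", "critical"


def highlight(translation: str, spans: List[ErrorSpan]) -> str:
    """Wrap every error span in a [severity: text] marker."""
    pieces, cursor = [], 0
    for span in sorted(spans, key=lambda s: s.start):
        # Reject empty, out-of-range, or overlapping spans.
        if span.start < cursor or span.start >= span.end or span.end > len(translation):
            raise ValueError(f"invalid or overlapping span: {span}")
        pieces.append(translation[cursor:span.start])
        pieces.append(f"[{span.severity}: {translation[span.start:span.end]}]")
        cursor = span.end
    pieces.append(translation[cursor:])
    return "".join(pieces)


def severity_counts(spans: List[ErrorSpan]) -> Counter:
    """Count how many spans fall into each severity category."""
    return Counter(span.severity for span in spans)


# Illustrative input only; these spans are hand-written, not model output.
text = "The cat sat on the mat yesterday."
spans = [ErrorSpan(4, 7, "major"), ErrorSpan(23, 32, "minor")]
print(highlight(text, spans))
print(dict(severity_counts(spans)))
```

A single sentence-level score would collapse all of this into one number; keeping the spans lets a reviewer see where the problems are and how severe each one is.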
How Does XCOMET Compare with AutoMQM and Other Metrics?
Let’s delve into the results to see just how impressive XCOMET really is. We’ve conducted thorough evaluations, including a comparison with other widely known metrics, among them recent LLM-based metrics. Take a look at these two tables:


These tables highlight the exceptional performance of XCOMET. In segment-level evaluations, XCOMET outshines other widely known metrics, including recent LLM-based metrics such as GEMBA-GPT4. When compared to AutoMQM based on GPT-4 at the word level, XCOMET maintains its superiority, even when used without a reference in a Quality Estimation scenario!
It’s worth noting that AutoMQM based on GPT-4, while impressive at the word level, relies on large and costly LLMs, limiting its accessibility and applicability. XCOMET, on the other hand, outperforms GPT-4 and thrives with cost-effective LLMs like GPT-3, making it a versatile choice for researchers and practitioners in the field.
To make XCOMET accessible to the community, we have released two evaluation models: XCOMET-XL, featuring 3.5 billion parameters, and XCOMET-XXL, with an impressive 10.7 billion parameters. These models are available through the COMET framework and the Hugging Face Model Hub.
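As a rough sketch of what using the released models could look like, the snippet below prepares inputs for the three evaluation modes (source and reference, source only for quality estimation, reference only) and wraps scoring in a helper. It assumes the `unbabel-comet` Python package with its `download_model`, `load_from_checkpoint`, and `predict` calls, and the Hub identifier `Unbabel/XCOMET-XL`. The helper names `build_sample` and `score_with_xcomet`, and the rule that at least a source or a reference must be given, are hypothetical choices made for this example.

```python
from typing import Dict, List, Optional


def build_sample(mt: str, src: Optional[str] = None, ref: Optional[str] = None) -> Dict[str, str]:
    """Build one input record, keeping only the fields that are available.

    src + ref -> reference-based evaluation with source
    src only  -> quality estimation (no reference)
    ref only  -> reference-only evaluation (no source)
    """
    if src is None and ref is None:
        # Assumption for this sketch: the translation alone is not scorable.
        raise ValueError("provide at least a source or a reference")
    sample = {"mt": mt}
    if src is not None:
        sample["src"] = src
    if ref is not None:
        sample["ref"] = ref
    return sample


def score_with_xcomet(samples: List[Dict[str, str]],
                      model_name: str = "Unbabel/XCOMET-XL",
                      batch_size: int = 8,
                      gpus: int = 0):
    """Download the checkpoint and score the samples (hypothetical wrapper)."""
    # Imported lazily so build_sample works without the package installed.
    from comet import download_model, load_from_checkpoint

    model = load_from_checkpoint(download_model(model_name))
    return model.predict(samples, batch_size=batch_size, gpus=gpus)
```

Calling `score_with_xcomet([build_sample("Hallo Welt", src="Hello world")])` would download a multi-billion-parameter checkpoint, so a GPU is advisable. The returned object is expected to expose per-segment scores, a system-level score, and the detected error spans, though the exact attribute names should be checked against the COMET documentation.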
Connecting to Unbabel Quality Intelligence
XCOMET shares its foundational technology with UnbabelQI. The key differences are UnbabelQI’s use of proprietary MQM annotations, spanning up to 30 languages, and its ability to adapt seamlessly to diverse customer domains and expectations. While XCOMET excels across various evaluation scenarios, UnbabelQI is primarily tailored for reference-free quality estimation. In real-world, ‘in-the-wild’ situations, reference translations are less common, making UnbabelQI an ideal choice for assessing translation quality in such scenarios.
In a world where language and translation are more critical than ever, Unbabel is committed to revolutionizing the way we evaluate translation quality, providing a level of insight and performance that was previously unattainable.
To learn more about Unbabel’s QE capabilities and LangOps, visit our platform and our UnbabelQI demo here.