Mistral AI, in collaboration with NVIDIA, has introduced Mistral NeMo, a 12-billion-parameter model. It features a context window of up to 128,000 tokens and is noted for top-tier performance in reasoning, world knowledge, and coding accuracy within its size category.
The partnership between Mistral AI and NVIDIA has produced a model that emphasizes ease of adoption as well as raw performance: because Mistral NeMo relies on a standard architecture, it can serve as a drop-in replacement in systems currently running Mistral 7B.
To foster wider adoption and encourage further research, Mistral AI has released both the pre-trained base and instruction-tuned checkpoints under the Apache 2.0 license. This open-source approach is expected to be highly attractive to both researchers and enterprises, potentially accelerating the integration of the model into various applications.
A standout feature of Mistral NeMo is its quantization-aware training, which facilitates FP8 inference without sacrificing performance. This capability is particularly beneficial for organizations aiming to deploy large language models efficiently.
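To illustrate the core idea behind FP8 inference (not Mistral's actual training pipeline, which simulates quantization during training), the sketch below casts a weight tensor to PyTorch's float8_e4m3fn format with a per-tensor scale, the basic operation that FP8 deployment stacks build on. The scaling scheme here is a simplified assumption; production stacks use calibrated scales and fused kernels.

```python
import torch

def quantize_fp8(w: torch.Tensor):
    """Quantize a tensor to FP8 (e4m3) with a per-tensor scale.

    Illustrative sketch only: real FP8 inference engines use
    calibrated per-channel scales and hardware FP8 matmuls.
    """
    fp8_max = torch.finfo(torch.float8_e4m3fn).max  # 448.0 for e4m3
    scale = w.abs().max().clamp(min=1e-12) / fp8_max
    w_fp8 = (w / scale).to(torch.float8_e4m3fn)  # 1 byte per element
    return w_fp8, scale

def dequantize_fp8(w_fp8: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return w_fp8.to(torch.float32) * scale

w = torch.randn(4096, 4096)
w_q, s = quantize_fp8(w)
err = (dequantize_fp8(w_q, s) - w).abs().mean().item()
print(f"storage: {w_q.element_size()} byte/elem, mean abs error: {err:.4f}")
```

Quantization-aware training matters precisely because this rounding introduces error: a model that has seen the quantized arithmetic during training loses little or no accuracy when deployed this way.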
Performance comparisons provided by Mistral AI highlight the Mistral NeMo base model’s superiority over two recent open-source pre-trained models: Gemma 2 9B and Llama 3 8B.
Mistral AI emphasized the model’s suitability for global, multilingual applications. It’s proficient in function calling, boasts a large context window, and excels in languages including English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, and Hindi.
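As a sketch of what function calling looks like in practice, the snippet below sends a tool definition to Mistral's chat-completions endpoint with the open-mistral-nemo model. The get_weather tool is a hypothetical example, and the multilingual prompt exercises the model's French support; the request shape follows the documented tools format, but verify details against Mistral's current API reference.

```python
import os
import requests

# Hypothetical tool definition for illustration purposes only.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = requests.post(
    "https://api.mistral.ai/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}"},
    json={
        "model": "open-mistral-nemo",
        # French prompt: "What's the weather like in Paris?"
        "messages": [{"role": "user", "content": "Quel temps fait-il à Paris ?"}],
        "tools": tools,
        "tool_choice": "auto",
    },
    timeout=30,
)
# The model should respond with a tool_calls entry naming get_weather.
print(resp.json()["choices"][0]["message"])
```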
“This is a new step toward bringing frontier AI models to everyone’s hands in all languages that form human culture,” Mistral AI stated.
Mistral NeMo also introduces Tekken, a new tokenizer inspired by Tiktoken. Tekken, trained on over 100 languages, improves compression efficiency for both natural language text and source code, outperforming the SentencePiece tokenizer used in previous Mistral models. According to the company, Tekken is about 30% more efficient in compressing source code and several major languages, with even greater gains for Korean and Arabic.
Furthermore, Mistral AI claims that Tekken surpasses the Llama 3 tokenizer in text compression for around 85% of all languages, giving Mistral NeMo a potential advantage in multilingual applications.
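A rough way to check the compression claims yourself is to tokenize the same text with the Tekken-based NeMo tokenizer and the SentencePiece-based Mistral 7B tokenizer and compare token counts: fewer tokens for the same text means better compression. The sketch below assumes the Hugging Face repo IDs shown and that you have accepted any access terms on the model pages.

```python
from transformers import AutoTokenizer

# Repo IDs are assumptions; check the mistralai org on Hugging Face.
tekken = AutoTokenizer.from_pretrained("mistralai/Mistral-Nemo-Instruct-2407")
spm = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")

samples = {
    "English": "The quick brown fox jumps over the lazy dog.",
    "Korean": "빠른 갈색 여우가 게으른 개를 뛰어넘는다.",
    "Python": "def fib(n):\n    return n if n < 2 else fib(n - 1) + fib(n - 2)",
}

for name, text in samples.items():
    a = len(tekken.encode(text))
    b = len(spm.encode(text))
    print(f"{name}: Tekken {a} tokens vs SentencePiece {b} tokens")
```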
Both the base and instruction-tuned model weights are now available on Hugging Face. Developers can start experimenting with Mistral NeMo using the mistral-inference tool and customize it with mistral-finetune. On Mistral's La Plateforme, the model is accessible under the name open-mistral-nemo.
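For local experimentation, a minimal mistral-inference session might look like the sketch below. It assumes the instruct checkpoint has been downloaded to a local directory (the path is a placeholder) and that the mistral_inference and mistral_common APIs match the versions current at release.

```python
from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_inference.generate import generate
from mistral_inference.transformer import Transformer

# Placeholder path: point this at your downloaded Mistral NeMo checkpoint.
model_path = "/path/to/mistral-nemo-instruct"

# NeMo ships its Tekken tokenizer alongside the weights.
tokenizer = MistralTokenizer.from_file(f"{model_path}/tekken.json")
model = Transformer.from_folder(model_path)

request = ChatCompletionRequest(
    messages=[UserMessage(content="Explain FP8 inference in one sentence.")]
)
tokens = tokenizer.encode_chat_completion(request).tokens

out_tokens, _ = generate(
    [tokens], model, max_tokens=128, temperature=0.35,
    eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id,
)
print(tokenizer.instruct_tokenizer.tokenizer.decode(out_tokens[0]))
```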
Additionally, Mistral NeMo is available as an NVIDIA NIM inference microservice on ai.nvidia.com, thanks to the collaboration with NVIDIA. This integration could simplify deployment for organizations already utilizing NVIDIA’s AI ecosystem.
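NIM microservices expose an OpenAI-compatible API, so calling the hosted model can be as simple as the sketch below. The base URL and model identifier are assumptions to confirm against NVIDIA's API catalog.

```python
from openai import OpenAI

# Base URL and model ID are assumptions; verify on NVIDIA's API catalog.
client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",
    api_key="YOUR_NVIDIA_API_KEY",  # placeholder credential
)

completion = client.chat.completions.create(
    model="nv-mistralai/mistral-nemo-12b-instruct",
    messages=[{"role": "user", "content": "Summarize Mistral NeMo in two sentences."}],
    max_tokens=128,
)
print(completion.choices[0].message.content)
```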
The launch of Mistral NeMo marks a significant advancement in making sophisticated AI models more accessible. By combining high performance, multilingual capabilities, and open-source availability, Mistral AI and NVIDIA aim to provide a versatile tool for a broad range of AI applications across various industries and research domains.