Mixtral
Description
Mixtral 8x7B is a high-quality sparse mixture-of-experts (SMoE) model with open weights. Mixtral outperforms Llama 2 70B on most benchmarks with 6x faster inference, and it matches or outperforms GPT-3.5 on most standard benchmarks. <br>Paper: https://arxiv.org/pdf/2401.04088.pdf <br>News: https://mistral.ai/news/mixtral-of-experts/
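The Mixtral paper describes routing each token to the top 2 of 8 expert feed-forward networks, which is what makes inference cheap relative to the total parameter count. Below is a minimal, illustrative PyTorch sketch of such a sparse MoE layer; the dimensions and the `SparseMoELayer` name are hypothetical choices for the example, not Mixtral's actual implementation or sizes.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoELayer(nn.Module):
    """Illustrative sparse MoE feed-forward layer: a router selects the
    top-2 of 8 experts per token, so only a fraction of the layer's total
    parameters is used for any given token (hypothetical dimensions)."""

    def __init__(self, d_model=512, d_ff=2048, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):  # x: (tokens, d_model)
        logits = self.router(x)                        # (tokens, n_experts)
        weights, idx = torch.topk(logits, self.top_k)  # top-2 experts per token
        weights = F.softmax(weights, dim=-1)           # renormalize over the selected experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e in range(len(self.experts)):
                mask = idx[:, k] == e                  # tokens routed to expert e in slot k
                if mask.any():
                    out[mask] += weights[mask, k, None] * self.experts[e](x[mask])
        return out

layer = SparseMoELayer()
tokens = torch.randn(4, 512)
print(layer(tokens).shape)  # torch.Size([4, 512])
```

Because only 2 of the 8 experts run per token, each forward pass touches roughly a quarter of the expert parameters, which is the intuition behind "uses the compute of a much smaller dense model."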
Related Tools
DeepSeek-R1
DeepSeek's first-generation reasoning models are DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrates remarkable reasoning performance.
DeepSeek-V3
A strong Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated for each token.
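A quick back-of-the-envelope check of the active-parameter fraction implied by those two figures:

```python
total_params = 671e9   # total parameters reported for DeepSeek-V3
active_params = 37e9   # parameters activated per token
print(f"{active_params / total_params:.1%} of parameters active per token")  # ~5.5%
```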
Qwen3
Qwen3 is the large language model series developed by the Qwen team at Alibaba Cloud.
Gemma 4
Gemma 4 is Google's latest open-weight large language model series, built from the research and technology behind Gemini, offering improved performance, longer context windows, and better multilingual support.
Llama 3
Llama 3 is a large language model developed by Meta AI and the successor to Meta's Llama 2. <br>Try it online: [huggingface.co/chat/models/meta-llama/Meta-Llama-3-70B-Instruct](https://huggingface.co/chat/models/meta-llama/Meta-Llama-3-70B-Instruct)
Grok-1
A 314B-parameter Mixture-of-Experts language model open-sourced by xAI.