If you haven’t heard of “Qwen2,” that’s understandable, but it should all change starting today with a stunning new release that takes the crown from all others on a subject important to software development, engineering, and STEM fields the world over: math.
What is Qwen2?
With so many new AI models emerging from startups and tech companies, it can be hard even for those paying close attention to the space to keep up.
Qwen2 is an open-source large language model (LLM) rival to OpenAI’s GPTs, Meta’s Llamas, and Anthropic’s Claude family, fielded by Alibaba Cloud, the cloud computing division of the Chinese e-commerce giant Alibaba.
Alibaba Cloud began releasing its own LLMs under the sub-brand name “Tongyi Qianwen,” or Qwen for short, in August 2023, including the open-source models Qwen-7B, Qwen-72B, and Qwen-1.8B, with 7 billion, 72 billion, and 1.8 billion parameters respectively (the settings that ultimately determine each model’s capability). These were followed by multimodal variants including Qwen-Audio and Qwen-VL (for vision inputs), and finally Qwen2 back in early June 2024, in five variants: 0.5B, 1.5B, 7B, 14B, and 72B. Altogether, Alibaba has released more than 100 AI models of varying sizes and capabilities in the Qwen family over this period.
And customers, particularly in China, have taken note, with more than 90,000 enterprises reported to have adopted Qwen models in their operations within the first year of availability.
While many of these models boasted state-of-the-art (or close to it) performance on their release dates, the LLM and AI model race moves so fast around the world that they were quickly eclipsed by other open- and closed-source rivals. Until now.
What is Qwen2-Math?
Today, Alibaba Cloud’s Qwen team took the wraps off Qwen2-Math, a new “series of math-specific large language models” designed for the English language. The most powerful of these outperform all others in the world, including the vaunted OpenAI GPT-4o, Anthropic Claude 3.5 Sonnet, and even Google’s Math-Gemini Specialized 1.5 Pro.
Specifically, the 72-billion-parameter Qwen2-Math-72B-Instruct variant clocks in at 84% on the MATH benchmark for LLMs, which poses 12,500 “challenging competition mathematics problems,” and word problems at that, which can be notoriously difficult for LLMs to complete (see the test of which is greater: 9.9 or 9.11).
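To see why that seemingly trivial question trips up LLMs, consider a short Python sketch (an illustration of the ambiguity, not code from the Qwen benchmarks): read as decimal numbers, 9.9 is larger, but read as software-version numbers, 9.11 comes after 9.9.

```python
# The "9.9 vs. 9.11" trap: the answer depends on how you read the tokens.

# Reading 1: decimal numbers. 9.9 = 9.90, which is larger than 9.11.
as_numbers = 9.9 > 9.11

# Reading 2: version numbers (major.minor). Minor version 11 > minor version 9,
# so "9.11" is the later release. This is one plausible source of LLM confusion.
as_versions = (9, 11) > (9, 9)

print(as_numbers)   # True under the numeric reading
print(as_versions)  # True under the version-number reading
```

Both comparisons print True, which is exactly the point: an LLM that has seen far more version strings than decimals in its training data can latch onto the wrong reading.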
Here’s an example of a problem included in the MATH dataset:
Candidly, it’s not one I could answer on my own, and certainly not within seconds, but Qwen2-Math apparently can, most of the time.
Perhaps unsurprisingly, then, Qwen2-Math-72B-Instruct also excels and outperforms the competition on the grade-school math benchmark GSM8K (8,500 questions), scoring 96.7%, and on collegiate-level math (the College Math benchmark) at 47.8% as well.
Notably, however, Alibaba did not include Microsoft’s new Orca-Math model, released in February 2024, in its benchmark charts, and that 7-billion-parameter model (a fine-tuned variant of Mistral-7B) comes close to the Qwen2-Math-7B-Instruct model on GSM8K: 86.81% for Orca-Math vs. 89.9% for Qwen2-Math-7B-Instruct.
Yet even the smallest version of Qwen2-Math, the 1.5-billion-parameter model, performs admirably and comes close to a model more than four times its size, scoring 84.2% on GSM8K and 44.2% on College Math.
What are math AI models good for?
While initial usage of LLMs has focused on their utility in chatbots (and, in the case of enterprises, on answering employee or customer questions, drafting documents, and parsing information more quickly), math-focused LLMs seek to provide more reliable tools for those looking to regularly solve equations and work with numbers.
Ironically, given that all code rests on mathematical fundamentals, LLMs have so far not been as reliable as earlier eras of AI and machine learning, or even older software, at solving math problems.
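That reliability gap is easy to demonstrate: a few lines of ordinary Python (no model involved, and purely my illustration rather than anything from the Qwen release) compute an exact answer deterministically, every run, which is the bar math-specific LLMs are chasing.

```python
from fractions import Fraction

# Conventional software does exact, deterministic arithmetic: it never
# "hallucinates" an answer. A typical word-problem step: what fraction
# of 3/4 is 1/2? Answer: (1/2) / (3/4) = 2/3, exactly, every time.
result = Fraction(1, 2) / Fraction(3, 4)
print(result)  # 2/3
```

An LLM, by contrast, samples its answer token by token, so the same question can yield different (and sometimes wrong) results across runs; the promise of models like Qwen2-Math is to close that gap on problems too open-ended for a calculator.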
The Alibaba researchers behind Qwen2-Math state that they “hope that Qwen2-Math can contribute to the community for solving complex mathematical problems.”
The custom licensing terms for enterprises and individuals seeking to use Qwen2-Math fall short of purely open source, requiring that any commercial deployment with more than 100 million monthly active users obtain additional permission and a license from the creators. But this is still an extremely permissive upper limit, and it would allow many startups, SMBs, and even some large enterprises to use Qwen2-Math commercially (to make money), essentially for free.