A new "reasoning" AI model, QwQ-32B-Preview, has emerged as one of the few genuine rivals to OpenAI's o1. Developed by Alibaba's Qwen team, QwQ-32B-Preview features 32.5 billion parameters and can process prompts of up to roughly 32,000 words, and it outperforms OpenAI's o1-preview and o1-mini on certain benchmarks. (Parameter count is a rough indicator of a model's problem-solving ability, with more parameters generally correlating with better performance; OpenAI has not disclosed the parameter counts for its models.)
According to Alibaba's testing, QwQ-32B-Preview surpasses OpenAI's o1 models on the AIME and MATH benchmarks. AIME is based on the American Invitational Mathematics Examination, a challenging competition for high-school students, while MATH is a collection of competition-style word problems.
QwQ-32B-Preview is capable of solving logic puzzles and answering challenging math questions, thanks to its reasoning abilities. However, it is not without its flaws. Alibaba acknowledged in a blog post that the model might unexpectedly switch languages, get stuck in loops, and struggle with tasks requiring “common sense reasoning.”
Unlike most AI models, QwQ-32B-Preview and other reasoning models can effectively fact-check themselves, which helps them avoid many of the pitfalls that typically trip up AI systems, though at the cost of longer response times. Like OpenAI's o1, QwQ-32B-Preview reasons through tasks by planning ahead and executing a series of actions that help it derive answers.
Available for download and use on the AI development platform Hugging Face, QwQ-32B-Preview shares similarities with the recently launched DeepSeek reasoning model, particularly in its cautious approach to certain political subjects. As both Alibaba and DeepSeek are Chinese companies, they are subject to oversight from China’s internet regulator to ensure their models' responses align with “core socialist values.” As a result, many Chinese AI systems, including QwQ-32B-Preview and DeepSeek, avoid engaging with topics that could provoke regulatory backlash, such as discussions on the Xi Jinping regime.
When asked, "Is Taiwan a part of China?", QwQ-32B-Preview responded affirmatively, describing Taiwan as "inalienable," a stance aligned with China's ruling party but at odds with the perspective held by much of the world. In contrast, prompts regarding Tiananmen Square received a non-response from the model.
QwQ-32B-Preview is available under an Apache 2.0 license, making it "openly" accessible for commercial use. However, only certain components of the model have been released, preventing full replication or in-depth insight into its inner workings. While the concept of "openness" in AI models is still evolving, there is a general spectrum ranging from more closed models (offering only API access) to more open ones (disclosing the model, weights, and data). QwQ-32B-Preview falls somewhere in the middle of this continuum.
The growing interest in reasoning models comes amid increasing scrutiny of "scaling laws"—the theory that adding more data and computational power to a model will continuously improve its capabilities. Recent reports suggest that models from major AI labs, including OpenAI, Google, and Anthropic, are not showing the dramatic improvements they once did.
This has sparked a race for new AI approaches, architectures, and development techniques, one of which is test-time compute. Also known as inference compute, this approach gives models extra processing time to complete tasks and is integral to models like o1 and QwQ-32B-Preview. Major labs beyond OpenAI and the Chinese companies are now betting on test-time compute as the future: a recent report from The Information indicates that Google has expanded its internal team focused on reasoning models to around 200 people and has bolstered the computational resources behind the effort.
(Source: TechCrunch)
BD-Pratidin English/Mazdud