Alibaba introduces latest enhancements to its ‘Qwen 2.5’ visual-language series

by Azunta Gaviola - 1 year ago

Beijing, China – Chinese tech company Alibaba has announced the launch of Qwen2.5-VL, an upgraded version of its earlier visual-language model, Qwen2-VL.

According to the company, the multimodal model is released as open source in sizes ranging from 3 billion to 72 billion parameters, with both base and instruction-tuned variants.

The Qwen2.5-VL-72B-Instruct model is also available on the Qwen Chat platform, and the entire Qwen2.5-VL series is hosted on Hugging Face and Alibaba’s ModelScope.

In terms of capabilities, Qwen2.5-VL can interpret complex visual elements, including text, diagrams, charts, graphics, and image layouts. It can also understand videos longer than an hour and answer video-related questions while identifying specific segments down to the exact second.

In addition, the model can generate structured outputs, such as JSON, enabling the automatic extraction and organisation of data from invoices, forms, and tables. This capability can streamline processes in the finance and legal sectors.
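
As an illustration of how such structured output might be consumed downstream, the minimal sketch below parses a JSON string of the kind a visual-language model could return for an invoice and checks it for a few required fields. The field names (`invoice_number`, `total`, `line_items`) and the sample reply are hypothetical and are not taken from Alibaba's documentation; no Qwen API is called.

```python
import json

# Hypothetical required fields for an extracted invoice; not a Qwen-defined schema.
REQUIRED_FIELDS = {
    "invoice_number": str,
    "total": (int, float),
    "line_items": list,
}


def parse_invoice(model_output: str) -> dict:
    """Parse a model's JSON reply and verify the assumed fields and types."""
    # json.loads raises json.JSONDecodeError (a ValueError subclass) on malformed JSON.
    data = json.loads(model_output)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in data:
            raise ValueError(f"missing field: {name}")
        if not isinstance(data[name], expected_type):
            raise ValueError(f"wrong type for field: {name}")
    return data


# Made-up reply standing in for real model output.
sample_reply = (
    '{"invoice_number": "INV-001", "total": 125.5, '
    '"line_items": [{"description": "Widget", "amount": 125.5}]}'
)
invoice = parse_invoice(sample_reply)
print(invoice["invoice_number"], invoice["total"])
```

Validating the reply before use is a design choice rather than a requirement: model output is not guaranteed to be well-formed, so a check of this kind keeps malformed records out of downstream finance or legal workflows.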

Meanwhile, Qwen2.5-VL can also function as a visual agent that carries out tasks on computers and mobile devices, such as checking the weather or booking flights, with the help of a guiding tool.

In particular, the flagship model, Qwen2.5-VL-72B-Instruct, has been evaluated on a series of benchmarks covering domains and tasks including document and diagram reading, general visual question answering, college-level math, video understanding, and visual agent tasks.

To this end, researchers have improved the model’s multimodal capabilities by implementing dynamic resolution and frame rate training for enhanced video understanding. They have also introduced a visual encoder that integrates window attention within a dynamic Vision Transformer (ViT) framework to accelerate both training and inference.
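
To give a rough sense of why window attention can speed things up, the sketch below counts query-key pairs for a grid of image patches under full attention and under attention restricted to non-overlapping square windows. It is a generic illustration of the windowing idea, not Qwen2.5-VL's actual encoder; the grid size, window size, and function names are assumptions chosen for the example.

```python
from typing import Dict, List, Tuple


def window_ids(height: int, width: int, window: int) -> List[int]:
    """Assign each patch in a height x width grid to a non-overlapping square window."""
    windows_per_row = -(-width // window)  # ceiling division
    return [
        (row // window) * windows_per_row + (col // window)
        for row in range(height)
        for col in range(width)
    ]


def attention_pairs(height: int, width: int, window: int) -> Tuple[int, int]:
    """Return (full_pairs, windowed_pairs) of query-key pairs for one attention layer."""
    ids = window_ids(height, width, window)
    n = len(ids)
    full_pairs = n * n  # every patch attends to every patch
    counts: Dict[int, int] = {}
    for wid in ids:
        counts[wid] = counts.get(wid, 0) + 1
    # Within a window, every patch attends only to patches in the same window.
    windowed_pairs = sum(c * c for c in counts.values())
    return full_pairs, windowed_pairs


# Assumed example: a 32 x 32 patch grid split into 8 x 8 windows.
full, windowed = attention_pairs(height=32, width=32, window=8)
print(full, windowed, full // windowed)
```

In this assumed configuration, restricting attention to windows cuts the number of query-key pairs by a factor of 16. The saving grows with image size, because full attention scales quadratically with the number of patches while windowed attention scales linearly for a fixed window size.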

These innovations position the model as a solution for diverse multimodal applications across various fields.

Apart from these developments, Alibaba has also launched the latest version of the Qwen large language model, known as Qwen2.5-1M. This open-source iteration is distinguished by its ability to process long-context inputs of up to 1 million tokens.

Included in the release are two instruction-tuned models, Qwen2.5-7B-Instruct-1M and Qwen2.5-14B-Instruct-1M, with 7 billion and 14 billion parameters respectively. These models have been made available on Hugging Face.

Alibaba has also released on GitHub a corresponding inference framework optimised for long-context processing. This framework is tailored to help developers deploy the Qwen2.5-1M series more cost-effectively.

By leveraging techniques such as length extrapolation and sparse attention, the framework can process 1-million-token inputs 3 to 7 times faster than traditional approaches, offering a more efficient option for developing applications that require long-context processing.

Alibaba also recently announced Qwen2.5-Max, a next-generation AI model that it claims surpasses several top AI systems in key performance benchmarks. The model is now accessible to developers via Alibaba Cloud services and Alibaba’s conversational AI platform, Qwen Chat.
