
Alibaba Cloud now offers ‘free access’ to its video generation models

by Azunta Gaviola - 1 year ago


Singapore – Aiming to foster open-source innovation, Alibaba Cloud has recently announced that its video generation models are now accessible for free.

The strategic move, according to the company, involves the release of four models from its Wan2.1 series. These include 14-billion-parameter and 1.3-billion-parameter versions of its latest video foundation model, Tongyi Wanxiang (Wan). The models are designed to generate high-quality images and videos from both text and image inputs.

The four models, T2V-14B, T2V-1.3B, I2V-14B-720P and I2V-14B-480P, can be downloaded from Alibaba Cloud’s AI model community, ModelScope, and from the collaborative AI platform Hugging Face, making them accessible to academics, researchers, and commercial institutions worldwide.

The Wan2.1 series, introduced earlier this year, is the first video generation model to support text effects in both Chinese and English. This model series creates realistic visuals by accurately capturing complex movements, improving pixel quality, adhering to physical principles, and refining instruction execution.

Its precision in following instructions has further propelled Wan2.1 to the top of the VBench leaderboard, a comprehensive benchmark suite for video generative models. It is also the only open-source video generation model among the top five on the VBench leaderboard hosted on Hugging Face.

Training video foundation models requires massive computational resources and large volumes of high-quality data. By making already-trained models openly available, Alibaba Cloud lowers these entry barriers, allowing businesses to use AI for cost-effective, customised visual content creation.

The T2V-14B model is better suited to creating high-quality visuals with substantial motion dynamics, while the T2V-1.3B model strikes a balance between generation quality and computational power, making it a good fit for a broad range of developers conducting secondary development and academic research.

Beyond text-to-video generation, the I2V-14B-720P and I2V-14B-480P models offer image-to-video capabilities. Users need only upload an image and a brief text prompt to produce dynamic videos, and the platform accepts normal-sized images of any dimensions.
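For developers who want to try the models, the choice among the four variants described above can be sketched in a few lines of Python. This is an illustrative sketch only: the helper names (`choose_variant`, `download_variant`) are hypothetical, the Hugging Face repository IDs are assumptions not stated in the announcement and should be checked against the model pages, and the `lightweight` flag is simply a stand-in for the quality-versus-compute trade-off mentioned above.

```python
# Assumed Hugging Face repo IDs for the four released Wan2.1 variants.
# These are NOT given in the announcement; verify them before use.
WAN21_REPOS = {
    "T2V-14B": "Wan-AI/Wan2.1-T2V-14B",
    "T2V-1.3B": "Wan-AI/Wan2.1-T2V-1.3B",
    "I2V-14B-720P": "Wan-AI/Wan2.1-I2V-14B-720P",
    "I2V-14B-480P": "Wan-AI/Wan2.1-I2V-14B-480P",
}


def choose_variant(task: str, resolution: str = "480P", lightweight: bool = False) -> str:
    """Pick a variant name for a task: "t2v" (text-to-video) or "i2v" (image-to-video)."""
    if task == "t2v":
        # The 1.3B model trades some quality for lower compute needs.
        return "T2V-1.3B" if lightweight else "T2V-14B"
    if task == "i2v":
        # Image-to-video was released only as 14B models at 480P and 720P.
        if resolution not in ("480P", "720P"):
            raise ValueError(f"unsupported resolution: {resolution}")
        return f"I2V-14B-{resolution}"
    raise ValueError(f"unknown task: {task}")


def download_variant(variant: str, target_dir: str) -> str:
    """Download the chosen variant's files and return the local path."""
    # Imported lazily so the selection logic works without the package installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=WAN21_REPOS[variant], local_dir=target_dir)
```

A researcher with limited hardware might call `choose_variant("t2v", lightweight=True)` and pass the result to `download_variant`, which needs the `huggingface_hub` package and network access.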

According to the firm, Alibaba Cloud was among the first major global tech companies to open-source its self-developed large-scale AI model, introducing its first open model, Qwen (Qwen-7B), in August 2023.

Since then, Qwen models have consistently ranked at the top of the Hugging Face Open LLM Leaderboards, demonstrating performance on par with leading global AI models across multiple benchmarks.