Chinese AI company DeepSeek has introduced its latest multimodal AI model, Janus-Pro-7B, which both interprets images and generates them from text, positioning it as a stronger contender than OpenAI’s DALL-E 3 and Stability AI’s Stable Diffusion 3 Medium. The company claims that Janus-Pro-7B surpasses these competitors in multimodal understanding and image generation tasks, marking a significant leap in AI-powered content creation.
This announcement was made in DeepSeek’s latest research report, “Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling,” which outlines the model’s advancements in training efficiency, scalability, and output quality.
Janus-Pro-7B Outperforms DALL-E 3 and Stable Diffusion 3 Medium
DeepSeek’s Janus-Pro-7B achieved a score of 0.80 on the GenEval leaderboard, a benchmark for text-to-image instruction-following, outperforming:
• Janus (0.61)
• OpenAI’s DALL-E 3 (0.67)
• Stable Diffusion 3 Medium (0.74)
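For context, GenEval-style benchmarks score a text-to-image model on how faithfully its outputs follow structured prompts (object presence, counts, colors, positions), typically by running an object detector over the generated images. The Python sketch below is a simplified, hypothetical illustration of that kind of scoring loop; generate_image and detect_objects are placeholder callables, not part of any released GenEval or DeepSeek API.

    from dataclasses import dataclass
    from typing import Callable, List

    @dataclass
    class GenEvalPrompt:
        """One structured prompt, e.g. 'a photo of two red apples'."""
        text: str                    # prompt fed to the text-to-image model
        required_objects: List[str]  # objects a detector must find in the image
        required_count: int          # minimum number of instances per object

    def score_model(
        generate_image: Callable[[str], object],       # model under test (placeholder)
        detect_objects: Callable[[object, str], int],  # returns detected instance count (placeholder)
        prompts: List[GenEvalPrompt],
    ) -> float:
        """Return the fraction of prompts whose generated image meets every constraint."""
        passed = 0
        for p in prompts:
            image = generate_image(p.text)
            ok = all(
                detect_objects(image, obj) >= p.required_count
                for obj in p.required_objects
            )
            passed += int(ok)
        return passed / len(prompts)

Under a protocol of this shape, a score of 0.80 means roughly four out of five structured prompts were satisfied end to end.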
Additionally, in the multimodal understanding benchmark MMBench, Janus-Pro-7B scored 79.2, surpassing competitors like:
• Janus (69.4)
• TokenFlow (68.9)
• MetaMorph (75.2)
On these benchmarks, Janus-Pro-7B is stronger at both interpreting images and generating them from text prompts, reinforcing DeepSeek’s claim that it outpaces OpenAI and Stability AI in AI-generated imagery.
Addressing Key AI Challenges: The Evolution from Janus to Janus-Pro
Janus-Pro-7B builds upon its predecessor, Janus, by solving major challenges in visual encoding and text-to-image generation.
Key Upgrades in Janus-Pro-7B:
• Decoupled visual encoding, allowing the model to excel in both multimodal understanding and image generation without performance trade-offs (see the sketch after this list).
• Enhanced training data, incorporating roughly 72 million high-quality synthetic images alongside real-world data, resulting in more stable and precise image outputs.
• Scalability, with two configurations of 1B and 7B parameters, enabling more efficient large-scale deployment.
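To make the decoupled-encoding idea concrete, here is a minimal conceptual sketch in PyTorch. All class and method names are hypothetical and heavily simplified (tiny layers, no causal masking); it only illustrates the design DeepSeek describes, in which a semantic vision encoder serves understanding, a discrete visual tokenizer serves generation, and both pathways share one transformer backbone with separate output heads.

    import torch
    import torch.nn as nn

    class DecoupledMultimodalSketch(nn.Module):
        """Hypothetical illustration of decoupled visual encoding (not DeepSeek's code)."""

        def __init__(self, d_model: int = 512, image_vocab: int = 16384, text_vocab: int = 32000):
            super().__init__()
            # Pathway 1: continuous semantic features for multimodal understanding
            # (stand-in for a pretrained vision encoder).
            self.understanding_encoder = nn.Sequential(nn.Flatten(), nn.LazyLinear(d_model))
            # Pathway 2: discrete image tokens for generation
            # (stand-in for a VQ tokenizer's codebook embeddings).
            self.generation_embed = nn.Embedding(image_vocab, d_model)
            # Shared transformer backbone used by both tasks.
            self.backbone = nn.TransformerEncoder(
                nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True),
                num_layers=2,
            )
            # Task-specific output heads.
            self.text_head = nn.Linear(d_model, text_vocab)    # answer tokens for understanding
            self.image_head = nn.Linear(d_model, image_vocab)  # next image token for generation

        def understand(self, image: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
            # image: (batch, 3, H, W); text_emb: (batch, seq, d_model)
            img_emb = self.understanding_encoder(image).unsqueeze(1)
            hidden = self.backbone(torch.cat([img_emb, text_emb], dim=1))
            return self.text_head(hidden)

        def generate_step(self, image_token_ids: torch.Tensor) -> torch.Tensor:
            # image_token_ids: (batch, seq) of discrete image token ids
            hidden = self.backbone(self.generation_embed(image_token_ids))
            return self.image_head(hidden)

The point of the separation is that the understanding pathway no longer has to double as the generation tokenizer, so neither task’s encoder is compromised for the other.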
The original Janus model, while groundbreaking, was constrained by limited model capacity and training data, which led to weak performance on short prompts and unstable image outputs. DeepSeek says Janus-Pro resolves these weaknesses through an optimized training strategy, expanded datasets, and larger model scale.
DeepSeek’s Rapid Growth and Industry Disruption
DeepSeek, founded in 2023 by Chinese entrepreneur Liang Wenfeng, has quickly established itself as a key player in AI development.
• In January 2025, its free chatbot app, built on its open-source models, topped iPhone app download charts, surpassing even OpenAI’s ChatGPT app.
• Its latest reasoning model, R1, has drawn comparisons to models from industry leaders such as OpenAI and Meta, with DeepSeek touting superior efficiency and cost-effectiveness.
• DeepSeek reports training one of its latest AI models for just $5.6 million, a fraction of the $100 million to $1 billion typically spent by major industry players.
• Unlike most AI companies, DeepSeek says its models do not rely on the most powerful AI accelerators, suggesting a highly optimized development process.
A Strong Contender in AI-Generated Content
With the launch of Janus-Pro-7B, DeepSeek is positioning itself as a major disruptor in AI-generated imagery, directly challenging OpenAI and Stability AI. By achieving superior multimodal understanding, lower training costs, and enhanced image generation capabilities, the company is setting a new benchmark for AI-powered creativity.
As AI-generated content continues to revolutionize industries such as entertainment, marketing, and design, DeepSeek’s rapid advancements signal a shifting landscape, one in which models like Janus-Pro-7B could redefine the future of AI-driven art and media.