Ovis-Image is a 7B text-to-image model built upon Ovis-U1, specifically optimized for high-quality text rendering. It delivers text rendering quality comparable to much larger 20B-class systems while remaining compact enough to run on widely accessible hardware.
Model Highlights:
  • Strong Text Rendering at 7B Scale: Delivers text rendering quality comparable to much larger 20B-class systems like Qwen-Image, and is competitive with leading closed-source models like GPT-4o in text-centric scenarios
  • High Fidelity on Text-Heavy Prompts: Excels on prompts that demand tight alignment between linguistic content and rendered typography (e.g., posters, banners, logos, UI mockups, infographics)
  • Accurate Bilingual Text Rendering: Produces legible, correctly spelled, and semantically consistent text in both Chinese and English across diverse fonts, sizes, and aspect ratios
  • Efficiency and Deployability: Fits on a single high-end GPU with moderate memory and supports low-latency interactive use

Ovis-Image text-to-image workflow


Make sure your ComfyUI is updated. Workflows in this guide can be found in the Workflow Templates. If you can't find them there, your ComfyUI may be outdated. (Updates to the Desktop version are released with some delay.)
If nodes are missing when loading a workflow, possible reasons:
  1. You are not using the latest ComfyUI version (Nightly version)
  2. Some nodes failed to import at startup
Models: text_encoders, diffusion_models, vae

Model Storage Location
📂 ComfyUI/
├── 📂 models/
│   ├── 📂 text_encoders/
│   │      └── ovis_2.5.safetensors
│   ├── 📂 diffusion_models/
│   │      └── ovis_image_bf16.safetensors
│   └── 📂 vae/
│          └── ae.safetensors
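
Before loading the workflow, it can help to confirm that the three files sit in the folders listed above. The following is a minimal sketch, not part of ComfyUI itself: the function name `find_missing_models` and the `"ComfyUI"` root path are assumptions for illustration, and only the folder and file names come from the storage layout in this guide.

```python
from pathlib import Path

# Expected files, keyed by subfolder of ComfyUI/models/.
# Folder and file names are taken from the storage layout in this guide.
EXPECTED_MODELS = {
    "text_encoders": "ovis_2.5.safetensors",
    "diffusion_models": "ovis_image_bf16.safetensors",
    "vae": "ae.safetensors",
}


def find_missing_models(comfyui_root):
    """Return the expected model paths that do not exist under comfyui_root.

    Hypothetical helper for illustration; ComfyUI does not provide it.
    """
    models_dir = Path(comfyui_root) / "models"
    missing = []
    for subfolder, filename in EXPECTED_MODELS.items():
        path = models_dir / subfolder / filename
        if not path.is_file():
            missing.append(path)
    return missing


if __name__ == "__main__":
    # "ComfyUI" is a placeholder; point this at your own install directory.
    missing = find_missing_models("ComfyUI")
    if missing:
        for path in missing:
            print(f"missing: {path}")
    else:
        print("all Ovis-Image model files found")
```

Run it from the directory that contains your ComfyUI folder, or pass an absolute path to `find_missing_models`. An empty result only means the files are present; it does not check that the downloads are complete or uncorrupted.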