OpenAI has unveiled ChatGPT Images 2.0, a significant upgrade to its image generation model, just over a year after introducing the ability to create images directly from its chatbot. The company describes the new system as a “step change” for image generation, particularly in its ability to follow detailed instructions, render dense text, and accurately place and relate objects within a scene.

For the first time, OpenAI has integrated reasoning capabilities into an image model, enabling the system to perform tasks such as searching the web and verifying its outputs. These enhancements are designed to improve reliability, especially when accuracy, consistency, and visual cohesion are critical.

OpenAI has also focused on improving the model’s understanding and rendering of non-Latin text, achieving “significant gains” in handling languages such as Japanese, Korean, Chinese, Hindi, and Bengali. Additionally, the model now better captures the unique characteristics of different visual languages, making it more effective for applications like game prototyping and storyboarding.

The new model offers greater flexibility in aspect ratios, generating images as wide as 3:1 or as tall as 1:3. It supports resolutions up to 2K and can produce up to eight outputs in a single request.
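To make those limits concrete, here is a minimal Python sketch that checks whether a requested image size and output count fall inside the stated bounds. It is only an illustration, not OpenAI's API: the function name `check_request` and its parameters are hypothetical, and reading "2K" as 2048 pixels on the longest side is an assumption, since the announcement does not define it.

```python
from fractions import Fraction

MAX_RATIO = Fraction(3, 1)  # widest supported aspect ratio, 3:1 (from the article)
MIN_RATIO = Fraction(1, 3)  # tallest supported aspect ratio, 1:3 (from the article)
MAX_OUTPUTS = 8             # outputs per request (from the article)
MAX_LONG_SIDE = 2048        # ASSUMPTION: "2K" interpreted as 2048 px on the longest side


def check_request(width: int, height: int, n_outputs: int = 1) -> list[str]:
    """Hypothetical helper: return a list of problems, empty if the request fits."""
    if width <= 0 or height <= 0:
        return ["width and height must be positive"]

    problems = []
    ratio = Fraction(width, height)  # exact arithmetic avoids float rounding at 3:1
    if ratio > MAX_RATIO:
        problems.append(f"{width}x{height} is wider than 3:1")
    if ratio < MIN_RATIO:
        problems.append(f"{width}x{height} is taller than 1:3")
    if max(width, height) > MAX_LONG_SIDE:
        problems.append(f"longest side exceeds {MAX_LONG_SIDE} px")
    if not 1 <= n_outputs <= MAX_OUTPUTS:
        problems.append(f"n_outputs must be between 1 and {MAX_OUTPUTS}")
    return problems
```

Under these assumptions, a 1536x512 banner sits exactly on the 3:1 boundary and passes, while 2048x512 (4:1) is rejected as too wide.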

Prior to its public release, I previewed ChatGPT Images 2.0. For my first test, I prompted the model to generate an image of a tortoiseshell cat in the pixel art style of Pokémon’s third generation. This was a challenging task, as AI models often struggle with pixel art, and the Game Boy Advance Pokémon games are iconic for their distinctive style. The result was impressive, accurately capturing the essence of the requested style.

Next, I tasked the model with converting the generated image into a transparent PNG format. While the process took longer than expected, the output met the requirement, though it differed slightly from the original image. Finally, I asked ChatGPT to create a four-page manga featuring my cat enjoying a sunny day by a city stream.

Of the three tests, the second task consumed the most time, and the output deviated slightly from my initial prompt. However, the model successfully generated a transparent image, a capability that other image models often struggle with. As more users begin testing Images 2.0, we will gain a clearer understanding of how it compares to competitors like Google’s Nano Banana 2.

Source: Engadget