The latest text-to-image technology can render scenes such as city skylines and cafés in images that look remarkably real, at least at first glance.
However, one longstanding weakness of text-to-image AI models is that they can neither read nor write text. Even the top models struggle to produce images with legible logos, calligraphy, or fonts.
DeepFloyd, a research group backed by Stability AI, has unveiled DeepFloyd IF, a text-to-image model that can “smartly” integrate text into images. Trained on a dataset of more than a billion images and text, DeepFloyd IF, which requires a GPU with at least 16GB of RAM to run, can generate an image from a prompt such as “a teddy bear wearing a shirt that reads ‘Deep Floyd’” in a variety of styles.
The restriction was likely motivated by the currently tenuous legal status of generative AI art models. Several commercial model vendors are under fire from artists who allege the vendors are profiting from their work by scraping it from the web without permission or compensation.
NightCafe Chief Executive Officer Angus Russell spoke to TechCrunch about what makes DeepFloyd IF different from other text-to-image models and why it could be a major step forward for generative AI.