Thoughts in Between Ed and Tech #10: A DALL·E Mini Review in Images
My #unautomatable journey continues with a little automated creativity.
When I started writing this particular blog post back in June, I had just moderated an interactive demo session with the Chief Evangelist from one of the biggest startups in machine learning, Hugging Face, the AI double-unicorn that just announced a $100M Series C on a $2B valuation in May.
After he finished a full-length demo on how to build an end-to-end AI application, Julien Simon dropped some wisdom for all aspiring machine learners on what to learn first. What was most intriguing to me, however, was none of that. It was the part of his presentation that inspired this very blog!
He mentioned in passing that the most popular free AI app that had ever been built by the Hugging Face community had just been released. It was getting historic amounts of web traffic because everyone wanted to use and try it out. That app, built by Boris Dayma, et al., is called DALL·E mini.
The original DALL·E is a 12-billion parameter version of the generative pre-trained transformer GPT-3 that can generate an image from any text description.
Pretty wild, right?
Recently, OpenAI came out with DALL·E 2, which generates even more photorealistic and accurate images. But these are expensive, technical tools that are generally hard to get access to for now.
Here’s where DALL·E mini comes in, why it’s taken the world by storm, and why you should know about it. Why everyone should know about it.
Although DALL·E mini is 27 times smaller than the original DALL·E, this app contains some true AI magic, and it’s free! It’s perfect for a general audience to start understanding:
where these tools are at in 2022
how they can help all of us unleash our innate human creativity, and
how vast the possibility space of AI apps coming soon might just be
With futuristic technology like AI, seeing is believing for a general audience. And then, as Naval Ravikant is known for saying, “once something works, it’s no longer technology.” DALL·E mini is a big step in this direction.
Thinking in Words
I’m a person who thinks in words rather than images. Visualizations often turn into mantras for me, rather than being held as images in my mind. So this idea of giving an AI text and getting an image out is amazing for me.
Let’s see an example!
Input: A rabbit detective sitting on a park bench and reading a newspaper in a victorian setting.
For a collection of amazing DALL·E 2 examples, check out this repository.
OK, so here’s where things get interesting. Think about the possibility space that exists for this technology now.
As Sam Altman, CEO of OpenAI - the company that created both DALL·E and DALL·E 2 - told us in his blog upon release of the enterprise edition of the tool,
“DALL·E 2 is a tool that will help artists and illustrators be more creative, but it can also create a ‘complete work’. This may be an early example of the impact [of] AI on labor markets.” ~ Sam Altman
He goes on to say that:
“A decade ago, the conventional wisdom was that AI would first impact physical labor, and then cognitive labor, and then maybe someday it could do creative work. It now looks like it’s going to go in the opposite order.”
Spooky? Exciting? A little bit of both?
Teaching and Machine Learning
I was talking with an instructor of ours not so long ago, and we were chatting about how empowered students feel when they deploy their first-ever machine learning application, publicly, so that they can text the link to someone (generally a friend or family member), who can then try it out from wherever they are.
This is a visceral moment for learners. It gets them pumped, and ready to take their machine learning and software development game to the next level.
Moments like these are the ones that have the potential to genuinely change the outcome for learners who may just need a little extra seeing-is-believing nudge to convince themselves that they too can do this type of work.
I believe that DALL·E mini is another one of these tools that can create “Aha!” moments for everyone and can help transform all of us into ML practitioners in the 21st century. This is a watershed moment in AI learning because now you don’t even have to be someone who develops AI applications by writing code. You only need to be someone who uses them to appreciate where all of this is headed.
One tweet from Sam Altman sums it up nicely.
I think he’s right.
In fact, I’m sure that he’s right.
So I wanted to put it to the test with my own ideas. I was pleased, to say the least.
Thoughts in Between Ed and Tech, a Review in Images
As soon as I saw DALL·E mini, I knew what I had to do (even if it’s taken me more than two months to actually get it done). So let’s review all nine “Thoughts in Between Ed and Tech” articles, in images. And then let’s get meta by creating a review image for this very review.
Note: I did have to play with wording (e.g., ed versus education, tech versus technology) to get the best results. These tools are by no means perfect and still require disciplined iteration to get good results.
Without further ado:
Thoughts in Between Ed and Tech #1
This particular tapestry of education and tech with thought bubbles really spoke to me about what I’ve been trying to merge in this blog series.
Thoughts in Between Ed and Tech #2: Letting Fires Burn
This one felt right because while a book is on fire, everything isn’t burning. Things are still OK.
Thoughts in Between Ed and Tech #3: Missionaries Over Mercenaries
This person is really into thinking about making the educational experience awesome through tech! This is exactly who I want to work with: a real missionary.
Thoughts in Between Ed and Tech #4: Aiming and Hitting the Target
I thought this image was beautiful in its simplicity. The target on the stack of books is what we’re all aiming at hitting. The target somehow abstracted into the SMS message/thought bubble could’ve been designed by Apple.
Thoughts in Between Ed and Tech #5: Watch it Grow
The addition of the laptop to the stack of books forms the basis for this young girl’s imagination to grow as tall as the sky. If she can learn it and dream it, she can do it. Perfect.
Thoughts in Between Ed and Tech #6: Scope Doesn’t Creep, Understanding Grows
We can all understand this image. As we read and go deeper, the rabbit hole widens. As we understand more, we want to dig into the source material and references. As we learn more, there is more to learn.
Thoughts in Between Ed and Tech #7: Are You Willing to Wear Your White Belt?
This is a beautiful representation of early learners getting after it together at the intersection of education and technology. I often feel this way in a startup while working with the squad.
Thoughts in Between Ed and Tech #8: It’s All About the Journey
Where will this road at the intersection of education and technology take me? Where will it take us as a society? The grass looks pretty green over there, and that’s pretty exciting.
Thoughts in Between Ed and Tech #9: A Transformational View
How about this image as a wrap? A futuristic abstracted brain amplified by computers but built on the original education tool: the pages of a book. Time to transform and rise up. Indeed. Matrix style.
Thoughts in Between Ed and Tech #10: A Review in Images
Somewhere between 9 and 10 images somewhere between education and technology? That’s a wrap!
A Few Lessons Learned
I found this exercise to be pretty enlightening. It turns out that leveraging an automated creativity tool like DALL·E mini properly actually requires some creativity! Since your caption text input is the only thing that you have to work with, some of the images that I landed on took many iterations, reframes, and, not to mention, googling of synonyms.
The most challenging example was TiBET #3 Missionaries Over Mercenaries. Avoiding pictures of groups of people who were either soldiers or the folks that missionaries aim to help took a lot of iteration. Ultimately, to get what I was going for, I replaced “missionaries over mercenaries” with simply “motivation.” It turns out that the letter of the caption wasn’t nearly as important as the spirit of the article.
Another takeaway from this exercise was that if you run the exact same caption through the same image generator, you get different results! There is an inherent randomness to what you’ll see on any given query. While not surprising, this really does add something to the mix. Once you dial in what you’re looking for, you can keep generating, and keep looking! Just don’t forget to save images that you’ll want to come back to.
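To make that randomness concrete, here is a minimal Python sketch. It is not how DALL·E mini works internally; the function name generate, the 16-entry "vocabulary" of image tokens, and the weighting scheme are all made up for illustration. The only point it shows is that a sampling-based generator returns something new on each run unless the random seed is pinned.

```python
import random

def generate(caption, seed=None, n_tokens=8, vocab_size=16):
    """Toy stand-in for an image generator (hypothetical, for illustration only).

    Returns a list of n_tokens "image token" ids sampled at random,
    with sampling weights that depend on the caption.
    """
    # seed=None means fresh randomness on every call
    rng = random.Random(seed)
    # Derive caption-dependent weights so different captions favor different tokens
    base = sum(ord(ch) for ch in caption)
    weights = [1 + (base * (i + 1)) % 7 for i in range(vocab_size)]
    return rng.choices(range(vocab_size), weights=weights, k=n_tokens)

caption = "a stack of books with a target on top"
print(generate(caption))          # unseeded: usually differs from run to run
print(generate(caption))
print(generate(caption, seed=7))  # seeded: the same list every time
print(generate(caption, seed=7))
```

The web app does not ask you for a seed, which is one more reason to save the images you like when they appear.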
What Does It All Mean?
At the end of the day, I’m incredibly interested in what technologies like DALL·E mini mean for meta-makers striving towards #unautomatability. What’s more, though, and perhaps more immediate, is that I’m fascinated by what tools like this mean for teaching and learning in all spheres. A tool like this that can provide those visceral “Aha!” moments for children and adults alike is very much worth paying attention to as we all do our best to stay ahead of the AI curve in our chosen domains of practice.
There’s much more to come in the 21st century.
As I continue down my own path of becoming, I expect to make using this tool a regular practice as I wrap up future blogs. After all, why not?! The learning curve for high-leverage automated creativity is, like everything else, climbed across iterations.
Time will tell where developments like DALL·E mini lead from here, somewhere between ed and tech.
PS…The Business of AI Applications
It took a long time for me to publish this blog. The upshot is that I get to include the latest evolution of the DALL·E mini machine learning application. That is, the AI product called Craiyon (get it, an AI crayon?). Craiyon.com is where DALL·E mini will be migrating for good.
What this means is that the application will be running not on Hugging Face’s open-source Spaces platform, but rather on a new website that even shows us ads while we wait for images! Of course, this was inevitable, as the app was so popular it was being flooded with requests constantly. Requests to run AI applications, as it turns out, are not free.
As web traffic increases to machine learning applications like this, a complex dance is being performed behind the scenes. Each caption that people around the world come up with is first translated to a string of numbers, which is then fed into a pre-trained machine learning model that transforms that string of numbers into a set of images. There are many different types of calculations that go into both the pre-training of the model and also the transformation of the string of numbers from the caption. These are definitely worth checking out, but the point here is that the constant barrage of captions that people want free pictures for requires many calculations, perhaps even what most of us would consider to be uncountably many of them. This became quite an expensive and heavy lift for Hugging Face to be shouldering.
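For the curious, the "caption to a string of numbers" step can be sketched in a few lines of Python. This is a toy under stated assumptions: production systems generally use subword tokenizers with large fixed vocabularies, whereas this sketch uses a tiny word-level vocabulary, and the names build_vocab and encode are hypothetical rather than taken from any library.

```python
def build_vocab(captions):
    """Assign each distinct lowercase word an integer id (0 is reserved for unknown words)."""
    vocab = {"<unk>": 0}
    for caption in captions:
        for word in caption.lower().split():
            # Only words not seen before get a new id
            vocab.setdefault(word, len(vocab))
    return vocab

def encode(caption, vocab):
    """Translate a caption into the list of numbers a model would receive."""
    return [vocab.get(word, vocab["<unk>"]) for word in caption.lower().split()]

vocab = build_vocab(["a rabbit detective sitting on a park bench"])
print(encode("a rabbit on a bench", vocab))  # [1, 2, 5, 1, 7]
print(encode("a fox on a bench", vocab))     # [1, 0, 5, 1, 7] ("fox" is unknown)
```

Every request repeats this encoding and then runs the far more expensive model on the result, which is where the compute bill comes from.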
We’ve seen this before in tech.
“How do you make money?”
“We run ads.”
From my perspective, it’s great to see that an innovative open-source AI app has been rapidly shared across the world, has spread its wings, and has taken flight in just a few months. I’ll be a regular user of this AI crayon from here on out. And I’m sure that this is only the first of many creativity-automation AI apps to come as the 21st century barrels on…