Are we there yet?
Why it may seem to some that there's a slowdown. Beware: there isn't one.
So much progress has happened and keeps coming so fast.
In image generation, remember when hands were a thing that we just couldn’t seem to master? Then it was text…
Now we have Flux and Ideogram 2.0, which handle hands, text, feet, and people lying in the grass straight out of the box. For FREE.
Video generation platforms are a dime a dozen, it seems. The major ones: Runway Gen-3, Kling, Luma, Vidu, Sora (still not accessible, for some mysterious reason), and the amazing open and controllable AnimateDiff, not to mention Stable Video Diffusion and so many more.
Language models are a penny a dozen. Hundreds (thousands?) of them compete for ratings on the fascinating LMSYS Chatbot Arena, constantly supplanting one another, with Meta's extraordinary open-source (open-weights!) Llama-400B in the lead group.
What does ‘open-weights’ mean? It means you can customize your own version, trivially. You can jailbreak it, you can build on top of it, and it is a gift to everyone. Indeed, you can easily run uncensored frontier models now, which is fascinating in its own right.
And the robots are coming.
So where are we, then? Is this what AGI looks like? These things are still so limited and can't be trusted… it seems (to some) like progress is painfully slow and nothing has really changed.
Images seem to be getting only marginally better (we're talking about fine details now), so why has progress ‘slowed’? Same with the large language models: they seem fairly smart, yet they repeatedly say completely stupid things, can't be trusted, and just get lost, forget, or get confused…
Those video generators? Little more than toys right now. Very cool, very fun toys, but they cannot compete with cameras or ‘traditional’ computer-generated imagery (ray-tracing, for instance). They just aren't stable and detailed enough. But they are getting better. When will they reach parity? Will it be a year? Three years? A month?
I can’t help but feel like high quality text to video is an excellent harbinger of AGI.
We're not there yet. Breathe. Generative video still pales in comparison to cameras. But it is coming, and fast.
The 80/20 Rule
I'm sure many of you are familiar with the 80/20 rule. It's a common ‘rule of projects’: you get 80% of the results in 20% of the time, and it takes another 4x that amount of time to get the pesky remaining 20%. Or you could say: linear progress appears logarithmic. Or you could say it's the polish at the end that takes all the time, and that's what really counts.
Only when the polish is complete can you see yourself in it. Whatever our analogy has morphed into, the point is that the darn thing only works when it's 99.999% done, pretty much. That goes for code, planes, movies, poetry, cooking, and pretty much anything else worth doing.
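The arithmetic above can be sketched as a progress curve. This is a minimal, hypothetical illustration (the function name and the 100-hour project are my assumptions, not anything from a real schedule): 80% of the results arrive in the first 20% of the time, and the remaining 20% take the other 80%, which is why the curve looks like it's slowing even though the clock ticks linearly.

```python
def perceived_progress(hours_elapsed, total_hours=100):
    """Fraction of results finished under an idealized 80/20 schedule.

    The first 20% of the time yields 80% of the results;
    the remaining 80% of the time yields the last 20%.
    """
    pivot = 0.2 * total_hours  # the 20%-of-time mark
    if hours_elapsed <= pivot:
        # fast phase: 80% of results spread over 20% of the time
        return 0.8 * hours_elapsed / pivot
    # slow phase (the polish): last 20% of results over 80% of the time
    return 0.8 + 0.2 * (hours_elapsed - pivot) / (total_hours - pivot)

for h in (10, 20, 50, 100):
    print(f"hour {h}: {perceived_progress(h):.0%} done")
```

Halfway through the clock you're at 87.5% of the results but only creeping forward, which is exactly the "progress has slowed" feeling described above.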
So yes. It may look to some like ‘progress is slowing’ because the heady days of ‘almost there in no time at all!’ are past. But make no mistake: we are rapidly approaching the finish, that polish in which the world suddenly appears reflected as in a mirror and everything is transformed.
Ok Jeff, I'll stay tuned.