
How fast is AI evolving? It's now doing generative video

Stable Diffusion created this sort of image. Now developer Runway has started doing generative video

One very good reason to be nervous about AI is that we don't have any easy way to measure it. So here are some milestones to watch for.

Most of us have a feeling right now that AI is picking up speed. You don't have to have your head buried in academic papers to know something's going on. Let's assume that you don't rely on the more hysterical media as your window into the world - if you did, you'd probably feel like cowering under your desk. AI has reached the stage where it's a mainstream topic of discussion alongside sport, politics and celebrity gossip. But it still needs to be more widely understood.

I want to talk about some of the ways in which we can gauge progress in AI. I don't mean in terms of MegaTurnips or anything like that. Instead, I want to focus on certain thresholds that we are likely to pass or may have already.

Demonstrating usefulness

The first milestone is the point where AI first did something useful. It's tough to say when that happened - not least because definitions of AI vary. But it's fair to say that AI has influenced computer translation and image analysis (and processing) over the last ten years.

It wasn't that long ago that games programmers started experimenting with procedurally generated cityscapes. These relied on algorithms built on basic parametric information about cities, using unchanging routines that could take a load of statistics about what buildings, roads and street furniture tended to be like, and create not particularly realistic "cities" based on those essential elements. This wasn't AI. It relied on fixed routines, almost certainly with an element of randomness built in. Randomness is arguably the opposite of intelligence, but it can give the appearance of design if it's applied to elements that are essentially random in the real world. If you've just arrived in a new city, the objects surrounding you might have a familiar overall appearance but at the same time seem random in the sense that you don't know where this or that street leads to or how many floors the building you've never seen, just round the corner, will have. Similarly, you won't know whether there's a coffee shop just down the road or a large truck is about to veer round the corner.
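To make the idea of fixed routines plus randomness concrete, here is a minimal sketch in Python. It is not taken from any real game engine: the parameter names, the value ranges and the functions `generate_block` and `generate_city` are all assumptions made up for illustration. The point is only that every rule is written by hand and nothing is learned from data.

```python
import random

# Hypothetical, hand-picked parameters standing in for "statistics about cities".
CITY_PARAMS = {
    "min_floors": 1,
    "max_floors": 12,
    "coffee_shop_chance": 0.1,
}


def generate_block(rng, width, depth, params=CITY_PARAMS):
    """Build a depth x width grid of buildings from fixed rules plus random draws."""
    block = []
    for _ in range(depth):
        row = []
        for _ in range(width):
            # The rule never changes; only the random draw differs per building.
            floors = rng.randint(params["min_floors"], params["max_floors"])
            has_coffee_shop = rng.random() < params["coffee_shop_chance"]
            row.append({"floors": floors, "coffee_shop": has_coffee_shop})
        block.append(row)
    return block


def generate_city(seed, width=4, depth=3):
    """Same seed, same city: the 'design' is just seeded randomness."""
    rng = random.Random(seed)
    return generate_block(rng, width, depth)


if __name__ == "__main__":
    for row in generate_city(seed=2023):
        print(" ".join(f"{b['floors']:2d}{'*' if b['coffee_shop'] else ' '}" for b in row))
```

Running it prints a small grid of floor counts, with an asterisk marking a coffee shop. Change the seed and you get a different "city" that still looks plausible, which is exactly the appearance of design without any intelligence behind it.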

Today, AI techniques can generate photorealistic cities that seemingly go on forever. These don't rely on a fixed set of parameters but on extensive training and, ultimately, a "knowledge" of what a city is like. But a word of caution here: don't assume that the word "knowledge" means that the AI is some conscious entity that can really know something. That may come in a few years, and we'll discuss it later in the article.

Going consumer

Along the way, smartphones have been boosting their relatively meagre optical resources with AI, and many phones now have dedicated AI chipsets alongside the CPU and GPU.

This marks a milestone because it means AI has gone consumer.

The first part of 2022 was all about text-to-image. This seemingly magical technique can take a text prompt and generate frankly astonishing artwork based on a mere suggestion. It's scarily good, but, as will become a pattern, at least in these early days of truly useful AI, it's relatively easy to spot the telltale signs and artefacts of an AI-generated image - especially when you've seen a lot of them, or if you've struggled to get the result you want. Fingers are a giveaway, as are - strangely - ear lobes.

In late 2022, ChatGPT burst onto the scene and had everyone rubbing their eyes in disbelief. Never mind that anyone who wanted to see a DSLR manual written in the Scots language could now satisfy themselves: ChatGPT broke through into public awareness in a way that no other AI model ever had. And it is genuinely significant. It may even change the way we search for things on the internet.

Just this week, there have been several announcements about AI audio models, where, for example, you can sing your own original song, and the AI will create an accompaniment to go with it.

Generative video

And now - generative video. Just this week, Stable Diffusion developer Runway announced its generative video model: "Introducing Gen-1: a new AI model that uses language and images to generate new videos out of existing ones."

This is really happening. The rate of change is truly staggering.

All of this leaves me in the immodest position of having to say that I predicted this about eight years ago when I said that - eventually - you'd be able to feed a film script into a computer and press "go". You'd get a finished feature film as the result.

In reality, that's still a long way off, not least because it takes a lot of work to get AI to generate convincing-looking humans, especially when they're interacting with each other. But that will come. Eight years ago, it was generally thought to be impossible or a thousand years away.

Finally, I want to talk about something fundamental to the progress of AI.

From narrow to general

What we have now is mostly - almost entirely - Narrow AI (ANI - Artificial Narrow Intelligence). This is AI that's trained on a particular set of data and can appear to have superpowers as long as it stays within that knowledge domain. So a model trained to recognise dog breeds wouldn't do as well with camels or ironing boards. Medical applications of AI include analysing anatomical scans for disease. With good training data, the results can rival human experts.

The next step is AGI - Artificial General Intelligence. This requires an AI to be able to cope with anything. To be able to respond and make decisions about situations it specifically hasn't been trained for. Nobody knows how many orders of magnitude of computing resources, software cleverness or some unknown ingredient we'll need to get there, but there are signs of it already on the distant horizon.

Even though ChatGPT and Tesla's Full Self-Driving package are manifestly not AGI, both can give the impression, from certain angles, that they are. ChatGPT seems recklessly overconfident at times, giving answers wrapped up in slick language that are wildly inaccurate. It wouldn't survive long in the real world if it had to fend for itself in unfamiliar situations. On the other hand, language is fundamental to our understanding of the world and our ability to think. So we should definitely keep an eye on Large Language Models (LLMs) because who knows what might emerge from them.

Tesla's FSD is a hot potato, but it's undeniably clever. Remember that self-driving cars will never be perfect. They don't have to be: they only need to be better statistically than human drivers. In a rare candid moment, Elon Musk explained that FSD was taking longer than expected because it has to "solve reality". That sounds like AGI to me, which might be why FSD is still a long way off.

Self-driving cars are even more relevant to the quest for AGI than you might have expected. It's because no AI model will ever be able to cope with anything the world throws at it until it can understand its own place in the world. At the very least, this will require "embodiment", which is something that self-driving cars already have.

Embodiment means having the physical characteristics of a living body, with senses provided by sensors analogous to eyes, ears, touch etc. It would also need another sense, one that we rarely think about. It's called "proprioception", and it's what we rely on to tell us what our body is doing and where it is in space. If you've ever closed your eyes and tried to touch your two index fingertips together in front of you, that's proprioception working. Without senses, an AGI has no way of understanding the world.

A need for robotics?

So does that mean we can't have AGI without robots? Not necessarily, but probably. They might not be physical robots because you could build an AGI that lives in a detailed, virtual world. But that AGI would have similar issues to a large language model like ChatGPT with a limited (or at least finite) training data set. The real world is not limited in that way, because the number of possible events, and combinations of them, is effectively infinite.

So when will we achieve AGI? The answer is somewhere in between never and sooner than you think. But AI has a habit these days of surprising us. When the rate of progress surprises even experts in their own field, you know something profound is going on. Right now, this seems to be commonplace. You could reasonably conclude that if AI is surprising even experts, then it could be out of control. So hold on to your hats. This year will not end the way it started. 
