09 Nov 2018

We never talk about the most important part of "seeing" an image


How our brains interpret the real world might hold the key to how we improve our electronic imagery.

Outside of our heads, it’s easy enough to trace the path of an image on its journey to our eyes. Typically there’s a lens, a sensor, some processing, a file (or a real-time output), some sort of player, and then a display. 

And that’s it. It’s all there if you want to analyse it. There’s no need to speculate about any part of it, because we can go in, take it all apart, and see - and prove - what’s happening. 

But what happens next? This is where it gets a bit foggy. 

Eyes are pretty well understood. We won’t go into that here. We know that images are taken from the eye to the brain via the optic nerve. So far it’s all optics and wiring. 

But what happens then?

A whole load of stuff.

Some neuroscientists reckon there are at least twelve separate mechanisms involved in the ability to analyse an image and, for example, recognise objects. There are all sorts of pathways involved. The neocortex is part of this, with its hierarchical structure that finds simple patterns first and associates them into more complex structures. It’s like going from the vertical line and the semi-circular part of the letter “P” to a complete word, “Pig”, to a complete concept, “Pigs live on a farm”, by associating recognisable elements both vertically, between hierarchical levels, and horizontally, between adjacent concepts. 

All of which is fine and somewhat of a given. But what happens then? Because this isn’t the final stage. 

So, what is the final stage? What’s the final part of the process of perceiving something: perception, in other words? 

When you perceive something, it’s almost as if you’re crossing some sort of boundary. It’s a threshold between the outside world and - what, exactly? 

The answer is awareness. It’s the point at which something enters your consciousness. 

To philosophers and neuroscientists, bringing consciousness into the conversation opens up a warehouse of canned worms. It’s easy to see why: we know so little about it. 

We are our consciousness

That seems pretty incredible, doesn’t it? After all, we live inside our consciousness. We are our consciousness. Our consciousness and our self-identity (whatever it is that we are referring to when we say “I” or “me”) are all we know, and all we have with which to compare ourselves with the world and other people. When you live all your life inside a tent, it’s hard to know what goes on outside.

But we do purport to know what goes on outside. We do this mostly through perception (I say “mostly” because some things we know by definition and self-evidence). 

The point at which we start to know what’s happening outside is called awareness. It’s the threshold between the world and our brain processes on one side, and our consciousness on the other. 

All of which is a rather long-winded way of saying that, rather than concentrating on resolution in our video images, we should perhaps be looking beyond that single aspect of our captured images, towards how we can optimise them for that transition through awareness into our consciousness. 

This is all getting rather abstract, so to bring it back down to Earth, let me make a comparison between the way our eyes and our brain interpret images and the way we write programs optimised for graphics cards (GPUs). 

GPUs can’t all be the same. If they had to be, they would never be allowed to improve. So languages have been developed where developers write a series of instructions based on an abstracted set of commands and routines. To massively oversimplify, in order to draw a circle, instead of having to tell the GPU to light up a set of pixels that we specify pixel by pixel, and that happen to lie in a circle, we would use this higher-level language to say “Draw a circle, this colour, with this diameter, with its centre right here”. The GPU then “renders” this circle based on the known characteristics of the display. Similar instructions exist for a wide variety of “primitives”, and this ensures that programs written for GPUs are at least to some extent portable between different hardware, as long as this “language” is shared by them and “understood” in the sense of being able to correctly interpret the instructions. 
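To make the circle example a little more concrete, here is a minimal sketch in Python. It is not how any real GPU language works: the `Circle` class and the `render` function are hypothetical names invented purely for this illustration, and the "display" is just a grid of text characters. The only point is that one abstract instruction can be rendered on displays of different resolution without the instruction itself changing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Circle:
    """A hypothetical, resolution-independent drawing instruction.

    All values are fractions of the display size (0.0 to 1.0),
    so the instruction says nothing about individual pixels.
    """
    cx: float        # horizontal position of the centre
    cy: float        # vertical position of the centre
    diameter: float  # diameter as a fraction of the display width


def render(circle: Circle, size: int) -> list[str]:
    """Turn the abstract circle into lit pixels on a square size x size display."""
    radius = circle.diameter / 2 * size
    cx = circle.cx * size
    cy = circle.cy * size
    rows = []
    for y in range(size):
        row = ""
        for x in range(size):
            # Light the pixel if its centre falls inside the circle.
            inside = (x + 0.5 - cx) ** 2 + (y + 0.5 - cy) ** 2 <= radius ** 2
            row += "#" if inside else "."
        rows.append(row)
    return rows


# One instruction, two imaginary displays of different resolution.
circle = Circle(cx=0.5, cy=0.5, diameter=0.5)
low_res = render(circle, 8)
high_res = render(circle, 16)

for row in low_res:
    print(row)
```

The same `circle` object produces a coarse disc on the small grid and a smoother one on the large grid. The program that asked for the circle never needed to know which display it would end up on.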

The more you optimise and inform your instructions for the specific language used by your targeted GPU, the faster and more efficiently it will work. 

(OpenCL is one example of such a “language”. CUDA, used by Nvidia, is another.) 

These languages break down into what are called “Graphics Primitives”. They’re the most basic parts of what make up an image (like the circle described above). 

I have a hunch that our perception works in a similar way: a set of graphics primitives that are used as the basic building blocks for our mental images. It’s a hierarchical set of operations that build vertically to make an image of any complexity.

I believe we should be researching this area of our mental activity. It’s only when we begin to understand it that we will be able to move beyond pixels - and perhaps even beyond displays. 

Please don’t think I’m saying that in the future we won’t need to capture images with a very high resolution: I’m not saying that. If a camera’s resolution isn’t enough to capture the detail we want to see, then we need higher resolution. 

But once we’ve got those images, we need to find a better way to shoehorn all that information through our neural pathways and through the threshold of our awareness. 

We need a massively better understanding of consciousness and awareness for this to happen. And we’ll probably need a thousand-fold increase in processing capacity.

But that might only take ten years*

*(I estimate that the current iPhone is around a thousand times more powerful than the original iPhone, in the space of only eleven years. With the aid of AI, graphics cards have recently taken huge leaps in performance in areas like real-time ray tracing)
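For anyone who wants to check the arithmetic behind that footnote, here is a rough sketch in Python. It takes the thousand-fold-in-eleven-years figure at face value (it is an estimate, not a measurement) and assumes smooth exponential growth, which real hardware never quite follows. The function names are made up for this illustration.

```python
import math


def annual_growth(factor: float, years: float) -> float:
    """Yearly multiplier implied by growing `factor` times over `years` years."""
    return factor ** (1 / years)


def doubling_time(factor: float, years: float) -> float:
    """Years needed for one doubling at that same steady rate."""
    return years / math.log2(factor)


# Assumption from the footnote: roughly 1000x in 11 years.
rate = annual_growth(1000, 11)
years_per_doubling = doubling_time(1000, 11)

# What the same rate would deliver in ten years rather than eleven.
ten_year_gain = rate ** 10

print(f"yearly multiplier: {rate:.2f}")
print(f"years per doubling: {years_per_doubling:.2f}")
print(f"gain after ten years: {ten_year_gain:.0f}x")
```

On those assumptions, performance a little less than doubles each year, and ten years gets you somewhat more than half of the thousand-fold increase, so "ten years" is the right order of magnitude rather than a precise figure.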

Image: Shutterstock - agsandrew


David Shapton

David is the Editor In Chief of RedShark Publications. He's been a professional columnist and author since 1998, when he started writing for the European Music Technology magazine Sound on Sound. David has worked with professional digital audio and video for the last 25 years.
