

AUTOMATIC IMAGERY

(long read – contains random thoughts, diversions and odd tangents)

Creating videos with stock footage

A little while back I had my first experience of giving up control of the creative process to an algorithm. This involved making a couple of promotional videos using stock footage as automatic imagery.  So here are the details of my battle with the machine, and some random thoughts on Doctor Who and sociopaths.

In the shadow of a plastic tree

AI – it’s everywhere. Artificial Intelligence, as a term, always seemed similar to saying ‘plastic trees’ or ‘imitation leather’ or ‘toy car’. It’s only a representation, with nothing in common with the real thing except appearance.

A plastic tree can be placed in a shopping mall to give the impression of nature, a way to calm down the space. It acts as a shortcut to soften the industrial element and perhaps as an indicator of a healthy environment. We all know it’s a lie but it does the job and creates an illusion we are happy to accept.

You could argue that in some cases the imitation is better than what it represents – faux fur, perhaps. At no point do you argue it is exactly the same thing. With AI, though, the level of sophistication built up from billions of references is almost too much to process. In the face of this complexity we simply give up.

We start seeing it as actual intelligence rather than, in reality, a sort of high-level ‘fake it till you make it’. Information pulled from millions of sites and behaviour analysed from social media come together to weave a dark spell.

 

Image shows a still from the Doctor Who episode ‘Midnight’. The Doctor (played by David Tennant) faces a woman called Sky (played by Lesley Sharp), her face brightly lit from underneath. The cabin of the spaceship is dark, and the passengers in the background point torches towards us.

Goodnight humans

In the Doctor Who episode ‘Midnight’, broadcast on the BBC in 2008, an alien presence enters a human and starts to mimic what people are saying. The character of Sky, played by Lesley Sharp, cowers in the corner after a mysterious invisible force attacks the spaceship. She then starts to copy what the people around her are saying.

Her fellow passengers are initially concerned for her, then annoyed, then, finally, fearful.

It’s much like the Imitation Game that kids play to annoy each other: Kid A says “Why are you copying me?” Kid B replies “Why are you copying me?” Leading to “Hey, stop copying me!” – “Hey, stop copying me!”, followed inevitably by “This isn’t funny” – “This isn’t funny”, and so on until the parents join in and fall helplessly into the same trap.

After a while she fixes on the Doctor (played by David Tennant), mimicking only his questions as he tries to work out what has happened. At some point this process of repetition speeds up and anticipates what the Doctor will say next – so that it now seems the Doctor is copying her. To the passengers it then appears that the alien presence has moved and is now inside him. And so they panic and suggest throwing him off the ship.

This is a brilliant and chilling scene because the scenario and passengers’ reaction to it seem completely believable.

 

The future is like the present only more so

This seems to be where AI is heading – getting to the thought before the human, so that the human appears to be a low-level copy of a superior intelligence. The genius of the whole situation is that the structure of AI is to learn from others by complex comparison and mimicry. This is, of course, exactly the way a sociopath operates – and so it could be pointed out that the underlying structure of all AI is that of a sociopath.

 

Déjà vu

It’s generally accepted that the phenomenon of déjà vu is no more than a glitch in the brain. You may absolutely swear something has happened to you before. You may get the sense you know what the person you are talking to is about to say. The theory is that there is a tiny pause in the way the brain receives the information. So although the event happens in real time your brain delays the processing of it. It delays the processing of it. Sorry – what happened there?

I quite like the idea that what has actually happened is a bit like the Doctor Who episode – you have tuned into the moment so clearly and precisely, and reached such an intuitive understanding of the other person, that you feel their thought process as a gut instinct before they form the words. Whatever they say after that point feels like it has already happened – as you are one step ahead. Anyway – this seems a better understanding of the phenomenon, if one completely without any basis in fact.

The thing I found strangest about the reboot of Doctor Who was the score – grand orchestral arrangements and dramatic flourishes, which people seem to love. What I really wanted was something eerie, textured and electronic – after a while I couldn’t watch it because the faux John Williams orchestration was too distracting. It should have been a dream of modular synths and sound design.

 

Automatic video production for the people

I saw an advert on my music distributor – ‘distribution with free Rotor video’. I had no idea what this was. It turns out to be a site that brilliantly sees the potential to create automatic videos for artists with stock footage. All this without the high budget a video usually entails.

For me the conceit is upsetting because the creative decisions in video making are critical. The way one cameraperson films is completely different to another – and the way one editor works is not the same as another. Every micro decision cascades through the other parts of the video.

My instant reaction was one of annoyance that this process can be farmed out to an algorithm for a small fee. It felt like a threat.

It’s a strong idea because it taps into a need – videos are expensive, and even if you can put together a cheap shoot it’s still incredibly time-consuming to do it justice. Creating videos with stock footage seems like a good, affordable alternative, even if the idea filled me with horror.

I didn’t want to form an opinion without actually experiencing how it worked so I had a go.

 

Image shows an aerial view of three cars on a highway, taken from a tall building.

Kairos

I tried the program on the track Kairos. My first thought regarding the style I wanted was, admittedly, fairly conventional.

I looked for ‘Tron’-style synthwave graphics and film elements and found a few – you drop them, in their entirety, in order onto a selection grid. So a clip may be a few seconds long, or 20 or 30 seconds.

Next you load up the track. The program then splices the elements, in the order you selected, to the rhythm of the track, and appears to try to find key movement points in the footage to line up with specific beats or breaks in the track.

If there is not enough in the selection it repeats elements to fit. I tried a few variations by ditching a couple of shots and selecting others. To be clear, you can’t actually cut the shots; you can only select takes.
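For my own curiosity I tried to picture what the algorithm is probably doing. The sketch below is purely a mental model of the behaviour described above (the clip names, beat grid and function are all hypothetical, and none of it is Rotor’s actual code): detect beats, walk through your selected clips in order, give each one a beat-to-beat span, and repeat the selection if it runs out.

```python
# A minimal sketch of beat-aligned splicing, assuming the behaviour described
# above. This is only my mental model of what Rotor seems to do, not its code.
import random
from dataclasses import dataclass
from itertools import cycle

@dataclass
class Clip:
    name: str
    duration: float  # seconds of usable footage

def splice_to_beats(clips: list[Clip], beat_times: list[float], seed: int = 0):
    """Cut the track into beat-to-beat spans and assign the selected clips to
    them in the order given, repeating the list if it runs out."""
    rng = random.Random(seed)
    pool = cycle(clips)                      # repeat elements to fit
    timeline = []
    for start, end in zip(beat_times, beat_times[1:]):
        span = end - start
        clip = next(pool)
        # Pick an in-point inside the clip that still leaves room for the span.
        # A real system would instead look for a 'key movement point' here.
        in_point = rng.uniform(0.0, max(clip.duration - span, 0.0))
        timeline.append((clip.name, start, end, round(in_point, 2)))
    return timeline

# Hypothetical usage: four stock clips cut to beats detected at 0, 2, 4, 6, 8 s.
clips = [Clip("neon_grid", 12.0), Clip("highway_aerial", 8.0),
         Clip("sunset_drive", 20.0), Clip("city_lights", 5.0)]
print(splice_to_beats(clips, [0.0, 2.0, 4.0, 6.0, 8.0]))
```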

 

 

 

The cold light of day

On reviewing it the next day I realised how disappointed I was – but then I had low expectations anyway. Something was happening a couple of minutes in, but frankly I have made better videos on my phone for Instagram.

For example, here’s something I made from scratch for the same track, using one of my own photos and a very short video of some sunlight shimmering on the floor from my phone – then edited in Premiere Pro utilising only patience and an excessive amount of spare time: https://www.instagram.com/p/CfTP56iIJJK/ (turn sound up!)

However, I had one more credit and I got to thinking about how I could work with the system and really see what I could get out of it. What are its strengths and how can I utilise them? The Rotor site actually has some dynamic templates that you can pay for, which add a lot more energy and variation to the overall video. I decided to stick with the more straightforward cinematic approach for this test.

 

Playing God v Why bother?

While loading my next track into the Rotor site I got to wondering whether the program would eventually become like the Doctor Who episode – you record a song – and as you are working on it the AI generates the imagery that works with the audio – warping and shaping as you complete the mix – by the time the mastering is done the video is ready. Or maybe it works out your mood and your back catalogue (like ChatGPT) and then writes the tune for you as well, adds in a deep fake of you doing a nice performance and some basic choreography – all things are possible until you hit the critical question: ‘Yes, but why bother?’

 

It’s all about the process

When the entire process is automated it has effectively removed the agency of the artist. Music and film are, perhaps surprisingly, not about the result but about the process. Certainly the result is important, but it needs to work to represent you as an artist and the process you went through to get there. Generally, for the audience, it is of little importance what decisions were made in the creative process – but it is encoded in the work, whether it is a specific reference to your life or simply the sounds that inspire you at that moment in time.

Back to Rotor Videos – I thought to myself – I’m approaching this wrong – I need to approach it as a director, I need to play God.

 

Live – die – repeat

One credit left. I had a realisation, fairly obvious I know, but here it is: stock footage is generally posed and artificial in appearance. It is usually made by people with the sole purpose of producing footage for usage, rather than to film something for its own sake. It is not made as part of a film or a script, or to capture a memory. This means that it almost always lacks drama, weight and sincerity. Archive footage, on the other hand, is about capturing events in history, at gigs, local events, world news. When you are editing together archive footage for a documentary, the footage brings its own sense of history and gravitas.

Much like using a classic sample from Bob James or Alan Hawkshaw – the time and the place have been bottled, carefully preserved – it’s not purely the notes, not only the sound of the keys, but a whole world spliced into yours: the kit, the musicians, the mood they were in, the ambience of the studio, the quality of the mix. Copying that authenticity brings us back to the idea of plastic trees and Artificial Intelligence – it’s never really a tree and it’s never really intelligent.

 

Taking stock

In the context of using Rotor for making a quick video, all you really have is a library of stock footage mainly faking scenarios and moods. You have to forget this, though, and pretend to yourself it is an incredible collection of archive and high-quality promo shots. Once you think like this the game seems to change – you aim higher – and if you fall below your ambition it’s still higher than if you had started with low expectations.

Since I have some experience structuring and shaping other people’s footage to give it an effortless flow, I got to wondering about the process. Is it only reliant on precision cut points and the sequence? Or is it, as I have always thought, about the spatial dynamics of the shots and the flow of ideas? It’s about where to come in close – where to add space, how to lead the eye. It’s about finding the resonance between sound and picture. More importantly, it’s about having an idea of the overall shape of the work. You can put any picture to any sound but only some find an affinity and create a third presence. By this I mean the phenomenon of an additional quality beyond just the pure sound and the pure picture.

Tobias Zaldua single cover artwork shows illustrated photo of a couple standing near two old fashioned American cars parked in front of colourful textured mountains and a lake with a low hanging yellow sun on the horizon. The sun casts strong shadows from the couple who stand on an orange-coloured shore. The large text on the opposite shore reads 'Complicated reasons'

Reverse engineering

With this in mind I imagined the process backwards. Imagine the images you would have if you had spent the money – the time, the place, the location. I looked at the way I’d built videos that spent phenomenal amounts of money on location, dancers and helicopters (pre drones…), then matched back how the shots worked spatially. Next I searched until I found similar scenarios in similar spaces. To get some cohesion I searched for specific locations that I had some experience of, so it was pertinent to the song – places I would have shot if only I had the money to make high-end videos.

I chose Lisbon, Portugal, since the song was written (loosely) about my experiences there. San Francisco was my next choice: when I visited I felt a strange and complex energy there that couldn’t quite be defined. Since the mood and light of both places are quite distinctive, I wondered if any of it would translate to stock footage. I also had some original footage from my visit but decided to save that for another video, another time.

 

All things splice

So I lined up all the relevant shots after this process and hit render. Magically, the algorithm got pretty damn close. If a shot dynamic feels like it occurs too early or too late for the appropriate part of the track, you can’t really edit it. What you can do is pull out a shot before it to encourage the algorithm to play it earlier. However, it may then shift everything after it earlier than you wanted. I tried adding small shots to the flow to get key events to trigger later, while removing later ones to balance the action.
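In terms of the toy model above, the cascade makes sense: if clips fill beat slots in order, removing an earlier clip means every later clip inherits an earlier slot (again, this is only my assumption about the behaviour, with made-up clip names and timings).

```python
# Why pulling an earlier shot drags everything after it forward (toy model,
# assumed behaviour only): clips fill beat slots in order, so deleting one
# clip shifts every later clip onto the previous, earlier slot.
beats = [0.0, 2.0, 4.0, 6.0, 8.0]                      # hypothetical beat grid
clips = ["neon_grid", "highway_aerial", "sunset_drive", "city_lights"]

def slots(clip_list):
    return {clip: start for clip, start in zip(clip_list, beats)}

print(slots(clips))       # 'sunset_drive' lands on the beat at 4.0 s
print(slots(clips[1:]))   # drop the first clip: 'sunset_drive' now lands at 2.0 s
```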

So after about 8 hours and 48 attempts I found a version I was happy with. A version where most of the elements had some resonance. Early iterations of the video sometimes had better action points within a particular shot. Unfortunately the algorithm would abandon these magic moments in the revised version.

This is an interesting area for filmmakers and musicians. Once you start trying to adjust it you get into the territory of normal editing. In this app, however, your hands are tied. If you could actually edit in the app it would take away the unique ‘auto-generated’ element, and the line between the on-site automation and your own manual editing would quickly blur.

 

Who cares if it makes sense

 

Hey that’s my shot – oh actually it really isn’t

Overall it was quite a positive experience. The program did generate the result I wanted, albeit with some persistence and ‘directorial’ input. It is genuinely exciting to see a version of your reality from another creative direction. This is a lot like having another artist remix your tune. The video works spatially and, after much messing around, has a flow to it as well. So who really made this video? It makes me laugh when people put ‘directed by’ themselves after making a Rotor video. On reflection, though, it does kind of make sense. You have, in fact, played the role of director to the algorithm.

For me, if I am going to make my own videos, I would prefer to create the shots from scratch. However, I still got a thrill delegating camerawork to unknown authors and farming out the editing to the machine. I wouldn’t want to make a habit out of it though. Bear in mind that the footage is non-exclusive. It can be quite disconcerting seeing shots you feel are now part of ‘your’ video in someone else’s Rotor promo.

Image shows an illustrated night scene. A couple stand looking out over a lake. Two old fashioned American cars are parked behind them. On the horizon are mountains over which a full moon hangs low in a dark blue sky.

 

(NB this is not a sponsored post – I received a free credit when purchasing distribution from CD Baby)

Follow me on Instagram for images and little bits of magic spotted in the shadows.

https://www.instagram.com/tobiasdezaldua/

 

For a more coherent article on the creative process in general I recommend you take a look at ‘Writer/Producers Block Busters’

 

