While occidental culture has for the longest time practically subjected the memory of images to verbal or numerical access (alphanumerical indexing by authors and subjects; even Sergei Eisenstein subjected films to the idea of deciphering their virtual story-book by transcribing moving images into a score – a kind of reverse engineering of the written script), the iconic turn, predicted by W. J. T. Mitchell, is still to come in the field of image-based multimedia information retrieval.
In media culture there is still the problem that audio-visual analogue sources cannot, or should not, be addressed like texts and books in a library; these resources form a rather unconquered multi-media datascape.
Addressing and sorting non-scriptural media remains an urgent challenge (not only for commercial TV) which, since the arrival of fast-processing computers, can be met by digitizing analogue sources. This does not necessarily result in better image quality but rather in the unforeseen option of addressing not only images (by frames) but even every single picture element (pixel). Images and sounds thus become calculable and can be subjected to algorithms of pattern recognition – procedures which will „excavate" unexpected optical statements and perspectives out of the audio-visual archive, which, for the first time, can organize itself not just according to meta-data but according to its proper criteria – visual memory in its own medium (endogenic).
By translating analogue, photographic images (of which film, of course, still consists) into digital codes, not only do images become addressable in mathematical operations; their ordering as well can be literally calculated (a re-appearance of the principles of picture-hanging envisaged by Diderot in the eighteenth century).
The subjection of images to words is not just a question of addressing, but also of still applying the structuralist-linguistic paradigm to audiovisual data.
Within the medium of film, the practice of montage (cutting) has always already performed a kind of image-based image sorting (by similarity, for example). Cutting has two options: to link images by similarity or by contrast (Eisenstein´s option). Only video – as a kind of intermediary medium between classical cinema and the digital image – has replaced the mechanical addressing of cinematographic images by different means (timecode), offering new options of navigating in stored image space. Automated digital linking of images by similarity, though, creates rather unexpected, improbable links – which, in information theory, are the most informative, the least redundant ones. It also allows for searching for the least probable cuts.
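The idea of the „least probable cut" can be made concrete in a small sketch (an illustration only, not any specific retrieval system): frames are reduced to coarse brightness histograms, and the cut whose adjacent frames differ most is, in information-theoretical terms, the least redundant link. The frame data and bin counts here are toy assumptions.

```python
# Sketch: rank candidate cuts between frames by histogram dissimilarity.
# Toy assumption: each "frame" is a flat list of 8-bit grey values;
# real systems would use colour histograms over decoded video frames.

def histogram(frame, bins=4):
    """Coarse grey-level histogram, normalised to sum to 1."""
    counts = [0] * bins
    for value in frame:
        counts[min(value * bins // 256, bins - 1)] += 1
    total = len(frame)
    return [c / total for c in counts]

def dissimilarity(f1, f2, bins=4):
    """L1 distance between histograms: 0 = identical, 2 = disjoint."""
    h1, h2 = histogram(f1, bins), histogram(f2, bins)
    return sum(abs(a - b) for a, b in zip(h1, h2))

def least_probable_cut(frames):
    """Index i of the cut between frames[i] and frames[i+1] with the
    largest histogram distance - the most 'informative' link."""
    scores = [dissimilarity(frames[i], frames[i + 1])
              for i in range(len(frames) - 1)]
    return max(range(len(scores)), key=scores.__getitem__)

# Three toy frames: two dark, one bright - the improbable cut is 1 -> 2.
frames = [[10, 20, 30, 40], [12, 22, 28, 44], [200, 210, 220, 230]]
print(least_probable_cut(frames))  # -> 1
```

The same scores, sorted ascending instead of maximized, would yield the most redundant links – Eisenstein´s similarity option rather than his contrast option.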
Jurij Lotman explained in his film semiotics: „Joining chains of varied shots into a meaningful sequence forms a story." This is contrasted by Roger Odin´s analysis of Chris Marker´s film La Jetée (1963): how can a medium consisting of single and discrete shots, in which nothing moves internally – photographic moments of time (frozen images) – create narrative effects? Cinematographic sequences are time-based, but film as such – the cinematographic apparatus – „has no first layer of narrativity" when looked at media-archaeologically.
The absence of reproduction of movement <...> tends to block narrativity, since the lack of movement means that there is no Before/After opposition within each shot; the narrative can only be derived from the sequence of shots, that is, from montage.
What happens if that sequence is no longer arranged according to iconological or narrative codes, but rather in an inherently similarity-based mode, leading to a genuinely (image- or media-)archaeological montage?
After a century of creating a genuinely audio-visual technical memory, a new cultural practice of mnemic immediacy emerges: the recycling and feedback of the media archive (a new archival economy of memory). With new options of measuring, naming, describing and addressing digitally stored images, this ocean demands to be navigated (cybernetics, literally) in different ways, no longer merely ordered by classification (the encyclopedic, enlightened paradigm).
This state of affairs has motivated the film director Harun Farocki and the media theorists Friedrich Kittler and Wolfgang Ernst to design a project that performs an equivalent to lexicographical research: a collection of filmic expressions. Contrary to familiar semantic research in the history of ideas (which Farocki calls contentism, that is: the fixation on the fable, the narrative bits), such a filmic archive will no longer concentrate on protagonists and plots, nor list images and sequences according to their authors, time and space of recording and subject; on the contrary, digital image data banks allow for systematizing visual sequences according to genuinely iconic notions (topoi, or – for time-based images – a different notion of Bakhtinian chronotopoi) and narrative elements, revealing literally new insights into their semantic, symbolic and stylistic values. This is exactly what Farocki strove for when, in summer 1995 at the Potsdam Einstein Foundation, he proposed the project of a kind of visual library of film which would not only classify its images according to directors, place and time of shooting, but beyond that digitally systematize sequences of images according to motifs, topoi and, for example, narrative statements, thus helping to create a culture of visual thinking with a visual grammar analogous to linguistic capacities.
Different from verbal space, an active visual thesaurus and grammar for linking images is still lacking; our predominantly scripturally directed culture still lacks the competence of genuinely filmic communication („reading" and understanding).
Genuinely mediatic criteria for storing electronic or filmic images have been listed by the director of the Federal Archives of Germany (Kahlenberg) and the chief archivist of the nationwide public TV channel ZDF (Schmitt). Next to economically driven criteria (recycling of registered broadcasts), they name historical-semantic-iconographic, content-related criteria („inhaltsbezogene Kriterien"): 1. „Dominanzereignisse" (historically event-centered), 2. „politische und soziale Indikationen längerfristiger Entwicklungen und Tendenzen" (political and social indications of long-term developments and tendencies), 3. „soziale Realität im Alltag" (social reality in everyday life). Then follow, under compositional or aesthetic criteria („gestaltungsbezogene bzw. ästhetische Kriterien"): 1. „optische Besonderheiten" (remarkable camera perspectives, such as tilted framing and extreme high- or low-angle shots), 2. „die dramaturgische Gestaltung von Bildsequenzen" (the dramaturgical composition of image sequences: the cut, the opposition of single frames), 3. „besondere Bildmotive" (particular image motifs: landscapes, people) – close to Farocki´s topoi. Last but not least, of course, „medientypische Gesichtspunkte" (media-typical aspects) – the media archives proper, documenting the history of a TV channel itself.
On the market, though, digital video browsing still seeks to reaffirm textual notions such as the story format as a segmentation of a video sequence – for instance the news story, „a series of related scenes with a common content. The system needs to determine the beginning and ending of an individual news story." Beginning and end, though, are in technical terms nothing but cuts here.
With film, time enters the pictorial archive. Once digitized, even the single frame is no longer a static photographic image, but a virtual object which is constantly being re-inscribed on the computer monitor by the electron beam in refresh cycles. While the visual archive has for the longest time in history been an institution associated with unchangeable content, the memory of (time-based) images becomes dynamic itself. Thus images get a temporal index.
The equivalent of iconographic studies of images is the search for macroscopic time objects in moving images, „for instance larger sequences constituting a narrative unit". The media-archaeological look at film, on the contrary, segments serially.
What do we mean by the notion of „excavating the archive"? The answer is media-archaeology instead of iconographical history: what is digitally „excavated" by the computer is a genuinely media-mediated gaze on a well-defined number of (what we call) images.
In a different commercial news analysis system, Farocki´s notion of kinetic topoi occurs: „Each segment has some informative label, or topic. It is this kind of table of contents that we strive to automatically generate" (i. e. by topic segmentation). Of course, „motion is the major indicator of content change"; a zoom shot, for example, is best abstracted by the first, the last, and one frame in the middle.
„Current video processing technologies reduce the volume of information by transforming the dynamic medium of video into the static medium of images, that is, a video stream is segmented and a representative image is ex-<...>"; that is exactly what indexing by words (description) does. How to avoid freezing the analysis into a data bank? „Image analysis looks at the images in the video stream. Image analysis is primarily used for the identification of scene breaks and to select static frame icons that are representative of a scene", using color histogram analysis and optical flow analysis, and speech analysis for the audio component (which can be done by transforming the spoken content of news stories into a phoneme string). Thus the image stream is not subjected to verbal description but rather accompanied by an audio-visual frame analysis.
Retrieval and browsing require that the source material first be effectively indexed. While most previous research in indexing has been text-based (Davis 1993, Rowe et al. 1994), content-based indexing of video with visual features is still a research problem. Visual features can be divided into two levels: low-level image features, and semantic features based on objects and events. <...> a viable solution seems to be to index representative key-frames (O´Connor 1991) extracted from the video sources
- but what is „representative" in that archivo-archaeological context? „Key frames utilize only spatial information and ignore the temporal nature of a video to a large extent".
The basic unit of video to be represented or indexed is usually assumed to be a single camera shot, consisting of one or more frames generated and recorded contiguously and representing a continuous action in time and space. Thus, temporal segmentation is the problem of detecting boundaries between consecutive camera shots. The general approach to the solution has been the definition of a suitable quantitative difference metric which represents significant qualitative differences between frames
- which is exactly the boundary between the iconological and the archaeological gaze, between semantics and statistics, between narrative and formal (in the sense of Wölfflin) topoi.
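The „suitable quantitative difference metric" the quoted passage names can be sketched minimally (a hypothetical illustration, not any of the cited systems): a mean absolute pixel difference between consecutive frames, thresholded to yield shot boundaries, with the middle frame of each shot serving as its key frame. The frame data and the threshold value are assumptions.

```python
# Sketch: temporal segmentation by a quantitative difference metric.
# Frames are toy flat lists of grey values; threshold is an assumption.

def frame_diff(f1, f2):
    """Mean absolute pixel difference between two equal-sized frames."""
    return sum(abs(a - b) for a, b in zip(f1, f2)) / len(f1)

def segment_shots(frames, threshold=50):
    """Return (boundaries, key_frames): a boundary index i marks a cut
    between frames[i-1] and frames[i]; each shot is represented by its
    middle frame, as the key-frame literature suggests."""
    boundaries = [i for i in range(1, len(frames))
                  if frame_diff(frames[i - 1], frames[i]) > threshold]
    starts = [0] + boundaries
    ends = boundaries + [len(frames)]
    key_frames = [(s + e - 1) // 2 for s, e in zip(starts, ends)]
    return boundaries, key_frames

# Two dark frames, then two bright ones: one cut, two key frames.
frames = [[10, 10], [12, 14], [240, 250], [238, 248]]
print(segment_shots(frames))  # -> ([2], [0, 2])
```

This purely statistical procedure marks exactly the boundary named above: it detects formal difference without any semantic notion of what the frames depict.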
Of course, a topos is a rhetorical category; rhetoric, though, is more a technique than a question of content: the philosopher Immanuel Kant, for example, considers the ordering art of topics to be a kind of storage grid for general notions, just as in a library the books are distributed and stored on shelves with different inscriptions. Do we always have to group image features into meaningful objects and attach semantic descriptions to scenes, or does it rather make sense to concentrate on syntax, thus treating semantics as second-order syntax?
The Warburg paradigm
Farocki shares his aim of an archive of expressions with the art historian Aby Warburg, who established, between the World Wars, a visual, photography-based archive of gestural expressions (so-called pathos formulas) in occidental art history, in the form of his Mnemosyne-Atlas (a kind of visual encyclopedia where the reproductions, provided with numbers, could be constantly re-arranged and re-configured). But though Warburg conceived his charts sequentially, even there the apriori of this pictorial memory is still the order of the library; it is the famous Warburg file catalogue (Zettelkästen) which translates both texts and images into alphanumerical notations – as in digital space – which then allow for hypermedia-like links of visual and verbal information (the definition of hypertext according to Ted Nelson).
Encyclopaedia Cinematographica was the name of a film project of the German Institute of Scientific Film (Göttingen) which, under the guidance of Konrad Lorenz, attempted to fix the world of moving beings on celluloid (up to 4000 films). Like the medical films produced at the Berlin hospital Charité between 1900 and 1990, which the media artist Christoph Keller has secured from being thrown away as trash, this visual encyclopedia forms an archive which gains its coherence not from semantically internal but from formally external criteria.
As a first practical „entry" for an analogous international dictionary of filmic terms – and taking one hundred years of film as a motive – Farocki has produced a commentated compilation of the recurrent cinematographic motif of „workers leaving the factory" (Arbeiter verlassen die Fabrik, Germany 1995) – starting with the Lumière brothers´ film La Sortie des usines Lumière (1895) and re-occurring in films by Pier Paolo Pasolini, Michelangelo Antonioni, Fritz Lang, D. W. Griffith and Hartmut Bitomsky. Farocki operates on an iconological level in classifying cinematographic topics; see as well his film The Expression of the Hands (Der Ausdruck der Hände). Here he links gestures which are symptomatic of Taylorism in work (and in the standardisation of filmic rules themselves) with the narrative gestures of such films; in fact there is a film-historic model: the film produced in the US called Hands, showing gestures which don´t tell stories – a phenomenon well known from forensic rhetoric, for example. Today, one option of content-based retrieval in digital archives is the use of statistical object modelling techniques (so-called Hidden Markov Models, probability scores which are deformation-tolerant); the user searches an image database intuitively by applying simple drawings or sketches, for example of a hand.
Here, the subject and the object of The Expression of the Hands become performative. Film itself is not a tactile but a visual medium; still, when it comes to creating a man-machine interface for the querying of image databanks, it is easier to use a computer mouse or digitizer board than an alphanumeric keyboard – a fact which, on the physiological level, matches the disadvantages of text-based retrieval and traditional manual textual indexing.
Farocki is currently working on filmic expressions of symptomatic moments of human encounters and video surveillance in prisons (Ich glaubte, Gefangene zu sehen), to be installed as a two-channel version, that is, two projections blended and overlapping into one, both visually and acoustically.
As in classical iconography, verbal commentaries attached to the image sequences explain or extend the meaning; one of the reference books for this iconological approach, shown in Farocki´s film Der Ausdruck der Hände, is Karl von Amira / Claudius von Schwerin, Rechtsarchäologie. Gegenstände, Formen und Symbole Germanischen Rechts, Berlin (Ahnenerbe-Stiftung) 1943. I want to contrast this with a media-archaeological approach, where the aesthetics of surveillance cameras is taken as a starting point. For this field, automatic image-sorting and comparison algorithms have been developed which might be used as the basis for an audio-visual archive of filmic sequences.
Indeed, the genre of the compilation film already operates on similarity-based image retrieval (by association), as expressed by Pierre Billard in Cinéma 63 (April 1963 issue): there is always a director who feels tempted to create, out of thousands of meters of known film material, new combinations and interpretations, „in order to breathe new life into the material". A veritable memory of waste: what happens to the non utilisées (Nus in the language of the cutters)?
This is why Farocki conceives the visual archive rather as a CD-ROM which can be read/seen vertically and horizontally, i. e. paradigmatically and syntagmatically, different from the linear reading of analogue film and video.
While the Visual Archive of Cinematic Topoi project started out, at first glance, mainly iconographically and iconologically orientated, its coupling with new, asemantically operating digital image-sorting programs opens new perspectives, resulting in a productive but maybe irreconcilable tension between the image-content-based approach (Farocki) and the media-archaeological one (Ernst), which privileges a form-based method of ordering images known from the controversial art historian Heinrich Wölfflin.
While French apparatus theory (Baudry) discovered the ideological pressure and physical discipline exerted upon the viewer by the very technical form of the optical media, which select, frame and direct the visual, the media-archaeological aesthetics on the contrary turns these technical predispositions into a chance for liberating the images from exclusively human perception. An algorithm, though, will never motorically compose images the way the director Farocki does; Farocki is aiming at his individual, rather idiosyncratic archive (Siegfried Zielinski, AMA Cologne). Authoring tools simply do not match the complexity of associations which grow from experience rather than from data banks. „We don´t have to search for new, unseen images, but we have to work on the already known images in a way that they appear new", Farocki comments on his film Images of the World and the Inscription of War (Bilder der Welt und Inschrift des Krieges, 1988/89).
This is true, though, for the digital transformation of images as well. Thus the „death of the author", proclaimed once by Roland Barthes and Michel Foucault, is the precondition for the digital archive: a radical separation from subjective respects. This is the way in which photography acts upon the real, for example by superimposing the faces of different people, blending them into a composite picture for the purpose of apprehending a suspect. This is as well the realm of surveillance cameras, of monitoring systems which actually create an archive of filmic expressions by automatically selecting images according to affinities with an archive – an affinity which, for example, was not seen by the Allied center for aerial photography analysis in Medmenham, England, when the first aerial photographs of the Auschwitz-Birkenau complex, from April 1944 onwards, were not identified as concentration camps by human eyes. „They were not under orders to look for the Auschwitz camp, and thus they did not find it" – analogous to pattern recognition in automated image retrieval systems. Here the Lacanian separation of the camera (and digital) gaze from the human eye makes sense:
Not only does the camera/gaze here manifestly „apprehend" what the human eye cannot, but the eye also seems strikingly handicapped by its historical and institutional placement, as if to suggest that military control extends beyond behaviour, speech, dress, and bodily posture to the very sensory organs themselves.
Human blindness here confronts technical insight:
Again and again, even in 1945, after the Nazis had cleared out the Auschwitz camps <...> Allied airplanes flew over Auschwitz and captured the camps in photographs. They were never mentioned in a report. The analysts had no orders to look for the camps, and therefore did not find them.
Only the I.G. Farben Monowitz chemical factories were of strategic interest to the Allies, which is what attracted the bombers and cameras. Today, of course, it is television cameras that both guide weapons to their targets and provide a record of the event in the same moment.
Thirty-three years after the pictures were shot, two CIA men undertook a new analysis of the images, having been prompted to this search by the Holocaust TV series. They fed into the photo archive computer the coordinates of all strategically important targets situated in the vicinity of the concentration camp – and thus also the coordinates of the I.G. Farben plant at Monowitz; thus the alliance of automated image recognition and military targeting becomes evident. Since the World Wars, bomber planes have been equipped with cameras coordinated with searchlights; in the case of the V2 rocket, camera images were telematically transferred to automatic flight-correction systems, thus no longer being addressed to human eyes at all. „A program is being developed that focuses on sections of aerial photographs and isolates moving objects <...>. More pictures than the eyes of the soldiers can consume".
In the first image from April 4, 1944, they identified the Auschwitz gas chambers.
What distinguishes Auschwitz from other places cannot be immediately observed from these images. We can only recognize in these images what others have already testified to, eyewitnesses who were physically present at the site. Once again there is an interplay between image and text in the writing of history: texts that should make the images accessible, and images that should make the texts imaginable.
So it was only belatedly that the word „gas chamber" was literally inscribed on the photographs. Once again, images can only be retrieved logo-centrically? The alternative is automated image retrieval by image content.
In order to do so, we have to insist on the computability of the imagined world. For monitoring systems to process a large number of electronic images, such as human faces, such systems have to get rid of semantic notions of Gestalt:
The police is not yet able to register the characteristics of a human face that remain the same, in youth and old age, in happiness and in sorrow. <...> And because the police does not know what it is, how to describe the picture of a human being, the police wants at least to take measurements of it, to express its picture in numbers.
Enter the computability of images, which derives ultimately from Albrecht Dürer´s and the Renaissance perspective artists´ scaled pictures (the rules of projective geometry). „This precedes depiction by photographic means" and makes it, reversely, possible for machines to calculate pictures out of numbers and rules, as accentuated by the late media philosopher Vilém Flusser:
Vilém Flusser has remarked that digital technology is already found in embryonic form in photography, because the photographic image is built up out of points and decomposes into points. The human eye synthesizes the points into an image. A machine can capture the same image, without any consciousness or experience of the form, by situating the image points in a coordinate system. The continuous sign-system image thereby becomes divisible into „discrete" units; it can be transmitted and reproduced. A code is thus obtained that comprehends images. This leads one to activate the code and to create new images out of the code language.
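Flusser's argument can be restated as a few lines of code (a deliberately minimal sketch, with a toy 2 x 2 grey-value image as assumption): the image is decomposed into addressable coordinate points, obtained as a transmittable code, re-synthesized from that code, and a new image can be generated out of the code language alone.

```python
# Sketch of Flusser's point: an image decomposes into addressable points.

def encode(image):
    """Decompose a 2-D grid of grey values into (x, y, value) points -
    the transmittable, reproducible 'code' of the image."""
    return [(x, y, row[x]) for y, row in enumerate(image)
            for x in range(len(row))]

def decode(points, width, height):
    """Re-synthesise the image grid from its coordinate code."""
    image = [[0] * width for _ in range(height)]
    for x, y, value in points:
        image[y][x] = value
    return image

image = [[0, 128], [255, 64]]
code = encode(image)
assert decode(code, 2, 2) == image  # lossless round trip

# A new image computed out of the code language alone: a gradient rule
# over coordinates, no camera involved.
gradient = [[(x + y) * 50 for x in range(2)] for y in range(2)]
print(gradient)  # -> [[0, 50], [50, 100]]
```

The round trip is exact precisely because the image has become a set of discrete, quantifiable elements rather than a continuous sign-system.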
This is why the QBIC system does not try to decide radically in the quarrel between semantic versus non-semantic information, but rather distributes the task according to the respective strengths within the human-machine interface:
Humans are much better than computers at extracting semantic descriptions from pictures. Computers, however, are better than humans at measuring properties and retaining these in long-term memory. One of the guiding principles used by QBIC is to let computers do what they do best – quantifiable measurements – and let humans do what they do best – attaching semantic meaning
- which establishes a feedback loop between man and machine and stages the difference between analogue and digital data processing, thus trying not to efface but to creatively enhance the human-computer difference where the two meet at the interface.
Sleeping mnemic energies can be revitalized out of the latent audio-visual archive of film (as long as it is accessible in the public domain of state and public image archives). Opening new ways of access to such archives in an administrative and technical sense is an aim of the project, which will test (or even develop) new tools of image-based image retrieval (see, for example, QBIC). Such a new cultural practice is already being marginally performed by the archival image retrieval software VideoScribe at the Institut National de l'Audiovisuel in Paris.
What remains is the theoretical reflection of this practice in its implications for memory culture and the historiography of film, in order to supplement film-philological approaches by trans-hermeneutic ways of processing cinematic information.
Excavating the cinematographic archive means as well un-covering the hidden virtual machine of the film event, its cuttings and montages hidden behind the apparent narrative. With film, a different aesthetics in the succession of images technically entered: counting with differences to achieve the illusion of continuity in time and space. This allows for searching films according to these rules of the organization of images; the procedure is based on identifying and counting differences of objects (shapes, colors) in digitized images. Such a program is based not on iconographic, word-based criteria, but on the contrary on the computer´s dullness:
No "sample information" can suffice <...> unless great care is taken in finding the points at which the text ceases to be standard and becomes variable. In one sense this type of detective work emulates the original text compiler´s work in creating the text. <...> no understanding of the meaning of the text is required to analyse text in this way. In many ways this is an ideal application for the `dumb´ computer. The computer will be able to sort which elements are similar and which are unique, which are always variable, which are sometimes similar etc. To the computer the mysteries of the meaning of the text <...> are not relevant. The words might as well be figures. Where a database becomes useful in this case is in dealing with quantities of information which would otherwise become unmanageable.
This recalls the color theory of the impressionist school of painting, as analyzed by the late art historian Max Imdahl in his seminal study Farbe (Color) from 1987. Its main characteristic is the „desemantization of seeing" <„Entbegrifflichung des Sehens">, freeing the image from its pictorial logic – an archaeological gaze indeed.
There is a digital image-sorting method developed by the London-based art historian William Vaughan under the name Morelli. It reduces digital pictures to a sort of visual abstract called a „visual identifier", which still keeps the characteristic signature of an image and can then be mathematically compared with similar structures without absorbing enormous amounts of storage space. Contrary to art-historical image banks like Iconclass or the Marburg-based German Documentary Center for the Arts, where we can access images via the internet but still only address them by verbal descriptors like subject or artist name (www.bildindex.de), the specific new option in digital space is the possibility of addressing images in their own medium, according to inherent criteria like formal and color qualities; the IBM search engine QBIC (Query by Image Content), for example, allows one to draw outlines of objects in pictures and to search for similar shapes in the pictorial archive. This, of course, requires an explicitly non-iconographic view of pictures: not seeing, but gazing / scanning (an aesthetics of the scanner), a media-archaeological approach as in Svetlana Alpers´ and Michael Baxandall´s book on Tiepolo, which is explicitly „un-historical": pictorial elements, like ship masts, are analyzed beyond any iconographical content and rather regarded as graphic and geometrical picture elements (macro-„pixels", in a way).
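The principle behind such a „visual identifier" can be sketched as follows; the reduction scheme and the matching metric here are illustrative assumptions, not Vaughan's published algorithm: an image is shrunk to a coarse binary abstract of its tonal composition, and two abstracts are compared cell by cell – a visual equivalent of a word search.

```python
# Hedged sketch of a "visual identifier": reduce an image to a coarse
# binary abstract, then compare two abstracts by simple overlay.

def identifier(image, grid=2):
    """Reduce a 2-D grey image to a grid x grid binary abstract:
    each cell is 1 if its mean brightness exceeds the image mean."""
    h, w = len(image), len(image[0])
    mean = sum(sum(row) for row in image) / (h * w)
    cells = []
    for gy in range(grid):
        for gx in range(grid):
            block = [image[y][x]
                     for y in range(gy * h // grid, (gy + 1) * h // grid)
                     for x in range(gx * w // grid, (gx + 1) * w // grid)]
            cells.append(1 if sum(block) / len(block) > mean else 0)
    return cells

def overlay_match(id1, id2):
    """Fraction of cells that agree - the 'overlay' comparison."""
    return sum(a == b for a, b in zip(id1, id2)) / len(id1)

bright_left = [[200, 10], [220, 5]]    # bright column on the left
bright_left2 = [[190, 20], [210, 15]]  # similar composition
bright_top = [[200, 210], [10, 5]]     # different composition
a, b, c = map(identifier, (bright_left, bright_left2, bright_top))
print(overlay_match(a, b), overlay_match(a, c))  # -> 1.0 0.5
```

Note that the identifier stores only a handful of bits per image, so an archive of such abstracts can be searched exhaustively – the storage economy that motivates the approach.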
There is a significant difference from the historic scholar Morelli, who is well known to have detected frauds in painting by observing seemingly insignificant details which reveal an author´s very individual style: „The automated `Morelli´ system is not concerned with establishing authorship", but rather „with providing an objective means of describing and identifying pictorial characteristics, such as form, configuration, motif, tonality and <...> colour". Here digital image processing takes over:
The comparison is of a simple „overlay" kind, and points of similarity and difference are recorded during the process of comparison. <...> the central <...> is that of a simple matching process. In this sense it is really the visual equivalent of the „word search" that is a standard feature of every word-processing and database package. <...> possible due to the fact that the digitized image is an image that is stored as a set of quantifiable elements."
Between the human notion of an image and the digital, formal format of an image stands the pictogram as a cultural form(alization) – applied as an interface tool for image retrieval by QBIC, where the user is supposed to draw an outline of the object to be found.
Another notion of sorting pictures, derived from camera techniques, brings us even closer to the project of a visual encyclopedia of filmic expressions: blending, which in cognitive linguistics means the virtual combination of two realms of imagination. Different from metaphoric speech, blending transforms the context of image a into image b. If cognitive operations work like this, why not translate them into algorithmic procedures?