Computer Vision Understanding of Narrative Strategies on Greek Vases

Madhu P, Kosti RV, Bendschus T, Reinhardt C, Maier A (2020)


Publication Language: English

Publication Type: Conference contribution, Original article

Publication year: 2020

Conference Proceedings Title: Computer Applications and Quantitative Methods in Archaeology

Event location: Oxford, UK

Abstract

In this presentation we tackle the problem of understanding complex relationships and cultural
references within artworks of Classical Antiquity using Computer Vision techniques based on Machine
Learning. We thereby approach the open challenge of understanding the semantics of images
beyond the automated recognition of individual objects, by computationally identifying general
formal schemes of human interaction.

Our dataset consists of several thousand images of Attic vase paintings from the 6th and 5th
centuries BC. The specific case of vase paintings presents us with a very promising material basis
because the understanding of iconography and visual narration in ancient imagery is a main focus
in the scientific field of Classical Archaeology. Because the vase painters of this period
did not illustrate a textual version of myth, we have to look at the specific visual means
they applied to narrate a story and to make it understandable for the viewer. These images make
extensive use of highly meaningful and repetitive formal schemes as well as narrative elements.
Both were used in order to characterize the individuals within the story as well as to make the
situations in which these protagonists interact comprehensible for the ancient viewer [1]. On the
one hand, the interaction and relationship of the figures are outlined by significant postures and
gestures (schemata) in order to illustrate key aspects of the storyline. As these schemes are not
limited to certain iconographies, visual links between different images and different image
contexts occur [2]. Thanks to their familiarity with these schemes, ancient viewers were already able
to identify the general meaning of the image and to understand the main components of a
narration. On the other hand, however, other characteristic elements are needed to determine a
depiction as, for example, a specific mythological story and to add further aspects of the narration.
Hence, narrative elements such as attributes of gods, a certain surrounding or other significant
objects in the image, are also essential to understand specific narrative contexts.

Visual similarities are therefore the key for interpreting vase paintings. By comparing a large
number of images, scholars usually study, at length, specific elements that help to narrate the
stories. Although valuable insights are gained in terms of understanding the significance of the
image relations for the cultural meaning of the vase paintings, unveiling links between images
(mainly between non-narrative images – e.g. wedding scenes – and narrative ones) contributes
even more details to our interpretation of these images and helps to comprehend their cultural
meaning. Therefore, we aim to develop Computer Vision and Deep Learning methods that support
human research on the questions of archaeological image studies. The main foci of our research
are fresh perspectives on as yet unidentified image relations and quantitative results obtained
from a large number of images, based not on textual metadata but on the visual characteristics
of an image.

A selection of popular schemes in Attic vase paintings was chosen in order to answer the
questions raised above: 1. Leading the bride, 2. Abduction, 3. Pursuit scenes. These schemes
were deemed suitable as, for example, the scheme of leading the bride in Attic vase paintings is
characterized by a significant leading-gesture (χεῖρ’ ἐπὶ καρπῷ – hand on wrist / hand on hand)
that sets the bride and groom in non-mythological wedding scenes. However, scenes of
Helen and Menelaus also make use of this scheme to show that Menelaus leads Helen home from
Troy like a bride after regaining power over his wife. Therefore, not only must the interaction
scheme between the two figures be recognized, but the narrative elements, i.e. Menelaus
dressed as a warrior and his weapons, also have to be identified in order to name the figures.

We approach this problem of recognizing specific visual attributes using state-of-the-art object
recognition algorithms in Computer Vision [3]. We present a novel approach using convolutional
neural networks that maps the attributes occurring in these images into a deep representational
space. This space brings the inherent variations in the depiction of any particular attribute
together in a joint embedding. We recognize these objects using
this embedding. However, attributes depicted in the narratives alone do not suffice. The gestures
and poses of the characters highlight their role and importance in the narration. We model these
characteristics of the protagonists in the deep neural network in conjunction with the recognition
of the attributes. This also helps the network to recognize the narratives being displayed. Another
novel method is developed that uses this representation space along with the context of the
narrative for a better understanding of the depicted scene in the artwork. Here, we use the main
theme of the scene (e.g. leading the bride or pursuit scene) as a context. We analyze the model
performance with abstract classes where we combine multiple similar attributes into a single
abstract class. This model learns a generic representation of the attributes through these abstract
classes. This new approach has important applications, including the retrieval of semantically
similar artworks and, in a more generic sense, contextual understanding of works of art in
Classical Archaeology.
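
As a rough illustration of two ideas from the paragraph above (grouping similar attributes into abstract classes, and retrieving semantically similar artworks from a representation space), the following minimal Python sketch may help. It is not the authors' implementation: the label grouping in ABSTRACT_CLASS, the scene names, and the three-dimensional toy embeddings are all hypothetical, and cosine similarity is simply one common choice for comparing embeddings. In practice the vectors would come from the trained network.

```python
import math

# Hypothetical grouping of fine-grained attribute labels into abstract classes.
# The abstract does not list the actual classes; these are illustrative only.
ABSTRACT_CLASS = {
    "spear": "weapon",
    "sword": "weapon",
    "shield": "armour",
    "helmet": "armour",
}


def to_abstract(labels):
    """Collapse fine-grained labels into a sorted list of unique abstract classes.

    Labels without a known abstract class are kept unchanged.
    """
    return sorted({ABSTRACT_CLASS.get(label, label) for label in labels})


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query, gallery, top_k=2):
    """Return the names of the top_k gallery items most similar to the query."""
    ranked = sorted(
        gallery,
        key=lambda name: cosine_similarity(query, gallery[name]),
        reverse=True,
    )
    return ranked[:top_k]


# Toy embeddings standing in for network outputs (assumed values).
gallery = {
    "wedding_procession": [0.9, 0.1, 0.0],
    "helen_and_menelaus": [0.8, 0.3, 0.1],
    "pursuit_scene": [0.0, 0.2, 0.9],
}
query = [1.0, 0.2, 0.0]

print(to_abstract(["spear", "sword", "helmet"]))
print(retrieve(query, gallery))
```

With these toy values, the two scenes that share the leading-the-bride scheme rank above the pursuit scene, which is the kind of cross-iconography link the retrieval application is meant to surface.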

Preliminary results of the recognition of attributes are promising. The network is able to predict
most of the attributes present in the scene whose labels are available, and quite often predicts
those attributes whose labels are missing. This is remarkable: it indicates that the network
generalizes well and serves as a proof of concept of the techniques described
above. Our proposed methods therefore show good potential for working towards the
semantic understanding of the vase paintings. Because the approach is innovative and experimental,
we would like not only to present our first results to the audience but, in particular, to discuss
the problems encountered and the further potential of the applied methods.

References:
[1] Giuliani, L., 2003. Bild und Mythos. Geschichte der Bilderzählung in der griechischen Kunst, München: Beck.
[2] Catoni, M. L., 2008. La comunicazione non verbale nella Grecia antica: gli schemata nella danza, nell'arte, nella
vita, Turin: Bollati Boringhieri.
[3] Lin, T.Y., Goyal, P., Girshick, R., He, K. and Dollár, P., 2017. Focal loss for dense object detection. In Proceedings
of the IEEE international conference on computer vision (pp. 2980–2988).

How to cite

APA:

Madhu, P., Kosti, R.V., Bendschus, T., Reinhardt, C., & Maier, A. (2020). Computer Vision Understanding of Narrative Strategies on Greek Vases. In Computer Applications and Quantitative Methods in Archaeology. Oxford, UK, GB.

MLA:

Madhu, Prathmesh, et al. "Computer Vision Understanding of Narrative Strategies on Greek Vases." Proceedings of the 48th Computer Applications and Quantitative Methods in Archaeology Conference, Oxford, UK 2020.
