Exploring deep learning to detect and identify design objects at Design Museum Gent — one step at a time.

Olivier Van D'huynslager
7 min read · Oct 31, 2020


Fascinated by technology and museums — call me a culture geek of sorts — I recently committed myself to a pet project: building a mobile application that can detect and identify objects from the collection of Design Museum Gent in real time and, once it has done so, fetch more contextual information on that particular object from our database.

All in favor, say aye.

So why bother using machine learning and computer vision technology in the first place? What’s in it for the museum? Although the AI hype is real and very much out there, it’s still too early to say what impact the technology will have on the average museum in the long term. Still, we can weigh some of the opportunities:

[0] WE ARE CLOSED. As the COVID pandemic has forced museums all over the world to close their doors to the public, they are looking for alternative ways to reach their audience. Although it can never replace the experience of a museum visit, digital can, if done right, bring some relief. If there ever was a time to consider digitally opening up the collection, the time is now.

[1] NO VACANCY. Museums often face limited exhibition space — especially in comparison to their collections. The space they do have is kept as “clean” and uncluttered as possible. This limits the use of signage offering extra information on the objects on display, in favor of keeping the emphasis on the objects themselves. In search of new ways of offering multiple views on the collection, Design Museum Gent uses several thematic handouts that are freely distributed on-site.

[2] AGILITY IS KEY? Printed publications are fine, but they are hard to update and expensive to reprint. Digital signage could offer a more agile approach that not only allows for changes on a micro level but is also better suited for multilingual content. We could use AI as a way for visitors to engage with our objects and, in doing so, retrieve extra information.

[3] BIGGER BETTER. A big inspiration is MoMA's ongoing collaboration with Google Arts and Culture. Imagine an algorithm combing through 30,000 exhibition photos depicting over 65,000 works, identifying and annotating over 20,000 works in the process. We might not be in the same league as MoMA, but trying doesn’t hurt.

That’s why in this series I’ll be exploring the opportunities, pitfalls, and restrictions of AI in the context of Design Museum Gent. And perhaps one day — at the end of this series? — we will be able to leverage this technology to identify and annotate pictures with objects of our own.

Based on specific features of an object, we could use our model to identify objects in vast sets of exhibition photographs. Source: Archief Gent

Every story needs a beginning.

Before we can get into the technicalities we need a dataset that is fit for the assignment.

Out of the 23,000+ design objects that are part of the collection of Design Museum Gent, only a small number are on display in the permanent exhibition. The museum usually has two semi-permanent exhibitions: Object Stories and Maarten Van Severen & Co. A wild thing. The other half of the museum makes room for temporary exhibitions. Because we envision an application that accompanies the on-site experience, this narrows the selection down to a more manageable number of objects.

Considering both options, Object Stories seems the most suitable. Not only does it offer a broad range of objects on display, but we can also define physical “clusters” of objects in the gallery space. In the back of the room there is a large stage filled with design chairs from various periods; although an interesting pick due to their varied nature, they are not easy to reach because of how they are placed. A more fitting choice would be the collection of 55 ceramics and vases (also spanning different periods). These will become our base set for prototyping our algorithm.

The chosen objects will show a lot of intra-class variation (a vase can take on multiple shapes and vary in color and dimensions). Because of that, we will need a lot of material (images) to train our model and extract as many features as possible.

A selection of chairs on display in Object Stories at Design Museum Gent (exhibition design by FELT)
A selection of vases and ceramics on display in Object Stories at Design Museum Gent (exhibition design by FELT)

Let’s delve deeper.

We will be using Python as our main coding language to do most of the heavy lifting, and to create our model we will be using TensorFlow, an open-source software library developed by Google for machine learning (ML).

However, ML is very versatile and there are many ways to go from here, so it’s important to start by defining our flow. We are facing a classification problem, for which we will need to train a neural network model that classifies the input for us. In this case, the classes will be our object_numbers.

Let’s break down the process.

PREPROCESSING [the boring stuff]

As in any other data science project, the first and perhaps most intensive and time-consuming — yet crucial — part of building the model is preparing our dataset. Training a convolutional neural network takes a two-dimensional image and the class of that image (in our case, the object number) as input:

  • IMAGE: To reach a critical mass big enough to train, validate, and test our model, we need at least 100 images of each object (80 for training / 20 for validation), which totals 5,500 images for just 55 objects. We might need to pull this number up (or down?), but let’s see how far we can get. To give an idea of where we’re at now: our image count per object currently ranges anywhere between 1 and 5.
  • CLASS: It is crucial that the folders containing our images are structured in a way that makes it machine-readable which image belongs to which class. An easy way to do this is to create a subdirectory for each object:

├── TRAINING_IMG
│ ├── 0539
│ │ ├── 0539$1.jpg
│ │ ├── 0539$2.jpg
│ │ ├── 0539$3.JPG
│ │ ……
│ │ ├── 0539$43.jpeg
│ │ ├── 0539$44.jpeg
│ │ ├── 0539$45.jpeg
│ │ └── 0539$46.jpeg
│ ├── 1451–1–2
│ │ ├── 1451_1–2$1.jpg
│ │ ├── 1451_1–2$2.jpg
│ │ ├── 1451_1–2$3.jpg
│ │ …
│ │ ├── 1451_1–2$41.JPG
│ │ ├── 1451_1–2$42.JPG
│ │ └── 1451_1–2$43.JPG
│ ├── 1452–1_2
│ │ …

  • We can then make use of the tf.data API to do the heavy lifting for us when loading, transforming, and normalizing our set so we can feed it to our neural network (a sketch follows below).
Image after preprocessing
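As a rough sketch of what that pipeline could look like (the directory name matches the tree above; the 28×28 input size is an assumption inferred from the model summary further down, and the 80/20 split follows the numbers mentioned earlier):

import tensorflow as tf

IMG_SIZE = (28, 28)   # assumed input size, inferred from the model summary below
BATCH_SIZE = 32

# Each subdirectory name (the object number) becomes a class label.
train_ds = tf.keras.preprocessing.image_dataset_from_directory(
    "TRAINING_IMG",
    validation_split=0.2,   # 80 / 20 split between training and validation
    subset="training",
    seed=42,
    image_size=IMG_SIZE,
    batch_size=BATCH_SIZE,
)
val_ds = tf.keras.preprocessing.image_dataset_from_directory(
    "TRAINING_IMG",
    validation_split=0.2,
    subset="validation",
    seed=42,
    image_size=IMG_SIZE,
    batch_size=BATCH_SIZE,
)
class_names = train_ds.class_names  # the object numbers, e.g. "0539"

# Normalize pixel values to [0, 1] and let tf.data prefetch batches.
train_ds = train_ds.map(lambda x, y: (x / 255.0, y)).prefetch(tf.data.experimental.AUTOTUNE)
val_ds = val_ds.map(lambda x, y: (x / 255.0, y)).prefetch(tf.data.experimental.AUTOTUNE)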

BUILDING THE MODEL

After gathering and normalizing the needed material and preprocessing our data into training, validation, and test sets, we can finally start thinking about what model might be best suited for this case. Weighing the advantages and disadvantages of different models, a convolutional neural network (CNN) seems about right. Without going into too much detail, a CNN is a class of deep neural networks commonly used to analyze visual imagery. It slides small filters (whose learned parameters are the weights) across the image, looking for features and characteristics that enable it to learn and, eventually, to classify new pictures.
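A minimal Keras sketch of such a network, reconstructed to match the summary below (the 28×28 RGB input shape and the 0.5 dropout rate are assumptions), could look like this:

from tensorflow.keras import Sequential
from tensorflow.keras.layers import (Activation, Conv2D, Dense, Dropout,
                                     Flatten, MaxPooling2D)

# Small CNN matching the summary below: two conv/pool blocks followed by a
# dense classifier over the 31 object classes.
model = Sequential([
    Conv2D(128, (3, 3), input_shape=(28, 28, 3)),  # assumed 28x28 RGB input
    Activation("relu"),
    MaxPooling2D(pool_size=(2, 2)),
    Conv2D(128, (3, 3)),
    Activation("relu"),
    MaxPooling2D(pool_size=(3, 3)),
    Flatten(),
    Dense(128),
    Activation("relu"),
    Dropout(0.5),           # dropout rate is an assumption
    Dense(31),              # one output per object class
    Activation("softmax"),
])
model.summary()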

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d (Conv2D)              (None, 26, 26, 128)       3584
_________________________________________________________________
activation (Activation)      (None, 26, 26, 128)       0
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 13, 13, 128)       0
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 11, 11, 128)       147584
_________________________________________________________________
activation_1 (Activation)    (None, 11, 11, 128)       0
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 3, 3, 128)         0
_________________________________________________________________
flatten (Flatten)            (None, 1152)               0
_________________________________________________________________
dense (Dense)                (None, 128)                147584
_________________________________________________________________
activation_2 (Activation)    (None, 128)                0
_________________________________________________________________
dropout (Dropout)            (None, 128)                0
_________________________________________________________________
dense_1 (Dense)              (None, 31)                 3999
_________________________________________________________________
activation_3 (Activation)    (None, 31)                 0
=================================================================
Total params: 302,751
Trainable params: 302,751
Non-trainable params: 0
_________________________________________________________________

Now it’s time to compile our model (model.compile) and start fitting it (model.fit) on our training set, using the validation set to monitor performance.
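A minimal sketch of that step, reusing the train_ds and val_ds from the preprocessing sketch above (the Adam optimizer and sparse categorical cross-entropy loss are assumptions; the 20 epochs match the run described below):

# Compile with a standard optimizer and a loss suited to integer class labels.
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# Fit on the training set, monitoring performance on the validation set.
history = model.fit(
    train_ds,
    validation_data=val_ds,
    epochs=20,
)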

HUMBLE BEGINNINGS...

We will start tuning our model on 31 objects for which we have collected a fair number of images (782) — an average of roughly 25 images per object (which is a bit low).

Restricted to 20 epochs, our results are already looking promising: an accuracy of 88.9%, a validation accuracy of 82.6%, and low losses for both training and validation.

TIME FOR TESTING

This is still an early prototype, but it’s worth feeding it some unseen images, so we introduce 5 new images to test our model… The results (5/5 correct) might seem very promising at first, but there’s a chance this is due to overfitting. For now, though, we can proclaim, and I quote: “it’s alive, IT’S ALIVE!”.
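For reference, classifying a single unseen photo with the trained model could look something like this (the file path is hypothetical; model and class_names come from the sketches above):

import numpy as np
import tensorflow as tf

# Hypothetical test image; resize to the same input size used for training.
img = tf.keras.preprocessing.image.load_img(
    "TEST_IMG/unseen_vase.jpg", target_size=(28, 28)
)
x = tf.keras.preprocessing.image.img_to_array(img) / 255.0
x = np.expand_dims(x, axis=0)  # the model expects a batch dimension

probs = model.predict(x)[0]
print("predicted object number:", class_names[np.argmax(probs)])
print("confidence:", float(np.max(probs)))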


Written by Olivier Van D'huynslager

Digital Strategist @ Design Museum Gent | Strategic content manager @CoGhent | overall Culture Geek — interested in AI and its value for museums.
