Detecting Ivory Artifacts With Deep Learning | by Ryan Posternak | Aug, 2022

By Jessie Hobb On Aug 12, 2022

Image recognition can help fight against illegal online ivory sales

Introduction

To date, wildlife conservation groups and organizations have used diplomatic and public pressure campaigns as their main tool in the fight against the continued existence of the market for elephant ivory. While this form of action is certainly worthwhile and can lead to monumental success (which I’ll describe further below), it need not be the only tool in the arsenal. Technology and machine learning can be deployed to combat illicit sales of ivory online, a critical next step once ivory sales are made illegal in a country. This article will outline how we developed a proof-of-concept of such a model, using open-source data on ivory-based artifacts to classify whether objects consist of ivory or not with over 80% accuracy.

Background

Elephants are crucial not only as biological and cultural icons, but as keystone organisms in their ecosystems. The three main species (the African forest elephant, the African savanna elephant, and the Asian elephant) each serve myriad purposes in their respective environments, such as habitat creation, seed dispersal, forest pathway creation, and brush cover management. In 1930, an estimated 10 million wild elephants existed on the African continent. But after decades of poaching, habitat loss, and other human interventions, that number declined to approximately 496,000 by 2007. In the following seven years, the elephant population of Africa was further reduced by 30% to 352,000 according to The Great Elephant Census, one of the largest wildlife surveys ever conducted.

The World Wildlife Fund (WWF) estimates that poachers kill 20,000 wild elephants every year. The vast majority of these killings are done for illegal commercial purposes, mainly to obtain the elephant’s valuable tusks, the source of ivory. With so few elephants remaining, this poaching presents an existential threat to the continued survival of elephants on the planet. In 2016, many countries including the United States, the United Kingdom, and Singapore implemented near-total bans on the importation and sale of ivory and ivory products. In a massively consequential step, China followed suit the next year by ending the legal sale of ivory within the country, a move that the Environmental Investigation Agency described as “the single biggest step to ending the slaughter of elephants”. Smithsonian Magazine estimates that the ban eliminated as much as 70% of the global ivory market.

As significant an improvement as this step was, there is still much work to be done. Consumer transports of ivory from abroad are still allowed in China, and in countries such as Vietnam and Thailand, the sale of ivory is illegal in name only, with little or no enforcement. This leads to a wholly unregulated market; the Sustainability Times estimates that between 2009 and 2018 over 56 tons of elephant ivory entered Vietnam’s black market, and another 20 tons was intercepted in transit.

To combat the continued existence of the ivory trade, wildlife conservationists are tirelessly urging nations across the world to enact stricter legislation, but the emergence of online shopping and peer-to-peer marketplaces presents a challenging roadblock. A recent study by the Durrell Institute of Conservation and Ecology (DICE) in Kent, U.K. found that the illegal ivory trade thrives on online peer-to-peer marketplaces. Although on most major websites specific product listings can be searched for by keywords, there is no easy way to tell if ivory products are listed in postings under alternative names or pseudonyms.

The aim of this project is to create a proof-of-concept of a machine learning image recognition model that processes image data of three-dimensional objects and artifacts and classifies them as consisting of ivory or not. Such a proof-of-concept could be used as a starting point for a model that can aid online peer-to-peer marketplaces, auction sites, law enforcement agencies and wildlife NGOs to identify listings potentially containing illegal ivory products and flag them for further investigation.

Data Sourcing

Due to the illegality of online ivory sales in most nations across the world, obtaining a dataset of sufficient size and quality to train a neural network is no simple task. Ideally, such a dataset would be obtained from contemporary processed ivory artifacts that are for sale on marketplaces where the model is intended to be deployed. Lacking this, we decided to go the non-contemporary route and use image data sourced from The Metropolitan Museum of Art in New York City. The museum, which hosts a collection of nearly half a million paintings, drawings, sculptures, and other artifacts spanning over 5,000 years of history, has open-sourced data on nearly all of the items in its collection through the museum’s Collection API. The API is freely available for commercial and non-commercial use, and requires no API key to use the service.

Artifacts are looked up in the API via the object’s “objectID”, a 5 or 6 digit identifying code. The lookup returns a dictionary with 57 items of information, covering everything from the artifact’s department, title, and artist name to its medium(s), region, and measurements, although many of the fields are blank for most items. The museum provides images for most artifacts in the API, through specific keys in the returned dictionary that hold URLs to the JPEG images.
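A minimal sketch of such a lookup, using only the Python standard library. The base URL and the `primaryImage` and `primaryImageSmall` field names reflect our understanding of the Met's public API documentation and should be checked against the current docs. The helper names are hypothetical and are not taken from the project code.

```python
import json
from urllib.request import urlopen

# Assumed base URL of the Met Collection API (no API key required).
BASE_URL = "https://collectionapi.metmuseum.org/public/collection/v1"


def object_url(object_id):
    """Build the lookup URL for a single artifact."""
    return f"{BASE_URL}/objects/{int(object_id)}"


def fetch_object(object_id, timeout=10):
    """Download one artifact record and return it as a dictionary."""
    with urlopen(object_url(object_id), timeout=timeout) as response:
        return json.load(response)


def image_url(record):
    """Return the artifact's image URL, or None when no image is provided."""
    # Many fields are blank for most items, so an empty string counts as missing.
    return record.get("primaryImage") or record.get("primaryImageSmall") or None
```

Calling `image_url(fetch_object(some_id))` would then give the JPEG link for one artifact, or `None` if the museum has not published an image for it.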

For this project, we first searched for artifacts that have the word “ivory” listed among the object’s medium(s). This query returned 5,975 artifacts, which formed the initial collection of ivory pieces for training our neural network. For non-ivory images, we searched for objects with “ceramic” listed among the mediums, reasoning that ceramic figures would be similar in size and shape to most ivory objects and would therefore be a challenging comparison class for the neural networks. To get a feel for how some of the objects look, below are five images from the final dataset, with ground-truth labels displayed above each artifact.

Images sourced from the Metropolitan Museum of Art, NYC

After further exploration of the data, we discovered that our initial collection of ivory objects (all artifacts listing ivory as a medium) contained many art pieces where ivory was only a minor component of the design. In many of these cases, ivory made up so little of the artifact that including such objects would introduce more noise than signal. For this reason, we limited the ivory artifacts passed to the models to objects where ivory was the sole or main component. Of the 5,975 ivory artifacts in our original API call, 1,769 remained after selection refinement and preprocessing. These 1,769 ivory images were paired with 1,767 non-ivory artifact images.
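One way such a filter could look is sketched below. This is a hypothetical heuristic, not the selection rule used in the project: it assumes the medium field is free text that lists materials in order of importance, separated by commas, semicolons, "and" or "with", and it keeps an artifact only when ivory comes first and few other materials are listed. The function names and the `max_materials` cutoff are our own illustrative choices.

```python
import re


def split_materials(medium):
    """Split a free-text medium description into individual materials."""
    parts = re.split(r",|;|\band\b|\bwith\b", medium.lower())
    return [part.strip() for part in parts if part.strip()]


def is_mainly_ivory(medium, max_materials=2):
    """Heuristic: ivory is listed first and few other materials appear."""
    materials = split_materials(medium)
    if not materials:
        return False  # blank medium field
    return "ivory" in materials[0] and len(materials) <= max_materials


for text in ["Ivory", "Ivory, gold", "Wood, ivory, mother-of-pearl, brass"]:
    print(f"{text!r}: keep={is_mainly_ivory(text)}")
```

A rule like this is crude, so spot-checking the kept and rejected artifacts by eye would still be needed.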

Distribution of number of materials comprising ivory artifacts

The Model

The data was partitioned into train/validation/test splits prior to modeling. Of the 3,536 total images in the dataset, 2,265 were used for training, 565 were used for validation, and 706 were used for testing. Non-ivory artifacts were assigned the normal state for this binary classification, and given a class label of 0. Ivory artifacts were assigned the abnormal state and given a label of 1.
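The split sizes above are consistent with holding out 20% of each class for testing and then 20% of the remainder for validation, rounding down each time. That procedure is an inference from the counts, not something stated about the project, and the function below is a hypothetical standard-library sketch of it.

```python
import random


def stratified_split(items_by_class, holdout=0.2, seed=0):
    """Split each class into train/validation/test lists of (item, label)."""
    rng = random.Random(seed)
    train, val, test = [], [], []
    for label, items in items_by_class.items():
        items = list(items)
        rng.shuffle(items)
        n_test = int(len(items) * holdout)   # rounds down
        rest = items[n_test:]
        n_val = int(len(rest) * holdout)     # rounds down
        test += [(item, label) for item in items[:n_test]]
        val += [(item, label) for item in rest[:n_val]]
        train += [(item, label) for item in rest[n_val:]]
    return train, val, test


# Placeholder integers stand in for image files: label 1 = ivory, 0 = non-ivory.
data = {1: range(1769), 0: range(1767)}
train, val, test = stratified_split(data)
print(len(train), len(val), len(test))
```

Splitting per class keeps the train, validation and test sets as balanced as the full dataset.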

Six models in total were tested as part of this project. Our baseline model was a fully-connected (dense) neural network implemented in Keras. The baseline model achieved an accuracy score of 72% on the testing data, 22 percentage points better than a dummy classifier, which would score about 50% because the two classes are balanced.
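The comparison with a dummy classifier can be checked with a few lines of arithmetic. The sketch assumes the dummy classifier always predicts the majority class and uses the class counts of the full dataset, since the per-class counts of the test split are not given.

```python
n_ivory, n_non_ivory = 1769, 1767
baseline_accuracy = 0.72  # dense network on the testing data

# A majority-class classifier is right whenever the larger class occurs.
dummy_accuracy = max(n_ivory, n_non_ivory) / (n_ivory + n_non_ivory)

point_gain = (baseline_accuracy - dummy_accuracy) * 100
relative_gain = (baseline_accuracy / dummy_accuracy - 1) * 100

print(f"dummy accuracy: {dummy_accuracy:.3f}")
print(f"gain: {point_gain:.1f} percentage points ({relative_gain:.1f}% relative)")
```

The gain is about 22 percentage points in absolute terms, which is a larger improvement when expressed relative to the dummy score.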

Our best performing model was in fact our first iteration of a convolutional neural network (CNN), which reached 82.4% accuracy on the testing data. The details, parameters, and evaluation metrics of this model can be found below. Subsequent iterations of the CNN switched the optimizer to Adam (adaptive moment estimation), added L2 (ridge) regularization, reduced the batch size, and increased the number of filters. Despite the tuning and tinkering, none of these models outperformed the first CNN model on testing data accuracy score, although some were very close.

Best Performer: Keras Sequential Convolutional Neural Network, V1

Neural Network Model Architecture:
Sequential

Input Layer: Conv2D (filters: 32, kernel size: (3, 3), activation: relu)
MaxPooling2D (pool size: (2, 2))
Conv2D (filters: 32, kernel size: (4, 4), activation: relu)
MaxPooling2D (pool size: (2, 2))
Conv2D (filters: 64, kernel size: (3, 3), activation: relu)
MaxPooling2D (pool size: (2, 2))
Flatten()
Dense (units: 64, activation: relu)
Dense (units: 1, activation: sigmoid)
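The listing above could be written in Keras roughly as follows. The layer sizes come from the listing; the input shape, optimizer and loss are assumptions, since they are not stated here, and `build_cnn` is a hypothetical name. TensorFlow is imported inside the function so the snippet can be loaded on a machine where it is not installed.

```python
# (filters, kernel size) for the three convolutional blocks in the listing.
CONV_BLOCKS = [(32, (3, 3)), (32, (4, 4)), (64, (3, 3))]


def build_cnn(input_shape=(128, 128, 3), optimizer="rmsprop"):
    """Build the sequential CNN; input_shape and optimizer are assumptions."""
    from tensorflow import keras
    from tensorflow.keras import layers

    model = keras.Sequential()
    model.add(keras.Input(shape=input_shape))
    for filters, kernel_size in CONV_BLOCKS:
        model.add(layers.Conv2D(filters, kernel_size, activation="relu"))
        model.add(layers.MaxPooling2D(pool_size=(2, 2)))
    model.add(layers.Flatten())
    model.add(layers.Dense(64, activation="relu"))
    # A single sigmoid unit outputs the probability of class 1 (ivory).
    model.add(layers.Dense(1, activation="sigmoid"))

    # Binary cross-entropy is the usual loss for a sigmoid output (assumed).
    model.compile(
        optimizer=optimizer,
        loss="binary_crossentropy",
        metrics=["accuracy", keras.metrics.Precision(), keras.metrics.Recall()],
    )
    return model
```

Calling `build_cnn().summary()` prints the layer shapes and parameter counts for whatever input size is chosen.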

Parameters:

Evaluation Metrics (Testing Data):

Accuracy: 0.824
Loss: 0.444
Precision: 0.840
Recall: 0.802
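For readers less familiar with these metrics, the sketch below shows how accuracy, precision and recall follow from the four cells of a confusion matrix. The counts are illustrative only and are not the project's confusion matrix: they are one set of values that reproduces the reported figures if the 706 test images were split evenly at 353 per class, which is itself an assumption.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision and recall from confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,   # share of all predictions that are right
        "precision": tp / (tp + fp),     # share of flagged items that are ivory
        "recall": tp / (tp + fn),        # share of ivory items that get flagged
    }


# Illustrative counts (assumed, see above): ivory is the positive class.
example = classification_metrics(tp=283, fp=54, fn=70, tn=299)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```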

As part of the evaluation phase of the model, we examined some of the artifacts that the model incorrectly classified. The figure below shows five such artifacts, with the labels again representing the ground truth class of each object (so the model predicted the opposite of whatever the label is).

Something you may have picked up on is that none of the ivory objects have the distinctive tan or yellowish-brown color that ivory is known for. Additionally, the first and last items appear almost to resemble some sort of stone or rock, and certainly aren’t molded into shapes that one would expect for an artistic piece made from ivory. The fourth object does have a more classic figure and representation that one might expect, but it is in fact not ivory. Based on our analysis, the model had the most difficulty with false negatives where the color and form of the object did not fit the mold of an ivory-based artifact (yellowish and sculpted into a human or animal figurine), and it had the most difficulty with false positives when the artifact did fit the mold of such an object but, of course, wasn’t.

Conclusion

While many of the ivory objects in this dataset are difficult to classify, our results show that successful classification of ivory artifacts is possible. An accuracy score of 82.4% is likely too low for deployment, but the fact that we were able to build a model that performed substantially better than chance shows that our proof-of-concept was successful. Additionally, due to the nature of the dataset, it would be extremely difficult to achieve very high accuracy scores on these objects. Some of the objects in our dataset were hundreds or thousands of years old, which can lead to discoloration, disfiguration, or loss of parts of the original art piece. There were also many artifacts that may have historical or artistic significance but are likely not representative of the ivory-based objects sold on black markets today.

Though no other model scored better than the first CNN in terms of overall accuracy on the testing data, some of the models did swing quite a bit in terms of the precision-recall trade-off. As can be seen in the evaluation metrics and confusion matrix above, our best model leaned towards precision (0.840, against a recall of 0.802): when it flagged an artifact as ivory it was usually right, at the cost of missing more ivory objects. Our Adam-optimized model, however, was the opposite, correctly identifying more of the ivory-based objects (higher recall). Organizations using such a model should consider the precision-recall trade-off and decide where on that spectrum they want the model to sit. If catching every single potential sale of ivory is a priority, then a high recall score should take precedence. On the other hand, if organizations want to limit wasting resources on false leads and only have the model flag an image when it is very likely to be ivory, they should prioritize a high precision score.
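Besides choosing between models, the same trade-off can be adjusted on a single model by moving the decision threshold applied to its sigmoid output. The sketch below uses made-up labels and scores purely for illustration; they are not predictions from any of our models.

```python
def precision_recall_at(threshold, labels, scores):
    """Precision and recall when scores >= threshold are flagged as ivory."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    # With nothing flagged, precision is set to 1.0 by convention.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy data: 1 = ivory, 0 = non-ivory; scores are predicted ivory probabilities.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.95, 0.80, 0.60, 0.30, 0.70, 0.40, 0.20, 0.10]

for threshold in (0.25, 0.50, 0.75):
    p, r = precision_recall_at(threshold, labels, scores)
    print(f"threshold={threshold:.2f}  precision={p:.2f}  recall={r:.2f}")
```

On this toy data a low threshold catches every ivory item but raises more false alarms, while a high threshold flags only sure cases and misses half of the ivory items.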

Given additional time and resources, there are multiple next steps that could be taken which have the potential to yield better results. First and foremost, obtaining additional and more recent data would be most likely to yield a more accurate and useful model. Ideally, this data would consist of the images attached to real postings of ivory objects for sale on the online marketplaces where the model would be deployed. Additionally, experimentation with other neural networks and further hyperparameter tuning could also prove beneficial. In this project we only used dense and convolutional neural networks, but there are other forms or variations of neural networks in existence that have shown promising results on image recognition tasks.

Let’s connect! I encourage you to comment, share, or message me directly with your thoughts on the ideas and techniques presented here, or suggestions on interesting topics or resources that I should look into going forward.
