
Article: Learning to see the wood for the trees: machine learning, decision trees, and the classification of isolated theropod teeth

Publication: Palaeontology
Volume: 64
Part: 1
Publication Date: January 2021
Page(s): 75-99
Author(s): Simon Wills, Charlie J. Underwood, and Paul M. Barrett
Additional Information

How to Cite

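WILLS, S., UNDERWOOD, C.J., BARRETT, P.M. 2021. Learning to see the wood for the trees: machine learning, decision trees, and the classification of isolated theropod teeth. Palaeontology, 64, 1, 75-99. DOI: 10.1111/pala.12512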

Author Information

  • Simon Wills - Department of Earth Sciences Natural History Museum Cromwell Road, South Kensington London SW7 5BD UK
  • Simon Wills - Department of Earth & Planetary Sciences Birkbeck College Malet Street London WC1E 7HX UK
  • Charlie J. Underwood - Department of Earth & Planetary Sciences Birkbeck College Malet Street London WC1E 7HX UK
  • Paul M. Barrett - Department of Earth Sciences Natural History Museum Cromwell Road, South Kensington London SW7 5BD UK

Publication History

  • Issue published online: 28 February 2021
  • Manuscript Accepted: 07 September 2020
  • Manuscript Received: 17 May 2020

Online Version Hosted By

Wiley Online Library
Get Article: Wiley Online Library [Pay-to-View Access]

Abstract

Taxonomic identification of fossils based on morphometric data traditionally relies on standard linear models to classify such data. Machine learning and decision trees offer powerful alternative approaches to this problem but are not widely used in palaeontology. Here, we apply these techniques to published morphometric data of isolated theropod teeth to explore their utility in tackling taxonomic problems. We chose two published datasets consisting of 886 teeth from 14 taxa and 3020 teeth from 17 taxa, respectively, each with five morphometric variables per tooth. We also explored the effects that missing data have on the final classification accuracy. Our results suggest that machine learning and decision trees yield superior classification results over a wide range of data permutations, with decision trees achieving accuracies of 96% in classifying test data in some cases. Missing data, or attempts to generate synthetic data to overcome missing data, seriously degrade all classifiers' predictive accuracy. The results of our analyses also indicate that using ensemble classifiers that combine different classification techniques, and examining posterior probabilities, are useful aids in checking final class assignments. The application of such techniques to isolated theropod teeth demonstrates that simple morphometric data can yield statistically robust taxonomic classifications, and that lower classification accuracy is more likely to reflect preservational limitations of the data or poor application of the methods.
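The abstract does not describe an implementation, so the following is only an illustrative sketch of the kind of classifier it discusses, not the authors' code or settings. It grows a CART-style decision tree (Gini impurity, axis-aligned thresholds) on rows of five numeric measurements per tooth and reports leaf class proportions as posterior probabilities. It also includes a simple "soft voting" helper that averages posterior probabilities from several classifiers, which is one common way to build an ensemble. The function names, the `max_depth` and `min_size` parameters, the taxon labels "A" and "B", and the toy measurements are all hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((k / n) ** 2 for k in Counter(labels).values())


def best_split(rows, labels):
    """Return the (feature_index, threshold) that most reduces weighted Gini, or None."""
    best, best_score = None, gini(labels)
    for f in range(len(rows[0])):
        values = sorted(set(r[f] for r in rows))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(rows, labels, depth=0, max_depth=5, min_size=2):
    """Grow the tree recursively; each leaf stores class proportions."""
    split = None
    if depth < max_depth and len(labels) >= min_size:
        split = best_split(rows, labels)
    if split is None:
        counts = Counter(labels)
        return {"leaf": {c: k / len(labels) for c, k in counts.items()}}
    f, t = split
    li = [i for i, r in enumerate(rows) if r[f] <= t]
    ri = [i for i, r in enumerate(rows) if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([rows[i] for i in li], [labels[i] for i in li],
                           depth + 1, max_depth, min_size),
        "right": build_tree([rows[i] for i in ri], [labels[i] for i in ri],
                            depth + 1, max_depth, min_size),
    }


def predict_proba(tree, row):
    """Follow the thresholds down to a leaf and return its class proportions."""
    while "leaf" not in tree:
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree["leaf"]


def predict(tree, row):
    """Most probable class at the leaf reached by this row."""
    proba = predict_proba(tree, row)
    return max(proba, key=proba.get)


def accuracy(tree, rows, labels):
    """Fraction of rows whose predicted class matches the true label."""
    return sum(predict(tree, r) == y for r, y in zip(rows, labels)) / len(labels)


def soft_vote(prob_dicts):
    """Average posterior probabilities from several classifiers; ties go to the first label in sorted order."""
    classes = sorted(set().union(*prob_dicts))
    avg = {c: sum(p.get(c, 0.0) for p in prob_dicts) / len(prob_dicts) for c in classes}
    return max(avg, key=avg.get), avg


# Hypothetical toy data: five measurements per tooth for two made-up taxa.
train_X = [
    [10.0, 5.0, 20.0, 3.0, 4.0],
    [11.0, 5.5, 22.0, 3.2, 4.1],
    [10.5, 5.2, 21.0, 3.1, 3.9],
    [20.0, 9.0, 40.0, 1.5, 2.0],
    [21.0, 9.5, 42.0, 1.4, 2.1],
    [19.5, 8.8, 39.0, 1.6, 1.9],
]
train_y = ["A", "A", "A", "B", "B", "B"]

tree = build_tree(train_X, train_y)
print(predict(tree, [12.0, 6.0, 24.0, 3.0, 3.8]))
print(predict_proba(tree, [18.0, 8.0, 36.0, 1.7, 2.2]))
```

On the toy data the tree splits once on the first measurement at 15.25 and classifies the training rows perfectly. Real assessments, like those reported above, would score held-out test data rather than the training set. With deeper real datasets, leaves are often impure, and a low maximum posterior probability (or disagreement between classifiers in `soft_vote`) is a natural cue to re-examine a specimen's assignment.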

PalAss Go! URL: http://go.palass.org/l9v