
Article: Biases with the Generalized Euclidean Distance measure in disparity analyses with high levels of missing data

Publication: Palaeontology
Volume: 62
Part: 5
Publication Date: September 2019
Page(s): 837–849
Author(s): Oscar E. R. Lehmann, Martín D. Ezcurra, Richard J. Butler, and Graeme T. Lloyd
Additional Information

How to Cite

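LEHMANN, O.E.R., EZCURRA, M.D., BUTLER, R.J., LLOYD, G.T. 2019. Biases with the Generalized Euclidean Distance measure in disparity analyses with high levels of missing data. Palaeontology, 62, 5, 837–849. DOI: 10.1111/pala.12430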

Author Information

  • Oscar E. R. Lehmann - Sección Paleontología de Vertebrados CONICET–Museo Argentino de Ciencias Naturales ‘Bernardino Rivadavia’ C1405DJR Buenos Aires Argentina
  • Martín D. Ezcurra - Sección Paleontología de Vertebrados CONICET–Museo Argentino de Ciencias Naturales ‘Bernardino Rivadavia’ C1405DJR Buenos Aires Argentina
  • Martín D. Ezcurra - School of Geography, Earth & Environmental Sciences University of Birmingham Edgbaston Birmingham B15 2TT UK
  • Richard J. Butler - School of Geography, Earth & Environmental Sciences University of Birmingham Edgbaston Birmingham B15 2TT UK
  • Graeme T. Lloyd - School of Earth & Environment University of Leeds Leeds LS2 9JY UK

Publication History

  • Issue published online: 29 August 2019
  • Manuscript Accepted: 31 January 2019
  • Manuscript Received: 26 September 2018

Funded By

H2020 European Research Council. Grant Number: 637483

Online Version Hosted By

Wiley Online Library
Get Article: Wiley Online Library [Pay-to-View Access]

Abstract

The Generalized Euclidean Distance (GED) measure has been extensively used to conduct morphological disparity analyses based on palaeontological matrices of discrete characters. This is in part because some implementations allow the use of morphological matrices with high percentages of missing data without needing to prune taxa for a subsequent ordination of the data set. Previous studies have suggested that this way of using the GED may generate a bias in the resulting morphospace, but a detailed study of this possible effect has been lacking. Here, we test whether the percentage of missing data for a taxon artificially influences its position in the morphospace, and if missing data affects pre‐ and post‐ordination disparity measures. We find that this use of the GED creates a systematic bias, whereby taxa with higher percentages of missing data are placed closer to the centre of the morphospace than those with more complete scorings. This bias extends into pre‐ and post‐ordination calculations of disparity measures and can lead to erroneous interpretations of disparity patterns, especially if specimens present in a particular time interval or clade have distinct proportions of missing information. We suggest that this implementation of the GED should be used with caution, especially in cases with high percentages of missing data. Results recovered using an alternative distance measure, Maximum Observed Rescaled Distance (MORD), are more robust to missing data. As a consequence, we suggest that MORD is a more appropriate distance measure than GED when analysing data sets with high amounts of missing data.
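The contrast drawn in the abstract can be illustrated with a toy example. The sketch below is not the authors' analysis and does not reproduce any published implementation. It assumes binary, equally weighted characters, a simplified GED in which any comparison involving a missing score is replaced by a single average difference (the hypothetical value `FILL = 0.5`), and a simplified MORD computed as the summed difference divided by the maximum possible difference over comparable characters only. All function and variable names are illustrative.

```python
import math

def ged(a, b, fill):
    """Simplified GED: square root of summed squared per-character differences.

    Assumption: a character missing (None) in either taxon contributes the
    supplied `fill` value instead of an observed difference.
    """
    total = 0.0
    for x, y in zip(a, b):
        d = fill if x is None or y is None else abs(x - y)
        total += d * d
    return math.sqrt(total)

def mord(a, b):
    """Simplified MORD for binary characters with unit weights.

    Only characters scored in both taxa are used; each has a maximum
    possible difference of 1, so the result is the proportion that differ.
    """
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        raise ValueError("no comparable characters")
    return sum(abs(x - y) for x, y in pairs) / len(pairs)

FILL = 0.5  # hypothetical average difference used for missing comparisons

ref = [0, 0, 0, 0]              # fully scored reference taxon
far_full = [1, 1, 1, 1]         # differs from ref in every character
far_gappy = [1, 1, None, None]  # same taxon with half the scores missing
near_gappy = [0, 0, None, None] # matches ref wherever it is scored

for name, taxon in [("far_full", far_full),
                    ("far_gappy", far_gappy),
                    ("near_gappy", near_gappy)]:
    print(name, round(ged(ref, taxon, FILL), 3), mord(ref, taxon))
```

Under these assumptions, removing half the scores pulls the simplified GED of the dissimilar taxon down from 2 to about 1.58 and pushes that of the matching taxon up from 0 to about 0.71, so both drift towards an intermediate value, while the simplified MORD stays at 1 and 0 respectively. This is one intuition for how poorly scored taxa could end up nearer the centre of a GED-based morphospace.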

PalAss Go! URL: http://go.palass.org/kok