Parsimony and steps on a cladogram
All is not well in Figure 3. The characters do not always specify the same
groups. For instance, character 13 (fin rays present in the shark and the salmon)
suggests that the salmon and the shark are sister-groups relative to the lizard.
So, with the characters at hand there are two theories of taxon relationship.
These are shown in Figure 4. In alternative 1 the shark and salmon are sister-groups
evidenced by the common possession of character 13. However, if we accept this
we have to assume that characters 3 and 4 were either gained twice (once in
the salmon and once in the lizard) or that they were gained in the common ancestor
of shark+salmon+lizard and subsequently lost in the shark.
Alternative 2 is that the lizard and salmon are sister-groups, evidenced by
the common possession of characters 3 and 4, and we have to assume that character
13 was either gained independently in the salmon and the shark or gained in
the common ancestor of shark+salmon+lizard and subsequently lost in the lizard.
In other words, alternative 1 is more costly in terms of the number of assumptions
that we have to make about character evolution: it needs an extra assumption for
each of two characters (3 and 4), whereas alternative 2 needs one for a single
character (13). In cladistic analysis, if given no more information, we choose
alternative 2 because it assumes the least (or, to turn it on its head, it explains
the most in the minimum way). Alternative 2 is the more parsimonious solution
and therefore is to be preferred. OK,
I can hear the cries “but fin rays are more important than large dermal
bones, maxilla and dentary.” Maybe, but that is another argument and
one that is usually the source of multitudes of disputes. Cladists use parsimony
to choose between alternatives because parsimony is a universal rule –
it can be applied everywhere in the same way. It does not mean that evolution
has followed the most parsimonious course. You do not have to accept the most
parsimonious solution, you just have to explain why you do not!
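The bookkeeping behind this choice can be sketched in a few lines of Python. This is only an illustration of the counting described above, not the method used by any particular program. It assumes that a character shared by the two sister taxa fits the cladogram once, and that any other shared character needs two events (two gains, or a gain plus a loss). The names `characters`, `alternatives` and `steps` are made up for this sketch.

```python
# Taxa sharing each character, as described in the text.
characters = {
    3: {"salmon", "lizard"},
    4: {"salmon", "lizard"},
    13: {"shark", "salmon"},
}

# Each alternative is identified here by its sister-group pair.
alternatives = {
    "alternative 1": {"shark", "salmon"},
    "alternative 2": {"salmon", "lizard"},
}

def steps(sister_pair):
    """Count events: 1 if the character marks the sister pair, otherwise 2."""
    return sum(1 if holders == sister_pair else 2
               for holders in characters.values())

lengths = {name: steps(pair) for name, pair in alternatives.items()}
print(lengths)
```

On these three characters alone, alternative 1 needs five events and alternative 2 needs four, which is why alternative 2 is preferred.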

Figure 4. Parsimony. The theory to the right explains the most and assumes
the least, and is to be preferred. See text for discussion.
We can think of this in a slightly different way that is revealed in the computer
programs used by cladists. In Figure 5 there are four taxa displaying states
for six characters and this is displayed in the taxon by character data matrix
at the top (data matrices are the daily currency of cladistics). Just for now
let us assume that empty cells mean absence of something and that absence is
plesiomorphic. Taxon A has none of the attributes. It is wholly plesiomorphic
with respect to B, C and D. Taxa B, C and D have various complements of the
other characters. Given this information there are three ways in which Taxa
B, C and D can be interrelated, and these are shown in the top line of cladograms.
The individual characters can be placed on each of the cladograms according
to the groups that they specify. For instance character 1 specifies a group
B+C+D and therefore will be placed on all cladograms just once. Characters 2
and 4 are autapomorphies (each is unique to a single taxon) and therefore they
too will fit on all cladograms
just once (note that these two characters do not help resolve any relationships
and some people will ignore them). Characters 3 and 5 specify a group C+D and
therefore will be placed on the cladogram to the left once. On this cladogram
characters 3 and 5 are said to be congruent; they fit the tree perfectly. On the
other two trees characters 3 and 5 are said to be homoplasious because they
do not fit the tree perfectly; two occurrences are needed to explain their
distribution. When all characters are fitted on to the cladogram on the left,
all but character 6 appear once, and character 6 appears twice. If we simply
count up the number of times characters appear, this equals seven. This cladogram
is said to be seven steps
long because it requires seven transformations of the characters to explain
their distribution in the most parsimonious way (computer programs report the
length of the cladogram and authors always give this). If all characters are
fitted to all three cladograms then we will see that the centre cladogram and the
one to the right are longer (nine and eight steps respectively). In other words
the cladogram to the left is the most parsimonious, often called the optimal
cladogram. The others are suboptimal.
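The step counts quoted above can be checked with a short Python sketch. It uses Fitch's set-based counting for unordered characters, which is one standard way of getting a tree length; the text does not say which algorithm any given program uses. Two things here are assumptions: the text does not say which taxa carry the autapomorphies, so character 2 is placed in B and character 4 in D (the choice does not affect the lengths), and the centre cladogram is taken to be the one with the group B+D.

```python
def fitch(tree, states):
    """Return (possible ancestral states, steps) for one character."""
    if isinstance(tree, str):              # a terminal taxon
        return {states[tree]}, 0
    left, right = tree
    lset, lsteps = fitch(left, states)
    rset, rsteps = fitch(right, states)
    common = lset & rset
    if common:                             # the two sides agree: no new step
        return common, lsteps + rsteps
    return lset | rset, lsteps + rsteps + 1  # disagreement costs one step

# Taxon by character matrix, characters 1-6 (1 = present, 0 = absent).
# Placing character 2 in B and character 4 in D is an assumption.
matrix = {
    "A": "000000",
    "B": "110001",
    "C": "101011",
    "D": "101110",
}

# The three ways of interrelating B, C and D, with A as the outgroup.
trees = {
    "left":   ("A", ("B", ("C", "D"))),
    "centre": ("A", ("C", ("B", "D"))),
    "right":  ("A", ("D", ("B", "C"))),
}

def tree_length(tree):
    """Sum the steps over all characters in the matrix."""
    n_chars = len(next(iter(matrix.values())))
    total = 0
    for i in range(n_chars):
        states = {taxon: row[i] for taxon, row in matrix.items()}
        total += fitch(tree, states)[1]
    return total

lengths = {name: tree_length(tree) for name, tree in trees.items()}
print(lengths)
```

Under these assumptions the sketch gives 7, 9 and 8 steps for the left, centre and right cladograms, matching the lengths given in the text.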

Figure 5. Optimising characters on to alternative cladograms. See text for
discussion.
Notice at this stage that we have made no evaluation of HOW the characters
have been fitted to the cladogram. For characters 1, 2 and 4 there is no argument:
they all fit once and that is that. Take a look at character 6 on the cladogram
to the left (the optimal cladogram). It specifies a group B+C that does not
appear in this cladogram (this group appears in the right-hand cladogram). In
the optimal cladogram the character has been assumed to have arisen in B and
separately in C; parallel origination has been assumed. It shows two steps,
both gains (absence → presence). However, we could equally have assumed that
character 6 was gained in the common ancestor of B, C and D and subsequently
lost in taxon D; this is a gain and a loss (absence → presence → absence) but still
records two steps on the tree. As far as parsimony is concerned there is no
difference and we cannot distinguish the two scenarios. We may, however, have
beliefs outside of cladistics that lead us to favour one of these transformations
over the other. For example, some mammalian palaeontologists believe that the
origination of a particular cusp pattern may be more closely related to diet
than to genealogy, and that parallelism is therefore to be preferred to gain plus
loss. On the other hand most palaeontologists would assume that complex structures
such as legs are unlikely to have been developed more than once and that the
absence in snakes is a loss that followed a gain. Notice that these are not
cladistic arguments.
The action of fitting characters to a cladogram is called optimisation. We
will come across several ways of doing this and this is where we can, if we
wish, build in some evolutionary scenarios. We have already met two, in
Figures 4 and 5. Assuming two parallel acquisitions is called delayed
transformation (DELTRAN in the PAUP program) because the initial transformation
(absence → presence) has been delayed to near the terminal tips of the
cladogram. Assuming gain plus loss is called accelerated transformation (ACCTRAN)
because this way of optimising places the initial transformation nearer to the
root of the cladogram.
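To see why parsimony cannot separate the two optimisations, here is a minimal Python sketch that tries every assignment of character 6 to the internal nodes of the optimal cladogram and counts the changes along the branches. It is a brute-force illustration, not how PAUP or any other program works. The node labels `root`, `x` and `y` are hypothetical, and the root is assumed to be plesiomorphic (absent), in line with the assumption made for Figure 5.

```python
from itertools import product

# Optimal (left-hand) cladogram: (A, (B, (C, D))).
# Each branch is (parent, child). "x" is the ancestor of B+C+D and
# "y" the ancestor of C+D; both labels are made up for this sketch.
branches = [("root", "A"), ("root", "x"), ("x", "B"),
            ("x", "y"), ("y", "C"), ("y", "D")]

# Character 6: present (1) in B and C, absent (0) in A and D.
tips = {"A": 0, "B": 1, "C": 1, "D": 0}

def count_steps(x, y):
    """Count state changes along all branches for given internal states."""
    state = dict(tips, root=0, x=x, y=y)   # root assumed absent (0)
    return sum(state[parent] != state[child] for parent, child in branches)

costs = {(x, y): count_steps(x, y) for x, y in product((0, 1), repeat=2)}
best = min(costs.values())
optimal = sorted(xy for xy, cost in costs.items() if cost == best)

labels = {
    (0, 0): "delayed (DELTRAN): separate gains in B and in C",
    (1, 1): "accelerated (ACCTRAN): one early gain, then a loss in D",
}
for xy in optimal:
    print(xy, costs[xy], labels[xy])
```

Two assignments tie at two steps. With both internal nodes absent, the gains fall on the terminal branches to B and C (delayed). With both present, the gain falls on the branch below B+C+D and the loss on the branch to D (accelerated). The two mixed assignments cost three steps each.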
Consensus
It sometimes happens that, having been through the exercise in Figure 5, we arrive
at a solution where there is more than one optimal cladogram: that is, two
or more cladograms are of equal length. We have several choices at this stage:
we could add more characters to try to resolve the problem, or we could choose
one of the cladograms because it fits the stratigraphic record better, or a
palaeobiogeographic theory more comfortably, or simply because it satisfies
our preconceptions. Another is to summarise the information that is common to
them all and this is done through the use of consensus trees. We will devote
a few paragraphs to these later.
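As a foretaste, one simple way to summarise what is common to a set of cladograms is to keep only the groups that appear in every one of them. The Python sketch below does this for two trees. It corresponds to what is usually called a strict consensus, which is only one of several consensus methods, and the two trees and the taxa A to E are invented for illustration rather than taken from any figure.

```python
def clades(tree, found=None):
    """Return (tips below this node, set of all groups found so far)."""
    if found is None:
        found = set()
    if isinstance(tree, str):              # a terminal taxon
        return frozenset([tree]), found
    tips = frozenset()
    for child in tree:
        child_tips, _ = clades(child, found)
        tips |= child_tips
    found.add(tips)                        # record the group at this node
    return tips, found

# Two hypothetical, equally long cladograms that disagree about B and C.
tree_1 = ("A", ("B", ("C", ("D", "E"))))
tree_2 = ("A", ("C", ("B", ("D", "E"))))

# Groups present in both trees.
shared = clades(tree_1)[1] & clades(tree_2)[1]
for group in sorted(shared, key=len):
    print("+".join(sorted(group)))
```

Here the groups D+E and B+C+D+E survive, together with the trivial group of all taxa, while the conflicting groups C+D+E and B+D+E are dropped. The summary is therefore less resolved than either input tree.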