MacClade: Analysis of Phylogeny and Character Evolution
Version 3

Wayne P. Maddison and David R. Maddison

Sinauer Associates Inc., Sunderland MA. 1992. 398 pp. + computer disk. ISBN 0-87893-490-1.

It may be risky to say it, since there may be many people who will disagree with me, but there seems to be a consensus among modern taxonomists that the subjective data analysis methods of the past are no longer acceptable as the sole means of studying systematic data. If science is all about the conscious testing of explicitly-stated hypotheses, then subjective methods may be a useful heuristic tool but they cannot form the focus of a rigorous scientific methodology. Taxonomy, then, has undergone a number of significant changes in the past three decades, as it attempts to come to grips with exactly what hypotheses we are testing and how best to go about testing them.

For those of us who believe that an arrangement of taxa is most suitable if it is based on the evolutionary history of the organisms concerned, there is a need to reconstruct the hypothesized phylogeny of the organisms. That is, our initial hypotheses are about the phylogenetic patterns among the organisms concerned. Once a hypothesis of the phylogeny has been reconstructed, by whatever means, it can be used as a basis for:- a classification of the taxa; a study of hypotheses about evolutionary processes; or a study of hypotheses concerning biogeography, co-speciation, or co-evolutionary relationships.

The construction and evaluation of hypotheses concerning phylogenetic relationships is thus of primary concern to most modern taxonomists. Given the modern development of computers as an aid for data analysis, it is not surprising that there are now a number of quite sophisticated computer programs available to help with the construction of trees that represent the hypothesized phylogenetic relationships. These programs include Steve Farris's HENNIG86, Joe Felsenstein's PHYLIP, and Dave Swofford's PAUP; and all serious evolutionary biologists should be familiar with at least one of these packages.

However, in the field of evaluating these trees as hypotheses there is less scope. Each of the above-named computer packages provides at least some means of evaluating the information content of a phylogenetic tree, whether this is by some form of consistency index, bootstrapping, or consensus tree. Nevertheless, this is not really enough for anyone who takes themselves seriously as a scientist. Each of these techniques is an automatic procedure that distances the user from their data - the user feeds the data into the black box and a number comes out, which the user interprets (or not, as the case may be). What is needed is a more detailed means of evaluating the character evolution implied by the phylogenetic tree (or trees, as is usually the case) produced by one of the tree-construction packages.

This need for a means of evaluating phylogenetic trees is often under-estimated by that brigade of people who like black-box data analysis. The number of times I've seen people feed their data into a data-analysis program, get a large number of alternative (e.g. equally-parsimonious) trees out, then construct a consensus tree from these, and then call it quits, always amazes me. How could this uncritical approach to phylogenetic analysis be any better than the subjective evaluations of the past? We're supposed to be scientists, so shouldn't there be some thinking in this process somewhere? In particular, consensus trees have lost much of the information that went into them, so reconstructing character evolution on such a tree is essentially meaningless.

Let's look at a simple example of what I mean. Most people recognize that multiple phylogenetic trees are a likely end-product of any data analysis with more than a handful of taxa. This is the inevitable result of contradictory evidence within the data matrix, i.e. characters that do not agree with each other about the most likely phylogenetic history of the taxa. However, how many people also recognize that there can be many trees that are only slightly less optimal than the suite of optimal trees, and that these should probably be investigated as well? Furthermore, many people do not deal with the fact that there can be more than one possible reconstruction of character evolution on a single tree, i.e. any one tree often has several ways in which the characters can be considered to have changed on the branches. Finally, it is often not recognized that reconstruction of character evolution will differ if polytomies on the tree are considered to represent multiple speciation events or to represent uncertain resolution of a series of dichotomies.

Clearly, rigorous interactive evaluation of phylogenetic trees is required if these issues are to be investigated, not just a black-box approach. What we need is some means of examining the implications of the trees; after all, most of us are far more interested in what the trees have to tell us about evolution than in the mechanics of how to construct them. Not unexpectedly, this is the subject of this review. The only comprehensive computer program designed solely for the purposes of evaluating phylogenetic trees is MacClade.

Version 1.0 of the MacClade computer program was released in 1986, and I still possess a copy of it (it was basically free, unlike the current version). It was essentially a simple tree-drawing device, which took advantage of the Apple Macintosh computer's graphical abilities to allow the user to interactively examine in some detail how the characters changed (evolved) on the branches of a specified tree. The branches of the tree could be re-arranged to see what effect this had, and the basic shape of the tree could be printed out. This meant that trees no longer had to be drawn by hand on a piece of paper, nor did they have to be endlessly re-drawn every time something was changed or you wanted to see the effect of an alternative data interpretation. More to the point, you didn't have to spend half of your time worrying about whether you were making mistakes by missing something vital in the data.

Version 2.1, which appeared in 1987, incorporated a data editor and expanded the user interface. Version 3 (1992) is now a fully-blown system for evaluating the information content of phylogenetic trees, and its presentation is the equal of that of any professional computer package that I have encountered. For this fact alone, the Maddison twins are to be congratulated (and, indeed, held in awe by those of us who have written programs ourselves).

The package now consists of a computer disk and a lengthy book (technically, you buy the book from the publisher, and the disk comes with it). The program only runs on Apple Macintosh computers (don't wait for a PC version - the whole program would have to be completely re-written from the beginning), and it should work on all models produced since 1987 (including the Plus, Classics, SEs, LCs, IIs, Powerbooks, Quadras, and PowerPC-based machines). You don't actually need a hard disk, but the program is severely hampered without one, and it is preferable to have at least 2 M of memory (but 4 M is better). It will run under System 4.2 or later.

Installation of the package on a computer is trivial. The package comes as two compressed files, which are easily uncompressed by double-clicking on them. The uncompressed files occupy about 1.8 M of disk space, although nearly half of this comprises the example data files (which are not essential).

The program is basically an interactive one, and it has four sections to it:- the Data Editor, in which the taxa are named and their states for various characters entered; the Tree Window, where the phylogenetic trees are modified and their relationship to the character data explored; the Character Status Window, which lists the characters and allows modification of the various evolutionary assumptions associated with them; and the Chart Window, which shows various summary statistics about the characters and their evolution on the tree. These windows can all be re-sized or overlap each other, although the Data Editor and the Tree Window cannot be displayed at the same time. There is also an extensive on-line help facility.

Data can be entered using the data editor, which is a pretty amazing spreadsheet specialized for systematics. All of the data-modification procedures that are a real pain in the neck when you're using a word processor or general spreadsheet program are automated in the MacClade editor, so that they require little more than the press of a button. There are even specialized data types specifically designed for making DNA/RNA and protein data easy to deal with. Characters can be specified as unordered, ordered, irreversible, Dollo, stratigraphic, or continuous, and they can have up to 26 states. All of the usual Macintosh editing facilities are available, along with many specialist features for taxonomic data. Taxa and characters can be included/excluded for various analyses; taxa can be merged; and characters can be recoded.

MacClade does not itself construct phylogenetic trees (other than randomly-arranged trees, which can then be subjected to limited branch-swapping). The simplest way to produce a suitable tree (or trees) is to use the PAUP computer program, as it reads the same data and tree files as does MacClade, and then to save the tree(s). However, MacClade can also import files from the PHYLIP and HENNIG86 programs, and can read NBRF format data files as well as data in simple spreadsheet and word-processor files. Data can also be exported from the program in all of these formats.

MacClade can deal with multiple trees in a file, and many of its analyses are specifically designed to summarize these multiple trees. Any one phylogenetic tree can be interactively manipulated in several different ways, and there are twenty different tools provided for these manipulations. The trees can contain polytomies, but the way they are interpreted is different for uncertain resolution versus multiple speciation (see above).

The original rationale for the development of MacClade was the reconstruction of the history of character change on a phylogenetic tree. A particular phylogenetic tree is presumed, along with the character states of the taxa concerned. The objective is to reconstruct the history of the character changes along the branches of the phylogeny. There are a number of ways in which this information can be displayed. One of the most useful is called Equivocal Cycling, where multiple most-parsimonious reconstructions of character evolution are displayed sequentially, thus avoiding having to re-draw the tree by hand each time.
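To make the idea of multiple most-parsimonious reconstructions concrete, here is a minimal sketch in Python. It is not MacClade's own algorithm (which I have not reproduced here), but a brute-force illustration for a single unordered character: it tries every assignment of states to the internal nodes of a small tree and keeps those requiring the fewest changes. The tree representation (nested tuples of taxon names) and all of the function names are my own assumptions for the purposes of the example, and the approach is only practical for a handful of taxa.

```python
from itertools import product


def internal_nodes(tree, path=()):
    """Yield a path (tuple of child indices) for every internal node."""
    if isinstance(tree, tuple):
        yield path
        for i, child in enumerate(tree):
            yield from internal_nodes(child, path + (i,))


def count_steps(tree, tip_states, assignment, path=()):
    """Return (state at this node, number of changes at or below it).

    A change is counted on every branch whose two ends differ in state,
    i.e. the character is treated as unordered.
    """
    if not isinstance(tree, tuple):
        return tip_states[tree], 0  # a terminal taxon: observed state
    here = assignment[path]
    steps = 0
    for i, child in enumerate(tree):
        child_state, child_steps = count_steps(
            child, tip_states, assignment, path + (i,)
        )
        steps += child_steps + (child_state != here)
    return here, steps


def most_parsimonious_reconstructions(tree, tip_states):
    """Exhaustively find all minimum-step assignments of internal states."""
    nodes = list(internal_nodes(tree))
    states = sorted(set(tip_states.values()))
    best, best_steps = [], None
    for combo in product(states, repeat=len(nodes)):
        assignment = dict(zip(nodes, combo))
        _, steps = count_steps(tree, tip_states, assignment)
        if best_steps is None or steps < best_steps:
            best, best_steps = [assignment], steps
        elif steps == best_steps:
            best.append(assignment)
    return best_steps, best


# Hypothetical example: a four-taxon "ladder" tree and one binary character.
tree = ((("A", "B"), "C"), "D")
tip_states = {"A": 1, "B": 0, "C": 1, "D": 0}
steps, reconstructions = most_parsimonious_reconstructions(tree, tip_states)
print(steps, len(reconstructions))
for r in reconstructions:
    print(r)
```

For this small example the minimum is two changes, and there are three different reconstructions that achieve it, which is exactly the kind of ambiguity that a display such as Equivocal Cycling lets you step through. A node with more than two descendants is treated here as a genuine multiple speciation event; treating it as uncertain resolution would need a different calculation.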

The charting features were developed to allow the examination of multiple trees and multiple character reconstructions, and to summarize and present the results in an easily-interpretable fashion. These charts include bar charts and bubble charts, as well as tables. Furthermore, MacClade can calculate basic character and tree statistics, such as treelength, minimum and maximum possible steps/changes (these are not the same thing), as well as consistency and retention indices.
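For readers who have not met these indices, the arithmetic can be sketched in a few lines of Python. This uses the usual definitions, consistency index = m/s and retention index = (g - s)/(g - m), where s is the number of steps the character requires on the tree, m is the minimum conceivable number of steps, and g is the maximum. The function names are my own, and the simple formulas for m and g assume a single unordered character with no polymorphism or missing data; it is a sketch of the calculation, not a description of how MacClade itself computes it.

```python
from collections import Counter


def min_steps(tip_states):
    """Minimum conceivable steps for an unordered character: states - 1."""
    return len(set(tip_states)) - 1


def max_steps(tip_states):
    """Maximum steps for an unordered character on any tree.

    This is reached on a completely unresolved tree whose ancestor has the
    commonest state, so every other taxon needs one change.
    """
    counts = Counter(tip_states)
    return len(tip_states) - max(counts.values())


def consistency_index(m, s):
    """CI = m / s; undefined (None) for a character that never changes."""
    if s == 0:
        return None
    return m / s


def retention_index(m, s, g):
    """RI = (g - s) / (g - m); undefined (None) when g equals m."""
    if g == m:
        return None
    return (g - s) / (g - m)


# Hypothetical character scored for four taxa, needing 2 steps on some tree.
states = [0, 1, 0, 1]
observed = 2
m, g = min_steps(states), max_steps(states)
print(m, g, consistency_index(m, observed), retention_index(m, observed, g))
```

In this example the character needs as many steps on the tree as it could on any tree, so the retention index is zero even though the consistency index is 0.5; this is one reason the two indices are worth reporting together.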

The interactive nature of the program means that results are constantly being produced and changed, which in turn means that there is no permanent record of the process of your analyses unless you explicitly ask for one. However, most of the details of the analyses are easy to specify, and they can be saved to a file for later editing and printing.

MacClade will print trees in several different formats (including circular), and with various amounts of information displayed on them. Unfortunately, not all of these formats are available for non-Postscript printers (and branch-lengths cannot yet be printed as proportional to the number of inferred changes). The trees and graphs can also be transported to a graphics program for editing. Furthermore, there is no facility to print all aspects of a data file with a single command; for the data file itself, it is necessary to use a word processor if you want to print all of the file exactly as it is stored.

All of the features appear to work as they should, and no other program comes within cooee of this one in terms of the ability to interactively investigate phylogeny. The authors suggest as their grand purpose: "to help biologists explore the relationships between data and hypotheses in phylogenetic biology". They therefore see MacClade as an aid to helping biologists understand the implications that phylogeny has for their studies, whether these are studies of molecules, development, function, adaptation, ecology, speciation, or biogeography. None of the available tree-construction programs allows you to evaluate and understand the nature of your phylogenetic trees in quite the way that this program can. Certainly, you cannot adopt a black-box approach to phylogeny when using this program.

Perhaps my biggest reservation about the program is that it has now developed considerable complexity - in fact, it may be too complex for its own good. Like many of the commercial computer applications packages (the major word-processing programs come to mind), this package is trying to be all things to all people. Consequently, it quite literally has hundreds of options (as the authors freely admit), and it is therefore rather daunting for casual users. The authors appear to have acceded to everybody's requests for new features to be incorporated, and there are a number of features that look like they are left over from interesting ideas explored by the authors, so that most people will probably not bother to find out all of the current possibilities for using the program. In fact, there are many features in the program that I cannot realistically see more than a handful of people ever using.

Reading the manual is certainly a major undertaking (as I now know from personal experience), and it is unlikely that many people will actually bother with more than small parts of it. At best, this means that some people will be using the program inefficiently; and at worst, they may be misinterpreting the output. Using this program to draw a cladogram is a bit like using Microsoft Word to write a one-line memo - 98% of the time you're only using 2% of the capabilities of the program. It is now more than twenty years since Alvin Toffler warned us about the consequences of "future shock", but no-one in the computer business seems to have paid much attention - too many options mean that most people will simply ignore most of them, because it is not possible to assimilate them all. The danger, then, is that the program will be misused.

The book itself consists of three parts:- Introducing MacClade (two chapters), Phylogenetic Theory (four chapters), and Using MacClade (fourteen chapters); plus two appendices and an index. The presentation is very good, although the book is fairly large for a paperback and the cover of my copy is therefore now banana-shaped at each end. There are a number of typographical errors, and there is a four-page supplement listing some of these along with several changes to the program since the book was produced. Only two of the errors are notable, one on page 135 and one on page 269, and both of these are listed in the supplement.

The book is both a manual for the computer program, describing its features and uses, and a description of a phylogenetic approach to studying diversity and evolution. Therefore, perhaps the most interesting part of the book is Part II, on phylogenetic theory. The topics covered include:- A Phylogenetic Perspective (16 pages), Introduction to Phylogenetic Inference (33 pages), and Reconstructing Character Evolution Using Parsimony (50 pages). There is also a brief (6 pages) introduction to stratigraphic parsimony (useful for palaeontologists), by Daniel Fisher.

This Part of the book is not just an explanation of the methods used by the MacClade program (although it certainly is that), but is more a treatise on phylogenetic methodology and thinking. It argues for the primacy (or at least the usefulness) of an explicitly phylogenetic perspective on all of biology when explaining patterns via processes, or even when recognizing biological patterns themselves. It also spends a lot of space discussing the pitfalls and limitations of phylogenetic analysis (many of which applied to earlier versions of MacClade), particularly as related to the use of parsimony as a methodological tool. In many ways, this is the best discussion of the pros and cons of phylogenetic methodology (as opposed to the details of the mathematics) that I have encountered so far. You could do a lot worse than read this section of the book if you want to put phylogenetic analysis into a broader biological perspective.

The only draw-back to Part II is that parts of it can get pretty tedious. For example, much of chapter 5 gives you the details of the computer algorithms, which will be incomprehensible to most people. This means that useful points can be missed, because they are sometimes tucked away in a mass of pedestrian detail. Perseverance pays off, however.

All in all, this package should really form a standard part of any phylogenetic data analysis. Without it, phylogenetic analysis can deteriorate to being nothing more than an uncritical black-box analysis, in spite of the best intentions of the authors of the tree-construction programs. An analysis is only as good as the programs and the user make it. The programs are continually being updated based on improvements in techniques, but it is solely up to the user as to how effective their use of these programs is. So, there must be some conscious input from the user that evaluates the meaning and usefulness of the trees, and this package provides it.

The book (plus program disk) can be obtained from Sinauer Associates; and updates for later versions of the program are also available, along with a demonstration version. [Note: MacClade v4.08 is now freely available.]

David Morrison
Department of Environmental Biology & Horticulture
University of Technology, Sydney

Originally published in Australian Systematic Botany Society Newsletter 78: 18-22 (1994).