What program was used to create this visualization of a neighbor-joining tree?

I know there are many programs out there to visualize neighbor-joining trees in phylogenetics. I have a tree in Newick format. Does anyone recognize the features of this image as coming from a particular visualization program? I have tried several that may have produced this but with no success yet.

Note: I have to visually compare a new method's NJ tree with the result shown here, so that is why I really want to know exactly what program was used. This is not a question about software recommendation or opinions. Thanks!


1 Importing Tree with Data

Phylogenetic trees describe genealogical relationships among a group of organisms and can be constructed from the organisms’ genetic sequences. A rooted phylogenetic tree represents a model of evolutionary history, depicted by ancestor-descendant relationships between tree nodes and by the clustering of ‘sister’ or ‘cousin’ organisms at different levels of relatedness, as illustrated in Figure 1.1. In infectious disease research, phylogenetic trees are usually built from the pathogens’ gene or genome sequences to show which pathogen samples are genetically closest to one another, providing insights into the underlying unobserved epidemiologic linkage and the potential source of an outbreak.

Figure 1.1: Components of a phylogenetic tree. External nodes (green circles), also called ‘tips,’ represent the actual organisms sampled and sequenced (e.g., viruses in infectious disease research). They are the ‘taxa’ in the terminology of evolutionary biology. The internal nodes (blue circles) represent hypothetical ancestors of the tips. The root (red circle) is the common ancestor of all species in the tree. The horizontal lines are branches and represent evolutionary changes (grey numbers) measured in units of time or genetic divergence. The bar at the bottom provides the scale of these branch lengths.

Phylogenetic trees can be constructed from genetic sequences using distance-based or character-based methods. Distance-based methods, including the unweighted pair group method with arithmetic means (UPGMA) and neighbor-joining (NJ), are based on the matrix of pairwise genetic distances calculated between sequences. Character-based methods, including maximum parsimony (MP) (Fitch 1971), maximum likelihood (ML) (J. Felsenstein 1981), and the Bayesian Markov Chain Monte Carlo (BMCMC) method (Rannala and Yang 1996), are based on a mathematical model that describes the evolution of genetic characters, and they search for the best phylogenetic tree according to their own optimality criteria.

The maximum parsimony (MP) method assumes that evolutionary change is rare and minimizes the number of character-state changes (e.g., the number of DNA substitutions). The criterion is similar to Occam’s razor: the simplest hypothesis that can explain the data is the best hypothesis. Unweighted parsimony assumes that mutations across different characters (nucleotides or amino acids) are equally likely, while weighted methods assume unequal likelihoods of mutation (e.g., the third codon position is more prone to change than the other codon positions, and transitions occur more frequently than transversions). The concept of the MP method is straightforward and intuitive, which is a probable reason for its popularity among biologists who care more about the research question than about the computational details of the analysis. However, the method has a number of disadvantages. In particular, tree inference can be biased by the well-known systematic error called long-branch attraction (LBA), which incorrectly infers distantly related lineages as closely related (Joseph Felsenstein 1978). This is because the MP method takes little account of many factors in sequence evolution (e.g., reversals and convergence) that are hardly observable from the existing genetic data.
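
To make the parsimony criterion concrete, the sketch below scores a single character on a fixed rooted binary tree with the bottom-up pass of Fitch's (1971) algorithm. The nested-tuple tree encoding, the function name fitch_score and the four-taxon site pattern are assumptions made for illustration; they do not come from any particular package.

```python
def fitch_score(tree, states):
    """Return (candidate_states, changes) for one character on a rooted binary tree.

    tree   -- a tip name (str) or a pair (left_subtree, right_subtree)
    states -- dict mapping each tip name to its observed character state
    """
    if isinstance(tree, str):           # a tip: its state is observed, no change needed
        return {states[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch_score(left, states)
    right_set, right_changes = fitch_score(right, states)
    shared = left_set & right_set
    if shared:                          # children can agree on a state: no extra change
        return shared, left_changes + right_changes
    # children disagree: keep both options and count one substitution
    return left_set | right_set, left_changes + right_changes + 1


# Hypothetical site pattern: taxa A and B carry G, taxa C and D carry T.
site = {"A": "G", "B": "G", "C": "T", "D": "T"}
grouped = fitch_score((("A", "B"), ("C", "D")), site)[1]   # A with B, C with D
split = fitch_score((("A", "C"), ("B", "D")), site)[1]     # A with C, B with D
print(grouped, split)
```

For this site the first topology needs one substitution and the second needs two, so parsimony prefers the first. A full MP analysis would sum such scores over all sites and search over topologies.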

The maximum likelihood (ML) method and the Bayesian Markov Chain Monte Carlo (BMCMC) method are the two methods most commonly used for phylogenetic tree construction in scientific publications. Both require a substitution model of sequence evolution. Different kinds of sequence data use different substitution models to formulate the evolutionary process of DNA, codons and amino acids. There are several models for nucleotide substitution, including JC69, K2P, F81, HKY and GTR (Arenas 2015). These models can be used in conjunction with rate variation across sites (denoted as +Γ) (Z. Yang 1994) and a proportion of invariable sites (denoted as +I) (Shoemaker and Fitch 1989). Previous research (Lemmon and Moriarty 2004) suggested that misspecification of the substitution model might bias phylogenetic inference, so procedural testing for the best-fit substitution model is recommended.
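
The two simplest models named above, JC69 and K2P (also written K80), have closed-form pairwise distances, which is also how a substitution model enters a distance-based method such as NJ. The sketch below computes both for two aligned sequences. The function names and the ten-base example sequences are assumptions for illustration, and the code assumes the sequences contain only A, C, G and T (no gaps or ambiguity codes).

```python
import math

PURINES = {"A", "G"}


def count_differences(seq1, seq2):
    """Return (sites, transitions, transversions) for two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a == b:
            continue
        if (a in PURINES) == (b in PURINES):   # purine<->purine or pyrimidine<->pyrimidine
            transitions += 1
        else:
            transversions += 1
    return len(seq1), transitions, transversions


def jc69_distance(seq1, seq2):
    """JC69: d = -3/4 * ln(1 - 4p/3), with p the proportion of differing sites."""
    n, ts, tv = count_differences(seq1, seq2)
    p = (ts + tv) / n
    return -0.75 * math.log(1 - 4 * p / 3)     # math.log raises if p >= 0.75


def k2p_distance(seq1, seq2):
    """K2P: d = -1/2 * ln(1 - 2P - Q) - 1/4 * ln(1 - 2Q), P = transitions, Q = transversions."""
    n, ts, tv = count_differences(seq1, seq2)
    P, Q = ts / n, tv / n
    return -0.5 * math.log(1 - 2 * P - Q) - 0.25 * math.log(1 - 2 * Q)


# Hypothetical aligned pair differing by one transition (C -> T) in ten sites.
s1, s2 = "ACGTACGTAC", "ACGTACGTAT"
d_jc, d_k2p = jc69_distance(s1, s2), k2p_distance(s1, s2)
print(round(d_jc, 4), round(d_k2p, 4))
```

Both corrected distances are slightly larger than the raw proportion of differences (0.1), because the models allow for multiple substitutions at the same site.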

The optimality criterion of the ML method is to find the tree that maximizes the likelihood given the sequence data. The procedure is simple: calculate the likelihood of a tree and optimize its topology and branch lengths (and the substitution model parameters, if not fixed) until the best tree is found. Heuristic searches, such as those implemented in PhyML and RAxML, are often used to find the best tree under the likelihood criterion. The Bayesian method finds the tree that maximizes the posterior probability by sampling trees through MCMC under the given substitution model. One advantage of BMCMC is that parameter variance and topological uncertainty, indicated by the posterior clade probability, can be obtained naturally and conveniently from the trees sampled in the MCMC process. Moreover, the influence of topological uncertainty on other parameter estimates is also naturally integrated into the BMCMC phylogenetic framework.

In a simple phylogenetic tree, the data associated with the tree branches/nodes could be the branch lengths (indicating genetic or time divergence) and lineage supports, such as bootstrap values estimated from a bootstrapping procedure or posterior clade probabilities summarized from the trees sampled in the BMCMC analysis.


Step by step example

In this section, we'll construct an evolutionary tree using the neighbor-joining algorithm. We will use a hypothetical distance matrix of n = 6 taxa.

Step 1: Calculate the net divergence r for each taxon from all other taxa.

Step 2: Calculate the new distance matrix (M) using the following formula for each pair of taxa:
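
The formula itself did not survive in this copy of the page. The standard neighbor-joining definitions, given here as an assumption about what the original showed, are as follows, where d(i, k) is the entry of the current distance matrix and n is the current number of taxa:

```latex
r_i = \sum_{k \neq i} d(i,k), \qquad
M(i,j) = d(i,j) - \frac{r_i + r_j}{n - 2}
```

With n = 6 the denominator is 4, so a pair with d(A, B) = 5 and r_A + r_B = 72 (illustrative values, not taken from the page) would give M(A, B) = 5 - 18 = -13.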

Example calculation for M(A, B)

Similarly all distances are calculated and shown below:

Step 3: Using this new matrix find the closest pair of taxa i, j. Consider the lowest distance and assign u as the connecting node for that pair. Branch length is then calculated using formula:
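
The branch-length formula is also missing from this copy. In the standard algorithm, using the net divergences r from Step 1 and the original distance d(i, j) of the joined pair, the lengths of the two new branches are usually written as follows (stated here as an assumption about the lost original):

```latex
S(i,u) = \frac{d(i,j)}{2} + \frac{r_i - r_j}{2(n-2)}, \qquad
S(j,u) = d(i,j) - S(i,u)
```

The two lengths always add up to d(i, j); the second term shifts the new node u toward the taxon with the smaller net divergence.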

As per the matrix M, the closest taxa pair is: AB = -13. The distance from U to A, and U to B is calculated as:

Step 4: Calculate the new distances from U to all other taxa. The distance d(u, k) between u and taxon k is given by:
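
The update formula is missing from this copy. The standard neighbor-joining update, given as an assumption about the lost original, averages the distances from the two joined taxa i and j to k and removes the part already assigned to the new branches:

```latex
d(u,k) = \frac{d(i,k) + d(j,k) - d(i,j)}{2}
```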

Hence, for our continued example, we have

Other distances remain as they are. Hence, the new distance matrix will be:

Step 5: Repeat steps 1 to 4 using the new matrix of distances in every round. After each step is done recursively we will end with the following resultant tree:
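
The five steps can be sketched in a few lines of Python. The matrices of the worked example are not reproduced in this copy, so the six-taxon input below is an assumption: it is a matrix often used for teaching, chosen because it gives M(A, B) = -13 as quoted in Step 3. The function names and the node labels U1, U2, ... are likewise illustrative.

```python
from itertools import combinations


def net_divergence(dist, taxa):
    """Step 1: r_i is the sum of distances from taxon i to every other taxon."""
    return {i: sum(dist[frozenset((i, k))] for k in taxa if k != i) for i in taxa}


def corrected_matrix(dist, taxa, r):
    """Step 2: M(i, j) = d(i, j) - (r_i + r_j) / (n - 2)."""
    n = len(taxa)
    return {(i, j): dist[frozenset((i, j))] - (r[i] + r[j]) / (n - 2)
            for i, j in combinations(taxa, 2)}


def join_once(dist, taxa, new_node):
    """Steps 3 and 4: join the pair with the lowest M and shrink the matrix."""
    n = len(taxa)
    r = net_divergence(dist, taxa)
    m = corrected_matrix(dist, taxa, r)
    i, j = min(m, key=m.get)                  # ties go to the first pair listed
    d_ij = dist[frozenset((i, j))]
    len_i = d_ij / 2 + (r[i] - r[j]) / (2 * (n - 2))
    len_j = d_ij - len_i
    rest = [k for k in taxa if k not in (i, j)]
    # distances among the untouched taxa are carried over unchanged
    new_dist = {pair: v for pair, v in dist.items() if i not in pair and j not in pair}
    for k in rest:
        new_dist[frozenset((new_node, k))] = (
            dist[frozenset((i, k))] + dist[frozenset((j, k))] - d_ij) / 2
    return (i, j, len_i, len_j), rest + [new_node], new_dist, m


def neighbor_joining(dist, taxa):
    """Step 5: repeat the join until only two nodes remain."""
    joins, taxa, dist = [], list(taxa), dict(dist)
    while len(taxa) > 2:
        join, taxa, dist, _ = join_once(dist, taxa, f"U{len(joins) + 1}")
        joins.append(join)
    final_edge = (taxa[0], taxa[1], dist[frozenset(taxa)])
    return joins, final_edge


# ASSUMED input (upper triangle of a 6 x 6 distance matrix), not taken from the page.
TAXA = ["A", "B", "C", "D", "E", "F"]
UPPER = {
    "A": {"B": 5, "C": 4, "D": 7, "E": 6, "F": 8},
    "B": {"C": 7, "D": 10, "E": 9, "F": 11},
    "C": {"D": 7, "E": 6, "F": 8},
    "D": {"E": 5, "F": 9},
    "E": {"F": 8},
}
DIST = {frozenset((i, j)): v for i, row in UPPER.items() for j, v in row.items()}

first_join, taxa_after, dist_after, m_first = join_once(DIST, TAXA, "U")
print(first_join)                 # (taxon i, taxon j, branch i-U, branch j-U)
print(m_first[("A", "B")])        # corrected distance for the pair A, B
joins, final_edge = neighbor_joining(DIST, TAXA)
```

With this assumed matrix the first round joins A and B with branch lengths 1 and 4, and the new node U lies at distances 3, 6, 5 and 7 from C, D, E and F. The pair D, E ties with A, B at -13; the sketch breaks the tie by taking the first pair, which matches the choice made in the walkthrough.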


Material and method

As an example, we used a mitochondrial cyt b sequence (373 bp) dataset from 120 Bombus terrestris dalmatinus individuals belonging to 8 different populations (Aksu = 15, Bayatbadem = 15, Demre = 15, Phaselis = 15, Geyikbayir = 15, Kumluca = 15, Termessos = 15, Firm = 15). Populations were grouped according to the regions where they were collected: the Aksu, Demre and Kumluca populations belong to greenhouse regions, the Bayatbadem, Phaselis, Geyikbayir and Termessos populations belong to natural areas, and the commercial population was obtained from a firm located in Antalya. We want to show how to obtain multiple sequence alignments, haplotype networks, heat maps and phylogenetic trees from a FASTA format input file using R (version 4.0.3) [10]. For all these analyses and graphics, we used both R packages and short R commands.

Preparing the dataset

The file with the .fas extension obtained from the sequencing process was used as the input file. The sample names in the data were tagged with their population name and sample number. Names and numbers were separated by underscores or spaces, for example, “Kumluca_6” or “Bayatbadem 24”. This naming method allows the unique population names to be extracted from the sample names with a short command, so the population names do not need to be entered again for each analysis.
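
The authors' R command is not reproduced in this section. As a language-neutral illustration of the idea, the Python sketch below splits each sample name at the first underscore or space and keeps the unique population names in order of first appearance. The function name population_of and the third sample name are hypothetical.

```python
import re


def population_of(sample_name):
    """Return the population part of a name such as 'Kumluca_6' or 'Bayatbadem 24'."""
    return re.split(r"[_ ]", sample_name.strip(), maxsplit=1)[0]


# The first two names are quoted in the text; the third is made up for the example.
samples = ["Kumluca_6", "Bayatbadem 24", "Kumluca_7"]
populations = list(dict.fromkeys(population_of(s) for s in samples))
print(populations)
```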

Multiple sequence alignment and plotting aligned FASTA format file

The readDNAStringSet() function from the Biostrings package (version 2.54.0) was used to read the FASTA format file [17]. With the msa() function implemented in the msa package, all samples were aligned to the same length by the ClustalW algorithm and stored as a DNAStringSet object [4, 17]. The as.DNAbin() function provided by the ape package (version 5.3) was used to store the multiple sequence alignment as a DNAbin object [7]. At this stage, the trimEnds() function implemented in the ips package (version 0.0.11) can be used for trimming the sequences [21]. The msaplot() function provided by the ggtree package, together with ggplot2, was used to display the aligned sequences alongside the phylogenetic tree [11]. Additional layers (geom_tiplab() and geom_treescale() from ggtree, scale_color_continuous() from ggplot2) were used for detailing the tree [11]. To construct the phylogenetic tree, the pairwise distances between the DNA sequences were computed with the dist.dna() function implemented in the ape package [7], using the K80 model derived by Kimura [22]. The tree was then estimated using the nj() function, also implemented in ape [7]. The branches of the tree were colored by branch length to represent genetic distance. As stated in the commands below, "lightskyblue1" was used for the longest branch of the tree and "coral4" for the shortest branch. Each nucleotide was represented by a different color: A, C, G, and T were colored "rosybrown", "sienna1", "lightgoldenrod1", and "lightskyblue1", respectively, as stated in the commands below.

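library(ggtree)
library(ggplot2)
# tree and nbin are the NJ tree and the aligned sequences prepared in the steps above
# Tree with branches colored by branch length
ggt <- ggtree(tree, cex = 0.8, aes(color = branch.length)) +
  scale_color_continuous(high = "lightskyblue1", low = "coral4") +
  geom_tiplab(align = TRUE, size = 2) +
  geom_treescale(y = -5, color = "coral4", fontsize = 4)
# Alignment drawn next to the tree, one color per nucleotide
msaplot(ggt, nbin, offset = 0.009, width = 1, height = 0.5,
        color = c(rep("rosybrown", 1), rep("sienna1", 1),
                  rep("lightgoldenrod1", 1), rep("lightskyblue1", 1)))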

Extraction of haplotypes

We wrote dynamic short R commands to find out information about haplotypes and sequence variations. Firstly, we converted the DNAbin object to the DNA matrix (120x373) using the as.matrix() command provided by R base package [10]. Secondly, by comparing the sequences, we extracted the haplotype number, haplotype frequency and variable regions. Thirdly, we identified unique haplotype sequences by ignoring common nucleotides between haplotypes and by printing variable regions.

The number of haplotypes per population was calculated using the haplotypes package and short R commands [3]. First, the DNAbin object was converted to an object of class "DNA" using the as.dna() command provided by the haplotypes package. Then haplotypes were extracted and grouped using the haplotype() and grouping() commands, respectively [3]. Finally, the population frequency matrix was extracted.
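
The comparison described above (haplotype number, haplotype frequency, variable regions, and haplotypes reduced to their variable sites) can be illustrated without any package. The Python sketch below is not the authors' R code; the function name haplotype_summary and the four toy sequences are assumptions, and the sequences are assumed to be aligned to equal length.

```python
from collections import Counter


def haplotype_summary(sequences):
    """Return haplotype frequencies, variable site positions and reduced haplotypes."""
    counts = Counter(sequences)                # one entry per distinct sequence
    haplotypes = list(counts)                  # order of first appearance
    length = len(haplotypes[0])
    # a site is variable if the haplotypes do not all share the same base there
    variable_sites = [pos for pos in range(length)
                      if len({h[pos] for h in haplotypes}) > 1]
    # drop the bases common to all haplotypes, keep only the variable sites
    reduced = ["".join(h[pos] for pos in variable_sites) for h in haplotypes]
    return counts, variable_sites, reduced


# Toy alignment of four individuals (made up for the example).
toy = ["ACGT", "ACGT", "ACGA", "TCGA"]
counts, variable_sites, reduced = haplotype_summary(toy)
print(len(counts), dict(counts), variable_sites, reduced)
```

In the toy data there are three haplotypes, sites 0 and 3 (counting from zero) are variable, and the reduced haplotypes are two bases long, in the same way that the 373 bp alignment in the study reduces to 41 bp haplotype strings.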

Haplotype distance matrix and heat map

The distance between the haplotypes was calculated using the dist.hamming() function from the pegas package [2]. The Hamming distance between two strings of equal length is the number of positions at which the corresponding symbols differ; computing it for every pair of sequences gives the pairwise distance matrix [23]. Our data set consisted of haplotype sequences that were 41 bp long. We first separated each string into an array of nucleotides with the strsplit() function to form a (20x41) matrix of haplotype sequences, and then called the dist.hamming() function to compute the Hamming distance matrix.

For the construction of a heat map, we extracted the symmetric distance matrix (20x20) from the haplotype sequences matrix (20x41) using simple R commands. For this calculation, we compared the haplotype sequences in pairs, counting the nucleotide differences between them and writing them on a symmetric matrix. Then, we used this matrix for the visualization of heat map with the heatmap() command provided by stats package [10].
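
The pairwise counting described above is short enough to write out. The Python sketch below is an illustration, not the authors' R commands; the function names and the three four-base haplotypes are assumptions. It builds the symmetric matrix that a heat map function would take as input.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def hamming_matrix(haplotypes):
    """Symmetric matrix of pairwise Hamming distances (zeros on the diagonal)."""
    n = len(haplotypes)
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = hamming(haplotypes[i], haplotypes[j])
    return matrix


# Toy haplotypes (made up); the study used 20 haplotypes of 41 bases each.
toy_haplotypes = ["ACGT", "ACGA", "TCGA"]
distances = hamming_matrix(toy_haplotypes)
for row in distances:
    print(row)
```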

Haplotype network

The haplotype() and haploNet() functions implemented in the pegas package were used for the construction of the haplotype network [2]. In this section, we wanted to show that data in R can be modified quickly and easily, creating multiple options for analysis. For this reason, we plotted three haplotype networks from the same data set at three hierarchical levels, each with its own color scheme: the first network is represented by individuals, the second by populations and the third by groups. This provides options both for those working with individual datasets and for those working with larger populations or groups.

The haplotype network represented by individuals has been colored using rainbow colors defined as the default and the names and colors of the samples were described using fill argument in the legend() command, as below.

# Network colored per individual (default rainbow palette)
plot(net, size = attr(net, "freq"), scale.ratio = 2, cex = 0.6,
     labels = TRUE, pie = ind.hap, show.mutation = 1, font = 2)
legend(x = 57, y = 15, colnames(ind.hap), fill = rainbow(ncol(ind.hap)),
       cex = 0.52, ncol = 6, x.intersp = 0.2, text.width = 11)

We chose special colors for the haplotype network represented by the populations. For the haplotype network, the desired colors were defined as a list in bg argument in the plot() command, as below.

# One color per population, repeated for its 15 samples
bg <- c(rep("dodgerblue4", 15), rep("olivedrab4", 15),
        rep("royalblue2", 15), rep("red", 15), rep("olivedrab3", 15),
        rep("skyblue1", 15), rep("olivedrab1", 15),
        rep("darkseagreen1", 15))
plot(net, size = attr(net, "freq"), bg = bg, scale.ratio = 2, cex = 0.7,
     labels = TRUE, pie = ind.hap, show.mutation = 1, font = 2, fast = TRUE)

The names and colors of samples were described as a list in fill argument in the legend() command, as below.

# Legend entries and their colors, in matching order
hapcol <- c("Aksu", "Demre", "Kumluca", "Firm", "Bayatbadem",
            "Geyikbayir", "Phaselis", "Termessos")
ubg <- c(rep("dodgerblue4", 1), rep("royalblue2", 1),
         rep("skyblue1", 1),
         rep("red", 1), rep("olivedrab4", 1), rep("olivedrab3", 1),
         rep("olivedrab1", 1), rep("darkseagreen1", 1))
# The listing breaks off after bty = "n"; the call is closed here as it stands
legend(x = -35, y = 45, hapcol, fill = ubg, cex = 0.8, ncol = 1, bty = "n")

For the construction of the haplotype network represented by groups, each individual was renamed with the name of the group to which it belongs. The sample names in the DNAbin object were replaced with the corresponding group names with a few simple commands, and the haplotype network represented by the groups was plotted. The desired color set for the network diagram was defined in the vector gbg and passed to the bg argument of the plot() command, as below.

gbg <- c(rep("red"), rep("blue"), rep("green"))
plot(netg, size = attr(netg, "freq"), bg = gbg, scale.ratio = 2, cex = 0.7,
     labels = TRUE, pie = ind.hapg, show.mutation = 1, font = 2, fast = TRUE)

Colors of the groups were defined as a list in fill argument in the legend() command, as below.

legend(x = -35, y = 45, colnames(ind.hapg), fill = c("red", "blue", "green"),
       cex = 0.8, ncol = 1, bty = "n", x.intersp = 0.2)

Phylogenetic trees

We drew the circular phylogenetic trees using the ggtree, ggplot2, ape, and stats packages [7, 10, 11]. To construct the phylogenetic tree, the dist.dna() and nj() functions from the ape package were used. We show two circular phylogenetic trees. In the first tree, populations were colored using the aes(color = Populations) mapping added to the ggtree() plot and drawn using the ggplot2 package, as below.

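# Base circular tree; the xlim() call is assumed to match the second tree's listing,
# since the original line after the trailing "+" is missing
emos <- ggtree(tree, layout = "circular",
               branch.length = "branch.length", lwd = 0.5) +
  xlim(-0.1, NA)
# Group tips by population (krp) and color by that grouping
groupOTU(emos, krp, "Populations") +
  aes(color = Populations) +
  theme(legend.position = "right") +
  geom_tiplab(names(nbin), cex = 1.7, offset = 0.002) +
  guides(color = guide_legend(override.aes = list(size = 2.5))) +
  geom_treescale(x = -0.1, color = "coral4", fontsize = 3, offset = 9)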

In the second tree, the phylogenetic tree was colored according to branch lengths representing genetic distance. The aes(color = branch.length) command was used for coloring branches. Colors were defined using the scale_color_continuous() command. As stated in the commands below, "lightskyblue1" color was used for the longest branch and "coral4" color was used for the shortest branch.

ggtree(tree, layout = "circular", branch.length = "branch.length",
       aes(color = branch.length), lwd = 0.5) +
  xlim(-0.1, NA) +
  geom_tiplab(names(nbin), size = 1.7, offset = 0.002) +
  scale_color_continuous(high = "lightskyblue1", low = "coral4") +
  geom_treescale(x = -0.1, color = "coral4", fontsize = 3, offset = 9)

We also constructed the phylogenetic relationship between haplotypes from the haplotype sequences. At this stage, the treeio package (version 1.10.0) [24] was used together with the ggtree package. We calculated the genetic distance with the dist.hamming() function from the pegas package [2]. The nj() function was used for neighbor-joining tree estimation. Branch support was calculated from 100 bootstrap replicates with the boot.phylo() function implemented in the ape package [20]. Support levels were defined according to the criteria of Kress et al. [25]: strong for 85% and above, moderate for 70–85%, weak for 50–70%, and poor for 50% and below. We colored the node points using the scale_fill_manual() command added to the ggtree() plot. As stated below, "black", "red", "pink1", and "white" were selected for the four support levels, respectively.

D <- dist.hamming(mat7)  # pegas package
htre <- nj(D)  # NJ tree; this line is absent from the listing and is assumed from the text
bp <- boot.phylo(htre, mat7, B = 100, function(x) nj(dist.hamming(x)))
bp2 <- data.frame(node = 1:Nnode(htre) + Ntip(htre), bootstrap = bp)
htree <- full_join(htre, bp2, by = "node")
boothap <- ggtree(htree, size = 1, branch.length = "branch.length") +
  geom_tiplab(size = 4) +
  geom_nodepoint(aes(fill = cut(bootstrap, c(0, 50, 70, 85, 100))),
                 shape = 21, size = 4) +
  theme_tree(legend.position = c(0.85, 0.2)) +
  # the last two break labels are cut off in the listing; they follow from the cut() intervals
  scale_fill_manual(values = c("black", "red", "pink1", "white"),
                    name = "Bootstrap Percentage (BP)",
                    breaks = c("(85,100]", "(70,85]", "(50,70]", "(0,50]"),
                    labels = expression(BP >= 85, 70 <= BP * "<85",
                                        50 <= BP * "<70", BP < 50))


Jun Adachi and Masami Hasegawa have written a package, MOLPHY, version 2.3b3, carrying out maximum likelihood inference of phylogenies for either nucleotide sequences or protein sequences. Their protein sequence maximum likelihood program, ProtML, is a successor to the one they made available to me and which I formerly distributed on a nonsupported basis in PHYLIP. The package is distributed free in C source code, with documentation. MOLPHY is available from its web site at http://www.ism.ac.jp/ismlib/softother.e.html . A monograph describing MOLPHY (number 28 in the Computer Science Monographs of the Institute of Statistical Mathematics) is available from the same source (see folder csm96 on the distribution web page), as TeX source and as a .dvi file. The monograph can also be ordered from the Institute. An executable version of MOLPHY 2.2 for Windows95 or Windows NT on Intel processors, and also one that works on Windows NT on DEC Alpha processors, is available from Russell Malmberg at the Botany Department of the University of Georgia ( russell (at) plantbio.uga.edu ) at his software web site at http://www.plantbio.uga.edu/

  • The C program and MacOS 9 and MacOS X executables are available by ftp from the Indiana University Biology ftp server at ftp.bio.indiana.edu in directory molbio/evolve .
  • A Debian Linux executable package for fastDNAml was made available by Stephane Bortzmeyer and is maintained by Andreas Tille. It is available through its web page at http://packages.debian.org/unstable/science/fastdnaml .

Bette Korber of the Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico ( btk (at) t10.lanl.gov ) and her colleagues have released a version of fastDNAml which uses the REV (general reversible) model of DNA evolution. They used it for the results in the paper: B. Korber, M. Muldoon, J. Theiler, F. Gao, R. Gupta, A. Lapedes, B. H. Hahn, S. Wolinsky and T. Bhattacharya. 2000. Timing the ancestor of the HIV-1 pandemic strains. Science 288: 1789-1796. The program is available both in a version using the MPI Message-Passing Interface for parallel computers and in a non-parallel version. It is available as C source code for Unix from the web site for the programs from that paper at http://www.santafe.edu/

  • Stamatakis, A., T. Ludwig, H. Meier, and M. J. Wolf. 2002. AxML: A fast program for sequential and parallel phylogenetic tree calculations based on the maximum likelihood method. pp. 21-28 in Proceedings of 1st IEEE Computer Society Bioinformatics Conference (CSB2002), Palo Alto, California, August 2002.
  • Stamatakis, A., T. Ludwig, and H. Meier. 2003. RAxML-II: A program for sequential, parallel and distributed inference of large phylogenetic trees. Concurrency and Computation: Practice and Experience (CCPE) 16: 975-988.
  • Stamatakis, A., T. Ludwig, and H. Meier. 2004. RAxML-III: A fast program for maximum likelihood-based inference of large phylogenetic trees. Bioinformatics Advance Access published on December 17, 2004.

Daniele Silvestro and Ingo Michalak of the Department of Botany and Molecular Evolution of the Senckenberg Research Institute, Frankfurt am Main, Germany (raxmlgui.help (at) googlemail.com) have released raxmlGUI (RAxML Graphical User Interface), version 1.0, a graphical user interface for RAxML. raxmlGUI is intended to accelerate and simplify the use of RAxML, enabling interactive control of all its major features (as of RAxML version 7.2.8). The graphical interface is designed to be self-explanatory and to make its use very intuitive. In addition, a detailed built-in help file is available. Through the implementation of multi-thread versions of RAxML, the GUI enables the optimal utilization of the available computational resources. It is described in the paper: Silvestro, D. and I. Michalak. 2011. raxmlGUI: a graphical front-end for RAxML. Organisms Diversity & Evolution DOI: 10.1007/s13127-011-0056-0. It is available as a Python script and as Intel Mac OS X executables. It can be downloaded from its web site at https://sites.google.com/site/raxmlgui/

Thomas Keane (thomas.m.keane (at) nuim.ie) and Thomas Naughton (tom.naughton (at) nuim.ie), both of the Department of Computer Science of the National University of Ireland, Maynooth, have released DPRML, a distributed cross-platform tree-building program that can use the idle clock cycles of machines, allowing idle time on hundreds of machines to be harnessed for tree-building. It uses the PAL Java framework. It is described in a paper: Keane, T.M., T. J. Naughton, S. A. A. Travers, J. O. McInerney, and G. P. McCormack. 2005. DPRml: Distributed Phylogeny Reconstruction by Maximum Likelihood. Bioinformatics 21: 969-974. DPRML can be downloaded from its web page at http://distributed.cs.nuim.ie/dprml.php . Its authors note that it is slower than their more recent distributed phylogeny platform MULTIPHYL, and they urge use of that instead of DPRML.

T. M. Keane, T.J. Naughton, S.A.A. Travers, J.O. McInerney, and G.P. McCormack, of the Department of Computer Science at the National University of Ireland, Maynooth, Ireland (tkeane (at) cs.nuim.ie ) have produced MultiPhyl, version 1.06, a distributed phylogeny platform enabling maximum likelihood runs across a large number of heterogeneous machines. MultiPhyl is a high-throughput implementation of a distributed phylogenetics platform that is capable of using the idle computational resources of many heterogeneous non-dedicated machines to form a phylogenetics supercomputer. It allows a user to upload hundreds or thousands of amino acid or nucleotide alignments simultaneously and perform computationally intensive tasks such as model selection, tree searching, and bootstrapping of each of the alignments. The program implements a set of 80 amino acid models and 56 nucleotide ML models and a variety of statistical methods for choosing between alternative models. It is described in the paper: Keane, T.M., T.J. Naughton, S.A.A. Travers, J.O. McInerney, and G.P. McCormack. 2005. DPRml: Distributed Phylogeny Reconstruction by Maximum Likelihood. Bioinformatics 21(7): 969-974. It is available as Java code. It can be downloaded from the downloads web site at http://distributed.cs.nuim.ie/downloads.php for the distributed Java-based platform produced by this group. The platform itself can also be downloaded from the same site. MultiPhyl can also be tested by using their web server version.

Ziheng Yang of the Department of Genetics and Biometry, University College London ( z.yang (at) ucl.ac.uk ), has released PAML, version 4.4, a package of programs for the maximum likelihood analysis of nucleotide or protein sequences, including codon-based methods that take into account both amino acids and nucleotides. The programs can estimate branch lengths in a phylogenetic tree and parameters in the evolutionary model such as the transition/transversion rate ratio, the gamma parameter for variable substitution rates among sites, rate parameters for different genes, and synonymous and nonsynonymous substitution rates. They can also test evolutionary models, calculate substitution rates at particular sites, reconstruct ancestral nucleotide or amino acid sequences, simulate DNA and protein sequence evolution, compute distances based on the synonymous and nonsynonymous changes, and of course do phylogenetic tree reconstruction by maximum likelihood and Bayesian Markov Chain Monte Carlo methods. The strength of the package lies in its rich implementation of evolutionary models, though Yang comments that tree-making is not a strong point of the current version. Another notable point is the availability of codon models, which Yang pioneered. The package is available as Windows executables and as C source code for Unix and MacOS X systems. An Old Versions folder in the ftp site that distributes these also contains Mac OS executables for the earlier versions 3.0a and 3.0c. See the PAML web page at http://abacus.gene.ucl.ac.uk/software/paml.html where it is available.

Amy Egan and Joana Silva of The Institute for Genomic Research (TIGR) in Rockville, Maryland (aegan (at) jcvi.org) have produced IDEA (Interactive Display for Evolutionary Analysis), version 2.4, a graphical interface for PAML. IDEA allows you to run either of the PAML programs codeml or baseml on a single dataset or on multiple datasets simultaneously. These allow you to obtain maximum likelihood estimates of numbers of substitutions per branch and per site and to compare multiple models of molecular evolution given the data and a phylogenetic tree for the sequences. You can optionally generate phylogenies with PHYLIP, using maximum parsimony (on small datasets) or neighbor-joining. IDEA can perform multiple runs of codeml with different starting (dN/dS) values and merge their results for increased accuracy. It can also analyze multiple datasets in parallel, save parameter values for future use, and monitor progress step by step. For codeml analyses of sites-based evolutionary models, it features an interactive tabular summary of results, visualizations of selective pressure along genes, interactive histograms and depictions of phylogenetic trees with branch lengths proportional to the estimated number of nucleotide substitutions. It is available as a combination of Perl script, Java executables and Linux or Solaris executables. It can be run on systems that have Perl, Java, PAML 3.14 or 3.15, and PHYLIP. If parallel execution is desired you need to have SGE or Condor; otherwise it will just run on the machine on which it is launched. It can be downloaded from its web site at http://ideanalyses.sourceforge.net/main.html

Tim Massingham and Nick Goldman of the European Bioinformatics Institute in Hinxton, U. K. (timm (at) ebi.ac.uk and goldman (at) ebi.ac.uk) have produced SLR (Sitewise Likelihood Ratio), a program to compute and test the nonsynonymous/synonymous ratio of substitutions at each site. For coding sequences it makes a maximum likelihood estimate for each amino acid position of the ratio of nonsynonymous substitutions to synonymous substitutions, and does a likelihood ratio test for that site. The many sitewise tests are then corrected for multiple comparisons to indicate which sites have strong evidence of purifying or positive selection and so whether there is any reliable evidence for the presence of selection in the alignment. Alternatively, SLR can be restricted to detecting only unusually variable sites, indicating such sites and providing evidence for the presence of positive selection in the alignment. It is described in the paper: Massingham, T. and N. Goldman. 2005. Detecting amino acid sites under positive selection and purifying selection. Genetics 169: 1753-1762. It is available as C source code, Windows executables, Linux executables and Powermac Mac OS X executables. It can be downloaded from its web site at http://www.ebi.ac.uk/goldman/SLR/

Gangolf Jobb (gangolf (at) treefinder.de), formerly of the Institut für Statistik of the University of München, Germany, has produced Treefinder, a maximum likelihood program for nucleotide sequence data. It makes available a variety of models of base change, including codon-position-specific models. It searches for the best trees by its own method of tree rearrangement, and can assess statistical support for groups by either bootstrap or a local paired-sites method. All parameters of the models can be optimized by searching for the values that maximize the likelihood. The program is fast, and has both a graphical user interface and a general language in which its operation can be programmed. Trees can be interactively manipulated and constrained in various ways. Treefinder is described in a paper: Jobb, G., A. von Haeseler, and K. Strimmer. 2004. TREEFINDER: A powerful graphical analysis environment for molecular phylogenetics. BMC Evolutionary Biology 4: 18. It has been available for download from its web site at http://www.treefinder.de as executables for Windows, Mac OS X, and Linux. It requires the Java runtime environment to be present. However, Jobb has currently declared himself "on strike" at this web site and asks that people first email him to discuss whether he should be compensated for his work. I do not know whether that means that the program is currently available for free, or whether he will soon start charging for it. He certainly deserves compensation for this good program.

Stéphane Guindon (currently at the University of Auckland, New Zealand, s.guindon (at) auckland.ac.nz) and Olivier Gascuel (gascuel (at) lirmm.fr) at the LIRMM, of the CNRS and the University of Montpellier II, France, have released PHYML version 3.0, a fast maximum likelihood program for nucleotide or protein sequence data. It has 6 possible DNA substitution models and 5 amino acid substitution models, allows estimation of many of the model parameters, and can allow for a gamma distribution of rates among sites and a proportion of invariable sites. It can also do bootstrapping of the trees. PHYML is described in a paper: Guindon, S., and O. Gascuel. 2003. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Systematic Biology 52: 696-704. It is available as Linux, SunOS, Windows, and Mac OS X executables from its web site in Montpellier at http://www.atgc-montpellier.fr/phyml/binaries.php , where it is also available as a web server.

Johan Nylander (Johan.Nylander (at) abc.se) has written BootPHYML version 3.4. This is a Perl script that performs bootstrapping using programs from PHYLIP, substituting PHYML for the PHYLIP program DNAML. It works with Mac OS X and Linux or Unix. It is available on a web page at Nylander's web site in Sweden.

Bastien Boussau of the Laboratoire de Biométrie et Biologie Evolutive of the Université Lyon 1, CNRS, Lyon (bastien.boussau (at) univ-lyon1.fr) has released nhPhyML (non-homogeneous PhyML), version 1.00, a program based on PHYML to reconstruct trees with a non-homogeneous model of DNA sequence evolution. nhPhyML can reconstruct phylogenies in a Maximum Likelihood framework using the Galtier and Gouy (1998) nonhomogeneous model of DNA sequence evolution. This model allows a different equilibrium G+C content to be associated with different branches of the phylogeny. nhPhyML will locally rearrange a given rooted phylogeny without changing the root position, and will optimize parameters of the model of sequence evolution. It is described in the paper: Boussau, B., and M. Gouy. 2006. Efficient likelihood computations with nonreversible models of evolution. Systematic Biology 55 (5): 756-768. It is available as C source code and Linux executables. It can be downloaded from its web site at http://pbil.univ-lyon1.fr/software/nhphyml/

Bastien Boussau of the Laboratoire de Biométrie et Biologie Evolutive of the Université Lyon 1, CNRS, Lyon (bastien.boussau (at) univ-lyon1.fr) has released PhyML-Multi (a PhyML-derived program to detect recombination which uses multiple trees for one alignment), version 1.00, a program that can infer recombination breakpoints and multiple phylogenies from one alignment. It takes as input a sequence alignment and a putative number k of trees to reconstruct. Then, using an HMM or a mixture model, it will infer k trees from the alignment and predict recombination breakpoints, under the Maximum Likelihood criterion. PhyML-Multi can work on DNA sequences as well as protein sequences, and can handle dozens of sequences. It is described in the paper: Boussau, B., L. Guéguen, and M. Gouy. 2009. A mixture model and a hidden Markov model to simultaneously detect recombination breakpoints and reconstruct phylogenies. Evolutionary Bioinformatics Online 5 (June 25): 67-79. It is available as C source code and Linux executables. It can be downloaded from its web site at http://pbil.univ-lyon1.fr/software/phyml_multi/

Pierre Rioux and Tim Littlejohn, then of the Informatics Division of the Organelle Genome Megasequencing Program at the Université de Montréal (Littlejohn is currently at BioLateral Group, in Sydney, Australia, tim (at) biolateralgroup.com) made available PARBOOT (PARallel BOOTstrapping), a program that takes bootstrap sampled data sets and splits them up, submitting each to a different computer, so as to run bootstrapping quickly on networks of computers. It is intended for use with PHYLIP programs. It is available free as C source code from the Indiana University IUBIO software archive at http://microbe.bio.indiana.edu:7131/soft/iubionew/molbio/evolution/phylo/ParBoot/ . It is no longer available by ftp from Montréal. It is described on a web page at the Université de Montréal at http://megasun.bch.umontreal.ca/aboutpb.html . It requires a networked system of computers with PHYLIP, a Perl interpreter, and appropriate accounts and permissions.
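The idea of splitting bootstrap replicates among machines can be illustrated with a minimal Python sketch. This is not PARBOOT's code (PARBOOT itself works with PHYLIP and Perl); the function names, the toy alignment, and the round-robin assignment to three hypothetical hosts are all assumptions made for illustration.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Build one bootstrap replicate by resampling alignment columns with replacement."""
    n_sites = len(next(iter(alignment.values())))
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}

def split_among_hosts(replicates, n_hosts):
    """Deal the replicates round-robin into one batch per host (hypothetical scheduling)."""
    return [replicates[i::n_hosts] for i in range(n_hosts)]

# Toy alignment: sequence name -> aligned sequence (made-up data).
alignment = {"A": "ACGTACGT", "B": "ACGTTCGT", "C": "ACGAACGA"}
rng = random.Random(42)  # fixed seed so the sketch is reproducible

replicates = [bootstrap_alignment(alignment, rng) for _ in range(10)]
batches = split_among_hosts(replicates, 3)
for host, batch in enumerate(batches):
    print(f"host {host}: {len(batch)} replicates")
```

Each batch could then be analyzed independently and the resulting trees pooled for a consensus, which is what makes bootstrapping easy to parallelize.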

Laura Salter Kubatko of the Departments of Statistics and Evolution, Ecology, and Organismal Biology at the Ohio State University, Columbus, Ohio (lkubatko (at) stat.ohio-state.edu) has written SSA (inference of maximum likelihood phylogenetic trees using a Stochastic Search Algorithm), version 1.0, a program that uses a stochastic search to find maximum likelihood phylogenies from DNA sequences. Two versions of the program are available: one which assumes a molecular clock and one which does not make this assumption. The method for searching the space of trees for the ML tree is based on a simulated-annealing type algorithm. The program implements the F84 model of nucleotide substitution and associated sub-models. It estimates the ML tree and branch lengths, and can optionally estimate the transition/transversion ratio. Upon termination, the program returns the k trees of highest likelihood found during the search, where k can be set by the user. It is described in the paper: Salter, L. A. and D. K. Pearl. 2001. Stochastic search strategy for estimation of maximum likelihood phylogenetic trees. Systematic Biology 50(1): 7-17. It is available as executables for Windows, Linux, AIX, and SPARC systems. Laura is also willing to send the source code to users who own the book Numerical Recipes in C by Press, Teukolsky, Vetterling and Flannery, and who thus have permission to use routines from that book. The documentation and executables can be downloaded from its web site at http://www.stat.ohio-state.edu/

  • Friedman, N., M. Ninio, I. Pe'er, and T. Pupko. 2002. A Structural EM Algorithm for phylogenetic inference. Journal of Computational Biology 9(2): 331-353.
  • Ninio, M., E. Privman, T. Pupko, and N. Friedman. 2007. Phylogeny reconstruction: increasing the accuracy of pairwise distance estimation using Bayesian inference of evolutionary rates. Bioinformatics 23: e136-e141.

Simon Whelan of the Faculty of Life Sciences at the University of Manchester, U.K. (simon.whelan (at) manchester.ac.uk) has released Leaphy (Likelihood Estimation Algorithms for PHYlogenetics), version 1.0beta, a fast and accurate program for maximum likelihood phylogenetic inference. Leaphy uses maximum likelihood to estimate trees from aligned amino acid and nucleotide sequences under a variety of commonly used and popular models. The methods for searching for the best tree topology are described in the paper: Whelan, S. 2007. New approaches to phylogenetic tree search and their application to large numbers of protein alignments. Systematic Biology 56: 727-740. It is available as Windows executables and Linux executables. It can be downloaded from its web site at http://www.bioinf.manchester.ac.uk/leaphy/Leaphy.htm

Daniele Catanzaro of the Computer Science Department of the Université Libre de Bruxelles (U.L.B.) (dacatanz (at) ulb.ac.be) has released PhyloCoco version 2.3, a molecular phylogeny package for Intel-iMac with OS 10.4.9 or higher and Java 1.4 or higher. PhyloCoco is a minimalist tool for rebuilding molecular phylogenies by means of the likelihood criterion or the minimum evolution criterion. PhyloCoco selects the best substitution model of DNA evolution for the dataset of sequences to be analyzed and displays the best phylogeny found so far. It uses the GTR model of DNA evolution and different optimization methods, including the Very Large Scale Neighborhood (VLSN) search for the topology and Iterated Local Search (ILS) to explore the solution space. PhyloCoco uses FigTree to display the resulting phylogeny. It is described in the paper: Catanzaro, D., R. Pesenti and M. C. Milinkovitch. 2007. Estimating phylogenies under maximum likelihood: a very large-scale neighborhood approach. Submitted to BMC Bioinformatics. It is available as Java source code and Intel Mac OS X executables. It can be downloaded from its web site at http://homepages.ulb.ac.be/

  • Hudelot, C., V. Gowri-Shankar, H. Jow, M. Rattray and P. Higgs. 2003. RNA-based phylogenetic methods: Application to mammalian mitochondrial RNA sequences. Molecular Phylogenetics and Evolution 28: 241-252.
  • Jow, H., C. Hudelot, M. Rattray and P. Higgs. 2002. Bayesian phylogenetics using an RNA substitution model applied to early mammalian evolution. Molecular Biology and Evolution 19: 1591-1601.

Le Sy Vinh (vinh (at) cs.uni-duesseldorf.de) and Heiko Schmidt (heiko (at) cs.uni-duesseldorf.de) of the Institut für Bioinformatik of the University of Düsseldorf, Germany and Arndt von Haeseler (arndt.von.haeseler (at) univie.ac.at) of the Center for Integrative Bioinformatics Vienna (CIBIV), Austria, have written Phylogenetic Navigator (PhyNav) version 1.0. This program finds subsets of species in a dataset that are "minimal k-distance subsets" and analyses these each by maximum likelihood. Then it stitches these groups together using likelihood. This makes it possible to analyze larger datasets. The program is described in a paper: Vinh, L. S., H. A. Schmidt, and A. von Haeseler. 2005. PhyNav: A novel approach to reconstruct large phylogenies. pp. 386-393 in Classification, the Ubiquitous Challenge (Proceedings of the 28th Annual Conference of the GfKl 2004), ed. C. Weihs and W. Gaul. Series Studies in Classification, Data Analysis, and Knowledge Organization. Springer-Verlag, Heidelberg/New York. It is available as Linux executables from its web site at http://www.cibiv.at/software/phynav/

  • Chen, S. C., M. Rosenberg, and B. Lindsay. 2011. MixtureTree: a program for constructing phylogeny. BMC Bioinformatics 12: 111.
  • Chen, S. C., M. Li, M. Rosenberg, and B. Lindsay. 2011. Mixture tree construction and its applications. pp. 135-147 in Handbook of Statistical Bioinformatics, ed. by H. S. Lu, B. Scholkopf, and H. Zhao. Springer Handbooks of Computational Statistics. Springer-Verlag.
  • Price, M. N., P. S. Dehal, and A. P. Arkin. 2009. FastTree: Computing large minimum-evolution trees with profiles instead of a distance matrix. Molecular Biology and Evolution 26: 1641-1650.
  • Price, M. N., P. S. Dehal, and A. P. Arkin. 2010. FastTree 2 -- approximately maximum-likelihood trees for large alignments. PLoS ONE 5(3): e9490.

Paul Michael Agapow, then of the Department of Biology of Imperial College, Silwood Park, U.K. and more recently of the Health Protection Agency, U.K., (agapow (at) agapow.net) has written Mac5, version 1.7.3, a program for phylogenetic reconstruction using gapped data. MAC5 implements MCMC sampling to estimate a phylogenetic tree from a DNA multiple alignment. What differentiates MAC5 from similar programs is its use of five-state sequence evolution models as a means to include the gap information. It is available as C source code, Windows executables and Powermac Mac OS X executables. Its author says that owing to other projects, Mac5 is not being further developed and is not being supported by him. It can be downloaded from its web site at http://www.agapow.net/software/mac5

David Posada (dposada (at) uvigo.es) of the Department of Biochemistry, Genetics and Immunology of the University of Vigo, Spain and Keith Crandall of the Department of Biology, Brigham Young University released Modeltest version 3.7, a program to test a hierarchy of statistical models of DNA evolution using the Likelihood Ratio Test criterion and the AIC (Akaike Information Criterion). The likelihood values are obtained by running PAUP*. Modeltest accepts likelihood scores corresponding to 56 models of DNA substitution, which differ in features such as whether transition and transversion rates are equal, whether rates at different sites are equal, and whether there are invariant sites. Modeltest is described in the paper: Posada, D. and K. A. Crandall. 1998. MODELTEST: testing the model of DNA substitution. Bioinformatics 14: 817-818. It is available as executables for Macintosh and for Windows, and as source code in C that can be compiled on many other systems. It is distributed from its web site at http://darwin.uvigo.es/software/modeltest.html . Modeltest was the basis for two further developments: the MrModeltest program, which uses MrBayes, and the FindModel server at Los Alamos National Laboratory, which is a revised version of Modeltest that uses the weighbor program to infer the trees.
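A single step of a likelihood ratio test between two nested substitution models can be sketched in Python as follows. This is an illustrative sketch, not Modeltest's code: the function names and the log-likelihood values are hypothetical, and the sketch assumes the plain chi-square approximation with degrees of freedom equal to the number of extra free parameters (it ignores the boundary corrections that some comparisons may need).

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square distribution with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-half)              # Q(1, half)
    else:
        a, q = 0.5, math.erfc(math.sqrt(half))   # Q(1/2, half)
    # Recurrence for the regularized upper incomplete gamma function:
    # Q(a + 1, x) = Q(a, x) + x^a * exp(-x) / Gamma(a + 1)
    while a < df / 2.0:
        q += math.exp(a * math.log(half) - half - math.lgamma(a + 1.0))
        a += 1.0
    return q

def likelihood_ratio_test(lnL_null, lnL_alt, extra_params):
    """Return the LRT statistic 2 * (lnL_alt - lnL_null) and its chi-square P value."""
    stat = 2.0 * (lnL_alt - lnL_null)
    return stat, chi2_sf(stat, extra_params)

# Hypothetical log-likelihoods for a simpler (null) and a richer (alternative) model
# that differ by three free parameters.
stat, p_value = likelihood_ratio_test(lnL_null=-1234.5, lnL_alt=-1228.0, extra_params=3)
print(f"LRT statistic = {stat:.2f}, P = {p_value:.4f}")
```

A hierarchical procedure repeats such tests along a fixed sequence of model pairs, keeping the richer model whenever the P value falls below the chosen significance level.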

David Posada (dposada (at) uvigo.es) of the Department of Biochemistry, Genetics and Immunology of the University of Vigo, Spain has released jMODELTEST version 0.1.1, a Java version of Modeltest. Like Modeltest, it carries out statistical selection of best-fit models of nucleotide substitution. It implements five different model selection strategies: hierarchical and dynamical likelihood ratio tests (hLRT and dLRT), Akaike and Bayesian information criteria (AIC and BIC), and a decision theory method (DT). It also provides estimates of model selection uncertainty, parameter importances and model-averaged parameter estimates, including model-averaged phylogenies. It is described in the paper: Posada D. 2008. jModelTest: Phylogenetic Model Averaging. Molecular Biology and Evolution 25: 1253-1256. It is distributed as Java executables that will run on Java-equipped Windows systems, on Mac OS X, and on Linux systems that have Java installed. It also uses PHYML to compute maximum likelihood trees under the various models. I do not know whether it comes with PHYML installed or requires the user to install it. jMODELTEST will be found at its web site at http://darwin.uvigo.es/software/jmodeltest.html

Paulo Nuin (nuinp (at) mcmaster.ca) of the Department of Biology, McMaster University, Hamilton, Ontario, Canada has released MrMTgui version 1.01. This is a graphic user interface for running Modeltest and MrModeltest. It is available for Windows as executables from the MrMTgui web site at http://genedrift.org/mtgui.php . Source code of a Linux version is also available which can be compiled using the WxWindows windowing software. The Linux sources are available by accessing a svn (subversion) version-control code base, using instructions available at the above site. MrMTgui was formerly known as MTgui in the earlier version which could not access MrModeltest.

Johan Nylander (Johan.Nylander (at) abc.se) has released MrModeltest version 2.2. This program is a simplified version of Modeltest 3.7. It performs hierarchical likelihood ratio tests and calculates approximate AIC, AICc, and Akaike weights of the nucleotide substitution models currently implemented in both PAUP* and MrBayes. Version 2 added the use of four different hierarchies for the likelihood ratio tests and prints the selected model in a MrBayes block. MrModeltest is available as an executable and source code for Windows, for Mac OS, and for Mac OS X, and as source code for Linux and Unix. It is available from Nylander's software download site at http://www.abc.se/

Johan Nylander (Johan.Nylander (at) abc.se) has written Modelfit version 1.2, and MrModelfit version 1.2. These are Perl scripts that can run (respectively) Modeltest and MrModeltest simply by typing a single command line. They are available from Nylander's software download site at http://www.abc.se/

Charles Bell of the Department of Biology of Xavier University of Louisiana, New Orleans (cbell3 (at) xula.edu) has written Porn* (Phylogenetics On Rick's Network, as it was originally hosted on Rick Ree's site) version 2.0, a Linux clone of Modeltest using the Python language. It enables command-line computations equivalent to Modeltest under the Linux operating system. It creates command blocks for PAUP* which can be used when running PAUP*. Porn* is written as a shell script invoking Python modules. It is available at its web site at http://www.phylodiversity.net/cbell/pornstar/

David Posada (dposada (at) uvigo.es) of the Department of Biochemistry, Genetics and Immunology of the University of Vigo, Spain has released ProtTest, version 2.4, a Java program allowing testing of 64 different models of protein evolution, using the AIC, AICc, and BIC criteria for choosing among models that include different substitution models, invariant sites, rate heterogeneity, and empirical amino acid frequency variants of the models. ProtTest uses the PAL library of phylogenetic java routines and also uses the PHYML program to compute likelihoods. It is described in the paper: Abascal, F., R. Zardoya and D. Posada. 2005. ProtTest: Selection of best-fit models of protein evolution. Bioinformatics 21: 2104-2105. It is available from its web site at http://darwin.uvigo.es/software/prottest.html

Thomas Keane, of the Bioinformatics and Pharmacogenomics Lab of the Department of Biology, National University of Ireland, Maynooth (thomas.m.keane (at) nuim.ie) has written ModelGenerator, version 0.85. It is a Java program for model selection that selects amino acid and nucleotide substitution models using Fasta or PHYLIP alignments. It supports 56 nucleotide and 80 amino acid substitution models. It is described in the paper: Keane, T. M., C. J. Creevey, M. M. Pentony, T. J. Naughton and J. O. McInerney. 2006, Assessment of methods for amino acid matrix selection and their use on empirical data shows that ad hoc assumptions for choice of matrix are not justified. BMC Evolutionary Biology 6: 29. It is available from its web site at http://bioinf.may.ie/modelgenerator/ .

Johan Nylander (Johan.Nylander (at) abc.se) has written MrAIC version 1.4.4. This is a Perl script that carries out AIC, AICc, BIC, and Akaike weights model comparison methods for nucleotide substitution models by invoking the PHYML program. It is distributed from Nylander's software download site at http://www.abc.se/
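The model comparison criteria named above follow standard formulas, which a short Python sketch can make concrete. This is not MrAIC's code: the function names, the model names, the log-likelihoods and the parameter counts are hypothetical, and taking the sample size n to be the alignment length is an assumption (a common convention, but not the only one).

```python
import math

def information_criteria(lnL, k, n):
    """AIC, AICc and BIC for log-likelihood lnL, k free parameters, sample size n."""
    aic = 2 * k - 2 * lnL
    aicc = aic + (2 * k * (k + 1)) / (n - k - 1)   # small-sample correction
    bic = k * math.log(n) - 2 * lnL
    return {"AIC": aic, "AICc": aicc, "BIC": bic}

def akaike_weights(scores):
    """Turn a {model: AIC} mapping into Akaike weights that sum to 1."""
    best = min(scores.values())
    rel = {m: math.exp(-0.5 * (s - best)) for m, s in scores.items()}
    total = sum(rel.values())
    return {m: r / total for m, r in rel.items()}

# Hypothetical fits: model -> (log-likelihood, number of free parameters).
fits = {"JC69": (-2050.0, 9), "HKY85": (-2010.0, 13), "GTR": (-2007.5, 17)}
n_sites = 500  # assumed sample size: alignment length

aic_scores = {m: information_criteria(lnL, k, n_sites)["AIC"] for m, (lnL, k) in fits.items()}
for model, w in sorted(akaike_weights(aic_scores).items(), key=lambda kv: -kv[1]):
    print(f"{model}: AIC = {aic_scores[model]:.1f}, weight = {w:.3f}")
```

Lower scores indicate a better trade-off between fit and number of parameters; the weights can be read as the relative support for each model within the candidate set.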

Vladimir Minin, Zaid Abdo, Paul Joyce, and Jack Sullivan of the Department of Biological Sciences at the University of Idaho, Moscow, Idaho (jacks (at) uidaho.edu) or (vminin (at) u.washington.edu) (Minin is now at the University of Washington) have released DT-ModSel (Decision Theory MODel SELection), a performance-based method for selecting a likelihood model for phylogenetic estimation. It implements a model selection method which is based on the Bayesian Information Criterion, but incorporates relative branch-length error as a performance measure in a decision theory (DT) framework. This DT method includes a penalty for overfitting, is applicable prior to running extensive analyses, and simultaneously compares all models being considered and thus does not rely on a series of pairwise comparisons of models to traverse model space. It can compare 56 different models of molecular sequence evolution on a given tree. It is described in the paper: Minin, V., Z. Abdo, P. Joyce, and J. Sullivan. 2003. Performance-based selection of likelihood models for phylogeny estimation. Systematic Biology 52: 674-683. It is available as a Perl script. It can be downloaded from its web site at http://www.webpages.uidaho.edu/

Sergei Kosakovsky Pond and Simon Frost of the Antiviral Research Center, University of California, San Diego and Spencer Muse of the Department of Statistics, North Carolina State University, Raleigh, North Carolina ( muse (at) stat.ncsu.edu ) have released HY-PHY (HYpothesis testing using PHYlogenies), version 0.99Beta. HY-PHY has general ways of enabling the user to perform a wide variety of statistical tests of different models of molecular sequence change. It is actually a higher-level programming language which enables the user to set up many different kinds of tests. The user can define their own alphabet of symbols and test any reversible substitution model. Examples of tests that can be performed include molecular clock tests, relative rate tests, relative ratio tests, and tests of positive selection. It is described in a paper: Kosakovsky Pond, S. L., S. D. Frost, and S. V. Muse. 2005. HyPhy: hypothesis testing using phylogenies. Bioinformatics 21(5): 676-679.

Although not primarily intended as a phylogeny estimation package, it also can infer trees by Neighbor-Joining and UPGMA methods, and a number of search strategies are also available for likelihood inference. HY-PHY is freely available as executables for Mac OS, for Mac OS X, for Windows, and as source code for Unix and Linux. It is available at the HY-PHY web page at http://www.hyphy.org .

Akifumi S. Tanabe of the Division of Ecology and Evolutionary Biology, Department of Environmental Life Sciences, Graduate School of Life Sciences of Tohoku University, Japan (astanabe (at) mail.tains.tohoku.ac.jp) has released Kakusan4, a parallelized nucleotide substitution model selection script written in the Perl language for data sets with multiple partitions. It supports nucleotide substitution model selection on each partition and/or each codon position by AIC, AICc or BIC. Because the optimization of likelihoods is executed using BASEML, PAUP* or Treefinder and these can be run in parallel, Kakusan can take advantage of multi-core systems or multiple processor systems. The Kakusan Perl script can be run on Windows, Mac OS X, Linux, FreeBSD and on other UNIX operating systems. It accepts several different input file formats. It outputs configuration files for Treefinder, MrBayes and PAUP*. It is described in the paper: Tanabe, A. S. 2007. Kakusan: a computer program to automate the selection of a nucleotide substitution model and the configuration of a mixed model on multilocus data. Molecular Ecology Notes 7: 962-964. It is available as a Perl script, Windows executables and Mac OS X universal executables. It can be downloaded from its web site at http://www.fifthdimension.jp/products/kakusan/ . Earlier versions, Kakusan, Kakusan2, and Kakusan3, can also be downloaded there.

Jonathan Bollback of the University of Edinburgh, Edinburgh, U.K., and of the Institute of Science and Technology, Austria (j.p.bollback (at) ed.ac.uk) has written MAPPS (Model Adequacy in Phylogenetics by Predictive Simulation) version 1.1.6, a program to evaluate the fit of a group of phylogenetic models to DNA sequence data. The rationale behind this approach is that an adequate model should be able to predict future data (nucleotide site patterns). In the absence of future data the model's predictive ability is compared to the original data set. The model's predictive ability is evaluated through simulation under the model. Comparison of simulated (or predictive) data sets is evaluated using the multinomial test statistic. The program uses data and trees in a format compatible with the output from MrBayes. It is described in the paper: Bollback, J. P. 2002. Bayesian model adequacy and choice in phylogenetics. Molecular Biology and Evolution 19(7): 1171-1180. It is available as Mac OS X universal executables. It can be downloaded from its web site at http://www.simmap.com/bollback/software.html
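The multinomial test statistic mentioned above depends only on how often each distinct site pattern (alignment column) occurs. The Python sketch below computes it in the unconstrained-likelihood form T = sum_i N_i ln(N_i) - N ln(N), where N_i is the count of pattern i and N is the number of sites. This is my understanding of the statistic rather than code taken from MAPPS, and the function names and toy alignment are hypothetical.

```python
import math
from collections import Counter

def site_pattern_counts(alignment):
    """Count how many times each distinct column (site pattern) occurs."""
    return Counter(zip(*alignment))

def multinomial_statistic(alignment):
    """T = sum_i N_i * ln(N_i) - N * ln(N) over the distinct site patterns."""
    counts = site_pattern_counts(alignment)
    n_sites = sum(counts.values())
    return sum(c * math.log(c) for c in counts.values()) - n_sites * math.log(n_sites)

# Toy alignment: one string per taxon, all of equal length (made-up data).
observed = ["AACGT", "AACGA", "AATGT"]
print(f"T(observed) = {multinomial_statistic(observed):.3f}")
```

In a predictive-simulation setting, the same function would be applied to each data set simulated under the model, and the observed value would be compared with the resulting distribution.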

Hidetoshi Shimodaira ("Shimo") of the Department of Mathematical and Computing Sciences, Tokyo Institute of Technology, Japan ( shimo (at) is.titech.ac.jp ) has released CONSEL version 0.1k, a package of small programs to calculate P values for tests of phylogenies. It uses output from other phylogeny programs (in particular it can use output from PAUP, PAML, PHYML, and MOLPHY) which makes available to it the sitewise log-likelihoods for some trees and the trees themselves. It uses these to carry out the Kishino-Hasegawa test, the Shimodaira-Hasegawa test, a weighted version of the SH test, and a new "approximately unbiased" test of Shimodaira's. CONSEL is available as C source code that will compile on Linux and Unix systems that have the gcc compiler, and it is also available as a DOS executable that will run on DOS or Windows systems. It can be downloaded from its web site at http://www.ism.ac.jp/

shimo/prog/consel/index.html . It is described in a paper: Shimodaira, H. and M. Hasegawa. 2001. CONSEL: for assessing the confidence of phylogenetic tree selection. Bioinformatics 17: 1246-1247 which cites the statistical papers describing the methods.

  • Shimodaira, H. 2002. An approximately unbiased test of phylogenetic tree selection. Systematic Biology 51: 492-508.
  • Shimodaira, H. 2004. Approximately unbiased tests of regions using multistep-multiscale bootstrap resampling. Annals of Statistics 32: 2616-2641.

Maria Anisimova, Olivier Gascuel, and Jean-François Dufayard of the Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier (LIRMM) of the Université de Montpellier II, Montpellier, France (manisimova (at) hotmail.com) have produced PHYML-aLRT (PHYML approximate Likelihood Ratio Test), version 1.1, a program to carry out likelihood ratio tests of the presence of branches in a phylogeny. PHYML-aLRT is a modification of the original PHYML program, and is designed to compute tests of the reality of branches in a known phylogeny. Five branch support tests are available: (1) the bootstrap, (2) aLRT statistics, (3) aLRT parametric (Chi²-based) branch support, (4) aLRT non-parametric branch support based on a Shimodaira-Hasegawa-like procedure, and (5) a combination of the latter two supports, that is, the minimum of the two values. The methods are described in the paper: Anisimova, M., and O. Gascuel. 2006. Approximate likelihood ratio test for branches: A fast, accurate and powerful alternative. Systematic Biology 55(4): 539-552. It is available as Windows executables, Linux executables, Solaris executables, Powermac Mac OS X executables and Intel Mac OS X executables. It can be downloaded from its web site at http://atgc.lirmm.fr/phyml/alrt/ . This program was of temporary usefulness: the method has since been made available in PHYML 3.0 and should probably be used from that program, although these executables are still available for download.

Nick Grassly, of the Department of Infectious Disease Epidemiology of the School of Public Health, Imperial College School of Medicine, St. Mary's Campus, London (n.grassly (at) imperial.ac.uk) has written PLATO, version 2.11, (Partial Likelihoods Assessed Through Optimisation), a program that takes sequential PHYLIP-style DNA sequences followed by their maximum likelihood phylogeny and, using a likelihood approach with sliding window analysis and Monte Carlo simulation of the null distribution, detects anomalously evolving regions in the DNA sequences and assesses their significance. This may lead to the detection of, for example, recombination, gene conversion or convergence, or reveal variable selective pressures along the gene sequence. A general substitution model is used that can allow the test to reveal differences due to recombination while ignoring those due to varying rate of evolution. The method is described in the paper: Grassly, N. C., and E. C. Holmes. 1997. A likelihood method for the detection of selection and recombination using sequence data. Molecular Biology and Evolution 14: 239-247. It is available as a Mac OS Macintosh binary executable, or in source code for Unix systems. Although no longer distributed by Grassly, it is available at the IUBIO web site at http://microbe.bio.indiana.edu:7131/soft/iubionew/molbio/dna/analysis/Plato/

  • Milne, I., D. Lindner, M. Bayer, D. Husmeier, G. McGuire, D. F. Marshall and F. Wright. 2009. TOPALi v2: a rich graphical interface for evolutionary analyses of multiple alignments on HPC clusters and multi-core desktops. Bioinformatics 25 (1): 126-127.
  • Milne, I., F. Wright, G. Rowe, D. F. Marshall, D. Husmeier, and G. McGuire. 2004. TOPALi: Software for automatic identification of recombinant sequences within DNA multiple alignments. Bioinformatics 20 (11): 1806-1807.

Kim Fisker, then of the Computer Science Department at Aarhus University, Denmark released RecPars, which does a parsimony analysis of DNA sequences. It was more recently maintained by Thomas Christensen of that department. It tries to find the best phylogenies for different regions of the sequences, and thereby postulates recombination events between these segments. The method is described in a paper: Hein, J. 1993. A heuristic method to reconstruct the history of sequences subject to recombination. Journal of Molecular Evolution 36: 396-406. RecPars is available as C source code for Unix. It is distributed from its web site at http://www.daimi.au.dk/

compbio/recpars/recpars.html . A web server is available there as well.

Dan Gusfield (gusfield (at) cs.ucdavis.edu) and Ren-Hua Chung (rchung (at) ucdavis.edu), both of the Department of Computer Science at the University of California, Davis, have released PPH (Perfect Phylogeny Haplotyper). PPH takes a set of diploid genotypes for SNP (single nucleotide polymorphism) markers, and infers haplotypes for them. It does this by seeing whether it can find a set of haplotypes that resolve all diploid genotypes and that fit onto a tree without requiring any extra changes of nucleotides (in other words, they are all compatible with the same tree). The result is not only the haplotype resolution but the resulting tree, if any. The method is described in a paper: Gusfield, D. 2002. Haplotyping as perfect phylogeny: conceptual framework and efficient solutions, pp. 165-175 in Proceedings of RECOMB 2002, edited by G. Myers, S. Hannenhalli, D. Sankoff, S. Istrail, P. Pevzner et al. ACM Press, New York. The program is available as C++ and Perl source code, and as executables for Windows, for SUN SPARC Solaris, for Intel/AMD-compatible Linux, and for Mac OS X from its web site at http://wwwcsif.cs.ucdavis.edu/

  • Minin, V. N., K. S. Dorman, and M. A. Suchard. 2005. Dual multiple change-point model leads to more accurate recombination detection. Bioinformatics 21: 3034-3042.
  • Suchard, M. A., R. E. Weiss, K. S. Dorman, and J. S. Sinsheimer. 2003. Inferring spatial phylogenetic variation along nucleotide sequences: a multiple change-point model. Journal of the American Statistical Association 98: 427-437.
  • Suchard, M. A., R. E. Weiss, K. S. Dorman, and J. S. Sinsheimer. 2002. Oh brother, where art thou? a Bayes factor test for recombination with uncertain heritage. Systematic Biology 51: 715-728.

Karin Dorman of the Department of Genetics, Development and Cell Biology of Iowa State University, Ames, Iowa (kdorman (at) iastate.edu) has written cBrother, a C version of the DualBrothers program, with extensions. cBrother is a C version of the Java code of DualBrothers, developed by Suchard et al. as a Bayesian multiple change point model to test for the presence of rare recombination events in the history of a set of sampled sequences. It is available as C source code. It can be downloaded from its web site at http://rumi.gdcb.iastate.edu/software/index.xml

Simone Linz, Achim Radtke, and Arndt von Haeseler of the Center of Integrative BioInformatics Vienna of the University of Vienna, Austria (arndt.von.haeseler (at) univie.ac.at and linz (at) cs.uni-duesseldorf.de) have written HGT (Horizontal Gene Transfer), a program to test for the presence of horizontal gene transfer. HGT considers the distribution of trees obtained from a set of different genes, and then simulates the trees obtained with a single species tree and different rates of horizontal gene transfer. The rate of horizontal gene transfer is estimated from the extent of differences among individual gene trees in the simulation and in the observed set of loci. The methods are described in the paper: Linz, S., A. Radtke, and A. von Haeseler. 2007. A Likelihood framework to measure horizontal gene transfer. Molecular Biology and Evolution 24: 1312-1319. HGT is available as C source code. It can be downloaded from its web site at http://www.cibiv.at/software/hgt/

Darren P. Martin and Ed Rybicki of the Microbiology Department of the University of Cape Town, Cape Town, South Africa (Darrin.Martin (at) uct.ac.za) have released RDP3 (Recombination Detection Program), version 3.27, a program that applies a large number of recombination detection and analysis algorithms. This includes many of the methods used in other recombination-detection programs. In all it has about 12 different methods. The software runs under Windows and combines highly automated screening of large numbers of sequences with a highly interactive interface for examining the results of the analyses. It is described in the paper: Martin, D. P., and E. P. Rybicki. 2000. RDP: detection of recombination amongst aligned sequences. Bioinformatics 16: 562-563. It is available as Windows executables. It can be downloaded from its web site at http://darwin.uvigo.es/rdp/rdp.html . An older version, RDP2, is also available there, as is an "unstable" early release of RDP4.

Robert Beiko and Nicholas Hamilton of the Institute for Molecular Bioscience at the University of Queensland, Australia (beiko (at) cs.dal.ca) have released EEEP (Efficient Evaluation of Edit Paths), version 1.0, a program for inference of lateral genetic transfer by comparison of phylogenetic trees. EEEP performs subtree prune-and-regraft (SPR) operations on a rooted reference tree to reconcile it with a user-supplied tree inferred from data. The rooting of the reference tree is used to constrain the SPR operations that are allowed. The test tree need not be rooted or binary, and may contain an incomplete subset of the taxa represented in the reference tree. EEEP has been successfully compiled under RedHat Linux and AIX, as well as in Mac OS X and Windows XP. It is described in the paper: Beiko, R.G., and N. Hamilton. 2006. Phylogenetic identification of lateral genetic transfer events. BMC Evolutionary Biology 6: 15, in which it was used to infer LGT events on 16,000 genes. It is available as C++ source code, Windows executables and Linux executables. It can be downloaded from its web site at http://bioinformatics.org.au/eeep

Gary Olsen of the Department of Microbiology, University of Illinois, Urbana, Illinois ( gary (at) phylo.life.uiuc.edu ) has written dnarates version 1.1.0. It reads a set of DNA sequences and a tree, and for that tree makes a maximum likelihood estimate of the rate of evolution at each site. This is done by taking the rate at each site as a separate parameter and maximizing the likelihood with respect to all those parameters. The program is available as generic C source code. It is based in part (with my permission) on code from my PHYLIP program DNAML . dnarates is available from the IUBIO phylogeny software page at http://iubio.bio.indiana.edu/soft/molbio/evolve/

Bette Korber of the Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico ( btk (at) t10.lanl.gov ) and her colleagues have released RevDNArates, a version of Gary Olsen's program dnarates that uses the REV (general reversible) model of DNA evolution and calculates the maximum likelihood estimate of the rate of change at each site (one parameter per site). They used it for the results in the paper: B. Korber, M. Muldoon, J. Theiler, F. Gao, R. Gupta, A. Lapedes, B. H. Hahn, S. Wolinsky and T. Bhattacharya. 2000. Timing the ancestor of the HIV-1 pandemic strains. Science 288: 1789-1796. The program is available as C source code for Unix from the web site for the programs from that paper at http://www.santafe.edu/

Sonja Meyer and Arndt von Haeseler, then of the Institut für Bioinformatik, Heinrich Heine Universität, Düsseldorf, Germany (von Haeseler is now at the Center for Integrative Bioinformatics Vienna, and his email address is arndt.von.haeseler (at) univie.ac.at) have released PARAT, version 0.9.1. This program infers a phylogeny and also site-specific evolutionary rates (one for each site). It can do so for up to 100 sequences directly. Above 100 sequences, it samples sets of sequences and estimates the rates from each such set, and then averages the resulting rates. It is distributed as open source C source code, which can readily be compiled and installed. PARAT is described in a paper: Meyer, S. and A. von Haeseler. 2003. Identifying site specific substitution rates. Molecular Biology and Evolution 20: 182-189. It is available at its web site at http://www.cibiv.at/software/parat/

Itay Mayrose of the Department of Cell Research and Immunology of the George S. Wise Faculty of Life Sciences, Tel Aviv University, Israel (itaymay (at) post.tau.ac.il ) has written Rate4Site version 2.01, a program to estimate rates of evolution at different sites in protein sequences. Rate4Site uses aligned protein sequences, constructs a tree by neighbor-joining or uses a user-defined input tree, and then infers the branch lengths and the rates of evolution at the sites. These are assumed to be drawn from a Gamma distribution and can be estimated either by maximizing the likelihood of the tree with respect to each of the rates, or by using Bayesian inference with the Gamma distribution as the prior (the parameters of the Gamma distribution are estimated empirically, so that this is an Empirical Bayes method). The methods are described in the paper: Mayrose, I., D. Graur, N. Ben-Tal and T. Pupko. 2004. Comparison of site-specific rate-inference methods: Bayesian methods are superior. Molecular Biology and Evolution 21: 1781-1791. It is available as C++ source code and Windows executables. It can be downloaded from its web site at http://www.tau.ac.il/

  • Mayrose, I., D. Graur, N. Ben-Tal, and T. Pupko. 2004. Comparison of site-specific rate-inference methods for protein sequences: Bayesian methods are superior. Molecular Biology and Evolution 21: 1781-1791.
  • Mayrose, I., A. Mitchell, and T. Pupko. 2004. Site-specific evolutionary rate inference: taking phylogenetic uncertainty into account. Journal of Molecular Evolution 60(3): 315-326.

Jessica Leigh, Ed Susko, Manuela Baumgartner, and Andrew Roger of the Department of Biochemistry and Molecular Biology and the Department of Mathematics and Statistics of Dalhousie University, Halifax, Nova Scotia, Canada (jleigh (at) dal.ca) have written Concaterpillar version 1.2, a program that carries out a hierarchical likelihood ratio test for phylogenetic congruence. It tests two kinds of hypotheses in supermatrix analysis. The first is the null hypothesis (H0) that the phylogenies of markers in the supermatrix are congruent. If we cannot reject congruence for a set of markers, the second hypothesis to test is whether or not the markers to be combined have significantly different evolutionary dynamics (branch lengths and rates-across-sites parameters), that is, whether they should be concatenated or subjected to separate analysis. The methods are described in the paper: Leigh, J. W., E. Susko, M. Baumgartner, and A. J. Roger. 2008. Assessing congruence in phylogenomic data. Systematic Biology 57: 104-115. It is available as a Python script. It uses the program RAxML to infer trees, and the SciPy Python library as well. It can be downloaded from its web site at http://rogerlab.biochemistryandmolecularbiology.dal.ca/Software/Software.htm#Concaterpillar

Haichun Wang, Matthew Spencer, Ed Susko, and Andrew Roger of the Department of Mathematics and Statistics and of the Department of Biochemistry and Molecular Biology of Dalhousie University, Halifax, Nova Scotia, Canada (hcwang (at) mathstat.da.ca) have produced PROCOV (PROtein COVarion analysis), version 1.3.2, a program for maximum likelihood estimation of phylogeny under protein covarion models. PROCOV computes the likelihood of a given tree under the rates-across-sites model or under the covarion-like model of Tuffley and Steel, the model of Huelsenbeck, and the model of Galtier, as well as for a general model that combines features of both the Huelsenbeck and Galtier models. Procov can also optimize tree topologies with subtree pruning-regrafting to search tree space. Procov is very computationally slow, so this is most useful for small trees. It is described in the paper: Wang, H-C, M. Spencer, E. Susko, and A. J. Roger. 2007. Testing for covarion-like evolution in protein sequences. Molecular Biology and Evolution 24: 294-305. It is available as C source code. The authors suggest using the BLAS matrix library when compiling it. It can be downloaded from its web site at http://www.mathstat.dal.ca/

  • Goldman, N. 1998. Phylogenetic information and experimental design in molecular systematics. Proceedings of the Royal Society London B 265: 1779-1786.
  • Massingham, T. and N. Goldman. 2000. EDIBLE: experimental design and information calculations in phylogenetics. Bioinformatics 16: 294-295.

Bret Larget, of the Departments of Statistics and Botany at the University of Wisconsin, Madison (larget (at) stat.wisc.edu) and Donald Simon of the Department of Mathematics and Computer Science, Duquesne University, Pittsburgh, Pennsylvania (simon (at) mathcs.duq.edu) have written BAMBE (Bayesian Analysis in Molecular Biology and Evolution) version 4.01a, a program for Bayesian analysis of phylogenies with DNA sequence data. It uses a prior distribution of trees and a rearrangement mechanism introduced in the paper: Mau, B., M. A. Newton, and B. Larget. 1997. Bayesian phylogenetic inference via Markov chain Monte Carlo methods. Molecular Biology and Evolution 14: 717-724. The trees and parameter values are sampled by Markov Chain Monte Carlo using a Metropolis algorithm. The resulting posterior distribution can be used to characterize the uncertainty about not only the tree, but the parameters of the substitution model as well. The program is in C++ source code for Unix, and is distributed from his web site at http://www.stat.wisc.edu/~larget/ . A Windows executable of an earlier version is also available there. The 2.03 and earlier versions are also available at a web page at Duquesne University. BAMBE is also available as a web server at the Institut Pasteur in Paris.
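The Metropolis sampling mentioned in the BAMBE entry can be illustrated on a toy problem. The following is a minimal sketch and is not BAMBE's code: it samples from a small discrete distribution rather than from trees, and the function name `metropolis_sample`, the three-state target, and the uniform proposal are all assumptions made for illustration.

```python
import random


def metropolis_sample(weights, steps, seed=0):
    """Visit state indices in proportion to the unnormalised `weights`.

    Hypothetical helper for illustration; real programs apply the same
    accept/reject rule to trees and model parameters.
    """
    rng = random.Random(seed)
    n = len(weights)
    state = 0
    counts = [0] * n
    for _ in range(steps):
        # Symmetric proposal: pick any state uniformly at random.
        proposal = rng.randrange(n)
        # Metropolis rule: accept with probability min(1, w_new / w_old).
        if rng.random() < min(1.0, weights[proposal] / weights[state]):
            state = proposal
        counts[state] += 1
    # Relative visit frequencies approximate the normalised target.
    return [c / steps for c in counts]


freqs = metropolis_sample([1.0, 2.0, 7.0], steps=50000)
```

With enough steps the visit frequencies should approach 0.1, 0.2 and 0.7, which is the sense in which the sampled trees and parameters in such programs approximate a posterior distribution.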

Mark Pagel and Andrew Meade of the School of Biological Sciences of the University of Reading, Reading, U.K. (m.pagel (at) reading.ac.uk) have written BayesPhylogenies, version 1.1, a program for estimating phylogenies by Bayesian inference. BayesPhylogenies uses Bayesian Markov Chain Monte Carlo (MCMC) or Metropolis-coupled Markov chain Monte Carlo (MCMCMC) methods. The program allows a range of models of gene sequence evolution, models for morphological traits, models for rooted trees, gamma and beta distributed rate-heterogeneity, and implements a mixture model that allows the user to fit more than one model of sequence evolution without partitioning the data. It is described in the paper: Pagel, M. and Meade, A. 2004. A phylogenetic mixture model for detecting pattern-heterogeneity in gene sequence or character-state data. Systematic Biology 53: 571-581. It is available as Windows executables, Linux executables, and Powermac Mac OS X executables. It can be downloaded from its web site at http://www.evolution.rdg.ac.uk/BayesPhy.html

Nicolas Lartillot of the LIRMM (Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier) of the Université de Montpellier II, Montpellier, France (nicolas.lartillot (at) lirmm.fr) has written PhyloBayes version 2.1c, a Bayesian phylogeny package for protein sequences using a mixture model. PhyloBayes is a Bayesian Markov Chain Monte Carlo (MCMC) sampler for phylogenetic reconstruction using protein alignments. Compared to other phylogenetic MCMC samplers, the main distinguishing feature of PhyloBayes is the underlying probabilistic model, CAT. This is a mixture model especially devised to account for site-specific features of protein evolution. It is particularly well suited for large multigene alignments. PhyloBayes can also do divergence time estimation with a relaxed molecular clock, posterior predictive analyses, including a compositional homogeneity test, and data recoding (analogous to R/Y coding, but for amino acids). The CAT model is described in the paper: Lartillot, N. and H. Philippe. 2004. A Bayesian mixture model for across-site heterogeneities in the amino-acid replacement process. Molecular Biology and Evolution 21(6): 1095-1109. PhyloBayes is a package of programs that operate together to do the steps of the analysis. It is distributed as C++ source code and Linux executables. The C++ source code can be compiled on Linux, Windows, or Mac OS X systems. It can be downloaded from its web site at http://www.atgc-montpellier.fr/phylobayes/binaries.php . A web server is also available.

John Huelsenbeck ( johnh (at) berkeley.edu ) of the Department of Integrative Biology of the University of California, Berkeley, and Fredrik Ronquist (Fredrik.Ronquist (at) nrm.se) of the Naturhistoriska riksmuseet, Stockholm, Sweden have written MrBayes, version 3.1.2, a program for Bayesian inference of phylogenies from nucleic acid sequences, protein sequences, and morphological characters. It assumes a prior distribution of tree topologies and uses Markov Chain Monte Carlo (MCMC) methods to search tree space and infer the posterior distribution of topologies. It reads sequence data in the NEXUS file format, and outputs posterior distribution estimates of trees and parameters. It can also use a hierarchical Bayesian framework to infer sites that are under natural selection. It allows for rate variation among sites and a variety of models of sequence evolution. MrBayes is available as a Macintosh (PowerMac) executable, a Windows executable, or as source code in C. It allows for multiple-chain Metropolis-coupled Markov Chain Monte Carlo (MC³) runs for more extensive search, and can be asked to spread jobs over a cluster of computers using the MPI message-passing interface. (Incidentally, since Bayes was Reverend Bayes, shouldn't it be named RevBayes?) MrBayes executables, source code, and documentation are available from the MrBayes web page at http://mrbayes.net .

Torsten Eriksson of the Bergius Botanical Garden, Stockholm, Sweden ( torsten (at) bergianska.se ) makes available MrBayes tree scanners. These are two Perl scripts that scan the output parameter files produced by MrBayes. One saves the tree corresponding to the best sample. The other saves all trees that contain a specific node (a specific grouping). They are distributed together, and available from his software distribution site at http://www.bergianska.se/index_forskning_soft.html .
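The second kind of scan, keeping only trees that contain a given grouping, can be sketched as follows. This is not Eriksson's Perl code: it is a Python illustration that assumes plain Newick strings without internal node labels and treats each tree as rooted, so a "grouping" here means the set of tips below one internal node. The names `clades` and `has_group` are hypothetical.

```python
import re


def clades(newick):
    """Return one frozenset of tip names per internal node of the tree."""
    # Drop branch lengths such as ":0.12" and the trailing semicolon.
    text = re.sub(r":[^,()\s;]+", "", newick).strip().rstrip(";")
    stack, found = [[]], set()
    for token in re.findall(r"[(),]|[^(),\s]+", text):
        if token == "(":
            stack.append([])           # open a new internal node
        elif token == ")":
            group = frozenset(stack.pop())
            found.add(group)           # record the completed grouping
            stack[-1].extend(group)    # its tips also belong to the parent
        elif token != ",":
            stack[-1].append(token)    # a tip name
    return found


def has_group(newick, group):
    """True if some internal node has exactly `group` as its tips."""
    return frozenset(group) in clades(newick)


sampled = [
    "((A:0.1,B:0.2):0.3,(C:0.1,D:0.1):0.2);",
    "((A:0.1,C:0.2):0.3,(B:0.1,D:0.1):0.2);",
]
with_ab = [t for t in sampled if has_group(t, {"A", "B"})]
```

Because MrBayes trees are unrooted, a production script would more likely compare bipartitions than rooted groupings; the sketch only shows the filtering idea.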

Marc Suchard of the Department of Biomathematics of the David Geffen School of Medicine at UCLA, Los Angeles (msuchard (at) ucla.edu) has written MrBayesPlugin, a Java plugin module enabling Geneious to run MrBayes. With it, Geneious v2.5.4 (or above) is able to perform and analyze simple Bayesian phylogenetic reconstruction using MrBayes. It is available as Java executables. It can be downloaded from its web site at http://www.biomath.ucla.edu/msuchard/software/software.htm

Alexei Drummond, of the Department of Computer Science of the University of Auckland, New Zealand (alexei (at) cs.auckland.ac.nz) and Andrew Rambaut (a.rambaut (at) ed.ac.uk), of the Institute for Evolutionary Biology, University of Edinburgh, Scotland, and formerly of the Department of Zoology, University of Oxford, Oxford, U.K., have developed BEAST (Bayesian Evolutionary Analysis Sampling Trees), version 1.4.1. This is a general Bayesian inference program for parameters of evolutionary models when the trees are coalescent trees. A variety of nucleotide substitution models including relaxed molecular clocks are allowed, and population models that include exponential population growth and divergence time between populations are included. Most of the analyses use Bayesian sampling to infer parameters by averaging over the posterior on the trees. For the purposes of this listing, the two relevant features are the ability to output a sample of the trees, so that the program can be used for Bayesian tree inference in clocklike models, and the ability to infer the divergence time between populations. The general approach used by BEAST is described in the paper: Drummond, A. J., G. K. Nicholls, A. G. Rodrigo, and W. Solomon. 2002. Estimating mutation parameters, population history and genealogy simultaneously from temporally spaced sequence data. Genetics 161: 1307-1320. BEAST is available as a Java executable which will run on any system with Java 1.4 or later. There are specific packages available for Mac OS X and for Windows as well as the general distribution. These are all distributed from its web site at http://beast.bio.ed.ac.uk/Main_Page

Alexei Drummond, of the Department of Computer Science of the University of Auckland, New Zealand (alexei (at) cs.auckland.ac.nz) and Andrew Rambaut (a.rambaut (at) ed.ac.uk), of the Institute of Evolutionary Biology at the University of Edinburgh, Scotland, U.K. have released Tracer, version 1.2. This is a program for analyzing the results of Bayesian sampling runs using either BEAST or MrBayes. It allows analysis of the progress of sampling the parameters. For the purposes of this listing, the relevant feature is an ability to use the trees sampled by these programs to do a Bayesian skyline plot analysis of birth and death rates of lineages. Tracer is available as a Java executable from its web site at http://tree.bio.ed.ac.uk/software/tracer/ with specific packages for Mac OS X and Windows as well.

Johan Nylander of the Department of Botany of the University of Stockholm, Stockholm, Sweden (johan.nylander (at) abc[dot]se) has released burntrees version 0.1.7, a script for manipulating tree (*.t, *.trprobs, *.con) and parameter (*.p) files from MrBayes (v.3) and other MCMC programs such as BEAST. The script can extract any contiguous interval of trees, or make a random selection of a fraction of them. It can also thin the chain by sampling every nth iteration. Branch lengths can also be removed from trees when they are sampled. Trees can also be converted from Nexus to Phylip (Newick) format or to altnexus format (sequence labels instead of numbers). In a similar fashion, lines can also be extracted from a MrBayes *.p file. The script comes with a helper script, catmb.pl, that concatenates files from several runs. It is available as a Perl script. It can be downloaded from its web site at http://www.abc.se/
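The three selection operations described for burntrees (a contiguous interval, thinning to every nth sample, and a random fraction) are simple to express. The sketch below is a Python illustration only, not the Perl script itself; it works on an in-memory list of samples, and the function names `burn_and_thin` and `random_fraction` are hypothetical.

```python
import random


def burn_and_thin(samples, start=0, end=None, every=1):
    """Keep the interval samples[start:end], then every `every`-th of those."""
    return samples[start:end][::every]


def random_fraction(samples, fraction, seed=0):
    """Keep a random `fraction` of the samples, preserving chain order."""
    rng = random.Random(seed)
    k = round(len(samples) * fraction)
    picked = sorted(rng.sample(range(len(samples)), k))
    return [samples[i] for i in picked]


# Stand-ins for 100 sampled trees from an MCMC run.
trees = [f"tree_{i}" for i in range(100)]

# Discard the first 20 samples as burn-in, then keep every 10th one.
kept = burn_and_thin(trees, start=20, every=10)
quarter = random_fraction(trees, 0.25)
```

Here `kept` holds `tree_20`, `tree_30`, and so on up to `tree_90`.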

  • Rabosky, D. L. 2006. Likelihood methods for inferring temporal shifts in diversification rates. Evolution 60: 1152-1164.
  • Rabosky, D. L., S. C. Donnellan, A. L. Talaba, and I. J. Lovette. 2007. Exceptional among-lineage variation in diversification rates during the radiation of Australia's largest vertebrate clade. Proceedings of the Royal Society of London, Series B 274: 2915-2923.

Pavel Morozov and Andrey Rzhetsky of the Department of Biomedical Informatics and the Columbia Genome Center of Columbia University, New York, New York (pm259 (at) columbia.edu and andrey.rzhetsky (at) dbmi.columbia.edu) have released PHYLLAB version 1.1, a toolbox for sequence manipulation and phylogenetic analysis in MatLab. PHYLLAB takes as input a set of aligned nucleotide or amino-acid sequences, and performs phylogeny inference. Besides traditional phylogenetic methods it uses a Markov chain Monte Carlo method, evaluating the posterior distribution over tree topologies and a variety of model parameters, including parameters of substitution-rate variation under a wavelet model. The graphical interface helps users to manage input data and to visualize the most likely trees; they can also view substitution-rate plots that show the maximum posterior density (confidence) intervals. It is written in the MatLab language, and interested users can extend it easily. The PHYLLAB toolbox is continually expanding, and the authors expect to offer many more functions and scripts for different purposes soon. It is available as a MATLAB package. It can be downloaded from its web site at http://amdec-bioinfo.cu-genome.org/html/misc/Pavel/phyllab.html

Peter Foster (p.foster (at) nhm.ac.uk) of the Natural History Museum, London, England has released p4 version 0.81, a Python package for maximum likelihood and Bayesian phylogenetic analyses of molecular sequences. This is not a program with menus and buttons; it is invoked using the Python language, which the user should know before attempting to use it. It can do Bayesian inference of phylogenies, as well as computation of likelihoods of trees. It also has facilities for viewing large trees and for manipulation of trees. It needs Python 2.3 or better and the Gnu Scientific Library (GSL) installed on the machine. It is distributed as Python source code at its web site at http://www.bmnh.org/web_users/pf/p4.html

Mike Charleston ( mcharles (at) it.usyd.edu.au ) of the Sydney University Biological Informatics and Technology Centre, Sydney, Australia has developed Spectrum , a program for finding bipartition spectra from phylogenetic molecular and distance data, according to the method of Hendy et al. (1994) (Hadamard transforms) for moderately sized data sets (up to 18 taxa). The program also implements a branch-and-bound search for the "closest tree" - that is, the tree whose expected spectrum is closest to the spectrum derived from the observed data. Mac OS PowerMac, 68k Mac OS, and Windows executables are available from his software web site at http://www.it.usyd.edu.au/

Ingrid Jakobsen, Susan Wilson, and Simon Easteal, of Australian National University, Canberra, released partimatrix. (Ingrid Jakobsen is currently at the Department of Mathematics of the University of Queensland, Australia, i.jakobsen (at) uq.edu.au). This program computes a "partition matrix" from aligned DNA sequence data. The method finds partitions of the sequences into two groups and presents a matrix which describes the conflict and agreement among these partitions. The objective is to discover parts of the DNA sequence which imply different trees. It is described in the paper by I. B. Jakobsen, S. R. Wilson and S. Easteal. 1997. The Partition Matrix: Exploring variable phylogenetic signals along nucleotide sequence alignments. Molecular Biology and Evolution 14: 474-484. The program is distributed as C source code for Unix systems with X Windows. It seems not to be available from Dr. Jakobsen, but is available from a site at the Centro Nacional de Cálculo Científico de la Universidad de Los Andes, Venezuela at http://www.cecalc.ula.ve/BIOINFO/servicios/herr1/PARTIMATRIX/manual.htm
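The notion of conflict and agreement between two-group partitions can be made concrete with the standard pairwise compatibility test: two partitions of the same sequences can sit on one tree exactly when at least one of the four pairwise intersections of their groups is empty. The sketch below is an illustration of that test only and is not taken from partimatrix; the function names and the toy alignment are assumptions.

```python
def site_partition(column):
    """Return the two-group partition induced by a column, or None.

    Only columns with exactly two distinct states split the sequences
    into two groups; other columns are skipped in this sketch.
    """
    states = sorted(set(column))
    if len(states) != 2:
        return None
    side = frozenset(i for i, s in enumerate(column) if s == states[0])
    other = frozenset(range(len(column))) - side
    return side, other


def compatible(p, q):
    """Two partitions agree if some pair of their groups does not overlap."""
    return any(not (a & b) for a in p for b in q)


alignment = ["AAGG", "AAGT", "CCTT", "CCTG"]  # one row per sequence
columns = ["".join(row[j] for row in alignment) for j in range(4)]
parts = [site_partition(c) for c in columns]

# Pairwise matrix: True where two sites support a common tree.
matrix = [[compatible(p, q) for q in parts] for p in parts]
```

In this toy alignment the first three sites all group sequences 0 and 1 against 2 and 3, while the last site groups 0 and 3 against 1 and 2, so it conflicts with the others.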

Carla Cummins and James McInerney of the Department of Biology of the National University of Ireland Maynooth (james.o.mcinerney (at) nuim.ie) have released TIGER version 1.02, a program for identifying rapidly-evolving characters in a matrix of evolutionary characters. TIGER is open source software for identifying rapidly evolving sites (columns in an alignment, or characters in a morphological dataset). It can deal with many kinds of data (molecular, morphological etc.). Sites like these are often removed or reweighted in order to improve phylogenetic reconstruction, as they might not hold much phylogenetic information and therefore might simply be a source of noise. It is described in the paper: Cummins, C. A. and J. O. McInerney. 2011. A method for inferring the rate of evolution of homologous characters that can potentially improve phylogenetic inference, resolve deep divergence and correct systematic biases. Systematic Biology 60 (6): 833-844. doi: 10.1093/sysbio/syr064. It is available as a Python script and Mac OS X universal executables. It can be downloaded from its web site at http://bioinf.nuim.ie/tiger

Yasuo Ina of the National Institute of Agrobiological Resources, Tsukuba, Japan developed ODEN version, a package of programs for doing distance matrix analyses on nucleotide or protein sequences. It is described in a paper: Ina, Y. 1994. ODEN: a program package for molecular evolutionary analysis and database search of DNA and amino acid sequences. Computer Applications in the Biosciences (CABIOS) 10: 11-12. It is available free by anonymous ftp from directory pub/unix/oden on ftp.dna.affrc.go.jp as C source code for Unix systems.

Angela Lüttke and Rainer Fuchs (then of the European Molecular Biology Laboratory; Fuchs is currently at Biogen, Inc., Cambridge, Massachusetts) wrote MacT, a package of programs for Mac OS Macintoshes that compute distances and compute Neighbor-Joining phylogenies from them. The programs work on 4 through 26 sequences, and source code in Microsoft QuickBasic is provided as well as compiled executables. The package is free and is available on the molecular biology software servers. For example, it is available at the Indiana University IUBIO server at http://iubio.bio.indiana.edu/soft/molbio/mac/ . It is described in a paper: Luttke, A. and R. Fuchs. 1992. MacT: Apple Macintosh programs for constructing phylogenetic trees. Computer Applications in the Biosciences 8: 591-594.

Nicholas Galtier of the University of Lyon ( galtier (at) biomserv.univ-lyon1.fr ) has written Phylo_win , a "graphic interface" for molecular phylogenetic inference. It performs neighbor-joining, parsimony and maximum likelihood methods and can bootstrap with any of them. Many distances can be used including Jukes and Cantor, Kimura, Tajima and Nei, Galtier and Gouy (1995), LogDet for nucleotidic sequences, Poisson correction for protein sequences, Ka and Ks for codon sequences. Species and sites to include in the analysis are selected by mouse. Reconstructed trees can be drawn, edited, printed, stored, evaluated according to numerous criteria. Taxonomic species groups and sets of conserved regions can be defined by mouse in both tools and stored into sequence files, thus avoiding multiple data files. It is entirely mouse-driven. Most usual sequence file formats are read: CLUSTAL, FASTA, PHYLIP, MASE. It runs under X windows on many Unix workstations. It is described in the paper: Galtier, N., M. Gouy, and C. Gautier. 1996. SeaView and Phylo_win, two graphic tools for sequence alignment and molecular phylogeny. Computer Applications in the Biosciences 12: 543-548. Phylo_win is now considered by Galtier to have been superseded by his program SeaView. Phylo_win is distributed as C source code (to compile it one needs the NCBI Vibrant tool kit). It is also available as executables for SunOS, Solaris, SGI Unix, IBM RISC Unix, Linux, HP/UX, and DEC Alpha (Digital Unix). It can be fetched from its web page at http://pbil.univ-lyon1.fr/software/phylowin_legacy.html . It can also be obtained by anonymous ftp from biom3.univ-lyon1.fr in directory pub/mol_phylogeny . A Digital OpenVMS executable is also available as http://www.tmk.com/ftp/vms-freeware/mathog/ .
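Of the distances listed for Phylo_win, the Jukes and Cantor distance is the simplest to write down: with p the proportion of differing sites, d = -(3/4) ln(1 - 4p/3). The following sketch is a generic Python illustration, not code from Phylo_win; the function names and the treatment of gaps (sites with "-" in either sequence are skipped) are assumptions.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of compared sites at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Assumption: skip any site where either sequence has a gap.
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor(seq_a, seq_b):
    """Jukes-Cantor distance: d = -(3/4) * ln(1 - 4p/3)."""
    p = p_distance(seq_a, seq_b)
    if p >= 0.75:
        return math.inf  # the correction is undefined at saturation
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


d = jukes_cantor("ACGTACGTAC", "ACGTACGTAA")  # one difference in ten sites
```

The corrected distance is always at least as large as the raw proportion of differences, because it allows for multiple substitutions at the same site.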

  • Strimmer, K. and A. von Haeseler. 1996. Quartet puzzling: A quartet maximum likelihood method for reconstructing tree topologies. Molecular Biology and Evolution 13: 964-969.
  • Strimmer, K., and A. von Haeseler. 1997. Likelihood-mapping: A simple method to visualize phylogenetic content of a sequence alignment. Proceedings of the National Academy of Sciences (USA) 94: 6815-6819.
  • Schmidt, H. A., K. Strimmer, M. Vingron, and A. von Haeseler. 2002. TREE-PUZZLE: maximum likelihood phylogenetic analysis using quartets and parallel computing. Bioinformatics 18: 502-504.

Mike Holder, formerly of the High Performance Computing Center of the University of Houston and Andrew Roger ( aroger (at) is.dal.ca ) of the Department of Biochemistry and Molecular Biology of Dalhousie University, Halifax, Canada have produced a shell script program for Unix systems, puzzleboot , version 1.03, that allows the analysis of multiple bootstrapped data sets with TREE-PUZZLE. It is designed for use with the distance matrix option of TREE-PUZZLE, to make use of the distance calculation methods. It is available from the Roger lab software page at http://rogerlab.biochemistryandmolecularbiology.dal.ca/Software/Software.htm#puzzleboot

  • Huson, D. H. 1998. SplitsTree: analyzing and visualizing evolutionary data. Bioinformatics 14: 68-73.
  • Huson, D. H. and D. Bryant. 2006. Application of phylogenetic networks in evolutionary studies. Molecular Biology and Evolution 23(2): 254-267.
  • SplitsTree4, a Java version which can run under Linux, Windows, and Mac OS X.
  • SplitsTree 3.2, which is available as a Windows executable, a Linux, and a Solaris executable, and also a Mac OS X version by Rod Page
  • SplitsTree 3.1, also as a Windows and a Linux version
  • SplitsTree 2.4, for Mac OS.

Igor Kuznetsov and Pavel Morozov, then of the Institute of Cytology and Genetics, Novosibirsk, Russia produced GEOMETRY , a package for nucleotide sequence analysis using the method of statistical geometry in sequence space. Kuznetsov (ikuznetsov (at) albany.edu) is currently at the Department of Epidemiology and Biostatistics at the State University of New York in Albany, Morozov (pm259 (at) columbia.edu) is currently at the Irving Cancer Research Center at Columbia University. The method is described in this paper: Eigen, M., R. Winkler-Oswatitsch, and A. Dress. 1988. Statistical geometry in sequence space: A method of quantitative comparative sequence analysis, Proc. Natl. Acad. Sci. USA 85: 5913-5917. The program is described in the article: Kuznetsov, I. and P. Morozov. 1996. GEOMETRY: a software package for nucleotide sequence analysis using statistical geometry in sequence space. Computer Applications in the Biosciences (CABIOS) 12: 297-301. The package uses the same data formats for sequence and tree input as the ones used in the VOSTORG package. GEOMETRY is available as a DOS executable. It is available for downloading by ftp from the EMBL file server ftp.ebi.ac.uk in directory pub/software/dos as file geom.zip.

Vincent Berry of the LIRMM, Université de Montpellier, France (vberry (at) lirmm.fr) has released PhyloQuart version 1.4, a package of programs inferring phylogenies from quartets. It is able to use either nucleotide sequences or distances. It implements the Q* method of tree reconstruction, which is inspired by the work of Bandelt and Dress, and is described in the paper: Berry, V. and O. Gascuel. 2000. Inferring evolutionary trees with strong combinatorial evidence. Theoretical Computer Science 240: 271-298. PhyloQuart is available as C source code which can be compiled on Unix systems, from its web site at http://www.lirmm.fr/

  • Vinh, L. S. and A. von Haeseler. 2004. IQPNNI: Moving fast through tree space and stopping in time. Molecular Biology and Evolution 21: 1565-1571.
  • Minh, B. Q., L. S. Vinh, A. von Haeseler, and H. A. Schmidt. 2005. pIQPNNI: Parallel reconstruction of large maximum likelihood phylogenies. Bioinformatics 21(19): 3794-3796.
  • Minh, B. Q., L. S. Vinh, H. A. Schmidt, and A. von Haeseler. 2006. Large maximum likelihood trees. Proceedings of the NIC Symposium 2006 pp. 357-365, Forschungszentrum Jülich, Germany.
  • Willson, S. J. 1998. Measuring inconsistency in phylogenetic trees. Journal of Theoretical Biology 190: 15-36.
  • Willson, S. J. 1998. Building phylogenetic trees from quartets by using local inconsistency measures. Molecular Biology and Evolution 16: 685-693.

James Lake of the Department of Molecular, Cell and Developmental Biology of the University of California, Los Angeles (lake (at) mbi.ucla.edu) has released Gambit, which implements a method called Bootstrapper's Gambit. The method involves bootstrap sampling sequences, computing trees for quartets of species, and assembling larger trees out of quartets that have significant bootstrap support. One of the methods available to estimate trees from the quartets is paralinear (LogDet) distances. Other distance methods and parsimony are also available. The Bootstrapper's Gambit method is described in a paper: Lake, J. A. 1995. Calculating the probability of multitaxon evolutionary trees: Bootstrappers gambit. Proceedings of the National Academy of Sciences, USA 92: 9662-9666. The program is available as a DOS executable, free as a beta release to noncommercial users on a trial basis until January 1, 2003. (It is unclear from the web site whether a free version is to be available to noncommercial users after that point -- a previous deadline was extended). Commercial users are asked to pay $50 on a shareware basis. The program is available at its web site at http://genomics.ucla.edu/gambit/ .
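The bootstrap sampling step can be sketched on its own. In the usual phylogenetic bootstrap, alignment columns (sites) are drawn with replacement to build a replicate data set of the same length; the sketch below assumes that is what is meant here, and it does not reproduce Gambit's quartet analysis or its support calculation. The function name `bootstrap_alignment` is hypothetical.

```python
import random


def bootstrap_alignment(alignment, seed=None):
    """Resample alignment columns with replacement.

    `alignment` is a list of equal-length strings, one per sequence.
    The replicate has the same number of sequences and sites.
    """
    rng = random.Random(seed)
    n_sites = len(alignment[0])
    # Draw n_sites column indices, with replacement.
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    # Apply the same column choice to every sequence so sites stay aligned.
    return ["".join(seq[j] for j in cols) for seq in alignment]


original = ["ACGT", "ACGA", "TCGA"]
replicates = [bootstrap_alignment(original, seed=s) for s in range(100)]
```

Each replicate would then be analysed in the same way as the original data, and the fraction of replicates supporting a grouping gives its bootstrap support.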

  • Bandelt H.-J., P. Forster, B. C. Sykes, and M. B. Richards. 1995. Mitochondrial portraits of human populations using median networks. Genetics 141: 743-53.
  • Bandelt, H-J., P. Forster, and A. Röhl. 1999. Median-joining networks for inferring intraspecific phylogenies. Molecular Biology and Evolution 16: 37-48.

Mike Hendy, Katharina T. Huber, Michael Langton, Vincent Moulton, and David Penny have written Spectronet version 1.27, a program that computes a collection of weighted splits or partitions and allows the user to interactively analyze the results with a series of tools. Hendy and Penny are at Massey University, New Zealand (m.hendy (at) massey.ac.nz and d.penny (at) massey.ac.nz), Huber and Moulton are at the School of Computational Science of the University of East Anglia, U.K. (Katharina.Huber (at) cmp.uea.ac.uk and Vincent.Moulton (at) cmp.uea.ac.uk). Spectronet can read molecular sequence or discrete character data, compute splits by Hadamard conjugation or directly, compute and display compatibility matrices of characters, make reduced median networks, and plot networks by making a Lentoplot. Spectronet is described in a paper: Huber, K. T., M. Langton, D. Penny, V. Moulton and M. Hendy. 2002. Spectronet: A package for computing spectra and median networks. Applied Bioinformatics 1: 159-161. It is available as a Windows executable from its web site at http://awcmee.massey.ac.nz/spectronet/index.html .

Steven Kelk, Leo van Iersel, Judith Keijsper, and Leen Stougie of the Centrum voor Wiskunde en Informatica (CWI) and Technische Universiteit Eindhoven (TU/e), Netherlands (S.M.Kelk (at) cwi.nl) have produced LEVEL2 version 0.91, which constructs level-2 phylogenetic networks from dense sets of rooted triplets. This program takes as input a dense set of rooted triplets and attempts to construct a level-2 phylogenetic network from them (or level-1, or level-0, if level-2 is not necessary). Triplets are the rooted analogue of quartets, and a dense set of triplets is one where for every subset of three taxa there is at least one triplet. A level-k phylogenetic network is a rooted phylogenetic network where every biconnected component in the underlying, undirected graph contains at most k recombination vertices. The program produces an image of the resulting network, if it is found. It is described in the paper: van Iersel, L., J. Keijsper, S. Kelk, and L. Stougie. 2007. Constructing level-2 phylogenetic networks from triplets. arXiv:0707.2890v1 [q-bio.PE]. It is available as Java source code, and also requires that the DOT graph description package be installed. It can be downloaded from its web site at Sourceforge at http://sourceforge.net/projects/level2/ and a general web page about it is at its web site at http://homepages.cwi.nl/
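The density condition on the input triplets can be checked mechanically. The following is a minimal Python sketch, not part of LEVEL2 itself: the function names and the tuple encoding of a triplet `(a, b, c)` for ab|c are assumptions made for illustration.

```python
from itertools import combinations


def covered_subsets(triplets):
    """Collect the 3-taxon subsets touched by at least one rooted triplet.

    Each triplet is a tuple (a, b, c) standing for ab|c, meaning a and b
    are closer to each other than either is to c (hypothetical encoding).
    """
    return {frozenset(t) for t in triplets}


def missing_subsets(taxa, triplets):
    """Return the 3-taxon subsets for which no triplet is present."""
    covered = covered_subsets(triplets)
    return [s for s in combinations(sorted(taxa), 3)
            if frozenset(s) not in covered]


def is_dense(taxa, triplets):
    """A triplet set is dense when every 3-taxon subset is covered."""
    return not missing_subsets(taxa, triplets)


taxa = ["a", "b", "c", "d"]
dense_set = [("a", "b", "c"), ("a", "b", "d"), ("c", "d", "a"), ("c", "d", "b")]
sparse_set = dense_set[:-1]  # drop the only triplet on {b, c, d}
```

With four taxa there are four 3-taxon subsets, so `dense_set` is dense, while `sparse_set` leaves the subset {b, c, d} uncovered.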

Luay Nakhleh, Derek Ruths, and Cuong Than of the Department of Computer Science of Rice University, Houston, Texas (nakhleh (at) cs.rice.edu) have released PhyloNet (Phylogenetic Network analysis), version 2.3, a phylogeny package with tools for reconstructing and analyzing phylogenetic networks. It has programs for inferring horizontal gene transfer events, by estimating the SPR distance between two trees (along with a bootstrap-based measure of support), and interspecific recombination, by using maximum parsimony. It also has tools for enumerating the trees and clusters of taxa within a given network, comparing the topologies of networks, estimating the strain tree of bacterial genomes from multi-locus data, and enumerating valid coalescent histories of a gene tree within the branches of a species tree. It is described in the paper: Than, C., D. Ruths, and L. Nakhleh, 2008. PhyloNet: A Software Package for Analyzing and Reconstructing Reticulate Evolutionary Relationships. Under Review. It is available as Java executables. It can be downloaded from its web site at http://bioinfo.cs.rice.edu/phylonet/index.html

  • Jin, G., L. Nakhleh, S. Snir, and T. Tuller. 2006. Maximum likelihood of phylogenetic networks. Bioinformatics 22(21): 2604-2611.
  • Jin, G., L. Nakhleh, S. Snir, and T. Tuller. 2007. Efficient parsimony-based methods for phylogenetic network reconstruction. Bioinformatics 23: e123-e128.
  • Hey, J., and R. Nielsen. 2004. Multilocus methods for estimating population sizes, migration rates and divergence time, with applications to the divergence of Drosophila pseudoobscura and D. persimilis. Genetics 167: 747-760.
  • Hey, J., and R. Nielsen. 2007. Integration within the Felsenstein equation for improved Markov chain Monte Carlo methods in population genetics. Proceedings of the National Academy of Sciences USA 104(8): 2785-2790.
  • Liu, L. and D. K. Pearl. 2007. Species trees from gene trees: reconstructing Bayesian posterior distributions of a species phylogeny using estimated gene tree distributions. Systematic Biology 56: 504-514.
  • Edwards, S. V., L. Liu, and D. K. Pearl. 2007. High resolution trees without concatenation. Proceedings of the National Academy of Sciences 104: 5936-5941.
  • Liu, L., D. K. Pearl, R. T. Brumfield, and S. V. Edwards. 2008. Estimating species trees using multiple-allele DNA sequence data. Evolution 62: 2080-2091.

Ruchi Chaudhary, Mukul S. Bansal, André Wehe, David Fernández-Baca, and Oliver Eulenstein of the Department of Computer Science at Iowa State University, Ames, IA (oeulenst (at) cs.iastate.edu) have released iGTP version 1.0, a software package for large-scale phylogenetic analyses using gene tree parsimony. iGTP implements algorithms for inferring species supertrees that best reconcile the input gene trees under the gene-duplication, gene-duplication and loss, and deep coalescence cost models. iGTP extends the functionality and performance of existing gene tree parsimony software and features building effective initial trees using greedy stepwise leaf addition and the ability to have unrooted gene trees in the input. Moreover, iGTP provides a user-friendly graphical interface with integrated tree visualization software to facilitate analysis of the results. It is described in the paper: Chaudhary, R., M. S. Bansal, A. Wehe, D. Fernández-Baca and O. Eulenstein. 2010. iGTP: A software package for large-scale gene tree parsimony analysis. BMC Bioinformatics 11: 574. It is available as Windows executables, Linux executables and Mac OS X universal executables. The authors can be contacted for the source code. The executables can be downloaded from its web site at http://genome.cs.iastate.edu/CBL/iGTP/

Andrew Roger, of the Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, Nova Scotia, Canada (aroger (at) is.dal.ca) has written ELW (Expected Likelihood Weights), two PERL scripts -- elw.pl and calcwts.pl -- that, together with PAUP* and the PHYLIP program Seqboot can be used to implement the "expected likelihood weights" method of Strimmer and Rambaut, described in the paper by Strimmer, K. and A. Rambaut. 2002. Inferring confidence sets of possibly misspecified gene trees. Proceedings of the Royal Society of London Series B 269: 137-142. It calculates a confidence interval for the maximum likelihood tree using the variation of the likelihoods among bootstrap estimates of the tree. ELW can be downloaded from its entry on Roger's software web page at http://rogerlab.biochemistryandmolecularbiology.dal.ca/Software/Software.htm#elw
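As an illustration of the idea, here is a minimal Python sketch of how expected likelihood weights could be computed from a table of log-likelihoods. It is not the elw.pl or calcwts.pl code. It assumes the usual description of the method: within each bootstrap replicate every candidate tree gets a weight proportional to its likelihood, the weights are averaged over replicates, and the confidence set collects trees in order of decreasing weight until a chosen level is reached. The function names and input layout are hypothetical.

```python
import math


def expected_likelihood_weights(boot_lnl):
    """Average per-replicate likelihood weights over bootstrap replicates.

    boot_lnl[r][t] is the log-likelihood of candidate tree t evaluated on
    bootstrap replicate r (assumed to be computed elsewhere, e.g. with
    Seqboot plus a likelihood program).
    """
    n_trees = len(boot_lnl[0])
    totals = [0.0] * n_trees
    for row in boot_lnl:
        best = max(row)  # subtract the maximum to avoid underflow in exp()
        rel = [math.exp(x - best) for x in row]
        norm = sum(rel)
        for t, value in enumerate(rel):
            totals[t] += value / norm
    return [x / len(boot_lnl) for x in totals]


def confidence_set(weights, level=0.95):
    """Indices of trees, by decreasing weight, until `level` is reached."""
    order = sorted(range(len(weights)), key=lambda t: weights[t], reverse=True)
    chosen, cumulative = [], 0.0
    for t in order:
        chosen.append(t)
        cumulative += weights[t]
        if cumulative >= level:
            break
    return chosen
```

A tree that is left out of the set at the chosen level would be treated as rejected under this scheme.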

  • njboot -- construct a neighbor-joining (NJ) tree
  • postree -- create a postscript file of trees
  • tpcv -- conduct the two-cluster test
  • branch -- conduct the branch length test
  • branbst -- conduct the branch length test by bootstrap

genomelb/takezaki.eng.html#software and also at the Nei lab software web site at http://www.bio.psu.edu/people/faculty/nei/software.htm . They are also available by ftp from the IUBio archive at http://iubio.bio.indiana.edu/soft/molbio/evolve/lintr/ .

Andrew Rambaut (a.rambaut (at) ed.ac.uk), of the Institute for Evolutionary Biology, University of Edinburgh, Scotland, and formerly of the Department of Zoology, University of Oxford, has written TipDate version 1.2. TipDate is an application for estimating the rate of molecular evolution (and hence a time-scale) for a phylogeny consisting of dated tips. These will most frequently be from viruses or other fast-evolving pathogens that have been isolated over a range of dates. The program can also return the likelihood for the simple molecular clock model (i.e., assuming that all sequences are contemporary), for a model in which rates of change at different times are drawn from a distribution, or the non-clock model. These are useful for likelihood ratio tests of the fit of the model to the data. TipDate is described in a paper: Rambaut, A. 2000. Estimating the rate of molecular evolution: incorporating non-contemporaneous sequences into maximum likelihood phylogenies. Bioinformatics 16: 395-399. TipDate is available as Mac OS executables and as source code for Linux or Unix from the IUBIO software site at http://microbe.bio.indiana.edu:7131/soft/iubionew/molbio/evolution/evolve/TipDate/ . It is also available in a web-based server version from the Pasteur Institute server.

Thomas Wilcox, formerly of the Center for Computational Biology and Informatics of the University of Texas, and more recently of Long Key Tropical Research Center, Florida (tpwilcox (at) lktrc.org) has produced Cadence version 1.0.1, a program for Bayesian relative rate tests. It is described in the paper: Wilcox, T. P., F. J. García de Leon, D. A. Hendrickson, and D. M. Hillis. 2004. Convergence among cave catfishes: Long-branch attraction and a Bayesian relative rates test. Molecular Phylogenetics and Evolution 31: 1101-1113. It is available as Powermac Mac OS X executables. It can be downloaded from its web site at the University of Texas at http://www.zo.utexas.edu/faculty/antisense/DownloadComputerPrograms.html

  • Hein, J. J. 1990. A unified approach to phylogenies and alignments. Methods in Enzymology 183: 625-644.
  • Hein, J. J. 1994. TreeAlign. pp. 349-364 in Computer Analysis of Sequence Data. edited by A. M. Griffin and H. G. Griffin. Humana Press, Totowa, New Jersey.
  1. ClustalW which has a character-mode interface, in which the user types responses to choose options from a menu.
  2. ClustalX which has a graphical user interface.
  • Larkin, M. A., G. Blackshields, N. P. Brown, R. Chenna, P. A. McGettigan, H. McWilliam, F. Valentin, I. M. Wallace, A. Wilm, R. Lopez, J. D. Thompson, T. J. Gibson, and D. G. Higgins. 2007. Clustal W and Clustal X version 2.0. Bioinformatics 23: 2947-2948.
  • Jeanmougin, F., J. D. Thompson, T. J. Gibson, M. Gouy, and D. G. Higgins. 1998. Multiple sequence alignment with Clustal X. Trends in Biochemical Sciences 23: 403-405.
  • Thompson, J. D., T. J. Gibson, F. Plewniak, F. Jeanmougin, and D. G. Higgins. 1997. The ClustalX windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Research 25: 4876-4882.
  • Higgins, D. G., J. D. Thompson, and T. J. Gibson. 1996. Using CLUSTAL for multiple sequence alignments. Methods in Enzymology 266: 383-402.
  • Thompson, J.D., D. G. Higgins and T. J. Gibson. 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, positions-specific gap penalties and weight matrix choice. Nucleic Acids Research 22: 4673-4680.

For the older ClustalV, there exists a Macintosh Hypercard stack, ClustToTree, that can convert its tree files to Newick Standard format (used by many other programs). ClustToTree is made available by Kai-Uwe Fröhlich at the University of Graz, Austria at http://aaa-proteins.uni-graz.at/Archiv/ClustToTreecomp.html .

ClustalW is made available on web servers by the Genebee web server at the Belozersky Institute in Moscow, and at the European Bioinformatics Institute.

Cédric Notredame of the Comparative Bioinformatics Group of the Center for Genomic Regulation (CRG), Barcelona, Spain (cedric.notredame (at) europe.com), Olivier Poirot, Fabrice Armougom, and Sebastien Moretti of the Centre National de la Recherche Scientifique Marseille-Nice Génopole, France have produced T-Coffee (Tree-based Consistency Objective Function For alignmEnt Evaluation), version 8.93. This is a multiple sequence alignment program that aims to improve on ClustalW. It is of the same general approach as ClustalW, a "progressive alignment" method, but it avoids some of the problems with the "greedy" nature of the ClustalW algorithm by taking into account more information about how the sequences all align with each other. T-Coffee is described in the paper: Notredame, C., D. Higgins, and J. Heringa. 2000. T-Coffee: A novel method for multiple sequence alignments. Journal of Molecular Biology 302: 205-217. From the point of view of this listing, the relevant features of T-Coffee are that it makes a "guide tree" and can write that tree out. It also can read in a guide tree supplied by the user. Versions from 2.00 on can align both sequences and structures. T-Coffee is available as Unix source code which can easily be compiled, and as Linux, Mac OS X and Windows binaries. It is available from its web site at http://www.tcoffee.org/Projects_home_page/t_coffee_home_page.html

Ward Wheeler of the Division of Invertebrate Zoology, American Museum of Natural History, New York ( wheeler (at) amnh.org ) and David Gladstein (gladstein (at) gladstein.org) have written MALIGN, version 2.7, a parsimony-based alignment program for molecular sequences. It implements the original suggestion by Sankoff, Morel, and Cedergren (1973) that alignment and phylogenies could be done at the same time by finding that tree that minimizes the total alignment score along the tree. Jotun Hein's program TreeAlign (mentioned above) is another, more approximate but possibly faster, attempt to implement the Sankoff-Morel-Cedergren suggestion. MALIGN is one of only two programs to calculate this optimality criterion exactly (Wheeler and Gladstein's other program POY is the other). MALIGN is described in a paper: Wheeler, W. C. and D. S. Gladstein. 1994. MALIGN - A multiple sequence alignment program. Journal of Heredity 85: 417-418. MALIGN is available from its download web site at the Program in Scientific Computation of the American Museum of Natural History at http://research.amnh.org/scicomp/projects/malign.php . It is available as C source code and as binaries for Linux, Windows, Sun Solaris, SGI, and HPUX. The C source code is distributed in two forms, the ordinary one and a special version for parallel computation.

MiraiBio, a Hitachi Software company, distributes DNASIS, a general-purpose DNA and protein sequence analysis system produced by Molecular Biology Insights, Inc. of Cascade, Colorado (but sold through Hitachi). It has many functions including primer design, plasmid maps, contig assembly, alignment, database searching, and many kinds of protein plots. For our purposes what is relevant is the ability to do multiple sequence alignment by the Higgins-Sharp method of progressive sequence alignment (the one used in ClustalV), with one of the results being a UPGMA tree based on pairwise sequence alignment scores. DNASIS is available from MiraiBio as version 3.0 (called DNASIS MAX) Windows executables, including a demo version at its web site at http://www.miraibio.com/dnasis-max/dnasis-max-overview.html . Prices are not stated there -- there is an order form that can be sent to them by email. It was formerly also available from MBI, and at that time a Windows version cost $1,895 and a Mac OS X version cost $2,995 for a 1-10 user network license.
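For readers unfamiliar with UPGMA, the following minimal Python sketch shows the clustering step in general terms. It is not DNASIS code. It assumes the pairwise alignment scores have already been converted to distances (that conversion is not shown), and the function name `upgma` and its input layout are hypothetical.

```python
def upgma(names, dist):
    """Cluster taxa by UPGMA and return a Newick string with branch lengths.

    names: list of taxon labels.
    dist:  dict mapping index pairs (i, j) with i < j to a distance.
    """
    # cluster id -> (Newick label, number of tips, height of the cluster)
    clusters = {i: (name, 1, 0.0) for i, name in enumerate(names)}
    d = {frozenset(pair): float(value) for pair, value in dist.items()}
    next_id = len(names)
    while len(clusters) > 1:
        # Join the two clusters separated by the smallest distance.
        pair = min(d, key=lambda p: d[p])
        a, b = sorted(pair)
        label_a, size_a, height_a = clusters.pop(a)
        label_b, size_b, height_b = clusters.pop(b)
        height = d[pair] / 2.0
        label = "(%s:%.4f,%s:%.4f)" % (
            label_a, height - height_a, label_b, height - height_b)
        # Distance from the new cluster to each other cluster is the
        # size-weighted average of the two old distances.
        for c in clusters:
            d_ac = d.pop(frozenset((a, c)))
            d_bc = d.pop(frozenset((b, c)))
            d[frozenset((next_id, c))] = (
                size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        del d[pair]
        clusters[next_id] = (label, size_a + size_b, height)
        next_id += 1
    return next(iter(clusters.values()))[0] + ";"


example = upgma(["A", "B", "C"], {(0, 1): 2.0, (0, 2): 6.0, (1, 2): 6.0})
```

In this small example A and B are joined first at height 1, and the resulting cluster is joined to C at height 3, so every tip ends up the same distance from the root, as UPGMA requires.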

Karl Nicholas ( karlnicholas (at) hotmail.com ) with help from Hugh Nicholas ( nicholas (at) psc.edu ) of the National Resource for Biomedical Supercomputing (NRBSC www.nrbsc.org) at the Pittsburgh Supercomputing Center has produced GeneDoc, version 2.6.0.2, a program for the shading and editing of multiple sequence alignments. It reads .MSF files and FASTA files. The alignment can be edited by changing the position of residues in the sequences. GeneDoc includes scoring functions to assist in determining whether your alignment changes are improving the score. Support for obtaining a score via sum-of-pairs or by a phylogenetic tree is included. Phylogenetic trees can either be built with the GUI or imported from NEXUS or PHYLIP format tree descriptions. The program runs on Windows and both 16-bit and 32-bit executables are distributed. The source code is also available there. It can be downloaded from its Web site at http://www.nrbsc.org/gfx/genedoc/gddl.htm . A Windows NT version for Digital Alpha processors was formerly available from Russell Malmberg at the Botany Department of the University of Georgia but is not currently in distribution.

  • Varón, A., L. S. Vinh, and W. C. Wheeler. 2010. POY version 4: phylogenetic analysis using dynamic homologies. Cladistics 26: 72-85.
  • Wheeler, W. C. 1999. Fixed character states and the optimization of molecular sequence data. Cladistics 15: 379-385.
  • Wheeler, W. 1996. Optimization alignment: the end of multiple sequence alignment in phylogenetics? Cladistics 12: 1-9.

Russell Doolittle ( rdoolittle (at) ucsd.edu ) and Dafei Feng, of the Section of Molecular Biology of the Division of Biological Sciences of the University of California at San Diego, released ALIGN in 1990. A version for Macintoshes was coded by Peter Markeiwicz. ALIGN implements the "progressive alignment" strategy described in their paper: Feng, D.-F. and R. F. Doolittle. 1987. Progressive sequence alignment as a prerequisite to correct phylogenetic trees. Journal of Molecular Evolution 25: 351-360. This is also the basis for the Clustal family of programs as well as the (formerly distributed) Pileup program in the GCG package. The ALIGN program can align as well as print out a tree (which does not have branch lengths). It uses Doolittle's own formats, and so three other programs are included with ALIGN to convert formats. The programs are distributed by ftp from the EBI ftp software server at ftp.ebi.ac.uk in directory pub/software/mac as file align.hqx . A set of C source programs presumably equivalent to these is also made available by Milton Saier at UCSD on a web page at http://www-biology.ucsd.edu/

Roland Fleißner of the Institut für Bioinformatik, University of Duesseldorf, Germany (fleissner (at) cs.uni-duesseldorf.de), Dirk Metzler of the Institut für Informatik, University of Frankfurt, Germany (metzler (at) informatik.uni-frankfurt.de) and Arndt von Haeseler of the Center for Integrative Bioinformatics Vienna (arndt.von.haeseler (at) univie.ac.at) have written ALIFRITZ version 1.0. It simultaneously infers phylogenies and alignments using a model of insertions, deletions, and substitutions, using a Markov chain Monte Carlo method to sample from alignments within given phylogenies. It is described in the paper: Fleissner, R., D. Metzler, and A. von Haeseler. 2005. Simultaneous statistical multiple alignment and phylogeny reconstruction. Systematic Biology 54: 548-561. ALIFRITZ is available as C source code and as a Linux executable from its web page at http://www.cibiv.at/software/alifritz/

  • Redelings, B. D. and M. A. Suchard. 2005. Joint Bayesian estimation of alignment and phylogeny. Systematic Biology 54(3): 401-418.
  • Suchard, M. A. and B. D. Redelings. 2006. BAli-Phy: simultaneous Bayesian inference of alignment and phylogeny. Bioinformatics 22: 2047-2048.
  • Katoh, K. and H. Toh. 2007. PartTree: an algorithm to build an approximate tree from a large number of unaligned sequences. Bioinformatics 23: 372-374.
  • Katoh, K., K. Kuma, H. Toh and T. Miyata. 2005. MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Research 33: 511-518.
  • Katoh, K., K. Misawa, K. Kuma and T. Miyata. 2002. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Research 30: 3059-3066.
  • Edgar, R. C. and K. Sjolander. 2003. SATCHMO: Sequence alignment and tree construction using hidden Markov models. Bioinformatics 19(11): 1404-1411.
  • Edgar, R. C. and K. Sjolander. 2004. COACH: profile-profile alignment of protein families using hidden Markov models. Bioinformatics 20(8): 1309-1318.
  • Edgar, R. C. 2004. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research 32(5): 1792-1797.
  • Edgar, R. C. 2004. MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics 5: 113.

Manolo Gouy of the Laboratoire de Biometrie et Biologie Evolutive of the Centre National de la Recherche Scientifique, France (manolo.gouy (at) univ-lyon1.fr) has released SeaView version 4.3.3, a multiplatform graphical user interface for sequence alignment and phylogenetic tree building. SeaView allows multiple sequence alignment with the MUSCLE and ClustalW programs, and can also drive many other external multiple alignment algorithms. It also drives GBlocks to help select blocks of evolutionarily conserved sequence sites. Tree building can be done using parsimony, distance, or maximum likelihood (using PHYML) approaches. SeaView also allows network access to sequence databases, and display, printing, and copy-to-clipboard of rooted or unrooted, binary or multifurcating phylogenetic trees. Given this availability of many different methods for phylogenetic analyses, SeaView will be especially useful for teaching and for occasional users of such software. It is described in the paper: Gouy M., S. Guindon, and O. Gascuel. 2010. SeaView version 4: a multiplatform graphical user interface for sequence alignment and phylogenetic tree building. Molecular Biology and Evolution 27 (2): 221-224. It is available as C++ source code, Windows executables, Linux executables and Mac OS X universal executables, and SeaView is also available as Linux packages for Debian, Fedora, and Gentoo Linux. It can be downloaded from its web site at http://pbil.univ-lyon1.fr/software/seaview.html

Pietro Liò, of the Computer Laboratory at the University of Cambridge ( Pietro.Lio (at) cl.cam.ac.uk ), has written PASSML and PASSML_TM, which use likelihood methods with hidden Markov models to infer phylogeny and also secondary structure from protein data. PASSML is for general proteins and PASSML_TM is for membrane proteins. The methods used are described in the paper: Goldman, N., J. L. Thorne, and D. T. Jones. 1998. Assessing the impact of secondary structure and solvent accessibility on protein evolution. Genetics 149: 445-458. PASSML is described in the paper: Liò, P., N. Goldman, J. L. Thorne and D. T. Jones. 1998. PASSML: combining evolutionary inference and protein secondary structure prediction. Bioinformatics 14: 726-733. PASSML_TM is described in the paper: Liò, P. and N. Goldman. 1999. Using protein structural information in evolutionary inference: transmembrane proteins. Molecular Biology and Evolution 16: 1696-1710. The programs are available as ANSI C source code. The source code is available via its web page at http://www.ebi.ac.uk/goldman/hmm/passml.html .

Rod Page ( r.page (at) bio.gla.ac.uk ), of the Division of Environmental and Evolutionary Biology of the University of Glasgow has written COMPONENT version 2.0, a program for Windows systems for comparing cladograms for use in phylogeny and biogeography studies. It has many tree comparison and consensus methods, and far more features for biogeographic studies (such as comparing species and area cladograms) than any other package. It also can generate random trees. It runs under Windows 3.0 or higher. There is a review of the program in: Slowinski, J. 1993. Review of Component, Version 2.0, by Roderick D. M. Page. Cladistics 9: 351-353. COMPONENT is available free from its web site at http://taxonomy.zoology.gla.ac.uk/rod/cpw.html . Source code in Pascal and documentation (as PDFs) are also available there. A very early development Macintosh version ("COMPONENT Lite") is available from the COMPONENT Lite web site at http://taxonomy.zoology.gla.ac.uk/rod/cplite/guide.html .

Rod Page ( r.page (at) bio.gla.ac.uk ), of the Division of Environmental and Evolutionary Biology of the University of Glasgow and Michael Charleston ( mcharles (at) it.usyd.edu.au ) of the Biological Informatics and Technology Centre of the School of Information Technologies of the University of Sydney, Sydney, Australia have written TREEMAP, version 3, a free program for comparing host and parasite phylogenies. It allows you to interactively compare host and parasite trees, construct reconstructions of the history of the association, and perform some simple randomisation tests of hypotheses of cospeciation. It also can use Charleston's "Jungles" method to fit parasite trees to host trees by parsimony. That method is described in his paper: Charleston, M. A. 1998. Jungles: A new solution to the host/parasite phylogeny reconciliation problem. Mathematical Biosciences 149: 191-223. For a description of the method used by TreeMap, see Page, R.D.M. 1994. Parallel phylogenies: Reconstructing the history of host-parasite assemblages. Cladistics 10: 155-173. It can also estimate the number of randomized parasite trees that map as well to the host tree as does the original parasite tree. The program is available as a Java executable, which can be downloaded from its web site at http://www.it.usyd.edu.au/

mcharles/software/treemap/treemap3.html . A beta release executable for Mac OS of version 2.0, called version 2.0β, is available at the Treemap 2.0β web site at http://www.it.usyd.edu.au/

mcharles/software/treemap/treemap.html . An earlier version, 1.0, is available as an executable for Mac OS or as an executable for Windows PCs. They can be downloaded from its WWW site: http://taxonomy.zoology.gla.ac.uk/rod/treemap.html .

Fredrik Ronquist (Fredrik.Ronquist (at) nrm.se) of the Naturhistoriska riksmuseet, Stockholm, Sweden has released DIVA version 1.2, a program for DIspersal Vicariance Analysis. It is for analyses in historical biogeography, where one is reconstructing the distribution history of a group of organisms from the distribution areas of extant species and their phylogeny. It is a parsimony-style analysis based on optimization of the numbers of dispersal and extinction events, where one assumes that speciations divide species ranges allopatrically. It does not make any assumption about the hierarchical nature of vicariance events. It was formerly available as either a Windows executable or a Mac OS executable from its web page at http://www.ebc.uu.se/systzoo/research/diva/diva.html . Currently there is some download, not well described, including perhaps source code, available from the Sourceforge site at http://diva.sourceforge.net/ .

Yu Yan, of the College of Life Sciences of Sichuan University, Chengdu, China (yuyan (at) mnh.scu.edu.cn) and A. J. Harris of the Department of Botany, Oklahoma State University, Stillwater, Oklahoma, USA have produced S-DIVA (Statistical Dispersal-Vicariance Analysis), versions 1.9β and 1.5c, a tool for inferring biogeographic histories. It uses statistical dispersal-vicariance analysis to check the ancestral reconstructions and evaluate the alternative ancestral areas at each node in the tree. S-DIVA provides a graphical user interface and can export high resolution graphical results for further analysis. It expands on the methods provided by DIVA by using a Bayesian approach to uncertainty in the phylogeny. It is described in a paper: Yu, Y., A. J. Harris, and X. J. He. 2010. S-DIVA (Statistical Dispersal-Vicariance Analysis): a tool for inferring biogeographic histories. Molecular Phylogenetics and Evolution 56: 848-850. It is available as a Windows executable. Microsoft .NET 2 Framework should be installed on the system to enable S-DIVA to be run. S-DIVA can be downloaded from its web site at http://mnh.scu.edu.cn/s-diva/

Fredrik Ronquist (Fredrik.Ronquist (at) nrm.se) of the Naturhistoriska riksmuseet, Stockholm, Sweden has written TreeFitter version 1.0. It fits parasite trees to a host tree, and can also use them to infer the best host tree. The program, which has many options, uses an event-based parsimony method, which penalizes events using penalties chosen to reflect their improbability. The NEXUS file format is used for the tree files. It is available from its web site at http://www.ebc.uu.se/systzoo/research/treefitter/treefitter.html as either a Windows executable or a Mac OS executable. An on-line manual is available at the web site.

Steffen Junick, Daniel Merkle, and Martin Middendorf, of the research group on Parallel Computing and Complex Systems of the Faculty of Mathematics and Computer Science at the Universität Leipzig, Germany (Merkle is currently in the Department of Mathematics and Computer Science at the University of Southern Denmark) (daniel (at) imada.sdu.dk) have written Tarzan version 0.9 (it is called this because pairs of trees that are cophylogenies have been called "Jungles"). It is a program for the reconstruction of cophylogenies (host/parasite trees or fits of trees to biogeographic vicariance patterns). Tarzan uses an event-based method to find cost-minimal reconstructions or reconstructions that have a minimal (or maximal) number of certain evolutionary events. Five different types of evolutionary events are considered: cospeciation, duplication, sorting, switching, and extinction. For host-parasite systems cospeciation events refer to simultaneous host and parasite speciation, duplication events are independent parasite speciations, sorting events correspond to lineage sorting, and switches correspond to host shifts. It is described in the paper: Merkle, D. and M. Middendorf. 2005. Reconstruction of the cophylogenetic history of related phylogenetic trees with divergence timing information. Theory in Biosciences 123(4): 277-299. Tarzan is available as Java code. It can be downloaded from its web site at http://pacosy.informatik.uni-leipzig.de/pv/Software/Tarzan/PV-Tarzan.engl.html

Pierre Legendre of the Département de Sciences Biologiques of the Université de Montréal, Montréal, Quebec (Pierre.Legendre (at) umontreal.ca) has written ParaFit, a program that tests host-parasite evolution. It tests the hypothesis of coevolution between a clade of hosts and a clade of parasites. The null hypothesis of the global test is that the evolution of the two groups, as revealed by the two phylogenetic trees and the set of host-parasite association links, has been independent. The method requires some estimates of the phylogenetic trees or phylogenetic distances, and also a description of the host-parasite associations (H-P links) observed in nature. Two types of test are produced by the program: a global test of coevolution and a test on each H-P link. It is described in the paper: Legendre, P., Y. Desdevises and E. Bazin. 2002. A statistical test for host-parasite coevolution. Systematic Biology 51(2): 217-234. It is available as FORTRAN source code, Windows executables, Powermac Mac OS X executables, and Mac OS 9 executables. It can be downloaded from its web site at http://www.bio.umontreal.ca/casgrain/en/labo/parafit.html

Alexandros Stamatakis, A. Auch, J. Meier-Kolthoff, and M. Göker, of the Laboratory for Computational Biology and Bioinformatics (LCBB) of the École Polytechnique Fédérale de Lausanne, Switzerland and of the Center for Bioinformatics (ZBIT) of the University of Tübingen, Germany and the Lehrstuhl für Spezielle Botanik und Mykologie of the Botanisches Institut, Universität Tübingen (Stamatakis is currently at Lehrstuhl XII - Machine Learning and Data Mining in Bioinformatics at the Technische Universität München, Germany) (Alexandros.Stamatakis (at) h-its.org) have written AxParafit (AleXandros's version of Parafit), a parallel version of the program ParaFit for fitting host and parasite trees. AxParafit and AxPcoords are highly optimized versions of Pierre Legendre's ParaFit and DistPCoA programs for statistical analysis of host-parasite coevolution. AxParafit has also been parallelized with MPI (Message Passing Interface) for compute clusters. The AxParafit site also includes a parallel version of the program AxPcoords which is used with AxParafit. They are described in the paper: Stamatakis, A., A. Auch, J. Meier-Kolthoff, and M. Göker. 2007. AxPcoords and Parallel AxParafit: Statistical co-phylogenetic analyses on thousands of taxa. BMC Bioinformatics 8: 405. They are available as C source code, Windows executables, Linux executables and Mac OS X universal executables. They can be downloaded from its web site at http://icwww.epfl.ch/

Daniel Merkle, Martin Middendorf, and Nicolas Wieseke of the Department of Computer Science of the University of Leipzig, Germany (middendorf (at) informatik.uni-leipzig.de) have released CoRe-PA version 1.0, a tool for reconstructing the coevolutionary history of host parasite systems. Like Tarzan, it uses an event-based method to find cost-minimal reconstructions. These events are cospeciation, sorting, duplication and (host) switching. With CoRe-PA you can design host parasite scenarios with a graphical editor, generate random coevolutionary scenarios using the beta-split model with beta 0, -1 or -1.5, generate random coevolutionary scenarios by simulating coevolution, generate random coevolutionary scenarios which retain the character of given host parasite systems, handle non-binary host and parasite phylogenies, choose between different ways of handling host switches, use divergence timing information, compute the best reconstructions for a given set of costs, compute the best cost vector for a given host parasite system (where the cost vector fits best to the reconstructed event frequencies), do randomization tests for given host parasite systems to analyze the evidence for coevolution, and export host parasite scenarios and their reconstructions to SVG graphics files. It is described in the paper: Merkle, D., M. Middendorf, and N. Wieseke. 2010. A parameter-adaptive dynamic programming approach for inferring cophylogenies. BMC Bioinformatics 11 (Suppl 1): S60. It is available as Java executables, Windows executables, Linux executables and Mac OS X universal executables. It can be downloaded from its web site at http://pacosy.informatik.uni-leipzig.de/58-1-Downloads.html

Ran Libeskind-Hadas of the Department of Computer Science of Harvey Mudd College in Claremont, California (hadas (at) cs.hmc.edu) has released Jane version 3, a cophylogeny reconstruction package. The input to Jane is a file containing a host tree, a parasite tree, and a mapping of the tips of the parasite tree to tips of the host tree. The user may specify the costs of each of five types of events: cospeciations, duplications, host switches, losses, and failures to diverge. Jane then endeavors to find least-cost mappings of the parasite tree onto the host tree subject to the given tip mapping. Jane also has features to perform randomization tests. It is described in the paper: Conow, C., D. Fielder, Y. Ovadia and R. Libeskind-Hadas. 2010. Jane: A new tool for the cophylogeny reconstruction problem. Algorithms for Molecular Biology 5: 16. It is available as Java executables. It can be downloaded from its web site at http://www.cs.hmc.edu/

Athanasia C. Tzika, Raphaël Helaers, and Michel Milinkovitch of the Laboratory of Artificial and Natural Evolution (LANE) of the Department of Zoology and Animal Biology at the University of Geneva, Switzerland, and Yves Van de Peer, of the Department of Plant Systems Biology of the University of Gent, Belgium (info (at) mantisdb.org or Michel.Milinkovitch (at) unige.ch) have produced MANTiS version 1.1, a program using molecular databases to reconstruct gene duplications and losses. MANTiS builds a relational database integrating, in a phylogenetic framework, all Ensembl genes, corresponding PANTHER molecular functions and biological processes, as well as GNF, e-genetics, and HMDEG expression data. It makes use of the Ensembl ortholog/paralog prediction pipeline to reconstruct gene duplication events, and implements a dynamic programming approach for the mapping of gene gains, duplications, and losses on the phylogenetic tree.

It allows the user to identify gains and losses on specific branches of the tree, see the genome content of ancestral species, statistically over- or under-represented molecular functions, biological processes and anatomical systems (expression data), and reconstruct tissue specificity of gained, duplicated, and lost genes. It is described in the paper: Tzika, A. C., R. Helaers, Y. Van de Peer and M. C. Milinkovitch. 2008. MANTiS: a phylogenetic framework for multi-species genome comparisons. Bioinformatics 24(2): 151-157. It is available as Java executables with a Windows executable installer, a Linux executable installer, and a Mac OS X universal executable installer. It can be downloaded from its web site at http://www.mantisdb.org

  • Nielsen, R. 2002. Mapping mutations on phylogenies. Systematic Biology 51: 729-739.
  • Huelsenbeck, J. P., R. Nielsen, and J. P. Bollback. 2003. Stochastic mapping of morphological characters. Systematic Biology 52: 131-158.
  • Bollback, J. P. 2006. SIMMAP: Stochastic character mapping of discrete traits on phylogenies. BMC Bioinformatics 7: 88.

Liran Carmel, Yuri I. Wolf, Igor B. Rogozin, and Eugene V. Koonin of the National Center for Biotechnology Information, National Library of Medicine of the National Institutes of Health, Bethesda, Maryland (Carmel is now at the Department of Computer Science at Hebrew University, Jerusalem, Israel with email address liran.carmel (at) carmellab.com) released EREM (Evolutionary Reconstruction by Expectation-Maximization), a program for parameter estimation and ancestral reconstruction for evolution of binary characters. EREM assumes a probabilistic model for evolution of binary characters on a given bifurcating tree. EREM estimates rates of change between states 0 and 1 of the model, and reconstructs ancestral states (presence and absence in internal nodes) and the location of events (gains and losses along branches). It can also be used to simulate data on a tree. It is available as C++ source code and Windows executables. It can be downloaded from its web site at http://carmelab.huji.ac.il/software/EREM/erem.html

Antonio Marco and Ignacio Marín of the Departamento de Genética of the Universitat de València and of the Instituto de Biomedicina de València, of the Consejo Superior de Investigaciones Científicas, València, Spain (marcasan (at) uv.es) have written Tree Tracker, a Perl script to detect overrepresented clusters in a tree. It takes a user-supplied tree and a list of genes. The program uses a permutation analysis of ranked clusters to test whether groups within the tree are overrepresented for having one state of genes that have two possible states. It is described in the paper: Marco, A. and I. Marín. 2007. A general strategy to determine the congruence between a hierarchical and a non-hierarchical classification. BMC Bioinformatics 8: 442. It is available as a Perl script. It can be downloaded from its web site at http://www.uv.es/

Jianzhi George Zhang of the Department of Ecology and Evolutionary Biology of the University of Michigan, Ann Arbor, Michigan (jianzhi (at) umich.edu) produced Ancestor, a program for inferring the ancestral protein sequence of a set of species from their protein sequences. The tree of the sequences is inferred by the minimum evolution distance matrix method of Rzhetsky and Nei. It can estimate the ancestral sequences at all nodes of the tree. The methods are described in the paper: Zhang, J., and M. Nei. 1997. Accuracies of ancestral amino acid sequences inferred by the parsimony, likelihood, and distance methods. Journal of Molecular Evolution 44 (Suppl 1): S139-S146. The program is distributed as a DOS executable with C source code. It will run in a Windows Command Prompt window. It is available from Masatoshi Nei's lab software site at https://homes.bio.psu.edu/people/faculty/nei/software.htm

Jianzhi George Zhang of the Department of Ecology and Evolutionary Biology of the University of Michigan, Ann Arbor, Michigan (jianzhi (at) umich.edu) has produced ANC-GENE, a program to infer ancestral protein and DNA sequences from DNA sequences of a coding gene when the phylogeny of the species is known. It first infers the amino acids by a distance-based Bayesian method, and then infers the underlying nucleotide sequences by fixing the inferred amino acids. It estimates branch lengths on the phylogeny by a distance method before inferring the ancestral sequences. It uses one of two possible models of amino acid changes (the Poisson-f or JTT-f models), as well as the Jukes-Cantor model of nucleotide substitution. It outputs both inferred pathways of change at each amino acid position and inferred sequences at each node of the tree. The methods are discussed in this paper: Zhang, J., and M. Nei. 1997. Accuracies of ancestral amino acid sequences inferred by the parsimony, likelihood, and distance methods. Journal of Molecular Evolution 44 (Suppl 1): S139-S146. ANC-GENE is available as a DOS executable and C source code. These can be executed in Windows in a Command Prompt window. It can be downloaded from the Nei laboratory software web site at https://homes.bio.psu.edu/people/faculty/nei/software.htm

Xun Gu, of the Department of Genetics, Development and Cell Biology and the Center for Bioinformatics and Biological Statistics at Iowa State University, Ames, Iowa (xgu (at) iastate.edu) has released Mgenome version 1.0. It finds trees for multiple genome rearrangement by signed reversals. For a collection of genomes represented by signed permutations of genes, it finds a tree that connects all given genomes by reversal paths such that the total number of signed reversals is as small as possible. The methods seem to be described in a paper: Wu, S., and X. Gu. 2003. Algorithms for multiple genome rearrangement by signed reversals. Pacific Symposium on Biocomputing 8: 363-374, although the paper does not refer to the program. The paper is available as a PDF at the Gu lab web site. The program is available as a Windows executable at the Gu lab software web site at http://xungulab.com/software.html .
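
As a rough illustration of the operation being counted, a signed reversal takes a contiguous block of a signed gene order, reverses its order, and flips the sign (strand) of every gene in it. The sketch below is not Mgenome's code; the function name `signed_reversal` and the list-of-integers representation are assumptions made only for this example.

```python
def signed_reversal(perm, i, j):
    """Return a copy of perm with the block perm[i..j] (inclusive) reversed
    and the sign of every gene in that block flipped.

    perm is a signed permutation such as [1, -3, -2, 4, 5], where the sign
    stands for the orientation of the gene. (Hypothetical helper.)
    """
    if not 0 <= i <= j < len(perm):
        raise IndexError("need 0 <= i <= j < len(perm)")
    flipped = [-gene for gene in reversed(perm[i:j + 1])]
    return perm[:i] + flipped + perm[j + 1:]


ancestor = [1, 2, 3, 4, 5]
descendant = signed_reversal(ancestor, 1, 2)   # reverse the block (2, 3)
restored = signed_reversal(descendant, 1, 2)   # the same reversal undoes it
```

A reversal is its own inverse, which is why reversal paths between two genomes can be read in either direction along a branch of the tree.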

Mathieu Blanchette, of the School of Computer Science, McGill University, Montréal, Québec (blanchem (at) mcb.mcgill.edu) has written BPAnalysis, a program that infers phylogenies from a set of gene orders by minimizing the number of breakpoints required in genome rearrangement (this is not the same as minimizing the number of rearrangement events). It is a C++ program which is also distributed in source code and in an executable for DOS and Windows. The method employed is described in the paper: Sankoff, D. and M. Blanchette. 1998. Multiple genome rearrangement and breakpoint phylogeny. Journal of Computational Biology 5: 555-570. It is available from Blanchette's software page at http://www.mcb.mcgill.ca/
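
The difference between counting breakpoints and counting rearrangement events can be seen in a small sketch. It assumes signed, linear gene orders over the same set of genes and ignores the chromosome ends, which is a simplification; the function names are hypothetical and this is not BPAnalysis code.

```python
def adjacencies(order):
    """Return the set of adjacencies in a signed linear gene order.

    The pair (a, b) read left to right is the same adjacency as (-b, -a)
    read from the other strand, so one canonical form of each is stored.
    """
    pairs = set()
    for left, right in zip(order, order[1:]):
        pairs.add(min((left, right), (-right, -left)))
    return pairs


def breakpoint_count(genome_a, genome_b):
    """Count adjacencies of genome_a that are not preserved in genome_b."""
    if sorted(map(abs, genome_a)) != sorted(map(abs, genome_b)):
        raise ValueError("both genomes must contain the same genes")
    return len(adjacencies(genome_a) - adjacencies(genome_b))


a = [1, 2, 3, 4, 5]
b = [1, -3, -2, 4, 5]          # a after one signed reversal of the block (2, 3)
n_breakpoints = breakpoint_count(a, b)
```

Here a single reversal creates two breakpoints, one at each end of the reversed block, so the breakpoint count and the number of events are different quantities.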

Benjamin Vernot, Aiton Goldman and Dannie Durand of the Departments of Biological Sciences and Computer Science of the Carnegie Mellon University, Pittsburgh, Pennsylvania (notung (at) cs.cmu.edu) have released Notung version 2.6, a unified framework for incorporating gene duplication/loss parsimony in phylogenetic inference. Given a gene tree and a species tree as input, Notung can: (1) Reconcile the trees, (2) Estimate upper and lower bounds on duplication times in terms of speciation events, (3) Root an unrooted tree by minimizing gene duplications and losses, and (4) Rearrange regions of a gene tree with weak support in the sequence data to obtain alternate hypotheses. Notung's graphical user interface supports exploratory data analysis of very large trees and rapid review of many alternate hypotheses. Notung also provides a command-line interface for automated analysis of many trees in high-throughput genomic studies. Notung can read and save trees in Newick, NHX, or Notung file format. Images can be output in PNG format for use in publications. Notung is freely available as a Java executable which can run on Mac OS X, Windows and Linux systems. The distribution includes the Notung Java executable, a manual in PDF format with worked examples, sample trees, and sample scripts for automated analysis. Java 1.4 or higher is required. It is described in the paper: Durand, D., B. V. Halldorsson, and B. Vernot. 2006. A hybrid micro-macroevolutionary approach to gene tree reconstruction. Journal of Computational Biology 13(2): 320-335. It can be downloaded from its web site at http://www.cs.cmu.edu/

Olivier Elemento, then of the IMGT, the International imMunoGeneTics database and the LIRMM (Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier) of the Université de Montpellier II, Montpellier, France (He is now at the Institute for Computational Biomedicine at Weill Cornell Medical College in New York City and his email address is ole2001 (at) med.cornell.edu) has written DTscore (Duplication, Tandem - score), a distance-based tandem duplication tree reconstruction program. It takes as input a distance matrix between copies in a family of tandem repeats. The rows and columns need to be ordered in the same way as the copies are in the locus. DTscore can be applied to relatively large datasets (more than a hundred copies). It is described in the paper: Elemento O. and O. Gascuel. 2002. A fast and accurate distance algorithm to reconstruct tandem duplication trees. Bioinformatics 18: S92-S99. It is available as C source code, Windows executables and Linux executables. It can be downloaded from its web page at http://www.lirmm.fr/%7Eelemento/DTscore/ and also at its web site at ATGC at http://www.atgc-montpellier.fr/dtscore/binaries.php

Michael Sanderson of the Department of Ecology and Evolutionary Biology of the University of Arizona, Tucson, Arizona (sanderm (at) email.arizona.edu) has written gtp (Gene Tree Parsimony), version 0.15, a program to reconcile gene trees with species trees using a gene tree parsimony criterion. The program reads a NEXUS-format file containing the species tree and a series of gene trees, which have at their tips the names of the species. The gene trees are reconciled with the species trees using a gene duplication count. The gene trees can either be considered to be rooted as given, or optionally they can be considered to be unrooted, in which case the count of duplications is made by considering the minimum over all possible rootings of each gene tree. The methods are described in the paper: Zmasek, C. M. and S. R. Eddy. 2001. A simple algorithm to infer gene duplication and speciation events on a gene tree. Bioinformatics 17: 821-828. It is available as C source code. It can be downloaded from its web site at http://loco.biosci.arizona.edu/gtp/gtp.html
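
The duplication count for a rooted gene tree can be sketched with the usual lowest-common-ancestor mapping: each gene-tree node is mapped to the smallest species-tree clade containing all of its species, and a node is counted as a duplication when it maps to the same clade as one of its children. This is a minimal sketch of that idea, not gtp's implementation; trees are written as nested Python tuples with species names at the tips, and all function names are assumptions.

```python
def species_clades(species_tree):
    """List the leaf set (clade) of every node in a nested-tuple species tree."""
    clades = []

    def walk(node):
        if isinstance(node, tuple):
            clade = frozenset().union(*(walk(child) for child in node))
        else:
            clade = frozenset([node])
        clades.append(clade)
        return clade

    walk(species_tree)
    return clades


def lca_clade(clades, species):
    """Smallest species-tree clade that contains every species in the set."""
    return min((c for c in clades if species <= c), key=len)


def count_duplications(gene_tree, species_tree):
    """Count gene-tree nodes that map to the same clade as one of their children."""
    clades = species_clades(species_tree)
    duplications = 0

    def walk(node):
        nonlocal duplications
        if not isinstance(node, tuple):
            return lca_clade(clades, frozenset([node]))
        child_maps = [walk(child) for child in node]
        here = lca_clade(clades, frozenset().union(*child_maps))
        if here in child_maps:
            duplications += 1
        return here

    walk(gene_tree)
    return duplications


species = (("A", "B"), "C")
congruent = (("A", "B"), "C")                  # agrees with the species tree
two_copies = ((("A", "B"), ("A", "B")), "C")   # gene duplicated before A/B split
```

For unrooted gene trees, the text above describes taking the minimum of this count over all possible rootings; that step is left out of the sketch.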



CLC Sequence Viewer

CLC Sequence Viewer is another free bioinformatics program for Windows. With it, you can perform a large number of bioinformatics analyses using various built-in tools. It also offers excellent graphical viewing and output options. Its features let you create and edit alignments, perform interactive restriction site analysis, run phylogenetic analyses, carry out advanced DNA-to-protein translation, and more. In addition, you can use the integrated GenBank search and many other features to find and analyze the right set of sequences.

In this software, you can input a DNA, RNA, or protein sequence either by searching the NCBI database directly or by loading local sequence files in various formats (.zip, .gz, .clc, .cm5, etc.). The imported sequence data can be viewed, edited, and analyzed from the interface. You can also analyze multiple sequences at once by using the multi-tab interface. On the right side of the interface, various handy sequence settings, annotation settings, residue color settings, etc. are available for modifying how the sequence is displayed.

After annotating the sequence, you can save the modified sequence in various formats such as PDF, GenBank (gb), ZIP, Excel (xls), HTML, etc.

In general, it is quite a simple program for viewing, editing, and analyzing gene sequences.


3. How to achieve data visualization?

Technically, data visualization is most simply understood as a mapping from data space to graphic space.

A classic visual implementation procedure is to process and filter the data, transform it into an expressible visual form, and then render it into a user-visible view.
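
The three steps can be sketched in a few lines. The example below is only an illustration under stated assumptions: the data are a plain list of non-negative numbers with `None` for missing values, the visual form is a bar chart, and the view is an SVG string. The names `linear_scale` and `render_bars` are made up for this sketch.

```python
def linear_scale(domain, pixel_range):
    """Return a function mapping values from data space to graphic space."""
    d0, d1 = domain
    r0, r1 = pixel_range
    if d0 == d1:
        raise ValueError("domain must have non-zero width")

    def scale(value):
        return r0 + (value - d0) * (r1 - r0) / (d1 - d0)

    return scale


def render_bars(values, width=200, height=100):
    """Filter the data, map it to bar geometry, and render an SVG string."""
    # Step 1: process and filter the data (drop missing values).
    clean = [v for v in values if v is not None]
    # Step 2: transform into a visual form (bar heights in pixels).
    to_height = linear_scale((0, max(clean)), (0, height))
    bar_width = width / len(clean)
    # Step 3: render into a user-visible view.
    rects = []
    for i, value in enumerate(clean):
        h = to_height(value)
        rects.append(
            f'<rect x="{i * bar_width:.1f}" y="{height - h:.1f}" '
            f'width="{bar_width:.1f}" height="{h:.1f}"/>'
        )
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{width}" height="{height}">' + "".join(rects) + "</svg>"
    )


svg = render_bars([5, None, 10])
```

Writing `svg` to a file with an `.svg` extension gives something a browser can display; a tree viewer does the same kind of mapping, with branch lengths instead of bar heights.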


Circos Stages Mesolithic to Neolithic Transition

Bollongino et al. present evidence of a slow transition from Mesolithic hunter-gatherer groups to Neolithic farmers.

Previous theories that the foragers disappeared shortly after the arrival of farmers are at odds with palaeogenetic and isotopic data analysis from Neolithic human skeletons from the Blätterhöhle burial site in Germany. Instead of an abrupt transition, the data suggest a more complex pattern of coexistence that persisted for over 2000 years.

Bollongino R, Nehlich O, Richards MP et al. 2013. 2000 years of parallel societies in Stone Age Central Europe. Science 342: 479-481.


Topics similar to or like Neighbor joining

Computational problem concerned with producing multiple sequence alignments, or alignments of three or more sequences of DNA, RNA, or protein. Sequences are arranged into a phylogenetic tree, modeling the evolutionary relationships between species or taxa. Wikipedia

Application of computational algorithms, methods, and programs to phylogenetic analyses. To assemble a phylogenetic tree representing a hypothesis about the evolutionary ancestry of a set of genes, species, or other taxa. Wikipedia

One of several methods of hierarchical clustering. Based on grouping clusters in bottom-up fashion, at each step combining two clusters that contain the closest pair of elements not yet belonging to the same cluster as each other. Wikipedia

Simple way to calculate the distance between phylogenetic trees. Defined as (A + B), where A is the number of partitions of data implied by the first tree but not the second tree and B is the number of partitions of data implied by the second tree but not the first tree (although some software implementations divide the RF metric by 2 and others scale the RF distance to have a maximum value of 1). Wikipedia

Branching diagram or a tree showing the evolutionary relationships among various biological species or other entities based upon similarities and differences in their physical or genetic characteristics. Part of a single phylogenetic tree, indicating common ancestry. Wikipedia

Algorithm that can speed up several methods for agglomerative hierarchical clustering. These are methods that take a collection of points as input, and create a hierarchy of clusters of points by repeatedly merging pairs of smaller clusters to form larger clusters. Wikipedia

Interdisciplinary field that develops methods and software tools for understanding biological data, in particular when the data sets are large and complex. Interdisciplinary field of science, bioinformatics combines biology, computer science, information engineering, mathematics and statistics to analyze and interpret the biological data. Wikipedia

Compilation of software tools and web portals used in visualising phylogenetic trees. Wikipedia

Compilation of computational phylogenetics software used to produce phylogenetic trees. Such tools are commonly used in comparative genomics, cladistics, and bioinformatics. Wikipedia

In bioinformatics, sequence clustering algorithms attempt to group biological sequences that are somehow related. The sequences can be either of genomic, "transcriptomic" (ESTs) or protein origin. Wikipedia

Method of vector quantization, originally from signal processing, that aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, serving as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells. Wikipedia

Computer software for conducting statistical analysis of molecular evolution and for constructing phylogenetic trees. It includes many sophisticated methods and tools for phylogenomics and phylomedicine. Wikipedia

Optimality criterion under which the phylogenetic tree that minimizes the total number of character-state changes is to be preferred. Under the maximum-parsimony criterion, the optimal tree will minimize the amount of homoplasy (i.e., convergent evolution, parallel evolution, and evolutionary reversals). Wikipedia

Method of cluster analysis which seeks to build a hierarchy of clusters. Strategies for hierarchical clustering generally fall into two types: Wikipedia

Method of species identification using a short section of DNA from a specific gene or genes. That, by comparison with a reference library of such DNA sections, an individual sequence can be used to uniquely identify an organism to species, in the same way that a supermarket scanner uses the familiar black stripes of the UPC barcode to identify an item in its stock against its reference database. Wikipedia

One method for finding community structures in a network. The technique arranges the network into a hierarchy of groups according to a specified weight function. Wikipedia
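
Of the topics listed above, the tree distance based on partitions (the Robinson-Foulds metric) is the one most directly useful when comparing a new neighbor-joining tree against a published one, and it is easy to sketch. The example below assumes trees are given as nested Python tuples over the same leaf names, treats them as unrooted, and returns the unscaled sum A + B; the function names are hypothetical.

```python
def leaf_sets(tree, found):
    """Return the leaves below a node, recording the leaf set of every internal node."""
    if not isinstance(tree, tuple):
        return frozenset([tree])
    leaves = frozenset().union(*(leaf_sets(child, found) for child in tree))
    found.append(leaves)
    return leaves


def splits(tree):
    """Return (all leaves, set of non-trivial bipartitions) for a nested-tuple tree."""
    found = []
    all_leaves = leaf_sets(tree, found)
    result = set()
    for side in found:
        other = all_leaves - side
        # Skip trivial partitions that every tree on these leaves shares.
        if len(side) > 1 and len(other) > 1:
            result.add(frozenset([side, other]))
    return all_leaves, result


def robinson_foulds(tree1, tree2):
    """Partitions found in tree1 only (A) plus those found in tree2 only (B)."""
    leaves1, splits1 = splits(tree1)
    leaves2, splits2 = splits(tree2)
    if leaves1 != leaves2:
        raise ValueError("trees must have the same leaf names")
    return len(splits1 - splits2) + len(splits2 - splits1)


published = ((("A", "B"), "C"), ("D", "E"))
new_method = ((("A", "C"), "B"), ("D", "E"))
distance = robinson_foulds(published, new_method)
```

A distance of 0 means the two trees have the same branching pattern regardless of how a particular program happens to draw them; branch lengths are not compared.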


Frequently asked questions about our family tree maker

Make a family tree chart from scratch by dragging shapes from your toolbox, connecting them with lines, and adding text. Or, simply customize one of our family tree diagram templates.

Our template gallery in Lucidchart contains several examples of family tree charts, each with different features. Browse our template examples before diagramming to see which might work for you.

Yes! Similar to how you can import an org chart, you can import a spreadsheet with your family information directly into Lucidchart, and a family tree chart will be generated automatically from the data set you've linked.

