PhyML 3.0

New Algorithms and Methods to Estimate Maximum-Likelihood Phylogenies: Assessing the Performance of PhyML 3.0

Stéphane Guindon, Jean-François Dufayard, Vincent Lefort, Maria Anisimova, Wim Hordijk, Olivier Gascuel

Overview: new algorithms, methods and utilities

PhyML is a software package that uses modern statistical approaches to build phylogenetic trees from alignments of nucleotide or amino-acid sequences. Its main tool builds phylogenies under the maximum-likelihood criterion. It implements a large number of substitution models, coupled with efficient options for searching the space of phylogenetic tree topologies.

Installation

To install PhyML, download the code from https://github.com/stephaneguindon/phyml/. After unpacking the archive, go into the phyml/ folder and type the following command:

sh ./autogen.sh;

If you are using a Mac or running another Unix-like operating system, you will need to install the autoconf, automake and pkg-config packages. On a Mac, the following command should set you up (provided Homebrew is installed on your Mac…):

brew install pkg-config autoconf automake;
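To see which of these tools are still missing, you can run a quick check. The bash sketch below only tests whether the autoconf, automake and pkg-config commands are on the PATH (it does not check versions); the helper name missing_tools is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: report which of the build tools listed above are missing from PATH.
# It only checks that each command exists; versions are not verified.

# Print, one per line, every argument that is not an available command.
missing_tools() {
  local tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
    fi
  done
}

missing=$(missing_tools autoconf automake pkg-config)
if [ -n "$missing" ]; then
  # Unquoted $missing joins the names on a single line.
  echo "Missing build tools:" $missing >&2
  echo "On a Mac with Homebrew: brew install pkg-config autoconf automake" >&2
else
  echo "autoconf, automake and pkg-config found; you can run: sh ./autogen.sh"
fi
```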

Next, to build PhyML, type the following commands:

./configure --enable-phyml;
make;

To compile a Windows executable, install MinGW and run:

./configure --enable-win --enable-phyml;
make;

To install the MPI version of PhyML, type the following commands:

autoreconf -i;
./configure --enable-phyml-mpi;
make;

PhyML 3.0 online execution

Account Information

Name: name your analysis for easy identification.

Email address: address used for job notifications (entered twice for confirmation).

Private: protect your job with a randomly generated password, which you will receive by email.

Input data

Sequence alignment: upload a PHYLIP-formatted alignment file.

Sequence type: let TIDE infer the alignment alphabet, or force nucleotide or amino-acid mode.
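When PhyML is run from the command line instead of the web form, the alignment and sequence type are passed as options. A minimal sketch, assuming PhyML's -i (input alignment) and -d (data type: nt or aa) options; the helper input_args and the file name alignment.phy are hypothetical, and the script only prints the command rather than running it.

```bash
#!/usr/bin/env bash
# Sketch: command-line counterpart of the "Input data" fields, assuming
# PhyML's -i (PHYLIP alignment file) and -d (data type: nt or aa) options.
# The helper name and file name are hypothetical; nothing is executed.

# Usage: input_args <alignment> [nt|aa]   (file names without spaces)
input_args() {
  local alignment=$1 seqtype=${2:-}
  case "$seqtype" in
    nt|aa) printf '%s\n' "-i $alignment -d $seqtype" ;;
    "")    printf '%s\n' "-i $alignment" ;;   # leave the data type to PhyML's default
    *)     echo "unknown sequence type: $seqtype" >&2; return 1 ;;
  esac
}

# Dry run: print the command instead of executing it.
echo "phyml $(input_args alignment.phy aa)"
```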

Substitution Model

Model selection: choose between automatic Smart Model Selection (SMS) and manually set parameters.

Automatic model selection (SMS)

SMS criterion: criterion used by SMS to rank candidate models.

User defined model parameters

Equilibrium frequencies: how the equilibrium base or amino-acid frequencies are determined.

Invariable sites estimation: whether to estimate the proportion of invariable sites.

Proportion of invariable sites: proportion of sites with a zero substitution rate.

Rate variation model: how rate variation across sites is handled.

Number of rate categories: number of discrete rate categories (for gamma models).

Gamma shape estimation: whether to estimate the gamma shape parameter.

Gamma shape parameter: shape parameter of the gamma rate distribution.

Substitution Model (DNA)

DNA substitution model: substitution model applied to nucleotide sequences.

DNA transition parameters

Transition/transversion estimation: whether to estimate the transition/transversion ratio or use a fixed value.

Transition/transversion ratio: ratio of transition to transversion rates (DNA only).

Substitution Model (Protein)

Protein substitution model: substitution model applied to amino-acid sequences.
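The user-defined model fields have counterparts among the options of the phyml command-line program. The sketch below assumes the option names -m, -c, -a, -v and -t as listed by phyml --help (check them against your installed version); the helper model_args and the file names are hypothetical, and the commands are only printed.

```bash
#!/usr/bin/env bash
# Sketch: command-line counterpart of the "User defined model parameters",
# assuming these phyml options (verify with `phyml --help`):
#   -d nt|aa  data type               -m  substitution model (e.g. HKY85, LG)
#   -c  number of rate categories     -a  gamma shape parameter
#   -v  proportion of invariable sites   -t  ts/tv ratio (DNA only)
# A value of "e" asks PhyML to estimate the parameter; a number fixes it.

# Usage: model_args <nt|aa> <model> <categories> <shape> <p_inv> [ts/tv]
model_args() {
  local type=$1 model=$2 ncat=$3 shape=$4 pinv=$5 tstv=${6:-}
  local args="-d $type -m $model -c $ncat -a $shape -v $pinv"
  if [ -n "$tstv" ]; then
    if [ "$type" != nt ]; then
      echo "the ts/tv ratio only applies to nucleotide data" >&2
      return 1
    fi
    args="$args -t $tstv"
  fi
  printf '%s\n' "$args"
}

# HKY85 + 4 gamma categories + invariable sites, every parameter estimated.
echo "phyml -i alignment.phy $(model_args nt HKY85 4 e e e)"
# LG with a fixed gamma shape of 0.5 and no invariable sites.
echo "phyml -i proteins.phy $(model_args aa LG 4 0.5 0)"
```

Rejecting a ts/tv value for amino-acid data mirrors the form, where the transition/transversion ratio is marked as DNA only.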

Optimization options

Starting tree: starting tree used to initiate the search.

User starting tree

Starting tree file: upload the custom starting tree in Newick format.

Constraint tree: optionally provide a tree to constrain the search space.

Constraint tree file: constraint tree in Newick format.

Random starting trees

Number of random starting trees: how many random starting trees should be generated.

Optimize tree topology: whether the tree topology is optimized.

Optimize branch lengths: whether branch lengths are optimized.

Add random starting trees: add random starting trees to the search.
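On the command line, the topology and branch-length choices are combined into a single optimization option. A sketch, assuming phyml's -o option accepts only tlr, tl, lr, l, r or n, together with -s (tree search), -u (user tree), --rand_start and --n_rand_starts as described by phyml --help; the helper optimize_flag and the file names are hypothetical, and commands are only printed.

```bash
#!/usr/bin/env bash
# Sketch: build phyml's -o value from the "Optimize tree topology" and
# "Optimize branch lengths" choices (plus rate parameters), assuming the
# values tlr, tl, lr, l, r or n (t = topology, l = branch lengths,
# r = rate parameters, n = nothing). Topology search without branch-length
# optimization has no matching value, so it is rejected.

# Usage: optimize_flag <topology yes|no> <lengths yes|no> <rates yes|no>
optimize_flag() {
  local flag=""
  [ "$1" = yes ] && flag="${flag}t"
  [ "$2" = yes ] && flag="${flag}l"
  [ "$3" = yes ] && flag="${flag}r"
  case "$flag" in
    tlr|tl|lr|l|r) printf '%s\n' "$flag" ;;
    "")            printf '%s\n' n ;;
    *) echo "unsupported combination: $flag" >&2; return 1 ;;
  esac
}

# SPR search from 5 random starting trees (--rand_start and --n_rand_starts
# are assumed to apply to SPR searches only).
echo "phyml -i alignment.phy -s SPR -o $(optimize_flag yes yes yes) --rand_start --n_rand_starts 5"
# Keep a user starting tree (hypothetical file) and optimize branch lengths only.
echo "phyml -i alignment.phy -u start_tree.nwk -o $(optimize_flag no yes no)"
```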

Branch supports

aLRT support: fast branch support method based on the approximate likelihood-ratio test (aLRT).

Standard bootstrap analysis: perform non-parametric bootstrap replicates.

Number of bootstrap replicates: number of standard bootstrap replicates.

Transfer bootstrap analysis: perform a transfer bootstrap expectation (TBE) analysis.

Number of transfer bootstrap replicates: number of transfer bootstrap replicates.
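The command-line program selects the branch-support method through one numeric option. A sketch, assuming the -b values documented by phyml --help (a positive number of bootstrap replicates, 0 for none, negative codes for aLRT variants and approximate Bayes); it also assumes the form's aLRT choice corresponds to the SH-like variant, leaves transfer bootstrap out, and uses a hypothetical helper support_flag.

```bash
#!/usr/bin/env bash
# Sketch: map a branch-support choice to phyml's -b value, assuming:
#   N > 0 = N standard bootstrap replicates, 0 = no support,
#   -1 = aLRT statistic, -2 = chi2-based aLRT, -4 = SH-like aLRT,
#   -5 = approximate Bayes. Transfer bootstrap is not covered here.

# Usage: support_flag none | alrt | bootstrap <replicates>
support_flag() {
  case "$1" in
    none) printf '%s\n' 0 ;;
    alrt) printf '%s\n' -4 ;;   # assumption: the form's aLRT is the SH-like variant
    bootstrap)
      # The replicate count must be a positive integer.
      if [[ "${2:-}" =~ ^[1-9][0-9]*$ ]]; then
        printf '%s\n' "$2"
      else
        echo "bootstrap needs a positive number of replicates" >&2
        return 1
      fi
      ;;
    *) echo "unknown support method: $1" >&2; return 1 ;;
  esac
}

echo "phyml -i alignment.phy -b $(support_flag alrt)"
echo "phyml -i alignment.phy -b $(support_flag bootstrap 100)"
```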

Extra options

Keep duplicate sequences: preserve duplicate sequences instead of collapsing them.

Print site likelihood: output per-site likelihood values.

Infer ancestral sequences: infer ancestral sequences for internal nodes.
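Two of these extra options can be expressed as command-line flags. A sketch, assuming the --print_site_lnl (per-site likelihoods) and --ancestral (ancestral sequences) options of recent PhyML releases; confirm both with phyml --help. Duplicate-sequence handling is not covered, and extra_args is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch: translate two "Extra options" checkboxes into phyml flags, assuming
# --print_site_lnl (per-site likelihoods) and --ancestral (ancestral
# sequences); verify both with `phyml --help` for your version.

# Usage: extra_args <print_site_lnl yes|no> <ancestral yes|no>
extra_args() {
  local args=()
  [ "$1" = yes ] && args+=(--print_site_lnl)
  [ "$2" = yes ] && args+=(--ancestral)
  printf '%s\n' "${args[*]}"
}

# Dry run: print the command instead of executing it.
echo "phyml -i alignment.phy $(extra_args yes yes)"
```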
