FUBAR (Fast, Unconstrained Bayesian AppRoximation) is a statistical method used in evolutionary biology to analyze coding sequences. Given an alignment of coding sequences and its corresponding phylogeny, it uses a Bayesian approach to infer the rates of nonsynonymous (dN) and synonymous (dS) substitutions at each site.
Understanding FUBAR in Detail
FUBAR's primary goal is to identify sites in a protein that are under positive or negative selection. Here's a breakdown:
- Bayesian Approach: FUBAR uses Bayesian statistics, which means it incorporates prior beliefs about the distribution of dN/dS ratios and updates these beliefs based on the observed data.
- dN/dS Ratio: The ratio of nonsynonymous to synonymous substitution rates (dN/dS, also denoted as ω) is a key indicator of selection pressure.
  - dN > dS (ω > 1): Suggests positive selection, meaning that changes in the amino acid sequence are favored.
  - dN < dS (ω < 1): Suggests negative (purifying) selection, meaning that changes in the amino acid sequence are disfavored.
  - dN = dS (ω = 1): Suggests neutral evolution, meaning that changes are neither favored nor disfavored.
- Per-Site Analysis: FUBAR estimates dN and dS on a site-by-site basis, allowing it to pinpoint specific amino acid positions within a protein that are evolving under different selection pressures.
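The three ω cases can be sketched as a small Python helper. This is only an illustration of how dN and dS are compared: the function name `classify_selection` and the tolerance value are assumptions made for this example, and FUBAR itself bases its calls on posterior probabilities rather than on a single point estimate.

```python
def classify_selection(dn: float, ds: float, tol: float = 1e-9) -> str:
    """Label a site from point estimates of dN and dS (illustrative only)."""
    if dn < 0 or ds < 0:
        raise ValueError("substitution rates must be non-negative")
    # Treat rates that differ by no more than `tol` as equal (omega = 1).
    if abs(dn - ds) <= tol:
        return "neutral"
    # dN > dS corresponds to omega > 1, dN < dS to omega < 1.
    return "positive" if dn > ds else "negative"


print(classify_selection(1.5, 0.5))  # dN > dS
print(classify_selection(0.2, 0.9))  # dN < dS
print(classify_selection(1.0, 1.0))  # dN = dS
```

Comparing the two rates directly, rather than dividing one by the other, avoids a division by zero at sites where dS is 0.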
Practical Applications of FUBAR
FUBAR identifies amino acid residues within a protein that are under selection. How this information is used depends on the type of selection detected.
- Identify sites under positive selection:
  - To identify potential drug-resistance mutations in viruses like HIV.
  - To understand the adaptive evolution of proteins in response to environmental changes.
- Identify sites under negative selection:
  - To understand the critical functional regions of a protein that are intolerant to change.
  - To determine important residues for protein stability or folding.
Example Scenario
Imagine analyzing the gene encoding the hemagglutinin protein of the influenza virus using FUBAR. The analysis may reveal:
- Sites under positive selection: These sites may correspond to amino acid positions in the hemagglutinin protein that are evolving rapidly to evade the host's immune system.
- Sites under negative selection: These sites may correspond to amino acid positions that are crucial for the protein's structure or function, where mutations are detrimental to the virus's fitness.
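In practice, a scenario like this ends with a table of per-site results that is filtered by posterior probability. The sketch below shows that filtering step in Python. The `SiteResult` record, its field names, and the numbers are all made up for illustration, and the 0.9 cutoff is an assumed threshold that should be checked against the settings of the tool actually used.

```python
from dataclasses import dataclass


@dataclass
class SiteResult:
    """Hypothetical per-site summary; field names are assumptions."""
    site: int
    prob_positive: float  # posterior probability that dN > dS
    prob_negative: float  # posterior probability that dN < dS


def call_sites(results, threshold=0.9):
    """Return (positive_sites, negative_sites) that pass the threshold."""
    positive = [r.site for r in results if r.prob_positive >= threshold]
    negative = [r.site for r in results if r.prob_negative >= threshold]
    return positive, negative


# Made-up values standing in for a real hemagglutinin analysis.
results = [
    SiteResult(site=1, prob_positive=0.02, prob_negative=0.97),
    SiteResult(site=2, prob_positive=0.95, prob_negative=0.03),
    SiteResult(site=3, prob_positive=0.48, prob_negative=0.45),
]

positive, negative = call_sites(results)
print("positive selection:", positive)  # prints: positive selection: [2]
print("negative selection:", negative)  # prints: negative selection: [1]
```

Site 3 is reported in neither list: the evidence does not clearly favor either kind of selection, which is different from showing that the site is neutral.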
Key Benefits of FUBAR
- Fast Computation: As the name suggests, FUBAR is designed to be computationally efficient, making it suitable for analyzing large datasets.
- Unconstrained: The "Unconstrained" aspect means the method does not force the distribution of dN/dS ratios across sites into a restrictive parametric shape.
- Bayesian Inference: Provides a probabilistic framework for estimating dN/dS ratios and quantifying uncertainty in the estimates.
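These properties are connected: FUBAR works on a discrete grid of candidate (dS, dN) rate pairs, so the Bayesian step for a site amounts to reweighting the grid points by how well each one explains that site. The Python sketch below illustrates the idea on a toy 3 x 3 grid. The grid values, the uniform weights, and the likelihoods are invented for this example. In a real analysis the weights are estimated from the whole alignment and the likelihoods come from a codon model evaluated on the phylogeny.

```python
import itertools


def site_posterior(weights, likelihoods):
    """Posterior over grid points for one site: weight x likelihood, normalized."""
    unnormalized = {pt: weights[pt] * likelihoods[pt] for pt in weights}
    total = sum(unnormalized.values())
    return {pt: value / total for pt, value in unnormalized.items()}


def prob_positive(posterior):
    """Posterior probability that dN > dS, summed over the grid."""
    return sum(p for (ds, dn), p in posterior.items() if dn > ds)


# Toy grid of (dS, dN) pairs; real grids are much finer.
rates = [0.5, 1.0, 2.0]
grid = list(itertools.product(rates, rates))

# Placeholder weights: uniform over the grid (an assumption for this sketch).
weights = {pt: 1.0 / len(grid) for pt in grid}

# Invented likelihoods for one site that favors grid points with dN > dS.
likelihoods = {(ds, dn): (2.0 if dn > ds else 0.5) for ds, dn in grid}

posterior = site_posterior(weights, likelihoods)
print(round(prob_positive(posterior), 3))  # prints: 0.667
```

Because the likelihood at each grid point can be computed once and reused, the per-site step reduces to a cheap sum over grid points. The resulting probability is itself a direct statement of uncertainty about the site.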