Software

BGFA: Bayesian canonical correlation analysis and group factor analysis

Given two or more paired observation matrices, BGFA finds sparse and dense latent components that capture either observation-specific covariance or covariance shared across observations. With m=2 observation matrices, this model is a canonical correlation analysis (CCA) model, in which the latent space is the linear projection that maximizes the correlation between the two observations.
The work is described in:

Zhao S, Gao C, Mukherjee S, and Engelhardt BE. “Bayesian group latent factor analysis with structured sparse priors” (submitted) [arXiv]

The BGFA software, written and maintained by Shiwen Zhao, is publicly available: [Software]
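
For the m=2 case described above, the classical (non-Bayesian) form of canonical correlation analysis can be sketched in a few lines of NumPy. This is only an illustration of the "projection that maximizes correlation" idea; it is not the BGFA model, it has no sparsity priors, and the function name `classical_cca`, the `reg` ridge term, and the simulated data are assumptions made for this example.

```python
import numpy as np

def classical_cca(X, Y, reg=1e-8):
    """Classical CCA for paired data X (n x p) and Y (n x q).

    Returns the canonical correlations s and projection matrices A, B
    such that corr(X @ A[:, k], Y @ B[:, k]) is approximately s[k].
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    # Sample covariances; a small ridge term (assumed) keeps them invertible.
    Sxx = X.T @ X / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Y.T @ Y / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = X.T @ Y / (n - 1)
    # Whiten each block with its Cholesky factor: Sxx = Lx Lx^T, Syy = Ly Ly^T.
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)
    M = np.linalg.solve(Lx, Sxy)        # Lx^{-1} Sxy
    M = np.linalg.solve(Ly, M.T).T      # Lx^{-1} Sxy Ly^{-T}
    # Singular values of the whitened cross-covariance are the canonical correlations.
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    A = np.linalg.solve(Lx.T, U)        # map back to the original coordinates
    B = np.linalg.solve(Ly.T, Vt.T)
    return s, A, B

# Simulated paired observations that share one latent component z.
rng = np.random.default_rng(0)
z = rng.normal(size=(500, 1))
X = z @ np.array([[1.0, -1.0, 0.5, 2.0]]) + 0.5 * rng.normal(size=(500, 4))
Y = z @ np.array([[1.5, -0.5, 1.0]]) + 0.5 * rng.normal(size=(500, 3))
s, A, B = classical_cca(X, Y)
```

In this simulation the first canonical correlation should be large because both matrices load on the same latent variable, while the remaining ones reflect only noise.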


BicMix: Bayesian biclustering via a doubly-sparse latent factor model

This software finds two sparse, low-dimensional matrices that capture sparse covariance structure in the response matrix.
The work is described in:

Gao C, Zhao S, McDowell IC, Brown CD, and Engelhardt BE. “Differential gene co-expression networks via Bayesian biclustering models” (submitted) [arXiv]

The BicMix software, written and maintained by Dr. Chuan Gao, is publicly available: [Software]; send questions and comments to: chuan.gao@duke.edu


Bayesian structured sparse regression

This software computes the posterior probability of inclusion for each covariate given a set of predictors (and a positive definite matrix describing their similarity) and a quantitative response. The work is described in:

Engelhardt BE and Adams RP. “Bayesian structured sparsity from Gaussian fields” (in review) [arXiv]

The software is available on GitHub [Software]


Posterior predictive checks (PPCs) for admixture models

This software fits the original admixture model to genomic data and encodes the process of performing a posterior predictive check with five possible discrepancy functions. The work is described in:

Mimno D, Blei DM, and Engelhardt BE. “Posterior predictive checks to quantify lack-of-fit in admixture models of latent population structure” (in review) [arXiv]

The software is available on GitHub [Software]
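
The general shape of a posterior predictive check can be sketched with a toy example. The sketch below is not the admixture model and does not use any of the five discrepancy functions from the paper: it assumes a single SNP, a Beta(1,1) prior on the allele frequency, and Hardy-Weinberg genotype sampling, and the names `posterior_predictive_pvalue` and `heterozygote_count` are hypothetical.

```python
import numpy as np

def heterozygote_count(genotypes):
    """Toy discrepancy function: number of heterozygous genotypes (coded 1)."""
    return int(np.sum(np.asarray(genotypes) == 1))

def posterior_predictive_pvalue(genotypes, discrepancy, n_rep=2000, seed=0):
    """Posterior predictive p-value for one SNP with genotypes coded 0/1/2.

    Assumed toy model: allele frequency p ~ Beta(1, 1) and each genotype
    ~ Binomial(2, p), so the posterior of p is available in closed form.
    """
    rng = np.random.default_rng(seed)
    g = np.asarray(genotypes)
    n = g.size
    alt = g.sum()  # count of alternate alleles among the 2n observed alleles
    # Draw allele frequencies from the conjugate Beta posterior.
    p_draws = rng.beta(1 + alt, 1 + 2 * n - alt, size=n_rep)
    observed = discrepancy(g)
    # Simulate one replicated data set per posterior draw and score it.
    replicated = np.array(
        [discrepancy(rng.binomial(2, p, size=n)) for p in p_draws]
    )
    # Fraction of replicates at least as extreme as the observed data.
    return float(np.mean(replicated >= observed))

# A sample with no heterozygotes, as might arise from two unmixed populations.
structured = [0] * 50 + [2] * 50
p_value = posterior_predictive_pvalue(structured, heterozygote_count)
```

A p-value close to 0 or 1 indicates that data replicated from the fitted model rarely resemble the observed data under the chosen discrepancy, which is the signal of lack of fit that a check of this kind is meant to surface.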


Sparse and dense factor analysis (SFAmix)

This software computes a low-rank matrix factorization with a combination of sparse and dense factor loadings for a given matrix, as described in:

Gao C, Brown CD, and Engelhardt BE. “A latent factor model with a mixture of sparse and dense factors to model gene expression data with confounding effects” (submitted) [arXiv]

Download C++ code, instructions, and documentation for SFAmix 1.0.


Data: publicly available eQTL study data with a uniform processing pipeline

These data sets have been processed through a single pipeline for gene expression and genotype data, as described in:

Brown CD, Mangravite LM, Engelhardt BE (2013). “Integrative modeling of eQTLs and cis-regulatory elements suggests mechanisms underlying cell type specificity of eQTLs” PLoS Genetics 9(8): e1003649. [PDF]

The processing differs from the pipeline noted above in the genotype data: we include genotypes imputed using Impute2 software with prephasing, we impute up to the 1000 Genomes reference data from March 2012, and we do not filter SNPs with low minor allele frequency (MAF). Note that the resulting imputed genotype files are in CHIAMO format.

[HapMap 3]


Sparse factor analysis (SFA)

This software uses ECME to compute a sparse, low-rank matrix factorization for a given matrix, as described in:

Engelhardt BE, Stephens M (2010) “Analysis of population structure: a unifying framework and novel methods based on sparse factor analysis.” PLoS Genetics 6(9):e1001117.

Download C++ code and instructions for SFA 1.0 and further documentation for the SFA model.


SIFTER: Statistical Inference of Function Through Evolutionary Relationships

SIFTER software and instructions reside at the Brenner Lab at UC Berkeley, although I am still actively maintaining the code. This software uses a statistical model to predict protein molecular function for unannotated proteins using functional annotations from a set of homologous proteins, as described in:

Engelhardt BE, Jordan MI, Srouji JR, and Brenner SE (2011) “Genome-scale phylogenetic function annotation of large and diverse protein families.” Genome Research (in press).