Events and calendar
Lab meeting focus: Poisson Processes Chapters 4-5.
Presenter: Genna Gliner
Today we discussed chapters four and five of Poisson Processes by J.F.C. Kingman. These two chapters continue the introduction to Poisson Processes that we began on March 13th. Chapter 4 focuses on Poisson Processes on the real line, where the random points have the additional property that they can be ordered. This leads to some interesting properties. For example, if a Poisson Process with constant rate models the arrivals in a queue, the Interval Theorem states that the times between successive arrivals are independent and each follows an exponential distribution. Chapter 5 introduces the idea of a marked Poisson Process. That is, it develops the theory of Poisson Processes when the random points can be distinguished by colors or some other label. This leads to the notion of Poisson Processes on a product space (the combination of the physical space and the space that contains these labels). It also leads to results like the Displacement Theorem, which states that if the points of a Poisson Process are displaced randomly and independently of one another, the displaced points again form a Poisson Process. Our group spent some time brainstorming how Poisson Processes are relevant in the context of genetics. While genetic data are not measured in terms of time, there are other dimensions of genetic data that share the same property of order. In particular, location in the genome is a property that can be ordered. Under this notion, we went on to discuss how the Coloring Theorem and the Marking Theorem can be applied in the context of SNPs, haplotypes, and other genetic markers of interest whose occurrence can be modeled by a Poisson Process.
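As a rough illustration of the Interval Theorem and of coloring, the sketch below builds a constant-rate Poisson Process on a line from exponential gaps and then labels each point independently. The rate, the interval length, and the labels "typeA" and "typeB" are made-up values for illustration only; they do not come from Kingman or from any genetic data set.

```python
import random


def simulate_poisson_process(rate, length, rng):
    """Points of a constant-rate Poisson Process on [0, length).

    Built by summing independent exponential gaps, which is the
    construction that the Interval Theorem describes.
    """
    points = []
    position = rng.expovariate(rate)
    while position < length:
        points.append(position)
        position += rng.expovariate(rate)
    return points


def colour_points(points, probs, rng):
    """Give every point an independent random label drawn from probs."""
    labels = list(probs)
    weights = [probs[label] for label in labels]
    coloured = {label: [] for label in labels}
    for point in points:
        label = rng.choices(labels, weights=weights)[0]
        coloured[label].append(point)
    return coloured


rng = random.Random(0)
rate, length = 2.0, 5000.0  # hypothetical: 2 events per unit of genome length

points = simulate_poisson_process(rate, length, rng)
gaps = [b - a for a, b in zip(points, points[1:])]
mean_gap = sum(gaps) / len(gaps)  # should be close to 1 / rate

# Hypothetical marker types; each point is coloured independently.
coloured = colour_points(points, {"typeA": 0.3, "typeB": 0.7}, rng)
rate_a = len(coloured["typeA"]) / length  # should be close to 0.3 * rate

print(f"mean gap: {mean_gap:.3f} (expected about {1 / rate:.3f})")
print(f"typeA rate: {rate_a:.3f} (expected about {0.3 * rate:.3f})")
```

Under the Coloring Theorem, the "typeA" and "typeB" points should each behave like an independent Poisson Process with a thinned rate, which is what the estimated rate for "typeA" is meant to suggest.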
March 13, 2015 Lab meeting focus: Poisson Processes Chapters 1-3.
Presenter: Genna Gliner
Today we discussed the first three chapters of Poisson Processes by J.F.C. Kingman. The motivation behind the Poisson Process is that the Poisson distribution has many wonderful properties that we can exploit when computing joint probabilities, moments, conditional expectations, and much more. Kingman shows that the nice properties of the Poisson distribution naturally arise or have counterparts in the theory of Poisson Processes. This leads to behaviors and patterns that can be used to derive analytic formulas and/or characterize a variety of interesting distributions, similar to the theory of Gaussian Processes. The Poisson distribution can be thought of as modeling the number of random points in a region of space. For instance, the Poisson distribution can be used to model the number of trees growing in an acre of land. A key (and powerful) property is that the numbers of trees in two disjoint acres of land are independent of each other. This independence means that joint distributions over disjoint regions factor into products, so we can compute them directly. While the Poisson distribution models the number of elements in a certain spatial region, a Poisson Process models the change in the number of elements if the spatial region were to change. It turns out that this change follows a Poisson distribution. Kingman shows that you can use this relationship to prove practical theorems about Poisson Processes. These include: the Superposition Theorem (the countable union of independent Poisson Processes is a Poisson Process), the Mapping Theorem (under certain conditions, a transformation of a Poisson Process from one state space to another is a Poisson Process), Campbell’s Theorem (gives conditions under which the sum of a real-valued function over the points of a Poisson Process converges, along with its expectation), and Rényi’s Theorem (gives conditions, which DO NOT include independence, under which a countable random subset of d-dimensional Euclidean space is a Poisson Process).
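The independence of counts in disjoint regions and the Superposition Theorem can be checked numerically with a small simulation. The sketch below is only an illustration on the real line: the rates 1.5 and 0.5 and the unit-length bins are arbitrary choices, not values from the book.

```python
import random


def poisson_points(rate, length, rng):
    """Points of a constant-rate Poisson Process on [0, length)."""
    points = []
    position = rng.expovariate(rate)
    while position < length:
        points.append(position)
        position += rng.expovariate(rate)
    return points


def unit_bin_counts(points, n_bins):
    """Number of points falling in each disjoint interval [i, i + 1)."""
    counts = [0] * n_bins
    for point in points:
        counts[int(point)] += 1
    return counts


rng = random.Random(1)
n_bins = 5000

# Superposition: merge two independent processes (arbitrary rates).
merged = poisson_points(1.5, float(n_bins), rng) + poisson_points(0.5, float(n_bins), rng)
counts = unit_bin_counts(merged, n_bins)

# For a Poisson count the mean and the variance coincide (here about 2.0).
mean = sum(counts) / n_bins
variance = sum((c - mean) ** 2 for c in counts) / (n_bins - 1)

# Counts in neighbouring disjoint bins should be uncorrelated.
pairs = list(zip(counts, counts[1:]))
covariance = sum((a - mean) * (b - mean) for a, b in pairs) / len(pairs)
correlation = covariance / variance

print(f"mean {mean:.3f}, variance {variance:.3f}, lag-1 correlation {correlation:.3f}")
```

A mean and variance both near 2.0 are consistent with the merged points forming a Poisson Process of rate 1.5 + 0.5, and a correlation near zero is consistent with (though weaker than) independence of the counts in disjoint bins.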
March 6, 2015 Lab meeting focus: Maximization Expectation.
Presenter: Derek Aguiar
The 2006 Welling and Kurihara paper formalizes the Maximization Expectation (ME) alternating model estimation framework, which reverses the roles of expectation and maximization in the well-known EM algorithm. A common characteristic of biological data is that the latent variables far outnumber the random model parameters. For example, clustering gene expression data requires a latent cluster assignment for each gene (~20k genes), while model parameters are confined to the number of clusters and cluster properties. The ME algorithm combines selection of model structure, e.g. the number of clusters, with hard assignments of latent variables, e.g. the cluster assignments, frequently leading to fast implementations. One of the important contributions of this paper is formalizing the four alternating model learning algorithms and placing them in a proper perspective (image, right).
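To make the role reversal concrete, here is a toy sketch of an ME-style alternation for one-dimensional clustering. It is not the algorithm from the paper: it assumes a fixed number of clusters, a known noise variance, equal mixing weights, and a normal prior on each cluster mean, and it omits the model-structure selection entirely. The function name me_cluster and all parameter values are hypothetical.

```python
def me_cluster(xs, k, sigma2=1.0, prior_mean=0.0, prior_var=100.0, max_iters=50):
    """Toy ME-style clustering of 1-D data (illustrative assumptions only).

    Expectation is taken over the cluster means (a full posterior is kept),
    while maximization picks a single hard assignment for every data point.
    """
    n = len(xs)
    # Deterministic start: split the sorted data into k equal-sized groups.
    order = sorted(range(n), key=lambda i: xs[i])
    z = [0] * n
    for rank, i in enumerate(order):
        z[i] = rank * k // n

    posteriors = []
    for _ in range(max_iters):
        # "E" over parameters: conjugate normal posterior for each cluster mean.
        posteriors = []
        for c in range(k):
            members = [x for x, zi in zip(xs, z) if zi == c]
            precision = 1.0 / prior_var + len(members) / sigma2
            mean = (prior_mean / prior_var + sum(members) / sigma2) / precision
            posteriors.append((mean, 1.0 / precision))

        # "M" over latent variables: maximizing the expected log-likelihood
        # is the same as minimizing (x - mean)^2 + posterior variance.
        new_z = [
            min(range(k), key=lambda c: (x - posteriors[c][0]) ** 2 + posteriors[c][1])
            for x in xs
        ]
        if new_z == z:
            break
        z = new_z
    return z, posteriors


# Two well-separated groups of unequal size (synthetic values).
data = [0.1 * i for i in range(-15, 15)] + [10 + 0.1 * i for i in range(-5, 5)]
assignments, posteriors = me_cluster(data, k=2)
print(assignments)
print([round(mean, 2) for mean, _ in posteriors])
```

Because every latent variable gets a single hard assignment, each pass only needs per-cluster counts and sums, which hints at why this style of update can be fast when latent variables far outnumber parameters.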