[Paper] Data Analysis with Latent Variable Models

Data Analysis with Latent Variable Models - Blei, 2014 ¶

Reading Purpose¶

Q: Why am I reading this?
A: To understand the motivation behind latent variable models. In particular, I want to be clear about the relations among probabilistic graphical models, latent variable models, and Bayesian inference. They all come up in very similar settings, but what exactly are they?

Reading Goal¶

Q: What is the product/outcome of this reading?

Be able to articulate the definition of Latent Variable Model

A latent variable model is a probabilistic model that encodes hidden patterns in the data (p.203)
Give an example in Text domain and Image domain where the model is used
Describe the general workflow using the model with Bayesian inference

(1) Build a model
Encode our assumptions about the data, and about the hidden quantities that could have generated the observations, as a joint distribution of hidden random variables and observation (aka. data) random variables.

Output of this "build" step:
- Definition of H (hidden random variables) and V (observation random variables)
- Description of the model in one of the following specifications:
  - Describe the model with its generative process, or
  - Describe the model with its joint distribution, P(H,V), or
  - Describe the model with its graphical model representation

(2) Compute
Given observation data D, compute the conditional distribution of the hidden variables given D (ie. the random variable here can be written as "h|x=D").

NB: this is not the same random variable as "h".
This computing process is often referred to as "inference", as in "Bayesian inference". However, this is different from the "inference" that describes using a trained model at test time to "infer", for example, the class of a new test image. Here, we are "inferring" the conditional random variable h|x=D, which is equivalent to saying "compute the conditional distribution of the random variable h|x=D".
NB: This conditional distribution is often referred to as the posterior. The name makes sense because we are looking at the hidden quantities "posterior" to the data observation process. This term comes from the Bayesian community. But a model in itself is neither Bayesian nor frequentist (see Dustin's post): a model is just a joint distribution of hidden and observation random variables. To compute the conditional distribution of hidden|observed data, or the predictive distribution P(Xnew|X=data), we can use either a frequentist's tool (eg. EM) or a Bayesian's tool (eg. hierarchical something <- I don't know what this is). So I will stick to calling this probability distribution the "conditional" (as opposed to "posterior") distribution.
So, the outputs of the "compute" step are:
- the conditional distribution, P(H|X=D)
- the predictive distribution, which can be computed from the conditional distribution above
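
To make the "build" and "compute" steps concrete for myself, here is a minimal sketch on a toy model that is not from the paper: a hidden coin type h and observed flips x. The names `PRIOR`, `P_HEADS`, `joint`, `conditional`, and `predictive_heads`, as well as all the numbers, are my own assumptions for illustration. Because h is discrete, the conditional distribution can be computed exactly by normalizing the joint.

```python
from math import prod

# Hypothetical toy model (not from Blei 2014): hidden coin type h, observed flips x.
PRIOR = {"fair": 0.5, "biased": 0.5}       # p(h)
P_HEADS = {"fair": 0.5, "biased": 0.8}     # p(x_i = 1 | h)


def joint(h, flips):
    """Build step: p(h, x) = p(h) * prod_i p(x_i | h)."""
    p = P_HEADS[h]
    return PRIOR[h] * prod(p if x == 1 else 1 - p for x in flips)


def conditional(flips):
    """Compute step: p(h | x = D), obtained by normalizing the joint over h."""
    unnorm = {h: joint(h, flips) for h in PRIOR}
    evidence = sum(unnorm.values())        # p(x = D)
    return {h: v / evidence for h, v in unnorm.items()}


def predictive_heads(flips):
    """p(x_new = 1 | x = D) = sum_h p(x_new = 1 | h) * p(h | x = D)."""
    post = conditional(flips)
    return sum(P_HEADS[h] * post[h] for h in post)


D = [1, 1, 1, 0, 1]                        # observed data: four heads, one tail
post = conditional(D)
pred = predictive_heads(D)
print(post, pred)
```

With mostly heads in D, the conditional distribution puts more mass on "biased" than on "fair", and the predictive probability of heads lands between the two per-coin probabilities.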

(3) Critique [todo]

Prelim¶

Q1: What is a probabilistic graphical model?¶

Stanford's CS228:

"Probabilistic graphical models are a powerful framework for representing complex domains using probability distributions, with numerous applications in machine learning, computer vision, natural language processing and computational biology. Graphical models bring together graph theory and probability theory, and provide a flexible framework for modeling large collections of random variables with complex interactions."

Q2: What is a Latent Variable Model?¶

Cross Validated:

"...latent variable models are graphical models where some variables are not seen, but are the causes to the observations. Some of the earliest models were factor analysis. Here the idea is to find a representation of data which reveals some inherent structure of the data..." - link

Q3: What do you mean when you use "Bayesian" to describe a model or an inference method?¶

Dustin Tran's comment on calling a model a "Bayesian model" (See Bullet 1)

post

I strongly believe models should simply be framed as a joint distribution p(x, z) for observation/data variable x and latent variables z

NB: this is in line with Blei's definition on a model, provided in this article (pg. 207)

A model is a joint distribution of x and h, p(h,x | hyperparam) = p(h|hyperparam) * p(x|h), which formally describes how the hidden variables and observations interact in a probability distribution

Dustin continues his comment on calling a model either "bayesian" or "frequentist". He argues this is not the right way to communicate because "there is no difference!"

... They are all just "probabilistic models". The statistical methodology -- whether it be Bayesian, frequentist, fiducial, whatever -- is about how to reason with the model given data. That is, they are about "inference" and not the "model" (Bullet 2)

This comment really clarifies my confusion on when to use the adjective "Bayesian" (ie. to describe a model? or a method of inference?):

The "Bayesian" approach is one way to do your inference (eg. compute the conditional distribution P(hidden vars | x = observed_data)).
NB: I'm intentionally using the term conditional distribution (rather than posterior distribution) of hidden variables, because "posterior" is the term most often associated with Bayesian inference. But, as Dustin says, we can do inference (ie. compute, exactly or approximately, the conditional distribution of the random variable z|x=Data) using either what we would call a Bayesian tool (eg. hierarchical models <-- I don't know what this is) or a frequentist tool (eg. EM).
NB2: in the Bayesian framework, "posterior" means "after observations are gathered and incorporated into our reasoning about the hidden variables, ie. those that remain unobservable".

Sec2: Model¶

The description of how the observations arise from the hidden quantities (aka. the generative process of the data) is where everything starts. The story you are constructing/assuming about this generative process can be expressed as 1) plain English, 2) the joint distribution of the hidden and observation variables, together with how that joint distribution factorizes, and 3) the probabilistic graphical model. So the first thing is to write down this generative process (btw this is "your" choice, "your" story, ie. the assumptions you choose/hypothesize).

Step 1: Write in plain English what hidden quantities are assumed to have given rise to the observations
Step 2: Write in plain English how they give rise to the observations. That is, what is the dependency like between the hidden quantities and the observations?
Step 3: Now, translate the English description (that is, the definition of your model and the dependency relations) into a mathematical expression using the joint distribution (<- encodes your assumptions about the data generative process) and its factorization (<- encodes dependency relations)
Step 4: Represent the joint distribution (or the generative process) as a graphical model
Generative process
indicates a pattern of dependence among the random variables

Joint distribution
The traditional way of representing a model is with the factored joint distribution of its hidden and observed variables
This factorization comes directly from the generative process
Graphical model
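
A small sketch of how the generative process and the factored joint distribution mirror each other. The model here is my own toy assumption (not one of the paper's examples): a single hidden h ~ N(0, 1) and observations x_i | h ~ N(h, 1). The function names `generate`, `log_joint`, and `normal_logpdf` are hypothetical helpers.

```python
import math
import random


def normal_logpdf(x, mean, std):
    """Log density of a univariate Gaussian."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2 * math.pi)


def generate(n, rng):
    """Generative process: draw the hidden h first, then each x_i given h."""
    h = rng.gauss(0.0, 1.0)                       # h ~ N(0, 1)
    xs = [rng.gauss(h, 1.0) for _ in range(n)]    # x_i | h ~ N(h, 1)
    return h, xs


def log_joint(h, xs):
    """Joint distribution with the same factorization:
    log p(h, x) = log p(h) + sum_i log p(x_i | h)."""
    return normal_logpdf(h, 0.0, 1.0) + sum(normal_logpdf(x, h, 1.0) for x in xs)


rng = random.Random(0)
h, xs = generate(5, rng)
lj = log_joint(h, xs)
print(h, xs, lj)
```

Each sampling line in `generate` corresponds to one factor in `log_joint`, which is the sense in which the factorization "comes directly from the generative process". The graphical model for this toy case would be one node h with an arrow into each x_i.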

Aside: Why a joint distribution?¶

Q: What can we do with a joint distribution of the hidden and observable variables?
A: The joint distribution is like the root of all other distributions. For example, we can derive the conditional distribution of the hidden variables given that the observable variables take specific values (ie. the observations in our data): P(H|X=Data).
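
Written out, this is just the definition of conditional probability applied to the joint. The notation (lowercase h, x and data D) is my own shorthand; for discrete hidden variables the integral becomes a sum.

```latex
p(h \mid x = D) \;=\; \frac{p(h,\, x = D)}{p(x = D)},
\qquad
p(x = D) \;=\; \int p(h,\, x = D)\, dh
```

The denominator is the joint with the hidden variables marginalized out, so everything on the right-hand side comes from the model itself.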

NB2: Why the conditional distribution of H|X=Data?¶

NB: aka. the posterior distribution of the hidden variables given observations

We use the posterior to examine the particular hidden structure that is manifest in the observed data. We also use the posterior (over the global variables) to form the posterior predictive distribution of the future data. ... In section 5, we discuss how the predictive distribution is important for checking and criticizing latent variable models
[Blei 2014, pg. 209]
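
For my own reference, the posterior predictive distribution in the quote can be written as follows. This is a sketch in my notation, under the assumption that a new observation is conditionally independent of the data D given the hidden variables h.

```latex
p(x_{\text{new}} \mid x = D) \;=\; \int p(x_{\text{new}} \mid h)\; p(h \mid x = D)\, dh
```

So the predictive distribution averages the model's likelihood for new data over the conditional distribution of the hidden variables.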

Story so far,¶

model = joint distribution of hidden and observable variables
Once we have a defined model and observations (D), we can compute the conditional distribution of H|X=D
- What does this posterior distribution tell us? It helps examine the particular hidden structure that is manifest in the observed data
- From the conditional distribution, we can compute the predictive distribution of the future data (given the model and data D). This predictive distribution is important for checking and criticizing latent variable models

Sec3: Example models¶

Q: Why am I reading this? What is the outcome/product of reading this section?

A: Articulate the 5 example models in each of the three ways of specifying a model. See Sec. 2.1, 2.2, 2.3 for each of the three ways.
- Describe each model by its generative process (sec 2.1)
- Describe each model by its joint distribution (sec 2.2)
- Describe each model by its graphical model (sec 2.3)

Most important figure blei2014

[todo]

Gaussian Mixture model¶

Linear Factor model¶

More general category is called "Factor models"
Factor models are important as they are components in more complicated models (eg. the Kalman filter)
Examples of statistical models that fall into this category: principal component analysis, factor analysis, canonical correlation analysis
Relation to the Gaussian mixture model: in a Gaussian mixture model, the z_n's are discrete random variables. Factor models use a continuous hidden variable z.
Generative process
Joint Distribution
Represent the model as a graphical model
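
To see the discrete vs. continuous contrast in code, here is a rough sketch of sampling one data point from each kind of model. All parameters (`MIX_WEIGHTS`, `MIX_MEANS`, `LOADINGS`, `NOISE_STD`) and the function names are made up for illustration, and the linear factor model is assumed to have the common form x_n = W z_n + Gaussian noise.

```python
import random

rng = random.Random(0)

# Hypothetical parameters, chosen only for illustration.
MIX_WEIGHTS = [0.3, 0.7]      # p(z_n = k)
MIX_MEANS = [-2.0, 3.0]       # mean of mixture component k
LOADINGS = [[1.0, 0.0],       # W: maps a 2-d hidden z to a 3-d observation
            [0.5, 0.5],
            [0.0, 2.0]]
NOISE_STD = 0.1


def sample_mixture():
    """Gaussian mixture: the hidden z_n is a discrete cluster index."""
    z = rng.choices(range(len(MIX_WEIGHTS)), weights=MIX_WEIGHTS)[0]
    x = rng.gauss(MIX_MEANS[z], 1.0)
    return z, x


def sample_factor():
    """Linear factor model: the hidden z_n is a continuous vector."""
    z = [rng.gauss(0.0, 1.0) for _ in range(2)]
    x = [sum(w * zk for w, zk in zip(row, z)) + rng.gauss(0.0, NOISE_STD)
         for row in LOADINGS]
    return z, x


z_mix, x_mix = sample_mixture()
z_fac, x_fac = sample_factor()
print(z_mix, x_mix)
print(z_fac, x_fac)
```

In the mixture, z picks which component generated x; in the factor model, z is a low-dimensional coordinate that the loadings map into observation space.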

Mixed-Membership model¶

Matrix factorization model¶

Hidden Markov model¶

Kalman filter model¶

Running list of definitions

Running list of terms I can't give a definition or construct an example/story out of it yet

posted at 00:00 · A collection of reading lists

How to read a paper

Ref: medium

Before you start

Q: "why are you reading this?"

Write it down where you can see it while reading the paper
- Your purpose/goal of reading may change later. You will have a different experience then.
Is there a clear answer for this question? If not, you probably should not go on reading the paper

Warm-up (1 hr)

Think of it like going on a date with a new person. It's a new relationship, so don't try/expect to understand it in one go -- this is rude:)

Go to a quiet place for a few hours. Take your coffee with you
Start by reading the title and abstract
- Goal: gain a high level overview of the paper
- What are the main goals of the authors?
- What are the high level results?
- What is the problem the paper is solving?
Skim the paper (~15min)
- Look at the figures
- Jot down any keywords to look out for when reading
- Goal: get a sense for the layout of the paper; get keywords to look out for
Go to introduction, especially if you feel unfamiliar with the field/paper. Okay to do it often.
- Goal: get other references to fill in the gap in your understanding
Carefully step through each figure
- Why?: each figure contains key points of the paper. Authors spend a lot of time creating them and try to condense important information that supports their experiments/hypothesis. Pay particular attention to them.
- Goal: Gain feel for what the authors think is most important; Write down what to look out for when reading the paper in detail (which will follow soon)
Take a break. Walk a bit.

First ~pass~ date (1.5hr)

Start taking high level notes. Expect new words, unfamiliar ideas. Mark those (you don't yet need to understand every single word), move on.

This is your first date with the paper. You are not going to learn all gory details about it, but you will ask good questions, understand what motivated the paper, and what it's going to be about.

Begin again with the abstract, skim through the introduction
Diligent pass through the methods section
- Goal: Sketch out the overall setup
Read the results and discussion
- Goal: write down the key findings and how they were determined
Take a break. Do jumping jacks. Sing a song.

Let's continue.

Revisit the figures: by now, you should be able to get into the nitty-gritty of the figures (having read the methods, results, and discussion sections)
- Goal: find more gems from the figures.
- Spend about 30min ~ 1hr

Second full pass (1-2hrs)

Goal:

Focus on shoring up what you didn't understand previously
Gain a command of the methods section
- Test if you can write pseudocode
Be a critical reader of the discussion section

Details:

Pay particular attention to the areas you marked as being difficult to understand. This is why you read a new paper. Don't play safe. Okay to feel uncomfortable. Okay to do it the following day (but don't push it back too much).
Leave no word undefined, unclear. Make sure you understand every sentence.
Skim through areas you feel confident in (eg. abstract, intro, results)

Guiding Questions

from Quora
What previous research and ideas were cited that this paper is building off of? (usually introduction)
Was there reasoning for performing this research, if so what was it? (introduction)
Clearly list out the objectives of the study
Did you write down 3 on your note?
Was any equipment/software used? (methods)
What variables were measured during experimentation? (methods)
Were any statistical tests used? What were their results? (methods/results)
What are the main findings? (results)
How do these results fit into the context of other research and their 'field'? (discussion)
Explain each figure and discuss their significance.
Did you write down 9 on your note?
Can the results be reproduced and is there any code available?
Name the authors, year, and title of the paper!
Are any of the authors familiar, do you know their previous work?
What key terms and concepts do I not know and need to look up in a dictionary, textbook, or ask someone?
What are your thoughts on the results? Do they seem valid?

Apply the technique

Most importantly, apply this guideline to your reading. Modify it to suit your personality.

Write a reading report

This is the end product of your reading. Without it, you didn't do your job.
^Really.

To check out

Jason Eisner (JHU): [how to read a paper](https://www.cs.jhu.edu/~jason/advice/how-to-read-a-paper.html)
Prof. Murat at Buffalo:
- [how to lead a reading group](https://tinyurl.com/rbree4d)
- [how he reads a paper](http://muratbuffalo.blogspot.com/2013/07/how-i-read-research-paper.html)
- how Prof. Nancy Lynch works: cool!
Cathy Wu, MIT: [how to lead a reading group](https://tinyurl.com/rbree4d)


Small Simplicity

Understanding Intelligence from Computational Perspective

Feb 17, 2020

[Paper] Data Analysis with Latent Variable Models

Jan 16, 2020

How to read a paper
