Small Simplicity

Understanding Intelligence from Computational Perspective

Feb 23, 2021

Short note on coarse-graining

One of the axioms in Shannon's information theory is that (Shannon's) entropy satisfies the coarse-graining property:

(Figure: the coarse-graining axiom; from Information Theory for Intelligent People by S. DeDeo)
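
The identity in the missing figure is presumably Shannon's grouping (coarse-graining) property: if the outcomes of \(X\) are partitioned into groups \(G\) (yielding the coarse-grained variable \(\tilde{X}\)), then

$$ H(X) = H(G) + \sum_{g \in G} P(g)\, H(X \mid g) $$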

This property is closely related to conditional probability. In communication -- regardless of the types of agents involved (eg. between two people over a phone, between a parent cell's DNA and a daughter cell's DNA, between a disk storage at time T and that at time T+10, or between me, the writer of this article, and you, the reader) -- there is some 'tolerance' bound that allows "good-enough" intention/semantics to be transmitted and understood between the sender and the receiver. How is this idea related to Rate-Distortion theory or error-correcting codes? Can this idea help us understand/define "semantic" information? (By contrast, Shannon's information measure is often called "syntactic" because it is ignorant of/invariant to the identities of the events whose probabilities we are measuring the uncertainty of.)

Pondering...
  • Coarse-graining/level of details when describing a process
  • As we 'abstract' away from the particular representational form of an event/instance, we move from a semantics+form domain toward a semantics+less-form domain. This allows me to say "The chair is blue" and you to understand what general color the chair is.
  • At which level of abstraction, ie. at which rung of this ladder of coarse-graining, do we get a sufficient (ie. good enough to communicate our intentions) level of semantics?
  • If we measure $H(\tilde{X})$ at that level, can we say that quantity measures 'semantic information'?
  • The difference $H(G) = H(X) - \sum_{g} P(g) H(X \mid g)$ is the force/gradient that drives the flow of information -- but information of what?
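
A tiny numerical check of that grouping identity (my own toy distribution, not from DeDeo's notes):

    # Verify H(X) = H(G) + sum_g P(g) H(X|g) on a made-up distribution.
    import math

    def H(probs):
        """Shannon entropy in bits."""
        return -sum(p * math.log2(p) for p in probs if p > 0)

    # A distribution over 4 outcomes, coarse-grained into 2 groups:
    # group A = {x1, x2}, group B = {x3, x4}
    p = {"x1": 0.4, "x2": 0.1, "x3": 0.3, "x4": 0.2}
    groups = {"A": ["x1", "x2"], "B": ["x3", "x4"]}

    # Entropy of the fine-grained process
    H_X = H(p.values())

    # Entropy of the coarse-grained process, H(G)
    p_g = {g: sum(p[x] for x in xs) for g, xs in groups.items()}
    H_G = H(p_g.values())

    # Average within-group entropy: sum_g P(g) H(X|g)
    H_within = sum(p_g[g] * H([p[x] / p_g[g] for x in xs])
                   for g, xs in groups.items())

    print(H_X, H_G + H_within)  # both ~1.8465: the identity holds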

Feb 23, 2020

Basic concepts in measure theory

Measure

(Figure: Orbanz, MLSS09 lecture 1 slide) Intuition: roughly, a measure is an integral regarded as a function of its region:

$$ \mu(A) = \int_{A} dx ~~\text{or,} ~~~\mu(A) = \int_{A} p(x) dx $$

For example, in the geometric case, \(\mu(A)\) can be interpreted as the (physical) length (if \(A\) is one-dimensional), area (if two-dimensional), or volume (if three-dimensional) of a region \(A\). In the case that \(\mu\) is a probability measure, \(\mu(A)\) is the probability mass of the event "random variable \(X\) takes values in the set \(A\)" (also called event \(A\)).
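
As a quick sketch of the "integral as a function of its region" reading (my own example, assuming a standard normal density):

    # mu(A) = integral over A of p(x) dx, here with A = [0, 1] and
    # p the standard normal density; compare a Riemann sum to the closed form.
    import math

    def p(x):
        """Standard normal density."""
        return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

    a, b, n = 0.0, 1.0, 100_000
    dx = (b - a) / n
    mu_A = sum(p(a + (i + 0.5) * dx) for i in range(n)) * dx  # Riemann sum

    # Closed form for comparison: P(0 <= X <= 1) = Phi(1) - Phi(0)
    phi = lambda x: 0.5 * (1 + math.erf(x / math.sqrt(2)))
    print(mu_A, phi(1.0) - phi(0.0))  # both ~0.3413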

Density

(Figures: Orbanz, MLSS09 lecture 1 slides on densities) A (probability) density is a function that transforms one measure into another measure by pointwise reweighting (on the abstract sample space \(\Omega\)).
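
In symbols (my paraphrase, matching the integrals above): a density \(p\) transforms a measure \(\mu\) into a new measure \(\nu\) via

$$ \nu(A) = \int_{A} p(x) \, \mu(dx) $$

so taking \(\mu\) to be Lebesgue measure \(dx\) recovers the familiar \(\nu(A) = \int_{A} p(x) dx\).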

Measure-theoretic formalism for Probability

(Figure: the measure-theoretic framework -- abstract probability space, random variable, and observation space)

  • abstract probability space vs. observation space: Think of the abstract probability space as the entire system of the universe. A point in the space is a state of the universe (eg. a long vector of values assigned to all existing atoms' states). We often don't have direct access to this "state", ie. it is not fully observable to us. Instead, we observe/measure variables that are functions of this atomic configuration/state (\(\omega\)). This mapping, from a state of the universe to the value that the variable of our interest is observed/measured to take, is called a "Random Variable".

Random Variable

(Figure: Orbanz, MLSS09 lecture 1 slide on random variables)

  • A random variable is a function that maps an outcome in the sample space \(\Omega\) of the abstract probability space to the sample space of the observation space (often \(\mathbb{R}\))
  • It is the key component that connects the abstract probability space (which we don't get to observe directly) to the observation space
  • The image measure \(\mu_{X}\) is the (derived/induced) measure on the observation space, related to the abstract probability space via the random variable \(X\)
    • We need it since the measure \(\mathbb{P}\) on the abstract probability space is not known explicitly, but we still need a way to describe the measure of sets in the Borel sets of the observation space
    • To assign a measure to an event in the observation space, we use the image measure \(\mu_{X}\), which is linked to \(\mathbb{P}\) via:

$$ \mu_{X}(A) := \mathbb{P}(X^{-1}(A))$$

  • In other words, we compute the probability measure of an event \(A\) (ie. the probability that the random variable \(X\) takes a value in the set \(A\)) by:
    1. Map the set \(A\) in the observation space to a set in the abstract probability space: \(A^{\leftarrow} = X^{-1}(A)\)
    2. Compute the probability of the event \(A^{\leftarrow}\) using \(\mathbb{P}\)
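
A tiny discrete sketch of this two-step recipe (the spaces and the map below are made up for illustration, not from the lecture):

    # Abstract sample space Omega: six equally likely 'states of the universe'.
    # Random variable X maps each state to an observed value in {0, 1, 2}.
    P = {w: 1 / 6 for w in range(6)}          # measure on the abstract space
    X = {0: 0, 1: 0, 2: 1, 3: 1, 4: 1, 5: 2}  # the random variable, as a dict

    def image_measure(A):
        """mu_X(A) = P(X^{-1}(A)): pull A back, then measure the preimage."""
        preimage = {w for w, x in X.items() if x in A}  # step 1: A_leftarrow
        return sum(P[w] for w in preimage)              # step 2: measure it

    print(image_measure({1}))     # P({2, 3, 4}) = 0.5
    print(image_measure({0, 2}))  # P({0, 1, 5}) = 0.5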

Relationship between two random variables and their image measures

"Density" describes the relationship between two random variables and their image measures:

Source

  • Theoretical Foundations of Nonparametric Bayesian Models, by P. Orbanz. MLSS2009: video part 1, 2. Slides 1, 2. A great introduction to measure theory, in just enough detail to be relevant for statistics (and nonparametric Bayesian models)

More resources

  • MLSS09 all lecture and slide links: here

Feb 22, 2020

KL Divergence

Resource

  • Lec on VAE, by Ali Ghodsi: This lecture motivates KL divergence as measuring the difference in the average information content of two random variables, whose distributions are \(p\) and \(q\) in the article.
  • Wiki: It clears up the different terminologies that are (mis)used to refer to the KL divergence.
  • Information Theory for Intelligent People, by Simon Dedeo
    • It gives a great example of "answering 20 questions" problem as a way to think about basic concepts in info theory, including entropy, KL divergence and mutual information.
    • \(H(X)\) is equal to the average depth of the optimal question tree, ie. the expected number of yes/no questions needed to get to choice \(x\)
    • "(Using \(H(X)\),) (f)or any probability distribution, we can now talk about "how uncertain we are about the outcome", "how much information is in the process", or "how much entropy the process has", and even measure it, in bits" (p.3)

Feb 22, 2020

Use `Make` for Reproducible Research

Basics of make for Reproducible Research

  • A research project can be seen as a tree of dependencies

    • the report depends on the figures and tables
    • the figures and tables depend on the data and the analysis scripts used to process this data
  • Make is a tool for creating output files from their dependencies through pre-specified rules

  • Make is a build automation tool
    • Makefile: a configuration file that contains the rules for what to build
    • Make builds targets using recipes (see the minimal Makefile sketch after this list)
    • targets can optionally have prerequisites
    • prerequisites can be files on your computer or other targets: prerequisites == dependent files/targets
    • Make figures out what to build based on the DAG of the targets and prerequisites (ie. its dependencies)
      • targets are updated only when needed, based on the modification time of their dependencies
    • Phony targets (eg. all, clean): targets that don't actually create an output file
      • they are always run whenever they come up as a dependency
      • but, without the .PHONY declaration, they will no longer be run if a directory/file named all or clean is ever created
      • To declare targets as phony, add a line at the top of the Makefile like:
        .PHONY: all clean test
        
    • Automatic Variables and Pattern Rules
      • $<: first prerequisite
      • $@: target
      • %: wildcard for pattern rules
    • Variables and Functions: both use $(...) syntax. For example,
      ALL_CSV = $(wildcard data/*.csv)
      INPUT_CSV = $(wildcard data/input_file_*.csv)
      DATA = $(filter-out $(INPUT_CSV),$(ALL_CSV))
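
Putting these pieces together, here is a minimal sketch of a Makefile for the report/figures/data project described above (all file and script names are hypothetical):

    .PHONY: all clean

    # main target: build the final report
    all: report.pdf

    # the report depends on its source and the figure
    # (recipe lines must begin with a TAB)
    report.pdf: report.tex figure.png
    	pdflatex report.tex

    # pattern rule: build any .png from the matching data/*.csv
    # $< = first prerequisite (the csv), $@ = the target (the png)
    %.png: data/%.csv scripts/plot.py
    	python scripts/plot.py $< $@

    clean:
    	rm -f report.pdf *.png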
      
Caveats
  • Use spaces as delimiters. Don't use commas.
  • Indent with tabs in Makefiles; recipe lines indented with spaces will not work.
  • Make executes each line of a recipe independently, in a separate subshell
  • Make executes the first target when no explicit target is given

Recommendation

  • Put all as the first target in the Makefile
  • Name the main target all: it's a convention many people follow
    • all == reference to the main target of the Makefile
    • In other words, this is the target that generates the main desired output(s)
    • Put multiple outputs as the prerequisite of the main target (all)
    • This allows the user to call make in the commandline, and get the desired output(s)
      • All the other rules are there to help build that output (in the simplest case)
  • Design your project's directory structure and Makefile hand in hand
  • Use all capitals for variable names; define variables at the top of the Makefile
  • start small and start early!
