Small Simplicity

Understanding Intelligence from Computational Perspective

Feb 01, 2020

Multimodal Distribution in Image or Text domain

Q: What does "multimodal distribution" mean in the computer vision literature (e.g. image-to-image translation)?

While reading papers on conditional image generation using generative modeling (e.g. "Toward Multimodal Image-to-Image Translation" by Zhu et al. (NIPS 2017)), it wasn't clear to me what was meant by a "one-to-many mapping" between the input image domain and the output image domain, a "multimodal distribution" in the output image domain, or "multi-modal outputs" (e.g. Quora).

Definition

In statistics, a multimodal distribution is a continuous probability distribution with two or more modes (distinct peaks; local maxima) - wikipedia

[Figures: a (single-variable) bimodal distribution and a bivariate multimodal distribution]

Consider a distribution over a high-dimensional space, such as an image domain: \(P(X)\), where \(X\) lives in a \(d\)-dimensional space and \(d\) is the number of pixels, e.g. \(32 \times 32 = 1024\). If each pixel \(X_i\) takes a binary value (0 or 1), the size of this image domain is \(2^{1024}\). If each pixel takes an integer value \(\in [0, 255]\), then the size of this image domain is \(256^{1024}\). This, by the way, is too big for Mac's Spotlight to compute:

[Screenshot: Spotlight fails to evaluate \(256^{1024}\)]
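As a quick sanity check, Python's arbitrary-precision integers handle these counts just fine:

n_pixels = 32 * 32                # 1024
print(len(str(2 ** n_pixels)))    # 309 digits, i.e. ~1e308 binary images
print(len(str(256 ** n_pixels)))  # 2467 digits, i.e. ~1e2466 images with 8-bit pixels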

Saying "the distribution of (output) images is multimodal" means that there are multiple images (i.e. realizations of the random variable/vector \(X\)) at which the probability density attains a (local) maximum. In the figure below, the green dots represent the local maxima, i.e. the modes of the distribution. The configurations (i.e. specific values/realizations) that achieve a (local) maximum of the probability density are the "probable/likely" images.

[Figure] The green dots represent the modes of the distribution over the image domain (abstracted into a 2-dim space for visualization).

So, given one input image, if the distribution of the output image random variable is multimodal, the standard problem

Find \(x^{*} = \underset{x \in \mathcal{X}}{\arg\max}\, P(X = x)\) (where \(\mathcal{X}\) is the image space)

has multiple solutions. According to the paper (Toward Multimodal Image-to-Image Translation), many previous works produce a "single" output image as "the" argmax of the output image distribution. But this is not accurate if the output image distribution is multimodal. We would like to see/generate as many of those argmax configurations/output images as possible. One way to do so is to sample from the output image distribution, and this is the paper's approach.
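To make this concrete, here is a minimal toy sketch (my own, not the paper's model): a 1D mixture of two Gaussians, where a single argmax reports only one of the peaks, while sampling produces outputs around both modes.

import numpy as np

rng = np.random.default_rng(0)

# Toy bimodal density: equal-weight mixture of two 1D Gaussians
# with modes near x = -2 and x = +3 (purely illustrative).
def gauss(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def pdf(x):
    return 0.5 * gauss(x, -2.0, 0.5) + 0.5 * gauss(x, 3.0, 0.8)

# A single argmax reports only one of the two peaks...
xs = np.linspace(-6.0, 8.0, 2000)
print("argmax of the density:", xs[np.argmax(pdf(xs))])   # ~ -2.0

# ...whereas sampling from the mixture gives "outputs" around *both* modes.
which = rng.integers(0, 2, size=10)                        # pick a mixture component
mus, sigmas = np.array([-2.0, 3.0]), np.array([0.5, 0.8])
samples = rng.normal(mus[which], sigmas[which])
print("samples:", np.round(samples, 2))                    # cluster around -2 and +3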


Multimodal distribution as a distribution over the space of target domains [domain adaptation/transfer learning]

So far, I have viewed a multimodal distribution as a distribution over a specific domain (e.g. the image domain), where a realization of the random variable corresponds to an observed/sampled/output image instance. However,

Jan 20, 2020

Stochastic Thinking: Predictive non-determinism

MIT 6.0002 Lec 4: Stochastic Thinking - YouTube - predictive-nondeterminism

Often-confusing categorizations of a mathematical model (see SE):

  • NB: in CS, people often use "deterministic" to mean non-randomized. This causes confusion:
    > "Determinism" means non-randomized. But "non-determinism" does not mean "randomized".
  • determinism vs. non-determinism
  • ...? vs. stochastic/random
  • a stochastic (or random) process means,


I think a better way to put this confusion into words is: "Nondeterministic vs. probabilistic models" - LaValle 2006
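A toy illustration of the CS usage (my own example, not from the lecture): a deterministic (non-randomized) function versus a stochastic one.

import random

def deterministic_square(x):
    # Deterministic in the CS sense: non-randomized, same input -> same output.
    return x * x

def stochastic_square(x, sigma=0.1):
    # Stochastic/randomized: the output varies across calls on the same input.
    return x * x + random.gauss(0.0, sigma)

print(deterministic_square(3), deterministic_square(3))  # always 9 9
print(stochastic_square(3), stochastic_square(3))        # e.g. 9.03 8.91 (varies)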

Jan 10, 2020

Let's blog with Pelican

Makefile

  1. make html: generates output HTML files from the files in the content folder, using the development config file

    • make regenerate: runs make html while listening for new changes
    • vs. make publish: similar to make html except that it uses the settings in publishconf.py (see the sketch after this list)
  2. make devserver: (re)starts an HTTP server in the output folder. The default port is 8000

  3. Go to localhost:<PORT> to see the output website
  4. ghp-import -b <local-gh-branch> <outputdir>: imports content into the local branch local-gh-branch
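For reference, publishconf.py conventionally just layers production-only settings on top of the dev config. A minimal sketch following the standard Pelican layout (the SITEURL below is a placeholder, not my actual settings):

# publishconf.py -- minimal sketch of production-only settings
# layered on top of the dev config (standard Pelican convention).
import os
import sys
sys.path.append(os.curdir)
from pelicanconf import *        # start from the dev settings

SITEURL = 'https://<username>.github.io'  # placeholder URL
RELATIVE_URLS = False                     # absolute URLs for the published site
FEED_ALL_ATOM = 'feeds/all.atom.xml'      # generate feeds only when publishing
DELETE_OUTPUT_DIRECTORY = True            # clean `output/` before each publish build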

Workflow

Key: Do all work in the dev branch. Do not touch blog-build or master. blog-build is modified indirectly by ghp-import (or make publish-to-github), and master is the branch that GitHub serves as my website. So, manage the source (and outputs) only in the dev branch.

  • Local dev
conda activate pelican-blog
cd ~/Workspace/Blog

# Make sure you are on `dev` branch
git checkout dev

# Add new files  under `content`
git add my-article.md

# Generate the content with pelican
make html # or, make regenerate 

# Start a local server 
make devserver

# Open a browser and go to localhost:8000
  • Global dev

    1. Use make publish instead of make html
    2. Update the blog-build branch with contents in output folder
    3. Push the blog-build branch's content to origin/master

    These three steps can be done in one line:

make publish-to-github
  • Version control the source
    Important: Write new contents only on the dev branch
git add <files> # make sure not to push the output folder
git cm "<commit message>" # cm: alias for commit -m
git push origin dev #origin/dev is the remote branch that keeps track of blog sources

Dec 31, 2019

Cool Chart tool: `amcharts`

While making a visualization of my past whereabouts for the front page of this blog, I came across these easy-to-use visualization examples using amcharts. Initially, I wanted to use Google Earth Studio, but it required me to import country boundaries (as KML files) as well as time to learn a new toolsuite, so I found these JavaScript-based demos more useful for my needs.

List of timeline + map charts:
  • Fishbone timeline
  • Flight routes on map
  • Flight animation on map
  • Timeline animation with flight on map
  • and many more demos

Pretty neat!

Aug 01, 2019

Basics of the Python Imaging Library (PIL)

Python Imaging Library (PIL)

  • Date: 2020-03-01 (Sun)
In [2]:
from matplotlib import pyplot as plt
%matplotlib inline

import numpy as np
from PIL import Image
In [25]:
def print_info(arr):
    # Print the dtype, (min, max) range, and shape of an array-like (np.ndarray or torch.Tensor)
    print("dtype: ", arr.dtype)
    print("range: ", f'({arr.min(), arr.max()})')
    print("shape: ", arr.shape)
In [30]:
fn = '/data/hayley/maptiles/paris/EsriImagery/16617_11283_15.png'
im = Image.open(fn)
plt.imshow(im);

PIL.Image <-> np.ndarray conversion

  • PIL.Image to ndarray

    im = Image.open(img_fn)
    np_im = np.asarray(im)
    
  • np.array to PIL.Image

    pil_im = Image.fromarray(np_im)
    
In [31]:
print(im.mode)
np_im = np.asarray(im)
print(np_im.shape)
plt.imshow(np_im[...,:3]);
RGB
(256, 256, 3)
In [32]:
im_3d = Image.fromarray(np_im)
plt.imshow(im_3d);

Basic Image Processing

In [33]:
plt.imshow(im_3d.crop((150, 50, 250, 200)))
Out[33]:
<matplotlib.image.AxesImage at 0x7fa864264e10>

PIL Modes

A mode of a PIL.Image object defines the type and depth of a pixel in the image. Each pixel uses the full range of its bit depth:

  • a 1-bit pixel can take either 0 or 1
  • an 8-bit pixel can take an integer value in the range 0, 1, ..., 255

Resource:

  • SOF

    Here's the kicker... if you want and expect an RGB image, you should just convert to RGB on opening:

    im = Image.open(fn).convert('RGB')
    

    List of standard modes:

  • Single-channel images
    • 1: 1-bit pixels, binary black and white
    • L: 8-bit pixels, greyscale with 256 levels. "L" is for "luminance" (not a color)
    • I: 32-bit signed integer pixels
    • F: 32-bit floating point pixels

  • Multi-channel images
    • RGB: (3x8-bit pixels), true color
    • RGBA: (4x8-bit pixels), true color with transparency mask
    • CMYK: (4x8-bit pixels), color separation
    • LAB: (3x8-bit pixels), Lab color space
    • YCbCr: (3x8-bit pixels), color video format
    • HSV: (3x8-bit pixels), Hue-Saturation-Value
In [34]:
im_3d = im.convert('RGB')
print(im_3d.mode)
plt.imshow(im_3d);
RGB

Data type of PIL.Image objects

The underlying data of a PIL.Image object has dtype uint8 for most modes, except:

  • I: 32-bit signed integer pixels (4*8-bit signed integer)
  • F: 32-bit floating point pixels (4*8-bit floating point)
In [40]:
# im is PIL.Image of mode `RGBA`
im_3d = im.convert('RGB')
np_im_3d = np.asarray(im_3d)
print_info(np_im_3d)
plt.imshow(np_im_3d);
dtype:  uint8
range:  ((0, 255))
shape:  (256, 256, 3)
In [41]:
im_L = im.convert('L')
np_im_L = np.asarray(im_L)
print_info(np_im_L)
plt.imshow(np_im_L);
dtype:  uint8
range:  ((0, 253))
shape:  (256, 256)
In [42]:
im_I = im.convert('I') #single-channel, 32bit signed integer (not uint8)
np_im_I = np.asarray(im_I)
print_info(np_im_I)
plt.imshow(np_im_I);
dtype:  int32
range:  ((0, 253))
shape:  (256, 256)
In [43]:
im_F = im.convert('F') #single-channel, 32bit floats  (not uint8) in range of [0.0, 256.0)
np_im_F = np.asarray(im_F)
print_info(np_im_F)
plt.imshow(np_im_F);
dtype:  float32
range:  ((0.456, 253.256))
shape:  (256, 256)

Conversions between PIL.Image and torch.tensor

Use torchvision.transforms

  1. tvts.ToTensor()(pil_im) $\Rightarrow$ torch.FloatTensor of shape (C,H,W)
    • this transform scales the values to the range [0.0, 1.0] if the PIL.Image belongs to one of the modes (1, L, LA, P, RGB, YCbCr, RGBA, CMYK, I, F). Note this list of modes covers most of the cases
    • this function can also take an np.ndarray as input and return a torch.FloatTensor.
      • it assumes the np.ndarray follows the (H,W,C) numpy image convention
      • scaling to the range [0.0, 1.0] happens if the np.ndarray has dtype np.uint8

Let's check whether the scaling effect of tvts.ToTensor() actually happens only when its input np.ndarray has dtype == np.uint8.

A PIL.Image object with mode == 'F' stores each pixel as a 32-bit float in the range [0.0, 256.0).

In [50]:
im_F = im.convert('F')
print('PIL Image mode: ', im_F.mode)
PIL Image mode:  F
In [51]:
np_im_F = np.asarray(im_F)
print_info(np_im_F)
dtype:  float32
range:  ((0.456, 253.256))
shape:  (256, 256)

Now, let's try to convert this 32-bit floating point np.ndarray to a torch.Tensor object using the torchvision.transforms.ToTensor() class.

In [53]:
import torchvision.transforms as tvts
In [54]:
t_from_pil = tvts.ToTensor()(im_F)
print('=== t_from_pil ===')
print_info(t_from_pil)
=== t_from_pil ===
dtype:  torch.float32
range:  ((tensor(0.4560), tensor(253.2560)))
shape:  torch.Size([1, 256, 256])
In [55]:
t_from_np = tvts.ToTensor()(np_im_F)
print('=== t_from_np ===')
print_info(t_from_np)
=== t_from_np ===
dtype:  torch.float32
range:  ((tensor(0.4560), tensor(253.2560)))
shape:  torch.Size([1, 256, 256])

We can see that the output tensors are still torch.FloatTensors (i.e. torch.Tensors with dtype float32), but the values are not normalized to the range [0.0, 1.0]. This confirms that tvts.ToTensor() scales its input PIL.Image or np.ndarray object to the range [0.0, 1.0] only when the input's dtype == uint8. Note that the returned torch.Tensor always has 3 dims.

Now, let's check that it does the scaling when the input np.ndarray or PIL.Image indeed has dtype of unsigned 8-bit integer (i.e. uint8). We are going to use a PIL.Image with mode RGB as an example.

In [56]:
im_rgb = im.convert('RGB')
print(im_rgb.mode)
RGB
In [57]:
np_rgb = np.asarray(im_rgb)
print_info(np_rgb)
dtype:  uint8
range:  ((0, 255))
shape:  (256, 256, 3)
In [58]:
# to tensor
t_from_pil_rgb = tvts.ToTensor()(im_rgb)
print(' === t_from_pil_rgb === ')
print_info(t_from_pil_rgb)
 === t_from_pil_rgb === 
dtype:  torch.float32
range:  ((tensor(0.), tensor(1.)))
shape:  torch.Size([3, 256, 256])
In [59]:
t_from_np_rgb = tvts.ToTensor()(np_rgb)
print(' === t_from_np_rgb === ')
print_info(t_from_np_rgb)
 === t_from_np_rgb === 
dtype:  torch.float32
range:  ((tensor(0.), tensor(1.)))
shape:  torch.Size([3, 256, 256])

Notice the range of the returned torch.Tensors. They are both in range of [0.0, 1.0].
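If you do need values in [0.0, 1.0] from a float-mode image (which ToTensor() leaves unscaled), one option is to rescale the array manually before wrapping it in a tensor. A minimal sketch, assuming the pixel values lie in [0, 255] as above:

import torch

# Manual scaling for a float-mode image; ToTensor() would not rescale it.
np_im_F = np.asarray(im_F, dtype=np.float32)        # float32 values in [0.0, 255.0]
t_scaled = torch.from_numpy(np_im_F / 255.0)[None]  # add a channel dim -> (1, H, W)
print(t_scaled.min(), t_scaled.max())               # now within [0.0, 1.0]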
