Small Simplicity

Understanding Intelligence from Computational Perspective

Aug 01, 2019

Basics of the Python Imaging Library (PIL)

Python Imaging Library (PIL)

  • Date: 2020-03-01 (Sun)
In [2]:
from matplotlib import pyplot as plt
%matplotlib inline

import numpy as np
from PIL import Image
In [25]:
def print_info(arr):
    print("dtype: ", arr.dtype)
    print("range: ", f'({arr.min(), arr.max()})')
    print("shape: ", arr.shape)
In [30]:
fn = '/data/hayley/maptiles/paris/EsriImagery/16617_11283_15.png'
im = Image.open(fn)
plt.imshow(im);

PIL.Image <-> np.ndarray conversion

  • PIL.Image to ndarray

    im = Image.open(img_fn)
    np_im = np.asarray(im)
    
  • np.array to PIL.Image

    pil_im = Image.fromarray(np_im)
    
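As a quick self-contained check (using a small synthetic array instead of the map tile above), the round trip preserves values exactly:

```python
import numpy as np
from PIL import Image

# Synthetic 4x4 RGB image; the values are arbitrary, chosen only for illustration
np_im = np.arange(48, dtype=np.uint8).reshape(4, 4, 3)

# np.ndarray -> PIL.Image (mode is inferred from the array's shape and dtype)
pil_im = Image.fromarray(np_im)
print(pil_im.mode, pil_im.size)        # RGB (4, 4) -- note size is (W, H)

# PIL.Image -> np.ndarray
np_back = np.asarray(pil_im)
print(np.array_equal(np_im, np_back))  # True
```

Note that `Image.size` reports `(width, height)` while the numpy array is `(height, width, channels)`.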
In [31]:
print(im.mode)
np_im = np.asarray(im)
print(np_im.shape)
plt.imshow(np_im[...,:3]);
RGB
(256, 256, 3)
In [32]:
im_3d = Image.fromarray(np_im)
plt.imshow(im_3d);

Basic Image Processing

In [33]:
plt.imshow(im_3d.crop((150, 50, 250, 200)))
Out[33]:
<matplotlib.image.AxesImage at 0x7fa864264e10>

PIL Modes

The mode of a PIL.Image object defines the type and depth of a pixel in the image. Each pixel uses the full range of its bit depth:

  • a 1-bit pixel can take either 0 or 1
  • an 8-bit pixel can take an integer value in the range 0, 1, ..., 255
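A small sketch of these ranges using synthetic pixel values (the values here are arbitrary). Note that Pillow stores mode-1 pixels internally such that they read back as 0 or 255 through getdata:

```python
import numpy as np
from PIL import Image

# 8-bit greyscale ('L'): each pixel is an integer in 0..255
im_L = Image.fromarray(np.array([[0, 60, 255]], dtype=np.uint8), mode='L')
print(im_L.getextrema())   # (0, 255): full 8-bit range

# 1-bit ('1'): built here from a boolean array
im_1 = Image.fromarray(np.array([[True, False, True]]))
print(im_1.mode)           # 1
print(list(im_1.getdata()))
```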

Resource:

  • Stack Overflow

    Here's the kicker... if you want and expect an RGB image, you should just convert to RGB on opening:

    im = Image.open(fn).convert('RGB')
    

List of standard modes:

  • Single-channel images
    • 1: 1-bit pixels, binary black and white
    • L: 8-bit pixels, greyscale with 256 levels. "L" stands for "luminance" (brightness, not color)
    • I: 32-bit signed integer pixels
    • F: 32-bit floating point pixels

  • Multi-channel images
    • RGB: (3x8-bit pixels), true color
    • RGBA: (4x8-bit pixels), true color with transparency mask
    • CMYK: (4x8-bit pixels), color separation
    • LAB: (3x8-bit pixels), L*a*b* color space
    • YCbCr: (3x8-bit pixels), color video format
    • HSV: (3x8-bit pixels), Hue-Saturation-Value
In [34]:
im_3d = im.convert('RGB')
print(im_3d.mode)
plt.imshow(im_3d);
RGB

Data type of PIL.Image objects

For most modes, the underlying data of a PIL.Image object has uint8 dtype. The exceptions are:

  • I: 32-bit signed integer pixels (4*8-bit signed integer)
  • F: 32-bit floating point pixels (4*8-bit floating point)
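A quick way to see this with a tiny synthetic image (so it runs without the map tile): convert to each mode and inspect the dtype that numpy reports.

```python
import numpy as np
from PIL import Image

# Tiny synthetic RGB image; the color is arbitrary
demo_im = Image.new('RGB', (2, 2), color=(10, 20, 30))

for mode in ['L', 'RGB', 'I', 'F']:
    arr = np.asarray(demo_im.convert(mode))
    print(f'{mode}: {arr.dtype}')
# 'L' and 'RGB' give uint8; 'I' gives int32; 'F' gives float32
```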
In [40]:
# im is a PIL.Image of mode `RGB`
im_3d = im.convert('RGB')
np_im_3d = np.asarray(im_3d)
print_info(np_im_3d)
plt.imshow(np_im_3d);
dtype:  uint8
range:  ((0, 255))
shape:  (256, 256, 3)
In [41]:
im_L = im.convert('L')
np_im_L = np.asarray(im_L)
print_info(np_im_L)
plt.imshow(np_im_L);
dtype:  uint8
range:  ((0, 253))
shape:  (256, 256)
In [42]:
im_I = im.convert('I')  # single-channel, 32-bit signed integers (not uint8)
np_im_I = np.asarray(im_I)
print_info(np_im_I)
plt.imshow(np_im_I);
dtype:  int32
range:  ((0, 253))
shape:  (256, 256)
In [43]:
im_F = im.convert('F')  # single-channel, 32-bit floats (not uint8) in the range [0.0, 256.0)
np_im_F = np.asarray(im_F)
print_info(np_im_F)
plt.imshow(np_im_F);
dtype:  float32
range:  ((0.456, 253.256))
shape:  (256, 256)

Conversions between PIL.Image and torch.Tensor

Use torchvision.transforms

  1. tvts.ToTensor()(pil_im) $\Rightarrow$ torch.FloatTensor of shape (C,H,W)
    • this transform scales the values to the range [0.0, 1.0] if the PIL.Image belongs to one of the modes (1, L, LA, P, RGB, YCbCr, RGBA, CMYK, I, F). Note this list of modes covers most of the cases
    • this transform can also take a np.ndarray as input and return a torch.FloatTensor.
      • it assumes the np.ndarray follows the (H,W,C) numpy image convention.
      • scaling to the range [0.0, 1.0] happens only if the np.ndarray has dtype np.uint8

Let's check that the scaling effect of tvts.ToTensor() actually happens only when its input's dtype == np.uint8.

A PIL.Image object with mode == F stores each pixel as a 32-bit float in the range [0.0, 256.0).
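This can be checked without the map tile; the greyscale values below are arbitrary toy data:

```python
import numpy as np
from PIL import Image

# Hypothetical greyscale image; convert('F') keeps the same values as 32-bit floats
demo_L = Image.fromarray(np.array([[0, 200, 255]], dtype=np.uint8), mode='L')
demo_F = demo_L.convert('F')
print(demo_F.mode)               # F
print(np.asarray(demo_F).dtype)  # float32
print(demo_F.getextrema())       # (0.0, 255.0)
```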

In [50]:
im_F = im.convert('F')
print('PIL Image mode: ', im_F.mode)
PIL Image mode:  F
In [51]:
np_im_F = np.asarray(im_F)
print_info(np_im_F)
dtype:  float32
range:  ((0.456, 253.256))
shape:  (256, 256)

Now, let's convert this 32-bit floating point np.ndarray to a torch.Tensor using the torchvision.transforms.ToTensor() class.

In [53]:
import torchvision.transforms as tvts
In [54]:
t_from_pil = tvts.ToTensor()(im_F)
print('=== t_from_pil ===')
print_info(t_from_pil)
=== t_from_pil ===
dtype:  torch.float32
range:  ((tensor(0.4560), tensor(253.2560)))
shape:  torch.Size([1, 256, 256])
In [55]:
t_from_np = tvts.ToTensor()(np_im_F)
print('=== t_from_np ===')
print_info(t_from_np)
=== t_from_np ===
dtype:  torch.float32
range:  ((tensor(0.4560), tensor(253.2560)))
shape:  torch.Size([1, 256, 256])

We can see that the output tensors are still torch.FloatTensor (i.e. torch.Tensor of dtype float32), but the values are not scaled to the range [0.0, 1.0]. This confirms that tvts.ToTensor() scales its input PIL.Image or np.ndarray to the range [0.0, 1.0] only when the input's dtype == uint8. Note that the returned torch.Tensor always has 3 dims.

Now, let's check that it does the scaling when the input np.ndarray or PIL.Image indeed has dtype of unsigned 8-bit integer (i.e. uint8). We will use a PIL.Image with mode RGB as the example.

In [56]:
im_rgb = im.convert('RGB')
print(im_rgb.mode)
RGB
In [57]:
np_rgb = np.asarray(im_rgb)
print_info(np_rgb)
dtype:  uint8
range:  ((0, 255))
shape:  (256, 256, 3)
In [58]:
# to tensor
t_from_pil_rgb = tvts.ToTensor()(im_rgb)
print(' === t_from_pil_rgb === ')
print_info(t_from_pil_rgb)
 === t_from_pil_rgb === 
dtype:  torch.float32
range:  ((tensor(0.), tensor(1.)))
shape:  torch.Size([3, 256, 256])
In [59]:
t_from_np_rgb = tvts.ToTensor()(np_rgb)
print(' === t_from_np_rgb === ')
print_info(t_from_np_rgb)
 === t_from_np_rgb === 
dtype:  torch.float32
range:  ((tensor(0.), tensor(1.)))
shape:  torch.Size([3, 256, 256])

Notice the range of the returned torch.Tensors: they are both in the range [0.0, 1.0].
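One practical consequence: if a float32 input (e.g. an array from a mode-F image) should end up in [0.0, 1.0], you have to scale it yourself, since ToTensor() leaves float input unscaled. A minimal numpy sketch, with toy values standing in for real pixel data:

```python
import numpy as np

# Toy stand-in for a mode-'F' image array with values in [0.0, 255.0]
np_im_F = np.array([[0.0, 127.5, 255.0]], dtype=np.float32)

# Manual scaling to [0.0, 1.0] before (or after) converting to a tensor
scaled = np_im_F / 255.0
print(scaled.min(), scaled.max())   # 0.0 1.0
```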
