Exploring the Shape of Deep Learning Data with Numpy

At the core of deep learning systems are multidimensional numeric arrays and matrices. The input data, the layers of neurons in the application, and the output data all come in arrays of various forms. Python (and specifically the numpy package) provides ways to examine and debug these arrays.

A neural network runs algorithms on the input data to train the network and later produce the output. It’s often useful to reach into these arrays to see what’s in them, to make sure the data looks good, or just to verify that some result is being produced at all (i.e., that the arrays are not rows and rows of zeros). In this post we’ll explore some simple techniques for doing this, using features of both standard Python lists and Numpy arrays.

For a simple example, let’s start by importing Numpy and creating a Numpy array from a simple Python list:

>>> import numpy as np
>>> a = list(range(1, 7))     # a plain Python list
>>> an = np.array(a)          # a Numpy array built from the list
>>> an
array([1, 2, 3, 4, 5, 6])
>>> type(an)    # numpy array is of type "numpy.ndarray"
<type 'numpy.ndarray'>
>>> a
[1, 2, 3, 4, 5, 6]
>>> type(a)     # primitive list is simply type "list"
<type 'list'>

The Numpy “ndarray” type is not a plain Python list: it is a separate array type whose elements all share one data type, and it provides additional attributes and methods for working with arrays.
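A short sketch of how the two types differ in practice. The variable names here are illustrative and not part of the session above; the point is only that the same operators mean different things on a list and on an ndarray.

```python
import numpy as np

plain = [1, 2, 3]
arr = np.array(plain)

# On a list, * repeats the sequence and + concatenates.
list_times_two = plain * 2      # [1, 2, 3, 1, 2, 3]
list_plus = plain + [10]        # [1, 2, 3, 10]

# On an ndarray, the same operators work element by element.
arr_times_two = arr * 2         # array([2, 4, 6])
arr_plus = arr + 10             # array([11, 12, 13])

# Every element of an ndarray shares one data type
# (the exact integer type printed depends on the platform).
print(arr.dtype)
```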

With a Numpy array, you can query for the “dimensions” and “shape” of the array. For the simple, flat array above, there is just one dimension of 6 elements:

>>> an.ndim       # it's a flat, one-dimensional array
1
>>> an.shape      # 6 elements in it
(6,)

Numpy provides a facility for “reshaping” an array, fitting the elements of the original array into an array of a different shape. For example, let’s take the above flat array (6 elements in 1 dimension) and create a new array with 2 dimensions, 2 rows and 3 columns:

>>> an
array([1, 2, 3, 4, 5, 6])
>>> an2=an.reshape(2,3)    # 2 rows, 3 columns
>>> an2
array([[1, 2, 3],
       [4, 5, 6]])
>>> an2.ndim
2
>>> an2.shape
(2, 3)
>>> an2[1]        # the second row
array([4, 5, 6])
>>> an2[1][0:2]   # a slice: the first two columns of the second row
array([4, 5])

The new array (really, a matrix now) has two dimensions and the desired shape of 2 rows and 3 columns: an array of size 2, each element being an array of size 3.
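A few details of reshaping are easy to miss, so here is a minimal sketch that rebuilds the same 6-element array. It assumes a contiguous array like this one, for which reshape returns a view onto the original data rather than a copy.

```python
import numpy as np

an = np.array(range(1, 7))

# Passing -1 lets Numpy infer that dimension from the total size.
an2 = an.reshape(2, -1)             # shape becomes (2, 3)

# an2[1, 0:2] is the single-step equivalent of an2[1][0:2].
first_two_of_row_1 = an2[1, 0:2]    # array([4, 5])

# For a contiguous array like this one, reshape returns a view:
# writing through an2 changes an as well.
an2[0, 0] = 100
print(an[0])                        # 100

# An incompatible shape raises ValueError (6 elements cannot fill 4 x 2).
try:
    an.reshape(4, 2)
except ValueError as err:
    print("reshape failed: %s" % err)
```

If an independent array is needed, calling `.copy()` on the reshaped result avoids changing the original by accident.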

The shapes of multidimensional arrays and matrices can be important for some mathematical operations. For the classic dot product of two arrays or matrices, the number of columns in the first matrix must match the number of rows in the second matrix; otherwise an error occurs. Here’s an example of multiplying a 1 x 2 array by the 2 x 3 array:

>>> an2
array([[1, 2, 3],
       [4, 5, 6]])
>>> an2.shape      # 2 rows, 3 columns in the previous array
(2, 3)
>>> k=np.array([2, 3])
>>> k.shape        # 1 row, 2 columns in a flat array
(2,)
>>> k.dot(an2)        # multiply a 2 column array and 2 row matrix (compatible)
array([14, 19, 24])   # dot product is 1 row, 3 columns

If the shapes of the matrices don’t match correctly, an error occurs; in numpy terminology, the matrices are “not aligned”:

>>> an2.dot(k)    # 3 column matrix and a 2 column array: mismatch!!!
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: objects are not aligned
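One way to avoid this error is to compare the shapes before multiplying. The sketch below uses a hypothetical helper, `can_dot`, which is not a Numpy function; it only covers the 1-D and 2-D arrays used in this post.

```python
import numpy as np

an2 = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)
k = np.array([2, 3])                     # shape (2,)

def can_dot(left, right):
    """Hypothetical helper for 1-D and 2-D arrays: True if the last
    dimension of left matches the first dimension of right."""
    return left.shape[-1] == right.shape[0]

print(can_dot(k, an2))     # True: 2 matches 2
print(can_dot(an2, k))     # False: 3 does not match 2

# Transposing the matrix to shape (3, 2) makes the second product valid.
result = an2.T.dot(k)      # array([14, 19, 24])
print(result)
```

The transposed product gives the same values as `k.dot(an2)` above, because both sum the same row-by-weight terms.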

Numpy also provides operations for doing term-wise multiplication, which produces a different result from the classic dot product. Numpy multiplication involves “broadcasting” (expanding arrays and matrices so that their dimensions match), which is beyond the scope of this post. See the discussion of broadcasting in the Numpy documentation for more information.
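Without going into broadcasting rules, a small sketch can show how term-wise multiplication differs from the dot product. The 2 x 2 matrix `sq` is an illustrative example and does not appear elsewhere in this post.

```python
import numpy as np

sq = np.array([[1, 2],
               [3, 4]])

# Term-wise: each element is multiplied by the element in the same position.
termwise = sq * sq          # [[1, 4], [9, 16]]

# Dot product: rows of the first matrix combined with columns of the second.
dotted = sq.dot(sq)         # [[7, 10], [15, 22]]

# The simplest case of broadcasting: a scalar is applied to every element.
scaled = sq * 10            # [[10, 20], [30, 40]]

print(termwise)
print(dotted)
print(scaled)
```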

Applications to Deep Learning Data

The MNIST data that comes with Tensorflow, used in many convolutional neural net tutorials, consists of an array of grayscale images, each 28 x 28 pixels in size. Let’s explore what this data looks like.

First we load the MNIST example data:

>>> import tensorflow as tf
>>> from tensorflow.examples.tutorials.mnist import input_data
 
>>> mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Extracting MNIST_data/train-images-idx3-ubyte.gz
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz

Now what’s the size and shape of the data, and what does the data look like?

>>> import numpy as np
 
>>> mnist.train.images.ndim        # training data
2
 
>>> np.shape(mnist.train.images)
(55000, 784)
 
>>> mnist.test.images.ndim         # testing data
2
 
>>> np.shape(mnist.test.images)
(10000, 784)
 
>>> mnist.train.images
array([[ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.],
       ..., 
       [ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.]], dtype=float32)
 
...(same for mnist.test.images)

The training data has 55,000 images and the test data has 10,000 images. Each set is stored as a 2-dimensional array with one row per image. More notably, the pixels of each image are stored in a flattened array, 784 elements in size (28 x 28 = 784), and each pixel is a float32 value rather than a single byte.

When the image data is displayed, Numpy abbreviates the output for brevity, showing just the first 3 and last 3 rows and the first 3 and last 3 elements of each row. As far as we can tell from this, the data may be all zeros. This is where the slice operator becomes useful for looking inside other areas of the data.
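Summary statistics are a complementary way to rule out an all-zero array. The sketch below does not load MNIST; it uses a small stand-in array named `images` with the same row layout, and the pixel values written into it are made up for illustration.

```python
import numpy as np

# Stand-in for mnist.train.images: 5 flattened 28 x 28 "images" of zeros,
# with a short stroke written into row 14 of the first one (made-up values).
images = np.zeros((5, 784), dtype=np.float32)
images[0, 14 * 28 + 16:14 * 28 + 22] = [0.4, 0.6, 1.0, 1.0, 0.95, 0.2]

# Value range across the whole array.
print(images.min())
print(images.max())

# Total number of non-zero pixels, and the count per image.
total_nonzero = np.count_nonzero(images)
nonzero_per_image = (images != 0).sum(axis=1)

# Indices of images that contain no data at all.
blank_images = np.where(nonzero_per_image == 0)[0]
print(total_nonzero)
print(blank_images)
```

On real data, a non-empty `blank_images` would point at rows worth inspecting by hand.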

Let’s try looking at the first row of 28 pixels ([0:28]) in the first image ([0]):

>>> mnist.train.images[0][0:28]
array([ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.], dtype=float32)

This still doesn’t tell us much. The images are light digits on a black background, so it’s not surprising that the first row of pixels in the image is all zeros (color black): this row is on a margin (edge) of the image.

Instead, let’s try sampling in the middle of the image where there’s more likely to be some non-zero data. Given that the images are 28 pixels in height, we’ll select row 14, near the middle of the data. Because the image data is flattened, we’ll have to multiply the row number by the number of pixels in each row (28):

>>> mnist.train.images[0][14*28:15*28]    # first image, the single row at index 14
array([ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
        0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
        0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
        0.        ,  0.41568631,  0.6156863 ,  0.99607849,  0.99607849,
        0.95294124,  0.20000002,  0.        ,  0.        ,  0.        ,
        0.        ,  0.        ,  0.        ], dtype=float32)

This confirms that there is real non-zero data in the image arrays.
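The row arithmetic can be avoided by reshaping a flattened image back into a 28 x 28 grid. This is a sketch using a stand-in array named `flat` in place of `mnist.train.images[0]`, filled with the values 0 to 783 so that each position is easy to recognize.

```python
import numpy as np

# Stand-in for one flattened image (the real one would be mnist.train.images[0]).
flat = np.arange(784, dtype=np.float32)

# Restore the 2-D layout: 28 rows of 28 pixels.
image = flat.reshape(28, 28)

# Row 14 of the grid is the same data as the slice computed by hand.
row_from_flat = flat[14 * 28:15 * 28]
row_from_grid = image[14]
print(np.array_equal(row_from_flat, row_from_grid))   # True

# A 2-D slice samples a block from the middle of the image.
center = image[12:16, 12:16]
print(center.shape)                                   # (4, 4)
```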

Numpy supports term-wise multiplication of arrays and matrices by a scalar value. In this case, it might be useful to multiply the image arrays by 255, to translate the gray shade values (in the range 0.0 to 1.0) into the more familiar range of 0 to 255:

>>> 255 * mnist.train.images[0][14*28:15*28]
array([   0.        ,    0.        ,    0.        ,    0.        ,
          0.        ,    0.        ,    0.        ,    0.        ,
          0.        ,    0.        ,    0.        ,    0.        ,
          0.        ,    0.        ,    0.        ,    0.        ,
        106.00000763,  157.        ,  254.00001526,  254.00001526,
        243.00001526,   51.00000381,    0.        ,    0.        ,
          0.        ,    0.        ,    0.        ,    0.        ], dtype=float32)
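The scaled values carry small floating-point errors (254.00001526 rather than 254). If whole-number pixel values are wanted, one option is to round and convert to an unsigned byte type. The `row` array below is a stand-in holding a few of the values printed above.

```python
import numpy as np

# A few of the pixel values from the row shown above.
row = np.array([0.0, 0.41568631, 0.6156863, 0.99607849,
                0.95294124, 0.20000002], dtype=np.float32)

# Scale to 0..255, round to the nearest integer, then store as bytes.
as_bytes = np.rint(255 * row).astype(np.uint8)
print(as_bytes)     # [  0 106 157 254 243  51]
```

Rounding before the conversion matters because `astype` on its own truncates toward zero.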

TL;DR

Using a combination of Numpy shape properties and Python array slicing, large arrays of data can be explored to verify that the data has realistic values and to make sure that arrays have the correct shape for performing mathematical operations on them.

A quick recap of the features discussed:

$ python
>>> import numpy as np              # import numpy
>>> a = np.array([1, 2, 3])
>>> a = np.array(range(1, 10001))   # array of 10000 elements
>>> len(a)
>>> m = a.reshape(100, 100)         # reshape the array into a 100 x 100 matrix
>>> m.ndim                          # 2 dimensions
>>> m.shape                         # shape (100, 100)
>>> m[50:52,50:52]                  # sample the center of the matrix
>>> a2 = np.array(range(1, 101))
>>> a2.dot(m)                       # matrix dot product
>>> k = 255                         # a scalar multiplier
>>> k * m                           # element-wise multiplication ("*")
>>> np.multiply(k, m)               # element-wise multiplication (alternate form)

Try this to get an idea of the difference in operations supported by native Python lists vs Numpy ndarrays:

$ python
>>> import numpy as np
>>> dir(list)           # Python list
>>> dir(np.ndarray)     # Numpy ndarray
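The two `dir()` listings are long, so a set difference makes the comparison easier to read. This is a small sketch; the names `only_ndarray` and `only_list` are illustrative, and the exact contents depend on the Python and Numpy versions installed.

```python
import numpy as np

list_names = set(dir(list))
ndarray_names = set(dir(np.ndarray))

# Public names found on ndarray but not on list, and the reverse.
only_ndarray = sorted(n for n in ndarray_names - list_names
                      if not n.startswith("_"))
only_list = sorted(n for n in list_names - ndarray_names
                   if not n.startswith("_"))

print(only_ndarray)
print(only_list)
```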

Resources

Numpy Broadcasting
Matrices
How to Multiply Matrices

Versions

$ python -V
Python 2.7.10

>>> np.version.version
'1.12.1'
>>> tf.__version__
'1.0.1'
