NumPy under MinPy, with GPU

This part of the tutorial is also available as a step-by-step notebook on GitHub. Please try it out!

Basic NDArray Operation

MinPy has the same syntax as NumPy, which is the language of choice for numerical computing, and in particular for deep learning. The popular Stanford course cs231n uses NumPy for its main coursework. To use NumPy under MinPy, you only need to replace import numpy as np with import minpy.numpy as np at the top of your NumPy program. If you are not familiar with NumPy, you may want to look up the NumPy Quickstart Tutorial for more details.

There are two simple but important reasons to use NumPy under MinPy, one for productivity and one for performance: 1) auto-differentiation, and 2) GPU/CPU co-execution. We discuss both in this tutorial.

But first, let us review some of the most common usages of NumPy.

Array Creation

An array can be created in multiple ways. For example, we can create an array from a regular Python list or tuple by using the array function:

In [1]:
import minpy.numpy as np

a = np.array([1,2,3])  # create a 1-dimensional array with a python list
b = np.array([[1,2,3], [2,3,4]])  # create a 2-dimensional array with a nested python list

Here are some useful ways to create arrays with initial placeholder content.

In [2]:
a = np.zeros((2,3))    # create a 2-dimensional array full of zeros with shape (2,3)
b = np.ones((2,3))     # create an array of the same shape full of ones
c = np.full((2,3), 7)  # create an array of the same shape with all elements set to 7
d = np.empty((2,3))    # create an array of the same shape; its initial content is arbitrary and depends on the state of memory
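
The creation routines above can be sanity-checked by printing each array's shape and dtype. The sketch below is only an illustration: it assumes plain NumPy (import numpy as np) so that it runs without MinPy installed, and the variable names are chosen for this example. Under MinPy the import line would be the one shown earlier, and the reported dtypes may differ.

```python
import numpy as np  # assumption: plain NumPy stands in for minpy.numpy

zeros = np.zeros((2, 3))
ones = np.ones((2, 3))
sevens = np.full((2, 3), 7)
empty = np.empty((2, 3))  # contents are arbitrary; only shape and dtype are defined

# Print the shape and element type of each array.
for name, arr in [("zeros", zeros), ("ones", ones),
                  ("sevens", sevens), ("empty", empty)]:
    print(name, arr.shape, arr.dtype)
```

Because the content of an array returned by empty is not initialized, it should always be overwritten before it is read.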

Basic Operations

Arithmetic operators on arrays apply elementwise, and the result is stored in a new array.

In [3]:
a = np.ones((2,3))
b = np.ones((2,3))
c = a + b  # elementwise addition
d = - c    # elementwise negation
print(d)
e = np.sin(c**2).T  # elementwise power and sine, then transpose
print(e)
f = np.maximum(a, c)  # elementwise max
print(f)
[[-2. -2. -2.]
 [-2. -2. -2.]]
[[-0.7568025 -0.7568025]
 [-0.7568025 -0.7568025]
 [-0.7568025 -0.7568025]]
[[ 2.  2.  2.]
 [ 2.  2.  2.]]
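
As a small cross-check of the elementwise behaviour, the following sketch reproduces the computation with plain NumPy (an assumption made so that it runs without MinPy) and contrasts the elementwise * operator with the matrix product np.dot, which is used later in this tutorial.

```python
import math
import numpy as np  # assumption: plain NumPy stands in for minpy.numpy

a = np.ones((2, 3))
c = a + a                      # every element is 2.0
e = np.sin(c ** 2).T           # sin(4.0) everywhere, shape (3, 2) after transpose

print(e.shape)                          # (3, 2)
print(np.allclose(e, math.sin(4.0)))    # True

# `*` multiplies element by element, while np.dot is a matrix product.
print((a * c).shape)           # (2, 3)
print(np.dot(a, c.T).shape)    # (2, 2)
```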

Indexing and Slicing

The slice operator [] applies to axis 0, so for a 2-dimensional array it selects rows.

In [4]:
a = np.arange(6)
a = np.reshape(a, (3,2))
print(a[:])
# assign -1 to the 2nd row
a[1:2] = -1
print(a)
[[ 0.  1.]
 [ 2.  3.]
 [ 4.  5.]]
[[ 0.  1.]
 [-1. -1.]
 [ 4.  5.]]

We can also slice along a particular axis with the slice_axis function:

In [5]:
# slice out the 2nd column
d = np.slice_axis(a, axis=1, begin=1, end=2)
print(d)
[[ 1.]
 [-1.]
 [ 5.]]
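
For readers more familiar with NumPy indexing, the result above corresponds to the basic slice a[:, 1:2]. The sketch below emulates the same selection in plain NumPy; slice_axis_numpy is a hypothetical helper written for this illustration, not part of MinPy or NumPy, and it assumes that begin is inclusive and end is exclusive, as the output above suggests.

```python
import numpy as np  # assumption: plain NumPy stands in for minpy.numpy

def slice_axis_numpy(arr, axis, begin, end):
    """Hypothetical helper: take the range [begin, end) along one axis."""
    index = [slice(None)] * arr.ndim   # start with ':' on every axis
    index[axis] = slice(begin, end)    # restrict only the requested axis
    return arr[tuple(index)]

a = np.arange(6).reshape(3, 2)
a[1:2] = -1                            # assign -1 to the 2nd row
d = slice_axis_numpy(a, axis=1, begin=1, end=2)
print(d)                               # the 2nd column, kept as a (3, 1) array
```

Note that plain NumPy's arange produces integers here, whereas the MinPy output above shows floating-point values.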

AutoGrad Feature

If you work in a policy mode called NumpyOnlyPolicy (refer here for more details), MinPy is compatible with most NumPy usage. But what makes MinPy awesome is that it gives you the power of autograd, saving you from writing the most tedious and error-prone part of a deep net implementation:

In [6]:
from minpy.core import grad

# define a function: f(x) = 5*x^2 + 3*x - 2
def foo(x):
    return 5*(x**2) + 3*x - 2

# f(4) = 90
print(foo(4))

# get the derivative function by `grad`: f'(x) = 10*x + 3
d_foo = grad(foo)

# f'(4) = 43.0
print(d_foo(4))
90
43.0
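
A common way to gain confidence in a derivative is to compare it against a numerical estimate. The sketch below uses a central finite difference in pure Python; numerical_derivative and its step size h are assumptions introduced for this illustration and are not part of MinPy.

```python
def foo(x):
    # f(x) = 5*x^2 + 3*x - 2
    return 5 * (x ** 2) + 3 * x - 2

def numerical_derivative(f, x, h=1e-5):
    """Hypothetical helper: central-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

approx = numerical_derivative(foo, 4.0)
# The analytic derivative is f'(x) = 10*x + 3, so f'(4) = 43.
print(abs(approx - 43.0) < 1e-4)  # True
```

The same comparison can be applied to the function returned by grad to check a gradient on more complicated functions.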

More details about this part can be found in the Autograd Tutorial.

GPU Support

But we do not stop here: we want MinPy to be not only friendly to use but also fast. To this end, MinPy leverages the GPU’s parallel computing ability. The code below demonstrates GPU support and a set of APIs that let you freely change the running context (i.e., run on CPU or GPU). Refer to Select Context for MXNet for more details.

In [7]:
import minpy.numpy as np
import minpy.numpy.random as random
from minpy.context import cpu, gpu
import time

n = 100

with cpu():
    x_cpu = random.rand(1024, 1024) - 0.5
    y_cpu = random.rand(1024, 1024) - 0.5

    # dry run (warm-up, not timed)
    for i in range(10):
        z_cpu = np.dot(x_cpu, y_cpu)
    z_cpu.asnumpy()

    # real run (timed)
    t0 = time.time()
    for i in range(n):
        z_cpu = np.dot(x_cpu, y_cpu)
    z_cpu.asnumpy()
    t1 = time.time()

with gpu(0):
    x_gpu0 = random.rand(1024, 1024) - 0.5
    y_gpu0 = random.rand(1024, 1024) - 0.5

    # dry run (warm-up, not timed)
    for i in range(10):
        z_gpu0 = np.dot(x_gpu0, y_gpu0)
    z_gpu0.asnumpy()

    # real run (timed)
    t2 = time.time()
    for i in range(n):
        z_gpu0 = np.dot(x_gpu0, y_gpu0)
    z_gpu0.asnumpy()
    t3 = time.time()

print("run on cpu: %.6f s/iter" % ((t1 - t0) / n))
print("run on gpu: %.6f s/iter" % ((t3 - t2) / n))
run on cpu: 0.100039 s/iter
run on gpu: 0.000422 s/iter

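The asnumpy() call may look somewhat mysterious: it implies that z_cpu is not NumPy’s ndarray type, and indeed it is not. For fast execution, MXNet maintains its own data structure, NDArray. The asnumpy() call synchronizes z_cpu back into a NumPy array. Because MXNet executes operations asynchronously, the call also waits for pending computation to finish, which is why it sits inside the timed region.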
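
The warm-up-then-measure pattern used in the benchmark can be factored into a small helper. The sketch below is an illustration under stated assumptions: time_per_iter and its parameters are hypothetical names, and plain NumPy on smaller matrices is used so that the example runs quickly without MinPy or a GPU. With an asynchronous backend, the function being timed would also need to end with a synchronizing call such as asnumpy().

```python
import time
import numpy as np  # assumption: plain NumPy stands in for minpy.numpy

def time_per_iter(fn, n_warmup=10, n_iter=100):
    """Hypothetical helper: average seconds per call of fn after a warm-up."""
    for _ in range(n_warmup):      # warm-up calls are not timed
        fn()
    start = time.perf_counter()
    for _ in range(n_iter):        # timed calls
        fn()
    return (time.perf_counter() - start) / n_iter

rng = np.random.default_rng(0)
x = rng.random((256, 256)) - 0.5
y = rng.random((256, 256)) - 0.5

elapsed = time_per_iter(lambda: np.dot(x, y))
print("numpy dot: %.6f s/iter" % elapsed)
```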

As you can see, matrix multiplication is far faster on the GPU than on the CPU: in the run above, 0.000422 s versus 0.100039 s per iteration, a speedup of more than 200x. That is why the default policy mode is PreferMXNetPolicy, which means MinPy dispatches operators to MXNet whenever possible and falls back transparently when there is no MXNet implementation. MXNet operations run on GPU, whereas the fallbacks run on CPU.
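
The idea of "prefer one backend, fall back to another" can be pictured with a toy dispatcher. The sketch below is purely conceptual and is not MinPy's implementation: FAST_OPS and dispatch are hypothetical names, and the "fast" operator is just a NumPy call standing in for an accelerated one.

```python
import numpy as np

# Hypothetical registry of operators available in the preferred backend.
# A plain NumPy call stands in for an accelerated implementation here.
FAST_OPS = {
    "dot": lambda a, b: np.dot(a, b),
}

def dispatch(name, *args):
    """Use the preferred backend if it has the operator, else fall back to NumPy."""
    if name in FAST_OPS:
        return "fast", FAST_OPS[name](*args)
    return "fallback", getattr(np, name)(*args)

x = np.ones((2, 2))
print(dispatch("dot", x, x)[0])   # fast
print(dispatch("trace", x)[0])    # fallback
```

The caller writes the same code in both cases; only the dispatcher decides which implementation runs.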

See Transparent Fallback for more details.

Something You Need to Know

With Transparent Fallback, we hope to transparently improve running speed without you changing a line of code. This can be achieved by expanding the set of MXNet GPU operators.

However, there are some important pitfalls you should know about when using MinPy. We strongly suggest that you read about them next.