NumPy under MinPy, with GPU
This part of the tutorial is also available as a step-by-step notebook on GitHub. Please try it out!
Basic NDArray Operation
MinPy has the same syntax as NumPy, which is the language of choice for numerical computing, and in particular deep learning. The popular Stanford course cs231n uses NumPy for its main coursework. To use NumPy under MinPy, you only need to replace import numpy as np with import minpy.numpy as np at the header of your NumPy program. If you are not familiar with NumPy, you may want to look at the NumPy Quickstart for more details.
There are two simple but important reasons to use NumPy under MinPy, one for productivity and one for performance: 1) auto-differentiation and 2) GPU/CPU co-execution. We will discuss both in this tutorial.
But first, let us review some of the most common usages of NumPy.
An array can be created in multiple ways. For example, we can create an array from a regular Python list or tuple by using the array function:

import minpy.numpy as np

a = np.array([1,2,3])            # create a 1-dimensional array from a Python list
b = np.array([[1,2,3], [2,3,4]]) # create a 2-dimensional array from a nested Python list
Here are some useful ways to create arrays with initial placeholder content.
a = np.zeros((2,3))   # create a 2-dimensional array full of zeros with shape (2,3)
b = np.ones((2,3))    # create an array of the same shape full of ones
c = np.full((2,3), 7) # create an array of the same shape with all elements set to 7
d = np.empty((2,3))   # create an array of the same shape whose initial content is arbitrary and depends on the state of the memory
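Since MinPy mirrors NumPy's interface, the behavior of these constructors can be checked with plain NumPy as a stand-in for minpy.numpy; a minimal sketch verifying shapes and contents:

```python
import numpy as np  # stand-in for minpy.numpy; the constructor interface is the same

a = np.zeros((2, 3))    # all zeros
b = np.ones((2, 3))     # all ones
c = np.full((2, 3), 7)  # all sevens
d = np.empty((2, 3))    # uninitialized; contents are arbitrary

# every constructor above yields the same (2, 3) shape
print(a.shape, b.sum(), c[0, 0])  # (2, 3) 6.0 7
```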
Arithmetic operators on arrays apply elementwise, with a new array holding the result.
a = np.ones((2,3))
b = np.ones((2,3))
c = a + b            # elementwise plus
d = - c              # elementwise minus
print(d)
e = np.sin(c**2).T   # elementwise power and sine, then transpose
print(e)
f = np.maximum(a, c) # elementwise max
print(f)
[[-2. -2. -2.]
 [-2. -2. -2.]]
[[-0.7568025 -0.7568025]
 [-0.7568025 -0.7568025]
 [-0.7568025 -0.7568025]]
[[ 2.  2.  2.]
 [ 2.  2.  2.]]
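Elementwise operators also broadcast arrays of compatible shapes against each other; this is standard NumPy behavior that the same syntax carries over. A minimal sketch, written against plain numpy as a stand-in for minpy.numpy:

```python
import numpy as np  # stand-in for minpy.numpy

a = np.ones((2, 3))
row = np.array([10.0, 20.0, 30.0])  # shape (3,)

# the (3,) row is broadcast across each of the 2 rows of `a`
b = a + row
print(b)
# [[11. 21. 31.]
#  [11. 21. 31.]]
```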
Indexing and Slicing
The slice operator [] applies on axis 0.
a = np.arange(6)
a = np.reshape(a, (3,2))
print(a[:])
# assign -1 to the 2nd row
a[1:2] = -1
print(a)
[[ 0.  1.]
 [ 2.  3.]
 [ 4.  5.]]
[[ 0.  1.]
 [-1. -1.]
 [ 4.  5.]]
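Note that in plain NumPy a basic slice is a view, which is why assigning through a[1:2] above mutates the original array in place (MinPy's operator dispatching may copy under the hood, so treat this as NumPy semantics). A quick sketch with numpy as a stand-in:

```python
import numpy as np  # stand-in for minpy.numpy

a = np.arange(6).reshape(3, 2)
view = a[1:2]  # basic slicing returns a view, not a copy
view[:] = -1   # writing through the view mutates `a` itself
print(a)
# [[ 0  1]
#  [-1 -1]
#  [ 4  5]]
```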
We can also slice along a particular axis with the slice_axis method:
# slice out the 2nd column
d = np.slice_axis(a, axis=1, begin=1, end=2)
print(d)
[[ 1.]
 [-1.]
 [ 5.]]
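slice_axis is an MXNet-flavored operator; in plain NumPy the same column can be taken with ordinary slicing, so either spelling should produce an identical result. A sketch of the equivalence, using numpy and the array from the example above:

```python
import numpy as np  # stand-in for minpy.numpy

a = np.arange(6, dtype=np.float64).reshape(3, 2)
a[1:2] = -1  # as in the example above

# ordinary slicing keeps the trailing axis,
# matching slice_axis(a, axis=1, begin=1, end=2)
d = a[:, 1:2]
print(d)
# [[ 1.]
#  [-1.]
#  [ 5.]]
```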
If you work in a NumPy-only policy mode (refer to the policy documentation for more details), MinPy is compatible with most NumPy usage. But what makes MinPy awesome is that it gives you the power of autograd, saving you from writing the most tedious and error-prone part of a deep net implementation: the backward pass.
from minpy.core import grad

# define a function: f(x) = 5*x^2 + 3*x - 2
def foo(x):
    return 5*(x**2) + 3*x - 2

# f(4) = 90
print(foo(4))

# get the derivative function by `grad`: f'(x) = 10*x + 3
d_foo = grad(foo)

# f'(4) = 43.0
print(d_foo(4))
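The derivative that grad returns can be sanity-checked numerically with a central finite difference; the sketch below uses only plain Python, with the analytic derivative 10*x + 3 worked out by hand:

```python
def foo(x):
    # the same function as above: f(x) = 5*x^2 + 3*x - 2
    return 5*(x**2) + 3*x - 2

def d_foo_analytic(x):
    # d/dx (5x^2 + 3x - 2) = 10x + 3
    return 10*x + 3

def d_foo_numeric(x, eps=1e-6):
    # central finite-difference approximation of the derivative
    return (foo(x + eps) - foo(x - eps)) / (2 * eps)

print(foo(4))             # 90
print(d_foo_analytic(4))  # 43
print(round(d_foo_numeric(4), 4))
```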
More details about this part can be found in Autograd Tutorial.
But we do not stop here: we want MinPy to be not only friendly to use but also fast. To this end, MinPy leverages the GPU's parallel computing ability. The code below shows our GPU support and a set of APIs that let you freely change the running context (i.e., whether to run on CPU or GPU). You can refer to Select Context for MXNet for more details.
import minpy.numpy as np
import minpy.numpy.random as random
from minpy.context import cpu, gpu
import time

n = 100

with cpu():
    x_cpu = random.rand(1024, 1024) - 0.5
    y_cpu = random.rand(1024, 1024) - 0.5

    # dry run
    for i in range(10):
        z_cpu = np.dot(x_cpu, y_cpu)
    z_cpu.asnumpy()

    # real run
    t0 = time.time()
    for i in range(n):
        z_cpu = np.dot(x_cpu, y_cpu)
    z_cpu.asnumpy()
    t1 = time.time()

with gpu(0):
    x_gpu0 = random.rand(1024, 1024) - 0.5
    y_gpu0 = random.rand(1024, 1024) - 0.5

    # dry run
    for i in range(10):
        z_gpu0 = np.dot(x_gpu0, y_gpu0)
    z_gpu0.asnumpy()

    # real run
    t2 = time.time()
    for i in range(n):
        z_gpu0 = np.dot(x_gpu0, y_gpu0)
    z_gpu0.asnumpy()
    t3 = time.time()

print("run on cpu: %.6f s/iter" % ((t1 - t0) / n))
print("run on gpu: %.6f s/iter" % ((t3 - t2) / n))
run on cpu: 0.100039 s/iter
run on gpu: 0.000422 s/iter
The asnumpy() call may look somewhat mysterious; it implies that z_cpu is not of the NumPy ndarray type. Indeed, this is true: for fast execution, MXNet maintains its own data structure, NDArray. The asnumpy() call syncs z_cpu back into a NumPy array.
As you can see, there is a large gap between the speed of matrix multiplication on CPU and on GPU. That's why we set the default policy mode to PreferMXNetPolicy: MinPy dispatches each operator to MXNet whenever possible and transparently falls back when there is no MXNet implementation. MXNet operations run on the GPU, whereas the fallbacks run on the CPU. See Transparent Fallback for more details.
Something You Need to Know
With Transparent Fallback, we hope to transparently improve running speed without your changing a single line of code. This is achieved by expanding the set of MXNet GPU operators.
However, there are some important pitfalls you should be aware of when using MinPy; we strongly suggest that you read about them next.