NumPy under MinPy, with GPU
This part of the tutorial is also available as a step-by-step notebook on GitHub. Please try it out!
Basic NDArray Operation
MinPy has the same syntax as NumPy, which is the language of choice for numerical computing, and in particular deep learning. The popular Stanford course cs231n uses NumPy for its main coursework. To use NumPy under MinPy, you only need to replace import numpy as np with import minpy.numpy as np at the header of your NumPy program. If you are not familiar with NumPy, you may want to look at the NumPy Quickstart for more details.
There are two simple but important reasons to use NumPy under MinPy, one for productivity and one for performance: 1) auto-differentiation and 2) GPU/CPU co-execution. We will discuss both in this tutorial.
But first, let us review some of the most common usages of NumPy.
An array can be created in multiple ways. For example, we can create an array from a regular Python list or tuple by using the array function:

import minpy.numpy as np

a = np.array([1,2,3])            # create a 1-dimensional array from a Python list
b = np.array([[1,2,3], [2,3,4]]) # create a 2-dimensional array from a nested Python list
Here are some useful ways to create arrays with initial placeholder content.
a = np.zeros((2,3))   # create a 2-dimensional array full of zeros with shape (2,3)
b = np.ones((2,3))    # create an array of the same shape full of ones
c = np.full((2,3), 7) # create an array of the same shape with all elements set to 7
d = np.empty((2,3))   # create an array of the same shape whose initial content is arbitrary and depends on the state of the memory
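Since MinPy mirrors NumPy's interface, the behavior of these constructors can be checked with plain NumPy as a stand-in for minpy.numpy; a minimal sketch verifying shapes and contents:

```python
import numpy as np  # stand-in for minpy.numpy; the constructor interface is the same

a = np.zeros((2, 3))    # all zeros
b = np.ones((2, 3))     # all ones
c = np.full((2, 3), 7)  # all sevens
d = np.empty((2, 3))    # uninitialized; contents are arbitrary

# every constructor above yields the same (2, 3) shape
print(a.shape, b.sum(), c[0, 0])  # (2, 3) 6.0 7
```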
Arithmetic operators on arrays apply elementwise, with a new array holding the result.
a = np.ones((2,3))
b = np.ones((2,3))
c = a + b            # elementwise plus
d = - c              # elementwise minus
print(d)
e = np.sin(c**2).T   # elementwise power and sine, then transpose
print(e)
f = np.maximum(a, c) # elementwise max
print(f)
[[-2. -2. -2.]
 [-2. -2. -2.]]
[[-0.7568025 -0.7568025]
 [-0.7568025 -0.7568025]
 [-0.7568025 -0.7568025]]
[[ 2.  2.  2.]
 [ 2.  2.  2.]]
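Elementwise operators also broadcast arrays of compatible shapes against each other; this is standard NumPy behavior that the same syntax carries over. A minimal sketch, written against plain numpy as a stand-in for minpy.numpy:

```python
import numpy as np  # stand-in for minpy.numpy

a = np.ones((2, 3))
row = np.array([10.0, 20.0, 30.0])  # shape (3,)

# the (3,) row is broadcast across each of the 2 rows of `a`
b = a + row
print(b)
# [[11. 21. 31.]
#  [11. 21. 31.]]
```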
Indexing and Slicing
The slice operator [] applies on axis 0.
a = np.arange(6)
a = np.reshape(a, (3,2))
print(a[:])
# assign -1 to the 2nd row
a[1:2] = -1
print(a)
[[ 0.  1.]
 [ 2.  3.]
 [ 4.  5.]]
[[ 0.  1.]
 [-1. -1.]
 [ 4.  5.]]
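Note that in plain NumPy a basic slice is a view, which is why assigning through a[1:2] above mutates the original array in place (MinPy's operator dispatching may copy under the hood, so treat this as NumPy semantics). A quick sketch with numpy as a stand-in:

```python
import numpy as np  # stand-in for minpy.numpy

a = np.arange(6).reshape(3, 2)
view = a[1:2]  # basic slicing returns a view, not a copy
view[:] = -1   # writing through the view mutates `a` itself
print(a)
# [[ 0  1]
#  [-1 -1]
#  [ 4  5]]
```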
We can also slice along a particular axis with the slice_axis method:
# slice out the 2nd column
d = np.slice_axis(a, axis=1, begin=1, end=2)
print(d)
[[ 1.]
 [-1.]
 [ 5.]]
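slice_axis is an MXNet-flavored operator; in plain NumPy the same column can be taken with ordinary slicing, so either spelling should produce an identical result. A sketch of the equivalence, using numpy and the array from the example above:

```python
import numpy as np  # stand-in for minpy.numpy

a = np.arange(6, dtype=np.float64).reshape(3, 2)
a[1:2] = -1  # as in the example above

# ordinary slicing keeps the trailing axis,
# matching slice_axis(a, axis=1, begin=1, end=2)
d = a[:, 1:2]
print(d)
# [[ 1.]
#  [-1.]
#  [ 5.]]
```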
If you work in a NumPy-only policy mode (refer to the policy documentation for more details), MinPy is compatible with most NumPy usage. But what makes MinPy awesome is that it gives you the power of autograd, saving you from writing the most tedious and error-prone part of a deep net implementation: the backward pass.
from minpy.core import grad

# define a function: f(x) = 5*x^2 + 3*x - 2
def foo(x):
    return 5*(x**2) + 3*x - 2

# f(4) = 90
print(foo(4))

# get the derivative function by `grad`: f'(x) = 10*x + 3
d_foo = grad(foo)

# f'(4) = 43.0
print(d_foo(4))
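The derivative that grad returns can be sanity-checked numerically with a central finite difference; the sketch below uses only plain Python, with the analytic derivative 10*x + 3 worked out by hand:

```python
def foo(x):
    # the same function as above: f(x) = 5*x^2 + 3*x - 2
    return 5*(x**2) + 3*x - 2

def d_foo_analytic(x):
    # d/dx (5x^2 + 3x - 2) = 10x + 3
    return 10*x + 3

def d_foo_numeric(x, eps=1e-6):
    # central finite-difference approximation of the derivative
    return (foo(x + eps) - foo(x - eps)) / (2 * eps)

print(foo(4))             # 90
print(d_foo_analytic(4))  # 43
print(round(d_foo_numeric(4), 4))
```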
More details about this part can be found in Autograd Tutorial.
But we do not stop here: we want MinPy to be not only friendly to use but also fast. To this end, MinPy leverages the GPU's parallel computing ability. The code below shows our GPU support and a set of APIs that let you freely change the running context (i.e., whether to run on CPU or GPU). You can refer to Select Context for MXNet for more details.
import minpy.numpy as np
import minpy.numpy.random as random
from minpy.context import cpu, gpu
import time

n = 100

with cpu():
    x_cpu = random.rand(1024, 1024) - 0.5
    y_cpu = random.rand(1024, 1024) - 0.5

    # dry run
    for i in range(10):
        z_cpu = np.dot(x_cpu, y_cpu)
    z_cpu.asnumpy()

    # real run
    t0 = time.time()
    for i in range(n):
        z_cpu = np.dot(x_cpu, y_cpu)
    z_cpu.asnumpy()
    t1 = time.time()

with gpu(0):
    x_gpu0 = random.rand(1024, 1024) - 0.5
    y_gpu0 = random.rand(1024, 1024) - 0.5

    # dry run
    for i in range(10):
        z_gpu0 = np.dot(x_gpu0, y_gpu0)
    z_gpu0.asnumpy()

    # real run
    t2 = time.time()
    for i in range(n):
        z_gpu0 = np.dot(x_gpu0, y_gpu0)
    z_gpu0.asnumpy()
    t3 = time.time()

print("run on cpu: %.6f s/iter" % ((t1 - t0) / n))
print("run on gpu: %.6f s/iter" % ((t3 - t2) / n))
run on cpu: 0.100039 s/iter
run on gpu: 0.000422 s/iter
The asnumpy() call may look somewhat mysterious; it implies that z_cpu is not of the NumPy ndarray type. Indeed, this is true: for fast execution, MXNet maintains its own data structure, NDArray. The asnumpy() call syncs z_cpu back into a NumPy array.
As you can see, there is a large gap between the speed of matrix multiplication on CPU and on GPU. That's why we set the default policy mode to PreferMXNetPolicy: MinPy dispatches each operator to MXNet whenever possible and transparently falls back when there is no MXNet implementation. MXNet operations run on the GPU, whereas the fallbacks run on the CPU. See Transparent Fallback for more details.
Something You Need to Know
With Transparent Fallback, we hope to transparently improve running speed without your changing a single line of code. This is achieved by expanding the set of MXNet GPU operators.
However, there are some important pitfalls you should be aware of when using MinPy; we strongly suggest that you read about them next.