Issue
I'm getting some efficiency test results that I can't explain.
I want to assemble a matrix B whose i-th entries B[i,:,:] = A[i,:,:].dot(x), where each A[i,:,:] is a 2D matrix, and so is x.
I can do this three ways, to test performance I make random (numpy.random.randn
) matrices A = (10,1000,1000), x = (1000,1200). I get the following time results:
(1) single multidimensional dot product
B = A.dot(x)
total time: 102.361 s
(2) looping through i and performing 2D dot products
# initialize B = np.zeros([dim1, dim2, dim3])
for i in range(A.shape[0]):
B[i,:,:] = A[i,:,:].dot(x)
total time: 0.826 s
(3) numpy.einsum
B3 = np.einsum("ijk, kl -> ijl", A, x)
total time: 8.289 s
So, option (2) is the fastest by far. But, considering just (1) and (2), I don't see the big difference between them. How can looping through and doing 2D dot products be ~ 124 times faster? They both use numpy.dot. Any insights?
I include the code used for the above results just below:
import numpy as np
import numpy.random as npr
import time
dim1, dim2, dim3 = 10, 1000, 1200
A = npr.randn(dim1, dim2, dim2)
x = npr.randn(dim2, dim3)
# consider three ways of assembling the same matrix B: B1, B2, B3
t = time.time()
B1 = np.dot(A,x)
td1 = time.time() - t
print "a single dot product of A [shape = (%d, %d, %d)] with x [shape = (%d, %d)] completes in %.3f s" \
% (A.shape[0], A.shape[1], A.shape[2], x.shape[0], x.shape[1], td1)
B2 = np.zeros([A.shape[0], x.shape[0], x.shape[1]])
t = time.time()
for i in range(A.shape[0]):
B2[i,:,:] = np.dot(A[i,:,:], x)
td2 = time.time() - t
print "taking %d dot products of 2D dot products A[i,:,:] [shape = (%d, %d)] with x [shape = (%d, %d)] completes in %.3f s" \
% (A.shape[0], A.shape[1], A.shape[2], x.shape[0], x.shape[1], td2)
t = time.time()
B3 = np.einsum("ijk, kl -> ijl", A, x)
td3 = time.time() - t
print "using np.einsum, it completes in %.3f s" % td3
Solution
With smaller dims 10,100,200
, I get a similar ranking
In [355]: %%timeit
.....: B=np.zeros((N,M,L))
.....: for i in range(N):
B[i,:,:]=np.dot(A[i,:,:],x)
.....:
10 loops, best of 3: 22.5 ms per loop
In [356]: timeit np.dot(A,x)
10 loops, best of 3: 44.2 ms per loop
In [357]: timeit np.einsum('ijk,km->ijm',A,x)
10 loops, best of 3: 29 ms per loop
In [367]: timeit np.dot(A.reshape(-1,M),x).reshape(N,M,L)
10 loops, best of 3: 22.1 ms per loop
In [375]: timeit np.tensordot(A,x,(2,0))
10 loops, best of 3: 22.2 ms per loop
the itererative is faster, though not by as much as in your case.
This is probably true as long as that iterating dimension is small compared to the other ones. In that case the overhead of iteration (function calls etc) is small compared to the calculation time. And doing all the values at once uses more memory.
I tried a dot
variation where I reshaped A
into 2d, thinking that dot
does that kind of reshaping internally. I'm surprised that it is actually fastest. tensordot
is probably doing the same reshaping (that code if Python readable).
einsum
sets up a 'sum of products' iteration involving 4 variables, the i,j,k,m
- that is dim1*dim2*dim2*dim3
steps with the C level nditer
. So the more indices you have the larger the iteration space.
Answered By - hpaulj Answer Checked By - Katrina (PHPFixing Volunteer)
0 Comments:
Post a Comment
Note: Only a member of this blog may post a comment.