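I recently needed an incrementally growing NumPy array in Python, and since I couldn't find anything that implements one, I wrote my own. I'd like to know whether my approach is the best one or whether you can suggest other ideas.
The problem: I have a 2D array (the program handles nD arrays) whose size is not known in advance, and variable amounts of data need to be concatenated to it along one direction (say I'm going to call np.vstack many times). Every time I concatenate data I need to take the array, sort it along axis 0, and do other operations on it, so I can't build a long list of arrays and then np.vstack the whole list at once. Since memory allocation is expensive, I turned to an incremental array: whenever it has to grow, I grow it by more than I currently need (I use a 50% increment) in order to minimize the number of allocations.
I coded this up, and you can see it in the following code: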
import numpy as np

class ExpandingArray:
    __DEFAULT_ALLOC_INIT_DIM = 10   # default initial dimension for every axis if nothing is given by the user
    __DEFAULT_MAX_INCREMENT = 10    # default value in order to limit the increment of memory allocation
    __MAX_INCREMENT = []    # Max increment
    __ALLOC_DIMS = []       # Dimensions of the allocated np.array
    __DIMS = []             # Dimensions of the view with data on the allocated np.array (__DIMS <= __ALLOC_DIMS)
    __ARRAY = []            # Allocated array

    def __init__(self, initData, allocInitDim=None, dtype=np.float64, maxIncrement=None):
        self.__DIMS = np.array(initData.shape)
        self.__MAX_INCREMENT = maxIncrement
        if self.__MAX_INCREMENT is None:
            self.__MAX_INCREMENT = self.__DEFAULT_MAX_INCREMENT
        # Compute the allocation dimensions based on user's input
        if allocInitDim is None:
            allocInitDim = self.__DIMS.copy()
        else:
            allocInitDim = np.array(allocInitDim)
        while np.any(allocInitDim < self.__DIMS) or np.any(allocInitDim == 0):
            for i in range(len(self.__DIMS)):
                if allocInitDim[i] == 0:
                    allocInitDim[i] = self.__DEFAULT_ALLOC_INIT_DIM
                if allocInitDim[i] < self.__DIMS[i]:
                    # Grow by 50%, capped at __MAX_INCREMENT; at least 1 so the loop terminates
                    allocInitDim[i] += max(1, min(allocInitDim[i] // 2, self.__MAX_INCREMENT))
        # Allocate memory
        self.__ALLOC_DIMS = allocInitDim
        self.__ARRAY = np.zeros(tuple(self.__ALLOC_DIMS), dtype=dtype)
        # Set initData (index with a tuple of slices, one per axis)
        sliceIdxs = tuple(slice(d) for d in self.__DIMS)
        self.__ARRAY[sliceIdxs] = initData

    def shape(self):
        return tuple(self.__DIMS)

    def getAllocArray(self):
        return self.__ARRAY

    def getDataArray(self):
        """
        Get the view of the array with data
        """
        sliceIdxs = tuple(slice(d) for d in self.__DIMS)
        return self.__ARRAY[sliceIdxs]

    def concatenate(self, X, axis=0):
        if axis >= len(self.__DIMS):
            print("Error: axis number exceeds the number of dimensions")
            return
        # Check dimensions for remaining axes
        for i in range(len(self.__DIMS)):
            if i != axis:
                if X.shape[i] != self.shape()[i]:
                    print("Error: Dimensions of the input array are not consistent in the axis %d" % i)
                    return
        # Check whether allocated memory is enough
        needAlloc = False
        while self.__ALLOC_DIMS[axis] < self.__DIMS[axis] + X.shape[axis]:
            needAlloc = True
            # Increase the __ALLOC_DIMS
            self.__ALLOC_DIMS[axis] += max(1, min(self.__ALLOC_DIMS[axis] // 2, self.__MAX_INCREMENT))
        # Reallocate memory and copy old data
        if needAlloc:
            # Allocate (keep the dtype of the existing array)
            newArray = np.zeros(tuple(self.__ALLOC_DIMS), dtype=self.__ARRAY.dtype)
            # Copy
            sliceIdxs = tuple(slice(d) for d in self.__DIMS)
            newArray[sliceIdxs] = self.__ARRAY[sliceIdxs]
            self.__ARRAY = newArray
        # Concatenate new data
        sliceIdxs = []
        for i in range(len(self.__DIMS)):
            if i != axis:
                sliceIdxs.append(slice(self.__DIMS[i]))
            else:
                sliceIdxs.append(slice(self.__DIMS[i], self.__DIMS[i] + X.shape[i]))
        self.__ARRAY[tuple(sliceIdxs)] = X
        self.__DIMS[axis] += X.shape[axis]
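In my tests this code performs better than vstack/hstack when concatenating several chunks of random size.
What I'd like to know is: is this the best way? Is there anything in NumPy that already does this?
It would also be nice to be able to overload the np.array slice assignment operator, so that as soon as the user assigns something outside the actual dimensions, an ExpandingArray.concatenate() is performed. How can such overloading be done?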
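On the overloading question: in Python, `obj[key] = value` calls `obj.__setitem__(key, value)`, so a wrapper class can intercept slice assignment. Below is a minimal sketch, not a full implementation. The class name `GrowingRows`, its `data` property, and the `_reserve` helper are hypothetical names made up for illustration; the sketch only handles 2D data growing along axis 0, and only row slices with a non-negative start, an explicit stop, and step 1 trigger growth.

```python
import numpy as np


class GrowingRows:
    """Hypothetical 2D container that grows along axis 0 on slice assignment."""

    def __init__(self, ncols, capacity=10, dtype=np.float64):
        self._buf = np.zeros((capacity, ncols), dtype=dtype)  # allocated storage
        self._nrows = 0  # number of rows that actually hold data

    @property
    def data(self):
        # View (not a copy) of the rows that hold data
        return self._buf[:self._nrows]

    def _reserve(self, nrows):
        # Make sure at least `nrows` rows are allocated, growing by >= 50%
        cap = self._buf.shape[0]
        if nrows <= cap:
            return
        new_cap = max(nrows, cap + cap // 2 + 1)
        new_buf = np.zeros((new_cap, self._buf.shape[1]), dtype=self._buf.dtype)
        new_buf[:self._nrows] = self._buf[:self._nrows]
        self._buf = new_buf

    def __getitem__(self, key):
        return self.data[key]

    def __setitem__(self, key, value):
        grows = (
            isinstance(key, slice)
            and key.step in (None, 1)
            and key.stop is not None and key.stop > self._nrows
            and (key.start is None or key.start >= 0)
        )
        if not grows:
            self.data[key] = value  # ordinary in-bounds assignment
            return
        self._reserve(key.stop)     # reallocate only if the buffer is too small
        self._nrows = key.stop
        self._buf[key] = value


g = GrowingRows(ncols=2, capacity=2)
g[0:3] = np.ones((3, 2))        # past the end: the container grows
g[3:5] = 2 * np.ones((2, 2))    # grows again
print(g.data.shape)             # (5, 2)
```

If the slice starts beyond the current data, the rows in between are left as zeros in this sketch; a real implementation would have to decide whether that is acceptable or should raise an error.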
Test code: I'm also posting some code here that compares vstack with my method. I append random chunks of data with a maximum length of 100.
import time
import numpy as np

N = 10000

def performEA(N):
    EA = ExpandingArray(np.zeros((0, 2)), maxIncrement=1000)
    for i in range(N):
        nNew = np.random.randint(1, 101)  # chunk length in [1, 100]
        X = np.random.rand(nNew, 2)
        EA.concatenate(X, axis=0)
        # Perform operations on EA.getDataArray()
    return EA

def performVStack(N):
    A = np.zeros((0, 2))
    for i in range(N):
        nNew = np.random.randint(1, 101)  # chunk length in [1, 100]
        X = np.random.rand(nNew, 2)
        A = np.vstack((A, X))
        # Perform operations on A
    return A

start_EA = time.perf_counter()
EA = performEA(N)
stop_EA = time.perf_counter()

start_VS = time.perf_counter()
VS = performVStack(N)
stop_VS = time.perf_counter()

print("Elapsed Time EA: %.2f" % (stop_EA - start_EA))
print("Elapsed Time VS: %.2f" % (stop_VS - start_VS))
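Answer: I think the most common design pattern for this kind of thing is simply to use a list of the small arrays. Of course you could do dynamic resizing (and if you want to do crazy things, you can also try the array's resize method). I think a typical approach is to always double the size when you really don't know how large things will get. And of course, if you do know how large the array will grow, preallocating the whole thing up front is the simplest.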
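The doubling strategy mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the function name `grow_by_doubling`, its parameters, and the reallocation counter are made up for this example, and it uses a plain allocate-and-copy step rather than the array's resize method. The point it shows is that doubling keeps the number of reallocations small (logarithmic in the final size).

```python
import numpy as np


def grow_by_doubling(chunks, ncols, capacity=16):
    """Append 2D chunks along axis 0, doubling the buffer whenever it is full.

    Returns the filled part of the buffer and the number of reallocations.
    `capacity` must be at least 1.
    """
    buf = np.empty((capacity, ncols))
    nrows = 0      # rows currently holding data
    reallocs = 0   # how many times a bigger buffer had to be allocated
    for X in chunks:
        needed = nrows + X.shape[0]
        if needed > buf.shape[0]:
            new_cap = buf.shape[0]
            while new_cap < needed:
                new_cap *= 2
            new_buf = np.empty((new_cap, ncols))
            new_buf[:nrows] = buf[:nrows]  # copy only the rows with data
            buf = new_buf
            reallocs += 1
        buf[nrows:needed] = X
        nrows = needed
    return buf[:nrows], reallocs


rng = np.random.default_rng(0)
chunks = [rng.random((int(rng.integers(1, 101)), 2)) for _ in range(1000)]
data, reallocs = grow_by_doubling(chunks, ncols=2)
print(data.shape, reallocs)
```

With 1000 chunks of at most 100 rows each, the buffer never needs more than 13 doublings from its initial 16 rows, whereas calling np.vstack once per chunk allocates a new array 1000 times.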
# List pattern: collect the chunks in a Python list and stack them once at the end
def performVStack_fromlist(N):
    l = []
    for i in range(N):
        nNew = np.random.randint(1, 101)  # chunk length in [1, 100]
        X = np.random.rand(nNew, 2)
        l.append(X)
    return np.vstack(l)
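I'm sure there are use cases where an expanding array is useful (for example when the appended arrays are all very small), but this loop is better handled with the list pattern shown above. The optimization is mostly about how often you need to copy things around, and with a list like this (apart from the list itself) the data is copied exactly once. So it is normally much faster.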
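To make the copying argument concrete, here is a small back-of-the-envelope sketch that counts how many rows get written under each strategy. The function names are hypothetical and the chunk size of 50 rows is just an assumed average for the 1 to 100 row chunks in the test; the count ignores allocation overhead and only tracks data movement.

```python
def rows_written_repeated_vstack(chunk_sizes):
    """Rows written when np.vstack((A, X)) is called once per chunk.

    Every call builds a brand-new array holding everything seen so far.
    """
    total_rows = 0
    rows_written = 0
    for n in chunk_sizes:
        total_rows += n
        rows_written += total_rows
    return rows_written


def rows_written_list_then_vstack(chunk_sizes):
    """Rows written when chunks are kept in a list and stacked once."""
    return sum(chunk_sizes)


sizes = [50] * 10000  # assumed: 10000 chunks of 50 rows each
print(rows_written_repeated_vstack(sizes))   # 2500250000
print(rows_written_list_then_vstack(sizes))  # 500000
```

Under these assumptions, repeated stacking moves about 5000 times more data than stacking the list once, which is why the list pattern usually wins when the intermediate arrays are not needed.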