Size-incremental NumPy arrays in Python

印度阿三17 2019-06-25

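I just ran into the need for a size-incremental NumPy array in Python, and since I couldn't find anything that implements one, I wrote my own. I'd like to know whether my approach is the best one or whether you can suggest other ideas.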

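So, the problem is that I have a 2D array (the program handles nD arrays) whose size is not known in advance, and a variable amount of data needs to be concatenated to it along one direction (let's say I have to call np.vstack many times). Every time I concatenate data, I need to take the array, sort it along axis 0 and do other operations on it, so I cannot build a long list of arrays and then np.vstack the whole list at once.
Since memory allocation is expensive, I turned to incremental arrays: I grow the array by more than I actually need (I use 50% increments) in order to minimize the number of allocations.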
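To get a feel for why over-allocating helps, here is a small back-of-the-envelope sketch. It assumes rows arrive one at a time and that every growth step costs one reallocation; the function name and parameters are made up for illustration and are not part of the class discussed here.

```python
def count_reallocations(total_rows, initial_capacity=10, max_increment=None):
    """Count growth steps needed to hold total_rows rows with 50% increments.

    Hypothetical helper: max_increment optionally caps each growth step.
    """
    capacity = initial_capacity
    reallocations = 0
    while capacity < total_rows:
        increment = max(1, capacity // 2)        # grow by 50%, at least one row
        if max_increment is not None:
            increment = min(increment, max_increment)
        capacity += increment
        reallocations += 1
    return reallocations


print(count_reallocations(500_000))                      # uncapped 50% growth
print(count_reallocations(500_000, max_increment=1000))  # growth capped at 1000 rows
```

Uncapped geometric growth needs a number of reallocations that is logarithmic in the final size, whereas capping the increment makes the count grow roughly linearly once the cap is reached.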

I coded this up, and you can see it in the following code:

import numpy as np


class ExpandingArray:

    __DEFAULT_ALLOC_INIT_DIM = 10   # default initial dimension for every axis if nothing is given by the user
    __DEFAULT_MAX_INCREMENT = 10    # default value in order to limit the increment of memory allocation

    def __init__(self, initData, allocInitDim=None, dtype=np.float64, maxIncrement=None):
        # Dimensions of the view with data on the allocated np.array (__DIMS <= __ALLOC_DIMS)
        self.__DIMS = np.array(initData.shape)

        # Max increment per growth step
        self.__MAX_INCREMENT = maxIncrement
        if self.__MAX_INCREMENT is None:
            self.__MAX_INCREMENT = self.__DEFAULT_MAX_INCREMENT

        # Compute the allocation dimensions based on user's input
        if allocInitDim is None:
            allocInitDim = self.__DIMS.copy()
        else:
            allocInitDim = np.array(allocInitDim)

        while np.any(allocInitDim < self.__DIMS) or np.any(allocInitDim == 0):
            for i in range(len(self.__DIMS)):
                if allocInitDim[i] == 0:
                    allocInitDim[i] = self.__DEFAULT_ALLOC_INIT_DIM
                if allocInitDim[i] < self.__DIMS[i]:
                    # Grow by 50%, capped at the max increment (at least 1 so the loop terminates)
                    allocInitDim[i] += max(1, min(allocInitDim[i] // 2, self.__MAX_INCREMENT))

        # Allocate memory
        self.__ALLOC_DIMS = allocInitDim    # Dimensions of the allocated np.array
        self.__ARRAY = np.zeros(tuple(self.__ALLOC_DIMS), dtype=dtype)

        # Set initData (NumPy needs a tuple of slices, not a list, for multi-axis indexing)
        sliceIdxs = tuple(slice(d) for d in self.__DIMS)
        self.__ARRAY[sliceIdxs] = initData

    def shape(self):
        return tuple(self.__DIMS)

    def getAllocArray(self):
        return self.__ARRAY

    def getDataArray(self):
        """
        Get the view of the array with data
        """
        sliceIdxs = tuple(slice(d) for d in self.__DIMS)
        return self.__ARRAY[sliceIdxs]

    def concatenate(self, X, axis=0):
        if axis >= len(self.__DIMS):
            print("Error: axis number exceeds the number of dimensions")
            return

        # Check dimensions for the remaining axes
        for i in range(len(self.__DIMS)):
            if i != axis:
                if X.shape[i] != self.shape()[i]:
                    print("Error: Dimensions of the input array are not consistent in the axis %d" % i)
                    return

        # Check whether allocated memory is enough
        needAlloc = False
        while self.__ALLOC_DIMS[axis] < self.__DIMS[axis] + X.shape[axis]:
            needAlloc = True
            # Increase the __ALLOC_DIMS by 50%, capped at the max increment
            self.__ALLOC_DIMS[axis] += max(1, min(self.__ALLOC_DIMS[axis] // 2, self.__MAX_INCREMENT))

        # Reallocate memory and copy old data
        if needAlloc:
            # Allocate (keep the dtype of the existing array)
            newArray = np.zeros(tuple(self.__ALLOC_DIMS), dtype=self.__ARRAY.dtype)
            # Copy
            sliceIdxs = tuple(slice(d) for d in self.__DIMS)
            newArray[sliceIdxs] = self.__ARRAY[sliceIdxs]
            self.__ARRAY = newArray

        # Concatenate new data
        sliceIdxs = []
        for i in range(len(self.__DIMS)):
            if i != axis:
                sliceIdxs.append(slice(self.__DIMS[i]))
            else:
                sliceIdxs.append(slice(self.__DIMS[i], self.__DIMS[i] + X.shape[i]))

        self.__ARRAY[tuple(sliceIdxs)] = X
        self.__DIMS[axis] += X.shape[axis]

In my tests, this code performs better than vstack/hstack when doing several concatenations of random size.

What I'm wondering is: is this the best way? Is there anything in NumPy that already does this?

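It would also be nice to be able to overload the np.array slice assignment operator, so that as soon as the user assigns anything outside the actual dimensions, an ExpandingArray.concatenate() is performed. How can such overloading be done?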
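One possible direction for the overloading question is to wrap the buffer in a class that defines `__setitem__`, rather than trying to change np.ndarray itself. The following is only a minimal sketch under strong assumptions: it handles a 2D buffer that grows along axis 0, and it only accepts a non-negative int or a simple slice with an explicit stop as the row index. The class name `AutoGrowRows` and its attributes are hypothetical and not part of NumPy or of the code in the question.

```python
import numpy as np


class AutoGrowRows:
    """Hypothetical wrapper: a 2D buffer that grows along axis 0 on assignment."""

    def __init__(self, n_cols, capacity=10, dtype=np.float64):
        self._buf = np.zeros((capacity, n_cols), dtype=dtype)
        self._n_rows = 0  # number of leading rows that hold data

    def _ensure_capacity(self, n_rows):
        capacity = self._buf.shape[0]
        if n_rows <= capacity:
            return
        while capacity < n_rows:
            capacity += max(1, capacity // 2)  # 50% growth, at least one row
        new_buf = np.zeros((capacity, self._buf.shape[1]), dtype=self._buf.dtype)
        new_buf[:self._n_rows] = self._buf[:self._n_rows]
        self._buf = new_buf

    def __setitem__(self, key, value):
        # Work out the last row touched by the assignment (exclusive).
        if isinstance(key, slice):
            if key.stop is None or key.stop < 0 or (key.start or 0) < 0 \
                    or key.step not in (None, 1):
                raise IndexError("only simple forward slices with an explicit stop are supported")
            stop = key.stop
        elif isinstance(key, int) and key >= 0:
            stop = key + 1
        else:
            raise TypeError("row index must be a non-negative int or a slice")
        self._ensure_capacity(stop)   # grow first if the assignment is out of bounds
        self._buf[key] = value
        self._n_rows = max(self._n_rows, stop)

    @property
    def data(self):
        """View of the rows that currently hold data."""
        return self._buf[:self._n_rows]


a = AutoGrowRows(n_cols=2, capacity=2)
a[0:3] = np.ones((3, 2))   # beyond the initial capacity, so the buffer grows
a[5] = [7.0, 8.0]          # rows 3 and 4 are left as zeros
print(a.data.shape)
```

Any rows skipped by an out-of-bounds assignment are simply left at zero here; whether that is acceptable depends on the use case.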

Test code: I'm also posting some code here that compares vstack with my method. I add random chunks of data with a maximum length of 100.

import time

import numpy as np

N = 10000

def performEA(N):
    EA = ExpandingArray(np.zeros((0,2)),maxIncrement=1000)
    for i in range(N):
        nNew = np.random.randint(1, 101)   # random chunk length in [1, 100]
        X = np.random.rand(nNew,2)
        EA.concatenate(X,axis=0)
        # Perform operations on EA.getDataArray()
    return EA

def performVStack(N):
    A = np.zeros((0,2))
    for i in range(N):
        nNew = np.random.randint(1, 101)   # random chunk length in [1, 100]
        X = np.random.rand(nNew,2)
        A = np.vstack((A,X))
        # Perform operations on A
    return A

# time.perf_counter() replaces time.clock(), which was removed in Python 3.8
start_EA = time.perf_counter()
EA = performEA(N)
stop_EA = time.perf_counter()

start_VS = time.perf_counter()
VS = performVStack(N)
stop_VS = time.perf_counter()

print("Elapsed Time EA: %.2f" % (stop_EA-start_EA))
print("Elapsed Time VS: %.2f" % (stop_VS-start_VS))

Solution:

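I think the most common design pattern for this kind of thing is to simply keep a list of the small arrays. Of course you can do dynamic resizing (and if you want to do crazy things, you can also try the array's resize method). I think a typical approach is to always double the size when you really don't know how large things will get. And of course, if you know how large the array will grow, simply preallocating the full thing up front is the simplest.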
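As a rough sketch of the doubling idea combined with the in-place resize method, assuming a 2D C-contiguous buffer that owns its data and grows only along axis 0, something like the following could work. The helper name `append_rows` is made up for illustration. Note that `refcheck=False` skips NumPy's reference check, so any view taken before a resize must not be used afterwards.

```python
import numpy as np


def append_rows(buf, n_used, new_rows):
    """Append new_rows to buf in place, doubling capacity when needed.

    Hypothetical helper: returns the new number of used rows.
    Views of buf created before a resize become invalid.
    """
    needed = n_used + new_rows.shape[0]
    capacity = buf.shape[0]
    if needed > capacity:
        while capacity < needed:
            capacity = max(1, capacity * 2)   # double until the data fits
        # In-place enlargement; the added entries are filled with zeros.
        buf.resize((capacity, buf.shape[1]), refcheck=False)
    buf[n_used:needed] = new_rows
    return needed


buf = np.zeros((2, 2))
n = 0
n = append_rows(buf, n, np.ones((3, 2)))        # capacity 2 -> 4
n = append_rows(buf, n, 2 * np.ones((2, 2)))    # capacity 4 -> 8
data = buf[:n]                                  # fresh view of the valid rows
print(data.shape, buf.shape)
```

Because rows are stored contiguously, enlarging along axis 0 keeps the existing rows where they are; growing along another axis would need an explicit copy instead.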

def performVStack_fromlist(N):
    l = []
    for i in range(N):
        nNew = np.random.randint(1, 101)   # random chunk length in [1, 100]
        X = np.random.rand(nNew,2)
        l.append(X)
    # Stack all chunks in a single call, so the data is copied only once
    return np.vstack(l)

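I'm sure there are use cases where an expanding array is useful (for example, when the appended arrays are all very small), but this loop seems better handled with the pattern above. The optimization is mostly about how often you need to copy everything around, and with a list like this (apart from the list itself) that happens exactly once here. So it is normally much faster.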
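To put a rough number on "how often you need to copy everything around", here is a small counting sketch. It assumes that each np.vstack((A, X)) call copies every existing row plus the new chunk into a fresh array, and that a single np.vstack over a list copies each row once; the function names are made up for illustration.

```python
def rows_copied_repeated_vstack(chunk_sizes):
    """Rows copied when vstack is called once per chunk (hypothetical model)."""
    total_copied = 0
    current_size = 0
    for n in chunk_sizes:
        current_size += n
        total_copied += current_size   # old rows plus the new chunk are copied
    return total_copied


def rows_copied_single_vstack(chunk_sizes):
    """Rows copied when all chunks are stacked in one call at the end."""
    return sum(chunk_sizes)


chunks = [50] * 10000   # 10000 chunks of 50 rows each
print(rows_copied_repeated_vstack(chunks))   # 2500250000
print(rows_copied_single_vstack(chunks))     # 500000
```

In this model the repeated-vstack cost grows quadratically with the number of chunks, while the list pattern stays linear.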

Source: https://www./content-1-266801.html
