python – Searching for a sequence in a NumPy array

印度阿三17 2019-10-02

Suppose I have the following array:

 array([2, 0, 0, 1, 0, 1, 0, 0])

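How can I get the indices at which a given sequence of values, e.g. [0,0], occurs? The expected output in this case would be [1,2,6,7].

Edit:

1) Note that [0,0] is just one example sequence. It could be [0,0,0] or [4,6,8,9] or [5,2,0], anything at all.

2) If my array were changed to array([2,0,0,0,0,1,0,1,0,0]), the expected result for the same sequence [0,0] would be [1,2,3,4,8,9].

I am looking for some NumPy shortcut.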

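Solution:

Well, this is basically the template-matching problem that comes up in image processing. Two approaches are listed in this post: one based on pure NumPy and one based on OpenCV (cv2).

Approach #1: With NumPy, we can create a 2D array of sliding indices across the entire length of the input array, so that each row is a sliding window of elements. Next, match each row against the input sequence, which brings in broadcasting for a vectorized solution. We look for the rows that are all True; those are the perfect matches, and their row numbers are the starting indices of the matches. Finally, using those indices, create ranges of indices extending up to the length of the sequence to give us the desired output. The implementation would be: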

import numpy as np

def search_sequence_numpy(arr,seq):
    """ Find sequence in an array using NumPy only.

    Parameters
    ----------    
    arr    : input 1D array
    seq    : input 1D array

    Output
    ------    
    Output : 1D Array of indices in the input array that satisfy the 
    matching of input sequence in the input array.
    In case of no match, an empty list is returned.
    """

    # Store sizes of input array and sequence
    Na, Nseq = arr.size, seq.size

    # Range of sequence
    r_seq = np.arange(Nseq)

    # Create a 2D array of sliding indices across the entire length of input array.
    # Match up with the input sequence & get the matching starting indices.
    M = (arr[np.arange(Na-Nseq+1)[:,None] + r_seq] == seq).all(1)

    # Get the range of those indices as final output
    if M.any() >0:
        return np.where(np.convolve(M,np.ones((Nseq),dtype=int))>0)[0]
    else:
        return []         # No match found
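To make the steps of approach #1 easier to follow, here is a small walk-through on the array from the question. It is only an illustrative sketch: the variable names (idx, windows, starts_mask, covered) are made up for this example and are not part of the function above.

```python
import numpy as np

arr = np.array([2, 0, 0, 1, 0, 1, 0, 0])
seq = np.array([0, 0])

# Row i holds the indices i, i+1, ..., i+len(seq)-1 of one sliding window.
idx = np.arange(arr.size - seq.size + 1)[:, None] + np.arange(seq.size)

# Fancy indexing turns the index grid into the windows themselves.
windows = arr[idx]

# Broadcasting compares every window with seq; all(axis=1) keeps full matches.
starts_mask = (windows == seq).all(axis=1)

# Convolving with ones spreads each matching start over len(seq) positions.
covered = np.convolve(starts_mask.astype(int), np.ones(seq.size, dtype=int)) > 0

result = np.flatnonzero(covered)
print(result)  # [1 2 6 7]
```

The windows starting at positions 1 and 6 match, so the covered positions are 1, 2, 6 and 7, which is the output the question asks for.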

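Approach #2: With OpenCV (cv2), we have a built-in function for template matching: cv2.matchTemplate. Using it, we get the starting indices of the matches. The remaining steps are the same as in the previous approach. Here is the cv2 implementation: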

import numpy as np
import cv2
from cv2 import matchTemplate as cv2m

def search_sequence_cv2(arr,seq):
    """ Find sequence in an array using cv2.
    """

    # Run a template match with input sequence as the template across
    # the entire length of the input array and get scores.
    S = cv2m(arr.astype('uint8'),seq.astype('uint8'),cv2.TM_SQDIFF)

    # Now, with floating point array cases, the matching scores might not be 
    # exactly zeros, but would be very small numbers as compared to others.
    # So, use a very small value as a threshold for the scores
    # and decide on matches against it.
    thresh = 1e-5 # Would depend on elements in seq. So, be careful setting this.

    # Find the matching indices
    idx = np.where(S.ravel() < thresh)[0]

    # Get the range of those indices as final output
    if len(idx)>0:
        return np.unique((idx[:,None] + np.arange(seq.size)).ravel())
    else:
        return []         # No match found

Sample run

In [512]: arr = np.array([2, 0, 0, 0, 0, 1, 0, 1, 0, 0])

In [513]: seq = np.array([0,0])

In [514]: search_sequence_numpy(arr,seq)
Out[514]: array([1, 2, 3, 4, 8, 9])

In [515]: search_sequence_cv2(arr,seq)
Out[515]: array([1, 2, 3, 4, 8, 9])

Runtime test

In [477]: arr = np.random.randint(0,9,(100000))
     ...: seq = np.array([3,6,8,4])
     ...: 

In [478]: np.allclose(search_sequence_numpy(arr,seq),search_sequence_cv2(arr,seq))
Out[478]: True

In [479]: %timeit search_sequence_numpy(arr,seq)
100 loops, best of 3: 11.8 ms per loop

In [480]: %timeit search_sequence_cv2(arr,seq)
10 loops, best of 3: 20.6 ms per loop

Looks like pure NumPy is both the safest and the fastest here (11.8 ms vs. 20.6 ms in this test)!
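As a possible variation on approach #1, the sliding windows can also be built with numpy.lib.stride_tricks.sliding_window_view instead of an explicit index grid. This is only a sketch and was not part of the timings above: it assumes NumPy 1.20 or newer, and the function name search_sequence_windows is hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def search_sequence_windows(arr, seq):
    """Hypothetical variant of approach #1 (assumes NumPy >= 1.20)."""
    arr = np.asarray(arr)
    seq = np.asarray(seq)

    # Nothing can match an empty sequence or one longer than the array.
    if seq.size == 0 or seq.size > arr.size:
        return np.array([], dtype=int)

    # Each row of the view is one window of length seq.size (no data copied).
    windows = sliding_window_view(arr, seq.size)

    # Starting indices of the windows that equal seq element by element.
    starts = np.flatnonzero((windows == seq).all(axis=1))

    # Expand every start into its full range and drop overlapping duplicates.
    return np.unique(starts[:, None] + np.arange(seq.size))

arr = np.array([2, 0, 0, 0, 0, 1, 0, 1, 0, 0])
print(search_sequence_windows(arr, np.array([0, 0])))  # [1 2 3 4 8 9]
```

Unlike the two functions above, this sketch returns an empty integer array rather than an empty list when there is no match, so the return type stays consistent.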

Source: https://www./content-1-480101.html
