【原】每一個(gè)GISer都應(yīng)該了解的算法

GIS薈 2021-09-19

展開(kāi)全文

前言：作為一個(gè) GISer 需要掌握多方面的知識(shí)，哪怕不是開(kāi)發(fā)方向，了解了解天天和我們打交道的 GIS 數(shù)據(jù)變化背后的原因也是很好的！

除了社交、娛樂(lè)，人們外出使用最多的軟件可能就是各種地圖導(dǎo)航軟件了，最常用的比如高德和百度地圖。

打開(kāi)軟件，雙指滑動(dòng)手機(jī)屏幕，放大縮小地圖的時(shí)候，你可能會(huì)注意到這樣的情況：放大地圖，道路變得崎嶇，彎來(lái)繞去；縮小地圖，道路一下子變得筆直。就像下面這種情況，造成這種情況的原因就是 RDP 算法或者是相關(guān)的改進(jìn)算法。

同樣在 GIS 軟件中的抽稀和概化功能就是相應(yīng)算法的實(shí)現(xiàn)。

廣州市道路截取自百度地圖；
左：道路較為筆直；右：道路較為曲折

什么是 RDP 算法

RDP 算法的全稱(chēng)是拉默-道格拉斯-普克算法（英文：Ramer–Douglas–Peucker algorithm），分別是三位科學(xué)家的名字，又稱(chēng)道格拉斯-普克算法（DP），也稱(chēng)迭代端點(diǎn)擬合算法（英語(yǔ)：iterative end-point fit algorithm）。

這是一種將曲線(xiàn)（折線(xiàn)）降采樣為點(diǎn)數(shù)較少的類(lèi)似曲線(xiàn)（折線(xiàn)）的算法，簡(jiǎn)單來(lái)說(shuō)就是簡(jiǎn)化；是線(xiàn)狀要素抽稀的經(jīng)典算法（也可以叫概化），在保證幾何形狀基本不變的情況下，去除大量冗余折點(diǎn)，縮小體積。

這對(duì)于網(wǎng)絡(luò)地圖是非常重要的，能減少加載時(shí)間，增強(qiáng)程序穩(wěn)定性，保證終端設(shè)備的流暢，進(jìn)而提升用戶(hù)的體驗(yàn)。

RDP 算法的應(yīng)用非常廣泛，特別是 GIS 領(lǐng)域。

與 RDP 相似的另一個(gè)著名的算法是Visvalingam-Whyatt。大家可登陸 mapshaper.org 網(wǎng)站在線(xiàn)使用這兩種算法，可調(diào)節(jié)參數(shù)，可導(dǎo)出處理后的 shp 數(shù)據(jù)。

算法基本原理

將待處理曲線(xiàn)（折線(xiàn)）的首末端點(diǎn)連成一條直線(xiàn)，求所有中間點(diǎn)與直線(xiàn)的距離，并找出最大距離 max，用 max 與抽稀閾值 ε 相比較：

若 max <= ε，這條曲線(xiàn)上的中間點(diǎn)全部舍去;

若 max > ε，保留 max 對(duì)應(yīng)的坐標(biāo)點(diǎn)，并以該點(diǎn)為界，把曲線(xiàn)分為兩部分，對(duì)這兩部分曲線(xiàn)重復(fù)上述過(guò)程，直至所有的點(diǎn)都被處理完成；

最后將首尾端點(diǎn)和保存下來(lái)的點(diǎn)相連，獲得簡(jiǎn)化后的曲線(xiàn)（折現(xiàn)）。

原理演示 from wikipedia

原理演示（動(dòng)態(tài)） from wikipedia

沒(méi)人看的代碼實(shí)現(xiàn)

Python 已經(jīng)有實(shí)現(xiàn)其算法的第三方庫(kù) rdp，你可以通過(guò)以下命令安裝。

pip install rdp

該模塊非常簡(jiǎn)潔，僅專(zhuān)注于實(shí)現(xiàn) rdp 算法，其 Python 源代碼如下：

"""
rdp
~~~

Python implementation of the Ramer-Douglas-Peucker algorithm.

:copyright: 2014-2016 Fabian Hirschmann <fabian@hirschmann.email>
:license: MIT, see LICENSE.txt for more details.

"""
from math import sqrt
from functools import partial
import numpy as np
import sys

if sys.version_info[0] >= 3:
xrange = range

def pldist(point, start, end):
"""
Calculates the distance from ``point`` to the line given
by the points ``start`` and ``end``.

:param point: a point
:type point: numpy array
:param start: a point of the line
:type start: numpy array
:param end: another point of the line
:type end: numpy array
"""
if np.all(np.equal(start, end)):
return np.linalg.norm(point - start)

return np.divide(
np.abs(np.linalg.norm(np.cross(end - start, start - point))),
np.linalg.norm(end - start))

def rdp_rec(M, epsilon, dist=pldist):
"""
Simplifies a given array of points.

Recursive version.

:param M: an array
:type M: numpy array
:param epsilon: epsilon in the rdp algorithm
:type epsilon: float
:param dist: distance function
:type dist: function with signature ``f(point, start, end)`` -- see :func:`rdp.pldist`
"""
dmax = 0.0
index = -1

for i in xrange(1, M.shape[0]):
d = dist(M[i], M[0], M[-1])

if d > dmax:
index = i
dmax = d

if dmax > epsilon:
r1 = rdp_rec(M[:index + 1], epsilon, dist)
r2 = rdp_rec(M[index:], epsilon, dist)

return np.vstack((r1[:-1], r2))
else:
return np.vstack((M[0], M[-1]))

def _rdp_iter(M, start_index, last_index, epsilon, dist=pldist):
stk = []
stk.append([start_index, last_index])
global_start_index = start_index
indices = np.ones(last_index - start_index + 1, dtype=bool)

while stk:
start_index, last_index = stk.pop()

dmax = 0.0
index = start_index

for i in xrange(index + 1, last_index):
if indices[i - global_start_index]:
d = dist(M[i], M[start_index], M[last_index])
if d > dmax:
index = i
dmax = d

if dmax > epsilon:
stk.append([start_index, index])
stk.append([index, last_index])
else:
for i in xrange(start_index + 1, last_index):
indices[i - global_start_index] = False

return indices

def rdp_iter(M, epsilon, dist=pldist, return_mask=False):
"""
Simplifies a given array of points.

Iterative version.

:param M: an array
:type M: numpy array
:param epsilon: epsilon in the rdp algorithm
:type epsilon: float
:param dist: distance function
:type dist: function with signature ``f(point, start, end)`` -- see :func:`rdp.pldist`
:param return_mask: return the mask of points to keep instead
:type return_mask: bool
"""
mask = _rdp_iter(M, 0, len(M) - 1, epsilon, dist)

if return_mask:
return mask

return M[mask]

def rdp(M, epsilon=0, dist=pldist, algo="iter", return_mask=False):
"""
Simplifies a given array of points using the Ramer-Douglas-Peucker
algorithm.

Example:

# >>> from rdp import rdp
# >>> rdp([[1, 1], [2, 2], [3, 3], [4, 4]])
[[1, 1], [4, 4]]

This is a convenience wrapper around both :func:`rdp.rdp_iter`
and :func:`rdp.rdp_rec` that detects if the input is a numpy array
in order to adapt the output accordingly. This means that
when it is called using a Python list as argument, a Python
list is returned, and in case of an invocation using a numpy
array, a NumPy array is returned.

The parameter ``return_mask=True`` can be used in conjunction
with ``algo="iter"`` to return only the mask of points to keep. Example:

# >>> from rdp import rdp
# >>> import numpy as np
# >>> arr = np.array([1, 1, 2, 2, 3, 3, 4, 4]).reshape(4, 2)
# >>> arr
array([[1, 1],
[2, 2],
[3, 3],
[4, 4]])
# >>> mask = rdp(arr, algo="iter", return_mask=True)
# >>> mask
array([ True, False, False, True], dtype=bool)
# >>> arr[mask]
array([[1, 1],
[4, 4]])

:param M: a series of points
:type M: numpy array with shape ``(n,d)`` where ``n`` is the number of points and ``d`` their dimension
:param epsilon: epsilon in the rdp algorithm
:type epsilon: float
:param dist: distance function
:type dist: function with signature ``f(point, start, end)`` -- see :func:`rdp.pldist`
:param algo: either ``iter`` for an iterative algorithm or ``rec`` for a recursive algorithm
:type algo: string
:param return_mask: return mask instead of simplified array
:type return_mask: bool
"""

if algo == "iter":
algo = partial(rdp_iter, return_mask=return_mask)
elif algo == "rec":
if return_mask:
raise NotImplementedError(
"return_mask=True not supported with algo=\"rec\"")
algo = rdp_rec

if "numpy" in str(type(M)):
return algo(M, epsilon, dist)

return algo(np.array(M), epsilon, dist).tolist()

該源代碼中，計(jì)算每個(gè)點(diǎn)到直線(xiàn)距離封裝成了一個(gè)單獨(dú)的函數(shù)：pldist 。

然后根據(jù)迭代、或者遞歸這兩種編程技巧（方式）分別實(shí)現(xiàn)了 rdp，供用戶(hù)選擇。默認(rèn)選擇是遞歸實(shí)現(xiàn)。

同樣閾值（epsilon）的設(shè)置非常重要，下面有三張?jiān)诓煌撝登闆r下，進(jìn)行曲線(xiàn)（折線(xiàn)）簡(jiǎn)化后的可視化對(duì)比：其中藍(lán)色曲線(xiàn)是原始數(shù)據(jù)的曲線(xiàn)，紅色是簡(jiǎn)化后的曲線(xiàn)。

閾值=0 原始點(diǎn)：77 簡(jiǎn)化后：50；
當(dāng)閾值為0時(shí)，僅會(huì)舍棄一條直線(xiàn)上的多余節(jié)點(diǎn)

閾值=1 原始點(diǎn)：77 簡(jiǎn)化后：16

閾值=4 原始點(diǎn)：77 簡(jiǎn)化后：4

隨著閾值從0到1再到4，可以直觀(guān)的看到曲線(xiàn)（藍(lán)色）從平滑慢慢變得菱角分明（紅色），魚(yú)與熊掌不可得兼，既要精簡(jiǎn)的數(shù)據(jù)量，又要盡可能的好看平滑，這就是算法開(kāi)發(fā)人員的“白鯨”，夢(mèng)想魚(yú)與熊掌的最大兼得。

隨便一提，閾值為0可用于刪減同一條直線(xiàn)上的多余的點(diǎn)。