Live Case Study | Common Optimization Algorithms for Machine Learning in Python

昵稱14934981 2020-07-11



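Fitting a machine learning model ultimately comes down to solving an optimization problem. The objective is the model error, which is a function of the model parameters. For example, the objective of linear regression is the mean squared error, and its parameters are the coefficients of the features. Depending on the properties of the objective function (convex or non-convex), the number of samples, and the number of features, different optimization methods are chosen in practice. Common methods include analytical (closed-form) solutions, gradient descent, conjugate gradient, and alternating iteration. This case study analyzes common optimization algorithms to clarify their characteristics and the scenarios they suit, so that we can choose the most appropriate method in machine learning practice.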
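As a small illustration of the analytical approach mentioned above, the following sketch fits a linear regression by solving the least-squares problem in closed form. The synthetic data, the coefficient values, and the variable names are assumptions made for this example and are not part of the original case study.

```python
import numpy as np

# Synthetic data (an assumption for illustration): y = 2*x1 - 3*x2 + small noise
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
true_w = np.array([2.0, -3.0])
y = X @ true_w + 0.01 * rng.normal(size=200)

# Analytical solution: the least-squares minimizer of the mean squared error
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

mse = np.mean((X @ w_hat - y) ** 2)
print(w_hat, mse)
```

For a problem this small the closed-form solution is the natural choice; iterative methods such as gradient descent become attractive when the number of samples or features makes the direct solve expensive, or when no closed form exists.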

1 Implementing Gradient Descent in Python

import matplotlib.pyplot as plt
import numpy as np

from mpl_toolkits.mplot3d import Axes3D
from matplotlib import animation
from IPython.display import HTML

from autograd import elementwise_grad, value_and_grad,grad
from scipy.optimize import minimize
from scipy import optimize
from collections import defaultdict
from itertools import zip_longest
plt.rcParams['axes.unicode_minus']=False  # render minus signs correctly

1.1 Defining a Simple Objective Function

We define the objective function with a Python anonymous (lambda) function.

f1 = lambda x1,x2 : x1**2 + 0.5*x2**2 # objective function
f1_grad = value_and_grad(lambda args : f1(*args)) # returns (function value, gradient)
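For this f1 the gradient can be written by hand as (2*x1, x2), which gives a convenient check on what automatic differentiation should return. The sketch below compares the hand-derived gradient with a central finite-difference estimate; the helper names f1_grad_manual and numerical_grad are hypothetical and used only here.

```python
import numpy as np

f1 = lambda x1, x2: x1**2 + 0.5 * x2**2

def f1_grad_manual(x):
    """Hand-derived gradient of f1: (2*x1, x2)."""
    return np.array([2.0 * x[0], x[1]])

def numerical_grad(f, x, h=1e-6):
    """Central finite-difference estimate of the gradient of f at x."""
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = h
        grad[i] = (f(*(x + step)) - f(*(x - step))) / (2 * h)
    return grad

x = np.array([-4.0, 4.0])
print(f1_grad_manual(x))        # gradient at the starting point used later
print(numerical_grad(f1, x))    # should agree up to rounding error
```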

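1.2 Gradient Descent Implementation

Gradient descent updates the parameters with the following iteration formula.

$$x_{t+1} = x_t - \eta \nabla f(x_t)$$

where $\eta$ is the learning rate. We implement a gradient_descent function to carry out the parameter updates.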

def gradient_descent(func, func_grad, x0, learning_rate=0.1, max_iteration=20):
    path_list = [x0]
    best_x = x0
    step = 0
    while step < max_iteration:
        update = -learning_rate * np.array(func_grad(best_x)[1])
        if(np.linalg.norm(update) < 1e-4):
            break
        best_x = best_x + update
        path_list.append(best_x)
        step = step + 1
    return best_x, np.array(path_list)
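To make the update rule concrete, here is one iteration carried out by hand for f1, starting from (-4, 4) with learning rate 0.1. The gradient (2*x1, x2) is written out explicitly rather than obtained from autograd, which is an assumption of this sketch.

```python
import numpy as np

learning_rate = 0.1
x = np.array([-4.0, 4.0])

grad = np.array([2.0 * x[0], x[1]])   # gradient of f1 at x: (-8, 4)
update = -learning_rate * grad        # step taken: (0.8, -0.4)
x_next = x + update                   # new point: (-3.2, 3.6)
print(x_next)
```

The loop in gradient_descent repeats this step and stops early once the norm of update falls below 1e-4.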

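2 Visualizing the Gradient Descent Path

First we solve the problem with the gradient descent function implemented in the previous section and obtain the optimization path of the parameters.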

best_x_gd, path_list_gd = gradient_descent(f1,f1_grad,[-4.0,4.0],0.1,30)
path_list_gd

array([[-4. , 4. ],
[-3.2 , 3.6 ],
[-2.56 , 3.24 ],
[-2.048 , 2.916 ],
[-1.6384 , 2.6244 ],
[-1.31072 , 2.36196 ],
[-1.048576 , 2.125764 ],
[-0.8388608 , 1.9131876 ],
[-0.67108864, 1.72186884],
[-0.53687091, 1.54968196],
[-0.42949673, 1.39471376],
[-0.34359738, 1.25524238],
[-0.27487791, 1.12971815],
[-0.21990233, 1.01674633],
[-0.17592186, 0.9150717 ],
[-0.14073749, 0.82356453],
[-0.11258999, 0.74120808],
[-0.09007199, 0.66708727],
[-0.07205759, 0.60037854],
[-0.05764608, 0.54034069],
[-0.04611686, 0.48630662],
[-0.03689349, 0.43767596],
[-0.02951479, 0.39390836],
[-0.02361183, 0.35451752],
[-0.01888947, 0.31906577],
[-0.01511157, 0.2871592 ],
[-0.01208926, 0.25844328],
[-0.00967141, 0.23259895],
[-0.00773713, 0.20933905],
[-0.0061897 , 0.18840515],
[-0.00495176, 0.16956463]])
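The printed path follows a simple pattern: because the gradient of f1 is (2*x1, x2), each step with learning rate 0.1 multiplies x1 by 0.8 and x2 by 0.9. The sketch below rebuilds the path from these factors. It is a check derived for this particular function, not a general property of gradient descent.

```python
import numpy as np

k = np.arange(31)                        # iteration index 0..30
path_closed_form = np.column_stack([
    -4.0 * 0.8 ** k,                     # x1 shrinks by a factor of 0.8 per step
    4.0 * 0.9 ** k,                      # x2 shrinks by a factor of 0.9 per step
])
print(path_closed_form[[0, 1, 30]])      # first, second, and last point
```

Since x2 shrinks more slowly than x1, the path is still noticeably away from 0 along x2 after 30 iterations.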

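2.1 Visualizing the Objective Function Surface

To plot the function surface, we first use np.meshgrid to generate the coordinate matrices of the grid points. Each of the two dimensions ranges from -5 to 5. The function values at the grid points are stored in z.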

x1,x2 = np.meshgrid(np.linspace(-5.0,5.0,50), np.linspace(-5.0,5.0,50))
z = f1(x1,x2 )
minima = np.array([0, 0]) # for f1 the minimum is known to be at (0, 0)
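A tiny grid makes the layout returned by np.meshgrid easier to see. The 3-point and 2-point axes below are chosen only for illustration.

```python
import numpy as np

f1 = lambda x1, x2: x1**2 + 0.5 * x2**2

# 3 points along x1 and 2 points along x2
g1, g2 = np.meshgrid(np.linspace(-1.0, 1.0, 3), np.linspace(0.0, 2.0, 2))
print(g1)   # each row repeats the x1 values
print(g2)   # each column repeats the x2 values

gz = f1(g1, g2)   # evaluated elementwise, same shape as the grid
print(gz.shape)
```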

Axes3D.plot_surface?

Matplotlib's plot_surface function draws a 3D surface plot of a function. The main parameters of the function are listed in the table below.

%matplotlib inline
fig = plt.figure(figsize=(8, 8))
ax = plt.axes(projection='3d', elev=50, azim=-50)

ax.plot_surface(x1,x2, z, alpha=.8, cmap=plt.cm.jet)
ax.plot([minima[0]],[minima[1]],[f1(*minima)], 'r*', markersize=10)

ax.set_xlabel('$x1$')
ax.set_ylabel('$x2$')
ax.set_zlabel('$f$')

ax.set_xlim((-5, 5))
ax.set_ylim((-5, 5))

plt.show()

2.2 Plotting Contours and the Gradient Field

The contour method draws contour lines, and clabel labels each line with its height (the function value). Here we keep two decimal places (fmt='%.2f').

dz_dx1 = elementwise_grad(f1, argnum=0)(x1, x2)
dz_dx2 = elementwise_grad(f1, argnum=1)(x1, x2)

fig, ax = plt.subplots(figsize=(6, 6))

contour = ax.contour(x1, x2, z,levels=20,cmap=plt.cm.jet)
ax.clabel(contour,fontsize=10,colors='k',fmt='%.2f')
ax.plot(*minima, 'r*', markersize=18)

ax.set_xlabel('$x1$')
ax.set_ylabel('$x2$')

ax.set_xlim((-5, 5))
ax.set_ylim((-5, 5))

plt.show()
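The arrays dz_dx1 and dz_dx2 computed above hold the partial derivatives of f1 at every grid point, although the plotting code shown does not use them. As a rough cross-check that does not depend on autograd, the sketch below estimates the same field with np.gradient and compares it with the hand-derived values 2*x1 and x2. Passing such arrays to ax.quiver(x1, x2, dz_dx1, dz_dx2) would be one way to draw the field; that call is a suggestion, not something taken from the original.

```python
import numpy as np

f1 = lambda x1, x2: x1**2 + 0.5 * x2**2

axis = np.linspace(-5.0, 5.0, 50)
x1, x2 = np.meshgrid(axis, axis)
z = f1(x1, x2)

# Rows of z vary with x2 (axis 0) and columns vary with x1 (axis 1),
# so np.gradient returns the x2 derivative first.
dz_dx2_num, dz_dx1_num = np.gradient(z, axis, axis, edge_order=2)

print(np.abs(dz_dx1_num - 2.0 * x1).max())   # deviation from 2*x1
print(np.abs(dz_dx2_num - x2).max())         # deviation from x2
```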

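2.3 2D Animated Visualization of the Gradient Descent Path

With the quiver function, we can visualize the optimization path found by gradient descent by connecting successive points with arrows.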

fig, ax = plt.subplots(figsize=(6, 6))

ax.contour(x1, x2, z, levels=20,cmap=plt.cm.jet) # contour lines
# draw the trajectory arrows
ax.quiver(path_list_gd[:-1,0], path_list_gd[:-1,1], path_list_gd[1:,0]-path_list_gd[:-1,0], path_list_gd[1:,1]-path_list_gd[:-1,1], scale_units='xy', angles='xy', scale=1, color='k')
# mark the optimum
ax.plot(*minima, 'r*', markersize=18)

ax.set_xlabel('$x1$')
ax.set_ylabel('$x2$')

ax.set_xlim((-5, 5))
ax.set_ylim((-5, 5))
plt.show()

To show the path step by step, we build the animation with the animation.FuncAnimation class and then display it with the .to_jshtml method.

path = path_list_gd # optimization path of gradient descent
fig, ax = plt.subplots(figsize=(6, 6))
line, = ax.plot([], [], 'b', label='Gradient Descent', lw=2) # line showing the path so far
point, = ax.plot([], [], 'bo') # marker for the latest point on the path

def init_draw():
    ax.contour(x1, x2, z, levels=20, cmap=plt.cm.jet)
    ax.plot(*minima, 'r*', markersize=18) # draw the minimum as a red star
    ax.set_xlabel('$x$')
    ax.set_ylabel('$y$')
    ax.set_xlim((-5, 5))
    ax.set_ylim((-5, 5))
    return line, point

def update_draw(i):
    line.set_data(path[:i,0],path[:i,1])
    point.set_data(path[i-1:i,0],path[i-1:i,1])
    plt.close()
    return line, point

anim = animation.FuncAnimation(fig, update_draw, init_func=init_draw,frames=path.shape[0], interval=60, repeat_delay=5, blit=True)
HTML(anim.to_jshtml())

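3 Comparing Different Optimization Methods

We use the `scipy.optimize`^[1] module to solve the optimization problem. Because we want to visualize the optimization path, we need to pass a callback function to minimize through its callback parameter.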

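x0 = np.array([-4, 4])
def make_minimize_cb(path=None):
    if path is None:  # avoid a shared mutable default argument
        path = []

    def minimize_cb(xk):
        path.append(np.copy(xk))  # record a copy of the current iterate

    return minimize_cb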
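The closure returned by make_minimize_cb simply appends a copy of each iterate to the list it was given. To show the mechanism without depending on a particular solver, the sketch below drives a simplified version of the callback from a hypothetical stand-in loop, toy_solver, which is not part of SciPy. minimize invokes its callback in a similar way, once per iteration with the current parameter vector.

```python
import numpy as np

def make_minimize_cb(path):
    """Return a callback that records every iterate in `path`."""
    def minimize_cb(xk):
        path.append(np.copy(xk))   # copy, because a solver may reuse xk in place
    return minimize_cb

def toy_solver(x0, callback, n_steps=3):
    """Hypothetical stand-in for an optimizer: halves x in place each step."""
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        x *= 0.5
        callback(x)
    return x

path = [np.array([-4.0, 4.0])]     # start the record with the initial point
toy_solver([-4.0, 4.0], make_minimize_cb(path))
print(np.array(path))
```

Without np.copy, every entry in path would refer to the same array that toy_solver keeps modifying, and the recorded trajectory would collapse to the final point.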

3.1 Solving with Different Optimization Methods

Here we select several common optimization methods implemented in the scipy.optimize module.

methods = [ 'CG', 'BFGS','Newton-CG','L-BFGS-B']

import warnings
warnings.filterwarnings('ignore') # hide warning messages
x0 = [-4.0,4.0]
paths = []
zpaths = []
for method in methods:
    path = [x0]
    res = minimize(fun=f1_grad, x0=x0,jac=True,method = method,callback=make_minimize_cb(path), bounds=[(-5, 5), (-5, 5)], tol=1e-20)
    paths.append(np.array(path))

We add the result of our own gradient descent implementation.

methods.append('GD')
paths.append(path_list_gd)
zpaths = [f1(path[:,0],path[:,1]) for path in paths]

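3.2 A Wrapper Class for the Animation

We wrap the animation in a TrajectoryAnimation class that animates the optimization paths obtained by the different algorithms. This code comes from ^[2].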

class TrajectoryAnimation(animation.FuncAnimation):
    
    def __init__(self, paths, labels=[], fig=None, ax=None, frames=None, 
                 interval=60, repeat_delay=5, blit=True, **kwargs):
        # if fig and ax are not passed in, create a new fig object and ax object
        if fig is None:
            if ax is None:
                fig, ax = plt.subplots()
            else:
                fig = ax.get_figure()
        else:
            if ax is None:
                ax = fig.gca()
        self.fig = fig
        self.ax = ax 
        self.paths = paths
        # the number of animation frames equals the length of the longest path
        if frames is None:
            frames = max(path.shape[0] for path in paths) # length of the longest path
        self.lines = [ax.plot([], [], label=label, lw=2)[0] 
                      for _, label in zip_longest(paths, labels)]
        self.points = [ax.plot([], [], 'o', color=line.get_color())[0] 
                       for line in self.lines]
        super(TrajectoryAnimation, self).__init__(fig, self.animate, init_func=self.init_anim,
                                                  frames=frames, interval=interval, blit=blit,
                                                  repeat_delay=repeat_delay, **kwargs)
    def init_anim(self):
        for line, point in zip(self.lines, self.points):
            line.set_data([], [])
            point.set_data([], [])
        return self.lines + self.points

    def animate(self, i):
        for line, point, path in zip(self.lines, self.points, self.paths):
            line.set_data(path[:i,0],path[:i,1])
            point.set_data(path[i-1:i,0],path[i-1:i,1])
            plt.close()
        return self.lines + self.points
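One detail of the class worth spelling out is the use of zip_longest when the line objects are created: if fewer labels than paths are supplied, the missing labels become None instead of the extra paths being dropped. The short example below uses placeholder strings in place of the path arrays.

```python
from itertools import zip_longest

paths = ["path_a", "path_b", "path_c"]   # placeholders for path arrays
labels = ["CG", "BFGS"]                  # one label short

with_zip = [label for _, label in zip(paths, labels)]
with_zip_longest = [label for _, label in zip_longest(paths, labels)]

print(with_zip)           # only two entries
print(with_zip_longest)   # three entries, the last label is None
```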

3.3 Comparing the Solution Paths

fig, ax = plt.subplots(figsize=(8, 8))

ax.contour(x1, x2, z, cmap=plt.cm.jet)
ax.plot(*minima, 'r*', markersize=10)

ax.set_xlabel('$x1$')
ax.set_ylabel('$x2$')

ax.set_xlim((-5, 5))
ax.set_ylim((-5, 5))

anim = TrajectoryAnimation(paths, labels=methods, ax=ax)

ax.legend(loc='upper left')
HTML(anim.to_jshtml())

3.4 Comparison on a More Complex Function

Next we look at a function with several local minima and saddle points.

f2 = lambda x1, x2 :((4 - 2.1*x1**2 + x1**4 / 3.) * x1**2 + x1 * x2  + (-4 + 4*x2**2) * x2 **2)
f2_grad = value_and_grad(lambda args: f2(*args))

x1,x2 = np.meshgrid(np.linspace(-2.0,2.0,50), np.linspace(-1.0,1.0,50))
z = f2(x1,x2 )

%matplotlib inline
fig = plt.figure(figsize=(6, 6))
ax = plt.axes(projection='3d', elev=50, azim=-50)

ax.plot_surface(x1,x2, z, alpha=.8, cmap=plt.cm.jet)

ax.set_xlabel('$x1$')
ax.set_ylabel('$x2$')
ax.set_zlabel('$f$')

ax.set_xlim((-2.0, 2.0))
ax.set_ylim((-1.0, 1.0))

plt.show()
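This function has the form commonly known as the six-hump camel function, whose two global minima lie near (0.0898, -0.7126) and (-0.0898, 0.7126) with a value of about -1.0316. The sketch below evaluates f2 at those points and at the origin. The Hessian at the origin is derived by hand for this sketch, and its mixed-sign eigenvalues indicate that the origin is a saddle point.

```python
import numpy as np

f2 = lambda x1, x2: ((4 - 2.1 * x1**2 + x1**4 / 3.0) * x1**2
                     + x1 * x2 + (-4 + 4 * x2**2) * x2**2)

print(f2(0.0898, -0.7126))    # close to -1.0316
print(f2(-0.0898, 0.7126))    # same value by symmetry
print(f2(0.0, 0.0))           # value at the origin

# Hessian of f2 at the origin, derived by hand from the polynomial
hessian_origin = np.array([[8.0, 1.0],
                           [1.0, -8.0]])
eigenvalues = np.linalg.eigvalsh(hessian_origin)
print(eigenvalues)            # one negative and one positive eigenvalue
```

Because the surface has several basins, the point an optimizer converges to depends on where it starts, which is why trying several initial points is instructive.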

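We solve the problem with the different optimization methods implemented in SciPy and with the gradient descent implemented in this case study.

x02 = [-1.0,-0.5]  # initial point; try other starting points such as [1.5,0.75] or [-0.8,0.25]
_, path_list_gd2 = gradient_descent(f2,f2_grad,x02,0.1,30) # solve with gradient descent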

paths = []
zpaths = []
methods = [ 'CG', 'BFGS','Newton-CG','L-BFGS-B']
for method in methods:
    path = [x02]
    res = minimize(fun=f2_grad, x0=x02,jac=True,method = method,callback=make_minimize_cb(path), bounds=[(-2.0, 2.0), (-1.0, 1.0)], tol=1e-20)
    paths.append(np.array(path))
    
methods.append('GD')
paths.append(path_list_gd2)
zpaths = [f2(path[:,0],path[:,1]) for path in paths]

We display the solution paths of the different methods as an animation.

%matplotlib inline
fig, ax = plt.subplots(figsize=(8, 8))

contour = ax.contour(x1, x2, z, levels=50, cmap=plt.cm.jet)
ax.clabel(contour,fontsize=10,colors='k',fmt='%.2f')
ax.set_xlabel('$x1$')
ax.set_ylabel('$x2$')

ax.set_xlim((-2.0, 2.0))
ax.set_ylim((-1.0, 1.0))

anim = TrajectoryAnimation(paths, labels=methods, ax=ax)
ax.legend(loc='upper left')
HTML(anim.to_jshtml())

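4 Solving a Handwritten Digit Classification Problem with Different Optimization Algorithms

4.1 Loading and Preprocessing the Handwritten Digit Data

The MNIST handwritten digit dataset is a well-known image dataset in image processing and deep learning. It contains a training set of 60000 image samples and a test set of 10000 image samples. Each sample is a 28×28 image, and each image has a label with a value from 0 to 9. The MNIST dataset can be downloaded from http://yann./exdb/mnist/^[3].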

import numpy as np
f = np.load('input/mnist.npz') 
X_train, y_train, X_test, y_test = f['x_train'], f['y_train'],f['x_test'], f['y_test']
f.close()
x_train = X_train.reshape((-1, 28*28)) / 255.0
x_test = X_test.reshape((-1, 28*28)) / 255.0
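The reshape and scaling step can be tried on a small stand-in array without the data file. The random batch below is an assumption that only mimics the shape and dtype of the MNIST images.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for X_train: 5 grayscale images of 28x28 pixels with values 0..255
fake_images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)

# One row of 784 features per image, scaled into the range [0, 1]
flat = fake_images.reshape((-1, 28 * 28)) / 255.0
print(flat.shape, flat.dtype, flat.min(), flat.max())
```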

We display some randomly chosen handwritten digits to inspect the dataset.

rndperm = np.random.permutation(len(x_train))
%matplotlib inline
import matplotlib.pyplot as plt
plt.gray()
fig = plt.figure( figsize=(8,8) )
for i in range(0,100):
    ax = fig.add_subplot(10,10,i+1)
    ax.matshow(x_train[rndperm[i]].reshape((28,28)))
    plt.box(False) # remove the frame
    plt.axis('off') # hide the axes
plt.show()


To simplify the model training that follows, we one-hot encode the labels of the handwritten digits.

import pandas as pd
y_train_onehot = pd.get_dummies(y_train)
y_train_onehot.head()

	0	1	4	5	9
0	0	0	0	1	0
1	1	0	0	0	0
2	0	0	1	0	0
3	0	1	0	0	0
4	0	0	0	0	1
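pd.get_dummies creates one column per label value present in the data. An equivalent encoding with a fixed set of 10 columns can be sketched in NumPy by indexing an identity matrix. The five labels below are the ones that the rows of the table above encode.

```python
import numpy as np

labels = np.array([5, 0, 4, 1, 9])        # labels of the first five samples
one_hot = np.eye(10, dtype=int)[labels]   # row i has a 1 in column labels[i]

print(one_hot)
print(one_hot.argmax(axis=1))             # recovers the original labels
```

Fixing the number of columns at 10 keeps the encoding of the training and test labels aligned even if some digit happened to be absent from one of the sets.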

4.2 Building a Handwritten Digit Recognition Neural Network with TensorFlow

We build a simple fully connected neural network to classify the handwritten digits. The network structure is shown in the figure below:

import tensorflow as tf
import tensorflow.keras.layers as layers

Now we build the network described above, with the structure 784->100->100->50->10.

inputs = layers.Input(shape=(28*28,), name='inputs')
hidden1 = layers.Dense(100, activation='relu', name='hidden1')(inputs)
hidden2 = layers.Dense(100, activation='relu', name='hidden2')(hidden1)
hidden3 = layers.Dense(50, activation='relu', name='hidden3')(hidden2)
outputs = layers.Dense(10, activation='softmax', name='outputs')(hidden3)
deep_networks = tf.keras.Model(inputs,outputs)
deep_networks.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
inputs (InputLayer)          (None, 784)               0         
_________________________________________________________________
hidden1 (Dense)              (None, 100)               78500     
_________________________________________________________________
hidden2 (Dense)              (None, 100)               10100     
_________________________________________________________________
hidden3 (Dense)              (None, 50)                5050      
_________________________________________________________________
outputs (Dense)              (None, 10)                510       
=================================================================
Total params: 94,160
Trainable params: 94,160
Non-trainable params: 0
_________________________________________________________________
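The parameter counts in the summary can be checked by hand: a Dense layer with n_in inputs and n_out units has n_in * n_out weights plus n_out biases.

```python
layer_sizes = [784, 100, 100, 50, 10]

# weights + biases for each pair of consecutive layers
params = [n_in * n_out + n_out
          for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

print(params)        # per-layer counts
print(sum(params))   # total number of trainable parameters
```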

4.3 Choosing the Loss Function and Optimizer, and Training the Model

deep_networks.compile(optimizer='SGD',loss='categorical_crossentropy',metrics=['accuracy']) # set the loss and the optimizer; options to try: SGD, RMSprop, Adam, Adagrad, Nadam
%time history = deep_networks.fit(x_train, y_train_onehot, batch_size=500, epochs=10,validation_split=0.5,verbose=1) # train the model

Train on 30000 samples, validate on 30000 samples
Epoch 1/10
30000/30000 [==============================] - 1s 27us/step - loss: 0.0516 - acc: 0.9865 - val_loss: 0.1246 - val_acc: 0.9634
Epoch 2/10
30000/30000 [==============================] - 0s 16us/step - loss: 0.0502 - acc: 0.9869 - val_loss: 0.1243 - val_acc: 0.9634
Epoch 3/10
30000/30000 [==============================] - 0s 16us/step - loss: 0.0496 - acc: 0.9871 - val_loss: 0.1244 - val_acc: 0.9634
Epoch 4/10
30000/30000 [==============================] - 0s 16us/step - loss: 0.0492 - acc: 0.9874 - val_loss: 0.1244 - val_acc: 0.9634
Epoch 5/10
30000/30000 [==============================] - 0s 16us/step - loss: 0.0489 - acc: 0.9875 - val_loss: 0.1247 - val_acc: 0.9633
Epoch 6/10
30000/30000 [==============================] - 0s 16us/step - loss: 0.0485 - acc: 0.9873 - val_loss: 0.1244 - val_acc: 0.9635
Epoch 7/10
30000/30000 [==============================] - 0s 16us/step - loss: 0.0483 - acc: 0.9873 - val_loss: 0.1244 - val_acc: 0.9637
Epoch 8/10
30000/30000 [==============================] - 0s 16us/step - loss: 0.0479 - acc: 0.9878 - val_loss: 0.1242 - val_acc: 0.9636
Epoch 9/10
30000/30000 [==============================] - 0s 16us/step - loss: 0.0477 - acc: 0.9874 - val_loss: 0.1245 - val_acc: 0.9636
Epoch 10/10
30000/30000 [==============================] - 0s 17us/step - loss: 0.0475 - acc: 0.9874 - val_loss: 0.1245 - val_acc: 0.9637
CPU times: user 17.8 s, sys: 2.08 s, total: 19.8 s
Wall time: 5.36 s
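The optimizers named in the comment above differ in how they turn a gradient into a parameter update. As a rough illustration outside of Keras, the sketch below applies plain gradient descent and the Adam update rule to the function f1 from earlier in this case study. The hyperparameters (learning rate 0.1, beta1 0.9, beta2 0.999, eps 1e-8) and the helper names are assumptions of this sketch, and the code does not claim to match how Keras implements these optimizers internally.

```python
import numpy as np

def grad_f1(x):
    """Gradient of f1(x1, x2) = x1**2 + 0.5*x2**2."""
    return np.array([2.0 * x[0], x[1]])

def run_gd(x0, lr=0.1, steps=500):
    """Plain gradient descent with a fixed learning rate."""
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        x -= lr * grad_f1(x)
    return x

def run_adam(x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Adam: per-coordinate steps scaled by running gradient statistics."""
    x = np.array(x0, dtype=float)
    m = np.zeros_like(x)   # running mean of gradients
    v = np.zeros_like(x)   # running mean of squared gradients
    for t in range(1, steps + 1):
        g = grad_f1(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g**2
        m_hat = m / (1 - beta1**t)   # bias correction for the early steps
        v_hat = v / (1 - beta2**t)
        x -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return x

x_gd = run_gd([-4.0, 4.0])
x_adam = run_adam([-4.0, 4.0])
print(x_gd, x_adam)
```

On a real network the gradient is estimated from mini-batches, so results there depend on batch size and learning rate in ways this deterministic toy problem does not capture.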

We plot the loss curve.

fig, ax = plt.subplots(figsize=(20, 8))

ax.plot(history.epoch, history.history['loss'])

ax.set_xlabel('$epoch$')
ax.set_ylabel('$loss$')


test_loss, test_acc = deep_networks.evaluate(x_test,  pd.get_dummies(y_test), verbose=2)

print('\nTest accuracy:', test_acc)

Test accuracy: 0.9667

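5 Summary

In this case study we implemented gradient descent and, with the help of SciPy's optimize module, compared the optimization paths of gradient descent, conjugate gradient, Newton-CG, and quasi-Newton methods (BFGS, L-BFGS-B) on two different two-dimensional functions, animating the paths with Matplotlib. Then, on the handwritten digit dataset, we built a classification model with TensorFlow and trained it with different optimization methods. The main Python packages used in this case study are listed below.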

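Package or method	Version	Purpose
Matplotlib	3.0.2	Plotting 3D surfaces and contour lines, creating animations, plotting gradient fields (arrows)
SciPy	1.0.0	scipy.optimize.minimize for solving optimization problems
TensorFlow	1.12.0	Building the handwritten digit neural network model
Pandas	0.23.4	Data preprocessing, one-hot encoding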