Solving a machine learning model ultimately reduces to solving an optimization problem, where the objective is the model's error, a function of the model parameters. For example, in linear regression the objective is the mean squared error and the parameters are the coefficients of the features. In practice, different optimization methods are chosen depending on the properties of the objective function (convex vs. non-convex), the number of samples, and the number of features. Common methods include analytic (closed-form) solutions, gradient descent, conjugate gradient, and alternating minimization. This case study analyzes several common optimization algorithms in order to understand their characteristics and applicable scenarios, helping us choose the most suitable optimization method in machine learning practice.
1 Implementing Gradient Descent in Python

First we import the packages used throughout this case study.

```python
import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits.mplot3d import Axes3D
from matplotlib import animation
from IPython.display import HTML
from autograd import elementwise_grad, value_and_grad, grad
from scipy.optimize import minimize
from scipy import optimize
from collections import defaultdict
from itertools import zip_longest

plt.rcParams['axes.unicode_minus'] = False  # display minus signs correctly
```
1.1 Implementing a Simple Objective Function

We define the objective function with a Python lambda (anonymous function).
```python
f1 = lambda x1, x2: x1**2 + 0.5 * x2**2           # objective function
f1_grad = value_and_grad(lambda args: f1(*args))  # returns (value, gradient)
```
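As a quick sanity check, the autograd gradient should agree with the analytic gradient $\nabla f_1 = (2x_1,\; x_2)$; the test point below is an arbitrary illustrative choice:

```python
# value_and_grad returns a (function value, gradient) pair.
# At (-4, 4): f1 = 16 + 0.5*16 = 24, gradient = (2*(-4), 4) = (-8, 4).
val, g = f1_grad(np.array([-4.0, 4.0]))
print(val)  # expected: 24.0
print(g)    # expected: [-8.  4.]
```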
1.2 Implementing Gradient Descent

Gradient descent updates the parameters with the iteration

$$\mathbf{x}_{k+1} = \mathbf{x}_k - \eta \, \nabla f(\mathbf{x}_k),$$

where $\eta$ is the learning rate. We implement a gradient_descent function to perform this parameter update.
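For example, for $f_1(x_1, x_2) = x_1^2 + 0.5\,x_2^2$ the gradient is $\nabla f_1 = (2x_1,\; x_2)$. Starting from $\mathbf{x}_0 = (-4, 4)$ with $\eta = 0.1$, the first update is

$$\mathbf{x}_1 = (-4, 4) - 0.1 \cdot (-8, 4) = (-3.2,\; 3.6),$$

which matches the first two rows of the optimization path printed in the next section.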
```python
def gradient_descent(func, func_grad, x0, learning_rate=0.1, max_iteration=20):
    path_list = [x0]
    best_x = x0
    step = 0
    while step < max_iteration:
        # func_grad returns (value, gradient); take the gradient component
        update = -learning_rate * np.array(func_grad(best_x)[1])
        if np.linalg.norm(update) < 1e-4:  # stop when the step becomes tiny
            break
        best_x = best_x + update
        path_list.append(best_x)
        step = step + 1
    return best_x, np.array(path_list)
```
2 Visualizing the Gradient Descent Solution Path

First we solve the problem with the gradient descent method implemented in the previous section, obtaining the optimization path of the parameters.
```python
best_x_gd, path_list_gd = gradient_descent(f1, f1_grad, [-4.0, 4.0], 0.1, 30)
path_list_gd
```
```
array([[-4.        ,  4.        ],
       [-3.2       ,  3.6       ],
       [-2.56      ,  3.24      ],
       [-2.048     ,  2.916     ],
       [-1.6384    ,  2.6244    ],
       [-1.31072   ,  2.36196   ],
       [-1.048576  ,  2.125764  ],
       [-0.8388608 ,  1.9131876 ],
       [-0.67108864,  1.72186884],
       [-0.53687091,  1.54968196],
       [-0.42949673,  1.39471376],
       [-0.34359738,  1.25524238],
       [-0.27487791,  1.12971815],
       [-0.21990233,  1.01674633],
       [-0.17592186,  0.9150717 ],
       [-0.14073749,  0.82356453],
       [-0.11258999,  0.74120808],
       [-0.09007199,  0.66708727],
       [-0.07205759,  0.60037854],
       [-0.05764608,  0.54034069],
       [-0.04611686,  0.48630662],
       [-0.03689349,  0.43767596],
       [-0.02951479,  0.39390836],
       [-0.02361183,  0.35451752],
       [-0.01888947,  0.31906577],
       [-0.01511157,  0.2871592 ],
       [-0.01208926,  0.25844328],
       [-0.00967141,  0.23259895],
       [-0.00773713,  0.20933905],
       [-0.0061897 ,  0.18840515],
       [-0.00495176,  0.16956463]])
```
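Because the gradient of $f_1$ is linear in the coordinates, each update contracts them by constant factors: $x_1 \leftarrow (1 - 0.1 \cdot 2)\,x_1 = 0.8\,x_1$ and $x_2 \leftarrow (1 - 0.1)\,x_2 = 0.9\,x_2$. The $k$-th row above is therefore exactly $(0.8^k \cdot (-4),\; 0.9^k \cdot 4)$; the second coordinate contracts more slowly, which is why the path approaches the $x_2$ axis first and only then creeps toward the origin.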
2.1 Visualizing the Objective Function Surface

To plot the function surface, we first use np.meshgrid to generate coordinate matrices of grid points, with a display range of -5 to 5 in each dimension. The function values at the grid points are stored in z.
```python
x1, x2 = np.meshgrid(np.linspace(-5.0, 5.0, 50), np.linspace(-5.0, 5.0, 50))
z = f1(x1, x2)
minima = np.array([0, 0])  # for f1 we know the minimum is at (0, 0)
```
Matplotlib's plot_surface function draws 3D surface plots; in a notebook its documentation can be inspected with `ax.plot_surface?`. Its main parameters are the coordinate matrices X and Y, the corresponding function values Z, and optional arguments such as cmap (colormap) and alpha (surface transparency).
```python
%matplotlib inline
fig = plt.figure(figsize=(8, 8))
ax = plt.axes(projection='3d', elev=50, azim=-50)
ax.plot_surface(x1, x2, z, alpha=.8, cmap=plt.cm.jet)
ax.plot([minima[0]], [minima[1]], [f1(*minima)], 'r*', markersize=10)
ax.set_xlabel('$x1$')
ax.set_ylabel('$x2$')
ax.set_zlabel('$f$')
ax.set_xlim((-5, 5))
ax.set_ylim((-5, 5))
plt.show()
```
2.2 Plotting Contours and the Gradient Field

The contour method draws contour lines, and clabel annotates each line with its height (function value); here we keep two decimal places (fmt='%.2f'). The two components of the gradient field are computed with autograd's elementwise_grad.
```python
dz_dx1 = elementwise_grad(f1, argnum=0)(x1, x2)  # ?f/?x1 on the grid
dz_dx2 = elementwise_grad(f1, argnum=1)(x1, x2)  # ?f/?x2 on the grid
```
```python
fig, ax = plt.subplots(figsize=(6, 6))
contour = ax.contour(x1, x2, z, levels=20, cmap=plt.cm.jet)
ax.clabel(contour, fontsize=10, colors='k', fmt='%.2f')
ax.plot(*minima, 'r*', markersize=18)
ax.set_xlabel('$x1$')
ax.set_ylabel('$x2$')
ax.set_xlim((-5, 5))
ax.set_ylim((-5, 5))
plt.show()
```
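The gradient-field components dz_dx1 and dz_dx2 computed above can be overlaid on the contours with quiver; a minimal sketch, where the subsampling stride of 4 is an illustrative choice to keep the arrows readable:

```python
# Overlay the negative gradient field (the descent direction) on the contour plot.
fig, ax = plt.subplots(figsize=(6, 6))
ax.contour(x1, x2, z, levels=20, cmap=plt.cm.jet)
ax.quiver(x1[::4, ::4], x2[::4, ::4],
          -dz_dx1[::4, ::4], -dz_dx2[::4, ::4], color='gray')
ax.plot(*minima, 'r*', markersize=18)
plt.show()
```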
2.3 Animating the Gradient Descent Path in 2D

Using the quiver function, we can visualize the optimization path obtained by gradient descent, connecting consecutive points with arrows.
```python
fig, ax = plt.subplots(figsize=(6, 6))
ax.contour(x1, x2, z, levels=20, cmap=plt.cm.jet)  # contour lines
# draw the trajectory as arrows between consecutive points
ax.quiver(path_list_gd[:-1, 0], path_list_gd[:-1, 1],
          path_list_gd[1:, 0] - path_list_gd[:-1, 0],
          path_list_gd[1:, 1] - path_list_gd[:-1, 1],
          scale_units='xy', angles='xy', scale=1, color='k')
ax.plot(*minima, 'r*', markersize=18)  # mark the optimum
ax.set_xlabel('$x1$')
ax.set_ylabel('$x2$')
ax.set_xlim((-5, 5))
ax.set_ylim((-5, 5))
plt.show()
```
To show the path step by step as an animation, we use the animation.FuncAnimation class to build the animation, then display it with the .to_jshtml method.
```python
path = path_list_gd  # the gradient descent optimization path
fig, ax = plt.subplots(figsize=(6, 6))
line, = ax.plot([], [], 'b', label='Gradient Descent', lw=2)  # path so far
point, = ax.plot([], [], 'bo')  # current end point of the path

def init_draw():
    ax.contour(x1, x2, z, levels=20, cmap=plt.cm.jet)
    ax.plot(*minima, 'r*', markersize=18)  # mark the minimum with a red star
    ax.set_xlabel('$x$')
    ax.set_ylabel('$y$')
    ax.set_xlim((-5, 5))
    ax.set_ylim((-5, 5))
    return line, point

def update_draw(i):
    line.set_data(path[:i, 0], path[:i, 1])
    point.set_data(path[i-1:i, 0], path[i-1:i, 1])
    plt.close()
    return line, point

anim = animation.FuncAnimation(fig, update_draw, init_func=init_draw,
                               frames=path.shape[0], interval=60,
                               repeat_delay=5, blit=True)
HTML(anim.to_jshtml())
```
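If a standalone video file is preferred over the embedded JavaScript player, the animation can also be written to disk; a sketch, assuming FFmpeg is installed and on the system path:

```python
# Save the animation as an MP4 file (requires an FFmpeg installation).
anim.save('gradient_descent.mp4', writer='ffmpeg', fps=15)
```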
3 Comparing Different Optimization Methods

We now solve the optimization problem with the `scipy.optimize` [1] module. Since we want to visualize the optimization path, the minimize function needs a callback function passed via its callback parameter, which records each iterate.
```python
x0 = np.array([-4, 4])

def make_minimize_cb(path=[]):
    # closure that appends each iterate xk to `path`
    def minimize_cb(xk):
        path.append(np.copy(xk))
    return minimize_cb
```
3.1 Solving with Several Optimization Methods

Here we select some common optimization methods implemented in the scipy.optimize module: 'CG' (conjugate gradient), 'BFGS' and 'L-BFGS-B' (quasi-Newton methods), and 'Newton-CG' (a truncated Newton method).
```python
methods = ['CG', 'BFGS', 'Newton-CG', 'L-BFGS-B']
```
```python
import warnings
warnings.filterwarnings('ignore')  # suppress warning messages

x0 = [-4.0, 4.0]
paths = []
zpaths = []
for method in methods:
    path = [x0]
    res = minimize(fun=f1_grad, x0=x0, jac=True, method=method,
                   callback=make_minimize_cb(path),
                   bounds=[(-5, 5), (-5, 5)], tol=1e-20)
    paths.append(np.array(path))
```
We add the result of our own gradient descent implementation.
```python
methods.append('GD')
paths.append(path_list_gd)
zpaths = [f1(path[:, 0], path[:, 1]) for path in paths]
```
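Before the animated comparison, a quick textual summary is already informative; a small sketch that prints the number of recorded iterates and the final point for each method:

```python
# Each entry of `paths` is an (n_steps, 2) array of iterates for one method.
for method, p in zip(methods, paths):
    print('{:10s} iterates: {:3d}  final point: {}'.format(method, len(p), p[-1]))
```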
3.2 A Wrapper Class for the Animated Demonstration

We wrap the logic in a TrajectoryAnimation class that animates the optimization paths produced by the different algorithms. The code is adapted from [2].
```python
class TrajectoryAnimation(animation.FuncAnimation):
    def __init__(self, paths, labels=[], fig=None, ax=None, frames=None,
                 interval=60, repeat_delay=5, blit=True, **kwargs):
        # if fig and ax are not given, create new fig and ax objects
        if fig is None:
            if ax is None:
                fig, ax = plt.subplots()
            else:
                fig = ax.get_figure()
        else:
            if ax is None:
                ax = fig.gca()

        self.fig = fig
        self.ax = ax
        self.paths = paths

        # the number of frames equals the length of the longest path
        if frames is None:
            frames = max(path.shape[0] for path in paths)

        self.lines = [ax.plot([], [], label=label, lw=2)[0]
                      for _, label in zip_longest(paths, labels)]
        self.points = [ax.plot([], [], 'o', color=line.get_color())[0]
                       for line in self.lines]

        super(TrajectoryAnimation, self).__init__(fig, self.animate,
                                                  init_func=self.init_anim,
                                                  frames=frames, interval=interval,
                                                  blit=blit, repeat_delay=repeat_delay,
                                                  **kwargs)

    def init_anim(self):
        for line, point in zip(self.lines, self.points):
            line.set_data([], [])
            point.set_data([], [])
        return self.lines + self.points

    def animate(self, i):
        for line, point, path in zip(self.lines, self.points, self.paths):
            line.set_data(path[:i, 0], path[:i, 1])
            point.set_data(path[i-1:i, 0], path[i-1:i, 1])
        plt.close()
        return self.lines + self.points
```
3.3 Comparing the Solution Paths

```python
fig, ax = plt.subplots(figsize=(8, 8))
ax.contour(x1, x2, z, cmap=plt.cm.jet)
ax.plot(*minima, 'r*', markersize=10)
ax.set_xlabel('$x1$')
ax.set_ylabel('$x2$')
ax.set_xlim((-5, 5))
ax.set_ylim((-5, 5))

anim = TrajectoryAnimation(paths, labels=methods, ax=ax)
ax.legend(loc='upper left')
HTML(anim.to_jshtml())
```
3.4 Comparison on a More Complex Function

Next we look at a function with multiple local minima and saddle points (the six-hump camel function).
```python
f2 = lambda x1, x2: ((4 - 2.1*x1**2 + x1**4 / 3.) * x1**2 + x1 * x2
                     + (-4 + 4*x2**2) * x2**2)
f2_grad = value_and_grad(lambda args: f2(*args))

x1, x2 = np.meshgrid(np.linspace(-2.0, 2.0, 50), np.linspace(-1.0, 1.0, 50))
z = f2(x1, x2)

%matplotlib inline
fig = plt.figure(figsize=(6, 6))
ax = plt.axes(projection='3d', elev=50, azim=-50)
ax.plot_surface(x1, x2, z, alpha=.8, cmap=plt.cm.jet)
ax.set_xlabel('$x1$')
ax.set_ylabel('$x2$')
ax.set_zlabel('$f$')
ax.set_xlim((-2.0, 2.0))
ax.set_ylim((-1.0, 1.0))
plt.show()
```
We solve it with the different optimization methods implemented in Scipy, together with the gradient descent method implemented in this case study.
```python
x02 = [-1.0, -0.5]  # initial point; try others, e.g. [-1.0,-0.5], [1.5,0.75], [-0.8,0.25]
_, path_list_gd2 = gradient_descent(f2, f2_grad, x02, 0.1, 30)  # solve with gradient descent

paths = []
zpaths = []
methods = ['CG', 'BFGS', 'Newton-CG', 'L-BFGS-B']
for method in methods:
    path = [x02]
    res = minimize(fun=f2_grad, x0=x02, jac=True, method=method,
                   callback=make_minimize_cb(path),
                   bounds=[(-2.0, 2.0), (-1.0, 1.0)], tol=1e-20)
    paths.append(np.array(path))

methods.append('GD')
paths.append(path_list_gd2)
zpaths = [f2(path[:, 0], path[:, 1]) for path in paths]
```
We display the solution paths of the different methods as an animation.
```python
%matplotlib inline
fig, ax = plt.subplots(figsize=(8, 8))
contour = ax.contour(x1, x2, z, levels=50, cmap=plt.cm.jet)
ax.clabel(contour, fontsize=10, colors='k', fmt='%.2f')
ax.set_xlabel('$x1$')
ax.set_ylabel('$x2$')
ax.set_xlim((-2.0, 2.0))
ax.set_ylim((-1.0, 1.0))

anim = TrajectoryAnimation(paths, labels=methods, ax=ax)
ax.legend(loc='upper left')
HTML(anim.to_jshtml())
```
4 Solving Handwritten Digit Classification with Different Optimization Algorithms

4.1 Loading and Preprocessing the Handwritten Digit Data

The MNIST handwritten digit dataset is a well-known image dataset in image processing and deep learning. It contains a training set of 60,000 image samples and a test set of 10,000 image samples. Each sample is a 28 × 28 grayscale image with a label taking a value from 0 to 9. The dataset can be downloaded from http://yann./exdb/mnist/ [3].
```python
import numpy as np

f = np.load('input/mnist.npz')
X_train, y_train, X_test, y_test = f['x_train'], f['y_train'], f['x_test'], f['y_test']
f.close()

# flatten the images to 784-dimensional vectors and scale pixels to [0, 1]
x_train = X_train.reshape((-1, 28*28)) / 255.0
x_test = X_test.reshape((-1, 28*28)) / 255.0
```
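A quick check of the resulting array shapes (the expected values follow from the dataset sizes stated above):

```python
print(x_train.shape, y_train.shape)  # expected: (60000, 784) (60000,)
print(x_test.shape, y_test.shape)    # expected: (10000, 784) (10000,)
```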
We print some randomly chosen handwritten digits to inspect the dataset.
```python
rndperm = np.random.permutation(len(x_train))

%matplotlib inline
import matplotlib.pyplot as plt
plt.gray()
fig = plt.figure(figsize=(8, 8))
for i in range(0, 100):
    ax = fig.add_subplot(10, 10, i+1)
    ax.matshow(x_train[rndperm[i]].reshape((28, 28)))
    plt.box(False)   # remove the border
    plt.axis('off')  # hide the axes
plt.show()
```
To facilitate model training later, we One-Hot encode the digit labels.
```python
import pandas as pd
y_train_onehot = pd.get_dummies(y_train)
y_train_onehot.head()
```
```
   0  1  2  3  4  5  6  7  8  9
0  0  0  0  0  0  1  0  0  0  0
1  1  0  0  0  0  0  0  0  0  0
2  0  0  0  0  1  0  0  0  0  0
3  0  1  0  0  0  0  0  0  0  0
4  0  0  0  0  0  0  0  0  0  1
```
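For reference, the same encoding can be produced with Keras' built-in helper instead of pandas; a minimal alternative sketch:

```python
# to_categorical yields a (60000, 10) array with a single 1 per row at the label index.
from tensorflow.keras.utils import to_categorical
y_train_onehot_alt = to_categorical(y_train, num_classes=10)
```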
4.2 Building a Digit Recognition Neural Network with TensorFlow

We build a simple fully connected neural network for handwritten digit classification; its layer structure is described below.
```python
import tensorflow as tf
import tensorflow.keras.layers as layers
```
Now we build the network, with the structure 784 -> 100 -> 100 -> 50 -> 10.
```python
inputs = layers.Input(shape=(28*28,), name='inputs')
hidden1 = layers.Dense(100, activation='relu', name='hidden1')(inputs)
hidden2 = layers.Dense(100, activation='relu', name='hidden2')(hidden1)
hidden3 = layers.Dense(50, activation='relu', name='hidden3')(hidden2)
outputs = layers.Dense(10, activation='softmax', name='outputs')(hidden3)
deep_networks = tf.keras.Model(inputs, outputs)
deep_networks.summary()
```
```
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
inputs (InputLayer)          (None, 784)               0
_________________________________________________________________
hidden1 (Dense)              (None, 100)               78500
_________________________________________________________________
hidden2 (Dense)              (None, 100)               10100
_________________________________________________________________
hidden3 (Dense)              (None, 50)                5050
_________________________________________________________________
outputs (Dense)              (None, 10)                510
=================================================================
Total params: 94,160
Trainable params: 94,160
Non-trainable params: 0
_________________________________________________________________
```
4.3 Loss Function, Optimizer Selection, and Model Training

```python
# define the loss and optimizer; alternatives: SGD, RMSprop, Adam, Adagrad, Nadam
deep_networks.compile(optimizer='SGD', loss='categorical_crossentropy',
                      metrics=['accuracy'])
%time history = deep_networks.fit(x_train, y_train_onehot, batch_size=500, epochs=10, validation_split=0.5, verbose=1)  # train the model
```
```
Train on 30000 samples, validate on 30000 samples
Epoch 1/10
30000/30000 [==============================] - 1s 27us/step - loss: 0.0516 - acc: 0.9865 - val_loss: 0.1246 - val_acc: 0.9634
Epoch 2/10
30000/30000 [==============================] - 0s 16us/step - loss: 0.0502 - acc: 0.9869 - val_loss: 0.1243 - val_acc: 0.9634
Epoch 3/10
30000/30000 [==============================] - 0s 16us/step - loss: 0.0496 - acc: 0.9871 - val_loss: 0.1244 - val_acc: 0.9634
Epoch 4/10
30000/30000 [==============================] - 0s 16us/step - loss: 0.0492 - acc: 0.9874 - val_loss: 0.1244 - val_acc: 0.9634
Epoch 5/10
30000/30000 [==============================] - 0s 16us/step - loss: 0.0489 - acc: 0.9875 - val_loss: 0.1247 - val_acc: 0.9633
Epoch 6/10
30000/30000 [==============================] - 0s 16us/step - loss: 0.0485 - acc: 0.9873 - val_loss: 0.1244 - val_acc: 0.9635
Epoch 7/10
30000/30000 [==============================] - 0s 16us/step - loss: 0.0483 - acc: 0.9873 - val_loss: 0.1244 - val_acc: 0.9637
Epoch 8/10
30000/30000 [==============================] - 0s 16us/step - loss: 0.0479 - acc: 0.9878 - val_loss: 0.1242 - val_acc: 0.9636
Epoch 9/10
30000/30000 [==============================] - 0s 16us/step - loss: 0.0477 - acc: 0.9874 - val_loss: 0.1245 - val_acc: 0.9636
Epoch 10/10
30000/30000 [==============================] - 0s 17us/step - loss: 0.0475 - acc: 0.9874 - val_loss: 0.1245 - val_acc: 0.9637
CPU times: user 17.8 s, sys: 2.08 s, total: 19.8 s
Wall time: 5.36 s
```
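The comment in the compile call lists alternative optimizers (SGD, RMSprop, Adam, Adagrad, Nadam). To compare them on this task, one can retrain the same architecture once per optimizer; a sketch, where clone_model (a standard Keras utility) copies the architecture with freshly initialized weights:

```python
# Train the same network with several optimizers and collect the loss curves.
histories = {}
for opt in ['SGD', 'RMSprop', 'Adam', 'Adagrad']:
    model = tf.keras.models.clone_model(deep_networks)
    model.compile(optimizer=opt, loss='categorical_crossentropy',
                  metrics=['accuracy'])
    histories[opt] = model.fit(x_train, y_train_onehot, batch_size=500,
                               epochs=10, validation_split=0.5, verbose=0)

# Plot the training loss per optimizer for comparison.
fig, ax = plt.subplots(figsize=(10, 6))
for opt, h in histories.items():
    ax.plot(h.epoch, h.history['loss'], label=opt)
ax.set_xlabel('epoch')
ax.set_ylabel('loss')
ax.legend()
plt.show()
```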
We plot the loss curve over the training epochs.
```python
fig, ax = plt.subplots(figsize=(20, 8))
ax.plot(history.epoch, history.history['loss'])
ax.set_xlabel('$epoch$')
ax.set_ylabel('$loss$')
```
Finally, we evaluate the trained model on the test set.

```python
test_loss, test_acc = deep_networks.evaluate(x_test, pd.get_dummies(y_test), verbose=2)
print('\nTest accuracy:', test_acc)
```
```
Test accuracy: 0.9667
```
5 Summary

In this case study we implemented gradient descent and, with the help of Scipy's optimize module, compared the optimization paths of gradient descent, conjugate gradient, and quasi-Newton methods on two different two-dimensional functions, animating them with Matplotlib. We then built a classification model with TensorFlow on the handwritten digit dataset and trained it with different optimization methods. The main Python packages used in this case study are listed below.
| Package or method | Version | Purpose |
| --- | --- | --- |
| Matplotlib | 3.0.2 | 3D surfaces, contour plots, animations, gradient field (arrows) |
| Scipy | 1.0.0 | scipy.optimize.minimize for solving optimization problems |
| TensorFlow | 1.12.0 | building the handwritten digit neural network |
| Pandas | 0.23.4 | data preprocessing, One-Hot encoding |
References

[1] scipy.optimize: http://docs./doc/scipy/reference/optimize.html

[2] Visualizing and Animating Optimization Algorithms with Matplotlib: http:///notes/visualizing-and-animating-optimization-algorithms-with-matplotlib/

[3] MNIST dataset: http://yann./exdb/mnist/