The environment for this programming assignment is Python 3.6, Anaconda3 (64-bit), and Jupyter Notebook. It is the programming assignment from Andrew Ng's "Neural Networks and Deep Learning" course on NetEase Cloud Classroom; a few pieces of code have been modified, and this write-up is shared for learning and discussion only.
Logistic Regression with a Neural Network mindset
In this assignment we build a logistic regression classifier, with a deep-learning mindset, to recognize cat pictures. We will implement three functions: parameter initialization, computation of the cost function and its gradients, and an optimization routine, and then merge them into one main model function.
1 - Import packages
numpy: a package for scientific computing with Python
h5py: a common package for interacting with datasets stored in H5 (HDF5) files
matplotlib: a plotting library for Python
PIL and scipy: used to test your model with your own image
import numpy as np
import matplotlib.pyplot as plt
plt.rcParams['font.sans-serif'] = ['FangSong']  # font able to render CJK characters in plots
plt.rcParams['axes.unicode_minus'] = False      # render the minus sign correctly with that font
import h5py
import scipy
from PIL import Image
from scipy import ndimage
from lr_utils import load_dataset
2 - Overview of the problem set
The dataset (data.h5) contains:
- a training set of m_train labeled images, where y = 1 means cat and y = 0 means non-cat;
- a test set of m_test labeled images;
- each image is of shape (num_px, num_px, 3), i.e. height, width, and the 3 RGB channels.
We will build an image-recognition algorithm to classify these pictures. First, let's take a look at the dataset.
# Load the data (cat / non-cat)
train_set_x_orig, train_set_y, test_set_x_orig, test_set_y, classes = load_dataset()
print(train_set_x_orig.shape, train_set_y.shape, test_set_x_orig.shape, test_set_y.shape, classes.shape)
print(type(train_set_y))
print('train_set_x_orig:', train_set_x_orig[:3],
      '\ntrain_set_y:', train_set_y[:, :3],
      '\ntest_set_x_orig:', test_set_x_orig[:3],
      '\ntest_set_y:', test_set_y[:, :3],
      '\nclasses:', classes[:3])

# Display one image
index = 26
plt.imshow(train_set_x_orig[index])
print('y = ' + str(train_set_y[:, index]) + ", it is a '" + classes[np.squeeze(train_set_y[:, index])].decode('utf-8') + "' picture.")
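If you want to look inside the raw HDF5 file directly, h5py can list the datasets it contains. This is a minimal sketch, not part of the assignment; the file path is an assumption, so point it at wherever your data.h5 actually lives:

# Inspect the raw HDF5 file (path is assumed; adjust to your own data.h5)
with h5py.File('datasets/data.h5', 'r') as f:
    for key in f.keys():
        print(key, f[key].shape, f[key].dtype)  # dataset name, shape, element type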
#-----
m_train = train_set_x_orig.shape[0]
m_test = test_set_x_orig.shape[0]
num_px = train_set_x_orig.shape[1]
#-----
print("Number of training examples: m_train = " + str(m_train))
print("Number of testing examples: m_test = " + str(m_test))
print("Height/Width of each image: num_px = " + str(num_px))
print("Each image is of size: (" + str(num_px) + ", " + str(num_px) + ", 3)")
print("train_set_x shape: " + str(train_set_x_orig.shape))
print("train_set_y shape: " + str(train_set_y.shape))
print("test_set_x shape: " + str(test_set_x_orig.shape))
print("test_set_y shape: " + str(test_set_y.shape))
Output:
Number of training examples: m_train = 209
Number of testing examples: m_test = 50
Height/Width of each image: num_px = 64
Each image is of size: (64, 64, 3)
train_set_x shape: (209, 64, 64, 3)
train_set_y shape: (1, 209)
test_set_x shape: (50, 64, 64, 3)
test_set_y shape: (1, 50)
Next, we flatten each training and test image from shape (num_px, num_px, 3) into a single column vector of shape (num_px * num_px * 3, 1); here each image becomes 64 * 64 * 3 = 12288 entries, so train_set_x_flatten has shape (12288, 209). We then standardize the dataset by dividing every pixel value by 255.
#-----
# train_set_x_flatten = np.reshape(train_set_x_orig, (train_set_x_orig.shape[0], num_px*num_px*3))
train_set_x_flatten = train_set_x_orig.reshape(m_train, -1).T  # -1 tells numpy to infer that dimension (num_px*num_px*3)
test_set_x_flatten = test_set_x_orig.reshape(m_test, -1).T
#-----
print("train_set_x_flatten shape: " + str(train_set_x_flatten.shape))
print("train_set_y shape: " + str(train_set_y.shape))
print("test_set_x_flatten shape: " + str(test_set_x_flatten.shape))
print("test_set_y shape: " + str(test_set_y.shape))
print("sanity check after reshaping: " + str(train_set_x_flatten[0:5, 0]))

# Standardize: pixel values lie in [0, 255], so dividing by 255 rescales them to [0, 1]
train_set_x = train_set_x_flatten / 255.
test_set_x = test_set_x_flatten / 255.
4 - Building the parts of the algorithm
The main steps for building a neural network are:
1) Define the model structure (such as the number of input features)
2) Initialize the model's parameters
3) Loop:
- compute the current loss (forward propagation)
- compute the current gradients (backward propagation)
- update the parameters (gradient descent)
4.1 - Helper functions
Implement the sigmoid() function: sigmoid(z) = 1 / (1 + e^{-z}).
# GRADED FUNCTION: sigmoid
def sigmoid(z):
    """
    Compute the sigmoid of z

    Arguments:
    z -- A scalar or numpy array of any size.

    Return:
    s -- sigmoid(z)
    """
    ### START CODE HERE ### (≈ 1 line of code)
    s = 1 / (1 + np.exp(-z))
    ### END CODE HERE ###
    return s

print('sigmoid([0,2]) = ' + str(sigmoid(np.array([0, 2]))))
Output:
sigmoid([0,2]) = [0.5 0.88079708]
4.2 - Initializing parameters
Initialize w as a vector of zeros and b as 0. For logistic regression, zero initialization works fine; unlike deeper networks, there is no symmetry that needs to be broken by random initialization.
# GRADED FUNCTION: initialize_with_zeros
def initialize_with_zeros(dim):
    """
    This function creates a vector of zeros of shape (dim, 1) for w and initializes b to 0.

    Argument:
    dim -- size of the w vector we want (or number of parameters in this case)

    Returns:
    w -- initialized vector of shape (dim, 1)
    b -- initialized scalar (corresponds to the bias)
    """
    ### START CODE HERE ### (≈ 1 line of code)
    w = np.zeros((dim, 1))
    b = 0
    ### END CODE HERE ###
    assert(w.shape == (dim, 1))
    assert(isinstance(b, float) or isinstance(b, int))
    return w, b

dim = 2
w, b = initialize_with_zeros(dim)
print('w = ' + str(w))
print('b = ' + str(b))
4.3 - Forward and backward propagation
Implement the function propagate(), which computes the cost function and its gradients.
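For reference, writing the activation matrix as $A = \sigma(w^T X + b)$, the cost and gradients computed below are the standard logistic regression formulas (they correspond line-for-line to the code):

$$J = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log a^{(i)} + (1 - y^{(i)})\log(1 - a^{(i)})\right], \qquad \frac{\partial J}{\partial w} = \frac{1}{m}X(A-Y)^T, \qquad \frac{\partial J}{\partial b} = \frac{1}{m}\sum_{i=1}^{m}(a^{(i)} - y^{(i)})$$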
def propagate(w, b, X, Y):
    """
    Implement the cost function and its gradient for the propagation explained above

    Arguments:
    w -- weights, a numpy array of size (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of size (num_px * num_px * 3, number of examples)
    Y -- true "label" vector (containing 0 if non-cat, 1 if cat) of size (1, number of examples)

    Return:
    cost -- negative log-likelihood cost for logistic regression
    dw -- gradient of the loss with respect to w, thus same shape as w
    db -- gradient of the loss with respect to b, thus same shape as b
    """
    m = X.shape[1]  # number of examples

    # FORWARD PROPAGATION (FROM X TO COST): compute the activations and the cost
    ### START CODE HERE ### (≈ 2 lines of code)
    A = sigmoid(np.dot(w.T, X) + b)
    #print(type(A), A.shape)
    cost = - (np.dot(Y, np.log(A).T) + np.dot((1 - Y), np.log(1 - A).T)) / m
    ### END CODE HERE ###

    # BACKWARD PROPAGATION (TO FIND GRAD)
    ### START CODE HERE ### (≈ 2 lines of code)
    dw = 1/m * (np.dot(X, (A - Y).T))
    db = 1/m * ((A - Y).sum())
    ### END CODE HERE ###

    assert(dw.shape == w.shape)
    assert(db.dtype == float)
    cost = np.squeeze(cost)
    assert(cost.shape == ())

    grads = {"dw": dw, "db": db}

    return grads, cost

w, b, X, Y = np.array([[1.], [2.]]), 2., np.array([[1, 2, -1], [3., 4, -3.2]]), np.array([[1, 0, 1]])
print(w.shape, X.shape, Y.shape)
grads, cost = propagate(w, b, X, Y)
print("dw = " + str(grads["dw"]))
print("db = " + str(grads["db"]))
print("cost = " + str(cost))
Output:
(2, 1) (2, 3) (1, 3)
dw = [[0.99845601]
[2.39507239]]
db = 0.001455578136784208
cost = 5.801545319394553
Next, implement the gradient descent function optimize(); its goal is to learn the values of w and b that minimize the cost.
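At every iteration it applies the standard gradient descent update, where $\alpha$ is the learning rate:

$$w := w - \alpha\,dw, \qquad b := b - \alpha\,db$$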
# GRADED FUNCTION: optimize
def optimize(w, b, X, Y, num_iterations, learning_rate, print_cost = False):
    """
    This function optimizes w and b by running a gradient descent algorithm

    Arguments:
    w -- weights, a numpy array of size (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of shape (num_px * num_px * 3, number of examples)
    Y -- true "label" vector (containing 0 if non-cat, 1 if cat) of shape (1, number of examples)
    num_iterations -- number of iterations of the optimization loop
    learning_rate -- learning rate of the gradient descent update rule
    print_cost -- True to print the loss every 100 steps

    Returns:
    params -- dictionary containing the weights w and bias b
    grads -- dictionary containing the gradients of the weights and bias with respect to the cost function
    costs -- list of all the costs computed during the optimization, this will be used to plot the learning curve.
    """
    costs = []

    for i in range(num_iterations):

        # Cost and gradient calculation (≈ 1-4 lines of code)
        ### START CODE HERE ###
        grads, cost = propagate(w, b, X, Y)
        ### END CODE HERE ###

        # Retrieve derivatives from grads
        dw = grads['dw']
        db = grads['db']

        # Update rule (≈ 2 lines of code)
        ### START CODE HERE ###
        w = w - learning_rate * dw
        b = b - learning_rate * db
        ### END CODE HERE ###

        # Record the costs
        if i % 100 == 0:
            costs.append(cost)

        # Print the cost every 100 iterations
        if print_cost and i % 100 == 0:
            print('Cost after iteration %i: %f' % (i, cost))

    params = {'w': w, 'b': b}
    grads = {'dw': dw, 'db': db}

    return params, grads, costs

params, grads, costs = optimize(w, b, X, Y, num_iterations=1000, learning_rate=0.009, print_cost=True)
print("w = " + str(params["w"]))
print("b = " + str(params["b"]))
print("dw = " + str(grads["dw"]))
print("db = " + str(grads["db"]))
Output:
Cost after iteration 0: 5.801545
Cost after iteration 100: 1.055933
Cost after iteration 200: 0.378303
Cost after iteration 300: 0.363595
Cost after iteration 400: 0.356242
Cost after iteration 500: 0.349210
Cost after iteration 600: 0.342420
Cost after iteration 700: 0.335860
Cost after iteration 800: 0.329517
Cost after iteration 900: 0.323380
w = [[-0.64226437]
[-0.43498153]]
b = 2.2025594747904087
dw = [[ 0.06282959]
[-0.01416124]]
db = -0.04847508604218077
Implement the prediction function predict(), which uses the learned w and b to predict labels for a dataset X: an example is classified as a cat (y = 1) when its activation exceeds 0.5.
# GRADED FUNCTION: predict
def predict(w, b, X):
    '''
    Predict whether the label is 0 or 1 using learned logistic regression parameters (w, b)

    Arguments:
    w -- weights, a numpy array of size (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of size (num_px * num_px * 3, number of examples)

    Returns:
    Y_prediction -- a numpy array (vector) containing all predictions (0/1) for the examples in X
    '''
    m = X.shape[1]
    Y_prediction = np.zeros((1, m))
    w = w.reshape(X.shape[0], 1)

    # Compute vector "A" predicting the probabilities of a cat being present in the picture
    ### START CODE HERE ### (≈ 1 line of code)
    A = sigmoid(np.dot(w.T, X) + b)
    ### END CODE HERE ###

    for i in range(A.shape[1]):
        # Convert probabilities A[0,i] to actual predictions p[0,i]
        ### START CODE HERE ### (≈ 4 lines of code)
        if A[0, i] > 0.5:
            Y_prediction[0, i] = 1
        else:
            Y_prediction[0, i] = 0
        ### END CODE HERE ###

    assert(Y_prediction.shape == (1, m))

    return Y_prediction
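As a quick sanity check (an extra step, not part of the graded notebook), predict() can be run on the toy w, b, X defined in section 4.3; with w = [[1.], [2.]] and b = 2., the first two activations exceed 0.5 and the third does not:

print('predictions = ' + str(predict(w, b, X)))  # expected: [[1. 1. 0.]]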
5 - Merge all functions into a model
# GRADED FUNCTION: model
def model(X_train, Y_train, X_test, Y_test, num_iterations = 2000, learning_rate = 0.5, print_cost = False):
    """
    Builds the logistic regression model by calling the function you've implemented previously

    Arguments:
    X_train -- training set represented by a numpy array of shape (num_px * num_px * 3, m_train)
    Y_train -- training labels represented by a numpy array (vector) of shape (1, m_train)
    X_test -- test set represented by a numpy array of shape (num_px * num_px * 3, m_test)
    Y_test -- test labels represented by a numpy array (vector) of shape (1, m_test)
    num_iterations -- hyperparameter representing the number of iterations to optimize the parameters
    learning_rate -- hyperparameter representing the learning rate used in the update rule of optimize()
    print_cost -- Set to true to print the cost every 100 iterations

    Returns:
    d -- dictionary containing information about the model.
    """
    ### START CODE HERE ###
    # initialize parameters with zeros (≈ 1 line of code)
    w, b = initialize_with_zeros(X_train.shape[0])

    # Gradient descent (≈ 1 line of code)
    params, grads, costs = optimize(w, b, X_train, Y_train, num_iterations, learning_rate, print_cost)

    # Retrieve parameters w and b from dictionary "params"
    w = params['w']
    b = params['b']

    # Predict test/train set examples (≈ 2 lines of code)
    Y_prediction_train = predict(w, b, X_train)
    Y_prediction_test = predict(w, b, X_test)
    ### END CODE HERE ###

    # Print train/test errors
    print('train accuracy: {} %'.format(100 - np.mean(np.abs(Y_prediction_train - Y_train)) * 100))
    print('test accuracy: {} %'.format(100 - np.mean(np.abs(Y_prediction_test - Y_test)) * 100))

    d = {"costs": costs,
         "Y_prediction_test": Y_prediction_test,
         "Y_prediction_train": Y_prediction_train,
         "w": w,
         "b": b,
         "learning_rate": learning_rate,
         "num_iterations": num_iterations}

    return d

# Run the model
d = model(train_set_x, train_set_y, test_set_x, test_set_y, num_iterations=2000, learning_rate=0.005, print_cost=True)
Output:
Cost after iteration 0: 0.693147
Cost after iteration 100: 0.584508
Cost after iteration 200: 0.466949
Cost after iteration 300: 0.376007
Cost after iteration 400: 0.331463
Cost after iteration 500: 0.303273
Cost after iteration 600: 0.279880
Cost after iteration 700: 0.260042
Cost after iteration 800: 0.242941
Cost after iteration 900: 0.228004
Cost after iteration 1000: 0.214820
Cost after iteration 1100: 0.203078
Cost after iteration 1200: 0.192544
Cost after iteration 1300: 0.183033
Cost after iteration 1400: 0.174399
Cost after iteration 1500: 0.166521
Cost after iteration 1600: 0.159305
Cost after iteration 1700: 0.152667
Cost after iteration 1800: 0.146542
Cost after iteration 1900: 0.140872
train accuracy: 99.04306220095694 %
test accuracy: 70.0 %
# Predict one image from the test set
index = 11
plt.imshow(test_set_x[:, index].reshape((num_px, num_px, 3)))
print('y = ' + str(test_set_y[0, index]) + ', you predicted that it is a "' + classes[int(d['Y_prediction_test'][0, index])].decode('utf-8') + '" picture.')

# Plot learning curve (with costs)
costs = np.squeeze(d['costs'])  # squeeze turns the list [array(), array(), ...] into array([...])
plt.plot(costs)
plt.ylabel('cost')
plt.xlabel('iterations (hundreds)')
plt.title('Learning rate = ' + str(d['learning_rate']))
plt.show()
6 - Further analysis (choice of learning rate)
Different learning rates produce different costs and therefore different predictions. If the learning rate is too large, the cost may oscillate, overshoot the minimum, or even diverge; if it is too small, many more iterations are needed to converge. Also, a lower cost does not necessarily mean a better model: you must check for overfitting, which shows up when the training accuracy far exceeds the test accuracy. Below we compare the learning curves produced by three different learning rates.
learning_rates = [0.01, 0.001, 0.0001]
models = {}
for i in learning_rates:
    print('learning rate: ' + str(i))
    models[str(i)] = model(train_set_x, train_set_y, test_set_x, test_set_y, num_iterations=1500, learning_rate=i, print_cost=False)
    print('\n' + '------------------------------------------------' + '\n')

for i in learning_rates:
    plt.plot(np.squeeze(models[str(i)]['costs']), label=str(models[str(i)]['learning_rate']))

plt.ylabel('cost')
plt.xlabel('iterations (hundreds)')

legend = plt.legend(loc='upper center', shadow=True)
frame = legend.get_frame()
frame.set_facecolor('0.90')
plt.show()
7 - Test with your own image
## START CODE HERE ## (PUT YOUR IMAGE NAME)
my_image = "timg.jpg"   # change this to the name of your image file
## END CODE HERE ##

# We preprocess the image to fit your algorithm.
fname = "images/" + my_image
image = np.array(ndimage.imread(fname, flatten=False))
my_image = scipy.misc.imresize(image, size=(num_px, num_px)).reshape((1, num_px*num_px*3)).T
my_predicted_image = predict(d["w"], d["b"], my_image)

plt.imshow(image)
print('y = ' + str(np.squeeze(my_predicted_image)) + ', your algorithm predicts a "' + classes[int(np.squeeze(my_predicted_image)),].decode('utf-8') + '" picture.')
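Note that ndimage.imread and scipy.misc.imresize work in the old SciPy shipped with this Python 3.6 environment but were removed from newer SciPy releases. If the cell above fails on your setup, a PIL-based preprocessing sketch (an alternative of mine, not the assignment's method) would be:

# Hypothetical PIL-based replacement for the deprecated SciPy image helpers
img = Image.open(fname).convert('RGB')       # force 3 channels
img = img.resize((num_px, num_px))           # rescale to the model's input size
my_image = np.array(img).reshape((1, num_px * num_px * 3)).T  # flatten to a column vector
my_predicted_image = predict(d["w"], d["b"], my_image)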
Summary:
- Preprocessing the dataset is very important.
- We implemented the functions initialize(), propagate(), and optimize() separately, then built model() on top of them.
- Different learning rates can make a big difference to the result.
To go further, try other learning rates and iteration counts, try other initialization methods and compare the results, or try other preprocessing (e.g., center the data, or divide each row by its standard deviation), as in the sketch below.
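For instance, the row-wise standardization mentioned above could look like this minimal sketch (not part of the assignment; train_set_x_flatten is the flattened data from section 2, and the epsilon is a precaution I added against zero-variance rows):

# Standardize each feature (row) instead of simply dividing by 255
X = train_set_x_flatten.astype(np.float64)
X = X - X.mean(axis=1, keepdims=True)           # center the data
X = X / (X.std(axis=1, keepdims=True) + 1e-8)   # divide each row by its standard deviation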