Introduction
Neural networks have been widely used for image recognition over the past few years. One of the coolest applications is the neural style transfer algorithm originally proposed by Leon A. Gatys. The algorithm uses a pretrained model and a simple optimization procedure to combine the style and content of two separate images into a single image.
Intuition
Any image can be thought of as having two components:
- Content: the objects in the image and their spatial arrangement
- Style: the textures, visual patterns, and color scheme
Let's define the image whose content we want to reproduce as I_CONTENT, and the image whose style we want to reproduce as I_STYLE. The goal, then, is to generate an image (I_TARGET) that has the content of I_CONTENT and the style of I_STYLE.
To achieve this, an image initially filled with random pixels is modified iteratively until it exhibits the desired content and style.
How does it work?
As we can see, an initially noisy image is iteratively modified to match the content and style of the chosen images.
But how do we measure the content similarity or style similarity between two images? This is where a pretrained convolutional neural network (CNN) comes into play.
C^l_{ij}, S^l_{ij}, and T^l_{ij} denote the j-th activation of the i-th feature map at layer l when the content, style, and target images, respectively, are passed through the pretrained network. H, W, and C refer to the height, width, and number of channels at layer l.
Content loss
The higher layers of a CNN trained on a huge image dataset for object detection learn to explicitly identify and represent the objects (the content) in an image, rather than learning exact pixel values.
Therefore, an image that produces similar feature-map activations at the higher layers will have similar content. Formally, we can define the content loss measured at layer l between images X and Y as:

J_content^(l)(X, Y) = (1 / (4 C H W)) Σ_{i,j} (X^l_{ij} − Y^l_{ij})²
This is essentially the normalized squared error between the corresponding activations of image X and image Y as they pass through the CNN.
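To make the normalization concrete, the content loss can be sketched in plain NumPy with toy activation shapes (the article's actual TensorFlow version, getContentLoss, appears in the implementation below):

```python
import numpy as np

def content_loss(x_act, y_act):
    # Normalized squared error between the layer-l activations of two images.
    _, H, W, C = y_act.shape
    return np.sum((x_act - y_act) ** 2) / (4.0 * C * H * W)

rng = np.random.default_rng(0)
a = rng.normal(size=(1, 4, 4, 3))        # toy activations standing in for a real layer
assert content_loss(a, a) == 0.0         # identical activations -> zero loss
assert content_loss(a + 1.0, a) == 0.25  # 48 unit differences / (4*3*4*4) = 0.25
```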
Style loss
Style is measured as the correlation between the activations of different feature maps at a particular layer. This is formally captured in the Gram matrix (G):

G^l_{ij} = Σ_k F^l_{ik} F^l_{jk}

where F^l is the matrix of layer-l activations with one row per feature map.
Gram matrix: captures the correlations between the different feature maps at layer l of the network when image X is passed through it.
G^l_{ij} captures the correlation between the i-th and j-th feature maps at layer l. Accordingly, the style loss measured at layer l between images X and Y is defined as:

J_style^(l)(X, Y) = (1 / (4 C² H² W²)) Σ_{i,j} (G^l_{ij}(X) − G^l_{ij}(Y))²
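The Gram-matrix computation can be sketched in NumPy with toy shapes; it mirrors the reshape-and-matmul used in getStyleLoss later in the article:

```python
import numpy as np

def gram_matrix(activations):
    # G[i, j] = correlation (dot product) between feature maps i and j at a layer.
    _, H, W, C = activations.shape
    F = activations.reshape(H * W, C).T   # (C, H*W): one row per feature map
    return F @ F.T                        # (C, C)

rng = np.random.default_rng(0)
act = rng.normal(size=(1, 4, 4, 3))
G = gram_matrix(act)
assert G.shape == (3, 3)
assert np.allclose(G, G.T)  # Gram matrices are symmetric
```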
Note that the content and style losses can be measured at different layers of the network and combined to form the final loss function.
w_c^l and w_s^l are the weights given to the content and style losses computed at layer l. α and β are the weights given to the overall content and style losses, respectively:

J_total = α Σ_l w_c^l J_content^(l) + β Σ_l w_s^l J_style^(l)
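The weighted combination is a one-liner in code; this mirrors the getTotalLoss function defined in the implementation below:

```python
def total_loss(J_content, J_style, alpha=10.0, beta=0.1):
    # alpha/beta trade content fidelity against style fidelity.
    return alpha * J_content + beta * J_style

assert total_loss(2.0, 10.0, alpha=10.0, beta=0.1) == 21.0
```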
In the original implementation, VGG-19 was used as the pretrained CNN, and different combinations of content and style layers have been used.
Python implementation
Step 1: We need the pretrained VGG19 model weights to compute the losses at each iteration. We will build a dictionary that maps layer names to the weight values of each VGG19 layer.
import tensorflow as tf
from tensorflow import keras
import numpy as np
import os, shutil
import random
import matplotlib.pyplot as plt
import string
import cv2
from google.colab.patches import cv2_imshow
import scipy
model.layers contains the weight tensors for each layer.
For each convolutional layer i, model.layers[i].weights[0] holds the filter weights and model.layers[i].weights[1] holds the bias weights.
We will extract these values and add them to the dictionary weights_dict.
def getModelWeightsAsDict():
    keras.backend.clear_session()
    tf.reset_default_graph()
    model = keras.applications.VGG19(input_shape=(IMAGE_SIZE, IMAGE_SIZE, 3),
                                     include_top=False, weights='imagenet')
    with keras.backend.get_session() as sess:
        weights_dict = {}
        for i in range(len(model.layers)):
            weights_dict['layer_' + str(i)] = []
            for j in range(len(model.layers[i].weights)):
                weights_dict['layer_' + str(i)].append(sess.run(model.layers[i].weights[j]))
    tf.reset_default_graph()
    return weights_dict
Step 2: We will build a TensorFlow graph that replicates the connections and weights of the VGG19 network. However, we will make the input tensor a tf.Variable() and define the graph using the weights stored in Step 1. This is because we will not train the network's weights; we will only modify the input tensor to minimize the defined loss.
As before, we store the tensor corresponding to each layer in a dictionary.
def get_nst_model(weights_dict):
    layers = {}
    image_shape = (IMAGE_SIZE, IMAGE_SIZE, 3)
    # The input is a tf.Variable: the optimizer updates the image, not the weights.
    layers['input'] = tf.Variable(
        initial_value=tf.initializers.random_normal()((1,) + image_shape),
        expected_shape=(1,) + image_shape, name='nst_output', dtype=tf.float32)

    def conv(x, layer_idx):
        # Convolution with fixed (non-trainable) VGG19 weights, followed by ReLU.
        w, b = weights_dict['layer_' + str(layer_idx)]
        return tf.nn.relu(tf.nn.bias_add(
            tf.nn.convolution(x, w, padding='SAME', strides=(1, 1)), b))

    def pool(x):
        # Average pooling, as used in the original implementation.
        return tf.nn.avg_pool(x, ksize=(1, 2, 2, 1), strides=(1, 2, 2, 1), padding='VALID')

    layers['conv1_1'] = conv(layers['input'], 1)
    layers['conv1_2'] = conv(layers['conv1_1'], 2)
    layers['pool1'] = pool(layers['conv1_2'])
    layers['conv2_1'] = conv(layers['pool1'], 4)
    layers['conv2_2'] = conv(layers['conv2_1'], 5)
    layers['pool2'] = pool(layers['conv2_2'])
    layers['conv3_1'] = conv(layers['pool2'], 7)
    layers['conv3_2'] = conv(layers['conv3_1'], 8)
    layers['conv3_3'] = conv(layers['conv3_2'], 9)
    layers['conv3_4'] = conv(layers['conv3_3'], 10)
    layers['pool3'] = pool(layers['conv3_4'])
    layers['conv4_1'] = conv(layers['pool3'], 12)
    layers['conv4_2'] = conv(layers['conv4_1'], 13)
    layers['conv4_3'] = conv(layers['conv4_2'], 14)
    layers['conv4_4'] = conv(layers['conv4_3'], 15)
    layers['pool4'] = pool(layers['conv4_4'])
    layers['conv5_1'] = conv(layers['pool4'], 17)
    layers['conv5_2'] = conv(layers['conv5_1'], 18)
    layers['conv5_3'] = conv(layers['conv5_2'], 19)
    layers['conv5_4'] = conv(layers['conv5_3'], 20)
    layers['pool5'] = pool(layers['conv5_4'])
    return layers
Step 3: Let's load the I_CONTENT and I_STYLE images. We will preprocess them by subtracting the channel-wise mean pixel values (MEANS).
We also define the save_image() function, which takes the model's output image, adds back MEANS, clips the values to the range [0, 255], and saves the image.
MEANS = np.array([123.68, 116.779, 103.939]).reshape((1, 1, 1, 3))

def load_image(filename):
    image = cv2.imread(filename)
    image = cv2.resize(image, (IMAGE_SIZE, IMAGE_SIZE))
    cv2_imshow(image)
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    image = image.reshape((1,) + image.shape)
    image = image - MEANS
    return image

def save_image(image, filename):
    image = image + MEANS
    image = np.clip(image[0], 0, 255).astype('uint8')
    # Note: scipy.misc.imsave was removed in SciPy 1.2; on newer versions use imageio.imwrite.
    scipy.misc.imsave(filename, image)
    return image
Step 4: Let's define the layers (and corresponding weights) used to compute the content and style losses.

CONTENT_LAYERS = [
    ('conv4_2', 1.0),
]

STYLE_LAYERS = [
    ('conv1_1', 0.2),
    ('conv2_1', 0.2),
    ('conv3_1', 0.2),
    ('conv4_1', 0.2),
    ('conv5_1', 0.2),
]
Let's define the loss terms and the optimizer that minimizes the loss:
def getContentLoss(layers, content_target, content_layer_name):
    m, H, W, C = content_target.shape
    return tf.reduce_sum(tf.pow(content_target - layers[content_layer_name], 2)) / (4.0 * C * H * W)

# The following snippets run inside run_style_transfer (Step 5), where layers,
# content_image, style_image, alpha, beta, and learning_rate are in scope.

# Calculate tensor for content loss
J_content = 0.0
with tf.Session() as sess:
    for content_layer_name, weight in CONTENT_LAYERS:
        content_layer = layers[content_layer_name]
        content_target = sess.run(content_layer, feed_dict={layers['input']: content_image})
        J_content = J_content + weight * getContentLoss(layers, content_target, content_layer_name)
print('Content loss defined...\n')

def getStyleLoss(layers, style_target, style_layer_name):
    m, H, W, C = style_target.shape
    style_target = tf.transpose(tf.reshape(style_target, [H * W, C]))
    gram_target = tf.matmul(style_target, tf.transpose(style_target))
    style_tensor = tf.transpose(tf.reshape(layers[style_layer_name], [H * W, C]))
    gram_tensor = tf.matmul(style_tensor, tf.transpose(style_tensor))
    return tf.reduce_sum(tf.pow(gram_tensor - gram_target, 2)) / (4.0 * C * C * H * H * W * W)

# Calculate tensor for style loss
J_style = 0.0
with tf.Session() as sess:
    for style_layer_name, weight in STYLE_LAYERS:
        style_layer = layers[style_layer_name]
        style_target = sess.run(style_layer, feed_dict={layers['input']: style_image})
        J_style = J_style + weight * getStyleLoss(layers, style_target, style_layer_name)
print('Style loss defined...\n')

def getTotalLoss(J_content, J_style, alpha, beta):
    return J_content * alpha + J_style * beta

# Calculate tensor for total loss
J_total = getTotalLoss(J_content, J_style, alpha=alpha, beta=beta)

# Define loss optimizer
optimizer = tf.train.AdamOptimizer(learning_rate).minimize(J_total)
Step 5: All that's left is to initialize layers['input'] with a white-noise image and run the optimizer.
Every 100 iterations, we save the image using a definable path prefix. This uses the save_image() function defined earlier.
I have wrapped all of this in the function run_style_transfer(), to which you can pass the file paths of the content and style images, the content and style weights (α and β), the learning rate, and the number of epochs.
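Note that run_style_transfer() relies on a helper, generate_noise_image(), that is not shown in the article. A minimal version consistent with the mean-subtracted preprocessing might look like this; the noise range and the optional blend with the content image are assumptions borrowed from common NST implementations, not the author's exact code:

```python
import numpy as np

IMAGE_SIZE = 900  # must match the value used when building the graph

def generate_noise_image(content_image=None, noise_ratio=0.6):
    # Uniform noise in a small range around zero (inputs are mean-subtracted).
    noise = np.random.uniform(-20.0, 20.0,
                              (1, IMAGE_SIZE, IMAGE_SIZE, 3)).astype('float32')
    if content_image is None:
        return noise
    # Optionally blend with the content image to speed up convergence.
    return noise * noise_ratio + content_image * (1.0 - noise_ratio)
```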
def run_style_transfer(content_image_filename, style_image_filename, initialImage=None,
                       alpha=10, beta=1e-1, epochs=1200, learning_rate=5.0, prefix=None):
    if prefix is None:
        print('Output file prefix not defined...')
        return
    content_image = load_image(content_image_filename)
    style_image = load_image(style_image_filename)
    weights_dict = getModelWeightsAsDict()
    tf.reset_default_graph()
    layers = get_nst_model(weights_dict)
    print('Model graph generated...\n')

    # Calculate tensor for content loss
    J_content = 0.0
    with tf.Session() as sess:
        for content_layer_name, weight in CONTENT_LAYERS:
            content_layer = layers[content_layer_name]
            content_target = sess.run(content_layer, feed_dict={layers['input']: content_image})
            J_content = J_content + weight * getContentLoss(layers, content_target, content_layer_name)
    print('Content loss defined...\n')

    # Calculate tensor for style loss
    J_style = 0.0
    with tf.Session() as sess:
        for style_layer_name, weight in STYLE_LAYERS:
            style_layer = layers[style_layer_name]
            style_target = sess.run(style_layer, feed_dict={layers['input']: style_image})
            J_style = J_style + weight * getStyleLoss(layers, style_target, style_layer_name)
    print('Style loss defined...\n')

    J_total = getTotalLoss(J_content, J_style, alpha=alpha, beta=beta)
    optimizer = tf.train.AdamOptimizer(learning_rate).minimize(J_total)
    print('Losses defined...\n')
    print('Trainable variables:\n', tf.trainable_variables())

    with tf.Session() as sess:
        # Initialize image
        sess.run(tf.global_variables_initializer())
        if initialImage is not None:
            sess.run(layers['input'].assign(initialImage))
        else:
            sess.run(layers['input'].assign(generate_noise_image()))
        for epoch in range(epochs):
            epoch_loss, epoch_content_loss, epoch_style_loss, _ = sess.run(
                [J_total, J_content, J_style, optimizer])
            if (epoch + 1) % 100 == 0:
                generated_image = sess.run(layers['input'])
                generated_image = save_image(generated_image, prefix + '_' + str(epoch + 1) + '.jpg')
                print('Loss after epoch %d: \nT: %f, \nC: %f, \nS: %f'
                      % (epoch + 1, epoch_loss, epoch_content_loss, epoch_style_loss))

    generated_image = cv2.cvtColor(generated_image, cv2.COLOR_RGB2BGR)
    cv2_imshow(generated_image)
Setting up the GPU

# CHECK AND SETUP GPU
device_name = tf.test.gpu_device_name()
print('Found GPU at: {}'.format(device_name))
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
# Note: pass this config to the sessions, e.g. tf.Session(config=config), for it to take effect.
Example invocation

# Example Usage
IMAGE_SIZE = 900
content_image_filename = './knight.jpg'
style_image_filename = './patterned_leaves.jpg'
with tf.device('/gpu:0'):
    run_style_transfer(content_image_filename=content_image_filename,
                       style_image_filename=style_image_filename,
                       epochs=3000, alpha=18, beta=1e-1,
                       learning_rate=10.0,
                       prefix='knight-patterned_leaves')
Setting α in the range [10, 20], β at ~1e-1, and the learning rate at ~5.0 was able to produce good results in about 1000 iterations.