What is mixup training?
The paper mixup: BEYOND EMPIRICAL RISK MINIMIZATION (https://arxiv.org/pdf/1710.09412.pdf) offers an alternative to traditional image augmentation techniques such as zooming and rotation: a new example is formed by a weighted linear interpolation of two existing examples.
(xi, yi) and (xj, yj) are two examples drawn at random from the training data, and the new example is their convex combination:

x̃ = λ · xi + (1 − λ) · xj
ỹ = λ · yi + (1 − λ) · yj

where λ ∈ [0, 1]. In practice, λ is sampled from a Beta distribution, i.e. Beta(α, α).
α ∈ [0.1, 0.4] leads to improved performance; smaller values of α produce less mixup effect, whereas larger values cause mixup to underfit.
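As a minimal sketch of this interpolation on a single pair of examples (the toy shapes and variable names here are illustrative, not from the paper), with λ drawn from Beta(α, α):

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = 0.2

# Two toy "images" (4x4, single channel) with one-hot labels for 3 classes.
x_i = rng.random((4, 4, 1))
x_j = rng.random((4, 4, 1))
y_i = np.array([1.0, 0.0, 0.0])
y_j = np.array([0.0, 1.0, 0.0])

# Sample the interpolation weight from Beta(alpha, alpha).
lam = rng.beta(alpha, alpha)

# Weighted linear interpolation of both inputs and labels.
x_new = lam * x_i + (1 - lam) * x_j
y_new = lam * y_i + (1 - lam) * y_j

print(lam, y_new)  # the mixed label still sums to 1
```

Note that the label is mixed with the same λ as the input, which is what produces the soft targets mixup trains on.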
As you can see in the figure below, with a small α = 0.2, the Beta distribution mostly samples values close to 0 or 1, so the mixup result stays close to one of the two original examples.
```python
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt

alpha = 0.2
array = np.random.beta(alpha, alpha, 5000)
h = sorted(array)
fit = stats.norm.pdf(h, np.mean(h), np.std(h))  # normal fit, for reference

plt.hist(h, density=True)  # `normed` was removed from Matplotlib; use `density`
plt.title('Beta distribution')
plt.show()
```
What are the benefits of mixup training?
Traditional data augmentation, such as that provided by the Keras ImageDataGenerator class, consistently improves generalization, but the procedure is dataset-dependent and therefore requires expert knowledge.
Moreover, data augmentation does not model the relationships between examples of different classes.
Mixup, on the other hand:

- is a data-agnostic data augmentation routine.
- makes the decision boundary transition linearly from class to class, providing a smoother estimate of uncertainty.
- reduces the memorization of corrupt labels.
- increases robustness to adversarial examples and stabilizes the training of generative adversarial networks.
Mixup image data generator in Keras
Let's implement an image data generator that reads images from disk and works out of the box with Keras model.fit_generator(). The Python code follows:
```python
import numpy as np


class MixupImageDataGenerator():
    def __init__(self, generator, directory, batch_size,
                 img_height, img_width, alpha=0.2, subset=None):
        """Constructor for mixup image data generator.

        Arguments:
            generator {object} -- An instance of Keras ImageDataGenerator.
            directory {str} -- Image directory.
            batch_size {int} -- Batch size.
            img_height {int} -- Image height in pixels.
            img_width {int} -- Image width in pixels.

        Keyword Arguments:
            alpha {float} -- Mixup beta distribution alpha parameter. (default: {0.2})
            subset {str} -- 'training' or 'validation' if validation_split is
                specified in `generator` (ImageDataGenerator). (default: {None})
        """
        self.batch_index = 0
        self.batch_size = batch_size
        self.alpha = alpha

        # First iterator yielding tuples of (x, y)
        self.generator1 = generator.flow_from_directory(
            directory,
            target_size=(img_height, img_width),
            class_mode="categorical",
            batch_size=batch_size,
            shuffle=True,
            subset=subset)

        # Second iterator yielding tuples of (x, y)
        self.generator2 = generator.flow_from_directory(
            directory,
            target_size=(img_height, img_width),
            class_mode="categorical",
            batch_size=batch_size,
            shuffle=True,
            subset=subset)

        # Number of images across all classes in image directory.
        self.n = self.generator1.samples

    def reset_index(self):
        """Reset the generator indexes array."""
        self.generator1._set_index_array()
        self.generator2._set_index_array()

    def on_epoch_end(self):
        self.reset_index()

    def reset(self):
        self.batch_index = 0

    def __len__(self):
        # round up
        return (self.n + self.batch_size - 1) // self.batch_size

    def get_steps_per_epoch(self):
        """Get number of steps per epoch based on batch size and
        number of images.

        Returns:
            int -- steps per epoch.
        """
        return self.n // self.batch_size

    def __next__(self):
        """Get next batch input/output pair.

        Returns:
            tuple -- batch of input/output pair, (inputs, outputs).
        """
        if self.batch_index == 0:
            self.reset_index()

        current_index = (self.batch_index * self.batch_size) % self.n
        if self.n > current_index + self.batch_size:
            self.batch_index += 1
        else:
            self.batch_index = 0

        # Randomly sample the lambda values from the beta distribution.
        l = np.random.beta(self.alpha, self.alpha, self.batch_size)

        X_l = l.reshape(self.batch_size, 1, 1, 1)
        y_l = l.reshape(self.batch_size, 1)

        # Get a pair of inputs and outputs from the two iterators.
        X1, y1 = self.generator1.next()
        X2, y2 = self.generator2.next()

        # Perform the mixup.
        X = X1 * X_l + X2 * (1 - X_l)
        y = y1 * y_l + y2 * (1 - y_l)
        return X, y

    def __iter__(self):
        while True:
            yield next(self)
```
The core of the mixup generator is a pair of iterators that randomly sample images from the same directory, one batch at a time, with the mixup performed in the __next__ method.
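To see the batch-level mixing step from __next__ in isolation, here is a sketch with toy random arrays standing in for the outputs of the two iterators (the shapes and names are illustrative):

```python
import numpy as np

batch_size, h, w, c, n_classes = 5, 150, 150, 3, 4
alpha = 0.2

# Stand-ins for the (x, y) batches yielded by the two iterators.
X1 = np.random.rand(batch_size, h, w, c)
X2 = np.random.rand(batch_size, h, w, c)
y1 = np.eye(n_classes)[np.random.randint(n_classes, size=batch_size)]
y2 = np.eye(n_classes)[np.random.randint(n_classes, size=batch_size)]

# One lambda per image in the batch, reshaped so it broadcasts
# over image dimensions and over label dimensions respectively.
l = np.random.beta(alpha, alpha, batch_size)
X_l = l.reshape(batch_size, 1, 1, 1)
y_l = l.reshape(batch_size, 1)

# Perform the mixup, exactly as in __next__.
X = X1 * X_l + X2 * (1 - X_l)
y = y1 * y_l + y2 * (1 - y_l)

print(X.shape, y.shape)  # (5, 150, 150, 3) (5, 4)
```

Each image in the batch gets its own λ, which is why λ is reshaped rather than used as a single scalar for the whole batch.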
You can then create the training and validation generators for fitting the model. Note that mixup is not applied to the validation generator.
```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_dir = "./data"
batch_size = 5
validation_split = 0.3
img_height = 150
img_width = 150
epochs = 10

# Optional additional image augmentation with ImageDataGenerator.
input_imgen = ImageDataGenerator(
    rescale=1. / 255,
    rotation_range=5,
    width_shift_range=0.05,
    height_shift_range=0,
    shear_range=0.05,
    zoom_range=0,
    brightness_range=(1, 1.3),
    horizontal_flip=True,
    fill_mode='nearest',
    validation_split=validation_split)

# Create training and validation generators.
train_generator = MixupImageDataGenerator(
    generator=input_imgen,
    directory=train_dir,
    batch_size=batch_size,
    img_height=img_height,
    img_width=img_width,
    subset='training')

validation_generator = input_imgen.flow_from_directory(
    train_dir,
    target_size=(img_height, img_width),
    class_mode="categorical",
    batch_size=batch_size,
    shuffle=True,
    subset='validation')

print('training steps: ', train_generator.get_steps_per_epoch())
print('validation steps: ', validation_generator.samples // batch_size)
```
Build a Keras image classification model as usual.
```python
from tensorflow.keras.applications import VGG16
from tensorflow.keras import models
from tensorflow.keras import layers
from tensorflow.keras import optimizers

conv_base = VGG16(weights='imagenet',
                  include_top=False,
                  input_shape=(img_height, img_width, 3))

model = models.Sequential()
model.add(conv_base)
model.add(layers.Flatten())
model.add(layers.Dense(256, activation='relu'))
# Four mutually exclusive classes with soft (mixed) labels, so use
# softmax with categorical cross-entropy rather than sigmoid with
# binary cross-entropy.
model.add(layers.Dense(4, activation='softmax'))

conv_base.trainable = False
model.compile(optimizer=optimizers.RMSprop(learning_rate=2e-5),
              loss='categorical_crossentropy',
              metrics=['acc'])
```
Training the model
```python
train_generator.reset()
validation_generator.reset()

# Start the training.
history = model.fit_generator(
    train_generator,
    steps_per_epoch=train_generator.get_steps_per_epoch(),
    validation_data=validation_generator,
    validation_steps=validation_generator.samples // batch_size,
    epochs=epochs)
```
We can visualize a batch of mixup images and labels with the following Python snippet.
```python
from IPython.display import display
from tensorflow.keras.preprocessing import image

sample_x, sample_y = next(train_generator)
for i in range(batch_size):
    display(image.array_to_img(sample_x[i]))
print(sample_y)
```
Conclusion
You might expect that mixing up more than two examples at a time would lead to better training. On the contrary, combining three or more examples with weights sampled from the multivariate generalization of the Beta distribution provides no further gain, while increasing the computational cost of mixup. Moreover, interpolating only between inputs with the same label does not lead to any performance improvement.
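For intuition only, here is an illustrative sketch (not code from this post) of what the three-example generalization looks like: the weights come from a Dirichlet distribution, the multivariate counterpart of Beta(α, α).

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = 0.2

xs = rng.random((3, 4, 4, 1))  # three toy inputs
ys = np.eye(3)                 # their one-hot labels

# Dirichlet([alpha, alpha, alpha]) generalizes Beta(alpha, alpha):
# it yields three non-negative weights that sum to 1.
w = rng.dirichlet([alpha] * 3)

x_new = np.tensordot(w, xs, axes=1)  # weighted sum of the inputs
y_new = w @ ys                       # weighted sum of the labels

print(w, y_new)  # the mixed label still sums to 1
```

As the conclusion notes, this generalization adds cost (extra iterators and sampling) without a corresponding accuracy gain, which is why pairwise mixup is the standard formulation.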