
Lightweight YOLO Detection and Object Tracking from Scratch

toyiye 2024-09-04 20:14


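1. Introduction to YOLO Object Detection

Before YOLO, besides R-CNN, another simple framework slid a window across the entire input frame and fed each window, one at a time, to a single CNN. This approach is easy to implement but computationally very expensive, which makes it unsuitable for real-time object detection.

YOLO inverts the sliding-window framework: the entire input frame is passed once through a single convolutional neural network (CNN), which outputs a 3D tensor. Each spatial position of that tensor corresponds to one cell of a grid laid over the original frame, and the channels at that position encode whether an object of interest was detected in the cell, which class it belongs to, and the position and dimensions of the object.

The diagram below illustrates the idea. Consider a picture of cars on a road that we want to divide into a 3 x 3 grid. We would build a CNN whose output has dimensions 3 x 3 x (number of channels), where the channels of each grid cell form a vector:

[probability that an object's center lies in the cell, X of the center, Y of the center, object height relative to the grid cell, object width relative to the grid cell, class A, class B, class C]

Note that this vector holds the information for only a single object per grid cell. To allow a cell to contain several objects, we can extend the vector by appending the same fields for a second object, a third, and so on.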
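As a rough illustration of the layout described above, the following sketch fills such a tensor for one object in a 3 x 3 grid. The function name `encode_single_object`, the 300 x 300 image size, and the example box are hypothetical and chosen only for illustration; they do not come from the original code.

```python
import numpy as np

GRID = 3                    # 3 x 3 grid, as in the example above
NUM_CLASSES = 3             # classes A, B, C
CHANNELS = 5 + NUM_CLASSES  # [p, x, y, h, w, A, B, C]

def encode_single_object(img_w, img_h, cx, cy, box_w, box_h, class_id):
    """Return a (GRID, GRID, CHANNELS) target that holds a single object."""
    target = np.zeros((GRID, GRID, CHANNELS), dtype=np.float32)
    cell_w, cell_h = img_w / GRID, img_h / GRID
    col = min(int(cx // cell_w), GRID - 1)   # grid column containing the center
    row = min(int(cy // cell_h), GRID - 1)   # grid row containing the center
    target[row, col, 0] = 1.0                            # an object center is here
    target[row, col, 1] = (cx - col * cell_w) / cell_w   # x offset inside the cell
    target[row, col, 2] = (cy - row * cell_h) / cell_h   # y offset inside the cell
    target[row, col, 3] = box_h / cell_h                 # height relative to the cell
    target[row, col, 4] = box_w / cell_w                 # width relative to the cell
    target[row, col, 5 + class_id] = 1.0                 # one-hot class
    return target

# Hypothetical car centered at (150, 250) in a 300 x 300 image, class B (index 1)
target = encode_single_object(300, 300, cx=150, cy=250, box_w=120, box_h=60, class_id=1)
print(target[2, 1])
```

Only the cell at row 2, column 1 is non-zero; every other cell stays all zeros, which is what "no object here" looks like in this encoding.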

Image by the author.

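Next, we need a dataset of input frames paired with these output tensors. We will build our YOLO dataset from an OpenCV simulation.

2. Particle Simulation and YOLO Data Collection

Our dataset is based on an OpenCV simulation in which multicolored particles move across a black canvas from all directions. The dataset simplifies the YOLO detection task by giving every particle the same radius, so the labeled width and height are identical throughout the dataset.

At the start of the simulation, particles begin to appear along the edges of the canvas. At every step each particle picks a random heading within a 180-degree range pointing away from its starting edge, so it "swims" forward with a jitter until it disappears across another edge. During the simulation we collect every frame together with its bounding boxes.

Using a simulation frees us from the tedious process of manually labeling bounding boxes and speeds up testing of the YOLO model. The data-generation code is as follows: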

import random  
import time  
import cv2  
import numpy as np  
  
frame_height, frame_width = 600, 600  
  
fourcc = cv2.VideoWriter_fourcc(*'XVID')  
out = cv2.VideoWriter('simulation_detection.mp4', fourcc, 50.0, (frame_width, frame_height))  
  
def create_particle():  
    color = (random.randint(0, 255), random.randint(0, 255), random.randint(0, 255))  
    radius = 10  # particle radius in pixels
    uniform_random = np.random.uniform()  
      
    if uniform_random <= 0.25:  
        # Start from the "bottom" edge (y = radius; in OpenCV image coordinates this is drawn at the top)
        position = (random.randint(radius, frame_width - radius), radius)  
        angle = random.randint(0, 180)  
        start_pos = "bottom"  
    elif uniform_random <= 0.5:  
        # Start from the "top" edge (y = frame_height - radius; drawn at the bottom in image coordinates)
        position = (random.randint(radius, frame_width - radius), frame_height - radius)  
        angle = random.randint(180, 360)  
        start_pos = "top"  
    elif uniform_random <= 0.75:  
        # Start from the left edge
        position = (radius, random.randint(radius, frame_height - radius))  
        angle = random.randint(-90, 90)  
        start_pos = "left"  
    else:  
        # Start from the right edge
        position = (frame_width - radius, random.randint(radius, frame_height - radius))  
        angle = random.randint(90, 270)  
        start_pos = "right"  
      
    return {'position': position, 'color': color, 'radius': radius, 'angle': angle, 'start_pos': start_pos}  
  
  
def move_particle(particle):  
      
    if particle['start_pos']=='bottom':  
        angle = random.randint(0, 180)  
    elif particle['start_pos']=='top':  
        angle = random.randint(180, 360)  
    elif particle['start_pos']=='left':  
        angle = random.randint(-90, 90)  
    elif particle['start_pos']=='right':  
        angle = random.randint(90, 270)  
      
    angle_rad = np.deg2rad(angle)  
    dx = int(particle['radius'] * np.cos(angle_rad))  
    dy = int(particle['radius'] * np.sin(angle_rad))  
    x, y = particle['position']  
    particle['position'] = (x + dx, y + dy)  
  
def is_off_screen(particle):  
    x, y = particle['position']  
    return x < 1 or x > frame_width-1 or y < 1 or y > frame_height-1  
  
def draw_frame(particles):  
    frame = np.zeros((frame_height, frame_width, 3), dtype=np.uint8)  
    bounding_boxes = []  
    for particle in particles:  
        cv2.circle(frame, particle['position'], particle['radius'], particle['color'], -1)  
        x, y = particle['position']  
        # cv2.rectangle(frame, (x - 2* particle['radius'], y - 2 * particle['radius']), (x + 2 * particle['radius'], y + 2 * particle['radius']), (0, 255, 0), 1)  
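        # Box size of 4 * radius matches the commented-out rectangle above and the 40-pixel boxes used later
        bounding_boxes.append({'x_center': x, 'y_center': y, 'width': 4 * particle['radius'], 'height': 4 * particle['radius']})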
          
    return frame, bounding_boxes  
  
  
def simulate_particles(total_data):  
    particles = []  
    max_particles = 50  
    total_particles_created = 0  
    timer = 0   
  
    while len(particles) > 0 or total_particles_created < max_particles:  
        if total_particles_created < max_particles and timer % 5 == 0:  
            total_particles_created += 1  
            particles.append(create_particle())  
  
        for particle in particles[:]:  
            move_particle(particle)  
            if is_off_screen(particle):  
                particles.remove(particle)  
  
        frame, bounding_boxes = draw_frame(particles)  
        total_data.append({'frame': frame, 'boundary_boxes': bounding_boxes})  
        out.write(frame)  
        cv2.imshow('Frame', frame)  
        if cv2.waitKey(1) & 0xFF == ord('q'):  
            break  
        timer += 1  
  
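    cv2.destroyAllWindows()

    return total_data


total_data = []

for i in range(12):
    total_data = simulate_particles(total_data)

# Release the writer only after all runs. Releasing it inside simulate_particles
# would stop every run after the first from being written to the video.
out.release()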

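Simulation of jittering multicolored particles. GIF by the author.

After collecting the raw frames and their bounding boxes, we arrange the data into a 30 x 30 grid tensor with 9 output channels, so that each grid cell can hold up to 3 particles. Because the particles have a fixed width and height and there is only one type of object to detect, the problem is greatly simplified. For each particle in each grid cell we only need this vector:

[probability that a particle's center lies in the cell, X of the center, Y of the center]

Each 30 x 30 grid tensor becomes a single data point in the y_true list. We also resize each 600 x 600 frame to 240 x 240 to form the corresponding data point in the X_true list.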
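To make the mapping from pixel coordinates to grid cells concrete, here is a small worked example. The helper name `to_cell` and the sample coordinates are hypothetical; the arithmetic mirrors the description above (a 600-pixel canvas split into 30 cells of 20 pixels each).

```python
cell_size = 600 // 30   # 20 pixels per grid cell

def to_cell(x_center, y_center, cell_size=cell_size):
    """Map a pixel-space center to (grid_x, grid_y, local_x, local_y)."""
    grid_x, grid_y = x_center // cell_size, y_center // cell_size
    local_x = (x_center % cell_size) / cell_size   # offset inside the cell, in [0, 1)
    local_y = (y_center % cell_size) / cell_size
    return grid_x, grid_y, local_x, local_y

print(to_cell(305, 47))   # (15, 2, 0.25, 0.35)
```

A particle centered at pixel (305, 47) therefore lands in column 15, row 2, a quarter of the way across the cell horizontally and 35% of the way down.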

def convert_data(total_data):  
  
    grid_size = 30  
    cell_size = 600 // grid_size  # each cell is 20x20 pixels
  
    X_true = np.array([data['frame'] for data in total_data])  
    y_true = np.zeros((len(total_data), grid_size, grid_size, 9))    
  
    for i, data in enumerate(total_data):  
        frame = data['frame']  
        boxes = data['boundary_boxes']  
        for box in boxes:  
            x_center = box['x_center']  
            y_center = box['y_center']  
            width = box['width']  
            height = box['height']  
  
            # Determine the index of the grid cell
            grid_x = int(x_center / cell_size)
            grid_y = int(y_center / cell_size)

            if y_true[i, grid_y, grid_x, 0] == 0:  # is the first slot free?
                y_true[i, grid_y, grid_x, 0] = 1  # particle present
                y_true[i, grid_y, grid_x, 1] = (x_center % cell_size) / cell_size   # local x_center
                y_true[i, grid_y, grid_x, 2] = (y_center % cell_size) / cell_size   # local y_center

            elif y_true[i, grid_y, grid_x, 3] == 0:  # is the second slot free?
                y_true[i, grid_y, grid_x, 3] = 1  # particle present
                y_true[i, grid_y, grid_x, 4] = (x_center % cell_size) / cell_size   # local x_center
                y_true[i, grid_y, grid_x, 5] = (y_center % cell_size) / cell_size   # local y_center

            elif y_true[i, grid_y, grid_x, 6] == 0:  # is the third slot free?
                y_true[i, grid_y, grid_x, 6] = 1  # particle present
                y_true[i, grid_y, grid_x, 7] = (x_center % cell_size) / cell_size   # local x_center
                y_true[i, grid_y, grid_x, 8] = (y_center % cell_size) / cell_size   # local y_center
  
  
    return X_true, y_true  
  
X_true, y_true = convert_data(total_data)
from sklearn.model_selection import train_test_split  
  
resized_images = np.zeros((len(X_true), 240, 240, 3))    
  
for i in range(X_true.shape[0]):  
    resized_images[i] = cv2.resize(X_true[i], (240, 240))  
  
resized_images = resized_images / 255.0  
X_true = resized_images  
  
X_train, X_test, y_train, y_test = train_test_split(  
    X_true,  
    y_true,  
    test_size=0.03,  
    random_state=42  
)

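3. Training the YOLO Model

Next, we instantiate our YOLO model, which is straightforward with the TensorFlow Keras framework. Besides the convolutional layers, we apply three 2x2 max-pooling layers, which reduce the 240x240 input to a 30x30 output.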
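The size reduction can be checked with a line of arithmetic: each 2x2 max-pooling layer halves the spatial size, and the `padding='same'` convolutions leave it unchanged. The helper `pooled_size` below is a hypothetical name used only for this check.

```python
def pooled_size(size, num_pools, pool=2):
    """Spatial size after `num_pools` non-overlapping pooling layers."""
    for _ in range(num_pools):
        size //= pool
    return size

print(pooled_size(240, 3))   # 240 -> 120 -> 60 -> 30
```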

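The more interesting part is the design of the loss function. The confidence loss of grid cells that contain an object is scaled up (by a factor of 10 in the code below) relative to that of empty cells, so the model prioritizes "paying attention" to cells with objects. The center-coordinate loss is computed only for cells that contain an object; for empty cells it is ignored.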
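The effect of this weighting can be seen in a toy NumPy version of the confidence term for a single slot. This is only a sketch under simplifying assumptions (two cells, one slot, a hypothetical function name `weighted_confidence_loss`), not the Keras loss used for training.

```python
import numpy as np

def weighted_confidence_loss(y_true, y_pred, object_weight=10.0, eps=1e-7):
    """Binary cross-entropy where cells containing an object count `object_weight` times."""
    y_pred = np.clip(y_pred, eps, 1 - eps)
    bce = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    object_term = np.sum(bce * y_true) * object_weight   # cells with an object
    empty_term = np.sum(bce * (1 - y_true))              # cells without an object
    return object_term + empty_term

y_true = np.array([1.0, 0.0])   # first cell holds a particle, second is empty

miss = weighted_confidence_loss(y_true, np.array([0.1, 0.1]))         # object missed
false_alarm = weighted_confidence_loss(y_true, np.array([0.9, 0.9]))  # empty cell flagged
print(miss, false_alarm)
```

With the weight of 10, missing a real particle is penalized far more heavily than raising a false alarm in an empty cell, which counteracts the fact that most of the 900 cells are empty.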

import tensorflow as tf  
from tensorflow.keras.models import Sequential  
from tensorflow.keras.optimizers import RMSprop, Adam  
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dense, Flatten, Reshape, Resizing  
  
model = Sequential([  
    Conv2D(32, (3, 3), padding='same', activation='relu'),  # input is (240, 240, 3)
    MaxPooling2D(2, 2),  
    Conv2D(64, (3, 3), padding='same', activation='relu'),  
    MaxPooling2D(2, 2),  
    Conv2D(128, (3, 3), padding='same', activation='relu'),  
    MaxPooling2D(2, 2),  
    Conv2D(128, (3, 3), padding='same', activation='relu'),  
    Conv2D(256, (3, 3), padding='same', activation='relu'),  
    Conv2D(9, (1, 1), padding='same', activation='sigmoid'),  # output is (30, 30, 9)
])  
  
def yolo_loss(y_true, y_pred):  
    # Presence masks (1 where an object is present)
    object_mask = y_true[:,:,:, 0]  
    object_mask_2 = y_true[:,:,:, 3]  
    object_mask_3 = y_true[:,:,:, 6]  
  
    # Absence masks (1 where no object is present)
    no_object_mask = 1 - object_mask  
    no_object_mask_2 = 1 - object_mask_2  
    no_object_mask_3 = 1 - object_mask_3  
  
    # Confidence loss for cells that contain an object (binary cross-entropy)
    object_loss_1 = tf.keras.losses.binary_crossentropy(tf.expand_dims(y_true[:,:,:, 0], -1), tf.expand_dims(y_pred[:,:,:, 0], -1))  
    object_loss_2 = tf.keras.losses.binary_crossentropy(tf.expand_dims(y_true[:,:,:, 3], -1), tf.expand_dims(y_pred[:,:,:, 3], -1))  
    object_loss_3 = tf.keras.losses.binary_crossentropy(tf.expand_dims(y_true[:,:,:, 6], -1), tf.expand_dims(y_pred[:,:,:, 6], -1))  
    object_loss = tf.reduce_sum(object_loss_1 * object_mask) + tf.reduce_sum(object_loss_2 * object_mask_2) + tf.reduce_sum(object_loss_3 * object_mask_3)  
    object_loss *= 10  
  
    # Confidence loss for cells that contain no object (binary cross-entropy)
    no_object_loss_1 = tf.keras.losses.binary_crossentropy(tf.expand_dims(y_true[:,:,:, 0], -1), tf.expand_dims(y_pred[:,:,:, 0], -1))  
    no_object_loss_2 = tf.keras.losses.binary_crossentropy(tf.expand_dims(y_true[:,:,:, 3], -1), tf.expand_dims(y_pred[:,:,:, 3], -1))  
    no_object_loss_3 = tf.keras.losses.binary_crossentropy(tf.expand_dims(y_true[:,:,:, 6], -1), tf.expand_dims(y_pred[:,:,:, 6], -1))  
    no_object_loss = tf.reduce_sum(no_object_loss_1 * no_object_mask) + tf.reduce_sum(no_object_loss_2 * no_object_mask_2) + tf.reduce_sum(no_object_loss_3 * no_object_mask_3)  
  
    # Center-coordinate loss (only for cells that contain an object)
    bbox_loss = tf.reduce_sum(tf.square(y_true[:,:,:, 1:3] - y_pred[:,:,:, 1:3]) * tf.expand_dims(object_mask, -1))  
    bbox_loss += tf.reduce_sum(tf.square(y_true[:,:,:, 4:6] - y_pred[:,:,:, 4:6]) * tf.expand_dims(object_mask_2, -1))  
    bbox_loss += tf.reduce_sum(tf.square(y_true[:,:,:, 7:9] - y_pred[:,:,:, 7:9]) * tf.expand_dims(object_mask_3, -1))  
  
    # Total loss: object, no-object, and coordinate terms
    total_loss = object_loss + no_object_loss + bbox_loss  
      
    return total_loss  
  
model.compile(  
    optimizer=RMSprop(learning_rate=1e-3),   
    loss=yolo_loss  
)
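
# `callbacks` is not defined in the original listing; an empty list is assumed here.
# Add e.g. tf.keras.callbacks.EarlyStopping or ModelCheckpoint as needed.
callbacks = []

model.fit(
    X_train,
    y_train,
    epochs=300,
    batch_size=8,
    validation_data=(X_test, y_test),  # Keras documents this argument as a tuple
    verbose=1,
    callbacks=callbacks
)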

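4. Inference with the YOLO Model

When running inference with a YOLO model, we sometimes need non-maximum suppression (NMS) to filter out multiple bounding boxes that point to the same object. The algorithm is as follows:

  1. Sort the bounding boxes by detection confidence in descending order.
  2. Starting with the highest-confidence box, compute its intersection over union (IoU) with every other box, and remove any box whose IoU with it exceeds a chosen threshold. Repeat with the next highest-confidence box that remains.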
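The two steps above can be sketched in plain Python as follows. The box format `(x1, y1, x2, y2)`, the function names, and the 0.5 threshold are assumptions made for this example; the article's own pipeline does not use NMS.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Return the indices of the boxes that survive suppression."""
    # Step 1: sort box indices by confidence, highest first
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)   # Step 2: keep the most confident remaining box
        keep.append(best)
        # ... and drop every other box that overlaps it too much
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep

boxes = [(0, 0, 40, 40), (5, 5, 45, 45), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
kept = non_max_suppression(boxes, scores)
print(kept)   # [0, 2]
```

The second box overlaps the first with an IoU of about 0.62, so it is suppressed, while the distant third box is kept.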

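If each grid cell can contain multiple objects, non-maximum suppression must also be computed between objects detected in different grid cells.

That said, the problem of multiple bounding boxes per object is more pronounced when objects are large relative to the grid cells. In our case, a particle's diameter (20 pixels) is about the same as the width of a grid cell (20 pixels), so we can safely omit non-maximum suppression at inference time.

Here is our implementation: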

model = tf.keras.models.load_model("YOLO_particle_detector", custom_objects={'yolo_loss': yolo_loss})
frame_height, frame_width = 600, 600  
  
fourcc = cv2.VideoWriter_fourcc(*'XVID')  # or 'DIVX'
out = cv2.VideoWriter('inference_detections.mp4', fourcc, 50.0, (frame_width, frame_height))  
  
  
def convert_to_absolute_coordinates(predictions, cell_size=20):  
      
    absolute_predictions = []  
  
    for y_index, y_grid in enumerate(predictions[0]):  
          
        for x_index, x_grid in enumerate(y_grid):  
            if x_grid[0] > 0.5:  
                x_center = x_grid[1] * cell_size + (x_index * cell_size)  
                y_center = x_grid[2] * cell_size + (y_index * cell_size)  
                width = 40  
                height = 40  
                absolute_predictions.append({'x_center': x_center, 'y_center': y_center, 'width': width, 'height': height, 'confidence': x_grid[0], 'grid': (y_index,x_index)})  
                  
            if x_grid[3] > 0.5:  
                x_center = x_grid[4] * cell_size + (x_index * cell_size)  
                y_center = x_grid[5] * cell_size + (y_index * cell_size)  
                width = 40  
                height = 40  
                absolute_predictions.append({'x_center': x_center, 'y_center': y_center, 'width': width, 'height': height, 'confidence': x_grid[3], 'grid': (y_index,x_index)})  # confidence of the second slot is channel 3
                  
            if x_grid[6] > 0.5:  
                x_center = x_grid[7] * cell_size + (x_index * cell_size)  
                y_center = x_grid[8] * cell_size + (y_index * cell_size)  
                width = 40  
                height = 40  
                absolute_predictions.append({'x_center': x_center, 'y_center': y_center, 'width': width, 'height': height, 'confidence': x_grid[6], 'grid': (y_index,x_index)})  # confidence of the third slot is channel 6
              
    return absolute_predictions  
  
  
def detect(frame):  
      
    frame_resized = cv2.resize(frame,(240,240))  
    frame_normalized = frame_resized / 255  
      
    detections = model(np.expand_dims(frame_normalized,axis=0))  
    predictions = convert_to_absolute_coordinates(np.array(detections))  
      
    for prediction in predictions:      
        x = int(prediction['x_center'])  
        y = int(prediction['y_center'])  
        cv2.rectangle(frame, (x - 20, y - 20), (x + 20, y + 20), (0, 255, 0), 1)  
          
    return frame  
  
def draw_particles(particles):  
    frame = np.zeros((frame_height, frame_width, 3), dtype=np.uint8)  
    bounding_boxes = []  
    for particle in particles:  
        cv2.circle(frame, particle['position'], particle['radius'], particle['color'], -1)  
          
    return frame  
  
def test_particles():  
    particles = []  
    max_particles = 50  
    total_particles_created = 0  
    timer = 0     
  
    while len(particles) > 0 or total_particles_created < max_particles:  
        if total_particles_created < max_particles and timer % 5 == 0:  
            total_particles_created += 1  
            particles.append(create_particle())  
  
        for particle in particles[:]:  
            move_particle(particle)  
            if is_off_screen(particle):  
                particles.remove(particle)  
  
        frame = draw_particles(particles)  
        frame = detect(frame)  
          
        out.write(frame)  
        cv2.imshow('Frame', frame)  
        if cv2.waitKey(1) & 0xFF == ord('q'):  
            break  
        timer += 1  
  
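    out.release()
    cv2.destroyAllWindows()


# Run the inference simulation (the original listing defines the function but never calls it)
test_particles()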

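Simulation of jittering multicolored particles. The bounding boxes are inferred by the trained model. GIF by the author.

5. Introduction to Object Tracking

Now that we have a YOLO model for object detection, we can use it for the downstream task of object tracking. Here we build a custom object-tracking model from scratch, without referring to any prior work.

Our tracking model performs inference on two consecutive frames and their bounding-box detections. When a new, unlabeled object enters the scene, the model assigns it an arbitrary (or incremental) label in the later frame. In the next step, that later frame, now with all of its detections labeled, becomes the earlier frame. The new later frame is associated with the earlier frame through the labeled detections, and the model infers the labels of the still-unlabeled detections in the later frame. The cycle then repeats, propagating each unique object and its label across the canvas over time.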
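To make the frame-to-frame cycle concrete, the sketch below propagates labels through three toy frames. Note that it uses a plain nearest-neighbor match in place of the CNN described in this article; the function `propagate_labels`, the 20-pixel matching radius, and the sample coordinates are all hypothetical and serve only to show how the later frame becomes the earlier frame on each step.

```python
import math

def propagate_labels(prev, detections, next_label, max_dist=20.0):
    """prev: {label: (x, y)} for the earlier frame; detections: [(x, y)] in the later frame."""
    current, used = {}, set()
    for (x, y) in detections:
        best, best_d = None, max_dist
        for label, (px, py) in prev.items():
            d = math.hypot(x - px, y - py)
            if label not in used and d < best_d:
                best, best_d = label, d   # closest unused label so far
        if best is None:                  # nothing nearby: treat as a new object
            best, next_label = next_label, next_label + 1
        used.add(best)
        current[best] = (x, y)
    return current, next_label

frames = [[(10, 10)], [(14, 12), (300, 300)], [(18, 15), (305, 296)]]
labels, next_label, history = {}, 1, []
for detections in frames:
    # The labeled result of this step is the "earlier frame" of the next step
    labels, next_label = propagate_labels(labels, detections, next_label)
    history.append(dict(labels))

print(history[-1])   # {1: (18, 15), 2: (305, 296)}
```

The first particle keeps label 1 across all three frames, and the particle that enters in the second frame receives label 2 and keeps it afterwards.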

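A simple illustration of how object tracking works. The canvas is assumed to be an 8 x 8 grid. Image by the author.

We then design a multi-input CNN architecture that takes the two consecutive frames, the YOLO detection outputs, and a tensor holding the labels assigned to the detections in the earlier frame as inputs, and is trained to predict the detection labels of the later frame.

The diagram below shows a simple sketch of the architecture.

Note that the detection identities in the input (earlier frame) and the output (later frame) must be one-hot encoded, which also means we must fix the maximum number of object labels a frame can hold.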
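As a minimal sketch of this encoding, the snippet below turns the labels present in one grid cell into a fixed-length vector. The name `encode_labels` is hypothetical; the limit of 24 labels and the convention that label k occupies channel k - 1 are taken from the data-preparation code later in this article.

```python
import numpy as np

MAX_LABELS = 24   # maximum number of distinct labels on the canvas

def encode_labels(labels_in_cell, max_labels=MAX_LABELS):
    """Encode the 1-based labels found in one grid cell as a 0/1 vector."""
    vec = np.zeros(max_labels, dtype=np.float32)
    for label in labels_in_cell:
        vec[label - 1] = 1.0   # label k is stored in channel k - 1
    return vec

vec = encode_labels([5, 12])
print(np.nonzero(vec)[0])   # channels 4 and 11 are set
```

Because a cell can hold more than one particle, several channels may be set at once, so the vector is strictly speaking multi-hot rather than one-hot.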

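A simple sketch of the object-tracking CNN architecture. Image by the author.

6. Particle Simulation and Data Collection for Object Tracking

Data collection for object tracking is very similar to that for YOLO detection, except that particle labels must also be simulated. In our model, the first particle to appear gets index 1, and the index increases by one for each new particle. When a particle disappears from view, its label is recycled and placed in a queue. Queued labels are reused as soon as a new particle appears, instead of issuing a new incremental label.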
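The labeling rule can be isolated in a few lines. The class `LabelPool` below is a hypothetical helper written for illustration (the article's code keeps the same state in module-level lists); it issues incremental labels and hands out recycled ones first, in the order they were released.

```python
from collections import deque

class LabelPool:
    """Issue incremental labels, reusing recycled ones in first-in, first-out order."""

    def __init__(self):
        self.next_label = 1       # the first particle gets label 1
        self.recycled = deque()   # labels of particles that left the view

    def acquire(self):
        if self.recycled:
            return self.recycled.popleft()   # reuse the oldest recycled label
        label = self.next_label
        self.next_label += 1
        return label

    def release(self, label):
        self.recycled.append(label)

pool = LabelPool()
a, b, c = pool.acquire(), pool.acquire(), pool.acquire()   # 1, 2, 3
pool.release(b)       # particle 2 leaves the canvas
d = pool.acquire()    # the next particle reuses label 2
e = pool.acquire()    # queue is empty again, so a fresh label 4 is issued
print(a, b, c, d, e)  # 1 2 3 2 4
```

Recycling keeps the set of labels in use small, which matters here because the tracker's output has a fixed number of label channels.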

The code is shown below:

frame_height, frame_width = 600, 600  
  
fourcc = cv2.VideoWriter_fourcc(*'XVID')  
out = cv2.VideoWriter('simulation_tracking.mp4', fourcc, 40.0, (frame_width, frame_height))  
particles_disappeared = []  
particles_appeared = []  
particle_max_index = 0  
  
def create_particle_tracking():  
      
    global particles_disappeared, particles_appeared, particle_max_index  
      
    color = (random.randint(0, 255), random.randint(0, 255), random.randint(0, 255))  
    radius = 10  # particle radius in pixels
    uniform_random = np.random.uniform()  
      
    if not particles_disappeared:  
        particle_max_index += 1  
        particle_index = particle_max_index   
        particles_appeared.append(particle_index)  
    else:  
        particle_index = particles_disappeared[0]  
        particles_disappeared = particles_disappeared[1:]  
        particles_appeared.append(particle_index)  
      
      
    if uniform_random <= 0.25:  
        # Start from the "bottom" edge (y = radius; in OpenCV image coordinates this is drawn at the top)
        position = (random.randint(radius, frame_width - radius), radius)  
        angle = random.randint(0, 180)  
        start_pos = "bottom"  
    elif uniform_random <= 0.5:  
        # Start from the "top" edge (y = frame_height - radius; drawn at the bottom in image coordinates)
        position = (random.randint(radius, frame_width - radius), frame_height - radius)  
        angle = random.randint(180, 360)  
        start_pos = "top"  
    elif uniform_random <= 0.75:  
        # Start from the left edge
        position = (radius, random.randint(radius, frame_height - radius))  
        angle = random.randint(-90, 90)  
        start_pos = "left"  
    else:  
        # Start from the right edge
        position = (frame_width - radius, random.randint(radius, frame_height - radius))  
        angle = random.randint(90, 270)  
        start_pos = "right"  
      
    return {'position': position, 'color': color, 'radius': radius, 'angle': angle, 'start_pos': start_pos, 'particle_index': particle_index}  
  
  
def draw_frame_tracking(particles):  
    frame = np.zeros((frame_height, frame_width, 3), dtype=np.uint8)  
    bounding_boxes = []  
    for particle in particles:  
        cv2.circle(frame, particle['position'], particle['radius'], particle['color'], -1)  
        # Draw the bounding box and the label
        x, y = particle['position']  
        cv2.rectangle(frame, (x - 2* particle['radius'], y - 2 * particle['radius']), (x + 2 * particle['radius'], y + 2 * particle['radius']), (0, 255, 0), 1)  
        cv2.putText(frame,f"#{particle['particle_index']}", (x - particle['radius'] - 10,y - particle['radius'] - 15), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (255, 255, 255),1)  
        confidence = np.random.uniform(0.99,0.9999999)  
  
        bounding_boxes.append({'x_center': x, 'y_center': y, 'width': particle['radius']*4, 'height': particle['radius']*4, 'index': particle['particle_index'], 'confidence': confidence})  
          
    return frame, bounding_boxes  
  
total_data = []  
  
def simulate_particles_tracking():  
    particles = []  
    max_particles = 50  
    total_particles_created = 0  
    timer = 0     
  
    while len(particles) > 0 or total_particles_created < max_particles:  
        if total_particles_created < max_particles and timer % 5 == 0:  
            total_particles_created += 1  
            particles.append(create_particle_tracking())  
  
        for particle in particles[:]:  
            move_particle(particle)  
            if is_off_screen(particle):  
                particles_appeared.remove(particle['particle_index'])  
                particles_disappeared.append(particle['particle_index'])  
                particles.remove(particle)  
  
        frame, bounding_boxes = draw_frame_tracking(particles)  
        total_data.append({'frame': frame, 'boundary_boxes': bounding_boxes})  
        out.write(frame)  
        cv2.imshow('Frame', frame)  
        if cv2.waitKey(1) & 0xFF == ord('q'):  
            break  
        timer += 1  
  
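    cv2.destroyAllWindows()

    return total_data


for i in range(80):
    total_data = simulate_particles_tracking()

# Release the writer only after all runs. Releasing it inside the function
# would stop every run after the first from being written to the video.
out.release()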

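Once the raw data is collected, some more work is needed to convert it into the format expected by our object-tracking CNN. Note that the model takes several input arrays; the code below extracts them from the raw Python dictionaries we collected.

When splitting the formatted data into training and test sets, we split chronologically rather than randomly. Because the frames were collected sequentially, neighboring frames are highly correlated, and a chronological split keeps the test data as uncorrelated with the training data as possible.

In addition, to train on the large amount of simulated data, we wrap the training and test sets in generators so that training fits within limited GPU memory.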

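import tqdm  # progress bar used in convert_data_tracking


def resize(X_true):

    # float32 halves the memory footprint compared with NumPy's default float64
    resized_images = np.zeros((len(X_true), 240, 240, 3), dtype=np.float32)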
    for i in range(X_true.shape[0]):  
        resized_images[i] = cv2.resize(X_true[i], (240, 240))  
  
    resized_images = resized_images / 255.0      
    return resized_images  
  
def convert_data_tracking(total_data):  
  
    grid_size = 30  
    cell_size = 600 // grid_size  # each cell is 20x20 pixels

    # Initialize the arrays
    first_frames = resize(np.array([data['frame'] for data in total_data[:-1]]))  
    second_frames = resize(np.array([data['frame'] for data in total_data[1:]]))  
  
    X_true_frames = np.concatenate([first_frames, second_frames],axis=-1)  
  
    del first_frames  
    del second_frames  
      
    X_true_detection = np.zeros((len(total_data), grid_size, grid_size, 12))  # 12 values per cell: 9 detection channels plus 3 particle labels
    y_true = np.zeros((len(total_data)-1, grid_size, grid_size, 24))  
    X_true_first_indices = np.zeros((len(total_data)-1, grid_size, grid_size, 24))  
  
  
    for i, data in enumerate(tqdm.tqdm(total_data)):  # wrapping the list lets tqdm show the total
  
        boxes = data['boundary_boxes']  
        for box in boxes:  
            x_center = box['x_center']  
            y_center = box['y_center']  
            confidence = box['confidence']  
            particle_index = box['index']  
              
            # Determine the index of the grid cell
            grid_x = int(x_center / cell_size)
            grid_y = int(y_center / cell_size)

            if X_true_detection[i, grid_y, grid_x, 0] == 0:  # is the first slot free?
                X_true_detection[i, grid_y, grid_x, 0] = confidence  # particle present
                X_true_detection[i, grid_y, grid_x, 1] = (x_center % cell_size) / cell_size   # local x_center
                X_true_detection[i, grid_y, grid_x, 2] = (y_center % cell_size) / cell_size   # local y_center
                X_true_detection[i, grid_y, grid_x, 9] = particle_index

            elif X_true_detection[i, grid_y, grid_x, 3] == 0:  # is the second slot free?
                X_true_detection[i, grid_y, grid_x, 3] = confidence  # particle present
                X_true_detection[i, grid_y, grid_x, 4] = (x_center % cell_size) / cell_size   # local x_center
                X_true_detection[i, grid_y, grid_x, 5] = (y_center % cell_size) / cell_size   # local y_center
                X_true_detection[i, grid_y, grid_x, 10] = particle_index

            elif X_true_detection[i, grid_y, grid_x, 6] == 0:   # is the third slot free?
                X_true_detection[i, grid_y, grid_x, 6] = confidence  # particle present
                X_true_detection[i, grid_y, grid_x, 7] = (x_center % cell_size) / cell_size   # local x_center
                X_true_detection[i, grid_y, grid_x, 8] = (y_center % cell_size) / cell_size   # local y_center
                X_true_detection[i, grid_y, grid_x, 11] = particle_index
                  
      
    for i, data in enumerate(X_true_detection[1:,:,:,9:]):  
        for j, y_index in enumerate(data):  
            for k, x_index in enumerate(y_index):  
                for particle in x_index:  
                    if particle > 0:  
                        y_true[i, j, k, int(particle)-1] = 1  
  
    for i, data in enumerate(X_true_detection[:-1,:,:,9:]):  
        for j, y_index in enumerate(data):  
            for k, x_index in enumerate(y_index):  
                for particle in x_index:  
                    if particle > 0:  
                        X_true_first_indices[i, j, k, int(particle)-1] = 1  
                          
    X_true_first_detection = X_true_detection[:-1,:,:,:9]  
    X_true_second_detection = X_true_detection[1:,:,:,:9]  
  
    del X_true_detection  
  
    X_true_both_detection = np.concatenate([X_true_first_detection, X_true_first_indices],axis=-1)   
    X_true_both_detection = np.concatenate([X_true_both_detection, X_true_second_detection],axis=-1)   
  
    X_true = [X_true_frames, X_true_both_detection]  
  
    return X_true, y_true  
  
X_true, y_true = convert_data_tracking(total_data)
[X_true_frames, X_true_both_detection] =  X_true  
  
split_index = int(len(X_true_frames) * 0.97)  
  
X_frames_train = X_true_frames[:split_index]  
X_frames_test = X_true_frames[split_index:]  
  
X_detections_train = X_true_both_detection[:split_index]  
X_detections_test = X_true_both_detection[split_index:]  
  
y_train = y_true[:split_index]  
y_test = y_true[split_index:]
def train_generator():  
    for i in range(len(X_frames_train)):  
        yield ((X_frames_train[i], X_detections_train[i]), y_train[i])  
  
def test_generator():  
    for i in range(len(X_frames_test)):  
        yield ((X_frames_test[i], X_detections_test[i]), y_test[i])  
          
          
train_dataset = tf.data.Dataset.from_generator(
    train_generator,
    output_types=((tf.float32,tf.float32), np.float32),
    output_shapes=(((None,None,None), (None,None,None)), (None,None,None))  # adjust these shapes to match the actual data
)

# Create the test dataset
test_dataset = tf.data.Dataset.from_generator(
    test_generator,
    output_types=((tf.float32,tf.float32), np.float32),
    output_shapes=(((None, None, None), (None,None,None)), (None,None,None))  # adjust these shapes to match the actual data
)

# Set the batch size and enable prefetching
train_dataset = train_dataset.batch(32).prefetch(tf.data.AUTOTUNE)
test_dataset = test_dataset.batch(32).prefetch(tf.data.AUTOTUNE)

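7. Training the Object Tracking Model

Using the functional API, the object-tracking model can also be built and trained with the TensorFlow Keras framework, with the simple architecture shown below. As in the YOLO model, the output is a 30 x 30 grid, but the output tensor now has 24 channels, meaning the canvas can hold at most 24 particles at a time.

We also use a sigmoid activation at the output instead of softmax, because each grid cell can hold up to 3 particles, so several labels may be active in the same cell. For example, if all channels of a grid cell are 0 except the channels for labels 5 and 12, which are close to 1, then the particles labeled 5 and 12 are both present in that cell.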
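Reading labels back from such an output can be sketched as follows. The helper `decode_labels` and the 0.5 threshold are assumptions for illustration; the convention that channel index i corresponds to label i + 1 follows the data-preparation code above.

```python
import numpy as np

def decode_labels(cell_output, threshold=0.5, max_objects=3):
    """Return the labels whose sigmoid score exceeds `threshold`, most confident first."""
    order = np.argsort(cell_output)[::-1][:max_objects]   # top-scoring channels
    return [int(i) + 1 for i in order if cell_output[i] > threshold]

cell = np.zeros(24)
cell[4] = 0.97    # channel 4  -> label 5
cell[11] = 0.91   # channel 11 -> label 12
print(decode_labels(cell))   # [5, 12]
```

With softmax the 24 scores would have to sum to 1, so two labels could never both be close to 1 in the same cell; independent sigmoids remove that constraint.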

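In this framework the output tensor is sparse, and at inference time we only inspect the grid cells where the YOLO model detected an object. We therefore design a custom tracking loss that only considers grid cells containing at least one detected object, and then scales up the loss for the channels whose object label is present.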

from tensorflow.keras.optimizers import RMSprop, Adam  
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dense, Flatten, Reshape, Resizing, concatenate  
  
input_frames = tf.keras.Input(shape=X_frames_train.shape[1:])  
input_detections = tf.keras.Input(shape=X_detections_train.shape[1:])  
  
x = Conv2D(32, (3, 3), padding='same', activation='relu')(input_frames)   # input is (240, 240, 6): two stacked RGB frames
x = MaxPooling2D(2, 2)(x)  
x = Conv2D(64, (3, 3), padding='same', activation='relu')(x)  
x = MaxPooling2D(2, 2)(x)  
x = Conv2D(128, (3, 3), padding='same', activation='relu')(x)  
x = MaxPooling2D(2, 2)(x)  
  
x = concatenate([x, input_detections])  
x = Conv2D(256, (3, 3), padding='same', activation='relu')(x)  
  
x = Conv2D(256, (1, 1), padding='same', activation='relu')(x)  
x = Conv2D(128, (1, 1), padding='same', activation='relu')(x)  
output = Conv2D(24, (1, 1), padding='same', activation='sigmoid')(x)    # output is (30, 30, 24)
  
model = tf.keras.Model(inputs=[input_frames, input_detections], outputs=output)   
  
def tracking_loss(y_true, y_pred):  
    # Presence mask (1 where a label is present)
    object_mask = y_true  
  
    # Absence mask (1 where a label is absent)
    no_object_mask = 1 - y_true  
  
    mask = tf.reduce_max(y_true, axis=-1, keepdims=True)  
    mask = tf.cast(mask, dtype=tf.float32)  
    expanded_mask = tf.repeat(mask, repeats=24, axis=-1)  
  
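    # Object loss. Note that binary_crossentropy averages over the 24 channels,
    # so it yields one value per grid cell, which is then counted once per present label.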
    object_loss = tf.keras.losses.binary_crossentropy(y_true, y_pred)  
    object_loss = tf.reduce_sum(tf.expand_dims(object_loss,-1) * object_mask)   
    object_loss *= 5  
  
    # No-object loss: the same per-cell cross-entropy, counted only in grid cells that contain at least one object
    no_object_loss = tf.keras.losses.binary_crossentropy(y_true, y_pred)  
    no_object_loss = tf.reduce_sum(tf.expand_dims(no_object_loss,-1) * expanded_mask)   
  
    total_loss = object_loss + no_object_loss  
    return total_loss  
  
def thresholded_accuracy(y_true, y_pred):  
  
    threshold = 0.5  
    y_pred_thresholded = tf.cast(y_pred > threshold, tf.float32)  
    return tf.keras.metrics.binary_accuracy(y_true, y_pred_thresholded)  
  
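model.compile(
    optimizer=RMSprop(learning_rate=1e-3),
    loss=tracking_loss,
    metrics=[thresholded_accuracy]  # pass metrics as a list
)

# `callbacks` is not defined in the original listing; an empty list is assumed here.
callbacks = []

model.fit(
    train_dataset,
    epochs=300,
    validation_data=test_dataset,
    verbose=1,
    callbacks=callbacks
)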

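8. Inference with the Object Tracking Model

Finally, with trained YOLO and tracker models in hand, we reach the core of the project, which is also the trickiest part to code. Once the code logic ensures that the inputs and outputs of the multi-model system are wired up correctly, the hardest part is making sure that particle labels are initialized incrementally and are never duplicated within a single canvas.

Although the tracking model is successfully trained to propagate labels between consecutive frames, two issues have to be hard-coded manually at inference time:

  1. Force labels to increase incrementally as particles appear. If old particles have left the view, their recycled, queued labels are applied to the new particles first. When the tracking model is applied without any intervention, labels are assigned almost at random.
  2. Duplicate labels occur when new particles appear. When this happens, the tracking model's output must be corrected so that the new particle is labeled according to the scheme we want.

We were finally able to obtain the intended behavior with the detailed code below: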

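from collections import deque  # needed for consecutive_frames below

detection_model = tf.keras.models.load_model("YOLO_particle_detector", custom_objects={'yolo_loss': yolo_loss})
# thresholded_accuracy was compiled into the tracker as a custom metric, so it is passed here as well
tracking_model = tf.keras.models.load_model("YOLO_particle_tracker", custom_objects={'tracking_loss': tracking_loss, 'thresholded_accuracy': thresholded_accuracy})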
frame_height, frame_width = 600, 600  
  
fourcc = cv2.VideoWriter_fourcc(*'XVID')  
out = cv2.VideoWriter('inference_tracking.mp4', fourcc, 40.0, (frame_width, frame_height))  
particles_disappeared = []  
particles_appeared = []  
particles_appeared_pos = []  
particle_max_index = 0  
consecutive_frames = deque(maxlen=2)  
indices_matrix = []  
  
  
def convert_to_absolute_coordinates(predictions, cell_size=20):  
      
    absolute_predictions = []  
  
    for y_index, y_grid in enumerate(predictions[0]):  
          
        for x_index, x_grid in enumerate(y_grid):  
            if x_grid[0] > 0.5:  
                x_center = x_grid[1] * cell_size + (x_index * cell_size)  
                y_center = x_grid[2] * cell_size + (y_index * cell_size)  
                width = 40  
                height = 40  
                absolute_predictions.append({'x_center': x_center, 'y_center': y_center, 'width': width, 'height': height, 'confidence': x_grid[0], 'grid': (y_index,x_index)})  
                  
            if x_grid[3] > 0.5:  
                x_center = x_grid[4] * cell_size + (x_index * cell_size)  
                y_center = x_grid[5] * cell_size + (y_index * cell_size)  
                width = 40  
                height = 40  
                absolute_predictions.append({'x_center': x_center, 'y_center': y_center, 'width': width, 'height': height, 'confidence': x_grid[3], 'grid': (y_index,x_index)})  
                  
            if x_grid[6] > 0.5:  
                x_center = x_grid[7] * cell_size + (x_index * cell_size)  
                y_center = x_grid[8] * cell_size + (y_index * cell_size)  
                width = 40  
                height = 40  
                absolute_predictions.append({'x_center': x_center, 'y_center': y_center, 'width': width, 'height': height, 'confidence': x_grid[6], 'grid': (y_index,x_index)})  
              
    return absolute_predictions  
  
  
def convert_to_tracking_data(predictions, cell_size=20):  
      
    global particles_disappeared, particles_appeared, particle_max_index, particles_appeared_pos  
      
    absolute_predictions = []  
    current_particles = []  
  
    for y_index, y_grid in enumerate(predictions[0]):  
          
        for x_index, x_grid in enumerate(y_grid):  
              
              
            detection_indices = x_grid[9:]              
            sorted_detection_indices = np.argsort(detection_indices)[::-1]  
            sorted_probabilities = np.sort(detection_indices)[::-1]             
              
            # Each grid cell predicts up to three objects. Slot k stores
            # (confidence, x offset, y offset) in channels 3k..3k+2 and takes the
            # k-th most likely ID from sorted_detection_indices.
            for slot in range(3):
                if x_grid[3 * slot] > 0.5:

                    x_center = x_grid[3 * slot + 1] * cell_size + (x_index * cell_size)
                    y_center = x_grid[3 * slot + 2] * cell_size + (y_index * cell_size)
                    width = 40
                    height = 40
                    particle_index = sorted_detection_indices[slot] + 1

                    # An unseen ID that jumps ahead of the counter is replaced by a
                    # recycled ID if one is free, otherwise by the next unused ID
                    # (the counter wraps from 24 back to 1).
                    if particle_index not in particles_appeared:
                        if particle_index > particle_max_index + 1:
                            if not particles_disappeared:
                                particle_max_index += 1
                                if particle_max_index == 25:
                                    particle_max_index = 1
                                while particle_max_index in particles_appeared:
                                    particle_max_index += 1
                                    if particle_max_index == 25:
                                        particle_max_index = 1
                                particle_index = particle_max_index
                            else:
                                particle_index = particles_disappeared[0]
                                particles_disappeared = particles_disappeared[1:]

                    # An ID already in use is kept only if its last known position
                    # is within 20 px of this detection; otherwise try another ID.
                    while particle_index in particles_appeared:

                        x, y = particles_appeared_pos[particles_appeared.index(particle_index)]
                        distance = np.sqrt((x_center - x) ** 2 + (y_center - y) ** 2)

                        if distance < 20:
                            break
                        else:
                            if not particles_disappeared:
                                particle_max_index += 1
                                if particle_max_index == 25:
                                    particle_max_index = 1
                                particle_index = particle_max_index
                            else:
                                particle_index = particles_disappeared[0]
                                particles_disappeared = particles_disappeared[1:]

                    # Record the (possibly new) ID together with its latest position.
                    if particle_index not in particles_appeared:
                        particles_appeared.append(particle_index)
                        particles_appeared_pos.append((x_center, y_center))
                    else:
                        particles_appeared_pos[particles_appeared.index(particle_index)] = (x_center, y_center)

                    current_particles.append(particle_index)

                    absolute_predictions.append({'x_center': x_center, 'y_center': y_center, 'width': width, 'height': height, 'confidence': x_grid[3 * slot], 'grid': (y_index, x_index), 'index': particle_index})
                  
    # Iterate over a copy: the loop body removes items from particles_appeared.
    for particle_index in particles_appeared[:]:
        if particle_index not in current_particles:  
            particles_disappeared.append(particle_index)  
            particles_appeared_pos.pop(particles_appeared.index(particle_index))  
            particles_appeared.remove(particle_index)  
              
    return absolute_predictions  
  
  
def create_particle():  
    color = (random.randint(0, 255), random.randint(0, 255), random.randint(0, 255))  
    radius = 10  # particle radius
    uniform_random = np.random.uniform()  
      
    if uniform_random <= 0.50:  
        # Start at the "bottom" edge (y = radius, which is the top row in OpenCV image coordinates)
        position = (random.randint(radius, int((frame_width - radius)/2))-radius, radius)  
        angle = random.randint(0, 180)  
        start_pos = "bottom"  
    elif uniform_random <= 1.0:  
        # Start at the "top" edge (y = frame_height - radius, which is the bottom row in OpenCV image coordinates)
        position = (random.randint(int((frame_width - radius)/2)+radius, frame_width - radius), frame_height - radius)  
        angle = random.randint(180, 360)  
        start_pos = "top"  
      
    return {'position': position, 'color': color, 'radius': radius, 'angle': angle, 'start_pos': start_pos}  
  
  
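def move_particle(particle):
    # Lane simulation: the direction depends only on the starting edge;
    # particle['angle'] is not used here.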
    if particle['start_pos']=='bottom':  
        angle = 90  
    elif particle['start_pos']=='top':  
        angle = 270  
    elif particle['start_pos']=='left':  
        angle = 0  
    elif particle['start_pos']=='right':  
        angle = 180  
      
    angle_rad = np.deg2rad(angle)  
    dx = int(particle['radius'] * np.cos(angle_rad))  
    dy = int(particle['radius'] * np.sin(angle_rad))  
    x, y = particle['position']  
    particle['position'] = (x + dx, y + dy)  
    particle['displacement'] = np.sqrt(dx**2 + dy**2)  
  
  
def draw_particles(particles):  
    frame = np.zeros((frame_height, frame_width, 3), dtype=np.uint8)
    for particle in particles:  
        cv2.circle(frame, particle['position'], particle['radius'], particle['color'], -1)  
          
    return frame  
  
  
def detect(consecutive_frames):  
      
    global particles_disappeared, particles_appeared, particle_max_index, particles_appeared_pos  
      
    frame = consecutive_frames[0]  
  
    frame_resized = cv2.resize(frame,(240,240))  
    frame_normalized = frame_resized / 255  
      
    detections = detection_model(np.expand_dims(frame_normalized,axis=0))  
      
    detection_indices = np.zeros((1, 30, 30, 24))    
    predictions = convert_to_absolute_coordinates(np.array(detections))  
      
    for prediction in predictions:    
          
        if not particles_disappeared:  
            particle_max_index += 1  
            particle_index = particle_max_index  
            particles_appeared.append(particle_index)  
  
        else:  
            particle_index = particles_disappeared[0]  
            particles_disappeared = particles_disappeared[1:]  
            particles_appeared.append(particle_index)    
  
              
        (grid_y, grid_x) = prediction['grid']  
        x = int(prediction['x_center'])  
        y = int(prediction['y_center'])  
  
        particles_appeared_pos.append((x,y))  
          
        cv2.rectangle(frame, (x - 20, y - 20), (x + 20, y + 20), (0, 255, 0), 1)  
        cv2.putText(frame,f"#{particle_index}", (x -20, y - 25), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (255, 255, 255),1)  
          
        detection_indices[0,grid_y, grid_x, particle_index-1] = 1  
           
    X_first_detection = np.concatenate([detections, detection_indices],axis=-1)    
          
    return frame, X_first_detection     
      
      
def detect_and_track(consecutive_frames, X_first_detection):  
      
    global particles_disappeared, particles_appeared, particle_max_index, indices_matrix  
          
    first_frame = consecutive_frames[0]  
    second_frame = consecutive_frames[1]  
      
    first_frame_resized = cv2.resize(first_frame,(240,240))  
    first_frame_normalized = first_frame_resized / 255  
      
    second_frame_resized = cv2.resize(second_frame,(240,240))  
    second_frame_normalized = second_frame_resized / 255  
      
    second_detections = detection_model(np.expand_dims(second_frame_normalized,axis=0))  
    second_detection_indices = np.zeros((1, 30, 30, 24))    
      
    X_detections = np.concatenate([X_first_detection, second_detections], axis=-1)  
      
    X_frames = np.concatenate([first_frame_normalized, second_frame_normalized],axis=-1)  
    X_frames = np.expand_dims(X_frames,axis=0)  
  
    second_indices = tracking_model([X_frames, X_detections])  
  
    indices_matrix.append(second_indices)  
      
    second_data = np.concatenate([second_detections, second_indices], axis=-1)  
      
    predictions = convert_to_tracking_data(second_data)  
  
      
    for prediction in predictions:  
        (grid_y, grid_x) = prediction['grid']  
        x = int(prediction['x_center'])  
        y = int(prediction['y_center'])  
        particle_index = int(prediction['index'])  
        cv2.rectangle(second_frame, (x - 20, y - 20), (x + 20, y + 20), (0, 255, 0), 1)  
        cv2.putText(second_frame,f"#{particle_index}", (x -20, y - 25), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (255, 255, 255),1)  
          
        second_detection_indices[0,grid_y, grid_x, particle_index-1] = 1  
              
    X_first_detection = np.concatenate([second_detections, second_detection_indices],axis=-1)    
  
      
    return second_frame, X_first_detection  
  
  
def test_particles_tracking():  
    particles = []  
    max_particles = 50  
    total_particles_created = 0  
    timer = 0     
  
    while len(particles) > 0 or total_particles_created < max_particles:  
        if total_particles_created < max_particles and timer % 10 == 0:  
            total_particles_created += 1  
            particles.append(create_particle())  
  
        for particle in particles[:]:  
            move_particle(particle)  
            if is_off_screen(particle):  
                particles.remove(particle)  
  
        frame = draw_particles(particles)  
        consecutive_frames.append(frame)  
          
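        if len(consecutive_frames) == 1:
            # Only one frame buffered: run detection alone to seed the track IDs.
            frame_to_display, X_first_detection = detect(consecutive_frames)

        elif len(consecutive_frames) == 2:
            # Two frames buffered: detect on the newer frame and track it
            # against the detections carried over from the previous step.
            frame_to_display, X_first_detection = detect_and_track(
                consecutive_frames,
                X
            )

        # Carry the latest detections (with their IDs) into the next iteration.
        X = X_first_detection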
          
        out.write(frame_to_display)  
          
        cv2.imshow('Frame', frame_to_display)  
        if cv2.waitKey(1) & 0xFF == ord('q'):  
            break  
        timer += 1  
  
    out.release()  
    cv2.destroyAllWindows()  
  
  
test_particles_tracking()

Simulated detection and tracking of multicolored particles in an upper and a lower lane, much like cars on a road. GIF by the author.
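The ID bookkeeping in the listing above (reuse the ID of a particle that has disappeared, otherwise hand out a fresh one) can be pictured with a small helper. This is a simplified sketch, not the author's code: the class `IdPool` and its method names are hypothetical, and it raises an error when all IDs are taken instead of wrapping the counter as the listing does.

```python
from collections import deque


class IdPool:
    """Issue track IDs in 1..max_id, reusing released IDs before fresh ones."""

    def __init__(self, max_id=24):
        self.max_id = max_id      # 24 matches the 24 ID channels in the listing
        self.next_fresh = 1       # smallest ID that has never been issued
        self.active = set()       # IDs currently on screen
        self.released = deque()   # FIFO of freed IDs, like particles_disappeared

    def acquire(self):
        if self.released:
            track_id = self.released.popleft()
        elif self.next_fresh <= self.max_id:
            track_id = self.next_fresh
            self.next_fresh += 1
        else:
            raise RuntimeError("no free track IDs")
        self.active.add(track_id)
        return track_id

    def release(self, track_id):
        if track_id in self.active:
            self.active.remove(track_id)
            self.released.append(track_id)


pool = IdPool(max_id=3)
first, second, third = pool.acquire(), pool.acquire(), pool.acquire()
pool.release(second)      # particle 2 leaves the frame
reused = pool.acquire()   # the next new particle takes over ID 2
```

Keeping the freed IDs in a FIFO queue means the ID that has been unused the longest is recycled first, which is the same order the listing gets from slicing `particles_disappeared[1:]`.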

9. Conclusion and final thoughts

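The inference results are fairly robust, but one problem remains. It also troubles other object-tracking models and is likely an active research area: when two or more objects overlap, the tracking model is more prone to confusion.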

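For example, when object #4 and object #8 cross paths, their labels may be swapped once they move apart. This is a frustrating problem in object tracking. One way to address it is to feed the model more than two frames as input; however, if two objects stay close together for a long time before separating, this approach stops helping.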
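As a rough sketch of the multi-frame idea, the last N normalized frames can be stacked along the channel axis, mirroring how `detect_and_track` concatenates two frames. The helper name `stack_frames` and the choice N = 4 are assumptions made for illustration; a tracking model would have to be retrained to accept the wider input.

```python
from collections import deque

import numpy as np


def stack_frames(frames):
    """Concatenate H x W x 3 frames along the channel axis and add a batch axis."""
    stacked = np.concatenate(list(frames), axis=-1)
    return np.expand_dims(stacked, axis=0)


# Keep only the 4 most recent frames; older ones are dropped automatically.
history = deque(maxlen=4)
for _ in range(6):
    history.append(np.zeros((240, 240, 3)))  # stand-in for a normalized frame

batch = stack_frames(history)  # shape: (1, 240, 240, 12)
```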

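Another idea that occurred to me is to use a fixed-length embedding vector (representing the cropped object) that can be concatenated with an intermediate layer of the model. Whether this works remains to be seen, and I plan to experiment with it in the near future.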
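To make the notion of a per-object appearance vector concrete, here is a minimal sketch that uses the mean color of a crop as a stand-in for a learned embedding and matches a new detection to known IDs by cosine similarity. It is not the approach described above (nothing is fed into a model layer), and the functions `crop_embedding` and `best_matching_id` are hypothetical names introduced only for this example.

```python
import numpy as np


def crop_embedding(frame, x_center, y_center, half_size=20):
    """Unit-length mean color of the non-black pixels in a square crop."""
    height, width = frame.shape[:2]
    x0, x1 = max(0, x_center - half_size), min(width, x_center + half_size)
    y0, y1 = max(0, y_center - half_size), min(height, y_center + half_size)
    pixels = frame[y0:y1, x0:x1].reshape(-1, 3).astype(np.float64)
    foreground = pixels[pixels.sum(axis=1) > 0]  # ignore the black canvas
    if len(foreground) == 0:
        return np.zeros(3)
    mean_color = foreground.mean(axis=0)
    return mean_color / np.linalg.norm(mean_color)


def best_matching_id(embedding, known_embeddings):
    """Return the ID whose stored embedding is most similar (cosine similarity)."""
    return max(known_embeddings,
               key=lambda i: float(np.dot(embedding, known_embeddings[i])))


# Two objects with distinct colors, labeled #4 and #8 in the first frame.
frame = np.zeros((100, 100, 3), dtype=np.uint8)
frame[20:30, 20:30] = (255, 0, 0)
frame[60:70, 60:70] = (0, 0, 255)
known = {4: crop_embedding(frame, 25, 25), 8: crop_embedding(frame, 65, 65)}

# In the next frame the second object has moved; appearance still identifies it.
next_frame = np.zeros((100, 100, 3), dtype=np.uint8)
next_frame[40:50, 40:50] = (0, 0, 255)
matched = best_matching_id(crop_embedding(next_frame, 45, 45), known)
```

An appearance cue like this only helps when the objects look different; two particles of nearly the same color would still be ambiguous after crossing.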

[August 3, 2024] Update: I made some adjustments to the tracking algorithm and its training, and it now handles overlapping objects better:

Wiggling multicolored particles occasionally overlap, but tracking remains robust. GIF by the author.

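Finally, congratulations on completing this tutorial! I hope this article has guided you through coding YOLO and object tracking from scratch. Next, I plan to develop a vision model that uses a graph-based vision transformer (let's call it GraphViT). If you are interested in my work, stay tuned!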
