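Lightweight YOLO Detection with Object Tracking from Scratch

1. Introduction to YOLO Object Detection

Before YOLO, besides R-CNN, another simple framework was to slide a window across the entire input frame and feed each window, one at a time, into a single CNN. This naive approach is easy to implement but computationally very expensive, which makes it unsuitable for real-time object detection.

The YOLO model inverts the sliding-window framework by passing the entire input frame through a single convolutional neural network (CNN) once. It outputs a 3D tensor in which each spatial position corresponds to one cell of a grid that subdivides the original input frame. The channels of this tensor encode whether an object of interest was detected, the class of the detected object, and the position and dimensions of the object in each grid cell.

I illustrate the YOLO concept in the diagram below. Consider a picture of cars on a road that we want to split into a 3 x 3 grid. We would then build a CNN whose output has dimensions 3 x 3 x number of channels, where the channels of each grid cell form a vector:

[probability that an object's center is detected, X of the center, Y of the center, object height relative to the grid cell, object width relative to the grid cell, class A, class B, class C]

Note that the vector above only holds the information of a single object in a grid cell. We can allow a grid cell to contain multiple objects by extending the vector with the corresponding information for a second or further objects.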
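As a rough illustration of this encoding, the sketch below fills a 3 x 3 target grid for a single object. The image size, the car's position and size, and the helper name `encode_object` are all hypothetical values chosen for the example, and the center coordinates are assumed to be stored relative to the grid cell.

```python
# Hypothetical example: one car centered at pixel (400, 250) in a 600 x 600 image,
# 180 px wide and 120 px tall, belonging to class A (class index 0).
IMAGE_SIZE = 600
GRID = 3
CELL = IMAGE_SIZE // GRID  # 200 px per grid cell


def encode_object(x_center, y_center, width, height, class_index, num_classes=3):
    """Return (row, col, vector) describing a single object in the grid."""
    col = x_center // CELL
    row = y_center // CELL
    one_hot = [1.0 if i == class_index else 0.0 for i in range(num_classes)]
    vector = [
        1.0,                       # an object center falls inside this cell
        (x_center % CELL) / CELL,  # X of the center, relative to the cell
        (y_center % CELL) / CELL,  # Y of the center, relative to the cell
        height / CELL,             # height relative to the cell
        width / CELL,              # width relative to the cell
    ] + one_hot
    return row, col, vector


# 3 x 3 grid of 8-channel vectors, initialized to zeros (no object anywhere)
target = [[[0.0] * 8 for _ in range(GRID)] for _ in range(GRID)]
row, col, vector = encode_object(400, 250, width=180, height=120, class_index=0)
target[row][col] = vector
```

Every other cell keeps its all-zero vector, which is what tells the network that no object center falls there.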

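Image by author.

Next, we need a dataset containing these output tensors and the related information. To build it, we will use an OpenCV simulation to create our YOLO dataset.

2. Particle Simulation and YOLO Data Collection

Our dataset is based on an OpenCV simulation in which particles of various colors move across a black canvas from all directions. The dataset simplifies the YOLO detection task by enforcing a uniform particle radius, so the labeled width and height are the same throughout the dataset.

At the start of the simulation, particles begin to appear from all four edges and "swim" forward. At each step a particle picks a random heading within a 180-degree range pointing away from its starting edge, until it leaves the canvas across another edge. During the simulation, we collect every frame together with its associated bounding boxes.

Using a simulation frees us from the tedious process of manually labeling bounding boxes and speeds up testing of the YOLO model. The code used to generate the data is as follows: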

import random  
import time  
import cv2  
import numpy as np  
  
frame_height, frame_width = 600, 600  
  
fourcc = cv2.VideoWriter_fourcc(*'XVID')  
out = cv2.VideoWriter('simulation_detection.mp4', fourcc, 50.0, (frame_width, frame_height))  
  
def create_particle():  
    color = (random.randint(0, 255), random.randint(0, 255), random.randint(0, 255))  
    radius = 10  # particle radius in pixels
    uniform_random = np.random.uniform()  
      
    if uniform_random <= 0.25:  
        # Start at the "bottom" edge (y = radius, which is the top row in OpenCV image coordinates)
        position = (random.randint(radius, frame_width - radius), radius)  
        angle = random.randint(0, 180)  
        start_pos = "bottom"  
    elif uniform_random <= 0.5:  
        # Start at the "top" edge (y = frame_height - radius, the bottom row in OpenCV image coordinates)
        position = (random.randint(radius, frame_width - radius), frame_height - radius)  
        angle = random.randint(180, 360)  
        start_pos = "top"  
    elif uniform_random <= 0.75:  
        # Start at the left edge
        position = (radius, random.randint(radius, frame_height - radius))  
        angle = random.randint(-90, 90)  
        start_pos = "left"  
    else:  
        # Start at the right edge
        position = (frame_width - radius, random.randint(radius, frame_height - radius))  
        angle = random.randint(90, 270)  
        start_pos = "right"  
      
    return {'position': position, 'color': color, 'radius': radius, 'angle': angle, 'start_pos': start_pos}  
  
  
def move_particle(particle):  
      
    if particle['start_pos']=='bottom':  
        angle = random.randint(0, 180)  
    elif particle['start_pos']=='top':  
        angle = random.randint(180, 360)  
    elif particle['start_pos']=='left':  
        angle = random.randint(-90, 90)  
    elif particle['start_pos']=='right':  
        angle = random.randint(90, 270)  
      
    angle_rad = np.deg2rad(angle)  
    dx = int(particle['radius'] * np.cos(angle_rad))  
    dy = int(particle['radius'] * np.sin(angle_rad))  
    x, y = particle['position']  
    particle['position'] = (x + dx, y + dy)  
  
def is_off_screen(particle):  
    x, y = particle['position']  
    return x < 1 or x > frame_width-1 or y < 1 or y > frame_height-1  
  
def draw_frame(particles):  
    frame = np.zeros((frame_height, frame_width, 3), dtype=np.uint8)  
    bounding_boxes = []  
    for particle in particles:  
        cv2.circle(frame, particle['position'], particle['radius'], particle['color'], -1)  
        x, y = particle['position']  
        # cv2.rectangle(frame, (x - 2* particle['radius'], y - 2 * particle['radius']), (x + 2 * particle['radius'], y + 2 * particle['radius']), (0, 255, 0), 1)  
        # Box size matches the 40 x 40 px boxes used later for tracking and inference
        bounding_boxes.append({'x_center': x, 'y_center': y, 'width': particle['radius'] * 4, 'height': particle['radius'] * 4})
          
    return frame, bounding_boxes  
  
  
def simulate_particles(total_data):  
    particles = []  
    max_particles = 50  
    total_particles_created = 0  
    timer = 0   
  
    while len(particles) > 0 or total_particles_created < max_particles:  
        if total_particles_created < max_particles and timer % 5 == 0:  
            total_particles_created += 1  
            particles.append(create_particle())  
  
        for particle in particles[:]:  
            move_particle(particle)  
            if is_off_screen(particle):  
                particles.remove(particle)  
  
        frame, bounding_boxes = draw_frame(particles)  
        total_data.append({'frame': frame, 'boundary_boxes': bounding_boxes})  
        out.write(frame)  
        cv2.imshow('Frame', frame)  
        if cv2.waitKey(1) & 0xFF == ord('q'):  
            break  
        timer += 1  
  
    return total_data


total_data = []

for i in range(12):
    total_data = simulate_particles(total_data)

# Release the video writer once, after all runs, so that every run is written to the file
out.release()
cv2.destroyAllWindows()

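Simulation of colorful jittering particles. GIF by author.

After collecting the raw frames and their associated bounding boxes, we arrange the data into a 30 x 30 grid tensor with 9 output channels, where each grid cell can hold at most 3 particles. Because the particle width and height are fixed and only one type of object needs to be detected, the problem is greatly simplified. For each particle in each grid cell, we only need to consider this vector:

[probability that an object's center is detected, X of the center, Y of the center]

Each 30 x 30 grid tensor becomes a single data point in the y_true list. We also resize each 600 x 600 frame to a 240 x 240 data point for the X_true list.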
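The mapping from pixel coordinates to a grid cell and a local offset can be checked with a small worked example. The sketch below assumes the 600 x 600 canvas and 30 x 30 grid described above; the helper names `to_grid` and `to_pixels` and the sample point (215, 47) are hypothetical.

```python
GRID_SIZE = 30
CELL_SIZE = 600 // GRID_SIZE  # each cell covers 20 x 20 pixels


def to_grid(x_center, y_center):
    """Map a pixel center to (grid_y, grid_x, local_x, local_y)."""
    grid_x = x_center // CELL_SIZE
    grid_y = y_center // CELL_SIZE
    local_x = (x_center % CELL_SIZE) / CELL_SIZE  # offset inside the cell, in [0, 1)
    local_y = (y_center % CELL_SIZE) / CELL_SIZE
    return grid_y, grid_x, local_x, local_y


def to_pixels(grid_y, grid_x, local_x, local_y):
    """Inverse mapping, as needed when decoding predictions at inference time."""
    return (grid_x + local_x) * CELL_SIZE, (grid_y + local_y) * CELL_SIZE


grid_y, grid_x, local_x, local_y = to_grid(215, 47)
x_back, y_back = to_pixels(grid_y, grid_x, local_x, local_y)
```

Here the point (215, 47) lands in row 2, column 10 of the grid with local offsets 0.75 and 0.35, and the inverse mapping recovers the original pixel position.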

def convert_data(total_data):  
  
    grid_size = 30  
    cell_size = 600 // grid_size  # each cell is 20 x 20 pixels
  
    X_true = np.array([data['frame'] for data in total_data])  
    y_true = np.zeros((len(total_data), grid_size, grid_size, 9))    
  
    for i, data in enumerate(total_data):  
        frame = data['frame']  
        boxes = data['boundary_boxes']  
        for box in boxes:  
            x_center = box['x_center']  
            y_center = box['y_center']  
            width = box['width']  
            height = box['height']  
  
            # Determine the grid cell indices
            grid_x = int(x_center / cell_size)
            grid_y = int(y_center / cell_size)

            # Fill the first free slot (channels 0-2, 3-5, 6-8);
            # a fourth particle in the same cell is silently dropped
            for slot in (0, 3, 6):
                if y_true[i, grid_y, grid_x, slot] == 0:
                    y_true[i, grid_y, grid_x, slot] = 1  # particle present
                    y_true[i, grid_y, grid_x, slot + 1] = (x_center % cell_size) / cell_size  # local x_center
                    y_true[i, grid_y, grid_x, slot + 2] = (y_center % cell_size) / cell_size  # local y_center
                    break
  
  
    return X_true, y_true  
  
X_true, y_true = convert_data(total_data)

from sklearn.model_selection import train_test_split  
  
resized_images = np.zeros((len(X_true), 240, 240, 3))    
  
for i in range(X_true.shape[0]):  
    resized_images[i] = cv2.resize(X_true[i], (240, 240))  
  
resized_images = resized_images / 255.0  
X_true = resized_images  
  
X_train, X_test, y_train, y_test = train_test_split(  
    X_true,  
    y_true,  
    test_size=0.03,  
    random_state=42  
)

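3. Training the YOLO Model

Next, we instantiate our YOLO model, which is straightforward to implement with the TensorFlow Keras framework. In addition to the convolutional layers, we apply three 2x2 max-pooling layers, which reduce the 240x240 input to a 30x30 output.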

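The more interesting part is the design of the loss function. The confidence loss of grid cells that contain an object is amplified relative to that of empty grid cells, so the model prioritizes the cells that contain objects. The coordinate loss is computed only for cells that contain an object, so the coordinates predicted for empty cells are ignored.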
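To make the weighting concrete, here is a minimal pure-Python sketch for a single [confidence, x, y] slot. It is an illustration under simplifying assumptions rather than the training loss itself: the helper names `bce` and `cell_loss` and the sample values are hypothetical, and the factor of 10 mirrors the multiplier applied to occupied cells.

```python
import math


def bce(target, pred, eps=1e-7):
    """Binary cross-entropy for a single probability, clipped for stability."""
    pred = min(max(pred, eps), 1 - eps)
    return -(target * math.log(pred) + (1 - target) * math.log(1 - pred))


def cell_loss(true_slot, pred_slot, object_weight=10.0):
    """Loss of one [confidence, x, y] slot of one grid cell."""
    confidence_loss = bce(true_slot[0], pred_slot[0])
    if true_slot[0] == 1:
        # Occupied slot: amplified confidence loss plus squared coordinate error
        coord_loss = sum((t - p) ** 2 for t, p in zip(true_slot[1:], pred_slot[1:]))
        return object_weight * confidence_loss + coord_loss
    # Empty slot: plain confidence loss, predicted coordinates are ignored
    return confidence_loss


occupied = cell_loss([1, 0.5, 0.5], [0.6, 0.4, 0.7])
empty = cell_loss([0, 0.0, 0.0], [0.4, 0.9, 0.9])
```

Both slots are equally wrong about the confidence (each is 0.4 away from its target), yet the occupied slot contributes roughly ten times more to the total, and the wild coordinates predicted for the empty slot cost nothing.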

import tensorflow as tf  
from tensorflow.keras.models import Sequential  
from tensorflow.keras.optimizers import RMSprop, Adam  
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dense, Flatten, Reshape, Resizing  
  
model = Sequential([  
    Conv2D(32, (3, 3), padding='same', activation='relu'),  # input is (240, 240, 3)
    MaxPooling2D(2, 2),  
    Conv2D(64, (3, 3), padding='same', activation='relu'),  
    MaxPooling2D(2, 2),  
    Conv2D(128, (3, 3), padding='same', activation='relu'),  
    MaxPooling2D(2, 2),  
    Conv2D(128, (3, 3), padding='same', activation='relu'),  
    Conv2D(256, (3, 3), padding='same', activation='relu'),  
    Conv2D(9, (1, 1), padding='same', activation='sigmoid'),  # output is (30, 30, 9)
])  
  
def yolo_loss(y_true, y_pred):  
    # Presence masks (1 where an object is present), one per slot
    object_mask = y_true[:,:,:, 0]  
    object_mask_2 = y_true[:,:,:, 3]  
    object_mask_3 = y_true[:,:,:, 6]  
  
    # Absence masks (1 where no object is present)
    no_object_mask = 1 - object_mask  
    no_object_mask_2 = 1 - object_mask_2  
    no_object_mask_3 = 1 - object_mask_3  
  
    # Confidence loss for occupied slots (binary cross-entropy), weighted by 10
    object_loss_1 = tf.keras.losses.binary_crossentropy(tf.expand_dims(y_true[:,:,:, 0], -1), tf.expand_dims(y_pred[:,:,:, 0], -1))  
    object_loss_2 = tf.keras.losses.binary_crossentropy(tf.expand_dims(y_true[:,:,:, 3], -1), tf.expand_dims(y_pred[:,:,:, 3], -1))  
    object_loss_3 = tf.keras.losses.binary_crossentropy(tf.expand_dims(y_true[:,:,:, 6], -1), tf.expand_dims(y_pred[:,:,:, 6], -1))  
    object_loss = tf.reduce_sum(object_loss_1 * object_mask) + tf.reduce_sum(object_loss_2 * object_mask_2) + tf.reduce_sum(object_loss_3 * object_mask_3)  
    object_loss *= 10  
  
    # Confidence loss for empty slots (binary cross-entropy), unweighted
    no_object_loss_1 = tf.keras.losses.binary_crossentropy(tf.expand_dims(y_true[:,:,:, 0], -1), tf.expand_dims(y_pred[:,:,:, 0], -1))  
    no_object_loss_2 = tf.keras.losses.binary_crossentropy(tf.expand_dims(y_true[:,:,:, 3], -1), tf.expand_dims(y_pred[:,:,:, 3], -1))  
    no_object_loss_3 = tf.keras.losses.binary_crossentropy(tf.expand_dims(y_true[:,:,:, 6], -1), tf.expand_dims(y_pred[:,:,:, 6], -1))  
    no_object_loss = tf.reduce_sum(no_object_loss_1 * no_object_mask) + tf.reduce_sum(no_object_loss_2 * no_object_mask_2) + tf.reduce_sum(no_object_loss_3 * no_object_mask_3)  
  
    # Coordinate loss (squared error, only for slots that contain an object)
    bbox_loss = tf.reduce_sum(tf.square(y_true[:,:,:, 1:3] - y_pred[:,:,:, 1:3]) * tf.expand_dims(object_mask, -1))  
    bbox_loss += tf.reduce_sum(tf.square(y_true[:,:,:, 4:6] - y_pred[:,:,:, 4:6]) * tf.expand_dims(object_mask_2, -1))  
    bbox_loss += tf.reduce_sum(tf.square(y_true[:,:,:, 7:9] - y_pred[:,:,:, 7:9]) * tf.expand_dims(object_mask_3, -1))  
  
    # Total loss: object confidence + no-object confidence + coordinate loss
    total_loss = object_loss + no_object_loss + bbox_loss  
      
    return total_loss  
  
model.compile(  
    optimizer=RMSprop(learning_rate=1e-3),   
    loss=yolo_loss  
)

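# NOTE: `callbacks` is not defined in the original article. An empty list is
# used here as a placeholder assumption; a checkpoint callback would be needed
# to save the model that section 4 loads as "YOLO_particle_detector".
callbacks = []

model.fit(
    X_train,
    y_train,
    epochs=300,
    batch_size=8,
    validation_data=(X_test, y_test),  # Keras expects a tuple here
    verbose=1,
    callbacks=callbacks
)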

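4. Inference with the YOLO Model

When running inference with a YOLO model, we sometimes need to implement a procedure called non-max suppression (NMS) to filter out multiple bounding boxes that point to the same object. The algorithm is as follows:

Sort the bounding boxes by detection confidence in descending order.
Starting from the bounding box with the highest confidence and working down the list, compute the intersection over union (IoU) between it and every other remaining bounding box. If the IoU exceeds a chosen threshold, remove that other bounding box from the detections.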
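The two steps above can be sketched in plain Python as follows. This is a generic illustration of non-max suppression, not code from this project: the function names, the 0.5 threshold, and the sample detections are assumptions made for the example.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_center, y_center, width, height) boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    overlap_w = min(ax + aw / 2, bx + bw / 2) - max(ax - aw / 2, bx - bw / 2)
    overlap_h = min(ay + ah / 2, by + bh / 2) - max(ay - ah / 2, by - bh / 2)
    intersection = max(0.0, overlap_w) * max(0.0, overlap_h)
    union = aw * ah + bw * bh - intersection
    return intersection / union if union > 0 else 0.0


def non_max_suppression(detections, iou_threshold=0.5):
    """Keep the most confident box of every group of overlapping boxes."""
    remaining = sorted(detections, key=lambda d: d['confidence'], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)  # highest confidence among the remaining boxes
        kept.append(best)
        # Drop every box that overlaps the kept box too strongly
        remaining = [d for d in remaining
                     if iou(best['box'], d['box']) <= iou_threshold]
    return kept


detections = [
    {'box': (100, 100, 40, 40), 'confidence': 0.90},
    {'box': (104, 102, 40, 40), 'confidence': 0.75},  # near-duplicate of the first
    {'box': (300, 300, 40, 40), 'confidence': 0.80},
]
kept = non_max_suppression(detections)
```

In this example the second box overlaps the first with an IoU of about 0.75 and is suppressed, while the distant third box is kept.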

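If each grid cell can contain multiple objects, non-max suppression must also be computed across objects detected in different grid cells.

That said, the problem of multiple bounding boxes per object is more pronounced when objects are large relative to the grid cell. In our case, the particle width is almost the same as a grid cell, so we can safely omit non-max suppression during inference.

Our implementation is as follows: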

model = tf.keras.models.load_model("YOLO_particle_detector", custom_objects={'yolo_loss': yolo_loss})

frame_height, frame_width = 600, 600  
  
fourcc = cv2.VideoWriter_fourcc(*'XVID')  # or 'DIVX'
out = cv2.VideoWriter('inference_detections.mp4', fourcc, 50.0, (frame_width, frame_height))  
  
  
def convert_to_absolute_coordinates(predictions, cell_size=20):  
      
    absolute_predictions = []  
  
    for y_index, y_grid in enumerate(predictions[0]):  
          
        for x_index, x_grid in enumerate(y_grid):  
            if x_grid[0] > 0.5:  
                x_center = x_grid[1] * cell_size + (x_index * cell_size)  
                y_center = x_grid[2] * cell_size + (y_index * cell_size)  
                width = 40  
                height = 40  
                absolute_predictions.append({'x_center': x_center, 'y_center': y_center, 'width': width, 'height': height, 'confidence': x_grid[0], 'grid': (y_index,x_index)})  
                  
            if x_grid[3] > 0.5:  
                x_center = x_grid[4] * cell_size + (x_index * cell_size)  
                y_center = x_grid[5] * cell_size + (y_index * cell_size)  
                width = 40  
                height = 40  
                absolute_predictions.append({'x_center': x_center, 'y_center': y_center, 'width': width, 'height': height, 'confidence': x_grid[3], 'grid': (y_index,x_index)})
                  
            if x_grid[6] > 0.5:  
                x_center = x_grid[7] * cell_size + (x_index * cell_size)  
                y_center = x_grid[8] * cell_size + (y_index * cell_size)  
                width = 40  
                height = 40  
                absolute_predictions.append({'x_center': x_center, 'y_center': y_center, 'width': width, 'height': height, 'confidence': x_grid[6], 'grid': (y_index,x_index)})
              
    return absolute_predictions  
  
  
def detect(frame):  
      
    frame_resized = cv2.resize(frame,(240,240))  
    frame_normalized = frame_resized / 255  
      
    detections = model(np.expand_dims(frame_normalized,axis=0))  
    predictions = convert_to_absolute_coordinates(np.array(detections))  
      
    for prediction in predictions:      
        x = int(prediction['x_center'])  
        y = int(prediction['y_center'])  
        cv2.rectangle(frame, (x - 20, y - 20), (x + 20, y + 20), (0, 255, 0), 1)  
          
    return frame  
  
def draw_particles(particles):  
    frame = np.zeros((frame_height, frame_width, 3), dtype=np.uint8)  
    bounding_boxes = []  
    for particle in particles:  
        cv2.circle(frame, particle['position'], particle['radius'], particle['color'], -1)  
          
    return frame  
  
def test_particles():  
    particles = []  
    max_particles = 50  
    total_particles_created = 0  
    timer = 0     
  
    while len(particles) > 0 or total_particles_created < max_particles:  
        if total_particles_created < max_particles and timer % 5 == 0:  
            total_particles_created += 1  
            particles.append(create_particle())  
  
        for particle in particles[:]:  
            move_particle(particle)  
            if is_off_screen(particle):  
                particles.remove(particle)  
  
        frame = draw_particles(particles)  
        frame = detect(frame)  
          
        out.write(frame)  
        cv2.imshow('Frame', frame)  
        if cv2.waitKey(1) & 0xFF == ord('q'):  
            break  
        timer += 1  
  
    out.release()  
    cv2.destroyAllWindows()

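Simulation of colorful jittering particles. The bounding boxes are inferred by the trained model. GIF by author.

5. Introduction to Object Tracking

Now that we have a YOLO model for object detection, we can use it for the downstream task of object tracking. Here we build a custom object tracking model from scratch, without referring to any prior work.

Our object tracking model runs inference on two consecutive frames and their bounding box detections. When a new, unlabeled object enters detection, the model assigns it an arbitrary (or incremental) label in the later frame. In the next step, that frame, now with all of its detections labeled, becomes the earlier frame. The new later frame is associated with the earlier frame through its labeled detections, and the model infers labels for the unassigned detections in the later frame. The cycle continues in this way, propagating each unique object and its label across the canvas.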

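A simple illustration of how object tracking works. The canvas is assumed to be an 8 x 8 grid. Image by author.

We then design a multi-input CNN architecture that takes the consecutive frames, the YOLO detection outputs, and a tensor with the assigned detection labels (earlier frame) as inputs, and is trained to predict the detection labels of the output (later frame).

The diagram below shows a simple schematic of the architecture.

Note that the detection identities in the input (earlier frame) and the output (later frame) must be one-hot encoded, which also means we must fix the maximum number of object labels that a frame can hold.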
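A minimal sketch of this encoding for one grid cell is shown below. It assumes a maximum of 24 labels and that label n is stored at channel index n - 1; the helper names `encode_labels` and `decode_labels` are hypothetical. Because a cell may hold more than one particle, the vector can contain several ones.

```python
MAX_LABELS = 24  # assumed upper bound on simultaneous labels


def encode_labels(labels):
    """Turn the labels present in one grid cell into a 0/1 vector."""
    vector = [0] * MAX_LABELS
    for label in labels:
        if not 1 <= label <= MAX_LABELS:
            raise ValueError(f"label {label} is outside 1..{MAX_LABELS}")
        vector[label - 1] = 1  # label n is stored at index n - 1
    return vector


def decode_labels(vector, threshold=0.5):
    """Recover the labels whose channel value exceeds the threshold."""
    return [index + 1 for index, value in enumerate(vector) if value > threshold]


cell = encode_labels([5, 12])
recovered = decode_labels(cell)
empty_cell = encode_labels([])
```

Fixing `MAX_LABELS` in advance is what makes the channel dimension of the label tensor a constant, which a convolutional output layer requires.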

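A simple schematic of the object tracking CNN architecture. Image by author.

6. Particle Simulation and Data Collection for Object Tracking

Data collection for object tracking is very similar to that for YOLO object detection, but it additionally requires simulating the particle labels. In our model, the first particle to appear starts at index 1, and the index increases as new particles join. When a particle disappears from view, its label is recycled and queued. Queued labels are reused as soon as a new particle appears, instead of applying a new incremental label.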
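The labeling scheme described above amounts to a counter combined with a first-in, first-out queue of recycled labels. The sketch below isolates that logic; the class name `LabelPool` and its methods are hypothetical and only serve to illustrate the idea.

```python
from collections import deque


class LabelPool:
    """Hands out incremental labels and reuses labels of departed particles."""

    def __init__(self):
        self.max_label = 0       # highest label handed out so far
        self.recycled = deque()  # labels of particles that left the view

    def acquire(self):
        """Return a recycled label if one is queued, else the next new label."""
        if self.recycled:
            return self.recycled.popleft()
        self.max_label += 1
        return self.max_label

    def release(self, label):
        """Queue the label of a particle that has disappeared."""
        self.recycled.append(label)


pool = LabelPool()
first = pool.acquire()    # new label
second = pool.acquire()   # new label
third = pool.acquire()    # new label
pool.release(second)      # the second particle leaves the canvas
reused = pool.acquire()   # the queued label is reused first
fresh = pool.acquire()    # queue is empty again, so a new label is created
```

Reusing queued labels keeps the set of active labels small, which matters because the number of label channels in the tracking model is fixed.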

The code is shown below:

frame_height, frame_width = 600, 600  
  
fourcc = cv2.VideoWriter_fourcc(*'XVID')  
out = cv2.VideoWriter('simulation_tracking.mp4', fourcc, 40.0, (frame_width, frame_height))  
particles_disappeared = []  
particles_appeared = []  
particle_max_index = 0  
  
def create_particle_tracking():  
      
    global particles_disappeared, particles_appeared, particle_max_index  
      
    color = (random.randint(0, 255), random.randint(0, 255), random.randint(0, 255))  
    radius = 10  # particle radius in pixels
    uniform_random = np.random.uniform()  
      
    if not particles_disappeared:  
        particle_max_index += 1  
        particle_index = particle_max_index   
        particles_appeared.append(particle_index)  
    else:  
        particle_index = particles_disappeared[0]  
        particles_disappeared = particles_disappeared[1:]  
        particles_appeared.append(particle_index)  
      
      
    if uniform_random <= 0.25:  
        # Start at the "bottom" edge (y = radius, which is the top row in OpenCV image coordinates)
        position = (random.randint(radius, frame_width - radius), radius)  
        angle = random.randint(0, 180)  
        start_pos = "bottom"  
    elif uniform_random <= 0.5:  
        # Start at the "top" edge (y = frame_height - radius, the bottom row in OpenCV image coordinates)
        position = (random.randint(radius, frame_width - radius), frame_height - radius)  
        angle = random.randint(180, 360)  
        start_pos = "top"  
    elif uniform_random <= 0.75:  
        # Start at the left edge
        position = (radius, random.randint(radius, frame_height - radius))  
        angle = random.randint(-90, 90)  
        start_pos = "left"  
    else:  
        # Start at the right edge
        position = (frame_width - radius, random.randint(radius, frame_height - radius))  
        angle = random.randint(90, 270)  
        start_pos = "right"  
      
    return {'position': position, 'color': color, 'radius': radius, 'angle': angle, 'start_pos': start_pos, 'particle_index': particle_index}  
  
  
def draw_frame_tracking(particles):  
    frame = np.zeros((frame_height, frame_width, 3), dtype=np.uint8)  
    bounding_boxes = []  
    for particle in particles:  
        cv2.circle(frame, particle['position'], particle['radius'], particle['color'], -1)  
        # Draw the bounding box and the particle label
        x, y = particle['position']  
        cv2.rectangle(frame, (x - 2* particle['radius'], y - 2 * particle['radius']), (x + 2 * particle['radius'], y + 2 * particle['radius']), (0, 255, 0), 1)  
        cv2.putText(frame,f"#{particle['particle_index']}", (x - particle['radius'] - 10,y - particle['radius'] - 15), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (255, 255, 255),1)  
        confidence = np.random.uniform(0.99,0.9999999)  
  
        bounding_boxes.append({'x_center': x, 'y_center': y, 'width': particle['radius']*4, 'height': particle['radius']*4, 'index': particle['particle_index'], 'confidence': confidence})  
          
    return frame, bounding_boxes  
  
total_data = []  
  
def simulate_particles_tracking():  
    particles = []  
    max_particles = 50  
    total_particles_created = 0  
    timer = 0     
  
    while len(particles) > 0 or total_particles_created < max_particles:  
        if total_particles_created < max_particles and timer % 5 == 0:  
            total_particles_created += 1  
            particles.append(create_particle_tracking())  
  
        for particle in particles[:]:  
            move_particle(particle)  
            if is_off_screen(particle):  
                particles_appeared.remove(particle['particle_index'])  
                particles_disappeared.append(particle['particle_index'])  
                particles.remove(particle)  
  
        frame, bounding_boxes = draw_frame_tracking(particles)  
        total_data.append({'frame': frame, 'boundary_boxes': bounding_boxes})  
        out.write(frame)  
        cv2.imshow('Frame', frame)  
        if cv2.waitKey(1) & 0xFF == ord('q'):  
            break  
        timer += 1  
  
    return total_data


for i in range(80):
    total_data = simulate_particles_tracking()

# Release the video writer once, after all runs, so that every run is written to the file
out.release()
cv2.destroyAllWindows()

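After collecting the raw data, more work is needed to process it into the format required by our object tracking CNN architecture. Note that the model needs several input arrays; the code below extracts them from the raw Python dictionaries we collected.

When splitting the formatted data into training and test sets, we split chronologically rather than randomly. Because the frames were collected sequentially, a random split would place near-identical neighboring frames in both sets, whereas a chronological split keeps the test data uncorrelated with the training data.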
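A chronological split simply cuts the ordered sequence at one index. The sketch below uses integers as stand-ins for frames; the helper name `chronological_split` is hypothetical and the 0.97 fraction is an assumption chosen for the example.

```python
def chronological_split(samples, train_fraction=0.97):
    """Split ordered samples so that every test sample comes after all training samples."""
    split_index = int(len(samples) * train_fraction)
    return samples[:split_index], samples[split_index:]


frames = list(range(100))  # stand-in for 100 frames in recording order
train, test = chronological_split(frames)
```

With 100 ordered samples, the first 97 go to training and the last 3 to testing, so no test frame sits between two training frames.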

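In addition, to train on a large amount of simulated data, we convert the training and test sets into generators, so that training fits within limited GPU memory.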
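The idea behind a generator is that only one batch is materialized at a time. The sketch below shows this in plain Python with small lists standing in for the arrays; the function name `batch_generator` is hypothetical, and in practice a framework utility would play this role.

```python
def batch_generator(frames, detections, labels, batch_size):
    """Yield ((frames, detections), labels) batches lazily, one slice at a time."""
    for start in range(0, len(frames), batch_size):
        end = start + batch_size
        yield (frames[start:end], detections[start:end]), labels[start:end]


# Stand-in data: 10 samples identified by their index
frames = [f"frame_{i}" for i in range(10)]
detections = [f"detection_{i}" for i in range(10)]
labels = [f"label_{i}" for i in range(10)]

batches = list(batch_generator(frames, detections, labels, batch_size=4))
batch_sizes = [len(batch_labels) for _, batch_labels in batches]
```

The last batch is smaller than the others when the number of samples is not a multiple of the batch size.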

import tqdm  # progress bar used in convert_data_tracking

def resize(X_true):
  
    resized_images = np.zeros((len(X_true), 240, 240, 3))    
    for i in range(X_true.shape[0]):  
        resized_images[i] = cv2.resize(X_true[i], (240, 240))  
  
    resized_images = resized_images / 255.0      
    return resized_images  
  
def convert_data_tracking(total_data):  
  
    grid_size = 30  
    cell_size = 600 // grid_size  # each cell is 20 x 20 pixels
  
    # Initialize the arrays
    first_frames = resize(np.array([data['frame'] for data in total_data[:-1]]))  
    second_frames = resize(np.array([data['frame'] for data in total_data[1:]]))  
  
    X_true_frames = np.concatenate([first_frames, second_frames],axis=-1)  
  
    del first_frames  
    del second_frames  
      
    X_true_detection = np.zeros((len(total_data), grid_size, grid_size, 12))  # 12 channels per cell: 9 detection values + 3 particle indices
    y_true = np.zeros((len(total_data)-1, grid_size, grid_size, 24))  
    X_true_first_indices = np.zeros((len(total_data)-1, grid_size, grid_size, 24))  
  
  
    for i, data in tqdm.tqdm(enumerate(total_data)):  
  
        boxes = data['boundary_boxes']  
        for box in boxes:  
            x_center = box['x_center']  
            y_center = box['y_center']  
            confidence = box['confidence']  
            particle_index = box['index']  
              
            # Determine the grid cell indices
            grid_x = int(x_center / cell_size)
            grid_y = int(y_center / cell_size)

            # Fill the first free slot; channels 9-11 hold the particle index of slots 1-3
            for slot, index_channel in ((0, 9), (3, 10), (6, 11)):
                if X_true_detection[i, grid_y, grid_x, slot] == 0:
                    X_true_detection[i, grid_y, grid_x, slot] = confidence  # particle present
                    X_true_detection[i, grid_y, grid_x, slot + 1] = (x_center % cell_size) / cell_size  # local x_center
                    X_true_detection[i, grid_y, grid_x, slot + 2] = (y_center % cell_size) / cell_size  # local y_center
                    X_true_detection[i, grid_y, grid_x, index_channel] = particle_index
                    break
                  
      
    for i, data in enumerate(X_true_detection[1:,:,:,9:]):  
        for j, y_index in enumerate(data):  
            for k, x_index in enumerate(y_index):  
                for particle in x_index:  
                    if particle > 0:  
                        y_true[i, j, k, int(particle)-1] = 1  
  
    for i, data in enumerate(X_true_detection[:-1,:,:,9:]):  
        for j, y_index in enumerate(data):  
            for k, x_index in enumerate(y_index):  
                for particle in x_index:  
                    if particle > 0:  
                        X_true_first_indices[i, j, k, int(particle)-1] = 1  
                          
    X_true_first_detection = X_true_detection[:-1,:,:,:9]  
    X_true_second_detection = X_true_detection[1:,:,:,:9]  
  
    del X_true_detection  
  
    X_true_both_detection = np.concatenate([X_true_first_detection, X_true_first_indices],axis=-1)   
    X_true_both_detection = np.concatenate([X_true_both_detection, X_true_second_detection],axis=-1)   
  
    X_true = [X_true_frames, X_true_both_detection]  
  
    return X_true, y_true  
  
X_true, y_true = convert_data_tracking(total_data)

[X_true_frames, X_true_both_detection] =  X_true  
  
split_index = int(len(X_true_frames) * 0.97)  
  
X_frames_train = X_true_frames[:split_index]  
X_frames_test = X_true_frames[split_index:]  
  
X_detections_train = X_true_both_detection[:split_index]  
X_detections_test = X_true_both_detection[split_index:]  
  
y_train = y_true[:split_index]  
y_test = y_true[split_index:]

def train_generator():  
    for i in range(len(X_frames_train)):  
        yield ((X_frames_train[i], X_detections_train[i]), y_train[i])  
  
def test_generator():  
    for i in range(len(X_frames_test)):  
        yield ((X_frames_test[i], X_detections_test[i]), y_test[i])  
          
          
train_dataset = tf.data.Dataset.from_generator(  
    train_generator,  
    output_types=((tf.float32,tf.float32), np.float32),  
    output_shapes=(((None,None,None), (None,None,None)), (None,None,None))  # adjust these shapes to match the actual data
)  
  
# Create the test dataset
test_dataset = tf.data.Dataset.from_generator(  
    test_generator,  
    output_types=((tf.float32,tf.float32), np.float32),  
    output_shapes=(((None, None, None), (None,None,None)), (None,None,None))  # adjust these shapes to match the actual data
)  
  
# Set the batch size and enable prefetching
train_dataset = train_dataset.batch(32).prefetch(tf.data.AUTOTUNE)  
test_dataset = test_dataset.batch(32).prefetch(tf.data.AUTOTUNE)

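7. Training the Object Tracking Model

Using the functional API, the object tracking model can also be built and trained with the TensorFlow Keras framework, with the simple architecture shown below. The output resembles that of the YOLO model, a 30 by 30 grid, except that the output tensor now has 24 channels, meaning the canvas can hold at most 24 particles.

In addition, we use a sigmoid activation rather than a softmax activation in the output, because each grid cell can hold up to 3 particles. For example, if all channels of a grid cell are 0 except for the 5th and 12th channels, which are close to 1, then the particles labeled 5 and 12 are present in that grid cell.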
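The difference between the two activations can be seen on a single grid cell. The sketch below applies both to hypothetical logits for a cell that contains the particles labeled 5 and 12, assuming label n lives at channel index n - 1; the logit values are made up for illustration.

```python
import math


def sigmoid(logits):
    """Independent probability per channel."""
    return [1 / (1 + math.exp(-z)) for z in logits]


def softmax(logits):
    """Probabilities that compete with each other and sum to 1."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits: strongly positive for labels 5 and 12, negative elsewhere
logits = [-6.0] * 24
logits[4] = 6.0
logits[11] = 6.0

sigmoid_out = sigmoid(logits)
softmax_out = softmax(logits)

labels_sigmoid = [i + 1 for i, p in enumerate(sigmoid_out) if p > 0.5]
labels_softmax = [i + 1 for i, p in enumerate(softmax_out) if p > 0.5]
```

With sigmoid, both channels independently exceed 0.5, so both labels are recovered. With softmax, the two channels split the probability mass and each ends up just below 0.5, so a 0.5 threshold recovers neither.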

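In this framework, the output tensor is sparse, and during inference we only inspect the grid cells in which the YOLO model detected an object. We therefore design a custom tracking loss that only considers the loss values of grid cells containing at least one detected object, and then scales up the loss for the channels in which an object label is present.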

from tensorflow.keras.optimizers import RMSprop, Adam  
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dense, Flatten, Reshape, Resizing, concatenate  
  
input_frames = tf.keras.Input(shape=X_frames_train.shape[1:])  
input_detections = tf.keras.Input(shape=X_detections_train.shape[1:])  
  
x = Conv2D(32, (3, 3), padding='same', activation='relu')(input_frames)   # input is (240, 240, 6): two RGB frames stacked along the channel axis
x = MaxPooling2D(2, 2)(x)  
x = Conv2D(64, (3, 3), padding='same', activation='relu')(x)  
x = MaxPooling2D(2, 2)(x)  
x = Conv2D(128, (3, 3), padding='same', activation='relu')(x)  
x = MaxPooling2D(2, 2)(x)  
  
x = concatenate([x, input_detections])  
x = Conv2D(256, (3, 3), padding='same', activation='relu')(x)  
  
x = Conv2D(256, (1, 1), padding='same', activation='relu')(x)  
x = Conv2D(128, (1, 1), padding='same', activation='relu')(x)  
output = Conv2D(24, (1, 1), padding='same', activation='sigmoid')(x)    # output is (30, 30, 24)
  
model = tf.keras.Model(inputs=[input_frames, input_detections], outputs=output)   
  
def tracking_loss(y_true, y_pred):  
    # Presence mask (1 in the channels where an object label is present)
    object_mask = y_true  
  
    # Absence mask (1 in the channels where no object label is present)
    no_object_mask = 1 - y_true  
  
    mask = tf.reduce_max(y_true, axis=-1, keepdims=True)  
    mask = tf.cast(mask, dtype=tf.float32)  
    expanded_mask = tf.repeat(mask, repeats=24, axis=-1)  
  
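    # Object loss (binary cross-entropy for cells with objects).
    # Keras averages the cross-entropy over the channel axis, so the mask scales a per-cell mean.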
    object_loss = tf.keras.losses.binary_crossentropy(y_true, y_pred)  
    object_loss = tf.reduce_sum(tf.expand_dims(object_loss,-1) * object_mask)   
    object_loss *= 5  
  
    # No-object loss (binary cross-entropy restricted to grid cells that contain at least one object)
    no_object_loss = tf.keras.losses.binary_crossentropy(y_true, y_pred)  
    no_object_loss = tf.reduce_sum(tf.expand_dims(no_object_loss,-1) * expanded_mask)   
  
    total_loss = object_loss + no_object_loss  
    return total_loss  
  
def thresholded_accuracy(y_true, y_pred):  
  
    threshold = 0.5  
    y_pred_thresholded = tf.cast(y_pred > threshold, tf.float32)  
    return tf.keras.metrics.binary_accuracy(y_true, y_pred_thresholded)  
  
model.compile(  
    optimizer=RMSprop(learning_rate=1e-3),   
    loss=tracking_loss,  
    metrics=[thresholded_accuracy]
)  
  
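# NOTE: `callbacks` is not defined in the original article. An empty list is
# used here as a placeholder assumption; a checkpoint callback would be needed
# to save the model that section 8 loads as "YOLO_particle_tracker".
callbacks = []

model.fit(
    train_dataset,
    epochs=300,
    validation_data=test_dataset,
    verbose=1,
    callbacks=callbacks
)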

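8. Inference with the Object Tracking Model

Finally, with trained YOLO and tracker models in hand, we arrive at the core of the project, which is also the trickiest part to code. Once the code logic that wires up the inputs and outputs of the combined detector-and-tracker system is in place, the hardest part is ensuring that particle labels are initialized incrementally and are not duplicated within a single canvas.

Although the tracking model was trained successfully to propagate labels between consecutive frames, two issues need to be hard-coded manually during inference:

Forcing labels to increase incrementally as particles appear. If an old particle has left the view, the recycled, queued label is applied to the new particle. When the tracking model is applied without any intervention, labels are assigned almost at random.
Duplicate labels occur when new particles appear. When this happens, the tracking model's output must be readjusted so that the new particle is labeled according to our intended scheme.

We were finally able to obtain the intended model behavior by applying the detailed code below: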

from collections import deque

detection_model = tf.keras.models.load_model("YOLO_particle_detector", custom_objects={'yolo_loss': yolo_loss})
tracking_model = tf.keras.models.load_model("YOLO_particle_tracker", custom_objects={'tracking_loss': tracking_loss})

frame_height, frame_width = 600, 600  
  
fourcc = cv2.VideoWriter_fourcc(*'XVID')  
out = cv2.VideoWriter('inference_tracking.mp4', fourcc, 40.0, (frame_width, frame_height))  
particles_disappeared = []  
particles_appeared = []  
particles_appeared_pos = []  
particle_max_index = 0  
consecutive_frames = deque(maxlen=2)  
indices_matrix = []  
  
  
def convert_to_absolute_coordinates(predictions, cell_size=20):  
      
    absolute_predictions = []  
  
    for y_index, y_grid in enumerate(predictions[0]):  
          
        for x_index, x_grid in enumerate(y_grid):  
            if x_grid[0] > 0.5:  
                x_center = x_grid[1] * cell_size + (x_index * cell_size)  
                y_center = x_grid[2] * cell_size + (y_index * cell_size)  
                width = 40  
                height = 40  
                absolute_predictions.append({'x_center': x_center, 'y_center': y_center, 'width': width, 'height': height, 'confidence': x_grid[0], 'grid': (y_index,x_index)})  
                  
            if x_grid[3] > 0.5:  
                x_center = x_grid[4] * cell_size + (x_index * cell_size)  
                y_center = x_grid[5] * cell_size + (y_index * cell_size)  
                width = 40  
                height = 40  
                absolute_predictions.append({'x_center': x_center, 'y_center': y_center, 'width': width, 'height': height, 'confidence': x_grid[3], 'grid': (y_index,x_index)})  
                  
            if x_grid[6] > 0.5:  
                x_center = x_grid[7] * cell_size + (x_index * cell_size)  
                y_center = x_grid[8] * cell_size + (y_index * cell_size)  
                width = 40  
                height = 40  
                absolute_predictions.append({'x_center': x_center, 'y_center': y_center, 'width': width, 'height': height, 'confidence': x_grid[6], 'grid': (y_index,x_index)})  
              
    return absolute_predictions  
  
  
def convert_to_tracking_data(predictions, cell_size=20):  
      
    global particles_disappeared, particles_appeared, particle_max_index, particles_appeared_pos  
      
    absolute_predictions = []  
    current_particles = []  
  
    for y_index, y_grid in enumerate(predictions[0]):  
          
        for x_index, x_grid in enumerate(y_grid):  
              
              
            detection_indices = x_grid[9:]              
            sorted_detection_indices = np.argsort(detection_indices)[::-1]  
            sorted_probabilities = np.sort(detection_indices)[::-1]             
              
            if x_grid[0] > 0.5:  
  
                x_center = x_grid[1] * cell_size + (x_index * cell_size)  
                y_center = x_grid[2] * cell_size + (y_index * cell_size)  
                width = 40  
                height = 40  
                particle_index = sorted_detection_indices[0] + 1  
  
                if particle_index not in particles_appeared:  
                    if particle_index > particle_max_index + 1:  
                        if not particles_disappeared:  
                            particle_max_index += 1  
                            if particle_max_index == 25:  
                                particle_max_index = 1  
                            while particle_max_index in particles_appeared:  
                                particle_max_index += 1  
                                if particle_max_index == 25:  
                                    particle_max_index = 1  
                            particle_index = particle_max_index  
                        else:  
                            particle_index = particles_disappeared[0]  
                            particles_disappeared = particles_disappeared[1:]  
  
                while particle_index in particles_appeared:  
  
                    x, y = particles_appeared_pos[particles_appeared.index(particle_index)]  
                    distance = np.sqrt((x_center-x)**2 + (y_center-y)**2)  
  
                    if distance < 20:  
                        break  
                    else:  
                        if not particles_disappeared:  
                            particle_max_index += 1  
                            if particle_max_index == 25:  
                                particle_max_index = 1  
                            particle_index = particle_max_index                 
                        else:  
                            particle_index = particles_disappeared[0]  
                            particles_disappeared = particles_disappeared[1:]  
  
                if particle_index not in particles_appeared:  
                    particles_appeared.append(particle_index)       
                    particles_appeared_pos.append((x_center,y_center))   
                else:  
                    particles_appeared_pos[particles_appeared.index(particle_index)] = (x_center,y_center)
  
                current_particles.append(particle_index)   
                  
                  
                absolute_predictions.append({'x_center': x_center, 'y_center': y_center, 'width': width, 'height': height, 'confidence': x_grid[0], 'grid': (y_index,x_index), 'index': particle_index})  
                  
            if x_grid[3] > 0.5:  
                x_center = x_grid[4] * cell_size + (x_index * cell_size)  
                y_center = x_grid[5] * cell_size + (y_index * cell_size)  
                width = 40  
                height = 40  
                particle_index = sorted_detection_indices[1] + 1  
  
                if particle_index not in particles_appeared:  
                    if particle_index > particle_max_index + 1:  
                        if not particles_disappeared:  
                            particle_max_index += 1  
                            if particle_max_index == 25:  
                                particle_max_index = 1  
                            while particle_max_index in particles_appeared:  
                                particle_max_index += 1  
                                if particle_max_index == 25:  
                                    particle_max_index = 1  
                            particle_index = particle_max_index  
                        else:  
                            particle_index = particles_disappeared[0]  
                            particles_disappeared = particles_disappeared[1:]  
  
                while particle_index in particles_appeared:  
  
                    x, y = particles_appeared_pos[particles_appeared.index(particle_index)]  
                    distance = np.sqrt((x_center-x)**2 + (y_center-y)**2)  
  
                    if distance < 20:  
                        break  
                    else:  
                        if not particles_disappeared:  
                            particle_max_index += 1  
                            if particle_max_index == 25:  
                                particle_max_index = 1  
                            particle_index = particle_max_index                
                        else:  
                            particle_index = particles_disappeared[0]  
                            particles_disappeared = particles_disappeared[1:]  
                  
                if particle_index not in particles_appeared:  
                    particles_appeared.append(particle_index)       
                    particles_appeared_pos.append((x_center,y_center))   
                else:  
                    particles_appeared_pos[particles_appeared.index(particle_index)] = (x_center,y_center)
  
  
                current_particles.append(particle_index)                    
                  
                absolute_predictions.append({'x_center': x_center, 'y_center': y_center, 'width': width, 'height': height, 'confidence': x_grid[3], 'grid': (y_index,x_index), 'index': particle_index})  
                  
            if x_grid[6] > 0.5:  
                x_center = x_grid[7] * cell_size + (x_index * cell_size)  
                y_center = x_grid[8] * cell_size + (y_index * cell_size)  
                width = 40  
                height = 40  
                particle_index = sorted_detection_indices[2] + 1  
  
                if particle_index not in particles_appeared:  
                    if particle_index > particle_max_index + 1:  
                        if not particles_disappeared:  
                            particle_max_index += 1  
                            if particle_max_index == 25:  
                                particle_max_index = 1  
                            while particle_max_index in particles_appeared:  
                                particle_max_index += 1  
                                if particle_max_index == 25:  
                                    particle_max_index = 1  
                            particle_index = particle_max_index  
                        else:  
                            particle_index = particles_disappeared[0]  
                            particles_disappeared = particles_disappeared[1:]  
         
                while particle_index in particles_appeared:  
  
                    x, y = particles_appeared_pos[particles_appeared.index(particle_index)]  
                    distance = np.sqrt((x_center-x)**2 + (y_center-y)**2)  
  
                    if distance < 20:  
                        break  
                    else:  
                        if not particles_disappeared:  
                            particle_max_index += 1  
                            if particle_max_index == 25:  
                                particle_max_index = 1  
                            particle_index = particle_max_index                   
                        else:  
                            particle_index = particles_disappeared[0]  
                            particles_disappeared = particles_disappeared[1:]  
                  
                if particle_index not in particles_appeared:  
                    particles_appeared.append(particle_index)       
                    particles_appeared_pos.append((x_center,y_center))   
                else:  
                    particles_appeared_pos[particles_appeared.index(particle_index)] = (x_center,y_center)
  
  
                current_particles.append(particle_index)                            
                  
                absolute_predictions.append({'x_center': x_center, 'y_center': y_center, 'width': width, 'height': height, 'confidence': x_grid[6], 'grid': (y_index,x_index), 'index': particle_index})  
                  
    for particle_index in particles_appeared:  
        if particle_index not in current_particles:  
            particles_disappeared.append(particle_index)  
            particles_appeared_pos.pop(particles_appeared.index(particle_index))  
            particles_appeared.remove(particle_index)  
              
    return absolute_predictions  
  
  
def create_particle():
    color = (random.randint(0, 255), random.randint(0, 255), random.randint(0, 255))
    radius = 10  # particle radius
    uniform_random = np.random.uniform()

    if uniform_random <= 0.50:
        # Start from the bottom, in the left half of the frame
        position = (random.randint(radius, int((frame_width - radius)/2))-radius, radius)
        angle = random.randint(0, 180)
        start_pos = "bottom"
    else:
        # Start from the top, in the right half of the frame
        position = (random.randint(int((frame_width - radius)/2)+radius, frame_width - radius), frame_height - radius)
        angle = random.randint(180, 360)
        start_pos = "top"

    return {'position': position, 'color': color, 'radius': radius, 'angle': angle, 'start_pos': start_pos}
  
  
def move_particle(particle):
    # Particles travel straight along their lane: the direction depends only
    # on the edge they started from, not on particle['angle'].
    if particle['start_pos']=='bottom':
        angle = 90
    elif particle['start_pos']=='top':
        angle = 270
    elif particle['start_pos']=='left':
        angle = 0
    elif particle['start_pos']=='right':
        angle = 180

    angle_rad = np.deg2rad(angle)
    # Step one radius per frame; int() truncates the floating-point residue of cos/sin
    dx = int(particle['radius'] * np.cos(angle_rad))
    dy = int(particle['radius'] * np.sin(angle_rad))
    x, y = particle['position']
    particle['position'] = (x + dx, y + dy)
    particle['displacement'] = np.sqrt(dx**2 + dy**2)
  
  
def draw_particles(particles):
    # Draw every particle as a filled circle on a black canvas
    frame = np.zeros((frame_height, frame_width, 3), dtype=np.uint8)
    for particle in particles:
        cv2.circle(frame, particle['position'], particle['radius'], particle['color'], -1)

    return frame
  
  
def detect(consecutive_frames):  
      
    global particles_disappeared, particles_appeared, particle_max_index, particles_appeared_pos  
      
    frame = consecutive_frames[0]  
  
    frame_resized = cv2.resize(frame,(240,240))  
    frame_normalized = frame_resized / 255  
      
    detections = detection_model(np.expand_dims(frame_normalized,axis=0))  
      
    detection_indices = np.zeros((1, 30, 30, 24))    
    predictions = convert_to_absolute_coordinates(np.array(detections))  
      
    for prediction in predictions:    
          
        if not particles_disappeared:  
            particle_max_index += 1  
            particle_index = particle_max_index  
            particles_appeared.append(particle_index)  
  
        else:  
            particle_index = particles_disappeared[0]  
            particles_disappeared = particles_disappeared[1:]  
            particles_appeared.append(particle_index)    
  
              
        (grid_y, grid_x) = prediction['grid']  
        x = int(prediction['x_center'])  
        y = int(prediction['y_center'])  
  
        particles_appeared_pos.append((x,y))  
          
        cv2.rectangle(frame, (x - 20, y - 20), (x + 20, y + 20), (0, 255, 0), 1)  
        cv2.putText(frame,f"#{particle_index}", (x -20, y - 25), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (255, 255, 255),1)  
          
        detection_indices[0,grid_y, grid_x, particle_index-1] = 1  
           
    X_first_detection = np.concatenate([detections, detection_indices],axis=-1)    
          
    return frame, X_first_detection     
      
      
def detect_and_track(consecutive_frames, X_first_detection):  
      
    global particles_disappeared, particles_appeared, particle_max_index, indices_matrix  
          
    first_frame = consecutive_frames[0]  
    second_frame = consecutive_frames[1]  
      
    first_frame_resized = cv2.resize(first_frame,(240,240))  
    first_frame_normalized = first_frame_resized / 255  
      
    second_frame_resized = cv2.resize(second_frame,(240,240))  
    second_frame_normalized = second_frame_resized / 255  
      
    second_detections = detection_model(np.expand_dims(second_frame_normalized,axis=0))  
    second_detection_indices = np.zeros((1, 30, 30, 24))    
      
    X_detections = np.concatenate([X_first_detection, second_detections], axis=-1)  
      
    X_frames = np.concatenate([first_frame_normalized, second_frame_normalized],axis=-1)  
    X_frames = np.expand_dims(X_frames,axis=0)  
  
    second_indices = tracking_model([X_frames, X_detections])  
  
    indices_matrix.append(second_indices)  
      
    second_data = np.concatenate([second_detections, second_indices], axis=-1)  
      
    predictions = convert_to_tracking_data(second_data)  
  
      
    for prediction in predictions:  
        (grid_y, grid_x) = prediction['grid']  
        x = int(prediction['x_center'])  
        y = int(prediction['y_center'])  
        particle_index = int(prediction['index'])  
        cv2.rectangle(second_frame, (x - 20, y - 20), (x + 20, y + 20), (0, 255, 0), 1)  
        cv2.putText(second_frame,f"#{particle_index}", (x -20, y - 25), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (255, 255, 255),1)  
          
        second_detection_indices[0,grid_y, grid_x, particle_index-1] = 1  
              
    X_first_detection = np.concatenate([second_detections, second_detection_indices],axis=-1)    
  
      
    return second_frame, X_first_detection  
  
  
def test_particles_tracking():  
    particles = []  
    max_particles = 50  
    total_particles_created = 0  
    timer = 0     
  
    while len(particles) > 0 or total_particles_created < max_particles:  
        if total_particles_created < max_particles and timer % 10 == 0:  
            total_particles_created += 1  
            particles.append(create_particle())  
  
        for particle in particles[:]:  
            move_particle(particle)  
            if is_off_screen(particle):  
                particles.remove(particle)  
  
        frame = draw_particles(particles)  
        consecutive_frames.append(frame)  
          
        if len(consecutive_frames) == 1:  
              
            frame_to_display, X_first_detection = detect(consecutive_frames)  
              
        elif len(consecutive_frames) == 2:  
              
            frame_to_display, X_first_detection = detect_and_track(  
                consecutive_frames,  
                X  
            )  
              
        X = X_first_detection  
          
        out.write(frame_to_display)  
          
        cv2.imshow('Frame', frame_to_display)  
        if cv2.waitKey(1) & 0xFF == ord('q'):  
            break  
        timer += 1  
  
    out.release()  
    cv2.destroyAllWindows()  
  
  
test_particles_tracking()

Simulated detection and tracking of multi-colored particles in an upper and a lower lane, much like cars on a road. GIF by author.
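
The conversion from cell-relative offsets to absolute pixel coordinates used in the inference code above can be isolated into a small helper. This is only a minimal sketch, not the article's code: the name `cell_to_absolute` and the default `cell_size=20` are assumptions made for illustration (a 600-pixel frame split into a 30 x 30 grid would give that value).

```python
def cell_to_absolute(x_offset, y_offset, x_index, y_index, cell_size=20):
    """Map offsets in [0, 1] inside grid cell (y_index, x_index) to pixel coordinates."""
    # Same arithmetic as offset * cell_size + index * cell_size, factored once
    x_center = (x_index + x_offset) * cell_size
    y_center = (y_index + y_offset) * cell_size
    return x_center, y_center


# A detection centred in the middle of the cell at row 2, column 3
print(cell_to_absolute(0.5, 0.5, x_index=3, y_index=2))  # (70.0, 50.0)
```

Keeping the offsets relative to the cell means the network only has to predict values between 0 and 1, regardless of where on the frame the object sits.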

9. Conclusion and Final Thoughts

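The inference results are fairly robust, but one problem remains. It also plagues other object tracking models and is likely an active area of research: when two or more objects overlap, the tracking model is more prone to confusion.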

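For example, when object #4 and object #8 cross paths, their labels may be swapped once they move apart. This is a frustrating problem in object tracking. One way to address it is to feed the model more than two frames as input; however, that approach stops helping if the two objects stay close together for a long time before separating.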
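
Feeding more than two frames to the tracker would mostly change how the input tensor is assembled. The sketch below assumes the same preprocessing as the inference code (frames already resized to 240 x 240 and scaled to [0, 1]) and stacks the most recent frames along the channel axis. `stack_frames` and the history length of 4 are hypothetical choices, and the tracking model would have to be retrained to accept the wider input.

```python
from collections import deque

import numpy as np


def stack_frames(frames):
    """Concatenate normalized HxWx3 frames along channels and add a batch axis."""
    return np.expand_dims(np.concatenate(list(frames), axis=-1), axis=0)


history = deque(maxlen=4)  # keeps only the 4 most recent frames
for _ in range(6):
    history.append(np.zeros((240, 240, 3)))  # stand-in for a normalized frame

X_frames = stack_frames(history)
print(X_frames.shape)  # (1, 240, 240, 12)
```

A bounded `deque` drops the oldest frame automatically, so the input width stays fixed no matter how long the simulation runs.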

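Another idea I had is to use an embedding vector of fixed length (representing the cropped object) that can be concatenated with an intermediate layer of the model. Whether this works remains to be seen, and I will experiment with it in the near future.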
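
To make the embedding idea concrete, here is one possible way to turn a cropped object into a fixed-length vector. It is an illustrative sketch under several assumptions rather than a tested design: `crop_embedding` is a hypothetical helper, average pooling stands in for a learned encoder, and the 16-value `features` array is only a placeholder for an intermediate-layer output.

```python
import numpy as np


def crop_embedding(frame, x_center, y_center, box=40, pooled=4):
    """Crop a box around the object and average-pool it to a fixed-length vector."""
    half = box // 2
    # Zero-pad so that crops near the border keep the full box size
    padded = np.pad(frame, ((half, half), (half, half), (0, 0)))
    x, y = int(x_center) + half, int(y_center) + half
    crop = padded[y - half:y + half, x - half:x + half].astype(np.float32) / 255
    step = box // pooled
    blocks = crop.reshape(pooled, step, pooled, step, 3)
    return blocks.mean(axis=(1, 3)).reshape(-1)  # length pooled * pooled * 3


frame = np.zeros((600, 600, 3), dtype=np.uint8)
frame[90:110, 90:110] = (0, 0, 255)          # a red square standing in for a particle
embedding = crop_embedding(frame, 100, 100)
features = np.zeros(16, dtype=np.float32)    # placeholder for an intermediate-layer output
combined = np.concatenate([features, embedding])
print(embedding.shape, combined.shape)  # (48,) (64,)
```

Because the vector length does not depend on where the object is, it can be appended to per-object features before the layers that decide which index a detection receives.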

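[August 3, 2024] Update: I managed to make some adjustments to the tracking algorithm and the training, and the model now handles overlapping objects much better: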

Jiggling multi-colored particles sometimes overlap, but tracking remains robust. GIF by author.

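Finally, congratulations on completing this tutorial! I hope this article has guided you through coding YOLO and object tracking from scratch. Next, I plan to develop a vision model based on a graph-based vision transformer (let's call it GraphViT). If you are interested in my work, stay tuned!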
