A Close Reading of the YOLOv3 Object Detection Code

Preface

Walking through the full flow of the YOLOv3 code is a good way to become more fluent with YOLO. The code used here is version 3.0 of ultralytics/yolov3.

git clone -b v3.0 --depth 1 http://github.com/ultralytics/yolov3.git

The project directory looks like this:

|   .gitignore
|   detect.py
|   LICENSE
|   models.py
|   README.md
|   requirements.txt
|   test.py
|   train.py
|
+---cfg
|       coco.data
|       yolov3-spp.cfg
|       yolov3-tiny.cfg
|       yolov3.cfg
|
+---data
|   |   coco.names
|   |   coco_paper.names
|   |   get_coco_dataset.sh
|   |
|   \---samples
|           zidane.jpg
|
+---utils
|       datasets.py
|       gcp.sh
|       parse_config.py
|       torch_utils.py
|       utils.py
|
\---weights
        download_yolov3_weights.sh

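The cfg directory holds the YOLOv3 model configuration files; you pick one at run time to build the corresponding model. The data directory holds the data the models need, utils holds helper functions, and weights is where the model weight files go.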

Object detection code

We start from the detect.py module and trace its whole call flow. The official invocation is:

detect.py --cfg cfg/yolov3.cfg --weights weights/yolov3.pt

The corresponding code is:

    parser = argparse.ArgumentParser()
    parser.add_argument('--cfg', type=str, default='cfg/yolov3.cfg', help='cfg file path')
    parser.add_argument('--weights', type=str, default='weights/yolov3.weights', help='path to weights file')
    parser.add_argument('--images', type=str, default='data/samples', help='path to images')
    parser.add_argument('--img-size', type=int, default=32 * 13, help='size of each image dimension')
    parser.add_argument('--conf-thres', type=float, default=0.50, help='object confidence threshold')
    parser.add_argument('--nms-thres', type=float, default=0.45, help='iou threshold for non-maximum suppression')
    opt = parser.parse_args()
    print(opt)

    with torch.no_grad():
        detect(
            opt.cfg,
            opt.weights,
            opt.images,
            img_size=opt.img_size,
            conf_thres=opt.conf_thres,
            nms_thres=opt.nms_thres
        )

It reads the following arguments from the command line:

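cfg: path to the model configuration (config) file
weights: path to the model weights file
images: path to the images to run inference on
img-size: size of each image dimension; defaults to 32 × 13 = 416 pixels
conf-thres: object confidence threshold; defaults to 0.5
nms-thres: IoU threshold for non-maximum suppression (NMS); defaults to 0.45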
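To see how these flags surface in code, here is a minimal sketch that rebuilds a subset of the parser shown above and parses an explicit argument list instead of sys.argv. The helper name build_parser is made up for this illustration. Note that argparse turns hyphens in flag names into underscores, which is why --img-size is read back as opt.img_size.

```python
import argparse

def build_parser():
    # Subset of the flags from detect.py shown above (build_parser is a hypothetical helper)
    parser = argparse.ArgumentParser()
    parser.add_argument('--cfg', type=str, default='cfg/yolov3.cfg', help='cfg file path')
    parser.add_argument('--img-size', type=int, default=32 * 13, help='size of each image dimension')
    parser.add_argument('--conf-thres', type=float, default=0.50, help='object confidence threshold')
    return parser

# Parse an explicit list instead of sys.argv so the example is reproducible
opt = build_parser().parse_args(['--conf-thres', '0.3'])
print(opt.img_size)    # default value, 32 * 13
print(opt.conf_thres)  # overridden by the explicit argument list
```

Any flag that is not passed keeps its default, so running detect.py with no arguments should use the defaults listed above.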

It then calls the detect function:

    device = torch_utils.select_device()
    if os.path.exists(output):
        shutil.rmtree(output)  # delete output folder
    os.makedirs(output)  # make new output folder

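First it selects the device to run on; this is done by select_device in utils/torch_utils.py, which returns either a cpu or a cuda device. It then checks whether the output directory exists; if it does, the directory is deleted, and in either case it is created afresh.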
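The delete-and-recreate step is worth seeing in isolation, because it means results from a previous run are discarded. The sketch below wraps the same three calls in a hypothetical helper, reset_output_dir, and exercises it inside a temporary directory so that nothing real is removed.

```python
import os
import shutil
import tempfile

def reset_output_dir(output):
    """Delete `output` if it exists, then recreate it empty (hypothetical helper)."""
    if os.path.exists(output):
        shutil.rmtree(output)  # removes the folder and everything in it
    os.makedirs(output)

# Demonstrate inside a temporary directory so nothing real is deleted
base = tempfile.mkdtemp()
output = os.path.join(base, 'output')
os.makedirs(output)
open(os.path.join(output, 'stale.jpg'), 'w').close()  # leftover from a "previous run"

reset_output_dir(output)
remaining = os.listdir(output)  # the stale file is gone
shutil.rmtree(base)             # clean up the demo directory
print(remaining)
```

If you want to keep earlier results, copy the output folder elsewhere before running detection again.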

    # Initialize model
    model = Darknet(cfg, img_size)

    # Load weights
    if weights.endswith('.pt'):  # pytorch format
        if weights.endswith('yolov3.pt') and not os.path.exists(weights):
            if (platform == 'darwin') or (platform == 'linux'):
                os.system('wget https://storage.googleapis.com/ultralytics/yolov3.pt -O ' + weights)
        model.load_state_dict(torch.load(weights, map_location='cpu')['model'])
    else:  # darknet format
        _ = load_darknet_weights(model, weights)

Next the model is initialized and the weights are loaded. The model class is Darknet, defined in models.py.

class Darknet(nn.Module):
    """YOLOv3 object detection model"""

    def __init__(self, cfg_path, img_size=416):
        super(Darknet, self).__init__()

        self.module_defs = parse_model_cfg(cfg_path)
        self.module_defs[0]['cfg'] = cfg_path
        self.module_defs[0]['height'] = img_size
        self.hyperparams, self.module_list = create_modules(self.module_defs)
        self.img_size = img_size
        self.loss_names = ['loss', 'xy', 'wh', 'conf', 'cls', 'nT']
        self.losses = []

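Here the configuration file passed in through the cfg argument is parsed by calling parse_model_cfg from utils/parse_config.py. The class then calls create_modules, defined in the same module as Darknet, to build the network structure from that configuration.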
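To make the parsing step concrete, here is a minimal sketch of how a Darknet-style cfg can be turned into a list of dictionaries, one per [section]. It is not the repository's parse_model_cfg; the function name parse_cfg_text and the sample text are assumptions for illustration, and the sketch works on a string rather than a file path.

```python
def parse_cfg_text(text):
    """Parse Darknet-style cfg text into a list of dicts, one per [section]."""
    module_defs = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith('#'):
            continue  # skip blank lines and comments
        if line.startswith('['):
            # A new [section] starts a new block; its name is stored under 'type'
            module_defs.append({'type': line[1:-1].strip()})
        else:
            key, value = line.split('=', 1)
            module_defs[-1][key.strip()] = value.strip()  # values stay strings
    return module_defs

# Made-up sample in the cfg format
sample = """
[net]
width=416
height=416

# first layer
[convolutional]
batch_normalize=1
filters=32
size=3
"""
defs = parse_cfg_text(sample)
print(defs[0]['type'], defs[1]['filters'])
```

Under this layout the first entry is the [net] block, which would explain why Darknet.__init__ writes the cfg path and image height into module_defs[0].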

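Once the network structure is built, the next step is to load the model weights. The code checks whether the weights path ends in .pt; if the file is yolov3.pt and does not exist, it tries to download it (with wget, on macOS and Linux only). YOLOv3's original weights use the .weights suffix, and at the time of writing that file could no longer be downloaded from the original author's website. If you do have such a file, load_darknet_weights is called to load the Darknet-format weights.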

The model is then put into evaluation mode and image loading begins.

    model.to(device).eval()
    # Set Dataloader
    if webcam:
        save_images = False
        dataloader = LoadWebcam(img_size=img_size)
    else:
        dataloader = LoadImages(images, img_size=img_size)

For a webcam it uses LoadWebcam from utils/datasets.py; otherwise it uses LoadImages. Pass webcam=True when calling detect to use the webcam.
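The detection loop later iterates over the dataloader and also calls len() on it, so whatever LoadImages is, it has to support both. The sketch below is a hypothetical stand-in, ImagePaths, that only collects and yields image paths; it does no decoding or resizing, and the extension list is an assumption.

```python
import os
import shutil
import tempfile

class ImagePaths:
    """Hypothetical stand-in for an image loader: iterates sorted image paths."""
    EXTENSIONS = ('.jpg', '.jpeg', '.png')  # assumed set of image extensions

    def __init__(self, folder):
        self.files = sorted(
            os.path.join(folder, name)
            for name in os.listdir(folder)
            if name.lower().endswith(self.EXTENSIONS)
        )

    def __len__(self):
        return len(self.files)  # lets callers print "image i/N"

    def __iter__(self):
        return iter(self.files)

# Build a throwaway folder with two "images" and one unrelated file
folder = tempfile.mkdtemp()
for name in ('b.jpg', 'a.png', 'notes.txt'):
    open(os.path.join(folder, name), 'w').close()

loader = ImagePaths(folder)
names = [os.path.basename(p) for p in loader]
shutil.rmtree(folder)  # clean up the demo folder
print(len(loader), names)
```

A webcam source has no natural length, which fits the separate progress message that detect prints for webcam frames.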

The next step loads the class names from the data directory and generates a color for each class:

    # Get classes and colors
    classes = load_classes(parse_data_cfg('cfg/coco.data')['names'])
    colors = [[random.randint(0, 255), random.randint(0, 255), random.randint(0, 255)] for _ in range(len(classes))]

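The COCO dataset used with YOLOv3 has 80 classes, and the colors are assigned at random. The code now iterates over the images and runs prediction on each one: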
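As a small illustration of this step, the sketch below splits the text of a names file into a class list and draws one random color per class. The helper load_class_names is a hypothetical stand-in for load_classes and takes the file contents as a string; the fixed seed is only there to make the demo repeatable.

```python
import random

def load_class_names(text):
    """Split the contents of a .names file into a list, one class per line."""
    return [line.strip() for line in text.splitlines() if line.strip()]

names_text = "person\nbicycle\ncar\n"  # a few COCO class names, one per line
classes = load_class_names(names_text)

random.seed(0)  # fixed seed only so the demo is repeatable
colors = [[random.randint(0, 255) for _ in range(3)] for _ in range(len(classes))]
print(classes, colors)
```

Without a seed, as in detect.py, the color of a given class changes from run to run.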

for i, (path, img, im0) in enumerate(dataloader):
    t = time.time()
    if webcam:
        print('webcam frame %g: ' % (i + 1), end='')
    else:
        print('image %g/%g %s: ' % (i + 1, len(dataloader), path), end='')
    save_path = str(Path(output) / Path(path).name)

    # Get detections
    img = torch.from_numpy(img).unsqueeze(0).to(device)
    if ONNX_EXPORT:
        torch.onnx.export(model, img, 'weights/model.onnx', verbose=True)
        return

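Here the image's file name is joined onto the output directory to form the save path, and the image array is converted to a torch tensor with a leading batch dimension, which makes it 4-dimensional. If the ONNX export option is set, the model is exported and the function returns without doing any further work.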
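The save-path expression is easy to try on its own. The sketch below wraps it in a hypothetical helper, make_save_path, and applies it to the sample image from the directory listing.

```python
from pathlib import Path

def make_save_path(output, path):
    # Keep only the file name of the input and place it under the output folder
    return Path(output) / Path(path).name

save_path = make_save_path('output', 'data/samples/zidane.jpg')
print(save_path.as_posix())  # as_posix() gives forward slashes on every platform
```

Because only the file name is kept, two inputs with the same name in different folders would map to the same output file.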

    pred = model(img)
    pred = pred[pred[:, :, 4] > conf_thres]  # remove boxes < threshold

The image tensor is passed through the model, and only the predictions whose confidence exceeds the threshold are kept.
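The filter is a boolean mask on the value at index 4 of each prediction row. A plain-Python sketch of the same idea follows; the rows are made-up numbers, and treating the first four values as box coordinates is an assumption about the layout.

```python
def filter_by_confidence(rows, conf_thres):
    """Keep rows whose value at index 4 (confidence) exceeds the threshold."""
    return [row for row in rows if row[4] > conf_thres]

# Made-up rows: four box values followed by a confidence score
rows = [
    [50.0, 60.0, 20.0, 40.0, 0.92],
    [52.0, 61.0, 22.0, 38.0, 0.30],
    [200.0, 120.0, 80.0, 60.0, 0.50],  # exactly at the threshold: dropped, the comparison is strict
]
kept = filter_by_confidence(rows, conf_thres=0.5)
print(len(kept))
```

In the tensor version, indexing with the mask also collapses the batch dimension, which is consistent with the later call passing pred.unsqueeze(0) to non_max_suppression.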

    if len(pred) > 0:
        # Run NMS on predictions
        detections = non_max_suppression(pred.unsqueeze(0), conf_thres, nms_thres)[0]

        # Rescale boxes from 416 to true image size
        scale_coords(img_size, detections[:, :4], im0.shape).round()

        # Print results to screen
        unique_classes = detections[:, -1].cpu().unique()
        for c in unique_classes:
            n = (detections[:, -1].cpu() == c).sum()
            print('%g %ss' % (n, classes[int(c)]), end=', ')

        # Draw bounding boxes and labels of detections
        for x1, y1, x2, y2, conf, cls_conf, cls in detections:
            if save_txt:  # Write to file
                with open(save_path + '.txt', 'a') as file:
                    file.write('%g %g %g %g %g %g\n' % (x1, y1, x2, y2, cls, cls_conf * conf))

            # Add bbox to the image
            label = '%s %.2f' % (classes[int(cls)], conf)
            plot_one_box([x1, y1, x2, y2], im0, label=label, color=colors[int(cls)])

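If any predictions remain, there may be several overlapping candidate boxes for the same object, so NMS is applied: in its usual form, it keeps the highest-confidence box among overlapping ones and suppresses the boxes whose IoU with it exceeds nms_thres. This is implemented by non_max_suppression in utils/utils.py. The boxes are then rescaled to the original image size, drawn on the original image at those coordinates, and the result is saved to the output directory.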
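For readers who have not seen NMS written out, here is a minimal sketch of plain greedy, class-agnostic NMS in pure Python. It is not the repository's non_max_suppression, which may differ (for example in how it handles classes); the names iou and greedy_nms and the sample boxes are assumptions for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def greedy_nms(boxes, scores, nms_thres):
    """Return the indices of the kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Drop every remaining box that overlaps the kept one too much
        order = [i for i in order if iou(boxes[best], boxes[i]) <= nms_thres]
    return keep

# Two heavily overlapping boxes and one far away (made-up values)
boxes = [(10, 10, 110, 110), (20, 20, 120, 120), (200, 200, 260, 260)]
scores = [0.9, 0.8, 0.7]
keep = greedy_nms(boxes, scores, nms_thres=0.45)
print(keep)
```

Here the first two boxes have an IoU of 8100 / 11900, about 0.68, so the lower-scoring one is suppressed at the default threshold of 0.45, while the distant third box survives.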

Drawing is done with the OpenCV library:

def plot_one_box(x, img, color=None, label=None, line_thickness=None):
    # Plots one bounding box on image img
    tl = line_thickness or round(0.002 * max(img.shape[0:2])) + 1  # line thickness
    color = color or [random.randint(0, 255) for _ in range(3)]
    c1, c2 = (int(x[0]), int(x[1])), (int(x[2]), int(x[3]))
    cv2.rectangle(img, c1, c2, color, thickness=tl)
    if label:
        tf = max(tl - 1, 1)  # font thickness
        t_size = cv2.getTextSize(label, 0, fontScale=tl / 3, thickness=tf)[0]
        c2 = c1[0] + t_size[0], c1[1] - t_size[1] - 3
        cv2.rectangle(img, c1, c2, color, -1)  # filled
        cv2.putText(img, label, (c1[0], c1[1] - 2), 0, tl / 3, [225, 255, 255], thickness=tf, lineType=cv2.LINE_AA)

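It first determines the line thickness, and picks a random color if none was given. It then calls OpenCV's rectangle with the top-left and bottom-right corners to draw the box. If a label is given, it measures the text size first, draws a filled background rectangle above the box, and then draws the text on top of it.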
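The geometry in plot_one_box can be checked without OpenCV. The sketch below copies the two expressions for the line thickness and the label background corner into hypothetical helpers (box_line_thickness and label_background) and evaluates them for a few made-up inputs.

```python
def box_line_thickness(img_shape, line_thickness=None):
    """Same expression as `tl` in plot_one_box, for an image shape (h, w, channels)."""
    return line_thickness or round(0.002 * max(img_shape[0:2])) + 1

def label_background(c1, text_size):
    """Opposite corner of the filled label rectangle, given the box corner c1."""
    return c1[0] + text_size[0], c1[1] - text_size[1] - 3

print(box_line_thickness((720, 1280, 3)))     # thickness grows with the longer image side
print(box_line_thickness((416, 416, 3)))
print(box_line_thickness((416, 416, 3), 5))   # an explicit value wins over the computed one
print(label_background((100, 50), (80, 12)))  # background extends right of and above c1
```

Scaling the thickness with the image size keeps boxes visible on large images without overwhelming small ones.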
