Preface
I previously walked through the entire YOLOv3 inference code; see that earlier article for details. The YOLOv5 inference code comes from the same team and has not changed much in structure: the cfg configuration files have been replaced with yaml files, and the weights are stored in PyTorch format.
First, download the source code:
git clone -b v4.0 --depth 1 https://github.com/ultralytics/yolov5.git
This walkthrough uses the v4.0 release of the code. Its directory structure is as follows:
|   detect.py
|   Dockerfile
|   hubconf.py
|   LICENSE
|   README.md
|   requirements.txt
|   test.py
|   train.py
|   tutorial.ipynb
|
+---.github
|   |   dependabot.yml
|   |
|   +---ISSUE_TEMPLATE
|   |       bug-report.md
|   |       feature-request.md
|   |       question.md
|   |
|   \---workflows
|           ci-testing.yml
|           codeql-analysis.yml
|           greetings.yml
|           rebase.yml
|           stale.yml
|
+---data
|   |   coco.yaml
|   |   coco128.yaml
|   |   hyp.finetune.yaml
|   |   hyp.scratch.yaml
|   |   voc.yaml
|   |
|   +---images
|   |       bus.jpg
|   |       zidane.jpg
|   |
|   \---scripts
|           get_coco.sh
|           get_voc.sh
|
+---models
|   |   common.py
|   |   experimental.py
|   |   export.py
|   |   yolo.py
|   |   yolov5l.yaml
|   |   yolov5m.yaml
|   |   yolov5s.yaml
|   |   yolov5x.yaml
|   |   __init__.py
|   |
|   \---hub
|           anchors.yaml
|           yolov3-spp.yaml
|           yolov3-tiny.yaml
|           yolov3.yaml
|           yolov5-fpn.yaml
|           yolov5-p2.yaml
|           yolov5-p6.yaml
|           yolov5-p7.yaml
|           yolov5-panet.yaml
|
+---utils
|   |   activations.py
|   |   autoanchor.py
|   |   datasets.py
|   |   general.py
|   |   google_utils.py
|   |   loss.py
|   |   metrics.py
|   |   plots.py
|   |   torch_utils.py
|   |   __init__.py
|   |
|   \---google_app_engine
|           additional_requirements.txt
|           app.yaml
|           Dockerfile
|
\---weights
        download_weights.sh
YOLOv5 adds a torch.hub loading interface and ships shell scripts for downloading the COCO and PASCAL VOC datasets. A tutorial notebook, tutorial.ipynb, is also provided so developers can see how the model is called.
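As a minimal sketch of the torch.hub route (assuming internet access so the model and weights can be fetched; the sample image path comes from the repo's data/images directory):

# Minimal torch.hub sketch: load the pretrained yolov5s model and run it on a sample image.
import torch

model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)  # fetch model from GitHub
results = model('data/images/bus.jpg')  # run inference on one image
results.print()  # print a summary of the detections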
Object Detection Training Code
The training code lives in the train.py module. It begins by defining the command-line arguments:
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--weights', type=str, default='yolov5s.pt', help='initial weights path')
parser.add_argument('--cfg', type=str, default='', help='model.yaml path')
parser.add_argument('--data', type=str, default='data/coco128.yaml', help='data.yaml path')
parser.add_argument('--hyp', type=str, default='data/hyp.scratch.yaml', help='hyperparameters path')
parser.add_argument('--epochs', type=int, default=300)
parser.add_argument('--batch-size', type=int, default=16, help='total batch size for all GPUs')
parser.add_argument('--img-size', nargs='+', type=int, default=[640, 640], help='[train, test] image sizes')
parser.add_argument('--rect', action='store_true', help='rectangular training')
parser.add_argument('--resume', nargs='?', const=True, default=False, help='resume most recent training')
parser.add_argument('--nosave', action='store_true', help='only save final checkpoint')
parser.add_argument('--notest', action='store_true', help='only test final epoch')
parser.add_argument('--noautoanchor', action='store_true', help='disable autoanchor check')
parser.add_argument('--evolve', action='store_true', help='evolve hyperparameters')
parser.add_argument('--bucket', type=str, default='', help='gsutil bucket')
parser.add_argument('--cache-images', action='store_true', help='cache images for faster training')
parser.add_argument('--image-weights', action='store_true', help='use weighted image selection for training')
parser.add_argument('--device', default='', help='cuda device, i.e. 0 or 0,1,2,3 or cpu')
parser.add_argument('--multi-scale', action='store_true', help='vary img-size +/- 50%%')
parser.add_argument('--single-cls', action='store_true', help='train multi-class data as single-class')
parser.add_argument('--adam', action='store_true', help='use torch.optim.Adam() optimizer')
parser.add_argument('--sync-bn', action='store_true', help='use SyncBatchNorm, only available in DDP mode')
parser.add_argument('--local_rank', type=int, default=-1, help='DDP parameter, do not modify')
parser.add_argument('--log-imgs', type=int, default=16, help='number of images for W&B logging, max 100')
parser.add_argument('--log-artifacts', action='store_true', help='log artifacts, i.e. final trained model')
parser.add_argument('--workers', type=int, default=8, help='maximum number of dataloader workers')
parser.add_argument('--project', default='runs/train', help='save to project/name')
parser.add_argument('--name', default='exp', help='save to project/name')
parser.add_argument('--exist-ok', action='store_true', help='existing project/name ok, do not increment')
parser.add_argument('--quad', action='store_true', help='quad dataloader')
opt = parser.parse_args()
The key arguments are:
- weights, path to the initial model weights
- cfg, path to the model configuration (yaml) file
- data, the dataset configuration, default data/coco128.yaml. This sample dataset is only 6.6 MB and is handy for quick training tests; the full datasets are downloaded by the shell scripts under data/scripts, where COCO is about 27 GB and VOC about 2.8 GB
- hyp, path to the hyperparameters file
- epochs, number of training epochs, default 300
- batch-size, the total batch size summed across all GPUs (a short parsing example is shown after this list)
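As a quick check of these defaults, the parser built above can be exercised directly; the flag values below are hypothetical, chosen only to illustrate how they parse:

# Hypothetical invocation reusing the parser defined above.
opt = parser.parse_args(['--data', 'data/coco128.yaml',
                         '--cfg', 'models/yolov5s.yaml',
                         '--weights', 'yolov5s.pt',
                         '--img-size', '640', '640'])
print(opt.img_size, opt.batch_size, opt.epochs)  # [640, 640] 16 300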
Inside the dataset, the directory structure is:
- images, the image files
- labels, the annotation files
Each image has a corresponding annotation text file, with one object per line, for example:
45 0.479492 0.688771 0.955609 0.5955
45 0.736516 0.247188 0.498875 0.476417
50 0.637063 0.732938 0.494125 0.510583
45 0.339438 0.418896 0.678875 0.7815
Here 45 and 50 are class IDs, and the four values that follow are the normalized x, y, w, h of the box (center coordinates and size, each divided by the image width or height). The dataset yaml file then needs to specify the following:
train: ../coco128/images/train2017/  # 128 images
val: ../coco128/images/train2017/  # 128 images

# number of classes
nc: 80

# class names
names: [ 'person', ..., 'hair drier', 'toothbrush' ]
These are the training and validation image directories, the number of classes, and the name of each class.
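For reference, here is a minimal sketch (not part of the repo) of how a pixel-space box would be converted into one label line in this format:

# Convert a pixel-space box (x_min, y_min, x_max, y_max) into a YOLO label line:
# "class_id x_center y_center width height", all normalized to [0, 1].
def to_yolo_label(class_id, box, img_w, img_h):
    x_min, y_min, x_max, y_max = box
    x_c = (x_min + x_max) / 2 / img_w   # normalized center x
    y_c = (y_min + y_max) / 2 / img_h   # normalized center y
    w = (x_max - x_min) / img_w         # normalized width
    h = (y_max - y_min) / img_h         # normalized height
    return f'{class_id} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}'

# e.g. a 100x200 box at (50, 80) in a 640x480 image, class 45:
print(to_yolo_label(45, (50, 80, 150, 280), 640, 480))
# -> 45 0.156250 0.375000 0.156250 0.416667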
Next comes the setup for multi-GPU (DDP) training:
# Set DDP variables
opt.world_size = int(os.environ['WORLD_SIZE']) if 'WORLD_SIZE' in os.environ else 1
opt.global_rank = int(os.environ['RANK']) if 'RANK' in os.environ else -1
set_logging(opt.global_rank)
if opt.global_rank in [-1, 0]:
check_git_status()
# DDP mode
opt.total_batch_size = opt.batch_size
device = select_device(opt.device, batch_size=opt.batch_size)
if opt.local_rank != -1:
    assert torch.cuda.device_count() > opt.local_rank
    torch.cuda.set_device(opt.local_rank)
    device = torch.device('cuda', opt.local_rank)
    dist.init_process_group(backend='nccl', init_method='env://')  # distributed backend
    assert opt.batch_size % opt.world_size == 0, '--batch-size must be multiple of CUDA device count'
    opt.batch_size = opt.total_batch_size // opt.world_size
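When training is launched with python -m torch.distributed.launch, each process reads its rank from the WORLD_SIZE/RANK environment variables set by the launcher, and the total batch is split evenly across the GPUs. A small standalone illustration of that split (the numbers are hypothetical):

# Hypothetical numbers: --batch-size 64 on 4 GPUs gives 16 images per process per step.
total_batch_size = 64
world_size = 4
assert total_batch_size % world_size == 0, 'total batch must be divisible by the GPU count'
print(total_batch_size // world_size)  # 16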
Next is checkpoint resumption, which continues training from the most recent run; it is enabled by passing the --resume flag (optionally followed by a checkpoint path):
# Resume
if opt.resume:  # resume an interrupted run
    ckpt = opt.resume if isinstance(opt.resume, str) else get_latest_run()  # specified or most recent path
    assert os.path.isfile(ckpt), 'ERROR: --resume checkpoint does not exist'
    apriori = opt.global_rank, opt.local_rank
    with open(Path(ckpt).parent.parent / 'opt.yaml') as f:
        opt = argparse.Namespace(**yaml.load(f, Loader=yaml.FullLoader))  # replace
    opt.cfg, opt.weights, opt.resume, opt.global_rank, opt.local_rank = '', ckpt, True, *apriori  # reinstate
    logger.info('Resuming training from %s' % ckpt)
else:
    # opt.hyp = opt.hyp or ('hyp.finetune.yaml' if opt.weights else 'hyp.scratch.yaml')
    opt.data, opt.cfg, opt.hyp = check_file(opt.data), check_file(opt.cfg), check_file(opt.hyp)  # check files
    assert len(opt.cfg) or len(opt.weights), 'either --cfg or --weights must be specified'
    opt.img_size.extend([opt.img_size[-1]] * (2 - len(opt.img_size)))  # extend to 2 sizes (train, test)
    opt.name = 'evolve' if opt.evolve else opt.name
    opt.save_dir = increment_path(Path(opt.project) / opt.name, exist_ok=opt.exist_ok | opt.evolve)  # increment run
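Note that opt.yaml is looked up two directory levels above the checkpoint, i.e. in the run directory that contains the weights folder; a small illustration with a hypothetical path:

from pathlib import Path

ckpt = Path('runs/train/exp/weights/last.pt')  # hypothetical checkpoint path
print(ckpt.parent.parent / 'opt.yaml')         # runs/train/exp/opt.yaml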
Then the hyperparameter configuration is loaded:
# Hyperparameters
with open(opt.hyp) as f:
    hyp = yaml.load(f, Loader=yaml.FullLoader)  # load hyps
    if 'box' not in hyp:
        warn('Compatibility: %s missing "box" which was renamed from "giou" in %s' %
             (opt.hyp, 'https://github.com/ultralytics/yolov5/pull/1120'))
        hyp['box'] = hyp.pop('giou')
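The compatibility shim handles older hyp files that still used the key 'giou' (the box-regression loss gain) before it was renamed to 'box'. A standalone illustration with made-up values:

# Made-up old-style hyp dict using the pre-rename key 'giou':
hyp = {'lr0': 0.01, 'giou': 0.05, 'cls': 0.5}
if 'box' not in hyp:
    hyp['box'] = hyp.pop('giou')  # rename giou -> box
print(hyp)  # {'lr0': 0.01, 'cls': 0.5, 'box': 0.05}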
Finally, the train function is called to start training:
# Train
logger.info(opt)
if not opt.evolve:
    tb_writer = None  # init loggers
    if opt.global_rank in [-1, 0]:
        logger.info(f'Start Tensorboard with "tensorboard --logdir {opt.project}", view at http://localhost:6006/')
        tb_writer = SummaryWriter(opt.save_dir)  # Tensorboard
    train(hyp, opt, device, tb_writer, wandb)
TensorBoard's SummaryWriter is used to aggregate the training metrics and write them to the run directory.
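A minimal, standalone SummaryWriter sketch (the tag and values below are hypothetical, not the repo's exact ones); its output can be viewed with tensorboard --logdir runs/train:

from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter('runs/train/exp')  # hypothetical run directory
for epoch in range(3):
    writer.add_scalar('train/box_loss', 0.1 / (epoch + 1), epoch)  # made-up loss values per epoch
writer.close()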
The overall flow is quite clear and not difficult to follow; to train on your own data, you only need to annotate your dataset in the format described above.


