YOLOv11改进 | 主干/Backbone篇 | 大核心卷积UniRepLknet目标检测网络(适配yolov11全系列轻量化)

一、本文介绍

本文给大家带来的改进机制是 特征提取网络 UniRepLknet ,该网络结构的重点在于使用Dilated Reparam Block和大核心指导原则,强调了 高效的 结构进行通道间通讯和空间聚合,以及使用带扩张的小核心进行重新参数化,该网络结构就是在LKNet基础上的一个升级版本, LKNet我们之前已经出过教程了 UniRepLknet 各种视觉任务中,包括图像分类、目标检测和语义分割 ,都显示出优异的性能。

欢迎大家订阅我的专栏一起学习YOLO!

(本文内容可根据yolov11的N、S、M、L、X进行二次缩放,轻量化更上一层)。


目录

一、本文介绍

二、 UniRepLknet的框架原理

三、 UniRepLknet的核心代码

四、手把手教你添加UniRepLknet机制

4.1 修改一

4.2 修改二

4.3 修改三

4.4 修改四

4.5 修改五

4.6 修改六

4.7 修改七

4.8 修改八

注意!!! 额外的修改!

打印计算量问题解决方案

注意事项!!!

五、UniRepLknet的yaml文件

5.1UniRepLknet的yaml文件版本1

5.2 训练文件

六、成功运行记录

七、本文总结


二、 UniRepLknet的框架原理

官方论文地址: 官方论文地址

官方代码地址: 官方代码地址


UniRepLKNet论文提出了一种新型的大核心卷积 神经网络 架构。这种架构通过结合非扩张小核心和扩张小核心层来增强非扩张大核心卷积层,旨在优化空间模式层次和表示能力。它强调了根据下游任务选择核心大小的重要性,并展示了该架构在图像识别以外领域(如音频、视频和时间序列数据)的通用性。此外,论文还展示了在各种任务上的领先性能,证明了其多功能性和高效性。

UniRepLKNet的主要创新点包括:

UniRepLKNet的结构上的创新点主要体现在其大核心卷积神经网络的设计上,这包括对大核心的高效利用,以及与传统ConvNets和变换器相比的独特构造。这种架构通过融合非扩张小核心和扩张小核心层来增强大核心层,优化了空间模式层次和网络的表示能力。此外,论文还提出了适用于大核心ConvNets的四个架构设计原则,旨在充分发挥大核心的独特优势,例如通过浅层结构观察更广阔视野,而不需要深入网络层次。

在图像中展示的UniRepLKNet架构设计中,一个显著的结构创新是LaRK(Large Kernel )块,它包括本文提出的Dilated Reparam Block,一个SE(Squeeze-and-Excitation)块,前馈网络(FFN),以及批量归一化(BN)层。LaRK块与SmaK(Small Kernel)块的主要区别在于,LaRK使用深度分离的3x3卷积层代替了Dilated Reparam Block中的层。不同阶段的块通过步长为2的密集3x3卷积层实现的下采样块连接,而这些块可以在不同阶段灵活地排列。这种设计强调了结构的模块化和灵活性,以及通过大核心来增强 模型 的性能和效率。

在图2中,UniRepLKNet的Dilated Reparam Block通过使用扩张的小核心卷积层来增强非扩张的大核心层。这些扩张层从参数角度看等同于一个具有更大稀疏核心的非扩张卷积层,这使得整个块可以等效地转换成单个大核心卷积。通过重新 参数化 的过程,多个具有不同扩张率的小核心卷积层被合并成一个等效的大核心卷积层,从而在保持可学习参数数量和计算效率的同时,增强了网络对空间信息的捕获能力。这种设计创新为ConvNets提供了更广泛的感受野,而不会增加模型的深度。


三、 UniRepLknet的核心代码

下面的核心代码使用方式看章节四。

  1. # UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition
  2. # Github source: https://github.com/AILab-CVC/UniRepLKNet
  3. # Licensed under The Apache License 2.0 License [see LICENSE for details]
  4. # Based on RepLKNet, ConvNeXt, timm, DINO and DeiT code bases
  5. # https://github.com/DingXiaoH/RepLKNet-pytorch
  6. # https://github.com/facebookresearch/ConvNeXt
  7. # https://github.com/rwightman/pytorch-image-models/tree/master/timm
  8. # https://github.com/facebookresearch/deit/
  9. # https://github.com/facebookresearch/dino
  10. # --------------------------------------------------------'
  11. import torch
  12. import torch.nn as nn
  13. import torch.nn.functional as F
  14. from timm.layers import trunc_normal_, DropPath, to_2tuple
  15. from functools import partial
  16. import torch.utils.checkpoint as checkpoint
  17. import numpy as np
  18. __all__ = ['unireplknet_a', 'unireplknet_f', 'unireplknet_p', 'unireplknet_n', 'unireplknet_t', 'unireplknet_s', 'unireplknet_b', 'unireplknet_l', 'unireplknet_xl']
  19. class GRNwithNHWC(nn.Module):
  20. """ GRN (Global Response Normalization) layer
  21. Originally proposed in ConvNeXt V2 (https://arxiv.org/abs/2301.00808)
  22. This implementation is more efficient than the original (https://github.com/facebookresearch/ConvNeXt-V2)
  23. We assume the inputs to this layer are (N, H, W, C)
  24. """
  25. def __init__(self, dim, use_bias=True):
  26. super().__init__()
  27. self.use_bias = use_bias
  28. self.gamma = nn.Parameter(torch.zeros(1, 1, 1, dim))
  29. if self.use_bias:
  30. self.beta = nn.Parameter(torch.zeros(1, 1, 1, dim))
  31. def forward(self, x):
  32. Gx = torch.norm(x, p=2, dim=(1, 2), keepdim=True)
  33. Nx = Gx / (Gx.mean(dim=-1, keepdim=True) + 1e-6)
  34. if self.use_bias:
  35. return (self.gamma * Nx + 1) * x + self.beta
  36. else:
  37. return (self.gamma * Nx + 1) * x
  38. class NCHWtoNHWC(nn.Module):
  39. def __init__(self):
  40. super().__init__()
  41. def forward(self, x):
  42. return x.permute(0, 2, 3, 1)
  43. class NHWCtoNCHW(nn.Module):
  44. def __init__(self):
  45. super().__init__()
  46. def forward(self, x):
  47. return x.permute(0, 3, 1, 2)
  48. #================== This function decides which conv implementation (the native or iGEMM) to use
  49. # Note that iGEMM large-kernel conv impl will be used if
  50. # - you attempt to do so (attempt_to_use_large_impl=True), and
  51. # - it has been installed (follow https://github.com/AILab-CVC/UniRepLKNet), and
  52. # - the conv layer is depth-wise, stride = 1, non-dilated, kernel_size > 5, and padding == kernel_size // 2
  53. def get_conv2d(in_channels, out_channels, kernel_size, stride, padding, dilation, groups, bias,
  54. attempt_use_lk_impl=True):
  55. kernel_size = to_2tuple(kernel_size)
  56. if padding is None:
  57. padding = (kernel_size[0] // 2, kernel_size[1] // 2)
  58. else:
  59. padding = to_2tuple(padding)
  60. need_large_impl = kernel_size[0] == kernel_size[1] and kernel_size[0] > 5 and padding == (kernel_size[0] // 2, kernel_size[1] // 2)
  61. # if attempt_use_lk_impl and need_large_impl:
  62. # print('---------------- trying to import iGEMM implementation for large-kernel conv')
  63. # try:
  64. # from depthwise_conv2d_implicit_gemm import DepthWiseConv2dImplicitGEMM
  65. # print('---------------- found iGEMM implementation ')
  66. # except:
  67. # DepthWiseConv2dImplicitGEMM = None
  68. # print('---------------- found no iGEMM. use original conv. follow https://github.com/AILab-CVC/UniRepLKNet to install it.')
  69. # if DepthWiseConv2dImplicitGEMM is not None and need_large_impl and in_channels == out_channels \
  70. # and out_channels == groups and stride == 1 and dilation == 1:
  71. # print(f'===== iGEMM Efficient Conv Impl, channels {in_channels}, kernel size {kernel_size} =====')
  72. # return DepthWiseConv2dImplicitGEMM(in_channels, kernel_size, bias=bias)
  73. return nn.Conv2d(in_channels=in_channels, out_channels=out_channels, kernel_size=kernel_size, stride=stride,
  74. padding=padding, dilation=dilation, groups=groups, bias=bias)
  75. def get_bn(dim, use_sync_bn=False):
  76. if use_sync_bn:
  77. return nn.SyncBatchNorm(dim)
  78. else:
  79. return nn.BatchNorm2d(dim)
  80. class SEBlock(nn.Module):
  81. """
  82. Squeeze-and-Excitation Block proposed in SENet (https://arxiv.org/abs/1709.01507)
  83. We assume the inputs to this layer are (N, C, H, W)
  84. """
  85. def __init__(self, input_channels, internal_neurons):
  86. super(SEBlock, self).__init__()
  87. self.down = nn.Conv2d(in_channels=input_channels, out_channels=internal_neurons,
  88. kernel_size=1, stride=1, bias=True)
  89. self.up = nn.Conv2d(in_channels=internal_neurons, out_channels=input_channels,
  90. kernel_size=1, stride=1, bias=True)
  91. self.input_channels = input_channels
  92. self.nonlinear = nn.ReLU(inplace=True)
  93. def forward(self, inputs):
  94. x = F.adaptive_avg_pool2d(inputs, output_size=(1, 1))
  95. x = self.down(x)
  96. x = self.nonlinear(x)
  97. x = self.up(x)
  98. x = F.sigmoid(x)
  99. return inputs * x.view(-1, self.input_channels, 1, 1)
  100. def fuse_bn(conv, bn):
  101. conv_bias = 0 if conv.bias is None else conv.bias
  102. std = (bn.running_var + bn.eps).sqrt()
  103. return conv.weight * (bn.weight / std).reshape(-1, 1, 1, 1), bn.bias + (conv_bias - bn.running_mean) * bn.weight / std
  104. def convert_dilated_to_nondilated(kernel, dilate_rate):
  105. identity_kernel = torch.ones((1, 1, 1, 1)).to(kernel.device)
  106. if kernel.size(1) == 1:
  107. # This is a DW kernel
  108. dilated = F.conv_transpose2d(kernel, identity_kernel, stride=dilate_rate)
  109. return dilated
  110. else:
  111. # This is a dense or group-wise (but not DW) kernel
  112. slices = []
  113. for i in range(kernel.size(1)):
  114. dilated = F.conv_transpose2d(kernel[:,i:i+1,:,:], identity_kernel, stride=dilate_rate)
  115. slices.append(dilated)
  116. return torch.cat(slices, dim=1)
  117. def merge_dilated_into_large_kernel(large_kernel, dilated_kernel, dilated_r):
  118. large_k = large_kernel.size(2)
  119. dilated_k = dilated_kernel.size(2)
  120. equivalent_kernel_size = dilated_r * (dilated_k - 1) + 1
  121. equivalent_kernel = convert_dilated_to_nondilated(dilated_kernel, dilated_r)
  122. rows_to_pad = large_k // 2 - equivalent_kernel_size // 2
  123. merged_kernel = large_kernel + F.pad(equivalent_kernel, [rows_to_pad] * 4)
  124. return merged_kernel
  125. class DilatedReparamBlock(nn.Module):
  126. """
  127. Dilated Reparam Block proposed in UniRepLKNet (https://github.com/AILab-CVC/UniRepLKNet)
  128. We assume the inputs to this block are (N, C, H, W)
  129. """
  130. def __init__(self, channels, kernel_size, deploy, use_sync_bn=False, attempt_use_lk_impl=True):
  131. super().__init__()
  132. self.lk_origin = get_conv2d(channels, channels, kernel_size, stride=1,
  133. padding=kernel_size//2, dilation=1, groups=channels, bias=deploy,
  134. attempt_use_lk_impl=attempt_use_lk_impl)
  135. self.attempt_use_lk_impl = attempt_use_lk_impl
  136. # Default settings. We did not tune them carefully. Different settings may work better.
  137. if kernel_size == 17:
  138. self.kernel_sizes = [5, 9, 3, 3, 3]
  139. self.dilates = [1, 2, 4, 5, 7]
  140. elif kernel_size == 15:
  141. self.kernel_sizes = [5, 7, 3, 3, 3]
  142. self.dilates = [1, 2, 3, 5, 7]
  143. elif kernel_size == 13:
  144. self.kernel_sizes = [5, 7, 3, 3, 3]
  145. self.dilates = [1, 2, 3, 4, 5]
  146. elif kernel_size == 11:
  147. self.kernel_sizes = [5, 5, 3, 3, 3]
  148. self.dilates = [1, 2, 3, 4, 5]
  149. elif kernel_size == 9:
  150. self.kernel_sizes = [5, 5, 3, 3]
  151. self.dilates = [1, 2, 3, 4]
  152. elif kernel_size == 7:
  153. self.kernel_sizes = [5, 3, 3]
  154. self.dilates = [1, 2, 3]
  155. elif kernel_size == 5:
  156. self.kernel_sizes = [3, 3]
  157. self.dilates = [1, 2]
  158. else:
  159. raise ValueError('Dilated Reparam Block requires kernel_size >= 5')
  160. if not deploy:
  161. self.origin_bn = get_bn(channels, use_sync_bn)
  162. for k, r in zip(self.kernel_sizes, self.dilates):
  163. self.__setattr__('dil_conv_k{}_{}'.format(k, r),
  164. nn.Conv2d(in_channels=channels, out_channels=channels, kernel_size=k, stride=1,
  165. padding=(r * (k - 1) + 1) // 2, dilation=r, groups=channels,
  166. bias=False))
  167. self.__setattr__('dil_bn_k{}_{}'.format(k, r), get_bn(channels, use_sync_bn=use_sync_bn))
  168. def forward(self, x):
  169. if not hasattr(self, 'origin_bn'): # deploy mode
  170. return self.lk_origin(x)
  171. out = self.origin_bn(self.lk_origin(x))
  172. for k, r in zip(self.kernel_sizes, self.dilates):
  173. conv = self.__getattr__('dil_conv_k{}_{}'.format(k, r))
  174. bn = self.__getattr__('dil_bn_k{}_{}'.format(k, r))
  175. out = out + bn(conv(x))
  176. return out
  177. def merge_dilated_branches(self):
  178. if hasattr(self, 'origin_bn'):
  179. origin_k, origin_b = fuse_bn(self.lk_origin, self.origin_bn)
  180. for k, r in zip(self.kernel_sizes, self.dilates):
  181. conv = self.__getattr__('dil_conv_k{}_{}'.format(k, r))
  182. bn = self.__getattr__('dil_bn_k{}_{}'.format(k, r))
  183. branch_k, branch_b = fuse_bn(conv, bn)
  184. origin_k = merge_dilated_into_large_kernel(origin_k, branch_k, r)
  185. origin_b += branch_b
  186. merged_conv = get_conv2d(origin_k.size(0), origin_k.size(0), origin_k.size(2), stride=1,
  187. padding=origin_k.size(2)//2, dilation=1, groups=origin_k.size(0), bias=True,
  188. attempt_use_lk_impl=self.attempt_use_lk_impl)
  189. merged_conv.weight.data = origin_k
  190. merged_conv.bias.data = origin_b
  191. self.lk_origin = merged_conv
  192. self.__delattr__('origin_bn')
  193. for k, r in zip(self.kernel_sizes, self.dilates):
  194. self.__delattr__('dil_conv_k{}_{}'.format(k, r))
  195. self.__delattr__('dil_bn_k{}_{}'.format(k, r))
  196. class UniRepLKNetBlock(nn.Module):
  197. def __init__(self,
  198. dim,
  199. kernel_size,
  200. drop_path=0.,
  201. layer_scale_init_value=1e-6,
  202. deploy=False,
  203. attempt_use_lk_impl=True,
  204. with_cp=False,
  205. use_sync_bn=False,
  206. ffn_factor=4):
  207. super().__init__()
  208. self.with_cp = with_cp
  209. # if deploy:
  210. # print('------------------------------- Note: deploy mode')
  211. # if self.with_cp:
  212. # print('****** note with_cp = True, reduce memory consumption but may slow down training ******')
  213. self.need_contiguous = (not deploy) or kernel_size >= 7
  214. if kernel_size == 0:
  215. self.dwconv = nn.Identity()
  216. self.norm = nn.Identity()
  217. elif deploy:
  218. self.dwconv = get_conv2d(dim, dim, kernel_size=kernel_size, stride=1, padding=kernel_size // 2,
  219. dilation=1, groups=dim, bias=True,
  220. attempt_use_lk_impl=attempt_use_lk_impl)
  221. self.norm = nn.Identity()
  222. elif kernel_size >= 7:
  223. self.dwconv = DilatedReparamBlock(dim, kernel_size, deploy=deploy,
  224. use_sync_bn=use_sync_bn,
  225. attempt_use_lk_impl=attempt_use_lk_impl)
  226. self.norm = get_bn(dim, use_sync_bn=use_sync_bn)
  227. elif kernel_size == 1:
  228. self.dwconv = nn.Conv2d(dim, dim, kernel_size=kernel_size, stride=1, padding=kernel_size // 2,
  229. dilation=1, groups=1, bias=deploy)
  230. self.norm = get_bn(dim, use_sync_bn=use_sync_bn)
  231. else:
  232. assert kernel_size in [3, 5]
  233. self.dwconv = nn.Conv2d(dim, dim, kernel_size=kernel_size, stride=1, padding=kernel_size // 2,
  234. dilation=1, groups=dim, bias=deploy)
  235. self.norm = get_bn(dim, use_sync_bn=use_sync_bn)
  236. self.se = SEBlock(dim, dim // 4)
  237. ffn_dim = int(ffn_factor * dim)
  238. self.pwconv1 = nn.Sequential(
  239. NCHWtoNHWC(),
  240. nn.Linear(dim, ffn_dim))
  241. self.act = nn.Sequential(
  242. nn.GELU(),
  243. GRNwithNHWC(ffn_dim, use_bias=not deploy))
  244. if deploy:
  245. self.pwconv2 = nn.Sequential(
  246. nn.Linear(ffn_dim, dim),
  247. NHWCtoNCHW())
  248. else:
  249. self.pwconv2 = nn.Sequential(
  250. nn.Linear(ffn_dim, dim, bias=False),
  251. NHWCtoNCHW(),
  252. get_bn(dim, use_sync_bn=use_sync_bn))
  253. self.gamma = nn.Parameter(layer_scale_init_value * torch.ones(dim),
  254. requires_grad=True) if (not deploy) and layer_scale_init_value is not None \
  255. and layer_scale_init_value > 0 else None
  256. self.drop_path = DropPath(drop_path) if drop_path > 0. else nn.Identity()
  257. def forward(self, inputs):
  258. def _f(x):
  259. if self.need_contiguous:
  260. x = x.contiguous()
  261. y = self.se(self.norm(self.dwconv(x)))
  262. y = self.pwconv2(self.act(self.pwconv1(y)))
  263. if self.gamma is not None:
  264. y = self.gamma.view(1, -1, 1, 1) * y
  265. return self.drop_path(y) + x
  266. if self.with_cp and inputs.requires_grad:
  267. return checkpoint.checkpoint(_f, inputs)
  268. else:
  269. return _f(inputs)
  270. def reparameterize(self):
  271. if hasattr(self.dwconv, 'merge_dilated_branches'):
  272. self.dwconv.merge_dilated_branches()
  273. if hasattr(self.norm, 'running_var') and hasattr(self.dwconv, 'lk_origin'):
  274. std = (self.norm.running_var + self.norm.eps).sqrt()
  275. self.dwconv.lk_origin.weight.data *= (self.norm.weight / std).view(-1, 1, 1, 1)
  276. self.dwconv.lk_origin.bias.data = self.norm.bias + (self.dwconv.lk_origin.bias - self.norm.running_mean) * self.norm.weight / std
  277. self.norm = nn.Identity()
  278. if self.gamma is not None:
  279. final_scale = self.gamma.data
  280. self.gamma = None
  281. else:
  282. final_scale = 1
  283. if self.act[1].use_bias and len(self.pwconv2) == 3:
  284. grn_bias = self.act[1].beta.data
  285. self.act[1].__delattr__('beta')
  286. self.act[1].use_bias = False
  287. linear = self.pwconv2[0]
  288. grn_bias_projected_bias = (linear.weight.data @ grn_bias.view(-1, 1)).squeeze()
  289. bn = self.pwconv2[2]
  290. std = (bn.running_var + bn.eps).sqrt()
  291. new_linear = nn.Linear(linear.in_features, linear.out_features, bias=True)
  292. new_linear.weight.data = linear.weight * (bn.weight / std * final_scale).view(-1, 1)
  293. linear_bias = 0 if linear.bias is None else linear.bias.data
  294. linear_bias += grn_bias_projected_bias
  295. new_linear.bias.data = (bn.bias + (linear_bias - bn.running_mean) * bn.weight / std) * final_scale
  296. self.pwconv2 = nn.Sequential(new_linear, self.pwconv2[1])
  297. default_UniRepLKNet_A_F_P_kernel_sizes = ((3, 3),
  298. (13, 13),
  299. (13, 13, 13, 13, 13, 13),
  300. (13, 13))
  301. default_UniRepLKNet_N_kernel_sizes = ((3, 3),
  302. (13, 13),
  303. (13, 13, 13, 13, 13, 13, 13, 13),
  304. (13, 13))
  305. default_UniRepLKNet_T_kernel_sizes = ((3, 3, 3),
  306. (13, 13, 13),
  307. (13, 3, 13, 3, 13, 3, 13, 3, 13, 3, 13, 3, 13, 3, 13, 3, 13, 3),
  308. (13, 13, 13))
  309. default_UniRepLKNet_S_B_L_XL_kernel_sizes = ((3, 3, 3),
  310. (13, 13, 13),
  311. (13, 3, 3, 13, 3, 3, 13, 3, 3, 13, 3, 3, 13, 3, 3, 13, 3, 3, 13, 3, 3, 13, 3, 3, 13, 3, 3),
  312. (13, 13, 13))
  313. UniRepLKNet_A_F_P_depths = (2, 2, 6, 2)
  314. UniRepLKNet_N_depths = (2, 2, 8, 2)
  315. UniRepLKNet_T_depths = (3, 3, 18, 3)
  316. UniRepLKNet_S_B_L_XL_depths = (3, 3, 27, 3)
  317. default_depths_to_kernel_sizes = {
  318. UniRepLKNet_A_F_P_depths: default_UniRepLKNet_A_F_P_kernel_sizes,
  319. UniRepLKNet_N_depths: default_UniRepLKNet_N_kernel_sizes,
  320. UniRepLKNet_T_depths: default_UniRepLKNet_T_kernel_sizes,
  321. UniRepLKNet_S_B_L_XL_depths: default_UniRepLKNet_S_B_L_XL_kernel_sizes
  322. }
  323. class UniRepLKNet(nn.Module):
  324. r""" UniRepLKNet
  325. A PyTorch impl of UniRepLKNet
  326. Args:
  327. in_chans (int): Number of input image channels. Default: 3
  328. num_classes (int): Number of classes for classification head. Default: 1000
  329. depths (tuple(int)): Number of blocks at each stage. Default: (3, 3, 27, 3)
  330. dims (int): Feature dimension at each stage. Default: (96, 192, 384, 768)
  331. drop_path_rate (float): Stochastic depth rate. Default: 0.
  332. layer_scale_init_value (float): Init value for Layer Scale. Default: 1e-6.
  333. head_init_scale (float): Init scaling value for classifier weights and biases. Default: 1.
  334. kernel_sizes (tuple(tuple(int))): Kernel size for each block. None means using the default settings. Default: None.
  335. deploy (bool): deploy = True means using the inference structure. Default: False
  336. with_cp (bool): with_cp = True means using torch.utils.checkpoint to save GPU memory. Default: False
  337. init_cfg (dict): weights to load. The easiest way to use UniRepLKNet with for OpenMMLab family. Default: None
  338. attempt_use_lk_impl (bool): try to load the efficient iGEMM large-kernel impl. Setting it to False disabling the iGEMM impl. Default: True
  339. use_sync_bn (bool): use_sync_bn = True means using sync BN. Use it if your batch size is small. Default: False
  340. """
  341. def __init__(self,
  342. in_chans=3,
  343. num_classes=1000,
  344. depths=(3, 3, 27, 3),
  345. dims=(96, 192, 384, 768),
  346. drop_path_rate=0.,
  347. layer_scale_init_value=1e-6,
  348. head_init_scale=1.,
  349. kernel_sizes=None,
  350. deploy=False,
  351. with_cp=False,
  352. init_cfg=None,
  353. attempt_use_lk_impl=True,
  354. use_sync_bn=False,
  355. **kwargs
  356. ):
  357. super().__init__()
  358. depths = tuple(depths)
  359. if kernel_sizes is None:
  360. if depths in default_depths_to_kernel_sizes:
  361. # print('=========== use default kernel size ')
  362. kernel_sizes = default_depths_to_kernel_sizes[depths]
  363. else:
  364. raise ValueError('no default kernel size settings for the given depths, '
  365. 'please specify kernel sizes for each block, e.g., '
  366. '((3, 3), (13, 13), (13, 13, 13, 13, 13, 13), (13, 13))')
  367. # print(kernel_sizes)
  368. for i in range(4):
  369. assert len(kernel_sizes[i]) == depths[i], 'kernel sizes do not match the depths'
  370. self.with_cp = with_cp
  371. dp_rates = [x.item() for x in torch.linspace(0, drop_path_rate, sum(depths))]
  372. # print('=========== drop path rates: ', dp_rates)
  373. self.downsample_layers = nn.ModuleList()
  374. self.downsample_layers.append(nn.Sequential(
  375. nn.Conv2d(in_chans, dims[0] // 2, kernel_size=3, stride=2, padding=1),
  376. LayerNorm(dims[0] // 2, eps=1e-6, data_format="channels_first"),
  377. nn.GELU(),
  378. nn.Conv2d(dims[0] // 2, dims[0], kernel_size=3, stride=2, padding=1),
  379. LayerNorm(dims[0], eps=1e-6, data_format="channels_first")))
  380. for i in range(3):
  381. self.downsample_layers.append(nn.Sequential(
  382. nn.Conv2d(dims[i], dims[i + 1], kernel_size=3, stride=2, padding=1),
  383. LayerNorm(dims[i + 1], eps=1e-6, data_format="channels_first")))
  384. self.stages = nn.ModuleList()
  385. cur = 0
  386. for i in range(4):
  387. main_stage = nn.Sequential(
  388. *[UniRepLKNetBlock(dim=dims[i], kernel_size=kernel_sizes[i][j], drop_path=dp_rates[cur + j],
  389. layer_scale_init_value=layer_scale_init_value, deploy=deploy,
  390. attempt_use_lk_impl=attempt_use_lk_impl,
  391. with_cp=with_cp, use_sync_bn=use_sync_bn) for j in
  392. range(depths[i])])
  393. self.stages.append(main_stage)
  394. cur += depths[i]
  395. self.output_mode = 'features'
  396. norm_layer = partial(LayerNorm, eps=1e-6, data_format="channels_first")
  397. for i_layer in range(4):
  398. layer = norm_layer(dims[i_layer])
  399. layer_name = f'norm{i_layer}'
  400. self.add_module(layer_name, layer)
  401. self.width_list = [i.size(1) for i in self.forward(torch.randn(1, 3, 640, 640))]
  402. self.apply(self._init_weights)
  403. def _init_weights(self, m):
  404. if isinstance(m, (nn.Conv2d, nn.Linear)):
  405. trunc_normal_(m.weight, std=.02)
  406. if hasattr(m, 'bias') and m.bias is not None:
  407. nn.init.constant_(m.bias, 0)
  408. def forward(self, x):
  409. if self.output_mode == 'logits':
  410. for stage_idx in range(4):
  411. x = self.downsample_layers[stage_idx](x)
  412. x = self.stages[stage_idx](x)
  413. x = self.norm(x.mean([-2, -1]))
  414. x = self.head(x)
  415. return x
  416. elif self.output_mode == 'features':
  417. outs = []
  418. for stage_idx in range(4):
  419. x = self.downsample_layers[stage_idx](x)
  420. x = self.stages[stage_idx](x)
  421. outs.append(self.__getattr__(f'norm{stage_idx}')(x))
  422. return outs
  423. else:
  424. raise ValueError('Defined new output mode?')
  425. def switch_to_deploy(self):
  426. for m in self.modules():
  427. if hasattr(m, 'reparameterize'):
  428. m.reparameterize()
  429. class LayerNorm(nn.Module):
  430. r""" LayerNorm implementation used in ConvNeXt
  431. LayerNorm that supports two data formats: channels_last (default) or channels_first.
  432. The ordering of the dimensions in the inputs. channels_last corresponds to inputs with
  433. shape (batch_size, height, width, channels) while channels_first corresponds to inputs
  434. with shape (batch_size, channels, height, width).
  435. """
  436. def __init__(self, normalized_shape, eps=1e-6, data_format="channels_last", reshape_last_to_first=False):
  437. super().__init__()
  438. self.weight = nn.Parameter(torch.ones(normalized_shape))
  439. self.bias = nn.Parameter(torch.zeros(normalized_shape))
  440. self.eps = eps
  441. self.data_format = data_format
  442. if self.data_format not in ["channels_last", "channels_first"]:
  443. raise NotImplementedError
  444. self.normalized_shape = (normalized_shape,)
  445. self.reshape_last_to_first = reshape_last_to_first
  446. def forward(self, x):
  447. if self.data_format == "channels_last":
  448. return F.layer_norm(x, self.normalized_shape, self.weight, self.bias, self.eps)
  449. elif self.data_format == "channels_first":
  450. u = x.mean(1, keepdim=True)
  451. s = (x - u).pow(2).mean(1, keepdim=True)
  452. x = (x - u) / torch.sqrt(s + self.eps)
  453. x = self.weight[:, None, None] * x + self.bias[:, None, None]
  454. return x
  455. def update_weight(model_dict, weight_dict):
  456. idx, temp_dict = 0, {}
  457. for k, v in weight_dict.items():
  458. if k in model_dict.keys() and np.shape(model_dict[k]) == np.shape(v):
  459. temp_dict[k] = v
  460. idx += 1
  461. model_dict.update(temp_dict)
  462. print(f'loading weights... {idx}/{len(model_dict)} items')
  463. return model_dict
  464. def unireplknet_a(pretrained='', **kwargs):
  465. model = UniRepLKNet(depths=UniRepLKNet_A_F_P_depths, dims=(40, 80, 160, 320), **kwargs)
  466. if pretrained:
  467. model.load_state_dict(update_weight(model.state_dict(), torch.load(pretrained)))
  468. return model
  469. def unireplknet_f(pretrained='', **kwargs):
  470. model = UniRepLKNet(depths=UniRepLKNet_A_F_P_depths, dims=(48, 96, 192, 384), **kwargs)
  471. if pretrained:
  472. model.load_state_dict(update_weight(model.state_dict(), torch.load(pretrained)))
  473. return model
  474. def unireplknet_p(pretrained='', **kwargs):
  475. model = UniRepLKNet(depths=UniRepLKNet_A_F_P_depths, dims=(64, 128, 256, 512), **kwargs)
  476. if pretrained:
  477. model.load_state_dict(update_weight(model.state_dict(), torch.load(pretrained)))
  478. return model
  479. def unireplknet_n(weights='', **kwargs):
  480. model = UniRepLKNet(depths=UniRepLKNet_N_depths, dims=(80, 160, 320, 640), **kwargs)
  481. if weights:
  482. model.load_state_dict(update_weight(model.state_dict(), torch.load(weights)))
  483. return model
  484. def unireplknet_t(pretrained='', **kwargs):
  485. model = UniRepLKNet(depths=UniRepLKNet_T_depths, dims=(80, 160, 320, 640), **kwargs)
  486. if pretrained:
  487. model.load_state_dict(update_weight(model.state_dict(), torch.load(pretrained)))
  488. return model
  489. def unireplknet_s(pretrained='', **kwargs):
  490. model = UniRepLKNet(depths=UniRepLKNet_S_B_L_XL_depths, dims=(96, 192, 384, 768), **kwargs)
  491. if pretrained:
  492. model.load_state_dict(update_weight(model.state_dict(), torch.load(pretrained)))
  493. return model
  494. def unireplknet_b(pretrained='', **kwargs):
  495. model = UniRepLKNet(depths=UniRepLKNet_S_B_L_XL_depths, dims=(128, 256, 512, 1024), **kwargs)
  496. if pretrained:
  497. model.load_state_dict(update_weight(model.state_dict(), torch.load(pretrained)))
  498. return model
  499. def unireplknet_l(pretrained='', **kwargs):
  500. model = UniRepLKNet(depths=UniRepLKNet_S_B_L_XL_depths, dims=(192, 384, 768, 1536), **kwargs)
  501. if pretrained:
  502. model.load_state_dict(update_weight(model.state_dict(), torch.load(pretrained)))
  503. return model
  504. def unireplknet_xl(pretrained='', **kwargs):
  505. model = UniRepLKNet(depths=UniRepLKNet_S_B_L_XL_depths, dims=(256, 512, 1024, 2048), **kwargs)
  506. if pretrained:
  507. model.load_state_dict(update_weight(model.state_dict(), torch.load(pretrained)))
  508. return model


四、手把手教你添加UniRepLknet机制

4.1 修改一

第一步还是建立文件,我们找到如下ultralytics/nn文件夹下建立一个目录名字呢就是'Addmodules'文件夹( !然后在其内部建立一个新的py文件将核心代码复制粘贴进去即可


4.2 修改二

第二步我们在该目录下创建一个新的py文件名字为'__init__.py'( ,然后在其内部导入我们的检测头如下图所示。


4.3 修改三

第三步我门中到如下文件'ultralytics/nn/tasks.py'进行导入和注册我们的模块( !


4.4 修改四

添加如下两行代码!!!


4.5 修改五

找到七百多行大概把具体看图片,按照图片来修改就行,添加红框内的部分,注意没有()只是 函数 名。

  1. elif m in {自行添加对应的模型即可,下面都是一样的}:
  2. m = m(*args)
  3. c2 = m.width_list # 返回通道列表
  4. backbone = True


4.6 修改六

下面的两个红框内都是需要改动的。

  1. if isinstance(c2, list):
  2. m_ = m
  3. m_.backbone = True
  4. else:
  5. m_ = nn.Sequential(*(m(*args) for _ in range(n))) if n > 1 else m(*args) # module
  6. t = str(m)[8:-2].replace('__main__.', '') # module type
  7. m.np = sum(x.numel() for x in m_.parameters()) # number params
  8. m_.i, m_.f, m_.type = i + 4 if backbone else i, f, t # attach index, 'from' index, type


4.7 修改七

如下的也需要修改,全部按照我的来。

代码如下把原先的代码替换了即可。

  1. if verbose:
  2. LOGGER.info(f'{i:>3}{str(f):>20}{n_:>3}{m.np:10.0f} {t:<45}{str(args):<30}') # print
  3. save.extend(x % (i + 4 if backbone else i) for x in ([f] if isinstance(f, int) else f) if x != -1) # append to savelist
  4. layers.append(m_)
  5. if i == 0:
  6. ch = []
  7. if isinstance(c2, list):
  8. ch.extend(c2)
  9. if len(c2) != 5:
  10. ch.insert(0, 0)
  11. else:
  12. ch.append(c2)


4.8 修改八

修改八和前面的都不太一样,需要修改前向传播中的一个部分, 已经离开了parse_model方法了。

可以在图片中开代码行数,没有离开task.py文件都是同一个文件。 同时这个部分有好几个前向传播都很相似,大家不要看错了, 是70多行左右的!!!,同时我后面提供了代码,大家直接复制粘贴即可,有时间我针对这里会出一个视频。

​​

代码如下->

  1. def _predict_once(self, x, profile=False, visualize=False, embed=None):
  2. """
  3. Perform a forward pass through the network.
  4. Args:
  5. x (torch.Tensor): The input tensor to the model.
  6. profile (bool): Print the computation time of each layer if True, defaults to False.
  7. visualize (bool): Save the feature maps of the model if True, defaults to False.
  8. embed (list, optional): A list of feature vectors/embeddings to return.
  9. Returns:
  10. (torch.Tensor): The last output of the model.
  11. """
  12. y, dt, embeddings = [], [], [] # outputs
  13. for m in self.model:
  14. if m.f != -1: # if not from previous layer
  15. x = y[m.f] if isinstance(m.f, int) else [x if j == -1 else y[j] for j in m.f] # from earlier layers
  16. if profile:
  17. self._profile_one_layer(m, x, dt)
  18. if hasattr(m, 'backbone'):
  19. x = m(x)
  20. if len(x) != 5: # 0 - 5
  21. x.insert(0, None)
  22. for index, i in enumerate(x):
  23. if index in self.save:
  24. y.append(i)
  25. else:
  26. y.append(None)
  27. x = x[-1] # 最后一个输出传给下一层
  28. else:
  29. x = m(x) # run
  30. y.append(x if m.i in self.save else None) # save output
  31. if visualize:
  32. feature_visualization(x, m.type, m.i, save_dir=visualize)
  33. if embed and m.i in embed:
  34. embeddings.append(nn.functional.adaptive_avg_pool2d(x, (1, 1)).squeeze(-1).squeeze(-1)) # flatten
  35. if m.i == max(embed):
  36. return torch.unbind(torch.cat(embeddings, 1), dim=0)
  37. return x

到这里就完成了修改部分,但是这里面细节很多,大家千万要注意不要替换多余的代码,导致报错,也不要拉下任何一部,都会导致运行失败,而且报错很难排查!!!很难排查!!!


注意!!! 额外的修改!

关注我的其实都知道,我大部分的修改都是一样的,这个网络需要额外的修改一步,就是s一个参数,将下面的s改为640!!!即可完美运行!!


打印计算量问题解决方案

我们找到如下文件'ultralytics/utils/torch_utils.py'按照如下的图片进行修改,否则容易打印不出来计算量。


注意事项!!!

如果大家在验证的时候报错形状不匹配的错误可以固定 验证集 的图片尺寸,方法如下 ->

找到下面这个文件ultralytics/ models /yolo/detect/train.py然后其中有一个类是DetectionTrainer class中的build_dataset函数中的一个参数rect=mode == 'val'改为rect=False


五、UniRepLknet的yaml文件

复制如下yaml文件进行运行!!!


5.1 UniRepLknet 的yaml文件版本1

此版本训练信息:YOLO11-UniRepLKNet summary: 570 layers, 1,876,231 parameters, 1,876,215 gradients, 4.5 GFLOPs

使用说明:# 下面 [-1, 1, UniRepLKNet, [0.25]] 参数位置的0.25是通道放缩的系数, YOLOv11N是0.25 YOLOv11S是0.5 YOLOv11M是1. YOLOv11l是1 YOLOv11是1.5大家根据自己训练的YOLO版本设定即可.

# 本文支持版本有'unireplknet_a', 'unireplknet_f', 'unireplknet_p', 'unireplknet_n', 'unireplknet_t', 'unireplknet_s', 'unireplknet_b', 'unireplknet_l', 'unireplknet_xl'

  1. # Ultralytics YOLO 🚀, AGPL-3.0 license
  2. # YOLO11 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect
  3. # Parameters
  4. nc: 80 # number of classes
  5. scales: # model compound scaling constants, i.e. 'model=yolo11n.yaml' will call yolo11.yaml with scale 'n'
  6. # [depth, width, max_channels]
  7. n: [0.50, 0.25, 1024] # summary: 319 layers, 2624080 parameters, 2624064 gradients, 6.6 GFLOPs
  8. s: [0.50, 0.50, 1024] # summary: 319 layers, 9458752 parameters, 9458736 gradients, 21.7 GFLOPs
  9. m: [0.50, 1.00, 512] # summary: 409 layers, 20114688 parameters, 20114672 gradients, 68.5 GFLOPs
  10. l: [1.00, 1.00, 512] # summary: 631 layers, 25372160 parameters, 25372144 gradients, 87.6 GFLOPs
  11. x: [1.00, 1.50, 512] # summary: 631 layers, 56966176 parameters, 56966160 gradients, 196.0 GFLOPs
  12. # 下面 [-1, 1, LSKNet, [0.250.5]] 参数位置的0.25是通道放缩的系数, YOLOv11N是0.25 YOLOv11S是0.5 YOLOv11M是1. YOLOv11l是1 YOLOv111.5大家根据自己训练的YOLO版本设定即可.
  13. # 支持的版本有: __all__ = ['unireplknet_a', 'unireplknet_f', 'unireplknet_p', 'unireplknet_n', 'unireplknet_t', 'unireplknet_s', 'unireplknet_b', 'unireplknet_l', 'unireplknet_xl']
  14. # YOLO11n backbone
  15. backbone:
  16. # [from, repeats, module, args]
  17. - [-1, 1, unireplknet_a, [0.25]] # 0-4 P1/2
  18. - [-1, 1, SPPF, [1024, 5]] # 5
  19. - [-1, 2, C2PSA, [1024]] # 6
  20. # YOLO11n head
  21. head:
  22. - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  23. - [[-1, 3], 1, Concat, [1]] # cat backbone P4
  24. - [-1, 2, C3k2, [512, False]] # 9
  25. - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  26. - [[-1, 2], 1, Concat, [1]] # cat backbone P3
  27. - [-1, 2, C3k2, [256, False]] # 12 (P3/8-small)
  28. - [-1, 1, Conv, [256, 3, 2]]
  29. - [[-1, 9], 1, Concat, [1]] # cat head P4
  30. - [-1, 2, C3k2, [512, False]] # 15 (P4/16-medium)
  31. - [-1, 1, Conv, [512, 3, 2]]
  32. - [[-1, 6], 1, Concat, [1]] # cat head P5
  33. - [-1, 2, C3k2, [1024, True]] # 18 (P5/32-large)
  34. - [[12, 15, 18], 1, Detect, [nc]] # Detect(P3, P4, P5)


5.2 训练文件

  1. import warnings
  2. warnings.filterwarnings('ignore')
  3. from ultralytics import YOLO
  4. if __name__ == '__main__':
  5. model = YOLO('ultralytics/cfg/models/v8/yolov8-C2f-FasterBlock.yaml')
  6. # model.load('yolov8n.pt') # loading pretrain weights
  7. model.train(data=r'替换数据集yaml文件地址',
  8. # 如果大家任务是其它的'ultralytics/cfg/default.yaml'找到这里修改task可以改成detect, segment, classify, pose
  9. cache=False,
  10. imgsz=640,
  11. epochs=150,
  12. single_cls=False, # 是否是单类别检测
  13. batch=4,
  14. close_mosaic=10,
  15. workers=0,
  16. device='0',
  17. optimizer='SGD', # using SGD
  18. # resume='', # 如过想续训就设置last.pt的地址
  19. amp=False, # 如果出现训练损失为Nan可以关闭amp
  20. project='runs/train',
  21. name='exp',
  22. )


六、成功运行记录

下面是成功运行的截图,已经完成了有1个epochs的训练,图片太大截不全第2个epochs,这里改完之后打印出了点问题,但是不影响任何功能,后期我找时间修复一下这个问题。

​​


七、本文总结

到此本文的正式分享内容就结束了,在这里给大家推荐我的YOLOv11改进有效涨点专栏,本专栏目前为新开的平均质量分98分,后期我会根据各种最新的前沿顶会进行论文复现,也会对一些老的改进机制进行补充, 目前本专栏免费阅读(暂时,大家尽早关注不迷路~) ,如果大家觉得本文帮助到你了,订阅本专栏,关注后续更多的更新~

​​​