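YOLOv11 Improvements | Conv Series | Replacing Conventional Downsampling with the 2024 Linear Deformable Convolution LDConv, Plus a Secondary Innovation on C3k2 (Code and Modification Steps Included)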

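1. Introduction

This post presents a new improvement: replacing the conventional downsampling operations in YOLOv11 with LDConv, the linear deformable convolution published in 2024 (notably, by the same author as RFAConv). LDConv targets a limitation of standard convolution, which samples within a local window of fixed shape and size and so cannot adapt dynamically to objects of different shapes. Deformable convolution (Deformable Conv) allows flexible sampling positions, but its parameter count grows quadratically with kernel size, which makes it computationally inefficient. LDConv offers more flexibility than deformable convolution and lets the number of kernel parameters grow linearly, which avoids that quadratic growth and can make the model lighter.

You are welcome to subscribe to my column and learn YOLO together!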


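Contents

1. Introduction

2. How It Works

3. Core Code

4. How to Add the Module

4.1 Modification 1

4.2 Modification 2

4.3 Modification 3

4.4 Modification 4

5. Training

5.1 YAML File 1

5.2 YAML File 2

5.3 YAML File 3

5.4 Training Code

5.5 Training Screenshots

6. Summary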


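2. How It Works

Official paper: click here to go to the official paper

Official code: click here to go to the official code


The paper, titled "LDConv: Linear deformable convolution for improving convolutional neural networks", introduces a new convolution operation, linear deformable convolution (LDConv). LDConv targets a limitation of standard convolution, which samples within a local window of fixed shape and size and so cannot adapt dynamically to objects of different shapes. Deformable convolution (Deformable Conv) allows flexible sampling positions, but its parameter count grows quadratically with kernel size, which makes it computationally inefficient.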
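To make the "quadratic versus linear" claim concrete, here is a minimal sketch that counts weights. It compares a plain k x k convolution with an LDConv layer as implemented in the core code of this post (a final convolution with kernel (num_param, 1) plus a 3 x 3 offset convolution with 2 * num_param output channels). The helper names are my own, and BatchNorm parameters are ignored.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights of a k x k convolution without bias: grows with k squared."""
    return c_in * c_out * k * k


def ldconv_params(c_in, c_out, num_param):
    """Weights of one LDConv layer, following the layer layout in this post's code.

    final conv : kernel (num_param, 1), no bias -> c_in * c_out * num_param
    offset conv: 3 x 3, c_in -> 2 * num_param channels, with bias
    """
    final_conv = c_in * c_out * num_param
    offset_conv = c_in * (2 * num_param) * 3 * 3 + 2 * num_param
    return final_conv + offset_conv


c_in, c_out = 64, 128
for k in (3, 5, 7):
    print(f"standard {k}x{k}: {standard_conv_params(c_in, c_out, k):,}")
for n in (3, 5, 6, 9):
    print(f"LDConv, {n} sampling points: {ldconv_params(c_in, c_out, n):,}")
```

Each extra sampling point adds the same number of weights, so any kernel size (3, 5, 6, 7, ...) can be chosen, not only perfect squares.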

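Main content and principles:
1. Limitations of standard convolution: a conventional convolutional neural network (CNN) uses fixed square kernels that cannot adjust to changing target shapes, which limits the network's ability to capture information from different spatial positions.

2. Deformable convolution (Deformable Conv): deformable convolution adds offsets to the sampling grid so that the kernel can adapt to an object's shape. Its parameter count still grows quadratically with kernel size, however, so it remains computationally inefficient.

3. Introducing LDConv:
- LDConv offers more flexibility than deformable convolution and lets the number of kernel parameters grow linearly, which avoids the quadratic parameter growth of deformable convolution.
- It introduces a coordinate generation algorithm that produces initial sampling positions for kernels of arbitrary size.
- Offsets adjust the sampling shape dynamically, so the kernel fits the target shape more closely and feature extraction becomes more efficient.

4. Main contributions:
- LDConv gives more freedom in choosing the parameter count and kernel size, allowing a better balance between network overhead and performance.
- It can be used for computer vision tasks such as object detection; the experiments report strong results on COCO2017, VOC and VisDrone-DET2021.

5. Object detection experiments: experiments on several datasets show that LDConv improves CNN performance on object detection, especially on large objects, thanks to its flexible sampling shapes.

6. Application and flexibility:
- LDConv can replace conventional convolution and improve network performance without significantly increasing computational cost.
- It is a plug-and-play convolution that integrates easily into existing models and improves results across tasks (in this post it replaces Conv in YOLOv11).
- LDConv can also be used inside other modules (such as FasterBlock and GSBottleneck) to further improve efficiency and limit parameter growth (the secondary innovation).

The paper stresses that, by flexibly adjusting kernel shape and size, LDConv strikes a better balance between computational efficiency and network performance than existing methods such as standard and deformable convolution.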
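The coordinate generation step can be illustrated without PyTorch. The sketch below is a pure-Python restatement of what the `_get_p_n` method in this post's core code computes: it lays the sampling points out row by row on a grid about sqrt(N) wide and puts any leftover points in one extra, partial row. The function name `initial_sampling_grid` is my own.

```python
import math


def initial_sampling_grid(num_param):
    """Return the initial (row, col) sampling positions for a kernel with num_param points."""
    base = round(math.sqrt(num_param))   # width of the regular part of the grid
    rows = num_param // base             # number of full rows
    rest = num_param % base              # points left over for a final partial row
    points = [(r, c) for r in range(rows) for c in range(base)]
    points += [(rows, c) for c in range(rest)]
    return points


for n in (3, 5, 6, 9):
    print(n, initial_sampling_grid(n))
```

For 5 points this gives a 2 x 2 block plus one point in a third row; for 6 points it gives a 3 x 2 block. The learned offsets are then added to these initial positions.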


3. Core Code

See Section 4 for how to use the code.

import math
import torch
import torch.nn as nn
from einops import rearrange

__all__ = ['LDConv', 'C3k2_LDConv1', 'C3k2_LDConv2']


class LDConv(nn.Module):
    def __init__(self, inc, outc, num_param, stride=1, bias=None):
        super(LDConv, self).__init__()
        self.num_param = num_param
        self.stride = stride
        self.conv = nn.Sequential(nn.Conv2d(inc, outc, kernel_size=(num_param, 1), stride=(num_param, 1), bias=bias),
                                  nn.BatchNorm2d(outc),
                                  nn.SiLU())  # the conv adds the BN and SiLU to compare original Conv in YOLOv5.
        # Offset branch: predicts 2 * num_param offsets (x and y) per output position.
        self.p_conv = nn.Conv2d(inc, 2 * num_param, kernel_size=3, padding=1, stride=stride)
        nn.init.constant_(self.p_conv.weight, 0)
        self.p_conv.register_full_backward_hook(self._set_lr)

    @staticmethod
    def _set_lr(module, grad_input, grad_output):
        # Note: as written, this hook returns None, so the gradients are left unchanged.
        grad_input = (grad_input[i] * 0.1 for i in range(len(grad_input)))
        grad_output = (grad_output[i] * 0.1 for i in range(len(grad_output)))

    def forward(self, x):
        # N is num_param.
        offset = self.p_conv(x)
        dtype = offset.data.type()
        N = offset.size(1) // 2
        # (b, 2N, h, w)
        p = self._get_p(offset, dtype)

        # (b, h, w, 2N)
        p = p.contiguous().permute(0, 2, 3, 1)
        q_lt = p.detach().floor()
        q_rb = q_lt + 1

        q_lt = torch.cat([torch.clamp(q_lt[..., :N], 0, x.size(2) - 1), torch.clamp(q_lt[..., N:], 0, x.size(3) - 1)],
                         dim=-1).long()
        q_rb = torch.cat([torch.clamp(q_rb[..., :N], 0, x.size(2) - 1), torch.clamp(q_rb[..., N:], 0, x.size(3) - 1)],
                         dim=-1).long()
        q_lb = torch.cat([q_lt[..., :N], q_rb[..., N:]], dim=-1)
        q_rt = torch.cat([q_rb[..., :N], q_lt[..., N:]], dim=-1)

        # clip p
        p = torch.cat([torch.clamp(p[..., :N], 0, x.size(2) - 1), torch.clamp(p[..., N:], 0, x.size(3) - 1)], dim=-1)

        # bilinear kernel (b, h, w, N)
        g_lt = (1 + (q_lt[..., :N].type_as(p) - p[..., :N])) * (1 + (q_lt[..., N:].type_as(p) - p[..., N:]))
        g_rb = (1 - (q_rb[..., :N].type_as(p) - p[..., :N])) * (1 - (q_rb[..., N:].type_as(p) - p[..., N:]))
        g_lb = (1 + (q_lb[..., :N].type_as(p) - p[..., :N])) * (1 - (q_lb[..., N:].type_as(p) - p[..., N:]))
        g_rt = (1 - (q_rt[..., :N].type_as(p) - p[..., :N])) * (1 + (q_rt[..., N:].type_as(p) - p[..., N:]))

        # resampling the features based on the modified coordinates.
        x_q_lt = self._get_x_q(x, q_lt, N)
        x_q_rb = self._get_x_q(x, q_rb, N)
        x_q_lb = self._get_x_q(x, q_lb, N)
        x_q_rt = self._get_x_q(x, q_rt, N)

        # bilinear
        x_offset = g_lt.unsqueeze(dim=1) * x_q_lt + \
                   g_rb.unsqueeze(dim=1) * x_q_rb + \
                   g_lb.unsqueeze(dim=1) * x_q_lb + \
                   g_rt.unsqueeze(dim=1) * x_q_rt

        x_offset = self._reshape_x_offset(x_offset, self.num_param)
        out = self.conv(x_offset)
        return out

    # generating the initial sampled shapes for the LDConv with different sizes.
    def _get_p_n(self, N, dtype):
        base_int = round(math.sqrt(self.num_param))
        row_number = self.num_param // base_int
        mod_number = self.num_param % base_int
        p_n_x, p_n_y = torch.meshgrid(
            torch.arange(0, row_number),
            torch.arange(0, base_int))
        p_n_x = torch.flatten(p_n_x)
        p_n_y = torch.flatten(p_n_y)
        if mod_number > 0:
            mod_p_n_x, mod_p_n_y = torch.meshgrid(
                torch.arange(row_number, row_number + 1),
                torch.arange(0, mod_number))
            mod_p_n_x = torch.flatten(mod_p_n_x)
            mod_p_n_y = torch.flatten(mod_p_n_y)
            p_n_x, p_n_y = torch.cat((p_n_x, mod_p_n_x)), torch.cat((p_n_y, mod_p_n_y))
        p_n = torch.cat([p_n_x, p_n_y], 0)
        p_n = p_n.view(1, 2 * N, 1, 1).type(dtype)
        return p_n

    # no zero-padding
    def _get_p_0(self, h, w, N, dtype):
        p_0_x, p_0_y = torch.meshgrid(
            torch.arange(0, h * self.stride, self.stride),
            torch.arange(0, w * self.stride, self.stride))
        p_0_x = torch.flatten(p_0_x).view(1, 1, h, w).repeat(1, N, 1, 1)
        p_0_y = torch.flatten(p_0_y).view(1, 1, h, w).repeat(1, N, 1, 1)
        p_0 = torch.cat([p_0_x, p_0_y], 1).type(dtype)
        return p_0

    def _get_p(self, offset, dtype):
        N, h, w = offset.size(1) // 2, offset.size(2), offset.size(3)
        # (1, 2N, 1, 1)
        p_n = self._get_p_n(N, dtype)
        # (1, 2N, h, w)
        p_0 = self._get_p_0(h, w, N, dtype)
        p = p_0 + p_n + offset
        return p

    def _get_x_q(self, x, q, N):
        b, h, w, _ = q.size()
        padded_w = x.size(3)
        c = x.size(1)
        # (b, c, h*w)
        x = x.contiguous().view(b, c, -1)
        # (b, h, w, N)
        index = q[..., :N] * padded_w + q[..., N:]  # offset_x*w + offset_y
        # (b, c, h*w*N)
        index = index.contiguous().unsqueeze(dim=1).expand(-1, c, -1, -1, -1).contiguous().view(b, c, -1)
        x_offset = x.gather(dim=-1, index=index).contiguous().view(b, c, h, w, N)
        return x_offset

    # Stacking resampled features in the row direction.
    @staticmethod
    def _reshape_x_offset(x_offset, num_param):
        b, c, h, w, n = x_offset.size()
        # using Conv3d
        # x_offset = x_offset.permute(0,1,4,2,3), then Conv3d(c,c_out, kernel_size =(num_param,1,1),stride=(num_param,1,1),bias= False)
        # using 1 × 1 Conv
        # x_offset = x_offset.permute(0,1,4,2,3), then, x_offset.view(b,c×num_param,h,w) finally, Conv2d(c×num_param,c_out, kernel_size =1,stride=1,bias= False)
        # using the column conv as follow, then, Conv2d(inc, outc, kernel_size=(num_param, 1), stride=(num_param, 1), bias=bias)
        x_offset = rearrange(x_offset, 'b c h w n -> b c (h n) w')
        return x_offset


class Bottleneck(nn.Module):
    """Standard bottleneck."""

    def __init__(self, c1, c2, shortcut=True, g=1, k=(3, 3), e=0.5):
        """Initializes a standard bottleneck module with optional shortcut connection and configurable parameters."""
        super().__init__()
        c_ = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, k[0], 1)
        self.cv2 = Conv(c_, c2, k[1], 1, g=g)
        self.add = shortcut and c1 == c2

    def forward(self, x):
        """Applies both convolutions, adding the input back when the shortcut is enabled."""
        return x + self.cv2(self.cv1(x)) if self.add else self.cv2(self.cv1(x))


class Bottleneck_LDConv(nn.Module):
    # Standard bottleneck whose second convolution is an LDConv with 3 sampling points
    def __init__(self, c1, c2, shortcut=True, g=1, k=(3, 3), e=0.5):  # ch_in, ch_out, shortcut, groups, kernels, expand
        super().__init__()
        c_ = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, k[0], 1)
        self.cv2 = LDConv(c_, c2, 3)
        self.add = shortcut and c1 == c2

    def forward(self, x):
        return x + self.cv2(self.cv1(x)) if self.add else self.cv2(self.cv1(x))


def autopad(k, p=None, d=1):  # kernel, padding, dilation
    """Pad to 'same' shape outputs."""
    if d > 1:
        k = d * (k - 1) + 1 if isinstance(k, int) else [d * (x - 1) + 1 for x in k]  # actual kernel-size
    if p is None:
        p = k // 2 if isinstance(k, int) else [x // 2 for x in k]  # auto-pad
    return p


class Conv(nn.Module):
    """Standard convolution with args(ch_in, ch_out, kernel, stride, padding, groups, dilation, activation)."""
    default_act = nn.SiLU()  # default activation

    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, d=1, act=True):
        """Initialize Conv layer with given arguments including activation."""
        super().__init__()
        self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p, d), groups=g, dilation=d, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = self.default_act if act is True else act if isinstance(act, nn.Module) else nn.Identity()

    def forward(self, x):
        """Apply convolution, batch normalization and activation to input tensor."""
        return self.act(self.bn(self.conv(x)))

    def forward_fuse(self, x):
        """Apply convolution and activation only (batch normalization is skipped)."""
        return self.act(self.conv(x))


class C2f(nn.Module):
    """Faster Implementation of CSP Bottleneck with 2 convolutions."""

    def __init__(self, c1, c2, n=1, shortcut=False, g=1, e=0.5):
        """Initializes a CSP bottleneck with 2 convolutions and n Bottleneck blocks for faster processing."""
        super().__init__()
        self.c = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, 2 * self.c, 1, 1)
        self.cv2 = Conv((2 + n) * self.c, c2, 1)  # optional act=FReLU(c2)
        self.m = nn.ModuleList(Bottleneck(self.c, self.c, shortcut, g, k=((3, 3), (3, 3)), e=1.0) for _ in range(n))

    def forward(self, x):
        """Forward pass through C2f layer."""
        y = list(self.cv1(x).chunk(2, 1))
        y.extend(m(y[-1]) for m in self.m)
        return self.cv2(torch.cat(y, 1))

    def forward_split(self, x):
        """Forward pass using split() instead of chunk()."""
        y = list(self.cv1(x).split((self.c, self.c), 1))
        y.extend(m(y[-1]) for m in self.m)
        return self.cv2(torch.cat(y, 1))


class C3(nn.Module):
    """CSP Bottleneck with 3 convolutions."""

    def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5):
        """Initialize the CSP Bottleneck with given channels, number, shortcut, groups, and expansion values."""
        super().__init__()
        c_ = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv(c1, c_, 1, 1)
        self.cv3 = Conv(2 * c_, c2, 1)  # optional act=FReLU(c2)
        self.m = nn.Sequential(*(Bottleneck(c_, c_, shortcut, g, k=((1, 1), (3, 3)), e=1.0) for _ in range(n)))

    def forward(self, x):
        """Forward pass through the CSP bottleneck with 3 convolutions."""
        return self.cv3(torch.cat((self.m(self.cv1(x)), self.cv2(x)), 1))


class C3k(C3):
    """C3k is a CSP bottleneck module with customizable kernel sizes for feature extraction in neural networks."""

    def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5, k=3):
        """Initializes the C3k module with specified channels, number of layers, and configurations."""
        super().__init__(c1, c2, n, shortcut, g, e)
        c_ = int(c2 * e)  # hidden channels
        # self.m = nn.Sequential(*(RepBottleneck(c_, c_, shortcut, g, k=(k, k), e=1.0) for _ in range(n)))
        self.m = nn.Sequential(*(Bottleneck(c_, c_, shortcut, g, k=(k, k), e=1.0) for _ in range(n)))


class C3kLDConv(C3):
    """C3k variant whose inner blocks are Bottleneck_LDConv instead of Bottleneck."""

    def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5, k=3):
        """Initializes the C3kLDConv module with specified channels, number of layers, and configurations."""
        super().__init__(c1, c2, n, shortcut, g, e)
        c_ = int(c2 * e)  # hidden channels
        # self.m = nn.Sequential(*(RepBottleneck(c_, c_, shortcut, g, k=(k, k), e=1.0) for _ in range(n)))
        self.m = nn.Sequential(*(Bottleneck_LDConv(c_, c_, shortcut, g, k=(k, k), e=1.0) for _ in range(n)))


class C3k2_LDConv1(C2f):
    """C3k2 variant 1: uses Bottleneck_LDConv when c3k is False, and the standard C3k otherwise."""

    def __init__(self, c1, c2, n=1, c3k=False, e=0.5, g=1, shortcut=True):
        """Initializes the C3k2 module, a faster CSP Bottleneck with 2 convolutions and optional C3k blocks."""
        super().__init__(c1, c2, n, shortcut, g, e)
        self.m = nn.ModuleList(
            C3k(self.c, self.c, 2, shortcut, g) if c3k else Bottleneck_LDConv(self.c, self.c, shortcut, g) for _ in range(n)
        )


class C3k2_LDConv2(C2f):
    """C3k2 variant 2: uses C3kLDConv when c3k is True, and the standard Bottleneck otherwise."""

    def __init__(self, c1, c2, n=1, c3k=False, e=0.5, g=1, shortcut=True):
        """Initializes the C3k2 module, a faster CSP Bottleneck with 2 convolutions and optional C3k blocks."""
        super().__init__(c1, c2, n, shortcut, g, e)
        self.m = nn.ModuleList(
            C3kLDConv(self.c, self.c, 2, shortcut, g) if c3k else Bottleneck(self.c, self.c, shortcut, g) for _ in range(n)
        )


if __name__ == "__main__":
    # Generating Sample image
    image_size = (1, 64, 224, 224)
    image = torch.rand(*image_size)

    # Model
    model = C3k2_LDConv2(64, 64)

    out = model(image)
    print(out.size())
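The four `g_lt`, `g_rb`, `g_lb`, `g_rt` terms in `LDConv.forward` are ordinary bilinear interpolation weights. As a small illustration, the sketch below evaluates the same expressions for a single fractional sampling position in plain Python. The function name is my own, and the clamping to the feature map border done in the real code is left out.

```python
import math


def bilinear_weights(px, py):
    """Return {(x, y): weight} for the four integer neighbours of the point (px, py)."""
    x0, y0 = math.floor(px), math.floor(py)   # "left-top" corner, q_lt
    x1, y1 = x0 + 1, y0 + 1                   # "right-bottom" corner, q_rb
    return {
        (x0, y0): (1 + (x0 - px)) * (1 + (y0 - py)),  # g_lt
        (x1, y1): (1 - (x1 - px)) * (1 - (y1 - py)),  # g_rb
        (x0, y1): (1 + (x0 - px)) * (1 - (y1 - py)),  # g_lb
        (x1, y0): (1 - (x1 - px)) * (1 + (y0 - py)),  # g_rt
    }


def sample(feature, px, py):
    """Bilinearly sample a function feature(x, y) defined on integer coordinates."""
    return sum(w * feature(x, y) for (x, y), w in bilinear_weights(px, py).items())


weights = bilinear_weights(1.3, 2.6)
print(weights)
print(sum(weights.values()))                      # the four weights sum to 1
print(sample(lambda x, y: 2 * x + 3 * y, 1.3, 2.6))
```

Because the weights are differentiable in `px` and `py`, gradients can flow back into the offset convolution `p_conv`, which is how the sampling positions are learned.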


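4. How to Add the Module

4.1 Modification 1

First, create the files. Under the ultralytics/nn folder, create a directory named 'Addmodules', then create a new .py file inside it and paste the core code into it.


4.2 Modification 2

Second, create another .py file named '__init__.py' in the same directory and import our modules in it.


4.3 Modification 3

Third, open the file 'ultralytics/nn/tasks.py' and import and register our modules there.


4.4 Modification 4

Finally, still in the file 'ultralytics/nn/tasks.py', add the modules inside the parse_model method.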
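Registering a module in parse_model matters because that function turns the short argument list of each YAML row into the constructor arguments. The sketch below imitates that step in plain Python, under my assumption that LDConv is handled like Conv (channels scaled by the width multiplier) and that the C3k2 variants are handled like C3k2 (the repeat count is inserted as the third argument). The function names are hypothetical and this is a simplified imitation, not the ultralytics source.

```python
import math


def make_divisible(x, divisor=8):
    """Round x up to the nearest multiple of divisor."""
    return math.ceil(x / divisor) * divisor


def resolve_ldconv_args(c1, yaml_args, width, max_channels):
    """[c2, num_param, stride] from the YAML -> [c1, scaled c2, num_param, stride]."""
    c2 = make_divisible(min(yaml_args[0], max_channels) * width, 8)
    return [c1, c2, *yaml_args[1:]]


def resolve_c3k2_args(c1, yaml_args, repeats, depth, width, max_channels):
    """[c2, c3k, (e)] from the YAML -> [c1, scaled c2, n, c3k, (e)]."""
    n = max(round(repeats * depth), 1) if repeats > 1 else repeats
    c2 = make_divisible(min(yaml_args[0], max_channels) * width, 8)
    return [c1, c2, n, *yaml_args[1:]]


# Scale 'n' from the YAML files in this post: depth 0.50, width 0.25, max_channels 1024.
# Layer 0 (Conv with 64 channels) outputs 64 * 0.25 = 16 channels, so c1 = 16 for layer 1.
print(resolve_ldconv_args(16, [128, 6, 2], width=0.25, max_channels=1024))
print(resolve_c3k2_args(32, [256, False, 0.25], repeats=2, depth=0.50, width=0.25, max_channels=1024))
```

The first result lines up with the signature `LDConv(inc, outc, num_param, stride)`, and the second with `C3k2_LDConv1(c1, c2, n, c3k, e)`.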


That completes the modifications. You can now copy the YAML files below and run them.


5. Training


5.1 YAML File 1

Training info: YOLO11-LDConv summary: 337 layers, 2,427,141 parameters, 2,427,125 gradients, 6.2 GFLOPs

# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLO11 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect

# Parameters
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolo11n.yaml' will call yolo11.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.50, 0.25, 1024] # summary: 319 layers, 2624080 parameters, 2624064 gradients, 6.6 GFLOPs
  s: [0.50, 0.50, 1024] # summary: 319 layers, 9458752 parameters, 9458736 gradients, 21.7 GFLOPs
  m: [0.50, 1.00, 512] # summary: 409 layers, 20114688 parameters, 20114672 gradients, 68.5 GFLOPs
  l: [1.00, 1.00, 512] # summary: 631 layers, 25372160 parameters, 25372144 gradients, 87.6 GFLOPs
  x: [1.00, 1.50, 512] # summary: 631 layers, 56966176 parameters, 56966160 gradients, 196.0 GFLOPs

# YOLO11n backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, Conv, [64, 3, 2]] # 0-P1/2
  - [-1, 1, LDConv, [128, 6, 2]] # 1-P2/4
  - [-1, 2, C3k2, [256, False, 0.25]]
  - [-1, 1, LDConv, [256, 6, 2]] # 3-P3/8
  - [-1, 2, C3k2, [512, False, 0.25]]
  - [-1, 1, LDConv, [512, 6, 2]] # 5-P4/16
  - [-1, 2, C3k2, [512, True]]
  - [-1, 1, LDConv, [1024, 6, 2]] # 7-P5/32
  - [-1, 2, C3k2, [1024, True]]
  - [-1, 1, SPPF, [1024, 5]] # 9
  - [-1, 2, C2PSA, [1024]] # 10

# YOLO11n head
head:
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 6], 1, Concat, [1]] # cat backbone P4
  - [-1, 2, C3k2, [512, False]] # 13

  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 4], 1, Concat, [1]] # cat backbone P3
  - [-1, 2, C3k2, [256, False]] # 16 (P3/8-small)

  - [-1, 1, LDConv, [256, 6, 2]]
  - [[-1, 13], 1, Concat, [1]] # cat head P4
  - [-1, 2, C3k2, [512, False]] # 19 (P4/16-medium)

  - [-1, 1, LDConv, [512, 6, 2]]
  - [[-1, 10], 1, Concat, [1]] # cat head P5
  - [-1, 2, C3k2, [1024, True]] # 22 (P5/32-large)

  - [[16, 19, 22], 1, Detect, [nc]] # Detect(P3, P4, P5)
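In a row such as `[-1, 1, LDConv, [128, 6, 2]]`, the arguments after the channel count map onto `num_param=6` and `stride=2` of the LDConv class. The sketch below is a rough shape calculation (helper names are my own) showing why a stride-2 LDConv halves the resolution just like the stride-2 Conv it replaces: the 3 x 3 offset convolution with padding 1 sets the output grid, and the final convolution with kernel (N, 1) and stride (N, 1) collapses the N stacked rows back to that grid.

```python
def conv_out(size, k=3, s=2, p=1):
    """Output length of a convolution along one dimension."""
    return (size + 2 * p - k) // s + 1


def ldconv_out(size, num_param, stride):
    """Output length of LDConv along the height dimension."""
    h = conv_out(size, k=3, s=stride, p=1)   # grid produced by the offset conv p_conv
    stacked = h * num_param                  # rows after 'b c h w n -> b c (h n) w'
    return (stacked - num_param) // num_param + 1   # conv with kernel/stride (N, 1)


def feature_sizes(img_size, num_ldconv=4, num_param=6):
    """Sizes after layer 0 (Conv, stride 2) and each following stride-2 LDConv."""
    sizes = [conv_out(img_size)]
    for _ in range(num_ldconv):
        sizes.append(ldconv_out(sizes[-1], num_param, stride=2))
    return sizes


print(feature_sizes(640))   # P1 to P5 resolutions for a 640 x 640 input
```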


5.2 YAML File 2

Training info: YOLO11-C3k2-LDConv-1 summary: 335 layers, 2,566,923 parameters, 2,566,907 gradients, 6.3 GFLOPs

# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLO11 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect

# Parameters
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolo11n.yaml' will call yolo11.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.50, 0.25, 1024] # summary: 319 layers, 2624080 parameters, 2624064 gradients, 6.6 GFLOPs
  s: [0.50, 0.50, 1024] # summary: 319 layers, 9458752 parameters, 9458736 gradients, 21.7 GFLOPs
  m: [0.50, 1.00, 512] # summary: 409 layers, 20114688 parameters, 20114672 gradients, 68.5 GFLOPs
  l: [1.00, 1.00, 512] # summary: 631 layers, 25372160 parameters, 25372144 gradients, 87.6 GFLOPs
  x: [1.00, 1.50, 512] # summary: 631 layers, 56966176 parameters, 56966160 gradients, 196.0 GFLOPs

# YOLO11n backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, Conv, [64, 3, 2]] # 0-P1/2
  - [-1, 1, Conv, [128, 3, 2]] # 1-P2/4
  - [-1, 2, C3k2_LDConv1, [256, False, 0.25]]
  - [-1, 1, Conv, [256, 3, 2]] # 3-P3/8
  - [-1, 2, C3k2_LDConv1, [512, False, 0.25]]
  - [-1, 1, Conv, [512, 3, 2]] # 5-P4/16
  - [-1, 2, C3k2_LDConv1, [512, True]]
  - [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32
  - [-1, 2, C3k2_LDConv1, [1024, True]]
  - [-1, 1, SPPF, [1024, 5]] # 9
  - [-1, 2, C2PSA, [1024]] # 10

# YOLO11n head
head:
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 6], 1, Concat, [1]] # cat backbone P4
  - [-1, 2, C3k2_LDConv1, [512, False]] # 13

  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 4], 1, Concat, [1]] # cat backbone P3
  - [-1, 2, C3k2_LDConv1, [256, False]] # 16 (P3/8-small)

  - [-1, 1, Conv, [256, 3, 2]]
  - [[-1, 13], 1, Concat, [1]] # cat head P4
  - [-1, 2, C3k2_LDConv1, [512, False]] # 19 (P4/16-medium)

  - [-1, 1, Conv, [512, 3, 2]]
  - [[-1, 10], 1, Concat, [1]] # cat head P5
  - [-1, 2, C3k2_LDConv1, [1024, True]] # 22 (P5/32-large)

  - [[16, 19, 22], 1, Detect, [nc]] # Detect(P3, P4, P5)


5.3 YAML File 3

Training info: YOLO11-C3k2-LDConv-2 summary: 338 layers, 2,499,489 parameters, 2,499,473 gradients, 6.4 GFLOPs

# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLO11 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect

# Parameters
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolo11n.yaml' will call yolo11.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.50, 0.25, 1024] # summary: 319 layers, 2624080 parameters, 2624064 gradients, 6.6 GFLOPs
  s: [0.50, 0.50, 1024] # summary: 319 layers, 9458752 parameters, 9458736 gradients, 21.7 GFLOPs
  m: [0.50, 1.00, 512] # summary: 409 layers, 20114688 parameters, 20114672 gradients, 68.5 GFLOPs
  l: [1.00, 1.00, 512] # summary: 631 layers, 25372160 parameters, 25372144 gradients, 87.6 GFLOPs
  x: [1.00, 1.50, 512] # summary: 631 layers, 56966176 parameters, 56966160 gradients, 196.0 GFLOPs

# YOLO11n backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, Conv, [64, 3, 2]] # 0-P1/2
  - [-1, 1, Conv, [128, 3, 2]] # 1-P2/4
  - [-1, 2, C3k2_LDConv2, [256, False, 0.25]]
  - [-1, 1, Conv, [256, 3, 2]] # 3-P3/8
  - [-1, 2, C3k2_LDConv2, [512, False, 0.25]]
  - [-1, 1, Conv, [512, 3, 2]] # 5-P4/16
  - [-1, 2, C3k2_LDConv2, [512, True]]
  - [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32
  - [-1, 2, C3k2_LDConv2, [1024, True]]
  - [-1, 1, SPPF, [1024, 5]] # 9
  - [-1, 2, C2PSA, [1024]] # 10

# YOLO11n head
head:
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 6], 1, Concat, [1]] # cat backbone P4
  - [-1, 2, C3k2_LDConv2, [512, False]] # 13

  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 4], 1, Concat, [1]] # cat backbone P3
  - [-1, 2, C3k2_LDConv2, [256, False]] # 16 (P3/8-small)

  - [-1, 1, Conv, [256, 3, 2]]
  - [[-1, 13], 1, Concat, [1]] # cat head P4
  - [-1, 2, C3k2_LDConv2, [512, False]] # 19 (P4/16-medium)

  - [-1, 1, Conv, [512, 3, 2]]
  - [[-1, 10], 1, Concat, [1]] # cat head P5
  - [-1, 2, C3k2_LDConv2, [1024, True]] # 22 (P5/32-large)

  - [[16, 19, 22], 1, Detect, [nc]] # Detect(P3, P4, P5)
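To put the three summaries side by side, the short calculation below compares each variant with the stock scale-n figures quoted in the YAML comments (2,624,080 parameters, 6.6 GFLOPs). This assumes those comment figures are a fair baseline for the same settings; the numbers themselves are copied from this post and nothing is measured here.

```python
BASELINE = {"params": 2_624_080, "gflops": 6.6}   # scale 'n' summary from the YAML comment

VARIANTS = {
    "YOLO11-LDConv": {"params": 2_427_141, "gflops": 6.2},
    "YOLO11-C3k2-LDConv-1": {"params": 2_566_923, "gflops": 6.3},
    "YOLO11-C3k2-LDConv-2": {"params": 2_499_489, "gflops": 6.4},
}


def reduction_percent(value, baseline):
    """Relative reduction against the baseline, in percent."""
    return (baseline - value) / baseline * 100


for name, stats in VARIANTS.items():
    p = reduction_percent(stats["params"], BASELINE["params"])
    f = reduction_percent(stats["gflops"], BASELINE["gflops"])
    print(f"{name}: {p:.1f}% fewer parameters, {f:.1f}% fewer GFLOPs")
```

On these figures, replacing the downsampling layers with LDConv (YAML file 1) gives the largest parameter saving of the three.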

5.4 Training Code

Create a .py file, paste in the code below, set your own file paths, and run it.

import warnings
warnings.filterwarnings('ignore')
from ultralytics import YOLO

if __name__ == '__main__':
    # Placeholder file name: point this at whichever YAML file from Sections 5.1 to 5.3 you saved.
    model = YOLO('yolo11-LDConv.yaml')
    # To switch model scale, change the name above; for example yolo11s.yaml uses the 's' scale.
    # Likewise, if an improved YAML file is named yolo11-XXX.yaml, pass yolo11l-XXX.yaml to use the 'l' scale
    # (change only the name passed to YOLO() above, not the name of the config file itself).
    # model.load('yolo11n.pt')  # optionally load pretrained weights; not recommended for research, as it makes accuracy gains hard to obtain
    model.train(data=r"C:\Users\Administrator\PycharmProjects\yolov5-master\yolov5-master\Construction Site Safety.v30-raw-images_latestversion.yolov8\data.yaml",
                # For other tasks, find `task` in 'ultralytics/cfg/default.yaml' and change it to detect, segment, classify, or pose
                cache=False,
                imgsz=640,
                epochs=150,
                single_cls=False,  # whether this is single-class detection
                batch=16,
                close_mosaic=0,
                workers=0,
                device='0',
                optimizer='SGD',  # using SGD
                # resume='runs/train/exp21/weights/last.pt',  # to resume training, set this to the path of last.pt
                amp=False,  # turn AMP off if the training loss becomes NaN
                project='runs/train',
                name='exp',
                )
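The scale-switching rule described in the comments above works because the scale letter is read from the file name. The sketch below approximates that lookup with two regular expressions; it is my own simplified imitation under that assumption, not the ultralytics implementation, and the file names are examples only.

```python
import re
from pathlib import Path


def guess_scale(model_name):
    """Return the scale letter (n, s, m, l, x) found right after the version number, or None."""
    match = re.search(r"yolo[v]?\d+([nslmx])", Path(model_name).stem)
    return match.group(1) if match else None


def base_yaml_name(model_name):
    """Strip the scale letter to get the name of the YAML file that is actually on disk."""
    return re.sub(r"(\d+)([nslmx])(.+)?$", r"\1\3", model_name)


for name in ("yolo11-LDConv.yaml", "yolo11s-LDConv.yaml", "yolo11x-C3k2-LDConv.yaml"):
    print(name, "-> scale:", guess_scale(name), "| file:", base_yaml_name(name))
```

So a single yolo11-LDConv.yaml on disk can serve every scale; only the string passed to `YOLO()` changes.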


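5.5 Training Screenshots


6. Summary

That concludes this post. I recommend my column on effective YOLOv11 improvements. The column is newly opened and currently has an average quality score of 98. I will keep reproducing papers from the latest top conferences and will also fill in some older improvement mechanisms. If this post helped you, subscribe to the column and follow for more updates.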