YOLOv11 Improvements | Detail Innovations | Improving C3k2 with iAFF (Iterative Attentional Feature Fusion) for Effective Gains on YOLOv11

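1. Introduction

The improvement introduced in this post is iAFF (iterative attentional feature fusion). Its core idea is to raise detection accuracy by improving how features are fused. Traditional feature fusion, such as addition or concatenation, is simple and does not consider how suitable each feature is for a particular object. iAFF introduces a multi-scale channel attention module (in my view, this mechanism amounts to a summation with attention built in) to better integrate features that differ in scale and semantics. This is a detail-level change that does not affect any other module, so it combines well with other improvements, and used on its own it can also bring some gain.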



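Contents

1. Introduction

2. Basic Framework and Principles of iAFF

3. Core Code of iAFF

4. Step by Step: Adding C3k2_iAFF

4.1 Modification 1

4.2 Modification 2

4.3 Modification 3

4.4 Modification 4

5. Training

5.1 YAML File 1

5.2 Training Code

5.3 Training Screenshots

6. Summary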


2. Basic Framework and Principles of iAFF

Official paper: click to open the official paper

Official code: click to open the official code


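The main idea of iAFF is to improve feature fusion with a finer-grained attention mechanism and thereby strengthen convolutional neural networks. It addresses the fusion problems caused by scale and semantic inconsistency, and it introduces a multi-scale channel attention module that provides a uniform, general-purpose fusion scheme. In addition, iAFF uses iterative attentional feature fusion to deal with the bottleneck that the initial integration of the feature maps can become. As a result, a model can achieve good results even with fewer layers or parameters.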

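The main innovations of iAFF are:

1. Attentional feature fusion: a new way of fusing features that uses an attention mechanism to improve on traditional simple fusion (such as addition or concatenation).

2. Multi-scale channel attention module: addresses the problems that arise when fusing features across scales, in particular features that are inconsistent in semantics and scale.

3. Iterative attentional feature fusion (iAFF): applies the attention mechanism iteratively to improve the initial integration of the feature maps, so that the initial integration no longer becomes a performance bottleneck.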
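To make the difference between plain addition and attention-weighted fusion concrete, here is a scalar sketch. The logits are made-up numbers standing in for what an attention module would predict; only the weighting rule mirrors the idea described above, and the function names are hypothetical.

```python
import math


def sigmoid(t):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-t))


def add_fuse(x, y):
    """Plain additive fusion: both inputs always contribute equally."""
    return x + y


def attention_fuse(x, y, logit):
    """Soft selection: weight w goes to x and the remainder (1 - w) goes to y."""
    w = sigmoid(logit)
    return w * x + (1.0 - w) * y


x, y = 4.0, 1.0
print("additive:", add_fuse(x, y))
for logit in (-3.0, 0.0, 3.0):  # illustrative logits, not learned values
    print("logit", logit, "->", round(attention_fuse(x, y, logit), 3))
```

Because the two weights sum to one, the fused value always stays between the two inputs, and the logit decides which input dominates. In the real module the logit is predicted per channel and per position rather than fixed by hand.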


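The figure shows schematic diagrams of the proposed AFF (attentional feature fusion) and iAFF (iterative attentional feature fusion). It contains two structures:

(a) AFF: the basic framework, which fuses different features through a multi-scale channel attention module (MS-CAM). Feature maps X and Y are fused by MS-CAM and a few other operations to produce the output Z.

(b) iAFF: similar to AFF, but with an added iterative structure. The result of the first fusion is fed back in, and X and Y pass through MS-CAM and the fusion operation once more to further refine the fusion.

(Both methods are proposed in the paper. I only use iAFF, the more complex version. If you are interested in AFF, you can add it in a similar way by following my modifications.)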
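Using the notation of the figure description (inputs X and Y, output Z) and writing M for the MS-CAM weight map, the two schemes can be summarised as follows. This is our reading of the paper and of the forward passes in the code listing of this post, with M_1 and M_2 denoting two separate attention modules; treat it as a sketch rather than a verbatim quotation.

```latex
\begin{aligned}
M(F) &= \sigma\bigl(L(F) + g(F)\bigr)
  && \text{(local branch } L \text{, global branch } g\text{)} \\
\text{AFF:}\quad Z &= M(X + Y) \otimes X + \bigl(1 - M(X + Y)\bigr) \otimes Y \\
\text{iAFF:}\quad X \uplus Y &= M_1(X + Y) \otimes X + \bigl(1 - M_1(X + Y)\bigr) \otimes Y \\
Z &= M_2(X \uplus Y) \otimes X + \bigl(1 - M_2(X \uplus Y)\bigr) \otimes Y
\end{aligned}
```

Here σ is the sigmoid and ⊗ is element-wise multiplication. The AFF class in the listing additionally scales its output by a factor of 2.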


3. Core Code of iAFF

See Section 4 for how to use it.

    import torch
    import torch.nn as nn

    __all__ = ['C3k2_iAFF']


    def autopad(k, p=None, d=1):  # kernel, padding, dilation
        """Pad to 'same' shape outputs."""
        if d > 1:
            k = d * (k - 1) + 1 if isinstance(k, int) else [d * (x - 1) + 1 for x in k]  # actual kernel-size
        if p is None:
            p = k // 2 if isinstance(k, int) else [x // 2 for x in k]  # auto-pad
        return p


    class Conv(nn.Module):
        """Standard convolution with args(ch_in, ch_out, kernel, stride, padding, groups, dilation, activation)."""
        default_act = nn.SiLU()  # default activation

        def __init__(self, c1, c2, k=1, s=1, p=None, g=1, d=1, act=True):
            """Initialize Conv layer with given arguments including activation."""
            super().__init__()
            self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p, d), groups=g, dilation=d, bias=False)
            self.bn = nn.BatchNorm2d(c2)
            self.act = self.default_act if act is True else act if isinstance(act, nn.Module) else nn.Identity()

        def forward(self, x):
            """Apply convolution, batch normalization and activation to input tensor."""
            return self.act(self.bn(self.conv(x)))

        def forward_fuse(self, x):
            """Apply convolution and activation only (used after BatchNorm has been fused into the conv)."""
            return self.act(self.conv(x))


    class iAFF(nn.Module):
        """Multi-feature fusion: iterative attentional feature fusion (iAFF)."""

        def __init__(self, channels=64, r=2):
            super(iAFF, self).__init__()
            inter_channels = int(channels // r)

            # Local attention (first stage)
            self.local_att = nn.Sequential(
                nn.Conv2d(channels, inter_channels, kernel_size=1, stride=1, padding=0),
                nn.BatchNorm2d(inter_channels),
                nn.ReLU(inplace=True),
                nn.Conv2d(inter_channels, channels, kernel_size=1, stride=1, padding=0),
                nn.BatchNorm2d(channels),
            )

            # Global attention (no BatchNorm in this branch)
            self.global_att = nn.Sequential(
                nn.AdaptiveAvgPool2d(1),
                nn.Conv2d(channels, inter_channels, kernel_size=1, stride=1, padding=0),
                nn.ReLU(inplace=True),
                nn.Conv2d(inter_channels, channels, kernel_size=1, stride=1, padding=0),
            )

            # Local attention (second stage)
            self.local_att2 = nn.Sequential(
                nn.Conv2d(channels, inter_channels, kernel_size=1, stride=1, padding=0),
                nn.BatchNorm2d(inter_channels),
                nn.ReLU(inplace=True),
                nn.Conv2d(inter_channels, channels, kernel_size=1, stride=1, padding=0),
                nn.BatchNorm2d(channels),
            )

            # Global attention (second stage); defined here but not called in forward()
            self.global_att2 = nn.Sequential(
                nn.AdaptiveAvgPool2d(1),
                nn.Conv2d(channels, inter_channels, kernel_size=1, stride=1, padding=0),
                nn.BatchNorm2d(inter_channels),
                nn.ReLU(inplace=True),
                nn.Conv2d(inter_channels, channels, kernel_size=1, stride=1, padding=0),
                nn.BatchNorm2d(channels),
            )

            self.sigmoid = nn.Sigmoid()

        def forward(self, x, residual):
            # Stage 1: attention weights from the plain sum, then a weighted fusion
            xa = x + residual
            xl = self.local_att(xa)
            xg = self.global_att(xa)
            xlg = xl + xg
            wei = self.sigmoid(xlg)
            xi = x * wei + residual * (1 - wei)

            # Stage 2: recompute the weights from the stage-1 result and fuse again
            xl2 = self.local_att2(xi)
            xg2 = self.global_att(xi)  # reuses global_att, as in the original listing
            xlg2 = xl2 + xg2
            wei2 = self.sigmoid(xlg2)
            xo = x * wei2 + residual * (1 - wei2)
            return xo


    class AFF(nn.Module):
        """Multi-feature fusion: attentional feature fusion (AFF)."""

        def __init__(self, channels=64, r=4):
            super(AFF, self).__init__()
            inter_channels = int(channels // r)

            self.local_att = nn.Sequential(
                nn.Conv2d(channels, inter_channels, kernel_size=1, stride=1, padding=0),
                nn.BatchNorm2d(inter_channels),
                nn.ReLU(inplace=True),
                nn.Conv2d(inter_channels, channels, kernel_size=1, stride=1, padding=0),
                nn.BatchNorm2d(channels),
            )

            self.global_att = nn.Sequential(
                nn.AdaptiveAvgPool2d(1),
                nn.Conv2d(channels, inter_channels, kernel_size=1, stride=1, padding=0),
                nn.BatchNorm2d(inter_channels),
                nn.ReLU(inplace=True),
                nn.Conv2d(inter_channels, channels, kernel_size=1, stride=1, padding=0),
                nn.BatchNorm2d(channels),
            )

            self.sigmoid = nn.Sigmoid()

        def forward(self, x, residual):
            xa = x + residual
            xl = self.local_att(xa)
            xg = self.global_att(xa)
            xlg = xl + xg
            wei = self.sigmoid(xlg)
            xo = 2 * x * wei + 2 * residual * (1 - wei)
            return xo


    class Bottleneck_iAFF(nn.Module):
        """Bottleneck whose shortcut addition is replaced by iAFF fusion."""

        def __init__(self, c1, c2, shortcut=True, g=1, k=(3, 3), e=0.5):
            """Initializes a bottleneck module with given input/output channels, shortcut option, group, kernels, and
            expansion.
            """
            super().__init__()
            c_ = int(c2 * e)  # hidden channels
            self.cv1 = Conv(c1, c_, k[0], 1)
            self.cv2 = Conv(c_, c2, k[1], 1, g=g)
            self.add = shortcut and c1 == c2
            self.iAFF = iAFF(c2)

        def forward(self, x):
            """Fuse the input with the conv branch via iAFF when the shortcut is active; otherwise return the conv branch."""
            if self.add:
                results = self.iAFF(x, self.cv2(self.cv1(x)))
            else:
                results = self.cv2(self.cv1(x))
            return results


    class Bottleneck_AFF(nn.Module):
        """Bottleneck whose shortcut addition is replaced by AFF fusion."""

        def __init__(self, c1, c2, shortcut=True, g=1, k=(3, 3), e=0.5):
            """Initializes a bottleneck module with given input/output channels, shortcut option, group, kernels, and
            expansion.
            """
            super().__init__()
            c_ = int(c2 * e)  # hidden channels
            self.cv1 = Conv(c1, c_, k[0], 1)
            self.cv2 = Conv(c_, c2, k[1], 1, g=g)
            self.add = shortcut and c1 == c2
            self.AFF = AFF(c2)

        def forward(self, x):
            """Fuse the input with the conv branch via AFF when the shortcut is active; otherwise return the conv branch."""
            if self.add:
                results = self.AFF(x, self.cv2(self.cv1(x)))
            else:
                results = self.cv2(self.cv1(x))
            return results


    class Bottleneck(nn.Module):
        """Standard bottleneck."""

        def __init__(self, c1, c2, shortcut=True, g=1, k=(3, 3), e=0.5):
            """Initializes a standard bottleneck module with optional shortcut connection and configurable parameters."""
            super().__init__()
            c_ = int(c2 * e)  # hidden channels
            self.cv1 = Conv(c1, c_, k[0], 1)
            self.cv2 = Conv(c_, c2, k[1], 1, g=g)
            self.add = shortcut and c1 == c2

        def forward(self, x):
            """Apply the two convolutions and add the shortcut when it is enabled."""
            return x + self.cv2(self.cv1(x)) if self.add else self.cv2(self.cv1(x))


    class C2f(nn.Module):
        """Faster Implementation of CSP Bottleneck with 2 convolutions."""

        def __init__(self, c1, c2, n=1, shortcut=False, g=1, e=0.5):
            """Initializes a CSP bottleneck with 2 convolutions and n Bottleneck blocks for faster processing."""
            super().__init__()
            self.c = int(c2 * e)  # hidden channels
            self.cv1 = Conv(c1, 2 * self.c, 1, 1)
            self.cv2 = Conv((2 + n) * self.c, c2, 1)  # optional act=FReLU(c2)
            self.m = nn.ModuleList(Bottleneck(self.c, self.c, shortcut, g, k=((3, 3), (3, 3)), e=1.0) for _ in range(n))

        def forward(self, x):
            """Forward pass through C2f layer."""
            y = list(self.cv1(x).chunk(2, 1))
            y.extend(m(y[-1]) for m in self.m)
            return self.cv2(torch.cat(y, 1))

        def forward_split(self, x):
            """Forward pass using split() instead of chunk()."""
            y = list(self.cv1(x).split((self.c, self.c), 1))
            y.extend(m(y[-1]) for m in self.m)
            return self.cv2(torch.cat(y, 1))


    class C3(nn.Module):
        """CSP Bottleneck with 3 convolutions."""

        def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5):
            """Initialize the CSP Bottleneck with given channels, number, shortcut, groups, and expansion values."""
            super().__init__()
            c_ = int(c2 * e)  # hidden channels
            self.cv1 = Conv(c1, c_, 1, 1)
            self.cv2 = Conv(c1, c_, 1, 1)
            self.cv3 = Conv(2 * c_, c2, 1)  # optional act=FReLU(c2)
            self.m = nn.Sequential(*(Bottleneck(c_, c_, shortcut, g, k=((1, 1), (3, 3)), e=1.0) for _ in range(n)))

        def forward(self, x):
            """Forward pass through the CSP bottleneck with 3 convolutions."""
            return self.cv3(torch.cat((self.m(self.cv1(x)), self.cv2(x)), 1))


    class C3k(C3):
        """C3k is a CSP bottleneck module with customizable kernel sizes for feature extraction in neural networks."""

        def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5, k=3):
            """Initializes the C3k module with specified channels, number of layers, and configurations."""
            super().__init__(c1, c2, n, shortcut, g, e)
            c_ = int(c2 * e)  # hidden channels
            # self.m = nn.Sequential(*(RepBottleneck(c_, c_, shortcut, g, k=(k, k), e=1.0) for _ in range(n)))
            self.m = nn.Sequential(*(Bottleneck(c_, c_, shortcut, g, k=(k, k), e=1.0) for _ in range(n)))


    class C3k2_iAFF(C2f):
        """C3k2 variant (built on C2f) that uses Bottleneck_iAFF in place of the plain Bottleneck."""

        def __init__(self, c1, c2, n=1, c3k=False, e=0.5, g=1, shortcut=True):
            """Initializes the C3k2 module, a faster CSP Bottleneck with 2 convolutions and optional C3k blocks."""
            super().__init__(c1, c2, n, shortcut, g, e)
            self.m = nn.ModuleList(
                C3k(self.c, self.c, 2, shortcut, g) if c3k else Bottleneck_iAFF(self.c, self.c, shortcut, g)
                for _ in range(n)
            )
            # Bottleneck_iAFF replaces the plain Bottleneck here (only on the c3k=False path)
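In the listing above, the iAFF class builds its first global branch without BatchNorm and its second one with BatchNorm, and forward() only calls the first. The post does not explain this choice. One PyTorch behaviour that may be relevant (an assumption about the motivation, not something the source states) is that BatchNorm in training mode rejects a 1×1 map at batch size 1, which is exactly what a global branch sees after AdaptiveAvgPool2d(1). The sketch below isolates that behaviour; `global_branch` is a hypothetical helper written for this example.

```python
import torch
import torch.nn as nn


def global_branch(channels, r, with_bn):
    """Global attention branch: pool to 1x1, then a 1x1-conv bottleneck (optionally with BatchNorm)."""
    inter = channels // r
    layers = [nn.AdaptiveAvgPool2d(1), nn.Conv2d(channels, inter, kernel_size=1)]
    if with_bn:
        layers.append(nn.BatchNorm2d(inter))
    layers += [nn.ReLU(inplace=True), nn.Conv2d(inter, channels, kernel_size=1)]
    if with_bn:
        layers.append(nn.BatchNorm2d(channels))
    return nn.Sequential(*layers)


x = torch.randn(1, 16, 8, 8)  # a single image, i.e. batch size 1

# Without BatchNorm the branch works in training mode and yields one value per channel.
no_bn = global_branch(16, 2, with_bn=False).train()
out_shape = tuple(no_bn(x).shape)

# With BatchNorm, training mode has only one value per channel to normalise and raises.
with_bn = global_branch(16, 2, with_bn=True).train()
try:
    with_bn(x)
    bn_failed = False
except ValueError:
    bn_failed = True

# In eval mode BatchNorm uses running statistics, so the same input is accepted.
with_bn.eval()
eval_shape = tuple(with_bn(x).shape)

print(out_shape, bn_failed, eval_shape)
```

If you decide to switch the second stage to global_att2, it is worth checking that every forward pass in your training setup either runs in eval mode or uses a batch size larger than 1.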


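4. Step by Step: Adding C3k2_iAFF

4.1 Modification 1

The first step is to create the files. Under the ultralytics/nn folder, create a directory named 'Addmodules'. Then create a new .py file inside it and paste in the core code.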


4.2 Modification 2

The second step is to create a new Python file named '__init__.py' in that directory and import our module inside it.
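As a rough illustration of what this step achieves, the sketch below builds a throwaway package in a temporary directory and shows how a star import in '__init__.py' interacts with `__all__`. The package name `demo_addmodules` and the file name `iaff_demo.py` are stand-ins chosen for this example; the post does not say what the real file is called.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Throwaway directory standing in for ultralytics/nn (hypothetical layout).
root = Path(tempfile.mkdtemp())
pkg = root / "demo_addmodules"
pkg.mkdir()

# Stand-in for the file holding the core code: only names listed in __all__ are exported.
(pkg / "iaff_demo.py").write_text(
    "__all__ = ['C3k2_iAFF']\n"
    "class C3k2_iAFF: pass\n"
    "class Helper: pass\n"
)

# The __init__.py re-exports whatever the module lists in __all__.
(pkg / "__init__.py").write_text("from .iaff_demo import *\n")

sys.path.insert(0, str(root))
importlib.invalidate_caches()
mod = importlib.import_module("demo_addmodules")

exported = hasattr(mod, "C3k2_iAFF")  # listed in __all__, so visible on the package
hidden = not hasattr(mod, "Helper")   # not listed, so the star import skips it
print(exported, hidden)
```

This is why the core code sets `__all__ = ['C3k2_iAFF']`: a star import from the package then exposes only the block that the model YAML refers to.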



4.3 Modification 3

The third step is to open 'ultralytics/nn/tasks.py', then import and register our module there.


4.4 Modification 4

Finally, add the module inside parse_model in the same way as I did.
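To see why the module has to be added inside parse_model, it helps to look at what that function does with a YAML row. The sketch below is a heavily simplified imitation written for this post, not Ultralytics code: `resolve_args`, the `registered` set and the stand-in class are hypothetical, and the real function differs in detail and between versions. The behaviour it imitates is that, for recognised block types, the input channels are prepended, the output channels are scaled by the width multiple, and the repeat count is inserted as the third argument.

```python
import math


def make_divisible(x, divisor):
    """Round x up to the nearest multiple of divisor."""
    return math.ceil(x / divisor) * divisor


class C3k2_iAFF:  # stand-in class, used here only as a registry key
    pass


def resolve_args(module, args, c1, n, depth, width, max_channels, registered):
    """Imitate how a YAML row [from, n, module, args] becomes constructor arguments."""
    n = max(round(n * depth), 1) if n > 1 else n  # depth multiple scales the repeats
    if module in registered:
        c2 = make_divisible(min(args[0], max_channels) * width, 8)  # width multiple scales channels
        args = [c1, c2, n, *args[1:]]  # (c1, c2, n, c3k, e) for a C3k2-style block
        n = 1  # the block repeats internally, so the layer itself is built once
    return n, args


# Row "[-1, 2, C3k2_iAFF, [256, False, 0.25]]" at scale n: depth 0.50, width 0.25, max_channels 1024.
# The previous layer is Conv with 128 * 0.25 = 32 output channels, so c1 = 32.
row_args = [256, False, 0.25]
with_reg = resolve_args(C3k2_iAFF, row_args, 32, 2, 0.50, 0.25, 1024, registered={C3k2_iAFF})
without_reg = resolve_args(C3k2_iAFF, row_args, 32, 2, 0.50, 0.25, 1024, registered=set())
print(with_reg)
print(without_reg)
```

Without registration the raw YAML arguments would reach the constructor unchanged, so 256 would be taken as c1 and False as c2, which is why the block must be listed alongside the other C3k2-style modules.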



That completes the modifications. You can copy the YAML file below and run it.


5. Training


5.1 YAML File 1

Training info: YOLO11-C3k2-iAFF summary: 450 layers, 2,636,114 parameters, 2,636,098 gradients, 6.6 GFLOPs

    # Ultralytics YOLO 🚀, AGPL-3.0 license
    # YOLO11 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect

    # Parameters
    nc: 80 # number of classes
    scales: # model compound scaling constants, i.e. 'model=yolo11n.yaml' will call yolo11.yaml with scale 'n'
      # [depth, width, max_channels]
      n: [0.50, 0.25, 1024] # summary: 319 layers, 2624080 parameters, 2624064 gradients, 6.6 GFLOPs
      s: [0.50, 0.50, 1024] # summary: 319 layers, 9458752 parameters, 9458736 gradients, 21.7 GFLOPs
      m: [0.50, 1.00, 512] # summary: 409 layers, 20114688 parameters, 20114672 gradients, 68.5 GFLOPs
      l: [1.00, 1.00, 512] # summary: 631 layers, 25372160 parameters, 25372144 gradients, 87.6 GFLOPs
      x: [1.00, 1.50, 512] # summary: 631 layers, 56966176 parameters, 56966160 gradients, 196.0 GFLOPs

    # YOLO11n backbone
    backbone:
      # [from, repeats, module, args]
      - [-1, 1, Conv, [64, 3, 2]] # 0-P1/2
      - [-1, 1, Conv, [128, 3, 2]] # 1-P2/4
      - [-1, 2, C3k2_iAFF, [256, False, 0.25]]
      - [-1, 1, Conv, [256, 3, 2]] # 3-P3/8
      - [-1, 2, C3k2_iAFF, [512, False, 0.25]]
      - [-1, 1, Conv, [512, 3, 2]] # 5-P4/16
      - [-1, 2, C3k2_iAFF, [512, True]]
      - [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32
      - [-1, 2, C3k2_iAFF, [1024, True]]
      - [-1, 1, SPPF, [1024, 5]] # 9
      - [-1, 2, C2PSA, [1024]] # 10

    # YOLO11n head
    head:
      - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
      - [[-1, 6], 1, Concat, [1]] # cat backbone P4
      - [-1, 2, C3k2_iAFF, [512, False]] # 13

      - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
      - [[-1, 4], 1, Concat, [1]] # cat backbone P3
      - [-1, 2, C3k2_iAFF, [256, False]] # 16 (P3/8-small)

      - [-1, 1, Conv, [256, 3, 2]]
      - [[-1, 13], 1, Concat, [1]] # cat head P4
      - [-1, 2, C3k2_iAFF, [512, False]] # 19 (P4/16-medium)

      - [-1, 1, Conv, [512, 3, 2]]
      - [[-1, 10], 1, Concat, [1]] # cat head P5
      - [-1, 2, C3k2_iAFF, [1024, True]] # 22 (P5/32-large)

      - [[16, 19, 22], 1, Detect, [nc]] # Detect(P3, P4, P5)
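One detail of this configuration follows directly from the core code: C3k2_iAFF only builds Bottleneck_iAFF blocks when its c3k argument is False, and falls back to C3k blocks (made of plain Bottleneck layers) when it is True. The small sketch below reads the flags exactly as they are written in the YAML above and lists which layers would use iAFF. It assumes the flags reach the constructor unchanged, which depends on how parse_model was modified.

```python
# (layer index, c3k flag) for every C3k2_iAFF row, copied from the YAML above.
c3k2_rows = [
    (2, False), (4, False), (6, True), (8, True),      # backbone
    (13, False), (16, False), (19, False), (22, True),  # head
]

# c3k=False -> Bottleneck_iAFF blocks; c3k=True -> C3k blocks built from plain Bottleneck.
iaff_layers = [idx for idx, c3k in c3k2_rows if not c3k]
c3k_layers = [idx for idx, c3k in c3k2_rows if c3k]

print("layers using Bottleneck_iAFF:", iaff_layers)
print("layers using C3k (no iAFF):", c3k_layers)
```

If you want iAFF in the deeper layers as well, one option would be to set the flag to False in those rows, at the cost of a different parameter count than the summary quoted above.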


5.2 Training Code

Create a .py file, paste in the code below, and set your own file paths; it is then ready to run.

    import warnings
    warnings.filterwarnings('ignore')
    from ultralytics import YOLO

    if __name__ == '__main__':
        model = YOLO('model-config.yaml')  # placeholder: path to your model YAML file
        # To switch model scale, change the YAML name above, e.g. yolov8s.yaml selects v8s.
        # Likewise, if an improved YAML file is named yolov8-XXX.yaml, use yolov8l-XXX.yaml to get the l scale
        # (change the name passed to YOLO above, not the name of the config file itself).
        # model.load('yolov8n.pt')  # optionally load pretrained weights; not recommended for research, since it makes accuracy gains hard to obtain
        model.train(data=r"C:\Users\Administrator\PycharmProjects\yolov5-master\yolov5-master\Construction Site Safety.v30-raw-images_latestversion.yolov8\data.yaml",
                    # For other tasks, find `task` in 'ultralytics/cfg/default.yaml' and set it to detect, segment, classify or pose
                    cache=False,
                    imgsz=640,
                    epochs=150,
                    single_cls=False,  # whether this is single-class detection
                    batch=16,
                    close_mosaic=0,
                    workers=0,
                    device='0',
                    optimizer='SGD',  # using SGD
                    # resume='runs/train/exp21/weights/last.pt',  # to resume training, set the path to last.pt
                    amp=True,  # turn amp off if the training loss becomes NaN
                    project='runs/train',
                    name='exp',
                    )


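5.3 Training Screenshots


6. Summary

That concludes this post. I recommend my column on effective YOLOv11 improvements; it is newly opened and currently has an average quality score of 98. I will keep reproducing papers from the latest top conferences and will also fill in some older improvement mechanisms. If this post helped you, subscribe to the column and follow for more updates.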