YOLOv11 Improvements | Special Scenarios | CPA-Enhancer Chain-of-Thought Network, New in 2024 (Low Light, Dehazing, Rain, Snow)

1. Introduction

This post presents an improvement mechanism new in March 2024: the CPA-Enhancer chain-of-thought network proposed in the paper "CPA-Enhancer: Chain-of-Thought Prompted Adaptive Enhancer for Object Detection under Unknown Degradations". By introducing a chain-of-thought (CoT) prompting mechanism, CPA-Enhancer adaptively enhances images captured under unknown degradations. The core of the method is that CoT prompts let the model dynamically analyze and adapt to the degradation, which significantly improves object detection performance. It helps in many scenarios: low light, haze, rain, and snow all gain accuracy. Its parameter overhead is also very small; the YOLOv8n variant has only about 3.5M parameters. This content was compiled exclusively by me!

Welcome to subscribe to my column and study YOLO together!



Table of Contents

1. Introduction

2. How It Works

3. Core Code

4. Step-by-Step Guide to Adding the Mechanism

4.1 Change 1

4.2 Change 2

4.3 Change 3

4.4 Change 4

Disable half-precision validation!

Printing the GFLOPs!

5. yaml File and Training Run

5.1 The yaml file

5.2 Training code

5.3 Training screenshots

6. Conclusion


2. How It Works

Official paper address: click here to jump to it

Official code address: click here to jump to it


The innovations and improvement mechanisms of CPA-Enhancer can be summarized as follows:
1. Chain-of-thought (CoT) prompting: the first work to apply chain-of-thought prompting to object detection, handling images with unknown degradations through step-by-step guidance.
2. Adaptive enhancement strategy: an adaptive enhancer that dynamically adjusts its enhancement strategy based on the CoT prompts, with no prior knowledge of the degradation type required.
3. Plug-in design: CPA-Enhancer is built as a plug-in module that integrates easily with any existing generic object detector, improving detection performance on degraded images.

Improvement mechanisms
CoT prompt generation module (CGM): dynamically generates contextual information related to the image degradation, letting the model recognize and adapt to different degradation types.
Content-driven prompt block (CPB): strengthens the interaction between the input features and the CoT prompts, allowing the model to adjust its enhancement strategy to the degradation type.
End-to-end training: CPA-Enhancer trains end to end together with the object detector, with no separate pre-training stage and no extra supervision signals.

Summary
By introducing chain-of-thought prompting, CPA-Enhancer adaptively enhances images under unknown degradation conditions. The core idea is to use CoT prompts to dynamically analyze and adapt to the degradation, which significantly improves object detection performance. Its plug-in design lets it slot seamlessly into existing detection frameworks, offering an effective solution for the varied degradation conditions met in practice. Experiments show that CPA-Enhancer not only sets a new performance standard for object detection but also improves other downstream vision tasks, demonstrating broad applicability.
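The plug-in idea can be sketched in a few lines of PyTorch. Everything below is illustrative, not the paper's code: TinyEnhancer and the one-layer "detector" are stand-ins I made up. The point is only the pattern: an enhancer maps a degraded image to an enhanced image of the same shape, the detector consumes its output unchanged, and the pair trains end to end.

```python
import torch
import torch.nn as nn


class TinyEnhancer(nn.Module):
    """Stand-in for CPA-Enhancer: image in, same-shaped enhanced image out."""

    def __init__(self):
        super().__init__()
        self.body = nn.Conv2d(3, 3, kernel_size=3, padding=1)

    def forward(self, x):
        return self.body(x)


class EnhancedDetector(nn.Module):
    """Plug-in pattern: run the enhancer first, then any unmodified detector."""

    def __init__(self, enhancer, detector):
        super().__init__()
        self.enhancer = enhancer
        self.detector = detector

    def forward(self, x):
        return self.detector(self.enhancer(x))


# a single conv layer stands in for a real detector backbone here
model = EnhancedDetector(TinyEnhancer(), nn.Conv2d(3, 16, kernel_size=3, padding=1))
out = model(torch.rand(1, 3, 64, 64))
print(tuple(out.shape))  # (1, 16, 64, 64)
```

Because both parts are registered submodules, a single optimizer over model.parameters() updates the enhancer and the detector jointly, which is exactly the end-to-end property described above.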


3. Core Code

See Section 4 for how to use the core code!

import numbers

import torch
import torch.nn as nn
import torch.nn.functional as F
from einops import rearrange
from einops.layers.torch import Rearrange

__all__ = ['CPA_arch']


class RFAConv(nn.Module):  # RFAConv implemented on top of group convolution
    def __init__(self, in_channel, out_channel, kernel_size=3, stride=1):
        super().__init__()
        self.kernel_size = kernel_size
        self.get_weight = nn.Sequential(
            nn.AvgPool2d(kernel_size=kernel_size, padding=kernel_size // 2, stride=stride),
            nn.Conv2d(in_channel, in_channel * (kernel_size ** 2), kernel_size=1,
                      groups=in_channel, bias=False))
        self.generate_feature = nn.Sequential(
            nn.Conv2d(in_channel, in_channel * (kernel_size ** 2), kernel_size=kernel_size,
                      padding=kernel_size // 2, stride=stride, groups=in_channel, bias=False),
            nn.BatchNorm2d(in_channel * (kernel_size ** 2)),
            nn.ReLU())
        self.conv = nn.Sequential(
            nn.Conv2d(in_channel, out_channel, kernel_size=kernel_size, stride=kernel_size),
            nn.BatchNorm2d(out_channel),
            nn.ReLU())

    def forward(self, x):
        b, c = x.shape[0:2]
        weight = self.get_weight(x)
        h, w = weight.shape[2:]
        weighted = weight.view(b, c, self.kernel_size ** 2, h, w).softmax(2)  # b c*k**2 h w -> b c k**2 h w
        # receptive-field spatial features: b c*k**2 h w -> b c k**2 h w
        feature = self.generate_feature(x).view(b, c, self.kernel_size ** 2, h, w)
        weighted_data = feature * weighted
        conv_data = rearrange(weighted_data, 'b c (n1 n2) h w -> b c (h n1) (w n2)',
                              n1=self.kernel_size, n2=self.kernel_size)  # b c k**2 h w -> b c h*k w*k
        return self.conv(conv_data)


class Downsample(nn.Module):
    def __init__(self, n_feat):
        super(Downsample, self).__init__()
        self.body = nn.Sequential(
            nn.Conv2d(n_feat, n_feat // 2, kernel_size=3, stride=1, padding=1, bias=False),
            nn.PixelUnshuffle(2))

    def forward(self, x):  # (b,c,h,w)
        return self.body(x)  # (b,c*2,h/2,w/2)


class Upsample(nn.Module):
    def __init__(self, n_feat):
        super(Upsample, self).__init__()
        self.body = nn.Sequential(
            nn.Conv2d(n_feat, n_feat * 2, kernel_size=3, stride=1, padding=1, bias=False),
            nn.PixelShuffle(2))

    def forward(self, x):  # (b,c,h,w)
        return self.body(x)  # (b,c/2,h*2,w*2)


class SpatialAttention(nn.Module):
    def __init__(self):
        super(SpatialAttention, self).__init__()
        self.sa = nn.Conv2d(2, 1, 7, padding=3, padding_mode='reflect', bias=True)

    def forward(self, x):  # x: [b,c,h,w]
        x_avg = torch.mean(x, dim=1, keepdim=True)  # (b,1,h,w)
        x_max, _ = torch.max(x, dim=1, keepdim=True)  # (b,1,h,w)
        x2 = torch.concat([x_avg, x_max], dim=1)  # (b,2,h,w)
        sattn = self.sa(x2)  # 7x7 conv, (b,1,h,w)
        return sattn * x


class ChannelAttention(nn.Module):
    def __init__(self, dim, reduction=8):
        super(ChannelAttention, self).__init__()
        self.gap = nn.AdaptiveAvgPool2d(1)
        self.ca = nn.Sequential(
            nn.Conv2d(dim, dim // reduction, 1, padding=0, bias=True),
            nn.ReLU(inplace=True),
            nn.Conv2d(dim // reduction, dim, 1, padding=0, bias=True),
        )

    def forward(self, x):  # x: [b,c,h,w]
        x_gap = self.gap(x)  # [b,c,1,1]
        cattn = self.ca(x_gap)  # [b,c,1,1]
        return cattn * x


class Channel_Shuffle(nn.Module):
    def __init__(self, num_groups):
        super(Channel_Shuffle, self).__init__()
        self.num_groups = num_groups

    def forward(self, x):
        batch_size, chs, h, w = x.shape
        chs_per_group = chs // self.num_groups
        x = torch.reshape(x, (batch_size, self.num_groups, chs_per_group, h, w))
        # (batch_size, num_groups, chs_per_group, h, w)
        x = x.transpose(1, 2)  # swap dim 1 and dim 2
        out = torch.reshape(x, (batch_size, -1, h, w))
        return out


class TransformerBlock(nn.Module):
    def __init__(self, dim, num_heads, ffn_expansion_factor, bias, LayerNorm_type):
        super(TransformerBlock, self).__init__()
        self.norm1 = LayerNorm(dim, LayerNorm_type)
        self.attn = Attention(dim, num_heads, bias)
        self.norm2 = LayerNorm(dim, LayerNorm_type)
        self.ffn = FeedForward(dim, ffn_expansion_factor, bias)

    def forward(self, x):
        x = x + self.attn(self.norm1(x))
        x = x + self.ffn(self.norm2(x))
        return x


def to_3d(x):
    return rearrange(x, 'b c h w -> b (h w) c')


def to_4d(x, h, w):
    return rearrange(x, 'b (h w) c -> b c h w', h=h, w=w)


class BiasFree_LayerNorm(nn.Module):
    def __init__(self, normalized_shape):
        super(BiasFree_LayerNorm, self).__init__()
        if isinstance(normalized_shape, numbers.Integral):
            normalized_shape = (normalized_shape,)
        normalized_shape = torch.Size(normalized_shape)
        assert len(normalized_shape) == 1
        self.weight = nn.Parameter(torch.ones(normalized_shape))
        self.normalized_shape = normalized_shape

    def forward(self, x):
        sigma = x.var(-1, keepdim=True, unbiased=False)
        return x / torch.sqrt(sigma + 1e-5) * self.weight


class WithBias_LayerNorm(nn.Module):
    def __init__(self, normalized_shape):
        super(WithBias_LayerNorm, self).__init__()
        if isinstance(normalized_shape, numbers.Integral):
            normalized_shape = (normalized_shape,)
        normalized_shape = torch.Size(normalized_shape)
        assert len(normalized_shape) == 1
        self.weight = nn.Parameter(torch.ones(normalized_shape))
        self.bias = nn.Parameter(torch.zeros(normalized_shape))
        self.normalized_shape = normalized_shape

    def forward(self, x):
        device = x.device
        mu = x.mean(-1, keepdim=True)
        sigma = x.var(-1, keepdim=True, unbiased=False)
        return (x - mu) / torch.sqrt(sigma + 1e-5) * self.weight.to(device) + self.bias.to(device)


class LayerNorm(nn.Module):
    def __init__(self, dim, LayerNorm_type):
        super(LayerNorm, self).__init__()
        if LayerNorm_type == 'BiasFree':
            self.body = BiasFree_LayerNorm(dim)
        else:
            self.body = WithBias_LayerNorm(dim)

    def forward(self, x):
        h, w = x.shape[-2:]
        return to_4d(self.body(to_3d(x)), h, w)


class FeedForward(nn.Module):
    def __init__(self, dim, ffn_expansion_factor, bias):
        super(FeedForward, self).__init__()
        hidden_features = int(dim * ffn_expansion_factor)
        self.project_in = nn.Conv2d(dim, hidden_features * 2, kernel_size=1, bias=bias)
        self.dwconv = nn.Conv2d(hidden_features * 2, hidden_features * 2, kernel_size=3, stride=1,
                                padding=1, groups=hidden_features * 2, bias=bias)
        self.project_out = nn.Conv2d(hidden_features, dim, kernel_size=1, bias=bias)

    def forward(self, x):
        device = x.device
        self.project_in = self.project_in.to(device)
        self.dwconv = self.dwconv.to(device)
        self.project_out = self.project_out.to(device)
        x = self.project_in(x)
        x1, x2 = self.dwconv(x).chunk(2, dim=1)
        x = F.gelu(x1) * x2
        x = self.project_out(x)
        return x


class Attention(nn.Module):
    def __init__(self, dim, num_heads, bias):
        super(Attention, self).__init__()
        self.num_heads = num_heads
        self.temperature = nn.Parameter(torch.ones(num_heads, 1, 1, dtype=torch.float32), requires_grad=True)
        self.qkv = nn.Conv2d(dim, dim * 3, kernel_size=1, bias=bias)
        self.qkv_dwconv = nn.Conv2d(dim * 3, dim * 3, kernel_size=3, stride=1, padding=1,
                                    groups=dim * 3, bias=bias)
        self.project_out = nn.Conv2d(dim, dim, kernel_size=1, bias=bias)

    def forward(self, x):
        b, c, h, w = x.shape
        device = x.device
        self.qkv = self.qkv.to(device)
        self.qkv_dwconv = self.qkv_dwconv.to(device)
        self.project_out = self.project_out.to(device)
        qkv = self.qkv(x)
        qkv = self.qkv_dwconv(qkv)
        q, k, v = qkv.chunk(3, dim=1)
        q = rearrange(q, 'b (head c) h w -> b head c (h w)', head=self.num_heads)
        k = rearrange(k, 'b (head c) h w -> b head c (h w)', head=self.num_heads)
        v = rearrange(v, 'b (head c) h w -> b head c (h w)', head=self.num_heads)
        q = torch.nn.functional.normalize(q, dim=-1)
        k = torch.nn.functional.normalize(k, dim=-1)
        attn = (q @ k.transpose(-2, -1)) * self.temperature.to(device)
        attn = attn.softmax(dim=-1)
        out = attn @ v
        out = rearrange(out, 'b head c (h w) -> b (head c) h w', head=self.num_heads, h=h, w=w)
        out = self.project_out(out)
        return out


class resblock(nn.Module):
    def __init__(self, dim):
        super(resblock, self).__init__()
        # self.norm = LayerNorm(dim, LayerNorm_type='BiasFree')
        self.body = nn.Sequential(
            nn.Conv2d(dim, dim, kernel_size=3, stride=1, padding=1, bias=False),
            nn.PReLU(),
            nn.Conv2d(dim, dim, kernel_size=3, stride=1, padding=1, bias=False))

    def forward(self, x):
        res = self.body(x)
        res += x
        return res


#########################################################################
# Chain-of-Thought Prompt Generation Module (CGM)
class CotPromptParaGen(nn.Module):
    def __init__(self, prompt_inch, prompt_size, num_path=3):
        super(CotPromptParaGen, self).__init__()
        # (128,32,32) -> (64,64,64) -> (32,128,128)
        self.chain_prompts = nn.ModuleList([
            nn.ConvTranspose2d(
                in_channels=prompt_inch if idx == 0 else prompt_inch // (2 ** idx),
                out_channels=prompt_inch // (2 ** (idx + 1)),
                kernel_size=3, stride=2, padding=1
            ) for idx in range(num_path)
        ])

    def forward(self, x):
        prompt_params = [x]
        for pe in self.chain_prompts:
            x = pe(x)
            prompt_params.append(x)
        return prompt_params


#########################################################################
# Content-driven Prompt Block (CPB)
class ContentDrivenPromptBlock(nn.Module):
    def __init__(self, dim, prompt_dim, reduction=8, num_splits=4):
        super(ContentDrivenPromptBlock, self).__init__()
        self.dim = dim
        self.num_splits = num_splits
        self.pa2 = nn.Conv2d(2 * dim, dim, 7, padding=3, padding_mode='reflect', groups=dim, bias=True)
        self.sigmoid = nn.Sigmoid()
        self.conv3x3 = nn.Conv2d(prompt_dim, prompt_dim, kernel_size=3, stride=1, padding=1, bias=False)
        self.conv1x1 = nn.Conv2d(dim, prompt_dim, kernel_size=1, stride=1, bias=False)
        self.sa = SpatialAttention()
        self.ca = ChannelAttention(dim, reduction)
        self.myshuffle = Channel_Shuffle(2)
        self.out_conv1 = nn.Conv2d(prompt_dim + dim, dim, kernel_size=1, stride=1, bias=False)
        # nn.ModuleList rather than a plain Python list, so the transformer blocks are
        # registered as submodules and their parameters are trained and moved with the model
        self.transformer_block = nn.ModuleList([
            TransformerBlock(dim=dim // num_splits, num_heads=1, ffn_expansion_factor=2.66,
                             bias=False, LayerNorm_type='WithBias') for _ in range(num_splits)])

    def forward(self, x, prompt_param):
        # latent: (b,dim*8,h/8,w/8)  prompt_param3: (1,256,16,16)
        x_ = x
        B, C, H, W = x.shape
        cattn = self.ca(x)  # channel-wise attention
        sattn = self.sa(x)  # spatial-wise attention
        pattn1 = sattn + cattn
        pattn1 = pattn1.unsqueeze(dim=2)  # [b,c,1,h,w]
        x = x.unsqueeze(dim=2)  # [b,c,1,h,w]
        x2 = torch.cat([x, pattn1], dim=2)  # [b,c,2,h,w]
        x2 = Rearrange('b c t h w -> b (c t) h w')(x2)  # [b,c*2,h,w]
        x2 = self.myshuffle(x2)  # [c1, c1_att, c2, c2_att, ...]
        pattn2 = self.pa2(x2)
        pattn2 = self.conv1x1(pattn2)  # [b,prompt_dim,h,w]
        prompt_weight = self.sigmoid(pattn2)
        prompt_param = F.interpolate(prompt_param, (H, W), mode="bilinear")
        # (b,prompt_dim,prompt_size,prompt_size) -> (b,prompt_dim,h,w)
        prompt = prompt_weight * prompt_param
        prompt = self.conv3x3(prompt)  # (b,prompt_dim,h,w)
        inter_x = torch.cat([x_, prompt], dim=1)  # (b,prompt_dim+dim,h,w)
        inter_x = self.out_conv1(inter_x)  # (b,dim,h,w)
        splits = torch.split(inter_x, self.dim // self.num_splits, dim=1)
        transformered_splits = []
        for i, split in enumerate(splits):
            transformered_splits.append(self.transformer_block[i](split))
        result = torch.cat(transformered_splits, dim=1)
        return result


#########################################################################
# CPA-Enhancer
class CPA_arch(nn.Module):
    def __init__(self, c_in=3, c_out=3, dim=4, prompt_inch=128, prompt_size=32):
        super(CPA_arch, self).__init__()
        self.conv0 = RFAConv(c_in, dim)
        self.conv1 = RFAConv(dim, dim)
        self.conv2 = RFAConv(dim * 2, dim * 2)
        self.conv3 = RFAConv(dim * 4, dim * 4)
        self.conv4 = RFAConv(dim * 8, dim * 8)
        self.conv5 = RFAConv(dim * 8, dim * 4)
        self.conv6 = RFAConv(dim * 4, dim * 2)
        self.conv7 = RFAConv(dim * 2, c_out)
        self.down1 = Downsample(dim)
        self.down2 = Downsample(dim * 2)
        self.down3 = Downsample(dim * 4)
        self.prompt_param_ini = nn.Parameter(torch.rand(1, prompt_inch, prompt_size, prompt_size))  # (b,c,h,w)
        self.myPromptParamGen = CotPromptParaGen(prompt_inch=prompt_inch, prompt_size=prompt_size)
        self.prompt1 = ContentDrivenPromptBlock(dim=dim * 2 ** 1, prompt_dim=prompt_inch // 4, reduction=8)
        self.prompt2 = ContentDrivenPromptBlock(dim=dim * 2 ** 2, prompt_dim=prompt_inch // 2, reduction=8)
        self.prompt3 = ContentDrivenPromptBlock(dim=dim * 2 ** 3, prompt_dim=prompt_inch, reduction=8)
        self.up3 = Upsample(dim * 8)
        self.up2 = Upsample(dim * 4)
        self.up1 = Upsample(dim * 2)

    def forward(self, x):  # (b,c_in,h,w)
        prompt_params = self.myPromptParamGen(self.prompt_param_ini)
        prompt_param1 = prompt_params[2]  # [1, 64, 64, 64]
        prompt_param2 = prompt_params[1]  # [1, 128, 32, 32]
        prompt_param3 = prompt_params[0]  # [1, 256, 16, 16]
        x0 = self.conv0(x)  # (b,dim,h,w)
        x1 = self.conv1(x0)  # (b,dim,h,w)
        x1_down = self.down1(x1)  # (b,dim*2,h/2,w/2)
        x2 = self.conv2(x1_down)  # (b,dim*2,h/2,w/2)
        x2_down = self.down2(x2)
        x3 = self.conv3(x2_down)
        x3_down = self.down3(x3)
        x4 = self.conv4(x3_down)
        device = x4.device
        self.prompt1 = self.prompt1.to(device)
        self.prompt2 = self.prompt2.to(device)
        self.prompt3 = self.prompt3.to(device)
        x4_prompt = self.prompt3(x4, prompt_param3)
        x3_up = self.up3(x4_prompt)
        x5 = self.conv5(torch.cat([x3_up, x3], 1))
        x5_prompt = self.prompt2(x5, prompt_param2)
        x2_up = self.up2(x5_prompt)
        x2_cat = torch.cat([x2_up, x2], 1)
        x6 = self.conv6(x2_cat)
        x6_prompt = self.prompt1(x6, prompt_param1)
        x1_up = self.up1(x6_prompt)
        x7 = self.conv7(torch.cat([x1_up, x1], 1))
        return x7


if __name__ == "__main__":
    # generate a sample image and run a forward pass
    image_size = (1, 3, 640, 640)
    image = torch.rand(*image_size)
    model = CPA_arch(3, 3, 4)
    out = model(image)
    print(out.size())

4. Step-by-Step Guide to Adding the Mechanism

4.1 Change 1

Step one, as always, is to create the files. Find the ultralytics/nn/modules folder and create a directory inside it named 'Addmodules', then create a new .py file in that directory and paste the core code from Section 3 into it.



4.2 Change 2

Step two: in the same directory, create a new .py file named '__init__.py', and import our module inside it.

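Assuming the file from step 4.1 was saved as CPAEnhancer.py (the file name is your choice; just keep the import consistent with it), the '__init__.py' is a one-line re-export along these lines:

```python
# ultralytics/nn/modules/Addmodules/__init__.py
# Re-export the public names of the new module (here 'CPA_arch', via its __all__)
from .CPAEnhancer import *
```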


4.3 Change 3

Step three: open the file 'ultralytics/nn/tasks.py' and import and register our module there!

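The registration starts with an import near the top of tasks.py. A minimal sketch (adjust the dotted path to wherever you actually created the Addmodules directory):

```python
# near the other module imports at the top of ultralytics/nn/tasks.py
from ultralytics.nn.modules.Addmodules import CPA_arch
```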


4.4 Change 4

Add the module inside parse_model the same way I do.
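The internals of parse_model differ across ultralytics releases, so treat the following as a sketch of the idea rather than the literal upstream code: CPA_arch consumes the raw 3-channel image and emits a 3-channel enhanced image, so its output channel count is fixed and no width scaling applies.

```python
# inside parse_model's per-layer loop in ultralytics/nn/tasks.py (placement is illustrative)
elif m is CPA_arch:
    c2 = 3  # the enhancer returns an image-like 3-channel tensor for the next layer
```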


Disable half-precision validation!

Open the file 'ultralytics/engine/validator.py', find 'class BaseValidator:', then inside its '__call__' method locate the line self.args.half = self.device.type != 'cpu'  # force FP16 val during training and add self.args.half = False on the line directly below it.
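In context, the change looks like this (the surrounding line is quoted from BaseValidator.__call__; its exact position may vary between ultralytics versions):

```python
# ultralytics/engine/validator.py, inside BaseValidator.__call__
self.args.half = self.device.type != 'cpu'  # force FP16 val during training
self.args.half = False  # added: always validate in FP32 so CPA-Enhancer runs correctly
```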


Printing the GFLOPs!

The GFLOPs computation fails and nothing gets printed, so one more file needs editing. Open 'ultralytics/utils/torch_utils.py', find the get_flops function in it, and replace that function entirely with the code I give below.

def get_flops(model, imgsz=640):
    """Return a YOLO model's FLOPs."""
    if not thop:
        return 0.0  # if not installed return 0.0 GFLOPs
    try:
        model = de_parallel(model)
        p = next(model.parameters())
        if not isinstance(imgsz, list):
            imgsz = [imgsz, imgsz]  # expand if int/float
        try:
            # use a fixed 640x640 input tensor
            stride = 640
            im = torch.empty((1, 3, stride, stride), device=p.device)  # input image in BCHW format
            flops = thop.profile(deepcopy(model), inputs=[im], verbose=False)[0] / 1e9 * 2  # stride GFLOPs
            return flops * imgsz[0] / stride * imgsz[1] / stride  # imgsz GFLOPs
        except Exception:
            # use actual image size for input tensor (i.e. required for RTDETR models)
            im = torch.empty((1, p.shape[1], *imgsz), device=p.device)  # input image in BCHW format
            return thop.profile(deepcopy(model), inputs=[im], verbose=False)[0] / 1e9 * 2  # imgsz GFLOPs
    except Exception:
        return 0.0

That completes all the modifications; you can now copy the yaml file below and run it.


5. yaml File and Training Run

5.1 The yaml file

Training info for this version: YOLO11-CPAEnhancer summary: 490 layers, 3,087,792 parameters, 3,087,776 gradients, 19.2 GFLOPs

# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLO11 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect

# Parameters
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolo11n.yaml' will call yolo11.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.50, 0.25, 1024] # summary: 319 layers, 2624080 parameters, 2624064 gradients, 6.6 GFLOPs
  s: [0.50, 0.50, 1024] # summary: 319 layers, 9458752 parameters, 9458736 gradients, 21.7 GFLOPs
  m: [0.50, 1.00, 512] # summary: 409 layers, 20114688 parameters, 20114672 gradients, 68.5 GFLOPs
  l: [1.00, 1.00, 512] # summary: 631 layers, 25372160 parameters, 25372144 gradients, 87.6 GFLOPs
  x: [1.00, 1.50, 512] # summary: 631 layers, 56966176 parameters, 56966160 gradients, 196.0 GFLOPs

# YOLO11n backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, CPA_arch, []] # 0 (CPA-Enhancer, full-resolution input enhancement)
  - [-1, 1, Conv, [64, 3, 2]] # 1-P1/2
  - [-1, 1, Conv, [128, 3, 2]] # 2-P2/4
  - [-1, 2, C3k2, [256, False, 0.25]]
  - [-1, 1, Conv, [256, 3, 2]] # 4-P3/8
  - [-1, 2, C3k2, [512, False, 0.25]]
  - [-1, 1, Conv, [512, 3, 2]] # 6-P4/16
  - [-1, 2, C3k2, [512, True]]
  - [-1, 1, Conv, [1024, 3, 2]] # 8-P5/32
  - [-1, 2, C3k2, [1024, True]]
  - [-1, 1, SPPF, [1024, 5]] # 10
  - [-1, 2, C2PSA, [1024]] # 11

# YOLO11n head
head:
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 7], 1, Concat, [1]] # cat backbone P4
  - [-1, 2, C3k2, [512, False]] # 14
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 5], 1, Concat, [1]] # cat backbone P3
  - [-1, 2, C3k2, [256, False]] # 17 (P3/8-small)
  - [-1, 1, Conv, [256, 3, 2]]
  - [[-1, 14], 1, Concat, [1]] # cat head P4
  - [-1, 2, C3k2, [512, False]] # 20 (P4/16-medium)
  - [-1, 1, Conv, [512, 3, 2]]
  - [[-1, 11], 1, Concat, [1]] # cat head P5
  - [-1, 2, C3k2, [1024, True]] # 23 (P5/32-large)
  - [[17, 20, 23], 1, Detect, [nc]] # Detect(P3, P4, P5)


5.2 Training code

Create a .py file, paste the code below into it, set your own file paths, and it is ready to run.

import warnings
warnings.filterwarnings('ignore')
from ultralytics import YOLO

if __name__ == '__main__':
    model = YOLO('ultralytics/cfg/models/11/yolo11-CPAEnhancer.yaml')  # save the yaml from Section 5.1 here, or adjust the path
    # model.load('yolov8n.pt')  # loading pretrain weights
    model.train(data=r'replace with the path to your dataset yaml file',
                # for other tasks, open 'ultralytics/cfg/default.yaml' and set task to detect, segment, classify or pose
                cache=False,
                imgsz=640,
                epochs=150,
                single_cls=False,  # whether this is single-class detection
                batch=4,
                close_mosaic=10,
                workers=0,
                device='0',
                optimizer='SGD',  # using SGD
                # resume='',  # to resume training, set this to the path of last.pt
                amp=False,  # disable AMP if the training loss becomes NaN
                project='runs/train',
                name='exp',
                )


5.3 Training screenshots


6. Conclusion

This concludes the main content of this post. Let me recommend my YOLOv11 effective-improvements column: it is newly opened with an average quality score of 98, and I will keep reproducing papers from the latest top conferences as well as supplementing older improvement mechanisms. If this post helped you, subscribe to the column and follow along for more updates~