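YOLOv11 Improvements | Backbone Series | EMO, a Lightweight CNN Architecture Built from Inverted Residual Blocks, as a Detection Backbone (supports lightweight variants of the whole YOLOv11 family)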

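1. Introduction

The improvement covered in this article is EMO, a network built from inverted residual blocks. I already published its building block, iRMB, in an earlier post, together with a second-stage modification of it. EMO is the network assembled from iRMB blocks, so the modified block from that post, iEMA, can also be used inside this network, which gives a further second-stage modification. The backbone presented here is a lightweight CNN architecture. Before we start, a word about my column: it is updated every week with 3 to 10 articles on recent mechanisms, including second-stage modifications not published elsewhere and fused improvements. This article supports scaling across the whole YOLOv11 family, that is, the n, s, m, l and x versions. The content is my own original work; plagiarism will be pursued.

You are welcome to subscribe to my column and learn YOLO together!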


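Contents

1. Introduction

2. How the EMO model works

3. Core code of EMO

4. Adding EMO step by step

4.1 Modification 1

4.2 Modification 2

4.3 Modification 3

4.4 Modification 4

4.5 Modification 5

4.6 Modification 6

4.7 Modification 7

4.8 Modification 8

Note: an extra modification

Fixing the FLOPs printout

Things to watch out for

5. The EMO yaml file

5.1 The EMO yaml file

5.2 Training script

6. Record of a successful run

7. Summary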


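2. How the EMO model works

Paper: official paper link

Code: official code repository


The Efficient MOdel (EMO) is built on the Inverted Residual Block (IRB), the basic building block of lightweight CNNs, and folds in the effective components of the Transformer. This gives a unified view of lightweight model design in which convolution and attention are combined in one block. The paper evaluates EMO on ImageNet-1K, COCO2017 and ADE20K and reports a good balance between efficiency and accuracy for a lightweight model.

The basic idea of EMO can be summarised in three points:

1. Use of the Inverted Residual Block (IRB): the IRB is the basic building block of lightweight CNNs, and EMO extends it to attention-based models.

2. Abstraction into the Meta Mobile Block (MMB): EMO proposes a new lightweight design unit, the single-residual Meta Mobile Block, abstracted from the IRB and the effective components of the Transformer.

3. Construction of the Inverted Residual Mobile Block (iRMB): following simple but effective design criteria, EMO derives the iRMB and uses it to build an efficient ResNet-like model (EMO).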
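To make the "lightweight" claim about the inverted residual layout more concrete, here is a minimal sketch that only counts convolution weights. It assumes bias-free convolutions and the expand (1x1), depthwise (kxk), project (1x1) layout; the helper names `conv_params` and `inverted_residual_params` are made up for this illustration and are not part of the EMO code.

```python
def conv_params(c_in: int, c_out: int, k: int, groups: int = 1) -> int:
    """Weight count of a bias-free 2D convolution."""
    return (c_in // groups) * c_out * k * k


def inverted_residual_params(c: int, exp_ratio: float, k: int) -> dict:
    """Weights of expand (1x1) -> depthwise (kxk) -> project (1x1).

    The last entry is a dense kxk convolution at the expanded width,
    included only as a point of comparison.
    """
    c_mid = int(c * exp_ratio)
    return {
        "expand_1x1": conv_params(c, c_mid, 1),
        "depthwise_kxk": conv_params(c_mid, c_mid, k, groups=c_mid),
        "project_1x1": conv_params(c_mid, c, 1),
        "dense_kxk_at_mid_width": conv_params(c_mid, c_mid, k),
    }


counts = inverted_residual_params(c=64, exp_ratio=4.0, k=3)
block_total = counts["expand_1x1"] + counts["depthwise_kxk"] + counts["project_1x1"]
print(counts)
print("inverted residual block total:", block_total)
```

For 64 channels, an expansion ratio of 4 and a 3x3 kernel, the spatial mixing at the expanded width costs 2,304 weights as a depthwise convolution, against 589,824 for a dense 3x3 convolution at the same width; the whole block comes to 35,072 weights.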

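The figure below shows the structural details of the EMO model:

The left side shows the abstract, unified Meta Mobile Block, which merges Multi-Head Self-Attention, the Feed-Forward Network and the Inverted Residual Block. This composite block is instantiated with different expansion ratios and efficient operators.

The right side shows the ResNet-like EMO architecture, which consists entirely of the derived iRMB. The figure highlights the combinations of micro-operations used in EMO (depthwise separable convolution, window Transformer and so on) and the network levels at different scales, which serve classification (CLS), detection (Det) and segmentation (Seg). The design emphasises the flexibility and efficiency of EMO across downstream tasks.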


3. Core code of EMO

The core code of EMO is given below; see section 4 for how to use it.

    import math
    from functools import partial

    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    from einops import rearrange, reduce
    # Newer timm releases also expose these two names under timm.layers.
    from timm.models.layers import DropPath, trunc_normal_

    inplace = True

    __all__ = ['EMO_1M', 'EMO_2M', 'EMO_5M', 'EMO_6M']


    class SELayerV2(nn.Module):
        def __init__(self, in_channel, reduction=1):
            super(SELayerV2, self).__init__()
            assert in_channel >= reduction and in_channel % reduction == 0, 'invalid in_channel in SaElayer'
            self.reduction = reduction
            self.cardinality = 4
            self.avg_pool = nn.AdaptiveAvgPool2d(1)
            # cardinality 1
            self.fc1 = nn.Sequential(
                nn.Linear(in_channel, in_channel // self.reduction, bias=False),
                nn.ReLU(inplace=True)
            )
            # cardinality 2
            self.fc2 = nn.Sequential(
                nn.Linear(in_channel, in_channel // self.reduction, bias=False),
                nn.ReLU(inplace=True)
            )
            # cardinality 3
            self.fc3 = nn.Sequential(
                nn.Linear(in_channel, in_channel // self.reduction, bias=False),
                nn.ReLU(inplace=True)
            )
            # cardinality 4
            self.fc4 = nn.Sequential(
                nn.Linear(in_channel, in_channel // self.reduction, bias=False),
                nn.ReLU(inplace=True)
            )
            self.fc = nn.Sequential(
                nn.Linear(in_channel // self.reduction * self.cardinality, in_channel, bias=False),
                nn.Sigmoid()
            )

        def forward(self, x):
            b, c, _, _ = x.size()
            y = self.avg_pool(x).view(b, c)
            y1 = self.fc1(y)
            y2 = self.fc2(y)
            y3 = self.fc3(y)
            y4 = self.fc4(y)
            y_concate = torch.cat([y1, y2, y3, y4], dim=1)
            y_ex_dim = self.fc(y_concate).view(b, c, 1, 1)
            return x * y_ex_dim.expand_as(x)


    def get_act(act_layer='relu'):
        act_dict = {
            'none': nn.Identity,
            'relu': nn.ReLU,
            'relu6': nn.ReLU6,
            'silu': nn.SiLU,
            'gelu': nn.GELU
        }
        return act_dict[act_layer]


    class LayerNorm2d(nn.Module):
        def __init__(self, normalized_shape, eps=1e-6, elementwise_affine=True):
            super().__init__()
            self.norm = nn.LayerNorm(normalized_shape, eps, elementwise_affine)

        def forward(self, x):
            x = rearrange(x, 'b c h w -> b h w c').contiguous()
            x = self.norm(x)
            x = rearrange(x, 'b h w c -> b c h w').contiguous()
            return x


    def get_norm(norm_layer='in_1d'):
        eps = 1e-6
        norm_dict = {
            'none': nn.Identity,
            'in_1d': partial(nn.InstanceNorm1d, eps=eps),
            'in_2d': partial(nn.InstanceNorm2d, eps=eps),
            'in_3d': partial(nn.InstanceNorm3d, eps=eps),
            'bn_1d': partial(nn.BatchNorm1d, eps=eps),
            'bn_2d': partial(nn.BatchNorm2d, eps=eps),
            # 'bn_2d': partial(nn.SyncBatchNorm, eps=eps),
            'bn_3d': partial(nn.BatchNorm3d, eps=eps),
            'gn': partial(nn.GroupNorm, eps=eps),
            'ln_1d': partial(nn.LayerNorm, eps=eps),
            'ln_2d': partial(LayerNorm2d, eps=eps),
        }
        return norm_dict[norm_layer]


    class LayerScale(nn.Module):
        def __init__(self, dim, init_values=1e-5, inplace=True):
            super().__init__()
            self.inplace = inplace
            self.gamma = nn.Parameter(init_values * torch.ones(1, 1, dim))

        def forward(self, x):
            return x.mul_(self.gamma) if self.inplace else x * self.gamma


    class LayerScale2D(nn.Module):
        def __init__(self, dim, init_values=1e-5, inplace=True):
            super().__init__()
            self.inplace = inplace
            self.gamma = nn.Parameter(init_values * torch.ones(1, dim, 1, 1))

        def forward(self, x):
            return x.mul_(self.gamma) if self.inplace else x * self.gamma


    class ConvNormAct(nn.Module):
        def __init__(self, dim_in, dim_out, kernel_size, stride=1, dilation=1, groups=1, bias=False,
                     skip=False, norm_layer='bn_2d', act_layer='relu', inplace=True, drop_path_rate=0.):
            super(ConvNormAct, self).__init__()
            self.has_skip = skip and dim_in == dim_out
            padding = math.ceil((kernel_size - stride) / 2)
            self.conv = nn.Conv2d(dim_in, dim_out, kernel_size, stride, padding, dilation, groups, bias)
            self.norm = get_norm(norm_layer)(dim_out)
            # Note: the activation is fixed to GELU in this version;
            # the act_layer and inplace arguments are accepted but not used.
            self.act = nn.GELU()
            self.drop_path = DropPath(drop_path_rate) if drop_path_rate else nn.Identity()

        def forward(self, x):
            shortcut = x
            x = self.conv(x)
            x = self.norm(x)
            x = self.act(x)
            if self.has_skip:
                x = self.drop_path(x) + shortcut
            return x


    # ========== Multi-Scale Populations, for down-sampling and inductive bias ==========
    class MSPatchEmb(nn.Module):
        def __init__(self, dim_in, emb_dim, kernel_size=2, c_group=-1, stride=1, dilations=[1, 2, 3],
                     norm_layer='bn_2d', act_layer='silu'):
            super().__init__()
            self.dilation_num = len(dilations)
            # Resolve the default (-1) before checking divisibility, so the assert tests the real group count.
            c_group = math.gcd(dim_in, emb_dim) if c_group == -1 else c_group
            assert dim_in % c_group == 0
            self.convs = nn.ModuleList()
            for i in range(len(dilations)):
                padding = math.ceil(((kernel_size - 1) * dilations[i] + 1 - stride) / 2)
                self.convs.append(nn.Sequential(
                    nn.Conv2d(dim_in, emb_dim, kernel_size, stride, padding, dilations[i], groups=c_group),
                    get_norm(norm_layer)(emb_dim),
                    get_act(act_layer)(emb_dim)))

        def forward(self, x):
            if self.dilation_num == 1:
                x = self.convs[0](x)
            else:
                # Average the outputs of the parallel dilated convolutions.
                x = torch.cat([self.convs[i](x).unsqueeze(dim=-1) for i in range(self.dilation_num)], dim=-1)
                x = reduce(x, 'b c h w n -> b c h w', 'mean').contiguous()
            return x


    class iRMB(nn.Module):
        def __init__(self, dim_in, dim_out, norm_in=True, has_skip=True, exp_ratio=1.0, norm_layer='bn_2d',
                     act_layer='relu', v_proj=True, dw_ks=3, stride=1, dilation=1, se_ratio=0.0, dim_head=64, window_size=7,
                     attn_s=True, qkv_bias=False, attn_drop=0., drop=0., drop_path=0., v_group=False, attn_pre=False):
            super().__init__()
            self.norm = get_norm(norm_layer)(dim_in) if norm_in else nn.Identity()
            dim_mid = int(dim_in * exp_ratio)
            self.has_skip = (dim_in == dim_out and stride == 1) and has_skip
            self.attn_s = attn_s
            if self.attn_s:
                assert dim_in % dim_head == 0, 'dim should be divisible by num_heads'
                self.dim_head = dim_head
                self.window_size = window_size
                self.num_head = dim_in // dim_head
                self.scale = self.dim_head ** -0.5
                self.attn_pre = attn_pre
                self.qk = ConvNormAct(dim_in, int(dim_in * 2), kernel_size=1, bias=qkv_bias, norm_layer='none',
                                      act_layer='none')
                self.v = ConvNormAct(dim_in, dim_mid, kernel_size=1, groups=self.num_head if v_group else 1, bias=qkv_bias,
                                     norm_layer='none', act_layer=act_layer, inplace=inplace)
                self.attn_drop = nn.Dropout(attn_drop)
            else:
                if v_proj:
                    self.v = ConvNormAct(dim_in, dim_mid, kernel_size=1, bias=qkv_bias, norm_layer='none',
                                         act_layer=act_layer, inplace=inplace)
                else:
                    self.v = nn.Identity()
            self.conv_local = ConvNormAct(dim_mid, dim_mid, kernel_size=dw_ks, stride=stride, dilation=dilation,
                                          groups=dim_mid, norm_layer='bn_2d', act_layer='silu', inplace=inplace)
            # Note: SELayerV2 is always applied in this version; se_ratio is accepted but not used.
            self.se = SELayerV2(dim_mid)
            self.proj_drop = nn.Dropout(drop)
            self.proj = ConvNormAct(dim_mid, dim_out, kernel_size=1, norm_layer='none', act_layer='none', inplace=inplace)
            self.drop_path = DropPath(drop_path) if drop_path else nn.Identity()

        def forward(self, x):
            shortcut = x
            x = self.norm(x)
            B, C, H, W = x.shape
            if self.attn_s:
                # padding: make H and W multiples of the window size
                if self.window_size <= 0:
                    window_size_W, window_size_H = W, H
                else:
                    window_size_W, window_size_H = self.window_size, self.window_size
                pad_l, pad_t = 0, 0
                pad_r = (window_size_W - W % window_size_W) % window_size_W
                pad_b = (window_size_H - H % window_size_H) % window_size_H
                x = F.pad(x, (pad_l, pad_r, pad_t, pad_b, 0, 0,))
                n1, n2 = (H + pad_b) // window_size_H, (W + pad_r) // window_size_W
                x = rearrange(x, 'b c (h1 n1) (w1 n2) -> (b n1 n2) c h1 w1', n1=n1, n2=n2).contiguous()
                # attention
                b, c, h, w = x.shape
                qk = self.qk(x)
                qk = rearrange(qk, 'b (qk heads dim_head) h w -> qk b heads (h w) dim_head', qk=2, heads=self.num_head,
                               dim_head=self.dim_head).contiguous()
                q, k = qk[0], qk[1]
                attn_spa = (q @ k.transpose(-2, -1)) * self.scale
                attn_spa = attn_spa.softmax(dim=-1)
                attn_spa = self.attn_drop(attn_spa)
                if self.attn_pre:
                    # apply attention to the input first, then project with v
                    x = rearrange(x, 'b (heads dim_head) h w -> b heads (h w) dim_head', heads=self.num_head).contiguous()
                    x_spa = attn_spa @ x
                    x_spa = rearrange(x_spa, 'b heads (h w) dim_head -> b (heads dim_head) h w', heads=self.num_head, h=h,
                                      w=w).contiguous()
                    x_spa = self.v(x_spa)
                else:
                    v = self.v(x)
                    v = rearrange(v, 'b (heads dim_head) h w -> b heads (h w) dim_head', heads=self.num_head).contiguous()
                    x_spa = attn_spa @ v
                    x_spa = rearrange(x_spa, 'b heads (h w) dim_head -> b (heads dim_head) h w', heads=self.num_head, h=h,
                                      w=w).contiguous()
                # unpadding
                x = rearrange(x_spa, '(b n1 n2) c h1 w1 -> b c (h1 n1) (w1 n2)', n1=n1, n2=n2).contiguous()
                if pad_r > 0 or pad_b > 0:
                    x = x[:, :, :H, :W].contiguous()
            else:
                x = self.v(x)

            x = x + self.se(self.conv_local(x)) if self.has_skip else self.se(self.conv_local(x))

            x = self.proj_drop(x)
            x = self.proj(x)

            x = (shortcut + self.drop_path(x)) if self.has_skip else x
            return x


    class EMO(nn.Module):
        def __init__(self, dim_in=3, factor=1,
                     depths=[1, 2, 4, 2], stem_dim=16, embed_dims=[64, 128, 256, 512], exp_ratios=[4., 4., 4., 4.],
                     norm_layers=['bn_2d', 'bn_2d', 'bn_2d', 'bn_2d'], act_layers=['relu', 'relu', 'relu', 'relu'],
                     dw_kss=[3, 3, 5, 5], se_ratios=[0.0, 0.0, 0.0, 0.0], dim_heads=[32, 32, 32, 32],
                     window_sizes=[7, 7, 7, 7], attn_ss=[False, False, True, True], qkv_bias=True,
                     attn_drop=0., drop=0., drop_path=0., v_group=False, attn_pre=False, pre_dim=0):
            super().__init__()
            # Width scaling factor, e.g. 1.5 to widen the network.
            scale_factor = factor
            # exp_ratios are not scaled.
            # Scaled embed_dims: every entry is multiplied by scale_factor and cast to int.
            embed_dims = [int(dim * scale_factor) for dim in embed_dims]

            dprs = [x.item() for x in torch.linspace(0, drop_path, sum(depths))]
            self.stage0 = nn.ModuleList([
                MSPatchEmb(  # down to 112
                    dim_in, stem_dim, kernel_size=dw_kss[0], c_group=1, stride=2, dilations=[1],
                    norm_layer=norm_layers[0], act_layer='none'),
                iRMB(  # ds
                    stem_dim, stem_dim, norm_in=False, has_skip=False, exp_ratio=1,
                    norm_layer=norm_layers[0], act_layer=act_layers[0], v_proj=False, dw_ks=dw_kss[0],
                    stride=1, dilation=1, se_ratio=1,
                    dim_head=dim_heads[0], window_size=window_sizes[0], attn_s=False,
                    qkv_bias=qkv_bias, attn_drop=attn_drop, drop=drop, drop_path=0.,
                    attn_pre=attn_pre
                )
            ])
            emb_dim_pre = stem_dim
            for i in range(len(depths)):
                layers = []
                dpr = dprs[sum(depths[:i]):sum(depths[:i + 1])]
                for j in range(depths[i]):
                    if j == 0:
                        # first block of each stage: downsample, no skip, no attention
                        stride, has_skip, attn_s, exp_ratio = 2, False, False, exp_ratios[i] * 2
                    else:
                        stride, has_skip, attn_s, exp_ratio = 1, True, attn_ss[i], exp_ratios[i]
                    layers.append(iRMB(
                        emb_dim_pre, embed_dims[i], norm_in=True, has_skip=has_skip, exp_ratio=exp_ratio,
                        norm_layer=norm_layers[i], act_layer=act_layers[i], v_proj=True, dw_ks=dw_kss[i],
                        stride=stride, dilation=1, se_ratio=se_ratios[i],
                        dim_head=dim_heads[i], window_size=window_sizes[i], attn_s=attn_s,
                        qkv_bias=qkv_bias, attn_drop=attn_drop, drop=drop, drop_path=dpr[j], v_group=v_group,
                        attn_pre=attn_pre
                    ))
                    emb_dim_pre = embed_dims[i]
                self.__setattr__(f'stage{i + 1}', nn.ModuleList(layers))

            self.norm = get_norm(norm_layers[-1])(embed_dims[-1])
            if pre_dim > 0:
                self.pre_head = nn.Sequential(nn.Linear(embed_dims[-1], pre_dim), get_act(act_layers[-1])(inplace=inplace))
                self.pre_dim = pre_dim
            else:
                self.pre_head = nn.Identity()
                self.pre_dim = embed_dims[-1]
            self.apply(self._init_weights)
            # Channel count of each returned feature map, read by parse_model in tasks.py.
            self.width_list = [i.size(1) for i in self.forward(torch.randn(1, 3, 640, 640))]

        def _init_weights(self, m):
            if isinstance(m, nn.Linear):
                trunc_normal_(m.weight, std=.02)
                if m.bias is not None:
                    nn.init.zeros_(m.bias)
            elif isinstance(m, (nn.LayerNorm, nn.GroupNorm,
                                nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d,
                                nn.InstanceNorm1d, nn.InstanceNorm2d, nn.InstanceNorm3d)):
                nn.init.zeros_(m.bias)
                nn.init.ones_(m.weight)

        @torch.jit.ignore
        def no_weight_decay(self):
            return {'token'}

        @torch.jit.ignore
        def no_weight_decay_keywords(self):
            return {'alpha', 'gamma', 'beta'}

        @torch.jit.ignore
        def no_ft_keywords(self):
            # return {'head.weight', 'head.bias'}
            return {}

        @torch.jit.ignore
        def ft_head_keywords(self):
            return {'head.weight', 'head.bias'}, self.num_classes

        def get_classifier(self):
            return self.head

        def reset_classifier(self, num_classes):
            self.num_classes = num_classes
            self.head = nn.Linear(self.pre_dim, num_classes) if num_classes > 0 else nn.Identity()

        def check_bn(self):
            for name, m in self.named_modules():
                if isinstance(m, nn.modules.batchnorm._NormBase):
                    m.running_mean = torch.nan_to_num(m.running_mean, nan=0, posinf=1, neginf=-1)
                    m.running_var = torch.nan_to_num(m.running_var, nan=0, posinf=1, neginf=-1)

        def forward(self, x):
            # Keep the last feature map seen at each spatial resolution.
            unique_tensors = {}
            for blk in self.stage0:
                x = blk(x)
                height, width = x.shape[2], x.shape[3]
                unique_tensors[(height, width)] = x
            for blk in self.stage1:
                x = blk(x)
                height, width = x.shape[2], x.shape[3]
                unique_tensors[(height, width)] = x
            for blk in self.stage2:
                x = blk(x)
                height, width = x.shape[2], x.shape[3]
                unique_tensors[(height, width)] = x
            for blk in self.stage3:
                x = blk(x)
                height, width = x.shape[2], x.shape[3]
                unique_tensors[(height, width)] = x
            for blk in self.stage4:
                x = blk(x)
                height, width = x.shape[2], x.shape[3]
                unique_tensors[(height, width)] = x
            # Return the four deepest resolutions.
            result_list = list(unique_tensors.values())[-4:]
            return result_list


    def EMO_1M(factor=1):
        model = EMO(
            factor=factor,
            depths=[2, 2, 8, 3], stem_dim=24, embed_dims=[32, 48, 80, 168], exp_ratios=[2., 2.5, 3.0, 3.5],
            norm_layers=['bn_2d', 'bn_2d', 'ln_2d', 'ln_2d'], act_layers=['silu', 'silu', 'gelu', 'gelu'],
            dw_kss=[3, 3, 5, 5], dim_heads=[16, 16, 20, 21], window_sizes=[7, 7, 7, 7], attn_ss=[False, False, True, True],
            qkv_bias=True, attn_drop=0., drop=0., drop_path=0.04036, v_group=False, attn_pre=True, pre_dim=0)
        return model


    def EMO_2M(factor=1):
        model = EMO(
            factor=factor,
            depths=[3, 3, 9, 3], stem_dim=24, embed_dims=[32, 48, 120, 200], exp_ratios=[2., 2.5, 3.0, 3.5],
            norm_layers=['bn_2d', 'bn_2d', 'ln_2d', 'ln_2d'], act_layers=['silu', 'silu', 'gelu', 'gelu'],
            dw_kss=[3, 3, 5, 5], dim_heads=[16, 16, 20, 20], window_sizes=[7, 7, 7, 7], attn_ss=[False, False, True, True],
            qkv_bias=True, attn_drop=0., drop=0., drop_path=0.05, v_group=False, attn_pre=True, pre_dim=0)
        return model


    def EMO_5M(factor=1):
        model = EMO(
            factor=factor,
            depths=[3, 3, 9, 3], stem_dim=24, embed_dims=[48, 72, 160, 288], exp_ratios=[2., 3., 4., 4.],
            norm_layers=['bn_2d', 'bn_2d', 'ln_2d', 'ln_2d'], act_layers=['silu', 'silu', 'gelu', 'gelu'],
            dw_kss=[3, 3, 5, 5], dim_heads=[24, 24, 32, 32], window_sizes=[7, 7, 7, 7], attn_ss=[False, False, True, True],
            qkv_bias=True, attn_drop=0., drop=0., drop_path=0.05, v_group=False, attn_pre=True, pre_dim=0)
        return model


    def EMO_6M(factor=1):
        model = EMO(
            factor=factor,
            depths=[3, 3, 9, 3], stem_dim=24, embed_dims=[48, 72, 160, 320], exp_ratios=[2., 3., 4., 5.],
            norm_layers=['bn_2d', 'bn_2d', 'ln_2d', 'ln_2d'], act_layers=['silu', 'silu', 'gelu', 'gelu'],
            dw_kss=[3, 3, 5, 5], dim_heads=[16, 24, 20, 32], window_sizes=[7, 7, 7, 7], attn_ss=[False, False, True, True],
            qkv_bias=True, attn_drop=0., drop=0., drop_path=0.05, v_group=False, attn_pre=True, pre_dim=0)
        return model


    if __name__ == "__main__":
        # Generating Sample image
        image_size = (1, 3, 640, 640)
        image = torch.rand(*image_size)

        # Model
        model = EMO_6M()

        out = model(image)
        print(len(out))
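
The window padding in iRMB.forward is easy to misread, so here is a small pure-Python sketch of just that arithmetic. The function name `window_plan` is made up for this illustration; the formulas are the ones from the code above. With a 640x640 input, the two stages that use attention (attn_ss = [False, False, True, True]) see 40x40 and 20x20 feature maps.

```python
def window_plan(h: int, w: int, window_size: int):
    """Return (pad_b, pad_r, n1, n2) the way iRMB.forward computes them."""
    if window_size <= 0:
        # A non-positive window size means one window covering the whole map.
        win_h, win_w = h, w
    else:
        win_h = win_w = window_size
    pad_r = (win_w - w % win_w) % win_w   # columns added on the right
    pad_b = (win_h - h % win_h) % win_h   # rows added at the bottom
    n1, n2 = (h + pad_b) // win_h, (w + pad_r) // win_w
    return pad_b, pad_r, n1, n2


for size in (40, 20):
    print(size, window_plan(size, size, 7))
```

So a 40x40 map is padded to 42x42 (6 groups per side) and a 20x20 map to 21x21 (3 groups per side); the padding is cropped away again after attention.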

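4. Adding EMO step by step

4.1 Modification 1

The first step, as usual, is to create the files. Under the ultralytics/nn/modules folder, create a directory named 'Addmodules', then create a new .py file inside it and paste the core code into that file.


4.2 Modification 2

Next, create a new .py file named '__init__.py' in the same directory and import our module inside it, as shown in the figure below.


4.3 Modification 3

Third, open 'ultralytics/nn/tasks.py' to import and register our module.


4.4 Modification 4

Add the following two lines of code.


4.5 Modification 5

Go to roughly line 700 and something (see the figure for the exact place) and follow the figure, adding the part inside the red box. Note that there are no parentheses: only the function names are listed.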

    elif m in {EMO_1M, EMO_2M, EMO_5M, EMO_6M}:  # list the backbone functions you registered; the body below stays the same
        m = m(*args)
        c2 = m.width_list  # list of output channels
        backbone = True
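
To see what `m.width_list` contains without building the network, the sketch below replays the bookkeeping of EMO.forward with plain integers instead of tensors. It assumes a square input, one stride-2 downsampling in stage0 and one per later stage, as in the code of section 3; `simulate_width_list` is a name made up for this illustration.

```python
def simulate_width_list(stem_dim, embed_dims, factor, input_size=640):
    """Mimic EMO.forward: keep the last channel count seen at each resolution."""
    dims = [int(d * factor) for d in embed_dims]   # same scaling as EMO.__init__
    unique = {}
    size = input_size // 2                         # stage0 downsamples once
    unique[(size, size)] = stem_dim                # stem_dim is not scaled by factor
    for dim in dims:                               # stage1..stage4 each halve the resolution
        size //= 2
        unique[(size, size)] = dim
    return list(unique.values())[-4:]              # the four deepest resolutions


# EMO_1M settings from section 3, with the width factor used for YOLO11n.
print(simulate_width_list(24, [32, 48, 80, 168], 0.25))   # [8, 12, 20, 42]
```

Five resolutions are seen in total (320, 160, 80, 40 and 20 pixels for a 640 input), and the stem output at 320 is the one that gets dropped.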


4.6 Modification 6

Both of the red boxes below need to be changed.

    if isinstance(c2, list):
        # the backbone was already instantiated in modification 5
        m_ = m
        m_.backbone = True
    else:
        m_ = nn.Sequential(*(m(*args) for _ in range(n))) if n > 1 else m(*args)  # module
        t = str(m)[8:-2].replace('__main__.', '')  # module type
    m.np = sum(x.numel() for x in m_.parameters())  # number params
    m_.i, m_.f, m_.type = i + 4 if backbone else i, f, t  # attach index, 'from' index, type
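
The `i + 4 if backbone else i` expression is what makes the index comments in the yaml of section 5.1 line up. The sketch below replays it over the rows of that yaml. It assumes the `backbone` flag starts as False before the loop and flips to True when the backbone row is parsed (as in modification 5); `effective_indices` is a name made up for this illustration.

```python
# Module column of the yaml in section 5.1, in order (backbone rows, then head rows).
yaml_rows = [
    "EMO_1M", "SPPF", "C2PSA",
    "nn.Upsample", "Concat", "C3k2",
    "nn.Upsample", "Concat", "C3k2",
    "Conv", "Concat", "C3k2",
    "Conv", "Concat", "C3k2",
    "Detect",
]


def effective_indices(rows, backbone_names=("EMO_1M", "EMO_2M", "EMO_5M", "EMO_6M"), extra_slots=4):
    """Return (module, layer index) pairs using the i + 4 offset from modification 6."""
    backbone = False                       # assumed initial value of the flag
    result = []
    for i, name in enumerate(rows):
        if name in backbone_names:
            backbone = True                # set in modification 5
        result.append((name, i + extra_slots if backbone else i))
    return result


indices = effective_indices(yaml_rows)
for name, index in indices:
    print(f"{index:>2}  {name}")
```

The backbone row ends up with index 4 (it occupies slots 0 to 4), SPPF becomes layer 5, and the three C3k2 layers feeding Detect land on 12, 15 and 18, matching the yaml comments.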


4.7 Modification 7

The following part also needs to be changed; follow my version exactly.

The code is below; simply replace the original code with it.

    if verbose:
        LOGGER.info(f'{i:>3}{str(f):>20}{n_:>3}{m.np:10.0f}  {t:<45}{str(args):<30}')  # print
    save.extend(x % (i + 4 if backbone else i) for x in ([f] if isinstance(f, int) else f) if x != -1)  # append to savelist
    layers.append(m_)
    if i == 0:
        ch = []
    if isinstance(c2, list):
        # backbone: register all of its output channels at once
        ch.extend(c2)
        if len(c2) != 5:
            ch.insert(0, 0)
    else:
        ch.append(c2)
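
The channel list `ch` is what later layers index with their `from` field, so it helps to see it built once. This sketch replays only the `ch` part of the code above; `update_channels` is a name made up for this illustration, and the 256 used for the second layer is an illustrative value, not one read from a real run.

```python
def update_channels(ch, i, c2):
    """Replay the channel bookkeeping of modification 7 for one parsed layer."""
    if i == 0:
        ch = []                     # drop the input-channel entry, as in parse_model
    if isinstance(c2, list):
        ch.extend(c2)               # a backbone contributes one entry per feature map
        if len(c2) != 5:
            ch.insert(0, 0)         # dummy slot so the four maps sit at indices 1..4
    else:
        ch.append(c2)               # an ordinary layer contributes a single entry
    return ch


ch = update_channels([3], 0, [8, 12, 20, 42])   # EMO_1M width_list at factor 0.25
ch = update_channels(ch, 1, 256)                # a following single-output layer
print(ch)   # [0, 8, 12, 20, 42, 256]
```

With this layout, `ch[3]` is the channel count of the P4 map and `ch[2]` that of the P3 map, which is why the head in section 5.1 concatenates with layers 3 and 2.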


4.8 Modification 8

Modification 8 differs from the previous ones: it changes part of the forward pass, so we are no longer inside the parse_model method.

You can read the line numbers off the figure. We are still in tasks.py, the same file as before. Several forward methods in this file look very similar, so make sure you pick the right one: it is the one at around line 70-something. I also provide the code below, so you can simply copy and paste it. I will make a video about this part when I have time.

The code is as follows:

    def _predict_once(self, x, profile=False, visualize=False, embed=None):
        """
        Perform a forward pass through the network.

        Args:
            x (torch.Tensor): The input tensor to the model.
            profile (bool): Print the computation time of each layer if True, defaults to False.
            visualize (bool): Save the feature maps of the model if True, defaults to False.
            embed (list, optional): A list of feature vectors/embeddings to return.

        Returns:
            (torch.Tensor): The last output of the model.
        """
        y, dt, embeddings = [], [], []  # outputs
        for m in self.model:
            if m.f != -1:  # if not from previous layer
                x = y[m.f] if isinstance(m.f, int) else [x if j == -1 else y[j] for j in m.f]  # from earlier layers
            if profile:
                self._profile_one_layer(m, x, dt)
            if hasattr(m, 'backbone'):
                x = m(x)
                if len(x) != 5:  # 0 - 5
                    x.insert(0, None)
                for index, i in enumerate(x):
                    if index in self.save:
                        y.append(i)
                    else:
                        y.append(None)
                x = x[-1]  # the last output is passed on to the next layer
            else:
                x = m(x)  # run
                y.append(x if m.i in self.save else None)  # save output
            if visualize:
                feature_visualization(x, m.type, m.i, save_dir=visualize)
            if embed and m.i in embed:
                embeddings.append(nn.functional.adaptive_avg_pool2d(x, (1, 1)).squeeze(-1).squeeze(-1))  # flatten
                if m.i == max(embed):
                    return torch.unbind(torch.cat(embeddings, 1), dim=0)
        return x
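
The backbone branch of `_predict_once` can be traced with strings standing in for feature maps. This is only a sketch of that branch: `route_backbone_outputs` is a name made up for this illustration, and the `save` set is the one implied by the `from` fields of the yaml in section 5.1 (2, 3, 6, 9, 12, 15, 18), taken here as an assumption.

```python
def route_backbone_outputs(features, save):
    """Mimic the 'backbone' branch: pad to five slots, keep only saved indices."""
    features = list(features)
    if len(features) != 5:
        features.insert(0, None)            # dummy slot 0, as in _predict_once
    y = [feat if index in save else None for index, feat in enumerate(features)]
    return y, features[-1]                  # the deepest map feeds the next layer


save = {2, 3, 6, 9, 12, 15, 18}             # assumed: derived from the yaml in section 5.1
y, x = route_backbone_outputs(["P2", "P3", "P4", "P5"], save)
print(y)   # [None, None, 'P3', 'P4', None]
print(x)   # P5
```

Only P3 and P4 are kept for the later Concat layers; P5 is not stored in `y` but is handed directly to SPPF as `x`.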

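That completes the modifications. There are many details here, so be careful not to replace more code than shown, which leads to errors, and do not skip any step either: any omission makes the run fail, and the resulting errors are very hard to track down.


Note: an extra modification

Those who follow me know that most of my modifications are the same from article to article. This network needs one extra step involving a parameter named s: change the s shown below to 640 and the model runs correctly.


Fixing the FLOPs printout

Open 'ultralytics/utils/torch_utils.py' and modify it as shown in the figure below; otherwise the FLOPs count may fail to print.


Things to watch out for

If validation fails with a shape-mismatch error, you can fix the image size of the validation set as follows.

Open ultralytics/models/yolo/detect/train.py. In the build_dataset function of the DetectionTrainer class, change the argument rect=mode == 'val' to rect=False.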


5. The EMO yaml file

5.1 The EMO yaml file

Training info: YOLO11-EMO summary: 860 layers, 2,423,567 parameters, 2,423,551 gradients, 6.5 GFLOPs

    # Ultralytics YOLO 🚀, AGPL-3.0 license
    # YOLO11 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect

    # Parameters
    nc: 80 # number of classes
    scales: # model compound scaling constants, i.e. 'model=yolo11n.yaml' will call yolo11.yaml with scale 'n'
      # [depth, width, max_channels]
      n: [0.50, 0.25, 1024] # summary: 319 layers, 2624080 parameters, 2624064 gradients, 6.6 GFLOPs
      s: [0.50, 0.50, 1024] # summary: 319 layers, 9458752 parameters, 9458736 gradients, 21.7 GFLOPs
      m: [0.50, 1.00, 512] # summary: 409 layers, 20114688 parameters, 20114672 gradients, 68.5 GFLOPs
      l: [1.00, 1.00, 512] # summary: 631 layers, 25372160 parameters, 25372144 gradients, 87.6 GFLOPs
      x: [1.00, 1.50, 512] # summary: 631 layers, 56966176 parameters, 56966160 gradients, 196.0 GFLOPs

    # The backbone variants provided are ['EMO_1M', 'EMO_2M', 'EMO_5M', 'EMO_6M'].
    # n refers to the channel scaling of the matching YOLO scale; large and small are variants shipped with the official model.

    # YOLO11n backbone
    backbone:
      # [from, repeats, module, args]
      - [-1, 1, EMO_1M, [0.25]] # 0-4 P1/2. This single row stands for four feature layers; do not let the yaml layout limit your thinking.
      # Note: the value in args is the channel scaling factor, i.e. the width entry under scales above.
      # If you use yolov11n set it to 0.25; if you use yolov11s set it to 0.5.
      - [-1, 1, SPPF, [1024, 5]] # 5
      - [-1, 2, C2PSA, [1024]] # 6

    # YOLO11n head
    head:
      - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
      - [[-1, 3], 1, Concat, [1]] # cat backbone P4
      - [-1, 2, C3k2, [512, False]] # 9

      - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
      - [[-1, 2], 1, Concat, [1]] # cat backbone P3
      - [-1, 2, C3k2, [256, False]] # 12 (P3/8-small)

      - [-1, 1, Conv, [256, 3, 2]]
      - [[-1, 9], 1, Concat, [1]] # cat head P4
      - [-1, 2, C3k2, [512, False]] # 15 (P4/16-medium)

      - [-1, 1, Conv, [512, 3, 2]]
      - [[-1, 6], 1, Concat, [1]] # cat head P5
      - [-1, 2, C3k2, [1024, True]] # 18 (P5/32-large)

      - [[12, 15, 18], 1, Detect, [nc]] # Detect(P3, P4, P5)
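
Before changing the factor in the backbone row, it can be worth checking that the scaled widths still satisfy the `dim_in % dim_head == 0` assertion in iRMB for the stages that use attention. The following is a minimal sketch of that check under the settings listed in section 3; `check_scaling` is a name made up for this illustration.

```python
def check_scaling(embed_dims, dim_heads, attn_ss, factor):
    """Return the scaled widths and any attention stage whose width dim_head does not divide."""
    scaled = [int(d * factor) for d in embed_dims]        # same rule as EMO.__init__
    problems = [
        (stage + 1, dim, head)
        for stage, (dim, head, attn) in enumerate(zip(scaled, dim_heads, attn_ss))
        if attn and dim % head != 0
    ]
    return scaled, problems


# Settings copied from EMO_1M and EMO_2M in section 3.
emo_1m = dict(embed_dims=[32, 48, 80, 168], dim_heads=[16, 16, 20, 21], attn_ss=[False, False, True, True])
emo_2m = dict(embed_dims=[32, 48, 120, 200], dim_heads=[16, 16, 20, 20], attn_ss=[False, False, True, True])

for factor in (0.25, 0.5, 1.0, 1.5):                      # width values of the n, s, m/l and x scales
    print("EMO_1M", factor, check_scaling(factor=factor, **emo_1m))
print("EMO_2M", 0.25, check_scaling(factor=0.25, **emo_2m))
```

By this arithmetic EMO_1M passes for all four width values, whereas EMO_2M at 0.25 gives 30 and 50 channels in the attention stages, which a head size of 20 does not divide, so that combination would trip the assertion.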


5.2 Training script

You can copy my run script and use it directly.

    import warnings
    warnings.filterwarnings('ignore')
    from ultralytics import YOLO

    if __name__ == '__main__':
        # Placeholder name: point this at the yaml from section 5.1 under whatever name you saved it.
        model = YOLO('yolo11-EMO.yaml')
        # To switch model scale, change the yaml name above: yolo11s.yaml selects the s scale.
        # Likewise, for an improved yaml named yolo11-XXX.yaml, pass yolo11l-XXX.yaml to use the l scale
        # (change the name passed to YOLO above, not the name of the config file itself).
        # model.load('yolo11n.pt')  # optionally load pretrained weights; for research work I advise against it, otherwise it is hard to show an accuracy gain
        model.train(data=r"C:\Users\Administrator\PycharmProjects\yolov5-master\yolov5-master\Construction Site Safety.v30-raw-images_latestversion.yolov8\data.yaml",
                    # for other tasks, open 'ultralytics/cfg/default.yaml' and change task to one of detect, segment, classify, pose
                    cache=False,
                    imgsz=640,
                    epochs=150,
                    single_cls=False,  # whether this is single-class detection
                    batch=16,
                    close_mosaic=0,
                    workers=0,
                    device='0',
                    optimizer='SGD',  # using SGD
                    # resume='runs/train/exp21/weights/last.pt',  # to resume training, set this to the path of last.pt
                    amp=True,  # if the training loss becomes NaN, try turning amp off
                    project='runs/train',
                    name='exp',
                    )


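6. Record of a successful run

Below is a screenshot of a successful run. One epoch of training has completed; the image is too large for the screenshot to include the second epoch.


7. Summary

That concludes the main content of this article. I would like to recommend my column on effective YOLOv11 improvements. The column is newly opened and currently has an average quality score of 98. Going forward I will reproduce papers from the latest top conferences and also fill in some older improvement mechanisms. If this article helped you, subscribe to the column and follow along for further updates.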