MobileNet

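The MobileNet Family

MobileNet is a family of lightweight networks that Google designed for mobile and embedded devices. Its main goals are low latency, low power consumption, and small model size.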

MobileNet V1

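Core innovation: depthwise separable convolution

A standard convolution handles spatial information and channel information in a single step, because every filter spans all input channels. MobileNet V1 factorizes it into two steps:

Depthwise convolution (DW): convolves each input channel separately (each filter has depth 1) to extract spatial features.
Pointwise convolution (PW): a $1 \times 1$ convolution that linearly combines the DW outputs across channels to extract channel features.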

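Computational cost: assume an input feature map of size $D_F \times D_F \times M$, $N$ output channels, and a $D_K \times D_K$ kernel, with stride 1 and padding chosen so that the output keeps the $D_F \times D_F$ spatial size.

Standard convolution: $D_K \cdot D_K \cdot M \cdot N \cdot D_F \cdot D_F$
DW + PW: $D_K \cdot D_K \cdot M \cdot D_F \cdot D_F + M \cdot N \cdot D_F \cdot D_F$
Cost ratio (separable / standard): $$ \frac{D_K \cdot D_K \cdot M \cdot D_F \cdot D_F + M \cdot N \cdot D_F \cdot D_F}{D_K \cdot D_K \cdot M \cdot N \cdot D_F \cdot D_F} = \frac{1}{N} + \frac{1}{D_K^2} $$ With $D_K=3$, the $1/D_K^2 = 1/9$ term dominates for typical values of $N$, so the cost drops to roughly 1/8 to 1/9 of a standard convolution.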
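As a quick check of the ratio above, the following sketch evaluates both cost formulas for one layer and compares the result with the closed form $\frac{1}{N} + \frac{1}{D_K^2}$. The layer sizes are illustrative assumptions, not values from the text, and the function names are made up for this example.

```python
def standard_conv_macs(d_k, m, n, d_f):
    """Multiply-accumulates of a standard D_K x D_K convolution."""
    return d_k * d_k * m * n * d_f * d_f


def separable_conv_macs(d_k, m, n, d_f):
    """Multiply-accumulates of a depthwise conv followed by a 1x1 pointwise conv."""
    depthwise = d_k * d_k * m * d_f * d_f
    pointwise = m * n * d_f * d_f
    return depthwise + pointwise


# Illustrative layer sizes (assumed for this sketch).
d_k, m, n, d_f = 3, 64, 128, 56
ratio = separable_conv_macs(d_k, m, n, d_f) / standard_conv_macs(d_k, m, n, d_f)
closed_form = 1 / n + 1 / d_k ** 2
print(f"ratio = {ratio:.4f}, 1/N + 1/D_K^2 = {closed_form:.4f}")
```

For these sizes the ratio falls between 1/9 and 1/8, matching the estimate above.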

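Hyperparameters:

Width multiplier ($\alpha$): scales the number of channels in every layer, $\alpha \in (0, 1]$.
Resolution multiplier ($\rho$): scales the input resolution, and with it every feature map, $\rho \in (0, 1]$.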
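To see how the two multipliers affect cost, the sketch below applies them to the DW + PW cost formula: $\alpha$ scales both $M$ and $N$, and $\rho$ scales $D_F$. The cost then shrinks by roughly $\alpha^2$ and $\rho^2$ respectively. The function name and layer sizes are assumptions for illustration.

```python
def scaled_separable_macs(d_k, m, n, d_f, alpha=1.0, rho=1.0):
    """MACs of one depthwise separable layer after applying alpha and rho."""
    m_scaled = int(alpha * m)    # thinner input
    n_scaled = int(alpha * n)    # thinner output
    d_scaled = int(rho * d_f)    # smaller feature map
    depthwise = d_k * d_k * m_scaled * d_scaled * d_scaled
    pointwise = m_scaled * n_scaled * d_scaled * d_scaled
    return depthwise + pointwise


# Illustrative layer: 3x3 kernel, 64 -> 128 channels, 56x56 feature map.
base = scaled_separable_macs(3, 64, 128, 56)
half_width = scaled_separable_macs(3, 64, 128, 56, alpha=0.5)
half_res = scaled_separable_macs(3, 64, 128, 56, rho=0.5)
print(f"alpha=0.5: {half_width / base:.3f} of the baseline cost")
print(f"rho=0.5:   {half_res / base:.3f} of the baseline cost")
```

Halving the resolution gives exactly a quarter of the cost. Halving the width gives slightly more than a quarter, because the depthwise term scales only linearly with $\alpha$.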

PyTorch example

import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        # Depthwise: groups=in_channels gives one 3x3 filter per input channel.
        self.depthwise = nn.Sequential(
            nn.Conv2d(in_channels, in_channels, kernel_size=3, stride=stride, padding=1, groups=in_channels, bias=False),
            nn.BatchNorm2d(in_channels),
            nn.ReLU(inplace=True)
        )
        # Pointwise: a 1x1 convolution mixes information across channels.
        self.pointwise = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True)
        )

    def forward(self, x):
        x = self.depthwise(x)
        x = self.pointwise(x)
        return x
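The same factorization also reduces the parameter count. As a rough sketch, the helpers below count the learnable parameters of a block laid out like `DepthwiseSeparableConv` (bias-free convolutions, each followed by a BatchNorm with one weight and one bias per channel) and of a standard convolution plus BatchNorm with the same input and output channels. The helper names and channel counts are assumptions for illustration.

```python
def separable_block_params(in_channels, out_channels, kernel_size=3):
    """Learnable parameters of DW conv + BN followed by PW conv + BN."""
    dw_conv = kernel_size * kernel_size * in_channels  # one k x k filter per channel
    dw_bn = 2 * in_channels                            # BatchNorm weight and bias
    pw_conv = in_channels * out_channels               # 1x1 convolution
    pw_bn = 2 * out_channels
    return dw_conv + dw_bn + pw_conv + pw_bn


def standard_block_params(in_channels, out_channels, kernel_size=3):
    """Learnable parameters of a standard k x k conv (no bias) + BN."""
    conv = kernel_size * kernel_size * in_channels * out_channels
    bn = 2 * out_channels
    return conv + bn


separable = separable_block_params(32, 64)
standard = standard_block_params(32, 64)
print(f"separable: {separable}, standard: {standard}, ratio: {separable / standard:.3f}")
```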

MobileNet V2

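Core innovations: inverted residuals and linear bottlenecks

Problem with V1: in practice the depthwise kernels tend to degenerate during training, with a large share of their weights ending up at zero. The main cause given is that ReLU destroys information in low-dimensional features.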

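Improvements:

Inverted residuals:
- ResNet bottleneck: wide at both ends, narrow in the middle (reduce -> convolve -> expand).
- MobileNet V2: narrow at both ends, wide in the middle (expand -> DW convolution -> reduce). A $1 \times 1$ convolution first expands the channels (expansion factor $t$, typically 6), the DW convolution extracts features in that high-dimensional space, and a second $1 \times 1$ convolution projects back to a low dimension.
- Rationale: ReLU loses less information in a high-dimensional space, and the DW convolution works better on high-dimensional features.
Linear bottlenecks:
- The ReLU after the last $1 \times 1$ (projection) convolution is removed, so the block outputs a linear result.
- Rationale: applying ReLU in a low-dimensional space causes severe information loss.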

Structure: Input -> [1x1 Conv, ReLU6] (Expand) -> [3x3 DW Conv, ReLU6] -> [1x1 Conv, Linear] (Project) -> Output. A shortcut connection is added when stride=1 and the input and output channel counts are equal.
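To make the block structure concrete, the sketch below traces the tensor shape through each stage and reports whether the shortcut applies. It assumes a $3 \times 3$ depthwise kernel with padding 1. The function name and the example configurations are hypothetical.

```python
def trace_inverted_residual(in_channels, out_channels, stride, expand_ratio, height, width):
    """Return (stage, channels, height, width) tuples and the shortcut flag."""
    hidden_dim = int(round(in_channels * expand_ratio))
    # Output size of a 3x3 convolution with padding 1.
    out_h = (height + 2 * 1 - 3) // stride + 1
    out_w = (width + 2 * 1 - 3) // stride + 1

    stages = [("input", in_channels, height, width)]
    if expand_ratio != 1:
        stages.append(("expand 1x1 + ReLU6", hidden_dim, height, width))
    stages.append(("depthwise 3x3 + ReLU6", hidden_dim, out_h, out_w))
    stages.append(("project 1x1, linear", out_channels, out_h, out_w))

    # The shortcut needs identical input and output shapes.
    use_shortcut = stride == 1 and in_channels == out_channels
    return stages, use_shortcut


# Two assumed configurations: (in_channels, out_channels, stride, expand_ratio).
for cfg in [(24, 24, 1, 6), (24, 32, 2, 6)]:
    stages, use_shortcut = trace_inverted_residual(*cfg, height=56, width=56)
    for name, c, h, w in stages:
        print(f"{name:24s} {c:4d} x {h} x {w}")
    print("shortcut:", use_shortcut)
```

The trace shows the "narrow, wide, narrow" channel profile, and that a stride-2 block cannot use the shortcut because its output is spatially smaller than its input.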

PyTorch example

import torch
import torch.nn as nn

class InvertedResidual(nn.Module):
    def __init__(self, in_channels, out_channels, stride, expand_ratio):
        super().__init__()
        self.stride = stride
        hidden_dim = int(round(in_channels * expand_ratio))
        self.use_res_connect = self.stride == 1 and in_channels == out_channels

        layers = []
        if expand_ratio != 1:
            # pw: 1x1 expansion to hidden_dim channels (skipped when expand_ratio == 1)
            layers.append(nn.Sequential(
                nn.Conv2d(in_channels, hidden_dim, 1, 1, 0, bias=False),
                nn.BatchNorm2d(hidden_dim),
                nn.ReLU6(inplace=True)
            ))
        layers.extend([
            # dw: 3x3 depthwise convolution; the stride is applied here
            nn.Sequential(
                nn.Conv2d(hidden_dim, hidden_dim, 3, stride, 1, groups=hidden_dim, bias=False),
                nn.BatchNorm2d(hidden_dim),
                nn.ReLU6(inplace=True)
            ),
            # pw-linear: 1x1 projection with no activation (linear bottleneck)
            nn.Conv2d(hidden_dim, out_channels, 1, 1, 0, bias=False),
            nn.BatchNorm2d(out_channels),
        ])
        self.conv = nn.Sequential(*layers)

    def forward(self, x):
        if self.use_res_connect:
            return x + self.conv(x)
        else:
            return self.conv(x)

MobileNet V3

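Core innovations: NAS + h-swish + SE modules

MobileNet V3 was found with AutoML (neural architecture search, NAS) and then refined with hand-designed improvements. It comes in two versions, V3-Large and V3-Small.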

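Improvements:

Neural architecture search (NAS): the MnasNet approach searches the overall architecture, and the NetAdapt algorithm then fine-tunes the individual layers.
h-swish activation:
- swish: $x \cdot \sigma(x)$. The sigmoid is expensive to compute and awkward to quantize.
- h-swish (hard-swish): $x \cdot \frac{\text{ReLU6}(x+3)}{6}$.
- It approximates swish closely, is cheap to compute, and is quantization-friendly.
Squeeze-and-Excitation (SE) module:
- A lightweight SE (channel attention) module is added inside the bottleneck blocks.
- To limit the cost, the hidden layer of the SE module has $1/4$ of the module's input channels.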
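Redesigned tail:
- The last few layers were redesigned to cut computation and latency: the expensive final $1 \times 1$ convolution of V2 is moved after the global average pooling, which allows the layers that preceded it to be removed.
First-layer filters:
- The number of filters in the first $3 \times 3$ convolution is reduced from 32 to 16.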
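To illustrate how closely h-swish tracks swish, the sketch below evaluates both functions on a grid over $[-6, 6]$ using only the standard library and reports the largest gap. The grid and the scalar helper functions are assumptions made for this example.

```python
import math


def relu6(x):
    """Clamp x to the range [0, 6]."""
    return min(max(x, 0.0), 6.0)


def swish(x):
    """x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def h_swish(x):
    """Piecewise approximation of swish: x * ReLU6(x + 3) / 6."""
    return x * relu6(x + 3.0) / 6.0


# Evaluate on a grid from -6 to 6 in steps of 0.1.
xs = [i / 10 for i in range(-60, 61)]
max_gap = max(abs(swish(x) - h_swish(x)) for x in xs)
print(f"largest |swish - h_swish| on the grid: {max_gap:.4f}")
```

Because h-swish needs only an add, a clamp, and a multiply, it avoids the exponential in the sigmoid. It is exactly 0 for $x \le -3$ and exactly $x$ for $x \ge 3$.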

PyTorch example

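import torch
import torch.nn as nn

class h_swish(nn.Module):
    """Hard swish: x * ReLU6(x + 3) / 6."""
    def forward(self, x):
        return x * nn.functional.relu6(x + 3, inplace=True) / 6

class h_sigmoid(nn.Module):
    """Hard sigmoid: ReLU6(x + 3) / 6, with outputs in [0, 1]."""
    def forward(self, x):
        return nn.functional.relu6(x + 3, inplace=True) / 6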

class SqueezeExcite(nn.Module):
    def __init__(self, in_channels, squeeze_channels):
        super().__init__()
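        # Squeeze: global average pooling to a 1x1 map per channel.
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        # Excitation: two 1x1 convolutions acting as fully connected layers,
        # ending in a hard sigmoid that yields per-channel gates in [0, 1].
        self.fc = nn.Sequential(
            nn.Conv2d(in_channels, squeeze_channels, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(squeeze_channels, in_channels, 1),
            h_sigmoid()
        )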

    def forward(self, x):
        scale = self.avg_pool(x)
        scale = self.fc(scale)
        return x * scale
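As a rough sense of the overhead the SE module adds, the sketch below counts the learnable parameters of the two $1 \times 1$ convolutions in `SqueezeExcite` (each with a bias, as in the code above) for a given reduction ratio. The `reduction` argument and the channel count are hypothetical. Some implementations also round the squeeze width to a hardware-friendly multiple, which this sketch ignores.

```python
def se_params(channels, reduction=4):
    """Return (squeeze_channels, learnable parameter count) of an SE module."""
    squeeze_channels = channels // reduction
    reduce_fc = channels * squeeze_channels + squeeze_channels  # weights + bias
    expand_fc = squeeze_channels * channels + channels          # weights + bias
    return squeeze_channels, reduce_fc + expand_fc


# Assumed example: an SE module applied to 96 channels.
for reduction in (4, 16):
    squeeze_channels, params = se_params(96, reduction)
    print(f"reduction={reduction}: squeeze_channels={squeeze_channels}, params={params}")
```

With a reduction of 4, `squeeze_channels` would be passed as `in_channels // 4` when constructing the module.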