[PyTorch Deep Learning] Common Activation Functions Explained

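I. Common activation functions explained
1. Sigmoid (S-shaped function): principle, formula, characteristics, use cases
2. Tanh (hyperbolic tangent): principle, formula, characteristics, use cases
3. ReLU (rectified linear unit): principle, formula, characteristics, use cases
4. Leaky ReLU: principle, formula, characteristics, use cases
5. ELU (exponential linear unit): principle, formula, characteristics, use cases
6. GELU (Gaussian error linear unit): principle, formula, characteristics, use cases
7. SiLU (sigmoid linear unit): principle, formula, characteristics, use cases

II. Comparison and summary of activation functions
1. Core characteristics compared
2. Summary comparison (TL;DR version)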

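I. Common activation functions explained

This article covers the commonly used activation functions Sigmoid, Tanh, ReLU, Leaky ReLU, ELU, GELU, and SiLU. Each one is described in terms of its principle, formula, characteristics, and use cases, followed by a PyTorch implementation example.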

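1. Sigmoid (S-shaped function)

Principle

Squashes the input into the interval (0, 1), modeling a neuron's "firing rate". It is commonly used in the output layer for binary classification.

Formula

$$f(x) = \frac{1}{1 + e^{-x}}$$

Characteristics

Advantages:

- The output lies in (0, 1) and can be read as a probability, which suits binary classification.

Disadvantages:

- Vanishing gradients: when |x| is large the gradient approaches zero, so parameters update slowly and deep networks become hard to train.
- Not zero-centered: the output mean is not zero, which can make gradient descent less efficient.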
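The vanishing-gradient behavior can be seen directly with autograd. The sketch below is only an illustration: the sample points are arbitrary, and it compares the autograd result with the closed form σ'(x) = σ(x)(1 − σ(x)), which peaks at 0.25 for x = 0.

```python
import torch

# Evaluate the Sigmoid derivative at a few arbitrary points using autograd.
x = torch.tensor([-10.0, -2.0, 0.0, 2.0, 10.0], requires_grad=True)
y = torch.sigmoid(x)
# Summing gives a scalar, so backward() fills x.grad with dy_i/dx_i.
y.sum().backward()

# Closed form for comparison: sigma'(x) = sigma(x) * (1 - sigma(x)).
closed_form = (y * (1 - y)).detach()

print("gradient (autograd):   ", x.grad)
print("gradient (closed form):", closed_form)
```

At x = ±10 the gradient is several orders of magnitude smaller than at x = 0, which is why stacked Sigmoid layers pass very little gradient back to early layers.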

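Use cases

Output layer for binary classification (e.g. logistic regression). Not recommended for hidden layers.

A PyTorch example of the Sigmoid activation, both as a direct function call and as a neural network module:

    import torch
    import torch.nn as nn

    # Method 1: call torch.sigmoid() directly
    x = torch.tensor([-2.0, 0.0, 1.0, 5.0])  # input tensor
    output = torch.sigmoid(x)  # apply Sigmoid
    print("Input:", x)
    print("Output:", output)

    # Method 2: use it as a neural network module
    class SimpleModel(nn.Module):
        def __init__(self):
            super().__init__()
            self.sigmoid = nn.Sigmoid()  # define the Sigmoid layer

        def forward(self, x):
            return self.sigmoid(x)

    # Test the module
    model = SimpleModel()
    x = torch.randn(2, 2)  # random input
    output = model(x)
    print("\nRandom input:\n", x)
    print("Sigmoid output:\n", output)

Example output (the random input differs from run to run):

    Input: tensor([-2., 0., 1., 5.])
    Output: tensor([0.1192, 0.5000, 0.7311, 0.9933])

    Random input:
    tensor([[ 2.1243, -0.2247],
            [ 0.5646, -0.6863]])
    Sigmoid output:
    tensor([[0.8932, 0.4441],
            [0.6375, 0.3349]])

Note:

- Function call: torch.sigmoid() operates directly on tensors and is convenient for quick experiments.
- Module form: nn.Sigmoid() acts as a network layer, which makes it easy to integrate into nn.Sequential or more complex models.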
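As a small illustration of the nn.Sequential usage mentioned in the note, the sketch below places nn.Sigmoid() at the end of a toy binary classifier. The layer sizes (4 inputs, 8 hidden units) and the batch of random data are arbitrary assumptions made for the example.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # fixed seed only so repeated runs match

# Hypothetical toy classifier: 4 input features -> 8 hidden units -> 1 probability.
classifier = nn.Sequential(
    nn.Linear(4, 8),
    nn.ReLU(),
    nn.Linear(8, 1),
    nn.Sigmoid(),  # squashes the final logit into (0, 1)
)

batch = torch.randn(3, 4)  # 3 samples, 4 features each
probs = classifier(batch)  # shape (3, 1), one probability per sample
print(probs)
```

When training such a model, a common alternative is to drop the final nn.Sigmoid() and feed the raw logits to nn.BCEWithLogitsLoss, which applies the Sigmoid internally in a numerically stabler way.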

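2. Tanh (hyperbolic tangent)

Principle

An improved variant of Sigmoid with output range (-1, 1). It is zero-centered, which makes it better suited to hidden layers.

Formula

$$f(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}} = 2 \cdot \text{Sigmoid}(2x) - 1$$

Characteristics

Advantages:

- Symmetric output, suitable when the input data is symmetrically distributed.
- Zero-centered, with larger gradients than Sigmoid (maximum gradient 1, versus 0.25 for Sigmoid).

Disadvantages:

- Still suffers from vanishing gradients when |x| is large.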
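Both the identity in the formula and the gradient claim can be checked numerically. The following is a minimal sketch, assuming an evenly spaced grid on [-5, 5] chosen only for illustration.

```python
import torch

x = torch.linspace(-5.0, 5.0, steps=11, requires_grad=True)

# Identity from the formula: tanh(x) = 2 * sigmoid(2x) - 1.
lhs = torch.tanh(x)
rhs = 2 * torch.sigmoid(2 * x) - 1
identity_holds = torch.allclose(lhs.detach(), rhs.detach(), atol=1e-5)
print("identity holds:", identity_holds)

# The gradient of tanh is 1 - tanh(x)^2: it peaks at 1 for x = 0 and decays toward 0.
lhs.sum().backward()
print("x:       ", x.detach())
print("gradient:", x.grad)
```

The gradient printout shows the largest value at the middle of the grid (x = 0) and values close to zero at both ends, which is the vanishing-gradient behavior Tanh shares with Sigmoid.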

Use cases

Hidden layers of RNNs (as a replacement for Sigmoid).

Implementation example:

    import torch
    import torch.nn as nn

    # Method 1: call the function directly
    x = torch.tensor([-2.0, 0.0, 1.0, 5.0])
    output = torch.tanh(x)  # or x.tanh()
    print("Input:", x)
    print("Tanh output:", output)

    # Method 2: module form
    class TanhModel(nn.Module):
        def __init__(self):
            super().__init__()
            self.tanh = nn.Tanh()

        def forward(self, x):
            return self.tanh(x)

    # Test the module
    model = TanhModel()
    x = torch.randn(2, 2)
    output = model(x)
    print("\nRandom input:\n", x)
    print("Tanh output:\n", output)

Example output (the random input differs from run to run):

    Input: tensor([-2., 0., 1., 5.])
    Tanh output: tensor([-0.9640, 0.0000, 0.7616, 0.9999])

    Random input:
    tensor([[ 0.6681, -0.3812],
            [-1.4497, -1.2516]])
    Tanh output:
    tensor([[ 0.5837, -0.3638],
            [-0.8956, -0.8487]])

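3. ReLU (rectified linear unit)

Principle

Passes positive inputs through unchanged and outputs zero for negative inputs. Simple and efficient.

Formula

$$f(x) = \max(0, x)$$

Characteristics

Advantages:

- Computationally efficient: no exponentials are involved.
- Mitigates vanishing gradients: the gradient is constantly 1 for positive inputs.

Disadvantages:

- Dying neurons: the gradient is zero for negative inputs, so a neuron can become permanently inactive.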
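The two gradient regimes are easy to inspect with autograd. This is a minimal sketch with arbitrarily chosen sample inputs.

```python
import torch

x = torch.tensor([-3.0, -0.5, 0.5, 3.0], requires_grad=True)
y = torch.relu(x)
y.sum().backward()

print("output:  ", y.detach())  # negative inputs are clamped to 0
print("gradient:", x.grad)      # 0 for negative inputs, 1 for positive ones
```

A unit whose pre-activation is negative for every training sample therefore receives zero gradient and its weights stop updating, which is the dying-neuron problem described above.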

Use cases

Hidden layers of most deep neural networks (e.g. CNNs and fully connected networks).

Implementation example:

    import torch
    import torch.nn as nn

    # Method 1: call the function directly
    x = torch.tensor([-2.0, 0.0, 1.0, 5.0])
    output = torch.relu(x)  # or torch.nn.functional.relu(x)
    print("Input:", x)
    print("ReLU output:", output)

    # Method 2: module form
    class ReLUModel(nn.Module):
        def __init__(self):
            super().__init__()
            self.relu = nn.ReLU()

        def forward(self, x):
            return self.relu(x)

    # Test the module
    model = ReLUModel()
    x = torch.randn(2, 2)
    output = model(x)
    print("\nRandom input:\n", x)
    print("ReLU output:\n", output)

Example output (the random input differs from run to run):

    Input: tensor([-2., 0., 1., 5.])
    ReLU output: tensor([0., 0., 1., 5.])

    Random input:
    tensor([[-0.1241, 0.8224],
            [-1.1693, 0.5198]])
    ReLU output:
    tensor([[0.0000, 0.8224],
            [0.0000, 0.5198]])

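4. Leaky ReLU

Principle

Introduces a small slope (α) for negative inputs to mitigate the dying-neuron problem.

Formula

$$f(x) = \begin{cases} x & x \geq 0 \\ \alpha x & x < 0 \end{cases} \quad (\alpha \text{ is typically set to } 0.01)$$

Characteristics

Advantages:

- Mitigates the "dead neuron" problem, because negative inputs still receive a nonzero gradient.
- Keeps the efficiency of ReLU.

Disadvantages:

- α has to be set by hand, and performance can be sensitive to this hyperparameter.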
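The role of α can be illustrated by looking at the gradient on the negative side for a few slopes. In the sketch below only 0.01 comes from the text; 0.1 and 0.2 are arbitrary values picked for comparison.

```python
import torch
import torch.nn.functional as F

# 0.01 is the usual default; 0.1 and 0.2 are arbitrary illustrative slopes.
for slope in (0.01, 0.1, 0.2):
    x = torch.tensor([-2.0, 3.0], requires_grad=True)
    y = F.leaky_relu(x, negative_slope=slope)
    y.sum().backward()
    print(f"negative_slope={slope}: output={y.detach().tolist()}, gradient={x.grad.tolist()}")
```

For the negative input the gradient equals the chosen slope rather than zero, so the unit keeps learning, while the positive input always has gradient 1.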

Use cases

Models with high stability requirements, such as generative adversarial networks (GANs).

Implementation example:

    import torch
    import torch.nn as nn

    # Method 1: call the function directly
    x = torch.tensor([-2.0, 0.0, 1.0, 5.0])
    output = torch.nn.functional.leaky_relu(x, negative_slope=0.01)
    print("Input:", x)
    print("Leaky ReLU output:", output)

    # Method 2: module form (the slope alpha is passed as negative_slope, default 0.01)
    class LeakyReLUModel(nn.Module):
        def __init__(self):
            super().__init__()
            self.leaky_relu = nn.LeakyReLU(negative_slope=0.01)

        def forward(self, x):
            return self.leaky_relu(x)

    # Test the module
    model = LeakyReLUModel()
    x = torch.randn(2, 2)
    output = model(x)
    print("\nRandom input:\n", x)
    print("Leaky ReLU output:\n", output)

Example output (the random input differs from run to run):

    Input: tensor([-2., 0., 1., 5.])
    Leaky ReLU output: tensor([-0.0200, 0.0000, 1.0000, 5.0000])

    Random input:
    tensor([[ 0.2123, -2.0427],
            [ 1.3490, -1.8808]])
    Leaky ReLU output:
    tensor([[ 0.2123, -0.0204],
            [ 1.3490, -0.0188]])

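5. ELU (exponential linear unit)

Principle

Uses an exponential function for negative inputs, so the output is smooth and its mean stays closer to zero, which reduces the effect of noise.

Formula

$$f(x) = \begin{cases} x & x \geq 0 \\ \alpha (e^x - 1) & x < 0 \end{cases} \quad (\alpha \text{ is typically set to } 1)$$

Characteristics

Advantages:

- Smooth output for negative inputs, with stable gradients.
- Output mean close to 0, which speeds up learning.
- Mitigates the dying-neuron problem, because negative inputs still have a gradient.

Disadvantages:

- The exponential is slower to compute.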
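To make the piecewise definition concrete, the sketch below writes ELU by hand and compares it with the built-in. The helper name elu_manual and the test inputs are illustrative assumptions, not part of PyTorch.

```python
import torch
import torch.nn.functional as F

def elu_manual(x: torch.Tensor, alpha: float = 1.0) -> torch.Tensor:
    """Piecewise ELU written directly from the formula (illustrative helper)."""
    # Clamp before expm1 so the unused branch never overflows for large positive x.
    negative_branch = alpha * torch.expm1(torch.clamp(x, max=0.0))
    return torch.where(x >= 0, x, negative_branch)

x = torch.tensor([-100.0, -2.0, 0.0, 1.0, 5.0])
print("manual:  ", elu_manual(x))
print("built-in:", F.elu(x, alpha=1.0))
```

For very negative inputs the output saturates at -α instead of growing without bound, which is what keeps large negative activations from dominating.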

Use cases

Noise-sensitive tasks (e.g. image generation).

Implementation example:

    import torch
    import torch.nn as nn

    # Method 1: call the function directly
    x = torch.tensor([-2.0, 0.0, 1.0, 5.0])
    output = torch.nn.functional.elu(x, alpha=1.0)
    print("Input:", x)
    print("ELU output:", output)

    # Method 2: module form (alpha defaults to 1.0)
    class ELUModel(nn.Module):
        def __init__(self):
            super().__init__()
            self.elu = nn.ELU(alpha=1.0)

        def forward(self, x):
            return self.elu(x)

    # Test the module
    model = ELUModel()
    x = torch.randn(2, 2)
    output = model(x)
    print("\nRandom input:\n", x)
    print("ELU output:\n", output)

Example output (the random input differs from run to run):

    Input: tensor([-2., 0., 1., 5.])
    ELU output: tensor([-0.8647, 0.0000, 1.0000, 5.0000])

    Random input:
    tensor([[-0.5995, -0.3823],
            [-2.5030, 0.5544]])
    ELU output:
    tensor([[-0.4509, -0.3177],
            [-0.9182, 0.5544]])

6. GELU (Gaussian error linear unit)

Principle

Combines ideas from ReLU and Dropout and models neuron activation from a probabilistic viewpoint: each input is weighted by the probability Φ(x).

Formula

$$f(x) = x \cdot \Phi(x)$$

where Φ(x) is the cumulative distribution function of the standard normal distribution.

Approximation:

$$f(x) \approx 0.5x \left(1 + \tanh\left(\sqrt{\frac{2}{\pi}}\,(x + 0.044715x^3)\right)\right)$$
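The exact definition and the tanh approximation can be compared side by side. The sketch below is an illustration only: the helper names gelu_exact and gelu_tanh are made up for this example, and Φ(x) is written through the error function as Φ(x) = 0.5(1 + erf(x/√2)).

```python
import math
import torch
import torch.nn.functional as F

def gelu_exact(x: torch.Tensor) -> torch.Tensor:
    """x * Phi(x), with the normal CDF written via the error function."""
    return 0.5 * x * (1.0 + torch.erf(x / math.sqrt(2.0)))

def gelu_tanh(x: torch.Tensor) -> torch.Tensor:
    """Tanh approximation, term by term from the formula above."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + torch.tanh(inner))

x = torch.linspace(-5.0, 5.0, steps=1001)
max_gap = (gelu_exact(x) - gelu_tanh(x)).abs().max().item()
print("matches F.gelu:", torch.allclose(gelu_exact(x), F.gelu(x), atol=1e-5))
print("max |exact - tanh approximation|:", max_gap)
```

The printed gap gives a feel for how close the approximation is on this grid; F.gelu with its default arguments computes the erf-based form.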

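Characteristics

Advantages:

- Performs very well in Transformers (e.g. BERT, GPT).
- Smooth nonlinearity that suits deep networks and helps mitigate vanishing gradients.

Disadvantages:

- More expensive to compute; implementations often use an approximation.

Use cases

Natural language processing (NLP) and deep Transformer models.

Implementation example:

    import torch
    import torch.nn as nn

    # Method 1: call the function directly (PyTorch 1.7+)
    x = torch.tensor([-2.0, 0.0, 1.0, 5.0])
    output = torch.nn.functional.gelu(x)
    print("Input:", x)
    print("GELU output:", output)

    # Method 2: module form
    class GELUModel(nn.Module):
        def __init__(self):
            super().__init__()
            self.gelu = nn.GELU()

        def forward(self, x):
            return self.gelu(x)

    # Test the module
    model = GELUModel()
    x = torch.randn(2, 2)
    output = model(x)
    print("\nRandom input:\n", x)
    print("GELU output:\n", output)

Example output (the random input differs from run to run):

    Input: tensor([-2., 0., 1., 5.])
    GELU output: tensor([-0.0455, 0.0000, 0.8413, 5.0000])

    Random input:
    tensor([[ 0.2885, -2.6745],
            [ 1.4673, -0.6222]])
    GELU output:
    tensor([[ 0.1770, -0.0100],
            [ 1.3629, -0.1661]])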

7. SiLU (sigmoid linear unit)

Principle

Weights the input by its own Sigmoid, balancing linear and nonlinear behavior.

Formula

$$f(x) = x \cdot \sigma(\beta x)$$

where σ is the Sigmoid function and β is a tunable parameter (this general form is also known as Swish).

With β fixed at 1 (SiLU):

$$f(x) = x \cdot \text{Sigmoid}(x)$$
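The effect of β can be explored with a few lines of code. The helper swish below is a hypothetical function written for this illustration (PyTorch's F.silu takes no β argument), and the β values are arbitrary.

```python
import torch
import torch.nn.functional as F

def swish(x: torch.Tensor, beta: float = 1.0) -> torch.Tensor:
    """x * sigmoid(beta * x); beta = 1 gives SiLU (illustrative helper)."""
    return x * torch.sigmoid(beta * x)

x = torch.tensor([-2.0, 0.0, 1.0, 5.0])
print("beta=1 matches F.silu:", torch.allclose(swish(x), F.silu(x)))

# beta = 0 gives the linear map x / 2; a large beta pushes the curve toward ReLU.
for beta in (0.0, 1.0, 10.0):
    print(f"beta={beta}:", swish(x, beta))
```

This is one way to read the "balance between linear and nonlinear" in the principle: β interpolates between a purely linear function and a ReLU-like gate, and SiLU sits at β = 1.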

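Characteristics

Advantages:

- Combines the smoothness of Sigmoid with a ReLU-like nonlinearity.
- Keeps a small gradient for negative inputs, which avoids dead neurons. The Sigmoid factor acts as a gate that scales each input automatically, which can improve generalization.

Disadvantages:

- Slightly higher computational cost.

Use cases

Computer vision (e.g. EfficientNet) and reinforcement learning.

Implementation example:

    import torch
    import torch.nn as nn

    # Method 1: call the function directly (PyTorch 1.7+)
    x = torch.tensor([-2.0, 0.0, 1.0, 5.0])
    output = torch.nn.functional.silu(x)
    print("Input:", x)
    print("SiLU output:", output)

    # Method 2: module form
    class SiLUModel(nn.Module):
        def __init__(self):
            super().__init__()
            self.silu = nn.SiLU()

        def forward(self, x):
            return self.silu(x)

    # Test the module
    model = SiLUModel()
    x = torch.randn(2, 2)
    output = model(x)
    print("\nRandom input:\n", x)
    print("SiLU output:\n", output)

Example output (the random input differs from run to run):

    Input: tensor([-2., 0., 1., 5.])
    SiLU output: tensor([-0.2384, 0.0000, 0.7311, 4.9665])

    Random input:
    tensor([[ 0.5578, -0.2293],
            [-0.5709, 0.8190]])
    SiLU output:
    tensor([[ 0.3547, -0.1016],
            [-0.2061, 0.5684]])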

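II. Comparison and summary of activation functions

1. Core characteristics compared

| Property | Sigmoid | Tanh | ReLU | Leaky ReLU | ELU | GELU | SiLU |
|---|---|---|---|---|---|---|---|
| Zero-centered | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Mitigates vanishing gradients | ❌ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Has the dying-neuron problem | - | - | ✅ | ❌ | ❌ | ❌ | ❌ |
| Computational cost | Medium | Medium | Low | Low | High | High | Medium |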
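The zero-centering property can be checked empirically by feeding zero-mean inputs through each activation and looking at the mean output. This is a rough sketch: the sample size and seed are arbitrary assumptions, and the exact numbers depend on the input distribution.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(100_000)  # zero-mean, unit-variance inputs

activations = {
    "Sigmoid": nn.Sigmoid(),
    "Tanh": nn.Tanh(),
    "ReLU": nn.ReLU(),
    "Leaky ReLU": nn.LeakyReLU(negative_slope=0.01),
    "ELU": nn.ELU(alpha=1.0),
    "GELU": nn.GELU(),
    "SiLU": nn.SiLU(),
}

# Mean output per activation: only an odd function such as Tanh stays near 0.
mean_output = {name: fn(x).mean().item() for name, fn in activations.items()}
for name, mean in mean_output.items():
    print(f"{name:<11} mean output = {mean:+.4f}")
```

Tanh comes out close to zero while Sigmoid sits near 0.5; the ReLU family lands in between, with ELU closer to zero than plain ReLU because of its negative branch.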

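2. Summary comparison (TL;DR version)

| Activation | Output range | Formula | Advantages | Disadvantages | Use cases |
|---|---|---|---|---|---|
| Sigmoid | (0, 1) | $f(x) = \frac{1}{1 + e^{-x}}$ | Probability-like output, suits binary classification | Vanishing gradients, not zero-centered | Binary classification output layer |
| Tanh | (-1, 1) | $f(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}$ | Zero-centered, larger gradients | Vanishing gradients | RNN/LSTM hidden layers |
| ReLU | [0, +∞) | $f(x) = \max(0, x)$ | Computationally efficient, mitigates vanishing gradients | Dying-neuron problem | Hidden layers of CNNs and fully connected networks |
| Leaky ReLU | (-∞, +∞) | $f(x) = \begin{cases} x & x \geq 0 \\ \alpha x & x < 0 \end{cases}$ | Mitigates dying neurons, keeps ReLU's efficiency | α needs tuning | GANs, replacement for ReLU |
| ELU | (-α, +∞) | $f(x) = \begin{cases} x & x \geq 0 \\ \alpha (e^x - 1) & x < 0 \end{cases}$ | Smooth and stable for negative inputs, reduces noise | Exponential is slower to compute | Noise-sensitive tasks (e.g. image generation) |
| GELU | ≈ [-0.17, +∞) | $f(x) = x \cdot \Phi(x)$ | Strong results in Transformers, suits deep networks | Expensive to compute (often approximated) | Transformers, NLP models |
| SiLU | ≈ [-0.28, +∞) | $f(x) = x \cdot \sigma(\beta x)$ with $\beta = 1$ | Smooth nonlinearity, often outperforms ReLU | Slightly higher computational cost | Computer vision, reinforcement learning |

Note: in the formulas, σ denotes the Sigmoid function and Φ(x) is the cumulative distribution function of the standard normal distribution.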
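The output ranges in the summary can be sanity-checked numerically by evaluating every function on a dense grid. This is a sketch under stated assumptions: the grid [-10, 10] with spacing 0.001 is an arbitrary choice, so unbounded ranges only show up as the grid's own limits.

```python
import torch
import torch.nn.functional as F

x = torch.linspace(-10.0, 10.0, steps=20_001)  # grid with spacing 0.001

functions = {
    "Sigmoid": torch.sigmoid,
    "Tanh": torch.tanh,
    "ReLU": torch.relu,
    "Leaky ReLU": lambda t: F.leaky_relu(t, negative_slope=0.01),
    "ELU": lambda t: F.elu(t, alpha=1.0),
    "GELU": F.gelu,
    "SiLU": F.silu,
}

# Smallest and largest value each function takes on the grid.
ranges = {name: (fn(x).min().item(), fn(x).max().item()) for name, fn in functions.items()}
for name, (lo, hi) in ranges.items():
    print(f"{name:<11} min = {lo:+.4f}   max = {hi:+.4f}")
```

On this grid GELU and SiLU dip only slightly below zero (roughly -0.17 and -0.28), ELU approaches but never reaches -α, and Leaky ReLU keeps decreasing linearly with the input.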
