{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# PyTorch - 灵活的深度学习框架教程\n", "\n", "欢迎来到 PyTorch 教程!PyTorch 是一个由 Facebook AI Research (FAIR) 开发和维护的开源机器学习库,广泛用于计算机视觉和自然语言处理等应用。\n", "\n", "**为什么 PyTorch 在 ML/DL/数据科学领域如此受欢迎?**\n", "\n", "1. **Pythonic 和易用性**:PyTorch 的 API 设计简洁直观,与 Python 和 NumPy 的风格非常接近,学习曲线相对平缓。\n", "2. **动态计算图 (Define-by-Run)**:计算图在运行时动态构建,使得调试更加容易,并且非常适合处理动态结构的模型(如 RNN)。\n", "3. **强大的 GPU 加速**:可以轻松地将计算移到 NVIDIA GPU 上执行,显著加速训练过程。\n", "4. **丰富的生态系统**:拥有庞大的社区和丰富的扩展库(如 TorchVision, TorchText, TorchAudio)。\n", "5. **研究友好**:由于其灵活性,PyTorch 在学术研究界非常受欢迎。\n", "\n", "**本教程将涵盖 PyTorch 的核心概念和基本用法:**\n", "\n", "1. **张量 (Tensors)**:PyTorch 的基本数据结构。\n", "2. **张量操作**:类似于 NumPy 的操作。\n", "3. **自动微分 (`torch.autograd`)**:计算梯度。\n", "4. **神经网络模块 (`torch.nn`)**:构建网络的基础。\n", "5. **构建一个简单的神经网络**。\n", "6. **损失函数 (`torch.nn`)**。\n", "7. **优化器 (`torch.optim`)**。\n", "8. **模型训练流程**。\n", "9. **数据加载 (`Dataset` & `DataLoader`)**。\n", "10. **GPU 使用基础**。\n", "11. **模型保存与加载**。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 准备工作:导入 PyTorch\n", "\n", "确保你已经安装了 PyTorch (可以访问 [pytorch.org](https://pytorch.org/) 获取适合你系统的安装命令)。" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import torch\n", "import torch.nn as nn\n", "import torch.optim as optim\n", "import torch.utils.data as data\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "\n", "print(f\"PyTorch version: {torch.__version__}\")\n", "\n", "# 检查是否有可用的 GPU\n", "device = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n", "print(f\"Using device: {device}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. 张量 (Tensors)\n", "\n", "张量是 PyTorch 中的核心数据结构,类似于 NumPy 的 `ndarray`,但增加了在 GPU 上计算和自动微分的功能。\n", "\n", "可以从 Python 列表、NumPy 数组等创建张量,也可以使用 PyTorch 提供的函数创建。" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(\"--- Creating Tensors ---\")\n", "# 从列表创建\n", "data_list = [[1, 2], [3, 4]]\n", "tensor_from_list = torch.tensor(data_list)\n", "print(f\"Tensor from list:\\n{tensor_from_list}\")\n", "print(f\"Tensor dtype: {tensor_from_list.dtype}\") # 默认通常是 torch.int64\n", "\n", "# 从 NumPy 数组创建 (共享内存,除非显式复制)\n", "np_array = np.array([[5, 6], [7, 8]], dtype=np.float32)\n", "tensor_from_np = torch.from_numpy(np_array)\n", "print(f\"\\nTensor from NumPy array:\\n{tensor_from_np}\")\n", "print(f\"Tensor dtype: {tensor_from_np.dtype}\")\n", "\n", "# 修改 NumPy 数组会影响 Tensor (反之亦然)\n", "# np_array[0, 0] = 99\n", "# print(f\"Tensor after modifying NumPy array:\\n{tensor_from_np}\")\n", "\n", "# 创建特定形状和类型的张量\n", "tensor_zeros = torch.zeros(2, 3) # 2x3 全零张量 (默认 float32)\n", "print(f\"\\nZeros tensor (2x3):\\n{tensor_zeros}\")\n", "\n", "tensor_ones = torch.ones(3, 2, dtype=torch.int)\n", "print(f\"\\nOnes tensor (3x2, int):\\n{tensor_ones}\")\n", "\n", "tensor_rand = torch.rand(2, 2) # [0, 1) 均匀分布随机数\n", "print(f\"\\nRandom tensor (2x2, uniform [0,1)):\\n{tensor_rand}\")\n", "\n", "tensor_randn = torch.randn(3, 1) # 标准正态分布随机数\n", "print(f\"\\nRandom tensor (3x1, standard normal):\\n{tensor_randn}\")\n", "\n", "# 获取张量属性\n", "print(f\"\\nProperties of tensor_from_list:\")\n", "print(f\" Shape: {tensor_from_list.shape}\")\n", "print(f\" Size: {tensor_from_list.size()}\") # size() 和 shape 类似\n", "print(f\" Data type: {tensor_from_list.dtype}\")\n", "print(f\" Device: {tensor_from_list.device}\") # 存储设备 (默认是 'cpu')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. 
{ "cell_type": "markdown", "metadata": {}, "source": [
"## 2. Tensor Operations\n",
"\n",
"PyTorch tensors support a rich set of operations, with syntax very close to NumPy." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"t1 = torch.tensor([[1., 2.], [3., 4.]])\n",
"t2 = torch.tensor([[5., 6.], [7., 8.]])\n",
"scalar = 10\n",
"\n",
"print(f\"Tensor t1:\\n{t1}\")\n",
"print(f\"Tensor t2:\\n{t2}\")\n",
"\n",
"# Element-wise arithmetic\n",
"print(\"\\n--- Arithmetic Operations ---\")\n",
"print(f\"t1 + t2:\\n{t1 + t2}\")\n",
"print(f\"torch.add(t1, t2):\\n{torch.add(t1, t2)}\")\n",
"print(f\"t1 * t2:\\n{t1 * t2}\") # element-wise multiplication\n",
"print(f\"t1 + scalar:\\n{t1 + scalar}\")\n",
"\n",
"# In-place operations (method names usually end with an underscore '_')\n",
"t1_copy = t1.clone() # make a copy\n",
"t1_copy.add_(t2) # t1_copy = t1_copy + t2\n",
"print(f\"\\nt1_copy after add_:\\n{t1_copy}\")\n",
"\n",
"# Matrix multiplication\n",
"print(\"\\n--- Matrix Multiplication ---\")\n",
"# Use the @ operator or torch.matmul()\n",
"mat_mul = t1 @ t2\n",
"print(f\"t1 @ t2:\\n{mat_mul}\")\n",
"mat_mul_func = torch.matmul(t1, t2)\n",
"print(f\"torch.matmul(t1, t2):\\n{mat_mul_func}\")\n",
"\n",
"# Indexing and slicing (NumPy-like)\n",
"print(\"\\n--- Indexing and Slicing ---\")\n",
"print(f\"t1[0, 1]: {t1[0, 1]}\") # first row, second column\n",
"print(f\"t1[:, 0]: {t1[:, 0]}\") # first column\n",
"print(f\"t1[t1 > 2]: {t1[t1 > 2]}\") # boolean indexing\n",
"\n",
"# Shape manipulation\n",
"print(\"\\n--- Reshaping ---\")\n",
"t_flat = torch.arange(6) # [0, 1, 2, 3, 4, 5]\n",
"t_reshaped = t_flat.view(2, 3) # returns a view that shares the underlying data (requires contiguous memory)\n",
"print(f\"Original flat tensor: {t_flat}\")\n",
"print(f\"View as (2, 3):\\n{t_reshaped}\")\n",
"t_reshape = t_flat.reshape(3, 2) # more general; may return a copy when a view is not possible\n",
"print(f\"Reshape as (3, 2):\\n{t_reshape}\")\n",
"\n",
"# NumPy <-> PyTorch conversion\n",
"print(\"\\n--- NumPy Bridge ---\")\n",
"np_arr = np.ones((2,2))\n",
"torch_tensor = torch.from_numpy(np_arr)\n",
"print(f\"Tensor from NumPy:\\n{torch_tensor}\")\n",
"np_back = torch_tensor.numpy() # back to a NumPy array (shares memory)\n",
"print(f\"NumPy from Tensor:\\n{np_back}\")" ] },
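{ "cell_type": "markdown", "metadata": {}, "source": [
"Two more NumPy-like behaviours come up constantly and are worth a quick, optional look (not part of the original walkthrough): broadcasting between tensors of different shapes, and joining tensors with `torch.cat` / `torch.stack`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"# Optional illustration: broadcasting and concatenation\n",
"a = torch.ones(2, 3)            # shape [2, 3]\n",
"b = torch.tensor([1., 2., 3.])  # shape [3], broadcast to [2, 3]\n",
"print(f\"a + b (broadcasting):\\n{a + b}\")\n",
"\n",
"print(f\"cat along dim 0: {torch.cat([a, a], dim=0).shape}\")     # [4, 3]\n",
"print(f\"stack on a new dim: {torch.stack([a, a], dim=0).shape}\") # [2, 2, 3]" ] },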
{ "cell_type": "markdown", "metadata": {}, "source": [
"## 3. Automatic Differentiation (`torch.autograd`)\n",
"\n",
"`autograd` is one of PyTorch's core features: it provides automatic differentiation for all operations on tensors, which is essential for backpropagation in neural networks.\n",
"\n",
"* A tensor created with `requires_grad=True` tracks every operation performed on it.\n",
"* Once the computation is done, calling `.backward()` makes PyTorch compute all the gradients automatically.\n",
"* Gradients are accumulated into each tensor's `.grad` attribute." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"print(\"--- Autograd Example ---\")\n",
"# Create tensors that require gradients\n",
"x = torch.tensor(2.0, requires_grad=True)\n",
"w = torch.tensor(3.0, requires_grad=True)\n",
"b = torch.tensor(1.0, requires_grad=True)\n",
"\n",
"# Define a simple computation graph (e.g., y = w * x + b)\n",
"y = w * x + b\n",
"print(f\"y = w*x + b = {y}\")\n",
"\n",
"# Define a loss (e.g., L = y^2)\n",
"L = y.pow(2)\n",
"print(f\"L = y^2 = {L}\")\n",
"\n",
"# Run backpropagation to compute the gradients\n",
"print(\"\\nCalling L.backward()...\")\n",
"L.backward()\n",
"\n",
"# Inspect the gradients (dL/dx, dL/dw, dL/db)\n",
"# dL/dy = 2*y = 2*(w*x+b) = 2*(3*2+1) = 14\n",
"# dL/dw = dL/dy * dy/dw = (2*y) * x = 14 * 2 = 28\n",
"# dL/dx = dL/dy * dy/dx = (2*y) * w = 14 * 3 = 42\n",
"# dL/db = dL/dy * dy/db = (2*y) * 1 = 14 * 1 = 14\n",
"print(f\"Gradient dL/dx (x.grad): {x.grad}\")\n",
"print(f\"Gradient dL/dw (w.grad): {w.grad}\")\n",
"print(f\"Gradient dL/db (b.grad): {b.grad}\")\n",
"\n",
"# --- Disabling gradient tracking ---\n",
"# Use the torch.no_grad() context manager\n",
"print(\"\\nInside torch.no_grad():\")\n",
"with torch.no_grad():\n",
"    y_no_grad = w * x + b\n",
"    print(f\" y_no_grad = {y_no_grad}\")\n",
"    print(f\" y_no_grad.requires_grad: {y_no_grad.requires_grad}\") # False\n",
"\n",
"# Or use .detach() to get a new tensor without gradient history\n",
"x_detached = x.detach()\n",
"print(f\"\\nx_detached requires_grad: {x_detached.requires_grad}\") # False\n",
"\n",
"# Gradients accumulate, so they must be zeroed inside an optimization loop\n",
"print(\"\\nGradient accumulation:\")\n",
"y2 = w * x + b\n",
"L2 = y2.pow(2)\n",
"L2.backward() # compute the gradients a second time\n",
"print(f\"w.grad after second backward(): {w.grad}\") # the gradient is now 28 + 28 = 56\n",
"\n",
"# Zero the gradients (at the start of each optimization step)\n",
"w.grad.zero_()\n",
"x.grad.zero_()\n",
"b.grad.zero_()\n",
"print(f\"w.grad after zero_(): {w.grad}\")" ] },
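{ "cell_type": "markdown", "metadata": {}, "source": [
"To build some intuition for the numbers `backward()` just produced, here is an optional finite-difference sanity check of dL/dw (the exact value is 28). The step size `eps` is an arbitrary choice made for this sketch." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"# Optional sanity check: approximate dL/dw with a central finite difference\n",
"eps = 1e-4\n",
"\n",
"def loss_at(w_value):\n",
"    # evaluate L = (w*x + b)^2 at the given w, without tracking gradients\n",
"    with torch.no_grad():\n",
"        return ((w_value * x + b) ** 2).item()\n",
"\n",
"numeric_grad = (loss_at(w + eps) - loss_at(w - eps)) / (2 * eps)\n",
"print(f\"Numeric dL/dw ~ {numeric_grad:.4f} (autograd gave 28)\")" ] },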
{ "cell_type": "markdown", "metadata": {}, "source": [
"## 4. Neural Network Modules (`torch.nn`)\n",
"\n",
"The `torch.nn` module is the core toolkit for building neural networks. It provides layers, loss functions, and containers.\n",
"\n",
"* **`nn.Module`**: the base class for all neural network modules. Custom layers and models subclass it and implement the `forward()` method.\n",
"* **Common layers**: `nn.Linear`, `nn.Conv2d`, `nn.RNN`, `nn.LSTM`, `nn.Embedding`, `nn.Dropout`, `nn.BatchNorm2d`, etc.\n",
"* **Activation functions**: `nn.ReLU`, `nn.Sigmoid`, `nn.Tanh`, `nn.Softmax`, etc. (most also exist in functional form in `nn.functional`).\n",
"* **Loss functions**: `nn.MSELoss`, `nn.CrossEntropyLoss`, `nn.BCELoss`, etc.\n",
"* **Containers**: `nn.Sequential` (chains modules in order), `nn.ModuleList`, `nn.ModuleDict`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"print(\"--- torch.nn Examples ---\")\n",
"\n",
"# Fully connected (linear) layer\n",
"input_features = 4\n",
"output_features = 2\n",
"linear_layer = nn.Linear(input_features, output_features)\n",
"print(f\"Linear layer: {linear_layer}\")\n",
"print(f\"Linear layer weight shape: {linear_layer.weight.shape}\") # [output_features, input_features]\n",
"print(f\"Linear layer bias shape: {linear_layer.bias.shape}\") # [output_features]\n",
"\n",
"# Sample input (needs a batch dimension, e.g., [batch_size, num_features])\n",
"sample_input = torch.randn(3, input_features) # batch of 3 samples\n",
"output = linear_layer(sample_input)\n",
"print(f\"\\nSample input shape: {sample_input.shape}\")\n",
"print(f\"Output shape from linear layer: {output.shape}\") # [batch_size, output_features]\n",
"\n",
"# ReLU activation\n",
"relu = nn.ReLU()\n",
"output_relu = relu(output)\n",
"print(f\"\\nOutput after ReLU:\\n{output_relu}\")\n",
"\n",
"# Build a small network with the Sequential container\n",
"model_sequential = nn.Sequential(\n",
"    nn.Linear(10, 20), # 10 inputs, 20 outputs\n",
"    nn.ReLU(),\n",
"    nn.Linear(20, 5) # 20 inputs, 5 outputs\n",
")\n",
"print(f\"\\nSequential model:\\n{model_sequential}\")\n",
"\n",
"# Sample input\n",
"seq_input = torch.randn(4, 10) # batch of 4, 10 features\n",
"seq_output = model_sequential(seq_input)\n",
"print(f\"Output shape from sequential model: {seq_output.shape}\") # [4, 5]" ] },
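{ "cell_type": "markdown", "metadata": {}, "source": [
"The list above notes that most activations also exist as plain functions in `nn.functional`. This optional snippet simply confirms that the module form and the functional form compute the same thing." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"import torch.nn.functional as F\n",
"\n",
"z = torch.randn(2, 3)\n",
"# Module form vs. functional form give identical results\n",
"print(torch.allclose(nn.ReLU()(z), F.relu(z)))\n",
"print(torch.allclose(nn.Softmax(dim=1)(z), F.softmax(z, dim=1)))" ] },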
{ "cell_type": "markdown", "metadata": {}, "source": [
"## 5. Building a Simple Neural Network\n",
"\n",
"Custom networks are usually built by subclassing `nn.Module` and defining the `__init__` and `forward` methods." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"print(\"--- Building a Simple Network (Custom Class) ---\")\n",
"\n",
"class SimpleNet(nn.Module):\n",
"    def __init__(self, input_size, hidden_size, output_size):\n",
"        super(SimpleNet, self).__init__() # call the parent class __init__\n",
"        self.layer1 = nn.Linear(input_size, hidden_size)\n",
"        self.relu = nn.ReLU()\n",
"        self.layer2 = nn.Linear(hidden_size, output_size)\n",
"        print(f\"SimpleNet initialized with input={input_size}, hidden={hidden_size}, output={output_size}\")\n",
"\n",
"    def forward(self, x):\n",
"        # Define how data flows through the layers\n",
"        print(\" SimpleNet forward pass starting...\")\n",
"        out = self.layer1(x)\n",
"        print(f\" Shape after layer1: {out.shape}\")\n",
"        out = self.relu(out)\n",
"        print(f\" Shape after relu: {out.shape}\")\n",
"        out = self.layer2(out)\n",
"        print(f\" Shape after layer2 (final output): {out.shape}\")\n",
"        return out\n",
"\n",
"# Instantiate the network\n",
"input_dim = 5\n",
"hidden_dim = 8\n",
"output_dim = 2\n",
"net = SimpleNet(input_dim, hidden_dim, output_dim)\n",
"print(f\"\\nNetwork structure:\\n{net}\")\n",
"\n",
"# Inspect the network parameters\n",
"print(\"\\nNetwork parameters:\")\n",
"for name, param in net.named_parameters():\n",
"    if param.requires_grad:\n",
"        print(f\" {name}: shape={param.shape}, requires_grad={param.requires_grad}\")\n",
"\n",
"# Example forward pass\n",
"print(\"\\nPerforming a forward pass:\")\n",
"dummy_input = torch.randn(4, input_dim) # batch of 4\n",
"output = net(dummy_input)\n",
"print(f\"Output tensor:\\n{output}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [
"## 6. Loss Functions (`torch.nn`)\n",
"\n",
"A loss (or cost) function measures the gap between the model's output and the true target. The goal of training is to minimize the loss.\n",
"\n",
"* Common regression losses: `nn.MSELoss` (mean squared error), `nn.L1Loss` (mean absolute error).\n",
"* Common classification losses: `nn.CrossEntropyLoss` (multi-class; internally combines `LogSoftmax` and `NLLLoss`), `nn.BCELoss` (binary cross-entropy; expects probabilities, i.e. outputs passed through a Sigmoid)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"print(\"--- Loss Function Examples ---\")\n",
"\n",
"# --- MSELoss for Regression ---\n",
"loss_fn_mse = nn.MSELoss()\n",
"predictions_reg = torch.tensor([2.5, 4.8, 1.9])\n",
"targets_reg = torch.tensor([3.0, 5.0, 1.5])\n",
"mse_loss = loss_fn_mse(predictions_reg, targets_reg)\n",
"print(f\"MSE Loss: {mse_loss.item()}\") # .item() extracts the value of a scalar tensor\n",
"\n",
"# --- CrossEntropyLoss for Multi-class Classification ---\n",
"# Input: raw model scores (logits), shape [BatchSize, NumClasses]\n",
"# Target: class indices (integers), shape [BatchSize]\n",
"loss_fn_ce = nn.CrossEntropyLoss()\n",
"predictions_clf = torch.tensor([[2.0, -1.0, 0.5], # sample 1 scores for 3 classes\n",
"                                [0.1, 1.5, -0.5]]) # sample 2 scores\n",
"targets_clf = torch.tensor([0, 1]) # sample 1 true class is 0, sample 2 true class is 1\n",
"ce_loss = loss_fn_ce(predictions_clf, targets_clf)\n",
"print(f\"\\nCross Entropy Loss: {ce_loss.item()}\")" ] },
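{ "cell_type": "markdown", "metadata": {}, "source": [
"`nn.BCELoss` is listed above but not demonstrated. The short, optional example below shows it on a tiny binary-classification batch (probabilities after a Sigmoid), together with `nn.BCEWithLogitsLoss`, which works directly on raw logits and is usually the numerically safer choice." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"# --- BCELoss for Binary Classification (optional illustration) ---\n",
"logits_bin = torch.tensor([0.8, -1.2, 2.0])   # raw model outputs (logits)\n",
"targets_bin = torch.tensor([1.0, 0.0, 1.0])   # binary targets as floats\n",
"\n",
"probs = torch.sigmoid(logits_bin)             # BCELoss expects probabilities\n",
"bce_loss = nn.BCELoss()(probs, targets_bin)\n",
"print(f\"BCE Loss: {bce_loss.item():.4f}\")\n",
"\n",
"# The same loss computed directly from logits (more numerically stable)\n",
"bce_logits_loss = nn.BCEWithLogitsLoss()(logits_bin, targets_bin)\n",
"print(f\"BCEWithLogits Loss: {bce_logits_loss.item():.4f}\")" ] },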
{ "cell_type": "markdown", "metadata": {}, "source": [
"## 7. Optimizers (`torch.optim`)\n",
"\n",
"An optimizer implements the algorithm that updates the model parameters (weights and biases) so as to minimize the loss function.\n",
"\n",
"* Common optimizers: `optim.SGD` (stochastic gradient descent), `optim.Adam`, `optim.RMSprop`, `optim.Adagrad`.\n",
"* The model's parameters (`model.parameters()`) and a learning rate (`lr`) are passed to the optimizer's constructor." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"print(\"--- Optimizer Example (using SimpleNet) ---\")\n",
"\n",
"learning_rate = 0.01\n",
"\n",
"# SGD\n",
"optimizer_sgd = optim.SGD(net.parameters(), lr=learning_rate, momentum=0.9)\n",
"print(f\"SGD Optimizer: {optimizer_sgd}\")\n",
"\n",
"# Adam (a widely used adaptive learning-rate optimizer)\n",
"optimizer_adam = optim.Adam(net.parameters(), lr=learning_rate)\n",
"print(f\"Adam Optimizer: {optimizer_adam}\")\n",
"\n",
"# Key optimizer methods:\n",
"# - optimizer.zero_grad(): clears the gradients of all parameters (must be called before computing new gradients)\n",
"# - optimizer.step(): updates the parameters using the computed gradients" ] }, { "cell_type": "markdown", "metadata": {}, "source": [
"## 8. The Model Training Loop\n",
"\n",
"A typical PyTorch training loop consists of the following steps:\n",
"\n",
"1. **Forward pass**: feed the input data through the model to get the output.\n",
"2. **Compute loss**: use the loss function to measure the gap between the model output and the true target.\n",
"3. **Backward pass**: call `loss.backward()` to compute the gradients of the loss with respect to the model parameters.\n",
"4. **Update parameters**: call `optimizer.step()` to update the model parameters.\n",
"5. **Zero gradients**: call `optimizer.zero_grad()` to clear the gradients in preparation for the next iteration.\n",
"\n",
"This process is usually repeated in a loop over several epochs." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"print(\"--- Simple Training Loop Example (Linear Regression) ---\")\n",
"\n",
"# 1. Prepare the data (a simple linear relationship y = 2*x + 1 + noise)\n",
"X_numpy = np.linspace(0, 10, 100).reshape(-1, 1) # 100 samples, 1 feature\n",
"y_numpy = 2 * X_numpy + 1 + np.random.randn(100, 1) * 2 # add noise\n",
"\n",
"X_train = torch.from_numpy(X_numpy.astype(np.float32))\n",
"y_train = torch.from_numpy(y_numpy.astype(np.float32))\n",
"\n",
"# 2. Define the model, loss, and optimizer\n",
"input_size = 1\n",
"output_size = 1\n",
"model = nn.Linear(input_size, output_size)\n",
"criterion = nn.MSELoss()\n",
"optimizer = optim.SGD(model.parameters(), lr=0.01)\n",
"\n",
"print(f\"Initial model weights: {model.weight.item():.3f}, bias: {model.bias.item():.3f}\")\n",
"\n",
"# 3. Training loop\n",
"num_epochs = 100\n",
"losses = []\n",
"\n",
"for epoch in range(num_epochs):\n",
"    # Forward pass\n",
"    outputs = model(X_train)\n",
"    loss = criterion(outputs, y_train)\n",
"\n",
"    # Backward pass and optimization\n",
"    optimizer.zero_grad() # zero the gradients\n",
"    loss.backward() # compute gradients\n",
"    optimizer.step() # update weights\n",
"\n",
"    losses.append(loss.item())\n",
"\n",
"    if (epoch + 1) % 20 == 0:\n",
"        print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')\n",
"\n",
"print(f\"\\nFinal model weights: {model.weight.item():.3f}, bias: {model.bias.item():.3f}\") # should be close to 2 and 1\n",
"\n",
"# Plot the loss curve\n",
"plt.figure(figsize=(6, 4))\n",
"plt.plot(losses)\n",
"plt.xlabel(\"Epoch\")\n",
"plt.ylabel(\"MSE Loss\")\n",
"plt.title(\"Training Loss Over Epochs\")\n",
"plt.show()\n",
"\n",
"# Plot the fit\n",
"predicted = model(X_train).detach().numpy() # detach() stops gradient tracking, .numpy() converts back to NumPy\n",
"plt.figure(figsize=(8, 5))\n",
"plt.scatter(X_numpy, y_numpy, label='Original data', s=10, alpha=0.7)\n",
"plt.plot(X_numpy, predicted, color='red', label='Fitted line')\n",
"plt.xlabel(\"X\")\n",
"plt.ylabel(\"y\")\n",
"plt.title(\"Linear Regression Fit\")\n",
"plt.legend()\n",
"plt.show()" ] },
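{ "cell_type": "markdown", "metadata": {}, "source": [
"As a brief optional follow-up (not in the original walkthrough), this is how the fitted model is typically used for prediction afterwards: switch to evaluation mode and disable gradient tracking. The inputs `4.0` and `7.5` are arbitrary values chosen for illustration." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"# Optional: using the trained model for inference\n",
"model.eval() # evaluation mode (matters for layers such as Dropout / BatchNorm)\n",
"with torch.no_grad(): # no gradients are needed for prediction\n",
"    x_new = torch.tensor([[4.0], [7.5]])\n",
"    y_pred = model(x_new)\n",
"print(f\"Predictions for x = 4.0 and x = 7.5:\\n{y_pred}\")" ] },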
{ "cell_type": "markdown", "metadata": {}, "source": [
"## 9. Data Loading (`Dataset` & `DataLoader`)\n",
"\n",
"`torch.utils.data` provides two core classes for loading and processing data:\n",
"* **`Dataset`**: an abstract class representing a dataset. Subclass it and implement `__len__` (returns the dataset size) and `__getitem__` (returns one sample given an index).\n",
"* **`DataLoader`**: wraps a `Dataset` and provides iterable access to the data, with support for batching, shuffling, and parallel loading.\n",
"\n",
"For common image (TorchVision), text (TorchText), and audio (TorchAudio) datasets, ready-made `Dataset` classes are usually available." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"print(\"--- Dataset and DataLoader Example ---\")\n",
"\n",
"# 1. Create a simple custom Dataset\n",
"class SimpleDataset(data.Dataset):\n",
"    def __init__(self, features, labels):\n",
"        self.features = features\n",
"        self.labels = labels\n",
"        print(f\"SimpleDataset created with {len(features)} samples.\")\n",
"\n",
"    def __len__(self):\n",
"        return len(self.features)\n",
"\n",
"    def __getitem__(self, idx):\n",
"        # Return one sample (features and label) for the given index\n",
"        sample_feature = self.features[idx]\n",
"        sample_label = self.labels[idx]\n",
"        # print(f\" Dataset __getitem__ called for index {idx}\")\n",
"        return sample_feature, sample_label\n",
"\n",
"# Reuse the linear-regression data from earlier\n",
"dataset = SimpleDataset(X_train, y_train)\n",
"\n",
"# Fetch the first sample\n",
"first_sample_feature, first_sample_label = dataset[0]\n",
"print(f\"\\nFirst sample feature: {first_sample_feature}, label: {first_sample_label}\")\n",
"\n",
"# 2. Create a DataLoader\n",
"batch_size = 16\n",
"shuffle_data = True\n",
"num_workers_loader = 0 # Windows often requires 0; Linux/Mac can use > 0 for parallelism\n",
"\n",
"dataloader = data.DataLoader(dataset=dataset,\n",
"                             batch_size=batch_size,\n",
"                             shuffle=shuffle_data,\n",
"                             num_workers=num_workers_loader)\n",
"\n",
"print(f\"\\nDataLoader created with batch_size={batch_size}, shuffle={shuffle_data}\")\n",
"\n",
"# Iterate over the DataLoader to get batches of data\n",
"print(\"Iterating through the first batch from DataLoader:\")\n",
"num_batches_to_show = 1\n",
"for i, (batch_features, batch_labels) in enumerate(dataloader):\n",
"    print(f\"\\nBatch {i+1}:\")\n",
"    print(f\" Features shape: {batch_features.shape}\") # [batch_size, num_features]\n",
"    print(f\" Labels shape: {batch_labels.shape}\") # [batch_size, num_labels]\n",
"    # In a training loop, the forward/backward pass for this batch would go here\n",
"    if i+1 >= num_batches_to_show:\n",
"        break" ] },
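{ "cell_type": "markdown", "metadata": {}, "source": [
"To connect the `DataLoader` back to the training loop from section 8, here is an optional sketch of a few epochs of mini-batch training on the same linear-regression data. It reuses the `model`, `criterion`, and `optimizer` defined earlier; the epoch count is arbitrary." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"# Optional sketch: mini-batch training with the DataLoader\n",
"model.train() # back to training mode\n",
"num_epochs_mb = 5\n",
"for epoch in range(num_epochs_mb):\n",
"    epoch_loss = 0.0\n",
"    for batch_features, batch_labels in dataloader:\n",
"        outputs = model(batch_features) # forward pass on one mini-batch\n",
"        loss = criterion(outputs, batch_labels)\n",
"        optimizer.zero_grad() # clear old gradients\n",
"        loss.backward() # backpropagate\n",
"        optimizer.step() # update parameters\n",
"        epoch_loss += loss.item()\n",
"    print(f\"Epoch {epoch+1}/{num_epochs_mb}, mean batch loss: {epoch_loss / len(dataloader):.4f}\")" ] },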
{ "cell_type": "markdown", "metadata": {}, "source": [
"## 10. GPU Basics\n",
"\n",
"If your machine has a compatible NVIDIA GPU and a CUDA environment, tensors and models can be moved to the GPU for computation.\n",
"\n",
"1. **Pick the device**: `device = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")`\n",
"2. **Move the model to the device**: `model.to(device)`\n",
"3. **Move the data to the device**: inside the training loop, move each batch to the device: `inputs, labels = inputs.to(device), labels.to(device)`" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"print(\"--- GPU Usage Example ---\")\n",
"\n",
"# 1. Pick the device (already done at the start of the tutorial)\n",
"print(f\"Using device: {device}\")\n",
"\n",
"# 2. Create a model and move it to the device\n",
"model_gpu = nn.Linear(10, 5) # example model\n",
"model_gpu.to(device)\n",
"print(f\"Model moved to device: {next(model_gpu.parameters()).device}\")\n",
"\n",
"# 3. Create data and move it to the device\n",
"input_cpu = torch.randn(4, 10)\n",
"input_gpu = input_cpu.to(device)\n",
"print(f\"Input tensor on device: {input_gpu.device}\")\n",
"\n",
"# Run a forward pass on the GPU\n",
"# (make sure the model and the input are on the same device)\n",
"try:\n",
"    with torch.no_grad(): # gradients are usually not needed for inference\n",
"        output_gpu = model_gpu(input_gpu)\n",
"        print(f\"Output tensor on device: {output_gpu.device}\")\n",
"\n",
"    # Move the result back to the CPU if needed (e.g., for plotting or NumPy operations)\n",
"    output_cpu = output_gpu.cpu()\n",
"    print(f\"Output tensor moved back to CPU: {output_cpu.device}\")\n",
"except Exception as e:\n",
"    print(f\"An error occurred during GPU operation (maybe no GPU?): {e}\")\n",
"\n",
"# Moving data inside a training loop (pseudo-code)\n",
"print(\"\\nTraining loop data movement (pseudo-code):\")\n",
"print(\"for inputs, labels in dataloader:\")\n",
"print(\" inputs = inputs.to(device)\")\n",
"print(\" labels = labels.to(device)\")\n",
"print(\" # ... rest of the training step ...\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [
"## 11. Saving and Loading Models\n",
"\n",
"A trained model needs to be saved so it can be reused or trained further later.\n",
"\n",
"* **Recommended approach**: save only the model's **state dictionary (`state_dict`)**, which contains all of its learnable parameters (weights and biases).\n",
"* To load, first create an instance of the same model class, then load the state dict into it." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"import os\n",
"print(\"--- Saving and Loading Models ---\")\n",
"\n",
"# Assume the linear regression 'model' from earlier has already been trained\n",
"model_path = \"linear_regression_model.pth\"\n",
"\n",
"# 1. Save the model's state dict\n",
"torch.save(model.state_dict(), model_path)\n",
"print(f\"Model state_dict saved to {model_path}\")\n",
"\n",
"# 2. Load the model\n",
"# a) First, create a new instance with the same architecture\n",
"loaded_model = nn.Linear(input_size, output_size)\n",
"print(\"New model instance created.\")\n",
"print(f\"Weights before loading: {loaded_model.weight.item():.3f}\")\n",
"\n",
"# b) Load the state dict\n",
"loaded_model.load_state_dict(torch.load(model_path))\n",
"print(\"State_dict loaded into the new model instance.\")\n",
"print(f\"Weights after loading: {loaded_model.weight.item():.3f}\") # should match the saved weights\n",
"\n",
"# c) Switch to evaluation mode (important! affects layers such as Dropout and BatchNorm)\n",
"loaded_model.eval()\n",
"print(\"Model set to evaluation mode (.eval())\")\n",
"\n",
"# Use the loaded model for predictions\n",
"# with torch.no_grad():\n",
"#     predictions = loaded_model(some_new_data)\n",
"\n",
"# Clean up the saved file\n",
"if os.path.exists(model_path):\n",
"    os.remove(model_path)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [
"## Summary\n",
"\n",
"PyTorch is a powerful and flexible deep learning framework, particularly well suited to research and to scenarios that need fine-grained control.\n",
"\n",
"**Key takeaways:**\n",
"* The **tensor** is the core data structure, with GPU support and automatic differentiation.\n",
"* **`autograd`** computes gradients automatically and is the foundation of backpropagation.\n",
"* **`torch.nn`** provides the modules, layers, and loss functions for building neural networks.\n",
"* **`torch.optim`** provides the optimization algorithms that update model parameters.\n",
"* A training loop typically consists of: **forward pass -> compute loss -> backward pass -> update parameters -> zero gradients**.\n",
"* **`Dataset` and `DataLoader`** load and batch data efficiently.\n",
"* Use **`.to(device)`** to move models and data to the GPU when one is available.\n",
"* Saving and loading the model's **`state_dict`** is the recommended approach.\n",
"\n",
"PyTorch offers far more than this tutorial covers. To go deeper, explore more complex architectures (CNNs, RNNs), libraries such as TorchVision and TorchText, and more advanced training techniques. The official documentation and tutorials are excellent resources." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.12" }, "orig_nbformat": 4 }, "nbformat": 4, "nbformat_minor": 5 }