{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# PyTorch - 灵活的深度学习框架教程\n", "\n", "欢迎来到 PyTorch 教程!PyTorch 是一个由 Facebook AI Research (FAIR) 开发和维护的开源机器学习库,广泛用于计算机视觉和自然语言处理等应用。\n", "\n", "**为什么 PyTorch 在 ML/DL/数据科学领域如此受欢迎?**\n", "\n", "1. **Pythonic 和易用性**:PyTorch 的 API 设计简洁直观,与 Python 和 NumPy 的风格非常接近,学习曲线相对平缓。\n", "2. **动态计算图 (Define-by-Run)**:计算图在运行时动态构建,使得调试更加容易,并且非常适合处理动态结构的模型(如 RNN)。\n", "3. **强大的 GPU 加速**:可以轻松地将计算移到 NVIDIA GPU 上执行,显著加速训练过程。\n", "4. **丰富的生态系统**:拥有庞大的社区和丰富的扩展库(如 TorchVision, TorchText, TorchAudio)。\n", "5. **研究友好**:由于其灵活性,PyTorch 在学术研究界非常受欢迎。\n", "\n", "**本教程将涵盖 PyTorch 的核心概念和基本用法:**\n", "\n", "1. **张量 (Tensors)**:PyTorch 的基本数据结构。\n", "2. **张量操作**:类似于 NumPy 的操作。\n", "3. **自动微分 (`torch.autograd`)**:计算梯度。\n", "4. **神经网络模块 (`torch.nn`)**:构建网络的基础。\n", "5. **构建一个简单的神经网络**。\n", "6. **损失函数 (`torch.nn`)**。\n", "7. **优化器 (`torch.optim`)**。\n", "8. **模型训练流程**。\n", "9. **数据加载 (`Dataset` & `DataLoader`)**。\n", "10. **GPU 使用基础**。\n", "11. **模型保存与加载**。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 准备工作:导入 PyTorch\n", "\n", "确保你已经安装了 PyTorch (可以访问 [pytorch.org](https://pytorch.org/) 获取适合你系统的安装命令)。" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import torch\n", "import torch.nn as nn\n", "import torch.optim as optim\n", "import torch.utils.data as data\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "\n", "print(f\"PyTorch version: {torch.__version__}\")\n", "\n", "# 检查是否有可用的 GPU\n", "device = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n", "print(f\"Using device: {device}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. 张量 (Tensors)\n", "\n", "张量是 PyTorch 中的核心数据结构,类似于 NumPy 的 `ndarray`,但增加了在 GPU 上计算和自动微分的功能。\n", "\n", "可以从 Python 列表、NumPy 数组等创建张量,也可以使用 PyTorch 提供的函数创建。" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(\"--- Creating Tensors ---\")\n", "# 从列表创建\n", "data_list = [[1, 2], [3, 4]]\n", "tensor_from_list = torch.tensor(data_list)\n", "print(f\"Tensor from list:\\n{tensor_from_list}\")\n", "print(f\"Tensor dtype: {tensor_from_list.dtype}\") # 默认通常是 torch.int64\n", "\n", "# 从 NumPy 数组创建 (共享内存,除非显式复制)\n", "np_array = np.array([[5, 6], [7, 8]], dtype=np.float32)\n", "tensor_from_np = torch.from_numpy(np_array)\n", "print(f\"\\nTensor from NumPy array:\\n{tensor_from_np}\")\n", "print(f\"Tensor dtype: {tensor_from_np.dtype}\")\n", "\n", "# 修改 NumPy 数组会影响 Tensor (反之亦然)\n", "# np_array[0, 0] = 99\n", "# print(f\"Tensor after modifying NumPy array:\\n{tensor_from_np}\")\n", "\n", "# 创建特定形状和类型的张量\n", "tensor_zeros = torch.zeros(2, 3) # 2x3 全零张量 (默认 float32)\n", "print(f\"\\nZeros tensor (2x3):\\n{tensor_zeros}\")\n", "\n", "tensor_ones = torch.ones(3, 2, dtype=torch.int)\n", "print(f\"\\nOnes tensor (3x2, int):\\n{tensor_ones}\")\n", "\n", "tensor_rand = torch.rand(2, 2) # [0, 1) 均匀分布随机数\n", "print(f\"\\nRandom tensor (2x2, uniform [0,1)):\\n{tensor_rand}\")\n", "\n", "tensor_randn = torch.randn(3, 1) # 标准正态分布随机数\n", "print(f\"\\nRandom tensor (3x1, standard normal):\\n{tensor_randn}\")\n", "\n", "# 获取张量属性\n", "print(f\"\\nProperties of tensor_from_list:\")\n", "print(f\" Shape: {tensor_from_list.shape}\")\n", "print(f\" Size: {tensor_from_list.size()}\") # size() 和 shape 类似\n", "print(f\" Data type: {tensor_from_list.dtype}\")\n", "print(f\" Device: {tensor_from_list.device}\") # 存储设备 (默认是 'cpu')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. 
{ "cell_type": "markdown", "metadata": {}, "source": [
"## 2. Tensor Operations\n",
"\n",
"PyTorch tensors support a rich set of operations, with syntax very close to NumPy." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"t1 = torch.tensor([[1., 2.], [3., 4.]])\n",
"t2 = torch.tensor([[5., 6.], [7., 8.]])\n",
"scalar = 10\n",
"\n",
"print(f\"Tensor t1:\\n{t1}\")\n",
"print(f\"Tensor t2:\\n{t2}\")\n",
"\n",
"# Element-wise arithmetic\n",
"print(\"\\n--- Arithmetic Operations ---\")\n",
"print(f\"t1 + t2:\\n{t1 + t2}\")\n",
"print(f\"torch.add(t1, t2):\\n{torch.add(t1, t2)}\")\n",
"print(f\"t1 * t2:\\n{t1 * t2}\") # element-wise multiplication\n",
"print(f\"t1 + scalar:\\n{t1 + scalar}\")\n",
"\n",
"# In-place operations (method names usually end with an underscore '_')\n",
"t1_copy = t1.clone() # make a copy\n",
"t1_copy.add_(t2) # t1_copy = t1_copy + t2\n",
"print(f\"\\nt1_copy after add_:\\n{t1_copy}\")\n",
"\n",
"# Matrix multiplication\n",
"print(\"\\n--- Matrix Multiplication ---\")\n",
"# Use the @ operator or torch.matmul()\n",
"mat_mul = t1 @ t2\n",
"print(f\"t1 @ t2:\\n{mat_mul}\")\n",
"mat_mul_func = torch.matmul(t1, t2)\n",
"print(f\"torch.matmul(t1, t2):\\n{mat_mul_func}\")\n",
"\n",
"# Indexing and slicing (NumPy-like)\n",
"print(\"\\n--- Indexing and Slicing ---\")\n",
"print(f\"t1[0, 1]: {t1[0, 1]}\") # first row, second column\n",
"print(f\"t1[:, 0]: {t1[:, 0]}\") # first column\n",
"print(f\"t1[t1 > 2]: {t1[t1 > 2]}\") # boolean indexing\n",
"\n",
"# Shape manipulation\n",
"print(\"\\n--- Reshaping ---\")\n",
"t_flat = torch.arange(6) # [0, 1, 2, 3, 4, 5]\n",
"t_reshaped = t_flat.view(2, 3) # returns a view that shares the underlying data (requires contiguous memory)\n",
"print(f\"Original flat tensor: {t_flat}\")\n",
"print(f\"View as (2, 3):\\n{t_reshaped}\")\n",
"t_reshape = t_flat.reshape(3, 2) # more general; may return a copy when a view is not possible\n",
"print(f\"Reshape as (3, 2):\\n{t_reshape}\")\n",
"\n",
"# NumPy <-> PyTorch conversion\n",
"print(\"\\n--- NumPy Bridge ---\")\n",
"np_arr = np.ones((2,2))\n",
"torch_tensor = torch.from_numpy(np_arr)\n",
"print(f\"Tensor from NumPy:\\n{torch_tensor}\")\n",
"np_back = torch_tensor.numpy() # back to a NumPy array (shares memory)\n",
"print(f\"NumPy from Tensor:\\n{np_back}\")" ] },
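{ "cell_type": "markdown", "metadata": {}, "source": [
"Two more NumPy-like behaviours come up constantly and are worth a quick, optional look (not part of the original walkthrough): broadcasting between tensors of different shapes, and joining tensors with `torch.cat` / `torch.stack`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"# Optional illustration: broadcasting and concatenation\n",
"a = torch.ones(2, 3)            # shape [2, 3]\n",
"b = torch.tensor([1., 2., 3.])  # shape [3], broadcast to [2, 3]\n",
"print(f\"a + b (broadcasting):\\n{a + b}\")\n",
"\n",
"print(f\"cat along dim 0: {torch.cat([a, a], dim=0).shape}\")     # [4, 3]\n",
"print(f\"stack on a new dim: {torch.stack([a, a], dim=0).shape}\") # [2, 2, 3]" ] },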
{ "cell_type": "markdown", "metadata": {}, "source": [
"## 3. Automatic Differentiation (`torch.autograd`)\n",
"\n",
"`autograd` is one of PyTorch's core features: it provides automatic differentiation for all operations on tensors, which is essential for backpropagation in neural networks.\n",
"\n",
"* A tensor created with `requires_grad=True` tracks every operation performed on it.\n",
"* Once the computation is done, calling `.backward()` makes PyTorch compute all the gradients automatically.\n",
"* Gradients are accumulated into each tensor's `.grad` attribute." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"print(\"--- Autograd Example ---\")\n",
"# Create tensors that require gradients\n",
"x = torch.tensor(2.0, requires_grad=True)\n",
"w = torch.tensor(3.0, requires_grad=True)\n",
"b = torch.tensor(1.0, requires_grad=True)\n",
"\n",
"# Define a simple computation graph (e.g., y = w * x + b)\n",
"y = w * x + b\n",
"print(f\"y = w*x + b = {y}\")\n",
"\n",
"# Define a loss (e.g., L = y^2)\n",
"L = y.pow(2)\n",
"print(f\"L = y^2 = {L}\")\n",
"\n",
"# Run backpropagation to compute the gradients\n",
"print(\"\\nCalling L.backward()...\")\n",
"L.backward()\n",
"\n",
"# Inspect the gradients (dL/dx, dL/dw, dL/db)\n",
"# dL/dy = 2*y = 2*(w*x+b) = 2*(3*2+1) = 14\n",
"# dL/dw = dL/dy * dy/dw = (2*y) * x = 14 * 2 = 28\n",
"# dL/dx = dL/dy * dy/dx = (2*y) * w = 14 * 3 = 42\n",
"# dL/db = dL/dy * dy/db = (2*y) * 1 = 14 * 1 = 14\n",
"print(f\"Gradient dL/dx (x.grad): {x.grad}\")\n",
"print(f\"Gradient dL/dw (w.grad): {w.grad}\")\n",
"print(f\"Gradient dL/db (b.grad): {b.grad}\")\n",
"\n",
"# --- Disabling gradient tracking ---\n",
"# Use the torch.no_grad() context manager\n",
"print(\"\\nInside torch.no_grad():\")\n",
"with torch.no_grad():\n",
"    y_no_grad = w * x + b\n",
"    print(f\" y_no_grad = {y_no_grad}\")\n",
"    print(f\" y_no_grad.requires_grad: {y_no_grad.requires_grad}\") # False\n",
"\n",
"# Or use .detach() to get a new tensor without gradient history\n",
"x_detached = x.detach()\n",
"print(f\"\\nx_detached requires_grad: {x_detached.requires_grad}\") # False\n",
"\n",
"# Gradients accumulate, so they must be zeroed inside an optimization loop\n",
"print(\"\\nGradient accumulation:\")\n",
"y2 = w * x + b\n",
"L2 = y2.pow(2)\n",
"L2.backward() # compute the gradients a second time\n",
"print(f\"w.grad after second backward(): {w.grad}\") # the gradient is now 28 + 28 = 56\n",
"\n",
"# Zero the gradients (at the start of each optimization step)\n",
"w.grad.zero_()\n",
"x.grad.zero_()\n",
"b.grad.zero_()\n",
"print(f\"w.grad after zero_(): {w.grad}\")" ] },
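{ "cell_type": "markdown", "metadata": {}, "source": [
"To build some intuition for the numbers `backward()` just produced, here is an optional finite-difference sanity check of dL/dw (the exact value is 28). The step size `eps` is an arbitrary choice made for this sketch." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"# Optional sanity check: approximate dL/dw with a central finite difference\n",
"eps = 1e-4\n",
"\n",
"def loss_at(w_value):\n",
"    # evaluate L = (w*x + b)^2 at the given w, without tracking gradients\n",
"    with torch.no_grad():\n",
"        return ((w_value * x + b) ** 2).item()\n",
"\n",
"numeric_grad = (loss_at(w + eps) - loss_at(w - eps)) / (2 * eps)\n",
"print(f\"Numeric dL/dw ~ {numeric_grad:.4f} (autograd gave 28)\")" ] },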
{ "cell_type": "markdown", "metadata": {}, "source": [
"## 4. Neural Network Modules (`torch.nn`)\n",
"\n",
"The `torch.nn` module is the core toolkit for building neural networks. It provides layers, loss functions, and containers.\n",
"\n",
"* **`nn.Module`**: the base class for all neural network modules. Custom layers and models subclass it and implement the `forward()` method.\n",
"* **Common layers**: `nn.Linear`, `nn.Conv2d`, `nn.RNN`, `nn.LSTM`, `nn.Embedding`, `nn.Dropout`, `nn.BatchNorm2d`, etc.\n",
"* **Activation functions**: `nn.ReLU`, `nn.Sigmoid`, `nn.Tanh`, `nn.Softmax`, etc. (most also exist in functional form in `nn.functional`).\n",
"* **Loss functions**: `nn.MSELoss`, `nn.CrossEntropyLoss`, `nn.BCELoss`, etc.\n",
"* **Containers**: `nn.Sequential` (chains modules in order), `nn.ModuleList`, `nn.ModuleDict`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"print(\"--- torch.nn Examples ---\")\n",
"\n",
"# Fully connected (linear) layer\n",
"input_features = 4\n",
"output_features = 2\n",
"linear_layer = nn.Linear(input_features, output_features)\n",
"print(f\"Linear layer: {linear_layer}\")\n",
"print(f\"Linear layer weight shape: {linear_layer.weight.shape}\") # [output_features, input_features]\n",
"print(f\"Linear layer bias shape: {linear_layer.bias.shape}\") # [output_features]\n",
"\n",
"# Sample input (needs a batch dimension, e.g., [batch_size, num_features])\n",
"sample_input = torch.randn(3, input_features) # batch of 3 samples\n",
"output = linear_layer(sample_input)\n",
"print(f\"\\nSample input shape: {sample_input.shape}\")\n",
"print(f\"Output shape from linear layer: {output.shape}\") # [batch_size, output_features]\n",
"\n",
"# ReLU activation\n",
"relu = nn.ReLU()\n",
"output_relu = relu(output)\n",
"print(f\"\\nOutput after ReLU:\\n{output_relu}\")\n",
"\n",
"# Build a small network with the Sequential container\n",
"model_sequential = nn.Sequential(\n",
"    nn.Linear(10, 20), # 10 inputs, 20 outputs\n",
"    nn.ReLU(),\n",
"    nn.Linear(20, 5) # 20 inputs, 5 outputs\n",
")\n",
"print(f\"\\nSequential model:\\n{model_sequential}\")\n",
"\n",
"# Sample input\n",
"seq_input = torch.randn(4, 10) # batch of 4, 10 features\n",
"seq_output = model_sequential(seq_input)\n",
"print(f\"Output shape from sequential model: {seq_output.shape}\") # [4, 5]" ] },
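{ "cell_type": "markdown", "metadata": {}, "source": [
"The list above notes that most activations also exist as plain functions in `nn.functional`. This optional snippet simply confirms that the module form and the functional form compute the same thing." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"import torch.nn.functional as F\n",
"\n",
"z = torch.randn(2, 3)\n",
"# Module form vs. functional form give identical results\n",
"print(torch.allclose(nn.ReLU()(z), F.relu(z)))\n",
"print(torch.allclose(nn.Softmax(dim=1)(z), F.softmax(z, dim=1)))" ] },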
{ "cell_type": "markdown", "metadata": {}, "source": [
"## 5. Building a Simple Neural Network\n",
"\n",
"Custom networks are usually built by subclassing `nn.Module` and defining the `__init__` and `forward` methods." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"print(\"--- Building a Simple Network (Custom Class) ---\")\n",
"\n",
"class SimpleNet(nn.Module):\n",
"    def __init__(self, input_size, hidden_size, output_size):\n",
"        super(SimpleNet, self).__init__() # call the parent class __init__\n",
"        self.layer1 = nn.Linear(input_size, hidden_size)\n",
"        self.relu = nn.ReLU()\n",
"        self.layer2 = nn.Linear(hidden_size, output_size)\n",
"        print(f\"SimpleNet initialized with input={input_size}, hidden={hidden_size}, output={output_size}\")\n",
"\n",
"    def forward(self, x):\n",
"        # Define how data flows through the layers\n",
"        print(\" SimpleNet forward pass starting...\")\n",
"        out = self.layer1(x)\n",
"        print(f\" Shape after layer1: {out.shape}\")\n",
"        out = self.relu(out)\n",
"        print(f\" Shape after relu: {out.shape}\")\n",
"        out = self.layer2(out)\n",
"        print(f\" Shape after layer2 (final output): {out.shape}\")\n",
"        return out\n",
"\n",
"# Instantiate the network\n",
"input_dim = 5\n",
"hidden_dim = 8\n",
"output_dim = 2\n",
"net = SimpleNet(input_dim, hidden_dim, output_dim)\n",
"print(f\"\\nNetwork structure:\\n{net}\")\n",
"\n",
"# Inspect the network parameters\n",
"print(\"\\nNetwork parameters:\")\n",
"for name, param in net.named_parameters():\n",
"    if param.requires_grad:\n",
"        print(f\" {name}: shape={param.shape}, requires_grad={param.requires_grad}\")\n",
"\n",
"# Example forward pass\n",
"print(\"\\nPerforming a forward pass:\")\n",
"dummy_input = torch.randn(4, input_dim) # batch of 4\n",
"output = net(dummy_input)\n",
"print(f\"Output tensor:\\n{output}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [
"## 6. Loss Functions (`torch.nn`)\n",
"\n",
"A loss (or cost) function measures the gap between the model's output and the true target. The goal of training is to minimize the loss.\n",
"\n",
"* Common regression losses: `nn.MSELoss` (mean squared error), `nn.L1Loss` (mean absolute error).\n",
"* Common classification losses: `nn.CrossEntropyLoss` (multi-class; internally combines `LogSoftmax` and `NLLLoss`), `nn.BCELoss` (binary cross-entropy; expects probabilities, i.e. outputs passed through a Sigmoid)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"print(\"--- Loss Function Examples ---\")\n",
"\n",
"# --- MSELoss for Regression ---\n",
"loss_fn_mse = nn.MSELoss()\n",
"predictions_reg = torch.tensor([2.5, 4.8, 1.9])\n",
"targets_reg = torch.tensor([3.0, 5.0, 1.5])\n",
"mse_loss = loss_fn_mse(predictions_reg, targets_reg)\n",
"print(f\"MSE Loss: {mse_loss.item()}\") # .item() extracts the value of a scalar tensor\n",
"\n",
"# --- CrossEntropyLoss for Multi-class Classification ---\n",
"# Input: raw model scores (logits), shape [BatchSize, NumClasses]\n",
"# Target: class indices (integers), shape [BatchSize]\n",
"loss_fn_ce = nn.CrossEntropyLoss()\n",
"predictions_clf = torch.tensor([[2.0, -1.0, 0.5], # sample 1 scores for 3 classes\n",
"                                [0.1, 1.5, -0.5]]) # sample 2 scores\n",
"targets_clf = torch.tensor([0, 1]) # sample 1 true class is 0, sample 2 true class is 1\n",
"ce_loss = loss_fn_ce(predictions_clf, targets_clf)\n",
"print(f\"\\nCross Entropy Loss: {ce_loss.item()}\")" ] },
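{ "cell_type": "markdown", "metadata": {}, "source": [
"`nn.BCELoss` is listed above but not demonstrated. The short, optional example below shows it on a tiny binary-classification batch (probabilities after a Sigmoid), together with `nn.BCEWithLogitsLoss`, which works directly on raw logits and is usually the numerically safer choice." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"# --- BCELoss for Binary Classification (optional illustration) ---\n",
"logits_bin = torch.tensor([0.8, -1.2, 2.0])   # raw model outputs (logits)\n",
"targets_bin = torch.tensor([1.0, 0.0, 1.0])   # binary targets as floats\n",
"\n",
"probs = torch.sigmoid(logits_bin)             # BCELoss expects probabilities\n",
"bce_loss = nn.BCELoss()(probs, targets_bin)\n",
"print(f\"BCE Loss: {bce_loss.item():.4f}\")\n",
"\n",
"# The same loss computed directly from logits (more numerically stable)\n",
"bce_logits_loss = nn.BCEWithLogitsLoss()(logits_bin, targets_bin)\n",
"print(f\"BCEWithLogits Loss: {bce_logits_loss.item():.4f}\")" ] },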
{ "cell_type": "markdown", "metadata": {}, "source": [
"## 7. Optimizers (`torch.optim`)\n",
"\n",
"An optimizer implements the algorithm that updates the model parameters (weights and biases) so as to minimize the loss function.\n",
"\n",
"* Common optimizers: `optim.SGD` (stochastic gradient descent), `optim.Adam`, `optim.RMSprop`, `optim.Adagrad`.\n",
"* The model's parameters (`model.parameters()`) and a learning rate (`lr`) are passed to the optimizer's constructor." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"print(\"--- Optimizer Example (using SimpleNet) ---\")\n",
"\n",
"learning_rate = 0.01\n",
"\n",
"# SGD\n",
"optimizer_sgd = optim.SGD(net.parameters(), lr=learning_rate, momentum=0.9)\n",
"print(f\"SGD Optimizer: {optimizer_sgd}\")\n",
"\n",
"# Adam (a widely used adaptive learning-rate optimizer)\n",
"optimizer_adam = optim.Adam(net.parameters(), lr=learning_rate)\n",
"print(f\"Adam Optimizer: {optimizer_adam}\")\n",
"\n",
"# Key optimizer methods:\n",
"# - optimizer.zero_grad(): clears the gradients of all parameters (must be called before computing new gradients)\n",
"# - optimizer.step(): updates the parameters using the computed gradients" ] }, { "cell_type": "markdown", "metadata": {}, "source": [
"## 8. The Model Training Loop\n",
"\n",
"A typical PyTorch training loop consists of the following steps:\n",
"\n",
"1. **Forward pass**: feed the input data through the model to get the output.\n",
"2. **Compute loss**: use the loss function to measure the gap between the model output and the true target.\n",
"3. **Backward pass**: call `loss.backward()` to compute the gradients of the loss with respect to the model parameters.\n",
"4. **Update parameters**: call `optimizer.step()` to update the model parameters.\n",
"5. **Zero gradients**: call `optimizer.zero_grad()` to clear the gradients in preparation for the next iteration.\n",
"\n",
"This process is usually repeated in a loop over several epochs." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"print(\"--- Simple Training Loop Example (Linear Regression) ---\")\n",
"\n",
"# 1. Prepare the data (a simple linear relationship y = 2*x + 1 + noise)\n",
"X_numpy = np.linspace(0, 10, 100).reshape(-1, 1) # 100 samples, 1 feature\n",
"y_numpy = 2 * X_numpy + 1 + np.random.randn(100, 1) * 2 # add noise\n",
"\n",
"X_train = torch.from_numpy(X_numpy.astype(np.float32))\n",
"y_train = torch.from_numpy(y_numpy.astype(np.float32))\n",
"\n",
"# 2. Define the model, loss, and optimizer\n",
"input_size = 1\n",
"output_size = 1\n",
"model = nn.Linear(input_size, output_size)\n",
"criterion = nn.MSELoss()\n",
"optimizer = optim.SGD(model.parameters(), lr=0.01)\n",
"\n",
"print(f\"Initial model weights: {model.weight.item():.3f}, bias: {model.bias.item():.3f}\")\n",
"\n",
"# 3. Training loop\n",
"num_epochs = 100\n",
"losses = []\n",
"\n",
"for epoch in range(num_epochs):\n",
"    # Forward pass\n",
"    outputs = model(X_train)\n",
"    loss = criterion(outputs, y_train)\n",
"\n",
"    # Backward pass and optimization\n",
"    optimizer.zero_grad() # zero the gradients\n",
"    loss.backward() # compute gradients\n",
"    optimizer.step() # update weights\n",
"\n",
"    losses.append(loss.item())\n",
"\n",
"    if (epoch + 1) % 20 == 0:\n",
"        print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')\n",
"\n",
"print(f\"\\nFinal model weights: {model.weight.item():.3f}, bias: {model.bias.item():.3f}\") # should be close to 2 and 1\n",
"\n",
"# Plot the loss curve\n",
"plt.figure(figsize=(6, 4))\n",
"plt.plot(losses)\n",
"plt.xlabel(\"Epoch\")\n",
"plt.ylabel(\"MSE Loss\")\n",
"plt.title(\"Training Loss Over Epochs\")\n",
"plt.show()\n",
"\n",
"# Plot the fit\n",
"predicted = model(X_train).detach().numpy() # detach() stops gradient tracking, .numpy() converts back to NumPy\n",
"plt.figure(figsize=(8, 5))\n",
"plt.scatter(X_numpy, y_numpy, label='Original data', s=10, alpha=0.7)\n",
"plt.plot(X_numpy, predicted, color='red', label='Fitted line')\n",
"plt.xlabel(\"X\")\n",
"plt.ylabel(\"y\")\n",
"plt.title(\"Linear Regression Fit\")\n",
"plt.legend()\n",
"plt.show()" ] },
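{ "cell_type": "markdown", "metadata": {}, "source": [
"As a brief optional follow-up (not in the original walkthrough), this is how the fitted model is typically used for prediction afterwards: switch to evaluation mode and disable gradient tracking. The inputs `4.0` and `7.5` are arbitrary values chosen for illustration." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"# Optional: using the trained model for inference\n",
"model.eval() # evaluation mode (matters for layers such as Dropout / BatchNorm)\n",
"with torch.no_grad(): # no gradients are needed for prediction\n",
"    x_new = torch.tensor([[4.0], [7.5]])\n",
"    y_pred = model(x_new)\n",
"print(f\"Predictions for x = 4.0 and x = 7.5:\\n{y_pred}\")" ] },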
{ "cell_type": "markdown", "metadata": {}, "source": [
"## 9. Data Loading (`Dataset` & `DataLoader`)\n",
"\n",
"`torch.utils.data` provides two core classes for loading and processing data:\n",
"* **`Dataset`**: an abstract class representing a dataset. Subclass it and implement `__len__` (returns the dataset size) and `__getitem__` (returns one sample given an index).\n",
"* **`DataLoader`**: wraps a `Dataset` and provides iterable access to the data, with support for batching, shuffling, and parallel loading.\n",
"\n",
"For common image (TorchVision), text (TorchText), and audio (TorchAudio) datasets, ready-made `Dataset` classes are usually available." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"print(\"--- Dataset and DataLoader Example ---\")\n",
"\n",
"# 1. Create a simple custom Dataset\n",
"class SimpleDataset(data.Dataset):\n",
"    def __init__(self, features, labels):\n",
"        self.features = features\n",
"        self.labels = labels\n",
"        print(f\"SimpleDataset created with {len(features)} samples.\")\n",
"\n",
"    def __len__(self):\n",
"        return len(self.features)\n",
"\n",
"    def __getitem__(self, idx):\n",
"        # Return one sample (features and label) for the given index\n",
"        sample_feature = self.features[idx]\n",
"        sample_label = self.labels[idx]\n",
"        # print(f\" Dataset __getitem__ called for index {idx}\")\n",
"        return sample_feature, sample_label\n",
"\n",
"# Reuse the linear-regression data from earlier\n",
"dataset = SimpleDataset(X_train, y_train)\n",
"\n",
"# Fetch the first sample\n",
"first_sample_feature, first_sample_label = dataset[0]\n",
"print(f\"\\nFirst sample feature: {first_sample_feature}, label: {first_sample_label}\")\n",
"\n",
"# 2. Create a DataLoader\n",
"batch_size = 16\n",
"shuffle_data = True\n",
"num_workers_loader = 0 # Windows often requires 0; Linux/Mac can use > 0 for parallelism\n",
"\n",
"dataloader = data.DataLoader(dataset=dataset,\n",
"                             batch_size=batch_size,\n",
"                             shuffle=shuffle_data,\n",
"                             num_workers=num_workers_loader)\n",
"\n",
"print(f\"\\nDataLoader created with batch_size={batch_size}, shuffle={shuffle_data}\")\n",
"\n",
"# Iterate over the DataLoader to get batches of data\n",
"print(\"Iterating through the first batch from DataLoader:\")\n",
"num_batches_to_show = 1\n",
"for i, (batch_features, batch_labels) in enumerate(dataloader):\n",
"    print(f\"\\nBatch {i+1}:\")\n",
"    print(f\" Features shape: {batch_features.shape}\") # [batch_size, num_features]\n",
"    print(f\" Labels shape: {batch_labels.shape}\") # [batch_size, num_labels]\n",
"    # In a training loop, the forward/backward pass for this batch would go here\n",
"    if i+1 >= num_batches_to_show:\n",
"        break" ] },
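{ "cell_type": "markdown", "metadata": {}, "source": [
"To connect the `DataLoader` back to the training loop from section 8, here is an optional sketch of a few epochs of mini-batch training on the same linear-regression data. It reuses the `model`, `criterion`, and `optimizer` defined earlier; the epoch count is arbitrary." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"# Optional sketch: mini-batch training with the DataLoader\n",
"model.train() # back to training mode\n",
"num_epochs_mb = 5\n",
"for epoch in range(num_epochs_mb):\n",
"    epoch_loss = 0.0\n",
"    for batch_features, batch_labels in dataloader:\n",
"        outputs = model(batch_features) # forward pass on one mini-batch\n",
"        loss = criterion(outputs, batch_labels)\n",
"        optimizer.zero_grad() # clear old gradients\n",
"        loss.backward() # backpropagate\n",
"        optimizer.step() # update parameters\n",
"        epoch_loss += loss.item()\n",
"    print(f\"Epoch {epoch+1}/{num_epochs_mb}, mean batch loss: {epoch_loss / len(dataloader):.4f}\")" ] },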
{ "cell_type": "markdown", "metadata": {}, "source": [
"## 10. GPU Basics\n",
"\n",
"If your machine has a compatible NVIDIA GPU and a CUDA environment, tensors and models can be moved to the GPU for computation.\n",
"\n",
"1. **Pick the device**: `device = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")`\n",
"2. **Move the model to the device**: `model.to(device)`\n",
"3. **Move the data to the device**: inside the training loop, move each batch to the device: `inputs, labels = inputs.to(device), labels.to(device)`" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"print(\"--- GPU Usage Example ---\")\n",
"\n",
"# 1. Pick the device (already done at the start of the tutorial)\n",
"print(f\"Using device: {device}\")\n",
"\n",
"# 2. Create a model and move it to the device\n",
"model_gpu = nn.Linear(10, 5) # example model\n",
"model_gpu.to(device)\n",
"print(f\"Model moved to device: {next(model_gpu.parameters()).device}\")\n",
"\n",
"# 3. Create data and move it to the device\n",
"input_cpu = torch.randn(4, 10)\n",
"input_gpu = input_cpu.to(device)\n",
"print(f\"Input tensor on device: {input_gpu.device}\")\n",
"\n",
"# Run a forward pass on the GPU\n",
"# (make sure the model and the input are on the same device)\n",
"try:\n",
"    with torch.no_grad(): # gradients are usually not needed for inference\n",
"        output_gpu = model_gpu(input_gpu)\n",
"        print(f\"Output tensor on device: {output_gpu.device}\")\n",
"\n",
"    # Move the result back to the CPU if needed (e.g., for plotting or NumPy operations)\n",
"    output_cpu = output_gpu.cpu()\n",
"    print(f\"Output tensor moved back to CPU: {output_cpu.device}\")\n",
"except Exception as e:\n",
"    print(f\"An error occurred during GPU operation (maybe no GPU?): {e}\")\n",
"\n",
"# Moving data inside a training loop (pseudo-code)\n",
"print(\"\\nTraining loop data movement (pseudo-code):\")\n",
"print(\"for inputs, labels in dataloader:\")\n",
"print(\" inputs = inputs.to(device)\")\n",
"print(\" labels = labels.to(device)\")\n",
"print(\" # ... rest of the training step ...\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [
"## 11. Saving and Loading Models\n",
"\n",
"A trained model needs to be saved so it can be reused or trained further later.\n",
"\n",
"* **Recommended approach**: save only the model's **state dictionary (`state_dict`)**, which contains all of its learnable parameters (weights and biases).\n",
"* To load, first create an instance of the same model class, then load the state dict into it." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"import os\n",
"print(\"--- Saving and Loading Models ---\")\n",
"\n",
"# Assume the linear regression 'model' from earlier has already been trained\n",
"model_path = \"linear_regression_model.pth\"\n",
"\n",
"# 1. Save the model's state dict\n",
"torch.save(model.state_dict(), model_path)\n",
"print(f\"Model state_dict saved to {model_path}\")\n",
"\n",
"# 2. Load the model\n",
"# a) First, create a new instance with the same architecture\n",
"loaded_model = nn.Linear(input_size, output_size)\n",
"print(\"New model instance created.\")\n",
"print(f\"Weights before loading: {loaded_model.weight.item():.3f}\")\n",
"\n",
"# b) Load the state dict\n",
"loaded_model.load_state_dict(torch.load(model_path))\n",
"print(\"State_dict loaded into the new model instance.\")\n",
"print(f\"Weights after loading: {loaded_model.weight.item():.3f}\") # should match the saved weights\n",
"\n",
"# c) Switch to evaluation mode (important! affects layers such as Dropout and BatchNorm)\n",
"loaded_model.eval()\n",
"print(\"Model set to evaluation mode (.eval())\")\n",
"\n",
"# Use the loaded model for predictions\n",
"# with torch.no_grad():\n",
"#     predictions = loaded_model(some_new_data)\n",
"\n",
"# Clean up the saved file\n",
"if os.path.exists(model_path):\n",
"    os.remove(model_path)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [
"## Summary\n",
"\n",
"PyTorch is a powerful and flexible deep learning framework, particularly well suited to research and to scenarios that need fine-grained control.\n",
"\n",
"**Key takeaways:**\n",
"* The **tensor** is the core data structure, with GPU support and automatic differentiation.\n",
"* **`autograd`** computes gradients automatically and is the foundation of backpropagation.\n",
"* **`torch.nn`** provides the modules, layers, and loss functions for building neural networks.\n",
"* **`torch.optim`** provides the optimization algorithms that update model parameters.\n",
"* A training loop typically consists of: **forward pass -> compute loss -> backward pass -> update parameters -> zero gradients**.\n",
"* **`Dataset` and `DataLoader`** load and batch data efficiently.\n",
"* Use **`.to(device)`** to move models and data to the GPU when one is available.\n",
"* Saving and loading the model's **`state_dict`** is the recommended approach.\n",
"\n",
"PyTorch offers far more than this tutorial covers. To go deeper, explore more complex architectures (CNNs, RNNs), libraries such as TorchVision and TorchText, and more advanced training techniques. The official documentation and tutorials are excellent resources." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.12" }, "orig_nbformat": 4 }, "nbformat": 4, "nbformat_minor": 5 }