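{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Python Generators and the Iterator Protocol: An In-Depth Tutorial\n", "\n", "Welcome to this in-depth tutorial on Python generators and the iterator protocol. Iteration is a core concept in Python, and understanding the iterator protocol and the generator machinery built on it helps you write code that is more efficient, more memory-friendly, and more expressive.\n", "\n", "**Why study iteration and generators in depth?**\n", "\n", "1. **Memory efficiency**: Generators produce values on demand instead of building the whole sequence in memory at once, which is essential for large datasets and infinite sequences.\n", "2. **Lazy evaluation**: Values are computed only when they are needed, which saves computation.\n", "3. **Concise code**: Generators offer a compact way to create iterators.\n", "4. **Data-processing pipelines**: Several generators can be chained together into an efficient stream of processing steps.\n", "5. **Understanding core Python**: The iterator protocol underlies `for` loops, list comprehensions, `map()`, `filter()`, and many other Python features.\n", "\n", "**This tutorial covers:**\n", "\n", "1. **The iterator protocol**: `__iter__` and `__next__`.\n", "2. **Iterables vs. iterators**.\n", "3. **Generator functions**: using the `yield` keyword.\n", "4. **Generator expressions**.\n", "5. **The `itertools` module**: powerful iteration tools.\n", "6. **The `yield from` statement** (Python 3.3+).\n", "7. **Advanced generator features**: the `send()`, `throw()`, and `close()` methods (the basis of classic coroutines).\n", "8. **Use cases and best practices**." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. The Iterator Protocol\n", "\n", "Python's iterator protocol defines how an object supports iteration. It relies on two special methods:\n", "\n", "* **`__iter__(self)`**:\n", "    * Called when the object is passed to the built-in `iter()` function, or when a `for` loop starts.\n", "    * It must return an **iterator object**.\n", "\n", "* **`__next__(self)`**:\n", "    * Every iterator object must implement this method.\n", "    * Called by the built-in `next(iterator)`; a `for` loop calls it implicitly on each iteration.\n", "    * It should return the next value in the sequence.\n", "    * When there are no more values, it must raise `StopIteration`. A `for` loop catches this exception automatically and ends the loop." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. 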
Iterables vs. Iterators\n", "\n", "* **Iterable**:\n", "    * Any object that implements `__iter__` (returning an iterator) is iterable.\n", "    * Alternatively, an object that implements `__getitem__` and accepts integer indexes starting at 0 (as sequences do) is also iterable; Python automatically creates an iterator to walk it.\n", "    * Examples: lists (`list`), tuples (`tuple`), strings (`str`), dictionaries (`dict`), sets (`set`), file objects, and custom classes that implement `__iter__` or `__getitem__`.\n", "    * Calling `iter()` on a container such as a list returns a new iterator each time, and each iterator walks the data independently.\n", "\n", "* **Iterator**:\n", "    * Any object that implements both `__iter__` and `__next__` is an iterator.\n", "    * An iterator's `__iter__` usually just returns `self`, because an iterator is its own iterator.\n", "    * Iterators are stateful: they remember the current position in the iteration.\n", "    * An iterator can usually be consumed only once. After `__next__` raises `StopIteration`, it should keep raising it on later calls." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Example: a custom iterable and its iterator\n", "class MyRangeIterable:\n", "    \"\"\"A simple iterable, similar to range().\"\"\"\n", "    def __init__(self, start, end):\n", "        self.start = start\n", "        self.end = end\n", "        print(f\"MyRangeIterable initialized ({self.start} to {self.end})\")\n", "\n", "    def __iter__(self):\n", "        print(\"MyRangeIterable.__iter__ called, returning MyRangeIterator\")\n", "        # Return a fresh iterator instance on every call\n", "        return MyRangeIterator(self.start, self.end)\n", "\n", "class MyRangeIterator:\n", "    \"\"\"The iterator used by MyRangeIterable.\"\"\"\n", "    def __init__(self, start, end):\n", "        self.current = start\n", "        self.end = end\n", "        print(f\"MyRangeIterator initialized (current={self.current}, end={self.end})\")\n", "\n", "    def __iter__(self):\n", "        # An iterator's own __iter__ should return self\n", "        print(\"MyRangeIterator.__iter__ called, returning self\")\n", "        return self\n", "\n", "    def __next__(self):\n", "        print(f\"MyRangeIterator.__next__ called (current={self.current})\")\n", "        if self.current < self.end:\n", "            value = self.current\n", "            self.current += 1\n", "            return value\n", "        else:\n", "            print(\"MyRangeIterator: Raising StopIteration\")\n", "            raise StopIteration\n", "\n", "print(\"--- Testing MyRangeIterable ---\")\n", "my_range_obj = MyRangeIterable(1, 4)  # the iterable\n", "\n", "print(\"\\nFirst iteration 
using for loop:\")\n", "for num in my_range_obj:  # implicitly calls iter(my_range_obj), then next() repeatedly\n", "    print(f\" For loop got: {num}\")\n", "\n", "print(\"\\nSecond iteration using for loop (gets a new iterator):\")\n", "for num in my_range_obj:\n", "    print(f\" For loop got: {num}\")\n", "\n", "print(\"\\nManual iteration:\")\n", "iterator1 = iter(my_range_obj)  # get an iterator\n", "print(f\"Type of iterator1: {type(iterator1)}\")\n", "print(f\"next(iterator1): {next(iterator1)}\")\n", "print(f\"next(iterator1): {next(iterator1)}\")\n", "\n", "iterator2 = iter(my_range_obj)  # get a second, independent iterator\n", "print(f\"next(iterator2): {next(iterator2)}\")  # starts from the beginning\n", "\n", "print(f\"Continuing iterator1: {next(iterator1)}\")\n", "try:\n", "    print(f\"Continuing iterator1 (expect StopIteration): {next(iterator1)}\")\n", "except StopIteration as e:\n", "    print(f\" Caught StopIteration as expected: {e}\")\n", "\n", "# Verify that an iterator is itself iterable\n", "iter_from_iter = iter(iterator2)\n", "print(f\"iterator2 is iter_from_iter: {iterator2 is iter_from_iter}\")  # True" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. 
Generator Functions\n", "\n", "A generator function is a special kind of function: instead of returning a single value with `return`, it uses the `yield` keyword to produce a series of values.\n", "\n", "* Calling a generator function **does not run the function body immediately**; it returns a **generator object**.\n", "* A generator object is a special kind of iterator: it implements `__iter__` and `__next__` automatically.\n", "* Each call to `next()` on the generator object resumes the function where the last `yield` left off and runs until the next `yield`.\n", "* A `yield` statement hands a value to the caller and suspends the function, preserving its state (including local variables).\n", "* When the function finishes (by reaching a `return` statement or running off the end of the body), `StopIteration` is raised automatically." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def simple_generator_func(n):\n", "    print(\"Generator function: simple_generator_func called\")\n", "    i = 0\n", "    while i < n:\n", "        print(f\"Generator: yielding {i}\")\n", "        yield i  # produce a value and pause\n", "        i += 1\n", "        print(f\"Generator: resumed, i is now {i}\")\n", "    print(\"Generator: finished\")\n", "    # StopIteration is raised implicitly when the function returns\n", "\n", "print(\"--- Testing simple_generator_func ---\")\n", "gen_obj = simple_generator_func(3)  # calling the generator function returns a generator object\n", "print(f\"Type of gen_obj: {type(gen_obj)}\")  # <class 'generator'>\n", "\n", "print(f\"\\nFirst next(gen_obj): {next(gen_obj)}\")  # runs the body up to the first yield\n", "print(f\"Second next(gen_obj): {next(gen_obj)}\")\n", "print(f\"Third next(gen_obj): {next(gen_obj)}\")\n", "try:\n", "    print(f\"Fourth next(gen_obj) (expect StopIteration): {next(gen_obj)}\")\n", "except StopIteration:\n", "    print(\" Caught StopIteration as expected.\")\n", "\n", "print(\"\\nIterating with a for loop (uses a new generator object):\")\n", "for val in simple_generator_func(2):\n", "    print(f\" For loop got: {val}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Advantages of generators:**\n", "* **Concise code**: Python handles the iterator plumbing (state management, `StopIteration`) for you.\n", "* **Memory efficiency**: Values are produced on demand, which suits large datasets and infinite sequences.\n", "\n", "**Example of an infinite sequence:**" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def fibonacci_generator():\n", "    \"\"\"Yield the Fibonacci sequence without end.\"\"\"\n", "    a, b = 0, 1\n", "    while True:\n", "        yield a\n", "        a, b = b, a + b\n", "\n", "print(\"--- 
Fibonacci Generator ---\")\n", "fib_gen = fibonacci_generator()\n", "print(\"First 10 Fibonacci numbers:\")\n", "for _ in range(10):\n", "    print(next(fib_gen), end=\" \")\n", "print(\"\\n\")\n", "\n", "# To start over, create a new generator object\n", "fib_gen2 = fibonacci_generator()\n", "print(f\"Next from fib_gen2: {next(fib_gen2)}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4. Generator Expressions\n", "\n", "A generator expression is a more compact way to create a simple generator object. The syntax looks like a list comprehension but uses parentheses `()` instead of square brackets `[]`.\n", "\n", "`(expression for item in iterable if condition)`\n", "\n", "* A generator expression also returns a generator object.\n", "* It is evaluated lazily, producing values on demand.\n", "* It works well as a function argument, especially when you do not want to build the whole list up front." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "squares_list_comp = [x*x for x in range(5)]  # list comprehension: builds the list immediately\n", "squares_gen_expr = (x*x for x in range(5))  # generator expression: returns a generator object\n", "\n", "print(f\"List comprehension: {squares_list_comp}, type: {type(squares_list_comp)}\")\n", "print(f\"Generator expression: {squares_gen_expr}, type: {type(squares_gen_expr)}\")\n", "\n", "print(\"\\nIterating over generator expression:\")\n", "for sq in squares_gen_expr:\n", "    print(sq, end=\" \")\n", "print(\"\\n\")\n", "\n", "# Iterating again shows it is exhausted (a generator can be consumed only once)\n", "print(\"Trying to iterate again (should be empty):\")\n", "for sq in squares_gen_expr:\n", "    print(sq, end=\" \")  # prints nothing\n", "print(\"\\n\")\n", "\n", "# As a function argument\n", "data = [1, 2, 3, 4, 5, 6]\n", "sum_of_even_squares = sum(x*x for x in data if x % 2 == 0)\n", "# sum() consumes the values directly from the generator expression; no intermediate list is built\n", "print(f\"Sum of even squares: {sum_of_even_squares}\")\n", "\n", "# When a generator expression is the only argument of a call, the extra parentheses can be omitted\n", "sum_of_cubes = sum(x**3 for x in range(1, 4))\n", "print(f\"Sum of cubes (1,2,3): {sum_of_cubes}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5. 
The `itertools` Module\n", "\n", "The `itertools` module provides a set of functions for building efficient iterators. They are inspired by similar constructs in APL, Haskell, and SML.\n", "\n", "**Some commonly used `itertools` functions:**\n", "\n", "* **Infinite iterators:**\n", "    * `count(start=0, step=1)`: An endless sequence starting at `start` and increasing by `step`.\n", "    * `cycle(iterable)`: Repeats the elements of `iterable` forever.\n", "    * `repeat(object[, times])`: Repeats `object`, either `times` times or forever if `times` is omitted.\n", "\n", "* **Iterators that work on finite sequences:**\n", "    * `accumulate(iterable[, func, *, initial=None])`: Returns running totals (or the running results of another binary function).\n", "    * `chain(*iterables)`: Joins several iterables into one sequence.\n", "    * `compress(data, selectors)`: Keeps the elements of `data` whose corresponding entry in `selectors` is true.\n", "    * `dropwhile(predicate, iterable)`: Skips elements while `predicate` is true, then returns every remaining element.\n", "    * `filterfalse(predicate, iterable)`: Returns the elements of `iterable` for which `predicate` is false.\n", "    * `groupby(iterable, key=None)`: Groups consecutive elements that share the same key (as computed by the `key` function).\n", "    * `islice(iterable, stop)` or `islice(iterable, start, stop[, step])`: Returns a slice of `iterable`, like list slicing but as an iterator.\n", "    * `starmap(function, iterable)`: Like `map`, but each element of `iterable` is a tuple that is unpacked into the arguments of `function`.\n", "    * `takewhile(predicate, iterable)`: Returns elements from `iterable` for as long as `predicate` is true.\n", "    * `tee(iterable, n=2)`: Returns `n` independent iterators that all draw from the same original `iterable`.\n", "    * `zip_longest(*iterables, fillvalue=None)`: Like `zip`, but pads the shorter iterables with `fillvalue` until the longest one is exhausted.\n", "\n", "* **Combinatoric generators:**\n", "    * `product(*iterables, repeat=1)`: Cartesian product.\n", "    * `permutations(iterable, r=None)`: Permutations.\n", "    * `combinations(iterable, r)`: Combinations.\n", "    * `combinations_with_replacement(iterable, r)`: Combinations with repetition allowed." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import itertools\n", "\n", "print(\"--- itertools.count --- \")\n", "counter = itertools.count(10, 2)\n", "for _ in range(5):\n", "    print(next(counter), end=\" \")  # 10 12 14 16 18\n", "print(\"\\n\")\n", "\n", "print(\"--- itertools.cycle --- \")\n", "cycler = itertools.cycle(\"ABC\")\n", "for _ in range(7):\n", "    print(next(cycler), end=\" \")  # A B C A B C A\n", "print(\"\\n\")\n", "\n", "print(\"--- 
itertools.chain --- \")\n", "chained = itertools.chain([1, 2], \"XY\", (3, 4))\n", "print(list(chained))  # [1, 2, 'X', 'Y', 3, 4]\n", "\n", "print(\"--- itertools.islice --- \")\n", "sliced = itertools.islice(range(10), 2, 8, 2)  # from index 2 up to (but not including) 8, step 2\n", "print(list(sliced))  # [2, 4, 6]\n", "\n", "print(\"--- itertools.groupby --- \")\n", "data = \"AAABBCDAA\"\n", "for key, group in itertools.groupby(data):\n", "    print(f\"Key: {key}, Group: {list(group)}\")\n", "# Key: A, Group: ['A', 'A', 'A']\n", "# Key: B, Group: ['B', 'B']\n", "# Key: C, Group: ['C']\n", "# Key: D, Group: ['D']\n", "# Key: A, Group: ['A', 'A']\n", "\n", "print(\"--- itertools.combinations --- \")\n", "combs = itertools.combinations(\"ABC\", 2)\n", "print(list(combs))  # [('A', 'B'), ('A', 'C'), ('B', 'C')]\n", "\n", "print(\"--- itertools.product --- \")\n", "prod = itertools.product(\"AB\", \"12\")\n", "print(list(prod))  # [('A', '1'), ('A', '2'), ('B', '1'), ('B', '2')]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 6. The `yield from` Statement (Python 3.3+)\n", "\n", "`yield from <iterable>` lets a generator delegate part of its work to another iterable (usually another generator).\n", "\n", "It mainly does the following:\n", "1. Iterates over `<iterable>`.\n", "2. Passes each value produced by `<iterable>` directly to the caller of the current generator.\n", "3. 
If `<iterable>` is itself a generator, `yield from` also forwards to that subgenerator any values or exceptions the caller passes in through `send()`, `throw()`, or `close()`.\n", "\n", "**Uses:**\n", "* **Simplifying nested generators**: avoids repeating `for item in sub_generator: yield item`.\n", "* **Building coroutine pipelines** (although modern asynchronous code mostly uses `async/await`)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def sub_generator(start, end):\n", "    print(f\" sub_generator: called with {start}, {end}\")\n", "    for i in range(start, end):\n", "        print(f\" sub_generator: yielding {i}\")\n", "        yield i\n", "    print(\" sub_generator: finished\")\n", "\n", "def delegating_generator_manual(iterables_list):\n", "    print(\"delegating_generator_manual: called\")\n", "    for iterable in iterables_list:\n", "        for item in iterable:  # iterate over the sub-iterable by hand\n", "            yield item\n", "    print(\"delegating_generator_manual: finished\")\n", "\n", "def delegating_generator_yield_from(iterables_list):\n", "    print(\"delegating_generator_yield_from: called\")\n", "    for iterable in iterables_list:\n", "        # Delegate to the sub-iterable with yield from.\n", "        # If iterable is a generator, yield from sets up a two-way channel to it.\n", "        yield from iterable\n", "    print(\"delegating_generator_yield_from: finished\")\n", "\n", "print(\"--- Testing yield from ---\")\n", "data_sources = [\n", "    sub_generator(1, 3),  # a generator\n", "    \"XY\",  # a string (iterable)\n", "    (10, 11)  # a tuple (iterable)\n", "]\n", "\n", "print(\"\\nUsing manual delegation:\")\n", "for item in delegating_generator_manual(list(data_sources)):  # list() only copies the list; the generator inside is still consumed here\n", "    print(f\"Got item: {item}\")\n", "\n", "# Recreate the sources, because the generator above has been consumed\n", "data_sources_2 = [\n", "    sub_generator(1, 3),\n", "    \"XY\",\n", "    (10, 11)\n", "]\n", "print(\"\\nUsing yield from:\")\n", "for item in delegating_generator_yield_from(data_sources_2):\n", "    print(f\"Got item: {item}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 7. 
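Advanced Generator Features: `send()`, `throw()`, `close()`\n", "\n", "Besides pulling values out of a generator with `next()`, you can send values or exceptions into it, or close it. These features let a generator act as a simple **coroutine** (the notion of coroutine that predates `async/await`).\n", "\n", "* **`generator.send(value)`**:\n", "    * Sends a value into the generator; that value becomes the result of the current `yield` expression.\n", "    * The generator resumes from where it was paused and runs until the next `yield` (which produces a value) or until it finishes.\n", "    * To start a generator for the first time (before it has reached its first `yield`), you must send `None` or simply call `next(generator)`.\n", "\n", "* **`generator.throw(type[, value[, traceback]])`**:\n", "    * Raises an exception at the point where the generator is paused (the `yield` expression).\n", "    * If the generator catches the exception, it can continue and `yield` another value (which becomes the return value of `throw()`), finish normally (the caller sees `StopIteration`), or raise a different exception.\n", "    * If the generator does not catch the exception, it propagates to the caller.\n", "    * Since Python 3.12 the multi-argument form is deprecated; passing a single exception instance, as in `generator.throw(ValueError('msg'))`, works on all Python 3 versions.\n", "\n", "* **`generator.close()`**:\n", "    * Raises `GeneratorExit` at the point where the generator is paused.\n", "    * A generator that needs cleanup can catch `GeneratorExit` (or use `try`/`finally`), do the cleanup, and then either re-raise `GeneratorExit` or return normally. It should not raise `StopIteration` explicitly: since Python 3.7 (PEP 479), a `StopIteration` raised inside a generator body is converted to `RuntimeError`.\n", "    * If the generator yields a value in response to `GeneratorExit`, `close()` raises `RuntimeError`.\n", "    * After `close()`, further `next()` or `send()` calls raise `StopIteration`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def simple_coroutine():\n", "    print(\"Coroutine started\")\n", "    received_value = None\n", "    try:\n", "        while True:\n", "            received_value = yield received_value  # the yield expression evaluates to the value passed to send()\n", "            print(f\"Coroutine received: {received_value}\")\n", "            if received_value == \"exit\":\n", "                print(\"Coroutine exiting normally\")\n", "                break\n", "            received_value = f\"Processed: {received_value}\"\n", "    except GeneratorExit:\n", "        print(\"Coroutine: Caught GeneratorExit, cleaning up...\")\n", "        # Perform cleanup here\n", "        print(\"Coroutine: Cleaned up and closing.\")\n", "        # Do not yield again here; either re-raise GeneratorExit or simply return\n", "    except ValueError as e:\n", "        print(f\"Coroutine: Caught ValueError: {e}\")\n", "        yield f\"Error handled: {e}\"  # optionally yield an error-handling result\n", "    finally:\n", "        print(\"Coroutine finally block executed\")\n", "\n", "print(\"--- Testing Coroutine send() ---\")\n", "co = simple_coroutine()\n", "next(co)  # prime the coroutine: run to the first yield, which yields None\n", "print(f\"Sent 10, got back: {co.send(10)}\") 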
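# sends 10; the next yield hands back 'Processed: 10'\n", "print(f\"Sent 'hello', got back: {co.send('hello')}\")  # sends 'hello'; the next yield hands back 'Processed: hello'\n", "\n", "print(\"\\n--- Testing Coroutine throw() ---\")\n", "co2 = simple_coroutine()\n", "next(co2)\n", "try:\n", "    # Pass an exception instance; the (type, value) form is deprecated since Python 3.12\n", "    print(f\"Throwing ValueError, got back: {co2.throw(ValueError('Test error'))}\")\n", "except ValueError as e:\n", "    print(f\"Caller caught an unhandled error from coroutine: {e}\")  # reached only if the coroutine does not handle the error\n", "\n", "# The except block has already yielded its result, so resuming the coroutine lets it run to completion\n", "try:\n", "    print(f\"Sending 'after error' to co2, got back: {co2.send('after error')}\")\n", "except StopIteration:\n", "    print(\"co2 finished after handling the error (StopIteration).\")\n", "\n", "print(\"\\n--- Testing Coroutine close() ---\")\n", "co3 = simple_coroutine()\n", "next(co3)\n", "co3.send(\"data before close\")\n", "co3.close()  # closes the coroutine by raising GeneratorExit inside it\n", "\n", "try:\n", "    next(co3)  # try to get another value from the closed coroutine\n", "except StopIteration:\n", "    print(\"Caught StopIteration after close, as expected.\")\n", "\n", "print(\"\\n--- Testing Coroutine exit command ---\")\n", "co4 = simple_coroutine()\n", "next(co4)\n", "co4.send(\"some data\")\n", "try:\n", "    co4.send(\"exit\")  # the coroutine handles 'exit' internally and finishes normally\n", "except StopIteration:\n", "    print(\"Caught StopIteration after coroutine exited via 'exit' command.\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Although `async/await` is the preferred way to write asynchronous code and coroutines in modern Python, understanding these classic generator-coroutine mechanisms helps you follow the history of async in Python and the internals of some libraries." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 8. Use Cases and Best Practices\n", "\n", "**When should you use iterators and generators?**\n", "\n", "* **Large datasets**: when the data does not fit in memory at once (for example, reading a large file or a database result set).\n", "* **Infinite sequences**: counters, the Fibonacci sequence, streams of random numbers.\n", "* **Data-processing pipelines**: chain several generators to process data as a stream, with every step lazy.\n", "    ```python\n", "    # lines = (line for line in open('large_file.txt'))\n", "    # non_empty_lines = (line for line in lines if line.strip())\n", "    # processed_lines = (process(line) for line in non_empty_lines)\n", "    # for result in processed_lines:\n", "    #     # ...\n", "    ```\n", "* **Classes that need custom iteration behavior**.\n", "* **Replacing a simple list comprehension to save memory**, when the resulting list would be large and is not needed all at once.\n", "\n", "**Best practices:**\n", "\n", "1. 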
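**Prefer generator expressions**: for simple lazy sequences, a generator expression is the most compact option.\n", "2. **Use generator functions**: when the iteration logic is complex and needs several `yield` statements or internal state.\n", "3. **Reach for `itertools`**: before writing complex iteration logic yourself, check whether `itertools` already offers a solution.\n", "4. **Remember that iterators are single-use**: if you need to iterate more than once, either recreate the iterator or generator, or store the results in a list (memory permitting).\n", "5. **`yield from` keeps code flatter**: use it when delegating to another iterable.\n", "6. **Use the advanced generator methods (`send`, `throw`, `close`) sparingly**: they introduce more complex control flow and are unnecessary for most iteration tasks. Modern asynchronous code should prefer `async/await`.\n", "\n", "## Summary\n", "\n", "Iterators and generators are powerful, fundamental features of Python. They sit at the core of many built-in facilities (such as the `for` loop) and provide an elegant, efficient way to work with data streams and sequences.\n", "\n", "With a solid understanding of the iterator protocol, generator functions, generator expressions, and the `itertools` module, you can write code that is more Pythonic, more efficient, and more memory-friendly." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.0" } }, "nbformat": 4, "nbformat_minor": 5 }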