.to() Method

  • The .to() method is versatile and widely used for type conversion and device migration.
  • It allows you to change both the data type and the device (CPU/GPU) in a single line.
tensor.to(device='cuda', dtype=torch.float32)
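A minimal runnable sketch of the same idea, assuming a CUDA device may or may not be available (the tensor names are only for illustration):

import torch

x = torch.ones(2, 3, dtype=torch.float64)  # starts on the CPU as float64
device = "cuda" if torch.cuda.is_available() else "cpu"
y = x.to(device=device, dtype=torch.float32)  # device and dtype changed in a single call
print(y.dtype, y.device)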

.type() Method

__main__.py

mypackage/
β”œβ”€β”€ __init__.py
└── __main__.py
  • This file is used to define the entry point of a Python package.
  • When a directory is used as a package (i.e., it contains an __init__.py file) and it is executed as the main program (using python -m <package_name>), the __main__.py file in that package is executed.
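A minimal sketch of what mypackage/__main__.py could contain (the body of main() is only illustrative); running python -m mypackage would then execute it:

# mypackage/__main__.py
def main():
    print("mypackage executed as the main program")

if __name__ == "__main__":  # true when the package is run via python -m mypackage
    main()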
class AttrTest:
    classAttr = "I'm a class attr"

obj1 = AttrTest()
obj2 = AttrTest()
# both instances read the attribute from the class object
print(f"before changing through the class: obj1: {obj1.classAttr} and obj2: {obj2.classAttr}")
# rebinding the attribute on the class is visible from every instance that hasn't shadowed it
AttrTest.classAttr = "Modified via the class"
print(f"after changing through the class: obj1: {obj1.classAttr} and obj2: {obj2.classAttr}")

Script

πŸ¦‰ Source

Hyperparameters

batch_size = 32 # how many independent sequences we will process in parallel

The Deep Q-Learning Algorithm

πŸͺΆ Source: HuggingFace course

Deep Q-Learning uses a deep neural network to approximate the different Q-values for each possible action at a state (value-function estimation).

In Deep Q-Learning, we create a loss function that compares our Q-value prediction and the Q-target and uses gradient descent to update the weights of our Deep Q-Network to approximate our Q-values better.
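A minimal PyTorch sketch of such a loss, assuming q_net and target_net map a batch of states to one Q-value per action (all names and shapes here are illustrative, not the course's exact code):

import torch
import torch.nn.functional as F

def dqn_loss(q_net, target_net, states, actions, rewards, next_states, dones, gamma=0.99):
    # Q-value predictions for the actions that were actually taken
    q_pred = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    # Q-target: immediate reward plus the discounted max Q-value of the next state
    with torch.no_grad():
        q_next = target_net(next_states).max(dim=1).values
        q_target = rewards + gamma * (1.0 - dones) * q_next
    # gradient descent on this loss nudges the Deep Q-Network toward the Q-target
    return F.mse_loss(q_pred, q_target)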

The Deep Q-Learning pseudocode

Deep Q-Learning

πŸͺΆ Source: HuggingFace course

Introduction

  • FrozenLake-v1 β˜ƒοΈ and Taxi-v3 πŸš•: the state space was discrete and small (16 different states for FrozenLake-v1 and 500 for Taxi-v3).
    • The state space in Atari games can contain $10^{9}$ to $10^{11}$ states
    • Producing and updating a Q-table becomes inefficient in such large state-space environments
  • Instead of using a Q-table, Deep Q-Learning uses a Neural Network that takes a state and approximates Q-values for each action based on that state.
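A minimal sketch of such a network, assuming the state is a flat feature vector (the layer sizes are arbitrary, illustrative choices):

import torch.nn as nn

class QNetwork(nn.Module):
    def __init__(self, state_dim, n_actions):
        super().__init__()
        # one Q-value per action, instead of one Q-table cell per (state, action) pair
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128),
            nn.ReLU(),
            nn.Linear(128, n_actions),
        )

    def forward(self, state):
        return self.net(state)  # shape: (batch, n_actions)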

5. Introducing Q-Learning

πŸͺΆ Source: HuggingFace course

What is Q-Learning

Q-Learning is an off-policy value-based method that uses a TD approach to train its action-value function

  • Off-policy: using a different policy for acting (inference) and updating (training).
    • The epsilon-greedy policy (for acting) vs. the greedy policy (for updating our Q-value); see the update rule below.
  • Value-based method:
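For reference, the standard Q-Learning update (not course-specific wording) makes the off-policy split visible: the greedy max over next-state actions appears in the target, even though actions are chosen epsilon-greedily.

Q-Learning: $Q(S_t, A_t)\leftarrow Q(S_t, A_t) + \alpha \times [R_{t+1} + \gamma \max_{a} Q(S_{t+1}, a) - Q(S_t, A_t)]$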

Regularization

πŸ”Ž Source: Google ML

  • Any mechanism that reduces overfitting.
  • Popular types of regularization include:
    • $L_1$ regularization.
    • $L_2$ regularization.
    • Dropout regularization.
    • Early stopping (not a formal regularization method, but one that can effectively limit overfitting).
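A minimal PyTorch sketch combining two of these (the layer sizes are illustrative; weight_decay adds an $L_2$ penalty on the weights):

import torch.nn as nn
import torch.optim as optim

model = nn.Sequential(
    nn.Linear(20, 64),
    nn.ReLU(),
    nn.Dropout(p=0.5),  # dropout regularization: randomly zeroes activations during training
    nn.Linear(64, 2),
)

# weight_decay applies L2 regularization through the optimizer
optimizer = optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)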

4. Monte Carlo vs. Temporal Difference Learning

πŸͺΆ Source: HuggingFace course

Summary

  • With Monte Carlo, we update the value function from a complete episode, and so we use the actual accurate discounted return of this episode.
  • With TD Learning, we update the value function from a step, and we replace $G_t$, which we don’t know, with an estimated return called the TD target.

Monte Carlo: $V(S_t)\leftarrow V(S_t) + \alpha \times [G_t - V(S_t)]$
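For comparison, the standard TD(0) counterpart (added here to complete the contrast) replaces $G_t$ with the TD target $R_{t+1} + \gamma V(S_{t+1})$:

TD Learning: $V(S_t)\leftarrow V(S_t) + \alpha \times [R_{t+1} + \gamma V(S_{t+1}) - V(S_t)]$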