.to() Method

  • The .to() method is versatile and widely used for type conversion and device migration.
  • It allows you to change both the data type and the device (CPU/GPU) in a single line.
tensor.to(device='cuda', dtype=torch.float32)
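A minimal runnable sketch of the same idea, assuming a CUDA device may or may not be available (the tensor names are only for illustration):

import torch

x = torch.ones(2, 3, dtype=torch.float64)  # starts on the CPU as float64
device = "cuda" if torch.cuda.is_available() else "cpu"
y = x.to(device=device, dtype=torch.float32)  # device and dtype changed in a single call
print(y.dtype, y.device)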

.type() Method

__main__.py

mypackage/
β”œβ”€β”€ __init__.py
└── __main__.py
  • This file is used to define the entry point of a Python package.
  • When a directory is used as a package (i.e., it contains an __init__.py file) and it is executed as the main program (using python -m <package_name>), the __main__.py file in that package is executed.
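A minimal sketch of what mypackage/__main__.py could contain (the body of main() is only illustrative); running python -m mypackage would then execute it:

# mypackage/__main__.py
def main():
    print("mypackage executed as the main program")

if __name__ == "__main__":  # true when the package is run via python -m mypackage
    main()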
class AttrTest:
    classAttr = "I'm a class attr"

obj1 = AttrTest()
obj2 = AttrTest()
# both instances read the attribute from the class object
print(f"before changing through the class: obj1: {obj1.classAttr} and obj2: {obj2.classAttr}")
# rebinding the attribute on the class is visible from every instance that hasn't shadowed it
AttrTest.classAttr = "Modified via the class"
print(f"after changing through the class: obj1: {obj1.classAttr} and obj2: {obj2.classAttr}")

Script

πŸ¦‰ Source

Hyperparameters

batch_size = 32 # how many independent sequences we will process in parallel

The Deep Q-Learning Algorithm

πŸͺΆ Source: HuggingFace course

Deep Q-Learning uses a deep neural network to approximate the different Q-values for each possible action at a state (value-function estimation).

In Deep Q-Learning, we create a loss function that compares our Q-value prediction and the Q-target and uses gradient descent to update the weights of our Deep Q-Network to approximate our Q-values better.
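A minimal PyTorch sketch of such a loss, assuming q_net and target_net map a batch of states to one Q-value per action (all names and shapes here are illustrative, not the course's exact code):

import torch
import torch.nn.functional as F

def dqn_loss(q_net, target_net, states, actions, rewards, next_states, dones, gamma=0.99):
    # Q-value predictions for the actions that were actually taken
    q_pred = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    # Q-target: immediate reward plus the discounted max Q-value of the next state
    with torch.no_grad():
        q_next = target_net(next_states).max(dim=1).values
        q_target = rewards + gamma * (1.0 - dones) * q_next
    # gradient descent on this loss nudges the Deep Q-Network toward the Q-target
    return F.mse_loss(q_pred, q_target)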

The Deep Q-Learning pseudocode

Deep Q-Learning

πŸͺΆ Source: HuggingFace course

Introduction

  • FrozenLake-v1 β˜ƒοΈ and Taxi-v3 πŸš•: the state space was discrete and small (16 different states for FrozenLake-v1 and 500 for Taxi-v3).
    • The state space in Atari games can contain $10^{9}$ to $10^{11}$ states
    • Producing and updating a Q-table becomes inefficient in such large state-space environments
  • Instead of using a Q-table, Deep Q-Learning uses a Neural Network that takes a state and approximates Q-values for each action based on that state.
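A minimal sketch of such a network, assuming the state is a flat feature vector (the layer sizes are arbitrary, illustrative choices):

import torch.nn as nn

class QNetwork(nn.Module):
    def __init__(self, state_dim, n_actions):
        super().__init__()
        # one Q-value per action, instead of one Q-table cell per (state, action) pair
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128),
            nn.ReLU(),
            nn.Linear(128, n_actions),
        )

    def forward(self, state):
        return self.net(state)  # shape: (batch, n_actions)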

5. Introducing Q-Learning

πŸͺΆ Source: HuggingFace course

What is Q-Learning

Q-Learning is an off-policy value-based method that uses a TD approach to train its action-value function

  • Off-policy: using a different policy for acting (inference) and updating (training).
    • The epsilon-greedy policy (for acting) vs. the greedy policy (for updating our Q-value); see the update rule below.
  • Value-based method:
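For reference, the standard Q-Learning update (not course-specific wording) makes the off-policy split visible: the greedy max over next-state actions appears in the target, even though actions are chosen epsilon-greedily.

Q-Learning: $Q(S_t, A_t)\leftarrow Q(S_t, A_t) + \alpha \times [R_{t+1} + \gamma \max_{a} Q(S_{t+1}, a) - Q(S_t, A_t)]$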

Regularization

πŸ”Ž Source: Google ML

  • Any mechanism that reduces overfitting.
  • Popular types of regularization include:
    • $L_1$ regularization.
    • $L_2$ regularization.
    • Dropout regularization.
    • Early stopping (not a formal regularization method, but one that can effectively limit overfitting).
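A minimal PyTorch sketch combining two of these (the layer sizes are illustrative; weight_decay adds an $L_2$ penalty on the weights):

import torch.nn as nn
import torch.optim as optim

model = nn.Sequential(
    nn.Linear(20, 64),
    nn.ReLU(),
    nn.Dropout(p=0.5),  # dropout regularization: randomly zeroes activations during training
    nn.Linear(64, 2),
)

# weight_decay applies L2 regularization through the optimizer
optimizer = optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)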

4. Monte Carlo vs. Temporal Difference Learning

πŸͺΆ Source: HuggingFace course

Summary

  • With Monte Carlo, we update the value function from a complete episode, and so we use the actual accurate discounted return of this episode.
  • With TD Learning, we update the value function from a step, and we replace $G_t$, which we don’t know, with an estimated return called the TD target.

Monte Carlo: $V(S_t)\leftarrow V(S_t) + \alpha \times [G_t - V(S_t)]$
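For comparison, the standard TD(0) counterpart (added here to complete the contrast) replaces $G_t$ with the TD target $R_{t+1} + \gamma V(S_{t+1})$:

TD Learning: $V(S_t)\leftarrow V(S_t) + \alpha \times [R_{t+1} + \gamma V(S_{t+1}) - V(S_t)]$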