EAGLE v1 Replication

Set up environment and run an inference test:

git clone --branch v1 --depth 1 https://github.com/SafeAILab/EAGLE.git EAGLE-v1
cd EAGLE-v1
wget https://raw.githubusercontent.com/w32zhong/EAGLE/refs/heads/eagle-v1-save/application/test_v1.py -O eagle/application/test_v1.py
pip install -e .
pip install transformers==4.36.2
pip install accelerate==0.21.0
pip install datasets==3.2.0
cd eagle
CUDA_VISIBLE_DEVICES=0 python application/test_v1.py
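
If you want a sense of what the test exercises, the sketch below follows EAGLE's public EaModel API rather than the exact contents of test_v1.py; the checkpoint name and prompt are placeholders of my choosing.

import torch
from eagle.model.ea_model import EaModel

# Load the base LLM together with a pretrained EAGLE v1 draft head.
model = EaModel.from_pretrained(
    base_model_path="meta-llama/Llama-2-7b-chat-hf",
    ea_model_path="yuhuili/EAGLE-llama2-chat-7B",  # published v1 head (assumption)
    torch_dtype=torch.float16,
    device_map="auto",
)
model.eval()

prompt = "[INST] Tell me a short story. [/INST]"
input_ids = torch.as_tensor(model.tokenizer([prompt]).input_ids).cuda()

# eagenerate() runs speculative decoding with the EAGLE head as the drafter.
output_ids = model.eagenerate(input_ids, temperature=0.0, max_new_tokens=128)
print(model.tokenizer.decode(output_ids[0]))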

Go to eagle/ge_data/allocation.py and change the training GPU allocation. For example, in my case:

gpus=[[0, 1],[2, 3]]

Then, in allocation.py, replace ge_data_all_vicuna.py with ge_data_all_llama2chat.py to generate data for the LLaMA-2 chat base model.
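
For context, allocation.py roughly works as follows: it splits the ShareGPT index range into one contiguous chunk per GPU group and launches one data-generation process per group. A simplified sketch (the split arithmetic and worker command line are approximated from memory, not copied from the script):

import os
from concurrent.futures import ThreadPoolExecutor

gpus = [[0, 1], [2, 3]]      # one inner list of GPU ids per worker process
start, end = 0, 67999        # ShareGPT sample index range
num_p = len(gpus)

# Give each worker a contiguous slice of the dataset.
chunk = (end - start + 1) // num_p
commands = []
for i, gpu_group in enumerate(gpus):
    s = start + i * chunk
    e = end if i == num_p - 1 else s + chunk - 1
    gpu_str = " ".join(map(str, gpu_group))
    commands.append(
        f"python ge_data_all_llama2chat.py --start={s} --end={e} "
        f"--index={i} --gpu_index {gpu_str} --outdir <outdir>"
    )

# Run all workers concurrently; each pins itself to its GPU group.
with ThreadPoolExecutor(max_workers=num_p) as ex:
    for cmd in commands:
        ex.submit(os.system, cmd)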

Go to eagle/ge_data/ge_data_all_llama2chat.py and change the following:

# bigname="/home/hongyanz/scratch/weights/llama2chat/13B"
bigname="meta-llama/Llama-2-7b-chat-hf"
...
# ds = load_dataset('json', data_files="/home/hongyanz/scratch/data/ShareGPT_V4.3_unfiltered_cleaned_split.json")
ds = load_dataset(
    path="Aeala/ShareGPT_Vicuna_unfiltered",
    data_files=["ShareGPT_V4.3_unfiltered_cleaned_split.json"],
    revision='8b0048ad6ae8c22f46a78c15559dec98feef5539'
)
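
Before launching the multi-hour generation job, it is worth sanity-checking that the pinned dataset revision resolves; a quick check of my own (not part of the original instructions):

from datasets import load_dataset

ds = load_dataset(
    path="Aeala/ShareGPT_Vicuna_unfiltered",
    data_files=["ShareGPT_V4.3_unfiltered_cleaned_split.json"],
    revision='8b0048ad6ae8c22f46a78c15559dec98feef5539',
)
print(ds)                # expect a single 'train' split
print(len(ds['train']))  # number of ShareGPT conversations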

Run the following to generate training data:

cd ge_data
python -m eagle.ge_data.allocation --outdir /mnt/wd_ssd/

(/mnt/wd_ssd is my data storage directory)

This will take a few hours and consume 756 GiB of disk space.
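
The job writes one .ckpt file per conversation under sharegpt_0_67999_mufp16/. To peek at a sample, something like the following works; the field names are my recollection of the ge_data output format, so treat them as assumptions:

import glob
import torch

files = sorted(glob.glob('/mnt/wd_ssd/sharegpt_0_67999_mufp16/**/data_*.ckpt', recursive=True))
sample = torch.load(files[0])
# Expected keys (assumption): input_ids, the base model's hidden states,
# and a loss mask selecting the assistant-turn tokens.
for k, v in sample.items():
    print(k, getattr(v, 'shape', v))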

Change directory to ../train and modify the wandb settings in main.py:

#wandb.init(project="ess", entity="yuhui-li", config=train_config)
wandb.init(project="beagle", config=train_config)

Importantly, change the list_files function to filter out empty training files (in my experience, about 0.5% of the generated inputs are empty), and skip all in-training tests to avoid potential divide-by-zero errors. Check out this patch for all detailed changes.
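
A minimal version of that filter, assuming the stock list_files simply walks the data directory; a zero-size check is enough to catch the empty files:

import os

def list_files(path):
    datapath = []
    for root, _, files in os.walk(path):
        for name in files:
            filepath = os.path.join(root, name)
            # Skip empty samples; in my experience ~0.5% of files are empty.
            if os.path.getsize(filepath) > 0:
                datapath.append(filepath)
    return datapath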

Now train the speculative decoder model:

CUDA_VISIBLE_DEVICES=0,1,2,3 accelerate launch -m --mixed_precision=bf16 eagle.train.main \
    --tmpdir /mnt/wd_ssd/sharegpt_0_67999_mufp16/ --cpdir ./ckpt --configpath ./llama_2_chat_7B_config.json \
    --basepath ~/.cache/huggingface/hub/models--meta-llama--Llama-2-7b-chat-hf/snapshots/f5db02db724555f92da89c216ac04704f23d4590 \
    --gradient-accumulation-steps 4 --bs 1
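
For reference, assuming plain data parallelism across the four visible GPUs, this configuration runs 4 GPUs × per-device batch size 1 × 4 gradient-accumulation steps, i.e. an effective global batch size of 16.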

After training, use a simple test script to evaluate decoding speed with the saved model. For example, for the 10-epoch checkpoint, change ea_model_path to

ea_model_path='../EAGLE-v1/eagle/train/ckpt/model_9'
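
A minimal speed check along those lines, assuming the same EaModel API as in the inference test above; the prompt and timing code are mine, and whether a raw training checkpoint directory loads directly depends on what main.py saved (it may need a config.json alongside the weights):

import time
import torch
from eagle.model.ea_model import EaModel

model = EaModel.from_pretrained(
    base_model_path="meta-llama/Llama-2-7b-chat-hf",
    ea_model_path='../EAGLE-v1/eagle/train/ckpt/model_9',
    torch_dtype=torch.float16,
    device_map="auto",
)
model.eval()

prompt = "[INST] Explain speculative decoding in two sentences. [/INST]"
input_ids = torch.as_tensor(model.tokenizer([prompt]).input_ids).cuda()

torch.cuda.synchronize()
t0 = time.time()
output_ids = model.eagenerate(input_ids, temperature=0.0, max_new_tokens=256)
torch.cuda.synchronize()
elapsed = time.time() - t0

# Report throughput over the newly generated tokens only.
new_tokens = output_ids.shape[1] - input_ids.shape[1]
print(f"{new_tokens / elapsed:.1f} tokens/s")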