Set up environment and run an inference test:
```shell
git clone --branch v1 --depth 1 https://github.com/SafeAILab/EAGLE.git EAGLE-v1
cd EAGLE-v1
wget https://raw.githubusercontent.com/w32zhong/EAGLE/refs/heads/eagle-v1-save/application/test_v1.py -O eagle/application/test_v1.py
pip install -e .
pip install transformers==4.36.2
pip install accelerate==0.21.0
pip install datasets==3.2.0
cd eagle
CUDA_VISIBLE_DEVICES=0 python application/test_v1.py
```

Go to eagle/ge_data/allocation.py and change your training GPU allocations. For example, in my case:
```python
gpus = [[0, 1], [2, 3]]
```

Then replace ge_data_all_vicuna.py with ge_data_all_llama2chat.py in allocation.py to train against the Llama base model.
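The gpus list groups GPUs into parallel workers, and allocation.py splits the sample range across those groups. A minimal sketch of that idea (split_range is my own illustrative helper, not EAGLE's actual code):

```python
# Illustrative only: how a sample range might be divided among GPU groups.
def split_range(start, end, num_groups):
    """Split [start, end) into num_groups contiguous, near-equal chunks."""
    total = end - start
    base, rem = divmod(total, num_groups)
    chunks, cursor = [], start
    for i in range(num_groups):
        size = base + (1 if i < rem else 0)  # spread the remainder
        chunks.append((cursor, cursor + size))
        cursor += size
    return chunks

gpus = [[0, 1], [2, 3]]  # two worker groups of two GPUs each
print(split_range(0, 68000, len(gpus)))  # one chunk per group
```

With two groups, each worker processes half of the ShareGPT samples in parallel.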
Go to eagle/ge_data/ge_data_all_llama2chat.py and change the following:

```python
# bigname="/home/hongyanz/scratch/weights/llama2chat/13B"
bigname = "meta-llama/Llama-2-7b-chat-hf"
...
# ds = load_dataset('json', data_files="/home/hongyanz/scratch/data/ShareGPT_V4.3_unfiltered_cleaned_split.json")
ds = load_dataset(
    path="Aeala/ShareGPT_Vicuna_unfiltered",
    data_files=["ShareGPT_V4.3_unfiltered_cleaned_split.json"],
    revision='8b0048ad6ae8c22f46a78c15559dec98feef5539'
)
```

Run the following to generate training data:
```shell
cd ge_data
python -m eagle.ge_data.allocation --outdir /mnt/wd_ssd/
```

(/mnt/wd_ssd is my data storage directory.) This step takes a few hours and consumes 756 GiB of disk space.
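The ShareGPT dump loaded above stores multi-turn conversations as `conversations` lists of `from`/`value` entries; verify the field names against your copy. A small illustrative helper (not part of EAGLE) that flattens one record into speaker/text pairs:

```python
# Illustrative ShareGPT-style record; field names match the public dump.
record = {
    "id": "example",
    "conversations": [
        {"from": "human", "value": "Hello!"},
        {"from": "gpt", "value": "Hi, how can I help?"},
    ],
}

def to_turns(rec):
    """Flatten a ShareGPT record into (speaker, text) tuples."""
    return [(t["from"], t["value"]) for t in rec["conversations"]]

print(to_turns(record))
```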
Change directory to ../train and modify the wandb settings in main.py:

```python
#wandb.init(project="ess", entity="yuhui-li", config=train_config)
wandb.init(project="beagle", config=train_config)
```

Importantly, change the list_files function to filter out empty training files (in my experience about 0.5% of the inputs are empty), and skip all in-training tests to avoid potential divide-by-zero errors.
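A minimal sketch of a list_files replacement that drops zero-byte training files, assuming the training data sits as flat files under one directory (the exact layout of the generated data may differ):

```python
import os

def list_files(path):
    """List training files under path, skipping zero-byte (empty) ones."""
    files = []
    for root, _, names in os.walk(path):
        for name in names:
            full = os.path.join(root, name)
            if os.path.getsize(full) > 0:  # filter out empty inputs
                files.append(full)
    return sorted(files)
```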
Check out this patch for all detailed changes.
Now train the speculative decoder model:

```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 accelerate launch -m --mixed_precision=bf16 eagle.train.main \
    --tmpdir /mnt/wd_ssd/sharegpt_0_67999_mufp16/ --cpdir ./ckpt --configpath ./llama_2_chat_7B_config.json \
    --basepath ~/.cache/huggingface/hub/models--meta-llama--Llama-2-7b-chat-hf/snapshots/f5db02db724555f92da89c216ac04704f23d4590 \
    --gradient-accumulation-steps 4 --bs 1
```

After training, use a simple test script to evaluate the speed of the saved model. For example, for the 10-epoch checkpoint, change ea_model_path to:

```python
ea_model_path='../EAGLE-v1/eagle/train/ckpt/model_9'
```
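One generic way to measure the speedup is to time two generation callables (e.g., baseline Hugging Face generate versus EAGLE's speculative generation) on the same prompt. This is a hedged sketch, not EAGLE's evaluation code; the generate_fn signature is illustrative:

```python
import time

def tokens_per_second(generate_fn, prompt, n_runs=3):
    """Best-of-n wall-clock decoding rate for a generate callable
    that takes a prompt and returns a list of token ids."""
    best = float("inf")
    n_tokens = 0
    for _ in range(n_runs):
        start = time.perf_counter()
        out_tokens = generate_fn(prompt)
        best = min(best, time.perf_counter() - start)
        n_tokens = len(out_tokens)
    return n_tokens / best
```

Running it on both the base model and the EAGLE-augmented model gives the relative speedup directly.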