@w32zhong
Last active February 10, 2025 22:09
Update (Feb 10, 2025): after completing the training steps below, use [a simple test script](https://github.com/w32zhong/EAGLE/blob/eagle-v1-save/application/test_v1.py) to evaluate the speed of the saved model. For example, for the 10-epoch checkpoint, change `ea_model_path` to
```py
ea_model_path='../EAGLE-v1/eagle/train/ckpt/model_9'
```
## EAGLE v1 Replication
Set up the environment and run an inference test:
```sh
git clone --branch v1 --depth 1 https://github.com/SafeAILab/EAGLE.git EAGLE-v1
cd EAGLE-v1
wget https://raw.githubusercontent.com/w32zhong/EAGLE/refs/heads/eagle-v1-save/application/test_v1.py -O eagle/application/test_v1.py
pip install -e .
pip install transformers==4.36.2
pip install accelerate==0.21.0
pip install datasets==3.2.0
cd eagle
CUDA_VISIBLE_DEVICES=0 python application/test_v1.py
```

Go to `eagle/ge_data/allocation.py` and change your training GPU allocations. For example, in my case,
```py
gpus=[[0, 1],[2, 3]]
```
Then replace `ge_data_all_vicuna.py` with `ge_data_all_llama2chat.py` in `allocation.py` to generate data for the Llama 2 chat base model.
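`allocation.py` launches one data-generation process per GPU group, each handling a contiguous slice of the ~68,000 ShareGPT samples. The exact logic lives in the repo; the idea is roughly the following sketch (the `split_range` helper is hypothetical, not the repo's code):

```python
def split_range(start, end, num_groups):
    """Partition the sample index range [start, end) into contiguous slices,
    one per GPU group; the last group absorbs any remainder."""
    per_group = (end - start) // num_groups
    slices = []
    for i in range(num_groups):
        s = start + i * per_group
        e = end if i == num_groups - 1 else s + per_group
        slices.append((s, e))
    return slices

gpus = [[0, 1], [2, 3]]                   # two groups of two GPUs each
print(split_range(0, 68000, len(gpus)))   # → [(0, 34000), (34000, 68000)]
```

With the two-group allocation above, each pair of GPUs processes half of the dataset in parallel.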

Go to `eagle/ge_data/ge_data_all_llama2chat.py` and change the following:
```py
# bigname="/home/hongyanz/scratch/weights/llama2chat/13B"
bigname="meta-llama/Llama-2-7b-chat-hf"
...
# ds = load_dataset('json', data_files="/home/hongyanz/scratch/data/ShareGPT_V4.3_unfiltered_cleaned_split.json")
ds = load_dataset(
    path="Aeala/ShareGPT_Vicuna_unfiltered",
    data_files=["ShareGPT_V4.3_unfiltered_cleaned_split.json"],
    revision='8b0048ad6ae8c22f46a78c15559dec98feef5539'
)
```
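For reference, each record in `ShareGPT_V4.3_unfiltered_cleaned_split.json` stores a conversation as a list of `{"from": "human"|"gpt", "value": ...}` turns, which the data-generation script walks to build training sequences. A minimal sketch of pairing turns under that schema (`to_pairs` is a hypothetical helper, not from the repo):

```python
def to_pairs(record):
    """Group a ShareGPT record's turns into (prompt, response) pairs,
    keeping only well-formed human→gpt exchanges."""
    turns = record["conversations"]
    pairs = []
    for i in range(0, len(turns) - 1, 2):
        if turns[i]["from"] == "human" and turns[i + 1]["from"] == "gpt":
            pairs.append((turns[i]["value"], turns[i + 1]["value"]))
    return pairs

example = {"conversations": [
    {"from": "human", "value": "Hi"},
    {"from": "gpt", "value": "Hello!"},
]}
print(to_pairs(example))  # → [('Hi', 'Hello!')]
```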

Run the following to generate training data:
```sh
cd ge_data
python -m eagle.ge_data.allocation --outdir /mnt/wd_ssd/
```
(`/mnt/wd_ssd` is my data storage directory.)

This will take a few hours and consume 756 GiB of disk space.
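Most of that space goes to the cached hidden states. A back-of-envelope check, assuming roughly one 4096-dimensional bf16 hidden state (Llama-2-7B's hidden size) per token dominates the storage:

```python
# Estimate the average conversation length implied by the observed disk usage.
samples = 68_000           # ShareGPT conversations processed
hidden_dim = 4096          # Llama-2-7B hidden size (assumption: one state per token)
bytes_per_value = 2        # bf16
total_bytes = 756 * 2**30  # observed 756 GiB

avg_tokens = total_bytes / (samples * hidden_dim * bytes_per_value)
print(round(avg_tokens))   # → 1457 tokens per conversation, a plausible average
```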

Change directory to `../train` and modify the wandb settings in `main.py`:
```py
#wandb.init(project="ess", entity="yuhui-li", config=train_config)
wandb.init(project="beagle", config=train_config)
```
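Alternatively, if you would rather disable logging entirely instead of repointing the project, wandb honors a standard environment variable (this is stock wandb behavior, not an EAGLE-specific setting):

```sh
# Turn every wandb.init()/wandb.log() call into a no-op for this shell session
export WANDB_MODE=disabled
```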

Importantly, change the `list_files` function to filter out empty training files (in my experience, about 0.5% of the generated inputs are empty), and skip all in-training tests to avoid potential divide-by-zero errors.
Check out [this patch](https://github.com/w32zhong/EAGLE/blob/eagle-v1-save/patch_v1.diff) for all detailed changes.
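The empty-file filter boils down to a size check when listing data files. A minimal sketch, assuming the generated files sit under the `--tmpdir` directory (see the linked patch for the actual change):

```python
import os

def list_files(path):
    """Collect training data files under `path`, skipping zero-byte ones.

    Empty files (~0.5% of the generated data in my runs) would otherwise
    crash the data loader, hence the getsize() guard.
    """
    datapath = []
    for root, _, files in os.walk(path):
        for fname in files:
            fpath = os.path.join(root, fname)
            if os.path.getsize(fpath) > 0:
                datapath.append(fpath)
    return sorted(datapath)
```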

Now train the speculative decoder model:
```sh
CUDA_VISIBLE_DEVICES=0,1,2,3 accelerate launch -m --mixed_precision=bf16 eagle.train.main \
--tmpdir /mnt/wd_ssd/sharegpt_0_67999_mufp16/ --cpdir ./ckpt --configpath ./llama_2_chat_7B_config.json \
--basepath ~/.cache/huggingface/hub/models--meta-llama--Llama-2-7b-chat-hf/snapshots/f5db02db724555f92da89c216ac04704f23d4590 \
--gradient-accumulation-steps 4 --bs 1
```
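With these flags, the effective global batch size works out to 16 (a quick sanity check, assuming the usual meaning of per-device batch size and gradient accumulation under `accelerate`):

```python
# Effective global batch size of the training command above.
num_gpus = 4        # CUDA_VISIBLE_DEVICES=0,1,2,3
per_device_bs = 1   # --bs 1
grad_accum = 4      # --gradient-accumulation-steps 4

effective_bs = num_gpus * per_device_bs * grad_accum
print(effective_bs)  # → 16
```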