@w32zhong
Last active February 10, 2025 22:09
## EAGLE v1 Replication
Set up environment and run an inference test:
```sh
git clone --branch v1 --depth 1 https://github.com/SafeAILab/EAGLE.git EAGLE-v1
cd EAGLE-v1
wget https://raw.githubusercontent.com/w32zhong/EAGLE/refs/heads/eagle-v1-save/application/test_v1.py -O eagle/application/test_v1.py
pip install -e .
pip install transformers==4.36.2
pip install accelerate==0.21.0
pip install datasets==3.2.0
cd eagle
CUDA_VISIBLE_DEVICES=0 python application/test_v1.py
```
Go to `eagle/ge_data/allocation.py` and adjust the GPU allocation used for training-data generation. For example, in my case:
```py
gpus=[[0, 1],[2, 3]]
```
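Each inner list is one data-generation worker and the GPUs it uses; `allocation.py` splits the sample index range across those workers. The splitting logic can be sketched roughly as follows (a hypothetical simplification, not the actual script):

```python
# Sketch: split a data index range across GPU groups, roughly what
# eagle/ge_data/allocation.py does (simplified, hypothetical).
gpus = [[0, 1], [2, 3]]  # two workers, two GPUs each

def split_range(start, end, num_workers):
    """Divide [start, end) into num_workers contiguous chunks."""
    per_worker = (end - start) // num_workers
    chunks = []
    for i in range(num_workers):
        lo = start + i * per_worker
        hi = end if i == num_workers - 1 else lo + per_worker
        chunks.append((lo, hi))
    return chunks

# ~68k ShareGPT samples split over len(gpus) workers
print(split_range(0, 68000, len(gpus)))  # → [(0, 34000), (34000, 68000)]
```

With two workers, the first handles samples 0–33999 on GPUs 0 and 1 while the second handles 34000–67999 on GPUs 2 and 3, which matches the `sharegpt_0_67999` output directory name used later.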
Then, in `allocation.py`, replace `ge_data_all_vicuna.py` with `ge_data_all_llama2chat.py` to generate data from the Llama-2 base model.
Go to `eagle/ge_data/ge_data_all_llama2chat.py` and change the following:
```py
# bigname="/home/hongyanz/scratch/weights/llama2chat/13B"
bigname="meta-llama/Llama-2-7b-chat-hf"
...
# ds = load_dataset('json', data_files="/home/hongyanz/scratch/data/ShareGPT_V4.3_unfiltered_cleaned_split.json")
ds = load_dataset(
    path="Aeala/ShareGPT_Vicuna_unfiltered",
    data_files=["ShareGPT_V4.3_unfiltered_cleaned_split.json"],
    revision='8b0048ad6ae8c22f46a78c15559dec98feef5539',
)
```
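For orientation, each record in this ShareGPT dump is a JSON object with an `id` and a `conversations` list of alternating turns; the field names below are assumed from the `ShareGPT_Vicuna_unfiltered` dataset, so verify them against your download:

```python
import json

# Minimal sketch of the ShareGPT record layout the data-generation
# script consumes (field names assumed, not taken from the EAGLE code).
sample = json.loads("""
{
  "id": "demo_0",
  "conversations": [
    {"from": "human", "value": "Hello!"},
    {"from": "gpt", "value": "Hi, how can I help?"}
  ]
}
""")

# Extract the (speaker, text) turns that the script pairs up
turns = [(t["from"], t["value"]) for t in sample["conversations"]]
print(turns)
```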
Run the following to generate training data:
```sh
cd ge_data
python -m eagle.ge_data.allocation --outdir /mnt/wd_ssd/
```
(`/mnt/wd_ssd` is my data storage directory.)
This will take a few hours and consume about 756 GiB of disk space.
Change directory to `../train` and modify the wandb settings in `main.py`:
```py
#wandb.init(project="ess", entity="yuhui-li", config=train_config)
wandb.init(project="beagle", config=train_config)
```
Importantly, change the `list_files` function to filter out empty training files (in my experience about 0.5% of the generated inputs are empty), and skip all in-training tests to avoid potential division-by-zero errors.
Check out [this patch](https://github.com/w32zhong/EAGLE/blob/eagle-v1-save/patch_v1.diff) for all detailed changes.
Now train the speculative decoder model:
```sh
CUDA_VISIBLE_DEVICES=0,1,2,3 accelerate launch -m --mixed_precision=bf16 eagle.train.main \
--tmpdir /mnt/wd_ssd/sharegpt_0_67999_mufp16/ --cpdir ./ckpt --configpath ./llama_2_chat_7B_config.json \
--basepath ~/.cache/huggingface/hub/models--meta-llama--Llama-2-7b-chat-hf/snapshots/f5db02db724555f92da89c216ac04704f23d4590 \
--gradient-accumulation-steps 4 --bs 1
```
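For reference, the effective global batch size implied by these flags is the per-device batch size times the gradient-accumulation steps times the number of visible GPUs:

```python
# Effective global batch size for the command above
num_gpus = 4          # CUDA_VISIBLE_DEVICES=0,1,2,3
per_device_bs = 1     # --bs 1
grad_accum = 4        # --gradient-accumulation-steps 4

effective_bs = num_gpus * per_device_bs * grad_accum
print(effective_bs)  # → 16
```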