vLLM + Granite Docling model: offline install, Docker image, and in-process inference
# 1) On a machine with internet access, download vLLM and its dependencies as wheels
#    (use the same Python version and platform as the offline target)
mkdir vllm_wheels
pip download vllm torch torchvision torchaudio --only-binary :all: -d vllm_wheels
# pip download also resolves and fetches the dependencies declared in vLLM's
# setup.py / pyproject.toml, so the wheel directory ends up self-contained

# 2) On the offline host, install from the local wheel directory only
cd vllm_wheels
pip install --no-index --find-links . vllm

# 3) Serve the model with the OpenAI-compatible API server
# Assuming model weights are at /opt/app/model_weights
python -m vllm.entrypoints.openai.api_server --model /opt/app/model_weights --host 0.0.0.0 --port 8000
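Once the server is up, any OpenAI-compatible client can query it. A minimal sketch with curl, assuming the defaults above (port 8000) and no --served-model-name override, in which case the model id exposed by the API is the path passed to --model:

# Query the OpenAI-compatible completions endpoint served by vLLM
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "/opt/app/model_weights", "prompt": "What is the capital of France?", "max_tokens": 100}'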
# Use an AWS Deep Learning Container (DLC) as a base, or a vLLM-specific image.
# Ensure the base image has the necessary CUDA drivers and PyTorch.
# Prefer a pinned tag that matches your CUDA version over :latest.
FROM vllm/vllm-openai:latest

# Copy the pre-downloaded model weights into the container image.
# Note: COPY sources are resolved inside the Docker build context, so the weights
# directory must be reachable under the context root when you run `docker build`.
COPY /mnt/models/granite-docling-258M /app/local_model

WORKDIR /app

# The vllm/vllm-openai base image already defines the api_server as its ENTRYPOINT,
# so override it here (rather than appending a CMD) and point --model at the local directory
ENTRYPOINT ["python3", "-m", "vllm.entrypoints.openai.api_server", "--model", "/app/local_model"]
# Offline (in-process) inference with vLLM, pointing at a local model directory
from vllm import LLM, SamplingParams

# Point to the local directory path
model_path = "/path/to/your/local/model/directory"

# Initialize the LLM engine
llm = LLM(model=model_path, trust_remote_code=True)  # trust_remote_code might be needed

# Define sampling parameters
sampling_params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=100)

# Generate predictions
prompts = ["What is the capital of France?", "The sky is what color?"]
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")