

👨‍💻 working on something really cool

Dhruvil Patel dhruvilp

dhruvilp / Dockerfile
Last active November 10, 2025 05:17
vllm docling granite model
# Use an AWS Deep Learning Container (DLC) as a base or a vLLM specific image
# Ensure the base image has the necessary CUDA drivers and PyTorch
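# Or pin a specific tag that matches your CUDA version. (Dockerfiles only treat
# "#" as a comment at the start of a line, so it cannot follow FROM.)
FROM vllm/vllm-openai:latest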
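# Copy the pre-downloaded model weights into the container image.
# COPY sources resolve relative to the build context, so this path must exist
# inside the context directory, not at the host's filesystem root.
COPY /mnt/models/granite-docling-258M /app/local_model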
WORKDIR /app
# The entrypoint command will use the local directory path for the --model argument
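Once the container is running, clients talk to vLLM's OpenAI-compatible chat-completions endpoint. The sketch below only builds such a request with the standard library, assuming the server listens on `localhost:8000` and was started with `--model /app/local_model` (so the served model name defaults to that path). The prompt string and the `build_request` helper are hypothetical, not taken from the gist.

```python
import base64
import json
import urllib.request

# Assumptions (not from the gist): vLLM's OpenAI-compatible server on port 8000,
# started with --model /app/local_model, so the served model name is that path.
BASE_URL = "http://localhost:8000/v1/chat/completions"
MODEL_NAME = "/app/local_model"


def build_request(page_png: bytes,
                  prompt: str = "Convert this page to docling.") -> urllib.request.Request:
    """Build (but do not send) a chat-completions request carrying one page image."""
    image_b64 = base64.b64encode(page_png).decode("ascii")
    payload = {
        "model": MODEL_NAME,
        "messages": [
            {
                "role": "user",
                "content": [
                    # Image passed inline as a base64 data URL (OpenAI message format).
                    {"type": "image_url",
                     "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
                    # Hypothetical prompt text; adjust to what the model expects.
                    {"type": "text", "text": prompt},
                ],
            }
        ],
        "max_tokens": 4096,
    }
    return urllib.request.Request(
        BASE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_request(b"placeholder PNG bytes")
    print(req.full_url, req.get_method())
    # To send it against a running server: urllib.request.urlopen(req)
```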
dhruvilp / notes-2.md
Created November 5, 2025 03:24
gpt-oss-20b-fine-tuning-q3-max-part-1

Here's a complete end-to-end script for fine-tuning the MXFP4-quantized MoE GPT-oss-20B model on your 4×A10G setup (4 × 24GB = 96GB of VRAM in total). It uses QLoRA to keep memory usage low and accounts for the MXFP4-quantized checkpoint format.

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Fine-tune MXFP4-quantized MoE GPT-oss-20B with QLoRA
Hardware: 4× NVIDIA A10G (24GB VRAM each)
Key Tech: bitsandbytes (4-bit; MXFP4 is not a bitsandbytes format), PEFT (QLoRA), FlashAttention-2, DeepSpeed ZeRO-3
"""
dhruvilp / Ft.py
Last active November 12, 2025 22:06
Oss ft
# train.py
# Run with: accelerate launch --num_processes 4 train.py
# Set up DDP first with `accelerate config`; without a saved config, accelerate launch falls back to defaults.
import os
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model, TaskType
from transformers import (
    AutoModelForCausalLM,
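`LoraConfig` and `get_peft_model` wrap the frozen base model with low-rank adapters: each targeted weight `W` (d_out × d_in) gets trainable matrices `B` (d_out × r) and `A` (r × d_in), and the update is scaled by `alpha / r`. The sketch below counts how few parameters that trains; the 4096 × 4096 layer size is a hypothetical example, not gpt-oss's actual dimension.

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in linear layer.

    W stays frozen; LoRA learns B (d_out x r) and A (r x d_in), giving the
    update delta_W = (alpha / r) * B @ A with r * (d_in + d_out) values.
    """
    return r * (d_in + d_out)


def full_param_count(d_in: int, d_out: int) -> int:
    """Parameters in the frozen weight matrix W itself."""
    return d_in * d_out


if __name__ == "__main__":
    d = 4096  # hypothetical square projection; use the model's real sizes
    full = full_param_count(d, d)
    for r in (8, 16, 64):
        lora = lora_param_count(d, d, r)
        print(f"r={r:>3}: {lora:,} trainable vs {full:,} frozen ({lora / full:.4%})")
```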
dhruvilp / README.md
Created October 21, 2025 03:56
docling plain parallel processing

Granite Docling Document Converter

A high-performance, parallel-processing library for converting documents to Markdown, JSON, and DocTags using the Granite Docling model. No FastAPI, Flask, or other web framework is required: it is a pure Python library with sync and async support.

🚀 Features

  • No Web Framework Required: Pure Python library; use it directly in your code
  • Parallel Processing: Split large PDFs across multiple workers to reduce conversion time
  • Async Support: Full async/await support for non-blocking operations
  • Multiple Output Formats: Convert to Markdown, JSON, or DocTags
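The parallel-worker and async features above can be sketched in a few lines. This is one possible design, not the library's actual implementation: split the page count into contiguous ranges, convert each range in a worker, and join the results in page order. `split_pages`, `convert_parallel`, `convert_parallel_async` and the `convert_range` callback are hypothetical names. A thread pool is used because it accepts any callable (including lambdas), whereas a process pool would need picklable, top-level functions.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

PageRange = Tuple[int, int]  # 1-based, inclusive


def split_pages(num_pages: int, num_workers: int) -> List[PageRange]:
    """Split pages 1..num_pages into at most num_workers contiguous ranges."""
    if num_pages <= 0:
        return []
    num_workers = max(1, min(num_workers, num_pages))
    base, extra = divmod(num_pages, num_workers)
    ranges, start = [], 1
    for i in range(num_workers):
        size = base + (1 if i < extra else 0)  # spread the remainder over the first ranges
        ranges.append((start, start + size - 1))
        start += size
    return ranges


def convert_parallel(num_pages: int, convert_range: Callable[[PageRange], str],
                     num_workers: int = 4) -> str:
    """Convert each page range in a worker thread and join results in page order."""
    ranges = split_pages(num_pages, num_workers)
    with ThreadPoolExecutor(max_workers=len(ranges) or 1) as pool:
        parts = list(pool.map(convert_range, ranges))  # map preserves input order
    return "\n\n".join(parts)


async def convert_parallel_async(num_pages: int, convert_range: Callable[[PageRange], str],
                                 num_workers: int = 4) -> str:
    """Async entry point: run the blocking conversion off the event loop."""
    return await asyncio.to_thread(convert_parallel, num_pages, convert_range, num_workers)


if __name__ == "__main__":
    def fake_convert(r: PageRange) -> str:  # stand-in for a real model call
        return f"<!-- pages {r[0]}-{r[1]} -->"

    print(split_pages(10, 4))
    print(asyncio.run(convert_parallel_async(10, fake_convert)))
```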
dhruvilp / README.md
Last active November 3, 2025 23:26
granite-docling-258m inference
[
  {
    "content": "reasoning language: English\n\nYou are an intelligent assistant that can answer customer service queries",
    "role": "system",
    "thinking": null
  },
  {
    "content": "Can you provide me with a list of the top-rated series currently on Netflix?",
 "role": "user",
dhruvilp / quant-gpt-oss.py
Last active October 15, 2025 20:44
quant gpt oss local
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import time
model_path = './gpt-oss-model-local'
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
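`bnb_4bit_use_double_quant=True` quantizes the quantization constants themselves. Using the QLoRA paper's accounting (one 32-bit scale per 64-weight block, versus 8-bit scales plus one 32-bit scale per 256 of those scales), the sketch below works out the per-parameter overhead in each case. The 21B parameter count is gpt-oss-20b's total and is only illustrative: it assumes every weight is 4-bit quantized, which may not match how this checkpoint actually loads.

```python
# Block sizes as used in the QLoRA paper's overhead accounting.
BLOCK_SIZE = 64           # weights per first-level block
SECOND_LEVEL_BLOCK = 256  # first-level scales per second-level block


def overhead_bits_single() -> float:
    """One fp32 absmax scale per 64-weight block."""
    return 32 / BLOCK_SIZE


def overhead_bits_double() -> float:
    """8-bit first-level scales, plus one fp32 scale per 256 of those scales."""
    return 8 / BLOCK_SIZE + 32 / (BLOCK_SIZE * SECOND_LEVEL_BLOCK)


if __name__ == "__main__":
    saved = overhead_bits_single() - overhead_bits_double()
    params = 21e9  # illustrative: assumes every weight were 4-bit quantized
    print(f"single: {overhead_bits_single():.4f} bits/param")
    print(f"double: {overhead_bits_double():.4f} bits/param")
    print(f"saved:  {saved:.4f} bits/param, about {params * saved / 8 / 1e9:.2f} GB")
```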
dhruvilp / data-gen-script-gen-qa.py
Last active October 13, 2025 03:22
gpt-oss-20b-ft-lora-sample-code
import json
import pandas as pd
from langchain.text_splitter import RecursiveCharacterTextSplitter
from openai import OpenAI
import os
from typing import List, Dict, Optional
import random
class SyntheticDataGenerator:
    def __init__(self, api_key: Optional[str] = None, model: str = "gpt-4"):
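`RecursiveCharacterTextSplitter` splits on a list of separators (paragraphs, lines, spaces, then characters) while respecting a chunk size and overlap. The sketch below is a deliberately simpler stand-in, fixed-size character windows with overlap, plus a seeded train/validation split for the generated QA pairs. Both function names are hypothetical and not part of the gist.

```python
import random
from typing import List, Tuple


def split_with_overlap(text: str, chunk_size: int, overlap: int) -> List[str]:
    """Fixed-size character windows; consecutive chunks share `overlap` chars."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    chunks: List[str] = []
    step = chunk_size - overlap
    start = 0
    while start < len(text):
        end = start + chunk_size
        chunks.append(text[start:end])
        if end >= len(text):  # last window reached the end of the text
            break
        start += step
    return chunks


def train_val_split(items: List[dict], val_fraction: float = 0.1,
                    seed: int = 42) -> Tuple[List[dict], List[dict]]:
    """Shuffle a copy with a fixed seed, then hold out val_fraction of it."""
    shuffled = items[:]
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]


if __name__ == "__main__":
    print(split_with_overlap("abcdefghij", chunk_size=4, overlap=1))
    qa = [{"question": f"q{i}", "answer": f"a{i}"} for i in range(20)]  # placeholder pairs
    train, val = train_val_split(qa)
    print(len(train), len(val))
```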
dhruvilp / webpageloader.py
Created October 6, 2025 21:06
crawl4ai web page loader
import asyncio
import json
import os
from base64 import b64decode
from typing import List, Dict, Optional, Any
from pydantic import BaseModel, Field
from crawl4ai import (
    AsyncWebCrawler, BrowserConfig, CrawlerRunConfig, CacheMode,
    JsonCssExtractionStrategy, LLMExtractionStrategy, LLMConfig,
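`AsyncWebCrawler` is asynchronous, so many pages can be fetched concurrently. A common pattern, sketched here under the assumption that each page is loaded by some awaitable `fetch(url)` (a hypothetical stand-in for a crawler call), is to cap in-flight requests with an `asyncio.Semaphore`. crawl4ai also offers its own batch method (`arun_many`), which may be the better fit; check its documentation.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List


async def crawl_all(urls: List[str], fetch: Callable[[str], Awaitable[str]],
                    max_concurrency: int = 5) -> Dict[str, str]:
    """Run fetch(url) for every URL, with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(url: str) -> str:
        async with semaphore:  # waits while max_concurrency fetches are running
            return await fetch(url)

    results = await asyncio.gather(*(bounded(u) for u in urls))
    return dict(zip(urls, results))  # gather preserves input order


if __name__ == "__main__":
    async def fake_fetch(url: str) -> str:  # stand-in for a real crawler call
        await asyncio.sleep(0.01)
        return f"# markdown for {url}"

    urls = [f"https://example.com/{i}" for i in range(10)]
    pages = asyncio.run(crawl_all(urls, fake_fetch, max_concurrency=3))
    print(len(pages))
```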
dhruvilp / t_to_sb.txt
Created March 25, 2025 03:49
Tomcat to Spring Boot
Inference Providers (Fireworks), text generation with deepseek-ai/DeepSeek-V3-0324:
How can I convert an app running on a Tomcat 8 (Catalina) server to a Spring Boot app with JDK 17?
dhruvilp / thinking_tokens.py
Created February 18, 2025 16:01 — forked from zainhas/thinking_tokens.py
Extract ONLY thinking tokens from DeepSeek-R1
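import os
from together import Together
# Read the key from the environment instead of an undefined variable.
client = Together(api_key=os.environ["TOGETHER_API_KEY"])
question = "Which is larger 9.9 or 9.11?"
thought = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",
    messages=[{"role": "user", "content": question}],
    # Stop at the closing tag so only the reasoning section is generated.
    stop=["</think>"],
)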
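Stopping at `</think>` returns only the reasoning. If you instead want both parts from one full completion, you can split the text after the fact. This sketch assumes the model wraps its reasoning in `<think>...</think>` and tolerates a missing opening tag, since some chat templates may place `<think>` in the prompt rather than the output; `split_thinking` is a hypothetical helper.

```python
THINK_OPEN = "<think>"
THINK_CLOSE = "</think>"


def split_thinking(text: str) -> tuple[str, str]:
    """Return (reasoning, answer) from a reply that wraps reasoning in <think> tags.

    With no closing tag, the whole reply is treated as the answer.
    A missing opening tag is tolerated.
    """
    if THINK_CLOSE not in text:
        return "", text.strip()
    reasoning, answer = text.split(THINK_CLOSE, 1)
    reasoning = reasoning.replace(THINK_OPEN, "", 1)  # drop the opening tag if present
    return reasoning.strip(), answer.strip()


if __name__ == "__main__":
    reply = "<think>\n9.11 has fewer tenths than 9.9.\n</think>\n\n9.9 is larger."
    print(split_thinking(reply))
```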