@arun-gupta
Last active September 14, 2024 21:20

Revisions

  1. arun-gupta revised this gist Sep 14, 2024. 1 changed file with 3 additions and 1 deletion.
    4 changes: 3 additions & 1 deletion readme.md
    @@ -202,4 +202,6 @@ Validate the services as explained in [OPEA on AWS document](https://gist.github
    data: b'.'
    data: b' It'
    ```

    <img width="1018" alt="image" src="https://gist.github.com/user-attachments/assets/d52bb8dc-a319-4664-b50e-dd00776a064e">
  2. arun-gupta revised this gist Sep 14, 2024. 1 changed file with 24 additions and 28 deletions.
    52 changes: 24 additions & 28 deletions readme.md
    @@ -79,7 +79,7 @@ sudo apt-get -y install docker-ce docker-ce-cli containerd.io docker-buildx-plug
    sudo docker compose -f compose.yaml up -d
    ```
    - Verify the list of containers:
    ```
    sudo docker container ls
    CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
    f2a6fa5ea3b7 opea/chatqna-ui:latest "docker-entrypoint.s…" 12 seconds ago Up 10 seconds 0.0.0.0:5173->5173/tcp, :::5173->5173/tcp chatqna-xeon-ui-server
    @@ -115,19 +115,19 @@ Validate the services as explained in [OPEA on AWS document](https://gist.github
    It takes ~5 minutes for this service to be ready. Wait until you see this log output:
    ```
    . . .
    2024-09-12T02:14:07.324250Z INFO shard-manager: text_generation_launcher: Shard ready in 20.620625696s rank=0
    2024-09-12T02:14:07.380398Z INFO text_generation_launcher: Starting Webserver
    2024-09-12T02:14:07.526375Z INFO text_generation_router_v3: backends/v3/src/lib.rs:90: Warming up model
    2024-09-12T02:14:42.046106Z INFO text_generation_launcher: Cuda Graphs are disabled (CUDA_GRAPHS=None).
    2024-09-12T02:14:42.046591Z INFO text_generation_router_v3: backends/v3/src/lib.rs:102: Setting max batch total tokens to 83088
    2024-09-12T02:14:42.047332Z INFO text_generation_router_v3: backends/v3/src/lib.rs:126: Using backend V3
    2024-09-12T02:14:42.051963Z INFO text_generation_router::server: router/src/server.rs:1651: Using the Hugging Face API
    2024-09-12T02:14:42.066054Z INFO hf_hub: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/hf-hub-0.3.2/src/lib.rs:55: Token file not found "/root/.cache/huggingface/token"
    2024-09-12T02:14:42.473427Z INFO text_generation_router::server: router/src/server.rs:2349: Serving revision 52d98fdf3b050731f1476f1af6efc7970f14c67e of model Intel/neural-chat-7b-v3-3
    2024-09-12T02:14:42.516228Z INFO text_generation_router::server: router/src/server.rs:1747: Overriding LlamaTokenizer with TemplateProcessing to follow python override defined in https://github.com/huggingface/transformers/blob/4aa17d00690b7f82c95bb2949ea57e22c35b4336/src/transformers/models/llama/tokenization_llama_fast.py#L203-L205
    2024-09-12T02:14:42.516696Z INFO text_generation_router::server: router/src/server.rs:1781: Using config Some(Mistral)
    2024-09-12T02:14:42.516736Z WARN text_generation_router::server: router/src/server.rs:1928: Invalid hostname, defaulting to 0.0.0.0
    2024-09-12T02:14:42.528179Z INFO text_generation_router::server: router/src/server.rs:2311: Connected
    2024-09-14T20:38:05.558334Z INFO shard-manager: text_generation_launcher: Shard ready in 35.550264586s rank=0
    2024-09-14T20:38:05.639996Z INFO text_generation_launcher: Starting Webserver
    2024-09-14T20:38:05.708611Z INFO text_generation_router_v3: backends/v3/src/lib.rs:90: Warming up model
    2024-09-14T20:54:53.025600Z INFO text_generation_launcher: Cuda Graphs are disabled (CUDA_GRAPHS=None).
    2024-09-14T20:54:53.026040Z INFO text_generation_router_v3: backends/v3/src/lib.rs:102: Setting max batch total tokens to 80240
    2024-09-14T20:54:53.026618Z INFO text_generation_router_v3: backends/v3/src/lib.rs:126: Using backend V3
    2024-09-14T20:54:53.029554Z INFO text_generation_router::server: router/src/server.rs:1651: Using the Hugging Face API
    2024-09-14T20:54:53.037101Z INFO hf_hub: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/hf-hub-0.3.2/src/lib.rs:55: Token file not found "/root/.cache/huggingface/token"
    2024-09-14T20:54:53.467570Z INFO text_generation_router::server: router/src/server.rs:2349: Serving revision 52d98fdf3b050731f1476f1af6efc7970f14c67e of model Intel/neural-chat-7b-v3-3
    2024-09-14T20:54:53.513362Z INFO text_generation_router::server: router/src/server.rs:1747: Overriding LlamaTokenizer with TemplateProcessing to follow python override defined in https://github.com/huggingface/transformers/blob/4aa17d00690b7f82c95bb2949ea57e22c35b4336/src/transformers/models/llama/tokenization_llama_fast.py#L203-L205
    2024-09-14T20:54:53.513655Z INFO text_generation_router::server: router/src/server.rs:1781: Using config Some(Mistral)
    2024-09-14T20:54:53.513707Z WARN text_generation_router::server: router/src/server.rs:1928: Invalid hostname, defaulting to 0.0.0.0
    2024-09-14T20:54:53.523637Z INFO text_generation_router::server: router/src/server.rs:2311: Connected
    ```

    ### Let's run!
    @@ -162,14 +162,21 @@ Validate the services as explained in [OPEA on AWS document](https://gist.github
    curl -X POST "http://${host_ip}:6007/v1/dataprep" \
    -H "Content-Type: multipart/form-data" \
    -F 'link_list=["https://opea.dev"]'
    ```
    with the answer:
    ```
    {"status":200,"message":"Data preparation succeeded"}
    {"status":200,"message":"Data preparation succeeded"}status:200: command not found
    ```
    - Ask the question:
    ```
    arun_gupta@opea-demo:~$ curl http://${host_ip}:8888/v1/chatqna -H "Content-Type: application/json" -d '{
    curl http://${host_ip}:8888/v1/chatqna -H "Content-Type: application/json" -d '{
    "messages": "What is OPEA?"
    }'
    ```

    with the answer:

    ```
    data: b'\n'
    data: b'O'
    @@ -195,15 +202,4 @@ Validate the services as explained in [OPEA on AWS document](https://gist.github
    data: b'.'
    data: b' It'
    ```
    - Delete link from the knowledge base:
    ```
    [ec2-user@ip-172-31-77-194 ~]$ # delete link
    curl -X POST "http://${host_ip}:6007/v1/dataprep/delete_file" \
    -d '{"file_path": "https://opea.dev"}' \
    -H "Content-Type: application/json"
    {"detail":"File https://opea.dev not found. Please check file_path."}
    ```

    This fails with an error, tracked at https://github.com/opea-project/GenAIExamples/issues/724

  3. arun-gupta revised this gist Sep 14, 2024. No changes.
  4. arun-gupta revised this gist Sep 14, 2024. 1 changed file with 18 additions and 16 deletions.
    34 changes: 18 additions & 16 deletions readme.md
    @@ -20,22 +20,24 @@

    ### Install Docker:

    ```
    # Add Docker's official GPG key:
    sudo apt-get -y update
    sudo apt-get -y install ca-certificates curl
    sudo install -m 0755 -d /etc/apt/keyrings
    sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
    sudo chmod a+r /etc/apt/keyrings/docker.asc
    # Add the repository to Apt sources:
    echo \
    "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
    $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
    sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
    sudo apt-get -y update
    sudo apt-get -y install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
    ```
    NOTE: Pasting the entire command block at once did not work; the commands had to be copied line by line.

    ```
    # Add Docker's official GPG key:
    sudo apt-get -y update
    sudo apt-get -y install ca-certificates curl
    sudo install -m 0755 -d /etc/apt/keyrings
    sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
    sudo chmod a+r /etc/apt/keyrings/docker.asc
    # Add the repository to Apt sources:
    echo \
    "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
    $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
    sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
    sudo apt-get -y update
    sudo apt-get -y install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
    ```
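Before running the install, the apt source line that `tee` writes can be previewed without touching the system. This is a small sketch with fallbacks (`amd64`, `noble`) for when `dpkg` or `/etc/os-release` is unavailable:

```shell
# Preview the Docker apt source line without writing anything.
# Falls back to amd64/noble when dpkg or /etc/os-release is unavailable.
arch=$(dpkg --print-architecture 2>/dev/null || echo amd64)
codename=$( (. /etc/os-release 2>/dev/null && echo "$VERSION_CODENAME") || echo noble )
echo "deb [arch=${arch} signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu ${codename:-noble} stable"
```

Comparing this output against the contents of `/etc/apt/sources.list.d/docker.list` is a quick way to spot a line-by-line copy that went wrong.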

    ## Docker images

  5. arun-gupta revised this gist Sep 14, 2024. 1 changed file with 6 additions and 11 deletions.
    17 changes: 6 additions & 11 deletions readme.md
    @@ -5,23 +5,18 @@
    ### Ubuntu 24.04

    - https://portal.azure.com/
    - Name: `opea-demo`
    - Region: `(US) West US 2`
    - Availability zone: `Zone 2`
    - Image: `Ubuntu Server 24.04 LTS - x64 Gen2`
    - Size: `Standard_D8s_v4` (8 vcpus, 32 GiB memory)=
    - Key pair name: `azure-opea-key`
    - Click on `Next : Disks>`
    - Size: `Standard_D8s_v4` (8 vcpus, 32 GiB memory)
    - Key pair name: `azure-opea-demo`
    - Click on `Next : Disks >`
    - Choose OS disk size as `512 GB (p20)`
    - Select `Review + Create`
    - Once you see the message `Validation passed`, click on `Create` button
    - Click on `Download private key and create resource`

    ### Connect

    - Click on `Go to resource`
    - Click on `Connect`
    - Click on `Select` in `SSH using Azure CLI`

    - Click on `Download private key and create resource`, `Go to resource`
    - Click on `Connect` on top left, `Select` in `SSH using Azure CLI`

    ### Install Docker:

  6. arun-gupta revised this gist Sep 14, 2024. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion readme.md
    @@ -8,7 +8,7 @@
    - Region: `(US) West US 2`
    - Availability zone: `Zone 2`
    - Image: `Ubuntu Server 24.04 LTS - x64 Gen2`
    - Size: `Standard_D8s_v4`
    - Size: `Standard_D8s_v4` (8 vcpus, 32 GiB memory)=
    - Key pair name: `azure-opea-key`
    - Click on `Next : Disks>`
    - Choose OS disk size as `512 GB (p20)`
  7. arun-gupta revised this gist Sep 14, 2024. 1 changed file with 1 addition and 194 deletions.
    195 changes: 1 addition & 194 deletions readme.md
    @@ -106,85 +106,7 @@ Export `host_ip` environment variable:
    export host_ip=10.0.0.4
    ```

    ### Embedding service

    Test:

    ```
    curl ${host_ip}:6006/embed \
    -X POST \
    -d '{"inputs":"What is Deep Learning?"}' \
    -H 'Content-Type: application/json'
    ```

    Answer:
    ```
    [[0.00037115702,-0.06356819,0.0024758505,-0.012360337,0.050739925,0.023380278,0.022216318,0.0008076447,
    . . .
    0.022558564,-0.04570635,-0.033072025,0.022725677,0.016026087,-0.02125421,-0.02984927,-0.0049473033]]
    ```

    ### Embedding microservice
    Test:
    ```
    curl http://${host_ip}:6000/v1/embeddings\
    -X POST \
    -d '{"text":"hello"}' \
    -H 'Content-Type: application/json'
    ```

    Answer:
    ```
    {"id":"2d6bbb69f440491249e672d6039dfd5f","text":"hello","embedding":[0.0007791813,0.042613804
    . . .
    -0.0044034636],"search_type":"similarity","k":4,"distance_threshold":null,"fetch_k":20,"lambda_mult":0.5,"score_threshold":0.2}
    ```

    ### Retriever microservice

    Test:
    ```
    export your_embedding=$(python3 -c "import random; embedding = [random.uniform(-1, 1) for _ in range(768)]; print(embedding)")
    curl http://${host_ip}:7000/v1/retrieval \
    -X POST \
    -d "{\"text\":\"test\",\"embedding\":${your_embedding}}" \
    -H 'Content-Type: application/json'
    ```

    Answer:
    ```
    {"id":"429f21479c99008b1ade5d8720cc60dc","retrieved_docs":[],"initial_query":"test","top_n":1}
    ```

    ### TEI Reranking service

    Test:
    ```
    curl http://${host_ip}:8808/rerank \
    -X POST \
    -d '{"query":"What is Deep Learning?", "texts": ["Deep Learning is not...", "Deep learning is..."]}' \
    -H 'Content-Type: application/json'
    ```
    Answer:
    ```
    [{"index":1,"score":0.94238955},{"index":0,"score":0.120219156}]
    ```

    ### Reranking microservice

    Test:
    ```
    curl http://${host_ip}:8000/v1/reranking\
    -X POST \
    -d '{"initial_query":"What is Deep Learning?", "retrieved_docs": [{"text":"Deep Learning is not..."}, {"text":"Deep learning is..."}]}' \
    -H 'Content-Type: application/json'
    ```
    Answer:
    ```
    {"id":"65a489a9fae807039905008dce80ef6b","model":null,"query":"What is Deep Learning?","max_new_tokens":1024,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true,"chat_template":null,"documents":["Deep learning is..."]}
    ```
    Validate the services as explained in [OPEA on AWS document](https://gist.github.com/arun-gupta/7e9f080feff664fbab878b26d13d83d7).

    ### LLM Backend Service

    @@ -210,121 +132,6 @@ Answer:
    2024-09-12T02:14:42.516736Z WARN text_generation_router::server: router/src/server.rs:1928: Invalid hostname, defaulting to 0.0.0.0
    2024-09-12T02:14:42.528179Z INFO text_generation_router::server: router/src/server.rs:2311: Connected
    ```
    - Check TGI service:
    ```
    # TGI service
    curl http://${host_ip}:9009/generate \
    -X POST \
    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":17, "do_sample": true}}' \
    -H 'Content-Type: application/json'
    ```
    with the response:
    ```
    {"generated_text":"\n\nDeep Learning is a subset of machine learning which focuses on algorithms that learn from"}
    ```
    - Check vLLM service:
    ```
    curl http://${host_ip}:9009/v1/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "Intel/neural-chat-7b-v3-3", "prompt": "What is Deep Learning?", "max_tokens": 32, "temperature": 0}'
    ```
    with the response:
    ```
    {"object":"text_completion","id":"","created":1726117774,"model":"Intel/neural-chat-7b-v3-3","system_fingerprint":"2.2.1-dev0-sha-e4201f4-intel-cpu","choices":[{"index":0,"text":"\n\nDeep Learning is a subset of Machine Learning that is concerned with algorithms inspired by the structure and function of the brain. It is a part of Artificial","logprobs":null,"finish_reason":"length"}],"usage":{"prompt_tokens":6,"completion_tokens":32,"total_tokens":38}}
    ```
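To read just the generated text rather than the whole JSON body, the response can be piped through a one-line `python3` parser. A minimal sketch, with the sample TGI response above inlined as a variable; in practice pipe `curl`'s output instead:

```shell
# Sketch: extract generated_text from a TGI /generate response.
# The JSON here is a sample; pipe the real curl output in practice.
response='{"generated_text":"\n\nDeep Learning is a subset of machine learning which focuses on algorithms that learn from"}'
echo "$response" | python3 -c 'import json,sys; print(json.load(sys.stdin)["generated_text"].strip())'
```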

    ### LLM microservice

    Test:

    ```
    curl http://${host_ip}:9000/v1/chat/completions\
    -X POST \
    -d '{"query":"What is Deep Learning?","max_new_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \
    -H 'Content-Type: application/json'
    ```

    Answer:
    ```
    data: b'\n'
    data: b'\n'
    data: b'Deep'
    data: b' learning'
    data: b' is'
    data: b' a'
    data: b' subset'
    data: b' of'
    data: b' machine'
    data: b' learning'
    data: b' that'
    data: b' uses'
    data: b' algorithms'
    data: b' to'
    data: b' learn'
    data: b' from'
    data: b' data'
    data: [DONE]
    ```

    ### Megaservice

    Test:
    ```
    curl http://${host_ip}:8888/v1/chatqna -H "Content-Type: application/json" -d '{
    "messages": "What is the revenue of Nike in 2023?"
    }'
    ```

    Answer:
    ```
    data: b'\n'
    data: b'\n'
    data: b'N'
    data: b'ike'
    data: b"'"
    data: b's'
    data: b' revenue'
    data: b' for'
    . . .
    data: b' popularity'
    data: b' among'
    data: b' consumers'
    data: b'.'
    data: b'</s>'
    data: [DONE]
    ```
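The streamed `data: b'...'` lines can be stitched back into plain text with standard tools. A sketch with a few sample lines inlined; in practice pipe the `curl` output instead:

```shell
# Sketch: reassemble a token stream into plain text.
# Extract the quoted token from each data line, join, drop literal \n escapes.
printf "%s\n" "data: b'\\n'" "data: b'Deep'" "data: b' learning'" "data: [DONE]" |
  sed -n "s/^data: b'\\(.*\\)'\$/\\1/p" | tr -d '\n' | sed 's/\\n//g'
```

The `[DONE]` sentinel does not match the pattern, so it is dropped automatically.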


    ### Let's run!

  8. arun-gupta revised this gist Sep 14, 2024. 1 changed file with 12 additions and 12 deletions.
    24 changes: 12 additions & 12 deletions readme.md
    @@ -83,19 +83,19 @@
    ```
    - Verify the list of containers:
    ```
    $ sudo docker container ls
    sudo docker container ls
    CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
    dbd94a818b0d opea/chatqna-ui:latest "docker-entrypoint.s…" 24 seconds ago Up 22 seconds 0.0.0.0:5173->5173/tcp, :::5173->5173/tcp chatqna-xeon-ui-server
    3433b05a6a0b opea/chatqna:latest "python chatqna.py" 24 seconds ago Up 22 seconds 0.0.0.0:8888->8888/tcp, :::8888->8888/tcp chatqna-xeon-backend-server
    3c5c036ae59b opea/dataprep-redis:latest "python prepare_doc_…" 25 seconds ago Up 23 seconds 0.0.0.0:6007->6007/tcp, :::6007->6007/tcp dataprep-redis-server
    08fad4a403cc opea/retriever-redis:latest "python retriever_re…" 25 seconds ago Up 23 seconds 0.0.0.0:7000->7000/tcp, :::7000->7000/tcp retriever-redis-server
    110c686f9c9c opea/llm-tgi:latest "bash entrypoint.sh" 25 seconds ago Up 23 seconds 0.0.0.0:9000->9000/tcp, :::9000->9000/tcp llm-tgi-server
    40cc0fcd293e opea/reranking-tei:latest "python reranking_te…" 25 seconds ago Up 23 seconds 0.0.0.0:8000->8000/tcp, :::8000->8000/tcp reranking-tei-xeon-server
    696959d09c87 opea/embedding-tei:latest "python embedding_te…" 25 seconds ago Up 23 seconds 0.0.0.0:6000->6000/tcp, :::6000->6000/tcp embedding-tei-server
    33549bbb37c3 ghcr.io/huggingface/text-generation-inference:sha-e4201f4-intel-cpu "text-generation-lau…" 25 seconds ago Up 23 seconds 0.0.0.0:9009->80/tcp, [::]:9009->80/tcp tgi-service
    6d48620d2958 redis/redis-stack:7.2.0-v9 "/entrypoint.sh" 25 seconds ago Up 23 seconds 0.0.0.0:6379->6379/tcp, :::6379->6379/tcp, 0.0.0.0:8001->8001/tcp, :::8001->8001/tcp redis-vector-db
    e1e2e862df01 ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 "text-embeddings-rou…" 25 seconds ago Up 23 seconds 0.0.0.0:8808->80/tcp, [::]:8808->80/tcp tei-reranking-server
    958e04b00fa8 ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 "text-embeddings-rou…" 25 seconds ago Up 23 seconds 0.0.0.0:6006->80/tcp, [::]:6006->80/tcp tei-embedding-server
    f2a6fa5ea3b7 opea/chatqna-ui:latest "docker-entrypoint.s…" 12 seconds ago Up 10 seconds 0.0.0.0:5173->5173/tcp, :::5173->5173/tcp chatqna-xeon-ui-server
    c88745a81f54 opea/chatqna:latest "python chatqna.py" 12 seconds ago Up 10 seconds 0.0.0.0:8888->8888/tcp, :::8888->8888/tcp chatqna-xeon-backend-server
    00f9b2f5c296 opea/dataprep-redis:latest "python prepare_doc_…" 12 seconds ago Up 11 seconds 0.0.0.0:6007->6007/tcp, :::6007->6007/tcp dataprep-redis-server
    886350aea6fc opea/llm-tgi:latest "bash entrypoint.sh" 12 seconds ago Up 11 seconds 0.0.0.0:9000->9000/tcp, :::9000->9000/tcp llm-tgi-server
    018d363ed61b opea/retriever-redis:latest "python retriever_re…" 12 seconds ago Up 11 seconds 0.0.0.0:7000->7000/tcp, :::7000->7000/tcp retriever-redis-server
    826d5ec265f3 opea/embedding-tei:latest "python embedding_te…" 12 seconds ago Up 11 seconds 0.0.0.0:6000->6000/tcp, :::6000->6000/tcp embedding-tei-server
    ef4e354cf4cb opea/reranking-tei:latest "python reranking_te…" 12 seconds ago Up 11 seconds 0.0.0.0:8000->8000/tcp, :::8000->8000/tcp reranking-tei-xeon-server
    b2af32528f92 ghcr.io/huggingface/text-generation-inference:sha-e4201f4-intel-cpu "text-generation-lau…" 12 seconds ago Up 11 seconds 0.0.0.0:9009->80/tcp, [::]:9009->80/tcp tgi-service
    ffd17623f9a2 redis/redis-stack:7.2.0-v9 "/entrypoint.sh" 12 seconds ago Up 11 seconds 0.0.0.0:6379->6379/tcp, :::6379->6379/tcp, 0.0.0.0:8001->8001/tcp, :::8001->8001/tcp redis-vector-db
    52f70df956a2 ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 "text-embeddings-rou…" 12 seconds ago Up 11 seconds 0.0.0.0:8808->80/tcp, [::]:8808->80/tcp tei-reranking-server
    6cd64dca38c1 ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 "text-embeddings-rou…" 12 seconds ago Up 11 seconds 0.0.0.0:6006->80/tcp, [::]:6006->80/tcp tei-embedding-server
    ```

    ## Validate Services
  9. arun-gupta revised this gist Sep 14, 2024. 1 changed file with 19 additions and 21 deletions.
    40 changes: 19 additions & 21 deletions readme.md
    @@ -9,23 +9,21 @@
    - Availability zone: `Zone 2`
    - Image: `Ubuntu Server 24.04 LTS - x64 Gen2`
    - Size: `Standard_D8s_v4`
    - Key pair name: `opea-demo-key`
    - Key pair name: `azure-opea-key`
    - Click on `Next : Disks>`
    - Choose OS disk size as `512 GB (p20)`
    - Select `Review + Create`
    - Once you see the message `Validation passed`, click on `Create` button
    - Click on `Download private key and create resource`


    ### Connect

    - Click on `Go to resource`
    - Click on `Connect`
    - Click on `Select` in `SSH using Azure CLI`


    - Ubuntu 24.04 LTS

    - Install Docker:
    ### Install Docker:

    ```
    # Add Docker's official GPG key:
    @@ -53,7 +51,7 @@
    ```
    - Replace HuggingFace API token and private IP address of the host below and copy the contents in a file named `.env`:
    ```
    host_ip=10.128.0.3 #private IP address of the host
    host_ip=10.0.0.4 #private IP address of the host
    no_proxy=${host_ip}
    HUGGINGFACEHUB_API_TOKEN="Your_Huggingface_API_Token"
    EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5"
    @@ -84,28 +82,28 @@
    sudo docker compose -f compose.yaml up -d
    ```
    - Verify the list of containers:
    ```
    arun_gupta@opea-demo:~$ sudo docker container ls
    CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
    65b54e433cfe opea/chatqna-ui:latest "docker-entrypoint.s…" 7 seconds ago Up 6 seconds 0.0.0.0:5173->5173/tcp, :::5173->5173/tcp chatqna-xeon-ui-server
    798310e0ca77 opea/chatqna:latest "python chatqna.py" 7 seconds ago Up 6 seconds 0.0.0.0:8888->8888/tcp, :::8888->8888/tcp chatqna-xeon-backend-server
    362f3117a528 opea/dataprep-redis:latest "python prepare_doc_…" 7 seconds ago Up 6 seconds 0.0.0.0:6007->6007/tcp, :::6007->6007/tcp dataprep-redis-server
    3985a4de5dc4 opea/embedding-tei:latest "python embedding_te…" 7 seconds ago Up 6 seconds 0.0.0.0:6000->6000/tcp, :::6000->6000/tcp embedding-tei-server
    b41907df6672 opea/reranking-tei:latest "python reranking_te…" 7 seconds ago Up 6 seconds 0.0.0.0:8000->8000/tcp, :::8000->8000/tcp reranking-tei-xeon-server
    19d9a30f85de opea/llm-tgi:latest "bash entrypoint.sh" 7 seconds ago Up 6 seconds 0.0.0.0:9000->9000/tcp, :::9000->9000/tcp llm-tgi-server
    3fa19c8ec722 opea/retriever-redis:latest "python retriever_re…" 7 seconds ago Up 6 seconds 0.0.0.0:7000->7000/tcp, :::7000->7000/tcp retriever-redis-server
    14b5ccd5416c ghcr.io/huggingface/text-generation-inference:sha-e4201f4-intel-cpu "text-generation-lau…" 19 seconds ago Up 7 seconds 0.0.0.0:9009->80/tcp, [::]:9009->80/tcp tgi-service
    8f58f9aaefae redis/redis-stack:7.2.0-v9 "/entrypoint.sh" 19 seconds ago Up 7 seconds 0.0.0.0:6379->6379/tcp, :::6379->6379/tcp, 0.0.0.0:8001->8001/tcp, :::8001->8001/tcp redis-vector-db
    931126a552cb ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 "text-embeddings-rou…" 19 seconds ago Up 7 seconds 0.0.0.0:6006->80/tcp, [::]:6006->80/tcp tei-embedding-server
    5a2c435edc0f ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 "text-embeddings-rou…" 19 seconds ago Up 7 seconds 0.0.0.0:8808->80/tcp, [::]:8808->80/tcp tei-reranking-server
    ```
    $ sudo docker container ls
    CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
    dbd94a818b0d opea/chatqna-ui:latest "docker-entrypoint.s…" 24 seconds ago Up 22 seconds 0.0.0.0:5173->5173/tcp, :::5173->5173/tcp chatqna-xeon-ui-server
    3433b05a6a0b opea/chatqna:latest "python chatqna.py" 24 seconds ago Up 22 seconds 0.0.0.0:8888->8888/tcp, :::8888->8888/tcp chatqna-xeon-backend-server
    3c5c036ae59b opea/dataprep-redis:latest "python prepare_doc_…" 25 seconds ago Up 23 seconds 0.0.0.0:6007->6007/tcp, :::6007->6007/tcp dataprep-redis-server
    08fad4a403cc opea/retriever-redis:latest "python retriever_re…" 25 seconds ago Up 23 seconds 0.0.0.0:7000->7000/tcp, :::7000->7000/tcp retriever-redis-server
    110c686f9c9c opea/llm-tgi:latest "bash entrypoint.sh" 25 seconds ago Up 23 seconds 0.0.0.0:9000->9000/tcp, :::9000->9000/tcp llm-tgi-server
    40cc0fcd293e opea/reranking-tei:latest "python reranking_te…" 25 seconds ago Up 23 seconds 0.0.0.0:8000->8000/tcp, :::8000->8000/tcp reranking-tei-xeon-server
    696959d09c87 opea/embedding-tei:latest "python embedding_te…" 25 seconds ago Up 23 seconds 0.0.0.0:6000->6000/tcp, :::6000->6000/tcp embedding-tei-server
    33549bbb37c3 ghcr.io/huggingface/text-generation-inference:sha-e4201f4-intel-cpu "text-generation-lau…" 25 seconds ago Up 23 seconds 0.0.0.0:9009->80/tcp, [::]:9009->80/tcp tgi-service
    6d48620d2958 redis/redis-stack:7.2.0-v9 "/entrypoint.sh" 25 seconds ago Up 23 seconds 0.0.0.0:6379->6379/tcp, :::6379->6379/tcp, 0.0.0.0:8001->8001/tcp, :::8001->8001/tcp redis-vector-db
    e1e2e862df01 ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 "text-embeddings-rou…" 25 seconds ago Up 23 seconds 0.0.0.0:8808->80/tcp, [::]:8808->80/tcp tei-reranking-server
    958e04b00fa8 ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 "text-embeddings-rou…" 25 seconds ago Up 23 seconds 0.0.0.0:6006->80/tcp, [::]:6006->80/tcp tei-embedding-server
    ```

    ## Validate Services

    Export `host_ip` environment variable:

    ```
    export host_ip=10.128.0.3
    export host_ip=10.0.0.4
    ```

    ### Embedding service
  10. arun-gupta revised this gist Sep 14, 2024. No changes.
  11. arun-gupta revised this gist Sep 13, 2024. 1 changed file with 3 additions and 3 deletions.
    6 changes: 3 additions & 3 deletions readme.md
    @@ -9,17 +9,17 @@
    - Availability zone: `Zone 2`
    - Image: `Ubuntu Server 24.04 LTS - x64 Gen2`
    - Size: `Standard_D8s_v4`
    - Key pair name: `opea-azure`
    - Key pair name: `opea-demo-key`
    - Click on `Next : Disks>`
    - Choose OS disk size as `512 GB (p20)`
    - Select `Review + Create`


    - Once you see the message `Validation passed`, click on `Create` button
    - Click on `Download private key and create resource`


    ### Connect

    - Click on `Go to resource`
    - Click on `Connect`


  12. arun-gupta revised this gist Sep 13, 2024. 1 changed file with 16 additions and 2 deletions.
    18 changes: 16 additions & 2 deletions readme.md
    @@ -5,10 +5,24 @@
    ### Ubuntu 24.04

    - https://portal.azure.com/
    - `D8s_v4`
    - Region: `(US) West US 2`
    - Availability zone: `Zone 2`
    - Image: `Ubuntu Server 24.04 LTS - x64 Gen2`
    - Size: `Standard_D8s_v4`
    - Key pair name: `opea-azure`
    - Click on `Next : Disks>`
    - Choose OS disk size as `512 GB (p20)`
    - Select `Review + Create`


    - Once you see the message `Validation passed`, click on `Create` button
    - Click on `Download private key and create resource`

    ### Connect

    - Click on `Connect`


    - Change boot disk to `500 GB`
    - Ubuntu 24.04 LTS

    - Install Docker:
  13. arun-gupta revised this gist Sep 13, 2024. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion readme.md
    @@ -5,7 +5,7 @@
    ### Ubuntu 24.04

    - https://portal.azure.com/
    - `D16ds_v5`
    - `D8s_v4`


    - Change boot disk to `500 GB`
  14. arun-gupta created this gist Sep 13, 2024.
    393 changes: 393 additions & 0 deletions readme.md
    @@ -0,0 +1,393 @@
    # OPEA on Microsoft Azure using Docker Compose

    ## Create your instance

    ### Ubuntu 24.04

    - https://portal.azure.com/
    - `D16ds_v5`


    - Change boot disk to `500 GB`
    - Ubuntu 24.04 LTS

    - Install Docker:

    ```
    # Add Docker's official GPG key:
    sudo apt-get -y update
    sudo apt-get -y install ca-certificates curl
    sudo install -m 0755 -d /etc/apt/keyrings
    sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
    sudo chmod a+r /etc/apt/keyrings/docker.asc
    # Add the repository to Apt sources:
    echo \
    "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
    $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
    sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
    sudo apt-get -y update
    sudo apt-get -y install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
    ```

    ## Docker images

    - Pull OPEA Docker images:
    ```
    sudo docker pull opea/chatqna:latest
    sudo docker pull opea/chatqna-conversation-ui:latest
    ```
    - Replace HuggingFace API token and private IP address of the host below and copy the contents in a file named `.env`:
    ```
    host_ip=10.128.0.3 #private IP address of the host
    no_proxy=${host_ip}
    HUGGINGFACEHUB_API_TOKEN="Your_Huggingface_API_Token"
    EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5"
    RERANK_MODEL_ID="BAAI/bge-reranker-base"
    LLM_MODEL_ID="Intel/neural-chat-7b-v3-3"
    TEI_EMBEDDING_ENDPOINT="http://${host_ip}:6006"
    TEI_RERANKING_ENDPOINT="http://${host_ip}:8808"
    TGI_LLM_ENDPOINT="http://${host_ip}:9009"
    REDIS_URL="redis://${host_ip}:6379"
    INDEX_NAME="rag-redis"
    REDIS_HOST=${host_ip}
    MEGA_SERVICE_HOST_IP=${host_ip}
    EMBEDDING_SERVICE_HOST_IP=${host_ip}
    RETRIEVER_SERVICE_HOST_IP=${host_ip}
    RERANK_SERVICE_HOST_IP=${host_ip}
    LLM_SERVICE_HOST_IP=${host_ip}
    BACKEND_SERVICE_ENDPOINT="http://${host_ip}:8888/v1/chatqna"
    DATAPREP_SERVICE_ENDPOINT="http://${host_ip}:6007/v1/dataprep"
    DATAPREP_GET_FILE_ENDPOINT="http://${host_ip}:6007/v1/dataprep/get_file"
    DATAPREP_DELETE_FILE_ENDPOINT="http://${host_ip}:6007/v1/dataprep/delete_file"
    ```
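Since every endpoint in `.env` is derived from `host_ip`, a quick sanity check is to expand a couple of the entries and confirm the URLs they resolve to. A sketch with a placeholder IP:

```shell
# Sketch: expand two .env entries to confirm the derived endpoints.
# host_ip below is a placeholder; use your host's private IP.
host_ip=10.128.0.3
TGI_LLM_ENDPOINT="http://${host_ip}:9009"
BACKEND_SERVICE_ENDPOINT="http://${host_ip}:8888/v1/chatqna"
echo "$TGI_LLM_ENDPOINT"
echo "$BACKEND_SERVICE_ENDPOINT"
```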
    - Download Docker Compose file:
    ```
    curl -O https://raw.githubusercontent.com/opea-project/GenAIExamples/main/ChatQnA/docker_compose/intel/cpu/xeon/compose.yaml
    ```
    - Start the application:
    ```
    sudo docker compose -f compose.yaml up -d
    ```
    - Verify the list of containers:
    ```
    arun_gupta@opea-demo:~$ sudo docker container ls
    CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
    65b54e433cfe opea/chatqna-ui:latest "docker-entrypoint.s…" 7 seconds ago Up 6 seconds 0.0.0.0:5173->5173/tcp, :::5173->5173/tcp chatqna-xeon-ui-server
    798310e0ca77 opea/chatqna:latest "python chatqna.py" 7 seconds ago Up 6 seconds 0.0.0.0:8888->8888/tcp, :::8888->8888/tcp chatqna-xeon-backend-server
    362f3117a528 opea/dataprep-redis:latest "python prepare_doc_…" 7 seconds ago Up 6 seconds 0.0.0.0:6007->6007/tcp, :::6007->6007/tcp dataprep-redis-server
    3985a4de5dc4 opea/embedding-tei:latest "python embedding_te…" 7 seconds ago Up 6 seconds 0.0.0.0:6000->6000/tcp, :::6000->6000/tcp embedding-tei-server
    b41907df6672 opea/reranking-tei:latest "python reranking_te…" 7 seconds ago Up 6 seconds 0.0.0.0:8000->8000/tcp, :::8000->8000/tcp reranking-tei-xeon-server
    19d9a30f85de opea/llm-tgi:latest "bash entrypoint.sh" 7 seconds ago Up 6 seconds 0.0.0.0:9000->9000/tcp, :::9000->9000/tcp llm-tgi-server
    3fa19c8ec722 opea/retriever-redis:latest "python retriever_re…" 7 seconds ago Up 6 seconds 0.0.0.0:7000->7000/tcp, :::7000->7000/tcp retriever-redis-server
    14b5ccd5416c ghcr.io/huggingface/text-generation-inference:sha-e4201f4-intel-cpu "text-generation-lau…" 19 seconds ago Up 7 seconds 0.0.0.0:9009->80/tcp, [::]:9009->80/tcp tgi-service
    8f58f9aaefae redis/redis-stack:7.2.0-v9 "/entrypoint.sh" 19 seconds ago Up 7 seconds 0.0.0.0:6379->6379/tcp, :::6379->6379/tcp, 0.0.0.0:8001->8001/tcp, :::8001->8001/tcp redis-vector-db
    931126a552cb ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 "text-embeddings-rou…" 19 seconds ago Up 7 seconds 0.0.0.0:6006->80/tcp, [::]:6006->80/tcp tei-embedding-server
    5a2c435edc0f ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 "text-embeddings-rou…" 19 seconds ago Up 7 seconds 0.0.0.0:8808->80/tcp, [::]:8808->80/tcp tei-reranking-server
    ```

    ## Validate Services

    Export `host_ip` environment variable:

    ```
    export host_ip=10.128.0.3
    ```
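    If you'd rather not hardcode the address, the primary interface IP can usually be derived automatically. This is only a sketch: `hostname -I` is Linux-specific, and the first address it reports may not be the one your services are bound to, so verify it matches the VM's internal IP.
    ```
    # Derive host_ip from the first address the kernel reports (Linux only);
    # fall back to loopback if nothing is reported.
    host_ip=$(hostname -I 2>/dev/null | awk '{print $1}')
    host_ip=${host_ip:-127.0.0.1}
    echo "host_ip=${host_ip}"
    ```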

    ### Embedding service

    Test:

    ```
    curl ${host_ip}:6006/embed \
    -X POST \
    -d '{"inputs":"What is Deep Learning?"}' \
    -H 'Content-Type: application/json'
    ```

    Answer:
    ```
    [[0.00037115702,-0.06356819,0.0024758505,-0.012360337,0.050739925,0.023380278,0.022216318,0.0008076447,
    . . .
    0.022558564,-0.04570635,-0.033072025,0.022725677,0.016026087,-0.02125421,-0.02984927,-0.0049473033]]
    ```

    ### Embedding microservice
    Test:
    ```
    curl http://${host_ip}:6000/v1/embeddings \
    -X POST \
    -d '{"text":"hello"}' \
    -H 'Content-Type: application/json'
    ```

    Answer:
    ```
    {"id":"2d6bbb69f440491249e672d6039dfd5f","text":"hello","embedding":[0.0007791813,0.042613804
    . . .
    -0.0044034636],"search_type":"similarity","k":4,"distance_threshold":null,"fetch_k":20,"lambda_mult":0.5,"score_threshold":0.2}
    ```

    ### Retriever microservice

    Test:
    ```
    export your_embedding=$(python3 -c "import random; embedding = [random.uniform(-1, 1) for _ in range(768)]; print(embedding)")
    curl http://${host_ip}:7000/v1/retrieval \
    -X POST \
    -d "{\"text\":\"test\",\"embedding\":${your_embedding}}" \
    -H 'Content-Type: application/json'
    ```

    Answer:
    ```
    {"id":"429f21479c99008b1ade5d8720cc60dc","retrieved_docs":[],"initial_query":"test","top_n":1}
    ```
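    The random 768-element vector above stands in for a real embedding. The dimension 768 is assumed here because it is the output size of the `BAAI/bge-base-en-v1.5` model this stack's TEI service typically serves; if your compose file uses a different embedding model, adjust accordingly. A quick sanity check on the vector length before posting it:
    ```
    # Generate the same stand-in embedding and confirm it has 768 dimensions,
    # matching what the retriever's index expects.
    your_embedding=$(python3 -c "import random; print([random.uniform(-1, 1) for _ in range(768)])")
    python3 -c "import ast, sys; print(len(ast.literal_eval(sys.argv[1])))" "$your_embedding"
    ```
    This should print `768`.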

    ### TEI Reranking service

    Test:
    ```
    curl http://${host_ip}:8808/rerank \
    -X POST \
    -d '{"query":"What is Deep Learning?", "texts": ["Deep Learning is not...", "Deep learning is..."]}' \
    -H 'Content-Type: application/json'
    ```
    Answer:
    ```
    [{"index":1,"score":0.94238955},{"index":0,"score":0.120219156}]
    ```
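    The reranker returns one `{index, score}` pair per candidate text, and the highest score marks the most relevant one (index 1 here). The sketch below replays the captured response above rather than making a live call, and picks the winner:
    ```
    # Replay the sample reranker response and select the index with the highest score.
    response='[{"index":1,"score":0.94238955},{"index":0,"score":0.120219156}]'
    best=$(echo "$response" | python3 -c "import json, sys; r = json.load(sys.stdin); print(max(r, key=lambda d: d['score'])['index'])")
    echo "most relevant: index $best"
    ```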

    ### Reranking microservice

    Test:
    ```
    curl http://${host_ip}:8000/v1/reranking \
    -X POST \
    -d '{"initial_query":"What is Deep Learning?", "retrieved_docs": [{"text":"Deep Learning is not..."}, {"text":"Deep learning is..."}]}' \
    -H 'Content-Type: application/json'
    ```
    Answer:
    ```
    {"id":"65a489a9fae807039905008dce80ef6b","model":null,"query":"What is Deep Learning?","max_new_tokens":1024,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true,"chat_template":null,"documents":["Deep learning is..."]}
    ```

    ### LLM Backend Service

    - Check logs:
    ```
    sudo docker logs tgi-service
    ```

    It takes ~5 minutes for this service to be ready. Wait till you see this log output:
    ```
    . . .
    2024-09-12T02:14:07.324250Z INFO shard-manager: text_generation_launcher: Shard ready in 20.620625696s rank=0
    2024-09-12T02:14:07.380398Z INFO text_generation_launcher: Starting Webserver
    2024-09-12T02:14:07.526375Z INFO text_generation_router_v3: backends/v3/src/lib.rs:90: Warming up model
    2024-09-12T02:14:42.046106Z INFO text_generation_launcher: Cuda Graphs are disabled (CUDA_GRAPHS=None).
    2024-09-12T02:14:42.046591Z INFO text_generation_router_v3: backends/v3/src/lib.rs:102: Setting max batch total tokens to 83088
    2024-09-12T02:14:42.047332Z INFO text_generation_router_v3: backends/v3/src/lib.rs:126: Using backend V3
    2024-09-12T02:14:42.051963Z INFO text_generation_router::server: router/src/server.rs:1651: Using the Hugging Face API
    2024-09-12T02:14:42.066054Z INFO hf_hub: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/hf-hub-0.3.2/src/lib.rs:55: Token file not found "/root/.cache/huggingface/token"
    2024-09-12T02:14:42.473427Z INFO text_generation_router::server: router/src/server.rs:2349: Serving revision 52d98fdf3b050731f1476f1af6efc7970f14c67e of model Intel/neural-chat-7b-v3-3
    2024-09-12T02:14:42.516228Z INFO text_generation_router::server: router/src/server.rs:1747: Overriding LlamaTokenizer with TemplateProcessing to follow python override defined in https://github.com/huggingface/transformers/blob/4aa17d00690b7f82c95bb2949ea57e22c35b4336/src/transformers/models/llama/tokenization_llama_fast.py#L203-L205
    2024-09-12T02:14:42.516696Z INFO text_generation_router::server: router/src/server.rs:1781: Using config Some(Mistral)
    2024-09-12T02:14:42.516736Z WARN text_generation_router::server: router/src/server.rs:1928: Invalid hostname, defaulting to 0.0.0.0
    2024-09-12T02:14:42.528179Z INFO text_generation_router::server: router/src/server.rs:2311: Connected
    ```
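    Rather than eyeballing the log, readiness can be scripted by grepping for the final `Connected` line. The snippet below replays a captured log line through a stand-in function so it runs anywhere; on the VM, replace the function body with `sudo docker logs tgi-service 2>&1`:
    ```
    # Stand-in for `sudo docker logs tgi-service 2>&1`: replays the last startup line.
    tgi_logs() {
      cat <<'EOF'
    2024-09-12T02:14:42.528179Z  INFO text_generation_router::server: router/src/server.rs:2311: Connected
    EOF
    }
    if tgi_logs | grep -q 'Connected'; then
      echo "TGI is ready"
    else
      echo "TGI still warming up"
    fi
    ```
    In practice you would wrap the check in a loop with a `sleep` between attempts, since the model takes several minutes to warm up.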
    - Check TGI service:
    ```
    # TGI service
    curl http://${host_ip}:9009/generate \
    -X POST \
    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":17, "do_sample": true}}' \
    -H 'Content-Type: application/json'
    ```
    with the response:
    ```
    {"generated_text":"\n\nDeep Learning is a subset of machine learning which focuses on algorithms that learn from"}
    ```
    - Check vLLM service:
    ```
    curl http://${host_ip}:9009/v1/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "Intel/neural-chat-7b-v3-3", "prompt": "What is Deep Learning?", "max_tokens": 32, "temperature": 0}'
    ```
    with the response:
    ```
    {"object":"text_completion","id":"","created":1726117774,"model":"Intel/neural-chat-7b-v3-3","system_fingerprint":"2.2.1-dev0-sha-e4201f4-intel-cpu","choices":[{"index":0,"text":"\n\nDeep Learning is a subset of Machine Learning that is concerned with algorithms inspired by the structure and function of the brain. It is a part of Artificial","logprobs":null,"finish_reason":"length"}],"usage":{"prompt_tokens":6,"completion_tokens":32,"total_tokens":38}}
    ```

    ### LLM microservice

    Test:

    ```
    curl http://${host_ip}:9000/v1/chat/completions \
    -X POST \
    -d '{"query":"What is Deep Learning?","max_new_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \
    -H 'Content-Type: application/json'
    ```

    Answer:
    ```
    data: b'\n'
    data: b'\n'
    data: b'Deep'
    data: b' learning'
    data: b' is'
    data: b' a'
    data: b' subset'
    data: b' of'
    data: b' machine'
    data: b' learning'
    data: b' that'
    data: b' uses'
    data: b' algorithms'
    data: b' to'
    data: b' learn'
    data: b' from'
    data: b' data'
    data: [DONE]
    ```
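    The microservice streams SSE-style `data:` chunks; joined together they form the full answer. The sketch below reassembles a few of the captured chunks above into plain text using a here-doc replay; a live run would pipe the `curl` output into the same filter instead:
    ```
    # Strip the `data: b'...'` framing, join the chunks, then expand the literal
    # \n escapes into real newlines. The [DONE] sentinel does not match and is dropped.
    text=$(sed -n "s/^data: b'\(.*\)'$/\1/p" <<'EOF' | tr -d '\n'
    data: b'\n'
    data: b'\n'
    data: b'Deep'
    data: b' learning'
    data: b' is'
    data: b' a'
    data: b' subset'
    data: b' of'
    data: b' machine'
    data: b' learning'
    data: [DONE]
    EOF
    )
    printf '%b\n' "$text"
    ```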

    ### Megaservice

    Test:
    ```
    curl http://${host_ip}:8888/v1/chatqna -H "Content-Type: application/json" -d '{
    "messages": "What is the revenue of Nike in 2023?"
    }'
    ```

    Answer:
    ```
    data: b'\n'
    data: b'\n'
    data: b'N'
    data: b'ike'
    data: b"'"
    data: b's'
    data: b' revenue'
    data: b' for'
    . . .
    data: b' popularity'
    data: b' among'
    data: b' consumers'
    data: b'.'
    data: b'</s>'
    data: [DONE]
    ```


    ### Let's run!

    #### RAG using hyperlink

    - Ask the question:
    ```
    curl http://${host_ip}:8888/v1/chatqna -H "Content-Type: application/json" -d '{
    "messages": "What is OPEA?"
    }'
    data: b'\n'
    data: b'\n'
    data: b'The'
    data: b' Oklahoma'
    data: b' Public'
    data: b' Em'
    data: b'ploy'
    data: b'ees'
    data: b' Association'
    ```
    - Update knowledge base:
    ```
    curl -X POST "http://${host_ip}:6007/v1/dataprep" \
    -H "Content-Type: multipart/form-data" \
    -F 'link_list=["https://opea.dev"]'
    {"status":200,"message":"Data preparation succeeded"}
    ```
    - Ask the question:
    ```
    curl http://${host_ip}:8888/v1/chatqna -H "Content-Type: application/json" -d '{
    "messages": "What is OPEA?"
    }'
    data: b'\n'
    data: b'O'
    data: b'PE'
    data: b'A'
    data: b' stands'
    data: b' for'
    data: b' Open'
    data: b' Platform'
    data: b' for'
    data: b' Enterprise'
    data: b' AI'
    data: b'.'
    data: b' It'
    ```
    - Delete link from the knowledge base:
    ```
    curl -X POST "http://${host_ip}:6007/v1/dataprep/delete_file" \
    -d '{"file_path": "https://opea.dev"}' \
    -H "Content-Type: application/json"
    {"detail":"File https://opea.dev not found. Please check file_path."}
    ```

    The delete call currently fails with the error above; this is tracked at https://github.com/opea-project/GenAIExamples/issues/724.