Last active: September 14, 2024 21:20
## Revisions
- Sep 14, 2024 (arun-gupta): 1 changed file (3 additions, 1 deletion). Adds a screenshot of the result below the sample `data:` output.
- Sep 14, 2024 (arun-gupta): 1 changed file (24 additions, 28 deletions). Moves `Verify the list of containers` after the compose startup, refreshes the TGI startup log, and adds `with the answer:` blocks to the dataprep and ChatQnA calls.
- Sep 14, 2024 (arun-gupta): no changes.
- Sep 14, 2024 (arun-gupta): 1 changed file (18 additions, 16 deletions). Reworks the Docker install section and notes that copying the entire command block does not work; it has to be copied line by line.
- Sep 14, 2024 (arun-gupta): 1 changed file (6 additions, 11 deletions). Expands the Azure portal steps: name `opea-demo`, region `(US) West US 2`, availability zone `Zone 2`, image `Ubuntu Server 24.04 LTS - x64 Gen2`, size `Standard_D8s_v4` (8 vCPUs, 32 GiB memory), key pair `azure-opea-demo`, OS disk `512 GB (p20)`, then `Review + Create`, `Create`, `Download private key and create resource`, and connect via `SSH using Azure CLI`.
- Sep 14, 2024 (arun-gupta): 1 changed file (1 addition, 1 deletion). Tweaks the instance size and key pair lines in the portal steps (key pair `azure-opea-key`).
- Sep 14, 2024 (arun-gupta): 1 changed file (1 addition, 194 deletions). Replaces the duplicated service-validation walkthrough with a pointer to the [OPEA on AWS document](https://gist.github.com/arun-gupta/7e9f080feff664fbab878b26d13d83d7).
- Sep 14, 2024 (arun-gupta): 1 changed file (12 additions, 12 deletions). Refreshes the `sudo docker container ls` output.
- Sep 14, 2024 (arun-gupta): 1 changed file (19 additions, 21 deletions). Splits out a `Connect` section, sets `host_ip=10.0.0.4` (the private IP address of the host) in `.env`, and refreshes the container listing.
- Sep 14, 2024 (arun-gupta): no changes.
- Sep 13, 2024 (arun-gupta): 1 changed file (3 additions, 3 deletions). Renames the key pair to `opea-demo-key` and adjusts the portal and `Connect` steps.
- Sep 13, 2024 (arun-gupta): 1 changed file (16 additions, 2 deletions). Adds the step-by-step portal flow (region, zone, image, size, key pair `opea-azure`, `512 GB (p20)` disk, `Review + Create`, download private key) and a `Connect` step.
- Sep 13, 2024 (arun-gupta): 1 changed file (1 addition, 1 deletion). Changes the instance size to `D8s_v4`.
- Sep 13, 2024 (arun-gupta): gist created. The initial content is reproduced below.

# OPEA on Microsoft Azure using Docker Compose

## Create your instance

### Ubuntu 24.04

- https://portal.azure.com/
- `D16ds_v5`
- Change boot disk to `500 GB`
- Ubuntu 24.04 LTS
- Install Docker:
```
# Add Docker's official GPG key:
sudo apt-get -y update
sudo apt-get -y install ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc

# Add the repository to Apt sources:
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
  $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

sudo apt-get -y update
sudo apt-get -y install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
```

## Docker images

- Pull OPEA Docker images:
```
sudo docker pull opea/chatqna:latest
sudo docker pull opea/chatqna-conversation-ui:latest
```
- Replace the HuggingFace API token and the private IP address of the host below, and copy the contents into a file named `.env`:
```
host_ip=10.128.0.3 # private IP address of the host
no_proxy=${host_ip}
HUGGINGFACEHUB_API_TOKEN="Your_Huggingface_API_Token"
EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5"
RERANK_MODEL_ID="BAAI/bge-reranker-base"
LLM_MODEL_ID="Intel/neural-chat-7b-v3-3"
TEI_EMBEDDING_ENDPOINT="http://${host_ip}:6006"
TEI_RERANKING_ENDPOINT="http://${host_ip}:8808"
TGI_LLM_ENDPOINT="http://${host_ip}:9009"
REDIS_URL="redis://${host_ip}:6379"
INDEX_NAME="rag-redis"
REDIS_HOST=${host_ip}
MEGA_SERVICE_HOST_IP=${host_ip}
```
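The `host_ip` value must be the VM's private IP address (the remaining `.env` entries continue below). One way to discover that address programmatically, as a sketch; `private_ip` is a hypothetical helper, not part of OPEA:

```python
import socket

def private_ip() -> str:
    """Best-effort guess at this host's primary private IPv4 address.

    Connecting a UDP socket toward a public address lets the kernel pick
    the outbound interface; no traffic is actually sent. Falls back to a
    hostname lookup when there is no route (e.g. offline).
    """
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
            s.connect(("8.8.8.8", 80))
            return s.getsockname()[0]
    except OSError:
        return socket.gethostbyname(socket.gethostname())

print(f"host_ip={private_ip()}")
```

On the Azure VM this should match the private address shown in the portal; verify before pasting it into `.env`.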
```
EMBEDDING_SERVICE_HOST_IP=${host_ip}
RETRIEVER_SERVICE_HOST_IP=${host_ip}
RERANK_SERVICE_HOST_IP=${host_ip}
LLM_SERVICE_HOST_IP=${host_ip}
BACKEND_SERVICE_ENDPOINT="http://${host_ip}:8888/v1/chatqna"
DATAPREP_SERVICE_ENDPOINT="http://${host_ip}:6007/v1/dataprep"
DATAPREP_GET_FILE_ENDPOINT="http://${host_ip}:6007/v1/dataprep/get_file"
DATAPREP_DELETE_FILE_ENDPOINT="http://${host_ip}:6007/v1/dataprep/delete_file"
```
- Download Docker Compose file:
```
curl -O https://raw.githubusercontent.com/opea-project/GenAIExamples/main/ChatQnA/docker_compose/intel/cpu/xeon/compose.yaml
```
- Start the application:
```
sudo docker compose -f compose.yaml up -d
```
- Verify the list of containers:
```
sudo docker container ls
CONTAINER ID   IMAGE                                                                COMMAND                  CREATED          STATUS          PORTS                                                                                  NAMES
65b54e433cfe   opea/chatqna-ui:latest                                               "docker-entrypoint.s…"   7 seconds ago    Up 6 seconds    0.0.0.0:5173->5173/tcp, :::5173->5173/tcp                                              chatqna-xeon-ui-server
798310e0ca77   opea/chatqna:latest                                                  "python chatqna.py"      7 seconds ago    Up 6 seconds    0.0.0.0:8888->8888/tcp, :::8888->8888/tcp                                              chatqna-xeon-backend-server
362f3117a528   opea/dataprep-redis:latest                                           "python prepare_doc_…"   7 seconds ago    Up 6 seconds    0.0.0.0:6007->6007/tcp, :::6007->6007/tcp                                              dataprep-redis-server
3985a4de5dc4   opea/embedding-tei:latest                                            "python embedding_te…"   7 seconds ago    Up 6 seconds    0.0.0.0:6000->6000/tcp, :::6000->6000/tcp                                              embedding-tei-server
b41907df6672   opea/reranking-tei:latest                                            "python reranking_te…"   7 seconds ago    Up 6 seconds    0.0.0.0:8000->8000/tcp, :::8000->8000/tcp                                              reranking-tei-xeon-server
19d9a30f85de   opea/llm-tgi:latest                                                  "bash entrypoint.sh"     7 seconds ago    Up 6 seconds    0.0.0.0:9000->9000/tcp, :::9000->9000/tcp                                              llm-tgi-server
3fa19c8ec722   opea/retriever-redis:latest                                          "python retriever_re…"   7 seconds ago    Up 6 seconds    0.0.0.0:7000->7000/tcp, :::7000->7000/tcp                                              retriever-redis-server
14b5ccd5416c   ghcr.io/huggingface/text-generation-inference:sha-e4201f4-intel-cpu  "text-generation-lau…"   19 seconds ago   Up 7 seconds    0.0.0.0:9009->80/tcp, [::]:9009->80/tcp                                                tgi-service
8f58f9aaefae   redis/redis-stack:7.2.0-v9                                           "/entrypoint.sh"         19 seconds ago   Up 7 seconds    0.0.0.0:6379->6379/tcp, :::6379->6379/tcp, 0.0.0.0:8001->8001/tcp, :::8001->8001/tcp   redis-vector-db
931126a552cb   ghcr.io/huggingface/text-embeddings-inference:cpu-1.5                "text-embeddings-rou…"   19 seconds ago   Up 7 seconds    0.0.0.0:6006->80/tcp, [::]:6006->80/tcp                                                tei-embedding-server
5a2c435edc0f   ghcr.io/huggingface/text-embeddings-inference:cpu-1.5                "text-embeddings-rou…"   19 seconds ago   Up 7 seconds    0.0.0.0:8808->80/tcp, [::]:8808->80/tcp                                                tei-reranking-server
```

## Validate Services

Export `host_ip` environment variable:
```
export host_ip=10.128.0.3
```

### Embedding service

Test:
```
curl ${host_ip}:6006/embed \
    -X POST \
    -d '{"inputs":"What is Deep Learning?"}' \
    -H 'Content-Type: application/json'
```
Answer:
```
[[0.00037115702,-0.06356819,0.0024758505,-0.012360337,0.050739925,0.023380278,0.022216318,0.0008076447, . . . 0.022558564,-0.04570635,-0.033072025,0.022725677,0.016026087,-0.02125421,-0.02984927,-0.0049473033]]
```

### Embedding microservice

Test:
```
curl http://${host_ip}:6000/v1/embeddings \
    -X POST \
    -d '{"text":"hello"}' \
    -H 'Content-Type: application/json'
```
Failing with Answer:
```
{"id":"2d6bbb69f440491249e672d6039dfd5f","text":"hello","embedding":[0.0007791813,0.042613804 . . .
```
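The same smoke test can be scripted with only the standard library. A sketch: the `/v1/embeddings` route, port, and `{"text": ...}` payload are taken from the curl call above, `build_request` and `query_embedding` are hypothetical helpers, and the service must be running for the query itself to succeed (the rest of the recorded response continues below):

```python
import json
import urllib.request

def build_request(host_ip: str, text: str) -> urllib.request.Request:
    """Build the POST request the embedding microservice expects."""
    payload = json.dumps({"text": text}).encode()
    return urllib.request.Request(
        f"http://{host_ip}:6000/v1/embeddings",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def query_embedding(host_ip: str, text: str) -> list:
    """Send the request and return the embedding vector from the response."""
    with urllib.request.urlopen(build_request(host_ip, text)) as resp:
        return json.load(resp)["embedding"]
```

A loop over `query_embedding(host_ip, t)` for a few sample texts makes a quick sanity check that the vector length stays constant (768 for `bge-base-en-v1.5`).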
```
-0.0044034636],"search_type":"similarity","k":4,"distance_threshold":null,"fetch_k":20,"lambda_mult":0.5,"score_threshold":0.2}
```

### Retriever microservice

Test:
```
export your_embedding=$(python3 -c "import random; embedding = [random.uniform(-1, 1) for _ in range(768)]; print(embedding)")
curl http://${host_ip}:7000/v1/retrieval \
    -X POST \
    -d "{\"text\":\"test\",\"embedding\":${your_embedding}}" \
    -H 'Content-Type: application/json'
```
Answer:
```
{"id":"429f21479c99008b1ade5d8720cc60dc","retrieved_docs":[],"initial_query":"test","top_n":1}
```

### TEI Reranking service

Test:
```
curl http://${host_ip}:8808/rerank \
    -X POST \
    -d '{"query":"What is Deep Learning?", "texts": ["Deep Learning is not...", "Deep learning is..."]}' \
    -H 'Content-Type: application/json'
```
Answer:
```
[{"index":1,"score":0.94238955},{"index":0,"score":0.120219156}]
```

### Reranking microservice

Test:
```
curl http://${host_ip}:8000/v1/reranking \
    -X POST \
    -d '{"initial_query":"What is Deep Learning?", "retrieved_docs": [{"text":"Deep Learning is not..."}, {"text":"Deep learning is..."}]}' \
    -H 'Content-Type: application/json'
```
Answer:
```
{"id":"65a489a9fae807039905008dce80ef6b","model":null,"query":"What is Deep Learning?","max_new_tokens":1024,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true,"chat_template":null,"documents":["Deep learning is..."]}
```

### LLM Backend Service

- Check logs:
```
sudo docker logs tgi-service
```
It takes ~5 minutes for this service to be ready. Wait till you see this log output:
```
. . .
2024-09-12T02:14:07.324250Z  INFO shard-manager: text_generation_launcher: Shard ready in 20.620625696s rank=0
2024-09-12T02:14:07.380398Z  INFO text_generation_launcher: Starting Webserver
2024-09-12T02:14:07.526375Z  INFO text_generation_router_v3: backends/v3/src/lib.rs:90: Warming up model
2024-09-12T02:14:42.046106Z  INFO text_generation_launcher: Cuda Graphs are disabled (CUDA_GRAPHS=None).
```
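The startup log continues below. Rather than re-running `docker logs` by hand, readiness can be polled; a sketch, assuming TGI's `/health` route is reachable on the mapped port 9009 (`wait_for_service` is a hypothetical helper):

```python
import time
import urllib.error
import urllib.request

def wait_for_service(url: str, tries: int = 40, delay: float = 15.0) -> bool:
    """Poll `url` until it answers, up to `tries` attempts `delay` seconds apart."""
    for attempt in range(tries):
        try:
            # Any 2xx response means the router is up; errors raise.
            with urllib.request.urlopen(url, timeout=5):
                return True
        except (urllib.error.URLError, OSError):
            if attempt < tries - 1:
                time.sleep(delay)
    return False

# Example: wait_for_service("http://10.0.0.4:9009/health")
```

With the ~5 minute warm-up observed above, the default 40 x 15 s budget leaves comfortable headroom.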
```
2024-09-12T02:14:42.046591Z  INFO text_generation_router_v3: backends/v3/src/lib.rs:102: Setting max batch total tokens to 83088
2024-09-12T02:14:42.047332Z  INFO text_generation_router_v3: backends/v3/src/lib.rs:126: Using backend V3
2024-09-12T02:14:42.051963Z  INFO text_generation_router::server: router/src/server.rs:1651: Using the Hugging Face API
2024-09-12T02:14:42.066054Z  INFO hf_hub: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/hf-hub-0.3.2/src/lib.rs:55: Token file not found "/root/.cache/huggingface/token"
2024-09-12T02:14:42.473427Z  INFO text_generation_router::server: router/src/server.rs:2349: Serving revision 52d98fdf3b050731f1476f1af6efc7970f14c67e of model Intel/neural-chat-7b-v3-3
2024-09-12T02:14:42.516228Z  INFO text_generation_router::server: router/src/server.rs:1747: Overriding LlamaTokenizer with TemplateProcessing to follow python override defined in https://github.com/huggingface/transformers/blob/4aa17d00690b7f82c95bb2949ea57e22c35b4336/src/transformers/models/llama/tokenization_llama_fast.py#L203-L205
2024-09-12T02:14:42.516696Z  INFO text_generation_router::server: router/src/server.rs:1781: Using config Some(Mistral)
2024-09-12T02:14:42.516736Z  WARN text_generation_router::server: router/src/server.rs:1928: Invalid hostname, defaulting to 0.0.0.0
2024-09-12T02:14:42.528179Z  INFO text_generation_router::server: router/src/server.rs:2311: Connected
```
- Check TGI service:
```
# TGI service
curl http://${host_ip}:9009/generate \
    -X POST \
    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":17, "do_sample": true}}' \
    -H 'Content-Type: application/json'
```
with the response:
```
{"generated_text":"\n\nDeep Learning is a subset of machine learning which focuses on algorithms that learn from"}
```
- Check the OpenAI-compatible completions endpoint:
```
curl http://${host_ip}:9009/v1/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "Intel/neural-chat-7b-v3-3", "prompt": "What is Deep Learning?", "max_tokens": 32, "temperature": 0}'
```
with the response:
```
{"object":"text_completion","id":"","created":1726117774,"model":"Intel/neural-chat-7b-v3-3","system_fingerprint":"2.2.1-dev0-sha-e4201f4-intel-cpu","choices":[{"index":0,"text":"\n\nDeep Learning is a subset of Machine Learning that is concerned with algorithms inspired by the structure and function of the brain. It is a part of Artificial","logprobs":null,"finish_reason":"length"}],"usage":{"prompt_tokens":6,"completion_tokens":32,"total_tokens":38}}
```

### LLM microservice

Test:
```
curl http://${host_ip}:9000/v1/chat/completions \
    -X POST \
    -d '{"query":"What is Deep Learning?","max_new_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \
    -H 'Content-Type: application/json'
```
Answer:
```
data: b'\n'
data: b'\n'
data: b'Deep'
data: b' learning'
data: b' is'
data: b' a'
data: b' subset'
data: b' of'
data: b' machine'
data: b' learning'
data: b' that'
data: b' uses'
data: b' algorithms'
data: b' to'
data: b' learn'
data: b' from'
data: b' data'
data: [DONE]
```

### Megaservice

Test:
```
curl http://${host_ip}:8888/v1/chatqna -H "Content-Type: application/json" -d '{
     "messages": "What is the revenue of Nike in 2023?"
     }'
```
Answer:
```
data: b'\n'
data: b'\n'
data: b'N'
data: b'ike'
data: b"'"
data: b's'
data: b' revenue'
data: b' for'
. . .
data: b' popularity'
data: b' among'
data: b' consumers'
data: b'.'
data: b'</s>'
data: [DONE]
```

### Let's run!

#### RAG using hyperlink

- Ask the question:
```
curl http://${host_ip}:8888/v1/chatqna -H "Content-Type: application/json" -d '{
     "messages": "What is OPEA?"
     }'

data: b'\n'
data: b'\n'
data: b'The'
data: b' Oklahoma'
data: b' Public'
data: b' Em'
data: b'ploy'
data: b'ees'
data: b' Association'
```
- Update knowledge base:
```
curl -X POST "http://${host_ip}:6007/v1/dataprep" \
     -H "Content-Type: multipart/form-data" \
     -F 'link_list=["https://opea.dev"]'

{"status":200,"message":"Data preparation succeeded"}
```
- Ask the question:
```
curl http://${host_ip}:8888/v1/chatqna -H "Content-Type: application/json" -d '{
     "messages": "What is OPEA?"
     }'

data: b'\n'
data: b'O'
data: b'PE'
data: b'A'
data: b' stands'
data: b' for'
data: b' Open'
data: b' Platform'
data: b' for'
data: b' Enterprise'
data: b' AI'
data: b'.'
data: b' It'
```
- Delete link from the knowledge base:
```
# delete link
curl -X POST "http://${host_ip}:6007/v1/dataprep/delete_file" \
     -d '{"file_path": "https://opea.dev"}' \
     -H "Content-Type: application/json"

{"detail":"File https://opea.dev not found. Please check file_path."}
```
This is giving an error: https://github.com/opea-project/GenAIExamples/issues/724
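The streamed `data: b'...'` lines above are raw token deltas. When scripting these checks, they can be collapsed into readable text; a minimal sketch based on the output format shown above (`join_stream` is a hypothetical helper, not part of OPEA):

```python
import re

def join_stream(lines):
    """Collapse ChatQnA's streamed `data: b'...'` lines into plain text.

    Assumes each chunk is a Python bytes literal, as in the outputs above,
    and that the stream may end with a `data: [DONE]` sentinel.
    """
    pieces = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data: "):
            continue  # skip blanks and non-data lines
        chunk = line[len("data: "):]
        if chunk == "[DONE]":
            break
        # b'...' or b"..." -> the text inside, with escapes decoded
        m = re.fullmatch(r"""b(['"])(.*)\1""", chunk)
        if m:
            pieces.append(m.group(2).encode().decode("unicode_escape"))
    return "".join(pieces)

print(join_stream(["data: b'\\n'", "data: b'O'", "data: b'PE'", "data: b'A'",
                   "data: [DONE]"]))  # prints "OPEA" after a leading blank line
```

Piping `curl -N .../v1/chatqna` output through such a helper makes the before-and-after RAG answers much easier to compare.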