@arun-gupta
Last active September 14, 2024 21:20

Revisions

  1. arun-gupta revised this gist Sep 14, 2024. 1 changed file with 3 additions and 1 deletion.
    4 changes: 3 additions & 1 deletion readme.md
    @@ -202,4 +202,6 @@ Validate the services as explained in [OPEA on AWS document](https://gist.github
    data: b'.'
    data: b' It'
    ```

    <img width="1018" alt="image" src="https://gist.github.com/user-attachments/assets/d52bb8dc-a319-4664-b50e-dd00776a064e">
  2. arun-gupta revised this gist Sep 14, 2024. 1 changed file with 24 additions and 28 deletions.
    52 changes: 24 additions & 28 deletions readme.md
    @@ -79,7 +79,7 @@ sudo apt-get -y install docker-ce docker-ce-cli containerd.io docker-buildx-plug
    sudo docker compose -f compose.yaml up -d
    ```
    - Verify the list of containers:
    ```
    sudo docker container ls
    CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
    f2a6fa5ea3b7 opea/chatqna-ui:latest "docker-entrypoint.s…" 12 seconds ago Up 10 seconds 0.0.0.0:5173->5173/tcp, :::5173->5173/tcp chatqna-xeon-ui-server
    @@ -115,19 +115,19 @@ Validate the services as explained in [OPEA on AWS document](https://gist.github
    It takes ~5 minutes for this service to be ready. Wait until you see this log output:
    ```
    . . .
    2024-09-12T02:14:07.324250Z INFO shard-manager: text_generation_launcher: Shard ready in 20.620625696s rank=0
    2024-09-12T02:14:07.380398Z INFO text_generation_launcher: Starting Webserver
    2024-09-12T02:14:07.526375Z INFO text_generation_router_v3: backends/v3/src/lib.rs:90: Warming up model
    2024-09-12T02:14:42.046106Z INFO text_generation_launcher: Cuda Graphs are disabled (CUDA_GRAPHS=None).
    2024-09-12T02:14:42.046591Z INFO text_generation_router_v3: backends/v3/src/lib.rs:102: Setting max batch total tokens to 83088
    2024-09-12T02:14:42.047332Z INFO text_generation_router_v3: backends/v3/src/lib.rs:126: Using backend V3
    2024-09-12T02:14:42.051963Z INFO text_generation_router::server: router/src/server.rs:1651: Using the Hugging Face API
    2024-09-12T02:14:42.066054Z INFO hf_hub: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/hf-hub-0.3.2/src/lib.rs:55: Token file not found "/root/.cache/huggingface/token"
    2024-09-12T02:14:42.473427Z INFO text_generation_router::server: router/src/server.rs:2349: Serving revision 52d98fdf3b050731f1476f1af6efc7970f14c67e of model Intel/neural-chat-7b-v3-3
    2024-09-12T02:14:42.516228Z INFO text_generation_router::server: router/src/server.rs:1747: Overriding LlamaTokenizer with TemplateProcessing to follow python override defined in https://github.com/huggingface/transformers/blob/4aa17d00690b7f82c95bb2949ea57e22c35b4336/src/transformers/models/llama/tokenization_llama_fast.py#L203-L205
    2024-09-12T02:14:42.516696Z INFO text_generation_router::server: router/src/server.rs:1781: Using config Some(Mistral)
    2024-09-12T02:14:42.516736Z WARN text_generation_router::server: router/src/server.rs:1928: Invalid hostname, defaulting to 0.0.0.0
    2024-09-12T02:14:42.528179Z INFO text_generation_router::server: router/src/server.rs:2311: Connected
    2024-09-14T20:38:05.558334Z INFO shard-manager: text_generation_launcher: Shard ready in 35.550264586s rank=0
    2024-09-14T20:38:05.639996Z INFO text_generation_launcher: Starting Webserver
    2024-09-14T20:38:05.708611Z INFO text_generation_router_v3: backends/v3/src/lib.rs:90: Warming up model
    2024-09-14T20:54:53.025600Z INFO text_generation_launcher: Cuda Graphs are disabled (CUDA_GRAPHS=None).
    2024-09-14T20:54:53.026040Z INFO text_generation_router_v3: backends/v3/src/lib.rs:102: Setting max batch total tokens to 80240
    2024-09-14T20:54:53.026618Z INFO text_generation_router_v3: backends/v3/src/lib.rs:126: Using backend V3
    2024-09-14T20:54:53.029554Z INFO text_generation_router::server: router/src/server.rs:1651: Using the Hugging Face API
    2024-09-14T20:54:53.037101Z INFO hf_hub: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/hf-hub-0.3.2/src/lib.rs:55: Token file not found "/root/.cache/huggingface/token"
    2024-09-14T20:54:53.467570Z INFO text_generation_router::server: router/src/server.rs:2349: Serving revision 52d98fdf3b050731f1476f1af6efc7970f14c67e of model Intel/neural-chat-7b-v3-3
    2024-09-14T20:54:53.513362Z INFO text_generation_router::server: router/src/server.rs:1747: Overriding LlamaTokenizer with TemplateProcessing to follow python override defined in https://github.com/huggingface/transformers/blob/4aa17d00690b7f82c95bb2949ea57e22c35b4336/src/transformers/models/llama/tokenization_llama_fast.py#L203-L205
    2024-09-14T20:54:53.513655Z INFO text_generation_router::server: router/src/server.rs:1781: Using config Some(Mistral)
    2024-09-14T20:54:53.513707Z WARN text_generation_router::server: router/src/server.rs:1928: Invalid hostname, defaulting to 0.0.0.0
    2024-09-14T20:54:53.523637Z INFO text_generation_router::server: router/src/server.rs:2311: Connected
    ```

    ### Let's run!
    @@ -162,14 +162,21 @@ Validate the services as explained in [OPEA on AWS document](https://gist.github
    curl -X POST "http://${host_ip}:6007/v1/dataprep" \
    -H "Content-Type: multipart/form-data" \
    -F 'link_list=["https://opea.dev"]'
    ```
    with the answer:
    ```
    {"status":200,"message":"Data preparation succeeded"}
    {"status":200,"message":"Data preparation succeeded"}status:200: command not found
    ```
    - Ask the question:
    ```
    arun_gupta@opea-demo:~$ curl http://${host_ip}:8888/v1/chatqna -H "Content-Type: application/json" -d '{
    curl http://${host_ip}:8888/v1/chatqna -H "Content-Type: application/json" -d '{
    "messages": "What is OPEA?"
    }'
    ```

    with the answer:

    ```
    data: b'\n'
    data: b'O'
    @@ -195,15 +202,4 @@ Validate the services as explained in [OPEA on AWS document](https://gist.github
    data: b'.'
    data: b' It'
    ```
    - Delete link from the knowledge base:
    ```
    [ec2-user@ip-172-31-77-194 ~]$ # delete link
    curl -X POST "http://${host_ip}:6007/v1/dataprep/delete_file" \
    -d '{"file_path": "https://opea.dev"}' \
    -H "Content-Type: application/json"
    {"detail":"File https://opea.dev not found. Please check file_path."}
    ```

    This fails with an error, tracked at https://github.com/opea-project/GenAIExamples/issues/724

  3. arun-gupta revised this gist Sep 14, 2024. No changes.
  4. arun-gupta revised this gist Sep 14, 2024. 1 changed file with 18 additions and 16 deletions.
    34 changes: 18 additions & 16 deletions readme.md
    @@ -20,22 +20,24 @@

    ### Install Docker:

    ```
    # Add Docker's official GPG key:
    sudo apt-get -y update
    sudo apt-get -y install ca-certificates curl
    sudo install -m 0755 -d /etc/apt/keyrings
    sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
    sudo chmod a+r /etc/apt/keyrings/docker.asc
    # Add the repository to Apt sources:
    echo \
    "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
    $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
    sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
    sudo apt-get -y update
    sudo apt-get -y install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
    ```
    NOTE: Pasting the entire command block at once did not work; the commands had to be copied line by line.

    ```
    # Add Docker's official GPG key:
    sudo apt-get -y update
    sudo apt-get -y install ca-certificates curl
    sudo install -m 0755 -d /etc/apt/keyrings
    sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
    sudo chmod a+r /etc/apt/keyrings/docker.asc
    # Add the repository to Apt sources:
    echo \
    "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
    $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
    sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
    sudo apt-get -y update
    sudo apt-get -y install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
    ```
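Before running the install, the apt source line that `tee` writes can be previewed without touching the system. This is a small sketch with fallbacks (`amd64`, `noble`) for when `dpkg` or `/etc/os-release` is unavailable:

```shell
# Preview the Docker apt source line without writing anything.
# Falls back to amd64/noble when dpkg or /etc/os-release is unavailable.
arch=$(dpkg --print-architecture 2>/dev/null || echo amd64)
codename=$( (. /etc/os-release 2>/dev/null && echo "$VERSION_CODENAME") || echo noble )
echo "deb [arch=${arch} signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu ${codename:-noble} stable"
```

Comparing this output against the contents of `/etc/apt/sources.list.d/docker.list` is a quick way to spot a line-by-line copy that went wrong.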

    ## Docker images

  5. arun-gupta revised this gist Sep 14, 2024. 1 changed file with 6 additions and 11 deletions.
    17 changes: 6 additions & 11 deletions readme.md
    @@ -5,23 +5,18 @@
    ### Ubuntu 24.04

    - https://portal.azure.com/
    - Name: `opea-demo`
    - Region: `(US) West US 2`
    - Availability zone: `Zone 2`
    - Image: `Ubuntu Server 24.04 LTS - x64 Gen2`
    - Size: `Standard_D8s_v4` (8 vcpus, 32 GiB memory)=
    - Key pair name: `azure-opea-key`
    - Click on `Next : Disks>`
    - Size: `Standard_D8s_v4` (8 vcpus, 32 GiB memory)
    - Key pair name: `azure-opea-demo`
    - Click on `Next : Disks >`
    - Choose OS disk size as `512 GB (p20)`
    - Select `Review + Create`
    - Once you see the message `Validation passed`, click on `Create` button
    - Click on `Download private key and create resource`

    ### Connect

    - Click on `Go to resource`
    - Click on `Connect`
    - Click on `Select` in `SSH using Azure CLI`

    - Click on `Download private key and create resource`, `Go to resource`
    - Click on `Connect` on top left, `Select` in `SSH using Azure CLI`

    ### Install Docker:

  6. arun-gupta revised this gist Sep 14, 2024. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion readme.md
    @@ -8,7 +8,7 @@
    - Region: `(US) West US 2`
    - Availability zone: `Zone 2`
    - Image: `Ubuntu Server 24.04 LTS - x64 Gen2`
    - Size: `Standard_D8s_v4`
    - Size: `Standard_D8s_v4` (8 vcpus, 32 GiB memory)=
    - Key pair name: `azure-opea-key`
    - Click on `Next : Disks>`
    - Choose OS disk size as `512 GB (p20)`
  7. arun-gupta revised this gist Sep 14, 2024. 1 changed file with 1 addition and 194 deletions.
    195 changes: 1 addition & 194 deletions readme.md
    @@ -106,85 +106,7 @@ Export `host_ip` environment variable:
    export host_ip=10.0.0.4
    ```

    ### Embedding service

    Test:

    ```
    curl ${host_ip}:6006/embed \
    -X POST \
    -d '{"inputs":"What is Deep Learning?"}' \
    -H 'Content-Type: application/json'
    ```

    Answer:
    ```
    [[0.00037115702,-0.06356819,0.0024758505,-0.012360337,0.050739925,0.023380278,0.022216318,0.0008076447,
    . . .
    0.022558564,-0.04570635,-0.033072025,0.022725677,0.016026087,-0.02125421,-0.02984927,-0.0049473033]]
    ```

    ### Embedding microservice
    Test:
    ```
    curl http://${host_ip}:6000/v1/embeddings\
    -X POST \
    -d '{"text":"hello"}' \
    -H 'Content-Type: application/json'
    ```

    Answer:
    ```
    {"id":"2d6bbb69f440491249e672d6039dfd5f","text":"hello","embedding":[0.0007791813,0.042613804
    . . .
    -0.0044034636],"search_type":"similarity","k":4,"distance_threshold":null,"fetch_k":20,"lambda_mult":0.5,"score_threshold":0.2}
    ```

    ### Retriever microservice

    Test:
    ```
    export your_embedding=$(python3 -c "import random; embedding = [random.uniform(-1, 1) for _ in range(768)]; print(embedding)")
    curl http://${host_ip}:7000/v1/retrieval \
    -X POST \
    -d "{\"text\":\"test\",\"embedding\":${your_embedding}}" \
    -H 'Content-Type: application/json'
    ```

    Answer:
    ```
    {"id":"429f21479c99008b1ade5d8720cc60dc","retrieved_docs":[],"initial_query":"test","top_n":1}
    ```

    ### TEI Reranking service

    Test:
    ```
    curl http://${host_ip}:8808/rerank \
    -X POST \
    -d '{"query":"What is Deep Learning?", "texts": ["Deep Learning is not...", "Deep learning is..."]}' \
    -H 'Content-Type: application/json'
    ```
    Answer:
    ```
    [{"index":1,"score":0.94238955},{"index":0,"score":0.120219156}]
    ```

    ### Reranking microservice

    Test:
    ```
    curl http://${host_ip}:8000/v1/reranking\
    -X POST \
    -d '{"initial_query":"What is Deep Learning?", "retrieved_docs": [{"text":"Deep Learning is not..."}, {"text":"Deep learning is..."}]}' \
    -H 'Content-Type: application/json'
    ```
    Answer:
    ```
    {"id":"65a489a9fae807039905008dce80ef6b","model":null,"query":"What is Deep Learning?","max_new_tokens":1024,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true,"chat_template":null,"documents":["Deep learning is..."]}
    ```
    Validate the services as explained in [OPEA on AWS document](https://gist.github.com/arun-gupta/7e9f080feff664fbab878b26d13d83d7).

    ### LLM Backend Service

    @@ -210,121 +132,6 @@ Answer:
    2024-09-12T02:14:42.516736Z WARN text_generation_router::server: router/src/server.rs:1928: Invalid hostname, defaulting to 0.0.0.0
    2024-09-12T02:14:42.528179Z INFO text_generation_router::server: router/src/server.rs:2311: Connected
    ```
    - Check TGI service:
    ```
    # TGI service
    curl http://${host_ip}:9009/generate \
    -X POST \
    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":17, "do_sample": true}}' \
    -H 'Content-Type: application/json'
    ```
    with the response:
    ```
    {"generated_text":"\n\nDeep Learning is a subset of machine learning which focuses on algorithms that learn from"}
    ```
    - Check vLLM service:
    ```
    curl http://${host_ip}:9009/v1/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "Intel/neural-chat-7b-v3-3", "prompt": "What is Deep Learning?", "max_tokens": 32, "temperature": 0}'
    ```
    with the response:
    ```
    {"object":"text_completion","id":"","created":1726117774,"model":"Intel/neural-chat-7b-v3-3","system_fingerprint":"2.2.1-dev0-sha-e4201f4-intel-cpu","choices":[{"index":0,"text":"\n\nDeep Learning is a subset of Machine Learning that is concerned with algorithms inspired by the structure and function of the brain. It is a part of Artificial","logprobs":null,"finish_reason":"length"}],"usage":{"prompt_tokens":6,"completion_tokens":32,"total_tokens":38}}
    ```
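To read just the generated text rather than the whole JSON body, the response can be piped through a one-line `python3` parser. A minimal sketch, with the sample TGI response above inlined as a variable; in practice pipe `curl`'s output instead:

```shell
# Sketch: extract generated_text from a TGI /generate response.
# The JSON here is a sample; pipe the real curl output in practice.
response='{"generated_text":"\n\nDeep Learning is a subset of machine learning which focuses on algorithms that learn from"}'
echo "$response" | python3 -c 'import json,sys; print(json.load(sys.stdin)["generated_text"].strip())'
```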

    ### LLM microservice

    Test:

    ```
    curl http://${host_ip}:9000/v1/chat/completions\
    -X POST \
    -d '{"query":"What is Deep Learning?","max_new_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \
    -H 'Content-Type: application/json'
    ```

    Answer:
    ```
    data: b'\n'
    data: b'\n'
    data: b'Deep'
    data: b' learning'
    data: b' is'
    data: b' a'
    data: b' subset'
    data: b' of'
    data: b' machine'
    data: b' learning'
    data: b' that'
    data: b' uses'
    data: b' algorithms'
    data: b' to'
    data: b' learn'
    data: b' from'
    data: b' data'
    data: [DONE]
    ```

    ### Megaservice

    Test:
    ```
    curl http://${host_ip}:8888/v1/chatqna -H "Content-Type: application/json" -d '{
    "messages": "What is the revenue of Nike in 2023?"
    }'
    ```

    Answer:
    ```
    data: b'\n'
    data: b'\n'
    data: b'N'
    data: b'ike'
    data: b"'"
    data: b's'
    data: b' revenue'
    data: b' for'
    . . .
    data: b' popularity'
    data: b' among'
    data: b' consumers'
    data: b'.'
    data: b'</s>'
    data: [DONE]
    ```
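The streamed `data: b'...'` lines can be stitched back into plain text with standard tools. A sketch with a few sample lines inlined; in practice pipe the `curl` output instead:

```shell
# Sketch: reassemble a token stream into plain text.
# Extract the quoted token from each data line, join, drop literal \n escapes.
printf "%s\n" "data: b'\\n'" "data: b'Deep'" "data: b' learning'" "data: [DONE]" |
  sed -n "s/^data: b'\\(.*\\)'\$/\\1/p" | tr -d '\n' | sed 's/\\n//g'
```

The `[DONE]` sentinel does not match the pattern, so it is dropped automatically.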


    ### Let's run!

  8. arun-gupta revised this gist Sep 14, 2024. 1 changed file with 12 additions and 12 deletions.
    24 changes: 12 additions & 12 deletions readme.md
    @@ -83,19 +83,19 @@
    ```
    - Verify the list of containers:
    ```
    $ sudo docker container ls
    sudo docker container ls
    CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
    dbd94a818b0d opea/chatqna-ui:latest "docker-entrypoint.s…" 24 seconds ago Up 22 seconds 0.0.0.0:5173->5173/tcp, :::5173->5173/tcp chatqna-xeon-ui-server
    3433b05a6a0b opea/chatqna:latest "python chatqna.py" 24 seconds ago Up 22 seconds 0.0.0.0:8888->8888/tcp, :::8888->8888/tcp chatqna-xeon-backend-server
    3c5c036ae59b opea/dataprep-redis:latest "python prepare_doc_…" 25 seconds ago Up 23 seconds 0.0.0.0:6007->6007/tcp, :::6007->6007/tcp dataprep-redis-server
    08fad4a403cc opea/retriever-redis:latest "python retriever_re…" 25 seconds ago Up 23 seconds 0.0.0.0:7000->7000/tcp, :::7000->7000/tcp retriever-redis-server
    110c686f9c9c opea/llm-tgi:latest "bash entrypoint.sh" 25 seconds ago Up 23 seconds 0.0.0.0:9000->9000/tcp, :::9000->9000/tcp llm-tgi-server
    40cc0fcd293e opea/reranking-tei:latest "python reranking_te…" 25 seconds ago Up 23 seconds 0.0.0.0:8000->8000/tcp, :::8000->8000/tcp reranking-tei-xeon-server
    696959d09c87 opea/embedding-tei:latest "python embedding_te…" 25 seconds ago Up 23 seconds 0.0.0.0:6000->6000/tcp, :::6000->6000/tcp embedding-tei-server
    33549bbb37c3 ghcr.io/huggingface/text-generation-inference:sha-e4201f4-intel-cpu "text-generation-lau…" 25 seconds ago Up 23 seconds 0.0.0.0:9009->80/tcp, [::]:9009->80/tcp tgi-service
    6d48620d2958 redis/redis-stack:7.2.0-v9 "/entrypoint.sh" 25 seconds ago Up 23 seconds 0.0.0.0:6379->6379/tcp, :::6379->6379/tcp, 0.0.0.0:8001->8001/tcp, :::8001->8001/tcp redis-vector-db
    e1e2e862df01 ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 "text-embeddings-rou…" 25 seconds ago Up 23 seconds 0.0.0.0:8808->80/tcp, [::]:8808->80/tcp tei-reranking-server
    958e04b00fa8 ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 "text-embeddings-rou…" 25 seconds ago Up 23 seconds 0.0.0.0:6006->80/tcp, [::]:6006->80/tcp tei-embedding-server
    f2a6fa5ea3b7 opea/chatqna-ui:latest "docker-entrypoint.s…" 12 seconds ago Up 10 seconds 0.0.0.0:5173->5173/tcp, :::5173->5173/tcp chatqna-xeon-ui-server
    c88745a81f54 opea/chatqna:latest "python chatqna.py" 12 seconds ago Up 10 seconds 0.0.0.0:8888->8888/tcp, :::8888->8888/tcp chatqna-xeon-backend-server
    00f9b2f5c296 opea/dataprep-redis:latest "python prepare_doc_…" 12 seconds ago Up 11 seconds 0.0.0.0:6007->6007/tcp, :::6007->6007/tcp dataprep-redis-server
    886350aea6fc opea/llm-tgi:latest "bash entrypoint.sh" 12 seconds ago Up 11 seconds 0.0.0.0:9000->9000/tcp, :::9000->9000/tcp llm-tgi-server
    018d363ed61b opea/retriever-redis:latest "python retriever_re…" 12 seconds ago Up 11 seconds 0.0.0.0:7000->7000/tcp, :::7000->7000/tcp retriever-redis-server
    826d5ec265f3 opea/embedding-tei:latest "python embedding_te…" 12 seconds ago Up 11 seconds 0.0.0.0:6000->6000/tcp, :::6000->6000/tcp embedding-tei-server
    ef4e354cf4cb opea/reranking-tei:latest "python reranking_te…" 12 seconds ago Up 11 seconds 0.0.0.0:8000->8000/tcp, :::8000->8000/tcp reranking-tei-xeon-server
    b2af32528f92 ghcr.io/huggingface/text-generation-inference:sha-e4201f4-intel-cpu "text-generation-lau…" 12 seconds ago Up 11 seconds 0.0.0.0:9009->80/tcp, [::]:9009->80/tcp tgi-service
    ffd17623f9a2 redis/redis-stack:7.2.0-v9 "/entrypoint.sh" 12 seconds ago Up 11 seconds 0.0.0.0:6379->6379/tcp, :::6379->6379/tcp, 0.0.0.0:8001->8001/tcp, :::8001->8001/tcp redis-vector-db
    52f70df956a2 ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 "text-embeddings-rou…" 12 seconds ago Up 11 seconds 0.0.0.0:8808->80/tcp, [::]:8808->80/tcp tei-reranking-server
    6cd64dca38c1 ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 "text-embeddings-rou…" 12 seconds ago Up 11 seconds 0.0.0.0:6006->80/tcp, [::]:6006->80/tcp tei-embedding-server
    ```

    ## Validate Services
  9. arun-gupta revised this gist Sep 14, 2024. 1 changed file with 19 additions and 21 deletions.
    40 changes: 19 additions & 21 deletions readme.md
    @@ -9,23 +9,21 @@
    - Availability zone: `Zone 2`
    - Image: `Ubuntu Server 24.04 LTS - x64 Gen2`
    - Size: `Standard_D8s_v4`
    - Key pair name: `opea-demo-key`
    - Key pair name: `azure-opea-key`
    - Click on `Next : Disks>`
    - Choose OS disk size as `512 GB (p20)`
    - Select `Review + Create`
    - Once you see the message `Validation passed`, click on `Create` button
    - Click on `Download private key and create resource`


    ### Connect

    - Click on `Go to resource`
    - Click on `Connect`
    - Click on `Select` in `SSH using Azure CLI`


    - Ubuntu 24.04 LTS

    - Install Docker:
    ### Install Docker:

    ```
    # Add Docker's official GPG key:
    @@ -53,7 +51,7 @@
    ```
    - Replace HuggingFace API token and private IP address of the host below and copy the contents in a file named `.env`:
    ```
    host_ip=10.128.0.3 #private IP address of the host
    host_ip=10.0.0.4 #private IP address of the host
    no_proxy=${host_ip}
    HUGGINGFACEHUB_API_TOKEN="Your_Huggingface_API_Token"
    EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5"
    @@ -84,28 +82,28 @@
    sudo docker compose -f compose.yaml up -d
    ```
    - Verify the list of containers:
    ```
    arun_gupta@opea-demo:~$ sudo docker container ls
    CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
    65b54e433cfe opea/chatqna-ui:latest "docker-entrypoint.s…" 7 seconds ago Up 6 seconds 0.0.0.0:5173->5173/tcp, :::5173->5173/tcp chatqna-xeon-ui-server
    798310e0ca77 opea/chatqna:latest "python chatqna.py" 7 seconds ago Up 6 seconds 0.0.0.0:8888->8888/tcp, :::8888->8888/tcp chatqna-xeon-backend-server
    362f3117a528 opea/dataprep-redis:latest "python prepare_doc_…" 7 seconds ago Up 6 seconds 0.0.0.0:6007->6007/tcp, :::6007->6007/tcp dataprep-redis-server
    3985a4de5dc4 opea/embedding-tei:latest "python embedding_te…" 7 seconds ago Up 6 seconds 0.0.0.0:6000->6000/tcp, :::6000->6000/tcp embedding-tei-server
    b41907df6672 opea/reranking-tei:latest "python reranking_te…" 7 seconds ago Up 6 seconds 0.0.0.0:8000->8000/tcp, :::8000->8000/tcp reranking-tei-xeon-server
    19d9a30f85de opea/llm-tgi:latest "bash entrypoint.sh" 7 seconds ago Up 6 seconds 0.0.0.0:9000->9000/tcp, :::9000->9000/tcp llm-tgi-server
    3fa19c8ec722 opea/retriever-redis:latest "python retriever_re…" 7 seconds ago Up 6 seconds 0.0.0.0:7000->7000/tcp, :::7000->7000/tcp retriever-redis-server
    14b5ccd5416c ghcr.io/huggingface/text-generation-inference:sha-e4201f4-intel-cpu "text-generation-lau…" 19 seconds ago Up 7 seconds 0.0.0.0:9009->80/tcp, [::]:9009->80/tcp tgi-service
    8f58f9aaefae redis/redis-stack:7.2.0-v9 "/entrypoint.sh" 19 seconds ago Up 7 seconds 0.0.0.0:6379->6379/tcp, :::6379->6379/tcp, 0.0.0.0:8001->8001/tcp, :::8001->8001/tcp redis-vector-db
    931126a552cb ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 "text-embeddings-rou…" 19 seconds ago Up 7 seconds 0.0.0.0:6006->80/tcp, [::]:6006->80/tcp tei-embedding-server
    5a2c435edc0f ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 "text-embeddings-rou…" 19 seconds ago Up 7 seconds 0.0.0.0:8808->80/tcp, [::]:8808->80/tcp tei-reranking-server
    ```
    $ sudo docker container ls
    CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
    dbd94a818b0d opea/chatqna-ui:latest "docker-entrypoint.s…" 24 seconds ago Up 22 seconds 0.0.0.0:5173->5173/tcp, :::5173->5173/tcp chatqna-xeon-ui-server
    3433b05a6a0b opea/chatqna:latest "python chatqna.py" 24 seconds ago Up 22 seconds 0.0.0.0:8888->8888/tcp, :::8888->8888/tcp chatqna-xeon-backend-server
    3c5c036ae59b opea/dataprep-redis:latest "python prepare_doc_…" 25 seconds ago Up 23 seconds 0.0.0.0:6007->6007/tcp, :::6007->6007/tcp dataprep-redis-server
    08fad4a403cc opea/retriever-redis:latest "python retriever_re…" 25 seconds ago Up 23 seconds 0.0.0.0:7000->7000/tcp, :::7000->7000/tcp retriever-redis-server
    110c686f9c9c opea/llm-tgi:latest "bash entrypoint.sh" 25 seconds ago Up 23 seconds 0.0.0.0:9000->9000/tcp, :::9000->9000/tcp llm-tgi-server
    40cc0fcd293e opea/reranking-tei:latest "python reranking_te…" 25 seconds ago Up 23 seconds 0.0.0.0:8000->8000/tcp, :::8000->8000/tcp reranking-tei-xeon-server
    696959d09c87 opea/embedding-tei:latest "python embedding_te…" 25 seconds ago Up 23 seconds 0.0.0.0:6000->6000/tcp, :::6000->6000/tcp embedding-tei-server
    33549bbb37c3 ghcr.io/huggingface/text-generation-inference:sha-e4201f4-intel-cpu "text-generation-lau…" 25 seconds ago Up 23 seconds 0.0.0.0:9009->80/tcp, [::]:9009->80/tcp tgi-service
    6d48620d2958 redis/redis-stack:7.2.0-v9 "/entrypoint.sh" 25 seconds ago Up 23 seconds 0.0.0.0:6379->6379/tcp, :::6379->6379/tcp, 0.0.0.0:8001->8001/tcp, :::8001->8001/tcp redis-vector-db
    e1e2e862df01 ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 "text-embeddings-rou…" 25 seconds ago Up 23 seconds 0.0.0.0:8808->80/tcp, [::]:8808->80/tcp tei-reranking-server
    958e04b00fa8 ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 "text-embeddings-rou…" 25 seconds ago Up 23 seconds 0.0.0.0:6006->80/tcp, [::]:6006->80/tcp tei-embedding-server
    ```

    ## Validate Services

    Export `host_ip` environment variable:

    ```
    export host_ip=10.128.0.3
    export host_ip=10.0.0.4
    ```

    ### Embedding service
  10. arun-gupta revised this gist Sep 14, 2024. No changes.
  11. arun-gupta revised this gist Sep 13, 2024. 1 changed file with 3 additions and 3 deletions.
    6 changes: 3 additions & 3 deletions readme.md
    @@ -9,17 +9,17 @@
    - Availability zone: `Zone 2`
    - Image: `Ubuntu Server 24.04 LTS - x64 Gen2`
    - Size: `Standard_D8s_v4`
    - Key pair name: `opea-azure`
    - Key pair name: `opea-demo-key`
    - Click on `Next : Disks>`
    - Choose OS disk size as `512 GB (p20)`
    - Select `Review + Create`


    - Once you see the message `Validation passed`, click on `Create` button
    - Click on `Download private key and create resource`


    ### Connect

    - Click on `Go to resource`
    - Click on `Connect`


  12. arun-gupta revised this gist Sep 13, 2024. 1 changed file with 16 additions and 2 deletions.
    18 changes: 16 additions & 2 deletions readme.md
    @@ -5,10 +5,24 @@
    ### Ubuntu 24.04

    - https://portal.azure.com/
    - `D8s_v4`
    - Region: `(US) West US 2`
    - Availability zone: `Zone 2`
    - Image: `Ubuntu Server 24.04 LTS - x64 Gen2`
    - Size: `Standard_D8s_v4`
    - Key pair name: `opea-azure`
    - Click on `Next : Disks>`
    - Choose OS disk size as `512 GB (p20)`
    - Select `Review + Create`


    - Once you see the message `Validation passed`, click on `Create` button
    - Click on `Download private key and create resource`

    ### Connect

    - Click on `Connect`


    - Change boot disk to `500 GB`
    - Ubuntu 24.04 LTS

    - Install Docker:
  13. arun-gupta revised this gist Sep 13, 2024. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion readme.md
    @@ -5,7 +5,7 @@
    ### Ubuntu 24.04

    - https://portal.azure.com/
    - `D16ds_v5`
    - `D8s_v4`


    - Change boot disk to `500 GB`
  14. arun-gupta created this gist Sep 13, 2024.
    393 changes: 393 additions & 0 deletions readme.md
    @@ -0,0 +1,393 @@
    # OPEA on Microsoft Azure using Docker Compose

    ## Create your instance

    ### Ubuntu 24.04

    - https://portal.azure.com/
    - `D16ds_v5`


    - Change boot disk to `500 GB`
    - Ubuntu 24.04 LTS

    - Install Docker:

    ```
    # Add Docker's official GPG key:
    sudo apt-get -y update
    sudo apt-get -y install ca-certificates curl
    sudo install -m 0755 -d /etc/apt/keyrings
    sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
    sudo chmod a+r /etc/apt/keyrings/docker.asc
    # Add the repository to Apt sources:
    echo \
    "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
    $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
    sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
    sudo apt-get -y update
    sudo apt-get -y install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
    ```

    ## Docker images

    - Pull OPEA Docker images:
    ```
    sudo docker pull opea/chatqna:latest
    sudo docker pull opea/chatqna-conversation-ui:latest
    ```
    - Replace HuggingFace API token and private IP address of the host below and copy the contents in a file named `.env`:
    ```
    host_ip=10.128.0.3 #private IP address of the host
    no_proxy=${host_ip}
    HUGGINGFACEHUB_API_TOKEN="Your_Huggingface_API_Token"
    EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5"
    RERANK_MODEL_ID="BAAI/bge-reranker-base"
    LLM_MODEL_ID="Intel/neural-chat-7b-v3-3"
    TEI_EMBEDDING_ENDPOINT="http://${host_ip}:6006"
    TEI_RERANKING_ENDPOINT="http://${host_ip}:8808"
    TGI_LLM_ENDPOINT="http://${host_ip}:9009"
    REDIS_URL="redis://${host_ip}:6379"
    INDEX_NAME="rag-redis"
    REDIS_HOST=${host_ip}
    MEGA_SERVICE_HOST_IP=${host_ip}
    EMBEDDING_SERVICE_HOST_IP=${host_ip}
    RETRIEVER_SERVICE_HOST_IP=${host_ip}
    RERANK_SERVICE_HOST_IP=${host_ip}
    LLM_SERVICE_HOST_IP=${host_ip}
    BACKEND_SERVICE_ENDPOINT="http://${host_ip}:8888/v1/chatqna"
    DATAPREP_SERVICE_ENDPOINT="http://${host_ip}:6007/v1/dataprep"
    DATAPREP_GET_FILE_ENDPOINT="http://${host_ip}:6007/v1/dataprep/get_file"
    DATAPREP_DELETE_FILE_ENDPOINT="http://${host_ip}:6007/v1/dataprep/delete_file"
    ```
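Since every endpoint in `.env` is derived from `host_ip`, a quick sanity check is to expand a couple of the entries and confirm the URLs they resolve to. A sketch with a placeholder IP:

```shell
# Sketch: expand two .env entries to confirm the derived endpoints.
# host_ip below is a placeholder; use your host's private IP.
host_ip=10.128.0.3
TGI_LLM_ENDPOINT="http://${host_ip}:9009"
BACKEND_SERVICE_ENDPOINT="http://${host_ip}:8888/v1/chatqna"
echo "$TGI_LLM_ENDPOINT"
echo "$BACKEND_SERVICE_ENDPOINT"
```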
    - Download Docker Compose file:
    ```
    curl -O https://raw.githubusercontent.com/opea-project/GenAIExamples/main/ChatQnA/docker_compose/intel/cpu/xeon/compose.yaml
    ```
    - Start the application:
    ```
    sudo docker compose -f compose.yaml up -d
    ```
    - Verify the list of containers:
    ```
    arun_gupta@opea-demo:~$ sudo docker container ls
    CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
    65b54e433cfe opea/chatqna-ui:latest "docker-entrypoint.s…" 7 seconds ago Up 6 seconds 0.0.0.0:5173->5173/tcp, :::5173->5173/tcp chatqna-xeon-ui-server
    798310e0ca77 opea/chatqna:latest "python chatqna.py" 7 seconds ago Up 6 seconds 0.0.0.0:8888->8888/tcp, :::8888->8888/tcp chatqna-xeon-backend-server
    362f3117a528 opea/dataprep-redis:latest "python prepare_doc_…" 7 seconds ago Up 6 seconds 0.0.0.0:6007->6007/tcp, :::6007->6007/tcp dataprep-redis-server
    3985a4de5dc4 opea/embedding-tei:latest "python embedding_te…" 7 seconds ago Up 6 seconds 0.0.0.0:6000->6000/tcp, :::6000->6000/tcp embedding-tei-server
    b41907df6672 opea/reranking-tei:latest "python reranking_te…" 7 seconds ago Up 6 seconds 0.0.0.0:8000->8000/tcp, :::8000->8000/tcp reranking-tei-xeon-server
    19d9a30f85de opea/llm-tgi:latest "bash entrypoint.sh" 7 seconds ago Up 6 seconds 0.0.0.0:9000->9000/tcp, :::9000->9000/tcp llm-tgi-server
    3fa19c8ec722 opea/retriever-redis:latest "python retriever_re…" 7 seconds ago Up 6 seconds 0.0.0.0:7000->7000/tcp, :::7000->7000/tcp retriever-redis-server
    14b5ccd5416c ghcr.io/huggingface/text-generation-inference:sha-e4201f4-intel-cpu "text-generation-lau…" 19 seconds ago Up 7 seconds 0.0.0.0:9009->80/tcp, [::]:9009->80/tcp tgi-service
    8f58f9aaefae redis/redis-stack:7.2.0-v9 "/entrypoint.sh" 19 seconds ago Up 7 seconds 0.0.0.0:6379->6379/tcp, :::6379->6379/tcp, 0.0.0.0:8001->8001/tcp, :::8001->8001/tcp redis-vector-db
    931126a552cb ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 "text-embeddings-rou…" 19 seconds ago Up 7 seconds 0.0.0.0:6006->80/tcp, [::]:6006->80/tcp tei-embedding-server
    5a2c435edc0f ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 "text-embeddings-rou…" 19 seconds ago Up 7 seconds 0.0.0.0:8808->80/tcp, [::]:8808->80/tcp tei-reranking-server
    ```

    ## Validate Services

    Export `host_ip` environment variable:

    ```
    export host_ip=10.128.0.3
    ```
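    If you'd rather not hardcode the address, the primary interface IP can usually be derived automatically. This is only a sketch: `hostname -I` is Linux-specific, and the first address it reports may not be the one your services are bound to, so verify it matches the VM's internal IP.
    ```
    # Derive host_ip from the first address the kernel reports (Linux only);
    # fall back to loopback if nothing is reported.
    host_ip=$(hostname -I 2>/dev/null | awk '{print $1}')
    host_ip=${host_ip:-127.0.0.1}
    echo "host_ip=${host_ip}"
    ```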

    ### Embedding service

    Test:

    ```
    curl ${host_ip}:6006/embed \
    -X POST \
    -d '{"inputs":"What is Deep Learning?"}' \
    -H 'Content-Type: application/json'
    ```

    Answer:
    ```
    [[0.00037115702,-0.06356819,0.0024758505,-0.012360337,0.050739925,0.023380278,0.022216318,0.0008076447,
    . . .
    0.022558564,-0.04570635,-0.033072025,0.022725677,0.016026087,-0.02125421,-0.02984927,-0.0049473033]]
    ```

    ### Embedding microservice
    Test:
    ```
    curl http://${host_ip}:6000/v1/embeddings \
    -X POST \
    -d '{"text":"hello"}' \
    -H 'Content-Type: application/json'
    ```

    Answer:
    ```
    {"id":"2d6bbb69f440491249e672d6039dfd5f","text":"hello","embedding":[0.0007791813,0.042613804
    . . .
    -0.0044034636],"search_type":"similarity","k":4,"distance_threshold":null,"fetch_k":20,"lambda_mult":0.5,"score_threshold":0.2}
    ```

    ### Retriever microservice

    Test:
    ```
    export your_embedding=$(python3 -c "import random; embedding = [random.uniform(-1, 1) for _ in range(768)]; print(embedding)")
    curl http://${host_ip}:7000/v1/retrieval \
    -X POST \
    -d "{\"text\":\"test\",\"embedding\":${your_embedding}}" \
    -H 'Content-Type: application/json'
    ```

    Answer:
    ```
    {"id":"429f21479c99008b1ade5d8720cc60dc","retrieved_docs":[],"initial_query":"test","top_n":1}
    ```
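    The random 768-element vector above stands in for a real embedding. The dimension 768 is assumed here because it is the output size of the `BAAI/bge-base-en-v1.5` model this stack's TEI service typically serves; if your compose file uses a different embedding model, adjust accordingly. A quick sanity check on the vector length before posting it:
    ```
    # Generate the same stand-in embedding and confirm it has 768 dimensions,
    # matching what the retriever's index expects.
    your_embedding=$(python3 -c "import random; print([random.uniform(-1, 1) for _ in range(768)])")
    python3 -c "import ast, sys; print(len(ast.literal_eval(sys.argv[1])))" "$your_embedding"
    ```
    This should print `768`.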

    ### TEI Reranking service

    Test:
    ```
    curl http://${host_ip}:8808/rerank \
    -X POST \
    -d '{"query":"What is Deep Learning?", "texts": ["Deep Learning is not...", "Deep learning is..."]}' \
    -H 'Content-Type: application/json'
    ```
    Answer:
    ```
    [{"index":1,"score":0.94238955},{"index":0,"score":0.120219156}]
    ```
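    The reranker returns one `{index, score}` pair per candidate text, and the highest score marks the most relevant one (index 1 here). The sketch below replays the captured response above rather than making a live call, and picks the winner:
    ```
    # Replay the sample reranker response and select the index with the highest score.
    response='[{"index":1,"score":0.94238955},{"index":0,"score":0.120219156}]'
    best=$(echo "$response" | python3 -c "import json, sys; r = json.load(sys.stdin); print(max(r, key=lambda d: d['score'])['index'])")
    echo "most relevant: index $best"
    ```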

    ### Reranking microservice

    Test:
    ```
    curl http://${host_ip}:8000/v1/reranking \
    -X POST \
    -d '{"initial_query":"What is Deep Learning?", "retrieved_docs": [{"text":"Deep Learning is not..."}, {"text":"Deep learning is..."}]}' \
    -H 'Content-Type: application/json'
    ```
    Answer:
    ```
    {"id":"65a489a9fae807039905008dce80ef6b","model":null,"query":"What is Deep Learning?","max_new_tokens":1024,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true,"chat_template":null,"documents":["Deep learning is..."]}
    ```

    ### LLM Backend Service

    - Check logs:
    ```
    sudo docker logs tgi-service
    ```

    It takes ~5 minutes for this service to be ready. Wait till you see this log output:
    ```
    . . .
    2024-09-12T02:14:07.324250Z INFO shard-manager: text_generation_launcher: Shard ready in 20.620625696s rank=0
    2024-09-12T02:14:07.380398Z INFO text_generation_launcher: Starting Webserver
    2024-09-12T02:14:07.526375Z INFO text_generation_router_v3: backends/v3/src/lib.rs:90: Warming up model
    2024-09-12T02:14:42.046106Z INFO text_generation_launcher: Cuda Graphs are disabled (CUDA_GRAPHS=None).
    2024-09-12T02:14:42.046591Z INFO text_generation_router_v3: backends/v3/src/lib.rs:102: Setting max batch total tokens to 83088
    2024-09-12T02:14:42.047332Z INFO text_generation_router_v3: backends/v3/src/lib.rs:126: Using backend V3
    2024-09-12T02:14:42.051963Z INFO text_generation_router::server: router/src/server.rs:1651: Using the Hugging Face API
    2024-09-12T02:14:42.066054Z INFO hf_hub: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/hf-hub-0.3.2/src/lib.rs:55: Token file not found "/root/.cache/huggingface/token"
    2024-09-12T02:14:42.473427Z INFO text_generation_router::server: router/src/server.rs:2349: Serving revision 52d98fdf3b050731f1476f1af6efc7970f14c67e of model Intel/neural-chat-7b-v3-3
    2024-09-12T02:14:42.516228Z INFO text_generation_router::server: router/src/server.rs:1747: Overriding LlamaTokenizer with TemplateProcessing to follow python override defined in https://github.com/huggingface/transformers/blob/4aa17d00690b7f82c95bb2949ea57e22c35b4336/src/transformers/models/llama/tokenization_llama_fast.py#L203-L205
    2024-09-12T02:14:42.516696Z INFO text_generation_router::server: router/src/server.rs:1781: Using config Some(Mistral)
    2024-09-12T02:14:42.516736Z WARN text_generation_router::server: router/src/server.rs:1928: Invalid hostname, defaulting to 0.0.0.0
    2024-09-12T02:14:42.528179Z INFO text_generation_router::server: router/src/server.rs:2311: Connected
    ```
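    Rather than eyeballing the log, readiness can be scripted by grepping for the final `Connected` line. The snippet below replays a captured log line through a stand-in function so it runs anywhere; on the VM, replace the function body with `sudo docker logs tgi-service 2>&1`:
    ```
    # Stand-in for `sudo docker logs tgi-service 2>&1`: replays the last startup line.
    tgi_logs() {
      cat <<'EOF'
    2024-09-12T02:14:42.528179Z  INFO text_generation_router::server: router/src/server.rs:2311: Connected
    EOF
    }
    if tgi_logs | grep -q 'Connected'; then
      echo "TGI is ready"
    else
      echo "TGI still warming up"
    fi
    ```
    In practice you would wrap the check in a loop with a `sleep` between attempts, since the model takes several minutes to warm up.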
    - Check TGI service:
    ```
    # TGI service
    curl http://${host_ip}:9009/generate \
    -X POST \
    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":17, "do_sample": true}}' \
    -H 'Content-Type: application/json'
    ```
    with the response:
    ```
    {"generated_text":"\n\nDeep Learning is a subset of machine learning which focuses on algorithms that learn from"}
    ```
    - Check vLLM service:
    ```
    curl http://${host_ip}:9009/v1/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "Intel/neural-chat-7b-v3-3", "prompt": "What is Deep Learning?", "max_tokens": 32, "temperature": 0}'
    ```
    with the response:
    ```
    {"object":"text_completion","id":"","created":1726117774,"model":"Intel/neural-chat-7b-v3-3","system_fingerprint":"2.2.1-dev0-sha-e4201f4-intel-cpu","choices":[{"index":0,"text":"\n\nDeep Learning is a subset of Machine Learning that is concerned with algorithms inspired by the structure and function of the brain. It is a part of Artificial","logprobs":null,"finish_reason":"length"}],"usage":{"prompt_tokens":6,"completion_tokens":32,"total_tokens":38}}
    ```

    ### LLM microservice

    Test:

    ```
    curl http://${host_ip}:9000/v1/chat/completions \
    -X POST \
    -d '{"query":"What is Deep Learning?","max_new_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \
    -H 'Content-Type: application/json'
    ```

    Answer:
    ```
    data: b'\n'
    data: b'\n'
    data: b'Deep'
    data: b' learning'
    data: b' is'
    data: b' a'
    data: b' subset'
    data: b' of'
    data: b' machine'
    data: b' learning'
    data: b' that'
    data: b' uses'
    data: b' algorithms'
    data: b' to'
    data: b' learn'
    data: b' from'
    data: b' data'
    data: [DONE]
    ```
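    The microservice streams SSE-style `data:` chunks; joined together they form the full answer. The sketch below reassembles a few of the captured chunks above into plain text using a here-doc replay; a live run would pipe the `curl` output into the same filter instead:
    ```
    # Strip the `data: b'...'` framing, join the chunks, then expand the literal
    # \n escapes into real newlines. The [DONE] sentinel does not match and is dropped.
    text=$(sed -n "s/^data: b'\(.*\)'$/\1/p" <<'EOF' | tr -d '\n'
    data: b'\n'
    data: b'\n'
    data: b'Deep'
    data: b' learning'
    data: b' is'
    data: b' a'
    data: b' subset'
    data: b' of'
    data: b' machine'
    data: b' learning'
    data: [DONE]
    EOF
    )
    printf '%b\n' "$text"
    ```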

    ### Megaservice

    Test:
    ```
    curl http://${host_ip}:8888/v1/chatqna -H "Content-Type: application/json" -d '{
    "messages": "What is the revenue of Nike in 2023?"
    }'
    ```

    Answer:
    ```
    data: b'\n'
    data: b'\n'
    data: b'N'
    data: b'ike'
    data: b"'"
    data: b's'
    data: b' revenue'
    data: b' for'
    . . .
    data: b' popularity'
    data: b' among'
    data: b' consumers'
    data: b'.'
    data: b'</s>'
    data: [DONE]
    ```


    ### Let's run!

    #### RAG using hyperlink

    - Ask the question:
    ```
    curl http://${host_ip}:8888/v1/chatqna -H "Content-Type: application/json" -d '{
    "messages": "What is OPEA?"
    }'
    data: b'\n'
    data: b'\n'
    data: b'The'
    data: b' Oklahoma'
    data: b' Public'
    data: b' Em'
    data: b'ploy'
    data: b'ees'
    data: b' Association'
    ```
    - Update knowledge base:
    ```
    curl -X POST "http://${host_ip}:6007/v1/dataprep" \
    -H "Content-Type: multipart/form-data" \
    -F 'link_list=["https://opea.dev"]'
    {"status":200,"message":"Data preparation succeeded"}
    ```
    - Ask the question:
    ```
    curl http://${host_ip}:8888/v1/chatqna -H "Content-Type: application/json" -d '{
    "messages": "What is OPEA?"
    }'
    data: b'\n'
    data: b'O'
    data: b'PE'
    data: b'A'
    data: b' stands'
    data: b' for'
    data: b' Open'
    data: b' Platform'
    data: b' for'
    data: b' Enterprise'
    data: b' AI'
    data: b'.'
    data: b' It'
    ```
    - Delete link from the knowledge base:
    ```
    curl -X POST "http://${host_ip}:6007/v1/dataprep/delete_file" \
    -d '{"file_path": "https://opea.dev"}' \
    -H "Content-Type: application/json"
    {"detail":"File https://opea.dev not found. Please check file_path."}
    ```

    The delete call currently fails with the error above; this is tracked at https://github.com/opea-project/GenAIExamples/issues/724.