|
|
@@ -3,10 +3,7 @@ |
|
|
"nbformat_minor": 0, |
|
|
"metadata": { |
|
|
"colab": { |
|
|
"provenance": [], |
|
|
"mount_file_id": "1IVBrqqDZiGRwldBJ_SsPtkyEpVJ-diDI", |
|
|
"authorship_tag": "ABX9TyN/jC5e6yzstKUrokalMTiU", |
|
|
"include_colab_link": true |
|
|
"provenance": [] |
|
|
}, |
|
|
"kernelspec": { |
|
|
"name": "python3", |
|
|
@@ -17,16 +14,6 @@ |
|
|
} |
|
|
}, |
|
|
"cells": [ |
|
|
{ |
|
|
"cell_type": "markdown", |
|
|
"metadata": { |
|
|
"id": "view-in-github", |
|
|
"colab_type": "text" |
|
|
}, |
|
|
"source": [ |
|
|
"<a href=\"https://colab.research.google.com/gist/ninehills/ecf7107574c83016e8b68965bf9a51c4/chatpdf-zh.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>" |
|
|
] |
|
|
}, |
|
|
{ |
|
|
"cell_type": "markdown", |
|
|
"source": [ |
|
|
@@ -51,7 +38,7 @@ |
|
|
}, |
|
|
"outputId": "3c50322c-5eef-43e1-f4a5-34084831e52e" |
|
|
}, |
|
|
"execution_count": 4, |
|
|
"execution_count": null, |
|
|
"outputs": [ |
|
|
{ |
|
|
"output_type": "stream", |
|
|
@@ -72,7 +59,7 @@ |
|
|
"metadata": { |
|
|
"id": "UXw1TWw_nj_F" |
|
|
}, |
|
|
"execution_count": 10, |
|
|
"execution_count": 1, |
|
|
"outputs": [] |
|
|
}, |
|
|
{ |
|
|
@@ -87,66 +74,49 @@ |
|
|
"metadata": { |
|
|
"id": "Aqef8N2RlUpo" |
|
|
}, |
|
|
"execution_count": 5, |
|
|
"execution_count": 2, |
|
|
"outputs": [] |
|
|
}, |
|
|
{ |
|
|
"cell_type": "code", |
|
|
"source": [ |
|
|
"# Load environment variables: create a .env file containing your OPENAI_API_KEY, then load it.\n", |
|
|
"\n", |
|
|
"import os\n", |
|
|
"from dotenv import load_dotenv\n", |
|
|
"\n", |
|
|
"load_dotenv()\n", |
|
|
"\n", |
|
|
"# API configuration\n", |
|
|
"OPENAI_API_KEY = os.getenv(\"OPENAI_API_KEY\")" |
|
|
"from llama_index import GPTSimpleVectorIndex, LLMPredictor, PromptHelper\n", |
|
|
"from llama_index.response.notebook_utils import display_response\n", |
|
|
"from llama_index.prompts.prompts import QuestionAnswerPrompt\n", |
|
|
"from langchain.chat_models import ChatOpenAI\n", |
|
|
"from IPython.display import Markdown, display" |
|
|
], |
|
|
"metadata": { |
|
|
"id": "WKoA2bzul7Gz" |
|
|
"id": "Vp6JcErhmt_w" |
|
|
}, |
|
|
"execution_count": 6, |
|
|
"execution_count": 5, |
|
|
"outputs": [] |
|
|
}, |
|
|
{ |
|
|
"cell_type": "code", |
|
|
"source": [ |
|
|
"# Load pdf to documents\n", |
|
|
"# Load environment variables: create a .env file containing your OPENAI_API_KEY, then load it.\n", |
|
|
"\n", |
|
|
"from pathlib import Path\n", |
|
|
"from llama_index import download_loader\n", |
|
|
"import os\n", |
|
|
"from dotenv import load_dotenv\n", |
|
|
"\n", |
|
|
"CJKPDFReader = download_loader(\"CJKPDFReader\")\n", |
|
|
"load_dotenv()\n", |
|
|
"\n", |
|
|
"loader = CJKPDFReader()\n", |
|
|
"documents = loader.load_data(file=os.path.join(Path(WORK_DIR), Path(SRC_FILE)))" |
|
|
], |
|
|
"metadata": { |
|
|
"id": "q9Pp1AZSkarS" |
|
|
}, |
|
|
"execution_count": 7, |
|
|
"outputs": [] |
|
|
}, |
|
|
{ |
|
|
"cell_type": "code", |
|
|
"source": [ |
|
|
"from llama_index import GPTSimpleVectorIndex, LLMPredictor, PromptHelper\n", |
|
|
"from llama_index.response.notebook_utils import display_response\n", |
|
|
"from llama_index.prompts.prompts import QuestionAnswerPrompt\n", |
|
|
"from langchain.chat_models import ChatOpenAI\n", |
|
|
"from IPython.display import Markdown, display" |
|
|
"# API configuration\n", |
|
|
"OPENAI_API_KEY = os.getenv(\"OPENAI_API_KEY\")" |
|
|
], |
|
|
"metadata": { |
|
|
"id": "Vp6JcErhmt_w" |
|
|
"id": "WKoA2bzul7Gz" |
|
|
}, |
|
|
"execution_count": 15, |
|
|
"execution_count": 3, |
|
|
"outputs": [] |
|
|
}, |
|
|
{ |
|
|
"cell_type": "markdown", |
|
|
"source": [ |
|
|
"Prepare the index file; cache it on disk to avoid re-indexing." |
|
|
"Prepare the index file; cache it on disk to avoid re-indexing.\n", |
|
|
"\n", |
|
|
"\n" |
|
|
], |
|
|
"metadata": { |
|
|
"id": "SApFHwHCpEGJ" |
|
|
@@ -155,9 +125,18 @@ |
|
|
{ |
|
|
"cell_type": "code", |
|
|
"source": [ |
|
|
"# Load pdf to documents\n", |
|
|
"\n", |
|
|
"from pathlib import Path\n", |
|
|
"from llama_index import download_loader\n", |
|
|
"\n", |
|
|
"CJKPDFReader = download_loader(\"CJKPDFReader\")\n", |
|
|
"\n", |
|
|
"loader = CJKPDFReader()\n", |
|
|
"index_file = os.path.join(Path(WORK_DIR), Path(INDEX_FILE))\n", |
|
|
"\n", |
|
|
"if not os.path.exists(index_file):\n", |
|
|
" documents = loader.load_data(file=os.path.join(Path(WORK_DIR), Path(SRC_FILE)))\n", |
|
|
" index = GPTSimpleVectorIndex(documents)\n", |
|
|
" index.save_to_disk(index_file)\n", |
|
|
"else:\n", |
|
|
@@ -166,7 +145,7 @@ |
|
|
"metadata": { |
|
|
"id": "Cb98YMtrnTxU" |
|
|
}, |
|
|
"execution_count": 14, |
|
|
"execution_count": 7, |
|
|
"outputs": [] |
|
|
}, |
|
|
{ |
|
|
@@ -181,7 +160,19 @@ |
|
|
" \"\\n---------------------\\n\"\n", |
|
|
" \"{query_str}\\n\"\n", |
|
|
")\n", |
|
|
"QUESTION_ANSWER_PROMPT = QuestionAnswerPrompt(QUESTION_ANSWER_PROMPT_TMPL)\n", |
|
|
"\n", |
|
|
"QUESTION_ANSWER_PROMPT_TMPL_2 = \"\"\"\n", |
|
|
"You are an AI assistant providing helpful advice. You are given the following extracted parts of a long document and a question. Provide a conversational answer based on the context provided.\n", |
|
|
"If you can't find the answer in the context below, just say \"Hmm, I'm not sure.\" Don't try to make up an answer.\n", |
|
|
"If the question is not related to the context, politely respond that you are tuned to only answer questions that are related to the context.\n", |
|
|
"Context information is below.\n", |
|
|
"=========\n", |
|
|
"{context_str}\n", |
|
|
"=========\n", |
|
|
"{query_str}\n", |
|
|
"\"\"\"\n", |
|
|
"\n", |
|
|
"QUESTION_ANSWER_PROMPT = QuestionAnswerPrompt(QUESTION_ANSWER_PROMPT_TMPL_2)\n", |
|
|
"\n", |
|
|
"def chat(query):\n", |
|
|
" return index.query(\n", |
|
|
@@ -197,20 +188,20 @@ |
|
|
"metadata": { |
|
|
"colab": { |
|
|
"base_uri": "https://localhost:8080/", |
|
|
"height": 409 |
|
|
"height": 427 |
|
|
}, |
|
|
"id": "6ddjxclno8tg", |
|
|
"outputId": "3b1fd4b2-a5e7-4ed1-ba22-bbefc1d0ea56" |
|
|
"outputId": "254939dd-b48d-442f-aef1-8e13b5a13c99" |
|
|
}, |
|
|
"execution_count": 22, |
|
|
"execution_count": 8, |
|
|
"outputs": [ |
|
|
{ |
|
|
"output_type": "display_data", |
|
|
"data": { |
|
|
"text/plain": [ |
|
|
"<IPython.core.display.Markdown object>" |
|
|
], |
|
|
"text/markdown": "**`Final Response:`** 这本书叫做《翦商:殷周之变与华夏新生》,主要讲述了商周之际的历史变迁,特别是周公废除人祭的历史转折,以及商周文化的差异。作者从历史、考古、文献等多个角度出发,探讨了商周时期的人祭习俗、周公的作用、商周文化的差异等问题。此外,该书还设立了一个出发点,让对古典中国思想、信仰、伦理、心态、风俗,以及军事、政治、制度、规则有兴趣的研究者或普通读者,可以从这本书开始探索。" |
|
|
"text/markdown": "**`Final Response:`** 这本书叫做《周灭商与华夏新生》,主要讲述了商周之变的历史转折,以及商文化与周文化的不同之处。它是一部关于古代中国思想、信仰、伦理、心态、风俗,以及军事、政治、制度、规则的历史著作,讲述了商朝的祭祀与战争为何有如此紧密的联系,以及殷周之变是如何发生的。它还设立了一个出发点:凡对古典中国思想、信仰、伦理、心态、风俗,以及军事、政治、制度、规则有兴趣的研究者或普通读者,可以先从这本书开始你的探索。" |
|
|
}, |
|
|
"metadata": {} |
|
|
}, |
|
|
@@ -240,7 +231,7 @@ |
|
|
"text/plain": [ |
|
|
"<IPython.core.display.Markdown object>" |
|
|
], |
|
|
"text/markdown": "**Document ID:** 1f43ae29-e41d-474b-b9c7-2afcb36d0d0b<br>**Similarity:** 0.8000716454899834<br>**Text:** 如《象传》和《彖传》可能是周公作品。其他篇章里常出现“子日”,孔子 \n自己肯定不会这样写,它们应当是孔门弟子编写的。《周易》经传的详细知识, \n可参考廖明春《周易经传十五讲》,北京大学出版社,2...<br>" |
|
|
"text/markdown": "**Document ID:** 1f43ae29-e41d-474b-b9c7-2afcb36d0d0b<br>**Similarity:** 0.8000785245735972<br>**Text:** 如《象传》和《彖传》可能是周公作品。其他篇章里常出现“子日”,孔子 \n自己肯定不会这样写,它们应当是孔门弟子编写的。《周易》经传的详细知识, \n可参考廖明春《周易经传十五讲》,北京大学出版社,2...<br>" |
|
|
}, |
|
|
"metadata": {} |
|
|
}, |
|
|
@@ -270,7 +261,7 @@ |
|
|
"text/plain": [ |
|
|
"<IPython.core.display.Markdown object>" |
|
|
], |
|
|
"text/markdown": "**Document ID:** 1f43ae29-e41d-474b-b9c7-2afcb36d0d0b<br>**Similarity:** 0.7965443968610987<br>**Text:** 的支持,其实是心理上的,让我意识到除了祭祀坑里的尸骨,这世界 上还有别的东西。 也许,人不应当凝视深渊;虽然深渊就在那里。 \f \f 始于一页,抵达世界 Humanities ■ Histor...<br>" |
|
|
"text/markdown": "**Document ID:** 1f43ae29-e41d-474b-b9c7-2afcb36d0d0b<br>**Similarity:** 0.7965259185103453<br>**Text:** 的支持,其实是心理上的,让我意识到除了祭祀坑里的尸骨,这世界 上还有别的东西。 也许,人不应当凝视深渊;虽然深渊就在那里。 \f \f 始于一页,抵达世界 Humanities ■ Histor...<br>" |
|
|
}, |
|
|
"metadata": {} |
|
|
}, |
|
|
@@ -300,7 +291,7 @@ |
|
|
"text/plain": [ |
|
|
"<IPython.core.display.Markdown object>" |
|
|
], |
|
|
"text/markdown": "**Document ID:** 1f43ae29-e41d-474b-b9c7-2afcb36d0d0b<br>**Similarity:** 0.7947832308214735<br>**Text:** 书》则是“太姒梦见商之庭产棘”。此事应载于《逸周书•程寤》篇,但传 \n世本只存篇名,正文缺。参见黄怀信等《逸周书汇校集注》(修订本),上海 \n古籍出版社,2007年,第262、1141页;李学勤...<br>" |
|
|
"text/markdown": "**Document ID:** 1f43ae29-e41d-474b-b9c7-2afcb36d0d0b<br>**Similarity:** 0.7947722920317227<br>**Text:** 书》则是“太姒梦见商之庭产棘”。此事应载于《逸周书•程寤》篇,但传 \n世本只存篇名,正文缺。参见黄怀信等《逸周书汇校集注》(修订本),上海 \n古籍出版社,2007年,第262、1141页;李学勤...<br>" |
|
|
}, |
|
|
"metadata": {} |
|
|
} |
|
|
@@ -314,20 +305,137 @@ |
|
|
"metadata": { |
|
|
"colab": { |
|
|
"base_uri": "https://localhost:8080/", |
|
|
"height": 544 |
|
|
"height": 509 |
|
|
}, |
|
|
"id": "bS5LAJkuqR4U", |
|
|
"outputId": "cd066f66-743a-476d-ec22-1b1adb10c2e6" |
|
|
"outputId": "b0431e12-55d0-41ce-b84f-43fe88666c6e" |
|
|
}, |
|
|
"execution_count": 9, |
|
|
"outputs": [ |
|
|
{ |
|
|
"output_type": "display_data", |
|
|
"data": { |
|
|
"text/plain": [ |
|
|
"<IPython.core.display.Markdown object>" |
|
|
], |
|
|
"text/markdown": "**`Final Response:`** 根据文本描述,牧野之战开始时,武王的军队面对着数量远远超过自己的商军阵列,而且他们没有内应,没有商人助战,所以处于两难的困境。武王派出他的岳父兼老师和战略阴谋家吕尚率步兵前往敌阵,自己则带着他的三百辆战车冲向商军阵列,吸引敌军注意力。商军阵列突然自行解体,变成了互相砍杀的人群。接着,西土联军全部投入了混战。最终,商军队伍溃散,武王取得了胜利。" |
|
|
}, |
|
|
"metadata": {} |
|
|
}, |
|
|
{ |
|
|
"output_type": "display_data", |
|
|
"data": { |
|
|
"text/plain": [ |
|
|
"<IPython.core.display.Markdown object>" |
|
|
], |
|
|
"text/markdown": "---" |
|
|
}, |
|
|
"metadata": {} |
|
|
}, |
|
|
{ |
|
|
"output_type": "display_data", |
|
|
"data": { |
|
|
"text/plain": [ |
|
|
"<IPython.core.display.Markdown object>" |
|
|
], |
|
|
"text/markdown": "**`Source Node 1/3`**" |
|
|
}, |
|
|
"metadata": {} |
|
|
}, |
|
|
{ |
|
|
"output_type": "display_data", |
|
|
"data": { |
|
|
"text/plain": [ |
|
|
"<IPython.core.display.Markdown object>" |
|
|
], |
|
|
"text/markdown": "**Document ID:** 1f43ae29-e41d-474b-b9c7-2afcb36d0d0b<br>**Similarity:** 0.7968888490270644<br>**Text:** 王受命第十一年,5他再度起兵东征。有好几种文献记载武王此次伐\n商的行军日程,但年份和月份皆有所不同。总的来说,武王此次起兵\n是在隆冬季节,决战则是在冬末春初。\n\n总攻的前期工作在前一年底就开始了...<br>" |
|
|
}, |
|
|
"metadata": {} |
|
|
}, |
|
|
{ |
|
|
"output_type": "display_data", |
|
|
"data": { |
|
|
"text/plain": [ |
|
|
"<IPython.core.display.Markdown object>" |
|
|
], |
|
|
"text/markdown": "---" |
|
|
}, |
|
|
"metadata": {} |
|
|
}, |
|
|
{ |
|
|
"output_type": "display_data", |
|
|
"data": { |
|
|
"text/plain": [ |
|
|
"<IPython.core.display.Markdown object>" |
|
|
], |
|
|
"text/markdown": "**`Source Node 2/3`**" |
|
|
}, |
|
|
"metadata": {} |
|
|
}, |
|
|
{ |
|
|
"output_type": "display_data", |
|
|
"data": { |
|
|
"text/plain": [ |
|
|
"<IPython.core.display.Markdown object>" |
|
|
], |
|
|
"text/markdown": "**Document ID:** 1f43ae29-e41d-474b-b9c7-2afcb36d0d0b<br>**Similarity:** 0.7918882416293495<br>**Text:** 并不是武王的私人属下。他们很在意这种身份区别。7\n\n天色渐明,雨势渐小,对面的商军阵列逐渐成形。周人史诗的描\n述是,敌军的戈矛像森林一样密集,所谓“殷商之旅,其会如林”。(《诗\n经•大雅•大明》...<br>" |
|
|
}, |
|
|
"metadata": {} |
|
|
}, |
|
|
{ |
|
|
"output_type": "display_data", |
|
|
"data": { |
|
|
"text/plain": [ |
|
|
"<IPython.core.display.Markdown object>" |
|
|
], |
|
|
"text/markdown": "---" |
|
|
}, |
|
|
"metadata": {} |
|
|
}, |
|
|
{ |
|
|
"output_type": "display_data", |
|
|
"data": { |
|
|
"text/plain": [ |
|
|
"<IPython.core.display.Markdown object>" |
|
|
], |
|
|
"text/markdown": "**`Source Node 3/3`**" |
|
|
}, |
|
|
"metadata": {} |
|
|
}, |
|
|
{ |
|
|
"output_type": "display_data", |
|
|
"data": { |
|
|
"text/plain": [ |
|
|
"<IPython.core.display.Markdown object>" |
|
|
], |
|
|
"text/markdown": "**Document ID:** 1f43ae29-e41d-474b-b9c7-2afcb36d0d0b<br>**Similarity:** 0.7853722213787273<br>**Text:** 作的“金花”。这都是王室才会有的财物,看来王室和奴隶们居住\n的地方相隔并不远。\n\n\f\n第十一章商人的思维与国家\n\n225\n\n需要注意的是,只有殷墟王宫区发现有大量集中存放的石头农具,\n其他任何商...<br>" |
|
|
}, |
|
|
"metadata": {} |
|
|
} |
|
|
] |
|
|
}, |
|
|
{ |
|
|
"cell_type": "code", |
|
|
"source": [ |
|
|
"display_response(chat(\"人祭有几种情况?\"))" |
|
|
], |
|
|
"metadata": { |
|
|
"colab": { |
|
|
"base_uri": "https://localhost:8080/", |
|
|
"height": 421 |
|
|
}, |
|
|
"id": "7c1Kwsj2waEB", |
|
|
"outputId": "da748c8d-3a13-43cb-881a-efd347088018" |
|
|
}, |
|
|
"execution_count": 23, |
|
|
"execution_count": null, |
|
|
"outputs": [ |
|
|
{ |
|
|
"output_type": "display_data", |
|
|
"data": { |
|
|
"text/plain": [ |
|
|
"<IPython.core.display.Markdown object>" |
|
|
], |
|
|
"text/markdown": "**`Final Response:`** 牧野之战是周武王率领西土联军与商朝军队在牧野展开的一场决定性战役。商军总数为七十万人,而西土联军只有轻微的损失。在甲子凌晨,规模较小的周军首先列队完毕,武王全身盔甲戎装,在阵前宣誓,这便是著名的《尚书•牧誓》。紧接着,武王一一点名麾下的盟友、将领、军官,直到“百夫长”,命令他们:“拿起你们的戈,连接好你们的盾牌,立起你们的长矛,现在。”商军阵列却突然自行解体,变成了互相砍杀的人群。或许是看到周军义无反顾的冲锋,商军中的密谋者终于鼓起勇气,倒戈杀向纣王中军。接着,西土联军全部投入了混战。最终,商军队伍就像滚水冲刷的油脂,瞬间溃散,融化。" |
|
|
"text/markdown": "**`Final Response:`** 人祭有两种情况,一种是有蓄意虐杀的迹象,献祭者会尽量延缓人牲的死亡,任凭被剁去肢体的人牲尽量地挣扎、哀嚎或咒骂;另一种是例行的祭祀,随意性更大。" |
|
|
}, |
|
|
"metadata": {} |
|
|
}, |
|
|
@@ -357,7 +465,7 @@ |
|
|
"text/plain": [ |
|
|
"<IPython.core.display.Markdown object>" |
|
|
], |
|
|
"text/markdown": "**Document ID:** 1f43ae29-e41d-474b-b9c7-2afcb36d0d0b<br>**Similarity:** 0.7969439158483032<br>**Text:** 王受命第十一年,5他再度起兵东征。有好几种文献记载武王此次伐\n商的行军日程,但年份和月份皆有所不同。总的来说,武王此次起兵\n是在隆冬季节,决战则是在冬末春初。\n\n总攻的前期工作在前一年底就开始了...<br>" |
|
|
"text/markdown": "**Document ID:** 1f43ae29-e41d-474b-b9c7-2afcb36d0d0b<br>**Similarity:** 0.8310803169491916<br>**Text:** 祭坑留有蓄意虐杀的迹象,尤其当人牲数量不足,献祭者还会尽量延 \n缓人牲的死亡,任凭被剁去肢体的人牲尽量地挣扎、哀嚎或咒骂。这 \n种心态,跟观看古罗马的角斗士表演有相似之处。\n\n\f\n第二十一章殷都...<br>" |
|
|
}, |
|
|
"metadata": {} |
|
|
}, |
|
|
@@ -387,7 +495,7 @@ |
|
|
"text/plain": [ |
|
|
"<IPython.core.display.Markdown object>" |
|
|
], |
|
|
"text/markdown": "**Document ID:** 1f43ae29-e41d-474b-b9c7-2afcb36d0d0b<br>**Similarity:** 0.7919791150839193<br>**Text:** 并不是武王的私人属下。他们很在意这种身份区别。7\n\n天色渐明,雨势渐小,对面的商军阵列逐渐成形。周人史诗的描\n述是,敌军的戈矛像森林一样密集,所谓“殷商之旅,其会如林”。(《诗\n经•大雅•大明》...<br>" |
|
|
"text/markdown": "**Document ID:** 1f43ae29-e41d-474b-b9c7-2afcb36d0d0b<br>**Similarity:** 0.8285896369748251<br>**Text:** :祭祀坑中的无头尸身,往往连带着下顎甚至上顎骨,说明\n每年例行的祭祀的随意性更大。\n\n殷商的王陵祭祀对男性人牲和殉人多用斩首,甚至肢解,而女性\n则多能保存全尸。这背后的宗教思维可能是:男性俘虏和...<br>" |
|
|
}, |
|
|
"metadata": {} |
|
|
}, |
|
|
@@ -417,7 +525,7 @@ |
|
|
"text/plain": [ |
|
|
"<IPython.core.display.Markdown object>" |
|
|
], |
|
|
"text/markdown": "**Document ID:** 1f43ae29-e41d-474b-b9c7-2afcb36d0d0b<br>**Similarity:** 0.7854334012776085<br>**Text:** 作的“金花”。这都是王室才会有的财物,看来王室和奴隶们居住\n的地方相隔并不远。\n\n\f\n第十一章商人的思维与国家\n\n225\n\n需要注意的是,只有殷墟王宫区发现有大量集中存放的石头农具,\n其他任何商...<br>" |
|
|
"text/markdown": "**Document ID:** 1f43ae29-e41d-474b-b9c7-2afcb36d0d0b<br>**Similarity:** 0.8259583785233079<br>**Text:** 仪式上,首先奉献的是侯来、陈本等征伐周边斩获的首级,并 搭配现场屠宰的牲畜,“断牛六,断羊二”;然后向天(上帝)和后 稷献祭,用的是牛“五百有四”头;再向其他百神、水土之神献祭, 用猪、羊等牲畜...<br>" |
|
|
}, |
|
|
"metadata": {} |
|
|
} |
|
|
|