Building a Local RAG Application with Chroma and Ollama

黃爸爸好, published 2024-06-05 in Shanghai


> The author is a front-end development engineer on the 360 奇舞團 team.

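In this article, we run a large language model (LLM) locally with Ollama and combine it with ChromaDB and LangChain to build a small RAG application that answers questions locally based on the content of a web page.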

Concepts

First, a quick look at the terms involved:

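LLM (large language model): a model trained on massive text datasets (books, websites, and so on) that has general language understanding and generation capabilities. Although it can reason about many topics, its knowledge is limited to the data it was trained on, which stops at a specific point in time.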

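LangChain: a framework for developing applications powered by large language models. It provides a rich set of interfaces, components, and capabilities that simplify building LLM applications.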

Ollama: a free, open-source framework that makes it easy to run large models on a local computer.

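RAG (Retrieval Augmented Generation): a technique that supplements an LLM's knowledge with additional data. It retrieves current or relevant context from an external data store and presents it to the LLM when requesting a response. This reduces the risk that the model generates incorrect or misleading information.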

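The RAG workflow has the following steps:

1. Load the source documents.
2. Split the documents into chunks.
3. Embed the chunks and store them in a vector database.
4. Retrieve the chunks most relevant to the user's question.
5. Pass those chunks to the LLM together with the question so it can generate the answer.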
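The following toy Python sketch illustrates that loop without any dependencies. It is not how LangChain or Chroma implement retrieval. The word-overlap score is a hypothetical stand-in for embedding similarity, and the sample chunks are made up. The final generation step, which sends the prompt to the LLM, is omitted.

```python
import re

# Toy RAG loop: retrieve -> augment -> (generate, omitted here).
# The word-overlap score is a hypothetical stand-in for embedding similarity.

def words(text: str) -> set[str]:
    """Lower-case word set of a text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question: str, chunks: list[str], k: int = 1) -> list[str]:
    """Rank chunks by how many words they share with the question; keep the top k."""
    q = words(question)
    return sorted(chunks, key=lambda c: len(q & words(c)), reverse=True)[:k]

def augment(question: str, context: list[str]) -> str:
    """Build the prompt that would be sent to the LLM, with the retrieved context."""
    return "Context: " + " ".join(context) + "\nQuestion: " + question + "\nAnswer:"

chunks = [
    "Vue is a JavaScript framework for building user interfaces.",
    "The Composition API is a set of APIs for authoring components.",
    "Bananas are rich in potassium.",
]
context = retrieve("What is Vue?", chunks)
print(augment("What is Vue?", context))
```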

Next, we implement these RAG steps in code.

Building the Application

1. Following the Ollama documentation, download the models and run them locally.

# LLM
ollama pull llama3
# Embedding Model
ollama pull nomic-embed-text

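2. Install langchain, langchain-community, and bs4. The Chroma vector store wrapper also needs the chromadb package, and the UI at the end needs gradio.

pip install langchain langchain-community bs4 chromadb gradio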

3. Initialize the Ollama object provided by LangChain

from langchain_community.llms import Ollama
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
# Initialize the LLM and stream its output to stdout
llm = Ollama(model='llama3', 
             temperature=0.1, 
             top_p=0.4, 
             callback_manager=CallbackManager([StreamingStdOutCallbackHandler()])
             )

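temperature controls how creative the generated text is. At 0, the response is predictable because the model always picks the most likely next word. This is very useful when facts and accuracy matter. At 1, the model samples from a wider range of words, which produces more creative but less predictable answers.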
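The sketch below makes the temperature description concrete. It applies temperature scaling to a softmax over made-up token scores. It treats temperature 0 as greedy decoding (always pick the top token), which mirrors the paragraph above. Ollama's internal sampling code may differ, so read this as an illustration only.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw token scores into probabilities after dividing them by temperature."""
    if temperature <= 0:
        # Temperature 0 treated as greedy decoding: all probability on the argmax.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
for t in (0, 0.1, 1.0):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
```

Lower temperatures push almost all probability onto the top token. Higher temperatures spread it across the alternatives.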

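top_p, also called nucleus sampling, controls how many candidate words the model considers during generation. The model samples only from the smallest set of most likely words whose cumulative probability reaches top_p. A high top_p means the model considers more candidates, including less likely ones, which makes the generated text more diverse.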
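Top-p filtering can be sketched in the same way. This is a generic nucleus-sampling filter over a hypothetical next-token distribution, not Ollama's internal implementation. It keeps the most likely tokens until their cumulative probability reaches top_p, then renormalizes the tokens that remain.

```python
def top_p_filter(probs, top_p):
    """Keep the smallest set of most-likely token indices whose cumulative
    probability reaches top_p, then renormalize them to sum to 1."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}

probs = [0.5, 0.3, 0.15, 0.05]  # hypothetical next-token distribution
print(top_p_filter(probs, 0.4))  # only the top token survives
print(top_p_filter(probs, 0.9))  # the three most likely tokens survive
```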

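A low temperature combined with a high top_p produces coherent text that still shows some creativity. The low temperature keeps answers logical and coherent, while the high top_p keeps their vocabulary and range of viewpoints rich. This combination suits informational text that should be clear and engaging.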

A high temperature combined with a low top_p may combine words in unpredictable ways. The generated text is highly creative and can contain unexpected results, which suits creative writing.

4. Fetch the content for RAG retrieval and split it into chunks

# BeautifulSoup parses the web page; content can be located and extracted by tag, class name, ID, etc.
import bs4
# WebBaseLoader fetches HTML pages and parses them with `BeautifulSoup`
from langchain_community.document_loaders import WebBaseLoader
# Text splitting
from langchain_text_splitters import RecursiveCharacterTextSplitter

loader = WebBaseLoader(
    web_paths=('https:///guide/introduction.html#html',),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=('content',),
            # id=('article-root',)
        )
    ),
)
docs = loader.load()
# chunk_overlap: the overlap between adjacent chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)

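chunk_overlap is the overlap between adjacent chunks. Overlap reduces the chance that a statement is separated from important context related to it. chunk_size is the size of each chunk, measured in characters with the default length function. Sensible chunk settings improve RAG quality.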
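A plain sliding window shows the effect of the two parameters most clearly. This sketch is a simplification. RecursiveCharacterTextSplitter first tries to split on separators such as paragraph and line breaks, then merges the pieces up to chunk_size, so its chunk boundaries will differ. The function name `split_with_overlap` is hypothetical.

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Fixed-size character chunks; each chunk starts with the last
    chunk_overlap characters of the previous chunk."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks

text = "abcdefghijklmnopqrstuvwxyz"
print(split_with_overlap(text, chunk_size=10, chunk_overlap=3))
# ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

Adjacent chunks share three characters ("hij", "opq", "vwx"). A sentence cut at a chunk boundary therefore still appears whole in at least one chunk, as long as it is shorter than the overlap.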

Next, embed the content with the local embedding model nomic-embed-text and store it in the vector database:

# Vector embeddings (if Chroma needs onnxruntime: conda install onnxruntime -c conda-forge)
from langchain_community.vectorstores import Chroma
# Many embedding models are available
from langchain_community.embeddings import OllamaEmbeddings
# Run the embedding model nomic-embed-text via Ollama: "A high-performing open embedding model with a large token context window."
vectorstore = Chroma.from_documents(documents=splits,
                                    embedding=OllamaEmbeddings(model='nomic-embed-text'))
# Similarity search
# vectorstore.similarity_search('vue')
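Conceptually, `similarity_search` returns the stored chunks whose vectors are closest to the query vector. The class below is a hypothetical brute-force stand-in for a vector database. It uses Euclidean (L2) distance and made-up 3-dimensional vectors. Chroma uses an approximate nearest-neighbor index and a configurable distance function, so treat this only as a model of the idea. The default `k=4` mirrors the default of LangChain's `similarity_search`.

```python
import math

class TinyVectorStore:
    """Hypothetical in-memory vector store with brute-force nearest-neighbor search."""

    def __init__(self):
        self.texts: list[str] = []
        self.vectors: list[list[float]] = []

    def add(self, text: str, vector: list[float]) -> None:
        """Store a chunk together with its (precomputed) embedding."""
        self.texts.append(text)
        self.vectors.append(vector)

    def similarity_search(self, query: list[float], k: int = 4) -> list[str]:
        """Return up to k stored texts whose vectors are closest (L2) to the query."""
        ranked = sorted(range(len(self.texts)),
                        key=lambda i: math.dist(query, self.vectors[i]))
        return [self.texts[i] for i in ranked[:k]]

store = TinyVectorStore()
# Made-up 3-dimensional embeddings; real models such as nomic-embed-text produce much longer vectors.
store.add("Vue is a JavaScript framework", [0.9, 0.1, 0.0])
store.add("Composition API overview", [0.7, 0.3, 0.1])
store.add("Gradio builds ML demos", [0.0, 0.2, 0.9])
print(store.similarity_search([0.85, 0.15, 0.0], k=2))
```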

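Other models such as llama3 or mistral can also serve as the embedding model here, but they run too slowly locally. Like nomic-embed-text, they also do not support Chinese embeddings. To build a Chinese document library, try the herald/dmeta-embedding-zh embedding model, which supports Chinese.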

ollama pull herald/dmeta-embedding-zh:latest

Set up a prompt to constrain the output

from langchain_core.prompts import PromptTemplate
prompt = PromptTemplate(
    input_variables=['context', 'question'],
    template=(
        "You are an assistant for question-answering tasks. "
        "Use the following pieces of retrieved context to answer the question. "
        "If you don't know the answer, just say you don't know without any explanation.\n"
        "Question: {question}\nContext: {context}\nAnswer:"
    ),
)
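`RetrievalQA.from_chain_type` uses the "stuff" chain type by default. With this chain type, the retrieved chunks are concatenated and inserted into the `{context}` slot of the template above. The plain-Python sketch below shows that substitution. It assumes a blank-line separator between chunks; the separator LangChain actually uses may vary by version. `stuff_prompt` is a hypothetical helper.

```python
# Hypothetical sketch of the "stuff" strategy: join the retrieved chunks into one
# string and substitute it, with the question, into the prompt template.
TEMPLATE = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer the question. "
    "If you don't know the answer, just say you don't know without any explanation.\n"
    "Question: {question}\nContext: {context}\nAnswer:"
)

def stuff_prompt(question: str, chunks: list[str], separator: str = "\n\n") -> str:
    """Fill both template variables; the separator is an assumption."""
    return TEMPLATE.format(question=question, context=separator.join(chunks))

print(stuff_prompt("what is vue?",
                   ["Vue is a JavaScript framework.", "It builds on HTML, CSS and JavaScript."]))
```

Every retrieved chunk ends up in a single prompt. This is why chunk_size and the number of retrieved chunks together bound how much of the model's context window the prompt consumes.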

Implement retrieval QA with LangChain

from langchain.chains import RetrievalQA
# Retriever backed by the vector database
retriever = vectorstore.as_retriever()

qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=retriever,
    chain_type_kwargs={'prompt': prompt}
)
# Another question to try: 'what is Composition API?'
question = 'what is vue?'
result = qa_chain.invoke({'query': question})

# output
# I think I know this one! Based on the context, 
# Vue is a JavaScript framework for building user interfaces 
# that builds on top of standard HTML, CSS, and JavaScript. 
# It provides a declarative way to use Vue primarily in 
# low-complexity scenarios or for building full applications with 
# Composition API + Single-File Components.

How does it answer a question that is unrelated to the document?

question = 'what is react?'
result = qa_chain.invoke({'query': question})

Running this outputs "I don't know."

Building a User Interface

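Gradio is a Python library for building interactive machine learning interfaces, and it is very simple to use. You define a function with inputs and outputs, and Gradio automatically generates an interface for it. Users can enter data in the interface and see the model's output.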

Combine the code above into an interactive UI:

import gradio as gr
from langchain_community.llms import Ollama
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import OllamaEmbeddings
from langchain.chains import RetrievalQA
from langchain_core.prompts import PromptTemplate

def init_ollama_llm(model, temperature, top_p):
    return Ollama(model=model,
                  temperature=temperature,
                  top_p=top_p,
                  callback_manager=CallbackManager([StreamingStdOutCallbackHandler()])
                  )

def content_web(url):
    loader = WebBaseLoader(
        web_paths=(url,),
    )
    docs = loader.load()
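    # chunk_overlap: the overlap between adjacent chunks. Overlap reduces the chance of
    # separating a statement from important related context, so setting it improves results.
    # Sensible chunking improves RAG quality.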
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
    splits = text_splitter.split_documents(docs)
    return splits

def chroma_retriever_store_content(splits):
    # Run the embedding model nomic-embed-text via Ollama: "A high-performing open embedding model with a large token context window."
    vectorstore = Chroma.from_documents(documents=splits,
                                        embedding=OllamaEmbeddings(model='nomic-embed-text'))
    return vectorstore.as_retriever()

def rag_prompt():
    return PromptTemplate(
        input_variables=['context', 'question'],
        template=(
            "You are an assistant for question-answering tasks. "
            "Use the following pieces of retrieved context to answer the question. "
            "If you don't know the answer, just say you don't know without any explanation.\n"
            "Question: {question}\nContext: {context}\nAnswer:"
        ),
    )

def ollama_rag_chroma_web_content(web_url, question,temperature,top_p):
    llm = init_ollama_llm('llama3', temperature, top_p)
    splits = content_web(web_url)
    retriever = chroma_retriever_store_content(splits)
    qa_chain = RetrievalQA.from_chain_type(llm, retriever=retriever, chain_type_kwargs={'prompt': rag_prompt()})
    return qa_chain.invoke({'query': question})['result']

demo = gr.Interface(
    fn=ollama_rag_chroma_web_content,
    inputs=[gr.Textbox(label='web_url', value='https:///guide/introduction.html', info='URL of the web page to crawl'),
            gr.Textbox(label='question'),
            gr.Slider(0, 1, value=0.1, step=0.1, label='temperature'),
            gr.Slider(0, 1, value=0.4, step=0.1, label='top_p')],
    outputs='text',
    title='Ollama+RAG Example',
    description='Enter a web page URL, then ask a question to get an answer'
)

demo.launch()