LLMs:《Efficient and Effective Text Encoding for Chinese LLaMA and Alpaca》翻譯與解讀
相關(guān)文章
LLMs:《Efficient and Effective Text Encoding for Chinese LLaMA and Alpaca》翻譯與解讀_一個(gè)處女座的程序猿的博客-CSDN博客
LLMs:在單機(jī)CPU+Windows系統(tǒng)上實(shí)現(xiàn)中文LLaMA算法(基于Chinese-LLaMA-Alpaca)進(jìn)行模型部署且實(shí)現(xiàn)模型推理全流程步驟的圖文教程(非常詳細(xì))
https://yunyaniu.blog.csdn.net/article/details/131016046
《Efficient and Effective Text Encoding for Chinese LLaMA and Alpaca》翻譯與解讀
地址 | 論文:https://arxiv.org/abs/2304.08177;GitHub地址:https://github.com/ymcui/Chinese-LLaMA-Alpaca(中文LLaMA&Alpaca大語(yǔ)言模型+本地CPU/GPU訓(xùn)練部署,Chinese LLaMA & Alpaca LLMs)
作者 | Yiming Cui(ymcui@ieee.org)、Ziqing Yang(ziqingyang@gmail.com)、Xin Yao(yaoxin94@foxmail.com)
時(shí)間 | 2023年4月17日
ABSTRACT
Large Language Models (LLMs), such as ChatGPT and GPT-4, have revolutionized natural language processing research and demonstrated potential in Artificial General Intelligence (AGI). However, the expensive training and deployment of LLMs present challenges to transparent and open academic research. To address these issues, this project open-sources the Chinese LLaMA and Alpaca large models, emphasizing instruction fine-tuning. We expand the original LLaMA's Chinese vocabulary by adding 20K Chinese tokens, increasing encoding efficiency and enhancing basic semantic understanding. By incorporating secondary pre-training using Chinese data and fine-tuning with Chinese instruction data, we substantially improve the models' comprehension and execution of instructions. Our pilot study serves as a foundation for researchers adapting LLaMA and Alpaca models to other languages. Resources are made publicly available through GitHub, fostering open research in the Chinese NLP community and beyond. GitHub repository: https://github.com/ymcui/Chinese-LLaMA-Alpaca | 大型語(yǔ)言模型(LLM),如ChatGPT和GPT-4,已經(jīng)徹底改變了自然語(yǔ)言處理研究,并在人工通用智能(AGI)中展示了潛力。然而,LLM的昂貴訓(xùn)練和部署對(duì)透明和開放的學(xué)術(shù)研究提出了挑戰(zhàn)。為了解決這些問(wèn)題,本項(xiàng)目開源了中文LLaMA和Alpaca大模型,強(qiáng)調(diào)指令微調(diào)。我們?cè)谠糒LaMA的中文詞匯表中增加了20K個(gè)中文標(biāo)記,提高了編碼效率,增強(qiáng)了基本的語(yǔ)義理解。通過(guò)結(jié)合中文數(shù)據(jù)的二次預(yù)訓(xùn)練和中文指令數(shù)據(jù)的微調(diào),我們大大提高了模型對(duì)指令的理解和執(zhí)行能力。我們的試點(diǎn)研究為研究人員將LLaMA和Alpaca模型應(yīng)用于其他語(yǔ)言奠定了基礎(chǔ)。資源通過(guò)GitHub公開提供,促進(jìn)中文NLP社區(qū)及其他領(lǐng)域的開放研究。 |
1、INTRODUCTION
The field of natural language processing (NLP) has undergone a transformative paradigm shift with the advent of Large Language Models (LLMs). These models, characterized by their vast size and extensive training data, have demonstrated remarkable capabilities in understanding and generating human-like text. Unlike pre-trained language models for text understanding, such as BERT (Devlin et al., 2019), the GPT series (Radford et al., 2018) focuses on text generation abilities, making them a more suitable testbed for creativity than their counterparts. As the latest LLMs in the GPT family, ChatGPT and GPT-4 have attracted significant attention and emerged as leading examples in this rapidly evolving domain. ChatGPT (OpenAI, 2022), built on the GPT-3.5 (Ouyang et al., 2022) architecture, is an advanced conversational AI model that can engage in context-aware, human-like interactions. Its success has paved the way for the development of GPT-4 (OpenAI, 2023), a more sophisticated LLM, which has demonstrated even greater potential in natural language understanding, generation, and various NLP tasks. Both models have opened up new avenues of research and applications, fueling interest in exploring the capabilities of Artificial General Intelligence (AGI). These LLMs have not only shown impressive performance in multiple benchmarks but have also exhibited a capacity for few-shot learning and adapting to novel tasks. As a result, they have significantly contributed to the expansion of NLP research, inspiring researchers and industry professionals alike to explore and leverage their potential in a wide range of applications, from sentiment analysis and machine translation to question-answering systems and beyond. | 自然語(yǔ)言處理(NLP)領(lǐng)域隨著大型語(yǔ)言模型(LLM)的出現(xiàn)經(jīng)歷了一次轉(zhuǎn)變性的范式轉(zhuǎn)變。這些模型以其龐大的規(guī)模和豐富的訓(xùn)練數(shù)據(jù)為特征,在理解和生成類人文本方面展示出了非凡的能力。與用于文本理解的預(yù)訓(xùn)練語(yǔ)言模型(如BERT)不同,GPT系列側(cè)重于文本生成能力,使其比其同行更適合用作創(chuàng)造力的試驗(yàn)平臺(tái)。作為GPT家族中的最新LLM,ChatGPT和GPT-4吸引了廣泛的關(guān)注,并成為這個(gè)快速發(fā)展領(lǐng)域中的領(lǐng)先示例。 ChatGPT(OpenAI, 2022)基于GPT-3.5(Ouyang et al., 2022)架構(gòu)構(gòu)建,是一種先進(jìn)的對(duì)話型AI模型,可以進(jìn)行上下文感知的類人對(duì)話。其成功為GPT-4(OpenAI, 2023)的開發(fā)鋪平了道路,后者是一個(gè)更復(fù)雜的LLM,展示了在自然語(yǔ)言理解、生成和各種NLP任務(wù)方面更大的潛力。這兩個(gè)模型開辟了研究和應(yīng)用領(lǐng)域的新途徑,引發(fā)了對(duì)探索人工通用智能(AGI)能力的興趣。這些LLM不僅在多個(gè)基準(zhǔn)測(cè)試中表現(xiàn)出色,而且還展示了少樣本學(xué)習(xí)和適應(yīng)新任務(wù)的能力。因此,它們對(duì)于NLP研究的拓展做出了重要貢獻(xiàn),激發(fā)了研究人員和行業(yè)專業(yè)人士對(duì)在情感分析、機(jī)器翻譯、問(wèn)答系統(tǒng)等各種應(yīng)用中發(fā)掘和利用它們潛力的興趣。 | Despite the remarkable advancements brought about by LLMs, these models come with certain limitations that hinder transparent and open research. One of the most notable concerns is their proprietary nature, which restricts access to the models and hampers the ability of the broader research community to build upon their successes. Additionally, the immense computational resources required for training and deploying these models pose a challenge for researchers with limited resources, further exacerbating the accessibility problem. In response to these limitations, the NLP research community has turned to open-source alternatives to foster greater transparency and collaboration. Two such examples are LLaMA (Touvron et al., 2023) and Alpaca (Taori et al., 2023), where the Alpaca model is further finetuned on LLaMA with instruction data. These open-source LLMs have been designed to facilitate academic research and accelerate progress in the field of NLP.
By open-sourcing these models, the NLP community aims to create an environment that encourages further advancements in model development, fine-tuning, and evaluation, ultimately leading to more robust and capable LLMs that can be utilized in a wide range of applications. Although LLaMA and Alpaca have made significant strides in the world of NLP, they possess inherent limitations when it comes to natively supporting Chinese language tasks. The original models contain only a few hundred Chinese tokens in their vocabularies, significantly hampering their efficiency in encoding and decoding Chinese text. Drawing from our previous work on the Chinese BERT series (Cui et al., 2021) and Chinese minority-oriented multilingual pre-trained models (Yang et al., 2022), in this technical report, we propose Chinese LLaMA and Alpaca with enhanced abilities in Chinese understanding and generation. We extend the original LLaMA’s vocabulary with an additional 20K Chinese tokens, substantially improving its ability to process and generate Chinese text. To ensure efficient training and deployment of the Chinese LLaMA and Alpaca models, we adopt the Low-Rank Adaptation (LoRA) approach (Hu et al., 2021), which allows us to train and fine-tune the models without incurring excessive computational costs. Our pilot study in enhancing the Chinese understanding and generation capabilities of LLaMA and Alpaca models can serve as a foundation for researchers seeking to adapt these models to other languages. By demonstrating the feasibility and effectiveness of our approach, we provide insights and methodologies that can be applied to extend the vocabularies and improve the performance of LLaMA and Alpaca models in different languages. | 盡管LLM帶來(lái)了顯著的進(jìn)展,但這些模型存在一些限制,阻礙了透明和開放的研究。其中最顯著的問(wèn)題之一是它們的專有性質(zhì),限制了對(duì)模型的訪問(wèn),并阻礙了廣大研究社區(qū)在其成功基礎(chǔ)上的進(jìn)一步開發(fā)。此外,訓(xùn)練和部署這些模型所需的巨大計(jì)算資源對(duì)于資源有限的研究人員構(gòu)成了挑戰(zhàn),進(jìn)一步加劇了可訪問(wèn)性問(wèn)題。 針對(duì)這些限制,NLP研究社區(qū)已經(jīng)轉(zhuǎn)向開源替代方案,以促進(jìn)更大的透明度和合作。LLaMA(Touvron et al., 2023)和Alpaca(Taori et al., 2023)就是其中兩個(gè)例子,其中Alpaca模型在LLaMA基礎(chǔ)上進(jìn)一步進(jìn)行了指令數(shù)據(jù)的微調(diào)。這些開源LLM的設(shè)計(jì)旨在促進(jìn)學(xué)術(shù)研究,并加速NLP領(lǐng)域的進(jìn)展。通過(guò)開源這些模型,NLP社區(qū)旨在創(chuàng)造一個(gè)鼓勵(lì)模型開發(fā)、微調(diào)和評(píng)估進(jìn)一步進(jìn)展的環(huán)境,最終實(shí)現(xiàn)更強(qiáng)大、更具能力的LLM,可以在各種應(yīng)用中利用。 盡管LLaMA和Alpaca在NLP領(lǐng)域取得了重要進(jìn)展,但它們?cè)谠С种形娜蝿?wù)方面存在固有的局限性。原始模型的詞匯表中只包含了幾百個(gè)中文標(biāo)記,嚴(yán)重影響了它們?cè)诰幋a和解碼中文文本方面的效率。借鑒我們之前在中文BERT系列(Cui et al., 2021)和面向中國(guó)少數(shù)民族的多語(yǔ)言預(yù)訓(xùn)練模型(Yang et al., 2022)上的工作,本技術(shù)報(bào)告中,我們提出了具有增強(qiáng)中文理解和生成能力的中文LLaMA和Alpaca。我們通過(guò)額外添加2萬(wàn)個(gè)中文標(biāo)記來(lái)擴(kuò)展原始LLaMA的詞匯表,大大提高了處理和生成中文文本的能力。為了確保高效訓(xùn)練和部署中文LLaMA和Alpaca模型,我們采用了低秩自適應(yīng)(Low-Rank Adaptation,LoRA)方法(Hu et al., 2021),它允許我們?cè)诓划a(chǎn)生過(guò)多計(jì)算成本的情況下進(jìn)行模型的訓(xùn)練和微調(diào)。我們?cè)谠鰪?qiáng)LLaMA和Alpaca模型的中文理解和生成能力方面的初步研究可以為希望將這些模型適應(yīng)其他語(yǔ)言的研究人員提供基礎(chǔ)。通過(guò)展示我們方法的可行性和有效性,我們提供了可用于在其他語(yǔ)言中擴(kuò)展LLaMA和Alpaca模型詞匯表并提高其性能的見解和方法。 | In summary, the contributions of this technical report are as follows: >> We enhance the Chinese encoding and decoding efficiency and improve LLaMA’s Chinese understanding ability by extending the original LLaMA’s vocabulary with an additional 20,000 Chinese tokens.
>> We adopt the Low-Rank Adaptation (LoRA) approach for the efficient training and deployment of the Chinese LLaMA and Alpaca models, enabling researchers to work with these models without incurring excessive computational costs. >> We evaluate the performance of the Chinese Alpaca 7B and 13B models on a variety of natural language understanding (NLU) and natural language generation (NLG) tasks, demonstrating significant improvements over the original LLaMA counterparts in the context of Chinese language tasks. >> We make the resources and findings of our study publicly available, fostering further research and collaboration within the NLP community and encouraging the adaptation of LLaMA and Alpaca models to other languages. | 總結(jié)起來(lái),本技術(shù)報(bào)告的貢獻(xiàn)如下: >> 我們通過(guò)在原始LLaMA詞匯表中增加2萬(wàn)個(gè)中文標(biāo)記,提高了中文編碼和解碼的效率,并改進(jìn)了LLaMA在中文理解方面的能力。 >> 我們采用低秩自適應(yīng)(LoRA)方法對(duì)中文LLaMA和Alpaca模型進(jìn)行高效的訓(xùn)練和部署,使研究人員能夠在不產(chǎn)生過(guò)多計(jì)算成本的情況下使用這些模型。 >> 我們?cè)u(píng)估了中文Alpaca 7B和13B模型在各種自然語(yǔ)言理解(NLU)和自然語(yǔ)言生成(NLG)任務(wù)上的性能,在中文語(yǔ)言任務(wù)的背景下,相比原始LLaMA模型,取得了顯著的改進(jìn)。 >> 我們公開提供了我們研究的資源和發(fā)現(xiàn),促進(jìn)了NLP社區(qū)內(nèi)的進(jìn)一步研究和合作,并鼓勵(lì)將LLaMA和Alpaca模型應(yīng)用到其他語(yǔ)言中。 |
2、Chinese LLaMA
LLaMA (Touvron et al., 2023) is a decoder-only, foundational large language model based on the transformer architecture (Vaswani et al., 2017). Similar to other transformer-based LLMs, LLaMA comprises an embedding layer, multiple transformer blocks, and an LM head layer. It also incorporates various improvements, such as Pre-normalization (Zhang & Sennrich, 2019), SwiGLU activation (Shazeer, 2020), and Rotary Embeddings (Su et al., 2021). The total number of parameters in LLaMA ranges from 7B to 65B. Experiments demonstrate that LLaMA achieves competitive performance compared to other LLMs, like GPT-3, while maintaining a smaller model size. LLaMA has been pre-trained on 1T to 1.4T tokens from publicly available corpora, with the majority of the data in English and only a small fraction in other languages using Latin or Cyrillic scripts. As a result, LLaMA’s ability to understand and generate Chinese is limited. To address this, we propose pre-training the LLaMA model on Chinese corpora to enhance its fundamental Chinese understanding and generation capabilities. | LLaMA(Touvron等人,2023年)是一種基于Transformer架構(gòu)(Vaswani等人,2017年)的僅解碼的基礎(chǔ)大型語(yǔ)言模型。類似于其他基于Transformer的語(yǔ)言模型,LLaMA包括嵌入層、多個(gè)Transformer塊和LM頭部層。它還融入了各種改進(jìn),如Pre-normalization(Zhang&Sennrich,2019年)、SwiGLU激活(Shazeer,2020年)和Rotary Embeddings(Su等人,2021年)。LLaMA的總參數(shù)數(shù)量在7B到65B之間。實(shí)驗(yàn)表明,LLaMA在保持更小模型尺寸的同時(shí),與其他語(yǔ)言模型(如GPT-3)相比取得了競(jìng)爭(zhēng)性能。 LLaMA已經(jīng)在公開可用的語(yǔ)料庫(kù)中對(duì)1T到1.4T個(gè)標(biāo)記進(jìn)行了預(yù)訓(xùn)練,其中大部分?jǐn)?shù)據(jù)是英語(yǔ),只有很小一部分是使用拉丁字母或西里爾字母文字的其他語(yǔ)言。因此,LLaMA對(duì)于理解和生成中文的能力有限。為了解決這個(gè)問(wèn)題,我們建議在中文語(yǔ)料庫(kù)上對(duì)LLaMA模型進(jìn)行預(yù)訓(xùn)練,以增強(qiáng)其對(duì)中文的基本理解和生成能力。 | Directly pre-training LLaMA on Chinese corpora faces several challenges. Firstly, there are less than one thousand Chinese characters in the original LLaMA tokenizer vocabulary. Although the LLaMA tokenizer supports all Chinese characters by falling back to bytes, this fallback strategy significantly increases sequence length and slows down the processing efficiency on Chinese texts. Moreover, byte tokens are not exclusively designed for representing Chinese characters, as they are also used to represent other UTF-8 tokens, making it difficult for byte tokens to learn the semantic meaning of Chinese characters. To address these issues, we propose to extend the LLaMA tokenizer with additional Chinese tokens and adapt the model for the new tokenizer (Yang et al., 2022):
- To enhance the tokenizer’s support for Chinese text, we first train a Chinese tokenizer with SentencePiece (Kudo & Richardson, 2018) on the Chinese corpus, using a vocabulary size of 20,000. We then merge the Chinese tokenizer into the original LLaMA tokenizer by combining their vocabularies. Ultimately, we obtain a merged tokenizer, which we call the Chinese LLaMA tokenizer, with a vocabulary size of 49,953.
- To adapt the model for the Chinese LLaMA tokenizer, we resize the word embeddings and language model head from shape V × H to V′ × H, where V = 32,000 represents the original vocabulary size, and V′ = 49,953 is the vocabulary size of the Chinese LLaMA tokenizer. The new rows are appended to the end of the original embedding matrices, ensuring that the embeddings of the tokens in the original vocabulary remain unaffected.
| 直接在中文語(yǔ)料庫(kù)上對(duì)LLaMA進(jìn)行預(yù)訓(xùn)練面臨一些挑戰(zhàn)。首先,原始LLaMA分詞器的詞匯表中只包含不到一千個(gè)中文字。雖然LLaMA分詞器通過(guò)字節(jié)回退來(lái)支持所有中文字,但這種回退策略會(huì)顯著增加序列長(zhǎng)度,并降低處理中文文本的效率。此外,字節(jié)標(biāo)記并不是專門設(shè)計(jì)用于表示中文字符的,因?yàn)樗鼈円灿糜诒硎酒渌鸘TF-8標(biāo)記,這使得字節(jié)標(biāo)記難以學(xué)習(xí)到中文字符的語(yǔ)義含義。 為了解決這些問(wèn)題,我們提議通過(guò)添加額外的中文標(biāo)記來(lái)擴(kuò)展LLaMA分詞器,并使模型適應(yīng)新的分詞器(Yang等人,2022年):
- 為增強(qiáng)分詞器對(duì)中文文本的支持,我們首先使用SentencePiece(Kudo&Richardson,2018年)在中文語(yǔ)料庫(kù)上訓(xùn)練一個(gè)中文分詞器,使用詞匯表大小為20,000。然后,我們將中文分詞器與原始LLaMA分詞器合并,結(jié)合它們的詞匯表。最終,我們獲得了一個(gè)合并的分詞器,稱為Chinese LLaMA分詞器,其詞匯表大小為49,953。
- 為了使模型適應(yīng)Chinese LLaMA分詞器,我們將詞嵌入和語(yǔ)言模型頭部的形狀從V×H調(diào)整為V′×H,其中V = 32,000表示原始詞匯表大小,V′ = 49,953表示Chinese LLaMA分詞器的詞匯表大小。
新的行被追加到原始嵌入矩陣的末尾,確保原始詞匯表中的標(biāo)記的嵌入不受影響。 | Our preliminary experiments show that the number of tokens generated by the Chinese-LLaMA tokenizer is roughly half of those generated by the original LLaMA tokenizer. Table 1 shows an example comparison between the original LLaMA tokenizer and our Chinese LLaMA tokenizer. As we can see, using the Chinese LLaMA tokenizer significantly reduces the encoding length compared to the original. Given a fixed context length, the model can accommodate about twice as much information, and the generation speed is two times faster compared to the original LLaMA tokenizer. This demonstrates the effectiveness of our proposed approach in enhancing the Chinese understanding and generation capabilities of the LLaMA model. After completing the aforementioned adaptation steps, we pre-train the Chinese-LLaMA model using the Chinese-LLaMA tokenizer on the standard Causal Language Modeling (CLM) task. Given an input token sequence x = (x0, x1, x2, . . .), the model is trained to predict the next token in an autoregressive manner. The objective is to minimize the following negative log likelihood: | 我們的初步實(shí)驗(yàn)結(jié)果顯示,Chinese LLaMA分詞器生成的標(biāo)記數(shù)量大約是原始LLaMA分詞器生成的標(biāo)記數(shù)量的一半。表1展示了原始LLaMA分詞器和我們的Chinese LLaMA分詞器之間的示例對(duì)比。從中可以看出,使用Chinese LLaMA分詞器顯著減少了編碼長(zhǎng)度,相比原始分詞器,給定固定的上下文長(zhǎng)度,模型可以容納大約兩倍的信息量,并且生成速度比原始LLaMA分詞器快兩倍。這證明了我們提出的方法在增強(qiáng)LLaMA模型的中文理解和生成能力方面的有效性。 在完成上述適應(yīng)步驟后,我們使用Chinese LLaMA分詞器在標(biāo)準(zhǔn)的因果語(yǔ)言建模(CLM)任務(wù)上對(duì)Chinese LLaMA模型進(jìn)行預(yù)訓(xùn)練。給定輸入標(biāo)記序列x =(x0,x1,x2,...),模型以自回歸的方式預(yù)測(cè)下一個(gè)標(biāo)記。目標(biāo)是最小化以下負(fù)對(duì)數(shù)似然: |
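The negative log-likelihood referenced above takes the standard causal language modeling form; as a sketch consistent with the surrounding notation (Θ denotes the model parameters):

```latex
\mathcal{L}_{\mathrm{CLM}}(\Theta) = -\sum_{i} \log p\left(x_i \mid x_0, x_1, \ldots, x_{i-1}; \Theta\right)
```

The vocabulary merge and embedding resize described in the two steps above can be sketched with the sentencepiece and Hugging Face transformers libraries, as shown below. The file paths are placeholders and this is an illustration of the procedure, not the authors' exact script; see the project's GitHub repository for the official implementation.

```python
from sentencepiece import sentencepiece_model_pb2 as sp_pb2
from transformers import LlamaForCausalLM, LlamaTokenizer

# Load the original LLaMA tokenizer and a Chinese SentencePiece model trained
# separately on the Chinese corpus (vocabulary size 20,000); paths are placeholders.
llama_tokenizer = LlamaTokenizer.from_pretrained("path/to/original-llama")
chinese_sp = sp_pb2.ModelProto()
chinese_sp.ParseFromString(open("chinese_sp.model", "rb").read())

# Merge: append every Chinese piece that the original vocabulary lacks,
# so the original tokens keep their ids.
llama_sp = sp_pb2.ModelProto()
llama_sp.ParseFromString(llama_tokenizer.sp_model.serialized_model_proto())
existing_pieces = {p.piece for p in llama_sp.pieces}
for piece in chinese_sp.pieces:
    if piece.piece not in existing_pieces:
        new_piece = sp_pb2.ModelProto().SentencePiece()
        new_piece.piece, new_piece.score = piece.piece, 0.0
        llama_sp.pieces.append(new_piece)

with open("chinese_llama.model", "wb") as f:
    f.write(llama_sp.SerializeToString())      # merged vocabulary (~49,953 pieces)

# Resize word embeddings and LM head from V x H to V' x H;
# resize_token_embeddings appends new rows and leaves the original rows untouched.
chinese_tokenizer = LlamaTokenizer(vocab_file="chinese_llama.model")
model = LlamaForCausalLM.from_pretrained("path/to/original-llama")
model.resize_token_embeddings(len(chinese_tokenizer))
```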
3、Chinese Alpaca
After obtaining the pre-trained Chinese LLaMA model, we follow the approach used in Stanford Alpaca (Taori et al., 2023) to apply self-instructed fine-tuning to train the instruction-following model. Each example consists of an instruction and an output. We input the instruction into the model and prompt the model to generate the output auto-regressively. This process is similar to the causal language modeling task. We adopt the following prompt template from Stanford Alpaca for self-instructed fine-tuning, which is also utilized during inference: A key difference between our approach and Stanford Alpaca is that we exclusively use the prompt template designed for examples without an input field, whereas Stanford Alpaca employs two templates for examples with and without an input field separately. If the example contains a non-empty input field, we concatenate the instruction and input with an “\n” to form the new instruction. Note that there is an additional padding token for the Alpaca model, resulting in a vocabulary size of 49,954. | 在獲得預(yù)訓(xùn)練的Chinese LLaMA模型之后,我們按照Stanford Alpaca(Taori等人,2023年)中使用的方法,采用自我指導(dǎo)的微調(diào)方法來(lái)訓(xùn)練指令跟隨模型。 每個(gè)示例包括一條指令和一個(gè)輸出。我們將指令輸入模型,并提示模型自回歸地生成輸出。這個(gè)過(guò)程類似于因果語(yǔ)言建模任務(wù)。我們采用了Stanford Alpaca中的以下提示模板來(lái)進(jìn)行自我指導(dǎo)的微調(diào),在推理過(guò)程中也使用該模板: 我們的方法與Stanford Alpaca的一個(gè)關(guān)鍵區(qū)別在于,我們專門使用為沒(méi)有輸入字段的示例設(shè)計(jì)的提示模板,而Stanford Alpaca分別使用了針對(duì)具有和不具有輸入字段的示例的兩個(gè)模板。如果示例包含非空的輸入字段,我們將指令和輸入用“\n”連接起來(lái)形成新的指令。請(qǐng)注意,Alpaca模型還有一個(gè)額外的填充標(biāo)記,導(dǎo)致詞匯表大小為49,954。 |
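The prompt template referenced above is the Stanford Alpaca template for examples without an input field. The sketch below shows that template together with the “\n” concatenation rule described in the text; the helper function name is illustrative, and the exact trailing whitespace after "### Response:" may differ from the released training code.

```python
# Stanford Alpaca "no input" prompt template, used for both fine-tuning and inference.
PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)

def build_prompt(instruction: str, input_text: str = "") -> str:
    """Fold a non-empty input field into the instruction, then fill the single template."""
    if input_text.strip():
        instruction = f"{instruction}\n{input_text}"   # concatenate with "\n" as described
    return PROMPT_TEMPLATE.format(instruction=instruction)

print(build_prompt("將下面的句子翻譯成英文", "今天天氣很好"))
```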
4、Parameter-Efficient Fine-Tuning with LoRA使用LoRA進(jìn)行參數(shù)高效微調(diào)
Low-Rank Adaptation (LoRA) (Hu et al., 2021) is a parameter-efficient training method that maintains the pre-trained model weights while introducing trainable rank decomposition matrices. This approach significantly reduces the number of trainable parameters. The general formulation of LoRA is represented in the following equation, where r is the pre-determined rank, d is the hidden size, and A and B are the decomposed trainable matrices: h = W0x + ΔWx = W0x + BAx, where B ∈ R^(d×r), A ∈ R^(r×d)  (3) | 低秩適應(yīng)(LoRA)(Hu等,2021年)是一種參數(shù)高效的訓(xùn)練方法,它在保持預(yù)訓(xùn)練模型權(quán)重的同時(shí)引入可訓(xùn)練的秩分解矩陣。這種方法顯著減少了可訓(xùn)練參數(shù)的數(shù)量。LoRA的一般公式如下所示,其中r是預(yù)定的秩,d是隱藏大小,A和B是分解的可訓(xùn)練矩陣: h = W0x + ΔWx = W0x + BAx, 其中B ∈ R^(d×r), A ∈ R^(r×d) (3) | To achieve parameter-efficient training while adhering to a tight budget, we apply LoRA to the Chinese-LLaMA/Alpaca models in all our experiments, including both pre-training and fine-tuning stages. We primarily incorporate LoRA adapters into the weights of the attention module and, in some cases, additional MLP layers. For further details, please refer to the next section and Table 2. | 為了在預(yù)算限制下實(shí)現(xiàn)參數(shù)高效的訓(xùn)練,我們將LoRA應(yīng)用于所有實(shí)驗(yàn)中的Chinese LLaMA/Alpaca模型,包括預(yù)訓(xùn)練和微調(diào)階段。我們主要將LoRA適配器應(yīng)用于注意力模塊的權(quán)重,有時(shí)還包括其他MLP層。更多詳細(xì)信息,請(qǐng)參見下一節(jié)和表2。 |
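To make Equation (3) concrete, the following minimal PyTorch sketch wraps a frozen pre-trained linear layer with trainable low-rank factors A and B. It illustrates the LoRA formulation only and is not the training code used in the report; the alpha scaling factor is a common implementation detail that does not appear in Equation (3).

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """h = W0 x + B A x, with W0 frozen and only A (r x d) and B (d x r) trainable."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 32.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)               # keep the pre-trained W0 fixed
        d_out, d_in = base.weight.shape
        self.A = nn.Parameter(torch.randn(r, d_in) * 0.01)   # A in R^{r x d}
        self.B = nn.Parameter(torch.zeros(d_out, r))         # B in R^{d x r}, so ΔW = BA starts at 0
        self.scaling = alpha / r                             # common LoRA scaling, not in Eq. (3)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scaling * (x @ self.A.T @ self.B.T)

# Example: a rank-8 adapter on a 4096-dimensional attention projection.
q_proj = LoRALinear(nn.Linear(4096, 4096, bias=False), r=8)
hidden = q_proj(torch.randn(2, 16, 4096))                    # (batch, seq, hidden)
```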
5,、Experimental Setups實(shí)驗(yàn)設(shè)置
5.1、Experimental Setups For Pre-Training And Fine-Tuning預(yù)訓(xùn)練和微調(diào)的實(shí)驗(yàn)設(shè)置
5.1.1,、7B Version版本
Pre-training We initialize the Chinese-LLaMA model with the original LLaMA weights and pre-train the model on general Chinese corpora, consistent with the corpora used in Chinese BERT-wwm (Cui et al., 2021), MacBERT (Cui et al., 2020), LERT (Cui et al., 2022), and others, resulting in a 20GB text corpus. The pre-training process consists of two stages: Stage 1: We fix the parameters of the transformer encoders within the model and only train the embeddings, adapting the newly added Chinese word vectors while minimizing the disturbance to the original model. Stage 2: We add LoRA weights (adapters) to the attention mechanisms and train the embeddings, LM heads, and newly added LoRA parameters. | 預(yù)訓(xùn)練:我們使用原始LLaMA模型的權(quán)重初始化Chinese LLaMA模型,并在與中文BERT-wwm(Cui等,2021年)、MacBERT(Cui等,2020年)、LERT(Cui等,2022年)等使用的語(yǔ)料庫(kù)相一致的通用中文語(yǔ)料庫(kù)上對(duì)模型進(jìn)行預(yù)訓(xùn)練,得到一個(gè)20GB的文本語(yǔ)料庫(kù)。預(yù)訓(xùn)練過(guò)程分為兩個(gè)階段: 階段1:我們固定模型內(nèi)部的Transformer編碼器的參數(shù),只訓(xùn)練嵌入層,適應(yīng)新增的中文詞向量,同時(shí)盡量減小對(duì)原始模型的干擾。 階段2:我們添加LoRA權(quán)重(適配器)到注意力機(jī)制中,并訓(xùn)練嵌入層、語(yǔ)言模型頭部和新增的LoRA參數(shù)。 | Instruction Fine-tuning After obtaining the pre-trained model, we fine-tune it according to Section 3. We also use LoRA for efficient fine-tuning, increasing the number of trainable parameters by adding LoRA adapters to the MLP layers. We utilize approximately 2M data points, including translation (Xu, 2019), pCLUE, Stanford Alpaca, and crawled SFT data for tuning the 7B model. For the crawled data, we employ the self-instruction method (Wang et al., 2022) for automatically obtaining data from ChatGPT (gpt-3.5-turbo API), as used in Taori et al. (2023). Templates and code details are available on GitHub. The hyperparameters are listed in Table 2. Detailed information about the fine-tuning data is provided in Table 3. | 指令微調(diào):在獲得預(yù)訓(xùn)練模型后,我們根據(jù)第3節(jié)對(duì)其進(jìn)行微調(diào)。我們還使用LoRA進(jìn)行高效微調(diào),通過(guò)將LoRA適配器添加到MLP層中,增加可訓(xùn)練參數(shù)的數(shù)量。我們使用了約2M個(gè)數(shù)據(jù)點(diǎn)進(jìn)行7B模型的微調(diào),包括翻譯(Xu,2019年)、pCLUE、Stanford Alpaca和從ChatGPT(gpt-3.5-turbo API)中自動(dòng)獲取的爬取的SFT數(shù)據(jù),這與Taori等人(2023年)使用的方法相同。模板和代碼細(xì)節(jié)可以在GitHub上找到。 具體的超參數(shù)請(qǐng)參見表2。有關(guān)微調(diào)數(shù)據(jù)的詳細(xì)信息請(qǐng)參見表3。 |
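The adapter placement used for the 7B instruction SFT stage (rank-8 LoRA on the attention Q/K/V/O and MLP projections, with the extended embeddings and LM head kept trainable) can be written with the Hugging Face peft library roughly as below. The model path is a placeholder, the module names assume the standard transformers LLaMA implementation, and lora_alpha / lora_dropout are assumed values that Table 2 does not report.

```python
from peft import LoraConfig, TaskType, get_peft_model
from transformers import LlamaForCausalLM

model = LlamaForCausalLM.from_pretrained("path/to/chinese-llama-7b")  # placeholder path

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                                   # LoRA rank from Table 2
    lora_alpha=32,                         # assumed; not reported in Table 2
    lora_dropout=0.05,                     # assumed; not reported in Table 2
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",   # attention (QKVO)
                    "gate_proj", "up_proj", "down_proj"],     # MLP matrices
    modules_to_save=["embed_tokens", "lm_head"],              # fully trainable parts
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()   # compare with the ~6% trainable reported for the 7B SFT stage
```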
5.1.2,、13B Version版本
Pre-training The pre-training process for the 13B model is largely the same as that of the 7B model, with the exception that we skip stage 1 in the pre-training. We directly apply LoRA to attentions and MLPs for training while setting the embeddings and LM head as trainable. Instruction Fine-tuning The LoRA settings and trainable parameters remain the same as in the pre-training stage. We use an additional 1M crawled self-instructed data points for the 13B model fine-tuning, resulting in a total data size of 3M for the 13B model. | 預(yù)訓(xùn)練:13B模型的預(yù)訓(xùn)練過(guò)程與7B模型基本相同,唯一的區(qū)別是我們跳過(guò)了預(yù)訓(xùn)練的第一階段。我們直接對(duì)注意力機(jī)制和MLP進(jìn)行LoRA訓(xùn)練,同時(shí)將嵌入層和語(yǔ)言模型頭部設(shè)置為可訓(xùn)練。 指令微調(diào):LoRA的設(shè)置和可訓(xùn)練參數(shù)與預(yù)訓(xùn)練階段相同。我們使用額外的1M個(gè)自我指導(dǎo)的爬取數(shù)據(jù)點(diǎn)對(duì)13B模型進(jìn)行微調(diào),從而使13B模型的總數(shù)據(jù)量達(dá)到3M。 | The hyperparameters are listed in Table 2. Table 2: Training recipes for LLaMA (pre-training stages) and Alpaca (instruction SFT stage) 7B and 13B. PT: pre-training. SFT: supervised fine-tuning. QKVO: four matrices (representing query, key, value, and output) in each attention module. MLP: three matrices in each MLP layer. | 具體的超參數(shù)請(qǐng)參見表2。 表2:LLaMA(預(yù)訓(xùn)練階段)和Alpaca(指令SFT階段)7B和13B的訓(xùn)練配置。PT:預(yù)訓(xùn)練。SFT:監(jiān)督微調(diào)。QKVO:每個(gè)注意力模塊中的四個(gè)矩陣(表示查詢、鍵、值和輸出)。MLP:每個(gè)MLP層中的三個(gè)矩陣。 |
7B Settings | PT Stage 1 | PT Stage 2 | Instruction SFT
Batch size | 1024 | 1024 | 512
Peak learning rate | 2e-4 | 1e-4 | 1e-4
Training steps | 3K | 6K | 6-10K
Max length | 512 | 512 | 512
Trainable parameters | 2.97% | 6.06% | 6.22%
LoRA rank | - | 8 | 8
LoRA weights | - | QKVO | QKVO, MLP
Training device | 8 × A100 | 16 × A100 | 16 × A100
Distributed training | DeepSpeed ZeRO-2 | DeepSpeed ZeRO-2 | DeepSpeed ZeRO-2

13B Settings | PT | Instruction SFT
Batch size | 2304 | 1152
Peak learning rate | 2e-4 | 1e-4
Training steps | 7K | 5.5K
Max length | 512 | 512
Trainable parameters | 4.10% | 4.10%
LoRA rank | 8 | 8
LoRA weights | QKVO, MLP | QKVO, MLP
Training device | 48 × A100 | 48 × A100
Distributed training | DeepSpeed ZeRO-2 | DeepSpeed ZeRO-2
5.2,、Experimental Setups For Decoding解碼的實(shí)驗(yàn)設(shè)置
The decoding process of LLMs plays a critical role in determining the quality and diversity of the generated text. In our experiments, we use the following decoding hyperparameters: | LLM的解碼過(guò)程在確定生成文本的質(zhì)量和多樣性方面起著關(guān)鍵作用。在我們的實(shí)驗(yàn)中,我們使用以下解碼超參數(shù): | Context size: We set the context size to 2048, which determines the maximum number of tokens that the model can consider simultaneously when generating text. Maximum sequence length: We limit the generated sequence length to 512 tokens to ensure that the outputs remain focused and relevant to the input prompt. Temperature: We set the temperature to 0.2, which controls the randomness of the sampling process. Lower values make the model generate more focused and deterministic outputs, while higher values increase diversity at the cost of coherence. Top-k sampling: We use Top-k sampling with k = 40, meaning that the model selects its next token from the top 40 most probable tokens at each step, adding an element of randomness and diversity to the generated text. Top-p sampling: We also employ Top-p sampling with p = 0.9, which further enhances diversity by considering a dynamic set of tokens that collectively account for 90% of the probability mass. Repetition penalty: To discourage the model from generating repetitive text, we apply a repetition penalty with a factor of 1.3, penalizing tokens that have already been selected. Note that these values may not be optimal for each testing scenario. We did not perform further tuning on these hyperparameters for each task to maintain a balanced view. | >> 上下文大小:我們將上下文大小設(shè)置為2048,這決定了模型在生成文本時(shí)同時(shí)考慮的最大標(biāo)記數(shù)。 >> 最大序列長(zhǎng)度:我們將生成的序列長(zhǎng)度限制為512個(gè)標(biāo)記,以確保輸出保持專注和與輸入提示相關(guān)。 >> 溫度:我們將溫度設(shè)置為0.2,控制抽樣過(guò)程的隨機(jī)性。較低的值使模型生成更專注和確定性的輸出,而較高的值增加了多樣性,但可能降低連貫性。 >> Top-k抽樣:我們使用k=40的Top-k抽樣,意味著模型在每個(gè)步驟從最有可能的前40個(gè)標(biāo)記中選擇下一個(gè)標(biāo)記,從而為生成的文本添加一定的隨機(jī)性和多樣性。 >> Top-p抽樣:我們還使用p=0.9的Top-p抽樣,通過(guò)考慮動(dòng)態(tài)的標(biāo)記集合,這些標(biāo)記總體上占據(jù)了90%的概率質(zhì)量,進(jìn)一步增強(qiáng)了多樣性。 >> 重復(fù)懲罰:為了防止模型生成重復(fù)的文本,我們應(yīng)用了一個(gè)重復(fù)懲罰因子為1.3的機(jī)制,對(duì)已經(jīng)被選中的標(biāo)記進(jìn)行懲罰。 請(qǐng)注意,這些值可能不適用于每個(gè)測(cè)試場(chǎng)景。我們沒(méi)有針對(duì)每個(gè)任務(wù)進(jìn)一步調(diào)整這些超參數(shù),以保持平衡的觀點(diǎn)。 |
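These decoding settings map directly onto the standard generation arguments of the Hugging Face transformers library; a minimal sketch is shown below (model and tokenizer loading is omitted, and the commented lines are illustrative).

```python
from transformers import GenerationConfig

generation_config = GenerationConfig(
    do_sample=True,
    temperature=0.2,         # low temperature for more deterministic outputs
    top_k=40,                # sample from the 40 most probable tokens
    top_p=0.9,               # nucleus sampling over 90% of the probability mass
    repetition_penalty=1.3,  # discourage repeated tokens
    max_new_tokens=512,      # cap the generated sequence length
)

# inputs = tokenizer(prompt, return_tensors="pt")          # context window of 2048 tokens
# outputs = model.generate(**inputs, generation_config=generation_config)
```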
5.3,、Deployment On CPU在CPU上部署
Deploying large language models on personal computers, particularly on CPUs, has historically been challenging due to their immense computational requirements. However, with the help of many community efforts, such as llama.cpp (Gerganov, 2023), users can efficiently quantize LLMs into 4-bit forms, significantly reducing memory usage and computational demands, making it easier to deploy LLMs on personal computers. This also enables quicker interactions with the models and facilitates local data processing. | 在個(gè)人計(jì)算機(jī)上部署大型語(yǔ)言模型,特別是在CPU上,過(guò)去一直存在著巨大的計(jì)算需求挑戰(zhàn)。然而,借助眾多社區(qū)的努力,例如llama.cpp(Gerganov,2023年),用戶可以將LLM有效地量化為4位形式,顯著減少內(nèi)存使用和計(jì)算需求,從而更容易在個(gè)人計(jì)算機(jī)上部署LLM。這也可以實(shí)現(xiàn)與模型的更快交互,并促進(jìn)本地?cái)?shù)據(jù)處理。 | Quantizing LLMs and deploying them on personal computers offer several benefits. Firstly, it helps users protect their data privacy by ensuring that sensitive information remains within their local environment, rather than being transmitted to external servers. Secondly, it democratizes access to LLMs by making them more accessible to users with limited computational resources. Lastly, it promotes the development of new applications and research directions that take advantage of local LLM deployments. Overall, the ability to deploy LLMs on personal computers using llama.cpp (or similar) paves the way for a more versatile and privacy-conscious utilization of LLMs in various domains. In the following sections, we will use the 4-bit round-to-nearest (RTN) (Yao et al., 2022; Dettmers et al., 2022) quantized Chinese Alpaca for evaluation, which is more realistic from a user perspective rather than a research-oriented view. As a kind reminder, 4-bit quantized models generally perform worse than FP16 or FP32 models. | 量化LLM并在個(gè)人計(jì)算機(jī)上部署具有多個(gè)好處。首先,它有助于用戶通過(guò)確保敏感信息保留在本地環(huán)境中而不是傳輸?shù)酵獠糠?wù)器來(lái)保護(hù)數(shù)據(jù)隱私。其次,它使具有有限計(jì)算資源的用戶更容易訪問(wèn)LLM,從而使其更加民主化。最后,它促進(jìn)了利用本地LLM部署的新應(yīng)用程序和研究方向的發(fā)展。總體而言,使用llama.cpp(或類似工具)在個(gè)人計(jì)算機(jī)上部署LLM為各個(gè)領(lǐng)域中更多樣化和注重隱私的LLM利用鋪平了道路。 在接下來(lái)的幾節(jié)中,我們將使用4位就近取整(RTN)(Yao等,2022年;Dettmers等,2022年)量化的中文Alpaca進(jìn)行評(píng)估,從用戶的角度來(lái)看,這更加符合實(shí)際,而不是面向研究。 需要提醒的是,4位量化模型的性能通常低于FP16或FP32模型。 |
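As a rough illustration of running a 4-bit quantized Chinese Alpaca model on a CPU, the snippet below uses the llama-cpp-python bindings around llama.cpp. The model filename is a placeholder, and quantization formats and parameter names have changed across llama.cpp versions, so treat this as a sketch rather than the project's official deployment instructions.

```python
from llama_cpp import Llama

# Load a 4-bit quantized model file produced by llama.cpp's conversion/quantization tools.
llm = Llama(model_path="./chinese-alpaca-7b-q4_0.bin", n_ctx=2048)

prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n請(qǐng)介紹一下自然語(yǔ)言處理。\n\n### Response:\n"
)
result = llm(prompt, max_tokens=512, temperature=0.2,
             top_k=40, top_p=0.9, repeat_penalty=1.3)
print(result["choices"][0]["text"])
```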
5.4、Evaluation And Task Design評(píng)估和任務(wù)設(shè)計(jì)
Evaluating the performance of text generation tasks can be challenging due to the significant variety in their form, unlike natural language understanding tasks (such as text classification and extractive machine reading comprehension). Following previous work that utilizes GPT-4 as a scoring method, we also adopt GPT-4 to provide an overall score (on a 10-point scale) for each sample, which is more efficient than human evaluation. However, GPT-4 may not always provide accurate scores, so we perform manual checks on its ratings and adjust them if necessary. The manual checks ensure that the scores are consistent and reflect the true performance of the models being evaluated. We use the following prompt template for scoring the outputs of the systems: | 由于文本生成任務(wù)的形式存在顯著的多樣性,評(píng)估其性能可能具有挑戰(zhàn)性,這與自然語(yǔ)言理解任務(wù)(如文本分類和抽取式機(jī)器閱讀理解)不同。我們遵循以GPT-4作為評(píng)分方法的先前工作,同樣采用GPT-4為每個(gè)樣本提供一個(gè)綜合得分(10分制),這比人工評(píng)估更高效。然而,GPT-4并不總是能夠提供準(zhǔn)確的評(píng)分,因此我們對(duì)其評(píng)分進(jìn)行人工檢查,并在必要時(shí)進(jìn)行調(diào)整。人工檢查確保得分的一致性,并反映所評(píng)估模型的真實(shí)性能。我們使用以下提示模板對(duì)系統(tǒng)輸出進(jìn)行評(píng)分: | By employing GPT-4 as a scoring method in conjunction with manual checks, we establish a reliable evaluation framework that effectively measures the performance of our Chinese Alpaca models on a range of natural language understanding and generation tasks. Our evaluation set is designed to provide a comprehensive assessment of the Chinese Alpaca models across a wide range of natural language understanding and generation tasks. The set comprises 160 samples, covering 10 distinct tasks, including Question Answering, Reasoning, Literature, Entertainment, Translation, Multi-turn Dialogue, Coding, and Ethics, among others. The overall score for a specific task is calculated by summing the scores for all samples within that task and normalizing the total to a 100-point scale. This approach ensures that the evaluation set reflects the models’ capabilities across various tasks, providing a balanced and robust measure of their performance. | 通過(guò)將GPT-4作為評(píng)分方法與人工檢查相結(jié)合,我們建立了一個(gè)可靠的評(píng)估框架,有效地衡量了我們的中文Alpaca模型在各種自然語(yǔ)言理解和生成任務(wù)上的性能。 我們的評(píng)估集旨在全面評(píng)估中文Alpaca模型在各種自然語(yǔ)言理解和生成任務(wù)中的性能。該集合包含160個(gè)樣本,涵蓋了10個(gè)不同的任務(wù),包括問(wèn)答、推理、文學(xué)、娛樂(lè)、翻譯、多輪對(duì)話、編碼和倫理等。特定任務(wù)的總體得分是通過(guò)對(duì)該任務(wù)中所有樣本的得分求和并將總和歸一化到100分制來(lái)計(jì)算的。這種方法確保評(píng)估集反映了模型在各種任務(wù)上的能力,提供了一個(gè)平衡和穩(wěn)健的性能度量。 |
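The task-level scoring rule described above (sum GPT-4's 10-point sample ratings and normalize to a 100-point scale) amounts to the small helper sketched below; the function name and data layout are illustrative only.

```python
def task_score(sample_scores: list[float]) -> float:
    """Normalize per-sample GPT-4 ratings (0-10 each) to a 100-point task score."""
    return sum(sample_scores) / (10 * len(sample_scores)) * 100

# e.g. 20 Question Answering samples averaging 7.7/10 yield a task score of 77.
print(task_score([7.7] * 20))   # 77.0
```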
6,、Results結(jié)果
In this section, we present and analyze the results obtained from our experiments with 4-bit quantized Chinese Alpaca-7B and Alpaca-13B models, as shown in Table 4. The evaluation is based on GPT-4 rated results across ten distinct NLP tasks, encompassing a total of 160 samples. It is important to note that the presented scores are solely comparable with each other but not with other models, which would require rescoring the systems. The performance of both Chinese Alpaca-7B and Alpaca-13B models demonstrates significant improvements over their original LLaMA counterparts. The Chinese Alpaca-13B model consistently outperforms the 7B variant, highlighting the benefits of increased model capacity. For Question Answering tasks, the Chinese Alpaca-13B achieves a score of 77, compared to 53 for the 7B model. Similar improvements can be observed in Open-ended QA, with scores of 73 and 64 for the 13B and 7B models, respectively. Numerical Reasoning shows a more considerable improvement, with the 13B model scoring 50 compared to 23 for the 7B model. | 在本節(jié)中,我們展示和分析了我們對(duì)4位量化的中文Alpaca-7B和Alpaca-13B模型進(jìn)行實(shí)驗(yàn)的結(jié)果,如表4所示。評(píng)估基于GPT-4對(duì)十個(gè)不同的自然語(yǔ)言處理任務(wù)中的總共160個(gè)樣本的評(píng)分結(jié)果。需要注意的是,所呈現(xiàn)的得分僅可相互比較,而不可與其他模型進(jìn)行比較,這將需要重新對(duì)系統(tǒng)進(jìn)行評(píng)分。 中文Alpaca-7B和Alpaca-13B模型的性能顯示出顯著的改進(jìn),超過(guò)了它們的原始LLaMA對(duì)應(yīng)模型。中文Alpaca-13B模型始終優(yōu)于7B變體,突顯了增加模型容量的好處。 在問(wèn)答任務(wù)中,中文Alpaca-13B的得分為77,而7B模型為53。在開放式問(wèn)答中也可以觀察到類似的改進(jìn),13B和7B模型的得分分別為73和64。在數(shù)字推理中,13B模型的得分為50,而7B模型的得分為23,顯示出更為顯著的改進(jìn)。 | In the domains of Poetry, Literature, Philosophy, Music, Sports, and Entertainment, the 13B model continues to outperform the 7B model, with scores of 54 and 65 against 31 and 36, respectively. The performance gap remains significant for tasks involving Letters and Articles, Translation, and Multi-turn Dialogue, with the 13B model consistently achieving higher scores. Interestingly, we observe that even though we did not use any multi-turn dialogue data for tuning systems, Chinese Alpaca still has the ability to track conversation history and follow user instructions in a consecutive manner. Coding tasks exhibit a noticeable improvement, with the Chinese Alpaca-13B scoring 49 compared to 27 for the 7B model. The most striking performance difference can be observed in the Ethics task, where the 13B model achieves a perfect score of 100, in contrast to the 7B model’s score of 50, indicating superior performance in rejecting any unethical user inputs. | 在詩(shī)歌、文學(xué)、哲學(xué)、音樂(lè)、體育和娛樂(lè)領(lǐng)域,13B模型繼續(xù)優(yōu)于7B模型,分別得分54和65,而7B模型得分為31和36。對(duì)于涉及信函和文章、翻譯以及多輪對(duì)話的任務(wù),性能差距仍然顯著,13B模型始終獲得更高的得分。有趣的是,我們觀察到,即使我們沒(méi)有使用任何多輪對(duì)話數(shù)據(jù)來(lái)調(diào)整系統(tǒng),中文Alpaca仍然具有跟蹤對(duì)話歷史和按照用戶指令連貫進(jìn)行的能力。 編碼任務(wù)顯示出明顯的改進(jìn),中文Alpaca-13B得分為49,而7B模型得分為27。最引人注目的性能差異出現(xiàn)在倫理任務(wù)中,13B模型獲得完美的100分,而7B模型得分為50,表明在拒絕任何不道德的用戶輸入方面具有更優(yōu)越的性能。 | In summary, the experimental results demonstrate that both Chinese Alpaca-7B and Alpaca-13B models exhibit significant improvements over their original LLaMA counterparts, with the 13B model consistently outperforming the 7B model across all tasks. This underscores the effectiveness of our approach in enhancing the Chinese understanding and generation capabilities of the LLaMA and Alpaca models. We provide some cases in Tables 5, 6, and 7. For full comparisons and samples, please refer to our GitHub repository.
| 總之,實(shí)驗(yàn)結(jié)果表明,中文Alpaca-7B和Alpaca-13B模型在所有任務(wù)中都顯示出顯著的改進(jìn),13B模型始終優(yōu)于7B模型。這凸顯了我們的方法在增強(qiáng)LLaMA和Alpaca模型的中文理解和生成能力方面的有效性。 我們?cè)诒?、表6和表7中提供了一些案例。有關(guān)完整的比較和樣本,請(qǐng)參閱我們的GitHub存儲(chǔ)庫(kù)。 | Table 4: GPT-4 rated results for 4-bit quantized Chinese Alpaca-7B and Alpaca-13B. Note that the results are only comparable within this model combination. | 表4:4位量化中文Alpaca-7B和Alpaca-13B的GPT-4評(píng)分結(jié)果。請(qǐng)注意,這些結(jié)果僅在該模型組合內(nèi)可比較。 |
Task | Samples # | Chinese-Alpaca-7B | Chinese-Alpaca-13B
Question Answering | 20 | 53 | 77
Open-ended QA | 20 | 64 | 73
Numerical Reasoning | 20 | 23 | 50
Poetry, Literature, Philosophy | 20 | 31 | 54
Music, Sports, Entertainment | 20 | 36 | 65
Letters and Articles Writing | 15 | 65 | 78
Translation | 15 | 63 | 78
Multi-turn Dialogue | 10 | 80 | 83
Coding | 10 | 27 | 49
Ethics | 10 | 50 | 100
Total | 160 | 49 | 71
7,、CONCLUSION結(jié)論
In this technical report, we have presented an approach to enhance the Chinese understanding and generation capabilities of the LLaMA model. Acknowledging the limitations of the original LLaMA’s Chinese vocabulary, we expanded it by incorporating 20K additional Chinese tokens, significantly increasing its encoding efficiency for the Chinese language. Building on the Chinese LLaMA, we employed supervised fine-tuning with instruction data, resulting in the development of the Chinese Alpaca models, which exhibit improved instruction-following capabilities. To evaluate our models effectively, we annotated 160 samples across 10 distinct task types and utilized GPT-4 for evaluation. Our experiments demonstrated that the proposed models significantly outperform the original LLaMA in Chinese understanding and generation tasks, with the 13B version consistently achieving greater improvements compared to the 7B variant. | 在本技術(shù)報(bào)告中,我們提出了一種增強(qiáng)LLaMA模型中文理解和生成能力的方法。鑒于原始LLaMA模型在中文詞匯方面的局限性,我們通過(guò)添加2萬(wàn)個(gè)額外的中文標(biāo)記來(lái)擴(kuò)展其詞匯表,顯著提高了對(duì)中文語(yǔ)言的編碼效率。在中文LLaMA的基礎(chǔ)上,我們采用了帶指令數(shù)據(jù)的有監(jiān)督微調(diào)方法,開發(fā)了中文Alpaca模型,其具有改進(jìn)的指令跟隨能力。 為了有效評(píng)估我們的模型,我們對(duì)10個(gè)不同任務(wù)類型的160個(gè)樣本進(jìn)行了注釋,并利用GPT-4進(jìn)行評(píng)估。我們的實(shí)驗(yàn)結(jié)果表明,所提出的模型在中文理解和生成任務(wù)中明顯優(yōu)于原始LLaMA模型,13B版本相比于7B版本持續(xù)取得更大的改進(jìn)。 | Looking ahead, we plan to explore Reinforcement Learning from Human Feedback (RLHF) or Reinforcement Learning from AI Instructed Feedback (RLAIF) to further align the models’ output with human preferences. Moreover, we intend to adopt more advanced and effective quantization methods, such as GPTQ (Frantar et al., 2022), among others. Additionally, we aim to investigate alternative methods to LoRA for more efficient and effective pre-training and fine-tuning of large language models, ultimately enhancing their performance and applicability across various tasks within the Chinese NLP community. | 展望未來(lái),我們計(jì)劃探索人類反饋強(qiáng)化學(xué)習(xí)(RLHF)或AI指導(dǎo)反饋強(qiáng)化學(xué)習(xí)(RLAIF)等方法,進(jìn)一步使模型的輸出與人類偏好相一致。此外,我們打算采用更先進(jìn)、更有效的量化方法,例如GPTQ(Frantar et al., 2022)等。另外,我們還打算研究替代LoRA的更高效、更有效的大型語(yǔ)言模型預(yù)訓(xùn)練和微調(diào)方法,最終提高這些模型在中文NLP社區(qū)內(nèi)各種任務(wù)中的性能和適用性。 |
LIMITATIONS限制
While this project has successfully enhanced the Chinese understanding and generation capabilities of the LLaMA and Alpaca models, several limitations must be acknowledged: | 雖然本項(xiàng)目成功增強(qiáng)了LLaMA和Alpaca模型的中文理解和生成能力,但仍需注意以下幾個(gè)限制: | >> Harmful and unpredictable content: Our results demonstrate that the 13B version has a better ability to reject unethical queries than the 7B version. However, these models may still generate content that is harmful or misaligned with human preferences and values. This issue may arise from biases present in the training data or the models’ inability to discern appropriate outputs in certain contexts. >> Insufficient training: Due to constraints in computing power and data availability, the training of the models may not be sufficient for optimal performance. As a result, there is still room for improvement in the Chinese understanding capabilities of the models. | >> 有害和不可預(yù)測(cè)的內(nèi)容:我們的結(jié)果表明,13B版本比7B版本更能拒絕不道德的查詢。然而,這些模型仍可能生成有害或與人類偏好和價(jià)值觀不一致的內(nèi)容。這個(gè)問(wèn)題可能源于訓(xùn)練數(shù)據(jù)中存在的偏見,或者模型在某些情境下無(wú)法判斷適當(dāng)?shù)妮敵觥?>> 訓(xùn)練不足:由于計(jì)算能力和數(shù)據(jù)可用性的限制,模型的訓(xùn)練可能不足以實(shí)現(xiàn)最佳性能。因此,模型在中文理解能力方面仍有改進(jìn)的空間。 |