久久国产成人av_抖音国产毛片_a片网站免费观看_A片无码播放手机在线观看,色五月在线观看,亚洲精品m在线观看,女人自慰的免费网址,悠悠在线观看精品视频,一级日本片免费的,亚洲精品久,国产精品成人久久久久久久

分享

Py之tiktoken:tiktoken的簡(jiǎn)介,、安裝,、使用方法之詳細(xì)攻略

 處女座的程序猿 2023-10-20 發(fā)布于上海

Py之tiktoken:tiktoken的簡(jiǎn)介、安裝,、使用方法之詳細(xì)攻略


tiktoken的簡(jiǎn)介

tiktoken是一個(gè)用于OpenAI模型的快速BPE標(biāo)記器,。

1、性能:tiktoken比一個(gè)類似的開(kāi)源分詞器快3到6倍

tiktoken的安裝

pip install tiktoken

pip install -i https://pypi.tuna./simple tiktoken
C:\Windows\system32>pip install -i https://pypi.tuna./simple tiktoken
Looking in indexes: https://pypi.tuna./simple
Collecting tiktoken
  Downloading https://pypi.tuna./packages/91/cf/7f3b821152f7abb240950133c60c394f7421a5791b020cedb190ff7a61b4/tiktoken-0.5.1-cp39-cp39-win_amd64.whl (760 kB)
     |████████████████████████████████| 760 kB 726 kB/s
Requirement already satisfied: regex>=2022.1.18 in d:\programdata\anaconda3\lib\site-packages (from tiktoken) (2022.3.15)
Requirement already satisfied: requests>=2.26.0 in d:\programdata\anaconda3\lib\site-packages (from tiktoken) (2.31.0)
Requirement already satisfied: charset-normalizer<4,>=2 in d:\programdata\anaconda3\lib\site-packages (from requests>=2.26.0->tiktoken) (2.0.12)
Requirement already satisfied: urllib3<3,>=1.21.1 in d:\programdata\anaconda3\lib\site-packages (from requests>=2.26.0->tiktoken) (1.26.9)
Requirement already satisfied: idna<4,>=2.5 in d:\programdata\anaconda3\lib\site-packages (from requests>=2.26.0->tiktoken) (3.3)
Requirement already satisfied: certifi>=2017.4.17 in d:\programdata\anaconda3\lib\site-packages (from requests>=2.26.0->tiktoken) (2021.10.8)
Installing collected packages: tiktoken
Successfully installed tiktoken-0.5.1

tiktoken的使用方法

1,、基礎(chǔ)用法

(1),、用于OpenAI模型的快速BPE標(biāo)記器

import tiktoken
enc = tiktoken.get_encoding("cl100k_base")
assert enc.decode(enc.encode("hello world")) == "hello world"

# To get the tokeniser corresponding to a specific model in the OpenAI API:
enc = tiktoken.encoding_for_model("gpt-4")

(2)、幫助可視化BPE過(guò)程的代碼

from tiktoken._educational import *

# Train a BPE tokeniser on a small amount of text
enc = train_simple_encoding()

# Visualise how the GPT-4 encoder encodes text
enc = SimpleBytePairEncoding.from_tiktoken("cl100k_base")
enc.encode("hello world aaaaaaaaaaaa")

    轉(zhuǎn)藏 分享 獻(xiàn)花(0

    0條評(píng)論

    發(fā)表

    請(qǐng)遵守用戶 評(píng)論公約

    類似文章 更多