Python Data Visualization: Word Clouds

liqualife 2019-11-30


Reading time: about 4 minutes

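Baidu Baike defines it as follows: a "word cloud" gives visual prominence to the "keywords" that appear most frequently in a piece of online text, forming a "keyword cloud" or "keyword rendering". This filters out most of the textual information, so that a reader can grasp the gist of the text at a single glance.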
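To make the "most frequent keywords" idea concrete, here is a minimal standard-library sketch of the counting step behind a word cloud: given text that is already split into space-separated words, it ranks the words by frequency. The helper name `top_keywords` and the sample sentence are hypothetical, for illustration only; the wordcloud library does its own counting internally.

```python
from collections import Counter

def top_keywords(text, n=3):
    """Return the n most frequent space-separated words with their counts (hypothetical helper)."""
    counts = Counter(text.split())
    return counts.most_common(n)

sample = "cloud word cloud python word cloud"
print(top_keywords(sample))  # [('cloud', 3), ('word', 2), ('python', 1)]
```

In a word cloud built from this sample, "cloud" would be drawn in the largest font and "python" in the smallest.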

Let's start with a few images to enjoy:

This is a word cloud I made by visualizing an article I scraped earlier.

A personalized version, with a background image added:

Personally, I generally prefer word clouds.

Enough chatter, on to the tutorial:

Required modules

import jieba
import numpy as np
from PIL import Image
from wordcloud import WordCloud
from matplotlib import pyplot as plt

A quick first try

First, the text has to be segmented, that is, each sentence is split into individual words. Here I use the jieba segmenter.

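import jieba

text = '我喜欢用Python做数据可视化'  # example input: the string/sentence you want to segment
cut = jieba.cut(text)    # jieba.cut returns a generator of words
string = ' '.join(cut)   # join the segmented words with spaces
print(string)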
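Why join the words with spaces? The sketch below uses only the standard `re` module to tokenize text with the pattern `\w[\w']+`, which is the default the wordcloud documentation lists for its `regexp` parameter (treat the exact pattern as an assumption for your installed version). Unsegmented Chinese contains no spaces, so the whole sentence comes out as a single "word"; after segmentation, each word becomes its own token. The segmented string is written by hand as a stand-in for what jieba might return, so the example runs without jieba installed.

```python
import re

# Assumed tokenizer pattern: one word character followed by one or more word characters or apostrophes
TOKEN_PATTERN = r"\w[\w']+"

raw = "我喜欢用Python做数据可视化"             # unsegmented: no spaces at all
segmented = "我 喜欢 用 Python 做 数据 可视化"  # hand-written stand-in for ' '.join(jieba.cut(raw))

print(re.findall(TOKEN_PATTERN, raw))        # ['我喜欢用Python做数据可视化']
print(re.findall(TOKEN_PATTERN, segmented))  # ['喜欢', 'Python', '数据', '可视化']
```

Note that with this pattern, single-character words such as 我 are dropped entirely, because a token needs at least two characters to match.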

Once the text is segmented, the words can be turned into a word cloud. I use the wordcloud library for this.

from matplotlib import pyplot as plt
from wordcloud import WordCloud

string = '''I volunteer to join the Communist Party of China, support the Party's program, abide by the Party's
Articles of Association, fulfill Party duties, implement Party decisions, strictly observe Party discipline,
keep the Party's secrets, be loyal to the Party, work actively, and fight for communism for life.
We are always ready to sacrifice everything for the Party and the people and never betray the Party.
'''
font = r'C:\Windows\Fonts\simfang.ttf'  # font path
wc = WordCloud(font_path=font,  # required for Chinese text, otherwise the characters render as boxes
               background_color='white',
               width=1000,
               height=800,
               ).generate(string)
wc.to_file('789.png')  # save the image
plt.imshow(wc)         # display the image with matplotlib
plt.axis('off')        # hide the axes
plt.show()             # show the figure
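The font path above only exists on Windows. As a hedged sketch, a small helper can pick the first CJK-capable font file that actually exists on the current machine. The helper name `find_font` and the macOS and Linux paths in the candidate list are assumptions; installed fonts vary by system, so adjust the list to match yours.

```python
import os

# Candidate font files. Only the Windows path comes from the example above;
# the others are common locations that may or may not exist on your system.
CANDIDATE_FONTS = [
    r'C:\Windows\Fonts\simfang.ttf',
    '/System/Library/Fonts/PingFang.ttc',
    '/usr/share/fonts/opentype/noto/NotoSansCJK-Regular.ttc',
]

def find_font(candidates=CANDIDATE_FONTS):
    """Return the first path in candidates that is an existing file, or None."""
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None
```

Usage: `font = find_font()`, then `WordCloud(font_path=font, ...)`. If it returns None, WordCloud falls back to its bundled default font, which has no Chinese glyphs, so Chinese text shows up as boxes, exactly as the code comment above warns.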

Result:

Parameter settings

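font_path : string  // path to the font file; give the path plus extension of whichever font you want to display, e.g. font_path = '黑体.ttf'
width : int (default=400)  // width of the output canvas in pixels
height : int (default=200)  // height of the output canvas in pixels
prefer_horizontal : float (default=0.90)  // how often words are laid out horizontally; with the default 0.9, horizontal placement is tried 90% of the time and vertical placement the remaining 10%
mask : nd-array or None (default=None)  // if None, words are drawn on the full rectangular canvas. If mask is not None, the width and height settings are ignored and the shape of the mask is used instead: pure-white (#FFFFFF) areas stay empty, and all other areas are filled with words. For example: bg_pic = np.array(Image.open('some_image.png')). The image's background must be pure white (#FFFFFF), and the shape you want to show can be any non-white color. You can use Photoshop to copy the shape onto a pure-white canvas and save it, and you're done.
scale : float (default=1)  // scales the canvas proportionally; with 1.5, both width and height become 1.5 times the original size
min_font_size : int (default=4)  // smallest font size displayed
font_step : int (default=1)  // step between font sizes; a step greater than 1 speeds up computation but may make the result noticeably less accurate
max_words : number (default=200)  // maximum number of words displayed
stopwords : set of strings or None  // words to exclude; if None, the built-in STOPWORDS set is used
background_color : color value (default="black")  // background color, e.g. background_color='white' gives a white background
max_font_size : int or None (default=None)  // largest font size displayed; if None, the image height is used
mode : string (default="RGB")  // with mode="RGBA" and background_color=None, the background is transparent
relative_scaling : float (default=.5)  // how strongly word frequency affects font size: at 0 only a word's rank matters, at 1 a word twice as frequent is drawn twice as large
color_func : callable, default=None  // function that generates a color for each word; if None, colors are taken from colormap
regexp : string or None (optional)  // regular expression used to split the input text into words
collocations : bool, default=True  // whether to include two-word collocations (bigrams)
colormap : string or matplotlib colormap, default="viridis"  // matplotlib colormap from which each word is assigned a random color; ignored if color_func is specified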
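As an example of `color_func`, here is a minimal sketch of a function that takes the parameters the wordcloud documentation lists for it (word, font_size, position, orientation, random_state, font_path) and gives every word a random shade of light grey. The name `grey_color_func` and the 60 to 100% lightness range are illustrative choices, not part of the library; the function returns an HSL color string, a format PIL understands.

```python
import random

def grey_color_func(word, font_size, position, orientation,
                    random_state=None, **kwargs):
    """Return a random light-grey color as a PIL-compatible 'hsl(...)' string."""
    # Use the Random instance passed in by WordCloud if there is one
    rng = random_state if random_state is not None else random.Random()
    lightness = rng.randint(60, 100)  # percent; higher means lighter
    return "hsl(0, 0%, {}%)".format(lightness)
```

Pass it as `WordCloud(color_func=grey_color_func, ...)`, or recolor an existing cloud with `wc.recolor(color_func=grey_color_func)`. Because color_func is set, any colormap is ignored. Light grey reads best on a dark background such as the default black.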

Custom background shape

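You can change the shape of the word cloud by setting the "mask=" parameter. However, the background image must have a white background: words are filled into every area that is not white.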
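To see the white/non-white rule without preparing an image file, here is a sketch that builds a mask array directly with NumPy, similar in spirit to the circular-mask example in the wordcloud repository. Pixels set to 255 (white) are left empty and pixels set to 0 are available for words. The canvas size of 300 and radius of 130 are arbitrary choices.

```python
import numpy as np

size, radius = 300, 130
center = size // 2
y, x = np.ogrid[:size, :size]  # open grids of row and column indices
outside = (x - center) ** 2 + (y - center) ** 2 > radius ** 2
# 255 marks masked-out (white) pixels, 0 marks the drawable circle
mask = (255 * outside).astype(np.uint8)
```

Passing it as `WordCloud(mask=mask, background_color='white', ...)` should produce a circular word cloud, with the 300x300 mask size replacing any width and height you set.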

Putting it together, my final code looks like this:

import jieba
import numpy as np
from matplotlib import pyplot as plt
from PIL import Image
from wordcloud import WordCloud

path = r'<directory where the files are stored>'
font = r'C:\Windows\Fonts\FZSTK.TTF'

with open(path + r',？？,？,？.txt', 'r', encoding='utf-8') as f:
    text = f.read()
cut = jieba.cut(text)   # segment the text into words
string = ' '.join(cut)
print(len(string))
img = Image.open(path + r'\456.png')  # open the background (mask) image
img_array = np.array(img)             # convert the image to a NumPy array

# Stopwords: words you don't want displayed. 'xa0' is only here because my earlier
# preprocessing didn't clean it out; remove it to see what it does.
stopword = {'xa0'}

wc = WordCloud(background_color='white',
               width=1000,   # width/height are ignored when a mask is given
               height=800,
               mask=img_array,
               font_path=font,
               stopwords=stopword)
wc.generate_from_text(string)    # draw the word cloud
plt.imshow(wc)
plt.axis('off')
plt.show()                       # show the image
wc.to_file(path + r'\123.png')   # save the image
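The `'xa0'` stopword above works around a leftover from scraping, most likely a non-breaking space (U+00A0, written `'\xa0'` in Python) that ended up in the text either as the real character or as the four characters `\xa0`. A hedged alternative is to clean the text before segmenting it. The helper name `clean_text` is hypothetical, and it only handles those two forms.

```python
def clean_text(text):
    """Replace real and escaped non-breaking spaces with ordinary spaces (hypothetical helper)."""
    text = text.replace('\xa0', ' ')   # actual U+00A0 characters
    text = text.replace('\\xa0', ' ')  # the escape sequence written out as plain characters
    return text
```

Calling `text = clean_text(text)` right after reading the file and before `jieba.cut(text)` should make the `'xa0'` stopword unnecessary.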

Source image: