[Original] [TensorFlow Crash Course] TensorFlow Image Classification: From Custom Model Definition to Testing

有三AI 2020-11-27


This is a TensorFlow crash-course example prepared for you.

言有三

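Graduated from the Chinese Academy of Sciences; works in computer vision; founder of 有三工作室 and other ventures.

Author | 言有三 (WeChat ID: Longlongtogo)

Editor | 言有三

The previous post introduced Caffe; this one introduces TensorFlow.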

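What Is TensorFlow

TensorFlow is an open-source machine learning library released by Google Brain. Like Caffe, it is mainly used for deep learning tasks.

Compared with Caffe, TensorFlow is much easier to install: a single pip command does the job, so newcomers are less likely to run into trouble.

TensorFlow = Tensor + Flow

A Tensor is an N-dimensional array, similar to a blob in Caffe. Flow refers to computation based on a data-flow graph. Running a neural network means data flowing from one layer to the next; in Caffe, every intermediate layer definition has a bottom and a top, which describes exactly this kind of flow. TensorFlow simply makes the process more explicit.

TensorFlow's defining feature is the computation graph: the graph is defined first and executed afterwards. As a result, all TensorFlow code (in the graph-based 1.x API used in this post) consists of two parts:

(1) Building the computation graph, which describes the data flow of the computation. What does this step do? It only defines operations; you can think of it as the counterpart of writing a prototxt in Caffe.

(2) Running a session, which executes the operations in the graph; this corresponds to the training process in Caffe. A TensorFlow session is much more flexible than Caffe, though: because it is driven from Python, fetching intermediate results for analysis and debugging is far more convenient.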
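The "define first, run later" split can be illustrated without TensorFlow at all. The following is a toy sketch in plain Python, not TensorFlow's implementation; `Node`, `const`, `add`, `mul`, and `run` are hypothetical names chosen for this illustration.

```python
# Toy illustration of "build the graph first, execute it later".
# This is NOT how TensorFlow is implemented; all names here are hypothetical.

class Node:
    """A graph node: an operation plus the nodes it reads from."""

    def __init__(self, op, inputs=(), value=None):
        self.op = op          # "const", "add", or "mul"
        self.inputs = inputs  # upstream nodes
        self.value = value    # only used by "const" nodes


def const(v):
    return Node("const", value=v)


def add(a, b):
    return Node("add", (a, b))


def mul(a, b):
    return Node("mul", (a, b))


def run(node):
    """Play the role of a session: evaluate a node by walking its inputs."""
    if node.op == "const":
        return node.value
    args = [run(n) for n in node.inputs]
    if node.op == "add":
        return args[0] + args[1]
    if node.op == "mul":
        return args[0] * args[1]
    raise ValueError("unknown op: " + node.op)


# Step 1: build the graph. Nothing is computed here.
y = add(mul(const(2.0), const(3.0)), const(1.0))

# Step 2: run it.
result = run(y)
print(result)  # 7.0
```

Building `y` only records which operations feed which; the arithmetic happens inside `run`, which is the role the session plays in the real library.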

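TensorFlow Training

This is a hands-on crash course, so rather than covering every detail we will stick to the main thread. With TensorFlow at hand, the next task is to train a model. Training involves data preparation, model definition, and saving and analyzing the results.

2.1 Data Preparation

As described in the previous post, data preparation in Caffe only requires a list file in which each line stores an image path and a label id; this is the input format of the ImageData layer in Caffe's default classification setup. To use a custom input format you can write your own data layer, but Caffe's official Data layer and ImageData layer are very stable and have hardly changed, which is one reason I prefer Caffe: for input data, simple is good enough. By comparison, TensorFlow's data input interfaces are much more complicated and change quickly. I wrote about this earlier in a Zhihu post, "From Caffe to TensorFlow 1: IO Operations"; interested readers can take a look.

We will not enumerate the many data IO methods TensorFlow offers here. Instead, we first fix the data format: the same as for Caffe, a list file in which each line holds an image path and a label id. Then we look at how to read that data into TensorFlow for training.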
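As a concrete picture of the list format, here is a small plain-Python sketch that reads a file with one "image labelid" pair per line. The helper name `read_list_file` and the file names in the usage part are made up for illustration; splitting on the last space is a choice of this sketch, so that paths containing spaces still parse.

```python
import os
import tempfile


def read_list_file(path):
    """Return (image_paths, labels) from a file with one 'image labelid' pair per line."""
    image_paths, labels = [], []
    with open(path, "r") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            # Split on the last space so paths containing spaces still work.
            image_path, label = line.rsplit(" ", 1)
            image_paths.append(image_path)
            labels.append(int(label))
    return image_paths, labels


# Usage with a throwaway list file; the image names below are invented.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("data/smile/0001.jpg 1\n")
    tmp.write("data/neutral/0002.jpg 0\n")
    list_path = tmp.name

paths, labels = read_list_file(list_path)
os.remove(list_path)
print(paths)
print(labels)
```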

We define a class called ImageData that mimics how the corresponding layer is used in Caffe. The code is below; the source code is available on Git.

from __future__ import print_function

import tensorflow as tf  # TensorFlow 1.x API (tf.data requires 1.4 or later)
from tensorflow.python.framework import dtypes
from tensorflow.python.framework.ops import convert_to_tensor
import numpy as np


class ImageData:

    def read_txt_file(self):
        """Read the list file; each line is "<image path> <label id>"."""
        self.img_paths = []
        self.labels = []
        with open(self.txt_file, 'r') as f:
            for line in f:
                if not line.strip():
                    continue  # skip blank lines
                items = line.strip().split(' ')
                self.img_paths.append(items[0])
                self.labels.append(int(items[1]))

    def __init__(self, txt_file, batch_size, num_classes,
                 image_size, buffer_scale=100):
        self.image_size = image_size
        self.batch_size = batch_size
        self.txt_file = txt_file  # list file, one "imagename id" pair per line
        self.num_classes = num_classes
        buffer_size = batch_size * buffer_scale

        # Read the image paths and labels
        self.read_txt_file()
        self.dataset_size = len(self.labels)
        print("num of train datas=", self.dataset_size)

        # Convert to tensors
        self.img_paths = convert_to_tensor(self.img_paths, dtype=dtypes.string)
        self.labels = convert_to_tensor(self.labels, dtype=dtypes.int32)

        # Create the dataset (tf.data.Dataset replaces the older tf.contrib.data.Dataset)
        data = tf.data.Dataset.from_tensor_slices((self.img_paths, self.labels))
        print("data type=", type(data))
        data = data.map(self.parse_function)
        data = data.repeat(1000)
        data = data.shuffle(buffer_size=buffer_size)

        # Group the data into batches
        self.data = data.batch(batch_size)
        print("self.data type=", type(self.data))

    def augment_dataset(self, image, size):
        # Brightness and contrast jitter (the image is float32 in [0, 1] at this point)
        distorted_image = tf.image.random_brightness(image,
                                                     max_delta=63)
        distorted_image = tf.image.random_contrast(distorted_image,
                                                   lower=0.2, upper=1.8)
        # Subtract off the mean and divide by the standard deviation of the pixels.
        float_image = tf.image.per_image_standardization(distorted_image)
        return float_image

    def parse_function(self, filename, label):
        """Load one image, then crop, flip and augment it; the label becomes one-hot."""
        label_ = tf.one_hot(label, self.num_classes)
        img = tf.read_file(filename)
        img = tf.image.decode_jpeg(img, channels=3)
        img = tf.image.convert_image_dtype(img, dtype=tf.float32)
        img = tf.random_crop(img, [self.image_size[0], self.image_size[1], 3])
        img = tf.image.random_flip_left_right(img)
        img = self.augment_dataset(img, self.image_size)
        return img, label_

Let's walk through the code above. The class ImageData contains four functions: the constructor __init__, the data-reading function read_txt_file, the preprocessing function parse_function, and the data augmentation function augment_dataset.

Let's go straight to the constructor, which works in several steps:

(1) Read the arguments: the list file txt_file, the batch size batch_size, the number of classes num_classes, the target image size image_size, and buffer_scale=100, which scales the shuffle buffer (buffer_size = batch_size * buffer_scale).

(2) With these values in place, read_txt_file is called. The code is simple: it stores the file list and the corresponding labels from the input txt file in self.img_paths and self.labels, much like Caffe.

(3) Next, img_paths and labels are each converted to a Tensor, TensorFlow's internal data structure, with convert_to_tensor.

(4) Create the dataset with Dataset.from_tensor_slices. This step combines the images and labels into a single data structure whose interface we will later use to read data in a loop during training. At this point the dataset only holds file names and labels; to get actual image data it still has to be processed. data.map applies the preprocessing: reading the image, converting its format, random cropping, random flipping and similar operations can all be done there.

data = data.repeat(1000) repeats the data 1000 times, which is enough to train for 1000 epochs. data = data.shuffle(buffer_size=buffer_size) shuffles the data; buffer_size controls how many elements the shuffle draws from at a time, so the more memory you have, the larger the value you can use.

(5) Assign self.data. Each training step consumes one batch of data, so self.data = data.batch(batch_size) takes one batch at a time from the dataset created above.
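To make the roles of repeat, shuffle, and batch concrete, here is a plain-Python sketch of the same three stages. It only approximates how a buffered shuffle behaves and is not the tf.data implementation; `shuffle_with_buffer` and `batch` are hypothetical helper names, and the fixed seed is there only to make the run repeatable.

```python
import random


def shuffle_with_buffer(items, buffer_size, rng):
    """Yield items in an order randomized within a sliding buffer of buffer_size."""
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)  # still filling the buffer
            continue
        # Buffer is full: emit a random element and put the new item in its slot.
        idx = rng.randrange(buffer_size)
        yield buffer[idx]
        buffer[idx] = item
    # Input exhausted: drain what is left in random order.
    rng.shuffle(buffer)
    for item in buffer:
        yield item


def batch(items, batch_size):
    """Group items into lists of batch_size; the last batch may be smaller."""
    current = []
    for item in items:
        current.append(item)
        if len(current) == batch_size:
            yield current
            current = []
    if current:
        yield current


data = list(range(10))
rng = random.Random(0)

# repeat(2): the same samples, two epochs back to back
stream = (x for _ in range(2) for x in data)
shuffled = list(shuffle_with_buffer(stream, 4, rng))
batches = list(batch(shuffled, 3))
print(batches)
```

With a buffer of 4, an element can only move a few positions away from where it started, which is why a larger buffer_size gives a more thorough shuffle at the cost of memory. Because the repeat comes before the shuffle, samples from neighbouring epochs can also end up in the same batch.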

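That completes the data interface. What remains is to see, in the training code, how an iterator is used to read the data.

For more on TensorFlow data-reading methods, see the Zhihu column and the WeChat official account.

2.2 Model Definition

With the data interface in place, we define a network.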

from __future__ import print_function

import tensorflow as tf  # TensorFlow 1.x API

debug = False  # set to True to print the layer shapes


def simpleconv3net(x, training=True):
    """Three conv + BN blocks followed by two fully connected layers.

    training is forwarded to the batch normalization layers: True uses
    batch statistics, False uses the stored moving averages.
    """
    x_shape = tf.shape(x)
    with tf.variable_scope("conv3_net"):
        conv1 = tf.layers.conv2d(x, name="conv1", filters=12, kernel_size=[3, 3], strides=(2, 2),
                                 activation=tf.nn.relu,
                                 kernel_initializer=tf.contrib.layers.xavier_initializer(),
                                 bias_initializer=tf.contrib.layers.xavier_initializer())
        bn1 = tf.layers.batch_normalization(conv1, training=training, name='bn1')
        conv2 = tf.layers.conv2d(bn1, name="conv2", filters=24, kernel_size=[3, 3], strides=(2, 2),
                                 activation=tf.nn.relu,
                                 kernel_initializer=tf.contrib.layers.xavier_initializer(),
                                 bias_initializer=tf.contrib.layers.xavier_initializer())
        bn2 = tf.layers.batch_normalization(conv2, training=training, name='bn2')
        conv3 = tf.layers.conv2d(bn2, name="conv3", filters=48, kernel_size=[3, 3], strides=(2, 2),
                                 activation=tf.nn.relu,
                                 kernel_initializer=tf.contrib.layers.xavier_initializer(),
                                 bias_initializer=tf.contrib.layers.xavier_initializer())
        bn3 = tf.layers.batch_normalization(conv3, training=training, name='bn3')

        # With 48x48 inputs, the three stride-2 convolutions leave a 5x5x48 feature map
        conv3_flat = tf.reshape(bn3, [-1, 5 * 5 * 48])
        dense = tf.layers.dense(inputs=conv3_flat, units=128, activation=tf.nn.relu, name="dense",
                                kernel_initializer=tf.contrib.layers.xavier_initializer())
        # units=2 is the number of classes; note that the logits pass through ReLU, so they are non-negative
        logits = tf.layers.dense(inputs=dense, units=2, activation=tf.nn.relu, name="logits",
                                 kernel_initializer=tf.contrib.layers.xavier_initializer())

        if debug:
            print("x size=", x.shape)
            print("relu_conv1 size=", conv1.shape)
            print("relu_conv2 size=", conv2.shape)
            print("relu_conv3 size=", conv3.shape)
            print("dense size=", dense.shape)
            print("logits size=", logits.shape)

        return logits

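The above is our network, a simple three-layer convolutional network. tf.layers provides many kinds of layers; here we use tf.layers.conv2d, tf.layers.batch_normalization, and tf.layers.dense, which are the convolution, BN, and fully connected layers respectively. Take one convolution layer as an example:

x is the input, name is the layer name, filters is the number of convolution kernels, kernel_size is the kernel size, strides is the convolution stride, activation is the activation function, and kernel_initializer and bias_initializer are the initialization methods for the weights and biases. As you can see, the activation function is built into the convolution layer. For the full list of parameters, consult the API documentation. There are other interfaces for defining networks, namely tf.nn, tf.layers, and tf.contrib, which overlap with one another; in my view this is somewhat confusing. tf.layers is used here because it exposes a rich set of parameters and is well suited to training a model from scratch.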
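The network flattens its last feature map with a hard-coded 5 * 5 * 48, and it is worth checking where that number comes from. The sketch below assumes 48x48 inputs (the image_size used for training) and 'valid' padding, which is the default for tf.layers.conv2d; `conv_output_size` is a hypothetical helper name.

```python
def conv_output_size(input_size, kernel_size, stride):
    """Spatial output size of a convolution with 'valid' padding (no padding)."""
    return (input_size - kernel_size) // stride + 1


size = 48   # input height and width
sizes = []
for _ in range(3):  # conv1, conv2, conv3: 3x3 kernels, stride 2
    size = conv_output_size(size, kernel_size=3, stride=2)
    sizes.append(size)

flat_dim = size * size * 48  # conv3 has 48 filters
print(sizes, flat_dim)  # [23, 11, 5] 1200
```

If the input size or any stride changes, the reshape in the network has to be updated to match.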

2.3 Model Training

As usual, we go straight to the code, which is quite simple.

from __future__ import print_function

from dataset import *  # provides ImageData
from net import simpleconv3net
import sys
import os
import cv2
import numpy as np
import tensorflow as tf

# ------- 1 Define some global variables -------

txtfile = sys.argv[1]  # list file, one "image labelid" pair per line
batch_size = 64
num_classes = 2
image_size = (48, 48)
learning_rate = 0.0001

debug = False

if __name__ == "__main__":

    # ------- 2 Load the network, define the loss, build the computation graph -------

    dataset = ImageData(txtfile, batch_size, num_classes, image_size)
    iterator = dataset.data.make_one_shot_iterator()
    dataset_size = dataset.dataset_size
    batch_images, batch_labels = iterator.get_next()
    Ylogits = simpleconv3net(batch_images)

    print("Ylogits size=", Ylogits.shape)

    Y = tf.nn.softmax(Ylogits)
    cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits=Ylogits, labels=batch_labels)
    cross_entropy = tf.reduce_mean(cross_entropy)
    correct_prediction = tf.equal(tf.argmax(Y, 1), tf.argmax(batch_labels, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    # Batch normalization registers its moving-average updates in UPDATE_OPS;
    # run them together with every training step.
    update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
    with tf.control_dependencies(update_ops):
        train_step = tf.train.AdamOptimizer(learning_rate).minimize(cross_entropy)

    saver = tf.train.Saver()
    in_steps = 100  # log and save a checkpoint every in_steps iterations
    checkpoint_dir = 'checkpoints/'
    if not os.path.exists(checkpoint_dir):
        os.mkdir(checkpoint_dir)
    log_dir = 'logs/'
    if not os.path.exists(log_dir):
        os.mkdir(log_dir)
    summary = tf.summary.FileWriter(logdir=log_dir)
    loss_summary = tf.summary.scalar("loss", cross_entropy)
    acc_summary = tf.summary.scalar("acc", accuracy)
    image_summary = tf.summary.image("image", batch_images)

    # ------- 3 Run the session, save variables; debug code can be added to inspect intermediate results -------

    with tf.Session() as sess:
        init = tf.global_variables_initializer()
        sess.run(init)
        steps = 10000
        for i in range(steps):
            _, cross_entropy_, accuracy_, batch_images_, batch_labels_, loss_summary_, acc_summary_, image_summary_ = sess.run(
                [train_step, cross_entropy, accuracy, batch_images, batch_labels, loss_summary, acc_summary, image_summary])
            if i % in_steps == 0:
                print(i, "iterations,loss=", cross_entropy_, "acc=", accuracy_)
                saver.save(sess, checkpoint_dir + 'model.ckpt', global_step=i)
                summary.add_summary(loss_summary_, i)
                summary.add_summary(acc_summary_, i)
                summary.add_summary(image_summary_, i)
                # print("predict=", Ylogits, " labels=", batch_labels)

            if debug:
                # Show the first image of the batch; press 'q' to stop training
                imagedebug = batch_images_[0].copy()
                imagedebug = np.squeeze(imagedebug)
                print(imagedebug, imagedebug.shape)
                print(np.max(imagedebug))
                imagelabel = batch_labels_[0].copy()
                print(np.squeeze(imagelabel))

                imagedebug = cv2.cvtColor((imagedebug * 255).astype(np.uint8), cv2.COLOR_RGB2BGR)
                cv2.namedWindow("debug image", 0)
                cv2.imshow("debug image", imagedebug)
                k = cv2.waitKey(0)
                if k == ord('q'):
                    break
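The training script relies on tf.nn.softmax_cross_entropy_with_logits, tf.reduce_mean, and tf.argmax for its loss and accuracy. The plain-Python sketch below shows the arithmetic behind those quantities for a tiny, made-up batch; it is an illustration of the formulas, not TensorFlow's implementation, and the helper names are hypothetical.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax of one logit vector."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_sum for v in logits]


def cross_entropy(logits, one_hot_label):
    """Cross-entropy between a one-hot label and softmax(logits)."""
    return -sum(t * lp for t, lp in zip(one_hot_label, log_softmax(logits)))


def argmax(values):
    return max(range(len(values)), key=lambda i: values[i])


# Invented logits and one-hot labels for a batch of three samples, two classes
batch_logits = [[2.0, 0.0], [0.5, 1.5], [0.2, 1.0]]
batch_labels = [[1, 0], [0, 1], [1, 0]]

losses = [cross_entropy(l, t) for l, t in zip(batch_logits, batch_labels)]
mean_loss = sum(losses) / len(losses)  # what reduce_mean does to the per-sample losses

# softmax preserves the ordering of the logits, so argmax can be taken on either
correct = [argmax(l) == argmax(t) for l, t in zip(batch_logits, batch_labels)]
accuracy = sum(correct) / float(len(correct))

print(mean_loss, accuracy)
```

The third sample is misclassified, so it contributes the largest loss and pulls the accuracy down to two out of three.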

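2.4 Visualization

A very convenient feature of TensorFlow is visualization with TensorBoard. We will not go into how TensorBoard works; using it is simple and takes three steps.

Step 1: create the log directory.

log_dir = 'logs/'
if not os.path.exists(log_dir):
    os.mkdir(log_dir)

Step 2: create the summary operations and give each one a tag. To record the loss, the accuracy, and the images seen during training, we create the following:

loss_summary = tf.summary.scalar("loss", cross_entropy)
acc_summary = tf.summary.scalar("acc", accuracy)
image_summary = tf.summary.image("image", batch_images)

Step 3: evaluate the summaries in the session, as in the code below. The returned values are then written to the log with summary.add_summary, as in the training loop.

_, cross_entropy_, accuracy_, batch_images_, batch_labels_, loss_summary_, acc_summary_, image_summary_ = sess.run([train_step, cross_entropy, accuracy, batch_images, batch_labels, loss_summary, acc_summary, image_summary])

To view the training process and the final results, run:

tensorboard --logdir=logs

The loss and acc curves are shown below:

[Figure: loss and acc curves in TensorBoard]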

TensorFlow Testing

The model is now trained; the next goal is to use it for inference. Again, here is the code.

from __future__ import print_function

import tensorflow as tf
from net import simpleconv3net
import sys
import numpy as np
import cv2
import os

# Arguments: sys.argv[1] is the checkpoint path, sys.argv[2] is the test list file
testsize = 48
x = tf.placeholder(tf.float32, [1, testsize, testsize, 3])
# Note: as called here, the batch normalization layers run with training=True
y = simpleconv3net(x)
y = tf.nn.softmax(y)

# Build the preprocessing ops once and feed the file name,
# rather than adding new ops to the graph for every image
filename = tf.placeholder(tf.string, [])
img = tf.read_file(filename)
img = tf.image.decode_jpeg(img, channels=3)
img = tf.image.convert_image_dtype(img, dtype=tf.float32)
img = tf.image.resize_images(img, (testsize, testsize), method=tf.image.ResizeMethod.NEAREST_NEIGHBOR)
img = tf.image.per_image_standardization(img)

lines = open(sys.argv[2]).readlines()
count = 0
acc = 0
# The pos/neg counters are grouped by the predicted class (1 / 0)
posacc = 0
negacc = 0
poscount = 0
negcount = 0

with tf.Session() as sess:
    init = tf.global_variables_initializer()
    sess.run(init)
    saver = tf.train.Saver()
    saver.restore(sess, sys.argv[1])

    # Test one image at a time; this can be changed to batched input
    for line in lines:
        imagename, label = line.strip().split(' ')
        imgnumpy = sess.run(img, feed_dict={filename: imagename})
        imgs = np.zeros([1, testsize, testsize, 3], dtype=np.float32)
        imgs[0:1, ] = imgnumpy

        result = sess.run(y, feed_dict={x: imgs})
        result = np.squeeze(result)
        if result[0] > result[1]:
            predict = 0
        else:
            predict = 1

        count = count + 1
        if str(predict) == '0':
            negcount = negcount + 1
            if str(label) == str(predict):
                negacc = negacc + 1
                acc = acc + 1
        else:
            poscount = poscount + 1
            if str(label) == str(predict):
                posacc = posacc + 1
                acc = acc + 1

        print(result)

# max(..., 1) avoids a division by zero when a class was never predicted
print("acc = ", float(acc) / float(max(count, 1)))
print("poscount=", poscount)
print("posacc = ", float(posacc) / float(max(poscount, 1)))
print("negcount=", negcount)
print("negacc = ", float(negacc) / float(max(negcount, 1)))

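As the code above shows, the model must be defined at test time just as at training time, which plays the same role as the deploy file Caffe uses for testing.

Then the parameters are loaded from the checkpoint with the saver's restore function, the image is read and converted to the format the network expects, and sess.run returns the final result.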
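The test script groups its pos/neg counters by the predicted class. If accuracy per ground-truth class is wanted instead, the bookkeeping could look like the plain-Python sketch below. `per_class_accuracy` is a hypothetical helper, the labels and predictions are made-up numbers, and classes that never occur are reported as None rather than dividing by zero.

```python
def per_class_accuracy(labels, predictions, num_classes=2):
    """Return (overall accuracy, list of per-class accuracies keyed by true label)."""
    total = [0] * num_classes
    correct = [0] * num_classes
    for label, pred in zip(labels, predictions):
        total[label] += 1
        if label == pred:
            correct[label] += 1
    overall = sum(correct) / float(len(labels)) if labels else 0.0
    per_class = [
        correct[c] / float(total[c]) if total[c] else None  # None: class absent
        for c in range(num_classes)
    ]
    return overall, per_class


# Invented ground-truth labels and predictions for five test images
labels = [0, 0, 1, 1, 1]
predictions = [0, 1, 1, 1, 0]

overall, per_class = per_class_accuracy(labels, predictions)
print("acc =", overall)
print("per-class acc =", per_class)
```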

Summary

This post walked through a minimal classification example, which is more practical than most code built on the prepackaged MNIST or CIFAR datasets. We prepared our own dataset, designed our own network, visualized the results, and learned how to use a trained model for prediction.