"Artificial intelligence, deep learning, machine learning: whatever you're doing, if you don't understand these, learn them. Otherwise, within three years you'll be as extinct as the dinosaurs."
(Mark Cuban, billionaire owner of the NBA's Dallas Mavericks)

Mark Cuban's statement might sound drastic, but its message is spot on! We are in the middle of a revolution, one driven by huge amounts of data and a ton of computational power.

Think back for a minute to the early 20th century. Many people then did not understand electricity. For decades, even centuries, they had been used to doing things in one particular way, and all of a sudden everything around them started to change. Tasks that once required many people could now be done by one person with electricity. Today we are going through a similar journey with machine learning and deep learning.

So if you have not yet explored or understood the power of deep learning, you should start today. This article introduces the common terms and concepts used in the field, beginning with neural networks.

Basics of Neural Networks:

(1) Neuron: Just as the neuron is the basic unit of the brain, the neuron is also the basic building block of a neural network. Think about what happens when we receive new information: the brain processes it and then generates an output. A neural network works the same way. A neuron receives an input, processes it, and generates an output, which is either sent on to other neurons for further processing or serves as the final output.

(2) Weights: When an input reaches a neuron, it is multiplied by a weight. For example, if a neuron has two inputs, each input is assigned its own associated weight. We initialize the weights randomly and update them during model training. After training, the neural network assigns higher weights to the inputs it considers more important, while less important inputs receive smaller weights. A weight of zero means that the particular feature is insignificant.

Suppose the input is a and its associated weight is w1. After passing through the node, the input becomes a*w1.

(3) Bias: In addition to the weight, another linear component is applied to the input, called the bias. The bias is added to the result of multiplying the input by the weight, and it changes the range of that product. After adding the bias, the result becomes a*w1 + b1. This is the final linear component of the input transformation.

(4) Activation Function: Once the linear component has been applied to the input, a non-linear function is applied to it. This is done by passing the linear combination through an activation function, which translates the input signals into an output signal. For the example above, the output after applying the activation function is f(a*w1 + b1), where f() is the activation function.

More generally, suppose a neuron has n inputs x1 through xn with corresponding weights w1 through wn and a bias b. The weights are first multiplied by their corresponding inputs and summed together along with the bias:

u = ∑(wi * xi) + b

The activation function is then applied to u, and the neuron's final output is f(u).
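To make this concrete, here is a minimal Python sketch of a single neuron. The helper names (neuron_output, sigmoid) are mine, not from the article: it computes the linear component u = ∑(wi * xi) + b and then applies an activation function to it.

```python
import math

def sigmoid(x):
    # A common activation function; see the next section
    return 1.0 / (1.0 + math.exp(-x))

def neuron_output(inputs, weights, bias):
    # Linear component: u = sum(wi * xi) + b
    u = sum(w * x for w, x in zip(weights, inputs)) + bias
    # Non-linear activation applied to the linear component: f(u)
    return sigmoid(u)

# A neuron with two inputs, each with its own associated weight
print(neuron_output(inputs=[0.5, -1.2], weights=[0.8, 0.3], bias=0.1))
```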
Commonly Used Activation Functions

The most commonly used activation functions are Sigmoid, ReLU, and softmax.

(a) Sigmoid: One of the most commonly used activation functions, defined as:

sigmoid(x) = 1 / (1 + e^(-x))

The sigmoid transformation generates a smooth range of values between 0 and 1. We often need to observe how the output changes when the input changes slightly, and a smooth curve lets us do exactly that, which is why sigmoid is preferred over step functions.
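Here is a small numeric sketch of that smoothness claim, assuming the standard sigmoid definition above and a step function that fires at 0 (both helpers are mine): small changes in the input produce small, observable changes in the sigmoid output, while the step function jumps abruptly.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def step(x):
    # A step function for comparison: all-or-nothing at 0
    return 1.0 if x > 0 else 0.0

# Nudging the input slightly nudges the sigmoid output smoothly,
# while the step function flips from 0 to 1 all at once
for x in (-0.10, -0.01, 0.01, 0.10):
    print(f"x={x:+.2f}  sigmoid={sigmoid(x):.4f}  step={step(x):.0f}")
```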
(b) ReLU (Rectified Linear Units): Instead of sigmoid, recent networks tend to use the ReLU activation function for the hidden layers. It is defined as f(x) = max(0, x): the output is x when x > 0, and 0 when x <= 0. The major benefit of ReLU is that its derivative is constant for all inputs greater than 0, and this constant derivative helps the network train faster.
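A minimal sketch of ReLU and its derivative (the function names are mine); note that the derivative is the constant 1 for every positive input:

```python
def relu(x):
    # Outputs x when x > 0 and 0 when x <= 0
    return max(0.0, x)

def relu_derivative(x):
    # Constant derivative of 1 for all inputs greater than 0;
    # this constant gradient is what helps networks train faster
    return 1.0 if x > 0 else 0.0

for x in (-2.0, -0.5, 0.0, 0.5, 3.0):
    print(f"x={x:+.1f}  relu={relu(x):.1f}  d/dx={relu_derivative(x):.0f}")
```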
(c) Softmax: Softmax activation functions are normally used in the output layer for classification problems. Softmax is similar to the sigmoid function, the only difference being that its outputs are normalized to sum to 1. The sigmoid function works when we have a binary output; when we face a multiclass classification problem, softmax makes it easy to assign a value to each class, and these values can readily be interpreted as probabilities.

It is easiest to see it this way. Suppose you are trying to identify a digit that is actually a 6 but looks a bit like an 8. Softmax assigns a value to each digit, and we can easily see that the highest probability goes to 6, with the next highest going to 8, and so on (the sketch below illustrates this with made-up scores).
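Here is a small sketch of softmax applied to the digit example; the per-digit scores are hypothetical, chosen so that 6 scores highest and 8 next highest:

```python
import math

def softmax(scores):
    # Exponentiate, then normalize so the outputs sum to 1 and can be
    # interpreted as probabilities (subtracting the max keeps exp stable)
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for digits 0-9 for an ambiguous "6 that
# looks a bit like an 8": highest for 6, next highest for 8
scores = [0.1, 0.2, 0.1, 0.3, 0.2, 0.5, 3.0, 0.4, 2.2, 0.3]
probs = softmax(scores)
for digit, p in enumerate(probs):
    print(f"digit {digit}: {p:.3f}")
print("sum of probabilities:", round(sum(probs), 6))  # -> 1.0
```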
(5) Neural Network: Neural networks form the backbone of deep learning. The goal of a neural network is to find an approximation of an unknown function. It is formed of interconnected neurons, whose weights and biases are updated during network training depending on the error. The activation function applies a non-linear transformation to the linear combination, which then generates an output; combinations of activated neurons produce the network's overall output.

Liping Yang's definition of a neural network captures it best:

"Neural networks are made up of numerous interconnected conceptualized artificial neurons, which pass data between themselves, and which have associated weights which are tuned based upon the network's 'experience.' Neurons have activation thresholds which, if met by a combination of their associated weights and the data passed to them, are fired; combinations of fired neurons result in 'learning.'"
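To tie the pieces together, here is a toy forward pass through a tiny network (one ReLU hidden layer, a softmax output layer). The architecture and weights are made up for illustration; a real network would tune its weights from the error during training.

```python
import math

def relu(x):
    return max(0.0, x)

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dense(inputs, weights, biases):
    # Each neuron computes its linear component u = sum(wi * xi) + b
    return [sum(w * x for w, x in zip(ws, inputs)) + b
            for ws, b in zip(weights, biases)]

def forward(x, w_hidden, b_hidden, w_out, b_out):
    # Hidden layer: linear component followed by ReLU activation
    hidden = [relu(u) for u in dense(x, w_hidden, b_hidden)]
    # Output layer: linear scores normalized into class probabilities
    return softmax(dense(hidden, w_out, b_out))

# Made-up weights for a 2-input, 3-hidden-neuron, 2-class network
w_h = [[0.2, -0.5], [0.7, 0.1], [-0.3, 0.8]]
b_h = [0.0, 0.1, -0.1]
w_o = [[1.0, -1.0, 0.5], [-0.5, 0.8, 1.2]]
b_o = [0.0, 0.0]
print(forward([1.0, 2.0], w_h, b_h, w_o, b_o))
```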
More deep learning terms and concepts will follow in the next installment!

Translated by: 燈塔大數(shù)據(jù)