

Reduced Instruction Set Computing (RISC)

 無知就無畏 2015-02-04

Reduced instruction set computing, a design approach for computer CPUs, is commonly abbreviated RISC. Well-known RISC microprocessor families include AVR, PIC, ARM, DEC Alpha, PA-RISC, SPARC, MIPS, and the Power architecture.

Early on, this style of instruction set was characterized by a small number of instructions, a fixed standard length and short execution time for every instruction, and CPU implementation details that were visible to machine-level programs.

In later development, RISC and CISC actually learned from each other amid the debate: modern RISC instruction sets have grown to several hundred instructions, and execution times are no longer uniform. Even so, the fundamental principle of RISC design, optimizing for a pipelined processor, has not changed.


Design philosophy before RISC

In the early days of the computer industry, compiler technology did not yet exist. Programs were written in machine language or assembly language. To make programming easier, computer architects created ever more complex instructions that could express the constructs of high-level programming languages directly. The prevailing view at the time was that hardware was easier to design than a compiler, so the complexity went into the hardware.

Another factor that accelerated this complexity was the lack of large memories. In a small-memory environment, programs with very high information density were advantageous. When every byte of memory was precious, for example when an entire system had to fit in a few kilobytes, the industry moved toward highly encoded instructions, variable-length instructions, instructions that performed multiple operations, and instructions that combined data transfer with computation. At the time, packing instructions densely mattered far more than making them easy to decode.

Memory was not only small but also slow, since it was built with magnetic technology at the time. This was another reason to keep information density extremely high: dense packing reduced how often the slow resource had to be accessed.

There were two reasons CPUs had only a few registers:

  • Registers inside the CPU were far more expensive than external memory. With the integrated-circuit technology of the day, a large register set was simply a waste of chip or board area.
  • A large number of registers would have required a large number of instruction bits (consuming precious RAM) to specify them.

For these reasons, CPU designers tried to make each instruction do as much work as possible. This led to instructions that did everything in one step: read two numbers, add them, and store the result directly in memory. Other variants read two numbers from memory but kept the result in a register. Yet another read one number from memory and one from a register, then stored the result back to memory, and so on. This design philosophy eventually became known as the complex instruction set computer (CISC).
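The contrast above can be sketched in a toy simulation. This is an illustrative model, not any real instruction set: one CISC-style memory-to-memory add versus the equivalent RISC-style load/store sequence; the addresses, register names, and helper functions are all invented for the example.

```python
# Toy machine state: a few memory words and two registers.
memory = {0x10: 7, 0x14: 35, 0x18: 0}
registers = {"r1": 0, "r2": 0}

# CISC style: a single instruction reads both operands from memory,
# adds them, and writes the result straight back to memory.
def add_mem(dst, src1, src2):
    memory[dst] = memory[src1] + memory[src2]

add_mem(0x18, 0x10, 0x14)

# RISC (load-store) style: only loads and stores touch memory;
# the arithmetic itself works purely on registers.
def load(reg, addr):
    registers[reg] = memory[addr]

def add(dst, a, b):
    registers[dst] = registers[a] + registers[b]

def store(reg, addr):
    memory[addr] = registers[reg]

load("r1", 0x10)
load("r2", 0x14)
add("r1", "r1", "r2")
store("r1", 0x1C)

print(memory[0x18], memory[0x1C])  # both paths compute 42
```

Both sequences produce the same result; the difference is that the RISC version exposes each memory access as a separate, individually simple instruction.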

The goal at the time was to provide every addressing mode for every instruction, a property called orthogonality. This added some complexity to the CPU, but in theory every possible combination could be tuned individually, which was supposed to let programmers write faster code than they could with simple instructions alone.

These designs can be expressed as two ends of a spectrum, with the 6502 at one end and the VAX at the other. The $25, 1 MHz 6502 had only a single general-purpose register, but its extremely lean single-cycle memory interface let it match the bit-manipulation performance of much higher-clocked designs, such as a 4 MHz Zilog Z80 using the same slow (roughly 300 ns) memory chips. The VAX was a minicomputer whose initial implementation required three racks of equipment for a single CPU; it was notable for the amazing variety of memory access styles it supported, and for the fact that every one of them was available for every instruction.

RISC design philosophy

In the late 1970s, researchers at IBM (and similar projects elsewhere) showed that the majority of these orthogonal addressing modes were ignored by most programmers. This was a side effect of the growing use of compilers in place of assembly language. The compilers in use at the time had only a limited ability to take advantage of the features provided by CISC CPUs, largely because compilers were so difficult to write. The market was clearly moving toward even wider use of compilers, diluting the usefulness of these orthogonal modes even more.

Another discovery was that because these complex operations were rarely used, they tended in practice to be slower than a sequence of smaller operations doing the same thing. This seeming paradox was a side effect of how design time was spent: designers simply did not have time to tune every possible instruction, and instead tuned only the most heavily used ones. One famous example was the VAX's INDEX instruction, which ran slower than a loop implementing the same computation.

At about the same time CPUs started to run even faster than the memory they talked to. Even in the late 1970s it was apparent that this disparity was going to continue to grow for at least the next decade, by which time the CPU would be tens to hundreds of times faster than the memory. It became apparent that more registers (and later caches) would be needed to support these higher operating frequencies. These additional registers and cache memories would require sizeable chip or board areas that could be made available if the complexity of the CPU was reduced.

Yet another part of RISC design came from practical measurements on real-world programs. Andrew Tanenbaum summed up many of these, demonstrating that most processors were vastly overdesigned. For instance, he showed that 98% of all the constants in a program would fit in 13 bits, yet almost every CPU design dedicated some multiple of 8 bits to storing them, typically 8, 16 or 32, one entire word. Taking this fact into account suggests that a machine should allow for constants to be stored in unused bits of the instruction itself, decreasing the number of memory accesses. Instead of loading up numbers from memory or registers, they would be "right there" when the CPU needed them, and therefore much faster. However this required the operation itself to be very small, otherwise there would not be enough room left over in a 32-bit instruction to hold reasonably sized constants.
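The idea of carrying small constants inside the instruction word can be made concrete with a sketch. The field layout below is an assumption invented for illustration (it matches no real ISA): a 32-bit word holding a 6-bit opcode, two 5-bit register numbers, a 13-bit signed immediate, and 3 unused bits.

```python
IMM_BITS = 13  # per Tanenbaum's observation, enough for ~98% of constants

def encode(opcode, rd, rs, imm):
    # Pack the fields into one 32-bit word; the constant rides along
    # with the instruction instead of costing a separate memory access.
    assert -(1 << (IMM_BITS - 1)) <= imm < (1 << (IMM_BITS - 1)), \
        "constant too large for the immediate field"
    imm_u = imm & ((1 << IMM_BITS) - 1)          # two's-complement encode
    return (opcode << 26) | (rd << 21) | (rs << 16) | (imm_u << 3)

def decode_imm(word):
    imm_u = (word >> 3) & ((1 << IMM_BITS) - 1)
    if imm_u & (1 << (IMM_BITS - 1)):            # sign-extend
        imm_u -= 1 << IMM_BITS
    return imm_u

word = encode(opcode=0x08, rd=1, rs=2, imm=-100)
print(decode_imm(word))  # -100: the constant round-trips through the word
```

The trade-off described in the text is visible here: the immediate field can only be this wide because the opcode and register fields are small, which is exactly why the operations themselves must stay simple.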

Since real-world programs spent most of their time executing very simple operations, some researchers decided to focus on making those common operations as simple and as fast as possible. Since the clock rate of the CPU is limited by the time it takes to execute the slowest instruction, speeding up that instruction -- perhaps by reducing the number of addressing modes it supports -- also speeds up the execution of every other instruction. The goal of RISC was to make instructions so simple, each one could be executed in a single clock cycle [1]. The focus on "reduced instructions" led to the resulting machine being called a "reduced instruction set computer" (RISC).

Unfortunately, the term "reduced instruction set computer" is often misunderstood as meaning that there are fewer instructions in the instruction set of the processor. Instead, RISC designs often have huge command sets of their own. Inspired by the desire for simpler designs, some people have developed some interesting MISC and OISC machines such as Transport Triggered Architectures, while others have walked into a Turing tarpit.

The real difference between RISC and CISC is the philosophy of doing all computation in registers, with explicit load and store instructions moving data between registers and memory. To avoid the misunderstanding, many researchers prefer the term load-store.

Over time the older design technique became known as Complex Instruction Set Computer, or CISC, although this was largely to give them a different name for comparison purposes.

Code was implemented as a series of these simple instructions, instead of a single complex instruction that had the same result. This had the side effect of leaving more room in the instruction to carry data with it, meaning that there was less need to use registers or memory. At the same time the memory interface was considerably simpler, allowing it to be tuned.

RISC has its drawbacks, however. When a whole sequence of instructions is needed to accomplish even a very simple task, the total number of instructions fetched from memory grows, and programs become larger. At the time there was considerable debate about the merits and drawbacks of RISC.

Methods for increasing CPU performance

While the RISC philosophy was coming into its own, new ideas about how to dramatically increase performance of the CPUs were starting to develop.

In the early 1980s it was thought that existing design was reaching theoretical limits. Future improvements in speed would be primarily through improved semiconductor "process", that is, smaller features (transistors and wires) on the chip. The complexity of the chip would remain largely the same, but the smaller size would allow it to run at higher clock rates. A considerable amount of effort was put into designing chips for parallel computing, with built-in communications links. Instead of making faster chips, a large number of chips would be used, dividing up problems among them. However history has shown that the original fears were not valid, and there were a number of ideas that dramatically improved performance in the late 1980s.

One idea was to include a pipeline which would break down instructions into steps, and work on one step of several different instructions at the same time. A normal processor might read an instruction, decode it, fetch the memory the instruction asked for, perform the operation, and then write the results back out. The key to pipelining is the observation that the processor can start reading the next instruction as soon as it finishes reading the last, meaning that there are now two instructions being worked on (one is being read, the next is being decoded), and after another cycle there will be three. While no single instruction is completed any faster, the next instruction would complete right after the previous one. The result was a much more efficient utilization of processor resources.
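The payoff of pipelining can be put in numbers with a back-of-the-envelope model of the five steps named above (read, decode, fetch, execute, write back), assuming one stage per clock and no stalls; this idealization is for illustration only.

```python
STAGES = 5  # read, decode, memory fetch, execute, write back

def cycles_unpipelined(n_instructions):
    # Each instruction runs all five stages before the next one starts.
    return n_instructions * STAGES

def cycles_pipelined(n_instructions):
    # After the pipeline fills (STAGES cycles for the first instruction),
    # one instruction completes every subsequent cycle.
    return STAGES + (n_instructions - 1)

for n in (1, 10, 1000):
    print(n, cycles_unpipelined(n), cycles_pipelined(n))
```

For one instruction both models cost 5 cycles, matching the text's point that no single instruction completes faster; for long runs the pipelined throughput approaches one instruction per cycle.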

Yet another solution was to use several processing elements inside the processor and run them in parallel. Instead of working on one instruction to add two numbers, these superscalar processors would look at the next instruction in the pipeline and attempt to run it at the same time in an identical unit. However, this can be difficult to do, as many instructions in computing depend on the results of some other instruction.

Both of these techniques relied on increasing speed by adding complexity to the basic layout of the CPU, as opposed to the instructions running on them. With chip space being a finite quantity, in order to include these features something else would have to be removed to make room. RISC was tailor-made to take advantage of these techniques, because the core logic of a RISC CPU was considerably simpler than in CISC designs. Although the first RISC designs had marginal performance, they were able to quickly add these new design features and by the late 1980s they were significantly outperforming their CISC counterparts. In time this would be addressed as process improved to the point where all of this could be added to a CISC design and still fit on a single chip, but this took most of the late-80s and early 90s.

The long and short of it is that for any given level of general performance, a RISC chip will typically have many fewer transistors dedicated to the core logic. This allows the designers considerable flexibility; they can, for instance:

  • Increase the size of the register set
  • Increase internal parallelism
  • Increase the size of caches
  • Add other functionality, such as I/O and timers
  • Add vector (SIMD) processors, such as AltiVec or Streaming SIMD Extensions (SSE)
  • Build the chips on older fabrication lines, which would otherwise go unused
  • Add nothing at all, and target battery-constrained or size-limited applications

Features commonly found in RISC designs:

  • Uniform instruction encoding (for example, the opcode is always in the same bit positions in every instruction, and all instructions are the same length), allowing fast decoding;
  • General-purpose registers, all usable in any context, which also simplifies compiler design (though integer and floating-point registers are usually separated);
  • Simple addressing modes (complex addressing modes are replaced by sequences of simple arithmetic instructions);
  • Few data types supported in hardware (for example, some CISC machines have instructions for handling byte strings; such instructions are unlikely to appear on a RISC machine).
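The first feature above, uniform encoding, has a concrete consequence worth sketching: with fixed-width instructions the address of any instruction is a trivial computation, whereas with variable-length encodings every earlier instruction must be measured before the next one can even be located. The widths below are invented for the example.

```python
# RISC-style: every instruction is 4 bytes.
# CISC-style: instructions vary from 1 to 6 bytes (hypothetical widths).
variable_widths = [1, 3, 2, 6, 4]

def addr_fixed(n):
    # Address of the n-th instruction: O(1), trivially parallelizable.
    return 4 * n

def addr_variable(n, widths):
    # Must sum the lengths of all earlier instructions: a serial scan.
    return sum(widths[:n])

print(addr_fixed(3), addr_variable(3, variable_widths))  # 12 6
```

This is one reason fixed-length encodings decode quickly and pipeline well: the fetch unit can find instruction boundaries without decoding anything first, at the cost of the denser packing the CISC era prized.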

RISC designs also tend to feature a Harvard memory model, in which the instruction stream and the data stream are conceptually separate. This means that modifying the memory that holds code may have no effect on instructions the processor has already fetched (because the CPU has separate instruction and data caches), at least until a special synchronization instruction is issued. On the other hand, it allows the instruction cache and the data cache to be accessed simultaneously, which usually improves performance.

Many early RISC designs also shared a less fortunate trait, the branch delay slot: the instruction position immediately following a jump or branch. The instruction in that slot is executed whether or not the branch is taken (in other words, the effect of the branch is delayed). This keeps the CPU's arithmetic and logic units busy during the extra time a branch would otherwise take. Today the branch delay slot is considered an awkward side effect of particular RISC implementations, and modern RISC designs generally avoid it (as in the PowerPC, recent versions of SPARC, and MIPS).
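The delay-slot behavior can be demonstrated with a toy interpreter for a hypothetical two-instruction mini-ISA (invented for this sketch): the instruction after a taken branch still executes before control transfers.

```python
def run(program):
    acc, pc, trace = 0, 0, []
    while pc < len(program):
        op, arg = program[pc]
        trace.append(pc)
        if op == "add":
            acc += arg
            pc += 1
        elif op == "branch":            # delayed branch to absolute target
            delay_op, delay_arg = program[pc + 1]
            trace.append(pc + 1)
            if delay_op == "add":       # the delay-slot instruction runs...
                acc += delay_arg
            pc = arg                    # ...and only then does the jump land
        elif op == "halt":
            break
    return acc, trace

prog = [
    ("add", 1),      # 0
    ("branch", 4),   # 1: jump to 4, but index 2 sits in the delay slot
    ("add", 10),     # 2: executes anyway (delay slot)
    ("add", 100),    # 3: skipped by the branch
    ("halt", 0),     # 4
]
acc, trace = run(prog)
print(acc, trace)  # 11 [0, 1, 2, 4] -- the delay-slot add ran, index 3 did not
```

Compilers for such machines had to find a useful instruction to place in the slot, or fill it with a no-op, which is part of why the feature is now seen as an implementation detail leaking into the architecture.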

Reference

For example, Intel's Pentium series CPUs are complex instruction set (CISC) CPUs, while IBM's PowerPC 970 (used in the Apple Mac G5) is a reduced instruction set (RISC) CPU.
