Immediate Operand

Architecture

David Money Harris, Sarah L. Harris, in Digital Design and Computer Architecture (Second Edition), 2013

Constants/Immediates

Load word and store word, lw and sw, also illustrate the use of constants in MIPS instructions. These constants are called immediates, because their values are immediately available from the instruction and do not require a register or memory access. Add immediate, addi, is another common MIPS instruction that uses an immediate operand. addi adds the immediate specified in the instruction to a value in a register, as shown in Code Example 6.9.

Code Example 6.9

Immediate Operands

High-Level Code

a = a + 4;

b = a − 12;

MIPS Assembly Code

#   $s0 = a, $s1 = b

  addi $s0, $s0, 4   # a = a + 4

  addi $s1, $s0, −12   # b = a − 12

The immediate specified in an instruction is a 16-bit two's complement number in the range [−32,768, 32,767]. Subtraction is equivalent to adding a negative number, so, in the interest of simplicity, there is no subi instruction in the MIPS architecture.
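The range and sign-extension behavior described above can be sketched in Python. This is an illustration only, not code from the text; the function names are made up for the example.

```python
def fits_imm16(value):
    """True if a constant fits in a 16-bit two's complement immediate."""
    return -32768 <= value <= 32767

def sign_extend16(imm):
    """Sign-extend a 16-bit immediate to 32 bits, as the hardware does."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm

# Subtraction is addition of a negative immediate, so no subi is needed.
assert fits_imm16(-12) and not fits_imm16(40000)
assert sign_extend16(0xFFF4) == -12   # the -12 used by "addi $s1, $s0, -12"
```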

Recall that the add and sub instructions use three register operands. But the lw, sw, and addi instructions use two register operands and a constant. Because the instruction formats differ, lw and sw instructions violate design principle 1: simplicity favors regularity. However, this issue allows us to introduce the last design principle:

Design Principle 4: Good design demands good compromises.

A single instruction format would be simple but not flexible. The MIPS instruction set makes the compromise of supporting three instruction formats. One format, used for instructions such as add and sub, has three register operands. Another, used for instructions such as lw and addi, has two register operands and a 16-bit immediate. A third, to be discussed later, has a 26-bit immediate and no registers. The next section discusses the three MIPS instruction formats and shows how they are encoded into binary.


URL:

https://www.sciencedirect.com/science/article/pii/B9780123944245000069

Architecture

Sarah L. Harris, David Money Harris, in Digital Design and Computer Architecture, 2016

Constants/Immediates

In addition to register operations, ARM instructions can use constant or immediate operands. These constants are called immediates, because their values are immediately available from the instruction and do not require a register or memory access. Code Example 6.6 shows the ADD instruction adding an immediate to a register. In assembly code, the immediate is preceded by the # symbol and can be written in decimal or hexadecimal. Hexadecimal constants in ARM assembly language start with 0x, as they do in C. Immediates are unsigned 8- to 12-bit numbers with a peculiar encoding described in Section 6.4.

Code Example 6.6

Immediate Operands

High-Level Code

a = a + 4;

b = a − 12;

ARM Assembly Code

; R7 = a, R8 = b

  ADD R7, R7, #4   ; a = a + 4

  SUB R8, R7, #0xC   ; b = a − 12

The move instruction (MOV) is a useful way to initialize register values. Code Example 6.7 initializes the variables i and x to 0 and 4080, respectively. MOV can also take a register source operand. For example, MOV R1, R7 copies the contents of register R7 into R1.

Code Example 6.7

Initializing Values Using Immediates

High-Level Code

i = 0;

x = 4080;

ARM Assembly Code

; R4 = i, R5 = x

  MOV R4, #0   ; i = 0

  MOV R5, #0xFF0   ; x = 4080
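The "peculiar encoding" that the text defers to Section 6.4 can be sketched in Python, assuming the classic ARM data-processing scheme: an 8-bit value rotated right by an even amount. This is an illustration based on that assumption, not code from the text.

```python
def ror32(x, n):
    """Rotate a 32-bit value right by n bits."""
    n %= 32
    return ((x >> n) | (x << (32 - n))) & 0xFFFFFFFF

def arm_imm_fields(value):
    """Return (rotation_field, imm8) such that value == imm8 ror
    (2 * rotation_field), or None if the constant is not encodable."""
    for rot in range(0, 32, 2):
        imm8 = ror32(value, (32 - rot) % 32)  # rotate left by rot to undo
        if imm8 < 256:
            return rot // 2, imm8
    return None

assert arm_imm_fields(4) == (0, 4)            # small constants encode directly
assert arm_imm_fields(0xFF0) == (14, 0xFF)    # 0xFF rotated right by 28
```

A constant like 0x101 has set bits too far apart to fit any rotated 8-bit window, so it is not encodable this way and must be built by other means.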


URL:

https://www.sciencedirect.com/science/article/pii/B9780128000564000066

Architecture

Sarah L. Harris, David Harris, in Digital Design and Computer Architecture, 2022

Constants/Immediates

In addition to register operations, RISC-V instructions can use constant or immediate operands. These constants are called immediates because their values are immediately available from the instruction and do not require a register or memory access. Code Example 6.6 shows the add immediate instruction, addi, that adds an immediate to a register. In assembly code, the immediate can be written in decimal, hexadecimal, or binary. Hexadecimal constants in RISC-V assembly language start with 0x and binary constants start with 0b, as they do in C. Immediates are 12-bit two's complement numbers, so they are sign-extended to 32 bits. The addi instruction is a useful way to initialize register values with small constants. Code Example 6.7 initializes the variables i, x, and y to 0, 2032, and −78, respectively.

Code Example 6.6

Immediate Operands

High-Level Code

a = a + 4;

b = a − 12;

RISC-V Assembly Code

# s0 = a, s1 = b

  addi s0, s0, 4   # a = a + 4

  addi s1, s0, −12   # b = a − 12

Code Example 6.7

Initializing Values Using Immediates

High-Level Code

i = 0;

x = 2032;

y = −78;

RISC-V Assembly Code

# s4 = i, s5 = x, s6 = y

  addi s4, zero, 0   # i = 0

  addi s5, zero, 2032   # x = 2032

  addi s6, zero, −78   # y = −78

Immediates can be written in decimal, hexadecimal, or binary. For example, the following instructions all put the decimal value 109 into s5:

addi s5,x0,0b1101101

addi s5,x0,0x6D

addi s5,x0,109

To create larger constants, use a load upper immediate instruction (lui) followed by an add immediate instruction (addi), as shown in Code Example 6.8. The lui instruction loads a 20-bit immediate into the most significant 20 bits of the register and places zeros in the least significant bits.

Code Example 6.8

32-Bit Constant Example

High-Level Code

int a = 0xABCDE123;

RISC-V Assembly Code

lui   s2, 0xABCDE   # s2 = 0xABCDE000

addi s2, s2, 0x123   # s2 = 0xABCDE123

When creating large immediates, if the 12-bit immediate in addi is negative (i.e., bit 11 is 1), the upper immediate in the lui must be incremented by one. Recall that addi sign-extends the 12-bit immediate, so a negative immediate will have all 1's in its upper 20 bits. Because all 1's is −1 in two's complement, adding all 1's to the upper immediate results in subtracting one from the upper immediate. Code Example 6.9 shows such a case, where the desired immediate is 0xFEEDA987. lui s2, 0xFEEDB puts 0xFEEDB000 into s2. The desired 20-bit upper immediate, 0xFEEDA, is incremented by 1. 0x987 is the 12-bit representation of −1657, so addi s2, s2, −1657 adds s2 and the sign-extended 12-bit immediate (0xFEEDB000 + 0xFFFFF987 = 0xFEEDA987) and places the result in s2, as desired.

Code Example 6.9

32-Bit Constant with a One in Bit 11

High-Level Code

int a = 0xFEEDA987;

RISC-V Assembly Code

lui   s2, 0xFEEDB   # s2 = 0xFEEDB000

addi s2, s2, −1657   # s2 = 0xFEEDA987
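The upper/lower split performed in Code Examples 6.8 and 6.9, including the bit-11 correction, can be sketched in Python. This is an illustration, not code from the text; the function name is made up for the example.

```python
def lui_addi_parts(value):
    """Split a 32-bit constant into (upper20, signed low12) such that
    (upper20 << 12) + sign_extend(low12) == value (mod 2**32)."""
    value &= 0xFFFFFFFF
    low = value & 0xFFF
    if low >= 0x800:                           # bit 11 set: addi sign-extends,
        low -= 0x1000                          # so use the negative low part
        upper = ((value >> 12) + 1) & 0xFFFFF  # and bump the upper immediate
    else:
        upper = (value >> 12) & 0xFFFFF
    return upper, low

assert lui_addi_parts(0xABCDE123) == (0xABCDE, 0x123)    # Code Example 6.8
assert lui_addi_parts(0xFEEDA987) == (0xFEEDB, -1657)    # Code Example 6.9
```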

The int data type in C represents a signed number, that is, a two's complement integer. The C specification requires that int be at least 16 bits wide but does not require a particular size. Most modern compilers (including those for RV32I) use 32 bits, so an int represents a number in the range [−2³¹, 2³¹ − 1]. C also defines int32_t as a 32-bit two's complement integer, but this is more cumbersome to type.


URL:

https://www.sciencedirect.com/science/article/pii/B9780128200643000064

Embedded Processor Architecture

Peter Barry , Patrick Crowley , in Modern Embedded Computing, 2012

Immediate Operands

Some instructions use data encoded in the instruction itself as a source operand. These operands are called immediate operands. For example, the following instruction loads the EAX register with zero.

MOV   EAX, 00

The maximum value of an immediate operand varies among instructions, but it can never be greater than 2³². The maximum size of an immediate on RISC architectures is much lower; for example, on the ARM architecture the maximum size of an immediate is 12 bits, as the instruction size is fixed at 32 bits. The concept of a literal pool is commonly used on RISC processors to get around this limitation. In this case the 32-bit value to be stored into a register is a data value held as part of the code section (in an area set aside for literals, often at the end of the object file). The RISC instruction loads the register with a program counter-relative load operation to read the 32-bit data value into the register.
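A toy model of the literal-pool idea can make this concrete. The names and the 12-bit threshold are hypothetical, chosen only to mirror the ARM example above; real assemblers track pool placement and PC-relative offsets as well.

```python
# Literal pool model: a constant too wide for the instruction's immediate
# field is stored alongside the code and fetched with a PC-relative load.
literals = {}

def emit_load_constant(value):
    """Return a pseudo-instruction for loading value into a register."""
    if value < 2**12:
        return ("mov_imm", value)            # fits in the instruction itself
    slot = literals.setdefault(value, len(literals))
    return ("ldr_pc_relative", slot)         # read it from the literal pool

assert emit_load_constant(100) == ("mov_imm", 100)
assert emit_load_constant(0xDEADBEEF) == ("ldr_pc_relative", 0)
assert emit_load_constant(0xDEADBEEF) == ("ldr_pc_relative", 0)  # pool reused
```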


URL:

https://www.sciencedirect.com/science/article/pii/B9780123914903000059

PIC Microcontroller Systems

Martin P. Bates, in Programming 8-bit PIC Microcontrollers in C, 2008

Program Execution

The chip has 8 k (8192 × 14 bits) of flash ROM program memory, which has to be programmed via the serial programming pins PGM, PGC, and PGD. The fixed-length instructions contain both the operation code and operand (immediate data, register address, or jump address). The mid-range PIC has a limited number of instructions (35) and is therefore classified as a RISC (reduced instruction set computer) processor.

Looking at the internal architecture, we can identify the blocks involved in program execution. The program memory ROM contains the machine code, in locations numbered from 0000h to 1FFFh (8 k). The program counter holds the address of the current instruction and is incremented or modified after each step. On reset or power up, it is reset to zero and the first instruction at address 0000 is loaded into the instruction register, decoded, and executed. The program then proceeds in sequence, operating on the contents of the file registers (000–1FFh), executing data move instructions to transfer data between ports and file registers or arithmetic and logic instructions to process it. The CPU has one main working register (W), through which all the data must pass.

If a branch instruction (conditional jump) is decoded, a bit test is carried out; and if the result is true, the destination address included in the instruction is loaded into the program counter to force the jump. If the result is false, the execution sequence continues unchanged. In assembly language, when Call and Return are used to implement subroutines, a similar process occurs. The stack is used to store return addresses, so that the program can return automatically to the original program position. However, this mechanism is not used by the CCS C compiler, as it limits the number of levels of subroutine (or C functions) to eight, which is the depth of the stack. Instead, a simple GOTO instruction is used for function calls and returns, with the return address computed by the compiler.
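The fetch/decode/execute cycle described above can be sketched as a toy Python model. The mnemonics loosely echo PIC instructions (movlw, addlw, goto), but the model is purely illustrative, not from the text.

```python
def run(program, steps):
    """Toy fetch/decode loop: the program counter starts at 0 on reset and
    increments each step unless an instruction overwrites it (a taken jump).
    Returns the working register W after the given number of steps."""
    pc, w = 0, 0
    for _ in range(steps):
        op, arg = program[pc]   # fetch the current instruction
        pc += 1                 # default: proceed in sequence
        if op == "movlw":       # load literal into W
            w = arg
        elif op == "addlw":     # add literal to W (8-bit wraparound)
            w = (w + arg) & 0xFF
        elif op == "goto":      # jump: load destination into the PC
            pc = arg
    return w

assert run([("movlw", 5), ("addlw", 3)], steps=2) == 8
assert run([("goto", 2), ("movlw", 1), ("movlw", 9)], steps=2) == 9
```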

Read full chapter

URL:

https://www.sciencedirect.com/science/article/pii/B9780750689601000018

HPC Architecture 1

Thomas Sterling, ... Maciej Brodowicz, in High Performance Computing, 2018

2.7.1 Single-Instruction, Multiple Data Architecture

The SIMD array class of parallel computer architecture consists of a very large number of relatively simple PEs, each operating on its own data memory (Fig. 2.13). The PEs are all controlled by a shared sequencer or sequence controller that broadcasts instructions in order to all the PEs. At any point in time all the PEs are doing the same operation but on their respective dedicated memory blocks. An interconnection network provides data paths for concurrent transfers of data between PEs, also managed by the sequence controller. I/O channels provide high bandwidth (in many cases) to the system as a whole or directly to the PEs for rapid postsensor processing. SIMD array architectures have been employed as standalone systems or integrated with other computer systems as accelerators.

Figure 2.13. The SIMD array class of parallel computer architecture.

The PE of the SIMD array is highly replicated to deliver potentially dramatic performance gain through this level of parallelism. The canonical PE consists of key internal functional components, including the following.

Memory block—provides part of the system total memory which is directly accessible to the individual PE. The resulting system-wide memory bandwidth is very high, with each memory block read from and written to by its own PE.

ALU—performs operations on contents of data in local memory, perhaps via local registers, with additional immediate operand values within broadcast instructions from the sequence controller.

Local registers—hold current working data values for operations performed by the PE. For load/store architectures, registers are direct interfaces to the local memory block. Local registers may serve as intermediate buffers for nonlocal data transfers from the system-wide network and remote PEs as well as external I/O channels.

Sequencer controller—accepts the stream of instructions from the system instruction sequencer, decodes each instruction, and generates the necessary local PE control signals, possibly as a sequence of microoperations.

Instruction interface—a port to the broadcast network that distributes the instruction stream from the sequence controller.

Data interface—a port to the system data network for exchanging data among PE memory blocks.

External I/O interface—for those systems that associate individual PEs with system external I/O channels, the PE includes a direct interface to the dedicated port.

The SIMD array sequence controller determines the operations performed by the set of PEs. It also is responsible for some of the computational work itself. The sequence controller may take diverse forms and is itself a target for new designs even today. But in the most general sense, a set of features and subcomponents unify most variations.

As a first approximation, Amdahl's law may be used to estimate the performance gain of a classical SIMD array computer. Assume that in a given instruction cycle either all the array processor cores, p_n, perform their respective operations simultaneously or only the control sequencer performs a serial operation with the array processor cores idle; also assume that the fraction of cycles, f, can take advantage of the array processor cores. Then using Amdahl's law (see Section 2.7.2) the speedup, S, can be determined as:

(2.11) S = 1 / (1 − f + f/p_n)
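Equation 2.11 is easy to evaluate directly; the sketch below (an illustration, with a made-up function name) plugs in a sample fraction and array size.

```python
def simd_speedup(f, p_n):
    """Amdahl's law speedup for a SIMD array: fraction f of cycles uses
    all p_n processing elements; the rest runs serially on the sequencer."""
    return 1.0 / ((1.0 - f) + f / p_n)

# Even with 90% parallel cycles, a 1024-PE array speeds up by only ~10x:
# the serial 10% of cycles dominates.
assert round(simd_speedup(0.9, 1024), 1) == 9.9
```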


URL:

https://www.sciencedirect.com/science/article/pii/B9780124201583000022

MPUs for Medical Networks

Syed V. Ahamed, in Intelligent Networks, 2013

11.4.3 Object Processor Units

The architectural framework of typical object processor units (OPUs) is consistent with the typical representation of CPUs. Design of the object operation code (OOPC) plays an important role in the design of the OPU and the object-oriented machine. In an elementary sense, this role is comparable to the role of the 8-bit opcode in the design of the IAS machine during the 1944–1945 period. For this (IAS) machine, the opcode length was 8 bits in the 20-bit instructions, and the memory of 4096 40-bit words corresponds to an address space of 12 binary bits. The design experience of the game processors and the modern graphical processor units will serve as a platform for the design of the OPUs and hardware-based object machines.

The intermediate generations of machines (such as the IBM 7094 and 360 series) provide a rich array of guidelines to derive the instruction sets for the OPUs. If a set of object registers or an object cache can be envisioned in the OPU, then the instructions corresponding to register instructions (R-series), register-storage (RS-series), storage (SS), immediate operand (I-series), and I/O series instructions for the OPU can also be designed. The instruction set will need an expansion to suit the application. It is logical to foresee the need of control object memories to replace the control memories of the microprogrammable computers.

The instruction set of the OPU is derived from the most frequent object functions such as (1) single-object instructions, (2) multiobject instructions, (3) object to object memory instructions, (4) internal object–external object instructions, and (5) object relationship instructions. The separation of logical, numeric, seminumeric, alphanumeric, and convolution functions between objects will also be necessary. Hardware, firmware, or brute-force software (compiler ability) can accomplish these functions. The need for the next-generation object and knowledge machines (discussed in Section 11.5) should provide an economic incentive to develop these architectural improvements beyond the basic OPU configuration shown in Figure 11.2.

Figure 11.2. Schematic of a hardwired object processor unit (OPU). Processing n objects with m (maximum) attributes generates an n×m matrix. The common, interactive, and overlapping attributes are thus reconfigured to establish primary and secondary relationships between objects. DMA, direct memory access; IDBMS, intelligent data, object, and attribute base(s) management system(s); KB, knowledge base(s). Many variations can be derived.

The designs of the OPU can be as diversified as the designs of a CPU, with I/O device interfaces, different memory units, and direct memory access hardware units for high-speed data exchange between main memory units and large secondary memories. Over the decades, numerous CPU architectures (single bus, multibus, hardwired, micro- and nanoprogrammed, multicontrol memory-based systems) have come and gone.

Some microprogrammable and RISC architectures still exist. Efficient and optimal performance from the CPUs also needs combined SISD, SIMD, MISD, and MIMD (Stone, 1980) and/or pipeline architectures. Combined CPU designs can use different clusters of architecture for their subfunctions. Some formats (e.g., array processors, matrix manipulators) are in active use. Two concepts that have survived many generations of CPUs are (i) the algebra of functions (i.e., opcodes) that is well delineated, accepted, and documented and (ii) the operands that undergo dynamic changes as the opcode is executed in the CPU(s).

An architectural consonance exists between CPUs and OPUs. In pursuing the similarities, the five variations (SISD, SIMD, MISD, MIMD, and/or pipeline) of design established for CPUs can be mapped into five corresponding designs: single process single object (SPSO), single process multiple objects (SPMO), multiple process single object (MPSO), multiple process multiple objects (MPMO), and/or partial process pipeline, respectively (Ahamed, 2003).


URL:

https://www.sciencedirect.com/science/article/pii/B978012416630100011X

Demultiplexing

George Varghese , in Network Algorithmics, 2005

8.6 DYNAMIC PACKET FILTER: COMPILERS TO THE RESCUE

The Pathfinder story ends with an appeal to hardware to handle demultiplexing at high speeds. Since it is unlikely that most workstations and PCs today can afford dedicated demultiplexing hardware, it appears that implementors must choose between the flexibility afforded by early demultiplexing and the limited performance of a software classifier. Thus it is hardly surprising that high-performance TCP [CJRS89], active messages [vCGS92], and Remote Procedure Call (RPC) [TNML93] implementations use hand-crafted demultiplexing routines.

Dynamic packet filter [EK96] (DPF) attempts to have its cake (gain flexibility) and eat it (obtain performance) at the same time. DPF starts with the Pathfinder trie idea. However, it goes on to eliminate indirections and extra checks inherent in cell processing by recompiling the classifier into machine code each time a filter is added or deleted. In effect, DPF produces separate, optimized code for each cell in the trie, as opposed to generic, unoptimized code that can parse any cell in the trie.

DPF is based on dynamic code generation technology [Eng96], which allows code to be generated at run time instead of when the kernel is compiled. DPF is an application of Principle P2, shifting computation in time. Note that by run time we mean classifier update time and not packet processing time.

This is fortunate because it implies that DPF must be able to recompile code fast enough only so as not to slow down a classifier update. For instance, it may take milliseconds to set up a connection, which in turn requires adding a filter to identify the endpoint in the same time. By contrast, it can take a few microseconds to receive a minimum-size packet at gigabit rates. Despite this leeway, submillisecond compile times are still challenging.

To understand why using specialized code per cell is useful, it helps to understand two generic causes of cell-processing inefficiency in Pathfinder:

Interpretation Overhead: Pathfinder code is indeed compiled into machine instructions when kernel code is compiled. However, the code does, in some sense, "interpret" a generic Pathfinder cell. To see this, consider a generic Pathfinder cell C that specifies a 4-tuple: offset, length, mask, value. When a packet P arrives, idealized machine code to check whether the cell matches the packet is as follows:

LOAD R1, C(offset); (* load offset specified in cell into register R1 *)

LOAD R2, C(length); (* load length specified in cell into register R2 *)

LOAD R3, P(R1, R2); (* load packet field specified by offset into R3 *)

LOAD R1, C(mask); (* load mask specified in cell into register R1 *)

AND R3, R1; (* mask packet field as specified in cell *)

LOAD R2, C(value); (* load value specified in cell into register R2 *)

BNE R2, R3; (* branch if masked packet field is not equal to value *)

Notice the extra instructions and extra memory references in Lines 1, 2, 4, and 6 that are used to load parameters from a generic cell in order to be available for later comparison.

Safety-Checking Overhead: Because packet filters written by users cannot be trusted, all implementations must perform checks to guard against errors. For example, every reference to a packet field must be checked at run time to ensure that it stays within the current packet being demultiplexed. Similarly, references need to be checked in real time for memory alignment; on many machines, a memory reference that is not aligned to a multiple of a word size can cause a trap. After these additional checks, the code fragment shown earlier is more complicated and contains even more instructions.

By specializing code for each cell, DPF can eliminate these two sources of overhead by exploiting information known when the cell is added to the Pathfinder graph.

Exterminating Interpretation Overhead: Since DPF knows all the cell parameters when the cell is created, DPF can generate code in which the cell parameters are directly encoded into the machine code as immediate operands. For example, the earlier code fragment to parse a generic Pathfinder cell collapses to the more compact cell-specific code:

LOAD R3, P(offset, length); (* load packet field into R3 *)

AND R3, mask; (* mask packet field using mask in instruction *)

BNE R3, value; (* branch if field not equal to value *)

Observe that the extra instructions and (more importantly) extra memory references to load parameters have disappeared, because the parameters are directly placed as immediate operands within the instructions.
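The interpretation-versus-specialization distinction can be illustrated in Python, with a closure standing in for the machine code DPF would emit. The field layout (Ethertype at offset 12) and function names are chosen for the example and are not from the text.

```python
def make_cell_check(offset, length, mask, value):
    """Specialized check: the cell parameters are baked in when the filter
    is installed, mirroring how DPF emits them as immediate operands."""
    def check(packet):
        field = int.from_bytes(packet[offset:offset + length], "big")
        return (field & mask) == value
    return check

def interpret_cell(cell, packet):
    """Generic interpreter, by contrast: re-reads the cell tuple from
    memory on every packet, like the generic Pathfinder code."""
    offset, length, mask, value = cell
    field = int.from_bytes(packet[offset:offset + length], "big")
    return (field & mask) == value

# A filter matching the IPv4 Ethertype (0x0800) at offset 12 of an Ethernet
# frame, built once at "classifier update time."
is_ipv4 = make_cell_check(offset=12, length=2, mask=0xFFFF, value=0x0800)
frame = bytes(12) + b"\x08\x00" + bytes(20)
assert is_ipv4(frame)
assert interpret_cell((12, 2, 0xFFFF, 0x0800), frame)
```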

Mitigating Safety-Checking Overhead: Alignment checking can be reduced in the expected case (P11) by inferring at compile time that most references are word aligned. This can be done by examining the complete filter. If the initial reference is word aligned and the current reference (offset plus length of all previous headers) is a multiple of the word length, then the reference is word aligned. Real-time alignment checks need only be used when the compile-time inference fails, for example, when indirect loads are performed (e.g., for a variable-size IP header). Similarly, at compile time the largest offset used in any cell can be determined and a single check can be placed (before packet processing) to ensure that the largest offset is within the length of the current packet.

Once one is onto a good thing, it pays to push it for all it is worth. DPF goes on to exploit compile-time knowledge to perform further optimizations as follows. A first optimization is to combine small accesses to adjacent fields into a single large access. Other optimizations are explored in the exercises.

DPF has the following potential disadvantages that are made manageable through careful design.

Recompilation Time: Recall that when a filter is added to the Pathfinder trie (Figure 8.6), only cells that were not present in the original trie need to be created. DPF optimizes this expected case (P11) by caching the code for existing cells and copying this code directly (without recreating it from scratch) to the new classifier code block. New code must be emitted only for the newly created cells. Similarly, when a new value is added to a hash table (e.g., the new TCP port added in Figure 8.6), unless the hash function changes, the code is reused and only the hash table is updated.

Code Bloat: One of the standard advantages of interpretation is more compact code. Generating specialized code per cell appears to create excessive amounts of code, particularly for large numbers of filters. A large code footprint can, in turn, result in degraded instruction cache performance. However, a careful examination shows that the number of distinct code blocks generated by DPF is only proportional to the number of distinct header fields examined by all filters. This should scale much better than the number of filters. Consider, for example, 10,000 simultaneous TCP connections, for which DPF may emit only three specialized code blocks: one for the Ethernet header, one for the IP header, and one hash table for the TCP header.

The final performance numbers for DPF are impressive. DPF demultiplexes messages 13–26 times faster than Pathfinder on a comparable platform [EK96]. The time to add a filter, however, is only three times slower than Pathfinder. Dynamic code generation accounts for only 40% of this increased insertion overhead.

In any case, the larger insertion costs appear to be a reasonable price to pay for faster demultiplexing. Finally, DPF demultiplexing routines appear to rival or beat hand-crafted demultiplexing routines; for instance, a DPF routine to demultiplex IP packets takes 18 instructions, compared to an earlier value, reported in Clark [Cla85], of 57 instructions. While the two implementations were on different machines, the numbers provide some indication of DPF quality.

The final message of DPF is twofold. First, DPF indicates that one can obtain both performance and flexibility. Just as compiler-generated code is often faster than hand-crafted code, DPF code appears to make hand-crafted demultiplexing no longer necessary. Second, DPF indicates that hardware support for demultiplexing at line rates may not be necessary. In fact, it may be difficult to allow dynamic code generation on filter creation in a hardware implementation. Software demultiplexing allows cheaper workstations; it also allows demultiplexing code to benefit from processor speed improvements.

Technology Changes Can Invalidate Design Assumptions

There are several examples of innovations in architecture and operating systems that were discarded after initial use and then returned to be used again. While this may seem like the whims of fashion ("collars are frilled again in 1995") or reinventing the wheel ("there is nothing new under the sun"), it takes a careful understanding of current technology to know when to dust off an old idea, possibly even in a new guise.

Take, for example, the core of the telephone network used to send voice calls via analog signals. With the advent of fiber optics and the transistor, much of the core telephone network now transmits voice signals in digital formats using the T1 and SONET hierarchies. However, with the advent of wavelength-division multiplexing in optical fiber, there is at least some talk of returning to analog transmission.

Thus the good system designer must constantly monitor available technology to check whether the system design assumptions have been invalidated. The idea of using dynamic compilation was mentioned by the CSPF designers in Mogul et al. [MRA87] but was not considered further. The CSPF designers assumed that tailoring code to specific sets of filters (by recompiling the classifier code whenever a filter was added) was too "complicated."

Dynamic compilation at the time of the CSPF design was probably slow and also not portable across systems; the gains at that time would also have been marginal because of other bottlenecks. However, by the time DPF was being designed, a number of systems, including VCODE [Eng96], had designed fairly fast and portable dynamic compilation infrastructure. The other classifier implementations in DPF's lineage had also eliminated other bottlenecks, which allowed the benefits of dynamic compilation to stand out more clearly.


URL:

https://www.sciencedirect.com/science/article/pii/B9780120884773500102

Early Intel® Architecture

In Power and Performance, 2015

1.1.4 Machine Code Format

One of the more complex aspects of x86 is the encoding of instructions into machine codes, that is, the binary format expected by the processor for instructions. Typically, developers write assembly using the instruction mnemonics and let the assembler select the proper instruction format; however, that isn't always viable. An engineer might want to bypass the assembler and manually encode the desired instructions, in order to use a newer instruction on an older assembler, which doesn't support that instruction, or to precisely control the encoding utilized, in order to control code size.

8086 instructions, and their operands, are encoded into a variable length, ranging from 1 to 6 bytes. To accommodate this, the decoding unit parses the earlier bits in order to determine what bits to expect in the future, and how to interpret them. Utilizing a variable-length encoding format trades an increase in decoder complexity for improved code density. This is because very common instructions can be given short sequences, while less common and more complex instructions can be given longer sequences.

The first byte of the machine code represents the instruction's opcode. An opcode is simply a fixed number corresponding to a specific form of an instruction. Different forms of an instruction, such as one form that operates on a register operand and one form that operates on an immediate operand, may have different opcodes. This opcode forms the initial decoding state that determines the decoder's next actions. The opcode for a given instruction format can be found in Volume 2, the Instruction Set Reference, of the Intel SDM.

Some very common instructions, such as the stack-manipulating PUSH and POP instructions in their register form, or instructions that use implicit registers, can be encoded with only 1 byte. For instance, consider the PUSH instruction, which places the value located in the register operand on the top of the stack, and which has an opcode of 01010b. Note that this opcode is only 5 bits. The remaining three least significant bits are the encoding of the register operand. In the modern instruction reference, this instruction format, "PUSH r16," is expressed as "0x50 + rw" (Intel Corporation, 2013). The rw entry refers to a register code specifically designated for single-byte opcodes. Table 1.3 provides a list of these codes. For example, using this table and the reference above, the binary encoding for PUSH AX is 0x50, for PUSH BP is 0x55, and for PUSH DI is 0x57. As an aside, in later processor generations the 32- and 64-bit versions of the PUSH instruction, with a register operand, are also encoded as 1 byte.

Table 1.3. Register Codes for Single Byte Opcodes "+rw" (Intel Corporation, 2013)

rw Register
0 AX
1 CX
2 DX
3 BX
4 SP
5 BP
6 SI
7 DI
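The "0x50 + rw" scheme can be sketched in a few lines of Python, with the register codes transcribed from Table 1.3. This is an illustrative sketch of the encoding rule, not a real assembler, and the function name is my own:

```python
# Register codes for single-byte opcodes ("+rw"), from Table 1.3.
RW = {"AX": 0, "CX": 1, "DX": 2, "BX": 3,
      "SP": 4, "BP": 5, "SI": 6, "DI": 7}

def encode_push_r16(reg):
    """Encode "PUSH r16" as a single byte: the 5-bit opcode 01010
    in the high bits, the 3-bit register code in the low bits."""
    return bytes([0x50 + RW[reg]])

print(encode_push_r16("AX").hex())  # 50
print(encode_push_r16("BP").hex())  # 55
print(encode_push_r16("DI").hex())  # 57
```

The three printed bytes match the PUSH AX, PUSH BP, and PUSH DI encodings given above.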

If the format is longer than one byte, the second byte, referred to as the Mod R/M byte, describes the operands. This byte is composed of three different fields: MOD, bits 7 and 6; REG, bits 5 through 3; and R/M, bits 2 through 0.

The MOD field encodes whether one of the operands is a memory address, and if so, the size of the memory offset the decoder should expect. This memory offset, if present, immediately follows the Mod R/M byte. Table 1.4 lists the meanings of the MOD field.

Table 1.4. Values for the MOD Field in the Mod R/M Byte (Intel Corporation, 2013)

Value Memory Operand Offset Size
00 Yes 0
01 Yes 1 Byte
10 Yes 2 Bytes
11 No 0

The REG field encodes one of the register operands, or, in the case where there are no register operands, is combined with the opcode for a special instruction-specific meaning. Table 1.5 lists the various register encodings. Observe how the high and low byte accesses to the data group registers are encoded, with the byte access to the pointer/index classification of registers actually accessing the high byte of the data group registers.

Table 1.5. Register Encodings in Mod R/M Byte (Intel Corporation, 2013)

Value Register (16/8)
000 AX/AL
001 CX/CL
010 DX/DL
011 BX/BL
100 SP/AH
101 BP/CH
110 SI/DH
111 DI/BH

In the case where MOD = 3, that is, where there are no memory operands, the R/M field encodes the second register operand, using the encodings from Table 1.5. Otherwise, the R/M field specifies how the memory operand's address should be calculated.
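The field layout described above is easy to verify with a short decoding sketch in Python (the helper name and dictionary are my own, not from the SDM):

```python
def split_modrm(byte):
    """Split a Mod R/M byte into its three fields:
    MOD = bits 7-6, REG = bits 5-3, R/M = bits 2-0."""
    mod = (byte >> 6) & 0b11
    reg = (byte >> 3) & 0b111
    rm = byte & 0b111
    return mod, reg, rm

# Offset size implied by the MOD field (Table 1.4); MOD = 3 (11)
# means R/M names a register rather than a memory operand.
OFFSET_BYTES = {0b00: 0, 0b01: 1, 0b10: 2, 0b11: 0}

mod, reg, rm = split_modrm(0xFA)
print(mod, reg, rm)  # 3 7 2  -> register form, /7 extension, DX
```

Decoding 0xFA this way reproduces the fields of the CMP example discussed below: register form (MOD = 3), opcode extension 7, and register code 2 (DX).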

The 8086, and its other 16-bit successors, had some limitations on which registers and forms could be used for addressing. These restrictions were removed once the architecture expanded to 32 bits, so it doesn't make too much sense to document them here.

For an example of the REG field extending the opcode, consider the CMP instruction in the form that compares a 16-bit immediate against a 16-bit register. In the SDM, this form, "CMP r16, imm16," is described as "81 /7 iw" (Intel Corporation, 2013), which means an opcode byte of 0x81, then a Mod R/M byte with MOD = 11₂, REG = 7 = 111₂, and the R/M field containing the 16-bit register to test. The iw entry specifies that a 16-bit immediate value will follow the Mod R/M byte, providing the immediate to test the register against. Therefore, "CMP DX, 0xABCD" will be encoded as: 0x81, 0xFA, 0xCD, 0xAB. Notice that 0xABCD is stored byte-reversed because x86 is little-endian.
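The register form of this encoding can be sketched as follows; the register codes come from Table 1.5, and the function name is my own invention:

```python
import struct

REG16 = {"AX": 0, "CX": 1, "DX": 2, "BX": 3,
         "SP": 4, "BP": 5, "SI": 6, "DI": 7}  # from Table 1.5

def encode_cmp_r16_imm16(reg, imm):
    """Encode "CMP r16, imm16" ("81 /7 iw"): opcode 0x81, a Mod R/M
    byte with MOD = 11, REG = 7 (opcode extension), R/M = register
    code, then the 16-bit immediate in little-endian byte order."""
    modrm = (0b11 << 6) | (7 << 3) | REG16[reg]
    return bytes([0x81, modrm]) + struct.pack("<H", imm)

print(encode_cmp_r16_imm16("DX", 0xABCD).hex())  # 81facdab
```

The output bytes 0x81, 0xFA, 0xCD, 0xAB match the hand-worked encoding above, including the byte-reversed immediate.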

Consider another example, this time performing a CMP of a 16-bit immediate against a memory operand. For this example, the memory operand is encoded as an offset from the base pointer, BP + 8. The CMP encoding format is the same as before; the difference will be in the Mod R/M byte. The MOD field will be 01₂, although 10₂ could be used as well but would waste an extra byte. Similar to the last example, the REG field will be 7, 111₂. Finally, the R/M field will be 110₂. This leaves us with the first byte, the opcode 0x81, and the second byte, the Mod R/M byte 0x7E. Thus, "CMP [BP + 8], 0xABCD" will be encoded as 0x81, 0x7E, 0x08, 0xCD, 0xAB.
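The memory form differs only in the Mod R/M byte and the extra displacement byte, which a small Python sketch makes concrete (again an illustrative helper of my own, limited to the [BP + disp8] addressing form):

```python
import struct

def encode_cmp_mem_bp_disp8_imm16(disp, imm):
    """Encode "CMP word [BP + disp8], imm16": opcode 0x81, a Mod R/M
    byte with MOD = 01 (one-byte offset follows), REG = 7 (opcode
    extension), R/M = 110 ([BP + disp]), then the signed 8-bit
    displacement and the little-endian 16-bit immediate."""
    modrm = (0b01 << 6) | (7 << 3) | 0b110
    return bytes([0x81, modrm]) + struct.pack("<bH", disp, imm)

print(encode_cmp_mem_bp_disp8_imm16(8, 0xABCD).hex())  # 817e08cdab
```

The result, 0x81, 0x7E, 0x08, 0xCD, 0xAB, matches the hand-worked encoding above.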
