Intel Core 2 Duo T5850
Contents
1. EXTRACTPS: Extract Packed Single Precision Floating-Point Value
   Opcode | Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description
   66 0F 3A 17 /r ib | EXTRACTPS reg/m32, xmm2, imm8 | A | Valid | Valid | Extract a single-precision floating-point value from xmm2 at the source offset specified by imm8 and store the result to reg or m32. The upper 32 bits of r64 are zeroed if reg is r64.
   Instruction Operand Encoding
   Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
   A | ModRM:r/m (w) | ModRM:reg (r) | imm8 | NA
   FSAVE/FNSAVE: Store x87 FPU State
   IA-32 Architecture Compatibility
   For Intel math coprocessors and FPUs prior to the Intel Pentium processor, an FWAIT instruction should be executed before attempting to read from the memory image stored with a prior FSAVE/FNSAVE instruction. This FWAIT instruction helps ensure that the storage operation has been completed. When operating a Pentium or Intel486 processor in MS-DOS compatibility mode, it is possible (under unusual circumstances) for an FNSAVE instruction to be interrupted prior to being executed to handle a pending FPU exception. See the section titled "No-Wait FPU Instructions Can Get FPU Interrupt in Window" in Appendix D of the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 1, for a description of these circumstances. An FNSAVE instruction cannot be interrupted in this way on a Pentium 4, Intel Xeon, or P6 family
2. 3 | Operand 4 | NA | NA | NA | NA | imm8/16/32 | NA
   Intel 64 and IA-32 Architectures Software Developer's Manual Documentation Changes
   IN: Input from Port
   Opcode | Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description
   E4 ib | IN AL, imm8 | A | Valid | Valid | Input byte from imm8 I/O port address into AL.
   E5 ib | IN AX, imm8 | A | Valid | Valid | Input word from imm8 I/O port address into AX.
   E5 ib | IN EAX, imm8 | A | Valid | Valid | Input dword from imm8 I/O port address into EAX.
   EC | IN AL, DX | B | Valid | Valid | Input byte from I/O port in DX into AL.
   ED | IN AX, DX | B | Valid | Valid | Input word from I/O port in DX into AX.
   ED | IN EAX, DX | B | Valid | Valid | Input doubleword from I/O port in DX into EAX.
   Instruction Operand Encoding
   Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
   A | imm8 | NA | NA | NA
   B | NA | NA | NA | NA
   INC: Increment by 1
   Opcode | Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description
   FE /0 | INC r/m8 | A | Valid | Valid | Increment r/m byte by 1.
   REX + FE /0 | INC r/m8* | A | Valid | N.E. | Increment r/m byte by 1.
   FF /0 | INC r/m16 | A | Valid | Valid | Increment r/m word by 1.
   FF /0 | INC r/m32 | A | Valid | Valid | Increment r/m doubleword by 1.
   REX.W + FF /0 | INC r/m64 | A | Valid | N.E. | Increment r/m quadword by 1.
   40+ rw | INC r16 | B | N.E. | Valid | Increment word register by 1.
   40+ rd | INC r32 | B | N.E. | Valid | Increment doubleword register by 1.
   NOTES: * In 64-bit mode, r/m8 can not be encoded to acc
3. Opcode | Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description
   F2 0F C2 /r ib | CMPSD xmm1, xmm2/m64, imm8 | A | Valid | Valid | Compare low double-precision floating-point value in xmm2/m64 and xmm1 using imm8 as comparison predicate.
   Instruction Operand Encoding
   Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
   A | ModRM:reg (r, w) | ModRM:r/m (r) | imm8 | NA
   CMPSS: Compare Scalar Single-Precision Floating-Point Values
   Opcode | Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description
   F3 0F C2 /r ib | CMPSS xmm1, xmm2/m32, imm8 | A | Valid | Valid | Compare low single-precision floating-point value in xmm2/m32 and xmm1 using imm8 as comparison predicate.
   Instruction Operand Encoding
   Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
   A | ModRM:reg (r, w) | ModRM:r/m (r) | imm8 | NA
   Description
   Compares the low single-precision floating-point values in the source operand (second operand) and the destination operand (first operand) and returns the results of the comparison to the destination operand. The comparison predicate operand (third operand) specifies the type of comparison performed. The comparison result is a doubleword mask of all 1s (comparison true) or all 0s (comparison false). The source operand can be an XMM register or a 32-bit memory location. The destination operand
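The all-1s or all-0s result described for CMPSS can be illustrated with a small model. This is only a sketch: the function name `cmpss_mask` is hypothetical, and the predicate encodings in `PREDICATES` (0 = EQ, 1 = LT, 2 = LE, 3 = UNORD) are quoted from memory of the instruction reference rather than from this excerpt, so treat them as assumptions. Only four of the eight predicates are modelled.

```python
import math

# Assumed imm8[2:0] encodings; they are not given in the excerpt above.
PREDICATES = {
    0: lambda a, b: a == b,                          # EQ
    1: lambda a, b: a < b,                           # LT
    2: lambda a, b: a <= b,                          # LE
    3: lambda a, b: math.isnan(a) or math.isnan(b),  # UNORD
}

def cmpss_mask(dest_low, src_low, imm8):
    """Return the doubleword mask that would replace the low element of the destination."""
    predicate = PREDICATES[imm8 & 0x7]
    # True yields a mask of all 1s, false a mask of all 0s.
    return 0xFFFFFFFF if predicate(dest_low, src_low) else 0x00000000
```

A mask rather than a flag is convenient because it can be fed directly into bitwise AND/ANDN/OR instructions to select between values without branching.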
4. Bits 51:12 are from the PDPTE. Bits 11:3 are bits 29:21 of the linear address. Bits 2:0 are all 0.
   If a paging-structure entry's P flag (bit 0) is 0 or if the entry sets any reserved bit, the entry is used neither to reference another paging-structure entry nor to map a page. A reference using a linear address whose translation would use such a paging-structure entry causes a page-fault exception (see Section 4.7).
   The following bits are reserved with IA-32e paging:
   If the P flag of a paging-structure entry is 1, bits 51:MAXPHYADDR are reserved.
   If the P flag of a PML4E is 1, the PS flag is reserved.
   If 1-GByte pages are not supported and the P flag of a PDPTE is 1, the PS flag is reserved. (1)
   If the P flag and the PS flag of a PDPTE are both 1, bits 29:13 are reserved.
   If the P flag and the PS flag of a PDE are both 1, bits 20:13 are reserved.
   If IA32_EFER.NXE = 0 and the P flag of a paging-structure entry is 1, the XD flag (bit 63) is reserved.
   (1) See Section 4.1.4 for how to determine whether 1-GByte pages are supported.
   [Figure 4-11. Formats of CR3 and Paging-Structure Entries with IA-32e Paging]
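The reserved-bit rules listed above can be sketched as a single check. This is an illustrative model, not a definitive implementation: the function and parameter names are hypothetical, and the position of the PS flag (bit 7) is an assumption, since the excerpt gives bit numbers only for P (bit 0) and XD (bit 63).

```python
P_FLAG = 1 << 0    # stated in the excerpt
PS_FLAG = 1 << 7   # assumption: PS bit position is not stated in the excerpt
XD_FLAG = 1 << 63  # stated in the excerpt

def bit_range(high, low):
    """Mask with bits high..low (inclusive) set."""
    return ((1 << (high - low + 1)) - 1) << low

def sets_reserved_bit(entry, kind, maxphyaddr, nxe, gbyte_pages):
    """kind is 'PML4E', 'PDPTE', 'PDE' or 'PTE'; returns True if a reserved bit is set."""
    if not entry & P_FLAG:
        return False  # every rule above is conditioned on P = 1
    reserved = 0
    if maxphyaddr <= 51:
        reserved |= bit_range(51, maxphyaddr)
    if kind == "PML4E":
        reserved |= PS_FLAG
    if kind == "PDPTE":
        if not gbyte_pages:
            reserved |= PS_FLAG
        if entry & PS_FLAG:
            reserved |= bit_range(29, 13)
    if kind == "PDE" and entry & PS_FLAG:
        reserved |= bit_range(20, 13)
    if not nxe:
        reserved |= XD_FLAG
    return bool(entry & reserved)
```

An entry for which this returns True would, per the text, neither reference another paging structure nor map a page, and a translation that uses it would cause a page fault.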
5. DEC: Decrement by 1
   Opcode | Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description
   FE /1 | DEC r/m8 | A | Valid | Valid | Decrement r/m8 by 1.
   REX + FE /1 | DEC r/m8* | A | Valid | N.E. | Decrement r/m8 by 1.
   FF /1 | DEC r/m16 | A | Valid | Valid | Decrement r/m16 by 1.
   FF /1 | DEC r/m32 | A | Valid | Valid | Decrement r/m32 by 1.
   REX.W + FF /1 | DEC r/m64 | A | Valid | N.E. | Decrement r/m64 by 1.
   48+ rw | DEC r16 | B | N.E. | Valid | Decrement r16 by 1.
   48+ rd | DEC r32 | B | N.E. | Valid | Decrement r32 by 1.
   NOTES: * In 64-bit mode, r/m8 can not be encoded to access the following byte registers if a REX prefix is used: AH, BH, CH, DH.
   Instruction Operand Encoding
   Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
   A | ModRM:r/m (r, w) | NA | NA | NA
   B | reg (r, w) | NA | NA | NA
   DIV: Unsigned Divide
   Opcode | Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description
   F6 /6 | DIV r/m8 | A | Valid | Valid | Unsigned divide AX by r/m8, with result stored in AL ← Quotient, AH ← Remainder.
   REX + F6 /6 | DIV r/m8* | A | Valid | N.E. | Unsigned divide AX by r/m8, with result stored in AL ← Quotient, AH ← Remainder.
   F7 /6 | DIV r/m16 | A | Valid | Valid | Unsigned divide DX:AX by r/m16, with result stored in AX ← Quotient, DX ← Remainder.
   F7 /6 | DIV r/m32 | A | Valid | Valid | Unsigned divide EDX:EAX by r/m32, with result stored in EAX ← Quotient, EDX ← Remainder.
   REX.W + F7 /6 | DIV r/m
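The quotient and remainder placement in the DIV rows can be sketched as follows. The helper name `div_unsigned` is hypothetical, and the two error cases model the divide-error (#DE) conditions, which are an assumption here because the excerpt does not reach the exceptions section.

```python
def div_unsigned(dividend, divisor, width):
    """Model DIV for an operand of `width` bits.

    `dividend` is the double-width value (AX, DX:AX or EDX:EAX); the result
    is (quotient, remainder), each of which must fit in `width` bits.
    """
    if divisor == 0:
        raise ZeroDivisionError("#DE: divide by zero")  # assumed behaviour
    quotient, remainder = divmod(dividend, divisor)
    if quotient >= 1 << width:
        raise OverflowError("#DE: quotient does not fit")  # assumed behaviour
    return quotient, remainder
```

For example, `div_unsigned(0x0123, 0x10, 8)` corresponds to `DIV r/m8` with AX = 0123H and a divisor of 10H, leaving 12H in AL and 03H in AH.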
6. IA32_MISC_ENABLES[12] to detect whether the performance monitoring facility and PEBS functionality are supported in the processor. The MSR IA32_PEBS_ENABLE provides 4 bits that software must use to enable which IA32_PMCx overflow condition will cause the PEBS record to be captured. Additionally, the PEBS record is expanded to allow latency information to be captured. The MSR IA32_PEBS_ENABLE provides 4 additional bits that software must use to enable latency data recording in the PEBS record upon the respective IA32_PMCx overflow condition. The layout of IA32_PEBS_ENABLE is shown in Figure 30-13.
   Programming PEBS Facility
   Only a subset of non-architectural performance events in the processor support PEBS. The subset of precise events are listed in Table 30-10. In addition to using IA32_PERFEVTSELx to specify event unit/mask settings and setting the EN_PMCx bit in the IA32_PEBS_ENABLE register for the respective counter, the software must also initialize the DS_BUFFER_MANAGEMENT_AREA data structure in memory to support capturing PEBS records for precise events.
   (1) Intel Xeon processor 5500 series and 3400 series are also based on Intel microarchitecture (Nehalem), so the performance monitoring facilities described in this section generally also apply.
   30.14.1 Overview of Performance Monit
7. Opcode | Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description
   REX.W AD | REP LODS RAX | A | Valid | N.E. | Load RCX quadwords from [RSI] to RAX.
   F3 AA | REP STOS m8 | A | Valid | Valid | Fill (E)CX bytes at ES:[(E)DI] with AL.
   F3 REX.W AA | REP STOS m8 | A | Valid | N.E. | Fill RCX bytes at [RDI] with AL.
   F3 AB | REP STOS m16 | A | Valid | Valid | Fill (E)CX words at ES:[(E)DI] with AX.
   F3 AB | REP STOS m32 | A | Valid | Valid | Fill (E)CX doublewords at ES:[(E)DI] with EAX.
   F3 REX.W AB | REP STOS m64 | A | Valid | N.E. | Fill RCX quadwords at [RDI] with RAX.
   F3 A6 | REPE CMPS m8, m8 | A | Valid | Valid | Find nonmatching bytes in ES:[(E)DI] and DS:[(E)SI].
   F3 REX.W A6 | REPE CMPS m8, m8 | A | Valid | N.E. | Find non-matching bytes in [RDI] and [RSI].
   F3 A7 | REPE CMPS m16, m16 | A | Valid | Valid | Find nonmatching words in ES:[(E)DI] and DS:[(E)SI].
   F3 A7 | REPE CMPS m32, m32 | A | Valid | Valid | Find nonmatching doublewords in ES:[(E)DI] and DS:[(E)SI].
   F3 REX.W A7 | REPE CMPS m64, m64 | A | Valid | N.E. | Find non-matching quadwords in [RDI] and [RSI].
   F3 AE | REPE SCAS m8 | A | Valid | Valid | Find non-AL byte starting at ES:[(E)DI].
   F3 REX.W AE | REPE SCAS m8 | A | Valid | N.E. | Find non-AL byte starting at [RDI].
   F3 AF | REPE SCAS m16 | A | Valid | Valid | Find non-AX word starting at ES:[(E)DI].
   F3 AF | REPE SCAS m32 | A | Valid | Valid | Find non-EAX doubleword starting at ES:[(E)DI].
   F3 REX.W AF | REPE SCAS m64 | A | Valid | N.E. | Find non-RAX quadword starting at [RDI].
   F2 A6 | CMPS m8 | Valid | V
8. values from xmm1 to xmm2/m128.
   Instruction Operand Encoding
   Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
   A | ModRM:reg (w) | ModRM:r/m (r) | NA | NA
   B | ModRM:r/m (w) | ModRM:reg (r) | NA | NA
   MOVBE: Move Data After Swapping Bytes
   Opcode | Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description
   0F 38 F0 /r | MOVBE r16, m16 | A | Valid | Valid | Reverse byte order in m16 and move to r16.
   0F 38 F0 /r | MOVBE r32, m32 | A | Valid | Valid | Reverse byte order in m32 and move to r32.
   REX.W + 0F 38 F0 /r | MOVBE r64, m64 | A | Valid | N.E. | Reverse byte order in m64 and move to r64.
   0F 38 F1 /r | MOVBE m16, r16 | B | Valid | Valid | Reverse byte order in r16 and move to m16.
   0F 38 F1 /r | MOVBE m32, r32 | B | Valid | Valid | Reverse byte order in r32 and move to m32.
   REX.W + 0F 38 F1 /r | MOVBE m64, r64 | B | Valid | N.E. | Reverse byte order in r64 and move to m64.
   Instruction Operand Encoding
   Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
   A | ModRM:reg (w) | ModRM:r/m (r) | NA | NA
   B | ModRM:r/m (w) | ModRM:reg (r) | NA | NA
   MOVD/MOVQ: Move Doubleword/Move Quadword
   Opcode | Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description
   0F 6E /r | MOVD mm, r/m32 | A | Valid | Valid | Move doubleword from r/m32 to mm.
   REX.W + 0F 6E /r | MOVQ mm, r/m64 | A | Valid | N.E. | Move quadword from r/m64 to mm.
   0F 7E /r | MOVD r/m32, mm | B | Valid | Valid | Move doubleword from mm to r/m32.
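The byte reversal that MOVBE performs on a 16-, 32- or 64-bit operand can be illustrated in a couple of lines. The helper name `movbe` is hypothetical; the sketch models only the data transformation, not the memory access.

```python
def movbe(value, width_bits):
    """Return `value` with its bytes in reverse order, for a 16/32/64-bit operand."""
    nbytes = width_bits // 8
    # Serialise as little-endian, then reinterpret the same bytes as big-endian.
    return int.from_bytes(value.to_bytes(nbytes, "little"), "big")
```

Applying the function twice returns the original value, which is why the same instruction serves both the load and the store direction.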
9. Opcode | Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description
   E8 cw | CALL rel16 | B | N.S. | Valid | Call near, relative, displacement relative to next instruction.
   E8 cd | CALL rel32 | B | Valid | Valid | Call near, relative, displacement relative to next instruction. 32-bit displacement sign extended to 64 bits in 64-bit mode.
   FF /2 | CALL r/m16 | B | N.E. | Valid | Call near, absolute indirect, address given in r/m16.
   FF /2 | CALL r/m32 | B | N.E. | Valid | Call near, absolute indirect, address given in r/m32.
   FF /2 | CALL r/m64 | B | Valid | N.E. | Call near, absolute indirect, address given in r/m64.
   9A cd | CALL ptr16:16 | A | Invalid | Valid | Call far, absolute, address given in operand.
   9A cp | CALL ptr16:32 | A | Invalid | Valid | Call far, absolute, address given in operand.
   FF /3 | CALL m16:16 | B | Valid | Valid | Call far, absolute indirect, address given in m16:16. In 32-bit mode: if selector points to a gate, then RIP = 32-bit zero extended displacement taken from gate; else RIP = zero extended 16-bit offset from far pointer referenced in the instruction.
   REX.W + FF /3 | CALL m16:64 | B | Valid | N.E. |
   FF /3 | CALL m16:32 | B | Valid | Valid | In 64-bit mode: If selector points to a gate, then RIP = 64-bit displacement taken from g
10. In Example 8-11, processor 0 does one round of (128 iterations) doubleword string store operation via rep:stosd, writing the value 1 (value in EAX) into a block of 512 bytes from location _x (kept in ES:EDI) in ascending order. Since each operation stores a doubleword (4 bytes), the operation is repeated 128 times (value in ECX). The block of memory initially contained 0. Processor 1 is reading two memory locations that are part of the memory block being updated by processor 0, i.e., reading locations in the range _x to (_x+511).
    Example 8-11. Stores Within a String Operation May be Reordered
    Processor 0: rep:stosd [_x]
    Processor 1: mov r1, [_z]
                 mov r2, [_y]
    Initially on processor 0: EAX = 1, ECX = 128, ES:EDI = _x
    Initially [_x] to 511[_x] = 0, _x <= _y < _z < _x+512
    r1 = 1 and r2 = 0 is allowed
    It is possible for processor 1 to perceive that the repeated string stores in processor 0 are happening out of order. Assume that fast string operations are enabled on processor 0.
    8.2.5 Strengthening or Weakening the Memory-Ordering Model
    The Intel 64 and IA-32 architectures provide several mechanisms for strengthening or weakening the memory-ordering model to handle special programming situations. These mechanisms include: The I/O instructions, locking instructions, the LOCK prefix, and serializing instructions force stronger ordering on the processor. The SFENCE instruction (introduced to the IA-32 architecture in
11. Serializes all load (read) operations that occurred prior to the LFENCE instruction in the program instruction stream, but does not affect store operations. (1)
    MFENCE: Serializes all store and load operations that occurred prior to the MFENCE instruction in the program instruction stream.
    Note that the SFENCE, LFENCE, and MFENCE instructions provide a more efficient method of controlling memory ordering than the CPUID instruction.
    The MTRRs were introduced in the P6 family processors to define the cache characteristics for specified areas of physical memory. The following are two examples of how memory types set up with MTRRs can be used to strengthen or weaken memory ordering for the Pentium 4, Intel Xeon, and P6 family processors: The strong uncached (UC) memory type forces a strong-ordering model on memory accesses. Here, all reads and writes to the UC memory region appear on the bus and out-of-order or speculative accesses are not performed. This memory type can be
    (1) Specifically, LFENCE does not execute until all prior instructions have completed locally, and no later instruction begins execution until LFENCE completes. As a result, an instruction that loads from memory and that precedes an LFENCE receives data from memory prior to completion of the LFENCE. An LFENCE that follows an instruction that stores to memory might complete before the data being stored have become globally visible. Instructions following an LFENCE may be
12. Instruction Operand Encoding
    Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
    A | AL/AX/EAX/RAX | imm8/16/32 | NA | NA
    B | ModRM:r/m (w) | imm8/16/32 | NA | NA
    C | ModRM:r/m (w) | ModRM:reg (r) | NA | NA
    D | ModRM:reg (r, w) | ModRM:r/m (r) | NA | NA
    XORPD: Bitwise Logical XOR for Double-Precision Floating-Point Values
    Opcode | Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description
    66 0F 57 /r | XORPD xmm1, xmm2/m128 | A | Valid | Valid | Bitwise exclusive-OR of xmm2/m128 and xmm1.
    Instruction Operand Encoding
    Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
    A | ModRM:reg (w) | ModRM:r/m (r) | NA | NA
    XORPS: Bitwise Logical XOR for Single-Precision Floating-Point Values
    Opcode | Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description
    0F 57 /r | XORPS xmm1, xmm2/m128 | A | Valid | Valid | Bitwise exclusive-OR of xmm2/m128 and xmm1.
    Instruction Operand Encoding
    Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
    A | ModRM:reg (r, w) | ModRM:r/m (r) | NA | NA
    XRSTOR: Restore Processor Extended States
    Opcode | Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description
    0F AE /5 | XRSTOR mem | A | Valid | Valid | Restore processor extended states from memory. The states are specified by EDX:EAX.
    Instruction Operand Encoding
    Op/En | Operand 1 | Operand 2
13. Instruction Operand Encoding
    Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
    A | ModRM:reg (r, w) | ModRM:r/m (r) | NA | NA
    IDIV: Signed Divide
    Opcode | Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description
    F6 /7 | IDIV r/m8 | A | Valid | Valid | Signed divide AX by r/m8, with result stored in AL ← Quotient, AH ← Remainder.
    REX + F6 /7 | IDIV r/m8* | A | Valid | N.E. | Signed divide AX by r/m8, with result stored in AL ← Quotient, AH ← Remainder.
    F7 /7 | IDIV r/m16 | A | Valid | Valid | Signed divide DX:AX by r/m16, with result stored in AX ← Quotient, DX ← Remainder.
    F7 /7 | IDIV r/m32 | A | Valid | Valid | Signed divide EDX:EAX by r/m32, with result stored in EAX ← Quotient, EDX ← Remainder.
    REX.W + F7 /7 | IDIV r/m64 | A | Valid | N.E. | Signed divide RDX:RAX by r/m64, with result stored in RAX ← Quotient, RDX ← Remainder.
    NOTES: * In 64-bit mode, r/m8 can not be encoded to access the following byte registers if a REX prefix is used: AH, BH, CH, DH.
    Instruction Operand Encoding
    Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
    A | ModRM:r/m (r) | NA | NA | NA
    IMUL: Signed Multiply
    Opcode | Instruction
    F6 /5 | IMUL r/m8
    F7 /5 | IMUL r/m16
    F7 /5 | IMUL r/m32
    REX.W + F7 /5 | IMUL r/m64
    0F AF /r | IMUL r16, r/m16
    0F AF /r | IMUL r32, r/m32
    REX.W + 0F AF | IMUL r64
14. any serializing instructions (such as the CPUID instruction). MFENCE does not serialize the instruction stream.
    Weakly ordered memory types can be used to achieve higher processor performance through such techniques as out-of-order issue, speculative reads, write-combining, and write-collapsing. The degree to which a consumer of data recognizes or knows that the data is weakly ordered varies among applications and may be unknown to the producer of this data. The MFENCE instruction provides a performance-efficient way of ensuring load and store ordering between routines that produce weakly-ordered results and routines that consume that data.
    Processors are free to fetch and cache data speculatively from regions of system memory that use the WB, WC, and WT memory types. This speculative fetching can occur at any time and is not tied to instruction execution. Thus, it is not ordered with respect to executions of the MFENCE instruction; data can be brought into the caches speculatively just before, during, or after the execution of an MFENCE instruction.
15. LEA: Load Effective Address
    Opcode | Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description
    8D /r | LEA r16, m | A | Valid | Valid | Store effective address for m in register r16.
    8D /r | LEA r32, m | A | Valid | Valid | Store effective address for m in register r32.
    REX.W + 8D /r | LEA r64, m | A | Valid | N.E. | Store effective address for m in register r64.
    Instruction Operand Encoding
    Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
    A | ModRM:reg (w) | ModRM:r/m (r) | NA | NA
    LEAVE: High Level Procedure Exit
    Opcode | Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description
    C9 | LEAVE | A | Valid | Valid | Set SP to BP, then pop BP.
    C9 | LEAVE | A | N.E. | Valid | Set ESP to EBP, then pop EBP.
    C9 | LEAVE | A | Valid | N.E. | Set RSP to RBP, then pop RBP.
    Instruction Operand Encoding
    Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
    A | NA | NA | NA | NA
    LFENCE: Load Fence
    Opcode | Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description
    0F AE /5 | LFENCE | A | Valid | Valid | Serializes load operations.
    Instruction Operand Encoding
    Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
    A | NA | NA | NA | NA
    LGDT/LIDT: Load Global/Interrupt Descriptor Table Register
    Opcode | Instruction | Op/En | 64-Bit Mode | Compat/Le
16. Non-Architectural Performance Events In Next Generation Processor Core (Codenamed Westmere) (Continued)
    Event Num. | Umask Value | Event Mask Mnemonic | Description | Comment
    B1H | 3FH | UOPS_EXECUTED.CORE_ACTIVE_CYCLES | Counts number of cycles there are one or more uops being executed on any ports. This is a core count only and can not be collected per thread. |
    B7H | 01H | OFFCORE_RESPONSE_0 | See Section 30.6.1.3, "Off-core Response Performance Monitoring in the Processor Core". | Requires programming MSR 01A6H
    ECH | 01H | THREAD_ACTIVE | Counts cycles threads are active. |
    Non-architectural performance monitoring events of the uncore sub-system for processors with CPUID signature of DisplayFamily_DisplayModel 06_25H, 06_2CH, and 06_1FH support performance events listed in Table 5.
    Table 5. Non-Architectural Performance Events In the Processor Uncore for Next Generation Intel Processor (Codenamed Westmere)
    Event Num. | Umask Value | Event Mask Mnemonic | Description | Comment
    02H | 01H | UNC_GQ_OCCUPANCY.READ_TRACKER | Increments the number of queue entries (code read, data read, and RFOs) in the read tracker. The GQ read tracker allocate to deallocate occupancy count is divided by the count to obtain the average read tracker latency. |
    0CH | 01H | UNC_GQ_SNOOP.GOTO_S | Counts the number of remote snoops that have requested a cache line be set to the S state. |
17.         PMC(ECX)[39:32];
        ELSE (* ECX is not 0 or 1 or CR4.PCE is 0 and CPL is 1, 2, or 3 and CR0.PE is 1 *)
            #GP(0);
    FI;
    (* Processors with CPUID family 15 *)
    IF (CR4.PCE = 1) or (CPL = 0) or (CR0.PE = 0)
        THEN IF (ECX[30:0] = 0:17)
            THEN IF ECX[31] = 0
                THEN
                    EAX ← PMC(ECX[30:0])[31:0]; (* 40-bit read *)
                    EDX ← PMC(ECX[30:0])[39:32];
                ELSE (* ECX[31] = 1 *)
                    EAX ← PMC(ECX[30:0])[31:0]; (* 32-bit read *)
                    EDX ← 0;
            FI;
        ELSE IF (* 64-bit Intel Xeon processor with L3 *)
            THEN IF (ECX[30:0] = 18:25)
                EAX ← PMC(ECX[30:0])[31:0]; (* 32-bit read *)
                EDX ← 0;
            FI;
        ELSE IF (* Intel Xeon processor 7100 series with L3 *)
            THEN IF (ECX[30:0] = 18:25)
                EAX ← PMC(ECX[30:0])[31:0]; (* 32-bit read *)
                EDX ← 0;
            FI;
        ELSE (* Invalid PMC index in ECX[30:0], see Table 4-5 *)
            #GP(0);
        FI;
    ELSE (* CR4.PCE = 0 and (CPL = 1, 2, or 3) and CR0.PE = 1 *)
        #GP(0);
    FI;
    RDTSC: Read Time-Stamp Counter
    Opcode | Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description
    0F 31 | RDTSC | A | Valid | Valid | Read time-stamp counter into EDX:EAX.
    Instruction Operand Encoding
    Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
    A | NA | NA | NA | NA
    Description
    Loads the current value of the processor's time-stamp counter (a 64-bit MSR) into the EDX:EAX registers. The EDX register is loaded with the high-order 32 bits of the
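Software that reads a 64-bit value delivered in EDX:EAX, as RDTSC does, has to recombine the two halves. A minimal sketch, assuming the two register values have already been obtained by some means; the helper names are hypothetical.

```python
def combine_edx_eax(edx, eax):
    """Rebuild the 64-bit value from its high (EDX) and low (EAX) 32-bit halves."""
    return ((edx & 0xFFFFFFFF) << 32) | (eax & 0xFFFFFFFF)

def split_edx_eax(value):
    """Inverse operation: return (edx, eax) for a 64-bit value."""
    return (value >> 32) & 0xFFFFFFFF, value & 0xFFFFFFFF
```

The masks matter in 64-bit mode, where the values may arrive in 64-bit registers and only the low 32 bits of each are meaningful.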
18. POP GS | Op/En: A
    64-Bit Mode | Compat/Leg Mode | Description
    Valid | Valid | Pop top of stack into m16; increment stack pointer.
    N.E. | Valid | Pop top of stack into m32; increment stack pointer.
    Valid | N.E. | Pop top of stack into m64; increment stack pointer. Cannot encode 32-bit operand size.
    Valid | Valid | Pop top of stack into r16; increment stack pointer.
    N.E. | Valid | Pop top of stack into r32; increment stack pointer.
    Valid | N.E. | Pop top of stack into r64; increment stack pointer. Cannot encode 32-bit operand size.
    Invalid | Valid | Pop top of stack into DS; increment stack pointer.
    Invalid | Valid | Pop top of stack into ES; increment stack pointer.
    Invalid | Valid | Pop top of stack into SS; increment stack pointer.
    Valid | Valid | Pop top of stack into FS; increment stack pointer by 16 bits.
    N.E. | Valid | Pop top of stack into FS; increment stack pointer by 32 bits.
    Valid | N.E. | Pop top of stack into FS; increment stack pointer by 64 bits.
    Valid | Valid | Pop top of stack into GS; increment stack pointer by 16 bits.
    N.E. | Valid | Pop top of stack into GS; increment stack pointer by 32 bits.
    Valid | N.E. | Pop top of stack into GS; increment stack pointer by 64 bits.
    Instruction Operand Encoding
    Op/En | Operand 1 | Operand 2 | Operand 3 | Oper
19. Opcode | Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description
    REX.W + 89 /r | MOV r/m64, r64 | A | Valid | N.E. | Move r64 to r/m64.
    8A /r | MOV r8, r/m8 | B | Valid | Valid | Move r/m8 to r8.
    REX + 8A /r | MOV r8, r/m8 | B | Valid | N.E. | Move r/m8 to r8.
    8B /r | MOV r16, r/m16 | B | Valid | Valid | Move r/m16 to r16.
    8B /r | MOV r32, r/m32 | B | Valid | Valid | Move r/m32 to r32.
    REX.W + 8B /r | MOV r64, r/m64 | B | Valid | N.E. | Move r/m64 to r64.
    8C /r | MOV r/m16, Sreg | A | Valid | Valid | Move segment register to r/m16.
    REX.W + 8C /r | MOV r/m64, Sreg | A | Valid | Valid | Move zero extended 16-bit segment register to r/m64.
    8E /r | MOV Sreg, r/m16 | B | Valid | Valid | Move r/m16 to segment register.
    REX.W + 8E /r | MOV Sreg, r/m64 | B | Valid | Valid | Move lower 16 bits of r/m64 to segment register.
    A0 | MOV AL, moffs8 | C | Valid | Valid | Move byte at (seg:offset) to AL.
    REX.W + A0 | MOV AL, moffs8 | C | Valid | N.E. | Move byte at (offset) to AL.
    Opcode | Instruction
    A1 | MOV AX, moffs16
    A1 | MOV EAX, moffs32
    REX.W + A1 | MOV RAX, moffs64
    A2 | MOV moffs8, AL
    REX.W + A2 | MOV moffs8, AL
    A3 | MOV moffs16, AX
    A3 | MOV moffs32, EAX
    REX.W + A3 | MOV moffs64, RAX
    B0+ rb | MOV r8, imm8
    REX + B0+ rb | MOV r8, imm8
    B8+ rw | MOV r16, imm16
    B8+ rd | MOV r32, imm32
    REX.W + B8+ rd | MOV r64, imm64
    C6 /0 | MOV r/m8, imm8
    REX + C6 /0 | MOV r/m8, imm8
    C7 /0 | MOV r/m16, imm16
    C7 /0 | MOV r/m32, imm32
    REX.W + C7 /0 | MOV r/m64, imm32
    64-Bit Mode: Valid, Valid, Val
20. Opcode | Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description
    0E /r ib | PBLENDW xmm1, xmm2/m128, imm8 | A | Valid | Valid | Select words from xmm1 and xmm2/m128 from mask specified in imm8 and store the values into xmm1.
    Instruction Operand Encoding
    Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
    A | ModRM:reg (r, w) | ModRM:r/m (r) | imm8 | NA
    PCMPEQB/PCMPEQW/PCMPEQD: Compare Packed Data for Equal
    Opcode | Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description
    0F 74 /r | PCMPEQB mm, mm/m64 | A | Valid | Valid | Compare packed bytes in mm/m64 and mm for equality.
    66 0F 74 /r | PCMPEQB xmm1, xmm2/m128 | A | Valid | Valid | Compare packed bytes in xmm2/m128 and xmm1 for equality.
    0F 75 /r | PCMPEQW mm, mm/m64 | A | Valid | Valid | Compare packed words in mm/m64 and mm for equality.
    66 0F 75 /r | PCMPEQW xmm1, xmm2/m128 | A | Valid | Valid | Compare packed words in xmm2/m128 and xmm1 for equality.
    0F 76 /r | PCMPEQD mm, mm/m64 | A | Valid | Valid | Compare packed doublewords in mm/m64 and mm for equality.
    66 0F 76 /r | PCMPEQD xmm1, xmm2/m128 | A | Valid | Valid | Compare packed doublewords in xmm2/m128 and xmm1 for equality.
    Instruction Operand Encoding
    Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
    A | ModRM:reg (r, w) | ModRM:r/m (r) | NA | NA
    PCMPEQQ: Compare Packed Qword Data for Equal
    Opcode | Instruction | 64-Bit Mode | Compat/Leg Mode | Description
    66 0F 38 29 /r | PCMPEQQ xmm1 | Valid | Valid
21. IA32_MCi_ADDR MSRs."
    457H | 1111 | MC21_MISC | Package | See Section 15.3.2.4, "IA32_MCi_MISC MSRs."
    B.5 MSRS IN THE NEXT GENERATION INTEL PROCESSOR (CODENAMED WESTMERE)
    Next Generation Intel 64 processors (codenamed Westmere) support the MSR interfaces listed in Table B-5, plus additional MSRs listed in Table B-6.
    Table B-6. Additional MSRs supported by Next Generation Intel Processors (Codenamed Westmere)
    Register Address (Hex) | (Dec) | Register Name | Scope | Bit Description
    1A7H | 423 | MSR_OFFCORE_RSP_1 | Thread | Offcore Response Event Select Register (R/W)
    1B0H | 432 | IA32_ENERGY_PERF_BIAS | Package | See Table B-2.
    15. Updates to Appendix G, Volume 3B
    Change bars show changes to Appendix G of the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3B: System Programming Guide, Part 2.
    G.10 VPID AND EPT CAPABILITIES
    The IA32_VMX_EPT_VPID_CAP MSR (index 48CH) reports information about the capabilities of the logical processor with regard to virtual-processor identifiers (VPIDs, Section 25.1) and extended page tables (EPT, Section 25.2): If bit 0 is read as 1, the logical processor a
22. Unsigned divide r/m64 by 2, once.
    Opcode | Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description
    D3 /5 | SHR r/m32, CL | B | Valid | Valid | Unsigned divide r/m32 by 2, CL times.
    REX.W + D3 /5 | SHR r/m64, CL | B | Valid | N.E. | Unsigned divide r/m64 by 2, CL times.
    C1 /5 ib | SHR r/m32, imm8 | C | Valid | Valid | Unsigned divide r/m32 by 2, imm8 times.
    REX.W + C1 /5 ib | SHR r/m64, imm8 | C | Valid | N.E. | Unsigned divide r/m64 by 2, imm8 times.
    NOTES:
    * Not the same form of division as IDIV; rounding is toward negative infinity.
    ** In 64-bit mode, r/m8 can not be encoded to access the following byte registers if a REX prefix is used: AH, BH, CH, DH.
    *** See IA-32 Architecture Compatibility section below.
    Instruction Operand Encoding
    Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
    A | ModRM:r/m (w) | 1 | NA | NA
    B | ModRM:r/m (w) | CL (r) | NA | NA
    C | ModRM:r/m (w) | imm8 | NA | NA
    SBB: Integer Subtraction with Borrow
    Opcode | Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description
    1C ib | SBB AL, imm8 | A | Valid | Valid | Subtract with borrow imm8 from AL.
    1D iw | SBB AX, imm16 | A | Valid | Valid | Subtract with borrow imm16 from AX.
    1D id | SBB EAX, imm32 | A | Valid | Valid | Subtract with borrow imm32 from EAX.
    REX.W + 1D id | SBB RAX, imm32 | A | Valid | N.E. | Subtract with borrow sign-extended imm32 to 64 bits from RAX.
    80 /3 ib | SBB r/m8, imm8 | B | Valid | Valid | Subtract with borrow imm8 from r/m8.
    REX + 8
23. left CL times.
    Opcode | Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description
    C1 /2 ib | RCL r/m16, imm8 | C | Valid | Valid | Rotate 17 bits (CF, r/m16) left imm8 times.
    D1 /2 | RCL r/m32, 1 | A | Valid | Valid | Rotate 33 bits (CF, r/m32) left once.
    REX.W + D1 /2 | RCL r/m64, 1 | A | Valid | N.E. | Rotate 65 bits (CF, r/m64) left once. Uses a 6-bit count.
    D3 /2 | RCL r/m32, CL | B | Valid | Valid | Rotate 33 bits (CF, r/m32) left CL times.
    REX.W + D3 /2 | RCL r/m64, CL | B | Valid | N.E. | Rotate 65 bits (CF, r/m64) left CL times. Uses a 6-bit count.
    Opcode | Instruction | 64-Bit Mode
    C1 /2 ib | RCL r/m32, imm8 | Valid
    REX.W + C1 /2 ib | RCL r/m64, imm8 | Valid
    D0 /3 | RCR r/m8, 1 | Valid
    REX + D0 /3 | RCR r/m8, 1 | Valid
    D2 /3 | RCR r/m8, CL | Valid
    REX + D2 /3 | RCR r/m8, CL | Valid
    C0 /3 ib | RCR r/m8, imm8 | Valid
    REX + C0 /3 ib | RCR r/m8, imm8 | Valid
    D1 /3 | RCR r/m16, 1 | Valid
    D3 /3 | RCR r/m16, CL | Valid
    C1 /3 ib | RCR r/m16, imm8 | Valid
    D1 /3 | RCR r/m32, 1 | Valid
    REX.W + D1 /3 | RCR r/m64, 1 | Valid
    D3 /3 | RCR r/m32, CL | Valid
    REX.W + D3 /3 | RCR r/m64, CL | Valid
    C1 /3 ib | RCR r/m32, imm8 | Valid
    REX.W + C1 /3 ib | RCR r/m64, imm8 | Valid
    D0 /0 | ROL r/m8, 1 | Valid
    REX + D0 /0 | ROL r/m8, 1 | Valid
    D2 /0 | ROL r/m8, CL | Valid
    REX + D2 /0 | ROL r/m8, CL | Valid
    Compat/Leg Mode: Valid
24. Encoding
    Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
    A | ModRM:reg (w) | ModRM:r/m (r) | NA | NA
    B | ModRM:r/m (w) | ModRM:reg (r) | NA | NA
    MOVDQU: Move Unaligned Double Quadword
    Opcode | Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description
    F3 0F 6F /r | MOVDQU xmm1, xmm2/m128 | A | Valid | Valid | Move unaligned double quadword from xmm2/m128 to xmm1.
    F3 0F 7F /r | MOVDQU xmm2/m128, xmm1 | B | Valid | Valid | Move unaligned double quadword from xmm1 to xmm2/m128.
    Instruction Operand Encoding
    Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
    A | ModRM:reg (w) | ModRM:r/m (r) | NA | NA
    B | ModRM:r/m (w) | ModRM:reg (r) | NA | NA
    MOVDQ2Q: Move Quadword from XMM to MMX Technology Register
    Opcode | Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description
    F2 0F D6 | MOVDQ2Q mm, xmm | A | Valid | Valid | Move low quadword from xmm to mmx register.
    Instruction Operand Encoding
    Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
    A | ModRM:reg (w) | ModRM:reg (r) | NA | NA
    MOVHLPS: Move Packed Single-Precision Floating-Point Values High to Low
    Opcode | Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description
    0F 12 /r | MOVHLPS xmm1, xmm2 | A | Valid | Valid | Move two packed single-precision floating-point values from high quadword of xmm2 to low quadword of
25. [Figure 10-8. Local Vector Table (LVT). Recoverable labels: address FEE0 0330H; "the mask bit for its associated LVT entry is set"; Value After Reset: 0001 0000H]
    10.5.3 Error Handling
    The local APIC provides an error status register (ESR) that it uses to record errors that it detects when handling interrupts (see Figure 10-9). An APIC error interrupt is generated when the local APIC sets one of the error bits in the ESR. The LVT error register allows selection of the interrupt vector to be delivered to the processor core when an APIC error is detected. The LVT error register also provides a means of masking an APIC error interrupt.
    The ESR is a write/read register. A write (of any value) to the ESR must be done to update the register before attempting to read it. This write clears any previously logged errors and updates the ESR with any errors detected since the last write to the ESR. Errors are collected regardless of the LVT Error mask bit, but the APIC will only issue an interrupt due to the error if the LVT Error mask bit is cleared.
    The functions of the ESR are listed in Table 10-2. Error handling in x2APIC mode is discussed in Section 10.12.8.
    [Figure 10-9. Error status register fields, high to low: bits 31:8 Reserved; Illegal Register Address; Received Illegal Vector; Send Illegal Vector; Reserved; Receive Accept Error; Send Accept
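The write-before-read protocol of the ESR is easy to get wrong, so a small behavioural model may help. It is a sketch under stated assumptions: the class and method names are hypothetical, the bit number passed to `record_error` is whatever position the caller chooses, and the model starts with the LVT Error entry masked because the reset value quoted above (0001 0000H) has the mask bit set.

```python
class ErrorStatusRegister:
    """Toy model of the latch behaviour described for the local APIC ESR."""

    def __init__(self):
        self._pending = 0             # errors detected since the last write
        self._visible = 0             # value returned by reads
        self.lvt_error_masked = True  # assumed reset state (mask bit set)

    def record_error(self, bit):
        """Log an error; return True if an APIC error interrupt would be issued."""
        self._pending |= 1 << bit     # collected regardless of the mask bit
        return not self.lvt_error_masked

    def write(self, _value=0):
        """Any write discards the old contents and latches the pending errors."""
        self._visible = self._pending
        self._pending = 0

    def read(self):
        return self._visible
```

In this model a read without a preceding write returns stale data, and a second back-to-back write clears the register, which matches the "write, then read" sequence the text calls for.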
26. FFFFF800H 00000800H. Such a failure causes an associated VM entry to fail (by reloading host state) and causes an associated VM exit to lead to VMX abort. 10.12.5 x2APIC State Transitions. This section provides a detailed description of the x2APIC states of a local x2APIC unit, transitions between these states, as well as interactions of these states with INIT and RESET. 10.12.5.1 x2APIC States. The valid states for a local x2APIC unit are listed in Table 10-5. APIC disabled: IA32_APIC_BASE[EN] = 0 and IA32_APIC_BASE[EXTD] = 0. xAPIC mode: IA32_APIC_BASE[EN] = 1 and IA32_APIC_BASE[EXTD] = 0. x2APIC mode: IA32_APIC_BASE[EN] = 1 and IA32_APIC_BASE[EXTD] = 1. Invalid: IA32_APIC_BASE[EN] = 0 and IA32_APIC_BASE[EXTD] = 1. The state corresponding to EXTD = 1 and EN = 0 is not valid, and it is not possible to get into this state. An execution of WRMSR to the IA32_APIC_BASE MSR that attempts a transition from a valid state to this invalid state causes a general-protection exception. Figure 10-27 shows the comprehensive state transition diagram for a local x2APIC unit. x2APIC Transitions From x2APIC Mode: From the x2APIC mode, the only valid x2APIC transition using IA32_APIC_BASE is to the state where the x2APIC is disabled, by setting EN to 0 and EXTD to 0. The x2APIC ID (32 bits) and
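The four EN/EXTD combinations can be decoded mechanically. The following Python sketch is illustrative only: the function names are hypothetical, and the bit positions (EN at bit 11, EXTD at bit 10 of IA32_APIC_BASE) are assumptions taken from the usual register layout rather than from this excerpt. Only the rule stated for leaving x2APIC mode is modeled; the other arcs of Figure 10-27 are not.

```python
EN_BIT = 11    # assumed position of IA32_APIC_BASE[EN]
EXTD_BIT = 10  # assumed position of IA32_APIC_BASE[EXTD]

_STATES = {
    (0, 0): "disabled",
    (1, 0): "xAPIC",
    (1, 1): "x2APIC",
    (0, 1): "invalid",
}


def apic_state(apic_base):
    """Classify an IA32_APIC_BASE value by its EN and EXTD bits."""
    en = (apic_base >> EN_BIT) & 1
    extd = (apic_base >> EXTD_BIT) & 1
    return _STATES[(en, extd)]


def write_from_x2apic_ok(new_value):
    """From x2APIC mode, only 'disabled' is a valid new state.

    Rewriting the same mode (x2APIC to x2APIC) is treated as no transition.
    Anything else would raise a general-protection exception on real hardware.
    """
    return apic_state(new_value) in ("x2APIC", "disabled")
```

For example, `apic_state(0xFEE00C00)` classifies a value with both bits set as x2APIC mode, and `write_from_x2apic_ok(0xFEE00800)` rejects a direct switch back to xAPIC mode.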
27. ModRM:r/m (r), NA, NA. ANDNPS: Bitwise Logical AND NOT of Packed Single-Precision Floating-Point Values. Opcode: 0F 55 /r. Instruction: ANDNPS xmm1, xmm2/m128. Op/En: A. 64-bit Mode: Valid. Compat/Leg Mode: Valid. Description: Bitwise logical AND NOT of xmm2/m128 and xmm1. Instruction Operand Encoding: Op/En A; Operand 1: ModRM:reg (w); Operand 2: ModRM:r/m (r); Operand 3: NA; Operand 4: NA. ARPL: Adjust RPL Field of Segment Selector. Opcode: 63 /r. Instruction: ARPL r/m16, r16. 64-bit Mode: N.E. Compat/Leg Mode: Valid. Description: Adjust RPL of r/m16 to not less than RPL of r16. Instruction Operand Encoding: Op/En A; Operand 1: ModRM:r/m (w); Operand 2: ModRM:reg; Operand 3: NA; Operand 4: NA. Description: Compares the RPL fields of two segment selectors. The first operand (the destination operand) contains one segment selector and the second operand (source operand) contains the other. (The RPL field is located in bits 0 and 1 of each operand.) If the RPL field of the destination operand is less than the RPL field of the source operand, the ZF flag is set and the RPL field of the destination operand is increased to match that of the source operand. Otherwise, the ZF flag is cleared and no change is made to the destination operand. (The destination operand can be a w
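The ARPL description maps directly onto a small function. This is a sketch of the stated semantics in Python, with a hypothetical function name; it returns the new destination selector together with the resulting ZF value instead of touching real registers.

```python
RPL_MASK = 0x3  # RPL occupies bits 0 and 1 of a segment selector


def arpl(dest, src):
    """Return (new_dest, zf) following the ARPL description.

    If dest.RPL < src.RPL, raise dest.RPL to src.RPL and set ZF;
    otherwise leave dest unchanged and clear ZF.
    """
    dest &= 0xFFFF
    src &= 0xFFFF
    if (dest & RPL_MASK) < (src & RPL_MASK):
        return (dest & ~RPL_MASK & 0xFFFF) | (src & RPL_MASK), True
    return dest, False
```

With a destination selector of 0008H (RPL 0) and a source of 0013H (RPL 3), the call yields 000BH with ZF set; a destination that already has RPL 3 is returned unchanged with ZF clear.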
28. Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg r w ModRM r m r NA NA Intel 64 and IA 32 Architectures Software Developer s Manual Documentation Changes 139 Boxusieniniton Chinon intel PMULHRSW Packed Multiply High with Round and Scale Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode OF 38 0B r PMULHRSW mm1 Valid Valid Multiply 16 bit signed mm2 m64 words scale and round signed doublewords pack high 16 bits to MM1 66 OF 38 0B r PMULHRSW A Valid Valid Multiply 16 bit signed 1 words scale and round xmm2 m128 signed doublewords pack high 16 bits to XMM1 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg r w ModRM r m r NA NA PMULHUW Multiply Packed Unsigned Integers and Store High Result Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode OF E4 r PMULHUW mm1 A Valid Valid Multiply the packed mm2 m64 unsigned word integers in mm1 register and mm2 m64 and store the high 16 bits of the results in mm1 66 OF E4 Jr PMULHUW xmm1 A Valid Valid Multiply the packed xmm2 m128 unsigned word integers in xmm1 and xmm2 m128 and store the high 16 bits of the results in xmm1 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg w ModRM r m NA NA Intel 64 and 32 Architectures Software Developer s Manual Documentation C
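The one-line table descriptions of PMULHRSW and PMULHUW hide the per-lane arithmetic. The Python sketch below works on a single 16-bit lane; the function names are hypothetical, and the rounding formula for PMULHRSW (shift right by 14, add 1, shift right by 1) is taken from the instruction's usual operation listing, not from this excerpt.

```python
def to_signed16(x):
    """Interpret the low 16 bits of x as a two's-complement word."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def pmulhrsw_lane(a, b):
    """Signed multiply, scale and round, keep 16 bits (one lane)."""
    product = to_signed16(a) * to_signed16(b)
    return (((product >> 14) + 1) >> 1) & 0xFFFF


def pmulhuw_lane(a, b):
    """Unsigned multiply, keep the high 16 bits of the 32-bit product."""
    return ((a & 0xFFFF) * (b & 0xFFFF)) >> 16
```

Read as Q15 fixed point, `pmulhrsw_lane(0x4000, 0x4000)` multiplies 0.5 by 0.5 and gives 2000H (0.25). The corner case 8000H times 8000H wraps to 8000H, since +1.0 is not representable in the format.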
29. REX W D3 7 C1 7 ib REX W C1 7 ib 00 4 REX D0 4 D2 4 REX 02 4 CO 4 ib REX C0 4 ib D1 4 D3 4 4 ib D1 4 01 4 03 4 REX W 03 4 4 ib C1 4 DO 5 Instruction SAR r m16 1 SAR r m16 CL SAR r m16 imm8 SAR r m32 1 SAR r m64 1 SAR r m32 CL SAR r m64 CL SAR r m32 imm8 SAR r m64 imm8 SHL r m8 1 SHL r m8 1 SHL r m8 CL SHL r m8 CL SHL r m8 imm8 SHL r m8 imm8 SHL r m16 1 SHL r m16 CL SHL r m16 imm8 SHL r m32 1 SHL r m64 1 SHL r m32 CL SHL r m64 CL SHL r m32 imm8 SHL r m64 imm8 SHR r m8 1 Op En A B B gt gt gt gt gt 64 Bit Mode Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Compat Leg Mode Valid Valid Valid Valid N E Valid N E Valid N E Valid N E Valid N E Valid N E Valid Valid Valid Valid N E Valid N E Valid N E Valid Intel 64 and IA 32 Architectures Software Developer s Manual Documentation Changes Description Signed divide r m16 by 2 once Signed divide r m16 by 2 CL times Signed divide r m16 by 2 imm8 times Signed divide r m32 by 2 once Signed divide r m64 by 2 once Signed divide r m32 by 2 CL
30. To explicitly force the LOCK semantics, software can use the LOCK prefix with the following instructions when they are used to modify a memory location. An invalid-opcode exception (#UD) is generated when the LOCK prefix is used with any other instruction or when no write operation is made to memory (that is, when the destination operand is in a register). The bit test and modify instructions (BTS, BTR, and BTC). The exchange instructions (XADD, CMPXCHG, CMPXCHG8B). The LOCK prefix is automatically assumed for the XCHG instruction. The following single-operand arithmetic and logical instructions: INC, DEC, NOT, and NEG. The following two-operand arithmetic and logical instructions: ADD, ADC, SUB, SBB, AND, OR, and XOR. A locked instruction is guaranteed to lock only the area of memory defined by the destination operand, but may be interpreted by the system as a lock for a larger memory area. Software should access semaphores (shared memory used for signalling between multiple processors) using identical addresses and operand lengths. For example, if one processor accesses a semaphore using a word access, other processors should not access the semaphore using a byte access. NOTE: Do not implement semaphores using the WC memory type. Do not perform non-temporal stores to a cache line contain
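The LOCK-prefix rule above is a simple membership test plus a check on the destination operand. A minimal Python sketch follows; the function name and the boolean `dest_is_memory` parameter are hypothetical, and the mnemonic set contains only the instructions named in the excerpt.

```python
# Instructions the excerpt lists as accepting an explicit LOCK prefix.
LOCKABLE = {
    "BTS", "BTR", "BTC",                             # bit test and modify
    "XADD", "CMPXCHG", "CMPXCHG8B", "XCHG",          # exchange
    "INC", "DEC", "NOT", "NEG",                      # single-operand
    "ADD", "ADC", "SUB", "SBB", "AND", "OR", "XOR",  # two-operand
}


def lock_prefix_valid(mnemonic, dest_is_memory):
    """False means the combination would raise #UD per the excerpt."""
    return mnemonic.upper() in LOCKABLE and dest_is_memory
```

So `lock_prefix_valid("ADD", True)` is accepted, while the same instruction with a register destination, or `MOV` with any destination, is rejected.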
31. Valid N E Valid N E Valid N E Valid Valid Valid Valid N E Valid N E Valid N E Valid N E Valid N E Intel 64 and IA 32 Architectures Software Developer s Manual Documentation Changes Description Rotate 33 bits CF r m32 left imm8 times Rotate 65 bits CF r m64 left imm8 times Uses a 6 bit count Rotate 9 bits CF r m8 right once Rotate 9 bits CF r m8 right once Rotate 9 bits CF r m8 right CL times Rotate 9 bits CF r m8 right CL times Rotate 9 bits CF r m8 right imm8 times Rotate 9 bits CF r m8 right imm times Rotate 17 bits CF r m16 right once Rotate 17 bits CF r m16 right CL times Rotate 17 bits CF r m16 right imm8 times Rotate 33 bits CF r m32 right once Uses a 6 bit count Rotate 65 bits CF r m64 right once Uses a 6 bit count Rotate 33 bits CF r m32 right CL times Rotate 65 bits CF r m64 right CL times Uses a 6 bit count Rotate 33 bits CF r m32 right imm8 times Rotate 65 bits CF r m64 right imm8 times Uses a 6 bit count Rotate 8 bits r m8 left once Rotate 8 bits r m8 left once Rotate 8 bits r m8 left CL times Rotate 8 bits r m8 left CL times 161 Documentation Changes Opcode C0 0 ib REX C0 0 ib D1 0 D3 0 C1 0 ib D1 0 REX W 01 0 D3 0 REX W D3 0 C1 0 ib C1 0 ib 1 REX D0 1 D2 1 REX 02 1 CO 1 ib REX 4 C0 1 ib
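The "Rotate 9 bits (CF, r/m8)" wording means the carry flag is rotated as an extra top bit of the operand. The Python sketch below models RCR under that reading; the function name is hypothetical, and the count masking (5 bits, or 6 bits for 64-bit operands, then reduced modulo the extended width) is an assumption based on the usual operation listing.

```python
def rcr(value, cf, count, width=8):
    """Rotate (CF, value) right as a single (width + 1)-bit quantity.

    Returns (new_value, new_cf).
    """
    n = width + 1
    count = (count & (0x3F if width == 64 else 0x1F)) % n
    ext = ((cf & 1) << width) | (value & ((1 << width) - 1))
    ext = ((ext >> count) | (ext << (n - count))) & ((1 << n) - 1)
    return ext & ((1 << width) - 1), ext >> width
```

Rotating 01H right once with CF clear moves the low bit into CF and leaves 00H; rotating 00H with CF set brings the carry in at bit 7 and gives 80H. Nine rotations of an 8-bit operand restore the original pair.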
32. and store the low 16 bits of the results in mm1 66 OF D5 r PMULLW 1 Valid Valid Multiply the packed signed xmm2 m128 word integers in xmm1 and xmm2 m128 and store the low 16 bits of the results in 1 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg r w ModRM r m r NA NA PMULUDQ Multiply Packed Unsigned Doubleword Integers Opcode Instruction Op 64 Bit Compat Description En Leg Mode OF FA Jr PMULUDQ mm1 Valid Valid Multiply unsigned mm2 m64 doubleword integer in mm1 by unsigned doublew ord integer in mm2 m64 and store the quadw ord result in mm1 66 OF FA Jr PMULUDQ xmm1 Valid Valid Multiply packed unsigned xmm2 m128 doubleword integers in xmm1 by packed unsigned doubleword integers in xmm2 m128 and store the quadw ord results in xmm1 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg w ModRM r m NA NA Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes 142 Documentation Changes POP Pop a Value from the Stack intel Opcode 8F 0 8F 0 8F 0 58 rw 584 rd 584 rd 1F 07 17 OF Al OF Al OF Al OF 9 OF 9 OF 9 Instruction POP r m16 POP r m32 POP r m64 POP r16 POP 32 POP r64 POP DS POP ES POP SS POP FS POP FS POP FS POP GS POP GS
33. ordering (write after read) conflicts. While in the sleep state, the request is not eligible to be scheduled to the QMC. 34H 02H UNC_QHL_SLEEPS.REMOTE_ORDER: Counts number of occurrences a request was put to sleep due to remote socket ordering (write after read) conflicts. While in the sleep state, the request is not eligible to be scheduled to the QMC. 34H 04H UNC_QHL_SLEEPS.LOCAL_ORDER: Counts number of occurrences a request was put to sleep due to local socket ordering (write after read) conflicts. While in the sleep state, the request is not eligible to be scheduled to the QMC. 34H UNC_QHL_SLEEPS.IOH_CONFLICT: Counts number of occurrences a request was put to sleep due to address conflicts. While in the sleep state, the request is not eligible to be scheduled to the QMC. UNC_QHL_SLEEPS.REMOTE_CONFLICT: Counts number of occurrences a request was put to sleep due to remote socket address conflicts. While in the sleep state, the request is not eligible to be scheduled to the QMC. UNC_QHL_SLEEPS.LOCAL_CONFLICT: Counts number of occurrences a request was put to sleep due to local socket address conflicts. While in the sleep state, the request is not eligible to be scheduled to the QMC. UNC_ADDR_OPCODE_MATCH.IOH: Counts number of requests from the IOH; the address/opcode of the request is qualified by the mask value written to MSR 396H. The following mask values
34. r 66 Of 38 24 r 66 0f 38 25 Jr Instruction PMOVSXBW xmml xmm2 m64 PMOVSXBD xmml xmm2 m32 PMOVSXBQ xmml xmm2 m16 PMOVSXWD xmml xmm2 m64 PMOVSXWQ xmml xmm2 m32 PMOVSXDQ xmml xmm2 m64 Op En A 64 bit Mode Valid Valid Valid Valid Valid Valid Compat Leg Mode Valid Valid Valid Valid Valid Valid Description Sign extend 8 packed signed 8 bit integers in the low 8 bytes of xmm2 m64 to 8 packed signed 16 bit integers in xmm1 Sign extend 4 packed signed 8 bit integers in the low 4 bytes of xmm2 m32 to 4 packed signed 32 bit integers in xmm1 Sign extend 2 packed signed 8 bit integers in the low 2 bytes of xmm2 m16 to 2 packed signed 64 bit integers in xmm1 Sign extend 4 packed signed 16 bit integers in the low 8 bytes of xmm2 m64 to 4 packed signed 32 bit integers in xmm1 Sign extend 2 packed signed 16 bit integers in the low 4 bytes of xmm2 m32 to 2 packed signed 64 bit integers in xmm1 Sign extend 2 packed signed 32 bit integers in the low 8 bytes of xmm2 m64 to 2 packed signed 64 bit integers in xmm1 Instruction Operand Encoding Op En Operand 1 A ModRM reg w Operand 2 ModRM r m r Operand 3 Operand 4 NA NA Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes 138 Boxusieniniton Chinon intel PMOVZX Packed Move with Zero Extend Opcode Instruct
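Sign extension of packed bytes, as in the PMOVSXBW row, can be shown on plain integers. This Python sketch is an illustration only: the function name is hypothetical, and the source operand is modeled as a `bytes` object whose first eight bytes stand for the low 8 bytes of xmm2/m64.

```python
def pmovsxbw(src):
    """Sign-extend the low 8 bytes of src to eight 16-bit words."""
    return [(b - 0x100 if b & 0x80 else b) & 0xFFFF for b in src[:8]]


words = pmovsxbw(bytes([0x7F, 0x80, 0xFF, 0x00, 0x01, 0x02, 0x03, 0x04]))
```

Bytes with the top bit set (80H, FFH) become FF80H and FFFFH, while non-negative bytes are copied with a zero upper half. The other PMOVSX forms differ only in source and destination element widths.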
35. r NA NA Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes 98 chenes intel MOVNTDQ Store Double Quadword Using Non Temporal Hint Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode 66 OF E7 r MOVNTDQm128 A Valid Valid Move double quadword xmm from xmm to m128 using non temporal hint Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM r m w ModRM reg r NA NA MOVNTI Store Doubleword Using Non Temporal Hint Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode OF C3 r MOVNTI m32 r32 A Valid Valid Move doubleword from r32 to m32 using non temporal hint REX W 0F MOVNTI m64 r64 Valid N E Move quadword from r64 to Ir m64 using non temporal hint Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM r m w ModRM reg r NA NA MOVNTPD Store Packed Double Precision Floating Point Values Using Non Temporal Hint Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode 66 OF 2B r MOVNTPDm128 A Valid Valid Move packed double xmm precision floating point values from xmm to m128 using non temporal hint Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM r m w ModRM reg r NA NA Intel 64 and 32 Architectures Software Developer s Manual Documenta
36. r imm8 NA PINSRW Insert Word Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode OF C4 r ib PINSRW mm A Valid Valid Insert the low word from r32 m16 imm8 r32 or from m16 into mm at the word position specified by imm8 66 OF C4 rib PINSRW xmm A Valid Valid Move the low word of r32 or r32 m16 imm8 from m16 into xmm at the word position specified by imma Instruction Operand Encoding Op En Operand 1 A ModRM reg w Operand 2 ModRM r m r Operand 3 Operand 4 imm8 NA Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes 130 Cages intel PMADDUBSW Multiply and Add Packed Signed and Unsigned Bytes Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode OF 38 04 Jr PMADDUBSW A Valid Valid Multiply signed and mm1 mm2 m64 unsigned bytes add horizontal pair of signed words pack saturated signed words to 1 66 OF 38 04 r PMADDUBSW A Valid Valid Multiply signed and 1 unsigned bytes add xmm2 m128 horizontal pair of signed words pack saturated signed words to XMM1 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg w ModRM r m NA NA PMADDWD Multiply and Add Packed Integers Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode OF F5 Jr PMADDWD mm A Valid Valid Multiply the packed words in mm m64 mm by the packed word
37. r m64 Ir 6B r ib IMUL r16 r m16 imm8 6B r ib IMUL r32 r m32 imm8 REX W 6B rib IMUL r64 r m64 imm8 69 r iw IMUL r16 r m16 imm16 69 r id IMUL r32 r m32 imm32 REX W 69 rid IMUL r64 r m64 imm32 NOTES Op gt gt gt gt 64 Bit Mode Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Compat Leg Mode Valid Valid Valid N E Valid Valid N E Valid Valid N E Valid Valid N E Description AX AL byte DX AX lt AX r m word EDX EAX EAX r m32 RDX RAX lt RAX r m64 word register word register r m16 doublew ord register lt doubleword register r m32 Quadw ord register Quadw ord register r m64 word register r m16 sign extended immediate byte doubleword register r m32 sign extended immediate byte Quadw ord register lt r m64 sign extended immediate byte word register r m16 immediate word doubleword register r m32 immediate doublew ord Quadw ord register lt r m64 immediate doubleword n 64 bit mode r m8 can not be encoded to access the following byte registers if a REX prefix is used AH BH CH DH Instruction Operand Encoding Op En Operand 1 ModRM r m w ModRM reg w ModRM w 2 ModRM r m ModRM r m
38. w ModRM r m NA NA ModRM r m ModRM reg NA MOVQ2DQ Move Quadword from MMX Technology to XMM Register Opcode Instruction Op 64 Bit Compat Description En Leg Mode OF D6 MOVQ2DQxmm Valid Valid Move quadword from mmx mm to low quadword of xmm Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 ModRM reg w ModRM reg NA NA Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes 101 H intel MOVS MOVSB MOVSW MOVSD MOVSQ Move Data from String to String Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode A4 MOVS m8 m8 A Valid Valid For legacy mode Move byte from address DS E SI to ES E DI For 64 bit mode move byte from address R E SI to 0 A5 MOVS m16 m16 A Valid Valid For legacy mode move word from address DS E SI to ES E DI For 64 bit mode move word at address R E SI to R E DI A5 MOVS m32 m32 A Valid Valid For legacy mode move dword from address DS E SI to ES E DI For 64 bit mode move dword from address R E SI to R E DI REX W A5 MOVS m64 m64 A Valid N E Move qword from address R E SI to R E DI A4 MOVSB A Valid Valid For legacy mode Move byte from address DS E SI to ES E DI For 64 bit mode move byte from address R E SI to R E DI A5 MOVSW A Valid Valid For legacy mode move word from address DS E SI to
39. 0, offsets 0 through FFFH are still valid. For all types of segments except expand-down data segments, the effective limit is the last address that is allowed to be accessed in the segment, which is one less than the size, in bytes, of the segment. The processor causes a general-protection exception (or, if the segment is SS, a stack-fault exception) any time an attempt is made to access the following addresses in a segment: a byte at an offset greater than the effective limit; a word at an offset greater than the (effective limit - 1); a doubleword at an offset greater than the (effective limit - 3); a quadword at an offset greater than the (effective limit - 7); a double quadword at an offset greater than the (effective limit - 15). When the effective limit is FFFFFFFFH (4 GBytes), these accesses may or may not cause the indicated exceptions. Behavior is implementation-specific and may vary from one execution to another. 5.8.8 Fast System Calls in 64-bit Mode. The SYSCALL and SYSRET instructions are designed for operating systems that use a flat memory model (segmentation is not used). The instructions, along with SYSENTER and SYSEXIT, are suited for IA-32e mode operation. SYSCALL and SYSRET, however, are not supported in compatibility mode. Use CPUID to check if SYSCALL and SYSRET are available (CPUID.80000001H.EDX[bit 11] = 1). SYSCALL is intended for use by user code running at privilege level 3 to access operating system or
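The five limit rules listed above collapse into one comparison: an access of `size` bytes faults when its offset exceeds the effective limit minus (size - 1). A minimal Python sketch, with a hypothetical function name, for segments that are not expand-down:

```python
def limit_fault(offset, size, effective_limit):
    """True if an access of `size` bytes at `offset` violates the limit.

    Covers byte (1), word (2), doubleword (4), quadword (8) and
    double quadword (16) accesses in non-expand-down segments.
    The FFFFFFFFH corner case, where behavior is implementation-specific,
    is not modeled.
    """
    return offset > effective_limit - (size - 1)
```

With an effective limit of FFFH, a byte at FFFH is allowed, a word at FFFH faults, and a doubleword is allowed at FFCH but not at FFDH.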
40. 1 For example, CPUID can be executed at any privilege level to serialize instruction execution with no effect on program flow, except that the EAX, EBX, ECX, and EDX registers are modified. The following instructions are memory-ordering instructions, not serializing instructions. These drain the data memory subsystem. They do not serialize the instruction execution stream. 8.4.4.1 Typical BSP Initialization Sequence. After the BSP and APs have been selected (by means of a hardware protocol, see Section 8.4.3, "MP Initialization Protocol Algorithm for Intel Xeon Processors"), the BSP begins executing BIOS boot-strap code (POST) at the normal IA-32 architecture starting address (FFFF FFF0H). The boot-strap code typically performs the following operations: 1. Initializes memory. 2. Loads the microcode update into the processor. 3. Initializes the MTRRs. 4. Enables the caches. 5. Executes the CPUID instruction with a value of 0H in the EAX register, then reads the EBX, ECX, and EDX registers to determine if the BSP is "GenuineIntel." 6. Executes the CPUID instruction with a value of 1H in the EAX register, then saves the values in the EAX, ECX, and EDX registers in a system configuration space in RAM for use later. 7. Loads start-up code for the AP to execute into a 4-KByte page in the lower 1 MByte of memory. 8. Switches to protected mode and ensures that the APIC address space is mapped to the strong uncacheable (UC) memory
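Step 5 of the sequence checks the vendor string that CPUID leaf 0H returns across three registers. The Python sketch below only decodes register values that are passed in; it does not execute CPUID. The function name is hypothetical, and the register order (EBX, EDX, ECX) and the sample constants are the well-known encoding of "GenuineIntel", stated here as an assumption since the excerpt does not spell them out.

```python
import struct


def vendor_string(ebx, edx, ecx):
    """Assemble the 12-character vendor ID from CPUID.0H register values."""
    return struct.pack("<III", ebx, edx, ecx).decode("ascii")


# Register values as returned on a GenuineIntel processor (assumed sample).
vendor = vendor_string(0x756E6547, 0x49656E69, 0x6C65746E)
is_genuine_intel = vendor == "GenuineIntel"
```

Each register carries four ASCII characters in little-endian order, which is why the bytes are packed with `<` before decoding.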
41. 1 Instruction Operand Encoding: Operand 1: ModRM:reg (w); Operand 2: ModRM:r/m; Operand 3: NA; Operand 4: NA. MAXSS: Return Maximum Scalar Single-Precision Floating-Point Value. Opcode: F3 0F 5F /r. Instruction: MAXSS xmm1, xmm2/m32. Op/En: A. 64-Bit Mode: Valid. Compat/Leg Mode: Valid. Description: Return the maximum scalar single-precision floating-point value between xmm2/mem32 and xmm1. Instruction Operand Encoding: Op/En A; Operand 1: ModRM:reg (w); Operand 2: ModRM:r/m; Operand 3: NA; Operand 4: NA. MFENCE: Memory Fence. Opcode: 0F AE /6. Instruction: MFENCE. Op/En: A. 64-Bit Mode: Valid. Compat/Leg Mode: Valid. Description: Serializes load and store operations. Instruction Operand Encoding: Op/En A; Operands 1 through 4: NA. Description: Performs a serializing operation on all load-from-memory and store-to-memory instructions that were issued prior to the MFENCE instruction. This serializing operation guarantees that every load and store instruction that precedes the MFENCE instruction in program order becomes globally visible before any load or store instruction that follows the MFENCE instruction. The MFENCE instruction is ordered with respect to all load and store instructions, other MFENCE instructions, any LFENCE and SFENCE instructions, and
42. 2 MByte Yes 4 KByte IA-32e 1 1 2 48 Up to 52 2 MByte Yes 1 GByte. NOTES: 1. The physical-address width is always bounded by MAXPHYADDR; see Section 4.1.4. 2. The processor ensures that IA32_EFER.LME must be 0 if CR0.PG = 1 and CR4.PAE = 0. 3. 32-bit paging supports physical-address widths of more than 32 bits only for 4-MByte pages and only if the PSE-36 mechanism is supported; see Section 4.1.4 and Section 4.3. 4. 4-MByte pages are used with 32-bit paging only if CR4.PSE = 1; see Section 4.3. 5. Execute-disable access rights are applied only if IA32_EFER.NXE = 1; see Section 4.6. 6. Not all processors that support IA-32e paging support 1-GByte pages; see Section 4.1.4. Because they are used only if IA32_EFER.LME = 0, 32-bit paging and PAE paging is used only in legacy protected mode. Because legacy protected mode cannot produce 4.1.4 Enumeration of Paging Features by CPUID. Software can discover support for different paging features using the CPUID instruction. PSE: page-size extensions for 32-bit paging. If CPUID.01H:EDX.PSE [bit 3] = 1, CR4.PSE may be set to 1, enabling support for 4-MByte pages with 32-bit paging (see Section 4.3). PAE: physical-address extension. If CPUID.01H:EDX.PAE [bit 6] = 1, CR4.PAE may be set to 1, enabling PAE paging (this setting is also required for IA-32e paging). PGE: global-page support. If CPUID.01H:EDX.PGE [bit 13] = 1, CR4.PGE may be set to 1, enabling the global-page feature (see S
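The three feature checks named above are single-bit tests on the EDX value returned by CPUID leaf 01H. A minimal Python sketch, assuming the caller already has that EDX value as an integer; the function name and the returned dictionary shape are hypothetical.

```python
# Bit positions in CPUID.01H:EDX, as given in the excerpt.
PAGING_FEATURE_BITS = {"PSE": 3, "PAE": 6, "PGE": 13}


def paging_features(edx):
    """Map each paging feature name to whether its CPUID bit is set."""
    return {name: bool((edx >> bit) & 1)
            for name, bit in PAGING_FEATURE_BITS.items()}
```

A caller would consult the result before setting the matching CR4 bit, for example enabling CR4.PAE only when `paging_features(edx)["PAE"]` is true.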
43. 21 r MOV r32 DRO A N E Valid Move debug register to r32 DR7 OF 21 r MOV 64 DRO A Valid Move extended debug DR7 register to r64 OF 23 r MOV DRO DR7 A Valid Move r32 to debug register r32 OF 23 Jr MOV DRO DR7 A Valid N E Move r64 to extended r64 debug register Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg w ModRM r m NA NA MOVAPD Move Aligned Packed Double Precision Floating Point Values Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode 66 OF 28 r MOVAPDxmml Valid Valid Move packed double xmm2 m128 precision floating point values from xmm2 m128 to xmm1 66 OF 29 r MOVAPD B Valid Valid Move packed double xmm2 m128 precision floating point 1 values from xmm1 to xmm2 m128 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg w ModRM r m NA NA B ModRM r m ModRM reg NA NA Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes 91 chenes intel MOVAPS Move Aligned Packed Single Precision Floating Point Values Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode OF 28 r MOVAPS xmml Valid Valid Move packed single xmm2 m128 precision floating point values from xmm2 m128 to 1 OF 29 Jr MOVAPS B Valid Valid Move packed single xmm2 m128 precision floating point
44. 32 Architectures Software Developer s Manual Documentation Changes 59 Documentation Changes HADDPS Packed Single FP Horizontal Add intel Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode F2 OF 7C r HADDPSxmml Valid Valid Horizontal add packed xmm2 m128 single precision floating point values from xmm2 m128 to xmm1 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg r w ModRM r m r NA NA HLT Halt Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode F4 HLT A Valid Valid Halt Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A NA NA NA NA HSUBPD Packed Double FP Horizontal Subtract Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode 66 OF 7D r HSUBPDxmml A Valid Valid Horizontal subtract packed xmm2 m128 double precision floating point values from xmm2 m128 to xmm1 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg w ModRM r m NA NA HSUBPS Packed Single FP Horizontal Subtract Opcode Instruction Op 64 Bit Compat En Mode Leg Mode F2 OF 7D r HSUBPSxmml A Valid Valid xmm2 m128 Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes Description Horizontal subtract packed single precision floating point values from xmm2 m128 to xmml1
45. 4 A NA NA NA NA PAVGB PAVGW Average Packed Integers Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode OF EO r PAVGB mm1 A Valid Valid Average packed unsigned mm2 m64 byte integers from mm2 m64 and mm1 with rounding 66 OF EO r PAVGB xmm1 A Valid Valid Average packed unsigned xmm2 m128 byte integers from xmm2 m128 and xmm1 with rounding OF E3 r PAVGW mm1 A Valid Valid Average packed unsigned mm2 m64 word integers from mm2 m64 and mm1 with rounding 66 OF r PAVGW xmm1 A Valid Valid Average packed unsigned xmm2 m128 word integers from xmm2 m128 and xmm1 with rounding Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes 121 H intel Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg r w ModRM r m r NA NA PBLENDVB Variable Blend Packed Bytes Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode 66 0F 38 10 PBLENDVBxmml Valid Valid Select byte values from xmm2 m128 xmm1 and xmm2 m128 lt 0 gt from mask specified in the high bit of each byte in 0 and store the values into xmm1 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg r w ModRM r m XMM0 NA PBLENDW Blend Packed Words Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode 66
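"Average ... with rounding" in the PAVGB and PAVGW rows means the sum is incremented by one before the shift, so ties round up. A per-lane Python sketch with a hypothetical function name; the formula (a + b + 1) >> 1 is taken from the instruction's usual operation listing rather than from this excerpt.

```python
def pavg_lane(a, b, width=8):
    """Rounded unsigned average of one lane (width 8 for PAVGB, 16 for PAVGW)."""
    mask = (1 << width) - 1
    return ((a & mask) + (b & mask) + 1) >> 1
```

The intermediate sum needs one extra bit, which Python integers provide for free; `pavg_lane(255, 255)` stays at 255 instead of overflowing.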
46. 45 r REX W 0F 45 Ir OF 40 r OF 40 r REX W 0F 40 Ir Instruction CMOVNG r32 r m32 CMOVNG r64 r m64 CMOVNGE r16 r m16 CMOVNGE r32 r m32 CMOVNGE r64 r m64 CMOVNL r16 r m16 CMOVNL r32 r m32 CMOVNL r64 r m64 CMOVNLE r16 r m16 CMOVNLE r32 r m32 CMOVNLE r64 r m64 CMOVNO r16 r m16 CMOVNO r32 r m32 CMOVNO r64 r m64 CMOVNP r16 r m16 CMOVNP r32 r m32 CMOVNP r64 r m64 CMOVNS r16 r m16 CMOVNS r32 r m32 CMOVNS r64 r m64 CMOVNZ r16 r m16 CMOVNZ r32 r m32 CMOVNZ r64 r m64 CMOVO r16 r m16 CMOVO r32 r m32 CMOVO r64 r m64 Op En A gt gt 64 Bit Mode Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Compat Leg Mode Valid N E Valid Valid N E Valid Valid N E Valid Valid N E Valid Valid N E Valid Valid N E Valid Valid N E Valid Valid N E Valid Valid N E Intel 64 and IA 32 Architectures Software Developer s Manual Documentation Changes Description Move if not greater ZF 1 5 5 OF Move if not greater ZF 1 or 5 5 OF Move if not greater or equal SFz OF Move if not greater or equal SFz OF Move if not greater or equal SFz OF Move if not less SF OF Move if not less SF OF Move if not less SF OF Move if not
47. A PDE is selected using the physical address defined as follows. Table 4-15. Format of an IA-32e Page-Directory-Pointer-Table Entry (PDPTE) that References a Page Directory. Bit Position(s): Contents
0 (P): Present; must be 1 to reference a page directory
1 (R/W): Read/write; if 0, writes may not be allowed to the 1-GByte region controlled by this entry (depends on CPL and CR0.WP; see Section 4.6)
2 (U/S): User/supervisor; if 0, accesses with CPL = 3 are not allowed to the 1-GByte region controlled by this entry (see Section 4.6)
3 (PWT): Page-level write-through; indirectly determines the memory type used to access the page directory referenced by this entry (see Section 4.9)
4 (PCD): Page-level cache disable; indirectly determines the memory type used to access the page directory referenced by this entry (see Section 4.9)
5 (A): Accessed; indicates whether this entry has been used for linear-address translation (see Section 4.8)
6: Ignored
7 (PS): Page size; must be 0 (otherwise, this entry maps a 1-GByte page; see Table 4-14)
11:8: Ignored
(M-1):12: Physical address of 4-KByte aligned page directory referenced by this entry
51:M: Reserved (must be 0)
62:52: Ignored
63 (XD): If IA32_EFER.NXE = 1, execute-disable (if 1, instruction fetches are not allowed from the 1-GByte region controlled by this entry; see Section 4.6); otherwise, reserved (must be 0)
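The bit layout in Table 4-15 can be turned into a small decoder. This Python sketch is illustrative only: the function name and the dictionary keys are hypothetical, and the default MAXPHYADDR of 36 is an arbitrary assumption (the real value M comes from CPUID).

```python
def decode_pdpte(entry, maxphyaddr=36):
    """Split a 64-bit PDPTE into the fields listed in Table 4-15."""
    def bit(n):
        return bool((entry >> n) & 1)

    phys_mask = (1 << maxphyaddr) - 1
    return {
        "present": bit(0),
        "writable": bit(1),
        "user": bit(2),
        "pwt": bit(3),
        "pcd": bit(4),
        "accessed": bit(5),
        "ps": bit(7),                                   # must be 0 here
        "page_directory": entry & phys_mask & ~0xFFF,   # bits (M-1):12
        # Bits 51:M are reserved and must be zero.
        "reserved_clear": (entry & ((1 << 52) - 1) & ~phys_mask) == 0,
        "xd": bit(63),
    }


fields = decode_pdpte(0x8000000012345007)
```

For the sample entry, the low three bits mark it present, writable, and user-accessible, the page directory sits at 12345000H, and bit 63 sets execute-disable.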
48. ANDRAX imm32 Valid N E RAX AND imm32 sign extended to 64 bits 80 4 ib ANDr m8 imm8 B Valid Valid r m8 AND imm8 REX 80 4ib AND r m8 imm8 B Valid N E r m64 AND imm8 sign extended 81 4 iw AND r m16 B Valid Valid r m16 AND imm16 imm16 81 4 id AND r m32 B Valid Valid r m32 AND imm32 imm32 REXW 81 4 ANDr m64 B Valid N E r m64 AND imm32 sign id imm32 extended to 64 bits Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes 16 Documentation Changes Opcode Instruction Op 64 bit Compat Description En Mode Leg Mode 83 4 ib AND r m16 imm8 Valid Valid r m16 AND imme sign extended 83 4 ib AND r m32 imm8 Valid Valid r m32 AND imme sign extended REX W 83 4 ANDr m64 imm8 Valid N E r m64 AND imm8 sign ib extended 20 r AND r m8 r8 A Valid Valid r m8 AND r8 REX 20 r ANDr m8 r8 A Valid r m64 AND r8 sign extended 21 r AND r m16 r16 A Valid Valid r m16 AND r16 21 r AND r m32 r32 A Valid Valid r m32 AND r32 REXW 21 r ANDr m64 r64 A Valid N E r m64 AND r32 22 AND r8 r m8 A Valid Valid r8 AND r m8 REX 22 r ANDr8 r m8 A Valid N E r m64 AND r8 sign extended 23 16 16 A Valid Valid r16 AND r m16 23 AND r32 r m32 A Valid Valid r32 AND r m32 REX W 23 r ANDr64 r m64 A Valid N E r64 AND r m64 NOTES In 64 bit mode r m8 can not be encoded to access the following byte registers if a
49. Address Register Name Bit Description Hex Dec 1C8H 456 MSR_LBR_SELECT Core Last Branch Record Filtering Select Register R W see Section 16 6 2 Filtering of Last Branch Records 3BOH 960 MSR_UNCORE_PM Package See Section 30 6 2 2 Uncore Performance co Event Configuration Facility 3B1H 961 MSR_UNCORE_PM Package See Section 30 6 2 2 Uncore Performance Event Configuration Facility 3B2H 962 MSR_UNCORE_PM Package See Section 30 6 2 2 Uncore Performance C2 Event Configuration Facility 3B3H 963 MSR UNCORE Package See Section 30 6 2 2 Uncore Performance C3 Event Configuration Facility 3B4H 964 MSR_UNCORE_PM Package See Section 30 6 2 2 Uncore Performance C4 Event Configuration Facility 3B5H 965 MSR UNCORE Package See Section 30 6 22 Uncore Performance C5 Event Configuration Facility 3B6H 966 MSR UNCORE Package See Section 30 6 22 Uncore Performance C6 Event Configuration Facility 3B7H 967 MSR UNCORE Package See Section 30 6 2 2 Uncore Performance C7 Event Configuration Facility 3C0H 944 MSR_UNCORE PE Package See Section 30 6 22 Uncore Performance RFEVTSELO Event Configuration Facility 3C1H 945 MSR_UNCORE PE Package See Section 30 6 2 2 Uncore Performance RFEVTSEL1 Event Configuration Facility 3C2H 946 MSR_UNCORE PE Package See Section 30 6 2 2 Uncore Performance RFEVTSEL2 Event Configuration Facili
50. B7 MOVZX r64 Ir r m16 NOTES n 64 bit mode r m8 can not be encoded to access the following byte registers if the REX prefix is used AH BH CH DH Instruction Operand Encoding Op En A Operand 1 ModRM reg w Operand 3 Operand 4 NA NA MPSADBW Compute Multiple Packed Sums of Absolute Difference Opcode Instruction Description Leg Mode 66 OF 42 MPSADBW 1 Sums absolute 8 bit integer ib xmm2 m128 difference of adjacent imm8 groups of 4 byte integers in xmm1 and xmm2 m128 and writes the results in xmm1 Starting offsets within xmm1 and xmm2 m128 are determined by imm8 Instruction Operand Encoding Op En Operand 1 Operand 3 Operand 4 A ModRM reg r w imm8 NA Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes chenes intel MUL Unsigned Multiply Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode 4 MUL r m8 A Valid Valid Unsigned multiply AX lt AL r m8 REX 4 F6 4 MUL r m8 A Valid N E Unsigned multiply AX AL r m8 F7 4 MUL r m16 A Valid Valid Unsigned multiply DX AX 16 7 4 MUL r m32 A Valid Valid Unsigned multiply EDX EAX lt EAX 32 REX W F7 4 r m64 A Valid N E Unsigned multiply RDX RAX lt r m64 NOTES n 64 bit mode r m8 can not be encoded to access the following byte registers if a
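The MUL rows all follow one pattern: the accumulator is multiplied by the operand and the double-width product is split across a high and a low register (AH:AL for bytes, DX:AX, EDX:EAX, RDX:RAX). A minimal Python sketch with a hypothetical function name:

```python
def mul(acc, src, width):
    """Unsigned widening multiply; returns (high_half, low_half).

    width=8 models AX <- AL * r/m8 as (AH, AL);
    width=16, 32, 64 model DX:AX, EDX:EAX, RDX:RAX.
    """
    mask = (1 << width) - 1
    product = (acc & mask) * (src & mask)
    return product >> width, product & mask
```

For example, FFFFH times FFFFH at width 16 gives DX = FFFEH and AX = 0001H.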
51. C MSRs except for the ICR are reserved Table 10 6 Local APIC Register Address Map Supported by x2APIC MSR Address MMIO Offset Reqist r Name MSR R W Comments 2 mode xAPIC mode 9 Semantics 802H 020H Local APICID register Read only See Section 10 12 5 1 for initial values 803H 030H Local APIC Version Read only Same version used in register mode x2APIC mode 808H 080H Task Priority Register Read write Bits 31 8 are reserved TPR 80AH 0 Processor Priority Read only Register PPR 80BH EOI register Write WRMSR of a non zero only value causes GP 0 80DH Logical Destination Read only Read write in xAPIC Register LDR mode 80FH OFOH Spurious Interrupt Read write See Section 10 9 for Vector Register SVR reserved bits 810H 100H In Service Register Read only ISR bits 31 0 811H 110H ISR bits 63 32 Read only 812H 120H ISR bits 95 64 Read only 813H 130H ISR bits 127 96 Read only 814H 140H ISR bits 159 128 Read only Intel 64 and 1 32 Architectures Software Developer s Manual Documentation Changes 252 Documentation Changes MSR Address MMIO Offset Register Nana MSR R W Comments x2APIC mode xAPIC mode 9 Semantics 815H 150H ISR bits 191 160 Read only 816H 160H ISR bits 223 192 R
52. CF 0 and 7 0 REXW 0F47 CMOVAr64 r m64 Valid N E Move if above CF 0 and Ir 7 0 OF 43 r r16 r m16 A Valid Valid Move if above or equal CF 0 OF 43 r CMOVAE r32 r m32 A Valid Valid Move if above or equal CF 0 REX W 0F 43 CMOVAEr64 r m64 Valid N E Move if above or equal 0 OF 42 r CMOVB r16 r m16 A Valid Valid Move if below CF 1 OF 42 r CMOVB r32 r m32 A Valid Valid Move if below CF 1 REXW 0F 42 CMOVBr64 r m64 A Valid Move if below CF 1 OF 46 r CMOVBE r16 r m16 A Valid Valid Move if below or equal CF 1 or ZF21 OF 46 r CMOVBE r32 r m32 Valid Valid Move if below or equal CF 1 or ZF 1 REX W 0F 46 CMOVBEr64 r m64 Valid N E Move if below or equal CF 1 or ZF 1 OF 42 r CMOVCr16 r m16 Valid Valid Move if carry CF 1 OF 42 r CMOVCr32 r m32 A Valid Valid Move if carry CF 1 REXW 0F 42 CMOVCr64 r m64 A Valid Move if carry CF 1 Ir OF 44 r CMOVEr16 r m16 Valid Valid Move if equal 2 1 OF 44 r CMOVEr32 r m32 A Valid Valid Move if equal 2 1 REXW 0F 44 CMOVEr64 r m64 A Valid Move if equal 2 1 Ir OF 4F r CMOVGr16 r m16 Valid Valid Move if greater ZF 0 and SF OF OF 4F r CMOVGr32 r m32 Valid Valid Move if greater ZF 0 and SF OF REX W 0F4F CMOVGr64 r m64 A Valid N E Move if greater ZF 0 and SF OF OF 4D r CMOVGE r16 r m16 A Valid Valid Move if greater or equal SF OF OF
53. Because a PDPTE is identified using bits 47:30 of the linear address, it controls access to a 1-GByte region of the linear-address space. Use of the PDPTE depends on its PS flag (bit 7). If the PDPTE's PS flag is 1, the PDPTE maps a 1-GByte page (see Table 4-14). The final physical address is computed as follows:
Table 4-14. Format of an IA-32e Page-Directory-Pointer-Table Entry (PDPTE) that Maps a 1-GByte Page
Bit Position(s): Contents
0 (P): Present; must be 1 to map a 1-GByte page.
1 (R/W): Read/write; if 0, writes may not be allowed to the 1-GByte page referenced by this entry (depends on CPL and CR0.WP; see Section 4.6).
2 (U/S): User/supervisor; if 0, accesses with CPL=3 are not allowed to the 1-GByte page referenced by this entry (see Section 4.6).
3 (PWT): Page-level write-through; indirectly determines the memory type used to access the 1-GByte page referenced by this entry (see Section 4.9).
4 (PCD): Page-level cache disable; indirectly determines the memory type used to access the 1-GByte page referenced by this entry (see Section 4.9).
5 (A): Accessed; indicates whether software has accessed the 1-GByte page referenced by this entry (see Section 4.8).
6 (D): Dirty; indicates whether software has written to the 1-GByte page referenced by this entry (see Section 4.8).
7 (PS): Page size; must be 1 (otherwise, this entry references a page directory; see Table 4-15).
8 (G): Global if CR4 PG
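The flag bits listed in the excerpt above can be read out of a raw entry with a few shifts. The following is a minimal sketch, not taken from the manual: the names `PDPTE_FLAGS`, `decode_pdpte_flags`, and `maps_1gb_page` are hypothetical, and only bits 0 through 7 are decoded because the excerpt is cut off after that point.

```python
# Flag bits of a PDPTE, using the bit positions given in Table 4-14.
# Names here are illustrative and do not come from the manual.
PDPTE_FLAGS = {
    0: "P",    # present
    1: "R/W",  # read/write
    2: "U/S",  # user/supervisor
    3: "PWT",  # page-level write-through
    4: "PCD",  # page-level cache disable
    5: "A",    # accessed
    6: "D",    # dirty
    7: "PS",   # page size
}


def decode_pdpte_flags(entry: int) -> dict:
    """Return a name -> bool map for bits 0..7 of a raw PDPTE value."""
    return {name: bool((entry >> bit) & 1) for bit, name in PDPTE_FLAGS.items()}


def maps_1gb_page(entry: int) -> bool:
    """True when the entry is present and its PS flag is 1."""
    flags = decode_pdpte_flags(entry)
    return flags["P"] and flags["PS"]
```

For instance, `maps_1gb_page(0x83)` checks an entry with P, R/W, and PS set. An entry with PS clear would instead reference a page directory, as the table notes.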
54. D2 4 SAL r m8 CL B Valid Multiply r m8 by 2 CL times CO 4 ib SAL r m8 imm8 C Valid Valid Multiply r m8 by 2 imm8 times REX C0 4 ib SAL r m8 imm8 C Valid N E Multiply r m8 by 2 imm8 times D1 4 SAL r m16 1 A Valid Valid Multiply r m16 by 2 once D3 4 SAL r m16 CL B Valid Valid Multiply r m16 by 2 CL times 4 ib SAL r m16 imm8 Valid Valid Multiply r m16 by 2 imm8 times D1 4 SAL r m32 1 A Valid Valid Multiply r m32 by 2 once REXW D1 4 SALr m64 1 A Valid N E Multiply r m64 by 2 once D3 4 SAL r m32 CL B Valid Valid Multiply r m32 by 2 CL times REXW D3 4 SAL r m64 CL B Valid N E Multiply r m64 by 2 CL times 4 ib SAL r m32 imm8 C Valid Valid Multiply r m32 by 2 imm8 times REXW C1 4 SALr m64 imm8 Valid N E Multiply r m64 by 2 imm8 ib times 00 7 SAR r m8 1 A Valid Valid Signed divide r m8 by 2 once REX 00 7 SAR r m8 1 A Valid N E Signed divide r m8 by 2 once D2 7 SAR r m8 CL B Valid Valid Signed divide r m8 by 2 CL times REX D2 7 SAR r m8 CL B Valid N E Signed divide r m8 by 2 CL times CO 7 ib SAR r m8 imm8 Valid Valid Signed divide r m8 by 2 imm8 time REX C0 7 ib SAR r m8 imm8 Valid N E Signed divide r m8 by 2 Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes imm8 times 176 Documentation Changes Opcode D1 7 03 7 C1 7 ib D1 7 REX W 1 7 D3 7
55. Documentation Changes PSHUFHW Shuffle Packed High Words Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode F30F70 rib PSHUFHW 1 A Valid Valid Shuffle the high words in xmm2 m128 xmm2 m128 based on the imm8 encoding in imm8 and store the result in xmml1 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg w ModRM r m r imm8 NA PSHUFLW Shuffle Packed Low Words Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode F2 OF 70 r ib PSHUFLW 1 A Valid Valid Shuffle the low words in xmm2 m128 xmm2 m128 based on the imm8 encoding in imm8 and store the result in xmml1 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg w ModRM r m r imm8 NA PSHUFW Shuffle Packed Words Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode OF 70 r ib PSHUFW mm1 A Valid Valid Shuffle the words in mm2 m64 imm8 mm2 m64 based on the encoding in imm8 and store the result in mm1 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg w ModRM r m r imm8 NA Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes 148 Documentation Changes PSIGNB PSIGNW PSIGND Packed SIGN ntel Op 64 Compat Opcode Instruction En Mode Leg Mode Description OF 38 08 r PSIG
56. ES E DI For 64 bit mode move word at address R E SI to RIE DI 5 MOVSD A Valid Valid For legacy mode move dword from address DS E SI to ES E DI For 64 bit mode move dword from address R E SI to R E DI REX W 5 MOVSQ A Valid Move qword from address R E SI to RIE DI Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 NA NA NA NA Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes 102 chenes intel MOVSD Move Scalar Double Precision Floating Point Value Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode F2 OF 10 r MOVSD xmm1 A Valid Valid Move scalar double xmm2 m64 precision floating point value from xmm2 m64 to xmm1 register F2 OF 11 r MOVSD B Valid Valid Move scalar double xmm2 m64 precision floating point 1 value from xmm1 register to xmm2 m64 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg w ModRM r m r NA NA B ModRM r m w ModRM reg r NA NA MOVSHDUP Move Packed Single FP High and Duplicate Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode 16 r MOVSHDUP A Valid Valid Move two single precision 1 floating point values from xmm2 m128 the higher 32 bit operand of each qword in xmm2 m128 to xmm1 and duplicate each 32 bit operand to the lower 32 bits of ea
57. Error; Receive Checksum Error; Send Checksum Error. Address: FEE0 0280H. Value after reset: 0H. NOTES: 1. Used in Intel Core, Pentium 4, Intel Xeon, and P6 family processors; reserved in the Pentium processor. 2. Only used in the P6 family and Pentium processors; reserved in Intel Core, Pentium 4, and Intel Xeon processors. Figure 10-9. Error Status Register (ESR).
Table 10-2. ESR Flags (Flag: Function)
Send Checksum Error: (P6 family and Pentium processors only) Set when the local APIC detects a checksum error for a message that it sent on the APIC bus.
Receive Checksum Error: (P6 family and Pentium processors only) Set when the local APIC detects a checksum error for a message that it received on the APIC bus.
Send Accept Error: (P6 family and Pentium processors only) Set when the local APIC detects that a message it sent was not accepted by any APIC on the APIC bus.
Receive Accept Error: (P6 family and Pentium processors only) Set when the local APIC detects that the message it received was not accepted by any APIC on the APIC bus, including itself.
58. FEEO 0090H Arbitration Priority Register APR Read Only 0 00A0H Processor Priority Register PPR Read Only 0 00 0 EOI Register Write Only FEEO 00 0 Remote Read Register RRD Read Only FEEO 00DOH Logical Destination Register Read Write FEEO OOEOH Destination Format Register Read Write see Section 10 6 2 2 FEEO OOFOH Spurious Interrupt Vector Register Read Write see Section 10 9 FEEO 0100H In Service Register ISR bits 31 0 Read Only FEEO 0110H In Service Register ISR bits 63 32 Read Only FEEO 0120H In Service Register ISR bits 95 64 Read Only FEEO 0130H In Service Register ISR bits 127 96 Read Only FEEO 0140H In Service Register ISR bits 159 128 Read Only FEEO 0150H In Service Register ISR bits 191 160 Read Only FEEO 0160H In Service Register ISR bits 223 192 Read Only FEEO 0170H In Service Register ISR bits 255 224 Read Only 0 0180H Trigger Mode Register TMR bits 31 0 Read Only 0 0190H Trigger Mode Register TMR bits 63 32 Read Only FEEO 01A0H Trigger Mode Register TMR bits 95 64 Read Only FEEO 01B0H Trigger Mode Register TMR bits 127 96 Read Only FEEO 01 0 Trigger Mode Register TMR bits 159 128 Read Only FEEO 01D0H Trigger Mode Register TMR bits 191 160 Read Only FEEO 01E0H Trigger Mode Register TMR bits 223 192 Read Only Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes 240 Documentation Ch
59. Intel Atom, Intel Core Duo, Pentium M, Pentium 4, Intel Xeon, and P6 family processors provide bus control signals that permit external memory subsystems to make split accesses atomic; however, nonaligned data accesses will seriously impact the performance of the processor and should be avoided. An x87 instruction or an SSE instruction that accesses data larger than a quadword may be implemented using multiple memory accesses. If such an instruction stores to memory, some of the accesses may complete (writing to memory) while another causes the operation to fault for architectural reasons (e.g., due to a page-table entry that is marked "not present"). In this case, the effects of the completed accesses may be visible to software even though the overall instruction caused a fault. If TLB invalidation has been delayed (see Section 4.10.3.4), such page faults may occur even if all accesses are to the same page. 8.1.2.1 Automatic Locking. The operations on which the processor automatically follows the LOCK semantics are as follows: When executing an XCHG instruction that references memory. When setting the B (busy) flag of a TSS descriptor: the processor tests and sets the busy flag in the type field of the TSS descriptor when switching to a task. To ensure that two processors do not switch to the same task simultaneously, the processor follows the LOCK semantics while testing and setting this flag. 8.1.2.2 Software Controlled Bus Locking
60. MOVSXD Move with Sign Extension Opcode Instruction Op 64 Bit En Mode OF BE r MOVSX r16 r m8 A Valid OF BE r MOVSX r32 r m8 Valid REX OF BE r OF BF r MOVSX r32 r m16 REX W 0F BF MOVSX r64 r m16 REX W 63 r MOVSXD r64 r m32 MOVSX r64 r m8 A Valid A Valid A Valid A Valid Compat Leg Mode Valid Valid N E Valid N E N E Description Move byte to word with Sign extension Move byte to doublew ord with sign extension Move byte to quadword with sign extension Move word to doubleword with sign extension Move word to quadword with sign extension Move doubleword to quadword with sign extension NOTES n 64 bit mode r m8 can not be encoded to access the following byte registers if a REX prefix is used AH BH CH DH The use of MOVSXD without REX W in 64 bit mode is discouraged Regular MOV should be used instead of using MOVSXD without REX W Instruction Operand Encoding Op En A ModRM reg w Operand 1 Operand 2 ModRM r m Operand 3 Operand 4 NA NA Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes 105 chenes intel MOVUPD Move Unaligned Packed Double Precision Floating Point Values Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode 66 OF 10 r MOVUPDxmml A Valid Valid Move packed double xmm2 m128 precision floa
61. Mode
0F 09 | WBINVD | A | Valid | Valid | Write back and flush internal caches; initiate writing back and flushing of external caches.
Instruction Operand Encoding: Op/En A: Operand 1 NA, Operand 2 NA, Operand 3 NA, Operand 4 NA.

WRMSR: Write to Model Specific Register
Opcode | Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description
0F 30 | WRMSR | A | Valid | Valid | Write the value in EDX:EAX to MSR specified by ECX.
Instruction Operand Encoding: Op/En A: Operand 1 NA, Operand 2 NA, Operand 3 NA, Operand 4 NA.

XADD: Exchange and Add
Opcode | Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description
0F C0 /r | XADD r/m8, r8 | A | Valid | Valid | Exchange r8 and r/m8; load sum into r/m8.
REX + 0F C0 /r | XADD r/m8, r8 | A | Valid | N.E. | Exchange r8 and r/m8; load sum into r/m8.
0F C1 /r | XADD r/m16, r16 | A | Valid | Valid | Exchange r16 and r/m16; load sum into r/m16.
0F C1 /r | XADD r/m32, r32 | A | Valid | Valid | Exchange r32 and r/m32; load sum into r/m32.
REX.W + 0F C1 /r | XADD r/m64, r64 | A | Valid | N.E. | Exchange r64 and r/m64; load sum into r/m64.
NOTES: In 64-bit mode, r/m8 can not be encoded to access the following byte registers if a REX prefix is used: AH, BH, CH, DH.
Instruction Operand Encoding: Op/En A: Operand 1 ModRM:r/m (r, w), Operand 2 ModRM:reg (r), Operand 3 NA, Operand 4 NA.
62. OF AC SHRD r m32 r32 Valid Valid Shift r m32 to right imm8 imm8 places while shifting bits from r32 in from the left REX W 0F AC SHRDr m64 r64 Valid N E Shift r m64 to right imm8 imm8 places while shifting bits from r64 in from the left OF AD SHRD r m32 r32 Valid Valid Shift r m32 to right CL CL places while shifting bits from r32 in from the left REX W 0FAD SHRDr m64 r64 Valid Shift r m64 to right CL CL places while shifting bits from r64 in from the left Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM r m w ModRM reg imm8 NA B ModRM r m w ModRM reg r CL NA Intel 64 and IA 32 Architectures Software Developer s Manual Documentation Changes 185 Documentation Changes SHUFPD Shuffle Packed Double Precision Floating Point Values Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode 66 OF rib SHUFPDxmml1 A Valid Valid Shuffle packed double xmm2 m128 precision floating point imm8 values selected by imm8 from xmm1 and xmm2 m128 to xmm1 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg r w ModRM r m r imm8 NA SHUFPS Shuffle Packed Single Precision Floating Point Values Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode OF C6 r ib SHUFPS xmm1 A Valid Valid Shuffle packed single xmm2 m128 precision floating point imm8 values se
63. Operand 3 Operand 4 A ModRM r m r NA NA NA XSAVE Save Processor Extended States Opcode Instruction Op 64 Bit Compat Description En Leg Mode OF AE 4 XSAVE mem A Valid Valid Save processor extended EDX EAX states to memory The states are specified by Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM r m NA NA NA XSETBV Set Extended Control Register Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode OF 01 D1 XSETBV A Valid Valid Write the value in EDX EAX to the XCR specified by ECX Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 A NA NA NA Operand 4 NA Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes 207 ee ehanas intel 3 Updates to Chapter 4 Volume 3A Change bars show changes to Chapter 4 of the Intel 64 and 32 Architectures Soft ware Developer s Manual Volume 3A System Programming Guide Part 1 Table 4 1 illustrates the key differences between the three paging modes Table 4 1 Properties of Different Paging Modes Linear Physical Supports Paging cro pG cna pag MEIN Address Address Page Execute Mode IA32_EFER Width width 5266 Disable None 0 N A N A 32 32 N A No p 2 3 4 KByte 32 bit 1 0 0 32 Up to 40 4 MByte No 4 KByte 5 PAE 1 1 0 32 Up to 52
64. Operand 3 Operand 4 A ModRM reg w ModRM r m MULSS Multiply Scalar Single Precision Floating Point Values Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode 59 r MULSS xmm1 A Valid Valid Multiply the low single xmm2 m32 precision floating point value in xmm2 mem by the low single precision floating point value in xmm1 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg r w ModRM r m r NA NA Intel 64 and IA 32 Architectures Software Developer s Manual Documentation Changes 109 Documentation Changes MWAIT Monitor Wait intel Opcode Instruction Op 64 Bit Compat Description En Leg Mode OF 01 C9 MWAIT A Valid Valid A hint that allow the processor to stop instruction execution and enter an implementation dependent optimized state until occurrence of a class of events Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A NA NA NA NA Updates to Chapter 4 Volume 2B Change bars show changes to Chapter 4 of the Intel 64 and 32 Architectures Soft ware Developer s Manual Volume 2B Instruction Set Reference N Z NEG Two s Complement Negation Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode F6 3 NEG r m8 A Valid Valid Two s complement negate r m8 REX F6 3 NEG r m8 A Valid N E Two s complement
65. PSADBW 1 A Valid Valid Computes the absolute xmm2 m128 differences of the packed unsigned byte integers from xmm2 m128 and xmm1 the 8 low differences and 8 high differences are then summed separately to produce two unsigned word integer results Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes 146 H intel Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg r w ModRM r m r NA NA PSHUFB Packed Shuffle Bytes Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode OF 38 00 r PSHUFB mm1 A Valid Valid Shuffle bytes in mm1 mm2 m64 according to contents of mm2 m64 660F3800 r PSHUFBxmml Valid Valid Shuffle bytes in xmm1 xmm2 m128 according to contents of xmm2 m128 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg r w ModRM r m NA NA PSHUFD Shuffle Packed Doublewords Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode 66 0F 70 rib PSHUFD xmm1 A Valid Valid Shuffle the doublewords in xmm2 m128 xmm2 m128 based on the imm8 encoding in imm8 and store the result in xmm1 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg w ModRM r m r imm8 NA Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes 147
66. RAX AC LODSB A Valid Valid For legacy mode Load byte at address DS E SI into AL For 64 bit mode load byte at address R SI into AL AD LODSW A Valid Valid For legacy mode Load word at address DS E SI into AX For 64 bit mode load word at address R SI into AX AD LODSD A Valid Valid For legacy mode Load dword at address DS E SI into EAX For 64 bit mode load dword at address R SI into EAX REX W AD LODSQ A Valid Load at address R SI into RAX Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 NA NA NA Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes 81 chenes intel LOOP LOOP cc Loop According to ECX Counter Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode E2 cb LOOP rel8 A Valid Valid Decrement count jump short if count 0 E1 cb LOOPE rel8 A Valid Valid Decrement count jump short if count 0 and ZF 1 0 cb LOOPNE rel8 A Valid Valid Decrement count jump short if count 0 and ZF 0 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A Offset NA NA NA LSL Load Segment Limit Opcode Instruction Op 64 Bit Compat Description En Leg Mode OF 03 r LSL r16 r16 m16 A Valid Valid Load r16 lt segment limit selector r16 m16 OF 03 r LSL r32 r32 m16 A Valid Valid Load r32 lt segment
67. REX prefix is used AH BH CH DH Instruction Operand Encoding Op En Operand 1 A ModRM reg r w B ModRM r m r w AL AX EAX RAX Operand 2 ModRM r m imm8 imm8 Operand 3 Operand 4 NA NA NA NA NA NA Intel 64 and IA 32 Architectures Software Developer s Manual Documentation Changes 17 Documentation Changes ANDPD Bitwise Logical AND of Packed Double Precision Floating Point Values Opcode Instruction Op 64 bit Compat Description En Mode Leg Mode 66 OF 54 Jr ANDPD xmm1 A Valid Valid Bitwise logical AND of xmm2 m128 xmm2 m128 and xmm1 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg r w ModRM r m r NA NA ANDPS Bitwise Logical AND of Packed Single Precision Floating Point Values Opcode Instruction Op 64 bit Compat Description En Mode Leg Mode OF 54 r ANDPS 1 A Valid Valid Bitwise logical AND of xmm2 m128 xmm2 m128 and xmm1 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg r w ModRM r m r NA NA ANDNPD Bitwise Logical AND NOT of Packed Double Precision Floating Point Values Opcode Instruction Op 64 bit Compat Description En Mode Leg Mode 66 OF 55 r ANDNPDxmml A Valid Valid Bitwise logical AND NOT of xmm2 m128 xmm2 m128 and xmm1 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg r w
68. REX prefix is used AH BH CH DH Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM r m r NA NA NA MULPD Multiply Packed Double Precision Floating Point Values Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode 66 OF 59 r MULPD xmm1 A Valid Valid Multiply packed double xmm2 m128 precision floating point values xmm2 m128 by 1 Instruction Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg r w ModRM r m r NA NA Intel 64 and IA 32 Architectures Software Developer s Manual Documentation Changes 108 chenes intel MULPS Multiply Packed Single Precision Floating Point Values Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode OF 59 Jr MULPS xmm1 A Valid Valid Multiply packed single xmm2 m128 precision floating point values in xmm2 mem by xmm1 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg r w ModRM r m r NA NA MULSD Multiply Scalar Double Precision Floating Point Values Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode F2 0F 59 r MULSD xmm1 A Valid Valid Multiply the low double xmm2 m64 precision floating point value in xmm2 mem64 by low double precision floating point value in 1 Instruction Operand Encoding Op En Operand 1 Operand 2
69. This layout enables 2^16 - 1 clusters, each with up to 16 unique logical IDs, effectively providing an addressability of (2^20 - 16) processors in logical destination mode. It is likely that processor implementations may choose to support less than 16 bits of the cluster ID or less than 16 bits of the Logical ID in the Logical Destination Register. However, system software should be agnostic to the number of bits implemented in the cluster ID and logical ID sub-fields. The x2APIC hardware initialization will ensure that the appropriately initialized logical x2APIC IDs are available to system software and that reads of non-implemented bits return zero. This is a read-only register that software must read to determine the logical x2APIC ID of the processor. Specifically, software can apply a 16-bit mask to the lowest 16 bits of the logical x2APIC ID to identify the logical address of a processor within a cluster, without needing to know the number of implemented bits in the cluster ID and Logical ID sub-fields. Similarly, software can create a message destination address for the cluster model by bit-ORing the Logical x2APIC ID (31:0) of processors that have matching Cluster ID (31:16). To enable cluster ID assignment in a fashion that matches the system topology characteristics, and to enable efficient routing of logical-mode lowest-priority device interrupts in link-based platform interconnects, the LDR is initialized by hardware based on the value of x2APIC ID up
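The bit-ORing step described above can be sketched as follows. This is an illustrative sketch only: the function name `cluster_destination` is hypothetical, and the input is assumed to be a list of 32-bit logical x2APIC ID values that software has already read from each processor's LDR.

```python
def cluster_destination(logical_ids):
    """OR together logical x2APIC IDs that share one Cluster ID (bits 31:16).

    Raises ValueError if the list is empty or the IDs span several clusters,
    since a cluster-model destination addresses a single cluster.
    """
    ids = [ldr & 0xFFFFFFFF for ldr in logical_ids]
    if not ids:
        raise ValueError("at least one logical x2APIC ID is required")
    cluster = ids[0] >> 16
    destination = 0
    for ldr in ids:
        if (ldr >> 16) != cluster:
            raise ValueError("logical IDs belong to different clusters")
        destination |= ldr
    return destination
```

Because each processor contributes its own bits in the low 16-bit field, ORing the values of processors in the same cluster yields a destination that selects all of them at once. The addressability figure in the text follows from the same layout: (2^16 - 1) clusters times 16 logical IDs equals 2^20 - 16.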
70. Valid Valid N E Valid Valid N E Valid Valid N E Valid Valid N E Valid Valid N E Valid Valid N E Valid Valid N E Valid Intel 64 and IA 32 Architectures Software Developer s Manual Documentation Changes Description Move if less SFzz OF Move if less SF OF Move if less SF OF Move if less or equal ZF 1 SFz OF Move if less or equal ZF 1 or SFz OF Move if less or equal ZF 1 or SFz OF Move if not above CF 1 or ZF 1 Move if not above CF 1 or ZF 1 Move if not above CF 1 or ZF 1 Move if not above or equal 1 Move if not above or equal 1 Move if not above or equal 1 Move if not below CF 0 Move if not below 0 Move if not below CF 20 Move if not below or equal CF 0 and ZF 0 Move if not below or equal CF 0 and ZF 0 Move if not below or equal CF 0 and ZF 0 Move if not carry CF 0 Move if not carry CF 0 Move if not carry CF 0 Move if not equal ZF 0 Move if not equal ZF 0 Move if not equal ZF 0 Move if not greater ZF 1 or 5 5 OF 31 Documentation Changes Opcode OF 4E r REX W 0F 4E OF 4C r OF 4C r REX W 0F 4C OF 4D r OF 4D r REX W 0F 4D Ir OF 4F r OF 4F r REX W OF OF 41 r OF 41 r REX W 0F 41 Ir OF 4B r OF 4B r REX W 4B Ir OF 49 r OF 49 r REX W 0F 49 Ir OF 45 r OF
71. Valid Subtract signed packed mm m64 bytes in mm m64 from signed packed bytes in mm and saturate results 66 OF E8 r PSUBSB xmml Valid Valid Subtract packed signed byte xmm2 m128 integers in xmm2 m128 from packed signed byte integers in xmm1 and saturate results OF E9 r PSUBSW mm A Valid Valid Subtract signed packed mm m64 words in mm m64 from signed packed words in mm and saturate results Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes 154 Documentation Changes Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode 66 OF E9 r PSUBSW 1 A Valid Valid Subtract packed signed xmm2 m128 word integers in xmm2 m128 from packed signed word integers in 1 and saturate results Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg w ModRM r m NA NA PSUBUSB PSUBUSW Subtract Packed Unsigned Integers with Unsigned Saturation Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode OF D8 r PSUBUSB mm A Valid Valid Subtract unsigned packed mm m64 bytes in mm m64 from unsigned packed bytes in mm and saturate result 66 OF D8 r PSUBUSB 1 Valid Valid Subtract packed unsigned xmm2 m128 byte integers in xmm2 m128 from packed unsigned byte integers in xmm1 and saturate result OF D9 r PSUBUSW mm A Valid Valid Subtract unsigned packed mm m64 words in mm m64 from
72. address causes a page-fault exception (see Section 4.7). In the examples above, a paging-structure entry maps a page with a 4-KByte page frame when only 12 bits remain in the linear address; entries identified earlier always reference other paging structures. That may not apply in other cases. The following items identify when an entry maps a page and when it references another paging structure: If more than 12 bits remain in the linear address, bit 7 (PS, page size) of the current paging-structure entry is consulted. If the bit is 0, the entry references another paging structure; if the bit is 1, the entry maps a page. If only 12 bits remain in the linear address, the current paging-structure entry always maps a page (bit 7 is used for other purposes). If a paging-structure entry maps a page when more than 12 bits remain in the linear address, the entry identifies a page frame larger than 4 KBytes. For example, 32-bit paging uses the upper 10 bits of a linear address to locate the first paging-structure entry; 22 bits remain. If that entry maps a page, the page frame is 2^22 Bytes = 4 MBytes. 32-bit paging supports 4-MByte pages if CR4.PSE = 1. PAE paging and IA-32e paging support 2-MByte pages (regardless of the value of CR4.PSE). IA-32e paging may support 1-GByte pages (see Section 4.1.4). Paging structures are given diffe
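The rule above (consult bit 7 only when more than 12 bits remain, and size the page frame by the number of remaining bits) can be expressed as a short sketch. The helper names `entry_maps_page` and `page_frame_bytes` are hypothetical, and the sketch ignores the CR4.PSE and feature checks that decide whether large pages are supported at all.

```python
PS_BIT = 1 << 7  # bit 7 (PS, page size) of a paging-structure entry


def entry_maps_page(entry: int, remaining_bits: int) -> bool:
    """Decide whether an entry maps a page or references another structure."""
    if remaining_bits < 12:
        raise ValueError("at least 12 linear-address bits must remain")
    if remaining_bits == 12:
        return True  # always maps a page; bit 7 has another meaning here
    return bool(entry & PS_BIT)


def page_frame_bytes(remaining_bits: int) -> int:
    """Size of the page frame when an entry maps a page at this level."""
    if remaining_bits < 12:
        raise ValueError("at least 12 linear-address bits must remain")
    return 1 << remaining_bits
```

With 22 bits remaining, as in the 32-bit paging example, `page_frame_bytes(22)` gives 2^22 bytes, i.e. 4 MBytes; with 12 bits remaining it gives the usual 4 KBytes.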
73. are supported 0 NONE 40000000 00000000H RSPFWDI 40001A00 00000000H RSPFWDS 40001D00 00000000H RSPIWB Match opcode address by writing MSR 396H with mask supported mask value Intel 64 and 1 32 Architectures Software Developer s Manual Documentation Changes 279 Documentation Changes 35H 02H UNC_ADDR_OPCODE Counts number of requests fromthe Match opcode _MATCH REMOTE remote socket address opcode of address by request is qualified by mask value writing MSR written to MSR 396H The following 396H with mask values are supported mask 0 NONE supported 40000000 00000000HRsPFwp mask value 40001A00_00000000H RSPFWDS 40001D00_00000000H RSPIWB 35H 04H UNC_ADDR_OPCODE Counts number of requests fromthe Match opcode _MATCH LOCAL local socket address opcode of address by request is qualified by mask value writing MSR written to MSR 396H The following 396H with mask values are supported mask 0 NONE supported 40000000 00000000H RSPFWo 7955 value 40001A00_00000000H RSPFWDS 40001D00_00000000H RSPIWB 42H 01H UNC QPI TX HEADE Number of cycles that the header R FULL LINK 0 buffer in the Quickpath Interface outbound link 0 is full 42H 04H UNC TX HEADE Number of cycles that the header R FULL LINK 1 buffer in the Quickpath Interface outbound link 1 is full 67H 01H DRAM THERM Uncore cycles DRAM was throttled AL_THROTTLED due to its temperature
74. at the end of the next instruction Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A NA NA NA NA STMXCSR Store MXCSR Register State Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode OF AE 3 STMXCSR m32 A Valid Valid Store contents of MXCSR register to m32 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM r m w NA NA NA STOS STOSB STOSW STOSD STOSQ Store String Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode AA STOS m8 A Valid Valid For legacy mode store AL at address ES E DI For 64 bit mode store AL at address RDI or EDI AB STOS m16 A Valid Valid For legacy mode store AX at address ES E DI For 64 bit mode store AX at address RDI or EDI AB STOS m32 A Valid Valid For legacy mode store EAX at address ESXE DI For 64 bit mode store EAX at address RDI or EDI REX W AB STOS m64 A Valid N E Store RAX at address RDI or EDI Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes 190 Documentation Changes Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode AA STOSB A Valid Valid For legacy mode store AL at address ES E DI For 64 bit mode store AL at address RDI or EDI AB STOSW A Valid Valid For legacy mode store AX at address ES E DI For 64 bit mode store AX
75. by reading the ICR is the last written value. A destination ID value of FFFF_FFFFH is used for broadcast of interrupts in both logical destination and physical destination modes. 10.12.10 Determining IPI Destination in x2APIC Mode. 10.12.10.1 Logical Destination Mode in x2APIC Mode. In x2APIC mode, the Logical Destination Register (LDR) is increased to 32 bits wide. It is a read-only register to system software. This 32-bit value is referred to as the "logical x2APIC ID". System software accesses this register via the RDMSR instruction, reading the MSR at address 80DH. Figure 10-30 provides the layout of the Logical Destination Register in x2APIC mode. [Figure 10-30. Logical Destination Register in x2APIC Mode: MSR address 80DH; bits 31:0 hold the Logical x2APIC ID.] In xAPIC mode, the Destination Format Register (DFR), through the MMIO interface, determines the choice of a flat logical mode or a clustered logical mode. Flat logical mode is not supported in x2APIC mode. Hence the Destination Format Register (DFR) is eliminated in x2APIC mode. The 32-bit logical x2APIC ID field of the LDR is partitioned into two sub-fields: Cluster ID (LDR[31:16]) is the address of the destination cluster. Logical ID (LDR[15:0]) defines a logical ID of the individual local x2APIC within the cluster specified by LDR[31:16].
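The partition of the LDR into its two sub-fields amounts to a shift and a mask. A minimal sketch, assuming the 32-bit LDR value has already been obtained (for example from a RDMSR of MSR 80DH performed elsewhere); the function name `split_ldr` is hypothetical:

```python
def split_ldr(ldr: int) -> tuple:
    """Split a 32-bit logical x2APIC ID into (cluster_id, logical_id)."""
    ldr &= 0xFFFFFFFF
    cluster_id = ldr >> 16      # LDR[31:16]: address of the destination cluster
    logical_id = ldr & 0xFFFF   # LDR[15:0]: logical ID within that cluster
    return cluster_id, logical_id
```

Using fixed 16-bit fields keeps the code independent of how many bits a given processor actually implements, since non-implemented bits read as zero.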
76. continues to run at the same rate in deep C-states. If CPUID.06H:EAX.ARAT[bit 2] = 0 or if CPUID 06H is not supported, the APIC timer may temporarily stop while the processor is in deep C-states or during transitions caused by Enhanced Intel SpeedStep® Technology.
Figure 10-10. Divide Configuration Register (bits 31:4 reserved; bit 2 is 0; address FEE0; value after reset 0H). Divide Value (bits 0, 1, and 3):
000 | Divide by 2
001 | Divide by 4
010 | Divide by 8
011 | Divide by 16
100 | Divide by 32
101 | Divide by 64
110 | Divide by 128
111 | Divide by 1
10.6.1 Interrupt Command Register (ICR). The interrupt command register (ICR) is a 64-bit local APIC register (see Figure 10-12) that allows software running on the processor to specify and send interprocessor interrupts (IPIs) to other processors in the system. To send an IPI, software must set up the ICR to indicate the type of IPI message to be sent and the destination processor or processors. All fields of the ICR are read-write by software, with the exception of the delivery status field, which is read-only. The act of writing to the low doubleword of the ICR causes the IPI to be sent. Delivery Status (Read Only): Indicates the IPI delivery status, as follows: 0 (Idle): Indica
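The divide-value encoding listed for Figure 10-10 in the excerpt above can be sketched in Python as follows. This is a minimal illustration, assuming the three-bit code is formed from register bits 3, 1, and 0 in that order (most significant first); the function name is hypothetical.

```python
def apic_timer_divisor(divide_configuration: int) -> int:
    """Decode the timer divisor from a Divide Configuration Register value."""
    # Gather bits 3, 1 and 0 into a 3-bit code; bit 2 of the register is skipped.
    code = ((divide_configuration >> 1) & 0b100) | (divide_configuration & 0b011)
    # Codes 000..110 select divide by 2, 4, ..., 128; code 111 selects divide by 1.
    return 1 if code == 0b111 else 2 << code
```

Under that assumption a register value of 0 (the reset value given in the figure) selects divide by 2, and a value with bits 3, 1, and 0 all set selects divide by 1.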
77. discouraged due to negative impact on performance. 12. Updates to Chapter 30, Volume 3B. Change bars show changes to Chapter 30 of the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3B: System Programming Guide, Part 2. 30.2.3 Pre-defined Architectural Performance Events. A processor that supports architectural performance monitoring may not support all the predefined architectural performance events (Table 30-1). The non-zero bits in CPUID.0AH:EBX indicate the events that are not available. 30.6 PERFORMANCE MONITORING FOR PROCESSORS BASED ON INTEL MICROARCHITECTURE (NEHALEM). The Intel Core i7 processor family supports architectural performance monitoring capability with version ID 3 (see Section 30.2.2.2) and a host of non-architectural monitoring capabilities. The Intel Core i7 processor family is based on Intel Microarchitecture (Nehalem) and provides four general-purpose performance counters (IA32_PMC0, IA32_PMC1, IA32_PMC2, IA32_PMC3) and three fixed-function performance counters (IA32_FIXED_CTR0, IA32_FIXED_CTR1, IA32_FIXED_CTR2) in the processor core. 30.6.1.1 Precise Event Based Sampling (PEBS). All four general-purpose performance counters, IA32_PMCx, can be used for PEBS if the performance event supports PEBS. Software uses IA32_MISC_ENABLES[7
78. entry contains information from the PML4E and PDPTE used to translate such linear addresses: The physical address from the PDPTE (the address of the page directory). (No PDPTE-cache entry is created for a PDPTE that maps a 1-GByte page.) The logical-AND of the R/W flags in the PML4E and the PDPTE. The logical-AND of the U/S flags in the PML4E and the PDPTE. The logical-OR of the XD flags in the PML4E and the PDPTE. The values of the PCD and PWT flags of the PDPTE. The following items detail how a processor may use the PDPTE cache: If the processor has a PDPTE-cache entry for a linear address, it may use that entry when translating the linear address (instead of the PML4E and the PDPTE in memory). The processor does not create a PDPTE-cache entry unless the P flag is 1, the PS flag is 0, and the reserved bits are 0 in the PML4E and the PDPTE in memory. 4.10.3.2 Recommended Invalidation. The following items provide some recommendations regarding when software should perform invalidations: If software modifies a paging-structure entry that identifies the final page frame for a page number (either a PTE or a paging-structure entry in which the PS flag is 1), it should execute INVLPG for any linear address with a page number whose translation uses that PTE. (If the paging-structure entry may be used in the translation of different page numbers, see Section 4.10.2.3, software should execute INVLPG for l
79. equal (SF = OF).
Opcode | Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description
0F 9C | SETL r/m8 | A | Valid | Valid | Set byte if less (SF ≠ OF).
REX + 0F 9C | SETL r/m8 | A | Valid | N.E. | Set byte if less (SF ≠ OF).
0F 9E | SETLE r/m8 | A | Valid | Valid | Set byte if less or equal (ZF = 1 or SF ≠ OF).
Opcode | Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode
REX + 0F 9E | SETLE r/m8 | A | Valid | N.E.
0F 96 | SETNA r/m8 | A | Valid | Valid
REX + 0F 96 | SETNA r/m8 | A | Valid | N.E.
0F 92 | SETNAE r/m8 | A | Valid | Valid
REX + 0F 92 | SETNAE r/m8 | A | Valid | N.E.
0F 93 | SETNB r/m8 | A | Valid | Valid
REX + 0F 93 | SETNB r/m8 | A | Valid | N.E.
0F 97 | SETNBE r/m8 | A | Valid | Valid
REX + 0F 97 | SETNBE r/m8 | A | Valid | N.E.
0F 93 | SETNC r/m8 | A | Valid | Valid
REX + 0F 93 | SETNC r/m8 | A | Valid | N.E.
0F 95 | SETNE r/m8 | A | Valid | Valid
REX + 0F 95 | SETNE r/m8 | A | Valid | N.E.
0F 9E | SETNG r/m8 | A | Valid | Valid
REX + 0F 9E | SETNG r/m8 | A | Valid | N.E.
0F 9C | SETNGE r/m8 | A | Valid | Valid
REX + 0F 9C | SETNGE r/m8 | A | Valid | N.E.
0F 9D | SETNL r/m8 | A | Valid | Valid
REX + 0F 9D | SETNL r/m8 | A | Valid | N.E.
0F 9F | SETNLE r/m8 | A | Valid | Valid
REX + 0F 9F | SETNLE r/m8 | A | Valid | N.E.
0F 91 | SETNO r/m8 | A | Valid | Valid
REX + 0F 91 | SETNO r/m8 | A | Valid | N.E.
0F 9B | SETNP r/m8 | A | Valid | Valid
REX + 0F 9B | SETNP r/m8 | A | Valid | N.E.
0F 99 | SETNS r/m8 | A | Valid | Valid
80. executive procedures running at privilege level 0. SYSRET is intended for use by privilege level 0 operating system or executive procedures for fast returns to privilege level 3 user code. Stack pointers for SYSCALL/SYSRET are not specified through model specific registers. The clearing of bits in RFLAGS is programmable rather than fixed. SYSCALL/SYSRET save and restore the RFLAGS register. For SYSCALL, the processor saves RFLAGS into R11 and the RIP of the next instruction into RCX; it then gets the privilege-level 0 target instruction and stack pointer from: Target code segment: Reads a non-NULL selector from IA32_STAR[47:32]. Target instruction: Reads a 64-bit canonical address from IA32_LSTAR. Stack segment: Computed by adding 8 to the value in IA32_STAR[47:32]. System flags: The processor sets RFLAGS to the logical-AND of its current value with the complement of the value in the IA32_FMASK MSR. 5. Updates to Chapter 8, Volume 3A. Change bars show changes to Chapter 8 of the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3A: System Programming Guide, Part 1. 8.1 LOCKED ATOMIC OPERATIONS. The 32-bit IA-32 processors support locked atomic operations on loca
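The SYSCALL register loads listed in the excerpt above can be summarized with a small Python model. This is a simplified sketch that follows only the four items in the text; it ignores descriptor loading, privilege checks, and any bits the hardware forces, and the function name and dictionary keys are hypothetical.

```python
MASK64 = (1 << 64) - 1


def syscall_entry_state(rflags, next_rip, ia32_star, ia32_lstar, ia32_fmask):
    """Model the register state after SYSCALL, per the items in the text."""
    cs_selector = (ia32_star >> 32) & 0xFFFF   # target code segment: IA32_STAR[47:32]
    ss_selector = (cs_selector + 8) & 0xFFFF   # stack segment: that value plus 8
    return {
        "RCX": next_rip & MASK64,                 # RIP of the next instruction is saved
        "R11": rflags & MASK64,                   # RFLAGS is saved before masking
        "RIP": ia32_lstar & MASK64,               # target instruction from IA32_LSTAR
        "CS": cs_selector,
        "SS": ss_selector,
        "RFLAGS": rflags & ~ia32_fmask & MASK64,  # RFLAGS AND (NOT IA32_FMASK)
    }
```

A set bit in IA32_FMASK therefore clears the matching RFLAGS bit on entry, which is what makes the clearing programmable rather than fixed.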
81. fetched from memory before the LFENCE, but they will not execute until the LFENCE completes.
applied to an address range dedicated to memory-mapped I/O devices to force strong memory ordering. For areas of memory where weak ordering is acceptable, the write back (WB) memory type can be chosen. Here, reads can be performed speculatively and writes can be buffered and combined. For this type of memory, cache locking is performed on atomic (locked) operations that do not split across cache lines, which helps to reduce the performance penalty associated with the use of the typical synchronization instructions, such as XCHG, that lock the bus during the entire read-modify-write operation. With the WB memory type, the XCHG instruction locks the cache instead of the bus if the memory access is contained within a cache line. The PAT was introduced in the Pentium III processor to enhance the caching characteristics that can be assigned to pages or groups of pages. The PAT mechanism is typically used to strengthen caching characteristics at the page level with respect to the caching characteristics established by the MTRRs. Table 11-7 shows the interaction of the PAT with the MTRRs. Intel recommends that software written to run on Intel Core 2 Duo, Intel Atom, Intel Core Duo, Pentium 4, Intel Xeon, and P6 family proces
82. if below or equal (CF=1 or ZF=1). Not supported in 64-bit mode. | Jump near if below or equal (CF=1 or ZF=1). | Jump near if carry (CF=1). Not supported in 64-bit mode. | Jump near if carry (CF=1). | Jump near if equal (ZF=1). Not supported in 64-bit mode.
Opcode | Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description
0F 84 cd | JE rel32 | A | Valid | Valid | Jump near if equal (ZF=1).
0F 84 cw | JZ rel16 | A | N.S. | Valid | Jump near if 0 (ZF=1). Not supported in 64-bit mode.
0F 84 cd | JZ rel32 | A | Valid | Valid | Jump near if 0 (ZF=1).
0F 8F cw | JG rel16 | A | N.S. | Valid | Jump near if greater (ZF=0 and SF=OF). Not supported in 64-bit mode.
0F 8F cd | JG rel32 | A | Valid | Valid | Jump near if greater (ZF=0 and SF=OF).
0F 8D cw | JGE rel16 | A | N.S. | Valid | Jump near if greater or equal (SF=OF). N
0F 8D cd | JGE rel32 | A | Valid | Valid
0F 8C cw | JL rel16 | A | N.S. | Valid
0F 8C cd | JL rel32 | A | Valid | Valid
0F 8E cw | JLE rel16 | A | N.S. | Valid
0F 8E cd | JLE rel32 | A | Valid | Valid
0F 86 cw | JNA rel16 | A | N.S. | Valid
0F 86 cd | JNA rel32 | A | Valid | Valid
0F 82 cw | JNAE rel16 | A | N.S. | Valid
0F 82 cd | JNAE rel32 | A | Valid | Valid
0F 83 cw | JNB rel16 | A | N.S. | Valid
0F 83 cd | JNB rel32 | A | Valid | Valid
0F 87 cw | JNBE rel16 | A | N.S. | Valid
83. imm8 from xmm and move it to reg bits 15:0. The upper bits of r32 or r64 is zeroed.
66 0F 3A 15 /r ib | PEXTRW reg/m16, xmm, imm8 | | Valid | Valid | Extract the word specified by imm8 from xmm and copy it to lowest 16 bits of reg or m16. Zero-extend the result in the destination, r32 or r64.
Instruction Operand Encoding
Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | ModRM:reg (w) | ModRM:reg (r) | imm8 | NA
B | ModRM:r/m (w) | ModRM:reg (r) | imm8 | NA
PHADDW/PHADDD: Packed Horizontal Add
Opcode | Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description
0F 38 01 /r | PHADDW mm1, mm2/m64 | A | Valid | Valid | Add 16-bit signed integers horizontally, pack to MM1.
66 0F 38 01 /r | PHADDW xmm1, xmm2/m128 | A | Valid | Valid | Add 16-bit signed integers horizontally, pack to XMM1.
0F 38 02 /r | PHADDD mm1, mm2/m64 | A | Valid | Valid | Add 32-bit signed integers horizontally, pack to MM1.
66 0F 38 02 /r | PHADDD xmm1, xmm2/m128 | A | Valid | Valid | Add 32-bit signed integers horizontally, pack to XMM1.
Instruction Operand Encoding
Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | ModRM:reg (r, w) | ModRM:r/m (r) | NA | NA
PHADDSW: Packed Horizontal Add and Saturate
Opcode | Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description
0F 38 03 /r | PHADDSW mm1, mm2/m64 | A | Valid | Valid | Add 16-bit signed integers horizontally, pack saturated integers to MM
84. instruction has been completed. Synchronization mechanisms in multiple-processor systems may depend upon a strong memory-ordering model. Here, a program can use a locking instruction such as the XCHG instruction or the LOCK prefix to ensure that a read-modify-write operation on memory is carried out atomically. Locking operations typically operate like I/O operations in that they wait for all previous instructions to complete and for all buffered writes to drain to memory (see Section 8.1.2, "Bus Locking"). Program synchronization can also be carried out with serializing instructions (see Section 8.3). These instructions are typically used at critical procedure or task boundaries to force completion of all previous instructions before a jump to a new section of code or a context switch occurs. Like the I/O and locking instructions, the processor waits until all previous instructions have been completed and all buffered writes have been drained to memory before executing the serializing instruction. The SFENCE, LFENCE, and MFENCE instructions provide a performance-efficient way of ensuring load and store memory ordering between routines that produce weakly-ordered results and routines that consume that data. The functions of these instructions are as follows: SFENCE: Serializes all store (write) operations that occurred prior to the SFENCE instruction in the program instruction stream, but does not affect load operations. LFENCE
85. instruction was introduced by the Pentium processor. See "Changes to Instruction Behavior in VMX Non-Root Operation" in Chapter 22 of the Intel 64 and IA-32 Architectures Software Developer's Manual for more information about the behavior of this instruction in VMX non-root operation.
RDTSCP: Read Time-Stamp Counter and Processor ID
Opcode | Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description
0F 01 F9 | RDTSCP | A | Valid | Valid | Read 64-bit time-stamp counter and 32-bit IA32_TSC_AUX value into EDX:EAX and ECX.
Instruction Operand Encoding
Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | NA | NA | NA | NA
REP/REPE/REPZ/REPNE/REPNZ: Repeat String Operation Prefix
Opcode | Instruction | Op/En | 64-Bit Mode
F3 6C | REP INS m8, DX | A | Valid
F3 REX.W 6C | REP INS m8, DX | A | Valid
F3 6D | REP INS m16, DX | A | Valid
F3 6D | REP INS m32, DX | A | Valid
F3 REX.W 6D | REP INS r/m32, DX | A | Valid
F3 A4 | REP MOVS m8, m8 | A | Valid
F3 REX.W A4 | REP MOVS m8, m8 | A | Valid
F3 A5 | REP MOVS m16, m16 | A | Valid
F3 A5 | REP MOVS m32, m32 | A | Valid
F3 REX.W A5 | REP MOVS m64, m64 | A | Valid
F3 6E | REP OUTS DX, r/m8 | A | Valid
F3 REX.W 6E | REP OUTS DX, r/m8 | A | Valid
F3 6F | REP OUTS DX, r/m16 | A | Valid
F3 6F | REP OUTS DX, r/m32 | A | Valid
86. interrupt causes EOI messages to be broadcast to the I/O APICs (0) or not (1). See Section 10.8.5. The default value for this bit is 0, indicating that EOI broadcasts are performed. This bit is reserved to 0 if the processor does not support EOI-broadcast suppression. NOTE: Do not program an LVT or IOAPIC RTE with a spurious vector even if you set the mask bit. A spurious vector ISR does not do an EOI. If for some reason an interrupt is generated by an LVT or RTE entry, the bit in the in-service register will be left set for the spurious vector. This will mask all interrupts at the same or lower priority.
Figure 10-23. Spurious-Interrupt Vector Register (SVR). Fields: EOI-Broadcast Suppression (note 1): 0 Enabled, 1 Disabled. Focus Processor Checking (note 2): 0 Enabled, 1 Disabled. APIC Software Enable/Disable: 0 APIC Disabled, 1 APIC Enabled. Spurious Vector (note 3). Address: FEE0 00. Value after reset: 0000 00FFH. Notes: 1. Not supported on all processors. 2. Not supported in Pentium 4 and Intel Xeon processors. 3. For the P6 family and Pentium processors, bits 0 through 3 are always 0.
10.12 EXTENDED XAPIC (X2APIC). The x2APIC architecture extends the xAPIC architecture (described in Section 9.4) in a backward compatible manner and provides forward extendability for fut
87. is an XMM register. The result is stored in the low doubleword of the destination operand; the 3 high-order doublewords remain unchanged. The comparison predicate operand is an 8-bit immediate, the first 3 bits of which define the type of comparison to be made (see Table 3-15). Bits 3 through 7 of the immediate are reserved. The unordered relationship is true when at least one of the two source operands being compared is a NaN; the ordered relationship is true when neither source operand is a NaN. A subsequent computational instruction that uses the mask result in the destination operand as an input operand will not generate a fault, since a mask of all 0s corresponds to a floating-point value of +0.0 and a mask of all 1s corresponds to a QNaN. Some of the comparisons listed in Table 3-15 can be achieved only through software emulation. For these comparisons the program must swap the operands (copying registers when necessary to protect the data that will now be in the destination operand), and then perform the compare using a different predicate. The predicate to be used for these emulations is listed in Table 3-15 under the heading Emulation. Compilers and assemblers may implement the following two-operand pseudo-ops in addition to the three-operand CMPSS instruction. See Table 3-19. CMPXCHG: Compare and Exchange. Opco
88. m128 and xmm1.
0F FE /r | PADDD mm, mm/m64 | A | Valid | Valid | Add packed doubleword integers from mm/m64 and mm.
66 0F FE /r | PADDD xmm1, xmm2/m128 | A | Valid | Valid | Add packed doubleword integers from xmm2/m128 and xmm1.
Instruction Operand Encoding
Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | ModRM:reg (r, w) | ModRM:r/m (r) | NA | NA
PADDQ: Add Packed Quadword Integers
Opcode | Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description
0F D4 /r | PADDQ mm1, mm2/m64 | A | Valid | Valid | Add quadword integer mm2/m64 to mm1.
66 0F D4 /r | PADDQ xmm1, xmm2/m128 | A | Valid | Valid | Add packed quadword integers xmm2/m128 to xmm1.
Instruction Operand Encoding
Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | ModRM:reg (r, w) | ModRM:r/m (r) | NA | NA
PADDSB/PADDSW: Add Packed Signed Integers with Signed Saturation
Opcode | Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description
0F EC /r | PADDSB mm, mm/m64 | A | Valid | Valid | Add packed signed byte integers from mm/m64 and mm and saturate the results.
66 0F EC /r | PADDSB xmm1, xmm2/m128 | A | Valid | Valid | Add packed signed byte integers from xmm2/m128 and xmm1 and saturate the results.
0F ED /r | PADDSW mm, mm/m64 | A | Valid | Valid | Add packed signed word integers from mm/m64 and mm and saturate the results.
66 0F ED /r | PADDSW xmm1, xmm2/m128 | A | Valid | Valid | Add packed signed
89. negate r/m8.
F7 /3 | NEG r/m16 | A | Valid | Valid | Two's complement negate r/m16.
F7 /3 | NEG r/m32 | A | Valid | Valid | Two's complement negate r/m32.
REX.W + F7 /3 | NEG r/m64 | A | Valid | N.E. | Two's complement negate r/m64.
NOTES: In 64-bit mode, r/m8 can not be encoded to access the following byte registers if a REX prefix is used: AH, BH, CH, DH.
Instruction Operand Encoding
Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | ModRM:r/m (w) | NA | NA | NA
NOP: No Operation
Opcode | Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description
90 | NOP | A | Valid | Valid | One byte no-operation instruction.
0F 1F /0 | NOP r/m16 | B | Valid | Valid | Multi-byte no-operation instruction.
0F 1F /0 | NOP r/m32 | B | Valid | Valid | Multi-byte no-operation instruction.
Instruction Operand Encoding
Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | NA | NA | NA | NA
B | ModRM:r/m (r) | NA | NA | NA
NOT: One's Complement Negation
Opcode | Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description
F6 /2 | NOT r/m8 | A | Valid | Valid | Reverse each bit of r/m8.
REX + F6 /2 | NOT r/m8 | A | Valid | N.E. | Reverse each bit of r/m8.
F7 /2 | NOT r/m16 | A | Valid | Valid | Reverse each bit of r/m16.
F7 /2 | NOT r/m32 | A | Valid | Valid | Reverse each bit of r/m32.
REX.W + F7 /2 | NOT r/m64 | A | Valid | N.E. | Reverse each bit of r/m64.
NOTES: In 64-bit mode, r/m8 can not be encoded to access the follow
90. of an MFENCE instruction. This instruction's operation is the same in non-64-bit modes and 64-bit mode.
MINPD: Return Minimum Packed Double-Precision Floating-Point Values
Opcode | Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description
66 0F 5D /r | MINPD xmm1, xmm2/m128 | A | Valid | Valid | Return the minimum double-precision floating-point values between xmm2/m128 and xmm1.
Instruction Operand Encoding
Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | ModRM:reg (r, w) | ModRM:r/m (r) | NA | NA
Footnote 1: A load instruction is considered to become globally visible when the value to be loaded into its destination register is determined.
MINPS: Return Minimum Packed Single-Precision Floating-Point Values
Opcode | Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description
0F 5D /r | MINPS xmm1, xmm2/m128 | A | Valid | Valid | Return the minimum single-precision floating-point values between xmm2/m128 and xmm1.
Instruction Operand Encoding
Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | ModRM:reg (w) | ModRM:r/m | NA | NA
MINSD: Return Minimum Scalar Double-Precision Floating-Point Value
Opcode | Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description
F2 0F 5D /r | MINSD xmm1, xmm2/m64 | A | Valid | Valid | Return the minimum scalar double-precision floating-point value between xmm
91. page-directory-pointer table). The value of the R/W flag of the PML4E. The value of the U/S flag of the PML4E. The value of the XD flag of the PML4E. The values of the PCD and PWT flags of the PML4E. The following items detail how a processor may use the PML4 cache: If the processor has a PML4-cache entry for a linear address, it may use that entry when translating the linear address (instead of the PML4E in memory). The processor does not create a PML4-cache entry unless the P flag is 1 and all reserved bits are 0 in the PML4E in memory. The processor does not create a PML4-cache entry unless the accessed flag is 1 in the PML4E in memory; before caching a translation, the processor sets the accessed flag if it is not already 1. The processor may create a PML4-cache entry even if there are no translations for any linear address that might use that entry (e.g., because the P flags are 0 in all entries in the referenced page-directory-pointer table). If the processor creates a PML4-cache entry, the processor may retain it unmodified even if software subsequently modifies the corresponding PML4E in memory. PDPTE cache (IA-32e paging only). Each PDPTE-cache entry is referenced by an 18-bit value and is used for linear addresses for which bits 47:30 have that value. The
92. processor
FXRSTOR: Restore x87 FPU, MMX, XMM, and MXCSR State
Opcode | Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description
0F AE /1 | FXRSTOR m512byte | A | Valid | Valid | Restore the x87 FPU, MMX, XMM, and MXCSR register state from m512byte.
REX.W + 0F AE /1 | FXRSTOR64 m512byte | A | Valid | N.E. | Restore the x87 FPU, MMX, XMM, and MXCSR register state from m512byte.
Instruction Operand Encoding
Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | ModRM:r/m (r) | NA | NA | NA
FXSAVE: Save x87 FPU, MMX Technology, and SSE State
Opcode | Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description
0F AE /0 | FXSAVE m512byte | A | Valid | Valid | Save the x87 FPU, MMX, XMM, and MXCSR register state to m512byte.
REX.W + 0F AE /0 | FXSAVE64 m512byte | A | Valid | N.E. | Save the x87 FPU, MMX, XMM, and MXCSR register state to m512byte.
Instruction Operand Encoding
Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | ModRM:r/m | NA | NA | NA
HADDPD: Packed Double-FP Horizontal Add
Opcode | Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description
66 0F 7C /r | HADDPD xmm1, xmm2/m128 | A | Valid | Valid | Horizontal add packed double-precision floating-point values from xmm2/m128 to xmm1.
Instruction Operand Encoding
Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | ModRM:reg (r, w) | ModRM:r/m (r) | NA | NA
93. r/m (r) | NA | NA
UNPCKLPD: Unpack and Interleave Low Packed Double-Precision Floating-Point Values
Opcode | Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description
66 0F 14 /r | UNPCKLPD xmm1, xmm2/m128 | A | Valid | Valid | Unpacks and Interleaves double-precision floating-point values from low quadwords of xmm1 and xmm2/m128.
Instruction Operand Encoding
Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | ModRM:reg (r, w) | ModRM:r/m (r) | NA | NA
UNPCKLPS: Unpack and Interleave Low Packed Single-Precision Floating-Point Values
Opcode | Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description
0F 14 /r | UNPCKLPS xmm1, xmm2/m128 | A | Valid | Valid | Unpacks and Interleaves single-precision floating-point values from low quadwords of xmm1 and xmm2/mem into xmm1.
Instruction Operand Encoding
Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | ModRM:reg (r, w) | ModRM:r/m (r) | NA | NA
VERR/VERW: Verify a Segment for Reading or Writing
Opcode | Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description
0F 00 /4 | VERR r/m16 | A | Valid | Valid | Set ZF=1 if segment specified with r/m16 can be read.
0F 00 /5 | VERW r/m16 | B | Valid | Valid | Set ZF=1 if segment specified with r/m16 can be written.
94. result.
85 /r | TEST r/m32, r32 | C | Valid | Valid | AND r32 with r/m32; set SF, ZF, PF according to result.
REX.W + 85 /r | TEST r/m64, r64 | C | Valid | N.E. | AND r64 with r/m64; set SF, ZF, PF according to result.
NOTES: In 64-bit mode, r/m8 can not be encoded to access the following byte registers if a REX prefix is used: AH, BH, CH, DH.
Instruction Operand Encoding
Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | AL/AX/EAX/RAX | imm8/16/32 | NA | NA
B | ModRM:r/m | imm8/16/32 | NA | NA
C | ModRM:r/m | ModRM:reg | NA | NA
UCOMISD: Unordered Compare Scalar Double-Precision Floating-Point Values and Set EFLAGS
Opcode | Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description
66 0F 2E /r | UCOMISD xmm1, xmm2/m64 | A | Valid | Valid | Compares (unordered) the low double-precision floating-point values in xmm1 and xmm2/m64 and set the EFLAGS accordingly.
Instruction Operand Encoding
Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | ModRM:reg | ModRM:r/m | NA | NA
UCOMISS: Unordered Compare Scalar Single-Precision Floating-Point Values and Set EFLAGS
Opcode | Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description
0F 2E /r | UCOMISS xmm1, xmm2/m32 | A | Valid | Valid | Compare lower single-precision floating-point value in xmm1 register with lower single-precision floati
95. software modifies the paging structures so that the page size used for a 4-KByte range of linear addresses changes, the TLBs may subsequently contain multiple translations for the address range (one for each page size). A reference to a linear address in the address range may use any of these translations. Which translation is used may vary from one execution to another, and the choice may be implementation-specific. 4.10.1.4 Global Pages. The Intel 64 and IA-32 architectures also allow for global pages when the PGE flag (bit 7) is 1 in CR4. If the G flag (bit 8) is 1 in a paging-structure entry that maps a page (either a PTE or a paging-structure entry in which the PS flag is 1), any TLB entry cached for a linear address using that paging-structure entry is considered to be global. Because the G flag is used only in paging-structure entries that map a page, and because information from such entries is not cached in the paging-structure caches, the global-page feature does not affect the behavior of the paging-structure caches. 4.10.2.1 Caches for Paging Structures. A processor may support any or all of the following paging-structure caches: PML4 cache (IA-32e paging only). Each PML4-cache entry is referenced by a 9-bit value and is used for linear addresses for which bits 47:39 have that value. The entry contains information from the PML4E used to translate such linear addresses: The physical address from the PML4E (the address of the
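To make the PML4-cache indexing in the excerpt above concrete, the following Python sketch computes the 9-bit value (bits 47:39 of a linear address) that references a PML4-cache entry. The function name is hypothetical, and the sketch models only the tag computation, not the cache contents or any replacement behavior.

```python
def pml4_cache_tag(linear_address: int) -> int:
    """Return bits 47:39 of a linear address, the 9-bit PML4-cache tag."""
    return (linear_address >> 39) & 0x1FF


def share_pml4_cache_entry(address_a: int, address_b: int) -> bool:
    """Two linear addresses can use the same entry only if their tags match."""
    return pml4_cache_tag(address_a) == pml4_cache_tag(address_b)
```

Since each tag covers 2^39 bytes, every linear address inside the same 512-GByte aligned region maps to the same tag.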
96. specific address within that region to which the linear address translates. Each paging-structure entry contains a physical address, which is either the address of another paging structure or the address of a page frame. In the first case, the entry is said to reference the other paging structure; in the latter, the entry is said to map a page. The first paging structure used for any translation is located at the physical address in CR3. A linear address is translated using the following iterative procedure. A portion of the linear address (initially the uppermost bits) select an entry in a paging structure (initially the one located using CR3). If that entry references another paging structure, the process continues with that paging structure and with the portion of the linear address immediately below that just used. If instead the entry maps a page, the process completes: the physical address in the entry is that of the page frame and the remaining lower portion of the linear address is the page offset. The following items give an example for each of the three paging modes (each example locates a 4-KByte page frame): With 32-bit paging, each paging structure comprises 1024 (2^10) entries. For this reason, the translation process uses 10 bits at a time from a 32-bit linear address. Bits 31:22 identify the first p
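The 32-bit paging example in the excerpt above, with 10 index bits per level and a 4-KByte page frame, can be sketched in Python as a simple bit split. The function name is hypothetical, and the labels "first-level index" and "second-level index" are chosen for this example rather than taken from the source.

```python
from typing import Tuple


def split_32bit_linear_address(linear_address: int) -> Tuple[int, int, int]:
    """Split a 32-bit linear address for a 4-KByte page under 32-bit paging.

    Returns (first_level_index, second_level_index, page_offset).
    """
    linear_address &= 0xFFFF_FFFF
    first_level_index = (linear_address >> 22) & 0x3FF   # bits 31:22, 1024 entries
    second_level_index = (linear_address >> 12) & 0x3FF  # bits 21:12, 1024 entries
    page_offset = linear_address & 0xFFF                 # bits 11:0 within the page
    return first_level_index, second_level_index, page_offset
```

The two 10-bit indices plus the 12-bit offset account for all 32 bits of the linear address.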
97. that every store instruction that precedes the SFENCE instruction in program order becomes globally visible before any store instruction that follows the SFENCE instruction. The SFENCE instruction is ordered with respect to store instructions, other SFENCE instructions, any LFENCE and MFENCE instructions, and any serializing instructions (such as the CPUID instruction). It is not ordered with respect to load instructions. Weakly ordered memory types can be used to achieve higher processor performance through such techniques as out-of-order issue, write-combining, and write-collapsing. The degree to which a consumer of data recognizes or knows that the data is weakly ordered varies among applications and may be unknown to the producer of this data. The SFENCE instruction provides a performance-efficient way of ensuring store ordering between routines that produce weakly-ordered results and routines that consume this data. This instruction's operation is the same in non-64-bit modes and 64-bit mode.
SGDT: Store Global Descriptor Table Register
Opcode | Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description
0F 01 /0 | SGDT m | A | Valid | Valid | Store GDTR to m.
NOTES: See IA-32 Architecture Compatibility section below.
Instruction Operand Encoding
Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | ModRM:r/m | NA | NA | NA
SHLD: Double Precision Shift Left
Opcode | Instruction | Op/En | 64-Bit Mode | Compat/Leg M
98. the Pentium III processor, and the LFENCE and MFENCE instructions, introduced in the Pentium 4 processor, provide memory-ordering and serialization capabilities for specific types of memory operations. The memory type range registers (MTRRs) can be used to strengthen or weaken memory ordering for specific areas of physical memory (see Section 11.11, "Memory Type Range Registers (MTRRs)"). MTRRs are available only in the Pentium 4, Intel Xeon, and P6 family processors. The page attribute table (PAT) can be used to strengthen memory ordering for a specific page or group of pages (see Section 11.12, "Page Attribute Table (PAT)"). The PAT is available only in the Pentium 4, Intel Xeon, and Pentium III processors. These mechanisms can be used as follows: Memory mapped devices and other I/O devices on the bus are often sensitive to the order of writes to their I/O buffers. I/O instructions (the IN and OUT instructions) can be used to impose strong write ordering on such accesses as follows. Prior to executing an I/O instruction, the processor waits for all previous instructions in the program to complete and for all buffered writes to drain to memory. Only instruction fetch and page table walks can pass I/O instructions. Execution of subsequent instructions does not begin until the processor determines that the I/O
99. the disconnected components into a single system. 10.12.2 x2APIC Register Availability. The local APIC registers can be accessed via the MSR interface only when the local APIC has been switched to the x2APIC mode as described in Section 10.12.1. Accessing any APIC register in the MSR address range 0800H through 0BFFH via RDMSR or WRMSR when the local APIC is not in x2APIC mode causes a general-protection exception. In x2APIC mode, the memory mapped interface is not available and any access to the MMIO interface will behave similar to that of a legacy xAPIC in globally disabled state. Table 10-7 provides the interactions between the legacy & extended modes and the legacy and register interfaces.
Table 10-7. MSR/MMIO Interface of a Local x2APIC in Different Modes of Operation
Mode | MMIO Interface | MSR Interface
xAPIC mode | Available | General-protection exception
x2APIC mode | Behavior identical to xAPIC in globally disabled state | Available
10.12.3 MSR Access in x2APIC Mode. To allow for efficient access to the APIC registers in x2APIC mode, the serializing semantics of WRMSR are relaxed when writing to the APIC registers. Thus, system software should not use "WRMSR to APIC registers in x2APIC mode" as a serializing instruction. Read and write accesses to the APIC registers will occur in pr
100. the page size is 4 MBytes and the page number comprises bits 31:22 of the linear address. If the translation does use a PTE, the page size is 4 KBytes and the page number comprises bits 31:12 of the linear address. PAE paging: If the translation does not use a PTE (because the PS flag is 1 in the PDE used), the page size is 2 MBytes and the page number comprises bits 31:21 of the linear address. If the translation does use a PTE, the page size is 4 KBytes and the page number comprises bits 31:12 of the linear address. IA-32e paging: If the translation does not use a PDE (because the PS flag is 1 in the PDPTE used), the page size is 1 GByte and the page number comprises bits 47:30 of the linear address. If the translation does use a PDE but does not use a PTE (because the PS flag is 1 in the PDE used), the page size is 2 MBytes and the page number comprises bits 47:21 of the linear address. If the translation does use a PTE, the page size is 4 KBytes and the page number comprises bits 47:12 of the linear address. 4.10.1.2 Caching Translations in TLBs: The processor may accelerate the paging process by caching individual translations in translation lookaside buffers (TLBs). Each entry in a TLB is an individual translation. Each translation is referenced by a page number. It con
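The bit ranges listed above can be turned into a small page-number extractor. This is a minimal Python sketch; the function name, the mode labels, and the size labels are hypothetical, and the "32-bit" label for the first (truncated) case is an assumption based on the 4-MByte/4-KByte page sizes it mentions.

```python
# (paging mode, page size) -> (high bit, low bit) of the page number,
# transcribed from the excerpt. The "32-bit" label is an assumption.
PAGE_NUMBER_BITS = {
    ("32-bit", "4M"): (31, 22),
    ("32-bit", "4K"): (31, 12),
    ("PAE", "2M"): (31, 21),
    ("PAE", "4K"): (31, 12),
    ("IA-32e", "1G"): (47, 30),
    ("IA-32e", "2M"): (47, 21),
    ("IA-32e", "4K"): (47, 12),
}


def page_number(linear_address, paging_mode, page_size):
    """Return the page-number bits of a linear address."""
    high, low = PAGE_NUMBER_BITS[(paging_mode, page_size)]
    width = high - low + 1
    return (linear_address >> low) & ((1 << width) - 1)
```

With these assumptions, `page_number(0x00403025, "32-bit", "4K")` yields `0x403`, while the same address mapped by a 4-MByte page yields page number `1`.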
101. times Signed divide r m64 by 2 CL times Signed divide r m32 by 2 imm8 times Signed divide r m64 by 2 imm8 times Multiply r m8 by 2 once Multiply r m8 by 2 once Multiply r m8 by 2 CL times Multiply r m8 by 2 CL times Multiply r m8 by 2 imm8 times Multiply r m8 by 2 imm8 times Multiply r m16 by 2 once Multiply r m16 by 2 CL times Multiply r m16 by 2 imm8 times Multiply r m32 by 2 once Multiply r m64 by 2 once Multiply r m32 by 2 CL times Multiply r m64 by 2 CL times Multiply r m32 by 2 imm8 times Multiply r m64 by 2 imm8 times Unsigned divide r m8 by 2 once 177 Documentation Changes Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode REX 00 5 SHR r m8 1 A Valid Unsigned divide r m8 by 2 once D2 5 SHR r m8 CL B Valid Valid Unsigned divide r m8 by 2 CL times REX D2 5 SHR r m8 CL B Valid Unsigned divide r m8 by 2 CL times 5 ib SHR r m8 imm8 Valid Valid Unsigned divide r m8 by 2 imm8 times REX 5 SHR r m8 imm8 Valid N E Unsigned divide r m8 by 2 imm8 times 01 5 SHR r m16 1 A Valid Valid Unsigned divide r m16 by 2 once D3 5 SHR r m16 CL B Valid Valid Unsigned divide r m16 by 2 CL times C1 5 ib SHR r m16 imm8 Valid Valid Unsigned divide r m16 by 2 imm8 times 01 5 SHR r m32 1 A Valid Valid Unsigned divide r m32 by 2 once REX W D1 5 SHRr m64 1 A Valid
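The descriptions in this excerpt treat shifts as multiplication or division by 2. The Python sketch below models that reading on fixed-width values; the function names are hypothetical, the count is assumed to be smaller than the operand width, and flags are not modeled. Note that an arithmetic right shift rounds toward negative infinity, so it is not identical to a truncating signed division for negative odd values.

```python
def shl(value, count, width=32):
    """Left shift: multiply by 2, `count` times, keeping `width` bits."""
    mask = (1 << width) - 1
    return (value << count) & mask


def shr(value, count, width=32):
    """Logical right shift: unsigned divide by 2, `count` times."""
    mask = (1 << width) - 1
    return (value & mask) >> count


def sar(value, count, width=32):
    """Arithmetic right shift: signed divide by 2, rounding toward -infinity."""
    mask = (1 << width) - 1
    value &= mask
    if value & (1 << (width - 1)):
        value -= 1 << width  # reinterpret the bit pattern as a negative number
    return (value >> count) & mask
```

For instance, `sar(0xFFFFFFF9, 1)` (the 32-bit pattern for -7) returns `0xFFFFFFFC` (-4), whereas a truncating division of -7 by 2 would give -3.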
102. used for multiple purposes (see Section 4.10.2.3), software should perform invalidations for all of these purposes. For example, if a single entry might serve as both a PDE and a PTE, it may be necessary to execute INVLPG with two or more linear addresses, one that uses the entry as a PDE and one that uses it as a PTE. (Alternatively, software could use MOV to CR3 or MOV to CR4.) As noted in Section 4.10.1, the TLBs may subsequently contain multiple translations for the address range if software modifies the paging structures so that the page size used for a 4-KByte range of linear addresses changes. A reference to a linear address in the address range may use any of these translations. Software wishing to prevent this uncertainty should not write to a paging-structure entry in a way that would change, for any linear address, both the page size and either the page frame, access rights, or other attributes. It can instead use the following algorithm: first clear the P flag in the relevant paging-structure entry (e.g., PDE); then invalidate any translations for the affected linear addresses (see Section 4.10.3.2); and then modify the relevant paging-structure entry to set the P flag and establish the modified translation(s) for the new page size. 4.10.3.3 Optional Invalidation: The following items describe cases in which software may choose not to invalidate and the potential consequences of that choice. If a paging-structure entry is modified to chan
103. xmm m128 to two packed signed doubleword integers mm Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes 44 Documentation Changes intel Instruction Operand Encoding Op En A ModRM reg w Operand 1 Operand 2 Operand 3 ModRM r m NA NA Operand 4 CVTPD2PS Convert Packed Double Precision FP Values to Packed Single Precision FP Values Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode 66 OF 5A CVTPD2PS xmm1 Valid Valid Convert two packed double xmm2 m128 precision floating point values in xmm2 m128 to two packed single precision floating point values in 1 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg w ModRM r m NA NA CVTPI2PD Convert Packed Dword Integers to Packed Double Precision FP Values Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode 66 OF 2A r CVTPI2PDxmm A Valid Valid Convert two packed signed mm m64 doubleword integers from mm mem64 to two packed double precision floating point values in xmm NOTES Operation is different for different operand sets see the Description section Instruction Operand Encoding Op En A ModRM reg w Operand 1 Operand 2 Operand 3 ModRM r m r NA NA Operand 4 Intel 64 and 32 Architectures Software Developer s Manual Documentatio
104. 0 r ADC r m8 r8 A Valid N E Add with carry byte register to r m64 11 r ADC r m16 r16 A Valid Valid Add with carry r16 to r m16 11 r ADC r m32 r32 A Valid Valid Add with CF r32 to r m32 Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes 12 Documentation Changes Opcode Instruction Op 64 bit Compat Description En Mode Leg Mode REXW 11 r ADCr m64 r64 A Valid N E Add with CF r64 to r m64 12 r ADC r8 r m8 A Valid Valid Add with carry r m8 to byte register REX 12 r ADC r8 r m8 A Valid N E Add with carry r m64 to byte register 13 r ADC r16 r m16 A Valid Valid Add with carry r m16 to r16 13 r ADC r32 r m32 A Valid Valid Add with CF r m32 to r32 REXW 13 r ADCr64 r m64 A Valid N E Add with CF r m64 to r64 NOTES In 64 bit mode r m8 can not be encoded to access the following byte registers if a REX prefix is used AH BH CH DH Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM r m w ModRM reg NA NA B ModRM r m w imm8 NA NA C AL AX EAX RAX imm8 NA NA ADD Add Opcode Instruction Op 64 bit Compat Description En Mode Leg Mode 04 ib ADD AL imm8 C Valid Valid Add imm8 to AL 05 iw ADD AX imm16 C Valid Valid Add imm16 to AX 05 id ADDEAX imm32 C Valid Valid Add imm32 to EAX REX W 05 id ADDRAX imm32 Valid N E Add imm32 sign extended to 64 bits to RAX 80 0 ib ADDr m8 imm8 Vali
105. 0 3ib SBBr m8 imm8 Valid N E Subtract with borrow imm8 from r m8 81 3 iw SBB r m16 B Valid Valid Subtract with borrow imm16 imm16 from r m16 81 3 id SBB r m32 B Valid Valid Subtract with borrow imm32 imm32 from r m32 REX W 81 3 SBBr m64 B Valid N E Subtract with borrow sign id imm32 extended imm32 to 64 bits from r m64 83 3 ib SBB r m16 imm8 B Valid Valid Subtract with borrow sign extended imm8 from r m16 83 3 ib SBB r m32 imm8 B Valid Valid Subtract with borrow sign extended imm8 from r m32 REXW 83 3 SBBr m64 imm8 Valid N E Subtract with borrow sign ib extended imm8 from r m64 18 r SBB r m8 r8 C Valid Valid Subtract with borrow r8 from r m8 REX 18 r SBB r m8 r8 C Valid N E Subtract with borrow r8 from r m8 19 r SBB r m16 r16 C Valid Valid Subtract with borrow r16 from r m16 19 r SBB r m32 r32 C Valid Valid Subtract with borrow r32 from r m32 REXW 19 r SBB r m64 r64 Valid Subtract with borrow r64 from r m64 1A SBB r8 r m8 D Valid Valid Subtract with borrow r m8 from r8 REX 1A r SBB r8 r m8 D Valid N E Subtract with borrow r m8 from r8 1B r SBB r16 r m16 D Valid Valid Subtract with borrow r m16 from r16 Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes 179 chenes intel Opcode Instruction Op 64 Bit Compat Description En Leg Mode 1B r SBB r32 r m32 D Valid Valid Su
106. 1 66 0F 38 03 r PHADDSW xmml Valid Valid Add 16 bit signed integers xmm2 m128 horizontally pack saturated integers to XMM1 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg r w ModRM r m r NA NA PHMINPOSUW Packed Horizontal Word Minimum Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode 66 0F3841 r PHMINPOSUW A Valid Valid Find the minimum unsigned xmm1 word in xmm2 m128 and xmm2 m128 place its value in the low word of xmm1 and its index in the second lowest word of xmm1 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg w ModRM r m NA NA Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes 128 H intel PHSUBW PHSUBD Packed Horizontal Subtract Opcode Instruction Op 64 Bit Compat Description En Leg Mode OF 38 05 Jr PHSUBW mm1 A Valid Valid Subtract 16 bit signed mm2 m64 integers horizontally pack to MM1 66 0 3805 PHSUBW xmml A Valid Valid Subtract 16 bit signed xmm2 m128 integers horizontally pack to XMM1 OF 38 06 r PHSUBD mm1 A Valid Valid Subtract 32 bit signed mm2 m64 integers horizontally pack to MM1 66 38 06 PHSUBDxmml A Valid Valid Subtract 32 bit signed xmm2 m128 integers horizontally pack to XMM1 Instruction Operand Encoding Op En Operand 1 Operand 2 Opera
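The PHMINPOSUW description above (minimum unsigned word in the low word, its index in the second-lowest word) can be modeled on a list of eight 16-bit values. This Python sketch uses a hypothetical function name; zeroing the remaining words and picking the lowest index on a tie are assumptions not stated in the excerpt.

```python
def phminposuw(source_words):
    """Model PHMINPOSUW on eight unsigned 16-bit words (index 0 = lowest word)."""
    if len(source_words) != 8:
        raise ValueError("expected exactly eight words")
    if any(not 0 <= w <= 0xFFFF for w in source_words):
        raise ValueError("each word must fit in 16 unsigned bits")
    minimum = min(source_words)
    index = source_words.index(minimum)  # first (lowest) index on a tie
    # Word 0 holds the minimum, word 1 its index; the rest are assumed zero.
    return [minimum, index, 0, 0, 0, 0, 0, 0]
```

For example, `phminposuw([5, 3, 9, 3, 7, 8, 6, 4])` returns `[3, 1, 0, 0, 0, 0, 0, 0]`.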
107. 1 Operand 2 Operand 3 Operand 4 A ModRM reg w ModRM r m r NA NA CVTPS2P Convert Packed Single Precision FP Values to Packed Dword Integers Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode OF 2D r CVTPS2PI mm A Valid Valid Convert two packed single xmm m64 precision floating point values from xmm m64 to two packed signed doublew ord integers mm Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg w ModRM r m r NA NA CVTSD2SI Convert Scalar Double Precision FP Value to Integer Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode F2 OF 2D r CVTSD2SI r32 A Valid Valid Convert one double xmm m64 precision floating point value from xmm m64 to one signed doubleword integer r32 F2REXWOF CVTSD2SIr64 A Valid N E Convert one double 2D r xmm m64 precision floating point value from xmm m64 to one signed quadword integer sign extended into r64 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg w ModRM r m NA NA Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes 47 Documentation Changes CVTSD2SS Convert Scalar Double Precision FP Value to Scalar Single Precision FP Value Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode F2 OF 5A r CVTSD2SS 1 Valid Va
108. 11:9 Ignored; (M-1):12 Physical address of 4-KByte aligned page directory referenced by this entry; 63:M Reserved (must be 0). NOTES: 1. M is an abbreviation for MAXPHYADDR, which is at most 52; see Section 4.1.4. 4.5 IA-32e PAGING: A logical processor uses IA-32e paging if CR0.PG = 1, CR4.PAE = 1, and IA32_EFER.LME = 1. With IA-32e paging, linear addresses are translated using a hierarchy of in-memory paging structures located using the contents of CR3. IA-32e paging translates 48-bit linear addresses to 52-bit physical addresses. Although 52 bits corresponds to 4 PBytes, linear addresses are limited to 48 bits; at most 256 TBytes of linear-address space may be accessed at any given time. IA-32e paging uses a hierarchy of paging structures to produce a translation for a linear address. CR3 is used to locate the first paging structure, the PML4 table. Table 4-12 illustrates how CR3 is used with IA-32e paging. Table 4-12. Use of CR3 with IA-32e Paging (Bit Position(s): Contents): 2:0 Ignored; 3 (PWT) Page-level write-through; indirectly determines the memory type used to access the PML4 table during linear-address translation (see Section 4.9); 4 (PCD) Page-level cache disable; indirectly determines the memory type used to access the PML4 tabl
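The enabling condition and the address-space figures quoted above can be checked with a few lines of Python. This is a sketch under stated assumptions: the function name is hypothetical, and the bit positions used for CR0.PG (bit 31), CR4.PAE (bit 5), and IA32_EFER.LME (bit 8) are not given in the excerpt and are supplied here from the architecture's register definitions.

```python
# Assumed bit positions (not stated in the excerpt).
CR0_PG = 1 << 31
CR4_PAE = 1 << 5
EFER_LME = 1 << 8


def uses_ia32e_paging(cr0, cr4, efer):
    """True when CR0.PG, CR4.PAE and IA32_EFER.LME are all set."""
    return bool(cr0 & CR0_PG) and bool(cr4 & CR4_PAE) and bool(efer & EFER_LME)


# Address-space arithmetic from the text: 2**48 bytes of linear address
# space and 2**52 bytes of physical address space.
LINEAR_SPACE_TBYTES = (1 << 48) // (1 << 40)    # bytes -> TBytes
PHYSICAL_SPACE_PBYTES = (1 << 52) // (1 << 50)  # bytes -> PBytes
```

The two constants evaluate to 256 and 4, matching the 256 TBytes and 4 PBytes quoted in the section.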
109. 128 for greater than OF 66 r PCMPGTD mm A Valid Valid Compare packed signed mm m64 doubleword integers in mm and mm m64 for greater than 66 OF 66 r PCMPGTD xmml A Valid Valid Compare packed signed xmm2 m128 doubleword integers in xmm1 xmm2 m128 for greater than Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg r w ModRM r m r NA NA Intel 64 and IA 32 Architectures Software Developer s Manual Documentation Changes 125 chenes intel PCMPGTQ Compare Packed Data for Greater Than Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode 66 OF 38 37 PCMPGTQ Valid Valid Compare packed qwords in xmm1 xmm2 m12 xmm2 m128 and xmm1 for 8 greater than Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg r w ModRM r m r NA NA PEXTRB PEXTRD PEXTRQ Extract Byte Dword Qword Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode 66 OF 14 reg m8 Valid Valid Extract a byte integer value Ir ib xmm2 imm8 from xmm2 at the source byte offset specified by imm8 into rreg or m8 The upper bits of r32 or r64 are zeroed 66 0F 3A 16 PEXTRDr m32 A Valid Valid Extract a dword integer Ir ib xmm2 imm8 value from xmm2 at the source dword offset specified by imm8 into r m32 66 REX W OF PEXTRQr m64 A Valid N E Extract a qword i
110. 2 mem64 and xmm1 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg r w ModRM r m r NA NA MINSS Return Minimum Scalar Single Precision Floating Point Value Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode 5D r MINSS xmm1 A Valid Valid Return the minimum scalar xmm2 m32 single precision floating point value between xmm2 mem32 and 1 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg r w ModRM r m NA NA Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes 87 Documentation Changes MONITOR Set Up Monitor Address Opcode OF 01 C8 Instruction MONITOR Op 64 Bit En Mode A Valid Compat Leg Mode Valid Description Sets up a linear address range to be monitored by hardware and activates the monitor The address range should be a write back memory caching type The address is DS EAX DS RAX in 64 bit mode Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A NA NA NA NA MOV Move Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode 88 r MOV r m8 r8 A Valid Valid Move r8 to r m8 REX 88 r MOV r m8 8 Valid N E Move r8 to r m8 89 r MOV r m16 r16 A Valid Valid Move r16 to r m16 89 r MOV r m32 r32 A Valid Valid Move r32 to r m32
111. 4D r CMOVGE r32 r m32 A Valid Valid Move if greater or equal SF OF REX W 0F 4D CMOVGE r64 r m64 A Valid N E Move if greater or equal SF OF Intel 64 and 1 32 Architectures Software Developer s Manual Documentation Changes 30 Documentation Changes ntel Opcode OF 4C r OF 4C r REX W 0F 4C OF 4E r OF 4E r REX W 4E OF 46 r OF 46 r REX W 0 46 OF 42 r OF 42 r REX W 0F 42 OF 43 r OF 43 r REX W 0F 43 Ir OF 47 r OF 47 r REX W 0F 47 Ir OF 43 r OF 43 r REX W 0F 43 Ir OF 45 r OF 45 r REX W 0F 45 Ir OF 4E r Instruction CMOVL r16 r m16 CMOVL r32 r m32 CMOVL r64 r m64 CMOVLE r16 r m16 CMOVLE r32 r m32 CMOVLE r64 r m64 CMOVNA r16 r m16 CMOVNA r32 r m32 CMOVNA r64 r m64 CMOVNAE r16 r m16 CMOVNAE r32 r m32 CMOVNAE r64 r m64 CMOVNB r16 r m16 CMOVNB r32 r m32 CMOVNB r64 r m64 CMOVNBE r16 r m16 CMOVNBE r32 r m32 CMOVNBE r64 r m64 CMOVNC 16 r m16 CMOVNC r32 r m32 CMOVNC r64 r m64 r16 r m16 CMOVNE r32 r m32 r64 r m64 CMOVNG r16 r m16 Op En A A A 64 Bit Mode Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Compat Leg Mode Valid Valid N E
112. 5 Volume 3A 5 Updates to Chapter 8 Volume 3A 6 Updates to Chapter 10 Volume 3A 7 Updates to Chapter 15 Volume 3A 8 Updates to Chapter 21 Volume 3B 9 Updates to Chapter 22 Volume 3B 10 Updates to Chapter 25 Volume 3B 11 Updates to Chapter 27 Volume 3B 12 Updates to Chapter 30 Volume 3B 13 Updates to Appendix A Volume 3B 14 Updates to Appendix B Volume 3B 15 Updates to Appendix G Volume 3B Intel 64 and 1 32 Architectures Software Developer s Manual Documentation Changes intel Documentation Changes 1 Updates to Chapter 3 Volume 2A Change bars show changes to Chapter 3 of the Intel 64 and IA 32 Architectures Soft ware Developer s Manual Volume 2A Instruction Set Reference A M 3 1 1 Instruction Format The following is an example of the format used for each instruction description in this chapter The heading below introduces the example The table below provides an example summary table CMC Complement Carry Flag this is an example Opcode Instruction Op En 64 bit Compat Description Mode Leg Mode F5 CMC A Valid Valid Complement carry flag Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A NA NA NA NA Operand Encoding Column in the Instruction Summary Table The operand encoding column is abbreviated as Op En in the Instruction Summary table heading Instruction operand encoding information is provided for each a
113. 64 imm8 double precision floating point value in xmm2 m64 and place the result in xmm1 The rounding mode is determined by imm8 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg w ModRM r m r imm8 NA ROUNDSS Round Scalar Single Precision Floating Point Values Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode 66 0F3A0A r ROUNDSS xmml A Valid Valid Round the low packed single ib xmm2 m32 imm8 precision floating point value in xmm2 m32 and place the result in xmm1 The rounding mode is determined by imm8 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg w ModRM r m r imm8 NA RSM Resume from System Management Mode Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode OF AA RSM A Invalid Valid Resume operation of interrupted program Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A NA NA NA NA Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes 174 ee ehanas intel RSQRTPS Compute Reciprocals of Square Roots of Packed Single Precision Floating Point Values Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode OF 52 r RSQRTPSxmml1 Valid Valid Computes the approximate xmm2 m128 reciprocals of the square roots of the packed single precision floating point v
114. 64 A Valid Unsigned divide RDX RAX by r m64 with result stored RAX Quotient RDX Remainder NOTES n 64 bit mode r m8 can not be encoded to access the following byte registers if a REX prefix is used AH BH CH DH Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes 54 H intel Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM r m NA NA NA DIVPD Divide Packed Double Precision Floating Point Values Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode 66 OF 5E r DIVPD xmm1 A Valid Valid Divide packed double xmm2 m128 precision floating point values in xmm1 by packed double precision floating point values xmm2 m128 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg r w ModRM r m r NA NA DIVPS Divide Packed Single Precision Floating Point Values Opcode Instruction 64 Bit Compat Description En Mode Leg Mode OF 5E r DIVPS xmm1 A Valid Valid Divide packed single xmm2 m128 precision floating point values in xmm1 by packed single precision floating point values xmm2 m128 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg r w ModRM r m r NA NA DIVSD Divide Scalar Double Precision Floating Point Values Opcode Instruction Op 64 Bit
115. [Figure residue: bit-layout diagram of CR3 and the paging-structure entries, with rows for CR3 (address of PML4 table), PML4E (present; not present), PDPTE (1-GByte page; page directory; not present), PDE (2-MByte page; page table; not present), and PTE (4-KByte page; not present).] NOTES: 1. M is an abbreviation for MAXPHYADDR. 2. Reserved fields must be 0. 3. If IA32_EFER.NXE = 0 and the P flag of a paging-structure entry is 1, the XD flag (bit 63) is reserved. 4.7 PAGE-FAULT EXCEPTIONS: Accesses using linear addresses may cause page-fault exceptions (#PF; exception 14). An access to a linear address may cause a page-fault exception for either of two reasons: (1) there is no valid translation for the linear address; or (2) there is a valid translation for the linear address, but its access rights do not permit the access. As noted in Section 4.3, Section 4.4.2, and Section 4.5, there is no valid tr
116. A Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes 76 H intel LDS LES LFS LGS LSS Load Far Pointer Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode C5 r 105 16 16 16 Invalid Valid Load DS r16 with far pointer from memory C5 r LDS r32 m16 32 Invalid Valid Load DS r32 with far pointer from memory OF B2 r 155 16 16 16 A Valid Valid Load SS r16 with far pointer from memory OF B2 r 155 r32 m16 32 A Valid Valid Load SS r32 with far pointer from memory REX 0F B2 r 155 64 1664 Valid N E Load SS r64 with far pointer from memory C4 LESr16 m16 16 A Invalid Valid Load ES r16 with far pointer from memory C4 LES r32 m16 32 Invalid Valid Load 5732 with far pointer from memory OF B4 r LFSr16 m16 16 A Valid Valid Load FS r16 with far pointer from memory OF B4 LFS r32 m16 32 Valid Valid Load FS r32 with far pointer from memory REX 0F B4 r LFSr64 m1664 Valid Load FS r64 with far pointer from memory OF B5 r 165 16 16 16 A Valid Valid Load GS r16 with far pointer from memory OF B5 r LGS r32 m16 32 A Valid Valid Load GS r32 with far pointer from memory REX 0F 5 165 64 16 64 A Valid N E Load GS r64 with far pointer from memory Instruction Operand Encoding Op En Operand 1 A ModRM reg w Operand 2 ModRM r m Operand 3 Operand 4 NA NA
117. A 4 ib r m16 imm8 Valid Valid Store selected bit in CF flag OF BA 4 ib r m32 imm8 Valid Valid Store selected bit in CF flag REX W O0FBA BTr m64 imm8 Valid N E Store selected bit in CF flag 4 ib Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM r m ModRM reg NA NA B ModRM r m imm8 NA NA Intel 64 and IA 32 Architectures Software Developer s Manual Documentation Changes 23 Documentation Changes BTC Bit Test and Complement Opcode Instruction Op 64 bit Compat Description En Mode Leg Mode OF BB BTC r m16 16 A Valid Valid Store selected bit in CF flag and complement OF BB BTC r m32 r32 A Valid Valid Store selected bit in CF flag and complement REX W 0F BB BTCr m64 r64 A Valid N E Store selected bit in CF flag and complement OF BA 7 ib BTCr m16 imm8 B Valid Valid Store selected bit in CF flag and complement OF BA 7 ib BTCr m32 imm8 B Valid Valid Store selected bit in CF flag and complement REX W O0F BA BTCr m64 imm8 Valid N E Store selected bit in CF flag 7 ib and complement Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM r m w ModRM reg NA NA B ModRM r m w imm8 NA NA BTR Bit Test and Reset Opcode Instruction Op 64 bit Compat Description En Leg Mode OF B3 BTR r m16 r16 A Valid Valid Store selected bit in CF flag and clear OF B3 BTR r m32 r32 A Val
118. A NA PSLLW PSLLD PSLLQ Shift Packed Data Left Logical Opcode OF Fl r 66 OF Fl r OF 71 6 ib 66 0F 71 6 ib OF F2 r 66 OF F2 r OF 72 6 ib 66 OF 72 6 ib 66 OF F3 OF 73 6 ib Instruction PSLLW mm mm m64 PSLLW xmm1 xmm2 m128 PSLLW xmm1 imm8 PSLLW xmm1 imm8 PSLLD mm mm m64 PSLLD xmm1 xmm2 m128 PSLLD mm imm8 PSLLD xmm1 imm8 PSLLQ mm mm m64 PSLLQ xmm1 xmm2 m128 PSLLQ mm imm8 Op En A 64 Bit Mode Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Compat Leg Mode Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes Description Shift words in mm left mm m64 while shifting in 0s Shift words in xmm1 left by xmm2 m128 while shifting in Os Shift words in mm left by imm8 while shifting in Os Shift words in xmm1 left by imm8 while shifting in Os Shift doublewords in mm left by mm m64 while shifting in Os Shift doublewords in xmm1 left by xmm2 m128 while shifting in Os Shift doublewords in mm left by imm8 while shifting in 0s Shift doublewords in xmm1 left by imm8 while shifting in 0s Shift quadword in mm left by mm m64 while shifting in Os Shift quadwords in xmm1 left by xmm2 m128 while shifting in Os Shift quad
119. B r Instruction PACKSSWB mm1 mm2 m64 PACKSSWB xmm1 xmm2 m128 PACKSSDW mm1 mm2 m64 PACKSSDW xmm1 xmm2 m128 Op En A 64 Bit Mode Valid Valid Valid Valid Compat Leg Mode Valid Valid Valid Valid Description Converts 4 packed signed word integers from mm1 and from mm2 m64 into 8 packed signed byte integers in mm1 using signed saturation Converts 8 packed signed word integers from xmm1 and from xxm2 m128 into 16 packed signed byte integers in xxm1 using signed saturation Converts 2 packed signed doubleword integers from mm1 and from mm2 m64 into 4 packed signed word integers in mm1 using signed saturation Converts 4 packed signed doubleword integers from xmm1 and from xxm2 m128 into 8 packed signed word integers in xxm1 using signed saturation Instruction Operand Encoding Op En A Operand 1 ModRM reg r w Operand 2 ModRM r m Operand 3 NA NA Operand 4 Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes 116 Documentation Changes PACKUSDW Pack with Unsigned Saturation Opcode 66 OF 38 2B r Instruction xmm2 m128 PACKUSDW xmm1 Op 64 Bit En Mode Valid Compat Leg Mode Valid Description Convert 4 packed signed doubleword integers from xmm1 and 4 packed signed doubleword integers from xmm2 m128 into 8 packed unsigned word integers in xmm1 using
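The signed-saturation packing described for PACKSSWB can be sketched in Python on lists of signed integers. The function names are hypothetical; the ordering of the result (destination operand's words in the low half, source operand's words in the high half) is an assumption not spelled out in the excerpt.

```python
def saturate_signed(value, bits):
    """Clamp a signed integer to the range of a `bits`-wide signed integer."""
    low = -(1 << (bits - 1))
    high = (1 << (bits - 1)) - 1
    return max(low, min(high, value))


def packsswb(dest_words, src_words):
    """Pack signed words into signed bytes with signed saturation.

    Assumed order: bytes from `dest_words` first (low half), then `src_words`.
    """
    if len(dest_words) != len(src_words):
        raise ValueError("both operands must hold the same number of words")
    return [saturate_signed(w, 8) for w in list(dest_words) + list(src_words)]
```

For example, packing `[1, -200, 300, 127]` with `[-128, 128, 0, -1]` gives `[1, -128, 127, 127, -128, 127, 0, -1]`: values outside the signed byte range are clamped rather than wrapped.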
120. Bitwise Logical OR. 0F EB /r: POR mm, mm/m64 (Op/En A; 64-bit mode Valid; compat/legacy mode Valid). Bitwise OR of mm/m64 and mm. 66 0F EB /r: POR xmm1, xmm2/m128 (Op/En A; Valid; Valid). Bitwise OR of xmm2/m128 and xmm1. Instruction Operand Encoding, Op/En A: Operand 1 ModRM:reg (r, w); Operand 2 ModRM:r/m (r); Operand 3 NA; Operand 4 NA. PREFETCHh, Prefetch Data Into Caches. 0F 18 /1: PREFETCHT0 m8 (Op/En A; Valid; Valid). Move data from m8 closer to the processor using T0 hint. 0F 18 /2: PREFETCHT1 m8 (Op/En A; Valid; Valid). Move data from m8 closer to the processor using T1 hint. 0F 18 /3: PREFETCHT2 m8 (Op/En A; Valid; Valid). Move data from m8 closer to the processor using T2 hint. 0F 18 /0: PREFETCHNTA m8 (Op/En A; Valid; Valid). Move data from m8 closer to the processor using NTA hint. Instruction Operand Encoding, Op/En A: Operand 1 ModRM:r/m (r); Operand 2 NA; Operand 3 NA; Operand 4 NA. PSADBW, Compute Sum of Absolute Differences. 0F F6 /r: PSADBW mm1, mm2/m64 (Op/En A; Valid; Valid). Computes the absolute differences of the packed unsigned byte integers from mm2/m64 and mm1; differences are then summed to produce an unsigned word integer result. 66 0F F6 /r
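The PSADBW operation described above (absolute differences of packed unsigned bytes, summed into one unsigned word) can be modeled for the 64-bit form as follows. This is a minimal Python sketch with a hypothetical function name; it works on lists of eight byte values rather than on MMX registers.

```python
def psadbw(first_bytes, second_bytes):
    """Sum of absolute differences of two groups of eight unsigned bytes."""
    if len(first_bytes) != 8 or len(second_bytes) != 8:
        raise ValueError("expected eight bytes per operand")
    for b in list(first_bytes) + list(second_bytes):
        if not 0 <= b <= 0xFF:
            raise ValueError("each element must be an unsigned byte")
    # The largest possible sum is 8 * 255 = 2040, which fits in 16 bits.
    return sum(abs(a - b) for a, b in zip(first_bytes, second_bytes))
```

For example, `psadbw([0, 255, 10, 20, 30, 40, 50, 60], [255, 0, 20, 10, 30, 45, 40, 61])` returns `546`.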
121. C mode, the layout of the Interrupt Command Register is shown in Figure 10-12. The lower 32 bits of the ICR in x2APIC mode are identical to the lower half of the ICR in xAPIC mode, except that the Delivery Status bit is removed since it is not needed in x2APIC mode. The destination ID field is expanded to 32 bits in x2APIC mode. [Figure 10-29. Interrupt Command Register (ICR) in x2APIC Mode. Fields: Destination Field (bits 63:32); Reserved; Destination Shorthand (00: No Shorthand; 01: Self; 10: All Including Self; 11: All Excluding Self); Trigger Mode (0: Edge; 1: Level); Level (0 = De-assert; 1 = Assert); Destination Mode (0: Physical; 1: Logical); Delivery Mode (000: Fixed; 001: Reserved; 010: SMI; 011: Reserved; 100: NMI; 101: INIT; 110: Start Up; 111: Reserved); Vector (bits 7:0). Address: 830H (63:0). Value after Reset: 0H.] To send an IPI using the ICR, software must set up the ICR to indicate the type of IPI message to be sent and the destination processor or processors. Self IPIs can also be sent using the SELF IPI register (see Section 10.12.11). A single MSR write to the Interrupt Command Register is required for dispatching an interrupt in x2APIC mode. With the removal of the Delivery Status bit, system software no longer has a reason to read the ICR. It remains readable only to aid in debugging; however, software should not assume the value returned
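Because a single 64-bit MSR write dispatches the interrupt, it can help to see how the fields combine into one value. The Python sketch below only computes the value; it performs no MSR access. The function and parameter names are hypothetical, and the bit positions for Delivery Mode (10:8), Destination Mode (11), Level (14), Trigger Mode (15), and Destination Shorthand (19:18) are assumptions taken from the conventional ICR layout, since the figure in this excerpt is too damaged to read them reliably.

```python
ICR_MSR_ADDRESS = 0x830  # register address given in the excerpt


def encode_x2apic_icr(vector, delivery_mode=0b000, logical_destination=False,
                      level_assert=True, level_triggered=False,
                      shorthand=0b00, destination=0):
    """Build the 64-bit ICR value for x2APIC mode (assumed bit positions)."""
    if not 0 <= vector <= 0xFF:
        raise ValueError("vector must fit in 8 bits")
    if not 0 <= delivery_mode <= 0b111:
        raise ValueError("delivery mode must fit in 3 bits")
    if not 0 <= shorthand <= 0b11:
        raise ValueError("destination shorthand must fit in 2 bits")
    if not 0 <= destination <= 0xFFFFFFFF:
        raise ValueError("destination must fit in 32 bits")
    value = vector                          # bits 7:0
    value |= delivery_mode << 8             # bits 10:8
    value |= int(logical_destination) << 11
    value |= int(level_assert) << 14
    value |= int(level_triggered) << 15
    value |= shorthand << 18                # bits 19:18
    value |= destination << 32              # bits 63:32
    return value
```

Under these assumptions, a fixed-delivery interrupt with vector 30H sent to destination 5 encodes as `0x0000000500004030`, and an NMI broadcast to all processors excluding self encodes as `0xC4400`.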
122. C15_ADDR1 MC15_ADDR 06_2EH 43FH 1087 IA32_MC15_MISC MC15_MISC 06_2EH 440H 1088 IA32_MC16_CTL MC16_CTL 06_2EH 441H 1089 IA32_MC16_STATUS MC16_STATUS 06_2EH 442H 1090 IA32_MC16_ADDR1 MC16_ADDR 06_2EH 443H 1091 IA32_MC16_MISC MC16_MISC 06_2EH 444H 1092 IA32 MC17 17 06 2 445 1093 A32 MC17 STATUS MC17 STATUS 06 2EH 446H 1094 IA32 17 ADDR 17 ADDR 06 2EH 447H 1095 IA32 17 MISC 17 MISC 06 2EH 448H 1096 IA32 MC18 CTL MC18 CTL 06 2 449H 1097 32 MC18 STATUS MC18 STATUS 06 2EH 44AH 1098 IA32 MC18 ADDR MC18 ADDR 06 2EH 44BH 1099 32 MC18 MISC MC18 MISC 06 2EH 44CH 1100 IA32 19 19 06 2 44DH 1101 IA32_MC19_STATUS MC19_STATUS 06_2EH 44EH 1102 IA32_MC19_ADDR1 MC19_ADDR 06_2EH 44FH 1103 IA32_MC19_MISC MC19_MISC 06_2EH 450H 1104 IA32_MC20_CTL MC20_CTL 06_2EH 451H 1105 IA32_MC20_STATUS MC20_STATUS 06_2EH 452H 1106 IA32_MC20_ADDR1 MC20_ADDR 06_2EH 453H 1107 IA32_MC20_MISC MC20_MISC 06_2EH 454H 1108 IA32 MC21 MC21 CTL 06 2EH 455H 1109 A32 MC21 STATUS MC21 STATUS 06 2EH 456H 1110 IA32 MC21 ADDR MC21 ADDR 06 2EH 457H 1111 IA32 MC21 MISC MC21 MISC 06 2EH Intel 64 and 1 32 Architectures Software Developer s Manual Documentation Changes 286 Documentation Changes Table B 5 MSRs in Processors Based on Intel Microarchitecture Continued Nehalem Register Scope
123. Compare packed qwords in xmm2 m128 xmm2 m128 and 1 for equality Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg r w ModRM r m r NA NA Intel 64 and IA 32 Architectures Software Developer s Manual Documentation Changes 123 chenes intel PCMPESTRI Packed Compare Explicit Length Strings Return Index Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode 66 OF 3A61 r PCMPESTRI A Valid Valid Perform a packed imm8 xmm1 comparison of string data xmm2 m128 with explicit lengths imm8 generating an index and storing the result in ECX Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg ModRM r m imm8 NA PCMPESTRM Packed Compare Explicit Length Strings Return Mask Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode 66 OF 60 PCMPESTRM Valid Valid Perform a packed imm8 xmm1 comparison of string data xmm2 m128 with explicit lengths imm8 generating a mask and storing the result in XMMO Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg ModRM r m r imm8 NA PCMPISTRI Packed Compare Implicit Length Strings Return Index Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode 66 OF 63 PCMPISTRI xmm1 A Valid Valid Perform a packed imm8
124. Compat Description En Mode Leg Mode F2 OF 5E r DIVSD xmm1 A Valid Valid Divide low double precision xmm2 m64 floating point value n xmm1 by low double precision floating point value in xmm2 mem64 Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes 55 H intel Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg r w ModRM r m r NA NA DIVSS Divide Scalar Single Precision Floating Point Values Opcode Instruction 64 Bit Compat Description En Mode Leg Mode 5E r 0155 xmm1 Valid Valid Divide low single precision xmm2 m32 floating point value in xmm1 by low single precision floating point value in xmm2 m32 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg r w ModRM r m r NA NA DPPD Dot Product of Packed Double Precision Floating Point Values Opcode Instruction Op 64 Bit Compat Description En Leg Mode 66 OF 41 r DPPDxmml A Valid Valid Selectively multiply packed ib xmm2 m128 DP floating point values imm8 from xmm1 with packed DP floating point values from xmm2 add and selectively store the packed DP floating point values to xmm1 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg r w ModRM r m r imm8 NA Intel 64 and IA 32 Architecture
125. D1 1 D3 1 C1 1ib D1 1 Instruction ROL r m8 imm8 ROL r m8 imm8 ROL r m16 1 ROL r m16 CL ROL r m16 imm8 ROL r m32 1 ROL r m64 1 ROL r m32 CL ROL r m64 CL ROL r m32 imm8 ROL r m64 imm8 ROR r m8 1 ROR r m8 1 ROR r m8 CL ROR r m8 CL ROR r m8 imm8 ROR r m8 imm8 ROR r m16 1 ROR r m16 CL ROR r m16 imm8 ROR r m32 1 Op En C B 64 Bit Mode Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Compat Leg Mode Valid N E Valid Valid Valid Valid N E Valid N E Valid N E Valid N E Valid N E Valid N E Valid Valid Valid Valid Intel 64 and IA 32 Architectures Software Developer s Manual Documentation Changes Description Rotate 8 bits r m8 left imm8 times Rotate 8 bits r m8 left imm8 times Rotate 16 bits r m16 left once Rotate 16 bits r m16 left CL times Rotate 16 bits r m16 left imm8 times Rotate 32 bits r m32 left once Rotate 64 bits r m64 left once Uses a 6 bit count Rotate 32 bits r m32 left CL times Rotate 64 bits r m64 left CL times Uses a 6 bit count Rotate 32 bits r m32 left imm times Rotate 64 bits r m64 left imm8 times Uses a 6 bit count Rotate 8 bits r m8 right once Rotate 8 bits r m8 right once Rotate 8 bits r m8 right CL times Rot
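The rotate descriptions above note that the 64-bit forms use a 6-bit count. A minimal Python model of a left rotate with that count masking is sketched below. The function name `rol` is hypothetical, and the 5-bit mask for operands narrower than 64 bits is an assumption taken from the architecture's general rule, which this excerpt does not state.

```python
def rol(value: int, count: int, size: int) -> int:
    """Rotate `value` left by `count` within an operand of `size` bits."""
    if size not in (8, 16, 32, 64):
        raise ValueError("size must be 8, 16, 32 or 64")
    operand_mask = (1 << size) - 1
    # 64-bit forms use a 6-bit count; narrower forms are assumed to use 5 bits.
    count_mask = 0x3F if size == 64 else 0x1F
    n = (count & count_mask) % size
    value &= operand_mask
    # Bits shifted out on the left re-enter on the right.
    return ((value << n) | (value >> (size - n))) & operand_mask
```

With this model, `rol(0x81, 1, 8)` moves the top bit of the byte around to bit 0.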
126. DR Package See Section 15 3 2 3 A32 MCi ADDR MSRs 447H 1095 MSR 17 MISC Package See Section 15 3 24 IA32 MCi MISC MSRs 448H 1096 MSR 18 Package See Section 15 3 2 1 A32 MCi MSRs 449H 1097 MSR MC18 Package See Section 15 3 22 A32 MCi STATUS STATUS MSRS and Appendix E 44AH 1098 MSR 18 ADDR Package See Section 15 3 2 3 IA32 MCi ADDR MSRs 44BH 1099 MSR MC18 MISC Package See Section 15 3 24 32 MCi MISC MSRs 44CH 1100 19 CTL Package See Section 15 3 2 1 4 32 MCi MSRs 44DH 1101 MSR_MC19_ Package See Section 15 3 2 2 4 32 MCi STATUS STATUS MSRS and Appendix E 44EH 1102 MSR 19 ADDR Package See Section 15 3 2 3 IA 32 MCi ADDR MSRs 44FH 1103 MSR MC19 MISC Package See Section 15 3 24 A32 MCi MISC MSRs 450H 1104 MSR MC20 CTL Package See Section 15 3 2 1 1 32 MCi MSRs 451H 1105 MSR MC20 Package See Section 15 3 22 A32 MCi STATUS STATUS MSRS and Appendix E 452H 1106 MSR MC20 ADDR Package See Section 15 3 2 3 IA 32 MCi ADDR MSRs 453H 1107 MSR MC20 MISC Package See Section 15 3 24 A32 MCi MISC MSRs 454H 1108 MSR MC21 CTL Package See Section 15 3 2 1 1 32 MCi MSRs 455H 1109 MSR MC21 Package See Section 15 3 22 A32 MCi STATUS STATUS MSRS and Appendix E 456H 1110 MSR MC21 ADDR Package See Section 15 3 2 3
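The register addresses in the rows above follow a regular pattern: each machine-check bank occupies four consecutive MSR addresses in the order CTL, STATUS, ADDR, MISC. The sketch below reproduces the listed addresses under that assumption. The base address 400H and the helper name `mc_msr_address` are inferred from the rows for illustration and are not stated in the excerpt.

```python
MC_BANK_BASE = 0x400  # assumed address of the bank 0 CTL register
MC_REGISTERS = ("CTL", "STATUS", "ADDR", "MISC")

def mc_msr_address(bank: int, register: str) -> int:
    """Return the MSR address of one machine-check register of a bank."""
    if bank < 0:
        raise ValueError("bank must be non-negative")
    # Four registers per bank, in the order given by MC_REGISTERS.
    return MC_BANK_BASE + 4 * bank + MC_REGISTERS.index(register)
```

For example, `hex(mc_msr_address(18, "CTL"))` gives `'0x448'`, matching the 448H (1096) row.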
127. E = 1, determines whether the translation is global (see Section 4.10); ignored otherwise. Bits 11:9: Ignored. Bit 12 (PAT): Indirectly determines the memory type used to access the 1-GByte page referenced by this entry (see Section 4.9). Bits 29:13: Reserved (must be 0). Bits (M-1):30: Physical address of the 1-GByte page referenced by this entry. Bits 51:M: Reserved (must be 0). Bits 62:52: Ignored. NOTE 1: The PS flag of a PDPTE is reserved and must be 0 (if the P flag is 1) if 1-GByte pages are not supported. See Section 4.1.4 for how to determine whether 1-GByte pages are supported. Table 4-14. Format of an IA-32e Page-Directory-Pointer-Table Entry (PDPTE) that Maps a 1-GByte Page (Continued). Bit 63 (XD): If IA32_EFER.NXE = 1, execute-disable (if 1, instruction fetches are not allowed from the 1-GByte page controlled by this entry; see Section 4.6); otherwise, reserved (must be 0). NOTE 1: The PAT is supported on all processors that support IA-32e paging. Bits 51:30 are from the PDPTE. Bits 29:0 are from the original linear address. If the PDPTE's PS flag is 0, a 4-KByte naturally aligned page directory is located at the physical address specified in bits 51:12 of the PDPTE (see Table 4-15). A page directory comprises 512 64-bit entries (PDEs).
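The address composition for a 1-GByte page described above (bits 51:30 from the PDPTE, bits 29:0 from the linear address) can be sketched as follows. This is an illustrative model only: the function name `translate_1gb` is hypothetical, and it performs no checks of the P, PS, or reserved bits.

```python
FRAME_MASK_1GB = ((1 << 52) - 1) ^ ((1 << 30) - 1)  # bits 51:30
OFFSET_MASK_1GB = (1 << 30) - 1                     # bits 29:0

def translate_1gb(pdpte: int, linear_address: int) -> int:
    """Combine a PDPTE that maps a 1-GByte page with a linear address."""
    frame = pdpte & FRAME_MASK_1GB             # page frame from the PDPTE
    offset = linear_address & OFFSET_MASK_1GB  # offset within the page
    return frame | offset
```

Flag bits in the low part of the entry and the XD bit at position 63 fall outside `FRAME_MASK_1GB`, so they do not leak into the result.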
128. ET instruction. This action indicates that the servicing of the current interrupt is complete and the local APIC can issue the next interrupt from the ISR. Figure 10-21. EOI Register (bits 31:0; address: 0FEE0 00B0H; value after reset: 0H). Upon receiving an EOI, the APIC clears the highest priority bit in the ISR and dispatches the next highest priority interrupt to the processor. If the terminated interrupt was a level-triggered interrupt, the local APIC also sends an end-of-interrupt message to all I/O APICs. System software may prefer to direct EOIs to specific I/O APICs rather than having the local APIC send end-of-interrupt messages to all I/O APICs. Software can inhibit the broadcast of EOI messages by setting bit 12 of the Spurious Interrupt Vector Register (see Section 10.9). If this bit is set, a broadcast EOI is not generated on an EOI cycle even if the associated TMR bit indicates that the current interrupt was level-triggered. The default value for the bit is 0, indicating that EOI broadcasts are performed. Bit 12 of the Spurious Interrupt Vector Register is reserved to 0 if the processor does not support suppression of EOI broadcasts. Support for EOI broadcast suppression is reported in bit 24 in the Local APIC Version Register (see Section 10.4.8); the feature is supported if
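The two bit positions named above (bit 12 of the Spurious Interrupt Vector Register and bit 24 of the Local APIC Version Register) can be illustrated on plain register values. This is a sketch, not driver code: the function names are hypothetical, no APIC registers are actually read or written, and it assumes the feature is reported by bit 24 being 1, which the truncated sentence above does not show.

```python
SVR_SUPPRESS_EOI_BROADCAST = 1 << 12     # Spurious Interrupt Vector Register
VERSION_SUPPRESS_SUPPORTED = 1 << 24     # Local APIC Version Register

def supports_eoi_suppression(version_register: int) -> bool:
    """Assumed convention: bit 24 set means suppression is supported."""
    return bool(version_register & VERSION_SUPPRESS_SUPPORTED)

def svr_with_eoi_suppression(svr: int, version_register: int) -> int:
    """Return the SVR value to write in order to inhibit EOI broadcasts."""
    if not supports_eoi_suppression(version_register):
        # Bit 12 is reserved to 0 when the feature is absent.
        return svr & ~SVR_SUPPRESS_EOI_BROADCAST
    return svr | SVR_SUPPRESS_EOI_BROADCAST
```

Checking the version register first keeps software from setting a reserved bit on processors without the feature.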
129. En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg r w ModRM r m r NA NA SWAPGS Swap GS Base Register Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode OF 01 7 SWAPGS A Valid Invalid Exchanges the current GS base register value with the value contained in MSR address 0000102 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A NA NA NA NA Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes 194 chenes intel SYSCALL Fast System Call Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode OF 05 SYSCALL A Valid Invalid Fast call to privilege level 0 system procedures Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A NA NA NA NA SYSENTER Fast System Call Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode OF 34 SYSENTER A Valid Valid Fast call to privilege level 0 system procedures Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A NA NA NA NA Operation IF CRO PE 0 THEN GP 0 FI IF SYSENTER CS 5 15 2 0 THEN GP 0 FI EFLAGS VM lt 0 ensures protected mode execution EFLAGS IF lt 0 Mask interrupts EFLAGS RF lt 0 CS SEL lt SYSENTER CS MSR Operating system provides CS Set rest of CS to a fixed value CS BASE lt 0 Flat s
130. HDQ mm A Valid Valid Unpack and interleave high mm m64 order doublewords from mm and mm m64 into mm 66 OF 6A r PUNPCKHDQ A Valid Valid Unpack and interleave high xmm1 order doublewords from xmm2 m128 xmm1 and xmm2 m128 into xmm1 66 OF 6D r PUNPCKHQDQ A Valid Valid Unpack and interleave high xmm1 order quadwords from xmm2 m128 xmm1 and xmm2 m128 into xmm1 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg r w ModRM r m r NA NA Intel 64 and IA 32 Architectures Software Developer s Manual Documentation Changes 156 Documentation Changes intel PUNPCKLBW PUNPCKLWD PUNPCKLDQ PUNPCKLQDQ Unpack Low Data Opcode Instruction Op 64 Bit En Mode OF 60 r PUNPCKLBW mm A Valid mm m32 66 OF 60 r PUNPCKLBW A Valid xmml xmm2 m128 OF 61 r PUNPCKLWD mm A Valid mm m32 66 OF 61 r PUNPCKLWD A Valid xmml xmm2 m128 OF 62 r PUNPCKLDQ mm A Valid mm m32 66 OF 62 r PUNPCKLDQ A Valid xmm2 m128 66 OF 6C r PUNPCKLQDQ A Valid xmml xmm2 m128 Compat Leg Mode Valid Valid Valid Valid Valid Valid Valid Description Interleave low order bytes from mm and mm m32 into mm Interleave low order bytes from xmm1 and xmm2 m128 into xmm1 Interleave low order words from mm and mm m32 into mm Interleave low order words from xmm1 and xmm2 m128 into xmm1 Interleave low order doublewords from mm and mm m32 into mm Inter
131. HG8B m64 Valid Valid Compare EDX EAX with m64 If equal set ZF and load ECX EBX into m64 Else clear ZF and load m64 into EDX EAX REX W 0F C7 CMPXCHG16B A Valid Compare RDX RAX with 1 m128 m128 m128 If equal set ZF and load RCX RBX into m128 Else clear ZF and load m128 into RDX RAX NOTES See IA 32 Architecture Compatibility section below Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM r m w NA NA NA Scalar Ordered Double Precision Floating Point Values and Set EFL Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode 66 OF 2F r COMISD xmm1 A Valid Valid Compare low double xmm2 m64 precision floating point values in xmm1 and xmm2 mem64 and set the EFLAGS flags accordingly Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg ModRM r m r NA NA Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes 40 H intel COMISS Compare Scalar Ordered Single Precision Floating Point Values and Set EFLAGS Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode OF 2F r COMISS xmm1 A Valid Valid Compare low single xmm2 m32 precision floating point values in xmm1 and xmm2 mem32 and set the EFLAGS flags accordingly Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg
132. INVLPGm A Valid Valid Invalidate TLB Entry for page that contains m NOTES See the IA 32 Architecture Compatibility section below Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM r m r NA NA NA IRET IRETD Interrupt Return Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode CF IRET A Valid Valid Interrupt return 16 bit operand size CF IRETD A Valid Valid Interrupt return 32 bit operand size REX W CF IRETQ A Valid N E Interrupt return 64 bit operand size Instruction Operand Encoding Op En Operand 1 A NA Operand 2 Operand 3 Operand 4 NA NA NA Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes 68 Documentation Changes Jec Jump if Condition Is Met intel Opcode 71 cb 73 cb 72 cb 76 cb 72 cb E3 cb E3 cb E3 cb 74 cb 7F cb 7D cb 7C cb 7E cb 76 cb 72 cb 73 cb 17 cb 73 cb 75 cb 7E cb 7C cb 7D cb 7F cb Instruction JA rel8 JAE rel8 JB rel8 JBE rel8 JC rel8 JCXZ rel8 JECXZ rel8 JRCXZ rel8 JE rel8 JG rel8 JGE rel8 JL rel8 JLE rel8 JNA rel8 JNAE rel8 JNB rel8 JNBE rel8 JNC rel8 JNE rel8 JNG rel8 JNGE rel8 JNL rel8 JNLE rel8 Op En A gt gt gt gt gt 64 Bit Mode Valid Valid Valid Valid Valid N E Valid Valid Valid Valid Valid Valid Valid Valid Valid Va
133. Leg Mode Valid N E Valid Valid N E Valid N E Valid Valid N E Valid N E Valid Valid Intel 64 and IA 32 Architectures Software Developer s Manual Documentation Changes Description Input E CX bytes from port DX into ES E DI Input RCX bytes from port DX into RDI Input E CX words from port DX into EST E DI Input E CX doublew ords from port DX into EST E DI Input RCX default size from port DX into RDI Move E CX bytes from DS1 E SI to ES1 E DI Move RCX bytes from RSI to RDI Move E CX words from DS1 E SI to ES1 E DI Move E CX doublew ords from DS 1 E SI to ES E DI Move RCX quadwords from RSI to RDI Output E CX bytes from DS 1 E SI to port DX Output RCX bytes from RSI to port DX Output E CX words from DS 1 E SI to port DX Output E CX doublew ords from DS E SI to port DX 170 Documentation Changes Intel 64 and 1 32 Architectures Software Developer s Manual Documentation Changes Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode F3 REX W 6F REP OUTS DX A Valid N E Output RCX default size r m32 from RSI to port DX F3 AC REP LODS AL A Valid Valid Load E CX bytes from DS E SI to AL F3 REX W AC REP LODS AL A Valid N E Load RCX bytes from RSI to AL F3AD REP LODS AX A Valid Valid Load E CX words from DS1 E SI to AX F3 AD REP LODS EAX A Valid Valid Load E CX doublewords from 05 51 to EAX
134. MSR set to the I state from M state 301H with mask 1H OCH 08H UNC_GQ_SNOOP GOT Counts the number of remote snoops Requires 01 S that have requested a cache line be writing MSR set to the I state from S state 301H with mask 4H 2AH 07H UNC QMC OCCUPAN Normal read request occupancy for CY ANY any channel 32H 01H UNC IMC RETRY CH Counts number of IMC DRAM channel 0 0 retries DRAM retry only occurs when configured in RAS mode 32H 02H UNC IMC RETRY CH Counts number of IMC DRAM channel 1 1 retries DRAM retry only occurs when configured in RAS mode 32H 04H UNC IMC RETRY CH Counts number of IMC DRAM channel 2 2 retries DRAM retry only occurs when configured in RAS mode 32H 07H UNC IMC RETRYAN Counts number of IMC DRAM retries Y from any channel DRAM retry only occurs when configured in RAS mode 33H 01H UNC QHL FRC ACK Counts number of Force Acknowledge CNFLTS IOH Conflict messages sent by the Quickpath Home Logic to the IOH 33H 02H UNC QHL FRC ACK Counts number of Force Acknowledge CNFLTS REMOTE Conflict messages sent by the Quickpath Home Logic to the remote home Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes 278 Documentation Changes UNC_QHL_FRC_ACK_ CNFLTS ANY QHL SLEEPS O H ORDER Counts number of Force Acknowledge Conflict messages sent by the Quickpath Home Logic Counts number of occurrences a request was put to sleep due to
135. MSR and the EAX register is loaded with the low-order 32 bits. (On processors that support the Intel 64 architecture, the high-order 32 bits of each of RAX and RDX are cleared.) The processor monotonically increments the time-stamp counter MSR every clock cycle and resets it to 0 whenever the processor is reset. See "Time Stamp Counter" in Chapter 16 of the Intel 64 and IA-32 Architectures Software Developer's Manual for specific details of the time-stamp counter behavior. When in protected or virtual-8086 mode, the time stamp disable (TSD) flag in register CR4 restricts the use of the RDTSC instruction as follows. When the TSD flag is clear, the RDTSC instruction can be executed at any privilege level; when the flag is set, the instruction can only be executed at privilege level 0. (When in real-address mode, the RDTSC instruction is always enabled.) The time-stamp counter can also be read with the RDMSR instruction when executing at privilege level 0. The RDTSC instruction is not a serializing instruction. It does not necessarily wait until all previous instructions have been executed before reading the counter. Similarly, subsequent instructions may begin execution before the read operation is performed. If software requires RDTSC to be executed only after all previous instructions have completed locally, it can either use RDTSCP (if the processor supports that instruction) or execute the sequence LFENCE; RDTSC. This
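The register split and the CR4.TSD rule described above can be modelled in a few lines of Python. This is a sketch on plain integers rather than real hardware access: both function names are hypothetical, and it assumes EDX carries the high-order 32 bits, which the truncated start of the passage implies but does not show.

```python
def tsc_from_registers(edx: int, eax: int) -> int:
    """Combine EDX (assumed high half) and EAX (low half) into 64 bits."""
    return ((edx & 0xFFFFFFFF) << 32) | (eax & 0xFFFFFFFF)

def rdtsc_allowed(cr4_tsd: bool, cpl: int, real_address_mode: bool = False) -> bool:
    """Apply the CR4.TSD restriction described in the text."""
    if real_address_mode:
        return True          # RDTSC is always enabled in real-address mode
    if not cr4_tsd:
        return True          # TSD clear: any privilege level
    return cpl == 0          # TSD set: privilege level 0 only
```

Masking both halves to 32 bits mirrors the note that the upper halves of RAX and RDX are cleared on Intel 64 processors.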
136. ModRM r m NA NA CPUID: CPU Identification. Opcode: 0F A2. Instruction: CPUID. Op/En: A. 64-Bit Mode: Valid. Compat/Leg Mode: Valid. Description: Returns processor identification and feature information to the EAX, EBX, ECX, and EDX registers, as determined by input entered in EAX (in some cases, ECX as well). Instruction Operand Encoding: Op/En A; Operands 1 to 4: NA. Table 3-20. Information Returned by CPUID Instruction (Continued). Columns: Initial EAX Value; Information Provided about the Processor. Basic CPUID Information. 80000001H: EAX: Extended Processor Signature and Feature Bits. EBX: Reserved. ECX: Bit 0: LAHF/SAHF available in 64-bit mode; Bits 31:1: Reserved. EDX: Bits 10:0: Reserved; Bit 11: SYSCALL/SYSRET available (when in 64-bit mode); Bits 19:12: Reserved = 0; Bit 20: Execute Disable Bit available; Bits 25:21: Reserved = 0; Bit 26: 1-GByte pages are available if 1; Bit 27: RDTSCP and IA32_TSC_AUX are available if 1; Bit 28: Reserved = 0; Bit 29: Intel 64 Architecture available if 1; Bits 31:30: Reserved = 0. Table 3-24. More on Feature Information Returned in the EDX Register. Columns: Bit; Mnemonic; Description. Bit 13, Page Global Bit: The global bit is supported in paging structure entries that m
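The EDX feature bits listed above for CPUID leaf 80000001H can be decoded with a small lookup table. The sketch below works on an EDX value supplied by the caller and does not execute CPUID itself; the names `EXT_EDX_FEATURES` and `decode_ext_edx` are hypothetical.

```python
# Bit positions as listed for EDX of leaf 80000001H in the table above.
EXT_EDX_FEATURES = {
    11: "SYSCALL/SYSRET",
    20: "Execute Disable Bit",
    26: "1-GByte pages",
    27: "RDTSCP and IA32_TSC_AUX",
    29: "Intel 64 Architecture",
}

def decode_ext_edx(edx: int) -> list:
    """Return the names of the features whose bits are set, in bit order."""
    return [name for bit, name in sorted(EXT_EDX_FEATURES.items())
            if (edx >> bit) & 1]
```

Reserved bits are simply ignored, so the function stays valid if later processors define more of them.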
137. NA CLFLUSH: Flush Cache Line. Opcode: 0F AE /7. Instruction: CLFLUSH m8. Op/En: A. 64-Bit Mode: Valid. Compat/Leg Mode: Valid. Description: Flushes cache line containing m8. Instruction Operand Encoding: Op/En A; Operand 1: ModRM:r/m; Operands 2 to 4: NA. Description: Invalidates the cache line that contains the linear address specified with the source operand from all levels of the processor cache hierarchy (data and instruction). The invalidation is broadcast throughout the cache coherence domain. If, at any level of the cache hierarchy, the line is inconsistent with memory (dirty), it is written to memory before invalidation. The source operand is a byte memory location. The availability of CLFLUSH is indicated by the presence of the CPUID feature flag CLFSH (bit 19 of the EDX register; see "CPUID: CPU Identification" in this chapter). The aligned cache line size affected is also indicated with the CPUID instruction (bits 8 through 15 of the EBX register when the initial value in the EAX register is 1). The memory attribute of the page containing the affected line has no effect on the behavior of this instruction. It should be noted that processors are free to speculatively fetch and cache data from system memory regions assigned a memory type allowing for spe
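The two CPUID fields mentioned above (the CLFSH flag in EDX bit 19 and the line-size field in EBX bits 15:8, both for initial EAX = 1) can be decoded as sketched below. The function name `clflush_info` is hypothetical, the register values are passed in rather than read from hardware, and the multiplication by 8 to obtain bytes comes from the CPUID instruction description rather than from this excerpt.

```python
def clflush_info(cpuid1_ebx: int, cpuid1_edx: int):
    """Return (CLFLUSH supported, cache line size in bytes) for CPUID leaf 1."""
    supported = bool((cpuid1_edx >> 19) & 1)      # CLFSH feature flag
    # EBX[15:8] counts 8-byte units (per the CPUID description, not this excerpt).
    line_size_bytes = ((cpuid1_ebx >> 8) & 0xFF) * 8
    return supported, line_size_bytes
```

Software that flushes a buffer would step through it in increments of the returned line size.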
138. NA NA NA Description: Causes the processor's LOCK signal to be asserted during execution of the accompanying instruction (turns the instruction into an atomic instruction). In a multiprocessor environment, the LOCK signal ensures that the processor has exclusive use of any shared memory while the signal is asserted. Note that in later Intel 64 and IA-32 processors (including the Pentium 4, Intel Xeon, and P6 family processors), locking may occur without the LOCK signal being asserted. See the "IA-32 Architecture Compatibility" section below. The LOCK prefix can be prepended only to the following instructions and only to those forms of the instructions where the destination operand is a memory operand: ADD, ADC, AND, BTC, BTR, BTS, CMPXCHG, CMPXCHG8B, DEC, INC, NEG, NOT, OR, SBB, SUB, XOR, XADD, and XCHG. If the LOCK prefix is used with one of these instructions and the source operand is a memory operand, an undefined opcode exception (#UD) may be generated. An undefined opcode exception will also be generated if the LOCK prefix is used with any instruction not in the above list. The XCHG instruction always asserts the LOCK signal regardless of the presence or absence of the LOCK prefix. The LOCK prefix is typically used with the BTS instruction to perform a read-modify-write operation on a memory location in a shared memory environment. The integrity of the LOCK prefix is not affected by the alignment of the memory field. Memory loc
139. NB mm1 A Valid Valid Negate zero preserve mm2 m64 packed byte integers in mm1 depending on the corresponding sign in mm2 m64 66 OF 38 08 PSIGNB A Valid Valid Negate zero preserve xmm2 m128 packed byte integers in xmm1 depending on the corresponding sign in xmm2 m128 OF 38 09 r PSIGNW mm1 A Valid Valid Negate zero preserve mm2 m64 packed word integers in mm1 depending on the corresponding sign in mm2 m128 66 0F 38 09 r PSIGNW xmm1 A Valid Valid Negate zero preserve xmm2 m128 packed word integers in xmm1 depending on the corresponding sign in xmm2 m128 OF 38 0A r PSIGND mm1 A Valid Valid Negate zero preserve mm2 m64 packed doubleword integers in mm1 depending on the corresponding sign in mm2 m128 66 OF 38 r PSIGND xmm1 A Valid Valid Negate zero preserve xmm2 m128 packed doubleword integers in xmm1 depending on the corresponding sign in xmm2 m128 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg r w ModRM r m r NA NA Intel 64 and IA 32 Architectures Software Developer s Manual Documentation Changes 149 Documentation Changes PSLLDQ Shift Double Quadword Left Logical Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode 660F 73 7 ib PSLLDQ xmm1 A Valid Valid Shift xmm1 left by imm8 imm8 bytes while shifting in 0s Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM r m w imm8 N
140. OCH 02H UNC_GQ_SNOOP GOT Counts the number of remote snoops 01 that requested cache line be set to the state OCH 04H UNC_GQ_SNOOP GOT Counts the number of remote snoops Requires O S HIT E that have requested a cache line be writing MSR set to the S state from E state 301H with mask 2H OCH 04H UNC_GQ_SNOOP GOT Counts the number of remote snoops Requires O_S_HIT_F that have requested a cache line be writing MSR set to the S state from F forward 301H with state mask 8H 277 Documentation Changes OCH 04H UNC_GQ_SNOOP GOT Counts the number of remote snoops Requires O S HIT M that have requested a cache line be writing MSR set to the S state from M state 301H with mask 1H OCH 04H UNC_GQ_SNOOP GOT Counts the number of remote snoops Requires O S HIT S that have requested a cache line be writing MSR set to the S state from S state 301H with mask 4H OCH 08H UNC_GQ_SNOOP GOT Counts the number of remote snoops Requires O HIT E that have requested a cache line be writing MSR set to the state from E state 301H with mask 2H OCH 08H UNC GQ SNOOP GOT Counts the number of remote snoops Requires O HIT that have requested a cache line be writing MSR set to the I state from F forward 301H with state mask 8H OCH 08H UNC_GQ_SNOOP GOT Counts the number of remote snoops Requires O HIT M that have requested a cache line be writing
141. Package See Section 15 3 22 A32 MCi STATUS STATUS MSRS and Appendix E 42AH 1066 MSR 10 ADDR Package See Section 15 3 2 3 IA32 MCi ADDR MSRs 42BH 1067 MSR 10 MISC Package See Section 15 3 24 A32 MCi MISC MSRs 42CH 1068 11 Package See Section 15 3 2 1 4 32 MCi MSRs 42DH 1069 MSR_MC11_ Package See Section 15 3 2 2 4 32 MCi STATUS STATUS MSRS and Appendix E 42bEH 1070 MSR 11 ADDR Package See Section 15 3 2 3 A32 MCi ADDR MSRs 42FH 1071 MSR 11 MISC See Section 15 3 24 IA32 MCi MISC MSRs 430H 1072 MSR 12 Package See Section 15 3 2 1 IA32_MCi_CTL MSRs 431H 1073 MSR_MC12_ Package See Section 15 3 22 A32 MCi STATUS STATUS MSRS and Appendix E 432H 1074 MSR 12 ADDR Package See Section 15 3 2 3 IA 32 MCi ADDR MSRs 433H 1075 MSR MC12 MISC Package See Section 15 3 24 A32 MCi MISC MSRs 434H 1076 MSR MC13 CTL Package See Section 15 3 2 1 1 32 MCi MSRs 435H 1077 MSR MC13 Package See Section 15 3 22 A32 MCi STATUS STATUS MSRS and Appendix E 436H 1078 MSR MC13 ADDR Package See Section 15 3 2 3 IA 32 MCi ADDR MSRs 437H 1079 MSR MC13 MISC Package See Section 15 3 24 IA32 MCi MISC MSRs 438H 1080 MSR 14 CTL Package See Section 15 3 2 1 1 32 MCi MSRs 439H 1081 MSR_MC14_ Package See Sect
142. Port Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode E6 ib OUT imm8 AL A Valid Valid Output byte in AL to I O port address imm8 E7 ib OUT imm8 AX A Valid Valid Output word in AX to I O port address imm8 E7 ib OUT imm8 EAX A Valid Valid Output doubleword in EAX to I O port address imm8 EE OUT DX AL B Valid Valid Output byte in AL to 1 0 port address in DX EF OUT DX AX B Valid Valid Output word in AX to I O port address in DX EF OUT DX EAX B Valid Valid Output doubleword in EAX to 1 0 port address in DX NOTES See IA 32 Architecture Compatibility section below Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes 113 Documentation Changes Instruction Operand Encoding Op En Operand 1 A imm8 B NA Operand 2 NA NA Operand 3 Operand 4 NA NA NA NA IA 32 Architecture Compatibility After executing an OUT instruction the Pentium processor ensures that the EWBE pin has been sampled active before it begins to execute the next instruction Note that the instruction can be prefetched if EWBE is not active but it will not be executed until the EWBE pin is sampled active Only the Pentium processor family has the EWBE pin OUTS OUTSB OUTSW OUTSD Output String to Port 6E 6F 6F 6E 6F 6F Opcode Instruction OUTS DX m8 OUTS DX m16 OUTS DX m32 OUTSB OUTSW OUTSD Op En A 64 Bit Mode Va
143. REX 0A r OR r8 r m8 D Valid N E r8 OR r m8 OB r OR r16 r m16 D Valid Valid r16 OR r m16 OB r OR r32 r m32 D Valid Valid r32 OR r m32 REX W 0B r ORr64 r m64 D Valid N E r64 OR r m64 NOTES n 64 bit mode r m8 can not be encoded to access the following byte registers if a REX prefix is used AH BH CH DH Instruction Operand Encoding Op En Operand 1 A B C D AL AX EAX RAX ModRM r m w ModRM r m w ModRM reg r w Operand 2 imm8 16 32 imm8 16 32 ModRM reg r ModRM r m r Operand 3 Operand 4 NA NA NA NA NA NA NA NA Intel 64 and IA 32 Architectures Software Developer s Manual Documentation Changes 112 Documentation Changes ORPD Bitwise Logical OR of Double Precision Floating Point Values Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode 66 OF 56 r ORPD xmm1 A Valid Valid Bitwise OR of xmm2 m128 xmm2 m128 and xmm1 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg r w ModRM r m r NA NA ORPS Bitwise Logical OR of Single Precision Floating Point Values Opcode Instruction Op 64 Bit Compat Description En Leg Mode OF 56 r ORPS A Valid Valid Bitwise OR of xmm2 m128 xmm2 m128 and xmm1 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg r w ModRM r m r NA NA OUT Output to
144. REX W 0F7E MOVQr m64 mm B Valid N E Move quadword from mm to r m64 66 OF 6E r MOVD xmm A Valid Valid Move doubleword from r m32 r m32 to xmm 66 REX WOF6E xmm A Valid N E Move quadword from r m64 Ir r m64 to xmm 66 OF 7E r MOVD r m32 B Valid Valid Move doubleword from xmm xmm register to r m32 66 REX W OF MOVQ r m64 B Valid N E Move quadword from xmm xmm register to r m64 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 ModRM reg w ModRM r m NA NA ModRM r m ModRM reg MOVDDUP Move One Double FP Duplicate Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode F2 12 r MOVDDUP xmm1 A Valid Valid Move one double precision xmm2 m64 floating point value from the lower 64 bit operand in xmm2 m64 to xmm1 and duplicate Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 ModRM reg w ModRM r m Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes 93 MOVDQA Move Aligned Double Quadword Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode 66 OF 6F r MOVDQA xmml A Valid Valid Move aligned double xmm2 m128 quadword from xmm2 m128 to xmm1 66 OF 7F r MOVDQA B Valid Valid Move aligned double xmm2 m128 quadword from xmm1 to 1 2 128 Instruction
145. RV flag in the MSR STATUS register is clear When not implemented in the processor all reads and writes to this MSR will cause a general protection exception 40FH 1039 MSR MC3 MISC Core See Section 15 3 2 4 4 32 MCi MISC MSRs 410H 1040 Core See Section 15 3 2 1 IA32 MCi CTL MSRs 411H 1041 MSR_MC4_ Core See Section 15 3 22 IA32 MCi STATUS STATUS MSRS 412H 1042 MSR MC4 ADDR Core See Section 15 32 3 IA32_MCi_ADDR MSRs The MSR_MC3_ADDR register is either not implemented or contains no address if the ADDRV flag in the MSR MC3 STATUS register is clear When not implemented in the processor all reads and writes to this MSR will cause a general protection exception 413H 1043 MC4 MISC Core See Section 15 3 24 A32 MCi MISC MSRs 414H 1044 MSR MC5 CTL Core See Section 15 3 2 1 IA32 MCi CTL MSRs 415H 1045 MSR MC5 Core See Section 15 3 22 IA32 MCi STATUS STATUS MSRS 416H 1046 MSR_MC5_ADDR Core See Section 15 3 2 3 IA32 MCi ADDR MSRs 417H 1047 MC5 MISC Core See Section 15 3 24 1 32 MCi MISC MSRs 418H 1048 MC6 Package See Section 15 3 2 1 A32 MCi MSRs 419H 1049 MSR_MC6_ Package See Section 15 3 2 2 IA32_MCi_STATUS STATUS MSRS and Appendix E 41AH 1050 MSR MC6 ADDR Package See Section 15 3 2 3 A32 MCi ADDR MSRs 41BH 1051 MC6 MISC Package See Section 15 3 24 A32 MCi MISC MSR
146. SK GATE ELSE GOTO TRAP OR INTERRUPT GATE PE 1 trap interrupt gate FI END IA 32e MODE IF vector number 16 15 is not in IDT limits or selected IDT descriptor is not an interrupt or trap gate type THEN GP vector number lt 3 2 EXT EXT is bit 0 in error code Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes 66 H intel FI IF software interrupt Generated by INT n INT 3 but not INTO THEN IF gate descriptor DPL CPL THEN GP vector number lt 3 2 PE 1 DPL CPL software interrupt FI ELSE Generated by INTO UD FI IF gate not present THEN NP vector number lt 3 2 EXT Fl IF vector_number 16 IST z 0 NewRSP lt TSS ISTx Fl GOTO TRAP OR INTERRUPT GATE Trap interrupt gate END Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes 67 Documentation Changes intel INVD Invalidate Internal Caches Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode OF 08 INVD A Valid Valid Flush internal caches initiate flushing of external caches NOTES See the IA 32 Architecture Compatibility section below Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A NA NA NA NA INVLPG Invalidate TLB Entry Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode OF 01 7
147. T instruction Intel 64 and 1 32 Architectures Software Developer s Manual Documentation Changes 282 Documentation Changes intel 14 Updates to Appendix B Volume 3B Change bars show changes to Appendix B of the Intel 64 and IA 32 Architectures Soft ware Developer s Manual Volume 3B System Programming Guide Part 2 Table B 1 CPUID Signature Values of DisplayFamily DisplayModel DisplayFamily DisplayModel Processor Families Processor Number Series 06 1AH Intel Core i7 Processor Intel Xeon Processor 5500 series 06 1 06 1FH Intel Core i7 and i5 Processor 06 2 Intel Xeon Processors based on Intel Microarchitecture Nehalem 06 25H 06 2CH Next Generation Intel Processor Westmere 06 1DH Intel Xeon Processor MP 7400 series 06 17H Intel Xeon Processor 5200 5400 series Intel Core 2 Quad processors 8000 9000 series 06 OFH Intel Xeon Processor 3000 3200 5100 5300 7300 series Intel Core 2 Quad processor 6000 series Intel Core 2 Extreme 6000 series Intel Core 2 Duo 4000 5000 6000 7000 series processors Intel Pentium dual core processors 06 OEH Intel Core Duo Intel Core Solo processors 06_0DH Intel Pentium M processor 06_1CH Intel Atom processor OF 06H Intel Xeon processor 7100 5000 Series Intel Xeon Processor MP Intel Pentium 4 Pentium D processors OF 04H Intel Xeon Processor Intel Xeon Processor MP Intel Pentium 4 P
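The DisplayFamily_DisplayModel signatures in Table B-1 above are derived from the EAX value returned by CPUID with initial EAX = 1. The derivation rule is not part of this excerpt; the sketch below follows the rule given in the CPUID instruction description, and the function name `display_signature` is hypothetical.

```python
def display_signature(cpuid1_eax: int) -> str:
    """Format CPUID.01H:EAX as a DisplayFamily_DisplayModel string."""
    family = (cpuid1_eax >> 8) & 0xF
    model = (cpuid1_eax >> 4) & 0xF
    extended_model = (cpuid1_eax >> 16) & 0xF
    extended_family = (cpuid1_eax >> 20) & 0xFF
    # The extended family is added only when the family field is 0FH.
    display_family = family + extended_family if family == 0xF else family
    # The extended model is prepended only for family 06H or 0FH.
    if family in (0x6, 0xF):
        display_model = (extended_model << 4) | model
    else:
        display_model = model
    return f"{display_family:02X}_{display_model:02X}H"
```

For instance, a signature of `0x000106A5` yields `"06_1AH"`, the first row of the table.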
148. T memory type for this 1-GByte page (see Section 25.2.4). Bit 7: Must be 1 (otherwise, this entry references an EPT page directory). Bits 11:8: Ignored. Bits 29:12: Reserved (must be 0). Bits (N-1):30: Physical address of the 1-GByte page referenced by this entry. Bits 51:N: Reserved (must be 0). Bits 63:52: Ignored. NOTES: N is the physical-address width supported by the logical processor. Not all processors allow bit 7 of an EPT PDPTE to be set to 1. Software should read the VMX capability MSR IA32_VMX_EPT_VPID_CAP (see Appendix G.10) to determine whether this is allowed. Bits 63:52 are all 0. Bits 51:30 are from the EPT PDPTE. Bits 29:0 are from the original guest-physical address. If bit 7 of the EPT PDPTE is 0, a 4-KByte naturally aligned EPT page directory is located at the physical address specified in bits 51:12 of the EPT PDPTE (see Table 25-3). An EPT page directory comprises 512 64-bit entries (PDEs). An EPT PDE is selected using the physical address defined as follows: Bits 63:52 are all 0. Bits 51:12 are from the EPT PDPTE. Bits 11:3 are bits 29:21 of the guest-physical address. 11. Updates to Chapter 27, Volume 3B. Change bars show changes to Chapter 27 of th
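The selection of an EPT PDE described above (bits 51:12 from the EPT PDPTE, bits 11:3 from bits 29:21 of the guest-physical address) can be sketched as follows. The function name `ept_pde_address` is hypothetical, and the sketch assumes bits 2:0 of the result are 0 because each entry is 64 bits wide; it does not check bit 7 or any reserved bits.

```python
EPT_TABLE_BASE_MASK = ((1 << 52) - 1) ^ ((1 << 12) - 1)  # bits 51:12

def ept_pde_address(ept_pdpte: int, guest_physical: int) -> int:
    """Physical address of the EPT PDE selected for a guest-physical address."""
    base = ept_pdpte & EPT_TABLE_BASE_MASK     # page-directory base
    index = (guest_physical >> 21) & 0x1FF     # bits 29:21 pick 1 of 512 PDEs
    return base | (index << 3)                 # 8-byte entries: index in bits 11:3
```

The 9-bit index matches the statement that an EPT page directory comprises 512 entries.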
149. TATUS MC9 STATUS 06 2EH 426H 1062 IA32 MC9 ADDR MC9 ADDR 06 2EH 427H 1063 32 MC9 MISC MC9 MISC 06 2 428H 1064 IA32 10 10 06 2 429 1065 A32 MC10 STATUS MC10 STATUS 06 2EH 42AH 1066 IA32 10 MC10 ADDR 06 2EH 42BH 1067 IA32 10 MISC MC10 MISC 06 2EH 42CH 1068 IA32 11 11 06 2 42DH 1069 IA32 11 STATUS 11 STATUS 06 2EH 42EH 1070 IA32 11 ADDR MC11_ADDR 06_2EH 42FH 1071 32 11 MISC 11 5 06 2 430H 1072 IA32 12 MC12_CTL 06 2 431H 1073 32 12 STATUS 12 STATUS 06 2 432 1074 IA32_MC12_ADDR1 MC12_ADDR 06_2EH 433H 1075 IA32_MC12_MISC MC12_MISC 06_2EH 434H 1076 IA32_MC13_CTL MC13_CTL 06_2EH 435H 1077 IA32_MC13_STATUS MC13_STATUS 06_2EH 436H 1078 IA32_MC13_ADDR1 MC13_ADDR 06_2EH 437H 1079 IA32_MC13_MISC MC13_MISC 06_2EH 438H 1080 IA32_MC14_CTL MC14_CTL 06_2EH 439H 1081 IA32_MC14_STATUS MC14_STATUS 06_2EH Intel 64 and IA 32 Architectures Software Developer s Manual Documentation Changes 285 Documentation Changes Register Address Architectural MSR Name Introduced as 2 MSR Bit Description 43AH 1082 IA32_MC14_ADDR1 MC14_ADDR 06_2EH 43BH 1083 IA32_MC14_MISC MC14_MISC 06_2EH 43CH 1084 IA32_MC15_CTL MC15_CTL 06_2EH 43DH 1085 IA32_MC15_STATUS MC15_STATUS 06_2EH 43EH 1086 IA32_M
150. Threading Technology, the microcode update facilities are shared between the logical processors; either logical processor can initiate an update. Each logical processor has its own BIOS signature MSR (IA32_BIOS_SIGN_ID at MSR address 8BH). When a logical processor performs an update for the physical processor, the IA32_BIOS_SIGN_ID MSRs for resident logical processors are updated with identical information. If logical processors initiate an update simultaneously, the processor core provides the synchronization needed to ensure that only one update is performed at a time. Operating system microcode update drivers that adhere to Intel's guidelines do not need to be modified to run on processors supporting Intel Hyper-Threading Technology. 6. Updates to Chapter 10, Volume 3A: Change bars show changes to Chapter 10 of the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3A: System Programming Guide, Part 1. 10.3 THE INTEL 82489DX EXTERNAL APIC, THE APIC, THE XAPIC, AND THE X2APIC. The local APIC in the P6 family and Pentium processors is an architectural subset of the Intel 82489DX external APIC. See Section 19.27.1, "Software Visible Differences Between the Local APIC and the 82489DX." The APIC architecture used in the Pentium 4 and Intel Xeon processors (called the xAPIC architecture) is an exte
151. UBPD A Valid Subtract packed double xmm2 m128 precision floating point values in xmm2 m128 from xmm1 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg r w ModRM r m r NA NA SUBPS Subtract Packed Single Precision Floating Point Values Opcode Instruction Op 64 Bit Description En Mode OF 5C r SUBPS xmm1 A Valid Subtract packed single xmm2 m128 precision floating point values in xmm2 mem from xmm1 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg r w ModRM r m r NA NA Intel 64 and IA 32 Architectures Software Developer s Manual Documentation Changes Documentation Changes SUBSD Subtract Scalar Double Precision Floating Point Values Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode F2 OF 5C r SUBSD xmm1 A Valid Valid Subtracts the low double xmm2 m64 precision floating point values in xmm2 mem64 from xmml1 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg r w ModRM r m r NA NA SUBSS Subtract Scalar Single Precision Floating Point Values Opcode Instruction 64 Bit Compat Description En Mode Leg Mode 5C r 50855 xmm1 Valid Valid Subtract the lower single xmm2 m32 precision floating point values in xmm2 m32 from xmm1 Instruction Operand Encoding Op
152. VMCS after VMCLEAR has been executed for that VMCS. VMRESUME should be used for any subsequent VM entry using a VMCS (until the next execution of VMCLEAR for the VMCS). It is expected that, in general, VMRESUME will have lower latency than VMLAUNCH. Since migrating a VMCS from one logical processor to another requires use of VMCLEAR (see Section 21.10.1), which sets the launch state of the VMCS to "clear", such migration requires the next VM entry to be performed using VMLAUNCH. Software developers can avoid the performance cost of increased VM-entry latency by avoiding unnecessary migration of a VMCS from one logical processor to another. 9. Updates to Chapter 22, Volume 3B: Change bars show changes to Chapter 22 of the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3B: System Programming Guide, Part 2. 22.1.1 Relative Priority of Faults and VM Exits. The following principles describe the ordering between existing faults and VM exits: Certain exceptions have priority over VM exits. These include invalid-opcode exceptions, faults based on privilege level, and general-protection exceptions that are based on checking I/O permission bits in the task-state segment (TSS). For example, execution of RDMSR with CPL = 3 generates a general-protection exception and not a VM exit. 1. These include faul
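The rule for choosing between VMLAUNCH and VMRESUME can be modeled as a two-state flag per VMCS. The class below is a toy model for illustration only: the class and method names are invented here, and it assumes that a successful VM entry moves the launch state from "clear" to "launched".

```python
class VmcsLaunchState:
    """Toy model of the launch state of one VMCS."""

    def __init__(self) -> None:
        self.launched = False          # launch state starts out "clear"

    def vmclear(self) -> None:
        """VMCLEAR (for example before migration) resets the launch state to "clear"."""
        self.launched = False

    def entry_instruction(self) -> str:
        """Instruction the VMM should use for the next VM entry."""
        return "VMRESUME" if self.launched else "VMLAUNCH"

    def vm_entry(self) -> str:
        """Perform a (modeled) successful VM entry and report the instruction used."""
        instruction = self.entry_instruction()
        self.launched = True           # assumption: a successful entry marks the VMCS launched
        return instruction
```

In this model, every migration costs one VMLAUNCH, which is the latency penalty the excerpt recommends avoiding.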
153. Valid Subtract sign extended imm8 from r m16 83 5 ib SUB r m32 imm8 Valid Valid Subtract sign extended imm8 from r m32 REX W 83 5 SUBr m64 imm8 Valid N E Subtract sign extended ib imm8 from r m64 28 r SUB r m8 r8 C Valid Valid Subtract r8 from r m8 REX 28 r SUB r m8 r8 Valid N E Subtract r8 from r m8 29 SUB r m16 r16 C Valid Valid Subtract r16 from r m16 29 r SUB r m32 r32 C Valid Valid Subtract r32 from r m32 REXW 29 r SUB r m64 r32 C Valid N E Subtract r64 from r m64 2A SUB r8 r m8 D Valid Valid Subtract r m8 from r8 REX 2 r SUB r8 r m8 D Valid N E Subtract r m8 from r8 2B r SUB r16 r m16 D Valid Valid Subtract r m16 from r16 2B r SUB r32 r m32 D Valid Valid Subtract r m32 from r32 REXW 2B r SUBr64 r m64 D Valid N E Subtract r m64 from r64 NOTES n 64 bit mode r m8 can not be encoded to access the following byte registers if a REX prefix is used AH BH CH DH Intel 64 and 1 32 Architectures Software Developer s Manual Documentation Changes 192 Documentation Changes Instruction Operand Encoding Op En A B D Operand 1 AL AX EAX RAX ModRM r m w ModRM r m w ModRM reg r w Operand 2 imm8 26 32 imm8 26 32 ModRM reg r ModRM r m r Operand 3 Operand 4 NA NA NA NA NA NA NA NA SUBPD Subtract Packed Double Precision Floating Point Values Opcode Instruction Op 64 Bit Description En 66 OF 5C r S
154. a paging structure whose address is in CR3 (e.g., the PML4 table with IA-32e paging), i = 2*PCD + PWT, where the PCD and PWT values come from CR3. For an access to a PDE with PAE paging, i = 2*PCD + PWT, where the PCD and PWT values come from the relevant PDPTE register. For an access to a paging-structure entry X whose address is in another paging-structure entry Y, i = 2*PCD + PWT, where the PCD and PWT values come from Y. For an access to the physical address that is the translation of a linear address, i = 4*PAT + 2*PCD + PWT, where the PAT, PCD, and PWT values come from the relevant PTE (if the translation uses a 4-KByte page), the relevant PDE (if the translation uses a 2-MByte page or a 4-MByte page), or the relevant PDPTE (if the translation uses a 1-GByte page). 4.10.1.1 Page Numbers, Page Frames, and Page Offsets. Section 4.3, Section 4.4.2, and Section 4.5 give details of how the different paging modes translate linear addresses to physical addresses. Specifically, the upper bits of a linear address (called the page number) determine the upper bits of the physical address (called the page frame); the lower bits of the linear address (called the page offset) determine the lower bits of the physical address. The boundary between the page number and the page offset is determined by the page size. Specifically: 32-bit paging: If the translation does not use a PTE (because CR4.PSE = 1 and the PS flag is 1 in the PDE used
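The index computations in the excerpt above are simple enough to state as code. The following is a minimal sketch; the function names are invented for illustration, and each flag is passed as 0 or 1.

```python
def paging_structure_index(pcd: int, pwt: int) -> int:
    """Index i for an access to a paging structure: i = 2*PCD + PWT."""
    return 2 * pcd + pwt


def translated_access_index(pat: int, pcd: int, pwt: int) -> int:
    """Index i for an access to a translated physical address: i = 4*PAT + 2*PCD + PWT."""
    return 4 * pat + 2 * pcd + pwt


# With all three flags available, the index spans 0..7; without PAT it spans 0..3.
print(translated_access_index(1, 0, 1))   # 5
print(paging_structure_index(1, 1))       # 3
```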
155. aging structure entry and bits 21:12 identify a second. The latter identifies the page frame. Bits 11:0 of the linear address are the page offset within the 4-KByte page frame. (See Figure 4-2 for an illustration.) With PAE paging, the first paging structure comprises only 4 = 2^2 entries. Translation thus begins by using bits 31:30 from a 32-bit linear address to identify the first paging-structure entry. Other paging structures comprise 512 = 2^9 entries, so the process continues by using 9 bits at a time. Bits 29:21 identify a second paging-structure entry and bits 20:12 identify a third. This last identifies the page frame. (See Figure 4-5 for an illustration.) With IA-32e paging, each paging structure comprises 512 = 2^9 entries and translation uses 9 bits at a time from a 48-bit linear address. Bits 47:39 identify the first paging-structure entry, bits 38:30 identify a second, bits 29:21 a third, and bits 20:12 identify a fourth. Again, the last identifies the page frame. (See Figure 4-8 for an illustration.) The translation process in each of the examples above completes by identifying a page frame. However, the paging structures may be configured so that translation terminates before doing so. This occurs if the process encounters a paging-structure entry that is marked "not present" (because its P flag, bit 0, is clear) or in which a reserved bit is set. In this case, there is no translation for the linear address; an access to that
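The bit fields named above for PAE paging and IA-32e paging with 4-KByte pages can be extracted as follows. This is a sketch of the field split only, not of a page walk; the function names and the tuple layout are choices made for this example.

```python
def split_pae(linear: int) -> tuple[int, int, int, int]:
    """Split a 32-bit linear address into (first, second, third, offset) for PAE paging."""
    return (
        (linear >> 30) & 0x3,      # bits 31:30 select one of 4 entries
        (linear >> 21) & 0x1FF,    # bits 29:21 select one of 512 entries
        (linear >> 12) & 0x1FF,    # bits 20:12 select one of 512 entries
        linear & 0xFFF,            # bits 11:0 are the page offset
    )


def split_ia32e(linear: int) -> tuple[int, int, int, int, int]:
    """Split a 48-bit linear address into four 9-bit indices and the page offset."""
    return (
        (linear >> 39) & 0x1FF,    # bits 47:39, first paging-structure entry
        (linear >> 30) & 0x1FF,    # bits 38:30, second
        (linear >> 21) & 0x1FF,    # bits 29:21, third
        (linear >> 12) & 0x1FF,    # bits 20:12, fourth (identifies the page frame)
        linear & 0xFFF,            # bits 11:0, page offset
    )
```

For example, `split_ia32e((1 << 39) | (2 << 30) | (3 << 21) | (4 << 12) | 5)` yields the indices 1, 2, 3, 4 and the offset 5.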
156. alid Valid Valid Valid Valid Valid Valid Valid Intel 64 and 1 32 Architectures Software Developer s Manual Documentation Changes Description Jump near if not below or equal CF 0 and ZF 0 Jump near if not carry CF 0 Not supported in 64 bit mode Jump near if not carry CF 0 Jump near if not equal ZF 0 Not supported in 64 bit mode Jump near if not equal 2 0 Jump near if not greater ZF 1 or SFz OF Not supported in 64 bit mode Jump near if not greater ZF 1 or SFz OF Jump near if not greater or equal SFz OF Not supported in 64 bit mode Jump near if not greater or equal SFz OF Jump near if not less SF OF Not supported in 64 bit mode Jump near if not less 5 Jump near if not less or equal ZF 0 and SF OF Not supported in 64 bit mode Jump near if not less or equal ZF 0 and SF OF Jump near if not overflow OF 0 Not supported in 64 bit mode Jump near if not overflow OF 0 Jump near if not parity PF 0 Not supported in 64 bit mode Jump near if not parity PF 0 72 Documentation Changes Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode OF 89 cw JNS rel16 A N S Valid Jump near if not sign SF 0 Not supported in 64 bit mode OF 89 cd JNS rel32 A Valid Valid Jump near if not sign SF 0 OF 85 cw JNZ rel16 A N S Valid Jump near if not zero ZF 0 Not suppo
157. alid AL imma8 35 iw XOR AX imm16 A Valid Valid AX XOR imm16 35 id XOR EAX imm32 A Valid Valid EAX XOR imm32 REX W 35id XORRAX imm32 A Valid N E RAX XOR imm32 sign extended 80 6 ib XORr m8 imm8 Valid Valid r m8 XOR imm8 REX 80 6ib XOR r m8 imm8 Valid N E r m8 XOR imm8 81 6 iw XOR r m16 B Valid Valid r m16 XOR imm16 imm16 81 6 id XOR r m32 B Valid Valid r m32 XOR imm32 imm32 REXW 81 6 XORr m64 B Valid N E r m64 XOR imm32 sign id imm32 extended 83 6 ib XOR r m16 imm8 Valid Valid r m16 XOR imm8 sign extended 83 6 ib r m32 imm8 Valid Valid r m32 XOR imm8 sign extended REX W 83 6 XORr m64 imm8 Valid N E r m64 XOR imm8 sign ib extended 30 Jr XOR r m8 r8 C Valid Valid r m8 XOR r8 REX 30 r XOR r m8 r8 Valid N E r m8 r8 31 r XOR r m16 16 C Valid Valid r m16 XOR r16 31 r XOR r m32 r32 C Valid Valid r m32 XOR r32 REXW 31 r XOR r m64 r64 C Valid N E r m64 XOR r64 32 XOR r8 r m8 D Valid Valid r8 XOR r m8 REX 32 r XOR r8 r m8 D Valid N E r8 XOR r m8 33 r XOR r16 r m16 D Valid Valid r16 XOR r m16 33 r r32 r m32 D Valid Valid r32 XOR r m32 REX W 33 r XOR r64 r m64 D Valid N E r64 XOR r m64 NOTES n 64 bit mode r m8 can not be encoded to access the following byte registers if a REX prefix is used AH BH CH DH Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes
158. alid Find matching bytes in m8 ES E DI and DST EJSI F2 REX W A6 REPNE CMPS m8 A Valid N E Find matching bytes in RDI m8 and RSI 2 7 REPNECMPS m16 Valid Valid Find matching words in 16 ES E DI and DST EJSI F2A7 REPNE CMPS m32 A Valid Valid Find matching doublewords m32 in ES E DI and 054 51 F2 REX W A7 REPNE CMPS m64 A Valid N E Find matching doublewords m64 in RDI and RSI F2 AE REPNESCAS m8 A Valid Valid Find AL starting at ES1 E DI F2 REX W AE REPNESCAS m8 A Valid N E Find AL starting at RDI F2 AF REPNE SCAS m16 A Valid Valid Find AX starting at ES1 E DI F2 AF REPNE SCAS m32 A Valid Valid Find EAX starting at ES E DI F2 REX W AF REPNE SCAS m64 Valid N E Find RAX starting at RDI NOTES n 64 bit mode r m8 can not be encoded to access the following byte registers if a REX prefix is used AH BH CH DH Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A NA NA NA NA RET Return from Procedure Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode C3 RET A Valid Valid Near return to calling procedure CB RET A Valid Valid Far return to calling procedure C2 iw RET imm16 B Valid Valid Near return to calling procedure and pop imm16 bytes from stack CA iw RET imm16 B Valid Valid Far return to calling procedure and pop imm16 bytes from stack Intel 64 and 32 Architectures Software Developer s Man
159. alues in xmm2 m128 and stores the results in xmm1 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg w ModRM r m NA NA RSQRTSS Compute Reciprocal of Square Root of Scalar Single Precision Floating Point Value Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode 0F 52 Jr RSQRTSS xmm1 Valid Valid Computes the approximate xmm2 m32 reciprocal of the square root of the low single precision floating point value in xmm2 m32 and stores the results in xmm1 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg w ModRM r m r NA NA SAHF Store AH into Flags Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode 9E SAHF A Invalid Valid Loads SF ZF AF PF and CF from AH into EFLAGS register NOTES Valid in specific steppings See Description section Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes 175 Documentation Changes Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A NA NA NA NA SAL SAR SHL SHR Shift Opcode Instruction Op 64 Bit Compat Description En Leg Mode DO 4 SAL r m8 1 A Valid Valid Multiply r m8 by 2 once REX DO 4 SAL r m8 1 A Valid Multiply r m8 by 2 once D2 4 SAL r m8 CL B Valid Valid Multiply r m8 by 2 CL times REX
160. and 32 Architectures Software Developer s Manual Documentation Changes 133 H intel PMAXUW Maximum of Packed Word Integers Opcode Instruction Op 64 Bit Compat Description En Leg Mode 66 0F 383E r PMAXUWxmml A Valid Valid Compare packed unsigned xmm2 m128 word integers in xmm1 xmm2 m128 and store packed maximum values in 1 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg r w ModRM r m r NA NA PMINSB Minimum of Packed Signed Byte Integers Opcode Instruction Op 64 Bit Compat Description En Leg Mode 66 OF 38 38 PMINSB xmm1 Valid Valid Compare packed signed byte xmm2 m128 integers in xmm1 and xmm2 m128 and store packed minimum values in 1 Instruction Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg w ModRM r m PMINSD Minimum of Packed Dword Integers Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode 66 OF 38 39 r PMINSD A Valid Valid Compare packed signed xmm2 m128 dword integers in xmm1 and xmm2 m128 and store packed minimum values in 1 Instruction Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg r w ModRM r m r NA NA Intel 64 and IA 32 Architectures Software Developer s Manual Documentation C
161. and 4 A ModRM r m w NA NA NA B reg w NA NA NA POPA POPAD Pop All General Purpose Registers Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode 61 POPA A Invalid Valid Pop DI SI BP BX DX CX and AX 61 POPAD A Invalid Valid Pop EDI ESI EBX EDX ECX and EAX Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A NA NA NA NA POPCNT Return the Count of Number of Bits Set to 1 Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode F3 OF B8 r POPCNT r16 A Valid Valid POPCNT on r m16 r m16 F3 OF B8 r POPCNT r32 A Valid Valid POPCNT on r m32 r m32 F3REX WOFB8 POPCNT r64 A Valid N E POPCNT on r m64 r m64 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg w ModRM r m NA NA Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes 144 chenes intel POPF POPFD POPFQ Pop Stack into EFLAGS Register Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode 9D POPF A Valid Valid Pop top of stack into lower 16 bits of EFLAGS 9D POPFD A Valid Pop top of stack into EFLAGS REX W 9D POPFQ A Valid N E Poptop of stack and zero extend into RFLAGS Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A NA NA NA NA POR
162. and readiness to accept interrupts, it is possible that interrupts sent via the SELF IPI register or via the ICR with identical vectors can be combined. 7. Updates to Chapter 15, Volume 3A: Change bars show changes to Chapter 15 of the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3A: System Programming Guide, Part 1. Table 15-7 lists overwrite rules for uncorrected errors, corrected errors, and uncorrected recoverable errors.
Table 15-7. Overwrite Rules for UC and UCR Errors
First Event | Second Event | UC | PCC | S | AR | MCA Bank | Reset System
CE | UCR | 1 | 0 | 0 if UCNA, else 1 | 1 if SRAR, else 0 | second | yes, if AR = 1
UCR | CE | 1 | 0 | 0 if UCNA, else 1 | 1 if SRAR, else 0 | first | yes, if AR = 1
UCNA | UCNA | 1 | 0 | 0 | 0 | first | no
UCNA | SRAO | 1 | 0 | 1 | 0 | first | no
UCNA | SRAR | 1 | 0 | 1 | 1 | first | yes
SRAO | UCNA | 1 | 0 | 1 | 0 | first | no
SRAO | SRAO | 1 | 0 | 1 | 0 | first | no
SRAO | SRAR | 1 | 0 | 1 | 1 | first | yes
SRAR | UCNA | 1 | 0 | 1 | 1 | first | yes
SRAR | SRAO | 1 | 0 | 1 | 1 | first | yes
SRAR | SRAR | 1 | 0 | 1 | 1 | first | yes
UCR | UC | 1 | 1 | undefined | undefined | second | yes
UC | UCR | 1 | 1 | undefined | undefined | first | yes
8. Updates to Chapter 21, Volume 3B: Change bars show changes to Cha
163. anges Table 10 1 Local APIC Register Address Map Continued ntel Address Register Name Software Read Write FEEO 01 Trigger Mode Register TMR bits 255 224 Read Only FEEO 0200H Interrupt Request Register IRR bits 31 0 Read Only FEEO 0210H Interrupt Request Register IRR bits 63 32 Read Only FEEO 0220H Interrupt Request Register IRR bits 95 64 Read Only FEEO 0230H Interrupt Request Register IRR bits 127 96 Read Only FEEO 0240H Interrupt Request Register IRR bits 159 128 Read Only FEEO 0250H Interrupt Request Register IRR bits 191 160 Read Only FEEO 0260H Interrupt Request Register IRR bits 223 192 Read Only FEEO 0270H Interrupt Request Register IRR bits 255 224 Read Only FEEO 0280H Error Status Register Read Only FEEO 0290H through Reserved FEEO 02E0H FEEO 02F0H LVT CMCI Registers Read Write FEEO 0300H Interrupt Command Register ICR bits 0 31 Read Write 0 0310H Interrupt Command Register ICR bits 32 63 Read Write FEEO 0320H LVT Timer Register Read Write 0 0330H LVT Thermal Sensor Register2 Read Write 0 0340H LVT Performance Monitoring Counters Read Write Register FEEO 0350H LVT LINTO Register Read Write FEEO 0360H LVT LINT1 Register Read Write FEEO 0370H LVT Error Register Read Write FEEO 0380H Initial Count Register for Timer Read Write 0 0390H Current Count Regist
164. anges n tel Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A NA NA NA NA Description The EAX register is loaded with the low order 32 bits The EDX register is loaded with the supported high order bits of the counter The number of high order bits loaded into EDX is implementation specific on processors that do no support architectural performance monitoring The width of fixed function and general purpose performance counters on processors supporting architectural performance monitoring are reported by CPUI D OAH leaf See below for the treatment of the EDX register for fast reads The ECX register selects one of two type of performance counters specifies the index relative to the base of each counter type and selects fast read mode if supported The two counter types are General purpose or special purpose performance counters The number of general purpose counters is model specific if the processor does not support architectural performance monitoring see Chapter 30 of Intel 64 and 32 Architectures Software Developer s Manual Volume Special purpose counters are available only in selected processor members see Section 30 13 30 14 of Intel 64 and A 32 Architectures Software Developer s Manual Volume This counter type is selected if ECX 30 is clear Fixed function performance counter The number fixed function performance counters is enumerated by CPUID OAH
165. anslation for a linear address if the translation process for that address would use a paging-structure entry in which the P flag (bit 0) is 0 or one that sets a reserved bit. If there is a valid translation for a linear address, its access rights are determined as specified in Section 4.6. Figure 4-12 illustrates the error code that the processor provides on delivery of a page-fault exception. The following items explain how the bits in the error code describe the nature of the page-fault exception: P flag (bit 0): This flag is 0 if there is no valid translation for the linear address because the P flag was 0 in one of the paging-structure entries used to translate that address. W/R (bit 1): If the access causing the page-fault exception was a write, this flag is 1; otherwise, it is 0. This flag describes the access causing the page-fault exception, not the access rights specified by paging. U/S (bit 2): If a user-mode (CPL = 3) access caused the page-fault exception, this flag is 1; it is 0 if a supervisor-mode (CPL < 3) access did so. This bit describes the access causing the page-fault exception, not the access rights specified by paging. 4.8 ACCESSED AND DIRTY FLAGS. For any paging-structure entry that is used during linear-address translation, bit 5 is the accessed flag. For paging-structure entries that map a page (as opposed to referencing another paging structure), bit 6 is the dirty flag. These flags are provided for use b
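The three error-code bits described above can be decoded as shown below. The sketch covers only bits 0 through 2, since those are the only ones the excerpt describes; the function name and dictionary keys are illustrative choices.

```python
def decode_page_fault_error(code: int) -> dict[str, bool]:
    """Decode bits 0-2 of a page-fault error code."""
    return {
        "present": bool(code & 0b001),  # P: 0 means a paging-structure entry had P = 0
        "write": bool(code & 0b010),    # W/R: 1 if the faulting access was a write
        "user": bool(code & 0b100),     # U/S: 1 if the faulting access was user-mode (CPL = 3)
    }


# A user-mode write to an address with no valid translation:
print(decode_page_fault_error(0b110))
# {'present': False, 'write': True, 'user': True}
```

As the text stresses, the W/R and U/S bits describe the access that faulted, not the permissions recorded in the paging structures.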
166. ap a page indicating TLB entries that are common to different processes and need not be flushed The CR4 PGE bit controls this feature Intel 64 and 1 32 Architectures Software Developer s Manual Documentation Changes 42 Documentation Changes CRC32 Accumulate CRC32 Value Opcode F2 OF 38 FO r F2 REX OF 38 FO r F2 OF 38 Fl r F2 OF 38 Fl r F2 REX W OF 38 FO r F2 REX W OF 38 Fl r Instruction CRC32 r32 r m8 CRC32 r32 r m8 CRC32 r32 r m16 CRC32 r32 r m32 CRC32 r64 r m8 CRC32 r64 r m64 Op En A A 64 Bit Mode Valid Valid Valid Valid Valid Valid Compat Description Leg Mode Valid Accumulate CRC32 on r m8 N E Accumulate CRC32 on r m8 Valid Accumulate CRC32 on r m16 Valid Accumulate CRC32 on r m32 N E Accumulate CRC32 on r m8 N E Accumulate CRC32 on r m64 NOTES In 64 bit mode r m8 can not be encoded to access the following byte registers if a REX prefix is used AH BH CH DH Instruction Operand Encoding Op En Operand 1 A ModRM reg r w Operand 2 ModRM r m r Operand 3 Operand 4 NA NA CVTDQ2PD Convert Packed Dword Integers to Packed Double Precision FP Values Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode F3 OF E6 CVTDQ2PDxmm1 Valid Valid Convert two packed signed xmm2 m64 doublew ord integers from xmm2 m128 to two packed doubl
167. are executing on the processor (where the term "processor" refers to a logical processor). For example, a physical processor supporting multiple cores and/or Hyper-Threading Technology is treated as a multi-processor system. Reads are not reordered with other reads. Writes are not reordered with older reads. Writes to memory are not reordered with other writes, with the following exceptions: writes executed with the CLFLUSH instruction; streaming stores (writes) executed with the non-temporal move instructions (MOVNTI, MOVNTQ, MOVNTDQ, MOVNTPS, and MOVNTPD); and string operations (see Section 8.2.4.1). Reads may be reordered with older writes to different locations but not with older writes to the same location. Reads or writes cannot be reordered with I/O instructions, locked instructions, or serializing instructions. Reads cannot pass earlier LFENCE and MFENCE instructions. Writes cannot pass earlier LFENCE, SFENCE, and MFENCE instructions. LFENCE instructions cannot pass earlier reads. SFENCE instructions cannot pass earlier writes. MFENCE instructions cannot pass earlier reads or writes. 8.2.4.2 Examples Illustrating Memory-Ordering Principles for String Operations. The following examples use the same notation and convention as described in Section 8.2.3.1
168. at address RDI or EDI AB STOSD A Valid Valid For legacy mode store EAX at address ES E DI For 64 bit mode store EAX at address RDI or EDI REX W 57050 Valid N E Store RAX at address RDI or EDI Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A NA NA NA NA STR Store Task Register Opcode Instruction Op 64 Bit Compat Description En Leg Mode OF 00 1 STR r m16 A Valid Valid Stores segment selector from TR in r m16 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM r m NA NA NA Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes 191 Documentation Changes SUB Subtract ntel Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode 2Cib SUB AL imm8 A Valid Valid Subtract imm8 from AL 2D iw SUB AX imm16 A Valid Valid Subtract imm16 from AX 2D id SUB EAX imm32 A Valid Valid Subtract imm32 from EAX REXW 2Did SUBRAX imm32 A Valid Subtract imm32 sign extended to 64 bits from RAX 80 5 ib SUB r m8 imm8 Valid Valid Subtract imm8 from r m8 REX 80 5ib SUBr m8 imm8 Valid N E Subtract imm8 from r m8 81 5 iw SUB r m16 B Valid Valid Subtract imm16 from imm16 r m16 81 5 id SUB r m32 B Valid Valid Subtract imm32 from imm32 r m32 REXW 81 5 SUBr m64 B Valid Subtract imm32 sign id imm32 extended to 64 bits from r m64 83 5 ib SUB r m16 imm8 Valid
169. ate else RIP zero extended 32 bit offset from far pointer referenced in the instruction In 64 bit mode If selector points to a gate then RIP 64 bit displacement taken from gate else RIP 64 bit offset from far pointer referenced in the instruction Instruction Operand Encoding Op En A Operand 1 Offset B ModRM r m Operand 2 NA NA Operand 3 Operand 4 NA NA NA NA Intel 64 and 1 32 Architectures Software Developer s Manual Documentation Changes 26 H intel BW CWDE CDQE Convert Byte to Word Convert Word to Doubleword Convert Doubleword to Quadword Opcode Instruction Op 64 bit Compat Description En Mode Leg Mode 98 CBW A Valid Valid lt sign extend of AL 98 CWDE A Valid Valid EAX lt sign extend of AX REX W 98 CDQE A Valid RAX lt sign extend of EAX Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A NA NA NA NA CLC Clear Carry Flag Opcode Instruction Op 64 bit Compat Description En Mode Leg Mode F8 CLC A Valid Valid Clear CF flag Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A NA NA NA NA CLD Clear Direction Flag Opcode Instruction Op 64 bit Compat Description En Mode Leg Mode FC CLD A Valid Valid Clear DF flag Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A NA NA NA
170. ate 8 bits r m8 right CL times Rotate 8 bits r m16 right imm times Rotate 8 bits r m16 right imm8 times Rotate 16 bits r m16 right once Rotate 16 bits r m16 right CL times Rotate 16 bits r m16 right imm8 times Rotate 32 bits r m32 right once 162 Documentation Changes Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode REX W D1 1 RORr m64 1 A Valid Rotate 64 bits r m64 right once Uses a 6 bit count 03 1 ROR r m32 CL B Valid Valid Rotate 32 bits r m32 right CL times REXW D3 1 RORr m64 CL B Valid N E Rotate 64 bits r m64 right CL times Uses a 6 bit count C1 1 ib ROR r m32 imm8 Valid Valid Rotate 32 bits r m32 right imm8 times REXW C1 1 RORr m64 imm8 Valid Rotate 64 bits r m64 right ib imm8 times Uses a 6 bit count NOTES n 64 bit mode r m8 can not be encoded to access the following byte registers if a REX prefix is used AH BH CH DH See 32 Architecture Compatibility section below Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM r m w 1 NA NA B ModRM r m w CL r NA NA C ModRM r m w imm8 NA NA RCPPS Compute Reciprocals of Packed Single Precision Floating Point Values Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode OF 53 r RCPPS xmm1 A Valid Valid Computes the approximate xmm2 m128 reciprocals of the packed single precision floatin
171. being above the thermal throttling threshold 80H 01H UNC_THERMAL_THR Cycles that the PCU records that core OTTLING TEMP CORE J 0 is above the thermal throttling _0 threshold temperature 80H 02H THERMAL Cycles that the PCU records that core OTTLING TEMP CORE 1 is above the thermal throttling 21 threshold temperature 80H 04H THERMAL Cycles that the PCU records that core OTTLING TEMP CORE 2 is above the thermal throttling 22 threshold temperature 80H 08H THERMAL Cycles that the PCU records that core OTTLING TEMP CORE 3 is above the thermal throttling 3 threshold temperature 81H 01H UNC_THERMAL_THR Cycles that the PCU records that core OTTLED TEMP CORE 0 is in the power throttled state due _0 to 5 temperature being above the thermal throttling threshold Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes 280 Documentation Changes ntel 81H 02H UNC THERMAL Cycles that the PCU records that core OTTLED TEMP CORE 1is in the power throttled state due D to core s temperature being above the thermal throttling threshold 81H 04H UNC THERMAL Cycles that the PCU records that core OTTLED TEMP CORE 2 is in the power throttled state due 22 to 5 temperature being above the thermal throttling threshold 81H 08H UNC_THERMAL_THR Cycles that the PCU records that cor
172. bit mode, software can read and write the TPR using an alternate interface, the MOV CR8 instruction. The new priority level is established when the MOV CR8 instruction completes execution. Software does not need to force serialization after loading the TPR using MOV CR8. Use of the MOV CRn instruction requires a privilege level of 0. Programs running at privilege level greater than 0 cannot read or write the TPR. An attempt to do so causes a general-protection exception. The TPR is abstracted from the interrupt controller (IC), which prioritizes and manages external interrupt delivery to the processor. The IC can be an external device, such as an APIC or 8259. Typically, the IC provides a priority mechanism similar or identical to the TPR. The IC, however, is considered implementation-dependent, with the underlying priority mechanisms subject to change. CR8, by contrast, is part of the Intel 64 architecture. Software can depend on this definition remaining unchanged. Figure 10-22 shows the layout of CR8; only the low four bits are used. The remaining 60 bits are reserved and must be written with zeros. Failure to do this causes a general-protection exception. 10.9 SPURIOUS INTERRUPT. A special situation may occur when a processor raises its task priority to be greater than or equal to the level of the interr
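The CR8 layout rule stated above (low four bits used, remaining 60 bits reserved and written as zero) can be checked with a mask. The following is a small model for illustration; the constant and function names are invented here, and the code only predicts whether a write would fault rather than touching any register.

```python
CR8_PRIORITY_MASK = 0xF                   # only the low four bits of CR8 are used
CR8_WIDTH_MASK = (1 << 64) - 1            # CR8 is modeled as a 64-bit value


def cr8_write_would_fault(value: int) -> bool:
    """True if any of the 60 reserved bits is set in the value to be written."""
    return (value & CR8_WIDTH_MASK & ~CR8_PRIORITY_MASK) != 0


def cr8_value_for_priority(priority: int) -> int:
    """Build a CR8 value for a priority level, rejecting levels that do not fit in 4 bits."""
    if not 0 <= priority <= CR8_PRIORITY_MASK:
        raise ValueError("priority must be in the range 0..15")
    return priority
```

A caller can run `cr8_write_would_fault` on a candidate value before issuing the privileged write, so that the reserved-bit rule is enforced in software rather than discovered through an exception.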
173. btract with borrow r m32 from r32 REXW 1B r SBBr64 r m64 D Valid Subtract with borrow r m64 from r64 NOTES n 64 bit mode r m8 can not be encoded to access the following byte registers if a REX prefix is used AH BH CH DH Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A AL AX EAX RAX imm8 16 32 NA NA B ModRM r m 8 16 32 ModRM r m w ModRM reg NA NA D ModRM reg w ModRM r m NA NA SCAS SCASB SCASW SCASD Scan String Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode AE SCAS m8 A Valid Valid Compare AL with byte at ES E DI RDI then set status flags AF SCASm16 A Valid Valid Compare AX with word at ES E DI or RDI then set status flags AF SCAS m32 A Valid Valid Compare EAX with doubleword at ES E DI or RDI then set status flags REX W AF SCAS m64 A Valid N E Compare RAX with quadword at RDl or EDI then set status flags AE SCASB A Valid Valid Compare AL with byte at ES E DI RDI then set status flags AF SCASW A Valid Valid Compare AX with word at ES E DI RDI then set status flags AF SCASD A Valid Valid Compare EAX with doubleword at ES E DI or RDI then set status flags REX W AF SCASQ A Valid N E Compare RAX with quadword at RDl or EDI then set status flags Intel 64 and IA 32 Architectures Software Developer s Manual Documentation Changes 180 Documentation Cha
174. byte mask in mm2 The default memory location is specified by DS EDI Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 ModRM reg ModRM r m MAXPD Return Maximum Packed Double Precision Floating Point Values Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode 66 OF 5F r MAXPD A Valid Valid Return the maximum xmm2 m128 double precision floating point values between xmm2 m128 and xmm1 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 ModRM reg w ModRM r m MAXPS Return Maximum Packed Single Precision Floating Point Values Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode OF 5F r MAXPS 1 A Valid Valid Return the maximum single xmm2 m128 precision floating point values between xmm2 m128 and xmm1 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 ModRM reg w ModRM r m NA NA Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes 84 Documentation Changes MAXSD Return Maximum Scalar Double Precision Floating Point Value Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode F2 OF 5F r MAXSD 1 A Valid Valid Return the maximum scalar xmm2 m64 double precision floating point value between xmm2 mem64 and
175. cd OF 86 cw OF 86 cd OF 82 cw OF 82 cd OF 84 cw Instruction JNO rel8 JNP rel8 JNS rel8 JNZ rel8 JO rel8 JP rel8 JPE rel8 JPO rel8 JS rel8 JZ rel8 JA rel16 JA rel32 JAE rel16 JAE rel32 JB rel16 JB rel32 JBE rel16 JBE rel32 JCrel16 JC rel32 JE rel16 Op En A 64 Bit Mode Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid N S Valid 5 Valid NS Valid NS Valid 5 Valid NS Compat Leg Mode Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Intel 64 and 1 32 Architectures Software Developer s Manual Documentation Changes Description Jump short if not overflow OF 0 Jump short if not parity PF 0 Jump short if not sign SF 0 Jump short if not zero ZF 0 Jump short if overflow 1 Jump short if parity PF 1 Jump short if parity even 1 Jump short if parity odd PF 0 Jump short if sign SF 1 Jump short if zero ZF 1 Jump near if above CF 0 and ZF 0 Not supported in 64 bit mode Jump near if above CF 0 and ZF 0 Jump near if above or equal CF 0 Not supported in 64 bit mode Jump near if above or equal 0 Jump near if below 1 Not supported in 64 bit mode Jump near if below CF 1 Jump near
176. cessor [Figure 27-1. VMX Transitions and States of VMCS in a Logical Processor; panel (b): State of VMCS and VMX Operation] VMCS data cached by the processor are flushed to memory and that no other software can corrupt the current VMM's VMCS data. It is also recommended that the VMM execute VMXOFF after such executions of VMCLEAR. The VMX capability MSR IA32_VMX_BASIC reports the memory type used by the processor for accessing a VMCS or any data structures referenced through pointers in the VMCS. Software must maintain the VMCS structures in cache-coherent memory. Software must always map the regions hosting the I/O bitmaps, MSR bitmaps, VM-exit MSR-store area, VM-exit MSR-load area, and VM-entry MSR-load area to the write-back (WB) memory type. Mapping these regions to uncacheable (UC) memory type is supported but strongly
177. ch qword Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg w ModRM r m r NA NA Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes 103 chenes intel MOVSLDUP Move Packed Single FP Low and Duplicate Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode 12 r MOVSLDUP A Valid Valid Move two single precision 1 floating point values from xmm2 m128 the lower 32 bit operand of each qword in xmm2 m128 to xmm1 and duplicate each 32 bit operand to the higher 32 bits of each qword Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg w ModRM r m NA NA MOVSS Move Scalar Single Precision Floating Point Values Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode 0 10 MOVSS Valid Valid Move scalar single precision xmm2 m32 floating point value from xmm2 m32 to xmm1 register OF 11 r MOVSS B Valid Valid Move scalar single precision xmm2 m32 xmm floating point value from xmm1 register to xmm2 m32 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg w ModRM r m r NA NA B ModRM r m w ModRM reg r NA NA Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes 104 Documentation Changes MOVSX
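The MOVSLDUP description above (take the lower 32-bit element of each qword and duplicate it into the upper 32 bits of that qword) can be sketched in Python. This is only an illustrative model, assuming the 128-bit operand is represented as a list of four lanes with lane 0 holding bits 31:0; the function name `movsldup` is a hypothetical helper, not part of any Intel-provided API.

```python
from typing import List


def movsldup(src: List[float]) -> List[float]:
    """Model MOVSLDUP on four single-precision lanes (lane 0 = bits 31:0)."""
    if len(src) != 4:
        raise ValueError("expected four 32-bit lanes")
    # The low dword of each qword (lanes 0 and 2) is copied into
    # both halves of the corresponding qword of the destination.
    return [src[0], src[0], src[2], src[2]]


if __name__ == "__main__":
    print(movsldup([1.0, 2.0, 3.0, 4.0]))
```

Lanes 1 and 3 of the source never reach the destination, which is the property that distinguishes this instruction from a plain 128-bit move.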
178. che coherency protocols ensure that other processors that are caching the same memory locations are managed properly while atomic operations are performed on cached memory locations. 8.1.1 Guaranteed Atomic Operations. The Intel486 processor (and newer processors since) guarantees that the following basic memory operations will always be carried out atomically: reading or writing a byte; reading or writing a word aligned on a 16-bit boundary; reading or writing a doubleword aligned on a 32-bit boundary. The Pentium processor (and newer processors since) guarantees that the following additional memory operations will always be carried out atomically: reading or writing a quadword aligned on a 64-bit boundary; 16-bit accesses to uncached memory locations that fit within a 32-bit data bus. The P6 family processors (and newer processors since) guarantee that the following additional memory operation will always be carried out atomically: unaligned 16-, 32-, and 64-bit accesses to cached memory that fit within a cache line. Accesses to cacheable memory that are split across bus widths, cache lines, and page boundaries are not guaranteed to be atomic by the Intel Core 2 Duo, Intel Atom, Intel Core Duo, Pentium M, Pentium 4, Intel Xeon, P6 family, Pentium, and Intel486 processors. The Intel Core 2 Duo
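The P6-family rule above turns on whether an access "fits within a cache line". A minimal sketch of that address arithmetic follows. The 64-byte default line size is an assumption for illustration only (the excerpt does not state a line size), and both function names are hypothetical.

```python
def fits_in_cache_line(addr: int, size: int, line_size: int = 64) -> bool:
    """Return True if the bytes [addr, addr + size) lie in one cache line.

    line_size is an assumed parameter; real hardware reports it via CPUID.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    first_line = addr // line_size
    last_line = (addr + size - 1) // line_size
    return first_line == last_line


def naturally_aligned(addr: int, size: int) -> bool:
    """Return True if addr is a multiple of the access size."""
    return addr % size == 0


if __name__ == "__main__":
    # Unaligned 8-byte access that stays inside one 64-byte line.
    print(fits_in_cache_line(0x1002, 8), naturally_aligned(0x1002, 8))
    # 8-byte access that straddles the line boundary at 0x1040.
    print(fits_in_cache_line(0x103C, 8))
```

An access for which `fits_in_cache_line` is false falls into the "split across cache lines" case that the excerpt says is not guaranteed to be atomic.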
179. Instruction Operand Encoding (Op/En: Operand 1, Operand 2, Operand 3, Operand 4). A: ModRM:r/m (r), NA, NA, NA. B: NA, NA, NA, NA. WAIT/FWAIT: Wait. Opcode, Instruction, Op/En, 64-Bit Mode, Compat/Leg Mode, Description. 9B, WAIT, A, Valid, Valid, Check pending unmasked floating-point exceptions. 9B, FWAIT, A, Valid, Valid, Check pending unmasked floating-point exceptions. Instruction Operand Encoding. A: NA, NA, NA, NA. Description: Causes the processor to check for and handle pending, unmasked, floating-point exceptions before proceeding. (FWAIT is an alternate mnemonic for WAIT.) This instruction is useful for synchronizing exceptions in critical sections of code. Coding a WAIT instruction after a floating-point instruction ensures that any unmasked floating-point exceptions the instruction may raise are handled before the processor can modify the instruction's results. See the section titled "Floating-Point Exception Synchronization" in Chapter 8 of the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 1, for more information on using the WAIT/FWAIT instruction. This instruction's operation is the same in non-64-bit modes and 64-bit mode. WBINVD: Write Back and Invalidate Cache. Opcode, Instruction, Op/En, 64-Bit Mode, Compat/Leg
180. creased to 32 bits wide. This enables 2^32 - 1 processors to be addressable in physical destination mode. This 32-bit value is referred to as the "x2APIC ID". A processor implementation may choose to support fewer than 32 bits in its hardware. System software should be agnostic to the actual number of bits that are implemented. All non-implemented bits will return zeros on reads by software. The ID value of FFFF_FFFFH and the highest value corresponding to the implemented bit width of the local APIC ID register in the system are reserved and cannot be assigned to any logical processor. In x2APIC mode, the local APIC ID register is a read-only register to system software and will be initialized by hardware. It is accessed via the RDMSR instruction reading the MSR at address 0802H. Each logical processor in the system (including clusters with a communication fabric) must be configured with a unique x2APIC ID to avoid collisions of x2APIC IDs. On DP and high-end MP processors targeted to specific market segments, and depending on the system configuration, it is possible that logical processors in different and "unconnected" clusters power up initialized with overlapping x2APIC IDs. In these configurations, a model-specific means may be provided in those product segments to enable BIOS and/or platform firmware to re-configure the x2APIC IDs in some clusters to provide for unique and non-overlapping system-wide IDs before configuring
181. cription En Mode Leg Mode 9C PUSHF A Valid Valid Push lower 16 bits of EFLAGS 9C PUSHFD A Valid Push EFLAGS 9C PUSHFQ A Valid Push RFLAGS Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A NA NA NA NA Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes 159 Documentation Changes PXOR Logical Exclusive OR Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode OF EF r PXOR mm A Valid Valid Bitwise XOR of mm m64 mm m64 and mm 66 OF EF r PXOR 1 A Valid Valid Bitwise XOR of xmm2 m128 xmm2 m128 and xmm1 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg r w ModRM r m r NA NA RCL RCR ROL ROR Rotate Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode DO 2 RCL r m8 1 A Valid Valid Rotate 9 bits CF r m8 left once REX D0 2 RCL r m8 1 A Valid N E Rotate 9 bits CF r m8 left once D2 2 RCL r m8 CL B Valid Valid Rotate 9 bits CF r m8 left CL times REX D2 2 RCL r m8 CL B Valid N E Rotate 9 bits CF r m8 left CL times C0 2 ib RCL r m8 imm8 Valid Valid Rotate 9 bits CF r m8 left imme8 times REX 2 ib RCLr m8 imm8 Valid N E Rotate 9 bits CF r m8 left imm8 times D1 2 RCL r m16 1 A Valid Valid Rotate 17 bits CF r m16 left once D3 2 RCL r m16 CL B Valid Valid Rotate 17 bits CF r m16
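The phrase "Rotate 9 bits (CF, r/m8) left" in the RCL rows above means the carry flag acts as a ninth bit above the byte. A minimal Python sketch of the 8-bit form follows. The count handling (mask to 5 bits, then reduce modulo 9) reflects the architectural behavior as I understand it and is not stated in this excerpt, so treat it as an assumption; `rcl8` is a hypothetical helper name.

```python
def rcl8(value: int, cf: int, count: int) -> tuple:
    """Model RCL r/m8: rotate the 9-bit quantity (CF:value) left.

    Returns (new_value, new_cf). The count is masked to 5 bits and then
    reduced modulo 9 (assumed to match the architectural behavior).
    """
    value &= 0xFF
    cf &= 1
    n = (count & 0x1F) % 9
    combined = (cf << 8) | value          # CF is bit 8 of a 9-bit value
    combined = ((combined << n) | (combined >> (9 - n))) & 0x1FF
    return combined & 0xFF, combined >> 8


if __name__ == "__main__":
    print(rcl8(0x80, 0, 1))   # top bit moves into CF
    print(rcl8(0x01, 1, 1))   # old CF moves into bit 0
```

Rotating by 9 returns the original byte and carry, which is a quick sanity check that the carry is part of the rotation.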
182. culative reads (such as the WB, WC, and WT memory types). PREFETCHh instructions can be used to provide the processor with hints for this speculative behavior. Because this speculative fetching can occur at any time and is not tied to instruction execution, the CLFLUSH instruction is not ordered with respect to PREFETCHh instructions or any of the speculative fetching mechanisms (that is, data can be speculatively loaded into a cache line just before, during, or after the execution of a CLFLUSH instruction that references the cache line). CLFLUSH is only ordered by the MFENCE instruction. It is not guaranteed to be ordered by any other fencing or serializing instructions or by another CLFLUSH instruction. For example, software can use an MFENCE instruction to ensure that previous stores are included in the write-back. The CLFLUSH instruction can be used at all privilege levels and is subject to all permission checking and faults associated with a byte load (and in addition, a CLFLUSH instruction is allowed to flush a linear address in an execute-only segment). Like a load, the CLFLUSH instruction sets the A bit but not the D bit in the page tables. The CLFLUSH instruction was introduced with the SSE2 extensions; however, because it has its own CPUID feature flag, it can be implemented in IA-32 processors that do not include the SSE2 extensions. Also, detecting the presence of the SSE2 extensions with the CPUID instruction does not guarant
183. cumentation Changes 1 20 016 Added Documentation changes 21 23 March 27 2006 Removed Documentation Changes 1 23 Added Documentation Changes 1 36 September 2006 018 Added Documentation Changes 37 42 October 2006 Removed Documentation Changes 1 42 019 Added Documentation Changes 1 19 March 2007 020 Added Documentation Changes 20 27 May 2007 Removed Documentation Changes 1 27 Added Documentation Changes 1 6 November 2007 Removed Documentation Changes 1 6 um Added Documentation Changes 1 6 August 2008 Removed Documentation Changes 1 6 023 Added Documentation Changes 1 21 March 2009 Intel 64 and IA 32 Architectures Software Developer s Manual Documentation Changes Revision History intel Revision Description Date DER June 2009 LE v3 September 2009 December 2009 Intel 64 and IA 32 Architectures Software Developer s Manual Documentation Changes Revision History Intel 64 and IA 32 Architectures Software Developer s Manual Documentation Changes Preface Preface intel This document is an update to the specifications contained in the Affected Documents table below This document is a compilation of device and documentation errata specification clarifications and changes It is intended for hardware system manufacturers and software developers of applications operating systems or tools Affected Documents Document Title Document Numbe
184. d 32 Architectures Software Developer s Manual Documentation Changes 65 Documentation Changes n tel IF VM 1 and IOPL lt 3 AND INT n THEN GP 0 ELSE Protected mode IA 32e mode or virtual 8086 mode interrupt IF IA32 EFER LMA 0 THEN Protected mode or virtual 8086 mode interrupt GOTO PROTECTED MODE ELSE IA 32e mode interrupt IA 32e MODE FI FI Fl REAL ADDRESS MODE IF vector number 4 3 is not within IDT limit THEN GP FI IF stack not large enough for a 6 byte return information THEN 55 Push EFLAGS 15 0 IF lt 0 Clear interrupt flag TF lt 0 Clear trap flag lt 0 Clear AC flag Push CS Push IP No error codes are pushed CS lt IDT Descriptor vector number 4 selector EIP lt IDT Descriptor vector number 4 offset 16 bit offset AND 0000FFFFH END PROTECTED MODE IF vector number 8 7 is not within IDT limits or selected IDT descriptor is not an interrupt trap or task gate type THEN GP vector number 8 2 EXT Fl EXT is bit 0 in error code IF software interrupt Generated by INT n INT 3 or INTO THEN IF gate descriptor DPL CPL THEN GP vector number 8 2 Fl PE 1 DPL CPL software interrupt FI IF gate not present THEN NP vector number 8 2 EXT FI IF task gate Specified in the selected interrupt table descriptor THEN GOTO TA
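The REAL-ADDRESS-MODE branch of the pseudocode above checks that `vector_number * 4 + 3` is within the IDT limit and then loads CS and IP from the 4-byte entry. A rough Python model of just that lookup is sketched below. It assumes the interrupt table is available as a byte string and that each entry stores the 16-bit offset first and the 16-bit selector second (the usual real-mode layout, not spelled out in the excerpt); the function name is hypothetical, and a Python exception stands in for #GP.

```python
import struct


def real_mode_interrupt_target(ivt: bytes, vector: int) -> tuple:
    """Return (cs, ip) for a real-address-mode interrupt vector.

    ivt models the interrupt table; its last valid byte index is the limit.
    """
    idt_limit = len(ivt) - 1
    if vector * 4 + 3 > idt_limit:
        # Corresponds to the #GP case in the pseudocode.
        raise ValueError("#GP: vector entry is not within the IDT limit")
    # Assumed entry layout: little-endian 16-bit offset, then 16-bit selector.
    offset, selector = struct.unpack_from("<HH", ivt, vector * 4)
    return selector, offset & 0xFFFF


if __name__ == "__main__":
    table = bytearray(1024)
    struct.pack_into("<HH", table, 0x21 * 4, 0x1234, 0xF000)
    cs, ip = real_mode_interrupt_target(table, 0x21)
    print(hex(cs), hex(ip))
```

The stack pushes and flag clearing from the pseudocode are deliberately left out; the sketch covers only the table bounds check and the CS:IP fetch.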
185. d Valid Add imm8 to r m8 REX 80 0ib ADDr m8 imm8 Valid N E Add sign extended imm8 to r m64 81 0 iw ADD r m16 B Valid Valid Add imm16 to r m16 imm16 81 0 id ADD r m32 B Valid Valid Add imm32 to r m32 imm32 REX W 81 0 ADD r m64 B Valid N E Add imm32 sign extended id imm32 to 64 bits to r m64 83 0 ib ADD r m16 imm8 Valid Valid Add sign extended imm8 to r m16 83 0 ib ADD r m32 imm8 B Valid Valid Add sign extended imm8 to r m32 Intel 64 and 1 32 Architectures Software Developer s Manual Documentation Changes 13 Documentation Changes Opcode Instruction Op 64 bit Compat Description En Mode Leg Mode REX W 83 0 ADDr m64 imm8 Valid N E Add sign extended imm8 to ib r m64 00 r ADD r m8 r8 A Valid Valid Add r8 to r m8 REX 00 r X ADDr m8 r8 Valid Add r8 to r m8 01 r ADD r m16 r16 A Valid Valid Add r16 to r m16 01 r ADD r m32 r32 A Valid Valid Add r32 to r m32 REXW 01 r ADDr m64 64 A Valid N E Add r64 to r m64 02 r ADD r8 r m8 A Valid Valid Add r m8 to r8 REX 02 r ADDr8 r m8 Valid Add r m8 to r8 03 r ADD r16 r m16 A Valid Valid Add r m16 to r16 03 r ADD r32 r m32 A Valid Valid Add r m32 to r32 REX W 03 r ADDr64 r m64 A Valid Add r m64 to r64 NOTES In 64 bit mode r m8 can not be encoded to access the following byte registers if a REX prefix is used AH BH CH DH Instruction Operand Encoding Op En Operand 1 Operand 2 Ope
186. de OF EE r PMAXSW mm1 A Valid Valid Compare signed word mm2 m64 integers in mm2 m64 and mm1 and return maximum values 66 OF EE r PMAXSW xmm1 A Valid Valid Compare signed word xmm2 m128 integers in xmm2 m128 xmm1 and return maximum values Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes 132 H intel Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg r w ModRM r m r NA NA PMAXUB Maximum of Packed Unsigned Byte Integers Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode OF DE r PMAXUB mm1 A Valid Valid Compare unsigned byte mm2 m64 integers in mm2 m64 and mm1 and returns maximum values 66 OF DE r PMAXUB xmm1 A Valid Valid Compare unsigned byte xmm2 m128 integers in xmm2 m128 xmm1 and returns maximum values Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg r w ModRM r m r NA NA PMAXUD Maximum of Packed Unsigned Dword Integers Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode 66 0F 383F r PMAXUDxmml A Valid Valid Compare packed unsigned xmm2 m128 dword integers xmm1 and xmm2 m128 and store packed maximum values in 1 Instruction Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg r w ModRM r m NA NA Intel 64
187. de Instruction Op 64 Bit Compat Description En Mode Leg Mode OF BO r CMPXCHG r m8 r8 Valid Valid Compare AL with r m8 If equal ZF is set and r8 is loaded into r m8 Else clear ZF and load r m8 into AL REX 0F BO r CMPXCHG A Valid Compare AL with r m8 If r m8 r8 equal ZF is set and r8 is loaded into r m8 Else clear ZF and load r m8 into AL OF B1 r CMPXCHGr m16 Valid Valid Compare AX with r m16 If r16 equal ZF is set and r16 is loaded into r m16 Else clear ZF and load r m16 into AX OF B1 r CMPXCHG r m32 Valid Valid Compare EAX with r m32 If r32 equal ZF is set and r32 is loaded into r m32 Else Clear ZF and load r m32 into EAX REX W CMPXCHGr m64 A Valid Compare RAX with r m64 If 1 r64 equal ZF is set and r64 is loaded into r m64 Else clear ZF and load r m64 into RAX NOTES See the IA 32 Architecture Compatibility section below n 64 bit mode r m8 can not be encoded to access the following byte registers if a REX prefix is used AH BH CH DH Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 ModRM r m w ModRM reg NA NA Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes 39 chenes intel CMP XCHG8B CMPXCHG16B Compare and Exchange Bytes Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode OF C7 1m64 CMPXC
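The CMPXCHG rows above all describe the same rule: compare the accumulator with the destination; if equal, set ZF and store the source into the destination; otherwise clear ZF and load the destination into the accumulator. A small Python model of that rule is sketched here. It ignores operand width, the LOCK prefix, and the other flags, and `cmpxchg` is a hypothetical helper name.

```python
def cmpxchg(accumulator: int, dest: int, src: int) -> tuple:
    """Model CMPXCHG; returns (new_accumulator, new_dest, zf)."""
    if accumulator == dest:
        # Equal: ZF is set and the source is stored into the destination.
        return accumulator, src, 1
    # Not equal: ZF is cleared and the destination is loaded into the accumulator.
    return dest, dest, 0


if __name__ == "__main__":
    print(cmpxchg(5, 5, 9))   # exchange succeeds
    print(cmpxchg(5, 7, 9))   # exchange fails, accumulator updated
```

On failure the accumulator receives the current destination value, which is what lets a retry loop feed the observed value straight back into the next attempt.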
188. defined Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM r m w NA NA NA SQRTPD Compute Square Roots of Packed Double Precision Floating Point Values Opcode Instruction 64 Bit Compat Description En Mode Leg Mode 66 OF 51 r SQRTPDxmml A Valid Valid Computes square roots of xmm2 m128 the packed double precision floating point values in xmm2 m128 and stores the results in xmm1 Intel 64 and IA 32 Architectures Software Developer s Manual Documentation Changes 187 Documentation Changes ntel Instruction Operand Encoding Op En A Operand 1 ModRM reg w Operand 3 NA Operand 4 NA Operand 2 ModRM r m SQRTPS Compute Square Roots of Packed Single Precision Floating Point Values Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode OF 51 r SQRTPS xmm1 A Valid Valid Computes square roots of xmm2 m128 the packed single precision floating point values in xmm2 m128 and stores the results in xmm1 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg w ModRM r m NA NA SQRTSD Compute Squa re Root of Scalar Double Precision Floating Point Value Opcode Instruction Op 64 Bit Compat Description En Leg Mode F20F 51 r SQRTSD xmm1 A Valid Valid Computes square root of xmm2 m64 the low double precision fl
189. difying code. IA-32 processors exhibit model-specific behavior when executing self-modified code, depending upon how far ahead of the current execution pointer the code has been modified. As processor microarchitectures become more complex and start to speculatively execute code ahead of the retirement point (as in P6 and more recent processor families), the rules regarding which code should execute, pre- or post-modification, become blurred. To write self-modifying code and ensure that it is compliant with current and future versions of the IA-32 architectures, use one of the following coding options. (* OPTION 1 *) Store modified code (as data) into code segment; Jump to new code or an intermediate location; Execute new code. (* OPTION 2 *) Store modified code (as data) into code segment; Execute a serializing instruction (for example, the CPUID instruction); Execute new code. The use of one of these options is not required for programs intended to run on the Pentium or Intel486 processors, but is recommended to ensure compatibility with the P6 and more recent processor families. Self-modifying code will execute at a lower level of performance than non-self-modifying or normal code. The degree of the performance deterioration will depend upon the frequency of modification and specific characteristics
190. ding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM r m w ModRM reg NA NA B ModRM reg w ModRM r m NA NA C AL AX EAX RAX Displacement NA NA D Displacement AL AX EAX RAX NA NA E reg w imm8 16 32 64 NA NA F ModRM r m w imm8 16 32 64 NA NA MOV Move to from Control Registers Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode OF 20 r MOV r32 CRO A Valid Move control register to r32 CR7 OF 20 r MOV r64 CRO A Valid N E Move extended control CR7 register to r64 REX R 0F 20 MOV r64 CR8 A Valid N E Move extended CR8 to 0 r64 OF 22 r MOV CRO CR7 A N E Valid Move r32 to control register r32 OF 22 r MOV CRO CR7 A Valid N E Move r64 to extended r64 control register REX R 0F 22 MOV CR8 r64 A Valid N E Move r64 to extended 0 CR8 NOTE 1 MOV CR instructions except for MOV CR8 are serializing instructions MOV CR8 is not architecturally defined as a serializing instruction For more information see Chapter 8 in Intel 64 and IA 32 Architectures Software Developer s Manual Volume Instruction Operand Encoding Op En Operand 1 Operand 2 A ModRM reg w ModRM r m Operand 3 Operand 4 NA NA Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes Documentation Changes MOV Move to from Debug Registers Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode OF
191. dition and its definition differs across GBSQ, GSNPQ, FSB. Bits 31:0 is the event count field. If the specified condition is met during each relevant clock domain of the event logic, the matched condition signals the counter logic to increment the associated event count field. The lower 32 bits of these 8 MSRs at addresses 107CCH through 107D3H are treated as 32-bit performance counter registers. In Dual-Core Intel Xeon processor 7100 series, the uncore performance counters can be accessed using the RDPMC instruction with the index starting from 18 through 25. The EDX register returns zero when reading these 8 PMCs. In Intel Xeon processor 7400 series, RDPMC with ECX between 2 and 9 can be used to access the eight uncore performance counter/control registers. 13. Updates to Appendix A, Volume 3B. Change bars show changes to Appendix A of the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3B: System Programming Guide, Part 2. A.2 PERFORMANCE MONITORING EVENTS FOR INTEL CORE I7 PROCESSOR FAMILY AND XEON PROCESSOR FAMILY. Processors based on the Intel microarchitecture Nehalem support the architectural and non-architectural performance-monitoring events listed in Table A-1 and Table A-2. The events in Table A-2 generally apply to processors with CPUID signature of DisplayFamily_DisplayModel encodi
192. dress corresponds to a TLB entry, the processor may use that TLB entry to determine the page frame, access rights, and other attributes for accesses to that linear address. In this case, the processor may not actually consult the paging structures in memory. The processor may retain a TLB entry unmodified even if software subsequently modifies the relevant paging-structure entries in memory. See Section 4.10.3.2 for how software can ensure that the processor uses the modified paging-structure entries. If the paging structures specify a translation using a page larger than 4 KBytes, some processors may choose to cache multiple smaller-page TLB entries for that translation. Each such TLB entry would be associated with a page number corresponding to the smaller page size (e.g., bits 47:12 of a linear address with IA-32e paging), even though part of that page number (e.g., bits 20:12) is part of the offset with respect to the page specified by the paging structures. The upper bits of the physical address in such a TLB entry are derived from the physical address in the PDE used to create the translation, while the lower bits come from the linear address of the access for which the translation is created. There is no way for software to be aware that multiple translations for smaller pages have been used for a large page. If
193. e OTTLED TEMP CORE 3 is in the power throttled state due 3 to 5 temperature being above the thermal throttling threshold 82H 01H UNC PROCHOT ASS Number of system assertions of ERTION PROCHOT indicating the entire processor has exceeded the thermal limit 83H 01H UNC THERMAL Cycles that the PCU records that core OTTLING PROCHOTC 0 is a low power state due to the ORE 0 system asserting PROCHOT the entire processor has exceeded the thermal limit 83H 02H UNC THERMAL THR Cycles that the PCU records that core OTTLING PROCHOT C 1 is low power state due to the ORE 1 system asserting PROCHOT the entire processor has exceeded the thermal limit 83H 04H UNC THERMAL THR Cycles that the PCU records that core OTTLING PROCHOTC 2 is alow power state due to the ORE 2 system asserting PROCHOT the entire processor has exceeded the thermal limit 83H 08H UNC THERMAL THR Cycles that the PCU records that core OTTLING PROCHOT C 3 is a low power state due to the ORE 3 system asserting PROCHOT the entire processor has exceeded the thermal limit 84H 01H UNC TURBO MODE Uncore cycles that core 0 is operating CORE 0 in turbo mode 84H 02H UNC TURBO MODE Uncore cycles that core 1 is operating CORE 1 in turbo mode 84H 04H UNC TURBO MODE Uncore cycles that core 2 is operating CORE 2 in turbo mode 84H 08H UNC TURBO MODE Uncore cycles that core 3 is operating CORE 3 in turbo mode 85H 02H UNC CYCLES UNHAL Uncore cycles that at lea
194. e 4-2. Valid General and Special Purpose Performance Counter Index Range for RDPMC (Continued). Columns: Processor Family; Displayed Family_Displayed Model / Other Signatures; Valid PMC Index Range; General-purpose Counters. Pentium M processors 06H 09H 06H 0 1 0 1 64 bit Intel Xeon processors 03H 0FH 04H gt 0 and lt 25 gt 0and lt 17 with L3 and L3 is present Intel Core Solo and Intel 06H OEH 0 1 0 1 Core Duo processors Dual core Intel Xeon processor LV Intel Core 2 Duo processor 06H OFH 0 1 0 1 Intel Xeon processor 3000 5100 5300 7300 Series general purpose PMC Intel Xeon processors 7100 0FH 06H and L3 is 20andx25 gt 0 4 lt 17 series with L3 present Intel Core 2 Duo processor 06 17 0 1 0 1 family Intel Xeon processor family general purpose PMC Intel Xeon processors 7400 06H_1DH gt 0 and lt 9 0 1 series Intel Atom processor family 06 0 1 0 1 Intel Core i7 processor Intel 06H 1AH 06H 1EH 0 3 0 1 2 3 Xeon processors 5500 series 06H 1FH 06H 2EH. The Pentium 4 and Intel Xeon processors also support "fast" (32-bit) and "slow" (40-bit) reads on the first 18 performance counters. Select this option using ECX[31]. If bit 31 is set, RDPMC reads only the low 32 bits of the selected performance counter. If bit 31 is clear, all 40 bits are read. A 32-bit result is returned in EAX, and EDX is set to 0. A 32-bit read executes faster on Pentium 4 proce
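The fast/slow read selection via ECX[31] described above can be illustrated with a small Python model. This is a sketch under stated assumptions: the counter is treated as a 40-bit integer, the slow read is assumed to return the upper 8 bits in EDX and the lower 32 bits in EAX, and `rdpmc_model` and `FAST_READ` are hypothetical names. It does not execute RDPMC itself.

```python
FAST_READ = 1 << 31  # ECX bit 31 selects the 32-bit ("fast") read


def rdpmc_model(counter_value: int, ecx: int) -> tuple:
    """Return (eax, edx) as a model of the fast and slow RDPMC reads."""
    counter_value &= (1 << 40) - 1        # counters are 40 bits wide
    eax = counter_value & 0xFFFFFFFF
    if ecx & FAST_READ:
        # Fast read: only the low 32 bits are read; EDX is set to 0.
        return eax, 0
    # Slow read: all 40 bits, with the upper bits assumed to land in EDX.
    return eax, counter_value >> 32


if __name__ == "__main__":
    value = 0xAB12345678
    print([hex(x) for x in rdpmc_model(value, 5 | FAST_READ)])
    print([hex(x) for x in rdpmc_model(value, 5)])
```

The low bits of ECX still carry the counter index; the model ignores the index because it only demonstrates how bit 31 changes the returned width.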
195. Columns: MSR Address (x2APIC mode); MMIO Offset (xAPIC mode); Register Name; MSR R/W Semantics; Comments. 837H; 370H; LVT Error register; Read/write; See Figure 10-8 for reserved bits. 838H; 380H; Initial Count register (for Timer); Read/write. 839H; 390H; Current Count register (for Timer); Read-only. 83EH; 3E0H; Divide Configuration Register (DCR; for Timer); Read/write; See Figure 10-10 for reserved bits. 83FH; Not available; SELF IPI; Write-only; Available only in x2APIC mode. NOTES: 1. WRMSR causes #GP(0) for read-only registers. 2. WRMSR causes #GP(0) for attempts to set a reserved bit to 1 in a read/write register (including bits 63:32 of each register). 3. RDMSR causes #GP(0) for write-only registers. 4. MSR 831H is reserved; read/write operations cause general-protection exceptions. The contents of the APIC register at MMIO offset 310H are accessible in x2APIC mode through the MSR at address 830H. 5. SELF IPI register is supported only in x2APIC mode. 10.12.1.3 Reserved Bit Checking. Section 10.12.1.2 and Table 10-6 specify the reserved bit definitions for the APIC registers in x2APIC mode. Non-zero writes (by the WRMSR instruction) to reserved bits of these registers will raise a general-protection fault exception, while reads return zeros (RsvdZ semantics). In x2APIC mode, the local APIC ID register is in
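The table rows above follow a regular pattern: MMIO offset 370H pairs with MSR 837H, 380H with 838H, and 3E0H with 83EH, so the MSR address appears to be 800H plus the MMIO offset divided by 16. The Python sketch below captures that arithmetic together with the note that offset 310H is reached through MSR 830H. It is an illustration only: `x2apic_msr_for_offset` is a hypothetical helper, and it does not check whether a given register actually exists in x2APIC mode.

```python
X2APIC_MSR_BASE = 0x800  # inferred from the table rows (for example 370H -> 837H)


def x2apic_msr_for_offset(mmio_offset: int) -> int:
    """Map an xAPIC MMIO register offset to the matching x2APIC MSR address."""
    if mmio_offset % 0x10 != 0 or not 0 <= mmio_offset <= 0x3E0:
        raise ValueError("offset must be 16-byte aligned and in 000H..3E0H")
    if mmio_offset == 0x310:
        # Per note 4, the register at offset 310H is accessed through MSR 830H.
        return 0x830
    return X2APIC_MSR_BASE + (mmio_offset >> 4)


if __name__ == "__main__":
    for off in (0x300, 0x310, 0x370, 0x3E0):
        print(hex(off), "->", hex(x2apic_msr_for_offset(off)))
```

MSR 83FH (SELF IPI) is excluded from the accepted range because the table lists no MMIO offset for it.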
196. e Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3B: System Programming Guide, Part 2. 21.3 MANAGING VMCS REGIONS AND POINTERS. A VMM must observe necessary procedures when working with a VMCS, the associated VMCS pointer, and the VMCS region. It must also not assume the state of persistency for VMCS regions in memory or cache. Before entering VMX operation, the host VMM allocates a VMXON region. A VMM can host several virtual machines and have many VMCSs active under its management. A unique VMCS region is required for each virtual machine; a VMXON region is required for the VMM itself. A VMM determines the VMCS region size by reading the IA32_VMX_BASIC MSR; it creates VMCS regions of this size using a 4-KByte-aligned area of physical memory. Each VMCS region needs to be initialized with a VMCS revision identifier (at byte offset 0) identical to the revision reported by the processor in the VMX capability MSR. NOTE: Software must not read or write directly to the VMCS data region, as the format is not architecturally defined. Consequently, Intel recommends that the VMM remove any linear-address mappings to VMCS regions before loading. System software does not need to do special preparation to the VMXON region before entering into VMX operation. The address of the VMXON region for the VMM is provided as an operand to the VMXON instruction. Once in VMX root operation, the VMM needs to prepare data fields in the VMCS that cont
197. e during linear-address translation (see Section 4.9). 11:5: Ignored. (M-1):12: Physical address of the 4-KByte-aligned PML4 table used for linear-address translation. 63:M: Reserved (must be 0). NOTES: 1. M is an abbreviation for MAXPHYADDR, which is at most 52; see Section 4.1.4. IA-32e paging may map linear addresses to 4-KByte pages, 2-MByte pages, or 1-GByte pages. Figure 4-8 illustrates the translation process when it produces a 4-KByte page; Figure 4-9 covers the case of a 2-MByte page, and Figure 4-10 the case of a 1-GByte page. The following items describe the IA-32e paging process in more detail as well as how the page size is determined. A 4-KByte naturally aligned page-directory-pointer table is located at the physical address specified in bits 51:12 of the PML4E (see Table 4-13). A page-directory-pointer table comprises 512 64-bit entries (PDPTEs). A PDPTE is selected using the physical address defined as follows: Bits 51:12 are from the PML4E. Bits 11:3 are bits 38:30 of the linear address. Bits 2:0 are all 0. Footnotes: 1. If MAXPHYADDR < 52, bits in the range 51:MAXPHYADDR will be 0 in any physical address used by IA-32e paging. (The corresponding bits are reserved in the paging-structure entries.) See Section 4.1.4 for how to determine MAXPHYADDR. 2. Not all processors support 1-GByte pages; see Section 4.1.4.
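The PDPTE selection rule above (bits 51:12 from the PML4E, bits 11:3 from bits 38:30 of the linear address, bits 2:0 zero) is plain bit arithmetic and can be written out as a short Python sketch. The function and constant names are hypothetical, and the example values are made up for illustration.

```python
ADDR_MASK_51_12 = ((1 << 52) - 1) & ~0xFFF  # keeps bits 51:12 only


def pdpte_physical_address(pml4e: int, linear_address: int) -> int:
    """Compute the physical address of the PDPTE used for a linear address."""
    table_base = pml4e & ADDR_MASK_51_12      # bits 51:12 come from the PML4E
    index = (linear_address >> 30) & 0x1FF    # bits 38:30 of the linear address
    return table_base | (index << 3)          # index lands in bits 11:3; 2:0 are 0


if __name__ == "__main__":
    # Example PML4E with flag bits set in its low 12 bits.
    print(hex(pdpte_physical_address(0x12345067, 3 << 30)))
```

Shifting the 9-bit index left by 3 reflects the 8-byte entry size, which is why a 4-KByte table holds exactly 512 entries.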
198. e imm8 with r m64 ib 38 r CMP r m8 r8 B Valid Valid Compare r8 with r m8 REX 38 r r m8 r8 B Valid Compare r8 with r m8 39 CMP r m16 r16 B Valid Valid Compare r16 with r m16 39 CMP r m32 r32 B Valid Valid Compare r32 with r m32 REX W 39 r r m64 r64 B Valid Compare r64 with r m64 3A CMP r8 r m8 A Valid Valid Compare r m8 with r8 REX r CMP r8 r m8 A Valid N E Compare r m8 with r8 3B r CMP r16 r m16 A Valid Valid Compare r m16 with r16 3B r CMP r32 r m32 A Valid Valid Compare r m32 with r32 REXW 3B r CMP r64 r m64 Valid Compare r m64 with r64 NOTES n 64 bit mode r m8 can not be encoded to access the following byte registers if a REX prefix is used AH BH CH DH Instruction Operand Encoding Op En Operand 1 A ModRM reg r w B ModRM r m r w C ModRM r m D AL AX EAX RAX Operand 2 ModRM r m ModRM reg w imm8 imm8 Operand 3 Operand 4 NA NA NA NA NA NA NA NA Intel 64 and IA 32 Architectures Software Developer s Manual Documentation Changes 34 chenes intel CMPPD Compare Packed Double Precision Floating Point Values Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode 66 C2 rib CMPPD xmm1 A Valid Valid Compare packed double xmm2 m128 imm8 precision floating point values in xmm2 m128 and xmm1 using imm8 as comparison predicate Instruction O
199. e precision floating point values in xmm1 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg w ModRM r m NA NA Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes 43 H intel CVTDQ2PS Convert Packed Dword Integers to Packed Single Precision FP Values Opcode Instruction Op 64 Bit Compat Description En Leg Mode OF 5B r CVTDQ2PS xmm1 Valid Valid Convert four packed signed xmm2 m128 doubleword integers from xmm2 m128 to four packed single precision floating point values in xmm1 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg w ModRM r m NA NA CVTPD2DQ Convert Packed Double Precision FP Values to Packed Dword Integers Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode F2 OF E6 CVTPD2DQxmml1 A Valid Valid Convert two packed double xmm2 m128 precision floating point values from xmm2 m128 to two packed signed doubleword integers in xmm1 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg w ModRM r m NA NA CVTPD2PI Convert Packed Double Precision FP Values to Packed Dword Integers Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode 66 OF 2D r CVTPD2PI mm A Valid Valid Convert two packed double xmm m128 precision floating point values from
200. ead only. Entries list MSR address (xAPIC MMIO offset): register name; access; comments. 817H (170H): ISR bits 255:224; read only. 818H (180H): Trigger Mode Register (TMR) bits 31:0; read only. 819H (190H): TMR bits 63:32; read only. 81AH (1A0H): TMR bits 95:64; read only. 81BH (1B0H): TMR bits 127:96; read only. 81CH (1C0H): TMR bits 159:128; read only. 81DH (1D0H): TMR bits 191:160; read only. 81EH (1E0H): TMR bits 223:192; read only. 81FH (1F0H): TMR bits 255:224; read only. 820H (200H): Interrupt Request Register (IRR) bits 31:0; read only. 821H (210H): IRR bits 63:32; read only. 822H (220H): IRR bits 95:64; read only. 823H (230H): IRR bits 127:96; read only. 824H (240H): IRR bits 159:128; read only. 825H (250H): IRR bits 191:160; read only. 826H (260H): IRR bits 223:192; read only. 827H (270H): IRR bits 255:224; read only. 828H (280H): Error Status Register (ESR); read/write; WRMSR of a non-zero ESR value causes #GP(0); see Section 10.5.3 and Section 10.12.8. 82FH (2F0H): LVT CMCI register; read/write; see Figure 15-10 for reserved bits. 830H (300H and 310H): Interrupt Command Register (ICR); read/write; see Figure 10-29 for reserved bits. 832H (320H): LVT Timer register; read/write; see Figure 10-8 for reserved bits. 833H (330H): LVT Thermal Sensor register; read/write; see Figure 10-8 for reserved bits. 834H (340H): LVT Performance Monitoring register; read/write; see Figure 10-8 for reserved bits. 835H (350H): LVT LINT0 register; read/write; see Figure 10-8 for reserved bits. 836H (360H): LVT LINT1 register; read/write; see Figure 10-8 for reserved bits.
201. ection 4.10.1.4. PAT: page-attribute table. If CPUID.01H:EDX.PAT [bit 16] = 1, the 8-entry page-attribute table (PAT) is supported. When the PAT is supported, three bits in certain paging-structure entries select a memory type (used to determine type of caching used) from the PAT (see Section 4.9). PSE-36: 36-bit page size extension. If CPUID.01H:EDX.PSE-36 [bit 17] = 1, the PSE-36 mechanism is supported, indicating that translations using 4-MByte pages with 32-bit paging may produce physical addresses with more than 32 bits (see Section 4.3). NX: execute disable. If CPUID.80000001H:EDX.NX [bit 20] = 1, IA32_EFER.NXE may be set to 1, allowing PAE paging and IA-32e paging to disable execute access to selected pages (see Section 4.6). (Processors that do not support CPUID function 80000001H do not allow IA32_EFER.NXE to be set to 1.) Page1GB: 1-GByte pages. If CPUID.80000001H:EDX.Page1GB [bit 26] = 1, 1-GByte pages are supported with IA-32e paging (see Section 4.5). LM: IA-32e mode support. If CPUID.80000001H:EDX.LM [bit 29] = 1, IA32_EFER.LME may be set to 1, enabling IA-32e paging. (Processors that do not support CPUID function 80000001H do not allow IA32_EFER.LME to be set to 1.) CPUID.80000008H:EAX[7:0] reports the physical-address width supported by the processor. For processors that do not s
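The feature checks in this excerpt come down to testing single bits of two CPUID EDX values and masking the low byte of one EAX value. A minimal sketch follows, assuming the raw register values have already been read by some other means; the function and dictionary names are hypothetical, and only the bit positions come from the text.

```python
# Bit positions quoted in the excerpt above; everything else is illustrative.
EDX_01H_BITS = {"PAT": 16, "PSE-36": 17}
EDX_80000001H_BITS = {"NX": 20, "Page1GB": 26, "LM": 29}


def decode_paging_features(edx_01h, edx_80000001h):
    """Return {feature name: bool} from two raw CPUID EDX values.

    Hypothetical helper: the caller supplies the register values.
    """
    features = {}
    for name, bit in EDX_01H_BITS.items():
        features[name] = bool((edx_01h >> bit) & 1)
    for name, bit in EDX_80000001H_BITS.items():
        features[name] = bool((edx_80000001h >> bit) & 1)
    return features


def physical_address_width(eax_80000008h):
    """CPUID.80000008H:EAX[7:0] is the physical-address width in bits."""
    return eax_80000008h & 0xFF
```

Keeping the bit positions in tables rather than in separate functions makes it easy to compare the sketch against the excerpt line by line.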
202. ee that the CLFLUSH instruction is implemented in the processor. CLFLUSH operation is the same in non-64-bit modes and 64-bit mode. CLI (Clear Interrupt Flag): FA; CLI; Op/En A; 64-Bit Mode Valid; Compat/Leg Mode Valid; Clear interrupt flag; interrupts disabled when interrupt flag cleared. Instruction Operand Encoding: Op/En A: Operands 1 through 4 NA. CLTS (Clear Task-Switched Flag in CR0): 0F 06; CLTS; Op/En A; 64-Bit Mode Valid; Compat/Leg Mode Valid; Clears TS flag in CR0. Instruction Operand Encoding: Op/En A: Operands 1 through 4 NA. CMC (Complement Carry Flag): F5; CMC; Op/En A; 64-Bit Mode Valid; Compat/Leg Mode Valid; Complement CF flag. Instruction Operand Encoding: Op/En A: Operands 1 through 4 NA. CMOVcc (Conditional Move): 0F 47 /r; CMOVA r16, r/m16; 64-Bit Mode Valid; Compat/Leg Mode Valid; Move if above (CF=0 and ZF=0). 0F 47 /r; CMOVA r32, r/m32; 64-Bit Mode Valid; Compat/Leg Mode Valid; Move if above
203. egisters. The location of the first byte of the memory location is specified by DI/EDI and DS registers. The memory location does not need to be aligned on a natural boundary. (The size of the store address depends on the address-size attribute.) The most significant bit in each byte of the mask operand determines whether the corresponding byte in the source operand is written to the corresponding byte location in memory: 0 indicates no write and 1 indicates write. The MASKMOVDQU instruction generates a non-temporal hint to the processor to minimize cache pollution. The non-temporal hint is implemented by using a write combining (WC) memory type protocol (see "Caching of Temporal vs. Non-Temporal Data" in Chapter 10 of the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 1). Because the WC protocol uses a weakly-ordered memory consistency model, a fencing operation implemented with the SFENCE or MFENCE instruction should be used in conjunction with MASKMOVDQU instructions if multiple processors might use different memory types to read/write the destination memory locations. MASKMOVQ (Store Selected Bytes of Quadword): 0F F7 /r; MASKMOVQ mm1, mm2; 64-Bit Mode Valid; Compat/Leg Mode Valid; Selectively write bytes from mm1 to memory location using the
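The byte-selection rule described for MASKMOVDQU (the most significant bit of each mask byte decides whether the matching source byte is stored) can be modelled in a few lines. This is only a sketch of the selection logic, assuming memory is represented by a `bytearray`; it does not model the non-temporal hint, the WC protocol, or any ordering behaviour, and the function name is hypothetical.

```python
def masked_byte_store(memory, address, source, mask):
    """Store source[i] at memory[address + i] when bit 7 of mask[i] is 1.

    memory is a bytearray standing in for the destination; source and mask
    are equal-length byte sequences (16 bytes for MASKMOVDQU).
    """
    if len(source) != len(mask):
        raise ValueError("source and mask must have the same length")
    for i, (src_byte, mask_byte) in enumerate(zip(source, mask)):
        if mask_byte & 0x80:  # most significant bit set: write this byte
            memory[address + i] = src_byte
        # most significant bit clear: the memory byte is left untouched
```

Only bit 7 of each mask byte matters in this model, so a mask byte of 7FH suppresses the write while FFH and 80H both allow it.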
204. egment *) CS.LIMIT <- FFFFFH; (* 4-GByte limit *) CS.ARbyte.G <- 1; (* 4-KByte granularity *) CS.ARbyte.S <- 1; CS.ARbyte.TYPE <- 1011B; (* Execute + Read, Accessed *) CS.ARbyte.D <- 1; (* 32-bit code segment *) CS.ARbyte.DPL <- 0; CS.SEL.RPL <- 0; CS.ARbyte.P <- 1; CPL <- 0; SS.SEL <- CS.SEL + 8; (* Set rest of SS to a fixed value *) SS.BASE <- 0; (* Flat segment *) SS.LIMIT <- FFFFFH; (* 4-GByte limit *) SS.ARbyte.G <- 1; (* 4-KByte granularity *) SS.ARbyte.S <- 1; SS.ARbyte.TYPE <- 0011B; (* Read/Write, Accessed *) SS.ARbyte.D <- 1; (* 32-bit stack segment *) SS.ARbyte.DPL <- 0; SS.SEL.RPL <- 0; SS.ARbyte.P <- 1; ESP <- SYSENTER_ESP_MSR; EIP <- SYSENTER_EIP_MSR; SYSEXIT (Fast Return from Fast System Call): 0F 35; SYSEXIT; Op/En A; 64-Bit Mode Valid; Compat/Leg Mode Valid; Fast return to privilege level 3 user code. REX.W + 0F 35; SYSEXIT; Op/En A; 64-Bit Mode Valid; Compat/Leg Mode Valid; Fast return to 64-bit mode privilege level 3 user code. Instruction Operand Encoding: Op/En A: Operands 1 through 4 NA. SYSRET (Return From Fast System Call): 0F 07; SYSRET; Op/En A; 64-Bit Mode Valid; Compat/Leg Mode Invalid; Return from fast system call. Instruction Operand Encoding: Op/En, Operand 1, Operand 2, Operand 3, Ope
205. el Revision History. Entries list revision: description; date. 001: Initial release; November 2002. 002: Added 1-10 Documentation Changes; removed old Documentation Changes items that already have been incorporated in the published Software Developer's manual; December 2002. 003: Added 9-17 Documentation Changes; removed Documentation Change #6 (References to bits Gen and Len Deleted); removed Documentation Change #4 (VIF Information Added to CLI Discussion); February 2003. 004: Removed Documentation Changes 1-17; added Documentation Changes 1-24; June 2003. 005: Removed Documentation Changes 1-24; added Documentation Changes 1-15; September 2003. 006: Added Documentation Changes 16-34; November 2003. 007: Updated Documentation Changes 14, 16, 17, and 28; added Documentation Changes 35-45; January 2004. 008: Removed Documentation Changes 1-45; added Documentation Changes 1-5; March 2004. 009: Added Documentation Changes 7-27; May 2004. 010: Removed Documentation Changes 1-27; added Documentation Changes 1; August 2004. 011: Added Documentation Changes 2-28; November 2004. 012: Removed Documentation Changes 1-28; added Documentation Changes 1-16; March 2005. 013: Updated title; there are no Documentation Changes for this revision of the document; July 2005. 014: Added Documentation Changes 1-21; September 2005. 015: Removed Documentation Changes 1-21; added Do
206. Description column: Set byte if less or equal (ZF=1 or SF≠OF). Set byte if not above (CF=1 or ZF=1). Set byte if not above (CF=1 or ZF=1). Set byte if not above or equal (CF=1). Set byte if not above or equal (CF=1). Set byte if not below (CF=0). Set byte if not below (CF=0). Set byte if not below or equal (CF=0 and ZF=0). Set byte if not below or equal (CF=0 and ZF=0). Set byte if not carry (CF=0). Set byte if not carry (CF=0). Set byte if not equal (ZF=0). Set byte if not equal (ZF=0). Set byte if not greater (ZF=1 or SF≠OF). Set byte if not greater (ZF=1 or SF≠OF). Set byte if not greater or equal (SF≠OF). Set byte if not greater or equal (SF≠OF). Set byte if not less (SF=OF). Set byte if not less (SF=OF). Set byte if not less or equal (ZF=0 and SF=OF). Set byte if not less or equal (ZF=0 and SF=OF). Set byte if not overflow (OF=0). Set byte if not overflow (OF=0). Set byte if not parity (PF=0). Set byte if not parity (PF=0). Set byte if not sign (SF=0). Rows list opcode; instruction; Op/En; 64-Bit Mode; Compat/Leg Mode; description. REX + 0F 99; SETNS r/m8; A; Valid; N.E.; Set byte if not sign (SF=0). 0F 95; SETNZ r/m8; A; Valid; Valid; Set byte if not zero (ZF=0). REX + 0F 95; SETNZ r/m8; A; Valid; N.E.; Set byte if not zero (ZF=0). 0F 90; SETO r/m8; A; Valid; Valid; Set byte if overf
207. entium D processors 06 09H Intel Pentium M processor OF 02H Intel Xeon Processor Intel Xeon Processor MP Intel Pentium 4 processors OF 01H Intel Xeon Processor Intel Xeon Processor MP Intel Pentium 4 processors 06 7H 06 08H 06 Intel Pentium Xeon Processor Intel Pentium III Processor 06 0 06 03H 06 05H Intel Pentium II Xeon Processor Intel Pentium 11 Processor 06 01H Intel Pentium Pro Processor 05 01H 05 02H 05 04H Intel Pentium Processor Intel Pentium Processor with MMX Technology Intel 64 and 1 32 Architectures Software Developer s Manual Documentation Changes 283 Documentation Changes Table B 2 IA 32 Architectural MSRs Register Address Architectural MSR Name Introduced as and bit fields Architectural Former MSR MSR Bit Description MSR 179H 377 IA32 CAP Global Machine Check 06 01H MCG CAP Capability RO 7 0 Count Number of reporting banks 8 P IA32 is present if this bit is set 9 MCG EXT P Extended machine check state registers are present if this bit is set 10 MCP CMCI P Support for 06 1AH corrected MC error event is present 11 TES P Threshold based error status register are present if this bit is set 1512 Reserved 23 16 MCG EXT CNT Number of extended machine check state register
208. er (for Timer); Read Only. FEE0 03A0H through FEE0 03D0H: Reserved. FEE0 03E0H: Divide Configuration Register (for Timer); Read/Write. FEE0 03F0H: Reserved. NOTES: 1. Not supported in the Pentium 4 and Intel Xeon processors. The Illegal Register Access bit (7) of the ESR will not be set when writing to these registers. Introduced in the Pentium 4 and Intel Xeon processors. This APIC register and its associated function are implementation dependent and may not be present in future IA-32 or Intel 64 processors. Introduced in the Pentium Pro processor. This APIC register and its associated function are implementation dependent and may not be present in future IA-32 or Intel 64 processors. Suppress EOI-broadcasts: Indicates whether software can inhibit the broadcast of EOI message by setting bit 12 of the Spurious Interrupt Vector Register; see Section 10.8.5 and Section 10.9. Figure 10-7. Local APIC Version Register: bits 31:25 Reserved; bit 24 Support for EOI-broadcast suppression; bits 23:16 Max LVT Entry; bits 15:8 Reserved; bits 7:0 Version. Value after reset: 00BN 00VVH (V = Version, N = # of LVT entries minus 1, B = 1 if EOI-broadcast suppression supported). Address: FEE0 0030H. 10.5.1 Local Vector Table. The local vector table (LVT) allows software to specify the man
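The field layout given for the Local APIC Version Register (Figure 10-7) maps directly onto shifts and masks. A minimal sketch, assuming the 32-bit register value has already been read from address FEE0 0030H by other means; the function name and dictionary keys are hypothetical.

```python
def decode_apic_version(value):
    """Split a raw Local APIC Version Register value into its named fields.

    Field positions follow the excerpt: bits 7:0 Version, bits 23:16
    Max LVT Entry, bit 24 support for EOI-broadcast suppression.
    """
    return {
        "version": value & 0xFF,
        "max_lvt_entry": (value >> 16) & 0xFF,  # number of LVT entries minus 1
        "eoi_broadcast_suppression": bool((value >> 24) & 1),
    }
```

Because Max LVT Entry holds the number of LVT entries minus 1, a caller that wants the entry count adds one to that field.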
209. ers and fields within the registers: ICR Register. The following fields in the ICR register are used to specify the destination of an IPI. Destination Mode: Selects one of two destination modes (physical or logical). Destination Field: In physical destination mode, used to specify the APIC ID of the destination processor; in logical destination mode, used to specify a message destination address (MDA) that can be used to select specific processors in clusters. Destination Shorthand: A quick method of specifying all processors, all excluding self, or self as the destination. Delivery mode, Lowest Priority: Architecturally specifies that a lowest-priority arbitration mechanism be used to select a destination processor from a specified group of processors. The ability of a processor to send a lowest priority IPI is model specific and should be avoided by BIOS and operating system software. Local destination register (LDR): Used in conjunction with the logical destination mode and MDAs to select the destination processors. Destination format register (DFR): Used in conjunction with the logical destination mode and MDAs to select the destination processors. How the ICR, LDR, and DFR are used to select an IPI destination depends on the destination mode used: phy
210. ess the following byte registers if a REX prefix is used: AH, BH, CH, DH. 40H through 47H are REX prefixes in 64-bit mode. Instruction Operand Encoding: Op/En A: Operand 1 ModRM:r/m (r, w), Operands 2 through 4 NA; Op/En B: Operand 1 reg (r, w), Operands 2 through 4 NA. INS/INSB/INSW/INSD (Input from Port to String): all forms are Op/En A, Valid in 64-bit mode and Valid in compat/leg mode. 6C; INS m8, DX; Input byte from I/O port specified in DX into memory location specified in ES:(E)DI or RDI. 6D; INS m16, DX; Input word from I/O port specified in DX into memory location specified in ES:(E)DI or RDI. 6D; INS m32, DX; Input doubleword from I/O port specified in DX into memory location specified in ES:(E)DI or RDI. 6C; INSB; Input byte from I/O port specified in DX into memory location specified with ES:(E)DI or RDI. 6D; INSW; Input word from I/O port specified in DX into memory location specified in ES:(E)DI or RDI. 6D; INSD; Input doubleword from I/O port specified in DX into memory location specified in ES:(E)DI or RDI. NOTES: In 64-bit mode, only 64-bit (RDI) and 32-bit (EDI) address sizes are supported. In non-64-bit mode, only 32-bit (EDI) and 16-bit (DI) address sizes are supported. Instr
211. estination and source operands (xmm2/m128, imm8), extract byte-aligned result shifted to the right by constant value in imm8 into xmm1. Instruction Operand Encoding: Op/En A: Operand 1 ModRM:reg (r, w), Operand 2 ModRM:r/m (r), Operand 3 imm8, Operand 4 NA. PAND (Logical AND): 0F DB /r; PAND mm, mm/m64; Op/En A; 64-Bit Mode Valid; Compat/Leg Mode Valid; Bitwise AND mm/m64 and mm. 66 0F DB /r; PAND xmm1, xmm2/m128; Op/En A; 64-Bit Mode Valid; Compat/Leg Mode Valid; Bitwise AND of xmm2/m128 and xmm1. Instruction Operand Encoding: Op/En A: Operand 1 ModRM:reg (r, w), Operand 2 ModRM:r/m (r), Operands 3 and 4 NA. PANDN (Logical AND NOT): 0F DF /r; PANDN mm, mm/m64; Op/En A; 64-Bit Mode Valid; Compat/Leg Mode Valid; Bitwise AND NOT of mm/m64 and mm. 66 0F DF /r; PANDN xmm1, xmm2/m128; Op/En A; 64-Bit Mode Valid; Compat/Leg Mode Valid; Bitwise AND NOT of xmm2/m128 and xmm1. Instruction Operand Encoding: Op/En A: Operand 1 ModRM:reg (r, w), Operand 2 ModRM:r/m, Operands 3 and 4 NA. PAUSE (Spin Loop Hint): F3 90; PAUSE; 64-Bit Mode Valid; Compat/Leg Mode Valid; Gives hint to processor that improves performance of spin-wait loops. Instruction Operand Encoding: Op/En, Operand 1, Operand 2, Operand 3, Operand
212. extended topology enumeration leaf is the preferred mechanism for enumerating topology. The presence of CPUID leaf 0BH in a processor does not guarantee support for x2APIC. If CPUID.EAX=0BH:EBX returns zero and maximum input value for basic CPUID information is greater than 0BH, then CPUID.0BH leaf is not supported on that processor. The extended topology enumeration leaf is intended to assist software with enumerating processor topology on systems that require 32-bit x2APIC IDs to address individual logical processors. Details of CPUID leaf 0BH can be found in the reference pages of CPUID in Chapter 3 of Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 2A. Processor topology enumeration algorithm for processors supporting the extended topology enumeration leaf of CPUID and processors that do not support CPUID leaf 0BH are treated in Section 8.9.4, "Algorithm for Three-Level Mappings of APIC_ID". 10.12.8 Error Handling in x2APIC Mode. RDMSR/WRMSR operations to reserved addresses in x2APIC mode cause general-protection exceptions, as do reserved-bit violations (see Section 10.12.1.3). Beyond illegal register access and reserved bit violations, other APIC errors are logged in Error Status Register. Writes of a non-zero value to the Error Status Register in x2APIC m
213. fter executing VMCLEAR for that region. If a logical processor leaves VMX operation, any VMCSs active on that logical processor may be corrupted (see below). To prevent such corruption of a VMCS that may be used either after a return to VMX operation or on another logical processor, software should VMCLEAR that VMCS before executing the VMXOFF instruction or removing power from the processor (e.g., as part of a transition to the S3 and S4 power states). This section has identified operations that may cause a VMCS to become corrupted. These operations may cause the VMCS's data to become undefined. Behavior may be unpredictable if that VMCS is used subsequently on any logical processor. The following items detail some hazards of VMCS corruption: VM entries may fail for unexplained reasons or may load undesired processor state. The processor may not correctly support VMX non-root operation as documented in Chapter 21 and may generate unexpected VM exits. VM exits may load undesired processor state, save incorrect state into the VMCS, or cause the logical processor to transition to a shutdown state. 21.10.3 Initializing a VMCS. Software should initialize fields in a VMCS (using VMWRITE) before using the VMCS for VM entry. Failure to do so may result in unpredictable behavior; for example, a VM entry may fail for unexplained reasons, or a successful transition (VM entry or VM exit) may load processor state with unexpected values. It is not
214. g-point values in xmm2/m128 and stores the results in xmm1. Instruction Operand Encoding: Op/En A: Operand 1 ModRM:reg (w), Operand 2 ModRM:r/m, Operands 3 and 4 NA. RCPSS (Compute Reciprocal of Scalar Single-Precision Floating-Point Values): F3 0F 53 /r; RCPSS xmm1, xmm2/m32; Op/En A; 64-Bit Mode Valid; Compat/Leg Mode Valid; Computes the approximate reciprocal of the scalar single-precision floating-point value in xmm2/m32 and stores the result in xmm1. Instruction Operand Encoding: Op/En A: Operand 1 ModRM:reg (w), Operand 2 ModRM:r/m, Operands 3 and 4 NA. RDMSR (Read from Model Specific Register): 0F 32; RDMSR; Op/En A; 64-Bit Mode Valid; Compat/Leg Mode Valid; Read MSR specified by ECX into EDX:EAX. NOTES: See IA-32 Architecture Compatibility section below. Instruction Operand Encoding: Op/En A: Operands 1 through 4 NA. RDPMC (Read Performance-Monitoring Counters): 0F 33; RDPMC; Op/En A; 64-Bit Mode Valid; Compat/Leg Mode Valid; Read performance-monitoring counter specified by ECX into EDX:EAX.
215. g Mode. Rows list opcode; instruction; Op/En; 64-Bit Mode; Compat/Leg Mode; description. 0F 01 /2; LGDT m16&32; A; N.E.; Valid; Load m into GDTR. 0F 01 /3; LIDT m16&32; A; N.E.; Valid; Load m into IDTR. 0F 01 /2; LGDT m16&64; A; Valid; N.E.; Load m into GDTR. 0F 01 /3; LIDT m16&64; A; Valid; N.E.; Load m into IDTR. Instruction Operand Encoding: Op/En A: Operand 1 ModRM:r/m (r), Operands 2 through 4 NA. LLDT (Load Local Descriptor Table Register): 0F 00 /2; LLDT r/m16; A; Valid; Valid; Load segment selector r/m16 into LDTR. Instruction Operand Encoding: Op/En A: Operand 1 ModRM:r/m (r), Operands 2 through 4 NA. LMSW (Load Machine Status Word): 0F 01 /6; LMSW r/m16; A; Valid; Valid; Loads r/m16 in machine status word of CR0. Instruction Operand Encoding: Op/En A: Operand 1 ModRM:r/m (r), Operands 2 through 4 NA. LOCK (Assert LOCK# Signal Prefix): F0; LOCK; A; Valid; Valid; Asserts LOCK# signal for duration of the accompanying instruction. NOTES: See IA-32 Architecture Compatibility section below. Instruction Operand Encoding: Op/En, Operand 1, Operand 2, Operand 3, Operand 4: A, NA
216. ge the P flag from 0 to 1, no invalidation is necessary. This is because no TLB entry or paging-structure cache entry is created with information from a paging-structure entry in which the P flag is 0. If a paging-structure entry is modified to change the accessed flag from 0 to 1, no invalidation is necessary (assuming that an invalidation was performed the last time the accessed flag was changed from 1 to 0). This is because no TLB entry or paging-structure cache entry is created with information from a paging-structure entry in which the accessed flag is 0. If a paging-structure entry is modified to change the R/W flag from 0 to 1, failure to perform an invalidation may result in a "spurious" page-fault exception (e.g., in response to an attempted write access) but no other adverse behavior. Such an exception will occur at most once for each affected linear address (see Section 4.10.3.1). If a paging-structure entry is modified to change the U/S flag from 0 to 1, failure to perform an invalidation may result in a "spurious" page-fault exception (e.g., in response to an attempted user-mode access) but no other adverse behavior. Such an exception will occur at most once for each affected linear address (see Section 4.10.3.1). If a paging-structure entry is modified to change the XD flag from 1 to 0, failure to perform an invalidation may result in a "spurious" page-fault exception (e.g., in response to an attempted instruction fetch) but no o
217. gister. System software only specifies the vector associated with the interrupt to be sent. The semantics of sending a self-IPI via the SELF IPI register are identical to sending a self-targeted edge-triggered fixed interrupt with the specified vector. Specifically, the semantics are identical to the following settings for an inter-processor interrupt sent via the ICR: Destination Shorthand (ICR[19:18] = 01, Self), Trigger Mode (ICR[15] = 0, Edge), Delivery Mode (ICR[10:8] = 000, Fixed), Vector (ICR[7:0] = Vector). Figure 10-31. SELF IPI register: MSR address 083FH; bits 31:8 Reserved; bits 7:0 Vector. The SELF IPI register is a write-only register. A RDMSR instruction with address of the SELF IPI register causes a general-protection exception. The handling and prioritization of a self-IPI sent via the SELF IPI register is architecturally identical to that for an IPI sent via the ICR from a legacy xAPIC unit. Specifically, the state of the interrupt would be tracked via the Interrupt Request Register (IRR) and In Service Register (ISR) and Trigger Mode Register (TMR) as if it were received from the system bus. Also, sending the IPI via the Self Interrupt Register ensures that interrupt is delivered to the processor core. Specifically, completion of the WRMSR instruction to the SELF IPI register implies that the interrupt has been logged into the IRR. As expected for edge-triggered interrupts, depending on the processor priority
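The equivalence stated in this excerpt (a SELF IPI write behaves like an ICR write with shorthand Self, edge trigger, fixed delivery, and the given vector) can be checked by composing the ICR fields explicitly. A minimal sketch; the function name is hypothetical and only the low 32 bits of the ICR are modelled.

```python
def self_ipi_as_icr(vector):
    """Compose the low 32 ICR bits equivalent to a SELF IPI write.

    Field positions follow the excerpt: ICR[19:18] destination shorthand,
    ICR[15] trigger mode, ICR[10:8] delivery mode, ICR[7:0] vector.
    """
    if not 0 <= vector <= 0xFF:
        raise ValueError("vector must fit in ICR[7:0]")
    shorthand_self = 0b01 << 18   # Destination Shorthand = 01 (Self)
    trigger_edge = 0 << 15        # Trigger Mode = 0 (Edge)
    delivery_fixed = 0b000 << 8   # Delivery Mode = 000 (Fixed)
    return shorthand_self | trigger_edge | delivery_fixed | vector
```

The edge and fixed fields contribute zero bits, so the result is simply the shorthand field plus the vector; spelling them out keeps the mapping to the listed settings visible.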
218. gisters in x2APIC mode. Table 10-6 lists the APIC registers that are available in x2APIC mode. When appropriate, the table also gives the offset at which each register is available on the page referenced by IA32_APIC_BASE[35:12] in xAPIC mode. There is a one-to-one mapping between the x2APIC MSRs and the legacy xAPIC register offsets with the following exceptions: The Destination Format Register (DFR): The DFR, supported at offset 0E0H in xAPIC mode, is not supported in x2APIC mode. There is no MSR with address 80EH. The Interrupt Command Register (ICR): The two 32-bit registers in xAPIC mode (at offsets 300H and 310H) are merged into a single 64-bit MSR in x2APIC mode (with MSR address 830H). There is no MSR with address 831H. The SELF IPI register: This register is available only in x2APIC mode at address 83FH. In xAPIC mode, there is no register defined at offset 3F0H. Addresses in the range 800H-BFFH that are not listed in Table 10-6 (including 80EH and 831H) are reserved. Executions of RDMSR and WRMSR that attempt to access such addresses cause general-protection exceptions. The MSR address space is compressed to allow for future growth. Every 32-bit register on a 128-bit boundary in the legacy MMIO space is mapped to a single MSR in the local x2APIC MSR address space. The upper 32 bits of all x2API
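The mapping described here can be sketched as a small function. The formula `0x800 + (offset >> 4)` is an inference from the address pairs quoted in these excerpts (for example 170H and 817H, 300H and 830H), not a rule stated in the text; the function name is hypothetical. The three listed exceptions are handled explicitly.

```python
X2APIC_MSR_BASE = 0x800  # start of the x2APIC MSR range named in the excerpt


def xapic_offset_to_msr(offset):
    """Map a legacy xAPIC MMIO register offset to its x2APIC MSR address.

    Assumes one MSR per 16-byte (128-bit) aligned MMIO register.
    """
    if offset % 0x10 != 0 or not 0 <= offset < 0x400:
        raise ValueError("offset must be a 16-byte aligned value below 400H")
    if offset == 0x0E0:
        raise ValueError("DFR is not supported in x2APIC mode (no MSR 80EH)")
    if offset == 0x3F0:
        raise ValueError("no xAPIC register at 3F0H; SELF IPI is x2APIC only")
    if offset == 0x310:
        offset = 0x300  # ICR high half is merged into the 64-bit MSR 830H
    return X2APIC_MSR_BASE + (offset >> 4)
```

This checks only the three exceptions named in the text and does not screen other reserved addresses, so a caller would still consult Table 10-6 before using a returned address.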
219. gs. See Description section. Instruction Operand Encoding: Op/En A: Operands 1 through 4 NA. LAR (Load Access Rights Byte): 0F 02 /r; LAR r16, r16/m16; Op/En A; 64-Bit Mode Valid; Compat/Leg Mode Valid; r16 <- r16/m16 masked by FF00H. 0F 02 /r; LAR r32, r32/m16; Op/En A; 64-Bit Mode Valid; Compat/Leg Mode Valid; r32 <- r32/m16 masked by 00FxFF00H. REX.W + 0F 02 /r; LAR r64, r32/m16; Op/En A; 64-Bit Mode Valid; Compat/Leg Mode N.E.; r64 <- r32/m16 masked by 00FxFF00H and zero extended. NOTES: 1. For all loads (regardless of source or destination sizing) only bits 16-0 are used. Other bits are ignored. Instruction Operand Encoding: Op/En A: Operand 1 ModRM:reg (w), Operand 2 ModRM:r/m (r), Operands 3 and 4 NA. LDDQU (Load Unaligned Integer 128 Bits): F2 0F F0 /r; LDDQU xmm1, mem; Op/En A; 64-Bit Mode Valid; Compat/Leg Mode Valid; Load unaligned data from mem and return double quadword in xmm1. Instruction Operand Encoding: Op/En A: Operand 1 ModRM:reg (w), Operand 2 ModRM:r/m, Operands 3 and 4 NA. LDMXCSR (Load MXCSR Register): 0F AE /2; LDMXCSR m32; Op/En A; 64-Bit Mode Valid; Compat/Leg Mode Valid; Load MXCSR register from m32. Instruction Operand Encoding: Op/En A: Operand 1 ModRM:r/m (r), NA, NA, N
220. PMINSW (Minimum of Packed Signed Word Integers): 0F EA /r; PMINSW mm1, mm2/m64; Op/En A; 64-Bit Mode Valid; Compat/Leg Mode Valid; Compare signed word integers in mm2/m64 and mm1 and return minimum values. 66 0F EA /r; PMINSW xmm1, xmm2/m128; Op/En A; 64-Bit Mode Valid; Compat/Leg Mode Valid; Compare signed word integers in xmm2/m128 and xmm1 and return minimum values. Instruction Operand Encoding: Op/En A: Operand 1 ModRM:reg (r, w), Operand 2 ModRM:r/m (r), Operands 3 and 4 NA. PMINUB (Minimum of Packed Unsigned Byte Integers): 0F DA /r; PMINUB mm1, mm2/m64; Op/En A; 64-Bit Mode Valid; Compat/Leg Mode Valid; Compare unsigned byte integers in mm2/m64 and mm1 and returns minimum values. 66 0F DA /r; PMINUB xmm1, xmm2/m128; Op/En A; 64-Bit Mode Valid; Compat/Leg Mode Valid; Compare unsigned byte integers in xmm2/m128 and xmm1 and returns minimum values. Instruction Operand Encoding: Op/En A: Operand 1 ModRM:reg (r, w), Operand 2 ModRM:r/m (r), Operands 3 and 4 NA. PMINUD (Minimum of Packed Dword Integers): 66 0F 38 3B /r; PMINUD xmm1, xmm2/m128; Op/En A; 64-Bit Mode Valid; Compat/Leg Mode Valid; Compare packed unsigned dword integers in xmm1 and xmm2/m128 and store packed mini
221. PMULHW (Multiply Packed Signed Integers and Store High Result): 0F E5 /r; PMULHW mm, mm/m64; Op/En A; 64-Bit Mode Valid; Compat/Leg Mode Valid; Multiply the packed signed word integers in mm1 register and mm2/m64, and store the high 16 bits of the results in mm1. 66 0F E5 /r; PMULHW xmm1, xmm2/m128; 64-Bit Mode Valid; Compat/Leg Mode Valid; Multiply the packed signed word integers in xmm1 and xmm2/m128, and store the high 16 bits of the results in xmm1. Instruction Operand Encoding: Op/En A: Operand 1 ModRM:reg (r, w), Operand 2 ModRM:r/m (r), Operands 3 and 4 NA. PMULLD (Multiply Packed Signed Dword Integers and Store Low Result): 66 0F 38 40 /r; PMULLD xmm1, xmm2/m128; Op/En A; 64-Bit Mode Valid; Compat/Leg Mode Valid; Multiply the packed dword signed integers in xmm1 and xmm2/m128 and store the low 32 bits of each product in xmm1. Instruction Operand Encoding: Op/En A: Operand 1 ModRM:reg (r, w), Operand 2 ModRM:r/m, Operands 3 and 4 NA. PMULLW (Multiply Packed Signed Integers and Store Low Result): 0F D5 /r; PMULLW mm, mm/m64; Op/En A; 64-Bit Mode Valid; Compat/Leg Mode Valid; Multiply the packed signed word integers in mm1 register and mm2/m64
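The "high 16 bits" and "low 32 bits" wording for PMULHW and PMULLD is easier to follow with the per-lane arithmetic written out. A minimal sketch for one lane at a time, assuming lanes are passed as unsigned Python integers holding the raw bit patterns; the helper names are hypothetical.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def pmulhw_lane(a, b):
    """One PMULHW lane: signed 16x16 multiply, keep bits 31:16 of the product."""
    product = to_signed(a, 16) * to_signed(b, 16)
    return (product >> 16) & 0xFFFF


def pmulld_lane(a, b):
    """One PMULLD lane: signed 32x32 multiply, keep the low 32 bits."""
    product = to_signed(a, 32) * to_signed(b, 32)
    return product & 0xFFFFFFFF
```

Python's `>>` on a negative integer is an arithmetic shift, so the high half of a negative product comes out with the expected sign bits.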
222. hes and to perform I/O. UOPS_EXECUTED.CORE_ACTIVE_CYCLES_NO_PORT5: Counts cycles when the Uops executed were issued from any ports except port 5. Use Cmask=1 for active cycles; Cmask=0 for weighted cycles; use CMask=1, Invert=1 to count P0-4 stalled cycles; use Cmask=1, Edge=1, Invert=1 to count P0-4 stalls. UOPS_EXECUTED.CORE_ACTIVE_CYCLES: Counts cycles when the Uops are executing. Use Cmask=1 for active cycles; Cmask=0 for weighted cycles; use CMask=1, Invert=1 to count P0-4 stalled cycles; use Cmask=1, Edge=1, Invert=1 to count P0-4 stalls. B7H, 01H: OFF_CORE_RESPONSE_0; see Section 30.6.1.3, "Off-core Response Performance Monitoring in the Processor Core"; requires programming MSR 01A6H. BBH, 01H: OFF_CORE_RESPONSE_1; see Section 30.6.1.3, "Off-core Response Performance Monitoring in the Processor Core"; requires programming MSR 01A7H. Non-architectural performance monitoring events that are located in the uncore sub-system are implementation specific between different platforms using processors based on Intel microarchitecture Nehalem. Processors with CPUID signature of DisplayFamily_DisplayModel 06_1AH, 06_1EH, and 06_1FH support performance events listed in Table A-3. Table A-3. Non-Architectural Performance Events In the Processor Unco
223. hold value occurred in a machine check bank supporting CMCI (see Section 15.5.1, "CMCI Local APIC Interface"). The LVT performance counter register and its associated interrupt were introduced in the P6 processors and are also present in the Pentium 4 and Intel Xeon processors. The LVT thermal monitor register and its associated interrupt were introduced in the Pentium 4 and Intel Xeon processors. As shown in Figure 10-8, some of these fields and flags are not available (and reserved) for some entries. Figure 10-8 (layout of the LVT registers; diagram not reproduced): Timer Mode: 0 = One-shot, 1 = Periodic. Delivery Status: 0 = Idle, 1 = Send Pending. Mask: 0 = Not Masked, 1 = Masked. Delivery Mode: 000 = Fixed, 010 = SMI, 100 = NMI, 101 = INIT, 111 = ExtINT; all other combinations are Reserved. Trigger Mode: 0 = Edge, 1 = Level. Other fields shown: Interrupt Input Pin Polarity, Remote IRR, Vector (bits 7:0). Registers shown: Timer, LINT0 (address FEE0 0350H), LINT1 (address FEE0 0360H), Error (address FEE0 0370H), Performance Mon. Counters (address FEE0 0340H), Thermal Sensor. Value after reset: 0001 0000H. Note: Pentium 4 and Intel Xeon processors. When a performance monitoring counters interrupt is generated
224. id Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Compat Leg Mode Valid Valid N E Valid N E Valid Valid N E Valid N E Valid Valid N E Valid N E Valid Valid N E Description Move word at seg offset to AX Move doubleword at seg offset to EAX Move quadword at offset to RAX Move AL to seg offset Move AL to offset Move AX to seg offset Move EAX to seg offset Move RAX to offset Move imm8 to r8 Move imme to r8 Move imm16 to r16 Move imm32 to r32 Move imm64 to r64 Move imm8 to r m8 Move imme to r m8 Move imm16 to r m16 Move imm32 to r m32 Move imm32 sign extended to 64 bits to r m64 NOTES The moffs8 moffs16 moffs32 and moffs64 operands specify a simple offset relative to the segment base where 8 16 32 and 64 refer to the size of the data The address size attribute of the instruction determines the size of the offset either 16 32 or 64 bits n 32 bit mode the assembler may insert the 16 bit operand size prefix with this instruction see the following Description section for further information In 64 bit mode r m8 can not be encoded to access the follow ing byte registers if a REX prefix is used AH BH CH DH Intel 64 and 1 32 Architectures Software Developer s Manual Documentation Changes 89 Documentation Changes Instruction Operand Enco
225. id Valid Store selected bit in CF flag and clear REX W 0F B3 BTR r m64 r64 A Valid Store selected bit in CF flag and clear OF BA 6 ib BTR r m16 imm8 Valid Valid Store selected bit in CF flag and clear OF BA 6 ib r m32 imm8 Valid Valid Store selected bit in CF flag and clear REX W r m64 imm8 Valid N E Store selected bit in CF flag 6 ib and clear Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes 24 H intel Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM r m r w ModRM reg r NA NA B ModRM r m w imm8 NA NA BTS Bit Test and Set Opcode Instruction Op 64 bit Compat Description En Mode Leg Mode OF AB BTS r m16 r16 A Valid Valid Store selected bit in CF flag and set OF AB BTS r m32 r32 A Valid Valid Store selected bit in CF flag and set REX W 0FAB BTSr m64 r64 A Valid N E Store selected bit in CF flag and set OF 5 ib BTSr m16 imm8 Valid Valid Store selected bit in CF flag and set OF BA 5 ib BTS r m32 imm8 Valid Valid Store selected bit in CF flag and set REX W O0F BA BTSr m64 imm8 Valid N E Store selected bit in CF flag 5 ib and set Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM r m r w ModRM reg r NA NA B ModRM r m w imm8 NA NA CALL Call Procedure
226. ign 5 1 Ir OF 44 r CMOVZr16 r m16 A Valid Valid Move if zero ZF 21 OF 44 r CMOVZ r32 r m32 A Valid Valid Move if zero ZF 21 REX W 0F44 CMOVZr64 r m64 Valid Move if zero ZF 21 Ir Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg w ModRM r m 5 Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode 3Cib CMP AL imm8 D Valid Valid Compare imm8 with AL 3Diw CMP AX imm16 D Valid Valid Compare imm16 with AX 3D id CMP EAX imm32 D Valid Valid Compare imm32 with EAX REXW 3Did RAX imm32 D Valid N E Compare imm32 sign extended to 64 bits with RAX 80 7 ib r m8 imm8 Valid Valid Compare imm8 with r m8 REX 480 7ib CMPr m8 imm8 Valid N E Compare imm8 with r m8 81 7 iw CMP r m16 C Valid Valid Compare imm16 with r m16 imm16 Intel 64 and 1 32 Architectures Software Developer s Manual Documentation Changes 33 Documentation Changes Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode 81 7 id CMP r m32 C Valid Valid Compare imm32 with imm32 r m32 REX W 81 7 r m64 C Valid N E Compare imm32 sign id imm32 extended to 64 bits with r m64 83 7 ib CMP r m16 imm8 Valid Valid Compare imm8 with r m16 83 7 ib r m32 imm8 Valid Valid Compare imm8 with r m32 REXW 83 7 CMPr m64 imm8 Valid Compar
227. in memory before the other logical processor executes VMPTRLD for the VMCS (to make it active on the second logical processor). A VMCS that is made active on more than one logical processor may become corrupted (see below). Software should use the VMREAD and VMWRITE instructions to access the different fields in the current VMCS (see Section 21.10.2). Software should never access or modify the VMCS data of an active VMCS using ordinary memory operations, in part because the format used to store the VMCS data is implementation-specific and not architecturally defined, and also because a logical processor may maintain some VMCS data of an active VMCS on the processor and not in the VMCS region. The following items detail some of the hazards of accessing VMCS data using ordinary memory operations: Any data read from a VMCS with an ordinary memory read does not reliably reflect the state of the VMCS; results may vary from time to time or from logical processor to logical processor. Writing to a VMCS with an ordinary memory write is not guaranteed to have a deterministic effect on the VMCS; doing so may cause the VMCS to become corrupted (see below). Software can avoid these hazards by removing any linear-address mappings to a VMCS region before executing a VMPTRLD for that region and by not remapping it until a
228. inear addresses with each of those page numbers; alternatively, it could use MOV to CR3 or MOV to CR4. If software modifies a paging-structure entry that references another paging structure, it may use one of the following approaches, depending upon the types and number of translations controlled by the modified entry: Execute INVLPG for linear addresses with each of the page numbers with translations that would use the entry. However, if no page numbers that would use the entry have translations (e.g., because the P flags are 0 in all entries in the paging structure referenced by the modified entry), it remains necessary to execute INVLPG at least once. Execute MOV to CR3 if the modified entry controls no global pages. Execute MOV to CR4 to modify CR4.PGE. If software using PAE paging modifies a PDPTE, it should reload CR3 with the register's current value to ensure that the modified PDPTE is loaded into the corresponding PDPTE register (see Section 4.4.1). With PAE paging, the PDPTEs are stored in internal, non-architectural registers. The operation of these registers is described in Section 4.4.1 and differs from that described here. One execution of INVLPG is sufficient even for a page with size greater than 4 KBytes. If the nature of the paging structures is such that a single entry may be
229. ing a location used to implement a semaphore. The integrity of a bus lock is not affected by the alignment of the memory field. The LOCK semantics are followed for as many bus cycles as necessary to update the entire operand. However, it is recommended that locked accesses be aligned on their natural boundaries for better system performance: any boundary for an 8-bit access (locked or otherwise); a 16-bit boundary for locked word accesses; a 32-bit boundary for locked doubleword accesses; a 64-bit boundary for locked quadword accesses. Locked operations are atomic with respect to all other memory operations and all externally visible events. Only instruction fetch and page table accesses can pass locked instructions. Locked instructions can be used to synchronize data written by one processor and read by another processor. For the P6 family processors, locked operations serialize all outstanding load and store operations (that is, wait for them to complete). This rule is also true for the Pentium 4 and Intel Xeon processors, with one exception: load operations that reference weakly ordered memory types (such as the WC memory type) may not be serialized. Locked instructions should not be used to ensure that data written can be fetched as instructions. 8.1.3 Handling Self- and Cross-Modifying Code. The act of a processor writing data into a currently executing code segment with the intent of executing that data as code is called self mo
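The alignment recommendation for locked accesses reduces to a divisibility check on the operand address. A minimal sketch follows, assuming operand sizes of 1, 2, 4, or 8 bytes; the function name `is_naturally_aligned` is hypothetical and chosen only for illustration.

```python
def is_naturally_aligned(address, size_bytes):
    """Return True if an operand of size_bytes at address sits on its natural boundary.

    Covers the four cases listed in the text: byte (any boundary), word (16-bit),
    doubleword (32-bit), and quadword (64-bit) accesses.
    """
    if size_bytes not in (1, 2, 4, 8):
        raise ValueError("operand size must be 1, 2, 4, or 8 bytes")
    return address % size_bytes == 0
```

For example, `is_naturally_aligned(0x1004, 8)` is false, so a locked quadword access at that address would still be atomic but may cost extra bus cycles.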
230. ing byte registers if a REX prefix is used AH BH CH DH Instruction Operand Encoding Op En A Operand 1 ModRM r m w Operand 2 NA Operand 3 Operand 4 NA NA Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes Documentation Changes OR Logical Inclusive OR intel Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode 0C ib OR AL imm8 A Valid Valid AL OR imm8 0D iw OR AX imm16 A Valid Valid AX OR imm16 0D id OR EAX imm32 A Valid Valid EAX OR imm32 REX W 0Did OR RAX imm32 A Valid N E RAX OR imm32 sign extended 80 1 ib OR r m8 imm8 B Valid Valid r m8 OR imm8 REX 80 lib 8 imm8 B Valid N E r m8 OR imm8 81 1iw OR r m16 imm16 Valid Valid r m16 OR imm16 81 1 id OR r m32 imm32 B Valid Valid r m32 OR imm32 REXW 81 1 ORr m64 imm32 Valid N E r m64 OR imm32 sign id extended 83 1 ib OR r m16 imm8 Valid Valid r m16 OR imm8 sign extended 83 1 ib OR r m32 imm8 Valid Valid r m32 OR imma sign extended REXW 83 1 ORr m64 imm8 Valid N E r m64 OR imm8 sign ib extended 08 r OR r m8 r8 C Valid Valid r m8 OR r8 REX 08 r OR r m8 r8 C Valid N E r m8 OR r8 09 r OR r m16 r16 C Valid Valid r m16 OR r16 09 r OR r m32 r32 C Valid Valid r m32 OR r32 REXW 09 r OR r m64 r64 C Valid N E r m64 OR r64 r OR r8 r m8 D Valid Valid r8 OR r m8
231. Intel 64 and IA-32 Architectures Software Developer's Manual: Documentation Changes. December 2009. Notice: The Intel 64 and IA-32 architectures may contain design defects or errors known as errata that may cause the product to deviate from published specifications. Current characterized errata are documented in the specification updates. Document Number: 252046-026. INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY RELATING TO SALE AND/OR USE OF INTEL PRODUCTS, INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT, OR OTHER INTELLECTUAL PROPERTY RIGHT. Intel products are not intended for use in medical, life-saving, or life-sustaining applications. Intel may make changes to specifications and product descriptions at any time, without notice. 64-bit computing on Intel architecture requires a computer system with a processor, chipset, BIOS, operating system, device drivers, and applications enabled for Intel 64 architecture. Performance will vary depending on your hardware and software configurations. Consult with your system vendor f
232. interrupts received or number of cache loads. Appendix A, "Performance Monitoring Events," in the Intel 64 and IA-32 Architectures Software Developer's Manual lists the events that can be counted for various processors in the Intel 64 and IA-32 architecture families. The RDPMC instruction is not a serializing instruction; that is, it does not imply that all the events caused by the preceding instructions have been completed or that events caused by subsequent instructions have not begun. If an exact event count is desired, software must insert a serializing instruction (such as the CPUID instruction) before and/or after the RDPMC instruction. On the Pentium 4 and Intel Xeon processors, back-to-back fast reads are not guaranteed to be monotonic. To guarantee monotonicity on back-to-back reads, a serializing instruction must be placed between the two RDPMC instructions. The RDPMC instruction can execute in 16-bit addressing mode or virtual-8086 mode; however, the full contents of the ECX register are used to select the counter, and the event count is stored in the full EAX and EDX registers. The RDPMC instruction was introduced into the IA-32 Architecture in the Pentium Pro processor and the Pentium processor with MMX technology. The earlier Pentium processors have performance-monitoring counters, but they must be read with the RDMSR instruction. Operation: Intel Core i7 processor family and Intel Xeo
233. ion 15 322 1 32_ MCi STATUS STATUS MSRS and Appendix E 43AH 1082 MSR 14 ADDR Package See Section 15 3 2 3 IA32 MCi ADDR MSRs 43BH 1083 MSR MC14 MISC Package See Section 15 3 24 1 32 MCi MISC MSRs Intel 64 and 1 32 Architectures Software Developer s Manual Documentation Changes 289 Documentation Changes intel Register Scope Address Register Name Bit Description Hex Dec 43 1084 MSR 15 Package See Section 15 3 2 1 4 32 MCi MSRs 43DH 1085 MSR_MC15_ Package See Section 15 3 22 A32 MCi STATUS STATUS MSRS and Appendix E 43EH 1086 MSR 15 ADDR Package See Section 15 3 2 3 A32 MCi ADDR MSRs 43FH 1087 MSR 15 MISC Package See Section 15 3 24 32 MCi MISC MSRs 440H 1088 16 Package See Section 15 3 2 1 4 32 MCi MSRs 441H 1089 MSR_MC16_ Package See Section 15 3 2 2 4 32 MCi STATUS STATUS MSRS and Appendix E 442H 1090 16 ADDR Package See Section 15 3 2 3 A32 MCi ADDR MSRs 443H 1091 MSR MC16 MISC Package See Section 15 3 24 1 32 MCi MISC MSRs 444H 1092 MC17 Package See Section 15 3 2 1 IA32_MCi_CTL MSRs 445H 1093 17 Package See Section 15 3 2 2 4 32 MCi STATUS STATUS MSRS and Appendix E 446H 1094 17 AD
234. ion Op 64 bit Compat Description En Mode Leg Mode 66 0138 30 r PMOVZXBW A Valid Valid Zero extend 8 packed 8 bit xmm1 integers in the low 8 bytes xmm2 m64 of xmm2 m64 to 8 packed 16 bit integers in xmm1 66 0 38 31 PMOVZXBD Valid Valid Zero extend 4 packed 8 bit 1 integers in the low 4 bytes xmm2 m32 of xmm2 m32 to 4 packed 32 bit integers in xmm1 66 0f 3832 r PMOVZXBQ A Valid Valid Zero extend 2 packed 8 bit 1 integers in the low 2 bytes xmm2 m16 of xmm2 m16 to 2 packed 64 bit integers xmm1 66 Of 38 33 r PMOVZXWD A Valid Valid Zero extend 4 packed 16 bit 1 integers in the low 8 bytes xmm2 m64 of xmm2 m64 to 4 packed 32 bit integers xmm1 66 0 38 34 PMOVZXWQ Valid Valid Zero extend 2 packed 16 bit 1 integers in the low 4 bytes xmm2 m32 of xmm2 m32 to 2 packed 64 bit integers in xmm1 66 0f 3835 r PMOVZXDQ A Valid Valid Zero extend 2 packed 32 bit 1 integers in the low 8 bytes xmm2 m64 of xmm2 m64 to 2 packed 64 bit integers in xmm1 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg w ModRM r m NA NA PMULDQ Multiply Packed Signed Dword Integers Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode 660F3828 r PMULDQxmml A Valid Valid Multiply the packed signed xmm2 m128 dword integers in xmm1 and xmm2 m128 and store the quadw ord product in xmm1 Instruction Operand Encoding
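The PMOVZX rows above all follow one pattern: take a fixed number of narrow elements from the low bytes of the source and widen each one with zero fill. The sketch below models that pattern on little-endian byte strings. It is an illustration only; the function name `pmovzx` and its parameters are hypothetical and do not correspond to any intrinsic or library API.

```python
def pmovzx(src, count, src_bits, dst_bits):
    """Model zero extension of `count` packed elements from src_bits to dst_bits.

    src is the source operand as little-endian bytes; only its low
    count * src_bits / 8 bytes are read. Returns the 16-byte destination image.
    """
    src_step = src_bits // 8
    dst_step = dst_bits // 8
    if len(src) < count * src_step:
        raise ValueError("source operand is too short")
    out = bytearray()
    for i in range(count):
        element = int.from_bytes(src[i * src_step:(i + 1) * src_step], "little")
        out += element.to_bytes(dst_step, "little")  # upper bytes are zero filled
    return bytes(out)


# PMOVZXBW: 8 packed bytes widened to 8 packed words.
widened = pmovzx(bytes([0xFF, 0x80, 1, 2, 3, 4, 5, 6]), 8, 8, 16)
```

Under this model PMOVZXBW is `pmovzx(src, 8, 8, 16)`, PMOVZXWD is `pmovzx(src, 4, 16, 32)`, and PMOVZXDQ is `pmovzx(src, 2, 32, 64)`; each yields a 16-byte result, and a source byte of 0xFF becomes the word 0x00FF rather than 0xFFFF.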
235. king is observed for arbitrarily misaligned fields. This instruction's operation is the same in non-64-bit modes and 64-bit mode. IA-32 Architecture Compatibility: Beginning with the P6 family processors, when the LOCK prefix is prefixed to an instruction and the memory area being accessed is cached internally in the processor, the LOCK signal is generally not asserted. Instead, only the processor's cache is locked. Here, the processor's cache coherency mechanism ensures that the operation is carried out atomically with regard to memory. See "Effects of a Locked Operation on Internal Processor Caches" in Chapter 8 of the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3A, for more information on locking of caches. LODS/LODSB/LODSW/LODSD/LODSQ: Load String. Columns: Opcode; Instruction; Op/En; 64-Bit Mode; Compat/Leg Mode; Description. AC; LODS m8; A; Valid; Valid; for legacy mode, load byte at address DS:(E)SI into AL, and for 64-bit mode, load byte at address (R)SI into AL. AD; LODS m16; A; Valid; Valid; for legacy mode, load word at address DS:(E)SI into AX, and for 64-bit mode, load word at address (R)SI into AX. AD; LODS m32; A; Valid; Valid; for legacy mode, load dword at address DS:(E)SI into EAX, and for 64-bit mode, load dword at address (R)SI into EAX. REX.W + AD; LODS m64; A; Valid; N.E.; load qword at address (R)SI into
236. leaf. See Chapter 30 of the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3B. This counter type is selected if ECX[30] is set. ECX[29:0] specifies the index. The width of general-purpose performance counters is 40 bits for processors that do not support architectural performance monitoring counters. The width of special-purpose performance counters is implementation-specific. The width of fixed-function performance counters and general-purpose performance counters on processors supporting architectural performance monitoring is reported by the CPUID.0AH leaf. Table 4-2 lists valid indices of the general-purpose and special-purpose performance counters according to the derived displayed_family/displayed_model values of the CPUID encoding for each processor family. Table 4-2. Valid General and Special Purpose Performance Counter Index Range for RDPMC. Columns: Processor Family; Displayed_Family_Displayed_Model / Other Signatures; Valid PMC Index Range; General-purpose Counters. P6 06H 01H 06H 03H 0 1 0 1 06H 05H 06H 06H 06H 07H 06H 08H 06H 0AH 06H 0 Pentium 4 Intel Xeon OFH 00 0 01H gt 0 lt 17 gt 0 and lt 17 0 OFH_02H Pentium 4 Intel Xeon processors OFH_03H 0FH_04H gt 0and lt 17 gt 0and lt 17 OFH_06H and L3 is absent Tabl
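The counter-selection rule stated above (ECX[30] selects the counter type, ECX[29:0] is the index) can be expressed as a small decoder. This is a sketch under that stated rule only; the function name `decode_rdpmc_selector` is hypothetical, and bit 31 is ignored here because the excerpt does not describe it.

```python
def decode_rdpmc_selector(ecx):
    """Split an RDPMC ECX value into counter type and index.

    ECX[30] = 1 selects the counter type described in the text (fixed-function),
    ECX[29:0] gives the counter index within that type.
    """
    ecx &= 0xFFFFFFFF
    return {
        "fixed_function": bool((ecx >> 30) & 1),
        "index": ecx & 0x3FFFFFFF,
    }
```

With this helper, `decode_rdpmc_selector(0x40000001)` names fixed-function counter 1, while `decode_rdpmc_selector(1)` names general-purpose counter 1. Whether a given index is valid still depends on the processor family, as the table indicates.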
237. leave low order doublewords from xmm1 and xmm2 m128 into xmml Interleave low order quadw ord from xmm1 and xmm2 m128 into xmm1 register Instruction Operand Encoding Op En Operand 1 A ModRM reg w 2 ModRM r m 3 Operand 4 NA NA Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes 157 Documentation Changes intel PUSH Push Word Doubleword or Quadword Onto the Stack Opcode FF 6 FF 6 FF 6 50 4rw 50 50 6A 68 68 0E 16 1E 06 OF A0 OF AO OF AO OF A8 OF A8 OF A8 Instruction PUSH r m16 PUSH r m32 PUSH r m64 PUSH r16 PUSH r32 PUSH r64 PUSH imm8 PUSH imm16 PUSH imm32 PUSH CS PUSH SS PUSH DS PUSH ES PUSH FS PUSH FS PUSH FS PUSH GS PUSH GS PUSH GS Op En A A A 0 0 UD 64 Bit Mode Valid N E Valid Valid N E Valid Valid Valid Valid Invalid Invalid Invalid Invalid Valid N E Valid Valid N E Valid Compat Leg Mode Valid Valid N E Valid Valid N E Valid Valid Valid Valid Valid Valid Valid Valid Valid N E Valid Valid N E Description Push r m16 Push r m32 Push r m64 Default operand size 64 bits Push r16 Push r32 Push r64 Default operand size 64 bits Push sign extended imm8 Stack pointer is incremented by the size
238. lected by imm8 from xmm1 and xmm1 m128 to xmm1 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg r w ModRM r m r imm8 NA SIDT Store Interrupt Descriptor Table Register Opcode Instruction 64 Bit Compat Description En Mode Leg Mode OF 01 1 SIDT m A Valid Valid Store IDTR to m Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM r m NA NA NA Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes 186 chenes intel SLDT Store Local Descriptor Table Register Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode OF 00 0 SLDT r m16 A Valid Valid Stores segment selector from LDTR in r m16 REX W 0F00 SLDT r64 m16 A Valid Valid Stores segment selector 0 from LDTR in r64 m16 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM r m NA NA NA SMSW Store Machine Status Word Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode OF 01 4 SMSW r m16 A Valid Valid Store machine status word to r m16 OF 01 4 SMSW r32 m16 A Valid Valid Store machine status word in low order 16 bits of r32 m16 high order 16 bits of r32 are undefined REX W 0F01 SMSW r64 m16 A Valid Valid Store machine status word 14 in low order 16 bits of r64 m16 high order 16 bits of r32 are un
239. less or equal ZF 0 and SF OF Move if not less or equal ZF 0 and SF OF Move if not less or equal ZF 0 and SF OF Move if not overflow OF 0 Move if not overflow OF 0 Move if not overflow OF 0 Move if not parity 0 Move if not parity 0 Move if not parity 0 Move if not sign SF 0 Move if not sign SF 0 Move if not sign SF 0 Move if not zero ZF 0 Move if not zero ZF 0 Move if not zero ZF 0 Move if overflow 0 Move if overflow 0 Move if overflow 0 32 e Documentation Changes n tel Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode OF 4A r CMOVP r16 r m16 A Valid Valid Move if parity 1 OF 4A r CMOVP r32 r m32 A Valid Valid Move if parity PF 1 REXW 0F 4A CMOVP r64 r m64 A Valid Move if parity 1 OF 4A r CMOVPEr16 r m16 Valid Valid Move if parity even PF 1 OF 4A r CMOVPEr32 r m32 A Valid Valid Move if parity even PF 1 REXW 0F 4A CMOVPEr64 r m64 Valid Move if parity even 1 Ir OF 4B r CMOVPOri16 r m16 Valid Valid Move if parity odd PF 0 OF 4B r CMOVPO r32 r m32 Valid Valid Move if parity odd PF 0 REXW 0F4B CMOVPOr64 r m64 Valid N E Move if parity odd PF 0 OF 48 r CMOVSr16 r ml6 Valid Valid Move if sign SF 1 OF 48 r CMOVSr32 r m32 Valid Valid Move if sign SF 1 REXW 0F 48 CMOVSr64 r m64 Valid Move if s
240. lid Valid Valid Valid Valid Valid Compat Leg Mode Valid Valid Valid Valid Valid Valid Description Output byte from memory location specified in DS E SI or RSI to 1 0 port specified in DX Output word from memory location specified in DS E SI or RSI to 1 0 port specified in DX Output doubleword from memory location specified in DS E SI or RSI to I O port specified in DX Output byte from memory location specified in DS E SI or RSI to 1 0 port specified in DX Output word from memory location specified in DS E SI or RSI to 1 0 port specified in DX Output doubleword from memory location specified in DS E SI or RSI to 1 0 port specified in DX NOTES See IA 32 Architecture Compatibility section below n 64 bit mode only 64 bit RSI and 32 bit ESI address sizes are supported In non 64 bit mode only 32 bit ESI and 16 bit SI address sizes are supported Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes 114 H intel Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A NA NA NA NA IA 32 Architecture Compatibility After executing an OUTS OUTSB OUTSW or OUTSD instruction the Pentium processor ensures that the EWBE pin has been sampled active before it begins to execute the next instruction Note that the instruction can be prefetched if EWBE is
241. lid Valid Valid Valid Valid Valid Valid Valid Compat Leg Mode Valid Valid Valid Valid Valid Valid Valid N E Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Intel 64 and IA 32 Architectures Software Developer s Manual Documentation Changes Description Jump short if above CF 0 and ZF 0 Jump short if above or equal CF 0 Jump short if below CF 1 Jump short if below or equal CF 1 or ZF 1 Jump short if carry CF 1 Jump short if CX register is 0 Jump short if ECX register is 0 Jump short if RCX register is 0 Jump short if equal ZF 1 Jump short if greater ZF 0 and SF OF Jump short if greater or equal SF 0F Jump short if less SF OF Jump short if less or equal ZF 1 or SFz OF Jump short if not above CF 1 or ZF 21 Jump short if not above or equal CF 1 Jump short if not below CF 0 Jump short if not below or equal CF 0 and ZF 0 Jump short if not carry CF 0 Jump short if not equal 2 0 Jump short if not greater ZF 1 or SFz OF Jump short if not greater or equal SFz OF Jump short if not less SF OF Jump short if not less or equal ZF 0 and SF OF 69 Documentation Changes Opcode 71 cb 7B cb 79 cb 75 cb 70 cb 7A cb cb 7B cb 78 cb 74 cb OF 87 cw OF 87 cd OF 83 cw OF 83 cd OF 82 cw OF 82
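Each Jcc description above pairs a mnemonic with a predicate over the CF, ZF, SF, and OF flags. Extraction dropped the relational signs between SF and OF, so the sketch below restates a subset of the predicates from the architectural definitions (signed comparisons test whether SF equals OF). The names `CONDITIONS` and `jump_taken` are hypothetical and serve only this illustration.

```python
# Flag predicates for a subset of the conditional jumps listed above.
# Each predicate takes a dict with integer entries "CF", "ZF", "SF", "OF".
CONDITIONS = {
    "JA":  lambda f: f["CF"] == 0 and f["ZF"] == 0,        # above (unsigned >)
    "JAE": lambda f: f["CF"] == 0,                         # above or equal
    "JB":  lambda f: f["CF"] == 1,                         # below (unsigned <)
    "JBE": lambda f: f["CF"] == 1 or f["ZF"] == 1,         # below or equal
    "JE":  lambda f: f["ZF"] == 1,                         # equal
    "JNE": lambda f: f["ZF"] == 0,                         # not equal
    "JG":  lambda f: f["ZF"] == 0 and f["SF"] == f["OF"],  # greater (signed >)
    "JGE": lambda f: f["SF"] == f["OF"],                   # greater or equal
    "JL":  lambda f: f["SF"] != f["OF"],                   # less (signed <)
    "JLE": lambda f: f["ZF"] == 1 or f["SF"] != f["OF"],   # less or equal
}


def jump_taken(mnemonic, flags):
    """Return True if the named conditional jump would be taken for these flags."""
    return CONDITIONS[mnemonic](flags)
```

Synonym mnemonics in the table (for example JNBE for JA, or JNGE for JL) share the same predicate, which is why several rows carry identical opcodes.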
242. lid Convert one double xmm2 m64 precision floating point value in xmm2 m64 to one single precision floating point value in xmm1 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg w ModRM r m NA NA CVTSI2SD Convert Dword Integer to Scalar Double Precision FP Value Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode F2 OF 2A r CVTSI2SDxmm A Valid Valid Convert one signed r m32 doublew ord integer from r m32 to one double precision floating point value in xmm F2REXWOF2A CVTSI2SDxmm A Valid N E Convert one signed Ir r m64 quadword integer from r m64 to one double precision floating point value in xmm Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg w ModRM r m NA NA Intel 64 and 1 32 Architectures Software Developer s Manual Documentation Changes 48 chenes intel CVTSI2SS Convert Dword Integer to Scalar Single Precision FP Value Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode 2A Jr CVTSI2SSxmm A Valid Valid Convert one signed r m32 doubleword integer from r m32 to one single precision floating point value in xmm F3REXWOF2A CVTSI2S5S xmm A Valid N E Convert one signed Ir r m64 quadword integer from r m64 to one single precision floating point value in xmm Instruction Operand Encoding Op En Operand 1 Opera
243. limit selector r32 m16 REXW 0F03 LSLr64 r32 m16 A Valid Valid Load r64 lt segment limit Ir selector r32 m16 NOTES For all loads regardless of destination sizing only bits 16 0 are used Other bits are ignored Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg w ModRM r m NA NA Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes 82 chenes intel LTR Load Task Register Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode OF 00 3 LTR r m16 A Valid Valid Load r m16 into task register Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM r m r NA NA NA MASKMOVDQU Store Selected Bytes of Double Quadword Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode 66 OF F7 r MASKMOVDQU A Valid Valid Selectively write bytes from xmm1 xmm2 xmm1 to memory location using the byte mask in 2 The default memory location is specified by DS EDI Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM ModRM r m NA Description Stores selected bytes from the source operand first operand into an 128 bit memory location The mask operand second operand selects which bytes from the source operand are written to memory The source and mask operands are XMM r
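The MASKMOVDQU description above amounts to a byte-granular conditional store. The sketch below models it over a `bytearray` standing in for memory. It assumes, per the full instruction definition rather than this excerpt, that the most significant bit of each mask byte decides whether the corresponding source byte is written; the function name and parameters are hypothetical.

```python
def maskmovdqu(memory, address, source, mask):
    """Model a byte-masked 128-bit store into `memory` starting at `address`.

    source and mask are 16-byte operands. A source byte is written only when
    bit 7 of the matching mask byte is set (an assumption noted in the lead-in).
    """
    if len(source) != 16 or len(mask) != 16:
        raise ValueError("source and mask must each be 16 bytes")
    for i in range(16):
        if mask[i] & 0x80:
            memory[address + i] = source[i]


ram = bytearray(16)
maskmovdqu(ram, 0, bytes(range(1, 17)), bytes([0x80, 0x00, 0xFF, 0x7F] + [0] * 12))
```

After the call, only bytes 0 and 2 of `ram` have changed: the mask bytes 0x00 and 0x7F leave their destinations untouched because bit 7 is clear.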
244. llows software to configure paging-structure entries in which bits 2:0 have value 100b (indicating an execute-only translation). Bit 6 indicates support for a page-walk length of 4. If bit 8 is read as 1, the logical processor allows software to configure the EPT paging-structure memory type to be uncacheable (UC); see Section 21.6.11. If bit 14 is read as 1, the logical processor allows software to configure the EPT paging-structure memory type to be write-back (WB). If bit 16 is read as 1, the logical processor allows software to configure an EPT PDE to map a 2-Mbyte page (by setting bit 7 in the EPT PDE). If bit 17 is read as 1, the logical processor allows software to configure an EPT PDPTE to map a 1-Gbyte page (by setting bit 7 in the EPT PDPTE). Support for the INVEPT instruction (see Chapter 6 of the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3A, and Section 25.3.3.1): If bit 20 is read as 1, the INVEPT instruction is supported. If bit 25 is read as 1, the single-context INVEPT type is supported. If bit 26 is read as 1, the all-context INVEPT type is supported.
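The capability bits listed above lend themselves to a table-driven decoder. The following sketch covers only the bit positions that appear in this excerpt; the names `EPT_CAPABILITY_BITS` and `decode_ept_capabilities` are hypothetical, and reading the underlying capability value from the processor is outside the sketch.

```python
# Bit positions and meanings exactly as listed in the excerpt above.
EPT_CAPABILITY_BITS = {
    6: "page-walk length of 4",
    8: "EPT paging-structure memory type UC",
    14: "EPT paging-structure memory type WB",
    16: "2-Mbyte pages via EPT PDE",
    17: "1-Gbyte pages via EPT PDPTE",
    20: "INVEPT instruction",
    25: "single-context INVEPT type",
    26: "all-context INVEPT type",
}


def decode_ept_capabilities(value):
    """Return the supported features, in ascending bit order, for a capability value."""
    return [name for bit, name in sorted(EPT_CAPABILITY_BITS.items())
            if (value >> bit) & 1]
```

A value with bits 6, 14, and 20 set would decode to a 4-level page walk, write-back paging structures, and INVEPT support, with neither large-page option available.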
245. low OF 1 REX 0F 90 SETO r m8 A Valid N E Set byte if overflow OF 1 OF 9A SETP r m8 A Valid Valid Set byte if parity PF 1 REX 0F 9A SETP r m8 A Valid N E Set byte if parity PF 1 OF 9A SETPE r m8 A Valid Valid Set byte if parity even 1 REX 0F 9A SETPE r m8 A Valid N E Set byte if parity even PF 1 OF 9B SETPO r m8 A Valid Valid Set byte if parity odd PF 0 REX 0F 9B SETPO r m8 A Valid N E Set byte if parity odd PF 0 OF 98 SETS r m8 A Valid Valid Set byte if sign SF 1 REX 0F 98 SETS r m8 A Valid N E Set byte if sign SF 1 OF 94 SETZ r m8 A Valid Valid Set byte if zero ZF 1 REX 0F 94 SETZ r m8 A Valid N E Set byte if zero ZF 1 NOTES n 64 bit mode r m8 can not be encoded to access the following byte registers if a REX prefix is used AH BH CH DH Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM r m r NA NA NA SFENCE Store Fence Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode OF AE 7 SFENCE A Valid Valid Serializes store operations Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A NA NA NA NA Intel 64 and 1 32 Architectures Software Developer s Manual Documentation Changes 183 ee ehanas intel Description Performs a serializing operation on all store to memory instructions that were issued prior the SFENCE instruction This serializing operation guarantees
246. mum values in xmm1 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg r w ModRM r m r NA NA Intel 64 and IA 32 Architectures Software Developer s Manual Documentation Changes 136 Documentation Changes PMINUW Minimum of Packed Word Integers Opcode Instruction Op 64 Bit Compat Description En Leg Mode 66 38 PMINUW xmm1 A Valid Valid Compare packed unsigned xmm2 m128 word integers in xmm1 and xmm2 m128 and store packed minimum values in 1 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg r w ModRM r m r NA NA PMOVMSKB Move Byte Mask Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode OF D7 r PMOVMSKBr32 A Valid Valid Move a byte mask of mm to mm r32 REX W 0FD7 PMOVMSKBr64 A Valid N E Move a byte mask of mm to Ir mm the lower 32 bits of r64 and zero fill the upper 32 bits 66 OF D7 r PMOVMSKB reg Valid Valid Move a byte mask of xmm xmm to reg The upper bits of r32 or r64 are zeroed Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg w ModRM reg NA NA Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes 137 Documentation Changes PMOVSX Packed Move with Sign Extend Opcode 66 Of 38 20 r 66 0f 38 21 r 66 Of 38 22 r 66 Of 38 23
247. n Changes 45 H intel CVTPI2PS Convert Packed Dword Integers to Packed Single Precision FP Values Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode OF 2A r 2 5 xmm Valid Valid Convert two signed mm m64 doubleword integers from mm m64 to two single precision floating point values in xmm Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg w ModRM r m r NA NA CVTPS2DQ Convert Packed Single Precision FP Values to Packed Dword Integers Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode 66 OF 5B r CVTPS2DQxmmi1 A Valid Valid Convert four packed single xmm2 m128 precision floating point values from xmm2 m128 to four packed signed doublew ord integers in 1 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg w ModRM r m NA NA CVTPS2PD Convert Packed Single Precision FP Values to Packed Double Precision FP Values Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode OF 5A r CVTPS2PDxmml Valid Valid Convert two packed single xmm2 m64 precision floating point values xmm2 m64 to two packed double precision floating point values in xmm1 Intel 64 and IA 32 Architectures Software Developer s Manual Documentation Changes 46 H intel Instruction Operand Encoding Op En Operand
248. n processor 3400, 5500 series:
Most significant counter bit (MSCB) = 47
IF ((CR4.PCE = 1) or (CPL = 0) or (CR0.PE = 0))
    THEN IF (ECX[30] = 1 and ECX[29:0] in valid fixed-counter range)
        EAX ← IA32_FIXED_CTR(ECX)[30:0];
        EDX ← IA32_FIXED_CTR(ECX)[MSCB:32];
    ELSE IF (ECX[30] = 0 and ECX[29:0] in valid general-purpose counter range)
        EAX ← PMC(ECX[30:0])[31:0];
        EDX ← PMC(ECX[30:0])[MSCB:32];
    ELSE (* ECX is not valid or CR4.PCE is 0 and CPL is 1, 2, or 3 and CR0.PE is 1 *)
        #GP(0);
FI;
Intel Core 2 Duo processor family and Intel Xeon processor 3000, 5100, 5300, 7400 series:
Most significant counter bit (MSCB) = 39
IF ((CR4.PCE = 1) or (CPL = 0) or (CR0.PE = 0))
    THEN IF (ECX[30] = 1 and ECX[29:0] in valid fixed-counter range)
        EAX ← IA32_FIXED_CTR(ECX)[30:0];
        EDX ← IA32_FIXED_CTR(ECX)[MSCB:32];
    ELSE IF (ECX[30] = 0 and ECX[29:0] in valid general-purpose counter range)
        EAX ← PMC(ECX[30:0])[31:0];
        EDX ← PMC(ECX[30:0])[MSCB:32];
    ELSE IF (ECX[30] = 0 and ECX[29:0] in valid special-purpose counter range)
        EAX ← PMC(ECX[30:0])[31:0]; (* 32-bit read *)
    ELSE (* ECX is not valid or CR4.PCE is 0 and CPL is 1, 2, or 3 and CR0.PE is 1 *)
        #GP(0);
FI;
P6 family processors and Pentium processor with MMX technology:
IF (ECX = 0 or 1) and ((CR4.PCE = 1) or (CPL = 0) or (CR0.PE = 0))
    THEN EAX ← PMC(ECX)[31:0]; EDX
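The general-purpose branch of the pseudocode above splits a counter of MSCB + 1 bits across EAX and EDX. The sketch below models that split and the reverse step software performs to rebuild the count. Both function names are hypothetical, and the privilege checks and the #GP(0) path are deliberately left out.

```python
def rdpmc_split(counter_value, mscb):
    """Return (eax, edx) for a counter whose most significant bit is `mscb`.

    EAX receives bits 31:0 and EDX receives bits mscb:32, mirroring the
    general-purpose branch of the pseudocode; higher bits read as zero.
    """
    value = counter_value & ((1 << (mscb + 1)) - 1)
    return value & 0xFFFFFFFF, value >> 32


def rdpmc_combine(eax, edx):
    """Rebuild the full count from the EDX:EAX register pair."""
    return (edx << 32) | eax
```

With MSCB = 47 a count of 0x123456789ABC reads back as EAX = 0x56789ABC and EDX = 0x1234, while with MSCB = 39 only the low 8 bits of EDX can ever be non-zero.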
249. n self-modifying code and cross-modifying code also apply to the Intel 64 architecture. 8.1.4 Effects of a LOCK Operation on Internal Processor Caches. For the Intel486 and Pentium processors, the LOCK signal is always asserted on the bus during a LOCK operation, even if the area of memory being locked is cached in the processor. For the P6 and more recent processor families, if the area of memory being locked during a LOCK operation is cached in the processor that is performing the LOCK operation as write-back memory and is completely contained in a cache line, the processor may not assert the LOCK signal on the bus. Instead, it will modify the memory location internally and allow its cache coherency mechanism to ensure that the operation is carried out atomically. This operation is called "cache locking." The cache coherency mechanism automatically prevents two or more processors that have cached the same area of memory from simultaneously modifying data in that area. 8.2.1 Memory Ordering in the Intel Pentium and Intel486 Processors. The Pentium and Intel486 processors follow the processor-ordered memory model; however, they operate as strongly ordered processors under most circumstances. Reads and writes always appear in programmed order at the system bus, except for the following situation where p
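The conditions under which cache locking may replace a bus lock (the area is cached, is write-back memory, and lies entirely within one cache line) can be written as a simple predicate. This is a sketch only: the function names, the `"WB"` memory-type string, and the default 64-byte line size are assumptions for illustration, and real processors may apply further implementation-specific criteria.

```python
def fits_in_cache_line(address, size_bytes, line_size=64):
    """True if [address, address + size_bytes) lies within a single cache line.

    line_size defaults to 64 bytes, which is an assumption, not a value from the text.
    """
    return address // line_size == (address + size_bytes - 1) // line_size


def may_use_cache_locking(address, size_bytes, memory_type, cached, line_size=64):
    """True when the three conditions named in the text all hold."""
    return (cached
            and memory_type == "WB"
            and fits_in_cache_line(address, size_bytes, line_size))
```

A 4-byte operand at offset 62 of a 64-byte line straddles two lines, so under this model it would fall back to asserting the LOCK signal on the bus.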
250. nd 2 Operand 3 Operand 4 A ModRM reg w ModRM r m r NA NA CVTSS2SD Convert Scalar Single Precision FP Value to Scalar Double Precision FP Value Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode OF 5A r CVTSS2SD 1 Valid Valid Convert one single precision xmm2 m32 floating point value in xmm2 m32 to one double precision floating point value in xmm1 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg w ModRM r m NA NA VTSS2SI Convert Scalar Single Precision FP Value to Dword Integer Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode OF 2D r CVTSS2SI r32 A Valid Valid Convert one single precision xmm m32 floating point value from xmm m32 to one signed doubleword integer in r32 F3REX WOF2D CVTSS2SI r64 A Valid Convert one single precision Ir xmm m32 floating point value from xmm m32 to one signed quadword integer in r64 Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes 49 H intel Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg w ModRM r m r NA NA CVTTPD2DQ Convert with Truncation Packed Double Precision FP Values to Packed Dword Integers two packed signed doublew ord integers in xmm1 using truncation Opcode Instruction Op 64 Bit Compat Descrip
251. nd 3 Operand 4 A ModRM reg r w ModRM r m r NA NA PHSUBSW Packed Horizontal Subtract and Saturate Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode OF 38 07 Jr PHSUBSW mm1 A Valid Valid Subtract 16 bit signed mm2 m64 integer horizontally pack saturated integers to MM1 66 0F 3807 r PHSUBSW 1 Valid Valid Subtract 16 bit signed xmm2 m128 integer horizontally pack saturated integers to XMM1 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg r w ModRM r m r NA NA Intel 64 and IA 32 Architectures Software Developer s Manual Documentation Changes 129 Documentation Changes PINSRB PINSRD PINSRQ Insert Byte Dword Qword Opcode Instruction 66 0F 3A 20 PINSRB ib r32 m8 imm8 66 OF 3A22 r PINSRD xmml1 ib r m32 imm8 66 REX W OF PINSRQ xmm1 22 r m64 imm8 Op En A 64 Bit Mode Valid Valid N E Compat Description Leg Mode Valid Insert a byte integer value from r32 m8 into xmm1 at the destination element in xmm1 specified by imm8 Valid Insert a dword integer value from r m32 into the xmm1 at the destination element specified by imm8 Valid Insert a qword integer value from r m32 into the xmm1 at the destination element specified by imm8 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg w ModRM r m
252. necessary to initialize fields that the logical processor will not use. (For example, it is not necessary to initialize the MSR-bitmap address if the "use MSR bitmaps" VM-execution control is 0.) A processor maintains some VMCS information that cannot be modified with the VMWRITE instruction; this includes a VMCS's launch state (see Section 21.1). Such information may be stored in the VMCS data portion of a VMCS region. Because the format of this information is implementation-specific, there is no way for software to know, when it first allocates a region of memory for use as a VMCS region, how the processor will determine this information from the contents of the memory region. In addition to its other functions, the VMCLEAR instruction initializes any implementation-specific information in the VMCS region referenced by its operand. To avoid the uncertainties of implementation-specific behavior, software should execute VMCLEAR on a VMCS region before making the corresponding VMCS active with VMPTRLD for the first time. (Figure 21-1 illustrates how execution of VMCLEAR puts a VMCS into a well-defined state.) The following software usage is consistent with these limitations: VMCLEAR should be executed for a VMCS before it is used for VM entry for the first time. VMLAUNCH should be used for the first VM entry using a
253. ner in which the local interrupts are delivered to the processor core It consists of the following 32 bit APIC registers see Figure 10 8 one for each local interrupt LVT Timer Register FEEO 0320H Specifies interrupt delivery when the API C timer signals an interrupt see Section 10 5 4 APIC Timer LVT Thermal Monitor Register FEEO 0330H Specifies interrupt delivery when the thermal sensor generates an interrupt see Section 14 5 2 Thermal Monitor This LVT entry is implementation specific not architectural If imple mented it will always be at base address FEEO 0330H LVT Performance Counter Register FEEO 0340H Specifies interrupt delivery when a performance counter generates an interrupt on overflow see Section 30 8 5 8 Generating an Interrupt on Overflow This LVT entry is imple mentation specific not architectural If implemented it is not guaranteed to be at base address FEEO 0340H LVT LI NTO Register FEEO 0350H Specifies interrupt delivery when an interrupt is signaled at the LINTO pin LVT LINT1 Register FEEO 0360H Specifies interrupt delivery when an interrupt is signaled at the LINT1 pin LVT Error Register FEEO 0370H Specifies interrupt delivery when the APIC detects an internal error see Section 10 5 3 Error Handling LVT Register FEEO 2 Specifies interrupt delivery when an overflow condition of corrected machine check error count reaching a thres
254. ng point value in xmm2 mem and set the status flags accordingly Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg ModRM r m NA NA Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes UD2 Undefined Instruction Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode OF OB UD2 A Valid Valid Raise invalid opcode exception Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A NA NA NA NA UNPCKHPD Unpack and Interleave High Packed Double Precision Floating Point Values Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode 66 0F 15 r UNPCKHPDxmm1 Valid Valid Unpacks and Interleaves xmm2 m128 double precision floating point values from high quadwords of xmm1 and xmm2 m128 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg r w ModRM r m r NA NA UNPCKHPS Unpack and Interleave High Packed Single Precision Floating Point Values Opcode Instruction 64 Bit Compat Description En Leg Mode OF 15 r UNPCKHPS xmm1 A Valid Valid Unpacks and Interleaves xmm2 m128 single precision floating point values from high quadwords of xmm1 and xmm2 mem into xmm1 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg r w ModRM
255. ng with the following values 06 1AH 06 1EH 06 1FH 06 2EH However Intel Xeon processors with CPUID signature of DisplayFamily DisplayModel 06 2 have a small number of events that are not supported in processors with CPUID signature 06 1AH 06 1EH and 06 1FH These events are noted in the comment column In addition these processors CPUID signature of DisplayFamily DisplayModel 06 1AH 06 1EH 06 I1FH also support the following non architectural product specific uncore performance monitoring events listed in Table A 3 Fixed counters in the core PMU support the architecture events defined in Table A 7 Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes 274 Documentation Changes intel Table 2 Non Architectural Performance Events In the Processor Core for Intel Core i7 Processor and Intel Xeon Processor 5500 Series Event Umask Event Mask Num Value Mnemonic Description Comment 04H 07H SB DRAIN ANY Counts the number of store buffer drains OFH 01H MEM UNCORE RETI Counts number of memory load Available only for RED L3 DATA MISS instructions retired where the CPUID signature UNKNOWN memory reference missed L3 and 06 2EH data source is unknown OFH 80H MEM UNCORE RETI Counts number of memory load Available only for RED UNCACHEABLE instructions retired where the CPUID signature memory reference missed the L1 06 2 L2 and L3 cac
256. nges NOTES n 64 bit mode only 64 bit RDI and 32 bit EDI address sizes are supported In non 64 bit mode only 32 bit EDI and 16 bit address sizes are supported Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A NA NA NA NA SETcc Set Byte on Condition Opcode Instruction Op 64 Bit Compat Description En Leg Mode OF 97 SETA r m8 A Valid Valid Set byte if above CF 0 and ZF20 REX 0F 97 SETA r m8 A Valid N E Set byte if above CF 0 and ZF20 OF 93 SETAE r m8 A Valid Valid Set byte if above or equal 0 REX 0F 93 SETAE r m8 A Valid Set byte if above or equal 0 0 92 SETB r m8 A Valid Valid Set byte if below CF 1 REX 0F 92 SETB r m8 A Valid N E Set byte if below CF 1 OF 96 SETBE r m8 A Valid Valid Set byte if below or equal CF 1 or ZF 1 REX 0F 96 SETBE r m8 A Valid Set byte if below or equal CF 1 or ZF 1 OF 92 SETC r m8 A Valid Valid Set byte if carry CF 1 REX 0F 92 SETC r m8 A Valid N E Set byte if carry 1 OF 94 SETE r m8 A Valid Valid Set byte if equal ZF 1 REX 0F 94 SETE r m8 A Valid N E Set byte if equal ZF 1 OF 9F SETG r m8 A Valid Valid Set byte if greater ZF 0 and SF OF REX 0F 9F SETG r m8 A Valid Set byte if greater ZF 0 and SF 0F OF 9D SETGE r m8 A Valid Valid Set byte if greater or equal SF OF REX 0F 9D SETGE r m8 A Valid N E Set byte if greater or
257. not active but it will not be executed until the EWBE pin is sampled active Only the Pentium processor family has the EWBE pin PABSB PABSW PABSD Packed Absolute Value Op 64 Bit Compat Opcode Instruction En Mode Leg Mode Description OF 38 1C r PABSB mm1 A Valid Valid Compute the absolute value mm2 m64 of bytes in mm2 m64 and store UNSIGNED result in mm1 66 0F 38 1C r A Valid Valid Compute the absolute value xmm2 m128 of bytes in xmm2 m128 and store UNSIGNED result in xmml OF 38 1D r PABSW mm1 A Valid Valid Compute the absolute value mm2 m64 of 16 bit integers in mm2 m64 and store UNSIGNED result in mm1 66 0 38 10 PABSW xmml Valid Valid Compute the absolute value xmm2 m128 of 16 bit integers in xmm2 m128 and store UNSIGNED result in xmm1 OF 38 1E r PABSD mm1 A Valid Valid Compute the absolute value mm2 m64 of 32 bit integers in mm2 m64 and store UNSIGNED result in mm1 66 0F 38 1E r PABSDxmml A Valid Valid Compute the absolute value xmm2 m128 of 32 bit integers in xmm2 m128 and store UNSIGNED result in xmm1 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg w ModRM r m NA NA Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes 115 Documentation Changes intel PACKSSWB PACKSSDW Pack with Signed Saturation Opcode OF 63 r OF 6B r 66 OF 63 r 66 OF 6
258. nsion of the APIC architecture found in the P6 family processors. The primary difference between the APIC and xAPIC architectures is that with the xAPIC architecture, the local APICs and the I/O APIC communicate through the system bus. With the APIC architecture, they communicate through the APIC bus (see Section 10.2, System Bus Vs. APIC Bus). Also, some APIC architectural features have been extended and/or modified in the xAPIC architecture. These extensions and modifications are described in Section 10.4 through Section 10.10. The x2APIC architecture is an extension of the xAPIC architecture, primarily to increase processor addressability. The x2APIC architecture provides backward compatibility to the xAPIC architecture and forward extendability for future Intel platform innovations. These extensions and modifications are supported by a new mode of execution (x2APIC mode) and are detailed in Section 10.12. 10.4.1 The Local APIC Block Diagram. Figure 10-4 gives a functional block diagram for the local APIC. Software interacts with the local APIC by reading and writing its registers. APIC registers are memory-mapped to a 4-KByte region of the processor's physical address space with an initial starting address of FEE00000H. For correct APIC operation, this address space must be mapped to an area of memory that has been designated as strong uncacheable (UC). See Section 11.3, Methods of Caching Available. In MP system configurations
259. nteger 3A 16 xmm 2 imm8 value from xmm2 at the Ir ib source qword offset specified by imm8 into r m64 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM r m w ModRM reg r imm8 NA Description Copies a data element byte dword quadword in the source operand second operand specified by the count operand third operand to the destination operand first operand The source operand is an XMM register The destination operand can be a general purpose register or a memory address The count operand is an 8 bit imme diate When specifying a quadword dword byte element the 2 4 least significant bit s of the count operand specify the location Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes 126 Documentation Changes In 64 bit mode using a REX prefix in the form of REX R permits this instruction to access additional registers XMM8 XMM15 R8 15 PEXTRQ requires REX W If the destination operand is a general purpose register the default operand size of PEXTRB PEXTRW is 64 bits PEXTRW Extract Word Opcode Instruction Op 64 Bit Compat Description En Leg Mode OF C5 rib PEXTRW reg mm A Valid Valid Extract the word specified imm8 by imm8 from mm and move it to reg bits 15 0 The upper bits of r32 or r64 is zeroed 66 C5 rib PEXTRW A Valid Valid Extract the word specified xmm imm8 by
260. ntel 64 and 32 Architectures Software Developer s Manual Documentation Changes 202 chenes intel XCHG Exchange Register Memory with Register Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode 90 rw XCHG AX r16 A Valid Valid Exchange r16 with AX 90 rw XCHG r16 AX B Valid Valid Exchange AX with r16 90 XCHG EAX r32 A Valid Valid Exchange r32 with EAX REXW 90 rd r64 A Valid Exchange r64 with RAX 90 XCHG r32 EAX B Valid Valid Exchange EAX with r32 REXW 90 rd r64 B Valid N E Exchange RAX with r64 86 r XCHG r m8 r8 C Valid Valid Exchange r8 byte register with byte from r m8 REX 86 r XCHG 8 r8 Valid N E Exchange r8 byte register with byte from r m8 86 r XCHG r8 r m8 D Valid Valid Exchange byte from r m8 with r8 byte register REX 86 r XCHGr8 r m8 D Valid N E Exchange byte from r m8 with r8 byte register 87 r XCHGr m16 r16 C Valid Valid Exchange r16 with word from r m16 87 XCHGr16 r m16 D Valid Valid Exchange word from r m16 with r16 87 r XCHGr m32 r32 C Valid Valid Exchange r32 with doubleword from r m32 REX W 87 r XCHGr m64 r64 Valid N E Exchange r64 with quadw ord from r m64 87 XCHGr32 r m32 D Valid Valid Exchange doubleword from r m32 with r32 REXW 87 r XCHGr64 r m64 D Valid N E Exchange quadword from r m64 with r64 NOTES n 64 bit mode r m8 can not be encoded to acces
261. o a specified memory location; it stores the value FFFFFFFF_FFFFFFFFH if there is no current VMCS. The launch state of a VMCS determines which VM-entry instruction should be used with that VMCS: the VMLAUNCH instruction requires a VMCS whose launch state is "clear"; the VMRESUME instruction requires a VMCS whose launch state is "launched". A logical processor maintains a VMCS's launch state in the corresponding VMCS region. The following items describe how a logical processor manages the launch state of a VMCS: If the launch state of the current VMCS is "clear", successful execution of the VMLAUNCH instruction changes the launch state to "launched". The memory operand of the VMCLEAR instruction is the address of a VMCS. After execution of the instruction, the launch state of that VMCS is "clear". There are no other ways to modify the launch state of a VMCS (it cannot be modified using VMWRITE), and there is no direct way to discover it (it cannot be read using VMREAD). Figure 21-1 illustrates the different states of a VMCS. It uses "X" to refer to the VMCS and "Y" to refer to any other VMCS. Thus: "VMPTRLD X" always makes X current and active; "VMPTRLD Y" always makes X not current (because it makes Y current); VMLAUNCH makes the launch state of X "launched" if X was current and its launch state was "clear"; and "VMCLEAR X" always makes X inactive and not current and makes its launch state "clear". The figure does not illustrate o
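The launch-state rules in the excerpt above can be modeled as a small state machine. The following is only an illustrative sketch: the class names `Vmcs` and `LogicalProcessor`, the string values, and the "undefined" initial state are assumptions made for this example and are not part of any Intel interface. It models only the transitions the excerpt names (VMPTRLD, VMCLEAR, VMLAUNCH) plus the stated precondition of VMRESUME.

```python
from dataclasses import dataclass


@dataclass
class Vmcs:
    """Hypothetical model of the software-visible state of one VMCS."""
    active: bool = False
    current: bool = False
    # Before the first VMCLEAR the launch state is implementation-specific.
    launch_state: str = "undefined"


class LogicalProcessor:
    """Hypothetical model of one logical processor's VMCS bookkeeping."""

    def __init__(self):
        self.current_vmcs = None

    def vmclear(self, vmcs):
        # VMCLEAR X: X becomes inactive, not current, launch state "clear".
        vmcs.active = False
        vmcs.current = False
        vmcs.launch_state = "clear"
        if self.current_vmcs is vmcs:
            self.current_vmcs = None

    def vmptrld(self, vmcs):
        # VMPTRLD X: X becomes current and active; any other VMCS stops
        # being current (but stays active).
        if self.current_vmcs is not None:
            self.current_vmcs.current = False
        vmcs.active = True
        vmcs.current = True
        self.current_vmcs = vmcs

    def vmlaunch(self):
        # VMLAUNCH requires a current VMCS whose launch state is "clear".
        vmcs = self.current_vmcs
        if vmcs is None or vmcs.launch_state != "clear":
            return False
        vmcs.launch_state = "launched"
        return True

    def vmresume(self):
        # VMRESUME requires a current VMCS whose launch state is "launched".
        vmcs = self.current_vmcs
        return vmcs is not None and vmcs.launch_state == "launched"
```

In this model a second `vmlaunch()` on the same VMCS fails until `vmclear()` is applied again, which mirrors the statement that VMCLEAR is the only way to return the launch state to "clear".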
262. oating point value in xmm2 m64 and stores the results in xmm1 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg w ModRM r m r NA NA Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes 188 Documentation Changes SQRTSS Compute Square Root of Scalar Single Precision Floating Point Value Opcode Instruction Op 64 Bit Description En Mode 0 51 SQRTSS xmm1 A Valid Computes square root of xmm2 m32 the low single precision floating point value in xmm2 m32 and stores the results in xmm1 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg w ModRM r m NA NA STC Set Carry Flag Opcode Instruction Op 64 Bit Description En Mode F9 STC A Valid Set CF flag Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A NA NA NA NA STD Set Direction Flag Opcode Instruction Op 64 Bit Description En Mode FD STD A Valid Set DF flag Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A NA NA NA NA Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes chenes intel STI Set Interrupt Flag Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode FB STI A Valid Valid Set interrupt flag external maskable interrupts enabled
263. odRM reg ModRM r m implicit XMMO BLENDVPS Variable Blend Packed Single Precision Floating Point Values Opcode Instruction Op 64 bit Compat Description En Mode Leg Mode 66 38 14 BLENDVPS 1 Valid Valid Select packed single xmm2 m128 precision floating point lt 0 gt values from xmm1 and xmm2 m128 from mask specified in XMMO and store the values into xmm1 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg w ModRM r m r implicit XMM0 NA Intel 64 and IA 32 Architectures Software Developer s Manual Documentation Changes 21 chenes intel BOUND Check Array Index Against Bounds Opcode Instruction Op 64 bit Compat Description En Mode Leg Mode 62 Ir BOUND r16 A Invalid Valid Check if r16 array index is m16 amp 16 within bounds specified by 16 amp 16 62 Ir BOUND r32 A Invalid Valid Check if r32 array index is m32 amp 32 within bounds specified by m16 amp 16 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 ModRM reg ModRM r m BSF Bit Scan Forward Opcode Instruction Op 64 bit Compat Description En Mode Leg Mode OF BC r BSF r16 r m16 A Valid Valid Bit scan forward on r m16 OF BC r BSF r32 r m32 A Valid Valid Bit scan forward on r m32 REX W 0F BSF r64 r m64 A Valid N E Bit scan fo
264. ode cause general protection exceptions. Figure 10-28 illustrates the Error Status Register in x2APIC mode. Write to the ICR (in xAPIC and x2APIC modes) or to SELF IPI register (x2APIC mode only) with an illegal vector (vector ≤ 0FH) will set the "Send Illegal Vector" bit. On receiving an IPI with an illegal vector (vector ≤ 0FH), the "Receive Illegal Vector" bit will be set. On receiving an interrupt with illegal vector in the range 0H-0FH, the interrupt will not be delivered to the processor nor will an IRR bit be set in that range. Only the ESR "Receive Illegal Vector" bit will be set. If the ICR is programmed with lowest priority delivery mode, then the "Re-directible IPI" bit will be set in x2APIC modes (same as legacy xAPIC behavior) and the interrupt will not be processed. Write to the ICR with both lowest priority delivery mode and illegal vector will set the "Re-directible IPI" error bit. The interrupt will not be processed and hence the "Send Illegal Vector" error bit will not be set. [Figure 10-28. Error Status Register (ESR) in x2APIC Mode. Fields shown: Illegal Register Address, Received Illegal Vector, Send Illegal Vector, Redirectible IPI, Reserved. MSR Address: 828H] 10.12.9 ICR Operation in x2APIC Mode. In x2API
265. ode Leg Mode OF A4 SHLD r m16 r16 Valid Valid Shift r m16 to left imm8 imm8 places while shifting bits from r16 in from the right OF A5 SHLD r m16 r16 B Valid Valid Shift r m16 to left CL places CL while shifting bits from r16 in from the right OF A4 SHLD r m32 r32 A Valid Valid Shift r m32 to left imm8 imm8 places while shifting bits from r32 in from the right REX W 0FA4 SHLDr m64 r64 A Valid Shift r m64 to left imm8 imm8 places while shifting bits from r64 in from the right Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes 184 Documentation Changes SHRD Double Precision Shift Right Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode OF A5 SHLD r m32 r32 Valid Valid Shift r m32 to left CL places CL while shifting bits from r32 in from the right REX W 0F A5 SHLDr m64 r64 Valid N E Shift r m64 to left CL places CL while shifting bits from r64 in from the right Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM r m w ModRM reg r imm8 NA B ModRM r m w ModRM reg r CL NA Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode OF AC SHRDr m16 r16 Valid Valid Shift r m16 to right imm8 imm8 places while shifting bits from r16 in from the left OF AD SHRD r m16 r16 Valid Valid Shift r m16 to right CL CL places while shifting bits from r16 in from the left
266. of stack pointer Push sign extended imm16 Stack pointer is incremented by the size of stack pointer Push sign extended imm32 Stack pointer is incremented by the size of stack pointer Push CS Push SS Push DS Push ES Push FS and decrement stack pointer by 16 bits Push FS and decrement stack pointer by 32 bits Push FS Default operand size 64 bits 66H override causes 16 bit operation Push GS and decrement stack pointer by 16 bits Push GS and decrement stack pointer by 32 bits Push GS default operand size 64 bits 66H override causes 16 bit operation NOTES See IA 32 Architecture Compatibility section below Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes 158 H intel Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM r m NA NA NA B reg r NA NA NA C imm8 16 32 NA NA NA D NA NA NA NA PUSHA PUSHAD Push All General Purpose Registers Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode 60 PUSHA A Invalid Valid Push AX CX DX BX original SP BP SI and DI 60 PUSHAD A Invalid Valid Push EAX ECX EDX EBX original ESP EBP ESI and EDI Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A NA NA NA NA PUSHF PUSHFD Push EFLAGS Register onto the Stack Opcode Instruction Op 64 Bit Compat Des
267. of the code. The act of one processor writing data into the currently executing code segment of a second processor with the intent of having the second processor execute that data as code is called cross-modifying code. As with self-modifying code, IA-32 processors exhibit model-specific behavior when executing cross-modifying code, depending upon how far ahead of the executing processor's current execution pointer the code has been modified. To write cross-modifying code and ensure that it is compliant with current and future versions of the IA-32 architecture, the following processor synchronization algorithm must be implemented: (* Action of Modifying Processor *) Memory_Flag ← 0; (* Set Memory_Flag to value other than 1 *) Store modified code (as data) into code segment; Memory_Flag ← 1; (* Action of Executing Processor *) WHILE (Memory_Flag ≠ 1) Wait for code to update; ELIHW; Execute serializing instruction; (* For example, CPUID instruction *) Begin executing modified code; (The use of this option is not required for programs intended to run on the Intel486 processor, but is recommended to ensure compatibility with the Pentium 4, Intel Xeon, P6 family, and Pentium processors.) Like self-modifying code, cross-modifying code will execute at a lower level of performance than non-cross-modifying (normal) code, depending upon the frequency of modification and specific characteristics of the code. The restrictions o
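The ordering of the two roles in the synchronization algorithm above can be illustrated with two threads. This is a loose analogy and not real cross-modifying code: Python threads stand in for processors, a dictionary slot stands in for the code segment, and the serializing instruction is only marked by a comment. All names (`run_cross_modifying_demo`, `state`, `code_slot`) are hypothetical.

```python
import threading
import time


def run_cross_modifying_demo():
    """Simulate the modifying/executing handshake; returns what was 'executed'."""
    state = {"memory_flag": 0}                      # Memory_Flag <- 0
    code_slot = {"code": lambda: "old code"}        # stand-in for the code segment
    executed = []

    def modifying_processor():
        state["memory_flag"] = 0                    # any value other than 1
        code_slot["code"] = lambda: "new code"      # store modified code as data
        state["memory_flag"] = 1                    # publish: update is complete

    def executing_processor():
        while state["memory_flag"] != 1:            # WHILE (Memory_Flag != 1)
            time.sleep(0.001)                       # wait for code to update
        # A real implementation would execute a serializing instruction
        # (for example CPUID) here before running the modified code.
        executed.append(code_slot["code"]())        # begin executing modified code

    executor = threading.Thread(target=executing_processor)
    modifier = threading.Thread(target=modifying_processor)
    executor.start()
    modifier.start()
    modifier.join()
    executor.join()
    return executed[0]
```

The point of the sketch is the ordering: the executing side never touches the code slot until the flag reads 1, and the flag is set only after the store of the new code has been issued.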
268. ogram order. A WRMSR to an APIC register may complete before all preceding stores are globally visible; software can prevent this by inserting a serializing instruction, an SFENCE, or an MFENCE before the WRMSR. The RDMSR instruction is not serializing, and this behavior is unchanged when reading APIC registers in x2APIC mode. System software accessing the APIC registers using the RDMSR instruction should not expect a serializing behavior. (Note: The MMIO-based xAPIC interface is mapped by system software as an un-cached region. Consequently, read/writes to the xAPIC-MMIO interface have serializing semantics in the xAPIC mode.) 10.12.4 VM-Exit Controls for MSRs and x2APIC Registers. The VMX architecture allows a VMM to specify lists of MSRs to be loaded or stored on VMX transitions using the VMX-transition MSR areas (see VM-exit MSR-store address field, VM-exit MSR-load address field, and VM-entry MSR-load address field in the Intel 64 and IA-32 Architectures Software Developer's Manual). The x2APIC MSRs cannot be loaded and stored on VMX transitions. A VMX transition fails if the VMM has specified that the transition should access any MSRs in the address range from 0000_0800H to 0000_08FFH (the range used for accessing the x2APIC registers). Specifically, processing of a 128-bit entry in any of the VMX-transition MSR areas fails if bits 31:0 of that entry (represented as ENTRY_LOW_DW) satisfies the expression: ENTRY_LOW_DW &
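The excerpt above breaks off before the masking expression is complete, so the sketch below does not attempt to reproduce it. Instead it checks the stated address range directly, which is one way a VMM might validate its own MSR lists before a VMX transition. The function and constant names are hypothetical.

```python
# Range stated in the excerpt: MSR addresses used for the x2APIC registers.
X2APIC_MSR_FIRST = 0x0000_0800
X2APIC_MSR_LAST = 0x0000_08FF


def targets_x2apic_msr(entry_low_dw: int) -> bool:
    """Return True if bits 31:0 of an MSR-area entry name an x2APIC MSR.

    Assumption: entry_low_dw holds the low doubleword of a 128-bit entry,
    i.e. the MSR index, as the excerpt describes.
    """
    msr_index = entry_low_dw & 0xFFFF_FFFF
    return X2APIC_MSR_FIRST <= msr_index <= X2APIC_MSR_LAST
```

A VMM could run such a check over every entry it places in an MSR-load or MSR-store area and reject the list early, rather than discovering the problem as a failed transition.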
269. om 0 to 1, write accesses to linear addresses whose translation is controlled by this entry may or may not cause a page-fault exception. If a paging-structure entry is modified to change the U/S flag from 0 to 1, user-mode accesses to linear addresses whose translation is controlled by this entry may or may not cause a page-fault exception. If a paging-structure entry is modified to change the XD flag from 1 to 0, instruction fetches from linear addresses whose translation is controlled by this entry may or may not cause a page-fault exception. As noted in Section 8.1.1, an x87 instruction or an SSE instruction that accesses data larger than a quadword may be implemented using multiple memory accesses. If such an instruction stores to memory and invalidation has been delayed, some of the accesses may complete (writing to memory) while another causes a page-fault exception. In this case, the effects of the completed accesses may be visible to software even though the overall instruction caused a fault. (Footnote 1: If the accesses are to different pages, this may occur even if invalidation has not been delayed.) In some cases, the consequences of delayed invalidation may not affect software adversely. For example, when freeing a portion of the linear-address space (by marking paging-structure entries "not present"), invalidation u
270. ompat Description En Mode Leg Mode OF 16 r MOVLHPS 1 A Valid Valid Move two packed single xmm2 precision floating point values from low quadword of xmm2 to high quadword of xmm1 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg w ModRM reg r NA NA Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes 96 Boxusieniniton Chinon intel MOVLPD Move Low Packed Double Precision Floating Point Value Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode 66 OF 12 r MOVLPD xmm A Valid Valid Move double precision m64 floating point value from m64 to low quadword of xmm register 66 OF 13 r MOVLPD m64 B Valid Valid Move double precision xmm floating point nvalue from low quadword of xmm register to m64 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg r w ModRM r m NA NA B ModRM r m w ModRM reg r NA NA MOVLPS Move Low Packed Single Precision Floating Point Values Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode OF 12 r MOVLPS xmm A Valid Valid Move two packed single m64 precision floating point values from m64 to low quadword of xmm OF 13 r MOVLPS m64 B Valid Valid Move two packed single xmm precision floating point values from low quadword of xmm to m64 Instruction Operand Encoding Op En O
271. on the APIC bus. Receive Checksum Error: (P6 family and Pentium processors only) Set when the local APIC detects a checksum error for a message that it received on the APIC bus. Send Illegal Vector: Set when the local APIC detects an illegal vector in the message that it is sending. Receive Illegal Vector: Set when the local APIC detects an illegal vector in the message it received, including an illegal vector code in the local vector table interrupts or in a self-interrupt. Illegal Reg. Address: (Intel Core, Intel Atom, Pentium 4, Intel Xeon, and P6 family processors only) Set when the processor is trying to access a register in the processor's local APIC register address space that is reserved (see Table 10-1). Addresses in one of the 0x10 byte regions marked reserved are illegal register addresses. The Local APIC Register Map is the address range of the APIC register base address (specified in the IA32_APIC_BASE MSR) plus 4 KBytes. 10.5.4 APIC Timer. The local APIC unit contains a 32-bit programmable timer that is available to software to time events or operations. This timer is set up by programming four registers: the divide configuration register (see Figure 10-10), the initial-count and current-count registers (see Figure 10-11), and the LVT timer register (see Figure 10-8). If CPUID.06H:EAX.ARAT[bit 2] = 1, the processor's APIC timer runs at a constant rate regardless of P-state transitions and it
272. on x2APIC state transitions. Details of this initialization are provided in Section 10.12.10.2. 10.12.10.2 Deriving Logical x2APIC ID from the Local x2APIC ID. In x2APIC mode, the 32-bit logical x2APIC ID, which can be read from LDR, is derived from the 32-bit local x2APIC ID. Specifically, the 16-bit logical ID sub-field is derived by shifting 1 by the lowest 4 bits of the x2APIC ID, i.e. Logical ID = 1 << x2APIC ID[3:0]. The remaining bits of the x2APIC ID then form the cluster ID portion of the logical x2APIC ID: Logical x2APIC ID = [(x2APIC ID[19:4] << 16) | (1 << x2APIC ID[3:0])]. The use of the lowest 4 bits in the x2APIC ID implies that at least 16 APIC IDs are reserved for logical processors within a socket in multi-socket configurations. If more than 16 APIC IDs are reserved for logical processors in a socket/package, then multiple cluster IDs can exist within the package. The LDR initialization occurs whenever the x2APIC mode is enabled (see Section 10.12.5). 10.12.11 SELF IPI Register. SELF IPIs are used extensively by some system software. The x2APIC architecture introduces a new register interface. This new register is dedicated to the purpose of sending self-IPIs with the intent of enabling a highly optimized path for sending self-IPIs. Figure 10-31 provides the layout of the SELF re
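The derivation of the logical x2APIC ID described in the excerpt above is plain bit arithmetic and can be sketched in a few lines. The function name `logical_x2apic_id` is hypothetical; the bit positions follow the formula in the text (cluster ID from bits 19:4, logical ID from bits 3:0).

```python
def logical_x2apic_id(x2apic_id: int) -> int:
    """Derive the 32-bit logical x2APIC ID from the local x2APIC ID."""
    cluster_id = (x2apic_id >> 4) & 0xFFFF   # x2APIC ID[19:4], 16 bits
    logical_id = 1 << (x2apic_id & 0xF)      # one-hot bit from x2APIC ID[3:0]
    return (cluster_id << 16) | logical_id


# Example: local x2APIC ID 0x25 lies in cluster 2 and selects bit 5,
# so the logical x2APIC ID is 0x0002_0020.
example = logical_x2apic_id(0x25)
```

Because the low 16 bits are one-hot, up to 16 logical processors share a cluster ID, which matches the remark that at least 16 APIC IDs are reserved per socket.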
273. or more information Designers must not rely on the absence or characteristics of any features or instructions marked reserved or undefined Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order 12C is a two wire communications bus protocol developed by Philips SMBus is a subset of the 12 bus protocol and was developed by Intel Implementations of the 2 bus protocol may require licenses from various entities including Philips Electronics N V and North American Philips Corporation Intel Pentium Intel Core Intel Xeon Intel 64 Intel NetBurst and the Intel logo are trademarks of Intel Corporation in the U S and other countries Other names and brands may be claimed as the property of others Copyright 2002 2009 Intel Corporation All rights reserved 2 Intel 64 and IA 32 Architectures Software Developer s Manual Documentation Changes Contents Revision 4 Preface oco pee ens Ee S nx REOR Be a eee d 7 Summary Tables of Changes 8 Documentation 9 Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes 3 int
274. ord register or a memory location; the source operand must be a word register. The ARPL instruction is provided for use by operating-system procedures (however, it can also be used by applications). It is generally used to adjust the RPL of a segment selector that has been passed to the operating system by an application program to match the privilege level of the application program. Here the segment selector passed to the operating system is placed in the destination operand and the segment selector for the application program's code segment is placed in the source operand. (The RPL field in the source operand represents the privilege level of the application program.) Execution of the ARPL instruction then ensures that the RPL of the segment selector received by the operating system is no lower (does not have a higher privilege) than the privilege level of the application program (the segment selector for the application program's code segment can be read from the stack following a procedure call). This instruction executes as described in compatibility mode and legacy mode. It is not encodable in 64-bit mode. See Checking Caller Access Privileges in Chapter 3, Protected-Mode Memory Management, of the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3A, for more information about the use of this ins
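The RPL adjustment described above can be modelled as follows. This is a minimal sketch, not the instruction's formal definition: the function name `arpl` is hypothetical, and the returned flag models ZF, which the instruction sets when it raises the RPL (a detail taken from the general ARPL definition rather than from the excerpt above).

```python
RPL_MASK = 0x3  # the RPL occupies bits 1:0 of a 16-bit segment selector


def arpl(dest_selector: int, src_selector: int) -> tuple[int, int]:
    """Model ARPL: return (adjusted destination selector, ZF)."""
    dest_rpl = dest_selector & RPL_MASK
    src_rpl = src_selector & RPL_MASK
    if dest_rpl < src_rpl:
        # The destination claims more privilege than the caller has: raise its RPL.
        adjusted = (dest_selector & 0xFFFF & ~RPL_MASK) | src_rpl
        return adjusted, 1
    # The destination RPL is already no lower than the caller's: leave it unchanged.
    return dest_selector & 0xFFFF, 0


# A selector passed with RPL 0 by a caller whose code segment selector has RPL 3.
print(arpl(0x0008, 0x001B))
```

A numerically larger RPL means less privilege, so raising the destination RPL to the caller's value prevents the operating system from accessing a segment on the caller's behalf with more privilege than the caller holds.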
275. oring with L3 Caching Bus Controller. The facility for monitoring events consists of a set of dedicated model-specific registers (MSRs). There are eight event select/counting MSRs that are dedicated to counting events associated with specified microarchitectural conditions. Programming of these MSRs requires using RDMSR/WRMSR instructions with 64-bit values. In addition, an MSR (MSR_EMON_L3_GL_CTL) provides a simplified interface to control freezing, resetting, and re-enabling operation of any combination of these event select/counting MSRs. The eight MSRs dedicated to counting occurrences of specific conditions are further divided to count three sub-classes of microarchitectural conditions. Two MSRs (MSR_EMON_L3_CTR_CTL0 and MSR_EMON_L3_CTR_CTL1) are dedicated to counting GBSQ events; up to two GBSQ events can be programmed and counted simultaneously. Two MSRs (MSR_EMON_L3_CTR_CTL2 and MSR_EMON_L3_CTR_CTL3) are dedicated to counting GSNPQ events; up to two GSNPQ events can be programmed and counted simultaneously. Four MSRs (MSR_EMON_L3_CTR_CTL4, MSR_EMON_L3_CTR_CTL5, MSR_EMON_L3_CTR_CTL6, and MSR_EMON_L3_CTR_CTL7) are dedicated to counting external bus operations. The bit fields in each of the eight MSRs share the following common characteristics: Bits 63:32 is the event control field, which includes an event mask and other bit fields that control counter operation. The event mask field specifies details of the microarchitectural con
276. ot supported in 64 bit mode Jump near if greater or equal SF OF Jump near if less SF Not supported in 64 bit mode Jump near if less SF OF Jump near if less or equal ZF 1 or SFz OF Not supported in 64 bit mode Jump near if less or equal ZF 1 or SFz OF Jump near if not above CF 1 or ZF 1 Not supported in 64 bit mode Jump near if not above CF 1 or ZF 21 Jump near if not above or equal CF 1 Not supported in 64 bit mode Jump near if not above or equal CF 1 Jump near if not below CF 0 Not supported in 64 bit mode Jump near if not below 0 Jump near if not below equal CF 0 and ZF 0 Not supported in 64 bit mode 71 Documentation Changes Opcode OF 87 cd OF 83 cw OF 83 cd OF 85 cw OF 85 cd OF 8E cw OF 8E cd OF 8C cw OF 8C cd OF 8D cw OF 8D cd OF 8F cw OF 8F cd OF 81 cw OF 81 cd OF 8B cw OF 8B cd Instruction JNBE rel32 JNC rel16 JNC rel32 JNE rel16 JNE rel32 JNG rel16 JNG rel32 JNGE rel16 JNGE rel32 JNL rel16 JNL rel32 JNLE rel16 JNLE rel32 JNO rel16 JNO rel32 JNP rel16 JNP rel32 Op En A 64 Bit Mode Valid 5 Valid NS Valid 5 Valid NS Valid NS Valid NS Valid NS Valid NS Valid Compat Leg Mode Valid Valid Valid Valid Valid Valid Valid Valid Valid V
277. per s Manual Documentation Changes Description Subtract packed byte integers in mm m64 from packed byte integers in mm Subtract packed byte integers in xmm2 m128 from packed byte integers in xmm1 Subtract packed word integers in mm m64 from packed word integers in mm Subtract packed word integers in xmm2 m128 from packed word integers in xmml1 Subtract packed doubleword integers in mm m64 from packed doubleword integers in mm 153 Boxusieniniton Chinon intel Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode 66 OF FA r PSUBD xmm1 A Valid Valid Subtract packed doubleword xmm2 m128 integers in xmm2 mem128 from packed doubleword integers in xmm1 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg r w ModRM r m r NA NA PSUBQ Subtract Packed Quadword Integers Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode OF FB r PSUBQ mm1 A Valid Valid Subtract quadword integer mm2 m64 in mm1 from mm2 m64 66 OF FB r PSUBQ xmm1 A Valid Valid Subtract packed quadword xmm2 m128 integers in xmm1 from xmm2 m128 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg r w ModRM r m r NA NA PSUBSB PSUBSW Subtract Packed Signed Integers with Signed Saturation Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode OF E8 r PSUBSB mm A Valid
278. perand 1 Operand 2 Operand 3 Operand 4 A ModRM reg r w ModRM r m r NA NA B ModRM r m w ModRM reg r NA NA Intel 64 and IA 32 Architectures Software Developer s Manual Documentation Changes 97 Documentation Changes MOVMSKPD Extract Packed Double Precision Floating Point Sign Mask Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode 66 OF 50 r MOVMSKPDreg A Valid Valid Extract 2 bit sign mask from xmm xmm and store in reg The upper bits of r32 or r64 are filled with zeros Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg w ModRM reg NA NA MOVMSKPS Extract Packed Single Precision Floating Point Sign Mask Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode OF 50 r MOVMSKPS reg A Valid Valid Extract 4 bit sign mask from xmm xmm and store in reg The upper bits of r32 or r64 are filled with zeros Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg ModRM reg NA NA MOVNTDQA Load Double Quadword Non Temporal Aligned Hint Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode 66 0 38 2 r MOVNTDQA A Valid Valid Move double quadword 1 128 from m128 to xmm using non temporal hint if WC memory type Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg w ModRM r m
279. perand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg r w ModRM r m r imm8 NA CMPPS Compare Packed Single Precision Floating Point Values Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode OF C2 r ib CMPPS A Valid Valid Compare packed single xmm2 m128 imm8 precision floating point values in xmm2 mem and xmm1 using imm8 as comparison predicate Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg r w ModRM r m r imm8 NA Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes 35 Documentation Changes CMPS CMPSB CMPSW CMPSD CMPSQ Compare String Operands Opcode 6 A7 REX W A7 A6 Instruction CMPS m8 m8 CMPS m16 m16 CMPS m32 m32 CMPS m64 m64 CMPSB Op En A 64 Bit Mode Valid Valid Valid Valid Valid Compat Leg Mode Valid Valid Valid N E Valid Description For legacy mode compare byte at address DS E SI with byte at address ES E DI For 64 bit mode compare byte at address R E SI to byte at address R E DI The status flags are set accordingly For legacy mode compare word at address DS E SI with word at address ES E DI For 64 bit mode compare word at address R E SI with word at address R E DI The status flags are set accordingly For legacy mode compare dword a
280. peration ensures that all operations that were started while the processor was in real-address mode are completed before the switch to protected mode is made. The concept of serializing instructions was introduced into the IA-32 architecture with the Pentium processor to support parallel instruction execution. Serializing instructions have no meaning for the Intel486 and earlier processors that do not implement parallel instruction execution. It is important to note that execution of serializing instructions on P6 and more recent processor families constrains speculative execution, because the results of speculatively executed instructions are discarded. The following instructions are serializing instructions. Privileged serializing instructions: INVD, INVEPT, INVLPG, INVVPID, LGDT, LIDT, LLDT, LTR, MOV (to control register, with the exception of MOV CR8), MOV (to debug register), WBINVD, and WRMSR. Non-privileged serializing instructions: CPUID, IRET, and RSM. When the processor serializes instruction execution, it ensures that all pending memory transactions are completed (including writes stored in its store buffer) before it executes the next instruction. Nothing can pass a serializing instruction, and a serializing instruction cannot pass any other instruction (read, write, instruction fetch, or
281. perations that do not modify the VMCS state relative to these parameters (e.g., execution of VMPTRLD X when X is already current). Note that VMCLEAR X makes X inactive, not current, and clear, even if X's current state is not defined (e.g., even if X has not yet been initialized). See Section 21.11. [Figure 21-1. States of VMCS X. States shown: Inactive, Not Current, Clear; Active, Not Current, Clear; Active, Current, Clear; Active, Not Current, Launched; Active, Current, Launched.] 21.10 SOFTWARE USE OF THE VMCS AND RELATED STRUCTURES. This section details guidelines that software should observe when using a VMCS and related structures. It also provides descriptions of consequences for failing to follow guidelines. 21.10.1 Software Use of Virtual-Machine Control Structures. To ensure proper processor behavior, software should observe certain guidelines when using an active VMCS. No VMCS should ever be active on more than one logical processor. If a VMCS is to be migrated from one logical processor to another, the first logical processor should execute VMCLEAR for the VMCS (to make it inactive on that logical processor and to ensure that all VMCS data are
282. placement relative to next instruction Not supported in 64 bit mode E9 cd JMP rel32 A Valid Valid Jump near relative RIP RIP 32 bit displacement sign extended to 64 bits FF 4 JMP r m16 B NS Valid Jump near absolute indirect address sign extended r m16 Not supported in 64 bit mode FF 4 JMP r m32 B NS Valid Jump near absolute indirect address sign extended r m32 Not supported in 64 bit mode FF 4 JMP r m64 B Valid Jump near absolute indirect RIP 64 Bit offset from register or memory EA cd JMP ptr16 16 A Inv Valid Jump far absolute address given in operand EA cp JMP ptr16 32 A Inv Valid Jump far absolute address given in operand FF 5 JMP m16 16 A Valid Valid Jump far absolute indirect address given in m16 16 FF 5 JMP m16 32 A Valid Valid Jump far absolute indirect address given in m16 32 REXW FF 5 JMP 16 64 A Valid N E Jump far absolute indirect address given in m16 64 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 Offset NA NA NA ModRM r m NA NA NA Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes 74 Boxusieniniton Chinon intel LAHF Load Status Flags into AH Register Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode OF LAHF A Invalid Valid Load AH EFLAGS SF ZF 0 AF 0 PF 1 CF NOTES Valid in specific steppin
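The LAHF description in the excerpt above (AH loaded from EFLAGS as SF:ZF:0:AF:0:PF:1:CF) amounts to masking and fixing a few bits of the low EFLAGS byte. The sketch below models that layout; the function name `lahf` and the constant names are hypothetical.

```python
# Bits of the low EFLAGS byte that LAHF copies: SF (7), ZF (6), AF (4), PF (2), CF (0).
COPIED_FLAGS = 0b1101_0101
# Bit 1 always reads as 1 in AH; bits 5 and 3 always read as 0.
FIXED_ONE = 0b0000_0010


def lahf(eflags: int) -> int:
    """Return the value LAHF would place in AH for the given EFLAGS image."""
    return (eflags & COPIED_FLAGS) | FIXED_ONE


# ZF and CF set, everything else clear.
print(hex(lahf(0x41)))
```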
283. pter 21 of the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3B: System Programming Guide, Part 2. 21.1 OVERVIEW. A logical processor uses virtual-machine control data structures (VMCSs) while it is in VMX operation. These manage transitions into and out of VMX non-root operation (VM entries and VM exits) as well as processor behavior in VMX non-root operation. This structure is manipulated by the new instructions VMCLEAR, VMPTRLD, VMREAD, and VMWRITE. A VMM can use a different VMCS for each virtual machine that it supports. For a virtual machine with multiple logical processors (virtual processors), the VMM can use a different VMCS for each virtual processor. A logical processor associates a region in memory with each VMCS. This region is called the VMCS region. Software references a specific VMCS using the 64-bit physical address of the region (a VMCS pointer). VMCS pointers must be aligned on a 4-KByte boundary (bits 11:0 must be zero). On processors that support Intel 64 architecture, these pointers must not set bits beyond the processor's physical-address width. On processors that do not support Intel 64 architecture, they must not set any bits in the range 63:32. A logical processor may maintain a number of VMCSs that are active. The processor may optimize VMX operation by maintaining the state of an active VMCS in memory, on the processor, or both. At any given time, at most one of the active VMCSs is the curren
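The VMCS pointer constraints stated above can be checked with a small predicate. This is a sketch under stated assumptions: the function name and its parameters are hypothetical, and the default physical-address width of 36 bits is only a placeholder (real software would query the width from the processor).

```python
def is_valid_vmcs_pointer(pointer: int, intel64: bool = True,
                          phys_addr_width: int = 36) -> bool:
    """Check a 64-bit VMCS pointer against the constraints in the excerpt above."""
    if pointer < 0 or pointer >> 64:
        return False                      # not representable as a 64-bit value
    if pointer & 0xFFF:
        return False                      # bits 11:0 must be zero (4-KByte alignment)
    # Intel 64: no bits beyond the physical-address width.
    # Otherwise: no bits in the range 63:32.
    limit_bits = phys_addr_width if intel64 else 32
    return (pointer >> limit_bits) == 0


print(is_valid_vmcs_pointer(0x0000_1000))   # aligned and within range
print(is_valid_vmcs_pointer(0x0000_1008))   # misaligned
```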
284. r Location Intel 64 and IA 32 Architectures Software Developer s Manual Volume 3B System Programming Guide Part 2 1 Basic Architecture 233063 Intel 64 and IA 32 Architectures Software Developer s Manual Volume 253666 2A Instruction Set Reference A M Intel 64 and IA 32 Architectures Software Developer s Manual Volume 253667 2B Instruction Set Reference N Z Intel 64 and 32 Architectures Software Developer s Manual Volume 253668 3A System Programming Guide Part 1 Intel 64 and 1A 32 Architectures Software Developer s Manual Volume 253669 Nomenclature Documentation Changes include typos errors or omissions from the current published specifications These will be incorporated in any new release of the specification Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes Summary Tables of Changes intel Summary Tables of Changes The following table indicates documentation changes which apply to the Intel 64 and A 32 architectures This table uses the following notations Codes Used in Summary Tables Change bar to left of table row indicates this erratum is either new or modified from the previous version of the document Documentation Changes No DOCUMENTATI ON CHANGES 1 Updates to Chapter 3 Volume 2A 2 Updates to Chapter 4 Volume 2B 3 Updates to Chapter 4 Volume 3A 4 Updates to Chapter
285. rand 3 Operand 4 A ModRM reg r w ModRM r m r NA NA B ModRM r m r w imm8 NA NA AL AX EAX RAX imm8 NA NA ADDPD Add Packed Double Precision Floating Point Values Opcode Instruction Op 64 bit Compat Description En Mode Leg Mode 66 OF 58 r ADDPD 1 A Valid Valid Add packed double precision xmm2 m128 floating point values from xmm2 m128 to xmm1 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg r w ModRM r m r NA NA Intel 64 and IA 32 Architectures Software Developer s Manual Documentation Changes 14 chenes intel ADDPS Add Packed Single Precision Floating Point Values Opcode Instruction Op 64 bit Compat Description En Mode Leg Mode OF 58 r ADDPS xmm1 A Valid Valid Add packed single precision xmm2 m128 floating point values from xmm2 m128 to xmm1 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg r w ModRM r m r NA NA ADDSD Add Scalar Double Precision Floating Point Values Opcode Instruction Op 64 bit Compat Description En Mode Leg Mode F2 OF 58 r ADDSD xmm1 A Valid Valid Add the low double xmm2 m64 precision floating point value from xmm2 m64 to xmm1 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg r w ModRM r m NA NA ADDSS Add Scalar Single Precision Floating Poin
286. rand 4 A NA NA NA NA Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes e Documentation Changes n tel TEST Logical Compare Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode A8 ib TEST AL imm8 A Valid Valid AND imm8 with AL set SF ZF PF according to result A9 iw TEST AX imm16 Valid Valid AND imm16 with AX set SF ZF PF according to result A9 id TEST imm32 Valid Valid AND imm32 with EAX set SF ZF PF according to result REX W A9 id TEST imm32 A Valid N E AND imm32 sign extended to 64 bits with RAX set SF ZF PF according to result F6 0 ib TEST r m8 imm8 B Valid Valid AND imm8 with r m8 set SF ZF PF according to result REX 6 0 ib TEST r m8 imm8 Valid N E AND imm8 with r m8 set SF ZF PF according to result F7 0 iw TEST r m16 B Valid Valid AND imm16 with r m16 set imm16 SF ZF PF according to result F7 0 id TEST r m32 B Valid Valid AND imm32 with r m32 set imm32 SF ZF PF according to result REX W F7 0 TEST r m64 B Valid N E AND imm32 sign extended id imm32 to 64 bits with r m64 set SF ZF PF according to result 84 r TEST r m8 r8 C Valid Valid AND r8 with r m8 set SF ZF PF according to result REX 84 r TEST r m8 r8 Valid N E AND r8 with r m8 set SF ZF PF according to result 85 r TEST r m16 r16 C Valid Valid AND r16 with r m16 set SF ZF PF according to
287. rchitectures Software Developer s Manual Documentation Changes 11 chenes intel AAS ASCII Adjust AL After Subtraction Opcode Instruction Op 64 bit Compat Description En Mode Leg Mode 3F AAS A Invalid Valid ASCII adjust AL after subtraction Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A NA NA NA NA ADC Add with Carry Opcode Instruction Op 64 bit Compat Description En Mode Leg Mode 14 ib ADC AL imm8 C Valid Valid Add with carry imm8 to AL 15 iw ADC AX imm16 C Valid Valid Add with carry imm16 to AX 15 id ADCEAX imm32 C Valid Valid Add with carry imm32 to EAX REXW 15id ADCRAX imm32 C Valid N E Add with carry imm32 sign extended to 64 bits to RAX 80 2 ib ADCr m8 imm8 Valid Valid Add with carry imm8 to r m8 REX 80 2ib ADCr m8 imm8 Valid N E Add with carry imm8 to r m8 81 2 iw ADC r m16 B Valid Valid Add with carry imm16 to imm16 r m16 81 2 id ADC r m32 B Valid Valid Add with CF imm32 to imm32 r m32 REX W 81 2 ADCr m64 B Valid N E Add with CF imm32 sign id imm32 extended to 64 bits to r m64 83 2 ib ADCr m16 imm8 Valid Valid Add with CF sign extended imm8 to r m16 83 2 ib ADCr m32 imm8 Valid Valid Add with CF sign extended imm8 into r m32 REX W 83 2 ADCr m64 imm8 Valid N E Add with CF sign extended ib imm8 into r m64 10 r ADC r m8 r8 A Valid Valid Add with carry byte register to r m8 REX 1
288. re for Intel Core i7 Processor and Intel Xeon Processor 5500 Series Event Num Umask Value Event Mask Mnemonic Description Comment Intel Xeon processors with CPUID signature of DisplayFamily_DisplayModel 06 2 have a distinct uncore sub system that is significantly different from the uncore found in processors with CPUID signature 06_1AH 06_1EH and 06_1FH Non architectural Performance monitoring events for its uncore will be available in future documentation Table 4 Non Architectural Performance Events In Next Generation Processor Core Codenamed Westmere Event Umask Event Mask Num Value Mnemonic Description Comment OFH 10H MEM UNCORE RETI Load instructions retired with a data RED LOCAL DRAM source of local DRAM or locally homed remote cache HITM Precise Event 20H 01H LSD OVERFLOW Number of loops that can not stream from the instruction queue 24H OCH 12 05 5 5 Counts all L2 store RFO requests L2 RFO requests include both L1D demand RFO misses as well as L1D RFO prefetches B1H 1FH UOPS_EXECUTED CO Counts number of cycles there RE_ACTIVE_CYCLES_ one or more uops being executed NO_PORT5 and were issued on ports 0 4 This is core count only and can not be collected per thread Intel 64 and 1 32 Architectures Software Developer s Manual Documentation Changes 276 Documentation Changes intel
289. rent names based on their uses in the translation process. Table 4-2 gives the names of the different paging structures. It also provides, for each structure, the source of the physical address used to locate it (CR3 or a different paging-structure entry); the bits in the linear address used to select an entry from the structure; and details about whether and how such an entry can map a page. Table 4-2. Paging Structures in the Different Paging Modes. Columns: paging structure; entry name; paging mode; physical address of structure; bits selecting entry; page mapping. PML4 table; PML4E; 32-bit, PAE: N/A; IA-32e: CR3, bits 47:39, N/A (PS must be 0). Page-directory-pointer table; PDPTE; 32-bit: N/A; PAE: CR3, bits 31:30, N/A (PS must be 0); IA-32e: PML4E, bits 38:30, 1-GByte page if PS=1 (note 1). Page directory; PDE; 32-bit: CR3, bits 31:22, 4-MByte page if PS=1 (note 2); PAE, IA-32e: PDPTE, bits 29:21, 2-MByte page if PS=1. Page table; PTE; 32-bit: PDE, bits 21:12, 4-KByte page; PAE, IA-32e: PDE, bits 20:12, 4-KByte page. NOTES: 1. Not all processors allow the PS flag to be 1 in PDPTEs; see Section 4.1.4 for how to determine whether 1-GByte pages are supported. 2. 32-bit paging ignores the PS flag in a PDE (and uses the entry to reference a page table) unless CR4.PSE = 1. Not all processors allow CR4.PSE to be 1; see Section 4.1.4 for how to determine whether 4-MByte pages are supported with 32-bit paging. 4.4.1 PDPTE Registers. When PAE paging is used, CR3 references the base of a 32-Byte page direc
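For IA-32e paging, Table 4-2 in the excerpt above gives the linear-address bits that select an entry at each level (47:39, 38:30, 29:21, and 20:12). The sketch below splits a linear address accordingly for the 4-KByte-page case; the function name and dictionary keys are hypothetical, and the 12-bit page offset (bits 11:0) is assumed from the 4-KByte page size rather than listed in the table.

```python
def split_ia32e_linear_address(linear_address: int) -> dict[str, int]:
    """Split a linear address into IA-32e paging-structure indexes (4-KByte page case)."""
    return {
        "pml4e": (linear_address >> 39) & 0x1FF,   # bits 47:39 select the PML4E
        "pdpte": (linear_address >> 30) & 0x1FF,   # bits 38:30 select the PDPTE
        "pde":   (linear_address >> 21) & 0x1FF,   # bits 29:21 select the PDE
        "pte":   (linear_address >> 12) & 0x1FF,   # bits 20:12 select the PTE
        "offset": linear_address & 0xFFF,          # bits 11:0, offset within the page
    }


example = (1 << 39) | (2 << 30) | (3 << 21) | (4 << 12) | 5
print(split_ia32e_linear_address(example))
```

Each index is 9 bits wide, so every paging structure at these levels holds 512 entries.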
290. rite to that paging-structure entry if there is no serializing instruction between the write and the instruction fetch. Note that the invalidating instructions identified in Section 4.10.3.1 are all serializing instructions. Section 4.10.2.3 describes situations in which a single paging-structure entry may contain information cached in multiple entries in the paging-structure caches. Because all entries in these caches are invalidated by any execution of INVLPG, it is not necessary to follow the modification of such a paging-structure entry by executing INVLPG multiple times solely for the purpose of invalidating these multiple cached entries. (It may be necessary to do so to invalidate multiple TLB entries.) 4.10.3.4 Delayed Invalidation. Required invalidations may be delayed under some circumstances. Software developers should understand that, between the modification of a paging-structure entry and execution of the invalidation instruction recommended in Section 4.10.3.2, the processor may use translations based on either the old value or the new value of the paging-structure entry. The following items describe some of the potential consequences of delayed invalidation. If a paging-structure entry is modified to change the P flag from 1 to 0, an access to a linear address whose translation is controlled by this entry may or may not cause a page-fault exception. If a paging-structure entry is modified to change the R/W flag fr
291. rocessor ordering is exhibited. Read misses are permitted to go ahead of buffered writes on the system bus when all the buffered writes are cache hits and, therefore, are not directed to the same address being accessed by the read miss. In the case of I/O operations, both reads and writes always appear in programmed order. Software intended to operate correctly in processor-ordered processors (such as the Pentium 4, Intel Xeon, and P6 family processors) should not depend on the relatively strong ordering of the Pentium or Intel486 processors. Instead, it should ensure that accesses to shared variables that are intended to control concurrent execution among processors are explicitly required to obey program ordering through the use of appropriate locking or serializing operations (see Section 8.2.5, Strengthening or Weakening the Memory Ordering Model). 8.2.2 Memory Ordering in P6 and More Recent Processor Families. The Intel Core 2 Duo, Intel Atom, Intel Core Duo, Pentium 4, and P6 family processors also use a processor-ordered memory-ordering model that can be further defined as "write ordered with store-buffer forwarding." This model can be characterized as follows. In a single-processor system for memory regions defined as write-back cacheable, the memory-ordering model respects the following principles (Note: the memory-ordering principles for single-processor and multiple-processor systems are written from the perspective of softw
292. rol the execution of a VM upon a VM entry. The VMM can make a VMCS the current VMCS by using the VMPTRLD instruction. VMCS data fields must be read or written only through VMREAD and VMWRITE commands, respectively. Every component of the VMCS is identified by a 32-bit encoding that is provided as an operand to VMREAD and VMWRITE. Appendix H provides the encodings. A VMM must properly initialize all fields in a VMCS before using the current VMCS for VM entry. A VMCS is referred to as a controlling VMCS if it is the current VMCS on a logical processor in VMX non-root operation. A current VMCS for controlling a logical processor in VMX non-root operation may be referred to as a working VMCS if the logical processor is not in VMX non-root operation. The relationship of active, current (i.e. working), and controlling VMCS during VMX operation is shown in Figure 27-1. NOTE: As noted in Section 21.1, the processor may optimize VMX operation by maintaining the state of an active VMCS (one for which VMPTRLD has been executed) on the processor. Before relinquishing control to other system software that may, without informing the VMM, remove power from the processor (e.g., for transitions to S3 or S4) or leave VMX operation, a VMM must VMCLEAR all active VMCSs. This ensures that all
293. rted in 64 bit mode OF 85 cd JNZ rel32 A Valid Valid Jump near if not zero ZF 0 OF 80 cw JO rel16 A NS Valid Jump near if overflow OF 1 Not supported in 64 bit mode OF 80 cd JO rel32 A Valid Valid Jump near if overflow OF 1 OF 8A cw JP rel16 A NS Valid Jump near if parity PF 1 Not supported in 64 bit mode OF 8A cd JP rel32 A Valid Valid Jump near if parity PF 1 OF 8A cw JPE rel16 A NS Valid Jump near if parity even PF 1 Not supported in 64 bit mode OF 8A cd JPE rel32 A Valid Valid Jump near if parity even PF 1 OF 8B cw JPO rel16 A NS Valid Jump near if parity odd PF 0 Not supported in 64 bit mode OF 8B cd JPO rel32 A Valid Valid Jump near if parity odd PF 0 OF 88 cw JS rel16 A NS Valid Jump near if sign SF 1 Not supported in 64 bit mode OF 88 cd JS rel32 A Valid Valid Jump near if sign SF 1 OF 84 cw JZ rel16 A NS Valid Jump near if 0 ZF 1 Not supported in 64 bit mode OF 84 cd JZ rel32 A Valid Valid Jump near if 0 ZF 1 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A Offset NA NA NA Intel 64 and 1 32 Architectures Software Developer s Manual Documentation Changes 73 chenes intel JMP Jump Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode EB cb JMP rel8 A Valid Valid Jump short RIP RIP 8 bit displacement sign extended to 64 bits E9 cw JMP rel16 A NS Valid Jump near relative dis
294. rward on r m64 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 ModRM reg w ModRM r m NA NA BSR Bit Scan Reverse Opcode Instruction Op 64 bit Compat Description En Mode Leg Mode OF BD r BSR r16 r m16 A Valid Valid Bit scan reverse on r m16 OF BD BSR r32 32 A Valid Valid Bit scan reverse on r m32 REX W 0F BD BSR r64 r m64 A Valid N E Bit scan reverse on r m64 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 ModRM reg w ModRM r m Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes 22 chenes intel BSWAP Byte Swap Opcode Instruction Op 64 bit Compat Description En Mode Leg Mode OF C8 rd BSWAP r32 A Valid Valid Reverses the byte order of a 32 bit register REX W BSWAP r64 A Valid Reverses the byte order of C8 rd a 64 bit register NOTES See 1 32 Architecture Compatibility section below Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A reg r w NA NA NA BT Bit Test Opcode Instruction Op 64 bit Compat Description En Mode Leg Mode OF A3 BT r m16 r16 A Valid Valid Store selected bit in CF flag OF A3 BT r m32 132 A Valid Valid Store selected bit in CF flag REX W 0F BT r m64 r64 A Valid N E Store selected bit in CF flag OF B
295. s the new value Certain VMX transitions load the PDPTE registers See Section 4 11 1 Ignored Address of page directory pointer table PDPTE 3 n Reserved Address of page directory present Ignored Address of 2MB page frame Reserved Ignored Ignored Address of page table Ignored Ignored Address of 4KB page frame Ignored present Figure 4 7 Formats of CR3 and Paging Structure Entries with PAE Paging NOTES 1 Mis an abbreviation for MAXPHYADDR 2 CR3 has 64 bits only on processors supporting the Intel 64 architecture These bits are ignored with PAE paging Intel 64 and 1 32 Architectures Software Developer s Manual Documentation Changes 212 Documentation Changes 3 Reserved fields must be 0 4 If A32 EFERNXE 0 and the P flag of a PDE or a PTE is 1 the XD flag bit 63 is reserved Table 4 8 Format of a PAE Page Directory Pointer Table Entry PDPTE Bit Position s Contents 0 P Present must be 1 to reference a page directory 2 1 Reserved must be 0 3 PWT Page level write through indirectly determines the memory type used to access the page directory referenced by this entry see Section 4 9 4 PCD Page level cache disable indirectly determines the memory type used to access the page directory referenced by this entry see Section 4 9 8 5 Reserved must be 0
296. s 41CH 1052 MSR MC7 CTL Package See Section 15 3 2 1 1 32 MCi MSRs 41DH 1053 MSR_MC7_ Package See Section 15 3 22 A32 MCi STATUS STATUS MSRS and Appendix E Intel 64 and 1 32 Architectures Software Developer s Manual Documentation Changes 288 Documentation Changes intel Register Scope Address Register Name Bit Description Hex Dec 41EH 1054 MSR MC7 ADDR Package See Section 15 3 2 3 IA32_MCi_ADDR MSRs 41FH 1055 MSR MC7 MISC Package See Section 15 3 2 4 IA32_MCi_MISC MSRs 420H 1056 MSR_MC8_CTL Package See Section 15 3 2 1 A32 MCi MSRs 421H 1057 MSR MC8 Package See Section 15 3 22 A32 MCi STATUS STATUS MSRS and Appendix E 422H 1058 MSR MC8 ADDR Package See Section 15 3 2 3 A32 MCi ADDR MSRs 423H 1059 MSR MC8 MISC Package See Section 15 3 24 4 32 MCi MISC MSRs 424H 1060 MC9 Package See Section 15 3 2 1 A32 MCi MSRs 425H 1061 MSR MC9 Package See Section 15 3 22 A32 MCi STATUS STATUS MSRS and Appendix E 426H 1062 MC9 ADDR Package See Section 15 3 2 3 A32 MCi ADDR MSRs 427H 1063 MC9 MISC Package See Section 15 3 24 4 32 MCi MISC MSRs 428H 1064 MSR 10 Package See Section 15 3 2 1 4 32 MCi MSRs 429H 1065 MSR MC10
297. s Software Developer s Manual Documentation Changes 56 Documentation Changes DPPS Dot Product of Packed Single Precision Floating Point Values Opcode Instruction Op 64 Bit Description En Mode Leg Mode 66 0 40 DPPS xmml Valid Selectively multiply packed ib xmm2 m128 SP floating point values imm8 from xmm1 with packed SP floating point values from xmm2 add and selectively store the packed SP floating point values or zero values to xmm1 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg r w ModRM r m r imm8 NA EMMS Empty MMX Technology State Opcode Instruction Op 64 Bit Description En Mode Leg Mode OF 77 EMMS A Valid Set the x87 FPU tag word to empty Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A NA NA NA NA ENTER Make Stack Frame for Procedure Parameters Opcode C8 iw 00 C8 iw 01 C8 iw ib Instruction Op 64 Bit En Mode ENTER imm16 0 A Valid ENTER imm16 1 ENTER imm16 imm8 A Valid A Valid Description Leg Mode Create a stack frame for a procedure Create a nested stack frame for a procedure Create a nested stack frame for a procedure Instruction Operand Encoding Op En A Operand 1 iw Operand 2 imm8 Operand 3 Operand 4 NA NA Intel 64 and IA 32 Architectures Software Developer s Manual Documentation Changes
298. s in mm m64 add adjacent doubleword results and store in mm 66 OF F5 r PMADDWD 1 Valid Valid Multiply the packed word xmm2 m128 integers in xmm1 by the packed word integers in xmm2 m128 add adjacent doubleword results and store in xmm1 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg r w ModRM r m r NA NA Intel 64 and IA 32 Architectures Software Developer s Manual Documentation Changes 131 H intel PMAXSB Maximum of Packed Signed Byte Integers Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode 66 0F 383C r PMAXSBxmml Valid Valid Compare packed signed byte xmm2 m128 integers in xmm1 and xmm2 m128 and store packed maximum values in xmml Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg r w ModRM r m r NA NA PMAXSD Maximum of Packed Signed Dword Integers Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode 66 0F 38 30 PMAXSDxmml Valid Valid Compare packed signed xmm2 m128 dword integers in xmm1 and xmm2 m128 and store packed maximum values in xmml Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg w ModRM r m PMAXSW Maximum of Packed Signed Word Integers Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mo
299. s present 24 MCG SER P The processor supports software error recovery if this bit is set 6325 Reserved 1BOH 432 IA32_ENERGY_PERF_BIAS Performance Energy Bias if Hint R W CPUID 6H ECX 3 1 280H 640 1 32 R W 06 1AH 14 0 Corrected error count threshold 29 15 Reserved 30 EN 6331 Reserved 414H 1044 IA32 5 5 06_0FH 415H 1045 IA32 5 STATUS MC5 STATUS 06 416H 1046 IA32 MC5 MC5 ADDR 06 OFH Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes 284 Documentation Changes Register Address Architectural MSR Name Introduced as Decimal A NR MSR Bit Description 417H 1047 IA32_MC5_MISC MC5_MISC 06 OFH 418H 1048 IA32_MC6_CTL MC6_CTL 06_1DH 419 1049 IA32_MC6_STATUS MC6_STATUS 06_1DH 41AH 1050 IA32_MC6_ADDR1 MC6_ADDR 06_1DH 41BH 1051 IA32_MC6_MISC MC6_MISC 06_1DH 41CH 1052 IA32 MC7 MC7 CTL 06 1AH 41DH 1053 32 7 STATUS MC7 STATUS 06 1AH 41 1054 IA32_MC7_ADDR1 MC7_ADDR 06_1AH 41FH 1055 IA32 MC7 MISC MC7 MISC 06 1AH 420H 1056 IA32_MC8_CTL MC8_CTL 06_1AH 421H 1057 IA32_MC8_STATUS MC8_STATUS 06_1AH 422H 1058 IA32 MC8 ADDR 8 06 423H 1059 1432 MC8 MISC MC8 MISC 06 1AH 424H 1060 IA32 9 9 06 2 425H 1061 IA32 MC9 S
300. s the following byte registers if a REX prefix is used AH BH CH DH Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A AX EAX RAX w reg r w NA NA B reg r w AX EAX RAX w C ModRM r m r w ModRM reg w D ModRM reg r w ModRM r m r w NA NA Intel 64 and IA 32 Architectures Software Developer s Manual Documentation Changes 203 Documentation Changes XGETBV Get Value of Extended Control Register Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode OF 01 DO XGETBV A Valid Valid Reads an XCR specified by ECX into EDX EAX Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A NA NA NA NA XLAT XLATB Table Look up Translation Opcode Instruction Op 64 Bit Compat Description En Leg Mode D7 XLAT m8 A Valid Valid Set AL to memory byte DS E BX unsigned AL D7 XLATB A Valid Valid Set AL to memory byte DS E BX unsigned AL REX W 07 XLATB A Valid N E Set AL to memory byte RBX unsigned AL Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A NA NA NA NA Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes 204 Documentation Changes XOR Logical Exclusive OR ntel Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode 34 ib XOR AL imm8 A Valid V
301. s treated as a new instruction in 64-bit mode. N.S.: Indicates an instruction syntax that requires an address override prefix in 64-bit mode and is not supported. Using an address override prefix in 64-bit mode may result in model-specific execution behavior. AAA: ASCII Adjust After Addition. Opcode, Instruction, Op/En, 64-bit Mode, Compat/Leg Mode, Description: 37, AAA, A, Invalid, Valid, ASCII adjust AL after addition. Instruction Operand Encoding (Op/En, Operand 1, Operand 2, Operand 3, Operand 4): A, NA, NA, NA, NA. 64-Bit Mode Exceptions: #UD if in 64-bit mode. AAD: ASCII Adjust AX Before Division. Opcode, Instruction, Op/En, 64-bit Mode, Compat/Leg Mode, Description: D5 0A, AAD, A, Invalid, Valid, ASCII adjust AX before division; D5 ib, (No mnemonic), A, Invalid, Valid, Adjust AX before division to number base imm8. Instruction Operand Encoding (Op/En, Operand 1, Operand 2, Operand 3, Operand 4): A, NA, NA, NA, NA. AAM: ASCII Adjust AX After Multiply. Opcode, Instruction, Op/En, 64-bit Mode, Compat/Leg Mode, Description: D4 0A, AAM, A, Invalid, Valid, ASCII adjust AX after multiply; D4 ib, (No mnemonic), A, Invalid, Valid, Adjust AX after multiply to number base imm8. Instruction Operand Encoding (Op/En, Operand 1, Operand 2, Operand 3, Operand 4): A, NA, NA, NA, NA.
302. sical, logical, broadcast/self, or lowest-priority delivery mode. These destination modes are described in the following sections. Determination of IPI destinations in x2APIC mode is discussed in Section 10.12.10. NOTES: All processors that have their APIC software enabled (using the spurious vector enable/disable bit) must have their DFRs (Destination Format Registers) programmed identically. The default mode for DFR is flat mode. If you are using cluster mode, DFRs must be programmed before the APIC is software enabled. Since some chipsets do not accurately track a system view of the logical mode, program DFRs as soon as possible after starting the processor. 10.6.2.3 Broadcast/Self Delivery Mode. The destination shorthand field of the ICR allows the delivery mode to be bypassed in favor of broadcasting the IPI to all the processors on the system bus and/or back to itself (see Section 10.6.1, "Interrupt Command Register (ICR)"). Three destination shorthands are supported: self, all excluding self, and all including self. The destination mode is ignored when a destination shorthand is used. 10.8.5 Signaling Interrupt Servicing Completion. For all interrupts except those delivered with the NMI, SMI, INIT, ExtINT, the start-up, or INIT-Deassert delivery mode, the interrupt handler must include a write to the end-of-interrupt (EOI) register (see Figure 10-21). This write must occur at the end of the handler routine, sometime before the IR
303. sign bits Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes 151 H intel Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg w ModRM r m ModRM r m w imm8 NA NA PSRLDQ Shift Double Quadword Right Logical Opcode Instruction Op 64 Bit Compat Description En Leg Mode 66 0 73 3 PSRLDQxmml A Valid Valid Shift xmm1 right by imm8 imm8 while shifting in Os Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM r m w imm8 NA NA PSRLW PSRLD PSRLQ Shift Packed Data Right Logical Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode OF Dl r PSRLW mm A Valid Valid Shift words in mm right by mm m64 amount specified in mm m64 while shifting in Os 66 OF D1 r PSRLW xmm1 A Valid Valid Shift words in xmm1 right xmm2 m128 by amount specified in xmm2 m128 while shifting in Os OF 71 2 ib PSRLW mm imm8 B Valid Valid Shift words in mm right by imm8 while shifting in Os 66 0 71 2ib PSRLW xmm1 B Valid Valid Shift words in xmm1 right imm8 by imm8 while shifting in 05 OF D2 r PSRLD mm A Valid Valid Shift doublewords in mm mm m64 right by amount specified in mm m64 while shifting in Os 66 OF D2 r PSRLD xmm1 A Valid Valid Shift doublewords xmm1 xmm2 m128 right by amount specified in xmm 2 m128 while shif
304. signed doublew ord integer in r32 using truncation F3REXWOF2C CVTTSS2SIr64 Valid N E Convert one single precision xmm m32 floating point value from xmm m32 to one signed quadword integer in r64 using truncation Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg w ModRM r m NA NA Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes 52 H intel CWD CDQ CQO Convert Word to Doubleword Convert Doubleword to Quadword Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode 99 CWD A Valid Valid DX AX lt sign extend of AX 99 CDQ A Valid Valid EDX EAX lt sign extend of REX W 99 CQO A Valid N E RDX RAX lt sign extend of RAX Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A NA NA NA NA DAA Decimal Adjust AL after Addition Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode 27 DAA A Invalid Valid Decimal adjust AL after addition Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A NA NA NA NA DAS Decimal Adjust AL after Subtraction Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode 2F DAS A Invalid Valid Decimal adjust AL after subtraction Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A NA NA NA NA
305. sing INVLPG may be delayed if software does not re-allocate that portion of the linear-address space or the memory that had been associated with it. However, because of speculative execution (or errant software), there may be accesses to the freed portion of the linear-address space before the invalidations occur. In this case, the following can happen: Reads can occur to the freed portion of the linear-address space. Therefore, invalidation should not be delayed for an address range that has read side effects. The processor may retain entries in the TLBs and paging-structure caches for an extended period of time. Software should not assume that the processor will not use entries associated with a linear address simply because time has passed. As noted in Section 4.10.2.1, the processor may create an entry in a paging-structure cache even if there are no translations for any linear address that might use that entry. Thus, if software has marked "not present" all entries in a page table, the processor may subsequently create a PDE-cache entry for the PDE that references that page table (assuming that the PDE itself is marked "present"). If software attempts to write to the freed portion of the linear-address space, the processor might not generate a page fault. (Such an attempt would likely be the result of a software error.) For that reason, the page frames previously associated with the freed portion of the linear-address space should no
306. sors assume the processor-ordering model or a weaker memory-ordering model. The Intel Core 2 Duo, Intel Atom, Intel Core Duo, Pentium 4, Intel Xeon, and P6 family processors do not implement a strong memory-ordering model, except when using the UC memory type. Despite the fact that Pentium 4, Intel Xeon, and P6 family processors support processor ordering, Intel does not guarantee that future processors will support this model. To make software portable to future processors, it is recommended that operating systems provide critical-region and resource-control constructs and APIs (application program interfaces) based on I/O, locking, and/or serializing instructions to synchronize access to shared areas of memory in multiple-processor systems. Also, software should not depend on processor ordering in situations where the system hardware does not support this memory-ordering model. 8.3 SERIALIZING INSTRUCTIONS. The Intel 64 and IA-32 architectures define several serializing instructions. These instructions force the processor to complete all modifications to flags, registers, and memory by previous instructions and to drain all buffered writes to memory before the next instruction is fetched and executed. For example, when a MOV to control register instruction is used to load a new value into control register CR0 to enable protected mode, the processor must perform a serializing operation before it enters protected mode. This serializing o
307. ssembly instruction syntax using a letter to cross-reference to a row entry in the operand encoding definition table that follows the instruction summary table. The definition table is organized according to the order of operands in Intel assembly syntax. The encoding method for each operand in the instruction byte stream is expressed via a modR/M, imm8/16/32/64, etc. cross-reference. NOTES: The letters in the Op/En column of an instruction apply ONLY to the encoding definition table immediately following the instruction summary table. In the encoding definition table, the letter 'r' within a pair of parentheses denotes that the content of the operand will be read by the processor. The letter 'w' within a pair of parentheses denotes that the content of the operand will be updated by the processor. 3.1.1.4 64-bit Mode Column in the Instruction Summary Table. The 64-bit Mode column indicates whether the opcode sequence is supported in 64-bit mode. The column uses the following notation: Valid: Supported. Invalid: Not supported. N.E.: Indicates an instruction syntax is not encodable in 64-bit mode (it may represent part of a sequence of valid instructions in other modes). N.P.: Indicates the REX prefix does not affect the legacy instruction in 64-bit mode. N.I.: Indicates the opcode i
308. ssors and Intel Xeon processors than a full 40-bit read. On 64-bit Intel Xeon processors with L3, performance counters with indices 18-25 are 32-bit counters. EDX is cleared after executing RDPMC for these counters. On Intel Xeon processor 7100 series with L3, performance counters with indices 18-25 are also 32-bit counters. In the Intel Core 2 processor family and the Intel Xeon processor 3000, 5100, 5300, and 7400 series, the fixed-function performance counters are 40 bits wide; they can be accessed by RDPMC with ECX between 4000_0000H and 4000_0002H. On Intel Xeon processor 7400 series, there are eight 32-bit special-purpose counters addressable with indices 2-9, ECX[30] = 0. When in protected or virtual-8086 mode, the performance-monitoring counters enabled (PCE) flag in register CR4 restricts the use of the RDPMC instruction as follows. When the PCE flag is set, the RDPMC instruction can be executed at any privilege level; when the flag is clear, the instruction can only be executed at privilege level 0. (When in real-address mode, the RDPMC instruction is always enabled.) The performance-monitoring counters can also be read with the RDMSR instruction when executing at privilege level 0. The performance-monitoring counters are event counters that can be programmed to count events such as the number of instructions decoded, number of
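The CR4.PCE rule for RDPMC quoted in entry 308 can be restated as a small predicate. This is a minimal sketch of the rule as worded in the excerpt, not a model of any particular processor; the function and parameter names (`rdpmc_allowed`, `real_address_mode`, `pce`, `cpl`) are hypothetical and chosen only for illustration.

```python
def rdpmc_allowed(real_address_mode: bool, pce: bool, cpl: int) -> bool:
    """Return True if RDPMC may execute under the rule quoted in the excerpt.

    real_address_mode: True when the processor is in real-address mode.
    pce:               state of the PCE flag in CR4.
    cpl:               current privilege level, 0 (most privileged) to 3.
    """
    if not 0 <= cpl <= 3:
        raise ValueError("CPL must be in the range 0..3")
    if real_address_mode:
        # The excerpt states RDPMC is always enabled in real-address mode.
        return True
    # Protected or virtual-8086 mode: any level if PCE is set, else CPL 0 only.
    return pce or cpl == 0
```

For instance, `rdpmc_allowed(False, False, 3)` evaluates to `False`, matching the statement that with PCE clear the instruction can only be executed at privilege level 0.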
309. st one core is TED L3 FLL ENABL unhalted and all L3 ways are enabled E Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes 281 Documentation Changes 86H 01H UNC_CYCLES_UNHAL TED_L3_FLL_DISABL E Uncore cycles that at least one core is unhalted and all L3 ways are disabled Table 7 Fixed Function Performance Counter and Pre defined Performance Events Fixed Function CTR2 IA32 PERF FIXED CT R2 Performance Event Mask Counter Address Mnemonic Description MSR PERF FIXED 309H Inst RetiredAny This event counts the number of CTRO instructions that retire execution For IA32_PERF_FIXED_CT instructions that consist of multiple micro RO ops this event counts the retirement of the last micro op of the instruction The counter continue counting during hardware interrupts traps and inside interrupt handlers MSR_PERF_FIXED_ 30AH CPU CLK UNHALT This event counts the number of core CTR1 ED CORE cycles while the core is not in a halt state 32 PERF FIXED CT The core enters the halt state when it is R1 running the HLT instruction This event is a component in many key event ratios MSR PERF FIXED 30BH CPU CLK UNHALT This event counts the number of ED REF reference cycles when the core is not in a halt state and not in a TM stop clock state The core enters the halt state when it is running the HLT instruction or the MWAI
310. t EN (bit 11) and the extended-mode bit EXTD (bit 10) in the IA32_APIC_BASE MSR. Figure 10-26. IA32_APIC_BASE MSR Supporting x2APIC (bit positions 63:36, 35:12, 11, 10, 9, 8, 7:0; fields: Reserved; APIC Base, the base physical address; EN, xAPIC global enable/disable; EXTD, enable x2APIC mode; BSP, processor is BSP). Table 10-5. x2APIC Operating Mode Configurations (xAPIC global enable IA32_APIC_BASE[11], x2APIC enable IA32_APIC_BASE[10], description): 0, 0, local APIC is disabled; 0, 1, invalid; 1, 0, local APIC is enabled in xAPIC mode; 1, 1, local APIC is enabled in x2APIC mode. Once the local APIC has been switched to x2APIC mode (EN = 1, EXTD = 1), switching back to xAPIC mode would require system software to disable the local APIC unit. Specifically, attempting to write a value to the IA32_APIC_BASE MSR that has (EN = 1, EXTD = 0) when the local APIC is enabled and in x2APIC mode causes a general-protection exception. Once bit 10 in the IA32_APIC_BASE MSR is set, the only way to leave x2APIC mode using IA32_APIC_BASE would require a WRMSR to set both bit 11 and bit 10 to zero. Section 10.12.5, "x2APIC State Transitions," provides a detailed state diagram for the state transitions allowed for the local APIC. The MSR address range 800H through BFFH is architecturally reserved and dedicated for accessing APIC re
311. t VMCS. This document frequently uses the term "the VMCS" to refer to the current VMCS. The VMLAUNCH, VMREAD, VMRESUME, and VMWRITE instructions operate only on the current VMCS. The following items describe how a logical processor determines which VMCSs are active and which is current: The memory operand of the VMPTRLD instruction is the address of a VMCS. After execution of the instruction, that VMCS is both active and current on the logical processor. Any other VMCS that had been active remains so, but no other VMCS is current. The memory operand of the VMCLEAR instruction is also the address of a VMCS. After execution of the instruction, that VMCS is neither active nor current on the logical processor. If the VMCS had been current on the logical processor, the logical processor no longer has a current VMCS. Footnote 1: The amount of memory required for a VMCS region is at most 4 KBytes. The exact size is implementation specific and can be determined by consulting the VMX capability MSR IA32_VMX_BASIC (see Appendix G.1). Footnote 2: Software can determine a processor's physical-address width by executing CPUID with 80000008H in EAX. The physical-address width is returned in bits 7:0 of EAX. The VMPTRST instruction stores the address of the logical processor's current VMCS int
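The mode encoding in Table 10-5 of entry 310 can be illustrated by decoding bits 11 and 10 of an IA32_APIC_BASE value. The sketch below is illustrative only: the names `apic_mode` and `is_direct_x2apic_to_xapic` are hypothetical, the MSR value is passed in as a plain integer (nothing here reads a real MSR), and only the one faulting transition stated in the excerpt is modeled.

```python
EN_BIT = 11    # xAPIC global enable (IA32_APIC_BASE[11])
EXTD_BIT = 10  # x2APIC enable (IA32_APIC_BASE[10])

_MODES = {
    (0, 0): "disabled",
    (0, 1): "invalid",
    (1, 0): "xAPIC",
    (1, 1): "x2APIC",
}

def apic_mode(apic_base_msr: int) -> str:
    """Decode the local APIC operating mode from an IA32_APIC_BASE value."""
    en = (apic_base_msr >> EN_BIT) & 1
    extd = (apic_base_msr >> EXTD_BIT) & 1
    return _MODES[(en, extd)]

def is_direct_x2apic_to_xapic(current_msr: int, new_msr: int) -> bool:
    """True for the write the excerpt says raises a general-protection
    exception: EN = 1, EXTD = 0 written while already in x2APIC mode."""
    return apic_mode(current_msr) == "x2APIC" and apic_mode(new_msr) == "xAPIC"
```

Under this sketch, leaving x2APIC mode is modeled as a write that clears both bits (mode "disabled"), which is the path the excerpt describes.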
312. t Values Opcode Instruction Op 64 bit Compat Description En Mode Leg Mode 58 r ADDSS 1 A Valid Valid Add the low single precision xmm2 m32 floating point value from xmm2 m32 to xmm1 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg r w ModRM r m r NA NA Intel 64 and IA 32 Architectures Software Developer s Manual Documentation Changes 15 Documentation Changes ADDSUBPD Packed Double FP Add Subtract intel Opcode Instruction Op 64 bit Compat Description En Mode Leg Mode 66 OF DO r ADDSUBPDxmml1 Valid Valid Add subtract double xmm2 m128 precision floating point values from xmm2 m128 to xmm1 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg r w ModRM r m NA NA ADDSUBPS Packed Single FP Add Subtract Opcode Instruction Op 64 bit Compat Description En Mode Leg Mode F2 OF DO r ADDSUBPSxmm1 Valid Valid Add subtract single xmm2 m128 precision floating point values from xmm2 m128 to xmm1 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg r w ModRM r m r NA NA AND Logical AND Opcode Instruction Op 64 bit Compat Description En Mode Leg Mode 24 ib AND AL imm8 C Valid Valid AL AND imm8 25 iw ANDAX imm16 C Valid Valid AX AND imm16 25 id ANDEAX imm32 Valid Valid EAX AND imm32 REXW 25id
313. t address DS E SI at dword at address ES E DI For 64 bit mode compare dword at address R E SI at dword at address The status flags are set accordingly Compares quadw ord at address R E SI with quadw ord at address R E DI and sets the status flags accordingly For legacy mode compare byte at address DS E SI with byte at address ES E DI For 64 bit mode compare byte at address R E SI with byte at address R E DI The status flags are set accordingly 7 CMPSW Valid Valid Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes For legacy mode compare word at address DS E SI with word at address ES E DI For 64 bit mode compare word at address R E SI with word at address R E DI The status flags are set accordingly 36 Documentation Changes Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode A7 CMPSD A Valid Valid For legacy mode compare dword at address DS E SI with dword at address ES E DI For 64 bit mode compare dword at address RJE SI with dword at address R E DI The status flags are set accordingly REX W A7 CMPSQ A Valid Compares quadword at address R E SI with quadword at address R E DI and sets the status flags accordingly Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A NA NA NA NA CMPSD Compare Scalar Double Precision Floating Point Values
314. t be reallocated for another purpose until the appropriate invalidations have been performed. 4. Updates to Chapter 5, Volume 3A. Change bars show changes to Chapter 5 of the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3A: System Programming Guide, Part 1. 5.3 LIMIT CHECKING. The limit field of a segment descriptor prevents programs or procedures from addressing memory locations outside the segment. The effective value of the limit depends on the setting of the G (granularity) flag (see Figure 5-1). For data segments, the limit also depends on the E (expansion direction) flag and the B (default stack pointer size and/or upper bound) flag. The E flag is one of the bits in the type field when the segment descriptor is for a data-segment type. When the G flag is clear (byte granularity), the effective limit is the value of the 20-bit limit field in the segment descriptor. Here, the limit ranges from 0 to FFFFFH (1 MByte). When the G flag is set (4-KByte page granularity), the processor scales the value in the limit field by a factor of 2^12 (4 KBytes). In this case, the effective limit ranges from FFFH (4 KBytes) to FFFFFFFFH (4 GBytes). Note that when scaling is used (G flag is set), the lower 12 bits of a segment offset (address) are not checked against the limit; for example, note that if the segment limit is
315. tains the following information from the paging-structure entries used to translate linear addresses with the page number: The physical address corresponding to the page number (the page frame). The access rights from the paging-structure entries used to translate linear addresses with the page number (see Section 4.6): the logical-AND of the R/W flags; the logical-AND of the U/S flags; the logical-OR of the XD flags (necessary only if IA32_EFER.NXE = 1). Attributes from a paging-structure entry that identifies the final page frame for the page number (either a PTE or a paging-structure entry in which the PS flag is 1): the dirty flag (see Section 4.8); the memory type (see Section 4.9). 4.10.1.3 Details of TLB Use. Because the TLBs cache only valid translations, there can be a TLB entry for a page number only if the P flag is 1 and the reserved bits are 0 in each of the paging-structure entries used to translate that page number. In addition, the processor does not cache a translation for a page number unless the accessed flag is 1 in each of the paging-structure entries used during translation; before caching a translation, the processor sets any of these accessed flags that is not already 1. The processor may cache translations required for prefetches and for accesses that are a result of speculative execution that would never actually occur in the executed code path. If the page number of a linear ad
316. tes that this local APIC has completed sending any previous IPIs. 1 (Send Pending): Indicates that this local APIC has not completed sending the last IPI. Destination: Specifies the target processor or processors. This field is only used when the destination shorthand field is set to 00B. If the destination mode is set to physical, then bits 56 through 59 contain the APIC ID of the target processor for Pentium and P6 family processors, and bits 56 through 63 contain the APIC ID of the target processor for the Pentium 4 and Intel Xeon processors. If the destination mode is set to logical, the interpretation of the 8-bit destination field depends on the settings of the DFR and LDR registers of the local APICs in all the processors in the system (see Section 10.6.2, "Determining IPI Destination"). Not all combinations of options for the ICR are valid. Table 10-3 shows the valid combinations for the fields in the ICR for the Pentium 4 and Intel Xeon processors; Table 10-4 shows the valid combinations for the fields in the ICR for the P6 family processors. Also note that the lower half of the ICR may not be preserved over transitions to the deepest C-States. ICR operation in x2APIC mode is discussed in Section 10.12.9. 10.6.2 Determining IPI Destination. The destination of an IPI can be one, all, or a subset (group) of the processors on the system bus. The sender of the IPI specifies the destination of an IPI with the following APIC regist
317. that bit is set to 1. When supported, the feature is available in both xAPIC mode and x2APIC mode. System software desiring to perform directed EOIs for level-triggered interrupts should set bit 12 of the Spurious Interrupt Vector Register and follow each EOI to the local APIC for a level-triggered interrupt with a directed EOI to the I/O APIC generating the interrupt (this is done by writing to the I/O APIC's EOI register). System software performing directed EOIs must retain a mapping associating level-triggered interrupts with the I/O APICs in the system. 10.8.6 Task Priority in IA-32e Mode. In IA-32e mode, operating systems can manage the 16 priority classes of external interrupts (see Section 10.8.3, "Interrupt, Task, and Processor Priority") explicitly using the task priority register (TPR). Operating systems can use the TPR to temporarily block specific (low-priority) interrupts from interrupting a high-priority task. This is done by loading TPR with a value corresponding to the highest-priority interrupt that is to be blocked. For example: Loading the TPR with a value of 8 (01000B) blocks all interrupts with a priority of 8 or less while allowing all interrupts with a priority of nine or more to be recognized. Loading the TPR with zero enables all external interrupts. Loading the TPR with 0FH (01111B) disables all external interrupts. The TPR (shown in Figure 10-18) is cleared to 0 on reset. 64
318. the APIC registers for Intel 64 or IA-32 processors on the system bus are initially mapped to the same 4-KByte region of the physical address space. Software has the option of changing the initial mapping to a different 4-KByte region for all the local APICs or of mapping the APIC registers for each local APIC to its own 4-KByte region. Section 10.4.5, "Relocating the Local APIC Registers," describes how to relocate the base address for APIC registers. On processors supporting x2APIC architecture (indicated by CPUID.01H:ECX[21] = 1), the local APIC supports operation in the xAPIC mode as described in Section 10.4. Additionally, software can enable the local APIC to operate in x2APIC mode for extended processor addressability (see Section 10.12). NOTE: In processors based on Intel Microarchitecture Nehalem, the Local APIC ID Register is no longer Read/Write; it is Read Only. Table 10-1. Local APIC Register Address Map (address, register name, software read/write): FEE0 0000H, Reserved; FEE0 0010H, Reserved; FEE0 0020H, Local APIC ID Register, Read/Write; FEE0 0030H, Local APIC Version Register, Read Only; FEE0 0040H, Reserved; FEE0 0050H, Reserved; FEE0 0060H, Reserved; FEE0 0070H, Reserved; FEE0 0080H, Task Priority Register (TPR), Read/Write
319. the legacy local xAPIC ID (8 bits) are preserved across this transition. A transition from the x2APIC mode to xAPIC mode is not valid, and the corresponding WRMSR to the IA32_APIC_BASE MSR causes a general-protection exception. A RESET in this state places the x2APIC in xAPIC mode. All APIC registers (including the local APIC ID register) are initialized as described in Section 10.12.5.1. An INIT in this state keeps the x2APIC in the x2APIC mode. The state of the local APIC ID register is preserved (all 32 bits). However, all the other APIC registers are initialized as a result of the INIT transition. 10.12.7 CPUID Extensions And Topology Enumeration. For Intel 64 and IA-32 processors that support x2APIC, a value of 1 reported by CPUID.01H:ECX[21] indicates that the processor supports x2APIC and the extended topology enumeration leaf (CPUID.0BH). The extended topology enumeration leaf can be accessed by executing CPUID with EAX = 0BH. Processors that do not support x2APIC may support CPUID leaf 0BH. Software can detect the availability of the extended topology enumeration leaf (0BH) by performing two steps: (1) Check the maximum input value for basic CPUID information by executing CPUID with EAX = 0. If CPUID.0H:EAX is greater than or equal to 11 (0BH), then proceed to the next step. (2) Check that CPUID.(EAX=0BH, ECX=0H):EBX is non-zero. If both of the above conditions are true, the extended topology enumeration leaf is available. If available, the
320. ther adverse behavior. Such an exception will occur at most once for each affected linear address (see Section 4.10.3.1). If it is also the case that no invalidation was performed the last time the P flag was changed from 1 to 0, the processor may use a TLB entry or paging-structure cache entry that was created when the P flag had earlier been 1. If a paging-structure entry is modified to change the accessed flag from 1 to 0, failure to perform an invalidation may result in the processor not setting that bit in response to a subsequent access to a linear address whose translation uses the entry. Software cannot interpret the bit being clear as an indication that such an access has not occurred. If software modifies a paging-structure entry that identifies the final physical address for a linear address (either a PTE or a paging-structure entry in which the PS flag is 1) to change the dirty flag from 1 to 0, failure to perform an invalidation may result in the processor not setting that bit in response to a subsequent write to a linear address whose translation uses the entry. Software cannot interpret the bit being clear as an indication that such a write has not occurred. The read of a paging-structure entry in translating an address being used to fetch an instruction may appear to execute before an earlier w
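The G-flag scaling described in entry 314 can be worked through numerically. The following is a minimal sketch under the assumption of an ordinary (not expand-down) segment; the function name `effective_limit` and its parameters are hypothetical. It fills the low 12 bits with ones when G is set, which reproduces the FFFH and FFFFFFFFH endpoints given in the excerpt.

```python
def effective_limit(limit_field: int, g_flag: bool) -> int:
    """Compute the effective segment limit from the 20-bit limit field.

    Assumption: a normal (not expand-down) segment.
    """
    if not 0 <= limit_field <= 0xFFFFF:
        raise ValueError("limit field must fit in 20 bits")
    if g_flag:
        # 4-KByte granularity: scale by 2**12; the low 12 offset bits are
        # not checked, so they are treated as all ones.
        return (limit_field << 12) | 0xFFF
    # Byte granularity: the field is the limit itself.
    return limit_field
```

With G set, a limit field of 0 gives FFFH and a limit field of FFFFFH gives FFFFFFFFH, matching the range stated in the text.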
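The TPR examples in entry 317 can be checked with a short sketch. It assumes that an interrupt's priority class is the upper four bits of its vector (vector bits 7:4), which is how the priority classes of Section 10.8.3 are usually described; the function names `priority_class` and `is_blocked` are hypothetical.

```python
def priority_class(vector: int) -> int:
    """Priority class of an interrupt vector (assumed to be bits 7:4)."""
    if not 0 <= vector <= 0xFF:
        raise ValueError("vector must be in the range 0..255")
    return vector >> 4

def is_blocked(vector: int, tpr: int) -> bool:
    """True if the TPR value blocks this vector.

    Per the excerpt, a TPR value of N blocks priorities N or less.
    """
    if not 0 <= tpr <= 0xF:
        raise ValueError("TPR value must be in the range 0..15")
    return priority_class(vector) <= tpr
```

With `tpr = 8`, vector 80H (class 8) is blocked and vector 90H (class 9) is recognized. With `tpr = 0`, every vector from 10H upward is recognized, and with `tpr = 0FH` all of them are blocked.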
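The two-step detection of CPUID leaf 0BH described in entry 319 can be sketched as follows. Python cannot execute CPUID directly, so the sketch takes a caller-supplied function `cpuid(eax, ecx)` returning an `(eax, ebx, ecx, edx)` tuple; that callable, `has_leaf_0bh`, and the `fake_cpuid` stand-in with its made-up register values are all hypothetical and exist only to make the example runnable.

```python
from typing import Callable, Tuple

Regs = Tuple[int, int, int, int]  # (EAX, EBX, ECX, EDX)

def has_leaf_0bh(cpuid: Callable[[int, int], Regs]) -> bool:
    """Apply the two checks from the excerpt to a CPUID-like callable."""
    # Step 1: maximum basic leaf is returned in EAX for input EAX = 0.
    max_basic_leaf = cpuid(0x0, 0x0)[0]
    if max_basic_leaf < 0x0B:
        return False
    # Step 2: CPUID.(EAX=0BH, ECX=0H):EBX must be non-zero.
    ebx = cpuid(0x0B, 0x0)[1]
    return ebx != 0

def fake_cpuid(eax: int, ecx: int) -> Regs:
    """Stand-in for the real instruction, with invented register values."""
    table = {
        (0x0, 0x0): (0x0D, 0, 0, 0),     # max basic leaf = 0DH
        (0x0B, 0x0): (1, 2, 0x100, 0),   # EBX non-zero
    }
    return table.get((eax, ecx), (0, 0, 0, 0))
```

Calling `has_leaf_0bh(fake_cpuid)` returns `True` for the stand-in above; a stand-in that reports a maximum basic leaf below 0BH, or a zero EBX for leaf 0BH, yields `False`.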
321. ting in Os OF 72 2 ib PSRLD mm imm8 Valid Valid Shift doublewords in mm right by imm8 while shifting in Os Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes 152 Documentation Changes Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode 66 0F72 2ib PSRLDxmm1 B Valid Valid Shift doublewords in xmm1 imm8 right by imm8 while shifting in 0s OF D3 r PSRLQ mm A Valid Valid Shift mm right by amount mm m64 specified in mm m64 while shifting in Os 66 OF D3 r PSRLQ 1 A Valid Valid Shift quadwords xmm1 xmm2 m128 right by amount specified in xmm2 m128 while shifting in Os OF 73 2 ib PSRLQ mm imm8 Valid Valid Shift mm right by imm8 while shifting in Os 66 0 73 2ib PSRLQ xmml1 B Valid Valid Shift quadwords in xmm1 imm8 right by imm8 while shifting in 0s Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg w ModRM r m ModRM r m w imm8 NA NA PSUBB PSUBW PSUBD Subtract Packed Integers Opcode OF F8 r 66 OF F8 r OF F9 r 66 OF F9 r OF FA r Instruction PSUBB mm mm m64 PSUBB xmm1 xmm2 m128 PSUBW mm mm m64 PSUBW xmm1 xmm2 m128 PSUBD mm mm m64 Op 64 Bit En Mode A Valid A Valid A Valid A Valid A Valid Compat Leg Mode Valid Valid Valid Valid Valid Intel 64 and 32 Architectures Software Develo
322. ting point values from xmm2 m128 to xmml 66 OF 11 r MOVUPD B Valid Valid Move packed double xmm2 m128 precision floating point xmm values from xmm1 to xmm2 m128 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg w ModRM r m r NA NA B ModRM r m w ModRM reg r NA NA MOVUPS Move Unaligned Packed Single Precision Floating Point Values Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode OF 10 r MOVUPS xmm1 Valid Valid Move packed single xmm2 m128 precision floating point values from xmm2 m128 to xmml OF 11 r MOVUPS B Valid Valid Move packed single xmm2 m128 precision floating point xmm1 values from xmm1 to xmm2 m128 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg w ModRM r m NA NA B ModRM r m ModRM reg NA NA Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes 106 Documentation Changes MOVZX Move with Zero Extend MOVZX r16 r m8 MOVZX r32 r m8 A MOVZX r64 r m8 A Description Leg Mode Move byte to word with zero extension Move byte to doubleword zero extension Move byte to quadword zero extension Move word to doublew ord zero extension Move word to quadword zero extension Opcode Instruction OF B6 r OF B6 r REX W 0F B6 Ir OF B7 r MOVZX r32 r m16 REX W 0F
323. tion En Mode Leg Mode 66 OF E6 CVTTPD2DQ A Valid Valid Convert two packed double 1 precision floating point xmm2 m128 values from xmm2 m128 to Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg w ModRM r m NA NA CVTTPD2PI Convert with Truncation Packed Double Precision FP Values to Packed Dword Integers Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode 66 OF 2C r CVTTPD2PImm Valid Valid Convert two packer double xmm m128 precision floating point values from xmm m128 to two packed signed doubleword integers mm using truncation Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg w ModRM r m NA NA Intel 64 and 1 32 Architectures Software Developer s Manual Documentation Changes 50 chenes intel CVTTPS2DQ Convert with Truncation Packed Single Precision FP Values to Packed Dword Integers Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode OF 5B r CVTTPS2DQ A Valid Valid Convert four single xmml precision floating point xmm2 m128 values from xmm2 m128 to four signed doubleword integers in xmm1 using truncation Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 ModRM reg w ModRM r m CVTTPS2P Convert with Truncation Packed Single Precision FP Values
324. tion Changes 99 H intel MOVNTPS Store Packed Single Precision Floating Point Values Using Non Temporal Hint Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode OF 2B r MOVNTPS m128 A Valid Valid Move packed single xmm precision floating point values from xmm to m128 using non temporal hint Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM r m w ModRM reg r NA NA MOVNTQ Store of Quadword Using Non Temporal Hint Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode OF E7 Jr MOVNTQ m64 A Valid Valid Move quadword from mm to mm m64 using non temporal hint Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM r m ModRM reg NA NA Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes 100 chenes intel MOVQ Move Quadword Opcode Instruction Op 64 Bit Compat Description En Leg Mode OF 6F r MOVQ mm A Valid Valid Move quadword from mm m64 mm m64 to mm OF 7F r MOVQ mm m64 Valid Valid Move quadword from mm to mm mm m64 F30F 7E MOVQ A Valid Valid Move quadword from xmm2 m64 xmm2 mem64 to xmm1 66 OF D6 MOVQ B Valid Valid Move quadword from xmm1 xmm2 m64 to xmm2 mem64 xmm1 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 ModRM reg
325. tions in system memory. These operations are typically used to manage shared data structures (such as semaphores, segment descriptors, system segments, or page tables) in which two or more processors may try simultaneously to modify the same field or flag. The processor uses three interdependent mechanisms for carrying out locked atomic operations: (1) guaranteed atomic operations; (2) bus locking, using the LOCK# signal and the LOCK instruction prefix; (3) cache coherency protocols that ensure that atomic operations can be carried out on cached data structures (cache lock); this mechanism is present in the Pentium 4, Intel Xeon, and P6 family processors. These mechanisms are interdependent in the following ways. Certain basic memory transactions (such as reading or writing a byte in system memory) are always guaranteed to be handled atomically. That is, once started, the processor guarantees that the operation will be completed before another processor or bus agent is allowed access to the memory location. The processor also supports bus locking for performing selected memory operations (such as a read-modify-write operation in a shared area of memory) that typically need to be handled atomically, but are not automatically handled this way. Because frequently used memory locations are often cached in a processor's L1 or L2 caches, atomic operations can often be carried out inside a processor's caches without asserting the bus lock. Here the processor's ca
326. to Packed Dword Integers Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode OF 2C r CVTTPS2PImm A Valid Valid Convert two single xmm m64 precision floating point values from xmm m64 to two signed doubleword signed integers in mm using truncation Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 ModRM reg w ModRM r m Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes 51 H intel CVTTSD2SI Convert with Truncation Scalar Double Precision FP Value to Signed Integer Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode F2 OF 2C r CVTTSD2SIT32 Valid Valid Convert one double xmm m64 precision floating point value from xmm m64 to one signed doubleword integer in r32 using truncation F2REX WOF2C CVTTSD2SIr64 A Valid N E Convert one double Ir xmm m64 precision floating point value from xmm m64 to one signedquadw ord integer in r64 using truncation Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg w ModRM r m r NA NA CVTTSS2S Convert with Truncation Scalar Single Precision FP Value to Dword Integer Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode 2C r 55251 32 A Valid Valid Convert one single precision xmm m32 floating point value from xmm m32 to one
327. tory pointer table. Table 4-8 illustrates how CR3 is used with PAE paging. Table 4-8. Use of CR3 with PAE Paging. Bit position(s): contents. 4:0: ignored. 31:5: physical address of the 32-byte aligned page-directory-pointer table used for linear-address translation. 63:32: ignored (these bits exist only on processors supporting the Intel 64 architecture). The page-directory-pointer table comprises four (4) 64-bit entries called PDPTEs. Each PDPTE controls access to a 1-GByte region of the linear-address space. Corresponding to the PDPTEs, the logical processor maintains a set of four (4) internal, non-architectural PDPTE registers, called PDPTE0, PDPTE1, PDPTE2, and PDPTE3. The logical processor loads these registers from the PDPTEs in memory as part of certain executions of the MOV to CR instruction. If PAE paging would be in use following an execution of MOV to CR0 or MOV to CR4 (see Section 4.1.1) and the instruction is modifying any of CR0.CD, CR0.NW, CR0.PG, CR4.PAE, CR4.PGE, or CR4.PSE, then the PDPTEs are loaded from the address in CR3. If MOV to CR3 is executed while the logical processor is using PAE paging, the PDPTEs are loaded from the address being loaded into CR3. If PAE paging is in use and a task switch changes the value of CR3, then the PDPTEs are loaded from the addres
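The CR3 layout in Table 4-8 can be illustrated with a small bit-manipulation sketch. The function names are hypothetical, and the use of linear-address bits 31:30 to pick a PDPTE is an inference from the statement that four PDPTEs each cover a 1-GByte region; treat it as a model, not as the processor's actual lookup logic.

```python
PDPTE_SIZE_BYTES = 8  # each PDPTE is a 64-bit entry


def pdpt_base_from_cr3(cr3: int) -> int:
    """Return the 32-byte aligned table address held in CR3 bits 31:5."""
    return cr3 & 0xFFFFFFE0  # bits 4:0 and 63:32 are ignored


def pdpte_index(linear_address: int) -> int:
    """Four entries of 1 GByte each: assume bits 31:30 select the entry."""
    return (linear_address >> 30) & 0x3


def pdpte_address(cr3: int, linear_address: int) -> int:
    """Physical address of the PDPTE covering linear_address."""
    return pdpt_base_from_cr3(cr3) + PDPTE_SIZE_BYTES * pdpte_index(linear_address)
```

For example, with the table at 0x1000, a linear address in the third gigabyte (0x80000000) would use the entry at 0x1010 under these assumptions.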
328. truction BLENDPD Blend Packed Double Precision Floating Point Values Opcode Instruction Op 64 bit Compat Description En Mode Leg Mode 660F3A0D r BLENDPDxmml A Valid Valid Select packed DP FP values ib xmm2 m128 from xmm1 and imm8 xmm2 m128 from mask specified in imm8 and store the values into xmm1 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg r w ModRM r m r imm8 NA BLENDPS Blend Packed Single Precision Floating Point Values Opcode Instruction Op 64 bit Compat Description En Mode Leg Mode 66 0F 3A0C r BLENDPSxmml A Valid Valid Select packed single ib xmm2 m128 precision floating point imm8 values from xmm1 and xmm2 m128 from mask specified in imm8 and store the values into xmm1 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg r w ModRM r m r imm8 NA Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes 20 Documentation Changes BLENDVPD Variable Blend Packed Double Precision Floating Point Values Opcode Instruction Op 64 bit Compat Description En Leg Mode 660F3815 r BLENDVPDxmml1 Valid Valid Select packed DP FP values xmm2 m128 from xmm1 and xmm2 from lt 0 gt mask specified XMMO and store the values xmm1 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A M
329. ts generated by attempts to execute, in virtual-8086 mode, privileged instructions that are not recognized in that mode. 2. MOV DR is an exception to this rule; see Section 22.13. 10. Updates to Chapter 25, Volume 3B. Change bars show changes to Chapter 25 of the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3B: System Programming Guide, Part 2. 25.2.2 EPT Translation Mechanism. Because a PDPTE is identified using bits 47:30 of the guest-physical address, it controls access to a 1-GByte region of the guest-physical address space. Use of the PDPTE depends on the value of bit 7 in that entry. If bit 7 of the PDPTE is 1, the EPT PDPTE maps a 1-GByte page (see Table 25-2). The final physical address is computed as follows. Table 25-2. Format of an EPT Page-Directory-Pointer-Table Entry (PDPTE) that Maps a 1-GByte Page. Bit position(s): contents. 0: Read access; indicates whether reads are allowed from the 1-GByte page referenced by this entry. 1: Write access; indicates whether writes are allowed to the 1-GByte page referenced by this entry. 2: Execute access; indicates whether instruction fetches are allowed from the 1-GByte page referenced by this entry. 5:3: EPT memory type for this 1-GByte page (see Section 25.2.4). 6: Ignore PA
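As a rough illustration of the low-order fields listed in Table 25-2, the following sketch decodes an EPT PDPTE value. The function and key names are hypothetical, and only the bits that appear in the excerpt (0, 1, 2, 5:3, and 7) are decoded.

```python
def decode_ept_pdpte(entry: int) -> dict:
    """Decode the fields of an EPT PDPTE that the excerpt describes."""
    return {
        "read": bool(entry & 0x1),             # bit 0: reads allowed
        "write": bool((entry >> 1) & 0x1),     # bit 1: writes allowed
        "execute": bool((entry >> 2) & 0x1),   # bit 2: instruction fetches allowed
        "memory_type": (entry >> 3) & 0x7,     # bits 5:3: EPT memory type
        "maps_1g_page": bool((entry >> 7) & 0x1),  # bit 7: entry maps a 1-GByte page
    }
```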
330. ty 3C3H 947 MSR_UNCORE_PE Package See Section 30 6 2 2 Uncore Performance RFEVTSEL3 Event Configuration Facility 3C4H 948 MSR_UNCORE_PE Package See Section 30 6 2 2 Uncore Performance RFEVTSEL4 Event Configuration Facility 3C5H 949 MSR_UNCORE PE Package See Section 30 6 2 2 Uncore Performance RFEVTSEL5 Event Configuration Facility 3C6H 950 MSR UNCORE Package See Section 30 6 22 Uncore Performance RFEVTSEL6 Event Configuration Facility 3C7H 951 MSR_UNCORE PE Package See Section 30 6 2 2 Uncore Performance RFEVTSEL7 Event Configuration Facility 403H 1027 MSR MISC Package See Section 15 3 2 4 A32 MCi MISC MSRs Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes 287 Documentation Changes intel Register Scope Address Register Name Bit Description Hex Dec 407H 1031 MC1 MISC Package See Section 15 3 2 4 1 32_ MSRs 40BH 1035 MSR MC2 MISC Core See Section 15 3 24 A32 MCi MISC MSRs 40CH 1036 MSR MC3 CTL Core See Section 15 3 2 1 IA32 MCi CTL MSRs 40DH 1037 MSR MC3 Core See Section 15 3 22 IA32 MCi STATUS STATUS MSRS 40EH 1038 MSR_MC3_ADDR Core See Section 15 3 2 3 IA32_MCi_ADDR MSRs The MSR MC4 ADDR register is either not implemented or contains no address if the ADD
331. type. 8.4.4.2 Typical AP Initialization Sequence. When an AP receives the SIPI, it begins executing BIOS AP initialization code at the vector encoded in the SIPI. The AP initialization code typically performs the following operations: (Page footnotes: 1. MOV CR8 is not defined architecturally as a serializing instruction. 2. LFENCE does provide some guarantees on instruction ordering. It does not execute until all prior instructions have completed locally, and no later instruction begins execution until LFENCE completes.) 1. Waits on the BIOS initialization Lock Semaphore. When control of the semaphore is attained, initialization continues. 2. Loads the microcode update into the processor. 3. Initializes the MTRRs (using the same mapping that was used for the BSP). 4. Enables the cache. 5. Executes the CPUID instruction with a value of 0H in the EAX register, then reads the EBX, ECX, and EDX registers to determine if the AP is "GenuineIntel". 6. Executes the CPUID instruction with a value of 1H in the EAX register, then saves the values in the EAX, ECX, and EDX registers in a system configuration space in RAM for use later. 7. Switches to protected mode and ensures that the APIC address space is mapped to the strong uncacheable (UC) memory type. 8.7.11 MICROCODE UPDATE Resources. In an Intel processor supporting Intel Hyper
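The "GenuineIntel" check in the CPUID step can be sketched as follows. This is only a model that assembles the vendor string from register values passed in as integers; it does not execute CPUID itself, and the function name is hypothetical. The register order EBX, EDX, ECX is the standard CPUID leaf 0 convention rather than something stated in the excerpt.

```python
import struct


def cpuid_vendor_string(ebx: int, ecx: int, edx: int) -> str:
    """Assemble the 12-character vendor string from CPUID leaf 0 registers.

    The string is stored little-endian in the order EBX, EDX, ECX.
    """
    return struct.pack("<III", ebx, edx, ecx).decode("ascii")


def is_genuine_intel(ebx: int, ecx: int, edx: int) -> bool:
    return cpuid_vendor_string(ebx, ecx, edx) == "GenuineIntel"
```

With EBX = 0x756E6547, EDX = 0x49656E69, and ECX = 0x6C65746E, the assembled string is "GenuineIntel".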
332. ual Documentation Changes 172 H intel Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A NA NA NA NA B imm16 NA NA NA ROUNDPD Round Packed Double Precision Floating Point Values Opcode Instruction Op 64 Bit Compat Description En Leg Mode 66 OF 3A09 r ROUNDPDxmml A Valid Valid Round packed double ib xmm2 m128 precision floating point imm8 values in xmm2 m128 and place the result in xmm1 The rounding mode is determined by imm8 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg w ModRM r m r imm8 NA ROUNDPS Round Packed Single Precision Floating Point Values Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode 66 0F 3A 08 ROUNDPS xmml A Valid Valid Round packed single Ir ib xmm2 m128 precision floating point imm8 values in xmm2 m128 and place the result in xmm1 The rounding mode is determined by imm8 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg w ModRM r m r imm8 NA Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes 173 chenes intel ROUNDSD Round Scalar Double Precision Floating Point Values Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode 66 0F3A0B r ROUNDSDxmml Valid Valid Round the low packed ib xmm2 m
333. uction Operand Encoding Op En A Operand 1 NA Operand 2 NA Operand 3 NA Operand 4 NA Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes 64 chenes intel INSERTPS Insert Packed Single Precision Floating Point Value Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode 660F3A21 r INSERTPS xmm1 A Valid Valid Insert a single precision ib xmm2 m32 imm8 floating point value selected by imm8 from xmm2 m32 into xmm1 at the specified destination element specified by imm8 and zero out destination elements in xmm1 as indicated in imm8 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg w ModRM r m r imm8 NA INT n INTO INT 3 Call to Interrupt Procedure Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode CC INT 3 A Valid Valid Interrupt 3 trap to debugger CD ib INT imm8 B Valid Valid Interrupt vector number specified by immediate byte CE INTO A Invalid Valid Interrupt 4 if overflow flag is 1 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A NA NA NA NA B imm8 NA NA NA Operation The following operational description applies not only to the INT n and INTO instructions but also to external interrupts and exceptions IF PE 0 THEN GOTO REAL ADDRESS MODE ELSE PE 1 Intel 64 an
334. unsigned saturation Instruction Operand Encoding Op En A Operand 1 ModRM reg r w Operand 2 ModRM r m r Operand 3 PACKUSWB Pack with Unsigned Saturation Operand 4 NA NA Opcode OF 67 r 66 OF 67 r Instruction PACKUSWB mm mm m64 xmm2 m128 PACKUSWB xmm1 Op 64 Bit En Mode A Valid Valid Compat Leg Mode Valid Valid Description Converts 4 signed word integers from mm and 4 signed word integers from mm m64 into 8 unsigned byte integers in mm using unsigned saturation Converts 8 signed word integers from xmm1 and 8 signed word integers from xmm2 m128 into 16 unsigned byte integers in xmm1 using unsigned saturation Instruction Operand Encoding Op En A Operand 1 ModRM reg r w Operand 2 ModRM r m r Operand 3 Operand 4 NA NA Intel 64 and IA 32 Architectures Software Developer s Manual Documentation Changes 117 H intel PADDB PADDW PADDD Add Packed Integers Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode OF FC r PADDB mm A Valid Valid Add packed byte integers mm m64 from mm m64 and mm 66 OF FC r PADDB 1 A Valid Valid Add packed byte integers xmm2 m128 from xmm2 m128 and xmm1 OFFD r PADDW mm A Valid Valid Add packed word integers mm m64 from mm m64 and mm 66 OF FD r PADDW 1 A Valid Valid Add packed word integers xmm2 m128 from xmm2
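The unsigned saturation described for PACKUSWB above can be modelled on plain Python lists. This is a sketch of the arithmetic only, with hypothetical function names; it assumes the usual layout in which the destination's converted words occupy the low half of the result and the source's the high half.

```python
def saturate_to_unsigned_byte(word: int) -> int:
    """Clamp a signed 16-bit value into the unsigned byte range 0..255."""
    return max(0, min(255, word))


def packuswb(dest_words: list, src_words: list) -> list:
    """Model PACKUSWB: pack signed words from both operands into unsigned bytes."""
    return [saturate_to_unsigned_byte(w) for w in dest_words + src_words]
```

For the MMX form each operand holds 4 words and the result 8 bytes; for the XMM form, 8 words and 16 bytes.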
335. unsigned packed words in mm and saturate result 66 OF D9 r PSUBUSW 1 A Valid Valid Subtract packed unsigned xmm2 m128 word integers in xmm2 m128 from packed unsigned word integers in xmm1 and saturate result Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg r w ModRM r m r NA NA Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes 155 Documentation Changes PTEST Logical Compare Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode 66 0F 38 17 r A Valid Valid Set ZF if xmm2 m128 AND xmm2 m128 xmm1 result is all 05 Set CF if xmm2 m128 AND NOT xmm1 result is all Os Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg ModRM r m r NA NA eee Unpack High ala Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode OF 68 r PUNPCKHBW mm A Valid Valid Unpack and interleave high mm m64 order bytes from mm and mm m64 into mm 66 OF 68 r PUNPCKHBW A Valid Valid Unpack and interleave high 1 order bytes from xmm1 and xmm2 m128 xmm2 m128 into xmm1 OF 69 r PUNPCKHWD mm A Valid Valid Unpack and interleave high mm m64 order words from mm and mm m64 into mm 66 OF 69 r PUNPCKHWD A Valid Valid Unpack and interleave high 1 order words from xmm1 xmm2 m128 xmm2 m128 into xmm1 OF 6A r PUNPCK
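The PTEST flag rules quoted above (ZF from the AND, CF from the AND NOT) can be checked with a small model that treats each 128-bit operand as a Python integer. The function name is hypothetical, and the sketch only computes the two flags the excerpt mentions.

```python
MASK128 = (1 << 128) - 1


def ptest(xmm1: int, xmm2_m128: int) -> tuple:
    """Return (ZF, CF) as PTEST would set them for two 128-bit values."""
    zf = (xmm2_m128 & xmm1) == 0               # ZF: AND result is all 0s
    cf = (xmm2_m128 & ~xmm1 & MASK128) == 0    # CF: AND NOT result is all 0s
    return zf, cf
```

Read this way, ZF reports that the two operands share no set bits, and CF reports that every bit set in the source is also set in xmm1.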
336. upport CPUID function 80000008H, the width is generally 36 if CPUID.01H:EDX.PAE [bit 6] = 1 and 32 otherwise. This width is referred to as MAXPHYADDR. MAXPHYADDR is at most 52. CPUID.80000008H:EAX[15:8] reports the linear-address width supported by the processor. Generally, this value is 48 if CPUID.80000001H:EDX.LM [bit 29] = 1 and 32 otherwise. (Processors that do not support CPUID function 80000008H support a linear-address width of 32.) 4.2 HIERARCHICAL PAGING STRUCTURES: AN OVERVIEW. All three paging modes translate linear addresses using hierarchical paging structures. This section provides an overview of their operation. Section 4.3, Section 4.4, and Section 4.5 provide details for the three paging modes. Every paging structure is 4096 bytes in size and comprises a number of individual entries. With 32-bit paging, each entry is 32 bits (4 bytes); there are thus 1024 entries in each structure. With PAE paging and IA-32e paging, each entry is 64 bits (8 bytes); there are thus 512 entries in each structure. (PAE paging includes one exception, a paging structure that is 32 bytes in size, containing 4 64-bit entries.) The processor uses the upper portion of a linear address to identify a series of paging-structure entries. The last of these entries identifies the physical address of the region to which the linear address translates (called the page frame). The lower portion of the linear address (called the page offset) identifies the
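The entry counts quoted above follow directly from dividing the structure size by the entry size, and the address widths are byte fields of EAX. A minimal sketch, with hypothetical function names; the assumption that EAX[7:0] carries the physical-address width comes from the standard CPUID leaf 80000008H layout, since the excerpt itself only shows EAX[15:8].

```python
PAGING_STRUCTURE_BYTES = 4096


def entries_per_structure(entry_bits: int) -> int:
    """Number of entries in a 4096-byte paging structure."""
    return PAGING_STRUCTURE_BYTES // (entry_bits // 8)


def address_widths(eax_80000008h: int) -> tuple:
    """Return (physical width, linear width) from CPUID.80000008H:EAX."""
    physical_width = eax_80000008h & 0xFF       # EAX[7:0], assumed MAXPHYADDR
    linear_width = (eax_80000008h >> 8) & 0xFF  # EAX[15:8], linear-address width
    return physical_width, linear_width
```

So 32-bit entries give 4096 / 4 = 1024 entries, and 64-bit entries give 4096 / 8 = 512.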
337. upt for which the processor INTR signal is currently being asserted. If at the time the INTA cycle is issued, the interrupt that was to be dispensed has become masked (programmed by software), the local APIC will deliver a spurious-interrupt vector. Dispensing the spurious-interrupt vector does not affect the ISR, so the handler for this vector should return without an EOI. The vector number for the spurious-interrupt vector is specified in the spurious-interrupt vector register (see Figure 10-23). The functions of the fields in this register are as follows. Spurious Vector: Determines the vector number to be delivered to the processor when the local APIC generates a spurious vector. (Pentium 4 and Intel Xeon processors.) Bits 0 through 7 of this field are programmable by software. (P6 family and Pentium processors.) Bits 4 through 7 of this field are programmable by software, and bits 0 through 3 are hardwired to logical ones. Software writes to bits 0 through 3 have no effect. APIC Software Enable/Disable: Allows software to temporarily enable (1) or disable (0) the local APIC (see Section 10.4.3, "Enabling or Disabling the Local APIC"). Focus Processor Checking: Determines if focus processor checking is enabled (0) or disabled (1) when using the lowest-priority delivery mode. In Pentium 4 and Intel Xeon processors, this bit is reserved and should be cleared to 0. Suppress EOI Broadcasts: Determines whether an EOI for a level-triggered
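The register fields described above can be sketched as a decoder. The excerpt does not give the bit positions (it defers to Figure 10-23), so the positions used here (vector in bits 7:0, software enable in bit 8, focus processor checking in bit 9, suppress EOI broadcasts in bit 12) are assumptions taken from the commonly documented register layout, and all names are hypothetical.

```python
def decode_spurious_vector_register(svr: int) -> dict:
    """Decode the fields of the spurious-interrupt vector register."""
    return {
        "vector": svr & 0xFF,                                      # assumed bits 7:0
        "apic_software_enabled": bool((svr >> 8) & 0x1),           # assumed bit 8, 1 = enabled
        "focus_checking_enabled": not bool((svr >> 9) & 0x1),      # assumed bit 9, 0 = enabled
        "suppress_eoi_broadcasts": bool((svr >> 12) & 0x1),        # assumed bit 12
    }


def p6_effective_vector(written_vector: int) -> int:
    """On P6 family and Pentium, vector bits 3:0 are hardwired to ones."""
    return (written_vector & 0xF0) | 0x0F
```

On a P6 family processor, for instance, writing vector 0x30 would be observed as 0x3F in this model.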
338. ure Intel platform innovations. Specifically, the x2APIC architecture does the following: Retains all key elements of compatibility to the xAPIC architecture (delivery modes, interrupt and processor priorities, interrupt sources, interrupt destination types); Provides extensions to scale processor addressability for both the logical and physical destination modes; Adds new features to enhance performance of interrupt delivery; Reduces complexity of logical destination mode interrupt delivery on link-based platform architectures; Uses MSR programming interface to access APIC registers in x2APIC mode instead of memory-mapped interfaces. Memory-mapped interface is supported when operating in xAPIC mode. 10.12.1 Detecting and Enabling x2APIC Mode. Processor support for x2APIC mode can be detected by executing CPUID with EAX=1 and then checking ECX, bit 21. If CPUID.(EAX=1):ECX.21 is set, the processor supports the x2APIC capability and can be placed into the x2APIC mode. System software can place the local APIC in the x2APIC mode by setting the x2APIC mode enable bit (bit 10) in the IA32_APIC_BASE MSR at MSR address 01BH. The layout for the IA32_APIC_BASE MSR is shown in Figure 10-26. Table 10-5, "x2APIC operating mode configurations", describes the possible combinations of the enable bi
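The detection and enabling steps can be modelled on plain integers standing in for the CPUID result and the MSR value. This sketch does not read or write real hardware registers; the names are hypothetical, and the position of the xAPIC global enable bit (bit 11) is an assumption, since the excerpt is cut off before stating it.

```python
IA32_APIC_BASE_MSR = 0x1B   # MSR address given in the text
X2APIC_FEATURE_BIT = 21     # CPUID.(EAX=1):ECX bit 21
EXTD_BIT = 1 << 10          # x2APIC mode enable bit
EN_BIT = 1 << 11            # xAPIC global enable bit (assumed position)


def supports_x2apic(cpuid_1_ecx: int) -> bool:
    """True if the ECX value returned by CPUID with EAX=1 reports x2APIC support."""
    return bool((cpuid_1_ecx >> X2APIC_FEATURE_BIT) & 0x1)


def apic_base_with_x2apic_enabled(apic_base: int) -> int:
    """Return the IA32_APIC_BASE value with both enable bits set."""
    return apic_base | EN_BIT | EXTD_BIT
```

System software would write the returned value back to the MSR at address 01BH only after `supports_x2apic` returns true.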
339. word xmm2 m128 integers from xmm2 m128 and xmm1 and saturate the results Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg r w ModRM r m r NA NA PADDUSB PADDUSW Add Packed Unsigned Integers with Unsigned Saturation Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode OF DC r PADDUSB mm A Valid Valid Add packed unsigned byte mm m64 integers from mm m64 and mm and saturate the results 66 OF DC r PADDUSB xmm1 Valid Valid Add packed unsigned byte xmm2 m128 integers from xmm2 m128 and xmm1 saturate the results OF DD r PADDUSW mm A Valid Valid Add packed unsigned word mm m64 integers from mm m64 and mm and saturate the results 66 OF DD r PADDUSW xmm1 Valid Valid Add packed unsigned word xmm2 m128 integers from xmm2 m128 to xmm1 and saturate the results Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes 119 H intel Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg r w ModRM r m r NA NA PALIGNR Packed Align Right Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode OF 3A OF PALIGNR mm1 A Valid Valid Concatenate destination and mm2 m64 imm8 source operands extract byte aligned result shifted to the right by constant value in imm8 into mm1 66 OF 3A OF PALIGNR xmm1 A Valid Valid Concatenate d
340. word in mm left by imm8 while shifting in 05 150 chenes intel Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode 66 0 73 6 PSLLQ xmm1 Valid Valid Shift quadwords in xmm1 imm8 left by imm8 while shifting in 0s Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg r w ModRM r m r NA NA B ModRM r m r w imm8 NA NA PSRAW PSRAD Shift Packed Data Right Arithmetic Opcode Instruction 64 Bit Compat Description En Mode Leg Mode OF El r PSRAW mm A Valid Valid Shift words in mm right by mm m64 mm m64 while shifting in sign bits 66 OF El r PSRAW xmm1 A Valid Valid Shift words in xmm1 right xmm2 m128 by xmm2 m128 while shifting in sign bits OF 71 4 ib PSRAW mm imm8 B Valid Valid Shift words in mm right by imm8 while shifting in sign bits 66 0F 71 4ib PSRAW 1 B Valid Valid Shift words in xmm1 right imm8 by imm8 while shifting in sign bits OF E2 PSRAD mm A Valid Valid Shift doublewords in mm mm m64 right by mm m64 while shifting in sign bits 66 OF E2 r PSRAD xmm1 A Valid Valid Shift doubleword in xmm1 xmm2 m128 right by xmm2 m128 while shifting in sign bits OF 72 4 ib PSRAD mm imm8 Valid Valid Shift doublewords in mm right by imm8 while shifting in sign bits 66 0 72 4 ib 1 B Valid Valid Shift doublewords in xmm1 imm8 right by imm8 while shifting in
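The phrase "while shifting in sign bits" in the PSRAW entries above describes an arithmetic right shift. A minimal model for one 16-bit element, with hypothetical names; the handling of counts above 15 (every bit becomes a copy of the sign bit) is an assumption based on the instruction's usual definition, not something shown in the excerpt.

```python
def psraw_element(word: int, count: int) -> int:
    """Arithmetic right shift of one 16-bit element, given as 0..0xFFFF."""
    signed = word - 0x10000 if word & 0x8000 else word  # reinterpret as signed
    shifted = signed >> min(count, 15)                   # Python's >> keeps the sign
    return shifted & 0xFFFF                              # back to a 16-bit pattern


def psraw(words: list, count: int) -> list:
    """Apply the same shift count to every packed word."""
    return [psraw_element(w, count) for w in words]
```

A logical shift such as PSRLW would fill with zeros instead, so 0x8000 shifted right by 1 becomes 0xC000 here but 0x4000 there.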
341. xmm1 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg w ModRM reg r NA NA MOVHPD Move High Packed Double Precision Floating Point Value Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode 66 OF 16 r MOVHPD xmm A Valid Valid Move double precision m64 floating point value from m64 to high quadword of xmm 66 OF 17 r MOVHPD m64 B Valid Valid Move double precision xmm floating point value from high quadword of xmm to m64 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg r w ModRM r m r NA NA B ModRM r m w ModRM reg r NA NA Intel 64 and IA 32 Architectures Software Developer s Manual Documentation Changes 95 chenes intel MOVHPS Move High Packed Single Precision Floating Point Values Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode OF 16 r MOVHPS xmm A Valid Valid Move two packed single m64 precision floating point values from m64 to high quadword of xmm OF 17 Jr MOVHPS m64 B Valid Valid Move two packed single xmm precision floating point values from high quadword of xmm to m64 Instruction Operand Encoding Operand 1 Operand 2 Operand 3 Operand 4 ModRM reg r w ModRM r m NA NA ModRM r m w ModRM reg NA NA Packed Single Precision Floating Point Values Low to Ig Opcode Instruction Op 64 Bit C
342. xmm2 m128 comparison of string data imm8 with implicit lengths generating an index and storing the result in ECX Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg ModRM r m r imm8 NA Intel 64 and 32 Architectures Software Developer s Manual Documentation Changes 124 chenes intel PCMPISTRM Packed Compare Implicit Length Strings Return Mask Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode 66 62 PCMPISTRM Valid Valid Perform a packed imm8 xmm1 comparison of string data xmm2 m128 with implicit lengths imm8 generating a mask and storing the result in XMM0 Instruction Operand Encoding Op En Operand 1 Operand 2 Operand 3 Operand 4 A ModRM reg ModRM r m imm8 NA PCMPGTB PCMPGTW PCMPGTD Compare Packed Signed Integers for Greater Than Opcode Instruction Op 64 Bit Compat Description En Mode Leg Mode OF 64 PCMPGTB mm A Valid Valid Compare packed signed byte mm m64 integers in mm and mm m64 for greater than 66 OF 64 r 1 A Valid Valid Compare packed signed byte xmm2 m128 integers in xmm1 and xmm2 m128 for greater than OF 65 r PCMPGTW mm A Valid Valid Compare packed signed mm m64 word integers in mm and mm m64 for greater than 66 OF 65 r PCMPGTW xmm1 A Valid Valid Compare packed signed xmm2 m128 word integers in xmm1 xmm2 m
343. y memory-management software to manage the transfer of pages and paging structures into and out of physical memory. Whenever the processor uses a paging-structure entry as part of linear-address translation, it sets the accessed flag in that entry (if it is not already set). Whenever there is a write to a linear address, the processor sets the dirty flag (if it is not already set) in the paging-structure entry that identifies the final physical address for the linear address (either a PTE or a paging-structure entry in which the PS flag is 1). 4.9.2 Paging and Memory Typing When the PAT is Supported (Pentium III and More Recent Processor Families). If the PAT is supported, paging contributes to memory typing in conjunction with the PAT and the memory-type range registers (MTRRs) as specified in Table 11-7 in Section 11.5.2.2. The PAT is a 64-bit MSR (IA32_PAT; MSR index 277H) comprising eight (8) 8-bit entries (entry i comprises bits 8i+7:8i of the MSR). For any access to a physical address, the table combines the memory type specified for that physical address by the MTRRs with a memory type selected from the PAT. Table 11-11 in Section 11.12.3 specifies how a memory type is selected from the PAT. Specifically, it comes from entry i of the PAT, where i is defined as follows: For access to an entry in
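The statement that entry i of the PAT comprises bits 8i+7:8i of the MSR translates into a one-line extraction. The function name is hypothetical, and the sample value 0x0007040600070406 is used only as an illustrative MSR value.

```python
IA32_PAT_MSR_INDEX = 0x277  # MSR index given in the text


def pat_entry(pat_msr: int, i: int) -> int:
    """Return PAT entry i, that is, bits 8i+7:8i of the 64-bit MSR value."""
    if not 0 <= i <= 7:
        raise ValueError("the PAT has eight entries, numbered 0 through 7")
    return (pat_msr >> (8 * i)) & 0xFF


def pat_entries(pat_msr: int) -> list:
    """All eight entries, entry 0 first."""
    return [pat_entry(pat_msr, i) for i in range(8)]
```

For the sample value, entry 0 is 0x06 and entry 1 is 0x04.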