Home

7 Series FPGAs Configurable Logic Block User Guide (UG474)

1. LUT D 6 Inputs Lo DD 05 t y DMUX FF LAT DQ DX gt D 0 CE LUT F7BMUX Inputs i Dc gt CMUX FF LAT CX gt t A are CLK D a F8MUX ee SR CK LUT SE BD 06 De Inputs 05 t 1 gt BMUX FF LAT BX gt cr rer CLK 0 SR CE LUT F7AMUX Kh A 6 Inputs gt O6 LO A O5 t 1 gt AMUX AX gt i D a FF LAT CE CK AQ SR CE D CLK E gt SR E gt gt UG474_c5_01_101210 Figure 5 1 Simplified 7 Series FPGA Slice 58 www xilinx com 7 Series FPGAs CLB User Guide send Feedback UG474 v1 7 November 17 2014 XILINX General Timing Parameters CLB General Slice Timing Model and Parameters Table 5 1 shows the general slice timing parameters for a majority of the paths in Figure 5 1 Table 5 1 General Slice Timing Parameters Parameter Function Description Combinatorial Delays Tro A B C Dinputs to A B C D Propagation delay from the A B C D LUT inputs of the slice outputs through the look up tables LUTs to the A B C D outputs of the slice Tro A B C D inputs through Propagation delay from the A B C D LUT inputs of the slice transparent latch to AQ BQ through the LUTs to the AO BO CO DO outputs of the slice CO DO outputs sequential elements configured as a latch Sequential Delays Tcko FF Clock CLK to AQ BQ Time af
2. Super Logic Region SLR UG474_c6_03_090211 Figure 6 3 Stacked Silicon Interconnect Technology Designs targeting devices using SSI technology only require one STARTUPE2 instantiation One source for GSR and GTS signals propagates across the SLR boundaries to all elements of the device Optimization for devices using SSI technology can start with the same techniques as for any standard device including using proper design techniques timing constraints or options to automatically find the best implementation Floorplanning for these devices can use the same graphical tools For more information on devices using SSI technology see the following e DS180 7 Series FPGAs Overview e Stacked Silicon Interconnect Technology page www xilinx com technology roadmap stacked silicon interconnect e WP380 Xilinx Stacked Silicon Interconnect Technology Delivers Breakthrough FPGA Capacity Bandwidth and Power Efficiency 74 www xilinx com 7 Series FPGAs CLB User Guide send Feedback UG474 v1 7 November 17 2014
3. WEN MC31 cK PSRLO SR AD 0 1 sR ID D CE 1 D CLK gt d E CK 1 WEN WE gt CIN UG474 c2 02 110510 Figure 2 3 Diagram of SLICEM 7 Series FPGAs CLB User Guide www xilinx com 19 UG474 v1 7 November 17 2014 send Feedback Chapter 2 Functional Details XILINX n SRHI Reset Type B NITO DE SR Sync Async o FF LAT gt gt DMUX D6 1 OS A6 A1 DD Bb n FF LAT oINIT1 alma D gINITo USRHI CE SRLO n SRLO 5 SR B INIT1 CE gINITO CK gp gt EN gt CMUX C6 1 DD A6 A1 DDC og D FF LAT BINITT aH ca D INITO B SRHI ex BSRLO SR BX DD gt BMUX B6 1 gt A6 A1 LOB de D FF LAT oINIT1 AH BQ D GINITO CE OSRHI CK o SRLO SR ud AX t gt AMUX A6 1 D A6 A1 tr D 06 A n o FF LAT EN z 2 AX BINI1T aH AQ gt D B SRHI 8580 SR 0 1 sa gt D CE CLK D EJ CIN UG474_c2_03_101210 Figure 2 4 Diagram of SLICEL 20 www xilinx com 7 Series FPGAs CLB User Guide send Feedback UG474 v1 7 November 17 2014 XILINX Look Up Table LUT Look Up Table LUT The function generators in 7 series FPGAs are implemented as six input look up tables LU
4. 7 Series FPGAs CLB User Guide www xilinx com UG474 v1 7 November 17 2014 Date Version Revision 08 6 2013 1 5 Added Artix 7 devices Updated references to implementation tools 08 11 2014 1 6 Revised footnotes in Table 1 1 through Table 1 3 Revised polarity from independent to programmable in Control Signals page 22 Added Primitive column to Table 2 3 and removed footnotes Renamed or made minor revisions to Figure 2 6 through Figure 2 14 Revised sections Clock WCLK page 49 Clock CLK page 50 and Clock C page 51 11 17 2014 1 7 Updated Table 1 1 for new Artix 7A15T device UG474 v1 7 November 17 2014 www xilinx com 7 Series FPGAs CLB User Guide 7 Series FPGAs CLB User Guide www xilinx com UG474 v1 7 November 17 2014 Table of Contents REVISION TNA 2 Preface About This Guide Guid Contents rar a nahh Ronnie hn ea Ra Finn lon a EN Men 7 Additional Support Resources 8 Chapter 1 Overview CLB Overview ia hd ad ee e e ee Re vb e BS 9 7 Series CLB Features eod erp e aire ecce Itn ue e a e Rt e e dicas 10 Device Resources eee 10 Recommended Design Flow 12 Pinout Planning c sida eh iper Dew 12 Chapter 2 Functional Details CLE Arrangement see nad qid dr pardo o dare Fl edes elio a
5. Additional Support Resources To find additional documentation see the Xilinx website at www xilinx com support documentation index To search the Answer Database of silicon software and IP questions and answers or to create a technical support WebCase see the Xilinx website at www xilinx com support 8 www xilinx com 7 Series FPGAs CLB User Guide send Feedback UG474 v1 7 November 17 2014 XILINX Chapter 1 Overview CLB Overview The 7 series configurable logic block CLB provides advanced high performance FPGA logic e Real 6 input look up table LUT technology e Dual LUT5 5 input LUT option e Distributed Memory and Shift Register Logic capability e Dedicated high speed carry logic for arithmetic functions e Wide multiplexers for efficient utilization CLBs are the main logic resources for implementing sequential as well as combinatorial circuits Each CLB element is connected to a switch matrix for access to the general routing matrix shown in Figure 1 1 A CLB element contains a pair of slices COUT COUT ts a I CLB I l I l y l I I Switch l Matrix I I l I Slice 0 I l I t I L 2 de m l CIN CIN UG474 c1 01 071910 Figure 1 1 Arrangement of Slices within the CLB The LUTs in 7 series FPGAs can be configured as either a 6 input LUT with one output or as two 5 input LUTs with separate outputs but common addre
6. os I I B 6 1 SEL B 1 0 DUE uo A 6 1 s I I I AMUX gt 8 1 MUX I LUT I Output 2 I I 06 AQ _ Registered I Output SEL A 1 0 DATA A 3 07 6 KE b I Input 2 I i I I Optional I AX I UG474 c2 21 101210 Figure 2 22 Two 8 1 Multiplexers in a Slice 7 Series FPGAs CLB User Guide www xilinx com 41 UG474 v1 7 November 17 2014 send Feedback Chapter 2 Functional Details 42 16 1 Multiplexer XILINX Each slice has an FEMUX F8MUX combines the outputs of F AMUX and F7BMUxX to form a combinatorial function up to 27 inputs or a 16 1 MUX Only one 16 1 MUX can be implemented in a slice as shown in Figure 2 23 SLICE TATE TTT deg entene testen ede iei 1 I LUT I I I 06 I l I SEL D 1 0 DATA D 3 0 REN IED e A 6 1 F7BMUX i Input I I I I LUT l I I 06 I I SEL C 1 0 DATA C 3 0 C16 1 6 A 6 1 input F8MUX I Cx BMUX ssi CX y 16 1 MUX Output I LUT I I B __ Registered I 06 Output i I SEL B 1 0 DATA B 3 07 B 6 1 I m i A 6 1 F7AMUX Optional I l I LUT l l I I 06 I SEL A 1 0 DATA A 3 0 A 6 1 6 Input A 6 1 i l I AX SELF7 s I I SELF8 AK j CLK pisi pam saei eut pui futs Gum Dni jussi Van dum ii fum dui Mud Rei NE RUM nei Iud me mdi Gum jus dum iu UG474 c2 22 101210 Figure 2 23 16 1
7. 2014 www xilinx com Send Feedback t Chapter 2 Functional Details XILINX Slice Description Every slice contains e Four logic function generators or look up tables e Eight storage elements e Wide function multiplexers e Carry logic These elements are used by all slices to provide logic arithmetic and ROM functions In addition some slices support two additional functions storing data using distributed RAM and shifting data with 32 bit registers Slices that support these additional functions are called SLICEM others are called SLICEL SLICEM shown in Figure 2 3 represents a superset of elements and connections found in all slices SLICEL is shown in Figure 2 4 Each CLB can contain two SLICEL or a SLICEL and a SLICEM 18 www xilinx com 7 Series FPGAs CLB User Guide pena Fecupack UG474 v1 7 November 17 2014 XILINX Slice Description Da il Reset Type DINITI Q CE GINITO O Sync Async CK 66 OFF LAT OFF LAT DINITI DINITO n SRHI WEN MC31 o FF LAT B INIT1 DINITO O SRHI WEN MC31 n SRLO o FF LAT B INIT1 B INITO o SRHI o SRLO SR n FF LAT DINIT1 D nINITO CE OSRHI
8. e Single Port 32 x 1 bit RAM e Dual Port 32 x 1 bit RAM e Quad Port 32 x 2 bit RAM e Simple Dual Port 32 x 6 bit RAM e Single Port 64 x 1 bit RAM e Dual Port 64 x 1 bit RAM e Quad Port 64 x 1 bit RAM e Simple Dual Port 64 x 3 bit RAM e Single Port 128 x 1 bit RAM e Dual Port 128 x 1 bit RAM 7 Series FPGAs CLB User Guide www xilinx com Send Feedback 23 UG474 v1 7 November 17 2014 XILINX Chapter 2 Functional Details e Single Port 256 x 1 bit RAM Distributed RAM modules are synchronous write resources A synchronous read can be implemented with a flip flop in the same slice By using this flip flop the distributed RAM performance is improved by decreasing the delay into the clock to out value of the flip flop However an additional clock latency is added The distributed elements share the same clock input For a write operation the Write Enable WE input driven by either the CE or WE pin of a SLICEM must be set High Table 2 3 shows the number of LUTs four per slice occupied by each distributed RAM configuration See UG953 Vivado Design Suite 7 Series FPGA and Zyng 7000 All Programmable SoC Libraries Guide for details of available distributed RAM primitives Table 2 3 Distributed RAM Configuration RAM Description Primitive Number of LUTs 32 15 Single port RAM32X1S 1 32x1D Dual port RAM32X1D 2 32 x 20 Quad port RAM32M 4 32 x 6SDP Simple dual port RAM32M 4 64 x 15
9. gt AMUX CIN From Previous Slice C gt AQ Optional Can be used if unregistered registered outputs are free UG474 c2 23 071813 Figure 2 24 Fast Carry Logic Path and Associated Elements www xilinx com UG474 v1 7 November 17 2014 Send Feedback 43 Chapter 2 Functional Details XILINX The carry chains use carry lookahead logic along with the function generators There are 10 independent inputs and 8 independent outputs Inputs e S inputs 50 to 53 The propagate signals of the carry lookahead logic Sourced from the O6 output of a function generator e Dlinputs DI1 to DIA The generate signals of the carry lookahead logic Sourced from either the O5 output of a function generator To create a multiplier Or the BYPASS input AX BX CX or DX of a slice To create an adder accumulator CYINIT CIN of the first bit in a carry chain 0 for add 1 for subtract AX input for the dynamic first carry bit e CIN Cascade slices to form a longer carry chain e Outputs e O outputs O0 to O3 Contain the sum of the addition subtraction e CO outputs COO to CO3 Compute the carry out for each bit COS8is connected to COUT output of a slice to form a longer carry chain by cascading multiple slices The propagation delay for an adder increases linearly with the number of bits in the operand as more carry chains are cascaded The carry chain can be implemented with a storag
10. 0 at the DI input of the SRL and is reflected on the A B C D output after a delay of length Tggg after clock event 1 Because address 0 is specified at clock event 1 the data on the DI input is reflected at A B C D output because it is written to register 0 Clock Event 2 Shift In e At time Tps before clock event 2 the data becomes valid 1 at the DI input of the SRL and is reflected on the A B C D output after a delay of length Tpgg after clock event 2 Because address 0 is still specified at clock event 2 the data on the DI input is reflected at the D output because it is written to register 0 Clock Event 3 Shift In Addressable Asynchronous Read All Read operations are asynchronous to the CLK signal If the address is changed between clock events the contents of the register at that address are reflected at the addressable output A B C D outputs after a delay of length Ty o propagation delay through a LUT At time Tps before clock event 3 the data becomes valid 1 at the DI input of the SRL and is reflected on the A B C D output Tpgg time after clock event 3 68 Send Feedback www xilinx com 7 Series FPGAs CLB User Guide UG474 v1 7 November 17 2014 XILINX CLB Slice SRL Shift Register Timing Model and Parameters Available in SLICEM Only e The address is changed from 0 to 2 The value stored in register 2 at this time is a 0 in this example this was the first data shifted in and it is reflected on the
11. 2 Functional Details XILINX Feature Options Logic SLICEL E Global Clock Column E Logic SLICEM El High performance I O Based ASMBL O DSP O High range I O Architecture El Memory E Integrated IP O Clock Management Tile Mixed Signal O Transceivers Domain A Domain B Domain C Applications Applications Applications UG474 c2 24 071014 Figure 2 1 ASMBL Architecture The ASMBL architecture breaks through traditional design barriers by e Eliminating geometric layout constraints such as dependencies between I O count and array size e Enhancing on chip power and ground distribution by allowing power and ground to be placed anywhere on the chip e Allowing disparate integrated IP blocks to be scaled independent of each other and surrounding resources SSI Technology The 7 series FPGAs extend integration even higher by using the unique stacked silicon interconnect SSI technology SSI technology enables multiple super logic regions SLRs to be combined on a passive interposer layer to create a single FPGA with more than ten thousand inter SLR connections See Chapter 6 Advanced Topics for additional information CLB Slices A CLB element contains a pair of slices and each slice is composed of four 6 input LUTs and eight storage elements e SLICE 0 slice at the bottom of the CLB and in the left column e SLICE 1 slice at the top of the CLB and in the right column These two slices do not have direct connectio
12. 5 DATA 6 O 6 015 UG474_c2_06_070914 Figure 2 7 32 X 6 Simple Dual Port Distributed RAM RAM32M 7 Series FPGAs CLB User Guide www xilinx com 27 UG474 v1 7 November 17 2014 send Feedback Chapter 2 Functional Details XILINX RAM64X1S dei I I I gt Output I I I Registered Output P Optional l I 6 D 6 1 I A 5 0 TIN e I WCLK CLK n 00608 UG474_c2_07_101210 Figure 2 8 64 X 1 Single Port Distributed RAM RAM64X1S If four single port 64 x 1 bit modules are each built as shown in Figure 2 8 the four RAM64X1S primitives can occupy a SLICEM as long as they share the same clock write enable and shared read and write port address inputs This configuration equates to a 64 x 4 bit single port distributed RAM If two dual port 64 x 1 bit modules are each built as shown in Figure 2 9 the two RAM64X1D primitives can occupy a SLICEM as long as they share the same clock write enable and shared read and write port address inputs This configuration equates to a 64 x 2 bit dual port distributed RAM S RAM64X1D____ l i vo Br DH gt i Registered I D 6 1 egistere A 5 0 4 el Lo ES Output WCLK CLK _ Optional l we WE CE l I I I l l l I Registered DPRA 5 0
13. A B C D output after a delay of length Ty o Clock Event 32 MSB Most Significant Bit Changes e At time Trac after clock event 32 the first bit shifted into the SRL becomes valid logical 0 in this case on the DMUX output of the slice via the MC31 output of LUT A SRL This is also applicable to the AMUX BMUX CMUX DMUX and COUT outputs at time Tpgg after clock event 1 7 Series FPGAs CLB User Guide www xilinx com 69 UG474 v1 7 November 17 2014 e Chapter 5 Timing 70 Send Feedback www xilinx com XILINX 7 Series FPGAs CLB User Guide UG474 v1 7 November 17 2014 XILINX Chapter 6 Advanced Topics Using the Latch Function as Logic Because the latch function is level sensitive it can be used as the equivalent of a logic gate The primitives to specify this function are AND2B1L a 2 input AND gate with one input inverted and OR2L a 2 input OR gate as shown in Figure 6 1 D D AND2B1L OR2L UG474_c6_01_110210 Figure 6 1 AND2B1L and OR2L Components As shown in Figure 6 2 the data and SR inputs and Q output of the latch are used when the AND2BIL and OR2L primitives are instantiated and the CK gate and CE gate enables are held active High The AND2B1L combines the latch data input the inverted input on the gate DI with the asynchronous clear input SRI The OR2L combines the latch data input with an asynchronous preset Generally the latch data input comes from the output of a LUT wit
14. DSPA8E1 slice The carry logic runs vertically up every other column of slices SLICEL and SLICEM The Xilinx tools automatically place logic in a column when the carry logic is used When floorplanning carry logic based functions should always be specified as a group to avoid unnecessary breaks in the carry chain Carry logic cannot be cascaded across super logic regions SLRs in devices using stacked silicon interconnect SSI technology 7 Series FPGAs CLB User Guide www xilinx com 55 UG474 v1 7 November 17 2014 SE Chapter 4 Applications 56 Send Feedback www xilinx com XILINX 7 Series FPGAs CLB User Guide UG474 v1 7 November 17 2014 XILINX Chapter 5 Timing Although it is not necessary to understand the various timing parameters to implement most designs using Xilinx software a thorough timing model can assist advanced users in analyzing critical paths or planning speed sensitive designs especially in the larger and more complex 7 series FPGAs The user is recommended to use the models in this chapter in conjunction with both the Xilinx Timing Analyzer and the section on switching characteristics in the respective 7 series FPGAs data sheet All pin names parameter names and paths are consistent with the Timing Analyzer reports Most of the CLB timing parameters found in the data sheet section on switching characteristics are described in this chapter Five categories of functions are described
15. LUT configuration The other three LUTs are always effectively dual port although SPRAM32 can be selected when the read and write addresses are connected together Figure 2 6 through Figure 2 14 illustrate various example distributed RAM configurations occupying one SLICEM When using x2 configurations as in 32 X 2 Quad Port in Figure 2 6 A6 and WA6 are driven High by the software to keep O5 and O6 independent 7 Series FPGAs CLB User Guide www xilinx com 25 UG474 v1 7 November 17 2014 send Feedback Chapter 2 Functional Details RAM32M We ee CN TESEI cU DID 1 gt AX BX CX DX ADDRDI4 0 I DEA a l l tex WCLK 4 A WE gt CLK l WED i i l l i l i l i l l ADDRC 4 0 l l l l l l l l l ADDRB 4 0 l l l l l l l l l l ADDRA 4 0 l l l l l d Send Feedback www xilinx com XILINX DOD O DODI1 DOC 0 DOC 1 DOB O DOB 1 DOA 0 DOA 1 UG474 c2 06 070914 Figure 2 6 32 X 2 Quad Port Distributed RAM RAM32M 7 Series FPGAs CLB User Guide UG474 v1 7 November 17 2014 XILINX Distributed RAM Available in SLICEM Only RAM32M o DPRAM32 unused unused WADDR 5 1 WADDR 6 1 WCLK WED DATA 1 DATA 2 RADDR 5 1 RADDR 6 1 002 Ort DATA 3 DATA 4 04 O 3 DATA
16. Multiplexer in a Slice It is possible to create multiplexers wider than 16 1 across more than one SLICEM However there are no direct connections between slices to form these wide multiplexers Send Feedback www xilinx com 7 Series FPGAs CLB User Guide UG474 v1 7 November 17 2014 XILINX Carry Logic 7 Series FPGAs CLB User Guide Carry Logic In addition to function generators dedicated fast lookahead carry logic is provided to perform fast arithmetic addition and subtraction in a slice A 7 series FPGA CLB has two separate carry chains as shown in Figure 1 1 The carry chains are cascadable to form wider add subtract logic as shown in Figure 2 2 The carry chain runs upward and has a height of four bits per slice For each bit there is a carry multiplexer MUXCY and a dedicated XOR gate for adding subtracting the operands with a selected carry bits The dedicated carry path and carry multiplexer MUXCY can also be used to cascade function generators for implementing wide logic functions Figure 2 24 illustrates the carry chain with associated logic in a slice COUT To Next Slice O6 From LUTD 62 O5 From LUTD D Carry Chain Block CARRY4 gt DMUX DQ gt DMUX DQ Optional DD CMUX CQ gt CMUX CQ Optional O5 From LUTB BX gt BMUX BQ DD BMUX C gt BQ Optional gt AMUX AQ O5 From LUTA AX
17. XILINX SRLC32E FF Synchronous Output Write Enable UG474_c4_01_101210 Figure 4 1 Synchronous Shift Register This configuration provides a better timing solution and simplifies the design Because the flip flop must be considered to be the last register in the shift register chain the static or dynamic address should point to the desired length minus one If needed the cascadable output can also be registered in a flip flop Static Length Shift Registers The cascadable 32 bit shift register implements any static length mode shift register without the dedicated multiplexers F AMUX F7BMUX and F8MUX Figure 4 2 illustrates a 72 bit shift register Only the last SRLC32E primitive needs to have its address inputs tied to 0b00111 Alternatively shift register length can be limited to 71 bits address tied to 0000110 and a flip flop can be used as the last register In an SRLC32E primitive the shift register length is the address input 1 OUT 72 bit SRL OUT bit SRL 00110 72 bit SRL UG474_c4_02_101210 Figure 4 2 Example Static Length Shift Register 54 www xilinx com 7 Series FPGAs CLB User Guide UG474 v1 7 November 17 2014 XILINX Carry Logic Applications Carry Logic Applications Dedicated carry logic in the CLB improves the performance of arithmetic functions such as adders counters and comparators Designs that include simple co
18. and flip flops supported in the device Recommended Design Flow Pinout Planning 12 CLB resources are inferred for generic design logic and do not require instantiation Good HDL design is sufficient A few items to note CLB flip flops have either a set or a reset The designer must not use both set and reset Flip flops are abundant Pipelining should be considered to improve performance Control inputs are shared across a slice or CLB The number of unique control inputs required for a design should be minimized Control inputs include clock clock enable set reset and write enable A 6 input LUT can be used as a 32 bit shift register for efficient implementation A 6 input LUT can be used as a 64 x 1 memory for small storage requirements Dedicated carry logic implements arithmetic functions effectively These steps indicate the recommended design flow 1 2 Implement the design using preferred methodologies HDL IP etc Evaluate utilization reports to determine resources used Check to make sure arithmetic logic distributed RAM and SRL are used when helpful Consider flip flop usage a Pipeline for performance b Use dedicated flip flops at the outputs of dedicated resources block RAM DSP c Allow shift registers to use SRL avoid set resets Minimize the use of set resets Although the use of most resources affects the resulting device pinout CLB usage has little effect on pinouts because they
19. attribute SRHIGH forces a logic High at the storage element output when SR is asserted while SRLOW forces a logic Low at the storage element output see Table 2 2 Table 2 2 Truth Table when using SRLOW and SRHIGH SR SRVAL Function 0 SRLOW default No Logic Change 1 SRLOW default 0 0 SRHIGH No Logic Change 1 SRHIGH 1 SRHIGH and SRLOW can be set individually for each storage element in a slice The choice of synchronous SYNC or asynchronous ASYNC set reset SRTYPE cannot be set individually for each storage element in a slice The initial state after configuration or global initial state is defined by separate INITO and INIT1 attributes By default setting the SRLOW attribute sets INITO and setting the SRHIGH attribute sets INIT1 7 series devices can set INITO and INIT1 independent of SRHIGH and SRLOW The configuration options for the set and reset functionality of a register or the four storage elements capable of functioning as a latch are e No set or reset e Synchronous set e Synchronous reset e Asynchronous set preset e Asynchronous reset clear Distributed RAM Available in SLICEM Only The function generators LUTs in SLICEMs can be implemented as a synchronous RAM resource called a distributed RAM element Multiple LUTs in a SLICEM can be combined in various ways to store larger amount of data RAM elements are configurable within a SLICEM to implement these configurations
20. inputs are used as generate signals to the carry lookahead logic The generate signals are sourced from LUT outputs Select Inputs S 3 0 The select inputs are used as propagate signals to the carry lookahead logic The propagate signals are sourced from LUT outputs Carry Initialize CYINIT The carry initialize input is used to select the first bit in a carry chain The value for this pin is either 0 for add 1 for subtract or AX input for the dynamic first carry bit SLICEM Distributed RAM Primitives Eight primitives are shown in Table 3 2 Three primitives are single port RAM three primitives are dual port RAM and two primitives are quad port RAM Table 3 2 Single Port Dual Port and Quad Port Distributed RAM Primitive RAM Size Type Address Inputs RAM32X1S 32 bit Single port A 4 0 read write RAM32M 32 bit Quad port ADDRA 4 0 read ADDRBI4 0 read ADDRC 4 0 read ADDRDf 4 0 read write RAM64X1S 64 bit Single port A 5 0 read write RAM64X1D 64 bit Dual port A 5 0 read write DPRA 5 0 read RAM64M 64 bit Quad port ADDRA 5 0 read ADDRB 5 0 read ADDRC 5 0 read ADDRDI 5 0 read write RAM128X1S 128 bit Single port A 6 0 read write RAM128X1D 128 bit Dual port A 6 0 read write DPRA 6 0 read RAM256X1S 256 bit Single port A 7 0 read write The input and output data are one bit wide with the exception of the quad port R
21. location pointed to by the address inputs Address A 0 DPRA 0 and ADDRA 0 ADDRD 0 The address inputs A 0 for single port and dual port DPRA 0 for dual port and ADDRA 0 ADDRDI 0 for quad port select the memory cells for read or write The width of the port determines the required address inputs Some of the address inputs are not buses in VADL or Verilog instantiations Table 3 2 summarizes the function of each address pin Data In D DID 0 The data input D for single port and dual port and DID 0 for quad port provide the new data value to be written into the RAM 7 Series FPGAs CLB User Guide www xilinx com Send Feedback 49 UG474 v1 7 November 17 2014 Chapter 3 Design Entry XILINX Data Out O SPO DPO and DOA 0 DOD 0 The data out O single port or SPO DPO dual port and DOA 0 DOD 0 quad port reflects the contents of the memory cells referenced by the address inputs Following an active write clock edge the data out O SPO or DOD 0 reflects the newly written data SLICEM SRL Shift Register Primitive One primitive is available for the 32 bit shift register SRLC32E which uses one LUT ina SLICEM Figure 3 4 shows the 32 bit shift register primitive SRLC32E UG474_c3_04_102910 Figure 3 4 32 Bit Shift Register Port Signals Clock CLK Either the rising edge or the falling edge of the clock is used for the synchronous
22. the data input signal to be connected to the output O Logic 0 selects the 10 input while logic 1 selects the I1 input Data Out O The data output O provides the data value one bit selected by the control inputs Carry Chain Primitive The CARRY4 primitive instantiates the fast carry logic available in each slice This primitive works in conjunction with LUTS to build adders and multipliers Figure 3 2 shows the CARRY4 primitive Synthesis tools generally infer this logic from arithmetic HDL code automatically connecting this function properly CARRY4 DI 3 0 CYINIT CI UG474 c3 02 101210 Figure 3 2 CARRYA Primitive Port Signals Sum Outputs O 3 0 The sum outputs provide the final result of the addition subtraction They connect to the slice AMUX BMUX CMUX DMUX outputs Carry Outputs CO 3 0 The carry outputs provide the carry out for each bit CO 3 is equivalent to COUT A longer carry chain can be created if CO 3 is connected through COUT to the CI input of another CARRYA primitive and dedicated routing connects the carry chain up a column of slices The carry outputs also optionally connect to the slice AMUX BMUX CMUX DMUX outputs 7 Series FPGAs CLB User Guide www xilinx com 47 UG474 v1 7 November 17 2014 send Feedback Chapter 3 Design Entry XILINX Carry In Cl The carry in input also called CIN is used to cascade slices to form longer carry chains Data Inputs DI 3 0 The data
23. 15 Slice Description ooo toL eddies Chk a EE RA E ds eis be died E Reg buds 18 Look Up Table LUT deesset hie eletta ehe apre 21 Storage Elements c css Cbr AE ARANA A ANA A 21 Distributed RAM Available in SLICEM Only 23 Shift Registers Available in SLICEM Only 34 hill AMP ae oser 39 Carry LOGIC Pret 43 Chapter 3 Design Entry Design Checklist uvas ip ieee ARIE ek ER a Sade adde ex a nie eyed 45 Using the CLB Resources des Eod de xx irene 46 MTM A eee ee eee dae n 46 Chapter 4 Applications Distributed RAM Applications varaner 53 Shift Register Applications di A 53 Carry Logic Applicati fis icc a pa a 55 Chapter 5 Timing CLB General Slice Timing Model and Parameters 58 CLB Slice Multiplexer Timing Model and Parameters 60 CLB Slice Carry Chain Timing Model and Parameters 61 7 Series FPGAs CLB User Guide www xilinx com UG474 v1 7 November 17 2014 send Feedback XILINX CLB Slice Distributed RAM Timing Model and Parameters Available in SLICEM Only ee 63 CLB Slice SRL Shift Register Timing Model and Parameters Available in SLICEM Only ioa xe ttr etr Ada 66 Chapter 6 Ad
24. 3 600 1 600 20 800 400 200 41 600 10 www xilinx com 7 Series FPGAs CLB User Guide Send Feedback UG474 v1 7 November 17 2014 XILINX Device Resources Table 1 1 Artix 7 FPGA CLB Resources Cont d 6 input Distributed RAM Shift Device Slices SLICEL SLICEM p Register Flip Flops LUTs Kb Kb 7A50T 8 150 5 750 2 400 32 600 600 300 65 200 7A75T 11 800 8 232 3 568 47 200 892 446 94 400 7A100T 15 850 11 100 4 750 63 400 1 188 594 126 800 7A200T 33 650 22 100 11 550 134 600 2 888 1 444 269 200 Notes 1 Each 7 series FPGA slice contains four LUTs and eight flip flops only SLICEMs can use their LUTs as distributed RAM or SRLs 2 Number of slices corresponding to the number of LUTs and flip flops supported in the device Table 1 2 Kintex 7 FPGA CLB Resources Device Slices SLICEL sucem SA 1880 eget Flip Flops Kb 7K70T 10 250 6 900 3 350 41 000 838 419 82 000 7K160T 25 350 16 600 8 750 101 400 2 188 1 094 202 800 7K325T 50 950 34 950 16 000 203 800 4 000 2 000 407 600 7K355T 55 650 35 300 20 350 222 600 5 088 2 544 445 200 7KA10T 63 550 40 900 22 650 254 200 5 663 2 831 508 400 7K420T 65 1150 41 400 23 750 260 600 5 938 2 969 521 200 7K480T 74 650 47 500 27 150 298 600 6 788 3 394 597 200 Notes 1 Each 7 series FPGA slice contains four LUTs and eight flip flops only SLICEMs can use their LUTs as distributed RAM o
25. 7 Series FPGAs Configurable Logic Block User Guide UG474 v1 7 November 17 2014 XILINX XILINX The information disclosed to you hereunder the Materials is provided solely for the selection and use of Xilinx products To the maximum extent permitted by applicable law 1 Materials are made available AS IS and with all faults Xilinx hereby DISCLAIMS ALL WARRANTIES AND CONDITIONS EXPRESS IMPLIED OR STATUTORY INCLUDING BUT NOT LIMITED TO WARRANTIES OF MERCHANTABILITY NON INFRINGEMENT OR FITNESS FOR ANY PARTICULAR PURPOSE and 2 Xilinx shall not be liable whether in contract or tort including negligence or under any other theory of liability for any loss or damage of any kind or nature related to arising under or in connection with the Materials including your use of the Materials including for any direct indirect special incidental or consequential loss or damage including loss of data profits goodwill or any type of loss or damage suffered as a result of any action brought by a third party even if such damage or loss was reasonably foreseeable or Xilinx had been advised of the possibility of the same Xilinx assumes no obligation to correct any errors contained in the Materials or to notify you of updates to the Materials or to product specifications You may not reproduce modify distribute or publicly display the Materials without prior written consent Certain products are subject to the terms and c
26. AM Figure 3 3 shows generic single port dual port and quad port distributed RAM primitives The A ADDR and DPRA signals are address buses 48 www xilinx com 7 Series FPGAs CLB User Guide pena Fecupack UG474 v1 7 November 17 2014 XILINX Primitives RAM X1S RAM X1D RAM M D 0 D DI A D 0 WE WE SPO WE DOD 0 WCLK WCLK WCLK A 0 A 0 R W Port ADDRD f 0 R W Port DPO i DPRA 0 Read Port ADDRO 0 Read Port DOC 0 ADDRBI 0 Read Port DOB 0 ADDRA 0 Read Port DOA 0 UG474_c3_03_101210 Figure 3 3 Single Port Dual Port and Quad Port Distributed RAM Primitives Instantiating several distributed RAM primitives can be used to implement wide memory blocks Port Signals Each distributed RAM port operates independently of the other while reading the same set of memory cells Clock WCLK The clock is used for the synchronous write The data and the address input pins have setup times referenced to the WCLK pin The clock pin WCLK has an inversion option at the slice level The clock signal can be active at the negative edge of the clock or the positive edge of the clock without requiring other logic resources The default is at the positive clock edge Enable WE WED The enable pin affects the write functionality of the port An inactive write enable prevents any writing to memory cells An active write enable causes the clock edge to write the data input signal to the memory
27. ES hi i Optional l l l 28 Send Feedback UG474_c2_08_101210 Figure 2 9 64 X 1 Dual Port Distributed RAM RAM64X1D www xilinx com 7 Series FPGAs CLB User Guide UG474 v1 7 November 17 2014 XILINX Distributed RAM Available in SLICEM Only RAM64M FERRER REE RE E e q l De DID DU gt I ADDRD 0063 Er Registered Ll Output WCLK 41 Li gt WE T 1 I Optional I l l l l l l I l ADDRC l Registered I Output l l i f d Optional l l l l i DOB I I gt l l ADDRB I Registered Output l I I i p Optional l l l l l l DOA I I g I ADDRA i Registered l Output l l P Optional l l I I UG474 c2 09 070914 Figure 2 10 64 X 1 Quad Port Distributed RAM RAM64M 7 Series FPGAs CLB User Guide www xilinx com UG474 v1 7 November 17 2014 Send Feedback 29 Chapter 2 Functional Details unused unused WADDR 6 1 WCLK WED DATA 1 RADDR 6 1 DATA 2 DATA 3 Figure 2 11 RAM64M DPRAM64 Or 012 0131 XILINX UG474_c2_10_070914 64 X 3 Simple Dual Port Distributed RAM RAM64M Implementation of distributed RAM configurations with depth greater than 64 requires the usage of wide functi
28. G474 v1 7 November 17 2014 XILINX A 7 0 WCLK WE Distributed RAM Available in SLICEM Only RAM256X1S SPRAM64 1 01 ttt gt A 6 1 H gt H CLK WEICE O SPRAM64 A6 CX F7BMUX BX Output Output 46 AX F7AMUX Optional UG474 c2 13 101210 Figure 2 14 256 X 1 Single Port Distributed RAM RAM256X1S Distributed RAM configurations greater than the provided examples require more than one SLICEM There are no direct connections between slices to form larger distributed RAM configurations within a CLB or between slices 7 Series FPGAs CLB User Guide UG474 v1 7 November 17 2014 www xilinx com Send Feedback Registered 33 Chapter 2 Functional Details XILINX Distributed RAM Data Flow Synchronous Write Operation The synchronous write operation is a single clock edge operation with an active High write enable WE feature When WE is High the input D is loaded into the memory location at address A Asynchronous Read Operation The output is determined by the address A for the single port mode output SPO of dual port mode or address DPRA for the DPO output of dual port mode Each time a new address is applied to the address pins the data value in the memory location of that address is available on the output after the time delay to access
29. Single port RAM64X1S 1 64 x 1D Dual port RAM64X1D 2 64 10 Quad port RAM64M 4 64 x 3SDP Simple dual port RAM64M 4 128 x 15 Single port RAMI128X1S 2 128x ID Dual port RAM128X1D 4 256 x 15 Single port RAM256X1S 4 Distributed RAM configurations include e Single port e Common address port for synchronous writes and asynchronous reads Read and write addresses share the same address bus e Dual port e One port for synchronous writes and asynchronous reads One function generator is connected with the shared read and write port address e One port for asynchronous reads Second function generator has the A inputs connected to a second read only port address and the WA inputs are shared with the first read write port address 7 Series FPGAs CLB User Guide UG474 v1 7 November 17 2014 www xilinx com 24 Send Feedback XILINX Distributed RAM Available in SLICEM Only e Simple dual port e One port for synchronous writes no data out read port from the write port e One port for asynchronous reads e Quad port One port for synchronous writes and asynchronous reads e Three ports for asynchronous reads As shown in Figure 2 3 the common write port W6 W1 WA 6 1 in the following figures is always physically driven by the inputs to the D LUT using D 6 1 The read ports are independent for each of the four LUTs Therefore the D LUT is always effectively single port even if DPRAM64 is selected as the
30. Ts There are six independent inputs A inputs A1 to A6 and two independent outputs O5 and O6 for each of the four function generators in a slice A B C and D The function generators can implement e Any arbitrarily defined six input Boolean function e Two arbitrarily defined five input Boolean functions as long as these two functions share common inputs e Two arbitrarily defined Boolean functions of 3 and 2 inputs or less A six input function uses e A1 A6 inputs e 06 output Two five input or less functions use e A1 A5 inputs e A6 driven High e O5 and O6 outputs The propagation delay through a LUT is independent of the function implemented Signals from the function generators can e Exit the slice through A B C D output for O6 or AMUX BMUX CMUX DMUX output for O5 e Enter the XOR dedicated gate from an O6 output e Enter the carry logic chain from an O5 output e Enter the select line of the carry logic multiplexer from O6 output e Feed the D input of the storage element e Go to F AMUX F7BMUX wide multiplexers from O6 output In addition to the basic LUTs slices contain three multiplexers F7AMUX F BMUX and F8MUX These multiplexers are used to combine up to four function generators to provide any function of seven or eight inputs in a slice e F7AMUx Used to generate seven input functions from LUTs A and B e F7BMUx Used to generate seven input functions from LUTs C and D e F8MUX Used to combine a
31. are distributed throughout the device The ASMBL architecture provides maximum flexibility with CLBs on both sides of most I Os The best approach is to let the tools choose the I O locations based on the FPGA requirements Results can be adjusted if necessary for board layout considerations The timing constraints should be set so that the tools can choose optimal placement for the design requirements Send Feedback 7 Series FPGAs CLB User Guide UG474 v1 7 November 17 2014 www xilinx com XILINX Pinout Planning Carry logic cascades vertically up a column so wide arithmetic buses might drive a vertical orientation to other logic including I O While most 7 series devices are available in flip chip packages taking full advantage of the distributed I O in the ASMBL architecture the smaller devices are available in wire bond packages at a lower cost In these packages some pins are naturally closer to the I Os and special resources than others so pin placement should be done after the internal logic is defined 7 Series FPGAs CLB User Guide www xilinx com 13 UG474 v1 7 November 17 2014 Send Feedback Chapter 1 Overview 14 Send Feedback www xilinx com XILINX 7 Series FPGAs CLB User Guide UG474 v1 7 November 17 2014 XILINX Chapter 2 Functional Details This chapter provides a detailed view of the 7 series FPGAs CLB architecture These details can be useful for design optimization and
32. balance the timing of data pipelines 34 www xilinx com 7 Series FPGAs CLB User Guide pena Fecupack UG474 v1 7 November 17 2014 XILINX Shift Registers Available in SLICEM Only Applications for shift registers include e Delay or latency compensation e Synchronous FIFO and content addressable memory CAM Shift register functions include e Write operation e Synchronous with a clock input CLK and an optional clock enable CE e Fixed read access to Q31 e Dynamic read access e Performed through the 5 bit address bus A 4 0 The LSB of the LUT address is unused and the software automatically ties it to a logic High e Any of the 32 bits can be read out asynchronously at the 06 LUT outputs referred to as Q on the primitive by varying the address e This capability is useful in creating smaller shift registers less than 32 bits Forexample when building a 13 bit shift register set the address to the 13th bit e A storage element or flip flop is available to implement a synchronous read Theclock to out of the flip flop determines the overall delay and improves performance However one additional cycle of clock latency is added e Set or reset of the shift register is not supported Figure 2 15 is a logic block diagram of a 32 bit shift register SRLC32E Al SHIFTIN D SHIFTOUT Q31 l l Ato 3 AE clk CLK ork Output Q EET TEGE ETTE ce 00608 AQ Register
33. cKo after clock event 1 This is also applicable to the AMUX BMUX CMUX DMUX and COUT outputs at time Tsucko after clock event 1 Clock Event 2 Read Operation All Read operations are asynchronous in distributed RAM As long as WE is Low the address bus can be asserted at any time The contents of the RAM on the address bus are reflected on the A B C D outputs after a delay of length Ty o propagation delay through a LUT The address F is asserted after clock event 2 and the contents of the RAM at address F are reflected at the output after a delay of length Ty o 7 Series FPGAs CLB User Guide www xilinx com 65 UG474 v1 7 November 17 2014 e Chapter 5 Timing XILINX CLB Slice SRL Shift Register Timing Model and Parameters Available in SLICEM Only Figure 5 6 illustrates shift register implementation in a 7 series FPGA slice Some elements of the slice have been omitted for clarity Only the elements relevant to the timing paths described in this section are shown C address gt BIC gt B address gt Al DD A address gt UG474 c5 06 110110 Figure 5 6 Simplified 7 Series FPGA Slice SRL 66 www xilinx com 7 Series FPGAs CLB User Guide send Feedback UG474 v1 7 November 17 2014 XILINX CLB Slice SRL Shift Register Timing Model and Parameters Available in SLICEM Only Slice SRL Timing Parameters Table 5 5 shows the SLICEM SRL t
34. d global reset signal to make CLB inputs available which results in a smaller more efficient design The GSR signal must always re initialize every flip flop Using GSR and GTS does not use any general purpose routing resources The GSR signal is asserted automatically during the FPGA configuration process guaranteeing that the FPGA starts up in a known state At configuration because flip flops are enabled by the configuration clock and then clocked by a user clock it is recommended to enable key flip flops or key clock signals on the user clock after configuration 72 www xilinx com 7 Series FPGAs CLB User Guide send Feedback UG474 v1 7 November 17 2014 XILINX Devices Using Stacked Silicon Interconnect SSI Technology STARTUPE2 Primitive The GSR and GTS signal sources are defined and connected using the STARTUPE2 primitive This primitive allows the user to define the source of these dedicated nets GSR and GTS are always active during configuration and connecting signals to them on the STARTUPE2 primitive defines how they are controlled after configuration By default they are disabled after configuration on a selected clock cycle of the start up phase enabling the flip flops and I Os in the device The STARTUPE2 primitive also includes other signals used specifically during configuration For more information refer to the UG470 7 Series FPGAs Configuration User Guide Interconnect Optimization Interconnect delays vary acco
35. d in one SLICEM Figure 2 18 through Figure 2 20 illustrate various example shift register configurations that can occupy one SLICEM 36 www xilinx com 7 Series FPGAs CLB User Guide 225 UG474 v1 7 November 17 2014 XILINX Shift Registers Available in SLICEM Only SHIFTIN D DI1 A 5 0 A 6 2 A5 AX CLK CLK WE WE CE gt Output Q AQ Registered Optional MC31 SHIFTOUT Q63 UG474 c2 17 101210 Figure 2 18 64 Bit Shift Register Configuration CX A5 SHIFTIN D DI a 5 A 6 0 amp A 6 2 F7BMUX BX A6 CLK WE Output Q Registered Output Optional AX A5 Not Used F7AMUX UG474 c2 18 101210 Figure 2 19 96 Bit Shift Register Configuration 7 Series FPGAs CLB User Guide www xilinx com 37 UG474 v1 7 November 17 2014 send Feedback Chapter 2 Functional Details XILINX SHIFTIN D 5 A 6 0 A 6 2 CLK WE Output Q Registered F8MUX Output Optional SHIFTOUT Q127 UG474 c2 19 101210 Figure 2 20 128 Bit Shift Register Configuration It is possible to create shift registers longer than 128 bi
36. e element or a flip flop in the same slice Carry logic cascading is limited only by the height of the column of slices Carry logic cannot be cascaded across super logic regions SLRs in devices using stacked silicon interconnect SSI technology See Devices Using Stacked Silicon Interconnect SSI Technology in Chapter 6 44 www xilinx com 7 Series FPGAs CLB User Guide send Feedback UG474 v1 7 November 17 2014 XILINX Chapter 3 Design Entry CLB resources are used automatically and efficiently by FPGA synthesis tools without any special FPGA specific coding required Some HDL coding suggestions and techniques can help optimize designs for maximum efficiency Design Checklist These guidelines are provided as a quick checklist of design suggestions for effective use of the 7 series CLBs e Resource Utilization Use generic HDL code and allow the synthesis and mapping tools to choose the specific FPGA CLB resources Consider instantiation of specific resources only when necessary to meet density or performance requirements Compare the results to an estimated slice count to verify design efficiency If a design is running out of resources in the target device examine which resource is the limiting factor and consider using alternative resources such as moving registers to SRLs or distributed RAM or distributed RAM to block RAM or carry logic to DSP slices When migrating a design from another architecture rem
37. e fastest compile times The 7 series CLBs are arranged in a regular array inside the FPGA Each connects to a switch matrix for access to the general routing resources which run vertically and horizontally between the CLB rows and columns A similar switch matrix connects other resources such as the DSP slices and block RAM resources Most of the interconnect features are transparent to FPGA designers Knowledge of the interconnect details can be used to guide design techniques but is not necessary for efficient FPGA design Only selected types of interconnect are under user control These include the clock routing resources which are selected by using clock buffers and discussed in more detail in UG472 7 Series FPGAs Clocking Resources User Guide Two global control signals GSR and GTS are selected by using the STARTUPE2 primitive Global Controls GSR and GTS In addition to the general purpose interconnect 7 series FPGAs have two global logic control signals as described in Table 6 1 Table 6 1 Global Logic Control Signals Global Control Input Description GSR Global Set Reset When High asynchronously places all registers and flip flops in their initial state GTS Global 3 State When High asynchronously forces all I O pins to a high impedance state High Z 3 state If a common initialization signal is needed for every flip flop in the design use the GSR control in a design instead of a separate route
38. ed I Output Optional UG474_c2_14_110510 Figure 2 15 32 Bit Shift Register Configuration 7 Series FPGAs CLB User Guide www xilinx com 35 UG474 v1 7 November 17 2014 Chapter 2 Functional Details XILINX Figure 2 16 illustrates an example shift register configuration occupying one function generator 32 bit Shift Register SHIFTIN D WE SHIFTOUT Q31 CLK Address A 4 0 UG474 62 15 101210 Figure 2 16 Representation of a Shift Register Figure 2 17 shows two 16 bit shift registers The example shown can be implemented in a single LUT For more information on the SRLC32E and SRL16E primitives see UG953 Vivado Design Suite 7 Series FPGA and Zynq 7000 All Programmable SoC Libraries Guide n SRL16 SHIFTIN1 Al DH 4 A 3 0 A 5 2 CLK CE gt WE SHIFTIN2 AX UG474 c2 16 101210 Figure 2 17 Dual 16 Bit Shift Register Configuration As mentioned earlier the MC31 output and a dedicated connection between shift registers allows connecting the last bit of one shift register to the first bit of the next without using the LUT O6 output Longer shift registers can be built with dynamic access to any bit in the chain The shift register chaining and the F AMUX F7BMUX and F8MUX multiplexers allow up to a 128 bit shift register with addressable access to be implemente
39. ers 4 1 Multiplexer Each LUT can be configured into a 4 1 MUX The 4 1 MUX can be implemented with a flip flop in the same slice Up to four 4 1 MUXes can be implemented in a slice as shown in Figure 2 21 SEL D 1 0 DATA D 3 0 Input SEL B 1 0 DATA B 3 0 Input i eS Se fi fe i CQ Optional 4 1 MUX Output Registered Output 4 1 MUX Output Registered Output 4 1 MUX Output Registered Output 4 1 MUX Output Registered na SEL A 1 0 DATA A 3 0 A 6 1 Input I CLK dg Figure 2 21 Send Feedback www xilinx com Four 4 1 Multiplexers in a Slice Output UG474 c2 20 101210 7 Series FPGAs CLB User Guide UG474 v1 7 November 17 2014 XILINX Multiplexers 8 1 Multiplexer Each slice has an F AMUX and an F7BMUX These two MUXes combine the output of two LUTs to form a combinatorial function up to 13 inputs or an 8 1 MUX Up to two 8 1 MUXes can be implemented in a slice as shown in Figure 2 22 SELF7 2 SLICE e TI LUT I I 06 I I D 6 1 SEL D 1 0 DATA D 3 0 DI6 1 6 x heat i A 6 1 F7BMUX I I I i 8 1 MUX LUT Output 1 I I Registered I gistere I 98 I Output SEL C 1 0 DATA C 3 0 CI6 1 6 I Input 1 A 6 1 Optional i I I SELF7 1 CS our 4 l I LUT I
40. erview some of the Virtex 7 devices use stacked silicon interconnect SSI technology These devices provide a unique additional type of interconnect resource called a super long line or SLL These special routing resources can be treated like other interconnect resources as they are abundant more than 10 000 and fast about 1 ns The SLL provides a connection between super logic regions SLRs as shown in Figure 6 3 Combining multiple SLRs effectively increases the height of the ASMBL architecture columns and increases the overall capacity of the device The software can be allowed to take best advantage of this configuration or users can apply floorplanning to control placement within and between SLRs Note that carry logic cascading is limited to within an SLR and does not continue through the SLL connections The same is true of other types of cascading including block RAM DSP and DCI Boundaries are visible in floorplanning tools and reported in timing reports The number of SLRs per Virtex 7 device is listed in the 7 Series FPGAs Overview 7 Series FPGAs CLB User Guide www xilinx com 73 UG474 v1 7 November 17 2014 Send Feedback Chapter 6 Advanced Topics XILINX Super Logic Region SLR Super Long Line SLL
41. evious bit is shifted to the next position and appears on the Q output Shift Register Summary Here is a summary of the shift registers Multiplexers A shift operation requires one clock edge Dynamic length read operations to the Q output of the LUT are asynchronous Static length read operations to the Q output of the LUT are synchronous The data input has a setup to clock timing specification In a cascadable configuration the Q31 output always contains the last bit value The Q31 output changes synchronously after each shift operation Function generators and associated multiplexers in 7 series FPGAs can implement these functions 4 1 multiplexers using one LUT e Four 4 1 MUXes per slice 8 1 multiplexers using two LUTs e Two 8 1 MUXes per slice 16 1 multiplexers using four LUTs e One 16 1 MUX per slice UG474 v1 7 November 17 2014 7 Series FPGAs CLB User Guide www xilinx com Send Feedback 39 Chapter 2 Functional Details 40 XILINX These wide input multiplexers are implemented in one level of logic or LUT using the dedicated F AMUX F7BMUX and F8MUX multiplexers These multiplexers allow LUT combinations of up to four LUTs in a slice The wide multiplexers can also be used to create general purpose functions of up to 13 inputs in two LUTs or 27 inputs in four LUTs one slice Although the multiplexer outputs are combinatorial they can be registered in the CLB storage elements Designing Large Multiplex
42. hin the same slice extending the logic capability to another external input Because there is only one SR input per slice using more than one AND2B1L or OR2L per slice requires a shared common external input IN 6 1 SRI UG474 c6 02 110310 Figure 6 2 Implementation of OR2L Q D or SRI The device model shows these functions as AND2L and OR2L configurations of the storage element 7 Series FPGAs CLB User Guide www xilinx com 71 UG474 v1 7 November 17 2014 send Feedback Chapter 6 Advanced Topics XILINX The AND2B1L and OR2L two input gates save LUT resources and are initialized to a known state on power up and on GSR assertion Global Set Reset Using these primitives can reduce logic levels and increase logic density of the device by trading register latch resources for logic However due to the static inputs required on the clock and clock enable inputs specifying one or more AND2B1L or OR2L primitives can cause register packing and density issues in a slice disallowing the use of the remaining registers and latches Interconnect Resources Interconnect is the programmable network of signal pathways between the inputs and outputs of functional elements within the FPGA such as IOBs CLBs DSP slices and block RAM Interconnect also called routing is segmented for optimal connectivity The Xilinx placement and routing tools exploit the rich interconnect array to deliver optimal system performance and th
43. ic and sequential elements instantiation should be reserved primarily for specifying when resources outside the CLB should be used such as the DSP slice Instantiation might also be needed if the synthesis tool does not infer a desired special CLB resource such as the wide multiplexers distributed RAM or SRL feature This section provides an overview of the most commonly used CLB primitives For instantiation examples and for information on other CLB primitives see UG953 Vivado Design Suite 7 Series FPGA and Zynq 7000 All Programmable SoC Libraries Guide Multiplexer Primitives The multiplexer primitives provide direct instantiation of the dedicated multiplexers in each slice allowing wider multiplexers to be built Table 3 1 describes the two primitives Table 3 1 Multiplexer Primitives Primitive Inputs Resource Output Function LUT outputs ver 4 1 multiplexer F7AMUX or F7BMUX 8 1 multiplexer F7AMUX and F7BMUX outputs MUXF8 8 1 multiplexer F8MUX 16 1 multiplexer The port signals are the same for both multiplexer primitives Figure 3 1 shows MUXF7 MUXF7 UG474 c3 01 101210 Figure 3 1 MUXF7 Primitive 46 www xilinx com 7 Series FPGAs CLB User Guide send Feedback UG474 v1 7 November 17 2014 XILINX Primitives Port Signals Data In 10 1 The data input provides the data to be selected by the select signal S Control In S The select input signal determines
44. igns originally targeted to earlier FPGAs The interconnect routing resources are increased in size quantity and flexibility relative to the Virtex 6 FPGA family improving the quality of automatic place and route results Device Resources The CLB resources are scalable across all the 7 series families providing a common architecture that improves efficiency IP implementation and design migration The number of CLBs and the ratio between CLBs and other device resources differentiates the 7 series families Migration between the 7 series families does not require any design changes for the CLBs Device capacity is often measured in terms of logic cells which are the logical equivalent of a classic four input LUT and a flip flop The 7 series FPGA CLB six input LUT abundant flip flops and latches carry logic and the ability to create distributed RAM or shift registers in the SLICEM increase the effective capacity The ratio between the number of logic cells and 6 input LUTs is 1 6 1 7 Series FPGA CLB Resources Table 1 1 through Table 1 3 show the available CLB resources for the Artix 7 Kintex 7 and Virtex 7 FPGAs Refer to DS180 7 Series FPGAs Overview for the most up to date information Table 1 1 Artix 7 FPGA CLB Resources 6 input Distributed RAM shift Device Slices SLICEL SLICEM p Register Flip Flops LUTs Kb Kb 7A15T 2 600 1 800 800 10 400 200 100 20 800 7A35T 5 200 2
45. iming parameters for a majority of the paths in Figure 5 6 Table 5 5 Slice SRL Timing Parameters Parameter Function Sequential Delays for a Slice LUT Configured as an SRL Description Trec CLK to A B C D outputs Time after the CLK of a write operation that the data written to the SRL is stable on the A B C D outputs of the slice Trec Trec uix CLK to AMUX DMUX outputs Time after the CLK of a write operation that the data written to the SRL is stable on the AMUX BMUX CMUX DMUX outputs of the slice Trec TREG_M31 CLK to DMUX output via MC31 output Time after the CLK of a write operation that the data written to the SRL is stable on the DMUX output via MC31 output Setup and Hold Times for a Slice LUT Configured as an SRL Tcrck TCKCE Time before after the clock that the Tws surREG TWH SHFREG T T CE input CE clock enable signal must be stable at the CE EEE OISIGEGEIHBBES input of the slice LUT configured as an SRL Twe T Time before after the clock that the WS WH CE input WE write enable signal must be stable at the WE input of the slice LUT configured as an SRL Tps Tpp Tps_sHFREG TpH SHFREG AI BI CI DI configured as data inputs DI Time before after the clock that the data must be stable at the AI BI CI DI inputs of the slice configured as an SRL Clock CLK Minimum clock pulse width for distributed Tm
46. in this chapter CLB General Slice Timing Model and Parameters page 58 e CLB Slice Multiplexer Timing Model and Parameters page 60 e CLB Slice Carry Chain Timing Model and Parameters page 61 e Slice Distributed RAM Timing Model and Parameters Available in SLICEM Only page 63 e CLB Slice SRL Shift Register Timing Model and Parameters Available in SLICEM Only page 66 For each category three timing model sections are described e Functional element diagram basic architectural schematic illustrating pins and connections e Timing parameters definitions of the respective 7 series FPGAs data sheet timing parameters Timing diagram illustrates functional element timing parameters relative to each other The general form of a timing parameter name is T subscripted with a combination of the starting pin and the ending pin For example Tcgcx is the setup time from CE to the clock and Tcxcg is the hold time from the clock to CE being removed In some cases a single parameter name can apply to multiple paths through the slice and can have different values for each path 7 Series FPGAs CLB User Guide www xilinx com 57 UG474 v1 7 November 17 2014 send Feedback Chapter 5 Timing XILINX CLB General Slice Timing Model and Parameters A simplified 7 series FPGA slice is shown in Figure 5 1 Only the elements relevant to the timing paths described in this section are shown
47. input functions 2 Txxck Setup Time before clock edge and Tckxx Hold Time after clock edge 7 Series FPGAs CLB User Guide UG474 v1 7 November 17 2014 www xilinx com 59 Send Feedback Chapter 5 Timing XILINX General Timing Characteristics Figure 5 2 illustrates the general timing characteristics of a 7 series FPGA slice e TcEckK e yA lt bu Tpick l AX BX CX DX DATA _ Tsrck SR RESET gt m Jepp k T AQ BQ CQ DQ lt OUT ETR Figure 5 2 General Slice Timing Characteristics e At time Tcecg before clock event 1 the clock enable signal becomes valid High at the CE input of the slice register e Attime Tpjcy before clock event 1 data from either AX BX CX or DX inputs become valid High at the D input of the slice register and is reflected on either the AQ BQ CQ or DQ pin at time TcKo after clock event 1 e At time Tsrcg before clock event 3 the SR signal configured as synchronous reset becomes valid High resetting the slice register This is reflected on the AQ BO CQ or DQ pin at time Tcxo after clock event 3 CLB Slice Multiplexer Timing Model and Parameters Figure 2 22 page 41 and Figure 2 23 page 42 illustrate the wide multiplexers in a 7 series FPGA slice Slice Multiplexer Timing Parameters Table 5 2 shows the slice multiplexer timing parameters for a majority of the paths in Figure 2 22 and Figure 2 23 No
48. is also cleared by default on power up 7 Series FPGAs CLB User Guide www xilinx com 51 UG474 v1 7 November 17 2014 SS 52 Chapter 3 Design Entry XILINX Synchronous Set S When S is High all other inputs are overridden and the data output Q is driven High on the active clock transition This signal is available in the FDSE component The FDSE flip flop is also preset by default on power up Asynchronous Clear CLR When CLR is High all other inputs are overridden and the data output Q is driven Low This signal is available in the FDCE component The FDCE flip flop is also cleared by default on power up Asynchronous Preset PRE When PRE is High all other inputs are overridden and the data output Q is driven High This signal is available in the FDPE component The FDPE flip flop is also preset by default on power up Note Using both asynchronous clear and preset on the same flip flop requires additional resources and timing paths www xilinx com 7 Series FPGAs CLB User Guide send Feedback UG474 v1 7 November 17 2014 XILINX Chapter 4 Applications This chapter provides guidance regarding using the CLB resources as parts of larger functions Trade offs are described between alternative methods of implementing these functions Distributed RAM Applications Distributed RAM provides a trade off between using storage elements for very small arrays and block RAM for larger arrays It is recomme
49. le Notes 1 Txxcx Setup Time before clock edge and Tckxx Hold Time after clock edge Slice Carry Chain Timing Characteristics Figure 5 3 illustrates the timing characteristics of a slice carry chain implemented in a 7 series FPGA slice cK YX gt lt T CIN CINCK DATA pe Toko AQ BQ CQ DQ 4 OUT 46474 65 03 110110 Figure 5 3 Slice Carry Chain Timing Characteristics Note relevant to Figure 5 3 e At time Temck before clock event 1 data from the CIN input becomes valid High at the Data input of the slice register This is reflected on any of the AQ BQ CQ DQ pins at time Toyo after clock event 1 62 www xilinx com 7 Series FPGAs CLB User Guide send Feedback UG474 v1 7 November 17 2014 XILINX CLB Slice Distributed RAM Timing Model and Parameters Available in SLICEM Only CLB Slice Distributed RAM Timing Model and Parameters Available in SLICEM Only Figure 5 4 illustrates the details of distributed RAM implemented in a 7 series FPGA slice Some elements of the slice are omitted for clarity Only the elements relevant to the timing paths described in this section are shown CX Lo C input gt BI D BX D B input gt UG474 c5 04 110110 Figure 5 4 Simplified 7 Series FPGA SLICEM Distributed RAM 7 Series FPGAs CLB User Guide www xilinx com 63 UG474 v1 7 November 17 2014 SE Chapte
50. ll LUTS to generate eight input functions Functions with more than eight inputs can be implemented using multiple slices There are no direct connections between slices to form function generators greater than eight inputs within a CLB Storage Elements There are eight storage elements per slice Four can be configured as either edge triggered D type flip flops or level sensitive latches The D input can be driven directly by a LUT output via AFFMUX BFFMUX CFFMUX or DFFMUX or by the BYPASS slice inputs bypassing the function generators via AX BX CX or DX input When configured as a latch the latch is transparent when the CLK is Low 7 Series FPGAs CLB User Guide www xilinx com 21 UG474 v1 7 November 17 2014 send Feedback Chapter 2 Functional Details XILINX LUT D O5 Output LUT C O5 Output There are four additional storage elements that can only be configured as edge triggered D type flip flops The D input can be driven by the O5 output of the LUT or the BYPASS slice inputs via AX BX CX or DX input When the original four storage elements are configured as latches these four additional storage elements cannot be used Figure 2 5 shows both the register only and the register latch configuration in a slice DFF LUT D Output DFF LATCH o INIT DQ D INITO o SRHIGH n SRLOW DX SR CFF LUT C Output DINITI DINITO Q gt ca O SRHIGH n SRLOW Cx um SR Reset Type Reset Ty
51. nded to infer memory where possible to provide the greatest flexibility Distributed RAM can also be targeted by instantiation or through the use of Xilinx LogiCORE IP In general distributed RAM should be used for all memories that consist of 64 bits or less unless there is a shortage of SLICEM or logic resources for the target device Distributed RAM is more efficient in terms of resources performance and power For depths greater than 64 bits but less than or equal to 128 bits the decision on the best resource to use depends on these factors 1 The availability of extra block RAMs If not available distributed RAM should be used 2 The latency requirements If asynchronous read capability is needed distributed RAMs must be used The data width Widths greater than 16 bits should use block RAM if available The necessary performance requirements Registered distributed RAMs generally have shorter clock to out timing and fewer placement restrictions than block RAMs Shift Register Applications Synchronous Shift Registers The shift register primitive does not use the register available in the same slice To implement a fully synchronous read and write shift register output pin Q must be connected to a flip flop Both the shift register and the flip flop share the same clock as shown in Figure 4 1 7 Series FPGAs CLB User Guide www xilinx com 53 UG474 v1 7 November 17 2014 sie esa solids Chapter 4 Applications
52. ns to each other and each slice is organized as a column Each slice in a column has an independent carry chain 16 www xilinx com 7 Series FPGAs CLB User Guide 225 UG474 v1 7 November 17 2014 XILINX The Xilinx tools designate slices with these definitions CLB Arrangement e An X followed by a number identifies the position of each slice in a pair as well as the column position of the slice The X number counts slices starting from the bottom in sequence 0 1 the first CLB column 2 3 the second CLB column etc e AY followed by a number identifies a row of slices The number remains the same within a CLB but counts up in sequence from one CLB row to the next CLB row starting from the bottom Figure 2 2 shows four CLBs located in the bottom left corner of the die COUT COUT COUT COUT UG474 c2 01 092210 Figure 2 2 Row and Column Relationship between CLBs and Slices CLB Slice Configurations Table 2 1 summarizes the logic resources in one CLB Each SLICEM LUT can be configured as a look up table distributed RAM or a shift register Logic Resources in One CLB Flip Flops 16 Arithmetic and Carry Chains 2 Distributed RAM 256 bits Shift Registers 128 bits Table 2 1 Slices LUTs 2 8 Notes 1 SLICEM only SLICEL does not have distributed RAM or shift registers 7 Series FPGAs CLB User Guide UG474 v1 7 November 17
53. on multiplexers F7AMUX F7BMUX and F8MUX as shown in Figure 2 12 through Figure 2 14 If two single port 128 x 1 bit modules are each built as shown in Figure 2 12 the two RAM128X1S primitives can occupy a SLICEM as long as they share the same clock write enable and shared read and write port address inputs This configuration equates to 128 x 2 bit single port distributed RAM 30 Send Feedback www xilinx com 7 Series FPGAs CLB User Guide UG474 v1 7 November 17 2014 XILINX Distributed RAM Available in SLICEM Only RAM128X1S D A 6 0 WCLK WE Output Registered Output Optional UG474 c2 11 101210 Figure 2 12 128 X 1 Single Port Distributed RAM RAM128X1S 7 Series FPGAs CLB User Guide www xilinx com 31 UG474 v1 7 November 17 2014 Send Feedback Chapter 2 Functional Details XILINX RAM128X1D a O EE a I A 6 0 I WCLK WE i SPO E I l Registered F7BMUX ES Output I P Optional I I I I I I I DPRA 6 0 I I I 1000 I gt I I Registered F7AMUX de Output Optional I UG474_c2_12_101210 Figure 2 13 128 X 1 Dual Port Distributed RAM RAM128X1D 32 www xilinx com 7 Series FPGAs CLB User Guide send Feedback U
54. onditions of Xilinx s limited warranty please refer to Xilinx s Terms of Sale which can be viewed at http www xilinx com legal htm tos IP cores may be subject to warranty and support terms contained in a license issued to you by Xilinx Xilinx products are not designed or intended to be fail safe or for use in any application requiring fail safe performance you assume sole risk and liability for use of Xilinx products in such critical applications please refer to Xilinx s Terms of Sale which can be viewed at http www xilinx com legal htm tos Automotive Applications Disclaimer XILINX PRODUCTS ARE NOT DESIGNED OR INTENDED TO BE FAIL SAFE OR FOR USE IN ANY APPLICATION REQUIRING FAIL SAFE PERFORMANCE SUCH AS APPLICATIONS RELATED TO I THE DEPLOYMENT OF AIRBAGS II CONTROL OF A VEHICLE UNLESS THERE IS A FAIL SAFE OR REDUNDANCY FEATURE WHICH DOES NOT INCLUDE USE OF SOFTWARE IN THE XILINX DEVICE TO IMPLEMENT THE REDUNDANCY AND A WARNING SIGNAL UPON FAILURE TO THE OPERATOR OR III USES THAT COULD LEAD TO DEATH OR PERSONAL INJURY CUSTOMER ASSUMES THE SOLE RISK AND LIABILITY OF ANY USE OF XILINX PRODUCTS IN SUCH APPLICATIONS Copyright 2011 2014 Xilinx Inc Xilinx the Xilinx logo Artix ISE Kintex Spartan Virtex Vivado Zynq and other designated brands included herein are trademarks of Xilinx in the United States and other countries All other trademarks are the property of their respective owners Revision History The foll
55. ove resource instantiation and any mapping and floorplanning constraints refer to UG429 7 Series FPGAs Migration Methodology Guide Refer to www xilinx com for good HDL coding techniques for FPGAs such as in UG687 XST User Guide for Virtex 6 Spartan 6 and 7 Series Devices e Pipelining The designer should use sequential design techniques and pipelining to take advantage of abundant flip flops for performance e Control Signals 7 Series FPGAs CLB User Guide UG474 v1 7 November 17 2014 Use control signals only as necessary Avoid using a routed global reset signal and minimize use of local resets to maximize opportunity to use FPGA resources Use active High control signals Avoid having both set and reset on the same flip flop Avoid control signals on small shift registers and storage arrays in order to use LUTs instead of flip flops to maximize utilization and minimize power www ilinx com Send Feedback D Chapter 3 Design Entry XILINX e Software Options e To improve performance automatically use timing constraints and trade off longer implementation run times through software options Using the CLB Resources Primitives Xilinx recommends using generic HDL code and allowing the tools to infer the usage of CLB resources Using IP solutions designed for 7 series FPGAs can help make full use of the CLB resources Although any feature in the CLB can be instantiated directly including the LUTS carry log
56. owing table shows the revision history for this document Date Version Revision 03 01 2011 1 0 Xilinx Initial release 03 28 2011 1 1 Added devices XC7K355T XC7K420T and XC7K480T to Table 1 2 Portions of the text have been revised for clarity 09 30 2011 12 Added last sentence under 7 Series CLB Features Added Table 1 2 and Table 1 3 Updated CLB features in Table 1 2 Added first sentence under CLB Arrangement added ASMBL Architecture section and CLB Slices heading Added last paragraph under Carry Logic Added lasts sentence under Using Carry Logic Added second sentence under Slice Multiplexer Timing Parameters Modified Table 5 2 Table 5 4 and Table 5 5 for clarity Added Devices Using Stacked Silicon Interconnect SSI Technology section 01 30 2012 1 3 Revised Table 1 1 Added fifth paragraph under Distributed RAM Available in SLICEM Only Clarified last paragraph under Global Controls GSR and GTS 11 05 2012 14 Changed uniformity to optimized in last bullet under 7 Series CLB Features Changed unified to scalable in first sentence under Device Resources Deleted 7A350T device from Table 1 1 Deleted 7V1500T and 7VH290T devices from Table 1 3 Added reference to 7 Series FPGA Libraries Guide to Distributed RAM Available in SLICEM Only Shift Registers Available in SLICEM Only and Flip Flop Primitives Changed Tcro to Tcgck in Figure 5 2 and first bullet under General Timing Characteristics
57. pe SR L 5 LUT B 05 Output LUT B Output BFF cad dl BFF LATCH BINIT1 B INITO L BQ B SRHIGH o SRLOW CELO LUT A O5 Output AX SR CLK D EJ AFF LUT A Output B INITI DINITO o SRHIGH o SRLOW AX SR UG474 c2 04 101210 Figure 2 5 Two Versions of Configuration in a Slice 4 Registers Only and 4 Register Latch Control Signals 22 The control signals clock CLK clock enable CE and set reset SR are common to all storage elements in one slice When one flip flop in a slice has SR or CE enabled the other flip flops used in the slice also have SR or CE enabled by the common signal Only the CLK signal has programmable polarity Any inverter placed on the clock signal is automatically absorbed The CE and SR signals are active High These initialization options are available for storage elements e SRLOW Synchronous or asynchronous Reset when CLB SR signal is asserted e SRHIGH Synchronous or asynchronous Set when CLB SR signal is asserted www xilinx com 7 Series FPGAs CLB User Guide send Feedback UG474 v1 7 November 17 2014 Distributed RAM Available in SLICEM Only e INITO Asynchronous Reset on power up or global Set Reset see Global Controls GSR and GTS page 72 e INIT1 Asynchronous Set on power up or global Set Reset The SR signal forces the storage element into the state specified by the SRHIGH or SRLOW
58. r 5 Timing XILINX Distributed RAM Timing Parameters Table 5 4 shows the timing parameters for the distributed RAM in SLICEM for a majority of the paths in Figure 5 4 Table 5 4 Distributed RAM Timing Parameters Parameter Function Description Sequential Delays for a Slice LUT Configured as RAM Distributed RAM Time after the CLK of a write operation that the Tsucko CLK to A B C D outputs data written to the distributed RAM is stable on the A B C D outputs of the slice Time after the CLK of a write operation that the CLK to AMUX BMUX CMUX DMUX data written to the distributed RAM is stable on outputs the AMUX BMUX CMUX DMUX outputs of the slice Setup and Hold Times for a Slice LUT Configured as RAM Distributed RAM TsHcxo Tsucko 1 1 Time before after the clock that data must be Tps T ps Ton Tr comipuned as data mpita stable at the AI BI CI DI inputs of the slice Tos tRAM TpuH ram DIU configured as RAM Tac T Time before after the clock that address signals Ci AH T A B C D address inputs must be stable at the A B C D LUT inputs of Tas LRAM TAH LRAM the slice configured as RAM Time before after the clock that the write WE input enable signal must be stable at the WE input of the slice configured as RAM Tws TwH Tws LRAM TwH LRAM Clock CLK Tmpw Minimum clock pulse width for distributed TMPW_LRAM RAM Minimum clock period to meet add
59. r SRLs 2 Number of slices corresponding to the number of LUTs and flip flops supported in the device Table 1 3 Virtex 7 FPGA CLB Resources Device Slices SLICEL SLICEM n HAM i Flip Flops Kb 7V585T 91 050 63 300 27 750 364 200 6 938 3 469 728 400 7V2000T 305 400 219 200 86200 1 221 600 21 550 10775 2 443 200 7VX330T 51 000 33 450 17 550 204 000 4 388 2 194 408 000 7VX415T 64 400 38 300 26 100 257 600 6 525 3 263 515 200 7VX485T 75 900 43 200 32 700 303 600 8 175 4 088 607 200 7VX550T 86 6002 51 700 34 900 346 400 8 725 4 363 692 800 7VX690T 108 300 64 750 43 550 433 200 10 888 5444 866 400 7VX980T 153 000 97 650 55 350 612 000 13 838 6919 1 224 000 7VX1140T 178 000 107 200 70 800 712 000 17 700 8 850 1 424 000 7 Series FPGAs CLB User Guide UG474 v1 7 November 17 2014 www xilinx com Send Feedback H Chapter 1 Overview XILINX Table 1 3 Virtex 7 FPGA CLB Resources Cont d Shift Device Slices SLICEL sucem S nput Distributed RAM Register Flip Flops LUTs Kb Kb 7VH580T 90 700 55 300 35 400 362 800 8 850 4425 725 600 7VH870T 136 900 83 750 53 150 547 600 13 275 6 638 1 095 200 Notes 1 Each 7 series FPGA slice contains four LUTs and eight flip flops only SLICEMs can use their LUTs as distributed RAM or SRLs 2 Number of slices corresponding to the number of LUTs
60. rameters for a majority of the paths in Figure 2 24 Table 5 3 Slice Carry Chain Timing Parameters Parameter Function Description Propagation Delays for Slice Configured as a Carry Chain Taxa Taxp Taxc Taxp AX BX CX DX inputs to Propagation delay from the TBXB Tgxc Tgxp AMUX BMUX CMUX AX BX CX DX inputs of Texc Texp DMUX outputs the slice through the carry Tpxp logic to the AMUX BMUX CMUX DMUX outputs of the slice Taxcv Tgxcv Texcv Tpxcv AX BX CX DX inputs to Propagation delay from the COUT output AX BX CX DX inputs of the slice to the COUT output of the slice Tryp CIN input to COUT output Propagation delay from the CIN input of the slice to the COUT output of the slice TopcvA TorcyB Torcyc A B C D inputs to COUT Propagation delay from the Torcyp output A B C D LUT inputs of the slice to the COUT output of the slice 7 Series FPGAs CLB User Guide www xilinx com 61 UG474 v1 7 November 17 2014 send Feedback Chapter 5 Timing XILINX Table 5 3 Slice Carry Chain Timing Parameters Cont d Parameter Function Description Towa Tews Tewc TcInp CIN input to AMUX Propagation delay from the BMUX CMUX DMUX CIN input of the slice to outputs AMUX BMUX CMUX DMUX outputs of the slice using XOR sum Setup and Hold Times for a Slice Configured as a Carry Chain Towck TcKcIN CIN input Time before after the CLK that the CIN input of the slice must be stab
61. rding to the specific implementation and loading in a design The type of interconnect distance required to travel in the device and number of switch matrices to traverse factor into the total delay Most timing issues are addressed by examining the block delays and determining the impact of using fewer levels or faster paths If interconnect delays seem too long increase implementation effort levels or iterations to improve performance along with making sure that the required timing is in the constraints file Nets with critical timing or that are heavily loaded can often be improved by replicating the source of the net The dual 5 input LUT configuration of the slice simplifies the replication of logic in the same slice which minimizes any additional loads on the inputs to the source function Replicating logic in multiple slices gives the software more flexibility to place the sources independently Floorplanning is the process of specifying user placement constraints Floorplanning can be done either before or after automatic place and route but automatic place and route is always recommended first before specifying user floorplanning The Xilinx implementation tool provides a graphical view of placement It helps the designer choose between RTL coding and synthesis and implementation with extensive design exploration and analysis features Devices Using Stacked Silicon Interconnect SSI Technology As noted in the DS180 7 Series FPGAs Ov
62. re several primitives for the CLB storage elements including both flip flops and latches with different combinations of control signals available The FDRE primitive is shown in Figure 3 5 for an example For more information on the flip flop and latch primitives see UG953 Vivado Design Suite 7 Series FPGA and Zynq 7000 All Programmable SoC Libraries Guide UG474_c3_05_102910 Figure 3 5 FDRE Primitive Port Signals Data In D The data input provides new data one bit to be clocked into the flip flop Data Out Q Q is the one bit registered data output from the flip flop Clock C Either the rising edge or the falling edge of the clock is used to capture data and toggle the output The data and clock enable input pins have setup times referenced to the chosen edge of the clock The clock pin C has an inversion option at the slice level The clock signal can be active at the negative or positive edge of the clock without requiring other logic resources The default is the positive clock edge All flip flops in a slice must use the same clock and same clock polarity Clock Enable CE When CE is Low clock transitions are ignored with respect to the D input The set and reset inputs have priority over the clock enable Synchronous Reset R When R is High all other inputs are overridden and the data output Q is driven Low on the active clock transition This signal is available in the FDRE component The FDRE flip flop
63. ress write Tucr cycle ti ycle time Notes 1 Txs Setup Time before clock edge and Txy Hold Time after clock edge 2 Parameter includes AX BX CX DX configured as a data input DI2 64 www xilinx com 7 Series FPGAs CLB User Guide na Fecupack UG474 v1 7 November 17 2014 XILINX CLB Slice Distributed RAM Timing Model and Parameters Available in SLICEM Only Distributed RAM Timing Characteristics The timing characteristics of a distributed RAM implemented in a 7 series FPGA slice LUT configured as RAM are shown in Figure 5 5 lMcP 1 2 3 4 5 6 7 TMPW A B C D ADDR SA ATL 4 OE AI BI CI DI DI WE DATA OUT A B C D MEMFX 0 X 1 X 0 X MEME Output WRITE READ WRITE WRITE WRITE READ UG474_c5_05_110110 Figure 5 5 Slice Distributed RAM Timing Characteristics Clock Event 1 Write Operation During a Write operation the contents of the memory at the address on the ADDR inputs are changed The data written to this memory location is reflected on the A B C D outputs synchronously e At time Tyys before clock event 1 the write enable signal WE becomes valid High enabling the RAM for a Write operation At time Tag before clock event 1 the address 2 becomes valid at the A B C D inputs of the RAM e At time Tps before clock event 1 the DATA becomes valid 1 at the DI input of the RAM and is reflected on the A B C D outputs at time TsH
64. rw RAM Notes 1 This parameter includes a LUT configured as a two bit shift register 2 Tyg Setup Time before clock edge and Txy Hold Time after clock edge 3 Parameter includes AX BX CX DX configured as a data input DI2 or two bits with a common shift 7 Series FPGAs CLB User Guide UG474 v1 7 November 17 2014 www xilinx com Send Feedback 67 Chapter 5 Timing XILINX Slice SRL Timing Characteristics Figure 5 7 illustrates the timing characteristics of a shift register implemented in a 7 series FPGA slice a LUT configured as an SRL 1 2 3 4 5 Write Enable Shift In DI Gi Xu p a Address A B C D Data Out A B C D MSB Co X 7 MC31 DMUX X ND 1 UG474 c5 07 110110 Figure 5 7 Slice SRL Timing Characteristics Clock Event 1 Shift In During a write Shift In operation the single bit content of the register at the address on the A B C D inputs is changed as data is shifted through the SRL The data written to this register is reflected on the A B C D outputs synchronously if the address is unchanged during the clock event If the A B C D inputs are changed during a clock event the value of the data at the addressable output A B C D outputs is invalid At time Tws before clock event 1 the write enable signal WE becomes valid High enabling the SRL for the Write operation that follows At time Tps before clock event 1 the data becomes valid
65. s Configurable Logic Block User Guide part of an overall set of documentation on the 7 series FPGAs is available on the Xilinx 7 Series documentation website Guide Contents This manual contains these chapters e Chapter 1 Overview provides basic information needed for the majority of users including e CLB Overview is targeted at the new user e 7 Series CLB Features discusses what is new compared with the Spartan 6 and Virtex 6 FPGA families for the experienced user and provides design migration considerations e Device Resources indicates the number of resources per device and unity between different 7 series families e Recommended Design Flow provides the basics of using CLB resources and lists key aspects to consider e Pinout Planning discusses aspects of CLBs that might affect pin placement for a design e Chapter 2 Functional Details lists architectural specifics for each CLB feature e Chapter 3 Design Entry provides design entry guidelines and primitives for instantiation e Chapter 4 Applications provides examples that use the CLB resources in larger applications 7 Series FPGAs CLB User Guide www xilinx com 7 UG474 v1 7 November 17 2014 send Feedback Preface About This Guide XILINX e Chapter 5 Timing contains timing models and defines CLB timing specifications from the respective 7 series FPGA data sheet e Chapter 6 Advanced Topics discusses advanced features of the 7 series CLB
66. shift operation The data and clock enable input pins have setup times referenced to the chosen edge of CLK The clock pin CLK has an inversion option at the slice level The clock signal can be active at the negative or positive edge without requiring other logic resources The default is positive clock edge Data In D The data input provides new data one bit to be shifted into the shift register Clock Enable CE The clock enable pin affects shift functionality An inactive clock enable pin does not shift data into the shift register and does not write new data Activating the clock enable allows the data in D to be written to the first location and all data to be shifted by one location When available new data appears on output pins Q and the cascadable output pin Q31 Address A 4 0 The address input selects the bit range 0 to 31 to be read The nth bit is available on the output pin Q Address inputs have no effect on the cascadable output pin Q31 Q31 is always the last bit of the shift register bit 31 Data Out Q The data output Q provides the data value 1 bit selected by the address inputs 50 www xilinx com 7 Series FPGAs CLB User Guide pena Fecupack UG474 v1 7 November 17 2014 XILINX Primitives Data Out Q31 The data output Q31 provides the last bit value of the 32 bit shift register New data becomes available after each shift in operation Flip Flop Primitives There a
67. sses or logic inputs Each 5 input LUT output can optionally be registered in a flip flop Four such 6 input LUTs and their eight flip flops as well as multiplexers and arithmetic carry logic form a slice and two slices form a CLB Four flip flops per slice one per LUT can optionally be configured as latches In that case the remaining four flip flops in that slice must remain unused 7 Series FPGAs CLB User Guide www xilinx com 9 UG474 v1 7 November 17 2014 send Feedback Chapter 1 Overview XILINX Approximately two thirds of the slices are SLICEL logic slices and the rest are SLICEM which can also use their LUTs as distributed 64 bit RAM or as 32 bit shift registers SRL32 or as two SRL16s Modern synthesis tools take advantage of these highly efficient logic arithmetic and memory features Expert designers can also instantiate them 7 Series CLB Features The 7 series CLB is identical to that in the Virtex 6 FPGA family The CLB is very similar to that of the Spartan 6 FPGA family with these differences e Columnar architecture e Scales easily to higher densities e More routing between CLBs e SLICEL and SLICEM only no Spartan 6 FPGA SLICEX e All slices support carry logic e More optimized The common features in the CLB structure simplify design migration from the Spartan 6 and Virtex 6 families to the 7 series devices The unique floorplan means that location constraints should be removed before implementing des
68. te that many of these parameter names are also used for paths that do not include the wide multiplexers Table 5 2 Slice Multiplexer Timing Parameters Parameter Function Description Propagation Delays for Slice Using the Wide Multiplexers Taxa Taxp AX BX CX inputs to AMUX Propagation delay from the 1107 BMUX CMUX outputs AX BX CX inputs of the Toxp Texc slice through the select inputs of the multiplexers to the AMUX BMUX CMUX outputs of the slice 60 www xilinx com 7 Series FPGAs CLB User Guide send Feedback UG474 v1 7 November 17 2014 XILINX CLB Slice Carry Chain Timing Model and Parameters Table 5 2 Slice Multiplexer Timing Parameters Cont d Parameter Function Description TopAB A B C D inputs to AMUX Propagation delay from the Toppa TopBB BMUX CMUX outputs A B C D LUT inputs of the Topcp slice through the Toppp Toppc multiplexers to the AMUX BMUX CMUX outputs of the slice Timo 2 A C inputs to AMUX CMUX Propagation delay from the A outputs LUT inputs of the slice through the FZAMUX multiplexer to the AMUX output of the slice or propagation delay from the C LUT inputs of the slice through the F7BMUX multiplexer to the CMUX output of the slice CLB Slice Carry Chain Timing Model and Parameters Figure 2 24 page 43 illustrates a carry chain in a 7 series FPGA slice Slice Carry Chain Timing Parameters Table 5 3 shows the slice carry chain timing pa
69. ter the clock that data is stable at the AQ BQ CQ DQ CQ DQ outputs outputs of the slice sequential elements configured as a flip flop TckLo Latch Clock CLK to AQ BQ Time after the clock that data is stable at the AQ BQ CQ DQ CQ DQ outputs outputs of the slice sequential elements configured as a latch Tsucko FF Clock CLK to AMUX Time after the clock that data is stable at the AMUX BMUX BMUX CMUX DMUX CMUX DMUX outputs through the slice flip flops outputs Setup and Hold Times for Slice Sequential Elements Tas TaH A B C D inputs Time before after the CLK that data from the A B C D LUT inputs of the slice must be stable Tpick Tckpi AX BX CX DX inputs Time before after the CLK that data from the AX BX CX DX inputs of the slice must be stable Tcgck TckcE CE input Time before after the CLK that the CE input of the slice must be stable Tsnck TcksR SR input Time before after the CLK that the SR Set Reset input of the slice must be stable Set Reset Tgpw Tsgmum SR input Minimum pulse width for the SR Set Reset Tro SR input Propagation delay for an asynchronous Set Reset of the slice sequential elements Miscellaneous Tcgo CE input Propagation delay from CE enable to latch output at AQ BQ CO DO outputs Frog CLK input Toggle Frequency Maximum frequency that a slice flip flop can be clocked For export control Notes 1 This parameter includes a LUT configured as two five
70. the LUT This operation is asynchronous and independent of the clock signal Distributed RAM Summary Here is a summary of distributed RAM features e Single port and dual port modes are available in SLICEMs e A write operation requires one clock edge e Read operations are asynchronous Q output e The data input has a setup to clock timing specification Read Only Memory ROM Each function generator in both SLICEMs and SLICELs can implement a 64 x 1 bit ROM Three configurations are available ROM64X1 ROM128X1 and ROM256X1 ROM contents are loaded at each device configuration Table 2 4 shows the number of LUTs occupied by each ROM configuration size Table 2 4 ROM Configuration ROM Number of LUTs 64x 1 1 128 x1 2 256 x 1 4 Shift Registers Available in SLICEM Only A SLICEM function generator can also be configured as a 32 bit shift register without using the flip flops available in a slice Used in this way each LUT can delay serial data from 1 to 32 clock cycles The shiftin D DI1 LUT pin and shiftout Q31 MC31 LUT pin lines cascade LUTs to form larger shift registers The four LUTs in a SLICEM are thus cascaded to produce delays up to 128 clock cycles It is also possible to combine shift registers across more than one SLICEM There are no direct connections between slices to form longer shift registers nor is the MC31 output at LUT B C D available The resulting programmable delays can be used to
71. ts across more than one SLICEM However there are no direct connections between slices to form these shift registers Shift Register Data Flow Shift Operation Figure 5 7 shows a timing diagram of the shift operation The shift operation has these characteristics e Operates on a single clock edge e Enabled by an active High clock enable e The input D is loaded into the first bit of the shift register e Each bit is also shifted to the next highest bit position 38 www xilinx com 7 Series FPGAs CLB User Guide 225 UG474 v1 7 November 17 2014 XILINX Multiplexers In a cascadable shift register configuration the last bit is shifted out on the MC31 output The bit selected by the 5 bit address port A 4 0 appears on the Q output Dynamic Read Operation In a dynamic read operation The Q output is determined by the 5 bit address Each time a new address is applied to the 5 input address pins the new bit position value is available on the Q output after the time delay to access the LUT This operation is asynchronous and independent of the clock and clock enable signals Static Read Operation In a static read operation If the 5 bit address is fixed the Q output always uses the same bit position This mode implements any shift register length from 1 to 32 bits in one LUT The shift register length is N 1 where N is the input address 0 31 The Q output changes synchronously with each shift operation The pr
72. unters or adder subtractors automatically infer the carry logic More complex functions including multipliers can be implemented using the separate DSP48E1 slice The DSP48E1 slices in every device support many independent functions including multiply multiply accumulate multiply add three input add barrel shift wide bus multiplexing magnitude comparator bitwise logic functions pattern detection and wide counter The architecture also supports cascading multiple DSP48E1 slices to form wide math functions DSP filters and complex arithmetic For more details on the DSP48E1 slice see UG479 7 Series DSP48E1 Slice User Guide The choice between using CLB carry logic and the DSP48E1 slice depends on the application A small arithmetic function can be faster and lower power using the CLB carry logic rather than using an entire DSPA8EI slice When the DSP48E1 slices are fully utilized the CLB slices and carry logic provide an efficient alternative Using Carry Logic Carry logic can be inferred or instantiated Using macros designed for the 7 series FPGA can provide the most flexibility and efficiency especially for more complex functions Unimacros include adders counters comparators multipliers and multiplier accumulators The unimacros use the DSP48E1 slice The Xilinx IP includes a similar set of functions When defining these functions the user can specify whether the implementation should be in the CLB carry logic or in the
73. vanced Topics Using the Latch Function as Logic 71 Interconnect Resources ee 72 Devices Using Stacked Silicon Interconnect SSI Technology 73 www xilinx com 7 Series FPGAs CLB User Guide send Feedback UG474 v1 7 November 17 2014 XILINX Preface About This Guide Xilinx 7 series FPGAs include three FPGA families that are all designed for lowest power to enable a common design to scale across families for optimal power performance and cost The Artix 7 family is optimized for lowest cost and absolute power for the highest volume applications The Virtex 7 family is optimized for highest system performance and capacity The Kintex 7 family is an innovative class of FPGAs optimized for the best price performance This guide serves as a technical reference describing the 7 series FPGAs configurable logic blocks CLBs Usually logic synthesis assigns the CLB resources without system designer intervention It can be advantageous for the designer to understand certain CLB details including the varying capabilities of the look up tables LUTs the physical direction of the carry propagation the number and distribution of the available flip flops and the availability of the very efficient shift registers This guide describes these and other features of the CLB in detail This 7 Series FPGA
74. verification but are not necessary for initiating a design This chapter includes CLB Arrangement Overview of slice locations and features within the CLB Slice Description Complete details of SLICEM and SLICEL Look Up Table LUT Description of the logical function generators Storage Elements Description and controls for the latches and flip flops Distributed RAM Available in SLICEM Only SLICEM ability to use LUTs as writable memory Shift Registers Available in SLICEM Only SLICEM ability to use LUTs as shift registers Multiplexers Dedicated gates for combining LUTs into wide functions Carry Logic Dedicated gates and cascading to implement efficient arithmetic functions CLB Arrangement The CLBs are arranged in columns in the 7 series FPGAs The 7 series is the fourth generation to be based on the unique columnar approach provided by the ASMBLIM architecture ASMBL Architecture Xilinx created the Advanced Silicon Modular Block ASMBL architecture to enable FPGA platforms with varying feature mixes optimized for different application domains Through this innovation Xilinx offers a greater selection of devices enabling customers to select the FPGA with the right mix of features and capabilities for their specific design Figure 2 1 provides a high level description of the different types of column based resources 7 Series FPGAs CLB User Guide www xilinx com Sand Feedback 15 UG474 v1 7 November 17 2014 Chapter

7 Series FPGAs Configurable Logic Block User Guide (UG474)

Contents

Download Pdf Manuals

Related Search

Related Contents