Digital Integrated Circuit ASIC and FPGA
Uploaded by amgadyounis. Added: September 19, 2010.

Digital Integrated Circuit: ASIC and FPGA : 1
Theerayod Wiangtong
Electronic Department
Mahanakorn University of Technology

Outlines : 2
Introduction to Digital IC designs: History, Evolutions, etc.
Introduction to ASIC: CMOS IC designs
Introduction to FPGA: Chips and Design Process

What is this course all about? : 3
Digital integrated circuit design
Cell-based:
  CMOS devices and manufacturing technology.
  CMOS inverters and gates.
  Propagation delay, noise margins, and power dissipation.
  Sequential circuits.
  Arithmetic, interconnect, and memories.
Array-based:
  Programmable logic arrays.
  FPGA, HDL, design methodologies.

The First Computer : 4

ENIAC - The first electronic computer (1946) : 5

Intel Pentium (IV) microprocessor : 6
The computer 50 years later!

Moore’s Law : 7
In 1965, Gordon Moore noted that the number of transistors on a chip doubled every 18 to 24 months.
He made a prediction that semiconductor technology will double its effectiveness every 18 months.

Evolution in Complexity : 8

Transistor Counts : 9
[Figure: transistor count (K) versus year, 1975 to 2010, for 8086, 80286, i386, i486, Pentium®, Pentium® Pro, Pentium® II and Pentium® III, with a projection to 1 billion transistors. Source: Intel. Courtesy, Intel]

Moore’s law in Microprocessors : 10
[Figure: transistors (MT) versus year, 1970 to 2010, for 4004, 8008, 8080, 8085, 8086, 286, 386, 486, Pentium® proc and P6. Courtesy, Intel]
2X growth in 1.96 years!
Transistors on lead microprocessors double every 2 years

Die Size Growth : 11
[Figure: die size (mm) versus year, 1970 to 2010, for 4004 through P6. Courtesy, Intel]
~7% growth per year
~2X growth in 10 years
Die size grows by 14% to satisfy Moore’s Law

Frequency : 12
[Figure: frequency (MHz) versus year, 1970 to 2010, for 4004 through P6. Courtesy, Intel]
Lead microprocessors frequency doubles every 2 years

Power Dissipation : 13
[Figure: power (Watts) versus year, 1971 to 2000, for 4004 through P6. Courtesy, Intel]
Lead microprocessors power continues to increase

Power will be a major problem : 14
[Figure: power (Watts) versus year, 1971 to 2008, for 4004 through Pentium® proc, with annotations at 500W, 1.5KW, 5KW and 18KW. Courtesy, Intel]
Power delivery and dissipation will be prohibitive

Power density : 15
[Figure: power density (W/cm2) versus year, 1970 to 2010, for 4004 through P6. Courtesy, Intel]
Power density too high to keep junctions at low temp

Challenges in Digital Design : 16
“Microscopic Problems”
• Ultra-high speed design
• Interconnect
• Noise, Crosstalk
• Reliability, Manufacturability
• Power Dissipation
• Clock distribution.
Everything Looks a Little Different
“Macroscopic Issues”
• Time-to-Market
• Millions of Gates
• High-Level Abstractions
• Reuse & IP: Portability
• Predictability
• etc.
…and There’s a Lot of Them!

Design Abstraction Levels : 17
[Figure: abstraction levels DEVICE, CIRCUIT, GATE, MODULE, SYSTEM; the device level shows a transistor with n+ regions and S, G, D terminals]

Design Metrics : 18
How to evaluate performance of a digital circuit (gate, block, …)?
  Cost
  Reliability
  Scalability
  Speed (delay, operating frequency)
  Power dissipation
  Energy to perform a function

Cost of Integrated Circuits : 19
NRE (non-recurring engineering) costs
  design time and effort, mask generation
  one-time cost factor
Recurrent costs
  silicon processing, packaging, test
  proportional to volume
  proportional to chip area

NRE Cost is Increasing : 20

Die Cost : 21
[Figure: a wafer and a single die. From http://www.amd.com]
Going up to 12” (30cm)

Cost per Transistor : 22
[Figure: fabrication capital cost per transistor (¢-per-transistor, log scale from 1 down to 0.0000001) versus year, 1982 to 2012 (Moore’s law)]

Yield : 23

Some Examples : 24

Impact of Technology Scaling : 25

Goals of Technology Scaling : 26
Make things cheaper:
  Want to sell more functions (transistors) per chip for the same money
  Build same products cheaper, sell the same part for less money
  Price of a transistor has to be reduced
But also want to be faster, smaller, lower power

Technology Evolution (2000 data) : 27
International Technology Roadmap for Semiconductors
Node years: 2007/65nm, 2010/45nm, 2013/33nm, 2016/23nm

ITRS Technology Roadmap : 28

Terminology : 29
ITRS: International Technology Roadmap for Semiconductors. It is devised and intended for technology assessment only and is without regard to any commercial considerations pertaining to individual products or equipment.
DRAM Half-pitch: The common measure of the technology generation of a chip. It is half the distance between cells in a dynamic RAM memory chip. For example, in 2002, the DRAM half pitch was reduced to 130 nm (0.13 micron).

Half Pitch : 30

Technology Scaling: To preserve Moore’s Law : 31
Number of components per chip

2010 Outlook : 32
Performance
  2X/16 months
  1 TIP (tera instructions/s)
  30 GHz clock
Size
  No of transistors: 2 Billion
  Die: 40*40 mm
Power
  10kW!!
  Leakage: 1/3 active Power

Wafer Size : 33
450mm/2012f

NRE Cost: Example : 34
http://www.mosis.org/prices.html

Some interesting questions : 35
What will cause this model to break?
When will it break?
Will the model gradually slow down?
  Power and power density
  Leakage
  Process Variation
  Delay
  NRE Cost
  Etc.

Summary : 36
Digital integrated circuits have come a long way and still have quite some potential left for the coming decades.
Some interesting challenges ahead
  Getting a clear perspective on the challenges and potential solutions is the purpose of this book
Understanding the design metrics that govern digital design is crucial
  Cost, reliability, speed, power and energy dissipation

ASIC: CMOS and Manufacturing Process :
Theerayod Wiangtong
Electronic Department
Mahanakorn University of Technology

VLSI :
Integrated circuits: many transistors on one chip.
Very Large Scale Integration (VLSI): very many
Complementary Metal Oxide Semiconductor (CMOS)
  Fast, cheap, low power transistors
Today: How to build your own simple CMOS chip
  CMOS transistors
  Building logic gates from transistors
  Transistor layout and fabrication
Rest of the course: How to build a good CMOS chip

Class :

Silicon Lattice :
Transistors are built on a silicon substrate
Silicon is a Group IV material
Forms crystal lattice with bonds to four neighbors
http://jas.eng.buffalo.edu/education/solid/unitCell/home.html

Dopants :
Silicon is a semiconductor
Pure silicon has no free carriers and conducts poorly
Adding dopants increases the conductivity
  Group V: extra electron (n-type)
  Group III: missing electron, called hole (p-type)

p-n Junctions :
A junction between p-type and n-type semiconductor forms a diode.
Current flows only in one direction

MOS Structure :

nMOS Transistor :
Four terminals: gate, source, drain, body
Gate – oxide – body stack looks like a capacitor
  Gate and body are conductors
  SiO2 (oxide) is a very good insulator
  Called metal – oxide – semiconductor (MOS) capacitor
  Even though gate is no longer made of metal

nMOS Operation :
Body is commonly tied to ground (0 V)
When the gate is at a low voltage:
  P-type body is at low voltage
  Source-body and drain-body diodes are OFF
  No current flows, transistor is OFF

nMOS Operation Cont. :
When the gate is at a high voltage:
  Positive charge on gate of MOS capacitor
  Negative charge attracted to body
  Inverts a channel under gate to n-type
  Now current can flow through n-type silicon from source through channel to drain, transistor is ON

pMOS Transistor :
Similar, but doping and voltages reversed
  Body tied to high voltage (VDD)
  Gate low: transistor ON
  Gate high: transistor OFF
  Bubble indicates inverted behavior

Power Supply Voltage :
GND = 0 V
In 1980’s, VDD = 5V
VDD has decreased in modern processes
  High VDD would damage modern tiny transistors
  Lower VDD saves power
VDD = 3.3, 2.5, 1.8, 1.5, 1.2, 1.0, …

CMOS Fabrication :
CMOS transistors are fabricated on silicon wafer
Lithography process similar to printing press
On each step, different materials are deposited or etched
Easiest to understand by viewing both top and cross-section of wafer in a simplified manufacturing process

Photo-Lithographic Process :
[Figure: typical operations in a single photolithographic cycle (from [Fullman]): oxidation; photoresist coating; stepper exposure through an optical mask; photoresist development; acid etch; spin, rinse, dry; process step; photoresist removal (ashing).]
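The switching behaviour described on the nMOS and pMOS slides (nMOS ON when the gate is high, pMOS ON when the gate is low) can be sketched as a switch-level model. This is a minimal illustration only: the function names are hypothetical, logic levels are idealised as 0 (GND) and 1 (VDD), and the two-input NAND topology (parallel pMOS pull-up, series nMOS pull-down) is the usual textbook arrangement rather than something taken from these slides.

```python
def nmos_conducts(gate: int) -> bool:
    """nMOS with body at GND: the channel forms when the gate is high."""
    return gate == 1


def pmos_conducts(gate: int) -> bool:
    """pMOS with body at VDD: the transistor is ON when the gate is low."""
    return gate == 0


def inverter(a: int) -> int:
    """One pMOS between VDD and the output, one nMOS between the output and GND."""
    pull_up = pmos_conducts(a)
    pull_down = nmos_conducts(a)
    # Exactly one of the two paths conducts for a valid logic input.
    assert pull_up != pull_down
    return 1 if pull_up else 0


def nand2(a: int, b: int) -> int:
    """Assumed textbook NAND2: parallel pMOS pull-up, series nMOS pull-down."""
    pull_up = pmos_conducts(a) or pmos_conducts(b)
    pull_down = nmos_conducts(a) and nmos_conducts(b)
    assert pull_up != pull_down
    return 1 if pull_up else 0


if __name__ == "__main__":
    for a in (0, 1):
        print(f"inverter({a}) = {inverter(a)}")
    for a in (0, 1):
        for b in (0, 1):
            print(f"nand2({a}, {b}) = {nand2(a, b)}")
```

The model ignores voltages, delays and leakage; it only captures which network connects the output to VDD or GND.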
http://it.darden.virginia.edu/explore/content/index_frames.htm

Slide 51: Circuit Under Design & Layout View

Inverter Cross-section :
Typically use p-type substrate for nMOS transistors
Requires n-well for body of pMOS transistors

Well and Substrate Taps :
Substrate must be tied to GND and n-well to VDD
Metal to lightly-doped semiconductor forms poor connection called Schottky diode
Use heavily doped well and substrate contacts / taps

Inverter Mask Set :
Transistors and wires are defined by masks
Cross-section taken along dashed line

Detailed Mask Views :
Six masks
  n-well
  Polysilicon
  n+ diffusion
  p+ diffusion
  Contact
  Metal

Fabrication Steps :
Start with blank wafer
Build inverter from the bottom up
First step will be to form the n-well
  Cover wafer with protective layer of SiO2 (oxide)
  Remove layer where n-well should be built
  Implant or diffuse n dopants into exposed wafer
  Strip off SiO2

Oxidation :
Grow SiO2 on top of Si wafer
  900 – 1200 C with H2O or O2 in oxidation furnace

Photoresist :
Spin on photoresist
  Photoresist is a light-sensitive organic polymer
  Softens where exposed to light

Lithography :
Expose photoresist through n-well mask
Strip off exposed photoresist

Etch :
Etch oxide with hydrofluoric acid (HF)
  Seeps through skin and eats bone; nasty stuff!!!
Only attacks oxide where resist has been exposed

Strip Photoresist :
Strip off remaining photoresist
  Use mixture of acids called piranha etch
Necessary so resist doesn’t melt in next step

n-well :
n-well is formed with diffusion or ion implantation
Diffusion
  Place wafer in furnace with arsenic gas
  Heat until As atoms diffuse into exposed Si
Ion Implantation
  Blast wafer with beam of As ions
  Ions blocked by SiO2, only enter exposed Si

Strip Oxide :
Strip off the remaining oxide using HF
Back to bare wafer with n-well
Subsequent steps involve similar series of steps

Polysilicon :
Deposit very thin layer of gate oxide
  < 20 Å (6-7 atomic layers)
Chemical Vapor Deposition (CVD) of silicon layer
  Place wafer in furnace with Silane gas (SiH4)
  Forms many small crystals called polysilicon
  Heavily doped to be good conductor

Polysilicon Patterning :
Use same lithography process to pattern polysilicon

Self-Aligned Process :
Use oxide and masking to expose where n+ dopants should be diffused or implanted
N-diffusion forms nMOS source, drain, and n-well contact

N-diffusion :
Pattern oxide and form n+ regions
Self-aligned process where gate blocks diffusion
Polysilicon is better than metal for self-aligned gates because it doesn’t melt during later processing

N-diffusion cont. :
Historically dopants were diffused
Usually ion implantation today
But regions are still called diffusion

N-diffusion cont. :
Strip off oxide to complete patterning step

P-Diffusion :
Similar set of steps form p+ diffusion regions for pMOS source and drain and substrate contact

Contacts :
Now we need to wire together the devices
Cover chip with thick field oxide
Etch oxide where contact cuts are needed

Metallization :
Sputter on aluminum over whole wafer
Pattern to remove excess metal, leaving wires

Layout :

Layout :
Chips are specified with set of masks
Minimum dimensions of masks determine transistor size (and hence speed, cost, and power)
Feature size improves 30% every 3 years or so
Normalize for feature size when describing design rules

Transistor Layout :

Design Rules :
Interface between designer and process engineer
Guidelines for constructing process masks
Unit dimension: Minimum line width
  scalable design rules: lambda parameter
  absolute dimensions (micron rules)

CMOS Process Layers :

Layers in 0.25 µm CMOS process :

Intra-Layer Design Rules :
[Figure: intra-layer design rules; surviving labels: Metal2, 4, 3]

Vias and Contacts :

CMOS Inverter Layout :

Design tools :

Layout Editor :

Design Rule Checker :
poly_not_fet to all_diff minimum spacing = 0.14 um.
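A design rule such as the minimum-spacing message above can be illustrated with a small geometric check. The sketch below is an assumption-laden toy, not how any particular DRC tool works: `Rect`, `spacing`, `check_min_spacing` and `lambda_to_microns` are hypothetical names, shapes are axis-aligned rectangles in micrometres, spacing is measured as Euclidean edge-to-edge distance, and overlapping shapes are simply reported as spacing 0. The layer names and the 0.14 um limit are taken from the message on the slide; the coordinates and the lambda value are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    layer: str
    x0: float  # left edge, um
    y0: float  # bottom edge, um
    x1: float  # right edge, um
    y1: float  # top edge, um


def spacing(a: Rect, b: Rect) -> float:
    """Edge-to-edge distance between two rectangles (0.0 if they touch or overlap)."""
    dx = max(a.x0 - b.x1, b.x0 - a.x1, 0.0)
    dy = max(a.y0 - b.y1, b.y0 - a.y1, 0.0)
    return (dx * dx + dy * dy) ** 0.5


def check_min_spacing(shapes, layer_a, layer_b, min_space):
    """Return (shape, shape, distance) for every layer_a/layer_b pair closer than min_space."""
    wanted = {(layer_a, layer_b), (layer_b, layer_a)}
    violations = []
    for i, a in enumerate(shapes):
        for b in shapes[i + 1:]:
            if (a.layer, b.layer) in wanted:
                d = spacing(a, b)
                if d < min_space:
                    violations.append((a, b, d))
    return violations


def lambda_to_microns(rule_in_lambda: float, lam_um: float) -> float:
    """Scalable rules are written in lambda; multiply by lambda to get microns."""
    return rule_in_lambda * lam_um


if __name__ == "__main__":
    shapes = [
        Rect("poly_not_fet", 0.0, 0.0, 0.2, 1.0),
        Rect("all_diff", 0.3, 0.0, 1.0, 1.0),   # about 0.1 um from the poly
        Rect("all_diff", 0.5, 2.0, 1.0, 3.0),   # far away
    ]
    for a, b, d in check_min_spacing(shapes, "poly_not_fet", "all_diff", 0.14):
        print(f"{a.layer} to {b.layer} minimum spacing = 0.14 um violated (found {d:.3f} um)")
```

With scalable (lambda) rules the same check would be run with `min_space = lambda_to_microns(rule, lam)`, so one rule deck can follow the process as the feature size shrinks.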
Sticks Diagram :
Dimensionless layout entities
Only topology is important
Final layout generated by “compaction” program

Introduction to CPLD/FPGA Technology, Devices and Tools : 86
Theerayod Wiangtong
Electronic Department
Mahanakorn University of Technology

Outline : 87
Programmable Logic
CPLD
FPGA Architecture: Basic & Advanced
Examples
Features
Vendors and Devices
Design Tools

World of Integrated Circuits : 88
[Figure: classification of integrated circuits: Full-Custom ASICs, Semi-Custom ASICs, User Programmable (PLD, FPGA)]

ASIC : 89
ASIC: Application Specific Integrated Circuit
Designs must be sent for expensive and time consuming fabrication in semiconductor foundry
Designed all the way from behavioral description to physical layout

CPLD/FPGA : 90
CPLD: Complex Programmable Logic Device
FPGA: Field Programmable Gate Array
Small development overhead
No NRE (non-recurring engineering) costs
Quick time to market
No minimum quantity order
Reprogrammable

Which Way to Go? : 91
ASICs:
  High performance
  Low power
  Low cost in high volumes
CPLD/FPGAs:
  Off-the-shelf
  Low development cost
  Short time to market
  Reconfigurability

Other Advantages : 92
Manufacturing cycle for ASIC is very costly, lengthy and engages lots of manpower
Mistakes not detected at design time have large impact on development time and cost
FPGAs are perfect for rapid prototyping of digital circuits
Easy upgrades like in case of software
Unique applications
  Reconfigurable computing

Programmable Logic CPLD/FPGA : 93

Programmable Logic : 94
Programmable digital integrated circuit
Standard off-the-shelf parts
Desired functionality is implemented by configuring on-chip logic blocks and interconnections
Types of programmable logic:
  Complex PLDs (CPLD)
  Field programmable Gate Arrays (FPGA)

PLD - Sum of Products : 95
Programmable AND array followed by fixed fan-in OR gates
[Figure label: programmable switch or fuse]

PLD - Macrocell : 96
Can implement combinational or sequential logic
[Figure: macrocell with inputs A, B, C, AND plane, flip-flop (D, Q, Clock), MUX, Select and Enable]

CPLD Structure : 97
Integration of several PLD blocks with a programmable interconnect on a single chip

CPLD Example - Altera MAX7000 : 98
EPM7000 Series Block Diagram

CPLD Example - Altera MAX7000 : 99
EPM7000 Series Device Macrocell

FPGA Architecture : 100

FPGA - Generic Structure : 101
FPGA building blocks:
  Programmable logic blocks
    Implement combinatorial and sequential logic
  Programmable interconnect
    Wires to connect inputs and outputs to logic blocks
  Programmable I/O blocks
    Special logic blocks at the periphery of device for external connections

FPGA – Basic Logic Element : 102
LUT to implement combinatorial logic
Register for sequential circuits
Additional logic (not shown):
  Carry logic for arithmetic functions
  Expansion logic for functions requiring more than 4 inputs

Look-Up Tables (LUT) : 103
Look-up table with N-inputs can be used to implement any combinatorial function of N inputs
LUT is programmed with the truth-table
[Figure: truth-table, gate implementation, LUT implementation]

LUT Implementation : 104
Example: 3-input LUT
Based on multiplexers (pass transistors)
LUT entries stored in configuration memory cells
[Figure label: configuration memory cells]

Programmable Interconnect : 105
Interconnect hierarchy (not shown)
  Fast local interconnect
  Horizontal and vertical lines of various lengths

Switch Matrix Operation : 106
6 pass transistors per switch matrix interconnect point
Pass transistors act as programmable switches
Pass transistor gates are driven by configuration memory cells
[Figure: switch matrix before programming and after programming]

Configuration Storage Elements : 107
Static Random Access Memory (SRAM)
  each switch is a pass transistor controlled by the state of an SRAM bit
  FPGA needs to be configured at power-on
Flash Erasable Programmable ROM (Flash)
  each switch is a floating-gate transistor that can be turned off by injecting charge onto its gate.
  FPGA itself holds the program
  reprogrammable, even in-circuit
Fusible Links (“Antifuse”)
  Forms a low resistance path when electrically programmed
  one-time programmable in special programming machine
  radiation tolerant

FPGA Technology Roadmap : 108

Special Features : 109
Clock management
  PLL, DLL
  Eliminate clock skew between external clock input and on-chip clock
  Low-skew global clock distribution network
Embedded memory blocks
Support for various interface standards
High-speed serial I/Os
Embedded processor cores
DSP blocks

FPGA Vendors & Device Families : 110
Xilinx
  Virtex-II/Virtex-4: Feature-packed high-performance SRAM-based FPGA
  Spartan 3: low-cost feature reduced version
  CoolRunner: CPLDs
Altera
  Stratix/Stratix-II
    High-performance SRAM-based FPGAs
  Cyclone/Cyclone-II
    Low-cost feature reduced version for cost-critical applications
  MAX3000/7000 CPLDs
  MAX-II: Flash-based FPGA
Actel
  Anti-fuse based FPGAs
    Radiation tolerant
  Flash-based FPGAs
Lattice
  Flash-based FPGAs
  CPLDs (EEPROM)
QuickLogic
  ViaLink-based FPGAs

State of the Art in FPGAs : 111
90 nm process on 300 mm wafers
  Lower cost per function (LUT + register)
Smaller and faster transistors: Higher speed
  System speed up to 500 MHz
  Mainly through smart interconnects, clock management, dedicated circuits, flexible I/O.
  Integrated transceivers running at 10 Gigabits/sec
More Logic and Better Features:
  >100,000 LUTs & flip-flops
  >200 embedded RAMs, and same number 18 x 18 multipliers
  1156 pins (balls) with >800 GP I/O
  50 I/O standards, incl. LVDS with internal termination
  16 low-skew global clock lines
  Multiple clock management circuits
  On-chip microprocessor(s) and multi-Gbps transceivers

Latest Devices: Capacity & Features : 112
Xilinx Virtex-4
  90nm process
  Up to 960 I/Os
  >200000 logic cells
  Up to 552 18kb block RAMs (~10Mb RAM)
  192 DSP slices (18x18 multiplier-accumulator)
  20 digital clock managers (DCM)
  24 high-speed serial transceivers (622Mb/s to 11.1Gb/s)
  Up to four PowerPC 405 cores
Altera Stratix-II
  90nm process
  Up to 1170 I/Os
  179000 logic elements
  9.6Mb embedded RAM
  96 DSP blocks: 380 18x18 multipliers
  12 PLLs
  Serial I/O up to 1Gb/s
  No hard processor cores

ALTERA : 113

Device Families & Tools : 114

Device Roadmap : 115

Technology : 116

Logic Density : 117

Pricing Roadmap : 118

FLEX10K Basic Architecture : 119

Logic Array Block: FLEX10K : 120

Logic Element of FLEX10K : 121

Advanced Altera Architecture : 122

Stratix Device : 123

Stratix Device Family : 124

Altera: Embedded DSP Blocks : 125
Two DSP Block columns per device
Number varies by height of column
Can implement:
  Eight 9x9 multipliers
  Four 18x18 multipliers
  One 36x36 multiplier
Contains adder/subtractor/accumulator
Registered inputs can become shift register

Altera: Embedded DSP Block : 126

Embedded RAM : 127
Dual-Port RAM
  M512 – 512 x 1
  M4K – 4096 x 1
  M-RAM – 64K x 8

Embedded RAM Block : 128

ALTERA High Speed I/O : 129

Embedded Processor : 130
Soft Processor: NIOS 32bit @150MHz
Hard Processor: ARM922T 32bit RISC @200 MHz (Excalibur device)
Additional features
  Communication Controller
  Integrated MMU (Memory Management Unit)
  High-Speed Memory Interface
  C-Level Simulation
  Multi-Processor Support

NIOS II Family : 131

Max II Device : 132

Xilinx : 133

Product Overview : 134
[Figure: product overview; surviving labels: High Volume, Low Cost, High Performance, High Density, Low Power, Low Cost, CPLD, Rom-based]

Xilinx FPGA Families : 135
Old families
  XC3000, XC4000, XC5200
  Old 0.5µm, 0.35µm and 0.25µm technology. Not recommended for modern designs.
High-performance families
  Virtex (0.22µm)
  Virtex-E, Virtex-EM (0.18µm)
  Virtex-II, Virtex-II PRO (0.13µm)
Low Cost Family
  Spartan/XL – derived from XC4000
  Spartan-II – derived from Virtex
  Spartan-IIE – derived from Virtex-E
  Spartan-3

Basic FPGA Architecture Spartan-II : 136

CLB Structure : 137
Contains 2 slices
Each slice has 2 LUT-FF pairs with associated carry logic
Two 3-state buffers (BUFT) associated with each CLB, accessible by all CLB outputs

CLB Slice Structure : 138
Each slice contains two sets of the following:
  Four-input LUT
    Any 4-input logic function,
    or 16-bit x 1 sync RAM
    or 16-bit shift register
  Carry & Control
    Fast arithmetic logic
    Multiplier logic
    Multiplexer logic
  Storage element
    Latch or flip-flop
    Set and reset
    True or inverted inputs
    Sync. or async. control

Example: 5-Input Functions implemented using two LUTs : 139
[Figure: two LUTs combined to produce the output OUT]

Dedicated Expansion Multiplexers : 140
MUXF5 combines 2 LUTs to create
  Any 5-input function (LUT5)
  Or selected functions up to 9 inputs
  Or 4x1 multiplexer
MUXF6 combines 2 slices to form
  Any 6-input function (LUT6)
  Or selected functions up to 19 inputs
  8x1 multiplexer

Distributed RAM : 141
CLB LUT configurable as Distributed RAM
  A LUT equals 16x1 RAM
  Implements Single and Dual-Ports
  Cascade LUTs to increase RAM size
Synchronous write
Synchronous/Asynchronous read
  Accompanying flip-flops used for synchronous read

Fast Carry Logic : 142
Each CLB contains separate logic and routing for the fast generation of sum & carry signals
  Increases efficiency and performance of adders, subtractors, accumulators, comparators, and counters
Carry logic is independent of normal logic and routing resources
[Figure: carry logic routing from LSB to MSB]

Basic I/O Block Structure : 143
Each IOB can work as uni- or bi-directional I/O
Outputs can be forced into High Impedance
Inputs and outputs can be registered
Inputs can be delayed

Advanced Xilinx Architecture : 144

Virtex-II Pro : 145
130nm CMOS Copper Low-K
1200 I/Os, 1696 Pin Package
125,000 Logic Cells
10 Megabits of RAM
556 XTREME DSP Multipliers
16 3.125 Gbps transceivers
4 PowerPC CPUs

Virtex-II Pro : 146
PowerPC 405
Digital Clock Management (DCM) provides 16 independent clock domains
  Clock divide, multiply, phase shift
  Enhanced Phase Locked Loops (PLLs)
Routing Resources (90%)
Dedicated multipliers and memory

Block RAM : 147
Most efficient memory implementation
  Dedicated blocks of memory
Ideal for most memory requirements
  4 to 14 memory blocks
    4096 bits per block
  Use multiple blocks for larger memories
Builds both single and true dual-port RAMs

Dual-Port Bus Flexibility : 148
[Figure: RAMB4_S4_S16 primitive. Port A: 4-bit width, 1K-bit depth (WEA, ENA, RSTA, CLKA, ADDRA[9:0], DIA[3:0], DOA[3:0]). Port B: 16-bit width, 256-bit depth (WEB, ENB, RSTB, CLKB, ADDRB[7:0], DIB[15:0], DOB[15:0])]
Each port can be configured with a different data bus width
Provides easy data width conversion without any additional logic

Two Independent Single-Port RAMs : 149
[Figure: RAMB4_S1_S1 primitive. Ports A and B are each 1-bit wide and 2K-bit deep; the address inputs are labelled VCC, ADDR[10:0] and GND, ADDR[10:0]]
Can split a Dual-Port 4K RAM into two Single-Port 2K RAMs
Simultaneous independent access to each RAM
To access the lower RAM
  Tie the MSB address bit to Logic Low
To access the upper RAM
  Tie the MSB address bit to Logic High

Rocket I/O : 150
From 4 to 24 RocketIO MGTs per Virtex-II Pro™ device
Continuous operating range 622 Mbps to 3.125 Gbps
Virtex 4: 11.1 Gbps !!!

Embedded Processor : 151
Soft Processor: MicroBlaze 32bit @150MHz
Hard Processor: IBM PowerPC405 32bit RISC @300MHz (in Virtex-II Pro)
  Low Power Consumption: 0.9 mW/MHz
  Five-Stage Data Path Pipeline
  Hardware Multiply/Divide Unit
  Thirty-Two 32-bit General Purpose Registers
  Memory Management Unit (MMU)
  Dedicated On-Chip Memory (OCM) Interface
  Supports IBM CoreConnect™ Bus Architecture
  Debug and Trace Support

FPGA Design Tools : 152

Design process (1) : 153
Design and implement a simple unit permitting to speed up encryption with RC5-similar cipher with fixed key set on 8031 microcontroller. Unlike in the experiment 5, this time your unit has to be able to perform an encryption algorithm by itself, executing 32 rounds…..
    library IEEE;
    use ieee.std_logic_1164.all;
    use ieee.std_logic_unsigned.all;

    entity RC5_core is
      port(
        clock, reset, encr_decr: in std_logic;
        data_input: in std_logic_vector(31 downto 0);
        data_output: out std_logic_vector(31 downto 0);
        out_full: in std_logic;
        key_input: in std_logic_vector(31 downto 0);
        key_read: out std_logic  -- last port: no trailing semicolon
      );
    end RC5_core;  -- must match the entity name

[Flow diagram: Specification (Lab Experiments); VHDL description (Your Source Files); Functional simulation; Synthesis; Post-synthesis simulation]

Design process (2) : 154
[Flow diagram: Implementation; Configuration; Timing simulation; On chip testing]

Active-HDL : 155

Simulation and Synthesis Tools : 156

Logic Synthesis : 157

    architecture MLU_DATAFLOW of MLU is
      signal A1: STD_LOGIC;
      signal B1: STD_LOGIC;
      signal Y1: STD_LOGIC;
      signal MUX_0, MUX_1, MUX_2, MUX_3: STD_LOGIC;
    begin
      -- optional inversion of the inputs and of the output
      A1 <= A when (NEG_A = '0') else not A;
      B1 <= B when (NEG_B = '0') else not B;
      Y <= Y1 when (NEG_Y = '0') else not Y1;
      -- the four candidate logic functions
      MUX_0 <= A1 and B1;
      MUX_1 <= A1 or B1;
      MUX_2 <= A1 xor B1;
      MUX_3 <= A1 xnor B1;
      -- 4-to-1 multiplexer selected by L1, L0
      with (L1 & L0) select
        Y1 <= MUX_0 when "00",
              MUX_1 when "01",
              MUX_2 when "10",
              MUX_3 when others;
    end MLU_DATAFLOW;

[Figure: logic synthesis turns the VHDL description into a circuit netlist]

Features of synthesis tools : 158
Interpret RTL code
Produce synthesized circuit netlist in a standard EDIF format
Give preliminary performance estimates
Some can display circuit schematics corresponding to EDIF netlist

Implementation : 159
After synthesis the entire implementation process is performed by FPGA vendor tools
  Xilinx ISE foundation 11.1i
  Altera Quartus II 9.2
  3rd party tools for alliance version

Circuit Compilation : 160
1. Technology Mapping
2. Placement: assign a logical LUT to a physical location.
3. Routing: select wire segments and switches for interconnection.

Routing Example : 161
[Figure label: programmable connections]

FPGA Configuration : 162
Once a design is implemented, you must create a file that the FPGA can understand
  This file is called a bit stream or configuration file
The BIT file can be downloaded directly to the FPGA, or can be converted into a PROM file which stores the programming information

QUESTIONS? : 163
THANK YOU
Digital Integrated Circuit ASIC and FPGA amgadyounis Download Post to : URL : Related Presentations : Share Add to Flag Embed Email Send to Blogs and Networks Add to Channel Uploaded from authorPOINT lite Insert YouTube videos in PowerPont slides with aS Desktop Copy embed code: (To copy code, click on the text box) Embed: URL: Thumbnail: WordPress Embed Customize Embed The presentation is successfully added In Your Favorites. Views: 401 Category: Entertainment License: All Rights Reserved Like it (0) Dislike it (0) Added: September 19, 2010 This Presentation is Public Favorites: 1 Presentation Description No description available. Comments Posting comment... By: gmkr (17 month(s) ago) good........ Saving..... Post Reply Close Saving..... Edit Comment Close Premium member Presentation Transcript Digital Integrated Circuit: ASIC and FPGA : 1 Digital Integrated Circuit: ASIC and FPGA Theerayod Wiangtong Electronic Department Mahanakorn University of Technology Outlines : Outlines Introduction to Digital IC designs: History, Evolutions, etc. Introduction to ASIC: CMOS IC designs Introduction to FPGA: Chips and Design Processs 2 What is this course all about? : 3 What is this course all about? Digital integrated circuit design Cell-based: CMOS devices and manufacturing technology. CMOS inverters and gates. Propagation delay, noise margins, and power dissipation. Sequential circuits. Arithmetic, interconnect, and memories. Array based: Programmable logic arrays. FPGA, HDL, Design methodologies. The First Computer : 4 The First Computer ENIAC - The first electronic computer (1946) : 5 ENIAC - The first electronic computer (1946) Intel Pentium (IV) microprocessor : 6 Intel Pentium (IV) microprocessor The Computer in 50 years later! Moore’s Law : 7 Moore’s Law In 1965, Gordon Moore noted that the number of transistors on a chip doubled every 18 to 24 months. 
He made a prediction that semiconductor technology will double its effectiveness every 18 months Evolution in Complexity : 8 Evolution in Complexity Transistor Counts : 9 Transistor Counts 1,000,000 100,000 10,000 1,000 10 100 1 1975 1980 1985 1990 1995 2000 2005 2010 8086 80286 i386 i486 Pentium® Pentium® Pro K 1 Billion Transistors Source: Intel Projected Pentium® II Pentium® III Courtesy, Intel Moore’s law in Microprocessors : 10 Moore’s law in Microprocessors 4004 8008 8080 8085 8086 286 386 486 Pentium® proc P6 0.001 0.01 0.1 1 10 100 1000 1970 1980 1990 2000 2010 Year Transistors (MT) 2X growth in 1.96 years! Transistors on Lead Microprocessors double every 2 years Courtesy, Intel Die Size Growth : 11 Die Size Growth 4004 8008 8080 8085 8086 286 386 486 Pentium ® proc P6 1 10 100 1970 1980 1990 2000 2010 Year Die size (mm) ~7% growth per year ~2X growth in 10 years Die size grows by 14% to satisfy Moore’s Law Courtesy, Intel Frequency : 12 Frequency P6 Pentium ® proc 486 386 286 8086 8085 8080 8008 4004 0.1 1 10 100 1000 10000 1970 1980 1990 2000 2010 Year Frequency (Mhz) Lead Microprocessors frequency doubles every 2 years Doubles every2 years Courtesy, Intel Power Dissipation : 13 Power Dissipation P6 Pentium ® proc 486 386 286 8086 8085 8080 8008 4004 0.1 1 10 100 1971 1974 1978 1985 1992 2000 Year Power (Watts) Lead Microprocessors power continues to increase Courtesy, Intel Power will be a major problem : 14 Power will be a major problem 5KW 18KW 1.5KW 500W 4004 8008 8080 8085 8086 286 386 486 Pentium® proc 0.1 1 10 100 1000 10000 100000 1971 1974 1978 1985 1992 2000 2004 2008 Year Power (Watts) Power delivery and dissipation will be prohibitive Courtesy, Intel Power density : 15 Power density 4004 8008 8080 8085 8086 286 386 486 Pentium® proc P6 1 10 100 1000 10000 1970 1980 1990 2000 2010 Year Power Density (W/cm2) Power density too high to keep junctions at low temp Courtesy, Intel Challenges in Digital Design : 16 Challenges in Digital Design 
“Microscopic Problems”
• Ultra-high speed design
• Interconnect
• Noise, Crosstalk
• Reliability, Manufacturability
• Power Dissipation
• Clock distribution
Everything Looks a Little Different
“Macroscopic Issues”
• Time-to-Market
• Millions of Gates
• High-Level Abstractions
• Reuse & IP: Portability
• Predictability
• etc.
…and There’s a Lot of Them!

Design Abstraction Levels : 17
[Figure: design abstraction levels, from DEVICE (n+ regions with S, G, D terminals) through CIRCUIT, GATE and MODULE to SYSTEM]

Design Metrics : 18
How to evaluate performance of a digital circuit (gate, block, …)?
Cost
Reliability
Scalability
Speed (delay, operating frequency)
Power dissipation
Energy to perform a function

Cost of Integrated Circuits : 19
NRE (non-recurrent engineering) costs
  design time and effort, mask generation
  one-time cost factor
Recurrent costs
  silicon processing, packaging, test
  proportional to volume
  proportional to chip area

NRE Cost is Increasing : 20

Die Cost : 21
[Figure: single die and wafer, from http://www.amd.com. Going up to 12” (30 cm)]

Cost per Transistor : 22
[Figure: cost (¢ per transistor) versus year, 1982 to 2012, log scale from 1 down to 0.0000001. Fabrication capital cost per transistor (Moore’s law)]

Yield : 23

Some Examples : 24

Impact of Technology Scaling : 25

Goals of Technology Scaling : 26
Make things cheaper:
  Want to sell more functions (transistors) per chip for the same money
  Build same products cheaper, sell the same part for less money
  Price of a transistor has to be reduced
But also want to be faster, smaller, lower power

Technology Evolution (2000 data) : 27
International Technology Roadmap for Semiconductors
Node years: 2007/65nm, 2010/45nm, 2013/33nm, 2016/23nm

ITRS Technology Roadmap : 28

Terminology : 29
ITRS: International
Technology Roadmap for Semiconductors. It is devised and intended for technology assessment only and is without regard to any commercial considerations pertaining to individual products or equipment.
DRAM half-pitch: The common measure of the technology generation of a chip. It is half the distance between cells in a dynamic RAM memory chip. For example, by 2002 the DRAM half-pitch had been reduced to 130 nm (0.13 micron).

Half Pitch : 30

Technology Scaling: To preserve Moore’s Law : 31
Number of components per chip

2010 Outlook : 32
Performance
  2X/16 months
  1 TIP (tera instructions/s)
  30 GHz clock
Size
  No. of transistors: 2 billion
  Die: 40*40 mm
Power
  10 kW!!
  Leakage: 1/3 active power

Wafer Size : 33
450mm/2012f

NRE Cost: Example : 34
http://www.mosis.org/prices.html

Some interesting questions : 35
What will cause this model to break?
When will it break?
Will the model gradually slow down?
Power and power density
Leakage
Process variation
Delay
NRE cost
Etc.

Summary : 36
Digital integrated circuits have come a long way and still have quite some potential left for the coming decades.
Some interesting challenges ahead
  Getting a clear perspective on the challenges and potential solutions is the purpose of this book
Understanding the design metrics that govern digital design is crucial
  Cost, reliability, speed, power and energy dissipation

ASIC: CMOS and Manufacturing Process :
Theerayod Wiangtong
Electronic Department
Mahanakorn University of Technology

VLSI :
Integrated circuits: many transistors on one chip.
Very Large Scale Integration (VLSI): very many
Complementary Metal Oxide Semiconductor (CMOS)
  Fast, cheap, low-power transistors
Today: How to build your own simple CMOS chip
  CMOS transistors
  Building logic gates from transistors
  Transistor layout and fabrication
Rest of the course: How to build a good CMOS chip

Class :

Silicon Lattice :
Transistors are built on a silicon substrate
Silicon is a Group IV material
Forms crystal lattice with bonds to four neighbors
http://jas.eng.buffalo.edu/education/solid/unitCell/home.html

Dopants :
Silicon is a semiconductor
Pure silicon has no free carriers and conducts poorly
Adding dopants increases the conductivity
  Group V: extra electron (n-type)
  Group III: missing electron, called hole (p-type)

p-n Junctions :
A junction between p-type and n-type semiconductor forms a diode.
Current flows only in one direction

MOS Structure :

nMOS Transistor :
Four terminals: gate, source, drain, body
Gate – oxide – body stack looks like a capacitor
  Gate and body are conductors
  SiO2 (oxide) is a very good insulator
  Called metal – oxide – semiconductor (MOS) capacitor
  Even though gate is no longer made of metal

nMOS Operation :
Body is commonly tied to ground (0 V)
When the gate is at a low voltage:
  P-type body is at low voltage
  Source-body and drain-body diodes are OFF
  No current flows, transistor is OFF

nMOS Operation Cont. :
When the gate is at a high voltage:
  Positive charge on gate of MOS capacitor
  Negative charge attracted to body
  Inverts a channel under gate to n-type
  Now current can flow through n-type silicon from source through channel to drain, transistor is ON

pMOS Transistor :
Similar, but doping and voltages reversed
  Body tied to high voltage (VDD)
  Gate low: transistor ON
  Gate high: transistor OFF
  Bubble indicates inverted behavior

Power Supply Voltage :
GND = 0 V
In 1980’s, VDD = 5V
VDD has decreased in modern processes
  High VDD would damage modern tiny transistors
  Lower VDD saves power
VDD = 3.3, 2.5, 1.8, 1.5, 1.2, 1.0, …

CMOS Fabrication :
CMOS transistors are fabricated on silicon wafer
Lithography process similar to printing press
On each step, different materials are deposited or etched
Easiest to understand by viewing both top and cross-section of wafer in a simplified manufacturing process

Photo-Lithographic Process :
[Figure: typical operations in a single photolithographic cycle (from [Fullman]). Labels: oxidation; optical mask; process step; photoresist coating; photoresist removal (ashing); spin, rinse, dry; acid etch; photoresist development; stepper exposure]
Photo-Lithographic Process
http://it.darden.virginia.edu/explore/content/index_frames.htm

Slide 51: Circuit Under Design & Layout View

Inverter Cross-section :
Typically use p-type substrate for nMOS transistors
Requires n-well for body of pMOS transistors

Well and Substrate Taps :
Substrate must be tied to GND and n-well to VDD
Metal to lightly-doped semiconductor forms a poor connection called a Schottky diode
Use heavily doped well and substrate contacts / taps

Inverter Mask Set :
Transistors and wires are defined by masks
Cross-section taken along dashed line

Detailed Mask Views :
Six masks
  n-well
  Polysilicon
  n+ diffusion
  p+ diffusion
  Contact
  Metal

Fabrication Steps :
Start with blank wafer
Build inverter from the bottom up
First step will be to form the n-well
  Cover wafer with protective layer of SiO2 (oxide)
  Remove layer where n-well should be built
  Implant or diffuse n dopants into exposed wafer
  Strip off SiO2

Oxidation :
Grow SiO2 on top of Si wafer
  900 – 1200 °C with H2O or O2 in oxidation furnace

Photoresist :
Spin on photoresist
  Photoresist is a light-sensitive organic polymer
  Softens where exposed to light

Lithography :
Expose photoresist through n-well mask
Strip off exposed photoresist

Etch :
Etch oxide with hydrofluoric acid (HF)
  Seeps through skin and eats bone; nasty stuff!!!
Only attacks oxide where resist has been exposed

Strip Photoresist :
Strip off remaining photoresist
  Use mixture of acids called piranha etch
Necessary so resist doesn’t melt in next step

n-well :
n-well is formed with diffusion or ion implantation
Diffusion
  Place wafer in furnace with arsenic gas
  Heat until As atoms diffuse into exposed Si
Ion Implantation
  Blast wafer with beam of As ions
  Ions blocked by SiO2, only enter exposed Si

Strip Oxide :
Strip off the remaining oxide using HF
Back to bare wafer with n-well
Subsequent steps involve similar series of steps

Polysilicon :
Deposit very thin layer of gate oxide
  < 20 Å (6-7 atomic layers)
Chemical Vapor Deposition (CVD) of silicon layer
  Place wafer in furnace with silane gas (SiH4)
  Forms many small crystals called polysilicon
  Heavily doped to be good conductor

Polysilicon Patterning :
Use same lithography process to pattern polysilicon

Self-Aligned Process :
Use oxide and masking to expose where n+ dopants should be diffused or implanted
N-diffusion forms nMOS source, drain, and n-well contact

N-diffusion :
Pattern oxide and form n+ regions
Self-aligned process where gate blocks diffusion
Polysilicon is better than metal for self-aligned gates because it doesn’t melt during later processing

N-diffusion cont. :
Historically dopants were diffused
Usually ion implantation today
But regions are still called diffusion

N-diffusion cont. :
Strip off oxide to complete patterning step

P-Diffusion :
Similar set of steps form p+ diffusion regions for pMOS source and drain and substrate contact

Contacts :
Now we need to wire together the devices
Cover chip with thick field oxide
Etch oxide where contact cuts are needed

Metallization :
Sputter on aluminum over whole wafer
Pattern to remove excess metal, leaving wires

Layout :

Layout :
Chips are specified with set of masks
Minimum dimensions of masks determine transistor size (and hence speed, cost, and power)
Feature size improves 30% every 3 years or so
Normalize for feature size when describing design rules

Transistor Layout :

Design Rules :
Interface between designer and process engineer
Guidelines for constructing process masks
Unit dimension: minimum line width
  scalable design rules: lambda parameter
  absolute dimensions (micron rules)

CMOS Process Layers :

Layers in 0.25 µm CMOS process :

Intra-Layer Design Rules :
[Figure: intra-layer design rules; surviving labels: Metal2, 4, 3]

Vias and Contacts :

CMOS Inverter Layout :

Design tools :

Layout Editor :

Design Rule Checker :
poly_not_fet to all_diff minimum spacing = 0.14 um.
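The Design Rule Checker slide quotes one rule: poly_not_fet to all_diff minimum spacing = 0.14 um. As a rough illustration of what such a check involves, the sketch below measures the edge-to-edge spacing between rectangles on two layers and reports pairs that are too close. It is only a toy model: the `Rect` class, the `check_min_spacing` function and the example coordinates are hypothetical, shapes are assumed to be axis-aligned rectangles in integer nanometres, and a production checker (derived layers, arbitrary polygons, many rule types) is far more involved.

```python
import math
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle on one layer; coordinates in nanometres (hypothetical model)."""
    layer: str
    x1: int
    y1: int
    x2: int
    y2: int


def spacing(a: Rect, b: Rect) -> float:
    """Edge-to-edge distance between two rectangles; 0.0 if they touch or overlap."""
    dx = max(a.x1 - b.x2, b.x1 - a.x2, 0)
    dy = max(a.y1 - b.y2, b.y1 - a.y2, 0)
    return math.hypot(dx, dy)


def check_min_spacing(shapes, layer_a, layer_b, min_nm):
    """Return (shape, shape, distance) for every layer_a/layer_b pair closer than min_nm."""
    violations = []
    for a, b in combinations(shapes, 2):
        if {a.layer, b.layer} != {layer_a, layer_b}:
            continue  # this rule only concerns the two named layers
        d = spacing(a, b)
        if d < min_nm:
            violations.append((a, b, d))
    return violations


# Rule from the slide: poly_not_fet to all_diff minimum spacing = 0.14 um = 140 nm.
shapes = [
    Rect("poly_not_fet", 0, 0, 200, 1000),
    Rect("all_diff", 300, 0, 900, 1000),     # 100 nm from the poly: too close
    Rect("all_diff", 400, 1200, 900, 1800),  # diagonal neighbour, far enough away
]
violations = check_min_spacing(shapes, "poly_not_fet", "all_diff", 140)
for a, b, d in violations:
    print(f"{a.layer} to {b.layer}: spacing {d:.0f} nm < 140 nm")
```

Working in integer nanometres keeps the comparison against the 140 nm limit free of floating-point rounding in the coordinates themselves.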
Sticks Diagram :
Dimensionless layout entities
Only topology is important
Final layout generated by “compaction” program

Introduction to CPLD/FPGA Technology, Devices and Tools : 86
Theerayod Wiangtong
Electronic Department
Mahanakorn University of Technology

Outline : 87
Programmable Logic
CPLD
FPGA
Architecture: Basic & Advanced
Examples
Features
Vendors and Devices
Design Tools

World of Integrated Circuits : 88
Full-Custom ASICs
Semi-Custom ASICs
User Programmable
  PLD
  FPGA

ASIC : 89
ASIC: Application Specific Integrated Circuit
Designs must be sent for expensive and time-consuming fabrication in a semiconductor foundry
Designed all the way from behavioral description to physical layout

CPLD/FPGA : 90
CPLD: Complex Programmable Logic Device
FPGA: Field Programmable Gate Array
Small development overhead
No NRE (non-recurring engineering) costs
Quick time to market
No minimum quantity order
Reprogrammable

Which Way to Go? : 91
ASICs: high performance, low power, low cost in high volumes
CPLD/FPGAs: off-the-shelf, low development cost, short time to market, reconfigurability

Other Advantages : 92
Manufacturing cycle for ASIC is very costly, lengthy and engages lots of manpower
Mistakes not detected at design time have large impact on development time and cost
FPGAs are perfect for rapid prototyping of digital circuits
Easy upgrades like in case of software
Unique applications
  Reconfigurable computing

Programmable Logic CPLD/FPGA : 93

Programmable Logic : 94
Programmable digital integrated circuit
Standard off-the-shelf parts
Desired functionality is implemented by configuring on-chip logic blocks and interconnections
Types of programmable logic:
  Complex PLDs (CPLD)
  Field Programmable Gate Arrays (FPGA)

PLD - Sum of Products : 95
Programmable AND array followed by fixed fan-in OR gates
Programmable switch or fuse

PLD - Macrocell : 96
Can implement combinational or sequential logic
[Figure: macrocell; labels: inputs A, B, C, AND plane, flip-flop (D, Q, Clock), MUX, Select, Enable]

CPLD Structure : 97
Integration of several PLD blocks with a programmable interconnect on a single chip

CPLD Example - Altera MAX7000 : 98
EPM7000 Series Block Diagram

CPLD Example - Altera MAX7000 : 99
EPM7000 Series Device Macrocell

FPGA Architecture : 100

FPGA - Generic Structure : 101
FPGA building blocks:
Programmable logic blocks
  Implement combinatorial and sequential logic
Programmable interconnect
  Wires to connect inputs and outputs to logic blocks
Programmable I/O blocks
  Special logic blocks at the periphery of device for external connections

FPGA – Basic Logic Element : 102
LUT to implement combinatorial logic
Register for sequential circuits
Additional logic (not shown):
Carry logic for arithmetic functions
Expansion
logic for functions requiring more than 4 inputs

Look-Up Tables (LUT) : 103
Look-up table with N inputs can be used to implement any combinatorial function of N inputs
LUT is programmed with the truth table
[Figure: truth table, gate implementation, LUT implementation]

LUT Implementation : 104
Example: 3-input LUT
Based on multiplexers (pass transistors)
LUT entries stored in configuration memory cells

Programmable Interconnect : 105
Interconnect hierarchy (not shown)
  Fast local interconnect
  Horizontal and vertical lines of various lengths

Switch Matrix Operation : 106
6 pass transistors per switch matrix interconnect point
Pass transistors act as programmable switches
Pass transistor gates are driven by configuration memory cells
[Figure: switch matrix before programming and after programming]

Configuration Storage Elements : 107
Static Random Access Memory (SRAM)
  each switch is a pass transistor controlled by the state of an SRAM bit
  FPGA needs to be configured at power-on
Flash Erasable Programmable ROM (Flash)
  each switch is a floating-gate transistor that can be turned off by injecting charge onto its gate.
  FPGA itself holds the program
  reprogrammable, even in-circuit
Fusible Links (“Antifuse”)
  forms a low-resistance path when electrically programmed
  one-time programmable in special programming machine
  radiation tolerant

FPGA Technology Roadmap : 108

Special Features : 109
Clock management
  PLL, DLL
  Eliminate clock skew between external clock input and on-chip clock
  Low-skew global clock distribution network
Embedded memory blocks
Support for various interface standards
High-speed serial I/Os
Embedded processor cores
DSP blocks

FPGA Vendors & Device Families : 110
Xilinx
  Virtex-II/Virtex-4: feature-packed high-performance SRAM-based FPGA
  Spartan 3: low-cost feature-reduced version
  CoolRunner: CPLDs
Altera
  Stratix/Stratix-II: high-performance SRAM-based FPGAs
  Cyclone/Cyclone-II: low-cost feature-reduced version for cost-critical applications
  MAX3000/7000: CPLDs
  MAX-II: Flash-based FPGA
Actel
  Anti-fuse based FPGAs: radiation tolerant
  Flash-based FPGAs
Lattice
  Flash-based FPGAs
  CPLDs (EEPROM)
QuickLogic
  ViaLink-based FPGAs

State of the Art in FPGAs : 111
90 nm process on 300 mm wafers
  Lower cost per function (LUT + register)
Smaller and faster transistors: higher speed
  System speed up to 500 MHz
  Mainly through smart interconnects, clock management, dedicated circuits, flexible I/O
  Integrated transceivers running at 10 Gigabits/sec
More logic and better features:
  >100,000 LUTs & flip-flops
  >200 embedded RAMs, and same number of 18 x 18 multipliers
  1156 pins (balls) with >800 GP I/O
  50 I/O standards, incl.
LVDS with internal termination
  16 low-skew global clock lines
  Multiple clock management circuits
  On-chip microprocessor(s) and multi-Gbps transceivers

Latest Devices: Capacity & Features : 112
Xilinx Virtex-4
  90nm process
  Up to 960 I/Os
  >200000 logic cells
  Up to 552 18kb block RAMs (~10Mb RAM)
  192 DSP slices (18x18 multiplier-accumulator)
  20 digital clock managers (DCM)
  24 high-speed serial transceivers (622Mb/s to 11.1Gb/s)
  Up to four PowerPC 405 cores
Altera Stratix-II
  90nm process
  Up to 1170 I/Os
  179000 logic elements
  9.6Mb embedded RAM
  96 DSP blocks: 380 18x18 multipliers
  12 PLLs
  Serial I/O up to 1Gb/s
  No hard processor cores

ALTERA : 113

Device Families & Tools : 114

Device Roadmap : 115

Technology : 116

Logic Density : 117

Pricing Roadmap : 118

FLEX10K Basic Architecture : 119

Logic Array Block: FLEX10K : 120

Logic Element of FLEX10K : 121

Advanced Altera Architecture : 122

Stratix Device : 123

Stratix Device Family : 124

Altera: Embedded DSP Blocks : 125
Two DSP block columns per device
Number varies by height of column
Can implement:
  Eight 9x9 multipliers
  Four 18x18 multipliers
  One 36x36 multiplier
Contains adder/subtractor/accumulator
Registered inputs can become shift register

Altera: Embedded DSP Block : 126

Embedded RAM : 127
Dual-Port RAM
M512 – 512 x 1
M4K – 4096 x 1
M-RAM – 64K x 8

Embedded RAM Block : 128

ALTERA High Speed I/O : 129

Embedded Processor : 130
Soft Processor: NIOS 32-bit @150MHz
Hard Processor: ARM922T 32-bit RISC @200 MHz (Excalibur device)
Additional features
  Communication Controller
  Integrated MMU (Memory Management Unit)
  High-Speed Memory
Interface
  C-Level Simulation
  Multi-Processor Support

NIOS II Family : 131

Max II Device : 132

Xilinx : 133

Product Overview : 134
[Figure: product overview; surviving labels: High Volume, Low Cost, High Performance, High Density, Low Power, Low Cost, CPLD, ROM-based]

Xilinx FPGA Families : 135
Old families
  XC3000, XC4000, XC5200
  Old 0.5µm, 0.35µm and 0.25µm technology. Not recommended for modern designs.
High-performance families
  Virtex (0.22µm)
  Virtex-E, Virtex-EM (0.18µm)
  Virtex-II, Virtex-II PRO (0.13µm)
Low-cost family
  Spartan/XL – derived from XC4000
  Spartan-II – derived from Virtex
  Spartan-IIE – derived from Virtex-E
  Spartan-3

Basic FPGA Architecture Spartan-II : 136

CLB Structure : 137
Contains 2 slices
Each slice has 2 LUT-FF pairs with associated carry logic
Two 3-state buffers (BUFT) associated with each CLB, accessible by all CLB outputs

CLB Slice Structure : 138
Each slice contains two sets of the following:
Four-input LUT
  Any 4-input logic function,
  or 16-bit x 1 sync RAM
  or 16-bit shift register
Carry & Control
  Fast arithmetic logic
  Multiplier logic
  Multiplexer logic
Storage element
  Latch or flip-flop
  Set and reset
  True or inverted inputs
  Sync. or async.
control

Example: 5-Input Functions implemented using two LUTs : 139
[Figure; surviving label: OUT]

Dedicated Expansion Multiplexers : 140
MUXF5 combines 2 LUTs to create
  Any 5-input function (LUT5)
  Or selected functions up to 9 inputs
  Or 4x1 multiplexer
MUXF6 combines 2 slices to form
  Any 6-input function (LUT6)
  Or selected functions up to 19 inputs
  8x1 multiplexer

Distributed RAM : 141
CLB LUT configurable as Distributed RAM
  A LUT equals 16x1 RAM
  Implements Single and Dual-Ports
  Cascade LUTs to increase RAM size
Synchronous write
Synchronous/Asynchronous read
  Accompanying flip-flops used for synchronous read

Fast Carry Logic : 142
Each CLB contains separate logic and routing for the fast generation of sum & carry signals
  Increases efficiency and performance of adders, subtractors, accumulators, comparators, and counters
Carry logic is independent of normal logic and routing resources
[Figure: carry logic routing, LSB to MSB]

Basic I/O Block Structure : 143
Each IOB can work as uni- or bi-directional I/O
Outputs can be forced into High Impedance
Inputs and outputs can be registered
Inputs can be delayed

Advanced Xilinx Architecture : 144

Virtex-II Pro : 145
130nm CMOS
Copper, Low-K
1200 I/Os, 1696 Pin Package
125,000 Logic Cells
10 Megabits of RAM
556 XTREME DSP Multipliers
16 3.125 Gbps transceivers
4 PowerPC CPUs

Virtex-II Pro : 146
PowerPC 405
Digital Clock Management (DCM) provides 16 independent clock domains
  Clock divide, multiply, phase shift
  Enhanced Phase Locked Loops (PLLs)
Routing Resources (90%)
Dedicated multipliers and memory

Block RAM : 147
Most efficient memory implementation
  Dedicated blocks of memory
Ideal for most memory requirements
  4 to 14 memory blocks
  4096 bits per block
  Use multiple blocks for larger memories
Builds both single and true dual-port RAMs

Dual-Port
Bus Flexibility : 148
[Figure: RAMB4_S4_S16 primitive. Port A: 4-bit width, 1K-bit depth (WEA, ENA, RSTA, CLKA, ADDRA[9:0], DIA[3:0], DOA[3:0]). Port B: 16-bit width, 256-bit depth (WEB, ENB, RSTB, CLKB, ADDRB[7:0], DIB[15:0], DOB[15:0])]
Each port can be configured with a different data bus width
Provides easy data width conversion without any additional logic

Two Independent Single-Port RAMs : 149
[Figure: RAMB4_S1_S1 primitive, each port 1-bit wide and 2K-bit deep; address inputs labelled VCC, ADDR[10:0] and GND, ADDR[10:0]]
Can split a dual-port 4K RAM into two single-port 2K RAMs
  Simultaneous independent access to each RAM
To access the lower RAM
  Tie the MSB address bit to Logic Low
To access the upper RAM
  Tie the MSB address bit to Logic High

Rocket I/O : 150
From 4 to 24 RocketIO MGTs per Virtex-II Pro™ device
Continuous operating range 622 Mbps to 3.125 Gbps
Virtex 4: 11.1 Gbps !!!

Embedded Processor : 151
Soft Processor: MicroBlaze 32-bit @150MHz
Hard Processor: IBM PowerPC405 32-bit RISC @300MHz (in Virtex-II Pro)
  Low Power Consumption: 0.9 mW/MHz
  Five-Stage Data Path Pipeline
  Hardware Multiply/Divide Unit
  Thirty-Two 32-bit General Purpose Registers
  Memory Management Unit (MMU)
  Dedicated On-Chip Memory (OCM) Interface
  Supports IBM CoreConnect™ Bus Architecture
  Debug and Trace Support

FPGA Design Tools : 152

Design process (1) : 153
Design and implement a simple unit that speeds up encryption with an RC5-like cipher with a fixed key set on an 8031 microcontroller. Unlike in experiment 5, this time your unit has to be able to perform the encryption algorithm by itself, executing 32 rounds…..
library IEEE;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;

entity RC5_core is
  port(
    clock, reset, encr_decr: in std_logic;
    data_input: in std_logic_vector(31 downto 0);
    data_output: out std_logic_vector(31 downto 0);
    out_full: in std_logic;
    key_input: in std_logic_vector(31 downto 0);
    key_read: out std_logic
  );
end RC5_core;

[Figure: design flow; labels: Specification (Lab Experiments), VHDL description (Your Source Files), Functional simulation, Synthesis, Post-synthesis simulation]

Design process (2) : 154
[Figure: design flow, continued; labels: Implementation, Configuration, Timing simulation, On chip testing]

Active-HDL : 155

Simulation and Synthesis Tools : 156

Logic Synthesis : 157
architecture MLU_DATAFLOW of MLU is
  signal A1: STD_LOGIC;
  signal B1: STD_LOGIC;
  signal Y1: STD_LOGIC;
  signal MUX_0, MUX_1, MUX_2, MUX_3: STD_LOGIC;
begin
  A1 <= A when (NEG_A='0') else not A;
  B1 <= B when (NEG_B='0') else not B;
  Y <= Y1 when (NEG_Y='0') else not Y1;
  MUX_0 <= A1 and B1;
  MUX_1 <= A1 or B1;
  MUX_2 <= A1 xor B1;
  MUX_3 <= A1 xnor B1;
  with (L1 & L0) select
    Y1 <= MUX_0 when "00",
          MUX_1 when "01",
          MUX_2 when "10",
          MUX_3 when others;
end MLU_DATAFLOW;

[Figure: VHDL description, Logic Synthesis, Circuit netlist]

Features of synthesis tools : 158
Interpret RTL code
Produce synthesized circuit netlist in a standard EDIF format
Give preliminary performance estimates
Some can display circuit schematics corresponding to EDIF netlist

Implementation : 159
After synthesis the entire implementation process is performed by FPGA vendor tools
  Xilinx ISE Foundation 11.1i
  Altera Quartus II 9.2
  3rd party tools for alliance version

Circuit Compilation : 160
Placement: assign a logical LUT to a physical location.
Routing: select wire segments and switches for interconnection.
1. Technology Mapping
2. Placement
3.
Routing

Routing Example : 161
Programmable Connections

FPGA Configuration : 162
Once a design is implemented, you must create a file that the FPGA can understand
  This file is called a bit stream or configuration file
The BIT file can be downloaded directly to the FPGA, or can be converted into a PROM file which stores the programming information

QUESTIONS? : 163
THANK YOU
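To connect the FPGA Configuration slide back to the Look-Up Tables (LUT) slides: part of what a configuration file carries is the content of each LUT. The sketch below is a toy model of that idea and not any vendor's bit stream format; `program_lut`, `read_lut` and the majority-function example are hypothetical names chosen for illustration. It fills a 3-input LUT with a truth table and then reads it back by using the inputs as an address, much as the multiplexer-based LUT selects one configuration memory cell.

```python
from itertools import product


def program_lut(func, n_inputs):
    """Build the LUT contents: one configuration bit per input combination.

    Entry i holds func's output for the inputs whose binary encoding is i,
    with the first input as the most significant address bit.
    """
    return [int(bool(func(*bits))) for bits in product((0, 1), repeat=n_inputs)]


def read_lut(config_bits, *inputs):
    """Look up the output: the inputs form an address that selects one stored bit."""
    address = 0
    for bit in inputs:
        address = (address << 1) | (bit & 1)
    return config_bits[address]


def majority(a, b, c):
    """Example logic function: 1 when at least two inputs are 1."""
    return (a & b) | (a & c) | (b & c)


config = program_lut(majority, 3)            # 2**3 = 8 configuration bits
packed = int("".join(map(str, config)), 2)   # the same bits packed into one integer

# The programmed LUT reproduces the function for every input combination.
for a, b, c in product((0, 1), repeat=3):
    assert read_lut(config, a, b, c) == majority(a, b, c)
print(config, format(packed, "08b"))
```

An N-input LUT needs 2**N stored bits, which is why the four-input LUT on the CLB Slice Structure slide can also serve as a 16 x 1 RAM.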