dsp processors


Presentation Description

introduction to dsp processors


Presentation Transcript



Module 1 : 

Module 1

Syllabus : 

Syllabus Architecture of TMS320C6x: functional units, fetch and execute, pipelining, registers, addressing modes, instruction sets, timers, interrupts, serial ports, DMA, memory.

Introduction to DSP : 

Introduction to DSP A digital signal processor (DSP) is a type of microprocessor that is optimized for digital signal processing. It integrates system control and math-intensive functions. Its advantages are speed, cost and energy efficiency. It is a key component in many communication, medical, military and industrial products.

Slide 5: 

Alternatives FPGA: Field-programmable gate arrays can be reconfigured within a system, but they are more expensive and have high power dissipation. ASIC: Application-specific integrated circuits can perform specific functions extremely well and can be made quite power efficient. But since ASICs are not field-programmable, their functionality cannot be iteratively changed or updated during product development.

Why go digital? : 

Why go digital? Digital signal processing techniques are now so powerful that sometimes it is extremely difficult, if not impossible, for analogue signal processing to achieve similar performance. Examples: FIR filter with linear phase. Adaptive filters.

Slide 7: 

With DSP it is easy to: Change applications. Correct applications. Update applications. Additionally DSP reduces: Noise susceptibility. Chip count. Development time. Cost. Power consumption.

Why do we need DSP processors? : 

Why do we need DSP processors? Use a DSP processor when the following are required: cost saving, smaller size, low power consumption, and processing of many "high" frequency signals in real time.

Applications : 


Slide 10: 

General DSP System Block Diagram PERIPHERALS Central Processing Unit Internal Memory Internal Buses External Memory

Classification of DSP : 

Classification of DSP Von Neumann's architecture Harvard architecture Super Harvard architecture



Slide 13: 

Von Neumann architecture: one shared memory for instructions (program) and data, with one data bus and one address bus between processor and memory. Instructions and data have to be fetched in sequential order (known as the von Neumann bottleneck), limiting the operation bandwidth. Its design is simple. It is mostly used to interface to external memory.



Slide 15: 

Harvard architecture: uses physically separate memories for instructions and data, requiring dedicated buses for each of them. Instructions and operands can therefore be fetched simultaneously. Different program and data bus widths are possible, allowing program and data memory to be better optimized to the architectural requirements. E.g.: if the instruction format requires 14 bits, then the program bus and memory can be made 14 bits wide, while the data bus and data memory remain 8 bits wide.

Efficient Memory Access : 

Efficient Memory Access [diagram: memory and bus organization of general-purpose processors, early DSP processors, and more optimized DSP processors]

Classification of DSP : 

Classification of DSP Fixed point: performs integer operations. Floating point: performs both integer and floating-point operations. It is the application that dictates which device and platform to use in order to achieve optimum performance at a low cost. For educational purposes, use the floating-point device, as it can support both fixed-point and floating-point operations. Fixed point: TMS320C1x, C2x, C5x ... Floating point: TMS320C3x, C4x, C67x ...

C versus Assembly language : 

C versus Assembly language Programs in C are more flexible and quicker to develop. Programs in assembly often have better performance; they run faster and use less memory, resulting in lower cost.

Slide 22: 

C / Assembly ? How complicated is the program? If it is large and intricate, you will probably want to use C. If it is small and simple, assembly may be a good choice. Are you pushing the maximum speed of the DSP? If so, assembly will give you the last drop of performance from the device. For less demanding applications, you should consider using C.

Slide 23: 

How many programmers will be working together? If the project is large enough for more than one programmer, lean toward C and use in-line assembly only for time-critical segments. Which is more important, product cost or development cost? If it is product cost, choose assembly; if it is development cost, choose C. What is your background? If you are experienced in assembly (on other microprocessors), choose assembly for your DSP. If your previous work is in C, choose C for your DSP.

The Digital Signal Processor Market : 

The Digital Signal Processor Market

Digital Signal Processor market is dominated by 4 companies. : 

Digital Signal Processor market is dominated by 4 companies. Analog Devices (www.analog.com/dsp): ADSP-21xx, 16-bit fixed point; ADSP-21xxx, 32-bit floating and fixed point. Lucent Technologies (www.lucent.com): DSP16xxx, 16-bit fixed point; DSP32xx, 32-bit floating point. Motorola (www.mot.com): DSP561xx, 16-bit fixed point; DSP560xx, 24-bit fixed point; DSP96002, 32-bit floating point. Texas Instruments (www.ti.com): TMS320Cxx, 16-bit fixed point; TMS320Cxx, 32-bit floating point.

Slide 27: 

TMS320 Family Best Performance & Ease-of-Use

Slide 28: 

C6000 Roadmap [figure: performance versus time. 1st generation: fixed-point C6201, C6202, C6203, C6204, C6205, C6211 and floating-point C67x devices C6701, C6711, C6712. 2nd generation C64x DSP (fixed point): C6411, C6414, C6415, C6416, for general-purpose, media gateway and 3G wireless infrastructure applications.]

Feature of the TMS320C6x : 

Feature of the TMS320C6x The Texas Instruments TMS320C6x family of microprocessors is one of the largest VLIW success stories to date. This family of processors is built to deliver speed. Family members have different size, cost, memory, peripheral and power consumption specifications. E.g.: Fixed-point C6201 version: 5-ns instruction cycle time, 200-MHz clock rate, performance of up to 1600 MIPS, eight 32-bit instructions per cycle. Floating-point C6701 version: can operate at 167 MHz, 6-ns instruction cycle time, 1 giga floating-point operations per second (GFLOPS).

Very Long Instruction Word (VLIW )‏ : 

Very Long Instruction Word (VLIW) refers to a CPU architecture designed to take advantage of instruction-level parallelism. It executes operations in parallel based on a fixed schedule determined when programs are compiled. The order of execution of operations (including which operations can execute simultaneously) is handled by the compiler, hence the processor does not need the scheduling hardware. VLIW CPUs offer significant computational power with less hardware complexity but greater compiler complexity.

VLIW architectures execute multiple instructions per cycle and use simple, regular instruction sets. More parallelism, higher performance. Better compiler targets.

Slide 34: 

Disadvantages of VLIW Architectures New kinds of programmer/compiler complexity. The programmer (or code-generation tool) must keep track of instruction scheduling. Deep pipelines and long latencies can be confusing and may make peak performance elusive. Increased memory use. High program memory bandwidth requirements. High power consumption. Misleading MIPS ratings.

VelociTI™ : 

VelociTI™ The VLIW modification done by TI is called VelociTI. It reduces code size and increases performance when instructions reside off-chip. The C6x architecture is based on the high-performance, advanced VelociTI very-long-instruction-word (VLIW) architecture developed by Texas Instruments (TI), an excellent choice for multichannel and multifunction applications (several instructions are captured and processed simultaneously).

TMS320C6x with VelociTI Enables Cost-Effective Solutions for Emerging Applications : 

TMS320C6x with VelociTI Enables Cost-Effective Solutions for Emerging Applications: unlimited Internet bandwidth, universal wireless communication, new telephony features, remote medical diagnostics, automated cruise control, personal home base station, personalized home security.

TMS320C6000. DSP Device Nomenclature : 

TMS320C6000. DSP Device Nomenclature

TMS320C6711 : 

TMS320C6711 A floating-point processor with VLIW architecture. Internal memory includes a two-level cache architecture: 4 KB of level 1 program cache (L1P), 4 KB of level 1 data cache (L1D), and 64 KB of RAM / level 2 cache for data/program (L2). It has a direct interface to both synchronous memories (SDRAM and SBSRAM) and asynchronous memories (SRAM and EPROM). With a 32-bit address bus, the total memory space is 2^32 bytes = 4 GB. It requires 3.3 V for I/O and 1.8 V for the core. Operating at 150 MHz, it can perform 900 million floating-point operations per second (MFLOPS). This translates to 1200 million instructions per second (MIPS).
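The peak figures quoted for the C6711 can be checked with simple arithmetic. The sketch below is only an illustration: the function names are made up, and it assumes that all eight functional units issue one instruction per cycle and that six of them each complete one floating-point operation per cycle.

```python
# Illustrative helpers (hypothetical names, not part of any TI tool).
# Assumption: 8 instructions per cycle, 6 floating-point operations per cycle.

def peak_mips(clock_mhz, instructions_per_cycle=8):
    """Peak million instructions per second."""
    return clock_mhz * instructions_per_cycle

def peak_mflops(clock_mhz, fp_units=6):
    """Peak million floating-point operations per second."""
    return clock_mhz * fp_units

def address_space_bytes(address_bits):
    """Number of bytes reachable with the given address width."""
    return 2 ** address_bits

print(peak_mips(150))                      # 1200
print(peak_mflops(150))                    # 900
print(address_space_bytes(32) // 2 ** 30)  # 4 (GB)
```

These are peak rates; sustained performance depends on how well the code keeps all units busy.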

DSK Contents : 

DSK Contents

Slide 40: 

Block diagram : 

Block diagram

CPU : 

CPU There are two sets of functional units, A and B. Each set contains four units and a register file. One set contains functional units .L1, .S1, .M1, and .D1; the other set contains units .D2, .M2, .S2, and .L2. .M unit: multiplication operations. .L unit: logical and arithmetic operations. .S unit: branch, bit manipulation and arithmetic operations. .D unit: load/store and arithmetic operations.

Slide 44: 

The C67x CPU executes all C62x instructions. In addition to the C62x fixed-point instructions, six of the eight functional units (.L1, .S1, .M1, .M2, .S2, and .L2) also execute floating-point instructions. The remaining two functional units (.D1 and .D2) also execute the new LDDW instruction, which loads 64 bits per CPU side for a total of 128 bits per cycle.

TMS320C6711 Memory : 

TMS320C6711 Memory

3-Access level of Memory Map : 

3-Access level of Memory Map 1. L1 memory: cache-based architecture; program cache and data cache; size: program cache 4 Kbyte, data cache 4 Kbyte. 2. L2 memory: size 64 Kbyte; program and data. 3. L3 memory: external memory.

Slide 48: 

External Memory: synchronous memory (SDRAM, SBSRAM) and asynchronous memory (SRAM, EPROM). Internal Memory: program and data.

Slide 49: 

Registers: The two register files each contain sixteen 32-bit registers, for a total of 32 general-purpose registers (A0~A15, B0~B15). Interaction with the CPU must be done through these registers. The four functional units on each side of the CPU can freely share the 16 registers belonging to that side. Two cross paths, 1X and 2X, connect each side to the register file on the opposite side, so that functional units can access data from the other side's registers. When register access is by functional units on the same side of the CPU, the register file can service all the units in a single clock cycle. Register access across the CPU through a cross path supports one read and one write per cycle.

Slide 50: 

Restrictions on Register Accesses Registers A1, A2, B0, B1 and B2 are used as conditional registers. Registers A4-A7 and B4-B7 are used for circular addressing. Registers A0-A9 and B0-B9 (except B3) are temporary registers. Any of the registers A10-A15 and B10-B15 that are used are saved and later restored before returning from a subroutine.

Slide 51: 

Each functional unit has read/write ports. Data path 1 (2) units read/write A (B) registers. Data path 2 (1) can read one A (B) register per cycle. 40-bit words are stored in adjacent register pairs and are used in extended-precision accumulation: the 32 LSBs are stored in the even register (e.g. A2) and the remaining 8 bits are stored in the 8 LSBs of the next upper (odd) register (A3). 64-bit values are stored in a similar fashion. Two simultaneous memory accesses cannot use registers of the same register file as address pointers.
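The register-pair layout for 40-bit values can be pictured with a few lines of Python. This is a minimal sketch of the bit layout only; the function names are hypothetical and sign handling is ignored.

```python
MASK32 = 0xFFFFFFFF

def split_40bit(value):
    """Split a 40-bit value into (even_register, odd_register) contents."""
    value &= (1 << 40) - 1          # keep only 40 bits
    even = value & MASK32           # 32 LSBs go to the even register, e.g. A2
    odd = (value >> 32) & 0xFF      # remaining 8 bits go to the 8 LSBs of A3
    return even, odd

def join_40bit(even, odd):
    """Rebuild the 40-bit value from the register pair."""
    return ((odd & 0xFF) << 32) | (even & MASK32)

even, odd = split_40bit(0x123456789A)
print(hex(even), hex(odd))          # 0x3456789a 0x12
```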

C6x internal buses : 

C6x internal buses

Slide 54: 

32-bit program address bus, 256-bit program data bus. Two 32-bit data address buses (DA1, DA2). Two 32-bit (64-bit for floating-point version) load data buses (LD1, LD2). Two 32-bit (64-bit for floating-point version) store data buses (ST1, ST2). Two 32-bit DMA data buses, two 32-bit DMA address buses. Off-chip or external memory is accessed through a 22-bit address bus and a 32-bit data bus.

'C6x Peripherals : 

'C6x Peripherals EMIF: External Memory Interface. A 32-bit bus on which external memories and other devices can be connected. It includes features like internal wait state generation and SDRAM control. The EMIF can interface to both synchronous and asynchronous memories.

Slide 57: 

McBSP: two multichannel buffered serial ports (McBSPs). Each McBSP can be used for high-speed serial data transmission with external devices or reprogrammed as general-purpose I/O. McBSP1 is used to transmit and receive audio data from the AIC23 stereo codec. McBSP0 is used to control the codec through its serial control port.

Slide 58: 

On-chip PLL: generates the processor clock rate from a slower external clock reference. Timers: generate periodic timer events as a function of the processor clock; used by DSP/BIOS to create time slices for multitasking. Power-down units: save power for durations when the CPU is inactive. EDMA controller: the enhanced DMA controller allows high-speed data transfers without intervention from the DSP. BOOT: boot from 4M external block, or boot from HPI/XB. SBSRAM: synchronous burst static random access memory.

Host Port Interface (HPI)‏ : 

Host Port Interface (HPI)‏ The host port interface (HPI) is a parallel port through which a host processor can directly access the CPU’s memory space. The host device is the master of the interface, therefore increasing its ease of access. The host and the CPU can exchange information via internal or external memory. In addition, the host has direct access to memory-mapped peripherals. Connectivity to the CPU’s memory space is provided through the DMA controller. Expansion bus (XB) is a replacement for the HPI, as well as an expansion of the EMIF. The expansion provides two distinct areas of functionality (host port and I/O port) which can co-exist in a system

Slide 60: 

CPU operations: fetch instruction from memory (DSP program memory), decode instruction, execute instruction (including reading data values).

Program Fetch (F)‏ : 

Program Fetch (F) Program fetching consists of 4 phases: generate fetch address (PG), send address to memory (PS), wait for data ready (PW), read opcode (PR).

Decode Stage (D)‏ : 

Decode Stage (D) The decode stage consists of two phases: dispatch instruction to functional unit (DP), instruction decoded at functional unit (DC).

Execute Stage (E)‏ : 

Execute Stage (E) An execute packet (EP) consists of a group of instructions that can be executed in parallel within the same cycle. The number of EPs within a fetch packet can vary from one (with 8 parallel instructions) to 8 (with no parallel instructions). Bit 0 (LSB) of every 32-bit instruction determines whether the next instruction belongs to the same EP: if 1, the next instruction is in the same EP; if 0, the next instruction is part of the next EP.

Slide 64: 

FETCH and EXECUTION PACKETS (A fetch packet consists of eight 32-bit instructions.) Consider an FP with three EPs:
        Instruction A
     || Instruction B
        Instruction C
     || Instruction D
     || Instruction E
        Instruction F
     || Instruction G
     || Instruction H
Bit 0 of each 32-bit instruction in the fetch packet is then: A = 1, B = 0, C = 1, D = 1, E = 0, F = 1, G = 1, H = 0. In the fetch packet, EP1 contains 2 parallel instructions, EP2 contains 3 and EP3 has 3 parallel instructions.
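The rule that bit 0 of each instruction links it to the next one can be sketched as a small grouping routine. The function name and the list-of-pairs representation are illustrative assumptions; real hardware reads the bit from the 32-bit opcode itself.

```python
def split_execute_packets(fetch_packet):
    """Group a fetch packet into execute packets.

    fetch_packet is a list of (name, bit0) pairs in program order.
    bit0 == 1 means the NEXT instruction belongs to the same execute packet.
    """
    packets, current = [], []
    for name, bit0 in fetch_packet:
        current.append(name)
        if bit0 == 0:               # this instruction closes the execute packet
            packets.append(current)
            current = []
    if current:                     # defensive: last bit was 1
        packets.append(current)
    return packets

fp = [("A", 1), ("B", 0), ("C", 1), ("D", 1), ("E", 0),
      ("F", 1), ("G", 1), ("H", 0)]
print(split_execute_packets(fp))
# [['A', 'B'], ['C', 'D', 'E'], ['F', 'G', 'H']]
```

With all eight bits at 0 the same routine yields eight single-instruction execute packets, and with the first seven bits at 1 it yields one packet of eight, matching the two extremes described for the execute stage.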

Pipelining : 

Pipelining Overlap operations to increase performance. Pipeline CPU operations to increase clock speed over a sequential implementation. Separate parallel functional units. Peripheral interfaces for I/O do not burden the CPU. Pipelining is a key feature in DSPs to get parallel instructions working properly, and it requires careful timing.

Slide 66: 

non-pipelined scalar architecture - A processor that executes every instruction one after the other - may use processor resources inefficiently, potentially leading to poor performance. pipelining - executing different sub-steps of sequential instructions simultaneously superscalar architectures - executing multiple instructions entirely simultaneously

Basic Ideas : 

Basic Ideas [figure: four processors P1-P4 working over time on four data streams a, b, c, d; colors mark the different types of operations performed.] Parallel processing: each processor takes one data stream through all of its operations (P1: a1 a2 a3 a4, P2: b1 b2 b3 b4, and so on); less inter-processor communication, complicated processor hardware. Pipelined processing: each processor performs one type of operation on every data stream in turn (P1: a1 b1 c1 d1, P2: a2 b2 c2 d2, and so on); more inter-processor communication, simpler processor hardware.

Slide 69: 

Pipelining does not decrease the time for individual instruction execution. Instead, it increases instruction throughput. The throughput of the instruction pipeline is determined by how often an instruction exits the pipeline. If the stages are perfectly balanced, then the time per instruction on the pipelined machine is equal to (time per instruction on the non-pipelined machine) / (number of pipe stages).
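A short numeric sketch of this relation follows. The 35 ns and 7-stage figures are made-up example values, and the total-cycle formula assumes an ideal pipeline with perfectly balanced stages and no stalls.

```python
def pipelined_time_per_instruction(nonpipelined_time, stages):
    """Ideal time per instruction once the pipeline is full."""
    return nonpipelined_time / stages

def total_cycles(num_instructions, stages):
    """Cycles to finish a run of instructions in an ideal pipeline.

    The first instruction needs `stages` cycles to pass through; every
    later instruction exits one cycle after the one before it.
    """
    return stages + num_instructions - 1

print(pipelined_time_per_instruction(35, 7))  # 5.0 (ns, hypothetical numbers)
print(total_cycles(100, 7))                   # 106
```

Each instruction still spends 7 cycles in the pipeline; only the rate at which instructions complete improves.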

Slide 70: 

There are 3 stages of pipelining. Program fetch, composed of 4 phases: PG (program address generate) to generate the fetch address; PS (program address send) to send the address; PW (program access ready wait) to wait for data; PR (program fetch packet receive) to read the opcode from memory. Decode stage, composed of 2 phases: DP, to dispatch all the instructions within an FP to the appropriate functional units; DC, instruction decode. Execute stage, composed of 6 (fixed point) to 10 (floating point) phases: a) a multiplication instruction consists of 2 phases due to 1 delay slot; b) a load instruction consists of 5 phases due to 4 delay slots; c) a branch instruction consists of 6 phases due to 5 delay slots.

Slide 71: 

Pipeline phases: program fetch (PG PS PW PR), decode (DP DC), execute (E1-E6; E1-E10 for double precision). Pipelining effects [diagram: clock cycles 1 to 12 across the top, one row per fetch packet; each row runs PG PS PW PR DP DC E1 E2 ... and starts one clock cycle after the row above it.]

Slide 72: 

Each row represents an FP. PG of the first FP starts in cycle 1, PG of the second FP starts in cycle 2, and so on. Each FP has 4 phases for fetch and 2 phases for decode, and execution can take from 1 to 10 phases. At cycle 7, the instructions in the first FP are in the first execution phase E1, the instructions in the second FP are in the decoding phase, the instructions in the third FP are in the dispatching phase, and so on. All the instructions are proceeding through the various phases; therefore the pipeline is full.

Slide 73: 

Most instructions have 1 execute phase. Multiply (MPY) has 2, load (LDH/LDW) has 5, and branch (B) has 6 phases. Additional execute phases are associated with floating-point and double-precision instructions (up to 10 phases), e.g. MPYDP has 9 delay slots and a total of 10 phases. Functional unit latency: the number of cycles that an instruction ties up a functional unit, during which no other instruction can use that unit. It is 1 for all instructions except double-precision instructions. It is different from the number of delay slots, e.g. MPYDP has a functional unit latency of 4 but 9 delay slots. Delay slot: a slot in which instructions that physically follow an instruction are executed as if they were located before it. Classic examples are branch and call instructions, which often execute the following instruction before the branch or call is performed.
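The delay-slot counts above can be collected into a small lookup to see when a result becomes usable. This is a sketch under stated assumptions: the table holds only the instructions named in the slides (with ADD standing in for a typical single-phase instruction), and "issue cycle" means the cycle in which the instruction is in E1.

```python
# Delay slots per instruction, taken from the counts quoted above.
DELAY_SLOTS = {"ADD": 0, "MPY": 1, "LDH": 4, "LDW": 4, "B": 5, "MPYDP": 9}

def execute_phases(mnemonic):
    """Number of execute phases = delay slots + 1."""
    return DELAY_SLOTS[mnemonic] + 1

def result_ready_cycle(issue_cycle, mnemonic):
    """First cycle in which a later instruction can use the result."""
    return issue_cycle + DELAY_SLOTS[mnemonic] + 1

print(execute_phases("MPYDP"))       # 10
print(result_ready_cycle(1, "LDW"))  # 6
```

For example, a load whose E1 phase is in cycle 1 is followed by four delay slots, so the loaded value can first be used by an instruction executing in cycle 6.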

Instruction Set : 

Instruction Set Assembly code format: Label || [ ] Instruction Unit Operands ; comments. A label represents a specific address or memory location that contains an instruction or data (a label must start in the first column). Parallel bars (||) are used if the instruction is executed in parallel with the previous instruction. The field in square brackets ([ ]) is optional and makes the associated instruction conditional: 5 registers are used as conditional registers; [A2] specifies that the associated instruction executes if A2 is not zero; [!A2] specifies that the associated instruction executes if A2 is zero.

Slide 75: 

The instruction field can be an assembler directive or a mnemonic. An assembler directive is a command for the assembler: .short initializes a 16-bit integer, .int initializes a 32-bit integer, .float initializes a 32-bit IEEE single-precision constant. A mnemonic is an actual instruction that executes at run time. The unit field (optional) can be any one of the 8 functional units. Comments starting in column 1 begin with an asterisk or a semicolon, whereas comments starting in any other column must begin with a semicolon. E.g.:
        ADD  .L1 A3,A7,A7 ; add A3+A7 -> A7
        MPY  .M2 A7,B7,B6 ; multiply 16 LSBs of A7,B7 -> B6
     || MPYH .M1 A7,B7,A6 ; multiply 16 MSBs of A7,B7 -> A6

Instruction set : 

Instruction set They are designed to make maximum use of the processors’ resources and at the same time minimize the memory space required to store the instructions. Minimizing the storage space ensures the cost effectiveness of the overall system. To ensure the maximum use of hardware of the DSP, the instructions are designed to perform several parallel operations in a single instruction, typically including fetching of data in parallel with main arithmetic operation.

Slide 77: 

Instructions are kept short by restricting which registers can be used with which operations and which operations can be combined in an instruction. Some of the latest processors use VLIW architectures, wherein multiple instructions are issued and executed per cycle. In such architectures the individual instructions are short and designed to perform much less work, thus requiring less memory, while the VLIW architecture provides the increased speed.

'C6x Instruction Set (by category)‏ : 

'C6x Instruction Set (by category)‏

'C6x Instruction Set (by unit)‏ : 

'C6x Instruction Set (by unit)‏

‘C67x Add’l Instructions (by unit)‏ : 

‘C67x Add’l Instructions (by unit)‏

Control Register File : 

Control Register File

Slide 82: 

Addressing mode register (AMR) - specifies the addressing mode Control status register (CSR) - contains control and status bits. Interrupt clear register (ICR) - allows you to manually clear the maskable interrupts (INT15-INT4) in the interrupt flag register (IFR). - Writing a 1 to any of the bits in ICR causes the corresponding interrupt flag (IFn) to be cleared in IFR. - Writing a 0 to any bit in ICR has no effect. - You cannot set any bit in ICR to affect NMI or reset. Interrupt enable register (IER) - enables and disables individual interrupts.

Slide 83: 

The interrupt flag register (IFR) - contains the status of INT4-INT15 and NMI interrupt. - Each corresponding bit in the IFR is set to 1 when that interrupt occurs; otherwise, the bits are cleared to 0. - If you want to check the status of interrupts, use the MVC instruction to read the IFR. The interrupt return pointer register (IRP) - contains the return pointer that directs the CPU to the proper location to continue program execution after processing a maskable interrupt. - A branch using the address in IRP (B IRP) in your interrupt service routine returns to the program flow when interrupt servicing is complete.

Slide 84: 

The interrupt set register (ISR) - allows you to manually set the maskable interrupts (INT15-INT4) in the interrupt flag register (IFR). - Writing a 1 to any of the bits in ISR causes the corresponding interrupt flag (IFn) to be set in IFR. - Writing a 0 to any bit in ISR has no effect. - You cannot set any bit in ISR to affect NMI or reset. The interrupt service table pointer register (ISTP) - is used to locate the interrupt service routine (ISR). The NMI return pointer register (NRP) - contains the return pointer that directs the CPU to the proper location to continue program execution after NMI processing. - A branch using the address in NRP (B NRP) in your interrupt service routine returns to the program flow when NMI servicing is complete. The E1 phase program counter (PCE1) - contains the 32-bit address of the fetch packet in the E1 pipeline phase.
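The write-1-to-set and write-1-to-clear behaviour of ISR and ICR on the interrupt flag register can be modelled in a few lines. This is a simplified software model, not TI code: the class name and mask constant are illustrative, and only bits 4 to 15 (INT4-INT15) are treated as writable.

```python
MASKABLE_BITS = 0xFFF0   # bits 4..15 correspond to INT4-INT15

class InterruptFlags:
    """Toy model of IFR as seen through writes to ISR and ICR."""

    def __init__(self):
        self.ifr = 0

    def write_isr(self, value):
        # Writing 1 sets the flag; writing 0 has no effect.
        self.ifr |= value & MASKABLE_BITS

    def write_icr(self, value):
        # Writing 1 clears the flag; writing 0 has no effect.
        self.ifr &= ~(value & MASKABLE_BITS)

flags = InterruptFlags()
flags.write_isr((1 << 5) | (1 << 1))   # bit 1 is outside INT4-INT15, ignored
print(hex(flags.ifr))                  # 0x20
flags.write_icr(1 << 5)
print(hex(flags.ifr))                  # 0x0
```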

Addressing modes : 

Addressing modes Addressing modes determine how memory is accessed. Addressing refers to the means of specifying the location of operands for instructions; the types of addressing are called addressing modes. Operands may be input operands for the operation as well as results of the operation. Addressing modes supported by the TMS320C67x include register-indirect, indexed register-indirect, and modulo addressing (circular addressing). Immediate data is also supported. The TMS320C67x does not support modulo addressing for 64-bit data.

Slide 86: 

Immediate: the operand is part of the instruction. Example: ADD .L1 -13,A1,A6
Register: the operand is specified in a register. Example (implied): ADD .L1 A7,A6,A7
Direct: the address of the operand is part of the instruction (added to imply memory page). Not supported.
Indirect: the address of the operand is stored in a register. Example: LDW .D1 *A5++[8],A1

Register-Indirect Addressing : 

Register-Indirect Addressing The operand is located at the memory address stored in a register. A special group of registers can be used to store addresses (address registers). It is the most important addressing mode in DSPs and is efficient from the instruction-set point of view: few bits are needed to indicate the address of the operand. It can be used with or without displacement. The 32 registers (A0-A15, B0-B15) are used as pointers. Indirect addressing uses '*' in conjunction with one of the 32 registers.

Slide 88: 

1. *R: register R contains the address of a memory location where a data value is stored. 2. *R++(d): register R contains the memory address. After the memory address is used, R is postincremented by the displacement d, so the new address is R+1 if d=1. A double minus (--) instead postdecrements the address to R-d. 3. *++R(d): the address is preincremented (offset) by d before it is used, so the current address is R+d. A double minus predecrements it, so the current address is R-d. 4. *+R(d): the address is offset by d, so the current address is R+d as in the previous case. However, R itself is not updated or modified.
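The difference between these indirect forms is easiest to see by tracking both the address that is used and the value left in R. The sketch below is a behavioural model with hypothetical names; d is in abstract address units, and the scaling of bracketed offsets by element size is deliberately left out.

```python
class Pointer:
    """Stands in for an address register R."""
    def __init__(self, address):
        self.address = address

def post_increment(r, d=1):      # *R++(d)
    used = r.address             # use the current address first
    r.address += d               # then update R
    return used

def post_decrement(r, d=1):      # *R--(d)
    used = r.address
    r.address -= d
    return used

def pre_increment(r, d=1):       # *++R(d)
    r.address += d               # update R first
    return r.address             # then use the new address

def pre_decrement(r, d=1):       # *--R(d)
    r.address -= d
    return r.address

def offset_no_update(r, d=1):    # *+R(d)
    return r.address + d         # R itself is left unchanged

r = Pointer(100)
print(post_increment(r, 2), r.address)    # 100 102
print(pre_increment(r, 3), r.address)     # 105 105
print(offset_no_update(r, 4), r.address)  # 109 105
```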

Circular addressing : 

Circular addressing Circular addressing is used to create a circular buffer. The buffer is created in hardware and is very useful for applications like digital filtering. This addressing mode, in conjunction with a circular buffer, updates samples without the overhead created by shifting the data directly. When the pointer reaches the bottom location and is then incremented, it automatically wraps around to the top location. Two independent buffers are available using BK0 and BK1 within the AMR register. Registers A4-A7 and B4-B7, in conjunction with the .D units, can be used as pointers. MVC (move between the control register file and the register file) is the only instruction that can access AMR and the other control registers.

Circular Buffer : 

Circular Buffer At the beginning of each sample period, a new sample will be read into the circular buffer,overwriting the oldest sample. The newest sample x(n) will be stored at the memory location pointed at by auxiliary register AR(i).

Slide 91: 

The need to process digital signals in real time gave rise to the concept of circular buffering. Circular buffers are used to store the most recent values of a continually updated signal. Circular buffering allows processors to access a block of data sequentially and then automatically wrap around to the beginning address, which is exactly the pattern used to access coefficients in an FIR filter. Circular buffering is also very helpful in implementing first-in, first-out buffers, commonly used for I/O and for FIR delay lines.
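A software analogue of the circular buffer shows why no data has to be shifted: only the write index moves, and it wraps with a modulo operation. On the DSP the wrap is done by the addressing hardware; the class and function names here are illustrative only.

```python
class CircularBuffer:
    """Fixed-size delay line that overwrites the oldest sample."""

    def __init__(self, size):
        self.data = [0] * size
        self.index = 0                     # next write position (oldest sample)

    def push(self, sample):
        self.data[self.index] = sample     # overwrite the oldest sample
        self.index = (self.index + 1) % len(self.data)   # wrap around

    def newest_first(self):
        n = len(self.data)
        return [self.data[(self.index - 1 - k) % n] for k in range(n)]

def fir(buffer, coeffs):
    """y(n) = sum of h(k) * x(n-k) over the buffered samples."""
    return sum(h * x for h, x in zip(coeffs, buffer.newest_first()))

buf = CircularBuffer(3)
for sample in (1, 2, 3, 4):
    buf.push(sample)
print(buf.data)             # [4, 2, 3]
print(buf.newest_first())   # [4, 3, 2]
print(fir(buf, [1, 1, 1]))  # 9
```

After the fourth sample arrives, it has replaced the oldest one in place while the other two samples have not moved in memory.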

Addressing Mode Register (AMR) : 

Addressing Mode Register (AMR) For each of the eight registers (A4–A7, B4–B7) that can perform linear or circular addressing, the addressing mode register (AMR) specifies the addressing mode. A 2-bit field for each register selects the address modification mode: linear (the default) or circular mode. With circular addressing, the field also specifies which BK (block size) field to use for a circular buffer. In addition, the buffer must be aligned on a byte boundary equal to the block size.

Slide 93: 

AMR mode and description: 00 for linear addressing; 01 for circular addressing using BK0; 10 for circular addressing using BK1; 11 reserved.

Slide 98: 

Block size = 2^(N+1) bytes

Slide 99: 

E.g.:
        MVK   .S2 0x0004,B2 ; lower 16 bits to B2
        MVKLH .S2 0x0005,B2 ; upper 16 bits to B2
The value 0x0004 = (0100) in binary, placed into the 16 LSBs of AMR, sets bit 2 (the third bit) to 1 and all other bits to zero. This sets the mode to 01 and selects register A5 as the pointer to a buffer using BK0. The value 0x0005 = (0101) in binary, placed into the 16 MSBs of AMR, sets bits 16 and 18 to 1. This corresponds to the value of N (here N = 5) used to select the size of the buffer: 2^(N+1) = 64 bytes using BK0.
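The AMR value built in this example can be decoded field by field. The sketch assumes the usual C6000 layout, which is not spelled out in the slides: a 2-bit mode field per register in the order A4, A5, A6, A7, B4, B5, B6, B7 starting at bit 0, BK0 in bits 16-20 and BK1 in bits 21-25. The function names are illustrative.

```python
REGISTERS = ["A4", "A5", "A6", "A7", "B4", "B5", "B6", "B7"]
MODES = {0b00: "linear", 0b01: "circular BK0",
         0b10: "circular BK1", 0b11: "reserved"}

def decode_amr(amr):
    """Return (mode per register, BK0 field, BK1 field)."""
    modes = {reg: MODES[(amr >> (2 * i)) & 0b11]
             for i, reg in enumerate(REGISTERS)}
    bk0 = (amr >> 16) & 0x1F
    bk1 = (amr >> 21) & 0x1F
    return modes, bk0, bk1

def block_size_bytes(n):
    """Block size = 2^(N+1) bytes."""
    return 2 ** (n + 1)

amr = (0x0005 << 16) | 0x0004      # upper and lower halves from the example
modes, bk0, bk1 = decode_amr(amr)
print(modes["A5"], bk0, block_size_bytes(bk0))   # circular BK0 5 64
```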

Interrupts : 

Interrupts The C6711 device supports 16 prioritized interrupts. Types of interrupts: reset, maskable, non-maskable.

Slide 101: 

Reset (RESET)‏ Reset is the highest priority interrupt and is used to halt the CPU and return it to a known state. The reset interrupt is unique in a number of ways: - RESET is an active-low signal. All other interrupts are active-high signals. - RESET must be held low for 10 clock cycles before it goes high again to reinitialize the CPU properly. - The instruction execution in progress is aborted and all registers are returned to their default states. - RESET is not affected by branches.

Slide 102: 

Nonmaskable Interrupt (NMI)‏ - NMI is the second-highest priority interrupt - generally used to alert the CPU of a serious hardware problem such as imminent power failure. - For NMI processing to occur, the non maskable interrupt enable (NMIE) bit in the interrupt enable register must be set to 1.

Slide 103: 

Maskable Interrupts (INT4−INT15)‏ - These have lower priority than the NMI and reset interrupts. - These interrupts can be associated with external devices, on-chip peripherals, software control etc. The interrupt source for interrupts 4-15 can be programmed by modifying the selector value (binary value) in the corresponding fields of the Interrupt Selector Control registers: MUXH (address 0x019C0000) and MUXL (address 0x019C0004).

Interrupt Priority : 

Interrupt Priority

Slide 105: 

Multichannel Buffered Serial Port (McBSP) The standard serial port interface provides: full-duplex communication; double-buffered data registers, which allow a continuous data stream; independent framing and clocking for reception and transmission; direct interface to industry-standard codecs, analog interface chips (AICs), and other serially connected A/D and D/A devices; multichannel transmission and reception of up to 128 channels; element sizes of 8, 12, 16, 20, 24, or 32 bits; 8-bit data transfers with LSB or MSB first.

Slide 107: 

The McBSP consists of a data path and a control path that connect to external devices. Separate pins for transmission and reception communicate data to these external devices. Four other pins communicate control information (clocking and frame synchronization). The device communicates to the McBSP using 32-bit-wide control and data registers accessible via the internal peripheral bus. Pin descriptions: CLKR, receive clock; CLKX, transmit clock; CLKS, external clock; DR, received serial data; DX, transmitted serial data; FSR, receive frame synchronization; FSX, transmit frame synchronization.

Slide 108: 

The CPU or the DMA controller writes the data to be transmitted to the data transmit register (DXR), from which it is shifted out to DX via the transmit shift register (XSR). Similarly, receive data on the DR pin is shifted into the receive shift register (RSR) and copied into the receive buffer register (RBR). RBR is then copied to DRR, which can be read by the CPU or the DMA controller. This allows internal data movement and external data communications simultaneously. The following control registers are used in multichannel operation: the multichannel control register (MCR), the transmit channel enable register (XCER), the receive channel enable register (RCER).

Slide 109: 

Other registers for clock generation, frame synchronization and control are: serial port control register (SPCR)‏ receive control register (RCR)‏ transmit control register (XCR)‏ pin control register (PCR)‏ Sample rate generator register (SRGR)‏

DMA : 

DMA Direct Memory Access transfers data to or from the processor’s memory without the involvement of the processor itself. DMA is commonly used to provide improved performance with input/output devices. Rather than have the processor read data from an I/O device and copy the data into memory or vice versa, a separate DMA controller can handle such transfers in parallel. The processor loads the DMA controller with control information including the starting address for the transfer, the number of words to be transferred, the source and the destination.

Slide 111: 

The DMA controller uses the bus request pin to notify the DSP core that it is ready to make a transfer to or from external memory. The DSP core completes its current instruction, releases control of external memory and signals the DMA controller via the bus grant pin that the DMA transfer can proceed. The DMA controller then transfers the specified number of data words and optionally signals completion through an interrupt. Some processors also have multiple DMA channels that manage DMA transfers in parallel.

Timer : 

Timer The ’C67x has two 32-bit general-purpose timers that can be used to: Time events Count events Generate pulses Interrupt the CPU Send synchronization events to the DMA controller

Slide 114: 

The timer works in one of the two signaling modes depending on whether clocked by an internal or an external source. The timer has an input pin (TINP) and an output pin (TOUT). The TINP pin can be used as a general purpose input, and the TOUT pin can be used as a general-purpose output. When an internal clock is provided, the timer generates timing sequences to trigger peripheral or external devices such as DMA controller or A/D converter respectively. When an external clock is provided, the timer can count external events and interrupt the CPU after a specified number of events.
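The event-counting use of the timer can be pictured with a toy model. It is purely illustrative: the class is hypothetical, it does not mirror the actual timer registers, and the interrupt is represented by a simple counter.

```python
class EventTimer:
    """Counts events and raises one 'interrupt' every `period` events."""

    def __init__(self, period):
        self.period = period
        self.count = 0
        self.interrupts = 0      # number of interrupts raised so far

    def tick(self):
        """Register one clock edge or external event."""
        self.count += 1
        if self.count == self.period:
            self.count = 0       # restart counting
            self.interrupts += 1

timer = EventTimer(period=4)
for _ in range(10):
    timer.tick()
print(timer.interrupts, timer.count)   # 2 2
```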

Load/Store Options : 

Load/Store Options In 'C6x the instruction set supports several types of load/store instructions:

        LDH .D2 *B2++,B7    ; load into B7 the 16-bit halfword at the memory address in B2, then postincrement B2
     || LDH .D1 *A2++,A7    ; in parallel, load into A7 the halfword at the memory address in A2
        STW .D2 A1,*+A4[20] ; store the 32-bit word in A1 at the address in A4 offset by 20 words (80 bytes)
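The 20-word offset in the store example corresponds to 80 bytes because a bracketed offset is scaled by the size of the element being accessed. A minimal sketch of that address calculation follows; the table and function name are illustrative, and the base address 0x1000 is an arbitrary example value.

```python
# Element size in bytes for each load/store mnemonic.
ELEMENT_BYTES = {"LDB": 1, "STB": 1, "LDH": 2, "STH": 2,
                 "LDW": 4, "STW": 4, "LDDW": 8}

def effective_address(mnemonic, base, offset):
    """Byte address accessed by e.g. STW ...,*+A4[offset]."""
    return base + offset * ELEMENT_BYTES[mnemonic]

base = 0x1000                                      # hypothetical content of A4
print(effective_address("STW", base, 20) - base)   # 80
print(effective_address("LDH", base, 20) - base)   # 40
```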

Load, and Store Paths : 

Load, and Store Paths The C67x DSP has two 32-bit paths for loading data from memory to the register File: LD1 for register file A, and LD2 for register file B. The C67x DSP also has a second 32-bit load path for both register files A and B. This allows the LDDW instruction to simultaneously load two 32-bit values into register file A and two 32-bit values into register file B. For side A, LD1a is the load path for the 32 LSBs and LD1b is the load path for the 32 MSBs. For side B, LD2a is the load path for the 32 LSBs and LD2b is the load path for the 32 MSBs. There are also two 32-bit paths, ST1 and ST2, for storing register values to memory from each register file.