DSP PROCESSORS-I
Module 1
Syllabus
Architecture of TMS 320C6x
fetch and execute
memory
Introduction to DSP
A digital signal processor (DSP) is a type of microprocessor that is optimized for digital signal processing.
It integrates system control and math-intensive functions.
Its advantages are speed, cost and energy efficiency.
It is a key component in many communication, medical, military and industrial products.
Slide 5: Alternatives
FPGA - Field-Programmable Gate Arrays
can be reconfigured within a system
but are more expensive and have high power dissipation
ASIC - Application-Specific Integrated Circuits
can perform specific functions extremely well, and can be made quite power efficient.
But since ASICs are not field-programmable, their functionality cannot be iteratively changed or updated during product development.
Why go digital?
Digital signal processing techniques are now so powerful that it is sometimes extremely difficult, if not impossible, for analogue signal processing to achieve similar performance.
FIR filter with linear phase.
Adaptive filters.
Slide 7: With DSP it is easy to:
Additionally DSP reduces:
Power consumption.
Why do we need DSP processors?
Use a DSP processor when the following are required:
Low power consumption.
Processing of many “high” frequency signals in real time.
Applications
Slide 10: General DSP System Block Diagram (blocks shown: Peripherals, Central Unit, Internal Memory, Internal Buses, External Memory)
Classification of DSP
Von Neumann's architecture
Harvard architecture
Super Harvard architecture
VON NEUMANN'S ARCHITECTURE
Slide 13: One shared memory for instructions (program) and data with one data bus and one address bus between processor and memory.
Instructions and data have to be fetched in sequential order (known as the von Neumann bottleneck), limiting the operation bandwidth.
Its design is simple.
It is mostly used to interface to external memory.
HARVARD ARCHITECTURE
Slide 15: Uses physically separate memories for instructions and data, requiring dedicated buses for each of them.
Instructions and operands can therefore be fetched simultaneously.
Different program and data bus widths are possible, allowing program and data memory to be better optimized to the architectural requirements.
E.g.: If the instruction format requires 14 bits, then the program bus and memory can be made 14 bits wide, while the data bus and data memory remain 8 bits wide.
Efficient Memory Access
(Figure: memory and bus arrangements for general-purpose processors, early DSP processors and more optimized DSP processors)
Classification of DSP
Fixed point – performs integer operations
Floating point – performs both integer and floating-point operations
It is the application that dictates which device and platform to use in order to achieve optimum performance at a low cost.
For educational purposes, use the floating-point device as it can support both fixed- and floating-point operations.
Fixed point – TMS320C1x, C2x, C5x …
Floating point – TMS320C3x, C4x, C67x …
C versus Assembly language
Programs in C are more flexible and quicker to develop.
Programs in assembly often have better performance;
they run faster and use less memory, resulting in lower cost.
Slide 22: How complicated is the program?
If it is large and intricate, you will probably want to use C.
If it is small and simple, assembly may be a good choice.
Are you pushing the maximum speed of the DSP?
If so, assembly will give you the last drop of performance from the device.
For less demanding applications, you should consider using C.
C / Assembly?
Slide 23: How many programmers will be working together?
If the project is large enough for more than one programmer, lean toward C
and use in-line assembly only for time-critical segments.
Which is more important, product cost or development cost?
If it is product cost, choose assembly;
if it is development cost, choose C.
What is your background?
If you are experienced in assembly (on other microprocessors), choose assembly for your DSP.
If your previous work is in C, choose C for your DSP.
The Digital Signal Processor Market
Digital Signal Processor market is dominated by 4 companies.
Analog Devices (www.analog.com/dsp)
ADSP-21xx 16 bit, fixed point
ADSP-21xxx 32 bit, floating and fixed
Lucent Technologies (www.lucent.com)
DSP16xxx 16 bit, fixed point
DSP32xx 32 bit, floating point
Motorola
DSP561xx 16 bit, fixed point
DSP560xx 24 bit, fixed point
DSP96002 32 bit, floating point
Texas Instruments (www.ti.com)
TMS320Cxx 16 bit, fixed point
TMS320Cxx 32 bit, floating point
Slide 27: TMS320 Family (figure; surviving label: Best Performance & Ease-of-Use)
Slide 28: C6000 Roadmap (figure: performance versus time; 1st generation fixed-point and floating-point (C67x™) devices C6201, C6701, C6202, C6203, C6211, C6711, C6204, C6205, C6712; 2nd generation fixed-point C64x™ DSP devices C6411, C6414, C6415, C6416; application labels: General Purpose, Media Gateway, 3G Wireless Infrastructure)
Feature of the TMS320C6x
The Texas Instruments TMS320C6x family of microprocessors is one of the largest VLIW success stories to date.
This family of processors is built to deliver speed.
Family members differ in size, cost, memory, peripherals and power consumption.
Fixed-point C6201 version
5-ns Instruction Cycle Time
200-MHz Clock Rate
performance of up to 1600 MIPS
Eight 32-Bit Instructions/Cycle
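The C6201 figures above are related by simple arithmetic: cycle time is the reciprocal of the clock rate, and peak MIPS is the clock rate times the number of instructions issued per cycle. The sketch below is a host-side illustration only; the function names are hypothetical, and the count of six floating-point-capable units used for the 167 MHz floating-point case is taken from this deck's description of the C67x CPU.

```python
def cycle_time_ns(clock_mhz):
    """Instruction cycle time in nanoseconds for a clock rate given in MHz."""
    return 1000.0 / clock_mhz


def peak_rate(clock_mhz, units_per_cycle):
    """Peak millions of operations per second: clock (MHz) x operations per cycle."""
    return clock_mhz * units_per_cycle


# Fixed-point C6201: 200 MHz, eight 32-bit instructions per cycle.
c6201_cycle = cycle_time_ns(200)
c6201_mips = peak_rate(200, 8)

# Floating-point part at 167 MHz, assuming six units execute floating-point ops.
c6701_cycle = cycle_time_ns(167)
c6701_mflops = peak_rate(167, 6)

print(c6201_cycle, c6201_mips, round(c6701_cycle, 2), c6701_mflops)
```

These are peak figures: they assume every functional unit is kept busy in every cycle, which real code rarely achieves.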
Floating-point C6701 version
Can operate at 167MHz
6ns Instruction cycle time
1 giga floating-point operations per second (GFLOPS)
Very Long Instruction Word (VLIW)
refers to a CPU architecture designed to take advantage of instruction-level parallelism
executes operations in parallel based on a fixed schedule determined when programs are compiled.
the order of execution of operations (including which operations can execute simultaneously) is handled by the compiler, hence the processor does not need the scheduling hardware
VLIW CPUs offer significant computational power with less hardware complexity
but greater compiler complexity
VLIW architectures execute multiple instructions/cycle and use simple, regular instruction sets
More parallelism, higher performance
Better compiler targets
Slide 34: Disadvantages of VLIW Architectures
New kinds of programmer/compiler complexity
Programmer (or code-generation tool) must keep
track of instruction scheduling
Deep pipelines and long latencies can be confusing, may make peak performance elusive
Increased memory use
High program memory bandwidth requirements
High power consumption
Misleading MIPS ratings
VelociTI™
The VLIW modification done by TI is called VelociTI
Reduces code size
Increases performance when instructions reside off-chip
The C6x architecture is based on the high-performance, advanced VelociTI very-long-instruction-word (VLIW) architecture developed by Texas Instruments (TI)
an excellent choice for multichannel and multifunction applications (several instructions captured and processed simultaneously)
TMS320C6x with VelociTI Enables Cost-Effective Solutions for Emerging Applications
Unlimited Internet bandwidth
Universal wireless communication
New telephony features
Remote medical diagnostics
Automated cruise control
Personal home base station
Personalized home security
TMS320C6000 DSP Device Nomenclature
TMS320C6711
A floating-point processor with VLIW architecture
Internal memory includes a two-level cache architecture
- 4 KB of level 1 program cache (L1P)
- 4 KB of level 1 data cache (L1D)
- 64 KB of RAM / level 2 cache for data/program (L2)
Has a direct interface to both synchronous memories (SDRAM and SBSRAM) and asynchronous memories (SRAM and EPROM)
With a 32-bit address bus, the total memory space is 2^32 bytes = 4 GB
It requires 3.3 V for I/O and 1.8 V for the core
Operates at 150 MHz
Performs 900 million floating-point operations per second (MFLOPS)
This translates to 1200 million instructions per second (MIPS)
DSK Contents
Slide 40: Block diagram
CPU
There are two sets of functional units, A and B
Each set contains four units and a register file.
One set contains functional units .L1, .S1, .M1, and .D1
the other set contains units .D2, .M2, .S2, and .L2.
.M unit : multiplication operation
.L unit : logical and arithmetic operations
.S unit : branch, bit manipulation and arithmetic operations
.D unit : load/store and arithmetic operations
Slide 44: The C67x CPU executes all C62x instructions.
In addition to C62x fixed-point instructions, six of the
eight functional units (.L1, .S1, .M1, .M2, .S2, and .L2)
also execute floating-point instructions.
The remaining two functional units (.D1 and .D2) also
execute the new LDDW instruction, which loads 64 bits
per CPU side for a total of 128 bits per cycle.
TMS320C6711 Memory
3-Access level of Memory Map
1. L1 Memory
-Program Cache & Data Cache
-Size : PC(4Kbyte), DC(4Kbyte)
2. L2 Memory
- Size : 64Kbyte
- Program & Data
3. L3 Memory
External Memory
Slide 48: External Memory
- Synchronous Memory
- Asynchronous Memory
- Data
Slide 49: Registers:
The two register files each contain 16 32-bit registers for a total of 32 general-purpose registers (A0~A15, B0~B15)
Interaction with the CPU must be done through these registers
The four functional units on each side of the CPU can freely share the 16 registers belonging to that side.
Two cross paths, 1x and 2x, connect each side to the registers on the other side
(so that functional units can access data from the register file on the opposite side).
If register access is by functional units on the same side of the CPU, the register file can service all the units in a single clock cycle
- register access across the CPU, through the opposite register file, supports one read and one write per cycle.
Slide 50: Registers A1, A2, B0, B1 and B2 are used as conditional registers
Registers A4-A7 and B4-B7 are used for circular addressing
Registers A0-A9 and B0-B9 (except B3) are temporary registers
Any of the registers A10-A15 and B10-B15 that are used are saved and later restored before returning from a subroutine
Restrictions on Register Accesses
Slide 51: Each functional unit has read/write ports
Data path 1 (2) units read/write A (B) registers
Data path 2 (1) can read one A (B) register per cycle
40-bit words are stored in an adjacent register pair
Used in extended-precision accumulation
The 32 LSBs are stored in the even register (e.g. A2) and the remaining 8 bits are stored in the 8 LSBs of the next upper (odd) register (A3)
64-bit values are stored in a similar fashion
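The register-pair layout for 40-bit words can be illustrated on a host machine. This is a minimal sketch, not device code: the helper names are hypothetical, and the value is treated as an unsigned 40-bit pattern for simplicity.

```python
MASK32 = 0xFFFFFFFF


def split_40bit(value):
    """Split a 40-bit pattern into (even_register, odd_register) contents."""
    value &= (1 << 40) - 1
    even = value & MASK32          # 32 LSBs go to the even register, e.g. A2
    odd = (value >> 32) & 0xFF     # remaining 8 bits go to the 8 LSBs of A3
    return even, odd


def join_40bit(even, odd):
    """Rebuild the 40-bit pattern from the register pair."""
    return ((odd & 0xFF) << 32) | (even & MASK32)


a2, a3 = split_40bit(0x123456789A)
print(hex(a2), hex(a3))
```

The extra 8 bits act as guard bits, so a running sum of 32-bit products can grow past 32 bits without overflowing.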
Two simultaneous memory accesses cannot use registers of the same register file as address pointers
C6x internal buses
Slide 54: 32-bit program address bus, 256-bit program data bus
Two 32-bit data address buses (DA1, DA2)
Two 32-bit (64-bit for floating-point version) load data buses (LD1, LD2)
Two 32-bit (64-bit for floating-point version) store data buses (ST1, ST2)
Two 32-bit DMA data buses, two 32-bit DMA address buses
Off-chip or external memory is accessed through a 22-bit address bus and a 32-bit data bus
'C6x Peripherals
External Memory Interface (EMIF).
A 32-bit bus to which external memories and other devices can be connected.
It includes features like internal wait-state generation and SDRAM control.
The EMIF can interface to both synchronous and asynchronous memories.
Slide 57: McBSP
2 McBSP – Multichannel buffered serial ports.
Each McBSP can be used for high speed serial data transmission with external devices or reprogrammed as general purpose I/Os.
McBSP1 is used to transmit and receive audio data from the AIC23 stereo codec.
McBSP0 is used to control the codec through its serial control port.
Slide 58: On-chip PLL – generates the processor clock rate from a slower external clock reference.
Timers – generate periodic timer events as a function of the processor clock. Used by DSP/BIOS to create time slices for multitasking.
Power-down units – save power for durations when the CPU is inactive
EDMA Controller – Enhanced DMA controller allows high-speed data transfers without intervention from the DSP.
BOOT – boot from 4M external block or boot from HPI/XB
SBSRAM: Synchronous Burst Static Random Access Memory
Host Port Interface (HPI)
The host port interface (HPI) is a parallel port through which a host processor can directly access the CPU’s memory space.
The host device is the master of the interface, therefore increasing its ease of access.
The host and the CPU can exchange information via internal or external memory.
In addition, the host has direct access to memory-mapped peripherals.
Connectivity to the CPU’s memory space is provided through the DMA controller.
The expansion bus (XB) is a replacement for the HPI, as well as an expansion of the EMIF.
The expansion bus provides two distinct areas of functionality (host
port and I/O port) which can co-exist in a system
Slide 60: CPU operations
Fetch instruction from memory (DSP program memory)
Execute instruction including reading data values
Program Fetch (F)
Program fetching consists of 4 phases
generate fetch address (PG)
send address to memory (PS)
wait for data ready (PW)
read opcode (PR)
(Figure: C6x and memory, phases PG, PS, PW, PR)
Decode Stage (D)
Decode stage consists of two phases
dispatch instruction to functional unit (DP)
instruction decoded at functional unit (DC)
(Figure: C6x and memory, phases PG, PS, PW, PR, DP, DC)
Execute Stage (E)
An execute packet (EP) consists of a group of instructions that can be executed in parallel within the same cycle
The number of EPs within a fetch packet can vary from one (with 8 parallel instructions) to 8 (with no parallel instructions)
Bit 0 (LSB) of every 32-bit instruction determines whether the next instruction belongs to the same EP or not
if 1 – same EP
if 0 – part of next EP
Slide 64: FETCH and EXECUTION PACKETS
(Fetch packet consists of 8 32-bit instructions)
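The grouping rule based on bit 0 of each instruction can be sketched in a few lines. This is an illustrative host-side model, with a hypothetical function name and instructions represented as (name, bit 0) pairs rather than real 32-bit opcodes; the packet used is A || B, then C || D || E, then F || G || H.

```python
def split_execute_packets(fetch_packet):
    """Group a fetch packet into execute packets.

    fetch_packet is a list of (name, p_bit) pairs in program order, where
    p_bit is bit 0 of the instruction: 1 means the next instruction is in
    the same execute packet, 0 means the next instruction starts a new one.
    """
    packets, current = [], []
    for name, p_bit in fetch_packet:
        current.append(name)
        if p_bit == 0:
            packets.append(current)
            current = []
    if current:  # trailing instructions whose last p_bit was left at 1
        packets.append(current)
    return packets


fp = [("A", 1), ("B", 0), ("C", 1), ("D", 1), ("E", 0),
      ("F", 1), ("G", 1), ("H", 0)]
eps = split_execute_packets(fp)
print(eps)
```

With all eight bits cleared the same function yields eight single-instruction packets, and with only the last bit cleared it yields one packet of eight, matching the two extremes stated above.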
Consider an FP with three EP:
Instruction A || Instruction B
Instruction C || Instruction D || Instruction E
Instruction F || Instruction G || Instruction H
(Figure: the eight 32-bit instructions A to H of the fetch packet, bits 31 to 0 each; by the bit 0 rule, the LSBs of A, B, C, D, E, F, G, H are 1, 0, 1, 1, 0, 1, 1, 0)
In the fetch packet,
EP1 contains 2 parallel instructions,
EP2 contains 3
EP3 has 3 parallel instructions
Pipelining
Overlap operations to increase performance
Pipeline CPU operations to increase clock speed over a sequential implementation
Separate parallel functional units
Peripheral interfaces for I/O do not burden CPU
It is a key feature in DSP to get parallel instructions working properly
Requires careful timing
Slide 66: Non-pipelined scalar architecture
- A processor that executes every instruction one after the other
- may use processor resources inefficiently, potentially leading to poor performance.
- executing different sub-steps of sequential instructions simultaneously
- executing multiple instructions entirely simultaneously
Basic Ideas
(Figure: parallel processing versus pipelined processing on processors P1 to P4 over time. In parallel processing each processor handles one data stream: a1 a2 a3 a4, b1 b2 b3 b4, c1 c2 c3 c4, d1 d2 d3 d4. In pipelined processing each processor handles one type of operation: a1 b1 c1 d1, a2 b2 c2 d2, a3 b3 c3 d3, a4 b4 c4 d4.)
Colors: different types of operations performed
a, b, c, d: different data streams processed
Parallel processing: less inter-processor communication, complicated processor hardware
Pipelined processing: more inter-processor communication, simpler processor hardware
Slide 69: Pipelining does not decrease the time for individual instruction execution. Instead, it increases instruction throughput.
The throughput of the instruction pipeline is determined by how often an instruction exits the pipeline
If the stages are perfectly balanced, then the time per instruction on the pipelined machine is equal to
(Time per instruction on nonpipelined machine) / (Number of pipe stages)
Slide 70: There are 3 stages of pipelining:
Program fetch – composed of 4 phases
PG – program address generate: generate the fetch address
PS – program address send: send the address to memory
PW – program access ready wait: wait for the data to be ready
PR – program fetch packet receive: read the opcode from memory
Decode stage – composed of 2 phases
DP – dispatch all the instructions within an FP to the appropriate functional units
DC – instruction decode
Execute stage – composed of 6 (fixed point) to 10 (floating point) phases
a) multiplication instruction consists of 2 phases due to 1 delay slot
b) load instruction consists of 5 phases due to 4 delay slots
c) branch instruction consists of 6 phases due to 5 delay slots
Slide 71: Pipeline phases
Program fetch: PG PS PW PR
Decode: DP DC
Execute: E1-E6 (E1-E10 for double precision)
Pipelining effects
(Figure: fetch packets against clock cycles 1 to 12; each row steps through PG PS PW PR DP DC E1 E2 ... and starts one cycle after the row above)
Slide 72: Each row represents an FP
The PG of the first FP starts in cycle 1, the PG of the second FP starts in cycle 2, and so on.
Each FP has 4 phases for fetch and 2 phases for decode, and execution can take from 1 to 10 phases
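The staggered rows of the pipeline chart can be regenerated with a short host-side sketch. It assumes an idealized pipeline (one fetch packet enters PG every cycle, no stalls, six execute phases shown); the function names are hypothetical.

```python
PHASES = ["PG", "PS", "PW", "PR", "DP", "DC"] + [f"E{i}" for i in range(1, 7)]


def phase_at(fp_index, cycle):
    """Phase of fetch packet fp_index (1-based) in clock cycle (1-based).

    Returns None if the packet has not entered the pipeline yet or has
    already left it. Fetch packet k is assumed to enter PG in cycle k.
    """
    step = cycle - fp_index
    if 0 <= step < len(PHASES):
        return PHASES[step]
    return None


def pipeline_chart(num_fps, num_cycles):
    """Text chart with one row per fetch packet and one column per cycle."""
    rows = []
    for fp in range(1, num_fps + 1):
        cells = [(phase_at(fp, c) or "--") for c in range(1, num_cycles + 1)]
        rows.append(" ".join(cells))
    return "\n".join(rows)


print(pipeline_chart(7, 12))
```

Reading down the column for any one cycle of the printed chart shows which phase each fetch packet occupies in that cycle.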
At cycle 7,
the instructions in the first FP are in the first execution phase E1,
the instructions in the second FP are in the decoding phase (DC),
the instructions in the third FP are in the dispatching phase (DP),
and so on.
All the instructions are proceeding through various phases
Therefore the pipeline is FULL
Slide 73: Most instructions have 1 execute phase
Multiply (MPY) has 2
Load (LDH/LDW) has 5
Branch (B) has 6 phases
Additional execute phases are associated with floating-point and double-precision instructions (up to 10 phases)
e.g. MPYDP has 9 delay slots and a total of 10 phases
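The relationship between delay slots and execute phases listed above is simply "phases = delay slots + 1". The sketch below tabulates it on a host machine; the table and function names are hypothetical, and ADD is included as an assumed example of a single-phase instruction with no delay slots.

```python
# Delay slots per instruction, as listed in the slides (ADD is an assumption:
# a typical single-cycle instruction with no delay slots).
DELAY_SLOTS = {"ADD": 0, "MPY": 1, "LDH": 4, "LDW": 4, "B": 5, "MPYDP": 9}


def execute_phases(mnemonic):
    """Number of execute phases: one more than the number of delay slots."""
    return DELAY_SLOTS[mnemonic] + 1


def effect_visible_cycle(mnemonic, issue_cycle):
    """First cycle in which a later instruction sees the effect.

    For MPY or a load this is when the result can be read; for a branch it
    is the cycle in which the branch target starts executing.
    """
    return issue_cycle + DELAY_SLOTS[mnemonic] + 1


for name in DELAY_SLOTS:
    print(name, execute_phases(name), effect_visible_cycle(name, 1))
```

A scheduler, human or compiler, has to fill the cycles in between with independent instructions or NOPs.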
Functional unit latency:
The number of cycles for which an instruction ties up a functional unit.
It is 1 for all instructions except double-precision instructions
during this time no other instruction can use the functional unit
It is different from the delay slot
e.g. MPYDP has a functional unit latency of 4 but 9 delay slots
Delay slot: some instructions that are physically after the instruction are executed as if they were located before it.
Classic examples are branch and call instructions, which often execute the following instruction before the branch or call is performed.
Instruction Set
Assembly code format:
Label || [ ] Instruction Unit Operands ; comments
A label represents a specific address/memory location that contains an instruction or data (the label must be in the first column)
Parallel bars (||) are used if the instruction is executed in parallel with the previous instruction
The field [ ] is optional and makes the associated instruction conditional
- 5 registers are used as conditional registers
- [A2] specifies that the associated instruction executes if A2 is not zero
- [!A2] specifies that the associated instruction executes if A2 is zero
Slide 75: The instruction field can be an assembler directive or a mnemonic
- assembler directive is a command for assembler
.short : initialize 16 bit integer
.int : initialize 32 bit integer
.float : initialize 32 bit IEEE single precision constant
- mnemonic is an actual instruction that executes at run time
Unit field can be any one of the 8 functional units (optional)
Comments starting in column 1 begin with an asterisk or a semicolon
whereas comments starting in any other column must begin with a semicolon
Eg:
ADD .L1 A3,A7,A7 ; add A3 + A7 -> A7
MPY .M2 A7,B7,B6 ; multiply 16 LSBs of A7, B7 -> B6
|| MPYH .M1 A7,B7,A6 ; multiply 16 MSBs of A7, B7 -> A6
Instruction set
DSP instruction sets are designed to make maximum use of the processor's resources and at the same time minimize the memory space required to store the instructions.
Minimizing the storage space ensures the cost effectiveness of the overall system.
To ensure the maximum use of the DSP hardware, the instructions are designed to perform several parallel operations in a single instruction, typically including fetching of data in parallel with the main arithmetic operation.
Slide 77: Instructions are kept short by restricting which registers can be used with which operations and which operations can be combined in an instruction.
Some of the latest processors use VLIW architectures, wherein multiple instructions are issued and executed per cycle.
In such architectures the instructions are short and each performs much less work, so each requires less memory, while speed increases because several instructions are issued per cycle.
'C6x Instruction Set (by category)
'C6x Instruction Set (by unit)
‘C67x Add’l Instructions (by unit)
Control Register File
Slide 82: Addressing mode register (AMR)
- specifies the addressing mode
Control status register (CSR)
- contains control and status bits.
Interrupt clear register (ICR)
- allows you to manually clear the maskable interrupts (INT15-INT4) in the interrupt flag register (IFR).
- Writing a 1 to any of the bits in ICR causes the corresponding interrupt flag (IFn) to be cleared in IFR.
- Writing a 0 to any bit in ICR has no effect.
- You cannot set any bit in ICR to affect NMI or reset.
Interrupt enable register (IER)
- enables and disables individual interrupts.
Slide 83: The interrupt flag register (IFR)
- contains the status of INT4-INT15 and NMI interrupt.
- Each corresponding bit in the IFR is set to 1 when that interrupt occurs; otherwise, the bits are cleared to 0.
- If you want to check the status of interrupts, use the MVC instruction to read the IFR.
The interrupt return pointer register (IRP)
- contains the return pointer that directs the CPU to the proper location to continue program execution after processing a maskable interrupt.
- A branch using the address in IRP (B IRP) in your interrupt service routine returns to the program flow when interrupt servicing is complete.
Slide 84: The interrupt set register (ISR)
- allows you to manually set the maskable interrupts (INT15-INT4) in the interrupt flag register (IFR).
- Writing a 1 to any of the bits in ISR causes the corresponding interrupt flag (IFn) to be set in IFR.
- Writing a 0 to any bit in ISR has no effect.
- You cannot set any bit in ISR to affect NMI or reset.
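The write-1-to-set and write-1-to-clear behaviour of ISR and ICR on the flags in IFR can be modeled on a host machine. This is a behavioural sketch under stated assumptions, not device code: the class and method names are hypothetical, and it assumes that maskable interrupt INTn corresponds to bit n of each register.

```python
# Bits 4..15 stand for the maskable interrupts INT4..INT15 (assumed mapping).
MASKABLE = sum(1 << n for n in range(4, 16))


class InterruptFlags:
    """Minimal model of IFR as seen through writes to ISR and ICR."""

    def __init__(self):
        self.ifr = 0

    def write_isr(self, value):
        # Writing 1 sets the matching flag; writing 0 has no effect.
        # Bits outside INT4..INT15 (NMI, reset) are ignored.
        self.ifr |= value & MASKABLE

    def write_icr(self, value):
        # Writing 1 clears the matching flag; writing 0 has no effect.
        self.ifr &= ~(value & MASKABLE)

    def pending(self, n):
        return bool((self.ifr >> n) & 1)


flags = InterruptFlags()
flags.write_isr((1 << 4) | (1 << 1))  # request INT4; the bit-1 write is ignored
print(hex(flags.ifr), flags.pending(4))
flags.write_icr(1 << 4)               # clear INT4 again
print(hex(flags.ifr), flags.pending(4))
```

Separate set and clear registers let software change one flag without a read-modify-write of IFR, which would risk losing a flag raised by hardware in between.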
The interrupt service table pointer register (ISTP)
- is used to locate the interrupt service routine (ISR).
The NMI return pointer register (NRP)
- contains the return pointer that directs the CPU to the proper location to continue program execution after NMI processing.
- A branch using the address in NRP (B NRP) in your interrupt service routine returns to the program flow when NMI servicing is complete.
The E1 phase program counter (PCE1)
- contains the 32-bit address of the fetch packet in the E1 pipeline phase.
Addressing modes
Determine how memory is accessed
Addressing refers to the means of specifying the location of operands for instructions
- the types of addressing are called addressing modes
- operands may be input operands for the operation as well as results of the operation
Addressing modes supported by the TMS320C67x include
and modulo addressing (circular addressing).
Immediate data is also supported.
The TMS320C67x does not support modulo addressing for 64-bit data.
Slide 86:
Immediate: the operand is part of the instruction. Example: ADD .L1 -13,A1,A6
Register: the operand is specified in a register (implied). Example: ADD .L1 A7,A6,A7
Direct: the address of the operand is part of the instruction (added to imply memory page). Not supported
Indirect: the address of the operand is stored in a register. Example: LDW .D1 *A5++,A1
Register-Indirect Addressing
The operand is located at the memory address stored in a register
Special group of registers can be used to store addresses
Most important addressing mode in DSPs
Efficient from instruction set point of view
Few bits are needed to indicate address of operand
can be used with or without displacement
32 registers(A0-A15,B0-B15) are used as pointers
Indirect addressing uses ‘*’ in conjunction with one of the 32 registers
Slide 88: 1. *R – register R contains the address of a memory location
where a data value is stored
2. *R++(d) – register R contains the memory address
- after the memory address is used, R is
postincremented such that the new address is R+d (R+1 if d=1, the default)
- a double minus (--) instead post-decrements the address to R-d
3. *++R(d) – the address is preincremented or offset by d
- the current address is R+d, or R-d with a double minus (--)
4. *+R(d) – the address is offset by d, such that the current address is R+d
- however, R is used with the offset without being modified
- unlike the previous case, R is not updated or modified
Circular addressing
Circular addressing is used to create a circular buffer
The buffer is created in hardware and is very useful for applications like digital filtering
This addressing mode, in conjunction with a circular buffer, updates samples without the overhead of shifting data directly
When the pointer reaches the bottom location and is incremented, it automatically wraps around to the top location
Two independent buffers are available using BK0 and BK1 within the AMR register
Registers A4-A7 and B4-B7 in conjunction with .D unit can be used as pointers
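The pointer update for a post-increment access, with and without circular wrap-around, can be sketched on a host machine. This is a simplified model under stated assumptions: the function names are hypothetical, the displacement d is taken directly in bytes (the real device scales it by the element size), and the buffer is assumed to be aligned to its power-of-two block size of 2^(N+1) bytes.

```python
def block_size_bytes(n):
    """Circular buffer size selected by a block-size field value N."""
    return 2 ** (n + 1)


def post_increment(ptr, d=1, block_size=None):
    """Pointer value after a *R++(d) style access.

    With block_size=None the update is linear. Otherwise the pointer wraps
    inside the aligned block that contains it; block_size must be a power
    of two.
    """
    if block_size is None:
        return ptr + d
    base = ptr & ~(block_size - 1)              # start of the aligned block
    offset = (ptr + d) & (block_size - 1)       # new offset, wrapped
    return base | offset


size = block_size_bytes(5)                      # 64-byte buffer
ptr = 0x100
visited = []
for _ in range(20):                             # twenty 4-byte steps
    visited.append(ptr)
    ptr = post_increment(ptr, 4, size)
print([hex(a) for a in visited])
```

Because only the low bits of the pointer change, the wrap costs no extra instructions, which is why a delay line in a filter can be updated without moving any samples.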
MVC (move between the control register file and a general-purpose register file) is the only instruction that can access the AMR and the other control registers
Circular Buffer
At the beginning of each
a new sample will be read into the circular buffer, overwriting the oldest sample.
The newest sample x(n) will be stored at the memory location pointed at by auxiliary register AR(i).
Slide 91: The need to process digital signals in real time gives rise to the concept of circular buffering.
Circular buffers are used to store the most recent values of a continually updated signal.
Circular buffering allows processors to access a block of data sequentially and then automatically wrap around to the beginning address, which is exactly the pattern used to access coefficients in an FIR filter.
Circular buffering is also very helpful in implementing first-in, first-out buffers, commonly used for I/O and for FIR delay lines.
Addressing Mode Register (AMR)
For each of the eight registers (A4–A7, B4–B7) that can perform linear or circular addressing, the addressing mode register (AMR) specifies the addressing mode.
A 2-bit field for each register selects the address modification mode: linear (the default) or circular mode.
With circular addressing, the field also specifies which BK (block size) field to use for a circular buffer.
In addition, the buffer must be aligned on a byte boundary equal to the block size.
Slide 93: AMR mode and description
00 for linear addressing
01 for circular addressing using BK0
10 for circular addressing using BK1
11 reserved
Slide 98: Block size = 2^(N+1) bytes
Slide 99: Eg:
MVK .S2 0x0004,B2
; lower 16 bits to B2
MVKLH .S2 0x0005,B2
; upper 16 bits to B2
MVC .S2 B2,AMR
; move the 32 bits of B2 to AMR
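The 32-bit value assembled in B2 by the MVK and MVKLH pair, 0x00050004, can be decoded field by field on a host machine. The sketch below assumes the following AMR layout: eight 2-bit mode fields starting at bit 0 in the order A4, A5, A6, A7, B4, B5, B6, B7, then BK0 in bits 16 to 20 and BK1 in bits 21 to 25. The function and table names are hypothetical.

```python
POINTERS = ["A4", "A5", "A6", "A7", "B4", "B5", "B6", "B7"]
MODES = {0b00: "linear", 0b01: "circular BK0",
         0b10: "circular BK1", 0b11: "reserved"}


def decode_amr(amr):
    """Return (mode per pointer register, BK0 field, BK1 field)."""
    modes = {reg: MODES[(amr >> (2 * i)) & 0b11]
             for i, reg in enumerate(POINTERS)}
    bk0 = (amr >> 16) & 0x1F
    bk1 = (amr >> 21) & 0x1F
    return modes, bk0, bk1


amr = (0x0005 << 16) | 0x0004   # upper half from MVKLH, lower half from MVK
modes, bk0, bk1 = decode_amr(amr)
print(modes["A5"], bk0, 2 ** (bk0 + 1))
```

Only A5 leaves linear mode here; the other seven pointer registers keep the default 00 setting.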
Writing the value 0x0004 = (0100)b into the 16 LSBs of AMR sets bit 2 (the third bit) to 1 and all other bits to zero.
This sets the mode to 01 and selects register A5 as the pointer to the buffer, using BK0
Writing the value 0x0005 = (0101)b into the 16 MSBs of AMR sets bits 16 and 18 to 1.
This corresponds to the value N = 5, which selects a buffer size of 2^(N+1)
= 64 bytes using BK0
Interrupts
The C6711 device supports 16 prioritized interrupts
Types of interrupts:
Non maskable
Slide 101: Reset (RESET)
Reset is the highest priority interrupt and is used to halt the CPU and return it to a known state.
The reset interrupt is unique in a number of ways:
- RESET is an active-low signal. All other interrupts are active-high signals.
- RESET must be held low for 10 clock cycles before it goes high again to reinitialize the CPU properly.
- The instruction execution in progress is aborted and all registers are returned to their default states.
- RESET is not affected by branches.
Slide 102: Nonmaskable Interrupt (NMI)
- NMI is the second-highest priority interrupt
- generally used to alert the CPU of a serious hardware problem such as imminent power failure.
- For NMI processing to occur, the nonmaskable interrupt enable (NMIE) bit in the interrupt enable register must be set to 1.
Slide 103: Maskable Interrupts (INT4−INT15)
- These have lower priority than the NMI and reset interrupts.
- These interrupts can be associated with external devices, on-chip peripherals, software control etc.
The interrupt source for interrupts 4-15 can be programmed by modifying the selector value (binary value) in the corresponding fields of the Interrupt
Selector Control registers:
MUXH (address 0x019C0000) and
MUXL (address 0x019C0004).
Interrupt Priority
Slide 105: Multichannel Buffered Serial Port (McBSP)
The standard serial port interface provides:
- Double-buffered data registers, which allow a continuous data stream
- Independent framing and clocking for reception and transmission
- Direct interface to industry-standard codecs, analog interface chips (AICs), and other serially connected A/D and D/A devices
- Multichannel transmission and reception of up to 128 channels.
- Element sizes of 8, 12, 16, 20, 24, or 32 bits.
- 8-bit data transfers with LSB or MSB first.
Slide 107: The McBSP consists of a data path and a control path that connect to external devices.
Separate pins for transmission and reception communicate data to these external devices.
Four other pins communicate control information (clocking and frame synchronization).
The device communicates with the McBSP using 32-bit-wide control and data registers accessible via the internal peripheral bus.
Pin Description
CLKR Receive clock
CLKX Transmit clock
CLKS External clock
DR Received serial data
DX Transmitted serial data
FSR Receive frame synchronization
FSX Transmit frame synchronization
Slide 108: The CPU or DMA writes the data to be transmitted to the data transmit register (DXR), from which it is shifted out to DX via the transmit shift register (XSR).
Similarly, receive data on the DR pin is shifted into the receive shift register (RSR) and copied into the receive buffer register (RBR).
RBR is then copied to the data receive register (DRR), which can be read by the CPU or the DMA controller.
This allows internal data movement and external data communications simultaneously.
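The receive path DR -> RSR -> RBR -> DRR can be modeled on a host machine to show why the extra buffering lets reception continue while the CPU has not yet read the previous word. This is a behavioural sketch only: the class and method names are hypothetical, bits are assumed to arrive MSB first, and overrun handling (a third word arriving while RBR is still full) is not modeled.

```python
class McbspReceiver:
    """Toy model of the McBSP receive registers RSR, RBR and DRR."""

    def __init__(self, word_bits=8):
        self.word_bits = word_bits
        self.rsr = 0          # receive shift register
        self.bits_in = 0
        self.rbr = None       # receive buffer register
        self.drr = None       # data receive register, read by CPU or DMA

    def shift_in(self, bit):
        """Clock one bit from the DR pin into RSR."""
        mask = (1 << self.word_bits) - 1
        self.rsr = ((self.rsr << 1) | (bit & 1)) & mask
        self.bits_in += 1
        if self.bits_in == self.word_bits:
            self.rbr, self.rsr, self.bits_in = self.rsr, 0, 0
            self._rbr_to_drr()

    def _rbr_to_drr(self):
        # RBR is copied to DRR only when DRR is free.
        if self.drr is None and self.rbr is not None:
            self.drr, self.rbr = self.rbr, None

    def read_drr(self):
        """CPU or DMA read; frees DRR so a waiting word can move up."""
        value, self.drr = self.drr, None
        self._rbr_to_drr()
        return value


rx = McbspReceiver(word_bits=8)
for word in (0xA5, 0x3C):
    for i in range(7, -1, -1):
        rx.shift_in((word >> i) & 1)
first = rx.read_drr()
second = rx.read_drr()
print(hex(first), hex(second))
```

In the usage above, two words are received before the first read and neither is lost, because the second word waits in RBR until DRR is free.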
The following control registers are used in multichannel operation:
The multi channel control register (MCR)
The transmit channel enable register (XCER)
The receive channel enable register (RCER)
Slide 109: Other registers for clock generation, frame synchronization and control are:
serial port control register (SPCR)
receive control register (RCR)
transmit control register (XCR)
pin control register (PCR)
sample rate generator register (SRGR)
DMA
Direct Memory Access transfers data to or from the processor’s memory without the involvement of the processor itself.
DMA is commonly used to provide improved performance with input/output devices.
Rather than have the processor read data from an I/O device and copy the data into memory or vice versa, a separate DMA controller can handle such transfers in parallel.
The processor loads the DMA controller with control information including the starting address for the transfer, the number of words to be transferred, the source and the destination.
Slide 111: The DMA controller uses the bus request pin to notify the DSP core that it is ready to make a transfer to or from external memory.
The DSP core completes its current instruction, releases control of external memory and signals the DMA controller via the bus grant pin that the DMA transfer can proceed.
The DMA controller then transfers the specified number of data words and optionally signals completion through an interrupt.
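The sequence described above (load control information, transfer the words, optionally signal completion) can be sketched as a host-side model. The descriptor fields and function names are hypothetical and do not correspond to any real DMA register set; memory is modeled as a plain list of words, and the completion interrupt as a callback.

```python
from dataclasses import dataclass


@dataclass
class DmaDescriptor:
    """Control information the processor hands to the DMA controller."""
    src: int      # index of the first source word
    dst: int      # index of the first destination word
    count: int    # number of words to transfer


def dma_transfer(memory, desc, on_complete=None):
    """Copy desc.count words, then optionally signal completion."""
    for i in range(desc.count):
        memory[desc.dst + i] = memory[desc.src + i]
    if on_complete is not None:
        on_complete()   # stands in for the completion interrupt


memory = list(range(8)) + [0] * 8
events = []
dma_transfer(memory, DmaDescriptor(src=0, dst=8, count=4),
             on_complete=lambda: events.append("dma_done"))
print(memory[8:12], events)
```

On real hardware the copy loop runs in the controller while the CPU executes other code; the model runs it inline only to keep the example short.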
Some processors also have multiple DMA channels that manage DMA transfers in parallel.
Timer
The ’C67x has two 32-bit general-purpose timers that can be used to:
Interrupt the CPU
Send synchronization events to the DMA controller
Slide 114: The timer works in one of two signaling modes, depending on whether it is clocked by an internal or an external source.
The timer has an input pin (TINP) and an output pin (TOUT).
The TINP pin can be used as a general purpose input, and the TOUT pin can be used as a general-purpose output.
When an internal clock is provided, the timer generates timing sequences to trigger peripheral or external devices such as the DMA controller or an A/D converter, respectively.
When an external clock is provided, the timer can count external events and interrupt the CPU after a specified number of events.
Load/Store Options
In the 'C6x, the instruction set supports several types
of load/store instructions:
LDH .D2 *B2++,B7
|| LDH .D1 *A2++,A7
loads into B7 the 16 bits (half word) whose address in memory is specified by B2, and B2 is then postincremented;
in parallel, loads into A7 the content in memory whose address is specified by A2
STW .D2 A1,*+A4[20]
stores the 32-bit word A1 into memory whose address is specified by A4 offset by 20 words (32 bits each), or 80 bytes
Load, and Store Paths
The C67x DSP has two 32-bit paths for loading data from memory to the register file:
LD1 for register file A, and LD2 for register file B.
The C67x DSP also has a second 32-bit load path for both register files A and B.
This allows the LDDW instruction to simultaneously load two 32-bit values into register file A and two 32-bit values into register file B.
For side A, LD1a is the load path for the 32 LSBs and LD1b is the load path for the 32 MSBs.
For side B, LD2a is the load path for the 32 LSBs and LD2b is the load path for the 32 MSBs.
There are also two 32-bit paths, ST1 and ST2, for storing register values to memory from each register file.
THE END