Appendix B: Basics of logic design: Combinatorial logic: ------------------------- Decoders: logn -> N Mux: - n input lines (probably decoded from logn lines) - selector (logn) - one output Simple example: A.S + B.S' - Series of AND gates + OR gates Two level logic: - Basic law: any logical function can be represented with AND, OR and NOT gates - Stronger: Can be represented with 2 levels of logic, i.e., using sum of products representation (canonical form) - This is the basis for PLAs -> array of AND gates, with the appropriate points turned on, followed by an array of OR gates - PLAs vs ROMs: former only partially encoded while latter fully encoded (because contents of ROM can be arbitrary, and need not have different sizes for diff contents) - Dont cares: represented by an X in the truth table -- allows designer to choose it as 0 or 1 as convenient for logic minimization eg: using Karnaugh maps etc - Logic minimization => efficient implementation... Arrays of logic elements: - for 32-bit versions etc (eg ALU, Mux for bus) Sequential logic: --------------------- Clocks: - edge triggered: easy to explain - either rising or falling edge may be "active" edge => implementation dependent - active => causes state changes to occur - stable => clock period must be sufficiently long: else race conditions - Only then can a seq. logic block provide input or outputs to a combinatorial block {advantage of edge-triggered method} Memory elements: - Store state - output of element depends upon input and value of stored state => sequential - simplest: unclocked. S-R (set-reset) latch: incorrect operation when S and R asserted simultaneously - could run into problems (oscillations ec) if not de-asserted properly Flip-flops and latches: - Output = value stored inside the element - Plus, change of state is triggered by a clock (unlike S-R latch) {note: not necessarily clock EDGE} - Difference: - Latch: state changes whenever inputs change AND the clock is asserted - Flip flop: state can change only upon a clock edge => we will use flip-flops in designs - D-latch: D = data, C = clock (fig B-13) - Two stages: a AND array, and a NOR array + feedback in the NOR array - "Transparent": Value of latch changes as D changes provided C is asserted (latch "open") - When "closed", output = whatever was asserted when the latch was most recently open - Flip flops are not transparent - D-flip flop: constructed from 2 D-latches. Fig: B-15 - Upon clock edge, the master latch is "closed", but the slave is "open" and it takes the D value and stores it. I.e. the D input is sampled and stored. - Setup and hold time requirements for a D-flip flop: - note that there must be a minimum time for input to be valid before the clock edge is called: "setup time" - minimum time it must be valid After the clock edge: "hold time" Register files: Read: require register number Write: require register number, data to be written and a clock edge for timing Implementation: a decoder to decode the register number, and D-flip flops SRAMs: - Larger amounts of memory than register files - Arrays of memory elements, fixed time to access any particular memory element - Read/write access characteristics may differ - Eg: 256K x 1 => 18 address lines, each addressing a bit - Fastest SRAMs are also narrow (x1 or x4) for techniocal reasons - OutputEnable: determines if output of memory is read out. - Useful for connecting multiple memories to a single bus and using OE to determine which memory drives the bus - read access times (fastest): 5 ns-25ns - largest: 4Mb - Write constraints: setup, hold times + pulese width of WriteEnable line Implementation: - large mux implementation like registers not feasible - Use tri-state buffer elements: - shared output line called a bit-line (driven by multiple inputs) - tri-state buffer: - data signal and an OutputEnable as inputs - Equal to input signal if OE is asserted - Else in highimpedence state => some other tri-state buffer can assert the line => requirement that at most one tri-state buffer should have OE enabled ! - Incorporated into flip flops - Still this implementation requires a large decoder and large number of word lines => reorganize memory as rectangular arrays and use a 2-step decoding process New developments: - Synchronous SRAMs and SDRAMs (see new DELL advertisements which talk about 100 MHz SDRAMs) - ability to transfer a burst of data from a series of sequential addresses within an array or a row - Input: starting address and number of words (burst length) .The clock is used to transfer the successive bits - SDRAMs rapidly being used as building blocks for cache-based systems w/ block transfers DRAMs: - State is stored as charge in a capacitor. - A single transistor used to read/write to this capacitor - Since there is only one transistor per bit vs SRAMS which have 5-6 transistors/bit, the implementation can be denser and cheaper - Problem: charge in capacitor needs to be periodically refreshed: "dynamic" - Refresh = read and rewrite. Charge can last for several ms = several million clock cycles - Now, single chip memory controllers do this refresh task independent of processor - Two-level decoding structure => we can refresh entire row at a time - => total refresh overhead = upto 2% of active cycles. - Two signals: RAS (Row Access Strobe) and Column Access Strobe (CAS) - Refresh: read the columns and write te values back - 2-level circuitry etc => access times much larger (60-110ns) - But lower cost/bit => DRAMs choice for main memory - SRAMs choice for caches - Since entire row is in he column latches, column address can change without changing row address (assert CAS) - Page-mode RAMs, nibble-mode RAMs, static column mode RAMs - Extended Data Out (EDO) RAMs: special case of page-mode RAMs: access times down to nearly 25 ns within a row. - SDRAMs: sequentially transfers all bits in a row under control of a clock signal Error detection/correction: - Simple parity, or Hamming distance-2 codes. - Form a n-bit number to store n-1 bits of information. Therefore only half the 2^n combinations are valid, and are called codes. Furthermore, we maintain the property that by changing any one bit of a code word, we cannot get any other code word. Thus a 1-bit error cannot go undetected. A simple way to do this is to have exactly one invalid codeword between every two valid codewords. - For correction, we should be able to pinpoint which bit was wrong. For this we typically need one more parity bit. - Note that error detection and correction capabilities are limited Finite State machines (FSM): - Used to represent sequential system - Behavior depends upon both current state in memory and input => can't use a truth table (since state is uncertain, and depends upon feedback from the combinatorial circuit itself) - Once state is certain, next-state & outputs can be represented by a combinatorial function - FSM = Set of states, plus two functions: - next-state function: which feeds back and changes the state - and output function which produces an output - Next state function is combinatorial: given state + input, find next state - - Synchronous => changes occur with clock cycle - Moore machine: output function depends upon current state alone - (and not upon the input) - Faster implementation - Mealy machine: output function depends upon the current state as well the input - Smaller, since it may require fewer states - But equivalent to Moore machine - FSMs can be converted into hardware using combinatorial circuits (implemented using structured logic such as PLAs) + a state register. Timing Methodologies: - Edge triggered and level-sensitive - Edge triggered simpler because it has fewer rules for correctness - Race: contents of a state element depends upon the relative speed of different logic elements - Clock period larger than t_prop + t_combinatorial + t_setup - Propagation time > hold time - Clock skew => add t_skew component - Edge triggered designs have some extra logic, and may be slower sometimes Level sensitive: changes are not instantaneous => races can easily occur - two phase clocking: two non overlapping clock signals Asynchronous Inputs and Synchronizers: - Difficult to use a single clock and keep clock skew small - Also typically different devices have their own clock. - How to take an asynchronous input so that it can affect the state of the system correctly (i.e. without causing race conditions) ? - Use synchronizer. - inputs = asynch signal + clock - output = signal which is synched to clock - Simple D flip flop design where D = async input, C = clock does not work due to metastability concerns - When the async signal is transitioning at the same time as the clock is, the output will not have a legitimate high or low value but will be indeterminate between them. Worse, this state can persist !