# ENGI 9859 CoE Fundamentals -- Computer ArchitectureProblem Set 3

Please submit for grading your answers to 1 and 3 in class on Monday, Oct. 29.

1. The instruction cache in a processor suffers a 1% miss rate, whereas the data cache in the same processor experiences 5% misses. A computer is designed using this processor without a second level cache. It has been estimated for this computer that, on average, 20% of all instructions are load instructions, and 8% of all instructions are stores. All instructions normally take 2.0 clock cycles. The miss penalty is 25 clock cycles for both caches. Compute the average CPU time. The CPU runs at a clock rate of 600 MHz. Assume that the processor follows load-store architecture.
1. Suppose we know that the data cache has a 95% hit rate while reading memory, but only a 90% success in finding the memory location while writing, what will be the percentage change in the CPU time from the previous problem?
2. Let us assume that the instruction count of a particular program is 300,000. How many times can this program be executed in processor in one second?
3. The processor described in the first question is used in designing another computer, and it is decided to employ a second level cache in this design. The second level cache is unified, and the total access time when this cache produces a hit is 15 clock cycles. This cache has a hit rate of only 80%. Compute the average CPU time.
4. The system described in the previous problem, on average, does not find a required page in its main memory 0.0005% of the time. The secondary memory has an access time of 8 ms. Compute the average CPU time. There is no virtual memory, nor TLB.
2. A computer operating at a 1 GHz clock speed uses three levels of mixed cache memories which produce miss rates of 1%, 10%, and 25%, respectively. CPI for a perfect first level cache is 2 clock cycles, and the hit times for the second and third levels of cache are 5 and 15 clock cycles, respectively. If the average number of memory references per instruction is 1.30, compute the average CPU time. If a given program takes 1 millisecond to execute in this system, what is the size of this code? Assume that the penalty for main memory access is 30 clock cycles.
3. A 64-bit pipelined processor, operating at 1 GHz, has split TLBs and split first level caches, and a mixed second level on-chip cache. The L1 caches are direct mapped and are 32 kB each. The second level cache contains 512 kB and is 4-way set-associative. The block size is 64 bytes. The instruction TLB consists of 16 entries and the data TLB 32 entries. The page size is 16 kB. A short program loop goes through a 128 kB long array one word at a time without modifying the contents of the array; this process is repeated 1500 times. Assume LRU replacement strategy for the second level cache. The miss penalty for first level cache misses is 10 clock cycles, and for the second level cache misses the penalty is 40 clock cycles.
1. Compute the miss rates in each cache and TLB when this loop is executed. Calculate the execution time of this loop.
2. What happens if the block containing the loop instructions is mapped to the same cache block as one of the data blocks?
4. This question deals with a 5 GHz 64-bit processor with a single instruction pipeline. The processor employs real caches, and uses a 42 bit physical address, 64 byte blocks, 16 kB pages, and a 64 bit virtual address including 4 bits for protection and 4 ASN bits.
1. Design a suitable first level on-chip cache structure using 2-way set associativity, write buffer and appropriate strategies for block replacement, write hit and write miss.
2. Design a suitable TLB architecture by determining where the TLB should be located, if it should be unified or split, associativity, number of entries, various bit fields in each entry, and other relevant factors.
3. Determine the numbers of bits in the various bit fields of the physical address as well as the virtual address.
4. Starting with a virtual address of the instruction in the program counter, explain the steps involved in fetching and executing a Store Word (64-bit) instruction.
5. The processor includes a 2 MB on-chip unified 16-way set-associative second level cache. This cache employs LRU block replacement policy and write-back strategy. Sketch a diagram that shows the blocks and the associated overhead bits.
6. The system employs a pure paged virtual memory implementation. A three-level page table is implemented. Discuss how, upon a TLB miss, would the missing entry be found in the page table.
7. A computer built using this processor, and no off-chip cache, employs 1 GB of main memory that is interleaved into eight (word-wide) banks. If the access time of the DRAM used is 6 ns, show that 8 ns is a reasonable penalty for an L2 cache miss.
8. The L1 miss penalty is 15 clock cycles. Assuming reasonable hit rates for the caches, compute the time taken to run an application that involves execution of 40 million ALU instructions, 20 million load instructions, 10 million store instructions and 30 million branch instructions.
5. Read the following statements carefully. For each indicate if it is true or false:
1. L2 caches are always built as mixed (unified) caches.
2. Instruction Count (IC) is linearly proportional to the program size in bytes.
3. The fraction enhanced is more predominant in the computation of the overall speedup than the speedup of the enhanced part.
4. A dirty bit is required if the cache employs write through strategy.
5. All high performance processors follow pure RISC architecture.
6. Enhancements that enable reduction of cache miss rates sometimes end up causing the cache hit time to increase.
7. It is possible to come up with an ideal instruction set that would make the architecture very efficient for all applications.
8. The immediate operand always has fewer than 16 bits in any application.

back to Dennis Peters' homepage

Last modified \$Date: 2007-10-22 22:28:14 -0230 (Mon, 22 Oct 2007) \$ (\$Revision: 211 \$) by \$Author: dpeters \$