No. of Pages: 2

A

# APJ ABDUL KALAM TECHNOLOGICAL UNIVERSITY

SECOND SEMESTER M.TECH DEGREE EXAMINATION, DECEMBER 2018

### Branch: COMPUTER SCIENCE & ENGINEERING

Stream: Computer Science & Engineering

## 01CS6102: PARALLEL COMPUTER ARCHITECTURE

Answer any two full questions from each part. Limit answers to the required points.

Max. Marks: 60

Duration: 3 hours

## PART A

- 1. a. Explain Flynn's taxonomy of parallel computers. [4]
- b. Show how the following code would look in MIPS when the loop is unrolled and scheduled. [5]

        Loop: L.D    F0,0(R1)
              ADD.D  F4,F0,F2
              S.D    F4,0(R1)
              DADDUI R1,R1,#-8
              BNE    R1,R2,Loop

- 2. a. What do you understand by correlating branch predictors? [2]
- b. Consider the following MIPS code sequence that increments a vector of values in memory (starting at 0(R1) and with the last element at 8(R2)) by a scalar in register F2. List any dependences present. [5]

        Loop: L.D    F0,0(R1)    ;F0=array element
              ADD.D  F4,F0,F2    ;add scalar in F2
              S.D    F4,0(R1)    ;store result
              DADDUI R1,R1,#-8   ;decrement pointer 8 bytes
              BNE    R1,R2,Loop  ;branch R1!=R2

- c. Define Amdahl's law. [2]
- 3. a. We want to replace the processor used for Web processing. Assume that the original processor is busy with computation 40% of the time and is waiting for I/O 60% of the time. If the overall speedup gained by incorporating the enhancement is 1.5625, how much faster is the new processor on computation in the Web-serving application than the original processor? [5]
- b. Write notes on the different types of dependences. [4]

http://www.ktuonline.com

## PART B

| Q. | Part | Question | Marks |
|----|------|----------|-------|
| 4. | a. | Write the structure and purpose of reservation stations used with machines that use Tomasulo's algorithm. | [5] |
| | b. | Write notes on the pipelining used in the Intel Core i7. | [4] |
| 5. | a. | What are the different steps involved in instruction execution in a system that supports hardware-based speculation? | |
| | b. | Compare the main characteristics of the five primary approaches in use for multiple-issue processors. | [4] |
| 6. | a. | With a diagram, explain the different vector-access memory schemes. | [6] |
| | b. | Suppose we have 8 memory banks with a bank busy time of 6 clocks and a total memory latency of 12 cycles. How long will it take to complete a 128-element vector load with a stride of 1? With a stride of 32? | [3] |

## PART C

| Q. | Part | Question | Marks |
|----|------|----------|-------|
| 7. | a. | Explain the concept of multiport memory organization for a multiprocessor system. | [5] |
| | b. | What do you understand by a hierarchical bus system? | [2] |
| | c. | Check whether the permutation (0,6,4,7,3) (1,5) (2) in an Omega network built with 2x2 switches is blocking or non-blocking. Explain the routing of a message from 001 to 101. | [5] |
| 8. | a. | Explain different snoopy bus protocols. | [6] |
| | b. | Assume that words x1 and x2 are in the same cache block, which is in the shared state in the caches of both P1 and P2. Assuming the following sequence of events, identify each miss as a true sharing miss, a false sharing miss, or a hit. Any miss that would occur if the block size were one word is designated a true sharing miss. | [6] |

| Time | P1 | P2 |
|------|----------|----------|
| 1 | Write x1 | |
| 2 | | Read x2 |
| 3 | Write x1 | |
| 4 | | Write x2 |
| 5 | Read x2 | |

- 9. a. Draw and explain the basic structures of the centralized shared-memory architecture and the distributed memory architecture. [6]
- b. Draw the state transition diagram of a write-invalidate cache coherence protocol for a private write-back cache. Explain the different states and state transitions. [6]