# **COMPUTER ORGANIZATION AND ARCHITECTURE TEST 3**

#### Number of Questions: 35

*Directions for questions 1 to 35:* Select the correct alternative from the given choices.

**1.** A computer '*A*' uses the following 32-bit floating point representation of real numbers.



Computer 'B' uses the following floating point representation scheme.



Which of the following statements is TRUE with regard to computer *B*'s method of representing floating-point numbers compared with computer *A*'s method?

- (A) Both range and precision are decreased.
- (B) The range is decreased but the precision is increased.
- (C) The range is increased but the precision is decreased.
- (D) Both range and precision remain same.
- 2. The minimum number of control bits required to execute a microprogram which has 50 control signals is

| (A) | 5  | (B) | 6   |
|-----|----|-----|-----|
| (C) | 50 | (D) | 100 |

**3.** Consider the following instructions: BEQ *R*<sub>0</sub>, *R*<sub>1</sub>, *L*<sub>1</sub>

The opcode is BEQ, which means Branch if Equal. The instruction checks the equality of  $R_0$ ,  $R_1$  and if both are equal jump to  $L_1$ . Which of the following correctly specifies given instruction but with much greater branching distance?

- (A) compare  $R_0, R_1$  if zero jump to  $L_1$
- (B) LOAD  $R_0$  Branch if equal  $R_1, L_1$
- (C) Branch if not equal  $R_0$ ,  $R_1$ ,  $L_2$ ; Jump  $L_1$ ;  $L_2$ :
- (D) Branch if not equal  $R_0$ ,  $R_1$ ,  $L_1$ ; Jump  $L_2$ ;  $L_1$ :
- 4. Consider a loop branch that branches 100 times in a row, then it is not taken once. Assume that the prediction bit for this branch remains in the prediction buffer. The prediction accuracy for this branch by using 1-bit branch history table prediction scheme (in percentage) is \_\_\_\_\_.

| (A) | 90 | (B) | 95 |
|-----|----|-----|----|
| (C) | 98 | (D) | 99 |

5. The biased exponent value for double precision floating point numbers is \_\_\_\_\_.

| (A) | 1023 | (B) | 1024 |
|-----|------|-----|------|
| (C) | 256  | (D) | 255  |

- 6. Consider the following instruction:
  - ADD  $R_1$ ,  $(R_2, R_3)$

Here  $R_1$ ,  $R_2$ ,  $R_3$  are registers. The sum of the data present in  $R_2$  and  $R_3$  is stored in the 32-bit register  $R_1$ .

Which of the following modes best reflects the addressing mode of the second operand?

- (A) Indexed mode
- (B) Base Register mode
- (C) Base with index mode
- (D) Base with index and offset mode
- 7. Consider an instruction format with fields containing zeros:

| 0000   | 0000 0000   | 0000 0000 | 0000 0000 0000  |
|--------|-------------|-----------|-----------------|
| Opcode | Destination | Source    | Immediate value |
|        | Register    | Register  |                 |

What is the maximum possible number of operations, registers with the given instruction format?

| (A) | 4, 8    | (B) | 4, 16     |
|-----|---------|-----|-----------|
| (C) | 16, 256 | (D) | 16, 65536 |

8. A 64 M-bit DRAM is organized as 4 M addresses of 16-bit words each. A memory system is built using 128 M addresses of 256-bit words each. How many DRAM chips are required for this memory system?

| (A) | 16  | (B) | 32  |
|-----|-----|-----|-----|
| (C) | 128 | (D) | 512 |

**9.** Consider a hypothetical 64-bit micro-processor having 64-bit instructions composed of two fields: Opcode: 2-bytes. Immediate operand or an operand address: remaining bytes.

What is the maximum directly addressable memory capacity (in bytes)?

| (A) |          | (B) | $2^{48}$ |
|-----|----------|-----|----------|
| (C) | $2^{64}$ | (D) | $2^{24}$ |

- **10.** Consider a machine with a byte addressable main memory of 4 GB and a block size of 16 bytes. Assume that a direct mapped cache consisting of 32 lines is used with this machine. How many bytes of main memory can be stored in the cache?
  - (A) 4096 (B)  $2^{20}$ (C) 512 (D) 736
- 11. Consider a dynamic RAM that must be given a refresh cycle 64 times per ms. Each refresh operation requires 150 ns; a memory cycle requires 250 ns. What percentage of the memory's total operating time is given to refreshes?
  - (A) 1% (B) 3%
  - (C) 5% (D) 10%
## Section Marks: 30

- **12.** A DMA module is transferring characters to memory using cycle stealing, from a device transmitting at 9600 bps. The processor is fetching instructions at the rate of 1 million instructions per second (1 MIPS). By how much is the processor slowed down due to the DMA activity?

| (A) | 0.0012% | (B) | 0.01% |
|-----|---------|-----|-------|
| (C) | 0.001%  | (D) | 0.12% |

**13.** Match list-A with list-B and select the correct answer using the code given below the list:

|     | List-A        |     | List-B         |
|-----|---------------|-----|----------------|
| a.  | Cache         | 1.  | Printer        |
| b.  | DMA I/O       | 2.  | Disk           |
| c.  | Interrupt I/O | 3.  | High speed RAM |

- (A) a-1, b-2, c-3 (B) a-2, b-3, c-1
- (C) a-3, b-2, c-1 (D) a-3, b-1, c-2

14. The speed-up gained by a '*p*'-segment pipeline executing '*q*' tasks is:

(A)  $\frac{q+p-1}{pq}$  (B)  $\frac{p+q}{pq-1}$   
(C)  $\frac{pq}{p+q-1}$  (D)  $\frac{p+q}{pq+1}$ 

- **15.** The IEEE 32-bit floating point format of –6 is:
  - (A) 1 10000001 10000000000000000000000
  - (B) 1 0000010 0000000000000000000000
  - (C) 1 0000001 1000000000000000000000
  - (D) 1 1000001 0000000000000000000000
- 16. Consider the code segment below:

LOAD  $R_1$ , M[1000]  
LOAD  $R_2$ , M[1002]  
ADD  $R_3$ ,  $R_1$ ,  $R_2$   
STORE  $R_3$ , M[1008]  
LOAD  $R_4$ , M[1004]  
ADD  $R_5$ ,  $R_1$ ,  $R_4$   
STORE  $R_5$ , M[1010]

Assume that this program is executed on a pipelined processor with 5 stages: FI (Fetch Instruction), RD(Read registers while Decoding), EX (Execute the operation or calculate the address), MEM (Access an operand in data memory), WR (write result into a register).

The first operand of each instruction is the destination. If the processor also uses forwarding, what is the minimum number of stalls that remain after the given code is reordered?

| (A) | 0 | (B) 1 |
|-----|---|-------|
| (C) | 2 | (D) 3 |

17. Consider a  $2 \times 8$  two-dimensional array of elements, *M*. Assume that each element in the array occupies one word and the array elements are stored in column-major order in the main memory from location 2000 to location 2015. The cache consists of 8 blocks and each block holds just two words. Assume also that, whenever needed, the LRU replacement policy is used. What is the percentage of cache utilization for direct mapped, associative and 2-way set-associative caches respectively, if the processor accesses the following elements?

- (C) 100%, 100%, 100% (D) 100%, 100%, 50%
- **18.** Consider the given program structure, which is in Main memory:



The memory addresses are in decimal. The program consists of two loops along with start and end statements. All the instructions in the program are executed sequentially. The program is to be run on a computer that has an instruction cache of size 1 K words organized in a direct-mapped manner. The main memory size is 64 K words with a block size of 128 words. Let the cycle time of main memory be 10 ns and the cycle time of the cache be 1 ns.

Then the total time needed for reading instructions from main memory to the cache during the execution of the program (in nano seconds) is \_\_\_\_\_.

| (A) | 1280  | (B) | 4864  |
|-----|-------|-----|-------|
| (C) | 48640 | (D) | 61440 |

**19.** A pipelined processor has two branch delay slots. An optimizing compiler can fill one of these slots 75% of the time and can fill the second slot only 20% of the time. What is the ratio of the performance achieved with this optimization to the performance without it, assuming that 10% of the instructions executed are branch instructions?

| (A) | 1.102 | (B) | 1.086 |
|-----|-------|-----|-------|
| (C) | 0.9   | (D) | 1.105 |

**20.** Consider a system with a 4 KB, 4-way set associative cache memory with 128 lines, and a 1024 word array where each element is a 32-bit word. When a program accesses this array with a scale of D, it means that, starting with the first element, the program accesses every  $D^{\text{th}}$  element. For example, for D = 1, the program accesses every element, for D = 2, the program accesses every second element and so on. Assuming a cache that is initially empty and a program that makes one pass over this array with a scale of D, what is the miss rate generated for D = 8?

(A)  $\frac{1}{2}$  (B)  $\frac{1}{4}$   
(C)  $\frac{1}{8}$  (D)  $\frac{1}{16}$ 

- **21.** A computer has 32-bit instructions and 12-bit addresses. If there are 240 two-address instructions, how many one-address operations can be formulated?
  - (A) 4096 (B) 65536 (C) 8192 (D) 131072
- **22.** Consider a pipelined processor with a 5-stage pipeline. Assume that all instructions take 5 cycles. The dynamic instruction count by type, as a percentage of the total, is as follows:
  - 10% store instructions
  - 20% load instructions
  - 30% branch instructions
  - 40% ALU instructions

What is the ideal speed-up due to pipelining for this processor?

| (A) | 2.5 | (B) | 5  |
|-----|-----|-----|----|
| (C) | 10  | (D) | 50 |

**23.** For the data given in Q. No. 22, let stalls due to data hazards occur for only two reasons.

A stall of two cycles occurs when a load instruction is followed by an ALU instruction that uses the result of the load. This scenario exists for 40% of the load instructions.

A stall of three cycles occurs when a branch instruction is preceded by an ALU operation whose result is used as the branch condition. This scenario exists for 50% of the branch instructions. What is the decrease in the ideal speed-up of pipelining due to data hazards alone?

| (A) | 50%   | (B) | 75.6% |
|-----|-------|-----|-------|
| (C) | 18.9% | (D) | 37.8% |

24. Given

 $x = (0100\ 0110\ 1101\ 1000\ 0000\ 0000\ 0000\ 0000)_2$  and  $y = (1011\ 1110\ 1110\ 0000\ 0000\ 0000\ 0000\ 0000)_2$ , representing single precision IEEE 754 floating point numbers. Then the respective values of x + y and x \* y in decimal (approximately) are:

- (A) 27647.5625 and -12096
- (B) 27647.5625 and -24192
- (C) 13823.75 and 24192
- (D) 13823.75 and -24192
- **25.** A 5-stage pipeline has the following stages:
  - IF: Instruction fetch
  - ID: Instruction decode and register file read
  - EX: Execution or address calculation
  - MEM: Data memory access
  - WB: Write back

The following code is executed on this pipeline:

| Instruction             | Operation                     |
|-------------------------|-------------------------------|
| ADD $R_1, R_2, R_3$     | $R_1 \leftarrow R_2 + R_3$    |
| SUB $R_4, R_1, R_5$     | $R_4 \leftarrow R_1 - R_5$    |
| LOAD $R_6, 200(R_1)$    | $R_6 \leftarrow M[200 + R_1]$ |
| ADD $R_7, R_1, R_6$     | $R_7 \leftarrow R_1 + R_6$    |

Use forwarding to resolve data hazards. Then the number of stalls that will occur because of data hazards in given code is \_\_\_\_\_.

| (A) | 0 | (B) | 1 |
|-----|---|-----|---|
| (C) | 2 | (D) | 3 |

26. Consider a floating point representation:  $c.r^e$  where c represents coefficient register of size 10 in which MSB bit is used to represent sign, r represents radix and e represents contents of exponent register of size 5, in which MSB is used to represent sign. Then the contents of coefficient and exponent registers for the number +1001.110 will be:

(A) 0100111000, 00100
(B) 0001001110, 00100
(C) 1001110000, 10100
(D) 1001110000, 10101

**27.** Consider a 32-bit microprocessor, with a 16-bit external data bus, driven by an 8 MHz input clock. Assume that this microprocessor has a bus cycle whose minimum duration equals four input clock cycles. What is the maximum data transfer rate across the bus that this microprocessor can sustain, in bytes per second?

- (C) 6 MB/sec (D) 8 MB/sec
- 28. Consider a bus structure in which a single internal bus connects the ALU and all processor registers. Which of the following represents the correct sequence of micro-operations to add a number to the accumulator when the number is an indirect address operand?
  - (A)  $t_1: \text{MAR} \leftarrow (\text{IR(address)})$ ;  $t_2: \text{MBR} \leftarrow \text{memory}$ ;  $t_3: Y \leftarrow (\text{MBR})$ ;  $t_4: Z \leftarrow (\text{AC}) + (Y)$ ;  $t_5: \text{AC} \leftarrow (Z)$
  - (B)  $t_1: \text{MAR} \leftarrow (\text{IR(address)})$ ;  $t_2: \text{MBR} \leftarrow \text{memory}$ ;  $t_3: Z \leftarrow (\text{AC}) + (\text{MBR})$ ;  $t_4: \text{AC} \leftarrow (Z)$
  - (C)  $t_1: \text{MBR} \leftarrow (\text{IR(address)})$ ;  $t_2: Z \leftarrow (\text{AC}) + (\text{MBR})$ ;  $t_3: \text{AC} \leftarrow (Z)$
  - (D)  $t_1: \text{MAR} \leftarrow (\text{IR(address)})$ ;  $t_2: \text{MBR} \leftarrow \text{memory}$ ;  $t_3: \text{MAR} \leftarrow (\text{MBR})$ ;  $t_4: \text{MBR} \leftarrow \text{memory}$ ;  $t_5: Y \leftarrow (\text{MBR})$ ;  $t_6: Z \leftarrow (\text{AC}) + (Y)$ ;  $t_7: \text{AC} \leftarrow (Z)$
- **29.** For the data given in Q. No. 28, what is the number of clock cycles required for above operation if each micro-operation takes one clock cycle?
  - (A) 5 (B) 4
  - (D) 7

30. The access time of a cache memory is 100 ns and that of main memory is 1 µs. 80% of the memory requests are for read and others are for write. Hit ratio for read only accesses is 0.9. A write-through procedure is used. The average access time of the system for both read and write requests is:

| (A) | 160 ns | (B) | 200 ns |
|-----|--------|-----|--------|
| (C) | 360 ns | (D) | 720 ns |

**31.** Consider a new instruction named branch-on-bit-set. The instruction

"bbs reg, pos, label"

jumps to label if the bit in position 'pos' of register operand 'reg' is one. A register is 32 bits wide and the bits are numbered 0 to 31, bit in position 0 being the least significant. Consider the following emulation of this instruction on a processor that does not have branch-on-bit-set implemented.

result  $\leftarrow$  reg & mask

Branch to label if result is non-zero.

The variable 'result' is a temporary register. For correct emulation the variable 'mask' must be generated by:

- (A) mask  $\leftarrow$  0xf (B) mask  $\leftarrow$  pos
- (C) mask  $\leftarrow$  0xfff >> pos (D) mask  $\leftarrow$  0x1 << pos

- 32. Consider a computer with the following characteristics: total main memory of 1 MB; word size of 1 byte; block size of 16 bytes; and cache size of 64 KB. For the main memory address F0010 (hexadecimal), what will be the cache line bits for a direct mapped cache?
  - (A) 1111 (B) 0000
  - (C) 0000 0000 0001 (D) 00000001000
- 33. Consider a system in which a data transfer over a bus takes 500 ns. Transfer of bus control in either direction, from processor to I/O device or vice versa, takes 250 ns. One of the I/O devices has a data transfer rate of 50 kb/s and employs DMA. Data is transferred one byte at a time. Suppose we employ DMA in a cycle-stealing mode. When transferring a block of 128 bytes, it would tie up the bus for (in micro seconds)
  - (A) 250 (B) 0.2
  - (C) 128 (D) 0.4

- **34.** A PC relative mode branch instruction is 2 bytes long. The address of the instruction, in decimal, is 356029. Determine the branch target address if the signed displacement in the instruction is -31.
  - (A) 356029
    (B) 356030
    (C) 356031
    (D) 356000
- **35.** Consider a system employing interrupt driven I/O for a particular device that transfers data on an average of 10 KB/sec on a continuous basis. Assume that interrupt processing takes 100  $\mu$ s (i.e., the time to jump to the ISR, execute it and return to the main program). Determine what fraction of processor time is consumed by this I/O device if it interrupts for every byte.
  - (A) 1 (B) 10 (C) 100 (D) 0.8

| Answer Keys |       |       |       |       |       |       |       |       |       |
|-------------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
| 1. B        | 2. B  | 3. C  | 4. C  | 5. A  | 6. C  | 7. C  | 8. D  | 9. B  | 10. C |
| 11. A       | 12. D | 13. C | 14. C | 15. A | 16. A | 17. C | 18. D | 19. B | 20. B |
| 21. B       | 22. B | 23. D | 24. A | 25. B | 26. A | 27. B | 28. D | 29. D | 30. C |
| 31. D       | 32. C | 33. D | 34. D | 35. A |       |       |       |       |       |

# HINTS AND EXPLANATIONS

1. The range of a floating point number depends on exponent size. As exponent size in '*B*' decreases, the range also decreases.

The precision specifies number of digits after decimal point, which increases in 'B' computer. Choice (B)

- 2. Using vertical microprogramming, the 50 control signals can be encoded in 6 control bits ( $\because 2^6 = 64 > 50$ , whereas  $2^5 = 32 < 50$ ). Choice (B)
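The encoding argument in explanation 2 can be checked with a short sketch. It assumes fully encoded (vertical) microinstructions in which only one control signal is selected at a time; the function name `min_control_bits` is illustrative and not from the text.

```python
def min_control_bits(num_signals: int) -> int:
    """Smallest k with 2**k >= num_signals (one encoded signal at a time)."""
    if num_signals < 1:
        raise ValueError("need at least one control signal")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1, using integers only.
    return (num_signals - 1).bit_length()


bits_for_50 = min_control_bits(50)
print(bits_for_50)  # 6, since 2**5 = 32 < 50 <= 64 = 2**6
```

Integer arithmetic is used instead of `math.log2` so that exact powers of two are not affected by floating-point rounding.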
- 3. The given instruction is BEQ  $R_0$ ,  $R_1$ ,  $L_1$ . It can be replaced by the following instructions to achieve a greater branching distance:

BNE  $R_0, R_1, L_2$ 

Jump  $L_1$ 

$L_2$ :

The greater branching distance comes from the one-address instruction Jump  $L_1$ .

A zero-address instruction cannot be used, because the jump must still carry its opcode and its target.

Since 'Jump  $L_1$ ' holds only the opcode and one address, its address field is larger than that of the conditional branch, so a much greater branching distance can be specified. Choice (C)

**4.** Branch history table is a small memory, which contains a bit that says whether the branch was recently taken or not.


In the given problem, the loop branch is taken 100 times and is not taken on the  $101^{\text{st}}$  execution.

That final not-taken outcome is mispredicted by the 1-bit branch history table, because the bit was left at 'taken' by the preceding 100 iterations. The first execution is also mispredicted, because the bit was set to 'not taken' when the loop exited during the previous execution of the program.

 $\therefore$  2 wrong predictions out of 101 predictions.

$$\therefore \quad \text{Prediction accuracy} = \frac{99}{101} \times 100 \approx 98\%$$

Choice (C)
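The count of two mispredictions in explanation 4 can be reproduced with a minimal simulation. This is a sketch under the explanation's own assumption that the history bit starts at 'not taken' (left over from the previous loop exit); the function name `one_bit_accuracy` is illustrative.

```python
def one_bit_accuracy(outcomes, initial_prediction):
    """Fraction of correct predictions for a 1-bit history predictor.

    The predictor always predicts the most recent actual outcome.
    """
    prediction = initial_prediction
    correct = 0
    for taken in outcomes:
        if prediction == taken:
            correct += 1
        prediction = taken  # the single history bit remembers the last outcome
    return correct / len(outcomes)


# Branch taken 100 times in a row, then not taken once.
outcomes = [True] * 100 + [False]
accuracy = one_bit_accuracy(outcomes, initial_prediction=False)
print(f"{accuracy:.4f}")  # 0.9802, i.e. 99 correct out of 101
```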

- 5. In the double precision floating point format, the number of bits for the exponent = 11.
  - $\therefore$  Bias =  $2^{k-1} - 1$ , where  $k$  is the number of bits used for the exponent. Here  $k = 11$ , hence bias =  $2^{10} - 1 = 1023$ . Choice (A)
- 6. If we assume  $R_2$  is the base register and  $R_3$  is the index register, then the mode is base with index mode. Choice (C)

- 7. Number of bits in opcode = 4
  - $\Rightarrow$  Number of operations possible =  $2^4 = 16$
  - Number of bits in each source/destination register field = 8
  - $\therefore$  Number of registers =  $2^8 = 256$ . Choice (C)
- 8. Each DRAM chip has 4 M addresses of 16-bit words.
  - $\Rightarrow$  DRAM chip capacity =  $4\text{M} \times 16$  bits
  - Memory system capacity =  $128\text{M} \times 256$  bits

$$\therefore \text{ Number of chips required} = \frac{128\text{M} \times 256}{4\text{M} \times 16} = 512$$

Choice (D)

- 9. Instruction size = 64 bits
  - Opcode size = 2 B = 16 bits
  - Operand/address field = 64 - 16 = 48 bits, which can be used to specify an address.
  - $\therefore$  Maximum directly addressable memory =  $2^{48}$  bytes. Choice (B)
- 10. The cache has 32 lines, and each line holds one 16-byte block.
  - ∴ Total bytes of main memory that can be stored in the cache = 16 × 32 = 512 bytes. Choice (C)
- 11. The DRAM must be given a refresh cycle 64 times per ms.
  - Time required for one refresh operation = 150 ns
  - In 1 ms, the time spent on refresh = 64 × 150 ns = 9600 ns
  - ∴ The fraction of time devoted to memory refresh =  $\frac{9600 \times 10^{-9}}{10^{-3}} = 0.0096$
  - ∴ Approximately 1% of the memory's total operating time is given to refreshes. Choice (A)

12. The device transmits at 9600 bits per second, i.e.,  $\frac{9600}{8} = 1200$  characters per second. The processor executes 1 million instructions per second, i.e., it takes  $\frac{1}{10^6}\ \text{s} = 1\ \mu\text{s}$  per instruction. The DMA module transfers one character every

$$\frac{1}{1200}\ \text{s} \approx 833\ \mu\text{s}$$

and each transfer steals one  $1\ \mu\text{s}$  cycle from the processor.

 $\therefore$  Slowdown of the processor due to DMA

$$= \frac{1}{833} \times 100 \approx 0.12\%$$

Choice (D)
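The arithmetic of explanation 12 can be written out as a small sketch. It assumes, as the explanation does, 8 bits per character and that each transferred character costs the processor exactly one instruction time; the constant names are illustrative.

```python
BITS_PER_SECOND = 9600                # device transmission rate
BITS_PER_CHARACTER = 8                # assumption: one character is one byte
INSTRUCTIONS_PER_SECOND = 1_000_000   # 1 MIPS

chars_per_second = BITS_PER_SECOND / BITS_PER_CHARACTER
# Each character steals one instruction slot out of every second's worth.
stolen_fraction = chars_per_second / INSTRUCTIONS_PER_SECOND
slowdown_percent = stolen_fraction * 100

print(f"{slowdown_percent:.2f}%")  # 0.12%
```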

- **13.** Cache is high speed RAM, DMA I/O is used with disk, Interrupt I/O is used with printer. Choice (C)
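- 14. Without pipelining, execution time =  $pq$  (each of the  $q$  tasks passes through all  $p$  segments in turn). With pipelining, execution time =  $p + q - 1$  (the first task takes  $p$  cycles and each remaining task completes one cycle later).

$$\therefore \text{ speed-up} = \frac{pq}{p+q-1}$$

Choice (C)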
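The speed-up formula of explanation 14 is easy to explore numerically. The sketch below assumes one clock cycle per segment and no stalls; `pipeline_speedup` is an illustrative name.

```python
def pipeline_speedup(p: int, q: int) -> float:
    """Speed-up of a p-segment pipeline over a non-pipelined unit for q tasks."""
    unpipelined_cycles = p * q      # every task uses all p segments serially
    pipelined_cycles = p + q - 1    # p cycles to fill, then one task per cycle
    return unpipelined_cycles / pipelined_cycles


for tasks in (1, 10, 100, 10_000):
    print(tasks, round(pipeline_speedup(5, tasks), 3))
```

As the number of tasks grows, the value approaches the number of segments, which is the ideal speed-up used in explanation 22.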

**15.** The IEEE 32-bit floating point representation consists of a sign bit, an 8-bit biased exponent and a 23-bit fraction.

- $6 = (110)_2 = (1.10)_2 \times 2^{2}$
- For  $-6$  the sign bit is 1.
- Exponent = 2
- Biased exponent =  $127 + 2 = 129 = (10000001)_2$
- Fraction = 10000000000000000000000

Hence the IEEE 32-bit floating point representation of  $-6$  is:

1 10000001 10000000000000000000000

Choice (A)
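The hand encoding in explanation 15 can be cross-checked with Python's standard `struct` module, which packs a value as an IEEE 754 single precision number. The helper name `float32_fields` is illustrative.

```python
import struct


def float32_fields(value: float) -> str:
    """Return the 'sign exponent fraction' bit fields of value as a 32-bit float."""
    # Pack as big-endian float32, then reinterpret the same 4 bytes as an integer.
    (raw,) = struct.unpack(">I", struct.pack(">f", value))
    bits = f"{raw:032b}"
    return f"{bits[0]} {bits[1:9]} {bits[9:]}"


encoded = float32_fields(-6.0)
print(encoded)  # 1 10000001 10000000000000000000000
```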

16. In the given code, a stall occurs before each ADD instruction, because each ADD uses a register loaded by the instruction immediately before it; the given code therefore has 2 stalls. To minimize stalls, we reorder the code by placing LOAD  $R_4$ , M[1004] before ADD  $R_3$ ,  $R_1$ ,  $R_2$ .

The resultant code will be:

LOAD  $R_1$ , M[1000]  
LOAD  $R_2$ , M[1002]  
LOAD  $R_4$ , M[1004]  
ADD  $R_3$ ,  $R_1$ ,  $R_2$   
ADD  $R_5$ ,  $R_1$ ,  $R_4$   
STORE  $R_3$ , M[1008]  
STORE  $R_5$ , M[1010]

There are no stalls in the reordered code.
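The stall counts for question 16 (2 before reordering, 0 after) can be reproduced with a simplified hazard model. The sketch assumes full forwarding, so the only remaining hazard is a load whose result is read by the very next instruction, costing one stall; the tuple layout `(opcode, destination, sources)` and the function name are illustrative choices, not part of the text.

```python
def load_use_stalls(program):
    """Count load-use stalls, assuming forwarding removes all other data hazards."""
    stalls = 0
    for previous, current in zip(program, program[1:]):
        prev_op, prev_dest, _ = previous
        _, _, sources = current
        # A loaded value is only available after MEM, one cycle too late
        # for the EX stage of the instruction that directly follows.
        if prev_op == "LOAD" and prev_dest in sources:
            stalls += 1
    return stalls


original = [
    ("LOAD", "R1", ()),
    ("LOAD", "R2", ()),
    ("ADD", "R3", ("R1", "R2")),
    ("STORE", None, ("R3",)),
    ("LOAD", "R4", ()),
    ("ADD", "R5", ("R1", "R4")),
    ("STORE", None, ("R5",)),
]
reordered = [
    ("LOAD", "R1", ()),
    ("LOAD", "R2", ()),
    ("LOAD", "R4", ()),
    ("ADD", "R3", ("R1", "R2")),
    ("ADD", "R5", ("R1", "R4")),
    ("STORE", None, ("R3",)),
    ("STORE", None, ("R5",)),
]

print(load_use_stalls(original), load_use_stalls(reordered))  # 2 0
```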

Choice (A)

17. The array  $M_{2\times 8}$  is stored in column-major order in the main memory, i.e.,



In direct mapping,  $B_0$  is placed in  $L_0$ ,  $B_1$  in  $L_1$ ,  $B_2$  in  $L_2$ , ...,  $B_7$  in  $L_7$  to access the elements  $M_{0,0} - M_{0,7}$ . The remaining elements are already in cache. All those accesses will be hits.

As all 8 blocks are used, cache utilization is 100%. In associative mapping, a block of main memory can be placed in any cache line. Again 8 accesses are misses and the remaining accesses are hits. No replacement is needed and all lines of the cache are used.

 $\therefore$  Cache utilization = 100%

In 2-way set-Associative mapping, two blocks will be treated as a single set. There will be 4-sets.

Cache memory



All four sets are used: set 0 contains  $B_0$ ,  $B_4$ ; set 1 contains  $B_1$ ,  $B_5$ ; and so on. Hence cache utilization is 100%. Choice (C)

- **18.** Given:
  - Main memory size = 64 K =  $2^{16}$  words
  - Block size = 128 =  $2^7$  words
  - Cache size = 1 K =  $2^{10}$  words
  - Word field size = 7 bits
  - Number of lines =  $\frac{2^{10}}{2^7} = 2^3$
  - $\therefore$  Line field size = 3 bits
  - $\Rightarrow$  Tag =  $16 - (7 + 3) = 6$  bits


The cache memory is shown below with main memory block addresses.



Hence the sequence of reads from the main memory blocks into cache line is:

i.e., in pass 1 of the outer loop, lines 0, 1, 2, 3, 4, 5, 6, 7, 0, 1 will be accessed. In pass 2, lines 0, 1 are accessed for (0 - 127), (128 - 255) and again lines 0, 1 are accessed for (1024 - 1151), (1152 - 1279).

In the last pass, lines 0, 1, 0, 1, 2, 3 will be accessed.

 $\therefore$  Total time for reading the blocks of main memory into the cache

$$= (10 + 9 \times 4 + 2) \times 128 \times 10 = 61440\ \text{ns}$$

Choice (D)

- 19. Let the number of instructions be 100, of which 10 are branches.
  - Without optimization, time required to execute 100 instructions =  $100 + 10 \times 2 = 120$  cycles
  - With optimization, time required =  $120 - 0.75 \times 10 - 0.20 \times 10 = 110.5$  cycles

$$\therefore \text{ Improvement using optimization} = \frac{120}{110.5} = 1.086$$

Choice (B)
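The calculation in explanation 19 can be laid out step by step. The sketch follows the explanation's model: one cycle per instruction, two wasted cycles per branch when the delay slots are empty, and one cycle recovered for each slot the compiler fills. All constant names are illustrative.

```python
INSTRUCTIONS = 100
BRANCH_FRACTION = 0.10
DELAY_SLOTS = 2
FILL_PROBABILITIES = (0.75, 0.20)  # first slot, second slot

branches = INSTRUCTIONS * BRANCH_FRACTION
cycles_unoptimized = INSTRUCTIONS + branches * DELAY_SLOTS
# Every filled slot does useful work instead of wasting a cycle.
cycles_optimized = cycles_unoptimized - branches * sum(FILL_PROBABILITIES)
improvement = cycles_unoptimized / cycles_optimized

print(round(cycles_unoptimized, 1), round(cycles_optimized, 1), round(improvement, 3))
```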

- **20.** Number of lines in cache = 128. As the cache is 4-way set associative, each set contains 4 blocks.
  - $\therefore \quad \text{Number of sets} = \frac{128}{4} = 32$

So there will be 32 misses.


But the program accesses every 8<sup>th</sup> word.

$$\therefore \text{ Number of misses } = \frac{32}{8} = 4$$

Hence miss rate  $= \frac{1}{4}$ . Choice (B)

**21.** Instruction size = 32-bits

Two address instruction format will be



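Each address field takes 12 bits, so the opcode field has  $32 - 2 \times 12 = 8$  bits and there are  $2^8 = 256$  possible opcodes.

Two-address instructions = 240

Opcodes left for single-address instructions = 256 - 240 = 16

Single address instruction format will be



Each of the 16 remaining opcodes can be extended by the 12 bits of the unused address field, so total one-address operations =  $16 \times 2^{12} = 65536$ . Choice (B)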
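The expanding-opcode count in explanation 21 can be verified directly. This is a sketch of the same arithmetic; the constant names are illustrative.

```python
INSTRUCTION_BITS = 32
ADDRESS_BITS = 12
TWO_ADDRESS_INSTRUCTIONS = 240

# Bits left for the opcode once two address fields are reserved.
opcode_bits = INSTRUCTION_BITS - 2 * ADDRESS_BITS
free_opcodes = 2 ** opcode_bits - TWO_ADDRESS_INSTRUCTIONS
# Each free opcode is extended by the address field that one-address
# instructions do not need.
one_address_operations = free_opcodes * 2 ** ADDRESS_BITS

print(opcode_bits, free_opcodes, one_address_operations)  # 8 16 65536
```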

- **22.** The ideal speed-up of a pipelined processor is equal to the number of stages in the pipeline.
  - $\therefore$  As there are 5 stages, the ideal speed-up is 5. Choice (B)

**23.** Ideal speed-up = 5

Speed-up with data hazards

$$= \frac{\text{Pipeline depth}}{1 + (\text{load frequency} \times \text{load penalty}) + (\text{branch frequency} \times \text{branch penalty})}$$

$$= \frac{5}{1 + (0.2 \times 0.4 \times 2) + (0.3 \times 0.5 \times 3)} = \frac{5}{1 + 0.16 + 0.45} = 3.11$$

$$\therefore \text{ Decrease in speed-up} = \frac{5 - 3.11}{5} \times 100 = 37.8\%$$

Choice (D)
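The stall-adjusted speed-up of explanation 23 can be computed as below. The sketch uses the same formula (pipeline depth divided by one plus the average stall cycles per instruction); variable names are illustrative.

```python
PIPELINE_DEPTH = 5

# (instruction frequency, fraction of those that stall, stall cycles)
load_hazard = (0.20, 0.40, 2)
branch_hazard = (0.30, 0.50, 3)

stall_cycles_per_instruction = sum(
    frequency * fraction * penalty
    for frequency, fraction, penalty in (load_hazard, branch_hazard)
)
speedup = PIPELINE_DEPTH / (1 + stall_cycles_per_instruction)
decrease_percent = (PIPELINE_DEPTH - speedup) / PIPELINE_DEPTH * 100

print(f"{speedup:.2f} {decrease_percent:.1f}")  # 3.11 37.9
```

Without intermediate rounding the decrease comes out near 37.9%; the 37.8% in the text results from rounding the speed-up to 3.11 before subtracting.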

24. The single precision IEEE 754 format has the following field widths (in bits):

| Sign | Biased exponent | Mantissa |
|------|-----------------|----------|
| 1    | 8               | 23       |

For  $x$ :

- Sign bit = 0  $\Rightarrow$  sign = +
- Biased exponent = 1000 1101 = 141
- Exponent = 141 - 127 (bias) = 14
- Mantissa = 1.101 1000 0000 0000 0000 0000
- $\therefore x = +(1.1011)_2 \times 2^{14} = +27648$

For  $y$ :

- Sign bit = 1  $\Rightarrow$  sign = -
- Biased exponent = 0111 1101 = 125
- Exponent = 125 - 127 = -2
- Mantissa = 1.110 0000 0000 0000 0000 0000
- $\therefore y = -(1.11)_2 \times 2^{-2} = -0.4375$

$x + y = 27648 - 0.4375 = 27647.5625$

$x \times y = 27648 \times (-0.4375) = -12096$

Choice (A)
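The decoding of  $x$  and  $y$  in explanation 24 can be cross-checked by letting Python's standard `struct` module interpret the two bit patterns as IEEE 754 single precision values. The helper name `decode_float32` is illustrative.

```python
import struct


def decode_float32(bit_string: str) -> float:
    """Interpret a 32-character bit string (spaces allowed) as an IEEE 754 float32."""
    raw = int(bit_string.replace(" ", ""), 2)
    (value,) = struct.unpack(">f", raw.to_bytes(4, "big"))
    return value


x = decode_float32("0100 0110 1101 1000 0000 0000 0000 0000")
y = decode_float32("1011 1110 1110 0000 0000 0000 0000 0000")

print(x, y, x + y, x * y)  # 27648.0 -0.4375 27647.5625 -12096.0
```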

25. Given code:

$I_1$ : ADD  $R_1, R_2, R_3$   
$I_2$ : SUB  $R_4, R_1, R_5$   
$I_3$ : LOAD  $R_6, 200(R_1)$   
$I_4$ : ADD  $R_7, R_1, R_6$ 

$I_2$ ,  $I_3$  and  $I_4$  are dependent on  $I_1$ , and  $I_4$  is dependent on  $I_3$ .

The execution chart is shown below:

|       | 1  | 2  | 3  | 4   | 5     | 6   | 7   | 8   | 9  | 10 |
|-------|----|----|----|-----|-------|-----|-----|-----|----|----|
| $I_1$ | IF | ID | EX | MEM | WB    |     |     |     |    |    |
| $I_2$ |    | IF | ID | EX  | MEM   | WB  |     |     |    |    |
| $I_3$ |    |    | IF | ID  | EX    | MEM | WB  |     |    |    |
| $I_4$ |    |    |    | IF  | Stall | ID  | EX  | MEM | WB |    |

- $\therefore$   $R_1$  can be forwarded to  $I_2$ ,  $I_3$  and  $I_4$ , but  $R_6$  (in  $I_3$ ) cannot be forwarded to  $I_4$  immediately, so one stall occurs. Choice (B)
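- **26.** Given number: +1001.110

Normalizing,  $+1001.110 = +0.100111000 \times 2^{4}$ .

Then c = 0 100111000 (sign bit followed by the 9-bit fraction) and e = 0 0100 (sign bit followed by the exponent magnitude 4).

The binary point is moved 4 positions to the left. Choice (A)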

**27.** Input clock frequency = 8 MHz

$$\text{Clock cycle} = \frac{1}{8\ \text{MHz}} = 125\ \text{ns}$$

Given that bus cycle =  $4 \times$  clock cycle  $= 4 \times 125 = 500$  ns

Data bus width = 16 bits = 2 B

 $\therefore$  2 B can be transferred every 500 ns.

$$\therefore \text{ Transfer rate} = \frac{2\ \text{B}}{500\ \text{ns}} = 4\ \text{MB/sec}$$

Choice (B)

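- 28. As the number is an indirect address operand, the address field of the instruction points to a memory location that holds the effective address. So memory is referenced twice to get the actual operand: once for the effective address and once for the operand itself. Hence the correct sequence of micro-operations is the one given in option (D). Choice (D)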
- **29.** As there are 7 micro-operations and each micro-operation takes one clock cycle, the total number of clock cycles for addition with an indirect address operand = 7. Choice (D)
- **30.** Access time of cache = 100 ns
  - Access time of main memory =  $1\ \mu\text{s} = 10^3$  ns
  - Read requests = 80%
  - Write requests = 20%
  - Hit ratio for read only access = 0.9
  - Write-through means main memory is updated while updating cache memory.
  - Average access time of the system  $= 0.8(0.9 \times 100 + 0.1(100 + 1000)) + 0.2 \times 1000 = 160 + 200 = 360$  ns. Choice (C)

**31.** The instruction "bbs reg, pos, label" jumps to label if the bit in position 'pos' of register operand 'reg' is one.

 $\therefore$  To implement this instruction as

result  $\leftarrow$  reg & mask

we perform an 'and' operation between 'reg' and 'mask'. The mask must contain a single '1' to check one bit position of 'reg', and that 1 is left shifted 'pos' times to align it with the required position of 'reg'.

For example, reg = A091 = 1010 0000 1001 0001 and pos = 2. Using bbs, we won't jump to label, as the 2<sup>nd</sup> bit is zero. Using

mask  $\leftarrow$  0x1 << pos  $\leftarrow$  0x1 << 2  $\leftarrow (0100)_2$

we perform AND between reg and mask. As the result is zero, we won't jump to label. Choice (D)

**32.** Main memory = 1 MB =  $2^{20}$  B, so an address has 20 bits.

Word size = 1 B

Block size = 16 B =  $2^4$  B

Cache size = 64 KB =  $2^{16}$  B

For a direct mapped cache, the address is split into tag, line and word fields.

Word field = 4 bits

Number of blocks in cache =  $\frac{2^{16}}{2^4} = 2^{12}$ 

 $\therefore$  Line = 12 bits

tag + line = 20 - 4 = 16

Tag = 16 - 12 = 4 bits

Given address F0010:

| Tag  | Line           | Word |
|------|----------------|------|
| 1111 | 0000 0000 0001 | 0000 |

Line address = 0000 0000 0001 Choice (C)

**33.** The time required to transfer one byte is:

= 250 + 500 + 250 = 1000 ns =  $1\ \mu\text{s}$ .

To transfer 128 bytes, we require  $128\ \mu\text{s}$ . Choice (D)

**34.** Instruction size = 2 B

Current instruction address = 356029

PC always points to the address of the next instruction, i.e., 356029 + 2 = 356031.

In PC-relative addressing mode, the content of PC is added to the displacement, which is given as -31.

So the branch target address = 356031 - 31 = 356000 Choice (D)

**35.** Data transferred = 10 kB/sec = 10000 Bytes/sec

Given that an interrupt is generated for every byte, there will be 10000 interrupts per second, and the time between interrupts  $= \frac{1}{10000}\ \text{s} = 100\ \mu\text{s}$ .

 $\therefore$  Fraction of processor time consumed by this I/O device  $= \frac{100}{100} = 1$  Choice (A)
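The mask-based emulation described in explanation 31 can be sketched in a few lines. The function name `bbs_taken` is illustrative; it returns whether the branch would be taken instead of actually jumping, and it reuses the example register value A091 (hexadecimal) from the explanation.

```python
def bbs_taken(reg: int, pos: int) -> bool:
    """Emulate 'bbs reg, pos, label': True if the branch to label would be taken."""
    mask = 0x1 << pos     # a single 1 aligned with bit position pos
    result = reg & mask   # keeps only that bit of reg
    return result != 0


reg = 0xA091  # 1010 0000 1001 0001
print(bbs_taken(reg, 2))  # False: bit 2 is 0, so no jump
print(bbs_taken(reg, 4))  # True: bit 4 is 1, so the branch is taken
```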