

# **Design and Measurement Techniques for Decision Feedback Equalizers up to 110 Gb/s in SiGe Technologies**

**Dissertation approved by the Faculty of Electrical Engineering, Computer Science and Mathematics**

**of Paderborn University**

**for the award of the academic degree**

**Doktor der Ingenieurwissenschaften (Dr.-Ing.)**

**by**

**M.Sc. Ahmed Sanaa Ahmed Awny**

**First reviewer:**

**Prof. Dr.-Ing. Andreas Thiede**

**Second reviewer:**

**Prof. Dr. Sc. Techn. Frank Ellinger**

**Date of the oral examination: 15 June 2015**

**Paderborn 2015**

**EIM-E/319**



Summary of the dissertation:

**Design and Measurement Techniques for Decision Feedback Equalizers  
up to 110 Gb/s in SiGe Technologies**

**by Ahmed Sanaa Ahmed Awny**

Cloud computing, video-on-demand, voice and video over IP, and other Internet services place an ever-increasing demand on high-bit-rate data connections. Optical fibers offer a very good solution for this demand because of their large bandwidth. One of the main factors limiting the use of this bandwidth is intersymbol interference (ISI), which results from the different types of fiber dispersion.

Linear equalizers in the form of feedforward equalizers (FFEs) and non-linear equalizers in the form of decision feedback equalizers (DFEs) have proved to be an effective means of mitigating ISI. Both types are employed in fiber communication systems to compensate for different kinds of dispersion. DFEs, as opposed to FFEs, can compensate for deep nulls (discontinuities) in the channel frequency response and do not amplify high-frequency noise. Their design, however, is challenging at very high bit rates because of the timing condition of their feedback loop.

This dissertation presents the design and characterization of 80 and 110 Gb/s DFEs in 0.13 $\mu$m SiGe:C BiCMOS technology. A modified architecture is described and used to implement the DFEs; it relaxes the timing condition and improves the behavior of the feedback loop. Circuit techniques that enhance the bandwidth of the DFE building blocks are explained. Furthermore, new measurement techniques are employed to prove that the DFEs can work at such high bit rates. The functionality of the 80 Gb/s DFE is demonstrated in an experiment on mitigating ISI in bandwidth-limited channels. The DFEs developed in this work are not limited to optical fiber communication systems; they can also be employed in chip-to-chip and board-to-board communication systems.
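The decision feedback principle named in this summary can be illustrated with a minimal behavioral sketch. It is not the circuit or architecture developed in this work: the toy channel with a single post-cursor term, the tap value 0.5, and the function names `channel` and `dfe_one_tap` are assumptions chosen only for illustration. Each decision is fed back, scaled by the tap, and subtracted from the next sample, which has to complete before that next sample is decided.

```python
def channel(symbols, post_cursor):
    """Toy channel (assumed): y[n] = x[n] + post_cursor * x[n-1]."""
    received, previous = [], 0.0
    for s in symbols:
        received.append(s + post_cursor * previous)
        previous = s
    return received


def dfe_one_tap(samples, tap):
    """1-tap DFE: subtract tap * previous decision, then slice at zero."""
    decisions, corrected, previous = [], [], 0
    for y in samples:
        z = y - tap * previous        # remove the ISI of the last decided bit
        d = 1 if z >= 0 else -1       # hard decision (the nonlinear element)
        corrected.append(z)
        decisions.append(d)
        previous = d                  # feedback for the next sample
    return decisions, corrected


# Bipolar test pattern and a noiseless channel with 50 % post-cursor ISI.
symbols = [1, -1, -1, 1, 1, -1, 1, -1]
received = channel(symbols, 0.5)
decisions, corrected = dfe_one_tap(received, 0.5)

# Smallest distance of any sample from the decision threshold,
# a crude stand-in for the vertical eye opening.
eye_before = min(abs(v) for v in received)
eye_after = min(abs(v) for v in corrected)
print(decisions == symbols, eye_before, eye_after)
```

In this noiseless toy case the feedback cancels the post-cursor term exactly, so the smallest sample magnitude at the slicer doubles. With noise, a wrong decision would be fed back as well, which the sketch does not model.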



Summary of the dissertation (translated from the German Zusammenfassung):

**Design and Measurement Techniques for Decision Feedback Equalizers  
up to 110 Gb/s in SiGe Technologies**

**by Mr. Ahmed Sanaa Ahmed Awny**

Cloud computing, video-on-demand, voice and video over IP, and other Internet services generate a steadily rising demand for high-bit-rate data connections. Optical fiber, with its high bandwidth, offers a very good solution for this rising demand. One of the main factors limiting the use of the fiber's bandwidth is intersymbol interference (ISI), which results from the different types of fiber dispersion.

An effective means of mitigating ISI is the use of linear equalizers in the form of feedforward equalizers (FFEs) and of non-linear equalizers in the form of decision feedback equalizers (DFEs). Both equalizer types are used in optical fiber communication systems to compensate for different types of dispersion. In contrast to FFEs, DFEs can compensate for discontinuities in the channel frequency response and do not amplify high-frequency noise; at very high bit rates, however, their design is very demanding because of the timing conditions of the feedback loop.

This dissertation presents the design and characterization of DFEs for 80 and 110 Gb/s in the IHP 0.13 $\mu$m SiGe:C BiCMOS technology. A modified architecture is described and used to implement the DFEs in order to improve the timing conditions and the behavior of the feedback loop. Circuit techniques for increasing the bandwidth of the DFE building blocks are explained. In addition, new measurement techniques are employed to demonstrate the ability of the DFE to operate at such high bit rates. The functionality of the 80 Gb/s DFE is demonstrated in an experiment on mitigating ISI in bandwidth-limited channels. The DFEs developed in this work are not restricted to optical fiber communication systems; they can equally be used in chip-to-chip and board-to-board communication systems.



# Acknowledgments

“God does not begin by asking our ability, but more of our availability. When we prove our dependability, He will increase our capability.” Neal A. Maxwell.

First, I would like to thank my supervisor, Prof. Dr.-Ing. Andreas Thiede, for providing me with an interesting project, for planting me in a fruitful work environment, and most of all for his constant guidance, valuable advice, and patience throughout this work.

My gratitude also goes to Prof. Dr.-Ing. Christoph Scheytt, who, as the head of the IHP circuit department, has been a second supervisor to me. I would also like to thank his successor as head of department, Dr.-Ing. Gunter Fischer, for his support.

My thanks go also to Prof. Dr. Sc. Techn. Frank Ellinger, my second thesis adviser, for reviewing the work.

I wish to express my deepest appreciation to Dr. Lothar Möller of Bell Labs, who, through his experiments, “has breathed into my circuit the breath of life”.

I am very grateful for the help and support of my friends and former colleagues, Mohamed Elkhouly and Rajasekhar Nagulapalli, and many other people from IHP, whose names would at least span one more page. My tremendous appreciation also goes to Dr. Marcel Kroh and Dr. Keith Nelson for reviewing the work.

I will be forever indebted to my parents, without whose encouragement, sacrifices, and prayers I would not have been able to do this work.

Last but not least, I would like to thank my wife, Ola, for her support and understanding. Her faith has encouraged me every time to get up after I fell and to press forward when it seemed almost impossible. As she is my eternal partner in everything, I regard this dissertation, without any doubt, as no exception.



If you cannot explain it simply,  
you don't understand it well  
enough.

---

Albert Einstein



# Contents

- **Contents**
- **List of Figures**
- **List of Tables**
- **1 Introduction**
  - 1.1 Transmission Impairments in Optical Fibers
  - 1.2 Feedforward versus Feedback Equalization
  - 1.3 Chip-to-Chip and Board-to-Board Applications
  - 1.4 Motivation
  - 1.5 Organization of the Thesis
- **2 Decision Feedback Equalizer Architectures**
  - 2.1 Timing Metrics for Sequential Circuits
    - 2.1.1 Latches
    - 2.1.2 Edge-triggered D-flip-flops
  - 2.2 DFE Architectures
    - 2.2.1 Conventional Decision Feedback Loops (DFLs)
    - 2.2.2 Look-ahead DFEs
    - 2.2.3 Half-rate Parallel Look-ahead DFEs
    - 2.2.4 The Modified Half-rate Parallel Look-ahead DFE
- **3 Active and Passive Components for the Proposed DFE**
  - 3.1 Design of the DFE Broadband Front-end
    - 3.1.1 Bandwidth Requirement
    - 3.1.2 Design and Characterization of the First Version of the DFE Front-end
      - 3.1.2.1 Circuit Description
      - 3.1.2.2 Circuit Characterization
    - 3.1.3 Design and Characterization of the Second Version of the DFE Front-end
      - 3.1.3.1 Circuit Description
      - 3.1.3.2 Circuit Characterization
  - 3.2 Static Frequency Dividers
    - 3.2.1 86 GHz Static Frequency Divider
      - 3.2.1.1 Circuit Description
      - 3.2.1.2 Circuit Characterization
    - 3.2.2 100 GHz Static Frequency Divider
      - 3.2.2.1 Circuit Description
      - 3.2.2.2 Circuit Characterization
  - 3.3 Design of Broadband Clock Distribution Network
  - 3.4 Generating Differential Signals from Single-ended CW Sources
    - 3.4.1 Circuit Description
    - 3.4.2 Circuit Characterization
- **4 Design and Testing of the Proposed DFE for 80 and 110 Gb/s**
  - 4.1 Integrating the Components
    - 4.1.1 Floor Planning
    - 4.1.2 Layout Techniques
  - 4.2 Measurement Techniques
    - 4.2.1 Measuring the Maximum Operating Bit Rate of the Combinational Logic
    - 4.2.2 Measuring the Maximum Operating Frequency of the Feedback Loop
  - 4.3 Measurement at the Full Bit Rate
  - 4.4 Design and On-wafer Measurement of a 110 Gb/s DFE
  - 4.5 Comparison with State of the Art in DFEs
- **5 Conclusion and Outlook**
- **Appendix A Fabrication Technology**
- **Appendix B Deembedding**
- **Appendix C Small Signal Voltage Gain and S-parameters**
- **References**
- **List of Publications**



# List of Figures

| Figure | Caption |
|--------|---------|
| 1.1  | Number of Internet users per 100 inhabitants of the world. |
| 1.2  | Evolution of the bit rate in the IEEE 802.3 Ethernet standard. |
| 1.3  | Single-mode fiber versus multi-mode fiber. |
| 1.4  | Short-reach optical fiber communication links connecting data centers. |
| 2.1  | A latch. |
| 2.2  | Illustration of the timing metrics of a latch. |
| 2.3  | A positive edge-triggered D-flip-flop. |
| 2.4  | An example of a sequential circuit to illustrate the timing metrics of D-flip-flops. |
| 2.5  | Definition of the timing metrics for sequential circuits. |
| 2.6  | Conventional DFL. |
| 2.7  | Look-ahead DFE. |
| 2.8  | Half-rate parallel look-ahead DFE. |
| 2.9  | Simplified timing diagram of the architecture shown in Fig. 2.8 with the dashed lines, when all conditions in Ineq. 2.16-2.15 are satisfied. |
| 2.10 | (a) Direct implementation of Eqn. 2.23 and 2.24 in the half-rate parallel look-ahead DFE (b) Simplified timing diagram of the architecture shown in (a) at low bit rates, where Ineq. 2.26 is not satisfied. |
| 2.11 | Simplified timing diagram of the architecture shown in Fig. 2.10a at high bit rates, where Ineq. 2.26 is satisfied. |
| 2.12 | (a) The modified half-rate parallel look-ahead DFE (b) Simplified timing diagram of the architecture shown in (a), when $t_{cq,FF\_1-4} < UI$. |
| 2.13 | Simplified timing diagram of the architecture shown in Fig. 2.12a, when $t_{cq,FF\_1-4} > UI$. |
| 3.1  | Block diagram of the tilting amplifier and D-flip-flop at the front-end. |
| 3.2  | Input and output eye-diagrams for a first order RC LPF with $f_c$ (in GHz) = $0.7R_b$, $0.4R_b$ and $0.2R_b$. |
| 3.3  | Conventional circuit implementation of the tilting amplifier and the master latch. |
| 3.4  | The back-gate feedback technique. |
| 3.5  | Tilting amplifier merged into the latch. |
| 3.6  | The front-end master latch with cascode configuration. |
| 3.7  | The equivalent simplified half-circuit of the schematic in Fig. 3.6. |
| 3.8  | Chip photo of the first version of the DFE front-end. |
| 3.9  | Measurement and simulation results of the first version of the DFE front-end. |
| 3.10 | Common emitter amplifier with inductive peaking. |
| 3.11 | Chip photo of the second version of the DFE front-end. |
| 3.12 | Schematic of the second version of the DFE front-end. |
| 3.13 | Post-layout simulations of the second version of the DFE front-end. |
| 3.14 | Using MSL to realize the 100 pH required inductor. |
| 3.15 | Chip photo showing the integration of the front-end into the DFE. TM2 appears in bright gold color against the dark background representing M3. |
| 3.16 | Measurement and simulation results of the second version of the DFE front-end. |
| 3.17 | Block diagram of the static frequency divider. |
| 3.18 | Schematic of one latch from the 86 GHz static frequency divider. |
| 3.19 | Chip photo of the 86 GHz static frequency divider. |
| 3.20 | Sensitivity curve of the 86 GHz divider. |
| 3.21 | Schematic of one latch from the 100 GHz static frequency divider. |
| 3.22 | Schematic used to optimize the sizes of the two emitter followers. |
| 3.23 | The effect of using different sizes of emitter followers. |
| 3.24 | Input impedance of emitter followers. |
| 3.25 | Output impedance of emitter followers. |
| 3.26 | Input reflection coefficient for the 100 GHz divider. |
| 3.27 | Chip photo of the 100 GHz divider. |
| 3.28 | Sensitivity of the 100 GHz divider. |
| 3.29 | Odd-mode characteristic impedances and the lengths of the coupled MSTLs in the clock tree. |
| 3.30 | Reflection coefficient at the input of the clock tree. |
| 3.31 | Two configurations for active balun implementation. |
| 3.32 | Small-signal model of the differential pair based balun. |
| 3.33 | Deviation from ideal balun behavior for the two configurations in Fig. 3.31b and 3.31a. |
| 3.34 | The block diagram of the active balun. |
| 3.35 | First stage of the active balun. |
| 3.36 | Chip photo of the active balun. |
| 3.37 | Measured and simulated (with parasitics) forward transmission coefficients ($S_{21}$ and $S_{31}$). |
| 3.38 | Measured phase and amplitude errors. |
| 3.39 | Measured input and output reflection coefficients as well as reverse transmission coefficients. |
| 3.40 | Time domain measurement with a -11 dBm input signal at 45 GHz (X-axis scale is 5 ps/div and Y-axis scale is 40 mV/div). |
| 4.1  | The modified half-rate parallel look-ahead DFE. |
| 4.2  | Floor planning of the modified half-rate parallel look-ahead DFE. |
| 4.3  | Chip photo of the DFE. |
| 4.4  | Simplified layout of a part of the equalizer core in Fig. 4.1 containing FF_1-4, LT_1-4, and MUX_1-4. |
| 4.5  | Simplified layout of the feedback at the end of the odd channel. |
| 4.6  | Measuring the maximum operating bit rate of the combinational logic. |
| 4.7  | (a) Measurement setup for the maximum operating bit rate of the combinational logic (b) Bit error rate. |
| 4.8  | The input bit sequence at 40 Gb/s used in the post-layout simulation of the equalizer core. |
| 4.9  | Post-layout simulation of the differential signal A(2m): (a) 40 Gb/s, (b) 45 Gb/s, (c) 50 Gb/s. |
| 4.10 | Post-layout simulation of the differential output of MUX_1: (a) 40 Gb/s, (b) 45 Gb/s, (c) 50 Gb/s. |
| 4.11 | Post-layout simulation of the differential signal at the output of FF_5: (a) 40 Gb/s, (b) 45 Gb/s, (c) 50 Gb/s. |
| 4.12 | Measuring the maximum operating frequency of the feedback loop. |
| 4.13 | Possible logic values for *A* and *B* from the front-end comparators for different analog inputs. |
| 4.14 | Measurement setup for testing the DFE at 80 Gb/s. |
| 4.15 | (a) The 80 Gb/s single-ended eye-diagram at the input of the DFE without distortion (30 mV/div and 5 ps/div) (b) The corresponding BER versus $V_{ref}$. |
| 4.16 | The measured $S_{21}$ of the 20 GHz low-pass filter. |
| 4.17 | (a) The distorted single-ended eye-diagram at the output of the 20 GHz low-pass filter (50 mV/div and 5 ps/div) (b) The corresponding BER contour map. |
| 4.18 | (a) The BER versus $V_{ref}$ at the optimum sampling time in the BER contour map (b) The single-ended 40 Gb/s output eye-diagram from the DFE (35 mV/div and 10 ps/div). |
| 4.19 | The measured $S_{21}$ of an SMA elbow. |
| 4.20 | (a) The distorted single-ended eye-diagram at the output of the SMA elbow (50 mV/div and 5 ps/div) (b) The corresponding BER contour map. |
| 4.21 | The architecture of the 110 Gb/s DFE. |
| 4.22 | Post-layout simulation of the improved DFE architecture at 50 Gb/s: (a) the differential signal $A(2m)$, (b) the differential signal at the output of FF_7. |
| 4.23 | Chip photo of the 110 Gb/s DFE. |
| 4.24 | Infrared picture of the 110 Gb/s DFE measured on-wafer (rotated by 90° with respect to Fig. 4.23). |
| 4.25 | Infrared picture of the 110 Gb/s DFE measured on-board. |
| 4.26 | Power dissipation breakdown for the (a) 80 Gb/s and (b) 110 Gb/s DFE. |
| A.1  | Metallization stack of IHP SG13S technology. |
| B.1  | The error networks and the DUT. |
| B.2  | The embedded DUT in the fixture and the corresponding model. |
| B.3  | The “open” structure and its corresponding model. |
| B.4  | The “short” structure and its corresponding model. |
| C.1  | The two port network. |



# List of Tables

| Table | Caption |
|-------|---------|
| 3.1 | State of the art in active baluns with bandwidth in excess of 2 GHz. |
| 4.1 | State of the art in DFEs for 40 Gb/s and above. Unless stated otherwise, the DFE uses 1 tap. |
| A.1 | Summary of IHP SG13S technology parameters. |



# List of Abbreviations

| Abbreviation | Meaning                                 |
|--------|-----------------------------------------------|
| ADC    | Analog to digital converter                   |
| Balun  | Balanced-to-unbalanced converter              |
| BER    | Bit error rate                                |
| BiCMOS | Bipolar CMOS                                  |
| BW     | 3 dB bandwidth                                |
| CB     | Common base                                   |
| CD     | Chromatic dispersion                          |
| CE     | Common emitter                                |
| CLINP  | Lossy coupled transmission line model in ADS® |
| Clk    | Clock                                         |
| CMOS   | Complementary metal-oxide-semiconductor       |
| CW     | Continuous wave                               |
| DFE    | Decision feedback equalizer                   |
| DSE    | Data signal emulator                          |
| DSP    | Digital signal processing                     |
| ECL    | Emitter coupled logic                         |
| EF     | Emitter follower                              |
| EM     | Electromagnetic                               |
| FF     | D-flip-flop                                   |
| FIR    | Finite impulse response                       |
| GaAs   | Gallium-arsenide                              |
| HBT    | Heterojunction bipolar transistor             |
| IIR     | Infinite impulse response                       |
| ISI     | Intersymbol interference                        |
| IP      | Internet protocol                               |
| LAN     | Local area network                              |
| LED     | Light-emitting diode                            |
| LT      | Latch                                           |
| MIM     | Metal-insulator-metal                           |
| MMF     | Multimode fiber                                 |
| MOS     | Metal-oxide-semiconductor                       |
| MSTL    | Microstrip transmission line                    |
| MUX     | Digital multiplexer                             |
| OOK     | On-off keying                                   |
| PCB     | Printed circuit board                           |
| PGSGSGP | Probe configuration (P: power, G: ground, S: signal) |
| PHEMT   | Pseudomorphic high electron mobility transistor |
| PMD     | Polarization mode dispersion                    |
| PRBS    | Pseudorandom binary sequence                    |
| QAM     | Quadrature-amplitude modulation                 |
| QPSK    | Quadrature-phase shift keying                   |
| SCL     | Source coupled logic                            |
| SiGe    | Silicon-germanium                               |
| SMF     | Single-mode fiber                               |
| TVoIP   | Television over IP                              |
| UI      | Unit interval (bit period)                      |
| VCSEL   | Vertical-cavity surface-emitting laser          |
| VNA     | Vector network analyzer                         |
| VoIP    | Voice over IP                                   |
| WAN     | Wide area network                               |

# Chapter 1

## Introduction

Recent years have seen an unprecedented increase in the number of Internet users across the globe. As the statistics in Fig. 1.1 show, the number of Internet users per 100 inhabitants of the world grew from 1 user in 1995 to more than 40 users in 2014<sup>1</sup>. This increased number of users, together with Internet services such as voice and television over IP (VoIP and TVoIP), video-on-demand streaming services and cloud computing, has resulted in an exponential growth in data traffic, which in turn is fueling an unquenchable demand for high data transmission speeds.



Figure 1.1: Number of Internet users per 100 inhabitants of the world.

<sup>1</sup>Source: The World Bank Group and the International Telecommunication Union (ITU).

Over the years, as shown in Fig. 1.2, the IEEE 802.3 Ethernet standard has been continuously updated to adapt to the ever-increasing demand for higher bit rates. Since its emergence in the early 1980s and its standardization by IEEE in 1983, Ethernet has presented an attractive solution for local area networks (LANs) because of its simplicity and low implementation cost in comparison to other LAN technologies such as token ring and fiber distributed data interface (FDDI). Ethernet has become the dominant technology for LANs in homes and workplaces and is also becoming popular for wide area networks (WANs) connecting cities, countries and even continents [1].



Figure 1.2: Evolution of the bit rate in the IEEE 802.3 Ethernet standard.

Optical fibers are utilized for communication at high bit rates because of their large bandwidth, low loss, light weight and immunity to electromagnetic interference compared to copper cables. Two kinds of optical fibers are employed in communications: multi-mode fibers (MMFs) and single-mode fibers (SMFs). The physical difference between them is the diameter of the core, which has a higher refractive index than the cladding, as shown in Fig. 1.3. While MMFs have a typical core diameter of 50 or 62.5  $\mu\text{m}$ , SMFs have a core diameter of 8-10  $\mu\text{m}$ . Both types of fibers have a typical cladding diameter of 125  $\mu\text{m}$ .

The essential difference between the two types of fiber, however, is the number of modes they support for light propagation. Because of their relatively large core diameter, MMFs support multiple modes of light propagation [2]. In contrast, SMFs support only one mode of light propagation, as illustrated in Fig. 1.3.

Figure 1.3: Single-mode fiber versus multi-mode fiber.

## 1.1 Transmission Impairments in Optical Fibers

Unfortunately, the information transmission capacity of optical fibers is limited by different kinds of dispersion. MMFs suffer from modal dispersion, which occurs because the different modes of light propagation travel with different velocities inside the core. The effect of modal dispersion is that short pulses broaden in time as they travel along the fiber. This effect leads to intersymbol interference (ISI) and limits the bandwidth of the fiber. In contrast, SMFs do not suffer from modal dispersion and consequently have a significantly higher bandwidth than MMFs; alternatively, for the same bit rate, SMF links can be substantially longer than MMF links. They suffer, however, from two other types of dispersion known as chromatic dispersion (CD) and polarization mode dispersion (PMD). Chromatic dispersion occurs because the refractive index of the core material is not constant for all wavelengths. Since laser light does not have a single wavelength but a finite spectral width, CD also leads to pulse broadening and consequently ISI. PMD is a special case of modal dispersion in which the two orthogonal polarizations of the light travel with different velocities in the fiber core, which also leads to ISI. Many factors contribute to PMD, among them deviations of the fiber cross section from a circular geometry and varying mechanical and thermal stresses along the fiber [3]. In addition to the different kinds of dispersion, the bandwidth limitations of the transmitter and receiver themselves also result in ISI.



Figure 1.4: Short-reach optical fiber communication links connecting data centers.

Despite their limited bandwidth compared to SMFs, MMFs are extensively employed in short-reach communication systems. Examples, as illustrated in Fig. 1.4, are the interconnection of data center infrastructures or collocated communication equipment racks. Another example is the interconnection of 'data farms', which comprise mainframe computers. The typical range of such short-reach optical fiber links is 100-300 m. MMFs offer a good solution for these applications, because their large core diameter, compared to that of SMFs, allows the coupling of light from low-cost light sources, such as vertical-cavity surface-emitting lasers (VCSELs) and light-emitting diodes (LEDs), with less stringent coupling and alignment tolerances. The large core diameter also facilitates the use of cost-effective, tolerance-relaxed connectors [4] [5].

For most of the above short-reach optical fiber links, intensity modulation in the form of on-off keying (OOK) and direct detection are used as the modulation scheme and detection method [6], respectively. This is largely because of their cost-effectiveness and simplicity [7] [8] compared to more complex, but more spectrally efficient, modulation techniques like quadrature-phase shift keying (QPSK) and quadrature-amplitude modulation (QAM), which require coherent detection [9] and are employed in long-haul optical fiber links with SMFs.

In March 2014, the Ethernet study group [10] finished its study of the next Ethernet standard supporting a bit rate of 400 Gb/s and handed over the results to the Ethernet task force [11]. The task force is currently working on this new standard and intends to publish it in March 2017 [11]. The new Ethernet standard will support at least 100 m MMF links. Four 100 Gb/s channels are expected to run in parallel to achieve the 400 Gb/s bit rate. Here, ISI resulting from modal dispersion in MMFs makes it challenging to reach this high bit rate per channel.

## 1.2 Feedforward versus Feedback Equalization

Solutions to mitigate ISI should be [12]

1. cost-effective.
2. adaptive, since the environment conditions such as temperature and mechanical stresses affect dispersion. Adaptivity is also necessary because of the different fiber length for different links.
3. compact in size and integrable with the already existing and running optical fiber links.

Although both optical and electronic solutions are available to mitigate ISI, electronic solutions surpass their optical counterparts in the three aforementioned points. Electronic solutions include analog and DSP-based equalizers. In DSP-based equalizers, the received optical signal is converted to an electrical signal and then sampled and quantized using analog to digital converters (ADCs). The equalizers are then implemented at the DSP level. Although this method is very effective in mitigating ISI [6] [13], because different kinds of equalizers and adaptation techniques are easy to implement at the DSP level, it requires very high-speed ADCs [14] [15] and is predominantly used in long-haul optical fiber links. Short-reach applications, however, mostly employ less expensive and less complex electronic solutions such as analog equalizers.

Analog equalizers fall into two categories: linear and non-linear equalizers. Linear equalizers include finite impulse response (FIR) filters, which are also called feedforward equalizers (FFEs), and infinite impulse response (IIR) filters [16] [17]. In contrast to IIR filters, which may become unstable and suffer from non-linear phase response characteristics [18], FIR filters are rather versatile [17] and can have a linear phase response, which ensures pulse fidelity in the time domain. Given enough taps, it is possible to design an FIR filter with an arbitrary magnitude or phase response. This is the reason behind the wide usage of FIR filters in communication systems. Analog linear filters, however, suffer from some disadvantages, such as the amplification of noise and the inability to compensate for deep nulls (discontinuities) in the channel frequency response [16] [19] [20] [21]. The decision feedback equalizer (DFE), whose details are discussed in Sec. 2.2, is a non-linear filter capable of compensating deep nulls in the channel frequency response without amplifying the noise. However, DFEs are more challenging to implement at high bit rates because of the timing condition of their feedback loop. In general, optical fiber communication systems employ a several-tap FFE followed by at least a one-tap DFE [16].

## 1.3 Chip-to-Chip and Board-to-Board Applications

Equalizers are employed not only in optical fiber communication systems but also in chip-to-chip and board-to-board communication. Here, equalization is also necessary because of the losses in the copper traces on printed circuit boards (PCBs) due to skin effect and dielectric loss [22] [23]. Moreover, reflections may occur because of impedance discontinuities along the transmission lines on the PCB or at their interfaces to the different connector types. These reflections may appear as discontinuities in the frequency response of the channel, which DFEs are also very effective at equalizing. Analog equalizers are generally used in such applications together with simple modulation formats such as on-off keying.

## 1.4 Motivation

The motivation behind this work is the design and characterization of a DFE for 100 Gb/s bit rate. Following a review of the different architectures for DFE implementation in the literature to overcome the challenging timing condition of the feedback loop, a modification on an already-existing architecture is presented to further improve its timing behavior. Next, the work describes techniques for the design and measurement of the individual high-speed building blocks, which build up the DFE. Based on the described techniques, 80 and 110 Gb/s DFEs are designed. Furthermore, new and innovative measurement techniques are devised to prove the ability of the DFEs to work at such high bit rates. Finally, the 80 Gb/s DFE is practically utilized in an experiment to mitigate ISI in a bandwidth-limited channel.

## 1.5 Organization of the Thesis

This thesis is organized as follows:

### Chapter 2

First presents the timing metrics for sequential circuits. The principle of decision feedback equalization is then discussed. Following that, the timing metrics are used to compare different implementations of DFEs, with the advantages and disadvantages of each outlined. An architecture is then selected for the implementation of the DFEs in this work.

### Chapter 3

Describes the design and characterization of the active and passive components necessary for building up the DFEs, such as the high-bandwidth front-end, D-flip-flops and clock distribution network. Different techniques for bandwidth enhancement are also explained in this chapter. These techniques are employed, for example, in designing static frequency dividers up to 100 GHz to evaluate the performance of the D-flip-flops.

### Chapter 4

Explains the integration of the different active and passive components from the previous chapter into a first version of the DFE operating up to 80 Gb/s. New measurement techniques are then presented to test the different bottlenecks of the design. In an experiment performed at Bell Labs, Alcatel-Lucent, Holmdel, New Jersey, USA, the ability of the DFE to mitigate ISI caused by bandwidth limitation is demonstrated at a bit rate of 80 Gb/s. The chapter concludes by introducing further modifications and enhancements to the 80 Gb/s DFE. These modifications are applied in the design and measurement of a 110 Gb/s DFE.

### Chapter 5

Gives a summary of the results of this work together with suggestions for further improvements and elaborations.

# Chapter 2

## Decision Feedback Equalizer Architectures

## 2.1 Timing Metrics for Sequential Circuits

Before delving into the details of DFE architectures, it is important to discuss the timing metrics of sequential circuits, which are used later in this chapter to compare the timing conditions of the different DFE architectures. The following two sections discuss the timing metrics of the two most commonly used sequencing elements, namely latches and edge-triggered D-flip-flops.

### 2.1.1 Latches

The latch, shown in Fig. 2.1, is level sensitive and becomes transparent (i.e.,  $Q=D$ ) when its enable signal  $E$  (here, the clock) is high.



Figure 2.1: A latch.



Figure 2.2: Illustration of the timing metrics of a latch.

Fig. 2.2 illustrates the timing metrics of a latch. These timing metrics are defined in the following points:

1. When Clk becomes high, the input data D is transmitted to the output Q after a worst-case propagation delay of  $t_{cq}$ , known as clock-to-output delay.
2. If the latch is already transparent (i.e.,  $\text{Clk}='1'$ ) and the input data D changes, then this change appears at the output Q after a worst-case propagation delay of  $t_{dq}$ , known as data-to-output delay.
3. For the data D to be correctly stored at or transmitted to the output Q when Clk is turning from high to low, the data must remain unchanged for a period of  $t_{stp}$ , known as the setup time, before Clk changes from high to low. Moreover, the data must remain unchanged at the input D, even after Clk changes from high to low, for a period of  $t_{hld}$ , known as the hold time.
4.  $t_{cd,cq}$  is the minimum clock-to-output delay (or contamination delay). Similarly,  $t_{cd,dq}$  is the minimum data-to-output delay (or contamination delay).

### 2.1.2 Edge-triggered D-flip-flops

An edge-triggered D-flip-flop consists of two latches in cascade working on opposite clock levels, as shown in Fig. 2.3. This structure is also known as a master-slave D-flip-flop, where LT<sub>1</sub> is called the master latch and LT<sub>2</sub> the slave latch. In contrast to a latch, which is level sensitive, a D-flip-flop is sensitive to the clock edge.



Figure 2.3: A positive edge-triggered D-flip-flop.

There are four important timing parameters associated with edge-triggered D-flip-flops [24]:

1. The setup time  $t_{stp}$  is the time that the data input at D must remain stable before the clock transition at which the D-flip-flop is triggered. This time is basically the setup time of the master latch LT<sub>1</sub> in Fig. 2.3.
2. The hold time  $t_{hld}$  is the time that the data input must remain stable after the clock edge, which is basically the hold time of the master latch LT<sub>1</sub> in Fig. 2.3.
3. Assuming that the setup and hold times are met, the input data at D is transmitted to the output Q after a worst-case propagation delay (with reference to the clock edge) of  $t_{cq}$ , which is called the clock-to-output delay and is essentially the clock-to-output delay of the slave latch LT<sub>2</sub> in Fig. 2.3.
4.  $t_{cd,cq}$  is the minimum clock-to-output delay (or contamination delay).

The timing metrics of D-flip-flops can be illustrated with the help of the simple sequential circuit in Fig. 2.4, consisting of two positive edge-triggered D-flip-flops FF<sub>1,2</sub> and combinational logic in between.



Figure 2.4: An example of a sequential circuit to illustrate the timing metrics of D-flip-flops.



Figure 2.5: Definition of the timing metrics for sequential circuits.

Assuming that:

1. FF<sub>1,2</sub> receive the clock edge simultaneously.
2. The combinational logic has a maximum propagation delay of  $t_{p,logic}$  and a minimum (or contamination) delay of  $t_{cd,logic}$ .
3. FF<sub>1,2</sub> are identical, i.e., they have the same  $t_{stp}$ ,  $t_{hld}$ ,  $t_{cq}$  and  $t_{cd,cq}$ .
4. The clock period is  $T_{clk}$ .

Then Fig. 2.5 shows a simple timing diagram of the data transmission into FF<sub>1</sub> and then from FF<sub>1</sub> to FF<sub>2</sub>. For proper transmission of the data from the input of FF<sub>1</sub> to its output, the setup and hold time conditions of FF<sub>1</sub> have to be met.

For proper transmission of the data from FF<sub>1</sub> to FF<sub>2</sub>, the following condition has to be met

$$t_{cq,FF\_1} + t_{p,logic} + t_{stp,FF\_2} < T_{clk} \quad (2.1)$$

The timing condition of the critical path in Inequality (Ineq.) 2.1 defines the maximum frequency at which the circuit in Fig. 2.4 can work. It emphasizes that the clock period must be long enough for the data to propagate through FF<sub>1</sub>, the combinational logic and be set-up at FF<sub>2</sub>.

Furthermore, for proper operation of the circuit, another condition concerning the hold time of the destination D-flip-flop FF<sub>2</sub> has to be met. The hold time of FF<sub>2</sub> must be shorter than the minimum propagation delay through the source D-flip-flop FF<sub>1</sub> and the combinational logic.

$$t_{hld,FF\_2} < t_{cd,cq,FF\_1} + t_{cd,logic} \quad (2.2)$$

The above conditions do not take into account the spatial variation in the arrival time of the clock edge at the two D-flip-flops, which is known as clock skew. Assume that FF<sub>1</sub> and FF<sub>2</sub> receive the clock edge at  $t_1$  and  $t_2$ , respectively, and that FF<sub>1</sub> receives it before FF<sub>2</sub>, that is,  $t_2 > t_1$ . To account for the clock skew, Ineq. 2.1 then has to be modified as follows

$$\begin{aligned} t_{cq,FF\_1} + t_{p,logic} + t_{stp,FF\_2} &< T_{clk} + \delta \\ t_{cq,FF\_1} + t_{p,logic} + t_{stp,FF\_2} - \delta &< T_{clk} \end{aligned} \quad (2.3)$$

where  $\delta = t_2 - t_1$  is the clock skew. If  $\delta > 0$  and the direction of clock propagation is the same as that of the data, it is called positive clock skew. As is evident from comparing Ineq. 2.3 with Ineq. 2.1, positive clock skew relaxes the timing constraint on the critical path, thereby increasing the maximum clock frequency at which the circuit can work. The hold condition, however, becomes more stringent in the case of positive clock skew. In comparison to Ineq. 2.2, the circuit must satisfy the following timing constraint for proper operation with clock skew

$$t_{hld,FF\_2} + \delta < t_{cd,cq,FF\_1} + t_{cd,logic} \quad (2.4)$$
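The trade-off between Ineq. 2.3 and Ineq. 2.4 can be illustrated with a short numerical sketch. The Python code below is only an illustration: the class and function names are hypothetical, and the delay values (in picoseconds) are assumed for the example and are not taken from the circuits in this work.

```python
from dataclasses import dataclass


@dataclass
class FlipFlop:
    """Timing parameters of one D-flip-flop (all values in picoseconds)."""
    t_stp: float    # setup time
    t_hld: float    # hold time
    t_cq: float     # worst-case clock-to-output delay
    t_cd_cq: float  # minimum (contamination) clock-to-output delay


def min_clock_period(ff1, ff2, t_p_logic, skew=0.0):
    """Lower bound on T_clk from Ineq. 2.3 (Ineq. 2.1 when skew = 0)."""
    return ff1.t_cq + t_p_logic + ff2.t_stp - skew


def hold_ok(ff1, ff2, t_cd_logic, skew=0.0):
    """True if Ineq. 2.4 (Ineq. 2.2 when skew = 0) is satisfied."""
    return ff2.t_hld + skew < ff1.t_cd_cq + t_cd_logic


# Assumed example values, chosen only for illustration.
ff = FlipFlop(t_stp=2.0, t_hld=1.0, t_cq=5.0, t_cd_cq=3.0)
for skew in (0.0, 1.0, 4.0):
    t_min = min_clock_period(ff, ff, t_p_logic=3.0, skew=skew)
    print(f"skew = {skew} ps: T_clk > {t_min} ps, "
          f"hold met: {hold_ok(ff, ff, t_cd_logic=1.5, skew=skew)}")
```

With these assumed numbers, increasing the positive skew shortens the minimum clock period, but a skew of 4 ps already violates the hold condition.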

## 2.2 DFE Architectures

### 2.2.1 Conventional Decision Feedback Loops (DFLs)

As shown in Fig. 2.6, in a conventional decision feedback loop (DFL), also called a direct full-rate DFE or non-speculative DFE, the preceding bit  $a(n-1)$  is fed back with a certain weight and the induced ISI is subtracted from the succeeding bit  $a(n)$  before a decision is made on its amplitude. The decision element is the D-flip-flop. The weighting and subtraction functions are implemented by the variable gain amplifier (VGA) and the summation amplifier, respectively.

Figure 2.6: Conventional DFL.
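The operation of the conventional DFL can be sketched behaviorally, ignoring all timing and analog effects. The following Python model is a simplified illustration under stated assumptions: symbols are mapped to -1/+1, the channel adds a single post-cursor of 60 % of the previous symbol, and the function name `dfl_decide` is hypothetical.

```python
def dfl_decide(samples, tap_weight, threshold=0.0):
    """Behavioral 1-tap decision feedback loop.

    samples    : received amplitudes, one per bit (symbols nominally -1/+1)
    tap_weight : weight applied to the previous decision (the VGA gain)
    Returns the list of decided bits (0 or 1).
    """
    decisions = []
    previous = -1.0  # assume the loop starts from a decided '0'
    for r in samples:
        corrected = r - tap_weight * previous    # summation amplifier
        bit = 1 if corrected > threshold else 0  # D-flip-flop decision
        decisions.append(bit)
        previous = 1.0 if bit else -1.0          # fed back for the next bit
    return decisions


# Assumed channel: each symbol leaks 60 % of its amplitude into the next UI.
bits = [0, 1, 0, 0, 1, 1, 0, 1]
symbols = [2 * b - 1 for b in bits]
received = [s + 0.6 * p for s, p in zip(symbols, [-1] + symbols[:-1])]
print(dfl_decide(received, tap_weight=0.6))
```

When the tap weight matches the post-cursor amplitude and the previous decision is correct, the ISI term is removed before the decision is taken.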

Differential signaling is preferred over single-ended [25] because it reduces crosstalk. Therefore, in Fig. 2.6, as well as all the upcoming figures showing the different architectures, the data, clock and the reference voltage are differential signals, but are drawn as single-ended signals for simplicity.

Although the conventional DFL is the simplest DFE architecture and dissipates the least power of all DFE architectures, its speed is limited by the finite processing time of the feedback loop, which must not exceed one unit interval (UI). One unit interval is defined as the duration of one bit.

Neglecting analog effects such as ringing and settling time, the following timing condition should be met

$$t_{cq} + t_{VGA} + t_{SA} + t_{stp} < \text{UI} \quad (2.5)$$

where  $t_{VGA}$  and  $t_{SA}$  are the processing delay time arising from the VGA and the summation amplifier, respectively. The delay time arising from the feedback interconnect is neglected, since it is usually very small compared to the other delay times, especially if the layout is done carefully.

In addition to the timing condition of the critical path in Ineq. 2.5, another timing condition must be satisfied concerning the hold time of the D-flip-flop  $t_{hld}$  for proper operation

$$t_{hld} < t_{cd,cq} + t_{cd,VGA} + t_{cd,SA} \quad (2.6)$$

where  $t_{cd,VGA}$ ,  $t_{cd,SA}$ ,  $t_{cd,cq}$  are the minimum delay time (or contamination delay) [24] of the VGA, summation amplifier and D-flip-flop, respectively.

### 2.2.2 Look-ahead DFEs

To overcome the speed limit of DFLs and dynamic problems such as ringing, which arise from the fact that a new decision threshold has to be generated for each incoming bit, a look-ahead architecture (also called unrolled or speculative DFE) has been proposed in [21] [26] [27] and is depicted in Fig. 2.7. The look-ahead architecture uses two DC references (thresholds),  $V_{ref}$  and  $-V_{ref}$ , corresponding to the feedback signals for a preceding '1' and '0', respectively. Consequently,  $A(n)$  and  $B(n)$  are the two decisions taken simultaneously on the incoming bit  $a(n)$  for the two alternative cases that the preceding bit  $a(n - 1)$  was '1' or '0', respectively.

Figure 2.7: Look-ahead DFE.

The feedback is then implemented by a digital 2:1 multiplexer (later referred to as MUX), which selects the correct decision on  $a(n)$ , depending on the preceding bit  $a(n - 1)$ . This process can be represented by the following recursive relation:

$$a(n) = A(n)a(n - 1) + B(n)\bar{a}(n - 1) \quad (2.7)$$
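The recursion in Eqn. 2.7 can be written out as a short behavioral sketch. The Python code below is an illustration only: it assumes symbols mapped to -1/+1, a single post-cursor equal to the reference voltage, and an initial preceding bit of '0'; the function name `look_ahead_decide` is hypothetical.

```python
def look_ahead_decide(samples, v_ref, first_prev=0):
    """Behavioral 1-tap look-ahead DFE following Eqn. 2.7."""
    out = []
    prev = first_prev
    for r in samples:
        a_n = r > v_ref   # A(n): decision assuming a(n-1) = '1'
        b_n = r > -v_ref  # B(n): decision assuming a(n-1) = '0'
        # 2:1 MUX: a(n) = A(n) a(n-1) + B(n) not(a(n-1))
        bit = int((a_n and prev == 1) or (b_n and prev == 0))
        out.append(bit)
        prev = bit
    return out


# Assumed channel: post-cursor of 0.6, so V_ref = 0.6 in this normalization.
bits = [0, 1, 0, 0, 1, 1, 0, 1]
symbols = [2 * b - 1 for b in bits]
received = [s + 0.6 * p for s, p in zip(symbols, [-1] + symbols[:-1])]
print(look_ahead_decide(received, v_ref=0.6))
```

Both speculative decisions are available before the preceding bit is known, so only the MUX selection remains inside the feedback loop.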

In the analysis of the critical path in Fig. 2.7, as well as in all the upcoming figures showing the different architectures, a positive clock skew is assumed. This clock skew is designated at the bottom of the figures of the architectures by  $\delta$ .

Concerning the setup time of FF<sub>3</sub>, two timing conditions have to be met. They correspond to the delay from the selection line S to the MUX output,  $t_{sq}$ , and the delay from the input lines  $I_{1,0}$  to the MUX output,  $t_{iq}$ . These timing conditions are

$$t_{cq,FF\_1,2} + t_{iq} + t_{stp,FF\_3} - \delta < UI \quad (2.8)$$

$$t_{cq,FF\_3} + t_{sq} + t_{stp,FF\_3} < UI \quad (2.9)$$

If all the D-flip-flops are identical, the timing condition in Ineq. 2.9 is more stringent than the one in Ineq. 2.8, since usually  $t_{sq} > t_{iq}$  and there is no clock skew in Ineq. 2.9. For example, in high-speed logic families like emitter or source coupled logic (ECL or SCL), a MUX is implemented by stacking the transistors. The selection line is associated with transistors in the lower stack, whereas the input lines  $I_{1,0}$  are associated with transistors in the higher stack. Therefore, the selection line has a higher delay to the output than the input lines [28].

The timing condition of the critical path within the feedback loop in Ineq. 2.9 is relaxed in comparison to that of the DFL in Ineq. 2.5, provided that  $t_{VGA} + t_{SA} > t_{sq}$ . Therefore, higher bit rates can be supported with look-ahead DFEs compared to conventional DFLs. One remaining drawback of this look-ahead architecture, however, is that the components must work at the full bit rate, since the clock frequency in this case is equal to the bit rate.

The critical path condition in Ineq. 2.9 imposes a constraint on the maximum bit rate of the look-ahead architecture. In addition, the hold time of FF<sub>3</sub> imposes extra conditions for proper operation,

$$t_{hld,FF\_3} < t_{cd,cq,FF\_3} + t_{cd,sq,MUX} \quad (2.10)$$

$$t_{hld,FF\_3} + \delta < t_{cd,cq,FF\_1,2} + t_{cd,iq,MUX} \quad (2.11)$$

where  $t_{cd,iq,MUX}$  and  $t_{cd,sq,MUX}$  are the contamination delay of the MUX from its input lines to the output and from its selection line to the output, respectively.
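To give a feeling for how much the look-ahead architecture relaxes the critical path, Ineq. 2.5 and Ineq. 2.9 can be evaluated for a set of delays. The following Python sketch uses assumed delay values in picoseconds that are purely illustrative and do not correspond to measured or simulated values in this work; the function names are hypothetical.

```python
def max_bit_rate_dfl(t_cq, t_vga, t_sa, t_stp):
    """Upper bound on the bit rate in Gb/s from Ineq. 2.5 (delays in ps)."""
    return 1e3 / (t_cq + t_vga + t_sa + t_stp)


def max_bit_rate_look_ahead(t_cq, t_sq, t_stp):
    """Upper bound on the bit rate in Gb/s from Ineq. 2.9 (delays in ps)."""
    return 1e3 / (t_cq + t_sq + t_stp)


# Assumed delays; note that t_vga + t_sa > t_sq, as required for a benefit.
dfl = max_bit_rate_dfl(t_cq=4.0, t_vga=3.0, t_sa=3.0, t_stp=2.0)
la = max_bit_rate_look_ahead(t_cq=4.0, t_sq=2.0, t_stp=2.0)
print(f"conventional DFL: below {dfl:.1f} Gb/s")
print(f"look-ahead DFE:   below {la:.1f} Gb/s")
```

The bound follows from UI = 1/(bit rate): a loop delay of 8 ps, for example, requires a UI longer than 8 ps and therefore a bit rate below 125 Gb/s.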

### 2.2.3 Half-rate Parallel Look-ahead DFEs

The timing condition of the critical path within the feedback loop of the look-ahead architecture in Ineq. 2.9 can be further relaxed using not only the decisions  $A(n)$  and  $B(n)$  on the incoming bit  $a(n)$ , but also the two decisions on its preceding bit  $a(n-1)$ , namely  $A(n-1)$  and  $B(n-1)$  [26]. Two ideas have been used to exploit this principle:

1. Two parallel decision paths for the odd and even bits can be used, as shown in Fig. 2.8 (without the dashed lines), where the clock frequency is equal to half the bit rate.

Figure 2.8: Half-rate parallel look-ahead DFE.

The timing condition within the feedback loop, however, is not relaxed compared to the look-ahead architecture in Fig. 2.7, because two D-flip-flops (FF<sub>5,6</sub>, which work on the rising and falling edges of the clock) and two MUXs (MUX<sub>1,2</sub>) exist within the loop. The following inequality describes this timing condition of the critical path in Fig. 2.8 (without the dashed lines)

$$t_{cq,FF\_5} + t_{sq,MUX\_2} + t_{stp,FF\_6} + t_{cq,FF\_6} + t_{sq,MUX\_1} + t_{stp,FF\_5} < 2UI \quad (2.12)$$

Since all D-flip-flops and MUXs were assumed to be identical, Ineq. 2.12 can be simplified as follows

$$2t_{cq,FF\_5} + 2t_{sq,MUX\_1} + 2t_{stp,FF\_5} < 2UI \Rightarrow t_{cq,FF\_5} + t_{sq,MUX\_1} + t_{stp,FF\_5} < UI \quad (2.13)$$

Although the components in the architecture in Fig. 2.8 work at half the clock frequency compared to the look-ahead architecture in Fig. 2.7, the timing condition of its critical path in Ineq. 2.13 is identical to that of the look-ahead architecture in Ineq. 2.9.

To relax this timing condition, the D-flip-flops can be eliminated from the feedback loop as indicated by the dashed lines in Fig. 2.8 without causing oscillations [22] [29] [2]. However, the input and select lines of the MUXs in Fig. 2.8 (with the dashed lines) are staggered in time, which causes unwanted glitches at the output of the MUX with certain input patterns and may influence the data to be decided.

To understand this problem, a sequence 001100 is assumed to have been transmitted through a PMD-limited single-mode fiber channel.

As explained in Sec. 1.1, the receiver in this case sees two time-shifted copies of the transmitted sequence, 001100, superimposed on top of each other, which is a form of ISI. In this example, the power is assumed to have been split equally between the two modes at the input of the fiber, and the group delay between the two orthogonal modes (known as the differential group delay) is assumed to equal one UI [20]. Since the ISI here is caused by one delayed bit, a 1-tap DFE is expected to fully recover the distorted signal. Assuming the initial state of all D-flip-flops is '0', Fig. 2.9 shows a simplified timing diagram of the architecture in Fig. 2.8 with the dashed lines when receiving the sequence 001100. Note that the photodiode used in the detection is assumed to have a linear responsivity, which means that the photocurrent produced in the photodiode is linearly proportional to the optical power incident on it. Furthermore, this photocurrent is assumed to be converted to a voltage by a linear transimpedance amplifier.

Figure 2.9: Simplified timing diagram of the architecture shown in Fig. 2.8 with the dashed lines, when all conditions in Ineq. 2.14-2.16 are satisfied.
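The 001100 example can be reproduced with a purely behavioral sketch that ignores all delays. The Python code below is an illustration under stated assumptions: bits are mapped to -1/+1 so that the thresholds are symmetric, the bit preceding the sequence is '0', and the reference is set to 0.5, which follows from the equal power split in this normalization. The function names are hypothetical.

```python
def pmd_channel(bits):
    """Two equal-power copies of the signal, one delayed by one UI."""
    symbols = [2 * b - 1 for b in bits]
    delayed = [-1] + symbols[:-1]  # assume a '0' precedes the sequence
    return [0.5 * s + 0.5 * d for s, d in zip(symbols, delayed)]


def half_rate_look_ahead(samples, v_ref=0.5):
    """Even and odd bits are decided in two interleaved paths.

    Each path takes both speculative decisions, and its MUX is steered
    by the latest output of the other path.
    """
    even_out, odd_out = [], []
    last = 0  # initial state of all D-flip-flops is '0'
    for n, r in enumerate(samples):
        a_n, b_n = r > v_ref, r > -v_ref
        bit = int(a_n if last else b_n)
        (even_out if n % 2 == 0 else odd_out).append(bit)
        last = bit
    return even_out, odd_out


tx = [0, 0, 1, 1, 0, 0]
rx = pmd_channel(tx)
even, odd = half_rate_look_ahead(rx)
recovered = [b for pair in zip(even, odd) for b in pair]
print(rx)
print(recovered)
```

In this normalization the received samples at the rising and falling transitions both equal zero, so a single fixed threshold cannot distinguish them, whereas the decision-dependent thresholds can.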

Fig. 2.9 also shows the condition for avoiding racing, that is, the effect in which, for example, the decision on bit  $n$ , stored at the output of FF<sub>6</sub>, already influences the decision in MUX<sub>1</sub> on its predecessor bit  $n - 1$  instead of its successor bit  $n + 1$ . This condition can be expressed by the inequality

$$t_{cq,FF\_3,4} + t_{iq,MUX\_2} + t_{sq,MUX\_1} > \text{UI} + \delta + t_{hld,FF\_5} \quad (2.14)$$

which is easily met in case of operation at high bit rates, where one UI is relatively small in comparison to the sum of the respective delay times. A similar condition applies for the odd channel.

For high bit rates, where Ineq. 2.14 is satisfied, the following timing conditions have to be met:

- (a) The first condition is for proper transfer of the data from FF<sub>1,2</sub> to FF<sub>5</sub> through MUX<sub>1</sub> and is given by

$$t_{cq,FF\_1,2} + t_{iq,MUX\_1} + t_{stp,FF\_5} - \delta < 2\text{UI} \quad (2.15)$$

- (b) The second condition imposes a limit on  $t_{sq,MUX\_1}$ . Looking at the timing diagram in Fig. 2.9, it can be concluded that a time interval of 3UI should be long enough to accommodate  $t_{cq,FF\_3,4}$ ,  $t_{iq,MUX\_2}$ ,  $t_{sq,MUX\_1}$  and  $t_{stp,FF\_5}$ ; otherwise, errors occur. This condition is described by the inequality

$$t_{cq,FF\_3,4} + t_{iq,MUX\_2} + t_{sq,MUX\_1} + t_{stp,FF\_5} - \delta < 3\text{UI} \quad (2.16)$$

The conditions in Ineq. 2.15 and 2.16 show the beneficial effect of eliminating FF<sub>5,6</sub> from the feedback path in relaxing the timing condition in comparison to Ineq. 2.13.
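The relaxation can be made concrete by evaluating Ineq. 2.13 to 2.16 for one set of delays. The following Python sketch assumes identical D-flip-flops and MUXs; the delay values (in picoseconds) and the function names are hypothetical and serve only as an illustration.

```python
def full_loop_ok(ui, t_cq, t_sq, t_stp):
    """Ineq. 2.13: flip-flops FF5,6 kept inside the feedback loop."""
    return t_cq + t_sq + t_stp < ui


def relaxed_loop_checks(ui, t_cq, t_iq, t_sq, t_stp, t_hld, skew):
    """Ineq. 2.14 to 2.16: flip-flops removed from the feedback loop."""
    return {
        "2.14": t_cq + t_iq + t_sq > ui + skew + t_hld,
        "2.15": t_cq + t_iq + t_stp - skew < 2 * ui,
        "2.16": t_cq + t_iq + t_sq + t_stp - skew < 3 * ui,
    }


ui = 10.0  # ps, corresponding to 100 Gb/s
delays = dict(t_cq=5.0, t_iq=3.0, t_sq=4.0, t_stp=2.0)  # assumed values
print(full_loop_ok(ui, delays["t_cq"], delays["t_sq"], delays["t_stp"]))
print(relaxed_loop_checks(ui, **delays, t_hld=1.0, skew=0.5))
```

For these assumed delays, the loop with the D-flip-flops fails Ineq. 2.13 at a UI of 10 ps, while all three conditions of the modified loop are satisfied.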

2. Here, Boolean algebra is used to relax the timing condition within the feedback loop by a factor of two. Since Eqn. 2.7 is recursive, it can be written for  $a(n-1)$  as

$$a(n-1) = A(n-1)a(n-2) + B(n-1)\bar{a}(n-2) \quad (2.17)$$

Using Eqn. 2.17 to substitute for  $a(n-1)$  in Eqn. 2.7 and reducing the expression with Boolean algebra [26] we obtain

$$\begin{aligned} a(n) &= A(n)(A(n-1)a(n-2) + B(n-1)\bar{a}(n-2)) \\ &\quad + B(n)\overline{(A(n-1)a(n-2) + B(n-1)\bar{a}(n-2))} \end{aligned} \quad (2.18)$$

Using De Morgan's theorem, namely that  $\overline{XY} = \overline{X} + \overline{Y}$  and  $\overline{X+Y} = \overline{X}\overline{Y}$ , we obtain

$$\begin{aligned} a(n) &= A(n)(A(n-1)a(n-2) + B(n-1)\bar{a}(n-2)) \\ &\quad + B(n)\left(\overline{A(n-1)a(n-2)}\overline{B(n-1)\bar{a}(n-2)}\right) \\ &= A(n)(A(n-1)a(n-2) + B(n-1)\bar{a}(n-2)) \\ &\quad + B(n)\left([\overline{A}(n-1) + \bar{a}(n-2)][\overline{B}(n-1) + a(n-2)]\right) \end{aligned} \quad (2.19)$$

Using the distributive law, namely that  $X(Y+Z) = XY + XZ$  and  $X + (YZ) = (X+Y)(X+Z)$ , we obtain

$$\begin{aligned} a(n) &= A(n)A(n-1)a(n-2) + A(n)B(n-1)\bar{a}(n-2) \\ &\quad + B(n)\left(\overline{A}(n-1)\overline{B}(n-1) + \overline{A}(n-1)a(n-2) \right. \\ &\quad \left. + \overline{B}(n-1)\bar{a}(n-2) + \bar{a}(n-2)a(n-2)\right) \\ &= A(n)A(n-1)a(n-2) + A(n)B(n-1)\bar{a}(n-2) \\ &\quad + B(n)\overline{A}(n-1)\overline{B}(n-1) + B(n)\overline{A}(n-1)a(n-2) \\ &\quad + B(n)\overline{B}(n-1)\bar{a}(n-2) + B(n)\bar{a}(n-2)a(n-2) \end{aligned} \quad (2.20)$$

Since $\bar{a}(n-2)a(n-2) = 0$, the above expression reduces to

$$\begin{aligned}
 a(n) &= A(n)A(n-1)a(n-2) + A(n)B(n-1)\bar{a}(n-2) \\
 &\quad + B(n)\bar{A}(n-1)\bar{B}(n-1) + B(n)\bar{A}(n-1)a(n-2) \\
 &\quad + B(n)\bar{B}(n-1)\bar{a}(n-2) \\
 &= a(n-2)\left(A(n)A(n-1) + B(n)\bar{A}(n-1)\right) \\
 &\quad + \bar{a}(n-2)\left(A(n)B(n-1) + B(n)\bar{B}(n-1)\right) \\
 &\quad + B(n)\bar{A}(n-1)\bar{B}(n-1)
 \end{aligned} \tag{2.21}$$

To be able to reduce the above expression, we first need to expand it using the identity  $a(n-2) + \bar{a}(n-2) = 1$ , as follows:

$$\begin{aligned}
 a(n) &= a(n-2)\left(A(n)A(n-1) + B(n)\bar{A}(n-1)\right) \\
 &\quad + \bar{a}(n-2)\left(A(n)B(n-1) + B(n)\bar{B}(n-1)\right) \\
 &\quad + B(n)\bar{A}(n-1)\bar{B}(n-1)\left(a(n-2) + \bar{a}(n-2)\right) \\
 &= a(n-2)\left(A(n)A(n-1) + B(n)\bar{A}(n-1) + B(n)\bar{A}(n-1)\bar{B}(n-1)\right) \\
 &\quad + \bar{a}(n-2)\left(A(n)B(n-1) + B(n)\bar{B}(n-1) + B(n)\bar{A}(n-1)\bar{B}(n-1)\right) \\
 &= a(n-2)\left(A(n)A(n-1) + B(n)\bar{A}(n-1)[1 + \bar{B}(n-1)]\right) \\
 &\quad + \bar{a}(n-2)\left(A(n)B(n-1) + B(n)\bar{B}(n-1)[1 + \bar{A}(n-1)]\right)
 \end{aligned} \tag{2.22}$$

Since $1 + \bar{B}(n-1) = 1$ and $1 + \bar{A}(n-1) = 1$, the above expression reduces to

$$\begin{aligned}
 a(n) &= a(n-2)\left(A(n)A(n-1) + B(n)\bar{A}(n-1)\right) \\
 &\quad + \bar{a}(n-2)\left(A(n)B(n-1) + B(n)\bar{B}(n-1)\right) \\
 &= f_1(n)a(n-2) + f_2(n)\bar{a}(n-2)
 \end{aligned} \tag{2.23}$$

where

$$f_1(n) = A(n)A(n-1) + B(n)\bar{A}(n-1) \quad (2.24a)$$

$$f_2(n) = A(n)B(n-1) + B(n)\bar{B}(n-1) \quad (2.24b)$$

An implementation based on Eqn. 2.23 and 2.24 permits a total propagation and processing delay of 2UI in the feedback loop.  $f_1(n)$  and  $f_2(n)$  can be implemented using MUXs and computed outside of the feedback loop. When writing Eqn. 2.23 and 2.24 for the even bits  $a(2m)$  and the odd bits  $a(2m+1)$ , two channels and two feedback loops can be used for the even and odd bits. For example, writing Eqn. 2.24 for the odd bits  $a(2m+1)$  yields:

$$f_1(2m+1) = A(2m+1)A(2m) + B(2m+1)\bar{A}(2m) \quad (2.25a)$$

$$f_2(2m+1) = A(2m+1)B(2m) + B(2m+1)\bar{B}(2m) \quad (2.25b)$$

In this case, the bit rate in the even and odd channels will be reduced to half the bit rate at the input. A direct implementation of Eqn. 2.23 and 2.24 for the even and odd bits is shown in Fig. 2.10a.
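The reduction from Eqn. 2.18 to Eqn. 2.23 can be cross-checked by exhaustive enumeration. The following minimal sketch assumes that Eqn. 2.7 has the same form as Eqn. 2.17 shifted by one bit; the function names are illustrative and do not appear in the original design.

```python
from itertools import product


def dfe_step(a_coef, b_coef, a_prev):
    """One-step recursion: a(n) = A(n) a(n-1) + B(n) not(a(n-1))."""
    return (a_coef and a_prev) or (b_coef and not a_prev)


def look_ahead(a_n, b_n, a_p, b_p, a_pp):
    """Two-step look-ahead of Eqn. 2.23 with f1, f2 from Eqn. 2.24."""
    f1 = (a_n and a_p) or (b_n and not a_p)   # Eqn. 2.24a
    f2 = (a_n and b_p) or (b_n and not b_p)   # Eqn. 2.24b
    return (f1 and a_pp) or (f2 and not a_pp)


def identity_holds():
    """Compare both forms for all 32 combinations of the five inputs."""
    for a_n, b_n, a_p, b_p, a_pp in product([False, True], repeat=5):
        two_step = dfe_step(a_n, b_n, dfe_step(a_p, b_p, a_pp))
        if two_step != look_ahead(a_n, b_n, a_p, b_p, a_pp):
            return False
    return True


print(identity_holds())
```

Note that $f_1(n)$ and $f_2(n)$ each have the form of a 2:1 multiplexer whose selection input is $A(n-1)$ or $B(n-1)$, which is why they can be computed with MUXs outside of the feedback loop.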

For proper operation of the architecture in Fig. 2.10a certain timing conditions have to be satisfied for both the even and odd channels. Listed here are the conditions for the odd channel. Similar conditions exist for the even channel:

- There is a timing condition for the lowest bit rate of operation, below which the architecture does not work properly. It is given by:

$$t_{cq,FF\_1,2} + t_{sq,MUX\_3,4} + t_{iq,MUX\_6} > \text{UI} + \delta + t_{hld,FF\_6} \quad (2.26)$$

Fig. 2.10b illustrates the case, in which Ineq. 2.26 is not satisfied. Here, 1UI is relatively long compared to the other delays such as $t_{cq}$, $t_{sq}$ and $t_{iq}$. FF\_6 in this case stores incorrect data, because the outputs of MUX\_3,4 no longer represent $f_1(2m+1), f_2(2m+1)$, as the signals on their selection lines have already changed from $A(m), B(m)$ to $A(m+2), B(m+2)$. These periods are indicated in Fig. 2.10b by dotted lines.

Figure 2.10: (a) Direct implementation of Eqn. 2.23 and 2.24 in the half-rate parallel look-ahead DFE (b) Simplified timing diagram of the architecture shown in (a) at low bit rates, where Ineq. 2.26 is not satisfied.

Figure 2.11: Simplified timing diagram of the architecture shown in Fig. 2.10a at high bit rates, where Ineq. 2.26 is satisfied.

- When Ineq. 2.26 is satisfied at high bit rates, where $t_{cq}$, $t_{sq}$ and $t_{iq}$ are relatively long compared to 1UI, the following timing conditions, defining the maximum bit rate of operation, can be extracted from Fig. 2.10a and 2.11:

(a) For proper data transfer from FF\_3,4 to FF\_6 through MUX\_3,4,6:

$$t_{cq,FF\_3,4} + t_{iq,MUX\_3,4} + t_{iq,MUX\_6} + t_{stp,FF\_6} - \delta < 2\text{UI} \quad (2.27)$$

(b) The next condition imposes a limit on $t_{sq,MUX\_3,4}$. It is illustrated in Fig. 2.11:

$$t_{cq,FF\_1,2} + t_{sq,MUX\_3,4} + t_{iq,MUX\_6} + t_{stp,FF\_6} - \delta < 3\text{UI} \quad (2.28)$$

(c) For proper operation of the feedback loop:

$$t_{cq,FF\_6} + t_{sq,MUX\_6} + t_{stp,FF\_6} < 2\text{UI} \quad (2.29)$$

In ECL, the values of $t_{sq,MUX}$ and $t_{cq,FF}$ are very close, since the selection line of a MUX and the clock input of a FF are both associated with transistors in the lower stack [28]. Hence, if Ineq. 2.29 is satisfied, both $t_{sq,MUX}$ and $t_{cq,FF}$ are expected to be less than 1UI. Moreover, if Ineq. 2.27 and $t_{sq,MUX\_3,4} < \text{UI}$ are satisfied, Ineq. 2.28 is automatically satisfied. Ineq. 2.28 is therefore not critical once Ineq. 2.27 and 2.29 are satisfied.

(d) Concerning the hold time of FF\_6, the following condition has to be satisfied:

$$t_{hld,FF\_6} < t_{cd,cq,FF\_6} + t_{cd,sq,MUX\_6} \quad (2.30)$$

where $t_{cd,sq,MUX\_6}$ is the contamination delay from the selection line to the output of MUX\_6, and $t_{cd,cq,FF\_6}$ is the contamination delay of FF\_6.

(e) To ensure that FF\_6 does not store the incorrect data, represented in Fig. 2.11 on the waveforms of $f_1(2m+1), f_2(2m+1)$ by the dotted lines, the following condition has to be satisfied:

$$t_{hld,FF\_6} + \delta < t_{cd,cq,FF\_3,4} + t_{cd,iq,MUX\_3,4} + t_{cd,iq,MUX\_6} \quad (2.31)$$

where $t_{cd,cq,FF\_3,4}$, $t_{cd,iq,MUX\_3,4}$ and $t_{cd,iq,MUX\_6}$ are the contamination delays of FF\_3,4, from the input line to the output of MUX\_3,4 and from the input line to the output of MUX\_6, respectively.

The dotted periods on the waveforms of  $f_1(2m+1), f_2(2m+1)$  in Fig. 2.11 happen essentially due to the fact that the input and selection lines of MUX\_3,4 (and of course also MUX\_1,2) do not change at the same time (i.e., not synchronized) and are separated in time by UI. This is also the reason behind the condition in Ineq. 2.26 for the lower limit on the data rate for proper operation.
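The conditions in Ineq. 2.26-2.31 lend themselves to a simple numerical check. The sketch below evaluates them for the odd channel; all delay values are hypothetical placeholders chosen for illustration only and are not taken from the implemented circuits.

```python
def check_lookahead_timing(ui, t):
    """Evaluate Ineq. 2.26-2.31 for the odd channel (all times in ps)."""
    setup_path = t["iq_MUX6"] + t["stp_FF6"] - t["delta"]
    return {
        "2.26": t["cq_FF12"] + t["sq_MUX34"] + t["iq_MUX6"]
                > ui + t["delta"] + t["hld_FF6"],
        "2.27": t["cq_FF34"] + t["iq_MUX34"] + setup_path < 2 * ui,
        "2.28": t["cq_FF12"] + t["sq_MUX34"] + setup_path < 3 * ui,
        "2.29": t["cq_FF6"] + t["sq_MUX6"] + t["stp_FF6"] < 2 * ui,
        "2.30": t["hld_FF6"] < t["cd_cq_FF6"] + t["cd_sq_MUX6"],
        "2.31": t["hld_FF6"] + t["delta"]
                < t["cd_cq_FF34"] + t["cd_iq_MUX34"] + t["cd_iq_MUX6"],
    }


# Hypothetical delays in ps (assumptions, not measured values).
EXAMPLE_DELAYS_PS = {
    "cq_FF12": 8.0, "cq_FF34": 8.0, "cq_FF6": 8.0,
    "sq_MUX34": 8.0, "sq_MUX6": 8.0,
    "iq_MUX34": 5.0, "iq_MUX6": 5.0,
    "stp_FF6": 3.0, "hld_FF6": 1.0, "delta": 0.0,
    "cd_cq_FF6": 4.0, "cd_sq_MUX6": 4.0,
    "cd_cq_FF34": 4.0, "cd_iq_MUX34": 3.0, "cd_iq_MUX6": 3.0,
}

fast = check_lookahead_timing(12.5, EXAMPLE_DELAYS_PS)   # UI at 80 Gb/s
slow = check_lookahead_timing(100.0, EXAMPLE_DELAYS_PS)  # UI at 10 Gb/s
print(fast)
print(slow)
```

With these assumed delays, all conditions hold at 80 Gb/s, whereas at 10 Gb/s only the minimum bit rate condition in Ineq. 2.26 is violated.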

### 2.2.4 The Modified Half-rate Parallel Look-ahead DFE

In this architecture, shown in Fig. 2.12a, two modifications are proposed [7] [30] to improve the timing behavior of the architecture in Fig. 2.10a.

These two modifications are:

1. The use of retiming latches LT\_1-4 to synchronize the signals at the input and selection lines of MUX\_1-4. The effect of synchronization is illustrated in Fig. 2.12b, where the waveforms of $f_1(2m + 1)$ and $f_2(2m + 1)$ no longer have any changes inside one period of 2UI and consequently no spectral components that disturb the signals at the input of FF\_9,10. This is contrary to the architectures in Fig. 2.8 (with the dashed lines) and Fig. 2.10a. The timing condition for this synchronization effect will be discussed later in this section.
2. Breaking the critical path from the outputs of FF\_1-4 and LT\_1-4 to the inputs of FF\_5,6 into two paths by adding FF\_7-10. This has the effect of relaxing the timing condition for proper data transmission from FF\_1-4 and LT\_1-4 to FF\_5,6, as the combinational logic depth in this path reduces.

Synchronization requires that the outputs of FF\_1,2 arrive at the latches LT\_3,4 *before* the latches become transparent, i.e., *before* the clock edge, which can be expressed by

$$t_{cq,FF\_1,2} < \text{UI} \quad (2.32)$$

In this case, the signals at the input and selection lines of MUX\_1-4 are synchronized, and the timing conditions for proper operation of the odd channel are:

1. Concerning the data transmission from FF\_1,2 to LT\_3,4

$$t_{cq,FF\_1,2} + t_{stp,LT\_3,4} - \delta_1 < 2\text{UI} \quad (2.33)$$



Figure 2.12: (a) The modified half-rate parallel look-ahead DFE (b) Simplified timing diagram of the architecture shown in (a), when  $t_{cq,FF\_1-4} < \text{UI}$

2. Concerning the data transmission from FF\_3,4 and LT\_3,4 to FF\_9,10 through MUX\_3,4

$$t_{cq,FF\_3,4} + t_{iq,MUX\_3,4} + t_{stp,FF\_9,10} - \delta_1 - \delta_2 < 2\text{UI} \quad (2.34)$$

$$t_{cq,LT\_3,4} + t_{sq,MUX\_3,4} + t_{stp,FF\_9,10} - \delta_2 < 2\text{UI} \quad (2.35)$$

3. For proper data transmission from FF\_9,10 to FF\_6 through MUX\_6

$$t_{cq,FF\_9,10} + t_{iq,MUX\_6} + t_{stp,FF\_6} - \delta_3 < 2\text{UI} \quad (2.36)$$

4. For the feedback loop:

$$t_{cq,FF\_6} + t_{sq,MUX\_6} + t_{stp,FF\_6} < 2\text{UI} \quad (2.37)$$

Ineq. 2.37 is the critical timing condition for the maximum bit rate of operation, for the following reasons:

- (a) As mentioned in Sec. 2.2.3, the values of $t_{sq,MUX}$ and $t_{cq,FF}$ are very close in ECL. Hence, if Ineq. 2.37 is satisfied, both $t_{sq,MUX}$ and $t_{cq,FF}$ are expected to be less than 1UI. This means that by satisfying the condition in Ineq. 2.37, the condition of synchronization in Ineq. 2.32 is automatically met.
- (b) The rest of the conditions, Ineq. 2.33-2.36, are less critical compared to Ineq. 2.37, because usually $t_{sq} > t_{iq}$, as mentioned on page 16, and because of the positive clock skew.

The following conditions regarding the hold conditions have to be satisfied as well:

1. Concerning FF\_9,10

$$t_{hld,FF\_9,10} + \delta_1 + \delta_2 < t_{cd,cq,FF\_3,4} + t_{cd,iq,MUX\_3,4} \quad (2.38)$$

$$t_{hld,FF\_9,10} + \delta_2 < t_{cd,cq,LT\_3,4} + t_{cd,sq,MUX\_3,4} \quad (2.39)$$

where $t_{cd,cq,FF}$, $t_{cd,cq,LT}$ and $t_{cd,sq,MUX}$ are the contamination delays of the D-flip-flops, latches and MUXs, respectively.

2. Concerning FF\_6

$$t_{hld,FF\_6} + \delta_3 < t_{cd,cq,FF\_9,10} + t_{cd,iq,MUX\_6} \quad (2.40)$$

$$t_{hld,FF\_6} < t_{cd,cq,FF\_6} + t_{cd,sq,MUX\_6} \quad (2.41)$$

Similar conditions exist for the even channel.
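Since Ineq. 2.37 is the critical condition, it can be turned into an estimate of the maximum bit rate, and Ineq. 2.32 can then be checked at that rate. The following sketch uses hypothetical delay values; the function names are illustrative only.

```python
def max_bit_rate_gbps(t_cq_ff6, t_sq_mux6, t_stp_ff6):
    """Upper bound on the bit rate from Ineq. 2.37 (delays in ps)."""
    loop_delay_ps = t_cq_ff6 + t_sq_mux6 + t_stp_ff6
    min_ui_ps = loop_delay_ps / 2.0   # Ineq. 2.37: loop delay < 2 UI
    return 1000.0 / min_ui_ps         # a UI of 1 ps corresponds to 1000 Gb/s


def synchronized(t_cq_ff12, ui_ps):
    """Synchronization condition of Ineq. 2.32: t_cq,FF_1,2 < UI."""
    return t_cq_ff12 < ui_ps


# Hypothetical ECL delays in ps (assumptions, not measured values).
rate_limit = max_bit_rate_gbps(8.0, 8.0, 3.0)
print(round(rate_limit, 1))
print(synchronized(8.0, 1000.0 / rate_limit))
```

Because the bound is a strict inequality, the actual bit rate has to stay below the returned value. For the assumed delays, $t_{cq}$ is shorter than the resulting minimum UI, which illustrates why Ineq. 2.32 is met automatically in ECL once Ineq. 2.37 holds.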

Although the critical timing condition in Ineq. 2.37 is more stringent than that in Ineq. 2.15 of the architecture in Fig. 2.8 with the dashed lines, since  $t_{iq,MUX} - \delta < t_{sq,MUX}$ , the modified architecture presented here, as mentioned before, does not suffer from the problem of the glitches like the ones in Fig. 2.8 with the dashed lines and Fig. 2.10a.

Another advantage, which comes as a consequence of the addition of the retiming latches, is that the modified architecture has no timing condition for the minimum bit rate of operation, contrary to the architectures in Fig. 2.10a and 2.8 with the dashed lines. Therefore, the modified architecture presented here also lends itself to systems supporting high and low bit rates at the same time, as in the case of multi-standard memory controller physical interfaces [22]. However, the architecture in Fig. 2.12a is more complex and hence dissipates more power than the ones in Fig. 2.8 and 2.10a.

It should be emphasized here that satisfying the condition in Ineq. 2.32 is essential for the synchronization of the signals at the input and selection lines of MUX\_1-4. As explained before, in ECL, this is automatically met, provided that Ineq. 2.37 is satisfied. In other logic families, Ineq. 2.32 has to be strictly observed to ensure the synchronization.

Fig. 2.13 shows what happens when the condition in Ineq. 2.32 is violated, i.e., when the outputs of FF\_1,2 arrive at the latches LT\_3,4 *after* the latches become transparent. In this case, the synchronization is disrupted and a disturbance occurs at the output of MUX\_3,4, which again may lead to errors.



Figure 2.13: Simplified timing diagram of the architecture shown in Fig. 2.12a, when  $t_{cq,FF\_1-4} > UI$ .

The condition in Ineq. 2.35 for proper data transmission to FF\_9,10 becomes  $t_{cq,FF\_1,2} + t_{dq,LT\_3,4} + t_{sq,MUX\_3,4} + t_{stp,FF\_9,10} - \delta_1 - \delta_2 < 3UI$ , as illustrated in Fig. 2.13.

A general limitation of the half-rate parallel look-ahead architectures in Sec. 2.2.3 as well as the architecture presented here in Fig. 2.12a is the increased complexity for the implementation of more taps. Although the recursive relation in Eqn. 2.7 can be written for any number of taps and the same procedure using Boolean algebra in Sec. 2.2.3 can be utilized to increase the permissible delay in the feedback loop [26], the increased complexity of the circuit makes it hard to implement in practice, as the number of components and the power dissipation increase dramatically.

The modified architecture in Fig. 2.12a was used to design 80 and 110 Gb/s 1-tap DFEs, the circuit details and measurements of which are discussed in Chapters 3 and 4, respectively.



# Chapter 3

## Active and Passive Components for the Proposed DFE

This chapter focuses on the components that are used to build up the DFE architecture described in Sec. 2.2.4. The design and characterization of both the active and passive components in the DFE are described. The active components include the front-end comparators, D-flip-flops and the active balun for single-ended to differential conversion of the clock signal. The passive components include the microstrip transmission lines (MSTLs) used in the clock tree and in the implementation of on-chip inductors.

### 3.1 Design of the DFE Broadband Front-end

In the look-ahead DFE, two decisions are taken simultaneously on the incoming bit, for the two alternative cases that the preceding bit was '1' or '0', respectively. Two DC references ($V_{ref}$ and $-V_{ref}$), corresponding to the feedback signals for a preceding '1' and '0', respectively, are fed to a tilting (summation) amplifier. A master-slave D-flip-flop is then used to make a decision on the amplitude of its output. Fig. 3.1 shows the block diagram of one tilting amplifier followed by a D-flip-flop at the DFE front-end.



Figure 3.1: Block diagram of the tilting amplifier and D-flip-flop at the front-end.

### 3.1.1 Bandwidth Requirement

In the design of the front-end for broadband circuits, a trade-off always exists between the ISI and noise [31]. On one hand, the bandwidth should be minimized to reduce the total integrated noise, hence increasing the sensitivity. On the other hand, limited bandwidth introduces ISI. This trade-off can be explained using a simple first order RC low-pass filter with a cut-off frequency $f_c$. When applying one full length of an ideal $2^7 - 1$ PRBS with a bit rate $R_b=100$ Gb/s, of which the eye-diagram is shown in Fig. 3.2a, to the input of the filter, the resulting eye-diagrams at the output for $f_c = 0.7R_b$, $0.5R_b$ and $0.2R_b$ are shown in Fig. 3.2b-3.2d, respectively. It is obvious from the horizontal and vertical eye-closures in Fig. 3.2b-3.2d that the ISI increases as the bandwidth of the filter decreases. On the other hand, compared to the case when $f_c=R_b$, the integrated root mean square (rms) noise voltage at the output of the front-end drops by a factor of $\sqrt{0.7}$, $\sqrt{0.5}$ and $\sqrt{0.2}$ when $f_c = 0.7R_b$, $0.5R_b$ and $0.2R_b$, respectively [31].

It has been shown that a front-end bandwidth of $0.7R_b$ is a good compromise between the noise and ISI [31] [25].
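The trade-off can be reproduced qualitatively with a few lines of code. The sketch below is an idealized illustration and not the simulation used for Fig. 3.2: it assumes an NRZ signal with levels 0 and 1, an ideal first order RC response, sampling at the end of each bit period, and a PRBS-7 generator based on the common polynomial $x^7 + x^6 + 1$.

```python
import math


def prbs7(seed=0x7F):
    """One full period (127 bits) of a PRBS-7 sequence, x^7 + x^6 + 1."""
    state, bits = seed, []
    for _ in range(127):
        new_bit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | new_bit) & 0x7F
        bits.append(new_bit)
    return bits


def eye_opening(fc_over_rb, bits):
    """Vertical eye opening at the end of the bit period (levels 0 and 1)."""
    decay = math.exp(-2.0 * math.pi * fc_over_rb)  # RC decay over one UI
    v = 0.5
    ones, zeros = [], []
    for repetition in range(2):       # first pass removes the start-up transient
        for b in bits:
            v = b + (v - b) * decay   # exact step response over one bit
            if repetition == 1:
                (ones if b else zeros).append(v)
    return min(ones) - max(zeros)


def noise_factor(fc_over_rb):
    """Relative rms noise voltage compared to the case f_c = R_b."""
    return math.sqrt(fc_over_rb)


sequence = prbs7()
for ratio in (0.7, 0.5, 0.2):
    print(ratio, round(eye_opening(ratio, sequence), 3), round(noise_factor(ratio), 3))
```

The vertical eye opening shrinks as $f_c$ is reduced, while the relative rms noise voltage falls with $\sqrt{f_c/R_b}$, which is the trade-off described above.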

A conventional implementation of the tilting amplifier is shown in Fig. 3.3a. Complementary data inputs ($V_{in}$ and $V_{\overline{in}}$) are fed to a differential amplifier (Q1,2), which is accordingly tilted by the reference voltages applied to a second differential pair (Q3,4) working on the same loads. The D-flip-flop is a standard ECL master-slave D-flip-flop. Fig. 3.3b shows the schematic of the master latch of this D-flip-flop, where Q1,2 represent the tracking differential pair and Q3,4 represent the latching differential pair.



Figure 3.2: Input and output eye-diagrams for a first order RC LPF with $f_c = 0.7R_b$, $0.5R_b$ and $0.2R_b$.



Figure 3.3: Conventional circuit implementation of the tilting amplifier and the master latch.

As explained earlier in this section, the bandwidth resulting from cascading the tilting amplifier and the tracking stage of the master latch should be around  $0.7R_b$ , which is 70 GHz for  $R_b=100$  Gb/s. If the first order approximation

$$\frac{1}{BW_{total}^2} = \frac{1}{BW_1^2} + \frac{1}{BW_2^2} + \frac{1}{BW_3^2} + \dots + \frac{1}{BW_n^2} \quad (3.1)$$

is used, where $BW_{total}$ is the 3 dB bandwidth of the system resulting from cascading several systems with 3 dB bandwidths of $BW_1, BW_2, \dots, BW_n$, then for two cascaded stages of equal bandwidth, $BW_1 = BW_2 = \sqrt{2}\,BW_{total} \approx 99$ GHz, i.e., their individual 3 dB bandwidths should be about 100 GHz. Among the methods utilized to relax the bandwidth requirement on the tilting amplifier and the tracking stage of the master latch are the following two approaches:

1. The bandwidth of the summation amplifier in Fig. 3.3a can be increased by reducing the total capacitance at the collector nodes of Q1-4, hence reducing the RC time constant at this node. This can be performed in a MOS implementation by using the back-gate feedback technique, first introduced in [32]. A summation amplifier, which uses this technique is shown in Fig. 3.4a. Since the output current of a MOS transistor biased in saturation region is a function of the gate-to-source voltage and the threshold voltage, and since the threshold voltage of a MOS transistor could be dynamically adjusted by the source-to-bulk voltage, then the output current of the MOS transistor is a function of both the gate-to-source voltage and the source-to-substrate voltage. In this case, only one differential pair is needed, in comparison to two differential pairs in the conventional implementation in Fig. 3.3a. Furthermore, the summation amplifier and the latch can be merged into one cell [33], as shown in Fig. 3.4b, hence overcoming the bandwidth reduction due to the cascading of the summation amplifier and the latch. A drawback of this technique, however, is that it is only implementable in triple-well MOS technologies and requires access to the bulk terminal of the MOS transistor, as isolated bulks are needed for M1,2 in Fig. 3.4a and M1-4 in Fig. 3.4b.



Figure 3.4: The back-gate feedback technique.

2. The tilting amplifier can be merged into the master latch of the front-end D-flip-flop [34][35] as shown in Fig. 3.5a, hence overcoming the bandwidth reduction due to the cascading of the summation amplifier and the latch. This idea is utilized to implement the front-end in this work.

Figure 3.5: Tilting amplifier merged into the latch.

In the tracking mode of the latch in Fig. 3.5a, the current is steered to the input differential pair Q1,2. To analyze the frequency response of the circuit when excited differentially at $V_{in}$ and $V_{\overline{in}}$, it is sufficient to consider the equivalent simplified half-circuit shown in Fig. 3.5b. The emitter followers Q7,8 are not taken into account to simplify the analysis, since their load, which is the tracking differential pair of the slave latch, is switched off when the tracking stage of the master latch is active and consequently their bandwidths fall typically in the range of the transistor transit frequency $f_T$ [36]. The latching differential pair Q3,4 is switched off but still loads the nodes $X$ and $\bar{X}$ in Fig. 3.5a with the collector-substrate capacitances $C_{CS3,4}$ and base-collector capacitances $C_{\mu3,4}$. The two reference voltages $V_{ref}$ and $-V_{ref}$ are considered DC signals, but the collector-substrate and base-collector capacitances of Q11,12, namely $C_{CS11,12}$ and $C_{\mu11,12}$, respectively, also still load the nodes $X, \bar{X}$. The bandwidth limitation of the circuit in Fig. 3.5b comes mainly from the following two poles at $V_{in}$ and $X$:

$$|\omega_{p,in}| = \frac{1}{R_{in}C_{in}} = \frac{1}{(R_s||R_M||r_{\pi 1})[C_{\pi 1} + C_{\mu 1}(1 + g_{m1}R_c)]} \quad (3.2)$$

$$|\omega_{p,X}| = \frac{1}{R_XC_X} = \frac{1}{R_c(C_{CS1} + C_{CS3} + C_{CS11} + C_{\mu 11} + 2C_{\mu 3} + C_{\mu 1}(1 + \frac{1}{g_{m1}R_c}))} \quad (3.3)$$

$C_{in}$  and  $C_X$  represent the sum of the small-signal capacitances seen at the nodes  $V_{in}$  and  $X$  in Fig. 3.5a to the ground, respectively.  $R_s$  is the source impedance and  $R_M$  is the resistor used for broadband matching.  $R_{in}$  and  $R_X$  represent the small-signal resistances seen at the nodes  $V_{in}$  and  $X$  to the ground, respectively. In this case  $R_{in}$  is the parallel combination of  $R_s$  and  $R_M$ .

$g_m$ ,  $r_{\pi}$ ,  $C_{\mu}$  and  $C_{\pi}$  are the small-signal transconductance, base-emitter resistance, base-collector capacitance and base-emitter capacitance, respectively.

The terms $C_{\mu 1}(1 + g_{m1}R_c)$ and $C_{\mu 1}(1 + \frac{1}{g_{m1}R_c})$ in Eqn. 3.2 and 3.3, respectively, come from the Miller effect [36] between the nodes $V_{in}$ and $X$. The low-frequency voltage gain $A_{v1}$ between those two nodes is $A_{v1} = \frac{x}{v_{in}} = -g_{m1}R_c$. The Miller approximation in this case yields a capacitance of $C_{\mu 1}(1 - A_{v1}) = C_{\mu 1}(1 + g_{m1}R_c)$ to the ground at node $V_{in}$ and a capacitance of $C_{\mu 1}(1 - \frac{1}{A_{v1}}) = C_{\mu 1}(1 + \frac{1}{g_{m1}R_c})$ to the ground at node $X$.

The term $2C_{\mu 3}$ in Eqn. 3.3 arises because the nodes $X$ and $V_{out}$ in Fig. 3.5a, which are the base and collector of Q3, respectively, move with the same magnitude and in opposite directions, assuming that the voltage gain of the emitter followers Q7,8 is ideally unity. In this case, a capacitance of $2C_{\mu 3}$ appears between each of these nodes and a virtual ground. An identical result can be reached if the Miller approximation is applied here, as the low-frequency voltage gain between $X$ and $V_{out}$ is ideally $A_v = \frac{v_{out}}{x} = -1$. This yields a capacitance of $C_{\mu 3}(1 - A_v) = 2C_{\mu 3}$ to the ground at node $X$ and $C_{\mu 3}(1 - \frac{1}{A_v}) = 2C_{\mu 3}$ to the ground at node $V_{out}$.

Neither  $|\omega_{p,in}|$  nor  $|\omega_{p,X}|$  can be regarded as the dominant pole, as they are usually of the same order of magnitude. Hence, the bandwidth is maximized when moving both of them to higher frequencies, as follows:

1. To move $\omega_{p,in}$ to a higher frequency, the Miller effect (represented in Eqn. 3.2 by the term $1 + g_{m1}R_c$) has to be reduced by decreasing the magnitude of the small-signal voltage gain $|A_{v1}| = g_{m1}R_c$. Another method that helps to move the input pole to higher frequencies, when $|A_{v1}| > 1$, is a cascode configuration [36] [37] [38]. A version of the circuit in Fig. 3.5a using the cascode configuration is shown in Fig. 3.6.



Figure 3.6: The front-end master latch with cascode configuration.



Figure 3.7: The equivalent simplified half-circuit of the schematic in Fig. 3.6.

The equivalent simplified half-circuit is shown in Fig. 3.7. Since $g_m$ in bipolar transistors is a function of the biasing current and since Q1,13 share the same biasing current, $g_{m1}=g_{m13}$. Also, since the resistance seen at the emitter of Q13 is $\frac{1}{g_{m13}}$, it follows that the voltage gain between the two nodes $V_{in}$ and $Y$ is $A_{v1} = \frac{y}{v_{in}} = -g_{m1}/g_{m13} = -1$. Applying the Miller approximation on $C_{\mu 1}$ yields a capacitance of $C_{\mu 1}(1 - A_{v1}) = 2C_{\mu 1}$ to the ground at node $V_{in}$ and a capacitance of $C_{\mu 1}(1 - \frac{1}{A_{v1}}) = 2C_{\mu 1}$ to the ground at node $Y$. Three poles are then identified in Fig. 3.7:

$$|\omega_{p,in}| = \frac{1}{R_{in}C_{in}} = \frac{1}{(R_s||R_M||r_{\pi 1})[C_{\pi 1} + 2C_{\mu 1}]} \quad (3.4)$$

$$|\omega_{p,Y}| = \frac{1}{R_Y C_Y} = \frac{1}{\frac{1}{g_{m13}}(C_{CS1} + C_{\pi 13} + 2C_{\mu 1})} \quad (3.5)$$

$$|\omega_{p,X}| = \frac{1}{R_X C_X} = \frac{1}{R_c (C_{CS3} + C_{CS13} + C_{CS15} + 2C_{\mu 3} + C_{\mu 15} + C_{\mu 13})} \quad (3.6)$$

Eqn. 3.4 reveals how the cascode configuration helps to move the input pole to a higher frequency, as the capacitance at the input node $C_{in}$ decreases from $(C_{\pi 1} + C_{\mu 1}(1 + g_{m1}R_c))$ to $(C_{\pi 1} + 2C_{\mu 1})$, assuming the voltage gain magnitude $g_{m1}R_c$ of the circuit without the cascode is greater than one.

Since $f_T = \frac{g_m}{2\pi(C_\mu + C_\pi)}$, the pole $\omega_{p,Y}$ in Eqn. 3.5 falls near $f_T$ of Q13 if $C_{\pi 13} \gg 2C_{\mu 1} + C_{CS1}$. Even for comparable values of $C_{\pi 13}$ and $2C_{\mu 1} + C_{CS1}$, this pole is on the order of $f_T / 2$ and often has negligible effect on the frequency response of the cascode configuration [37].

The pole $\omega_{p,X}$ is slightly shifted to a higher frequency, as $C_X$ is decreased from $(C_{CS1} + C_{CS3} + C_{CS11} + C_{\mu11} + 2C_{\mu3} + C_{\mu1}(1 + \frac{1}{g_{m1}R_c}))$ in Eqn. 3.3 to $(C_{CS3} + C_{CS13} + C_{CS15} + 2C_{\mu3} + C_{\mu15} + C_{\mu13})$ in Eqn. 3.6.

Another advantage of using the cascode configuration is the high isolation it provides between the switching nodes  $X, \bar{X}$  and input data to be decided as well as the prevention of  $V_{CE}$  breakdown of Q1,2.

Another technique in addition to the cascode configuration, which helps in shifting the input pole towards higher frequencies, is the use of emitter followers to drive the tracking differential pair, because emitter followers have low small-signal output impedance. For example, when using an emitter follower biased with a current of 2 mA, its driving impedance can be as low as $\frac{1}{g_m} = \frac{V_T}{I_C} = 13\ \Omega$, where $V_T \approx 26 \text{ mV}$ at $300 \text{ K}$ is the thermal voltage [36]. However, the output impedance contains an inductive part at high frequencies, which may lead to ringing and affect the data to be decided. This is the reason why emitter followers were not used to drive the master latch in this work. A detailed treatment of the output impedance of emitter followers will be presented in Sec. 3.2.2.1, as it will be utilized to implement active inductors in Sec. 3.4.1.

2. To decrease the effect of bandwidth limitation caused by the pole $\omega_{p,X}$, inductive peaking can be used, as will be discussed in Sec. 3.1.3.1.
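The effect of the cascode on the input pole can be illustrated numerically by comparing Eqn. 3.2 and Eqn. 3.4. All component values in the sketch below are hypothetical and serve only to show the trend; they are not extracted from the technology used in this work.

```python
import math

V_T = 0.026  # thermal voltage at about 300 K, in volts


def input_pole_hz(r_in, c_pi, c_mu, gain_mag):
    """Input pole 1/(2*pi*R_in*C_in) with C_in = C_pi + C_mu*(1 + |A_v1|)."""
    c_in = c_pi + c_mu * (1.0 + gain_mag)
    return 1.0 / (2.0 * math.pi * r_in * c_in)


# Hypothetical values (assumptions for illustration only).
I_C = 1.25e-3    # collector current per transistor in A
R_C = 100.0      # load resistor in ohm
R_IN = 25.0      # R_s || R_M || r_pi1 in ohm
C_PI = 30e-15    # base-emitter capacitance in F
C_MU = 5e-15     # base-collector capacitance in F

g_m = I_C / V_T
gain = g_m * R_C

f_plain = input_pole_hz(R_IN, C_PI, C_MU, gain)    # Eqn. 3.2
f_cascode = input_pole_hz(R_IN, C_PI, C_MU, 1.0)   # Eqn. 3.4, |A_v1| = 1
print(round(f_plain / 1e9, 1), round(f_cascode / 1e9, 1))
```

The improvement factor equals the ratio of the two input capacitances, so the benefit of the cascode grows with the voltage gain $g_{m1}R_c$ of the stage without the cascode.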

The following considerations and design methodology were applied when choosing the transistor sizes, values of the loading resistor  $R_C$  and the tail current  $I_{bias1}$  in the schematic shown in Fig. 3.6:

1. Depending on the transistor sizes, the tail current, $I_{bias1}$, should be chosen to be 10-20% less than the collector current corresponding to the maximum $f_T$ in the technology, which allows maximum switching speeds in the differential pairs. This 10-20% margin is necessary because $f_T$ declines very strongly at collector currents above the one corresponding to the maximum $f_T$ [38]; it also provides some margin for process and temperature variations.

2. Depending on the biasing current, the value of the load resistor  $R_C$  can be chosen so as to have a single-ended logic swing ( $I_{bias1} R_C$ ) of 200-300 mV.

$$I_{bias1} R_C = 200 \text{ to } 300 \text{ mV} \quad (3.7)$$

A simplified large signal analysis of the DC transfer characteristics of a bipolar differential pair with resistive load reveals that a differential input voltage swing of four times the thermal voltage, $V_T$, can completely steer the tail current from one side to another [36]. This suggests that the single-ended logic swing can be as low as 100 mV. However, this simplified analysis ignores the voltage drop across the parasitic emitter resistance $r_e$ [38]. In new HBT technology nodes, the peak $f_T$ current density increases and the emitter width decreases, thus increasing the parasitic emitter resistance. For example, the smallest size HBT transistor in the technology used for implementation (Appendix A) with emitter dimensions of $0.48 \mu\text{m} \times 0.12 \mu\text{m}$ has $r_e = 30\ \Omega$. Hence, to ensure complete switching of the tail current, the minimum single-ended logic swing must be corrected to $I_{bias1} R_C \geq 4V_T + I_{bias1} r_e$ [38]. Since an even larger swing is required at high temperatures due to the increased thermal voltage, a single-ended logic swing between 200 and 300 mV is recommended. This provides enough margin to account for the impact of process and temperature variations and also very good noise margin [28]. Larger logic swings should be avoided, as large $R_C$ decreases the bandwidth or, in other words, increases the gate delay due to the increased time constant at the collector nodes $X, \bar{X}$ as shown in Eqn. 3.6.

3. The transistors Q5,6 can have the same size as Q1,2 for maximum switching speed in the clock differential pair.
4. The minimum size transistors could be used for Q3,4,7,8,13-16 to decrease the capacitive loading on nodes  $X, \bar{X}$ , as  $C_{CS}$  is minimized.

A simple procedure for choosing the value of $R_C$, $I_{bias1}$ and the sizes of the transistors Q1,2 begins by choosing the minimum size transistor, hence choosing $I_{bias1}$. The value of $R_C$ is then chosen according to Eqn. 3.7. The small-signal 3 dB bandwidth and voltage gain are then checked by AC simulations. If the bandwidth is less than the desired value or if the gain is less than one, the transistor sizes should be increased and the procedure is iterated, till a compromise is reached among the bandwidth, gain and logic swing.
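The first two steps of this procedure can be sketched as a short calculation. The tail current below is a hypothetical example value, whereas $r_e = 30\ \Omega$ is the value quoted above for the smallest transistor; the helper names are illustrative only.

```python
BOLTZMANN = 1.380649e-23             # J/K
ELEMENTARY_CHARGE = 1.602176634e-19  # C


def thermal_voltage(temp_k):
    """Thermal voltage V_T = kT/q in volts."""
    return BOLTZMANN * temp_k / ELEMENTARY_CHARGE


def min_logic_swing(i_bias, r_e, temp_k=300.0):
    """Minimum single-ended swing 4*V_T + I_bias*r_e in volts."""
    return 4.0 * thermal_voltage(temp_k) + i_bias * r_e


def load_resistor(i_bias, swing):
    """Load resistor R_C for a chosen single-ended swing (Eqn. 3.7)."""
    return swing / i_bias


I_BIAS = 2e-3   # hypothetical tail current in A
R_E = 30.0      # parasitic emitter resistance in ohm

for temp in (300.0, 400.0):
    print(temp, round(min_logic_swing(I_BIAS, R_E, temp) * 1e3, 1), "mV")
print(load_resistor(I_BIAS, 0.25), "ohm for a 250 mV swing")
```

For the assumed tail current, the corrected minimum swing rises from roughly 160 mV at 300 K towards 200 mV at 400 K, which is consistent with the recommended range of 200-300 mV.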

### 3.1.2 Design and Characterization of the First Version of the DFE Front-end

#### 3.1.2.1 Circuit Description

The idea of merging the summation amplifier and the latch into one cell using the procedures outlined in the previous section was utilized to design the first version of the DFE front-end. The schematic of this version is shown in Fig. 3.6. Two DC power supplies are employed, $V_{ee}=-3\text{ V}$ and $V_{cc}=2\text{ V}$, to allow bias-free input. The chip photo and its corresponding deembedding structures are shown in Fig. 3.8a and 3.8b, respectively. The input $V_{in}$ is connected to the pad through a $50\ \Omega$ MSTL, with the uppermost thick metalization layer TM2 in the technology (Appendix A) as the signal conductor (width=15 $\mu\text{m}$) and M1 as the ground plane. The other input $V_{\overline{in}}$ is terminated on-chip to the ground through a $50\ \Omega$ resistor. The circuit was designed to have a single-ended small-signal voltage gain ($A_v = \frac{v_{out}}{v_{in}}$) of about 4 dB. As no parasitic extraction feature was available in the design kit during the time of the design, the circuit was designed with a 3 dB bandwidth in excess of 100 GHz to leave some margin for the bandwidth shrinkage due to the parasitics.



Figure 3.8: Chip photo of the first version of the DFE front-end

#### 3.1.2.2 Circuit Characterization

Single-ended S-parameter measurements were performed on-wafer using the two-port Agilent 8510XF vector network analyzer (VNA) up to 110 GHz. SOLT (Short-Open-Load-Thru) calibration was performed using an impedance standard substrate (ISS) to set the input and output measurement reference planes at the probe tips. Furthermore, to set the measurement reference plane at the input and output of the circuit, the open-short deembedding technique (Fig. 3.8b, Appendix B) was employed.
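For orientation, the commonly used form of open-short deembedding can be sketched as follows. This is a generic illustration under the assumption that the parasitics consist of a parallel pad admittance followed by a series interconnect impedance, and that the measured S-parameters have already been converted to Y-parameters; it is not necessarily identical to the procedure in Appendix B. The numerical values are synthetic.

```python
def mat_add(x, y):
    """Element-wise sum of two 2x2 matrices."""
    return [[x[i][j] + y[i][j] for j in range(2)] for i in range(2)]


def mat_sub(x, y):
    """Element-wise difference of two 2x2 matrices."""
    return [[x[i][j] - y[i][j] for j in range(2)] for i in range(2)]


def mat_inv(m):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def open_short_deembed(y_meas, y_open, y_short):
    """Remove parallel (open) and series (short) parasitics from Y_meas."""
    z_after_open = mat_inv(mat_sub(y_meas, y_open))
    z_series = mat_inv(mat_sub(y_short, y_open))
    return mat_inv(mat_sub(z_after_open, z_series))


# Synthetic self-check with made-up values (in S and ohm).
Y_DUT = [[0.02 + 0.01j, -0.001j], [-0.05 + 0j, 0.01 + 0.002j]]
Y_PAD = [[0.002j, 0j], [0j, 0.002j]]
Z_LINE = [[2 + 1j, 0j], [0j, 2 + 1j]]

Y_OPEN = Y_PAD
Y_SHORT = mat_add(Y_PAD, mat_inv(Z_LINE))
Y_MEAS = mat_add(Y_PAD, mat_inv(mat_add(Z_LINE, mat_inv(Y_DUT))))

Y_RECOVERED = open_short_deembed(Y_MEAS, Y_OPEN, Y_SHORT)
print(Y_RECOVERED)
```

In this synthetic case the recovered matrix agrees with the assumed device matrix to within numerical precision, since the embedding model matches the assumption behind the method exactly.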



Figure 3.9: Measurement and simulation results of the first version of the DFE front-end

In the DFE, the front-end master latch is loaded by the slave latch, not  $50\ \Omega$ . This necessitates the post processing of the measured S-parameters to obtain from it the voltage gain  $A_v$  of the DFE front-end. The procedure for the post processing is explained in Appendix C.

Fig. 3.9a and 3.9b show the deembedded measurement and simulation results of $S_{21}$ and $A_v$, respectively. Although the schematic was designed to have more than 100 GHz of 3 dB bandwidth for $A_v$, the measured 3 dB bandwidth was only 45 GHz, which is far less than the required bandwidth, as discussed in Sec. 3.1.1 [35]. At the time of the measurement, RC parasitic extraction using Diva<sup>®</sup> became available in the design kit and post-layout simulations yielded 55 GHz of bandwidth. Because the bandwidth was not sufficient, a redesign of the front-end was necessary and is described in the next section.

### 3.1.3 Design and Characterization of the Second Version of the DFE Front-end

#### 3.1.3.1 Circuit Description

In Sec. 3.1.1, an explanation was given as to how the input pole in Eqn. 3.2 can be shifted to high frequencies using the cascode configuration, hence relaxing the effect of the bandwidth limitation it imposes. This configuration was used to design the first version of the DFE front-end in Sec. 3.1.2. In this section, in addition to the use of the cascode configuration, inductive peaking (also called shunt peaking) [31] [38] is used to relax the effect of the bandwidth limitation due to the output pole in Eqn. 3.6.



Figure 3.10: Common emitter amplifier with inductive peaking.

To illustrate how the inductive peaking increases the bandwidth, the simple common emitter amplifier in Fig. 3.10 can be used. The addition of an inductor in series with the load resistor provides an impedance component that increases with frequency (i.e., it introduces a zero), which helps offset the decreasing impedance of the capacitance with frequency, leaving a net impedance that remains roughly constant over a broader frequency range than that of the same amplifier without the inductor.

The AC voltage gain in this case is given by

$$A_v(j\omega) = -g_{m1}Z_c(j\omega) = -g_{m1}R_c \frac{1 + j\omega \frac{L_c}{R_c}}{1 + j\omega R_c C_x + (j\omega)^2 C_x L_c} \quad (3.8)$$

The case without inductive peaking corresponds to $L_c = 0$. The value of $L_c$ can be chosen so that the frequency response in Eqn. 3.8 satisfies a maximum bandwidth, maximally flat magnitude or maximally flat group delay condition [31] [38]. In theory, the maximum bandwidth attained by inductive peaking is about 1.85 times that of the case without inductive peaking [31] [39], but this comes at the cost of 3 dB gain peaking. Practically, however, inductive peaking improves the bandwidth to a lesser extent because of the parasitic capacitances and limited quality factors of on-chip inductors [31].
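The bandwidth extension factor can be checked numerically from Eqn. 3.8. The sketch below normalizes the frequency as $x = \omega R_c C_x$ and introduces the ratio $m = R_c^2 C_x / L_c$; both the normalization and the function names are illustrative choices, not taken from the design flow used in this work. It scans for the first frequency at which the normalized gain falls 3 dB below its DC value.

```python
import math

def gain_mag(x, m):
    """|A_v| / (g_m1 * R_c) from Eqn. 3.8, with x = w*R_c*C_x and m = R_c^2*C_x/L_c.

    m = math.inf corresponds to L_c = 0 (no inductive peaking).
    """
    k = 1.0 / m                            # k = L_c / (R_c^2 * C_x)
    numerator = complex(1.0, x * k)        # 1 + jw L_c/R_c
    denominator = complex(1.0 - x * x * k, x)  # 1 + jw R_c C_x + (jw)^2 C_x L_c
    return abs(numerator / denominator)

def bandwidth_3db(m, x_max=10.0, steps=20000):
    """First normalized frequency at which the gain drops below 1/sqrt(2)."""
    target = 1.0 / math.sqrt(2.0)
    for i in range(1, steps + 1):
        x = x_max * i / steps
        if gain_mag(x, m) < target:
            return x
    raise ValueError("no 3 dB point found below x_max")

bw_plain = bandwidth_3db(math.inf)          # simple RC load
bw_peaked = bandwidth_3db(math.sqrt(2.0))   # ratio commonly quoted for maximum bandwidth
print(bw_peaked / bw_plain)
```

Sweeping `m` over a range of values in the same way shows the trade-off between bandwidth and flatness of the response.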



Figure 3.11: Chip photo of the second version of the DFE front-end.

Inductive peaking was employed in the design of the second version of the DFE front-end. Two circuits have been fabricated as shown in the photo in Fig. 3.11: one without inductive peaking ($L_c = 0$) and another with inductive peaking. The schematic of the latter is shown in Fig. 3.12. The tail current was readjusted to reflect the changes in the transistor models in the technology, because the technology parameters had not been frozen at that time. To determine the value of $L_c$ needed to achieve a bandwidth of at least 70 GHz, post-layout simulations were performed, where an ideal inductor model was used for $L_c$.



Figure 3.12: Schematic of the second version of the DFE front-end.

As shown in Fig. 3.13, an ideal inductance value of about 100 pH results in a bandwidth of 80 GHz, which satisfies the bandwidth requirement. A value of 130 pH results in the maximum bandwidth of 83 GHz, but with 2.8 dB of peaking in the frequency response.

The input impedance  $Z_{in}$  of a lossless TL with characteristic impedance  $Z_o$ , and an electrical length  $\theta$ , terminated with a load  $Z_L$ , is given by

$$Z_{in} = Z_o \frac{Z_L + jZ_o \tan \theta}{Z_o + jZ_L \tan \theta} \quad (3.9)$$

Considering the case when  $Z_L$  is a short-circuit, then

$$Z_{in} = jZ_o \tan \theta \quad (3.10)$$

Figure 3.13: Post-layout simulations of the second version of the DFE front-end.

The purely imaginary input impedance in Eqn. 3.10 increases with frequency when $\theta < 90^\circ$, which resembles the input impedance of an inductor. Therefore a shorted TL behaves like an inductor over the range of frequencies where its electrical length is less than $90^\circ$.
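The inductor-like behavior of Eqn. 3.10 can be illustrated with a small calculation of the effective inductance $L_{eff} = \mathrm{Im}(Z_{in})/\omega$. The characteristic impedance (115 $\Omega$) and effective permittivity (4.0) used below are hypothetical values, picked only so that a 130 $\mu$m line lands near 100 pH at low frequencies; they are not extracted from the technology or from the EM simulations.

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum, m/s

def shorted_line_inductance(z0, length_m, eps_eff, freq_hz):
    """Effective inductance Im(Z_in)/w of a lossless short-circuited line (Eqn. 3.10)."""
    omega = 2.0 * math.pi * freq_hz
    theta = omega * length_m * math.sqrt(eps_eff) / C0  # electrical length in rad
    if theta >= math.pi / 2.0:
        raise ValueError("line is no longer inductive (theta >= 90 degrees)")
    return z0 * math.tan(theta) / omega

# Hypothetical line parameters (assumptions, not values from this work)
Z0_ASSUMED = 115.0
EPS_EFF_ASSUMED = 4.0
LENGTH = 130e-6

for f_ghz in (1, 40, 80, 110):
    l_eff = shorted_line_inductance(Z0_ASSUMED, LENGTH, EPS_EFF_ASSUMED, f_ghz * 1e9)
    print(f"{f_ghz:4d} GHz: {l_eff * 1e12:.1f} pH")
```

Under these assumptions the effective inductance stays close to its low-frequency value while $\theta$ is small and grows slowly as the frequency approaches the quarter-wave point.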

The previous conclusion was utilized to design the required 100 pH inductors as MSTLs with short-circuit terminations. The MSTL has a width and length of 3 $\mu\text{m}$ and 130 $\mu\text{m}$, respectively. TM1 is employed as the signal conductor and M1 as the ground plane. The MSTL was simulated in Momentum<sup>®</sup>. Fig. 3.14 shows the real and imaginary parts of its input impedance. The imaginary part of the input impedance of a 100 pH ideal inductor is also depicted in Fig. 3.14 to show how well the MSTL approximates an ideal inductor. Fig. 3.13 shows the simulation results when replacing the ideal inductor with the S-parameter model of the MSTL from Momentum<sup>®</sup>. The bandwidth reduces slightly and becomes around 77 GHz.

Figure 3.14: Using MSTL to realize the 100 pH required inductor.

When integrating the front-end with the other components of the DFE, layout constraints required a change in the metalization layers used to realize the inductors. The inductors in this case were realized using an MSTL with a width and length of 3 $\mu$m and 85 $\mu$m, respectively, with TM2 as the signal conductor and M3 as the ground plane. These dimensions were chosen to give the same inductance value of 100 pH as the inductors implemented between TM1 and M1. The MSTLs implemented between TM2 and M3 extend vertically at both sides of the slave latch, as shown in the part of the DFE chip photo in Fig. 3.15.



Figure 3.15: Chip photo showing the integration of the front-end into the DFE. TM2 appears in bright gold color in the dark background representing M3.

#### 3.1.3.2 Circuit Characterization

A fully differential characterization of the DFE front-end up to 110 GHz requires either a true four-port VNA or a two-port VNA with a switch box that extends it to four ports. Neither of them, however, was available up to 110 GHz. Therefore, it was necessary to develop a measurement technique that enables the bandwidth of this version of the DFE front-end to be measured semi-differentially (i.e., with one input active and the other grounded) using the available two-port 110 GHz VNA and GSG probes.

In this technique, the input and output pads are arranged so that four distinct single-ended S-parameter measurements can be taken, each relating one input to one output, while the other input is grounded and the other output is left open. The input pads are arranged as $(G, V_{in}, V_{\bar{in}}, G)$, whereas the output pads are arranged as $(G, V_{out}, G, V_{\bar{out}}, G)$. For example, putting the signal pin of the GSG probe on the input pad $V_{in}$ in Fig. 3.11 grounds the other input pad $V_{\bar{in}}$. In contrast, at the output side, when the signal pin of the GSG probe is on the output pad $V_{out}$, the other output pad $V_{\bar{out}}$ is left open. In this case an S-parameter set relating $V_{out}$ to $V_{in}$ can be measured. The probe at the output side is then shifted so that the output is taken from $V_{\bar{out}}$, and another S-parameter set relating $V_{\bar{out}}$ to $V_{in}$ is measured. The same two measurements are repeated with the input probe on $V_{\bar{in}}$. The resulting four S-parameter sets are then converted to voltage gains (Appendix C). In this conversion, the load is assumed to be an open circuit, because while the GSG probe is placed on one output, the other output is always left open. In this way, the load symmetry between the two outputs $V_{out}$ and $V_{\bar{out}}$ is maintained and the only loading on the core of the circuit is the pad capacitance (about 30 fF) and the line connecting to it. The resulting four voltage gains can be summarized as follows:

$$\frac{v_{out}}{v_{in}} \quad \text{and} \quad \frac{v_{\bar{out}}}{v_{in}} \quad \text{when} \quad v_{\bar{in}} = 0 \quad (3.11)$$

$$\frac{v_{out}}{v_{\bar{in}}} \quad \text{and} \quad \frac{v_{\bar{out}}}{v_{\bar{in}}} \quad \text{when} \quad v_{in} = 0 \quad (3.12)$$

When  $v_{\bar{in}} = 0$ , the voltage gain from  $v_{in}$  to  $v_{out}$  and  $v_{\bar{out}}$  can be assumed to be

$$v_{\bar{out}} = -\frac{A_{v1}}{2}v_{in} \quad \text{and} \quad v_{out} = \frac{A_{v2}}{2}v_{in} \quad (3.13)$$

It follows then that

$$(v_{out} - v_{\bar{out}})|_{v_{\bar{in}}=0} = \frac{A_{v1} + A_{v2}}{2}v_{in} \quad (3.14)$$

Similarly, when  $v_{in} = 0$ , the voltage gain from  $v_{\bar{in}}$  to  $v_{out}$  and  $v_{\bar{out}}$  can be assumed to be

$$v_{\bar{out}} = \frac{A_{v1}}{2}v_{\bar{in}} \quad \text{and} \quad v_{out} = -\frac{A_{v2}}{2}v_{\bar{in}} \quad (3.15)$$

Then

$$(v_{out} - v_{\bar{out}})|_{v_{in}=0} = -\frac{A_{v1} + A_{v2}}{2}v_{\bar{in}} \quad (3.16)$$

Using the superposition principle, the right hand sides in Eqn. 3.14 and 3.16 can be summed up as

$$v_{out} - v_{\bar{out}} = \frac{A_{v1} + A_{v2}}{2}(v_{in} - v_{\bar{in}}) \quad (3.17)$$

It follows then from Eqn. 3.17 that the semi-differential voltage gain,  $A_{v,semi-diff}$  is

$$A_{v,semi-diff} = \frac{v_{out} - v_{\bar{out}}}{v_{in} - v_{\bar{in}}} = \frac{A_{v1} + A_{v2}}{2} \quad (3.18)$$
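As a rough illustration of Eqns. 3.13 and 3.18, the two single-ended voltage gains measured with $v_{\bar{in}} = 0$ can be combined as sketched below. The function name and the numeric values are hypothetical; the inputs are assumed to be complex voltage gains already obtained from the S-parameters by the conversion of Appendix C.

```python
def semi_differential_gain(gain_out, gain_outbar):
    """Combine two single-ended gains into A_v,semi-diff (Eqn. 3.18).

    gain_out    = v_out / v_in     (equals +A_v2 / 2 in Eqn. 3.13)
    gain_outbar = v_outbar / v_in  (equals -A_v1 / 2 in Eqn. 3.13)
    Both are taken with the other input grounded and may be complex.
    """
    a_v2 = 2.0 * gain_out
    a_v1 = -2.0 * gain_outbar
    return (a_v1 + a_v2) / 2.0

# Hypothetical gains at one frequency point (not measured data)
g_out = 0.9 + 0.1j
g_outbar = -0.8 - 0.05j
print(abs(semi_differential_gain(g_out, g_outbar)))
```

The result is simply the difference of the two single-ended gains, so any phase mismatch between the two outputs directly reduces the semi-differential magnitude.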

Fig. 3.16a and 3.16b show the measurement and simulation results of $S_{21}$ (from $V_{in}$ to $V_{\bar{out}}$) and $A_{v,semi-diff}$, respectively. For $A_{v,semi-diff}$ in Fig. 3.16b, a 3 dB bandwidth of around 71 GHz is achieved, compared to 77 GHz in simulations. In the case without inductive peaking, only 61 GHz is achieved, compared to 65 GHz in simulations. This version of the DFE front-end is used later on, when integrating the components in the DFE.

## 3.2 Static Frequency Dividers

In addition to figures of merit for the transistor performance in a technology, like $f_T$ and $f_{max}$, some parameters derived from simple benchmarking circuits are regarded as good indicators for the achievable performance in a wide range of circuit applications. One of those parameters is the ring oscillator gate delay, which is usually the gate delay of the simplest combinational gate, namely an inverter, in bipolar technology mostly set up in ECL. Besides the need for transistors with high intrinsic speed, such a ring oscillator also requires resistors with low parasitic capacitance and a metalization stack with low parasitics. Therefore the ring oscillator gate delay serves as a good measure to evaluate a device technology for logic applications.

Figure 3.16: Measurement and simulation results of the second version of the DFE front-end.

Digital circuits, however, consist not only of combinational logic gates but also of sequential logic gates. The simplest sequential ECL gate is the D-flip-flop. Since a static frequency divider consists of one D-flip-flop connected in a negative feedback manner, its maximum frequency of operation is used as another traditional measure to indicate the technology performance.

Although other types of frequency dividers exist, which have higher maximum frequency of operation compared to static frequency dividers, such as injection locked frequency dividers and dynamic (regenerative) frequency dividers, they suffer from the disadvantage of working only in a certain frequency range and not down to DC. A static frequency divider, on the contrary, is able to work down to DC, if the slew rate of the input clock signal is high enough.

Fig. 3.17 shows the block diagram of a static frequency divider. The inverted output of the slave latch is connected back to the input of the master latch. This results in an output toggling at the falling edge of the input clock signal. Therefore, the frequency of the output is half of the input frequency [24].
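The divide-by-two behavior can be reproduced with an idealized behavioral model of the two latches. The sketch below assumes level-sensitive latches with zero delay, with the master transparent while the clock is high and the slave transparent while it is low; this polarity is an assumption chosen so that the output toggles on the falling clock edge as described above, and the model says nothing about the analog speed limits of the real circuit.

```python
def static_divider(clock_samples):
    """Idealized master-slave divider: slave output inverted and fed back to the master."""
    master = 0
    slave = 0
    output = []
    for clk in clock_samples:
        if clk:
            master = 1 - slave   # master tracks the inverted slave output
        else:
            slave = master       # slave tracks the master output
        output.append(slave)
    return output

clock = [1, 0] * 8               # eight clock periods, two samples each
out = static_divider(clock)
print(out)

falling_edges = sum(1 for a, b in zip(clock, clock[1:]) if a == 1 and b == 0)
toggles = sum(1 for a, b in zip(out, out[1:]) if a != b)
print(falling_edges, toggles)
```

Each output toggle coincides with a falling clock edge, and two toggles make one output period, so the output frequency is half the clock frequency.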



Figure 3.17: Block diagram of the static frequency divider

### 3.2.1 86 GHz Static Frequency Divider

#### 3.2.1.1 Circuit Description

The schematic of one latch from this static frequency divider [40] is shown in Fig. 3.18. Compared to the standard ECL latch shown in Fig. 3.3b, two bandwidth enhancement techniques were used to increase the bandwidth of the latch and hence increase the maximum frequency of operation. These two techniques are the use of cascode configuration (Sec. 3.1.1) and inductive peaking (Sec. 3.1.3.1).  $50\text{ }\mu\text{m} \times 50\text{ }\mu\text{m}$  on-chip inductors implemented in TM2 were utilized for inductive peaking. Their inductance at 50 GHz is around 186 pH, while their self-resonance frequency is 128 GHz. The tail current is set to the optimal current corresponding to maximum  $f_T$ , for maximum current switching speed between the tracking and the latching phases. For high speed operation and moderate noise margin, the values of the resistors  $R_C$  are selected so as to have 150 mV single-ended logic swing. Two extra emitter followers, Q7,8, are utilized to provide enough headroom for the tracking differential pair of the slave latch. The use of double emitter follower also increases the decoupling capability (impedance transformation) of the emitter followers at high operating frequencies [25] [41]. The feedback is taken from the first emitter followers Q7,8, to reduce the feedback delay from the collectors to the bases of Q3,4.

The circuit uses two power supplies ($V_{ee} = -2\text{ V}$ and $V_{cc} = 3.5\text{ V}$) to allow bias-free clock input. Since single-ended excitation was intended for measurements, one clock input is terminated directly to ground. For broadband matching, a $50\ \Omega$ termination is used at the other input and a $50\ \Omega$ MSTL further connects this input to the pad.

Figure 3.18: Schematic of one latch from the 86 GHz static frequency divider

The output buffer is a simple differential pair with  $50\Omega$  load resistors and is connected to the output pads also by  $50\Omega$  MSTLs.



Figure 3.19: Chip photo of the 86 GHz static frequency divider

The photo of the chip is shown in Fig. 3.19. Its size is  $1.07 \times 0.50 \text{ mm}^2$ , including the pads.

#### 3.2.1.2 Circuit Characterization

Measurements were performed on-wafer [40]. To measure the divider input sensitivity for the frequency range 75-90 GHz, a CW generator was connected to a frequency multiplier. The output from the frequency multiplier was then connected through a waveguide to a GSG probe. For input frequencies up to 60 GHz, the CW generator was connected to a GSG probe by a low-loss coaxial cable. One output was monitored through a GSG probe on an Agilent E4448A 3 Hz-50 GHz spectrum analyzer, while the other output was left open.

Figure 3.20: Sensitivity curve of the 86 GHz divider

A signal with a frequency of 29.67 GHz was observed at the output without applying any input, which is defined as the output oscillation frequency. This means that the input-referred self-resonance frequency is 59.34 GHz. In measurement, the divider works from 40 GHz to 86 GHz, as indicated by the sensitivity curve in Fig. 3.20. The ostensible loss in sensitivity below 40 GHz is caused by the low slew-rate of the input signal. To increase the slew-rate, higher input power can be delivered. Although the CW generator is able to deliver more than -5 dBm of input power for frequencies below 40 GHz, as the input power level is increased Q1,2 leave the active region and go into the saturation region, which ultimately leads to operation failure. In simulation, however, when a square wave with high slew-rate and with amplitudes that do not drive Q1,2 into saturation is used, the circuit is able to work down to very low frequencies.
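The link between input frequency, input power and slew rate can be made concrete for a sinusoidal clock, whose maximum slew rate is $2\pi f V_{peak}$. The sketch below assumes an ideal sine wave delivered to a matched 50 $\Omega$ input and ignores probe and cable losses; the function names are illustrative.

```python
import math

def peak_voltage(power_dbm, z0=50.0):
    """Peak voltage of a sine wave delivering power_dbm into a matched load z0."""
    power_w = 1e-3 * 10.0 ** (power_dbm / 10.0)
    return math.sqrt(2.0 * z0 * power_w)

def max_slew_rate(power_dbm, freq_hz, z0=50.0):
    """Maximum slew rate (V/s) of the sine wave, reached at its zero crossings."""
    return 2.0 * math.pi * freq_hz * peak_voltage(power_dbm, z0)

for f_ghz in (10, 20, 40):
    sr = max_slew_rate(-5.0, f_ghz * 1e9)
    print(f"{f_ghz} GHz at -5 dBm: {sr * 1e-12:.1f} V/ns")
```

At constant input power the slew rate scales linearly with frequency, so halving the clock frequency requires 6 dB more power to keep the same edge steepness, which is eventually limited by saturation of the input transistors.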

Diva® RC extraction of the circuit core was performed and post-layout simulations yielded an output oscillation frequency of 37 GHz and an operating frequency range from 43 to 92 GHz. The post-layout simulation shows the right tendency, as the maximum frequency of operation decreased from 108 GHz in schematic-level simulation to 92 GHz in post-layout simulation. The remaining discrepancy between the maximum frequency of operation in measurement (86 GHz) and in post-layout simulation (92 GHz), however, could not be explained.

### 3.2.2 100 GHz Static Frequency Divider

#### 3.2.2.1 Circuit Description

The motivation behind designing even faster D-flip-flops than those described in the previous section was the necessity to increase the speed of the DFE from 80 to 110 Gb/s (Sec. 4.4) [7].

The schematic of one latch is shown in Fig. 3.21. The tail current is again readjusted to reflect the changes in the transistor model and is set to the optimal current corresponding to maximum  $f_T$  for maximum speed of operation. The use of multiple emitter followers is recommended for high operating speeds, as the decoupling capability (impedance transformation) of emitter followers is rather limited due to the reduced effective current gain ( $|\beta| \approx f_T/f$ ) of the transistors [25] at high frequencies. Thus two or even three cascaded emitter followers are often required.

In the schematic of Fig. 3.21, the minimum-size transistor with an emitter length ($E_L$) of $0.48\ \mu\text{m}$ was used for the first emitter follower pair Q7,9 to decrease $C_{\mu7,9}$, which directly affects the time constant at the collectors of Q1-4 and would otherwise reduce the speed. In contrast, the second emitter follower pair Q8,10 is usually larger, because it is strongly loaded due to the Miller effect in the differential pair of the succeeding latch (Q1,2). To optimize the size of the second emitter follower pair Q8,10, and hence the bandwidth of the cascaded emitter followers in AC simulations, the schematic in Fig. 3.22 was utilized, where the differential pairs (or current switches) Q1,2 and Q3,4 were simply modeled by AC current sources. The loading at the output of the circuit in this case is the sensing differential pair of the slave latch Q1,2 in Fig. 3.21, when the tail biasing current is steered to it.

Figure 3.21: Schematic of one latch from the 100 GHz static frequency divider

Figure 3.22: Schematic used to optimize the sizes of the two emitter followers



Figure 3.23: The effect of using different sizes of emitter followers.

Fig. 3.23a- 3.23d show the frequency response of the first emitter follower pair  $\frac{v_{EF}}{v_X}$ , the second emitter follower pair  $\frac{v_Q}{v_{EF}}$  and the overall frequency response  $\frac{v_Q}{v_X}$ , as the emitter length of the second emitter follower pair increases from 0.48  $\mu\text{m}$  to 2.52  $\mu\text{m}$ , while keeping the first emitter follower length constant at 0.48  $\mu\text{m}$ . In all cases the emitter followers are biased with the optimal current corresponding to maximum  $f_T$ , which are 1.2, 2, 4 and 6 mA for  $E_{L2}=0.48$ , 0.84, 1.68 and 2.52  $\mu\text{m}$ , respectively. The overall bandwidth increases from 139 GHz to 175 GHz, as  $E_{L2}$  increases from 0.48 to 1.68  $\mu\text{m}$ . A further increase in  $E_{L2}$  does not result in significant increase in the overall bandwidth. Hence  $E_{L2}=1.68\text{ }\mu\text{m}$  was chosen as the optimum value.

Increasing  $E_{L2}$  results in increased bandwidth of the second emitter follower pair because its transconductance  $g_m$  increases, while its load is fixed. The bandwidth of the first emitter follower pair, however, decreases slightly as their capacitive load increases, as a result of the increase in the second emitter follower size, while their transconductance is maintained constant.

The peaking in the frequency response in Fig. 3.23a- 3.23d can be explained, when looking at the input and output impedance of an emitter follower with the help of the small-signal model as follows:



Figure 3.24: Input impedance of emitter followers.

To obtain an expression for the input impedance $Z_{in}$ of an emitter follower, the small-signal model in Fig. 3.24a is used. The frequency of operation is assumed high enough to neglect the effect of $r_\pi$, the small-signal input resistance between the base and emitter looking into the base, as most of the AC base current then flows through $C_\pi$, which simplifies the analysis. This approximation is valid for $\omega > \frac{1}{C_\pi r_\pi}$. But $r_\pi = \beta_o/g_m$ and $\omega_T = 2\pi f_T = \frac{g_m}{C_\pi + C_\mu} \approx \frac{g_m}{C_\pi}$, since $C_\pi \gg C_\mu$. It follows then that the approximation is valid when $\omega > \frac{1}{C_\pi(\beta_o/g_m)} = \frac{\omega_T}{\beta_o}$, where $\beta_o$ is the low-frequency short-circuit common-emitter current gain [25] [41]. In the technology used throughout this work (Appendix A), $\beta_o=900$ and $f_T = 250$ GHz, which makes the approximation at hand valid for frequencies above 0.28 GHz. The resistance $r_b$ is the series base resistance of the transistor, which consists of two parts [36]. The first part, which is not bias-dependent, is the resistance of the path between the base contact and the edge of the emitter diffusion area. The second part, which is bias-dependent, is the resistance between the edge of the emitter and the site within the base region at which the current is actually flowing. Also to simplify the analysis, the load is assumed purely capacitive and equal to $C_L$.
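The validity limit quoted above follows directly from $f_T/\beta_o$. The short sketch below reproduces that number and, using the approximation $|\beta| \approx f_T/f$ mentioned earlier for emitter followers, also shows how little current gain is left at the operating frequencies of interest. The helper names are illustrative.

```python
def r_pi_corner_ghz(f_t_ghz, beta_0):
    """Frequency above which r_pi can be neglected against C_pi: f_T / beta_0."""
    return f_t_ghz / beta_0

def current_gain_magnitude(f_t_ghz, f_ghz):
    """High-frequency approximation |beta| ~ f_T / f (valid well above f_T / beta_0)."""
    return f_t_ghz / f_ghz

print(round(r_pi_corner_ghz(250.0, 900.0), 2))   # corner frequency in GHz
print(current_gain_magnitude(250.0, 100.0))      # remaining |beta| at 100 GHz
```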

Applying Kirchhoff's current law (KCL) at the emitter node in Fig. 3.24a:

$$j\omega C_\pi v_\pi + g_m v_\pi = j\omega(v_1 - v_\pi)C_L \quad (3.19)$$

$$v_1 = \frac{g_m + j\omega(C_L + C_\pi)}{j\omega C_L} v_\pi = \frac{g_m + j\omega(C_L + C_\pi)}{j\omega C_L} \frac{i_1}{j\omega C_\pi} \quad (3.20)$$

From Eqn. 3.20, the impedance  $Z_1$  in Fig. 3.24a can then be expressed as

$$Z_1 = \frac{v_1}{i_1} = \frac{-g_m}{\omega^2 C_L C_\pi} + \frac{C_L + C_\pi}{C_L C_\pi} \frac{1}{j\omega} \quad (3.21)$$

Using Fig. 3.24a and Eqn. 3.21, an equivalent circuit for $Z_{in}$ can be obtained, as shown in Fig. 3.24b. A similar analysis can be performed when the load is resistive ($R_L$), and the resulting equivalent circuit is shown in Fig. 3.24c. In both cases $Z_{in}$ contains a capacitive part and a resistive part. In the case of capacitive loading, the first term of Eqn. 3.21 is a negative resistance, so the resistive part of $Z_{in}$ may become negative.
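The negative real part in Eqn. 3.21 can be confirmed numerically by evaluating Eqn. 3.20 directly and comparing it with the closed form. The element values below ($g_m$ = 40 mS, $C_\pi$ = 20 fF, $C_L$ = 10 fF) are hypothetical and serve only to exercise the formulas.

```python
import math

def z1_from_kcl(g_m, c_pi, c_load, freq_hz):
    """Z_1 = v_1 / i_1 evaluated from Eqn. 3.20 with v_pi = i_1 / (jw C_pi)."""
    jw = 1j * 2.0 * math.pi * freq_hz
    return (g_m + jw * (c_load + c_pi)) / (jw * c_load) / (jw * c_pi)

def z1_closed_form(g_m, c_pi, c_load, freq_hz):
    """Eqn. 3.21: negative resistance in series with C_L and C_pi in series."""
    w = 2.0 * math.pi * freq_hz
    resistance = -g_m / (w * w * c_load * c_pi)
    reactance = -(c_load + c_pi) / (w * c_load * c_pi)
    return complex(resistance, reactance)

# Hypothetical element values (assumptions for illustration only)
z = z1_from_kcl(0.04, 20e-15, 10e-15, 100e9)
print(z, z1_closed_form(0.04, 20e-15, 10e-15, 100e9))
```

The magnitude of the negative resistance falls with $\omega^2$, so its effect is strongest at the lower end of the frequency range where the approximation still holds.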

To obtain an equivalent circuit for the output impedance  $Z_{out}$  of an emitter follower, the small-signal model in Fig. 3.25a is used. To simplify the analysis, the effect of  $C_\mu$  is neglected, which is valid for frequencies  $\omega < \frac{1}{(r_b + R_s)C_\mu}$  and holds if the source impedance  $R_s$  connected to the base is low enough. For example, in the first emitter follower pair,  $C_\mu$  is about 0.35 fF and  $r_b$  is about 50  $\Omega$ . This gives a corner frequency  $\frac{1}{2\pi(r_b + R_s)C_\mu}$  in excess of 3 THz.



Figure 3.25: Output impedance of emitter followers.

Applying KCL at the base node in Fig. 3.25a:

$$j\omega C_\pi v_\pi = -\frac{v_{out} + v_\pi}{R_s + r_b} \Rightarrow v_{out} = -[1 + j\omega C_\pi (R_s + r_b)] v_\pi \quad (3.22)$$

Then applying KCL at the emitter node yields

$$i_{out} = -v_\pi (g_m + j\omega C_\pi) \quad (3.23)$$

Dividing Eqn. 3.22 by Eqn. 3.23, an expression for  $Z_{out}$  can be obtained

$$Z_{out} = \frac{v_{out}}{i_{out}} = \frac{1 + j\omega C_\pi (R_s + r_b)}{g_m + j\omega C_\pi} = \frac{1/g_m + j\omega C_\pi (R_s + r_b)/g_m}{1 + j\omega C_\pi/g_m} \quad (3.24)$$

But  $\omega_T = \frac{g_m}{C_\pi + C_\mu} \approx \frac{g_m}{C_\pi}$ , so the expression in Eqn. 3.24 can be rewritten as

$$Z_{out} \approx \frac{1/g_m + j(R_s + r_b)\omega/\omega_T}{1 + j\omega/\omega_T} \quad (3.25)$$

For frequencies  $\omega \ll \omega_T$ , the imaginary part in the denominator of Eqn. 3.25 can be neglected. The following expression then approximates  $Z_{out}$

$$Z_{out} \approx 1/g_m + j(R_s + r_b)\omega/\omega_T \quad (3.26)$$

Eqn. 3.26 can be modeled by the equivalent circuit shown in Fig. 3.25b, which shows that the output impedance of the emitter follower contains an inductive part and a resistive part.
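To give a feeling for the size of the inductive part, the sketch below evaluates Eqn. 3.25 and the equivalent inductance $(R_s + r_b)/\omega_T$ implied by Eqn. 3.26. The values $g_m$ = 40 mS and $R_s + r_b$ = 150 $\Omega$ are hypothetical; only $f_T$ = 250 GHz is taken from the text.

```python
import math

def z_out_exact(g_m, r_total, freq_hz, f_t_hz):
    """Eqn. 3.25 with r_total = R_s + r_b."""
    x = freq_hz / f_t_hz
    return (1.0 / g_m + 1j * r_total * x) / (1.0 + 1j * x)

def z_out_low_freq(g_m, r_total, freq_hz, f_t_hz):
    """Eqn. 3.26, intended for frequencies well below f_T."""
    return complex(1.0 / g_m, r_total * freq_hz / f_t_hz)

def equivalent_inductance(r_total, f_t_hz):
    """Series inductance of the equivalent circuit: (R_s + r_b) / w_T."""
    return r_total / (2.0 * math.pi * f_t_hz)

# Hypothetical g_m and source resistance (assumptions for illustration only)
print(z_out_exact(0.04, 150.0, 25e9, 250e9))
print(z_out_low_freq(0.04, 150.0, 25e9, 250e9))
print(equivalent_inductance(150.0, 250e9) * 1e12, "pH")
```

The imaginary part of the exact expression is positive whenever $R_s + r_b > 1/g_m$, which is the condition for the output to look inductive.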

The peaking in the frequency response in Fig. 3.23 happens because at the nodes $V_{EF}$ and $V_{\bar{EF}}$ in Fig. 3.22, the output impedance of the first pair of emitter followers is inductive (Fig. 3.25b), whereas the input impedance of the second pair of emitter followers is capacitive (Fig. 3.24b).

It has been argued in the circuit description of the 86 GHz frequency divider (Sec. 3.2.1.1) that taking the feedback from the first emitter followers reduces the feedback delay from the collectors to the bases of the latching differential pair, Q3,4, and is hence expected to increase the operating speed of the divider. However, further investigations during the design of this version of the static frequency divider and the optimization of the emitter follower sizes showed this expectation to be wrong. On the one hand, the delay increases when the feedback is taken from the second emitter follower. On the other hand, the signals at the output of the second emitter follower have steeper edges (shorter rise and fall times) in the time domain. This is especially true when the sizes of the two cascaded emitter followers are optimized, as was done for this version of the static frequency divider, in contrast to the 86 GHz version in Sec. 3.2.1.1. The optimization gives the second emitter followers a better driving capability than the first ones, because their higher biasing currents lead to steeper signal edges at their outputs. For example, in this version of the static frequency divider, the maximum frequency of operation in schematic simulation decreased from 120 to 108 GHz when the feedback was taken from the first emitter followers instead of the second ones.

Although the use of a cascode increases the frequency of operation by shifting the input poles to higher frequencies, in the design at hand the maximum frequency of operation increased by only 1 GHz after adding the cascode, which is very marginal. This is due to the optimization of the sizes of the two emitter followers, which led to a very small driving impedance at the output of the second emitter follower, so that the effect of the input pole was not dominant. A cascode was therefore not used in this design, which simplifies the layout and avoids the extra biasing network associated with it.

Inductive peaking by means of 100 pH ideal inductors did increase the maximum frequency of operation in schematic simulation from 120 to 140 GHz. This confirms the conclusion that the maximum frequency of operation here is limited not by the input pole but by the output pole. However, operation of the D-flip-flops, on which the static frequency divider is based, up to 120 GHz in schematic-level simulations was sufficient for the operation of the feedback loop in the second version of the DFE (Sec. 4.4). Therefore, inductive peaking was not employed. It has to be mentioned, however, that even without on-chip inductors, some gain peaking is expected due to the use of two emitter followers in the feedback path, as explained before and shown in Fig. 3.23c.

After optimization of the emitter follower sizes on the schematic level and drawing the layout, RLC parasitic extraction of the divider circuit was performed using Columbus-AMS®. According to post-layout simulation, the output oscillation frequency was found to be 37.55 GHz, indicating an input-referred self-resonance frequency of 75.1 GHz. The maximum input frequency at which the divider can work is 110 GHz with an input power of at least 0 dBm.

For measurements, the $\times 6$ frequency multiplier S10MS-AG from Agilent was to be used. The multiplier has a waveguide output and is able to deliver output frequencies in the range 75-110 GHz with a maximum output power of about 3 dBm at 110 GHz. The insertion loss in the signal path from the waveguide output of the multiplier to the probe tip (including the S-bend) was estimated to be about 2 to 3 dB, leaving 0 to 1 dBm of input power to the divider at 110 GHz, which is almost the minimum input power required for correct division in simulations. This makes the input power matching very critical for this version of the static frequency divider.
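The power budget described above amounts to a one-line calculation, sketched here with the figures quoted in the text (3 dBm source power, 2 to 3 dB path loss, 0 dBm required in simulation). The function names are illustrative.

```python
def power_at_probe_tip_dbm(source_dbm, path_loss_db):
    """Power available at the probe tip after the waveguide path loss."""
    return source_dbm - path_loss_db

def margin_db(source_dbm, path_loss_db, required_dbm):
    """Margin between the available power and the minimum power for correct division."""
    return power_at_probe_tip_dbm(source_dbm, path_loss_db) - required_dbm

for loss in (2.0, 3.0):
    print(loss, power_at_probe_tip_dbm(3.0, loss), margin_db(3.0, loss, 0.0))
```

With at most 1 dB of margin, any additional mismatch loss at the divider input can push the delivered power below the level needed for correct division.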

Traditionally for broadband matching, a $50\ \Omega$ resistor terminated to AC ground is used at the clock input. This provides adequate input power matching at frequencies where the input impedance of the transistors is high enough, but as the clock frequency increases, the junction capacitances of the transistors can no longer be ignored and they degrade the matching. This is shown in the simulated input reflection coefficient on the schematic level in Fig. 3.26.
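The degradation of the match with frequency can be reproduced with a first-order model in which the $50\ \Omega$ termination is shunted by a single lumped capacitance. The 30 fF value below is a hypothetical stand-in for the combined transistor and layout capacitance, not a value extracted from the circuit.

```python
import math

def input_reflection_db(freq_hz, c_in, r_term=50.0, z0=50.0):
    """|S11| in dB of a termination resistor in parallel with a lumped capacitance."""
    admittance = 1.0 / r_term + 1j * 2.0 * math.pi * freq_hz * c_in
    z_in = 1.0 / admittance
    gamma = (z_in - z0) / (z_in + z0)
    return 20.0 * math.log10(abs(gamma))

C_IN_ASSUMED = 30e-15  # hypothetical lumped input capacitance

for f_ghz in (1, 20, 60, 110):
    print(f"{f_ghz:4d} GHz: {input_reflection_db(f_ghz * 1e9, C_IN_ASSUMED):6.1f} dB")
```

In this simplified model the reflection coefficient rises monotonically with frequency, which is the trend that a narrowband matching element at 110 GHz is meant to counteract.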



Figure 3.26: Input reflection coefficient for the 100 GHz divider.

The input matching is even further degraded after parasitic extraction and also after simulating the parasitic-extracted divider core together with an EM-simulated  $60\mu\text{m} \times 60\mu\text{m}$  pad and a  $50\Omega$  MSL in TM2 connecting the pad to the circuit core.

To reduce the effect of the pad capacitance, its width was reduced to  $35\mu\text{m}$ , while keeping its length at  $60\mu\text{m}$  for a good contact with the probe-tip. Further decrease in the pad size was not possible, as measuring the circuit on-wafer becomes very difficult.

To further improve the input matching at 110 GHz, a short stub located very near to the divider core was used, as shown in the chip photo in Fig. 3.27. As a result, the input reflection coefficient improved by 12 dB at 110 GHz, as shown in Fig. 3.26. The disadvantage of using the short stub, however, is the poor matching at low frequencies. To overcome this disadvantage, a $13\mu\text{m} \times 55\mu\text{m}$ passivation window directly at the input of the short stub was used, as shown in Fig. 3.27. This passivation window allows the short stub to be cut away using precise needles on the wafer probe station, which restores the matching at low frequencies.



Figure 3.27: Chip photo of the 100 GHz divider

#### 3.2.2.2 Circuit Characterization

The divider was characterized on-wafer. As explained in the previous section, to provide input frequencies in the range 75-110 GHz, the  $\times 6$  frequency multiplier was employed. Another frequency multiplier with output range 60-90 GHz from the same series, namely, S12MS-AG was used to cover the input frequency range 60-75 GHz. In both ranges an external waveguide mechanical variable attenuator was used to determine the minimum input power required by the divider at a certain frequency for proper operation. For input frequencies less than 60 GHz, the CW generator was used directly without frequency multipliers. The output was observed on a spectrum analyzer.

Without applying any input, an output oscillation frequency of 35.17 GHz was observed on the spectrum analyzer, indicating an input-referred self-resonance frequency of 70.34 GHz. This is very close to post-layout simulations which yielded an output oscillation frequency of 37.55 GHz.

Fig. 3.28a and 3.28b show the sensitivity curves of the divider with the short stub and after cutting it off, respectively. The divider worked in both cases up to 100 GHz, compared to 110 GHz in post-layout simulations [7]. Indeed, the minimum required input power at 100 GHz for proper operation was -2.5 dBm in the case of the circuit with the short stub, compared to 0 dBm in the case without the short stub, which proves a better input matching at this frequency with the short stub. The fact that the divider worked only up to 100 GHz in both cases proves that this limit is due to the circuit itself and not due to the limited output power from the frequency multiplier.

With the short stub, the divider worked only down to 50 GHz because of the degraded input matching at low frequencies; after the stub was cut, it worked down to 5 GHz. The ostensible loss in sensitivity below 5 GHz in the case without the short stub is caused by the low slew rate of the input clock signal.



Figure 3.28: Sensitivity of the 100 GHz divider.

Measurement has shown that the short stub leads to a better sensitivity at 100 GHz, as less input power is required for proper functionality of the divider. Nonetheless, the input power available in measurement was sufficient for proper operation up to 100 GHz in both cases, and the limitation on the maximum operating frequency came from the divider itself. Operation without the short stub may therefore be more useful if the divider is intended for applications requiring proper operation over a wide range of frequencies up to 100 GHz.

### 3.3 Design of Broadband Clock Distribution Network

Differential signaling is preferred over single-ended because it reduces crosstalk [25] [41]. To deliver the differential clock signal to the D-flip-flops, the design of a clock distribution network was necessary.

Coupled MSTLs were selected to implement the differential TLs. The signal conductors of the coupled MSTLs are implemented in TM2 (Appendix A), whereas the ground plane is in M3. M1,2 were reserved for the negative and positive supplies, respectively. This isolates the clock tree from other components of the circuit.

Fig. 3.29 shows the odd-mode characteristic impedance values,  $Z_{o,odd}$ , and the lengths of the lines used in the design. To match the  $\frac{40}{3}=13.33\Omega$  to  $100\Omega$  from each side of the clock tree, two  $\lambda/4$  transformers were used [42]. For initial simulations the physical TL model in ADS® schematic entry, CLINP, was used.

Fig. 3.30 shows the difference in the input reflection coefficient between using one and two $\lambda/4$ sections for matching. With one section, the input reflection coefficient is below -10 dB from 40 to 60 GHz, compared to 20 to 70 GHz with two sections. Since wideband operation was necessary, the two-section matching scheme was chosen. The $80\Omega$ lines are terminated by on-chip $80\Omega$ resistors at the input of the clock buffers, which allows flexibility in choosing the lengths of the $80\Omega$ lines according to the layout of the equalizer core. This flexibility is further enhanced by adjusting the lengths of the $40\Omega$ lines.
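The benefit of a second $\lambda/4$ section can be illustrated with an idealized calculation. The sketch below is not the design procedure used in this work: it assumes lossless lines, a textbook maximally flat (binomial) two-section design, and a normalized frequency axis $f/f_0$. The function names are illustrative. For a $100\,\Omega$ to $13.33\,\Omega$ match, the binomial design gives section impedances of about $60.4\,\Omega$ and $22.1\,\Omega$, in the same range as the $57\,\Omega$ and $21\,\Omega$ odd-mode values used in the clock tree.

```python
import numpy as np

def line_abcd(z0, theta):
    """ABCD matrix of a lossless line with impedance z0 and electrical length theta (rad)."""
    return np.array([[np.cos(theta), 1j * z0 * np.sin(theta)],
                     [1j * np.sin(theta) / z0, np.cos(theta)]])

def reflection_db(sections, z_source, z_load, f_over_f0):
    """|Gamma| in dB seen from z_source into cascaded sections terminated in z_load."""
    theta = 0.5 * np.pi * f_over_f0           # each section is 90 degrees long at f0
    abcd = np.eye(2, dtype=complex)
    for z0 in sections:                       # ordered from source side to load side
        abcd = abcd @ line_abcd(z0, theta)
    (a, b), (c, d) = abcd
    z_in = (a * z_load + b) / (c * z_load + d)
    gamma = (z_in - z_source) / (z_in + z_source)
    return 20 * np.log10(max(abs(gamma), 1e-12))

def fractional_bandwidth(sections, z_source, z_load, limit_db=-10.0):
    """Width of the band (in units of f0) where the reflection stays below limit_db."""
    grid = np.linspace(0.05, 1.95, 1901)
    ok = [f for f in grid if reflection_db(sections, z_source, z_load, f) < limit_db]
    return max(ok) - min(ok)

Z_SOURCE = 100.0        # impedance to be presented at each side of the clock tree
Z_LOAD = 40.0 / 3.0     # three 40-ohm branches in parallel

# Single section: geometric mean of source and load impedance.
one_section = [np.sqrt(Z_SOURCE * Z_LOAD)]
# Two sections, maximally flat (binomial) design (assumption of this sketch).
two_section = [Z_SOURCE ** 0.75 * Z_LOAD ** 0.25,
               Z_SOURCE ** 0.25 * Z_LOAD ** 0.75]

bw_one = fractional_bandwidth(one_section, Z_SOURCE, Z_LOAD)
bw_two = fractional_bandwidth(two_section, Z_SOURCE, Z_LOAD)
print("two-section impedances:", [round(z, 1) for z in two_section])
print("-10 dB fractional bandwidth, one vs. two sections:", bw_one, bw_two)
```

In this idealized model the two-section match is noticeably wider than the single-section one, which is the same trend as in Fig. 3.30; the absolute band edges of the real coupled lines differ because of loss, dispersion and the actual layout.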



Figure 3.29: Odd-mode characteristic impedances and the lengths of the coupled MSTLs in the clock tree.



Figure 3.30: Reflection coefficient at the input of the clock tree.

The layout of the clock tree is shown in the chip photos of the first and second versions of the DFE in Fig. 4.3 and 4.23, respectively. Great care was taken to make the lengths of the coupled lines pairwise equal to avoid mode conversion. EM simulation using Momentum® was performed, and the resulting input reflection coefficient is shown in Fig. 3.30. The clock and data are laid out to propagate in the same direction, which gives a positive clock skew and thereby relaxes the timing conditions for the maximum operating bit rate [7] [24] (Sec. 2.1). In addition, to reduce the clock skew, the layout spacings between the different levels of D-flip-flops in the data propagation direction are made compact in the equalizer core.

According to the EM simulations, the loss from the clock tree input to each of its individual outputs amounts to about 12 dB. This loss is almost entirely due to power division: the input power is divided among 12 outputs, which ideally corresponds to $10\log_{10}(1/12)=-10.8$ dB. The rest of the loss is due to the two $\lambda/4$ transformers, which have lengths of 730 and 772 $\mu\text{m}$ for $Z_{o1,\text{odd}}$ and $Z_{o2,\text{odd}}$, respectively. The two $\lambda/4$ sections differ in length because the two coupled MSTLs implementing them have different geometries and consequently different effective dielectric constants. For $Z_{o1,\text{odd}}=57\,\Omega$, a signal conductor width of 10 $\mu\text{m}$ and a separation of 20 $\mu\text{m}$ have been employed, whereas for $Z_{o2,\text{odd}}=21\,\Omega$, the signal conductor width is 20 $\mu\text{m}$ and the separation is 2 $\mu\text{m}$.
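The loss budget above can be reproduced with a few lines of arithmetic. The sketch below assumes an ideal, lossless 12-way split and treats everything beyond it as excess loss; the function name is illustrative.

```python
import math

def ideal_split_loss_db(n_outputs):
    """Loss from the input to one output of an ideal, lossless N-way power split."""
    return 10 * math.log10(n_outputs)

total_loss_db = 12.0                       # EM-simulated loss quoted in the text
split_loss_db = ideal_split_loss_db(12)    # the clock tree has 12 outputs
excess_loss_db = total_loss_db - split_loss_db   # remainder, attributed to the lines

print(f"ideal split loss: {split_loss_db:.1f} dB, excess: {excess_loss_db:.1f} dB")
```

With these numbers only a little more than 1 dB remains for the two $\lambda/4$ transformers.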

An active balun (balanced to unbalanced) [43] at the input of the clock tree converts the single-ended input clock to differential clock and compensates partially for the losses of the clock tree. The clock buffers at the individual outputs of the clock tree then further compensate for the losses in the clock tree and deliver the required clock swing for proper operation of the D-flip-flops, which is about  $220 \text{ mV}_{\text{pp,diff}}$ . A detailed description of the design of the active balun and clock buffers is given in the next section.

## 3.4 Generating Differential Signals from Single-ended CW Sources

As explained in the previous section, a block to convert the single-ended clock input to differential at the input of the clock tree was necessary. Several on-chip solutions to convert single-ended to differential signals are available, among them are:

1. On-chip TL-based passive baluns such as the rat-race coupler [44] and the Marchand balun [45]: these are narrow-band solutions and occupy a large on-chip area at frequencies like 50 GHz. They generally have large insertion losses, even in the case of a Marchand balun based on broadside-coupled lines [46].
2. On-chip transformer-based baluns realized with vertically stacked symmetrical coils in the upper metalization layers of the technology: they present a very compact solution, especially at high frequencies. Although relatively high coupling coefficients between the primary and secondary coils are achievable [47] (which means smaller insertion loss), it is very hard to design a transformer-based balun with small magnitude and phase errors over a large bandwidth. An analysis and a solution to compensate the imbalance in phase and magnitude is found in [48], but only over a narrow bandwidth.
3. On-chip active baluns: these also present a very compact solution. Broadband operation with very low insertion loss, and even with gain, is possible; hence they were selected to convert the single-ended clock to a differential one.

Active baluns are usually implemented by either using a differential pair with one of its inputs AC-grounded (Fig. 3.31a) [49] or with the common-base common-emitter (CB-CE) configuration (Fig. 3.31b) [50] [51].



Figure 3.31: Two configurations for active balun implementation.

Ideally, the two outputs of the balun, $V_{out}$ and $V_{\overline{out}}$, should be equal in magnitude and $180^\circ$ out of phase. The magnitude error in dB is defined as $20\log|V_{out}| - 20\log|V_{\overline{out}}|$, whereas the phase error in degrees is defined as $|\angle V_{\overline{out}} - \angle V_{out}| - 180^\circ$. There is, however, an inherent deviation from the ideal case in the configurations shown in Fig. 3.31, even at low frequencies [49].
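As a minimal sketch, the two error definitions can be evaluated for a pair of complex output phasors as follows. The function name `balun_errors` is illustrative. One assumption is made beyond the text: the phase difference is wrapped into $[0^\circ, 360^\circ)$ before $180^\circ$ is subtracted, so that the sign of the phase error does not depend on where the individual phases happen to wrap.

```python
import cmath
import math

def balun_errors(v_out, v_out_bar):
    """Return (magnitude error in dB, phase error in degrees) for two complex phasors."""
    mag_err_db = 20 * math.log10(abs(v_out)) - 20 * math.log10(abs(v_out_bar))
    # Phase difference wrapped to [0, 360) degrees (assumption of this sketch).
    diff_deg = math.degrees(cmath.phase(v_out_bar) - cmath.phase(v_out)) % 360.0
    phase_err_deg = diff_deg - 180.0
    return mag_err_db, phase_err_deg

# Ideal balun: equal magnitudes, exactly 180 degrees apart.
ideal = balun_errors(1 + 0j, -1 + 0j)

# Hypothetical imperfect balun: 5 % larger inverted output, 2 degrees of extra phase.
imperfect = balun_errors(1 + 0j, cmath.rect(1.05, math.radians(182.0)))
print(ideal, imperfect)
```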

The differential pair in Fig. 3.31a can be taken as an example to illustrate the deviation from the ideal balun behavior. Fig. 3.32 shows the small-signal low-frequency equivalent circuit. Assuming the two inputs have the same DC bias, the tail current, $I_{bias}$, divides equally between the two branches of the differential pair, and Q1,2 will have the same small-signal parameters.

Figure 3.32: Small-signal model of the differential pair based balun.

From the small-signal equivalent circuit in Fig. 3.32, the two outputs are related to the input by the following relations

$$v_{out} = -g_m Z v_{\pi 2} \quad (3.27)$$

$$v_{\overline{out}} = -g_m Z v_{\pi 1} \quad (3.28)$$

Applying Kirchhoff's voltage law (KVL) and Kirchhoff's current law (KCL) to Fig. 3.32, the following two equations can be written

$$v_{in} = v_{\pi 1} - v_{\pi 2} \quad (3.29)$$

$$g_m v_{\pi 1} + g_m v_{\pi 2} + \frac{v_{in} + v_{\pi 2}}{r_{\pi}} = -v_{\pi 2} \frac{r_o + r_{\pi}}{r_o r_{\pi}} \quad (3.30)$$

Substituting from Eqn. 3.29 in Eqn. 3.30 to eliminate  $v_{\pi 1}$  and obtain an expression of  $v_{\pi 2}$  in terms of  $v_{in}$ , we obtain

$$v_{\pi 2} = -\frac{r_o(1 + g_m r_{\pi})}{2r_o(1 + g_m r_{\pi}) + r_{\pi}} v_{in} \quad (3.31)$$

Then, again substituting from Eqn. 3.29 in Eqn. 3.30 now to eliminate  $v_{\pi 2}$  and obtain an expression of  $v_{\pi 1}$  in terms of  $v_{in}$ , we obtain

$$v_{\pi 1} = \frac{r_o(1 + g_m r_{\pi}) + r_{\pi}}{2r_o(1 + g_m r_{\pi}) + r_{\pi}} v_{in} \quad (3.32)$$

Now substituting from Eqn. 3.31 and 3.32 into Eqn. 3.27 and 3.28, respectively, we obtain

$$v_{out} = g_m Z \frac{r_o(1 + g_m r_\pi)}{2r_o(1 + g_m r_\pi) + r_\pi} v_{in} \quad (3.33)$$

$$v_{\overline{out}} = -g_m Z \frac{r_o(1 + g_m r_\pi) + r_\pi}{2r_o(1 + g_m r_\pi) + r_\pi} v_{in} \quad (3.34)$$

Eqn. 3.33 and 3.34 show that even at low frequencies the differential pair deviates from an ideal balun, because of the  $r_\pi$  in the numerator in Eqn. 3.34. Even though the asymmetry between  $v_{out}$  and  $v_{\overline{out}}$  in Eqn. 3.33 and 3.34, respectively, is very small at low frequencies, it becomes rapidly worse with increasing frequency because of the transistor junction capacitances. A similar analysis can be carried out for the CB-CE configuration in Fig. 3.31b.
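The size of the low-frequency asymmetry can be checked numerically. The sketch below solves Eqn. 3.29 and 3.30 as a $2\times2$ linear system and compares the result with the closed forms in Eqn. 3.33 and 3.34. The element values ($g_m = 40$ mS, $r_\pi = 2.5$ k$\Omega$, $r_o = 10$ k$\Omega$, $Z = 50\,\Omega$) are hypothetical and chosen only for illustration; they are not taken from the design.

```python
import math
import numpy as np

def solve_balun(gm, r_pi, r_o, z, v_in=1.0):
    """Solve KVL (3.29) and KCL (3.30) for v_pi1, v_pi2 and return (v_out, v_out_bar)."""
    # KVL: v_pi1 - v_pi2 = v_in
    # KCL, using v_in + v_pi2 = v_pi1:
    #   (gm + 1/r_pi) * v_pi1 + (gm + 1/r_pi + 1/r_o) * v_pi2 = 0
    a = np.array([[1.0, -1.0],
                  [gm + 1.0 / r_pi, gm + 1.0 / r_pi + 1.0 / r_o]])
    v_pi1, v_pi2 = np.linalg.solve(a, np.array([v_in, 0.0]))
    return -gm * z * v_pi2, -gm * z * v_pi1          # Eqn. 3.27 and 3.28

def closed_form(gm, r_pi, r_o, z, v_in=1.0):
    """Closed-form outputs according to Eqn. 3.33 and 3.34."""
    den = 2 * r_o * (1 + gm * r_pi) + r_pi
    v_out = gm * z * r_o * (1 + gm * r_pi) / den * v_in
    v_out_bar = -gm * z * (r_o * (1 + gm * r_pi) + r_pi) / den * v_in
    return v_out, v_out_bar

GM, R_PI, R_O, Z = 0.04, 2500.0, 10e3, 50.0          # hypothetical values
numeric = solve_balun(GM, R_PI, R_O, Z)
analytic = closed_form(GM, R_PI, R_O, Z)
mag_err_db = 20 * math.log10(abs(numeric[0]) / abs(numeric[1]))
print(numeric, analytic, mag_err_db)
```

For these values the magnitude error is only a few hundredths of a dB, which illustrates why the low-frequency asymmetry is described as very small.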



Figure 3.33: Deviation from ideal balun behavior for the two configurations in Fig. 3.31a and 3.31b.

Fig. 3.33a and 3.33b compare the magnitude and phase errors, respectively, of the two configurations in Fig. 3.31a and 3.31b. In the frequency range 20-70 GHz, which matches the wideband design of the clock tree, the magnitude and phase errors of both configurations are very close. Nevertheless, biasing the differential pair configuration is easier than biasing the CB-CE configuration. Therefore, a differential pair implementation was chosen in this work.

### 3.4.1 Circuit Description



Figure 3.34: The block diagram of the active balun.

Fig. 3.34 shows the block diagram of the designed active balun. Three stages are employed to provide the required gain and to increase the common mode rejection ratio (CMRR), hence decreasing the phase and magnitude errors [49]. The stages are AC-coupled to avoid the propagation of DC offsets from one stage to the other.



Figure 3.35: First stage of the active balun.

Fig. 3.35 shows the schematic of the first stage. The second stage is similar to the first stage, except that there are no extra emitter followers at its input.

Active inductors (Q3,4 in Fig. 3.35) were utilized in the first and second stages to shape the frequency response of the active balun so that it fits the operating frequency range of the clock tree. Similar to the emitter follower explained in Sec. 3.2.2.1 and shown in Fig. 3.25b, the impedances seen at the emitters of Q3,4 contain inductive and resistive parts. The value of $R_s$ and the transistor size were selected to place the gain peak around 65 GHz instead of 50 GHz, to account for the effect of the parasitics after layout, as no parasitic extraction tool was available in the design kit during the design of this active balun. Active inductors were preferred over spiral inductors because:

1. Active inductors take up less area. This was necessary because the design of the first and second stages was reused for the clock buffers at the individual outputs of the clock tree. The layout of these clock buffers had to be very compact to fit between the different stages of the equalizer core, which in turn was necessary to reduce the interconnect parasitics between the D-flip-flops and the clock skew.
2. At low frequencies, the output impedance seen at the emitters of Q3,4 in Fig. 3.35 is very low ( $\approx \frac{1}{g_m}$ ). This helps to suppress the gain outside the band of interest and hence reduces the risk of oscillations, especially when the gain is high and no parasitic extraction is available.

The third stage consists of a simple differential pair with  $50\Omega$  resistive loads. Fig. 3.36 shows the photo of the fabricated chip.

### 3.4.2 Circuit Characterization

Figure 3.36: Chip photo of the active balun.

A 4-port S-parameter measurement up to 67 GHz was performed on wafer using the 2-port VNA 8361A and its 4-port test set extension M4421B H67 from Agilent. Fig. 3.37a and 3.37b show the magnitude and phase of the measured and simulated forward transmission coefficients, with port 1 as the input port and ports 3 and 4 as the two output ports. Here, post-layout simulation was used for the comparison, since parasitic extraction had become available in the design kit by the time of the measurement.



Figure 3.37: Measured and simulated (with parasitics) forward transmission coefficients ( $S_{21}$  and  $S_{31}$ ).

Measured magnitude and phase errors between the two outputs are depicted in Fig. 3.38. Maximum measured magnitude and phase errors are 0.6 dB and  $2.5^\circ$ , respectively. The maximum magnitude and phase errors in post-layout simulation are 0.2 dB and  $0.6^\circ$ , respectively. Fig. 3.39a and 3.39b show the measured input and output reflection coefficients as well as the reverse transmission coefficients.

Figure 3.38: Measured phase and amplitude errors.

Figure 3.39: Measured input and output reflection coefficients as well as reverse transmission coefficients.

To characterize the balun in the time domain, an Anritsu V240C resistive power divider was used to split the output signal from a CW generator (SMR 60 from Rohde&Schwarz). One branch was used to excite the input of the active balun through a GSG probe, and the other branch provided the triggering signal for the Infiniium DCA 86100B wide-band sampling oscilloscope from Agilent. At the balun output a GSGSG probe was used and the two outputs were monitored on the oscilloscope. Measurement was possible only up to 45 GHz because of constraints on the frequency of the triggering signal of the oscilloscope.

Fig. 3.40 shows the two out-of-phase output signals (blue and red) and their difference (yellow) when the balun is excited at its input by a $184 \text{ mV}_{\text{pp}}$ ( $\approx -11 \text{ dBm}$ ) sine wave at 45 GHz. The subtracted signal has a peak-to-peak jitter of 1.4 ps and an rms jitter of 187 fs (compared to 1.3 ps of peak-to-peak jitter of the input signal from the CW source).

Figure 3.40: Time domain measurement with a -11 dBm input signal at 45 GHz (X-axis scale is 5 ps/div and Y-axis scale is 40 mV/div).

The cable and connector losses from the outputs of the balun to the sampling heads of the oscilloscope amount to about 4.5 dB. The amplitude of the differential signal at the out-of-phase outputs measured at the sampling heads of the oscilloscope is about $215 \text{ mV}_{\text{pp}}$ (= -9.4 dBm) in Fig. 3.40. Taking the cable losses into account, the amplitude of the differential signal at the out-of-phase outputs of the balun itself is therefore about $362 \text{ mV}_{\text{pp}}$ (= -4.9 dBm). When the balun was excited with the same input power at the same frequency in post-layout simulation, after parasitic extraction using Columbus-AMS®, a differential signal with an amplitude of $367 \text{ mV}_{\text{pp}}$ (= -4.7 dBm) was observed at the out-of-phase outputs, which is very close to the measurement result. The balun works from a 5 V supply and dissipates 278 mW of power. The dimensions of the balun without the pads are 187 $\mu$m $\times$ 117 $\mu$m, whereas with pads they are 900 $\mu$m $\times$ 520 $\mu$m.
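The conversions between peak-to-peak voltage and power used above can be reproduced with the short sketch below. It assumes a sinusoidal signal and a $50\,\Omega$ reference impedance; the text does not state the reference impedance, but the quoted dBm values appear consistent with this assumption. The function names are illustrative.

```python
import math

def vpp_to_dbm(v_pp, r_ohm=50.0):
    """Power in dBm of a sine wave with peak-to-peak voltage v_pp (V) across r_ohm."""
    v_rms = v_pp / (2.0 * math.sqrt(2.0))
    return 10.0 * math.log10(v_rms ** 2 / r_ohm / 1e-3)

def undo_loss(v_pp, loss_db):
    """Voltage before a lossy path, given the voltage measured after it."""
    return v_pp * 10.0 ** (loss_db / 20.0)

input_dbm = vpp_to_dbm(0.184)               # input sine wave at the balun
scope_dbm = vpp_to_dbm(0.215)               # differential signal at the sampling heads
balun_vpp = undo_loss(0.215, 4.5)           # referred back to the balun outputs
balun_dbm = vpp_to_dbm(balun_vpp)
print(input_dbm, scope_dbm, balun_vpp, balun_dbm)
```

Under these assumptions the de-embedded amplitude comes out at roughly 0.36 V, in line with the value quoted in the text.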

Table 3.1 compares the results of the presented active balun with other active baluns with bandwidth in excess of 2 GHz found in the literature at the time of its publication [43]. Among the active baluns fabricated in SiGe technologies, it has the smallest area and the smallest phase error. On the other hand, it has the highest power dissipation, which is attributed to the high gain.

|                                  | This work    | [51]         | [52]         | [53]                       | [54]         |
|----------------------------------|--------------|--------------|--------------|----------------------------|--------------|
| Freq range (GHz)                 | 31-65        | 54-59        | 2-40         | 1-16 and 10-30             | 0.2-22       |
| Gain (dB)                        | 12.4         | -1.4         | 1            | 1                          | 5            |
| Gain Imbalance (dB)              | 0.6          | 1.2          | 0.5          | 0.9 and 0.5                | 0.5          |
| Phase imbalance (deg)            | 2.5          | 5            | 10           | 5                          | 4            |
| Power (mW)                       | 278          | 10.4         | 40           | -                          | 166          |
| Area with pads (mm$^2$)          | 0.47         | 0.63         | 0.56         | 0.36                       | 0.7          |
| Technology                       | 0.13 μm SiGe | 0.13 μm SiGe | 0.13 μm CMOS | 0.5 and 0.15 μm GaAs pHEMT | 0.25 μm SiGe |

\*simulation

Table 3.1: State of the art in active baluns with bandwidth in excess of 2 GHz

# Chapter 4

## Design and Testing of the Proposed DFE for 80 and 110 Gb/s

This chapter presents the integration of the active and passive components from Chapter 3 into the proposed modified half-rate parallel look-ahead architecture described in Sec. 2.2.4. It explains the design and measurement of two versions of the DFE. The first version works up to 80 Gb/s and the second one up to 110 Gb/s.

### 4.1 Integrating the Components

#### 4.1.1 Floor Planning

The modified half-rate parallel look-ahead DFE architecture described in Sec. 2.2.4 is shown again in Fig. 4.1. In a conventional implementation of a one-tap DFE, the even and odd channels share the same reference voltage (i.e., $V_{ref\_odd}=V_{ref\_even}$). In this implementation, however, the reference voltages of the even and odd channels, $V_{ref\_even}$ and $V_{ref\_odd}$, can be controlled independently to facilitate the measurement techniques described later in Sec. 4.2.1 and 4.2.2.



Figure 4.1: The modified half-rate parallel look-ahead DFE.

The floor planning is shown in Fig. 4.2. The chip has two inputs, namely the single-ended clock and the differential input data, and two differential outputs, namely the even and odd output data. These inputs and outputs are arranged on the four sides of the chip, since on-wafer measurement was intended and only one high-frequency probe can be used from each side. The pads have the configuration PGSGSGP (P: power, G: ground, S: signal) to match the configuration of the probe needles. Two pads for $V_{cc}$ and $V_{ee}$ were used to supply the DC current to the circuit, which is around 800 mA. The use of two DC supplies, $V_{cc}=2$ V and $V_{ee}=-3$ V, allows bias-free data input.

The core of the equalizer is the architecture in Fig. 4.1. Although all the signals in Fig. 4.1 are differential to reduce crosstalk [25], a single-ended representation was used for simplicity. The chip photo of the DFE is shown in Fig. 4.3.



Figure 4.2: Floor planning of the modified half-rate parallel look-ahead DFE.



Figure 4.3: Chip photo of the DFE.

### 4.1.2 Layout Techniques

Several techniques were utilized in the layout of the equalizer to reduce the effect of the parasitics. Among them are:

1. In ECL gates with several inputs, the input-output propagation delay is greater for the inputs associated with transistors at the lower level of the transistor stack [28] [55]. This means that for the multiplexers, MUX\_1-4, in Fig. 4.1, the delay from the selection line S to the MUX output  $t_{sq}$  is always greater than the delay from the input lines  $I_1, I_0$  to the MUX output  $t_{iq}$ . Therefore, the cross coupling of the even and odd channels was made before the latches LT\_1-4 so as to shorten the length of the interconnects from the outputs of LT\_1-4 to the selection lines of MUX\_1-4 and let the signals at the selection lines arrive earlier than the signals at the inputs. Taking advantage of the multiple thin metal layers M1-M5 offered by the technology (Appendix A), four layers M2-M5 were used to implement the cross coupling. A simplified layout without the interconnects of the clock distribution is shown in Fig. 4.4.
2. The components in the feedback loop at the end of the even and odd channels, namely MUX\_5,6 and FF\_5,6, are arranged in the layout so as to reduce the interconnect delay, as shown in Fig. 4.5.
3. Two power supply planes are employed, M1,M2 for  $V_{ee}, V_{cc}$ , respectively. These planes cover almost all the chip area and act as a large distributed parallel-plate decoupling capacitor [41]. Furthermore, decoupling metal-insulator-metal (MIM) capacitors are inserted and distributed all over the chip. Local concentration of decoupling capacitance is avoided because it may cause resonances in combination with supply plane inductance. The global power supply planes are realized by a metalization mesh at the maximum permissible metal density allowed by the technology. Implementing  $V_{ee}$  plane in M1 eases the connection to the substrate. To isolate the clock distribution network from the power supply planes, the ground plane of the clock tree is implemented in M3.



Figure 4.4: Simplified layout of a part of the equalizer core in Fig. 4.1 containing FF\_1-4, LT\_1-4, and MUX\_1-4.



Figure 4.5: Simplified layout of the feedback at the end of the odd channel.

4. To avoid distributed millimeter-wave effects, the layout of the equalizer core is made very compact in width, as shown in Fig. 4.4. As a further benefit, the two front-end comparators of the even and odd channels sample the incoming bit at almost the same instant.
5. The inductive peaking in the front-end comparators is realized with small segments of transmission lines in the uppermost metal layer TM2 (Sec. 3.1). These segments are extended in the direction of data propagation to avoid increasing the spacing between the front-end comparators in the direction perpendicular to the data propagation. A chip photo showing them is found in Fig. 3.15. Also, since the spacing and consequently the clock skew between the D-flip-flops is dictated by the width of clock buffers layout, the layout of those buffers is made very compact as well.

## 4.2 Measurement Techniques

For full characterization of the DFE in Fig. 4.1, a bit pattern generator capable of producing a PRBS at the full rate of $R_b$ is necessary. Also, a data signal emulator (DSE) is needed to generate electrical or optical data signals with different amounts of ISI [56]. Neither piece of equipment was available at the Department of High-Frequency Electronics at the University of Paderborn or at IHP, where the activities of this work took place. Consequently, other strategies and measurement techniques for testing the DFE had to be devised with the help of the available bit pattern generator SHF 12100B and the bit error rate tester SHF 11100B, which work up to 56 Gb/s. These techniques, described in Sec. 4.2.1 and 4.2.2, provide a simple method to determine the maximum operating bit rate of the DFE [7] [30].

The effectiveness of the measurement techniques presented in Sec. 4.2.1 and 4.2.2 is confirmed by experiments at the full rate of 80 Gb/s in Sec. 4.3 [7] [56]. These experiments were performed at Bell Labs, Alcatel-Lucent, Holmdel, New Jersey, USA, where the above-mentioned equipment for measurement at the full rate was available.

In addition to the bandwidth requirement for the front-end of the DFE discussed in Sec. 3.1.1, the maximum operating bit rate of the DFE depends on the timing conditions, discussed in Sec. 2.2.4, regarding the combinational logic (MUX\_1-4) and the feedback loops (FF\_5,6 and MUX\_5,6).

The tests described in the following two sections provide a method to determine the maximum bit rate (the smallest UI) for which these timing conditions are satisfied, and consequently the maximum operating bit rate of the DFE.

### 4.2.1 Measuring the Maximum Operating Bit Rate of the Combinational Logic

To determine the maximum bit rate for which the timing conditions through the combinational logic (MUX\_1-4) are satisfied, a bit rate of $R_b/2$ is applied at the input, and the clock frequency in GHz is also equal to $R_b/2$. As shown in Fig. 4.6, $V_{ref\_odd}$, $-V_{ref\_odd}$ are given values that force the outputs of FF\_3,4 to logic '1','0', respectively, independent of the DFE input, while $V_{ref\_even}$, $-V_{ref\_even}$ are connected to ground to allow the even channel to sample the input bit sequence. Here, as shown in Fig. 4.6, MUX\_1,2 will have their input lines $I_1, I_0$ toggling with the same bit rate as the input, which is $R_b/2$, whereas their selection lines S do not. The opposite will happen with MUX\_3,4, which will have their selection lines toggling, whereas their input lines do not. In Fig. 4.6, if the timing conditions for proper transmission of the data through MUX\_1-4 are satisfied, then the same bit sequence at the input will appear at the outputs of FF\_7-10. The feedback loops in this case will yield the same bit sequence at the outputs of the odd and even channels.

Figure 4.6: Measuring the maximum operating bit rate of the combinational logic.

In a full-rate test, the input and selection lines of the multiplexers may all be toggling, but not necessarily with the same bit sequence. Nevertheless, in the half-rate test described here, observing the same input bit sequence at the outputs of the even and odd channels is a good indication of the ability of the combinational logic to work at $R_b/2$, which is the bit rate through the even and odd channels in a full-rate test with $R_b$ at the input.

Furthermore, if the same bit sequence applied at the input is observed at the outputs of both the even and odd channels then the clock tree design satisfies the clock skew conditions previously discussed in Sec. 2.2.4.

This test, however, does not give a good indication of the maximum working frequency of the feedback loops of the even and odd channels, because an identical bit sequence appears at the two inputs  $I_1, I_0$  of MUX\_5,6. Therefore, the next section describes another test for the feedback loops.

In measurement, a PRBS of length $2^{31} - 1$ is generated by the bit pattern generator with 200 mV single-ended logic. The BER is measured by the bit error rate tester. An external phase shifter, namely the Spectrum Elektronik LS-P150-HFHM, is used to manually adjust the phase of the clock with respect to the input data. The clock signal in this case is also taken from the bit pattern generator, which provides a replica of its clock input. Testing was performed on-wafer, as shown in Fig. 4.7a. The BER was less than $10^{-13}$ from 25 to 45 Gb/s at the even and odd outputs, confirming the wideband response of the clock tree described in Sec. 3.3. At 46 Gb/s errors start to occur and the BER becomes $4 \times 10^{-11}$, as shown in Fig. 4.7b.



Figure 4.7: (a) Measurement setup for the maximum operating bit rate of the combinational logic (b) Bit error rate.

At the time of the measurement, parasitic extraction became available in the design kit. To investigate the increase in the BER beyond 45 Gb/s, post-layout simulation of the equalizer core of Fig. 4.6 was performed after RLC parasitic extraction with Columbus-AMS®. To take into account the effect of clock skew, the delay between the different outputs of the clock tree was modeled in the post-layout simulation by using simple ideal sinusoidal voltage sources at the different clock inputs of the equalizer core. The phases of these ideal sources correspond to the phase information extracted from the EM simulation of the clock tree using Momentum®. The simple bit sequence shown in Fig. 4.8 was used as the input in post-layout simulations.

Figure 4.8: The input bit sequence at 40 Gb/s used in the post-layout simulation of the equalizer core.

The same bit sequence was observed at the outputs of the even and odd channels up to a bit rate of 44 Gb/s. Increasing the bit rate beyond 44 Gb/s led to some bit errors at the outputs of the even and odd channels. This agrees with the measurement results presented earlier in this section.

These bit errors occur since each of the output signals of FF\_1-4:  $A(2m)$ ,  $B(2m)$ ,  $A(2m + 1)$  and  $B(2m + 1)$  is connected to two input lines of the MUXs and to one retiming latch at the selection lines of the MUXs, as shown in the simplified layout in Fig. 4.4. This increased fan-out produced heavy loading at the outputs of FF\_1-4, as evident from the long rise and fall times in Fig. 4.9, where the timing waveforms at three different bit rates, namely 40, 45 and 50 Gb/s, are depicted to assist in the explanation.



Figure 4.9: Post-layout simulation of the differential signal  $A(2m)$ : (a) 40 Gb/s, (b) 45 Gb/s, (c) 50 Gb/s.



Figure 4.10: Post-layout simulation of the differential output of MUX\_1: (a) 40 Gb/s, (b) 45 Gb/s, (c) 50 Gb/s.



Figure 4.11: Post-layout simulation of the differential signal at the output of FF\_5: (a) 40 Gb/s, (b) 45 Gb/s, (c) 50 Gb/s.

Although MUX\_1-4 are able to restore the signal shape at their outputs, as shown in Fig. 4.10, the delay caused by this loading effect makes FF\_7-10 sample the outputs of MUX\_1-4 at the wrong time instants (i.e., the setup time of FF\_7-10 is violated) and errors start to occur, as shown in Fig. 4.11, where the bit errors are marked. This effect of fan-out was not taken into account in Chapter 2 when analyzing the timing conditions of the different architectures.

#### 4.2.2 Measuring the Maximum Operating Frequency of the Feedback Loop

The following test can be utilized to determine the maximum bit rate for which the timing condition of the feedback loops is satisfied. In this test, as shown in Fig. 4.12,  $V_{ref\_even}, -V_{ref\_even}$  are given values to enforce  $A(2m), B(2m)$  to have logic '0', '1', respectively, independent of the input, while  $V_{ref\_odd}, -V_{ref\_odd}$  are given values to force  $A(2m+1), B(2m+1)$  to have logic '1', '0', respectively. Since MUX\_1-6 implement the Boolean algebra relation:

$$Q = I_1S + I_0\bar{S} \quad (4.1)$$

it follows that  $f_1(2m), f_2(2m)$  will always have logic '0', '1', respectively. Also  $f_1(2m+1), f_2(2m+1)$  will always have logic '0', '1', respectively. The inputs of MUX\_5,6 will become  $I_1=0, I_0=1$ . According to Eqn. 4.1, MUX\_5,6 act as logic inverters in the path from the outputs of FF\_5,6 to their inputs. Both feedback loops in this case act as static frequency dividers. The maximum frequency at which these static frequency divider-like loops work will be equal to the maximum bit rate, at which the feedback loop works properly.
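The divider-like behavior described above can be checked with a small behavioral sketch. It models Eqn. 4.1 on 0/1 integers and idealizes one feedback loop as a single edge-triggered flip-flop whose output drives the MUX selection line. The function names `mux` and `feedback_loop` are illustrative, and all delays are ignored, so the sketch shows only the logic and not the timing limit.

```python
def mux(i1, i0, s):
    """2:1 multiplexer of Eqn. 4.1: Q = I1*S + I0*not(S), on 0/1 integers."""
    return (i1 & s) | (i0 & (1 - s))

def feedback_loop(i1, i0, n_clock_edges, q_init=0):
    """Flip-flop output after each active clock edge; the output selects the MUX input."""
    q = q_init
    trace = []
    for _ in range(n_clock_edges):
        q = mux(i1, i0, q)        # MUX output is captured at the next clock edge
        trace.append(q)
    return trace

# Test condition of this section: I1 = 0, I0 = 1, so the MUX inverts its select line.
inverter_table = [mux(0, 1, s) for s in (0, 1)]
divider_trace = feedback_loop(0, 1, 6)

# Opposite input assignment: the MUX passes its select line, so the loop holds its state.
hold_trace = feedback_loop(1, 0, 4)
print(inverter_table, divider_trace, hold_trace)
```

With $I_1=0, I_0=1$ the flip-flop output toggles on every clock edge, so its period is two clock cycles, which is the behavior of a static divide-by-two circuit.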

It should be noted here that the condition of oscillation, however, is never satisfied in normal operation of the DFE when  $V_{ref\_even}=V_{ref\_odd}=V_{ref}$ . This stems from the fact that it only makes sense to assume positive values for  $V_{ref}$ , because a preceding bit of '1' can only increase the level of the current bit to be decided and a preceding bit of '0' can only decrease it [2].



Figure 4.12: Measuring the maximum operating frequency of the feedback loop.

As illustrated in Fig. 4.13, with positive values of  $V_{ref}$ , it is impossible to obtain  $A(m), A(2m+1)=1$  and  $B(m), B(2m+1)=0$ , which is the condition of oscillation.

In the measurement, a continuous wave (CW) generator is used as the input clock, and the output from the even or the odd channels is monitored by an Agilent E4448A spectrum analyzer. The test result indicates an operating range from 19 to 40 GHz, which further confirms the wideband response of the clock tree and the active balun as described in Sec. 3.3. A 15 GHz signal was observed on the spectrum analyzer without applying an input clock, which is the output oscillation frequency of the divider-like-operating feedback loop.

Using the test procedures outlined in this section, the post-layout simulation yielded 39 GHz as the maximum clock frequency of the feedback loop and 15.5 GHz for the output oscillation frequency. This agrees with the measurement results presented earlier.

Figure 4.13: Possible logic values for $A$ and $B$ from the front-end comparators for different analog input.

Although the maximum operating bit rate of the combinational logic in Sec. 4.2.1 is 45 Gb/s, the DFE as a whole can work only up to 80 Gb/s due to the operating frequency limitation of the feedback loops.

It is useful here to mention that the same test described in this section for the feedback loop can also be applied to the look-ahead architecture described in Sec. 2.2.2 to check the maximum operating frequency of its feedback loop.

### 4.3 Measurement at the Full Bit Rate

To facilitate measurements at the full rate of $R_b=80$ Gb/s at Bell Labs, Alcatel-Lucent, a mounting board was designed to provide $V_{ee}$, $V_{cc}$, $V_{ref\_odd}$ and $V_{ref\_even}$ to the DFE via wire bonds, while the data input, clock and even and odd outputs were accessed by on-wafer 67 GHz probes with GSGSG configuration. The board was laminated on a 0.8 mm thick copper plate to improve heat dissipation. The chip was attached to the copper plate with heat-conducting glue.

In this measurement setup, $V_{ref}=V_{ref\_odd}=V_{ref\_even}$. The DSE generates an electrical or optical data signal with different amounts of ISI [56]. The outputs from the even and odd channels are 1:4 demultiplexed with SHF 423 demultiplexers and the BER is measured at 10 Gb/s.

A tunable phase shifter electronically controls the phase difference between the clock and the data at the input of the DFE. The software LabVIEW is employed to vary  $V_{ref}$  and the phase difference between the clock and the data at the input of the DFE in 5 mV and 0.5 ps steps, respectively. This allows the changing of the sampling threshold  $V_{ref}$  and sampling time instant of the D-flip-flops at the front-end of the DFE across the received eye-diagram, hence the evaluation of the ISI effect on BER as a function of both. The sampling time offset,  $V_{ref}$ , and BER are plotted as contour maps, which are plots showing the BER in different colors as a function of the sampling time instant offset from the optimum sampling time at the center of the eye-diagram on the x axis and  $V_{ref}$  on the y-axis. In the BER contour maps, the case where the feedback in the DFE is not active corresponds to  $V_{ref}=0$ , which indicates that the DFE front-end is only sampling the current bit in the input sequence without any feedback effect from the previous bit. Several experiments have been carried out to explore the ability of the DFE to equalize different kinds of optically and electrically generated ISI. Only two of them are presented here and the reader is referred to [56] for more details about the measurement setup as well as more experiments.

In the first experiment, whose setup is shown in Fig. 4.14, two identical 40 Gb/s PRBS sequences of length $2^{15} - 1$ are generated by a bit pattern generator on the transmitter side. One sequence is delayed by several bit durations at 40 Gb/s with respect to the other before the two are multiplexed into an 80 Gb/s stream using a MICRAM MX2180 2:1 multiplexer. Fig. 4.15a shows the eye diagram at the output of the multiplexer when it is set to its lowest output amplitude. Fig. 4.15b shows the BER contour map obtained when the output of the multiplexer is connected to the DFE. Here, the center part of the BER contour map shows only marginal improvement with feedback, indicating low ISI in the received signal at the input of the DFE, which is expected from the eye diagram in Fig. 4.15a.

Figure 4.14: Measurement setup for testing the DFE at 80 Gb/s.

Figure 4.15: (a) The 80 Gb/s single-ended eye-diagram at the input of DFE without distortion (30mV/div and 5ps/div) (b) The corresponding BER versus $V_{ref}$.

A DSE consisting of a low-pass filter with a 3 dB bandwidth of 20 GHz from Picosecond Pulse Labs is then employed to introduce ISI. Fig. 4.16 shows the measured $S_{21}$ of this filter. The insertion loss at the Nyquist frequency (40 GHz for 80 Gb/s) is 12 dB.

Because of the ISI it introduces, the filter heavily distorts the signal at the input of the DFE, as shown in Fig. 4.17a. The differential output from the DSE is connected to the DFE using low-dispersion cables. The DFE improved the BER from 0.5, when $V_{ref}=0$, to less than $10^{-9}$ when the feedback is active (i.e., $V_{ref} > 0$), as shown in the BER contour map in Fig. 4.17b.

Figure 4.16: The measured $S_{21}$ of the 20 GHz low-pass filter.

Figure 4.17: (a) The distorted single-ended eye-diagram at the output of the 20 GHz low-pass filter (50mV/div and 5ps/div) (b) The corresponding BER contour map.

Fig. 4.18a shows the BER versus $V_{ref}$ at sampling time offset=0 in the BER contour map of Fig. 4.17b. Fig. 4.18a can be thought of as a vertical slice of the BER contour map of Fig. 4.17b at sampling time offset=0. The achievable BER, however, is not limited to $10^{-9}$; since the BER is measured at 10 Gb/s, resolving lower values for every point of the contour maps was impractical because of the long measurement times required.
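To give a feeling for why lower BER values become time-consuming, the following sketch estimates the observation time per contour-map point. It uses the common zero-error confidence bound $N = -\ln(1-CL)/\mathrm{BER}$, which is a textbook rule and not necessarily the procedure used in these measurements. The 95% confidence level and the function names are assumptions; the 10 Gb/s measurement rate is taken from the text.

```python
import math


def bits_required(ber_target, confidence=0.95):
    """Error-free bits needed to claim BER < ber_target at the given
    confidence level (assumes independent errors and zero observed errors)."""
    return -math.log(1.0 - confidence) / ber_target


def measurement_time_s(ber_target, bit_rate_bps, confidence=0.95):
    """Observation time in seconds for one measurement point."""
    return bits_required(ber_target, confidence) / bit_rate_bps


if __name__ == "__main__":
    for ber in (1e-9, 1e-12):
        t = measurement_time_s(ber, 10e9)  # BER is measured at 10 Gb/s
        print(f"BER target {ber:g}: {t:.3g} s per point")
```

Under these assumptions a single point needs roughly 0.3 s at a target of $10^{-9}$ and roughly 300 s at $10^{-12}$. A contour map sweeps many combinations of $V_{ref}$ and sampling time offset, so the total time scales with the number of points.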

The eye-diagram of the 40 Gb/s output of the DFE is shown in Fig. 4.18b.



Figure 4.18: (a) The BER versus  $V_{\text{ref}}$  at the optimum sampling time in the BER contour map (b) The single-ended 40 Gb/s output eye-diagram from the DFE (35mV/div and 10ps/div).

In the second experiment, an SMA elbow was employed as a DSE. Fig. 4.19 shows the measured  $S_{21}$  of this elbow. Its 3 dB bandwidth is around 41 GHz.

Fig. 4.20a shows the resulting distorted eye-diagram at the output of the SMA elbow. The ISI here arises not only from bandwidth limitation but also from modal distortion, as the SMA elbow is only specified to work up to 18 GHz. Fig. 4.20b shows the resulting BER contour map.



Figure 4.19: The measured  $S_{21}$  of an SMA elbow.

In the above experiments, all the 10 Gb/s outputs showed similar BER performance.

The experiments at the full data rate in this section validate the on-wafer measurement techniques presented earlier in Sec. 4.2.1 and 4.2.2 for predicting the maximum operating bit rate of the DFE.



Figure 4.20: (a) The distorted single-ended eye-diagram at the output of the SMA elbow (50mV/div and 5ps/div) (b) The corresponding BER contour map.

### 4.4 Design and On-wafer Measurement of a 110 Gb/s DFE

To carry the operating bit rate of the DFE above 80 Gb/s, an improvement on the 80 Gb/s DFE architecture is introduced in this section and is shown in Fig. 4.21. To overcome the two speed limiting factors of the combinational logic and the feedback loop, the following modifications have been carried out:

1. The outputs from FF\_1-4 in Fig. 4.21 are connected through buffering latches LT\_5-8 and short interconnects to the input lines of MUX\_1-4. The loading effect on FF\_1-4 is reduced compared to that in Fig. 4.1, as each output is connected to only one latch and one D-flip-flop. To maintain the synchronization between the input and selection lines of MUX\_1-4, additional latches (the slave latches in FF\_11-14) had to be inserted. Furthermore, the long interconnects from FF\_1-4 to LT\_5-8 are implemented in TM2 to reduce their parasitic capacitance to the substrate, hence further reducing the load on the outputs of FF\_1-4.

Figure 4.21: The architecture of the 110 Gb/s DFE.

The effect of load reduction is evident when comparing the rise and fall time at 50 Gb/s of the post-layout simulation in Fig. 4.22a of the improved design with Fig. 4.9c of the 80 Gb/s DFE for the same input sequence in Fig. 4.8. In contrast to the 80 Gb/s DFE, where bit errors occurred at 50 Gb/s in Fig. 4.11c, no bit errors occur at the same bit rate for the 110 Gb/s DFE, as shown in Fig. 4.22b.

When performing the test of the combinational logic in Sec. 4.2.1 in post-layout simulation, the improved DFE worked up to 59 Gb/s, compared to only 44 Gb/s in the case of the 80 Gb/s DFE.

2. To increase the maximum clock frequency at which the feedback loop can work, the improved design of D-flip-flops presented in Sec. 3.2.2, on which the 100 GHz static frequency divider is based, was employed for FF\_5,6. When performing the test of the feedback loop in Sec. 4.2.2, the maximum frequency of operation of the feedback loop increased to 57 GHz in post-layout simulations, compared to 39 GHz for the 80 Gb/s DFE. The output oscillation frequency of the feedback loop also increased to 22.5 GHz in post-layout simulation, compared to 15.5 GHz for the 80 Gb/s DFE.

Figure 4.22: Post-layout simulation of the improved DFE architecture at 50 Gb/s: (a) the differential signal $A(2m)$, (b) the differential signal at the output of $FF\_7$.

The improved version of the DFE was designed, fabricated and measured. The chip photo is shown in Fig. 4.23. It works from two DC power supplies  $V_{cc}=1.5$  V and  $V_{ee}=-3.5$  V and draws 1.15 A of current.



Figure 4.23: Chip photo of the 110 Gb/s DFE.

The same measurement techniques described in Sec. 4.2.1 and 4.2.2 were utilized for on-wafer measurement.

In the combinational logic test, the BER was less than  $10^{-13}$  up to 55 Gb/s for single-ended input logic swing of 250 mV at the DFE input. However, when the bit rate was increased to 56 Gb/s errors began to occur and the measured BER was  $4.1 \times 10^{-13}$  over a time period of 7 minutes. This agrees with the post-layout simulations presented earlier.

The test of the maximum operating frequency of the feedback loop yielded 57 and 21.2 GHz for the maximum frequency of operation of the feedback loop and its output oscillation frequency, respectively. This also agrees with the post-layout simulations.

The maximum bit rate of the improved version of the DFE is, however, 110 Gb/s, due to the limitation of the combinational logic speed to 55 Gb/s.
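The way the two on-wafer tests combine into a full-rate figure can be sketched as follows. The sketch assumes, in line with the half-rate architecture and the statement in Sec. 4.2.2 that the maximum loop frequency equals the maximum bit rate of the loop, that each of the even and odd channels handles half of the full bit rate and that the slower of the two limits applies. The function name is illustrative.

```python
def max_full_rate_gbps(comb_logic_gbps, feedback_loop_ghz):
    """Estimated full bit rate of a half-rate DFE in Gb/s.

    comb_logic_gbps:   per-channel limit from the combinational logic test
    feedback_loop_ghz: per-channel limit from the feedback loop test
    Assumption: the full rate is twice the slower per-channel limit.
    """
    per_channel_limit = min(comb_logic_gbps, feedback_loop_ghz)
    return 2.0 * per_channel_limit


if __name__ == "__main__":
    # Measured values quoted in this chapter
    print("80 Gb/s DFE: ", max_full_rate_gbps(45.0, 40.0))
    print("110 Gb/s DFE:", max_full_rate_gbps(55.0, 57.0))
```

With the measured values, the first design is limited by its feedback loop and the improved design by its combinational logic.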

Because of the even higher power dissipation of the 110 Gb/s DFE in comparison to the 80 Gb/s DFE, mounting the 110 Gb/s DFE on a board similar to the one described in Sec. 4.3 proved insufficient to dissipate the heat, even though the wafer thickness of the 110 Gb/s DFE was 200 $\mu\text{m}$ in comparison to 370 $\mu\text{m}$ for the 80 Gb/s DFE. Consequently, board-level measurement at the full rate with $R_b=110 \text{ Gb/s}$, similar to Sec. 4.3, was not possible. The extremely high temperature, caused by the power dissipation of the 110 Gb/s DFE and the insufficient heat dissipation of the 0.8 mm thick copper plate, is evident when comparing the two infrared pictures of the 110 Gb/s DFE in Fig. 4.24 and 4.25. The first infrared picture (Fig. 4.24) shows the temperature on the upper surface of the 110 Gb/s DFE in on-wafer measurement, where adequate heat dissipation is provided by the 750 $\mu\text{m}$-thick wafer before thinning. In this case the maximum temperature is 71.37 °C. The second infrared picture (Fig. 4.25) shows the temperature on the upper surface of the 110 Gb/s DFE when mounted on the 0.8 mm thick copper plate with the heat-conducting glue. In this case the maximum temperature reaches 232.8 °C.



Figure 4.24: Infrared picture of the 110 Gb/s DFE measured on-wafer (rotated by 90° with respect to Fig. 4.23).



Figure 4.25: Infrared picture of the 110 Gb/s DFE measured on-board.

### 4.5 Comparison with State of the Art in DFEs

Table 4.1 presents a comparison of the 80 and 110 Gb/s DFEs in this work with the state of the art in DFEs. The 80 and 110 Gb/s DFEs presented here are the fastest among the DFEs compared. They, however, also dissipate the most power. This is mainly because the focus of this work was on speed rather than power dissipation.

In general, the high power dissipation is due to the bipolar implementation of the architecture presented in this work in Sec. 2.2.4. However, the presented architecture lends itself well to other technologies, like CMOS, in which lower supply voltage can be used and eventually lower power dissipation can be obtained.

Fig. 4.26 shows the power dissipation breakdown of the 80 and 110 Gb/s DFEs. As evident from the pie chart, the power dissipation of the clock buffers together with the active balun constitutes more than 55% of the total power dissipation in both versions of the DFE. However, the use of these clock buffers was necessary to compensate the loss in the clock tree, as discussed in Sec. 3.3.

| Ref.      | Bit rate (Gb/s) | Architecture                                   | Technology               | Area (mm<sup>2</sup>)   | Power dissipation | Power per bit rate (mW/Gb/s) |
|-----------|-----------------|------------------------------------------------|--------------------------|-------------------------|-------------------|---------|
| [57]      | 66 (3 taps)     | DFL                                            | 65nm CMOS                | 1.44                    | 46 mW             | 0.7     |
| [32]      | 40              | DFL                                            | 65nm CMOS                | 0.05                    | 45 mW             | 1.125   |
| [20]      | 40              | Look-ahead Sec. 2.2.2                          | 0.18 $\mu$ m SiGe BiCMOS | 1.5                     | 0.76 W            | 19      |
| This work | 80              | Half-rate parallel look-ahead Sec. 2.2.4       | 0.13 $\mu$ m SiGe BiCMOS | 2                       | 4 W               | 50      |
| This work | 110             | Half-rate parallel look-ahead Sec. 2.2.4 & 4.4 | 0.13 $\mu$ m SiGe BiCMOS | 2.56                    | 5.75 W            | 52.3    |

Table 4.1: State of the art in DFEs for 40 Gb/s and above. Unless otherwise stated, the DFEs use 1 tap.
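The last column of Table 4.1 can be reproduced with a few lines of arithmetic. The sketch below uses the bit rates and power values from the table. The helper `supply_power_w` additionally assumes that the quoted 1.15 A flows across the full $V_{cc}-V_{ee}$ span of the 110 Gb/s DFE, which is consistent with the 5.75 W listed. All names are illustrative.

```python
def mw_per_gbps(power_w, bit_rate_gbps):
    """Power dissipation normalized to bit rate, in mW/Gb/s."""
    return power_w * 1e3 / bit_rate_gbps


def supply_power_w(v_cc, v_ee, current_a):
    """DC power, assuming one current across the total supply span."""
    return (v_cc - v_ee) * current_a


# (reference, bit rate in Gb/s, power in W) as listed in Table 4.1
DFES = [
    ("[57]", 66.0, 0.046),
    ("[32]", 40.0, 0.045),
    ("[20]", 40.0, 0.76),
    ("This work, 80 Gb/s", 80.0, 4.0),
    ("This work, 110 Gb/s", 110.0, 5.75),
]

if __name__ == "__main__":
    for ref, rate, power in DFES:
        print(f"{ref}: {mw_per_gbps(power, rate):.3f} mW/Gb/s")
    print(f"110 Gb/s DFE supply power: {supply_power_w(1.5, -3.5, 1.15):.2f} W")
```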



Figure 4.26: Power dissipation breakdown for the (a) 80 Gb/s and (b) 110 Gb/s DFE.



# Chapter 5

## Conclusion and Outlook

As emphasized in the introduction to this work, the emergence of bandwidth-hungry Internet services, like cloud computing, audio and video streaming and video web conferencing, has been a driving force for an ever-increasing demand on links with high bit rates. To cope with this demand, communication systems employ optical fibers due to their inherent high bandwidth. Unfortunately, different kinds of dispersions in optical fiber result in intersymbol interference, which in turn limits the utilized bandwidth of the optical fibers. Here, equalizers present themselves as a good solution to combat ISI and consequently better utilize the bandwidth of the already installed optical fiber links. Equalizers, however, are not only employed in optical fiber communication systems but also in chip-to-chip and board-to-board communication systems to compensate for the ISI impacting the signals as they travel along the dispersive traces of printed circuit boards and bandwidth-limited electrical cables.

The goal of this dissertation was the design and characterization of a 1-tap decision feedback equalizer for 100 Gb/s communication systems. Firstly, a rigorous analysis was carried out as to which decision feedback architecture is more suitable for this high bit rate. The different architectures have been compared in terms of complexity and timing conditions. Thereupon, an architecture was selected and further modifications were proposed on it to improve its timing behavior.

For the implementation of the circuits, IHP SG13S 0.13  $\mu$ m BiCMOS SiGe:C technology was chosen, because of its high  $f_T / f_{max}$  and low ring oscillator gate delay. The circuit design and characterization of the different building blocks, which the architecture comprises, were explained in detail. These building blocks include the DFE front-end, D-flip-flops, clock tree and active balun. Different bandwidth enhancement techniques have been described and applied in the design of the DFE front-end with small signal voltage gain and 3 dB bandwidth of 2 dB and 71 GHz, respectively. A static frequency divider with maximum frequency of operation of 100 GHz has also been designed and measured to demonstrate the ability of the D-flip-flops to work at such high bit rate and clock frequency. In addition to that, the design of a broadband clock tree comprising coupled microstrip transmission lines and supporting clock frequencies 20-70 GHz has been expounded. Likewise, the design and measurement of a broadband balun with operating frequency range 31-65 GHz, matching the broadband operation of the clock tree, has been reported. The active balun achieves magnitude and phase imbalance of 0.6 dB and 2.5°, respectively.

Next, the different aspects of integrating the aforementioned building blocks into the DFE were explained. New on-wafer measurement techniques were devised to determine the maximum operating bit rate of the DFE. The first version of the DFE worked up to 80 Gb/s. To demonstrate the effectiveness of the developed measurement techniques, two on-board experiments at 80 Gb/s were described, in which the DFE was employed to mitigate the ISI resulting from bandwidth limitation by a 20 GHz low-pass filter as well as ISI resulting from modal distortion by an SMA elbow. In both cases, the DFE enabled 80 Gb/s data transmission with $10^{-9}$ bit error rate.

Finally, the main bottlenecks limiting the operation of the first version to 80 Gb/s were identified, whereupon further improvements on the architecture were proposed. The improved architecture was used in the design of a second version of the DFE working up to 110 Gb/s in on-wafer measurement. The increased power dissipation of the second version, however, hindered its on-board measurement at the full bit rate of 110 Gb/s.

Although the DFEs designed in this work represent the state of the art in terms of operational bit rate, they dissipate a large amount of power. This is mainly due to the bipolar implementation necessitating a 5 V supply. The architecture developed in this work, however, lends itself well to ultra deep submicron CMOS technologies with typical gate lengths smaller than 0.1 $\mu$m, which operate with power supplies of less than 1.2 V. Provided that these technologies are fast enough to sustain operation at such high bit rates, the decrease in power supply may lead to a substantial decrease in the power dissipation of the DFE. Designing the DFE in such technologies, however, requires careful study, as lower power supplies result in smaller signal headroom, which may badly affect the signal integrity.

Another potential solution to reduce the large power dissipation of the DFEs is to dispense altogether with the active balun and the clock buffers, whose power dissipation together constitutes more than 55% of the total power dissipation in the two DFE designs. The active balun in this case could be replaced by a transmission-line-based passive balun, like a Marchand balun, or a transformer-based balun. In this case, however, one has to dispense with the wideband operation of the clock tree and provide enough power at the clock input of the DFE to compensate the losses arising from the clock tree.



# Appendix A

## Fabrication Technology

For implementation of the circuits in this work, IHP SG13S technology [58] [59] has been used. SG13S is a  $0.13\text{ }\mu\text{m}$  BiCMOS SiGe:C technology offering high-speed HBTs with  $f_T / f_{max} / \text{BV}_{CEO}$  of  $250\text{ GHz}/300\text{ GHz}/1.7\text{ V}$  and high-voltage HBTs with  $f_T / f_{max} / \text{BV}_{CEO}$  of  $45\text{ GHz}/120\text{ GHz}/3.7\text{ V}$ . Measured ring oscillator gate delay for this technology is  $2.9\text{ ps}$ . Fig. A.1 shows the backend aluminum metal stack, which consists of 5 thin layers (M1-5) based on  $130\text{ nm}$  rules and two thick layers (TM1 and TM2) with thickness of 2 and  $3\text{ }\mu\text{m}$ , for the implementation of high-quality factor passives. The technology offers  $130\text{ nm}$  gate length CMOS devices with thin and thick gate oxide for  $1.2$  and  $3.3\text{ V}$  core voltage, respectively. Table A.1 summarizes the parameters for this technology.

| Parameter                | high-speed HBT                  | high-voltage HBT                |
|--------------------------|---------------------------------|---------------------------------|
| $A_E$                    | $0.12 \times 0.48\mu\text{m}^2$ | $0.18 \times 1.02\mu\text{m}^2$ |
| Peak $f_T$               | 250 GHz                         | 45 GHz                          |
| Peak $f_{max}$           | 300 GHz                         | 120 GHz                         |
| $\text{BV}_{CEO}$        | 1.7 V                           | 3.7 V                           |
| $\text{BV}_{CBO}$        | 5 V                             | 15 V                            |
| $\beta$                  | 900                             | 600                             |
| 1.2 V core NMOS $V_{th}$ | 0.49 V                          |                                 |
| 1.2 V core PMOS $V_{th}$ | 0.42 V                          |                                 |
| MIM Capacitor            | $1.5\text{ fF}/\mu\text{m}^2$   |                                 |
| $P^+$ poly resistor      | $250\text{ }\Omega/\square$     |                                 |
| High poly resistor       | $1300\text{ }\Omega/\square$    |                                 |

Table A.1: Summary of IHP SG13S technology parameters.



Figure A.1: Metallization stack of IHP SG13S technology.

# Appendix B

## Deembedding

On-wafer standard calibration methods like short-open-load-thru (SOLT), line-reflect-reflect-match (LRRM) and thru-reflect-line (TRL) [60] [61] [62] shift the measurement reference plane to the probe tip, as shown in Fig. B.1. However, it is often required to further shift the reference plane to the device under test (DUT), overcoming the effect of the error network (also called fixture) on the measurement results of the DUT. This further shift of the reference plane is performed with the help of deembedding techniques. Examples of this error network are the pads and the connecting lines from the pads to the DUT. Although full calibration using SOLT, LRRM or TRL can also shift the reference plane to the DUT, it is not always possible for all structures. Furthermore, deembedding techniques are preferred in this case because they do not require the manufacturing of extra test chips with well-defined reference standards [62].



Figure B.1: The error networks and the DUT.

One of these deembedding techniques is the open-short technique [63]. This deembedding technique is accurate only if the dimensions of the error network are very small compared to the wavelength ( $\leq \frac{\lambda}{20}$ ) corresponding to the maximum frequency of measurements. In this technique a model for the error network is assumed. The model consists of parallel parasitics (for example, the pad capacitance) and series parasitics (for example, the inductance and resistance of the connecting lines from the pads to the DUT) [63]. An example is shown in Fig. B.2a, in which pads and connecting lines are used to access the DUT. In this example, the measurement is performed on-wafer using a GS probe. $Y_{p1}, Y_{p2}, Y_{p3}$ are the parallel parasitic admittances and $Z_{p1}, Z_{p2}, Z_{p3}$ are the series parasitic impedances, as shown in Fig. B.2b.



Figure B.2: The embedded DUT in the fixture and the corresponding model.

Two extra structures are required in this technique: an open structure, in which the DUT is simply removed (Fig. B.3a) and a short structure (Fig. B.4a), in which the DUT is replaced by a short circuit to the ground plane. The lumped models of the open and short structures are shown in Fig. B.3b and Fig. B.4b, respectively. The three structures are measured with a VNA and then their S-parameters are converted to Y-parameters [60]. The Y-parameters of the open, short, embedded (DUT+error networks) and the DUT structures are assumed to be  $Y_{open}$ ,  $Y_{short}$ ,  $Y_{embd}$  and  $Y_{dut}$ , respectively.

Figure B.3: The “open” structure and its corresponding model.

Since the series parasitic impedances $Z_{p1}, Z_{p2}, Z_{p3}$ are embedded within the parallel parasitic admittances $Y_{p1}, Y_{p2}, Y_{p3}$, as shown in Fig. B.4b, the effect of the parallel parasitics must first be removed from $Y_{short}$ by subtracting $Y_{open}$. The result is then converted to Z-parameters:

$$\begin{pmatrix} Z_{p1} + Z_{p3} & Z_{p3} \\ Z_{p3} & Z_{p2} + Z_{p3} \end{pmatrix} = (Y_{short} - Y_{open})^{-1} \quad (\text{B.1})$$

A similar procedure is then performed on $Y_{embd}$ to remove the effect of the parallel parasitics. The resulting matrix is also converted to Z-parameters, namely $(Y_{embd} - Y_{open})^{-1}$. The Z-parameters of the DUT can then be found by subtracting $(Y_{short} - Y_{open})^{-1}$ from $(Y_{embd} - Y_{open})^{-1}$. If desired, the Z-parameters of the DUT can be converted back to S-parameters.

$$Z_{dut} = (Y_{embd} - Y_{open})^{-1} - (Y_{short} - Y_{open})^{-1} \quad (\text{B.2})$$
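The open-short procedure of Eqns. B.1 and B.2 can be sketched for a single frequency point as shown below. The sketch works on 2x2 Y-parameter matrices given as nested lists of complex numbers. The helper names and the fixture values in the self-check are hypothetical, and the conversion between S- and Y-parameters is assumed to be done elsewhere.

```python
def inv2(m):
    """Invert a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def add2(x, y):
    """Element-wise sum of two 2x2 matrices."""
    return [[x[i][j] + y[i][j] for j in range(2)] for i in range(2)]


def sub2(x, y):
    """Element-wise difference of two 2x2 matrices."""
    return [[x[i][j] - y[i][j] for j in range(2)] for i in range(2)]


def open_short_deembed(y_embd, y_open, y_short):
    """Return the Z-parameters of the DUT according to Eqn. B.2."""
    z_series = inv2(sub2(y_short, y_open))  # series parasitics, Eqn. B.1
    z_inner = inv2(sub2(y_embd, y_open))    # DUT plus series parasitics
    return sub2(z_inner, z_series)


if __name__ == "__main__":
    # Hypothetical fixture and DUT at one frequency (values are made up)
    yp1, yp2, yp3 = 1e-3j, 1.2e-3j, 1e-4j
    zp1, zp2, zp3 = 1 + 5j, 1.5 + 6j, 0.2 + 1j
    y_open = [[yp1 + yp3, -yp3], [-yp3, yp2 + yp3]]
    z_ser = [[zp1 + zp3, zp3], [zp3, zp2 + zp3]]
    z_dut = [[50 + 10j, 5 - 2j], [20 + 3j, 40 - 8j]]
    # Embed the DUT using the lumped model assumed by the technique
    y_short = add2(y_open, inv2(z_ser))
    y_embd = add2(y_open, inv2(add2(z_ser, z_dut)))
    print(open_short_deembed(y_embd, y_open, y_short))
```

The self-check builds the open, short and embedded structures from the same lumped model that the technique assumes, so the recovered matrix should match `z_dut` up to rounding. With measured data the agreement depends on how well the fixture follows that model.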



Figure B.4: The “short” structure and its corresponding model.



# Appendix C

## Small Signal Voltage Gain and S-parameters

The following derivation was used to obtain an expression of the voltage gain,  $A_v$ , as a function of the S-parameters [64].



Figure C.1: The two port network.

The two-port network depicted in Fig. C.1 is assumed to be embedded in a system with source impedance $Z_S$ and load impedance $Z_L$. The S-parameters of this two-port network are assumed to be given by

$$\begin{pmatrix} b_1 \\ b_2 \end{pmatrix} = \begin{pmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{pmatrix} \begin{pmatrix} a_1 \\ a_2 \end{pmatrix} \quad (C.1)$$

where $a_1$ and $a_2$ are the incident power waves at port 1 and 2, respectively, and $b_1$ and $b_2$ are the reflected power waves at port 1 and 2, respectively. The input and output voltages, $v_1$ and $v_2$, respectively, can be written as

$$v_1 = \sqrt{R_o}(a_1 + b_1) \quad (C.2)$$

$$v_2 = \sqrt{R_o}(a_2 + b_2) \quad (C.3)$$

where  $R_o$  is the reference impedance of the S-parameters measurement system. The voltage gain  $A_v$  is given by

$$A_v = \frac{v_2}{v_1} = \frac{a_2 + b_2}{a_1 + b_1} = \frac{a_2}{a_1} \left( \frac{1 + \Gamma_2}{1 + \Gamma_1} \right) \quad (C.4)$$

where $\Gamma_1 = \frac{b_1}{a_1}$ and $\Gamma_2 = \frac{b_2}{a_2}$ are the reflection coefficients looking into port 1 and port 2, respectively. But from the S-parameters in Eqn. C.1, $b_2 = S_{21}a_1 + S_{22}a_2$, which leads to

$$\frac{a_2}{a_1} = \frac{S_{21}}{\Gamma_2 - S_{22}} \quad (C.5)$$

Substituting from Eqn. C.5 into Eqn. C.4, we obtain

$$A_v = \frac{S_{21}}{\Gamma_2 - S_{22}} \left( \frac{1 + \Gamma_2}{1 + \Gamma_1} \right) \quad (C.6)$$

To find the expression for  $\Gamma_1$  and  $\Gamma_2$  in terms of the S-parameters of the two port-network and the load impedance,  $Z_L$ , we note that since no signal energy is explicitly applied at port 2 as an independent voltage or current source, the power wave,  $a_2$ , incident at port 2 of the linear network is necessarily equal to the energy reflected back to port 2 by the load impedance  $Z_L$ , that is,  $a_2 = b_L$ , as shown in Fig. C.1. Additionally, all energy reflected at port 2 of the linear network is incident to the load impedance  $Z_L$ , thereby establishing the constraint,  $b_2 = a_L$ . It follows that port 2 reflection coefficient,  $\Gamma_2$ , relates to the load reflection coefficient,  $\Gamma_L$ , by the relation

$$\Gamma_2 = \frac{b_2}{a_2} = \frac{a_L}{b_L} = \frac{1}{\Gamma_L} \quad (C.7)$$

It follows from Eqn. C.1 that $b_1 = S_{11}a_1 + S_{12}a_2$, which, since $a_2 = b_L = \Gamma_L a_L$ from Eqn. C.7, can be rewritten as

$$b_1 = S_{11}a_1 + \Gamma_L S_{12}a_L \quad (\text{C.8})$$

Using Eqn. C.8,  $\Gamma_1 = \frac{b_1}{a_1}$  can be written as:

$$\Gamma_1 = S_{11} + \Gamma_L S_{12} \frac{a_L}{a_1} \quad (\text{C.9})$$

The ratio  $\frac{a_L}{a_1}$  as a function of  $\Gamma_L$  and the S-parameters can be found similarly by substituting Eqn. C.5 in the expression of  $b_2$  from the scattering matrix in Eqn. C.1

$$b_2 = a_L = S_{21}a_1 + S_{22}b_L = S_{21}a_1 + \Gamma_L S_{22}a_L \quad \Rightarrow \quad \frac{a_L}{a_1} = \frac{S_{21}}{1 - \Gamma_L S_{22}} \quad (\text{C.10})$$

Now substituting from Eqn. C.10 into Eqn. C.9, an expression of  $\Gamma_1$  as a function of  $\Gamma_L$  and the S-parameters can be obtained as

$$\Gamma_1 = S_{11} + \frac{\Gamma_L S_{12} S_{21}}{1 - \Gamma_L S_{22}} \quad (\text{C.11})$$

Using Eqn. C.11 and Eqn. C.7 to substitute for  $\Gamma_1$  and  $\Gamma_2$  in Eqn. C.6 will lead to the required expression for  $A_v$  as a function of the load impedance  $Z_L$  (since  $\Gamma_L = \frac{Z_L - R_o}{Z_L + R_o}$ ) and the S-parameters of the two-port network as follows

$$A_v = \frac{S_{21}}{1 - \Gamma_L S_{22}} \left( \frac{1 + \Gamma_L}{1 + \Gamma_1} \right) = \frac{S_{21}}{1 - \Gamma_L S_{22}} \left( \frac{1 + \Gamma_L}{1 + S_{11} + \frac{\Gamma_L S_{12} S_{21}}{1 - \Gamma_L S_{22}}} \right) \quad (\text{C.12})$$

In case of high isolation between port 1 and 2 ( $S_{12} \approx 0$ ), the expression in Eqn. C.11 reduces to  $\Gamma_1 = S_{11}$  and consequently also the expression in Eqn. C.12 reduces to:

$$A_v = \frac{S_{21}}{1 - \Gamma_L S_{22}} \left( \frac{1 + \Gamma_L}{1 + S_{11}} \right) \quad (\text{C.13})$$
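Eqns. C.11 to C.13 can be evaluated numerically as in the following sketch. It takes complex S-parameters at a single frequency and a finite load impedance. The 50 Ω default for $R_o$ and the function names are assumptions made for illustration.

```python
import math


def gamma_load(z_load, r0=50.0):
    """Load reflection coefficient: (Z_L - R_o) / (Z_L + R_o)."""
    return (z_load - r0) / (z_load + r0)


def voltage_gain(s11, s12, s21, s22, z_load, r0=50.0):
    """Voltage gain A_v = v2/v1 according to Eqn. C.12 (finite Z_L)."""
    gl = gamma_load(z_load, r0)
    gamma_1 = s11 + gl * s12 * s21 / (1 - gl * s22)  # Eqn. C.11
    return s21 / (1 - gl * s22) * (1 + gl) / (1 + gamma_1)


def gain_db(a_v):
    """Magnitude of the voltage gain in dB."""
    return 20.0 * math.log10(abs(a_v))


if __name__ == "__main__":
    # Made-up S-parameters, not measured data from this work
    a_v = voltage_gain(0.1 + 0.05j, 0.01, 1.2 - 0.3j, 0.2j, z_load=100.0)
    print(f"A_v = {a_v:.4f}, |A_v| = {gain_db(a_v):.2f} dB")
```

For a load equal to $R_o$, $\Gamma_L=0$ and the expression collapses to $S_{21}/(1+S_{11})$, which provides a quick sanity check on measured data.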



# References

- [1] T. Robertazzi, *Basics of computer networking*. Germany: Springer, 2011.
- [2] Z. Gu, *High-speed CMOS ICs for 10 Gbit/s optical fiber communication receivers*. Aachen, Germany: Shaker Verlag GmbH, 2005.
- [3] E. Säckinger, *Broadband circuits for optical fiber communication*. John Wiley & Sons, 2005.
- [4] E. Mammei, F. Loi, F. Radice, A. Dati, M. Brucolieri, M. Bassi, and A. Mazzanti, "Analysis and design of a power-scalable continuous-time FIR equalizer for 10 Gb/s to 25 Gb/s multi-mode fiber EDC in 28 nm LP CMOS," *IEEE J. Solid-State Circuits*, vol. 49, no. 12, Dec. 2014.
- [5] D. Molin, M. Bigot-Astruc, G. Kuyt, G. Melin, and P. Sillard, "Multimode fibers for cost-effective high-speed, short-range networks," in *European Conference and Exhibition on Optical Communications (ECOC)*, Sep. 2012.
- [6] E. Ip, A. P. T. Lau, D. J. F. Barros, and J. M. Kahn, "Coherent detection in optical fiber systems," *Optics Express*, vol. 16, no. 2, 2008.
- [7] A. Awny, L. Moeller, J. Junio, J. C. Scheytt, and A. Thiede, "Design and measurement techniques for an 80 Gb/s 1-Tap decision feedback equalizer," *IEEE J. Solid-State Circuits*, vol. 49, no. 2, Feb. 2014.

- [8] R. H. Derksen, M. Möller, and C. Schubert, "100-Gbit/s full-ETDM transmission technologies," in *Compound Semiconductor Integrated Circuit Symposium (CSICS)*, Portland, Oregon, USA, Oct. 2007.
- [9] J. M. Kahn, "Modulation and detection techniques for optical communication systems," in *Optical amplifiers and their applications/coherent optical technologies and applications*. Optical Society of America, 2006.
- [10] 400 Gb/s Ethernet Study Group. [Online]. Available: <http://www.ieee802.org/3/400GSG/index.html>
- [11] IEEE P802.3bs 400 Gb/s Ethernet Task Force. [Online]. Available: <http://www.ieee802.org/3/bs/index.html>
- [12] H. Wu, J. A. Tierno, P. Pepeljugoski, J. Schaub, S. Gowda, J. A. Kash, and A. Hajimiri, "Integrated transversal equalizers in high-speed fiber-optic systems," *IEEE J. Solid-State Circuits*, vol. 38, no. 12, Dec. 2003.
- [13] H.-M. Bae, J. B. Ashbrook, J. Park, N. R. Shanbhag, A. C. Singer, and S. Chopra, "An MLSE receiver for electronic dispersion compensation of OC-192 fiber links," *IEEE J. Solid-State Circuits*, vol. 41, no. 11, Nov. 2006.
- [14] S. Shahramian, S. P. Voinigescu, and A. C. Carusone, "A 35-GS/s 4-bit flash ADC with active data and clock distribution trees," *IEEE J. Solid-State Circuits*, vol. 44, no. 6, Jun. 2009.
- [15] A. Balteanu, P. Schvan, and S. P. Voinigescu, "A 6-bit segmented RZ DAC architecture with up to 50-GHz sampling clock and 4 Vpp differential swing," in *IEEE International Microwave Symposium (IMS)*, Montreal, Quebec, Canada, Jun. 2012.
- [16] J. Sewter and A. C. Carusone, "A comparison of equalizers for compensating polarization-mode dispersion in 40-Gb/s optical systems," in *IEEE International Symposium on Circuits and Systems (ISCAS)*, 2005.

- [17] ——, “Equalizer architectures for 40-Gb/s optical systems limited by polarization-mode dispersion,” *International Journal of High Speed Electronics and Systems*, vol. 15, no. 03, 2005.
- [18] J. G. Proakis and D. G. Manolakis, *Digital signal processing*, 3rd ed. Prentice-Hall, 1996.
- [19] S. Haykin, *Communication systems*, 4th ed. John Wiley & Sons, 2001, pp. 291-293.
- [20] A. Garg, A. C. Carusone, and S. P. Voinigescu, “A 1-tap 40-Gb/s look-ahead decision feedback equalizer in 0.18  $\mu$ m SiGe BiCMOS technology,” *IEEE J. Solid-State Circuits*, vol. 41, no. 10, Oct. 2006.
- [21] J. H. Winters, R. D. Gitlin, and S. Kasturia, “Reducing the effects of transmission impairments in digital fiber optic systems,” *IEEE Commun. Mag.*, vol. 31, no. 6, Jun. 1993.
- [22] K. Kaviani, W. Ting, J. Wei, A. Amirkhani, J. Shen, T. J. Chin, C. Thakkar, W. T. Beyene, N. Chan, C. Chen, B. R. Chuang, D. Dressler, V. P. Gadde, M. Hekmat, E. Ho, C. Huang, P. Le, Mahabaleshwara, C. Madden, N. K. Mishra, L. Raghavan, K. Saito, R. Schmitt, D. Secker, X. Shi, S. Fazeel, G. S. Srinivas, S. Zhang, C. Tran, A. Vaidyanath, K. Vyas, M. Jain, K.-Y. K. Chang, and X. Yuan, “A tri-modal 20-Gbps/link differential/DDR3/GDDR5 memory interface,” *IEEE J. Solid-State Circuits*, vol. 47, no. 4, Apr. 2012.
- [23] P. K. Hanumolu, G. Wei, and U.-K. Moon, “Equalizers for high-speed serial links,” *International Journal of High Speed Electronics and Systems*, vol. 15, no. 02, 2005.
- [24] J. M. Rabaey, A. Chandrakasan, and B. Nikolic, *Digital integrated circuits*, 2nd ed. Prentice Hall, 2003.

- [25] H.-M. Rein and M. Moeller, "Design considerations for very-high-speed Si-bipolar ICs operating up to 50Gb/s," *IEEE J. Solid-State Circuits*, vol. 31, no. 8, Aug. 1996.
- [26] S. Kasturia and J. H. Winters, "Techniques for high-speed implementation of nonlinear cancellation," *IEEE J. Sel. Areas Commun.*, vol. 9, no. 5, Jun. 1991.
- [27] J. H. Winters and S. Kasturia, "Adaptive nonlinear cancellation for high-speed fiber-optic systems," *J. Lightw. Technol.*, vol. 10, no. 7, Jul. 1992.
- [28] M. Alioto and G. Palumbo, *Model and design of bipolar and MOS current-mode logic: CML, ECL and SCL digital circuits*. Springer, 2005.
- [29] L. Möller, Z. Gu, A. Thiede, S. Chandrasekhar, and L. Stulz, "20 Gbit/s electrical data recovery using decision feedback equalizer supported receiver," *Electron. Lett.*, vol. 39, no. 1, Jan. 2003.
- [30] A. Awny, A. Thiede, and J. C. Scheytt, "Design and test of decision feedback equalizers for 80 Gbit/s bit rate and beyond," in *IEEE International Microwave Symposium (IMS)*, Baltimore, MD, USA, Jun. 2011.
- [31] B. Razavi, *Design of integrated circuits for optical communications*. Columbus, OH: McGraw-Hill, 2002.
- [32] C.-L. Hsieh and S.-I. Liu, "A 40 Gb/s decision feedback equalizer using back-gate feedback technique," in *Symposium on VLSI Circuits*, Kyoto, Japan, Jun. 2009.
- [33] ——, "Decision feedback equalizers using the back-gate feedback technique," *IEEE Trans. Circuits Syst. II*, vol. 58, no. 12, Dec. 2011.
- [34] A. Awny, A. Thiede, M. Elkhouly, J. Borngräber, F. Korndörfer, and J. C. Scheytt, "Mixed-signal techniques in mm-wave range for 100 Gbit decision feedback equalizer," in *10th Topical Meeting on Silicon Monolithic Integrated Circuits in RF Systems (SiRF)*, New Orleans, LA, USA, Jan. 2010.

- [35] A. Awny, A. Thiede, F. Korndörfer, and J. C. Scheytt, "mm-wave and logic acceleration techniques for 100 Gbit decision feedback equalizer," in *German Microwave Conference (GeMiC)*, Munich, Germany, Mar. 2009.
- [36] P. R. Gray, P. J. Hurst, S. H. Lewis, and R. G. Meyer, *Analysis and design of analog integrated circuits*, 4th ed. New York, USA: John Wiley & Sons, 2001.
- [37] B. Razavi, *Fundamentals of microelectronics*. Columbus, OH: John Wiley & Sons, 2008.
- [38] S. Voinigescu, *High-frequency integrated circuits*, 1st ed. Cambridge, UK: Cambridge University Press, 2013.
- [39] S. S. Mohan, M. Hershenson, S. P. Boyd, and T. H. Lee, "Bandwidth extension in CMOS with optimized on-chip inductors," *IEEE J. Solid-State Circuits*, vol. 35, no. 3, Mar. 2000.
- [40] A. Awny, A. Thiede, J. Borngräber, M. Elkhouly, and J. C. Scheytt, "Speed/power performance of D-type flip-flops in a 0.13  $\mu$ m SiGe:C HBT technology demonstrated by a 86 GHz static frequency divider," in *German Microwave Conference (GeMiC)*, Berlin, Germany, Mar. 2010.
- [41] M. Möller, "Challenges in the cell-based design of very-high-speed SiGe-bipolar ICs at 100 Gb/s," *IEEE J. Solid-State Circuits*, vol. 43, no. 9, Sep. 2008.
- [42] I. J. Bahl and P. Bhartia, *Microwave solid state circuit design*. John Wiley & Sons, 1988.
- [43] A. Awny, C. Wipf, J. C. Scheytt, and A. Thiede, "Broadband 31-65 GHz inductorless active BALUN with 12.4 dB gain in 0.13  $\mu$ m SiGe:C BiCMOS technology," in *6th European Microwave Integrated Circuits Conference (EuMIC)*, Manchester, UK, Oct. 2011.
- [44] M. Mandal and S. Sanyal, "Reduced-length rat-race couplers," *IEEE Trans. Microw. Theory Tech.*, vol. 55, no. 12, Dec. 2007.

- [45] P.-S. Wu, C.-H. Wang, T.-W. Huang, and H. Wang, "Compact and broad-band millimeter-wave monolithic transformer balanced mixers," *IEEE Trans. Microw. Theory Tech.*, vol. 53, no. 10, Oct. 2005.
- [46] P.-S. Wu, C.-H. Tseng, M.-F. Lei, T.-W. Huang, H. Wang, and P. Liao, "Three-dimensional x-band new transformer balun configuration using the multilayer ceramic technologies," in *34th European Microwave Conference (EuMC)*, Amsterdam, Netherlands, Oct. 2004.
- [47] E. Laskin, S. T. Nicolson, P. Chevalier, A. Chantre, B. Sautreuil, and S. P. Voinigescu, "Low-power, low-phase noise SiGe HBT static frequency divider topologies up to 100 GHz," in *Bipolar/BiCMOS Circuits and Technology Meeting (BCTM)*, 2006.
- [48] S. Aloui, E. Kerherve, R. Plana, and D. Belot, "RF-pad, transmission lines and balun optimization for 60GHz 65nm CMOS power amplifier," in *IEEE Radio Frequency Integrated Circuits Symposium (RFIC)*, Anaheim, CA, USA, May 2010.
- [49] E. Tiiliharju and K. Halonen, "An active differential broad-band phase splitter for quadrature-modulator applications," *IEEE Trans. Microw. Theory Tech.*, vol. 53, no. 2, Feb. 2005.
- [50] B. Nauta, "Single-to-differential converter," U.S. Patent 5,404,050, Apr. 4, 1995.
- [51] Y. Jin, M. Spirito, and J. Long, "A 60 GHz-band millimeter-wave active balun with  $\pm 5^\circ$  phase error," in *5th European Microwave Integrated Circuits Conference (EuMIC)*, Paris, France, Sep. 2010.
- [52] B.-J. Huang, B.-J. Huang, K.-Y. Lin, and H. Wang, "A 2 - 40 GHz active balun using 0.13  $\mu\text{m}$  CMOS process," *IEEE Microw. Wireless Compon. Lett.*, vol. 19, no. 3, Mar. 2009.
- [53] A. Costantini, B. Lawrence, S. Mahon, J. Harvey, G. McCulloch, and A. Bessemoulin, "Broadband active and passive balun circuits: functional blocks for modern millimeter-wave radio architectures," in *1st European Microwave Integrated Circuits Conference (EuMIC)*, Manchester, UK, Sep. 2006.
- [54] C. Viallon, D. Venturin, J. Graffeuil, and T. Parra, "Design of an original K-band active balun with improved broadband balanced behavior," *IEEE Microw. Wireless Compon. Lett.*, vol. 15, no. 4, Apr. 2005.
- [55] W. Fang, A. Brunnschweiler, and P. Ashburn, "An analytical maximum toggle frequency expression and its application to optimizing high-speed ECL frequency dividers," *IEEE J. Solid-State Circuits*, vol. 25, no. 4, Aug. 1990.
- [56] L. Moeller, A. Awny, J. Junio, C. Bolle, C. Scheytt, and A. Thiede, "80 Gb/s decision feedback equalizer for intersymbol interference limited channels," in *Optical Fiber Communication Conference (OFC)*, Anaheim, CA, USA, Mar. 2013.
- [57] Y. Lu and E. Alon, "A 66 Gb/s 46 mW 3-tap decision feedback equalizer in 65nm CMOS," in *IEEE International Solid-State Circuits Conference (ISSCC)*, San Francisco, CA, USA, Feb. 2013.
- [58] H. Rücker, B. Heinemann, W. Winkler, R. Barth, J. Borngräber, J. Drews, G. G. Fischer, A. Fox, T. Grabolla, U. Haak, D. Knoll, F. Korndörfer, A. Mai, S. Marschmeyer, P. Schley, D. Schmidt, M. A. Schubert, K. Schmalz, B. Tillack, D. Wolansky, and Y. Yamamoto, "A 0.13  $\mu$ m SiGe BiCMOS technology featuring $f_T/f_{max}$ of 240/330 GHz and gate delays below 3 ps," *IEEE J. Solid-State Circuits*, vol. 45, no. 9, Sep. 2010.
- [59] H. Rücker, B. Heinemann, R. Barth, J. Bauer, K. Blum, D. Bolze, J. Drews, G. G. Fischer, A. Fox, O. Fursenko, T. Grabolla, U. Haak, W. Höppner, D. Knoll, K. Köpke, B. Kuck, A. Mai, S. Marschmeyer, T. Morgenstern, H. H. Richter, P. Schley, D. Schmidt, K. Schulz, B. Tillack, G. Weidner, W. Winkler, D. Wolansky, H.-E. Wulf, and Y. Yamamoto, "SiGe BiCMOS technology with 3.0 ps gate delay," in *IEEE Int. Electron Devices Meeting (IEDM)*, pp. 651–654, Dec. 2007.

- [60] D. M. Pozar, *Microwave engineering*, 2nd ed. New York, USA: John Wiley & Sons, 1997.
- [61] L. Martens, *High-frequency characterization of electronic packaging*. Springer, 1998.
- [62] V. Issakov, M. Wojnowski, A. Thiede, and L. Maurer, "Extension of thru de-embedding technique for asymmetrical and differential devices," *IET Circuits, Devices and Systems*, vol. 3, no. 2, pp. 91–98, 2009.
- [63] M. C. A. M. Koolen, J. A. M. Geelen, and M. P. J. G. Versleijen, "An improved de-embedding technique for on-wafer high-frequency characterization," in *IEEE Bipolar Circuits and Technology Meeting (BCTM)*, Minneapolis, USA, Sep. 1991.
- [64] J. Choma and W. K. Chen, *Feedback networks: theory and circuit applications*. World Scientific, 2007.

# **Curriculum Vitae**

|                  |                                          |
|------------------|------------------------------------------|
| Name and Surname | Ahmed Sanaa Ahmed Awny                   |
| Date of birth    | 31.05.1982                               |
| Place of birth   | Cairo, Egypt                             |
| Current address  | Gubenerstr. 21c<br>15230 Frankfurt(Oder) |
| Email            | a.awny@yahoo.com                         |

## **Education**

|      |                                                                                                                                                                                             |
|------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 2004 | Bachelor of Science (B.Sc.) in Communications and Electronics Engineering, Department of Communications and Electronics Engineering, Faculty of Engineering, Cairo University, Giza, Egypt. |
| 2007 | Master of Science (M.Sc.) in Electronics, Department of Communications and Electronics Engineering, Faculty of Engineering, Cairo University, Giza, Egypt.                                  |

## **Experience**

|                       |                                                                                                                                                        |
|-----------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------|
| Sep. 2004 - Sep. 2006 | Teaching Assistant at the Department of Electronics, Faculty of Information Engineering and Technology (IET), German University in Cairo (GUC), Egypt. |
| Dec. 2007 - Feb. 2011 | Research Assistant at the Department of High Frequency Electronics, University of Paderborn, Germany. The work has been conducted at the Circuit Design Department, IHP, Frankfurt(Oder), Germany. |
| Mar. 2011 - present   | Research Scientist in the Broadband Mixed Signal Group, Circuit Design Department, IHP, Frankfurt(Oder), Germany. |

# **List of Publications**

- A. Awny, L. Moeller, J. Junio, J. C. Scheytt, and A. Thiede, “Design and measurement techniques for an 80 Gb/s 1-Tap decision feedback equalizer,” *IEEE J. Solid-State Circuits*, vol. 49, no. 2, Feb. 2014.
- L. Moeller, A. Awny, J. Junio, C. Bolle, C. Scheytt, and A. Thiede, “80 Gb/s decision feedback equalizer for intersymbol interference limited channels,” in *Optical Fiber Communication Conference (OFC)*, Anaheim, CA, USA, Mar. 2013.
- A. Awny, A. Thiede, and J. C. Scheytt, “Design and test of decision feedback equalizers for 80 Gbit/s bit rate and beyond,” in *IEEE International Microwave Symposium (IMS)*, Baltimore, MD, USA, Jun. 2011.
- A. Awny, C. Wipf, J. C. Scheytt, and A. Thiede, “Broadband 31-65 GHz inductorless active BALUN with 12.4 dB gain in 0.13  $\mu$ m SiGe:C BiCMOS technology,” in *6th European Microwave Integrated Circuits Conference (EuMIC)*, Manchester, UK, Oct. 2011.
- A. Awny, A. Thiede, J. Borngräber, M. Elkhouly, and J. C. Scheytt, “Speed/power performance of D-type flip-flops in a 0.13  $\mu$ m SiGe:C HBT technology demonstrated by a 86 GHz static frequency divider,” in *German Microwave Conference (GeMiC)*, Berlin, Germany, Mar. 2010.
- A. Awny, A. Thiede, M. Elkhouly, J. Borngräber, F. Korndörfer, and J. C. Scheytt, “Mixed-signal techniques in mm-wave range for 100 Gbit decision feedback equalizer,” in *10th Topical Meeting on Silicon Monolithic Integrated Circuits in RF Systems (SiRF)*, New Orleans, LA, USA, Jan. 2010.

- A. Awny, A. Thiede, F. Korndörfer, and J. C. Scheytt, "mm-wave and logic acceleration techniques for 100 Gbit decision feedback equalizer," in *German Microwave Conference (GeMiC)*, Munich, Germany, Mar. 2009.