C O M P I L AT I O N F O R M O R E P R A C T I C A L
S E C U R E M U LT I - PA RT Y C O M P U TAT I O N

Vom Fachbereich Informatik der
Technischen Universität Darmstadt genehmigte

Dissertation

zur Erlangung des akademischen Grades
Doktor-Ingenieur (Dr.-Ing.)

von

Niklas Büscher, M.Sc.
geboren in Münster

Referenten: Prof. Dr. Stefan Katzenbeisser

Prof. Dr. Florian Kerschbaum

Tag der Einreichung: 21.11.2018

Tag der Prüfung: 11.01.2019

D 17

Darmstadt, 2018


Dieses Dokument wird bereitgestellt von tuprints, E-Publishing-Service
der TU Darmstadt.
http://tuprints.ulb.tu-darmstadt.de
tuprints@ulb.tu-darmstadt.de

Bitte zitieren Sie dieses Dokument als:
URN: urn:nbn:de:tuda-tuprints-84950

URL: https://tuprints.ulb.tu-darmstadt.de/id/eprint/8495

Die Veröffentlichung steht unter folgender Creative Commons Lizenz:
Attribution – NonCommercial – NoDerivatives 4.0 International
(CC BY-NC-ND 4.0)
http://creativecommons.org/licenses/by-nc-nd/4.0/

http://tuprints.ulb.tu-darmstadt.de
mailto:tuprints@ulb.tu-darmstadt.de
urn:nbn:de:tuda-tuprints-84950
https://tuprints.ulb.tu-darmstadt.de/id/eprint/8495
http://creativecommons.org/licenses/by-nc-nd/4.0/


A B S T R A C T

Within the last decade, smartphone applications and online services
became universal resources and an integral part of nowadays life. Un-
fortunately, the ongoing digitization trend is also a huge risk for the
individual’s privacy because users of these interconnected devices
and services become more and more transparent and reveal sensi-
tive data to an untrusted and possibly unknown service provider. Yet,
since the 1980’s it is known that any computation between two or
more parties can be evaluated securely such that the parties do not
learn more about the inputs of the other parties than they can derive
from the output of the computation. For a long time considered to be
a purely theoretical concept, in the last fifteen years, this technique
known as Secure Multi-Party Computation (MPC), transitioned into
a powerful cryptographic tool to build privacy-enhancing technol-
ogy. As such MPC could prevent mass surveillance of online services
while maintaining the majority of their business use cases. Further-
more, MPC could be an enabler for novel business-to-business use
cases, where mutually distrusting parties corporate by sharing data
without losing control over it. Albeit its potential, the practicality of
MPC is hindered by the difficulty to implement applications on top
of the underlying cryptographic protocols. This is because their man-
ual construction requires expertise in cryptography and hardware de-
sign. The latter is required as functionalities in MPC are commonly
expressed by Boolean and Arithmetic circuits, whose creation is a
complex, error-prone, and time-consuming task.

To make MPC accessible to non-domain experts, in this thesis we
design, implement, and evaluate multiple compilation techniques that
translate the high-level language ANSI C into circuit representations
optimized for different classes of MPC protocols. Split in two parts,
we focus on Boolean circuit based protocols in the first part of this
thesis. We begin with an introduction into compilation and optimiza-
tion of circuits with minimal size, which is required for constant
round MPC protocols over Boolean circuits, such as Yao’s Garbled
Circuits protocol. For this purpose, we identify and evaluate classic
logic minimization techniques for their application in compilation for
MPC. Then, we present compiler assisted parallelization approaches
for Yao’s protocol that distribute the computational workload onto
multiple processors, which can allow a faster or possibly more en-
ergy efficient protocol evaluation. By extending the protocol, we fur-
ther show that parallelization leads to speed-ups even in single-core
settings. As not only size minimization is of relevance for MPC, we
also propose a compilation chain for the creation of depth-minimized

iii


Boolean circuits, optimized for their use in multi-round protocols,
such as the GMW protocol. For this purpose, we propose and im-
plement new hand-optimized building blocks as well as code and
circuit minimization techniques. In most cases the presented compil-
ers create applications from high-level source code that outperform
previous (hand-optimized) work.

In the second part, we introduce compilers for two advanced hy-
brid MPC protocols. First, we study the creation of MPC applica-
tions using multiple (standalone) MPC protocols at once. By com-
bining protocols with different paradigms, e.g., Boolean and Arith-
metic circuits based protocols, faster applications can be created. For
the compilation of these hybrid applications we design and present
novel code decomposition and optimization techniques. Moreover,
we introduce solutions to the protocol selection problem to efficiently
combine multiple protocols. Thus, we are able to present the first
compiler that achieves full automatization from source code to hy-
brid MPC. Second, we investigate compilation for the combination of
Oblivious RAM with MPC, also known as RAM based secure compu-
tation (RAM-SC). RAM-SC is required in data intensive applications,
where circuit based protocols show limited scalability. A multitude of
ORAMs based on different design principles with different trade-offs
has been proposed. We explore all these design principles and corre-
sponding deployment costs in different scenarios, before introducing
a compiler that identifies an optimal selection of ORAM schemes for
a given input source code. As such, we present the first fully autom-
atized compile chain for RAM-SC programs.

In summary, we contribute in making MPC practical by improving
both, efficiency and automatized application generation.

Z U S A M M E N FA S S U N G

Eine der größten technischen Entwicklungen in den letzten 15 Jah-
ren ist der Fortschritt des Smartphones, die ständige Verfügbarkeit
von Informationen und Konnektivität, sowie die damit verbundene
Nutzung von mächtigen Onlinediensten. Leider birgt dieser Digita-
lisierungstrend auch große Risiken für die Privatsphäre der Nutzer,
denn diese werden immer transparenter für den teilweise unbekann-
ten Dienstanbieter. Es ist jedoch bereits seit Mitte der 1980er-Jahre be-
kannt, dass jegliche Art von Berechnung oder Interaktion zwischen
zwei oder mehreren Parteien auch privatsphärefreundlich durchge-
führt werden kann, so dass die teilnehmenden Parteien nicht mehr
über die Eingaben der anderen Parteien lernen können, als das was
sie sich von der gemeinsame Ausgabe ableiten können. Lange war
diese Idee der sicheren Mehrparteienberechnung (MPC) nur ein theo-
retisches Konzept, jedoch hat es sich in den letzten 15 Jahren in ein

iv


sehr praktisches kryptographisches Werkzeug entwickelt, um privat-
sphäreschützende Anwendungen zu realisieren. Als solches könnte
MPC helfen die Massenüberwachung der Onlinedienste zu unterbin-
den, ohne deren Geschäftsmodelle komplett außer Kraft zu setzen.
MPC ermöglicht auch neuartige Geschäftsmodelle, so dass beispiels-
weise zwei sich nicht vertrauende Unternehmen kooperieren und
Daten teilen, ohne dabei die Kontrolle über diese Daten an den Ge-
schäftspartner zu verlieren. Trotz des immensen Potentials von MPC
findet es kaum Anwendung in der Praxis. Dies liegt vor allem an der
Komplexität und dem Aufwand, um Anwendungen für existierende
MPC Protokolle zu entwickeln. Die Erstellung dieser Anwendungen
benötigt nämlich umfangreiche Kenntnisse der Kryptographie und
des Logikdesigns (Schaltungsbau). Letzteres wird benötigt, da Funk-
tionen in MPC üblicherweise in Form von logischen oder arithmeti-
schen Schaltungen dargestellt werden, deren händische Erzeugung
sehr fehleranfällig und zeitaufwendig ist.

Um MPC auch Nicht-Experten zugängliche zu machen, haben wir
im Rahmen dieser Arbeit verschiedene Übersetzungstechniken (Com-
piler) entwickelt, implementiert und evaluiert, die es ermöglichen
standard ANSI C Programmcode automatisiert in Schaltungen für
verschiedene Klassen von MPC Protokollen zu übersetzen. Diese Ar-
beit besteht aus zwei Teilen, wobei sich der erste Teil mit der Erzeu-
gung von logischen Schaltkreisen beschäftigt. Wir beginnen mit einer
Betrachtung der Konstruktion von Schaltungen mit minimaler Grö-
ße. Diese werden üblicherweise in MPC Protokollen mit konstanter
Rundenanzahl eingesetzt, wie beispielsweise Yao’s Garbled Circuits
Protokoll. Zu diesem Zweck analysieren und adaptieren wir klas-
sische Logikminimierungstechniken für MPC. Anschließend zeigen
wir, wie Yao’s Protokoll mit Hilfe eines Compilers effektiv paralleli-
siert werden kann, um die Protokollausführung auf Mehrprozessor-
systemen zu beschleunigen. Weiterhin präsentieren wir eine Protokol-
lerweiterung, die selbst auf einem einzigen Rechenkern die Berech-
nungszeit durch Parallelisierung reduzieren kann. Neben der Mini-
mierung der Schaltungsgröße, ist die Minimierung der Schaltungstie-
fe von besondere Relevanz für MPC Protokolle mit variabler Runden-
anzahl, wie beispielsweise dem GMW Protokoll. Für diese Art von
Protokollen präsentieren wir einen Übersetzungsansatz, bestehend
aus neuen handminimierten Schaltungsbausteinen und Schaltungs-
minimierungstechniken. Die im Rahmen dieser Arbeit entstandenen
Compiler, die die vorgestellten Techniken implementieren, erzeugen
Schaltungen aus einer Hochsprache, die meist schneller evaluiert wer-
den können als die zuvor vorgestellten und oft händisch optimierten
Schaltungen.

Im zweiten Teil dieser Arbeit beschäftigen wir uns mit der Schal-
tungserzeugung für hybride MPC Protokolle. Hier zeigen wir zum
einen die automatisierte Konstruktion von Anwendungen, die ver-

v


schiedene MPC Protokolle in einer Anwendung kombinieren. Durch
die Kombination von Protokollen mit verschiedenen Designprinzipi-
en, beispielsweise logische und arithmetische Schaltungen, können
Anwendungen weiter beschleunigt werden. Zur Erzeugung dieser hy-
briden Anwendungen präsentieren wir neue Codezerlegungs- und
Optimierungstechniken. Weiterhin stellen wir Lösungen vor, welche
die effizienteste Kombination von Protokollen für eine gegebene An-
wendungszerlegung identifizieren kann. Mit diesem Beitrag können
wir den ersten Compiler präsentieren, der eine vollständige Auto-
matisierung von Programmcode zu hybriden Anwendungen erzielt.
Als zweite hybride Protokollklasse betrachten wir die Compilierung
von RAM basierten Protokollen. Diese so genannten RAM-SC Pro-
tokolle kombinieren klassische MPC Protokolle mit Oblivious RAM
(ORAM) Techniken und werden für datenintensive Anwendungen
benötigt. Da existierende ORAMs verschiedenste Designprinzipien
verwenden, präsentieren wir eine umfangreiche ORAM Kostenana-
lyse mit der die Laufzeit von RAM-SC in beliebigen Einsatzszenari-
en abgeschätzt werden kann. Diese Abschätzung ermöglicht es uns,
den ersten voll automatisierten Compiler für RAM-SC vorzustellen,
der sämtliche Speicherzugriffe in einem gegebenen Programmcode
erfasst und die effizientesten ORAMs ermitteln kann.

Zusammenfassend trägt diese Arbeit dazu bei, MPC praxistaug-
licher und zugänglicher für Nicht-Experten zu machen, indem die
Entwicklungsprozesse automatisiert und die Effizienz von MPC An-
wendungen gesteigert wird.

vi


E R K L Ä R U N G

Hiermit erkläre ich, dass ich die vorliegende Arbeit – abgesehen von
den in ihr ausdrücklich genannten Hilfen – selbständig verfasst habe.

Niklas Büscher

A K A D E M I S C H E R W E R D E G A N G

11/2013 - 10/2018 Technische Universität Darmstadt

Wissenschaftlicher Mitarbeiter

Promotionsstudium

04/2011 - 10/2013 Technische Universität Darmstadt

Studium der Informatik

Abschluss: Master of Science

08/2011 - 05/2012 University of Boulder, Colorado, USA

Studium der Informatik

04/2008 - 03/2011 Technische Universität Darmstadt

Studium der Informatik

Abschluss: Bachelor of Science

vii


P U B L I C AT I O N S

This thesis is based on the following publications:

1. Niklas Büscher and Stefan Katzenbeisser. “Faster Secure Compu-
tation through Automatic Parallelization”. In: Proceedings of the
24th USENIX Security Symposium, USENIX Security 2015,
Washington, D.C., USA.

2. Niklas Büscher, Andreas Holzer, Alina Weber, and Stefan Kat-
zenbeisser. “Compiling Low Depth Circuits for Practical Secure Com-
putation”. In: Proceedings of the 21st European Symposium on
Research in Computer Security, ESORICS 2016, Heraklion,
Greece.

3. Niklas Büscher, David Kretzmer, Arnav Jindal, and Stefan Kat-
zenbeisser. “Scalable Secure Computation from ANSI-C”. In: Pro-
ceedings of the 8th IEEE International Workshop on Informa-
tion Forensics and Security, WIFS 2016, Abu Dhabi, United
Arab Emirates.

4. Niklas Büscher, Martin Franz, Andreas Holzer, Helmut Veith,
and Stefan Katzenbeisser. “On Compiling Boolean Circuits Opti-
mized for Secure Multi-Party Computation”. In: Formal Methods
in System Design 51(2).

5. Niklas Büscher and Stefan Katzenbeisser, “Compilation for Secure
Multi-Party Computation”, Springer Briefs in Computer Science
2017, ISBN 978-3-319-67521-3.

6. Niklas Büscher, Alina Weber, and Stefan Katzenbeisser, “Towards
Practical RAM-based Secure Computation”. In: Proceedings of the
23rd European Symposium on Research in Computer Security,
ESORICS 2018, Barcelona, Spain.

7. Niklas Büscher, Daniel Demmler, Stefan Katzenbeisser, David
Kretzmer, and Thomas Schneider. “HyCC: Compilation of Hybrid
Protocols for Practical Secure Computation.” In: Proceedings of the
25th ACM SIGSAC Conference on Computer and Communica-
tions Security, CCS 2018, Toronto, Canada.

Further peer-reviewed publications published during my doctoral
studies:

8. Johannes A. Buchmann, Niklas Büscher, Florian Göpfert, Ste-
fan Katzenbeisser, Juliane Krämer, Daniele Micciancio, Sander

viii


Siim, Christine van Vredendaal, Michael Walter. “Creating Cryp-
tographic Challenges Using Multi-Party Computation: The LWE Chal-
lenge”. In: Proceedings of the 3rd ACM International Workshop
on ASIA Public-Key Cryptography, AsiaPKC@AsiaCCS
2016, Xi’an, China.

9. Niklas Büscher, Stefan Schiffner, Mathias Fischer. “Consumer Pri-
vacy on Distributed Energy Markets”. In: Proceedings of the Pri-
vacy Technologies and Policy - 4th Annual Privacy Forum, APF
2016, Frankfurt, Germany.

10. Florian Kohnhäuser, Niklas Büscher, Sebastian Gabmeyer, Ste-
fan Katzenbeisser. “SCAPI: A Scalable Attestation Protocol to De-
tect Software and Physical Attacks”. In: Proceedings of the 10th
ACM Conference on Security and Privacy in Wireless and Mo-
bile Networks, WiSec 2017, Boston, USA.

11. Niklas Büscher, Spyros Boukoros, Stefan Bauregger, Stefan Kat-
zenbeisser. “Two Is Not Enough: Privacy Assessment of Aggregation
Schemes in Smart Metering”. In: Proceedings of the 17th Privacy
Enhancing Technologies Symposium, PETS 2017, Minneapo-
lis, USA.

12. Florian Kohnhäuser, Niklas Büscher, Stefan Katzenbeisser.
“SALAD: Secure and Lightweight Attestation of Highly Dynamic and
Disruptive Networks”. In: Proceedings of the 13th ACM Asia Con-
ference on Computer and Communications Security, AsiaCCS
2018, Incheon, Republic of Korea.

ix


A C K N O W L E D G M E N T S

Being at the final step of the doctorate allows me to thank every-
one who supported me along the way. Foremost, I am very grateful
to my advisor Stefan Katzenbeisser, for his invaluable guidance in
and beyond research, the enjoyable working environment, and espe-
cially for the many mountaineering trips to Kleinwalsertal. Further-
more, I thank Florian Kerschbaum, for agreeing to be the second re-
viewer and even more for going to any length in trying to keep me in
academia. My thank also goes to Christian Reuter, Thomas Schneider,
and Neeraj Suri for joining the defense committee.

I owe my first contacts to research to Marc Fischlin and Michael
Goesele, who both were independently willing to push undergradu-
ate projects, jointly tackled with Benjamin, into papers and sponsor-
ing our first conference visit. During my masters I was very warmly
welcomed into the TK security group led by Stefan and Mathias, who
gave me a first introduction into privacy research.

Then, during my doctoral studies, I had the honor to work with ex-
cellent colleagues in the SecEng team. Without revealing any details,
e.g., with whom I shared the most coffee breaks, I have to say that
it was my pleasure to work with André, Christian, Dominik, Erik,
Florian, Heike, Marius, Markus, Nikolaos I., Nikolaos II., Nikolay,
Philipp, Sebastian B., Sebastian G., Spyros, Tolga, and Ursula. Sim-
ilarly, I would like to thank the ENCRYPTO group, namely, Ágnes,
Amos, Christian, Daniel, Michael, Oleksandr, and Thomas for accept-
ing me as an alien MPC researcher, sharing many conference visits,
lunches, and coffees, which frequently led to fruitful discussions.

Likewise, I am very thankful for my co-authors and students, most
notably Alina for her passion and endurance in performing circuitry,
Andreas for introducing me to CBMC-GC, which provides a founda-
tion for the tools developed as part of this thesis, and David for his
commitment in compiler development as well as his exceptional ex-
pertise in C++ and information flow.

Many thanks go to my family for their continuous encouragement,
teaching me open-mindedness, and the importance of education. But
most of all, I am grateful for my wife Katharina, especially for her
never-ending support and patience throughout the past years.

x


C O N T E N T S

1 introduction 1

1.1 Research Goal and Contribution . . . . . . . . . . . . . 3

1.2 Thesis Outline . . . . . . . . . . . . . . . . . . . . . . . . 5

i preliminaries 7

2 basic techniques 8

2.1 Digital Logic and Boolean Circuits . . . . . . . . . . . . 8

2.2 Secure Multi-Party Computation . . . . . . . . . . . . . 10

2.3 Oblivious RAM and RAM-SC . . . . . . . . . . . . . . . 17

2.4 Compilers for MPC . . . . . . . . . . . . . . . . . . . . . 22

2.5 Benchmarking Applications for MPC Compilers . . . . 25

3 from ansi c code to boolean circuits 32

3.1 Motivation and Overview . . . . . . . . . . . . . . . . . 32

3.2 CBMC-GC’s Compilation Chain . . . . . . . . . . . . . 34

3.3 Complexity of Operations in MPC . . . . . . . . . . . . 42

ii compilation and optimization for boolean cir-
cuit based mpc protocols 45

4 compiling size-optimized circuits for constant-
round mpc protocols 46

4.1 Motivation and Overview . . . . . . . . . . . . . . . . . 46

4.2 Circuit Minimization for MPC . . . . . . . . . . . . . . 49

4.3 Building Blocks for Boolean Circuit Based MPC . . . . 50

4.4 Gate-Level Circuit Minimization . . . . . . . . . . . . . 54

4.5 Experimental Evaluation . . . . . . . . . . . . . . . . . . 58

4.6 Related Work . . . . . . . . . . . . . . . . . . . . . . . . 63

5 compiling parallel circuits 65

5.1 Motivation and Overview . . . . . . . . . . . . . . . . . 65

5.2 Parallel Circuit Evaluation . . . . . . . . . . . . . . . . . 67

5.3 Compiler Assisted Parallelization Heuristics . . . . . . 68

5.4 Inter-Party Parallelization . . . . . . . . . . . . . . . . . 75

5.5 Experimental Evaluation . . . . . . . . . . . . . . . . . . 81

5.6 Related Work . . . . . . . . . . . . . . . . . . . . . . . . 92

6 compiling depth-optimized circuits for multi-round

mpc protocols 94

6.1 Motivation and Overview . . . . . . . . . . . . . . . . . 94

6.2 Compilation Chain for Low-Depth Circuits . . . . . . . 96

6.3 Experimental Evaluation . . . . . . . . . . . . . . . . . . 106

6.4 Related Work . . . . . . . . . . . . . . . . . . . . . . . . 115

xi


iii compilation and optimization for hybrid mpc

protocols 116

7 compilation of hybrid protocols for practical

secure computation 117

7.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . 117

7.2 The HyCC MPC Compiler . . . . . . . . . . . . . . . . . 121

7.3 Protocol Selection and Scheduling . . . . . . . . . . . . 131

7.4 Experimental Evaluation . . . . . . . . . . . . . . . . . . 136

7.5 Related Work . . . . . . . . . . . . . . . . . . . . . . . . 140

8 compilation for ram-based secure computation 148

8.1 Motivation and Overview . . . . . . . . . . . . . . . . . 148

8.2 Analysis and Optimization of ORAMs for Secure Com-
putation . . . . . . . . . . . . . . . . . . . . . . . . . . . 151

8.3 Automatized RAM-SC . . . . . . . . . . . . . . . . . . . 156

8.4 Experimental Evaluation . . . . . . . . . . . . . . . . . . 158

8.5 Related Work . . . . . . . . . . . . . . . . . . . . . . . . 165

9 conclusion and future work 167

bibliography 170

xii


A C R O N Y M S

aig AND-Invert Graph

abb Arithmetic Black Box

asap As-soon-as-possible

ast Abstract Syntax Tree

bmc Bounded Model Checking

cfg Control Flow Graph

cgp Coarse-Grained Parallelization

cnf Conjunctive Normal Form

cnn Convolutional Neural Network

csa Carry-Save Adder

csn Carry-Save Network

dag Directed Acyclic Graph

dnf Disjunctive Normal Form

dpf Distributed Point Function

dsl Domain Specific Language

fa Full Adder

fgp Fine-Grained Parallelization

fss Function Secret Sharing

grr Garbled Row Reduction

ha Half Adder

ipp Inter-Party Parallelization

lsb Least Significant Bit

mpc Secure Multi-Party Computation

oram Oblivious RAM

pet Privacy-Enhancing Technology

pir Private Information Retrieval

xiii


ppa Parallel Prefix Adder

prf Pseudo Random Function

rca Ripple Carry Adder

rtt Round Trip Time

se Symbolic Execution

simd Single-Instruction-Multiple-Data

ssa Single Static Assignment

tpc Secure Two-Party Computation

xiv


“Die [...] Gestaltung von Datenverarbeitungssystemen sind
an dem Ziel auszurichten, so wenig personenbezogene Daten

wie möglich zu erheben, zu verarbeiten oder zu nutzen [...],
soweit dies nach dem Verwendungzweck möglich ist

und keinen im Verhältnis zu dem angestrebten Schutz-
zweck unverhältnismäßigen Aufwand erfordert.”

— Bundesdatenschutzgesetz (BDSG)
§3a Datenvermeidung und Datensparsamkeit

1
I N T R O D U C T I O N

The continuous grow of data gathering and processing, which is fired
by cheap sensors (e.g., in smart phones and wearables), cheap storage
costs, and efficient machine learning algorithms enables many use-
ful applications and powerful online services. However, this form of
heavy data processing, often referred to as ‘Big Data’, is also a huge
risk for the individual’s privacy because users of these services be-
come more and more transparent and reveal possibly sensitive data
to an untrusted and possibly unknown service provider. Even when
not assuming any malicious interest of the data collectors, the gath-
ered data is often centrally stored, and thus prone to many possible
breaches, e.g., hackers exploiting vulnerabilities, maliciously acting
employees, requests by state agencies, or insufficient data disposal
management.

Currently, all these risks are mostly ignored by the service providers
as well as the users. The former have (often legitimate) commercial in-
terests in their business use cases and the latter are either unaware of
the actual value of their data, lack alternatives, or have no possibilities
to opt-out if they want to participate in nowadays life. Consequently,
we observe a sincere conflict between business interest and the right
on privacy when processing personal data. Yet, there is a solution
to this dilemma: Namely, since the 1980’s it is (theoretically) known
that any computation over sensitive data from multiple parties can be
performed securely, such that the participating parties do not learn
more about the inputs of the other parties from the computation than
they can already derive from the output [Yao82; Yao86]. This form of
computation is known as Secure Computation or Secure Multi-Party
Computation (MPC) and can be explained by an example application:

Consider two parties Alice and Bob that want to schedule a joint
meeting. Yet, Alice does not want to reveal her own schedule, i.e., her
availability, to Bob, as it might contain sensitive information. For the
same reasons, Bob prefers to maintain his schedule secret. As a solu-
tion they can involve a third person, e.g., Charlie, whom they both

1


introduction 2

trust. Alice and Bob can send their private schedule, i.e., availabil-
ity slots, to Charlie, who identifies a matching time slot and reveals
this slot to Alice and Bob. Unfortunately, a third party that is trusted
by both parties barely exists in practice. MPC solves this problem by
simulating1 such a trusted third party using cryptographic protocols.
These protocols are run between the two parties, guarantee correct-
ness of the computation and privacy of the users’ data. Consequently,
MPC is a powerful Privacy-Enhancing Technology (PET), as the data
at the service providers can be processed under encryption, while
almost all functionality required for business use cases can be main-
tained.

Naturally, the questions arises: If such powerful cryptographic tech-
niques are available, why are they barely used in practice? The answer
is twofold. First, MPC protocols have noticeable deployment costs in
computation and communication. Second, creating, i.e., developing
efficient applications for MPC by hand, as it had often be done pre-
viously, is a tedious and error-prone task. In this thesis, we address
both challenges by presenting a multitude of compilation techniques and
implement these in compilers that automatically synthesize optimized MPC
applications for different deployment scenarios from high-level code. By au-
tomatizing the creation of MPC applications we significantly lower
the entry barrier to MPC.

Our work is of practical relevance because of two main reasons:
First, applied cryptography and IT infrastructures in general are get-
ting more and more complex. To handle this growing complexity, sim-
pler developer interfaces and further automatization are desirable to
create secure applications. This situation has been identified by indus-
try and academia alike. For example, replacements for standard cryp-
tography libraries have been proposed that provide clearer interfaces
to developers, e.g., [BLS12], and also dedicated tools have been de-
veloped that synthesize secure cryptographic implementations, e.g.,
[Krü+17]. Thus, automated compilation for MPC contributes in this
direction, as it reduces the complexity when designing new PETs.

Second, the right to privacy is known for ages in most civiliza-
tions and therefore most countries have issued data protection laws
that regulate the (commercial) use of data with different scopes and
strengths. For example, an excerpt of the German privacy law, which
has recently been adapted to the European General Data Protection
Regulation (GDPR), is printed at the beginning of this chapter. Loosely
translated, this excerpt says that all data processing systems should
be developed with the goal to use, store, and process as little per-
sonal data as possible, if this is reasonable given the technical efforts. The
proof-of-concept compilers developed as part of this thesis illustrate
that the technical efforts to implement privacy-preserving computa-
tion are reasonable and consequently the use of MPC in highly sensi-

1 Not meant in the cryptography way.


1.1 research goal and contribution 3

tive areas, e.g., genome processing or the handling of health records,
should be considered.

1.1 research goal and contribution

The goal of this thesis is the design, development, and evaluation of
compilation techniques for efficiency improvements of MPC, when
compiling from a high-level programming language. In this section,
we specify this goal, outline our contributions, and illustrate the con-
nection to related work.

secure multi-party computation. In MPC, two main re-
search directions are distinguished. First, MPC protocols dedicated
for specific applications have been developed, e.g., for recommender
systems [KP08] or privacy-preserving face recognition [Erk+09]. Sec-
ond, generic protocols have been proposed that allow to perform any
computation between two or more parties securely2. Dedicated pro-
tocols require cryptographers to prove their correctness and security,
yet in principal have the possibility to outperform the generic tech-
niques. Generic protocols only need to be proven secure once and
then offer significantly more versatility because any application can
realized on top of these protocols without further security proofs,
and thus without expert knowledge in cryptography. Due to these
reasons, many generic protocols using different cryptographic prim-
itives have been proposed, e.g., [Ara+17; BLW08; Dam+12; GMW87;
Yao82], and also many theoretical and practical optimizations, e.g.,
[Bel+13; KS08; Pin+09; ZRE15] made these protocols ready for prac-
tice. The focus of this thesis is the compilation for generic MPC pro-
tocols.3

application descriptions and compilation. At the heart
of (generic) MPC is the ability to efficiently represent the function-
ality to be computed. Most existing protocols require either a for-
mulation in terms of a Boolean or an Arithmetic circuit. Circuits are
acyclic graphs, where values flow along the edges, i.e., ‘wires’, and
are processed at nodes, representing either Boolean functions (in case
of Boolean circuits), e.g., logical AND, or basic Arithmetic operations
(in case of Arithmetic circuits). Hand-coding these circuits, i.e., effi-
ciently formulating the function to be computed in terms of basic
Boolean or Arithmetic operations, is practically infeasible even for
moderately complex functions, as the manual construction of efficient

2 The term secure is defined more precisely in Section 2.2.
3 The tools developed as part of this thesis are creating applications for MPC with

only two parties, commonly referred to as Secure Two-Party Computation (TPC).
Nevertheless, we remark that the ideas presented in this work transfer for protocols
with any number of parties. Because of this and for the sake of generality we will
use the term MPC in the remainder of this thesis.


1.1 research goal and contribution 4

circuits for MPC is a complex and time-consuming task that requires
expert knowledge in hardware design. Therefore, the first practical
framework (protocol implementation) [Mal+04] for Yao’s Garbled Cir-
cuits protocol [Yao86], which is one of the most studied MPC pro-
tocols, also provided a rudimentary compiler for a domain specific
language. In following works, e.g., [Hol+12; Hua+11b], compilation
has been identified as an independent task, separated from MPC pro-
tocol implementations. Since then, a multitude of compilers has been
proposed. We give a detailed overview and classification of these in
Section 2.4.

research goal In this thesis, we study compilation approaches
that translate a high-level language (ANSI C) into optimized circuits
suiting the requirements of different classes of MPC protocols with a
major focus on the optimization for Boolean circuit based protocols.
As such, we aim at compiling circuit descriptions that are compara-
ble or better than previous handmade constructions, given a suitable
high-level description. Consequently, we aim at increasing both, us-
ability and efficiency of MPC.

contributions . A major part of this thesis deals with the com-
pilation and optimization for Boolean circuit based MPC protocols.
Optimizing the created circuits is necessary, as MPC is still multi-
ple orders of magnitude slower than generic computation. Therefore,
we first study hand-optimized building blocks and a fixed-point op-
timization algorithm that adapts circuit synthesis techniques, known
from logic or chip design [Gaj+12; She93], for their use in compilation
for MPC. We implement and evaluate these techniques in the ANSI C
compiler CBMC-GC by Holzer et al. [Hol+12] and illustrate their
capabilities to compile efficient circuits that are capable of outper-
forming commercial hardware synthesis tools. As such, we present
a compile-chain that creates small circuits from a standard program-
ming language. These circuits are useful for all constant round MPC
protocols that operate on Boolean circuits, e.g., Yao’s Garbled Circuits
protocol.

Although creating highly optimized applications, we observed that
MPC in general, and Yao’s protocol in particular, still require substan-
tial computational resources. Therefore, we study parallelization tech-
niques for Yao’s protocol. We propose two parallelization approaches
and introduce a new parallel protocol variant that allows to profit
from parallelization even in single core settings. Furthermore, we
present the first complete and automatized compile chain that cre-
ates and partitions circuits from standard ANSI C code that can be
evaluated efficiently in parallel MPC frameworks.

While working on improving the application efficiency in Yao’s pro-
tocol, many further MPC protocols also advanced and became of prac-


1.2 thesis outline 5

tical relevance. Especially multi-round protocols based on Boolean
circuits, such as GMW [GMW86] and TinyOT [Nie+12] improved sig-
nificantly due to the advancements in Oblivious Transfers [Ash+13].
Also newer and faster protocols, e.g., [Ara+17], following the same
multi-round paradigm have been proposed. We address these devel-
opments and present an optimizing compiler from ANSI C that cre-
ates depth-minimized Boolean circuits for MPC. For this purpose, we
present new depth-minimized building blocks and adapt high- and
low-level optimization techniques, which allow to automatically gen-
erate circuits that outperform previously handmade constructions.

Recently also hybrid protocols, i.e., MPC protocols that combine
different techniques, e.g., Boolean with Arithmetic circuits or Boolean
circuits with Oblivious RAM (ORAM) [GO96], gained noticeable trac-
tion [DSZ15; Gor+12; Liu+14]. Therefore, we further extend the pre-
viously developed compilation chain to compile hybrid protocols. To
achieve this goal and to also cope with the development of increas-
ing circuit sizes, we present the first scalable compiler that allows to
compile, optimize, and partition hybrid MPC protocols consisting of
the combination of both Boolean and Arithmetic circuits.

Finally, we study the compilation and optimization for ORAM as-
sisted MPC protocols, also known as RAM-SC [Liu+14], which are
necessary for data intensive applications. For this purpose, we present
a cost analyses of relevant ORAM schemes, optimize these, and intro-
duce the first optimizing compilation chain for RAM-SC. In summary,
we make a wide range of MPC protocols accessible for non-domain
experts.

1.2 thesis outline

The remainder of this thesis is structured in three parts.
In Part I – Preliminaries we discuss basic techniques and notations.
In Chapter 2 we recap the basics of Boolean circuit design, give an
overview of MPC protocols relevant for this work, and describe re-
lated compilers for MPC. Moreover, we describe example applications
that are used as MPC benchmarks throughout this thesis. Then, in
Chapter 3 we describe a compiler, named CBMC-GC [Hol+12], which
translates ANSI C into Boolean circuits. This compiler is adapted and
extended in the following chapters to compile and optimize circuits
for the different protocols.

In Part II – Compilation and Optimization for Boolean Circuit based
MPC Protocols we present compilation techniques that focus on stan-
dalone MPC protocols that use Boolean circuits as application de-
scriptions. In Chapter 4 we present our first contribution by study-
ing multiple size-minimizing optimization techniques for their adap-
tion in circuit synthesis for MPC. The created circuits lead to ef-


1.2 thesis outline 6

ficiency improvements for all constant-round MPC protocols over
Boolean circuits. Next, in Chapter 5, we study the parallelization
of Yao’s Garbled Circuits and present a compilation chain that cre-
ates circuits, which can be efficiently evaluated in parallel. In Chap-
ter 6 we present a compilation and optimization approach that com-
piles depth-minimized Boolean circuits. These are required for multi-
round MPC protocols that are deployed practical network environ-
ments.

In Part III – Compilation and Optimization for Hybrid MPC Protocols
we extend our work towards more advanced MPC protocols. In Chap-
ter 7 we study the compilation of hybrid MPC applications that con-
sist of Boolean and Arithmetic circuits, which allows to mix constant
and multi-round MPC protocols efficiently. Furthermore, we present
a compilation chain for RAM-SC that suits the requirements of data
intensive applications in Chapter 8.

Finally, in Chapter 9 we conclude our work and give an outlook on
future research directions in compilation for MPC.


Part I

P R E L I M I N A R I E S


2
B A S I C T E C H N I Q U E S

Summary: In this chapter, we describe basic techniques required to fol-
low the ideas presented in this thesis. First, in Section 2.1 we describe the
circuit computation model, and recap the properties of Boolean circuits,
which are predominately used in this thesis. Then, in Section 2.2, we de-
scribe three of the most studied secure computation protocols, namely Yao’s
Garbled Circuits protocol and the Goldreich-Micali-Wigderson (GMW) pro-
tocol, which are prototypic for constant-round and multi-round MPC proto-
cols over Boolean circuits, as well as a multi-round additive secret sharing
based protocol over Arithmetic circuits. Furthermore, we introduce the con-
cept of Oblivious RAM (ORAM) in Section 2.3, and show how ORAM can
be combined with secure computation protocols to build more efficient ap-
plications. Afterwards, in Section 2.4 we present a comparison of related
compilers for MPC. Finally, we discuss multiple applications that are used
for benchmarking purposes in Section 2.5.

2.1 digital logic and boolean circuits

Boolean circuits are a common representation of functions in com-
puter science and a mathematical model for digital logic circuits. A
Boolean circuit is a Directed Acyclic Graph (DAG) with l inputs, m
outputs, and s gates. Technically, the graph consists of |V | = l+m+ s

nodes and |E| edges, where each node can either be of type input, out-
put or gate. Due to the relation of digital circuit synthesis, the edges
are called wires. We note that a Boolean circuit, as defined above, is
also referred to as a combinatorial circuit. Combinatorial circuits are
different to sequential circuits, which can be cyclic and carry a state.
As all MPC protocols described in this thesis evaluate combinatorial
circuits, their computational model is the combinatorial circuit model.

boolean gates . Each gate in a Boolean circuit has one or mul-
tiple input wires and one output wire, which can be input to many
subsequent gates. Each gate in a circuit represents a Boolean function
fg, which maps k input bits to one output bit, i.e., g(w1,w2, . . . ,wk) :
{0, 1}k → {0, 1}. In this thesis, we will only consider gates with at most
two input wires, namely unary and binary gates. The relevant unary
gate is the NOT gate (¬), while typical binary gates are AND (∧),
OR (∨), and XOR (⊕). Moreover, we distinguish between linear1 (e.g.,
XOR) and non-linear (e.g., AND, OR) gates, as they have different

1 The Boolean function represented by linear gates can be expressed by a linear com-
bination of its inputs over Z2, e.g., XOR(X, Y) = X+ Y mod 2.

8


2.1 digital logic and boolean circuits 9

evaluation costs in secure computation [KS08]. In some works on
MPC, non-linear gates are also referred to as non-XOR gates.

notation. For a function f, we refer to its circuit representation
as Cf. We use s to denote the total number of gates in a circuit, also
referred to as size, i.e., s = size(Cf). Moreover, we use snX to count
the number of non-linear (non-XOR) gates per circuit. We denote the
depth of a circuit by d and use dnX to denote its non-linear depth,
which is the maximum depth of all gates connected to an output
node, defined recursively for a gate g as:

dnX(g) =



0 if g is an input node,

dnX(w1) if g is an unary gate
with input w1,

max(dnX(w1),dnX(w2)) if g with inputs w1,w2
is linear,

max(dnX(w1),dnX(w2)) + 1 if g with inputs w1,w2
is non-linear.

Furthermore, we denote bit strings in lower-case letters, e.g., x. We
refer to individual bits, which can be part of a bit string, by capital
letters X and denote their negation with X. We refer to a single bit at
position i within a bit string as Xi. The Least-Significant Bit (LSB) is
denoted with X0. When writing Boolean equations, we denote AND
gates with ·, OR gates with + and XOR gates with ⊕. Moreover, when
useful, we abbreviate the AND gate A ·B with AB.

integer representation. Integers are represented in binary
form, as typical in computer science and logic synthesis. Hence, the
decimal value of a bit string x is

∑n−1
i=0 2

i · Xi. This common repre-
sentation has the advantage that arithmetic operations for unsigned
binary numbers such as addition, subtraction, and multiplication can
be reused for signed numbers. In the two’s complement, negative
numbers are represented by flipping all bits and adding one: −x =

x+ 1. In the following chapters, we assume a two’s complement rep-
resentation when referring to negative numbers.

fixed and floating point representation. To represent
real numbers, we use two representations. The fixed point represen-
tation has a fixed position of the radix point (the decimal point).
Hence, a binary bit string is split into an integer part and a frac-
tional part. The decimal value of a bit string in fixed point representa-
tion is computed as

∑n−1
i=0 2

i−r · Xi, where r is the position (counted
from the LSB) of the radix point. The IEEE-754 floating point rep-
resentation [Soc85] is the standard representation of floating point
numbers. It divides a bit string in three components, namely sign s,


2.2 secure multi-party computation 10

significant m and exponent e. Its real value is determined by com-
puting (−1)s ·m · 2e. In the IEEE-754 standard, the bit-width of each
component, as well as their range is determined. Moreover, various
error handling behavior is specified, e.g., overflow or division by
zero [Gol91].

2.2 secure multi-party computation

Secure Multi-Party Computation (MPC), also referred to as secure
computation, has been proposed in the 1980s as a more theoretical
construct [Yao82]; it only became practical in the last fifteen years.
MPC protocols are cryptographic protocols run between two or more
parties that allow to perform a joint computation of a functionality
f(x1, x2, . . . ) over the private inputs x1, x2, . . . of the participating
parties P1,P2, . . . with guaranteed correctness and privacy. Privacy in
MPC is understood as the guarantee that participating parties do not
learn more about the other party’s inputs than they could already de-
rive from the observed output of the joint computation. Thus, infor-
mally speaking, MPC allows collaboration among mutually distrust-
ing parties by realizing a virtual trusted party that receives the input
of all parties, computes the functionality, and returns the output to
the parties.

In MPC different adverserial models are distinguished. Most rel-
evant is the semi-honest (passive) model, where the adversary tries
to learn as much from the protocol execution as possible, yet does
not deviate from the protocol itself. This is opposed to the malicious
model, where the adversary is allowed to actively violate the protocol.
Examples of further adversarial models are the covert model by Au-
mann and Lindell [AL07] or the dual execution approach that leaks at
most one bit by Huang et al. [HKE12]. Examples for further security
properties of MPC protocols that are currently mostly of theoretic in-
terest are adaptive security (i.e., the adversary is allowed to choose
its input x while running the protocol), fairness (i.e., either all parties
receive the output of the computation or no party learns a single bit),
guaranteed output delivery, and robustness [CDN15].

As we focus on compilation in this thesis, we only give a high-level
description of three MPC protocols, where each is a representative
for a different class (in their application description and cost model)
of MPC protocols. For simplicity and explanatory reasons, we focus
on two-party protocols secure in the semi-honest adversary model, as
these are already sufficient to follow the ideas presented in this the-
sis. Typically, MPC protocols in the semi-honest model are building
blocks for protocols secure against malicious adversaries and are also
used for many privacy-preserving applications on their own. We de-
scribe the two most prominent MPC protocols over Boolean circuits,
namely Yao’s Garbled Circuits and the GMW protocol. Moreover, in


2.2 secure multi-party computation 11

Chapter 7, we study the compilation of hybrid protocols that consist
of multiple protocols, which involve Yao’s protocol, GMW, as well as
an additive secret sharing based protocol. Therefore, we also give an
introduction in MPC based on additive secret sharing.

We begin with an introduction into Oblivious Transfer, which is a
building block used for many MPC protocols and also used by the
protocols described in the remainder of this section.

2.2.1 Oblivious Transfer

Oblivious Transfer (OT) [Rab81; Kil88] is one of the most important
building blocks for MPC protocols. An Oblivious Transfer protocol
is a protocol in which a sender transfers one of multiple messages
to a receiver, but it remains oblivious to the sender which message
has been selected and retrieved by the receiver. At the same time, the
receiver is only able to learn the content of the selected message, yet
not anything about the other messages offered by the sender.

In this thesis, we mostly use 1-out-of-2 OTs, where the sender in-
puts two l-bit strings m0,m1 and the receiver inputs a bit C ∈ {0, 1}.
At the end of the protocol, the receiver obliviously receives mC such
that neither the sender learns the choice C, nor the receiver learns
anything about the other message m1−C.

In a classic result, Impagliazzo and Rudich [IR89] showed that OT
cannot be constructed from one-way functions in a black-box man-
ner. Moreover, as non-black-box constructions have also not been pre-
sented, it was assumed that OT can only be constructed from asym-
metric cryptography and thus was assumed to be very costly. How-
ever, in 2003 Ishai et al. [Ish+03] presented the idea of OT Extension,
which significantly reduces the computational costs of OTs for most
interesting applications of MPC. In OT Extension, a constant number
κ, i.e., the security parameter (e.g., κ = 128 bit), of OTs are realized
using a traditional OT protocol based asymmetric encryption and re-
ferred to as Base OTs. These Base OTs establish a symmetric key of
length κ that can subsequently be used to execute a larger number of
OTs with comparably cheap symmetric cryptography. Since Base OTs
only need to be established once between two parties, similar to a
key exchange protocol, their costs becomes negligible through amor-
tization in many applications. Finally, we remark that variants of OT
have been proposed that are beneficial for MPC, such as correlated or
random OT [Ash+13]. As a consequence, recent implementations of
OT Extension are capable of performing multiple millions of OTs per
second on a single core [Ash+13; KOS15].


2.2 secure multi-party computation 12

2.2.2 Yao’s Garbled Circuits Protocol

Yao’s garbled circuits protocol, proposed in the 1980s [Yao86], is the
most studied secure two-party computation protocol, secure in the
semi-honest model. The protocol is run between two parties P0, P1
and operates on functionality descriptions in form of Boolean cir-
cuits over binary Boolean gates. To securely evaluate a functionality
f(x,y) over their private inputs x and y, both parties agree on a circuit
Cf(x,y), which can be seen as the machine code for the protocol.

During protocol execution, one party becomes the circuit genera-
tor (the garbling party), the other the circuit evaluator. The generator
initializes the protocol by assigning each wire wi in the circuit two
random labels w0i and w1i of length κ (the security parameter), rep-
resenting the respective Boolean values 0 and 1. For each gate the
generator computes a garbled truth table. Each table consists of four
encrypted entries of the output wire labels wγo . These are encrypted
according to the gate’s Boolean functionality γ = g(α,β) using the
input wire labels wαl and w

β
r as keys. Thus, an entry gttαβo in the

table is encrypted as

gttαβo = Ewαl (Ewβr
(w
g(α,β)
o )).

After their creation, the garbled tables are randomly permuted and
sent to the evaluator, who, so far, is unable to decrypt a single row of
any garbled table due to the random choice of wire labels. To initiate
the circuit evaluation, the generator sends her input bits x in form
of her input wire labels to the evaluator. Moreover, the wire labels
corresponding to the evaluator’s input y are transferred via an OT
protocol, with the generator being the OT sender, who inputs the
two wire labels, and evaluator being the OT receiver, who inputs its
input bits of the computation. After the OT step, the evaluator is in
possession of the garbled circuit and one input label per input wire.
With this information the evaluator is able to iteratively evaluate the
circuit by decrypting a single entry of each garbled table, starting
from input wires and ending at the output wires. Due to the use of
a double-encryption scheme, the evaluator is unable to decrypt more
than one entry in each garbled table. Once all gates are evaluated,
all output wire labels are known to the evaluator. In the last step of
the protocol, the generator sends an output description table to the
evaluator, containing a mapping between output label and actual bit
value. The decrypted output is then shared with the generator. Note
that this procedure can be adapted to provide different outputs to
the two parties. Alternatively, the generator can encrypt the cleartext
output bits in the garbled tables of all gates connected to output wires,
which allows to reveal the output of the computation to the evaluator
without further communication.

Selective security of Yao’s Garbled Circuits in the semi-honest model
has been proven by Lindell and Pinkas [LP09] and recently also adap-


2.2 secure multi-party computation 13

tive security has been proven by Jafargholi and Wichs [JW16]. Yao’s
protocol has also been extended to be secure in the malicious model
in multiple works, e.g., [Hua+14; HKE13; LP15; Lin16].

implementations and optimizations . Yao’s original proto-
col has seen multiple optimizations in the recent past. Most important
are point-and-permute [BMR90; Yao86], which allows an efficient per-
mutation of the garbled table such that only one entry in the table
needs to be decrypted by the evaluator, garbled-row-reduction [NPS99],
which reduces the number of ciphertexts that are needed to be trans-
ferred per gate, free-XOR [KS08], which allows to evaluate linear gates
(XOR/XNOR) essentially for ‘free’ without any encryption or com-
munication costs, and finally the communication optimal half-gate
scheme [ZRE15], which requires only two ciphertexts per non-linear
gate while being compatible with free-XOR. Important for practical
implementations is the idea of pipelining [Hua+11b], which allows a
parallel circuit generation and evaluation and which is necessary for
the evaluation of larger circuits and a faster online execution of Yao’s
protocol, as well as the idea of fixed-key garbling [Bel+13], which al-
lows to achieve garbling speeds of more than 10 million gates per
second on a single CPU core using the AES instructions of modern
CPUs.

cost model . Considering all optimizations mentioned above, then
for a given Boolean function f(x,y) and its circuit representation
Cf(x,y), the evaluation costs of a circuit in Yao’s protocol are domi-
nated by the number of the input bits, and by the number of non-linear
gates in the circuit. In an amortized setting, the input gates are trans-
ferred via OT Extension. Current implementations achieve a speed
of more than 10 million OTs per second and require a bandwidth
of two ciphertexts per OT. The garbling and evaluation costs of lin-
ear gates are negligible for most applications, as communication is
the current practical bottleneck of Yao’s protocol. To garble a non-
linear gate, two entries from the garbled table of length κ each have
to be transferred. Assuming a standard key length of κ = 128 bit,
then a garbling throughput of 10 million non-linear gates per second
produces ≈ 2.8Gbit of data per second. Considering that the com-
munication requirement of two ciphertexts per gate is a proven lower
bound for known garbling schemes [ZRE15], minimizing the num-
ber of non-linear gates in a circuit is an important optimization goal
when compiling applications into circuits for Yao’s protocol. We re-
mark that this optimization goal holds for almost all MPC protocols
over Boolean circuits, as these have a mechanism similar to free-XOR,
which allow the evaluation of linear gates without communication.


2.2 secure multi-party computation 14

2.2.3 Goldreich-Micali-Wigderson (GMW) Protocol

The Goldreich-Micali-Wigderson (GMW) protocol [GMW87] also orig-
inates from the 1980s and allows two parties to securely compute
a functionality described in form of a Boolean circuit. In contrast to
Yao’s protocol, which uses the idea of garbled tables, GMW is built on
top of Boolean secret sharing. Each input and intermediate wire value
is shared among the two parties using an XOR based sharing scheme
over single bits, i.e., for every value V ∈ {0, 1} each party P ∈ {0, 1}
holds a share VP, indistinguishable from random, with V = V0 ⊕ V1.
Values can be revealed at the end of the computation by exchanging
and XORing the shares.

Given shared values, XOR gates can be evaluated locally without
any communication between the parties by XORing the respective
wire shares, e.g., both parties compute WP = UP ⊕ VP over their
shares of U and V to compute shares of W = U⊕ V . An AND gate
Z = X · Y requires the parties to run an interactive protocol. One
approach is to perform an 1-out-of-4 OT protocol [SZ13], where the
chooser inputs its share X0 and Y0, and the sender prepares the out-
put accordingly. Thus, the sender samples a random Z1 and computes
its input, i.e., different Z0’s, into the OT protocol according to

Z0 = Z1 ⊕ ((X0 ⊕X1) · (Y0 ⊕ Y1)),

such that Z0 ⊕Z1 = 1 iff X0 ⊕X1 = 1 and Y0 ⊕ Y1 = 1.
The same functionality can be realized in a pre-processing phase,

which is independent of the functionality to be computed and which
enables a very fast online phase that only requires the computation
of a few bit-wise operations. For this purpose Boolean multiplication
triples [Bea92] are used. A multiplication triple consists of three bits
A,B, and C, where C = A · B holds, that are shared between the
parties. Given such a triple, the two parties compute an AND gate
Z = X · Y, by opening (reconstructing) two bits E = A⊕ X and F =

B⊕ Y and computing their shares of Z as

ZP = P · E · F⊕ F ·AP ⊕ E ·BP ⊕CP.

The multiplication triples can be generated in a preprocessing phase
using the 1-out-of-4 OT protocol described above.

implementations and optimizations . A first implementa-
tion was given by Choi et al. [Cho+12] that was subsequently im-
proved by Schneider and Zohner [SZ13]. Furthermore, Asharov et
al. [Ash+13] showed that the triples can be precomputed using only
two random OTs with a total communication of 2κ bits, where κ de-
notes the key length in bits. A protocol secure against malicious ad-
versaries, named TinyOT, was proposed by Nielsen et al. [Nie+12].


2.2 secure multi-party computation 15

cost model . Since the multiplication triples, which are the most
expensive part of the computation, can be precomputed, only four
bits (two per party) for every AND gate need to be communicated
in the online phase. Moreover, we observe that the computation ef-
fort of AND and XOR gates is negligible in practice when compared
to the communication costs. Thus, when considering a typical band-
width (6 1Gbit) constrained deployment scenario, GMW outper-
forms Yao’s Garbled Circuits in gate throughput significantly, as only
a fraction bits per gate have to be transmitted. GMW is a multi-round
protocol, where the number of rounds is depending on the non-linear
circuit depth. All AND gates on the same layer in the circuit can be
computed in parallel and in the same communication round. Input
sharing is significantly cheaper than in Yao’s Garbled Circuits, as only
one bit per input wire has to be transmitted. We note that the compu-
tation and communication efforts in the preprocessing phase are com-
parable to the garbling cost of a gate in Yao’s Garbled Circuits [DSZ15;
Des+17].

In summary, the performance of the GMW protocol for a given
circuit Cf is dominated by the total number of non-linear gates snX as
well as the non-linear depth dnX of the circuit (number of layers of
AND gates), whereas the number of inputs and outputs marginally
influences the performance of the GMW protocol. This cost model is
prototypic for multi-round MPC protocols over Boolean circuits.

2.2.4 MPC Based on Additive Secret Sharing

In the later part in this thesis we make use of an MPC protocol
based on additive linear secret sharing that evaluates Arithmetic cir-
cuits consisting of only addition and multiplication gates. Many other
MPC protocols with different secret sharing schemes have been pro-
posed, and we refer the reader to the book of Cramer et al. [CDN15]
for a detailed introduction into the topic of MPC and secret sharing.

In this thesis, we make use of the two-party MPC protocol de-
scribed in [Ata+04; DSZ15], where an l-bit value is shared additively
in the ring Z2l . Thus, for every value v ∈ Z2l in the protocol, each
party P ∈ {0, 1} holds a share vP with v = v0 + v1 (this and all follow-
ing computations are performed modulo 2l). To enter a new value
x into the protocol, i.e., to share a value, party P samples a random
xP ∈ Z2l , computes x1−P = x − xP, and sends x1−P to the other
party. To output a shared value x, the parties send their own share
xP to the other party and locally compute x = x0 + x1. Additions
can be performed locally by adding the respective shares. Thus, to
compute z = x+ y over the shares of x and y, both parties compute
zP = xP + yP. Multiplications are performed with an interactive pro-
tocol, which can be precomputed using multiplication triples [Bea92;
Gil99]. To compute z = x · y over shared values x and y, a multiplica-


2.2 secure multi-party computation 16

tion triple c = a · b is used, such that both parties compute and then
reveal eP = xP − aP and fP = yP − cP. Given eP and fP, both parties
compute their share zP of z as follows:

zP = P · e · f+ f · aP + e · bP + cP.

Multiplication triples can be generated in the preprocessing phase,
independently of the computed functionality with the help of ho-
momorphic encryption [Ata+04] and additive blinding. Alternatively,
multiplication triples can be generated by performing a bit-wise OT
based protocol [Gil99].

implementations . An implementation of the protocol described
above is given by Demmler et al. [DSZ15]. Alternative implementa-
tions of MPC protocols based on additive linear secret sharing are
Sharemind [BLW08], which provides the richest set of functionalities,
VIFF [Dam+09], and the SPDZ [Dam+12] implementation [KSS13] as
well as FRESCO [Dam+16], which both provide security against ma-
licious adversaries.

cost model . The communication and computation costs in addi-
tive secret sharing based MPC is dominated by the number of multi-
plication gates and the multiplicative depth of the circuit. Similar to the
Boolean circuit based protocols, linear operations, i.e., additions, are
for free in the number of cryptographic operations and in communi-
cation. Multiplications have a small online cost, as only the respective
shares have to be transmitted and only cheap local operations are
performed. Moreover, similar to GMW, every multiplicative layer in
the circuit increases the number of communication rounds. The pre-
computation phase is the most expensive part of the protocol execu-
tion. For the generation of the multiplication triples, either multiple
(packed) encryptions in a homomorphic encryption systems or O(l)
OTs of length l have to be performed.

2.2.5 Hybrid Protocols

The efficiency of applications generated by compilers for all afore-
mentioned protocols is bound by the efficiency to represent elemen-
tary operations, also referred to as building blocks, in their respec-
tive cost model. Because of this, the protocols can have substantial
difference in their runtime for the same application. For example, ex-
pressing multiplications over nbit integers in Yao’s protocol or GMW
requires Boolean circuits of size O(n2). Due to this reason, multi-
ple applications based on hybrid protocols, i.e., a mix of multiple
standalone MPC protocols, have been proposed to achieve better ef-
ficiency by exploiting multiple function representations, e.g., [BG11;
Hua+11a; Nik+13a; SK11]. An application independent combination


2.3 oblivious ram and ram-sc 17

of protocols has been studied in [BLR13; Hen+10; SKM11; KSS14].
Most of these works combine Yao’s Garbled Circuits with homomor-
phic encryption, i.e., an asymmetric encryption scheme, which allows
to perform operations on the ciphertexts that resemble arithmetic op-
erations on the respective cleartexts, e.g., addition.

In this thesis, we evaluate hybrid protocols consisting of all three
aforementioned protocols, which has been described by Demmler et
al. [DSZ15]. To combine multiple sequentially aligned MPC proto-
cols into a hybrid protocol, an intermediate state has to be converted
securely between the different protocols. Such an intermediate state
consists of multiple values that are secretly shared between between
the computing parties. In Yao’s protocol, the wire label computed by
the evaluator and the mapping of wire label to cleartext value form
a sharing of one bit. In the two other protocols, the sharing is more
straight-forward, namely either an XOR (Boolean) sharing is used or
an additive (Arithmetic) sharing. The authors proposed conversion
protocols that securely transform one sharing into the other. For ex-
ample, a Yao sharing can be converted into a Boolean sharing at prac-
tically no cost by using the point-and-permute bits (cf. Section 2.2.2).

Unfortunately, other conversions require communication and com-
putation. For example, the conversion from an nbit Arithmetic shar-
ing into n Yao sharings or n Boolean sharings can be realized by eval-
uating a Boolean addition circuit. Thus, both parties have to enter
the bit decomposition of their arithmetic shares as input into either
GMW or Yao’s protocol and then evaluate an addition circuit. A more
detailed explanation of these conversion protocols and their costs is
given in [DSZ15].

2.3 oblivious ram and ram-sc

In Chapter 8, we will present a compilation approach for RAM based
MPC, which is often referred to as RAM-SC. In this section we give
an overview of relevant ORAMs used in RAM-SC, before describing
RAM-SC itself.

2.3.1 Oblivious RAM (ORAM)

Oblivious RAM (ORAM), first introduced by Goldreich and Ostro-
vsky in the context of software protection and simulation of RAM
programs [GO96], is a cryptographic primitive that allows to obfus-
cate the access pattern to an outsourced storage to achieve memory
trace obliviousness. Therefore, each logical access on some virtual ad-
dress space is translated into a sequence of physical accesses on the
memory, which appears to be random to observers, resulting in the
security guarantee that two sequences of virtual accesses of the same
length produce indistinguishable physical access patterns.


2.3 oblivious ram and ram-sc 18

ORAMs are commonly modeled as protocols between an ORAM
client, who is the data owner, and an untrusted ORAM server, who
provides the physical storage. Typically, an ORAM construction is
comprised of two distinct algorithms, the initialization and the ac-
cess algorithm. Thereby, the initialization algorithm introduces a new
oblivious structure for a given number of elements, while the access
algorithm performs an access to one of the elements in the virtual
memory space using a sequence of accesses on the physical memory,
hiding the accessed virtual index and also the type of operation, i.e,
read or write. An ORAM has a capacity m, which describes the num-
ber of data elements it can store. Moreover, most ORAMs require to
store metadata for each data element, which in combination with the
element itself is referred to as block.

The design goals of standalone ORAM constructions are manifold,
e.g., minimizing client side storage, communication or computation
costs. Therefore, many optimized ORAMs have been proposed, e.g.,
[GO96; KLO12; LO13; Ste+13]. For their combination with MPC (de-
scribed next), a different cost model applies, because the ORAM client
has to be evaluated as a circuit. In this thesis, we study the most ef-
ficient known ORAMs for MPC, namely Circuit-ORAM (C-ORAM)
[WCS15], optimized Square-Root ORAM (SQ-ORAM) [Zah+16], FLO-
RAM [Ds17a], and the FLORAM variant CPRG (FCPRG) [Ds17a].

circuit oram (c-oram). C-ORAM by Wang et al. [WCS15] is an
MPC-optimized derivative of Path ORAM [Ste+13], which is the most
practical known standalone ORAM to date. C-ORAM is a tree-based
ORAM that stores m data elements in a binary tree structure of at
most log2 (m) stages and an additional root level called stash, which
can also be imagined as a cache. Each node in the tree is a smaller
ORAM itself, called bucket ORAM, that stores multiple blocks. Each
data element is randomly mapped to one of the leaf nodes, maintain-
ing the invariant that an element with leaf identifier l is contained
in one of the buckets on the path from the root node to leaf l or in
the stash. To read or write an element in tree-based ORAMs, the ac-
cording path from root to the leaf is read, a new leaf identifier for the
read element is chosen, and the accessed path is moved to the stash.
Moreover, after each access an eviction procedure is initiated, which
writes blocks from stash into the tree while moving blocks as close as
possible to their designated leaves.

C-ORAM requires a recursive position map in RAM-SC, which as-
sociates the virtual index of each element with the position in the tree.
Henceforth, a tree-based ORAM has several construction parameters,
including the size of the bucket ORAMs B, the number of recursive
steps r, and a packing factor c describing the number of mappings
contained in one block of the recursive ORAMs.


2.3 oblivious ram and ram-sc 19

square-root oram (sq-oram). SQ-ORAM was one of the two
ORAMs introduced in the seminal paper by Goldreich and Ostro-
vsky [GO96] and later optimized for MPC by Zahur et al. [Zah+16].
SQ-ORAM uses a fundamentally different strategy than Path ORAM.
Its core idea is to randomly permute the memory and to periodically
refresh this permutation. For m elements, the so-called permuted mem-
ory has size m+

√
m, Furthermore, a shelter/stash for

√
m elements

is used. The simulation of a RAM program takes place in so called
epochs of

√
m steps, consisting of three phases: In a first step the

memory is obliviously permuted using a permutation π that assigns
each element a position in the permuted memory by using random
tags assigned to each element. Afterwards,

√
m virtual accesses can

take place, during which the updated values are written to the shel-
ter. To access an element at index v, first the entire shelter is scanned.
If the element cannot be found, the permuted memory is accessed
to retrieve the element at position π(v), otherwise, the element has
previously been visited and can thus be found in the shelter, and a
dummy access to the permuted memory is performed. As last step of
an epoch, the permuted memory is updated according the shelter.

Zahur et al. [Zah+16], removed the use of Pseudo Random Func-
tions (PRFs) (needed for the permutation), dummy elements, expen-
sive oblivious sorting algorithms, and identified public metadata. Fur-
thermore, they introduced the usage of recursive maps to compute
the mapping between virtual and physical addresses.

floram . FLORAM, recently introduced by Doerner and she-
lat [Ds17a], differs from the other ORAM constructions as it is built
from Function Secret Sharing (FSS), introduced by Boyle et al. [BGI15].
FLORAM is a distributed ORAM [Ds17b; LO13], where the data is
stored in a secret shared manner (XOR) between two servers. Ele-
ments are accessed using Private Information Retrieval (PIR) tech-
niques, i.e., a short query is evaluated on all elements on the server
to extracted the desired element. A point function fα,β(x) evaluates
to β if x = α and 0 otherwise. Informally, FSS allows to share a
Distributed Point Function (DPF) in such a way, that the parties can
evaluate the point function on arbitrary input, yet neither learn α nor
β. This feature allows the client to send the servers specially crafted
queries q0(i) or q1(i) using FSS to retrieve or to write an element at
index i.

FLORAM distinguishes a read-only memory (OROM) and write-
only memory (OWOM). Data in both is stored using an XOR sharing,
yet in OROM, each share is additionally masked using a PRF with a
key known only to the storing party. After a number of write accesses,
the OWOM memory is converted into the OROM. This process is re-
ferred to as refresh. To read an index i, the client shares a DPF that
evaluates to 1 on input i and to 0 otherwise. The servers evaluate


2.3 oblivious ram and ram-sc 20

the DPF on all indices, multiply the result with each associated ele-
ment share and return their aggregated result to the client. Writing is
performed using a similar approach. A conversion between OWOM
and OROM is too expensive to be performed after every write access,
therefore a stash is used that functions as a cache between refreshes.
The stash is scanned when performing read accesses to identify up-
dated elements.

floram cprg (fcprg). FCPRG [Ds17a] is an extension to FLO-
RAM. The most expensive computation on the client side of FLO-
RAM is the creation of the FSS scheme, which requires to compute
2 · log2(m) PRFs for every access. In MPC the evaluation of the PRFs
can render the scheme expensive and therefore, with FCPRG the au-
thors propose to compute the PRFs locally, i.e., outside of MPC, with
the trade-off thatO(log2(m)) interactions between the computing par-
ties are required.

2.3.2 RAM based MPC (RAM-SC)

MPC protocols evaluate functionalities represented as circuits. Cir-
cuits allow to express arbitrary computations, yet random memory
accesses have to be expressed as a chain of multiplexers of the com-
plete memory, referred to as linear scan (LS). This limits MPC for
applications that rely on dynamic memory accesses. Therefore, Gor-
don et al. [Gor+12] proposed to combine MPC protocols with ORAM
to enable dynamic memory accesses with sublinear overhead. The
authors describe a RAM machine, where the circuit computes an
oblivious machine that evaluates instructions and memory accesses.
A complete RAM machine is often not necessary, and thus the so-
called RAM-SC model was later refined by Liu et al. [Liu+14] for
practical efficiency. Its major concepts are described in the following
paragraphs.

First, the parties performing the MPC protocol also act as distributed
ORAM server, and the ORAM client is implemented as circuit evalu-
ated by the MPC protocol itself. Thus, both roles are shared between
the computing parties. Intuitively, privacy is preserved because the
MPC protocol acts as a virtual third party emulating the ORAM client.
Second, a program is evaluated by interweaving the MPC protocol
with ORAM accesses. Consequently, a RAM-SC program consists of
many small protocols that either perform a computation or an ORAM
access. This behavior is exemplary illustrated in Figure 1.

The construction of RAM-SC, as described, is very generic because
it allows to combine different MPC protocols and ORAMs. When as-
suming composable ORAMs, we observe that in one RAM-SC pro-
gram multiple ORAMs of possibly different type can be used, e.g.,
one ORAM for each array in the input program. Moreover, as in stan-


2.3 oblivious ram and ram-sc 21

P1 P2

computa�ons

x = 1023

oram_read(5)

decrypt(b1, b6, b7)

computa�ons

x = array[5]

MPC protocol

(ORAM client)

ORAM1

(server)

ORAM2

(server)

Figure 1: Exemplary and simplified illustration of RAM-SC. A program flow
is illustrated that is computed within an MPC protocol, run be-
tween two parties P1 and P2. At some point, a value is read from
an array with virtual index 5. Therefore, a circuit representing the
ORAM client functionality is executed that translates the virtual in-
dex into multiple physical addresses. These addresses are revealed
to both parties, who enter the blocks as input to the MPC protocol.
Not shown here is, that often also an intermediate state has to be
transferred between two MPC protocol.

dalone ORAMs, the blocks stored on the ORAM server have to be
encrypted. This can be realized by performing an encryption and de-
cryption within a circuit, which requires (even highly optimized) a
substantial amount of gates, e.g., 5000 non-linear gates to encrypt a
single block of 128 bits AES [BP12], using a secret sharing scheme,
e.g., XOR sharing [Ds17a], or by soldering the existing garbled labels
based on the publicly revealed index [Zah+16]. In the XOR sharing
approach a physical block is read by entering the shares as input to
the MPC protocol, which are then recombined within the protocol.
Similarly, to write to one or multiple blocks, the MPC protocol out-
puts one share for each block to every party. When using the solder-
ing approach, the circuit garbler re-uses the existing wire labels but
remaps them according the accessed indices reveled to both parties,
similar to a multiplexer (array) access with public index. We also re-
mark that in RAM-SC, the ORAM access type, i.e., read or write, can
be revealed to both parties, as the algorithm description is seen as
public knowledge. This access type is also referred to as semi-private
access [Ds17a].

security. RAM-SC provides the same privacy and correctness
properties against semi-honest adversaries as traditional MPC pro-
tocols [Gor+12]. RAM-SC with security against malicious adversaries
has been studied in [Afs+15].

complexity. The computation and communication complexity of
a RAM-SC protocol depends on the circuit complexity of the com-
putation, the circuit complexity of the ORAM client, the number of
protocol rounds, as well as additional ORAM protocol costs that are


2.4 compilers for mpc 22

performed outside of the MPC protocol. For ORAMs with less than
O(m) computations or less thanO(m) bandwidth RAM-SC is (asymp-
totically) more efficient than any circuit based MPC protocol.

2.4 compilers for mpc

Due to the complexity of circuit design, multiple compilers for MPC
have been proposed. In this section we give an overview and classifi-
cation of these, which partially form the related work of this thesis.

compiler classification. In general, compilers for MPC share
many similarities with high-level synthesis tools [Gaj+12], known
from digital circuit design. Nevertheless there are substantial differ-
ences between these tools and compilers for MPC, which will be
studied in detail in the following chapters. Therefore, we restrict our
comparison and classification to compilers that have explicitly been
proposed for their use in MPC.

Compilers for MPC can be categorized based on whether they com-
pile a (minimalistic) Domain Specific Language (DSL) or a widely
used common programming language. Moreover, compilers can be inde-
pendent or integrated into an MPC framework. Integrated compilers
produce an intermediate representation, which is interpreted (instan-
tiated by a circuit) only during the execution of an MPC protocol.
These interpreted circuit descriptions commonly allow a more com-
pact circuit representation. Independent compilers create circuits in-
dependent from the executing framework, and thus have the advan-
tage that produced circuits can be optimized to the full extent during
compile time and are more versatile in their use in MPC frameworks.

Especially for Arithmetic circuits it is often hard to make a strict
separation. When compiling these circuits, compilers commonly tar-
get the so called Arithmetic Black Box (ABB) model [DN03], which is
an abstraction between compiler and protocol implementation. The
ABB model provides representations for values and operations on
these, e.g., addition and multiplication, without providing details
about their protocol implementation to the compiler. In principle, an
ABB is Turing complete with only addition and multiplication. Yet,
practical efficiency is only achieved if further functionalities, such as
comparisons, are also provided. We consider an Arithmetic circuit
compiler to be independent if it compiles towards an ABB model
only consisting of elementary functions, involving comparisons and
possibly floating point operations, yet not high-level functionalities
such as sorting or statistical tests, as for example provided by Share-
mind [BLW08].

Some integrated compilers support the compilation of mixed-mode
secure computation. Mixed-mode computation allows to write code
that distinguishes between oblivious (private) and public computa-


2.4 compilers for mpc 23

tion. Thus, the public computation is performed locally in the clear,
interweaved by private computations insides an MPC protocol. This
leads to an even tighter coupling between compiler and execution
framework, but allows to express a mixed-mode program in a single
language. Moreover, some integrated compilers support the compila-
tion for hybrid secure computation protocols or RAM-SC. Next, we
give an overview of Boolean circuit and Arithmetic circuit compilers.

boolean circuit compilers . We begin by studying compilers
targeting Boolean circuit based MPC protocols that use a DSL as
input language. The Fairplay framework by Malkhi et al. [Mal+04]
started research on practical MPC. Fairplay compiles a domain spe-
cific hardware description language called SFDL into a gate list for
use in Yao’s protocol. Following Fairplay, Henecka et el. [Hen+10]
presented the TASTY compiler, which compiles its own DSL, called
TASTYL, into an interpreted hybrid protocol. The PAL compiler by
Mood et el. [MLB12] aims at low-memory devices as the compilation
target. PAL also compiles Fairplay’s hardware description language.
The KSS compiler by Kreuter et al. [KSS12] is the first compiler that
shows scalability up to a billion gates and uses gate level optimiza-
tion methods, such as constant propagation and dead-gate elimina-
tion. KSS compiles a DSL into a flat circuit format. TinyGarble by
Songhori et al. [Son+15] uses (commercial) hardware synthesis tools
to compile circuits from hardware description languages such as Ver-
ilog or VHDL. On the one hand, this approach allows the use of a
broad range of existing functionality in hardware synthesis, but also
shows the least degree of abstraction by requiring the developer to be
experienced in hardware design. We remark that high-level synthesis
from C is possible, yet, as the authors note, this leads to significantly
less efficient circuits. Recently, Mood et al. [Moo+16] presented the
Frigate compiler, which aims at very scalable and extensively tested
compilation of another DSL. Frigate and TinyGarble produce com-
pact circuit descriptions that compress sequential circuit structures.

Examples for mixed-mode and hybrid compilers involving Boolean
circuits from a DSL are L1, Wysteria, ObliVM, and Obliv-C. The L1

compiler by Schröpfer et al. [SKM11] compiles a DSL into a hybrid
protocol involving homomorphic encryption. Wysteria by Rastori et
al. [RHH14] creates Boolean circuits applied to GMW from a mixed-
mode DSL that is supported by a strong formal calculus. ObliVM by
Liu et al. [Liu+15b] extends SCVM [Liu+14], which both compile a
DSL that support the combination of oblivious data structures with
MPC. This approach allows the efficient development of oblivious
algorithms. Yet, both compilers provide only very limited gate and
source code optimization methods. The Obliv-C compiler by Zahur
and Evans [ZE15] also supports oblivious data structures, but fol-
lows a different, yet elegant source-to-source translation approach by


2.4 compilers for mpc 24

C
om

pi
le

r

La
ng

ua
ge

Ta
rg

et
Pr

ot
oc

ol

In
te

rp
re

te
d M

ix
ed

-m
od

e

M
at

ur
ity

[M
oo

+1
6]

C
od

e
Le

ve
lO

pt
im

iz
at

io
n

G
at

e
Le

ve
lO

pt
im

iz
at

io
n

C
BM

C
-G

C
’1

2
A

N
SI

C
Ya

o
no

no
ye

s
gl

ob
al

de
ad

ga
te

el
im

in
at

io
n,

co
ns

ta
nt

pr
op

ag
at

io
n,

th
eo

re
m

re
w

ri
ti

ng
,S

A
T

sw
ee

pi
ng

K
SS

’1
2

D
SL

Ya
o

no
no

no
no

co
ns

ta
nt

pr
op

ag
at

io
n,

de
ad

ga
te

el
im

in
at

io
n

PC
F’

1
3

A
N

SI
C

Ya
o

ye
s

ye
s

no
no

-

W
ys

te
ri

a’
1
4

D
SL

G
M

W
ye

s
ye

s
n/

a
lim

it
ed

lo
ca

lc
on

st
an

t
pr

op
ag

at
io

n,
de

ad
ga

te
el

im
in

at
io

n

O
bl

iv
C

’1
5

D
SL

A
N

SI
C

Ya
o

ye
s

ye
s

no
lo

ca
l

lo
ca

la
nd

lim
it

ed
[M

oo
+1

6
]

co
ns

ta
nt

pr
op

ag
at

io
n,

de
ad

ga
te

el
im

in
at

io
n

O
bl

iV
M

’1
5

D
SL

R
A

M
-S

C
ye

s
ye

s
no

no
-

Ti
ny

G
ar

bl
e’

1
5

Ve
ri

lo
g

V
H

D
L

Ya
o

&
G

M
W

no
no

ye
s

n/
a

de
ad

ga
te

el
im

in
at

io
n,

co
ns

ta
nt

pr
op

ag
at

io
n,

re
w

ri
ti

ng
,S

A
T

sw
ee

pi
ng

Fr
ig

at
e’

1
6

D
SL

Ya
o

no
no

ye
s

lo
ca

l
lo

ca
lc

on
st

an
t

pr
op

ag
at

io
n,

de
ad

ga
te

el
im

in
at

io
n

C
ir

cG
en

’1
7

A
N

SI
C

Ya
o

no
no

ve
ri

fie
d

gl
ob

al
de

ad
ga

te
el

im
in

at
io

n,
co

ns
ta

nt
pr

op
ag

at
io

n

Ta
bl

e
1
:C

om
pa

ri
so

n
of

re
ce

nt
Bo

ol
ea

n
ci

rc
ui

t
co

m
pi

le
rs

fo
r

M
PC

.C
om

pa
re

d
is

th
e

so
ur

ce
la

ng
ua

ge
,w

he
th

er
th

ey
co

m
pi

le
to

co
m

pl
et

e
ci

rc
ui

t
or

an
in

te
rm

ed
ia

te
re

pr
es

en
ta

ti
on

th
at

is
in

te
rp

re
te

d
du

ri
ng

ru
nt

im
e,

w
he

th
er

th
ey

su
pp

or
tm

ix
ed

-m
od

e
M

PC
,t

he
ir

m
at

ur
ity

(c
om

pi
la

ti
on

co
rr

ec
tn

es
s)

ba
se

d
on

th
e

st
ud

y
by

M
oo

d
et

al
.

[M
oo

+1
6

],
an

d
th

e
ap

pl
ie

d
op

ti
m

iz
at

io
n

te
ch

ni
qu

es
on

th
e

so
ur

ce
-c

od
e

le
ve

l
(c

on
st

an
t

pr
op

ag
at

io
n

an
d

co
ns

ta
nt

fo
ld

in
g)

an
d

on
th

e
ga

te
le

ve
l(

lo
gi

c
m

in
im

iz
at

io
n)

.


2.5 benchmarking applications for mpc compilers 25

compiling a modified variant of C into efficient mixed-mode applica-
tions. Very recently, EzPC [Cha+17] has been presented that compiles
mixed-mode hybrid protocols involving Boolean and Arithmetic cir-
cuits.

The CBMC-GC compiler [Hol+12] is the first compiler that creates
circuits for MPC from a common programming language (ANSI C).
CBMC-GC follows the independent compilation approach and pro-
duces a single circuit description. Moreover, CBMC-GC applies source
code optimization and provides a powerful symbolic execution. This
compiler is extended throughout this thesis. The PCF compiler by
Kreuter et al. [Kre+13] also compiles C, using the portable LCC com-
piler as a frontend. PCF compiles an intermediate bytecode repre-
sentation given in LCC into a interpreted circuit format. PCF shows
greater scalability than CBMC-GC, yet only supports comparably lim-
ited optimization methods that are only applied locally for every func-
tion. A (partially) formally verified compiler, named CircGen, has
been proposed by Almeida et al. [Alm+17] that extends the Com-
pcert compiler [Ler06] for ANSI C by a new backend that compiles
Boolean circuits.

A detailed overview of Boolean circuits compilers for MPC is given
in Table 1.

arithmetic circuit compilers . The Viff compiler [Dam+09]
jointly with the proposition of a DSL for MPC [NS07] are two of the
first works in the direction of compilation for Arithmetic circuit based
MPC. The most prominent compiler is the Sharemind compiler by
Bogdanov et al. [BLW08], which provides a production ready mixed-
mode compiler suite (and protocol framework) for a DSL called Se-
creC. The PICCO compiler by Zhang et al. [ZSB13] implements a
mixed-mode programming environment for C that supports paral-
lelization. The Armadillo compiler by Carpov et al. [CDS15] and the
ALCHEMY compiler by Crockett et al. [CPS18] target the creation
of Arithmetic circuits for homomorphic encryption. Recently, also an
integrated compiler for SPDZ has been proposed [Spd].

2.5 benchmarking applications for mpc compilers

MPC has many applications, e.g., privacy-preserving biometric au-
thentication [Erk+09], private set intersection [FNP04], or secure auc-
tions [Bog+09]. For research purposes, some (parts of) these appli-
cations have become popular for benchmarking purposes. The most
prominent example is the AES block cipher, which is the de facto stan-
dard benchmark for MPC protocols. In compiler research on MPC,
multiple different functionalities are commonly compiled into cir-
cuits, which are then compared regarding their properties, i.e., size,
depth and fraction of non-linear gates.


2.5 benchmarking applications for mpc compilers 26

In this section, we give an overview of these commonly bench-
marked functionalities, ranging from very small snippets, e.g., inte-
ger addition, to larger functionalities, e.g., IEEE-754 compliant float-
ing point operations or biometric authentication. In later chapters, we
use these functionalities to evaluate the techniques presented in this
book. Here, we discuss the functionalities together with their com-
plexity and motivate why these functionalities are relevant in the con-
text of MPC and the development of compilers.

Common benchmark functionalities are:

• Integer arithmetic. Due to their (heavy) use in almost every com-
putational problem, arithmetic building blocks are of high im-
portance in high-level circuit synthesis and are often bench-
marked individually. Most common building blocks are addi-
tion, subtraction, multiplication, and division. When studying
these building blocks, one should distinguish the results for dif-
ferent input and output bit widths. Namely, arithmetic opera-
tions can be differentiated in overflow-free operations, e.g., allo-
cating 2n bits for the result of a n×n bit multiplication, and op-
erations with overflow, e.g., when only allocating n bits for the
same multiplication. In later chapters we explicitly state which
input and output bit-widths are studied. To evaluate the scala-
bility of compilers, multiple sequentially (in-)dependent arith-
metic operations can be compiled as a single functionality.

• Matrix multiplication. Algebraic operations, such as matrix or
vector multiplications, are building blocks for many privacy-
preserving applications, e.g., for feature extractors in biometric
matching or the privacy friendly evaluation of neural networks,
and have therefore repeatedly been used to benchmark compil-
ers for MPC [Dem+15; Hol+12; Kre+13]. From an optimizing
compilers perspective, matrix multiplication is a comparably
simple task, as it only involves the instantiation of arithmetic
building blocks. However, it is very well suited to show the scal-
ability of compilers or the capability to fuse multiple arithmetic
operations into single statements, as required for depth mini-
mization (see Chapter 6). Matrix multiplication can be parametri-
zed according the matrix dimensions, the used number repre-
sentation (integer, fixed or floating point) and its bit-width.

• Modular exponentiation. Modular exponentiation, i.e., xymodp,
is relevant in secure computation, as it is a building block for
various cryptographic schemes. For example, blind signatures
can be realized with RSA using modular exponentiation. Blind
signatures allow a signing process, where neither the message
is revealed to the signing party, nor the signing key is revealed
to the party holding the message. Therefore, modular exponen-


2.5 benchmarking applications for mpc compilers 27

tiation has been used multiple times to study the performance
of MPC [Bel+13; DSZ15; KSS12].

• Distances. Various distances need to be computed in many pri-
vacy preserving protocols and are therefore relevant for MPC,
e.g., in biometric matching applications or location privacy ap-
plications. The Hamming distance is a measure between two bit
strings. Measured is the number of pairwise differences in every
bit position. Due to its application in biometrics, the Hamming
distance has often been used for benchmarking MPC compilers,
e.g., [Hol+12; KSS12; Moo+16]. The Manhattan distance

distMH = |x1 − x2|+ |y1 − y2|

between two points a = (x1,y1) and b = (x2,y2) is the distance
along a two dimensional space, i.e., along the Manhattan grid,
when only allowing horizontal or vertical moves. The Euclidean
distance is defined as

distED =
√
(x1 − x2)2 + (y1 − y2)2.

Due to the complexity of the square root function, it is com-
mon in MPC to benchmark the squared Euclidean distance sep-
arately, as its computation is usually sufficient when comparing
multiple distances. All distances can be parameterized accord-
ing to the used input bit-widths.

• Biometric matching. In biometric matching a party matches one
biometric sample (a vector of features) against the other’s party
database of biometric templates. Example scenarios are face-
recognition or fingerprint-matching [Erk+09]. One of the main
concepts is the computation of a distance (see above) between
a sample and all database entries. Once all distances are com-
puted, the minimal distance determines the best match. The
biometric matching application is very interesting for bench-
marking, as it involves many parallel arithmetic operations, as
well as a large number of sequential comparisons. The biomet-
ric matching application can be parameterized by the database
size, the sample dimension (number of features per sample), the
bit-width of each feature, and the used distance.

• Database analytics. Performing data analytics on sensitive data
has numerous applications and therefore many privacy-preserv-
ing protocols have been studied, e.g., [Bog+15; DHC04]. Using
generic MPC techniques is of special interest in analytics, as it
allows to perform arbitrary computations, e.g., hypothesis test-
ing, or allows to add data perturbation techniques, e.g., differen-
tial privacy, before releasing the result with minimal effort. This
task can be parametrized according the database size(s) and the
applied analyses.


2.5 benchmarking applications for mpc compilers 28

• Fixed and floating point arithmetics. Fixed and floating point num-
ber representations allow the computation on real numbers us-
ing integer arithmetic. Therefore, these representations are nec-
essary for all applications where numerical precision is required,
e.g., in privacy preserving statistics. The floating point represen-
tation is the more versatile representation, as it allows to repre-
sent a larger range of values, whereas fixed point arithmetic can
be realized with significant less costs and is therefore of interest
in MPC. Floating point operations are also very suited to evalu-
ate the gate level optimization methods of compilers since they
require many bit operations. When implementing floating point
arithmetic we follow the IEEE-754 standard.

• Gaussian elimination. Solving linear equations is required in many
applications with Gaussian elimination being the most studied
solving algorithm. Due to its wide range of use, it is also of
relevance in privacy-preserving applications. In this thesis we
evaluate our compiler using a textbook Gauss solver with par-
tial pivoting. The task can be parameterized by the number of
equations.

• Location-aware scheduling. Privacy-preserving availability schedul-
ing is another application that can be realized with MPC [Bil+11].
The functionality matches the availability of two parties over a
number of time slots, without revealing the individual schedule
to the other party. Location-aware scheduling also considers the
location and maximum travel distance of the two parties for a
given time slot. Therefore, the functionality outputs a matching
time slot where both parties are available and in close proximity
to each other (if existent). The functionality can be parameter-
ized by the number of time slots used per party, the representa-
tion of locations, and the distance measure, e.g., Manhattan or
Euclidean distance.

• Machine learning. Machine learning (ML) with its distinction in
super- and unsupervised learning techniques has many applica-
tions and is a very active field of research. Therefore, protecting
the privacy of training data or ML inputs is also an active re-
search area.

Supervised machine learning – Neural networks. One of the most
powerful ML techniques are Convolutional Neural Networks
(CNNs). Because of this, many dedicated protocols for private
data classification using CNNs have been proposed, e.g., [Gil+16;
Liu+17; Ria+18]. Computationally, CNNs consists of multiple
matrix multiplications concatenated with convolutions and non-
linear activation functions.


2.5 benchmarking applications for mpc compilers 29

Unsupervised machine learning – k-means. Clustering is another
ML task, frequently used to identify centroids in unstructured
data. One of the most well known clustering algorithms is k-
means, and multiple works proposed dedicated privacy-preserv-
ing k-means protocols, e.g., [JW05; VC03]. The k-means algo-
rithm can be parametrized according to the number of data
points, the number of clusters, and the number of iterations to
perform the algorithm.

• Median computation. The secure computation of the median is
required in privacy preserving statistics. It is an interesting task
for compilation, as it can be implemented using a sorting al-
gorithm, because a sorted array allows a direct access to the
median element. The compiled circuit depends on the used sort-
ing algorithm. In this book, we evaluate Bubble and Merge sort.
Moreover, the task can be parameterized according to the num-
ber of elements and their bit-width.

implementation differences . We remark that for a fair com-
parison between multiple compilers that compile the same function-
ality, it has to be made sure that the same algorithm as well as the
same abstraction level of the source code is used. To illustrate this
thought we give three example implementations for computing the
Hamming distance, which produce circuits of largely varying sizes
(cf., Section 4.5). The distance can be computed by XOR-ing the input
bit strings and then counting the number of bits. An exemplary im-
plementation is given in Listing 1 that computes the distance between
two bit-strings of length 160 bit, which are split over five unsigned in-
tegers. In Line 7 the number of ones in a string of 32 bit is computed.
This task is also known as population count. In the following para-
graphs we describe three different implementations.

1 #define N 5
2 void hamming () {
3 unsigned INPUT_A_x[N];
4 unsigned INPUT_B_y[N];
5 unsigned res = 0;
6 for(int i = 0; i < N; i++) {
7 res += count_naive32(INPUT_A_x[i]^ INPUT_B_y[i]);

8 }
9 unsigned OUTPUT_res = res;

10 }

Listing 1: Hamming distance computation between two bit strings.

The first implementation is given in Listing 2. In this naïve ap-
proach, each bit is extracted using bit shifts and the logical AND


2.5 benchmarking applications for mpc compilers 30

operator before being aggregated.

1 unsigned char count_naive32(unsigned y) {
2 unsigned char m = 0;
3

4 for(unsigned i = 0; i < 32; i++) {
5 m += (y & (1 << i)) >> i;
6 }
7

8 return m;
9 }

Listing 2: Counting bits using a naïve bit-by-bit comparison
approach.

A variant of this implementation is given in Listing 3. Here, the bit
string of length 32 is first split into chunks of 8 bit (unsigned char).
The ones set in each chunk are then counted as described above.

1 unsigned char count_naive8(unsigned char c) {
2 unsigned char m = 0;
3 for(int i = 0; i < 8; i++) {
4 m += (c & (1 << i)) >> i;
5 }
6 return m;
7 }
8

9 unsigned char count_tree32(unsigned y) {
10 unsigned char m0 = y & 0xFF;
11 unsigned char m1 = (y & 0xFF00) >> 8;
12 unsigned char m2 = (y & 0xFF0000) >> 16;
13 unsigned char m3 = (y & 0xFF000000) >> 24;
14

15 return count_naive8(m0) + count_naive8(m1) + \
16 count_naive8(m2) + count_naive8(m3);
17 }

Listing 3: Counting bits over unsigned chars in a tree based manner.

Finally, in Listing 4 a variant optimized for a CPU with 32 bit regis-
ters and slow multiplication is given. This implementation uses only
14 CPU instructions.

1 unsigned count_reg32(unsigned y) {
2 unsigned x = y - ((y >> 1) & 0x55555555);
3 x = (x & 0x33333333) + ((x >> 2) & 0x33333333);
4 x = (x + (x >> 4)) & 0x0f0f0f0f;
5 x += x >> 8;
6 x += x >> 16;
7 return x;
8 }

Listing 4: Counting bits, optimized for a 32 bit register machine.


2.5 benchmarking applications for mpc compilers 31

Fundamental implementation differences will inevitably lead to dif-
ferent compilation results. Therefore, all benchmarked functionalities
in this book are implemented using the same algorithms, data struc-
tures and data types when using them for the comparison with other
compilers even when using different input languages.


3
F R O M A N S I C C O D E T O B O O L E A N C I R C U I T S

Summary: The practicality of Secure Multi-party Computation (MPC) is
hindered by the difficulty to implement applications on top of the under-
lying cryptographic protocols. This is because the manual construction of
efficient applications, which need to be represented as Boolean or arithmetic
circuits, is a complex, error-prone, and time-consuming task. For the prac-
tical use of MPC, and thus the development of further privacy-enhancing
technologies, compilers supporting common programming languages are
desirable to provide developers an accessible interface to MPC.

In this chapter, we describe the translation approach that is followed by
the Boolean circuit compiler CBMC-GC [Hol+12], which is based on the
model checker CBMC [CKL04], and which compiles ANSI C into circuits
ready for their use in MPC protocols. Throughout this thesis, we will extend
the here presented tool-chain to create optimized circuits for MPC.

Remarks: This chapter is based in parts on Chapter 3 of our book – “Compi-
lation for Secure Multi-Party Computation”, Niklas Büscher and Stefan Katzen-
beisser, Springer Briefs in Computer Science 2017, ISBN 978-3-319-67521-3.

3.1 motivation and overview

When visualizing an MPC protocol as a hardware architecture that
can execute programs (functionalities), then the program descriptions
are Boolean circuits (e.g., for Yao’s Garbled Circuits [Yao86]) or Arith-
metic circuits (e.g., for the SPDZ protocol [Dam+12]). Consequently,
a compiler for MPC protocols compiles a functionality written in a
high-level language into an (optimized) circuit representation. The
core focus of this thesis is the creation of optimized Boolean circuits
that also allow developers without a professional background in com-
puter security and hardware design to create efficient applications for
MPC. All techniques presented in