SCALE-MAMBA v1.14: Documentation
A. Aly K. Cong D. Cozzo M. Keller E. Orsini D. Rotaru O. Scherer
P. Scholl N.P. Smart T. Tanguy T. Wood
August 2, 2021
Contents
1 Changes 7
1.1 Changes in version 1.14 from 1.13 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.2 Changes in version 1.13 from 1.12 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.3 Changes in version 1.12 from 1.11 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.4 Changes in version 1.11 from 1.10 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.5 Changes in version 1.10 from 1.9 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.6 Changes in version 1.9 from 1.8 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.7 Changes in version 1.8 from 1.7 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.8 Changes in version 1.7 from 1.6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.9 Changes in version 1.6 from 1.5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.10 Changes in version 1.5 from 1.4.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.11 Changes in version 1.4.1 from 1.4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.12 Changes in version 1.4 from 1.3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.13 Changes in version 1.3 from 1.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.14 Changes in version 1.2 from 1.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.15 Changes in version 1.1 from 1.0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.16 Changes From SPDZ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.16.1 Things no longer supported . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.16.2 Additions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.16.3 Changes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2 Introduction 17
2.1 Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
3.3.2 Data for secret sharing: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.3.3 Conversion Circuit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.4 Idiot’s Installation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4 Simple Example 27
4.1 Compiling and running simple program . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
4.2 The Test Scripts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
4.3 Run Time Switches for Player.x . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
5.3.25 Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
6 Compiler Pipeline 55
6.1 Getting Started . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
6.1.1 Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
6.2 The Inner MAMBA Compiler . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
6.2.1 Understanding the compilation output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
6.2.2 Program Level Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
6.2.3 Compilation comments regarding the tape enrollment: . . . . . . . . . . . . . . . . . . . . . 58
6.2.4 Offline data Requirements: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
6.3 The Old Compilation Pipeline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
6.3.1 Program Representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
6.3.2 Optimizing Communication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
6.3.3 Register allocation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
6.3.4 Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
6.4 The New Compilation Pipeline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
6.4.1 scasm Commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
6.4.2 scasm Shell Program . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
6.4.3 scasm Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
6.4.4 scasm Internals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
6.4.5 scasm Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
6.4.6 Things to have in mind . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
8.2.3 Public Output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
8.2.4 Public Input . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
8.2.5 Share Output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
8.2.6 Share Input . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
8.3 Other IO Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
8.3.1 Opening and Closing Channels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
8.3.2 Trigger . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
8.3.3 Debug Output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
8.3.4 Crashing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
8.4 MAMBA Hooks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
14 Advanced Protocols 126
14.1 Basic Protocols . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
14.1.1 Inv(⟨x⟩): . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
14.1.2 Ran∗_p(): . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
14.1.3 PreMult(⟨a_1⟩, . . . , ⟨a_t⟩, T): . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
14.2 Bit Oriented Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
14.2.1 OR(⟨a⟩, ⟨b⟩): . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
14.2.2 XOR(⟨a⟩, ⟨b⟩): . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
14.2.3 KOp(◦, ⟨a_1⟩, . . . , ⟨a_k⟩, k): . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
14.2.4 PreOp(◦, ⟨a_1⟩, . . . , ⟨a_k⟩, k): . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
14.2.5 Sum-Bits(⟨x⟩_B): . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
14.2.6 PRandInt(k): . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
14.2.7 PRandM(k, m, κ): . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
14.2.8 CarryOut(⟨a⟩_B, ⟨b⟩_B, k): . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
14.2.9 CarryOutAux(⟨d_k⟩_B, . . . , ⟨d_1⟩_B, k): . . . . . . . . . . . . . . . . . . . . . . . . 132
14.2.10 BitAdd((⟨a_{k−1}⟩, . . . , ⟨a_0⟩), (⟨b_{k−1}⟩, . . . , ⟨b_0⟩), k): . . . . . . . . . . . . . 133
14.2.11 BitLTFull(⟨a⟩_B, ⟨b⟩_B, k): . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
14.2.12 BitLT(a, ⟨b⟩_B, k): . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
14.2.13 BitDecFull(⟨a⟩): . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
14.2.14 BitDec(⟨a⟩, k, m, κ): . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
14.3 Arithmetic with Signed Integers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
14.3.1 Mod2m(⟨a′⟩, ⟨a⟩, k, m, κ, signed): . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
14.3.2 Mod2(⟨a⟩, k, κ, signed): . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
14.3.3 Addition, Multiplication in Z_⟨k⟩ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
14.3.4 Pow2(⟨a⟩, k, κ): . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
14.3.5 B2U(⟨a⟩, k, κ): . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
14.3.6 TruncPr(⟨a⟩, k, m, κ): . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
14.3.7 Trunc(⟨a⟩, k, m, κ, signed): . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
14.3.8 Oblivious Trunc(⟨a⟩, k, ⟨m⟩, κ): . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
14.3.9 LTZ(⟨a⟩, k, κ): . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
14.3.10 EQZ(⟨a⟩, k, κ): . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
14.3.11 Comparison Operators: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
14.4 Arithmetic with Fixed Point Numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
14.4.1 FxEQZ, FxLTZ, FxEQ, FxLT, etc: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
14.4.2 FxAbs(⟨a⟩, k, f) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
14.4.3 FxFloor(⟨a⟩, k, f) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
14.4.4 FxNeg(⟨a⟩, k, f) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
14.4.5 FxAdd(⟨a⟩, ⟨b⟩, k, f): . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
14.4.6 FxMult(⟨a⟩, ⟨b⟩, k, f): . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
14.4.7 FxDiv(⟨a⟩, b, k, f): . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
14.4.8 FxDiv(⟨a⟩, ⟨b⟩, k, f): . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
14.4.9 AppRcr(⟨b⟩, k, f): . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
14.4.10 Norm(⟨b⟩, k, f): . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
14.4.11 NormSQ(⟨b⟩, k): . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
14.4.12 SimplifiedNormSQ(⟨b⟩, k): . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
14.4.13 MSB(⟨b⟩, k): . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
14.5 Arithmetic with Floating Point Numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
14.5.1 FlowDetect(⟨p⟩): . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
14.5.2 FLNeg((⟨v⟩, ⟨p⟩, ⟨z⟩, ⟨s⟩, ⟨err⟩)): . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
14.5.3 FLAbs((⟨v⟩, ⟨p⟩, ⟨z⟩, ⟨s⟩, ⟨err⟩)): . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
14.5.4 FLMult((⟨v_1⟩, ⟨p_1⟩, ⟨z_1⟩, ⟨s_1⟩, ⟨err_1⟩), (⟨v_2⟩, ⟨p_2⟩, ⟨z_2⟩, ⟨s_2⟩, ⟨err_2⟩)): . . 147
14.5.5 FLAdd((⟨v_1⟩, ⟨p_1⟩, ⟨z_1⟩, ⟨s_1⟩, ⟨err_1⟩), (⟨v_2⟩, ⟨p_2⟩, ⟨z_2⟩, ⟨s_2⟩, ⟨err_2⟩)): . . . 147
14.5.6 SDiv(⟨a⟩, ⟨b⟩, ℓ): . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
14.5.7 FLDiv((⟨v_1⟩, ⟨p_1⟩, ⟨z_1⟩, ⟨s_1⟩, ⟨err_1⟩), (⟨v_2⟩, ⟨p_2⟩, ⟨z_2⟩, ⟨s_2⟩, ⟨err_2⟩)): . . . 148
14.5.8 FLLTZ((⟨v⟩, ⟨p⟩, ⟨z⟩, ⟨s⟩, ⟨err⟩)): . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
14.5.9 FLEQZ((⟨v⟩, ⟨p⟩, ⟨z⟩, ⟨s⟩, ⟨err⟩)): . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
14.5.10 FLGTZ((⟨v⟩, ⟨p⟩, ⟨z⟩, ⟨s⟩, ⟨err⟩)): . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
14.5.11 FLLEZ((⟨v⟩, ⟨p⟩, ⟨z⟩, ⟨s⟩, ⟨err⟩)): . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
14.5.12 FLGEZ((⟨v⟩, ⟨p⟩, ⟨z⟩, ⟨s⟩, ⟨err⟩)): . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
14.5.13 FLEQ((⟨v_1⟩, ⟨p_1⟩, ⟨z_1⟩, ⟨s_1⟩, ⟨err_1⟩), (⟨v_2⟩, ⟨p_2⟩, ⟨z_2⟩, ⟨s_2⟩, ⟨err_2⟩)): . . . 149
14.5.14 FLLT((⟨v_1⟩, ⟨p_1⟩, ⟨z_1⟩, ⟨s_1⟩, ⟨err_1⟩), (⟨v_2⟩, ⟨p_2⟩, ⟨z_2⟩, ⟨s_2⟩, ⟨err_2⟩)): . . . 149
14.5.15 FLGT((⟨v_1⟩, ⟨p_1⟩, ⟨z_1⟩, ⟨s_1⟩, ⟨err_1⟩), (⟨v_2⟩, ⟨p_2⟩, ⟨z_2⟩, ⟨s_2⟩, ⟨err_2⟩)): . . . 150
14.5.16 FLLET((⟨v_1⟩, ⟨p_1⟩, ⟨z_1⟩, ⟨s_1⟩, ⟨err_1⟩), (⟨v_2⟩, ⟨p_2⟩, ⟨z_2⟩, ⟨s_2⟩, ⟨err_2⟩)): . . 150
14.5.17 FLGET((⟨v_1⟩, ⟨p_1⟩, ⟨z_1⟩, ⟨s_1⟩, ⟨err_1⟩), (⟨v_2⟩, ⟨p_2⟩, ⟨z_2⟩, ⟨s_2⟩, ⟨err_2⟩)): . . 150
14.6 Conversion Routines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
14.6.1 FLRound((⟨v_1⟩, ⟨p_1⟩, ⟨z_1⟩, ⟨s_1⟩, ⟨err_1⟩), mode): . . . . . . . . . . . . . . . . . 151
14.6.2 Int2Fx(⟨a⟩, k, f): . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
14.6.3 Int2FL(⟨a⟩, γ, ℓ): . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
14.6.4 Fx2Int(⟨a⟩, k, f): . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
14.6.5 FxFloor(⟨a⟩, k, f): . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
14.6.6 Fx2FL(⟨g⟩, γ, f, ℓ, k): . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
14.6.7 FL2Fx((⟨v⟩, ⟨p⟩, ⟨z⟩, ⟨s⟩, ⟨err⟩), ℓ, k, γ, f): . . . . . . . . . . . . . . . . . . . . . . . 153
14.7 SQRT Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
14.7.1 ParamFxSqrt(⟨x⟩, k, f): . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
14.7.2 SimplifiedFxSqrt(⟨x⟩, k, f): . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
14.7.3 FxSqrt(⟨x⟩, k ← sfix.k, f ← sfix.f): . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
14.7.4 LinAppSQ(⟨b⟩, k, f): . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
14.7.5 FLSqrt((⟨v_1⟩, ⟨p_1⟩, ⟨z_1⟩, ⟨s_1⟩, ⟨err_1⟩)): . . . . . . . . . . . . . . . . . . . . . . 157
14.8 EXP and LOG Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
14.8.1 FxExp2(⟨a⟩, k, f): . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
14.8.2 FLExp2((⟨v_1⟩, ⟨p_1⟩, ⟨z_1⟩, ⟨s_1⟩, ⟨err_1⟩)): . . . . . . . . . . . . . . . . . . . . . . 160
14.8.3 FxLog2(⟨a⟩, k, f): . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161
14.8.4 FLLog2((⟨v_1⟩, ⟨p_1⟩, ⟨z_1⟩, ⟨s_1⟩, ⟨err_1⟩)): . . . . . . . . . . . . . . . . . . . . . . 162
14.9 Trigonometric Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
14.9.1 F?TrigSub(⟨x⟩): . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
14.9.2 F?Sin(⟨x⟩) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
14.9.3 F?Cos(⟨x⟩) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
14.9.4 F?Tan(⟨x⟩) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
14.10 Inverse Trigonometric Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
14.10.1 F?ArcSin(⟨x⟩) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
14.10.2 F?ArcCos(⟨x⟩) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
14.10.3 F?ArcTan(⟨x⟩) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
References 168
1 Changes
This section documents changes made since the “last” release of the software.
This is currently version 1.14 of the SCALE and MAMBA software. There are three main components to the system:
a run time system called SCALE (Secure Computation Algorithms from LEuven), and a Python-like programming
language called MAMBA (Multiparty AlgorithMs Basic Argot).
Note that MAMBA is a type of snake (like a Python), a snake has scales, and an argot is a “secret language”. We also
have a Rust-based programming language which will slowly replace MAMBA as the main programming language for
the system.
The software is a major modification of the earlier SPDZ software. You should not assume it works the same so please
read the documentation fully before proceeding.
1.3 Changes in version 1.12 from 1.11
Apart from the usual minor bug fixes...
1. Memory is now of fixed size, determined at compile time. This is defined by the *MEMSIZE variables in the file
config.h. Any access outside of the range causes the system to crash. The memory no longer grows dynamically
as your program asks for more (as it had done since version 1.7).
2. Memory can be allocated/deallocated in the runtime using the NEW and DELETE byte-codes. This is not really
used in the MAMBA pipeline, but is used in the Rust pipeline to simplify memory management and reduce
memory consumption.
3. Local functions associated with floating point arithmetic have been added. This is to support the ClearIEEE
type in the Rust pipeline.
4. Removed need for gmpy2.
5. Added MODINT instruction to complement the DIVINT instruction.
1. There has been a major upgrade to the processing of binary circuits when using Replicated or Shamir secret
sharing. This gives a (roughly) ten-fold performance improvement when executing GC commands, or performing
arithmetic on sregint’s. In particular this means all ShareData.txt files produced by the Setup.x
program need to be updated. For details of the differences between the two methodologies for executing binary
circuits see Chapter 10.
2. We now have an experimental way of programming the system using Rust. See the pdf file in the directory
RustDocumentation for an overview. Whilst this is not yet fully operational we welcome feedback and
language features. Warning: over time MAMBA will be deprecated as a means of programming the system
as the Rust variant becomes more fully operational.
3. Circuit for IEEE floating point ‘less than’ added.
7. We have added CALLR and JMPR instructions which call/jump to an absolute address indicated by a regint
argument.
8. The RUN_TAPE instruction now takes a fourth argument which specifies what value of the program counter you
should start the tape with.
9. Some code has been moved around to aid in calling Scale programmatically from other programs.
10. To compile you can now either use the traditional Makefile, or you can use Rust to compile the C++. To use this
latter option just set the environment variable USE_RUST to be equal to one, and then run Player.x. This
will compile the binary and then call it. Note, to compile the Test programs and the KeyGen program you still
need to use traditional Make. For this reason we have removed support for using CMake.
1.5 Changes in version 1.10 from 1.9
Apart from the usual minor bug fixes...
1. There has been some optimization of the low level math routines. This means there is a new compile time flag
called MAX_GFP_SIZE, which needs to be set. This corresponds to the maximum size of the finite field used
for the secret sharing scheme. The variable MAX_MOD_SIZE corresponds to the maximum size of the moduli
in the FHE scheme used for the Offline phase in the Full Threshold implementation; it must be bigger than
MAX_GFP_SIZE.
2. A compile time flag BENCH_OFFLINE now prints, at the end of the program, how much offline data your
program consumed. By using this data, tweaking the settings in config.h, and using the min and max
flags, you can (sometimes) get better latency for your programs.
3. Some extensions to the API for the stacks have been introduced; you can now peek and poke relative to the top
of the stack as well as relative to the bottom.
4. A negative value of verbose smaller than −1 now prints timing information for each instruction executed by the
virtual machine in the online phase.
5. Tidy up of the compiler regarding the location of functions etc. Fixed a lot of documentation, and tidied the definitions in the
process. Some functions are now removed/renamed; but these are more in the internal functions and less in the
exposed functions. If a function you have used seems to have changed (i.e. it is not found by the compiler) let
us know and we will tell you what it has been renamed to.
6. You can now convert back and forth between sint and sbit values directly, without having to go via a
costly sregint intermediate value.
7. You can now also convert between an sregint holding an IEEE floating point value and an sfloat (and vice-versa).
8. Modifications to how we use the MMO hash function in garbling and in OT extension in the main SCALE
program.
9. Added instructions for regint bitwise and shift operations; this may speed up some programs.
10. Corrected a silly mistake in the random number generation. This gives about a 10 percent improvement in
throughput for triples generation for full threshold access structures; plus minor improvements in other cases.
1.7 Changes in version 1.8 from 1.7
Apart from the usual minor bug fixes...
1. Added a constructor for sbit objects.
2. The ability to execute IEEE compliant floating point arithmetic using the Garbled Circuit engine has been added.
See Section 10.6.
3. Loads of bug fixes for SIMD operations on vectors have been implemented, plus a speed up for the SIMD input
and output of private gfp values.
4. Added a flag -dOT to Player.x to turn off the OT threads. This can increase speed and reduce memory, but
means all the GC based operations will no longer work.
5. Fixed some bugs in the low level documentation.
6. The GC operation now works by pushing and popping arguments and return values from the stack. See the
demo program as to how this is done, or the updated documentation in Section 10.
7. The Local Function operations work in the same way, via the stacks. Again see the demo program, or the
updated documentation in Section 11.
8. MAMBA functions for opening and closing channels are slightly changed.
9. daBit generation has been improved for all access structures for primes p > 2^64, using an adaptation of the method
in [RST+19]. This gives both a significant simplification in the code and a performance boost.
10. Added a method to convert sregint’s to sint’s where we treat the sregint as an unsigned 64-bit value.
11. Added circuits for the SHA-256 and SHA-512 compression functions.
12. Added more efficient circuits for basic sregint arithmetic. These operations should now be twice as fast.
1.10 Changes in version 1.5 from 1.4.1
Apart from the usual minor bug fixes...
1. New byte-code instruction SINTBIT (and associated function to utilize it from MAMBA) which sets a given
bit of an sregint to a specific sbit value.
2. Ability to define and call your own garbled circuit based operations, for user defined circuits. See Section 10 for
more details.
3. Ability to define and call your own complex local functions. See Section 11 for more details.
4. Extra thread for aAND production to make garbled circuit operations a bit more smooth.
1. We now have a full OT based n-party garbled circuit functionality integrated. The base random OTs can
either be derived from the Dual Mode Encryption method of [PVW08] or the SimpleOT method of [CO15]. The
former has, however, a simple attack against it when used to generate random OTs. Thus we by default utilize
the second method to perform the base OTs. This can be changed in the file config.h if needs be.
2. The base random OTs are converted into a large number of correlated OTs (COTs), for which we use [FKOS15][Full
Version, Figure 19] and [KOS15][Full Version, Figure 7]. These correlated OTs are then converted into random
sharings of authenticated bits (so called aShares/aBits); for this step we use [HSS17][Full Version, Figure 16].
Finally these aBits are converted into aANDs using [WRK17][Full Version, Figures 16, 8 and 18, in that order],
with the final GC protocol being implemented using [HSS17]. The hash function in the protocol to generate
HaANDs from [WRK17] is implemented using the CCR-secure MMO construction from [GKWY19].
3. We have extended the MAMBA language to include new datatypes sregint and sbit. These are secure
(signed) 64-bit and 1-bit integer datatypes respectively. These are operated on by using a fixed set of garbled
circuits. As such only certain operations are supported, roughly equivalent to the kind of arithmetic operations
you can do on C long values. Obviously we could extend the system to allow user defined circuits to be
executed on arbitrary data widths, but this would detract from our goal of trying to abstract away much of the
nitty-gritty of building MPC solutions.
4. To deal with conversion between LSSS and GC representations we use the daBit method from [RW19]. To
produce the conversion circuit needed we added a new part to the Setup routine. This conversion only works if
certain conditions are met by the prime used for the LSSS scheme; we will discuss this later in the relevant part
of Section 7.1.3.
5. We give advanced users the option of selecting their own prime for the full-threshold case.
6. Full threshold IO pre-processed data no longer does ZKPoKs, just as in the original SPDZ-2 implementation.
This is secure, as a moment’s thought will reveal.
7. The TopGear implementation has been upgraded to use the second version of the protocol, also other low-level
implementation optimizations have been performed which gives a massive boost to throughput.
3. Minor correction to how FHE parameters are created. This means that the FHE parameters are a bit bigger than
they were before. It is probably a good idea to re-run setup to generate new keys etc. in the Full Threshold case.
4. Minor change to how CRASH works. If the IO handler returns, then the RESTART instruction is automatically
called.
5. There is a new run time switch maxI for use when performing timings etc. This is only for use when combined
with the max switch below.
6. Two new instructions CLEAR_MEMORY and CLEAR_REGISTERS have been added. These are called from
MAMBA via clear_memory() and clear_registers(). The second may issue out of order (consider
it experimental at present).
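A minimal MAMBA sketch of calling the two hooks named above (illustrative only; the caveat that clear_registers may issue out of order applies):
result = (sint(3) * sint(4)).reveal()   # finish any computation you still need first
clear_registers()                       # experimental: may issue out of order
clear_memory()                          # wipe the global memory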
1.15 Changes in version 1.1 from 1.0
1. Major bug fix to IO processing of private input and output. This has resulted in a change to the byte-codes for
these instructions.
2. We now support multiple FHE Factory threads, the precise number is controlled from a run-time switch.
3. The restart methodology now allows the user to programmatically create new schedules and program tapes if so
desired. The “demo” functionality is however exactly as it was before. Please see the example functionality in
the file src/Input_Output/Input_Output_Simple.cpp.
4. We have added in some extra verbose output to enable timing of the offline phase. To time the offline phase,
on say bit production, you can now use the program Program/do_nothing.mpc, and then execute the
command for player zero.
Note square production on its own is deliberately throttled so that when run in a real execution bit production is
preferred over squares. By altering the constant in the program Program/do_nothing.mpc you can also
alter the number of threads used for this timing operation. If you enter a negative number for verbose then
verbose output is given for the online phase; i.e. it prints the byte-codes being executed.
5. The fixed point square root functions have now been extended to cope with full precision fixed point numbers.
6. The PRINTxxx byte-codes now pass their output via the Input_Output functionality. These byte-codes
are meant for debugging purposes, and hence catching them via the IO functionality makes most sense. The
relevant function in the IO class is debug_output.
7. We have added the ability to now also input and output regint values via the IO functionality, with associated
additional byte-codes added to enable this.
8. The IO class now allows one to control the opening and closing of channels; this is aided by two new byte-codes
for this purpose called OPEN_CHANNEL and CLOSE_CHANNEL.
9. Input of clear values via the IO functionality (i.e. for cint and regint values) is now internally checked to
ensure that all players enter the same clear values. Note, this requires modification to any user defined derived
classes from Input_Output_Base. See the chapter on IO for more details on this.
10. The way the chosen IO functionality is bound with the main program has now also been altered. See the chapter
on IO for more details on this.
11. These changes have meant there are a number of changes to the specific byte-codes, so you will need to re-
compile MAMBA programs. If you generate your own byte-codes then your backend will need to change as
well.
3. Socket connections, files, and other forms of IO to the main MPC engine are now unified into a single location.
This allows you to extend the functionality without altering the compiler or run-time in any way (bar changing
which IO class you load at compile time). See Section 8 for details.
1.16.2 Additions
1. The offline and online phases are now fully integrated. This means that run-times will be slower than you would
have got with SPDZ, but the run-times obtained are closer to what you would expect in a “real” system. Both
the online and offline phases are actively secure with abort.
2. Due to this change it can be slow to start a new instance and run a new program. So we provide a new
(experimental) operation which “restarts” the run-time. This is described in Section 9. This operation is likely to be
revamped and improved in the next release as we get more feedback on its usage.
3. We support various Q2 access structures now, which can be defined in various ways: Shamir threshold, via
Replicated sharing, or via a general Monotone Span Programme (MSP). For replicated sharing you can define
the structure via either the minimally qualified sets, or the maximally unqualified sets. For general Q2-MSPs
you can input a non-multiplicative MSP and the system will find an equivalent multiplicative one for you using
the method of [CDM00].
4. Offline generation for Q2 is done via Maurer’s method [Mau06], but for Replicated you can choose between
Maurer and the reduced communication method of Keller, Rotaru, Smart and Wood [KRSW18]. For general
Q2-MSPs, and Shamir sharing, the online phase is the method described in Smart and Wood [SW19], with
(currently) the offline phase utilizing Maurer’s multiplication method [Mau06].
5. All player connections are now via SSL; this is not strictly needed for full threshold but is needed for the other
access structures we now support.
6. We have now implemented more higher-level mathematical functions for the sfix datatype, and corrected a
number of bugs. A similar upgrade is expected in the next release for the sfloat type.
1.16.3 Changes
1. The major change is that the offline and online phases are now integrated. This means that to run quick test
runs, using full threshold is going to take ages to set up the offline data. Thus for test runs of programs in the
online phase it is best to test using one of the many forms of Q2 access structures. For example by using Shamir
with three players and threshold one. Then once your online program is tested you can move to a production
system with two players and full threshold if desired.
2. You now compile a program by executing
./compile.py Programs/tutorial
where Programs/tutorial is a directory which contains a file called tutorial.mpc. Then the compiler
puts all of the compiled tapes etc into this directory. This produces a much cleaner directory output etc. By
typing make pclean you can clean up all pre-compiled directories, returning them to their initial state.
3. The compiler picks up the prime being used for secret sharing after running the second part of Setup.x. So
you need to recompile the .mpc files if you change the prime used in secret sharing, and you should not compile
any SCALE .mpc programs before running Setup.x.
4. Internally (i.e. in the C++ code), a lot has been re-organized. The major simplification is the removal of the
octetstream class and its replacement by a combination of stringstream and string. This
makes the code much more readable.
5. All opcodes in the range 0xB* have been changed, so any byte-codes you have generated from outside the
python system will need to be changed.
6. We have tried to reduce dependencies between classes in the C++ code a lot more, thus making the code easier
to manage.
7. Security-wise we use the latest FHE security estimates for the FHE-based part, and this can be easily updated.
See Chapter 12 on FHE security later.
2 Introduction
The SCALE system consists of three main sub-systems: an offline phase, an online phase and a compiler. Unlike the
earlier SPDZ system, in SCALE the online and offline phases are fully integrated. Thus you can no longer time just
the online phase, or just the offline phase. We refer to the combined online/offline phase as SCALE; the compiler
takes a program written in our special language MAMBA and turns it into byte-code which can be executed by
the SCALE system.
We provide switches (see below) to obtain different behaviours in how the online and offline phases are
integrated, which can allow for some form of timing approximation. The main reason for this change is to
ensure that the system is “almost” secure out of the box, even if it means it is less good for obtaining super-duper
numbers for research papers.
An issue though is that the system takes a while to warm up the offline queues before the online phase can execute.
This is especially true when using a Full Threshold secret sharing scheme. Indeed in this case it is likely that
the online phase will run so fast that the offline phase is always trying to catch up. In addition, in this situation the
offline phase needs to do a number of high-cost operations before it can even start. Thus using the system to run very
small programs is going to be inefficient; although the execution time you get is indicative of the total run time you
should be getting in a real system, it is just not going to be very impressive.
In order to enable efficient operation in the case where the offline phase is expensive (e.g. for Full Threshold secret
sharing) we provide a mechanism to enable the SCALE system to run as a separate process (generating offline data);
a MAMBA program can then be compiled in a just-in-time manner and dispatched to the SCALE
system for execution. Our methodology for this is not perfect, but it has been driven by a real use case of the system.
See Section 9 for more details.
But note that SCALE/MAMBA is an experimental research system; there has been no effort to ensure that the system
meets rigorous production-quality code control. In addition this also means it comes with limited support. If you
make changes to any files/components you are on your own. If you have a question about the system we will endeavour
to answer it.
Warnings:
• The Overdrive system [KPR18] for the offline phase for full-threshold access structures requires a distributed
key generation phase for the underlying homomorphic encryption scheme. The SPDZ-2 system and paper do
describe such a protocol, but it is only covertly secure. A newer paper [RST+19] presents an actively secure
method to generate the keys, which is specifically tailored for the keys used in SCALE in this case. A program
in the directory KeyGen implements the protocol in this paper; see Chapter 13 for more details.
In the normal execution of the Setup.x program in the full-threshold case this protocol is not executed, and
thus Setup.x internally generates a suitable key and distributes it to the different players.
2.1 Architecture
The basic internal runtime architecture is as follows:
• Each MAMBA program (.mpc file) will initiate a number of threads to perform the online stage. The number
of “online threads” needed is worked out by the compiler. You can programmatically start and stop threads
using the python-like language (see later). Using multiple threads enables you to get high throughput. Almost
all of our experimental results are produced using multiple threads.
• Each online thread is associated with another four “feeder” threads. One produces multiplication triples, one produces
square pairs, one produces shared bits and one produces data for input/output of data items. The chain of events
is that the multiplication thread (say) produces a fully checked triple. This triple is added to a triple-list (which
is done in batches for efficiency) for consumption by the online phase. The sizes of these lists can be controlled
(somewhat) by the values in the file config.h. One can control the number of entries ever placed on the
sacrificed-list by use of the run-time flag max.
• By having the production threads aligned with an online thread we can avoid complex machinery to match
producers with consumers. This however may (more like will) result in the over-production of offline data for
small applications.
• In the case of Full Threshold we have another set of global threads (called the FHE Factory threads, or the FHE
Industry) which produces level one random FHE ciphertexts which have passed the ZKPoK from the Top Gear
protocol [BCS19], which is itself a variant of the High Gear protocol in Overdrive [KPR18]. This is done in a
small bunch of global threads as the ZKPoK would blow up memory if done for each thread that would consume
data from the FHE thread. These threads are numbered from 10000 upwards in the code. Any thread (offline
production thread) can request a new ciphertext/plaintext pair from the FHE factory threads. Experiments show
that having two FHE factory threads is usually optimal.
• In the case of full-threshold access structures or Q2 access structures generated by a generic MSP, we also
implement a thread (number 20000 in the code) which implements pairwise OTs via OT-extension. This thread
produces a bunch of authenticated bits in various queues, one for each online thread. A similar thread (number
20001 in the code) does the same for authenticated ANDs, i.e. aANDs. Each main online thread then removes
aBits and aANDs from its respective queue when it needs these to produce daBits or execute the garbled circuit
protocols. The first time the online thread meets an instruction which requires such data there can be a little
lag as the sub-queues fill up, but this disappears on subsequent instructions (until the queues need filling again).
These aBits and aANDs are used in the HSS protocol to execute binary circuits [HSS17].
• For Shamir sharing or Replicated sharing instances we have a single thread 20000 which generates shared triples
for AND gates. This executes a generalisation of the protocol from [ABF+ 17] for triple generation (in particular
Protocol 3.1 from that paper). The generalisation is an obvious one, from the three party case to a general Q2
Replicated secret sharing. The base multiplication protocol is that of Maurer. The only reason we do not
support general Q2 MSPs is that currently we have a bug in generating the associated Replicated scheme from
the generic MSP. The triples produced by this thread are then used in an online phase which is essentially the
modulo two variant of the protocol of Smart and Wood [SW19]; thus basically it uses Beaver multiplication and
‘checking’ is performed by hashing the reconstructed shares.
3 Installation and Setup
3.1 Installation
You will require the following previously loaded programs and libraries:
./compile.sh Programs/test_fix_array
You can now jump to Section 3.1.5 (although you might want to read 3.1.4 as well for other compilation tweaks).
3.1.2 Installing Rustc
Go to https://rustup.rs to find the best installation command for your platform. For Linux this may be:
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
For our Ubuntu systems we had to use the binary installer marked x86_64-unknown-linux-gnu from the page
https://forge.rust-lang.org/infra/other-installation-methods.html as there was some
incompatibility between curl and our openssl installation.
To get nightly support use
rustup default nightly
# install crypto++
curl -O https://www.cryptopp.com/cryptopp820.zip
unzip cryptopp820.zip -d cryptopp820
cd cryptopp820
make && make install PREFIX=${mylocal}/cryptopp
Now export MPIR, OpenSSL and Crypto++ paths by copying the following lines at the end of your $HOME/.bashrc
configuration file.
export PATH="${mylocal}/openssl/bin/:${PATH}"
export C_INCLUDE_PATH="${mylocal}/openssl/include/:${C_INCLUDE_PATH}"
export CPLUS_INCLUDE_PATH="${mylocal}/openssl/include/:${CPLUS_INCLUDE_PATH}"
export LIBRARY_PATH="${mylocal}/openssl/lib/:${LIBRARY_PATH}"
export LD_LIBRARY_PATH="${mylocal}/openssl/lib/:${LD_LIBRARY_PATH}"
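The MPIR lines follow the same pattern; the ones below are an assumed completion based on the OpenSSL lines above (the ${mylocal}/mpir prefix is a guess, so adjust it to your actual install location), and the Crypto++ include/lib paths under ${mylocal}/cryptopp can be exported in the same way.
export C_INCLUDE_PATH="${mylocal}/mpir/include/:${C_INCLUDE_PATH}"
export CPLUS_INCLUDE_PATH="${mylocal}/mpir/include/:${CPLUS_INCLUDE_PATH}"
export LIBRARY_PATH="${mylocal}/mpir/lib/:${LIBRARY_PATH}"
export LD_LIBRARY_PATH="${mylocal}/mpir/lib/:${LD_LIBRARY_PATH}"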
• The DEBUG flag turns on checking for reading before writing on registers; thus it is mainly a flag
for development testing of issues related to the compiler.
• The DETERMINISTIC flag turns off the use of true randomness. This is really for debugging, to ensure we can
replicate errors. It should not be used in a real system for obvious reasons.
If you are going to use full threshold LSSSs then MAX_MOD needs to be set large enough to deal with the sizes of
the FHE keys. Otherwise this can be set to just above the word size of your secret-sharing modulus to obtain better
performance. As default we have set it for use with full threshold. The value MAX_GFP corresponds to the size of the
prime used for the secret sharing scheme (in 64-bit words).
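For example, a 128-bit secret-sharing prime occupies two 64-bit words, so outside of the full-threshold setting
MAX_GFP = 2 would be enough for such a prime.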
in the sub-directory src. The main things to watch out for here are the various FHE security parameters; these are
explained in more detail in Section 12. Note, to configure the statistical security parameter for the number
representations in the compiler (integer comparison, fixed point etc.) from the default of 40 you need to add the following
commands to your MAMBA programs.
program.security = 100
sfix.kappa = 60
sfloat.kappa = 30
However, in the case of the last two you may also need to change the precision or prime size you are using. See the
documentation for sfix and sfloat for this.
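For example, a hedged MAMBA sketch of adjusting the sfix precision alongside the statistical parameters (the sfix.set_precision helper and the particular values below are assumptions; consult the sfix/sfloat documentation for the exact API and for sensible values):
program.security = 100
sfix.set_precision(20, 41)   # f = 20 fractional bits, k = 41 bits total (illustrative values)
sfix.kappa = 60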
make progs
That’s it! After make finishes you should see a PlayerBinary.x executable inside the SCALE-MAMBA
directory.
remembering to set a different Common Name for each player.
In the above we assumed a global shared file system. Obviously on a real system each private key is kept only in the
Cert-Store of that particular player, and the player public keys are placed in the Cert-Store on each player’s
computer. The global shared file system here is simply for test purposes. Thus a directory listing of Cert-Store
for player one, in a four player installation, will look like
Player1.crt
Player1.key
Player2.crt
Player3.crt
Player4.crt
RootCA.crt
Running the program Setup.x and specifying the secret-sharing method will cause the program to generate files
holding MAC and/or FHE keys and place them in the folder Data. When running the protocol on separate machines,
you must then install the appropriate generated MAC key file MKey-*.key in the Data folder of each player’s
computer. If you have selected full-threshold, you also need to install the file FHE-Key-*.key in the same directory.
You also need to make sure the public data files NetworkData.txt and SharingData.txt are in the directory
Data on each player’s computer. These last two files specify the configuration which you select with the Setup.x
program.
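For illustration, in a full-threshold installation the Data directory of player zero would then contain something like the following (the file names follow the patterns given above; the exact indexing is illustrative):
MKey-0.key
FHE-Key-0.key
NetworkData.txt
SharingData.txt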
We now provide more detail on each of the three aspects of the program Setup.x.
Full Threshold: In this case the prime modulus cannot be chosen directly, but needs to be selected to be FHE-friendly
(in all other cases you select the prime modulus for the LSSS directly at this point). Hence, in this case we give you two options.
• Recommended: You specify the number of bits in the modulus (between 16 bits and 1024 bits). The system will
then search for a modulus which is compatible with the FHE system we are using.
• Advanced: You enter a specific prime. The system then searches for FHE parameters which are compatible, if
it finds none (which is highly likely unless you are very careful in your selection) it aborts.
After this stage the MAC keys and FHE secret keys are set up and written into the files
MKey-*.key and FHE-Key-*.key
in the Data directory. This is clearly an insecure way of parties picking their MAC keys. But this is only a research
system. At this stage we also generate a set of keys for distributed decryption of a level-one FHE scheme if needed.
NOTE: Section 13 describes how to perform a secure distributed FHE key generation to avoid relying on a trusted
third party as the above does.
Shamir Secret Sharing: Shamir secret sharing we assume is self-explanatory. For the Shamir setting we use an
online phase using the reduced communication protocols of [KRSW18]; the offline phase (currently) only supports
Maurer’s multiplication method [Mau06]. This will be changed in future releases to also support the new offline
method from [SW19].
Replicated Secret Sharing: For Replicated sharing you should enter a complete monotone Q2 access structure.
There are three options to do this,
1. As a set of maximally unqualified sets;
2. As a set of minimally qualified sets;
3. As a simple threshold system.
If the first (resp. second) option is selected, then any set that is neither a superset nor a subset of a set provided as
input will be assumed qualified (resp. unqualified). The last option is really for testing, as most threshold systems
implemented using Replicated secret-sharing will be less efficient than using Shamir. Specifying either the first or
second option and providing input results in the program computing all of the qualified and unqualified sets in the
system.
For Replicated secret-sharing you can also decide what type of offline phase you want: one based on Maurer’s
multiplication method [Mau06], or one based on our Reduced communication protocols [KRSW18].
and
∆+ = {{P1}, {P2, P3}, {P2, P4}, {P3, P4}}.
We can express this example in the following two ways:
4 parties, maximally unqualified
1 0 0 0
0 1 1 0
0 1 0 1
0 0 1 1
or as
4 parties, minimally qualified
1 1 0 0
1 0 1 0
1 0 0 1
0 1 1 1
As a second example with six parties with a more complex access structure for our Reduced Communication protocol
consider:
and
∆+ = {{P1, P2, P4}, {P1, P3, P5}, {P2, P3, P6}, {P4, P5, P6}}.
Each party is in a different pair of sets. We can represent it via:
6 parties, maximally unqualified sets
1 1 0 1 0 0
1 0 1 0 1 0
0 1 1 0 0 1
0 0 0 1 1 1
Q2-MSP Programs: A final way of entering a Q2 access structure is via an MSP, or equivalently, via the matrix
which defines an underlying Q2 LSSS. For n parties we define the number of shares each party has to be n_i, and the
dimension of the MSP to be k. The MSP is then given by a (Σ_i n_i) × k matrix G. The first n_1 rows of G are associated
with player one, the next n_2 rows with player two, and so on. A secret sharing of a value s is given by the vector of
values

s = G · k

where k = (k_i) ∈ F_p^k and Σ_i k_i = s. A subset of parties A is qualified if the span of the rows they control contains the
vector (1, . . . , 1) ∈ F_p^k.
This secret sharing scheme can be associated with a monotone access structure, and we call this scheme Q2 if
the associated access structure is Q2. However, unlike for Shamir and Replicated sharing, it is not necessarily the case
that the Q2 MSP is itself multiplicative (which is crucial for our MPC protocols). Thus if you enter a Q2 MSP which is
not multiplicative, we will automatically extend this for you into an equivalent multiplicative MSP using the method
of [CDM00].
As in the Shamir setting we use an online phase using the reduced communication protocols of [KRSW18]; the
offline phase (currently) only supports Maurer’s multiplication method [Mau06]. This will be changed in future
releases to also support the new offline method from [SW19].
In the file Auto-Test-Data/README.txt we provide some examples of MSPs. If we take the MSP for
Shamir (say) over three parties with threshold one we obtain the matrix
        ( 1  2 )
    G = ( 1  3 ) .
        ( 1  4 )
This is slightly different from the usual Shamir generating matrix as we assume the target vector is (1, 1) and not (1, 0)
as is often the case in Shamir sharing.
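As a quick worked check using the definitions above: a sharing of s picks k = (k_1, k_2) with k_1 + k_2 = s, giving
shares s_1 = k_1 + 2·k_2, s_2 = k_1 + 3·k_2 and s_3 = k_1 + 4·k_2. Players one and two are qualified since
2·(1, 2) − (1, 3) = (1, 1), and indeed 2·s_1 − s_2 = 2·(k_1 + 2·k_2) − (k_1 + 3·k_2) = k_1 + k_2 = s.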
3.3.3 Conversion Circuit
Here, as most users will not have a VHDL compiler etc., we provide a generic 512-bit conversion circuit for turning
an LSSS sharing into a GC sharing. This needs to be specialised to whichever prime p has been selected above. To
do this we use some C code which specializes the generic 512-bit circuit for the specific prime chosen, and then tries
to simplify the resulting circuit. This simplification is rather naive, and can take a long time, but usually results in a
circuit about ten percent smaller than the original (in terms of the all-important AND gates). One can modify how long
is spent in the simplification by changing a variable in config.h.
You can play with writing your own MAMBA programs and running them on the same local host (via IP address
127.0.0.1).
4 Simple Example
In this section we describe writing and compiling a simple function using the python-like language MAMBA. We also
explain how to run the test scripts to test the system out, plus the run time options/switches you can pass to Player.x
for different behaviours.
a = sint(1)
b = cint(2)
test(a + b, 3)
test(a + a, 2)
test(a * b, 2)
test(a * a, 1)
test(a - b, -1)
test(a < b, 1)
test(a <= b, 1)
test(a >= b, 0)
test(a > b, 0)
test(a == b, 0)
test(a != b, 1)
clear_a = a.reveal()
a = Array(100, sint)
@for_range(100)
def f(i):
    a[i] = sint(i)**2
test(a[99], 99**2)
# conditional
if_then(cint(0))
a[0] = 123
else_then()
a[0] = 789
end_if()
test(a[0], 789)
This takes a secret integer a and a clear integer b and applies various operations to them. It then prints tests as to
whether these operations give the desired results. Then an array of secret integers is created and assigned the values
i^2. Finally a conditional expression is evaluated based on a clear value. Notice how the tutorial.mpc file is put
into a directory called tutorial; this is crucial for the running of the compiler.
./Player.x 0 Programs/tutorial
./Player.x 1 Programs/tutorial
./Player.x 2 Programs/tutorial
Note players are numbered from zero, and again we run the directory and not the program file itself.
The test scripts place data in the clear memory dump at the end of a program, and test this cleared memory against a
simulated run; see the test-all facility of
Script/test.sh
If a test does not pass then the program will fail, with a (possibly cryptic) explanation of why.
We also provide a script which tests the entire system over various access structures. This can take a very long
time to run, but if you want to run exhaustive tests then in the main directory execute
./run_tests.sh
Nigel when looking at the Harwell WITCH computer at TMNOC. They in some sense correspond to “largish” basic blocks in modern programming
languages.
• -pns x_1, . . . , x_n: This overrides the pnb option, and sets the listening portnumber for player i to x_i. The same
arguments must be supplied to each player, otherwise the players do not know where to connect to, and if this
option is used there needs to be precisely n distinct portnumbers given.
• -mem xxxx: Where xxxx is either old or empty. The default is empty. See later for what we mean by
memory.
• -verbose n: Sets the verbose level to n. The higher the value of n, the more diagnostic information is printed. This
is mainly for our own testing purposes, but verbose level one gives you a method to time offline production (see
the Changes section for version 1.1). If n is negative then the byte-codes being executed by the online phase are
output (and no offline verbose output is produced).
• -max m,s,b: Stop running the offline phase for each online thread when we have generated m multiplication
triples, s square pairs and b shared bits.
• -min m,s,b: Do not run the online phase in each thread until the associated offline threads have generated m
multiplication triples, s square pairs and b shared bits. However, these minimums need to be less than the
maximum sacrificed list sizes defined in config.h. Otherwise the maximums defined in that file will result in
the program freezing.
• -maxI i: An issue when using the flag -max is that for programs with a large amount of input/output -max can
cause the IO queue to stop being filled. Thus if you use -max and are in this situation then use this flag to signal
an upper bound on the amount of IO data you will be consuming. We would recommend that you
multiply the maximum amount per player by the number of players here.
• -dOT: Disable threads 20000 and 20001. Historically these were the OT generation threads, hence the flag
name. But in the case of Shamir/Replicated they do not use OT operations anymore. Disabling these threads
can increase speed and reduce memory, but means all operations based on binary circuits will no longer
work.
• -f 2: The number of FHE factories to run in parallel. This only applies (obviously) to the Full Threshold
situation. How this affects your installation depends on the number of cores and how much memory you have.
We set the default to two.
For example, by using high values of the variables set with -min you get the offline data queues full before you trigger
the execution of the online program. For a small online program this will produce times close to that of running the
offline phase on its own. Or alternatively you can stop these queues using -max. By combining the two together you
can get something close to (but not exactly the same as) running the offline phase followed by the online phase.
Note that the flags -max, -min and -maxI do not affect the offline production of aBits via the OT thread, since
this is usually quite fast in filling up its main queue.
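For example, a hypothetical invocation for player zero which pre-fills at least 1000 items of each offline type before the online phase starts, caps each queue at 100000 items, and prints verbose timing output might look like the following (switch placement relative to the positional arguments is illustrative; the basic form ./Player.x 0 Programs/tutorial is shown in Section 4):
./Player.x -min 1000,1000,1000 -max 100000,100000,100000 -verbose 1 0 Programs/tutorial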
5 Run Time Subsystem
This chapter describes the implementation of the MPC Virtual Machine. The virtual machine is driven by byte-codes
which are produced by the MAMBA compiler (see later). Of course you could compile byte-codes from any compiler
if you wanted to write one with the correct backend.
The virtual machine structure resembles that of a simple multi-core processor, and is a register-based machine.
Each core corresponds to a separate online thread of execution; so from now on we refer to these “cores” as threads.
Each thread has a separate set of five types of registers, as well as a stack for each register type. To allow the saving
of state, or the transfer of data between threads, there is a global memory. This global memory for the cint, sint
and regint types (or at least the first 2^20 values) is saved whenever the SCALE system gracefully shuts down. The
loading of this saved memory into a future run of the system is controlled by the command line arguments passed
to the Player.x program. The design is deliberately kept sparse to ensure a fast, low-level implementation, whilst
more complex optimization decisions are intended to be handled at a higher level.
5.1 Overview
The core of the virtual machine is a set of threads, which execute sequences of instructions encoded in a byte-code
format. Files of byte-code instructions are referred to as tapes, and each thread processes a single tape at a time. Each
of these execution threads has pairwise point-to-point communication channels with the associated threads in all
other players’ runtime environments. These communication channels, like all communication channels in SCALE, are
secured via TLS. The threads actually have three channels to the corresponding thread of the other parties; we call these
different channels “connections”. For the online thread, “connection” zero is used for standard opening of shared data,
whereas “connection” one is used for private input and output. Connection two is used for all data related to aBits,
aANDs and garbled circuit computations. This division into different connections is to avoid conflicts between the three
usages (for example a PRIVATE_OUTPUT coming between a STARTOPEN and a STOPOPEN). Each online thread
is supported by four other threads performing the offline phase, each again with pairwise TLS-secured point-to-point
channels. Currently the offline threads only communicate on “connection” zero. When using an HSS-based online
phase for processing binary circuits a single offline thread produces authenticated bits, aBits, by OT-extension, whilst
another produces authenticated triples for GC operations, so called aANDs. When using a non-HSS-based online
phase for processing binary circuits there is a single thread producing triples modulo two.
In the case of Full Threshold secret sharing another set of threads acts as a factory for FHE ciphertexts. Actively
secure production of such ciphertexts is expensive, requiring complex zero-knowledge proofs (see Section 12). Thus
the FHE-Factory threads localize this production in a single place. The number of FHE Factory threads can be
controlled at run-time by the user. See Figure 1 for a pictorial overview in the case of full-threshold execution (which
is the most complex).
In addition to byte-code files, each program to be run must have a schedule. This is a file detailing the execution
order of the tapes, and which tapes are to be run in parallel. There is no limit to the number of concurrent tapes
specified in a schedule, but in practice one will be restricted by the number of cores. The schedule file allows you
to schedule concurrent threads of execution; it also defines the maximum number of threads a given run-time system
will support, and the specific byte-code sequences which are pre-loaded into the system. One can also
programmatically control execution of new threads using the byte-code instructions RUN_TAPE and JOIN_TAPE
(see below for details). The schedule is run by the control thread. This thread takes the tapes to be executed at a given
step in the schedule, passes them to the execution threads, and waits for the threads to finish their execution before
proceeding to the next stage of the schedule.
Communication between threads is handled by a global main memory, which all threads have access to. To avoid
unnecessary stalls there is no locking mechanism provided to the memory. So if two simultaneously running threads
execute a read and a write, or two writes, to the same memory location then the result is undefined since it is not
specified as to which order the instructions will be performed in. Memory comes in four forms, corresponding to
sint, cint, regint and sregint data types. There is no memory for the sbit datatype, as it is meant only for
temporary storage of data.
Figure 1: Pictorial View of a Player's Threads: With Two Online Threads and Two FHE Factory Threads

Each execution thread also has its own local clear and secret registers, to hold temporary variables. To avoid
confusion with the main memory we refer to these as registers. The values of registers are not assumed to be maintained
between an execution thread running one tape and the next tape, so all passing of values between two sequential tape
executions must be done by reading and writing to the virtual machine’s main memory. This holds even if the two
consecutive byte-code sequences run on the same "core". A pictorial representation of the memory and registers is
given in Figure 2.
Figure 2: Pictorial Representation of Memory, Registers and Stacks: With Two Online Threads
Instructions can be vectorized; this is done by taking the opcode to be a 32-bit value. The last nine bits are the base
opcode and the preceding 23 bits give how many times the instruction should be executed in parallel. If these first 23
bits are zero, then this is interpreted as one. The arguments of a vectorized instruction give the first of the consecutive
registers which should be accessed in turn. Arguments to instructions in the following table can have various types;
in the documentation below a * in front of a value means the value is repeated a number of times.
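As a small illustration of this encoding (a sketch only; the helper names below are our own and not part of the code base), the 32-bit opcode word can be packed and unpacked as follows:

def encode_opcode(base, count=1):
    # low 9 bits: base opcode; upper 23 bits: vectorization count
    assert 0 <= base < (1 << 9) and 0 <= count < (1 << 23)
    return (count << 9) | base

def vector_count(opcode):
    # a zero count field is interpreted as one (a non-vectorized instruction)
    n = opcode >> 9
    return n if n != 0 else 1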
Notation Meaning
’c’ Clear Modp Register a.k.a. cint (Read Only),
’cw’ Clear Modp Register (Write Only),
’s’ Secret Modp Register a.k.a sint (Read Only),
’sw’ Secret Modp Register (Write Only),
’r’ Clear RegInt Register a.k.a. regint (64-bit value) (Read Only),
’rw’ Clear RegInt Register (64-bit value) (Write Only),
’sb’ A secret bit (Read Only)
’sbw’ A secret bit (Write Only)
’sr’ A secret RegInt (64-bit value) (Read Only)
’srw’ A secret RegInt (64-bit value) (Write Only)
’i’ Integer Value Possibly Signed
’int’ Integer Value Unsigned
’str’ String
We can divide the memory registers over which we operate into two main categories: registers that use mod p arith-
metic, and those that use mod 2^64 arithmetic instead. Each of these categories includes two varieties, one for secret
and the other for clear data. In the case of mod p, these varieties are sint and cint, denoted by S[i] and C[i].
Whereas, for mod 2^64, the varieties are sregint and regint, denoted by SR[i] and R[i]. In summary:
Notation Meaning
S[i] sint memory
C[i] cint memory
R[i] regint memory
SR[i] sregint memory
As explained above, whilst all registers are thread local, global memory comes in four variants, which are not thread
locked.
Basic Load/Store/Move mod 2^64 Instructions: We have two basic extra instructions for secret types, LDSINT and
MOVSINT; and two for clear registers, LDINT and MOVINT.
Basic Load mod 2 Instruction: We have one basic instruction for the secret bit type sbit, denoted LDSBIT.
Loading to/from Memory in mod p: LDMC, LDMS, STMC, STMS, LDMCI, LDMSI, STMCI, STMSI.
Loading to/from Memory in mod 2^64: For secret types we have the following instructions: LDMSINT, LDMSINTI,
STMSINT and STMSINTI. For clear registers we have the following: LDMINT, STMINT, LDMINTI and STMINTI.
Allocating and Deallocating Memory: You can allocate and deallocate memory using the instructions: NEWC,
NEWS, NEWINT, NEWSINT, DELETEC, DELETES, DELETEINT, DELETESINT. Note loads and stores are not
checked to come from allocated memory. New and Delete are thread locking, but loads and stores are not. The
value of the memory location returned by a NEW instruction is not guaranteed to be the same for different players
(although it will be for single threaded execution).
5.2.3 Data Conversion
To convert from mod p to integer values and back we provide the conversion routines CONVINT and CONVMODP.
These are needed as the internal mod p representation of clear data is in Montgomery representation. To con-
vert between mod p and mod 2^64 types, we have the following instructions: CONVSINTSREG, CONVREGSREG,
CONVSREGSINT and CONVSUREGSINT. These conversions are necessary to allow a smooth transition between the
secret sharing and the modulo-two engines. They execute the conversion routines using daBits from [RW19].
Instructions for mod p registers The process of opening secret values is covered by two instructions. The STARTOPEN
instruction takes as input a set of m shared registers, and STOPOPEN an associated set of m clear registers, where m
can be an arbitrary integer. This initiates the protocol to reveal the m secret shared register values, storing the result in
the specified clear registers. The reason for splitting this into two instructions is so that local, independent operations
may be placed between a STARTOPEN and STOPOPEN, to be executed whilst waiting for the communication to finish.
There is no limit on the number of operands to these instructions, allowing for communication to be batched into a
single pair of instructions to save on network latency. However, note that when the RunOpenCheck function in the
C++ class Open_Protocol is used to check MACs/Hashes then this can stall when the network buffer fills up, and
hang indefinitely. On our test machines this happens when opening around 10000 elements at once, so care must be
taken to avoid this when compiling or writing byte-code (the Python compiler could automatically detect and avoid
this).
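For example, at the MAMBA level (a sketch, assuming the standard sint, reveal and print_ln library calls described in Section 7), two independent reveals such as the following compile down to open instructions which the compiler is free to batch into a single STARTOPEN/STOPOPEN pair, saving a round of communication:

a = sint(3)
b = sint(4)
# two independent openings; the compiler may merge them into one round
x = a.reveal()
y = b.reveal()
print_ln('%s', x)
print_ln('%s', y)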
Instructions for mod 2^n registers When operating mod 2^64, there are two register types that may need to be opened.
In that sense we have a simplified process with two instructions, one for each type, namely OPENSINT for sregint
and OPENSBIT for sbit.
• The LDARG instruction loads an argument that was passed when the current thread was called. Thread arguments
are optional and consist of a single integer, which is specified in the schedule file that determines the execution
order of tapes, or via the instruction RUN_TAPE.
• The STARG allows the current tape to change its existing argument.
• To run a specified pre-loaded tape in a given thread, with a given argument the RUN_TAPE command is executed.
• To wait until a specified thread has finished one executes the JOIN_TAPE function.
5.2.7 Basic Arithmetic
This is captured by the following instructions, with different instructions operating on clear, shared and
integer types. For mod p registers: ADDC, ADDS, ADDM, ADDCI, ADDSI, SUBC, SUBS, SUBML, SUBMR, SUBCI,
SUBSI, SUBCFI, SUBSFI, MULC, MULM, MULCI, and MULSI.
In the case of mod 2^64 registers, we have the following instructions which work on either sregint registers
or a combination of sregint and regint registers: ADDSINT, ADDSINTC, SUBSINT, SUBSINTC, SUBCINTS,
MULSINT, MULSINTC, DIVSINT, SHLSINT, SHRSINT, NEGS. Plus we also have the following instructions ANDINT,
XORINT, ORINT, INVINT, ADDINT, SUBINT, and MULINT, which work on regint registers.
There is also an instruction MUL2SINT to access a full 64 × 64 −→ 128 bit multiplier for sregint values.
Instructions for mod 2^64 For the case of mod 2^64 instructions, we have extended support to some logic operators
that work on mod 2. This functionality is specifically supported by sbit. The instructions are the following:
SAND, XORSB, ANDSB, ORSB and NEGB.
We also implement a number of bitwise logical operations on the 64-bit sregint and regint variables. These
are ANDSINT, ANDSINTC, ORSINT, ORSINTC, XORSINT, XORSINTC, and INVSINT. You can extract bits from
an sregint variable using the instruction BITSINT, and assign an sbit to a specific bit location in an sregint
by using the instruction SINTBIT or SETBIT.
Note: The choice of byte-codes here is a bit of a legacy issue. It would make more sense to move almost all the
mod p byte-codes (bar the Legendre symbol one) to operate on regint values only, since they really make no
sense for mod p values. However, this would break a lot of legacy code. So we keep it as it is, for the moment. At
some point when (and if) we build a proper compiler and language we will not have legacy issues to support, and the
byte-codes can then change to something more sensible.
5.2.11 Branching
Branching is supported by the following instructions JMP, JMPNE, JMPEQ, and JMPR.
5.2.12 Call/Return
Calls to and returns from subroutines are supported by the following instructions: CALL, CALLR and RETURN. These
push and pop the program counter onto/off the stack of mod 2^64 clear registers (i.e. the regint stack).
• REQBL this is output by the compiler to signal that the tape requires a minimal bit length. This forces the
runtime to check the prime p satisfies this constraint.
• CRASH this enables the program to create a crash, if the programmer is feeling particularly destructive.
• RAND this loads a pseudo-random value into a clear register. This is not a true random number, as all parties
output the same random number at this point.
• RESTART which restarts the online runtime. See Section 9 for how this is intended to be used.
• CLEAR_MEMORY which clears the current memory. See Section 9 for more details on how this is used.
• CLEAR_REGISTERS which clears the registers of this processor core (i.e. thread). See Section 9 for more
details on how this is used.
• START_CLOCK and STOP_CLOCK are used to time different parts of the code. There are 100 timers avail-
able in the system; each is initialized to zero at the start of the machine running. The operation START_CLOCK
re-initializes a specified timer, whereas STOP_CLOCK prints the elapsed time since the last initialization (it does
not actually reinitialize/stop the timer itself). These are accessed from MAMBA via the functions start_timer(n)
and stop_timer(n); see the example fragment below. The timers use an internal class to measure elapsed time via the C runtime.
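For instance, a minimal MAMBA fragment using the timer functions named above (the surrounding operations are only illustrative) might be:

start_timer(1)
a = sint(3) * sint(4)          # some work we wish to time
print_ln('%s', a.reveal())
stop_timer(1)                  # prints the elapsed time for timer 1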
LDI 0x1 [’cw’, ’i’] LDI ci, n
Assigns (loads) clear register ci the value
n.
LDSI 0x2 [’sw’, ’i’] LDSI si, n
Assigns secret register si a share of the
value n.
LDMC 0x3 [’cw’, ’int’] LDMC ci, n ?(r)
Assigns clear register ci the value in
memory C[n].
LDMS 0x4 [’sw’, ’int’] LDMS si, n ?(r)
Assigns secret register si the value in
memory S[n].
STMC 0x5 [’c’, ’int’] STMC ci, n ?(w)
Sets memory C[n] to be the value in clear
register ci.
STMS 0x6 [’s’, ’int’] STMS si n ?(w)
Sets memory S[n] to be the value in se-
cret register si.
LDMCI 0x7 [’cw’, ’r’] LDMCI ci, rj ?(r)
Assigns clear register ci the value in clear
memory C[rj], where rj is the j-th regint
register.
LDMSI 0x8 [’sw’, ’r’] LDMSI si, rj ?(r)
Assigns secret register si the value in se-
cret memory S[rj], where rj is the j-th
regint register.
STMCI 0x9 [’c’, ’r’] STMCI ci, rj ?(w)
Sets clear memory C[rj] to be the value in
clear register ci, where rj is the j-th regint
register.
STMSI 0xA [’s’, ’r’] STMSI si, rj ?(w)
Sets secret memory S[rj] to be the value
in secret register si, where rj is the j-th
regint register.
MOVC 0xB [’cw’, ’c’] MOVC ci, cj
Assigns clear register ci the value in the
clear register cj.
MOVS 0xC [’sw’, ’s’] MOVS si, sj
Assigns secret register si the value in the
secret register sj.
MOVINT 0xD [’rw’, ’r’] MOVINT ri, rj
Assigns regint register ri the value in the
regint register rj.
MOVSB 0xE [’sbw’, ’sb’] MOVSB sbi, sbj
Assigns sbit register sbi the value in the
sbit register sbj.
LDMINT 0xCA [’rw’, ’int’] LDMINT ri, n ?(r)
Assigns regint register ri the value in
memory R[n].
STMINT 0xCB [’r’, ’int’] STMINT ri, n ?(w)
Sets regint memory R[n] to be the value
ri.
LDMINTI 0xCC [’rw’, ’r’] LDMINTI ri, rj ?(r)
Assigns regint register ri the value in
memory R[rj], where rj is the j-th regint
register.
STMINTI 0xCD [’r’, ’r’] STMINTI ri, rj ?(w)
Sets regint memory R[rj] to be the value
in regint register ri, where rj is the j-th
regint register.
5.3.2 Machine
CLEAR MEMORY 0x1D [] CLEAR MEMORY †
Clears the main memory. This can cause
problems if executed in one thread and
the memory is still being used in another.
It is really for usage in thread zero, when
all other threads are doing nothing. Say
before a RESTART
CLEAR REGISTERS 0x1E [] CLEAR REGISTERS †
Like CLEAR MEMORY but this clears
the registers of the current processor, i.e.
within the current thread.
5.3.3 Addition
5.3.4 Multiplication/division/other arithmetic
5.3.5 IO
PRIVATE INPUT 0x44 [’sw’, ’r’, ’r’] PRIVATE INPUT si rj rk c1, †, τ0
Private input from player rj on channel rk
assign result to secret si.
PRIVATE OUTPUT 0x46 [’s’, ’r’, ’r’] PRIVATE OUTPUT si rj rk c1, †, τ0
Private output to player rj on channel rk
of secret si.
OUTPUT INT 0x48 [’r’, ’r’] OUTPUT INT ri rj †, τ0
Public output of regint register ri to IO
class on channel rj.
INPUT INT 0x49 [’rw’, ’r’] INPUT INT ri rj †, τ0
Gets regint public input ri from the IO
class on channel rj. Public inputs need
to be the same for all players running the
protocol, otherwise a crash will occur.
OPEN CHANNEL 0x4A [’rw’, ’r’] OPEN CHANNEL ri rj †, τ0
Opens channel number rj for read-
ing/writing on the IO class. Channels
are assumed to be bi-directional, i.e. can
read and write. This is provided as some
IO classes may require this to be called
explicitly, the default one does not need
this. The return value ri can be some er-
ror code which the IO class may want to
return.
CLOSE CHANNEL 0x4B [’r’] CLOSE CHANNEL i †, τ0
Closes channel number ri for read-
ing/writing on the IO class. This is pro-
vided as some IO classes may require
this to be called explicitly, the default one
does not need this.
MPRIVATE INPUT 0x4C [’r’, ’r’, ’r’, ’r’] MPRIVATE INPUT ri rj rk rl c1, †, τ0 , ?(r), ?(w)
Private input of rj items from player rk
on channel rl assigning the result to sint
memory [ri+t] for t=0...rj-1.
MPRIVATE OUTPUT 0x4D [’r’, ’r’, ’r’, ’r’] MPRIVATE OUTPUT ri rj rk rl c1, †, τ0 , ?(r), ?(w)
Private output of rj items to player rk
on channel rl, outputting the values in sint
memory [ri+t] for t=0...rj-1.
5.3.6 Open
OPENSBIT 0xA3 [’rw’, ’sb’] OPENSBIT ri sbj c2
Open the sbit in sbj and assign it to ri.
XXXX We will want to change this in fu-
ture XXXXX
5.3.9 Bitops on regints
SUBSINTC 0x69 [’srw’, ’sr’, ’r’] SUBSINTC sri srj rk c2
Subtracts clear from secret register
sri=srj-rk.
SUBCINTS 0x6A [’srw’, ’r’, ’sr’] SUBCINTS sri rj srk c2
Subtracts secret from clear register sri=rj-
srk.
MULSINT 0x6B [’srw’, ’sr’, ’sr’] MULSINT sri srj srk c2
Multiplies secret regint registers sri=srj *
srk.
MULSINTC 0x6C [’srw’, ’sr’, ’r’] MULSINTC sri srj rk c2
Multiplies secret and clear regint regis-
ters sri=srj * rk.
DIVSINT 0x6D [’srw’, ’sr’, ’sr’] DIVSINT sri srj srk c2
Division of secret regint registers sri=srj
/ srk.
SHLSINT 0x6E [’srw’, ’sr’, ’int’] SHLSINT sri srj k
Shift an sregint register left by k bits.
SHRSINT 0x6F [’srw’, ’sr’, ’int’] SHRSINT sri srj k
Shift an sregint register right by k bits.
SAND 0x78 [’srw’, ’sr’, ’sb’] SAND sri srj sbk c2
ANDs the sregint with the sbit (in all bit
positions) sri= srj and sbk.
XORSB 0x79 [’sbw’, ’sb’, ’sb’] XORSB sbi sbj sbk
Secret XOR of sbit registers sbi = (sbj xor
sbk).
ANDSB 0x7A [’sbw’, ’sb’, ’sb’] ANDSB sbi sbj sbk c2
Secret AND of sbit registers sbi = (sbj
and sbk).
ORSB 0x7B [’sbw’, ’sb’, ’sb’] ORSB sbi sbj sbk c2
Secret OR of sbit registers sbi = (sbj or
sbk).
NEGB 0x7C [’sbw’, ’sb’] NEGB sbi sbj
Secret NEG of sbit register sbi = 1-sbj.
JMPEQ 0x92 [’r’, ’int’, ’int’] JMPEQ ri j n ‡
Jump of n+1 instructions if regint register
ri is equal to j.
EQZINT 0x93 [’rw’, ’r’] EQZINT ri rj
Clear comparison to zero test of regint
register ri = (rj == 0).
LTZINT 0x94 [’rw’, ’r’] LTZINT ri rj
Clear comparison of regint registers ri =
(rj < 0).
LTINT 0x95 [’rw’, ’r’, ’r’] LTINT ri rj rk
Clear comparison of regint registers ri =
(rj < rk).
GTINT 0x96 [’rw’, ’r’, ’r’] GTINT ri rj rk
Clear comparison of regint registers ri =
(rj > rk).
EQINT 0x97 [’rw’, ’r’, ’r’] EQINT ri rj rk
Clear comparison of regint registers ri =
(rj == rk).
CALL 0x14 [’int’] CALL n ‡
Pushes the PC onto the stack, and does a
relative jump of n+1 instructions
RETURN 0x15 [] RETURN ‡
Pops the top element off the stack, and
assigns the PC to this value
CALLR 0x16 [’r’] CALLR i ‡
Pushes the PC onto the stack, and then
jumps to instruction at position ri
JMPR 0x17 [’r’] JMPR i ‡
Unconditional jump to instruction at ad-
dress ri.
5.3.14 Integers
5.3.15 Conversion
PRINT CHAR4 REGINT 0xB6 [’r’] PRINT CHAR4 REGINT ri †, τ0
Prints regint ri as a four single character
string to the debug IO channel.
PRINT FLOAT 0xB7 [’c’, ’c’, ’c’, ’c’, PRINT FLOAT ci cj ck cl cm †, τ0
’c’] Prints the floating point number in clear
registers (ci, cj, ck, cl, cm) assuming they
map to the representation (v,p,z,s,err)
PRINT FIX 0xB8 [’c’, ’i’, ’i’] PRINT FIX ci f k †, τ0
Prints the fixed point number in clear
register ci using parameters f and k.
PRINT INT 0xB9 [’r’] PRINT INT ri †, τ0
Prints the value of register ri to debug IO
channel.
PRINT IEEE FLOAT 0xBA [’r’] PRINT IEEE FLOAT ri †, τ0
Prints the value of register ri to debug IO
channel as a double
INVSINT 0xD9 [’srw’, ’sr’] INVSINT sri srj
Bitwise inversion of the register sri =
INV srj.
5.3.21 Others
RANDC 0xE3 [’cw’] RANDC ci
Writes to the cint register ci a random value mod p. The random value is the
same for all players, so in particular it is not really random. It is more useful for
randomization in Monte-Carlo algorithms.
RANDINT 0xE4 [’rw’] RANDINT ri
Writes to the regint register ri a random value in the range [0, ..., 2^64 − 1]. The
random value is the same for all players, so in particular it is not really random.
It is more useful for randomization in Monte-Carlo algorithms.
RANDSINT 0xE5 [’srw’] RANDSINT sri
Writes to the sregint register sri a (secret) random value in the range [0, ..., 2^64 − 1].
RANDFLOAT 0xE6 [’rw’] RANDFLOAT ri
Writes to the regint register ri the IEEE representation of a floating point value in
the range [0, 1). The random value is the same for all players, so in particular it is
not really random. It is more useful for randomization in Monte-Carlo algorithms.
RANDSBIT 0xE7 [’sbw’] RANDSBIT sbi
Writes to the sbit register sbi a (secret) random bit.
GETSPINT 0x104 [’rw’] GETSPINT ri ?(r)
Assigns the current stack pointer on the
regint stack to register ri.
PUSHSINT 0x105 [’sr’] PUSHSINT ri ?(w)
Push secret regint register si onto the
thread local stack
POPSINT 0x106 [’srw’] POPSINT ri ?(r), ?(w)
Pop secret regint register si from the
thread local stack
PEEKSINT 0x107 [’srw’, ’r’] PEEKSINT si, rj ?(r)
Peeks at position pointed to by register rj
from the thread-local secret regint stack
and assigns to secret regint register si.
POKESINT 0x108 [’r’, ’sr’] POKESINT ri, sj ?(w)
Replaces the data item pointed to by reg-
ister ri on the thread-local secret regint
local stack with the contents of register
sj.
GETSPSINT 0x109 [’rw’] GETSPSINT ri ?(r)
Assigns the current stack pointer on the
secret regint stack to register ri.
PUSHSBIT 0x10A [’sb’] PUSHSBIT ri ?(w)
Push secret bit register sbi onto the thread
local stack
POPSBIT 0x10B [’sbw’] POPSBIT ri ?(r), ?(w)
Pop secret bit register sbi from the thread
local stack
PEEKSBIT 0x10C [’sbw’, ’r’] PEEKSBIT sbi, rj ?(r)
Peeks at position pointed to by register rj
from the thread-local secret bit stack and
assigns to secret bit register sbi.
POKESBIT 0x10D [’r’, ’sb’] POKESBIT ri, sbj ?(w)
Replaces the data item pointed to by reg-
ister ri on the thread-local secret bit local
stack with the contents of register sbj.
GETSPSBIT 0x10E [’rw’] GETSPSBIT ri ?(r)
Assigns the current stack pointer on the
secret bit stack to register ri.
PUSHC 0x110 [’c’] PUSHC ri ?(w)
Push clear register ci onto the thread local
stack
POPC 0x111 [’cw’] POPC ri ?(r), ?(w)
Pop clear register ci from the thread local
stack
PEEKC 0x112 [’cw’, ’r’] PEEKC ci, rj ?(r)
Peeks at position pointed to by register rj
from the thread-local clear stack and as-
signs to clear register ci.
POKEC 0x113 [’r’, ’c’] POKEC ri, cj ?(w)
Replaces the data item pointed to by reg-
ister ri on the thread-local clear local
stack with the contents of register cj.
GETSPC 0x114 [’rw’] GETSPC ri ?(r)
Assigns the current stack pointer on the
clear stack to register ri.
PUSHS 0x115 [’s’] PUSHS ri ?(w)
Push secret register si onto the thread lo-
cal stack
POPS 0x116 [’sw’] POPS ri ?(r), ?(w)
Pop secret register si from the thread lo-
cal stack
PEEKS 0x117 [’sw’, ’r’] PEEKS si, rj ?(r)
Peeks at position pointed to by register
rj from the thread-local secret stack and
assigns to secret register si.
POKES 0x118 [’r’, ’s’] POKES ri, sj ?(w)
Replaces the data item pointed to by reg-
ister ri on the thread-local secret local
stack with the contents of register sj.
GETSPS 0x119 [’rw’] GETSPS ri ?(r)
Assigns the current stack pointer on the
secret stack to register ri.
RPEEKINT 0x1F0 [’rw’, ’r’] RPEEKINT ri, rj ?(r)
Peeks at position pointed to by stack -
pointer-rj from the thread-local regint
stack and assigns to regint register ri.
RPOKEINT 0x1F1 [’r’, ’r’] RPOKEINT ri, rj ?(w)
Replaces the data item pointed to by
stack pointer-ri on the thread-local regint
local stack with the contents of register
rj.
RPEEKSINT 0x1F2 [’srw’, ’r’] RPEEKSINT si, rj ?(r)
Peeks at position pointed to by stack -
pointer-rj from the thread-local secret
regint stack and assigns to secret regint
register si.
RPOKESINT 0x1F3 [’r’, ’sr’] RPOKESINT ri, sj ?(w)
Replaces the data item pointed to by
stack pointer-ri on the thread-local secret
regint local stack with the contents of
register sj.
RPEEKSBIT 0x1F4 [’sbw’, ’r’] RPEEKSBIT sbi, rj ?(r)
Peeks at position pointed to by stack -
pointer-rj from the thread-local secret bit
stack and assigns to secret bit register sbi.
RPOKESBIT 0x1F5 [’r’, ’sb’] RPOKESBIT ri, sbj ?(w)
Replaces the data item pointed to by
stack pointer-ri on the thread-local secret
bit local stack with the contents of regis-
ter sbj.
RPEEKC 0x1F6 [’cw’, ’r’] RPEEKC ci, rj ?(r)
Peeks at position pointed to by stack -
pointer-rj from the thread-local clear
stack and assigns to clear register ci.
RPOKEC 0x1F7 [’r’, ’c’] RPOKEC ri, cj ?(w)
Replaces the data item pointed to by ri on
the thread-local clear local stack with the
contents of register cj.
RPEEKS 0x1F8 [’sw’, ’r’] RPEEKS si, rj ?(r)
Peeks at position pointed to by stack -
pointer-rj from the thread-local secret
stack and assigns to secret register si.
RPOKES 0x1F9 [’r’, ’s’] RPOKES ri, sj ?(w)
Replaces the data item pointed to by
stack pointer-ri on the thread-local secret
local stack with the contents of register
sj.
MMODC 0x136 [’r’, ’r’, ’r’, ’r’] MMODC ri rj rk rn ?(r), ?(w)
C[ri+t] = C[rj+t] % C[rk+t] for t=0...(rn-
1).
MREVC 0x138 [’r’, ’r’, ’r’] MREVC ri rj rn ?(r), ?(w)
Reverses the array, as in C[ri+t] =
C[rj+rn-1-t] for t=0...(rn-1).
MREVS 0x139 [’r’, ’r’, ’r’] MREVS ri rj rn ?(r), ?(w)
Reverses the array, as in S[ri+t] =
S[rj+rn-1-t] for t=0...(rn-1).
MEVALCC 0x13A [’cw’, ’r’, ’r’, ’c’] MEVALCC ci rj rn ck ?(r)
Evaluates the polynomial
ci = sum_{t=0}^{rn-1} C[rj+t]*ck^t.
MEVALSC 0x13B [’sw’, ’r’, ’r’, ’c’] MEVALSC si rj rn ck ?(r)
Evaluates the polynomial
si = sum_{t=0}^{rn-1} S[rj+t]*ck^t.
MBITDECC 0x13C [’r’, ’c’, ’r’] MBITDECC ri cj rk ?(w)
Takes cint register cj and decomposes it
into rk bits and places them in C[ri+t] for
t=0...rk-1.
MBITDECINT 0x13D [’r’, ’r’, ’r’] MBITDECINT ri rj rk ?(w)
Takes register rj and decomposes it into
rk bits and places them in R[ri+t] for
t=0...rk-1.
MBITDECCS 0x13E [’r’, ’c’, ’r’] MBITDECCS ri cj rk ?(w)
Takes cint register cj and decomposes it
into rk bits and places them in C[ri+t] for
t=0...rk-1. Assumes cj is signed, i.e. if
cj > p/2 then this bit-decomposes cj − p.
5.3.25 Notes
• Instructions marked with a ‡ are ones which signal an end of basic-block, these are usually jump instructions.
• Instructions marked with a † are ones which are barrier instructions, in terms of merging of instructions, see
below for how these affect optimization. Note a ‡ implies a †.
• Instructions marked with a ?(r) (resp. ?(w)) are memory read (resp. write) instructions.
• Instructions marked with cn involve communication on player channel n. Note this is not the same as the VM
channel n controlled by the OPEN/CLOSE CHAN commands. [Note to self really to ensure no conflicts occur
when we do other things in future].
• Instructions marked with τ0 can only be executed in thread zero.
6 Compiler Pipeline
6.1 Getting Started
6.1.1 Setup
Dependencies: To run the compiler, the following Python packages are required:
• networkx — for graph algorithms
If these are not available on your system, you should be able to install them via easy_install:
You will also need to install Rust if you want to use the new compiler pipeline, as opposed to the original one.
Directory structure: The Compiler package should be located in the same root directory as the compile.sh
script, alongside a folder Programs with sub-folders for actual programs. The final directory structure should look
like:
root_dir/
|-- compile.sh
|-- Compiler/
| |-- ...
| |-- ...
|-- Programs/
| |-- some_program/
| | |-- some_program.mpc
MAMBA programs have the extension .mpc and each MAMBA program is placed in its own sub-directory within
the directory Programs.
compile.sh Programs/some_program
The compiled byte-code and schedule files are then stored in the same directory as the main program. Depending
on whether you have edited the compile.sh script or not, this will execute either the new or the old compiler
pipeline. The old pipeline uses the python program compile-mamba.py to execute the entire pipeline; indeed
this old program can be called independently with various switches (see below) if you so desire. The new pipeline
uses the same program to generate just the assembly, then the assembly is passed to the SCALE assembler scasm
for processing. We believe the scasm program to be more robust than the original python program. It is a little slow,
however; we are working on speeding this up.
See Section 6.4 for specifically how the new compiler pipeline works. There are currently two flags you can pass
to the new compiler: -D, which is passed to the MAMBA sub-compiler, and -OX, where X is one of 0, 1, 2 or 3,
which is a flag passed to the scasm assembler.
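For instance (a hypothetical invocation, assuming these flags are simply placed on the compile.sh command line ahead of the program directory; your set-up may differ), one might run:

./compile.sh -D -O2 Programs/some_program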
-n --nomerge
Don’t optimize the program by merging independent rounds of communication.
-o --output <name>
Specify a prefix name- for output byte-code files (defaults to the input file name).
-d --debug
Keep track of the call stack and display when an error occurs.
-a --asm-output
Produces ASM output file for potential debugging use.
-D --dead-code-elimination
This eliminates instructions which produce an unused result.
-h --help
Print all available options.
There are a number of other options which are mainly for testing or developer purposes. These are given by
-r --noreorder
-M --preserve-mem-order
-u --noreallocate
-m --max-parallel-open <MAX_PARALLEL_OPEN>
-P --profile
-s --stop
We refer the reader to the compiler help for usage of these compiler options.
Under Over Flow flag: True
Compiling file Programs/tutorial/tutorial.mpc
Compiling basic block tutorial-0--0
Compiling basic block tutorial-0-begin-loop-1
Compiling basic block tutorial-0-end-loop-2
Compiling basic block tutorial-0-if-block-3
Compiling basic block tutorial-0-else-block-4
Compiling basic block tutorial-0-end-if-5
Processing tape tutorial-0 with 6 blocks
Processing basic block tutorial-0--0, 0/6, 12266 instructions
Eliminate dead code...
Eliminated 2 dead instructions, among which 1 opens
Processing basic block tutorial-0-begin-loop-1, 1/6, 21 instructions
Processing basic block tutorial-0-end-loop-2, 2/6, 18 instructions
Eliminated 1 dead instructions, among which 0 opens
Processing basic block tutorial-0-if-block-3, 3/6, 2 instructions
Processing basic block tutorial-0-else-block-4, 4/6, 2 instructions
Processing basic block tutorial-0-end-if-5, 5/6, 14 instructions
Not merging open instructions in tape tutorial-0
Compile offline data requirements...
Tape requires 607 triples in modp, 100 squares in modp, 628 bits in modp
Tape requires prime bit length 106
Program requires: {(’modp’, ’triple’): 607, (’modp’, ’square’): 100, (’modp’, ’bit’): 628}
Memory size: defaultdict(<function <lambda> at 0x7fc985e842a8>, {’sr’: 8192, ’c’: 8192, ’r’
Writing to Programs/tutorial/tutorial.sch
Writing to Programs/tutorial//tutorial-0.asm
Now running
./scasm Programs/tutorial/
The compilation output in this case corresponds to a 3-party scenario using Shamir Secret Sharing for both the online
and offline phases. The output first introduces some program level parameters, followed by the rolling of the tape
(converting all operations to byte-code instructions), and the offline data requirements. We now proceed to analyze the
output in more detail.
Prime Size
This is the bit size of the prime p above.
particular see the value k below in Section 7.1.1. Because of mechanisms such as comparisons, implemented under
statistical security parameters, the input size has to be adjusted to some power of 2 smaller than the modulus.
Processing tape
Signals the start of the optimization of the contents of the tape. This includes merging blocks to reduce rounds. It also
eliminates dead code.
Memory Size
This is an estimation of the memory allocation needed to execute the protocol.
./compile-old.sh target
./compile.sh target
During parsing, instructions are created such that every output of an instruction is a new register. This ensures
that registers are only written to once, which simplifies the register allocation procedure later. The main goal of the
optimization process is to minimize the number of rounds of communication within basic blocks of each tape, by merg-
ing independent startopen and stopopen instructions. This is done through instruction reordering and analysis
of the program dependency graph in each basic block. After optimization, a simple register allocation procedure is
carried out to minimize register usage in the virtual machine.
Tape
Contains basic blocks and data for a tape. Each tape has its own set of registers.
basicblocks: List of basic blocks in this tape.
optimize(): Optimize the tape. First optimizes communication, then inserts any branching instructions
(which would otherwise interfere with the previous optimization) and finally allocates registers.
req_num: Dictionary storing the required numbers of preprocessed triples, bits etc. for the tape.
purge(): Clear all data stored by the tape to reduce memory usage.
new_reg(reg_type, size=None): Creates a register of type reg_type (either ‘s’ or ‘c’ for secret or clear)
for this Tape. If size is specified and > 1 then a vector of registers is returned.
BasicBlock
A basic block contains a list of instructions with no branches, but may end with a branch to another block
depending on a certain condition. Basic blocks within the same tape share the same set of registers.
instructions: The instructions in this basic block.
set_exit(condition, exit_block): Sets the exit block to which control flow will continue, if the instruction
condition evaluates to true. If condition returns false or set_exit is not called, control
flow implicitly proceeds to the next basic block in the parent Tape’s list.
add_jump(): Appends the exit condition instruction onto the list of instructions. This must be done after
optimization, otherwise instruction reordering during merging of opens will cause the branch to
behave incorrectly.
adjust_jump(): Calculates and sets the value of the relative jump offset for the exit condition instruction.
This must be done after add_jump has been called on every basic block in the tape, in order that
the correct jump offset is calculated.
Tape.Register
The class for clear and secret registers. This is enclosed within a Tape, as registers should only be created
by calls to a Tape’s new_reg method (or, preferably, the high-level library functions).
Instruction
This is the base class for all instructions. The field __slots__ should be specified on every derived
class to speed up and reduce the memory usage from processing large quantities of instructions. Creating
new instructions should be fairly simple by looking at the existing code. In most cases, __init__ will
not need to be overridden; if this is necessary, ensure that the original method is still called using super.
code: The integer opcode. This should be stored in the opcodes dictionary.
arg_format: The format for the instruction’s arguments is given as a list of strings taking one of the
following:
• ‘c[w]’: clear register, with the optional suffix ‘w’ if the register is written to.
• ‘s[w]’: secret register, as above.
• ‘r[w]’: regint register, as above.
• ‘i’ : 32-bit integer signed immediate value.
• ‘int’ : 64-bit integer unsigned immediate value.
• ‘p’ : 32-bit number representing a player index.
• ‘str’ : A four byte string.
For example, to implement a basic addition instruction for adding a secret and a clear register, the follow-
ing code suffices.
class addm(Instruction):
""" Add a secret and clear register into a secret register. """
__slots__ = [] # contents inherited from parent
code = opcodes[’ADDM’]
arg_format = [’sw’, ’s’, ’c’]
Memory
Memory comes in three varieties sint, cint, and regint; denoted by S[i], C[i] and R[i].
6.3.2 Optimizing Communication
The first observation is that communication of small data items costs, so we want to pack as much data together in a
start/stop open command. The technique for automating this is as follows:
• Calculate the program dependence graph G, whose vertices correspond to instructions in the byte-code, treating
start/stop open instructions as a single instruction. Two vertices (v_i, v_j) are connected by a directed edge if an
output from instruction v_i is an input to instruction v_j.
• Now consider the graph H, whose vertices also correspond to instructions; insert a directed edge into H from
every startopen vertex to any other vertex where there is a path in G not going through another startopen
vertex.
• Label vertices in H with their maximal distance from any source in H that is a startopen.
• Merge all start/stop opens with the same label in H.
• Create another graph H', with vertices corresponding to instructions and edges from every non-open vertex to
any startopen where there is a path not going through another startopen vertex.
• Label non-open vertices in H' with the minimum of the labels (in H) of their successors in H'.
• Compute a topological ordering of the merged graph such that sources with H'-label i appear after the open with
label i − 1, and such that sinks with H-label j appear before the open with label j. Furthermore, let non-open vertices
with H-label i and H'-label j such that j − i ≥ 2 appear between the startopen and stopopen with label
i + 1.
• Output the resulting instruction sequence.
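The following rough Python sketch (our own illustration, not the compiler's actual code) shows the core labelling idea using the networkx dependency mentioned in Section 6.1.1: given the dependence DAG G and the set of merged start/stop-open vertices, it labels each vertex with the maximal number of opens strictly preceding it on any path; opens sharing a label are independent and may be merged into a single start/stop-open pair.

import networkx as nx

def open_labels(G, open_nodes):
    # G: program-dependence DAG (nx.DiGraph); open_nodes: start/stop-open vertices.
    # label[v] = maximal number of open vertices strictly before v on any path.
    label = {v: 0 for v in G}
    for v in nx.topological_sort(G):
        for u in G.predecessors(v):
            label[v] = max(label[v], label[u] + (1 if u in open_nodes else 0))
    return label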
6.3.4 Notes
The following are mainly notes for the development team, so we do not forget anything. Please add stuff here when
you notice something which might be useful for people down the line.
1. Instructions/byte-codes can be re-ordered. Sometimes this is a bad idea, e.g. for IO instructions. All instructions/byte-
codes which inherit from IOInstruction are never reordered.
2. Instructions executed in the test scripts need to be emulated in python. So do not use an instruction here which
does not have an emulation. The emulation stuff is in core.py. Essentially, there are two options for testing:
• If you use the ’test(x,y)’ syntax you have to make sure that all functions (including classes) called for x are
defined in core.py, but they don’t have to do anything. For example:
print_ln = lambda *args: None
This allows print_ln to be called in tested programs, but nothing really happens when testing the results.
• If you use the ’test(x)’ syntax you have to make sure that the functionality is replicated in core.py and
understood by Scripts/test_results.py. This is more involved, but it allows tests to be written with
less effort.
compile-mamba.py -A -n -r -u -s Programs/some_program
The -A option produces the output .asm files in the directory Programs/some_program. To compile these
.asm files the assembler scasm is called
Which is itself an alias for the Rust program in the directory Assembler. The entire new compilation pipeline can
be executed with the default flags etc by calling
./compile-new.sh target
./compile.sh target
The default optimization level is -O3. The meaning of the flags is as follows; the effect of the optimizations is
cumulative:
-O0: Apply no optimizations to the assembler, simply output the corresponding byte-codes.
-O1: Apply register painting so as to reduce the number of registers required.
-O2: Merge startopen and stopopen commands as described below.
-O3: In addition, move independent instructions between the merged startopen and stopopen commands (see Section 6.4).
The binary has a bunch of flags to change what kind of output you want, they are documented under
If you want to go from an .asm file to a .bc file, all you need is
The opposite direction is possible, too, so you can use scasm as a disassembler if you want to inspect binary files.
The addition of a --release flag will compile scasm with optimizations so it is much faster. The optimizations
which scasm runs on the assembly files are not affected by the --release flag. To turn on the debug output you
can set the
export RUST_LOG=scasm::transforms::optimizer_step1
environment variable, which turns on all of the debug printing in the transforms::optimizer_step1 module. The same
goes for any other module. To only print out debug info use
export RUST_LOG=scasm::transforms::optimizer_step1=debug
The assembler also produces a .dot file to visualize the blocks etc. using a package such as Graphviz. To obtain .dot
output, specify the output file with a .dot extension.
./scasm target
For example, if you want to execute scasm with only the optimization related to merging startopen
and stopopen (i.e. no register colouring is performed), and also see possible warnings in the code, then
execute the command,
With verbose output you get a lot of warnings about instructions which write to a register that is never read. These are almost
always perfectly acceptable, as they can represent instructions which write to multiple registers of which only one ends
up being used later on.
To make the assembler output .asm files at the end of each optimization step (this is useful to find problems in
the output of the assembler) use the command
Various optimizations can be turned on and off using the -O0, -O1, -O2, -O3 flags. Again the default is to
execute -O3.
cargo test
again in the Assembler directory. If something fails this will produce files of the form .stderr. If you then make
edits, for example to instructions.rs, or the parser etc to remove the errors then the errors will disappear.
To run all tests in the scasm-tests directory, which is our standard way of testing for major bugs within the
assembler, execute the following steps:
1. In the main SCALE directory run
./Scripts/compile-scasm
Note that these commands will take quite some time as the tests are multiple hundred megabytes in size.
When you run cargo test this test data is automatically generated in any case.
Finally when editing scasm remember to execute
cargo fmt
before committing. If you are running visual studio code inside the Assembler directory the formatting will happen
automatically as you type.
• We assume the input has no read-before-writes on registers, since these give undefined behaviour, although
scasm will emit an error message if this is not upheld.
• A comment line is indicated by prepending with a #.
• Arguments to instructions are separated by a comma, e.g. ldmc c1, 10.
• Input of instructions can be in either upper or lower case; we do not care what you want to use in this regard.
However, all registers are in lower case.
• Registers are denoted by a type followed by a number, where the types can be one of c, s, r, sr, and sb. These
correspond to cint, sint, regint, sregint and sbit.
startopen 1, s7
stopopen 1, c9
print_reg c5
startopen 1, s8
stopopen 1, c10
addc c7, c6, c10
ldmc c8, 30
mulm s6, s2, c8
JMP if xxxx to LL
Note that this basic block can start with registers which are already allocated, e.g. register s2 in the above snippet, but
we must be in SSA form (no register can be written to more than once).
Step 1 To process a basic block of instructions we proceed as follows; note that some registers may have been
defined prior to the execution of this basic block. To each instruction we first assign an instruction and round depth,
and to each register we also assign an instruction and round depth. Let us call these i_d, i_rd, r_d and r_rd. This is done as
follows:
• We keep two variables mem_i_depth and mem_r_depth and assign them to −1 and 0 respectively. These are essen-
tially the ‘instruction depth’ and ‘round depth’ of a special register called ‘memory’ (we need to modify our
methodology for this ‘register’ as it does not respect the SSA form).
• We keep a variable called block_num which is initially set to zero.
• We assign all registers to have initial instruction and round depth −1 as well, so we set r_d = r_rd = −1 for all
registers r.
• When we meet a new instruction we compute m_d = max r_d and m_rd = max r_rd over all read registers r. If
there are no read registers we set m_d = −1 and m_rd = 0.
• When we meet an instruction which just reads from a register we assign i_d and i_rd the values m_d + 1 and m_rd.
[This includes a startopen operation].
• When we meet an instruction which writes to a register we assign i_d and r_d (for the written register r) the value
m_d + 1. We assign i_rd and r_rd the value m_rd. If the written register already has a depth assigned before this
assignment then we should abort (as this is not correct input assembly, which should be in SSA
form).
• When we meet a memory load or store instruction we set (akin to when we read a register) i_d and
mem_i_depth to be
max(mem_i_depth, m_d) + 1,
and i_rd and mem_r_depth to be
max(mem_r_depth, m_rd).
• For a stopopen [which before optimization always follows a startopen] we take the i_d and i_rd of the
previous startopen and then assign the new i_d and new i_rd as one more than these values. The associated
values of r_d and r_rd, for the written registers, are assigned the same values as well.
• For any other instruction we assign the instruction and round depth to zero.
• Each instruction gets assigned the current value of block_num.
• New, Peek and GetSp operations are considered as memory read instructions, whereas Delete, Poke and Push
operations are considered as memory write instructions in the above methodology. Pop on the other hand both
reads memory (takes something off the stack), and writes to memory (alters the stack itself). Of course Peek,
Pop and GetSp are also register write instructions and Poke and Push are register read instructions.
• The GarbledCircuit and LocalFunction operations are considered as both memory read and memory write oper-
ations, as they both push and pop from the stack.
• Note when doing all of the above we have to also remember that vectorized instructions touch more than just
the instruction mentioned in the opcode.
Once we have done this we obtain the following values for the instruction and round depths of the instructions in our
basic block...
instr_depth rd_depth mem_i_dpt mem_r_dpt
ldmc c1, 10 0 0 - 0
ldms s8, 20 1 0 1 0
addm s1,s2,c1 2 0 1 0
ldmc c2, 30 2 0 2 0
addc c3,c1,c2 3 0 2 0
adds s3,s1,s2 3 0 2 0
startopen 1, s3 4 0 2 0
stopopen 1, c4 5 1 2 0
adds s4, s1, s2 3 0 2 0
addm s5, s1, c4 6 1 2 0
startopen 1, s5 7 1 2 0
stopopen 1, c6 8 2 2 0
stmc c6, 50 9 2 9 2
addm s9, s2, c2 3 0 9 2
startopen 1, s9 4 0 9 2
stopopen 1, c5 5 1 9 2
ldms s7, 110 10 2 10 2
startopen 1, s7 11 2 10 2
stopopen 1, c9 12 3 10 2
print_reg c5 5 1 10 2 <- Barrier instruction
startopen 1, s8 2 0 10 2
stopopen 1, c10 3 1 10 2
addc c7, c6, c10 9 2 10 2
ldmc c8, 30 11 2 11 2
mulm s6, s2, c8 12 2 11 2
JMP if xxxx to LL
Step 2 We now merge the startopen and stopopen commands which have the same round depth and the same
block number. This means we do not merge instructions which are separated by a barrier command.
addm s1,s2,c1 2 0 1 0
ldmc c2, 30 2 0 2 0
addc c3,c1,c2 3 0 2 0
adds s3,s1,s2 3 0 2 0
startopen 2, s3, s9 4 0 2 0
stopopen 2, c4, c5 5 1 2 0
adds s4, s1, s2 3 0 2 0
addm s5, s1, c4 6 1 2 0
startopen 1, s5 7 1 2 0
stopopen 1, c6 8 2 2 0
stmc c6, 50 9 2 9 2
addm s9, s2, c2 3 0 9 2
ldms s7, 110 10 2 10 2
startopen 1, s7 11 2 10 2
stopopen 1, c9 12 3 10 2
print_reg c5 5 1 10 2 <- Barrier instruction
startopen 1, s8 2 0 10 2
stopopen 1, c10 3 1 10 2
addc c7, c6, c10 9 2 10 2
ldmc c8, 30 11 2 11 2
mulm s6, s2, c8 12 2 11 2
JMP if xxxx to LL
The problem now is that the instruction depths will be wrong, and instructions may not be in a write-before-read order.
For example s9 above is opened before it is written to.
Step 3 Noting which registers were defined on entry (i.e. registers with rd = −1) we now run the same algorithm
again. We have to do multiple passes through the instructions until all instructions are assigned an instruction depth.
We need to know the registers defined on entry, otherwise we do not know whether an instruction is ok-as-is or
needs to be defined later.
instr_depth rd_depth mem_i_dpt mem_r_dpt
ldmc c1, 10 0 0 0 0
ldms s8, 20 1 0 1 0
addm s1,s2,c1 2 0 1 0
ldmc c2, 30 2 0 2 0
addc c3,c1,c2 3 0 2 0
adds s3,s1,s2 3 0 2 0
startopen 2, s3, s9 4 0 2 0 <-
stopopen 2, c4, c5 5 1 2 0 <-
adds s4, s1, s2 3 0 2 0
addm s5, s1, c4 6 1 2 0 <-
startopen 1, s5 7 1 2 0 <-
stopopen 1, c6 8 2 2 0 <-
stmc c6, 50 9 2 9 2 <-
addm s9, s2, c2 3 0 9 0
ldms s7, 110 10 2 10 2 <-
startopen 1, s7 11 2 10 2 <-
stopopen 1, c9 12 3 10 2 <-
print_reg c5 5 1 10 2 <- Barrier
startopen 1, s8 2 0 10 0
stopopen 1, c10 3 1 10 0
addc c7, c6, c10 9 2 10 2 <-
ldmc c8, 30 11 2 11 2 <-
mulm s6, s2, c8 12 2 11 2 <-
JMP if xxxx to LL
Instructions marked with a <- are defined on the second pass through the list. Note the stmc means we cannot
process any future ldmc on the first pass. We obtain the associated register depths as
c1, c2, c3, c4, c5, c6, c7, c8, c9, c10, s1, s2, s3, s4, s5, s6, s7, s8, s9
instr 0 2 3 5 5 8 9 11 12 3 2 -1 3 3 6 12 10 1 3
round 0 0 0 1 1 2 2 2 3 1 0 -1 0 0 1 2 2 0 0
Step 4 We now sort with respect to the following variables (in order of priority): (block number, round depth, in-
struction depth). We do not alter the position of any barrier operations, thus barrier operations still mark the change
between a given instruction block and the next one. This gives us...
instr_depth rd_depth mem_i_dpt mem_r_dpt
ldmc c1, 10 0 0 0 0
ldms s8, 20 1 0 1 0
addm s1,s2,c1 2 0 1 0
ldmc c2, 30 2 0 2 0
addc c3,c1,c2 3 0 2 0
adds s3,s1,s2 3 0 2 0
adds s4, s1, s2 3 0 2 0
addm s9, s2, c2 3 0 9 0
startopen 2, s3, s9 4 0 2 0 <-
stopopen 2, c4, c5 5 1 2 0 <-
addm s5, s1, c4 6 1 2 0 <-
startopen 1, s5 7 1 2 0 <-
stopopen 1, c6 8 2 2 0 <-
stmc c6, 50 9 2 9 2 <-
ldms s7, 110 10 2 10 2 <-
startopen 1, s7 11 2 10 2 <-
stopopen 1, c9 12 3 10 2 <-
print_reg c5 5 1 10 2 <- Barrier
startopen 1, s8 2 0 10 0
stopopen 1, c10 3 1 10 0
addc c7, c6, c10 9 2 10 2 <-
ldmc c8, 30 11 2 11 2 <-
mulm s6, s2, c8 12 2 11 2 <-
JMP if xxxx to LL
Step 4a To cope with the following case of vectorized start/stop opens we increase the variable block_num
whenever a different start/stop open vectorization is encountered. This leads to slightly less merging...
vstartopen 10, 1, s0 0
vstopopen 10, 1, c0 0
startopen 1, s101 1
stopopen 1, c101 1
startopen 1, s100 1
stopopen 1, c100 1
vstartopen 10, 1, s30 2
vstopopen 10, 1, c30 2
Thus we get the merged instructions
vstartopen 10, 1, s0 0
vstopopen 10, 1, c0 0
startopen 1, s101, s100 1
stopopen 1, c101, c100 1
vstartopen 10, 1, s30 2
vstopopen 10, 1, c30 2
But the two vectorized start and stops do not get merged.
Step 5 If optimization level is set to -O3 we now work out which additional instructions we can place between a
startopen and a stopopen. Note, the previous step can result in instructions being inserted between startopen
and a stopopen commands, but there may be additional ones which this step tries to find. To do this we execute the
following steps
1. We take all instructions with a given round depth and given block number. This set of instructions contains at
most one startopen instruction due to the discussion in Section 6.4.5. The set of interesting instructions is
the set of instructions of a given round depth and block number which occur before the startopen. If there
is no startopen then there are no interesting instructions for this round depth and block number.
2. We let R denote the set of all registers in the given startopen.
3. We now go through the set of interesting instructions searching for instructions which write to a register
in the set R. We “delete” these instructions from our set of interesting instructions, and add the associated read
registers from these instructions into the set R. Note, we again treat ‘memory’ here as a special register for this
discussion.
4. We repeat step 3 until no more instructions are “deleted” from the set of interesting instructions.
5. The remaining interesting instructions are then moved to a position directly after the given startopen.
This gives us...
ldmc c1, 10
ldmc c2, 30
addm s1,s2,c1
addm s9, s2, c2
adds s3,s1,s2
startopen 2, s3, s9
ldms s8, 20
addc c3,c1,c2
adds s4, s1, s2
stopopen 2, c4, c5
addm s5, s1, c4
startopen 1, s5
stopopen 1, c6
stmc c6, 50
ldms s7, 110
startopen 1, s7
stopopen 1, c9
print_reg c5
startopen 1, s8
stopopen 1, c10
addc c7, c6, c10
ldmc c8, 30
mulm s6, s2, c8
JMP if xxxx to LL
Incorrect:

x = cint(0)
a = cint(3)
if_then(a != 0)
x += 200
else_then()
x += 100
end_if()
print_ln('%s', x)

Correct:

x = cint(0)
result = MemValue(cint(x))
a = cint(3)
if_then(a != 0)
result.write(x + 200)
else_then()
result.write(x + 100)
end_if()
x = result.read()
print_ln('%s', x)

MAMBA block usage
This should normally output 200 in both cases, but in the incorrect case the MAMBA compiler outputs no assembler for
scasm to compile; see the warnings you get when you compile this fragment. The warnings come from the fact that
in the incorrect block, the results inside the if statement are written to (local) registers defined outside the blocks. This is
easily fixed in the correct block by writing the results back into a MemValue variable.
7 The MAMBA Programming Language
7.1 Writing Programs
(Warning:) Over time MAMBA will be deprecated as a means of programming the system, as the Rust method becomes
more fully operational.
Programs for the compiler are written in a restricted form of Python code, with an additional library of functions
and classes specifically for MPC functionalities. The compiler then executes the Python code, and the MPC library
functions cause this to output byte-code suitable for running on the virtual machine. Whilst the compiler will, in
theory, accept any valid Python code as input, some language features may not work as expected. The key point is that
any native Python constructs are evaluated at compile time, the effect being that all Python loops will be unrolled and
all functions inlined. So to get the most out of the compiler and write efficient programs, it is important to understand
a little about the internals of the underlying implementation.
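As a small illustration (a sketch using the basic sint and print_ln operations documented below), a native Python loop such as the following is fully unrolled at compile time into ten separate additions in the byte-code:

total = sint(0)
for i in range(10):            # evaluated by the compiler: the loop is unrolled
    total = total + sint(i)
print_ln('%s', total.reveal())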
This section documents the library functions available to an MPC source code file, and how to perform basic
operations on the associated data types. Basic knowledge of the workings of the Virtual Machine (chapter 5) and the
Python language are assumed. The definitions of all library functions are given in library.py.
• x < 48(bits) : k = 24
• 48 ≤ x < 85(bits) : k = 32
• 128-bit field: k = 64
These choices allow some room for secure comparison with a statistical security parameter of at least 30 bits, apart
from the 32-bit field, which should only be used for test purposes. More precisely, the default security parameter is
chosen as follows:
• prime_bit_size < 48:
Security parameter κ is fixed to 6 by default.
The default values of k and κ can be modified using the commands described above if desired.
7.1.3 Mod 2^n Data Types
The compiler is also capable of processing inputs that are encoded in a mod 2^64 ring (like normal CPU registers) rather
than mod p. Variables that instantiate these data types are then processed via Garbled Circuits.
Data conversion between mod p and mod 2^64 types is also possible. As before, we make use of secret and
clear data types. A clear integer can be represented by the regint class, which works like a normal 64-bit integer,
with the secure version being represented by the sregint class. Both types are used to represent secret shared/clear
integers that are bounded by 2^64. Both correspond directly to a single clear or secret register in the virtual machine. In
both cases these are integer types that lie in the range [−2^63, ..., 2^63), and both regint and sregint support inputs in
this range.
We have also included an extra data type called sbit. This type can be instantiated via its constructor, and it is also
the type returned by all sregint comparison tests. Boolean operations can be performed directly on it, making it
ideal for building complex logic predicates.
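For example (a sketch, assuming the overloaded comparison and boolean operators behave as described here), a comparison of two sregint values yields an sbit which can then be combined with other sbit values:

a = sregint(5)
b = sregint(7)
t = a < b            # sregint comparison test: the result is an sbit
u = t & sbit(1)      # boolean operation directly on sbit values
v = sregint(u)       # syntactic sugar to load the sbit into an sregint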
Moreover, the clear type regint was designed in such a way that it can be used for other tasks. One example
is as an array index, since the values used in range-loops and some while-loops are regints. Also, the value in a Mem-
Value (described later) is stored as a regint. We use regint in such a generic way because operations on regint
are faster than on cint. This is because regint values map directly to the integer values in the byte-codes, whereas
cint maps to the clear type in the byte-code.
Conversions: We can perform conversions between sint and sregint, bearing in mind that both have to be
allocated in the same integer range. This can be done by simply instantiating either an sregint or a sint object
passing the corresponding register as a parameter in the constructor as follows:
a = sint(-5)
b = sregint(-5)
# casting
d = sregint(a) # returns signed sregint
c = sint(b) # returns signed sint
d = sregint(2**64-5) # the same value, written as an unsigned 64-bit literal
e = sint.convert_unsigned_to_sint(d) # Explicit unsigned conversion
We can also convert a sint value to an sbit, on the assumption that the original sint register holds a bit (if it does
not then information could leak). We can obviously also convert back from an sbit to an sint, which does not lose
information. There is also some syntactic sugar to load an sbit value into an sregint register as well (the other
direction can be done directly using the bitsint command).
a0 = sint(0)
a1 = sint(1)
b0 = sbit(0)
b1 = sbit(1)
d0 = sregint(0)
d1 = sregint(1)
ca0=sbit(a0)
ca1=sbit(a1)
cb0=sint(b0)
cb1=sint(b1)
c0=sregint(b0)
c1=sregint(b1)
The conversion back and forth between sregint/sbit and sint values uses the daBit method of [RW19], plus
the evaluation (in the case of sint to sregint conversions) of a (relatively large) garbled circuit. Thus whilst it is
easy to pass from one secret data-type to the other, this operation has some computational and communication cost
associated with it, hence we recommend such conversions be used with care.
sbit to sint conversion: Recall a daBit is a bit b_i doubly shared in two secret sharing schemes (corresponding to our
sint and sbit types), written $\langle b_i\rangle_p$ and $\langle b_i\rangle_2$. To convert from an sbit value to a sint value we take a single daBit r and
XOR it into the sbit x. Thus we form $\langle v\rangle_2 \leftarrow \langle x\rangle_2 \oplus \langle r\rangle_2$. The value v is then opened to all parties. All parties can
then locally produce the sharing as an sint by forming
$$\langle x\rangle_p \leftarrow v + \langle r\rangle_p - 2 \cdot v \cdot \langle r\rangle_p.$$
sint to sbit conversion: The method here is the same, but in the other direction. We take a single daBit r and then
XOR it into the sint x by computing
$$\langle v\rangle_p \leftarrow \langle x\rangle_p + \langle r\rangle_p - 2 \cdot \langle x\rangle_p \cdot \langle r\rangle_p,$$
which requires consuming another Beaver triple. Then the value $\langle v\rangle_p$ is opened, and XOR'd onto the bit $\langle r\rangle_2$ to obtain $\langle x\rangle_2$.
sregint to sint conversion: Note that such a conversion may lose precision if log_2 p < 64, thus if this is the case
the user should guarantee that the number being converted is in the correct range, otherwise undefined behaviour will
occur. The basic idea is to repeat, 64 times, the above method for converting an sbit to an sint, and then do some
adjustment. To convert from a sregint value to a sint value we take 64 daBits r_i and XOR them into the bits
representing the 64-bit sregint x. Thus we form $\langle v_i\rangle_2 \leftarrow \langle x_i\rangle_2 \oplus \langle r_i\rangle_2$. The value v_i is then opened to all parties.
All parties can then locally produce the sharing as an sint of the bits x_i by forming
$$\langle x_i\rangle_p \leftarrow v_i + \langle r_i\rangle_p - 2 \cdot v_i \cdot \langle r_i\rangle_p.$$
The standard/default conversion method proceeds as above, by treating the input sregint value as a signed 64-bit
integer. We have an additional conversion routine which treats the input as an unsigned 64-bit integer and produces
the result
$$\langle x\rangle_p \leftarrow \sum_{i=0}^{63} \langle x_i\rangle_p \cdot 2^i.$$
sint to sregint conversion: To convert from a sint to a sregint type is more complex. We define two cases,
one where log_2 p > 64 and one where log_2 p < 64. Again in the latter case one needs to be careful that any values
converted fit in the correct range.
In the former case, we first take either t = log_2 p daBits r_i (when 64 + conv_stat_sec ≥ log_2 p), or t = 64 +
conv_stat_sec daBits r_i (when 64 + conv_stat_sec < log_2 p), and then form the integer $r = \sum_i r_i \cdot 2^i$ in both the
mod p and the mod 2^64 worlds. The value $\langle r\rangle_p$ is subtracted from the input $\langle x\rangle_p$ in the mod p world, and the value
x − r opened. This is then entered into a garbled circuit in the mod 2^64 world so as to add back on r. This circuit is
designed so that we convert signed integers to signed integers, where in the mod p world signing is by taking integers
with a centred rounding modulo p. Of course if you convert a value bigger than 64 bits in the mod p world over to
the mod 2^64 world then you will lose information.
For this method to be secure we need that the value r generated above is either large enough to drown out a 64-bit
value, or is itself sufficiently close to uniformly random to be secure. Thus we require one of the following three
conditions on p to be met:
1. $\log_2 p > 64 + \text{conv\_stat\_sec}$.
2. $(p - 2^{\lfloor\log_2 p\rfloor})/2^{\lfloor\log_2 p\rfloor} < 2^{-\text{conv\_stat\_sec}}$, i.e. p is just larger than a power of two.
3. $(2^{\lceil\log_2 p\rceil} - p)/p < 2^{-\text{conv\_stat\_sec}}$, i.e. p is slightly less than a power of two.
If these conditions are not met, then the conversion will abort at runtime. The value conv_stat_sec is by
default equal to 40, but this can be modified in config.h.
In the latter case, when log_2 p < 64, we repeatedly generate log_2 p random daBits r_i until the value
$r = \sum_i r_i \cdot 2^i$ is less than p. Using this form of rejection sampling we can treat r as a uniformly random value
modulo p. The number of iterations required will depend on the difference between $2^{\lceil\log_2 p\rceil}$ and p. Each iteration
requires a GC-based subtraction operation. We then perform the operations above, opening x − r, and then adding
back on r using a Garbled Circuit and adjusting to get the sign correct.
Extracting and Setting sbits: Extraction of sbits from an sregint register can be done as follows:
sa=sregint(5424)
for j in range(32,0,-1):
    sf=sbit()
    bitsint(sf,sa,j)
    cc=sf.reveal()
    print_int(cc)
print_ln("")
On regint and clear data types: As noted before, regint offers support for a number of housekeeping
tasks. Because of this, conversions between all clear types, including regint, are supported. Furthermore, any
operation that involves a clear register and a regint returns a regint register. This can be achieved as follows:
# casting
a = cint(5)
b = regint(a)
a = cint(b)
# basic operations:
d = a + b # returns regint
c = b + a # returns regint
This also implicitly allows the compiler to include operations between regint and secret mod p types. However,
this is not recommended, given that the disparity in bit lengths between the mod p and mod 2^64 types might
cause inconsistencies in the answers. We advise users to use the conversion routines instead to transition between
the two type families.
Note: value type can be: regint, int, long, sregint, sint and cint.
cint(value)
Loads the integer value into a clear register and returns the register.
Example:
i = cint(5)
Note: value type can be: regint, cint, int and long.
regint(value)
Loads the integer value into a regint register and returns the register.
Example:
i = regint(5)
Note: value type can be: regint, int, long.
sregint(value)
Loads the integer or clear integer value into a secret register and returns the register.
Example:
i = sregint(5)
Note: value type can be: regint, int, long, sregint, sint.
sfix(value)
It instantiates an sfix register based on value. If value is a publicly available value, it is loaded as a secret
shared fixed point number. If value is a secret shared integer, it is placed in the mantissa v of the sfix instance.
Example:
i = sfix(5.5)
cfix(value)
It instantiates a cfix register based on value. Note that value must be a publicly available value; it is loaded as a
clear fixed point number.
Example:
i = cfix(5.5)
sfloat(value)
It instantiates an sfloat register based on value and returns an sfloat object. Its instantiation logic mimics its
fixed point counterpart. Current implementation supports basic logical and arithmetic operations with all data types.
We describe the formatting later in this section.
Example:
i = sfloat(5.5)
Note: value type can be: int, float, sint, sfloat and sfix.
cfloat(value)
It is the clear register counterpart of sfloat. It instantiates a cfloat register based on value and returns a
cfloat object. It mimics sfloat number representation, and its main function is to serve as an interface when
operating between sfloat instances and clear types and registers.
Example:
i = cfloat(5.5)
Note: value type can be: int, float, sint, sfloat and sfix.
load_int_to_secret_vector(vector)
Loads a list of integers vector into a vectorized sint and operates on the vector as in a single instance (vectorized
instructions).
Example:
A= [1, 2, 3]
SA = load_int_to_secret_vector(A)
print_ln("Values from A: %s", SA) # the output is 123.
load_secret_mem(address)
load_clear_mem(address)
Returns the value stored in memory address of the corresponding type. The value must have been previously stored in SCALE
memory. Memory in this context refers to data storage within SCALE and not physical memory. Users select a memory address
when storing data, and the same address needs to be used to retrieve it. The calls can be used as follows:
Example:
i = cint(5)
i.store_in_mem(0)
ci = load_clear_mem(0)
print_ln("Values from A: %s", ci) # the output is 5 and of type cint.
Note: address type can be: regint, int, long and cint. These calls do the same as the functions below and work for any data-type. Note: operation supported for
sregint, regint, sint, cint, sfix and sfloat.
x.store_in_mem(address)
Stores register x into a given address of the appropriate memory type. This basically implies it can be later retrieved
by a load_mem instruction. Memory addresses are decided by the user and are stored by the compiler on SCALE.
Example:
i =sint(5)
i.store_in_mem(0)
si = load_secret_mem(0)
print_ln("Value stored on memory address 0: %s", si.reveal()) # the output is 5.
Note: address type can be: regint, int, long and cint. Note: operation supported for sregint, sint, cint, sfix and sfloat.
x.load_mem(address)
Loads the value stored in memory address to the register. The address is selected during the invocation of a
store_in_mem call.
Example:
i =sint(5)
i.store_in_mem(0)
si= sint.load_mem(0)
print_ln("Value stored on memory address 0: %s", si.reveal()) # the output is 5.
Note: address type can be: regint, int, long and cint. Note: operation supported for sregint, regint, sint, cint, sfix and sfloat.
x.reveal()
Opens the value in a secret register and returns a clear register holding the now publicly available value.
Example:
si= sint(5)
print_ln("Value in the clear from si: %s", si.reveal()) # the output is 5.
Note: x type can be: sregint, sbit, sint, sfix, and sfloat, resulting in a regint, regint, cint, cfix and cfloat respectively.
This will print the values one, then three, then one. To use other stacks one uses the prefixes, sregint, cint, sint
and sbit.
The above peek and poke operations are relative to the bottom of the stack. If you want to reference relative to the
top of the stack then use
c=regint.peek(1)
regint.poke(0,three)
This will have the same effect as the peek and poke in the previous code segment.
• >> is not implemented for cint registers as the right operand. All other shift operations for mod p types are
implemented. Left and right shifts by integers are available for sregint variables.
Boolean operations &, ∧, |, ∼ are also supported on clear registers and on sregint registers. This is true as well
for the special data type sbit, which is classified as a mod 2^64 type. The compiler returns sbit registers when
performing comparisons on sregint values. Note that comparisons are performed using subtraction and then a
comparison to zero, thus overflow errors can result from the subtraction if done at the limits of the range of representable
numbers. In the case of clear data types, comparisons work as expected for integers of bit length n, whereas for
sbit they operate as in boolean algebra. Note that the & operator is also supported between sregint and sbit
types (where it behaves as a bitwise AND of the bits in the sregint input by the single bit in the sbit input).
Access to the direct comparison with zero operations can be done via
sa=sregint(5424)
sb1=sbit()
ltzsint(sb1,sa)
sb2=sbit()
eqzsint(sb2,sa)
or
sa=sregint(5424)
sb1=sa.ltz()
sb2=sa.eqz()
Note that upon compilation all of the above operators return a new register – it is not possible to directly modify
the value of a specific register from MAMBA. For example, in the following code, the * operator creates a fresh clear
register, and assigns the result of the multiplication into this. The name c is then bound to the new register, and the
previous register is no longer accessible. The reason for this is to simplify the register allocation procedure, details of
which are in section 6.3.3.
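As a minimal sketch of the kind of fragment being described here (names for illustration only):
c = cint(3)
c = c * c   # * creates a fresh clear register holding the product; the name c is
            # re-bound to it, and the register that held 3 is no longer accessible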
When operating between different types, the result will be secret if one of the operands was a secret register.
Additionally, as in any other conventional programming language, the returned type will correspond to the type with the
strongest precision.
The goal of providing clear registers is to give the user the means to interact and operate with secret values. Our
examples below make use of secret registers but, as mentioned, all of these operations are also supported between
secret and clear registers. We now provide some examples of the basic operations. As a cautionary note,
although supported, multiplication between fixed and secret float registers might cause some loss of precision and is hence
discouraged.
On Operations between regint and clear data types: As mentioned above, operations between regint
and clear types such as cint are not directly recommended. The reason, as stated, is the discrepancy in bit
sizes between regint (which is limited to 64 bits) and mod p clear types, whose size depends on the
prime p. These operations look as follows:
a = cint(2**65) #will overflow
b = regint(3)
# overflown:
c = a + b # returns a regint
d = a * b # returns a regint
e = a - b # returns a regint
Basically, operations between these types incur an implicit cast of the cint operand (to regint). This will
cause an overflow when the bit-length of the cint register, or of the result of such an operation, is greater than 64 bits.
Multiplication: As before, multiplication is supported for secret and non-secret, integer and fractional data types.
They can be invoked as follows:
c = sint(12)
c = c * c
f = sfix(12)
f = f * f
d = c * f
g = sfloat(12)
g = g * g
h = c * g
i = f * g
j = sregint(12)
k = j * j
l = j * 12
In this small example, we can see how to multiply among different and the same data types. As a result c = 12^2,
f = 12.0^2, d = 12.0^4, g = 12.0^2, h = 12.0^4, and i = 12.0^4. Note that d is of type sfix, whereas g, h and i are of
type sfloat.
In the above when you multiply two sregint values together you only get the lower 64-bits back in the resulting
sregint. If you want to access the resulting top and bottom words then you should use
sa=sregint(2**62+212111)
sb=sregint(2**62-313131)
sd, sc = sa.mul_2_sint(sb)
j = sregint(12)
k = j + j
l = j + 12
As before, in this case the type of d is sfix, whereas the type of g, h and i is sfloat.
Division and Modulus: We first review how to perform division and modulus operations for mod p types. The
compiler can also handle division and modulus operations for such types; however, these are not generic
operations. Let us start by showing some basic constructions for division:
c= sint(12)
c= c / 3
f = sfix(3)
f = f/2
g = sfloat(3)
g = g/2
The results for c, f and g are 4, 1.5 and 1.5. Indeed, this is quite a natural way to invoke division on integer and decimal
types. But it has to be noted that division on integers is constructed as a multiplication between the numerator and
the multiplicative inverse of the denominator. This is because of the modular arithmetic the protocols are built upon.
Moreover, modulus operations are somewhat different, as we may see:
c = sint(2)
d = sint(3)
c = c % 2
d = d % 2
The operations will return, in the case of c the value 0 and for d the value 1. As with division, there are some
observations: modulo operations can only be performed with respect to powers of 2. Furthermore, note that modulo operations
cannot be performed on non-integer types: sfix and sfloat.
As regards mod 2^64 types, we have included a division instruction in the runtime for two variables of type
sregint. The compiler then overloads this to also allow division of an sregint by a regint and
division of a regint by an sregint. The operations mimic what would be expected from signed integer division.
The sregint division can be used as follows:
c = sregint(12)
d = sregint(4)
e = c / d
f = c / 4
g = 12 / d
Shift operations: We have included bit shift operations for our basic integer mod p data types. Such shifts do not
work on fractional types.
c = sint(2)
d = c << 1
e = c >> 1
In this case, the output of d is 4 as expected and from e we obtain 1. Note that bit shifts only work on integer data
types. Similar shifts can be performed on sregint variables.
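A minimal sketch of the sregint case (values for illustration):
sa = sregint(2)
sb = sa << 3                  # 16
sc = sa >> 1                  # 1
print_ln("%s %s", sb.reveal(), sc.reveal())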
Exponentiation: We also provide a built-in exponentiation operator (**) for integer and fractional
mod p data types, when the base is secret shared. This overload provides a simple implementation based on successive
multiplications. We provide more comprehensive protocols for exponentiation of fractional inputs in the following
sections.
a = sint(2)
b = cint(3)
f = sfix(1.5)
c = a**b # returns 2^3
f = f**2 # returns 1.5^2
NOTE: For the reasons explained later with respect to fractional types, the exponent has to be of a native Python
integer type. We invite the reader to consult the Advanced Protocols section for alternative methods to compute
exponentiations of fractional data types.
One can also raise an sfix value to another sfix value using the function mpc_math.pow_fx. However,
when doing this the routine uses sfloat variables internally, thus you need to ensure the two types are parametrized
correctly. In particular you must have the k and f parameters for sfix greater than the vlen parameter for
sfloat.
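A minimal sketch of such a call is given below; it assumes mpc_math is imported by name, in the same way as the floatingpoint module later in this document.
import mpc_math

a = sfix(1.5)
b = sfix(2.0)
c = mpc_math.pow_fx(a, b)     # 1.5 raised to the power 2.0
print_ln("%s", c.reveal())    # approximately 2.25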
Comparisons (Inequality Tests): We have in-built operators for comparisons as well. Indeed, comparison of secret
values is supported, and returns a secret output containing 0 (false) or 1 (true). They work on both integer (either
mod p or mod 2^64 registers) and fractional data types. They can be used as follows:
Example:
# mod p
a = sint(1)
b = sint(2)
c = a < b
d = sfloat(3)
f = a < d
# mod2n
h = sregint(1)
i = sregint(2)
j = h < i # sbit
k = regint(2)
j = h < k # sbit
NOTE: When executing a comparison using an sregint register, the return data type is going to be an sbit. We
can then build secret shared complex statements using boolean operators.
Boolean Operators for sbit and sregint: We have included 4 basic boolean operators (&, ∧, |, ∼) for sbit types.
They behave in the same way as the native boolean type. sbit values are produced by the comparison tests performed on
sregint registers, or can be instantiated via their constructor. The aim is to give the user a way to build complex
logic predicates for, say, if-like constructions. The operators can be invoked as follows:
# mod2n
a = sregint(1)
b = sregint(2)
c = a < b # sbit 1
d = a > b # sbit 0
temp_bit = sbit(1)
c = c & temp_bit
e = c | d # or
f = c & d # and
g = c ^ d # xor
h = ~ c # negation
# special case and
i = c & b # sregint with value 2
As previously mentioned, the & operator has been overloaded to also support operations between sbit and
sregint types. In this specific case, its behaviour is similar to a multiplication and it returns an sregint register.
The equivalent operations on sregint registers are bitwise operations on the shared 64-bit values. One can also
perform such operations between a sregint and a regint register.
# mod2n
a = sregint(1)
b = sregint(2)
c = regint(2)
d = a & b
e = a & c
f = a | b
g = a | c
h = a ^ b
i = a ^ c
j = ~ a
On Clear Data types: Before a comparison of clear integers (cint) is done, all operands are reduced into the range
[−2^(t−1), ..., 2^(t−1)]. Note that the virtual machine has no boolean type, so any comparison operation returns a clear
register value of 0 (false) or 1 (true).
Secret mod p Data types. The bit length of the comparison operators defaults to the parameter t, which is set-up by
the compiler based on the input modulus, but if a different length is required it can be specified with the following
functions:
x.less_than(y, bit_length, sec_param)
x.greater_than(y, bit_length, sec_param)
x.less_equal(y, bit_length, sec_param)
x.greater_equal(y, bit_length, sec_param)
x.equal(y, bit_length, sec_param)
x.not_equal(y, bit_length, sec_param)
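As a minimal sketch (the values and the 32-bit comparison length are chosen purely for illustration):
a = sint(1)
b = sint(2)
c = a.less_than(b, 32)
print_ln("%s", c.reveal())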
The output in this case is 1, as expected. The last two parameters are optional.
sint.get_random_triple()
Returns three secret registers a, b, c such that a · b = c.
a,b,c =sint.get_random_triple()
print_ln("these 2 results are equal %s, %s", (a*b).reveal() c.reveal())
The code above will show in this case c and the result of the multiplication of a and b, that should be equal.
sint.get_random_bit()
Returns a secret value b, with value in {0, 1}. The function can be used as follows:
b =sint.get_random_bit()
print_ln("the result is either 0 or 1 %s", b.reveal())
The code will get a secret shared random bit.
sint.get_random_square()
Returns two secret values a, b such that a^2 = b. Let us see the following example:
a,b =sint.get_random_square()
print_ln("these 2 results are equal %s, %s", (a*a).reveal() b.reveal())
The code will output the value of a · a versus b, which are equal values.
sint.get_random_int(nbits)
Parameters:
nbits: bitsize of the secret shared randomness to be produced. Must be a native Python integer variable (not
any MAMBA data-type). The function returns a random integer of size nbits. This does not come directly from
preprocessed data, but instead loads nbits random bits and uses these to create the random secret integer. The
function can be used as follows:
a = sint.get_random_int(5)
print_ln("the result is smaller than 2^5 %s", a.reveal())
The output is an integer bounded by 2^5. Note that nbits can be a public input by the parties.
7.1.8 Printing
We provide the following functions to printout outputs:
print_str(string, *args)
print_ln(string, *args)
Both functions do the same thing; the only difference is that print_ln adds a newline after execution. Arguments are
inserted into the string in place of each %s respectively.
Example:
x = 13
y = cint(5)
z = sint(x)
print_ln("x = %s, y = %s, z = %s", x, y, z.reveal())
Will print x = 13, y = 5, z = 13.
There are other member functions which perform printing as well, these are
cfix.print_fix()
regint.print_reg()
cfloat.print_float()
cint.print_reg()
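A minimal sketch of their use (values for illustration):
a = cint(7)
a.print_reg()      # print the clear register
b = cfix(2.25)
b.print_fix()      # print the clear fixed point value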
for i in range(n):
    print_str("%s ", z_array[i].reveal())
This might seem useless - why would we want to do the same multiplication 10-fold? Well, we can put different data
in each slot by first dumping same-length arrays to x and y, and then the multiplication is going to be faster due to SIMD.
Given values v, β = 2 and f, in SCALE we can represent a rational value as follows:
x ≈ v · β^(−f).
The result might differ slightly from the intended rational value, given the truncation of the information contained in the number. You can think
of f as the bitwise precision of the given fixed point representation.
Data Components: The following are the most important data stored by the class:
v
Accessed by: Default.
Type: sint.
Stores a register of type sint in the interval [−2^(k−1), 2^(k−1)], encoding the original rational value.
f
Accessed by: Default.
Type: int.
Stores the bitwise bit precision of the value to be stored by the instance.
k
Accessed by: Default.
Type: int.
Defines the mapping space. Basically, we can map numbers from −2^(k−1) to 2^(k−1), such that k − f ≥ 0 so that no
overflow occurs.
Special Operations:
sfix.set_precision(f, k = None)
Accessed by: Default.
Parameters:
- f: New bitwise precision value.
- k: New bitwise interval parameter k. Default: None.
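A minimal sketch of a call (mirroring the cfix version later in this section; if you change the sfix precision you should also change the cfix precision to match):
fixed_f = 20
fixed_k = 41
sfix.set_precision(fixed_f, fixed_k)
cfix.set_precision(fixed_f, fixed_k)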
sfix.load_int(v)
Accessed by: Default.
Parameters:
v: Integer value to load into a sfix instance.
Returns: No return value.
Description: It is used to do explicit initialization of an sfix value from any integer instantiation value.
Example: To initialize a sfix value with an integer:
a = sfix()
a.load_int(5)
b = a*3.0
print_ln("the answer is %s", b.reveal()) #the output is 15
sfix.conv()
Accessed by: Default.
Parameters: N/A
Returns: sint value corresponding to the mantissa (v value) mapping.
Description: Obtains the mantissa (v value) mapping of the value stored by the instance.
Example: To obtain the value of the mantissa:
a =sfix()
a.load_int(4.5)
v = a.conv()
print_ln("the answer is %s", v.reveal()) # the output is 4718592
sfix.sizeof()
Accessed by: Default.
Parameters: N/A
Returns: Python native int value corresponding to the size of memory slots occupied by the instance.
Description: It returns the global vector size times 1.
Example: To obtain the size you can execute:
a =sfix()
a.load_int(4.5)
r = a.sizeof()
print_ln("the answer is %s", r) # the output is 1.
# By Default the global_vector_size is set to 1.
sfix.compute_reciprocal()
Accessed by: Default.
Parameters: N/A
Returns: sfix value corresponding to the reciprocal of the instance.
Description: It calculates and returns the reciprocal of an instance in secret shared form, in whatever precision is
supported by sfix.
Example: To obtain reciprocal you can execute:
a =sfix()
a.load_int(4.5)
r = a.compute_reciprocal()
print_ln("the answer is %s", r.reveal()) # the output is 0.222222
Observations:
- The class should not be initialized from a sint object. Application level invocations should use the function
load_int.
- The default precision and mantissa size for sfix are fixed assuming at least a 128 bit modulus. The values
(f, k) = (20, 41) are fixed directly in types.py and were chosen taking size restrictions into account. Note
that by default the internal parameter kappa is fixed to 40 bits.
- You might get weird results due to precision loss when operating with numbers close to the maximum allowable
values.
Data Components: The following are the most important data stored by the class:
v
Accessed by: Default.
Type: int.
Stores a register in the interval [−2^(k−1), 2^(k−1)], encoding the original rational value.
f
Accessed by: Default.
Type: int.
Stores the bitwise bit precision of the value to be stored by the instance.
k
Accessed by: Default.
Type: int.
Defines the mapping space. We can map numbers from −2^(k−1) to 2^(k−1), such that k − f ≥ 0 so that no overflow
occurs.
Special Operations:
cfix.set_precision(f, k = None)
Accessed by: Default.
Parameters:
- f: New bitwise precision value.
- k: New bitwise interval parameter k. Default: None.
By default the values (f, k) = (20, 41) are chosen internally by the compiler. If you change the precision for cfix
you should also change it for sfix, otherwise you can get some strange arithmetic behaviour. It is used to fix the
default precision in types.py.
Example: To change the precision of a cfix:
fixed_f=20
fixed_k=41
cfix.set_precision(fixed_f, fixed_k)
cfix.load_int(v)
Accessed by: Default.
Parameters:
v: Public integer value to load into this cfix instance.
Returns: No return value.
Description: It is used to do explicit initialization of a cfix value from any integer instantiation value.
Example: To initialize a cfix value with an integer:
a = cfix()
a.load_int(5)
b = a*3.0
print_ln("the answer is %s", b) #the output is 15
cfix.conv()
Accessed by: Default.
Parameters: N/A
Returns: cint value corresponding to the mantissa (v value) mapping.
Description: Obtains the mantissa (v value) mapping of the value stored by the instance.
Example: To obtain the value of the mantissa:
a =cfix()
a.load_int(4.5)
v = a.conv()
print_ln("the answer is %s", v) # the output is 4718592
cfix.sizeof()
Accessed by: Default.
Parameters: N/A
Returns: Python native int value corresponding to the size of memory slots occupied by the instance.
Description: It returns the global vector size times 1.
Example: To obtain the size you can execute:
a =cfix()
a.load_int(4.5)
r = a.sizeof()
print_ln("the answer is %s", r) # the output is 4.
# By Default the global_vector_size is set to 1.
cfix.compute_reciprocal()
Accessed by: Default.
Parameters: N/A
Returns: cfix value corresponding to the reciprocal of the instance.
Description: It calculates and returns the reciprocal of the instance, on whatever precision is supported by cfix.
Example: To obtain reciprocal you can execute:
a = cfix()
a.load_int(4.5)
r = a.compute_reciprocal()
print_ln("the answer is %s", r) # the output is 0.222222
Observations:
- A class instance cannot be initialized directly via a secret shared input. In case you are loading a cint value, we
recommend you use load_int. You should work with a cfix as you would with a sfix type.
- cfix, by default, uses the same precision and mantissa size as sfix. That means it requires at least a 128 bit
modulus. The values (f, k) = (20, 41) are fixed directly in types.py and were chosen taking size restrictions
into account.
x ≈ (1 − 2 · s) · (1 − z) · v · 2^p,
where s is the sign bit, z is a bit to signal whether the value is zero or not, v is the significand or mantissa, and p is the exponent. We
also maintain a flag err which determines whether some form of error state in the floating point variable has occurred
(e.g. underflow, overflow, division by zero, taking square roots of negative numbers etc). Note, like all floating point
representations such an encoding is an approximation. The reader should note that our encoding does not directly
emulate the IEEE floating point standard, but tries to mimic it (thus the precision etc. of results will be different from
the IEEE standard).
If you want to have IEEE floating point arithmetic then you can do this by using an sregint which encodes an
IEEE double value. You can then operate on these values using the Garbled Circuit routines, see Section 10.6. It is
possible to convert between the sfloat and IEEE double values held in an sregint using the following operations.
import floatingpoint
a = sfloat(0.0)
b = floatingpoint.sfloat_to_ieee(a)
c = floatingpoint.ieee_to_sfloat(b)
The conversion will always work, but its accuracy will depend on the values of plen and vlen below.
Our encoding is in line with the one described in detail by Aliasgari et al. [ABZS13] and used across various
complex protocols for mathematical operations in MPC. The values s, z, v, p and err are kept in secret shared format.
We also maintain a statistical security parameter κ associated with an sfloat which needs to satisfy
It is possible to change the default sizes for sfloat values using the following commands.
sfloat.plen=5
sfloat.vlen=10
sfloat.kappa=20
The default values being 8, 24 and 40 respectively.
The err flag needs to be treated with some care. For efficiency reasons this can be any integer, but only a value
of zero represents a valid number. The flag is initialized to 0, and this value only changes in case an error is
produced. We propagate the error by adding the err flags of the operand instances. Thus any positive value indicates
an error, and could also, if revealed, reveal how many errors have occurred in a calculation. Thus before revealing an
sfloat instance, we first obtain the bit b = (err == 0), and multiply it into the data components of the instance. So
an sfloat value will equal zero if any error has occurred. That way we guarantee that the output does not leak any
information related to the inputs or any intermediary value of the calculation.
The sfloat type supports basic arithmetic and logic operations with any clear or secret register, as well as
standard data types. Public values, as well as standard types, are implicitly cast to cfloat before operating on an
sfloat instance. The result will always be of type sfloat. It is not recommended to mix fixed and floating point
operations unless precision loss is taken into account.
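A minimal sketch of the behaviour just described (values for illustration; the mixing with a sint is assumed to behave as stated above):
a = sfloat(2.5)
b = a + 1            # the public 1 is implicitly cast to cfloat; b is an sfloat
c = a * sint(2)      # operating with a secret register also yields an sfloat
print_ln("%s %s", b.reveal(), c.reveal())   # 3.5 and 5.0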
Data Components: The following is a detailed description of the most important member state variables of sfloat,
their behaviour, and properties:
v
Accessed by: Default.
Type: sint.
Stores the mantissa/significand of the encoding in secret form.
p
Accessed by: Default.
Type: sint.
Stores the exponent of the encoding in secret form.
s
Accessed by: Default.
Type: sint.
Stores a {0, 1} value holding the sign of the instance, where ⟨0⟩ means the value is positive and ⟨1⟩ that it is negative.
z
Accessed by: Default.
Type: sint.
Stores a {0, 1} value to flag when the sfloat instance’s value is zero. When this variable takes the value ⟨0⟩, it
means that the sfloat instance is non-zero, and ⟨1⟩ otherwise.
err
Accessed by: Default.
Type: sint.
Stores a register of type sint that serves to flag whether the instance is reporting a numeric error. When this variable
takes the value ⟨0⟩, it means that the sfloat instance is valid, and it is non-zero otherwise. Although not recommended,
in case of an error, if this value is later opened it would disclose the number of chained operations executed on top of
its operation tree since the error was produced.
Special Operations:
sfloat.set_error(error)
Accessed by: Default.
Parameters:
error: sets the default precision error.
Returns: N/A.
Description: Not to be confused with the err flag. The error in this case pertains to the precision of the representation,
which is initialized to 0. It can be increased as follows:
cls.error += error - cls.error * error
The formulation follows the literature on this topic.
Example: To alter the error you can do the following:
a = sfloat(1.5)
a.set_error(0.1)
sfloat.convert_float(v,vlen,plen)
Accessed by: Default.
Parameters:
v: publicly available value (a numeric Python data-type) to be transformed into float representation.
vlen: bit-length of the mantissa that will encode v.
plen: bit-length of the exponent encoding of v.
Returns: A tuple composed of the 5 data components of an sfloat. It can be used to instantiate a new sfloat
object.
Description: Transforms and secret shares a plain text input into our floating point representation. Note that both the
vlen and plen parameters should be sufficiently large to handle v.
Example: An example usage would look as follows:
float_value = sfloat(9999, 14, 1, 0, 0)
print_ln("The float representation of 9999 is %s", float_value.reveal())
sfloat.sizeof()
Accessed by: Default.
Parameters: N/A
Returns: Python native int value corresponding to the size of memory slots occupied by the instance.
Description: Note that each sfloat instance is composed of 5 different values (v, p, z, s, and err), so it returns the
global_vector_size times 5. In case the instance is vectorized, it returns the vector size times 5.
Example: To obtain the size you can execute:
a = sfloat(4.5)
r = a.sizeof()
print_ln("the answer is %s", r) # the output is 5.
# By Default the global_vector_size is set to 1.
Data Components: The following is a detailed description of the most important member state variables of cfloat,
and how it stores its value representation:
v
Accessed by: Default.
Type: cint.
Stores the significand in clear form.
p
Accessed by: Default.
Type: cint.
Stores the exponent in clear form.
s
Accessed by: Default.
Type: cint.
Stores a {0, 1} value holding the sign of the instance, where 0 means the value is positive and 1 that it is negative. Our aim is to mimic
the behaviour of its secret shared counterpart.
z
Accessed by: Default.
Type: cint.
Stores a {0, 1} value to flag when the cfloat instance’s value is zero. When this variable takes the value 0, it means
that the cfloat instance is non-zero, and 1 otherwise. Our aim is to mimic the behaviour of its secret shared
counterpart.
err
Accessed by: Default.
Type: cint.
Stores a register of type cint that serves to flag whether the instance is reporting a numeric error. When this variable
takes the value 0 it means that the cfloat instance is valid, and it is non-zero otherwise.
Special Operations:
cfloat.sizeof()
Accessed by: Default.
Parameters: N/A
Returns: Python native int value corresponding to the size of memory slots occupied by the instance.
Description: Note that each cfloat instance is composed of 4 different values (v, p, z and s), so it returns the
global_vector_size times 4. In case the instance is vectorized, it returns the vector size times 4.
Example: To obtain the size you can execute:
a =cfloat(4.5)
r = a.sizeof()
print_ln("the answer is %s", r) # the output is 4.
# By Default the global_vector_size is set to 1.
7.2.5 Branching and Looping
As discussed earlier, all native Python loops and branches are evaluated at compile time. So to allow for control
flow breaks depending on register values that are not known at compile time, and also to prevent unrolling of loops
(reducing code size), the special library functions for looping and branching should be used. Because of the constraints
of the compiling process, there are several possibilities for both branching and looping. Some of them require that the
body of a loop or branch is written in a separate function. The loop or branch is then initiated by a special function
call, taking the loop or branch function as a parameter.
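As a minimal sketch of the two branching styles referred to below (the exact signature of if_statement, taking the condition and the two branch functions, is assumed here):
c = cint(1)            # the condition register; non-zero counts as true

# First style: the branch bodies are written as separate functions
def if_true():
    print_ln("True")
def if_false():
    print_ln("False")
if_statement(c, if_true, if_false)

# Second style: inline statements between if_then/else_then/end_if
if_then(c)
print_ln("True")
else_then()
print_ln("False")
end_if()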
In that case the value in register c is non-zero (1), so in both cases True will be printed in the terminal. More complex
if/else statements (e.g. with multiple else conditions) can be created by simply nesting further if_statement calls
within if_false in the first example, or adding another statement after else_then in the second example.
Looping: We support all the usual types of while-loops and range-loops; as in other languages, the same function
can be implemented using different types of loops, as the simple examples below show.
do_loop(condition, loop_fn)
Executes the code in loop_fn once, and continues to do so while the clear register returned by loop_fn
is non-zero. Unlike with if and else statements, here the loop function must always return the loop condition regis-
ter. Otherwise there is no way of knowing which register holds the condition after each iteration, since the original
condition register cannot be modified:
def loop_body(cond):
    print_ln("%s", cond)
    return cond-1

t = cint(5)
do_loop(t, loop_body)
Prints numbers from 5 to 1.
Using while_loop:

def loop_body(i):
    print_ln("%s", i+1)
    return i+1

while_loop(loop_body, lambda x: x < 5, 0)

Using @while_do:

@while_do(lambda x: x < 5, 0)
def loop_body(i):
    print_ln("%s", i+1)
    return i+1
Both print the numbers from 1 to 5.
range_loop(loop_body, stop)
range_loop(loop_body, start, stop[, step])
For-range loops can also be implemented in two ways; the second way is not very different from a classic for loop. If the start
value is not given it is automatically set to 0, as in the example:
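A minimal sketch (values for illustration):
def loop_body(i):
    print_ln("%s", i+1)

range_loop(loop_body, 5)   # start defaults to 0; prints the numbers from 1 to 5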
do_while(loop_body)
Finally, this is the most traditional variant of a do-while loop. The loop is stopped when the return value is zero or false.
However, variables declared outside the function cannot be modified inside the function, although they can be read. So in
order to create a finite loop, memory has to be used; MemValue is the easiest way to deal with memory:
Using do_while:

def loop_body():
    m.write(m+1)
    print_ln("%s", m)
    return m < 5

m = MemValue(cint(0))
do_while(loop_body)

Using @do_while:

m = MemValue(cint(0))

@do_while
def loop_body():
    m.write(m+1)
    print_ln("%s", m)
    return m < 5
Both print the numbers from 1 to 5.
Memory can also be used in the classical way (the address used inside the function is read only); the same function once again:
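A minimal sketch of this style, assuming memory address 0 is free for the example:
cint(0).store_in_mem(0)

@do_while
def loop_body():
    m = cint.load_mem(0) + 1
    m.store_in_mem(0)
    print_ln("%s", m)
    return m < 5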
7.2.6 Arrays
Arrays work much as in other programming languages, using the array[index] indexing method. To declare an
array:
type.Array(size)
Where size is a compile-time known integer and type is the data type of a single array element, for example sint or cint.
Note that in previous versions the array declaration was Array(size, type). This is still supported but should be considered
deprecated.
new_array = sint.Array(10)

@for_range(len(new_array))
def range_body(i):
    new_array[i] = sint(i+1)

@while_do(lambda x: x < 5, 0)
def while_body(i):
    print_ln("%s", new_array[i].reveal())
    return i+1
This declares a new array of size 10, then fills it with the values 1 to 10, and at the end prints the first five values, so it prints
the numbers from 1 to 5. Note that the values of an array can be modified inside a function; they behave exactly like MemValue.
type.Matrix(rows, columns)
2D arrays are also implemented, but they are called matrices; a matrix can be used in the same way as an array:
new_matrix = sint.Matrix(3,2)

@for_range(len(new_matrix))
def range_body(i):
    new_matrix[i][0] = sint(2*i)
    new_matrix[i][1] = sint(2*i+1)

@while_do(lambda x: x < 3, 0)
def while_body(i):
    @for_range(2)
    def range_body(j):
        print_ln("%s", new_matrix[i][j].reveal())
    return i+1
A matrix is just an array of arrays, so the length of a matrix is the number of arrays inside it. As shown in the example, the
first value in the declaration is the number of arrays inside, and the second is the length of each array (the example prints the
numbers from 0 to 5).
type.MultiArray([n_1,...,n_k])
kD arrays are also implemented. They are meant to be just containers (to retrieve and set data) - no fancy functions, such as
assign all etc., are added on top of them:
n = 3
m = 4
p = 5
a = sint.MultiArray([n,m,p])
b = sint.MultiArray([n,m,p])
c = sint.MultiArray([n,m,p])
for i in range(n):
    for j in range(m):
        for k in range(p):
            a[i][j][k] = i + j + k
            b[i][j][k] = 2 * (i + j + k)
            c[i][j][k] = (a[i][j][k] + b[i][j][k])
            # now c[i][j][k] = 3 * (i + j + k)
Short-circuit evaluation: The following functions provide short-circuit evaluation, which means that only the necessary
terms are evaluated, just as && and || in C, or the and and or operators in Python:
However, the functions should be combined directly because they already output lambda functions:
not_(and_(lambda: x < y, lambda: a < b), lambda: c < d)
It is possible to use short-circuit evaluation for branching and do_while loops but not do_loop loops because the
condition also represents the state in the latter case.
if_then(and_(lambda: x < y, lambda: a < b))
...
end_if()
@do_while
def f():
...
return and_(lambda: x < y, lambda: a < b)
@function_block
def f(i):
    print_ln("The value is %s", i)

@function_block
def h(i):
    return (sint(0), sint(1)), sint(i)

@function_block
def m(i=1):
    return i+1
g()
f(3)
hh=h(4)
print_ln("The returned value is %s %s %s",hh[0][0].reveal(), hh[0][1].reveal(), hh[1].
reveal())
print_ln("m value : %s",m())
print_ln("m value : %s",m(2))
The functions are called by executing a CALL byte-code, which pushes the current program counter onto the integer stack.
If you call RETURN() from within a MAMBA program this will execute a jump to the address on the integer
stack. If the integer stack is empty, i.e. you are not within a function or subroutine, then this jump is to the end of the
current tape.
7.2.8 Multi-threading
Creating and running multiple, concurrent tapes on the virtual machine is supported via a simple threading interface.
A thread is created by creating an MPCThread object referring to a function, which contains the code to be executed
by the thread.
def func1():
    store_in_mem(sint(1), 0)
    store_in_mem(sint(1), 1)
    for i in range(8):
        a = load_secret_mem(i)
        b = load_secret_mem(i+1)
        store_in_mem(a+b, i+2)

t = MPCThread(func1, 't1')
t.start()
t.join()
7.2.9 Testing
Testing of programs is done using the test.sh script, and the library functions test and test_mem. To test
output, execution of the library functions is emulated in Python by the test_result.py script, and compared with
the actual output from running the virtual machine. To trigger tests when running the script, calls to the following
functions must be inserted into the source code file.
7.2.10 SIMD Operations
As explained in the section on byte-codes (Section 5) most byte-codes can be vectorized. In most cases this results in
just a saving on resulting code size, and also in cycles for processing the byte-codes. However for the STARTOPEN
and STOPOPEN commands this results also in a reduction in the number of rounds of communication needed.
Just as one can call the non-vectorized byte-codes directly from MAMBA, one can also call the vectorized ones
directly as well. See how a parallel load and store from memory is done in the following example. The ldms byte-
code is executed n times in parallel by changing it to vldms and then giving the number of times this should be
executed as the first argument.
n = 100
A = Array(n, sint)
B = Array(n, sint)
C = Array(n, sint)
a = sint(size=n)
b = sint(size=n)
vldms(n, a, A.address)
vldms(n, b, B.address)
c = a * b
vstms(n, c, C.address)
C[0].reveal().print_reg()
C[1].reveal().print_reg()
C[n // 2].reveal().print_reg()
C[n - 1].reveal().print_reg()
Notice the open associated with the multiplication is performed in parallel, executing 100 multiplications in one go. This single MAMBA
line is compiled down into byte-codes as...
vtriple 100, s100, s0, s200 # 0
vsubs 100, s400, s500, s0 # 2
vsubs 100, s300, s500, s100 # 4
vstartopen 100, 2, s300, s400 # 5
vstopopen 100, 2, c100, c200 # 6
vmulm 100, s300, s0, c100 # 7
vadds 100, s0, s200, s300 # 8
vmulm 100, s200, s100, c200 # 9
vadds 100, s100, s0, s200 # 10
vmulc 100, c0, c100, c200 # 11
vaddm 100, s0, s100, c0 # 12
Usage of the SIMD operations works in much the same way as the above code snippet. For example, if you call a class
function with no argument from which to deduce the vector length, you need to also pass in the vector length you want via
size=n. Otherwise the SIMD vector length is deduced from the class variable. For example see the following code:
n = 100
A = Array(n, sint)
# assign some dummy values
for i in range(n):
A[i] = sint(2 * i)
a = sint(size=n)
b = sint(size=n)
vldms(n, a, A.address)
d = a + b
d.reveal_to(2)
e=d.reveal()
f= a*e
f.reveal_to(0)
g=a+e
g.reveal_to(0)
h=a.reveal()
i=e*h
i.public_output()
j=e+h
j.public_output()
sint.push(a)
sint.poke(s,b)
c=sint.peek(s)
c.reveal_to(1)
Note, not all vectorized operations have been tested for correctness; if you find a bug let us know and we will fix it.
8 The IO Class
A major change in SCALE over SPDZ is the way input and output is handled in the system to an outside data
source/sink. In SPDZ various hooks were placed within the compiler/byte-code to enable external applications to
connect. This resulted in code-bloat as the run-time had to support all possible ways someone might want to connect.
In SCALE this is simplified to a simple IO procedure for getting data in and out of the MPC engine. It is then up
to the application developer to write code (see below) which catches and supplies this data in the way that is wanted
by their application. This should be done without needing to modify the byte-code, runtime system, or any compiler
being used.
To use this interface, an application programmer will have to write their own derived class, just like in the example
class
src/Input_Output/Input_Output_Simple.h
given in the distribution. Then to compile the system they will need to place the name of their derived class in the correct
place in the main program,
src/Player.cpp
including any configuration needed. To load your own IO functionality you alter the line
auto ios= std::make_unique<Input_Output_Simple>();
let mut your_struct = ...; // somehow initialize your object
let handle = ScaleMambaHandle { handler: &mut your_struct };
let exit_code = run_scale_from_rust(
...,
create_input_output_base_rust,
&mut handle,
);
assert_eq!(exit_code, 0);
These correspond to a combination of the old SPDZ byte-codes STARTPRIVATEOUTPUT and STOPPRIVATEOUTPUT.
The instruction enables a designated party to obtain the output of one of the secretly shared variables; this is then passed to
the system by a call to the function IO.private_output_gfp(const gfp& output,unsigned int channel)
in the C++ IO class.
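From MAMBA such a private output is typically obtained via reveal_to; a minimal sketch:
a = sint(42)
a.reveal_to(1)   # only player 1 learns the value of a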
PRIVATE_INPUT.
This corresponds to the old SPDZ byte-codes STARTINPUT and STOPINPUT. The instruction enables a designated
party to enter a value into the computation. The value that the player enters is obtained via a call to the member
function IO.private_input_gfp(unsigned int channel) in the C++ IO class.
8.2.5 Share Output
In some situations the system might want to store some internal state, in particular the shares themselves. To enable
this we provide the OUTPUT_SHARES byte-code (corresponding roughly to the old READFILESHARE byte-code
from SPDZ). The IO class can do whatever it would like with this share obtained. However, one needs to be careful
that any usage does not break the MPC security model. The member function hook to deal with this type of output is
the function IO.output_shares(const Share& S,unsigned int channel).
8.3.2 Trigger
There is a special function IO.trigger(Schedule& schedule) which is used for the parties to signal that
they are content with performing a RESTART command. See Section 9 for further details.
8.3.4 Crashing
The byte-code CRASH calls the function IO.crash(unsigned int PC, unsigned int thread_num).
This enables the IO class to process any crash in an application specific way if need be. If the IO.crash
command returns, as opposed to causing a system shut-down, then the RESTART instruction will be called immedi-
ately. This means an online program can crash, and then a new one can be restarted without losing all the offline data
that has been produced.
# Private Input from player 0 on channel 0 and on channel 15
a=sint.get_private_input_from(0)
a=sint.get_private_input_from(0,15)
9 Programmatic Restarting
Because, unlike in SPDZ, the offline and online phases are totally integrated, it can take a long time for the queues of
pre-processed triples to fill. This means that quickly starting an instance to run a short program can be very time-
consuming. For reactive systems, in which the precise program to be run is not known until just before
execution, this can be a problem. We therefore provide a mechanism, which we call RESTART, to enable a program
to programmatically restart the online runtime, whilst still maintaining the current offline data. In the byte-code this is
accessed by the op-code RESTART and in MAMBA it is accessed by the function restart(Schedule &schedule).
Both must be called in online thread zero.
The basic model is as follows. You start an instance of the SCALE engine with some (possibly dummy) program.
This program instance will define the total number of online threads you will ever need; this is because we do not
want a restart operation to have to spawn new offline threads. You then ensure that the last operation performed by this
program is a RESTART/restart(Schedule &schedule). This must be executed in thread zero, and to avoid
undefined behaviour should only happen when all other thread programs are in a wait state.
After calling RESTART/restart(Schedule &schedule) the runtime waits for a trigger to be obtained
from each player on the IO functionality (see Section 8). This trigger can be anything, but in our default IO class it
is simply each player needing to enter a value on the standard input. The reason for this triggering is that the system
which is calling SCALE may want to replace the underlying schedule file and tapes; thus enabling the execution of a
new program entirely.
The specific restart(Schedule &schedule) function in the IO functionality will now be executed. In the
provided variant of IO functionality, given in Input_Output_Simple, the schedule file is reloaded from the same
directory as the original program instance. This also reloads the tapes etc.
You could implement your own restart(Schedule &schedule) function in your own IO functionality
which programmatically alters the underlying schedule to be executed, via the argument to restart. This enables a
more programmatic control of the SCALE engine.
To see this in operation we provide the following simple examples, in Programs/restart_1/restart.mpc
and Programs/restart_2/restart.mpc. To see these in operation execute the following commands from the
main directory (assuming you are using the default IO functionality):
\cp Programs/restart_1/restart.mpc Programs/restart/
./compile.py Programs/restart
Now run the resulting program as normal, i.e. using Programs/restart as the program. When this program has
finished it asks (assuming you are using the default IO class) for all players to enter a number. Do not do this yet! First
get the next program ready to run by executing
\cp Programs/restart_2/restart.mpc Programs/restart/
./compile.py Programs/restart
Now enter a number on each players console, and the second program should now run. Since it also ends with a
RESTART byte-code, you can continue in this way forever.
once all other threads have finished. A demonstration of the clear_memory() command called from MAMBA is
available in the program directory /Programs/mem_clear_demo.
The second command is clear_registers(), which corresponds to the byte-code CLEAR_REGISTERS.
This resets the registers in the calling processor to zero; it does nothing to the register files in the other threads. The
purpose is again to avoid mistakes from programmers. One should consider emitting this instruction at the end of the
code of every thread. However, the instruction ordering when compiled from MAMBA may be erratic (we have not
fully tested it); thus its usage should be considered experimental. The exact order of it being emitted can be checked
by running the player with a negative verbose number (which outputs the instructions being executed in the order they
are found). Again the usage is demonstrated in the program directory /Programs/mem_clear_demo.
10 System and User Defined Binary Circuits
SCALE-MAMBA allows one to call user defined circuits, both from the byte-code and the MAMBA language. In this
section we explain how this is done. We also explain the circuits we provide: These currently are for some symmetric
cryptographic operations such as AES, SHA-2 and SHA-3, as well as IEEE floating point arithmetic.
src/GC/Circuit.h
To create such files you can use any tool you want, but here is how we do ours....
If you look in the directory Circuits/VHDL you will find VHDL code for various functions. Write your VHDL
code and compile it to a netlist using whatever tool chain you have. The key point is that inputs and outputs to the
function must be a multiple of 64-bits in length. If this is not the case you just need to pad the input/output to the
correct size.
You then need to convert the netlist into a simple “Bristol Fashion”. To do this we use a small program called
convert.x which lives in the Circuits directory. This takes a .net file in the VHDL subdirectory and turns it
into a .txt file in the Bristol subdirectory.
Note 1: Some optimizations are applied to the circuit on this transfer, which include removing unused wires etc.
If your original function needed padding to make 64-bit multiple inputs/outputs you need to stop this optimization. To
do this just comment out the line SC.Simplify() in the program convert.cpp.
Note 2: The circuits in the subdirectory Bristol are in the simple Bristol fashion format, with no MAND gates.
You should now have a Bristol Fashion file with an extension .txt in the directory Circuits/Bristol.
To test this file does what you expect it to do you can use similar code to that which is given in the program
Test/Test-Circuit.cpp or Test/Test-Convert.cpp.
What this does is tell the runtime that we are going to use a new circuit numbered 65536, that it is not yet loaded into
memory, and that when it does need to be loaded where to get it from.
3 If you have a cool circuit which you think others might find cool to use in SCALE-MAMBA, just send it to us and we might include it in a
future release with a circuit number less than 65536.
10.3 Byte Code Operation
The circuit is executed by the GC byte-code. This takes as input a single integer which defines the circuit to be
evaluated. The arguments to the circuit, as sregint values, are then popped off the sregint-stack and passed
into the binary circuit engine. On return the resulting values are pushed onto the sregint-stack. Since an
sregint register is 64-bits long, this explains why the circuits take inputs/outputs in multiples of 64 bits.
When this instruction is executed, if the circuit has not yet been loaded it is loaded into memory. Then the binary
circuit engine is called with the given input values to produce the output. Note, that the first time a GC operation is
met, the runtime might give a small delay as the various pre-processing queues need to fill up, this is especially true
for the HSS based binary circuit engine. In subsequent GC operations this stall disappears (unless you are doing a lot
of GC operations one after another).
Note: When using the binary circuit engine for Shamir and Replicated sharing, when loading a user defined circuit
for the first time there is also a delay as the circuit gets 'optimized' on the fly by adding MAND gates instead of many
single AND gates. This can take a while for large circuits. Thus, if timing GC operations, always do a dummy execution
first before taking any timings.
Due to the way the queues are produced, if you pass in a VERY big circuit the runtime might abort. If this happens
please let us know and we will fix this. But as we never meet this limit we have a kludge in place to just catch the issue
and throw an exception. The exception says to contact us, so you will know which one it is.
def push_data(stuff,n):
    for i in range(n):
        sregint.push(stuff[i])

def pop_data(stuff,n):
    for i in range(n):
        stuff[n-i-1]=sregint.pop()

# Set key
key = [sregint(0), sregint(-1)]
mess = [sregint(-1), sregint(1)]
ciph = [sregint() for _ in range(2)]

print_ln("AES-128")
push_data(key,2)
push_data(mess,2)
# Op
GC(AES_128)
pop_data(ciph,2)

print_ln("Key")
key[1].public_output()
key[0].public_output()
print_ln("Message")
mess[1].public_output()
mess[0].public_output()
print_ln("Ciphertext")
ciph[1].public_output()
ciph[0].public_output()
Note that the AES-128 operation takes four input registers (two for the key and two for the message), and produces
two output registers. Also note the bit ordering of the inputs and outputs. The correct output of AES-128 for the
above example is
406bab6335ce415f4f943dc8966682aa
In MAMBA we would hold this as the 64-bit integer 4614219293217783808; notice the bit order! To convert a
floating point value to this representation, and then print the value to the standard output device, we have the following
commands
r0=convert_to_float("3.125")
print_ieee_float(r0)
where here r0 will be a regint type. Note that if you perform arithmetic on the associated regint you will not get
the equivalent of double arithmetic. There is currently no way to perform arithmetic on 'clear' floating point values.
However, we can perform arithmetic on 'secret' floating point values using the Garbled Circuit engine4 . We first
need to convert the regint to an sregint, which is done using the standard type conversion operation. Then to
perform arithmetic we use the stack and the garbled circuits defined above.
Clearly, as the Garbled Circuit engine is stack based, one can be more efficient by transforming any expression into
Reverse Polish notation and then executing it in that form. The following code example, from the example
program Programs/GC_Float, illustrates how to use the operations.
FP_add=120
FP_mul=121
FP_div=122
FP_eq=123
FP_f2i=124
FP_i2f=125
FP_sqrt=126
r0=convert_to_float("3.125")
r1=convert_to_float("1.25")
r2=convert_to_float("-6.535353")
r3=convert_to_float("199.3231")
s0=sregint(r0)
s1=sregint(r1)
s2=sregint(r2)
s3=sregint(r3)
# First compute
# s4 = (s0+s1)*(s2+s3)
#
# To do this we convert to reverse polish notation
#
# s0 s1 + s2 s3 + *
#
# Then we compute this by executing it as a stack program...
#
sregint.push(s0)
sregint.push(s1)
GC(FP_add)
sregint.push(s2)
sregint.push(s3)
GC(FP_add)
GC(FP_mul)
conversion operation above. One presumes an application could do this themselves as it is not that hard programmatically.
r5=s5.reveal()
print_ieee_float(r5)
print_ln("\nThe last number should be -3.125\n")
# Now we divide s1 by s2
sregint.push(s1)
sregint.push(s2)
GC(FP_div)
s6=sregint.pop()
r6=s6.reveal()
print_ieee_float(r6)
print_ln("\nThe last number should be 1.25/-6.535353 = -0.191267403612322\n")
a=sa.reveal()
b=sb.reveal()
f=sf.reveal()
print_ln("Conversions...")
print_int(a); print_ln("")
print_ieee_float(f); print_ln("")
print_int(b); print_ln("")
Note how negation can be obtained by a bit flip in the sign position.
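These representation claims are easy to check outside SCALE with ordinary Python (this snippet is plain Python,
not MAMBA), using the standard struct module:
import struct

bits = struct.unpack("<Q", struct.pack("<d", 3.125))[0]
print(bits)                                             # 4614219293217783808

neg = bits ^ (1 << 63)                                  # flip the IEEE sign bit
print(struct.unpack("<d", struct.pack("<Q", neg))[0])   # -3.125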
Shamir/Replicated Sharings The base triples are produced using Maurer's simple protocol [Mau06] using the
underlying replicated sharing method. Thus this uses a PRSS to generate a and b, and then generates c via Schur
multiplication followed by resharing. The triples are then verified to be correct using Protocol 3.1 from [ABF+ 17].
This is all done in a single offline thread.
Note: This entire method (including the online phase) requires a sufficiently small Replicated representation of the
base access structure. Thus for Shamir sharings for which one cannot implement a PRSS (as the size of the associated
Replicated scheme is too large) we default to the HSS based method for Full Threshold sharings.
As the underlying access structure is Q2 the shares are automatically self-authenticating (see [SW19]), so one can run
the MPC protocol from [SW19] to evaluate the circuit. Since this protocol is not constant round we need to 'massage'
the Bristol Fashion circuit into its extended format, which is AND-depth oriented and includes the MAND gates.
This results in a delay when loading a circuit into SCALE for the first time. The protocol for multiplication within
the circuit makes use of the reduced communication methodology of [KRSW18], with authentication performed by
hashing the opened sharings and then comparing the hash values at appropriate moments.
Full Threshold/Q2 MSP Sharings Authenticated bits (called aBits) and authenticated triples (called aANDs) are
generated with BDOZ style MACs using an OT based pre-processing. The underlying OT protocol is from [CO15].
The base random OTs are converted into a large number of correlated OTs (COTs), for which we use [FKOS15][Full Version,
Figure 19] and [KOS15][Full Version, Figure 7]. These correlated OTs are then converted into random sharings of
authenticated bits (so called aShares/aBits); for this step we use [HSS17][Full Version, Figure 16]. Finally these
aBits are converted into aANDs using [WRK17][Full Version, Figures 16, 8 and 18 in order]. The hash function in
the protocol to generate HaANDs from [WRK17] is implemented using the CCR-secure MMO construction from
[GKWY19]. Once the aBits and the aANDs are produced (in the offline threads) the protocol to process a circuit
utilizes the garbled-circuit BMR approach of [HSS17].
11 User Defined Local Functions
As you write complex applications you will soon see the need to execute local operations on each player which are
rather complex. For example this might be formatting data, or performing some local arithmetic which does not need
to be done securely. Prior to v1.5 there were two ways of doing this:
1. Write the local function in MAMBA or SCALE byte-codes directly. This is often a pain from a program-
ming point of view, and also produces highly inefficient code.
2. Use the calling application to execute the local operations, and then use the I/O class to interact between the two.
This requires people to have calling applications (which any deployed application will have, but which a quick
experimental setup is unlikely to bother with). It also requires expensive interaction between the SCALE
engine and the external application via the I/O class.
Having implemented the user defined circuits for Garbling we decided to also, in a similar manner, add a third method
of implementing local functions; namely via user-defined C++-code. To see this in operation we suggest looking at
the directory src/Local/, and looking at the files there. Here we implement some basic linear algebra routines.
Look at those files whilst reading this explanation.
Where the variable instr is the instruction number, which is akin to the earlier circuit number for garbled circuits.
We reserve all numbers less than 65536 for use by the developers, leaving you to define numbers greater than 65536.
Once again, if you have a good function which might be useful to others please let us know.
The local function is registered with an instruction number in the system by adding the function pointer to the
functions map in the file src/Local/Local_Functions.cpp, much like user defined Garbled Circuits
were added into the system earlier.
Each local function obtains its arguments by popping any required data off the stacks; the function's outputs are then
placed on the stacks in a similar manner. See the BLAS.cpp example linear algebra routines for some examples.
Number Function
120 IEEE floating point (double) addition
121 IEEE floating point (double) multiplication
122 IEEE floating point (double) division
123 IEEE floating point (double) equality
124 IEEE floating point (double) to sregint
125 sregint to IEEE floating point (double)
126 IEEE floating point (double) sqrt
127 IEEE floating point (double) lt
128 IEEE floating point (double) floor
129 IEEE floating point (double) ceil
200 IEEE floating point (double) acos
201 IEEE floating point (double) asin
202 IEEE floating point (double) atan
203 IEEE floating point (double) cos
204 IEEE floating point (double) cosh
205 IEEE floating point (double) sin
206 IEEE floating point (double) sinh
207 IEEE floating point (double) tanh
208 IEEE floating point (double) exp
209 IEEE floating point (double) log
210 IEEE floating point (double) log10
211 IEEE floating point (double) fabs
def push_Cint_matrix(A,n,m):
    regint.push(regint(n))
    regint.push(regint(m))
    for i in range(n):
        for j in range(m):
            cint.push(A[i][j])

def push_Sint_matrix(A,n,m):
    regint.push(regint(n))
    regint.push(regint(m))
    for i in range(n):
        for j in range(m):
            sint.push(A[i][j])

def pop_Cint_matrix(A,n,m):
    mm=regint.pop()
    nn=regint.pop()
    if_then(nn!=n or m!=mm)
    print_ln("Something wrong")
    print_ln("%s %s",nn,mm)
    end_if()
    for i in range(n-1,-1,-1):
        for j in range(m-1,-1,-1):
            A[i][j]=cint.pop()

def pop_Sint_matrix(A,n,m):
    mm=regint.pop()
    nn=regint.pop()
    if_then(nn!=n or m!=mm)
    print_ln("Something wrong")
    print_ln("%s %s",nn,mm)
    end_if()
    for i in range(n-1,-1,-1):
        for j in range(m-1,-1,-1):
            A[i][j]=sint.pop()
n=2
l=3
m=2

Cp_A=cint.Matrix(n,l)
Cp_B=cint.Matrix(l,m)
Cp_out=cint.Matrix(n,m)
CpGE_out=cint.Matrix(n,l)

Sp_A=sint.Matrix(n,l)
Sp_B=sint.Matrix(l,m)
Sp_out=sint.Matrix(n,m)

cnt=1
for i in range(n):
    for j in range(l):
        Cp_A[i][j]=cint(cnt)
        Sp_A[i][j]=sint(cnt)
        cnt=cnt+1

for i in range(l):
    for j in range(m):
        Cp_B[i][j]=cint(cnt)
        Sp_B[i][j]=sint(cnt)
        cnt=cnt+1
push_Cint_matrix(Cp_A,n,l)
push_Cint_matrix(Cp_B,l,m)
LF(0)
pop_Cint_matrix(Cp_out,n,m)
push_Sint_matrix(Sp_A,n,l)
push_Cint_matrix(Cp_B,l,m)
LF(1)
pop_Sint_matrix(Sp_out,n,m)
push_Cint_matrix(Cp_A,n,l)
push_Sint_matrix(Sp_B,l,m)
LF(2)
pop_Sint_matrix(Sp_out,n,m)
push_Cint_matrix(Cp_A,n,l)
LF(3)
pop_Cint_matrix(CpGE_out,n,l)
12 FHE Security
In this chapter we detail how FHE parameters are selected. The first part deals with how the basic “sizes” are derived,
and the second part gives the mathematical justification (i.e. noise analysis) for the equations used in the code for
setting up parameters. Before proceeding it is perhaps worth detailing how different security parameters are used
within the code.
• sacrifice_sec (default 80): This defines how many sacrifices we do per triple; to get the precise number you
compute ⌈sacrifice_sec / log_2 p⌉. Thus this statistical security parameter is effectively log_2 p unless p is very
small.
• macs_sec (default 80): For the full threshold case this defines how many MACs are held per data item; again,
to get the precise number you compute ⌈macs_sec / log_2 p⌉. Thus, again, this statistical security parameter is
effectively log_2 p unless p is very small.
• Sound_sec (default 128): This defines the soundness error of the ZKPoKs below; the value 2^{−Sound_sec} defines
the probability that an adversary can cheat in a single ZKPoK.
• ZK_sec (default 80): This defines the statistical distance between the coefficients of the ring elements in an
honest ZKPoK transcript and one produced by a simulation.
• DD_sec (default 80): This defines the statistical distance between the coefficients of the ring elements in the
distribution produced in the distributed decryption protocol below and the uniform distribution.
• comp_sec (default 128): This defines the computational security parameter for the FHE scheme.
• epsilon (default 55): This defines the noise bounds for the FHE scheme below in the following sense. A FHE ciphertext
in the protocol is guaranteed to decrypt correctly, even under adversarial inputs assuming the ZKPoKs are
applied correctly, with probability 1 − φ(m) · 2^{−epsilon}. In fact this is an under-estimate of the probability. From
epsilon we define e_i such that erfc(e_i)^i ≈ 2^{−epsilon} and then we set c_i = e_i^i.
• NewHopeB (default 1): This defines how Gaussians are selected in the FHE system for Full Threshold. We use
the NewHope approximation of Σ (b_i − b'_i), where b_i, b'_i ∈ {0, 1}, with the sum being over NewHopeB values of
i. This gives an approximation to a discrete Gaussian with standard deviation σ = √(NewHopeB/2) (a small
sketch of this sampler is given after this list).
• HwtSK (default 64): The Hamming weight of the secret key. If this is zero then the discrete Gaussian is used for
the secret key.
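The following is a small plain-Python sketch (not part of SCALE) of the binomial sampler described in the NewHopeB
bullet above; it is only meant to illustrate the distribution being approximated.
import random

def newhope_sample(NewHopeB):
    # sum of NewHopeB differences of independent random bits; this has mean zero
    # and variance NewHopeB/2, i.e. standard deviation sqrt(NewHopeB/2)
    return sum(random.getrandbits(1) - random.getrandbits(1)
               for _ in range(NewHopeB))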
https://bitbucket.org/malb/lwe-estimator
For various values of n and (symmetric equivalent) security levels comp_sec, we then find the maximum secure
value of q. This is done by running the following sage code
load("estimator.py")
import sys
for n in range(10,17):
N = 2ˆn
#for B in [1,2,4,8,16,20]:
for B in [1]:
sigma=sqrt(B/2.0)
ans=[1,1,1]
cnt=0
for sec in [80,128,256]:
bot=0
top=40
if N>5000:
top=100
if N>20000:
top=256
if N>40000:
top=900
repeat=true
while repeat:
repeat=false
q=2ˆtop
costs = estimate_lwe(N, RealField(512)(sigma*sqrt(2*pi)/q), q, \
reduction_cost_model=BKZ.sieve, skip=["arora-gb", \
"bkw", "dec", "mitm"])
if not any([cost.values()[0]<2ˆsec for cost in costs.values()]):
bot=top
top=2*top
repeat=true
while top <> bot:
mid=round((top+bot)/2-.5)
if (mid==bot):
break
sys.stdout.flush()
q = 2ˆmid
costs = estimate_lwe(N, RealField(512)(sigma*sqrt(2*pi)/q), q, \
reduction_cost_model=BKZ.sieve, skip=["arora-gb", \
"bkw", "dec", "mitm"])
if any([cost.values()[0]<2ˆsec for cost in costs.values()]):
top=mid
else:
bot=mid
sys.stdout.flush()
ans[cnt]=bot
cnt=cnt+1
print N,"&",B,"&",sigma,"&",ans[0],"&",ans[1],"&",ans[2],"\\\\"
sys.stdout.flush()
When run via
sage < SCALE-Est.py > Res.txt
this will produce lines of the form
1024 & 1 & 0.707106781186548 & 40 & 25 & 12 \\
This says that for ring dimension 1024, with NewHopeB equal to one, and so σ = 0.707, at the 80-bit security level
any q < 2^40 will be "secure" by the above definition of secure. Note that producing the table for NewHopeB equal to
one produces values which remain secure when a higher value of NewHopeB is selected.
We did this in Oct 2019 and obtained the following table of values, giving maximum values of q in the form of 2^x
for the value x from the following table.
N NewHopeB σ comp_sec=80 comp_sec=128 comp_sec=256
1024 1 0.707106781186548 40 25 12
2048 1 0.707106781186548 82 52 26
4096 1 0.707106781186548 167 106 56
8192 1 0.707106781186548 340 215 115
16384 1 0.707106781186548 686 436 235
32768 1 0.707106781186548 1392 879 473
65536 1 0.707106781186548 2830 1778 953
Any updates to this table need to be duplicated in the file FHE_Params.cpp in the code base.
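For convenience, the same data can be kept in a small plain-Python structure (a sketch only; FHE_Params.cpp
remains the authoritative copy), mapping the ring dimension N to the maximum secure log2(q) at each comp_sec
level for NewHopeB = 1:
MAX_LOG2_Q = {
    1024:  {80: 40,   128: 25,   256: 12},
    2048:  {80: 82,   128: 52,   256: 26},
    4096:  {80: 167,  128: 106,  256: 56},
    8192:  {80: 340,  128: 215,  256: 115},
    16384: {80: 686,  128: 436,  256: 235},
    32768: {80: 1392, 128: 879,  256: 473},
    65536: {80: 2830, 128: 1778, 256: 953},
}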
where κ(a) : R → C^{φ(m)} is the canonical embedding. The three key relationships are that

    ‖a‖_∞ ≤ c_m · ‖a‖_∞^{can},   ‖a‖_∞^{can} ≤ ‖a‖_1,   and   ‖a‖_1 ≤ φ(m) · ‖a‖_∞,

for some constant c_m depending on m. Since in our protocol we select m to be a power of two, we have c_m = 1. In
particular (which will be important later in measuring the noise of potentially dishonestly generated ciphertexts) we
have

    ‖a‖_∞^{can} ≤ φ(m) · ‖a‖_∞.

We also define the canonical embedding norm reduced modulo q of an element a ∈ R as the smallest canonical
embedding norm of any a′ which is congruent to a modulo q. We denote it as

    |a|_q^{can} = min{ ‖a′‖_∞^{can} : a′ ∈ R, a′ ≡ a (mod q) }.

We sometimes also denote the polynomial where the minimum is obtained by [a]_q^{can}, and call it the canonical reduction
of a modulo q.
Following [GHS12][Appendix A.5] we examine the variances of the different distributions utilized in our protocol.
• HWT (h, N ): This generates a vector of length N with elements chosen at random from {−1, 0, 1} subject to
the condition that the number of non-zero elements is equal to h.
• ZO(0.5, N): This generates a vector of length N with elements chosen from {−1, 0, 1} such that the probability
of each coefficient is p_{−1} = 1/4, p_0 = 1/2 and p_1 = 1/4.
• DG(σ², N): This generates a vector of length N with elements chosen according to the NewHope approxima-
tion to the discrete Gaussian distribution with variance σ².
• RC(0.5, σ², N): This generates a triple of elements (v, e_0, e_1) where v is sampled from ZO(0.5, N) and e_0
and e_1 are sampled from DG(σ², N).
• U(q, N): This generates a vector of length N with elements generated uniformly modulo q.
Let ζ_m denote any complex primitive m-th root of unity. Sampling a ∈ R from HWT(h, φ(m)) and looking at a(ζ_m)
produces a random variable with variance h; when sampled from ZO(0.5, φ(m)) we obtain variance φ(m)/2, when
sampled from DG(σ², φ(m)) we obtain variance σ² · φ(m), and when sampled from U(q, φ(m)) we obtain variance
q² · φ(m)/12. In what follows we let V_s denote √HwtSK in the case when HwtSK is positive, and σ · √φ(m)
otherwise. By the law of large numbers we can use c_1 · √V, where V is the above variance, as a high probability
bound on the size of a(ζ_m) (the probability is 1 − 2^{−epsilon}), and this provides the same bound on the canonical embedding
norm of a with probability 1 − φ(m) · 2^{−epsilon}.
If we take a product of t such elements with variances V_1, V_2, . . . , V_t then we use c_t · √(V_1 · V_2 · · · V_t) as the resulting
high probability bound. In our implementation we approximate DG(σ², n) using the above binomial method from
the NewHope paper; this means any vector sampled from DG(σ², n) will have infinity norm bounded by NewHopeB.
p1 ≡ 1 (mod p),
p0 − 1 ≡ p1 − 1 ≡ p − 1 ≡ 0 (mod φ(m)).
where e_sk ← DG(σ², φ(m)) and the switching key data (a_{s,s²}, b_{s,s²}) is of the form
12.3.2 Encryption:
To encrypt an element m ∈ R, we choose v, e_0, e_1 ← RC(0.5, σ², n), i.e.
However, below (using the TopGear protocol) we will only be able to guarantee that the m, v, e_0 and e_1 values of a sum
of n fresh ciphertexts (one from each party) are selected subject to

    ‖2·v‖_∞ ≤ 2^{ZK_sec+3} · n,   ‖2·e_0‖_∞, ‖2·e_1‖_∞ ≤ NewHopeB · 2^{ZK_sec+2} · n   and   ‖2·m‖_∞ ≤ 2^{ZK_sec+1} · n · p,

where ZK_sec is our soundness parameter for the zero-knowledge proofs and U is selected so that (2 · φ(m))^U >
2^{Sound_sec}. Thus in TopGear we double the generated ciphertext (which is the sum of the input players' ciphertexts) to
obtain a ciphertext (c_0, c_1) which, even in the case of dishonest players, has a noise bound in the infinity norm in the
canonical embedding of
    ‖c_0 − s·c_1‖_∞^{can} ≤ Σ_{i=1}^{n} ( ‖2·m_i‖_∞^{can} + p · ( ‖e_sk‖_∞^{can} · ‖2·v_i‖_∞^{can} + ‖2·e_{0,i}‖_∞^{can} + ‖s‖_∞^{can} · ‖2·e_{1,i}‖_∞^{can} ) )
                          ≤ 2 · φ(m) · 2^{ZK_sec+1} · n · p
                            + p · c_1 · σ · φ(m)^{3/2} · 2 · 2^{ZK_sec+2} · n
                            + φ(m) · 2 · 2^{ZK_sec+2} · n · NewHopeB
                            + c_1 · V_s · φ(m) · 2 · 2^{ZK_sec+2} · n · NewHopeB
                          = φ(m) · 2^{ZK_sec+2} · n · p · ( 41/2 + c_1 · σ · φ(m)^{1/2} + NewHopeB · c_1 · V_s )
                          = B_clean^{dishonest}.
Again this is a probabilistic bound (assuming validly distributed key generation), but assumes the worst case ciphertext
bounds.
The value B_scale is an upper bound on the quantity ‖τ_0 + τ_1·s‖_∞^{can}, where κ(τ_i) is drawn from a distribution which is
close to a complex Gaussian with variance φ(m) · p²/12. We can therefore (with high probability) take the upper
bound to be

    B_scale = c_1 · p · √(φ(m)/12) + c_2 · p · √(φ(m)/12) · V_s,
Figure 3: The protocol Reshare − 1.
Reshare Version 1: This is described in Figure 3. The value Bdec in the protocol is an upper bound on the noise in
the canonical embedding ν associated with a ciphertext we will decrypt in our protocols. To ensure valid distributed
decryption we require
    2 · (1 + n · 2^{DD_sec}) · B_dec < q_0.
Given a value of B_dec, we will therefore obtain a lower bound on p_0 from the above inequality. The addition of
a random term with infinity norm bounded by 2^{DD_sec} · B_dec / p in the distributed decryption procedure ensures that
the individual coefficients of the sum t_1 + · · · + t_n are statistically indistinguishable from random, with probability
2^{−DD_sec}. This does not imply that the adversary has this probability of distinguishing the simulated execution in
[DPSZ12] from the real execution; since each run consists of the exchange of φ(m) coefficients, and the protocol is
executed many times over the execution of the whole protocol. We however feel that concentrating solely on
the statistical indistinguishability of the coefficients is valid in a practical context.
Reshare Version 2: This is given in Figure 4. This protocol is simpler as it does not require the resharing to a new
ciphertext. So in particular players do not need to provide encryptions of their fi values, and hence there is no need to
execute the ZKPoK needed in the previous protocol.
Figure 4: The protocol Reshare − 2.
12.3.6 SwitchKey(d_0, d_1, d_2):
In order to estimate the size of the output noise term in the canonical embedding we need first to estimate the size of
the term (again probabilistically assuming validly generated public keys)

    ‖p · d_2 · e_{s,s²}‖_∞^{can}.

Following [GHS12] we assume heuristically that d_2 behaves like a uniform polynomial with coefficients drawn from
[−q_0/2, . . . , q_0/2] (remember we apply SwitchKey at level zero). So we expect

    ‖p · d_2 · e_{s,s²}‖_∞^{can} ≤ p · c_2 · √( q_0²/12 · σ² · φ(m)² )
                               = p · c_2 · q_0 · σ · φ(m) / √12
                               = B_KS · q_0.

Thus

    B_KS = p · c_2 · σ · φ(m) / √12.

Then if the input to SwitchKey has noise bounded by ν then the output noise value in the canonical embedding will
be bounded by

    ν + (B_KS · p_0) / p_1 + B_scale.
12.3.7 Mult(c, c′):
Combining all of the above, if we take two ciphertexts at level one with input noise in the canonical embedding
bounded by ν and ν′, the output noise level from multiplication will be bounded by

    ν″ = ( ν/p_1 + B_scale ) · ( ν′/p_1 + B_scale ) + (B_KS · p_0) / p_1 + B_scale.
12.3.8 Application to the Offline Phase:
In all of our protocols the most complex circuit we will be evaluating is the following one: We first add n ciphertexts
together and perform a multiplication, giving a ciphertext with respect to modulus p_0 with noise in the canonical
embedding bounded by

    U_1 = ( B_clean^{dishonest} / p_1 + B_scale )² + (B_KS · p_0) / p_1 + B_scale.

Recall that B_clean^{dishonest} is the bound for a sum of n ciphertexts, one from each party. We then add on another n ciphertexts,
which are added at level one and then reduced to level zero. We therefore obtain a final upper bound on the noise in
the canonical embedding for our adversarially generated ciphertexts of

    U_2 = U_1 + B_clean^{dishonest} / p_1 + B_scale.

To ensure valid (distributed) decryption, we require

    2 · (1 + n · 2^{DD_sec}) · U_2 < p_0,

i.e. we take B_dec = U_2 in our distributed decryption protocol. Note that here we take the worst case bound for
the ciphertext noise, but probabilistic analysis everywhere else, since the key generation is assumed to be honestly
performed.
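The following plain-Python sketch (not part of the code base) simply restates this bookkeeping, assuming the
quantities B_clean^{dishonest}, B_scale and B_KS have already been computed for a candidate parameter set; it returns
the value that p_0 must exceed.
def required_p0(B_clean_dishonest, B_scale, B_KS, p0, p1, n, DD_sec):
    # noise after adding n ciphertexts, multiplying, and switching/scaling down
    U1 = (B_clean_dishonest / p1 + B_scale) ** 2 + B_KS * p0 / p1 + B_scale
    # add a further n ciphertexts, reduced from level one to level zero
    U2 = U1 + B_clean_dishonest / p1 + B_scale
    # valid distributed decryption needs 2*(1 + n*2^DD_sec)*U2 < p0
    return 2 * (1 + n * 2 ** DD_sec) * U2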
This lower bound on p_0 ensures valid decryption in our offline phase. To obtain a valid and secure set of parameters,
we search over all tuples (N, p_0, p_1) satisfying our various constraints for a given "size" of plaintext modulus p,
including the upper bound on q_1 = p_0 · p_1 obtained from Albrecht's tool via the table above.
The above circuit is needed for obtaining the sharings of the c value, where we need to obtain a fresh encryption
of c in order to be able to obtain the MAC values. To obtain the MAC values of the a and b values we need
a simpler circuit, and we use the Distributed Decryption protocol from [KPR18][Appendix A].
Protocol ΠZKPoK
The protocol is parametrized by integer parameters U, V and flag ∈ {Diag, ⊥} as well as pk and further parameters of the
encryption scheme. Define ρ1 = 1 and ρ2 , ρ3 = NewHopeB.
3. Compute the ciphertexts by encrypting each row separately, thus obtaining C ← Enc(m, R; pk) ∈ R_{q_1}^{U×2}.
4. Output (x = (C), w = (m, R)).
in {0, 1}V ×U .
Response Phase: Response
1. Each Pi computes zi ← yi + W · mi and Ti ← Si + W · Ri .
2. Party Pi sets respi ← (zi , Ti ), and broadcasts respi .
Verification Phase: Verify
1. Each party Pi computes Di ← Enc(zi , Ti ; pk).
2. The parties compute A ← Σ_{i=1}^{n} A_i, C ← Σ_{i=1}^{n} C_i, D ← Σ_{i=1}^{n} D_i, T ← Σ_{i=1}^{n} T_i and z ← Σ_{i=1}^{n} z_i.
3. The parties check whether D = A + W · C, and then whether the following inequalities hold, for l ∈ [V ],
4. If flag = Diag then the proof is rejected if z(l) is not a constant polynomial (i.e. a “diagonal” plaintext element).
5. If all checks pass, the parties accept, otherwise they reject.
Figure 5: Protocol for global proof of knowledge of a set of ciphertexts
13.2 Compilation
To compile the code you will first have to compile the main framework following the instructions in Section 3.1. Then
do
cd KeyGen
make
13.3.1 Options
In addition to prime_size, you can define the number of threads used to generate the pre-processing data. It is
important to note that we will need pre-processing data for two different finite fields. Therefore, you can either
specify one number, in which case the program will start this number of threads for both finite fields.
./PlayerKeyGen.x 0 prime_size nb_Threads
./PlayerKeyGen.x 1 prime_size nb_Threads
Or you can detail how many threads you want to use for each finite field.
./PlayerKeyGen.x 0 prime_size nb_Threads1 nb_Threads2
./PlayerKeyGen.x 1 prime_size nb_Threads1 nb_Threads2
Because the first finite field is bigger than the second one, having nb_Threads1 > nb_Threads2 may result in a
faster runtime.
14 Advanced Protocols
This section details the protocols needed to perform complex functionalities on LSSS style secret shared values in
MAMBA. As well as the documentation here, further details can be found at
$(HOME)/Documentation/Compiler_Documentation/index.html
under the heading Files.
The protocols break the “arithmetic circuit” model of computation, as they make heavy use of pre-processed data
and the ability to Open shared values. In particular we assume three lists of pre-processed data:
An entry on the MultList is of the form (hai, hbi, hci) where c = a · b (mod p), an entry on the SquaresList is of
the form (hai, hbi) where b = a2 (mod p), whilst an entry on the BitsList is of the form hbi where b ∈ {0, 1}. We
also assume a function Random() which can generate a random secret sharing (this can be implemented by just taking
the first element from a multiplication triple). We add and multiply secret shared elements in what follows using the
notation
ha + bi ← hai + hbi, ha · bi ← hai · hbi.
The protocols in this chapter allow us to do more advanced operations, without resorting to using full blown
arithmetic circuits. We describe here what we have implemented as a form of documentation, both for us and for
others. Many of the protocol ideas can be found in the five documents
• Unconditionally Secure Constant-Rounds Multi-party Computation for Equality, Comparison, Bits and Expo-
nentiation. TCC 2006, [DFK+ 06].
• Improved Primitives for Secure Multiparty Integer Computation. SCN 2010, [CdH10].
• D9.2 of the EU project secureSCM, [sec].
• Secure Computation with Fixed-Point Numbers FC 2010 [CS10].
• Secure Computation on Floating Point Numbers NDSS 2013 [ABZS13].
Many of the protocols operating on integers make use of a statistical security parameter κ. If the integer being operated
on is k bits long, then we often require (k + κ) < log2 p. For ease of implementation of the protocols we recommend
k is always a power of two, and we assume this in the write up below. If this is not the case, obvious tweaks can be
made to the protocols.
Due to experience with the SPDZ system we prefer logarithmic round protocols over constant round protocols, as it
appears that in practice the logarithmic round protocols outperform the constant round ones. The MAMBA compiler
can execute non-constant round protocols by flicking a compile time switch, but here we only document the logarithmic
round protocols.
14.1 Basic Protocols
14.1.1 Inv(hxi):
This produces a share hzi of 1/x (mod p), with an abort if x = 0.
1. hai ← Random().
2. hyi ← hai · hxi.
3. y ← Open(hyi).
4. If y = 0 then abort.
5. t ← 1/y (mod p).
6. hzi ← t · hai.
7. Return hzi.
MAMBA Example: Obtaining the inverse of a sint or a cint can be done as follows:
d = sint(5)
d_inv = AdvInteger.Inv(d)
print_ln("inverse is correct if 1: %s", (d*d_inv).reveal())
(a) ⟨t⟩ ← ⟨b_{i−1}⟩ · ⟨a_i⟩.
(b) ⟨d_i⟩ ← ⟨t⟩ · ⟨b_i^{−1}⟩.
(c) d_i ← Open(⟨d_i⟩).
3. For (i_0, i_1) ∈ T
(a) d_{i_0,i_1} ← ∏_{i=i_0}^{i_1} d_i.
(b) ⟨a_{i_0,i_1}⟩ ← d_{i_0,i_1} · ⟨b_{i_0−1}^{−1}⟩ · ⟨b_{i_1}⟩.
Again, this function does not exist “as is” in the MAMBA language, as it is used in place within the python compiler.
Thus it is only here for documentation reasons.
14.2 Bit Oriented Operations
These operations refer exclusively to how to perform bit oriented operations on top of mod p data-types. It is not
related to the sbit type.
MAMBA Example: To obtain the or of two sint or cint numbers you can:
a= sint(1)
b= sint(0)
print_ln("or is correct if 1: %s", (a or b).reveal())
print_ln("or is correct if 1: %s", AdvInteger.or_op(a, b).reveal())
MAMBA Example: To obtain the xor of two sint or cint numbers you can:
a= sint(1)
b= sint(0)
print_ln("xor is correct if 1: %s", AdvInteger.xor_op(a, b).reveal())
MAMBA Example: Note that we basically want a construction capable of calling any function in an iterative
fashion, reducing computation time. In this sense a call to the function could be performed in the following way:
# addition
def addition(a, b):
    return a + b

ar=[sint(2)]*16
# (k is extracted from ar directly in the implementation)
print_ln("KOpL is correct if 32: %s", (AdvInteger.KOpL(addition,ar)).reveal())
This runs in logarithmic rounds.
14.2.4 PreOp(◦, ⟨a_1⟩, . . . , ⟨a_k⟩, k):
This computes the prefix operator ⟨p_j⟩ = ◦_{i=1}^{j} ⟨a_i⟩, for 1 ≤ j ≤ k.
1. For i ∈ [1, . . . , log_2 k] do
   (a) For j ∈ [1, . . . , k/2^i] do
       i. y ← 2^{i−1} + j · 2^i.
       ii. For z ∈ [1, . . . , 2^{i−1}] do
           A. ⟨a_{y+z}⟩ ← ⟨a_y⟩ ◦ ⟨a_{y+z}⟩.
2. Return (⟨a_1⟩, . . . , ⟨a_k⟩).
MAMBA Example: Similarly, we want a construction capable of calling any function in an iterative
fashion, reducing computation time. In this case, however, we return the full list of intermediate values. The function
call could be performed in the following way:
def addition(a, b):
    return a + b

e = sint(2)
ar = [e]*16
# k is extracted from ar.
print_ln("PreOpL is correct if 32: %s", (AdvInteger.PreOpL(addition_triple,ar))[15].reveal())
print_ln("PreOpN is correct if 32: %s", (AdvInteger.PreOpN(addition,ar))[15].reveal())
Note that both methods are implementations of the functionality with slightly different communication and round
complexity, with the PreOpL function corresponding to the pseudo-code above.
14.2.5 Sum-Bits(⟨x⟩_B):
This outputs a shared integer x in the range [0, . . . , 2^k) with 2^k < p, given the input k bits ⟨x⟩_B making up its binary
representation.
1. ⟨x⟩ ← Σ_{i=0}^{k−1} 2^i · ⟨x_i⟩.
2. Output ⟨x⟩.
To ease notation in what follows we will write ⟨x⟩_B = {⟨x_i⟩}_{i=0}^{k−1}.
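MAMBA Example: A minimal sketch of Sum-Bits written directly in MAMBA, recombining shared bits (least
significant first) into the shared integer they represent:
x_bits = [sint(b) for b in [1, 0, 1, 1]]   # LSB first, i.e. the integer 13
x = sum(2**i * x_bits[i] for i in range(len(x_bits)))
print_ln("Sum-Bits gives 13 if correct: %s", x.reveal())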
14.2.6 PRandInt(k):
This generates a random secret integer r in the range [0, . . . , 2k − 1]. In the pseudo-code below we let [BitsList] ? k
denote taking k bits from BitsList.
1. hriB ← [BitsList] ? k.
2. hri ← Sum-Bits(hriB ).
3. Return hri.
MAMBA Example: A random secret shared integer of k bits can be generated as follows:
# k is a parameter
k = 5
x = sint()
AdvInteger.PRandInt(x, k)
(a) hdi iB ← (XOR(hai i, hbi i), hai i · hbi i). [Note, hdi iB is a set of two shared bits, one being the XOR of ai
and bi , whilst the other the AND].
2. hdiB ← CarryOutAux(hdk−1 iB , . . . , hd0 iB , k).
3. (hpi, hgi) ← hdiB .
4. Return hgi.
This is computed using arithmetic operations (i.e. where the values are held as bits modulo p) as (p, g) = (p2 , g2 ) ◦
(p1 , g1 ) via
p = p1 · p2 ,
g = g2 + p2 · g1 .
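For illustration, a small MAMBA-style sketch of this composition on shared bits (a sketch only; the real
implementation lives inside CarryOutAux):
def carry_compose(d2, d1):
    # d1 = (p1, g1), d2 = (p2, g2), each entry a shared bit held modulo p
    p2, g2 = d2
    p1, g1 = d1
    return (p1 * p2, g2 + p2 * g1)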
Given this operation, the function CarryOutAux (which is just a specialisation of the protocol KOp above) is defined by
1. If k > 1 then
(a) For i ∈ [1, . . . , k/2] do
i. hui iB ← hd2·i iB ◦ hd2·i−1 iB .
(b) hdiB ← CarryOutAux(huk/2 iB , . . . , hu1 iB , k/2).
2. Else
(a) hdiB ← hd1 iB .
3. Return hdiB .
MAMBA Example: This method is intended as a subroutine for CarryOut and should not be used outside that
context. The code is invoked in the following way:
kappa = 16
x = [cint(i) for i in [1,0]]
res = sint()
AdvInteger.CarryOutAux(res, x)
14.2.10 BitAdd((hak−1 i, . . . , ha0 i), (hbk−1 i, . . . , hb0 i), k):
This function also makes use of the operator ◦. The inputs are shared bits. The case where one set of inputs is in the
clear is obviously more simple, and we do not detail this here.
1. For i ∈ [0, . . . , k − 1] do
(a) hdi iB ← (XOR(hai i, hbi i), hai i · hbi i).
2. hck−1 , tk−1 i, . . . , hc0 , t0 i ← PreOp(◦, hdk−1 iB , . . . , hd0 iB , k).
3. hs0 i ← XOR(ha0 i, ha1 i).
4. For i ∈ [1, . . . , k − 1] do
(a) hsi i ← hai i + hbi i + hci−1 i − 2 · hci i.
5. hsk i ← hck−1 i.
6. Return (hsk i, . . . , hs0 i).
There is also a variant (requiring fewer operations) which does an increment by one, i.e. x ← x + 1.
MAMBA Example: The addition of two numbers expressed in bits can be performed as follows:
a_bits = [sint(i) for i in [0,1,0,1,1]]
b_bits = [sint(i) for i in [0,1,0,1,1]]
# k is extracted from the array size
b = AdvInteger.BitAdd(a_bits, b_bits)
c = AdvInteger.BitIncrement(a_bits)
MAMBA Example: Comparing two secret shared numbers decomposed in bits can be achieved as follows:
x = [sint(i) for i in [0,1,1]]
y = [sint(i) for i in [1,0,1]]
z = sint()
kappa = 16
# k can be extracted from the array size
# in this case the bit that is the answer is contained in z
AdvInteger.BitLTFull(z, x, y)
MAMBA Example: Comparing an open register with a secret shared number decomposed in bits can be achieved
as follows:
x = cint(5)
y = [sint(i) for i in [1,0,1]]
z = sint()
kappa = 16
# k can be extracted from the array size
# in this case the bit that is the answer is contained in z
AdvInteger.BitLT(z, x, y, kappa)
14.2.13 BitDecFull(hai):
This produces the bit-decomposition of the shared value a with respect to the prime program.P. The method used
is the one from [NO07], and thus it does not depend on any statistical security gap. A more efficient variant, which
only gives statistical security guarantees and does not work for all a ∈ [0, . . . , p), is given next. When
program.P is larger than 64 bits the method from [DFK+ 06] is used; not because it is faster but purely because
SCALE is not so adept at doing the other method for large primes. This routine has a side-effect of writing to memory
locations in the sint memory in the first 0, . . . , log_2 p locations.
MAMBA Example: A secret shared value can be decomposed into bits as shown in the following snippet:
a = sint(23)
bits = AdvInteger.BitDecFull(a)
for i in range(program.P.bit_length()):
    print_str('%s', bits[i].reveal())
MAMBA Example: The statistically secure variant can be called as shown in the following snippet:
a = sint(23)
k = 5
m = 5
kappa = 20
# where b is bit array of type sint
b = AdvInteger.BitDec(a, k, m, kappa)
# you can also call it via (where it selects m=k)
b = a.bit_decompose(k, kappa)
14.3 Arithmetic with Signed Integers
In this section we define basic arithmetic on signed integers. We define Z_{⟨k⟩} as the set of integers {x ∈ Z : −2^{k−1} ≤
x ≤ 2^{k−1} − 1}, which we embed into F_p via the map x ↦ x (mod p). Most algorithms require a statistical security
gap of κ.
6. Return ha0 i.
MAMBA Example: The mod to a power of 2 of a secret shared integer register can be obtained as follows:
a_prime = sint(0) # a % 2^m
a = sint(100)
k = 16 # bit length of a
m = 2 # the power of two, i.e. we want ( a mod 2^m )
kappa = 8
signed = True # True/False
# the call itself (assuming the AdvInteger.Mod2m entry point)
AdvInteger.Mod2m(a_prime, a, k, m, kappa, signed)
14.3.3 Addition, Multiplication in Z_{⟨k⟩}
Addition and multiplication of two elements ⟨a⟩, ⟨b⟩ to obtain ⟨c⟩, where a, b, c ∈ Z_{⟨k⟩}, is then easy to define by
performing
1. ⟨d⟩ ← ⟨a⟩ ⊙ ⟨b⟩.
2. ⟨c⟩ ← Mod2m(⟨d⟩, k′, k), where k′ = k + 1 if ⊙ = + and k′ = 2·k if ⊙ = ·.
3. Return ⟨c⟩.
These functions are not directly callable from MAMBA; they are included here purely for documentation reasons.
MAMBA Example: You can obtain the value of two raised to a secret shared number as follows:
a = sint(23)
l = 32
kappa = 20
# y stores the result of 2^23
y = AdvInteger.Pow2(a, l, kappa)
MAMBA Example: You can transform a number into its unary form as follows:
a = sint(3)
l=5
kappa=20
b,c = AdvInteger.B2U(a, l, kappa)
6. Return hdi.
MAMBA Example: A secret shared fractional register can be approximately truncated as follows:
a = sint(23)
k = 5
m = 3
kappa = 20
# where b is a register of type sint
b=AdvInteger.TruncPr(a, k, m, kappa)
MAMBA Example: You truncate a number as follows, where a is a register of type sint,
a = sint(23)
k = 5
m = 3
kappa = 20
a= AdvInteger.Trunc(a, k, m, kappa, True)
When the final value is set to true the value a is assumed to lie in Z_{⟨k⟩}, but when false the value a is assumed to lie in
[0, . . . , 2^{k−1}).
There is also an exact version of Trunc called TruncRoundNearest which rounds the value a/2^m to the nearest
integer. This is called as
a= AdvInteger.TruncRoundNearest(a, k, m, kappa)
MAMBA Example: You determine whether a number is less than zero as follows:
a = sint(1)
b = sint()
k=80
kappa=40
# b stores the result of x < 0
AdvInteger.LTZ(b, a, k, kappa)
In which case the default value of κ = 40 is chosen (when using a 128-bit prime modulus); this default value can be
altered by using the command
program.security = 100
The default value of k = 64 is used in this setting, and this can be altered by executing
program.bit_length = 40
The requirement is that k + κ must be less than the bit length of the prime p.
4. For i ∈ [0, . . . , k − 1] do
(a) hdi i ← ci + hri i − 2 · ci · hri i.
5. hzi ← 1 − KOp(OR, hdk−1 i, . . . , hd0 i, k).
6. Return hzi.
MAMBA Example: The mod to a secret shared power of 2 of a secret shared integer register can be obtained as
follows:
a_prime = param_a_prime # a % 2^m
a = sint(2137)
k = 16
m = sint(2)
kappa = 8
signed = True # True/False, describes a
14.4 Arithmetic with Fixed Point Numbers
In this section we define basic arithmetic on fixed point numbers. We mainly follow the algorithms given in
• Secure Computation with Fixed-Point Numbers FC 2010 [CS10].
We define Q_{⟨k,f⟩} as the set of rational numbers {ã ∈ Q : ã = a · 2^{−f}, a ∈ Z_{⟨k⟩}}. We represent ã ∈ Q as the
integer ã · 2^f = a ∈ Z_{⟨k⟩}, which is then represented in F_p via the mapping used above. Thus ã ∈ Q is in the range
[−2^e, 2^e − 2^{−f}] where e = k − f. As we are working with fixed point numbers we assume that the parameters f and
k are public. For our following algorithms to work (in particular fixed point multiplication and division) we require
that f < k and 2·k + κ < log_2 p. By abuse of notation we write ⟨ã⟩ to mean ⟨a⟩, i.e. the secret sharing of the fixed
point number ã is actually the secret sharing of the integer representative a.
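MAMBA Example: A small check of this encoding (a sketch, using the sfix instance attribute v, which holds the
underlying integer representative, and set_precision to fix f and k):
sfix.set_precision(20, 40)
a = sfix(3.25)
# the representative is 3.25 * 2^f = 3407872 for f = 20
print_ln("representative of 3.25: %s", a.v.reveal())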
14.4.2 FxAbs(hai, k, f )
1. hsi ← LTZ(hai).
14.4.3 FxFloor(hai, k, f )
1. hsi ← Trunc(hai, k − f, f, κ).
2. Return hsi.
ub = FixedPt.floor_fx(sb)
14.4.4 FxNeg(hai, k, f )
1. Return −hai.
MAMBA Example: You can obtain the original value times −1 as follows:
b = -1.5
sb = sfix(b)
# k and f are extracted from b
# returns the value of signed b times -1.
nb = -sb
MAMBA Example: To obtain the addition of two fixed point secret shared values you can do as follows:
a = sfix(3.5)
b = sfix(1.5)
# k and f are extracted from b or a
#returns secret shared 5
a_plus_b = a+b
MAMBA Example: To obtain the multiplication of two fixed point secret shared values you can do as follows:
a = sfix(3.5)
b = sfix(1.5)
# k and f are extracted from b or a
#returns secret shared 5.25
a_mult_b = a*b
14.4.7 FxDiv(hai, b, k, f ):
We first give division for when a, b ∈ Qhk,f i and b is in the clear.
1. Compute x ∈ Qhk,f i such that x ≈ 1/b.
MAMBA Example: To divide two fixed point values (where only one is secret shared) you can do as follows:
a = sfix(3.5)
b = 1.5
# k and f are extracted from b
#returns secret shared 2.333333
a_div_b = a/b
3. hwi ← AppRcr(hbi, k, f ).
4. hxi ← α − hbi · hwi.
5. hyi ← hai · hwi.
6. hyi ← TruncPr(hyi, 2 · k, f ).
7. For i ∈ [1, . . . , θ − 1] do
(a) hyi ← hyi · (α + hxi).
(b) hxi ← hxi2 .
(c) hyi ← TruncPr(hyi, 2 · k, 2 · f ).
(d) hxi ← TruncPr(hxi, 2 · k, 2 · f ).
8. hyi ← hyi · (α + hxi).
9. hyi ← TruncPr(hyi, 2 · k, 2 · f ).
MAMBA Example: To obtain the division of two fixed point values that are secret shared, you can do as follows:
a = sfix(3.5)
b = sfix(1.5)
# k and f are extracted from b or a
#returns secret shared 2.333333
a_div_b = a/b
14.4.9 AppRcr(hbi, k, f ):
1. α ← 2.9142 · 2^k. Note that α is the integer representative of 2.9142 in Q_{⟨k,f⟩}.
6. Return ⟨w⟩.
This is not meant to be called from MAMBA; but if you insist, it is
a = sfix(3.5)
b = FixedPt.AppRcr(a, k, f, kappa)
14.4.10 Norm(⟨b⟩, k, f):
This returns the value c such that 2^{k−1} ≤ c < 2^k and v′ such that b·v′ = c; if 2^{m−1} ≤ |b| < 2^m then v′ = ±2^{k−m}.
1. ⟨s⟩ ← 1 − 2 · LTZ(⟨b⟩, k).
2. ⟨x⟩ ← ⟨s⟩ · ⟨b⟩.
3. ⟨x_{k−1}⟩, . . . , ⟨x_0⟩ ← BitDec(⟨x⟩, k, k).
4. ⟨y_{k−1}⟩, . . . , ⟨y_0⟩ ← PreOp(OR, ⟨x_{k−1}⟩, . . . , ⟨x_0⟩, k).
5. For i ∈ [0, . . . , k − 2] do
   (a) ⟨z_i⟩ ← ⟨y_i⟩ − ⟨y_{i+1}⟩.
6. ⟨z_{k−1}⟩ ← ⟨y_{k−1}⟩.
7. ⟨v⟩ ← Σ_{i=0}^{k−1} 2^{k−i−1} · ⟨z_i⟩.
8. ⟨c⟩ ← ⟨x⟩ · ⟨v⟩.
9. ⟨v′⟩ ← ⟨s⟩ · ⟨v⟩.
10. Return (⟨c⟩, ⟨v′⟩).
MAMBA Example: To obtain the norm of a secret shared fixed point register you can do as follows:
kappa = 40
b = sfix(1.5)
#returns the norm
c, v = FixedPt.Norm(b, b.k, b.f, kappa, True)
14.4.11 NormSQ(⟨b⟩, k):
As above, but now we assume b ≥ 0, and we also output shares of m and w such that w = 2^{m/2} if m is even and
2^{(m−1)/2} if m is odd. This algorithm is used in the Fixed Point sqrt routine later. Furthermore v = 2^{k−m}. Note that
we have introduced some adaptations from the original paper:
1. z ← MSB(⟨b⟩, k, f).
2. ⟨v⟩ ← Σ_{i=0}^{k−1} 2^{k−i−1} · ⟨z_i⟩.
3. ⟨c⟩ ← ⟨b⟩ · ⟨v⟩.
4. ⟨m⟩ ← Σ_{i=0}^{k−1} (i + 1) · ⟨z_i⟩.
5. For i ∈ [1, . . . , k/2) do
   (a) ⟨w_i⟩ = ⟨z_{2·i−1}⟩ + ⟨z_{2·i}⟩.
6. ⟨w⟩ ← Σ_{i=1}^{k/2−1} 2^i · ⟨w_i⟩.
7. Return ⟨c⟩, ⟨v⟩, ⟨m⟩, ⟨w⟩.
14.4.12 SimplifiedNormSQ(hbi, k) :
Same as above, but in this case we only return w, together with a {0, 1} bit signaling whether m is odd. This algorithm
is used in the Fixed Point sqrt routine later.
1. z ← MSB(⟨b⟩, k, f).
2. ⟨m⟩ ← Σ_{i=0}^{k−1} (i + 1) · ⟨z_i⟩.
3. For i ∈ [0, . . . , k − 1] do
   (a) If (i % 2 == 0):
       i. ⟨m_odd⟩ ← m_odd + ⟨z_i⟩.
4. For i ∈ [1, . . . , k/2) do
   (a) ⟨w_i⟩ = ⟨z_{2·i−1}⟩ + ⟨z_{2·i}⟩.
5. ⟨w⟩ ← Σ_{i=1}^{k/2−1} 2^i · ⟨w_i⟩.
6. Return ⟨m_odd⟩, ⟨w⟩.
14.4.13 MSB(hbi, k):
Returns an index array hzi of size k, which holds a 1 in the position of the most significant bit of hbi and h0i
otherwise. This function is used internally in NormSQ and SimplifiedNormSQ.
1. hsi ← 1 − 2 · LTZ(hbi, k).
2. hxi ← hsi · hbi.
3. hxk−1 i, . . . , hx0 i ← BitDec(hxi, k, k).
6. hzk−1 i ← hyk−1 i.
7. Return hzi.
14.5 Arithmetic with Floating Point Numbers
For floating point numbers we utilize the methods described in
• Secure Computation on Floating Point Numbers NDSS 2013 [ABZS13].
However we make explicit use of an error flag which we carry throughout a computation, as detailed in previous
sections. The processing of overflow and underflow detection is expensive, and thus we enable the user to turn this
off via means of a compile time flag fdflag by passing the option -f or --fdflag when using compile.py.
This can also be turned on/off within a program by assigning to the variable program.fdflag. The error flag is
still needed however to catch other forms of errors in computations (such as division by zero, taking square roots of
negative numbers etc). Thus setting fdflag equal to false does not necessarily result in err always equaling zero.
Floating point numbers are defined by two global, public integer parameters (ℓ, k) which define the size of the
mantissa and the exponent respectively. Each floating point number is represented as a five tuple (v, p, z, s, err), where
• v ∈ [2^{ℓ−1}, 2^ℓ) is an (ℓ + 1)-bit significand with its most significant bit always set to one.
• p ∈ Z_{⟨k⟩} is the signed exponent.
    u = (1 − 2·s) · (1 − z) · v · 2^p.
We adopt the conventions that when u = 0 we also have z = 1, v = 0 and p = 0, and when err ≠ 0 then the values of
v, p, z and s are meaningless.
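MAMBA Example: A small sketch inspecting this representation (assuming the sfloat instance attributes v, p, z,
s and err, as used in the conversion examples later in this chapter):
x = sfloat(-3.5)
# for a negative, non-zero value we expect s = 1 and z = 0
print_ln("v=%s p=%s z=%s s=%s err=%s",
         x.v.reveal(), x.p.reveal(), x.z.reveal(), x.s.reveal(), x.err.reveal())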
The standard arithmetic operations of addition, multiplication and comparison (plus others) are then implemented
in MAMBA for this datatype using operator overloading. The precise algorithms which are executed are detailed
below.
14.5.1 FlowDetect(hpi):
1. If fdflag then
(a) hsi ← −2 · (hpi < 0) + 1.
(b) herri ← GT(hpi · hsi, 2k−1 − 1, k + 1).
2. Return herri.
14.5.4 FLMult((hv1 i, hp1 i, hz1 i, hs1 i, herr1 i), (hv2 i, hp2 i, hz2 i, hs2 i, herr2 i)):
For floating point operations multiplication is much easier than addition, so we deal with this first.
1. hvi ← hv1 i · hv2 i.
14.5.5 FLAdd((hv1 i, hp1 i, hz1 i, hs1 i, herr1 i), (hv2 i, hp2 i, hz2 i, hs2 i, herr2 i)):
1. hai ← LT(hp1 i, hp2 i, k).
2. hbi ← EQ(hp1 i, hp2 i, k).
7. hb · ci ← hbi · hci.
8. hvmax i ← (ha · bi − hai − hb · ci) · (hv1 i − hv2 i) + hv1 i.
9. hvmin i ← (ha · bi − hai − hb · ci) · (hv2 i − hv1 i) + hv2 i.
19. hp0 i ← ℓ + 2 − Σ_{i=0}^{ℓ+1} hhi i.
20. h2^{p0} i ← 1 + Σ_{i=0}^{ℓ+1} 2^i · (1 − hhi i).
21. hvi ← Trunc(h2p0 i · hvi, ` + 2, 2).
22. hpi ← hpmax i − hp0 i + 1 − hdi.
23. hz1 · z2 i ← hz1 i · hz2 i.
24. hvi ← (1 − hz1 i − hz2 i + hz1 · z2 i) · hvi + hz1 i · hv2 i + hz2 i · hv1 i.
25. hzi ← EQZ(hvi, `).
26. hpi ← ((1 − hz1 i − hz2 i + hz1 · z2 i) · hpi + hz1 i · hp2 i + hz2 i · hp1 i) · (1 − hzi).
27. hsi ← (hai − ha · bi) · hs2 i + (1 − hai − hbi + ha · bi) · hs1 i + hb · ci · hs2 i + (hbi − hb · ci) · hs1 i.
28. hsi ← (1 − hz1 i − hz2 i + hz1 · z2 i) · hsi + (hz2 i − hz1 · z2 i) · hs1 i + (hz1 i − hz1 · z2 i) · hs2 i.
29. herri ← herr1 i + herr2 i + FlowDetect(hpi).
30. Return (hvi, hpi, hzi, hsi, herri).
14.5.7 FLDiv((hv1 i, hp1 i, hz1 i, hs1 i, herr1 i), (hv2 i, hp2 i, hz2 i, hs2 i, herr2 i)):
1. hvi ← SDiv(hv1 i, hv2 i + hz2 i, `).
2. hbi ← LT(hvi, 2` , ` + 1).
3. hv 0 i ← hvi + hbi · hvi.
4. hvi ← Trunc(hv 0 i, ` + 1, 1).
5. hzi ← hz1 i.
6. hsi ← XOR(hs1 i, hs2 i).
7. hpi ← (hp1 i − hp2 i − ` + 1 − hbi) · (1 − hzi).
8. herri ← herr1 i + herr2 i.
9. herri ← herri + hz2 i.
10. herri ← herri + FlowDetect(hpi).
11. Return (hvi, hpi, hzi, hsi, herri).
14.5.13 FLEQ((hv1 i, hp1 i, hz1 i, hs1 i, herr1 i), (hv2 i, hp2 i, hz2 i, hs2 i, herr2 i)):
1. hb1 i ← EQ(hv1 i, hv2 i, `).
2. hb2 i ← EQ(hp1 i, hp2 i, k).
3. hb3 i ← hz1 i · hz2 i.
4. hb4 i ← hs1 i · hs2 i.
5. hti ← herr1 i + herr2 i.
6. hti ← EQZ(hti, k).
7. Return (hb1 i · hb2 i · hb3 i · (1 − hb4 i) + hb4 i) · hti.
14.5.14 FLLT((hv1 i, hp1 i, hz1 i, hs1 i, herr1 i), (hv2 i, hp2 i, hz2 i, hs2 i, herr2 i)):
1. hai ← LT(hp1 i, hp2 i, k).
2. hci ← EQ(hp1 i, hp2 i, k).
3. hdi ← LT((1 − 2 · hs1 i) · hv1 i, (1 − 2 · hs2 i) · hv2 i, ` + 1).
4. ha · ci ← hai · hci.
5. hc · di ← hci · hdi.
6. hb+ i ← hc · di + (hai − ha · ci).
7. hb− i ← hc · di + (1 − hci − hai + ha · ci).
8. hz1 · z2 i ← hz1 i · hz2 i.
9. hs1 · s2 i ← hs1 i · hs2 i.
10. hbi ← hz1 · z2 i · (hs2 i − 1 − hs1 · s2 i) + hs1 · s2 i · (hz1 i + hz2 i − 1) + hz1 i · (1 − hs1i − hs2 i) + hs1 i.
11. hti ← herr1 i + herr2 i.
14.5.15 FLGT((hv1 i, hp1 i, hz1 i, hs1 i, herr1 i), (hv2 i, hp2 i, hz2 i, hs2 i, herr2 i)):
1. hvr i, hpr i, hzr i, hsr i, herrr i ← FLAdd((hv2 i, hp2 i, hz2 i, 1 − hs2 i, herr2 i), (hv1 i, hp1 i, hz1 i, hs1 i, herr1 i)).
2. Return FLLTZ(hvr i, hpr i, hzr i, hsr i, herrr i).
14.5.16 FLLET((hv1 i, hp1 i, hz1 i, hs1 i, herr1 i), (hv2 i, hp2 i, hz2 i, hs2 i, herr2 i)):
1. hbi ← FLGT((hv1 i, hp1 i, hz1 i, hs1 i, herr1 i), (hv2 i, hp2 i, hz2 i, hs2 i, herr2 i)).
2. Return 1 − hbi
14.5.17 FLGET((hv1 i, hp1 i, hz1 i, hs1 i, herr1 i), (hv2 i, hp2 i, hz2 i, hs2 i, herr2 i)):
1. hbi ← FLLT((hv1 i, hp1 i, hz1 i, hs1 i, herr1 i), (hv2 i, hp2 i, hz2 i, hs2 i, herr2 i)).
2. Return 1 − hbi
14.6 Conversion Routines
14.6.1 FLRound((hv1 i, hp1 i, hz1 i, hs1 i, herr1 i), mode):
This, depending on mode, computes either the floating point representation of the floor (if mode = 0) or the ceiling
(if mode = 1) of the input floating point number.
1. hai ← LTZ(hp1 i, k).
2. hbi ← LT(hp1 i, −` + 1, k).
3. ha · bi ← hai · hbi.
4. hv2 i, h2−p1 i ← Oblivious Trunc(hv1 i, `, (ha · bi − hai) · hp1 i). [Note, we save the computation of h2−p1 i which
this routine computes when 0 ≤ p1 < −`, otherwise the returned share is of 20 .]
5. hci ← EQZ(hv2 i, `).
6. hvi ← hv1 i − hv2 i + (1 − hci) · h2−p1 i · XOR(mode, hs1 i).
7. hdi ← EQ(hvi, 2` , ` + 1).
8. hvi ← 2`−1 · hdi + (1 − hdi) · hvi.
9. hvi ← (hai − ha · bi) · hvi + ha · bi · (mode − hs1 i) + (1 − hai) · hv1 i.
10. hsi ← (1 − hbi · mode) · hs1 i.
11. hzi ← OR(EQZ(hvi, `), hz1 i).
12. hvi ← hvi · (1 − hzi).
13. hpi ← (hp1 i + hdi · (hai − ha · bi)) · (1 − hzi).
14. herri ← herr1 i.
15. Return (hvi, hpi, hzi, hsi, herri).
MAMBA Example: To round a sfloat value we could invoke the function as follows:
from Compiler import floatingpoint
x = sfloat(5.5)
mode = 0
# v, p, z, s, err are extracted from x
# returns the floor of x
y = floatingpoint.FLRound(x,mode)
When mode=1 we get the ceil operation instead.
14.6.2 Int2Fx(hai, k, f ):
Given an integer a ∈ Z_{⟨k⟩} this gives the equivalent integer b in Q_{⟨k,f⟩}, namely b = a · 2^f. Note this means, to ensure
correctness, that |a| ≤ 2^{k−f}.
1. Return 2^f · ⟨a⟩.
MAMBA Example: To cast an int or sint register into a sfix one, you could execute the following:
x = sint(5)
# k and f are taken from the sfix class parameters
y = sfix.load_sint(x)
14.6.3 Int2FL(hai, γ, `):
We assume a ∈ Z_{⟨γ⟩}; this could lose precision if γ − 1 > ℓ.
1. λ ← γ − 1.
12. herri ← 0.
13. Return (hvi, hpi, hzi, hsi, herri).
MAMBA Example: To cast an int or sint register into a sfloat one, you could execute the following:
x = sint(5)
# gamma and l are extracted from the system
y = sfloat(x)
14.6.4 Fx2Int(hai, k, f ):
Given a value a ∈ Qhk,f i this gives the integer ba/2f c.
1. Return Trunc(hai, k, f ).
MAMBA Example: To extract the integral component of a sfix register, which is then encapsulated on a sint
register, and taking into account what is currently implemented, you could execute the following:
x = sfix(5.5)
# y stores 5 in a sint register
y = AdvInteger.Trunc(x.v, x.k, x.f, x.kappa, True)
14.6.5 FxFloor(hai, k, f ):
Given a value a ∈ Qhk,f i this does the same, but gives the result as a fixed point value.
1. Return 2f · Trunc(hai, k, f ).
MAMBA Example: To floor an sfix register, you could execute the following:
from Compiler import mpc_math
x = sfix(5.5)
# k and f are extracted from x
# y stores 5 in a sfix register
y = mpc_math.floor_fx(x)
MAMBA Example: To cast from sfix to sfloat, you could execute the following:
# stores 5.5 on a sfloat register
x = sfloat(sfix(5.5))
MAMBA Example: To cast an sfloat register into a sfix one, you could execute the following:
# stores 5.5 on a sfix register
# v, p, z, s, err are extracted from x and l, k, gamma are system parameters
x = sfix(sfloat(5.5))
14.7 SQRT Functions
(TO DO:) These functions are currently only supported in their fixed point versions in MAMBA; all variants are,
however, supported in the Rust pipeline. The original description of the protocol for the fixed point square root
algorithm is included in:
• Secure Distributed Computation of the Square Root and Applications, ISPEC 2012 [Lie12].
The floating point variant is in
• Secure Computation on Floating Point Numbers NDSS 2013 [ABZS13].
Additionally, we provide a simplified implementation for the square root on fixed point variables, which is appropriate
for inputs of any size. We make use of Liedel's protocol when the input's size makes it possible, as we explain later in
this section. Both algorithms make use of the two constants
α = −0.8099868542, β = 1.787727479.
The relative error of the linear approximation α·x + β to 1/√x is

    E(x) = ( α·x + β − 1/√x ) / ( 1/√x ),

whose maximum over the interval of interest is

    M = (√3/3) · √(−β/α) · ( (2/3)·β − √( 3/(−β/α) ) ),

and the constants are chosen so that

    E = E(1/2) = E(1) = −M.
14.7.1 ParamFxSqrt(hxi, k, f ):
This algorithm uses the sub-algorithm LinAppSQ defined below; note that LinAppSQ returns a scaled 1/√x · 2^f.
The algorithm only works when 3·k − 2·f is less than the system precision, which is by default equal to 20. In a
future release we will extend the sqrt function to cope with other input ranges.
1. θ ← ⌈log_2(k/5.4)⌉.
2. ⟨y_0⟩ ← LinAppSQ(⟨x⟩, k, f).
3. ⟨y_0⟩ ← ⟨y_0⟩ · 1/2^f.
4. ⟨g_0⟩ ← ⟨y_0⟩ · ⟨x⟩.
5. ⟨g_0⟩ ← ⟨g_0⟩ · 1/2^f.
6. ⟨h_0⟩ ← ⟨y_0⟩ · 1/2.
7. ⟨gh_0⟩ ← ⟨g_0⟩ · ⟨h_0⟩.
8. ⟨g⟩ ← ⟨g_0⟩.
9. ⟨h⟩ ← ⟨h_0⟩.
10. ⟨gh⟩ ← ⟨gh_0⟩.
11. For i ∈ [1, . . . , θ − 2] do
    (a) ⟨r⟩ ← 3/2 − ⟨gh⟩.
    (b) ⟨g⟩ ← ⟨g⟩ · ⟨r⟩.
    (c) ⟨h⟩ ← ⟨h⟩ · ⟨r⟩.
    (d) ⟨gh⟩ ← ⟨g⟩ · ⟨h⟩.
12. ⟨r⟩ ← 3/2 − ⟨gh⟩.
14.7.2 SimplifiedFxSqrt(hxi, k, f ):
This algorithm uses the sub-algorithm NormSQ defined above. Among the values it returns, we base our approximation
of √x directly on w = 2^{m/2} (or 2^{(m−1)/2} when m is odd). To avoid any loss of precision, we reuse the sfix
instantiation process. The algorithm works at the precision of the system and can handle values in any range. Its
behaviour is still experimental. The function is designed in such a way that there is no restriction on the size of f.
(c) hhi ← hhi · hri.
(d) hghi ← hgi · hhi.
15. hri ← 3/2 − hghi.
16. hhi ← hhi · hri.
17. hHi ← 4 · (hhi2 ).
18. hHi ← hHi · hxi.
19. hHi ← (3) − hHi.
20. hHi ← hhi · hHi.
21. hgi ← hHi · hxi.
22. Return hgi.
MAMBA Example: To obtain the sqrt of any value, you could execute the following:
from Compiler import mpc_math
k = 5
f = 2
x = sfix(6.25)
y = sfix(144)
z = sfix (257.5)
b = mpc_math.sqrt(y)
c = mpc_math.sqrt(z)
14.7.4 LinAppSQ(hbi, k, f ):
We based this section on the contents of the original paper [Lie12]; however we have corrected the typos from the
original work. The result is as follows:
1. α ← (−0.8099868542) · 2^k.
2. β ← (1.787727479) · 2^{2·k}.
3. (⟨c⟩, ⟨v⟩, ⟨m⟩, ⟨W⟩) ← NormSQ(⟨b⟩, k, f).
4. ⟨w⟩ ← α · ⟨c⟩ + β.
4. (hv2 i, hp2 i, hz2 i, hs2 i, herr2 i) ← FLMult((hv1 i, −`, 0, 0, 0), (vα , pα , zα , sα , 0)).
5. (hv0 i, hp0 i, hz0 i, hs0 i, herr0 i) ← FLAdd((hv2 i, hp2 i, hz2 i, hs2 i, herr2 i), (vβ , pβ , zβ , sβ , 0)).
6. (hvg i, hpg i, hzg i, hsg i, herrg i) ← FLMult((hv1 i, −`, 0, 0, 0), (hv0 i, hp0 i, hz0 i, hs0 i, herr0 i)).
7. (hvh i, hph i, hzh i, hsh i, herrh i) ← (hv0 i, hp0 i − 1, hz0 i, hs0 i, herr0 i).
8. For i ∈ [1, . . . , dlog2 (`/5.4)e − 1] do
(a) (hv2 i, hp2 i, hz2 i, hs2 i, herr2 i) ← FLMult((hvg i, hpg i, hzg i, hsg i, herrg i), (hvh i, hph i, hzh i, hsh i, herrh i)).
(b) (hv2 i, hp2 i, hz2 i, hs2 i, herr2 i) ← FLSub((3 · 2`−2 , −(` − 1), 0, 0, 0), (hv2 i, hp2 i, hz2 i, hs2 i, herr2 i)).
(c) (hvg i, hpg i, hzg i, hsg i, herrg i) ← FLMult((hvg i, hpg i, hzg i, hsg i, herrg i), (hv2 i, hp2 i, hz2 i, hs2 i, herr2 i)).
(d) (hvh i, hph i, hzh i, hsh i, herrh i) ← FLMult((hvh i, hph i, hzh i, hsh i, herrh i), (hv2 i, hp2 i, hz2 i, hs2 i, herr2 i)).
9. (hvh2 i, hph2 i, hzh2 i, hsh2 i, herrh2 i) ← FLMult((hvh i, hph i, hzh i, hsh i, herrh i), (hvh i, hph i, hzh i, hsh i, herrh i)).
10. (hv2 i, hp2 i, hz2 i, hs2 i, herr2 i) ← FLMult((hv1 i, −`, 0, 0, 0), (hvh2 i, hph2 i, hzh2 i, hsh2 i, herrh2 i).
11. (hv2 i, hp2 i, hz2 i, hs2 i, herr2 i) ← FLSub((3 · 2`−2 , −(` − 1), 0, 0, 0), (hv2 i, hp2 i + 1, hz2 i, hs2 i, herr2 i)).
12. (hvh i, hph i, hzh i, hsh i, herrh i) ← FLMult((hvh i, hph i, hzh i, hsh i, herrh i), (hv2 i, hp2 i, hz2 i, hs2 i, herr2 i)).
13. (hv2 i, hp2 i, hz2 i, hs2 i, herr2 i) ← FLMult((hv1 i, −`, 0, 0, 0), (hvh i, hph i + 1, hzh i, hsh i, herrh i)).
14. (hv2 i, hp2 i, hz2 i, hs2 i, herr2 i) ← FLMult((hv2 i, hp2 i, hz2 i, hs2 i, herr2 i), (2`−1 · (1 − hci) + v√2 · hci, −(1 −
hci) · (` − 1) + p√2 · hci, 0, 0, 0)).
15. hpi ← (hp2 i + hpi) · (1 − hz1 i).
14.8 EXP and LOG Functions
(TO DO:) The exp and log functions are currently only supported in their fixed point versions in the MAMBA pipeline.
Both functions are supported in the Rust pipeline.
A secure fixed point exponentiation and logarithm algorithm could not be found anywhere, so this is our own,
derived from identities in the book
• Computer Approximations by Hart from 1968 [Har78].
The floating point variants are in
• Secure Computation on Floating Point Numbers NDSS 2013 [ABZS13].
Once we have defined FxExp2 and FxLog2 (resp. FLExp2 and FLLog2) we can define the following functions from
the usual identities for non-secret values of the base b:
Note that the functions in this section require specific sfloat parametrization, in accordance with the algorithms in
this section. They support secret shared sfix x and y, as well as public floating point or integer inputs. We can define
these operations as follows:
14.8.1 FxExp2(hai, k, f ):
This algorithm computes 2^a as a fixed point calculation. It first takes the integer and fractional parts of the input |a/2^f|,
which we denote by b and c. We then compute d = 2^b, which will clearly overflow if b > k − f, but we ignore this
error (if the user is stupid enough to put in garbage, they get garbage out). We then compute e = 2^c, for 0 ≤ c ≤ 1, via
the following polynomial5 . Our polynomial (which we produced using a Chebyshev approximation)
is of degree nine and has coefficients given by
0 0.99999999999998151058451
1 0.69314718056364205693851
2 0.24022650683729748257646
3 0.0555041102193305250618
4 0.0096181190501642860210497
5 0.0013333931011014250476911
6 0.00015395144945146697380844
7 0.000015368748541192116946474
8 0.0000012256971722926501833228
9 0.00000014433329807023165258784
Given d and e one can now compute 2|a| = 2b+c = 2b · 2c = d · e, and the final dealing with the sign of a can be done
by an inversion. We denote by FxPol(P1045 , hxi, k, f ) the evaluation of the polynomial P1045 on the fixed point input
hxi where x ∈ Qhk,f i . This is done by Horner’s rule.
1. ⟨s⟩ ← FxLTZ(⟨a⟩).
2. ⟨a⟩ ← (1 − 2 · ⟨s⟩) · ⟨a⟩.
3. ⟨b⟩ ← Fx2Int(⟨a⟩, k, f).
4. ⟨c⟩ ← ⟨a⟩ − Int2Fx(⟨b⟩, k, f).
5. ⟨d⟩ ← Int2Fx(Pow2(⟨b⟩, k), k, f). [This will produce an invalid result if b is too big, in which case the result cannot be held in an Fx in any case.]
6. ⟨e⟩ ← FxPol(P1045, ⟨c⟩, k, f).
7. ⟨g⟩ ← FxMult(⟨d⟩, ⟨e⟩, k, f).
8. ⟨g^{−1}⟩ ← FxDiv(2^f, ⟨g⟩, k, f).
9. ⟨a⟩ ← (1 − ⟨s⟩) · ⟨g⟩ + ⟨s⟩ · ⟨g^{−1}⟩.
10. Return ⟨a⟩.
The above works, but we have found a little numerical instability due to the division operation.
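As a plaintext sanity check, the following Python sketch mirrors steps 1-10 on clear values using the degree-nine coefficients above (the names exp2_plain and horner are ours; the MPC version operates on sfix values and performs the final inversion with FxDiv, which is where the instability mentioned above arises):

P1045 = [                         # coefficients of the degree-nine polynomial above
    0.99999999999998151058451, 0.69314718056364205693851,
    0.24022650683729748257646, 0.0555041102193305250618,
    0.0096181190501642860210497, 0.0013333931011014250476911,
    0.00015395144945146697380844, 0.000015368748541192116946474,
    0.0000012256971722926501833228, 0.00000014433329807023165258784,
]

def horner(coeffs, x):
    # Evaluate sum(coeffs[i] * x**i) by Horner's rule, as FxPol does.
    acc = 0.0
    for c in reversed(coeffs):
        acc = acc * x + c
    return acc

def exp2_plain(a):
    s = a < 0                     # step 1: sign bit
    a = abs(a)                    # step 2
    b = int(a)                    # step 3: integer part
    c = a - b                     # step 4: fractional part, 0 <= c < 1
    d = 2.0 ** b                  # step 5: Pow2 on the integer part
    e = horner(P1045, c)          # step 6: 2**c via the polynomial
    g = d * e                     # step 7: 2**|a|
    return 1.0 / g if s else g    # steps 8-9: invert if a was negative

print(exp2_plain(4.25), 2.0 ** 4.25)   # both ~19.0273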
MAMBA Example: To obtain 2^y where y is secret shared you could run the following:
from Compiler import mpc_math
y = sfix(4)
# returns 2^4
# extracts k and f from y
exp2_y = mpc_math.exp2_fx(sfix(y))
16. ⟨u⟩ ← ⟨(1 − c) · a⟩ · (⟨b⟩ · ⟨x⟩ + (1 − ⟨b⟩) · 2^ℓ · Inv(⟨2^{p2}⟩) · ⟨y⟩) + (2^ℓ − 1) · ⟨c⟩ · ⟨s1⟩.
17. ⟨uℓ⟩, . . . , ⟨u1⟩ ← BitDec(⟨u⟩, ℓ, ℓ).
18. For i ∈ [1, . . . , ℓ] do
(a) [In this loop (cvi, cpi, 0, 0) represents the floating point number 2^{2^{−i}}.]
(b) ⟨ai⟩ ← 2^{ℓ−1} · (1 − ⟨ui⟩) + cvi · ⟨ui⟩.
(c) ⟨bi⟩ ← −(ℓ − 1) · (1 − ⟨ui⟩) + cpi · ⟨ui⟩.
19. (⟨vu⟩, ⟨pu⟩, 0, 0) ← FLProd((⟨a1⟩, ⟨b1⟩, 0, 0), . . . , (⟨aℓ⟩, ⟨bℓ⟩, 0, 0)). [This implements a product of ℓ floating point values, which is performed via a binary tree style method.]
20. ⟨p⟩ ← ⟨a⟩ · (⟨w⟩ + ⟨pu⟩) + 2^{k−1} · (1 − ⟨a⟩) · (1 − 2⟨s1⟩).
21. ⟨v⟩ ← 2^{ℓ−1} · ⟨z1⟩ + (1 − ⟨z1⟩) · ⟨vu⟩.
14.8.3 FxLog2(⟨a⟩, k, f):
We first map a to a value v in the interval [1/2, 1] by essentially converting it to a floating point number. So we have a = (v/2^k) · 2^p where v, p ∈ Z⟨k⟩ and v/2^k ∈ [1/2, 1]. Thus log2 a = p + log2(v/2^k), and we then treat v as a fixed point number and apply the Pade approximation P2524/Q2524 from Hart's book [Har78], which produces an absolute error of 10^{−8.32}. We denote by FxPade(P2524, Q2524, ⟨x⟩, k, f) the evaluation of the rational function P2524/Q2524 on the fixed point input ⟨x⟩, where x ∈ Q⟨k,f⟩. The Pade approximation is given by the rational function defined by the following table (a plaintext sketch of the resulting approximation is given after the table):
P 0 1 -.20546 66719 51
P 1 1 -.88626 59939 1
P 2 1 +.61058 51990 15
P 3 1 +.48114 74609 89
Q 0 0 +.35355 34252 77
Q 1 1 +.45451 70876 29
Q 2 1 +.64278 42090 29
Q 3 1 +.1
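A plaintext Python sketch of the resulting approximation follows (the names log2_plain and poly are ours; the coefficient lists are our reading of the table rows expanded as s · 10^p, each row giving an exponent p and a significand s, as explained in the note on polynomial tables in the next section):

import math

P2524 = [-2.05466671951, -8.8626599391, 6.10585199015, 4.81147460989]
Q2524 = [0.353553425277, 4.54517087629, 6.42784209029, 1.0]

def poly(coeffs, x):
    # Evaluate sum(coeffs[i] * x**i).
    return sum(c * x ** i for i, c in enumerate(coeffs))

def log2_plain(a):
    # Normalise a = v * 2**p with v in [1/2, 1), then add the exponent back in.
    assert a > 0
    v, p = math.frexp(a)
    return p + poly(P2524, v) / poly(Q2524, v)

print(log2_plain(4.0), math.log2(4.0))      # both ~2.0
print(log2_plain(10.0), math.log2(10.0))    # both ~3.3219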
MAMBA Example: To obtain log2(x) where x is secret shared you could run the following:
from Compiler import mpc_math
sfloat.vlen = 15 # Length of mantissa in bits
sfloat.plen = 10 # Length of exponent in bits
sfloat.kappa = 40 # Statistical security parameter for floats
sfix.set_precision(20,40)
cfix.set_precision(20,40)
x = sfix(4)
# extracts k and f from x
# returns log_2(4)
log2_x = mpc_math.log2_fx(sfix(x))
Note, internally log2_fx uses a polynomial to approximate the logarithm, but it uses the sfloat arithmetic to do
this. Thus, you need to set up sfloat and sfix correctly for this to work. In particular if you look deep inside the
code there is the following cryptic remark in relation to log2_fx...
# Note that sfloat and sfix sizes have to be parametrized correctly,
# such that sfix.k > sfix.f >= sfloat vlen. This is not the case by default.
14.9 Trigonometric Functions
(TO DO:) These still need implementing for sfloat in the Rust pipeline. All three basic trigonometric functions support inputs of either fixed point or floating point precision, with the output type being equal to the input type.
The computation of sin(x) and cos(x) is performed using polynomial approximations, with tan(x) computed as sin(x)/cos(x). The basic idea for sin(x) and cos(x) is to first reduce the argument x into the range [0, . . . , 2π), so as to obtain a new argument, which we call y. We then compute a bit b1 which tests whether y ∈ [0, π) or [π, 2π) (with 0 indicating the former). We then reduce y to z by mapping it into the range [0, π), and compute a bit b2 which says whether z is in the range [0, π/2) or [π/2, π). We finally reduce z into the range [0, π/2), resulting in w. Then a polynomial is used to compute sin(w) or cos(w), which means we need to scale w into the range [0, 1) to obtain v. For the polynomial approximations to the basic functions in the range [0, π/2), where the argument is given as w = v · π/2, we use the following approximations from Hart's book [Har78]
sin(w) = v · P3307(v²),
cos(w) = P3508(v²).
Where we have
P3307 P3508
0 1 +.15707 96326 79489 66192 31314 989 0 +.99999 99999 99999 99999 99914 771
1 0 -.64596 40975 06246 25365 51665 255 0 -.49999 99999 99999 99999 91637 437
2 -1 +.79692 62624 61670 45105 15876 375 -1 +.41666 66666 66666 66653 10411 988
3 -2 -.46817 54135 31868 79164 48035 89 -2 -.13888 88888 88888 88031 01864 15
4 -3 +.16044 11847 87358 59304 30385 5 -4 +.24801 58730 15870 23300 45157
5 -5 -.35988 43235 20707 78156 5727 -6 -.27557 31922 39332 25642 1489
6 -7 +.56921 72920 65732 73962 4 -8 +.20876 75698 16541 25915 59
7 -9 -.66880 34884 92042 33722 -10 -.11470 74512 67755 43239 4
8 -11 +.60669 10560 85201 792 -13 +.47794 54394 06649 917
9 -13 -.43752 95071 18174 8 -15 -.15612 26342 88277 81
10 -15 +.25002 85418 9303 -18 +.39912 65450 7924
NOTE: Polynomial tables are described by the monomial number i, the exponent p_i and the significand s_i. The coefficient of the ith monomial is obtained by multiplying the significand by 10^{p_i}, i.e. it is s_i · 10^{p_i}.
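For example, the degree-0 row of P3307 has exponent 1 and significand +.15707 96326 . . ., so the coefficient is approximately 1.5707963 . . . ≈ π/2, as one expects since v · P3307(v²) must have slope π/2 at v = 0 to match sin(v · π/2). A one-line plaintext check (with the significand truncated to double precision):

import math
s, p = 0.15707963267948966, 1       # degree-0 row of P3307, truncated
print(s * 10 ** p, math.pi / 2)     # both ~1.5707963267948966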
14.9.1 F⋆TrigSub(⟨x⟩):
1. ⟨f⟩ ← F⋆Mult(⟨x⟩, (1/(2 · π))).
2. ⟨f⟩ ← F⋆Floor(⟨f⟩).
3. ⟨y⟩ ← F⋆Mult(⟨f⟩, (2 · π)).
4. ⟨y⟩ ← F⋆Add(⟨x⟩, −⟨y⟩).
5. ⟨b1⟩ ← F⋆GE(⟨y⟩, (π)).
6. ⟨f⟩ ← F⋆Add(2 · π, −⟨y⟩).
7. ⟨w⟩ ← F⋆Choose(⟨f⟩, ⟨y⟩, ⟨b1⟩).
8. ⟨b2⟩ ← F⋆GE(⟨w⟩, (π/2)).
9. ⟨f⟩ ← F⋆Add(π, −⟨w⟩).
10. ⟨w⟩ ← F⋆Choose(⟨f⟩, ⟨w⟩, ⟨b2⟩).
11. Return (⟨w⟩, ⟨b1⟩, ⟨b2⟩).
MAMBA Example: To reduce the angle you could execute the following (note that this function call is meant to be
used internally):
from Compiler import mpc_math
x = sfix(4) # sfloat(4)
# returns an angle in the [0,pi/2) interval in w and flags b1 and b2.
w, b1, b2 = mpc_math.sTrigSub_fx(x)
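For reference, the reduction can be mirrored on clear values as follows (a plaintext Python sketch; the name trig_sub is ours, and F⋆Choose(a, b, ⟨flag⟩) returns a when the flag is 1 and b otherwise, as explained in the next subsection):

import math

def trig_sub(x):
    f = math.floor(x * (1.0 / (2 * math.pi)))   # steps 1-2
    y = x - f * (2 * math.pi)                   # steps 3-4: y in [0, 2*pi)
    b1 = 1 if y >= math.pi else 0               # step 5
    w = (2 * math.pi - y) if b1 else y          # steps 6-7: fold into [0, pi)
    b2 = 1 if w >= math.pi / 2 else 0           # step 8
    w = (math.pi - w) if b2 else w              # steps 9-10: fold into [0, pi/2)
    return w, b1, b2

print(trig_sub(4.0))    # 4.0 lies in [pi, 3*pi/2), so b1 = 1 and b2 = 1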
14.9.2 F⋆Sin(⟨x⟩)
We present these routines as generic routines given the specific helper subroutine above; we assume an obvious overloading/translation of arguments. We let F⋆Choose(⟨x⟩, ⟨y⟩, ⟨b⟩), for a shared bit b, denote an operation which produces ⟨x⟩ if ⟨b⟩ = 1 and ⟨y⟩ otherwise. This is easily obtained by securely multiplying each component share of the data representing ⟨x⟩ etc. by ⟨b⟩. So for fixed point representations this becomes, irrespective of the values k and f:
MAMBA Example: To obtain the sin of any value, you could execute the following:
from Compiler import mpc_math
x = sfix(4) # sfloat(4)
# returns the sin of an angle on any interval
y = mpc_math.sin(x)
14.9.3 F⋆Cos(⟨x⟩)
Likewise this becomes
1. (⟨w⟩, ⟨b1⟩, ⟨b2⟩) ← F⋆TrigSub(⟨x⟩).
2. ⟨v⟩ ← ⟨w⟩.
3. ⟨b⟩ ← F⋆Choose(⟨−1⟩, ⟨1⟩, ⟨b2⟩).
4. ⟨cos(v)⟩ ← F⋆Pol(P3508, ⟨v²⟩).
MAMBA Example: To obtain the cos of any value, you could execute the following:
from Compiler import mpc_math
x = sfix(4) # sfloat(4)
# returns the cos of an angle on any interval
y = mpc_math.cos(x)
14.9.4 F⋆Tan(⟨x⟩)
Likewise this becomes
1. (⟨w⟩, ⟨b1⟩, ⟨b2⟩) ← F⋆TrigSub(⟨x⟩).
6. ⟨v⟩ ← ⟨w⟩.
7. ⟨b⟩ ← F⋆Choose(⟨−1⟩, ⟨1⟩, ⟨b2⟩).
8. ⟨cos(v)⟩ ← F⋆Pol(P3508, ⟨v²⟩).
MAMBA Example: To obtain the tan of any value, you could execute the following:
from Compiler import mpc_math
x = sfix(4) # sfloat(4)
# returns the tan of an angle on any interval
y = mpc_math.tan(x)
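As noted at the start of this section, tan(x) is computed as sin(x)/cos(x); in plaintext terms:

import math
x = 4.0
print(math.sin(x) / math.cos(x), math.tan(x))   # both ~1.1578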
14.10 Inverse Trigonometric Functions
(TO DO:) Given that MAMBA currently only supports square root operations for sfix inputs, the inverse trigonometric functions are restricted to sfix inputs. (TO DO:) These still need implementing for sfloat in the Rust pipeline. To obtain arcsin and arccos one makes use of the formulae
arcsin(x) = arctan(x/√(1 − x²)),
arccos(x) = π/2 − arcsin(x).
Note that arcsin and arccos are only defined when |x| ≤ 1, whereas arctan(x) is defined for all real x. For arctan(x) we first reduce to positive values of x by using the formula
arctan(−x) = − arctan(x).
The final approximation to arctan(x) for x ∈ [0, 1) is obtained using the Pade approximation P5102/Q5102 from Hart's book [Har78], where the polynomials are represented as in our earlier descriptions.
P5102 Q5102
0 5 +.21514 05962 60244 19331 93254 468 5 +.21514 05962 60244 19331 93298 234
1 5 +.73597 43380 28844 42408 14980 706 5 +.80768 78701 15592 48851 76713 209
2 6 +.10027 25618 30630 27849 70511 863 6 +.12289 26789 09278 47762 98743 322
3 5 +.69439 29750 03225 23370 59765 503 5 +.97323 20349 05355 56802 60434 387
4 5 +.25858 09739 71909 90257 16567 793 5 +.42868 57652 04640 80931 84006 664
5 4 +.50386 39185 50126 65579 37791 19 5 +.10401 13491 56689 00570 05103 878
6 3 +.46015 88804 63535 14711 61727 227 4 +.12897 50569 11611 09714 11459 55
7 2 +.15087 67735 87003 09877 17455 528 2 +.68519 37831 01896 80131 14024 294
8 -1 +.75230 52818 75762 84445 10729 539 1 +.1
The following protocol for arcsin implements the formulae above; the protocol and its implementation make use of our secure implementation of square root, and arccos is then derived from arcsin.
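In plaintext terms the two identities read as follows (the function names are ours; math.atan stands in for the arctan protocol below):

import math

def arcsin_via_arctan(x):
    assert abs(x) < 1
    return math.atan(x / math.sqrt(1 - x * x))

def arccos_via_arcsin(x):
    return math.pi / 2 - arcsin_via_arctan(x)

print(arcsin_via_arctan(0.5), math.asin(0.5))   # both ~0.5236 (= pi/6)
print(arccos_via_arcsin(0.5), math.acos(0.5))   # both ~1.0472 (= pi/3)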
14.10.1 F⋆ArcSin(⟨x⟩)
1. ⟨x²⟩ ← F⋆Mult(⟨x⟩, ⟨x⟩).
2. ⟨−x²⟩ ← F⋆Neg(⟨x²⟩).
3. ⟨1 − x²⟩ ← F⋆Add(1, ⟨−x²⟩).
4. ⟨√(1 − x²)⟩ ← F⋆Sqrt(⟨1 − x²⟩).
5. ⟨v⟩ ← F⋆Div(⟨x⟩, ⟨√(1 − x²)⟩).
6. ⟨y⟩ ← F⋆ArcTan(⟨v⟩).
7. If ⋆ = L
(a) ⟨|x|⟩ ← FLAbs(⟨x⟩).
(b) ⟨y_err⟩ ← FLGT(⟨|x|⟩, 1.0).
8. Return ⟨y⟩.
MAMBA Example: To obtain the arcsin of a value in the (-1,1) interval, you could execute the following:
from Compiler import mpc_math
x = sfix(0.5)
# returns the arcsin of a value in the (-1,1) interval
y = mpc_math.asin(x)
14.10.2 F⋆ArcCos(⟨x⟩)
1. ⟨y⟩ ← F⋆ArcSin(⟨x⟩).
2. ⟨−y⟩ ← F⋆Neg(⟨y⟩).
3. ⟨π/2 − y⟩ ← F⋆Add(π/2, ⟨−y⟩).
4. Return ⟨π/2 − y⟩.
MAMBA Example: To obtain the arccos of a value in the (-1,1) interval, you could execute the following:
from Compiler import mpc_math
x = sfix(0.5)
# returns the arccos of a value in the (-1,1) interval
y = mpc_math.acos(x)
14.10.3 F⋆ArcTan(⟨x⟩)
1. ⟨s⟩ ← F⋆LTZ(⟨x⟩).
2. ⟨|x|⟩ ← F⋆Abs(⟨x⟩).
3. ⟨b⟩ ← F⋆GT(⟨|x|⟩, 1.0).
4. ⟨v⟩ ← F⋆Div(1, ⟨|x|⟩).
5. ⟨v⟩ ← F⋆Choose(⟨|x|⟩, ⟨v⟩, 1 − ⟨b⟩).
6. ⟨v²⟩ ← F⋆Mul(⟨v⟩, ⟨v⟩).
7. ⟨y⟩ ← FxPade(P5102, Q5102, ⟨v²⟩).
8. ⟨y⟩ ← F⋆Mul(⟨v⟩, ⟨y⟩).
9. ⟨π/2 − y⟩ ← F⋆Sub(π/2, ⟨y⟩).
10. ⟨y⟩ ← F⋆Choose(⟨y⟩, ⟨π/2 − y⟩, 1 − ⟨b⟩).
11. ⟨−y⟩ ← F⋆Neg(⟨y⟩).
12. ⟨y⟩ ← F⋆Choose(⟨y⟩, ⟨−y⟩, 1 − ⟨s⟩).
13. Return ⟨y⟩.
MAMBA Example: To obtain the arctan of any value, you could execute the following:
from Compiler import mpc_math
x = sfix(0.5)
# returns the arctan of a value on any interval
y = mpc_math.atan(x)
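The argument reduction in F⋆ArcTan can be mirrored on clear values as follows (a plaintext Python sketch; the name arctan_reduced is ours, and math.atan on [0, 1] stands in for the Pade evaluation of P5102/Q5102):

import math

def arctan_reduced(x):
    s = x < 0                       # step 1: sign
    ax = abs(x)                     # step 2
    b = ax > 1.0                    # step 3: reduce via 1/|x| if |x| > 1
    v = 1.0 / ax if b else ax       # steps 4-5: v in [0, 1]
    y = math.atan(v)                # steps 6-8: Pade approximation in the protocol
    if b:
        y = math.pi / 2 - y         # steps 9-10: arctan(x) = pi/2 - arctan(1/x)
    return -y if s else y           # steps 11-12: arctan(-x) = -arctan(x)

print(arctan_reduced(-3.0), math.atan(-3.0))    # both ~-1.2490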
References
[ABF+17] Toshinori Araki, Assi Barak, Jun Furukawa, Tamar Lichter, Yehuda Lindell, Ariel Nof, Kazuma Ohara,
Adi Watzman, and Or Weinstein. Optimized honest-majority MPC for malicious adversaries - breaking
the 1 billion-gate per second barrier. In 2017 IEEE Symposium on Security and Privacy, SP 2017, San
Jose, CA, USA, May 22-26, 2017, pages 843–862. IEEE Computer Society, 2017.
[ABZS13] Mehrdad Aliasgari, Marina Blanton, Yihua Zhang, and Aaron Steele. Secure computation on floating
point numbers. In 20th Annual Network and Distributed System Security Symposium, NDSS 2013, San
Diego, California, USA, February 24-27, 2013. The Internet Society, 2013.
[BCS19] Carsten Baum, Daniele Cozzo, and Nigel P. Smart. Using topgear in overdrive: A more efficient zkpok
for SPDZ. In Kenneth G. Paterson and Douglas Stebila, editors, Selected Areas in Cryptography - SAC
2019 - 26th International Conference, Waterloo, ON, Canada, August 12-16, 2019, Revised Selected
Papers, volume 11959 of Lecture Notes in Computer Science, pages 274–302. Springer, 2019.
[CdH10] Octavian Catrina and Sebastiaan de Hoogh. Improved primitives for secure multiparty integer computation. In Juan A. Garay and Roberto De Prisco, editors, Security and Cryptography for Networks, 7th
International Conference, SCN 2010, Amalfi, Italy, September 13-15, 2010. Proceedings, volume 6280
of Lecture Notes in Computer Science, pages 182–199. Springer, 2010.
[CDM00] Ronald Cramer, Ivan Damgård, and Ueli M. Maurer. General secure multi-party computation from any
linear secret-sharing scheme. In Bart Preneel, editor, Advances in Cryptology - EUROCRYPT 2000,
International Conference on the Theory and Application of Cryptographic Techniques, Bruges, Belgium,
May 14-18, 2000, Proceeding, volume 1807 of Lecture Notes in Computer Science, pages 316–334.
Springer, 2000.
[CO15] Tung Chou and Claudio Orlandi. The simplest protocol for oblivious transfer. In Kristin E. Lauter and
Francisco Rodríguez-Henríquez, editors, Progress in Cryptology - LATINCRYPT 2015 - 4th International
Conference on Cryptology and Information Security in Latin America, Guadalajara, Mexico, August 23-
26, 2015, Proceedings, volume 9230 of Lecture Notes in Computer Science, pages 40–58. Springer,
2015.
[CS10] Octavian Catrina and Amitabh Saxena. Secure computation with fixed-point numbers. In Radu Sion,
editor, Financial Cryptography and Data Security, 14th International Conference, FC 2010, Tenerife,
Canary Islands, January 25-28, 2010, Revised Selected Papers, volume 6052 of Lecture Notes in Computer Science, pages 35–50. Springer, 2010.
[DFK+06] Ivan Damgård, Matthias Fitzi, Eike Kiltz, Jesper Buus Nielsen, and Tomas Toft. Unconditionally secure constant-rounds multi-party computation for equality, comparison, bits and exponentiation. In Shai
Halevi and Tal Rabin, editors, Theory of Cryptography, Third Theory of Cryptography Conference, TCC
2006, New York, NY, USA, March 4-7, 2006, Proceedings, volume 3876 of Lecture Notes in Computer
Science, pages 285–304. Springer, 2006.
[DKL+13] Ivan Damgård, Marcel Keller, Enrique Larraia, Valerio Pastro, Peter Scholl, and Nigel P. Smart. Practical
covertly secure MPC for dishonest majority - or: Breaking the SPDZ limits. In Jason Crampton, Sushil
Jajodia, and Keith Mayes, editors, Computer Security - ESORICS 2013 - 18th European Symposium
on Research in Computer Security, Egham, UK, September 9-13, 2013. Proceedings, volume 8134 of
Lecture Notes in Computer Science, pages 1–18. Springer, 2013.
[DPSZ12] Ivan Damgård, Valerio Pastro, Nigel P. Smart, and Sarah Zakarias. Multiparty computation from somewhat homomorphic encryption. In Safavi-Naini and Canetti [SC12], pages 643–662.
[FKOS15] Tore Kasper Frederiksen, Marcel Keller, Emmanuela Orsini, and Peter Scholl. A unified approach to
MPC with preprocessing using OT. In Tetsu Iwata and Jung Hee Cheon, editors, Advances in Cryptology
- ASIACRYPT 2015 - 21st International Conference on the Theory and Application of Cryptology and
Information Security, Auckland, New Zealand, November 29 - December 3, 2015, Proceedings, Part I,
volume 9452 of Lecture Notes in Computer Science, pages 711–735. Springer, 2015.
[GHS12] Craig Gentry, Shai Halevi, and Nigel P. Smart. Homomorphic evaluation of the AES circuit. In Safavi-
Naini and Canetti [SC12], pages 850–867.
[GKWY19] Chun Guo, Jonathan Katz, Xiao Wang, and Yu Yu. Efficient and secure multiparty computation from
fixed-key block ciphers, 2019.
[Har78] John F. Hart. Computer Approximations. Krieger Publishing Co., Inc., Melbourne, FL, USA, 1978.
[HSS17] Carmit Hazay, Peter Scholl, and Eduardo Soria-Vazquez. Low cost constant round MPC combining
BMR and oblivious transfer. In Tsuyoshi Takagi and Thomas Peyrin, editors, Advances in Cryptology
- ASIACRYPT 2017 - 23rd International Conference on the Theory and Applications of Cryptology and
Information Security, Hong Kong, China, December 3-7, 2017, Proceedings, Part I, volume 10624 of
Lecture Notes in Computer Science, pages 598–628. Springer, 2017.
[KOS15] Marcel Keller, Emmanuela Orsini, and Peter Scholl. Actively secure OT extension with optimal overhead.
In Rosario Gennaro and Matthew Robshaw, editors, Advances in Cryptology - CRYPTO 2015 - 35th
Annual Cryptology Conference, Santa Barbara, CA, USA, August 16-20, 2015, Proceedings, Part I,
volume 9215 of Lecture Notes in Computer Science, pages 724–741. Springer, 2015.
[KPR18] Marcel Keller, Valerio Pastro, and Dragos Rotaru. Overdrive: Making SPDZ great again. In Jesper Buus
Nielsen and Vincent Rijmen, editors, Advances in Cryptology - EUROCRYPT 2018 - 37th Annual International Conference on the Theory and Applications of Cryptographic Techniques, Tel Aviv, Israel, April
29 - May 3, 2018 Proceedings, Part III, volume 10822 of Lecture Notes in Computer Science, pages
158–189. Springer, 2018.
[KRSW18] Marcel Keller, Dragos Rotaru, Nigel P. Smart, and Tim Wood. Reducing communication channels in
MPC. In Dario Catalano and Roberto De Prisco, editors, Security and Cryptography for Networks - 11th
International Conference, SCN 2018, Amalfi, Italy, September 5-7, 2018, Proceedings, volume 11035 of
Lecture Notes in Computer Science, pages 181–199. Springer, 2018.
[Lie12] Manuel Liedel. Secure distributed computation of the square root and applications. In Mark Dermot
Ryan, Ben Smyth, and Guilin Wang, editors, Information Security Practice and Experience - 8th International Conference, ISPEC 2012, Hangzhou, China, April 9-12, 2012. Proceedings, volume 7232 of
Lecture Notes in Computer Science, pages 277–288. Springer, 2012.
[Mau06] Ueli M. Maurer. Secure multi-party computation made simple. Discrete Applied Mathematics,
154(2):370–381, 2006.
[NO07] Takashi Nishide and Kazuo Ohta. Multiparty computation for interval, equality, and comparison without
bit-decomposition protocol. In Tatsuaki Okamoto and Xiaoyun Wang, editors, Public Key Cryptography
- PKC 2007, 10th International Conference on Practice and Theory in Public-Key Cryptography, Beijing,
China, April 16-20, 2007, Proceedings, volume 4450 of Lecture Notes in Computer Science, pages 343–
360. Springer, 2007.
[PVW08] Chris Peikert, Vinod Vaikuntanathan, and Brent Waters. A framework for efficient and composable
oblivious transfer. In David A. Wagner, editor, Advances in Cryptology - CRYPTO 2008, 28th Annual International Cryptology Conference, Santa Barbara, CA, USA, August 17-21, 2008. Proceedings, volume
5157 of Lecture Notes in Computer Science, pages 554–571. Springer, 2008.
[RST+19] Dragos Rotaru, Nigel P. Smart, Titouan Tanguy, Frederik Vercauteren, and Tim Wood. Actively secure
setup for SPDZ, 2019.
[RW19] Dragos Rotaru and Tim Wood. Marbled circuits: Mixing arithmetic and boolean circuits with active security. In Feng Hao, Sushmita Ruj, and Sourav Sen Gupta, editors, Progress in Cryptology - INDOCRYPT
2019 - 20th International Conference on Cryptology in India, Hyderabad, India, December 15-18, 2019,
Proceedings, volume 11898 of Lecture Notes in Computer Science, pages 227–249. Springer, 2019.
[SC12] Reihaneh Safavi-Naini and Ran Canetti, editors. Advances in Cryptology - CRYPTO 2012 - 32nd Annual
Cryptology Conference, Santa Barbara, CA, USA, August 19-23, 2012. Proceedings, volume 7417 of
Lecture Notes in Computer Science. Springer, 2012.
[sec] secureSCM. Deliverable D9.2. https://www1.cs.fau.de/filepool/publications/octavian_securescm/SecureSCM-D.9.2.pdf.
[SW19] Nigel P. Smart and Tim Wood. Error detection in monotone span programs with application to
communication-efficient multi-party computation. In Mitsuru Matsui, editor, Topics in Cryptology - CT-
RSA 2019 - The Cryptographers’ Track at the RSA Conference 2019, San Francisco, CA, USA, March
4-8, 2019, Proceedings, volume 11405 of Lecture Notes in Computer Science, pages 210–229. Springer,
2019.
[WRK17] Xiao Wang, Samuel Ranellucci, and Jonathan Katz. Global-scale secure multiparty computation. In
Bhavani M. Thuraisingham, David Evans, Tal Malkin, and Dongyan Xu, editors, Proceedings of the
2017 ACM SIGSAC Conference on Computer and Communications Security, CCS 2017, Dallas, TX,
USA, October 30 - November 03, 2017, pages 39–56. ACM, 2017.