Home

Applications of Secure Electronic Voting to Automated

1. 0t 8 g to the previous hop e The troubleshooter receives g g g from the first node and recovers g g g g using the stored value of so 5 4 Tallying phase e Since the tally m is guaranteed to be less than the number of participants t in the FTN request typi cally under 256 the troubleshooter can find m by computing g g g and stopping at the first one of these which matches g e For very large values of t the value of m can be found in O t time using the time space tradeoff known as baby step giant step form a table of values T g j 0 1 V and compute g g for i 0 1 vt stopping at the first value of i for which g g gie appears in T The value of m will be i j vt Readers who are familiar with the framework of 9 will recognize the above scheme as an additive t t threshold de cryption scheme where each FTN node is an election author ity The main difference here is that we accumulate shares of the secret key dynamically during the random walk phase in stead of instantiating secret key shares statically during the initialization phase In principle more general t n thresh old decryption schemes such as 21 and 24 could provide greater robustness against non cooperative nodes but this does not help here since random walk path is historyless and the return packet already needs the cooperation of ev ery FTN n
2. Homomorphic encryption protects users privacy from pas sive attacks where the troubleshooting requests and the con tributed data are all legitimate but friends are curious and might collude and snoop to try to infer other peoples data however without going to the lengths of falsifying in formation Occasionally compromised hosts may launch active at tacks against their peer friends in the forms of the trou bleshooter attack and the data injection attack Section 3 2 The homomorphic encrypted vote tallying can be combined together with the cluster based secure multi party sum pro tocol to increase robustness against active attacks which we will evaluate in the following subsections 11 2 Mitigating the Troubleshooter Attack A compromised host may fabricate a troubleshooting re quest and form a troubleshooting cluster to infer its friends information Without colluding message inspection at the troubleshooter does not reveal any privacy sensitive infor mation due to the use of the homomorphically aggregated network wide tally in the first round and the re encryption mixnet in the second round By colluding with the cluster s exit the malicious troubleshooter can determine the aggre gate contributions from other honest participants within its cluster but will still be unable to determine what an indi vidual member contributed Based on clustering we propose an enhancement to fork the request path to fur
3. ing up the indices for routing necessarily compromises the privacy of application ownership for example one may want to maintain privacy with respect to owning the KaZaa appli cation Furthermore recursive trust rather than transitive trust is assumed in FTN Alice trusts Bob s content and Bob trusts Carol s content but Alice does not trust Carol s content In FTN privacy is achieved using source less and destination less random walk of a troubleshooting request initiated from a troubled machine but where the immediate last hop and next hop are recorded on each involved node during the random walk respective configuration data is gathered There are two key limitations of the current FTN solu tion The first one is that the gathered configuration data is in plain text though in aggregated form Some forms of collusion can give away privacy revealing statistics The second limitation is that the data integrity problem is not explored In this paper we address the first limitation by tailoring a homomorphic encryption scheme to scale with the FTN scenario Our design has the novel property that decryption shares can be dynamically assembled among the participants during the data collection phase with no need for a dedicated key sharing phase To explore the integrity problem we investigate and analyze the effectiveness of zero knowledge proof together with a branching solution where multiple branches are taken to gather the
4. We compute a lookup table of the values g g3 93 for approximately 27 different values of m1 m2 m3 Since our table contains one out of every sixteen possible values of m we can recover any given m using at most 16 successive exponentiations and table lookups or 37000 values using at most 37000 16 27 such operations In Figure 3 we present some experimental measurements for the amount of CPU time used in an actual recovery op eration of this scale Our test program makes use of the optimized finite field arithmetic routines in the MS bignum library We use elliptic curve groups over prime fields with field sizes indicated in Figure 3 All tests represent the av erages of ten trial runs and were performed on a 3 GHz Pentium machine 11 PROTOCOL EVALUATION We evaluated our protocol based on a real world friends network topology It is a snapshot of MSN instant messenger IM operational data from 2003 with 150 682 876 users ES CPU Time sec N T a f f i Ll f f 80 90 100 110 120 130 140 150 160 a Field Size bits 25 7 T 2 o D D 5 20t fe E o 2 v 15 f i f i f 80 90 100 110 120 130 140 150 160 Field Size bits Figure 3 CPU time and memory usage in an actual recovery operation The number of friends of an IM user has a median of 9 and an average of 19 which represents the upper limit of our cluster size 11 1 Robustness Against Passive Attacks
5. configuration data using a real world friends network topology We find that when the percentage of compromised nodes is moderate or small e g 1 or less our approach can effectively reduce the risk of malicious data injection attacks to nearly zero For the rest of the paper we first provide background on PeerPressure in Section 2 Then we state our attacker model in Section 3 In Section 4 we review the previous FTN solution and its limitations We then introduce a privacy preserving data aggregation protocol for FTN based on ho momorphic encryption and various enhancements and opti mizations Section 5 6 7 9 Section 8 addresses the data integrity problem We analyze our protocol overhead in Sec tion 10 Using real world instant messenger IM data we present a security evaluation of our design in Section 11 A brief review of related work is given in Section 12 followed by our conclusion Acknowledgments We are grateful to Josh Benaloh for providing insightful discussion and comments on an earlier draft of this paper and to the anonymous reviewers for their numerous helpful suggestions and corrections 2 BACKGROUND PEERPRESSURE The PeerPressure troubleshooting algorithm uses respec tive configuration data from peers to diagnose the anoma lous configuration entries on the troubled machines The op eration goes as follows PeerPressure first uses application tracing to capture the configuration entries and values th
6. entry values means that the troubleshooting machine only knows the hash of the most popular value of a top ranking root cause candidate entry returned by the PeerPressure diagnosis cf Section 2 and not the entry value itself In order to com municate the actual most popular values of the top ranking entries to the troubleshooter for the purpose of correcting misconfigurations we perform another round of queries us ing a Chaumian style mixnet 5 to protect the identities of the machines having those entries The second round uses the same clusters keyholders en trance and exit nodes as in the first round For each top ranking root cause candidate entry e the troubleshooter queries the network asking for participants whose value for entry e has a hash value equal to the known most popular hash value of entry e Those participants with the matching hash value and who helped in the first round convert their actual entry values to integers Ve using ASCII strings say and Ve will be their contribution in the second round All other participants set their contribution to 0 Similarly to the first round each cluster member first generates and dis tributes random shares of its second round contribution to every distinct cluster member Each cluster member then sums up all the shares that it has received and encrypts this subtotal m using the formula E m g g m where r is chosen randomly and g is the public key it used in
7. ger value and hence its validity is hardly possible to verify Nevertheless use of zero knowledge proof does reduce the number of nodes that can corrupt the tally from G the cluster s size to 2 per hop To further limit the impact of data injection attacks a 3Tnnocence level is defined as a metric in 15 to evaluate different privacy levels troubleshooter can send several requests to disjoint subsets of friends enabling each subset to carry out a branch of troubleshooting and examine the results returned by all branches to check if any one of the results is anomalous The troubleshooter can filter out maliciously tampered re sults if the majority of the branches do not encounter any compromised nodes We call this a multiple branch trou bleshooting strategy Guaranteeing the integrity of the return values requires that all samples must be gathered from the troubleshooter s own cluster and every participant must verify the validity of their contribution to the troubleshooter itself However there are several reasons that this requirement cannot be met First of all it would compromise the troubleshooter s anonymity Furthermore the PeerPressure algorithm needs 10 samples to be effective According to our IM topology 51 of users have fewer than 10 friends and hence must collect samples from their friends clusters Finally a trou bleshooter would easily determine the aggregate information of its immediate friends i
8. is therefore directly proportional to the number of bits needed to represent an element of G Traditional ElGamal encryption uses a group G equal to the multiplicative group of a finite field for which 512 bits per group element is considered necessary for minimal secu rity and 1024 bits for good security At these group sizes an FTN request consisting of 37000 encryptions would be about 4 8 MB in size for a 512 bit group or 9 6 MB for a 1024 bit group These estimates do not take into account any extra overheads which would be incurred by zero knowl edge proofs Using elliptic curve based ElGamal groups 19 we can achieve the same level of security using fewer bits It is estimated 20 that a 110 bit elliptic curve achieves crypto graphic security equivalent to that of a 512 bit finite field and a 139 bit elliptic curve is comparable to a 1024 bit finite field These group sizes result in FTN request sizes of 1 0 MB with 110 bit elliptic curves or 1 25 MB with 139 bit elliptic curves For the second round if we use 1024 bits per suspect entry and include the 20 top ranking suspect entries in the mixnet then the total size of each user s collection of encrypted en tries is 40 1024 bits per user for a final accumulated packet size of 1 MB assuming that 200 users participate in the mixnet Therefore the packet size is comparable to the first round The above first round and second round figures only ap ply to communications b
9. of E m and E mz2 without having to perform any decryption If one imagines that m and mz represent vote tallies then the homomorphic property means that anyone can add up encrypted votes but only those who know the secret key can decrypt the tally We now describe the procedure for performing parame ter aggregation within an FTN request In order to avoid concentrating the secret key in the hands of one party we mutate the secret key at each step of the random walk as described below We assume that the group G and the gen erator g are global constants which are built into the FTN client program 5 1 Initialization phase e The troubleshooter picks a pair of integers ro so at random and computes E m g g g where m is the troubleshooter s value for the entry either 0 or 1 e The troubleshooter stores the secret key so and for wards the encrypted value E m and the public key g to an available friend 5 2 Random walk phase e Assume that the i th node on the request path re ceives a public key of the form g 1 1 and an encryption E m g g ots1t 8i g of m where m represents the number of votes for 1 which have been accumulated so far in the random walk For convenience write s for so s1 8 1 e The node picks a new secret key s at random and computes the values ght g p g mete ge g g g The quantity E m g g amp t g is now a
10. successful data in jection attack as the ratio of failed troubleshooting Figure 5 compares the probability of successful data injec tion attacks with and without zero knowledge proof protec tion for different percentages of compromised nodes aver age path lengths and the number of branches the requester selects to send a troubleshooting request Based on Figure 5 we have the following observations e The risk of data injection is smaller when a single path is used than two branches This is because the multiple branch troubleshooting is only effective when the majority of the branches return an untampered result If only two branches are used as long as one of them encounters a compromised node the trou bleshooting fails since the troubleshooter cannot de termine which branch s result is valid and since more nodes are involved compared to using a single branch the risk of encountering a compromised node is in creased Therefore if the troubleshooter has only a Without ZKP 1 compromised node c A A Sos a a S 3 T 0 8 a a n S E ee eee Vv k y w OTR ES Le e SA Sosy 5 wa o g Te 057 a n S S 0 44 2 clusters per branch 2 4 clusters per branch ost y 6 clusters per branch 4 2 a A 8 clusters per branch Q L od 0 2 Q Srii a o 6 2 3 4 5 7 8 9 10 Number of branches to carry out troubleshooting With
11. trou bleshooter attack which forces the request to traverse a longer path to gather enough samples The iterative helper selection method 15 guarantees probable in nocence for all the cluster participants in the face of a successful troubleshooter attack while achieving a higher helping rate and the request propagation ter minates after only 2 clusters on average Then with fA participant is probably innocent if from the attackers point of view it appears no more likely to be a helper than not to be one With ZKP 1 compromised node 0 35 T T T 2 clusters per branch 4 clusters per branch 0 3 y 6 clusters per branch 7 re A 8 clusters per branch 0 25 Probablity of successful data injection 2 3 4 5 6 7 8 9 10 Number of branches to carry out troubleshooting With ZKP 0 1 compromised node 0 03 T T T T T c 2 clusters per branch 2 A 4 clusters per branch 3 0 025 v 6 clusters per branch A 8 clusters per branch u i g 0 024 E X A 0 015 7 b Boy 5 Y o L 5 oop y y 2 SN a SON S 0 005 me A g ee a NN 0 1 1 1 f meon 1 1 2 3 4 5 7 8 9 10 Number of branches to carry out troubleshooting Figure 5 The probability of a successful data injection attack when the troubleshooter sends out a request on multiple branches zero knowledge proof the risk of a successful data in jection attack is below 0 04 whe
12. Applications of Secure Electronic Voting to Automated Privacy Preserving Troubleshooting Qiang Huang Princeton University qhuang princeton edu ABSTRACT Recent work 27 15 introduced a novel peer to peer applica tion that leverages content sharing and aggregation among the peers to diagnose misconfigurations on a desktop PC This application poses interesting challenges in preserving privacy of user configuration data and in maintaining in tegrity of troubleshooting results In this paper we provide a much more rigorous cryptographic and yet practical solu tion for preserving privacy and we investigate and analyze solutions for ensuring integrity Categories and Subject Descriptors k 4 1 Computers and Society Public Policy Issues privacy General Terms Security Design Keywords Privacy Integrity Automatic Troubleshooting Homomor phic Encryption Zero Knowledge Proof 1 INTRODUCTION Recent work 27 15 introduced a novel and legal peer to peer application that leverages content sharing and ag gregation among the peers to diagnose misconfigurations on a desktop PC automatically The diagnosis is based on the PeerPressure troubleshooting algorithm 28 The key intu ition of PeerPressure is that misconfigurations of a PC are likely anomalous when compared with the respective config urations from other PCs having the same setting Hence in a peer to peer setting the troubled PC collects respective conf
13. at are touched by the abnormal execution of the application under troubleshooting These entries are misconfiguration suspects Next from a sample set of helper machines for each suspect entry e PeerPressure obtains the number Me of samples that match the value of the suspect entry the cardinality Ce the number of distinct values for this entry among the sample set and the most popular value for the entry PeerPressure uses these parameters along with the sample set size and the number of suspect entries to calcu late the probability of a suspect entry being the cause of the symptom Pe w Te AES where N is the number of samples and t is the number of suspects The top ranking entries with regard to this probability are diagnosed as the root cause candidates Then the troubleshooting user can use the collected most popular values for corrections The sample set can be obtained either from a database of con figuration snapshots collected from a large number of user machines or from a peer to peer troubleshooting community such as the one described in this paper PeerPressure has been shown to be effective in troubleshooting 28 3 ATTACKER MODEL AND SECURITY OBJECTIVES 3 1 Security Objective The information being communicated in FTN is PC con figuration data We denote the complete set of configuration data on a machine as D A certain subset of D contains identity revealing information such as usernames and cook ies a
14. b Sg 2 i r2 w rd2 r w rd bo g 5 4 e3 w rs Figure 2 Zero knowledge proof that x y g g M where M 1 or M g suffices to prove that an encrypted message g g g7 satisfies either m 0 or m 1 without revealing which of the two is the case In general for a given value of C the proof requires transmitting C 1 quadruplets ai bi di ri and expands the bandwidth requirement of the protocol by a factor of 2C 2 If we transmit a vector of C2 encrypted messages as de scribed in Section 7 then we can transmit zero knowledge proofs of validity for each component of a contributed vec tor and apply the Shamir secret sharing scheme as described in 10 to pick the challenges c in such a way that the va lidity of the vector as a whole can be verified i e no more than one component contains a vote The total cost in bandwidth remains 2C 2 times that of the original unval idated vector since the O C2 cost factor is already reflected in the size expansion of the original unvalidated vector Even if this bandwidth cost is too high to allow all of the first round data to be validated it still makes sense to perform retroactive validation of the top ranking entries which appear in the second round Section 9 since the integrity of these entries is especially crucial to the success of the troubleshooting request 8 3 Zero Knowledge Proofs and Clustering Zero knowledge validity proofs ca
15. cryption is also used in 4 to allow a com munity of users to compute a public aggregate of their data without exposing individual users data Similarly the well known secure multiparty sum protocol enables aggregation without revealing individual private contributions How ever these protocols rely on a public bulletin board and a beacon for random bits and work only when there is a known space of choices for the data In our case the space of possible values for a configuration entry is unknown We combine those two techniques to address both the passive and the active attacks efficiently and we extend them to support counting the number of distinct values in a set as well as revealing the most popular value while keeping the individual contributions private Our problem of privacy preserving parameter aggregation shares much similarity to the problem of secure and privacy preserving voting 13 2 9 8 with a few differences First voting requires voters to be authenticated by a centralized authority such as the government Second our protocol has an additional requirement of participation privacy oth erwise privacy of application ownership is compromised Third most voting scenarios involve a fixed limited number of voting chances while our troubleshooting problem does not Finally the scaling requirements of FTN are differ ent from those of national elections while national elections have at most a few dozen ballot ite
16. e which can then be used to repair the sick machine The effect of this protocol is to implement a simple thresh old re encryption mixnet guaranteeing data privacy without providing any integrity checks on the decryption process The techniques of Section 8 already protect the integrity of the first round hash values which determine the PeerPres sure diagnosis and any node attacking the troubleshooter with a data injection attack will not know the number of helpers and hence at worse can only induce random faults in the second round output which can be easily recognized and remedied by repeating the query cf Section 11 3 A cluster exit participating in a troubleshooter attack cf Section 3 2 could in principle fabricate all of the public key data originating in the request path and collude with the keyholder to decrypt each member s subtotal Together with the troubleshooter they will be able to find out the aggregate value of Ve within their cluster However they still cannot determine which individual member contributed Ve if there is any The forking mechanism of Section 11 2 can further reduce this threat of troubleshooter attacks 10 RESOURCE USAGE 10 1 Bandwidth Overhead If we assume the parameter values given at the end of Section 7 then a median first round troubleshooting request of 1171 entries requires handling about 37000 ElGamal en cryptions each one consisting of two elements of G The bit length of a request
17. ed subtotal to the cluster exit In addition the cluster entrance modifies the old encrypted total to use the new public key and adds the old total into his subtotal using the homomorphic property before sending the combined total to the cluster exit 4 Exiting the cluster The cluster exit uses the homo morphic property of the encryption scheme to sum up the received subtotals from all participants and then sends this encrypted total along with the new pub lic key to the next recipient All other aspects of the protocol remain unchanged On the return trip the encrypted result is relayed to each keyholder and decrypted in the same manner as in Section 5 7 DEALING WITH UNKNOWN CARDINALITIES We now consider the case where the set of possible val ues for the entry is unknown As in 15 this problem can be solved by using a hash function h to map the values of the entry into a small numerical range 1 C The FTN nodes will then maintain the number of entry values m m2 mc that hash to each of the C values in the troubleshooting request In other words the voting scheme must maintain a vector of tallies instead of a single tally There are two ways that this can be achieved 1 Pick a predetermined list of generators g1 g2 9c and encode a vector m m1 mc as 2 Encrypt each value m separately and transmit a list of encryptions E m1 E mc instead of a single encryption Option 1 increases t
18. ering for both cases When there are 0 1 com promised nodes querying 4 branches in the second round reduces this threat to nearly zero 12 RELATED WORK There is much related work in the area of anonymiza tion The random walk approach is also used in FreeNet 7 and Crowds 23 FreeNet is a distributed anonymous in formation storage and retrieval system Crowds provides anonymous web transactions Other anonymization systems are based on Chaum s mizes 5 which serve as proxies to provide sender receiver unlinkability through traffic mixing Onion routing 14 extends the mixes with layers of onion style pre encryptions Tarzan 12 implements the mix idea using a peer to peer overlay and provides sender anonymity and robustness to the mix entry point All of the above anonymization techniques address point to point communications However our protocol in FTN involves one to many communication in the form of broad casting a troubleshooting request to peers This broad cast should be limited according to the friend relationships which is more naturally implemented using a peer to peer overlay Further our recursive trust model requires that the configuration data be transmitted between friends Fully anonymous configuration data arriving over a mix network could not be trusted to be authentic as only friends can be trusted not to contribute false and potentially harmful information about their configurations Homomorphic en
19. etween cluster entrances cluster ex its and keyholders and not to intra cluster communications within a single cluster The latter transmissions do not need encryption because the cluster based secure multi party sum protocol already provides sufficient protection against pri vacy attacks Accordingly the bandwidth requirements for communication within a single cluster remain the same as in 15 10 2 CPU Overhead Recall from Section 7 that the tallying phase requires the troubleshooter to recover for each suspect value a list of C2 32 vectors each of the form m m1 m2 m3 where m ranges from 0 to the maximum number of participants in an FTN request If we assume no more than 255 participants in a request then each of the values m ranges from 0 to 255 and we must recover the vector m m1 m2 m3 given the quantity g1 g3 g3 this recovery must be done 32 times per suspect value or a total of 37000 times assuming a median request size of 1171 candidate suspects We are mainly concerned with the amount of CPU time needed to recover these 37000 vectors since every other computation required by the protocol is at least an order of magnitude less time consuming than the recovery step Since there are 274 possible values for each m and 37000 different vectors m to recover it would be too slow to re cover the vectors m by brute force trial and error Instead we use the baby step giant step search algorithm from Sec tion 5
20. eudorandom hash functions to determine the val ues of the random inputs used in the interactive protocol 8 2 Proofs of Validity We now address the problem of proving validity of votes By validity we mean that each vote is syntactically valid in the sense that it increments at most one component s to tal by one We do not attempt to guarantee that a valid vote accurately reflects the actual internal configuration of the machine since such a guarantee is impossible to achieve Therefore the goal here is to allow participants to prove that their contribution E m represents the encryption of a mes sage m m1 mc where at most one of the values m is 1 and the rest are 0 without revealing the values them selves The generic construction of 10 provides a method to transform the Chaum Pedersen zero knowledge equality protocol of Figure 1 into a zero knowledge proof of validity for encrypted ballots of this form at the cost of increasing communications complexity by O C For example if we assume for simplicity that C 1 then the interactive proto col of Figure 2 or its non interactive Fiat Shamir analogue Prover Verifier IfM 1 EM g w r d EZ w ro d2 EZ r g r g rs rs xy y g ol yarn gt ai g a ai g d a b 71s Y Ws 12L b g Bi by g x az g a2 ga c d d2 d 1b be g 5 be g 4 2 a2 02 ai gir ag g r dy d 2 d d2 c d di c do LO l
21. f a single cluster were used Involv ing several clusters in the troubleshooting process renders the troubleshooter attack harder to launch We assume that there is only an occasional possibility for a trusted friend s machine to get compromised and hence the percentage of compromised nodes is moderate or small e g 1 or less To quantitatively study the effect of using zero knowledge proof and multiple branch troubleshooting on mitigating the data injection attack we performed simu lations on our IM topology under two different attacker sce narios in which we assumed the percentage of compromised nodes was 1 and 0 1 respectively For each attacker sce nario we randomly marked the nodes being compromised We then randomly picked 1000 honest nodes to send a trou bleshooting request to 1 to 10 distinct branches depend ing on the number of friends they may have We simulated the troubleshooting cases when different numbers of clusters were involved on each request path With zero knowledge proof we marked the result returned by one branch as being corrupted only if the request hit a compromised entrance or exit on that branch otherwise if the request encountered any compromised node during its propagation we marked the result as being tampered A requester s troubleshooting was considered unsuccessful if half or more of its branches returned a tampered result Then under each attacker sce nario we computed the probability of a
22. from friends To ensure integrity we combine the selective use of zero knowledge proofs together with a branching solution where multiple branches are taken to gather the configuration data using real world friends network topology We find that when the percentage of compromised nodes is moderate or small e g 1 or less our approach can effectively reduce the risk of malicious data injection attacks to nearly zero 14 REFERENCES 1 R Agrawal and R Srikant Privacy Preserving Data Mining In Proceedings of SIGMOD 2000 2 J Benaloh Verifiable Secret Ballot Elections Ph D thesis Yale University 1987 3 J Benaloh and M Yung Distributing the power of a government to enhance the privacy of voters In Proc 5th ACM Symposium on Principles of Distributed Computing PODC 86 pp 52 62 New York 1986 4 J Canny Collaborative Filtering with Privacy In IEEE Security and Privacy 2002 5 D Chaum Untraceable electronic mail return addesses and digital pseudonyms Communications of the ACM 24 2 pp 84 88 1981 6 D Chaum and T Pedersen Wallet databases with observers In Advances in Cryptology Crypto 92 LNCS 740 pp 89 105 1993 7 I Clarke O Sandberg B Wiley and T W Hong Freenet A distributed anonymous information storage and retrieval system In Proc International Workshop on Design Issues in Anonymity and Unobservability 2001 8 L Coney J L Hall P L Vora D Wagner T
23. g 2 a random nonce ReqID identifying the request 3 the value distribution or his togram of each suspect entry e that is a list of values that e can take and the vector me i counting the occur rences of each value i of e from the sample set The goal of the FTN protocol is for a sick machine to obtain the ag gregate value distributions for all suspect entries With the value distribution of each entry e the sick node can extract the cardinality the number of matches and the most popu lar value to carry out the PeerPressure diagnosis Section 2 To preserve source anonymity the troubleshooting mes sage is designed to be ownerless and the value distribution field is randomly initialized by the requester 4 2 Parameter Aggregation Through a Source less and Destination less Random Walk The FTN is an unstructured peer to peer network where overlay links are made only to trusted friends machines Search for samples and parameter aggregation is integrated in a source less and destination less random walk on the friends overlay network topology The sick machine first establishes a secure channel with an available friend chosen at random and sends this friend the troubleshooting request To avoid routing loops or double counting the friend responds with an acknowledgment only if it has not already seen the ReqID of the arriving request A friend that receives a troubleshooting request and runs the application under troub
24. g these techniques directly is that they assume a trusted base station is available to compute the resilient data aggregation function this is infeasible in the peer to peer FTN context where data contributions must be kept confidential from the other participants The authors of SIA 22 also presented a set of techniques for secure information aggregation in sensor networks The integrity of information aggregation is achieved essentially through authentication which is identity revealing In FTN we cannot do the same because of privacy concerns 13 CONCLUSIONS In this paper we tackle the key security challenges in Friends Troubleshooting Network preserving the privacy of aggregating peer configuration data and ensuring the in tegrity of troubleshooting results To guarantee data privacy we apply the asymptotically optimal homomorphic encryption scheme from 9 and tai lor it to scale with the FTN scenario Our design has the novel property that shares of the secret key are assembled in parallel with the encrypted data Although the use of ho momorphic encryption adds some computational complex ity the only additional user visible cost of the protocol is bandwidth which only has to be paid by the minority of participants who are routing nodes or keyholders The com putational resources required by our scheme are practical and represent realistic commitments for a troubleshooting network in which one responds to direct queries
25. he complexity of the tallying phase of the protocol instead of recovering a single value m given g the troubleshooter now has to recover an entire vector m mi mc given only the single value Tey ge Assuming each coordinate m has maximum size t the re covery of m will take O t time using the naive brute force search method or O t if the baby step giant step method of Section 5 is used For this reason Option 1 can only accommodate values of C up to about C 4 before the computational overheads become prohibitive Option 2 is less computationally taxing because the com putational overhead increases only linearly in C instead of exponentially However the size of each packet also in creases by a factor of C under Option 2 Since we want to keep bandwidth utilization to a minimum we use a hybrid scheme wherein we transmit a list of encryptions of length Cz as in Option 2 but each element of that list in turn encodes a short vector of length C as in Option 1 Even a modest value of C such as 3 will reduce the bandwidth overhead by a factor of 3 In 15 the authors propose using a range of 96 values per entry partitioned into six independent hashes of 16 values Prover Verifier x y g 9 w Z chosen randomly a b g g T lt c Z chosen randomly z z w csi ay g ar rz 2 g by Figure 1 Zero knowledge proof that y x given g and g each Using these pa
26. he honest exit in the first round since they are unaware of the actual number of helpers inside the cluster However since every participant contributes an encrypted subtotal in the second round and the mes sages are concatenated they may infer that no future hops are involved by counting the number of encrypted messages contained in the reply returned by the honest exit in the second round To prevent the troubleshooter from gaining extra information in such cases we require the honest exit to randomly select a number of subtotals which sum up to 0 encrypt them and add the encrypted messages into the second round reply if it does not forward the request due to Py or it encounters a dead end situation Therefore with two exits to fork the request path the troubleshooter s un certainty becomes Htwofork LEH Qi EENE EN G Ga1 wA G G 1 Hiaeat 2 ER a lt Hideai COED Hiaeat The forking strategy increases the troubleshooter s uncertainty by a factor of greetert E 1 g Figure 4 shows the ratio of a ae and pare for different values of C and G The ee the ratio ou A 1 the more robust the system is against the troubleshooter attack If a compromised host launches a successful troubleshooter attack by colluding with both exits together they will be x gt Isi AK ze ta Bx acA se A 2S 0 95 Ee ee S S aut ae oS a v Y at oO y Vv 63 0 97 y yY Zo P
27. i pant represents its own contribution using the vector format me t The contribution of the cluster entrance includes the aggregate value distribution from the previous hops Mem bers who do not run the application or who choose not to help according to P will contribute the all zeroes vector Members who help will set the vector element correspond ing to their value to 1 and 0 s for the rest The cluster entrance then initiates a secure multi party sum procedure that blends individual cluster member s contributions into an aggregate that encapsulates the contributions from both the cluster and the past hops A separate cluster member other than the entrance is selected as the cluster exit for receiving the aggregate With probability pY where Vp is the number of helpers in the cluster the exit further prox ies the request to one of its friends chosen at random which becomes the cluster entrance of the next hop 4 4 Weaknesses of Previous Design and Our Motivations to Use Homomorphic Encryp tion In the previous design the gathered configuration data is in plain text A curious entrance and exit can launch a passive attack merely by sharing what they know to com promise privacy of the cluster as a unit In this paper we use homomorphic encryption Section 5 to encrypt each in dividual s contribution which robustly guarantees privacy under the passive attacker model The previous design does not address compromised
28. iguration data from the peers Then the anomalous looking configuration entries on the troubled machine are diagnosed as misconfigurations and the most popular con figuration values from the peers are used as the correction values Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page To copy otherwise to republish to post on servers or to redistribute to lists requires prior specific permission and or a fee CCS 05 November 7 11 2005 Alexandria Virginia USA Copyright 2005 ACM 1 59593 226 7 05 0011 5 00 David Jao Microsoft Research davidjao microsoft com Helen J Wang Microsoft Research helenw microsoft com This application poses interesting challenges in preserv ing privacy of user configuration data and in maintaining integrity of troubleshooting results since configuration data often contain privacy sensitive information and a peer is not always trustworthy To this end the authors of 27 15 pro posed a Friends Troubleshooting Network FTN which is an unstructured peer to peer network where a link between two machines represents friendship of their owners and the two machines trust each other s content being exchanged A structured peer to peer network is unsuitable because build
29. leshooting only becomes a helper with probability Pa A helper needs to update the trou bleshooting request For each suspect entry e the helper increments me i where i is its own value for e Then with a probability of forwarding Py 1 1 N the helper prox ies the request to one of its friends where N is the num ber of samples needed otherwise it becomes the last hop This probabilistic proxying makes routing entirely history less Nodes that do not help always forward the request to their friends This results in N helpers being involved on average Each node on the forwarding path must record the ReqID along with the previous and next hop friend The last hop node waits for a random amount of time then sends the reply back to the previous hop The reply follows the request path back to the sick machine The sick machine first subtracts the random initialization from the value distributions then it performs PeerPressure diagnosis 4 3 Clustering If a helper contributes its relevant configuration state di rectly its previous and next hop may gossip to determine its configuration information The previous design addressed the gossip attack via a cluster based multiparty secure sum protocol When a node receives a troubleshooting request instead of contributing to the request individually it forms a trou bleshooting cluster from its immediate friends The initiat ing node serves as the cluster entrance Each cluster partic
30. ll be much smaller than P Obviously the more forks we use the less likely that a troubleshooter is able to successfully launch the attack since it has to collude with more exits To quantitatively measure the uncertainty of the malicious troubleshooter about whether the decrypted tally is from its own cluster or has been mixed with contributions from future hops we use an information theoretic metric based on Shannon s definition of entropy 25 In the ideal situ ation where there are no colluding participants the trou bleshooter only has the information that under probability 1 Py the reply only contains the aggregate data values from its own cluster where Py is the average probability of forwarding the request from one cluster to another The troubleshooter s uncertainty can be quantified as Hideat Py log Py 1 Pf log 1 Py When there are C col luders in the participants the troubleshooter s uncertainty is reduced to Anofork g 0 1 S Hideal S52 Hideat However if two exits are selected to fork the troubleshooting path the troubleshooter cannot infer whether future hops are involved unless it colludes with both exits If only one of the exits colludes with the troubleshooter then under prob ability Py the honest exit will forward the request to future hops We note that the troubleshooter and the colluding exit cannot determine if future hops are involved from the reply returned by t
31. ll see in this section it is pos sible to combine the clustering enhancement together with encrypted vote tallying without making any major changes to either scheme The clustering procedure requires four steps 1 Random share generation and distribution Each clus ter member generates G 1 random shares for its con tribution vector where G is the cluster size and dis tributes each share to a distinct cluster member The shares are signed with the sender s public key Note that these shares do not need to be encrypted since no proper subset of the shares reveals any information about the value of the vector 2 Cluster exit election This step proceeds unchanged except that in addition to selecting a cluster exit a number of keyholders are also selected typically one keyholder per cluster will suffice The cluster entrance broadcasts the first component g of the encrypted message to each keyholder Each keyholder in turn generates a secret key s as in the basic aggregation protocol and unicasts the quantities g and g to the cluster entrance The cluster entrance then cal culates the new public key using the values g and broadcasts the new public key g t to the cluster where g is the old public key it received from the pre vious hop 3 Unicast subtotal to the cluster exit Each cluster mem ber sums up all the shares it has received encrypts this sum with the new public key and unicasts its en crypt
32. milar to that of a secure elec tion in both cases we want to generate an accurate tally of the number of participants who voted 1 without revealing who contributed a 1 and who contributed a 0 In order to accomplish encrypted data collection we employ a type of homomorphic voting system with threshold decryption 2 3 The particular scheme we use is a simplified version of the ElGamal based election scheme of Cramer Gennaro and Schoenmakers 9 this scheme was chosen over the oth ers because of its optimality with respect to communication complexity In this paper we modify the CGS protocol so that shares of the decryption key can be aggregated in par allel with the participant recruitment and data collection steps To our knowledge this represents the first threshold decryption design which performs secret key sharing and data aggregation simultaneously Let G be a mathematical group which is generated by an element g G of finite order Given a public key gf where s is secret an ElGamal encryption E m of a message m consists of a pair g g g where r is chosen randomly by the machine performing the encryption A crucial prop erty of ElGamal encryption is that it is homomorphic given two encryptions E mi g g g E m2 g g g of m and mz respectively the encryption E mi m2 g i 172 gta tr2 s g it of m m2 can be computed by multiplying the compo nents
33. ms and turnaround times of up to an hour are perfectly acceptable FTN needs to ac commodate several thousand ballot items and turnaround times need to be minimized As an alternative to full blown homomorphic encryption the authors of 17 present a voting system based on cryp tographic counters that only support a restricted set of en crypted increment and decrement operations Although the concept of cryptographic counters is potentially useful in FTN the scheme given in 17 is not a good fit for FTN be cause of the differing scaling requirements In particular the use of an encryption function based on quadratic residuosity and the need for L rounds of communication with a public bulletin board where L is the number of participants mean that the bandwidth usage of this scheme exceeds that of our elliptic curve based scheme Likewise recent work of Kissner and Song 18 enables private computation of a very general class of set operations but does so at the cost of sacrificing the bandwidth savings afforded by elliptic curves Wagner in 26 addresses the problem of compromised nodes in the context of sensor networks and describes how resilient aggregation techniques can be used to limit the amount of damage a compromised sensor can inflict upon the aggregate results of the network Using resilient aggre gation in FTN may help when the amount of redundancy within peers data contributions is large The main difficulty in applyin
34. n be combined with clus tering by having each cluster member encrypt and sign the N 1 shares that it distributes to each of the N 1 other members of the cluster Instead of relying on the shared keyholders encryption key we require each member to gen erate a new encryption key from scratch in order to prevent malicious keyholders from colluding with the cluster exit and decrypting the encrypted values later Since this en cryption key is different from the encryption key used in the homomorphic tally the cluster member should also include a zero knowledge proof that the encrypted share has the same value as the unencrypted share using the zero knowl edge equality proof of Figure 1 The recipients of each share should then verify that the equality proof is correct if it is incorrect then they should publish both the signed original share and the signed encrypted share and proof so that the other cluster members can see that the proof is incorrect After verifying the equality proofs the recipients forward the encrypted shares to the cluster exit and the cluster exit combines the encrypted shares using the homomorphic prop erties of the encryption scheme to recover an encrypted ver sion of the original raw vote The owner of this vote can then provide a zero knowledge proof as in Section 8 2 to demon strate that this encrypted value represents a valid vote 9 SECOND ROUND QUERY The use of a hash function to digitize all
35. n the percentage of compromised nodes is 1 which is further reduced to 0 005 0 002 and nearly 0 if the request is sent to 4 6 and 8 branches respectively When 0 1 nodes are compromised the risk is below 0 004 and further reduced to nearly 0 with 4 or more branches e Finally issuing the request over multiple branches in curs a higher communication overhead than using zero knowledge proof since we only validate the few top ranking entries that make it through to the second round Therefore a single path is preferable if the size of the troubleshooting message is large e g the number of suspect entries is large and the bandwidth assigned for troubleshooting is limited When the set of possible values for suspect entries is un known we use a second round query to find out the most popular values of the top ranking root cause candidates Section 9 However the contribution of participants in the second round is the unknown entry value that we need to query for and hence its validity cannot be verified A compromised host may thus contribute a false entry value to corrupt the aggregate sume The consequence is that during the process of recovering the actual entry value the division will result in a non integral value or the resulting string will not be intelligible e g contains non ASCII char acters In this way the troubleshooter is able to recognize the corruption and hence discard the invalid result The troubleshoote
36. national Workshop on Peer to Peer Systems IPTPS 2004 H J Wang J Platt Y Chen R Zhang and Y M Wang Automatic Misconfiguration Troubleshooting with PeerPressure In Proceedings of OSDI 2004
37. nd we denote this subset as D A canonicalizer first fil ters any user specific entries into a canonicalized form The remaining set of configuration data D D D may con tain information that compromises privacy when linked with user identity Some examples of such information are URLs visited and applications installed Our privacy objective is to protect all peers privacy by anonymizing such privacy sensitive information in Dr of course D must never be revealed In addition to the configuration data we aim to protect the identities of the sick machine i e the trou bleshooter and the peer helpers In some cases the mere fact that one is running a particular application may be privacy sensitive Proper roll back mechanisms are needed if a root cause candidate is not actually the root cause when the correction does not remove the sick symptom Finding all identity revealing entries is an open research question Aside from the privacy of troubleshooting users we also aim to protect the integrity of their contributed configura tion information since a compromised friend may lie about the configuration state it has leading to incorrect trou bleshooting results 3 2 Attacks We assume a friendly operational environment in FTN where attackers are simply curious friends together with an occasional compromised machine that has not yet been repaired by its user Curious friends may launch passive attacks to ob
38. ng In Proceedings of the 11th USENIX Security Symposium pp 339 353 2004 J Katz S Myers and R Ostrovsky Cryptographic Counters and Applications to Electronic Voting In Advances in Cryptology Eurocrypt 2001 LNCS 2045 pp 78 92 2001 L Kissner and D Song Privacy Preserving Set Operations In Advances in Cryptology Crypto 2005 LNCS 3621 2005 N Koblitz Elliptic curve cryptosystems Math Comp 48 1987 pp 203 209 A Lenstra and E Verheul Selecting Cryptographic Key Sizes J Cryptology 14 2001 no 4 pp 255 293 T Pedersen A threshold cryptosystem without a trusted party In Advances in Cryptology Eurocrypt 91 LNCS 547 pp 522 526 1991 B Przydatek D Song and A Perrig SIA Secure Information Aggregation in Sensor Networks In Proceedings of ACM SenSys Nov 2003 M K Reiter and A D Rubin Crowds Anonymity for Web Transactions In ACM Transactions on Information and System Security Nov 1998 A Shamir How to share a secret Comm ACM 22 1979 no 11 pp 612 613 C E Shannon A Mathematical Theory Of Communication Bell System Tech J 27 1948 pp 379 423 623 656 D Wagner Resilient Aggregation in Sensor Networks In ACM Workshop on Security of Ad Hoc and Sensor Networks SASN 04 Oct 2004 H J Wang Y C Hu C Yuan Z Zhang and Y M Wang Friends Troubleshooting Network Towards Privacy Preserving Automatic Troubleshooting In Proceedings of the 3rd Inter
39. nodes In the real world friends machines might be occasionally compromised leading to active attacks against the FTN protocol in the forms of the troubleshooter attack and the data injection attack Section 3 2 Homomorphic encryp tion can be combined together with the clustering strategy Section 6 to mitigate the troubleshooter attack We also propose a further enhancement by forking the troubleshoot ing path to make the troubleshooter attack less productive Section 11 2 The previous design does not provide any protection for the integrity of the troubleshooting result A compromised host may hence launch a data injection attack by contribut ing false configuration information leading to incorrect trou bleshooting results The use of homomorphic encryption en ables verification of data validity via zero knowledge proof ZKP which together with our proposed multiple branch troubleshooting strategy largely reduces the risk of success ful data injection attacks Section 11 3 5 PRIVACY PRESERVING DATA AGGRE GATION USING HOMOMORPHIC EN CRYPTION We now present an encryption scheme for parameter ag gregation which provides robust guarantees of data privacy through the use of secure electronic voting protocols For simplicity we assume that the entry e takes on only two possible values e g 0 or 1 and that clustering is not used later we will explain how to remove these restrictions Note that this scenario is very si
40. nvolved would be 3 on average the same as an Innocence level of J 2 in 15 Therefore a cluster member can set the probability to help based on its cluster size to the value corresponding to I 2 in order to gather 10 samples We simulated our FTN routing protocol with the forking strategy on the static IM topology configured with probabil ity of forwarding Py 0 7 We randomly picked 100 start ing nodes as the requester and set the corresponding Pph according to the Innocence level of J 2 in order to obtain approximately 10 samples In our simulation we imposed an upper bound of 36 on the cluster size to limit the intra cluster communication overhead When we set Ps 0 7 the median number of clusters involved is 3 and the me dian number of nodes involved is 70 In occasional cases the number of clusters involved is more than 16 which incurs a high communication overhead The initiator may send a message via the request path to cancel the request propaga tion after an appropriate timeout period has elapsed The timeout should be chosen randomly and kept private to the initiator itself Another friends network characteristic that is of interest to the forking mechanism is the percentage of clusters that can only find one exit In such cases the cluster mem bers will not contribute any sample in order to prevent the troubleshooter attack The request can still be prop agated through the unique exit Encountering such node
41. ode in order to make its way back to the trou bleshooter This reliance on honest behavior means that the encryption scheme given here does not protect a user s privacy against active attacks such as the troubleshooter at tack Instead the purpose of the encryption is to protect users from passive attacks where we assume the attackers are curious friends who log and share whatever data they acquire in the course of responding to a legitimate request but do not alter results or fabricate false data in any way In the passive attack model 9 Theorem 2 combined with the Diffie Hellman assumption shows that no coalition can gain any information about individual votes except for that which is implied by the votes which are cast by the coalition and the final tally Another advantage of using homomorphic encryption is that we can require the participants to give zero knowledge proofs of data validity in order to show that all protocol op erations are legitimate and that they are not manipulating the tallies in any way Since zero knowledge proofs inter act with the protocol elements described in the next few sections we defer discussion of the topic to Section 8 6 CLUSTERING In addition to protecting users against gossip attacks the cluster based secure multi party sum protocol of 15 pro vides some protection against troubleshooter attacks by lim iting the extent to which any single machine s information can be isolated As we wi
42. out ZKP 0 1 compromised node 2 clusters per branch 4 clusters per branch 4 V 6 clusters per branch A amp clusters per branch 4 S N a 2 h ad a Probablity of successful data injection Number of branches to carry out troubleshooting limited number of friends e g less than 4 it should use a single path for troubleshooting When the percentage of compromised nodes is mod erate or small e g 1 or less the multiple branch troubleshooting strategy works particularly well to gether with zero knowledge proof since it is less likely that over half of the branches will be compromised The combination is especially efficient at reducing the risk of data injection when a large number of clusters are involved on a branch For example if there are 1 compromised nodes and the path length is 8 the prob ability of successful data injection attacks on a single path troubleshooting request without zero knowledge proof protection is 0 694 the probability is reduced to 0 14 if zero knowledge proof is enabled and issuing the request over 6 branches further reduces the risk to 0 058 8 branches to 0 02 and 10 branches to 0 01 In general involving more clusters increases the risk of successful data injection attacks Although a smaller path length is desirable by the troubleshooter the peers in FTN tend to choose smaller values of Pp to achieve a higher privacy level in case of a
43. owards a Privacy Measurement Criterion for Voting Systems Poster Paper National Conference on Digital Government Research May 2005 9 R Cramer R Gennaro and B Schoenmakers A secure and optimally efficient multi authority election scheme In Advances in Cryptology Eurocrypt 97 LNCS 1233 pp 103 118 1997 10 R Cramer I Damgard and B Schoenmakers Proofs of partial knowledge and simplified design of witness 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 hiding protocols In Advances in Cryptology Crypto 94 LNCS 839 pp 174 187 1994 A Fiat and A Shamir How to prove yourself Practical solutions to identification and signature problems In Advances in Cryptology Crypto 86 LNCS 263 pp 186 194 1987 M J Freedman E Sit J Gates and R Morris Introducing Tarzan a Peer to Peer Anonymizing Network Layer In IPTPS 2002 T Fujioka T Okamoto and K Ohta A Practical Secret Voting Scheme for Large Scale Elections In Proceedings of Auscrypt Dec 1992 D M Goldschlag M G Reed and P F Syverson Onion Routing for Anonymous and Private Internet Connections In CACM Feb 1999 Q Huang H J Wang and N Borisov Privacy Preserving Friends Troubleshooting Network ISOC NDSS 2005 San Diego CA M Jakobsson A Juels and R Rivest Making Mix Nets Robust for Electronic Voting by Randomized Partial Checki
44. r may still send the second round request to a different subset of friends to seek the entry value Our multiple branch troubleshooting strategy increases the robustness of the second round query against tamper 1 compromised node 2 clusters per branch 4 clusters per branch V 6 clusters per branch 7 A 8 clusters per branch 2 N a a 2 b t ES a gt B e T 4 a es Se a 5 6 7 8 9 10 2 3 4 Number of branches to carry out troubleshooting 1 Probablity of successful tampering in 2nd round 0 1 compromised node jp 2 clusters per branch 4 clusters per branch K Vv 6 clusters per branch A 8 clusters per branch e i E o i Peers ae a ee ET a 2 3 4 5 6 7 9 10 Number of branches to carry out troubleshooting o Probablity of successful tampering in 2nd round 1 Figure 6 The probability that attackers may suc cessfully tamper with the entry value in the second round ing By querying multiple branches in the second round the troubleshooter is able to obtain the actual entry value as long as one of the branches does not encounter any com promised node We use our simulation data to calculate the probability that attackers may successfully tamper with the entry values returned by all branches in the second round as shown in Figure 6 Our approach efficiently reduces the threat of second round tamp
45. rameters the probability of any root cause candidate triggering a simultaneous collision in all six hash functions is under 5 for typical troubleshooting re quests If we encode this list of 96 values using C1 3 and C2 32 then a median troubleshooting request consisting of 1171 suspect entries requires maintaining a list of 37000 atomic encryptions each representing three tallies 8 ZERO KNOWLEDGE PROOFS 8 1 Proofs of Decryption Certain portions of the protocol can take advantage of zero knowledge proofs of validity to ensure that the encrypted data values are not tampered with during aggregation For example during the decryption phase each keyholder must divide the public key by some quantity x g and the en crypted portion of the data by the quantity y g If the previous and next nodes wish to verify that the keyholder is performing a legitimate decryption operation and not for example altering the data values they may combine their shared knowledge of the pre decryption and post decryption packets to infer what values of x and y were used in the de cryption phase The keyholder can then execute the interac tive Chaum Pedersen proof of knowledge protocol 6 given in Figure 1 to prove that the values of x and y satisfy the relation z g and y g without revealing his private key s For practical applications the Fiat Shamir heuris tic 11 is used to implement the protocol non interactively by using ps
46. romised host may lie about the application it owns and the configuration state it has or tamper with other peers contributions leading to incorrect troubleshooting results Another form of passive attack that a compromised node may launch is non participation A compromised host may refuse to propagate the troubleshooting request it receives or drop the response messages causing the troubleshooting communication path to fail However this attack has no effect on our security objective since by merely not partici pating the attacker does not inject false configuration infor mation into the response and gains no private information about other peers The troubleshooter may still seek help from other honest friends after a suitable timeout on the non participating node has elapsed Therefore we do not address this attack in our paper 4 PREVIOUS DESIGN In this section we briefly review the previous FTN pro tocol design 27 15 and its weaknesses 4 1 Creating a Request on the Sick Machine A sick machine first filters out the identity revealing en tries from the suspects This filtering step prevents informa tion compromise via entry names and in practice does not hurt the performance of the PeerPressure algorithm since identity revealing entry names are unlikely to be a root cause of the symptom Then it creates a troubleshooting request which contains 1 the name of the application executable that is under troubleshootin
47. s on the forward path will increase the communication over head without helping to gather any samples According to our computation from the 81790827 IM users with at least 3 friends we excluded those nodes with one or two friends since the former will not be included on the FTN forwarding path while the latter will not form a cluster the probability of routing to such nodes is 0 0025 Therefore the average number of hops that need to be traversed before reaching such a node is 1 0 0025 400 which far exceeds the num ber of hops that need to be traversed with the FTN protocol typically under 10 11 3 Mitigating the Data Injection Attack Ensuring the integrity of the troubleshooting result is chal lenging considering the occasional possibility that a friend s machine may be compromised and hence may lie about the configuration state it has An advantage of using homomorphic encryption is that the key holder can be asked to prove in zero knowledge that its decryption operation is legitimate and each cluster s exit can require the participants to give zero knowledge proof of the validity of their vote to ensure that the encrypted tally is not tampered during aggregation within the cluster However a cluster s entrance and exit can still manipulate the tally on the propagation path since unlike each cluster member s binary valued vote that can be verified in zero knowledge the randomly initialized tally can take any inte
48. ser can configure policies on how to adjust P for data of different privacy levels Since forking at every hop doubles the number of clus ters involved in the troubleshooting process we need to use an appropriate probability of forwarding to achieve a rel atively short path length per branch Let L denote the average path length per branch then the total number of clusters involved is ar 12t 2 1 on average The probability of forwarding Py can be rerna by equa tions Average number of helpers per cluster 2 1 10 and Average number of helpers per cluster 0 5 L TF The first equation is due to the fact that the total number of samples to be collected should be around 10 in order for the PeerPressure diagnosis to be effective 28 The second equation corresponds to a branch in the forking scenario where the average number of times that an exit flips a bi ased coin to determine if it should forward the request or not is equal to half of the average number of a cluster s helpers and in total the coin will be flipped 557 z times on average In general we want L to be a small value A short path length not only saves communication overhead but also re duces the chance of encountering a compromised peer which may contribute false configuration data and affect the in tegrity of the troubleshooting result If we set Py 0 7 corresponding to the average path length L 2 then the total number of clusters i
49. t keyholder in randomly permuted order We also require that Totd Sot si 1 each decrypter re encrypt the messages by transforming an encrypted message g g m into an equivalent encryption ght girs m for a randomly chosen value of r in order to prevent other participants from correlating the g component of an encrypted message back to its originator If the decrypters do not re encrypt then the keyholder of the last cluster in the chain who is also the first decrypter can store the previous cluster s value of g and collude with the troubleshooter to determine which decrypted subtotal corresponds to the previous cluster s contribution In the final step the troubleshooter decrypts all the mes sages and recovers all the subtotals using its stored value of so It then discards the random messages used to initialize the second round request and sums up all the decrypted subtotals to obtain sume which is an aggregate of the most popular entry value V The troubleshooter can compute the most popular value Ve by dividing sume by the number of helpers who contributed Ve in the second round i e the number of helpers whose entry e has a hash value match ing the most popular value This number can be read from the value distribution histogram that the troubleshooter re ceived in the first round The division result interpreted as an ASCII string will be the most popular value for the top ranking suspect entry
50. tain pri vate information but will never intentionally lie about their configuration state or alter troubleshooting results since they do have incentives to help their friends out Passive attacks initiated by curious friends include the following 1 Eavesdropping on machines on the same LAN 2 Message inspection attack Infer privacy sensitive in formation by passively inspecting the messages that are passing by 3 Gossip attack Friends may participate normally in a legitimate request but simply gossip and share what they know to compromise privacy Friends can exchange public keys out of band and use them to establish secure communication channels which renders eavesdropping attacks ineffective Therefore we do not specifically address this attack in our paper A compromised machine on the other hand may launch active attacks against its peer friends to compromise their privacy or to corrupt the integrity of the troubleshooting result Active attacks include the following 1 Troubleshooter attack A compromised host may fab ricate a troubleshooting request to infer his friend s private information For example it may collude with a node on the forwarding path to determine the aggre gate data values of all the nodes in between We call this form of attack a troubleshooter attack because it relies on the initiator participating in the attack by fabricating a troubleshooting request 2 Data injection attack A comp
51. the first round for the i th cluster s so si If the length of m exceeds the length of g then we use generic transforms such as CFB or CBC to convert a cipher on a short block to one on a long block If m is shorter than g then suitable padding such as OAEP is applied to make it match g in length Finally each cluster member unicasts its encrypted subtotal to the cluster exit In addition the cluster entrance modifies each old en crypted message E moia geld grotalsote si 1 Mold that it has received from the previous hop to use the new public key and computes E moia g 4 g g Mota g 4 gS Mora As in the first round the value E mota constitutes a valid encryption for Mora under the new public key g The entrance then appends its own subtotal encrypted under the public key g and passes the encrypted messages to the cluster exit The set of all en crypted messages from a cluster is collected into one large packet and then passed to the next cluster using the same entrance and exit nodes as in the first round In order to protect source anonymity the troubleshooter should initialize the second round query with a number of randomly selected encrypted messages which will be dis carded upon conclusion of the query On the return path each keyholder decrypts his share of the secret key from each of the incoming messages as in the first round and then passes the messages to the nex
52. ther reduce the risk of a successful troubleshooter attack If two exits are selected for a cluster and each cluster member randomly chooses one of them to unicast the subtotal of the random shares it receives from other participants then the troubleshooter would have to collude with both exits in order to reconstruct the cluster s aggregate information Due to the use of a random walk based on a probability of forwarding Py cf Section 4 2 the troubleshooter cannot reduce the likelihood of an honest exit forwarding its request to other clusters Hence the re sponse from an honest exit is likely to include the aggregate information from multiple clusters and not reveal privacy compromising details In the first round the key holder will wait until it receives the replies from both exits and ag gregate the encrypted tally contained in both replies using the homomorphic property In the second round the key holder will wait until it receives both replies collect all the encrypted messages contained in both replies into one packet and mix them by random shuffling Assume a malicious troubleshooter forms a cluster of size G among which C members are colluding with it The probability of a successful troubleshooter attack with only one exit is Pi S If the forking strategy is used the probability is reduced to P oats We have rA E th If only a small portion of the cluster s participants collude with the troubleshooter then Pz wi
53. valid encryption of m under the new public key g rs T g e The friend replaces the old public key g with the new public key g and replaces the old encrypted value E m with the new encrypted value E m Note that E m and E m both represent encryptions of the same value but under different public keys e Ifthe friend chooses to become a helper then he forms an encrypted message E m g gts g under the new public key where m is his own value for the entry and r is chosen randomly and uses the homomorphic property to compute the encrypted tally E m mi pity gr ti gt tales grt e The new public key and the new encrypted tally are forwarded to the next node The secret key s is stored for use during decryption 5 3 Decryption phase e The last hop has a public key of the form g50 tst and an encrypted value E m g got 8 g representing the final tally of the number of votes for 1 Since he owns the secret key s he can compute GONER a 1 r so st 1 m gh ore s g g g grst Note that g got 8 g is a valid encryption of m under the public key g0 Tu st 1 e The last hop sends the encrypted value g7 got 8t g to the previous hop e Proceeding inductively the i th node possesses a pub lic key of the form g s and receives an encrypted value E m g g 0t 80 g and sends the en crypted value g g
54. y QD 2z y gz g7 g 0 85 v C 1 H no fork H ideal g v Y C 2 H no fork H ideal os v C 1 H two fork H ideal o 2 0 8 y A C 2 H two fork H ideal m oO P Pa 2 y C Number of colluding members r3 F 2 075 8 10 12 14 16 18 20 22 24 The cluster size G Figure 4 The ratio of a troubleshooter s uncertainty to the ideal case with and without forking strategy able to determine the collective contributions of the other honest cluster members Nevertheless they will still be un able to determine each individual member s data the secure multi party sum protocol in both rounds ensures that all other cluster members must collude to reveal the contribu tion of an individual member A cluster member can adjust its probability to help Ph cf Section 4 2 to improve its privacy in case of a successful troubleshooter attack In gen eral Pa should take a smaller value for a smaller cluster size and for better privacy Although a malicious troubleshooter may invite fewer friends when forming its troubleshooting cluster the honest cluster members can always make the number of helpers a small fraction of the cluster size by re ducing Pa according to the cluster size and therefore their privacy will not be compromised We note that some con figuration data or application ownership information e g owning Microsoft Word is not privacy sensitive For those cases an FTN node simply sets P 1 In general an FTN u

Applications of Secure Electronic Voting to Automated

Contents

Download Pdf Manuals

Related Search

Related Contents