Review for EPTD'19

[EPTD'19]

Efficient and Privacy-Preserving Truth Discovery in Mobile Crowd Sensing Systems

(IEEE TVT'19)

"... can tolerate users offline at any stage, while guaranteeing practical efficiency and accuracy under working process."

Motivation

previous researches require that users stay online at all times for interacting frequently with the cloud server consequently
two non-colluding servers seem not suitable for real-world applications

Problem Statement

Specifically, truth discovery always starts from estimation each user's reliability subsequently to integrate the users' weights and sensory data for further inferring the ground truths.

假设一共有 $\mathcal{H}$ 个对象， $\mathcal{D}$ 个用户。 $x_h^d$ 表示第 $d$ 个用户对对象 $h$ 观测到的结果，地面真值用 $x_h^*$ 来表示。

Weight Update:

$w_d = f(\sum_{h=1}^H d(x_h^d, x_h^*))$

Truth Update:

$x_h^* = \frac{\sum_{d=1}^\mathcal{D} w_d \cdot x_h^d}{\sum_{d=1}^\mathcal{D} w_d}$

Two types of entities (honest-but-curious):

users
the cloud server

In our EPTD, we allow the cloud server to collude with any set of less than $t$ users to attack EPTD for stealing users' privacy.

Goal: protect the privacy of user's sensory data and reliability information (i.e., weight) in whole working process

shamir secret sharing: tolerate additional parties dropping out during recovery process
double masking: ensure the privacy of $x_d$ even the cloud server can remove the masks above of each user $d$

Shamir's $t$ -out-of- $\mathcal{D}$ Secret Sharing:

$\mathcal{U}$ denotes the users' ID, $|\mathcal{U}| = \mathcal{D}$ , $\mathcal{M}\subseteq \mathcal{U}$ , $t\leq |\mathcal{M}|$

$\mbox{Shamir.share}(s, t, \mathcal{U}) \to \{(d, s_d)\}_{d\in \mathcal{U}}$
$\mbox{Shamir.recon}(\{(d, s_d)\}_{d\in \mathcal{M}}, t) \to s$

Key Agreement:

Note

$x$ 是私钥，后面用 $D_d^{SK}$ 表示； $g^x$ 是公钥，后面用 $D_d^{PK}$ 表示。 $\mbox{KA.agree}$ 的意思就是在知道用户 $d$ 的私钥 $x_d$ 和 $m$ 的公钥 $g^{x_m}$ 时，可以得到他们之间的共享密钥 $s_{d,m}$ 。

$\mbox{KA.param}(k)\to (\mathcal{G}, g, q, H)$
$\mbox{KA.gen}(\mathcal{G}, g, q, H)\to (x, g^x)$
$\mbox{KA.agree}(\mathsf{sign}_m(g^{x_d}, g^{x_m}), \mathsf{MAC}_{k_v}(m) ,x_d, g^{x_m}, d, m) \to s_{d,m}$

真值发现的目标是计算

$\sum_{d'=1}^\mathcal{D} \sum_{h=1}^{H} d(x_h^{d'}, x_h^*)$
$\sum_{d=1}^\mathcal{D} w_d \cdot x_h^d$
$\sum_{d=1}^\mathcal{D} w_d$

Note

对应到增量的真值发现中，也是需要计算这些，但是不需要迭代

One Masking:

假定所有用户按照一定次序排列，对于任一对用户 $(d,m)$ , $d<m$ ，他们共同有一个随机数 $r_{d,m}$ ，用户 $d$ 加上这个随机数，而用户 $m$ 减去这个随机数。

那么用户 $d$ 输出

$y_d = x_d + \sum_{m\in \mathcal{D}:d<m}r_{d,m} - \sum_{m\in \mathcal{D}:d>m} r_{m,d}$

因此把 $y_d$ 全部加起来就可以得到聚合后的总值

$z = \sum_{d\in \mathcal{D}} y_d = \sum_{d\in\mathcal{D}} x_d$

这个方法有两个缺点：

每一个用户 $d$ 都需要跟其他所有的用户达成共同随机数 $r_{d,m}$ ，通信开销为 $\mathcal{D}^2$
如果有一个用户退出，那么最后的结果就会产生偏差

Double-Masking with OTP:

为了降低通信开销，可以使用伪随机数生成器，即 $\mathsf{PRG}$ 。种子可以使用二者之间的共享密钥。至于用户退出的问题，可以用秘密共享的方式解决。但在现实中，可能出现用户 $d$ 发送的 $r_{d,m}$ 存在延迟的情况，这也就让服务器可以恢复 $d$ 上传的数值 $x_d$ 。

因此这里每个用户在上传之前，会随机选择一个种子 $n_d$ ，并把 $n_d$ 以秘密共享的方式分享给其他用户。

$y_d = x_d + \mathsf{PRG}(n_d) + \sum_{m\in\mathcal{D}:d<m}\mathsf{PRG}(r_{d,m}) - \sum_{m\in\mathcal{D}:d>m}\mathsf{PRG}(r_{m,d})$

The Proposed Scheme

High-Level View:

一个云服务器， $\mathcal{D}$ 个用户，而且每个用户秘密地上传 $x_d$ ，服务器的目标就是聚合 $x_d$ 并获取最终的结果。在这个过程中，用户可以随时退出，而云服务器只需要至少有 $t$ 个用户提供了数据即可恢复出原有的数据。

Secure Weight Update

Round 0 - Key Generation:

$\{(T_d^{PK}, T_d^{SK}), (D_d^{PK}, D_d^{SK}), (C_d^{PK}, C_d^{SK})\} \gets \mbox{KA.gen}(k)$

Note

此步骤由 TA 完成，每个用户有三对密钥，这三对密钥的作用分别是， $T_d$ 在密钥交换后作为对称加密密钥， $D_d$ 在密钥交换后作为种子， $C_d$ 作为签名和验证

每个用户使用私钥 $C_d^{SK}$ 对公钥 $(T_d^{PK}, D_d^{PK})$ 进行签名，即：

$\rho_d \gets \mbox{sign}(C_d^{SK}, T_d^{PK}||D_d^{PK})$

然后用户将 $(\rho_d || T_d^{PK} || D_d^{PK})$ 发送给云服务器。记云服务器收到的用户集合为 $\mathcal{U}_1$ ，它将 $\{(m, T_m^{PK}, D_m^{PK}, \rho_m)\}_{m\in \mathcal{U}_1}$ 广播给所有用户。

Round 1 - Key Sharing:

对于用户 $d$ 来说，他需要检查：

$|\mathcal{U}_1| \geq t$ ?
$\rho_m$ is valid (使用 $C_m^{PK}$ ，对于每一个 $m\in \mathcal{U}_1$ )?

检查完毕，用户首先随机选择 $n_d$ 并为 $D_d^{SK}$ 和 $n_d$ 创建秘密共享：

$\{(m, D_{m,d}^{SK})\}_{m\in\mathcal{U}_1} \gets \mbox{Shamir.share}(D_d^{SK}, t, \mathcal{U}_1)$

$\{(m, n_{m,d})\}_{m\in\mathcal{U}_1} \gets \mbox{Shamir.share}(n_d, t, \mathcal{U}_1)$

Note

也就是说把这些都分成了 $m$ 份，只需要其中的 $t$ 份便可以还原出 $D_d^{SK}$ 和 $n_d$

每个用户 $d$ 使用对称加密算法，加密 $d,m,D_{m,d}^{SK},n_{m,d}$ ，生成密文 $\mathcal{T}_{m,d}$ ，并提交给服务器：

$\mathcal{T}_{m,d} \gets \mbox{AE.enc}(\mbox{KA.agree}(T_d^{SK}, T_m^{PK}), d||m||D_{m,d}^{SK}||n_{m,d})$

Note

$\mathcal{T}_{m,d}$ 只有 $m$ 和 $d$ 可以解密

设 $\mathcal{U}_2$ 为此时服务器收到消息的用户集合， $\mathcal{U}_2\subseteq \mathcal{U}_1$ 。服务器首先初始化一个真值 $x_h^*$ ，并将 $\{\mathcal{T}_{m,d}\}_{m\in\mathcal{U}_2}$ 和 $x_h^*$ 发送给所有用户。

Round 2:

用户收到 $\{\mathcal{T}_{m,d}\}_{m\in\mathcal{U}_2}$ 后，首先检查是否 $|\mathcal{U}_2|\geq t$ 。对于用户 $d$ 来说，可以计算出：

$r_{d,m} \gets \mbox{KA.agree}(D_d^{SK}, D_m^{PK})$

然后可以计算：

$y_d^2 = x_d^2 + \mathsf{PRG}(n_d) + \sum_{m\in\mathcal{U}_2:d<m} \mathsf{PRG}(r_{d,m}) - \sum_{m\in\mathcal{U}_2:d>m} \mathsf{PRG}(r_{m,d})$

其中， $x_d^2$ 表示 $\sum_{h=1}^\mathcal{H}d(x_h^d, x_h^*)$