OXT

Info

Highly-Scalable Searchable Symmetric Encryption with Support for Boolean Queries

CRYPTO 2013 (CCF A 类会议)

David Cash, Stanislaw Jarecki, Charanjit Jutla, Hugo Krawczyk, Marcel-Catalin Rosu, Michael Steiner

"This work presents the design and analysis of the first searchable symmetric encryption (SSE) protocol that supports conjunctive search and general Boolean queries ..."

We achieve this by a novel approach that pre-computes parts of the protocol messages and stores them in encrypted form at the server. Then, during search, the client sends information to the server that allows to unlock these pre-computed messages without further interaction.

1. 系统模型

数据库 $\mathsf{DB}$ 中包含 $d$ 个文件，每一个文件对应一组关键词 $\mathsf{W}_i$ 。SSE 协议输出的是文件标识 $\mathsf{ind}$ ，客户端可以通过这些标识来获取加密的文件，之后再进行解密。

数据库 $\mathsf{DB} = (\mathsf{ind}_i, \mathsf{W}_i)_{i=1}^d$ ，是由标识符/关键词集组成，其中 $\mathsf{ind}_i\in\{0,1\}^\lambda$ ， $\mathsf{W}_i \subseteq\{0,1\}^*$ 。那么所有的关键词集合可以表示为 $\mathsf{W} = \cup_{i=1}^d \mathsf{W}_i$ 。由于 $\mathsf{ind}$ 是混淆后的文件标识，因此可以泄露给外包服务器。对于关键词 $\overline{w}\in\mathsf{W}^*$ ， $\psi(\overline{w})$ 表示对关键词 $\overline{w}$ 的布尔表达式。我们用 $\mathsf{DB}(\psi(\overline{w}))$ 来表示满足 $\psi(\overline{w})$ 的表达符。 $d$ 表示文件数量， $m=|\mathsf{W}|$ ， $N=\sum_{w\in \mathsf{W}}|\mathsf{DB}(w)|$ 。

定义 tuple set，或者称为 T-set。T-set 允许在数据库里将每一个关键词都关联一个固定大小的数据元组。用户查询时，返回的就是这些数据元组列表。在本文提出的协议中，T-set 被用来作为 expanded inverted index （倒排索引）。

T-set 由三个函数组成， $\Sigma = (\mathsf{TSetSetup}, \mathsf{TSetGetTag}, \mathsf{TSetRetrieve})$ ：

$\mathsf{TSetSetup}(\mathbf{T}) \to (\mathsf{TSet}, K_T)$
$\mathsf{TSetGetTag}(K_T,w)\to \mathsf{stag}$
$\mathsf{TSetRetrieve}(\mathsf{TSet}, \mathsf{stag}) \to \mathbf{T}[w]$

T-set 的作用是尽可能隐藏 $\mathbf{T}$ 中的元组，以及这些元组与查询关键词的对应关系。

2. 单关键词可搜索加密（SKS）

T-set 可以用来构造单关键词搜索，也就是 single-keyword SSE Scheme (SKS)，详细步骤如下：

$\mathsf{EDBSetup}(\mathsf{DB})$ :

选择 PRF $F$ 的密钥 $K_S$ ，将 $\mathsf{DB}$ 解析成 $(\mathsf{ind}_i, \mathsf{W}_i)_{i=1}^d$ 的形式
初始化 $\mathbf{T}$ 为一个空数组，关键词集合 $\mathsf{W}=\cup_{i=1}^d \mathsf{W}_i$
对于每一个 $w\in\mathsf{W}$ ，按照以下方式构建 $\mathbf{T}[w]$ ：
- 初始化一个空列表 $\mathbf{t}$ ，令 $K_e\gets F(K_S, w)$
- 对于所有的 $\mathsf{ind}\in\mathsf{DB}(w)$ ，计算 $\mathsf{e}\gets\mathsf{Enc}(K_e, ind)$ ，然后将 $\mathsf{e}$ 添加到列表 $\mathbf{t}$ 中
- $(\mathsf{TSet}, K_T)\gets \mathsf{TSetSetup}(\mathbf{T})$
- 输出密钥 $(K_S, K_T)$ 以及加密的数据库 $\mathsf{EDB} = \mathsf{TSet}$

查询协议：

客户端已知密钥 $(K_S, K_T)$ ，以及查询的关键词 $w$
计算 $\mathsf{stag} \gets \mathsf{TSetGetTag}(K_T, w)$ ，发送 $\mathsf{stag}$ 给服务器
服务器计算 $\mathbf{t}\gets \mathsf{TSetRetrieve}(\mathsf{TSet}, \mathsf{stag})$ ，返回 $\mathbf{t}$ 给客户端
客户端令 $K_e\gets F(K_S, w)$ ，对于在 $\mathbf{t}$ 中的每一个 $\mathsf{e}$ ，计算 $\mathsf{ind}\gets \mathsf{Dec}(K_e, \mathsf{e})$ ，然后输出 $\mathsf{ind}$

如何将其扩展为支持连接搜索 (conjunctive keyword search)？

比如输入的查询是 $\bar{w} = (w_1, \ldots, w_n)$ ，对于在 $\bar{w}$ 中的每个关键词 $w_i$ ，都执行一次。但是服务器并不直接将 $\mathbf{t}$ 返回给客户端，而是从客户端处获得 $K_{e_i}$ ，其中 $i=1,\ldots, n$ ，然后对 $\mathsf{e}$ 解密得到每个 $w_i$ 的 $\mathsf{ind}$ 集合。这样搜索的复杂度为 $\sum_{i=1}^n |\mathsf{DB}(w_i)|$ 的倍数。但同时也向服务器泄露了太多信息。

Our goal is to reduce both computation and leakage in the protocol by tying those to the less frequent terms in the conjunctions (i.e., terms $w$ with small sets $\mathsf{DB}(w)$ ).

3. Basic Cross-Tags 协议（BXT）

假设客户端给的关键词列表为 $\bar{w} = (w_1, \ldots, w_n)$ ，其中 $\mathsf{DB}(w_1)$ 是最短的（对应的文件数量最少）。通过执行 SKS 搜索协议来查找关键词 $w_1$ ，客户端得到所有匹配 $w_1$ 的文件。

BXT 协议的核心思想是使用 SKS 来检索 $\mathsf{TSet}(w_1)$ ，然后服务器对 $w_2,\ldots,w_n$ 的结果取交集，服务器只会返回 full conjunction 的结果。

具体的做法是在 $\mathsf{EDBSetup}(\mathsf{DB})$ 阶段，除了生成 $\mathsf{TSet}$ 之外，还生成一个 $\mathsf{XSet}$ ，其中包含的元素我们称为 $\mathsf{xtag}$ 。对每个 $w\in W$ ，首先计算关键词的陷门 $\mathsf{xtrap} = F(K_X, w)$ ；对于每一个 $\mathsf{ind}\in\mathsf{DB}(w)$ ，需要计算 $\mathsf{xtag} = f(\mathsf{xtrap}, \mathsf{ind})$ 。

对于连接搜索 $(w_1,\ldots, w_n)$ ，选择估测出现频率最低的关键词，比如 $w_1$ ，使用 SKS 协议，即 $K_e\gets F(K_S, w_1)$ ， $\mathsf{stag}\gets \mathsf{TSetGetTag}(K_T, w_1)$ 。对于每一个 $i=2,\ldots, n$ ，客户端计算 $\mathsf{xtrap}_i\gets F(K_X, w_i)$ ，发送 $(K_e, \mathsf{stag}, \mathsf{xtrap}_2, \ldots, \mathsf{xtrap}_n)$ 。服务器使用 $\mathsf{stag}$ 检索 $\mathbf{t} = \mathsf{TSetRetrive}(\mathsf{TSet}, \mathsf{stag})$ 。对每一个在 $\mathbf{t}$ 中的密文 $\mathsf{e}$ ，首先解密 $\mathsf{ind} = \mathsf{Dec}(K_e, \mathsf{e})$ ，如果 $f(\mathsf{xtrap}_i, \mathsf{ind})\in \mathsf{XSet}$ ，发送符合条件的 $\mathsf{ind}$ 给客户端。

Note

这样一来，服务器就知道了 $\mathsf{ind}$ 。但正如前面所说，由于 $\mathsf{ind}$ 是混淆后的文件标识，可以泄露给外包服务器。

BXT 协议的搜索复杂度为 $n\cdot |\mathsf{DB}(w_1)|$ ，它在搜索效率与隐私保护之间做了平衡。服务器可以使用 $\mathsf{xtrap}_i$ 来判断它是否与其他 s-term 中的 $\mathsf{ind}$ 匹配。比如当用户查询了 $(w_1, w_2)$ 和 $(w_1', w_2')$ 之后，服务器便可以知道 $(w_1, w_2')$ 和 $(w_1', w_2)$ 的结果，但不知道 $(w_2, w_2')$ 的结果。

Note

本文中所有用 s- 开头的代表频率最低的，x- 开头则代表除了 s- 之外的。

BXT 协议的安全性有一个重要前提，那就是 $\mathsf{ind}$ 集合需要是不可预测的 (unpredictable)，这在其他协议中则是不需要的。

4. Oblivious Cross-Tags 协议（OXT）

BXT 协议会面临如下的攻击：当服务器得到了 $(w_1, \ldots, w_n)$ 相应的 $\mathsf{xtrap}_i$ ，此时服务器可以把 $\mathsf{xtrp}_i$ 保存起来。对于之后任意泄漏的 $\mathsf{ind}$ ，服务器便可以通过测试 $f(\mathsf{xtrap}_i, \mathsf{ind})\in \mathsf{XSet}$ 是否成立，便可以知道该文件中是否包含关键词 $w_i$ 。

这种攻击的根源是由于服务器能够自行计算 $f(\mathsf{xtrap}_i, \cdot)$ ，一种规避方式就是让客户端计算该函数。简单说就是服务器把 $w_1$ 对应的所有加密后的 $\mathsf{ind}$ 发送给用户，用户在本地计算 $f(\mathsf{xtrap}_i, \mathsf{ind})$ 再把结果返回给服务器。然而这样做会增加通信开销，对于服务器来说，它可以让客户端为其进行判断任意的 s-term，对于客户端来说，他知道的信息太多了（客户端知道了 $w_1$ 对应的所有文件 $\mathsf{ind}$ ）。

解决思路是把 $f(\mathsf{xtrap}, \cdot)$ 用一种不经意的共享计算来代替。

oblivious shared computation between client and server

最直接的想法是采用盲化手段 (blinded exponentiation)，也就是说，使用映射到 $Z_p^*$ 的伪随机函数 $F_p$ ，计算 $\mathsf{xind} = F_p(K_I, \mathsf{ind})\in Z_p^*$ ，再得到每个 $\mathsf{xtag}$ 为 $g^{F_p(K_X, w)\cdot \mathsf{xind}} \in Z_p^*$ 。

共享计算的过程：客户端使用盲因子 $z\in Z_p^*$ 进行盲化操作，即计算 $g^{F_p(K_X, w_i) \cdot z}$ ；服务器得到 $g^{F_p(K_X, w_i) \cdot z \cdot \mathsf{xind}}$ ，返回给客户端；客户端再通过 $z^{-1}\mbox{ mod } p$ 去盲得到 $g^{F_p(K_X, w_i)\cdot \mathsf{xind}}$ 。然而这样做会有一个漏洞，当服务器并没有真正计算，而是把 $g^{F_p(K_X, w_i)\cdot z}$ 直接返回给客户端时，客户端直接去盲，那么服务器就会得到 $g^{F_p(K_X, w_i)}$ ，此时服务器便可以根据已知的 $\mathsf{xind}$ 计算任意的 $\mathsf{xtag}$ 。

设计 OXT 协议的思路： 在预计算阶段（ $\mathsf{EDBSetup}$ 阶段），数据拥有者对于每个关键词以及对应的文件标识元组，计算一个盲值 $y_c = \mathsf{xind}\cdot z_c^{-1}$ ，其中 $z_c$ 是由 $w$ 和计数器 $c$ 生成。搜索阶段，服务器需要计算 $g^{F_p(K_X, w_i)\cdot \mathsf{xind}}$ （其中 $\mathsf{xind}$ 由 $w_1$ 对应出来）。客户端发送给服务器的是一个 $|\mathsf{DB}(w_1)|\times (n-1)$ 的 $\mathsf{xtoken}$ ，其中 $\mathsf{xtoken}[c,i]:= g^{F_p(K_X, w_i)\cdot z_c}, i=1,\ldots, n$ 。服务器从 T-set 中检索出 $w_1$ 的结果之后，便可以通过测试 $\mathsf{xtoken}[c,i]^{y_c}\in \mathsf{XSet}, i=2,\ldots, n$ 来过滤第 $c$ 个结果，很容易可以看到

$\mathsf{xtoken}[c,i]^{y_c} = g^{F_p(K_X, w_i)\cdot z_c \cdot \mathsf{xind}\cdot z_c^{-1}} = g^{F_p(K_X, w_i)\cdot \mathsf{xind}}$

相对于 BXT 协议，OXT 泄露的信息更少。对于服务器而言，它既不知道 $\mathsf{ind}$ 也不知道 $\mathsf{xtrap_j}$ ， $j=2,\ldots, n$ 。

The only other leakage via s-terms is that $\mathcal{S}$ learns when two queries have the same s-term $w_1$ and the size of the set $\mathsf{DB}(w_1)$ .

5. 实现布尔搜索

将布尔表达式转换成 $w_1 \wedge \phi(w_2,\ldots, w_n)$ 的形式，其中 $\phi$ 表示任意的布尔表达式，比如 $w_1\wedge (w_2 \vee w_3 \vee \neg w_4)$ 。服务器在检索出 $w_1$ 对应的结果之后，再根据 $v_i := \mathsf{xtoken}[c,i]^{y_c}\in \mathsf{XSet}$ ，计算 $\phi(v_2,\ldots, v_n)$ ，如果成立，则返回相应的 $\mathbf{e}$ 。

OXT

1. 系统模型

2. 单关键词可搜索加密（SKS）

3. Basic Cross-Tags 协议（BXT）

4. Oblivious Cross-Tags 协议（OXT）

5. 实现布尔搜索

6. 安全分析

7. T-Set 实例

OXT

1. 系统模型

2. 单关键词可搜索加密 （SKS）

3. Basic Cross-Tags 协议 （BXT）

4. Oblivious Cross-Tags 协议 （OXT）

5. 实现布尔搜索

6. 安全分析

7. T-Set 实例

2. 单关键词可搜索加密（SKS）

3. Basic Cross-Tags 协议（BXT）

4. Oblivious Cross-Tags 协议（OXT）