Internet-Draft | E2EE Messaging Private User Discovery | August 2023 |
Hogben & Olumofin | Expires 17 February 2024 | [Page] |
This document specifies how users can find and communicate with each other privately when using end-to-end encryption messaging. Users can retrieve the key materials and message delivery endpoints of other users without revealing their social graphs to the key material service hosts. Users can search for phone numbers or user IDs, either individually or in batches, using private information retrieval (PIR). Our specification is based on the state-of-the-art lattice-based homomorphic PIR scheme, which provides a reasonable tradeoff between privacy and cost in a keyword-based sparse PIR setting.¶
This note is to be removed before publishing as an RFC.¶
The latest revision of this draft can be found at https://datatracker.ietf.org/doc/giles-interop-user-private-discovery/. Status information for this document may be found at https://datatracker.ietf.org/doc/draft-giles-mimi-user-private-discovery/.¶
Discussion of this document takes place on the mimi Working Group mailing list (mailto:mimi@ietf.org), which is archived at https://mailarchive.ietf.org/arch/browse/mimi/. Subscribe at https://www.ietf.org/mailman/listinfo/mimi/.¶
Source for this draft and an issue tracker can be found at https://github.com/femigolu/giles-interop-user-private-discovery.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 17 February 2024.¶
Copyright (c) 2023 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.¶
Glossary of terms:¶
Outline of design for message delivery bridge and key distribution server discovery mechanisms for interoperable E2EE messaging clients. A DNS-like resolver service stores UserID <-> service pairs that map to key distribution and message delivery endpoints (e.g. Platform1 Bridges). Each service is responsible for an "authoritative name server" that covers its own users and this can be replicated/cached by other providers and retrieved by sender clients using a privacy preserving protocol to prevent social graph leakage.¶
For a given messaging service identity handle (Phone number or alphanumeric UserID):¶
Hiding service reachability. All major E2EE messaging services already publish unACL'd reachability information without opt-out i.e. +16501234567, reachable on Messages, Whatsapp, Telegram (not including name or any other info). Therefore this should not be a goal (and would not be feasible to implement).¶
Taking Platform1 client sending to a Platform2 user as an example:¶
This provides E2EE interop while only disclosing to gateway service which services a phone number is registered to. In all other respects, gateway services learn no more information than in the non-interop case.¶
Each service is responsible for registering user enrollments with the resolver.¶
While the preferred service is public, the user should control its value/integrity. As well as ensuring user control, it also prevents spoofing attacks where an attacker A could create an account on a new service that B does not have an account on, and then set it to B's preferred service (see cross-service identity spoofing below). Therefore to prevent anyone but the user modifying the default service value, records must be signed with the user's private key and verified by the sender against their public key. For multiple key pairs across different services, the last key pair to sign the default service bit must be used to change the default.¶
Today, a messaging service may support one or more ways of identifying a user including email address, phone number, or service specific user name.¶
Messaging interoperability introduces a new problem that traditionally has been resolvable at the service level: cross-service identity spoofing, where a user on a given E2EE may or may not be addressable at the same ID on another service due to a lack of global uniqueness constraints across providers.¶
As a result, a user may be registered at multiple services with the same handles, e.g. if Bob's email is bob@example.com and his phone number is 555-111-2222 and he is registered with Signal and iMessage, he would be addressable at bob@example.com:iMessage, 555-111-2222:iMessage, and 555-111-2222:Signal. In this case, the same userId on iMessage and Signal is acceptable as the phone number can map to only one individual who proves their identity by validating ownership of the SIM card.¶
On services where a user can log in with a username alone, however e.g. Threema and FooService, the challenge becomes:¶
Options for solving this are: 1. Storing the supported services for a contact in Contacts and if a receipt receives a message from an unknown sender, to treat it as spam or otherwise untrusted from the start. 2. Requiring the fully qualified username for services that rely on usernames only - e.g. bob@threema.com vs bob.¶
Resolver lookup queries leak the user's social graph - i.e. who is communicating with whom, since the IP address of the querying client can be tied to user identity, especially when operating over a mobile data network. Therefore we propose to use Private Information Retrieval (PIR) to perform the resolver queries. We have evaluated multiple alternative schemes. The proposed solution is based on the Public Key PIR framework by Patel et al[PIRFramework] with sharded databases. This framework is applicable with any standard PIR scheme such as the open source implementation here. Cost estimates suggest this is feasible even for very large resolver databases (10 billion records).¶
A private information retrieval protocol enables a client holding an index (or keyword) to retrieve the database record corresponding to that index from a remote server. PIR schemes have communication complexities sublinear in the database size and they provide access privacy for clients which precludes the server from being able to learn any information about either the query index or the record retrieved. A standard single-server PIR scheme provides clients with algorithms to generate a query and decrypt a response from the server. It also provides an algorithm for the server to compute a response.¶
The Public Key PIR framework [PIRFramework] can be wrapped around any standard lattice-based PIR scheme. This framework consists of server database setup, client key initialization, client query generation, server response computation, and client response decryption sub-protocols. All operations are over a set of integers with a plaintext modulus.¶
Sharding: If the database is over 220 records, sub-divide it into shards of ~1 million unique records each, which is a good balance for privacy and costs. Performing PIR over the databases gives stronger privacy but is more costly. Similarly, running PIR queries over the shards is less costly but gives weaker privacy.¶
Set partitioning boundaries for each shard D: Given a n key-value pairs shard D = {(k1,v1),...,(kn,vn)}, then¶
For each partition Pi¶
This protocol is completed by the server without any client participation and before answering any client query. Note that a shard must be re-transformed after an update. Shard transform only takes a few minutes.¶
The client downloads query parameters from the server:¶
In these estimates, we propose using shards of size ~1 million of identifiers. For 1.28 TB (10 billion records), breaking this down into 10,000 shards each of size 1 million records gives a cost estimate for each query as below:¶
Parameter | Cost estimate |
---|---|
PIR Public Key Size Per Device, including metadata (storage required) | 14 MB |
Upload Bandwidth Per Query | 14 KB |
Download Bandwidth Per Query | 21 KB |
Client Time Per Query | 0.1s |
Server Time Per Query (Single Thread) | 0.8-1s |
Note on some assumptions for feasibility:¶
This document has no IANA actions.¶
The technical description of the private information retrieval framework is based on Sarvar Patel, Joon Young Seo and Kevin Yeo's USENIX Security '23 paper titled "Don't be Dense: Efficient Keyword PIR for Sparse Databases ".¶