Internet-Draft | The Multihash Data Format | August 2023 |
Benet & Sporny | Expires 21 February 2024 | [Page] |
Cryptographic hash functions often have multiple output sizes and encodings. This variability makes it difficult for applications to examine a series of bytes and determine which hash function produced them. Multihash is a universal data format for encoding outputs from hash functions. It is useful to be able to write applications that simultaneously support different hash function outputs and that can upgrade their use of hashes over time; Multihash is intended to address these needs.¶
This specification is a joint work product of Protocol Labs and the W3C Credentials Community Group. Feedback related to this specification should be logged in the issue tracker or sent to public-credentials@w3.org.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 21 February 2024.¶
Copyright (c) 2023 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
Multihash is particularly important in systems which depend on cryptographically secure hash functions. Attacks may break the cryptographic properties of secure hash functions. These cryptographic breaks are particularly painful in large tool ecosystems, where tools may have made assumptions about hash values, such as function and digest size. Upgrading becomes a nightmare, as all tools which make those assumptions would have to be upgraded to use the new hash function and new hash digest length. Tools may face serious interoperability problems or error-prone special casing.¶
How many programs out there assume a git hash is a SHA-1 hash?¶
How many scripts assume the hash value digest is exactly 160 bits?¶
How many tools will break when these values change?¶
How many programs will fail silently when these values change?¶
This is precisely why Multihash was created. It was designed for seamlessly upgrading systems that depend on cryptographic hashes.¶
When using Multihash, a system warns the consumers of its hash values that these may have to be upgraded in case of a break. Even though the system may still use only a single hash function at a time, the use of Multihash makes it clear to applications that hash values may use different hash functions or be longer in the future. Tooling, applications, and scripts can avoid making assumptions about the length and read it from the multihash value instead. This way, the vast majority of tooling, which may not check hashes at all, would not have to be upgraded. This vastly simplifies the upgrade process, avoiding the waste of hundreds or thousands of software engineering hours, deep frustrations, and high blood pressure.¶
A multihash follows the TLV (type-length-value) pattern and consists of three fields: two unsigned variable integers, for the hash function identifier and the digest length, followed by the digest bytes.¶
The following section details the core data types used by the Multihash data format.¶
The unsigned variable integer is a data type that expresses an unsigned integer of variable length. It uses the Little Endian Base 128 (LEB128) encoding defined in Appendix C of the DWARF Debugging Information Format [DWARF] standard, initially released in 1993.¶
As its name suggests, this variable-length encoding can represent only unsigned integers. Further, while the format has no theoretical maximum value, implementations MUST NOT encode more than nine (9) bytes, which gives a practical range of integers from 0 to 2^63 - 1.¶
When encoding an unsigned variable integer, the unsigned integer is serialized seven bits at a time, starting with the least significant bits. The most significant bit of each output byte is a continuation flag: it is 1 if another byte follows and 0 in the final byte. It is not possible to express a signed integer with this data type.¶
Value | Encoding (bits) | hexadecimal notation |
---|---|---|
1 | 00000001 | 0x01 |
127 | 01111111 | 0x7F |
128 | 10000000 00000001 | 0x8001 |
255 | 11111111 00000001 | 0xFF01 |
300 | 10101100 00000010 | 0xAC02 |
16384 | 10000000 10000000 00000001 | 0x808001 |
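The encoding rule above can be sketched in a few lines of Python. This is a minimal, non-normative illustration: the function name `encode_uvarint` is hypothetical, and the sketch rejects values that would need more than nine bytes, reflecting the limit stated in this section. Running it reproduces the hexadecimal column of the table.¶

```python
def encode_uvarint(value: int) -> bytes:
    """Hypothetical helper: encode a non-negative integer as an unsigned LEB128 varint."""
    if value < 0:
        raise ValueError("unsigned varints cannot represent negative integers")
    if value >= 1 << 63:
        raise ValueError("value would need more than nine bytes")
    out = bytearray()
    while True:
        byte = value & 0x7F              # low seven bits of what remains
        value >>= 7
        if value:
            out.append(byte | 0x80)      # continuation bit set: more bytes follow
        else:
            out.append(byte)             # final byte: continuation bit clear
            return bytes(out)


for n in (1, 127, 128, 255, 300, 16384):
    print(n, encode_uvarint(n).hex())
# 1 01
# 127 7f
# 128 8001
# 255 ff01
# 300 ac02
# 16384 808001
```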
Implementations MUST restrict the size of the varint to a maximum of nine bytes (63 bits). This practical maximum of nine bytes is used to avoid memory attacks on the encoding. There is no theoretical limit, and future specifications can raise this number if it is truly necessary to have code or length values larger than 2^63 - 1.¶
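A decoder is where the nine-byte limit matters most, because it must stop reading instead of accumulating an unbounded number of continuation bytes. The following Python sketch assumes a hypothetical `decode_uvarint` helper that returns the decoded value together with the offset just past the varint, and raises an error on truncated or over-long input.¶

```python
MAX_VARINT_BYTES = 9  # at most 63 bits of payload, per this section


def decode_uvarint(data: bytes, offset: int = 0) -> tuple[int, int]:
    """Hypothetical helper: decode one unsigned varint starting at data[offset].

    Returns (value, offset just past the varint). Raises ValueError if the
    input ends mid-varint or the encoding runs past MAX_VARINT_BYTES.
    """
    value = 0
    for i in range(MAX_VARINT_BYTES):
        pos = offset + i
        if pos >= len(data):
            raise ValueError("truncated varint")
        byte = data[pos]
        value |= (byte & 0x7F) << (7 * i)   # add this byte's seven payload bits
        if not byte & 0x80:                  # continuation bit clear: last byte
            return value, pos + 1
    raise ValueError("varint longer than nine bytes")


print(decode_uvarint(bytes.fromhex("ac02")))  # (300, 2)
```

Returning the new offset lets a caller read several varints back to back, which is exactly what parsing a multihash requires.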
A multihash follows the TLV (type-length-value) pattern: a hash function identifier, followed by the digest length, followed by the digest value.¶
The hash function identifier is an unsigned variable integer identifying the hash function. The possible values for this field are listed in the Multihash Identifier Registry.¶
The digest length is an unsigned variable integer counting the length of the digest in bytes.¶
The digest value is the output of the hash function; its length, in bytes, is exactly the value given in the digest length field.¶
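Because the three fields are simply concatenated, producing a multihash takes only a few lines. The following Python sketch assumes the standard `hashlib` module for the digest and the sha2-256 identifier 0x12 from the registry; `encode_uvarint` and `encode_multihash` are hypothetical helper names, and the sketch is illustrative rather than a reference implementation.¶

```python
import hashlib


def encode_uvarint(value: int) -> bytes:
    # Hypothetical helper. Unsigned LEB128: seven bits per byte, least
    # significant group first, high bit set on every byte except the last.
    out = bytearray()
    while value > 0x7F:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def encode_multihash(code: int, digest: bytes) -> bytes:
    """Hypothetical helper: <identifier varint> <digest length varint> <digest bytes>."""
    return encode_uvarint(code) + encode_uvarint(len(digest)) + digest


# Usage: a sha2-256 multihash (identifier 0x12) of some input bytes.
mh = encode_multihash(0x12, hashlib.sha256(b"hello").digest())
print(mh[:2].hex())  # 1220
```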
For example, the following is an expression of a SHA2-256 hash in hexadecimal notation (spaces added for readability purposes):¶
0x12 20 41dd7b6443542e75701aa98a0c235951a28a0d851b11564d20022ab11d2589a8¶
The first byte (0x12) specifies the SHA2-256 hash function. The second byte (0x20) specifies the length of the digest, which is 32 bytes. The remaining 32 bytes are the digest value output by the hash function.¶
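Reading the example back reverses those steps: decode two varints, then take the remaining bytes as the digest. The Python sketch below assumes the input holds exactly one multihash with no trailing data, and uses hypothetical names (`Multihash`, `read_uvarint`, `decode_multihash`).¶

```python
from typing import NamedTuple


class Multihash(NamedTuple):
    # Hypothetical container for the three multihash fields.
    code: int      # hash function identifier
    length: int    # digest length in bytes
    digest: bytes  # digest value


def read_uvarint(data: bytes, offset: int) -> tuple[int, int]:
    # Hypothetical helper: unsigned LEB128, limited to nine bytes as required
    # earlier in this document. Returns (value, next offset).
    value = 0
    for i, pos in enumerate(range(offset, min(offset + 9, len(data)))):
        byte = data[pos]
        value |= (byte & 0x7F) << (7 * i)
        if not byte & 0x80:
            return value, pos + 1
    raise ValueError("truncated or over-long varint")


def decode_multihash(data: bytes) -> Multihash:
    code, pos = read_uvarint(data, 0)
    length, pos = read_uvarint(data, pos)
    digest = data[pos:]
    if len(digest) != length:
        raise ValueError(f"length field says {length} bytes, found {len(digest)}")
    return Multihash(code, length, digest)


example = bytes.fromhex(
    "1220" "41dd7b6443542e75701aa98a0c235951a28a0d851b11564d20022ab11d2589a8"
)
mh = decode_multihash(example)
print(hex(mh.code), mh.length)  # 0x12 32
```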
There are a number of security considerations to take into account when implementing or utilizing this specification. TBD¶
The following multihash examples show different hash functions and different digest lengths at play. The input test data for all of the examples in this section is:¶
Merkle–Damgård¶
0x11148a173fd3e32c0fa78b90fe42d305f202244e2739¶
The fields for this multihash are - hashing function: sha1 (0x11), length: 20 (0x14), digest: 0x8a173fd3e32c0fa78b90fe42d305f202244e2739¶
0x122041dd7b6443542e75701aa98a0c235951a28a0d851b11564d20022ab11d2589a8¶
The fields for this multihash are - hashing function: sha2-256 (0x12), length: 32 (0x20), digest: 0x41dd7b6443542e75701aa98a0c235951a28a0d851b11564d20022ab11d2589a8¶
0x132052eb4dd19f1ec522859e12d89706156570f8fbab1824870bc6f8c7d235eef5f4¶
The fields for this multihash are - hashing function: sha2-512 (0x13), length: 32 (0x20), digest (the sha2-512 output truncated to its first 32 bytes): 0x52eb4dd19f1ec522859e12d89706156570f8fbab1824870bc6f8c7d235eef5f4¶
0x134052eb4dd19f1ec522859e12d89706156570f8fbab1824870bc6f8c7d235eef5f4c2cbbafd365f96fb12b1d98a0334870c2ce90355da25e6a1108a6e17c4aaebb0¶
The fields for this multihash are - hashing function: sha2-512 (0x13), length: 64 (0x40), digest: 0x52eb4dd19f1ec522859e12d89706156570f8fbab1824870bc6f8c7d235eef5f4c2cbbafd365f96fb12b1d98a0334870c2ce90355da25e6a1108a6e17c4aaebb0¶
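The two sha2-512 examples share the same leading 32 digest bytes: the 32-byte digest is the full 64-byte output truncated. A verifier can therefore recompute the full digest and compare only its first `length` bytes. The Python sketch below assumes that truncation rule, handles only the single-byte identifiers 0x11, 0x12, and 0x13, and uses hypothetical names (`verify_multihash`, `HASHLIB_NAMES`); it is not a complete implementation.¶

```python
import hashlib
import hmac

# Hypothetical lookup: single-byte registry identifiers -> hashlib algorithm names.
HASHLIB_NAMES = {0x11: "sha1", 0x12: "sha256", 0x13: "sha512"}


def verify_multihash(data: bytes, mh: bytes) -> bool:
    """Hypothetical helper: check that mh is a (possibly truncated) multihash of data."""
    code, length = mh[0], mh[1]
    # Every identifier and length this sketch supports is below 0x80,
    # so each occupies a single varint byte.
    if code >= 0x80 or length >= 0x80:
        raise ValueError("this sketch only handles single-byte varints")
    expected = mh[2:]
    if len(expected) != length:
        raise ValueError("digest length field does not match the digest value")
    full = hashlib.new(HASHLIB_NAMES[code], data).digest()
    if length > len(full):
        raise ValueError("digest length exceeds the hash function's output size")
    # Truncation keeps the leading bytes of the full digest; compare in constant time.
    return hmac.compare_digest(full[:length], expected)
```

Allowing a shorter length trades collision resistance for size, so a verifier may also want to enforce a minimum digest length.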
0xc0e40240d91ae0cb0e48022053ab0f8f0dc78d28593d0f1c13ae39c9b169c136a779f21a0496337b6f776a73c1742805c1cc15e792ddb3c92ee1fe300389456ef3dc97e2¶
The fields for this multihash are - hashing function: blake2b-512 (0xb240, varint-encoded as 0xc0e402), length: 64 (0x40), digest: 0xd91ae0cb0e48022053ab0f8f0dc78d28593d0f1c13ae39c9b169c136a779f21a0496337b6f776a73c1742805c1cc15e792ddb3c92ee1fe300389456ef3dc97e2¶
0xa0e402207d0a1371550f3306532ff44520b649f8be05b72674e46fc24468ff74323ab030¶
The fields for this multihash are - hashing function: blake2b-256 (0xb220, varint-encoded as 0xa0e402), length: 32 (0x20), digest: 0x7d0a1371550f3306532ff44520b649f8be05b72674e46fc24468ff74323ab030¶
0xe0e40220a96953281f3fd944a3206219fad61a40b992611b7580f1fa091935db3f7ca13d¶
The fields for this multihash are - hashing function: blake2s-256 (0xb260, varint-encoded as 0xe0e402), length: 32 (0x20), digest: 0xa96953281f3fd944a3206219fad61a40b992611b7580f1fa091935db3f7ca13d¶
0xd0e402100a4ec6f1629e49262d7093e2f82a3278¶
The fields for this multihash are - hashing function: blake2s-128 (0xb250, varint-encoded as 0xd0e402), length: 16 (0x10), digest: 0x0a4ec6f1629e49262d7093e2f82a3278¶
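Two details of the BLAKE2 examples can be checked with a short Python sketch. First, identifiers of 0x80 and above occupy more than one varint byte; under the encoding rules given earlier, each 0xb2xx identifier takes three bytes. Second, BLAKE2 takes its output length as a parameter, which is why the blake2b-256 digest above is not a prefix of the blake2b-512 digest. The table `BLAKE2` and the functions `encode_uvarint` and `blake2_multihash` are hypothetical names; the sketch relies on `hashlib.blake2b` and `hashlib.blake2s` from the Python standard library.¶

```python
import hashlib


def encode_uvarint(value: int) -> bytes:
    # Hypothetical helper. Unsigned LEB128: seven bits per byte,
    # continuation bit set on all but the last byte.
    out = bytearray()
    while value > 0x7F:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


# Hypothetical table: BLAKE2 identifiers used in the examples -> (constructor, output bytes).
BLAKE2 = {
    0xB240: (hashlib.blake2b, 64),  # blake2b-512
    0xB220: (hashlib.blake2b, 32),  # blake2b-256
    0xB260: (hashlib.blake2s, 32),  # blake2s-256
    0xB250: (hashlib.blake2s, 16),  # blake2s-128
}


def blake2_multihash(code: int, data: bytes) -> bytes:
    ctor, size = BLAKE2[code]
    # BLAKE2 takes the output size as a parameter, so a 256-bit BLAKE2b digest
    # is computed directly, not by truncating a 512-bit one.
    digest = ctor(data, digest_size=size).digest()
    return encode_uvarint(code) + encode_uvarint(size) + digest


for code in BLAKE2:
    print(hex(code), encode_uvarint(code).hex())
# 0xb240 c0e402
# 0xb220 a0e402
# 0xb260 e0e402
# 0xb250 d0e402
```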
The editors would like to thank the following individuals for feedback on and implementations of the specification (in alphabetical order).¶
The Multihash Identifier Registry contains the hash functions supported by Multihash, each with its canonical name, its identifier in hexadecimal notation, its status, and its specification. The following initial entries should be added to the registry, which is to be created and maintained at the suggested URI http://www.iana.org/assignments/multihash-identifiers:¶
Name | Identifier | Status | Specification |
---|---|---|---|
identity | 0x00 | active | Unknown |
sha1 | 0x11 | active | RFC 6234 [RFC6234] |
sha2-256 | 0x12 | active | RFC 6234 [RFC6234] |
sha2-512 | 0x13 | active | RFC 6234 [RFC6234] |
sha3-512 | 0x14 | active | FIPS 202 [FIPS202] |
sha3-384 | 0x15 | active | FIPS 202 [FIPS202] |
sha3-256 | 0x16 | active | FIPS 202 [FIPS202] |
sha3-224 | 0x17 | active | FIPS 202 [FIPS202] |
sha2-384 | 0x20 | active | RFC 6234 [RFC6234] |
sha2-256-trunc254-padded | 0x1012 | active | RFC 6234 [RFC6234] |
sha2-224 | 0x1013 | active | RFC 6234 [RFC6234] |
sha2-512-224 | 0x1014 | active | RFC 6234 [RFC6234] |
sha2-512-256 | 0x1015 | active | RFC 6234 [RFC6234] |
blake2b-256 | 0xb220 | active | RFC 7693 [RFC7693] |
poseidon-bls12_381-a2-fc1 | 0xb401 | active | Unknown |
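Implementations generally keep a mapping from registry identifiers to hash functions. The Python sketch below covers only the rows of the table above that the standard `hashlib` module always provides; the others (such as sha2-512-224, sha2-512-256, sha2-256-trunc254-padded, and poseidon-bls12_381-a2-fc1) depend on the OpenSSL build or on other libraries and are left out. `DIGESTERS` and `digest_for` are hypothetical names, and treating identity (0x00) as returning the input unchanged follows the usual multicodec convention rather than anything stated in this table.¶

```python
import hashlib
from typing import Callable, Dict

# Hypothetical lookup: registry identifier -> function returning the digest bytes.
DIGESTERS: Dict[int, Callable[[bytes], bytes]] = {
    0x00: lambda data: bytes(data),                        # identity: input unchanged
    0x11: lambda data: hashlib.sha1(data).digest(),        # sha1
    0x12: lambda data: hashlib.sha256(data).digest(),      # sha2-256
    0x13: lambda data: hashlib.sha512(data).digest(),      # sha2-512
    0x14: lambda data: hashlib.sha3_512(data).digest(),    # sha3-512
    0x15: lambda data: hashlib.sha3_384(data).digest(),    # sha3-384
    0x16: lambda data: hashlib.sha3_256(data).digest(),    # sha3-256
    0x17: lambda data: hashlib.sha3_224(data).digest(),    # sha3-224
    0x20: lambda data: hashlib.sha384(data).digest(),      # sha2-384
    0x1013: lambda data: hashlib.sha224(data).digest(),    # sha2-224
    0xB220: lambda data: hashlib.blake2b(data, digest_size=32).digest(),  # blake2b-256
}


def digest_for(code: int, data: bytes) -> bytes:
    """Hypothetical helper: digest of data under the identified hash function."""
    digester = DIGESTERS.get(code)
    if digester is None:
        raise ValueError(f"unsupported multihash identifier {code:#x}")
    return digester(data)


print(len(digest_for(0x16, b"abc")))  # 32
```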
NOTE: The most up to date place for developers to find the table above, plus all multihash headers in "draft" status, is https://github.com/multiformats/multicodec/blob/master/table.csv.¶
This memo registers the "mh" digest-algorithm in the HTTP Digest Algorithm Values registry with the following values:¶
Digest Algorithm: mh¶
Description: The multibase-serialized value of a multihash-supported algorithm.¶
References: this document¶
Status: standard¶
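The "mh" digest algorithm carries a multibase-serialized multihash. The Python sketch below assumes the multibase base64url encoding without padding (prefix character `u`) and the `Digest: <algorithm>=<value>` header form from RFC 3230; multibase defines other bases as well, and the helper names `multibase_base64url` and `mh_digest_header` are hypothetical.¶

```python
import base64
import hashlib


def multibase_base64url(data: bytes) -> str:
    # Hypothetical helper: multibase prefix 'u' denotes base64url without padding.
    return "u" + base64.urlsafe_b64encode(data).decode("ascii").rstrip("=")


def mh_digest_header(body: bytes) -> str:
    # Hypothetical helper: sha2-256 multihash (identifier 0x12, length 0x20),
    # serialized with multibase and placed in an RFC 3230 style Digest header.
    mh = bytes([0x12, 0x20]) + hashlib.sha256(body).digest()
    return "Digest: mh=" + multibase_base64url(mh)


print(mh_digest_header(b"hello world"))
```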
This memo registers the "mh" hash algorithm in the Named Information Hash Algorithm registry with the following values:¶
ID: 49¶
Hash Name String: mh¶
Value Length: variable¶
Reference: this document¶
Status: current¶