Core Lightning implementation of BOLT #11 invoices - part 4
In this live session we try to understand why the description foo bar of an invoice is represented in a BOLT #11 string by vehk7grzv9eq.
Transcript with corrections and improvements
Today we are going to look at the transformation of the information of an invoice from its representation with bytes of 8 bits to bytes of 5 bits.
Specifically, we'll try to understand why the description foo bar of an invoice is represented in a BOLT #11 string by vehk7grzv9eq.
Why bytes of 5 bits?
Because BOLT #11 invoices use the bech32 encoding specified by BIP 173. Its character set contains 32 characters, and 32 is equal to 2 to the power of 5, so each character carries exactly 5 bits.
A BOLT #11 invoice breaks down into 4 parts
A BOLT #11 invoice breaks down into 4 parts, as shown below
HRP         separator  data       checksum
|           |          |          |
lnbcrt100n  1          pj47g4...  r8rsyy
where:
HRP stands for Human Readable Part; here lnbcrt means it is an invoice for the regtest chain and 100n is the amount (100 nano-bitcoin, i.e. 10000msat),
1 is the separator (the last 1 in the string),
then we have the data part,
and finally the checksum (6 characters) as defined by BIP 173.
Let's run some code!
Generate an invoice on regtest with the description "foo bar"
Let's start two Lightning nodes running on the Bitcoin regtest chain by sourcing the script lightning/contrib/startup_regtest.sh provided by the CLN repository and running the command start_ln:
◉ tony@tony:~/clnlive/lightning:[git»(HEAD detached at v23.11)]
$ source contrib/startup_regtest.sh
...
◉ tony@tony:~/clnlive/lightning:[git»(HEAD detached at v23.11)]
$ start_ln
...
We can check that l1-cli is just an alias for lightning-cli with the base directory being /tmp/l1-regtest:
◉ tony@tony:~/clnlive/lightning:[git»(HEAD detached at v23.11)]
$ alias l1-cli
alias l1-cli='/home/tony/clnlive/lightning/cli/lightning-cli --lightning-dir=/tmp/l1-regtest'
Now we can generate a BOLT #11 invoice like this:
◉ tony@tony:~/clnlive/lightning:[git»(HEAD detached at v23.11)]
$ l1-cli invoice 10000 label "foo bar"
{
"payment_hash": "53afc5894cc6a9aed4a547b776ae2d4a541be02189815f8364c98a37497c0856",
"expires_at": 1704985787,
"bolt11": "lnbcrt100n1pjedj3msp5cga03umqm7mrjnjag6cf3vvsay3fj4an6qurmxn9zuts47wjh0qqpp52whutz2vc656a499g7mhdt3dff2phcpp3xq4lqmyex9rwjtupptqdqvvehk7grzv9eqxqyjw5qcqp2fp4pctzqnye37nx89nzl5msq2htxgaa85a6f5qkfws6dfgj48dwv9rtq9qx3qysgqxyxw4gfzgd6zgml6me24m7nr0prgvq8e726ycrj2kjjvuseep7zxrdwqz2utu5x2ejt2rrfhhfgwf7zy35qe2tu42w79y64trhjfkxcqp74tlj",
"payment_secret": "c23af8f360dfb6394e5d46b098b190e9229957b3d0383d9a6517170af9d2bbc0",
"created_index": 1,
"warning_capacity": "Insufficient incoming channel capacity to pay invoice"
}
Break down the BOLT #11 invoice into 4 parts
We can break down the previous invoice
lnbcrt100n1pjedj3msp5cga03umqm7mrjnjag6cf3vvsay3fj4an6qurmxn9zuts47wjh0qqpp52whutz2vc656a499g7mhdt3dff2phcpp3xq4lqmyex9rwjtupptqdqvvehk7grzv9eqxqyjw5qcqp2fp4pctzqnye37nx89nzl5msq2htxgaa85a6f5qkfws6dfgj48dwv9rtq9qx3qysgqxyxw4gfzgd6zgml6me24m7nr0prgvq8e726ycrj2kjjvuseep7zxrdwqz2utu5x2ejt2rrfhhfgwf7zy35qe2tu42w79y64trhjfkxcqp74tlj
into the following 4 parts:
hrp: lnbcrt100n,
separator: 1,
data: pjedj3msp5cga03umqm7mrjnjag6cf3vvsay3fj4an6qurmxn9zuts47wjh0qqpp52whutz2vc656a499g7mhdt3dff2phcpp3xq4lqmyex9rwjtupptqdqvvehk7grzv9eqxqyjw5qcqp2fp4pctzqnye37nx89nzl5msq2htxgaa85a6f5qkfws6dfgj48dwv9rtq9qx3qysgqxyxw4gfzgd6zgml6me24m7nr0prgvq8e726ycrj2kjjvuseep7zxrdwqz2utu5x2ejt2rrfhhfgwf7zy35qe2tu42w79y64trhjfkxcq,
checksum: p74tlj.
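The split above can be sketched in a few lines of Python. This is only an illustration and not how CLN does it: the function name split_bolt11 is made up for this example, and the code assumes the input is a well-formed invoice (it does not verify the checksum).

```python
def split_bolt11(invoice: str):
    """Split a BOLT #11 string into (hrp, separator, data, checksum).

    Hypothetical helper for illustration only: no validation is done.
    """
    sep = invoice.rfind("1")      # the LAST "1" in the string is the separator
    hrp = invoice[:sep]
    payload = invoice[sep + 1:]   # data part followed by the checksum
    return hrp, invoice[sep], payload[:-6], payload[-6:]


# The invoice generated above, cut into pieces only to keep lines short.
invoice = (
    "lnbcrt100n1pjedj3m"
    "sp5cga03umqm7mrjnjag6cf3vvsay3fj4an6qurmxn9zuts47wjh0qq"
    "pp52whutz2vc656a499g7mhdt3dff2phcpp3xq4lqmyex9rwjtupptq"
    "dqvvehk7grzv9eq"
    "xqyjw5qcqp2fp4pctzqnye37nx89nzl5msq2htxgaa85a6f5qkfws6dfgj48dwv9rtq9qx3qysgq"
    "xyxw4gfzgd6zgml6me24m7nr0prgvq8e726ycrj2kjjvuseep7zxrdwqz2utu5x2ejt2rrfhhfgwf7zy35qe2tu42w79y64trhjfkxcq"
    "p74tlj"
)

hrp, separator, data, checksum = split_bolt11(invoice)
print(hrp)        # lnbcrt100n
print(separator)  # 1
print(checksum)   # p74tlj
```

Searching from the right matters here because the HRP lnbcrt100n itself contains the character 1, while the bech32 character set used after the separator never does.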
timestamp
The first 7 characters of the data part of the BOLT #11 invoice (7 x 5 = 35 bits) are the timestamp: pjedj3m.
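To read that timestamp, each character is replaced by its 5-bit value and the values are combined big-endian. A minimal sketch, where the helper name bech32_to_int is an assumption of this example:

```python
# bech32 character set from BIP 173: the index of a character is its 5-bit value.
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"


def bech32_to_int(chars: str) -> int:
    """Interpret bech32 characters as one big-endian integer (5 bits each)."""
    value = 0
    for c in chars:
        value = value * 32 + CHARSET.index(c)
    return value


timestamp = bech32_to_int("pjedj3m")
print(timestamp)               # 1704380987
print(1704985787 - timestamp)  # 604800
```

The second value uses the expires_at field returned by the invoice command: the invoice expires 604800 seconds (7 days) after the timestamp encoded in the string.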
s (secret) and p (payment_hash) tagged fields
After removing the timestamp from the data part of the BOLT #11 invoice, we are left with that string:
sp5cga03umqm7mrjnjag6cf3vvsay3fj4an6qurmxn9zuts47wjh0qqpp52whutz2vc656a499g7mhdt3dff2phcpp3xq4lqmyex9rwjtupptqdqvvehk7grzv9eqxqyjw5qcqp2fp4pctzqnye37nx89nzl5msq2htxgaa85a6f5qkfws6dfgj48dwv9rtq9qx3qysgqxyxw4gfzgd6zgml6me24m7nr0prgvq8e726ycrj2kjjvuseep7zxrdwqz2utu5x2ejt2rrfhhfgwf7zy35qe2tu42w79y64trhjfkxcq
That string starts with s, so it corresponds to the tagged field for the secret.
The next two characters, corresponding to the length (big-endian) of the data for the secret, are p5.
Why?
Because 52 in decimal is 1 x 32 + 20, and in the bech32 character set the value 1 is written p and the value 20 is written 5. So 52 corresponds to p5 (big-endian) in bech32.
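The mapping from 52 to p5 can be sketched as follows. The function name encode_data_length is hypothetical; the only facts used are the bech32 character set and the 10-bit big-endian data_length of BOLT #11.

```python
# bech32 character set from BIP 173: the index of a character is its 5-bit value.
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"


def encode_data_length(n: int) -> str:
    """Encode a data_length (10 bits, big-endian) as two bech32 characters."""
    if not 0 <= n <= 1023:
        raise ValueError("data_length must fit in 10 bits")
    high = n >> 5   # the 5 most significant bits
    low = n & 31    # the 5 least significant bits
    return CHARSET[high] + CHARSET[low]


print(encode_data_length(52))  # p5  (52 = 1 * 32 + 20)
print(encode_data_length(12))  # qv  (12 = 0 * 32 + 12)
```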
But why is the data length of the secret 52?
Because the secret is a 256-bit value. To be represented using the bech32 character set, those 256 bits must be padded with four 0s, as described in the BOLT #11 spec, to reach a length of 260, a number divisible by 5.
Specifically, 260 divided by 5 is equal to 52.
Finally, the data part of the s tagged field is:
cga03umqm7mrjnjag6cf3vvsay3fj4an6qurmxn9zuts47wjh0qq
In bits we have the following:
[5 bits] s
[10 bits] p5
[260 bits] cga03umqm7mrjnjag6cf3vvsay3fj4an6qurmxn9zuts47wjh0qq
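As a check, we can take the payment_secret returned by the invoice command, pad its 256 bits with 0s up to a multiple of 5 and map each 5-bit group to a bech32 character. This is a sketch working on bit strings for readability; the helper name bytes_to_bech32 is an assumption of this example, not a CLN function.

```python
# bech32 character set from BIP 173: the index of a character is its 5-bit value.
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"


def bytes_to_bech32(data: bytes) -> str:
    """Regroup 8-bit bytes into 5-bit groups, padding the end with 0 bits."""
    bits = "".join(f"{byte:08b}" for byte in data)
    bits += "0" * (-len(bits) % 5)  # 256 bits -> 260 bits
    groups = [bits[i:i + 5] for i in range(0, len(bits), 5)]
    return "".join(CHARSET[int(group, 2)] for group in groups)


secret = bytes.fromhex(
    "c23af8f360dfb6394e5d46b098b190e9229957b3d0383d9a6517170af9d2bbc0"
)
encoded = bytes_to_bech32(secret)
print(len(encoded))  # 52
print(encoded)
```

The printed string should be the data of the s tagged field shown above, starting with cga03umq and ending with h0qq.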
After removing the secret tagged field from the data part of the BOLT #11, we are left with that string:
pp52whutz2vc656a499g7mhdt3dff2phcpp3xq4lqmyex9rwjtupptqdqvvehk7grzv9eqxqyjw5qcqp2fp4pctzqnye37nx89nzl5msq2htxgaa85a6f5qkfws6dfgj48dwv9rtq9qx3qysgqxyxw4gfzgd6zgml6me24m7nr0prgvq8e726ycrj2kjjvuseep7zxrdwqz2utu5x2ejt2rrfhhfgwf7zy35qe2tu42w79y64trhjfkxcq
That string starts with p, so it corresponds to the payment_hash tagged field. We can proceed as we did for the s tagged field.
In bits we have the following:
[5 bits] p
[10 bits] p5
[260 bits] 2whutz2vc656a499g7mhdt3dff2phcpp3xq4lqmyex9rwjtupptq
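Reading a tagged field always follows the same three steps (type, data_length, data), so it can be sketched as one small function. The names read_tagged_field and bech32_to_bytes are hypothetical, and the input below is only the p field followed by the d field, not the whole data part.

```python
# bech32 character set from BIP 173: the index of a character is its 5-bit value.
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"


def read_tagged_field(s: str):
    """Return (type, data_length, data, rest) for the tagged field at the start of s."""
    tag = s[0]                                               # 5 bits
    length = CHARSET.index(s[1]) * 32 + CHARSET.index(s[2])  # 10 bits, big-endian
    data = s[3:3 + length]                                   # length x 5 bits
    return tag, length, data, s[3 + length:]


def bech32_to_bytes(chars: str) -> bytes:
    """Regroup 5-bit characters into 8-bit bytes, dropping the trailing padding bits."""
    bits = "".join(f"{CHARSET.index(c):05b}" for c in chars)
    usable = len(bits) - len(bits) % 8
    return bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))


remaining = (
    "pp52whutz2vc656a499g7mhdt3dff2phcpp3xq4lqmyex9rwjtupptq"
    "dqvvehk7grzv9eq"
)
tag, length, data, rest = read_tagged_field(remaining)
print(tag, length)                  # p 52
print(bech32_to_bytes(data).hex())  # should match payment_hash in the JSON output
print(rest)                         # dqvvehk7grzv9eq
```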
d (description) tagged field
After removing the payment_hash from the data part of the BOLT #11 invoice, we are left with that string:
dqvvehk7grzv9eqxqyjw5qcqp2fp4pctzqnye37nx89nzl5msq2htxgaa85a6f5qkfws6dfgj48dwv9rtq9qx3qysgqxyxw4gfzgd6zgml6me24m7nr0prgvq8e726ycrj2kjjvuseep7zxrdwqz2utu5x2ejt2rrfhhfgwf7zy35qe2tu42w79y64trhjfkxcq
That string starts with d, so it corresponds to the tagged field of the description.
The next two characters, corresponding to the length (big-endian) of the data of the description, are qv.
As qv (big-endian) in bech32 corresponds to 12, the data part of the description is:
vehk7grzv9eq
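Why is the description foo bar represented by vehk7grzv9eq?
The 7 characters of foo bar are ASCII, so they take 7 bytes in UTF-8, that is 56 bits. Padded with four 0s to reach 60 bits and cut into 12 groups of 5 bits, they map to the 12 bech32 characters vehk7grzv9eq.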
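The transformation of foo bar can be followed step by step with a short script. It is a sketch that works on a string of bits for readability rather than speed; the variable names are choices of this example.

```python
# bech32 character set from BIP 173: the index of a character is its 5-bit value.
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"

description = "foo bar"
raw = description.encode("utf-8")           # 7 bytes: 66 6f 6f 20 62 61 72
bits = "".join(f"{byte:08b}" for byte in raw)
print(len(bits))                            # 56

bits += "0" * (-len(bits) % 5)              # pad with 0s to a multiple of 5
print(len(bits))                            # 60

groups = [bits[i:i + 5] for i in range(0, len(bits), 5)]
values = [int(group, 2) for group in groups]
print(values)   # [12, 25, 23, 22, 30, 8, 3, 2, 12, 5, 25, 0]

encoded = "".join(CHARSET[v] for v in values)
print(encoded)  # vehk7grzv9eq
```

For example, the first byte 01100110 (f) and the start of the second byte 01101111 (o) give the groups 01100 (12, written v) and 11001 (25, written e).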
Terminal session
We ran the following commands in this order:
$ source contrib/startup_regtest.sh
$ start_ln
$ alias l1-cli
$ l1-cli invoice 10000 label "foo bar"
And below you can read the terminal session (command lines and outputs):
◉ tony@tony:~/clnlive/lightning:[git»(HEAD detached at v23.11)]
$ source contrib/startup_regtest.sh
...
◉ tony@tony:~/clnlive/lightning:[git»(HEAD detached at v23.11)]
$ start_ln
...
◉ tony@tony:~/clnlive/lightning:[git»(HEAD detached at v23.11)]
$ alias l1-cli
alias l1-cli='/home/tony/clnlive/lightning/cli/lightning-cli --lightning-dir=/tmp/l1-regtest'
◉ tony@tony:~/clnlive/lightning:[git»(HEAD detached at v23.11)]
$ l1-cli invoice 10000 label "foo bar"
{
"payment_hash": "53afc5894cc6a9aed4a547b776ae2d4a541be02189815f8364c98a37497c0856",
"expires_at": 1704985787,
"bolt11": "lnbcrt100n1pjedj3msp5cga03umqm7mrjnjag6cf3vvsay3fj4an6qurmxn9zuts47wjh0qqpp52whutz2vc656a499g7mhdt3dff2phcpp3xq4lqmyex9rwjtupptqdqvvehk7grzv9eqxqyjw5qcqp2fp4pctzqnye37nx89nzl5msq2htxgaa85a6f5qkfws6dfgj48dwv9rtq9qx3qysgqxyxw4gfzgd6zgml6me24m7nr0prgvq8e726ycrj2kjjvuseep7zxrdwqz2utu5x2ejt2rrfhhfgwf7zy35qe2tu42w79y64trhjfkxcqp74tlj",
"payment_secret": "c23af8f360dfb6394e5d46b098b190e9229957b3d0383d9a6517170af9d2bbc0",
"created_index": 1,
"warning_capacity": "Insufficient incoming channel capacity to pay invoice"
}
BOLT #11 - Data Part
This section is taken from bolts:11-payment-encoding.md.
The data part of a Lightning invoice consists of multiple sections:
timestamp: seconds-since-1970 (35 bits, big-endian)
zero or more tagged parts
signature: Bitcoin-style signature of above (520 bits)
Tagged Fields
Each Tagged Field is of the form:
type (5 bits)
data_length (10 bits, big-endian)
data (data_length x 5 bits)
Note that the maximum length of a Tagged Field's data is constricted by the maximum value of data_length. This is 1023 x 5 bits, or 639 bytes.
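The 639 bytes figure follows from simple arithmetic on the 10-bit data_length, which can be checked like this (the variable names are choices of this example):

```python
DATA_LENGTH_BITS = 10
max_data_length = 2 ** DATA_LENGTH_BITS - 1  # largest value of data_length
max_bits = max_data_length * 5               # each unit is one 5-bit character
max_bytes = max_bits // 8                    # whole 8-bit bytes that fit

print(max_data_length)  # 1023
print(max_bits)         # 5115
print(max_bytes)        # 639
```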
Currently defined tagged fields are:
p (1): data_length 52. 256-bit SHA256 payment_hash. Preimage of this provides proof of payment.
s (16): data_length 52. This 256-bit secret prevents forwarding nodes from probing the payment recipient.
d (13): data_length variable. Short description of purpose of payment (UTF-8), e.g. '1 cup of coffee' or 'ナンセンス 1杯'
m (27): ...
n (19): ...
h (23): data_length 52. 256-bit description of purpose of payment (SHA256). This is used to commit to an associated description that is over 639 bytes, but the transport mechanism for the description in that case is transport specific and not defined here.
x (6): ...
c (24): data_length variable. min_final_cltv_expiry_delta to use for the last HTLC in the route. Default is 18 if not specified.
f (9): ...
r (3): ...
9 (5): ...
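The number in parentheses after each tag is simply the 5-bit value of the tag character in the bech32 character set. A quick check, where the dictionary TAGS just copies the list above:

```python
# bech32 character set from BIP 173: the index of a character is its 5-bit value.
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"

# Tag characters and their numbers, copied from the list of tagged fields.
TAGS = {
    "p": 1, "s": 16, "d": 13, "m": 27, "n": 19, "h": 23,
    "x": 6, "c": 24, "f": 9, "r": 3, "9": 5,
}

for tag, number in TAGS.items():
    value = CHARSET.index(tag)
    print(tag, number, f"{value:05b}", value == number)
```

Every line should end with True: for instance d is at index 13 of the character set, so the description field starts with the 5 bits 01101.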
Requirements
A writer:
MUST include exactly one p field.
MUST include exactly one s field.
MUST set payment_hash to the SHA2 256-bit hash of the payment_preimage that will be given in return for payment.
MUST include either exactly one d or exactly one h field.
  if d is included:
    MUST set d to a valid UTF-8 string.
    SHOULD use a complete description of the purpose of the payment.
  if h is included: ...
MUST include one c field (min_final_cltv_expiry_delta).
...
if there is NOT a public channel associated with its public key:
  MUST include at least one r field.
...
MUST pad field data to a multiple of 5 bits, using 0s.
...
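Some of these writer requirements are simple counting rules on the tags present in an invoice. The sketch below checks only the p, s, d/h and c rules quoted above; the function check_writer_rules and the example tag lists are hypothetical, and the elided requirements (including the conditional r rule) are not covered.

```python
from collections import Counter


def check_writer_rules(tags):
    """Return a list of violated rules for the given sequence of tag characters."""
    counts = Counter(tags)
    problems = []
    if counts["p"] != 1:
        problems.append("need exactly one p field")
    if counts["s"] != 1:
        problems.append("need exactly one s field")
    # Exactly one d or exactly one h, but not both and not neither.
    if counts["d"] + counts["h"] != 1:
        problems.append("need exactly one d or exactly one h field")
    # Assumption of this sketch: "one c field" is read as "at least one".
    if counts["c"] == 0:
        problems.append("need a c field")
    return problems


print(check_writer_rules(["s", "p", "d", "c"]))  # []
print(check_writer_rules(["p", "d", "h", "c"]))
```

The second call reports two problems: the s field is missing, and both d and h are present.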
Bech32 as defined in BIP 0173
This section is taken from https://en.bitcoin.it/wiki/BIP_0173.
A Bech32 string is at most 90 characters long and consists of:
The human-readable part, which is intended to convey the type of data, or anything else that is relevant to the reader. This part MUST contain 1 to 83 US-ASCII characters, with each character having a value in the range [33-126]. HRP validity may be further restricted by specific applications.
The separator, which is always "1". In case "1" is allowed inside the human-readable part, the last one in the string is the separator.
The data part, which is at least 6 characters long and only consists of alphanumeric characters excluding "1", "b", "i", and "o".
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|-----+---+---+---+---+---+---+---+---|
| +0 | q | p | z | r | y | 9 | x | 8 |
| +8 | g | f | 2 | t | v | d | w | 0 |
| +16 | s | 3 | j | n | 5 | 4 | k | h |
| +24 | c | e | 6 | m | u | a | 7 | l |
The last six characters of the data part form a checksum and contain no information.
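The table above is just the 32-character bech32 alphabet laid out in rows of 8: the value of a character is its row offset plus its column. A small sketch that rebuilds the rows and checks the properties stated above:

```python
# bech32 character set from BIP 173, in value order 0..31.
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"

print(len(CHARSET) == 2 ** 5)  # True: one character encodes exactly 5 bits

# Rebuild the rows +0, +8, +16 and +24 of the table.
rows = {offset: CHARSET[offset:offset + 8] for offset in range(0, 32, 8)}
for offset, row in rows.items():
    print(f"+{offset:<2}", " ".join(row))

# "1", "b", "i" and "o" are excluded from the data part.
print(set("1bio") & set(CHARSET))  # set()

# Reading the table: row +16, column 4 is the character for 16 + 4 = 20.
print(rows[16][4])  # 5
```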
UTF-8
This section is taken from https://en.wikipedia.org/wiki/UTF-8.
Since the restriction of the Unicode code-space to 21-bit values in 2003, UTF-8 is defined to encode code points in one to four bytes, depending on the number of significant bits in the numerical value of the code point.
The x characters are replaced by the bits of the code point.
Code point ↔ UTF-8 conversion
| First code point | Last code point | Byte 1 | Byte 2 | Byte 3 | Byte 4 |
| U+0000 | U+007F | 0xxxxxxx | | | |
| U+0080 | U+07FF | 110xxxxx | 10xxxxxx | | |
| U+0800 | U+FFFF | 1110xxxx | 10xxxxxx | 10xxxxxx | |
| U+10000 | U+10FFFF | 11110xxx | 10xxxxxx | 10xxxxxx | 10xxxxxx |
The first 128 code points (US-ASCII) need one byte.
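The four rows of the conversion table can be observed with Python's built-in UTF-8 encoder. The sample characters are choices of this example, one per row of the table:

```python
# One sample character per row of the table, written as escapes.
samples = {
    "U+0066": "f",            # ASCII letter f
    "U+00E9": "\u00e9",       # Latin small letter e with acute
    "U+676F": "\u676f",       # CJK character from the BOLT #11 example
    "U+1F600": "\U0001F600",  # an emoji outside the 16-bit range
}

lengths = {}
for code_point, char in samples.items():
    encoded = char.encode("utf-8")
    lengths[code_point] = len(encoded)
    print(code_point, len(encoded), " ".join(f"{byte:08b}" for byte in encoded))

# Every character of "foo bar" is ASCII, so the description takes 7 bytes.
print(len("foo bar".encode("utf-8")))  # 7
```

The leading bits of each printed byte follow the patterns of the table: 0 for a single byte, 110, 1110 or 11110 for the first byte of a longer sequence, and 10 for each continuation byte.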
ASCII
This section is taken from https://man7.org/linux/man-pages/man7/ascii.7.html.
The following table contains the 128 ASCII characters:
C program '\X' escapes are noted.
Oct Dec Hex Char Oct Dec Hex Char
────────────────────────────────────────────────────────────────────────
000 0 00 NUL '\0' (null character) 100 64 40 @
001 1 01 SOH (start of heading) 101 65 41 A
002 2 02 STX (start of text) 102 66 42 B
003 3 03 ETX (end of text) 103 67 43 C
004 4 04 EOT (end of transmission) 104 68 44 D
005 5 05 ENQ (enquiry) 105 69 45 E
006 6 06 ACK (acknowledge) 106 70 46 F
007 7 07 BEL '\a' (bell) 107 71 47 G
010 8 08 BS '\b' (backspace) 110 72 48 H
011 9 09 HT '\t' (horizontal tab) 111 73 49 I
012 10 0A LF '\n' (new line) 112 74 4A J
013 11 0B VT '\v' (vertical tab) 113 75 4B K
014 12 0C FF '\f' (form feed) 114 76 4C L
015 13 0D CR '\r' (carriage ret) 115 77 4D M
016 14 0E SO (shift out) 116 78 4E N
017 15 0F SI (shift in) 117 79 4F O
020 16 10 DLE (data link escape) 120 80 50 P
021 17 11 DC1 (device control 1) 121 81 51 Q
022 18 12 DC2 (device control 2) 122 82 52 R
023 19 13 DC3 (device control 3) 123 83 53 S
024 20 14 DC4 (device control 4) 124 84 54 T
025 21 15 NAK (negative ack.) 125 85 55 U
026 22 16 SYN (synchronous idle) 126 86 56 V
027 23 17 ETB (end of trans. blk) 127 87 57 W
030 24 18 CAN (cancel) 130 88 58 X
031 25 19 EM (end of medium) 131 89 59 Y
032 26 1A SUB (substitute) 132 90 5A Z
033 27 1B ESC (escape) 133 91 5B [
034 28 1C FS (file separator) 134 92 5C \ '\\'
035 29 1D GS (group separator) 135 93 5D ]
036 30 1E RS (record separator) 136 94 5E ^
037 31 1F US (unit separator) 137 95 5F _
040 32 20 SPACE 140 96 60 `
041 33 21 ! 141 97 61 a
042 34 22 " 142 98 62 b
043 35 23 # 143 99 63 c
044 36 24 $ 144 100 64 d
045 37 25 % 145 101 65 e
046 38 26 & 146 102 66 f
047 39 27 ' 147 103 67 g
050 40 28 ( 150 104 68 h
051 41 29 ) 151 105 69 i
052 42 2A * 152 106 6A j
053 43 2B + 153 107 6B k
054 44 2C , 154 108 6C l
055 45 2D - 155 109 6D m
056 46 2E . 156 110 6E n
057 47 2F / 157 111 6F o
060 48 30 0 160 112 70 p
061 49 31 1 161 113 71 q
062 50 32 2 162 114 72 r
063 51 33 3 163 115 73 s
064 52 34 4 164 116 74 t
065 53 35 5 165 117 75 u
066 54 36 6 166 118 76 v
067 55 37 7 167 119 77 w
070 56 38 8 170 120 78 x
071 57 39 9 171 121 79 y
072 58 3A : 172 122 7A z
073 59 3B ; 173 123 7B {
074 60 3C < 174 124 7C |
075 61 3D = 175 125 7D }
076 62 3E > 176 126 7E ~
077 63 3F ? 177 127 7F DEL
For convenience, below are more compact tables in hex and decimal.
   2 3 4 5 6 7       30 40 50 60 70 80 90 100 110 120
 -------------      ---------------------------------
0:   0 @ P ` p     0:    (  2  <  F  P  Z  d   n   x
1: ! 1 A Q a q     1:    )  3  =  G  Q  [  e   o   y
2: " 2 B R b r     2:    *  4  >  H  R  \  f   p   z
3: # 3 C S c s     3: !  +  5  ?  I  S  ]  g   q   {
4: $ 4 D T d t     4: "  ,  6  @  J  T  ^  h   r   |
5: % 5 E U e u     5: #  -  7  A  K  U  _  i   s   }
6: & 6 F V f v     6: $  .  8  B  L  V  `  j   t   ~
7: ' 7 G W g w     7: %  /  9  C  M  W  a  k   u  DEL
8: ( 8 H X h x     8: &  0  :  D  N  X  b  l   v
9: ) 9 I Y i y     9: '  1  ;  E  O  Y  c  m   w
A: * : J Z j z
B: + ; K [ k {
C: , < L \ l |
D: - = M ] m }
E: . > N ^ n ~
F: / ? O _ o DEL
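Instead of reading the tables by hand, the codes of the characters of foo bar can be looked up with Python's ord function. This is only a convenience sketch; the values are the same as in the tables above.

```python
text = "foo bar"

codes = [ord(char) for char in text]
hex_codes = [f"{code:02X}" for code in codes]

for char, code in zip(text, codes):
    # Character, then decimal, hexadecimal and binary as in the ASCII tables.
    print(repr(char), code, f"{code:02X}", f"{code:08b}")

print(codes)      # [102, 111, 111, 32, 98, 97, 114]
print(hex_codes)  # ['66', '6F', '6F', '20', '62', '61', '72']
```

These 7 bytes are the 8-bit representation that is regrouped into 5-bit values to obtain vehk7grzv9eq.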