A common chant from many in this space these days in response to any discussion of Bitcoin protocol changes is: “Don’t mess with Layer 1! You can just build it on Layer 2!” This seems like a very logical thing to do, right? Why risk the security and stability of L1 when you can just build on top of it? The problem is that this fundamentally misunderstands the relationship between Layer 1 and Layer 2.
An L2 protocol is an extension of the L1. Everything an L2 is designed to do must ultimately boil down to what the L1 is capable of. The blanket statement of “just do it on L2!” obscures numerous implicit realities of what can or cannot be done on an L2 given the current state of the base layer. For example, imagine trying to build the Lightning Network without the existence of multi-signature scripts. You couldn’t. It wouldn’t be possible to share control of coins between more than one person, and the whole concept of a payment channel wouldn’t be possible.
The evolution of payment channels
The whole reason that payment channels can exist in the first place is due to the fact that Bitcoin’s L1 supports the ability for multiple people to share control of a UTXO with a multisig script. What is possible on an L2 is inherently limited by what is possible on an L1; yes, of course it is possible to do things on L2 that are not possible on L1, but the ultimate limiting factor of what you can do off-chain is what is possible on-chain. Faster payment confirmation in a payment channel is only possible because on-chain custody can be shared between multiple people.
But even that is not enough for a secure payment channel. The original payment channels had a pre-signed transaction using an nLockTime timelock that gave the funder their money back after so many blocks, and they only supported one-way payments. Transaction malleability made these original payment channels unsafe to use. If the funding transaction was malleated by anyone before it confirmed, the refund transaction would become invalid and the funder would have no way to recover their money. The other party in the channel could effectively hold their money hostage.
CHECKLOCKTIMEVERIFY (CLTV), the absolute timelock opcode, was the solution. CLTV allows you to make a coin unspendable until a certain block height or time in the future. This, combined with the ability to create scripts that can be spent in multiple ways, allowed the multisig UTXO to have a script path where the funder could spend all the money by themselves after a timelock. This guaranteed that the funder would be able to recover the money in the worst-case scenario, even if the funding transaction was malleated. However, the channel could still only facilitate one-way payments.
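The two spending paths described above can be sketched as a toy model. This is not Bitcoin Script, and it ignores the exact block-inclusion rules for locktimes; the names `ChannelOutput` and `can_spend` are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChannelOutput:
    """Toy stand-in for a one-way channel's funding output (hypothetical)."""
    funder: str
    recipient: str
    refund_height: int  # absolute block height, in the spirit of CLTV


def can_spend(output: ChannelOutput, signers: set, current_height: int) -> bool:
    """Return True if the given signers can spend the output at this height."""
    # Cooperative path: both channel parties sign (2-of-2 multisig).
    if {output.funder, output.recipient} <= set(signers):
        return True
    # Refund path: the funder alone, but only once the timelock has passed.
    if output.funder in signers and current_height >= output.refund_height:
        return True
    return False
```

Because the refund path lives in the output's own script rather than in a pre-signed transaction, it does not depend on the funding transaction's ID, which is why malleation no longer strands the funder's money.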
To make two-way payments possible, a proper solution for transaction malleability was necessary. This was a huge motivator for Segregated Witness. A timelock is all that was needed for a one-way channel because the money only moved in one direction. The only risk for the sender was that the other party would never claim on-chain what had already been sent to them, leaving the rest of the sender’s money trapped. The timelocked refund gave the recipient an incentive to claim the money on-chain before the timelock expired, after which they would lose all the money they had already been sent, and it gave the sender a worst-case recourse in case something happened that caused the recipient to go offline permanently. Script does not support enforcing that specific amounts go to specific future scripts, though, so a pre-signed transaction is the only viable initial refund mechanism if payments need to flow both ways. This reopened the risk of funds being held hostage.
This problem was resolved with the Segwit upgrade. In place of the timelocked refund that encourages honest behavior, the penalty key was introduced. Because money in a two-way channel can flow back and forth in either direction, there will inevitably be a case where both parties had more money in a previous state of the channel than in the current one. By setting up a branch in each channel state’s pre-signed transaction that uses a penalty key, users can exchange those keys after signing the new state and know that if the other party tries to use an old transaction, they can claim 100% of the money in the channel. Timelocks are used to ensure that the normal spending path, where users withdraw their respective balances, is not valid for a while, giving channel parties the opportunity to use the penalty key if necessary. There’s a catch, though: using CLTV means the channel has to close at some predetermined point in the future, otherwise the timelock will expire and you will no longer have that safety period to punish the dishonest party.
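A minimal sketch of the penalty idea, under heavy simplification: there are no real keys or transactions here, only a record of which states have been superseded. The class `ToyChannel` and its methods are hypothetical, and the model assumes the cheated party notices and reacts within the timelock window.

```python
class ToyChannel:
    """Toy two-party channel that tracks revoked (superseded) states."""

    def __init__(self, alice_balance: int, bob_balance: int):
        self.states = [(alice_balance, bob_balance)]
        # Indices of old states whose penalty keys have been exchanged.
        self.revoked = set()

    def update(self, alice_balance: int, bob_balance: int) -> None:
        """Sign a new state and revoke the previous one."""
        if alice_balance + bob_balance != sum(self.states[-1]):
            raise ValueError("channel capacity must stay constant")
        self.revoked.add(len(self.states) - 1)
        self.states.append((alice_balance, bob_balance))

    def settle(self, broadcaster: str, state_index: int) -> tuple:
        """Return final (alice, bob) payouts if `broadcaster` publishes a state."""
        alice, bob = self.states[state_index]
        if state_index in self.revoked:
            # Old state published: the counterparty uses the penalty key
            # and takes the entire channel balance.
            total = alice + bob
            return (0, total) if broadcaster == "alice" else (total, 0)
        return (alice, bob)
```

For example, in a channel that went from (10, 0) to (4, 6) to (7, 3), publishing the latest state pays out (7, 3), while Alice publishing the first state, where she held everything, leaves her with nothing.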
Bi-directional payment channels also needed CHECKSEQUENCEVERIFY (CSV), or relative timelocks, to solve this problem. Unlike CLTV, which specifies a specific time or block height in the future, CSV specifies a relative length of time or number of blocks, counted from the time or block in which the UTXO using CSV in its script is confirmed in the blockchain. This allowed the safety window for using penalty keys to function without requiring channels to close on-chain at a predetermined time.
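The difference between the two timelock types can be illustrated with a little arithmetic. This is a sketch with hypothetical function names and block heights, and it measures only how many blocks the cheated party has to respond after a channel state confirms.

```python
def dispute_window_cltv(lock_height: int, close_height: int) -> int:
    """Blocks left to react when the delay is an absolute height (CLTV-style).

    The window shrinks as the chain approaches lock_height and is gone after it.
    """
    return max(0, lock_height - close_height)


def dispute_window_csv(delay_blocks: int, close_height: int) -> int:
    """Blocks left to react when the delay is relative (CSV-style).

    The clock starts when the closing transaction confirms, so the window
    is the same no matter how old the channel is.
    """
    return delay_blocks
```

With an absolute lock at height 1000, a close at height 900 leaves 100 blocks to respond and a close at height 1200 leaves none. A relative delay of 144 blocks leaves 144 blocks in both cases.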
However, even this doesn’t get us the Lightning Network. There’s still no way to actually route a payment across multiple payment channels. They can make payments in both directions, but only between the two people involved in the channel. To route payments across multiple channels you need, you guessed it, yet another functionality of the L1. Hash Time Locked Contracts are how this is achieved, and they require both CLTV and hashlocks. Hashlocks require the preimage of a hash to be provided in order to spend the coins. It’s similar to a signature, except you’re actually just revealing the “private key” instead of signing with it. This allows the recipient to provide a hash in a Lightning payment, and each intermediate channel between sender and recipient creates a script that allows the money to be spent immediately with the hash preimage, or refunded after a timelock. If the recipient reveals the preimage, everyone along the route can claim the money for forwarding the payment. If not, the money can be reclaimed and refunded without the payment completing.
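The hashlock-or-timelock choice can be sketched in a few lines. This is a toy model rather than Bitcoin Script: the function names are hypothetical, signatures are left out entirely, and SHA-256 is assumed as the hash function for the purposes of illustration.

```python
import hashlib
from typing import Optional


def payment_hash(preimage: bytes) -> bytes:
    """Hash that the recipient hands out; the preimage stays secret."""
    return hashlib.sha256(preimage).digest()


def htlc_spend(hash_lock: bytes, expiry_height: int, current_height: int,
               preimage: Optional[bytes] = None) -> Optional[str]:
    """Return who can claim the HTLC output right now, or None if nobody can."""
    # Hashlock path: revealing the correct preimage claims the money at once.
    if preimage is not None and hashlib.sha256(preimage).digest() == hash_lock:
        return "recipient"
    # Timelock path: after expiry the money goes back to the sender.
    if current_height >= expiry_height:
        return "sender"
    return None
```

Every hop along a route locks funds to the same hash, so once the preimage is revealed at the final hop, each intermediary can use it to claim from the hop before it.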
The Lightning Network as it exists today is therefore completely dependent on five functionalities of Bitcoin’s base layer: multisignature scripts, absolute timelocks, relative timelocks, Segregated Witness, and hashlocks. Without any one of these features on L1, Lightning as we know it wouldn’t be an L2 that we could build. Its existence as an L2 is completely dependent on L1’s ability to do certain things. So if, in a world with a Bitcoin that doesn’t support hashlocks or timelocks in scripts and has no malleability fix, you were to just say “Just build a bi-directional multi-hop payment channel system on Layer 2! We shouldn’t mess with Layer 1”, that would be a completely incoherent statement.
The catch
That said, strictly speaking, it would still have been possible to build that bi-directional multi-hop payment channel system in that world without those three features on L1. At the enormous cost of having to trust other people not to steal your money when they are able to do so. A federated sidechain. Anyone could have just set up a federated chain like Liquid or Rootstock, added these features to the sidechain, and built the Lightning Network there instead of on the main chain. The problem with that is that it’s not the same. On a technical level, the network would function exactly the same, but no one using it would actually have the same degree of control over their coins.
When they closed a Lightning channel it would settle on a sidechain backed by a federation, i.e. it would just be a ledger entry on top of someone else’s multisig wallet, with no way for you to move those coins to L1 under your own control. You just have to trust that the distributed group running the federation isn’t screwing everyone over. Even drivechains (which ironically require new L1 functionality) are ultimately just another form of federation, with some additional restrictions added to the withdrawal process. The federation simply consists of miners instead of people who hold private keys.
This is the implicit reality, whether they understand it or not, that underlies the response “just build it on L2!” when someone discusses improvements to L1. There is the scope of what is already possible to build on L2, which is quite limited and constrained by its own scaling limitations, and then there is the scope of what is not yet possible. Anything that falls into the latter category is impossible to build without the involvement of a trusted entity or group of entities that ultimately controls users’ money.
What’s the point?
“Layer 2” is not a magical incantation. You can’t just wave a magic wand and chant the words, and have anything and everything magically become possible. There are strict, inescapable limitations on what an L2 can achieve, and those limitations are set by what the L1 can achieve. This is just an inherent fact of technical reality when we look at a system like Bitcoin. There’s no way to escape it except by relying more and more on trusted entities as you build a more flexible L2 that goes beyond the capabilities of L1.
So when discussions about these issues take place, such as what improvements can be made to L1, two things are of paramount importance. First, these improvements to L1 are almost entirely focused on enabling more flexible and scalable L2s. Second, L2s cannot magically make everything possible. L2s have their own limitations, based on those of the L1, and having a discussion about changes to L1 without recognizing that the only way around these limitations is to introduce trusted entities is not an honest conversation.
It’s time to acknowledge reality when we start discussing what to do with Bitcoin in the future, otherwise nothing but denial of reality and gaslighting will happen. And that is not productive.