Analyzing bytecode
In the previous post we created and compiled a simple contract. We saw the output from the compiler but did nothing with it, since I told you I would get back to it. Well, here I am: let’s look at the bytecode and see what it does, what it means and how it is interpreted.
Bytecode
Now, I have talked a few times about 'bytecode' without really explaining what it is or what it does. Bytecode is best compared to assembly: it is the compiled counterpart of human-readable code.
The instruction set behind this bytecode is specified in the Ethereum Yellow Paper by Gavin Wood. It details all the instructions contract accounts can use. Contract accounts are, in contrast to externally owned accounts (EOAs), not controlled by private keys and cannot independently initiate an Ethereum transaction. A contract account always needs an external trigger to start. What it does upon this trigger is defined in the bytecode of the contract account.
Bytecode runs in the Ethereum Virtual Machine (EVM). The EVM is considered a state machine with a globally shared state. This means that all block producers hold the same state of the state machine, determined by the last block mined. There is no room for interpretation about the state of the complete machine or of an individual contract. Its memory, balance and stack are determined by the instructions executed before, and they are deterministic because all interactions with the contract are recorded on the blockchain.
Now, what exactly happens when a contract is executed on the blockchain? Let’s dive into the starting sequence of a regular smart contract and see what happens in the memory, stack and in the EVM.
Bootstrapping a contract
A contract usually starts with the following pattern:
6080604052
This is called the bootstrap. If we look up what the instructions are, for example by reading the Yellow Paper or by consulting ethervm, we see that there are actually only three instructions, two of which take a single argument.
60 80 = PUSH1 0x80
60 40 = PUSH1 0x40
52 = MSTORE
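To make the split into instructions a bit more tangible, here is a minimal disassembler sketch in Python. It is my own illustration, not a real tool: the function name `disassemble` and the tiny opcode table are made up for this post, and it only knows the PUSH family (0x60 to 0x7f) and MSTORE.

```python
# Minimal sketch: only the opcodes needed for the bootstrap are named.
# Anything else is reported as UNKNOWN instead of being guessed.
OPCODES = {0x52: "MSTORE"}


def disassemble(hex_code: str) -> list[str]:
    code = bytes.fromhex(hex_code)
    instructions = []
    pc = 0  # program counter: index of the byte we are looking at
    while pc < len(code):
        op = code[pc]
        if 0x60 <= op <= 0x7F:
            # PUSH1..PUSH32: the opcode is followed by 1..32 argument bytes
            size = op - 0x5F
            arg = code[pc + 1 : pc + 1 + size]
            instructions.append(f"PUSH{size} 0x{arg.hex()}")
            pc += 1 + size
        else:
            instructions.append(OPCODES.get(op, f"UNKNOWN(0x{op:02x})"))
            pc += 1
    return instructions


print(disassemble("6080604052"))
# ['PUSH1 0x80', 'PUSH1 0x40', 'MSTORE']
```

The only real trick is that a PUSH instruction carries its argument inside the bytecode itself, so the program counter has to skip over those bytes.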
The first instruction tells the EVM to place the hexadecimal number 80 on the stack. The second instruction tells the EVM to place the hexadecimal number 40 on the stack.
The stack and its operations
As in classical computing, the EVM has a stack. The stack is nothing more than a piece of memory on which operations always happen in LIFO style: "Last In, First Out". This means that operations on the stack always act on the top (the last pushed item) or place an item on top of the previously placed item.

Two actions are possible when working with a stack: push and pop. The push operation, as you can probably imagine, places an item on top of the stack. The pop operation retrieves the last placed item and removes it from the stack.
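If you have never worked with a stack before, the push and pop behaviour can be sketched with a plain Python list. This is just an illustration of LIFO, not how the EVM implements its stack internally.

```python
stack = []           # the stack starts off empty

stack.append(0x80)   # push: 0x80 is now on top
stack.append(0x40)   # push: 0x40 is placed on top of 0x80

top = stack.pop()    # pop: returns the last pushed item and removes it

print(hex(top))                  # 0x40
print([hex(x) for x in stack])   # ['0x80']
```

The item that went in last (0x40) is the first one to come out again.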
If we emulate the first two instructions, we can see what happens on the stack. The stack starts off empty while the EVM starts with the first instruction (PUSH1 0x80):
Operation | Stack contents |
---|---|
PUSH1 0x80 | - |
Next, the hexadecimal value 0x80 is pushed onto the stack and the EVM looks at the next instruction: PUSH1 0x40.
Operation | Stack contents |
---|---|
PUSH1 0x40 | 0x80 |
Finally, after these two instructions, our stack looks as follows:
Operation | Stack contents |
---|---|
- | 0x40 |
- | 0x80 |
The next instruction from the bytecode is MSTORE. MSTORE is defined in the Yellow Paper as follows:
Value | Mnemonic | δ | α | Description |
---|---|---|---|---|
0x52 | MSTORE | 2 | 0 | Save word to memory. |
This definition tells us the following: MSTORE takes two input arguments from the stack (as can be seen in the table under the delta (δ) symbol) and puts no output back on it (under the alpha (α) symbol). In the description we can read that it stores a word to memory, followed by an equation which defines the cost of this operation. The cost depends on how much memory the contract has already touched. For example, the bootstrap instructions write a 32-byte word at 0x40, so they pay for the first 96 bytes (three words) of memory. Any untouched memory after that is claimed in chunks of 32 bytes (256 bits) and is paid for by the user invoking the contract at the time the extra memory is needed. The costs are not that interesting to talk about right now, though. I might spend a future post on deciphering this topic.
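For the curious, here is a rough sketch of that cost equation in Python. I am assuming the memory cost function from the Yellow Paper fee schedule, 3 gas per 32-byte word plus a quadratic term of words squared divided by 512. The function names are my own, and this ignores the base cost of the MSTORE instruction itself.

```python
G_MEMORY = 3  # assumed gas per 32-byte word of memory (Yellow Paper fee schedule)


def words_touched(offset: int, length: int = 32) -> int:
    # Memory is paid for in whole 32-byte words, so round up.
    return (offset + length + 31) // 32


def memory_cost(words: int) -> int:
    # Linear part plus a quadratic part that only matters for large memory.
    return G_MEMORY * words + (words * words) // 512


words = words_touched(0x40)       # MSTORE at 0x40 touches bytes 0x40..0x5f
print(words, memory_cost(words))  # 3 9
```

So under these assumptions the bootstrap touches three words of memory, and the quadratic term does not contribute anything yet.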
So, what happens when we execute the MSTORE command? Remember, our stack looked like this:
Operation | Stack contents |
---|---|
- | 0x40 |
MSTORE | 0x80 |
Now, the MSTORE command pops the top two items off the stack, so after the MSTORE operation our stack is empty again. However, another element is introduced: enter the memory. MSTORE is one of the three instructions that read or write memory directly; the others are MSTORE8 and MLOAD. We will see the other instructions in another post. MSTORE writes a 32-byte word (a (u)int256) to memory. In the ethervm table we find the definition for MSTORE:
memory[offset:offset+32] = value
where offset is the first argument (the item on top of the stack) and value the second. Inserting our arguments gives the following expression:
memory[0x40:0x60] = 0x80
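To see that expression in action, here is a small Python sketch of MSTORE working on a stack and a memory. The `mstore` function is my own simplification: it uses a list for the stack and a bytearray for memory, and it assumes the value is stored as a 32-byte big-endian word with memory growing zero-filled in 32-byte steps.

```python
def mstore(stack: list[int], memory: bytearray) -> None:
    offset = stack.pop()  # top of the stack: first argument
    value = stack.pop()   # next item: second argument
    end = offset + 32
    if len(memory) < end:
        # Grow memory with zero bytes up to the next multiple of 32.
        new_size = (end + 31) // 32 * 32
        memory.extend(bytes(new_size - len(memory)))
    # memory[offset:offset+32] = value, written as a 32-byte big-endian word
    memory[offset:end] = value.to_bytes(32, "big")


stack = [0x80, 0x40]   # bottom ... top, after the two PUSH1 instructions
memory = bytearray()   # memory starts off empty

mstore(stack, memory)

print(stack)              # []
print(len(memory))        # 96
print(hex(memory[0x5F]))  # 0x80
```

Note that the 0x80 ends up in the last byte of the word (address 0x5f), because the value is padded with zeros on the left to fill all 32 bytes.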
You may wonder why the memory is not starting at 0x0. Well, Solidity-compiled code uses the following memory layout:
0x00 - 0x3f (64 bytes): scratch space
0x40 - 0x5f (32 bytes): free memory pointer
0x60 - 0x7f (32 bytes): zero slot
- The scratch space can be used between statements, i.e. within inline assembly, and for hashing methods. This means that this piece of memory can, and will, be used by the compiler when switching between methods or for call arguments. What the Solidity compiler really does with it goes beyond this post.
- The free memory pointer holds the currently allocated memory size, which is also the address where free memory starts. When more memory is allocated, we will see this number increase. Storing this number is useful for other instructions to check whether memory is already available to the contract or whether extra memory should be allocated.
- The zero slot is used as an initial value for dynamic memory arrays and should never be written to.
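The role of the free memory pointer can be sketched as a simple "bump" allocator. The Python below is my own illustration of the idea, not what the Solidity compiler literally emits: the helpers `read_word`, `write_word` and `allocate` are hypothetical names, and memory is again modelled as a bytearray.

```python
FREE_MEM_PTR = 0x40  # slot that holds the free memory pointer


def read_word(memory: bytearray, offset: int) -> int:
    return int.from_bytes(memory[offset:offset + 32], "big")


def write_word(memory: bytearray, offset: int, value: int) -> None:
    end = offset + 32
    if len(memory) < end:
        memory.extend(bytes(end - len(memory)))  # grow with zero bytes
    memory[offset:end] = value.to_bytes(32, "big")


def allocate(memory: bytearray, size: int) -> int:
    """Hypothetical helper: reserve `size` bytes and bump the pointer."""
    ptr = read_word(memory, FREE_MEM_PTR)
    write_word(memory, FREE_MEM_PTR, ptr + size)
    return ptr


memory = bytearray()
write_word(memory, FREE_MEM_PTR, 0x80)  # what the bootstrap does

first = allocate(memory, 64)   # reserve 64 bytes
second = allocate(memory, 32)  # reserve another 32 bytes

print(hex(first), hex(second), hex(read_word(memory, FREE_MEM_PTR)))
# 0x80 0xc0 0xe0
```

The first allocation starts at 0x80, right after the zero slot, which is exactly why the bootstrap stores 0x80 at 0x40.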
So, our memory looks as follows after the MSTORE command:
Memory |
---|
00000000000000000000000000000000 |