Layout of Code in Ethereum

5 min read · Oct 23, 2023

This article gives a high-level overview of how bytecode is stored and how that bytecode is laid out.

Who should read this article?

Mid-level Web3 developers, penetration testers, and bug bounty hunters.

Because this article covers the overall layout of bytecode and where that bytecode is stored, I hope that after reading it you will know what to read next for your own career path.

Here, bytecode means the code you get by compiling Solidity.

Bytecode is, in other words, machine code for the EVM. You will see it in hexadecimal format. For example, the hex code 0x00 is the opcode STOP.

https://github.com/crytic/evm-opcodes
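To make the hex-to-opcode mapping concrete, here is a minimal sketch of a disassembler in Python. The opcode table is a tiny hand-picked subset (the names and values come from the opcode list linked above), and the `disassemble` function is a hypothetical helper written for this article, not part of any Ethereum tool.

```python
# A tiny subset of the EVM opcode table; the full list is in the repository linked above.
OPCODES = {
    0x00: "STOP",
    0x01: "ADD",
    0x52: "MSTORE",
    0x60: "PUSH1",
    0xF3: "RETURN",
}


def disassemble(bytecode_hex: str) -> list:
    """Turn a hex string into a list of readable instructions."""
    if bytecode_hex.startswith("0x"):
        bytecode_hex = bytecode_hex[2:]
    code = bytes.fromhex(bytecode_hex)
    out = []
    pc = 0  # program counter: index of the byte being decoded
    while pc < len(code):
        op = code[pc]
        name = OPCODES.get(op, f"UNKNOWN(0x{op:02x})")
        if name == "PUSH1":
            # PUSH1 is followed by one byte of immediate data, not by an opcode.
            operand = code[pc + 1 : pc + 2].hex()
            out.append(f"PUSH1 0x{operand}")
            pc += 2
        else:
            out.append(name)
            pc += 1
    return out


print(disassemble("0x600160020100"))
# ['PUSH1 0x01', 'PUSH1 0x02', 'ADD', 'STOP']
```

The last byte, 00, decodes to STOP, matching the example in the text.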

Unlike other paradigms, the EVM does not keep code in general-access memory or in storage. Bytecode is stored in a place called virtual ROM. Only when execution is about to happen is the bytecode (think of it as a value) fetched by the relevant address (think of it as a key) and then executed.

According to the Ethereum yellow paper, that virtual ROM is backed by a low-level database called the state database.

At this point we need to understand the trie, one of the data structures used in the Ethereum blockchain. Ethereum uses the Merkle Patricia tree, which is a kind of trie.

The Merkle Patricia tree data structure used in Ethereum (before the Merge) involves three tries: the account trie, the transaction trie, and the transaction receipts trie. The account trie is also known as the world state. (Each contract account additionally has its own storage trie, referenced by its storageRoot.)

After Ethereum's Merge, the specification described below may differ slightly.

So if you ask "Where is bytecode stored?"

The account state stores a hash of the entire bytecode as the code hash.

Why store a hash? To make validating the root hash more efficient.
For more detail, I encourage readers to research block validation, consensus algorithms, and the Merkle Patricia tree data structure. I also plan to write an article about them.

The actual bytecode is stored in the state database.

In geth, the execution client implemented in Go, LevelDB is used as that database.
The Ethereum yellow paper describes this as a mapping between 256-bit binary fragments (here, the code hash) and arbitrary-length binary data (here, the bytecode).
If I have time, I will write about this part at a lower level too.
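The lookup chain described above (address, then account, then code hash, then bytecode) can be sketched as a toy in-memory model. Everything here is an assumption made for illustration: `ToyState`, `Account`, and `code_hash` are hypothetical names, the account values are placeholders, and the hash is a stand-in, because Ethereum uses Keccak-256, which is not in Python's standard library and is not the same function as `hashlib.sha3_256`.

```python
import hashlib
from dataclasses import dataclass


def code_hash(code: bytes) -> bytes:
    # ASSUMPTION: stand-in hash. Ethereum uses Keccak-256, which differs
    # from the NIST SHA3-256 exposed by hashlib.
    return hashlib.sha3_256(code).digest()


@dataclass
class Account:
    nonce: int
    balance: int
    storage_root: bytes
    code_hash: bytes  # 32-byte hash only; the code itself lives elsewhere


class ToyState:
    """Hypothetical model: accounts keyed by address, code keyed by hash."""

    def __init__(self) -> None:
        self.accounts = {}  # address -> Account
        self.code_db = {}   # code hash -> bytecode (plays the state database)

    def set_code(self, address: bytes, code: bytes) -> None:
        h = code_hash(code)
        self.code_db[h] = code
        # nonce, balance and storage root are placeholder values here.
        self.accounts[address] = Account(0, 0, b"\x00" * 32, h)

    def get_code(self, address: bytes) -> bytes:
        # address (key) -> account -> code hash -> bytecode (value)
        return self.code_db[self.accounts[address].code_hash]


state = ToyState()
addr = bytes.fromhex("11" * 20)                   # made-up 20-byte address
runtime = bytes.fromhex("602a60005260206000f3")   # small sample bytecode
state.set_code(addr, runtime)
print(state.get_code(addr).hex())
```

The account record stays small and fixed in size no matter how large the contract code is, because it carries only the 32-byte hash.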

One more point: variables declared immutable or constant in a contract are stored statically inside the bytecode. In other words, the value assigned to an immutable or constant variable is visible in the bytecode as hexadecimal.

Layout of the code

The bytecode hash is stored as an element of the account (which the block commits to through its state root). The actual bytecode is stored as fragments in a low-level database, which the yellow paper calls the state database.

As for a real implementation, the Geth execution client stores it using LevelDB.

Why store the whole bytecode in a low-level database such as LevelDB, used as the state database, and keep only the code hash in the account state? There are two reasons.

A performance reason and an optimization reason.

Performance Reason

In a blockchain, the account state matters, and so does the process of building a block.

(You need to understand the Merkle Patricia tree and the Ethereum block structure.)

This is because the four elements of the account state, nonce, balance, storageRoot, and codeHash, have to be hashed into the account root hash before a block can be built.

So putting code that never changes directly into that structure and hashing it again and again would cause performance issues. Hashing the whole code once and then reusing the already computed hash performs much better.

Optimization Reason

Low-level databases are easier to optimize. For example, deploying the same bytecode always produces the same hash. In other words, if 10 people deploy identical bytecode, behind the scenes those bytes take up one slot rather than ten, and every read uses the same hash.

For more detail, you can research LevelDB and study how it stores data.
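The deduplication effect can be sketched in a few lines of Python. This is a toy model under stated assumptions: `deploy`, `accounts`, and `code_db` are hypothetical names, the deployer labels are made up, and `hashlib.sha3_256` is only a stand-in for Keccak-256, which Ethereum actually uses.

```python
import hashlib

accounts = {}  # address -> code hash
code_db = {}   # code hash -> bytecode, stored once per distinct bytecode


def code_hash(code: bytes) -> bytes:
    # ASSUMPTION: stand-in for Keccak-256, which hashlib does not provide.
    return hashlib.sha3_256(code).digest()


def deploy(address: str, runtime_code: bytes) -> None:
    h = code_hash(runtime_code)
    code_db.setdefault(h, runtime_code)  # no new entry if the hash already exists
    accounts[address] = h


runtime = bytes.fromhex("602a60005260206000f3")  # small sample bytecode
for i in range(10):
    deploy(f"deployer-{i}", runtime)

print(len(accounts), len(code_db))  # 10 1
```

Ten accounts end up pointing at a single stored copy of the code, because content-addressed storage keys the bytes by their hash.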

Creation and Runtime code

As with other compiled languages, compiling the source code produces bytecode (machine code). The bytecode produced by the Solidity compiler is made up of two parts.

These are the creation bytecode and the runtime bytecode. So "bytecode" refers to these two combined.

Creation bytecode

Creation bytecode performs the initial setup needed to deploy a contract. Put plainly, it is the code inside a contract's constructor. Constructor code does not run on every message call. It only needs to run once, while the contract is being deployed, which is why creation bytecode exists as a separate piece.

Creation bytecode is not stored in the state database (account state). Only the runtime bytecode that it returns is recorded in the account state, as a hash.

Creation bytecode contains 6 procedures.

1. Set up the free memory pointer

2. Non-payable check

If the constructor is not payable, the msg.value sent with the deployment is checked, and the deployment reverts if it is non-zero.

3. Retrieve constructor parameters

If the constructor takes parameters, those parameters are moved onto the stack and into memory.

4. Constructor body

The code inside the constructor.

5. Copy runtime code to memory

The runtime code is copied into memory.

6. Return runtime code to EVM

The runtime code in memory is returned to the EVM.
The EVM hashes the runtime code and puts the hash in the code hash field of the account state.
The runtime bytecode is then stored in the state database.
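Steps 5 and 6 can be sketched with a hand-assembled creation code and a tiny interpreter. This is an illustration under assumptions, not what the Solidity compiler emits: `make_init_code` and `run_init_code` are hypothetical helpers, the interpreter understands only the four opcodes used here, and steps 1 to 4 (free memory pointer, payable check, constructor parameters and body) are left out.

```python
PUSH1, DUP1, CODECOPY, RETURN = 0x60, 0x80, 0x39, 0xF3


def make_init_code(runtime: bytes) -> bytes:
    """Hand-assembled creation code: copy `runtime` to memory, then return it."""
    if len(runtime) > 0xFF:
        raise ValueError("this sketch only uses PUSH1, so the length must fit in one byte")
    prefix_len = 11  # size in bytes of the instructions below
    prefix = bytes([
        PUSH1, len(runtime),  # size of the runtime code
        DUP1,                 # keep a second copy of size for RETURN
        PUSH1, prefix_len,    # offset of the runtime code inside the init code
        PUSH1, 0x00,          # destination offset in memory
        CODECOPY,             # step 5: memory[0:size] = code[offset:offset+size]
        PUSH1, 0x00,          # memory offset to return from
        RETURN,               # step 6: hand memory[0:size] back to the EVM
    ])
    assert len(prefix) == prefix_len
    return prefix + runtime


def run_init_code(init_code: bytes) -> bytes:
    """Toy interpreter for the four opcodes above; returns the RETURN data."""
    stack, memory, pc = [], bytearray(), 0
    while pc < len(init_code):
        op = init_code[pc]
        if op == PUSH1:
            stack.append(init_code[pc + 1])
            pc += 2
        elif op == DUP1:
            stack.append(stack[-1])
            pc += 1
        elif op == CODECOPY:
            dest, src, size = stack.pop(), stack.pop(), stack.pop()
            if len(memory) < dest + size:
                memory.extend(bytes(dest + size - len(memory)))
            memory[dest : dest + size] = init_code[src : src + size].ljust(size, b"\x00")
            pc += 1
        elif op == RETURN:
            offset, size = stack.pop(), stack.pop()
            return bytes(memory[offset : offset + size])
        else:
            raise ValueError(f"opcode 0x{op:02x} is not supported in this sketch")
    return b""


runtime = bytes.fromhex("602a60005260206000f3")  # small sample runtime code
init_code = make_init_code(runtime)
returned = run_init_code(init_code)
print(init_code.hex())
print(returned == runtime)  # True
```

Only the returned bytes would be hashed and stored. The 11-byte prefix runs once during deployment and is then discarded, which is why creation code never appears in the account state.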

Runtime bytecode

Runtime bytecode runs every time a message call arrives, whether from an EOA or from another contract. It is the code that contains the actual functions.

Runtime bytecode has three main parts.

Dispatcher

Also called the "hub".
Its job is to analyze the calldata and compare function selectors,
so that it can work out which function of the smart contract is being called and should run.

Function Wrapper

It unpacks (unwraps) the function arguments.
It also wraps whatever the function body returns.

Function Body

This is the specific logic inside a specific function.
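The dispatcher and the unpacking half of the function wrapper can be sketched in Python. The three selectors are the well-known 4-byte selectors of the ERC-20 style functions named in the comments. The `dispatch` function, the "fallback" label, and the sample address and amount are assumptions for illustration. Real runtime code does this with EVM comparisons and jumps, and the sketch only handles static 32-byte arguments.

```python
# 4-byte selectors: the first 4 bytes of the Keccak-256 hash of the signature.
SELECTORS = {
    bytes.fromhex("18160ddd"): "totalSupply()",
    bytes.fromhex("70a08231"): "balanceOf(address)",
    bytes.fromhex("a9059cbb"): "transfer(address,uint256)",
}


def dispatch(calldata: bytes):
    """Return (function signature, decoded arguments) for a call."""
    selector, payload = calldata[:4], calldata[4:]

    # Dispatcher: compare the selector in calldata with the known selectors.
    signature = SELECTORS.get(selector)
    if signature is None:
        return "fallback", []

    # Function wrapper (unpacking): static arguments are 32-byte big-endian words.
    args = [
        int.from_bytes(payload[i : i + 32], "big")
        for i in range(0, len(payload), 32)
    ]
    return signature, args


# Made-up call: transfer(0x...0abc, 1000)
calldata = (
    bytes.fromhex("a9059cbb")
    + (0xABC).to_bytes(32, "big")
    + (1000).to_bytes(32, "big")
)
print(dispatch(calldata))  # ('transfer(address,uint256)', [2748, 1000])
```

The function body would then run with the decoded arguments, and the wrapper would encode its return value back into 32-byte words.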

The three above are the main parts. In addition, runtime code has three smaller parts:

Setting up the FMP (free memory pointer)
Calldata check

The function selector in the calldata is compared against the function selectors in the bytecode.
If no match is found, execution jumps to the receive or fallback function.

The last one is the contract's metadata hash.
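The Solidity compiler appends the metadata as CBOR-encoded data at the very end of the bytecode, and the last two bytes give the length of that CBOR data in big-endian. A minimal sketch of splitting the two parts follows. `split_metadata` is a hypothetical helper, and the sample metadata is hand-built for illustration (a CBOR map holding only a "solc" version entry). Real compiler output usually also carries an IPFS or Swarm hash.

```python
def split_metadata(runtime_code: bytes):
    """Return (executable part, CBOR metadata) of a runtime bytecode blob."""
    # The final two bytes hold the length of the CBOR data, big-endian.
    cbor_len = int.from_bytes(runtime_code[-2:], "big")
    if cbor_len + 2 > len(runtime_code):
        raise ValueError("length suffix is larger than the bytecode itself")
    cbor_start = len(runtime_code) - 2 - cbor_len
    return runtime_code[:cbor_start], runtime_code[cbor_start:-2]


# Hand-built sample, not real compiler output.
executable = bytes.fromhex("602a60005260206000f3")
# CBOR: a1 = map with 1 entry, 64 'solc' = 4-char text key, 43 000813 = 3-byte value
metadata = bytes.fromhex("a164736f6c6343000813")
blob = executable + metadata + len(metadata).to_bytes(2, "big")

code_part, cbor_part = split_metadata(blob)
print(code_part.hex())
print(cbor_part.hex())
```

Bytecode that was compiled without metadata has no such suffix, so a real tool would need to validate the CBOR before trusting the split.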

For a visual example, you can look at this diagram of an OpenZeppelin contract.

https://gists.rawgit.com/ajsantander/23c032ec7a722890feed94d93dff574a/raw/a453b28077e9669d5b51f2dc6d93b539a76834b8/BasicToken.svg

This article stays at a high level, so it does not include low-level details or hands-on practice. By now I believe you know the bytecode layout, how it is stored behind the scenes, and which data structures to study further for full coverage.

I also plan to write a start-to-finish walkthrough of bytecode, and to research the trie data structure.

This article does not go into detail. It is meant to give a taste of what to explore next, so I will stop here.

Ref:

Bytecode and Init Code and Runtime Code, Oh My! (medium.com)

Ethereum Yellow Paper: https://ethereum.github.io/yellowpaper/paper.pdf