How Smart Contract Bytecode Hides Malicious Logic

The Code You Read Is Not Always The Code That Runs

You paste a contract's Solidity source into your browser. The syntax highlighter makes it look clean, the logic reads sensibly, and the audit firm's badge sits proudly on the project's website. Everything checks out. Except the thing that actually executes on-chain is bytecode, not Solidity, and bytecode, if someone is determined, can be made to do things the source file never mentions.

This is the part most guides skip. Not because it's obscure, but because explaining it properly requires getting into the guts of how the Ethereum Virtual Machine actually works. Worth it.

Solidity Is A Suggestion; Bytecode Is The Law

When a Solidity file gets compiled, it becomes EVM bytecode: a flat sequence of opcodes the virtual machine executes one at a time. Tools like Etherscan verify that a deployed contract's bytecode matches a given source file, and that verification is genuinely useful. But it only catches one attack surface: the case where the source file and the deployed code simply differ.

The subtler attacks live inside legitimate compilation. Three mechanisms matter most.

Metadata hash manipulation. By default, the Solidity compiler appends a small CBOR-encoded metadata trailer to the bytecode: a hash of the contract's metadata file (which records the compiler settings and source hashes) along with the compiler version. Etherscan's verification strips this before comparing. Bytes in that trailer therefore sit outside the comparison. The trailer is data and is not normally executed, so an attacker gains something only in edge cases where crafted code jumps into it or reads from it at runtime. Narrow, but real.
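The stripping step can be sketched in a few lines. This is a minimal illustration, assuming the layout the Solidity documentation describes (CBOR metadata at the end of the bytecode, followed by a two-byte big-endian length). The function name and the sample bytes are hypothetical, and real verifiers may handle more cases than this.

```python
def strip_metadata(runtime: bytes) -> bytes:
    """Return bytecode without the trailing CBOR metadata.

    Assumption: the final two bytes hold the big-endian length of the
    CBOR payload that sits immediately before them. If that length is
    implausible, the input is returned unchanged.
    """
    if len(runtime) < 2:
        return runtime
    cbor_len = int.from_bytes(runtime[-2:], "big")
    if cbor_len + 2 > len(runtime):
        return runtime  # no plausible trailer present
    return runtime[: len(runtime) - cbor_len - 2]


# Hypothetical sample: 3 bytes of code, 4 bytes standing in for metadata,
# then the 2-byte length field (0x0004).
code = bytes.fromhex("600035")
build_a = code + bytes.fromhex("a2646970") + (4).to_bytes(2, "big")
build_b = code + bytes.fromhex("ffffffff") + (4).to_bytes(2, "big")

# Two builds that differ only in the trailer compare equal once stripped.
print(strip_metadata(build_a) == strip_metadata(build_b))
```

The point of the sketch is the last line: anything confined to the trailer is invisible to a comparison that discards it.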

Constructor argument smuggling. Constructor bytecode runs exactly once, at deployment, and is not stored with the contract's runtime code. Initialization logic, especially when fed hostile constructor arguments, can set state variables to dangerous values, whitelist an attacker's address, or disable a safety check permanently. None of that shows up in the runtime bytecode that auditors and verification tools inspect afterward. The constructor is gone. Its effects remain.

Dispatcher manipulation. This is the big one. The EVM doesn't call functions by name. It matches a four-byte selector, the first four bytes of the Keccak-256 hash of the function signature, against the first four bytes of an incoming transaction's calldata. A contract's dispatcher is the branching logic that says "if the first four bytes are 0xa9059cbb, jump to the transfer routine." An attacker who controls the compiler output, or who writes bytecode directly, can insert a branch the Solidity source never describes: a selector that looks like dead code, or that collides with a legitimate function signature under specific conditions, routing certain callers to a completely different execution path.
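To see where a selector such as 0xa9059cbb comes from, here is a minimal pure-Python sketch of Keccak-256 and the selector derivation. It is spelled out by hand because Python's `hashlib.sha3_256` implements the standardized SHA-3 padding, which yields a different digest from the Keccak variant Ethereum uses. The helper names are illustrative, and the code favors readability over speed.

```python
MASK = (1 << 64) - 1


def _rol(v: int, n: int) -> int:
    """Rotate a 64-bit lane left by n bits."""
    n %= 64
    return ((v << n) | (v >> (64 - n))) & MASK


def _keccak_f1600(state: bytearray) -> bytearray:
    """Apply the 24-round Keccak-f[1600] permutation to a 200-byte state."""
    lanes = [[int.from_bytes(state[8 * (x + 5 * y): 8 * (x + 5 * y) + 8], "little")
              for y in range(5)] for x in range(5)]
    r = 1
    for _ in range(24):
        # theta: mix each column's parity into its neighbours
        c = [lanes[x][0] ^ lanes[x][1] ^ lanes[x][2] ^ lanes[x][3] ^ lanes[x][4]
             for x in range(5)]
        d = [c[(x + 4) % 5] ^ _rol(c[(x + 1) % 5], 1) for x in range(5)]
        lanes = [[lanes[x][y] ^ d[x] for y in range(5)] for x in range(5)]
        # rho and pi: rotate each lane and move it to a new position
        x, y = 1, 0
        cur = lanes[x][y]
        for t in range(24):
            x, y = y, (2 * x + 3 * y) % 5
            cur, lanes[x][y] = lanes[x][y], _rol(cur, (t + 1) * (t + 2) // 2)
        # chi: non-linear step applied along each row
        for y in range(5):
            row = [lanes[x][y] for x in range(5)]
            for x in range(5):
                lanes[x][y] = row[x] ^ (~row[(x + 1) % 5] & row[(x + 2) % 5])
        # iota: XOR in the round constant, generated by a small LFSR
        for j in range(7):
            r = ((r << 1) ^ ((r >> 7) * 0x71)) % 256
            if r & 2:
                lanes[0][0] ^= 1 << ((1 << j) - 1)
    out = bytearray(200)
    for x in range(5):
        for y in range(5):
            out[8 * (x + 5 * y): 8 * (x + 5 * y) + 8] = lanes[x][y].to_bytes(8, "little")
    return out


def keccak256(data: bytes) -> bytes:
    """Keccak-256 with the original Keccak padding (suffix 0x01)."""
    rate = 136  # bytes absorbed per permutation for a 256-bit digest
    padded = bytearray(data) + bytearray(rate - len(data) % rate)
    padded[len(data)] ^= 0x01  # SHA-3 would use 0x06 here
    padded[-1] ^= 0x80
    state = bytearray(200)
    for off in range(0, len(padded), rate):
        for i in range(rate):
            state[i] ^= padded[off + i]
        state = _keccak_f1600(state)
    return bytes(state[:32])


def selector(signature: str) -> str:
    """First four bytes of the Keccak-256 hash of a canonical signature."""
    return keccak256(signature.encode("ascii"))[:4].hex()


print(selector("transfer(address,uint256)"))  # the selector quoted in the text
```

Note that the signature must be in canonical form (no spaces, no parameter names, `uint256` rather than `uint`), since any change to the string changes the hash.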

A Concrete Scenario Worth Walking Through

Imagine two developers, Priya and Marcus, both auditing the same lending protocol before depositing funds. Priya reads the verified Solidity source and sees a standard `withdraw()` function with a straightforward balance check. Marcus goes one step further and runs the bytecode through a disassembler.

What Marcus finds: a dispatcher branch that triggers when `msg.sender` matches a specific hardcoded address and the calldata begins with a particular four-byte prefix that no function signature in the ABI produces. That branch bypasses the balance check entirely and drains the contract to an address the caller supplies. The Solidity source has no trace of this. The bytecode does.
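A toy model makes the shape of that hidden branch easier to picture. The sketch below is not EVM code. It is a Python stand-in for a dispatcher, and every constant in it (the attacker address, the magic prefix, the placeholder selector) is hypothetical.

```python
# Hypothetical values, for illustration only.
ATTACKER = "0x" + "ab" * 20                    # hardcoded address in the bytecode
MAGIC_PREFIX = bytes.fromhex("deadbeef")       # trigger bytes absent from the ABI
WITHDRAW_SELECTOR = bytes.fromhex("11223344")  # placeholder for withdraw()'s selector

# What the verified source and the ABI describe.
PUBLIC_ROUTES = {WITHDRAW_SELECTOR: "withdraw_with_balance_check"}


def dispatch(sender: str, calldata: bytes) -> str:
    """Return the name of the routine a call would reach."""
    # Hidden branch: exists only in the deployed bytecode.
    if sender == ATTACKER and calldata.startswith(MAGIC_PREFIX):
        return "drain_without_balance_check"
    # Ordinary dispatch on the first four bytes of calldata.
    return PUBLIC_ROUTES.get(calldata[:4], "revert")


print(dispatch("0x" + "11" * 20, WITHDRAW_SELECTOR))        # ordinary user
print(dispatch("0x" + "11" * 20, MAGIC_PREFIX))             # wrong sender: revert
print(dispatch(ATTACKER, MAGIC_PREFIX + bytes(32)))         # hidden path
```

Anyone who only tests the public routes, or who calls the magic prefix from the wrong address, sees nothing unusual. That is why the branch survives casual inspection.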

How did it get there? The deployer didn't use the standard Solidity compiler. They used a modified build that injected 47 additional opcodes into the dispatcher before verification, then submitted the original, clean Solidity source to Etherscan. Verification passed because the injected logic was inserted after the metadata boundary the verifier checks, or because the deployer exploited a now-patched edge case in the verifier's comparison logic.

Priya lost money. Marcus didn't.

What Standard Tooling Actually Catches (And What It Misses)

Slither and similar static analyzers operate primarily on Solidity source. Mythril symbolically executes EVM bytecode, but it hunts for known vulnerability classes rather than comparing what the bytecode does against what the source says. These tools are excellent at finding reentrancy patterns, integer overflow risks, and access control gaps. They are not, by default, tools for detecting discrepancies between source intent and bytecode behavior. Treating them as such is a category error that costs people money.

Etherscan verification recompiles the submitted source and compares the result with the deployed bytecode. It is a matching exercise, not a semantic audit. It answers "does this bytecode match this source?" not "does this bytecode do anything the source doesn't mention?"

Formal verification tools like Certora's Prover or Runtime Verification's K framework work closer to the execution layer and are better equipped to catch behavioral anomalies. The catch: they require explicit specifications, written in advance, describing what the contract should and shouldn't do. A spec that doesn't ask "can any selector bypass the balance check?" won't catch a hidden bypass.

Most audits are source audits. Bytecode audits are rarer, slower, and require different skills.

What People Get Wrong About Verification

The common assumption is that a verified contract is a safe contract. This needs to die.

Verification means the source compiles to the deployed bytecode under a specific compiler version and settings. That's it. It says nothing about whether the source itself contains hidden logic triggered by specific conditions, nothing about whether the compiler was standard, nothing about constructor-installed state the runtime code silently depends on.

There's also a subtler confusion around proxy patterns. Many protocols deploy a thin proxy contract that delegates all calls to a separate implementation contract. The proxy is verified. The implementation is verified. But the proxy's `fallback()` function routes execution based on a stored address, and if that address can be changed by a privileged account, the entire verified codebase becomes irrelevant the moment someone calls `upgradeTo()`. Think of it like a notarized deed for a house where someone else holds a master key: the paperwork is perfect; the lock is the problem.

Upgradeability is, in this sense, a permanent asterisk on any audit finding. Ask yourself: have you ever actually checked whether a protocol you use has a live `upgradeTo()` function controlled by a two-of-three multisig you've never heard of?

How Serious Auditors Actually Approach This

The firms that find the hard stuff (Trail of Bits is the name that comes up most consistently) do several things that checkbox auditors don't.

They diff deployed bytecode against a locally compiled version of the submitted source, opcode by opcode, not hash by hash. Any divergence is a finding, even if it looks benign.
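An opcode-level diff can be sketched with the standard library alone. This is a minimal illustration, not a full disassembler: only a handful of opcodes are given names (the rest print as hex), and the function names and sample bytes are hypothetical.

```python
import difflib
from typing import List

# A few well-known opcodes; anything else is shown as raw hex.
NAMES = {0x00: "STOP", 0x14: "EQ", 0x35: "CALLDATALOAD", 0x52: "MSTORE",
         0x56: "JUMP", 0x57: "JUMPI", 0x5B: "JUMPDEST", 0x80: "DUP1",
         0xF3: "RETURN", 0xFD: "REVERT"}


def disassemble(code: bytes) -> List[str]:
    """Walk bytecode linearly, keeping PUSH immediates with their opcode."""
    ops = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        if 0x60 <= op <= 0x7F:  # PUSH1..PUSH32 carry 1..32 immediate bytes
            n = op - 0x5F
            ops.append(f"PUSH{n} 0x{code[pc + 1: pc + 1 + n].hex()}")
            pc += 1 + n
        else:
            ops.append(NAMES.get(op, f"0x{op:02x}"))
            pc += 1
    return ops


def opcode_diff(local: bytes, deployed: bytes) -> List[str]:
    """Return added (+) and removed (-) instructions, local vs deployed."""
    lines = difflib.unified_diff(disassemble(local), disassemble(deployed),
                                 fromfile="local", tofile="deployed",
                                 n=0, lineterm="")
    return [l for l in lines
            if l[:1] in "+-" and l[:3] not in ("---", "+++")]


# Hypothetical sample: the deployed code has one extra JUMPDEST.
local = bytes.fromhex("6080604052")
deployed = bytes.fromhex("608060405b52")
print(opcode_diff(local, deployed))
```

Program-counter offsets are deliberately left out of the diffed lines, since a single inserted instruction would otherwise shift every later offset and bury the real change. In practice, the metadata trailer should be removed from both inputs first so it does not show up as noise.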

They trace the dispatcher manually, mapping every reachable selector to the code path it executes, including selectors that don't appear in the ABI. A selector routing to an unexpected jump target is an immediate red flag.
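A first pass at that selector mapping can be automated. The sketch below is a heuristic, assuming the comparison shape compilers commonly emit (a `PUSH4` of the selector immediately followed by `EQ`). Hand-written bytecode need not follow that shape, so an empty result proves nothing. The function names and sample bytes are hypothetical.

```python
from typing import Iterable, Set


def dispatcher_selectors(code: bytes) -> Set[str]:
    """Collect 4-byte constants that are pushed and immediately compared."""
    found = set()
    pc = 0
    while pc < len(code):
        op = code[pc]
        if 0x60 <= op <= 0x7F:          # PUSH1..PUSH32
            n = op - 0x5F
            # PUSH4 <selector> followed by EQ (0x14)
            if n == 4 and pc + 5 < len(code) and code[pc + 5] == 0x14:
                found.add(code[pc + 1: pc + 5].hex())
            pc += 1 + n                 # skip the immediate bytes
        else:
            pc += 1
    return found


def unlisted_selectors(code: bytes, abi_selectors: Iterable[str]) -> Set[str]:
    """Selectors the bytecode compares against that the ABI does not list."""
    return dispatcher_selectors(code) - set(abi_selectors)


# Hypothetical dispatcher fragment:
#   DUP1 PUSH4 a9059cbb EQ PUSH2 0010 JUMPI
#   DUP1 PUSH4 deadbeef EQ PUSH2 0020 JUMPI
sample = bytes.fromhex("8063a9059cbb14610010578063deadbeef1461002057")
print(unlisted_selectors(sample, {"a9059cbb"}))
```

Each selector this reports is a lead for manual tracing, not a verdict. The jump target after the comparison still has to be followed by hand.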

They review constructor logic separately, before it vanishes, by analyzing the deployment transaction's input data directly.
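The first step of that analysis is separating code from arguments. A deployment transaction's input is the creation bytecode with the ABI-encoded constructor arguments appended, so the arguments can be recovered by removing the known prefix. The sketch below assumes the locally compiled creation code matches the deployed one byte for byte (including its metadata trailer) and that the arguments are static 32-byte words. The names and sample bytes are hypothetical.

```python
from typing import List


def constructor_words(tx_input: bytes, creation_code: bytes) -> List[bytes]:
    """Split the bytes after the creation code into 32-byte ABI words."""
    if not tx_input.startswith(creation_code):
        raise ValueError("input does not begin with the expected creation code")
    tail = tx_input[len(creation_code):]
    return [tail[i: i + 32] for i in range(0, len(tail), 32)]


def as_address(word: bytes) -> str:
    """Interpret a 32-byte word as an address (its low 20 bytes)."""
    return "0x" + word[-20:].hex()


# Hypothetical sample: a tiny creation code followed by one address argument.
creation = bytes.fromhex("6080604052")
arg = bytes(12) + bytes.fromhex("ab" * 20)
words = constructor_words(creation + arg, creation)
print(as_address(words[0]))
```

If the prefix check fails, that mismatch is itself worth investigating before looking at any arguments.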

And they ask, explicitly, about compiler provenance: which binary was used, where it came from, and whether the hash of that binary matches the official release. This last step sounds paranoid. It isn't.
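The provenance check itself is simple to script. A minimal sketch, assuming the expected SHA-256 digest has been obtained separately from the official release channel; the function names are hypothetical, and the expected digest is a parameter rather than anything hardcoded here.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large binaries need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_release(path: str, expected_sha256: str) -> bool:
    """Compare a local compiler binary against a published digest."""
    expected = expected_sha256.lower().removeprefix("0x")
    return sha256_of_file(path) == expected
```

Usage would look like `matches_release("/path/to/solc", published_digest)`, where both the path and the digest are supplied by the auditor. A mismatch does not prove malice, but it does mean the bytecode cannot be assumed to come from the official compiler.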

None of this is exotic. It's just slower. Slower work costs more, and most project budgets don't stretch that far. That's not a technical problem. It's an incentives problem, and the market has not solved it.

A clean audit report is evidence of a professional review, not proof of safety. The bytecode is the contract. Everything else is documentation. When the documentation and the bytecode diverge, the bytecode wins, every single time, without a warning, without a grace period, and without any interest in your audit badge.