Blazin' Etudes - Hack A Sat 3 Quals (2022) - Writeup
(If you just want the solution, and not the story, skip down to “Emulating MLIL on z3”)
Introduction
This year was Samurai’s third time playing Hack A Sat’s (HAS) quals round. The first time we played, we qualified for finals. Year 2, we didn’t qualify. So this year we were hoping to qualify again. I personally wasn’t super involved in the first year, and only a bit involved last year (I don’t think the weekends lined up super great for me), but I was interested to take a shot at it this year and hopefully qualify!
Blazin’ Etudes was the third of a series of microblaze reversing challenges, released on the last day of the competition (Sunday). In order to properly appreciate it, we should briefly look at the preceding challenges.
I woke up Saturday morning a few hours after the start of the competition. I looked at Samurai’s chat and saw the competition was well underway. I started to sleepily survey the categories and saw there was an oddball architecture challenge. Months ago, I had read in Binary Ninja’s public Slack about someone building a Binary Ninja architecture plugin for this random arch called microblaze, so its existence was vaguely familiar. I decided to take a look.
The first challenge was called Small Hashes Anyways, and everyone was busy taking stock of what we had to work with. Ideas flew for different ways to disassemble and run the challenge binary. Before I finished waking up and getting up to speed, one of my sharp teammates had already solved it. It sounds like an educated bruteforce was the quickest win.
Ominous Etude
We moved onto the next challenge, called Ominous Etude. I was now vaguely awake and at my computer, so I installed the plugin for Binary Ninja and opened up the challenge binary. I noticed there appeared to be a bug computing call targets.
100032d8 int32_t main()
100032f8 0xfffdf5c(0x101e15e8, 0xffff9df0) // hmmm this call doesn't look right
10003314 std::istream::operator>>(&std::cin)
10003344 uint32_t var_8
10003344 if (zx.d(quick_maths(var_8) ^ 1) != 0)
10003370 0xfffc2e4(0xfffdf5c(0x101e15e8, 0xffff9e08), 0xffffd5f0)
10003380 exit(1, 0xffffd5f0)
100033b4 0xfffc2e4(0xfffdf5c(0x101e15e8, 0xffff9e1c), 0xffffd5f0)
100033c4 exit(0, 0xffffd5f0)
100033c4 return __static_initialization_and_destruction_0(0, 0xffffd5f0) __tailcall
We chatted about it a bit in Discord voice, and we dug up the microblaze arch spec to learn how call instructions (er brlid
instructions) work on this architecture. While some of us dove down the Binary Ninja rabbit hole, others were trying out r2, Ghidra, and IDA plugins. One teammate using the IDA plugin hand-decompiled the quick_maths
function and brute-forced the solution, which gave us the flag. Total time to solve: roughly an hour and a half.
Debugging the Binary Ninja plugin
At this point, I figured that if there had been 2 microblaze challenges already, there might be more (plus I like goofing around with tooling), so I wanted to finish fixing the plugin. A couple teammates joined me on this expedition.
We compared the IDA disassembly with the Binary Ninja disassembly and noticed that there seemed to be a problem computing the immediates. We noticed the sign bit was set on the immediate section of the brlid
instructions that were incorrect.
chainsaw10: I’m going to abuse this as a chat about the microblaze stuff for lack of another channel
chainsaw10:wrong: b0000009b9f4ac60 correct: b0000007b9f438ec
chainsaw10: so the one on the left ends up wrong, the one on the right shows up correct
chainsaw10: I wonder if they’re incorrectly sign extending
Microblaze nominally has 32-bit instructions, which only really leaves room for a 16-bit immediate value. However, they want to allow full 32-bit immediates for certain instructions, like brlid
, so they have a prefix instruction, imm
, which adds another 16-bits worth of immediate.
It turned out that the plugin was unconditionally sign extending the 16-bit immediate in these instructions. There was a check elsewhere for the imm
prefix, but there was no code to un-sign-extend the lower 16 bits. (An easy enough bug to write, no disrespect intended to the plugin author!)
@bitspec.dataclass('-:16 i:s16')
class Rel16(Imm16):
"""Imm16 operand statically known to be relative address pointer.
if isinstance(y.b, microblaze.Imm16):
y = copy.copy(y) # ???
y.b = microblaze.Imm32(
i=(x.b.i << 16) | (y.b.i),
fused_from=type(y.b)
)
Do all sufficiently complex binary analysis projects really contain an ad-hoc implementation of LLVM’s MCInst?
— amtal, https://amtal.github.io/bitspec/#motivation-and-similar-tools
Ok, great, we just need to fix up that math and we’ll be golden. However, to somewhat misquote amtal, the plugin author, all Binary Ninja arch plugins contain an ad-hoc domain-specific language for writing disassemblers and lifters. To implement a robust solution, we’ll need to spend a few minutes finding the right place to fix it.
amtal had created and used a library called bitspec
to handle the bit math and pattern matching needed to write the disassembler. At a quick glance, it looks really cool, and it seems like a great approach! However, on a CTF time-scale, we did not want to learn how this library worked, so we looked for some normal Python code away from any decorator magic :) . My teammate and I bounced ideas back and forth and eventually settled on creating a new Rel32
class to hold some of our hackery. We then adjusted the imm
instruction handler to use it.
diff --git a/arch.py b/arch.py
index 664c197..c5eebe3 100644
--- a/arch.py
+++ b/arch.py
@@ -143,10 +143,21 @@ def fuse_ops(data, addr, order):
else:
if not (y:= next(ops, None)):
return # not enough data to disambiguate
- if isinstance(y.b, microblaze.Imm16):
+ if isinstance(y.b, microblaze.Rel16):
+ y = copy.copy(y) # ???
+ y.b = microblaze.Rel32(
+ i=(x.b.i << 16) | (y.b.i & 0xFFFF),
+ #fused_from=type(y.b)
+ )
+ # length is treated as 8 to fool assembler algorithm, but .addr
+ # is still the non-IMM instruction's addr so lifter/branch
+ # predictor should still be correct
+ y.__bitspec_match__ = FusedLength
+ assert len(y) == 8
+ elif isinstance(y.b, microblaze.Imm16):
y = copy.copy(y) # ???
y.b = microblaze.Imm32(
- i=(x.b.i << 16) | (y.b.i),
+ i=(x.b.i << 16) | (y.b.i & 0xFFFF),
fused_from=type(y.b)
)
# length is treated as 8 to fool assembler algorithm, but .addr
diff --git a/microblaze.py b/microblaze.py
index e2b45e3..b3a441d 100644
--- a/microblaze.py
+++ b/microblaze.py
@@ -41,12 +41,25 @@ class Rel16(Imm16):
ea = (self.i + addr) & 0xFFFFffff
return [
tok(ty.PossibleAddressToken, hex(ea), ea),
- tok(ty.TextToken, ' - '), # do not break il.Operand count
- tok(ty.PossibleAddressToken, hex(addr), addr),
]
def r(self, il, addr=None, sz=4):
return il.const_pointer(sz, (self.i+addr) & 0xFFFFffff)
+class Rel32(Rel16):
+ def __init__(self, i):
+ self.i = i
+ def ea(self, addr):
+ # TODO: figure out if this wraps properly
+ return (self.i + addr) & 0xffffffff
+ def text(self, addr):
+ tok, ty = bn.InstructionTextToken, bn.InstructionTextTokenType
+ ea = self.ea(addr)
+ return [
+ tok(ty.PossibleAddressToken, hex(ea), ea)
+ ]
+ def r(self, il, addr=None, sz=4):
+ return il.const_pointer(sz, self.ea(addr))
+
@bitspec.dataclass('-:16 i:s16')
class Abs16(Imm16):
"""Imm16 operand statically known to be absolute address pointer.
100032d8 int32_t main()
100032f8 std::operator >(&std::cout, "enter decimal number: ")
10003314 std::istream::operator>>(&std::cin)
10003344 uint32_t var_8
10003344 if (zx.d(quick_maths(var_8) ^ 1) != 0)
10003370 std::ostream::operator >(&std::cout, &data_10199e08))
10003380 exit(1, 0x1009d5f0)
100033b4 std::ostream::operator >(&std::cout, "cool :)"))
100033c4 exit(0, 0x1009d5f0)
100033c4 return __static_initialization_and_destruction_0(0, 0x1009d5f0) __tailcall
This was enough to clean up our decompilation of main
, as well as of quick_maths
. At this point, another teammate dropped in chat to let us know that the next reversing challenge wasn’t microblaze, and it might not be worth devoting any more time to fixing up this plugin.
So naturally, I ignored his perfectly good advice and spent another 15 minutes staring at it to make sure the fix was robust. (I swear, I really was going to move on after that!) It turned out the fix wasn’t robust — the same issue applied to instructions that loaded 32-bit absolute integers, rather than just IP-relative integers. I applied the fix you see above, (y.b.i & 0xFFFF)
, then moved on to other challenges.
chainsaw10: Now I’m done screwing around with binja unless we find more arch bugs
Blazin’ Etudes
It would, in fact, have been a waste of CTF time, if not for the next challenge. In hindsight, “ominous” coupled with the simplistic design of this challenge binary should probably have been a clue, but I missed it at the time.
However, about a day later, a new microblaze challenge dropped, Blazin’ Etudes. In the style of the 334, 666, and 1000 cuts challenges from Defcon CTF 2016 quals, we were provided 178 nearly identical binaries, with just the constants changing. The remote service would prompt us with the name of the binary, and we needed to provide a valid input for it.
Now, back in 2016, I was still in college. I was too chicken to attempt Quals that year, but I remember hearing from one of my friends about this challenge, and I had eagerly read Ryan Stortz’s 2000 cuts with Binary Ninja blog post. As an undergrad, actually buying an IDA license was out of reach, so news of an up-and-coming competitor that was actually good was excellent news. I dreamed of one day doing something similar to that blog post. Well, fast forward 6 years and now I have my chance, or something like that.
chainsaw10: I’m planning to attempt scripting binja
chainsaw10: Just trying to figure out an automated way to solve the quick_maths
chainsaw10: Current plan is to write an MLIL emulator in terms of z3 then solve
chainsaw10: I think I can do that somewhat quickly
Binary Ninja
Based on the theory behind the 2000 cuts blog post, I was planning to leverage Binary Ninja’s intermediate languages to abstract away the Microblaze architecture (that I didn’t know anything about before today). Binary Ninja is a disassembler, similar to IDA and Ghidra, with a focus on having a usable API. The authors take inspiration from the construction of compilers. Compilers start by parsing your code into an AST, which they then “lower” to some intermediate representation (IR) (of which the most famous is LLVM IR). Often, compilers will then perform simplifying (or “lowering”) passes upon their IR until they eventually reach a representation that maps onto machine code, at which point they can do code generation.
In the same vein, Binary Ninja starts at the bottom and works up. It starts by “lifting” from assembly to a machine-independent representation they call “Lifted IL” (where IL is “intermediate language”, a synonym for “intermediate representation”). Then they perform analysis and abstract away CPU flags, which yields “Low Level IL”, or LLIL. Next, stack analysis is performed and registers and stack are combined to form “variables”. Calling conventions are analyzed and function calls are annotated with their full argument list. This, combined with some analysis I’ve likely left out, forms “Medium Level IL” (MLIL). To reach a state most would agree is “decompilation”, they next analyze control flow and recover if-statements and high level loops. Variables are deduplicated. Eventually, they reach “High Level IL” (HLIL), which maps directly to C constructs. They’ve fully inverted a compiler’s pipeline, replacing lowering passes with lifting passes.
Not mentioned in that summary is all the dataflow and value-set analysis that enables you to point at a variable and ask Binary Ninja what its value is. Additionally, Binary Ninja can optionally give you any of that set of ILs in Static Single Assignment (SSA) form, where each time a variable is modified, it is given a new name. This aids your analysis as each value in the analyzed function has a name. Recovering SSA form from normal use of variables is a somewhat mechanical process (summarized first in a paper by Cytron et al), but having tried to code it up myself previously, I can attest that it’s tedious and I’d much prefer lean on Binary Ninja to do that for me.
Alright how is all this fancy program analysis going to help me win
So at this point we have our pick of the different levels of (de?) optimization from Binary Ninja. I chose MLIL in SSA form because it’s often recommended as best for analysis, and I happen to understand it the best. Here’s a snippet of how the quick_maths
function displays in the UI. We can see a set of consecutive math operations ending in a return statement. Our analysis is simplified by the fact that the entire function is one basic block (i.e. no conditionals).
10000964 uint32_t quick_maths(int32_t arg1)
0 @ 10000974 arg_4#1 = arg1#0
1 @ 10000978 r3#1 = arg_4#1
2 @ 1000097c r3_1#2 = r3#1 - 0x29
3 @ 10000980 arg_4#2 = r3_1#2
4 @ 10000984 r3_2#3 = arg_4#2
5 @ 10000988 r3_3#4 = r3_2#3 - 0x6a
6 @ 1000098c arg_4#3 = r3_3#4
7 @ 10000990 r3_4#5 = arg_4#3
<...snip>
88 @ 10000af4 r4_28#29 = r4_27#28
Next, I must introduce another favorite CTF tool, z3! z3 is the product of years of research at Microsoft Research, with a goal of solving SAT problems as efficiently as possible. For CTF purposes, it turns out that it’s really good at algebra. The Python bindings for z3 allow us to declare variables and use normal overloaded operators to build an expression. We can then declare constraints with the expression and ask z3 for values of our variables that satisfy those constraints. I’ve included a simple example below:
>>> import z3
>>> s = z3.Solver()
>>> x = z3.BitVec("x", 32)
>>> y = z3.BitVec("y", 32)
>>> s.add(y-x == 42)
>>> s.add(x-3 == 7)
>>> s.check()
sat
>>> s.model()
[x = 10, y = 52]
So now in theory we have a plan. We’ll loop through the quick_maths
function and compute a z3 expression for each SSA variable assignment. (Here we make use of the fact that each variable only has one assignment!) Once finished, we tell z3 that the return value is equal to 1 and ask it to solve. It should quickly find the input value to satisfy that constraint.
Emulating MLIL on z3
11 @ 100009a0 r3_7#8 = r3_6#7 + 0x225
Binary Ninja’s ILs follow a tree-based format. That instruction above is a MLIL_SET_VAR_SSA
instruction, where the “dest” is an SSAVariable
, and the src
is itself a MLIL_ADD
expression, with a left
side of MLIL_VAR_SSA
and a right
side of MLIL_CONST
.
Writing a series of consecutive conditionals to handle all those possibilities can be daunting, unless we reach into the annals of object-oriented design and take inspiration from the Visitor pattern. As it applies here, we can write roughly the following pseudocode:
function emulate(insn: Instruction): Expression {
switch (insn.op) {
case MLIL_SET_VAR_SSA:
add_constraint(insn.dest == emulate(insn.src))
case MLIL_ADD:
return emulate(insn.left) + emulate(insn.right)
case MLIL_VAR_SSA:
return generate_varname(insn.src)
default:
print("you didn't emulate", insn)
}
}
This neatly takes care of the tree-walking for us via recursion! At the end we can implement case MLIL_RET
to ask z3 for the solution.
So I did this. And I ended up with Python that looked roughly like the following (note the full solve script is available at the bottom):
class Emulator:
@staticmethod
def z3var_from_ssa(ssa_var, size):
assert size in [1, 4]
return z3.BitVec(ssa_var.name + "#" + str(ssa_var.version), size*8)
def emulate(self, insn):
if isinstance(insn, bn.SetVar):
var = self.z3var_from_ssa(insn.dest, insn.size)
expr = self.emulate(insn.src)
self.s.add(var == expr)
elif isinstance(insn, bn.MediumLevelILVarSsa):
return self.z3var_from_ssa(insn.src, insn.size)
elif isinstance(insn, bn.MediumLevelILAdd):
return self.emulate(insn.left) + self.emulate(insn.right)
elif isinstance(insn, bn.MediumLevelILSub):
return self.emulate(insn.left) - self.emulate(insn.right)
elif isinstance(insn, bn.Return):
retval = self.emulate(insn.src[0])
self.s.add(retval == 1)
return retval
else:
print("unhandled", insn.operation)
def solve_for_arg(self, mlil_bb):
retval = None
for insn in mlil_bb:
retval = self.emulate(insn)
print(self.s)
chk = self.s.check()
print(chk)
if chk == z3.sat:
res = self.s.model()[z3.BitVec("arg1#0", 4*8)]
return res
And I ran it and I got… unsat
, z3’s way of saying “that’s impossible”.
chainsaw10: I have a vaguely working emulator, but I’m getting unsat
chainsaw10: so should just need to debug
What could I possibly have wrong? Well, there are lots of options:
- Microblaze is big-endian. It’s possible I have endianness somehow wrong, or maybe the architecture plugin’s lifting does.
- Maybe I’m not properly handling integer overflow or bit math or something. (z3’s BitVec type is designed for precisely this, but maybe I’m holding it wrong.)
- Maybe my bitshift emulation is wrong. IDK how z3 numbers bits, maybe something’s switched somewhere. The shift operation isn’t “shift”, it’s “extract bits”.
So what can you do? Well, I tried weakening the constraint on the return value. Eventually I tried just setting it non-zero, and that got me sat
!
chainsaw10: I have a solve that works for most but not all binaries
So I coded up a harness to talk to the remote and throw solutions, let’s see what happens. Along the way it turns out pwntools doesn’t play nicely on an M1 Mac, so I tried out mpwn
, which worked well.
chainsaw10: I’ve gotten as many as 7 in a row, but that’s not enough
teammate: show me your script
It was kinda fun watching my script solve a few binaries (before hitting one that hit our bug and failing), so I reran it a few more times for the fun of it. But one time I didn’t get an error. I looked closer, and I had gotten lucky and gotten 10 binaries in a row that didn’t use whatever operation was buggy in my emulator! But I hadn’t handled the “win” case in my script, so it just exited.
chainsaw10: hell, I should just run it again. It just won, but I didn’t print the flag
chainsaw10: It only won once in like 30
chainsaw10: but that’s doable
So I fixed the bug (ensuring I used r.interactive()
to drop to an interactive shell if I won but didn’t read the flag properly in my script) and reran it. And reran it, and reran it, like 15 times. Finally on what had to have been the 20th time, I got lucky and it gave me the flag. Time to solve: ~2 hours.
Epilogue
So what was the bug?
10000a8c 90830041 srl r4, r3
10000a90 90840041 srl r4, r4
10000a94 90840041 srl r4, r4
10000a98 90840041 srl r4, r4
10000a9c 90840041 srl r4, r4
10000aa0 90840041 srl r4, r4
Well, it turns out that srl
is a shift left, not a shift right, as it was lifted to be. I had noticed it seemed off, and I tried to correct for it, but I apparently had something else messed up, and in the confusion I never properly tested it.
Incidentally, from talking to someone from another team in the Hack-A-Sat Slack afterward, I realized that my goofing around on Saturday had saved me a good bit of trouble. The bug in the 32-bit immediates affected xor
instructions. Trying to debug subtly-wrong xor
immediate values would have been very difficult.
After the competition, I submitted a PR to fix the bugs we found, and it was accepted soon after. I don’t mean to complain about the bugs, as the plugin seems pretty good overall, and I can easily imagine writing those bugs myself. Not to mention that they made for a more interesting time playing the CTF, and a far more fun writeup.
Solve script
from mp import *
import base64
import glob
import sys
import z3
import binaryninja as bn
FUNCNAME = '_Z11quick_mathsj'
class Emulator:
def __init__(self, bv):
self.s = z3.Solver()
self.hooks = {
self.get_import(bv, "__udivsi3"): self.hook_udivsi3,
}
def hook_udivsi3(self, insn):
lhs = self.emulate(insn.params[0])
rhs = self.emulate(insn.params[1])
#return lhs / rhs
return z3.UDiv(lhs, rhs)
@staticmethod
def get_import(bv, name):
type = bn.SymbolType.ImportedFunctionSymbol
try:
sym = next((sym for sym in bv.symbols[name] if sym.type == type), None)
except KeyError:
sym = None
if not sym:
return None
return sym.address
@staticmethod
def z3var_from_ssa(ssa_var, size):
assert size in [1, 4]
return z3.BitVec(ssa_var.name + "#" + str(ssa_var.version), size*8)
def emulate(self, insn):
if isinstance(insn, bn.SetVar):
var = self.z3var_from_ssa(insn.dest, insn.size)
expr = self.emulate(insn.src)
self.s.add(var == expr)
elif isinstance(insn, bn.Call):
assert insn.dest.operation == bn.MediumLevelILOperation.MLIL_CONST_PTR
addr = insn.dest.constant
result = self.hooks[addr](insn)
output = self.z3var_from_ssa(insn.output[0], insn.dest.size)
self.s.add(output == result)
elif isinstance(insn, bn.MediumLevelILVarSsa):
return self.z3var_from_ssa(insn.src, insn.size)
elif isinstance(insn, bn.MediumLevelILAdd):
return self.emulate(insn.left) + self.emulate(insn.right)
elif isinstance(insn, bn.MediumLevelILSub):
return self.emulate(insn.left) - self.emulate(insn.right)
elif isinstance(insn, bn.MediumLevelILXor):
return self.emulate(insn.left) ^ self.emulate(insn.right)
elif isinstance(insn, bn.MediumLevelILLsl):
# Work around arch plugin bug if not using fixed plugin
return z3.LShR(self.emulate(insn.left), self.emulate(insn.right))
elif isinstance(insn, bn.MediumLevelILLsr):
return z3.LShR(self.emulate(insn.left), self.emulate(insn.right))
elif isinstance(insn, bn.MediumLevelILOr):
return self.emulate(insn.left) | self.emulate(insn.right)
elif isinstance(insn, bn.MediumLevelILZx):
assert insn.size == 4 and insn.src.size == 1
return z3.ZeroExt(32-8, self.emulate(insn.src))
elif isinstance(insn, bn.MediumLevelILLowPart):
assert insn.size == 1 and insn.src.size == 4
src = self.emulate(insn.src)
return z3.Extract(7, 0, src)
elif isinstance(insn, bn.Constant):
return insn.constant
elif isinstance(insn, bn.Return):
retval = self.emulate(insn.src[0])
self.s.add(retval == 1)
return retval
else:
print("unhandled", insn.operation)
def solve_for_arg(self, mlil_bb):
retval = None
for insn in mlil_bb:
retval = self.emulate(insn)
print(self.s)
chk = self.s.check()
print(chk)
if chk == z3.sat:
res = self.s.model()[z3.BitVec("arg1#0", 4*8)]
return res
def handle_bin(bv: bn.BinaryView):
syms = bv.symbols[FUNCNAME]
assert len(syms) == 1
func = bv.get_function_at(syms[0].address)
assert func
mlil = func.mlil.ssa_form
assert len(list(mlil)) == 1 # one basic block
mlil_bb = list(mlil)[0]
emu = Emulator(bv)
res = emu.solve_for_arg(mlil_bb)
if res is not None:
return base64.b64encode(str(res).encode()).decode()
def handle_filename(filename):
filename = "/Users/zack/Downloads/blazing_etudes/" + filename
with bn.open_view(filename) as bv:
return handle_bin(bv)
TICKET = "<ticket>"
if sys.argv[1] == "TEST":
successes = 0
for file in glob.glob("/Users/zack/Downloads/blazing_etudes/*"):
with bn.open_view(file) as bv:
if handle_bin(bv) is not None:
successes += 1
else:
raise Exception("bug: " + file)
print("Test finished with", successes, "successes")
elif sys.argv[1] != "REMOTE":
with bn.open_view("/Users/zack/Downloads/blazing_etudes/alarming_shuffle") as bv:
print(handle_bin(bv))
else:
r = remote("blazin_etudes.satellitesabove.me", 5300)
r.expect("Ticket please:\n")
r.sendline(TICKET)
r.expect("followed by a newline\n")
wins = 0
while True:
data = r.recvline()
print(data)
if b"You did it" in data:
print(r.recvline())
r.interactive()
elif b"didn't exit happy" in data.lower():
wins -= 1
break
name, sha = data.strip().split()
assert b"." not in name and b"/" not in name
result = handle_filename(name.decode())
if result is not None:
wins += 1
r.sendline(result)
else:
print("Failed to solve for", name.decode())
break
print("Number of wins", wins)
r.close()
</...snip>