During my internship, I reverse engineer macOS programs. So, I need a debugger.
I am quite familiar with GDB, and there are nice extensions like GEF out there. However, Apple made it so difficult to compile a usable GDB on macOS… 😑 We are pretty much forced to use LLDB (which I really don’t like).
Anyways, beggars can’t be choosers, and vanilla GDB/LLDB is not really usable in the long run, so being able to use the scripting interface is very important. Similar to GDB, LLDB also has a Python scripting interface. Instead of .gdbinit, LLDB loads commands from .lldbinit during startup. There is a great lldbinit project that contains many custom commands which make LLDB so much nicer to use, and I have a fork of it.
(It is such a pain to even set breakpoints, vanilla LLDB is just not usable imo.)
LLDB Architecture and Python Bindings
Having the lldbinit extension is good. Being able to add my own custom commands is even better. To do so, I had to understand the architecture of the scripting interface.
The LLDB scripting interface is quite tidy. Everything is grouped into classes, all prefixed with SB. Quoting the docs, here are a few of them:
- SBAddress: A section + offset based address class.
- SBBreakpoint: Represents a logical breakpoint and its associated settings.
- SBBreakpointList: Proxy of C++ lldb::SBBreakpointList class.
- SBBreakpointLocation: Represents one unique instance (by address) of a logical breakpoint.
- SBCommandInterpreter: SBCommandInterpreter handles/interprets commands for lldb.
- SBCommandReturnObject: Represents a container which holds the result from command execution.
Feels like writing C++ but in Python. And the basic architecture is as follows:
LLDB design:
------------|
lldb -> debugger -> target -> process -> thread -> frame(s)
                                      -> thread -> frame(s)
- LLDB talks to the debugger object
- debugger holds a target
- target holds a process
- process holds multiple threads
- and lastly, each thread has one or more frames
With more details (quoting the docs):
- SBTarget: Represents the target program running under the debugger.
  - Gives information about the executable, process, modules, memory, breakpoints, etc
  - There’s a lot, check the docs
- SBProcess: Represents the process associated with the target program.
  - Gives information about the process, memory
  - Some overlap with the above, but this one doesn’t have modules nor breakpoints
  - Check the docs
- SBThread: Represents a thread of execution.
  - Gives information about a thread, e.g. thread ID
  - Exposes functions for stepping (step in, step over, etc) and suspending/resuming
  - Contains the stack frame(s) of the thread, one per active function call. Frame 0 is the innermost one, and it is the one I work with most of the time
  - Check out the docs
- SBFrame: Represents one of the stack frames associated with a thread.
  - Gives information about a stack frame, e.g. registers, functions, symbols, disassembly, etc
  - This is a really useful class because of the information it gives.
  - Check out the docs
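To make the hierarchy concrete, here is a minimal sketch that walks from the debugger object down to every frame of every thread. It assumes `debugger` behaves like an `lldb.SBDebugger` and only relies on `GetSelectedTarget`, `GetProcess`, `GetNumThreads`, `GetThreadAtIndex`, `GetThreadID`, `GetNumFrames`, `GetFrameAtIndex` and `GetFunctionName`. The name `walk_frames` is made up for illustration and is not part of LLDB or lldbinit.

```python
def walk_frames(debugger):
    """Return one line per thread and per stack frame of the selected target.

    `debugger` is assumed to behave like lldb.SBDebugger. walk_frames is a
    hypothetical helper name, not something LLDB or lldbinit provides.
    """
    lines = []
    target = debugger.GetSelectedTarget()        # SBTarget
    process = target.GetProcess()                # SBProcess
    for t in range(process.GetNumThreads()):
        thread = process.GetThreadAtIndex(t)     # SBThread
        lines.append("thread %d" % thread.GetThreadID())
        for f in range(thread.GetNumFrames()):
            frame = thread.GetFrameAtIndex(f)    # SBFrame, index 0 is the innermost
            lines.append("  frame #%d: %s" % (f, frame.GetFunctionName()))
    return lines
```

Inside LLDB, something like `script print("\n".join(walk_frames(lldb.debugger)))` should print the result once the function has been loaded, assuming a process is running and stopped.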
Now, some useful functions to access the objects mentioned above (defined by lldbinit):
- SBTarget - get_target()
- SBProcess - get_process()
- SBThread - get_thread()
- SBFrame - get_frame()
Yea, quite easy.
Create Custom Commands
To define a new command, it is as simple as creating a function in lldbinit.py. For example, to create a command called newcmd:
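def cmd_newcmd(debugger, command, result, _dict):
    # split() with no argument returns [] for an empty command string,
    # whereas split(' ') would return [''] and the usage check would never fire
    args = command.split()
    if len(args) < 1:
        print('newcmd <expression>')
        return
    ...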
Then, the function must be registered as a command in the __lldb_init_module method:
def __lldb_init_module(debugger, internal_dict):
    ...
    ci.HandleCommand("command script add -f lldbinit.cmd_newcmd newcmd", res)
    ...
The function name can be anything, but by convention all command functions in lldbinit go by cmd_<command name>.
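The naming convention also makes it possible to register commands in bulk. The following is a minimal sketch, not how lldbinit actually does it (lldbinit registers each command explicitly): `register_commands` is a hypothetical helper that scans a namespace for `cmd_*` functions and issues the same `command script add` line for each one through `debugger.HandleCommand`.

```python
def register_commands(debugger, module_name, namespace):
    """Register every callable named cmd_<name> in `namespace` as LLDB command <name>.

    Hypothetical helper: `debugger` is assumed to behave like lldb.SBDebugger,
    `module_name` is the Python module the functions live in (e.g. "lldbinit").
    """
    registered = []
    for func_name, obj in sorted(namespace.items()):
        if func_name.startswith("cmd_") and callable(obj):
            cmd_name = func_name[len("cmd_"):]
            debugger.HandleCommand(
                "command script add -f %s.%s %s" % (module_name, func_name, cmd_name))
            registered.append(cmd_name)
    return registered
```

It could be called from `__lldb_init_module` as `register_commands(debugger, "lldbinit", globals())`. The explicit approach has the advantage that the list of commands is visible in one place.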
The function arguments result and _dict might be useful but I don’t use them. debugger is the LLDB debugger object mentioned earlier, and command is the exact command string entered by the user.
We can use the API listed above to obtain information about the target/process/thread/frame, or perform actions such as setting breakpoints, stepping through instructions, etc.
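For completeness, here is a minimal sketch of what the `result` argument is for. It is an `SBCommandReturnObject`, and writing to it with `AppendMessage` or `SetError` (instead of `print`) lets LLDB capture the output and know whether the command failed. The command name `hello` is made up, and `shlex.split` is used so that quoted arguments stay together.

```python
import shlex

def cmd_hello(debugger, command, result, _dict):
    """Hypothetical command: hello <name>"""
    args = shlex.split(command)      # handles quotes, returns [] for an empty string
    if len(args) != 1:
        # marks the command as failed and carries the message back to LLDB
        result.SetError("usage: hello <name>")
        return
    # normal output goes into the return object rather than stdout
    result.AppendMessage("hello, " + args[0])
```

The commands in this post use `print` instead, which is fine for interactive use.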
Example 1: Reading Memory
Here’s a simple command I wrote to print UTF-16 (wide) strings from memory (similar to x/s but for strings made of 16-bit characters).
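def cmd_xu(debugger, command, result, _dict):
    args = command.split()
    if len(args) < 1:
        print('xu <expression>')
        return
    # evaluate the expression in the current frame to get the string's address
    # (GetValueAsUnsigned also copes with pointer values, which GetValue() returns as hex)
    addr = get_frame().EvaluateExpression(args[0]).GetValueAsUnsigned()
    error = lldb.SBError()
    ended = False
    s = u''
    offset = 0
    while not ended:
        # read 100 bytes (50 16-bit characters) at a time
        mem = get_target().GetProcess().ReadMemory(addr + offset, 100, error)
        if not error.Success():
            print(error.GetCString())
            return
        for i in range(0, 100, 2):
            # little-endian 16-bit character
            wc = mem[i+1] << 8 | mem[i]
            if wc == 0:
                # NUL terminator, stop without appending it
                ended = True
                break
            s += chr(wc)
        offset += 100
    print(s)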
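The command above assembles the string one 16-bit unit at a time with `chr()`, which does not recombine surrogate pairs (characters outside the Basic Multilingual Plane, such as emoji). A possible alternative, sketched here under the assumption that the string is UTF-16LE, is to collect the raw bytes up to the terminator and let Python decode them. `read_utf16_cstring` and its `read` callback are hypothetical names; in LLDB the callback could wrap `process.ReadMemory(addr, size, error)`.

```python
def read_utf16_cstring(read, addr, chunk=100, max_bytes=4096):
    """Read a NUL-terminated UTF-16LE string via read(addr, size) -> bytes.

    Hypothetical helper. Assumes `chunk` is even and that `read` returns
    an empty value when the memory cannot be read.
    """
    raw = bytearray()
    while len(raw) < max_bytes:
        block = read(addr + len(raw), chunk)
        if not block:
            break                                # read failed, decode what we have
        # look for a 16-bit NUL on an even offset
        for i in range(0, len(block) - 1, 2):
            if block[i] == 0 and block[i + 1] == 0:
                raw += block[:i]
                return raw.decode("utf-16-le", errors="replace")
        raw += block
    return raw.decode("utf-16-le", errors="replace")
```

Keeping the memory access behind a callback also means the decoding logic can be tried outside the debugger, with a plain `bytes` buffer standing in for process memory.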
Example 2: Alias
It is also possible to make aliases for long commands. For example, here is an alias for disabling breakpoints. It runs the full command through the SBCommandInterpreter object obtained via the debugger.GetCommandInterpreter() method.
# disable breakpoint number
def cmd_bpd(debugger, command, result, _dict):
    res = lldb.SBCommandReturnObject()
    debugger.GetCommandInterpreter().HandleCommand("breakpoint disable " + command, res)
    print(res.GetOutput())
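When several aliases follow the same pattern, a small factory can generate them. This is only a sketch: `make_alias` is a made-up helper, and it assumes that the command's own `result` object can be handed straight to `HandleCommand` so the output of the underlying command flows back without an explicit `print`.

```python
def make_alias(expansion):
    """Return a command function that forwards its arguments to `expansion`.

    Hypothetical helper. `debugger` is assumed to behave like lldb.SBDebugger
    and `result` like lldb.SBCommandReturnObject.
    """
    def cmd(debugger, command, result, _dict):
        full = (expansion + " " + command).strip()
        debugger.GetCommandInterpreter().HandleCommand(full, result)
    return cmd

# module-level names, so they can still be registered with "command script add -f"
cmd_bpd = make_alias("breakpoint disable")
cmd_bpe = make_alias("breakpoint enable")
```

For simple cases like this one, LLDB's built-in `command alias` may be enough and needs no Python at all.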
Example 3: Function Tracing
Lastly, here is an example of a more complicated command I wrote to get the list of functions called by the target. This is useful when I am attaching LLDB to Safari, and want to know the functions in a library that were called by Safari when loading a webpage.
For example, to see the functions in CoreGraphics called when browsing Wikipedia:
(lldbinit) cz CoreGraphics
[+] Creating breakpoints for all symbols in CoreGraphics
[+] Done creating breakpoints for all symbols in CoreGraphics
0x7fff2505d324:
| CoreGraphics CGColorSpaceUsesExtendedRange
|__ WebKit WebKit::ShareableBitmap::calculateBytesPerRow(WebCore::IntSize, WebKit::ShareableBitmap::Configuration const&)
0x7fff2504a997:
| CoreGraphics CGColorSpaceGetNumberOfComponents
|__ QuartzCore CABackingStorePrepareUpdates_(CABackingStore*, unsigned long, unsigned long, unsigned int, unsigned int, unsigned int, unsigned long long, CA::GenericContext*, UpdateState*)
0x7fff250496fc:
| CoreGraphics CGColorSpaceRetain
|__ QuartzCore CA::CG::IOSurfaceDrawable::IOSurfaceDrawable(__IOSurface*, unsigned int, unsigned int, CGColorSpace*, int, int, unsigned int, unsigned int)
0x7fff2549cd24:
| CoreGraphics CFRetain
|__ CoreGraphics CGColorSpaceRetain
0x7fff2504971c:
| CoreGraphics cs_retain_count
|__ CoreFoundation _CFRetain
0x7fff2504e3d8:
| CoreGraphics CGColorSpaceRelease
|__ QuartzCore CABackingStorePrepareUpdates_(CABackingStore*, unsigned long, unsigned long, unsigned int, unsigned int, unsigned int, unsigned long long, CA::GenericContext*, UpdateState*)
0x7fff2505fdd1:
| CoreGraphics CGSNewEmptyRegion
|__ QuartzCore CABackingStorePrepareUpdates_(CABackingStore*, unsigned long, unsigned long, unsigned int, unsigned int, unsigned int, unsigned long long, CA::GenericContext*, UpdateState*)
...
def cmd_cz(debugger, command, result, _dict):
    args = command.split()
    if len(args) < 1:
        print('cz <module name>')
        return
    module_name = args[0]
    target = debugger.GetSelectedTarget()
    module = find_module_by_name(get_target(), module_name)
    # to keep track of the breakpoints set
    bpmap = {}
    print("[+] Creating breakpoints for all symbols in", module_name)
    for symbol in module:
        # some symbols have no name
        sym_name = symbol.GetName() or ""
        # skip noisy functions that get called all the time
        if sym_name.startswith("os") or "pthread" in sym_name or "lock" in sym_name or "operator" in sym_name:
            continue
        address = symbol.GetStartAddress().GetLoadAddress(target)
        bp = target.BreakpointCreateByAddress(address)
        bpmap[address] = bp
    print("[+] Done creating breakpoints for all symbols in", module_name)
    visited = set()
    while True:
        get_process().Continue()
        thread = get_thread()
        # address where the process stopped (x86_64 only)
        rip = int(str(get_frame().reg["rip"].value), 16)
        if rip in visited:
            continue
        if rip not in bpmap:
            print("[+] Dead")  # crashed or something
            break
        visited.add(rip)
        # disable breakpoint after reaching it one time
        bpmap[rip].SetEnabled(False)
        print(hex(rip) + ":")
        # frame 0 is the function that was hit, frame 1 is its caller
        for i in range(2):
            frame = thread.GetFrameAtIndex(i)
            symbol = frame.GetSymbol()
            module = frame.GetModule().GetFileSpec().GetFilename()
            print("|" + "__" * i, module, symbol.GetName())
In this example, I used the SBSymbol methods to get each function’s address in the process, and created a breakpoint on it using target.BreakpointCreateByAddress(address). I also used the lldbinit helper function find_module_by_name to get a SBModule object given a module name.
This command uses a few more API calls that I did not mention. Read the code to learn more if you are interested.
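The breakpoint setup can also be pulled out of the command and simplified a little. The sketch below is one possible variation, not the code used above: it moves the name filter into a predicate and marks each breakpoint as one-shot with `SBBreakpoint.SetOneShot(True)`, so LLDB removes it after the first hit instead of the script disabling it by hand. `should_skip` and `set_oneshot_breakpoints` are hypothetical names.

```python
SKIP_PREFIXES = ("os",)
SKIP_SUBSTRINGS = ("pthread", "lock", "operator")

def should_skip(sym_name):
    """Same filter as in cmd_cz, plus a guard for symbols without a name."""
    if not sym_name:
        return True
    return (sym_name.startswith(SKIP_PREFIXES)
            or any(s in sym_name for s in SKIP_SUBSTRINGS))

def set_oneshot_breakpoints(target, module):
    """Set a one-shot breakpoint on every kept symbol, return {address: name}.

    `target` and `module` are assumed to behave like lldb.SBTarget and
    lldb.SBModule (iterating a module yields its symbols).
    """
    hits = {}
    for symbol in module:
        name = symbol.GetName()
        if should_skip(name):
            continue
        address = symbol.GetStartAddress().GetLoadAddress(target)
        bp = target.BreakpointCreateByAddress(address)
        bp.SetOneShot(True)          # LLDB deletes the breakpoint after its first hit
        hits[address] = name
    return hits
```

Note that the substring filter is deliberately crude: "lock" also matches any name containing "Block", so some functions are skipped that probably should not be.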
That’s all. Hope you find this useful :D
References: