The skill of performing binary analysis may appear to some to be limited to a few select souls. While it may look like a form of dark arts when someone can read data structures in a raw hex dump, it shouldn’t even qualify as a party trick. To quote @BizTheDeveloper’s mother, “…reading a hex dump is not that hard…”
Now, the goal of this post is not to turn the reader into a hex dump magician. Instead, I want to show that binary analysis is all about data parsing. If you are a Go developer who is not interested in the analysis of Go binaries, this post will still have something for you. Did you know that “Go binary analysts” can tell whether you organize your source code neatly or just dump everything into one file?
In this post, we will see how to extract some of the hidden metadata available in binaries produced by the Go compiler. With the extracted data, we will look at a few use cases, including how it can be used to determine whether an application uses a vulnerable dependency. The goal is to be able to perform this on “production builds,” so we are targeting support for stripped binaries. This means we can’t depend on debug information or symbols that are normally included in binaries produced by the compiler. These may look like things we need, but in fact, we don’t. Finally, let’s limit ourselves to only using the standard library and optionally a “golang x” package.
The Debug Package
A package that everyone who wants to analyze Go binaries should know about is the debug package in the standard library. The package provides sub-packages to parse ELF (Linux and Unix), PE (Windows), and Mach-O (macOS) files. In addition, it has useful functions for parsing some of the data structures we need to process. A good thing about these functions is that they don’t depend on debug information or symbols, which means we can use them for stripped “production builds.” Finding the structures is a bit harder without symbols, but that is unfortunately not something we can work around.
The metadata table we are going to focus on in this section is the PCLNTAB. The PCLNTAB was added in Go 1.2 and holds data needed for Go’s panic messages. The table maps between the program counter (the location of an assembly instruction) and the source code file and line number, allowing a more developer-friendly panic message that includes where in the source code the panic occurred. The table also contains information for every function, including functions that have been eliminated by the compiler during the dead code elimination step. Our goal in parsing this table is to recover all the packages in the Go binary.
Before we can process the table, we first need to find it. This is very easy for ELF and Mach-O files because the table is located in its own section, called .gopclntab. When it comes to PE files, the process is a bit less straightforward. The table is usually located in the .rdata or .text section of the PE file. The table starts with a magic value that can be used to locate its beginning. For Go binaries compiled with compiler versions 1.2 up to, but not including, 1.16, the magic value is 0xfffffffb. For files compiled with 1.16 and later, the magic value is 0xfffffffa. To ensure the match is correct, we can run the same checks on the table that the parser function uses.
To parse the table, we will use the debug/gosym package. First, we need to create a LineTable with the NewLineTable function. Using the LineTable, we can create a symbol table with the NewTable function. NewTable accepts a byte slice of the symbol table. This argument can be nil, which is good news since the symbol table is not available in stripped binaries.
The definition of the returned structure is shown in the code snippet below. While it exports a few fields, the one we are interested in is Funcs, a slice of Func.
The Func type holds information about a single function, including its entry point and where it ends. These addresses are the virtual memory locations used when the file is loaded for execution, not offsets from the beginning of the file. If the function has been eliminated by the compiler, its Entry field has a value of 0.
The Func structure also includes a pointer to the underlying symbol. Via Sym, we can get the name of the function with the BaseName method and the package it belongs to with the PackageName method. If the Func is a method attached to a type, the ReceiverName method returns the name of the receiver; otherwise, it returns an empty string.
Using this information, we can iterate over all functions to discover which packages are used and which functions are reachable (according to the Go compiler’s dead code elimination logic). Additional heuristics can then be used to sort the packages into the standard library, dependencies, and the main module.
With the introduction of the module system, there is another way of enumerating the packages used when compiling a Go program. While this information is only available in binaries compiled with “go mod” enabled, the data is richer. The build info structure is stored in a separate section in ELF and Mach-O files, named .go.buildinfo. For PE files, this data is stored inside one of the data sections. It’s a small, 32-byte data structure. An example is shown in the code snippet below. The first 16 bytes are the structure header. It starts with the 14-byte magic value 0xff Go buildinf:. The next byte is either 0x4 or 0x8 and gives the pointer size in bytes. The last byte in the header indicates the byte order: 0x0 means little-endian, while 0x1 means big-endian.
Following the header are two pointers to two Go strings. Under the hood, a Go string is a structure with a pointer to the start of the data as its first field and the length of the data as its second field. The first string is the version of the compiler used to compile the binary. The second string is the build information; it is only available if the project was compiled with Go modules enabled.
The runtime package has the logic to parse the build info string into the BuildInfo structure shown in the snippet below. The structure essentially holds the information from the go.mod file plus the checksums from go.sum.
The code in the runtime for parsing this data structure has seen some changes in the last couple of months. The logic has also been added as a sub-package of the debug package, which is expected to ship with Go 1.18. One thing that has changed in the structure is the addition of a Settings field that holds more information about the build environment.
Until Go 1.18 is released, we can just copy the code from the runtime and use it to parse the string into a useful data structure. With this information, we get the versions and can easily detect which packages are part of a dependency module.
In this section, we are going to see how the extracted information can be used to design a vulnerability scanner. There is an ongoing project to develop a vulnerability database and code for checking against it. Instead of working with the source code, the goal for this scanner is to work with the compiled artifact, allowing users of a Go application to check it for vulnerabilities. For our example, we will use a vulnerability reported in the gopkg.in/yaml.v2 module. The code snippet below shows the data available in the vulnerability database.
In addition to the module name gopkg.in/yaml.v2, we also have the fixed version, v2.2.3, and the affected symbol, decoder.unmarshal. From this we can deduce that the unmarshal method on the decoder type is vulnerable prior to v2.2.3. With this information, we can construct the following logic to check for the vulnerability:
- Extract the build information to see if the binary uses a version earlier than v2.2.3. If not, report as not vulnerable.
- Search for a function with the package name of gopkg.in/yaml.v2, the receiver of decoder, and the name of unmarshal.
- Check if the found function has a non-zero Entry field. If it does, report as vulnerable. Otherwise, report as not vulnerable.
Now, this approach is not perfect. We are relying on the Go compiler’s dead code elimination logic to remove code that’s not reachable. It is possible that code the binary never executes was not eliminated by the compiler, which would result in a false positive. These false positives could be reduced further, but that would require constructing call graphs to determine whether the code is reachable. This is out of scope for this post.
Source Code Map
With the data extracted using the debug package, there are some other interesting things we can do. Remember that the PCLN table is used to map a program counter to a specific line in a source code file. This means the binary has information about the structure of the source code. All we need to do is extract it and present it in a friendlier way. As I have described the process in a previous blog post, this will be a summarized version.
In an earlier section, we introduced the Func type, which holds symbol information about a function, including pointers to where the code of the function starts and ends. This means we know where the first instruction starts and the last instruction ends for each function. The Table structure has a method for resolving a program counter to both a line number and a source code file name. This gives us a way to determine the first and last line number of a function in the source code. One may naively assume that we can just pass in the Entry and End field values and get back what we want. Unfortunately, this is not the case. The entry works fine, but issues sometimes occur with the last instruction. The compiler adds code to the end of each function (code that requests more stack space), which can throw off the information. So the best we can do is estimate where the end is by checking all instructions in the function and assuming that the largest line number in the same file as the starting line is the end of the function. This method isn’t perfect, as inlined functions can break the assumption.
The next step is to get the location of each instruction in the function. For x86, this isn’t straightforward because instructions can be anywhere from one to 15 bytes long, meaning the only way to find the location of each instruction is to disassemble the code. The Go team maintains a package named arch that can disassemble x86, ARM, and PowerPC; it is used by the Go objdump tool. Another hack is to simply assume that each instruction is, say, four bytes long and resolve the line for every four bytes. The program counter to line mapping function does not care if the passed-in program counter lands in the middle of an instruction…
With the file names and line numbers extracted, we can organize the data and present it. Here, for example, is the extracted data for a gofmt binary, showing only the main module. The first line gives the name of the package and the path to the folder where it was compiled. Next, each file is listed. Under each file, the functions are listed sorted by starting line number. Each function line also includes an estimated ending line and an estimated line count; the first line is the function definition. In the code snippet below, we can see that one file is named <autogenerated>. This is code generated by the compiler. Another thing we can see in the output is some functions with dwrap in the name, for example, processFile·dwrap·1. These are functions generated by the compiler for defer calls.
One may wonder what this information can be used for. One application is detecting changes in Go applications. Another is the analysis of suspicious Go binaries, for example, identifying whether an application is an open-source program or malware. The snippet below is from a ransomware called Snatch that has been around for a few years. From the function names, we can get an idea of what the binary might be doing: we see names that suggest scanning folders for files and encrypting them. One thing this ransomware does is install itself as a Windows service that is started in Safe boot mode. After it has installed itself as a service, it reboots the machine into safe mode. We can see function names in the output that suggest this behavior.
There is much more metadata, for example, types and the build ID, that can be extracted from Go binaries. If we were to cover all of it, this post would be way too long. Hopefully, this has shown that Go binaries are rich in metadata and that binary analysis isn’t that bad. For readers who would like to do some binary analysis of their own but do not want to write all the code to extract the data, there are libraries available. The Go Reverse Engineering Tool Kit can extract all the data covered in this post, leaving what to do with the data to your imagination.