Ghidra is a software reverse engineering (SRE) framework developed by NSA's Research Directorate. This framework includes a suite of full-featured, high-end software analysis tools that enable users to analyze compiled code on a variety of platforms including Windows, macOS, and Linux. Capabilities include disassembly, assembly, decompilation, graphing, and scripting, along with hundreds of other features. Ghidra supports a wide variety of processor instruction sets and executable formats and can be run in both user-interactive and automated modes. Users may also develop their own Ghidra plug-in components and/or scripts using the exposed API. In addition, there are numerous ways to extend Ghidra, such as new processors, loaders/exporters, automated analyzers, and new visualizations.
In support of NSA's Cybersecurity mission, Ghidra was built to solve scaling and teaming problems on complex SRE efforts, and to provide a customizable and extensible SRE research platform. NSA has applied Ghidra SRE capabilities to a variety of problems that involve analyzing malicious code and generating deep insights for NSA analysts who seek a better understanding of potential vulnerabilities in networks and systems.
Ghidra 9.2 is fully backward compatible with project data from previous releases. However, programs opened in 9.2 may no longer be accessible by an earlier Ghidra version if the processor model has been updated. A processor version number mismatch error is displayed if this occurs. Unless absolutely necessary, it is better to use only the latest version than to attempt to use both Ghidra 9.2 and a previous release.
This release includes many new features and capabilities, performance improvements, quite a few bug fixes, and many pull-request contributions. Thanks to all those who have contributed their time, thoughts, and code. The Ghidra user community thanks you too!
NOTE: Ghidra Server: The Ghidra 9.0 server is compatible with Ghidra 9.x clients. However, starting with 9.1, the server requires clients to use a TLS secure connection for the initial RMI registry port access. If the Ghidra multi-user server is upgraded to 9.2, then all clients must upgrade to 9.2. A 9.x Ghidra client will fall back to a non-TLS connection when accessing the RMI Registry on a 9.0 server. Note that all other server interactions, including authentication, were and continue to be performed over a secure TLS connection.
Minor Note: FIDB Files: If a processor's instruction implementation has changed significantly, any .fidb files generated using that processor definition may need to be regenerated. Changes that could require regeneration include changes in instruction size, the number of operands, the nature of the operands, or the register decoding for an operand. The 64-bit x86 has had such changes; for example, the decoded register changed for many instructions with prefix byte overrides. All the provided .fidb files have been regenerated, and new ones for VS 2017/2019 have been added.
Minor Note: SLA Files: Ghidra-compiled .sla files are not always backwards compatible due to changes in the underlying .sla specification. In the prebuilt Ghidra, all .sla files are rebuilt from scratch. However, if you have local processor modules, or are building Ghidra from scratch, you may need to do a clean build. Any processor modules with changes are normally recompiled at Ghidra startup, so this situation is rare.
Minor Note: AARCH64 Long: The size of a long on the AARCH64 has been changed from 4 bytes to 8 bytes in the data organization within the compiler specification. This change could have ramifications in existing AARCH64 programs that use a long within data structures or in custom storage of function parameters (dynamic storage should not be an issue). The included script FixupCompositeDataTypesScript can be run on programs where the datatype size for long has changed; in a Multi-User project, this requires an exclusive checkout. This general script can be used whenever a program's base datatypes have changed in the compiler specification, which should be a rare occurrence.
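To illustrate why the size of a long matters for existing structures, the following minimal sketch lays out a hypothetical structure under both data organizations. The layout helper, the field names, and the natural-alignment rule are assumptions made for illustration; this is not Ghidra's API or its actual layout algorithm.

```python
def layout(fields, sizes):
    """Return ({name: offset}, total_size) using simple natural alignment.

    fields: list of (name, type_name); sizes: {type_name: size_in_bytes}.
    Each field is aligned to its own size and the structure is padded to
    its largest member alignment (a simplification of real ABI rules).
    """
    offsets = {}
    offset = 0
    max_align = 1
    for name, type_name in fields:
        size = sizes[type_name]
        align = size
        offset = (offset + align - 1) // align * align  # round up to alignment
        offsets[name] = offset
        offset += size
        max_align = max(max_align, align)
    total = (offset + max_align - 1) // max_align * max_align
    return offsets, total


# Hypothetical structure: struct { int id; long count; char tag; }
FIELDS = [("id", "int"), ("count", "long"), ("tag", "char")]

old_sizes = {"char": 1, "int": 4, "long": 4}  # long treated as 4 bytes
new_sizes = {"char": 1, "int": 4, "long": 8}  # long treated as 8 bytes

old_offsets, old_total = layout(FIELDS, old_sizes)
new_offsets, new_total = layout(FIELDS, new_sizes)

print("old:", old_offsets, old_total)
print("new:", new_offsets, new_total)
```

Under these assumptions every field after the long moves and the overall size grows, which is why composites that contain a long may need to be fixed up after the change.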
Ghidra has been integrated with an open source graph visualization package, called JUNGRAPHT, to display interactive block graphs, call graphs, and AST control flow graphs, and to provide a general API for creating graphs within plug-ins and scripts. Prior to the initial public release, graphing had been provided by a legacy graphing package which could not be released publicly due to licensing issues.
Graphs are displayed in a new tabbed graph window. The current location and selection of vertices are kept in sync with other information displays such as the listing and decompiler. Each graph can be filtered and visualized with various layout algorithms to examine the program structure. In addition, graphs can be exported in several standard graph formats, such as CSV, GRAPHML, GML, JSON, and VISIO. The exported file can then be imported into external tools.
The graphing capability is implemented by a general service mechanism, which allows other graph providers to be implemented to support a favorite graphing tool. However, users will most likely be satisfied with the new default implementation. Follow-up capabilities are planned, such as graph-specific popup actions on the nodes and edges that can be added by the creator of the graph before display. As in everything, the Ghidra team is interested in any feedback you might provide on this new capability.
Added a new platform-independent PDB Reader/Analyzer/Loader that has the ability to process raw PDB files and apply extracted information to a program. Because it is written in Java, PDBs can be utilized on any supported platform, not just on Windows as in prior Ghidra versions. PDBs can be applied during analysis or by loading and applying the PDB before analysis. Information from PDBs can be force-loaded into a program with a mismatched PDB signature, which is very useful for extracting datatypes to be used with the program from a PDB related to that program. Loading the PDB utilizes a new underlying Universal Reader API.
The PDB Reader and Analyzer capabilities are an evolutionary development. We expect to improve this feature in future releases, adding to its capabilities and fixing bugs. If the new PDB Analyzer causes issues, you can turn it off and use the original PDB Analyzer.
A change to scripting brings a powerful form of dynamic extensibility to Ghidra scripting, where Java source code is (re)compiled, loaded, and run without exiting Ghidra. When a script grows large or requires external dependencies, it might be worth the effort to split up code into modules. To support modularity while preserving the dynamic nature of scripts, Ghidra uses OSGi. The new feature provides better script change detection, external jar dependencies, script lifecycle management, and modularity.
To find out more, bring up Help contents in Ghidra, and search for OSGi or Bundles.
There have been numerous changes to the decompiler addressing quality, readability, and usability. Decompilation has been improved by:
The decompiler GUI has also been enhanced with the addition of multiple highlights of varying color, called secondary highlights. In addition, the Decompiler's Auto Create/Fill Structure commands incorporate datatype information from function prototypes and will override undefined or more general datatypes with discovered datatypes that are more specific.
The Decompiler documentation has also been rewritten and is now more comprehensive!
There have been major performance improvements in both analysis and the display or filtering of information within GUI components. These changes are most notable on large binaries, with reports of analysis times dropping from more than 24 hours to under an hour. Some operations were previously done so inefficiently that the end user might give up on analysis. Please report any severe performance issues or binaries that take a large amount of time to process. If you can find an easily obtainable example binary that reproduces the issue, the root cause can be identified and hopefully improved. There are some remaining sore spots in performance that we are still working on, such as the non-returning function analyzer. We hope you will find the binary analysis speed and interactivity much improved.
Some specific areas of improvement are binaries with rich datatype information, RTTI information, exception records, large number of bytes, large number of defined symbols, and many symbols at a single address.
Function Identification databases have been recreated from scratch, including new information for Visual Studio 2017 and 2019 libraries. The databases have been cleaned and should overall result in more matches, with fewer mismatched or multiple matches for identified functions. In addition, the FID libraries had to be rebuilt from scratch due to errors or differences in instruction set decoding (especially in the 64-bit X86) relative to prior versions of Ghidra. The FID is sensitive to the actual instruction bytes, the mnemonic, the registers, and the number of operands.
Several further improvements have been identified and will be added in a future release. Until then, to get a higher positive match rate, turn on the Shared Return Calls Analyzer option Assume Contiguous Functions Only, and possibly Allow Conditional Jumps. For normal, clean binaries that are not heavily optimized, obfuscated, or malware, these options should cause few issues.
Both GNU and Microsoft symbol demangling have been greatly improved, resulting in fewer symbols left mangled and better function signature recovery.
Several new processor specifications have been added, from very old processors to more recent: CP1600, M6809, M8C, RISC-V, V850.
Note: the Elan EM78xxx just missed the 9.2 cutoff, but should appear shortly.
Many improvements and bug fixes have been made to existing processor specifications: ARM, AARCH64, AVR8, CR16C, PIC24/30, SH2, SH4, TriCore, X86, XGATE, 6502, 68K, 6805, M6809, 8051, and others. Of note, the AARCH64 has been updated to support all v8.6 spec instructions. Many improvements have been contributed by the Ghidra community, while others were discovered and fixed using a currently internal tool which automates fuzzing of individual instructions against an external emulator or debugger. We hope to put the tool out in a near-term future release.
Minor changes have been made to the build process of the Sleigh Editor. For those trying to build it from scratch, the instructions are a little clearer and should work correctly. In addition, the new POPCOUNT operator is supported. For those modifying or studying sleigh processor specifications who were unaware of the Sleigh Editor, we encourage you to give it a try. We suggest you install and run the Sleigh Editor in an Eclipse installation separate from the one you use with the entire Ghidra source code base imported, possibly the Eclipse you use with the Ghidra runtime. To find out more, read the GhidraSleighEditor_README.html.
The External Disassembler is a plug-in useful when developing or troubleshooting sleigh processor specifications. It is part of the Xtra SleighDevTools project. The plug-in integrates with an external disassembler, such as binutils, and provides a code browser field that displays the external disassembly at each instruction or undefined byte in the listing. The only external disassembler integration provided is binutils; however, it is possible to add support for additional external disassemblers. Previously the External Disassembler had trouble with instruction sets that have an alternate instruction encoding mode, such as Thumb or microMIPS. The External Disassembler field now has configuration files that feed different options to the external disassembler so that it chooses the correct alternate encoding set. This also works well with several scripts that aid in processor development, such as the CompareSleighExternal script.
A new p-code operation POPCOUNT is supported in sleigh processor specifications. POPCOUNT was mainly added to deal with instructions that need to compute the parity of a result. In addition, the Sleigh compiler error messages have been reworked to be more comprehensible, to be consistent in format and layout, and to provide correct line numbers as close to the error as possible. The compiler also now catches several cases that previously would pass compilation but cause issues during use of the processor.
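As an illustration of how a population count yields parity, the sketch below models an x86-style parity flag, which is set when the low byte of a result contains an even number of 1 bits. The function names are hypothetical and the code is plain Python, not Sleigh or p-code.

```python
def popcount(value):
    """Count the 1 bits in a non-negative integer."""
    count = 0
    while value:
        value &= value - 1  # clear the lowest set bit
        count += 1
    return count


def parity_flag(result):
    """Return 1 when the low byte of result has an even number of 1 bits."""
    return 1 if popcount(result & 0xFF) % 2 == 0 else 0


for sample in (0x00, 0x03, 0x07, 0x1FF):
    print(hex(sample), popcount(sample & 0xFF), parity_flag(sample))
```

Without a population count operation, the same flag has to be expressed as a chain of shifts and XORs over each bit, which is harder to read and to analyze.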
The debugger is very much still in progress. You may have seen some commits, in the Ghidra GitHub master branch, to get in sync with the debugger. Stay tuned for more on the Dynamic Analysis Framework soon after the 9.2 release.
Numerous other bug fixes and improvements are fully listed in the ChangeHistory file.
Minor Note: Ghidra-compiled .sla files are not backwards compatible due to the newly added OTHER space for syscalls support. In the prebuilt Ghidra, all .sla files are rebuilt from scratch. However, if you have local processor modules, or are building Ghidra from scratch, you may need to do a clean build. You will get an error if an old .sla file is loaded without recompilation of the .slaspec file. Any processor modules with changes are normally recompiled at Ghidra startup, so this situation is rare.
Bitfields within structures are now supported as a Ghidra datatype. Bitfield definitions can come from PDB, DWARF, parsed header files, and can also be created within the structure editor. All Datatype archives delivered with Ghidra have been reparsed to capture bitfield information. In addition, compiler bitfield allocation schemes have been carefully implemented. Full support for bitfield references within the decompiler is planned for a future release.
In support of creating bitfields within structures, a new bitfield editor within the structure editor has been added. The Bitfield Editor includes a visual depiction of the datatype byte layout and the associated bits, which simplifies the creation of bitfields within a structure.
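For readers unfamiliar with how bitfields share a storage unit, here is a minimal sketch of packing and extracting fields by bit offset and bit size. It assumes fields are allocated starting at the least significant bit, which is only one of the allocation schemes compilers use; the helper names and the example structure are hypothetical and unrelated to Ghidra's API.

```python
def insert_bitfield(storage, bit_offset, bit_size, value):
    """Return storage with the field at bit_offset replaced by value."""
    mask = (1 << bit_size) - 1
    return (storage & ~(mask << bit_offset)) | ((value & mask) << bit_offset)


def extract_bitfield(storage, bit_offset, bit_size, signed=False):
    """Read a field of bit_size bits starting at bit_offset."""
    mask = (1 << bit_size) - 1
    value = (storage >> bit_offset) & mask
    if signed and value & (1 << (bit_size - 1)):
        value -= 1 << bit_size  # sign-extend a negative field
    return value


# Hypothetical: struct flags { unsigned mode:3; unsigned enabled:1; int delta:4; }
unit = 0
unit = insert_bitfield(unit, 0, 3, 5)    # mode = 5
unit = insert_bitfield(unit, 3, 1, 1)    # enabled = 1
unit = insert_bitfield(unit, 4, 4, -2)   # delta = -2

print(hex(unit))
print(extract_bitfield(unit, 0, 3),
      extract_bitfield(unit, 3, 1),
      extract_bitfield(unit, 4, 4, signed=True))
```

All three fields occupy a single byte here, so a listing that shows only the byte value hides the individual fields, which is the gap a bitfield datatype fills.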
Ghidra now supports overriding indirect calls, CALLOTHER p-code ops, and conditional jumps via new overriding references. These references can be used to achieve correct decompilation of syscall-like instructions. A new script, ResolveX86orX64LinuxSyscallsScript, has been provided as part of this initial implementation. Future releases will automatically identify and apply system calls for other operating systems and versions.
To support system calls, the decompiler follows references into OTHER address space overlays. This allows users to create address spaces on the fly without worrying about conflicts with existing spaces. For example, instructions with a unique calling convention can be properly handled by adding a reference to a custom function signature.
A new set of tools designed to make processor specifications easier to create, modify, and validate has been added. The tools consist of a context-sensitive Sleigh file editor, a p-code validation framework, an external disassembler field, and several scripts to make development easier. The Sleigh editor is a plug-in for Eclipse and provides modern editor features such as syntax coloring, hover, navigation, code formatting, validation, reference finding, and error navigation. The test suite emulates the p-code to automatically validate the instructions most commonly used by the compiler for that processor.
DYLD shared cache images, extracted from an iOS image, can now be imported in their entirety. A DYLD's embedded DYLIBs are split into memory blocks, greatly enhancing follow-on analysis. Internal Mach-O headers are retained and marked up similarly to ELF and PE files, which includes tracking the origin of the program bytes from the initial import binary.
The Ghidra server now requires the client to use a TLS secure connection for the initial RMI registry port access. Previously, TLS was used for all remote object interactions and data transfers on the two other ports. This change ensures that all connections to the Ghidra Server utilize TLS. As noted above, 9.1 clients can connect to a 9.0 or 9.1 server, while clients prior to 9.1 will be unable to connect to a 9.1 server.
The Ghidra server has two additional authentication methods: Active Directory using Kerberos, and Pluggable Authentication Modules (PAM) using JAAS. To utilize these new methods, you must configure the server.conf file and use either -a1 for Windows authentication or -a4 along with -jaas. The JAAS mode requires setup of an additional configuration file (jaas.conf).
When importing files, the origin of all imported bytes can be tracked back to their offset within the original binary source. This change lays the groundwork for exporting back to the original file after modifying the bytes. There are programmer API methods to get the bytes either from the memory block or from the underlying original source bytes. To see the original bytes, a memory block can be mapped onto the original file bytes. The source of each memory block within the memory map is shown in a new Byte Source column. When hovering on the bytes in the program listing, the origin of the bytes at that address is displayed.
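The idea of tracking a byte back to its file offset can be sketched as a simple lookup over memory blocks. The Block class, the block names, and all addresses and offsets below are hypothetical values chosen for illustration; they do not reflect Ghidra's API or any particular binary.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Block:
    name: str
    start: int                   # first address of the block
    length: int                  # number of bytes in the block
    file_offset: Optional[int]   # offset in the imported file, None if not file-backed


def file_offset_of(blocks, address):
    """Return the original file offset for address, or None if there is none."""
    for block in blocks:
        if block.start <= address < block.start + block.length:
            if block.file_offset is None:
                return None  # for example, uninitialized data
            return block.file_offset + (address - block.start)
    return None  # address is not in any block


BLOCKS = [
    Block(".text", 0x401000, 0x2000, 0x400),
    Block(".data", 0x404000, 0x1000, 0x2400),
    Block(".bss", 0x405000, 0x800, None),
]

print(hex(file_offset_of(BLOCKS, 0x401010)))
print(file_offset_of(BLOCKS, 0x405004))
```

With a mapping like this retained after import, a modified byte at a program address can be written back to the corresponding position in the original file.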
The decompiler now implements a more detailed analysis of local variables on the stack. This change resolves many problems with disappearing structure initialization and incorrect dead code removal.
The decompiler now generates fewer duplicate assignments. For example, when the same value is assigned to a variable in two branches, the assignment will now appear once, before either branch is taken.
In addition, the decompiler now recognizes more optimization patterns used by compilers for signed division, resulting in simplified decompilation.
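As an example of the kind of pattern involved, compilers commonly replace signed division by a constant with a multiplication by a "magic" number followed by a shift and a sign correction. The sketch below models the well-known sequence for dividing a signed 32-bit value by 3; it illustrates the general technique and is not a description of the specific patterns the decompiler matches.

```python
MAGIC_DIV3 = 0x55555556  # ceil(2**32 / 3)


def div3_optimized(n):
    """Divide a signed 32-bit integer by 3 without a divide instruction."""
    assert -2**31 <= n < 2**31
    high = (n * MAGIC_DIV3) >> 32  # signed high 32 bits of the 64-bit product
    sign = 1 if n < 0 else 0       # equivalent to a logical shift right by 31
    return high + sign


def div3_reference(n):
    """C-style signed division, which truncates toward zero."""
    q = abs(n) // 3
    return -q if n < 0 else q


for sample in (7, -7, 100, -1):
    print(sample, div3_optimized(sample), div3_reference(sample))
```

Recognizing such a sequence lets decompiled output show a plain division by 3 instead of a multiplication, a shift, and an addition.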
AARCH64-based binary decompilation will be cleaner due to better handling of zero extensions into larger registers. This improves data flow analysis and primarily affects functions using floating point Neon instructions.
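The effect of zero extension on data flow can be sketched by contrasting two ways of modeling a write to a narrow view of a wider register. On AArch64, a write to a 32-bit scalar view such as s0 clears the upper bits of the full 128-bit register, so the result does not depend on the previous contents. The function names and register values below are hypothetical, and the merging variant is shown only for contrast.

```python
def zero_extending_write(old_full, value, sub_bits):
    """Write the low sub_bits and clear every upper bit; old_full is ignored."""
    return value & ((1 << sub_bits) - 1)


def merging_write(old_full, value, sub_bits):
    """Write the low sub_bits and keep the old upper bits (shown for contrast)."""
    mask = (1 << sub_bits) - 1
    return (old_full & ~mask) | (value & mask)


old_v0 = 0x11112222333344445555666677778888  # previous 128-bit contents
new_v0 = zero_extending_write(old_v0, 0xAABBCCDD, 32)  # models a write to s0
merged = merging_write(old_v0, 0xAABBCCDD, 32)

print(hex(new_v0))
print(hex(merged))
```

Because the zero-extending form has no dependence on the old register value, an analysis that models it this way sees no false data flow from earlier writes.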
Renaming a parameter in the decompiler will no longer commit the datatypes of all parameters, allowing datatypes to continue to "float" without getting locked into a potentially incorrect initial datatype. In addition, the cumbersome warning dialog for renaming and retyping has been removed, improving your RE workflow.
There are many new processor specifications including SuperH4, MCS-96, HCS12X/XGATE, HCS08, and user-contributed specifications for MCS-48, SuperH1/2a, and TriCore.
The 16-bit x86 processor specification has been reworked to include protected mode addressing, which the NE loader now uses by default. Handling of segmented or paged memory has been updated to use a newer scheme, hiding its complications from decompilation results. The implementation handles the HCS12X paging scheme as well.
Many improvements and bug fixes have been made to existing processor specifications: ARM, AARCH64, PIC, 68K, MIPS, PPC, JVM, Sparc, AVR8, 8051, 6502, and others.
Numerous other bug fixes and improvements are fully listed in the ChangeHistory file.
In case you missed it, in March 2019, a public version of Ghidra was released for the first time. Soon after, the full buildable source was made available as an open source project on the NSA GitHub page. The response from the Ghidra Open Source community has been overwhelmingly positive. We welcome contributions from GitHub including bug fixes, requests, scripts, processor modules, and plug-ins.
Bug fixes and improvements for 9.0.x are listed in the Change History file.