# Rust Codegen

The first phase in debug info generation requires Rust to inspect the MIR of the program and
communicate it to LLVM. This is primarily done in [`rustc_codegen_llvm/debuginfo`][llvm_di], though
some type-name processing exists in [`rustc_codegen_ssa/debuginfo`][ssa_di]. Rust communicates with
LLVM via the `DIBuilder` API, a thin wrapper around LLVM's internals that lives in
[`rustc_llvm`][rustc_llvm].

[llvm_di]: https://github.com/rust-lang/rust/tree/main/compiler/rustc_codegen_llvm/src/debuginfo
[ssa_di]: https://github.com/rust-lang/rust/tree/main/compiler/rustc_codegen_ssa/src/debuginfo
[rustc_llvm]: https://github.com/rust-lang/rust/tree/main/compiler/rustc_llvm

# Type Information

Type information typically consists of the type name, size, and alignment, as well as things like
fields, generic parameters, and storage modifiers where relevant. Much of this work happens in
[`rustc_codegen_llvm/src/debuginfo/metadata`][di_metadata].

[di_metadata]: https://github.com/rust-lang/rust/blob/main/compiler/rustc_codegen_llvm/src/debuginfo/metadata.rs

It is important to keep in mind that the goal is not necessarily to "represent types exactly as they
appear in Rust"; rather, it is to represent them in a way that allows debuggers to reconstruct the
data as accurately as possible during debugging. This distinction is vital to understanding the core
work that occurs at this layer: many changes made here exist to work around debugger limitations
when no other option will work.

## Quirks

Rust's generated DI nodes "pretend" to be C/C++ for the sake of both CDB and LLDB. This can result
in some unintuitive and non-idiomatic debug info.

### Pointers and References

Wide pointers, wide references, and `Box`es of unsized types (e.g. `&[T]`, `&str`, `Box<[T]>`) are
treated as a struct with 2 fields: `data_ptr` and `length`.

All non-wide pointers, references, and `Box` pointers are output as pointer nodes, and no
distinction is made between `mut` and non-`mut`.
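The difference between the two layouts can be illustrated with a short Rust sketch. `WidePtrView` and `wide_ptr_view` are hypothetical names used only for illustration (they are not rustc types). The sketch assumes a typical target where a wide pointer is exactly two machine words, and it decomposes a slice reference into the two values a debugger sees as `data_ptr` and `length`.

```rust
use std::mem;

/// Hypothetical mirror of the two-field struct that debug info describes for a
/// slice reference such as `&[T]`. Illustration only: this is not a rustc type,
/// and Rust does not guarantee the in-memory field order of a real `&[T]`.
#[derive(Debug)]
struct WidePtrView<T> {
    data_ptr: *const T,
    length: usize,
}

/// Splits a slice reference into the values shown as `data_ptr` and `length`.
fn wide_ptr_view<T>(slice: &[T]) -> WidePtrView<T> {
    WidePtrView {
        data_ptr: slice.as_ptr(),
        length: slice.len(),
    }
}

fn main() {
    let data = [10u32, 20, 30];
    let view = wide_ptr_view(&data[..]);
    assert_eq!(view.data_ptr, data.as_ptr());
    assert_eq!(view.length, 3);
    println!("{view:?}");

    // A thin pointer is one machine word; a wide pointer (slice reference or
    // boxed slice) is two on common targets.
    assert_eq!(mem::size_of::<&u32>(), mem::size_of::<usize>());
    assert_eq!(mem::size_of::<&[u32]>(), 2 * mem::size_of::<usize>());
    assert_eq!(mem::size_of::<Box<[u32]>>(), 2 * mem::size_of::<usize>());

    // `&T` and `&mut T` have the same size; mutability exists only in the type.
    assert_eq!(mem::size_of::<&u32>(), mem::size_of::<&mut u32>());
}
```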
Several attempts have been made to distinguish `mut` from non-`mut` pointers, but unfortunately
there is no straightforward solution. Using the `reference` DI nodes of the respective formats has
pitfalls, because there is an irreconcilable semantic difference between C++ references and Rust
references.

> From [cppreference](https://en.cppreference.com/w/cpp/language/reference.html):
>
> References are not objects; **they do not necessarily occupy storage**, although the compiler may
> allocate storage if it is necessary to implement the desired semantics (e.g. a non-static data
> member of reference type usually increases the size of the class by the amount necessary to store
> a memory address).
>
> Because references are not objects, **there are no arrays of references, no pointers to
> references, and no references to references**.

The currently proposed solution is to simply [typedef the pointer nodes][issue_144394].

[issue_144394]: https://github.com/rust-lang/rust/pull/144394

Using the `const` qualifier to denote non-`mut` poses potential issues due to LLDB's internal
optimizations. In short, LLDB attempts to cache the child values of variables (e.g. struct fields,
array elements) when stepping through code. A heuristic determines which values are safe to cache,
and `const` is part of that heuristic. It has not yet been researched how this would interact with
Rust's interior mutability constructs (`Cell`, `RefCell`, etc.).

### DWARF vs PDB

While most of the type information is fairly straightforward, one notable issue is the debug info
format of the target. Each format has different semantics and limitations, so they require slightly
different debug info in some cases.
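As a rough illustration of how one Rust type can receive different debug info depending on the format, the hypothetical `slice_ref_name` function below names a slice reference either with the Rust spelling (DWARF) or with the MSVC-style spelling used for PDB. The `cpp_like` parameter is an assumption standing in for the target check; this is a sketch, not rustc's actual name-generation code.

```rust
/// Hypothetical sketch: choose the debug info name for `&[T]` / `&mut [T]`
/// depending on whether the target uses C++-like (PDB) debug info.
/// `elem` is the already-rendered name of the element type `T`.
fn slice_ref_name(elem: &str, mutable: bool, cpp_like: bool) -> String {
    if cpp_like {
        // MSVC-style name; consecutive `>`s are separated by a space.
        let wrapper = if mutable { "ref_mut$" } else { "ref$" };
        format!("{wrapper}<slice$<{elem}> >")
    } else {
        // DWARF targets keep the Rust spelling.
        let prefix = if mutable { "&mut " } else { "&" };
        format!("{prefix}[{elem}]")
    }
}

fn main() {
    assert_eq!(slice_ref_name("Point", false, false), "&[Point]");
    assert_eq!(slice_ref_name("Point", true, true), "ref_mut$<slice$<Point> >");
    for cpp_like in [false, true] {
        for mutable in [false, true] {
            println!("{}", slice_ref_name("Point", mutable, cpp_like));
        }
    }
}
```

Taking the element name as an already-rendered string keeps the sketch focused on the format-dependent wrapper rather than on recursive type-name generation.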
In rustc, format-specific behavior is gated by calls to [`cpp_like_debuginfo`][cpp_like].

[cpp_like]: https://github.com/rust-lang/rust/blob/main/compiler/rustc_codegen_ssa/src/debuginfo/type_names.rs#L813

### Naming

Rust attempts to communicate type names as accurately as possible, but debuggers and debug info
formats do not always respect them.

Due to limitations in MSVC's expression parser, the following name transformations are made for PDB
debug info:

| Rust name | MSVC name |
| --- | --- |
| `&str`/`&mut str` | `ref$<str$>`/`ref_mut$<str$>` |
| `&[T]`/`&mut [T]` | `ref$<slice$<T> >`/`ref_mut$<slice$<T> >`[^1] |
| `[T; N]` | `array$<T, N>` |
| `RustEnum` | `enum2$<RustEnum>` |
| `(T1, T2)` | `tuple$<T1, T2>` |
| `*const T` | `ptr_const$<T>` |
| `*mut T` | `ptr_mut$<T>` |
| `usize` | `size_t`[^2] |
| `isize` | `ptrdiff_t`[^2] |
| `uN` | `unsigned __intN`[^2] |
| `iN` | `__intN`[^2] |
| `f32` | `float`[^2] |
| `f64` | `double`[^2] |
| `f128` | `fp128`[^2] |

[^1]: MSVC's expression parser treats `>>` as a right shift, so consecutive `>`s in type names must
be separated by a space (`> >`).

[^2]: These type names are generated as part of the debug info node (which is then wrapped in a
typedef node carrying the Rust name), but once the LLVM IR node is converted to a CodeView node, the
type name information is lost. This is because CodeView has special shorthand nodes for primitive
types, and those shorthand nodes do not have a "name" field.

### Generics

Rust outputs generic *type* information (`T` in `ArrayVec<T, const N: usize>`), but not generic
*value* information (`N` in `ArrayVec<T, const N: usize>`).

CodeView does not have a leaf node for generics/C++ templates, so all generic information is lost
when generating PDB debug info. There are workarounds that allow the debugger to retrieve the
generic arguments from the type name, but this is a fragile solution at best.
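To illustrate why recovering generic arguments from a type name is fragile, here is a minimal sketch of the kind of parsing such a workaround might perform. `split_generic_args` is a hypothetical helper, not part of any debugger or of rustc. It assumes generic arguments appear in the name as `Base<Arg1, Arg2, ...>` and tracks `<`/`>` nesting so that nested arguments and MSVC-style `> >` spacing are handled.

```rust
/// Hypothetical helper: split `Base<Arg1, Arg2, ...>` into the base name and
/// its top-level generic arguments by tracking `<`/`>` nesting depth.
/// Returns `None` if the name has no generic arguments or the brackets do not
/// balance.
fn split_generic_args(type_name: &str) -> Option<(&str, Vec<&str>)> {
    let open = type_name.find('<')?;
    let close = type_name.rfind('>')?;
    if close < open {
        return None;
    }
    let base = &type_name[..open];
    let inner = &type_name[open + 1..close];

    let mut args = Vec::new();
    let mut depth = 0usize;
    let mut start = 0;
    for (i, c) in inner.char_indices() {
        match c {
            '<' => depth += 1,
            // A `>` with no matching `<` means the name cannot be parsed.
            '>' => depth = depth.checked_sub(1)?,
            // Only commas at the top level separate arguments.
            ',' if depth == 0 => {
                args.push(inner[start..i].trim());
                start = i + 1;
            }
            _ => {}
        }
    }
    if depth != 0 {
        return None;
    }
    let last = inner[start..].trim();
    if !last.is_empty() {
        args.push(last);
    }
    Some((base, args))
}

fn main() {
    // The const argument `8` is recovered purely from the text of the name.
    assert_eq!(
        split_generic_args("ArrayVec<u32, 8>"),
        Some(("ArrayVec", vec!["u32", "8"]))
    );
    // Nested generics with MSVC-style `> >` spacing.
    assert_eq!(
        split_generic_args("ref$<slice$<Point> >"),
        Some(("ref$", vec!["slice$<Point>"]))
    );
    // Fragile: the `>` in `->` is mistaken for a closing bracket.
    assert_eq!(split_generic_args("Foo<fn(u8) -> u8>"), None);
}
```

The last case in `main` shows the weakness of a purely textual approach: the `>` in `->` is not a bracket, but the scanner cannot tell the difference and rejects the name.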
Efforts are being made to contact Microsoft about adding generics support to CodeView, and/or to
repurpose one of the unused CodeView node types as a suitable equivalent.

### Type aliases

Rust outputs typedef nodes in several cases to help account for debugger limitations, but it does
not currently output nodes for [type aliases in the source code][type_aliases].

[type_aliases]: https://doc.rust-lang.org/reference/items/type-aliases.html

### Enums

Enum DI nodes are generated in [`rustc_codegen_llvm/src/debuginfo/metadata/enums`][di_metadata_enums].

[di_metadata_enums]: https://github.com/rust-lang/rust/tree/main/compiler/rustc_codegen_llvm/src/debuginfo/metadata/enums

#### DWARF

DWARF has dedicated nodes for discriminated unions. A `DW_TAG_variant_part` node is a container
that holds `DW_TAG_variant` nodes, each of which may or may not carry a discriminant value (a
variant without one is the default variant). The variant part's `DW_AT_discr` attribute references
the member that holds the discriminant. The hierarchy looks as follows:

```txt
DW_TAG_structure_type        (top-level type for the enum)
  DW_TAG_variant_part        (variant part)
    DW_AT_discr              (reference to discriminant DW_TAG_member)
    DW_TAG_member            (discriminant member)
    DW_TAG_variant           (variant 1)
    DW_TAG_variant           (variant 2)
    DW_TAG_variant           (variant 3)
  DW_TAG_structure_type      (type of variant 1)
  DW_TAG_structure_type      (type of variant 2)
  DW_TAG_structure_type      (type of variant 3)
```

#### PDB

PDB does not have a dedicated node, so the C equivalent of a discriminated union is generated
instead:

```c
union enum2$<RUST_ENUM_NAME> {
  enum VariantNames {
    First,
    Second
  };
  struct Variant0 {
    struct First {
      // fields
    };
    static const enum2$<RUST_ENUM_NAME>::VariantNames NAME;
    static const unsigned long DISCR_EXACT;
    enum2$<RUST_ENUM_NAME>::Variant0::First value;
  };
  struct Variant1 {
    struct Second {
      // fields
    };
    static enum2$<RUST_ENUM_NAME>::VariantNames NAME;
    static unsigned long DISCR_EXACT;
    enum2$<RUST_ENUM_NAME>::Variant1::Second value;
  };
  enum2$<RUST_ENUM_NAME>::Variant0 variant0;
  enum2$<RUST_ENUM_NAME>::Variant1 variant1;
  unsigned long tag;
};
```

Due to limitations in LLDB, the generated `DISCR_*` value is always a `u64`, even if the enum is not
`#[repr(u64)]`. This is largely a non-issue for LLDB, because the `DISCR_*` value and the `tag` are
read into `uint64_t` values regardless of their declared type.

# Source Information

TODO