Created
October 3, 2024 17:15
Final report for GSOC24 project: Sandboxed and Deterministic Proc Macro using Wasm
| # Sandboxed and Deterministic Proc Macro using Wasm | |
| The `gsoc24` branch is the final snapshot of the code at the end of the GSoC project [Sandboxed and Deterministic Proc Macro using Wasm](https://summerofcode.withgoogle.com/programs/2024/projects/kXG0mZoj). | |
| The initial goal for the project was to: | |
| Add experimental support to rustc for building and running procedural macros as WebAssembly. Procedural macro crates can opt in to being compiled to WebAssembly. The resulting wasm proc macro will be a sandboxed wasm blob. It will interact with the compiler only through a stream of tokens and will have no ability to interact with the outside world. | |
| The project was inspired by: [Build-time execution sandboxing](https://github.com/rust-lang/compiler-team/issues/475) and [Pre-RFC: Sandboxed, deterministic, reproducible, efficient Wasm compilation of proc macros](https://internals.rust-lang.org/t/pre-rfc-sandboxed-deterministic-reproducible-efficient-wasm-compilation-of-proc-macros/19359) | |
| I approached the project top-down: I started from the problem and slowly chipped away at it as required. | |
| **Fire first, aim later**. | |
| A few notable decisions were taken early on, based on discussions and advice from other community members: to focus only on | |
| `wasm32-unknown-unknown` as the wasm target and on Wasmtime as the runtime. | |
| Over the course of the project it was established that reaching the original goal would not be feasible within the | |
| time constraints of a medium project, and thus the scope was much reduced. | |
| ## Notable Content | |
| To explore the project and the work done, check the [gsoc24 branch diff](https://github.com/rust-lang/rust/compare/master...mav3ri3k:rust:gsoc24). | |
| A summary of the relevant folders/files follows. | |
| ```sh | |
| Rust | |
| │ | |
| ├─▶ Compile | |
| │ │ | |
| │ └▶ rustc_metadata: Used for loading wasm proc macro: | |
| │ │ `.wpm` file and loading metadata | |
| │ │ from a proc macro | |
| │ │ | |
| │ └▶ rustc_expand: Used for expanding proc macros | |
| │ | |
| ├─▶ Library | |
| │ │ | |
| │ └▶ proc_macro/src/bridge: Bridge module contains all | |
| │ │ the code related to moving | |
| │ │ TokenStream between compiler | |
| │ │ (server) and the proc macro (client) | |
| │ │ | |
| │ │ | |
| │ └▶ bridge/client.rs: wasm proc macro is also | |
| │ a client and thus this | |
| │ file contains related | |
| │ changes | |
| │ | |
| ├▶ use-wasm-proc-macro: Contains helper script for running | |
| │ wasm proc macro | |
| │ | |
| ├▶ wasm-proc-macro: Folder containing all client side code for wasm | |
| │ proc macro which compiles to wasm | |
| │ | |
| └▶ wasm-proc-macro-loader: Contains the main logic for loading and | |
| running wasm proc macro | |
| ``` | |
| ## Decisions | |
| Now I shall try to explain the project and decisions in detail: | |
| ### Loading wasm proc macro | |
| The lifecycle of any proc macro starts in `compiler/rustc_metadata`, | |
| where the crate for the relevant proc macro is registered, its metadata is decoded, etc. | |
| For wasm proc macros (wpm) we have chosen the extension `.wpm`. In reality, however, this is just the | |
| `.wasm` file generated when compiling a wasm proc macro for the target `wasm32-unknown-unknown`. This also | |
| means there is no actual metadata attached to the wasm proc macro file. Currently, to correctly | |
| register a wasm proc macro, a normal proc macro in `rust/pmacro1` is used to supply the metadata. | |
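Because a `.wpm` file is just a renamed `.wasm` module, a loader can only recognize it by its extension and by the standard WebAssembly binary preamble (the magic `\0asm` followed by version 1). The sketch below illustrates that check using only the standard library. The helper names `has_wpm_extension` and `looks_like_wasm_module` are hypothetical and are not taken from the gsoc24 branch; the actual loading code in `rustc_metadata` may work differently.

```rust
use std::path::Path;

/// The 8-byte preamble of a WebAssembly binary module:
/// the magic `\0asm` followed by version 1 as a little-endian u32.
const WASM_PREAMBLE: [u8; 8] = [0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00];

/// Hypothetical helper: true if the path uses the `.wpm` extension.
pub fn has_wpm_extension(path: &Path) -> bool {
    path.extension().and_then(|ext| ext.to_str()) == Some("wpm")
}

/// Hypothetical helper: true if `bytes` begins with the wasm preamble.
/// This says nothing about proc macro metadata, which a plain wasm
/// module does not carry.
pub fn looks_like_wasm_module(bytes: &[u8]) -> bool {
    bytes.starts_with(&WASM_PREAMBLE)
}
```

A check like this only confirms that the file is a wasm module. It cannot recover the proc macro names or kinds, which is why a proxy proc macro is currently needed for the metadata.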
| ### Calling wasm proc macro | |
| Once registered, the next step is expanding the wpm. This is handled in `compiler/rustc_expand`. | |
| Originally there are 3 types of proc macros: | |
| ```rust | |
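| // Function-like macros, e.g. `my_macro!(...)` | |
| pub struct BangProcMacro { /* fields elided */ } | |
| // Attribute macros, e.g. `#[my_attr]` | |
| pub struct AttrProcMacro { /* fields elided */ } | |
| // Derive macros, e.g. `#[derive(MyTrait)]` | |
| pub struct DeriveProcMacro { /* fields elided */ } | |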
| ``` | |
| We have introduced another type: `pub struct WasmBangProcMacro`. Instead of reusing the original types, | |
| we felt that the changes needed for wasm proc macros were large enough that significant parts of the original | |
| code path could not be used, and thus a new type was introduced. Currently every experiment is done with this one | |
| new type, because extending the code for wpm from function-like macros to attribute and derive macros | |
| is trivial and can be handled in the future. The code for spawning the server and for converting | |
| from the compiler's definition of a token stream to the proc macro's definition of a token stream also lives here. | |
| Initially there were attempts to keep all the wasm runtime related code consolidated here as well, in a slightly | |
| similar fashion to mbe. However, that did not quite work, and my mentor then scaffolded a setup to load | |
| the code for the wasmtime runtime as a shared object. Thus this possibility was never explored further. | |
| From my experience now, however, I feel that API-wise having the wasm runtime related code here is possible | |
| and in fact preferable, since it would allow easier use of various wasm related dependencies and | |
| avoid the hacky method of loading the wasmtime crate as a shared object. | |
| ### Logic for wasm proc macro | |
| The rest of the proc macro related logic lives in `library/proc_macro`. More specifically, we are only really | |
| interested in the code for the bridge module, which handles communication between the compiler (server) | |
| and the proc macro (client). | |
| The core idea behind the current iteration of wpm is as follows: | |
| The goal is to allow passing token streams between the compiler and the wpm. It is generally not desirable | |
| to pass Rust types directly between wasm and Rust due to FFI limitations, so the general practice is | |
| to pass them around in a serialized format. However, serialization is not straightforward due | |
| to complexities with spans. The first experiment was to build a new custom encoding format similar to | |
| the one used by [watt](https://github.com/dtolnay/watt/tree/master). However, there is already so | |
| much code in place for almost exactly the same function that efforts were made to reuse the current | |
| serialization mechanisms, which overcome the span related complexities/drawbacks through an RPC | |
| mechanism. | |
| As a shortcut, however, the buffer used for this serialization in `bridge::buffer` was made | |
| public. Over several iterations I have tried to reduce the amount of internal code I have made public. | |
| These are just shortcuts I took; logic-wise they can be cleaned up in later iterations. | |
| The core logic for expanding wpm lives in `proc_macro::bridge::client`. The core logic | |
| related to interacting with the wasm file lives in `rust/wasm-proc-macro-loader`. | |
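To make the idea of passing tokens across the wasm boundary in serialized form more concrete, here is a minimal sketch of a length-prefixed byte encoding for a toy token list. Everything in it (`Token`, `encode`, `decode`, the tag values) is a hypothetical illustration: it is not the format used by `bridge::buffer` or by watt, and it deliberately ignores spans, which are the part that makes the real problem hard.

```rust
/// Toy token type, far simpler than the real proc_macro token trees.
#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    Ident(String),
    Punct(char),
    Literal(String),
}

/// Encode as: u32 token count, then per token a 1-byte tag,
/// a u32 byte length and the UTF-8 text (all integers little-endian).
pub fn encode(tokens: &[Token]) -> Vec<u8> {
    let mut out = Vec::new();
    out.extend_from_slice(&(tokens.len() as u32).to_le_bytes());
    for token in tokens {
        let (tag, text) = match token {
            Token::Ident(s) => (0u8, s.clone()),
            Token::Punct(c) => (1u8, c.to_string()),
            Token::Literal(s) => (2u8, s.clone()),
        };
        out.push(tag);
        out.extend_from_slice(&(text.len() as u32).to_le_bytes());
        out.extend_from_slice(text.as_bytes());
    }
    out
}

/// Decode the format produced by `encode`; returns None on malformed input.
pub fn decode(bytes: &[u8]) -> Option<Vec<Token>> {
    let mut pos = 0usize;
    let count = read_u32(bytes, &mut pos)? as usize;
    let mut tokens = Vec::new();
    for _ in 0..count {
        let tag = *bytes.get(pos)?;
        pos += 1;
        let len = read_u32(bytes, &mut pos)? as usize;
        let end = pos.checked_add(len)?;
        let text = std::str::from_utf8(bytes.get(pos..end)?).ok()?;
        pos = end;
        tokens.push(match tag {
            0 => Token::Ident(text.to_string()),
            1 => Token::Punct(text.chars().next()?),
            2 => Token::Literal(text.to_string()),
            _ => return None,
        });
    }
    Some(tokens)
}

/// Read a little-endian u32 at `*pos` and advance the cursor.
fn read_u32(bytes: &[u8], pos: &mut usize) -> Option<u32> {
    let end = pos.checked_add(4)?;
    let mut chunk = [0u8; 4];
    chunk.copy_from_slice(bytes.get(*pos..end)?);
    *pos = end;
    Some(u32::from_le_bytes(chunk))
}
```

A flat byte buffer like this is easy to copy into and out of wasm linear memory. What it cannot express is a span, which is only meaningful inside the compiler; the existing bridge sidesteps that by keeping such data on the server side and reaching it over RPC.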
| ## Hacks | |
| Since the project was approached in a goal-first, top-down fashion, a lot of hacks were used along the way. | |
| ### 1. Loading wasmtime | |
| The `proc_macro` library cannot have dependencies. This is due to how the code for | |
| proc_macro is structured in the compiler, and as of now there is no way to circumvent it. | |
| This was overcome with the help of my mentor, who wrote the code for loading the wasmtime crate as a shared object. | |
| The code itself is very hacky and relies on leaking some values for it to work correctly. | |
| ### 2. Reading metadata | |
| Currently `.wpm` is chosen as the extension for a wasm procedural macro file. However, it is a plain wasm | |
| file and does not contain any metadata, so a proxy proc macro named `pmacro1` is used for | |
| loading metadata. | |
| ### 3. Passing TokenStream | |
| It is best to pass complex types to and from a wasm module only in serialized form. | |
| In the current implementation of proc_macro, TokenStreams have a set serialization format | |
| and are passed using a private `Buffer`. For an easier / faster implementation this buffer was made public | |
| and used directly for passing TokenStreams. | |
| ### 4. Hardcoded values | |
| Currently most values like function names for proc_macro are hardcoded. | |
| ## Current State | |
| The code snapshot in its current state is still not complete. When running the code | |
| following the steps in `wasm-proc-macros-draft.md`, the code compiles but errors out during | |
| the final steps of running the wasm proc macro. This is because some of the RPC code between | |
| client and server depends on the thread API in Rust, which is not available for `wasm32-unknown-unknown`. | |
| Parts of this code could not be reworked before the end of the GSoC time period. | |
| ## Future Work | |
| If you would like to start from the code in its current state, you are advised to go through the | |
| `with_api` macro defined in `proc_macro::bridge` and all the places where it is used. | |
| This should give you an overall idea of the bridge API. | |
| ### Rework bridge | |
| The current bridge depends on some thread API. As of right now, this is the final hurdle | |
| before we can have a scaffolded working demo for wpm. | |
| We need to either rework the bridge API so that it does not depend on threads, or provide an alternate set | |
| of APIs which is only used inside the wasm proc macro. From my assessment, for the logic | |
| where thread locals are used, it was hard to reason about not using them. The best option | |
| seems to be providing an alternate set of APIs. | |
| In my assessment this can add up to enough work for at least a GSoC small project. | |
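As a rough illustration of what an alternate API could look like, the sketch below contrasts two styles: state installed implicitly in a thread local, and state carried by an explicit value that the caller passes around. All names (`BridgeState`, `install_thread_local`, `send_via_thread_local`, `ExplicitBridge`) are hypothetical and do not exist in `proc_macro::bridge`; whether the real bridge could be restructured this way is an open assumption, not something established by this project.

```rust
use std::cell::RefCell;

/// Hypothetical stand-in for the per-expansion bridge state.
#[derive(Default)]
pub struct BridgeState {
    pub requests_sent: u32,
}

thread_local! {
    // Style A: the state is installed implicitly for the current thread.
    static CURRENT: RefCell<Option<BridgeState>> = RefCell::new(None);
}

/// Install fresh state for the current thread.
pub fn install_thread_local() {
    CURRENT.with(|slot| *slot.borrow_mut() = Some(BridgeState::default()));
}

/// Pretend to send a request; returns None if no state was installed.
pub fn send_via_thread_local() -> Option<u32> {
    CURRENT.with(|slot| {
        let mut guard = slot.borrow_mut();
        let state = guard.as_mut()?;
        state.requests_sent += 1;
        Some(state.requests_sent)
    })
}

/// Style B: the caller owns the state and passes it explicitly,
/// so no thread-related API is involved.
pub struct ExplicitBridge {
    state: BridgeState,
}

impl ExplicitBridge {
    pub fn new() -> Self {
        ExplicitBridge { state: BridgeState::default() }
    }

    /// Pretend to send a request and return the running count.
    pub fn send(&mut self) -> u32 {
        self.state.requests_sent += 1;
        self.state.requests_sent
    }
}
```

The explicit style makes the dependency visible in every signature, which is exactly what makes it hard to retrofit onto code that assumes ambient state. That is one reason a separate, wasm-only API may be more practical than rewriting the existing bridge.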
| ### Build Target and Metadata | |
| Add proper support for a `wasm32-wpm` target with proper metadata support. This depends | |
| on the API for wpm, but it is relatively easy because, compared to other crates, proc macros | |
| do not have much that needs to be added to the header, and for the most part using the existing header/rlib | |
| with a few minor tweaks during encoding would be sufficient. | |
| ### Better handle wasm runtime | |
| From the work done, I think we should be able to load wasmtime as a dependency in rustc_expand | |
| and retain most of the functionality seen currently. This would avoid the hacky nature | |
| of the current method. We can also look into having the compiler accept a path to a wasm runtime, | |
| rather than using only one runtime, i.e. wasmtime, as at the moment. | |
| ## Problems I faced | |
| I thought my skill set was not being sufficiently challenged previously, so I took this | |
| project as a real challenge. When I took the project I was informed that it would likely | |
| be a moonshot. I still took it because a moonshot is exciting to my little mind. | |
| However, in hindsight this turned out to be a lot harder than I anticipated. Proc macros cannot have any | |
| dependencies, so all the code here is hand-written low-level code, lower than I have ever been. | |
| Not having any dependencies also means I cannot just use my favourite crates for various jobs. | |
| This was a major slip in my pre-project assessment, which later caused me to reduce the goals | |
| and extend the overall time period, which ended up overlapping with college too. | |
| This overlap has really pushed me in terms of managing my time, prioritizing | |
| things and maintaining a rigorous schedule. | |
| ## Learnings | |
| Like I said, it was a moonshot, a challenge I willingly took, and I am very happy I did. I actually | |
| went through a somewhat legacy codebase, dug through it and am currently in a place where I am pretty | |
| confident when I make changes to it. When a panic or error happens, I generally know what caused it, | |
| and the core dump traces have become surprisingly readable. | |
| I also became more familiar with various debugging/tracing tools like flamegraph, objdump and | |
| ptrace, which I had to use a lot during the initial phase to understand | |
| how proc macros work. | |
| In a personal capacity I also properly studied Category Theory for Programmers, | |
| which demystified the functional programming paradigm. | |
| Understanding the functional paradigm was also one of the reasons, among others, that I wanted to take | |
| up a project in Rust. Rust is also a low-level language with modern | |
| tooling, which allows for a healthier step into low-level code. I wanted to say easier, | |
| but Rust is definitely not easy. It feels that way, however, because while other languages | |
| like C allow you to do anything, they also make it trivially easy to pick up and accumulate the wrong | |
| habits, and it is generally hard to find out what the good habits are. | |
| In that regard I feel that the habits I have learned using Rust and category theory have also | |
| found their way into other code I write in other capacities like college, where | |
| I have a better understanding of how I am structuring my code and thinking about the problem | |
| and the logic. | |
| Another wonderful thing I experienced was the incredible people. Rust is still not an | |
| absolutely mainstream language, but it is rapidly growing. The people I met and saw in the Rust | |
| Zulip chat were some of the most amazing, inspiring people I have had the privilege of being | |
| with. They are extremely technical people and were really inspiring to be around. | |
| ## Final Remarks | |
| While I had overestimated the original goal for the project, which had to be reduced later on, | |
| I have found the experience to be very endearing. I wanted a challenge and it provided me | |
| with a real one. I met various wonderful, inspirational people and made some new friends. | |
| It has allowed me to really move beyond my previous technical prowess and reach | |
| greater heights. I am extremely happy and grateful that I was given a chance to be part of this | |
| experience. |