@mav3ri3k
Created October 3, 2024 17:15
Final report for GSOC24 project: Sandboxed and Deterministic Proc Macro using Wasm
# Sandboxed and Deterministic Proc Macro using Wasm
The `gsoc24` branch is the final snapshot of the code at the end of the GSoC project: [Sandboxed and Deterministic Proc Macro using Wasm](https://summerofcode.withgoogle.com/programs/2024/projects/kXG0mZoj)
The initial goal for the project was to:
Add experimental support to rustc for building and running procedural macros as WebAssembly. Procedural macro crates can opt in to being compiled to WebAssembly. This wasm-proc-macro will be a wasm blob sandboxed using Wasm. It will interact with the compiler only through a stream of tokens and will have no ability to interact with the outside world.
The project was inspired by: [Build-time execution sandboxing](https://github.com/rust-lang/compiler-team/issues/475) and [Pre-RFC: Sandboxed, deterministic, reproducible, efficient Wasm compilation of proc macros](https://internals.rust-lang.org/t/pre-rfc-sandboxed-deterministic-reproducible-efficient-wasm-compilation-of-proc-macros/19359)
I approached the project top down: I started with the overall problem and chipped away at it as required.
**Fire first, aim later**.
A few notable decisions were made early on, based on discussions with and advice from other community members: to focus only on
`wasm32-unknown-unknown` as the wasm target and to use Wasmtime as the runtime.
Over the course of the project it was established that reaching the original goal would not be feasible within the
time constraints of a medium project, and so the scope was much reduced.
## Notable Content
To explore the project and the work done, it is best to check the [gsoc24 branch diff](https://github.com/rust-lang/rust/compare/master...mav3ri3k:rust:gsoc24).
A summary of the relevant folders and files follows.
```sh
Rust
├─▶ Compile
│ │
│ └▶ rustc_metadata: Used for loading wasm proc macro:
│ │ `.wpm` file and loading metadata
│ │ from a proc macro
│ │
│ └▶ rustc_expand: Used for expanding proc macros
├─▶ Library
│ │
│ └▶ proc_macro/src/bridge: Bridge module contains all
│ │ the code related to moving
│ │ TokenStream between compiler
│ │ (server) and the proc macro (client)
│ │
│ │
│ └▶ bridge/client.rs: wasm proc macro is also
│ a client and thus this
│ file contains related
│ changes
├▶ use-wasm-proc-macro: Contains helper script for running
│ wasm proc macro
├▶ wasm-proc-macro: Folder containing all client side code for wasm
│ proc macro, which compiles to wasm
└▶ wasm-proc-macro-loader: Contains the main logic for loading and
running wasm proc macro
```
## Decisions
Now I shall try to explain the project and decisions in detail:
### Loading wasm proc macro
The lifecycle of any proc macro starts in `compiler/rustc_metadata`,
where the crate for the relevant proc macro is registered, its metadata is decoded, and so on.
For wasm proc macros (wpm) we have chosen the extension `.wpm`. In reality, however, this is just the
`.wasm` file generated when compiling the wasm proc macro for the target `wasm32-unknown-unknown`. This also
means there is no actual metadata attached to the wasm proc macro file. Currently, to correctly
register a wpm, a normal proc macro in `rust/pmacro1` is used to supply the metadata.
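
Since a `.wpm` file is described above as a renamed `.wasm` file, a loader could sanity-check a candidate file before trying to register it. The following is only a minimal sketch under that assumption: the helper names `has_wpm_extension` and `looks_like_wasm` are hypothetical and are not taken from the `gsoc24` branch. The only external fact it relies on is that every WebAssembly binary module begins with the bytes `\0asm` followed by version 1.

```rust
use std::path::Path;

/// The 8-byte preamble of a WebAssembly binary module:
/// the magic `\0asm` followed by version 1 in little-endian.
const WASM_PREAMBLE: [u8; 8] = [0x00, 0x61, 0x73, 0x6D, 0x01, 0x00, 0x00, 0x00];

/// Hypothetical helper: does this path use the `.wpm` extension?
pub fn has_wpm_extension(path: &Path) -> bool {
    path.extension().and_then(|ext| ext.to_str()) == Some("wpm")
}

/// Hypothetical helper: do these bytes start like a wasm binary module?
pub fn looks_like_wasm(bytes: &[u8]) -> bool {
    bytes.starts_with(&WASM_PREAMBLE)
}
```

A check like this says nothing about metadata; it only confirms that the file is plausibly a wasm module, which is why the proxy proc macro is still needed.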
### Calling wasm proc macro
Once the wpm is registered, the next step is expanding it. This is handled in `compiler/rustc_expand`.
Originally there are 3 types of proc macros:
```rust
pub struct BangProcMacro
pub struct AttrProcMacro
pub struct DeriveProcMacro
```
We have introduced another type: `pub struct WasmBangProcMacro`. Instead of reusing the original types,
we felt that the changes needed for wasm proc macros were large enough that significant parts of the original
code path could not be used, and thus a new type was introduced. Currently every experiment is done with this one
new type, because extending the code for wpm from function-like macros to attribute and derive macros
should be straightforward and can be handled in the future. The code for spawning the server and for converting
from the compiler's definition of a token stream to the proc macro's definition of a token stream also lives here.
Initially there were attempts to keep all the wasm runtime related code consolidated here as well, in a fashion slightly
similar to mbe. However, that did not quite work, and my mentor then scaffolded a setup to load
the wasmtime runtime as a shared object. Thus this possibility was never explored further.
However, from my experience now, I feel that API-wise, having the wasm runtime related code here is possible
and in fact preferable, since it would allow for easier use of various wasm related dependencies and
forgo the hacky method of loading the wasmtime crate as a shared object.
### Logic for wasm proc macro
The rest of the proc macro related logic lives in `library/proc_macro`. More specifically, we are only really
interested in the code for the bridge module, which handles communication between the compiler (server)
and the proc macro (client).
The core idea behind the current iteration of wpm is as follows:
The goal is to allow passing token streams between the compiler and the wpm. It is generally not desirable
to pass Rust types directly between wasm and Rust due to FFI limitations, so the general practice is
to pass them around in a serialized format. However, serialization is not straightforward because of
complexities with spans. The first experiment was to build a new custom encoding format similar to the
one used by [watt](https://github.com/dtolnay/watt/tree/master). However, there is already so
much code in place for almost exactly the same function that efforts were made to reuse the current
serialization mechanisms, which overcome the span related complexities/drawbacks through an RPC
mechanism.
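
To illustrate why an RPC mechanism sidesteps the span problem, here is a simplified model, not the actual bridge code. The real span data stays on the server side, and the client only ever holds a small integer handle that is trivial to serialize. All names here (`Span`, `SpanHandle`, `SpanStore`) are stand-ins invented for this sketch.

```rust
use std::collections::HashMap;

/// Stand-in for a compiler span; the real type is far richer.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Span {
    pub lo: u32,
    pub hi: u32,
}

/// Opaque handle the client holds instead of a real `Span`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct SpanHandle(u32);

impl SpanHandle {
    /// A handle crosses the boundary as four plain bytes.
    pub fn to_bytes(self) -> [u8; 4] {
        self.0.to_le_bytes()
    }

    pub fn from_bytes(bytes: [u8; 4]) -> Self {
        SpanHandle(u32::from_le_bytes(bytes))
    }
}

/// Server-side table that owns the real spans.
#[derive(Default)]
pub struct SpanStore {
    next: u32,
    spans: HashMap<u32, Span>,
}

impl SpanStore {
    /// Store a span and hand out a handle for it.
    pub fn alloc(&mut self, span: Span) -> SpanHandle {
        let id = self.next;
        self.next += 1;
        self.spans.insert(id, span);
        SpanHandle(id)
    }

    /// Resolve a handle back to its span, if it is known.
    pub fn get(&self, handle: SpanHandle) -> Option<Span> {
        self.spans.get(&handle.0).copied()
    }

    /// Example "remote call": the client asks the server to join two
    /// spans and receives a fresh handle for the result.
    pub fn join(&mut self, a: SpanHandle, b: SpanHandle) -> Option<SpanHandle> {
        let (a, b) = (self.get(a)?, self.get(b)?);
        Some(self.alloc(Span { lo: a.lo.min(b.lo), hi: a.hi.max(b.hi) }))
    }
}
```

In this model the client never needs to understand what a span contains; every operation on a span becomes a request to the server, which is the property that makes reusing the existing mechanism attractive.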
However, as a shortcut, the buffer used for this serialization in `bridge::buffer` was made
public. Over several iterations I have tried to reduce the amount of internal code I have made public.
These are just shortcuts I took, though, and logic-wise they can be cleaned up in later iterations.
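
As a rough picture of what serializing tokens into a byte buffer involves, the sketch below encodes a heavily simplified token list with tags and length prefixes. It is a toy format invented for illustration and is neither the bridge's real wire format nor watt's. The `Token` enum and the `encode`/`decode` functions are hypothetical.

```rust
/// Heavily simplified token; real tokens also carry spans, spacing
/// and delimiters.
#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    Ident(String),
    Punct(char),
    Literal(String),
}

/// Append a length-prefixed UTF-8 string to the buffer.
fn write_str(s: &str, buf: &mut Vec<u8>) {
    buf.extend_from_slice(&(s.len() as u32).to_le_bytes());
    buf.extend_from_slice(s.as_bytes());
}

/// Encode as: token count, then one tag byte plus payload per token.
pub fn encode(tokens: &[Token], buf: &mut Vec<u8>) {
    buf.extend_from_slice(&(tokens.len() as u32).to_le_bytes());
    for token in tokens {
        match token {
            Token::Ident(s) => {
                buf.push(0);
                write_str(s, buf);
            }
            Token::Punct(c) => {
                buf.push(1);
                buf.extend_from_slice(&(*c as u32).to_le_bytes());
            }
            Token::Literal(s) => {
                buf.push(2);
                write_str(s, buf);
            }
        }
    }
}

/// Read a little-endian u32 and advance the input slice.
fn read_u32(bytes: &mut &[u8]) -> Option<u32> {
    if bytes.len() < 4 {
        return None;
    }
    let (head, rest) = bytes.split_at(4);
    *bytes = rest;
    Some(u32::from_le_bytes([head[0], head[1], head[2], head[3]]))
}

/// Read a length-prefixed UTF-8 string and advance the input slice.
fn read_str(bytes: &mut &[u8]) -> Option<String> {
    let len = read_u32(bytes)? as usize;
    if bytes.len() < len {
        return None;
    }
    let (head, rest) = bytes.split_at(len);
    *bytes = rest;
    String::from_utf8(head.to_vec()).ok()
}

/// Decode the format written by `encode`; returns `None` on bad input.
pub fn decode(mut bytes: &[u8]) -> Option<Vec<Token>> {
    let count = read_u32(&mut bytes)?;
    let mut tokens = Vec::new();
    for _ in 0..count {
        let (&tag, rest) = bytes.split_first()?;
        bytes = rest;
        let token = match tag {
            0 => Token::Ident(read_str(&mut bytes)?),
            1 => Token::Punct(char::from_u32(read_u32(&mut bytes)?)?),
            2 => Token::Literal(read_str(&mut bytes)?),
            _ => return None,
        };
        tokens.push(token);
    }
    Some(tokens)
}
```

Note that this toy format has no place for spans at all, which is exactly the gap a custom format would have had to fill and the existing bridge already handles.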
The core logic for expanding wpm lives in `proc_macro::bridge::client`. The core logic
related to interacting with the wasm file lives in `rust/wasm-proc-macro-loader`.
## Hacks
Since the project was approached in a goal-first, top-down fashion, a lot of hacks were used along the way.
### 1. Loading wasmtime
We cannot have dependencies in lib proc_macro. This is due to how the code for
proc_macro is structured in the compiler, and as of now there is no way to circumvent it.
This was overcome with the help of my mentor, who wrote the code for loading the wasmtime crate as a shared object.
The code itself is very hacky and relies on leaking some values for it to work correctly.
### 2. Reading metadata
Currently `.wpm` is chosen as the extension for a wasm procedural macro file. However, it is a plain wasm
file and does not contain any metadata, so a proxy proc macro named `pmacro1` is used for
loading metadata.
### 3. Passing TokenStream
It is best to pass complex types to and from a wasm object only in serialized form.
In the current implementation of proc_macro, TokenStreams have a set serialization format
and are passed using a private `Buffer`. For an easier and faster implementation, this buffer was made public
and used directly for passing TokenStreams.
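
For context on why a byte buffer is the natural unit of exchange: core wasm function signatures carry only numeric values, so a serialized buffer is commonly handed over as an offset and a length into the instance's linear memory. The sketch below models that convention with a plain `Vec<u8>` standing in for linear memory. It is an assumption made for illustration, not the mechanism used in the `gsoc24` branch, and `LinearMemory`, `pack` and `unpack` are hypothetical names.

```rust
/// Pack a 32-bit offset and a 32-bit length into one 64-bit value,
/// so a single integer can describe a buffer.
pub fn pack(ptr: u32, len: u32) -> u64 {
    ((ptr as u64) << 32) | len as u64
}

/// Inverse of `pack`: returns `(ptr, len)`.
pub fn unpack(packed: u64) -> (u32, u32) {
    ((packed >> 32) as u32, packed as u32)
}

/// Toy model of a wasm instance's linear memory with a bump allocator.
pub struct LinearMemory {
    bytes: Vec<u8>,
}

impl LinearMemory {
    pub fn new() -> Self {
        LinearMemory { bytes: Vec::new() }
    }

    /// Copy `data` into memory and return its packed location.
    pub fn write(&mut self, data: &[u8]) -> u64 {
        let ptr = self.bytes.len() as u32;
        self.bytes.extend_from_slice(data);
        pack(ptr, data.len() as u32)
    }

    /// Borrow the bytes at a packed location, or `None` if out of range.
    pub fn read(&self, packed: u64) -> Option<&[u8]> {
        let (ptr, len) = unpack(packed);
        let start = ptr as usize;
        let end = start.checked_add(len as usize)?;
        self.bytes.get(start..end)
    }
}
```

With a real runtime the host would read and write the instance's exported memory instead of a `Vec<u8>`, but the bookkeeping is the same.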
### 4. Hardcoded values
Currently most values like function names for proc_macro are hardcoded.
## Current State
The code snapshot in its current state is not yet complete. When running the code
following the steps in `wasm-proc-macros-draft.md`, the code compiles but errors out during
the final steps of running the wasm proc macro. This is because some of the RPC code between
client and server depends on the thread API in Rust, which is not available on `wasm32-unknown-unknown`.
Parts of this code could not be reworked before the end of the GSoC period.
## Future Work
If you would like to continue from the current state of the code, you are advised to go through
the `with_api` macro defined in `proc_macro::bridge` and all the places where it is used.
This should give you an overall idea of the bridge API.
### Rework bridge
The current bridge depends on some thread APIs. As of right now, this is the final hurdle
before we can have a scaffolded working demo for wpm.
We need to either rework the bridge API to not depend on threads or provide an alternate set
of APIs used only inside the wasm proc macro. From my assessment, for the logic
where thread-locals are used, it was hard to reason about how to avoid them. The best option
seems to be providing an alternate set of APIs.
In my assessment this can add up to enough work for at least a GSoC small project.
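
One possible shape for such an alternate API, offered purely as a sketch, is to pass the bridge state explicitly instead of reaching for ambient thread-local state, and to make each server call a synchronous function call on the same thread. Everything below is hypothetical: `BridgeCtx`, its `dispatch` callback and `expand_with_ctx` are invented names and do not correspond to items in `proc_macro::bridge`.

```rust
/// Hypothetical bridge state handed to the client explicitly.
pub struct BridgeCtx<'a> {
    /// Synchronous call into the server: request bytes in, reply bytes out.
    dispatch: &'a mut dyn FnMut(&[u8]) -> Vec<u8>,
    /// Number of round trips, kept only to make the flow observable.
    calls: usize,
}

impl<'a> BridgeCtx<'a> {
    pub fn new(dispatch: &'a mut dyn FnMut(&[u8]) -> Vec<u8>) -> Self {
        BridgeCtx { dispatch, calls: 0 }
    }

    /// Send one request and wait for the reply on the same thread.
    pub fn call(&mut self, request: &[u8]) -> Vec<u8> {
        self.calls += 1;
        (self.dispatch)(request)
    }

    pub fn calls(&self) -> usize {
        self.calls
    }
}

/// Toy "macro" that forwards its input to the server and returns the
/// server's reply as text.
pub fn expand_with_ctx(ctx: &mut BridgeCtx<'_>, input: &str) -> String {
    let reply = ctx.call(input.as_bytes());
    String::from_utf8(reply).expect("server replies with UTF-8 in this sketch")
}
```

The trade-off is that every function needing the bridge must take the context as a parameter, which is a wider change than swapping out one storage mechanism, so this may only be practical for a wasm-only code path.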
### Build Target and Metadata
Add proper support for a `wasm32-wpm` target with proper metadata support. This depends
on the API for wpm but should be relatively easy: compared to other crates, proc macros
do not have much that needs to be added to the header, and for the most part using the existing header/rlib
with a few minor tweaks during encoding would be sufficient.
### Better handle wasm runtime
From the work done, I think we should be able to load wasmtime as a dependency in rustc_expand
and retain most of the functionality seen currently. This would avoid the hacky nature
of the current method. We can also look into having the compiler accept a path to a wasm runtime,
rather than supporting only one runtime, which is wasmtime at the moment.
## Problems I faced
I thought my skill set was not being sufficiently challenged previously, so I took this
project as a real challenge. When I took the project I was informed that it would likely
be a moonshot. I still took it because a moonshot is exciting to my little mind.
However, in hindsight this turned out to be a lot harder than I anticipated. Proc macros cannot have any
dependencies, so all the code here is handwritten low-level code, lower than I have ever been.
Not having any dependencies also means I cannot just use my favourite crates for various jobs.
This was a major slip in my pre-project assessment, which later caused me to reduce the goals
and extend the overall time period, which ended up overlapping with my college term too.
This overlap has really pushed me in terms of managing my time, prioritizing
things, and maintaining a rigorous schedule.
## Learnings
Like I said, it was a moonshot, a challenge I willingly took, and I am very happy I did. I actually
went through a somewhat legacy codebase, dug through it, and am now in a place where I am pretty
confident with it when I make changes. When a panic or error happens, I generally know what caused it,
and the core dump traces have become surprisingly readable.
I also became more familiar with various debugging/tracing tools like flamegraph, objdump, and
ptrace, which I had to use a lot during the initial phase to understand
how proc macros work.
In a personal capacity I also properly studied Category Theory for Programmers,
which demystified the functional programming paradigm.
Understanding the functional paradigm was also one of the reasons I wanted to take
up a project in Rust, along with others. Rust is also a low-level language with modern
tooling, which allows for a healthier step into low-level code. I wanted to say easier,
but Rust is definitely not easy. It feels that way, however, because while other languages
like C allow you to do anything, they also make it trivially easy to pick up and accumulate the wrong
habits, and it is generally hard to find out what the good habits are.
In that regard I feel that the habits I have learned from using Rust and category theory have also
found their way into other code I have to write in other capacities like college, where
I have a better understanding of how I am structuring my code and thinking about the problem
and the logic.
Another wonderful thing I experienced was the incredible people. Rust is still not an
absolutely mainstream language, but it is rapidly growing. The people I met and saw in the Rust
Zulip chat were some of the most amazing, inspiring people I have had the privilege of being
with. They are extremely technical people and were really inspiring to be around.
## Final Remarks
While I had overestimated what was achievable under the original goal, which had to be reduced later on,
I have found the experience to be very rewarding. I wanted a challenge and it provided me
with a real one. I met various wonderful, inspirational people and made some new friends.
It has allowed me to really break through my previous technical limits and reach
greater heights. I am extremely happy and grateful that I was given a chance to be part of this
experience.