Revisions

  1. @0xdevalias 0xdevalias revised this gist Jul 20, 2025. 1 changed file with 2 additions and 0 deletions.
    2 changes: 2 additions & 0 deletions _deobfuscating-unminifying-obfuscated-web-app-code.md
    @@ -84,6 +84,8 @@ Since this gist has gotten huge with many references.. here's a tl;dr shortlist
    - > Issue 5: Evaluate `PerimeterX/Restringer` against the benchmark
    - https://github.com/jsdeobf/jsdeobf.github.io/issues/9
    - > Issue 9: Include LLM evaluation costs in the leaderboard
    - https://github.com/jsdeobf/jsdeobf.github.io/issues/10
    - > Issue 10: Evaluate the benchmark against other hosted LLM API's, not just OpenAI GPT-4o (eg. Claude, Gemini, etc)
    - The following issues on external repos relate to evaluating various JS Deobfuscation / Unminifier tools against this benchmark:
    - https://github.com/jehna/humanify/issues/539
    - > \[jehna/humanify] Issue 539: Explore / benchmark humanify against "JsDeObsBench: Measuring and Benchmarking LLMs for JavaScript Deobfuscation"
  2. @0xdevalias 0xdevalias revised this gist Jul 20, 2025. 1 changed file with 4 additions and 0 deletions.
    4 changes: 4 additions & 0 deletions _deobfuscating-unminifying-obfuscated-web-app-code.md
    @@ -82,6 +82,8 @@ Since this gist has gotten huge with many references.. here's a tl;dr shortlist
    - > Issue 4: Evaluate `lelinhtinh/de4js` against the benchmark
    - https://github.com/jsdeobf/jsdeobf.github.io/issues/5
    - > Issue 5: Evaluate `PerimeterX/Restringer` against the benchmark
    - https://github.com/jsdeobf/jsdeobf.github.io/issues/9
    - > Issue 9: Include LLM evaluation costs in the leaderboard
    - The following issues on external repos relate to evaluating various JS Deobfuscation / Unminifier tools against this benchmark:
    - https://github.com/jehna/humanify/issues/539
    - > \[jehna/humanify] Issue 539: Explore / benchmark humanify against "JsDeObsBench: Measuring and Benchmarking LLMs for JavaScript Deobfuscation"
    @@ -98,6 +100,8 @@ Since this gist has gotten huge with many references.. here's a tl;dr shortlist
    - > JsDeObsBench is a dedicated benchmark designed to rigorously evaluate the effectiveness of LLMs in the context of JS deobfuscation. We here release the utils for building the test dataset and conducting evaluation, which also facilitates the evaluation of new LLMs and summarizes the results into a leaderboard format.
    - https://github.com/Ch3nYe/JsDeObsBench/issues/2
    - > Issue 2: Consider evaluating identifier naming in a future version of the benchmark
    - https://github.com/Ch3nYe/JsDeObsBench/issues/3
    - > Issue 3: Are there any similar benchmarks / leaderboards for JS unbundling / unminification?
    - https://arxiv.org/abs/2506.20170
    - > JsDeObsBench: Measuring and Benchmarking LLMs for JavaScript Deobfuscation
    - > Deobfuscating JavaScript (JS) code poses a significant challenge in web security, particularly as obfuscation techniques are frequently used to conceal malicious activities within scripts. While Large Language Models (LLMs) have recently shown promise in automating the deobfuscation process, transforming detection and mitigation strategies against these obfuscated threats, a systematic benchmark to quantify their effectiveness and limitations has been notably absent. To address this gap, we present JsDeObsBench, a dedicated benchmark designed to rigorously evaluate the effectiveness of LLMs in the context of JS deobfuscation. We detail our benchmarking methodology, which includes a wide range of obfuscation techniques ranging from basic variable renaming to sophisticated structure transformations, providing a robust framework for assessing LLM performance in real-world scenarios. Our extensive experimental analysis investigates the proficiency of cutting-edge LLMs, e.g., GPT-4o, Mixtral, Llama, and DeepSeek-Coder, revealing superior performance in code simplification despite challenges in maintaining syntax accuracy and execution reliability compared to baseline methods. We further evaluate the deobfuscation of JS malware to exhibit the potential of LLMs in security scenarios. The findings highlight the utility of LLMs in deobfuscation applications and pinpoint crucial areas for further improvement.
  3. @0xdevalias 0xdevalias revised this gist Jul 20, 2025. 1 changed file with 38 additions and 0 deletions.
    38 changes: 38 additions & 0 deletions _deobfuscating-unminifying-obfuscated-web-app-code.md
    @@ -5,6 +5,8 @@
    <!-- TOC start (generated with https://bitdowntoc.derlin.ch/) -->
    - [tl;dr: AKA: devalias's shortlist](#tldr-aka-devaliass-shortlist)
    - [PoC](#poc)
    - [Benchmarks / Leaderboards / etc](#benchmarks--leaderboards--etc)
    - [JsDeObsBench](#jsdeobsbench)
    - [Tools](#tools)
    - [Unsorted](#unsorted)
    - [wakaru](#wakaru)
    @@ -65,6 +67,42 @@ Since this gist has gotten huge with many references.. here's a tl;dr shortlist
    - > poc-ast-tools
    > PoC scripts and tools for working with (primarily JavaScript) ASTs.
    ## Benchmarks / Leaderboards / etc

    ### JsDeObsBench

    - https://jsdeobf.github.io/
    - > JsDeObsBench Leaderboard 🏆
    - > JsDeObsBench: Measuring and Benchmarking LLMs for JavaScript Deobfuscation
    - https://github.com/jsdeobf/jsdeobf.github.io
    - The following issues on this repo relate to evaluating various JS Deobfuscation / Unminifier tools against this benchmark:
    - https://github.com/jsdeobf/jsdeobf.github.io/issues/3
    - > Issue 3: Evaluate `ben-sb/obfuscator-io-deobfuscator` against the benchmark
    - https://github.com/jsdeobf/jsdeobf.github.io/issues/4
    - > Issue 4: Evaluate `lelinhtinh/de4js` against the benchmark
    - https://github.com/jsdeobf/jsdeobf.github.io/issues/5
    - > Issue 5: Evaluate `PerimeterX/Restringer` against the benchmark
    - The following issues on external repos relate to evaluating various JS Deobfuscation / Unminifier tools against this benchmark:
    - https://github.com/jehna/humanify/issues/539
    - > \[jehna/humanify] Issue 539: Explore / benchmark humanify against "JsDeObsBench: Measuring and Benchmarking LLMs for JavaScript Deobfuscation"
    - https://github.com/j4k0xb/webcrack/issues/189
    - > \[j4k0xb/webcrack] Issue 189: Explore / benchmark webcrack against "JsDeObsBench: Measuring and Benchmarking LLMs for JavaScript Deobfuscation"
    - https://github.com/pionxzh/wakaru/issues/144
    - > \[pionxzh/wakaru] Explore / benchmark wakaru against "JsDeObsBench: Measuring and Benchmarking LLMs for JavaScript Deobfuscation"
    - https://github.com/ben-sb/obfuscator-io-deobfuscator/issues/50
    - > \[ben-sb/obfuscator-io-deobfuscator] Explore / benchmark obfuscator-io-deobfuscator against "JsDeObsBench: Measuring and Benchmarking LLMs for JavaScript Deobfuscation"
    - https://github.com/PerimeterX/restringer/issues/143
    - > \[PerimeterX/restringer] Issue 143: Explore / benchmark restringer against "JsDeObsBench: Measuring and Benchmarking LLMs for JavaScript Deobfuscation"
    - https://github.com/Ch3nYe/JsDeObsBench
    - > JsDeObsBench: Benchmarking Large Language Models for JavaScript Deobfuscation
    - > JsDeObsBench is a dedicated benchmark designed to rigorously evaluate the effectiveness of LLMs in the context of JS deobfuscation. We here release the utils for building the test dataset and conducting evaluation, which also facilitates the evaluation of new LLMs and summarizes the results into a leaderboard format.
    - https://github.com/Ch3nYe/JsDeObsBench/issues/2
    - > Issue 2: Consider evaluating identifier naming in a future version of the benchmark
    - https://arxiv.org/abs/2506.20170
    - > JsDeObsBench: Measuring and Benchmarking LLMs for JavaScript Deobfuscation
    - > Deobfuscating JavaScript (JS) code poses a significant challenge in web security, particularly as obfuscation techniques are frequently used to conceal malicious activities within scripts. While Large Language Models (LLMs) have recently shown promise in automating the deobfuscation process, transforming detection and mitigation strategies against these obfuscated threats, a systematic benchmark to quantify their effectiveness and limitations has been notably absent. To address this gap, we present JsDeObsBench, a dedicated benchmark designed to rigorously evaluate the effectiveness of LLMs in the context of JS deobfuscation. We detail our benchmarking methodology, which includes a wide range of obfuscation techniques ranging from basic variable renaming to sophisticated structure transformations, providing a robust framework for assessing LLM performance in real-world scenarios. Our extensive experimental analysis investigates the proficiency of cutting-edge LLMs, e.g., GPT-4o, Mixtral, Llama, and DeepSeek-Coder, revealing superior performance in code simplification despite challenges in maintaining syntax accuracy and execution reliability compared to baseline methods. We further evaluate the deobfuscation of JS malware to exhibit the potential of LLMs in security scenarios. The findings highlight the utility of LLMs in deobfuscation applications and pinpoint crucial areas for further improvement.
    - https://www.alphaxiv.org/overview/2506.20170v1

    ## Tools

    ### Unsorted
  4. @0xdevalias 0xdevalias revised this gist Apr 24, 2025. 1 changed file with 2 additions and 2 deletions.
    4 changes: 2 additions & 2 deletions _deobfuscating-unminifying-obfuscated-web-app-code.md
    @@ -25,7 +25,7 @@
    - [`swc`](#swc)
    - [`esbuild`](#esbuild)
    - [Source Maps](#source-maps)
    - [Sourcemap v4 and similarly proposed extensions (eg. adding scope/function names as well as variable names, globally unique debug IDs (enabling symbol server support), etc)](#sourcemap-v4-and-similarly-proposed-extensions-eg-adding-scopefunction-names-as-well-as-variable-names-globally-unique-debug-ids-enabling-symbol-server-support-etc)
    - [Source Map v4 and similarly proposed extensions (eg. adding scope/function names as well as variable names, globally unique debug IDs (enabling symbol server support), etc)](#source-map-v4-and-similarly-proposed-extensions-eg-adding-scopefunction-names-as-well-as-variable-names-globally-unique-debug-ids-enabling-symbol-server-support-etc)
    - [Visualisation/etc](#visualisationetc)
    - [Browser Based Code Editors / IDEs](#browser-based-code-editors--ides)
    - [CodeMirror](#codemirror)
    @@ -1382,7 +1382,7 @@ Since this gist has gotten huge with many references.. here's a tl;dr shortlist
    - > Just a simple hacky visualisation of SourceMaps
    - https://sokra.github.io/source-map-visualization/#typescript

    #### Sourcemap v4 and similarly proposed extensions (eg. adding scope/function names as well as variable names, globally unique debug IDs (enabling symbol server support), etc)
    #### Source Map v4 and similarly proposed extensions (eg. adding scope/function names as well as variable names, globally unique debug IDs (enabling symbol server support), etc)

    - https://ecma-international.org/publications-and-standards/standards/ecma-426/
    - > ECMA-426: Source map format specification
  5. @0xdevalias 0xdevalias revised this gist Apr 24, 2025. 1 changed file with 455 additions and 0 deletions.
    455 changes: 455 additions & 0 deletions _deobfuscating-unminifying-obfuscated-web-app-code.md
    @@ -25,6 +25,7 @@
    - [`swc`](#swc)
    - [`esbuild`](#esbuild)
    - [Source Maps](#source-maps)
    - [Sourcemap v4 and similarly proposed extensions (eg. adding scope/function names as well as variable names, globally unique debug IDs (enabling symbol server support), etc)](#sourcemap-v4-and-similarly-proposed-extensions-eg-adding-scopefunction-names-as-well-as-variable-names-globally-unique-debug-ids-enabling-symbol-server-support-etc)
    - [Visualisation/etc](#visualisationetc)
    - [Browser Based Code Editors / IDEs](#browser-based-code-editors--ides)
    - [CodeMirror](#codemirror)
    @@ -1381,6 +1382,460 @@ Since this gist has gotten huge with many references.. here's a tl;dr shortlist
    - > Just a simple hacky visualisation of SourceMaps
    - https://sokra.github.io/source-map-visualization/#typescript

    #### Sourcemap v4 and similarly proposed extensions (eg. adding scope/function names as well as variable names, globally unique debug IDs (enabling symbol server support), etc)

    - https://ecma-international.org/publications-and-standards/standards/ecma-426/
    - > ECMA-426: Source map format specification
    > 1st edition, December 2024
    - > This Standard defines the source map format, used by different types of developer tools to improve the debugging experience of code compiled to JavaScript, WebAssembly, and CSS.
    - > The latest drafts are available at: https://tc39.es/ecma426/
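For context, the JSON shape that ECMA-426 standardizes is the familiar source map v3 object; a minimal sketch (the field values here, including the `mappings` string, are illustrative, not taken from the spec's examples):

```javascript
// A minimal source map object in the shape ECMA-426 standardizes.
// All values below are illustrative.
const sourceMap = {
  version: 3,
  file: 'app.min.js',
  sources: ['src/app.js'],
  sourcesContent: ['function greet(name) { return "hi " + name; }'],
  names: ['greet', 'name'],
  // 'mappings' is a base64 VLQ string relating generated positions back
  // to original positions, source files, and names.
  mappings: 'AAAA,SAASA,MAAMC',
};

console.log(sourceMap.file); // app.min.js
```

The scope/debug-ID proposals discussed below are extensions layered onto this same JSON object rather than a new container format.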
    - https://tc39.es/ecma426/
    - https://github.com/tc39/ecma426
    - > Source map specification, RFCs and new proposals.
    - https://github.com/tc39/ecma426/issues/12
    - > Proposal: Source Maps v4 (or v3.1): Improved post-hoc debuggability
    - > At a recent TC-39 JS Tools meeting we decided to bring the proposal for v4 here as a neutral discussion point.
    >
    > The proposal document is here: [MicrosoftEdge/MSEdgeExplainers#538](https://github.com/MicrosoftEdge/MSEdgeExplainers/pull/538)
    - > The proposal is to add (one or two) new field(s) to the existing Source Maps document. This would map text ranges of the source files to a "scope", which would describe the lexical subroutine from the source language.
    - > Primary Points of Discussion
    - > Storing function names
    >
    > Option 1: Store in the existing “names” field
    > Option 2: Introduce a new “scopeNames” field
    - > Keying “scopes” by generated index VS source index
    >
    > Option 1: Key in “scopes” is source line and column
    > Option 2: Key in “scopes” is generated line and column
    - > Nested scopes VS linear scopes
    >
    > Option 1: Scopes are nested
    > Option 2: Scopes are linear
    - > Format of “scopes” field
    >
    > Option 1: Starting points
    > Option 2: Ranges
    - > Relative VS absolute indices
    - > Versioning
    >
    > Option 1: Version 4
    > Option 2: Version 3.1
    > Option 3: Retain version 3, but just add new fields
    - > Naming of functions
    >
    > This decision point relates to the naming convention of functions. While a free function f() will of course be named f, there are more choices available for other types of functions. For example, does a class member function get Class.prototype.f or Class.f as its name, or how do you name an anonymous function? These decisions probably don’t belong in the spec, but it would be useful to have a common naming convention across tools.
    - https://github.com/tc39/ecma426/issues/12#issuecomment-1020071783
    - > I am wondering if we could perhaps design the new source maps so that they are extensible. I am thinking something along the lines of [DWARF](https://dwarfstd.org/), where one defines abbreviations to represent their subset of debug attributes in the .debug_abbrev section and then emits debug info using the abbreviations in the .debug_info section (see a short example [here](https://wiki.osdev.org/DWARF#The_.debug_abbrev_section)). I started exploring the idea applied to source maps in a slide deck [here](https://docs.google.com/presentation/d/13pKF8FTg7OhEUfQ4YO2vWJgJYM74wdspCzgPkpbRvN4/edit?usp=sharing&resourcekey=0-9xs5hDXmyqlt4GVs9w2BRg).
    - https://dwarfstd.org/
    - > Welcome to the DWARF Debugging Standard Website
    >
    > DWARF is a debugging information file format used by many compilers and debuggers to support source level debugging. It addresses the requirements of a number of procedural languages, such as C, C++, and Fortran, and is designed to be extensible to other languages. DWARF is architecture independent and applicable to any processor or operating system. It is widely used on Unix, Linux and other operating systems, as well as in stand-alone environments.
    - https://github.com/tc39/ecma426/issues/12#issuecomment-1021460780
    - > Perhaps it is also worth noting that Dart compiled to JavaScript has been using "battle-proven" source map extensions for more than three years. A somewhat outdated documentation is [here](https://github.com/dart-lang/sdk/blob/master/pkg/compiler/doc/sourcemap_extensions.md) (the latest version uses VLQ encodings). Compared to pasta-sourcemaps, the Dart extensions also describe inlining - they are keyed by generated positions. The Dart source maps also describe property and global renames, but I am not sure if those extensions are flexible enough. Since the renames are global, they cannot describe renames that happen only in some parts of the code (property renames are tricky to describe for other reasons, too).
    - https://github.com/tc39/ecma426/issues/12#issuecomment-1027127093
    - > ICYMI, it looks like there's also some prior art from back in 2015 in [\#4](https://github.com/tc39/ecma426/pull/4) ([rendered](https://github.com/source-map/source-map-rfc/blob/cc41ac3c7514dcafc7ce1f5735308a2c3dde9d6e/proposals/env.md)). From a high-level skim, it seems to share some of the ideas from the DWARF-inspired proposal in [@jaro-sevcik](https://github.com/jaro-sevcik)'s slides.
    - https://github.com/tc39/ecma426/issues/12#issuecomment-1336212827
    - > We ([@getsentry](https://github.com/getsentry)) have a lot of interest in scope information. As of recently we have started parsing minified source code to reconstruct scope information as the previous heuristics to reconstruct function names just don't work well. ([We wrote about this here](https://blog.sentry.io/2022/11/30/how-we-made-javascript-stack-traces-awesome/)). The technical implementation for this can be found here: [getsentry/js-source-scopes](https://github.com/getsentry/js-source-scopes/)
    >
    > Obviously this is a pretty crude way of resolving this issue and it has a lot of restrictions still. In particular we are using the minified sources which do not have the right source information in all cases.
    >
    > From our perspective longer term source maps probably should be replaced. Particularly for us operating with JSON files at scale has always been quite frustrating and the standard leaves a lot to be desired. Facebook has actually extended the source map format with the Metro bundler for react-native to include some scope information (`x_facebook_sources`) so there is some prior art about the scope information: [getsentry/rust-sourcemap@`5187edf`/src/hermes.rs#L135](https://github.com/getsentry/rust-sourcemap/blob/5187edf627d70ec1198e737bb14eecfe95fab7af/src/hermes.rs#L135)
    >
    > I'm super happy to see that something is moving about scope information but since we probably already need to touch a lot of tooling in the process, I wonder if it's not time to start seriously considering actually using DWARF for this instead.
    - https://github.com/getsentry/js-source-scopes/
    - > JS Source Scopes
    - > **Features**
    >
    > - Extracting scopes from source text using \[`extract_scope_names`\]
    > - Fast lookup of scopes by byte offset using \[`ScopeIndex`\]
    > - Fast conversion between line/column source positions and byte offsets using \[`SourceContext`\]
    > - Resolution of minified scope names to their original names using \[`NameResolver`\]
    - https://github.com/getsentry/js-source-scopes/blob/e5a675d0319f3f782815c82decbc77bf42d1655f/src/name_resolver.rs#L7
    - > A structure for resolving `[ScopeName]`s in minified code to their original names using information contained in a `[DecodedMap]`.
    - https://blog.sentry.io/how-we-made-javascript-stack-traces-awesome/
    - > How We Made JavaScript Stack Traces Awesome
    - > You might have noticed a significant improvement in Sentry JavaScript stack traces recently. In this blog post, we want to explain why source maps are insufficient for solving this problem, the challenges we faced, and how we eventually pulled it off by parsing JavaScript.
    - https://blog.sentry.io/how-we-made-javascript-stack-traces-awesome/#our-improved-approach-javascript-parsing
    - > Our Improved Approach: JavaScript Parsing
    - https://blog.sentry.io/how-we-made-javascript-stack-traces-awesome/#how-does-that-work-in-detail
    - > How Does That Work in Detail?
    - https://github.com/tc39/ecma426/issues/12#issuecomment-2185969351
    - > I believe this is subsumed by the current [scopes proposal](https://github.com/tc39/source-map/blob/main/proposals/scopes.md). If there is anything missing, we should add it as a follow up to the main proposal.
    - https://github.com/tc39/ecma426/issues/37
    - > Scopes and variable shadowing
    - > [The env proposal](https://github.com/source-map/source-map-rfc/blob/main/proposals/env.md) would add information about scopes and bindings in the original source to a sourcemap. That proposal satisfies all the scenarios listed [here](https://github.com/source-map/source-map-rfc/issues/2#issuecomment-74966399) but there is one scenario it doesn't support: variable shadowing, i.e. a variable declared in an outer scope that is shadowed by another variable with the same name in an inner scope. This scenario is quite common because javascript minifiers reuse variable names aggressively.
    - https://github.com/tc39/ecma426/issues/37#issuecomment-1701107791
    - > I've created an example implementation [here](https://github.com/hbenl/tc39-proposal-scope-mapping/) for computing the original scopes using the scope information encoded in the sourcemap and added two of the examples above.
    - https://github.com/hbenl/tc39-proposal-scope-mapping
    - > tc39-proposal-scope-mapping
    - > This repository contains an example implementation of the algorithm for computing original frames and scopes in a debugger using scope information encoded in the sourcemap according to [this proposal](https://github.com/tc39/source-map-rfc/issues/37#issuecomment-1650027594). The algorithm can be found in `src/getOriginalFrames.ts`, the `test` directory contains some examples from [here](https://github.com/tc39/source-map-rfc/issues/37#issuecomment-1699356967). Furthermore it shows how this scope information could be encoded in a VLQ string in `src/encodeScopes.ts`/`src/decodeScopes.ts`.
    - https://github.com/tc39/ecma426/issues/37#issuecomment-1721365785
    - > I have added a `decodeScopes()` function to [this repo](https://github.com/hbenl/tc39-proposal-scope-mapping/) showing how scope information could be encoded in a VLQ string: each scope is encoded as a list of VLQ numbers, scopes are separated by `,`
    >
    > - the first number encodes the scope type and whether the scope appears in the original and generated sources
    > - if the scope type is `ScopeType.NAMED_FUNCTION`, the second number is the index of the scope name in the `scopeNames` array
    > - the next four numbers are `startLine`, `startColumn`, `endLine` and `endColumn`
    > - for each binding we add the indices of the variable name and the expression in the `scopeNames` array
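The scope encoding sketched in the points above rides on the same base64 VLQ scheme that source maps already use for the `mappings` field. A minimal standalone decoder for one such VLQ segment (a sketch; the function name and the example segment are illustrative, not from the linked repo):

```javascript
// Base64 VLQ decoding as used by source maps: each base64 character
// carries 5 data bits plus a continuation bit; the least significant
// bit of the assembled value is the sign.
const BASE64 = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

function decodeVlqSegment(segment) {
  const values = [];
  let value = 0;
  let shift = 0;
  for (const char of segment) {
    const digit = BASE64.indexOf(char);
    if (digit === -1) throw new Error(`invalid base64 char: ${char}`);
    value += (digit & 0b11111) << shift;
    if (digit & 0b100000) {
      shift += 5; // continuation bit set: more digits follow
    } else {
      // low bit is the sign; remaining bits are the magnitude
      values.push(value & 1 ? -(value >>> 1) : value >>> 1);
      value = 0;
      shift = 0;
    }
  }
  return values;
}

// A scope entry per the comment above would then be read off this list,
// e.g. [flags, nameIndex?, startLine, startColumn, endLine, endColumn, ...bindings]
console.log(decodeVlqSegment('AAgBC')); // [ 0, 0, 16, 1 ]
```

Because each character contributes only 5 value bits and small deltas dominate, even minifier-scale index lists stay compact in this encoding.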
    - https://github.com/tc39/ecma426/issues/37#issuecomment-1734680599
    - > would it help if I added the spec notes that Tobias took as well as the screenshots from the Munich meetup to this issue? Or maybe somewhere else?
    - https://github.com/tc39/ecma426/issues/37#issuecomment-1777452283
    - > Tobias' spec notes: https://gist.github.com/sokra/97a53a869b9a421accadbc9681cb26f3
    - https://github.com/tc39/ecma426/issues/19
    - > Unique Debug IDs (Build IDs) for Source Maps
    - > One of the most annoying issues with source maps today is that they are referenced by names. This makes working with them in production very challenging. Sentry for instance needs to download all referenced source minified files, to then try to use those minified files to locate the source map and then hope that it's there. If it's not there, we hope that the users actually managed to upload the source maps with the information of that URL to our own system (eg: they need to achieve an exact match of where the source map is supposed to be but instead of uploading them to the public place, they upload them to us with the URL as name).
    >
    > This is very brittle and usually means that many people have challenges getting source maps to work reliably for their own applications.
    >
    > We have much better success with debug information files from C/C++ and other languages where we get a unique build ID which can then be fetched by that ID from a symbol server.
    >
    > I'm curious if there is some appetite with embedding a unique build ID into the source map and to include that ID in the `error.stack` for reference by other tools.
    >
    > I made a similar proposal for WASM (which however uses DWARF) and there is some movement into that direction: [WebAssembly/tool-conventions#133](https://github.com/WebAssembly/tool-conventions/issues/133)
    - https://github.com/WebAssembly/tool-conventions/issues/133
    - > Build ID Section for WASM
    - https://github.com/WebAssembly/tool-conventions/issues/133#issuecomment-554123720
    - > To generate the build ID they could take the hash of the main wasm sections and store it in the file. They can alternatively just generate a random UUID and embed it. I do think though that a build ID should ideally always be embedded.
    >
    > ([This here describes the workflow](https://blog.sentry.io/2019/06/13/building-a-sentry-symbolicator) where this information is particularly useful)
    - https://blog.sentry.io/building-a-sentry-symbolicator/
    - > Building Sentry: Symbolicator
    - > Welcome to our series of blog posts about all the nitty-gritty details that go into building a great debug experience at scale. Today, we're looking at Symbolicator, the service that processes all native crash reports and minidumps at Sentry.
    - > Now, the time has come to lift the curtain and show you how we handle native crashes in Sentry. Join us on a multi-year journey from our first baby-steps at native crash analysis to Symbolicator, the reusable open-source service that we've built to make native crash reporting easier than ever.
    - https://blog.sentry.io/building-a-sentry-symbolicator/#debug-information-is-gold
    - > (Debug) information is gold
    - > The final executable no longer needs to know the names of variables or the files that your code was declared in. Sometimes, not even function names play a role anymore. To ensure that developers can still inspect their applications, compilers, therefore, output _debug information_ containing data to connect the optimized instructions with their source code.
    >
    > However, this debug information can get large. It's not uncommon to encounter debug information 10 times the size of the executable. For this reason, debug information is often moved (or _stripped_) to separate companion files. They are commonly referred to as _Debug Information Files_, or _Debug Symbols_. On Windows, they carry a `.pdb` extension, on macOS, they are `.dSYM` folder structures, and, on Linux, there is a convention to put them in `.debug` files.
    - > The internal format of these files also varies — while macOS and Linux generally use the open-source [DWARF](http://dwarfstd.org/) standard, Microsoft implemented their proprietary CodeView that was eventually [open-sourced](https://github.com/Microsoft/microsoft-pdb) at the request of the LLVM project.
    >
    > At the heart of each debug information file are tree-like structures explaining the contents of every compilation unit. They contain all types, functions, parameters as well as variables, scopes, and more. Additionally, there are mappings of these structures to instruction pointer addresses in the source code as well as the file and line number where they are declared.
    - https://dwarfstd.org/
    - > Welcome to the DWARF Debugging Standard Website
    >
    > DWARF is a debugging information file format used by many compilers and debuggers to support source level debugging. It addresses the requirements of a number of procedural languages, such as C, C++, and Fortran, and is designed to be extensible to other languages. DWARF is architecture independent and applicable to any processor or operating system. It is widely used on Unix, Linux and other operating systems, as well as in stand-alone environments.
    - https://github.com/Microsoft/microsoft-pdb
    - > Information from Microsoft about the PDB format. We'll try to keep this up to date. Just trying to help the CLANG/LLVM community get onto Windows.
    - > Of course, there are great Rust libraries that can handle debug information, including [`gimli`](https://github.com/gimli-rs/gimli) for DWARF and `pdb` for CodeView, that are contributing to our improvements. In our own `symbolic` library, we've created a [handy](https://docs.rs/symbolic-debuginfo) [abstraction](https://docs.rs/symbolic-debuginfo/latest/symbolic_debuginfo/) over the files and debug formats to simplify native symbolication.
    - https://github.com/gimli-rs/gimli
    - > gimli
    - > A library for reading and writing the DWARF debugging format
    - https://docs.rs/symbolic-debuginfo/latest/symbolic_debuginfo/
    - > Crate symbolic_debuginfo
    - > Abstractions for dealing with object files and debug information.
    >
    > This module defines the [`Object`](https://docs.rs/symbolic-debuginfo/latest/symbolic_debuginfo/enum.Object.html) type, which is an abstraction over various object file formats used in different platforms. Also, since executables on MacOS might contain multiple object files (called a _“Fat MachO”_), there is an [`Archive`](https://docs.rs/symbolic-debuginfo/latest/symbolic_debuginfo/enum.Archive.html) type, that provides a uniform interface with access to an objects iterator in all platforms.
    >
    > Most processing of object files will happen on the `Object` type or its concrete implementation for one platform. To allow abstraction over this, there is the [`ObjectLike`](https://docs.rs/symbolic-debuginfo/latest/symbolic_debuginfo/trait.ObjectLike.html) trait. It defines common attributes and gives access to a [`DebugSession`](https://docs.rs/symbolic-debuginfo/latest/symbolic_debuginfo/trait.DebugSession.html), which can be used to perform more stateful handling of debug information.
    >
    > See [`Object`](https://docs.rs/symbolic-debuginfo/latest/symbolic_debuginfo/enum.Object.html) for the full API, or use one of the modules for direct access to the platform-dependent data.
    - https://docs.rs/symbolic-debuginfo/latest/symbolic_debuginfo/#modules
    - > Modules
    - `js`: Utilities specifically for working with JavaScript specific debug info.
    - https://docs.rs/symbolic-debuginfo/latest/symbolic_debuginfo/js/index.html
    - > Utilities specifically for working with JavaScript specific debug info.
    >
    > This for the most part only contains utility functions to parse references out of minified JavaScript files and source maps. For actually working with source maps this module is insufficient.
    - etc
    - https://blog.sentry.io/building-a-sentry-symbolicator/#speeding-it-up
    - > Speeding it up
    - > Dealing with large debug files also has its drawbacks. At Sentry's scale, we've repeatedly run into cases where we were unsatisfied with the various aspects of retrieving, storing, and processing multiple gigabytes worth of debug information just for a single crash. Additionally, handling different file types all the time only complicates the overall symbolication process — even when hidden behind a fancy abstraction.
    >
    > Engineers at Google faced the same issue when they created the Breakpad library. They came up with a human-readable and cross-platform representation for the absolutely necessary subset of debug information: Breakpad symbols. And it worked; those files are much smaller than the original files and can easily be handled by engineers.
    >
    > However, their format is optimized for human readability, not automated processing. Also, certain debug information can't be stored, such as inline function data, which is a core part of our product. So we decided to create our own format. The objectives: make it as small as possible and as fast as possible to read. And since it needed a name, we pragmatically dubbed it [SymCache](https://docs.rs/symbolic-symcache).
    >
    > Usually, symcaches weigh an order of magnitude less than original debug files and come with a format that's easily binary searchable by instruction address. Paired with memory mapping, this makes them the ideal format for quick symbolication. Whenever a native crash comes in, we quickly convert the original debug file into a SymCache and then use that for repeated symbolication.
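The key property mentioned above — a format that is "easily binary searchable by instruction address" — can be illustrated with a small sketch. This is not the actual SymCache binary layout, just the core lookup idea: a flat table of function entries sorted by start address, resolved with a single binary search.

```javascript
// Illustrative sketch only (not the real SymCache format): resolve an
// instruction address against a table of { start, name } entries that is
// sorted ascending by `start`. Returns the name of the last entry whose
// start address is <= the queried address.
function resolveSymbol(entries, address) {
  let lo = 0;
  let hi = entries.length - 1;
  let found = null;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (entries[mid].start <= address) {
      // Candidate match; a later entry may still start before `address`.
      found = entries[mid];
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  return found ? found.name : null;
}
```

Because lookup is `O(log n)` over a memory-mapped, pre-sorted table, repeated symbolication of incoming crashes avoids re-parsing the original (much larger) debug files.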
    - https://blog.sentry.io/building-a-sentry-symbolicator/#symbol-servers
    - > Symbol Servers
    - > With our newly announced [support for symbol servers](https://blog.sentry.io/native-crash-reporting-symbol-servers-pdbs-sdk-c-c-plus-plus), we have now added a second, more convenient way to provide debug information. Instead of uploading, Sentry will download debug files as needed.
    >
    > When implementing this feature, we realized just how inconsistent debug file handling still is. While Microsoft has established a de facto standard for addressing PDBs, all other platforms are still very underspecified. In total, we have implemented 5 different schemas for addressing debug information files on symbol servers:
    >
    > - Microsoft SymbolServer (including compression)
    > - [SSQP](https://github.com/dotnet/symstore/blob/master/docs/specs/SSQP_Key_Conventions.md) (Simple Symbol Query Protocol)
    > - Google Breakpad's Directory Layout
    > - [LLDB File Mapped UUID Directories](http://lldb.llvm.org/use/symbols.html#file-mapped-uuid-directories)
    > - [GDB Build ID Method](https://sourceware.org/gdb/onlinedocs/gdb/Separate-Debug-Files.html)
    >
    > While some of them are quite similar, they all handle certain file types differently or have their own formatting for the file identifiers. As we're also continuing to expand our internal repositories of debug files, we will be working towards a more accessible and consistent standard that covers all major platforms to avoid issues like case insensitivity during lookup.
    - https://blog.sentry.io/building-a-sentry-symbolicator/#symbolication-as-a-service
    - > Symbolication as a service
    - > Since the beginning, it has been our objective to create reusable components for the handling of debug information and native symbolication in general.
    - > Over the past months, we have started to move a lot of the symbolication code that we have been using at Sentry into a standalone service. We're now proud to present [Symbolicator](https://github.com/getsentry/symbolicator), a standalone native symbolication service.
    - > Symbolicator can process native stack traces and minidumps and will soon learn more crash report formats. It uses symbol servers to download debug files and cache them intelligently for fast symbolication. Symbolicator also comes with a scope isolation concept built-in so that it can be used in multi-tenant use cases. Over time, we will be adding more capabilities and tools around debug file handling.
    >
    > Additionally, Symbolicator can act as a symbol server proxy. Its API is compatible with Microsoft’s symbol server, which means you can host your own instance and point Visual Studio to it. Symbolicator will automatically serve debug files from configured sources like S3, GCS or any other available symbol server.
    >
    > Symbolicator is and will always be 100% open-source.
    - https://github.com/tc39/ecma426/issues/14
    - > Including dependency graph
    - > Source programs have a module graph that is lost when compiling down to a bundle. If this information
    > were persisted in the source map, developer tools could utilize it to inform things such as tree shaking
    > (or lack thereof), optimized chunking, editor capabilities (showing in DevTools Sources panel "here are all the files that import this module"?), and possibly more use cases.
    >
    > This would close a big gap between browser devtools knowledge and local bundler knowledge.
    >
    > Some tools exist using module graph data. Currently, they consume the raw debug data outputted by various bundlers.
    > Putting this into the source map specification would greatly reduce the complexity of making such tools.
    >
    > - [Bundle Buddy](https://github.com/samccone/bundle-buddy)
    > - [inspectpack](https://github.com/formidablelabs/inspectpack)
    - https://github.com/tc39/ecma426/issues/33
    - > Improve function name mappings
    - > Bloomberg's solution: [bloomberg/pasta-sourcemaps@`master`/spec.md](https://github.com/bloomberg/pasta-sourcemaps/blob/master/spec.md).
    >
    > It would be great to have a standard form of something in this space.
    - https://github.com/tc39/ecma426/issues/33#issuecomment-1505823429
    - > We've implemented this behavior post hoc, by parsing the minified source and its corresponding source map and original source in [getsentry/js-source-scopes](https://github.com/getsentry/js-source-scopes/) and using it as a part of our error mapping in [getsentry/symbolic@`master`/symbolic-sourcemapcache](https://github.com/getsentry/symbolic/tree/master/symbolic-sourcemapcache)
    - https://github.com/tc39/ecma426/issues/47
    - > List source map implementations
    - https://github.com/tc39/ecma426/issues/47#issuecomment-1590899277
    - > This is our Rust library: [crates.io/crates/sourcemap](https://crates.io/crates/sourcemap)
    >
    > The most notable users of this library to the best of my knowledge are Sentry, SWC and Deno.
    >
    > It's pretty low level, we have higher level wrappers on top of that ([js-source-scopes](https://crates.io/crates/js-source-scopes), [symbolic](https://docs.rs/symbolic/latest/symbolic/sourcemapcache/index.html), and [symbolicator](https://getsentry.github.io/symbolicator/)) for consumption.
    - https://github.com/getsentry/symbolic
    - > Symbolic
    - > Stack trace symbolication library written in Rust
    - > Symbolic is a library written in Rust which is used at Sentry to implement symbolication of native stack traces, sourcemap handling for minified JavaScript and more. It consists of multiple largely independent crates which are bundled together into a C and Python library so it can be used independently of Rust.
    - https://github.com/getsentry/symbolic#whats-in-the-package
    - > What's in the package
    - > Symbolic provides the following functionality:
    >
    > - Symbolication based on custom cache files (symcache)
    > - Symbol cache file generators from:
    >   - Mach, ELF and PE symbol tables
    >   - Mach, ELF and PE embedded DWARF data
    >   - PDB CodeView debug information
    >   - .NET Portable PDB
    >   - Breakpad symbol files
    >   - Unity IL2CPP
    > - Demangling support
    >   - C++ (GCC, clang and MSVC)
    >   - Objective C / Objective C++
    >   - Rust
    >   - Swift
    > - JavaScript sourcemap expansion
    >   - Basic token mapping
    >   - Heuristics to find original function names based on minified sources
    >   - Indexed sourcemap to sourcemap merging
    > - Proguard function mappings
    > - Generate Breakpad symbol files from Mach, ELF and PDBs
    > - Convenient C and Python library
    > - Processing of Unreal Engine 4 native crash reports
    >   - Extract and process minidumps
    >   - Expose logs and UE4 context information
    - https://crates.io/crates/js-source-scopes
    - > js-source-scopes
    > Utilities for extracting and dealing with scope information in JS code
    - https://docs.rs/symbolic/latest/symbolic/sourcemapcache/index.html
    - > Crate sourcemapcache
    - > A fast lookup cache for SourceMaps.
    - https://getsentry.github.io/symbolicator/
    - > Symbolicator is a standalone service that resolves function names, file location and source context in native and JavaScript stack traces. It can process minidumps, Apple crash reports and source maps. Additionally, Symbolicator can act as a proxy to symbol servers supporting multiple formats, such as Microsoft's symbol server or Breakpad symbol repositories.
    - https://getsentry.github.io/symbolicator/advanced/symbol-server-compatibility/
    - > **Symbol Server Compatibility**
    >
    > This page describes external sources supported by Symbolicator.
    >
    > The layout of external sources intends to be compatible to several symbol server implementations that have been used historically by different platforms. We commit to provide compatibility to the following services or directory structures:
    >
    > - Microsoft Symbol Server
    > - Breakpad Symbol Repositories
    > - LLDB File Mapped UUID Directories
    > - GDB Build ID Directories
    > - debuginfod
    > - Unified Symbol Server Layout
    - https://github.com/getsentry/symbolicator
    - > Symbolicator
    - > Native Symbolication as a Service
    - > A symbolication service for native stacktraces and minidumps with symbol server support. It's a flexible frontend for parts of the symbolic library.
    - https://github.com/tc39/ecma426/tree/main/proposals
    - https://github.com/tc39/ecma426/blob/main/proposals/scopes.md
    - > Proposal for adding information about scopes and their bindings to source maps
    - > This document describes an extension to the [source map format](https://tc39.es/source-map-spec/) for encoding scopes and bindings information to improve the debugging experience. There is [another proposal](https://github.com/tc39/source-map-rfc/blob/main/proposals/env.md) that is also trying to solve the same problem, but it includes less information about the scopes and hence doesn't support all scenarios that this proposal supports, like dealing with inlined functions or variable shadowing that was introduced by minification.
    - > We introduce a new field "scopes" to the source map JSON: "scopes" is a string. It contains a list of comma-separated items. Each item is prefixed with a unique "tag". The items themselves build a tree structure that describe "original scope" and "generated range" trees.
    - https://github.com/tc39/ecma426/blob/main/proposals/env.md
    - > Proposal for Encoding Source-Level Environment Information Within Source Maps
    - > This document describes a proposed extension to the [source map format](https://docs.google.com/document/d/1U1RGAehQwRypUTovF1KRlpiOFze0b-_2gc6fAH0KY0k/edit?pli=1#) for encoding source-level environment debugging information, such as scopes and bindings.
    - > This proposal adds an "env" property to source maps, which contains a string of base 64 VLQ numbers. Legacy consumers and generators do not use the "env" property, thus goal (3) is satisfied.
    >
    > The basic syntactic unit is a "record" which has a type and a open ended set of property/value pairs. Records are loosely inspired by DWARF's Debugging Information Entries, but in a "source mappy" way. While the grammar below defines known record types, known properties, and known values, it explicitly allows for unknown record types, unknown properties, and unknown values for use by future extensions. Thus goal (4) is satisfied.
    >
    > In order to satisfy goal (5), we provide an abbreviation mechanism for records (again, similar to DWARF). This allows for the definition of an abbreviation that includes a record type and the set of properties. Then, when serializing a record whose type and property set already has an abbreviation definition, the record's type and properties can be omitted, emitting only the property values. Additionally, a property's value is encoded relative to the last value emitted for that property. This technique is used in encoding segments in the "mappings" property of the source map format, and has proved valuable in reducing file size.
    - > There is a reference implementation for serializing and deserializing this source map extension in a branch of the source-map library: https://github.com/fitzgen/source-map/tree/scopes.
    >
    > Note that it uses "x_env" instead of "env".
    - https://github.com/tc39/ecma426/blob/main/proposals/debug-id.md
    - > Source Map Debug ID Proposal
    >
    > This document presents a proposal to add globally unique build or debug IDs to source maps and generated code, making build artifacts self-identifying and facilitating bidirectional references between Source Maps and generated code.
    - > Source maps play a crucial role in debugging by providing a mapping between generated code and the original source code. However, the current source map specification lacks important properties such as self-describing and self-identifying capabilities for both the generated code as well as the source map. This results in a subpar user experience and numerous practical problems, most prominently making it difficult to associate Source Maps with the corresponding generated code. To address these issues, we propose an extension to the source map format: the addition of globally unique Debug IDs.
    - > Debug IDs (also sometimes called Build IDs) are already used in the native language ecosystem and supported by native container formats such as PE, ELF, MachO or WASM.
    - > Symbol Server Support: With Debug IDs and source maps with embedded sources it becomes possible to support symbol server lookup from symbol servers.
    - https://github.com/tc39/ecma426/blob/main/proposals/debug-id.md#debug-ids
    - > Debug IDs are globally unique identifiers for build artifacts. They are specified to be UUIDs in the format of 85314830-023f-4cf1-a267-535f4e37bb17. The format is intentionally chosen to be strict to ensure consistency and simplicity in generating and consuming tooling.
    >
    > Debug IDs are embedded in both source maps and transformed files, allowing a bidirectional mapping between them. The linking of source maps and transformed files via HTTP headers is explicitly not desired. A file identified by a Debug ID must have that Debug ID embedded to ensure the file is self-identifying.
    - > The way a Debug ID is generated is specific to the toolchain and the only proposed requirement is that Debug IDs are 128-bit values. We propose this requirement to ensure consistency and promote simplicity across the ecosystem.
    >
    > Since Debug IDs are embedded in build artifacts, it is recommended that tools generate deterministic Debug IDs (e.g. UUIDv3, UUIDv5) whenever possible, so that the produced artifacts are stable across builds. Specification-wise, Debug IDs do not need to be deterministic. Determinism is not enforced so that tools can employ non-deterministic fallback mechanisms in case of colliding Debug IDs between two different generated artifacts.
    >
    > Whether or not a Debug ID is deterministic can be encoded via the UUID version bits. For instance, UUIDv3 and UUIDv5 are deterministic by design. Dev tools or debuggers may need to know whether a debug ID is deterministic to decide whether to apply caching mechanisms or not.
    - https://en.wikipedia.org/wiki/Universally_unique_identifier#Versions_3_and_5_(namespace_name-based)
    - > Universally unique identifier
    - > Versions 3 and 5 (namespace name-based)
    >
    > Version-3 and version-5 UUIDs are generated by hashing a namespace identifier and name. Version 3 uses MD5 as the hashing algorithm, and version 5 uses SHA-1.
    >
    > The namespace identifier is itself a UUID. The specification provides UUIDs to represent the namespaces for URLs, fully qualified domain names, object identifiers, and X.500 distinguished names; but any desired UUID may be used as a namespace designator.
    >
    > To determine the version-3 UUID corresponding to a given namespace and name, the UUID of the namespace is transformed to a string of bytes, concatenated with the input name, then hashed with MD5, yielding 128 bits. Then 6 or 7 bits are replaced by fixed values, the 4-bit version (e.g. 0011₂ for version 3), and the 2- or 3-bit UUID "variant" (e.g. 10₂ indicating an RFC 9562 UUID, or 110₂ indicating a legacy Microsoft GUID). Since 6 or 7 bits are thus predetermined, only 121 or 122 bits contribute to the uniqueness of the UUID.
    >
    > Version-5 UUIDs are similar, but SHA-1 is used instead of MD5. Since SHA-1 generates 160-bit digests, the digest is truncated to 128 bits before the version and variant bits are replaced.
    >
    > Version-3 and version-5 UUIDs have the property that the same namespace and name will map to the same UUID. However, neither the namespace nor name can be determined from the UUID, even if one of them is specified, except by brute-force search. RFC 4122 recommends version 5 (SHA-1) over version 3 (MD5), and warns against use of UUIDs of either version as security credentials.
    - > We propose adding a `debugId` property to the source map at the top level of the source map object. This property must be a string value representing the Debug ID in hexadecimal characters, using the canonical UUID format
    - > Generated JavaScript files containing a Debug ID must embed the ID near the end of the source, ideally on the last line, in the format `//# debugId=<DEBUG_ID>` using the canonical UUID format
    - > If the special `//# sourceMappingURL=` comment already exists in the file, it is recommended to place the debugId comment in the line above to maintain compatibility with existing tools. Because the last line already has meaning in the existing specification for the sourceMappingURL comment, tools are required to examine the last 5 lines to discover the Debug ID.
    >
    > Note on the end of file: for all intents and purposes having the Debug ID at the top of the file would be preferable. However this has the disadvantage that a tool could not add a Debug ID to a file without having to adjust all the tokens in the source map by the offset that this line adds. Having it at the end of the file means it's after all tokens which would allow a separate tool to add Debug IDs to generated files and source maps.
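The consumer side of that rule ("examine the last 5 lines") is simple to sketch. The helper name and return convention below are illustrative, not taken from the proposal:

```javascript
// Sketch: extract a Debug ID from generated JS by scanning the last five
// lines for a `//# debugId=<uuid>` comment, per the proposal's discovery rule.
function extractDebugId(source) {
  const lines = source.trimEnd().split('\n');
  const pattern = /^\/\/# debugId=([0-9a-fA-F-]{36})$/;
  for (const line of lines.slice(-5)) {
    const match = pattern.exec(line.trim());
    if (match) return match[1].toLowerCase();
  }
  return null; // no Debug ID embedded
}
```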
    - https://github.com/tc39/ecma426/blob/main/proposals/debug-id.md#appendix-b-symbol-server-support
    - > Appendix B: Symbol Server Support
    >
    > With debug IDs it becomes possible to resolve source maps and generated code from the server. That way a tool such as a browser or a crash reporter could be pointed to a S3, GCS bucket or an HTTP server that can serve up source maps and build artifacts keyed by debug id.
    >
    > The strong upside of keying by an ID rather than a URL is that an ID is more resistant to resources moving on the symbol server.
    >
    > An additional use-case that was discovered is that Debug IDs can be passed alongside "resources" to browser extensions, and if the browser supports it, the browser extensions can resolve source map resources for the passed debug IDs and pass them back to the browser to populate dev tools. While this would technically be possible with URLs, having IDs makes association between JS resources and source maps much simpler and it is resistant to the JS resource locations changing.
    - https://github.com/tc39/ecma426/blob/main/proposals/debug-id.md#implementors
    - > The following Source Map **Generators** have implemented Debug IDs as proposed:
    >
    > - Rollup ([`output.sourcemapDebugIds` option](https://rollupjs.org/configuration-options/#output-sourcemapdebugids))
    > - Oxc ([`debug_id` API](https://docs.rs/oxc/latest/oxc/sourcemap/struct.JSONSourceMap.html#structfield.debug_id))
    > - Expo ([Injected by default](https://docs.expo.dev/versions/latest/config/metro/#source-map-debug-id))
    > - Rolldown ([`output.sourcemapDebugIds` option](https://github.com/rolldown/rolldown/pull/2516))
    >
    > The following Source Map **Consumers/Debuggers** have implemented Debug IDs:
    >
    > - Sentry.io ([Docs](https://docs.sentry.io/platforms/javascript/sourcemaps/troubleshooting_js/artifact-bundles/#artifact-bundles))
    - https://sourcemaps.info/
    - > Online Source Mapper
    - > This tool lets you apply source maps to stacktraces.
    > Paste the minified stacktrace or just a url with a line number and a column number, and it will look up the original.
    - https://sourcemaps.info/spec.html
    - > Source Map Revision 3 Proposal
    - > **Note!** This is a copy of [the standard](https://docs.google.com/document/d/1U1RGAehQwRypUTovF1KRlpiOFze0b-_2gc6fAH0KY0k/edit#heading=h.djovrt4kdvga). Any discrepancies should be resolved in favour of the Google Doc.
    - https://sourcemaps.info/spec.html#h.qz3o9nc69um5
    - > Proposed Format
    - > 7: `"names": ["src", "maps", "are", "fun"],`
    - > Line 7: A list of symbol names used by the “mappings” entry.
    - https://github.com/microsoft/TypeScript/issues/59905
    - > TypeScript 5.7 Iteration Plan
    - > Investigate Support for Sourcemap v4
    - https://github.com/microsoft/TypeScript/issues/46695
    - > Extend Source Maps to support post-hoc debugging
    - > I would like to propose that TypeScript create an experimental source map extension (version 4). The purpose of this would be to improve the ability to apply Source Map data to stack traces and to have the Source Map contain everything needed to go from a minified stack trace to an unminified stack trace without using function name guessing.
    - > I want to use this to improve in-production debugging experiences, specifically, to improve stack traces, particularly those that exist after minification.
    >
    > Presently, to work around this, we need to do one of two things:
    >
    > - Manually reconstruct the stack by inspecting the source contents (as mentioned above)
    > - Or, by writing code that "guesses" the function name by walking backwards from the location the Mapping produces. (This is the mechanism that stacktrace-gps uses, although it often doesn't work for TypeScript sources because it doesn't recognize type annotations).
    - https://github.com/stacktracejs/stacktrace-gps
    - > Turns partial code location into precise code location
    - > This library accepts a code location (in the form of a StackFrame) and returns a new StackFrame with a more accurate location (using source maps) and guessed function names.
    - > Proposed changes to the Source Map spec
    - > Substantively: Only the `mappings` field would be altered, and would be altered by adding the 6th field. The complete section is included below:
    - > 6. If present, the zero-based index into the "names" list associated with the call stack of this segment. This field is a base 64 VLQ relative to the previous occurrence of this field, unless this is the first occurrence of this field, in which case the whole value is represented.
    - > In addition, the `version` field of the spec should be bumped to 4.
    - https://github.com/microsoft/TypeScript/issues/46695#issuecomment-962397374
    - > The [source map spec](https://sourcemaps.info/spec.html) outlines as many as 6 data that are contained when parsing the `mappings` field: generated line, generated column, source line, source column, index into `sources` (to find the name of the source file), and index into `names` which represents the token at the current point of execution. (This is to allow for variable renaming to still function within the live debugger).
    - https://github.com/microsoft/TypeScript/issues/46695#issuecomment-962439379
    - > Hello friends, this is a valuable topic. I'm pleased to see interest in progressing the sourcemap spec.
    >
    > We're currently using the [`pasta-sourcemaps`](https://github.com/bloomberg/pasta-sourcemaps) extension (originally created by [@ldarbi](https://github.com/ldarbi)) to deal with this. Pasta stands for _"Pretty (and) Accurate Stack Trace Analysis"_.
    >
    > The extension solves the exact use-case outlined above. It permits accurate decoding of function names without guessing and without the need to consult the original source files. The sourcemap tells you everything.
    >
    > [`pasta-sourcemaps`](https://github.com/bloomberg/pasta-sourcemaps) works by adding an additional field to the sourcemap called `"x_com_bloomberg_sourcesFunctionMappings"` which contains a series of VLQ-encoded mappings that identify named function spans in the original source. So you first use a pre-existing decoding function (that reads `"mappings"`) to identify a source position `(file, line, column)`, and then use that position to lookup the original function name in the dedicated list of function spans. Here's [the spec.](https://github.com/bloomberg/pasta-sourcemaps/blob/master/spec.md)
    >
    > [`pasta-sourcemaps`](https://github.com/bloomberg/pasta-sourcemaps) has been used in production as part of the Bloomberg Terminal's crash stack telemetry for over two years, handling millions of stacks. It supports `.js/.jsx/.ts/.tsx` source files, is mature, and has a nifty logo. We aim to keep it up to date with the latest TypeScript version - though it is temporarily lagging on TS 4.3.
    >
    > We had early discussions with [@bcoe](https://github.com/bcoe) about getting pasta support into Node but never got around to acting on it. Since then, Node gained a best-effort (guessing) implementation, but would still benefit from something like this to increase reliability.
    >
    > Please take a look at the approach for inspiration. It would be interesting to compare extending the `"mappings"` vs adding an extra field. We've not been in control of the tools (like TypeScript) that generate `"mappings"` and therefore it seemed easiest to add an extra field that can be guaranteed to be a complete record of all the necessary function ranges. Whereas the sourcemap spec itself says nothing about completeness of `"mappings"` - it's up to the specific encoder (e.g. in TypeScript, or in Terser) to decide when to emit them, so different tools make different decisions on the fidelity of the points.
    >
    > So thank you [@robpaveza](https://github.com/robpaveza) for raising this issue! I'd love to see this problem more widely solved. Beyond stack trace decoding, this information could also be used in DevTools, e.g. the VSCode debugger's call stack could use it to show the original function name when debugging minified code.
    - https://github.com/bloomberg/pasta-sourcemaps
    - > @bloomberg/pasta-sourcemaps
    - > Pretty (and) Accurate Stack Trace Analysis is an extension to the JavaScript source map format that allows for accurate function name decoding.
    - > `pasta`, or Pretty (and) Accurate Stack Trace Analysis, is an implementation of an extension to the source map format that allows for accurate function name decoding. It allows you to extract function-related metadata from a source file and encode it into a source map, as well as decode a pasta-enriched source map to query enclosing function names for a given location.
    - > Today, [source maps](https://docs.google.com/document/d/1U1RGAehQwRypUTovF1KRlpiOFze0b-_2gc6fAH0KY0k/edit?hl=en_US&pli=1&pli=1) already provide the ability to produce accurate _locations_ (filename, line number, column number) in a crash stack, but not _enclosing function names_. This hinders debugging and confuses automatic crash stack consolidation.
    >
    > `pasta` extends the source map format with a `x_com_bloomberg_sourcesFunctionMappings` field to allow for accurate function name decoding. See [spec.md](https://github.com/bloomberg/pasta-sourcemaps/blob/main/spec.md) to learn more about the `pasta` format.
    - https://github.com/bloomberg/pasta-sourcemaps/blob/main/spec.md
    - > pasta, Pretty (and) Accurate Stack Trace Analysis
    > pasta extends the source map format with a `x_com_bloomberg_sourcesFunctionMappings` field to allow for accurate function name decoding.
    - https://github.com/bloomberg/pasta-sourcemaps/blob/main/spec.md#proposed-source-map-extension-x_com_bloomberg_sourcesfunctionmappings
    - > **Proposed Source Map Extension: `x_com_bloomberg_sourcesFunctionMappings`**
    >
    > `x_com_bloomberg_sourcesFunctionMappings` is a new source map field that describes, for a given range in a source, the enclosing function's name.
    >
    > This implies a two-phase decode. First, the source location is decoded, the same way it is done today. Second, the function name is decoded using the source location.
    >
    > The existing `names` array is used to store function names.
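The second phase of that decode amounts to a range lookup: given an already-decoded original position, find the innermost function span containing it. The data shapes below are illustrative only — the real `pasta` extension stores these spans VLQ-encoded in `x_com_bloomberg_sourcesFunctionMappings`, with names indexed into the `names` array:

```javascript
// Sketch of phase two: pick the innermost (start, end, name) range that
// contains the decoded original position, so nested functions win over
// their enclosing function.
function enclosingFunction(ranges, line, column) {
  const contains = (r) =>
    (line > r.startLine || (line === r.startLine && column >= r.startColumn)) &&
    (line < r.endLine || (line === r.endLine && column <= r.endColumn));

  let best = null;
  for (const r of ranges) {
    if (!contains(r)) continue;
    // The latest-starting containing range is the innermost one.
    if (
      !best ||
      r.startLine > best.startLine ||
      (r.startLine === best.startLine && r.startColumn >= best.startColumn)
    ) {
      best = r;
    }
  }
  return best ? best.name : null;
}
```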
    - https://github.com/bloomberg/pasta-sourcemaps#api
    - > **API**
    >
    > `@bloomberg/pasta-sourcemaps` exposes three utilities:
    >
    > - [parser](https://github.com/bloomberg/pasta-sourcemaps/blob/main/src/parser.ts)
    > - [encoder](https://github.com/bloomberg/pasta-sourcemaps/blob/main/src/encoder.ts)
    > - [decoder](https://github.com/bloomberg/pasta-sourcemaps/blob/main/src/decoder.ts)
    >
    > The parser and the encoder are normally used in conjunction to parse a source file and encode the resulting function descriptions into a source map.
    >
    > The decoder takes a pasta-enriched sourcemap and gives back enclosing function names for a given source file, line and column location.
    >
    > To read the full API documentation please visit the [GitHub Pages](https://bloomberg.github.io/pasta-sourcemaps/)
    - https://github.com/Rich-Harris/vlq
    - > vlq.js
    > Convert integers to a Base64-encoded VLQ string, and vice versa.
    - > Generate, and decode, base64 VLQ mappings for sourcemaps and other uses
    - > What is a VLQ string?
    >
    > A variable-length quantity is a compact way of encoding large integers in text (i.e. in situations where you can't transmit raw binary data). An integer represented as digits will always take up more space than the equivalent VLQ representation
    - > Adapted from murzwin.com/base64vlq.html by Alexander Pavlov.
    - https://www.murzwin.com/base64vlq.html
    - > Base64 VLQ Codec (Coder/Decoder) And Sourcemap V3 / ECMA-426 Mappings Parser
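The `mappings` fields discussed throughout this section are built on exactly this encoding. A compact, self-contained sketch of a Base64 VLQ codec as used by source maps — sign bit in the least significant bit, 5 payload bits per digit, and bit 6 (value 32) as the continuation flag:

```javascript
// Base64 alphabet used by the source map spec.
const B64 = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

function encodeVlq(value) {
  // Move the sign into the least significant bit.
  let vlq = value < 0 ? (-value << 1) | 1 : value << 1;
  let out = '';
  do {
    let digit = vlq & 0b11111; // lowest 5 bits
    vlq >>>= 5;
    if (vlq > 0) digit |= 0b100000; // continuation bit: more digits follow
    out += B64[digit];
  } while (vlq > 0);
  return out;
}

function decodeVlq(str) {
  // Decodes a run of VLQ values, e.g. one mappings segment: "AAgBC" -> [0, 0, 16, 1]
  const result = [];
  let value = 0;
  let shift = 0;
  for (const char of str) {
    const digit = B64.indexOf(char);
    value |= (digit & 0b11111) << shift;
    if (digit & 0b100000) {
      shift += 5; // continuation: accumulate the next 5 bits
    } else {
      const negative = value & 1;
      value >>>= 1;
      result.push(negative ? -value : value);
      value = 0;
      shift = 0;
    }
  }
  return result;
}
```

Note that within `mappings`, each decoded value is relative to the previous value of the same field — the codec above only handles the integer encoding, not that delta layer.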
    - https://github.com/microsoft/TypeScript/issues/46695#issuecomment-964509602
    - > I've been mulling over the options here, and I think that the most interesting possibility here is to leverage what you've put together with pasta-sourcemaps. To that end, I've authored an explainer [here](https://github.com/MicrosoftEdge/MSEdgeExplainers/pull/538) - see [the direct explainer link](https://github.com/MicrosoftEdge/MSEdgeExplainers/blob/user/ropaveza/sourcemaps-v3.1/DevTools/SourceMaps/explainer.md) - which borrows inspiration from `pasta-sourcemaps` but splits the function names into a second array, rather than into the `names` array. I did this primarily so that for the scenario:
    >
    > `file.ts -> file.js -> file.min.js`
    >
    > The minifier doesn't need to know anything about these fields, it just needs to pass them through. For tools like Rollup or Webpack, it's also fairly trivial: if they have access to the original source files anyway, the originally-transpiled values can just move into the `scopes` field in the same way.
    - https://github.com/MicrosoftEdge/MSEdgeExplainers/pull/538
    - > Source Maps v4 Proposal
    - > This public enhancement proposal updates the existing Source Maps spec to improve stack trace decoding capabilities.
    - > The [previous version of the Source Maps specification](https://sourcemaps.info/spec.html) is reasonably thorough in supporting live debugging. However, one shortcoming that the specification lacks is that it isn't obvious how to decode the names of functions on the call stack. Several implementations (including Node.js and a library [stacktrace-gps](https://github.com/stacktracejs/stacktrace-gps)) use name guessing, and [pasta-sourcemaps](https://github.com/bloomberg/pasta-sourcemaps) uses precompilation and a custom extension to reconstruct a stack trace.
    - > Proposed Solution
    >
    > We want to adopt the general approach taken by the `pasta-sourcemaps` ("Pretty (and) Accurate Stack Trace Analysis") library. However, instead of adding to the `names` field, we will add two fields.
    >
    > - Add two additional fields to the source map: `scopes` and `scopeNames`
    > - The `scopes` field should be a list of ranges and pointers into `scopeNames`
    > - The ranges in the field should always point to **Original Source** locations.
    - https://github.com/webpack/webpack/issues/14996
    - > Proposal: Source maps v4
    - https://github.com/webpack/webpack/issues/14996#issuecomment-996748709
    - > The above is core spec discussion rather than anything specific to webpack, so maybe we should talk about this in [the proposal repo](https://github.com/MicrosoftEdge/MSEdgeExplainers/pull/538)? This also highlights the need for us to build recorded rationale for each key design decision so we can get more buy-in and avoid explaining these same things across repos. I think [@ldarbi](https://github.com/ldarbi) can help with that.
    >
    > Maybe it would also be good to chat about this in [the TC39 Tooling Group](https://github.com/tc39/js-outreach-groups) run by [@romulocintra](https://github.com/romulocintra). That brings together the kind of folk who would be interested in this. I think the next meeting will be mid-January (the github is a little out of date).
    - https://github.com/vitejs/vite/issues/6672
    - > Source maps v4 proposal support
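
    The `mappings` field that these codecs handle is a chain of Base64 VLQ segments. As a rough illustration of the encoding only (not a full ECMA-426 parser, which also has to split on `;`/`,` and track the running offsets), a minimal decoder might look like:

    ```javascript
    // Minimal Base64 VLQ decoder, as used in the source map "mappings" field.
    // Each base64 character carries 5 value bits plus a continuation bit (0x20);
    // the least significant bit of the assembled value is the sign bit.
    const BASE64 =
      "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    function decodeVLQ(segment) {
      const values = [];
      let value = 0;
      let shift = 0;
      for (const char of segment) {
        const digit = BASE64.indexOf(char);
        if (digit === -1) throw new Error(`Invalid base64 character: ${char}`);
        value += (digit & 31) << shift;
        if (digit & 32) {
          shift += 5; // continuation bit set: more 5-bit groups follow
        } else {
          const negative = value & 1; // sign lives in the lowest bit
          value >>>= 1;
          values.push(negative ? -value : value);
          value = 0;
          shift = 0;
        }
      }
      return values;
    }

    // "IAAM" decodes to [4, 0, 0, 6] -- all four fields are relative deltas
    // (generated column, source index, original line, original column).
    ```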

    ### Visualisation/etc

    - https://github.com/keeyipchan/esgoggles
  6. @0xdevalias revised this gist Mar 12, 2025. 1 changed file with 8 additions and 5 deletions.
    13 changes: 8 additions & 5 deletions _deobfuscating-unminifying-obfuscated-web-app-code.md
    Original file line number Diff line number Diff line change
    @@ -2040,15 +2040,18 @@ These are private chat links, so won't work for others, and are included here on
    - https://github.com/0xdevalias
    - https://gist.github.com/0xdevalias
    - https://github.com/0xdevalias/chatgpt-source-watch : Analyzing the evolution of ChatGPT's codebase through time with curated archives and scripts.
    - [Reverse engineering ChatGPT's frontend web app + deep dive explorations of the code (0xdevalias gist)](https://gist.github.com/0xdevalias/4ac297ee3f794c17d0997b4673a2f160#reverse-engineering-chatgpts-frontend-web-app--deep-dive-explorations-of-the-code)
    - [Reverse Engineering Webpack Apps (0xdevalias gist)](https://gist.github.com/0xdevalias/8c621c5d09d780b1d321bfdb86d67cdd#reverse-engineering-webpack-apps)
    - [Reverse engineering ChatGPT's frontend web app + deep dive explorations of the code (0xdevalias' gist)](https://gist.github.com/0xdevalias/4ac297ee3f794c17d0997b4673a2f160#reverse-engineering-chatgpts-frontend-web-app--deep-dive-explorations-of-the-code)
    - [Reverse Engineering Webpack Apps (0xdevalias' gist)](https://gist.github.com/0xdevalias/8c621c5d09d780b1d321bfdb86d67cdd#reverse-engineering-webpack-apps)
    - [React Internals (subsection)](https://gist.github.com/0xdevalias/8c621c5d09d780b1d321bfdb86d67cdd#react-internals)
    - [Vue Internals (subsection)](https://gist.github.com/0xdevalias/8c621c5d09d780b1d321bfdb86d67cdd#vue-internals)
    - [Angular Internals (subsection)](https://gist.github.com/0xdevalias/8c621c5d09d780b1d321bfdb86d67cdd#angular-internals)
    - [React Server Components, Next.js v13+, and Webpack: Notes on Streaming Wire Format (`__next_f`, etc) (0xdevalias' gist)](https://gist.github.com/0xdevalias/ac465fb2f7e6fded183c2a4273d21e61#react-server-components-nextjs-v13-and-webpack-notes-on-streaming-wire-format-__next_f-etc)
    - [Fingerprinting Minified JavaScript Libraries / AST Fingerprinting / Source Code Similarity / Etc (0xdevalias' gist)](https://gist.github.com/0xdevalias/31c6574891db3e36f15069b859065267#fingerprinting-minified-javascript-libraries--ast-fingerprinting--source-code-similarity--etc)
    - [JavaScript Web App Reverse Engineering - Module Identification (0xdevalias' gist)](https://gist.github.com/0xdevalias/28c18edfc17606f09cf413f97e404a60#javascript-web-app-reverse-engineering---module-identification)
    - [Reverse Engineered Webpack Tailwind-Styled-Component (0xdevalias' gist)](https://gist.github.com/0xdevalias/916e4ababd3cb5e3470b07a024cf3125#reverse-engineered-webpack-tailwind-styled-component)
    - [Bypassing Cloudflare, Akamai, etc (0xdevalias gist)](https://gist.github.com/0xdevalias/b34feb567bd50b37161293694066dd53#bypassing-cloudflare-akamai-etc)
    - [Debugging Electron Apps (and related memory issues) (0xdevalias gist)](https://gist.github.com/0xdevalias/428e56a146e3c09ec129ee58584583ba#debugging-electron-apps-and-related-memory-issues)
    - [devalias' Beeper CSS Hacks (0xdevalias gist)](https://gist.github.com/0xdevalias/3d2f5a861335cc1277b21a29d1285cfe#beeper-custom-theme-styles)
    - [Bypassing Cloudflare, Akamai, etc (0xdevalias' gist)](https://gist.github.com/0xdevalias/b34feb567bd50b37161293694066dd53#bypassing-cloudflare-akamai-etc)
    - [Debugging Electron Apps (and related memory issues) (0xdevalias' gist)](https://gist.github.com/0xdevalias/428e56a146e3c09ec129ee58584583ba#debugging-electron-apps-and-related-memory-issues)
    - [devalias' Beeper CSS Hacks (0xdevalias' gist)](https://gist.github.com/0xdevalias/3d2f5a861335cc1277b21a29d1285cfe#beeper-custom-theme-styles)
    - [Reverse Engineering Golang (0xdevalias' gist)](https://gist.github.com/0xdevalias/4e430914124c3fd2c51cb7ac2801acba#reverse-engineering-golang)
    - [Reverse Engineering on macOS (0xdevalias' gist)](https://gist.github.com/0xdevalias/256a8018473839695e8684e37da92c25#reverse-engineering-on-macos)
    - [Editor Frameworks and Collaborative Editing/Conflict Resolution Tech (0xdevalias' gist)](https://gist.github.com/0xdevalias/2fc3d66875dcc76d5408ce324824deab#editor-frameworks-and-collaborative-editingconflict-resolution-tech)
  7. @0xdevalias revised this gist Mar 12, 2025. 1 changed file with 2 additions and 0 deletions.
    2 changes: 2 additions & 0 deletions _deobfuscating-unminifying-obfuscated-web-app-code.md
    @@ -2037,6 +2037,8 @@ These are private chat links, so won't work for others, and are included here on

    ### My Other Related Deepdive Gist's and Projects

    - https://github.com/0xdevalias
    - https://gist.github.com/0xdevalias
    - https://github.com/0xdevalias/chatgpt-source-watch : Analyzing the evolution of ChatGPT's codebase through time with curated archives and scripts.
    - [Reverse engineering ChatGPT's frontend web app + deep dive explorations of the code (0xdevalias gist)](https://gist.github.com/0xdevalias/4ac297ee3f794c17d0997b4673a2f160#reverse-engineering-chatgpts-frontend-web-app--deep-dive-explorations-of-the-code)
    - [Reverse Engineering Webpack Apps (0xdevalias gist)](https://gist.github.com/0xdevalias/8c621c5d09d780b1d321bfdb86d67cdd#reverse-engineering-webpack-apps)
  8. @0xdevalias revised this gist Mar 11, 2025. 1 changed file with 353 additions and 1 deletion.
    354 changes: 353 additions & 1 deletion fingerprinting-minified-javascript-libraries.md
    @@ -10,6 +10,11 @@
    - [Issue 15: Explore using embeddings/similar to identify/track similar chunks/modules even when renamed](#issue-15-explore-using-embeddingssimilar-to-identifytrack-similar-chunksmodules-even-when-renamed)
    - [On `j4k0xb/webcrack`](#on-j4k0xbwebcrack)
    - [Issue 21: rename short identifiers](#issue-21-rename-short-identifiers)
    - [Issue 62: add smart-rename rule from wakaru](#issue-62-add-smart-rename-rule-from-wakaru)
    - [Issue 143: `[plugin]` Add support for `data-sentry-component` / `data-sentry-element` / `data-sentry-source-file` (from `@sentry/babel-plugin-component-annotate`)](#issue-143-plugin-add-support-for-data-sentry-component--data-sentry-element--data-sentry-source-file-from-sentrybabel-plugin-component-annotate)
    - [Issue 151: `[plugin]` plugin to support WordPress Gutenberg specific blocks features (including how it injects `window.React`, `window.wp.element`, etc) within JSX decompilation](#issue-151-plugin-plugin-to-support-wordpress-gutenberg-specific-blocks-features-including-how-it-injects-windowreact-windowwpelement-etc-within-jsx-decompilation)
    - [Issue 152: `[plugin]` plugin to support unminifying `goober` CSS-in-JS library patterns + related JSX decompilation](#issue-152-plugin-plugin-to-support-unminifying-goober-css-in-js-library-patterns--related-jsx-decompilation)
    - [Issue 154: "stable" identifier demangling](#issue-154-stable-identifier-demangling)
    - [On `pionxzh/wakaru`](#on-pionxzhwakaru)
    - [Issue 34: support `un-mangle` identifiers](#issue-34-support-un-mangle-identifiers)
    - [Issue 41: Module detection](#issue-41-module-detection)
    @@ -165,6 +170,8 @@ Source: https://chat.openai.com/c/d9b7b64f-aa93-474e-939f-79e376e6d375
    ## Thoughts / comments as I've articulated them elsewhere

    This is mostly making references to issues I have opened, or comments I have written in them.. but sometimes it also just includes an interesting issue within the same theme as the types of things I am interested about here as well.

    ### On `0xdevalias/chatgpt-source-watch`

    #### Issue 15: Explore using embeddings/similar to identify/track similar chunks/modules even when renamed
    @@ -506,6 +513,351 @@ Source: https://chat.openai.com/c/d9b7b64f-aa93-474e-939f-79e376e6d375
    >
    > _Originally posted by @0xdevalias in https://github.com/j4k0xb/webcrack/issues/21#issuecomment-1822262649_
    #### Issue 62: add smart-rename rule from wakaru
    - https://github.com/j4k0xb/webcrack/issues/62
    - > add smart-rename rule from wakaru
    - https://github.com/j4k0xb/webcrack/pull/63
    - > feat: rename destructuring
    - https://github.com/j4k0xb/webcrack/pull/100
    - > feat: configurable smart rename
    - https://github.com/pionxzh/wakaru/blob/main/packages/unminify/README.md#smart-rename
    - https://github.com/pionxzh/wakaru/blob/main/packages/unminify/src/transformations/smart-rename.ts
    - `handleDestructuringRename`
    - `handleReactRename`
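
    As a rough illustration of the destructuring-based rename idea: when a mangled variable is immediately destructured, the destructured keys can be reused to pick a friendlier name for it. The real transforms in `wakaru`/`webcrack` do this on a Babel AST with proper scope tracking; the string-level function and the key-joining naming scheme below are invented purely for illustration.

    ```javascript
    // Toy string-level sketch of "smart rename": derive a better name for a
    // mangled variable from the keys it is destructured into. Only handles the
    // simplest `const { ... } = x;` pattern, and renames with a naive
    // whole-word replace rather than scope-aware binding renames.
    function smartRenameDestructured(code) {
      const match = code.match(/const\s*\{([^}]+)\}\s*=\s*([A-Za-z_$][\w$]*)\s*;/);
      if (!match) return code;
      const keys = match[1].split(",").map((k) => k.trim()).filter(Boolean);
      const oldName = match[2];
      // e.g. keys ["title", "onClick"] -> "titleOnClick" (invented scheme)
      const newName = keys
        .map((k, i) => (i === 0 ? k : k[0].toUpperCase() + k.slice(1)))
        .join("");
      return code.replace(new RegExp(`\\b${oldName}\\b`, "g"), newName);
    }
    ```

    For example, `function f(e) { const { title, onClick } = e; return e; }` would have `e` renamed to `titleOnClick` throughout.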
    #### Issue 143: `[plugin]` Add support for `data-sentry-component` / `data-sentry-element` / `data-sentry-source-file` (from `@sentry/babel-plugin-component-annotate`)
    See:
    - [On `pionxzh/wakaru`](#on-pionxzhwakaru)
    - [Issue 140: `[smart-rename]` Add support for `data-sentry-component` / `data-sentry-element` / `data-sentry-source-file` (from `@sentry/babel-plugin-component-annotate`)](#issue-140-smart-rename-add-support-for-data-sentry-component--data-sentry-element--data-sentry-source-file-from-sentrybabel-plugin-component-annotate)
    Note: This was crossposted to the following issues:
    - https://github.com/j4k0xb/webcrack/issues/143
    - https://github.com/pionxzh/wakaru/issues/140
    - https://github.com/jehna/humanify/issues/350
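
    To sketch why these annotations help with renaming: the `data-sentry-*` attribute names come from the Sentry plugin, but their string literals survive minification, so a tool can mine the bundle for them and map mangled component functions back to their original names and source files. The extraction function and regex below are invented for illustration and are not how `webcrack`/`wakaru` would implement it.

    ```javascript
    // Toy sketch: @sentry/babel-plugin-component-annotate injects props such as
    // data-sentry-component / data-sentry-element / data-sentry-source-file into
    // rendered elements. This naively scans a minified bundle for those literals.
    function extractSentryAnnotations(minifiedSource) {
      const re =
        /"data-sentry-component"\s*:\s*"([^"]+)"(?:[^}]*?"data-sentry-source-file"\s*:\s*"([^"]+)")?/g;
      const found = [];
      for (const m of minifiedSource.matchAll(re)) {
        found.push({ component: m[1], sourceFile: m[2] ?? null });
      }
      return found;
    }
    ```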
    #### Issue 151: `[plugin]` plugin to support WordPress Gutenberg specific blocks features (including how it injects `window.React`, `window.wp.element`, etc) within JSX decompilation
    > Mostly creating this based on the exploration I did in https://github.com/j4k0xb/webcrack/issues/10#issuecomment-2693645060 before I realised it was likely unrelated to more core React / JSX handling.
    >
    > I suspect the bulk of this is niche enough that it wouldn't make sense to include in core, but would be a good candidate for a plugin as per https://github.com/j4k0xb/webcrack/issues/143#issuecomment-2692345330 / https://github.com/j4k0xb/webcrack/issues/143#issuecomment-2692517232
    >
    > This is also aligned to `wakaru`'s proposed module-detection feature:
    >
    > - https://github.com/pionxzh/wakaru/issues/41
    >
    > @j4k0xb I also don't expect this to be something you create; but figured since I already did the deeper exploration in this repo, I may as well create a standalone reference point for it, even if this issue ends up getting closed.
    >
    > ---
    >
    > From my prior exploration:
    >
    > > **Edit 2:** Looking a bit deeper, I think `window.wp.element` relates more specifically to how the Wordpress Gutenberg editor may inject things:
    > >
    > > - https://github.com/search?type=code&q=window.wp.element
    > > - https://wordpress.org/gutenberg/
    > > - https://github.com/WordPress/gutenberg
    > > - https://developer.wordpress.org/block-editor/reference-guides/packages/packages-create-block/
    > > - https://github.com/WordPress/gutenberg/tree/trunk/packages/create-block
    > > - https://github.com/WordPress/gutenberg/blob/2103d5021066593f25f2baae9038b0cf23372b7f/packages/create-block/lib/templates/es5/index.js.mustache#L9-L14
    > > - We can see `wp.element.createElement` / etc usage here
    > > - And where it's reading from `window.wp` here
    > > - https://github.com/WordPress/gutenberg/blob/2103d5021066593f25f2baae9038b0cf23372b7f/packages/create-block/lib/templates/es5/index.js.mustache#L71
    > >
    > > Specifically in the 'plain JS' usage:
    > >
    > > - https://github.com/hrsetyono/gutenberg-tutorial/blob/476f19ee0413ebf719df8981dc982b6aa5b64348/README.md?plain=1#L19-L31
    > > - https://developer.wordpress.org/block-editor/getting-started/fundamentals/javascript-in-the-block-editor/#javascript-without-a-build-process
    > > - > When you opt out of a build process, you interact directly with WordPress’s [JavaScript APIs](https://developer.wordpress.org/block-editor/reference-guides/packages/) through the global `wp` object.
    > > - https://developer.wordpress.org/block-editor/reference-guides/packages/#using-the-packages-via-wordpress-global
    > > - > JavaScript packages are available as a registered script in WordPress and can be accessed using the `wp` global variable.
    > > - https://www.npmjs.com/org/wordpress
    > >
    > > So using `window.wp.element` would map to a version of `@wordpress/element`, provided by the backend through the `window.wp` global:
    > >
    > > - https://github.com/WordPress/gutenberg/tree/trunk/packages/element
    > > - https://developer.wordpress.org/block-editor/reference-guides/packages/packages-element/
    > >
    > > Whereas in the non-static version, we can see that `registerBlockType` directly refers to the imported `Edit` / `Save`, which seem to handle their own imports, and/or use a JSX transform defined elsewhere in the build chain:
    > >
    > > - https://github.com/WordPress/gutenberg/blob/2103d5021066593f25f2baae9038b0cf23372b7f/packages/create-block/lib/templates/block/index.js.mustache#L26-L43
    > > - https://github.com/WordPress/gutenberg/blob/2103d5021066593f25f2baae9038b0cf23372b7f/packages/create-block/lib/templates/block/edit.js.mustache
    > > - https://github.com/WordPress/gutenberg/blob/2103d5021066593f25f2baae9038b0cf23372b7f/packages/create-block/lib/templates/block/save.js.mustache
    > > - https://developer.wordpress.org/block-editor/getting-started/fundamentals/javascript-in-the-block-editor/#javascript-with-a-build-process
    > >
    > > We can also see that the `window.React` global might come from Wordpress Gutenberg as well, as we can see from this example code that injects it:
    > >
    > > - https://github.com/search?q=repo%3AWordPress%2Fgutenberg%20window.React&type=code
    > > - https://github.com/WordPress/gutenberg/tree/2103d5021066593f25f2baae9038b0cf23372b7f/packages/editor#blockcontrols
    > > - https://github.com/WordPress/gutenberg/tree/2103d5021066593f25f2baae9038b0cf23372b7f/packages/editor#richtext
    > >
    > > We also get another clue here, where again `window.React` is injected into the function, and then a followup note to that:
    > >
    > > - https://github.com/WordPress/gutenberg/blob/2103d5021066593f25f2baae9038b0cf23372b7f/docs/how-to-guides/plugin-sidebar-0.md#step-1-get-a-sidebar-up-and-running
    > > - > For this code to work, those utilities need to be available in the browser, so you must specify `wp-plugins`, `wp-editor`, and `react` as dependencies of your script.
    > > - > Here is the PHP code to register your script and specify the dependencies:
    > > >
    > > > ```php
    > > > function sidebar_plugin_register() {
    > > > wp_register_script(
    > > > 'plugin-sidebar-js',
    > > > plugins_url( 'plugin-sidebar.js', __FILE__ ),
    > > > array( 'wp-plugins', 'wp-editor', 'react' )
    > > > );
    > > > }
    > > > add_action( 'init', 'sidebar_plugin_register' );
    > > > ```
    > >
    > > So I guess, similar to the comment made in https://github.com/j4k0xb/webcrack/issues/143#issuecomment-2692345330, the deeper specifics of this may belong in a [separate plugin](https://github.com/j4k0xb/webcrack/issues/143#issuecomment-2692517232) instead of `webcrack` core.
    > >
    > > Though.. I do wonder if the `window.React` (assigned to a variable) usage is generic enough that it might make sense to include in core?
    > >
    > > - https://github.com/search?type=code&q=window.React
    > >
    > > _Originally posted by @0xdevalias in https://github.com/j4k0xb/webcrack/issues/10#issuecomment-2693645060_
    >
    > ## See Also
    >
    > - https://github.com/j4k0xb/webcrack/issues/152
    >
    > _Originally posted by @0xdevalias in https://github.com/j4k0xb/webcrack/issues/151#issue-2890824882_
    #### Issue 152: `[plugin]` plugin to support unminifying `goober` CSS-in-JS library patterns + related JSX decompilation
    > Mostly creating this based on the exploration I did in https://github.com/j4k0xb/webcrack/issues/10#issuecomment-2693645060 in case there is no generic way to solve that in core, and it needs to be a more library specific plugin solution as per https://github.com/j4k0xb/webcrack/issues/143#issuecomment-2692345330 / https://github.com/j4k0xb/webcrack/issues/143#issuecomment-2692517232
    >
    > This is also aligned to `wakaru`'s proposed module-detection feature:
    >
    > - https://github.com/pionxzh/wakaru/issues/41
    >
    > @j4k0xb I also don't expect this to be something you create; but figured since I already did the deeper exploration in this repo, I may as well create a standalone reference point for it, even if this issue ends up getting closed.
    >
    > ---
    >
    > From my prior exploration:
    >
    > > **Edit 3:** Looking at the code from https://github.com/j4k0xb/webcrack/issues/10#issuecomment-2692599211 again, I think there is another case where JSX-like things may not be currently getting decompiled properly, which is syntax like this:
    > >
    > > ```js
    > > /* ..snip.. */
    > > /* 541 */ var Z = h("div")`
    > > /* 542 */ display: flex;
    > > /* 543 */ justify-content: center;
    > > /* 544 */ margin: 4px 10px;
    > > /* 545 */ color: inherit;
    > > /* 546 */ flex: 1 1 auto;
    > > /* 547 */ white-space: pre-line;
    > > /* 548 */ `;
    > > /* ..snip.. */
    > > /* 567 */ let c = t.createElement(Z, {
    > > /* 568 */ ...e.ariaProps
    > > /* 569 */ }, g(e.message, e));
    > > /* ..snip.. */
    > > ```
    > >
    > > Looking higher up in the file, we see the definition for `h`:
    > >
    > > ```js
    > > /* ..snip.. */
    > > /* 106 */ function h(e, t) {
    > > /* 107 */ let l = this || {};
    > > /* 108 */ return function () {
    > > /* 109 */ let i = arguments;
    > > /* 110 */ function n(a, o) {
    > > /* 111 */ let c = Object.assign({}, a);
    > > /* 112 */ let s = c.className || n.className;
    > > /* 113 */ l.p = Object.assign({
    > > /* 114 */ theme: p && p()
    > > /* 115 */ }, c);
    > > /* 116 */ l.o = / *go\d+/.test(s);
    > > /* 117 */ c.className = m.apply(l, i) + (s ? " " + s : "");
    > > /* 118 */ if (t) {
    > > /* 119 */ c.ref = o;
    > > /* 120 */ }
    > > /* 121 */ let r = e;
    > > /* 122 */ if (e[0]) {
    > > /* 123 */ r = c.as || e;
    > > /* 124 */ delete c.as;
    > > /* 125 */ }
    > > /* 126 */ if (w && r[0]) {
    > > /* 127 */ w(c);
    > > /* 128 */ }
    > > /* 129 */ return y(r, c);
    > > /* 131 */ }
    > > /* 132 */ if (t) {
    > > /* 133 */ return t(n);
    > > /* 134 */ } else {
    > > /* 135 */ return n;
    > > /* 136 */ }
    > > /* 137 */ };
    > > /* 138 */ }
    > > /* ..snip.. */
    > > ```
    > >
    > > And searching GitHub code for `/ *go\d+/.test` leads us to the
    > > - https://github.com/search?type=code&q=%22%2F+*go%5Cd%2B%2F.test%22
    > > - https://github.com/cristianbote/goober/blob/5f0b43976fac214262c2c8921b1691fc4729ec98/src/styled.js#L20-L71
    > > - https://github.com/cristianbote/goober
    > > - > goober, a less than 1KB css-in-js solution
    > > - https://goober.rocks/
    > >
    > > Which we can then also see additional confirmation for in earlier code as well:
    > >
    > > - https://github.com/cristianbote/goober/blob/5f0b43976fac214262c2c8921b1691fc4729ec98/src/core/get-sheet.js#L11-L25
    > >
    > > ```js
    > > /* ..snip.. */
    > > /* 6 */ let i = e => typeof window == "object" ? ((e ? e.querySelector("#_goober") : window._goober) || Object.assign((e || document.head).appendChild(document.createElement("style")), {
    > > /* 7 */ innerHTML: " ",
    > > /* 8 */ id: "_goober"
    > > /* 9 */ })).firstChild : e || l;
    > > /* ..snip.. */
    > > ```
    > >
    > > Which seems to be used across a number of libs/projects:
    > >
    > > - https://github.com/search?type=code&q=%22%23_goober%22+OR+%22window._goober%22
    > >
    > > Sometimes inlined directly:
    > >
    > > - https://github.com/KevinVandy/tanstack-query/blob/69476f0ce5778afad4520ed42485b4110993afed/packages/query-devtools/src/utils.tsx#L305-L323
    > >
    > > This may end up being another case where, similar to the comment made in https://github.com/j4k0xb/webcrack/issues/143#issuecomment-2692345330, the deeper specifics of this may belong in a [separate plugin](https://github.com/j4k0xb/webcrack/issues/143#issuecomment-2692517232) instead of `webcrack` core; but it makes me wonder if there is some kind of generic way we can identify a pattern of these sort of React component generator libraries so that the JSX decompilation can work effectively with them?
    > >
    > > Similar'ish prior art from `wakaru`:
    > >
    > > - https://github.com/pionxzh/wakaru/issues/40
    > > - https://github.com/pionxzh/wakaru/issues/40#issuecomment-1809704264
    > > - https://github.com/pionxzh/wakaru/issues/40#issuecomment-1809962543
    > >
    > > Looking back at the main format of the `styled` function (which was `Z` in the above code):
    > >
    > > - https://github.com/cristianbote/goober/blob/5f0b43976fac214262c2c8921b1691fc4729ec98/src/styled.js#L15-L20
    > > - > `styled(tag, forwardRef)`
    > >
    > > This returns an inner wrapper function, which seems to use tagged template literal syntax to provide the CSS, and then it reads that from the `arguments` into `_args`:
    > >
    > > - https://github.com/cristianbote/goober/blob/5f0b43976fac214262c2c8921b1691fc4729ec98/src/styled.js#L23-L24
    > > - https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Template_literals#tagged_templates
    > >
    > > It then uses the `_args` to create the CSS class name:
    > >
    > > - https://github.com/cristianbote/goober/blob/5f0b43976fac214262c2c8921b1691fc4729ec98/src/styled.js#L41-L43
    > >
    > > And then processes the `tag` (eg. `"div"`) passed to the original function:
    > >
    > > - https://github.com/cristianbote/goober/blob/5f0b43976fac214262c2c8921b1691fc4729ec98/src/styled.js#L50-L59
    > >
    > > Eventually 'rendering' that through the 'pragma' `h`:
    > >
    > > - https://github.com/cristianbote/goober/blob/5f0b43976fac214262c2c8921b1691fc4729ec98/src/styled.js#L66
    > >
    > > Which was assigned during `setup` earlier:
    > >
    > > - https://github.com/cristianbote/goober/blob/5f0b43976fac214262c2c8921b1691fc4729ec98/src/styled.js#L4-L13
    > >
    > > Tracing through the code in our bundle to find that 'pragma' function binding, we find `t.createElement` ends up being assigned to `h` (or `y` as it's called in our minified code):
    > >
    > > ```js
    > > /* ..snip.. */
    > > /* 582 */ (function (e, t, l, i) {
    > > /* 583 */ c.p = undefined;
    > > /* 584 */ y = e;
    > > /* 585 */ p = undefined;
    > > /* 586 */ w = undefined;
    > > /* 587 */ })(t.createElement);
    > > /* ..snip.. */
    > > ```
    > >
    > > And of course, we know that `t` relates to our React global:
    > >
    > > ```js
    > > /* ..snip.. */
    > > /* 2 */ var t = window.React;
    > > /* ..snip.. */
    > > ```
    > >
    > > This obviously ends up going through a few extra steps of more library specific indirection that probably doesn't make sense to be in `webcrack` core.. but I wonder if we're able to trace/follow the React global / `createElement` 'pragma' / `h` through so that JSX decompilation can work correctly?
    > >
    > > In the case of this library it also inserts the additional wrapping component [`Styled`](https://github.com/cristianbote/goober/blob/5f0b43976fac214262c2c8921b1691fc4729ec98/src/styled.js#L26-L67) in the middle.. but I think if the `createElement` 'pragma' flowed through properly.. that might end up being properly figured out as nested JSX anyway; as the `Styled` just ends up wrapping our provided `tag` component:
    > >
    > > - https://github.com/cristianbote/goober/blob/5f0b43976fac214262c2c8921b1691fc4729ec98/src/styled.js#L69
    > >
    > > _Originally posted by @0xdevalias in https://github.com/j4k0xb/webcrack/issues/10#issuecomment-2693645060_
    >
    > ## See Also
    >
    > - https://github.com/j4k0xb/webcrack/issues/151
    >
    > _Originally posted by @0xdevalias in https://github.com/j4k0xb/webcrack/issues/152#issue-2890987070_
    #### Issue 154: "stable" identifier demangling
    > When diffing deobfuscating minified code (not obfuscated) often most changes are simple identifier renames.
    > So my feature request is "stable" identifier demangling, kinda like the current "All Names" option but stabler.
    >
    > Currently, it just counts up:
    >
    > ```js
    > // input
    > var a = 100, b = 500, c = 1000;
    > // output
    > var v = 100;
    > var v2 = 500;
    > var v3 = 1000;
    > ```
    >
    > Now if we add a new variable at the top all variables are changed which causes a huge diff:
    >
    > ```js
    > // input
    > var a = 1, b = 100, c = 500, d = 1000;
    > // output
    > var v = 1;
    > var v2 = 100;
    > var v3 = 500;
    > var v4 = 1000;
    > ```
    >
    > So instead my suggestion is somehow making the chosen name stable.
    > An idea I had was hashing various attributes of a variable like:
    >
    > - the initialization value
    > - count usages
    > - general location (which function it's in)
    >
    > With the example from above:
    >
    > ```js
    > // input
    > var a = 100, b = 500, c = 1000;
    > // output
    > var v100_0_g = 100;
    > var v500_0_g = 500;
    > var v1000_0_g = 1000;
    >
    > // input
    > var a = 1, b = 100, c = 500, d = 1000;
    > // output
    > var v1_0_g = 1; // only changed line!
    > var v100_0_g = 100;
    > var v500_0_g = 500;
    > var v1000_0_g = 1000;
    > ```
    >
    > Where the format is `v${initialValue}_${usages}_${scope}` (scope = "g"lobal). Of course this is a very naive example, real world would probably involve a hash.
    >
    > _Originally posted by @Le0Developer in https://github.com/j4k0xb/webcrack/issues/154#issue-2895194646_
    > I'm currently testing a possible implementation here on my branch: [Le0Developer/webcrack@`feat`/stable-02](https://github.com/Le0Developer/webcrack/tree/feat/stable-02?rgh-link-date=2025-03-08T13%3A12%3A30.000Z)
    >
    > Output is huge but a LOT more stable. Tested on [Cloudflare-Mining/Cloudflare-Datamining@`4f4e67f`](https://github.com/Cloudflare-Mining/Cloudflare-Datamining/commit/4f4e67fb6e0c91d800ca81b5086fa55428ab5310) and it reduced the (further decompiled diff using webcrack) from over 2000 lines to just under 400 lines with only the actual changes (and https://github.com/j4k0xb/webcrack/issues/156).
    >
    > _Originally posted by @Le0Developer in https://github.com/j4k0xb/webcrack/issues/154#issuecomment-2708276924_
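
    A rough sketch of the hashed variant of this naming scheme: derive the name from stable attributes of the variable rather than a sequential counter, so inserting one new variable doesn't cascade renames through the entire diff. The attribute choice and the use of FNV-1a here are illustrative only; a real implementation would gather these attributes from a scope-aware AST.

    ```javascript
    // Toy sketch of the "stable identifier" idea quoted above: hash the
    // initialiser source, usage count, and enclosing scope into a deterministic
    // name. FNV-1a is used purely as an example of a cheap stable hash.
    function fnv1a(str) {
      let hash = 0x811c9dc5;
      for (let i = 0; i < str.length; i++) {
        hash ^= str.charCodeAt(i);
        hash = Math.imul(hash, 0x01000193) >>> 0;
      }
      return hash.toString(36);
    }

    function stableName(initSource, usageCount, scopeName) {
      return `v_${fnv1a(`${initSource}|${usageCount}|${scopeName}`)}`;
    }

    // stableName("100", 2, "global") always yields the same name, and is
    // unaffected by variables declared before or after it.
    ```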
    ### On `pionxzh/wakaru`
    #### Issue 34: support `un-mangle` identifiers
    @@ -1647,7 +1999,7 @@ Note: This was crossposted to the following issues:
    > - https://github.com/j4k0xb/webcrack/issues/143
    >
    > _Originally posted by @0xdevalias in https://github.com/jehna/humanify/issues/350#issue-2888684046_
    ### On `jehna/humanify`
    #### Issue 97: More deterministic renames across different versions of the same code
  9. @0xdevalias revised this gist Mar 4, 2025. 2 changed files with 5 additions and 2 deletions.
    5 changes: 3 additions & 2 deletions _deobfuscating-unminifying-obfuscated-web-app-code.md
    @@ -2040,9 +2040,10 @@ These are private chat links, so won't work for others, and are included here on
    - https://github.com/0xdevalias/chatgpt-source-watch : Analyzing the evolution of ChatGPT's codebase through time with curated archives and scripts.
    - [Reverse engineering ChatGPT's frontend web app + deep dive explorations of the code (0xdevalias gist)](https://gist.github.com/0xdevalias/4ac297ee3f794c17d0997b4673a2f160#reverse-engineering-chatgpts-frontend-web-app--deep-dive-explorations-of-the-code)
    - [Reverse Engineering Webpack Apps (0xdevalias gist)](https://gist.github.com/0xdevalias/8c621c5d09d780b1d321bfdb86d67cdd#reverse-engineering-webpack-apps)
    - [Reverse Engineered Webpack Tailwind-Styled-Component (0xdevalias gist)](https://gist.github.com/0xdevalias/916e4ababd3cb5e3470b07a024cf3125#reverse-engineered-webpack-tailwind-styled-component)
    - [React Server Components, Next.js v13+, and Webpack: Notes on Streaming Wire Format (`__next_f`, etc) (0xdevalias' gist)](https://gist.github.com/0xdevalias/ac465fb2f7e6fded183c2a4273d21e61#react-server-components-nextjs-v13-and-webpack-notes-on-streaming-wire-format-__next_f-etc)
    - [Fingerprinting Minified JavaScript Libraries / AST Fingerprinting / Source Code Similarity / Etc (0xdevalias gist)](https://gist.github.com/0xdevalias/31c6574891db3e36f15069b859065267#fingerprinting-minified-javascript-libraries--ast-fingerprinting--source-code-similarity--etc)
    - [Fingerprinting Minified JavaScript Libraries / AST Fingerprinting / Source Code Similarity / Etc (0xdevalias' gist)](https://gist.github.com/0xdevalias/31c6574891db3e36f15069b859065267#fingerprinting-minified-javascript-libraries--ast-fingerprinting--source-code-similarity--etc)
    - [JavaScript Web App Reverse Engineering - Module Identification (0xdevalias' gist)](https://gist.github.com/0xdevalias/28c18edfc17606f09cf413f97e404a60#javascript-web-app-reverse-engineering---module-identification)
    - [Reverse Engineered Webpack Tailwind-Styled-Component (0xdevalias' gist)](https://gist.github.com/0xdevalias/916e4ababd3cb5e3470b07a024cf3125#reverse-engineered-webpack-tailwind-styled-component)
    - [Bypassing Cloudflare, Akamai, etc (0xdevalias gist)](https://gist.github.com/0xdevalias/b34feb567bd50b37161293694066dd53#bypassing-cloudflare-akamai-etc)
    - [Debugging Electron Apps (and related memory issues) (0xdevalias gist)](https://gist.github.com/0xdevalias/428e56a146e3c09ec129ee58584583ba#debugging-electron-apps-and-related-memory-issues)
    - [devalias' Beeper CSS Hacks (0xdevalias gist)](https://gist.github.com/0xdevalias/3d2f5a861335cc1277b21a29d1285cfe#beeper-custom-theme-styles)
    2 changes: 2 additions & 0 deletions fingerprinting-minified-javascript-libraries.md
    @@ -24,6 +24,8 @@
    ## See Also

    - [Fingerprinting Minified JavaScript Libraries / AST Fingerprinting / Source Code Similarity / Etc (0xdevalias' gist)](https://gist.github.com/0xdevalias/31c6574891db3e36f15069b859065267#fingerprinting-minified-javascript-libraries--ast-fingerprinting--source-code-similarity--etc)
    - [JavaScript Web App Reverse Engineering - Module Identification (0xdevalias' gist)](https://gist.github.com/0xdevalias/28c18edfc17606f09cf413f97e404a60#javascript-web-app-reverse-engineering---module-identification)
    - [Reverse Engineered Webpack Tailwind-Styled-Component (0xdevalias' gist)](https://gist.github.com/0xdevalias/916e4ababd3cb5e3470b07a024cf3125#reverse-engineered-webpack-tailwind-styled-component)
    - [Deobfuscating / Unminifying Obfuscated Web App / JavaScript Code (0xdevalias' gist)](https://gist.github.com/0xdevalias/d8b743efb82c0e9406fc69da0d6c6581#deobfuscating--unminifying-obfuscated-web-app--javascript-code)
    - [Obfuscation / Deobfuscation](https://gist.github.com/0xdevalias/d8b743efb82c0e9406fc69da0d6c6581#obfuscation--deobfuscation)
    - [Variable Name Mangling](https://gist.github.com/0xdevalias/d8b743efb82c0e9406fc69da0d6c6581#variable-name-mangling)
  10. @0xdevalias 0xdevalias revised this gist Mar 2, 2025. 1 changed file with 36 additions and 1 deletion.
    37 changes: 36 additions & 1 deletion _deobfuscating-unminifying-obfuscated-web-app-code.md
    @@ -1129,7 +1129,42 @@ Since this gist has gotten huge with many references.. here's a tl;dr shortlist
    - > @babel/helper-validator-identifier is a utility package for parsing JavaScript keywords and identifiers. It provides several helper functions for identifying valid identifier names and detecting reserved words and keywords.
    - https://babeljs.io/docs/babel-helper-environment-visitor
    - > @babel/helper-environment-visitor is a utility package that provides a current this context visitor.

    - https://github.com/jamiebuilds/babel-handbook
    - > Babel Handbook
    - > A guided handbook on how to use Babel and how to create plugins for Babel.
    - https://github.com/jamiebuilds/babel-handbook/blob/master/translations/en/README.md
    - > This handbook is divided into two parts:
    > - User Handbook - How to setup/configure Babel and more.
    > - Plugin Handbook - How to create plugins for Babel.
    - https://github.com/jamiebuilds/babel-handbook/blob/master/translations/en/user-handbook.md
    - > Babel User Handbook
    >
    > This document covers everything you ever wanted to know about using Babel and related tooling.
    - https://github.com/jamiebuilds/babel-handbook/blob/master/translations/en/plugin-handbook.md
    - > Babel Plugin Handbook
    >
    > This document covers how to create Babel plugins.
    - https://babeljs.io/docs/plugins
    - > Plugins
    > Babel's code transformations are enabled by applying plugins (or presets) to your configuration file.
    - https://babeljs.io/docs/plugins#plugin-development
    - > Plugin Development
    > Please refer to the excellent `babel-handbook` to learn how to create your own plugins.
    - [Basics](https://github.com/jamiebuilds/babel-handbook/blob/master/translations/en/plugin-handbook.md#toc-basics) -> [Traversal](https://github.com/jamiebuilds/babel-handbook/blob/master/translations/en/plugin-handbook.md#toc-traversal) -> [Scopes](https://github.com/jamiebuilds/babel-handbook/blob/master/translations/en/plugin-handbook.md#toc-scopes)
    - https://github.com/jamiebuilds/babel-handbook/blob/master/translations/en/plugin-handbook.md#scopes
    - > Scopes
    - [Transformation Operations](https://github.com/jamiebuilds/babel-handbook/blob/master/translations/en/plugin-handbook.md#toc-transformation-operations) -> [Scope](https://github.com/jamiebuilds/babel-handbook/blob/master/translations/en/plugin-handbook.md#toc-scope)
    - https://github.com/jamiebuilds/babel-handbook/blob/master/translations/en/plugin-handbook.md#toc-scope
    - > Scope
    - > - Checking if a local variable is bound
    - https://github.com/jamiebuilds/babel-handbook/blob/master/translations/en/plugin-handbook.md#toc-checking-if-a-local-variable-is-bound
    - > - Generating a UID
    - https://github.com/jamiebuilds/babel-handbook/blob/master/translations/en/plugin-handbook.md#generating-a-uid
    - > - Pushing a variable declaration to a parent scope
    - https://github.com/jamiebuilds/babel-handbook/blob/master/translations/en/plugin-handbook.md#toc-pushing-a-variable-declaration-to-a-parent-scope
    - > - Rename a binding and its references
    - https://github.com/jamiebuilds/babel-handbook/blob/master/translations/en/plugin-handbook.md#toc-rename-a-binding-and-its-references
    ### `semantic` / `tree-sitter` + related
    - https://github.com/github/semantic
  11. @0xdevalias 0xdevalias revised this gist Mar 2, 2025. 1 changed file with 11 additions and 0 deletions.
    11 changes: 11 additions & 0 deletions _deobfuscating-unminifying-obfuscated-web-app-code.md
    @@ -3,6 +3,7 @@
    ## Table of Contents

    <!-- TOC start (generated with https://bitdowntoc.derlin.ch/) -->
    - [tl;dr: AKA: devalias's shortlist](#tldr-aka-devaliass-shortlist)
    - [PoC](#poc)
    - [Tools](#tools)
    - [Unsorted](#unsorted)
    @@ -46,6 +47,16 @@
    - [`fingerprinting-minified-javascript-libraries.md`](#file-fingerprinting-minified-javascript-libraries-md)
    - > Fingerprinting Minified JavaScript Libraries
    ## tl;dr: AKA: devalias's shortlist

    Since this gist has gotten huge with many references.. here's a tl;dr shortlist of the main tools I have been using / paying attention to lately. If I find/add anything new to my main toolset, I'll try and ensure it's captured here.

    - [`wakaru`](#wakaru)
    - [`webcrack`](#webcrack)
    - [`humanify`](#humanify)
    - [`babel`](#babel)
    - [`semantic` / `tree-sitter` + related](#semantic--tree-sitter--related)

    ## PoC

    - https://replit.com/@0xdevalias/Rewriting-JavaScript-Variables-via-AST-Examples
  12. @0xdevalias 0xdevalias revised this gist Mar 2, 2025. 1 changed file with 44 additions and 42 deletions.
    86 changes: 44 additions & 42 deletions _deobfuscating-unminifying-obfuscated-web-app-code.md
    @@ -2,42 +2,41 @@

    ## Table of Contents

    <!-- TOC start (generated with https://derlin.github.io/bitdowntoc/) -->

    <!-- TOC start (generated with https://bitdowntoc.derlin.ch/) -->
    - [PoC](#poc)
    - [Tools](#tools)
    - [Unsorted](#unsorted)
    - [wakaru](#wakaru)
    - [webcrack](#webcrack)
    - [ast-grep](#ast-grep)
    - [Restringer](#restringer)
    - [debundle + related](#debundle--related)
    - [joern](#joern)
    - [Unsorted](#unsorted)
    - [wakaru](#wakaru)
    - [webcrack](#webcrack)
    - [humanify](#humanify)
    - [ast-grep](#ast-grep)
    - [Restringer](#restringer)
    - [debundle + related](#debundle--related)
    - [joern](#joern)
    - [Blogs / Articles / etc](#blogs--articles--etc)
    - [Libraries / Helpers](#libraries--helpers)
    - [Unsorted](#unsorted-1)
    - [Recast + related](#recast--related)
    - [estools + related](#estools--related)
    - [Babel](#babel)
    - [`semantic` / `tree-sitter` + related](#semantic--tree-sitter--related)
    - [Shift AST](#shift-ast)
    - [`swc`](#swc)
    - [`esbuild`](#esbuild)
    - [Source Maps](#source-maps)
    - [Visualisation/etc](#visualisationetc)
    - [Unsorted](#unsorted-1)
    - [Recast + related](#recast--related)
    - [estools + related](#estools--related)
    - [Babel](#babel)
    - [`semantic` / `tree-sitter` + related](#semantic--tree-sitter--related)
    - [Shift AST](#shift-ast)
    - [`swc`](#swc)
    - [`esbuild`](#esbuild)
    - [Source Maps](#source-maps)
    - [Visualisation/etc](#visualisationetc)
    - [Browser Based Code Editors / IDEs](#browser-based-code-editors--ides)
    - [CodeMirror](#codemirror)
    - [`monaco-editor`](#monaco-editor)
    - [CodeMirror](#codemirror)
    - [`monaco-editor`](#monaco-editor)
    - [Obfuscation / Deobfuscation](#obfuscation--deobfuscation)
    - [Variable Name Mangling](#variable-name-mangling)
    - [Variable Name Mangling](#variable-name-mangling)
    - [Stack Graphs / Scope Graphs](#stack-graphs--scope-graphs)
    - [Symbolic / Concolic Execution](#symbolic--concolic-execution)
    - [Profiling](#profiling)
    - [Unsorted](#unsorted-2)
    - [My ChatGPT Research / Conversations](#my-chatgpt-research--conversations)
    - [See Also](#see-also)
    - [My Other Related Deepdive Gist's and Projects](#my-other-related-deepdive-gists-and-projects)

    - [My Other Related Deepdive Gist's and Projects](#my-other-related-deepdive-gists-and-projects)
    <!-- TOC end -->

    **Other files in this gist:**
    @@ -269,6 +268,27 @@
    - https://github.com/e9x/krunker-decompiler/blob/master/src/libDecompile.ts
    - https://github.com/e9x/krunker-decompiler/blob/master/src/libRenameVars.ts

    ### humanify

    - https://thejunkland.com/blog/using-llms-to-reverse-javascript-minification.html
    - > Using LLMs to reverse JavaScript variable name minification
    - > This blog introduces a novel way to reverse minified Javascript using large language models (LLMs) like ChatGPT and llama2 while keeping the code semantically intact. The code is open source and available at Github project Humanify
    - https://github.com/jehna/humanify
    - > Un-minify Javascript code using ChatGPT
    - > This tool uses large language models (like ChatGPT & llama2) and other tools to un-minify Javascript code. Note that LLMs don't perform any structural changes – they only provide hints to rename variables and functions. The heavy lifting is done by Babel on AST level to ensure code stays 1-1 equivalent.
    - https://github.com/jehna/humanify/issues/3
    - > Consider using `pionxzh/wakaru` instead of/alongside `webcrack`
    - https://github.com/jehna/humanify/blob/main/src/index.ts
    - https://github.com/jehna/humanify/blob/main/src/humanify.ts
    - https://github.com/jehna/humanify/blob/main/src/openai/openai.ts#L28-L82
    - https://github.com/jehna/humanify/blob/main/src/openai/rename-variables-and-functions.ts#L9-L26
    - https://github.com/jehna/humanify/blob/main/src/openai/is-reserved-word.ts
    - https://github.com/jehna/humanify/blob/main/src/local-rename.ts
    - https://github.com/jehna/humanify/blob/main/src/mq.ts
    - https://github.com/jehna/humanify/tree/main/local-inference
    - https://github.com/jehna/humanify/blob/main/local-inference/inference-server.py
    - https://github.com/jehna/humanify/blob/main/local-inference/rename.py
    ### ast-grep
    - https://github.com/ast-grep/ast-grep
    @@ -449,24 +469,6 @@
    ## Blogs / Articles / etc
    - https://thejunkland.com/blog/using-llms-to-reverse-javascript-minification.html
    - > Using LLMs to reverse JavaScript variable name minification
    - > This blog introduces a novel way to reverse minified Javascript using large language models (LLMs) like ChatGPT and llama2 while keeping the code semantically intact. The code is open source and available at Github project Humanify
    - https://github.com/jehna/humanify
    - > Un-minify Javascript code using ChatGPT
    - > This tool uses large language models (like ChatGPT & llama2) and other tools to un-minify Javascript code. Note that LLMs don't perform any structural changes – they only provide hints to rename variables and functions. The heavy lifting is done by Babel on AST level to ensure code stays 1-1 equivalent.
    - https://github.com/jehna/humanify/issues/3
    - > Consider using `pionxzh/wakaru` instead of/alongside `webcrack`
    - https://github.com/jehna/humanify/blob/main/src/index.ts
    - https://github.com/jehna/humanify/blob/main/src/humanify.ts
    - https://github.com/jehna/humanify/blob/main/src/openai/openai.ts#L28-L82
    - https://github.com/jehna/humanify/blob/main/src/openai/rename-variables-and-functions.ts#L9-L26
    - https://github.com/jehna/humanify/blob/main/src/openai/is-reserved-word.ts
    - https://github.com/jehna/humanify/blob/main/src/local-rename.ts
    - https://github.com/jehna/humanify/blob/main/src/mq.ts
    - https://github.com/jehna/humanify/tree/main/local-inference
    - https://github.com/jehna/humanify/blob/main/local-inference/inference-server.py
    - https://github.com/jehna/humanify/blob/main/local-inference/rename.py
    - https://blog.apify.com/chatgpt-reverse-engineer-code/
    - > Unlocking JavaScript secrets: reverse engineering code with ChatGPT
    - https://book.hacktricks.xyz/network-services-pentesting/pentesting-web/code-review-tools#static-analysis
  13. @0xdevalias 0xdevalias revised this gist Mar 1, 2025. 1 changed file with 56 additions and 1 deletion.
    57 changes: 56 additions & 1 deletion fingerprinting-minified-javascript-libraries.md
    @@ -2,7 +2,7 @@

    ## Table of Contents

    <!-- TOC start (generated with https://derlin.github.io/bitdowntoc/) -->
    <!-- TOC start (generated with https://bitdowntoc.derlin.ch/) -->
    - [See Also](#see-also)
    - [Initial ChatGPT Conversation / Notes](#initial-chatgpt-conversation--notes)
    - [Thoughts / comments as I've articulated them elsewhere](#thoughts--comments-as-ive-articulated-them-elsewhere)
    @@ -16,6 +16,7 @@
    - [Issue 73: add a 'module graph'](#issue-73-add-a-module-graph)
    - [Issue 74: explore 'AST fingerprinting' for module/function identification (eg. to assist smart / stable renames, etc)](#issue-74-explore-ast-fingerprinting-for-modulefunction-identification-eg-to-assist-smart--stable-renames-etc)
    - [Issue 121: Explore creating a 'reverse engineered' records.json / stats.json file from a webpack build](#issue-121-explore-creating-a-reverse-engineered-recordsjson--statsjson-file-from-a-webpack-build)
    - [Issue 140: `[smart-rename]` Add support for `data-sentry-component` / `data-sentry-element` / `data-sentry-source-file` (from `@sentry/babel-plugin-component-annotate`)](#issue-140-smart-rename-add-support-for-data-sentry-component--data-sentry-element--data-sentry-source-file-from-sentrybabel-plugin-component-annotate)
    - [On `jehna/humanify`](#on-jehnahumanify)
    - [Issue 97: More deterministic renames across different versions of the same code](#issue-97-more-deterministic-renames-across-different-versions-of-the-same-code)
    <!-- TOC end -->
    @@ -1590,6 +1591,60 @@ Source: https://chat.openai.com/c/d9b7b64f-aa93-474e-939f-79e376e6d375
    > > _Originally posted by @0xdevalias in https://github.com/0xdevalias/chatgpt-source-watch/issues/9#issuecomment-1974432157_
    >
    > _Originally posted by @0xdevalias in https://github.com/pionxzh/wakaru/issues/121#issuecomment-1974433150_
    #### Issue 140: `[smart-rename]` Add support for `data-sentry-component` / `data-sentry-element` / `data-sentry-source-file` (from `@sentry/babel-plugin-component-annotate`)
    Note: This was crossposted to the following issues:
    - https://github.com/pionxzh/wakaru/issues/140
    - https://github.com/j4k0xb/webcrack/issues/143
    - https://github.com/jehna/humanify/issues/350
    > Sentry has a feature that allows it to annotate built React components with the component name and source filename it was built from, to help provide better error logs. If these are present in the built output, this could be leveraged to extract those details and assist in restoring the original component name and/or source file name:
    >
    > - https://docs.sentry.io/platforms/javascript/guides/react/features/component-names/
    > - > Sentry helps you capture your React components and unlock additional insights in your application. You can set it up to use React component names instead of selectors.
    > - > You can capture the names of React components in your application via a [Babel plugin](https://www.npmjs.com/package/@sentry/babel-plugin-component-annotate), which can unlock powerful workflows and decrease ambiguity.
    > - > Please note that your Sentry browser SDK must be at version `7.91.0` or higher before you can use these features. Only React components in `.jsx` or `.tsx` files can be tracked.
    > - > The Babel plugin parses your application's JSX source code at build time, and applies additional data attributes onto it. These attributes then appear on the DOM nodes of your application's built HTML,
    > - > For example, if you had a component named `MyAwesomeComponent` in the file `myAwesomeComponent.jsx`:
    > >
    > > ```js
    > > function MyAwesomeComponent() {
    > > return <div>This is a really cool and awesome component!</div>;
    > > }
    > > ```
    > >
    > > After your bundler applied the plugin and built your project, the resulting DOM node would look like this:
    > >
    > > ```html
    > > <div
    > > data-sentry-component="MyAwesomeComponent"
    > > data-sentry-source-file="myAwesomeComponent.jsx"
    > >
    > > >
    > > </div>
    > > ```
    > - https://github.com/getsentry/sentry-javascript-bundler-plugins/tree/main/packages/babel-plugin-component-annotate
    > - `@sentry/babel-plugin-component-annotate`
    > - https://github.com/getsentry/sentry-javascript-bundler-plugins/blob/ee73414589a3341c4a4a8ec8efa3116d838e33f8/packages/babel-plugin-component-annotate/src/index.ts#L40-L46
    > - ```js
    > const webComponentName = "data-sentry-component";
    > const webElementName = "data-sentry-element";
    > const webSourceFileName = "data-sentry-source-file";
    >
    > const nativeComponentName = "dataSentryComponent";
    > const nativeElementName = "dataSentryElement";
    > const nativeSourceFileName = "dataSentrySourceFile";
    > ```
    > - https://github.com/search?type=code&q=%22data-sentry-source-file%22+OR+%22data-sentry-component%22
    >
    > ## See Also
    >
    > - https://github.com/pionxzh/wakaru/issues/140
    > - https://github.com/j4k0xb/webcrack/issues/143
    >
    > _Originally posted by @0xdevalias in https://github.com/jehna/humanify/issues/350#issue-2888684046_
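A quick sketch of how these annotations might be harvested when unminifying: scan the built HTML for the `data-sentry-*` attributes and map them back to component / source-file names. The attribute names come from the Sentry plugin source quoted above; the extraction helper itself (`extractSentryAnnotations`) is a hypothetical illustration, not part of any of the tools discussed:

```javascript
// Hypothetical helper: pull Sentry's build-time annotations out of rendered HTML.
// Attribute names are from @sentry/babel-plugin-component-annotate; the regex
// approach is a deliberate simplification (a real pass would walk the DOM/AST).
function extractSentryAnnotations(html) {
  const pattern = /data-sentry-(component|element|source-file)="([^"]*)"/g;
  const found = [];
  for (const match of html.matchAll(pattern)) {
    found.push({ kind: match[1], value: match[2] });
  }
  return found;
}

const html = `<div
  data-sentry-component="MyAwesomeComponent"
  data-sentry-source-file="myAwesomeComponent.jsx"
>
  This is a really cool and awesome component!
</div>`;

const annotations = extractSentryAnnotations(html);
// annotations[0] → { kind: 'component', value: 'MyAwesomeComponent' }
// annotations[1] → { kind: 'source-file', value: 'myAwesomeComponent.jsx' }
```

From there, a smart-rename pass could match each annotated DOM node back to the minified component that rendered it and restore the original name.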
    ### On `jehna/humanify`
  14. @0xdevalias 0xdevalias revised this gist Feb 6, 2025. 1 changed file with 199 additions and 0 deletions.
    199 changes: 199 additions & 0 deletions fingerprinting-minified-javascript-libraries.md
    @@ -6,6 +6,8 @@
    - [See Also](#see-also)
    - [Initial ChatGPT Conversation / Notes](#initial-chatgpt-conversation--notes)
    - [Thoughts / comments as I've articulated them elsewhere](#thoughts--comments-as-ive-articulated-them-elsewhere)
    - [On `0xdevalias/chatgpt-source-watch`](#on-0xdevaliaschatgpt-source-watch)
    - [Issue 15: Explore using embeddings/similar to identify/track similar chunks/modules even when renamed](#issue-15-explore-using-embeddingssimilar-to-identifytrack-similar-chunksmodules-even-when-renamed)
    - [On `j4k0xb/webcrack`](#on-j4k0xbwebcrack)
    - [Issue 21: rename short identifiers](#issue-21-rename-short-identifiers)
    - [On `pionxzh/wakaru`](#on-pionxzhwakaru)
    @@ -160,6 +162,203 @@ Source: https://chat.openai.com/c/d9b7b64f-aa93-474e-939f-79e376e6d375
    ## Thoughts / comments as I've articulated them elsewhere

    ### On `0xdevalias/chatgpt-source-watch`

    #### Issue 15: Explore using embeddings/similar to identify/track similar chunks/modules even when renamed

    > I did some initial exploratory work for this in a script ages back; can't remember if it was in `chatgpt-source-watch` or [`udio-source-watch`](https://github.com/0xdevalias/udio-source-watch) repo, and not sure if it ever got to being committed or if it's just somewhere locally still.
    >
    > The general gist of this issue is that between webpack/similar builds, sometimes the chunk identifiers are renamed, which can mess up our diffing. Oftentimes it's relatively easy to see/guess the renames by looking at the diffs themselves (eg. in the [`_buildManifest.js`](https://github.com/0xdevalias/chatgpt-source-watch/commits/main/unpacked/_next/static/%5BbuildHash%5D/_buildManifest.js) / [`webpack.js`](https://github.com/0xdevalias/chatgpt-source-watch/blob/e78982472adbc9c5d8fd525ab2aba270f49c1006/unpacked/_next/static/chunks/webpack.js#L122-L266) files); but then it's a semi-manual process of renaming these to align so that the diffs read correctly (I believe I wrote some scripts to assist with this at some point also, probably alongside the one mentioned earlier, but similarly they may not have been committed anywhere yet).
    >
    > Similarly, sometimes the chunk identifiers themselves may not have changed, but the module identifiers and/or which chunk they are in may have moved around; causing similar issues with diffing/identifying what is actually new code vs just being moved around, etc.
    >
    > The idea here is basically to use embeddings / similarity search / etc to compare the chunk files (which is what my initial script does), or the modules within them (which is a more recent idea I had for further enhancements to this) to find the closest match; which then allows us to infer in a programmatic/automated way whether its likely to have been renamed; after which we can handle it appropriately.
    >
    > _Originally posted by @0xdevalias in https://github.com/0xdevalias/chatgpt-source-watch/issues/15#issue-2834205626_
    > I'll try and find my older scripts later, but for now, a couple of initial references that may be useful, initially found from this issue:
    >
    > - https://github.com/All-Hands-AI/openhands-aci/pull/34
    > - https://github.com/UKPLab/sentence-transformers
    > - > Sentence Transformers: Multilingual Sentence, Paragraph, and Image Embeddings using BERT & Co.
    > - > State-of-the-Art Text Embeddings
    > - https://sbert.net/
    > - > Sentence Transformers (a.k.a. SBERT) is the go-to Python module for accessing, using, and training state-of-the-art text and image embedding models. It can be used to compute embeddings using Sentence Transformer models ([quickstart](https://sbert.net/docs/quickstart.html#sentence-transformer)) or to calculate similarity scores using Cross-Encoder models ([quickstart](https://sbert.net/docs/quickstart.html#cross-encoder)). This unlocks a wide range of applications, including [semantic search](https://sbert.net/examples/applications/semantic-search/README.html), [semantic textual similarity](https://sbert.net/docs/usage/semantic_textual_similarity.html), and [paraphrase mining](https://sbert.net/examples/applications/paraphrase-mining/README.html).
    > - https://sbert.net/docs/sentence_transformer/usage/semantic_textual_similarity.html
    > - > Semantic Textual Similarity
    > > For Semantic Textual Similarity (STS), we want to produce embeddings for all texts involved and calculate the similarities between them. The text pairs with the highest similarity score are most semantically similar. See also the [Computing Embeddings](https://sbert.net/examples/applications/computing-embeddings/README.html) documentation for more advanced details on getting embedding scores.
    > - https://sbert.net/docs/quickstart.html#cross-encoder
    > - > Cross Encoder
    > > Characteristics of Cross Encoder (a.k.a reranker) models:
    > > - Calculates a similarity score given pairs of texts.
    > > - Generally provides superior performance compared to a Sentence Transformer (a.k.a. bi-encoder) model.
    > > - Often slower than a Sentence Transformer model, as it requires computation for each pair rather than each text.
    > > - Due to the previous 2 characteristics, Cross Encoders are often used to re-rank the top-k results from a Sentence Transformer model.
    > - https://sbert.net/examples/applications/computing-embeddings/README.html
    > - > Computing Embeddings
    > - https://sbert.net/docs/sentence_transformer/usage/efficiency.html
    > - > Speeding up Inference
    > > Sentence Transformers supports 3 backends for computing embeddings, each with its own optimizations for speeding up inference
    > - https://huggingface.co/spaces/mteb/leaderboard
    > - > MMTEB: Massive Multilingual Text Embedding Benchmark
    > > The MMTEB leaderboard compares text embedding models on 1000+ languages.
    >
    > _Originally posted by @0xdevalias in [#15](https://github.com/0xdevalias/chatgpt-source-watch/issues/15#issuecomment-2638317388)_
    > > I did some initial exploratory work for this in a script ages back; can't remember if it was in `chatgpt-source-watch` or [`udio-source-watch`](https://github.com/0xdevalias/udio-source-watch) repo, and not sure if it ever got to being committed or if it's just somewhere locally still.
    >
    > Looks like it was in `udio-source-watch`, and that it is still only local, not committed/pushed anywhere. Here are the seemingly relevant scripts/bits.
    >
    > ## `requirements.txt`:
    >
    > ```
    > numpy==1.26.4
    > scikit-learn==1.4.2
    > ```
    >
    > ## `scripts/text_similarity_checker.py`
    >
    > ```python
    > #!/usr/bin/env python
    >
    > # TODO: It would be interesting to see how this TfidfVectorizer + cosine_similarity method compares with using difflib's SequenceMatcher + ratio methods:
    > # https://docs.python.org/3/library/difflib.html#sequencematcher-objects
    > # See also:
    > # https://docs.python.org/3/library/difflib.html#difflib.get_close_matches
    > # Return a list of the best “good enough” matches. word is a sequence for which close matches are desired (typically a string), and possibilities is a list of sequences against which to match word (typically a list of strings).
    >
    > # TODO: This ChatGPT chat has some examples of how to calculate this sort of thing in JavaScript:
    > # https://chatgpt.com/c/7fef26fd-0531-4079-b508-43904ff3e089
    > # See also:
    > # https://github.com/NaturalNode/natural/
    > # https://naturalnode.github.io/natural/
    > # https://blog.logrocket.com/natural-language-processing-node-js/
    > # https://winkjs.org/
    > # https://winkjs.org/wink-nlp/bm25-vectorizer.html
    > # BM25 is a major improvement over the classical TF-IDF based algorithms. The weights for a specific term (i.e. token) is computed using the BM25 algorithm.
    > # https://github.com/winkjs/wink-nlp
    > # https://github.com/winkjs/wink-nlp-utils
    > # https://winkjs.org/wink-nlp-utils/
    > # https://github.com/winkjs/wink-distance
    >
    > import argparse
    > from sklearn.feature_extraction.text import TfidfVectorizer
    > from sklearn.metrics.pairwise import cosine_similarity
    > import os
    >
    > def read_file(file_path):
    > with open(file_path, 'r', encoding='utf-8') as file:
    > return file.read()
    >
    > def calculate_similarities(main_file, other_files):
    > documents = [read_file(main_file)] + [read_file(f) for f in other_files]
    > tfidf_vectorizer = TfidfVectorizer()
    > tfidf_matrix = tfidf_vectorizer.fit_transform(documents)
    > main_doc_matrix = tfidf_matrix[0:1]
    > similarities = cosine_similarity(main_doc_matrix, tfidf_matrix[1:])
    > return list(zip(other_files, similarities.flatten()))
    >
    > def main():
    > parser = argparse.ArgumentParser(description="Calculate cosine similarity between a main file and a list of other files.")
    > parser.add_argument("main_file", type=str, help="The main file to compare.")
    > parser.add_argument("other_files", nargs='+', type=str, help="A list of other files to compare against the main file.")
    > args = parser.parse_args()
    >
    > # Filter out the main file early if it's accidentally included in other_files
    > filtered_files = [f for f in args.other_files if f != args.main_file]
    >
    > if not os.path.isfile(args.main_file):
    > print(f"Error: '{args.main_file}' does not exist or is not a file.")
    > return
    >
    > for file_path in filtered_files:
    > if not os.path.isfile(file_path):
    > print(f"Error: '{file_path}' does not exist or is not a file.")
    > return
    >
    > results = calculate_similarities(args.main_file, filtered_files)
    > sorted_results = sorted(results, key=lambda x: x[1], reverse=True)
    >
    > # for other_file, similarity in sorted_results:
    > # print(f"Similarity between {args.main_file} and {other_file}: {similarity:.4f}")
    >
    > print(f"Comparing against: {args.main_file}")
    > for other_file, similarity in sorted_results:
    > print(f"{other_file}: {similarity:.4f}")
    >
    > if __name__ == "__main__":
    > main()
    > ```
    >
    > ## `scripts/rename-chunk.sh`
    >
    > ```shell
    > #!/usr/bin/env zsh
    >
    > # Check if the correct number of arguments is provided
    > if [ "$#" -ne 2 ]; then
    > echo "Usage: $0 <old_file> <new_file>"
    > exit 1
    > fi
    >
    > old_file=$1
    > new_file=$2
    >
    > # Check if both arguments are regular files
    > if [ ! -f "$old_file" ]; then
    > echo "Error: $old_file is not a file."
    > exit 1
    > fi
    >
    > if [ ! -f "$new_file" ]; then
    > echo "Error: $new_file is not a file."
    > exit 1
    > fi
    >
    > # Check if the old file is tracked by Git
    > if ! git ls-files --error-unmatch "$old_file" &> /dev/null; then
    > echo "Error: $old_file is not tracked by Git."
    > exit 1
    > fi
    >
    > echo "Starting the file renaming process..."
    >
    > # Temporarily rename the new file to preserve it
    > mv $new_file $new_file.new
    >
    > # Use git mv to rename the old file to the new file's original name
    > git mv $old_file $new_file
    >
    > # Restore the originally new file from its temporary name
    > mv $new_file.new $new_file
    >
    > echo "File renaming complete. $old_file has been renamed to $new_file."
    > ```
    >
    > ## `useful-commands.md`
    >
    > This isn't the full file, just some relevant looking snippets from it:
    >
    > ````markdown
    > See how similar a chunk file is to other chunk files (to find potential chunkID churn):
    >
    > ```bash
    > npm run chunk:check 1793
    >
    > # chunk-check () { ./scripts/text_similarity_checker.py unpacked/_next/static/chunks/${1}.js unpacked/_next/static/chunks/*.js | head -n 5; }; chunk-check 1793
    > ```
    >
    > Rename a chunk file that changed due to chunkID churn:
    >
    > ```bash
    > npm run chunk:rename 7073 1793
    >
    > # chunk-rename () { ./scripts/rename-chunk.sh unpacked/_next/static/chunks/${1}.js unpacked/_next/static/chunks/${2}.js; }; chunk-rename 7073 1793
    > ```
    > ````
    >
    > _Originally posted by @0xdevalias in [#15](https://github.com/0xdevalias/chatgpt-source-watch/issues/15#issuecomment-2638327124)_
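The Python script above leans on scikit-learn, but (as its TODO comments note) the same idea ports to plain JavaScript. A minimal sketch using raw term counts plus cosine similarity — deliberately simpler than TF-IDF weighting, and the function names here are made up for illustration:

```javascript
// Bag-of-words term counts for a source string (simplification of TF-IDF).
function termCounts(text) {
  const counts = new Map();
  for (const token of text.toLowerCase().match(/\w+/g) ?? []) {
    counts.set(token, (counts.get(token) ?? 0) + 1);
  }
  return counts;
}

// Cosine similarity between two term-count Maps: dot(a, b) / (|a| * |b|).
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (const [token, count] of a) {
    normA += count * count;
    if (b.has(token)) dot += count * b.get(token);
  }
  for (const count of b.values()) normB += count * count;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB) || 1);
}

// Rank candidate chunk sources by similarity to the main chunk,
// highest first — the top hit is the likely renamed chunk.
function closestChunk(mainSource, candidates) {
  const main = termCounts(mainSource);
  return candidates
    .map(({ name, source }) => ({ name, score: cosineSimilarity(main, termCounts(source)) }))
    .sort((x, y) => y.score - x.score);
}
```

For production use, TF-IDF weighting (or real embeddings, as discussed earlier in the thread) would down-weight boilerplate tokens that every webpack chunk shares; this sketch only shows the ranking mechanics.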
    ### On `j4k0xb/webcrack`
    #### Issue 21: rename short identifiers
  15. @0xdevalias 0xdevalias revised this gist Jan 3, 2025. 1 changed file with 1 addition and 0 deletions.
    1 change: 1 addition & 0 deletions fingerprinting-minified-javascript-libraries.md
    @@ -13,6 +13,7 @@
    - [Issue 41: Module detection](#issue-41-module-detection)
    - [Issue 73: add a 'module graph'](#issue-73-add-a-module-graph)
    - [Issue 74: explore 'AST fingerprinting' for module/function identification (eg. to assist smart / stable renames, etc)](#issue-74-explore-ast-fingerprinting-for-modulefunction-identification-eg-to-assist-smart--stable-renames-etc)
    - [Issue 121: Explore creating a 'reverse engineered' records.json / stats.json file from a webpack build](#issue-121-explore-creating-a-reverse-engineered-recordsjson--statsjson-file-from-a-webpack-build)
    - [On `jehna/humanify`](#on-jehnahumanify)
    - [Issue 97: More deterministic renames across different versions of the same code](#issue-97-more-deterministic-renames-across-different-versions-of-the-same-code)
    <!-- TOC end -->
  16. @0xdevalias 0xdevalias revised this gist Jan 3, 2025. 1 changed file with 51 additions and 0 deletions.
    51 changes: 51 additions & 0 deletions fingerprinting-minified-javascript-libraries.md
    @@ -1340,6 +1340,57 @@ Source: https://chat.openai.com/c/d9b7b64f-aa93-474e-939f-79e376e6d375
    >
    > _Originally posted by @0xdevalias in https://github.com/pionxzh/wakaru/issues/74#issuecomment-2568536103_
    #### Issue 121: Explore creating a 'reverse engineered' records.json / stats.json file from a webpack build
    > This is an idea I've had in passing a few times, but keep forgetting to document it:
    >
    > - https://medium.com/@songawee/long-term-caching-using-webpack-records-9ed9737d96f2
    > - > there are many factors that go into getting consistent filenames. Using Webpack records helps generate longer lasting filenames (cacheable for a longer period of time) by reusing metadata, including module/chunk information, between successive builds. This means that as each build runs, modules won’t be re-ordered and moved to another chunk as often which leads to less cache busting.
    > - > The first step is achieved by a Webpack configuration setting: `recordsPath: path.resolve(__dirname, ‘./records.json’)`
    > > This configuration setting instructs Webpack to write out a file containing build metadata to a specified location after a build is completed.
    > - > It keeps track of a variety of metadata including module and chunk ids which are useful to ensure modules do not move between chunks on successive builds when the content has not changed.
    > - > With the configuration in place, we can now enjoy consistent file hashes across builds!
    > - > In the following example, we are adding a dependency (superagent) to the vendor-two chunk.
    > >
    > > We can see that all of the chunks change. This is due to the module ids changing. This is not ideal as it forces users to re-download content that has not changed.
    > >
    > > The following example adds the same dependency, but uses Webpack records to keep module ids consistent across the builds. We can see that only the vendor-two chunk and the runtime changes. The runtime is expected to change because it has a map of all the chunk ids. Changing only these two files is ideal.
    > - https://webpack.js.org/configuration/other-options/#recordspath
    > - > `recordsPath`: Use this option to generate a JSON file containing webpack "records" – pieces of data used to store module identifiers across multiple builds. You can use this file to track how modules change between builds.
    > - https://github.com/search?q=path%3A%22webpack.records.json%22&type=code
    > - https://github.com/GooTechnologies/goojs/blob/master/webpack.records.json
    >
    > I'm not 100% sure if this would be useful, or partially useful, but I think I am thinking of it tangentially in relation to things like:
    >
    > - https://github.com/0xdevalias/chatgpt-source-watch/issues/9
    > - https://github.com/pionxzh/wakaru/issues/34
    > - https://github.com/pionxzh/wakaru/issues/41
    > - https://github.com/pionxzh/wakaru/issues/73
    > - https://github.com/pionxzh/wakaru/issues/74
    > - etc
    >
    > _Originally posted by @0xdevalias in https://github.com/pionxzh/wakaru/issues/121#issue-2164642094_
    > > Even more tangentially related to this, I've pondered how much we could 're-construct' the files necessary to use tools like bundle analyzer, without having access to the original source (or if there would even be any benefit to trying to do so):
    > >
    > > - https://github.com/webpack-contrib/webpack-bundle-analyzer
    > > - > Webpack plugin and CLI utility that represents bundle content as convenient interactive zoomable treemap
    > > - https://github.com/webpack-contrib/webpack-bundle-analyzer#usage-as-a-cli-utility
    > > - > You can analyze an existing bundle if you have a webpack stats JSON file.
    > > >
    > > > You can generate it using `BundleAnalyzerPlugin` with `generateStatsFile` option set to `true` or with this simple command: `webpack --profile --json > stats.json`
    > > - https://webpack.js.org/api/stats/
    > > - > Stats Data
    > > > When compiling source code with webpack, users can generate a JSON file containing statistics about modules. These statistics can be used to analyze an application's dependency graph as well as to optimize compilation speed.
    > > - https://nextjs.org/docs/pages/building-your-application/optimizing/bundle-analyzer
    > > - https://www.npmjs.com/package/@next/bundle-analyzer
    > >
    > > My gut feel is that we probably can figure out most of what we need for it; we probably just can't give accurate sizes for the original pre-minified code, etc; and the module names/etc might not be mappable to their originals unless we have module identification type features (see https://github.com/pionxzh/wakaru/issues/41)
    > >
    > > _Originally posted by @0xdevalias in https://github.com/0xdevalias/chatgpt-source-watch/issues/9#issuecomment-1974432157_
    >
    > _Originally posted by @0xdevalias in https://github.com/pionxzh/wakaru/issues/121#issuecomment-1974433150_
    ### On `jehna/humanify`
    #### Issue 97: More deterministic renames across different versions of the same code
  17. @0xdevalias 0xdevalias revised this gist Jan 3, 2025. 1 changed file with 160 additions and 0 deletions.
    160 changes: 160 additions & 0 deletions fingerprinting-minified-javascript-libraries.md
    @@ -1180,6 +1180,166 @@ Source: https://chat.openai.com/c/d9b7b64f-aa93-474e-939f-79e376e6d375
    >
    > _Originally posted by @0xdevalias in https://github.com/pionxzh/wakaru/issues/74#issuecomment-2084114246_
    > Further 'prior art', an example of an 'obfuscation detector' based on AST structure:
    >
    > > > Here are projects that try to support many different ones: [PerimeterX/restringer](https://github.com/PerimeterX/restringer), [ben-sb/javascript-deobfuscator](https://github.com/ben-sb/javascript-deobfuscator)
    > >
    > > > Instead I'd rather add more interactive actions that make manually working on unknown obfuscators faster and let the user decide if its safe
    > >
    > > Linked from that `restringer` repo, I came across this project:
    > >
    > > - https://github.com/PerimeterX/obfuscation-detector
    > > - > Detect different types of JS obfuscation by their AST structure
    > > - https://github.com/PerimeterX/obfuscation-detector#supported-obfuscation-types
    > > - https://github.com/PerimeterX/obfuscation-detector/tree/main/src/detectors
    > >
    > > It could be cool to have a similar sort of 'obfuscation detector' feature within `webcrack`, particularly if it was paired with the 'interactive actions'. The 'detector' rules could suggest which obfuscations seem to be in place, and could then potentially recommend corresponding rules, etc.
    > >
    > > _Originally posted by @0xdevalias in https://github.com/j4k0xb/webcrack/issues/76#issuecomment-2116401646_
    >
    > _Originally posted by @0xdevalias in https://github.com/pionxzh/wakaru/issues/74#issuecomment-2116415993_
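
As a flavour of what such a detector rule looks like (note: a toy heuristic over raw source text, NOT the AST-structure matching that `PerimeterX/obfuscation-detector` actually performs), a check for "string array" style obfuscation might be:

```javascript
// Toy detector sketch (assumption: not how PerimeterX/obfuscation-detector
// works -- it matches AST structure). Flags "string array" obfuscation by
// spotting hex-style identifiers (_0x1a2b) plus a large array of short
// string literals, both of which that obfuscation style introduces.
function looksStringArrayObfuscated(source) {
  const hexIdents = (source.match(/_0x[0-9a-f]{2,}/gi) || []).length;
  const stringArray = /var\s+\w+\s*=\s*\[\s*(?:(['"]).*?\1\s*,\s*){5,}/s.test(source);
  return hexIdents >= 3 && stringArray;
}
```

A real detector parses the code and matches on node shapes instead, so it can't be fooled by formatting; pairing a set of such rules with suggested deobfuscation transforms is the idea floated above.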
    > RE: https://github.com/pionxzh/wakaru/issues/74#issuecomment-2042100894
    >
    > > Some more 'prior art' from the binary reverse engineering world
    >
    > **TL;DR:** Some info on various other technologies, with Binary Ninja's new WARP being of particular interest, as well as some more general background knowledge around debug symbol servers / signature servers (eg. with a thought to how similar concepts could be used alongside module identifiers / etc for JS libs)
    >
    > Similar to IDA's [FLIRT](https://hex-rays.com/products/ida/tech/flirt/in_depth/) / [FLAIR](https://cloud.google.com/blog/u/1/topics/threat-intelligence/flare-ida-pro-script/) signatures, and what I was doing in [0xdevalias/poc-re-binsearch](https://github.com/0xdevalias/poc-re-binsearch)
    >
    > - https://binary.ninja/2024/11/20/4.2-frogstar.html#warp-advanced-function-matching-algorithm-alpha
    > - > WARP: Advanced Function Matching Algorithm Alpha
    > > This release features a new way to transfer function information between binaries. Unlike our existing SigKit tool, WARP is meant for whole function matching. This means fewer false positives and more opportunities to match on smaller functions, thanks to WARP’s function constraints.
    > - > For more information about WARP, visit the documentation [here](https://docs.binary.ninja/dev/annotation.html?h=warp#warp-signature-libraries)!
    > - https://www.seandeaton.com/binary-ninja-warp-signatures/
    > - > Trying Out Binary Ninja's new WARP Signatures with IPSW Diff'ing
    > > Binary diff'ing is pretty complex, but being able to apply markup from one binary to another is quite powerful. Binary Ninja's new WARP extends previous efforts, using SigKit, to quickly identify library functions.
    > - https://docs.binary.ninja/dev/annotation.html
    > - > Applying Annotations
    > - > - [Symbols](https://docs.binary.ninja/dev/annotation.html?h=sigkit#symbols) covers how to work with Symbols in a binary
    > > - [Types](https://docs.binary.ninja/dev/annotation.html?h=sigkit#types) documents creating and interacting with types through the API
    > > - [Tags](https://docs.binary.ninja/dev/annotation.html?h=sigkit#tags) describes how to create tags and bookmarks
    > > - [Type Libraries](https://docs.binary.ninja/dev/typelibraries.html) explains how to work with Type Libraries, including multiple sources of information from which Binary Ninja can automatically source for type information from and how you can add to them
    > > - [Signature Libraries](https://docs.binary.ninja/dev/annotation.html?h=sigkit#signature-libraries) explains how to work with the signature library which match statically compiled functions which are then matched with type libraries
    > - https://docs.binary.ninja/dev/annotation.html?h=sigkit#signature-libraries
    > - > Signature Libraries
    > > There are now two different signature library systems: SigKit, and WARP. SigKit will be deprecated in the near future as WARP represents a superset of its features.
    > - https://docs.binary.ninja/dev/annotation.html?h=sigkit#sigkit-signature-libraries
    > - > SigKit Signature Libraries
    > - https://github.com/Vector35/sigkit
    > - > Signature Kit Plugin
    > - > Function signature matching and signature generation plugin for Binary Ninja
    > - > This plugin provides Python tools for generating, manipulating, viewing, loading, and saving signature libraries (`.sig`) for the Signature System.
    > - https://docs.binary.ninja/dev/annotation.html?h=sigkit#warp-signature-libraries
    > - > WARP Signature Libraries
    > > WARP integration is included with Binary Ninja but turned off by default, for more information about WARP itself visit the open source repository here!
    > >
    > > The benefit to using WARP over SigKit is that WARP signatures are more comprehensive and as such will have fewer false positives. Alongside fewer false positives WARP will match more functions with less information due to the matching algorithm taking into account function locality (i.e. functions next to each other). After matching has completed WARP functions will be tagged and the types for those functions will be transferred, this means less work for those looking to transfer analysis information from one version of a binary to another version.
    > - https://github.com/Vector35/warp
    > - > WARP
    > > WARP provides a common format for transferring and applying function information across binary analysis tools.
    > - https://github.com/Vector35/warp#function-identification
    > - > Function Identification
    > > Function identification is the main way to interact with WARP, allowing tooling to utilize WARP's dataset to identify common functions within any binary efficiently and accurately.
    > - https://github.com/Vector35/warp#comparison-of-function-recognition-tools
    > - > Comparison of Function Recognition Tools
    > > WARP vs FLIRT
    > > The main difference between WARP and FLIRT is the approach to identification.
    > - > Function Identification
    > >
    > > - WARP the function identification is described [here](https://github.com/Vector35/warp#function-identification).
    > > - FLIRT uses incomplete function byte sequence with a mask where there is a single function entry (see: [IDA FLIRT Documentation](https://docs.hex-rays.com/user-guide/signatures/flirt/ida-f.l.i.r.t.-technology-in-depth) for a full description).
    > >
    > > What this means in practice is WARP will have less false positives based solely off the initial function identification. When the returned set of functions is greater than one, we can use the list of [Function Constraints](https://github.com/Vector35/warp#function-constraints) to select the best possible match. However, that comes at the cost of requiring a computed GUID to be created whenever the lookup is requested and that the function GUID is *always* the same.
    > - https://docs.binary.ninja/dev/typelibraries.html
    > - > Type Libraries
    > > Type Libraries are collections of type information (structs, enums, function types, etc.) stored in a file with the extension `.bntl`.
    > - https://binary.ninja/2024/10/01/plugin-spotlight-coolsigmaker.html
    > - > A common desire in reverse engineering is to match re-used code across multiple binaries. Whether you're doing malware lineage tracking, identifying a statically compiled library, or any other use case about identifying similar code, there are multiple technologies that attempt to solve parts of this problem. Other tools for related problems include [SigKit](https://github.com/Vector35/sigkit) (Binary Ninja's [static library detection](https://docs.binary.ninja/dev/annotation.html?h=sigkit#signature-library)), IDA's [FLIRT/FLAIR](https://docs.hex-rays.com/user-guide/signatures/flirt) and [Lumina](https://docs.hex-rays.com/user-guide/lumina) features, or even more advanced systems like [Diaphora](http://diaphora.re/) or [BinDiff](https://www.zynamics.com/bindiff.html).
    > >
    > > Related to those, you might already be familiar with the "SigMaker" style of plugins for various platforms[[1]](https://github.com/ajkhoury/SigMaker-x64) [[2]](https://github.com/apekros/binja_sigmaker) [[3]](https://github.com/Alex3434/Binja-SigMaker). These plugins generate patterns from code that can be used to find said code across different binaries or find the same function reliably between application updates. This is useful for malware classification and static-library identification among other purposes.
    > >
    > > [binja_coolsigmaker](https://github.com/unknowntrojan/binja_coolsigmaker) is just that: a fast and reliable "SigMaker" plugin for Binary Ninja.
    > - http://diaphora.re/
    > - > Diaphora has many of the most common program diffing (bindiffing) features you might expect, like:
    > >
    > > - Diffing assembler.
    > > - Diffing control flow graphs.
    > > - Porting symbol names and comments.
    > > - Adding manual matches.
    > > - Similarity ratio calculation.
    > > - Batch automation.
    > > - Call graph matching calculation.
    > > - Dozens of heuristics based on graph theory, assembler, bytes, functions’ features, etc…
    > - https://github.com/joxeankoret/diaphora
    > - > Diaphora, the most advanced Free and Open Source program diffing tool.
    > - https://www.zynamics.com/bindiff.html
    > - > BinDiff uses a unique graph-theoretical approach to compare executables
    > by identifying identical and similar functions
    > - > Identify identical and similar functions in different binaries
    > - > Port function names, anterior and posterior comment lines, standard comments and local names from one disassembly to the other
    > - https://github.com/google/bindiff
    > - > Quickly find differences and similarities in disassembled code
    > - https://github.com/google/bindiff#further-reading--similar-tools
    > - > Further reading / Similar tools
    > > The original papers outlining the general ideas behind BinDiff:
    > >
    > > - Thomas Dullien and Rolf Rolles. Graph-Based Comparison of Executable Objects. [bindiffsstic05-1.pdf](https://github.com/google/bindiff/blob/main/docs/papers/bindiffsstic05-1.pdf). SSTIC ’05, Symposium sur la Sécurité des Technologies de l’Information et des Communications. 2005.
    > > - Halvar Flake. Structural Comparison of Executable Objects. [dimva_paper2.pdf](https://github.com/google/bindiff/blob/main/docs/papers/dimva_paper2.pdf). pp 161-173. Detection of Intrusions and Malware & Vulnerability Assessment. 2004.3-88579-375-X.
    >
    > Then in the space of debug symbol servers / similar:
    >
    > - https://hex-rays.com/lumina
    > - > What is a Lumina server?
    > > A Lumina server keeps track of metadata about some widely-recognizable functions, like their names, prototypes, or operand types. Additionally, Lumina allows you to "export" work that was previously done on another file to other projects.
    > - > How does Lumina work?
    > > Your IDA instance exchanges function hash values and metadata with the Hex-Rays Lumina server, instead of entire byte patterns. When hash values provided by IDA match the Lumina knowledge base, your IDA instance downloads the function and applies it to the current IDA binary file database (IDB).
    > >
    > > Lumina is implemented as a hash-based lookup table, mapping byte patterns to metadata. For increased resilience, relocatable bits are masked out before hashing. The Lumina server performs lookups purely based on cryptographic digests, so (potentially sensitive) byte patterns are never transferred over the network.
    > - https://github.com/tc39/ecma426/blob/main/proposals/debug-id.md#appendix-b-symbol-server-support
    > - > Source Map Debug ID Proposal
    > > This document presents a proposal to add globally unique build or debug IDs to source maps and generated code, making build artifacts self-identifying and facilitating bidirectional references between Source Maps and generated code.
    > - > Appendix B: Symbol Server Support
    > > With debug IDs it becomes possible to resolve source maps and generated code from the server. That way a tool such as a browser or a crash reporter could be pointed to a S3, GCS bucket or an HTTP server that can serve up source maps and build artifacts keyed by debug id.
    > - https://github.com/getsentry/javascript-debug-ids
    > - > `javascript-debug-ids`
    > > JavaScript polyfills, bundler plugins and utils for the [TC39 Debug ID proposal](https://github.com/tc39/source-map/blob/main/proposals/debug-id.md).
    > - https://github.com/rollup/rollup/blob/master/CHANGELOG.md#4250
    > - > Add `output.sourcemapDebugIds` option to add matching debug ids to sourcemaps and code for tools like Sentry or Rollbar
    > - And some more random info/resources related to debug symbol servers
    > - https://docs.sentry.io/platforms/native/data-management/debug-files/symbol-servers/
    > - > Symbol Servers
    > - > Sentry can download debug information files from external repositories. This allows you to stop uploading debug files and instead configure a public symbol server or run your own. It is also possible to configure external repositories and upload debug files at the same time.
    > - https://docs.sentry.io/platforms/native/data-management/debug-files/symbol-servers/#custom-repositories
    > - > Independent of the internal format, Sentry supports three kinds of custom repositories:
    > >
    > > - HTTP Symbol Server: An HTTP server that serves debug files at a configurable path. Lookups in the server should generally be case-insensitive, although an explicit casing can be configured in the settings. Note that Sentry requires a minimum download speed of 4Mb/s to fetch DIFs from custom HTTP symbol servers.
    > > - Amazon S3 Bucket: Either an entire S3 bucket or a subdirectory. This requires `s3:GetObject`, and optionally `s3:ListBucket` permissions for the configured Access Key. Lookups in the bucket are case-sensitive, which is why we recommend storing all files lower-cased and using a lowercased path casing configuration.
    > > - Google Cloud Storage Bucket: Either an entire GCS bucket or a subdirectory. This requires `storage.objects.get` and `storage.objects.list` permissions for the configured service account. Lookups in the bucket are case sensitive, which is why we recommend storing all files lower-cased.
    > - https://docs.sentry.io/platforms/native/data-management/debug-files/symbol-servers/#directory-layouts
    > - > Directory Layouts
    > - > The following table contains a mapping from the supported layouts to file path schemas applied for specific files
    > - https://www.jetbrains.com/help/clion/using-symbol-servers-when-debugging-on-windows.html
    > - > Use symbol servers when debugging on Windows
    > - https://learn.microsoft.com/en-us/windows/win32/dxtecharts/debugging-with-symbols#symbol-servers
    > - > Debugging with Symbols
    > - > Symbol Servers
    > - https://learn.microsoft.com/en-us/windows/win32/debug/symbol-servers-and-symbol-stores
    > - > Symbol Server and Symbol Stores
    > - https://en.wikipedia.org/wiki/Microsoft_Symbol_Server
    > - > Microsoft Symbol Server is a Windows technology used to obtain symbol debugging information.
    > - https://blog.inedo.com/nuget/source-server-debugging/
    > - > When you build a NuGet package with Source Link enabled a Git Repository URL and Commit ID will be embedded in the package metadata. This allows Visual Studio to locate the required code files for debug time.
    > - https://github.com/dotnet/sourcelink
    > - > Source Link
    > > Source Link is a language- and source-control agnostic system for providing first-class source debugging experiences for binaries.
    > - > Source Link [specification](https://github.com/dotnet/designs/blob/main/accepted/2020/diagnostics/source-link.md#source-link-file-specification) describes source control metadata that can be embedded in symbols, binaries and packages to link them to their original sources.
    > - https://github.com/dotnet/designs/blob/main/accepted/2020/diagnostics/source-link.md#source-link-file-specification
    > - https://sourceware.org/elfutils/Debuginfod.html
    > - > [elfutils](https://sourceware.org/elfutils/index.html) debuginfod is a client/server in elfutils 0.178+ that automatically distributes elf/dwarf/source-code from servers to clients such as debuggers across HTTP.
    >
    > _Originally posted by @0xdevalias in https://github.com/pionxzh/wakaru/issues/74#issuecomment-2568536103_
    ### On `jehna/humanify`
    #### Issue 97: More deterministic renames across different versions of the same code
  18. @0xdevalias 0xdevalias revised this gist Jan 2, 2025. 1 changed file with 23 additions and 10 deletions.
    33 changes: 23 additions & 10 deletions fingerprinting-minified-javascript-libraries.md
    @@ -19,16 +19,29 @@

    ## See Also

    - https://github.com/pionxzh/wakaru/issues/41
    - > Module detection
    - https://github.com/pionxzh/wakaru/issues/34
    - > support `un-mangle` identifiers
    - https://github.com/pionxzh/wakaru/issues/74
    - > explore 'AST fingerprinting' for module/function identification (eg. to assist smart / stable renames, etc)
    - https://github.com/pionxzh/wakaru/issues/73
    - > add a 'module graph'
    - https://github.com/pionxzh/wakaru/issues/121
    - > Explore creating a 'reverse engineered' records.json / stats.json file from a webpack build
    - [Fingerprinting Minified JavaScript Libraries / AST Fingerprinting / Source Code Similarity / Etc (0xdevalias' gist)](https://gist.github.com/0xdevalias/31c6574891db3e36f15069b859065267#fingerprinting-minified-javascript-libraries--ast-fingerprinting--source-code-similarity--etc)
    - [Deobfuscating / Unminifying Obfuscated Web App / JavaScript Code (0xdevalias' gist)](https://gist.github.com/0xdevalias/d8b743efb82c0e9406fc69da0d6c6581#deobfuscating--unminifying-obfuscated-web-app--javascript-code)
    - [Obfuscation / Deobfuscation](https://gist.github.com/0xdevalias/d8b743efb82c0e9406fc69da0d6c6581#obfuscation--deobfuscation)
    - [Variable Name Mangling](https://gist.github.com/0xdevalias/d8b743efb82c0e9406fc69da0d6c6581#variable-name-mangling)
    - [Stack Graphs / Scope Graphs](https://gist.github.com/0xdevalias/d8b743efb82c0e9406fc69da0d6c6581#stack-graphs--scope-graphs)
    - [My ChatGPT Research / Conversations](https://gist.github.com/0xdevalias/d8b743efb82c0e9406fc69da0d6c6581#my-chatgpt-research--conversations)
    - https://github.com/j4k0xb/webcrack
    - https://github.com/j4k0xb/webcrack/issues/21
    - > rename short identifiers
    - https://github.com/pionxzh/wakaru
    - https://github.com/pionxzh/wakaru/issues/34
    - > support `un-mangle` identifiers
    - https://github.com/pionxzh/wakaru/issues/41
    - > Module detection
    - https://github.com/pionxzh/wakaru/issues/73
    - > add a 'module graph'
    - https://github.com/pionxzh/wakaru/issues/74
    - > explore 'AST fingerprinting' for module/function identification (eg. to assist smart / stable renames, etc)
    - https://github.com/pionxzh/wakaru/issues/121
    - > Explore creating a 'reverse engineered' records.json / stats.json file from a webpack build
    - https://github.com/jehna/humanify
    - https://github.com/jehna/humanify/issues/97
    - > More deterministic renames across different versions of the same code
    ## Initial ChatGPT Conversation / Notes

  19. @0xdevalias 0xdevalias revised this gist Dec 31, 2024. 1 changed file with 325 additions and 2 deletions.
    327 changes: 325 additions & 2 deletions chrome-devtools-sources-extension.md
    @@ -1,4 +1,4 @@
    ## Chrome DevTools 'Sources' Extension
    # Chrome DevTools 'Sources' Extension

    Originally articulated here:

    @@ -7,7 +7,17 @@ Originally articulated here:
    - https://github.com/pionxzh/wakaru/issues/76
    - > add Chrome DevTools extension that allows the web IDE to be used within DevTools
    ### Overview
    ## Table of Contents

    <!-- TOC start (generated with https://bitdowntoc.derlin.ch/) -->
    - [Overview](#overview)
    - [Original Notes](#original-notes)
    - [Notes from chat with Chris (December 2024)](#notes-from-chat-with-chris-december-2024)
    <!-- TOC end -->

    ## Overview

    ### Original Notes

    The following is as I originally wrote it on the above issues, copied here for posterity:

    @@ -120,3 +130,316 @@ The following is as I originally wrote it on the above issues, copied here for p
    > - > Use the `chrome.webNavigation` API to receive notifications about the status of navigation requests in-flight.
    > - https://developer.chrome.com/docs/extensions/reference/api/offscreen
    > - > Use the `offscreen` API to create and manage offscreen documents.
    ### Notes from chat with Chris (December 2024)

    The following are relevant snippets from a Messenger chat with Chris M in December 2024:

    > **Chris M (11:46AM):**
    > I wanna build a better debugger for the browser too, but the only way to build it would be to have a deep overall understanding of js and the DOM that no human can hold in their head all at once
    > **Glenn (11:47AM):**
    > What sort of ‘better debugger’ features would you be wanting?
    > **Chris M (11:47AM):**
    > Not features, just an entirely new way of debugging
    > **Chris M (11:48AM):**
    > I had some ways I wanted to implement it before but my memory is all fuzzy now so it's easier just to explain the goal
    > **Chris M (11:49AM):**
    > Basically it wouldn't be possible to obfuscate anything in the first place, because everything would be super open and high level in the first place
    > **Chris M (11:50AM):**
    > So like if you stripped the semantics, it wouldn't do anything, because you can always see a high level interpretation of what's being changed
    > **Glenn (11:51AM):**
    > Maybe slightly lower level/more detailed than that description :p
    >
    > “Basically stuff with things and therefore magic!”
    > **Chris M (12:14PM):**
    > Ok so say youtube makes a request for a video. But the way it does that is through m3u streams chunked up, and then you place a breakpoint on it, you find all the js is chunked too, and then if you trace it you're left with like `f(c,y,u)`, and this splits off into 16 different branches, none of which individually explain what's going on, some of which are.
    >
    > What I want is for it to be inherently interpretable based on what it's actually doing, like part of it is that it does this handshake to google auth servers and weird cypher stuff with a thing called sapisidhash in the background. You wouldn't necessarily know looking at the minified code, cos all you see is a bunch of ints being passed around, but on a higher level, the DOM knows, because it has the full context and knows it's calling crypto libraries, so it could be named based on what it's actually doing, and then you can developers can still alias that on top with their own naming schemes.
    > I wanna see this visually at every step as well. You should be able to literally walk through it and see the information being transferred more intuitively. Maybe like a node graph, but also it should be directly connected to the dom too.
    >
    > Like if pressing a button interacts with some other element on the page, I want to see the path connecting them and be able to intercept it on the page itself instead of going in and out of the debugger and trying to make sure I've got a hold of the right thing.
    > **Chris M (12:14PM):**
    > It's hard to explain what I mean without just describing a debugger
    >
    > Tech bros reinventing the wheel
    > **Glenn (12:33PM):**
    > Nah, I def followed some of that at least
    > **Chris M (12:35PM):**
    > Like if you take the sapisidhash example, you might say "but how can the interpreter work out that you're building some niche auth system due to the math you're doing", but the thing is, an LLM can work it out even if you remove the semantics, because it can understand the context and how that structure fits into it. If you had autocomplete on, it would probably just name it something like `auth_hash_verifier()` for you.
    >
    > So surely you could take extra steps to make that even more robust by specifically designing a js engine for interpretability.
    > **Glenn (12:36PM):**
> Like some bits of this feel a bit ‘high level dream’ and good luck implementing it; but also AI definitely makes it far more likely to be achievable than it would have been in the past.
    >
> But other parts of this are definitely more in the tangible space of things that I have definitely wanted in the past; like particularly getting from button on page -> the _actual_ click handler for it, without just being directed to some convoluted nested layer of react’s internal handlers or similar; or having to know that I can access the react fibre via that DOM node’s properties and then access funky things from it that way; or that I have react devtools installed and it magically does something to make it easier for me to get that.
    >
> I feel like for at least some of what you described there are already tools and methods for it; they just aren’t all in one place (eg. you’d need react devtools and angular devtools and xyzfoobar devtools, and sometimes there isn’t an appropriate devtools)
    > **Glenn (12:37PM):**
    > One thing I’ve wanted and been thinking of building forever is just a mashup of the chrome devtools sources tab with an actual useful IDE implementation (Monaco), so that I can jump around the variable references like I’m in an actual IDE.
    >
    > And yeah I can probably just connect to it from my IDE as a debugger; but I never remember how to do that for my own code let alone a random website; so I want it accessible straight from chrome devtools sources, where all I need to remember is ‘right click on page then hack shit’
    > **Glenn (12:39PM):**
    > And I’ve looked into it, and the chrome extension API’s seem to have everything I need to access that stuff; and can create devtools panels and could embed Monaco in that. So it’s just a case of me actually playing around and doing it
    > **Glenn (12:39PM):**
    > Then next step from there is to add wakaru into the mix, so I can go from “right click devtools” to “oh look, pretty deminified”
    > **Glenn (12:39PM):**
    > And then my vision gets hazier from there; but I suspect I would add some nice features for easily allowing me to call into the devtools debug hook/monitor functions to more easily/precisely target things; and/or to allow me to add proxies to monitor/override various things on the fly monkey patch style
    > **Glenn (12:40PM):**
    > Basically all stuff I can kind of already do based on existing tools + stuff I figured out and documented in gists and such; but I want it all in 1 place relying on code and gui rather than my memory of how to do things
    > **Glenn (12:42PM):**
    > Also being able to see module import graph overview (which can already be done with tools like madge/etc)
    > **Glenn (12:43PM):**
    > And then go deeper into the module fingerprinting stuff I’ve started exploring on wakaru repo + building a database of open source library fingerprints for that, so it can basically be like ‘85% of this bundle is just the following libs/versions, and here is the bit that’s actually unique and interesting’
    > **Chris M (12:43PM):**
    > Yeah that's basically what I mean
    > Also it's not like any of these tools talk to each other
    > It's a whole ass project
    > **Glenn (12:44PM):**
    > Yeah; one of my annoyances of security tools and I guess tools in general.
    >
    > Everyone reinvents the wheel with a slightly different slice of functionality; or language of implementation or similar
    > **Glenn (12:45PM):**
    > > It's a whole ass project
    >
    > It is. But it’s one where, depending on the nuance of what you want, you can take standalone steps towards it.
    >
    > That’s what I do with my gists and deep dive issues and similar; basically do some of the reverse engineering or research or similar and cache that for others or future me.
    >
    > A lot of my issue contributions on wakaru have been minimising the effort they need to go from an idea I want to it being implemented without me having to be the one that wrote the code, etc
    > **Glenn (12:46PM):**
    > This + a more generalised and cleaned up/automated version of the tooling I was building out for ChatGPT-source-watch are still 2 projects I want to hack on and make available open source; as while I haven’t looked properly in ages, I feel like this area of tooling is still super shit for no good reason in the web app security space; and I’ve thought it could be done better for years now
    > **Glenn (12:47PM):**
    > There’s so much interesting stuff the devtools protocol / debugger gives you access to that we barely scratch the surface of using.
    >
    > Hell, even i regularly find ways i can be more effective in using the debugger in devtools just as it is
    > **Glenn (12:51PM):**
    > > Like if you take the sapisidhash example, you might say "but how can the interpreter work out that you're building some niche auth system due to the math you're doing", but the thing is, an LLM can work it out
    >
    > Mm, maybe. But I guess at least with current codebase sizes and inference speed/cost/etc it would probably be overkill/slow AF to try and figure that out on a global scale.
    >
    > If I was approaching it I would more build human in the loop assistance tools. So like, understand the bundle, identify known modules, basically do all the basic normal AST/etc non LLM things to reduce the noise/be able to provide better context; and then provide easy tools to allow the human to be like “wtf is this chunk of weird math stuff, grab all the relevant context and throw it to an LLM and tell me what it is”
    > **Glenn (12:54PM):**
    > Then I guess if you wanted to add an open source layer to that; you could take the same concept of ‘source code fingerprinting’ and module identification that I described earlier for open source libs; but allow users to contribute it for arbitrary code as well. Then say one day Chris decides to try and hack YouTube and makes the LLM figure out that chunk of code does some crypto BS; and then he submits that fingerprint + ‘smart prettied naming’ to this open database; and then Glenn decides he wants to hack it one day and the tool is like “oh, that code seems like a close match to this thing Chris already spent time figuring out, do you want to just apply that?”
    > **Glenn (12:54PM):**
    > To a basic degree, and manually, that’s what I already do a lot of the time with identifying open source libs by just grabbing keywords from them and using GitHub code search
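The manual keyword-grabbing technique described above can be roughly automated. A minimal sketch (naive regex-based string-literal extraction, not a real parser; the function name and length threshold are made up for illustration):

```javascript
// Pull "searchable" string literals out of a minified chunk, to use as
// GitHub code search keywords for identifying open source libraries.
// Short literals and ones containing escape sequences are skipped, since
// they make for poor search terms.
function extractSearchKeywords(code, minLength = 12) {
  const literals = code.match(/(["'])(?:\\.|(?!\1).)*\1/g) || [];
  return [...new Set(
    literals
      .map((s) => s.slice(1, -1)) // strip the surrounding quotes
      .filter((s) => s.length >= minLength && !/\\u|\\x/.test(s))
  )];
}

extractSearchKeywords('var a="tiny";var e="Invariant Violation: element type is invalid";');
// → ['Invariant Violation: element type is invalid']
```

Feeding the surviving literals into GitHub code search is then a decent first pass at matching a minified chunk back to its source library.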
    > **Chris M (12:56PM):**
    > Oh that's a good idea
    > Anyway yea
    > All of this is a lot of work
    > and a lot of stuff to work out on the way
    > It's a sick idea but realistically I'm probably not gonna do it
    > Even if o5-mini got me close though it'd be worth the investment
    > Wouldn't mind waiting a week or a month for it even
    > **Glenn (12:58PM):**
    > You’ve got g1988 on it; but it’s an unreliable model with a lot of contention for its compute resources and so it could be anywhere from days to years to see an outcome :p
    > **Glenn (1:00PM):**
    > But yeah; it’s the type of project I would probably happily collab on if I trusted the other person to actually stick with it
    > **Glenn (1:01PM):**
    > Or like.. not even ‘stick with it’ per se; but just like.. get to some tangible lasting level of something and not just talk about dreams and then never come back to it type vibe
    > **Chris M (1:02PM):**
    > I can't even trust myself to stick with things I'm heavily invested in but yeah if you find someone before LLMs are this capable feel free to run with it and I'd probably contrib to some degree
    >
    > It was more concrete about a year ago, I think I'm just making shit up now that's loosely based on it and overwriting stuff that was better originally
    > **Glenn (1:06PM):**
    > I mean, eventually I’ll do the things I said above myself.
    >
    > And there’s this Canadian kid I’ve been vaguely half mentoring/giving feedback to on twitter occasionally who has been hacking in similar spaces and been playing with building out a lot of the types of tooling/solutions I was hacking on for ChatGPT-source-watch. So hopefully can get him to share back some of what he’s built eventually.
    >
    > He’s also similarly hard to consistently motivate on things though; so it’s more of an ‘opportunistic if it happens to align’ type vibe
    > **Glenn (1:10PM):**
    > > I can't even trust myself to stick with things I'm heavily invested in...
    >
    > Like, even if you have cool ideas in that space, and some vague snippets of steps towards how you might implement it; and can be fucked writing that out in some vaguely tangible way; I find even that kind of stuff useful.
    >
    > Like I’ll add it to a gist of collected ideas around this sort of thing and then use that as a source of remembering/motivating/etc when I do decide to hack on it
    > **Chris M (1:10PM):**
    > I worked with triex on some website he was getting paid for a while back and it was a hard problem, probably would have never been fucked to do it myself
    > But I was like "oh thats easy, you just" and he was like nah I tried that
    > And then it became this whole rabbithole I spent ages on and eventually cracked
    > But like if I dedicated myself to doing that just for its own sake I would've never done it, prob even less motivating if I was getting paid for it cos then I'd HAVE to
    > **Glenn (1:11PM):**
    > > It was more concrete about a year ago, I think im just making shit up now...
    >
    > Well, let it sit in the back of your mind and if you do come up with an idea that seems good; document some measure of it in a way that we can refer back to it later.
    >
    > One reason I like the gists is its minimal effort, tracks history, and is public so even if I never get back to it, someone might benefit from the effort I put in.
    > **Glenn (1:11PM):**
    > > But I was like "oh thats easy, you just" and he was like nah I tried that
    >
    > Honestly that’s the kind of people and situations I enjoy working on the most.
    >
    > Solving problems on my own is fine; but being able to bounce ideas between smart people to make cool shit is infinitely better and way easier to stay motivated on
    > **Glenn (1:25PM):**
    > For your react click event handler stuff; I would potentially be looking at the react fibre properties on the DOM node; as you can access the props of the components from there and all sorts of fun stuff like that.
    >
    > On one of my gists I have some hacky snippets of code for animating the nodes that have react fibres and such
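A minimal sketch of the fiber-props trick being described, assuming React's `__reactProps$<randomSuffix>` expando naming (the suffix is randomised per page load, so match on the prefix; the function name here is made up):

```javascript
// Read the React props (including any onClick) straight off a host DOM
// node via React's expando property, rather than going through React's
// synthetic event system.
function getReactOnClick(domNode) {
  const key = Object.keys(domNode).find((k) => k.startsWith('__reactProps$'));
  return key ? domNode[key].onClick : undefined;
}
```

In a devtools console you'd typically call this as `getReactOnClick($0)` against the currently inspected element.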
    > **Glenn (1:29PM):**
    > > And I’ve looked into it, and the chrome extension API’s seem to have everything I need to access that stuff; and can create devtools panels and could embed Monaco in that. So it’s just a case of me actually playing around and doing it
    >
    > A paid contracting project I’ve been hacking on recently uses Monaco; so it’s given me an opportunity to build some knowledge in that space that will make starting this project seem less ‘dauntingly unknown’ when I do get around to it
    > **Chris M (1:37PM):**
    > I think I tried that
    > Basically it needed to save the prior element state
    > in some react specific way
    > And then check the last set value before updating it again
    > Seems like it doesn't always work this way though, it worked for Udio and a few things on facebook reels
    > Other times it does nothing and then react props will do it or a `.click()` will do it
    > It's kinda dumb that we can't just see what it's doing up front
    > I miss when you could just look at the onclick property and know everything there was to it
    > **Glenn (3:49PM):**
    > > I miss when you could just look at the onclick property...
    >
    > Same
    >
    > > It's kinda dumb that we can't just see what it's doing up front
    >
    > Though tbh this part is probably just a skill/tooling issue.
    >
    > I’m sure that once you know how to find that info easily it’ll be pretty straightforward. It’s just that it’s not natively built into the chrome devtools; and it’s not really a supported feature to dig into and hijack react handlers from outside of react.
    >
    > Like in theory for manual dev look at things stuff; react devtools does this perfectly fine.
    >
    > It’s just that we’re wanting to programmatically hijack stuff that it gets weirder/harder
    >
    > Like I can’t remember exactly off the top of my head, but I think even native chrome devtools gives me somewhat useful feedback on this stuff when I remember to filter things appropriately
    > **Glenn (8:54PM):**
    > Tangent: here’s the section of one of my gists with some of the snippets/etc related to reading react fibers/etc:
    >
    > https://gist.github.com/0xdevalias/8c621c5d09d780b1d321bfdb86d67cdd#react-internals
    >
    > (Also some vaguely similar bits for other frameworks on that gist too)
    > **Glenn (10:00PM):**
    > So that chrome event listeners tab just sees a noop handler because react uses a synthetic event handling system. ChatGPT’s response wasn’t super definitive, but I think the noop handler is there for some quirks/edgecases (either browser reasons; or maybe dev/stubbed out in prod reasons)
    > **Glenn (10:29PM):**
    > Skimming some docs on react events; this indirectly mentions the higher level overview of how the synthetic event system runs down then up: https://react.dev/learn/responding-to-events#capture-phase-events
    > **Glenn (10:38PM):**
    > Then skimming some other articles that are barely worth linking; react uses synthetic events that wrap native events basically for cross browser compatibility reasons; it uses the single listener ‘event delegation’ pattern for performance reasons, it used to pool and reuse event objects for performance reasons but doesn’t do that anymore since react 17 (https://legacy.reactjs.org/docs/legacy-event-pooling.html) because modern browsers suck less so it doesn’t improve performance anymore and also it confused even experienced react users; etc
    > **Glenn (10:40PM):**
    > Earlier versions of react bound event listeners to document; event delegation was disabled by default in react 17; react 18 did something to change that again to be better and more flexible somehow (https://markovate.com/blog/react-18-update/#:~:text=Changes%20to%20Event%20Delegation)
    > **Glenn (10:41PM):**
    > They also batch various events together to reduce the number of re-renders: https://react.dev/blog/2022/03/29/react-v18#new-feature-automatic-batching
    >
    > https://github.com/reactwg/react-18/discussions/21
    > **Glenn (10:53PM):**
    > A bit more of the internals of the synthetic event and its parent classes/prototype: https://dev.to/grantcloyd/examining-react-s-synthetic-event-the-nativeevent-the-eventphase-and-bubbling-549k?utm_source=chatgpt.com#:~:text=Pulling%20Back%20the%20Curtain
    > **Glenn (10:55PM):**
    > And then this stuff about the event phases and bubbling: https://dev.to/grantcloyd/examining-react-s-synthetic-event-the-nativeevent-the-eventphase-and-bubbling-549k?utm_source=chatgpt.com#:~:text=event.eventPhase%20and%20Bubbling
    > **Glenn (10:59PM):**
    > Seemingly the pooling was related to object creation/removal and garbage collection issues from frequently doing that for high frequency events
    > **Glenn (11:01PM):**
    > Can look at `e.target` and `e.currentTarget` to see which element the event originated from vs the one where the event handler is bound
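A toy delegated dispatcher makes the `target` vs `currentTarget` distinction concrete. This is loosely the shape of React's single root listener (walk the ancestor chain in bubbling order), not its actual implementation:

```javascript
// Toy event delegation: walk from the event's origin node up through its
// ancestors (bubbling order), invoking any handler of the matching type.
// `target` stays fixed at the origin node; `currentTarget` is whichever
// node's handler is currently running.
function delegateDispatch(targetNode, type) {
  const fired = [];
  for (let node = targetNode; node; node = node.parent) {
    const handler = node.handlers && node.handlers[type];
    if (handler) {
      handler({ type, target: targetNode, currentTarget: node });
      fired.push(node.name);
    }
  }
  return fired;
}

// Tiny fake "DOM": root -> div -> button, with click handlers on button and root.
const seen = [];
const log = (e) => seen.push(`${e.currentTarget.name} saw click from ${e.target.name}`);
const root = { name: 'root', parent: null, handlers: { click: log } };
const div = { name: 'div', parent: root };
const button = { name: 'button', parent: div, handlers: { click: log } };

delegateDispatch(button, 'click');
// seen is now:
//   'button saw click from button'
//   'root saw click from button'
```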
    > **Glenn (11:05PM):**
    > Back to the event listener tab; if I tick the checkbox to show all ancestors
    >
    > The bottom one attached to the button is the react `noop`.
    >
    > One up is `dispatchDiscreteEvent`
    >
    > One up is `dispatchDiscreteEvent` again
    >
    > Then one up from that is `noop`
    >
    > And then one up again isn’t react related but just sort of this code sandbox site’s native code
    > **Glenn (11:10PM):**
    > And `dispatchDiscreteEvent` then calls into `dispatchEvent`
    > **Glenn (11:11PM):**
    > But basically those `dispatchDiscreteEvent` handlers are on the react root node.
    >
    > So you could go look at it in chrome devtools ‘event listeners’ tab without needing to tick ‘show ancestors’ if you wanted to
    > **Glenn (11:15PM):**
    > Creating a log point on `dispatchDiscreteEvent`
    >
    > There are 2 `click` events (and a bunch of other events I don't care about right now such as: `focusin`, `pointerdown`, `mousedown`, `pointerup`, `mouseup`)
    > **Glenn (11:19PM):**
    > https://developer.mozilla.org/en-US/docs/Web/API/Event/eventPhase
    >
    > The first has `eventPhase` `1` (capturing)
    >
    > The next has `eventPhase` `3` (bubbling)
    >
    > All standard react event handlers trigger on bubbling because of this event delegation in the root node thing.
    >
    > (You can also bind a react handler to trigger during the capture phase)
    >
    > If you bound a native event handler on the actual target element it would trigger in phase 2 (at the target element)
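The phase numbers above are the standard DOM `Event.eventPhase` constants; a small lookup for readability (the mapping itself is just illustrative):

```javascript
// DOM Event.eventPhase constants (per the DOM spec; browsers also expose
// them as statics like Event.CAPTURING_PHASE).
const EVENT_PHASE = { NONE: 0, CAPTURING_PHASE: 1, AT_TARGET: 2, BUBBLING_PHASE: 3 };

function describePhase(eventPhase) {
  switch (eventPhase) {
    case EVENT_PHASE.CAPTURING_PHASE: return 'capturing (root down to target)';
    case EVENT_PHASE.AT_TARGET: return 'at target';
    case EVENT_PHASE.BUBBLING_PHASE: return 'bubbling (target back up to root)';
    default: return 'not currently being dispatched';
  }
}
```

In React terms: a plain `onClick` fires during phase 3 (bubbling), while `onClickCapture` fires during phase 1 (capture).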
    > **Glenn (11:25PM):**
    > Looks like the react container has its own little internals: `__reactContainer$zwzryaw5zkr`
    > **Glenn (11:35PM):**
    > Here’s a gist revision with more of the internals you can find on those properties: https://gist.github.com/0xdevalias/8c621c5d09d780b1d321bfdb86d67cdd/revisions#diff-fbb49510154bc56f46cbdc5d3afcc2281fa109b6fb3f475cf605a6955f8d5a1a
    >
    > `__reactFiber$`, `__reactProps$`, `__reactContainer$`, `__reactEvents$`, `__reactListeners$`, `__reactHandles$`, `__reactResources$`, `__reactMarker$`
    > **Glenn (11:39PM):**
    > And a better code snippet for finding all those: https://gist.github.com/0xdevalias/8c621c5d09d780b1d321bfdb86d67cdd/revisions#diff-fbb49510154bc56f46cbdc5d3afcc2281fa109b6fb3f475cf605a6955f8d5a1a
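Those expando keys can also be enumerated generically. A sketch assuming the prefixes listed above (the `$<suffix>` part is randomised per page load, so match on the prefix):

```javascript
// Enumerate React's expando properties on a DOM node, keyed by which
// internal they hold (Fiber, Props, Container, etc).
const REACT_EXPANDO_PREFIXES = [
  '__reactFiber$', '__reactProps$', '__reactContainer$', '__reactEvents$',
  '__reactListeners$', '__reactHandles$', '__reactResources$', '__reactMarker$',
];

function getReactInternals(node) {
  const internals = {};
  for (const key of Object.keys(node)) {
    const prefix = REACT_EXPANDO_PREFIXES.find((p) => key.startsWith(p));
    // '__reactFiber$' -> 'Fiber': strip the '__react' prefix and trailing '$'
    if (prefix) internals[prefix.slice('__react'.length, -1)] = node[key];
  }
  return internals;
}
```

e.g. `getReactInternals($0).Props` in a devtools console reads the inspected element's React props.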
    > **Glenn (11:40PM):**
    > I cbf right now; but if I traced into `dispatchEvent` more I could no doubt figure out how it maps to knowing where all the event handlers are among everything too. But cbf for now since reading it straight off the fiber is easier anyways
    > **Glenn (11:46PM):**
    > Also added some notes to the gist capturing the event phase bubbling/etc stuff from above for future reference
    > **Glenn (11:58PM):**
    > This article also seems a pretty all in one overview of the random shit I found earlier: https://blog.logrocket.com/event-bubbling-capturing-react/
    > **Glenn (12:08PM):**
    > Anyways; hyperfocus over; sleep for me. Night
  20. @0xdevalias 0xdevalias revised this gist Dec 30, 2024. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion _deobfuscating-unminifying-obfuscated-web-app-code.md
    @@ -135,7 +135,7 @@
    - https://github.com/gchq/CyberChef
    - > The Cyber Swiss Army Knife - a web app for encryption, encoding, compression and data analysis
    - https://gchq.github.io/CyberChef/
    - Javascrpt Parser: https://gchq.github.io/CyberChef/#recipe=JavaScript_Parser(false,false,false,false,false)
    - Javascript Parser: https://gchq.github.io/CyberChef/#recipe=JavaScript_Parser(false,false,false,false,false)
    - https://github.com/dandavison/delta
    - > A syntax-highlighting pager for git, diff, and grep output
    - > (the package is called "git-delta" in most package managers, but the executable is just delta)
  21. @0xdevalias 0xdevalias revised this gist Dec 30, 2024. 1 changed file with 6 additions and 1 deletion.
    7 changes: 6 additions & 1 deletion _deobfuscating-unminifying-obfuscated-web-app-code.md
    @@ -1362,6 +1362,10 @@ In addition to the links directly below, also make sure to check out the various
    > - Using a Web-ish npm Dependency: Uses a dependency which isn't entirely optimised for running in a web page, but doesn't have too big of a dependency tree that it this becomes an issue either
    > - Presenting Information Inline: Using a fraction of the extensive Monaco API (monaco is the text editor at the core of the Playground) to showcase what parts of a TypeScript file would be removed by a transpiler to make it a JS file.

    See Also:

    - [Editor Frameworks and Collaborative Editing/Conflict Resolution Tech (0xdevalias' gist)](https://gist.github.com/0xdevalias/2fc3d66875dcc76d5408ce324824deab#editor-frameworks-and-collaborative-editingconflict-resolution-tech)

    ### CodeMirror

    - https://codemirror.net/
    @@ -1995,4 +1999,5 @@ These are private chat links, so won't work for others, and are included here on
    - [Debugging Electron Apps (and related memory issues) (0xdevalias gist)](https://gist.github.com/0xdevalias/428e56a146e3c09ec129ee58584583ba#debugging-electron-apps-and-related-memory-issues)
    - [devalias' Beeper CSS Hacks (0xdevalias gist)](https://gist.github.com/0xdevalias/3d2f5a861335cc1277b21a29d1285cfe#beeper-custom-theme-styles)
    - [Reverse Engineering Golang (0xdevalias' gist)](https://gist.github.com/0xdevalias/4e430914124c3fd2c51cb7ac2801acba#reverse-engineering-golang)
    - [Reverse Engineering on macOS (0xdevalias' gist)](https://gist.github.com/0xdevalias/256a8018473839695e8684e37da92c25#reverse-engineering-on-macos)
    - [Reverse Engineering on macOS (0xdevalias' gist)](https://gist.github.com/0xdevalias/256a8018473839695e8684e37da92c25#reverse-engineering-on-macos)
    - [Editor Frameworks and Collaborative Editing/Conflict Resolution Tech (0xdevalias' gist)](https://gist.github.com/0xdevalias/2fc3d66875dcc76d5408ce324824deab#editor-frameworks-and-collaborative-editingconflict-resolution-tech)
  22. @0xdevalias 0xdevalias revised this gist Sep 29, 2024. 1 changed file with 15 additions and 0 deletions.
    15 changes: 15 additions & 0 deletions _deobfuscating-unminifying-obfuscated-web-app-code.md
    @@ -160,6 +160,21 @@
    - > JS obfuscated code restoration
    > Let confusion no longer be a stumbling block in reverse analysis
    > https://js-deobfuscator.vercel.app/
    - https://github.com/j4k0xb/unsea
    - > unsea
    > Extracts the javascript source code and assets of Node Single Executable Applications.
    >
    > Compatible with ELF (Linux), PE (Windows), and Mach-O (MacOS) executables.
    - https://nodejs.org/api/single-executable-applications.html
    - > Single executable applications
    >
    > This feature allows the distribution of a Node.js application conveniently to a system that does not have Node.js installed.
    >
    > Node.js supports the creation of single executable applications by allowing the injection of a blob prepared by Node.js, which can contain a bundled script, into the node binary. During start up, the program checks if anything has been injected. If the blob is found, it executes the script in the blob. Otherwise Node.js operates as it normally does.
    >
    > The single executable application feature currently only supports running a single embedded script using the CommonJS module system.
    >
    > Users can create a single executable application from their bundled script with the node binary itself and any tool which can inject resources into the binary.
    ### wakaru

  23. @0xdevalias 0xdevalias revised this gist Sep 25, 2024. 1 changed file with 9 additions and 7 deletions.
    16 changes: 9 additions & 7 deletions fingerprinting-minified-javascript-libraries.md
    Original file line number Diff line number Diff line change
    @@ -6,13 +6,15 @@
    - [See Also](#see-also)
    - [Initial ChatGPT Conversation / Notes](#initial-chatgpt-conversation--notes)
    - [Thoughts / comments as I've articulated them elsewhere](#thoughts--comments-as-ive-articulated-them-elsewhere)
    - [On `j4k0xb/webcrack`](#on-j4k0xbwebcrack)
    - [Issue 21: rename short identifiers](#issue-21-rename-short-identifiers)
    - [On `pionxzh/wakaru`](#on-pionxzhwakaru)
    - [Issue 34: support `un-mangle` identifiers](#issue-34-support-un-mangle-identifiers)
    - [Issue 41: Module detection](#issue-41-module-detection)
    - [Issue 73: add a 'module graph'](#issue-73-add-a-module-graph)
    - [Issue 74: explore 'AST fingerprinting' for module/function identification (eg. to assist smart / stable renames, etc)](#issue-74-explore-ast-fingerprinting-for-modulefunction-identification-eg-to-assist-smart--stable-renames-etc)
    - [On `j4k0xb/webcrack`](#on-j4k0xbwebcrack)
    - [Issue 21: rename short identifiers](#issue-21-rename-short-identifiers)
    - [On `pionxzh/wakaru`](#on-pionxzhwakaru)
    - [Issue 34: support `un-mangle` identifiers](#issue-34-support-un-mangle-identifiers)
    - [Issue 41: Module detection](#issue-41-module-detection)
    - [Issue 73: add a 'module graph'](#issue-73-add-a-module-graph)
    - [Issue 74: explore 'AST fingerprinting' for module/function identification (eg. to assist smart / stable renames, etc)](#issue-74-explore-ast-fingerprinting-for-modulefunction-identification-eg-to-assist-smart--stable-renames-etc)
    - [On `jehna/humanify`](#on-jehnahumanify)
    - [Issue 97: More deterministic renames across different versions of the same code](#issue-97-more-deterministic-renames-across-different-versions-of-the-same-code)
    <!-- TOC end -->

    ## See Also
  24. @0xdevalias 0xdevalias revised this gist Sep 25, 2024. 1 changed file with 204 additions and 1 deletion.
    205 changes: 204 additions & 1 deletion fingerprinting-minified-javascript-libraries.md
    @@ -1163,4 +1163,207 @@ Source: https://chat.openai.com/c/d9b7b64f-aa93-474e-939f-79e376e6d375
    >
    > You can also find the first link dump of content in the collapsible in the [first post on this issue](https://github.com/pionxzh/wakaru/issues/74#issue-2038720195).
    >
    > _Originally posted by @0xdevalias in https://github.com/pionxzh/wakaru/issues/74#issuecomment-2084114246_
    > _Originally posted by @0xdevalias in https://github.com/pionxzh/wakaru/issues/74#issuecomment-2084114246_
    ### On `jehna/humanify`
    #### Issue 97: More deterministic renames across different versions of the same code
    > Currently, LLMs often guess variable names differently across various versions of the same JavaScript code. This inconsistency complicates versioning, tracking changes, and merging code for anyone regularly analyzing or modifying applications, extensions, etc.
    >
    > My suggestion is to create a mapping file that lists generated variable names alongside their LLM-generated alternatives, updated continuously. This would serve as a lookup table for the LLM, helping maintain consistency and reducing variations in the final output. Admittedly, I haven't fully explored the feasibility of this concept, but I believe it would strengthen reverse-engineering processes.
    >
    > _Originally posted by @neoOpus in https://github.com/jehna/humanify/issues/97
    ---
    > > My suggestion is to create a mapping file that lists generated variable names alongside their LLM-generated alternatives
    >
    > @neoOpus This is similar to an area I have spent a fair bit of time thinking about/prototyping tooling around in the past. One of the bigger issues that you're likely to find here is that with bundlers like webpack/etc, when they minimise the variable names, they won't necessarily choose the same minified variable name for the same code each time. So to make a 'lookup table' type concept work, you first need to be able to stabilise the 'reference key' for each of those variables, even if the bundler chose something different to represent it.
    >
    > You can find some of my initial hacky prototypes scattered in this repo:
    >
    > - https://github.com/0xdevalias/poc-ast-tools
    >
    > My thoughts/notes on this are scattered around a few places, but these may be some useful/interesting places to start:
    >
    > - https://github.com/0xdevalias/chatgpt-source-watch/issues/3
    > - https://github.com/Wilfred/difftastic/issues/631
    > - https://github.com/afnanenayet/diffsitter/issues/819
    > - https://github.com/0xdevalias/poc-ast-tools/blob/main/diff-minimiser.js
    > - https://github.com/0xdevalias/poc-ast-tools/blob/main/diff-minimiser-poc-acorn.js
    > - https://github.com/0xdevalias/chatgpt-source-watch/issues/10
    > - https://github.com/pionxzh/wakaru/issues/34
    > - https://github.com/pionxzh/wakaru/issues/74
    > - https://gist.github.com/0xdevalias/d8b743efb82c0e9406fc69da0d6c6581#variable-name-mangling
    > - https://gist.github.com/0xdevalias/d8b743efb82c0e9406fc69da0d6c6581#my-chatgpt-research--conversations
    > - https://gist.github.com/0xdevalias/d8b743efb82c0e9406fc69da0d6c6581#fingerprinting-minified-javascript-libraries
    > - https://gist.github.com/0xdevalias/31c6574891db3e36f15069b859065267#fingerprinting-minified-javascript-libraries--ast-fingerprinting--source-code-similarity--etc
    > - https://github.com/pionxzh/wakaru/issues/73
    > - https://github.com/pionxzh/wakaru/issues/41
    > - https://github.com/j4k0xb/webcrack/issues/21
    >
    > You can see an example of a larger scale project where I was trying to stabilise the minified variable names to reduce the 'noise' in large scale source diffing here:
    >
    > - https://github.com/0xdevalias/chatgpt-source-watch
    >
    > _Originally posted by @0xdevalias in https://github.com/jehna/humanify/issues/97#issuecomment-2347878686_
    ---
    > > Currently, LLMs often guess variable names differently across various versions of the same JavaScript code. This inconsistency complicates versioning, tracking changes, and merging code for anyone regularly analyzing or modifying applications, extensions, etc.
    >
    > Just to clarify that I'm on the same page here, is the issue that:
    > * You have multiple versions of a webapp/website that change over time
    > * You un-minify all of them
    > * You need to compare their differencies, and it's proving difficult as Humanify does not generate same names for same minified code
    >
    > This is an interesting problem. I'd love to research some ways to implement this. Especially AST fingerprinting seems promising, thank you @0xdevalias for your links.
    >
    > _Originally posted by @jehna in https://github.com/jehna/humanify/issues/97#issuecomment-2356724015_
    ---
    > One issue related to fingerprinting is that most of the stuff in a modern webapp bundle is dependencies. And most of the dependencies probably have public source code. So in theory it would be possible to build a huge database of open source code fingerprints that would match a specific version of a specific code, and to have a tool that deterministically reverses the code to its actual original source.
    >
    > In theory we could use a similar method to build a local database of already-humanified code, which would make the reverse process more deterministic on subsequent runs.
    >
    > _Originally posted by @jehna in https://github.com/jehna/humanify/issues/97#issuecomment-2356732911_
    ---
    > I would like to share an idea I’ve been considering, even though I’m still in the process of researching this topic. I hope it proves to be useful!
    >
    > My suggestion is to break the code down into smaller, modular functions, which seems to be a practice your script might already be implementing. One approach to enhance this is to replace all variable names with generic placeholders (like a, b, c, d) or numerical identifiers (such as 0001, 0002, 0003) by order of apparency. (I honestly don't know how this can be done but maybe via RegEx or just asking LLM to do it).
    >
    > Anyway, this would allow for a standardized, minified version of the code. After creating this stripped down and abstracted version, we could calculate a hash of the code as a string. This hash would serve as a unique identifier to track changes portions of the code from different versions of the project and prevent duplicate entries as well as a reference to where to store the future generated variable names. The resulting data could be stored in an appropriate format, such as CSV, NoSQL, or JSON, based on your requirements for speed, scalability, and ease of access.
    >
    > Next, we could analyze this stored data from a designated project location or a maybe specified subfolder (into .humanifjs). Here, we could leverage language models (LLMs) to generate meaningful variable names based on the context of the functions. This would create a "reference" that can assist in future analyses of the code.
    >
    > When new versions of the obfuscated code are generated (which will have different variable names), we can apply a similar process to compare them with previously processed versions. By using diff techniques, we can identify changes and maintain a collection of these sub-chunks of code, which would help reduce discrepancies. In most cases, we should see a high degree of similarity unless a particular function’s logic has altered. We can then reassign the previously generated variable names (instead of the original variable names or having to generate different ones) to the new code chunks by feeding them as choices for the LLM or assigning them directly programmatically to reduce the need to consume more tokens for the same chunks.
    >
    > Additionally, to enhance this process, we could explore various optimizations in how the LLM generates and assigns these variable names, as well as how we handle the storage and retrieval of the chunks.
    >
    > I look forward to your thoughts on this approach and any suggestions you may have for improving it further!
    >
    > What would make this work better is to make it take advantage of diff (compare) techniques to build some sort of sub-chunks, then keep them available to reduce the discrepancy, and maybe also optimize the generation... I hope this makes sense.
    >
    > And as you stated here
    >
    > > One issue related to fingerprinting is that most of the stuff in a modern webapp bundle is dependencies. And most of the dependencies probably have public source code. So in theory it would be possible to build a huge database of open source code fingerprints that would match a specific version of a specific code, and to have a tool that deterministically reverses the code to its actual original source.
    > >
    > > In theory we could use a similar method to build a local database of already-humanified code, which would make the reverse process more deterministic on subsequent runs.
    >
    > This would be optimal indeed, as it would allow leveraging the collective work to get the best results.
    >
    > PS: I don't have a good machine right now to do some testing myself, nor an API key that allows me to do them properly.
    >
    > _Originally posted by @neoOpus in https://github.com/jehna/humanify/issues/97#issuecomment-2359434509_
    ---
    > > One issue related to fingerprinting is that most of the stuff in a modern webapp bundle is dependencies. And most of the dependencies probably have public source code. So in theory it would be possible to build a huge database of open source code fingerprints that would match a specific version of a specific code, and to have a tool that deterministically reverses the code to its actual original source.
    >
    > @jehna Agreed. This was one of the ideas that first led me down the 'fingerprinting' path. Though instead of 'deterministically reversing the code to the original source' in its entirety (which may also be useful), my plan was first to be able to detect dependencies and mark them as such (as most of the time I don't care to look too deeply at them), and then secondly to just be able to extract the 'canonical variable/function names' from that original source and be able to apply them to my unminified version (similar to how `humanify` currently uses AI for this step); as that way I know that even if there is some little difference in the actual included code, I won't lose that by replacing it with the original source. These issues on `wakaru` are largely based on this area of things:
    >
    > - https://github.com/pionxzh/wakaru/issues/41
    > - https://github.com/pionxzh/wakaru/issues/73
    > - https://github.com/pionxzh/wakaru/issues/74
    >
    > While it's a very minimal/naive attempt, and definitely not the most robust way to approach things, a while back I implemented a really basic 'file fingerprint' method, mostly to assist in figuring out when a chunk had been renamed (but was otherwise largely the same chunk as before), that I just pushed to `poc-ast-tools` (https://github.com/0xdevalias/poc-ast-tools/commit/b0ef60f8608385c40de2644b3346b1834eb477a0):
    >
    > - https://github.com/0xdevalias/poc-ast-tools/blob/main/text_similarity_checker.py
    > - https://github.com/0xdevalias/poc-ast-tools/blob/main/rename-chunk.sh
    >
    > When I was implementing it, I was thinking about embeddings, but didn't want to have to send large files to the OpenAI embeddings API; and wanted a quick/simple local approximation of it.
    >
    > Expanding on this concept to the more general code fingerprinting problem; I would probably look at breaking things down to at least an individual module level, as I believe usually modules tend to coincide with original source files; and maybe even break things down even further to a function level if needed. I would also probably be normalising the code to remove any function/variable identifiers first; and to remove the impact of whitespace differences/etc.
    >
    > While it's not applied to generating a fingerprint, you can see how I've used some of these techniques in my approach to creating a 'diff minimiser' for identifying newly changed code between builds, while ignoring the 'minification noise / churn':
    >
    > - https://github.com/0xdevalias/poc-ast-tools/blob/main/diff-minimiser.js
    > - https://github.com/0xdevalias/poc-ast-tools/blob/main/diff-minimiser-poc-acorn.js
    >
    > ---
    >
    > > In theory we could use a similar method to build a local database of already-humanified code, which would make the reverse process more deterministic on subsequent runs.
    >
    > @jehna Oh true.. yeah, that definitely makes sense. Kind of like a local cache.
    >
    > ---
    >
    > > One approach to enhance this is to replace all variable names with generic placeholders (like a, b, c, d) or numerical identifiers (such as 0001, 0002, 0003) in order of appearance. (I honestly don't know how this can be done, but maybe via RegEx or just asking an LLM to do it.)
    >
    > @neoOpus This would be handled by parsing the code into an AST, and then manipulating that AST to rename the variables.
    >
    > You can see various hacky PoC versions of this with various parsers in my `poc-ast-tools` repo (I don't remember which is the best/most canonical as I haven't looked at it all for ages), eg:
    >
    > - https://github.com/0xdevalias/poc-ast-tools/blob/main/babel_v1.js
    > - https://github.com/0xdevalias/poc-ast-tools/blob/main/babel_v1_0_old_combined.js
    > - https://github.com/0xdevalias/poc-ast-tools/blob/main/babel_v1_1.js
    > - https://github.com/0xdevalias/poc-ast-tools/blob/main/babel_v1_2.js
    > - https://github.com/0xdevalias/poc-ast-tools/blob/main/babel_v1_3.js
    > - https://github.com/0xdevalias/poc-ast-tools/blob/main/babel_v1_3_clean.js
    > - https://github.com/0xdevalias/poc-ast-tools/blob/main/babel_v1_3_cli.js
    > - etc: https://github.com/0xdevalias/poc-ast-tools
    >
    > Which you can see some of the early hacky mapping attempts I was making in these files:
    >
    > - https://github.com/0xdevalias/poc-ast-tools/blob/main/variableMapping.167-121de668c4456907-HEAD.json
    > - https://github.com/0xdevalias/poc-ast-tools/blob/main/variableMapping.167-HEAD-rewritten.json
    > - https://github.com/0xdevalias/poc-ast-tools/blob/main/variableMapping.167-HEAD.json
    > - https://github.com/0xdevalias/poc-ast-tools/blob/main/variableMapping.167-HEAD%5E1.json
    > - https://github.com/0xdevalias/poc-ast-tools/blob/main/variableMapping.167-f9af0280d3150ee2-HEAD.json
    > - https://github.com/0xdevalias/poc-ast-tools/blob/main/variableMapping.167-test.json
    > - https://github.com/0xdevalias/poc-ast-tools/blob/main/variableMapping.json
    >
    > That was the point where I realised I really needed something more robust (such as a proper fingerprint that would survive code minification) to use as the key.
    >
    > ---
    >
    > > We can then reassign the previously generated variable names (instead of the original variable names or having to generate different ones) to the new code chunks by feeding them as choices for the LLM or assigning them directly programmatically to reduce the need to consume more tokens for the same chunks.
    >
    > @neoOpus Re-applying the old variable names to the new code wouldn't need an LLM at all, as that part is handled in the AST processing code within `humanify`:
    >
    > - https://thejunkland.com/blog/using-llms-to-reverse-javascript-minification#:~:text=Don%27t%20let%20AI%20touch%20the%20code
    > - > Don't let AI touch the code
    > > Now while LLMs are very good at rephrasing and summarizing, they are not very good at coding (yet). They have inherent randomness, which makes them unsuitable for performing the actual renaming and modification of the code.
    > >
    > > Fortunately renaming a Javascript variable within its scope is a solved problem with traditional tools like Babel. Babel first parses the code into an abstract syntax tree (AST, a machine representation of the code), which is easy to modify using well behaving algorithms.
    > >
    > > This is much better than letting the LLM modify the code on a text level; it ensures that only very specific transformations are carried out so the code's functionality does not change after the renaming. The code is guaranteed to have the original functionality and to be runnable by the computer.
    >
    > ---
    >
    > > I would like to share an idea I’ve been considering, even though I’m still in the process of researching this topic. I hope it proves to be useful!
    >
    > @neoOpus At a high level, it seems that the thinking/aspects you've outlined here are more or less in line with what I've discussed previously in the resources I linked to [in my first comment above](https://github.com/jehna/humanify/issues/97#issuecomment-2347878686).
    >
    > ---
    >
    > > PS: I don't have a good machine right now to do some testing myself, nor an API key that allows me to do them properly.
    >
    > @neoOpus IMO, the bulk of the 'harder parts' of implementing this aren't really LLM related, and shouldn't require a powerful machine. The areas I would suggest most looking into around this are how AST parsing/manipulation works; and then how to create a robust/stable fingerprinting method.
    >
    > IMO, figuring out the ideal method of fingerprinting is probably the largest / potentially hardest 'unknown' in all of this currently (at least to me, since while I started to gather resources for it, I haven't had the time to deep dive into reading/analysing them all):
    >
    > - https://gist.github.com/0xdevalias/31c6574891db3e36f15069b859065267#fingerprinting-minified-javascript-libraries--ast-fingerprinting--source-code-similarity--etc
    > - https://gist.github.com/0xdevalias/d8b743efb82c0e9406fc69da0d6c6581#fingerprinting-minified-javascript-libraries
    >
    > Off the top of my head, I would probably look at breaking things down to at least an individual module level, as I believe usually modules tend to coincide with original source files; and maybe even break things down even further to a function level if needed; and then generate fingerprints for them.
    >
    > I would also potentially consider looking at the module/function 'entry/exit' points (eg. imports/exports); or maybe even the entire 'shape' of the module import graph itself.
    >
    > I would also probably be normalising the code to remove any function/variable identifiers and to remove the impact of whitespace differences/etc; before generating any fingerprints on it.
    >
    > Another potential method I considered for the fingerprints is identifying the types of elements that tend to remain stable even when minified, and using those as part of the fingerprint. As that is one of the manual methods I used to be able to identify a number of the modules listed here:
    >
    > - https://github.com/pionxzh/wakaru/issues/41
    > - https://github.com/pionxzh/wakaru/issues/40
    > - https://github.com/pionxzh/wakaru/issues/79
    > - https://github.com/pionxzh/wakaru/issues/88
    > - https://github.com/pionxzh/wakaru/issues/89
    > - https://github.com/pionxzh/wakaru/issues/87
    > - etc: https://github.com/pionxzh/wakaru/issues?q=%22%5Bmodule-detection%5D%22
    >
    > _Originally posted by @0xdevalias in https://github.com/jehna/humanify/issues/97#issuecomment-2372638981_
  25. @0xdevalias 0xdevalias revised this gist Sep 17, 2024. 1 changed file with 9 additions and 0 deletions.
    9 changes: 9 additions & 0 deletions _deobfuscating-unminifying-obfuscated-web-app-code.md
    Original file line number Diff line number Diff line change
    @@ -1933,6 +1933,15 @@ In addition to the links directly below, also make sure to check out the various
    - > 10min tech talk + demo
    > Trustfall was featured in the "How to Query (Almost) Everything" talk at the HYTRADBOI 2022 conference.
    - https://www.hytradboi.com/2022/how-to-query-almost-everything
    - https://predr.ag/querying/
    - > How to Query (Almost) Everything
    - > While working at Kensho, I started a query compiler project designed to empower everyone (even non-engineers) to make use of datasets available to them. The query system is responsible for creating a schema covering all available data across all databases, and allows the user to write cross-database queries without needing to know about the location and representation of the various bits of data involved in the query.
    - > The project I built at Kensho is now open source on GitHub, under the name GraphQL compiler — in retrospect, a name I somewhat regret since the query language uses GraphQL-compatible syntax but has significant differences in query semantics.
    - https://github.com/kensho-technologies/graphql-compiler
    - > graphql-compiler
    > Turn complex GraphQL queries into optimized database queries.
    - > GraphQL compiler is a library that simplifies data querying and exploration by exposing one simple query language written using GraphQL syntax to target multiple database backends. It currently supports OrientDB. and multiple SQL database management systems, such as PostgreSQL, MSSQL and MySQL.
    - > More recently, I've successfully extended the same ideas beyond querying just databases, and added support for querying APIs, raw files, ML models, or any other data source.
    - AST Query Libs
    - https://github.com/estools/esquery
    - > ESQuery
  26. @0xdevalias 0xdevalias revised this gist Sep 3, 2024. 1 changed file with 4 additions and 0 deletions.
    4 changes: 4 additions & 0 deletions _deobfuscating-unminifying-obfuscated-web-app-code.md
    @@ -998,6 +998,8 @@
    - https://babeljs.io/docs/babel-parser#api
    - https://babeljs.io/docs/babel-parser#output
    - > The Babel parser generates AST according to Babel AST format. It is based on ESTree spec with the following deviations...
    - https://github.com/babel/babel/tree/main/packages/babel-parser
    - > @babel/parser
    - https://github.com/babel/babel/blob/main/packages/babel-parser/ast/spec.md
    - > AST for JSX code is based on Facebook JSX AST
    - https://github.com/facebook/jsx/blob/main/AST.md
    @@ -1028,6 +1030,8 @@
    - > @babel/traverse
    - > We can use it alongside the `babel` parser to traverse and update nodes
    - https://github.com/babel/babel/tree/main/packages/babel-traverse
    - > @babel/traverse
    - https://github.com/babel/babel/blob/main/packages/babel-traverse/src/index.ts
    - https://github.com/babel/babel/blob/main/packages/babel-traverse/src/traverse-node.ts#L8-L20
    - > Traverse the children of given node
    - https://github.com/babel/babel/blob/main/packages/babel-traverse/src/scope/index.ts#L380-L394
  27. @0xdevalias 0xdevalias revised this gist Sep 3, 2024. 1 changed file with 47 additions and 0 deletions.
    47 changes: 47 additions & 0 deletions _deobfuscating-unminifying-obfuscated-web-app-code.md
    @@ -1899,6 +1899,53 @@ In addition to the links directly below, also make sure to check out the various
    > - `spectest-interp`: read a Spectest JSON file, and run its tests in the interpreter
    >
    > These tools are intended for use in (or for development of) toolchains or other systems that want to manipulate WebAssembly files. Unlike the WebAssembly spec interpreter (which is written to be as simple, declarative and "speccy" as possible), they are written in C/C++ and designed for easier integration into other systems. Unlike Binaryen these tools do not aim to provide an optimization platform or a higher-level compiler target; instead they aim for full fidelity and compliance with the spec (e.g. 1:1 round-trips with no changes to instructions).
    - https://glean.software/
    - > Glean
    > System for collecting, deriving and querying facts about source code
    - https://glean.software/docs/introduction/
    - > Introduction
    > Glean is a system for working with facts about source code. It is designed for collecting and storing detailed information about code structure, and providing access to the data to power tools and experiences from online IDE features to offline code analysis.
    >
    > For example, Glean could answer all the questions you'd expect your IDE to answer, accurately and efficiently on a large-scale codebase. Things like:
    >
    > - Where is the definition of this method?
    > - Where are all the callers of this function?
    > - Who inherits from this class?
    > - What are all the declarations in this file?
    >
    > But Glean isn't limited to storing particular kinds of data, or answering particular queries. Glean comes with indexers and schemas for some languages which support queries like the examples above, but you can also define your own schemas and store whatever data you like, perhaps augmenting the data that existing indexers collect. So, for example, you could store test coverage data or profiling data.
    >
    > Glean's powerful query language means that you can build tools around complex queries of the underlying data. For example, you could search for dead code, write code linters, API migration tools or refactoring tools, all by using Glean queries instead of a compiler API to inspect the code structure.
    - https://github.com/facebookincubator/glean
    - > System for collecting, deriving and working with facts about source code.
    - https://github.com/obi1kenobi/trustfall
    - > Trustfall — Engine for Querying (Almost) Everything
    > Trustfall is a query engine for querying any kind of data source, from APIs and databases to any kind of files on disk — and even AI models.
    - > A query engine for any combination of data sources. Query your files and APIs as if they were databases!
    - https://github.com/obi1kenobi/trustfall#try-trustfall-in-your-browser
    - > Try Trustfall in your browser
    > The Trustfall Playground supports running queries against public data sources
    - https://github.com/obi1kenobi/trustfall#10min-tech-talk--demo
    - > 10min tech talk + demo
    > Trustfall was featured in the "How to Query (Almost) Everything" talk at the HYTRADBOI 2022 conference.
    - https://www.hytradboi.com/2022/how-to-query-almost-everything
    - AST Query Libs
    - https://github.com/estools/esquery
    - > ESQuery
    > ECMAScript AST query library
    - > ESQuery is a library for querying the AST output by Esprima for patterns of syntax using a CSS style selector system
    - https://github.com/rse/astq
    - > ASTq
    > Abstract Syntax Tree (AST) Query Engine
    - > ASTq is an Abstract Syntax Tree (AST) query engine library for JavaScript, i.e., it allows you to query nodes of an arbitrary AST-style hierarchical data structure with the help of a powerful XPath-inspired query language. ASTq can operate on arbitrary AST-style data structures through the help of pluggable access adapters.
    - https://github.com/rse/asty-astq
    - > ASTy-ASTq
    > Abstract Syntax Tree With Integrated Query Engine
    - > ASTy-ASTq, as its name implies, is a combination of the Abstract Syntax Tree (AST) Data Structure library ASTy and the Abstract Syntax Tree (AST) Query Engine Library ASTq. Technically ASTy-ASTq is a super-class of ASTy while it internally integrates ASTq so that each ASTy node has additional compile, execute and query methods available.
    - https://github.com/rse/asty
    - > ASTy
    > Abstract Syntax Tree (AST) Data Structure
    - > ASTy is an Abstract Syntax Tree (AST) Data Structure library for JavaScript, i.e., it provides a hierarchical data structure for holding the syntax abstraction of an arbitrary formal language. It is usually used in combination with a parser generator like PEG.js (and then especially with its utility class PEGUtil) to carry the results of the parsing step and to provide the vehicle for further processing those results.
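As a toy illustration of what AST query engines like these do conceptually — walking an AST-shaped object tree and collecting the nodes that match a pattern — a minimal matcher might look like this (the tree below is a hand-written ESTree-style fragment, not the output of a real parser, and the helper names are made up for the example):

```javascript
// Recursively walk an AST-shaped object and collect nodes matching a predicate.
function queryAst(node, predicate, hits = []) {
  if (node && typeof node === 'object') {
    if (predicate(node)) hits.push(node);
    for (const value of Object.values(node)) {
      if (Array.isArray(value)) value.forEach((v) => queryAst(v, predicate, hits));
      else queryAst(value, predicate, hits);
    }
  }
  return hits;
}

// A hand-written ESTree-style fragment for `const x = 1;` plus a function.
const program = {
  type: 'Program',
  body: [
    { type: 'VariableDeclaration', kind: 'const',
      declarations: [{ type: 'VariableDeclarator',
        id: { type: 'Identifier', name: 'x' },
        init: { type: 'Literal', value: 1 } }] },
    { type: 'FunctionDeclaration',
      id: { type: 'Identifier', name: 'double' },
      params: [{ type: 'Identifier', name: 'n' }],
      body: { type: 'BlockStatement', body: [] } },
  ],
};

// Equivalent in spirit to the esquery selector `Identifier`:
const names = queryAst(program, (n) => n.type === 'Identifier').map((n) => n.name);
console.log(names); // → [ 'x', 'double', 'n' ]
```

The real libraries add a selector language (CSS-style for esquery, XPath-inspired for ASTq) on top of exactly this kind of traversal.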

    ## My ChatGPT Research / Conversations

  28. @0xdevalias 0xdevalias revised this gist Jun 9, 2024. 1 changed file with 96 additions and 0 deletions.
    96 changes: 96 additions & 0 deletions _deobfuscating-unminifying-obfuscated-web-app-code.md
    @@ -1698,6 +1698,42 @@ In addition to the links directly below, also make sure to check out the various
    > - Symbolic execution and automated theorem provers have limitations on the classes of constraints they can represent and solve. For example, a theorem prover based on linear arithmetic will be unable to cope with the nonlinear path condition xy = 6. Any time that such constraints arise, the symbolic execution may substitute the current concrete value of one of the variables to simplify the problem. An important part of the design of a concolic testing system is selecting a symbolic representation precise enough to represent the constraints of interest.
    - https://en.wikipedia.org/wiki/Concolic_testing#Tools
    - Jalangi is an open-source concolic testing and symbolic execution tool for JavaScript. Jalangi supports integers and strings.
    - https://en.wikipedia.org/wiki/Constraint_logic_programming
    - > Constraint logic programming
    - > Constraint logic programming is a form of constraint programming, in which logic programming is extended to include concepts from constraint satisfaction. A constraint logic program is a logic program that contains constraints in the body of clauses.
    - https://en.wikipedia.org/wiki/Automated_theorem_proving
    - > Automated theorem proving
    - > Automated theorem proving (also known as ATP or automated deduction) is a subfield of automated reasoning and mathematical logic dealing with proving mathematical theorems by computer programs.
    - https://github.com/ksluckow/awesome-symbolic-execution
    - > Awesome Symbolic Execution
    > A curated list of awesome symbolic execution resources including essential research papers, lectures, videos, and tools.
    - https://angr.io/
    - > angr
    > angr is an open-source binary analysis platform for Python. It combines both static and dynamic symbolic ("concolic") analysis, providing tools to solve a variety of tasks.
    - > Features:
    > - Symbolic Execution: Provides a powerful symbolic execution engine, constraint solving, and instrumentation.
    > - Control-Flow Graph Recovery: Provides advanced analysis techniques for control-flow graph recovery.
    > - Disassembly & Lifting: Provides convenient methods to disassemble code and lift to an intermediate language.
    > - Decompilation: Decompile machine code to angr Intermediate Language (AIL) and C pseudocode.
    > - Architecture Support: Supports analysis of several CPU architectures, loading from several executable formats.
    > - Extensibility: Provides powerful extensibility for analyses, architectures, platforms, exploration techniques, hooks, and more.
    - https://docs.angr.io/en/latest/
    - > Welcome to angr’s documentation!
    - > Welcome to angr’s documentation! This documentation is intended to be a guide for learning angr, as well as a reference for the API.
    - https://angr.io/blog/
    - https://github.com/angr
    - > angr: Next-generation binary analysis framework!
    - https://github.com/angr/angr
    - > angr
    - > A powerful and user-friendly binary analysis platform!
    - https://github.com/angr/angr-management
    - > angr Management
    - > The official angr GUI
    - https://github.com/angr/cle
    - > CLE
    - > CLE Loads Everything (at least, many binary formats!)
    - > CLE loads binaries and their associated libraries, resolves imports and provides an abstraction of process memory the same way as if it was loader by the OS's loader.
    - https://github.com/angr/cle#usage-example
    - https://github.com/Z3Prover/z3
    - > The Z3 Theorem Prover
    - https://github.com/Z3Prover/z3/wiki
    @@ -1747,6 +1783,66 @@ In addition to the links directly below, also make sure to check out the various
    > - an analysis to profile object allocation and usage,
    > - a simple form of taint analysis,
    > - an experimental pure symbolic execution engine (currently undocumented)
    - https://github.com/ExpoSEJS/ExpoSE
    - > ExpoSE
    - > A Dynamic Symbolic Execution (DSE) engine for JavaScript. ExpoSE is highly scalable, compatible with recent JavaScript standards, and supports symbolic modelling of strings and regular expressions.
    - > ExpoSE is a dynamic symbolic execution engine for JavaScript, developed at Royal Holloway, University of London by Blake Loring, Duncan Mitchell, and Johannes Kinder (now at LMU Munich). ExpoSE supports symbolic execution of Node.js programs and JavaScript in the browser. ExpoSE is based on Jalangi2 and the Z3 SMT solver.
    - https://dl.acm.org/doi/10.1145/2635868.2635913
    - > SymJS: automatic symbolic testing of JavaScript web applications (2014)
    - > We present SymJS, a comprehensive framework for automatic testing of client-side JavaScript Web applications. The tool contains a symbolic execution engine for JavaScript, and an automatic event explorer for Web pages. Without any user intervention, SymJS can automatically discover and explore Web events, symbolically execute the associated JavaScript code, refine the execution based on dynamic feedbacks, and produce test cases with high coverage. The symbolic engine contains a symbolic virtual machine, a string-numeric solver, and a symbolic executable DOM model. SymJS's innovations include a novel symbolic virtual machine for JavaScript Web, symbolic+dynamic feedback directed event space exploration, and dynamic taint analysis for enhancing event sequence construction. We illustrate the effectiveness of SymJS on standard JavaScript benchmarks and various real-life Web applications. On average SymJS achieves over 90% line coverage for the benchmark programs, significantly outperforming existing methods.
    - https://github.com/javert2/JaVerT2.0
    - > JaVerT2.0 - Compositional Symbolic Execution for JavaScript
    - > JaVerT: JavaScript Verification Toolchain
    > JaVerT (pronounced [ʒavɛʁ]) is a toolchain for semi-automatic verification of functional correctness properties of JavaScript programs. It is based on separation logic.
    - > Deprected - Please use Gillian-JS instead
    > We've built a generalised version of JaVerT2.0 called Gillian, which is currently hosted at https://github.com/GillianPlatform/Gillian
    - https://github.com/GillianPlatform/Gillian
    - > The Gillian Platform main repository
    - https://gillianplatform.github.io/
    - https://gillianplatform.github.io/sphinx/c/index.html
    - > Gillian-C
    - > Gillian-C is the instantiation of Gillian to the C language (CompCert-C, to be precise). It can be found in the Gillian-C folder of the repository.
    - https://gillianplatform.github.io/sphinx/js/index.html
    - > Gillian-JS
    - > Gillian-JS is the instantiation of Gillian to JavaScript (ECMAScript 5 Strict), found in the Gillian-JS folder of the repository.
    - > Danger: Gillian-JS is currently broken (see here).
    - https://github.com/GillianPlatform/Gillian/issues/113
    - > Gillian-JS is broken
    - https://github.com/GillianPlatform/Gillian/pull/229
    - > Fix JS
    - https://github.com/GillianPlatform/Gillian/issues/237
    - > Fix Amazon JS
    - https://github.com/GillianPlatform/Gillian/pull/238
    - > Fix Amazon JS verification
    - https://www.doc.ic.ac.uk/~pg/publications/FragosoSantos2019JaVerT.pdf
    - > JaVerT 2.0: Compositional Symbolic Execution for JavaScript (2019)
    - > We propose a novel, unified approach to the development of compositional symbolic execution tools, bridging the gap between classical symbolic execution and compositional program reasoning based on separation logic. Using this approach, we build JaVerT 2.0, a symbolic analysis tool for JavaScript that follows the language semantics without simplifications. JaVerT 2.0 supports whole-program symbolic testing, verification, and, for the first time, automatic compositional testing based on bi-abduction. The meta-theory underpinning JaVerT 2.0 is developed modularly, streamlining the proofs and informing the implementation. Our explicit treatment of symbolic execution errors allows us to give meaningful feedback to the developer during wholeprogram symbolic testing and guides the inference of resource of the bi-abductive execution. We evaluate the performance of JaVerT 2.0 on a number of JavaScript data-structure libraries, demonstrating: the scalability of our whole-program symbolic testing; an improvement over the state-of-the-art in JavaScript verification; and the feasibility of automatic compositional testing for JavaScript.
    - https://webblaze.cs.berkeley.edu/2010/kudzu/kudzu.pdf
    - > A Symbolic Execution Framework for JavaScript (2010)
    - > As AJAX applications gain popularity, client-side JavaScript code is becoming increasingly complex. However, few automated vulnerability analysis tools for JavaScript exist. In this paper, we describe the first system for exploring the execution space of JavaScript code using symbolic execution. To handle JavaScript code’s complex use of string operations, we design a new language of string constraints and implement a solver for it. We build an automatic end-to-end tool, Kudzu, and apply it to the problem of finding client-side code injection vulnerabilities. In experiments on 18 live web applications, Kudzu automatically discovers 2 previously unknown vulnerabilities and 9 more that were previously found only with a manually-constructed test suite.
    - https://www.code-intelligence.com/blog/using-symbolic-execution-fuzzing
    - > Why You Should Combine Symbolic Execution and Fuzzing
    - > As opposed to traditional fuzzers, which generate inputs without taking code structure into account, symbolic execution tools precisely capture the computation of each value. They use solvers at each branch to generate new inputs and thus to provide the precisely calculated input to cover all parts of code.
    - > Symbolic Execution Tools
    > - KLEE: KLEE is an open-source code testing instrument that runs on LLVM bitcode, a representation of the program created by the clang compiler. KLEE explores the program and generates test cases to reproduce any crashes it finds.
    > - > Driller
    > > Driller is a concolic execution tool. Concolic execution is a software testing technique that performs symbolic execution (using symbolic input values with sets of expressions, one expression per output variable) with concrete execution (testing on particular inputs) path. The advantage of this approach is that it can achieve high code coverage even in case of complex source code, but still maintain a high degree of scalability and speed.
    > >
    > > Driller uses selective concolic execution to explore only the paths that are found interesting by the fuzzer and to generate inputs for conditions (branches) that a fuzzer cannot satisfy. In other words, it leverages concolic execution to reach deeper program code but uses a feedback-driven/guided fuzzer to alleviate path explosion, which greatly increases the speed of the testing process.
    - > Although Driller marked significant research advances in the field of symbolic execution, it is still a highly specialized tool that requires expert knowledge to set up and run and uses up a lot of computational resources. So, how can the industry profit from the latest research?
    >
    > Driller and other symbolic or concolic execution tools can be paired with open-source fuzzing tools. Contrary to many traditional fuzzers, modern fuzzers such as AFL++ or libfuzzer do not just generate random inputs. Instead, they use intelligent algorithms to provide inputs that reach deeper into the code structure. Enhancing such fuzzers with concolic execution is highly effective in cases when fuzzing algorithms reach their limits.
    - https://github.com/AFLplusplus/AFLplusplus
    - > American Fuzzy Lop plus plus (AFL++)
    - > The fuzzer afl++ is afl with community patches, qemu 5.1 upgrade, collision-free coverage, enhanced laf-intel & redqueen, AFLfast++ power schedules, MOpt mutators, unicorn_mode, and a lot more!
    - https://aflplus.plus/
    - > AFL++ Overview
    > AFLplusplus is the daughter of the American Fuzzy Lop fuzzer by Michał “lcamtuf” Zalewski and was created initially to incorporate all the best features developed in the years for the fuzzers in the AFL family and not merged in AFL cause it is not updated since November 2017.
    - https://llvm.org/docs/LibFuzzer.html
    - > libFuzzer – a library for coverage-guided fuzz testing
    - > LibFuzzer is an in-process, coverage-guided, evolutionary fuzzing engine.
    >
    > LibFuzzer is linked with the library under test, and feeds fuzzed inputs to the library via a specific fuzzing entrypoint (aka “target function”); the fuzzer then tracks which areas of the code are reached, and generates mutations on the corpus of input data in order to maximize the code coverage. The code coverage information for libFuzzer is provided by LLVM’s SanitizerCoverage instrumentation.
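The core idea behind Kudzu/KLEE-style symbolic execution — enumerate the feasible paths through a program, then solve each path's accumulated branch constraints for a concrete input that exercises it — can be sketched in a few lines. This toy is purely illustrative: it brute-forces a small integer domain where a real engine would hand the constraints to an SMT solver (KLEE uses STP/Z3), and the `conds` list is a stand-in for branch conditions that a real engine would discover while executing the program.

```python
from itertools import product

def explore(conditions, domain=range(-1000, 1001)):
    # Each path through the program is a tuple of outcomes, one per branch
    # condition. For every path, search for a concrete input satisfying that
    # path's constraints (a real engine would query an SMT solver instead
    # of brute-forcing a small domain like this).
    paths = {}
    for outcomes in product([True, False], repeat=len(conditions)):
        for x in domain:
            if all(cond(x) == want for cond, want in zip(conditions, outcomes)):
                paths[outcomes] = x  # concrete input driving this path
                break
    return paths

# Two hypothetical branch conditions over a single integer input.
conds = [lambda x: x > 10, lambda x: x % 7 == 3]
solutions = explore(conds)
for path, x in solutions.items():
    print(path, "->", x)
```

All four branch combinations are satisfiable here, so `explore` returns one witness input per path — the same "one test case per discovered path" output shape that KLEE produces.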
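The coverage-guided loop that AFL++ and libFuzzer build on — mutate a corpus entry, keep mutants that reach previously-unseen coverage, stop on a crash — also reduces to a short sketch. Everything here is invented for illustration (the `target`, the hand-written `coverage` function standing in for compiler-inserted instrumentation, the magic `FUZ` prefix) and bears no relation to either tool's actual internals:

```python
import random

def target(data: bytes) -> None:
    # Toy "program under test": crashes only on a specific 3-byte prefix.
    if len(data) > 2 and data[0] == ord("F") and data[1] == ord("U") and data[2] == ord("Z"):
        raise RuntimeError("crash")

def coverage(data: bytes) -> frozenset:
    # Stand-in for coverage instrumentation: report which of the target's
    # branch conditions this input satisfied.
    hit = set()
    if len(data) > 0 and data[0] == ord("F"):
        hit.add("saw F")
        if len(data) > 1 and data[1] == ord("U"):
            hit.add("saw FU")
    return frozenset(hit)

def fuzz(seed: bytes = b"AAA", max_iters: int = 200_000):
    # Minimal coverage-guided loop: mutate a corpus entry, keep the mutant
    # only if it reaches new coverage, and stop when the target crashes.
    rng = random.Random(1234)
    corpus = [seed]
    seen = {coverage(seed)}
    for _ in range(max_iters):
        parent = rng.choice(corpus)
        mutant = bytearray(parent)
        mutant[rng.randrange(len(mutant))] = rng.randrange(256)
        mutant = bytes(mutant)
        try:
            target(mutant)
        except RuntimeError:
            return mutant  # crashing input found
        cov = coverage(mutant)
        if cov not in seen:  # new coverage => worth keeping in the corpus
            seen.add(cov)
            corpus.append(mutant)
    return None

crasher = fuzz()
print(crasher)
```

A purely random fuzzer would need on the order of 256³ tries to stumble on the full prefix; keeping the partially-matching inputs in the corpus lets the search climb one byte at a time, which is exactly the intuition behind coverage feedback.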

    ## Profiling
