I have taken quite some machine learning courses and have done a few projects already. I think I know the math formula involved in transformers and GPT models. However, I always wondered how they work in reality. The best way for me is to read and understand source codes implementing these models. I am a C/C++ programmer mostly. I am more comfortable to read C/C++ programs. So, recently I started to read, run, and debug ggml's gpt-2 inference example since ggml is entirely written in C and can run many transformer models on a laptop: https://github.com/ggerganov/ggml/tree/master/examples/gpt-2 . The famous llama.cpp is closely connected to this library. My experiment environment is a MacBook Pro laptop+ Visual Studio Code + cmake+ CodeLLDB (gdb does not work with my M2 chip), and GPT-2 117 M model. Here is what I have learned so far:
The high-level main function has the following structure https://github.com/ggerganov/ggml/blob/master/examples/gpt-2/main-backend.cpp
- load the model: ggml specific format us