Bootstrap knowledge of LLMs ASAP, with a bias toward GPT.
Avoid being a link dump: aim to provide only valuable, well-tuned information.
Neural network links come first, before starting on transformers.
from bitsandbytes.nn.modules import Linear8bitLt, Linear4bit
from contextlib import contextmanager
from torch.nn import init  # provides kaiming_uniform_, patched below

def noop(x=None, *args, **kwargs):
    "Do nothing"
    return x

@contextmanager
def no_kaiming():
    # Temporarily replace Kaiming initialization with a no-op,
    # restoring the original on exit.
    old_iku = init.kaiming_uniform_
    init.kaiming_uniform_ = noop
    try:
        yield
    finally:
        init.kaiming_uniform_ = old_iku
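The monkey-patching pattern behind `no_kaiming` can be shown self-contained, using a toy `init` namespace in place of `torch.nn.init` so it runs without torch (the names below are illustrative stand-ins, not from the original):

```python
from contextlib import contextmanager

class init:
    "Stand-in namespace mimicking torch.nn.init."
    @staticmethod
    def kaiming_uniform_(t):
        t[:] = [1.0] * len(t)  # pretend in-place initialization
        return t

def noop(x=None, *args, **kwargs):
    "Do nothing"
    return x

@contextmanager
def no_kaiming():
    # Swap the initializer for a no-op, restoring it on exit.
    old = init.kaiming_uniform_
    init.kaiming_uniform_ = noop
    try:
        yield
    finally:
        init.kaiming_uniform_ = old

t = [0.0, 0.0]
with no_kaiming():
    init.kaiming_uniform_(t)   # does nothing inside the block
assert t == [0.0, 0.0]
init.kaiming_uniform_(t)       # restored afterwards
assert t == [1.0, 1.0]
```

Layers constructed inside the `with` block skip their (expensive) random initialization — useful when the weights will be overwritten by a checkpoint load anyway.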
{-# LANGUAGE TypeSynonymInstances #-}

data Dual d = D Float d deriving Show

type Float' = Float

diff :: (Dual Float' -> Dual Float') -> Float -> Float'
diff f x = y'
  where D y y' = f (D x 1)

class VectorSpace v where
  zero :: v
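The forward-mode trick behind `diff` (seed the derivative slot with 1, apply the function, read the derivative back out) translates line-for-line into other languages; here is a minimal Python sketch, where the `Dual` class is an illustrative analogue of the Haskell `Dual d`, not part of the original code:

```python
class Dual:
    """Dual number a + b*eps with eps^2 = 0; b carries the derivative."""
    def __init__(self, a, b=0.0):
        self.a, self.b = a, b
    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.a + o.a, self.b + o.b)
    __radd__ = __add__
    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        # Product rule: (a + b*eps)(c + d*eps) = ac + (ad + bc)*eps
        return Dual(self.a * o.a, self.a * o.b + self.b * o.a)
    __rmul__ = __mul__

def diff(f, x):
    # Seed the derivative component with 1, as in `f (D x 1)`.
    return f(Dual(x, 1.0)).b

# d/dx (x*x + 3x) = 2x + 3, so at x = 2 this gives 7.
assert diff(lambda x: x * x + 3 * x, 2.0) == 7.0
```

No symbolic manipulation or finite differencing is involved: the derivative falls out of ordinary operator overloading, which is the whole appeal of the dual-number formulation.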
The Salesforce CodeGen models are a family of large language models trained on a large amount of natural-language data and then fine-tuned on specialized datasets of code. Models of 350M, 2B, 6B, and 16B parameters are provided in three flavors:
Twitter thread: https://twitter.com/theshawwn/status/1456925974919004165
Hacker News thread: https://news.ycombinator.com/item?id=29128998
November 6, 2021
jnp.device_put(1) is deceptively simple to write in JAX. But on a TPU, what actually happens? How does a tensor containing the value 1 actually get onto a TPU?
Turns out, the answer is "C++", and a lot of it.
#!/bin/bash
# Attempt to set up the Nvidia GeForce GT 710 on a Pi CM4.
#
# I have tried both armv7l and aarch64 versions of the proprietary driver, in
# addition to the nouveau open source driver (which needs to be compiled into
# a custom Raspberry Pi kernel).
#
# tl;dr - None of the drivers worked :P
This document was originally written several years ago. At the time I was working as an execution core verification engineer at Arm. The following points are coloured heavily by working in and around the execution cores of various processors. Apply a pinch of salt; points contain varying degrees of opinion.
It is still my opinion that RISC-V could be much better designed; though I will also say that if I was building a 32 or 64-bit CPU today I'd likely implement the architecture to benefit from the existing tooling.
Mostly based upon the RISC-V ISA spec v2.0; some updates have been made for v2.2.
The RISC-V ISA has pursued minimalism to a fault. There is a large emphasis on minimizing instruction count, normalizing encoding, etc. This pursuit of minimalism has resulted in false orthogonalities (such as reusing the same instruction for branches, calls and returns) and a requirement for superfluous instructions which impacts code density both in terms of size and
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE OverloadedStrings #-}

module Main where

import Control.Applicative
import Data.Attoparsec.Text as A
WAYLAND_PROTOCOLS=/usr/share/wayland-protocols

# wayland-scanner is a tool which generates C headers and rigging for Wayland
# protocols, which are specified in XML. wlroots requires you to rig these up
# to your build system yourself and provide them in the include path.
xdg-shell-protocol.h:
	wayland-scanner server-header \
		$(WAYLAND_PROTOCOLS)/stable/xdg-shell/xdg-shell.xml $@

xdg-shell-protocol.c: xdg-shell-protocol.h
{-# LANGUAGE BangPatterns #-}

import qualified Data.Vector as V
import System.CPUTime
import System.Environment
import Text.Printf

{- Implementation of the WROM algorithm for finding all
   free trees of a given order. The algorithm is explained
   here: