@akshatvishu
akshatvishu / my_paddle_scale.py
Created September 25, 2023 18:37
try to mimic paddle.scale
import paddle


def custom_scale(x, scale=1.0, bias=0.0, bias_after_scale=True, act=None, name=None):
    original_dtype = x.dtype
    if original_dtype in [paddle.int16, paddle.int8, paddle.uint8]:
        x = paddle.cast(x, dtype=paddle.int32)
    # TODO: may need extra logic when `scale` is passed as a tensor,
    # because `x` dtype and `scale` dtype need to match!
    if not isinstance(scale, paddle.Tensor):
        scale = paddle.to_tensor(scale, dtype=x.dtype)
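The preview cuts off before any computation happens. Below is a minimal sketch (my addition, not the rest of the gist) of how the body might be completed, assuming the documented `paddle.scale` semantics: `out = scale * x + bias` when `bias_after_scale=True`, otherwise `out = scale * (x + bias)`. The bias handling and the cast back to the original dtype are my assumptions; activation handling is omitted.

```python
import paddle


def custom_scale_sketch(x, scale=1.0, bias=0.0, bias_after_scale=True):
    # Hypothetical completion of custom_scale above (act/name handling omitted).
    original_dtype = x.dtype
    if original_dtype in [paddle.int16, paddle.int8, paddle.uint8]:
        x = paddle.cast(x, dtype=paddle.int32)
    if not isinstance(scale, paddle.Tensor):
        scale = paddle.to_tensor(scale, dtype=x.dtype)
    bias = paddle.to_tensor(bias, dtype=x.dtype)  # keep operand dtypes consistent
    out = scale * x + bias if bias_after_scale else scale * (x + bias)
    # Assumption: the caller expects the result in the original input dtype.
    return paddle.cast(out, original_dtype) if out.dtype != original_dtype else out
```

For example, `custom_scale_sketch(paddle.to_tensor([1, 2, 3]), scale=2.0, bias=1.0)` should give `[3, 5, 7]`.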
@akshatvishu
akshatvishu / format.py
Created September 15, 2023 20:19
format def gradients for `tokyo` (ivy)
from typing import Callable, Union

import ivy


def gradients(
    y: Union[ivy.Array, ivy.NativeArray],
    x: Union[ivy.Array, ivy.NativeArray],
    grad_y: Union[ivy.Array, ivy.NativeArray] = None,
    name: str = "gradients",
    gate_gradients: bool = False,
    aggregation_method: Callable = None,
    stop_gradients: Union[ivy.Array, ivy.NativeArray] = None,
) -> ivy.Array:
    """
@akshatvishu
akshatvishu / llma2
Created July 28, 2023 13:59
mega quick summary
```md
# quick summary
Llama 2 is a large language model developed by Meta AI and an improvement over its predecessor, Llama 1. The small Llama 1 models (7B & 13B) were trained on 1T tokens, while the large ones saw 1.4T tokens. In contrast, all Llama 2 models were trained on 2T tokens, which gives the small models notably strong performance.
As a result of these longer training runs, Llama 2 beats the other major open-source models on most academic benchmarks. The 7B model performs significantly better than other 7B options on every task except code. Against closed-source models, Llama2-70B is competitive with PaLM; Llama 2 is not as good as GPT-3.5 at code but is probably comparable otherwise, and it loses to the (reputedly much larger) PaLM 2 and GPT-4 on common benchmarks.
Meta also released Llama2-chat, which was created using a high-effort instruction tuning strategy. They acquired 28K human-labeled instruction responses from a commercial dataset/label vendor and used them to fine-tune
@akshatvishu
akshatvishu / gist:eaad7e2bcf829ea867dde4c0c232af89
Last active June 13, 2023 09:29
Naive erfc implementation which follows `scipy.special.erfc`
import math
import sys

import scipy.special as sp


def erfc(x):
    c = 0.564189583547756  # 1 / sqrt(pi)
    # Coefficients of the rational approximations used for different ranges of |x|.
    a = [7.71058495001320e-05, -0.00133732772997339, 0.0323076579225834, 0.0479137145607681, 0.128379167095513]
    b = [0.00301048631703895, 0.0538971678740286, 0.375795757275549]
    p = [-1.36864857382717e-07, 0.564195517478974, 7.21175825088309, 43.1622272220567, 152.989285046940, 339.320816734344, 451.918953711873, 300.459261020162]
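Only the coefficient tables make it into the preview. A small harness like the one below (my addition, not part of the gist) is a convenient way to sanity-check any hand-rolled `erfc` against the references it is meant to follow, here simply comparing `math.erfc` with `scipy.special.erfc`:

```python
import math

import numpy as np
import scipy.special as sp

xs = np.linspace(-5.0, 5.0, 101)
worst = max(abs(math.erfc(float(x)) - sp.erfc(float(x))) for x in xs)
print(f"max |math.erfc - scipy.special.erfc| on [-5, 5]: {worst:.3e}")
```

Swapping `math.erfc` for the gist's `erfc` checks the custom implementation the same way.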
@akshatvishu
akshatvishu / grokking_to_leetcode.md
Created January 30, 2023 09:29 — forked from tykurtz/grokking_to_leetcode.md
Grokking the coding interview equivalent leetcode problems

GROKKING NOTES

I liked the way Grokking the Coding Interview organizes problems into learnable patterns. However, the course is expensive, and most of the time the problems are copy-pasted from LeetCode. Since the explanations on LeetCode are usually just as good, the course really boils down to a glorified, curated list of LeetCode problems.

So below is a list of LeetCode problems that match the Grokking problems as closely as possible.

Pattern: Sliding Window
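As a concrete illustration of the pattern (my own example, not part of the forked list), the canonical fixed-size sliding-window exercise is the maximum sum of any contiguous subarray of length `k`:

```python
def max_subarray_sum(nums, k):
    """Fixed-size sliding window: max sum of any contiguous subarray of length k."""
    if not 1 <= k <= len(nums):
        raise ValueError("k must be between 1 and len(nums)")
    window_sum = sum(nums[:k])
    best = window_sum
    for i in range(k, len(nums)):
        # Slide right: add the incoming element, drop the one leaving the window.
        window_sum += nums[i] - nums[i - k]
        best = max(best, window_sum)
    return best


print(max_subarray_sum([2, 1, 5, 1, 3, 2], 3))  # -> 9, from the window [5, 1, 3]
```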