@akshatvishu
akshatvishu / my_paddle_scale.py
Created September 25, 2023 18:37
try to mimic paddle.scale
import paddle


def custom_scale(x, scale=1.0, bias=0.0, bias_after_scale=True, act=None, name=None):
    original_dtype = x.dtype
    if original_dtype in [paddle.int16, paddle.int8, paddle.uint8]:
        x = paddle.cast(x, dtype=paddle.int32)
    # TODO: may need extra logic when `scale` is passed as a tensor,
    # because `x` dtype and `scale` dtype need to match!
    if not isinstance(scale, paddle.Tensor):
        scale = paddle.to_tensor(scale, dtype=x.dtype)
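The preview cuts off before any computation happens. Below is a minimal sketch (my addition, not the rest of the gist) of how the body might be completed, assuming the documented `paddle.scale` semantics: `out = scale * x + bias` when `bias_after_scale=True`, otherwise `out = scale * (x + bias)`. The bias handling and the cast back to the original dtype are my assumptions; activation handling is omitted.

```python
import paddle


def custom_scale_sketch(x, scale=1.0, bias=0.0, bias_after_scale=True):
    # Hypothetical completion of custom_scale above (act/name handling omitted).
    original_dtype = x.dtype
    if original_dtype in [paddle.int16, paddle.int8, paddle.uint8]:
        x = paddle.cast(x, dtype=paddle.int32)
    if not isinstance(scale, paddle.Tensor):
        scale = paddle.to_tensor(scale, dtype=x.dtype)
    bias = paddle.to_tensor(bias, dtype=x.dtype)  # keep operand dtypes consistent
    out = scale * x + bias if bias_after_scale else scale * (x + bias)
    # Assumption: the caller expects the result in the original input dtype.
    return paddle.cast(out, original_dtype) if out.dtype != original_dtype else out
```

For example, `custom_scale_sketch(paddle.to_tensor([1, 2, 3]), scale=2.0, bias=1.0)` should give `[3, 5, 7]`.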
@akshatvishu
akshatvishu / format.py
Created September 15, 2023 20:19
format def gradients for `tokyo` (ivy)
from typing import Callable, Union

import ivy


def gradients(
    y: Union[ivy.Array, ivy.NativeArray],
    x: Union[ivy.Array, ivy.NativeArray],
    grad_y: Union[ivy.Array, ivy.NativeArray] = None,
    name: str = "gradients",
    gate_gradients: bool = False,
    aggregation_method: Callable = None,
    stop_gradients: Union[ivy.Array, ivy.NativeArray] = None,
) -> ivy.Array:
    """
@akshatvishu
akshatvishu / llma2
Created July 28, 2023 13:59
mega quick summary
```md
# quick summary
Llama 2 is a large language model developed by Meta AI and an improvement over its predecessor, Llama 1. The small Llama 1 models (7B & 13B) were trained on 1T tokens, while the large ones saw 1.4T tokens. In contrast, all Llama 2 models were trained on 2T tokens, which gives the small models notably strong performance.
As a result of these longer training runs, Llama 2 beats the other major open-source models on most academic benchmarks. The 7B model performs significantly better than other 7B options on every task except code. Against closed-source models, Llama2-70B is competitive with PaLM; Llama 2 is not as good as GPT-3.5 at code but is probably comparable otherwise, and it loses to the (reputedly much larger) PaLM 2 and GPT-4 on common benchmarks.
Meta also released Llama2-chat, which was created using a high-effort instruction tuning strategy. They acquired 28K human-labeled instruction responses from a commercial dataset/label vendor and used them to fine-tune
@akshatvishu
akshatvishu / gist:eaad7e2bcf829ea867dde4c0c232af89
Last active June 13, 2023 09:29
Naive erfc implementation which follows `scipy.special.erfc`
import math
import sys

import scipy.special as sp


def erfc(x):
    c = 0.564189583547756  # 1 / sqrt(pi)
    # Coefficients of the rational approximations used for different ranges of |x|.
    a = [7.71058495001320e-05, -0.00133732772997339, 0.0323076579225834, 0.0479137145607681, 0.128379167095513]
    b = [0.00301048631703895, 0.0538971678740286, 0.375795757275549]
    p = [-1.36864857382717e-07, 0.564195517478974, 7.21175825088309, 43.1622272220567, 152.989285046940, 339.320816734344, 451.918953711873, 300.459261020162]
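Only the coefficient tables make it into the preview. A small harness like the one below (my addition, not part of the gist) is a convenient way to sanity-check any hand-rolled `erfc` against the references it is meant to follow, here simply comparing `math.erfc` with `scipy.special.erfc`:

```python
import math

import numpy as np
import scipy.special as sp

xs = np.linspace(-5.0, 5.0, 101)
worst = max(abs(math.erfc(float(x)) - sp.erfc(float(x))) for x in xs)
print(f"max |math.erfc - scipy.special.erfc| on [-5, 5]: {worst:.3e}")
```

Swapping `math.erfc` for the gist's `erfc` checks the custom implementation the same way.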
@akshatvishu
akshatvishu / grokking_to_leetcode.md
Created January 30, 2023 09:29 — forked from tykurtz/grokking_to_leetcode.md
Grokking the coding interview equivalent leetcode problems

GROKKING NOTES

I liked the way Grokking the Coding Interview organizes problems into learnable patterns. However, the course is expensive, and most of the time the problems are copy-pasted from LeetCode. Since the explanations on LeetCode are usually just as good, the course really boils down to a glorified, curated list of LeetCode problems.

So below is a list of LeetCode problems that match the Grokking problems as closely as possible.

Pattern: Sliding Window
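As a concrete illustration of the pattern (my own example, not part of the forked list), the canonical fixed-size sliding-window exercise is the maximum sum of any contiguous subarray of length `k`:

```python
def max_subarray_sum(nums, k):
    """Fixed-size sliding window: max sum of any contiguous subarray of length k."""
    if not 1 <= k <= len(nums):
        raise ValueError("k must be between 1 and len(nums)")
    window_sum = sum(nums[:k])
    best = window_sum
    for i in range(k, len(nums)):
        # Slide right: add the incoming element, drop the one leaving the window.
        window_sum += nums[i] - nums[i - k]
        best = max(best, window_sum)
    return best


print(max_subarray_sum([2, 1, 5, 1, 3, 2], 3))  # -> 9, from the window [5, 1, 3]
```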