Farshid Varno fvarno

🏟️ Building something phenomenal ...
@fvarno
fvarno / gemm_tiling.py
Created February 12, 2024 16:25
GeMM with tiling
import torch

# multiply an MxN matrix by an NxK matrix
M, N, K = 20, 10, 30
A = torch.randn(M, N)
B = torch.randn(N, K)
untiled_res = torch.matmul(A, B)

# split the shared N dimension (and B's K dimension) into tiles of size 5
# A_ : (M, 1, N//tile_size, 1, tile_size), B_ : (K//tile_size, N//tile_size, tile_size, tile_size)
tile_size = 5
A_ = A.reshape(M, 1, N // tile_size, 1, tile_size)
B_ = B.reshape(N // tile_size, tile_size, K // tile_size, tile_size).permute(2, 0, 1, 3)
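The preview cuts off before the tiled product is actually formed; one plausible way to finish it (the batched matmul over tiles and the final check are my reconstruction, not lines taken from the gist):
# (M, 1, N//t, 1, t) @ (K//t, N//t, t, t) broadcasts to (M, K//t, N//t, 1, t);
# summing over the N//t axis accumulates the per-tile partial products
tiled_res = torch.matmul(A_, B_).sum(dim=2).reshape(M, K)
assert torch.allclose(untiled_res, tiled_res, atol=1e-5)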
@fvarno
fvarno / total_activation_size_mobilenetv2.py
Created September 7, 2023 13:42
Gives the total number of activations in MobileNetV2
import torch
from torchvision.models import MobileNetV2


def main():
    model = MobileNetV2()
    total = [0]

    # count every element of every module's output via forward hooks
    def _forward_hook(module, input, output):
        total[0] += output.numel()

    # note: named_modules() yields container modules too, so their outputs are counted as well
    for _, module in model.named_modules():
        module.register_forward_hook(_forward_hook)
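The preview stops before anything is run through the model; the continuation below is my guess at the missing lines (the 1x3x224x224 dummy input and the print are assumptions, not taken from the gist):
    # one dummy forward pass fires every hook, then report the accumulated count
    with torch.no_grad():
        model(torch.randn(1, 3, 224, 224))
    print(f"total activations: {total[0]}")


if __name__ == "__main__":
    main()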
@fvarno
fvarno / fix_gcp_nvidia_fail.sh
Created August 28, 2023 13:38
Fix GCP error "NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver."
# remove the broken driver packages first
sudo apt-get purge 'nvidia-*'
sudo apt-get update
sudo apt-get autoremove
# then stop and start the VM again
# reinstall the driver with the GCP compute-gpu-installation script
curl https://raw.githubusercontent.com/GoogleCloudPlatform/compute-gpu-installation/main/linux/install_gpu_driver.py --output install_gpu_driver.py
sudo python3 install_gpu_driver.py
# verify the fix with: nvidia-smi
@fvarno
fvarno / Matrix.md
Created June 15, 2023 20:14 — forked from nadavrot/Matrix.md
Efficient matrix multiplication

High-Performance Matrix Multiplication

This is a short post that explains how to write a high-performance matrix multiplication program on modern processors. In this tutorial I will use a single core of the Skylake-client CPU with AVX2, but the principles in this post also apply to other processors with different instruction sets (such as AVX512).

Intro

Matrix multiplication is a mathematical operation that defines the product of two matrices.
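As a baseline for what the post goes on to optimize, the definition written out as a naive triple loop (plain Python, my own illustration rather than anything from the forked post):
# textbook O(M*N*K) multiply: C[i][j] = sum over k of A[i][k] * B[k][j]
def naive_matmul(A, B):
    M, N, K = len(A), len(B), len(B[0])
    C = [[0.0] * K for _ in range(M)]
    for i in range(M):
        for j in range(K):
            for k in range(N):
                C[i][j] += A[i][k] * B[k][j]
    return C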

@fvarno
fvarno / jupyter.yaml
Created September 30, 2021 19:44
Polyaxon Jupyter service with an optional number of GPUs
version: 1.1
kind: operation
component:
  name: notebook
  inputs:
  - name: gpus
    isOptional: true
    type: int
    value: 0
  - name: image
@fvarno
fvarno / vscode.yaml
Created September 30, 2021 19:42
Polyaxon VS Code service with an optional number of GPUs
version: 1.1
kind: operation
component:
  name: vscode
  inputs:
  - name: context
    description: The workspace context, defaults to the current run's outputs
    isOptional: true
    type: str
@fvarno
fvarno / data_loader.py
Created November 5, 2018 13:51 — forked from kevinzakka/data_loader.py
Train, Validation and Test Split for torchvision Datasets
"""
Create train, valid, test iterators for CIFAR-10 [1].
Easily extended to MNIST, CIFAR-100 and Imagenet.
[1]: https://discuss.pytorch.org/t/feedback-on-pytorch-for-kaggle-competitions/2252/4
"""
import torch
import numpy as np
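The preview ends at the imports; below is a rough sketch of the split the docstring describes, using SubsetRandomSampler (the function name, the 0.1 validation fraction, and the ./data path are my assumptions, not the forked gist's actual code):
from torch.utils.data import DataLoader, SubsetRandomSampler
from torchvision import datasets, transforms


def get_cifar10_loaders(data_dir="./data", batch_size=64, valid_frac=0.1, seed=0):
    transform = transforms.ToTensor()
    train_set = datasets.CIFAR10(data_dir, train=True, download=True, transform=transform)
    test_set = datasets.CIFAR10(data_dir, train=False, download=True, transform=transform)

    # shuffle the training indices once (np is imported above), then carve off a validation slice
    indices = np.random.RandomState(seed).permutation(len(train_set)).tolist()
    split = int(valid_frac * len(train_set))
    valid_idx, train_idx = indices[:split], indices[split:]

    train_loader = DataLoader(train_set, batch_size=batch_size, sampler=SubsetRandomSampler(train_idx))
    valid_loader = DataLoader(train_set, batch_size=batch_size, sampler=SubsetRandomSampler(valid_idx))
    test_loader = DataLoader(test_set, batch_size=batch_size, shuffle=False)
    return train_loader, valid_loader, test_loader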
@fvarno
fvarno / meta-learning-timeline.csv
Last active October 17, 2018 22:05
meta-learning timeline
name or nickname,author(s),year,category,description
"Evolutionary principles in self-referential learning or on learning how to learn: the meta-meta",Schmidhuber,1987,?,?
"Meta-neural networks that learn by learning.",Naik et al.,1992,?,?