import torch

def multi_dim_cross_entropy_test():
    """CrossEntropyLoss needs the shapes out of prediction order: rather than
    (bs, seq_len, num_labels), it needs:
        input:  (bs, num_labels, seq_len)
        target: (bs, seq_len), with one class index in 0..num_labels-1 per position
    """
    # Example of target with class indices
    torch.manual_seed(0)
    loss = torch.nn.CrossEntropyLoss(reduction='none')
    bs, num_labels, seq_len = 2, 5, 4
    logits = torch.randn(bs, num_labels, seq_len, requires_grad=True)
    target = torch.randint(num_labels, (bs, seq_len))
    output = loss(logits, target)  # per-position losses, shape (bs, seq_len)
    assert output.shape == (bs, seq_len)
    return output
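
If the logits come out of a model in (bs, seq_len, num_labels) order, as sequence models typically emit them, permuting the last two dimensions gives the layout CrossEntropyLoss expects. A minimal sketch under that assumption; the shape names match the docstring above, and everything besides the loss setup is illustrative, not from the original snippet:

import torch

torch.manual_seed(0)
loss = torch.nn.CrossEntropyLoss(reduction='none')
bs, seq_len, num_labels = 2, 4, 5
logits = torch.randn(bs, seq_len, num_labels)      # model-output order
target = torch.randint(num_labels, (bs, seq_len))  # one class index per position

# (bs, seq_len, num_labels) -> (bs, num_labels, seq_len)
per_position = loss(logits.permute(0, 2, 1), target)  # shape (bs, seq_len)

# Equivalent: flatten every position into the batch dimension
flat = loss(logits.reshape(-1, num_labels), target.reshape(-1)).view(bs, seq_len)
assert torch.allclose(per_position, flat)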
sort ~/.bash_history | uniq | awk '{print ": :0:;"$0}' >> ~/.zsh_history
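
One caveat: zsh's extended-history format is `: <start>:<elapsed>;<command>`, so the `": :0:;"` prefix above leaves the timestamp empty and adds a stray colon that zsh may not parse. If imported entries come out garbled, a variant such as `awk '{print ": "systime()":0;"$0}'` writes a real epoch timestamp instead (this assumes GNU awk, which provides systime()). Note also that `sort | uniq` deduplicates at the cost of the original chronological order.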
model.zero_grad()                                   # Reset gradients tensors
for i, (inputs, labels) in enumerate(training_set):
    predictions = model(inputs)                     # Forward pass
    loss = loss_function(predictions, labels)       # Compute loss function
    loss = loss / accumulation_steps                # Normalize our loss (if averaged)
    loss.backward()                                 # Backward pass
    if (i+1) % accumulation_steps == 0:             # Wait for several backward steps
        optimizer.step()                            # Now we can do an optimizer step
        model.zero_grad()                           # Reset gradients tensors
        if (i+1) % evaluation_steps == 0:           # Evaluate the model when we...
            evaluate_model()                        # ...have no gradients accumulated
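
A self-contained version of the same gradient-accumulation pattern, runnable as-is: the linear model, MSE loss, optimizer, and synthetic batches below are stand-ins invented for illustration, not part of the original snippet.

import torch

torch.manual_seed(0)
model = torch.nn.Linear(10, 1)                      # toy stand-in model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_function = torch.nn.MSELoss()
accumulation_steps = 4

# Synthetic training set: 16 batches of (inputs, labels)
training_set = [(torch.randn(8, 10), torch.randn(8, 1)) for _ in range(16)]

model.zero_grad()                                   # Reset gradient tensors
for i, (inputs, labels) in enumerate(training_set):
    predictions = model(inputs)                     # Forward pass
    loss = loss_function(predictions, labels)       # Compute loss
    loss = loss / accumulation_steps                # Normalize (loss is batch-averaged)
    loss.backward()                                 # Gradients accumulate across calls
    if (i + 1) % accumulation_steps == 0:           # Every accumulation_steps batches...
        optimizer.step()                            # ...take one optimizer step
        model.zero_grad()                           # ...and reset gradients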