Pham Quang Nhat Minh minhpqn

minhpqn / LLMs.md
Created January 4, 2023 08:28 — forked from yoavg/LLMs.md

Some remarks on Large Language Models

Yoav Goldberg, January 2023

Audience: I assume you have heard of chatGPT, maybe played with it a little, and were impressed by it (or tried very hard not to be). And that you have also heard that it is "a large language model". And maybe that it "solved natural language understanding". Here is a short personal perspective on my thoughts about this (and similar) models, and where we stand with respect to language understanding.

Intro

Around 2014-2017, right within the rise of neural-network based methods for NLP, I was giving a semi-academic-semi-popsci lecture, revolving around the story that achieving perfect language modeling is equivalent to being as intelligent as a human. Somewhere around the same time I was also asked in an academic panel "what would you do if you were given infinite compute and no need to worry about labour costs" to which I cockily responded "I would train a really huge language model, just to show that it doesn't solve everything!". We

minhpqn / video-subtitles-via-whisper.py
Created September 27, 2022 04:40 — forked from rasbt/video-subtitles-via-whisper.py
Script that creates subtitles (closed captions) for all MP4 video files in your current directory
# Sebastian Raschka 09/24/2022
# Create a new conda environment and packages
# conda create -n whisper python=3.9
# conda activate whisper
# conda install mlxtend -c conda-forge
# Install ffmpeg
# macOS & homebrew
# brew install ffmpeg
# Ubuntu
# sudo apt install ffmpeg
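The preview cuts off before the script body. As a small, hedged sketch of one piece such a script needs (the helper name `srt_timestamp` is hypothetical, not from the gist): converting a segment time in seconds into the `HH:MM:SS,mmm` stamp the SRT subtitle format requires.

```python
# Hypothetical helper (not from the original gist): format a time offset in
# seconds as an SRT timestamp, e.g. for whisper transcription segments.
def srt_timestamp(seconds: float) -> str:
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)   # 3,600,000 ms per hour
    minutes, ms = divmod(ms, 60_000)    # 60,000 ms per minute
    secs, ms = divmod(ms, 1_000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"
```

Note that SRT uses a comma before the milliseconds field, unlike WebVTT's period.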
minhpqn / zip_foler.py
Created August 8, 2022 06:44
Zip folder in SJIS Encoding
import os
import zipfile
from zipcp932patch import zipcp932patch

def zipdir(path, output_path):
    # ziph is zipfile handle
    with zipcp932patch, zipfile.ZipFile(output_path, 'w') as ziph:
        for root, dirs, files in os.walk(path):
            for file in files:
                full_path = os.path.join(root, file)
                ziph.write(full_path, os.path.relpath(full_path, path))
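For comparison, a minimal stdlib-only version of the same folder walk, without the cp932 patch (so non-ASCII names are stored as UTF-8, which tools expecting Shift-JIS names on Japanese Windows may garble):

```python
import os
import zipfile

def zipdir_plain(path, output_path):
    # Walk the folder and store each file under its path relative to `path`.
    with zipfile.ZipFile(output_path, 'w') as ziph:
        for root, dirs, files in os.walk(path):
            for file in files:
                full_path = os.path.join(root, file)
                ziph.write(full_path, os.path.relpath(full_path, path))
```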

Quick Start

sudo curl https://gist.github.com/pankaj28843/3ad78df6290b5ba931c1/raw/soffice.sh > /usr/local/bin/soffice && sudo chmod +x /usr/local/bin/soffice

Create a bash script at /usr/local/bin/soffice with the following content

#!/bin/bash

# Need to do this because symlink won't work
minhpqn / tmux-cheatsheet.markdown
Created December 29, 2021 09:08 — forked from MohamedAlaa/tmux-cheatsheet.markdown
tmux shortcuts & cheatsheet

start new:

tmux

start new with session name:

tmux new -s myname
minhpqn / handle_missing_values.py
Last active October 19, 2020 08:36
Handling missing values for both numeric and non-numeric data
# Source: https://stackoverflow.com/questions/25239958/impute-categorical-missing-values-in-scikit-learn
import pandas as pd
import numpy as np
from sklearn.base import TransformerMixin

class DataFrameImputer(TransformerMixin):
    """Impute missing values: most frequent value for columns of dtype
    object, column mean for columns of other (numeric) types."""
    def fit(self, X, y=None):
        self.fill = pd.Series([X[c].value_counts().index[0] if X[c].dtype == np.dtype('O')
                               else X[c].mean() for c in X], index=X.columns)
        return self
    def transform(self, X, y=None):
        return X.fillna(self.fill)
minhpqn / dict_vectorizer_example.py
Created October 7, 2020 08:20
Using DictVectorizer in scikit-learn
# Reference: https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.DictVectorizer.html
from sklearn.feature_extraction import DictVectorizer
v = DictVectorizer(sparse=False)
D = [
{'foo': 1, 'bar': 2, 'A': '中'}, {'foo': 3, 'baz': 1, 'A': '大'}
]
X = v.fit_transform(D)
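To make the behaviour concrete without running sklearn, here is a hedged, stdlib-only illustration of the transformation DictVectorizer performs (my own sketch, not sklearn's implementation): string values become one-hot "key=value" features, numeric values pass through, and feature names are sorted.

```python
def dict_vectorize(dicts):
    # Collect feature names: "key=value" for strings, the plain key for numbers.
    features = sorted({f"{k}={v}" if isinstance(v, str) else k
                       for d in dicts for k, v in d.items()})
    rows = []
    for d in dicts:
        row = [0.0] * len(features)
        for k, v in d.items():
            name = f"{k}={v}" if isinstance(v, str) else k
            row[features.index(name)] = 1.0 if isinstance(v, str) else float(v)
        rows.append(row)
    return features, rows
```

On the D above this yields features ['A=中', 'A=大', 'bar', 'baz', 'foo'] and rows [[1, 0, 2, 0, 1], [0, 1, 0, 1, 3]], which should match what v.fit_transform(D) produces.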
minhpqn / http_download.py
Last active July 30, 2020 04:55
Download all files from a http directory
import os
import argparse
import requests
from time import sleep
from logzero import logger
from bs4 import BeautifulSoup

def listFD(url, ext=''):
    page = requests.get(url).text
    soup = BeautifulSoup(page, 'html.parser')
    return [url + '/' + node.get('href') for node in soup.find_all('a')
            if node.get('href', '').endswith(ext)]
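The link extraction can also be done with the stdlib alone; here is a hypothetical BeautifulSoup-free variant (`extract_links` is my name, not the gist's), which is easy to test on a static HTML string:

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    # Collects every href attribute found on <a> tags, in document order.
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value:
                    self.links.append(value)

def extract_links(html, ext=''):
    parser = LinkCollector()
    parser.feed(html)
    return [h for h in parser.links if h.endswith(ext)]
```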
minhpqn / quantize.py
Created July 8, 2020 08:55 — forked from aleju/quantize.py
Simple quantization function for python
def quantize(val, to_values):
    """Quantize a value with regards to a set of allowed values.
    Examples:
        quantize(49.513, [0, 45, 90]) -> 45
        quantize(43, [0, 10, 20, 30]) -> 30
    Note: function doesn't assume to_values to be sorted and
    iterates over all values (i.e. is rather slow).
    """
    best_match, best_match_diff = None, None
    for other_val in to_values:
        diff = abs(other_val - val)
        if best_match is None or diff < best_match_diff:
            best_match, best_match_diff = other_val, diff
    return best_match
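As the note says, the linear scan is slow. If the allowed values are kept sorted, a bisect-based variant (my sketch, not part of the original gist) finds the nearest value in O(log n); on ties it returns the lower value:

```python
import bisect

def quantize_sorted(val, to_values):
    # to_values must be sorted ascending.
    i = bisect.bisect_left(to_values, val)
    if i == 0:
        return to_values[0]
    if i == len(to_values):
        return to_values[-1]
    before, after = to_values[i - 1], to_values[i]
    # On an exact tie, prefer the lower value.
    return before if val - before <= after - val else after
```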
minhpqn / gradient_accumulation.py
Created June 15, 2020 16:07 — forked from thomwolf/gradient_accumulation.py
PyTorch gradient accumulation training loop
model.zero_grad()                                   # Reset gradients tensors
for i, (inputs, labels) in enumerate(training_set):
    predictions = model(inputs)                     # Forward pass
    loss = loss_function(predictions, labels)       # Compute loss function
    loss = loss / accumulation_steps                # Normalize our loss (if averaged)
    loss.backward()                                 # Backward pass
    if (i+1) % accumulation_steps == 0:             # Wait for several backward steps
        optimizer.step()                            # Now we can do an optimizer step
        model.zero_grad()                           # Reset gradients tensors
        if (i+1) % evaluation_steps == 0:           # Evaluate the model when we have no gradients accumulated
            evaluate_model()
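The arithmetic behind the trick can be sanity-checked without PyTorch: stepping once on K accumulated gradients of loss/K equals one step on the gradient of the mean loss over the whole batch. A toy sketch (all names here are illustrative, not from the gist), using a 1-D linear model with squared loss:

```python
def grad(w, x, y):
    # d/dw of 0.5 * (w*x - y)**2
    return (w * x - y) * x

def accumulated_step(w, batch, lr):
    # Mimics the loop above: normalize each gradient, accumulate, step once.
    acc = 0.0
    for x, y in batch:
        acc += grad(w, x, y) / len(batch)
    return w - lr * acc

def full_batch_step(w, batch, lr):
    # Single step on the gradient of the mean loss over the whole batch.
    g = sum(grad(w, x, y) for x, y in batch) / len(batch)
    return w - lr * g
```

Up to floating-point rounding, both functions return the same updated weight.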