Pham Quang Nhat Minh minhpqn

minhpqn / LLMs.md
Created January 4, 2023 08:28 — forked from yoavg/LLMs.md

Some remarks on Large Language Models

Yoav Goldberg, January 2023

Audience: I assume you have heard of chatGPT, maybe played with it a little, and were impressed by it (or tried very hard not to be). And that you have also heard that it is "a large language model". And maybe that it "solved natural language understanding". Here is a short personal perspective on my thoughts about this (and similar) models, and where we stand with respect to language understanding.

Intro

Around 2014-2017, right within the rise of neural-network based methods for NLP, I was giving a semi-academic-semi-popsci lecture, revolving around the story that achieving perfect language modeling is equivalent to being as intelligent as a human. Somewhere around the same time I was also asked in an academic panel "what would you do if you were given infinite compute and no need to worry about labour costs" to which I cockily responded "I would train a really huge language model, just to show that it doesn't solve everything!". We

minhpqn / video-subtitles-via-whisper.py
Created September 27, 2022 04:40 — forked from rasbt/video-subtitles-via-whisper.py
Script that creates subtitles (closed captions) for all MP4 video files in your current directory
# Sebastian Raschka 09/24/2022
# Create a new conda environment and packages
# conda create -n whisper python=3.9
# conda activate whisper
# conda install mlxtend -c conda-forge
# Install ffmpeg
# macOS & homebrew
# brew install ffmpeg
# Ubuntu
# sudo apt install ffmpeg
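The preview cuts off before the script body. As a small, hedged sketch of one piece such a script needs (the helper name `srt_timestamp` is hypothetical, not from the gist): converting a segment time in seconds into the `HH:MM:SS,mmm` stamp the SRT subtitle format requires.

```python
# Hypothetical helper (not from the original gist): format a time offset in
# seconds as an SRT timestamp, e.g. for whisper transcription segments.
def srt_timestamp(seconds: float) -> str:
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)   # 3,600,000 ms per hour
    minutes, ms = divmod(ms, 60_000)    # 60,000 ms per minute
    secs, ms = divmod(ms, 1_000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"
```

Note that SRT uses a comma before the milliseconds field, unlike WebVTT's period.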
minhpqn / zip_foler.py
Created August 8, 2022 06:44
Zip folder in SJIS Encoding
import os
import zipfile
from zipcp932patch import zipcp932patch

def zipdir(path, output_path):
    # ziph is zipfile handle
    with zipcp932patch, zipfile.ZipFile(output_path, 'w') as ziph:
        for root, dirs, files in os.walk(path):
            for file in files:
                full_path = os.path.join(root, file)
                ziph.write(full_path, os.path.relpath(full_path, path))
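For comparison, a minimal stdlib-only version of the same folder walk, without the cp932 patch (so non-ASCII names are stored as UTF-8, which tools expecting Shift-JIS names on Japanese Windows may garble):

```python
import os
import zipfile

def zipdir_plain(path, output_path):
    # Walk the folder and store each file under its path relative to `path`.
    with zipfile.ZipFile(output_path, 'w') as ziph:
        for root, dirs, files in os.walk(path):
            for file in files:
                full_path = os.path.join(root, file)
                ziph.write(full_path, os.path.relpath(full_path, path))
```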

Quick Start

sudo curl https://gist.github.com/pankaj28843/3ad78df6290b5ba931c1/raw/soffice.sh > /usr/local/bin/soffice && sudo chmod +x /usr/local/bin/soffice

Create a bash script at /usr/local/bin/soffice with the following content

#!/bin/bash

# Need to do this because symlink won't work
minhpqn / tmux-cheatsheet.markdown
Created December 29, 2021 09:08 — forked from MohamedAlaa/tmux-cheatsheet.markdown
tmux shortcuts & cheatsheet

start new:

tmux

start new with session name:

tmux new -s myname
minhpqn / handle_missing_values.py
Last active October 19, 2020 08:36
Handling missing values for both numeric and non-numeric data
# Source: https://stackoverflow.com/questions/25239958/impute-categorical-missing-values-in-scikit-learn
import pandas as pd
import numpy as np
from sklearn.base import TransformerMixin

class DataFrameImputer(TransformerMixin):
    """Impute missing values: most frequent value for columns of dtype
    object, column mean for columns of other (numeric) types."""
    def fit(self, X, y=None):
        self.fill = pd.Series([X[c].value_counts().index[0] if X[c].dtype == np.dtype('O')
                               else X[c].mean() for c in X], index=X.columns)
        return self
    def transform(self, X, y=None):
        return X.fillna(self.fill)
minhpqn / dict_vectorizer_example.py
Created October 7, 2020 08:20
Using DictVectorizer in scikit-learn
# Reference: https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.DictVectorizer.html
from sklearn.feature_extraction import DictVectorizer
v = DictVectorizer(sparse=False)
D = [
{'foo': 1, 'bar': 2, 'A': '中'}, {'foo': 3, 'baz': 1, 'A': '大'}
]
X = v.fit_transform(D)
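To make the behaviour concrete without running sklearn, here is a hedged, stdlib-only illustration of the transformation DictVectorizer performs (my own sketch, not sklearn's implementation): string values become one-hot "key=value" features, numeric values pass through, and feature names are sorted.

```python
def dict_vectorize(dicts):
    # Collect feature names: "key=value" for strings, the plain key for numbers.
    features = sorted({f"{k}={v}" if isinstance(v, str) else k
                       for d in dicts for k, v in d.items()})
    rows = []
    for d in dicts:
        row = [0.0] * len(features)
        for k, v in d.items():
            name = f"{k}={v}" if isinstance(v, str) else k
            row[features.index(name)] = 1.0 if isinstance(v, str) else float(v)
        rows.append(row)
    return features, rows
```

On the D above this yields features ['A=中', 'A=大', 'bar', 'baz', 'foo'] and rows [[1, 0, 2, 0, 1], [0, 1, 0, 1, 3]], which should match what v.fit_transform(D) produces.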
minhpqn / http_download.py
Last active July 30, 2020 04:55
Download all files from a http directory
import os
import argparse
import requests
from time import sleep
from logzero import logger
from bs4 import BeautifulSoup

def listFD(url, ext=''):
    page = requests.get(url).text
    soup = BeautifulSoup(page, 'html.parser')
    return [url + '/' + node.get('href') for node in soup.find_all('a')
            if node.get('href', '').endswith(ext)]
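The link extraction can also be done with the stdlib alone; here is a hypothetical BeautifulSoup-free variant (`extract_links` is my name, not the gist's), which is easy to test on a static HTML string:

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    # Collects every href attribute found on <a> tags, in document order.
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value:
                    self.links.append(value)

def extract_links(html, ext=''):
    parser = LinkCollector()
    parser.feed(html)
    return [h for h in parser.links if h.endswith(ext)]
```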
minhpqn / quantize.py
Created July 8, 2020 08:55 — forked from aleju/quantize.py
Simple quantization function for python
def quantize(val, to_values):
    """Quantize a value with regards to a set of allowed values.
    Examples:
        quantize(49.513, [0, 45, 90]) -> 45
        quantize(43, [0, 10, 20, 30]) -> 30
    Note: function doesn't assume to_values to be sorted and
    iterates over all values (i.e. is rather slow).
    """
    best_match, best_match_diff = None, None
    for other_val in to_values:
        diff = abs(other_val - val)
        if best_match is None or diff < best_match_diff:
            best_match, best_match_diff = other_val, diff
    return best_match
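As the note says, the linear scan is slow. If the allowed values are kept sorted, a bisect-based variant (my sketch, not part of the original gist) finds the nearest value in O(log n); on ties it returns the lower value:

```python
import bisect

def quantize_sorted(val, to_values):
    # to_values must be sorted ascending.
    i = bisect.bisect_left(to_values, val)
    if i == 0:
        return to_values[0]
    if i == len(to_values):
        return to_values[-1]
    before, after = to_values[i - 1], to_values[i]
    # On an exact tie, prefer the lower value.
    return before if val - before <= after - val else after
```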
minhpqn / gradient_accumulation.py
Created June 15, 2020 16:07 — forked from thomwolf/gradient_accumulation.py
PyTorch gradient accumulation training loop
model.zero_grad()                                   # Reset gradients tensors
for i, (inputs, labels) in enumerate(training_set):
    predictions = model(inputs)                     # Forward pass
    loss = loss_function(predictions, labels)       # Compute loss function
    loss = loss / accumulation_steps                # Normalize our loss (if averaged)
    loss.backward()                                 # Backward pass
    if (i+1) % accumulation_steps == 0:             # Wait for several backward steps
        optimizer.step()                            # Now we can do an optimizer step
        model.zero_grad()                           # Reset gradients tensors
        if (i+1) % evaluation_steps == 0:           # Evaluate the model when we have no gradients accumulated
            evaluate_model()
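The arithmetic behind the trick can be sanity-checked without PyTorch: stepping once on K accumulated gradients of loss/K equals one step on the gradient of the mean loss over the whole batch. A toy sketch (all names here are illustrative, not from the gist), using a 1-D linear model with squared loss:

```python
def grad(w, x, y):
    # d/dw of 0.5 * (w*x - y)**2
    return (w * x - y) * x

def accumulated_step(w, batch, lr):
    # Mimics the loop above: normalize each gradient, accumulate, step once.
    acc = 0.0
    for x, y in batch:
        acc += grad(w, x, y) / len(batch)
    return w - lr * acc

def full_batch_step(w, batch, lr):
    # Single step on the gradient of the mean loss over the whole batch.
    g = sum(grad(w, x, y) for x, y in batch) / len(batch)
    return w - lr * g
```

Up to floating-point rounding, both functions return the same updated weight.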