Gus Hahn-Powell myedibleenso

@myedibleenso
myedibleenso / foma_demo.py
Created October 11, 2024 15:47
Usage examples for the Python Finite-State Toolkit (pyfoma)
# pip install "pyfoma>=1"
from pyfoma import FST
# The Python Finite-State Toolkit (pyfoma)
# has support for finite-state transducers.
# FSTs can be used to rewrite symbols in a context-dependent manner.
# These rules can even be weighted.
# One application of an FST is to convert a written form of a word
# into IPA (and reverse the process).
# For languages with a phonologically transparent orthography, this rewrite can be straightforward.
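pyfoma compiles rewrite rules like these into transducers. To show the idea without relying on pyfoma's own API, here is a minimal hand-rolled sketch: a left-to-right rewriter over a small, hypothetical rule set loosely modeled on Latin American Spanish spelling. It includes one context-dependent rule (c becomes /s/ before e or i and /k/ elsewhere). The rule list and the to_ipa name are illustrative assumptions, not part of pyfoma.

```python
from typing import Callable, List, Optional, Tuple

# (grapheme, IPA, optional test on the character that follows the grapheme)
Rule = Tuple[str, str, Optional[Callable[[str], bool]]]

# Hypothetical toy rules, loosely modeled on Latin American Spanish.
# Multi-character graphemes come first so they win over single letters.
RULES: List[Rule] = [
    ("ch", "tʃ", None),
    ("qu", "k", None),
    ("ll", "ʝ", None),
    ("ñ", "ɲ", None),
    ("j", "x", None),
    ("c", "s", lambda nxt: nxt in ("e", "i")),  # c before e/i -> /s/
    ("c", "k", None),                           # c elsewhere -> /k/
]


def to_ipa(word: str, rules: List[Rule] = RULES) -> str:
    """Rewrite a spelled word left to right, applying the first rule that fits."""
    out: List[str] = []
    i = 0
    while i < len(word):
        for grapheme, ipa, right_context in rules:
            if word.startswith(grapheme, i):
                end = i + len(grapheme)
                nxt = word[end] if end < len(word) else ""
                if right_context is None or right_context(nxt):
                    out.append(ipa)
                    i = end
                    break
        else:
            # No rule applies: copy the symbol through unchanged.
            out.append(word[i])
            i += 1
    return "".join(out)


for w in ["casa", "cena", "chico", "queso", "jota"]:
    print(w, "->", to_ipa(w))
```

Because rules are tried in order, multi-letter graphemes such as ch and qu must precede single letters. A real FST compiles the rules into one machine instead of looping over a list, and since a transducer can be inverted, the same machine can map IPA back to spelling. That inverse is only unambiguous when the orthography is transparent in that direction too.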
myedibleenso / README.md
Last active August 29, 2025 00:13
Google Sheet -> sqlite DB

The URL structure to download a Google Sheet as a CSV is https://spreadsheets.google.com/feeds/download/spreadsheets/Export?key=yourDocId&exportFormat=csv&sheet=1

The URL structure to view a Google Sheet is https://docs.google.com/spreadsheets/d/yourDocId

import pandas as pd
import sqlite3

# pip install "pandas[excel]"
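The two URL structures above share yourDocId, so the export URL can be derived from the view URL. The following standard-library sketch does that and loads the CSV text into SQLite. The function names are hypothetical. It also assumes the sheet is readable without signing in and that the first CSV row is the header. Columns are created without declared types, so SQLite stores values as they arrive (strings).

```python
import csv
import io
import sqlite3
from urllib.parse import urlencode, urlparse

EXPORT_BASE = "https://spreadsheets.google.com/feeds/download/spreadsheets/Export"


def doc_id_from_view_url(url: str) -> str:
    """Pull yourDocId out of https://docs.google.com/spreadsheets/d/yourDocId[/...]."""
    parts = urlparse(url).path.strip("/").split("/")
    # Expected path segments: ["spreadsheets", "d", "<doc id>", ...]
    if len(parts) < 3 or parts[:2] != ["spreadsheets", "d"]:
        raise ValueError(f"not a Google Sheets view URL: {url}")
    return parts[2]


def export_url(doc_id: str, sheet: int = 1) -> str:
    """Build the CSV export URL for one sheet of the document."""
    query = urlencode({"key": doc_id, "exportFormat": "csv", "sheet": sheet})
    return f"{EXPORT_BASE}?{query}"


def quote_ident(name: str) -> str:
    """Quote an SQL identifier (table or column name) for SQLite."""
    return '"' + name.replace('"', '""') + '"'


def csv_to_sqlite(csv_text: str, conn: sqlite3.Connection, table: str) -> int:
    """Load CSV text (header row first) into `table`; returns the number of rows inserted."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    cols = ", ".join(quote_ident(c) for c in header)
    conn.execute(f"CREATE TABLE IF NOT EXISTS {quote_ident(table)} ({cols})")
    marks = ", ".join("?" for _ in header)
    rows = [row for row in reader if row]  # skip blank lines
    conn.executemany(f"INSERT INTO {quote_ident(table)} VALUES ({marks})", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
csv_to_sqlite("name,qty\nfoo,1\nbar,2\n", conn, "sheet1")
print(export_url(doc_id_from_view_url("https://docs.google.com/spreadsheets/d/yourDocId")))
```

To fetch a real sheet, you could read urllib.request.urlopen(export_url(doc_id)).read().decode("utf-8") and pass the text to csv_to_sqlite. With pandas, pd.read_csv(export_url(doc_id)).to_sql(table, conn) is a rough equivalent that also infers column types.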
myedibleenso / PROBLEM.md
Last active August 29, 2025 00:14
A disjunction of graph traversals

If we want to produce a disjunction of graph traversal patterns (e.g., a >rel b OR b >rel c), we currently cannot simply write (a >rel b) | (b >rel c); this will not compile.
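Semantically, a disjunction of two traversals matches whatever either traversal matches. A workaround is therefore to evaluate each pattern on its own and take the union of the results. The gist's import of OdinOrQuery suggests combining separately compiled queries, though the details are not shown here. The plain-Python sketch below models only those union semantics and is not Odinson's API. Its graph representation (a token list plus (source, label, target) edges) and its function names are hypothetical, and token constraints are simplified to exact word matches.

```python
from typing import List, Set, Tuple

Edge = Tuple[int, str, int]   # (source token index, relation label, target token index)
Match = Tuple[int, int]       # (source token index, target token index)


def traverse(tokens: List[str], edges: List[Edge], src: str, rel: str, dst: str) -> Set[Match]:
    """Matches for a toy pattern `src >rel dst`: an outgoing `rel` edge from a
    token equal to `src` into a token equal to `dst`."""
    return {
        (s, d)
        for s, label, d in edges
        if label == rel and tokens[s] == src and tokens[d] == dst
    }


def disjunction(*results: Set[Match]) -> Set[Match]:
    """Union of the matches of each alternative: (p1) | (p2) | ..."""
    combined: Set[Match] = set()
    for r in results:
        combined |= r
    return combined


tokens = ["a", "b", "c"]
edges: List[Edge] = [(0, "rel", 1), (1, "rel", 2)]
either = disjunction(
    traverse(tokens, edges, "a", "rel", "b"),
    traverse(tokens, edges, "b", "rel", "c"),
)
print(sorted(either))  # [(0, 1), (1, 2)]
```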

import ai.lum.odinson.ExtractorEngine
import ai.lum.odinson.lucene.search.OdinOrQuery
import ai.lum.odinson.{ Document, Sentence, GraphField, TokensField }

val d1 = Document(
  id = "doc-1",
  metadata = Seq(),
myedibleenso / pipeline-example.py
Last active August 29, 2025 00:15
sklearn example
# tested with ...
# numpy==1.26.0
# scikit-learn==1.2.2
from sklearn import preprocessing
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
import numpy as np
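These imports point to the usual scikit-learn composition pattern. FeatureUnion runs several transformers on the same input and concatenates their outputs column-wise. Pipeline feeds each step's output to the next and typically ends in an estimator such as LogisticRegression. The dependency-free toy below mimics only that data flow using plain functions. The names (count_features, length_feature, feature_union, pipeline, lowercase) are hypothetical stand-ins, not scikit-learn's implementation.

```python
from typing import Callable, List, Sequence

Rows = List[List[float]]
Transform = Callable[[Sequence[str]], Rows]


def count_features(vocab: Sequence[str]) -> Transform:
    """Bag-of-words counts over a fixed vocabulary (a stand-in for CountVectorizer)."""
    def transform(docs: Sequence[str]) -> Rows:
        return [[float(doc.split().count(w)) for w in vocab] for doc in docs]
    return transform


def length_feature(docs: Sequence[str]) -> Rows:
    """One extra column: the number of whitespace-separated tokens."""
    return [[float(len(doc.split()))] for doc in docs]


def feature_union(*transforms: Transform) -> Transform:
    """Apply every transform to the same docs and join their columns row by row."""
    def transform(docs: Sequence[str]) -> Rows:
        blocks = [t(docs) for t in transforms]
        return [[x for block in blocks for x in block[i]] for i in range(len(docs))]
    return transform


def pipeline(*steps: Callable) -> Callable:
    """Chain steps so each one receives the previous step's output."""
    def run(data):
        for step in steps:
            data = step(data)
        return data
    return run


def lowercase(docs: Sequence[str]) -> List[str]:
    return [doc.lower() for doc in docs]


featurize = pipeline(lowercase, feature_union(count_features(["good", "bad"]), length_feature))
print(featurize(["Good good movie", "BAD"]))  # [[2.0, 0.0, 3.0], [0.0, 1.0, 1.0]]
```

In scikit-learn itself, the analogous structure would look something like Pipeline([("features", FeatureUnion([...])), ("clf", LogisticRegression())]). Both constructors take lists of (name, object) pairs, and the final step is fitted on the concatenated feature matrix.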
myedibleenso / example.py
Created March 14, 2023 15:33
odinson-gateway sketch
# conda create -n "odinsynth-redux" python=3.9 ipython
# git clone https://github.com/lum-ai/odinson-gateway
# cd odinson-gateway
# pip install ".[all]"
# then, from the IPython REPL:
from typing import Dict, Any, List, Text
from odinson.gateway import OdinsonGateway, Document
import json
# FIXME: our specification of Odinson documents.
myedibleenso / megamillions.py
Last active August 29, 2025 00:18
Random drawing (w/ optional constraints) for megamillions
from typing import (
    Callable,
    Iterable,
    Sequence,
    Tuple,
    TypeAlias,  # requires Python 3.10+
)
from collections import Counter
from itertools import combinations
from dataclasses import dataclass
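The following is a minimal sketch of a constrained random drawing using rejection sampling. It draws candidate white balls, keeps the first draw that passes every constraint, and then draws the Mega Ball. The pool sizes are parameters. The defaults (five distinct numbers from 1 to 70 plus one from 1 to 24) are assumptions to check against the game's current rules. The constraint helpers sum_between and no_consecutive are hypothetical examples, not the gist's own.

```python
import random
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Tuple

WhiteBalls = Tuple[int, ...]
Constraint = Callable[[WhiteBalls], bool]


@dataclass(frozen=True)
class Ticket:
    white: WhiteBalls  # sorted, distinct
    mega: int


def sum_between(lo: int, hi: int) -> Constraint:
    """Example constraint: the white balls sum to a value in [lo, hi]."""
    return lambda white: lo <= sum(white) <= hi


def no_consecutive(white: WhiteBalls) -> bool:
    """Example constraint: no two white balls are adjacent numbers."""
    return all(b - a > 1 for a, b in zip(white, white[1:]))


def draw(
    constraints: Iterable[Constraint] = (),
    n_white: int = 5,
    white_max: int = 70,   # assumed pool: 1..70
    mega_max: int = 24,    # assumed pool: 1..24
    rng: Optional[random.Random] = None,
    max_tries: int = 100_000,
) -> Ticket:
    """Rejection sampling: redraw until every constraint accepts the white balls."""
    rng = rng if rng is not None else random.Random()
    checks = list(constraints)
    for _ in range(max_tries):
        white = tuple(sorted(rng.sample(range(1, white_max + 1), n_white)))
        if all(check(white) for check in checks):
            return Ticket(white, rng.randint(1, mega_max))
    raise RuntimeError("no draw satisfied the constraints; they may be too strict")


ticket = draw([sum_between(100, 250), no_consecutive], rng=random.Random(42))
print(ticket)
```

Rejection sampling keeps every accepted combination equally likely. However, it slows down as constraints get stricter, which is why the max_tries guard is there. Passing a seeded random.Random makes a drawing reproducible.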
myedibleenso / parse_page_data.py
Created October 11, 2022 12:38
facet discovery (ingest)
from clu.bridge import odinson
from clu.bridge import processors
from clu.bridge import spacy as sp
from typing import Dict, Sequence, Text
from joblib import Parallel, delayed
import pydantic
import spacy
import pandas as pd
myedibleenso / align.py
Last active August 29, 2025 00:19
Sketch for Seetha. "fuzzy" alignment to ground truth for taxonomy
from __future__ import annotations
from collections import Counter
from typing import Dict, List, Optional, Text
from dataclasses import dataclass
@dataclass
class Span:
    start: int
    end: int
    tokens: Optional[List[Text]]
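One way to make the alignment "fuzzy" is to score a predicted span against every ground-truth span by token-overlap F1, using Counter as a multiset, and then keep the best match above a threshold. This sketch works under that assumption and is not necessarily what the gist does. token_f1, align, and the 0.5 threshold are hypothetical choices. The Span here gives tokens a default of None for convenience.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Sequence, Text, Tuple


@dataclass
class Span:
    start: int
    end: int
    tokens: Optional[List[Text]] = None


def token_f1(pred: Sequence[Text], gold: Sequence[Text]) -> float:
    """Harmonic mean of token precision and recall, counting repeated tokens (multiset overlap)."""
    overlap = sum((Counter(pred) & Counter(gold)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


def align(pred: Span, gold: Sequence[Span], threshold: float = 0.5) -> Optional[Tuple[Span, float]]:
    """Return the best-scoring gold span and its score, or None if nothing reaches the threshold."""
    best: Optional[Tuple[Span, float]] = None
    for candidate in gold:
        score = token_f1(pred.tokens or [], candidate.tokens or [])
        if score >= threshold and (best is None or score > best[1]):
            best = (candidate, score)
    return best


gold = [Span(10, 12, ["white", "cell"]), Span(0, 3, ["red", "blood", "cells"])]
print(align(Span(0, 3, ["red", "blood", "cell"]), gold))
```

Token F1 ignores word order and offsets. If the ground truth is keyed by offsets, an overlap score on the [start, end) ranges would be a natural alternative.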
myedibleenso / README.md
Last active August 29, 2025 00:19
A summary of how to automatically transcribe English-language mp3 files using Wav2vec2.

Overview

This short README details the process I followed to perform automatic speech recognition on a 48+ minute audio interview.

1. Convert m4a to mp3

I converted m4a to mp3 using ffmpeg. Assuming you have an audio file named interview.m4a to process, ...

ffmpeg -i interview.m4a -codec:a libmp3lame -qscale:a 1 interview.mp3
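With more than one recording, the same ffmpeg call can be scripted. This sketch builds exactly the command above for every .m4a file in a directory and runs each one through subprocess. The helper names mp3_command and convert_all are hypothetical, and ffmpeg must be on your PATH. For libmp3lame, -qscale:a selects VBR quality, and lower values mean higher quality.

```python
import subprocess
from pathlib import Path
from typing import List


def mp3_command(src: Path, quality: int = 1) -> List[str]:
    """ffmpeg -i SRC -codec:a libmp3lame -qscale:a Q DST, writing DST next to SRC."""
    dst = src.with_suffix(".mp3")
    return ["ffmpeg", "-i", str(src), "-codec:a", "libmp3lame", "-qscale:a", str(quality), str(dst)]


def convert_all(folder: Path, quality: int = 1) -> List[Path]:
    """Convert every .m4a in `folder` to .mp3; raises CalledProcessError if ffmpeg fails."""
    outputs: List[Path] = []
    for src in sorted(folder.glob("*.m4a")):
        subprocess.run(mp3_command(src, quality), check=True)
        outputs.append(src.with_suffix(".mp3"))
    return outputs


print(" ".join(mp3_command(Path("interview.m4a"))))
```

Calling convert_all(Path("recordings")) on a hypothetical recordings folder would convert each file in name order and return the paths of the new mp3 files.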
myedibleenso / README.md
Last active August 29, 2025 00:19
odinson examples

Using the experimental Odinson REST API

Grab the most recent image

First, pull the most recently pushed image with the :experimental tag:

docker pull lumai/odinson-rest-api:experimental

Launch the REST API