Jover Lee joverlee521

  • Seattle, WA
@joverlee521
joverlee521 / create-lineage-annotations.smk
Created June 13, 2025 22:42
Create annotations for lineages in fauna
"""
Workflow for tracking down records with mismatched lineages from the ingest
workflow compared to the metadata from fauna.
Intended to be run once to create the annotations.tsv for the ingest workflow
to correct the lineages using the version of files on S3 from 2025-06-13.
"""
SUBTYPES = [
    "h1n1pdm",
    "h3n2",
joverlee521 / geolocation_rules.vdj
Created February 3, 2025 21:11
Order geolocation_rules.tsv
#!vd -p
{"sheet": "global", "col": null, "row": "header", "longname": "set-option", "input": "0", "keystrokes": "", "comment": null, "replayable": null}
{"longname": "open-file", "input": "augur/data/geolocation_rules.tsv", "keystrokes": "o", "replayable": true}
{"sheet": "geolocation_rules", "col": 1, "row": "", "longname": "addcol-split", "input": "/", "keystrokes": ":", "comment": "add column split by regex", "replayable": true}
{"sheet": "geolocation_rules", "col": "_re", "row": "", "longname": "expand-col", "input": "", "keystrokes": "(", "comment": "expand current column of containers one level", "replayable": true}
{"sheet": "geolocation_rules", "col": "_re[1]", "row": "", "longname": "key-col", "input": "", "keystrokes": "!", "comment": "toggle current column as a key column", "replayable": true}
{"sheet": "geolocation_rules", "col": 1, "row": "", "longname": "key-col", "input": "", "keystrokes": "!", "comment": "toggle current column as a key column", "replayable": true}
{"sheet": "geolocation_rules"
joverlee521 / vidrl-human-sera-ingest.smk
Last active August 29, 2024 18:04
vidrl-human-sera-ingest
PREVIEW = config.get("preview", False)
YEAR = config.get("year", '2024')
VIDRL_PATH = "../fludata/VIDRL-Melbourne-WHO-CC/raw-data"
H1N1_PATH = f"{VIDRL_PATH}/A/H1N1pdm/HI"
H3N2_PATH = f"{VIDRL_PATH}/A/H3N2/HI"
H3N2_FRA = f"{VIDRL_PATH}/A/H3N2/FRA"
VIC_PATH = f"{VIDRL_PATH}/B/Victoria/HI"
joverlee521 / copy-h1n1pdm-pandemic.smk
Created June 24, 2024 17:59
copy-h1n1pdm-pandemic.smk
subtypes = ['h1n1pdm']
segments = ['ha', 'na']
times = ['pandemic']
rule all:
    input:
        expand("data/{subtype}/{segment}/{time}.done", subtype=subtypes, segment=segments, time=times)

rule copy_aws_files:
    output: touch("data/{subtype}/{segment}/{time}.done")
joverlee521 / ncov-ingest-aws-batch-snakemake-report.html
Created June 17, 2024 21:20
Snakemake report generated from AWS Batch job
This file has been truncated, but you can view the full file.
<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<meta name="description" content="">
<meta name="author" content="">
<title>Snakemake Report</title>
subtypes = ['h3n2', 'h1n1pdm', 'vic', 'yam']
segments = ['ha', 'na']
times = ['12y', '6y', '3y', '2y', '6m']
rule all:
    input:
        expand("data/{subtype}/{segment}/{time}.done", subtype=subtypes, segment=segments, time=times)

rule copy_aws_files:
    output: touch("data/{subtype}/{segment}/{time}.done")
joverlee521 / git-subrepo-pull-conflicts.md
Created September 20, 2023 20:48
git subrepo pull conflicts in monkeypox

In walking through git subrepo pull with @j23414 for nextstrain/mpox#182, we ran into unexpected merge conflicts.

$ git subrepo pull ingest/vendored -dv
>>> git rev-parse --verify HEAD
* Assert that working copy is clean: /Users/jlee2346/Repos/nextstrain/monkeypox
* Check for worktree with branch subrepo/ingest/vendored
  * Fetch the upstream: https://github.com/nextstrain/ingest (main).
>>> git fetch --no-tags --quiet https://github.com/nextstrain/ingest main
joverlee521 / compare-hash-and-diff
Created September 5, 2023 23:45
Diff-seq-counts-files
#!/bin/bash
bucket="nextstrain-data"
s3_prefix="files/workflows/forecasts-ncov"
s3_trial_prefix="trial/seq-counts-workflow"
data_provenances=("open" "gisaid")
variants=("nextstrain_clades" "pango_lineages")
georesolutions=("global" "usa")
joverlee521 / Snakefile
Created January 18, 2023 20:00
snakemake-hardlink-test
rule all:
    input: "data/rule_all.txt"

rule a:
    output: touch("data/rule_a.txt")

rule b:
    input: "data/rule_a.txt"
    output: "data/rule_a_hardlink.txt"

Example data (test-data.tsv)

ID   city        latitude  longitude  lineage  lineage__color
AAA  Auckland    -37       174        B.1.1    #F020E2
BBB  Auckland    -39       175        C.12     #20C9F0
CCC  Wellington  -41       174        C.12     #20C9F0

Example commands

A colours TSV mapping lineage -> hex (including some form of colour averaging if there are multiple hexes per lineage, which happens).
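The lineage -> hex mapping described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the column names `lineage` and `lineage__color` are taken from the example data, the helper names `average_hex` and `lineage_colours` are hypothetical, and "colour averaging" is assumed here to mean a plain per-channel RGB mean, which the text does not specify.

```python
import csv
import io
from collections import defaultdict


def average_hex(hexes):
    """Average '#RRGGBB' strings channel by channel (assumed: plain RGB mean)."""
    channels = [
        tuple(int(h.lstrip("#")[i:i + 2], 16) for i in (0, 2, 4))
        for h in hexes
    ]
    mean = [round(sum(c[k] for c in channels) / len(channels)) for k in range(3)]
    return "#{:02X}{:02X}{:02X}".format(*mean)


def lineage_colours(tsv_text, lineage_col="lineage", colour_col="lineage__color"):
    """Return {lineage: hex}, averaging when a lineage has several hexes."""
    by_lineage = defaultdict(list)
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        by_lineage[row[lineage_col]].append(row[colour_col])
    return {lineage: average_hex(hexes) for lineage, hexes in by_lineage.items()}


# The example data from test-data.tsv, written out with explicit tabs.
EXAMPLE = (
    "ID\tcity\tlatitude\tlongitude\tlineage\tlineage__color\n"
    "AAA\tAuckland\t-37\t174\tB.1.1\t#F020E2\n"
    "BBB\tAuckland\t-39\t175\tC.12\t#20C9F0\n"
    "CCC\tWellington\t-41\t174\tC.12\t#20C9F0\n"
)

colours = lineage_colours(EXAMPLE)

# Emit the colours TSV: one "lineage<TAB>hex" line per lineage.
for lineage, colour in colours.items():
    print(f"{lineage}\t{colour}")
```

An RGB mean is the simplest choice; averaging in a perceptual colour space may give more natural blends when the hexes for one lineage differ widely.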