@y-lan
Last active July 20, 2023 14:16

Revisions

  1. y-lan revised this gist Jul 20, 2023. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion count_llama_tokens.py
    @@ -8,4 +8,4 @@ def count(text):
     def parallel_count(texts):
         from joblib import Parallel, delayed
         results = Parallel(n_jobs=-1)(delayed(count)(text) for text in texts)
    -    return sum([results])
    +    return sum(results)
  2. y-lan revised this gist Jul 20, 2023. 1 changed file with 2 additions and 2 deletions.
    4 changes: 2 additions & 2 deletions count_llama_tokens.py
    @@ -1,11 +1,11 @@
     from transformers import LlamaTokenizer

    -tokenizer = LlamaTokenizer.from_pretrained('meta-llama/Llama-2-7b')
    +tokenizer = LlamaTokenizer.from_pretrained('decapoda-research/llama-7b-hf')

     def count(text):
         return len(tokenizer(text)['input_ids'])

     def parallel_count(texts):
         from joblib import Parallel, delayed
    -    results = Parallel(n_jobs=-1)(delayed(count)(text) for text in texts))
    +    results = Parallel(n_jobs=-1)(delayed(count)(text) for text in texts)
         return sum([results])
  3. y-lan renamed this gist Jul 20, 2023. 1 changed file with 0 additions and 0 deletions.
    File renamed without changes.
  4. y-lan created this gist Jul 20, 2023.
    11 changes: 11 additions & 0 deletions count.py
    @@ -0,0 +1,11 @@
    +from transformers import LlamaTokenizer
    +
    +tokenizer = LlamaTokenizer.from_pretrained('meta-llama/Llama-2-7b')
    +
    +def count(text):
    +    return len(tokenizer(text)['input_ids'])
    +
    +def parallel_count(texts):
    +    from joblib import Parallel, delayed
    +    results = Parallel(n_jobs=-1)(delayed(count)(text) for text in texts))
    +    return sum([results])
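
The latest revision listed above changes `sum([results])` to `sum(results)`. The following is a minimal sketch of why that change matters, not the gist's own code. It rests on two assumptions made so that it runs without model weights or third-party packages: `whitespace_count` is a hypothetical stand-in for the gist's `count()` (it splits on whitespace instead of calling `LlamaTokenizer`), and the standard library's `ThreadPoolExecutor` is swapped in for joblib's `Parallel`/`delayed`.

```python
from concurrent.futures import ThreadPoolExecutor


def whitespace_count(text):
    # Hypothetical stand-in for the gist's count(): splits on whitespace
    # instead of calling LlamaTokenizer, so no model download is needed.
    return len(text.split())


def parallel_count(texts, count_fn=whitespace_count):
    # ThreadPoolExecutor.map is used here in place of joblib's Parallel/delayed;
    # it also returns one result per input, in input order.
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(count_fn, texts))
    # Post-fix form: sum the per-text counts directly.
    return sum(results)


texts = ["hello world", "one two three", ""]
per_text = [whitespace_count(t) for t in texts]

# Pre-fix form: sum([results]) wraps the list of counts in another list,
# so sum() evaluates 0 + [2, 3, 0] and raises TypeError.
try:
    sum([per_text])
    buggy_raised = False
except TypeError:
    buggy_raised = True

total = parallel_count(texts)
```

With a real tokenizer, `count_fn` would be the gist's `count`, and the per-text numbers would be token counts rather than word counts; the summing step behaves the same way.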