rubert-embedding.py
@kylebgorman, last active June 14, 2021 20:51
Revisions

  1. kylebgorman revised this gist Jun 14, 2021. 1 changed file with 2 additions and 2 deletions, switching the script from the model's default output to per-layer hidden states (a sketch of what the revised call returns follows the revision list).

    4 changes: 2 additions & 2 deletions rubert-embedding.py

    @@ -12,5 +12,5 @@
     sentence = "Все счастливые семьи похожи друг на друга, каждая несчастливая семья несчастлива по-своему."

     tokenized = tokenizer(sentence, return_tensors="pt")
    -embeddings = model(**tokenized)
    -print(embeddings)
    +embeddings = model(**tokenized, output_hidden_states=True).hidden_states[0]
    +print(embeddings)
  2. kylebgorman revised this gist Jun 14, 2021. 1 changed file with 1 addition and 1 deletion, fixing a NameError in the final line (the variable is named embeddings, not embedding).

    2 changes: 1 addition & 1 deletion rubert-embedding.py

    @@ -13,4 +13,4 @@

     tokenized = tokenizer(sentence, return_tensors="pt")
     embeddings = model(**tokenized)
    -print(embedding)
    +print(embeddings)
  3. kylebgorman created this gist Jun 14, 2021.

    16 changes: 16 additions & 0 deletions rubert-embedding.py

    @@ -0,0 +1,16 @@
    +#!/usr/bin/env python
    +
    +# Documented in: https://metatext.io/models/DeepPavlov-rubert-base-cased
    +
    +import transformers
    +
    +
    +model_name = "DeepPavlov/rubert-base-cased"
    +model = transformers.AutoModel.from_pretrained(model_name)
    +tokenizer = transformers.AutoTokenizer.from_pretrained(model_name)
    +
    +sentence = "Все счастливые семьи похожи друг на друга, каждая несчастливая семья несчастлива по-своему."
    +
    +tokenized = tokenizer(sentence, return_tensors="pt")
    +embeddings = model(**tokenized)
    +print(embedding)
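
The change in the latest revision (item 1 above) asks the model to return the hidden states of every layer: hidden_states[0] is the output of the embedding layer (summed token, position, and segment embeddings, before any transformer block), and later indices are the outputs of the successive transformer layers. A minimal sketch of inspecting what the revised call returns, assuming the model and tokenizer from the gist are already loaded; the shape comments reflect the standard Hugging Face transformers output for a base-sized BERT:

    # Request per-layer hidden states alongside the usual outputs.
    outputs = model(**tokenized, output_hidden_states=True)

    # For a 12-layer base BERT this prints 13: index 0 is the
    # embedding-layer output, indices 1-12 the transformer layers.
    print(len(outputs.hidden_states))

    # Each element has shape (batch_size, num_tokens, hidden_size),
    # i.e. (1, num_tokens, 768) for rubert-base-cased.
    print(outputs.hidden_states[0].shape)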
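Applying both revisions to the original file yields the current version of the script. For reference, here it is in one piece; the comments are added here for exposition and are not in the gist:

    #!/usr/bin/env python
    # Prints embedding-layer vectors for a Russian sentence using ruBERT.
    # Documented in: https://metatext.io/models/DeepPavlov-rubert-base-cased

    import transformers


    model_name = "DeepPavlov/rubert-base-cased"
    model = transformers.AutoModel.from_pretrained(model_name)
    tokenizer = transformers.AutoTokenizer.from_pretrained(model_name)

    # The opening sentence of Anna Karenina: "All happy families are alike;
    # each unhappy family is unhappy in its own way."
    sentence = "Все счастливые семьи похожи друг на друга, каждая несчастливая семья несчастлива по-своему."

    # Tokenize, returning PyTorch tensors.
    tokenized = tokenizer(sentence, return_tensors="pt")

    # Request all hidden states; index 0 is the embedding-layer output.
    embeddings = model(**tokenized, output_hidden_states=True).hidden_states[0]
    print(embeddings)

Run as-is, the script prints a tensor of shape (1, num_tokens, 768): one 768-dimensional embedding vector per wordpiece, including the [CLS] and [SEP] tokens the tokenizer adds.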