@GeorgeSeif
Last active December 28, 2019 16:48

Revisions

  1. GeorgeSeif revised this gist Dec 28, 2019. 1 changed file with 2 additions and 1 deletion.
    3 changes: 2 additions & 1 deletion scikit_learn_1.py
    Original file line number Diff line number Diff line change
    @@ -15,7 +15,8 @@ def get_tf_idf(vectorizer):
     "a particular document"
     doc_2 = "The TF-IDF is perfectly balanced, considering both local and global " \
     "levels of statistics for the target word."
    -doc_3
    +doc_3 = "Words that occur more frequently in a document are weighted higher, " \
    +"but only if they're more rare within the whole document."
     documents_list = [doc_1, doc_2, doc_3]

     vectors = vectorizer.fit_transform(documents_list)
  2. GeorgeSeif revised this gist Dec 28, 2019. 1 changed file with 2 additions and 1 deletion.
    3 changes: 2 additions & 1 deletion scikit_learn_1.py
    @@ -13,7 +13,8 @@ def get_tf_idf(vectorizer):

     doc_1 = "TF-IDF uses statistics to measure how important a word is to " \
     "a particular document"
    -doc_2
    +doc_2 = "The TF-IDF is perfectly balanced, considering both local and global " \
    +"levels of statistics for the target word."
     doc_3
     documents_list = [doc_1, doc_2, doc_3]

  3. GeorgeSeif revised this gist Dec 28, 2019. 1 changed file with 2 additions and 1 deletion.
    3 changes: 2 additions & 1 deletion scikit_learn_1.py
    @@ -11,7 +11,8 @@ def get_tf_idf(vectorizer):

     vectorizer = TfidfVectorizer()

    -doc_1 = "TF-IDF uses statistics to measure how important a word is to a particular document"
    +doc_1 = "TF-IDF uses statistics to measure how important a word is to " \
    +"a particular document"
     doc_2
     doc_3
     documents_list = [doc_1, doc_2, doc_3]
  4. GeorgeSeif revised this gist Dec 28, 2019. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion scikit_learn_1.py
    @@ -11,7 +11,7 @@ def get_tf_idf(vectorizer):

     vectorizer = TfidfVectorizer()

    -doc_1 =
    +doc_1 = "TF-IDF uses statistics to measure how important a word is to a particular document"
     doc_2
     doc_3
     documents_list = [doc_1, doc_2, doc_3]
  5. GeorgeSeif revised this gist Dec 28, 2019. 1 changed file with 4 additions and 1 deletion.
    5 changes: 4 additions & 1 deletion scikit_learn_1.py
    @@ -11,7 +11,10 @@ def get_tf_idf(vectorizer):

     vectorizer = TfidfVectorizer()

    -documents_list = []
    +doc_1 =
    +doc_2
    +doc_3
    +documents_list = [doc_1, doc_2, doc_3]

     vectors = vectorizer.fit_transform(documents_list)

  6. GeorgeSeif created this gist Dec 28, 2019.
    21 changes: 21 additions & 0 deletions scikit_learn_1.py
    @@ -0,0 +1,21 @@
    +import pandas as pd
    +from sklearn.feature_extraction.text import TfidfVectorizer
    +
    +def get_tf_idf(vectorizer):
    +    feature_names = vectorizer.get_feature_names()
    +    dense_vec = vectors.todense()
    +    dense_list = dense_vec.tolist()
    +    tfidf_data = pd.DataFrame(dense_list, columns=feature_names)
    +    return tfidf_data
    +
    +
    +vectorizer = TfidfVectorizer()
    +
    +documents_list = []
    +
    +vectors = vectorizer.fit_transform(documents_list)
    +
    +tfidf_data = get_tf_idf(vectorizer)
    +
    +print(tfidf_data)
    +# Prints the TF-IDF data for all words across all documents
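
For readers who want to see what the numbers in the printed table mean, here is a rough standard-library sketch of the weighting. It is not the gist author's code and not scikit-learn's implementation. It assumes TfidfVectorizer's documented defaults: lowercasing, tokens of two or more word characters, raw term counts, a smoothed IDF of ln((1 + n) / (1 + df)) + 1, and L2 normalization of each document row. The helper names tokenize and tf_idf_rows are hypothetical, and the three documents are copied from the latest revision above.

```python
import math
import re
from collections import Counter

# Documents copied from the latest revision of scikit_learn_1.py.
doc_1 = ("TF-IDF uses statistics to measure how important a word is to "
         "a particular document")
doc_2 = ("The TF-IDF is perfectly balanced, considering both local and global "
         "levels of statistics for the target word.")
doc_3 = ("Words that occur more frequently in a document are weighted higher, "
         "but only if they're more rare within the whole document.")
documents_list = [doc_1, doc_2, doc_3]


def tokenize(text):
    # Assumption: lowercase, then keep runs of two or more word characters,
    # so "TF-IDF" becomes "tf" and "idf" and the single letter "a" is dropped.
    return re.findall(r"\b\w\w+\b", text.lower())


def tf_idf_rows(documents):
    """Return (sorted vocabulary, one L2-normalized weight row per document)."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    n_docs = len(documents)

    # Document frequency: in how many documents each term appears.
    doc_freq = {t: sum(1 for c in counts if t in c) for t in vocabulary}

    # Assumed smoothed IDF: rarer terms get a larger weight.
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1 for t in vocabulary}

    rows = []
    for c in counts:
        weights = [c[t] * idf[t] for t in vocabulary]  # term count times IDF
        norm = math.sqrt(sum(w * w for w in weights))  # L2 length of the row
        rows.append([w / norm for w in weights])
    return vocabulary, rows


vocabulary, rows = tf_idf_rows(documents_list)

# Print the weight of a few terms in each document.
for term in ("document", "measure", "statistics"):
    column = vocabulary.index(term)
    print(term, [round(row[column], 3) for row in rows])
```

In doc_1, "measure" and "statistics" each occur once, but "measure" appears in only one of the three documents while "statistics" appears in two, so "measure" ends up with the larger weight. That is the local-versus-global balance the sample sentences describe.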