@sandgate-dev
Created April 7, 2019 07:43

Revisions

  1. sandgate-dev created this gist Apr 7, 2019.
    16 changes: 16 additions & 0 deletions create_pyLDAvis.py
    @@ -0,0 +1,16 @@
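    # Imports the snippet relies on; df is a pandas DataFrame defined elsewhere.
    import pyLDAvis
    import pyLDAvis.sklearn  # newer pyLDAvis releases expose this as pyLDAvis.lda_model
    from sklearn.decomposition import LatentDirichletAllocation
    from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

    news_content = df[df.category == 'reliable'].news_content.tolist()

    # Never fitted: used only as a template for the shared vectorizer parameters.
    tf_vectorizer = CountVectorizer(strip_accents='unicode',
                                    stop_words='english',
                                    lowercase=True,
                                    token_pattern=r'\b[a-zA-Z]{3,}\b',
                                    max_df=0.5,
                                    min_df=10)

    tfidf_vectorizer = TfidfVectorizer(**tf_vectorizer.get_params())
    dtm_tfidf = tfidf_vectorizer.fit_transform(news_content)
    # n_topics was renamed to n_components in scikit-learn and later removed.
    lda_tfidf = LatentDirichletAllocation(n_components=20, random_state=0)
    lda_tfidf.fit(dtm_tfidf)

    # Pass the fitted TF-IDF vectorizer; the CountVectorizer has no vocabulary.
    vis_data = pyLDAvis.sklearn.prepare(lda_tfidf, dtm_tfidf, tfidf_vectorizer)
    pyLDAvis.display(vis_data)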
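
The vectorizer settings in the snippet decide which terms ever reach the topic model. As a rough illustration only, the sketch below mimics three of them (`lowercase`, `token_pattern`, and the `min_df`/`max_df` document-frequency bounds) with the standard library. It is not scikit-learn's implementation: stop-word removal and accent stripping are left out, and the helper names `tokenize` and `filter_vocabulary`, the toy documents, and the `min_df=2` value are all assumptions made for the example.

```python
import re
from collections import Counter

# Same pattern as in the gist: alphabetic tokens of three or more letters.
TOKEN_PATTERN = re.compile(r"\b[a-zA-Z]{3,}\b")


def tokenize(text):
    """Lowercase the text, then keep only tokens matching TOKEN_PATTERN."""
    return TOKEN_PATTERN.findall(text.lower())


def filter_vocabulary(docs, min_df, max_df):
    """Keep terms whose document frequency df satisfies
    min_df <= df <= max_df * len(docs) (min_df as a count, max_df as a fraction).
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        # set() so each term counts at most once per document
        doc_freq.update(set(tokenize(doc)))
    return sorted(
        term for term, df in doc_freq.items()
        if min_df <= df <= max_df * n_docs
    )


# Hypothetical toy corpus, not data from the gist.
docs = [
    "The senate passed the budget bill in 2019.",
    "Budget talks stall as the senate adjourns.",
    "Local team wins the cup.",
    "The cup final drew a record crowd.",
]

vocab = filter_vocabulary(docs, min_df=2, max_df=0.5)
print(vocab)  # ['budget', 'cup', 'senate']
```

Here "the" is dropped by `max_df` because it appears in all four documents, while words seen in only one document fall below `min_df`. With the gist's `min_df = 10`, a corpus of fewer than ten documents would leave an empty vocabulary, so the real settings assume a reasonably large `news_content` list.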