Last active
February 1, 2023 09:32
Revisions
-
ululh revised this gist
Jun 25, 2017. 1 changed file with 1 addition and 1 deletion.

    @@ -1,5 +1,5 @@
    # derived from http://scikit-learn.org/stable/auto_examples/applications/topics_extraction_with_nmf_lda.html
    # explanations are located there : https://www.linkedin.com/pulse/dissociating-training-predicting-latent-dirichlet-lucien-tardres

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

-
ululh revised this gist
Jun 25, 2017. 1 changed file with 1 addition and 1 deletion.

    @@ -12,7 +12,7 @@
    with open ('outfile', 'rb') as fd:
        (features,lda.components_,lda.exp_dirichlet_component_,lda.doc_topic_prior_) = pickle.load(fd)

    # the dataset to predict on (first two samples were also in the training set so one can compare)
    data_samples = ["I like to eat broccoli and bananas.",
                    "I ate a banana and spinach smoothie for breakfast.",
                    "kittens and dogs are boring"

-
ululh renamed this gist
Jun 25, 2017. 1 changed file with 0 additions and 0 deletions.
File renamed without changes.
-
ululh created this gist
Jun 25, 2017.

    @@ -0,0 +1,26 @@
    # derived from http://scikit-learn.org/stable/auto_examples/applications/topics_extraction_with_nmf_lda.html
    # explanations are located there :

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation
    import pickle

    # create a blank model
    lda = LatentDirichletAllocation()

    # load parameters from file
    with open ('outfile', 'rb') as fd:
        (features,lda.components_,lda.exp_dirichlet_component_,lda.doc_topic_prior_) = pickle.load(fd)

    # the dataset to predict on (first two sample were also in the training set so you can compare)
    data_samples = ["I like to eat broccoli and bananas.",
                    "I ate a banana and spinach smoothie for breakfast.",
                    "kittens and dogs are boring"
                   ]

    # Vectorize the training set using the model features as vocabulary
    tf_vectorizer = CountVectorizer(vocabulary=features)
    tf = tf_vectorizer.fit_transform(data_samples)

    # transform method returns a matrix with one line per document, columns being topics weight
    predict = lda.transform(tf)
    print(predict)
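The script above expects a file named `outfile` holding a pickled 4-tuple: the vocabulary, then `components_`, `exp_dirichlet_component_` and `doc_topic_prior_`. The gist does not include the training side, so the following is only a sketch of what could produce such a file. The function name `train_and_save`, the `TRAINING_DOCS` corpus, the choice of two topics, and the `stop_words="english"` setting are all assumptions made for illustration, not the author's actual training code. It also uses the `n_components` parameter name from recent scikit-learn releases.

```python
import pickle

from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical training corpus (assumption): the first two documents are the
# ones the prediction script says were also in the training set.
TRAINING_DOCS = [
    "I like to eat broccoli and bananas.",
    "I ate a banana and spinach smoothie for breakfast.",
    "Chinchillas and kittens are cute.",
    "My sister adopted a kitten yesterday.",
    "Look at this cute hamster munching on a piece of broccoli.",
]


def train_and_save(docs, path, n_topics=2):
    """Fit LDA on docs and pickle the four objects the prediction script unpacks."""
    tf_vectorizer = CountVectorizer(stop_words="english")
    tf = tf_vectorizer.fit_transform(docs)

    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
    lda.fit(tf)

    # vocabulary_ maps term -> column index; sorting by index gives the terms
    # in column order, so CountVectorizer(vocabulary=features) rebuilds the
    # same column layout at prediction time.
    features = sorted(tf_vectorizer.vocabulary_, key=tf_vectorizer.vocabulary_.get)

    # Same tuple order as the pickle.load() call in the prediction script.
    params = (
        features,
        lda.components_,
        lda.exp_dirichlet_component_,
        lda.doc_topic_prior_,
    )
    with open(path, "wb") as fd:
        pickle.dump(params, fd)
    return params
```

Calling `train_and_save(TRAINING_DOCS, "outfile")` once would write the file that the prediction script reads. Pickling only these four objects, rather than the whole fitted estimator, keeps the file small, but it relies on `transform` needing nothing beyond these attributes, which may not hold across scikit-learn versions.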