Gist jcoreyes/7e76e90664f935c6f65d
Last active: January 21, 2018 10:41

Revisions

jcoreyes revised this gist
Sep 17, 2015. 1 changed file with 1 addition and 1 deletion.
@@ -1,6 +1,6 @@
##Information
name: LSTM image captioning model based on CVPR 2015 paper "[Show and tell: A neural image caption generator](http://arxiv.org/abs/1411.4555)" and code from Karpathy's [NeuralTalk](https://github.com/karpathy/neuraltalk).
model_file: https://s3-us-west-1.amazonaws.com/nervana-modelzoo/image_caption_flickr8k.py

jcoreyes revised this gist
Sep 17, 2015. 1 changed file with 1 addition and 1 deletion.
@@ -2,7 +2,7 @@
name: LSTM image captioning model based on CVPR 2015 paper "Show and tell: A neural image caption generator" and code from Karpathy's [NeuralTalk](https://github.com/karpathy/neuraltalk).
model_file: https://s3-us-west-1.amazonaws.com/nervana-modelzoo/image_caption_flickr8k.py
model_weights: https://s3-us-west-1.amazonaws.com/nervana-modelzoo/image_caption_flickr8k.p

jcoreyes revised this gist
Sep 17, 2015. 1 changed file with 3 additions and 2 deletions.
@@ -19,8 +19,9 @@ The LSTM model is trained on the [flickr8k dataset](http://nlp.cs.illinois.edu/H
O. Vinyals, A. Toshev, S. Bengio, and D. Erhan.
CVPR, 2015 (arXiv ref. cs1411.4555)
The model was trained for 15 epochs where 1 epoch is 1 pass over all 5 captions of each image. Training data was shuffled each epoch.
To evaluate on the test set, download the model and weights, and run:
python image_caption.py --model_file [path_to_weights]
##Performance
For testing, the model is only given the image and must predict the next word until a stop token is predicted. Greedy search is currently used by just taking the max probable word each time. Using the bleu score evaluation script from https://raw.githubusercontent.com/karpathy/neuraltalk/master/eval/ and evaluating against 5 reference sentences the results are below.

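This revision describes test-time decoding: the model sees only the image and greedily takes the most probable word at each step until it predicts a stop token. The sketch below illustrates that loop in plain Python. It is not the neon code in `image_caption.py`; the `step` callback (returning next-word probabilities plus an updated state, with the image assumed to be encoded in the initial state), the `max_len` cap, and the toy vocabulary are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical model interface: step(previous_token, state) -> (probs over vocab, new_state)
StepFn = Callable[[int, object], Tuple[Sequence[float], object]]


def greedy_decode(step: StepFn, init_state: object, start_id: int, stop_id: int,
                  max_len: int = 20) -> List[int]:
    """Greedy search: repeatedly take the single most probable next word,
    stopping when the stop token wins (or after max_len words)."""
    caption: List[int] = []
    token, state = start_id, init_state
    for _ in range(max_len):
        probs, state = step(token, state)
        token = max(range(len(probs)), key=probs.__getitem__)  # argmax over vocabulary
        if token == stop_id:
            break
        caption.append(token)
    return caption


# Toy "model": next-word distribution depends only on the previous token.
# Hypothetical vocabulary: 0 = start/stop, 1 = "a", 2 = "dog", 3 = "runs".
TABLE = {
    0: [0.1, 0.6, 0.2, 0.1],
    1: [0.1, 0.1, 0.7, 0.1],
    2: [0.2, 0.1, 0.1, 0.6],
    3: [0.8, 0.1, 0.05, 0.05],
}


def toy_step(token: int, state: object) -> Tuple[Sequence[float], object]:
    return TABLE[token], state


print(greedy_decode(toy_step, None, start_id=0, stop_id=0))  # [1, 2, 3]
```

With the toy table, greedy search emits tokens 1, 2, 3 and then stops because token 0 (used here as both start and stop) becomes the most probable next word. The `max_len` cap is an assumed safeguard against a model that never predicts the stop token.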
jcoreyes revised this gist
Sep 17, 2015. 1 changed file with 3 additions and 4 deletions.
@@ -2,12 +2,10 @@
name: LSTM image captioning model based on CVPR 2015 paper "Show and tell: A neural image caption generator" and code from Karpathy's [NeuralTalk](https://github.com/karpathy/neuraltalk).
model_file:
model_weights: https://s3-us-west-1.amazonaws.com/nervana-modelzoo/image_caption_flickr8k.p
neon_version: v1.0.rc1
neon_commit: 2169b093fbba0c189021a941d286c7a98c0c6c6c
@@ -21,7 +19,8 @@ The LSTM model is trained on the [flickr8k dataset](http://nlp.cs.illinois.edu/H
O. Vinyals, A. Toshev, S. Bengio, and D. Erhan.
CVPR, 2015 (arXiv ref. cs1411.4555)
The model was trained for 15 epochs where 1 epoch is 1 pass over all 5 captions of each image. Training data was shuffled each epoch.
To evaluate on the test set, download the model and weights and run:
python image_caption.py --model_file
##Performance
For testing, the model is only given the image and must predict the next word until a stop token is predicted. Greedy search is currently used by just taking the max probable word each time. Using the bleu score evaluation script from https://raw.githubusercontent.com/karpathy/neuraltalk/master/eval/ and evaluating against 5 reference sentences the results are below.

jcoreyes revised this gist
Sep 17, 2015. 1 changed file with 1 addition and 0 deletions.
@@ -21,6 +21,7 @@ The LSTM model is trained on the [flickr8k dataset](http://nlp.cs.illinois.edu/H
O. Vinyals, A. Toshev, S. Bengio, and D. Erhan.
CVPR, 2015 (arXiv ref. cs1411.4555)
The model was trained for 15 epochs where 1 epoch is 1 pass over all 5 captions of each image. Training data was shuffled each epoch.
##Performance
For testing, the model is only given the image and must predict the next word until a stop token is predicted. Greedy search is currently used by just taking the max probable word each time. Using the bleu score evaluation script from https://raw.githubusercontent.com/karpathy/neuraltalk/master/eval/ and evaluating against 5 reference sentences the results are below.

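The training note defines an epoch as one pass over all five captions of every image, with the data reshuffled each epoch. A minimal sketch of that schedule follows, assuming captions are held in a dict from a hypothetical image ID to its list of caption strings; batching and feature loading, which a real trainer needs, are omitted, and none of these names come from the gist.

```python
import random
from typing import Dict, Iterator, List, Tuple


def epoch_pairs(captions: Dict[str, List[str]], rng: random.Random) -> List[Tuple[str, str]]:
    """Return every (image, caption) pair exactly once, in a fresh random order."""
    pairs = [(image, cap) for image, caps in captions.items() for cap in caps]
    rng.shuffle(pairs)  # reshuffle at the start of each epoch
    return pairs


def training_schedule(captions: Dict[str, List[str]], num_epochs: int = 15,
                      seed: int = 0) -> Iterator[Tuple[int, str, str]]:
    """Yield (epoch, image, caption) for num_epochs shuffled passes over the data."""
    rng = random.Random(seed)
    for epoch in range(num_epochs):
        for image, cap in epoch_pairs(captions, rng):
            yield epoch, image, cap
```

Because each image contributes five pairs, one epoch over the commonly used 6,000-image Flickr8k training split would be 30,000 caption presentations, so each image feature is seen five times per epoch.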
jcoreyes revised this gist
Sep 17, 2015. 1 changed file with 1 addition and 1 deletion.
@@ -23,7 +23,7 @@ The LSTM model is trained on the [flickr8k dataset](http://nlp.cs.illinois.edu/H
##Performance
For testing, the model is only given the image and must predict the next word until a stop token is predicted. Greedy search is currently used by just taking the max probable word each time. Using the bleu score evaluation script from https://raw.githubusercontent.com/karpathy/neuraltalk/master/eval/ and evaluating against 5 reference sentences the results are below.
| BLEU | Score |
| ---- | ---- |

jcoreyes revised this gist
Sep 17, 2015. 1 changed file with 1 addition and 1 deletion.
@@ -12,7 +12,7 @@
neon_version: v1.0.rc1
neon_commit: 2169b093fbba0c189021a941d286c7a98c0c6c6c
gist_id: 7e76e90664f935c6f65d
##Description
The LSTM model is trained on the [flickr8k dataset](http://nlp.cs.illinois.edu/HockenmaierGroup/8k-pictures.html) using precomputed VGG features from http://cs.stanford.edu/people/karpathy/deepimagesent/. Model details can be found in the following [CVPR-2015 paper](http://arxiv.org/abs/1411.4555):

jcoreyes revised this gist
Sep 17, 2015. 1 changed file with 1 addition and 1 deletion.
@@ -12,7 +12,7 @@
neon_version: v1.0.rc1
neon_commit: 2169b093fbba0c189021a941d286c7a98c0c6c6c
gist_id:7e76e90664f935c6f65d
##Description
The LSTM model is trained on the [flickr8k dataset](http://nlp.cs.illinois.edu/HockenmaierGroup/8k-pictures.html) using precomputed VGG features from http://cs.stanford.edu/people/karpathy/deepimagesent/. Model details can be found in the following [CVPR-2015 paper](http://arxiv.org/abs/1411.4555):

jcoreyes revised this gist
Sep 17, 2015. 1 changed file with 2 additions and 2 deletions.
@@ -8,9 +8,9 @@ model_weights: https://s3-us-west-1.amazonaws.com/nervana-modelzoo/image_caption
license:
neon_version: v1.0.rc1
neon_commit: 2169b093fbba0c189021a941d286c7a98c0c6c6c
gist_id:

jcoreyes revised this gist
Sep 17, 2015. 1 changed file with 1 addition and 1 deletion.
@@ -4,7 +4,7 @@ name: LSTM image captioning model based on CVPR 2015 paper "Show and tell: A neu
model_file:
model_weights: https://s3-us-west-1.amazonaws.com/nervana-modelzoo/image_caption_flickr8k.p
license:

jcoreyes revised this gist
Sep 17, 2015. 1 changed file with 4 additions and 2 deletions.
@@ -23,11 +23,13 @@ The LSTM model is trained on the [flickr8k dataset](http://nlp.cs.illinois.edu/H
##Performance
For testing, the model is only given the image and must predict the next word until a stop token is predicted. Greedy search is currently used by just taking the max probable word each time. Using the bleu score evaluation script from https://raw.githubusercontent.com/karpathy/neuraltalk/master/eval/ and evaluating against 5 reference sentences the results are:
| BLEU | Score |
| ---- | ---- |
| B-1 | 54.2 |
| B-2 | 32.6 |
| B-3 | 19.3 |
| B-4 | 12.3 |
A few things that were not implemented are beam search, l2 regularization, and ensembles. With these things, performance would be a bit better.

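This revision lists beam search among the features that were not implemented; the model decodes greedily instead. For comparison, here is a minimal beam-search sketch over a hypothetical `step(token, state)` callback that returns next-word probabilities and a new state (it is not neon's API and not code from this gist). Rather than committing to one word per step, it keeps the `beam_size` highest-scoring partial captions, ranked by summed log-probability.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical model interface: step(previous_token, state) -> (probs over vocab, new_state)
StepFn = Callable[[int, object], Tuple[Sequence[float], object]]


def beam_search(step: StepFn, init_state: object, start_id: int, stop_id: int,
                beam_size: int = 3, max_len: int = 20) -> List[int]:
    """Return the highest-scoring caption (without start/stop tokens)."""
    # Each live hypothesis: (sum of log-probs, tokens so far, last token, model state)
    beams = [(0.0, [], start_id, init_state)]
    finished: List[Tuple[float, List[int]]] = []
    for _ in range(max_len):
        candidates = []
        for score, tokens, last, state in beams:
            probs, new_state = step(last, state)
            for tok, p in enumerate(probs):
                if p > 0.0:  # skip impossible words (log 0 is undefined)
                    candidates.append((score + math.log(p), tokens, tok, new_state))
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, tokens, tok, state in candidates[:beam_size]:
            if tok == stop_id:
                finished.append((score, tokens))  # hypothesis ended with the stop token
            else:
                beams.append((score, tokens + [tok], tok, state))
        if not beams:
            break
    # Hypotheses still open after max_len steps compete as-is.
    finished.extend((score, tokens) for score, tokens, _, _ in beams)
    return max(finished, key=lambda f: f[0])[1]


# Toy model; hypothetical vocabulary: 0 = start/stop, 1 = "a", 2 = "the", 3 = "dog".
TABLE = {
    0: [0.0, 0.55, 0.45, 0.0],
    1: [0.4, 0.3, 0.0, 0.3],
    2: [0.0, 0.0, 0.0, 1.0],
    3: [1.0, 0.0, 0.0, 0.0],
}


def toy_step(token: int, state: object) -> Tuple[Sequence[float], object]:
    return TABLE[token], state


print(beam_search(toy_step, None, 0, 0, beam_size=2))  # [2, 3]
print(beam_search(toy_step, None, 0, 0, beam_size=1))  # [1]
```

In the toy table, committing to the locally best first word picks "a" (0.55) and then stops, for a caption probability of 0.55 × 0.4 = 0.22, whereas a beam of 2 also keeps "the" and finds "the dog" with probability 0.45. With `beam_size=1` the search reduces to greedy decoding. Summed log-probabilities favor short captions; practical implementations often add length normalization, which this sketch omits.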
jcoreyes revised this gist
Sep 16, 2015. No changes.

jcoreyes revised this gist
Sep 16, 2015. 1 changed file with 6 additions and 3 deletions.
@@ -25,6 +25,9 @@ The LSTM model is trained on the [flickr8k dataset](http://nlp.cs.illinois.edu/H
##Performance
Using the bleu score evaluation script from https://raw.githubusercontent.com/karpathy/neuraltalk/master/eval/ and evaluating against 5 reference sentences the results are:
| BLEU | Score |
| ---- | ---- |
| B-1 | 54.2 |
| B-2 | 32.6 |
| B-3 | 19.3 |
| B-4 | 12.3 |

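These scores come from the evaluation script in NeuralTalk's `eval/` directory, scored against five reference captions per image. As a reading aid, here is a minimal, independent sketch of standard corpus-level BLEU: clipped n-gram precisions pooled over the whole test set, a brevity penalty, and their geometric mean. It is not that script and may differ from it in tokenization and edge cases. The README does not say whether B-1 to B-4 are cumulative BLEU-n scores or individual n-gram precisions, so the sketch returns both.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Count the n-grams (as tuples) in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses: List[List[str]], references: List[List[List[str]]],
                max_n: int = 4) -> Tuple[float, List[float]]:
    """Return (BLEU-max_n, [p_1, ..., p_max_n]), all on a 0..1 scale."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, refs in zip(hypotheses, references):
        hyp_len += len(hyp)
        # Effective reference length: the closest reference length (ties -> shorter).
        ref_len += min((abs(len(r) - len(hyp)), len(r)) for r in refs)[1]
        for n in range(1, max_n + 1):
            max_ref: Counter = Counter()
            for r in refs:
                max_ref |= ngrams(r, n)  # per-n-gram max count over all references
            hyp_counts = ngrams(hyp, n)
            # Clipping: an n-gram is credited at most as often as some reference has it.
            matches[n - 1] += sum(min(c, max_ref[g]) for g, c in hyp_counts.items())
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    precisions = [m / t if t else 0.0 for m, t in zip(matches, totals)]
    if min(precisions) == 0.0:
        return 0.0, precisions  # geometric mean is zero if any precision is zero
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    bleu = brevity * math.exp(sum(math.log(p) for p in precisions) / max_n)
    return bleu, precisions
```

The table reports scores on a 0 to 100 scale, so multiply by 100 to compare. A cumulative BLEU-n for n below 4 is obtained by calling with `max_n=n`.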
jcoreyes revised this gist
Sep 16, 2015. 1 changed file with 3 additions and 0 deletions.
@@ -24,4 +24,7 @@ The LSTM model is trained on the [flickr8k dataset](http://nlp.cs.illinois.edu/H
##Performance
Using the bleu score evaluation script from https://raw.githubusercontent.com/karpathy/neuraltalk/master/eval/ and evaluating against 5 reference sentences the results are:
| BLEU |
| ---- |
BLEU = 54.2/32.6/19.3/12.3

jcoreyes revised this gist
Sep 16, 2015. 1 changed file with 1 addition and 1 deletion.
@@ -23,5 +23,5 @@ The LSTM model is trained on the [flickr8k dataset](http://nlp.cs.illinois.edu/H
##Performance
Using the bleu score evaluation script from https://raw.githubusercontent.com/karpathy/neuraltalk/master/eval/ and evaluating against 5 reference sentences the results are:
BLEU = 54.2/32.6/19.3/12.3

jcoreyes revised this gist
Sep 16, 2015. 1 changed file with 4 additions and 2 deletions.
@@ -1,6 +1,6 @@
##Information
name: LSTM image captioning model based on CVPR 2015 paper "Show and tell: A neural image caption generator" and code from Karpathy's [NeuralTalk](https://github.com/karpathy/neuraltalk).
model_file:
@@ -22,4 +22,6 @@ The LSTM model is trained on the [flickr8k dataset](http://nlp.cs.illinois.edu/H
CVPR, 2015 (arXiv ref. cs1411.4555)
##Performance
https://raw.githubusercontent.com/karpathy/neuraltalk/master/eval/
BLEU = 54.2/32.6/19.3/12.3

jcoreyes revised this gist
Sep 16, 2015. 1 changed file with 1 addition and 1 deletion.
@@ -15,7 +15,7 @@
neon_commit:
gist_id:
##Description
The LSTM model is trained on the [flickr8k dataset](http://nlp.cs.illinois.edu/HockenmaierGroup/8k-pictures.html) using precomputed VGG features from http://cs.stanford.edu/people/karpathy/deepimagesent/. Model details can be found in the following [CVPR-2015 paper](http://arxiv.org/abs/1411.4555):
Show and tell: A neural image caption generator.
O. Vinyals, A. Toshev, S. Bengio, and D. Erhan.

jcoreyes revised this gist
Sep 16, 2015. No changes.

jcoreyes revised this gist
Sep 16, 2015. 1 changed file with 3 additions and 2 deletions.
@@ -17,8 +17,9 @@
gist_id:
##Description
The LSTM model is trained on the flickr8k, flickr30k, and coco datasets using precomputed VGG features from http://cs.stanford.edu/people/karpathy/deepimagesent/. Model details can be found in the following [CVPR-2015 paper](http://arxiv.org/abs/1411.4555):
Show and tell: A neural image caption generator.
O. Vinyals, A. Toshev, S. Bengio, and D. Erhan.
CVPR, 2015 (arXiv ref. cs1411.4555)
##Performance

jcoreyes revised this gist
Sep 16, 2015. 1 changed file with 2 additions and 2 deletions.
@@ -17,8 +17,8 @@
gist_id:
##Description
The LSTM model is trained on the flickr8k, flickr30k, and coco datasets using precomputed VGG features from http://cs.stanford.edu/people/karpathy/deepimagesent/. Model details can be found in the following [CVPR-2015 paper](http://arxiv.org/abs/1411.4555):
O. Vinyals, A. Toshev, S. Bengio, and D. Erhan.
Show and tell: A neural image caption generator.
CVPR, 2015 (arXiv ref. cs1411.4555)
##Performance

jcoreyes revised this gist
Sep 16, 2015. 1 changed file with 8 additions and 0 deletions.
@@ -1,11 +1,19 @@
##Information
name: LSTM image captioning model based on CVPR 2015 paper "Show and tell: A neural image caption generator".
model_file:
model_weights:
license:
neon_version:
neon_commit:
gist_id:
##Description
The LSTM model is trained on the flickr8k, flickr30k, and coco datasets using precomputed VGG features from http://cs.stanford.edu/people/karpathy/deepimagesent/. Model details can be found in the following [CVPR-2015 paper](http://arxiv.org/abs/1411.4555):

jcoreyes revised this gist
Sep 16, 2015. 1 changed file with 6 additions and 3 deletions.
@@ -1,13 +1,16 @@
##Information
name: LSTM image captioning model based on CVPR 2015 paper "Show and tell: A neural image caption generator".
model_file:
model_weights:
license:
neon_version:
neon_commit:
gist_id:
##Description
The LSTM model is trained on the flickr8k, flickr30k, and coco datasets using precomputed VGG features from http://cs.stanford.edu/people/karpathy/deepimagesent/. Model details can be found in the following [CVPR-2015 paper](http://arxiv.org/abs/1411.4555):
O. Vinyals, A. Toshev, S. Bengio, and D. Erhan. Show and tell: A neural image caption generator. CVPR, 2015 (arXiv ref. cs1411.4555)
##Performance

jcoreyes created this gist
Sep 16, 2015.
@@ -0,0 +1,13 @@
##Information
---
name:
model_file:
model_weights:
license:
neon_version:
neon_commit:
gist_id:
---
##Description
##Performance
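The initial template lays out the model-zoo metadata (`name`, `model_file`, `model_weights`, `license`, `neon_version`, `neon_commit`, `gist_id`) as `key: value` lines under `##Information`, which later revisions fill in. A small helper for reading those fields back out of the README text might look like this; `parse_info` is hypothetical and is not part of neon or any model-zoo tooling.

```python
from typing import Dict


def parse_info(readme: str) -> Dict[str, str]:
    """Collect `key: value` fields from the ##Information section of the README."""
    fields: Dict[str, str] = {}
    in_info = False
    for raw in readme.splitlines():
        line = raw.strip()
        if line.startswith("##"):
            # Entering or leaving a section such as ##Information or ##Description.
            in_info = line[2:].strip() == "Information"
            continue
        if not in_info or not line or line == "---":
            continue  # skip other sections, blank lines, and the --- markers
        key, sep, value = line.partition(":")  # split on the FIRST colon only
        if sep:
            fields[key.strip()] = value.strip()
    return fields
```

Splitting on the first colon keeps URL values such as `model_file` intact, empty fields come back as empty strings, and lines outside the `##Information` section are ignored.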