tianqig · September 11, 2017 09:43 · Sep 17, 2015
diff --git a/readme.md b/readme.md
@@ -1,6 +1,6 @@
 ##Information
 
-name: LSTM image captioning model based on CVPR 2015 paper "Show and tell: A neural image caption generator" and code from Karpathy's [NeuralTalk](https://github.com/karpathy/neuraltalk).
+name: LSTM image captioning model based on CVPR 2015 paper "[Show and tell: A neural image caption generator](http://arxiv.org/abs/1411.4555)" and code from Karpathy's [NeuralTalk](https://github.com/karpathy/neuraltalk).
 
 model_file: https://s3-us-west-1.amazonaws.com/nervana-modelzoo/image_caption_flickr8k.py
 

diff --git a/readme.md b/readme.md
@@ -2,7 +2,7 @@
 
 name: LSTM image captioning model based on CVPR 2015 paper "Show and tell: A neural image caption generator" and code from Karpathy's [NeuralTalk](https://github.com/karpathy/neuraltalk).
 
-model_file: 
+model_file: https://s3-us-west-1.amazonaws.com/nervana-modelzoo/image_caption_flickr8k.py
 
 model_weights: https://s3-us-west-1.amazonaws.com/nervana-modelzoo/image_caption_flickr8k.p
 

diff --git a/readme.md b/readme.md
@@ -19,8 +19,9 @@ The LSTM model is trained on the [flickr8k dataset](http://nlp.cs.illinois.edu/H
     O. Vinyals, A. Toshev, S. Bengio, and D. Erhan.  
     CVPR, 2015 (arXiv ref. cs1411.4555)
 
-The model was trained for 15 epochs where 1 epoch is 1 pass over all 5 captions of each image. Training data was shuffled each epoch. To evaluate on the test set, download the model and weights and run:
-        python image_caption.py --model_file
+The model was trained for 15 epochs where 1 epoch is 1 pass over all 5 captions of each image. Training data was shuffled each epoch. To evaluate on the test set, download the model and weights, and run:
+
+        python image_caption.py --model_file [path_to_weights]
 
 ##Performance
 For testing, the model is only given the image and must predict the next word until a stop token is predicted. Greedy search is currently used by just taking the max probable word each time. Using the bleu score evaluation script from https://raw.githubusercontent.com/karpathy/neuraltalk/master/eval/ and evaluating against 5 reference sentences the results are below.
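
The greedy search described in the Performance text can be sketched as follows. This is a minimal illustration, not the code in `image_caption.py`: the function `greedy_decode`, the callable `next_word_probs`, the `max_len` cap, and the toy model are all hypothetical names and values introduced here, and the real model would condition on image features as well as the words generated so far.

```python
from typing import Callable, List, Sequence


def greedy_decode(
    next_word_probs: Callable[[Sequence[int]], Sequence[float]],
    stop_token: int,
    max_len: int = 20,
) -> List[int]:
    """Take the most probable word at each step until the stop token is chosen.

    `next_word_probs` is a hypothetical stand-in for the trained model: given
    the word ids generated so far, it returns one probability per vocabulary id.
    """
    words: List[int] = []
    for _ in range(max_len):  # cap the length in case no stop token is predicted
        probs = next_word_probs(words)
        best = max(range(len(probs)), key=probs.__getitem__)
        if best == stop_token:
            break
        words.append(best)
    return words


def toy_model(prefix: Sequence[int]) -> Sequence[float]:
    """Toy distribution over a 3-word vocabulary; id 0 plays the stop token."""
    table = [
        [0.1, 0.7, 0.2],    # first step: word 1 is most probable
        [0.1, 0.1, 0.8],    # second step: word 2 is most probable
        [0.9, 0.05, 0.05],  # third step: stop token is most probable
    ]
    return table[min(len(prefix), 2)]


caption_ids = greedy_decode(toy_model, stop_token=0)
```

Greedy decoding commits to one word per step, so it can miss a caption whose first word is slightly less probable but whose overall probability is higher; beam search keeps several partial captions alive to reduce that risk.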

diff --git a/readme.md b/readme.md
@@ -2,12 +2,10 @@
 
 name: LSTM image captioning model based on CVPR 2015 paper "Show and tell: A neural image caption generator" and code from Karpathy's [NeuralTalk](https://github.com/karpathy/neuraltalk).
 
-model_file:
+model_file: 
 
 model_weights: https://s3-us-west-1.amazonaws.com/nervana-modelzoo/image_caption_flickr8k.p
 
-license:
-
 neon_version: v1.0.rc1
 
 neon_commit: 2169b093fbba0c189021a941d286c7a98c0c6c6c
@@ -21,7 +19,8 @@ The LSTM model is trained on the [flickr8k dataset](http://nlp.cs.illinois.edu/H
     O. Vinyals, A. Toshev, S. Bengio, and D. Erhan.  
     CVPR, 2015 (arXiv ref. cs1411.4555)
 
-The model was trained for 15 epochs where 1 epoch is 1 pass over all 5 captions of each image. Training data was shuffled each epoch. 
+The model was trained for 15 epochs where 1 epoch is 1 pass over all 5 captions of each image. Training data was shuffled each epoch. To evaluate on the test set, download the model and weights and run:
+        python image_caption.py --model_file
 
 ##Performance
 For testing, the model is only given the image and must predict the next word until a stop token is predicted. Greedy search is currently used by just taking the max probable word each time. Using the bleu score evaluation script from https://raw.githubusercontent.com/karpathy/neuraltalk/master/eval/ and evaluating against 5 reference sentences the results are below.

diff --git a/readme.md b/readme.md
@@ -21,6 +21,7 @@ The LSTM model is trained on the [flickr8k dataset](http://nlp.cs.illinois.edu/H
     O. Vinyals, A. Toshev, S. Bengio, and D. Erhan.  
     CVPR, 2015 (arXiv ref. cs1411.4555)
 
+The model was trained for 15 epochs where 1 epoch is 1 pass over all 5 captions of each image. Training data was shuffled each epoch. 
 
 ##Performance
 For testing, the model is only given the image and must predict the next word until a stop token is predicted. Greedy search is currently used by just taking the max probable word each time. Using the bleu score evaluation script from https://raw.githubusercontent.com/karpathy/neuraltalk/master/eval/ and evaluating against 5 reference sentences the results are below.

diff --git a/readme.md b/readme.md
@@ -23,7 +23,7 @@ The LSTM model is trained on the [flickr8k dataset](http://nlp.cs.illinois.edu/H
 
 
 ##Performance
-For testing, the model is only given the image and must predict the next word until a stop token is predicted. Greedy search is currently used by just taking the max probable word each time. Using the bleu score evaluation script from https://raw.githubusercontent.com/karpathy/neuraltalk/master/eval/ and evaluating against 5 reference sentences the results are:
+For testing, the model is only given the image and must predict the next word until a stop token is predicted. Greedy search is currently used by just taking the max probable word each time. Using the bleu score evaluation script from https://raw.githubusercontent.com/karpathy/neuraltalk/master/eval/ and evaluating against 5 reference sentences the results are below.
 
 | BLEU | Score |
 | ---- | ----  |

diff --git a/readme.md b/readme.md
@@ -12,7 +12,7 @@ neon_version: v1.0.rc1
 
 neon_commit: 2169b093fbba0c189021a941d286c7a98c0c6c6c
 
-gist_id:7e76e90664f935c6f65d
+gist_id: 7e76e90664f935c6f65d
 
 ##Description
 The LSTM model is trained on the [flickr8k dataset](http://nlp.cs.illinois.edu/HockenmaierGroup/8k-pictures.html) using precomputed VGG features from http://cs.stanford.edu/people/karpathy/deepimagesent/. Model details can be found in the following [CVPR-2015 paper](http://arxiv.org/abs/1411.4555):

diff --git a/readme.md b/readme.md
@@ -12,7 +12,7 @@ neon_version: v1.0.rc1
 
 neon_commit: 2169b093fbba0c189021a941d286c7a98c0c6c6c
 
-gist_id:
+gist_id:7e76e90664f935c6f65d
 
 ##Description
 The LSTM model is trained on the [flickr8k dataset](http://nlp.cs.illinois.edu/HockenmaierGroup/8k-pictures.html) using precomputed VGG features from http://cs.stanford.edu/people/karpathy/deepimagesent/. Model details can be found in the following [CVPR-2015 paper](http://arxiv.org/abs/1411.4555):

diff --git a/readme.md b/readme.md
@@ -8,9 +8,9 @@ model_weights: https://s3-us-west-1.amazonaws.com/nervana-modelzoo/image_caption
 
 license:
 
-neon_version:
+neon_version: v1.0.rc1
 
-neon_commit:
+neon_commit: 2169b093fbba0c189021a941d286c7a98c0c6c6c
 
 gist_id:
 

diff --git a/readme.md b/readme.md
@@ -4,7 +4,7 @@ name: LSTM image captioning model based on CVPR 2015 paper "Show and tell: A neu
 
 model_file:
 
-model_weights:
+model_weights: https://s3-us-west-1.amazonaws.com/nervana-modelzoo/image_caption_flickr8k.p
 
 license:
 

diff --git a/readme.md b/readme.md
@@ -23,11 +23,13 @@ The LSTM model is trained on the [flickr8k dataset](http://nlp.cs.illinois.edu/H
 
 
 ##Performance
-Using the bleu score evaluation script from https://raw.githubusercontent.com/karpathy/neuraltalk/master/eval/ and evaluating against 5 reference sentences the results are:
+For testing, the model is only given the image and must predict the next word until a stop token is predicted. Greedy search is currently used by just taking the max probable word each time. Using the bleu score evaluation script from https://raw.githubusercontent.com/karpathy/neuraltalk/master/eval/ and evaluating against 5 reference sentences the results are:
 
 | BLEU | Score |
 | ---- | ----  |
 | B-1  | 54.2  |
 | B-2  | 32.6  |
 | B-3  | 19.3  |
-| B-4  | 12.3  |
+| B-4  | 12.3  |
+
+Beam search, L2 regularization, and ensembling were not implemented; adding them would likely improve these scores somewhat.
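
For readers unfamiliar with how a caption is scored against several reference sentences, the sketch below shows the clipped n-gram precision that BLEU is built on. It is an assumption-labeled illustration, not the NeuralTalk evaluation script: the helper names `ngrams` and `modified_precision` and the example sentences are invented here, and the sketch omits the brevity penalty and corpus-level aggregation that a full BLEU implementation typically includes.

```python
from collections import Counter
from typing import List, Sequence


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Count the n-grams in a tokenized sentence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(
    candidate: Sequence[str], references: List[Sequence[str]], n: int
) -> float:
    """Clipped n-gram precision of one candidate against several references."""
    cand_counts = ngrams(candidate, n)
    if not cand_counts:
        return 0.0
    # For each n-gram, the largest count seen in any single reference.
    max_ref: Counter = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    # A candidate n-gram is credited at most as often as it occurs in a reference.
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


# Hypothetical example: two references instead of the five used in the readme.
candidate = "the the the cat".split()
references = ["the cat sat".split(), "a cat on the mat".split()]
p1 = modified_precision(candidate, references, 1)  # 2 of 4 unigrams credited
p2 = modified_precision(candidate, references, 2)  # 1 of 3 bigrams credited
```

Clipping is what stops a caption from scoring well by repeating a common word: "the" appears three times in the candidate but is credited only once, because no single reference contains it more than once.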
diff --git a/readme.md b/readme.md
@@ -25,6 +25,9 @@ The LSTM model is trained on the [flickr8k dataset](http://nlp.cs.illinois.edu/H
 ##Performance
 Using the bleu score evaluation script from https://raw.githubusercontent.com/karpathy/neuraltalk/master/eval/ and evaluating against 5 reference sentences the results are:
 
-| BLEU |
-| ---- |
-BLEU = 54.2/32.6/19.3/12.3
+| BLEU | Score |
+| ---- | ----  |
+| B-1  | 54.2  |
+| B-2  | 32.6  |
+| B-3  | 19.3  |
+| B-4  | 12.3  |
diff --git a/readme.md b/readme.md
@@ -24,4 +24,7 @@ The LSTM model is trained on the [flickr8k dataset](http://nlp.cs.illinois.edu/H
 
 ##Performance
 Using the bleu score evaluation script from https://raw.githubusercontent.com/karpathy/neuraltalk/master/eval/ and evaluating against 5 reference sentences the results are:
+
+| BLEU |
+| ---- |
 BLEU = 54.2/32.6/19.3/12.3
diff --git a/readme.md b/readme.md
@@ -23,5 +23,5 @@ The LSTM model is trained on the [flickr8k dataset](http://nlp.cs.illinois.edu/H
 
 
 ##Performance
-https://raw.githubusercontent.com/karpathy/neuraltalk/master/eval/
+Using the bleu score evaluation script from https://raw.githubusercontent.com/karpathy/neuraltalk/master/eval/ and evaluating against 5 reference sentences the results are:
 BLEU = 54.2/32.6/19.3/12.3
diff --git a/readme.md b/readme.md
@@ -1,6 +1,6 @@
 ##Information
 
-name: LSTM image captioning model based on CVPR 2015 paper "Show and tell: A neural image caption generator".
+name: LSTM image captioning model based on CVPR 2015 paper "Show and tell: A neural image caption generator" and code from Karpathy's [NeuralTalk](https://github.com/karpathy/neuraltalk).
 
 model_file:
 
@@ -22,4 +22,6 @@ The LSTM model is trained on the [flickr8k dataset](http://nlp.cs.illinois.edu/H
     CVPR, 2015 (arXiv ref. cs1411.4555)
 
 
-##Performance
+##Performance
+https://raw.githubusercontent.com/karpathy/neuraltalk/master/eval/
+BLEU = 54.2/32.6/19.3/12.3
diff --git a/readme.md b/readme.md
@@ -15,7 +15,7 @@ neon_commit:
 gist_id:
 
 ##Description
-The LSTM model is trained on the flickr8k, flickr30k, and coco datasets using precomputed VGG features from http://cs.stanford.edu/people/karpathy/deepimagesent/. Model details can be found in the following [CVPR-2015 paper](http://arxiv.org/abs/1411.4555):
+The LSTM model is trained on the [flickr8k dataset](http://nlp.cs.illinois.edu/HockenmaierGroup/8k-pictures.html) using precomputed VGG features from http://cs.stanford.edu/people/karpathy/deepimagesent/. Model details can be found in the following [CVPR-2015 paper](http://arxiv.org/abs/1411.4555):
 
     Show and tell: A neural image caption generator.
     O. Vinyals, A. Toshev, S. Bengio, and D. Erhan.  

diff --git a/readme.md b/readme.md
@@ -17,8 +17,9 @@ gist_id:
 ##Description
 The LSTM model is trained on the flickr8k, flickr30k, and coco datasets using precomputed VGG features from http://cs.stanford.edu/people/karpathy/deepimagesent/. Model details can be found in the following [CVPR-2015 paper](http://arxiv.org/abs/1411.4555):
 
-    O. Vinyals, A. Toshev, S. Bengio, and D. Erhan. Show and
-    tell: A neural image caption generator. CVPR, 2015 (arXiv ref. cs1411.4555)
+    Show and tell: A neural image caption generator.
+    O. Vinyals, A. Toshev, S. Bengio, and D. Erhan.  
+    CVPR, 2015 (arXiv ref. cs1411.4555)
 
 
 ##Performance
diff --git a/readme.md b/readme.md
@@ -17,8 +17,8 @@ gist_id:
 ##Description
 The LSTM model is trained on the flickr8k, flickr30k, and coco datasets using precomputed VGG features from http://cs.stanford.edu/people/karpathy/deepimagesent/. Model details can be found in the following [CVPR-2015 paper](http://arxiv.org/abs/1411.4555):
 
-  O. Vinyals, A. Toshev, S. Bengio, and D. Erhan. Show and
-  tell: A neural image caption generator. CVPR, 2015 (arXiv ref. cs1411.4555)
+    O. Vinyals, A. Toshev, S. Bengio, and D. Erhan. Show and
+    tell: A neural image caption generator. CVPR, 2015 (arXiv ref. cs1411.4555)
 
 
 ##Performance
diff --git a/readme.md b/readme.md
@@ -1,11 +1,19 @@
 ##Information
+
 name: LSTM image captioning model based on CVPR 2015 paper "Show and tell: A neural image caption generator".
+
 model_file:
+
 model_weights:
+
 license:
+
 neon_version:
+
 neon_commit:
+
 gist_id:
+
 ##Description
 The LSTM model is trained on the flickr8k, flickr30k, and coco datasets using precomputed VGG features from http://cs.stanford.edu/people/karpathy/deepimagesent/. Model details can be found in the following [CVPR-2015 paper](http://arxiv.org/abs/1411.4555):
 

diff --git a/readme.md b/readme.md
@@ -1,13 +1,16 @@
 ##Information
----
-name:
+name: LSTM image captioning model based on CVPR 2015 paper "Show and tell: A neural image caption generator".
 model_file:
 model_weights:
 license:
 neon_version:
 neon_commit:
 gist_id:
----
 ##Description
+The LSTM model is trained on the flickr8k, flickr30k, and coco datasets using precomputed VGG features from http://cs.stanford.edu/people/karpathy/deepimagesent/. Model details can be found in the following [CVPR-2015 paper](http://arxiv.org/abs/1411.4555):
+
+  O. Vinyals, A. Toshev, S. Bengio, and D. Erhan. Show and
+  tell: A neural image caption generator. CVPR, 2015 (arXiv ref. cs1411.4555)
+
 
 ##Performance
diff --git a/readme.md b/readme.md
@@ -0,0 +1,13 @@
+##Information
+---
+name:
+model_file:
+model_weights:
+license:
+neon_version:
+neon_commit:
+gist_id:
+---
+##Description
+
+##Performance