@jcoreyes
Last active January 21, 2018 10:41

Revisions

  1. jcoreyes revised this gist Sep 17, 2015. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion readme.md
    @@ -1,6 +1,6 @@
    ##Information

    - name: LSTM image captioning model based on CVPR 2015 paper "Show and tell: A neural image caption generator" and code from Karpathy's [NeuralTalk](https://github.com/karpathy/neuraltalk).
    + name: LSTM image captioning model based on CVPR 2015 paper "[Show and tell: A neural image caption generator](http://arxiv.org/abs/1411.4555)" and code from Karpathy's [NeuralTalk](https://github.com/karpathy/neuraltalk).

    model_file: https://s3-us-west-1.amazonaws.com/nervana-modelzoo/image_caption_flickr8k.py

  2. jcoreyes revised this gist Sep 17, 2015. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion readme.md
    @@ -2,7 +2,7 @@

    name: LSTM image captioning model based on CVPR 2015 paper "Show and tell: A neural image caption generator" and code from Karpathy's [NeuralTalk](https://github.com/karpathy/neuraltalk).

    - model_file:
    + model_file: https://s3-us-west-1.amazonaws.com/nervana-modelzoo/image_caption_flickr8k.py

    model_weights: https://s3-us-west-1.amazonaws.com/nervana-modelzoo/image_caption_flickr8k.p

  3. jcoreyes revised this gist Sep 17, 2015. 1 changed file with 3 additions and 2 deletions.
    5 changes: 3 additions & 2 deletions readme.md
    @@ -19,8 +19,9 @@ The LSTM model is trained on the [flickr8k dataset](http://nlp.cs.illinois.edu/H
    O. Vinyals, A. Toshev, S. Bengio, and D. Erhan.
    CVPR, 2015 (arXiv ref. cs1411.4555)

    - The model was trained for 15 epochs where 1 epoch is 1 pass over all 5 captions of each image. Training data was shuffled each epoch. To evaluate on the test set, download the model and weights and run:
    - python image_caption.py --model_file
    + The model was trained for 15 epochs where 1 epoch is 1 pass over all 5 captions of each image. Training data was shuffled each epoch. To evaluate on the test set, download the model and weights, and run:
    +
    + python image_caption.py --model_file [path_to_weights]

    ##Performance
    For testing, the model is only given the image and must predict the next word until a stop token is predicted. Greedy search is currently used by just taking the max probable word each time. Using the bleu score evaluation script from https://raw.githubusercontent.com/karpathy/neuraltalk/master/eval/ and evaluating against 5 reference sentences the results are below.
  4. jcoreyes revised this gist Sep 17, 2015. 1 changed file with 3 additions and 4 deletions.
    7 changes: 3 additions & 4 deletions readme.md
    @@ -2,12 +2,10 @@

    name: LSTM image captioning model based on CVPR 2015 paper "Show and tell: A neural image caption generator" and code from Karpathy's [NeuralTalk](https://github.com/karpathy/neuraltalk).

    model_file:
    model_file:

    model_weights: https://s3-us-west-1.amazonaws.com/nervana-modelzoo/image_caption_flickr8k.p

    license:

    neon_version: v1.0.rc1

    neon_commit: 2169b093fbba0c189021a941d286c7a98c0c6c6c
    @@ -21,7 +19,8 @@ The LSTM model is trained on the [flickr8k dataset](http://nlp.cs.illinois.edu/H
    O. Vinyals, A. Toshev, S. Bengio, and D. Erhan.
    CVPR, 2015 (arXiv ref. cs1411.4555)

    - The model was trained for 15 epochs where 1 epoch is 1 pass over all 5 captions of each image. Training data was shuffled each epoch.
    + The model was trained for 15 epochs where 1 epoch is 1 pass over all 5 captions of each image. Training data was shuffled each epoch. To evaluate on the test set, download the model and weights and run:
    + python image_caption.py --model_file

    ##Performance
    For testing, the model is only given the image and must predict the next word until a stop token is predicted. Greedy search is currently used by just taking the max probable word each time. Using the bleu score evaluation script from https://raw.githubusercontent.com/karpathy/neuraltalk/master/eval/ and evaluating against 5 reference sentences the results are below.
  5. jcoreyes revised this gist Sep 17, 2015. 1 changed file with 1 addition and 0 deletions.
    1 change: 1 addition & 0 deletions readme.md
    @@ -21,6 +21,7 @@ The LSTM model is trained on the [flickr8k dataset](http://nlp.cs.illinois.edu/H
    O. Vinyals, A. Toshev, S. Bengio, and D. Erhan.
    CVPR, 2015 (arXiv ref. cs1411.4555)

    + The model was trained for 15 epochs where 1 epoch is 1 pass over all 5 captions of each image. Training data was shuffled each epoch.

    ##Performance
    For testing, the model is only given the image and must predict the next word until a stop token is predicted. Greedy search is currently used by just taking the max probable word each time. Using the bleu score evaluation script from https://raw.githubusercontent.com/karpathy/neuraltalk/master/eval/ and evaluating against 5 reference sentences the results are below.
  6. jcoreyes revised this gist Sep 17, 2015. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion readme.md
    @@ -23,7 +23,7 @@ The LSTM model is trained on the [flickr8k dataset](http://nlp.cs.illinois.edu/H


    ##Performance
    - For testing, the model is only given the image and must predict the next word until a stop token is predicted. Greedy search is currently used by just taking the max probable word each time. Using the bleu score evaluation script from https://raw.githubusercontent.com/karpathy/neuraltalk/master/eval/ and evaluating against 5 reference sentences the results are:
    + For testing, the model is only given the image and must predict the next word until a stop token is predicted. Greedy search is currently used by just taking the max probable word each time. Using the bleu score evaluation script from https://raw.githubusercontent.com/karpathy/neuraltalk/master/eval/ and evaluating against 5 reference sentences the results are below.

    | BLEU | Score |
    | ---- | ---- |
  7. jcoreyes revised this gist Sep 17, 2015. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion readme.md
    @@ -12,7 +12,7 @@ neon_version: v1.0.rc1

    neon_commit: 2169b093fbba0c189021a941d286c7a98c0c6c6c

    - gist_id:7e76e90664f935c6f65d
    + gist_id: 7e76e90664f935c6f65d

    ##Description
    The LSTM model is trained on the [flickr8k dataset](http://nlp.cs.illinois.edu/HockenmaierGroup/8k-pictures.html) using precomputed VGG features from http://cs.stanford.edu/people/karpathy/deepimagesent/. Model details can be found in the following [CVPR-2015 paper](http://arxiv.org/abs/1411.4555):
  8. jcoreyes revised this gist Sep 17, 2015. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion readme.md
    @@ -12,7 +12,7 @@ neon_version: v1.0.rc1

    neon_commit: 2169b093fbba0c189021a941d286c7a98c0c6c6c

    - gist_id:
    + gist_id:7e76e90664f935c6f65d

    ##Description
    The LSTM model is trained on the [flickr8k dataset](http://nlp.cs.illinois.edu/HockenmaierGroup/8k-pictures.html) using precomputed VGG features from http://cs.stanford.edu/people/karpathy/deepimagesent/. Model details can be found in the following [CVPR-2015 paper](http://arxiv.org/abs/1411.4555):
  9. jcoreyes revised this gist Sep 17, 2015. 1 changed file with 2 additions and 2 deletions.
    4 changes: 2 additions & 2 deletions readme.md
    @@ -8,9 +8,9 @@ model_weights: https://s3-us-west-1.amazonaws.com/nervana-modelzoo/image_caption

    license:

    - neon_version:
    + neon_version: v1.0.rc1

    - neon_commit:
    + neon_commit: 2169b093fbba0c189021a941d286c7a98c0c6c6c

    gist_id:

  10. jcoreyes revised this gist Sep 17, 2015. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion readme.md
    @@ -4,7 +4,7 @@ name: LSTM image captioning model based on CVPR 2015 paper "Show and tell: A neu

    model_file:

    - model_weights:
    + model_weights: https://s3-us-west-1.amazonaws.com/nervana-modelzoo/image_caption_flickr8k.p

    license:

  11. jcoreyes revised this gist Sep 17, 2015. 1 changed file with 4 additions and 2 deletions.
    6 changes: 4 additions & 2 deletions readme.md
    @@ -23,11 +23,13 @@ The LSTM model is trained on the [flickr8k dataset](http://nlp.cs.illinois.edu/H


    ##Performance
    - Using the bleu score evaluation script from https://raw.githubusercontent.com/karpathy/neuraltalk/master/eval/ and evaluating against 5 reference sentences the results are:
    + For testing, the model is only given the image and must predict the next word until a stop token is predicted. Greedy search is currently used by just taking the max probable word each time. Using the bleu score evaluation script from https://raw.githubusercontent.com/karpathy/neuraltalk/master/eval/ and evaluating against 5 reference sentences the results are:

    | BLEU | Score |
    | ---- | ---- |
    | B-1 | 54.2 |
    | B-2 | 32.6 |
    | B-3 | 19.3 |
    | B-4 | 12.3 |
    | B-4 | 12.3 |

    + A few things that were not implemented are beam search, l2 regularization, and ensembles. With these things, performance would be a bit better.
  12. jcoreyes revised this gist Sep 16, 2015. No changes.
  13. jcoreyes revised this gist Sep 16, 2015. 1 changed file with 6 additions and 3 deletions.
    9 changes: 6 additions & 3 deletions readme.md
    @@ -25,6 +25,9 @@ The LSTM model is trained on the [flickr8k dataset](http://nlp.cs.illinois.edu/H
    ##Performance
    Using the bleu score evaluation script from https://raw.githubusercontent.com/karpathy/neuraltalk/master/eval/ and evaluating against 5 reference sentences the results are:

    - | BLEU |
    - | ---- |
    - BLEU = 54.2/32.6/19.3/12.3
    + | BLEU | Score |
    + | ---- | ---- |
    + | B-1 | 54.2 |
    + | B-2 | 32.6 |
    + | B-3 | 19.3 |
    + | B-4 | 12.3 |
  14. jcoreyes revised this gist Sep 16, 2015. 1 changed file with 3 additions and 0 deletions.
    3 changes: 3 additions & 0 deletions readme.md
    @@ -24,4 +24,7 @@ The LSTM model is trained on the [flickr8k dataset](http://nlp.cs.illinois.edu/H

    ##Performance
    Using the bleu score evaluation script from https://raw.githubusercontent.com/karpathy/neuraltalk/master/eval/ and evaluating against 5 reference sentences the results are:

    + | BLEU |
    + | ---- |
    BLEU = 54.2/32.6/19.3/12.3
  15. jcoreyes revised this gist Sep 16, 2015. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion readme.md
    @@ -23,5 +23,5 @@ The LSTM model is trained on the [flickr8k dataset](http://nlp.cs.illinois.edu/H


    ##Performance
    - https://raw.githubusercontent.com/karpathy/neuraltalk/master/eval/
    + Using the bleu score evaluation script from https://raw.githubusercontent.com/karpathy/neuraltalk/master/eval/ and evaluating against 5 reference sentences the results are:
    BLEU = 54.2/32.6/19.3/12.3
  16. jcoreyes revised this gist Sep 16, 2015. 1 changed file with 4 additions and 2 deletions.
    6 changes: 4 additions & 2 deletions readme.md
    @@ -1,6 +1,6 @@
    ##Information

    - name: LSTM image captioning model based on CVPR 2015 paper "Show and tell: A neural image caption generator".
    + name: LSTM image captioning model based on CVPR 2015 paper "Show and tell: A neural image caption generator" and code from Karpathy's [NeuralTalk](https://github.com/karpathy/neuraltalk).

    model_file:

    @@ -22,4 +22,6 @@ The LSTM model is trained on the [flickr8k dataset](http://nlp.cs.illinois.edu/H
    CVPR, 2015 (arXiv ref. cs1411.4555)


    ##Performance
    ##Performance
    + https://raw.githubusercontent.com/karpathy/neuraltalk/master/eval/
    + BLEU = 54.2/32.6/19.3/12.3
  17. jcoreyes revised this gist Sep 16, 2015. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion readme.md
    @@ -15,7 +15,7 @@ neon_commit:
    gist_id:

    ##Description
    - The LSTM model is trained on the flickr8k, flickr30k, and coco datasets using precomputed VGG features from http://cs.stanford.edu/people/karpathy/deepimagesent/. Model details can be found in the following [CVPR-2015 paper](http://arxiv.org/abs/1411.4555):
    + The LSTM model is trained on the [flickr8k dataset](http://nlp.cs.illinois.edu/HockenmaierGroup/8k-pictures.html) using precomputed VGG features from http://cs.stanford.edu/people/karpathy/deepimagesent/. Model details can be found in the following [CVPR-2015 paper](http://arxiv.org/abs/1411.4555):

    Show and tell: A neural image caption generator.
    O. Vinyals, A. Toshev, S. Bengio, and D. Erhan.
  18. jcoreyes revised this gist Sep 16, 2015. No changes.
  19. jcoreyes revised this gist Sep 16, 2015. 1 changed file with 3 additions and 2 deletions.
    5 changes: 3 additions & 2 deletions readme.md
    @@ -17,8 +17,9 @@ gist_id:
    ##Description
    The LSTM model is trained on the flickr8k, flickr30k, and coco datasets using precomputed VGG features from http://cs.stanford.edu/people/karpathy/deepimagesent/. Model details can be found in the following [CVPR-2015 paper](http://arxiv.org/abs/1411.4555):

    - O. Vinyals, A. Toshev, S. Bengio, and D. Erhan. Show and
    - tell: A neural image caption generator. CVPR, 2015 (arXiv ref. cs1411.4555)
    + Show and tell: A neural image caption generator.
    + O. Vinyals, A. Toshev, S. Bengio, and D. Erhan.
    + CVPR, 2015 (arXiv ref. cs1411.4555)


    ##Performance
  20. jcoreyes revised this gist Sep 16, 2015. 1 changed file with 2 additions and 2 deletions.
    4 changes: 2 additions & 2 deletions readme.md
    @@ -17,8 +17,8 @@ gist_id:
    ##Description
    The LSTM model is trained on the flickr8k, flickr30k, and coco datasets using precomputed VGG features from http://cs.stanford.edu/people/karpathy/deepimagesent/. Model details can be found in the following [CVPR-2015 paper](http://arxiv.org/abs/1411.4555):

    O. Vinyals, A. Toshev, S. Bengio, and D. Erhan. Show and
    tell: A neural image caption generator. CVPR, 2015 (arXiv ref. cs1411.4555)
    O. Vinyals, A. Toshev, S. Bengio, and D. Erhan. Show and
    tell: A neural image caption generator. CVPR, 2015 (arXiv ref. cs1411.4555)


    ##Performance
  21. jcoreyes revised this gist Sep 16, 2015. 1 changed file with 8 additions and 0 deletions.
    8 changes: 8 additions & 0 deletions readme.md
    @@ -1,11 +1,19 @@
    ##Information

    name: LSTM image captioning model based on CVPR 2015 paper "Show and tell: A neural image caption generator".

    model_file:

    model_weights:

    license:

    neon_version:

    neon_commit:

    gist_id:

    ##Description
    The LSTM model is trained on the flickr8k, flickr30k, and coco datasets using precomputed VGG features from http://cs.stanford.edu/people/karpathy/deepimagesent/. Model details can be found in the following [CVPR-2015 paper](http://arxiv.org/abs/1411.4555):

  22. jcoreyes revised this gist Sep 16, 2015. 1 changed file with 6 additions and 3 deletions.
    9 changes: 6 additions & 3 deletions readme.md
    @@ -1,13 +1,16 @@
    ##Information
    - ---
    - name:
    + name: LSTM image captioning model based on CVPR 2015 paper "Show and tell: A neural image caption generator".
    model_file:
    model_weights:
    license:
    neon_version:
    neon_commit:
    gist_id:
    - ---
    ##Description
    + The LSTM model is trained on the flickr8k, flickr30k, and coco datasets using precomputed VGG features from http://cs.stanford.edu/people/karpathy/deepimagesent/. Model details can be found in the following [CVPR-2015 paper](http://arxiv.org/abs/1411.4555):

    + O. Vinyals, A. Toshev, S. Bengio, and D. Erhan. Show and
    + tell: A neural image caption generator. CVPR, 2015 (arXiv ref. cs1411.4555)


    ##Performance
  23. jcoreyes created this gist Sep 16, 2015.
    13 changes: 13 additions & 0 deletions readme.md
    @@ -0,0 +1,13 @@
    ##Information
    ---
    name:
    model_file:
    model_weights:
    license:
    neon_version:
    neon_commit:
    gist_id:
    ---
    ##Description

    ##Performance
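
As context for the Performance section the readme settles on in the revisions above (captions generated with greedy search until a stop token, then scored with BLEU-1 through BLEU-4 against 5 reference sentences), here is a minimal sketch of that procedure. The `predict_next` callable, the `vocab` list, and the `<eos>` token are hypothetical stand-ins, not the gist's actual `image_caption.py`/neon interface, and BLEU is computed with NLTK rather than the NeuralTalk eval script linked in the readme.

```python
import numpy as np
from nltk.translate.bleu_score import corpus_bleu


def greedy_decode(predict_next, image_feat, vocab, max_len=20, eos="<eos>"):
    """Greedy search: keep only the single most probable word at each step.

    predict_next(image_feat, words_so_far, state) is assumed to return
    (probabilities over the vocabulary, updated recurrent state).
    """
    caption, state = [], None
    for _ in range(max_len):
        probs, state = predict_next(image_feat, caption, state)
        word = vocab[int(np.argmax(probs))]  # beam search would instead keep
                                             # the k best partial captions here
        if word == eos:
            break
        caption.append(word)
    return caption


def bleu_1_to_4(hypotheses, references):
    """Corpus BLEU-1..4 in percent; references[i] is a list of up to 5
    tokenized reference captions for the tokenized caption hypotheses[i]."""
    return {"B-%d" % n: 100 * corpus_bleu(references, hypotheses,
                                          weights=tuple([1.0 / n] * n))
            for n in range(1, 5)}
```

Numbers from a sketch like this will not match the table above exactly, since the readme's B-1 through B-4 figures come from the NeuralTalk evaluation script; the sketch only illustrates the decoding and scoring steps.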