@tianqig
Forked from jcoreyes/readme.md
Created September 11, 2017 09:43

Revisions

  1. @jcoreyes jcoreyes revised this gist Sep 17, 2015. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion readme.md
    @@ -1,6 +1,6 @@
    ##Information

    name: LSTM image captioning model based on CVPR 2015 paper "Show and tell: A neural image caption generator" and code from Karpathy's [NeuralTalk](https://github.com/karpathy/neuraltalk).
    name: LSTM image captioning model based on CVPR 2015 paper "[Show and tell: A neural image caption generator](http://arxiv.org/abs/1411.4555)" and code from Karpathy's [NeuralTalk](https://github.com/karpathy/neuraltalk).

    model_file: https://s3-us-west-1.amazonaws.com/nervana-modelzoo/image_caption_flickr8k.py

  2. @jcoreyes jcoreyes revised this gist Sep 17, 2015. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion readme.md
    @@ -2,7 +2,7 @@

    name: LSTM image captioning model based on CVPR 2015 paper "Show and tell: A neural image caption generator" and code from Karpathy's [NeuralTalk](https://github.com/karpathy/neuraltalk).

    model_file:
    model_file: https://s3-us-west-1.amazonaws.com/nervana-modelzoo/image_caption_flickr8k.py

    model_weights: https://s3-us-west-1.amazonaws.com/nervana-modelzoo/image_caption_flickr8k.p

  3. @jcoreyes jcoreyes revised this gist Sep 17, 2015. 1 changed file with 3 additions and 2 deletions.
    5 changes: 3 additions & 2 deletions readme.md
    @@ -19,8 +19,9 @@ The LSTM model is trained on the [flickr8k dataset](http://nlp.cs.illinois.edu/H
    O. Vinyals, A. Toshev, S. Bengio, and D. Erhan.
    CVPR, 2015 (arXiv ref. cs1411.4555)

    The model was trained for 15 epochs where 1 epoch is 1 pass over all 5 captions of each image. Training data was shuffled each epoch. To evaluate on the test set, download the model and weights and run:
    python image_caption.py --model_file
    The model was trained for 15 epochs where 1 epoch is 1 pass over all 5 captions of each image. Training data was shuffled each epoch. To evaluate on the test set, download the model and weights, and run:

    python image_caption.py --model_file [path_to_weights]

    ##Performance
    For testing, the model is only given the image and must predict the next word until a stop token is predicted. Greedy search is currently used by just taking the max probable word each time. Using the bleu score evaluation script from https://raw.githubusercontent.com/karpathy/neuraltalk/master/eval/ and evaluating against 5 reference sentences the results are below.
  4. @jcoreyes jcoreyes revised this gist Sep 17, 2015. 1 changed file with 3 additions and 4 deletions.
    7 changes: 3 additions & 4 deletions readme.md
    @@ -2,12 +2,10 @@

    name: LSTM image captioning model based on CVPR 2015 paper "Show and tell: A neural image caption generator" and code from Karpathy's [NeuralTalk](https://github.com/karpathy/neuraltalk).

    model_file:
    model_file:

    model_weights: https://s3-us-west-1.amazonaws.com/nervana-modelzoo/image_caption_flickr8k.p

    license:

    neon_version: v1.0.rc1

    neon_commit: 2169b093fbba0c189021a941d286c7a98c0c6c6c
    @@ -21,7 +19,8 @@ The LSTM model is trained on the [flickr8k dataset](http://nlp.cs.illinois.edu/H
    O. Vinyals, A. Toshev, S. Bengio, and D. Erhan.
    CVPR, 2015 (arXiv ref. cs1411.4555)

    The model was trained for 15 epochs where 1 epoch is 1 pass over all 5 captions of each image. Training data was shuffled each epoch.
    The model was trained for 15 epochs where 1 epoch is 1 pass over all 5 captions of each image. Training data was shuffled each epoch. To evaluate on the test set, download the model and weights and run:
    python image_caption.py --model_file

    ##Performance
    For testing, the model is only given the image and must predict the next word until a stop token is predicted. Greedy search is currently used by just taking the max probable word each time. Using the bleu score evaluation script from https://raw.githubusercontent.com/karpathy/neuraltalk/master/eval/ and evaluating against 5 reference sentences the results are below.
  5. @jcoreyes jcoreyes revised this gist Sep 17, 2015. 1 changed file with 1 addition and 0 deletions.
    1 change: 1 addition & 0 deletions readme.md
    @@ -21,6 +21,7 @@ The LSTM model is trained on the [flickr8k dataset](http://nlp.cs.illinois.edu/H
    O. Vinyals, A. Toshev, S. Bengio, and D. Erhan.
    CVPR, 2015 (arXiv ref. cs1411.4555)

    The model was trained for 15 epochs where 1 epoch is 1 pass over all 5 captions of each image. Training data was shuffled each epoch.

    ##Performance
    For testing, the model is only given the image and must predict the next word until a stop token is predicted. Greedy search is currently used by just taking the max probable word each time. Using the bleu score evaluation script from https://raw.githubusercontent.com/karpathy/neuraltalk/master/eval/ and evaluating against 5 reference sentences the results are below.
  6. @jcoreyes jcoreyes revised this gist Sep 17, 2015. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion readme.md
    @@ -23,7 +23,7 @@ The LSTM model is trained on the [flickr8k dataset](http://nlp.cs.illinois.edu/H


    ##Performance
    For testing, the model is only given the image and must predict the next word until a stop token is predicted. Greedy search is currently used by just taking the max probable word each time. Using the bleu score evaluation script from https://raw.githubusercontent.com/karpathy/neuraltalk/master/eval/ and evaluating against 5 reference sentences the results are:
    For testing, the model is only given the image and must predict the next word until a stop token is predicted. Greedy search is currently used by just taking the max probable word each time. Using the bleu score evaluation script from https://raw.githubusercontent.com/karpathy/neuraltalk/master/eval/ and evaluating against 5 reference sentences the results are below.

    | BLEU | Score |
    | ---- | ---- |
  7. @jcoreyes jcoreyes revised this gist Sep 17, 2015. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion readme.md
    @@ -12,7 +12,7 @@ neon_version: v1.0.rc1

    neon_commit: 2169b093fbba0c189021a941d286c7a98c0c6c6c

    gist_id:7e76e90664f935c6f65d
    gist_id: 7e76e90664f935c6f65d

    ##Description
    The LSTM model is trained on the [flickr8k dataset](http://nlp.cs.illinois.edu/HockenmaierGroup/8k-pictures.html) using precomputed VGG features from http://cs.stanford.edu/people/karpathy/deepimagesent/. Model details can be found in the following [CVPR-2015 paper](http://arxiv.org/abs/1411.4555):
  8. @jcoreyes jcoreyes revised this gist Sep 17, 2015. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion readme.md
    @@ -12,7 +12,7 @@ neon_version: v1.0.rc1

    neon_commit: 2169b093fbba0c189021a941d286c7a98c0c6c6c

    gist_id:
    gist_id:7e76e90664f935c6f65d

    ##Description
    The LSTM model is trained on the [flickr8k dataset](http://nlp.cs.illinois.edu/HockenmaierGroup/8k-pictures.html) using precomputed VGG features from http://cs.stanford.edu/people/karpathy/deepimagesent/. Model details can be found in the following [CVPR-2015 paper](http://arxiv.org/abs/1411.4555):
  9. @jcoreyes jcoreyes revised this gist Sep 17, 2015. 1 changed file with 2 additions and 2 deletions.
    4 changes: 2 additions & 2 deletions readme.md
    @@ -8,9 +8,9 @@ model_weights: https://s3-us-west-1.amazonaws.com/nervana-modelzoo/image_caption

    license:

    neon_version:
    neon_version: v1.0.rc1

    neon_commit:
    neon_commit: 2169b093fbba0c189021a941d286c7a98c0c6c6c

    gist_id:

  10. @jcoreyes jcoreyes revised this gist Sep 17, 2015. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion readme.md
    @@ -4,7 +4,7 @@ name: LSTM image captioning model based on CVPR 2015 paper "Show and tell: A neu

    model_file:

    model_weights:
    model_weights: https://s3-us-west-1.amazonaws.com/nervana-modelzoo/image_caption_flickr8k.p

    license:

  11. @jcoreyes jcoreyes revised this gist Sep 17, 2015. 1 changed file with 4 additions and 2 deletions.
    6 changes: 4 additions & 2 deletions readme.md
    @@ -23,11 +23,13 @@ The LSTM model is trained on the [flickr8k dataset](http://nlp.cs.illinois.edu/H


    ##Performance
    Using the bleu score evaluation script from https://raw.githubusercontent.com/karpathy/neuraltalk/master/eval/ and evaluating against 5 reference sentences the results are:
    For testing, the model is only given the image and must predict the next word until a stop token is predicted. Greedy search is currently used by just taking the max probable word each time. Using the bleu score evaluation script from https://raw.githubusercontent.com/karpathy/neuraltalk/master/eval/ and evaluating against 5 reference sentences the results are:

    | BLEU | Score |
    | ---- | ---- |
    | B-1 | 54.2 |
    | B-2 | 32.6 |
    | B-3 | 19.3 |
    | B-4 | 12.3 |
    | B-4 | 12.3 |

    A few things that were not implemented are beam search, l2 regularization, and ensembles. With these things, performance would be a bit better.
  12. @jcoreyes jcoreyes revised this gist Sep 16, 2015. No changes.
  13. @jcoreyes jcoreyes revised this gist Sep 16, 2015. 1 changed file with 6 additions and 3 deletions.
    9 changes: 6 additions & 3 deletions readme.md
    @@ -25,6 +25,9 @@ The LSTM model is trained on the [flickr8k dataset](http://nlp.cs.illinois.edu/H
    ##Performance
    Using the bleu score evaluation script from https://raw.githubusercontent.com/karpathy/neuraltalk/master/eval/ and evaluating against 5 reference sentences the results are:

    | BLEU |
    | ---- |
    BLEU = 54.2/32.6/19.3/12.3
    | BLEU | Score |
    | ---- | ---- |
    | B-1 | 54.2 |
    | B-2 | 32.6 |
    | B-3 | 19.3 |
    | B-4 | 12.3 |
  14. @jcoreyes jcoreyes revised this gist Sep 16, 2015. 1 changed file with 3 additions and 0 deletions.
    3 changes: 3 additions & 0 deletions readme.md
    @@ -24,4 +24,7 @@ The LSTM model is trained on the [flickr8k dataset](http://nlp.cs.illinois.edu/H

    ##Performance
    Using the bleu score evaluation script from https://raw.githubusercontent.com/karpathy/neuraltalk/master/eval/ and evaluating against 5 reference sentences the results are:

    | BLEU |
    | ---- |
    BLEU = 54.2/32.6/19.3/12.3
  15. @jcoreyes jcoreyes revised this gist Sep 16, 2015. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion readme.md
    @@ -23,5 +23,5 @@ The LSTM model is trained on the [flickr8k dataset](http://nlp.cs.illinois.edu/H


    ##Performance
    https://raw.githubusercontent.com/karpathy/neuraltalk/master/eval/
    Using the bleu score evaluation script from https://raw.githubusercontent.com/karpathy/neuraltalk/master/eval/ and evaluating against 5 reference sentences the results are:
    BLEU = 54.2/32.6/19.3/12.3
  16. @jcoreyes jcoreyes revised this gist Sep 16, 2015. 1 changed file with 4 additions and 2 deletions.
    6 changes: 4 additions & 2 deletions readme.md
    @@ -1,6 +1,6 @@
    ##Information

    name: LSTM image captioning model based on CVPR 2015 paper "Show and tell: A neural image caption generator".
    name: LSTM image captioning model based on CVPR 2015 paper "Show and tell: A neural image caption generator" and code from Karpathy's [NeuralTalk](https://github.com/karpathy/neuraltalk).

    model_file:

    @@ -22,4 +22,6 @@ The LSTM model is trained on the [flickr8k dataset](http://nlp.cs.illinois.edu/H
    CVPR, 2015 (arXiv ref. cs1411.4555)


    ##Performance
    ##Performance
    https://raw.githubusercontent.com/karpathy/neuraltalk/master/eval/
    BLEU = 54.2/32.6/19.3/12.3
  17. @jcoreyes jcoreyes revised this gist Sep 16, 2015. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion readme.md
    @@ -15,7 +15,7 @@ neon_commit:
    gist_id:

    ##Description
    The LSTM model is trained on the flickr8k, flickr30k, and coco datasets using precomputed VGG features from http://cs.stanford.edu/people/karpathy/deepimagesent/. Model details can be found in the following [CVPR-2015 paper](http://arxiv.org/abs/1411.4555):
    The LSTM model is trained on the [flickr8k dataset](http://nlp.cs.illinois.edu/HockenmaierGroup/8k-pictures.html) using precomputed VGG features from http://cs.stanford.edu/people/karpathy/deepimagesent/. Model details can be found in the following [CVPR-2015 paper](http://arxiv.org/abs/1411.4555):

    Show and tell: A neural image caption generator.
    O. Vinyals, A. Toshev, S. Bengio, and D. Erhan.
  18. @jcoreyes jcoreyes revised this gist Sep 16, 2015. No changes.
  19. @jcoreyes jcoreyes revised this gist Sep 16, 2015. 1 changed file with 3 additions and 2 deletions.
    5 changes: 3 additions & 2 deletions readme.md
    @@ -17,8 +17,9 @@ gist_id:
    ##Description
    The LSTM model is trained on the flickr8k, flickr30k, and coco datasets using precomputed VGG features from http://cs.stanford.edu/people/karpathy/deepimagesent/. Model details can be found in the following [CVPR-2015 paper](http://arxiv.org/abs/1411.4555):

    O. Vinyals, A. Toshev, S. Bengio, and D. Erhan. Show and
    tell: A neural image caption generator. CVPR, 2015 (arXiv ref. cs1411.4555)
    Show and tell: A neural image caption generator.
    O. Vinyals, A. Toshev, S. Bengio, and D. Erhan.
    CVPR, 2015 (arXiv ref. cs1411.4555)


    ##Performance
  20. @jcoreyes jcoreyes revised this gist Sep 16, 2015. 1 changed file with 2 additions and 2 deletions.
    4 changes: 2 additions & 2 deletions readme.md
    @@ -17,8 +17,8 @@ gist_id:
    ##Description
    The LSTM model is trained on the flickr8k, flickr30k, and coco datasets using precomputed VGG features from http://cs.stanford.edu/people/karpathy/deepimagesent/. Model details can be found in the following [CVPR-2015 paper](http://arxiv.org/abs/1411.4555):

    O. Vinyals, A. Toshev, S. Bengio, and D. Erhan. Show and
    tell: A neural image caption generator. CVPR, 2015 (arXiv ref. cs1411.4555)
    O. Vinyals, A. Toshev, S. Bengio, and D. Erhan. Show and
    tell: A neural image caption generator. CVPR, 2015 (arXiv ref. cs1411.4555)


    ##Performance
  21. @jcoreyes jcoreyes revised this gist Sep 16, 2015. 1 changed file with 8 additions and 0 deletions.
    8 changes: 8 additions & 0 deletions readme.md
    @@ -1,11 +1,19 @@
    ##Information

    name: LSTM image captioning model based on CVPR 2015 paper "Show and tell: A neural image caption generator".

    model_file:

    model_weights:

    license:

    neon_version:

    neon_commit:

    gist_id:

    ##Description
    The LSTM model is trained on the flickr8k, flickr30k, and coco datasets using precomputed VGG features from http://cs.stanford.edu/people/karpathy/deepimagesent/. Model details can be found in the following [CVPR-2015 paper](http://arxiv.org/abs/1411.4555):

  22. @jcoreyes jcoreyes revised this gist Sep 16, 2015. 1 changed file with 6 additions and 3 deletions.
    9 changes: 6 additions & 3 deletions readme.md
    @@ -1,13 +1,16 @@
    ##Information
    ---
    name:
    name: LSTM image captioning model based on CVPR 2015 paper "Show and tell: A neural image caption generator".
    model_file:
    model_weights:
    license:
    neon_version:
    neon_commit:
    gist_id:
    ---
    ##Description
    The LSTM model is trained on the flickr8k, flickr30k, and coco datasets using precomputed VGG features from http://cs.stanford.edu/people/karpathy/deepimagesent/. Model details can be found in the following [CVPR-2015 paper](http://arxiv.org/abs/1411.4555):

    O. Vinyals, A. Toshev, S. Bengio, and D. Erhan. Show and
    tell: A neural image caption generator. CVPR, 2015 (arXiv ref. cs1411.4555)


    ##Performance
  23. @jcoreyes jcoreyes created this gist Sep 16, 2015.
    13 changes: 13 additions & 0 deletions readme.md
    @@ -0,0 +1,13 @@
    ##Information
    ---
    name:
    model_file:
    model_weights:
    license:
    neon_version:
    neon_commit:
    gist_id:
    ---
    ##Description

    ##Performance
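
The Performance text in the revisions above says that, at test time, the model sees only the image and predicts one word at a time until a stop token appears, using greedy search (taking the most probable word at each step). A minimal sketch of that loop follows. It is not the neon or NeuralTalk implementation: `next_word_probs`, `toy_model`, the token ids, the probability table, and the `max_len` cap are all hypothetical stand-ins chosen for illustration.

```python
from typing import Callable, List, Sequence


def greedy_decode(
    next_word_probs: Callable[[List[int]], Sequence[float]],
    stop_token: int,
    max_len: int = 20,
) -> List[int]:
    """Repeatedly pick the most probable next word until the stop token.

    next_word_probs is a hypothetical stand-in for the image-conditioned
    LSTM: given the word ids emitted so far, it returns one probability
    per vocabulary entry.
    """
    caption: List[int] = []
    for _ in range(max_len):  # assumed safety cap on caption length
        probs = next_word_probs(caption)
        # Greedy step: index of the largest probability.
        word = max(range(len(probs)), key=probs.__getitem__)
        if word == stop_token:
            break
        caption.append(word)
    return caption


# Toy distribution table (invented): row i is the distribution used
# after i words have been emitted. Word id 0 plays the stop token.
TOY_TABLE = [
    [0.1, 0.7, 0.2],  # step 0: word 1 is most probable
    [0.1, 0.3, 0.6],  # step 1: word 2 is most probable
    [0.8, 0.1, 0.1],  # step 2: stop token is most probable
]


def toy_model(prefix: List[int]) -> Sequence[float]:
    return TOY_TABLE[min(len(prefix), len(TOY_TABLE) - 1)]


print(greedy_decode(toy_model, stop_token=0))  # [1, 2]
```

Beam search, which the readme lists as not implemented, would differ by keeping several candidate prefixes per step instead of the single best word.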