# Bag of Tricks for Efficient Text Classification

## Introduction

* Introduces fastText, a simple and highly efficient approach for text classification.
* On par with deep learning models in terms of accuracy, while being orders of magnitude faster to train and evaluate.
* [Link to the paper](http://arxiv.org/abs/1607.01759v3)
* [Link to code](https://github.com/facebookresearch/fastText)

## Architecture

* Built on top of linear models with a rank constraint and a fast loss approximation.
* Starts with word representations that are averaged into a text representation, which is then fed to a linear classifier (see the sketch after this list).
* The text representation can be thought of as a hidden state that is shared among features and classes.
* A softmax layer produces a probability distribution over the pre-defined classes.
* Computing the full softmax has complexity *O(kh)*, where *k* is the number of classes and *h* is the dimension of the text representation.
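
A minimal NumPy sketch of this pipeline: average word embeddings into a text representation, then apply a linear classifier and softmax. All sizes and token ids below are illustrative, not from the paper.

```python
import numpy as np

# Illustrative sizes, not from the paper.
vocab_size, embed_dim, num_classes = 10_000, 10, 4

rng = np.random.default_rng(0)
E = rng.normal(scale=0.1, size=(vocab_size, embed_dim))   # word embeddings (low-rank factor)
W = rng.normal(scale=0.1, size=(embed_dim, num_classes))  # linear classifier weights
b = np.zeros(num_classes)

def predict_proba(token_ids):
    """Class probabilities for one document given its token ids."""
    h = E[token_ids].mean(axis=0)        # averaged text representation (the shared hidden state)
    logits = h @ W + b                   # full softmax costs O(k * h)
    z = np.exp(logits - logits.max())    # numerically stable softmax
    return z / z.sum()

print(predict_proba([12, 7, 993, 4821]))  # a toy document of four token ids
```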

### Hierarchical Softmax

* Based on a Huffman coding tree built from class frequencies.
* Reduces the complexity of computing a class probability from *O(kh)* to *O(h log(k))*, since each class corresponds to a root-to-leaf path of binary decisions (a sketch follows this list).
* At test time, the top *T* targets can be retrieved at a cost of *O(log(T))* using a binary heap while traversing the tree.
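
A toy sketch of hierarchical softmax over a Huffman tree (illustrative, not the paper's implementation): each internal node holds a binary classifier, and a class's probability is the product of branch probabilities along its path, so scoring one class costs *O(h log(k))* rather than *O(kh)*.

```python
import heapq
import numpy as np

def build_paths(class_freqs):
    """Huffman-code the classes; return {class: [(internal_node_id, sign), ...]}."""
    heap = [(f, i, ('leaf', i)) for i, f in enumerate(class_freqs)]
    heapq.heapify(heap)
    tiebreak, node_id = len(class_freqs), 0
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # merge the two least frequent subtrees
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, tiebreak, ('node', node_id, left, right)))
        tiebreak, node_id = tiebreak + 1, node_id + 1
    paths = {}
    def walk(tree, path):
        if tree[0] == 'leaf':
            paths[tree[1]] = path
        else:
            _, nid, left, right = tree
            walk(left, path + [(nid, +1.0)])    # +1: left branch
            walk(right, path + [(nid, -1.0)])   # -1: right branch
    walk(heap[0][2], [])
    return paths, node_id                       # k - 1 internal nodes

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
embed_dim = 10
class_freqs = [50, 30, 12, 8]                   # illustrative class counts
paths, n_nodes = build_paths(class_freqs)
V = rng.normal(scale=0.1, size=(n_nodes, embed_dim))  # one vector per internal node

def class_prob(h, c):
    """P(class c | text representation h): a product of O(log k) binary decisions."""
    p = 1.0
    for nid, sign in paths[c]:
        p *= sigmoid(sign * (V[nid] @ h))
    return p

h = rng.normal(size=embed_dim)
print(sum(class_prob(h, c) for c in range(len(class_freqs))))  # sums to 1.0
```

Because sigmoid(x) + sigmoid(-x) = 1 at every internal node, the leaf probabilities form a valid distribution over classes by construction.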

### N-gram Features

* Instead of modelling word order explicitly, uses a bag of n-grams as additional features to capture partial local word order, maintaining efficiency without a large loss in accuracy.
* Uses the [hashing trick](https://arxiv.org/pdf/0902.2206.pdf) for a fast and memory-efficient mapping of n-grams to embedding indices (sketched below).
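
A sketch of the hashing trick for bigram features, with made-up bucket counts (fastText's actual hash function and bin sizes differ): each bigram is hashed into a fixed number of buckets that index extra rows of the embedding matrix, so no explicit bigram vocabulary needs to be stored.

```python
# Illustrative bucket count; in practice millions of bins are used.
VOCAB_SIZE, NUM_BUCKETS = 10_000, 1_000_000

def bigram_ids(token_ids, vocab_size=VOCAB_SIZE, num_buckets=NUM_BUCKETS):
    """Hash each adjacent token pair into a fixed range of embedding rows."""
    return [vocab_size + hash((a, b)) % num_buckets      # offset past the unigram rows
            for a, b in zip(token_ids, token_ids[1:])]

# The returned ids are appended to a document's unigram ids before averaging.
print(bigram_ids([12, 7, 993, 4821]))
```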

## Experiments

### Sentiment Analysis

* fastText benefits from using bigram features.
* Outperforms [char-CNN](http://arxiv.org/abs/1502.01710v5) and [char-CRNN](http://arxiv.org/abs/1602.00367v1), and performs slightly worse than [VDCNN](http://arxiv.org/abs/1606.01781v1).
* Orders of magnitude faster in terms of training time.
* Note: fastText does not use pre-trained word embeddings.

### Tag Prediction

* fastText with bigrams outperforms [Tagspace](http://emnlp2014.org/papers/pdf/EMNLP2014194.pdf).
* fastText is up to 600 times faster at test time.