@martinapugliese
Last active August 17, 2016 20:53
Revisions

  1. Martina Pugliese renamed this gist Aug 17, 2016. 1 changed file with 0 additions and 0 deletions.
    File renamed without changes.
  2. Martina Pugliese revised this gist Aug 12, 2016. 1 changed file with 0 additions and 1 deletion.
    1 change: 0 additions & 1 deletion nltk_helpers.py
    @@ -1,4 +1,3 @@
    -# Helper functions for NLTK
     # Copyright (C) 2016 Martina Pugliese


  3. Martina Pugliese revised this gist Aug 12, 2016. 1 changed file with 1 addition and 0 deletions.
    1 change: 1 addition & 0 deletions nltk_helpers.py
    @@ -1,4 +1,5 @@
     # Helper functions for NLTK
    +# Copyright (C) 2016 Martina Pugliese


     def plot_freqdist_freq(fd,
  4. Martina Pugliese revised this gist Aug 12, 2016. 1 changed file with 14 additions and 3 deletions.
    17 changes: 14 additions & 3 deletions nltk_helpers.py
    @@ -2,19 +2,30 @@


     def plot_freqdist_freq(fd,
    +                       max_num=None,
                            cumulative=False,
                            title='Frequency plot',
                            linewidth=2):
         """
         As of NLTK version 3.2.1, FreqDist.plot() plots the counts and has no kwarg for normalising to frequency. Work this around here.
    -    INPUT: the FreqDist object; OUTPUT: plot the freq and return None.
    +    INPUT:
    +        - the FreqDist object
    +        - max_num: if specified, only plot up to this number of items (they are already sorted descending by the FreqDist)
    +        - cumulative: bool (defaults to False)
    +        - title: the title to give the plot
    +        - linewidth: the width of line to use (defaults to 2)
    +    OUTPUT: plot the freq and return None.
         """

         tmp = fd.copy()
         norm = fd.N()
         for key in tmp.keys():
             tmp[key] = float(fd[key]) / norm

    -    tmp.plot(cumulative=cumulative, title=title, linewidth=linewidth)
    +    if max_num:
    +        tmp.plot(max_num, cumulative=cumulative,
    +                 title=title, linewidth=linewidth)
    +    else:
    +        tmp.plot(cumulative=cumulative, title=title, linewidth=linewidth)

    -    return
    +    return
  5. Martina Pugliese created this gist Aug 12, 2016.
    20 changes: 20 additions & 0 deletions nltk_helpers.py
    @@ -0,0 +1,20 @@
    +# Helper functions for NLTK
    +
    +
    +def plot_freqdist_freq(fd,
    +                       cumulative=False,
    +                       title='Frequency plot',
    +                       linewidth=2):
    +    """
    +    As of NLTK version 3.2.1, FreqDist.plot() plots the counts and has no kwarg for normalising to frequency. Work this around here.
    +    INPUT: the FreqDist object; OUTPUT: plot the freq and return None.
    +    """
    +
    +    tmp = fd.copy()
    +    norm = fd.N()
    +    for key in tmp.keys():
    +        tmp[key] = float(fd[key]) / norm
    +
    +    tmp.plot(cumulative=cumulative, title=title, linewidth=linewidth)
    +
    +    return
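
The normalisation step inside `plot_freqdist_freq` can be tried out without plotting anything. What follows is a minimal sketch, not the gist's code: it assumes a plain `collections.Counter` as a stand-in for the `FreqDist` (which subclasses `Counter`), and the helper name `relative_frequencies` is hypothetical, not part of the gist or of NLTK. Like the gist, it divides by the total over all items first and only then truncates to `max_num`.

```python
from collections import Counter
from itertools import accumulate


def relative_frequencies(counts, max_num=None, cumulative=False):
    """Return (item, frequency) pairs, sorted by descending count.

    counts     -- a Counter (stand-in for an NLTK FreqDist; an assumption)
    max_num    -- if given, keep only the max_num most common items
    cumulative -- if True, return running sums of the frequencies
    """
    # Total number of observations, the analogue of fd.N() in the gist.
    total = sum(counts.values())
    if total == 0:
        return []

    # most_common(None) returns every item, most frequent first.
    top = counts.most_common(max_num)
    items = [item for item, _ in top]

    # Normalise against the full total, not just the truncated items.
    freqs = [count / total for _, count in top]
    if cumulative:
        freqs = list(accumulate(freqs))

    return list(zip(items, freqs))


if __name__ == "__main__":
    words = ["a", "a", "a", "b", "b", "c"]
    for item, freq in relative_frequencies(Counter(words)):
        print(item, freq)
```

For a single item, `FreqDist.freq(sample)` returns the same count-over-total ratio, so the sketch is mainly useful when the whole normalised distribution is needed at once.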