{
 "metadata": {
  "name": ""
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {
     "cell_type": "heading",
     "level": 1,
     "metadata": {},
     "source": [
      "JellyFish"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Author: Rory Creedon\n",
      "\n",
      "Date: 28 November 2013\n",
      "\n",
      "Purpose: To examine the possibility of using the Python JellyFish library for complex \"fuzzy\" string matching."
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "**CONTENTS**:\n",
      "\n",
      "+ **Introduction** - Outlining the Problem\n",
      "+ **JellyFish and Its Algorithms** - Brief introduction to JellyFish\n",
      "+ **Levenshtein Distance** - Description, Calculation, and Examples\n",
      "+ **Damerau-Levenshtein Distance** - Description, Calculation, and Examples\n",
      "+ **Jaro Distance** - Description, Calculation, and Examples\n",
      "+ **Jaro-Winkler Distance** - Description, Calculation, and Examples\n",
      "+ **Match Rating Approach Comparison** - Brief Description\n",
      "+ **Hamming Distance** - Brief Description\n",
      "+ **Experimenting with the Measures**\n",
      "+ **Conclusion**"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import pandas as pd\n",
      "import numpy as np\n",
      "import datetime\n",
      "import os\n",
      "from pandas import DataFrame\n",
      "from numpy import nan as NA\n",
      "from IPython.core.display import HTML\n",
      "from IPython.core.display import Image\n",
      "from IPython.display import Math\n",
      "from IPython.display import Latex\n",
      "import collections\n",
      "import jellyfish as jf\n",
      "import re"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 1
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "Introduction"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "With regard to style names in production data, we face two distinct problems:\n",
      "\n",
      "1. Style names not being clean within factory.\n",
      "