@vishwanath79
Last active August 7, 2020 12:17

Revisions

  1. vishwanath79 revised this gist Aug 7, 2020. 1 changed file with 60 additions and 32736 deletions.
    32,796 changes: 60 additions & 32,736 deletions TFDV_datavalidation.ipynb
    60 additions, 32,736 deletions not shown because the diff is too large. Please use a local Git client to view these changes.
  2. vishwanath79 revised this gist Aug 7, 2020. 1 changed file with 11 additions and 2 deletions.
    13 changes: 11 additions & 2 deletions TFDV_datavalidation.ipynb
    @@ -34624,10 +34624,19 @@
     },
     {
     "cell_type": "code",
    -"execution_count": null,
    +"execution_count": 74,
     "metadata": {},
     "outputs": [],
    -"source": []
    +"source": [
    +"#Save schema\n",
    +"import os\n",
    +"from tensorflow.python.lib.io import file_io\n",
    +"from google.protobuf import text_format\n",
    +"OUTPUT_DIR = \"\"\n",
    +"file_io.recursive_create_dir(OUTPUT_DIR)\n",
    +"schema_file = os.path.join(OUTPUT_DIR, 'schema.pbtxt')\n",
    +"tfdv.write_schema_text(schema, schema_file)"
    +]
     }
     ],
     "metadata": {
  3. vishwanath79 revised this gist Aug 7, 2020. 1 changed file with 89 additions and 32777 deletions.
    32,866 changes: 89 additions & 32,777 deletions TFDV_datavalidation.ipynb
    89 additions, 32,777 deletions not shown because the diff is too large. Please use a local Git client to view these changes.
  4. vishwanath79 revised this gist Aug 7, 2020. 1 changed file with 28 additions and 8 deletions.
    36 changes: 28 additions & 8 deletions TFDV_datavalidation.ipynb
    @@ -10,6 +10,15 @@
     "warnings.filterwarnings('ignore')"
     ]
     },
    +{
    +"cell_type": "code",
    +"execution_count": null,
    +"metadata": {},
    +"outputs": [],
    +"source": [
    +"#Dataset location: https://www.kaggle.com/eswarchandt/amazon-music-reviews?select=Musical_instruments_reviews.csv"
    +]
    +},
     {
     "cell_type": "code",
     "execution_count": 56,
    @@ -67091,6 +67100,23 @@
     "skew_anomalies"
     ]
     },
    +{
    +"cell_type": "code",
    +"execution_count": null,
    +"outputs": [],
    +"source": [
    +"#Drift comparator\n",
    +"tfdv.get_feature(schema,'helpful').drift_comparator.infinity_norm.threshold = 0.01\n",
    +"drift_anomalies = tfdv.validate_statistics(statistics=TRAIN,schema=schema,previous_statistics=TRAIN)\n",
    +"drift_anomalies"
    +],
    +"metadata": {
    +"collapsed": false,
    +"pycharm": {
    +"name": "#%%\n"
    +}
    +}
    +},
     {
     "cell_type": "code",
     "execution_count": 72,
    @@ -67282,13 +67308,7 @@
     "output_type": "execute_result"
     }
     ],
    -"source": [
    -"#Drift comparator\n",
    -"\n",
    -"tfdv.get_feature(schema,'helpful').drift_comparator.infinity_norm.threshold = 0.01\n",
    -"drift_anomalies = tfdv.validate_statistics(statistics=TRAIN,schema=schema,previous_statistics=TRAIN)\n",
    -"drift_anomalies"
    -]
    +"source": []
     },
     {
     "cell_type": "code",
    @@ -67319,4 +67339,4 @@
     },
     "nbformat": 4,
     "nbformat_minor": 1
    -}
    +}
  5. vishwanath79 revised this gist Aug 6, 2020. 1 changed file with 67322 additions and 1 deletion.
    67,323 changes: 67,322 additions & 1 deletion TFDV_datavalidation.ipynb
    67,322 additions, 1 deletion not shown because the diff is too large. Please use a local Git client to view these changes.
  6. vishwanath79 renamed this gist Aug 6, 2020. 1 changed file with 0 additions and 0 deletions.
    File renamed without changes.
  7. vishwanath79 created this gist Aug 6, 2020.
    1 change: 1 addition & 0 deletions TFDV_datavalidation
    @@ -0,0 +1 @@
    +import tfdv
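
Note on the drift comparator cell from revision 4: it sets an L-infinity threshold of 0.01 on the 'helpful' feature and then passes TRAIN as both statistics and previous_statistics. Comparing a dataset against itself gives a distance of zero, so that call cannot report drift. The sketch below illustrates the idea with the standard library only. It is not TFDV's implementation; it assumes 'helpful' is treated as a categorical feature, and the function name linf_distance and the two sample lists are hypothetical.

```python
from collections import Counter

# Threshold taken from the gist's drift comparator cell.
THRESHOLD = 0.01


def linf_distance(previous, current):
    """L-infinity distance between the normalized value counts of two samples."""
    prev_counts, curr_counts = Counter(previous), Counter(current)
    prev_total = sum(prev_counts.values())
    curr_total = sum(curr_counts.values())
    if prev_total == 0 or curr_total == 0:
        raise ValueError("both samples must be non-empty")
    # Compare the relative frequency of every value seen in either sample.
    values = set(prev_counts) | set(curr_counts)
    return max(
        abs(prev_counts[v] / prev_total - curr_counts[v] / curr_total)
        for v in values
    )


# Hypothetical samples standing in for two spans of the 'helpful' column.
previous_span = ["[0, 0]"] * 50 + ["[1, 1]"] * 50
current_span = ["[0, 0]"] * 80 + ["[1, 1]"] * 20

same_span_distance = linf_distance(previous_span, previous_span)
new_span_distance = linf_distance(previous_span, current_span)

# A span compared with itself never exceeds the threshold; a shifted span can.
same_span_drifts = same_span_distance > THRESHOLD
new_span_drifts = new_span_distance > THRESHOLD
```

For the check to be informative in the notebook, previous_statistics would need to come from a different span of data (for example an earlier day's statistics) than the one passed as statistics.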