Last active February 22, 2024 22:13
Revisions
- 
        kylemcdonald renamed this gist Aug 23, 2015. 1 changed file with 0 additions and 0 deletions. File renamed without changes.
- 
        kylemcdonald revised this gist Aug 23, 2015. 1 changed file with 139 additions and 104 deletions.
        @@ -1,40 +1,46 @@ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "First we load the GoogleNews word2vec vectors." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import numpy as np\n", "from gensim.models import Word2Vec\n", "model = Word2Vec.load_word2vec_format('models/GoogleNews-vectors-negative300.bin', binary=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we are going to load the antonym pairs from a text file, but there are some duplicates. So we need a function to remove them." 
] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "[{'happy', 'sad'}, {'joyful', 'sad'}]" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } @@ -43,8 +49,7 @@ "def unique_pairs(pairs):\n", " unique = []\n", " for x, y in pairs:\n", " pair = set([x, y])\n", " if not pair in unique:\n", " unique.append(pair)\n", " return unique\n", @@ -54,42 +59,55 @@ }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [], "source": [ "antonyms = unique_pairs(np.genfromtxt('antonyms.txt', dtype='str', delimiter=','))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Instead of using `most_similar` directly, we'll make a small wrapper that does analogies of the form `x:y::a:b`" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "man : king :: woman : queen\n" ] } ], "source": [ "def analogy(x, y, a):\n", " b = model.most_similar(positive=[a, y], negative=[x], topn=1)[0][0]\n", " return b, ' '.join([x,':',y,'::',a,':',b])\n", "b, text = analogy('man', 'king', 'woman')\n", "print(text)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then we can do an analogy with every antonym pair to find different kinds of \"opposites\" for a target word. Not all of them make sense, and there is a lot of repetition, so we only print the unique results." 
] }, { "cell_type": "code", "execution_count": 22, "metadata": { "collapsed": false }, @@ -98,75 +116,102 @@ "name": "stdout", "output_type": "stream", "text": [ "presence : absence :: happy : unhappy\n", "absence : presence :: happy : proud\n", "abundant : scarce :: happy : glad\n", "refuse : accept :: happy : satisfied\n", "accurate : inaccurate :: happy : disappointed\n", "admit : deny :: happy : delighted\n", "never : always :: happy : Said_Hirschbeck\n", "modern : ancient :: happy : ecstatic\n", "receded : approached :: happy : excited\n", "departure : arrival :: happy : overjoyed\n", "ascend : descend :: happy : anxious\n", "asleep : awake :: happy : enthused\n", "attractive : repulsive :: happy : disgusting\n", "forward : backward :: happy : sorry\n", "backward : forward :: happy : pleased\n", "ugly : beautiful :: happy : wonderful\n", "beginning : ending :: happy : happier\n", "bent : straight :: happy : consecutive\n", "worst : best :: happy : thrilled\n", "better : worse :: happy : sad\n", "bitter : sweet :: happy : nice\n", "curse : bless :: happy : thankful\n", "bless : curse :: happy : jinx\n", "bright : dull :: happy : boring\n", "troubled : calm :: happy : relaxed\n", "vague : clear :: happy : sure\n", "simple : cunning :: happy : envious\n", "indefinite : definite :: happy : definitely\n", "hope : despair :: happy : despondent\n", "dismal : cheerful :: happy : sociable\n", "cheerful : dismal :: happy : disappointing\n", "waste : economise :: happy : chuffed\n", "full : empty :: happy : dejected\n", "excited : calm :: happy : quiet\n", "contract : expand :: happy : broaden\n", "expand : contract :: happy : contracts\n", "famous : unknown :: happy : unsure\n", "powerful : feeble :: happy : pitiful\n", "found : lost :: happy : losing\n", "friend : enemy :: happy : confident\n", "mean : generous :: happy : grateful\n", "generous : mean :: happy : anymore\n", "rough : gentle :: happy : contented\n", "great : minute :: happy : minutes\n", "minute : great 
:: happy : good\n", "happy : sad :: happy : saddening\n", "hasten : dawdle :: happy : comfortable\n", "proud : humble :: happy : cheerful\n", "immense : tiny :: happy : bummed\n", "free : imprison :: happy : shocked_Gosper\n", "join : separate :: happy : seperate\n", "subject : king :: happy : kings\n", "large : little :: happy : bit\n", "little : large :: happy : sizeable\n", "cry : laugh :: happy : chuckle\n", "find : loss :: happy : losses\n", "loud : soft :: happy : softer\n", "me : you :: happy : 'll\n", "new : old :: happy : passenger_Bill_Zuhoski\n", "even : odd :: happy : strange\n", "offer : refuse :: happy : afraid\n", "impatient : patient :: happy : patients\n", "displease : please :: happy : Please\n", "unsightly : pretty :: happy : very\n", "regularly : irregularly :: happy : elated\n", "nonsense : sense :: happy : feel\n", "singular : plural :: happy : okay\n", "solid : liquid :: happy : glassful\n", "voluntary : compulsory :: happy : hapy\n" ] } ], "source": [ "target = 'happy'\n", "found = set()\n", "for x, y in antonyms:\n", " b, text = analogy(x, y, target)\n", " if b not in found:\n", " found.add(b)\n", " print(text)\n", " b, text = analogy(y, x, target) \n", " if b not in found:\n", " found.add(b)\n", " print(text)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For a visualization, we save the vectors and analogies. The similarity metric between these vectors should be cosine distance rather than Euclidean distance." ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "collapsed": false }, @@ -175,25 +220,15 @@ "vectors = []\n", "labels = []\n", "for x, y in antonyms:\n", " xv = model[x]\n", " yv = model[y]\n", " d = (+1*xv + -1*yv) / 2 # weighted average\n", " labels.append(x + ':' + y)\n", " vectors.append(+d)\n", " labels.append(y + ':' + x)\n", " vectors.append(-d)\n", "np.savetxt('vectors.tsv', vectors, fmt='%.8f', delimiter='\\t')\n", "np.savetxt('words.txt', labels, fmt='%s')" ] } ], 
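The notebook's last markdown cell recommends cosine distance over Euclidean distance for the saved antonym direction vectors. The sketch below illustrates why, using only the standard library. The helper names `cosine_distance` and `euclidean_distance` and the 3-dimensional toy vectors are assumptions made for illustration; they are not part of the gist, and the real vectors have 300 dimensions.

```python
import math

def cosine_distance(u, v):
    """Return 1 - cos(angle between u and v); 0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def euclidean_distance(u, v):
    """Straight-line distance between u and v."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Hypothetical toy stand-ins for model[x] and model[y].
xv = [1.0, 2.0, 0.5]
yv = [0.5, -1.0, 2.0]

# Same construction as the notebook: half the difference of the pair.
d = [(a - b) / 2 for a, b in zip(xv, yv)]
longer = [10 * c for c in d]   # same direction, ten times the length
flipped = [-c for c in d]      # the reversed pair, "y:x"

print(cosine_distance(d, longer))     # direction unchanged, so near zero
print(cosine_distance(d, flipped))    # opposite direction, so near the maximum
print(euclidean_distance(d, longer))  # dominated by the length difference
```

Cosine distance ignores vector length and compares direction only, which matches the idea that each saved vector encodes a direction from one word toward its antonym.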
- 
        kylemcdonald revised this gist Aug 23, 2015. 1 changed file with 0 additions and 0 deletions. Binary file not shown.
- 
        kylemcdonald created this gist Aug 22, 2015.
        @@ -0,0 +1,221 @@ { "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Using gpu device 0: GeForce GT 750M\n" ] } ], "source": [ "import numpy as np\n", "from gensim.models import Word2Vec\n", "model = Word2Vec.load_word2vec_format('models/GoogleNews-vectors-negative300.bin', binary=True)" ] }, { "cell_type": "code", "execution_count": 41, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "[['happy', 'sad'], ['joyful', 'sad']]" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def unique_pairs(pairs):\n", " unique = []\n", " for x, y in pairs:\n", " pair = [x, y]\n", " pair.sort()\n", " if not pair in unique:\n", " unique.append(pair)\n", " return unique\n", "x = [['sad','happy'],['happy','sad'],['sad','joyful']]\n", "unique_pairs(x)" ] }, { "cell_type": "code", "execution_count": 42, "metadata": { "collapsed": false }, "outputs": [], "source": [ "antonyms = unique_pairs(np.genfromtxt('antonyms/antonyms.txt', dtype='str', delimiter=','))" ] }, { "cell_type": "code", "execution_count": 43, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "'queen'" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def analogy(x, y, a):\n", " return model.most_similar(positive=[a, y], negative=[x], topn=1)[0][0]\n", "analogy('man', 'king', 'woman')" ] }, { "cell_type": "code", "execution_count": 48, "metadata": { "collapsed": false }, 
"outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "absence : presence :: sad : proud\n", "abundant : scarce :: sad : saddening\n", "alive : dead :: sad : Sad\n", "ally : enemy :: sad : horrible\n", "always : never :: sad : sorry\n", "artificial : natural :: sad : saddened\n", "attention : inattention :: sad : reminders_bobbing\n", "attractive : repulsive :: sad : disgusting\n", "bad : good :: sad : wonderful\n", "beautiful : ugly :: sad : disheartening\n", "beginning : ending :: sad : heartbreaking\n", "bend : straighten :: sad : saddest\n", "bitter : sweet :: sad : lovely\n", "bless : curse :: sad : depressing\n", "bravery : cowardice :: sad : shameful\n", "bright : dull :: sad : boring\n", "calm : troubled :: sad : saddens_me\n", "captivity : freedom :: sad : freedoms\n", "cheap : expensive :: sad : distressing\n", "close : distant :: sad : sadder\n", "cruel : kind :: sad : really\n", "despair : hope :: sad : sincerely_hope\n", "cheerful : dismal :: sad : disappointing\n", "difficult : easy :: sad : Aaaaah\n", "economise : waste :: sad : hazardous_waste\n", "empty : full :: sad : glad\n", "calm : excited :: sad : thrilled\n", "contract : expand :: sad : broaden\n", "famous : unknown :: sad : unkown\n", "feeble : powerful :: sad : bittersweet\n", "fresh : stale :: sad : sadly\n", "enemy : friend :: sad : dear_friend\n", "generous : mean :: sad : anymore\n", "glad : sorry :: sad : regretful\n", "great : minute :: sad : minutes\n", "guilty : innocent :: sad : unfortunate\n", "hard : soft :: sad : sorrowful\n", "coward : hero :: sad : heroes\n", "dishonest : honest :: sad : happy\n", "genuine : imitation :: sad : ironic\n", "immense : tiny :: sad : teeny_tiny\n", "large : little :: sad : bit\n", "cry : laugh :: sad : funny\n", "dislike : like :: sad : Turkoman_Shiites\n", "merry : mirthless :: sad : unbearably_sad\n", "new : old :: sad : senselessly_murdered\n", "even : odd :: sad : strange\n", "impatient : patient :: sad : patients\n", "displease : 
please :: sad : Please\n", "pretty : unsightly :: sad : eyesore\n", "poor : rich :: sad : Meny_Friedman\n", "nonsense : sense :: sad : feel\n", "joy : sorrow :: sad : tragic\n" ] } ], "source": [ "word = 'sad'\n", "found = set()\n", "for x, y in antonyms:\n", " antonym = analogy(x, y, word)\n", " if antonym not in found:\n", " found.add(antonym)\n", " print(x,':',y,'::',word,':',antonym)" ] }, { "cell_type": "code", "execution_count": 61, "metadata": { "collapsed": false }, "outputs": [], "source": [ "vectors = []\n", "labels = []\n", "for x, y in antonyms:\n", " xv = raw_vector(model, x)\n", " yv = raw_vector(model, y)\n", " d = xv - yv\n", " labels.append(x + ':' + y)\n", " vectors.append(+d)\n", " labels.append(y + ':' + x)\n", " vectors.append(-d)" ] }, { "cell_type": "code", "execution_count": 62, "metadata": { "collapsed": false }, "outputs": [], "source": [ "np.savetxt('antonyms/vectors', vectors, fmt='%.8f', delimiter='\\t')\n", "np.savetxt('antonyms/words', labels, fmt='%s')" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.4.3" } }, "nbformat": 4, "nbformat_minor": 0 }
@@ -0,0 +1,272 @@ absence,presence abundant,scarce accept,refuse accurate,inaccurate admit,deny advance,retreat advantage,disadvantage agree,disagree alive,dead ally,enemy always,never ancient,modern answer,question approached,receded approval,disapproval arrival,departure artificial,natural ascend,descend asleep,awake attack,defense attention,inattention attractive,repulsive backward,forward bad,good beautiful,ugly beginning,ending below,above bend,straighten bent,straight best,worst better,worse big,small bitter,sweet blame,praise bless,curse blunt,sharp bold,timid borrow,lend bravery,cowardice bright,dull broad,narrow build,destroy calm,troubled capable,incapable captivity,freedom careful,rush cellar,attic cheap,expensive clear,vague clever,stupid clockwise,counterclockwise close,distant cold,hot combine,separate come,go comfort,discomfort common,rare conceal,reveal correct,incorrect courage,cowardice courteous,rude cruel,kind cunning,simple dainty,clumsy danger,safety dark,light decrease,increase deep,shallow definite,indefinite demand,supply despair,hope disappear,appear discourage,encourage disease,health dismal,cheerful doctor,patient dry,wet dull,bright dusk,dawn early,late East,West easy,difficult ebb,flow economise,waste employer,employee empty,full encourage,discourage end,beginning entrance,exit excited,calm expand,contract export,import exterior,interior external,internal fail,succeed false,true famous,unknown fast,slow fat,thin feeble,powerful few,many find,lose first,last fold,unfold foolish,wise forelegs,hind_legs forget,remember fortunate,unfortunate found,lost frank,secretive freedom,captivity frequent,seldom fresh,stale friend,enemy full,empty gather,distribute generous,mean gentle,rough giant,dwarf glad,sorry gloomy,cheerful granted,refused great,minute guardian,ward guest,host guilty,innocent happy,sad hard,soft harmful,harmless 
hasten,dawdle hate,love healthy,unhealthy heavy,light height,depth here,there hero,coward hill,valley hinder,help honest,dishonest horizontal,vertical humble,proud hunger,thirst imitation,genuine immense,tiny imprison,free include,exclude increase,decrease inferior,superior inhabited,uninhabited inhale,exhale inside,outside intelligent,unintelligent intentional,accidental interesting,uninteresting interior,exterior internal,external join,separate junior,senior justice,injustice king,subject knowledge,ignorance land,sea landlord,tenant large,little last,first laugh,cry lawful,unlawful lawyer,client lazy,industrious leader,follower lecturer,student left,right lender,borrower lengthen,shorten less,more light,dark like,dislike likely,unlikely little,large lofty,lowly long,short loss,find loud,soft low,high loyal,disloyal mad,sane magnetize,demagnetize master,servant mature,immature maximum,minimum me,you merry,mirthless minority,majority miser,spendthrift misunderstand,understand narrow,wide near,far neat,untidy new,old night,day noisy,quiet North,South obedient,disobedient odd,even offer,refuse open,shut optimist,pessimist out,in parent,child past,present patient,impatient peace,war permanent,temporary please,displease plentiful,scarce poetry,prose polite,impolite possible,impossible poverty,wealth powerful,feeble pretty,unsightly private,public prudent,imprudent pure,impure qualified,unqualified rapid,slow regularly,irregularly rich,poor right,wrong rigid,pliable rough,smooth satisfactory,unsatisfactory scatter,collect secondhand,new security,insecurity sense,nonsense serious,trivial shopkeeper,customer simple,complicated singular,plural slim,thick sober,drunk solid,liquid sorrow,joy sour,sweet sow,reap speaker,listener stand,lie straight,crooked strong,weak success,failure sunny,cloudy take,give tall,short tame,wild teacher,pupil thick,thin tight,slack top,bottom transparent,opaque truth,lie up,down vacant,occupied valuable,valueless victory,defeat virtue,vice 
visible,invisible voluntary,compulsory vowel,consonant wax,wane wisdom,folly within,without
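The word list above contains several pairs in both orders (for example `bright,dull` and `dull,bright`), which is what the notebook's `unique_pairs` removes. A minimal alternative sketch, assuming hashable `frozenset` keys are acceptable: it keeps the first-seen order of each pair and avoids the repeated list scan. The `seen` set and the inline `sample` text are introduced here for illustration; the sample lines are copied from the list above rather than read from `antonyms.txt`.

```python
def unique_pairs(pairs):
    """Drop pairs that repeat an earlier pair in either order."""
    seen = set()
    unique = []
    for x, y in pairs:
        key = frozenset((x, y))  # order-insensitive and hashable
        if key not in seen:
            seen.add(key)
            unique.append((x, y))  # keep the order of first appearance
    return unique

# A few lines taken from the antonym list, including one reversed duplicate.
sample = """bright,dull
calm,troubled
dull,bright
excited,calm"""

pairs = [tuple(line.split(',')) for line in sample.splitlines()]
antonyms = unique_pairs(pairs)
print(antonyms)
```

Returning tuples instead of sets keeps the original direction of each pair, which matters later because `analogy(x, y, target)` and `analogy(y, x, target)` give different results.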