Created
January 25, 2019 16:07
Gist: rebeccabilbro/2c7bb4d1acfbcdcf9156e7b9b7577cba
For converting Python 2 pickles to Python 3
# kimchi.py
# For converting Python 2 pickles to Python 3

import os
import dill
import pickle
import argparse


def convert(old_pkl):
    """
    Convert a Python 2 pickle to Python 3
    """
    # Make a name for the new pickle
    new_pkl = os.path.splitext(os.path.basename(old_pkl))[0] + "_p3.pkl"

    # Convert Python 2 "ObjectType" to Python 3 object
    dill._dill._reverse_typemap["ObjectType"] = object

    # Open the pickle using latin1 encoding
    with open(old_pkl, "rb") as f:
        loaded = pickle.load(f, encoding="latin1")

    # Re-save as Python 3 pickle
    with open(new_pkl, "wb") as outfile:
        pickle.dump(loaded, outfile)


if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description="Convert a Python 2 pickle to Python 3"
    )
    parser.add_argument("infile", help="Python 2 pickle filename")
    args = parser.parse_args()
    convert(args.infile)
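To see why the latin1 step matters, here is a minimal end-to-end sketch that uses only the standard library. It is an illustration under assumptions, not the script itself. The bytes in `PY2_PICKLE` are hand-written to imitate what Python 2 would emit for a non-ASCII `str`. `convert_sketch` and `demo` are hypothetical names, the `out_dir` parameter is an addition (the script above writes into the current working directory), and the dill type mapping is left out.

```python
import os
import pickle
import tempfile

# Hand-written bytes imitating a Python 2 protocol-2 pickle of the str 'caf\xe9'
# (assumption for illustration: PROTO 2, SHORT_BINSTRING of length 4, BINPUT, STOP).
PY2_PICKLE = b"\x80\x02U\x04caf\xe9q\x00."


def convert_sketch(old_pkl, out_dir):
    """Hypothetical stand-in for convert(); omits the dill type mapping."""
    new_name = os.path.splitext(os.path.basename(old_pkl))[0] + "_p3.pkl"
    new_pkl = os.path.join(out_dir, new_name)
    # Python 2 byte strings are decoded to text using latin1
    with open(old_pkl, "rb") as f:
        loaded = pickle.load(f, encoding="latin1")
    # Re-save with the running interpreter's default protocol
    with open(new_pkl, "wb") as outfile:
        pickle.dump(loaded, outfile)
    return new_pkl


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        old_path = os.path.join(tmp, "model.pkl")
        with open(old_path, "wb") as f:
            f.write(PY2_PICKLE)

        # The default encoding ("ASCII") cannot decode the 0xe9 byte
        try:
            with open(old_path, "rb") as f:
                pickle.load(f)
            default_ok = True
        except UnicodeDecodeError:
            default_ok = False

        # After conversion the file loads with no encoding argument
        new_path = convert_sketch(old_path, tmp)
        with open(new_path, "rb") as f:
            value = pickle.load(f)
        return default_ok, os.path.basename(new_path), value


if __name__ == "__main__":
    print(demo())
```

Pickles of custom classes or NumPy arrays may need more than this; the sketch only covers the string-decoding part of the problem.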
In my case I have to convert a file saved with joblib (.jl). How can I adapt the script for joblib?
I have tried to map b'ObjectType', but it seems that is not enough:

import os
import joblib
import dill
dill._dill._reverse_typemap["ObjectType"] = object
DATA_PATH = '/root'
tfidf_vectorizer, _ = joblib.load(os.path.join(DATA_PATH, 'nmf_topic_model/tfidf_mpd.jl'))
nmf_model, _ = joblib.load(os.path.join(DATA_PATH, 'nmf_topic_model/nmf_mpd.jl'))

I have also tried patching the joblib file .../joblib/numpy_pickle.py so that the class NumpyUnpickler has an overridden load method like

def load(self):
    eggs = pickle.load(self.file_handle, encoding='latin1')
    return eggs

and I have put

import dill
dill._dill._reverse_typemap["ObjectType"] = object

Another attempt was to add the encoding directly in the NumpyUnpickler __init__:

def __init__(self, filename, file_handle, mmap_mode=None):
    Unpickler.__init__(self, self.file_handle, encoding="latin1")
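For comparison, this is how passing the encoding through the constructor behaves with the standard library's `pickle.Unpickler`. It is only a sketch of the stdlib behavior: `Latin1Unpickler` and `load_latin1` are hypothetical names, the input bytes are hand-written to imitate a Python 2 pickle, and nothing here shows that joblib's `NumpyUnpickler` accepts the same change, since that depends on joblib's internals and version.

```python
import io
import pickle

# Hand-written bytes imitating a Python 2 protocol-2 pickle of the str 'caf\xe9'
# (assumption for illustration only).
PY2_PICKLE = b"\x80\x02U\x04caf\xe9q\x00."


class Latin1Unpickler(pickle.Unpickler):
    """Hypothetical subclass that always decodes Python 2 str as latin1."""

    def __init__(self, file_handle):
        # encoding is a keyword-only argument of pickle.Unpickler
        super().__init__(file_handle, encoding="latin1")


def load_latin1(data):
    """Unpickle a bytes object through Latin1Unpickler."""
    return Latin1Unpickler(io.BytesIO(data)).load()


if __name__ == "__main__":
    print(load_latin1(PY2_PICKLE))
```

Setting the encoding in the constructor keeps `load()` itself untouched, so the unpickler's dispatch logic still runs; replacing `load()` with a plain `pickle.load` call would bypass any custom handling a subclass adds.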
Hi @nbecker; this gist is associated with a longer blog post, which may answer your question!