import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

seqs = ['gigantic_string', 'tiny_str', 'medium_str']

# make <pad> idx 0
vocab = ['<pad>'] + sorted(set(''.join(seqs)))

# make model
embed = nn.Embedding(len(vocab), 10).cuda()
lstm = nn.LSTM(10, 5).cuda()

vectorized_seqs = [[vocab.index(tok) for tok in seq] for seq in seqs]

# get the length of each seq in your batch
seq_lengths = torch.LongTensor([len(seq) for seq in vectorized_seqs]).cuda()

# dump padding everywhere, and place seqs on the left.
# NOTE: you only need a tensor as big as your longest sequence
seq_tensor = torch.zeros((len(vectorized_seqs), seq_lengths.max())).long().cuda()
for idx, (seq, seqlen) in enumerate(zip(vectorized_seqs, seq_lengths)):
    seq_tensor[idx, :seqlen] = torch.LongTensor(seq)

# SORT YOUR TENSORS BY LENGTH!
seq_lengths, perm_idx = seq_lengths.sort(0, descending=True)
seq_tensor = seq_tensor[perm_idx]

# utils.rnn lets you give (B, L, D) tensors, where B is the batch size and L is the max length,
# if you use batch_first=True. Otherwise, give (L, B, D) tensors.
seq_tensor = seq_tensor.transpose(0, 1)  # (B, L) -> (L, B)

# embed your sequences
seq_tensor = embed(seq_tensor)

# pack them up nicely
packed_input = pack_padded_sequence(seq_tensor, seq_lengths.cpu().numpy())

# throw them through your LSTM (remember to give batch_first=True here if you packed with it)
packed_output, (ht, ct) = lstm(packed_input)

# unpack your output if required
output, _ = pad_packed_sequence(packed_output)
print(output)

# Or if you just want the final hidden state?
print(ht[-1])

# REMEMBER: Your outputs are sorted. If you want the original ordering
# back (to compare to some gt labels), unsort them
_, unperm_idx = perm_idx.sort(0)
output = output[unperm_idx]
print(output)
Great demo code! So you don't need to bother with padding_idx of Embedding to ignore the zeros, because the packing does not even show them to the lstm?
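Right, packing drops the padded time steps before they ever reach the LSTM, so padding_idx isn't strictly needed here. For completeness, a minimal sketch of what padding_idx would buy you (the sizes below are just illustrative): it pins the <pad> row of the embedding to a zero vector and keeps it frozen during training.

import torch
import torch.nn as nn

# padding_idx=0 initializes the <pad> embedding to zeros and keeps its gradient at zero;
# with packing the LSTM never sees padded steps anyway, so this is optional belt-and-braces.
embed = nn.Embedding(num_embeddings=10, embedding_dim=4, padding_idx=0)
print(embed.weight[0])           # all zeros
print(embed(torch.tensor([0])))  # looking up <pad> returns the zero vector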
Very cool!
Helps a lot! Thanks!
Thanks!
Can we feed a (L,B,D)-ordered tensor to the embedding layer? The docs say the first dimension should be the mini-batch size.
@nikhiltitus You can. Embedding expects a (N,W) tensor, but it pulls out an embedding for each element anyway.
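To make that concrete, a tiny sketch (vocab size and shapes made up): Embedding looks up every index in the input, whatever its shape, and just appends the embedding dimension.

import torch
import torch.nn as nn

embed = nn.Embedding(8, 10)          # vocab of 8, embedding dim 10
idx = torch.randint(0, 8, (14, 3))   # (L, B) index tensor, i.e. sequence-major
out = embed(idx)                     # (L, B, 10): one embedding per index
print(out.shape)                     # torch.Size([14, 3, 10])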
Hi, I don't understand this part,
# throw them through your LSTM (remember to give batch_first=True here if you packed with it)
packed_output, (ht, ct) = lstm(packed_input)
I used packed_input = pack_padded_sequence(seq_tensor, seq_lengths.numpy(), batch_first=True), then I tried packed_output, (ht, ct) = lstm(packed_input, batch_first=True) and got
TypeError: forward() got an unexpected keyword argument 'batch_first'
Thanks.
@datduong
The batch_first argument is only used when constructing the LSTM; forward() doesn't take it.
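To make the difference concrete, here's a rough sketch of the batch_first=True variant (shapes and tensors below are made up, and the batch is assumed to be already sorted by length): batch_first is set when constructing the LSTM and when packing/unpacking, never passed to forward().

import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

embed = nn.Embedding(20, 10)
lstm = nn.LSTM(10, 5, batch_first=True)            # batch_first is fixed at construction time

seq_tensor = torch.zeros(3, 14, dtype=torch.long)  # (B, L) index tensor, sorted by length
seq_lengths = torch.tensor([14, 9, 6])

embedded = embed(seq_tensor)                                          # (B, L, 10)
packed = pack_padded_sequence(embedded, seq_lengths, batch_first=True)
packed_out, (ht, ct) = lstm(packed)                                   # no batch_first kwarg here
out, out_lengths = pad_packed_sequence(packed_out, batch_first=True)  # out is (B, L, 5)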
I ran this and got the following error in Python 3...
TypeError: torch.cuda.LongTensor constructor received an invalid combination of arguments - got (map), but expected one of...
The fix was to change line 24
seq_lengths = torch.cuda.LongTensor(map(len, vectorized_seqs))
to
seq_lengths = torch.cuda.LongTensor(list(map(len, vectorized_seqs)))
I guess it's because map() returns a lazy iterator in Python 3 instead of a list.
seq_lengths = torch.LongTensor([len(seq) for seq in vectorized_seqs]) also works
Great demo, very helpful. I also used this approach in my own work. Thanks
It is really helpful!! Thanks very much!!
Thank you!
Great understanding
Agree that line 24 should be changed
@ngarneau Your demo was really helpful. Thank you very much !!
# REMEMBER: Your outputs are sorted. If you want the original ordering
# back (to compare to some gt labels) unsort them
_, unperm_idx = perm_idx.sort(0)
output = output[unperm_idx]
print (output)
If you want to get the original ordering back, you should first add output = output.transpose(1, 0);
otherwise you end up indexing the time dimension instead of the batch dimension of output (and the index can go out of bounds when the batch is larger than the longest sequence).
Since perm_idx is obtained from the lengths, should we use the following code to reverse the permutation?
output = output.transpose(0, 1) # L x B x D -> B x L x D
hidden = hidden.transpose(0, 1)
output = output[unperm_idx]
hidden = hidden[unperm_idx]
Just like @DarryO and @icesuns said, if you want the original ordering, transpose output first.
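A small self-contained sketch of the same point (shapes and permutation made up): the batch axis of the (L, B, D) output is dim 1, so you can either transpose first or index dim 1 directly; both restore the original sample order.

import torch

L, B, D = 7, 3, 5
output = torch.randn(L, B, D)        # stand-in for the unpacked, length-sorted output
perm_idx = torch.tensor([2, 0, 1])   # how the batch was sorted by length

_, unperm_idx = perm_idx.sort(0)
restored = output.transpose(0, 1)[unperm_idx]   # (B, L, D): transpose, then unsort the batch
same = output[:, unperm_idx].transpose(0, 1)    # (B, L, D): unsort dim 1 directly, then transpose
assert torch.equal(restored, same)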
Since PyTorch 1.1.0, sorting the sequences by their lengths is no longer needed: pytorch/pytorch#15225.
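For reference, a minimal sketch of the 1.1.0+ behaviour (shapes made up): passing enforce_sorted=False lets pack_padded_sequence sort internally and keeps the results in the original batch order, so no manual perm_idx bookkeeping is needed.

import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

lstm = nn.LSTM(10, 5, batch_first=True)
padded = torch.randn(3, 14, 10)      # (B, L, D) padded batch, in any order
lengths = torch.tensor([6, 14, 9])   # unsorted lengths are fine since 1.1.0

packed = pack_padded_sequence(padded, lengths, batch_first=True, enforce_sorted=False)
packed_out, (ht, ct) = lstm(packed)
out, out_lengths = pad_packed_sequence(packed_out, batch_first=True)  # original batch order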
As an exercise, I tried to replicate this and the version by @HarshTrivedi, maybe it would be useful to someone (although I recommend the two mentioned above more): https://gist.github.com/MikulasZelinka/9fce4ed47ae74fca454e88a39f8d911a (also includes a very basic Dataset and DataLoader example).
Great code!
But I wonder why PyTorch doesn't just provide a util function that takes a list of variable-length sequences and outputs a padded & packed variable...
It's quite complicated as it stands, though. (See the sketch below.)
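For what it's worth, torch.nn.utils.rnn gets fairly close to that these days: pad_sequence and pack_sequence both accept a plain Python list of variable-length tensors. A rough sketch (feature size made up):

import torch
from torch.nn.utils.rnn import pad_sequence, pack_sequence

seqs = [torch.randn(5, 10), torch.randn(3, 10), torch.randn(8, 10)]  # variable lengths, D=10

padded = pad_sequence(seqs, batch_first=True)       # (3, 8, 10) padded tensor
packed = pack_sequence(seqs, enforce_sorted=False)  # PackedSequence, ready to feed to an RNN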