@sergeyprokudin
Last active October 28, 2025 12:31
Multivariate Gaussian Negative Log-Likelihood Loss (Keras)
import keras.backend as K
import numpy as np


def gaussian_nll(ytrue, ypreds):
    """Keras implementation of the multivariate Gaussian negative log-likelihood loss function.

    This implementation assumes a diagonal covariance matrix.

    Parameters
    ----------
    ytrue: tf.tensor of shape [n_samples, n_dims]
        ground truth values
    ypreds: tf.tensor of shape [n_samples, n_dims*2]
        predicted mu and logsigma values (e.g. by your neural network)

    Returns
    -------
    neg_log_likelihood: float
        negative log-likelihood averaged over samples

    This loss can then be used as a target loss for any Keras model, e.g.:

        model.compile(loss=gaussian_nll, optimizer='Adam')
    """
    n_dims = int(int(ypreds.shape[1]) / 2)
    mu = ypreds[:, 0:n_dims]
    logsigma = ypreds[:, n_dims:]

    # squared Mahalanobis distance term (diagonal covariance)
    mse = -0.5 * K.sum(K.square((ytrue - mu) / K.exp(logsigma)), axis=1)
    # log-determinant term: -sum_i log(sigma_i)
    sigma_trace = -K.sum(logsigma, axis=1)
    # constant normalization term: -(n/2) * log(2*pi)
    log2pi = -0.5 * n_dims * np.log(2 * np.pi)

    log_likelihood = mse + sigma_trace + log2pi

    return K.mean(-log_likelihood)
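
As a usage illustration (a minimal sketch, not part of the original gist; n_dims, the input shape, and the layer sizes below are hypothetical): the network must output 2*n_dims values per sample, the first half interpreted as mu and the second half as logsigma.

from keras.layers import Dense, Input
from keras.models import Model

n_dims = 3                             # hypothetical target dimensionality

x_input = Input(shape=(16,))           # hypothetical feature size
h = Dense(64, activation='relu')(x_input)
out = Dense(2 * n_dims)(h)             # first n_dims: mu, last n_dims: logsigma (linear activation)

model = Model(x_input, out)
model.compile(loss=gaussian_nll, optimizer='Adam')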
@gledsonmelotti

Hello @sergeyprokudin, how are you? In fact I would like to use a Gaussian layer in my classification model, which could calculate mean and variance. Is this possible? Thank you very much.

Quoting the earlier exchange from the email thread:

@gledsonmelotti: Hello sergeyprokudin, thank you very much. I have another doubt. Don't you use softmax to predict multiple classes? Can I do it with softmax and softplus?

mean = model.add(Dense(n_outputs, activation='softmax'))
sigma = model.add(Dense(n_outputs, activation='softplus'))
model = Model(x_input, output([mean, sigma]))

@sergeyprokudin: I'm afraid you are confusing regression and classification tasks. If you are interested in classification, you don't need the Gaussian negative log-likelihood loss defined in this gist - you can use the standard categorical crossentropy loss (https://www.tensorflow.org/api_docs/python/tf/keras/losses/CategoricalCrossentropy) and softmax activations to get valid class probabilities that sum to 1. You don't need to model sigmas separately, as (in theory) your softmax outputs already provide you with confidence estimates. In practice, however, you might want to calibrate them (see https://arxiv.org/abs/1706.04599 for a discussion of the topic).
@sergeyprokudin

The Gaussian distribution is defined over a continuous domain, while in classification you typically want to model the parameters of some categorical distribution. What would be the implied interpretation of the mean and variance in your case?

@gledsonmelotti

Yes, I now understand your explanation. In this case, could I consider the softmax output to be the mean?
Best regards.

@sergeyprokudin

The class with the maximum probability is the mode of the corresponding categorical distribution, not its mean, which is undefined in this case. Hope this helps!

@gledsonmelotti

Okay, now I understand. My doubts have been clarified. Thank you very much for the information.
Best regards.
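
To put the advice from this exchange into code (a hedged sketch, not from the original thread; the class count, input shape, and layer sizes are hypothetical): for multiclass classification, a single softmax head trained with categorical crossentropy already produces valid class probabilities, so no separate sigma head is needed.

from keras.layers import Dense, Input
from keras.models import Model

n_classes = 10                         # hypothetical number of classes

x_input = Input(shape=(32,))           # hypothetical feature size
h = Dense(64, activation='relu')(x_input)
probs = Dense(n_classes, activation='softmax')(h)  # class probabilities summing to 1

model = Model(x_input, probs)
model.compile(loss='categorical_crossentropy', optimizer='Adam')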

@aangius

aangius commented Jun 4, 2021

Hi,
Why do you use a sum in this piece of code?

    sigma_trace = -K.sum(logsigma, axis=1)
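
For context (this derivation is my addition, not a reply from the original thread): with a diagonal covariance the multivariate density factorizes over dimensions, so the log-determinant of the covariance becomes a sum of per-dimension log standard deviations:

\log \mathcal{N}(y \mid \mu, \Sigma)
  = -\tfrac{1}{2}\,(y-\mu)^{\top}\Sigma^{-1}(y-\mu)
    - \tfrac{1}{2}\log\det\Sigma
    - \tfrac{n}{2}\log 2\pi,
\qquad \Sigma = \operatorname{diag}(\sigma_1^2,\dots,\sigma_n^2)

-\tfrac{1}{2}\log\det\Sigma
  = -\tfrac{1}{2}\sum_{i=1}^{n}\log\sigma_i^{2}
  = -\sum_{i=1}^{n}\log\sigma_i

The determinant of a diagonal matrix is the product of its entries; taking the log turns that product into exactly the per-dimension sum computed by sigma_trace = -K.sum(logsigma, axis=1).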

@lingleong981130

Hi, may I know how to solve this error?
"ValueError: Dimensions must be equal, but are 128 and 64 for '{{node gaussian_nll/sub}} = Sub[T=DT_FLOAT](Cast, gaussian_nll/strided_slice)' with input shapes: [?,128,128,3], [?,64,128,3]."

@13512525

Hello, I'd like to ask whether the value used here is the logarithm of the variance obtained directly from the network. How do you design your variance-prediction network? Do you take the logarithm of the variance in the code after the prediction?
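
A common design choice (my assumption, not necessarily the author's): the last layer of the network outputs logsigma directly through a linear (unconstrained) activation, so no logarithm is taken afterwards; the loss exponentiates it internally via K.exp(logsigma). Note also that logsigma in this gist is the log standard deviation, not the log variance. A hypothetical two-head setup:

from keras.layers import Concatenate, Dense, Input
from keras.models import Model

n_dims = 3                              # hypothetical target dimensionality

x_input = Input(shape=(16,))            # hypothetical feature size
h = Dense(64, activation='relu')(x_input)
mu = Dense(n_dims)(h)                   # unconstrained mean head
logsigma = Dense(n_dims)(h)             # linear head: predicts log(sigma) directly, may be negative

out = Concatenate(axis=1)([mu, logsigma])  # [mu, logsigma] layout expected by gaussian_nll
model = Model(x_input, out)
model.compile(loss=gaussian_nll, optimizer='Adam')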
