@snehalnair
Created July 5, 2020 09:41
def get_mat_sparsity(ratings):
    """Print the sparsity of the user-by-movie ratings matrix of a Spark DataFrame."""
    # Numerator: total number of ratings (filled cells) in the dataset
    count_nonzero = ratings.select("rating").count()
    # Denominator: distinct userIds times distinct movieIds (all possible cells)
    total_elements = ratings.select("userId").distinct().count() * ratings.select("movieId").distinct().count()
    # Sparsity is the percentage of cells that hold no rating
    sparsity = (1.0 - (count_nonzero * 1.0) / total_elements) * 100
    print("The ratings dataframe is %.2f%% sparse." % sparsity)


# `ratings` is assumed to be an existing Spark DataFrame with userId, movieId and rating columns
get_mat_sparsity(ratings)
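
The same calculation can be checked by hand on a tiny dataset without Spark. The sketch below is illustrative only: the function name `sparsity_percent`, the tuple layout, and the toy data are assumptions rather than part of the original gist, and it assumes each (user, movie) pair appears at most once.

```python
def sparsity_percent(ratings):
    """Return the percentage of empty cells in the user-by-movie matrix.

    `ratings` is a list of (user_id, movie_id, rating) tuples
    (a hypothetical input format chosen for this example).
    """
    users = {user for user, _, _ in ratings}
    movies = {movie for _, movie, _ in ratings}
    # All possible cells: distinct users times distinct movies
    total_elements = len(users) * len(movies)
    if total_elements == 0:
        raise ValueError("ratings must contain at least one row")
    # Fraction of cells without a rating, expressed as a percentage
    return (1.0 - len(ratings) / total_elements) * 100


# Toy data: 3 users x 3 movies = 9 cells, 4 of them rated
toy_ratings = [(1, 10, 4.0), (1, 20, 3.5), (2, 10, 5.0), (3, 30, 2.0)]
print("The toy ratings are %.2f%% sparse." % sparsity_percent(toy_ratings))
# prints "The toy ratings are 55.56% sparse."
```

With 4 ratings out of 9 possible cells, the result is (1 - 4/9) * 100, or about 55.56%.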