@sarkaraj (forked from tuxdna/LICENSE.txt, created April 25, 2017 13:57)
Most used or useful ML Algorithms

Ranking most-used or useful ML Algorithms

Generic Problem formulation

Finding an aggregate rank for the same items ranked by different people

Given several ranked lists of items from a particular domain, how do we aggregate the rankings into a single overall ranking? For example, suppose we have a list of ML algorithms:

1 SVM
2 LDA
3 KNN

and another list which says

1 KNN
2 LDA

and so on, with multiple such lists. How do we find an overall rank for the different ML algorithms?
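Concretely, the first step is to reduce such lists to (item, rank) counts. A minimal sketch in plain Python, using the two toy lists above:

```python
# Two toy ranked lists, as in the example above.
lists = [["SVM", "LDA", "KNN"], ["KNN", "LDA"]]

# counts[(item, rank)] = how many lists placed `item` at position `rank`
counts = {}
for ranking in lists:
    for pos, item in enumerate(ranking, start=1):
        counts[(item, pos)] = counts.get((item, pos), 0) + 1

print(counts[("LDA", 2)])  # LDA is ranked 2nd by both lists, so this prints 2
```

These counts become the edge weights of the bipartite graph described next.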

In this article I use maximum-weight bipartite matching with maximum cardinality to compute an aggregate ranking. The problem is formulated as a graph problem:

  • every item (LDA, SVM, KNN, etc.) is assigned a node,
  • every rank (1, 2, 3, ...) is assigned a node, and
  • we create an edge between an item and a rank, with a weight equal to the number of times that item received that particular rank.

With this graph, the problem above becomes a maximum-weight matching problem in a weighted bipartite graph.
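To make the matching step concrete, here is a small self-contained sketch that brute-forces the maximum-weight assignment for the two toy lists above, with their (item, rank) counts hard-coded as edge weights. (Brute force is only viable for a toy instance; the full script below delegates the matching to networkx instead.)

```python
from itertools import permutations

# Edge weights from the two toy lists: (item, rank) -> count.
# SVM ranked 1st once, LDA ranked 2nd twice, KNN ranked 3rd once and 1st once.
weight = {("SVM", 1): 1, ("LDA", 2): 2, ("KNN", 3): 1, ("KNN", 1): 1}
items = ["KNN", "LDA", "SVM"]

# Try every assignment of items to ranks 1..3 and keep the heaviest one.
best = max(permutations(items),
           key=lambda perm: sum(weight.get((it, r), 0)
                                for r, it in enumerate(perm, start=1)))
print(best)  # ('SVM', 'LDA', 'KNN') -- total weight 1 + 2 + 1 = 4
```

The heaviest assignment places SVM first, LDA second, and KNN third, which matches the intuition from the two input lists.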

How to run the code?

# On Fedora; the gist was originally written for Python 2 with networkx 1.x
sudo dnf install -y numpy python2-networkx
python algos.py

Most used/useful ML Algorithms

Here are the results:

Rank  1: Linear Regression
Rank  2: k-Means
Rank  3: SVM
Rank  4: Apriori
Rank  5: kNN
Rank  6: SVD
Rank  7: Decision Tree
Rank  8: Naive Bayes
Rank  9: ANN
Rank 10: Bayesian Networks
Rank 11: Logistic Regression
Rank 12: Boosting
Rank 13: Gaussian Processes
Rank 14: HDPs
Rank 15: Logit Boost
Rank 16: Model Tree
Rank 17: PLS
Rank 18: Random Forests
Rank 19: Ridge Regression

References

  • https://www.dezyre.com/article/top-10-machine-learning-algorithms/202
  • https://www.quora.com/What-are-the-top-10-data-mining-or-machine-learning-algorithms
  • https://bickson.blogspot.in/2011/06/what-are-most-widely-deployed-machine.html?spref=tw

algos.py

import itertools
import networkx as nx
import numpy as np
# https://www.dezyre.com/article/top-10-machine-learning-algorithms/202
algos_list1 = [
"Naive Bayes",
"k-Means",
"SVM",
"Apriori",
"Linear Regression",
"Logistic Regression",
"ANN",
"Random Forests",
"Decision Tree",
"kNN"
]
# https://www.quora.com/What-are-the-top-10-data-mining-or-machine-learning-algorithms
algos_list2 = [
"Decision Trees",
"k-Means",
"SVM",
"Apriori",
"Expectation Maximization",
"PageRank",
"AdaBoost",
"kNN",
"Naive Bayes",
]
algos_list3 = [
"Naive Bayes",
"k-Means",
"Kernel PCA",
"Linear Regression",
"kNN",
"NNMF",
"SVM",
"Dimensionality Reduction",
"SVD",
"Decision Tree",
"Bootstrapped SVM",
"Decision Tree",
"Gaussian Processes",
"Logistic Regression",
"Logit Boost",
"Model Tree",
"PLS",
"Random Forests",
"Ridge Regression",
]
algos_list4 = [
"Linear Regression",
"Logistic Regression",
"k-Means",
"SVM",
"Random Forests",
"SVD",
"Decision Tree",
"Naive Bayes",
"ANN",
"Bayesian Networks",
"Elastic Nets",
"LDA",
"Conditional Random Fields",
"HDPs"
]
algos_list6 = [
"SVM",
"ANN",
"Logistic Regression",
"Naive Bayes",
"KNN",
"Random Forests",
"Decision Tree",
"Bagged Tree"
]
algos_list7 = [
"Linear Regression",
"Logistic Regression",
"SVM",
"SVD",
"PCA",
"Kernel PCA",
"k-Means",
"Decision Trees",
"Random Forests",
"Neural Networks",
"Regularization",
"Boosting",
"Naive Bayes"
]
algos_list8 = [
"Convolution Neural Network",
"RNN",
"Deep Q-Learning",
"Deep Neural Network",
"Random Forests",
"Probabilistic Graphical Models",
"Compressed Sensing",
"Kernel Machines",
"Counter Factual Regret Minimization",
"Gaussian Processes",
]
algos_list9 = [
"k-Means",
"KNN",
"Dimensionality Reduction",
"PCA",
"Collaborative filtering",
"Association Rules",
"Logistic Regression",
"LDA",
"Shortest Path",
"PageRank"
]
algos_list10 = [
"Linear Regression",
"Logistic Regression",
"k-Means",
"SVM",
"Random Forests",
"SVD",
"Boosted Trees",
"Naive Bayes",
"ANN",
"AdaBoost"
]
algos_list11 = [
"Apriori",
"FPGrowth",
"GSP",
"PrefixSpan",
"k-Means",
"Regression Tree",
"Decision Tree",
"SVM",
"PageRank",
"Naive Bayes"
]
algos_list12 = [
"Decision Tree",
"Apriori",
"ANN",
"SVM",
"SOM",
"Genetic Algorithms",
"Naive Bayes",
"Ant Colony Optimization",
"Linear Regression"
]
# https://bickson.blogspot.in/2011/06/what-are-most-widely-deployed-machine.html?spref=tw
algos_list5 = [
"SVD",
"k-Means",
"Naive Bayes",
"Dirichlet clustering",
"Matrix Factorization",
"Frequent Pattern Mining",
"LDA",
"Expectation Maximization",
"SVM",
"Decision Tree",
"Logistic Regression",
"Random Forests"
]
all_algos = [
algos_list1,
algos_list2,
algos_list3,
algos_list4,
algos_list5,
algos_list6,
algos_list7,
algos_list8,
algos_list9,
algos_list10,
algos_list11,
algos_list12
]
items_list = list(set(itertools.chain(*all_algos)))
id2items_dict = {i+1: x for i,x in enumerate(items_list)}
items2id_dict = {x: i+1 for i,x in enumerate(items_list)}
N = len(items_list)
rank_counts = np.zeros(shape=(N, N))
for items in all_algos:
    for i, item in enumerate(items):
        rank = i + 1
        rank_counts[rank - 1, items2id_dict[item] - 1] += 1
G = nx.Graph()
for item in items_list:
    for i in range(N):
        rank = i + 1
        weight = rank_counts[rank - 1, items2id_dict[item] - 1]
        if weight > 0.0:
            # print((item, rank, {'weight': weight}))
            # weight= works on both networkx 1.x and 2.x (attr_dict was removed in 2.0)
            G.add_edge(item, rank, weight=weight)
matchings = nx.max_weight_matching(G, maxcardinality=True)
if not isinstance(matchings, dict):
    # networkx >= 2.0 returns a set of (u, v) pairs instead of a dict;
    # normalize so that matchings[rank] gives the matched item
    matchings = dict(p for pair in matchings for p in (pair, pair[::-1]))
for i in range(1, 20):
    print("Rank %2d: %s" % (i, matchings[i]))
License

Copyright 2016 Saleem Ansari <[email protected]>
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.