Revisions
mdml revised this gist
Jan 7, 2014 . 3 changed files with 30 additions and 30 deletions.
@@ -1,3 +1,3 @@
A dendrogram is a common way to represent hierarchical data. For Python users, [SciPy](http://www.scipy.org/) has a [hierarchical clustering module](http://docs.scipy.org/doc/scipy/reference/cluster.hierarchy.html) that performs hierarchical clustering and outputs the results as dendrogram plots via matplotlib. When it's time to make a prettier, more customized, or web version of the dendrogram, however, it can be tricky to use SciPy's dendrogram to create a suitable visualization.

My preferred method of visualizing data -- especially on the web -- is [D3](http://d3js.org/). This example includes a script to convert a SciPy dendrogram into the JSON format used by D3's [`cluster`](https://github.com/mbostock/d3/wiki/Cluster-Layout) method. In the example, I cluster six genes by their expression values from two experiments. You can easily replace that data with your own, larger data set, to harness the power of both SciPy and D3 for analyzing hierarchical data. The D3 code I used to generate this example is straight from Mike Bostock's [dendrogram example](http://bl.ocks.org/mbostock/4063570): just replace the JSON file `flare.json` with your own (and maybe tweak the width/height).
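To make the target format concrete, here is a minimal sketch of the nested shape that the D3 page in this gist reads: every node has a `name` and a `children` list, and leaves have an empty list. The helper `make_node` and the toy tree are illustrative assumptions, not part of the original script.

```python
import json


def make_node(name, children=None):
    """Return one node in the nested name/children shape (hypothetical helper)."""
    return {"name": name, "children": children or []}


# Toy hierarchy: two leaves merged into one cluster under a root node.
tree = make_node("Root1", [make_node("a-e", [make_node("a"), make_node("e")])])

# sort_keys and indent mirror the formatting of the JSON shown in this gist.
text = json.dumps(tree, sort_keys=True, indent=4)
print(text)
```

Any tree built this way can be written to a `.json` file and loaded by the D3 page in place of `flare.json`.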
@@ -4,50 +4,50 @@

                "children": [
                    {
                        "children": [],
                        "name": "f"
                    },
                    {
                        "children": [
                            {
                                "children": [],
                                "name": "b"
                            },
                            {
                                "children": [
                                    {
                                        "children": [
                                            {
                                                "children": [],
                                                "name": "c"
                                            },
                                            {
                                                "children": [],
                                                "name": "d"
                                            }
                                        ],
                                        "name": "c-d"
                                    },
                                    {
                                        "children": [
                                            {
                                                "children": [],
                                                "name": "a"
                                            },
                                            {
                                                "children": [],
                                                "name": "e"
                                            }
                                        ],
                                        "name": "a-e"
                                    }
                                ],
                                "name": "a-c-d-e"
                            }
                        ],
                        "name": "a-b-c-d-e"
                    }
                ],
                "name": "a-b-c-d-e-f"
            }
        ],
        "name": "Root1"

@@ -8,23 +8,23 @@

    import json
    import matplotlib.pyplot as plt

    # Example data: gene expression
    geneExp = {'genes' : ['a', 'b', 'c', 'd', 'e', 'f'],
               'exp1': [-2.2, 5.6, 0.9, -0.23, -3, 0.1],
               'exp2': [5.4, -0.5, 2.33, 3.1, 4.1, -3.2]
              }
    df = pd.DataFrame( geneExp )

    # Determine distances (default is Euclidean)
    dataMatrix = np.array( df[['exp1', 'exp2']] )
    distMat = scipy.spatial.distance.pdist( dataMatrix )

    # Cluster hierarchically using scipy
    clusters = scipy.cluster.hierarchy.linkage(distMat, method='single')
    T = scipy.cluster.hierarchy.to_tree( clusters , rd=False )

    # Create dictionary for labeling nodes by their IDs
    labels = list(df.genes)
    id2name = dict(zip(range(len(labels)), labels))

    # Draw dendrogram using matplotlib to scipy-dendrogram.pdf
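As a rough illustration of what the distance step produces, the sketch below recomputes the pairwise Euclidean distances for the six genes with only the standard library. It uses the same pair order as a condensed distance vector (every pair `(i, j)` with `i < j`, row by row). The helper name `condensed_distances` is an assumption made for this example; the script itself uses `scipy.spatial.distance.pdist`.

```python
import math
from itertools import combinations

genes = ['a', 'b', 'c', 'd', 'e', 'f']
exp1 = [-2.2, 5.6, 0.9, -0.23, -3, 0.1]
exp2 = [5.4, -0.5, 2.33, 3.1, 4.1, -3.2]
points = list(zip(exp1, exp2))


def condensed_distances(pts):
    """Euclidean distance for every pair (i, j) with i < j, in row-major order."""
    return [math.hypot(p[0] - q[0], p[1] - q[1]) for p, q in combinations(pts, 2)]


dists = condensed_distances(points)
pairs = list(combinations(genes, 2))

# The closest pair is the first merge that single-linkage clustering makes.
closest = min(zip(dists, pairs))
print(len(dists), "distances; closest pair:", closest[1])
```

For this data the closest pair is `c` and `d`, which is why `c-d` is the innermost cluster in the JSON above.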
mdml revised this gist
Jan 7, 2014 . 3 changed files with 70 additions and 1 deletion.
@@ -0,0 +1,69 @@

    <!DOCTYPE html>
    <meta charset="utf-8">
    <style>

    .node circle {
      fill: #fff;
      stroke: steelblue;
      stroke-width: 1.5px;
    }

    .node {
      font: 10px sans-serif;
    }

    .link {
      fill: none;
      stroke: #ccc;
      stroke-width: 1.5px;
    }

    </style>
    <body>
    <script src="http://d3js.org/d3.v3.min.js"></script>
    <script>

    var width = 800,
        height = 550;

    var cluster = d3.layout.cluster()
        .size([height, width - 160]);

    var diagonal = d3.svg.diagonal()
        .projection(function(d) { return [d.y, d.x]; });

    var svg = d3.select("body").append("svg")
        .attr("width", width)
        .attr("height", height)
      .append("g")
        .attr("transform", "translate(40,0)");

    d3.json("d3-dendrogram.json", function(error, root) {
      var nodes = cluster.nodes(root),
          links = cluster.links(nodes);

      var link = svg.selectAll(".link")
          .data(links)
        .enter().append("path")
          .attr("class", "link")
          .attr("d", diagonal);

      var node = svg.selectAll(".node")
          .data(nodes)
        .enter().append("g")
          .attr("class", "node")
          .attr("transform", function(d) { return "translate(" + d.y + "," + d.x + ")"; })

      node.append("circle")
          .attr("r", 4.5);

      node.append("text")
          .attr("dx", function(d) { return d.children ? -8 : 8; })
          .attr("dy", 3)
          .style("text-anchor", function(d) { return d.children ? "end" : "start"; })
          .text(function(d) { return d.name; });
    });

    d3.select(self.frameElement).style("height", height + "px");

    </script>
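The page above first flattens the nested JSON into a list of nodes (`cluster.nodes`) and a list of parent-child links (`cluster.links`) before drawing anything. A small Python analogue of that flattening step can be handy for sanity-checking a JSON file before loading it in the browser. This is only a sketch: the function `flatten` and the toy tree are assumptions for illustration, and it does not compute any layout coordinates.

```python
def flatten(root):
    """Return (nodes, links) for a nested name/children tree, parents first."""
    nodes, links = [], []
    stack = [root]
    while stack:
        node = stack.pop()
        nodes.append(node["name"])
        for child in node.get("children", []):
            # One link per parent-child pair, stored as (parent, child) names.
            links.append((node["name"], child["name"]))
            stack.append(child)
    return nodes, links


# Toy tree in the same shape as d3-dendrogram.json.
tree = {"name": "Root1", "children": [
    {"name": "c-d", "children": [
        {"name": "c", "children": []},
        {"name": "d", "children": []},
    ]},
]}

nodes, links = flatten(tree)
print(len(nodes), "nodes,", len(links), "links")
```

A well-formed tree always has exactly one fewer link than it has nodes, which makes for a quick check on generated files.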
mdml revised this gist
Nov 19, 2013 . No changes.
mdml revised this gist
Nov 19, 2013 . 1 changed file with 1 addition and 1 deletion.
@@ -1,3 +1,3 @@
A dendrogram is a common way to represent hierarchical data. For Python users, [SciPy](http://www.scipy.org/) has a [hierarchical clustering module](http://docs.scipy.org/doc/scipy/reference/cluster.hierarchy.html) that performs hierarchical clustering and outputs the results as dendrogram plots via matplotlib. When it's time to make a prettier, more customized, or web version of the dendrogram, however, it can be tricky to use SciPy's dendrogram to create a suitable visualization.

My preferred method of visualizing data -- especially on the web -- is [D3](http://d3js.org/). This example includes a script to convert a SciPy dendrogram into the JSON format used by D3's [`cluster`](https://github.com/mbostock/d3/wiki/Cluster-Layout) method. In the example, I cluster six genes by their expression values from two experiments. You can easily replace that data with your own, larger data set, to harness the power of both SciPy and D3 for analyzing hierarchical data. The D3 code I used to generate this example is straight from Mike Bostock's [dendrogram example](http://bl.ocks.org/mbostock/4063570): just replace the JSON file `flare.json` with your own.
mdml revised this gist
Nov 19, 2013 . 1 changed file with 20 additions and 20 deletions.
mdml revised this gist
Nov 19, 2013 . 1 changed file with 9 additions and 9 deletions.
mdml revised this gist
Nov 19, 2013 . 1 changed file with 0 additions and 0 deletions.
mdml revised this gist
Nov 19, 2013 . 1 changed file with 1 addition and 1 deletion.
@@ -1,3 +1,3 @@
A dendrogram is a common way to represent hierarchical data. For Python users, [SciPy](http://www.scipy.org/) has a [hierarchical clustering module](http://docs.scipy.org/doc/scipy/reference/cluster.hierarchy.html) that performs hierarchical clustering and outputs the results as dendrogram plots via matplotlib. When it's time to make a prettier, more customized, or web version of the dendrogram, however, it can be tricky to use SciPy's dendrogram to create a suitable visualization.

My preferred method of visualizing data -- especially on the web -- is [D3](http://d3js.org/). This example includes a script to convert a SciPy dendrogram into the JSON format used by D3's [`cluster`](https://github.com/mbostock/d3/wiki/Cluster-Layout) method. In the example, I cluster six genes by their expression values from two experiments. You can easily replace that data with your own, larger data set, to harness the power of both SciPy and D3 for analyzing hierarchical data. The D3 code I used to generate this example is straight from Mike Bostock's [dendrogram example](http://bl.ocks.org/mbostock/4063570): just replace the JSON file `flare.json` with your own.
mdml renamed this gist
Nov 19, 2013 . 1 changed file with 0 additions and 0 deletions.
File renamed without changes.
mdml revised this gist
Nov 19, 2013 . 1 changed file with 3 additions and 0 deletions.
@@ -0,0 +1,3 @@
A dendrogram is a common way to represent hierarchical data. For Python users, [SciPy](http://www.scipy.org/) has a [hierarchical clustering module](http://docs.scipy.org/doc/scipy/reference/cluster.hierarchy.html) that both performs hierarchical clustering and outputs the results as dendrogram plots via matplotlib. When it's time to make a prettier, more customized, or web version of the dendrogram, however, it can be tricky to convert the SciPy dendrogram into a suitable format.

My preferred method of visualizing data -- especially on the web -- is [D3](http://d3js.org/). This example includes a script to convert a SciPy dendrogram into the JSON format used by D3's [`cluster`](https://github.com/mbostock/d3/wiki/Cluster-Layout) method. In the example, I cluster six genes by their expression values from two experiments. You can easily replace that data with your own, larger data set, to harness the power of both SciPy and D3 for analyzing hierarchical data. The D3 code I used to generate this example is straight from Mike Bostock's [dendrogram example](http://bl.ocks.org/mbostock/4063570): just replace the JSON file `flare.json` with your own.
mdml created this gist
Nov 18, 2013
@@ -0,0 +1,54 @@

    {
        "children": [
            {
                "children": [
                    {
                        "children": [],
                        "name": "B"
                    },
                    {
                        "children": [
                            {
                                "children": [
                                    {
                                        "children": [],
                                        "name": "A"
                                    },
                                    {
                                        "children": [],
                                        "name": "C"
                                    }
                                ],
                                "name": "A-C"
                            },
                            {
                                "children": [
                                    {
                                        "children": [],
                                        "name": "F"
                                    },
                                    {
                                        "children": [
                                            {
                                                "children": [],
                                                "name": "D"
                                            },
                                            {
                                                "children": [],
                                                "name": "E"
                                            }
                                        ],
                                        "name": "D-E"
                                    }
                                ],
                                "name": "D-E-F"
                            }
                        ],
                        "name": "A-C-D-E-F"
                    }
                ],
                "name": "A-B-C-D-E-F"
            }
        ],
        "name": "Root1"
    }

@@ -0,0 +1,70 @@

    #!/usr/bin/python

    # Load required modules
    from functools import reduce  # a builtin in Python 2, needs this import in Python 3
    import pandas as pd
    import scipy.spatial
    import scipy.cluster
    import numpy as np
    import json
    import matplotlib.pyplot as plt

    # Create test data
    d = { 'employee' : ['A', 'B', 'C', 'D', 'E', 'F'],
          'skillX': [2,8,3,6,8,10],
          'skillY': [8,15,6,9,7,10] }
    d1 = pd.DataFrame(d)

    # Determine distances (default is Euclidean)
    distMat = scipy.spatial.distance.pdist(np.array(d1[['skillX', 'skillY']]))

    # Cluster hierarchically using scipy
    clusters = scipy.cluster.hierarchy.linkage(distMat, method='single')
    T = scipy.cluster.hierarchy.to_tree( clusters , rd=False )

    # Create dictionary for labeling nodes by their IDs
    labels = list(d1.employee)
    id2name = dict(zip(range(len(labels)), labels))

    # Draw dendrogram using matplotlib and save it to scipy-dendrogram.png
    scipy.cluster.hierarchy.dendrogram(clusters, labels=labels, orientation='right')
    plt.savefig("scipy-dendrogram.png")

    # Create a nested dictionary from the ClusterNode objects returned by SciPy
    def add_node(node, parent ):
        # First create the new node and append it to its parent's children
        newNode = dict( node_id=node.id, children=[] )
        parent["children"].append( newNode )

        # Recursively add the current node's children
        if node.left: add_node( node.left, newNode )
        if node.right: add_node( node.right, newNode )

    # Initialize nested dictionary for d3, then recursively iterate through tree
    d3Dendro = dict(children=[], name="Root1")
    add_node( T, d3Dendro )

    # Label each node with the names of each leaf in its subtree
    def label_tree( n ):
        # If the node is a leaf, then we have its name
        if len(n["children"]) == 0:
            leafNames = [ id2name[n["node_id"]] ]
        # If not, flatten all the leaves in the node's subtree
        else:
            leafNames = reduce(lambda ls, c: ls + label_tree(c), n["children"], [])

        # Delete the node id since we don't need it anymore and
        # it makes for cleaner JSON
        del n["node_id"]

        # Labeling convention: "-"-separated leaf names
        n["name"] = "-".join(sorted(map(str, leafNames)))
        return leafNames

    label_tree( d3Dendro["children"][0] )

    # Output to JSON (the with block closes the file once it is written)
    with open("d3-dendrogram.json", "w") as f:
        json.dump(d3Dendro, f, sort_keys=True, indent=4)
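The script above walks the `ClusterNode` tree returned by `to_tree`. As an alternative sketch (not the approach the script uses), the same nested dictionary can be built straight from the linkage matrix. This relies on SciPy's documented convention that row `k` of the matrix merges the clusters with ids `row[0]` and `row[1]` into a new cluster with id `n + k`, where `n` is the number of leaves. The function name `linkage_to_d3` and the toy matrix below are made up for illustration.

```python
def linkage_to_d3(Z, labels, root_name="Root1"):
    """Build the nested name/children dict from linkage rows [i, j, dist, count]."""
    n = len(labels)
    # Leaves get ids 0..n-1; row k of Z creates the cluster with id n + k.
    trees = {i: {"name": str(lab), "children": []} for i, lab in enumerate(labels)}
    leaves = {i: [str(lab)] for i, lab in enumerate(labels)}
    for k, row in enumerate(Z):
        a, b = int(row[0]), int(row[1])
        leaves[n + k] = sorted(leaves[a] + leaves[b])
        # Same labeling convention as the script: "-"-separated sorted leaf names.
        trees[n + k] = {"name": "-".join(leaves[n + k]),
                        "children": [trees[a], trees[b]]}
    top = trees[n + len(Z) - 1]  # the last merge contains every leaf
    return {"name": root_name, "children": [top]}


# Hand-written toy linkage for three leaves: a and b merge first, then c joins.
toy_Z = [[0, 1, 1.0, 2],
         [2, 3, 2.5, 3]]
result = linkage_to_d3(toy_Z, ["a", "b", "c"])
print(result["children"][0]["name"])
```

Because it only indexes into rows, the function accepts either a plain list of rows or the NumPy array returned by `scipy.cluster.hierarchy.linkage`.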