@jansturm1
Forked from mdml/README.md
Created May 30, 2022 11:56

Revisions

  1. @mdml mdml revised this gist Jan 7, 2014. 3 changed files with 30 additions and 30 deletions.
    2 changes: 1 addition & 1 deletion README.md
    Original file line number Diff line number Diff line change
    @@ -1,3 +1,3 @@
    A dendrogram is a common way to represent hierarchical data. For Python users, [Scipy](http://www.scipy.org/) has a [hierarchical clustering module](http://docs.scipy.org/doc/scipy/reference/cluster.hierarchy.html) that performs hierarchical clustering and outputs the results as dendrogram plots via matplotlib. When it's time to make a prettier, more customized, or web version of the dendrogram, however, it can be tricky to use Scipy's dendrogram to create a suitable visualization. My preferred method of visualizing data -- especially on the web -- is [D3](http://d3js.org/). This example includes a script to convert a Scipy dendrogram into JSON format used by D3's [`cluster`](https://github.com/mbostock/d3/wiki/Cluster-Layout) method.

    In the example, I cluster six genes by their expression values from two experiments. You can easily replace that data with your own, larger data set, to harness the power of both Scipy and D3 for analyzing hierarchical data. The D3 code I used to generate this example is straight from Mike Bostock's [dendrogram example](http://bl.ocks.org/mbostock/4063570): just replace the JSON file `flare.json` with your own (and maybe tweak the width/height).
    In the example, I cluster six genes by their expression values from two experiments. You can easily replace that data with your own, larger data set, to harness the power of both Scipy and D3 for analyzing hierarchical data. The D3 code I used to generate this example is straight from Mike Bostock's [dendrogram example](http://bl.ocks.org/mbostock/4063570): just replace the JSON file `flare.json` with your own (and maybe tweak the width/height).
    40 changes: 20 additions & 20 deletions d3-dendrogram.json
    Original file line number Diff line number Diff line change
    @@ -4,50 +4,50 @@
    "children": [
    {
    "children": [],
    "name": "B"
    "name": "f"
    },
    {
    "children": [
    {
    "children": [
    {
    "children": [],
    "name": "A"
    },
    {
    "children": [],
    "name": "C"
    }
    ],
    "name": "A-C"
    "children": [],
    "name": "b"
    },
    {
    "children": [
    {
    "children": [],
    "name": "F"
    "children": [
    {
    "children": [],
    "name": "c"
    },
    {
    "children": [],
    "name": "d"
    }
    ],
    "name": "c-d"
    },
    {
    "children": [
    {
    "children": [],
    "name": "D"
    "name": "a"
    },
    {
    "children": [],
    "name": "E"
    "name": "e"
    }
    ],
    "name": "D-E"
    "name": "a-e"
    }
    ],
    "name": "D-E-F"
    "name": "a-c-d-e"
    }
    ],
    "name": "A-C-D-E-F"
    "name": "a-b-c-d-e"
    }
    ],
    "name": "A-B-C-D-E-F"
    "name": "a-b-c-d-e-f"
    }
    ],
    "name": "Root1"
    18 changes: 9 additions & 9 deletions dendro_scipy2d3.py
    Original file line number Diff line number Diff line change
    @@ -8,23 +8,23 @@
    import json
    import matplotlib.pyplot as plt

    # Create test data
    d = {
    'employee' : ['A', 'B', 'C', 'D', 'E', 'F'],
    'skillX': [2,8,3,6,8,10],
    'skillY': [8,15,6,9,7,10]
    }
    d1 = pd.DataFrame(d)
    # Example data: gene expression
    geneExp = {'genes' : ['a', 'b', 'c', 'd', 'e', 'f'],
    'exp1': [-2.2, 5.6, 0.9, -0.23, -3, 0.1],
    'exp2': [5.4, -0.5, 2.33, 3.1, 4.1, -3.2]
    }
    df = pd.DataFrame( geneExp )

    # Determine distances (default is Euclidean)
    distMat = scipy.spatial.distance.pdist(np.array(d1[['skillX', 'skillY']]))
    dataMatrix = np.array( df[['exp1', 'exp2']] )
    distMat = scipy.spatial.distance.pdist( dataMatrix )

    # Cluster hierarchically using scipy
    clusters = scipy.cluster.hierarchy.linkage(distMat, method='single')
    T = scipy.cluster.hierarchy.to_tree( clusters , rd=False )

    # Create dictionary for labeling nodes by their IDs
    labels = list(d1.employee)
    labels = list(df.genes)
    id2name = dict(zip(range(len(labels)), labels))

    # Draw dendrogram using matplotlib to scipy-dendrogram.pdf
  2. @mdml mdml revised this gist Jan 7, 2014. 3 changed files with 70 additions and 1 deletion.
    2 changes: 1 addition & 1 deletion README.md
    Original file line number Diff line number Diff line change
    @@ -1,3 +1,3 @@
    A dendrogram is a common way to represent hierarchical data. For Python users, [Scipy](http://www.scipy.org/) has a [hierarchical clustering module](http://docs.scipy.org/doc/scipy/reference/cluster.hierarchy.html) that performs hierarchical clustering and outputs the results as dendrogram plots via matplotlib. When it's time to make a prettier, more customized, or web-version of the dendogram, however, it can be tricky to use Scipy's dendrogram to create a suitable visualization. My preferred method of visualizing data -- especially on the web -- is [D3](http://d3js.org/). This example includes a script to convert a Scipy dendrogram into JSON format used by D3's [`cluster`](https://github.com/mbostock/d3/wiki/Cluster-Layout) method.

    In the example, I cluster six genes by their expression values from two experiments. You can easily replace that data with your own, larger data set, to harness the power of both Scipy and D3 for analyzing hierarchical data. The D3 code I used to generate this example is straight from Mike Bostock's [dendrogram example](](http://bl.ocks.org/mbostock/4063570): just replace the JSON file `flare.json` with your own.
    In the example, I cluster six genes by their expression values from two experiments. You can easily replace that data with your own, larger data set, to harness the power of both Scipy and D3 for analyzing hierarchical data. The D3 code I used to generate this example is straight from Mike Bostock's [dendrogram example](](http://bl.ocks.org/mbostock/4063570): just replace the JSON file `flare.json` with your own (and maybe tweak the width/height).
    69 changes: 69 additions & 0 deletions index.html
    Original file line number Diff line number Diff line change
    @@ -0,0 +1,69 @@
    <!DOCTYPE html>
    <meta charset="utf-8">
    <style>

    .node circle {
      fill: #fff;
      stroke: steelblue;
      stroke-width: 1.5px;
    }

    .node {
      font: 10px sans-serif;
    }

    .link {
      fill: none;
      stroke: #ccc;
      stroke-width: 1.5px;
    }

    </style>
    <body>
    <script src="http://d3js.org/d3.v3.min.js"></script>
    <script>

    var width = 800,
        height = 550;

    var cluster = d3.layout.cluster()
        .size([height, width - 160]);

    var diagonal = d3.svg.diagonal()
        .projection(function(d) { return [d.y, d.x]; });

    var svg = d3.select("body").append("svg")
        .attr("width", width)
        .attr("height", height)
      .append("g")
        .attr("transform", "translate(40,0)");

    d3.json("d3-dendrogram.json", function(error, root) {
      var nodes = cluster.nodes(root),
          links = cluster.links(nodes);

      var link = svg.selectAll(".link")
          .data(links)
        .enter().append("path")
          .attr("class", "link")
          .attr("d", diagonal);

      var node = svg.selectAll(".node")
          .data(nodes)
        .enter().append("g")
          .attr("class", "node")
          .attr("transform", function(d) { return "translate(" + d.y + "," + d.x + ")"; });

      node.append("circle")
          .attr("r", 4.5);

      node.append("text")
          .attr("dx", function(d) { return d.children ? -8 : 8; })
          .attr("dy", 3)
          .style("text-anchor", function(d) { return d.children ? "end" : "start"; })
          .text(function(d) { return d.name; });
    });

    d3.select(self.frameElement).style("height", height + "px");

    </script>
    Binary file modified thumbnail.png
  3. @mdml mdml revised this gist Nov 19, 2013. No changes.
  4. @mdml mdml revised this gist Nov 19, 2013. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion README.md
    Original file line number Diff line number Diff line change
    @@ -1,3 +1,3 @@
    A dendrogram is a common way to represent hierarchical data. For Python users, [Scipy](http://www.scipy.org/) has a [hierarchical clustering module](http://docs.scipy.org/doc/scipy/reference/cluster.hierarchy.html) that performs hierarchical clustering and outputs the results as dendrogram plots via matplotlib. When it's time to make a prettier, more customized, or web-version of the dendogram, however, it can be tricky to use Scipy's dendrogram to create a suitable visualization. My preferred method of visualizing data -- especially on the web -- is [D3](http://d3js.org/). This example includes a script to convert a Scipy dendrogram into JSON format used by D3's [`cluster`](https://github.com/mbostock/d3/wiki/Cluster-Layout) method.

    In the example, I cluster six genes by their expression values from two experiments. You can easily replace that data with your own, larger data set, to harness the power of both Scipy and D3 for analyzing hierarchical data. The D3 code I used to generate this example is straight from Mike Bostock's [dendrogram example](](http://bl.ocks.org/mbostock/4063570): just replace the JSON file `flare.json` with your own.
    In the example, I cluster six genes by their expression values from two experiments. You can easily replace that data with your own, larger data set, to harness the power of both Scipy and D3 for analyzing hierarchical data. The D3 code I used to generate this example is straight from Mike Bostock's [dendrogram example](http://bl.ocks.org/mbostock/4063570): just replace the JSON file `flare.json` with your own.
  5. @mdml mdml revised this gist Nov 19, 2013. 1 changed file with 20 additions and 20 deletions.
    40 changes: 20 additions & 20 deletions d3-dendrogram.json
    Original file line number Diff line number Diff line change
    @@ -4,50 +4,50 @@
    "children": [
    {
    "children": [],
    "name": "B"
    "name": "f"
    },
    {
    "children": [
    {
    "children": [
    {
    "children": [],
    "name": "A"
    },
    {
    "children": [],
    "name": "C"
    }
    ],
    "name": "A-C"
    "children": [],
    "name": "b"
    },
    {
    "children": [
    {
    "children": [],
    "name": "F"
    "children": [
    {
    "children": [],
    "name": "c"
    },
    {
    "children": [],
    "name": "d"
    }
    ],
    "name": "c-d"
    },
    {
    "children": [
    {
    "children": [],
    "name": "D"
    "name": "a"
    },
    {
    "children": [],
    "name": "E"
    "name": "e"
    }
    ],
    "name": "D-E"
    "name": "a-e"
    }
    ],
    "name": "D-E-F"
    "name": "a-c-d-e"
    }
    ],
    "name": "A-C-D-E-F"
    "name": "a-b-c-d-e"
    }
    ],
    "name": "A-B-C-D-E-F"
    "name": "a-b-c-d-e-f"
    }
    ],
    "name": "Root1"
  6. @mdml mdml revised this gist Nov 19, 2013. 1 changed file with 9 additions and 9 deletions.
    18 changes: 9 additions & 9 deletions dendro_scipy2d3.py
    Original file line number Diff line number Diff line change
    @@ -8,23 +8,23 @@
    import json
    import matplotlib.pyplot as plt

    # Create test data
    d = {
    'employee' : ['A', 'B', 'C', 'D', 'E', 'F'],
    'skillX': [2,8,3,6,8,10],
    'skillY': [8,15,6,9,7,10]
    }
    d1 = pd.DataFrame(d)
    # Example data: gene expression
    geneExp = {'genes' : ['a', 'b', 'c', 'd', 'e', 'f'],
    'exp1': [-2.2, 5.6, 0.9, -0.23, -3, 0.1],
    'exp2': [5.4, -0.5, 2.33, 3.1, 4.1, -3.2]
    }
    df = pd.DataFrame( geneExp )

    # Determine distances (default is Euclidean)
    distMat = scipy.spatial.distance.pdist(np.array(d1[['skillX', 'skillY']]))
    dataMatrix = np.array( df[['exp1', 'exp2']] )
    distMat = scipy.spatial.distance.pdist( dataMatrix )

    # Cluster hierarchically using scipy
    clusters = scipy.cluster.hierarchy.linkage(distMat, method='single')
    T = scipy.cluster.hierarchy.to_tree( clusters , rd=False )

    # Create dictionary for labeling nodes by their IDs
    labels = list(d1.employee)
    labels = list(df.genes)
    id2name = dict(zip(range(len(labels)), labels))

    # Draw dendrogram using matplotlib to scipy-dendrogram.pdf
  7. @mdml mdml revised this gist Nov 19, 2013. 1 changed file with 0 additions and 0 deletions.
    Binary file added thumbnail.png
  8. @mdml mdml revised this gist Nov 19, 2013. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion README.md
    Original file line number Diff line number Diff line change
    @@ -1,3 +1,3 @@
    A dendrogram is a common way to represent hierarchical data. For Python users, [Scipy](http://www.scipy.org/) has [hierarchical clustering module](http://docs.scipy.org/doc/scipy/reference/cluster.hierarchy.html) that boths performs hierarchical clustering and outputs the results as dendrogram plots via matplotlib. When it's time to make a prettier, more customized, or web-version of the dendogram, however, it can be tricky to convert the Scipy dendrogram into a suitable format. My preferred method of visualizing data -- especially on the web -- is [D3](http://d3js.org/). This example includes a script to convert a Scipy dendrogram into JSON format used by D3's [`cluster`](https://github.com/mbostock/d3/wiki/Cluster-Layout) method.
    A dendrogram is a common way to represent hierarchical data. For Python users, [Scipy](http://www.scipy.org/) has a [hierarchical clustering module](http://docs.scipy.org/doc/scipy/reference/cluster.hierarchy.html) that performs hierarchical clustering and outputs the results as dendrogram plots via matplotlib. When it's time to make a prettier, more customized, or web-version of the dendogram, however, it can be tricky to use Scipy's dendrogram to create a suitable visualization. My preferred method of visualizing data -- especially on the web -- is [D3](http://d3js.org/). This example includes a script to convert a Scipy dendrogram into JSON format used by D3's [`cluster`](https://github.com/mbostock/d3/wiki/Cluster-Layout) method.

    In the example, I cluster six genes by their expression values from two experiments. You can easily replace that data with your own, larger data set, to harness the power of both Scipy and D3 for analyzing hierarchical data. The D3 code I used to generate this example is straight from Mike Bostock's [dendrogram example](](http://bl.ocks.org/mbostock/4063570): just replace the JSON file `flare.json` with your own.
  9. @mdml mdml renamed this gist Nov 19, 2013. 1 changed file with 0 additions and 0 deletions.
    File renamed without changes.
  10. @mdml mdml revised this gist Nov 19, 2013. 1 changed file with 3 additions and 0 deletions.
    3 changes: 3 additions & 0 deletions README
    Original file line number Diff line number Diff line change
    @@ -0,0 +1,3 @@
    A dendrogram is a common way to represent hierarchical data. For Python users, [Scipy](http://www.scipy.org/) has [hierarchical clustering module](http://docs.scipy.org/doc/scipy/reference/cluster.hierarchy.html) that boths performs hierarchical clustering and outputs the results as dendrogram plots via matplotlib. When it's time to make a prettier, more customized, or web-version of the dendogram, however, it can be tricky to convert the Scipy dendrogram into a suitable format. My preferred method of visualizing data -- especially on the web -- is [D3](http://d3js.org/). This example includes a script to convert a Scipy dendrogram into JSON format used by D3's [`cluster`](https://github.com/mbostock/d3/wiki/Cluster-Layout) method.

    In the example, I cluster six genes by their expression values from two experiments. You can easily replace that data with your own, larger data set, to harness the power of both Scipy and D3 for analyzing hierarchical data. The D3 code I used to generate this example is straight from Mike Bostock's [dendrogram example](](http://bl.ocks.org/mbostock/4063570): just replace the JSON file `flare.json` with your own.
  11. @mdml mdml created this gist Nov 18, 2013.
    54 changes: 54 additions & 0 deletions d3-dendrogram.json
    Original file line number Diff line number Diff line change
    @@ -0,0 +1,54 @@
    {
        "children": [
            {
                "children": [
                    {
                        "children": [],
                        "name": "B"
                    },
                    {
                        "children": [
                            {
                                "children": [
                                    {
                                        "children": [],
                                        "name": "A"
                                    },
                                    {
                                        "children": [],
                                        "name": "C"
                                    }
                                ],
                                "name": "A-C"
                            },
                            {
                                "children": [
                                    {
                                        "children": [],
                                        "name": "F"
                                    },
                                    {
                                        "children": [
                                            {
                                                "children": [],
                                                "name": "D"
                                            },
                                            {
                                                "children": [],
                                                "name": "E"
                                            }
                                        ],
                                        "name": "D-E"
                                    }
                                ],
                                "name": "D-E-F"
                            }
                        ],
                        "name": "A-C-D-E-F"
                    }
                ],
                "name": "A-B-C-D-E-F"
            }
        ],
        "name": "Root1"
    }
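
The nested format above follows a simple convention: every node has a `children` list (empty for leaves) and a `name`, and each internal node is named after the sorted, "-"-joined names of the leaves beneath it. The sketch below checks that convention on a small hand-written sample. The helper names `leaf_names` and `check_names` are hypothetical and not part of the gist.

```python
import json


def leaf_names(node):
    """Return the sorted names of all leaves under a node of the nested format."""
    if not node["children"]:
        return [node["name"]]
    names = []
    for child in node["children"]:
        names.extend(leaf_names(child))
    return sorted(names)


def check_names(node):
    """Return True if this node and all its descendants are named after their leaves."""
    ok = node["name"] == "-".join(leaf_names(node))
    return ok and all(check_names(child) for child in node["children"])


# A tiny sample in the same shape as d3-dendrogram.json (assumed data, not from the gist)
sample = json.loads("""
{"name": "Root1", "children": [
  {"name": "A-C", "children": [
    {"name": "A", "children": []},
    {"name": "C", "children": []}]}]}
""")

# "Root1" is a fixed label, so the check starts from its single child
print(leaf_names(sample))
print(check_names(sample["children"][0]))
```

A check like this can be run on a generated file before handing it to D3, since a mislabeled node is otherwise only visible in the rendered plot.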
    70 changes: 70 additions & 0 deletions dendro_scipy2d3.py
    Original file line number Diff line number Diff line change
    @@ -0,0 +1,70 @@
    #!/usr/bin/python

    # Load required modules
    import pandas as pd
    import scipy.spatial
    import scipy.cluster
    import numpy as np
    import json
    import matplotlib.pyplot as plt

    # Create test data
    d = {
        'employee' : ['A', 'B', 'C', 'D', 'E', 'F'],
        'skillX': [2,8,3,6,8,10],
        'skillY': [8,15,6,9,7,10]
    }
    d1 = pd.DataFrame(d)

    # Determine distances (default is Euclidean)
    distMat = scipy.spatial.distance.pdist(np.array(d1[['skillX', 'skillY']]))

    # Cluster hierarchically using scipy
    clusters = scipy.cluster.hierarchy.linkage(distMat, method='single')
    T = scipy.cluster.hierarchy.to_tree( clusters , rd=False )

    # Create dictionary for labeling nodes by their IDs
    labels = list(d1.employee)
    id2name = dict(zip(range(len(labels)), labels))

    # Draw dendrogram using matplotlib to scipy-dendrogram.png
    scipy.cluster.hierarchy.dendrogram(clusters, labels=labels, orientation='right')
    plt.savefig("scipy-dendrogram.png")

    # Create a nested dictionary from the ClusterNode objects returned by SciPy
    def add_node(node, parent ):
        # First create the new node and append it to its parent's children
        newNode = dict( node_id=node.id, children=[] )
        parent["children"].append( newNode )

        # Recursively add the current node's children
        if node.left: add_node( node.left, newNode )
        if node.right: add_node( node.right, newNode )

    # Initialize nested dictionary for d3, then recursively iterate through tree
    d3Dendro = dict(children=[], name="Root1")
    add_node( T, d3Dendro )

    # Label each node with the names of each leaf in its subtree
    def label_tree( n ):
        # If the node is a leaf, then we have its name
        if len(n["children"]) == 0:
            leafNames = [ id2name[n["node_id"]] ]

        # If not, flatten all the leaves in the node's subtree
        else:
            # A comprehension replaces reduce, which is not a builtin in Python 3
            leafNames = [ name for c in n["children"] for name in label_tree(c) ]

        # Delete the node id since we don't need it anymore and
        # it makes for cleaner JSON
        del n["node_id"]

        # Labeling convention: "-"-separated leaf names
        n["name"] = "-".join(sorted(map(str, leafNames)))

        return leafNames

    label_tree( d3Dendro["children"][0] )

    # Output to JSON
    json.dump(d3Dendro, open("d3-dendrogram.json", "w"), sort_keys=True, indent=4)
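
The script above goes through `to_tree` and then relabels the nodes in a second pass. As an alternative sketch, the same nested structure can be built directly from the linkage matrix, relying on SciPy's documented convention that row `i` of the matrix merges the two cluster ids in its first two columns into a new cluster with id `n + i`, where `n` is the number of leaves. The function name `linkage_to_nested` and the hand-written matrix are assumptions for illustration; in practice `Z` would be the `clusters` array returned by `linkage`.

```python
import json


def linkage_to_nested(Z, labels):
    """Build a D3-style nested dict from a SciPy-format linkage matrix.

    Z is a sequence of rows (left_id, right_id, distance, leaf_count);
    row i creates the cluster with id len(labels) + i.
    """
    n = len(labels)
    # Each entry maps a cluster id to (node_dict, leaf names in its subtree)
    nodes = {i: ({"name": str(labels[i]), "children": []}, [str(labels[i])])
             for i in range(n)}
    for i, row in enumerate(Z):
        left, right = int(row[0]), int(row[1])
        left_node, left_leaves = nodes.pop(left)
        right_node, right_leaves = nodes.pop(right)
        leaves = sorted(left_leaves + right_leaves)
        # Same labeling convention as the script: "-"-separated leaf names
        merged = {"name": "-".join(leaves), "children": [left_node, right_node]}
        nodes[n + i] = (merged, leaves)
    # After all merges exactly one cluster remains: the root of the tree
    root, _ = next(iter(nodes.values()))
    return {"name": "Root1", "children": [root]}


# Hand-written linkage matrix for three leaves (assumed data, not from the gist):
# row 0 merges leaves 0 and 2 into cluster 3, row 1 merges leaf 1 with cluster 3
Z = [[0, 2, 1.0, 2],
     [1, 3, 2.5, 3]]
tree = linkage_to_nested(Z, ["a", "b", "c"])
print(json.dumps(tree, sort_keys=True, indent=4))
```

Building the names during the merge avoids the temporary `node_id` key and the separate labeling pass, at the cost of not using SciPy's `ClusterNode` objects.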