@tjake
Last active September 8, 2024 04:11
Gist: tjake/fb166a659e8fe4c8d4a3

Revisions

  1. tjake revised this gist Dec 1, 2014. 4 changed files with 4 additions and 4 deletions.
    2 changes: 1 addition & 1 deletion insert.txt
    @@ -1,4 +1,4 @@
   -./bin/cassandra-stress user profile=blogpost.yaml ops\(insert=1\)
   +./bin/cassandra-stress user profile=./blogpost.yaml ops\(insert=1\)

    Results:
    op rate : 8625
    2 changes: 1 addition & 1 deletion mixed1.txt
    @@ -1,4 +1,4 @@
   -./bin/cassandra-stress user profile=blogpost.yaml ops\(singlepost=2,timeline=1,insert=1\)
   +./bin/cassandra-stress user profile=./blogpost.yaml ops\(singlepost=2,timeline=1,insert=1\)

    Results:
    op rate : 5938
    2 changes: 1 addition & 1 deletion query1.txt
    @@ -1,4 +1,4 @@
   -./bin/cassandra-stress user profile=blogpost.yaml ops\(singlepost=1\)
   +./bin/cassandra-stress user profile=./blogpost.yaml ops\(singlepost=1\)

    Results:
    op rate : 7222
    2 changes: 1 addition & 1 deletion query2.txt
    @@ -1,4 +1,4 @@
   -./bin/cassandra-stress user profile=blogpost.yaml ops\(timeline=1\)
   +./bin/cassandra-stress user profile=./blogpost.yaml ops\(timeline=1\)

    Results:
    op rate : 7132
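The mixed run weights its operations 2:1:1 across singlepost, timeline and insert, and its reported op rate (5938) covers all three. Assuming cassandra-stress draws ops in proportion to those weights, a short sketch can split the total into per-query rates. The spec string is the part inside ops\(...\) without the shell escapes; op_shares is an illustrative name, not part of cassandra-stress.

```python
def op_shares(spec: str) -> dict:
    """Parse a ratio spec such as 'singlepost=2,timeline=1,insert=1'
    into a mapping of operation name -> fraction of all operations."""
    weights = {}
    for part in spec.split(","):
        name, weight = part.split("=")
        weights[name.strip()] = float(weight)
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


if __name__ == "__main__":
    mixed_op_rate = 5938  # total "op rate" reported by the mixed run
    for name, share in op_shares("singlepost=2,timeline=1,insert=1").items():
        # assumes each op type gets exactly its weighted share of the total rate
        print(f"{name}: {share:.0%} of ops, about {share * mixed_op_rate:.0f} op/s")
```

Under that assumption, singlepost accounts for half of the mixed operations and timeline and insert a quarter each.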
  2. tjake revised this gist Dec 1, 2014. 1 changed file with 8 additions and 6 deletions.
    14 changes: 8 additions & 6 deletions blogpost.yaml
    @@ -20,7 +20,7 @@ table_definition: |
    title text,
    body text,
    PRIMARY KEY(domain, published_date)
   -) WITH CLUSTERING ORDER BY (published_date DESC);
   +) WITH CLUSTERING ORDER BY (published_date DESC)
    AND compaction = { 'class':'LeveledCompactionStrategy' }
    AND comment='A table to hold blog posts'
    @@ -51,9 +51,7 @@ columnspec:
    insert:
    partitions: fixed(1) # Our partition key is the domain so only insert one per batch

    pervisit: fixed(1)/1000 # We have 1000 posts per domain so 1/1000 will allow 1 post per batch

    perbatch: fixed(1)/1 # With one partition per batch we can set this to 100%
    select: fixed(1)/1000 # We have 1000 posts per domain so 1/1000 will allow 1 post per batch

    batchtype: UNLOGGED # Unlogged batches

    @@ -62,5 +60,9 @@ insert:
    # A list of queries you wish to run against the schema
    #
    queries:
    singlepost: select * from blogposts where domain = ? LIMIT 1
    timeline: select url, title, published_date from blogposts where domain = ? LIMIT 10
    singlepost:
    cql: select * from blogposts where domain = ? LIMIT 1
    fields: samerow
    timeline:
    cql: select url, title, published_date from blogposts where domain = ? LIMIT 10
    fields: samerow
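Both queries defined here read from a single domain partition and return at most 1 row (singlepost) or 10 rows (timeline). Dividing the rates in each run's Results summary gives the partitions touched per operation and the rows returned per partition. A small sketch of that arithmetic, using the summaries reported in this gist (SUMMARIES and ratios are illustrative names):

```python
# (op rate, partition rate, row rate) exactly as reported in each Results summary
SUMMARIES = {
    "insert": (8625, 8625, 8612),
    "singlepost": (7222, 6456, 6456),
    "timeline": (7132, 6366, 25337),
    "mixed": (5938, 5583, 10555),
}


def ratios(op_rate: float, partition_rate: float, row_rate: float) -> dict:
    """Partitions touched per op and rows per partition, from reported rates."""
    return {
        "partitions_per_op": partition_rate / op_rate,
        "rows_per_partition": row_rate / partition_rate,
    }


if __name__ == "__main__":
    for name, summary in SUMMARIES.items():
        r = ratios(*summary)
        print(f"{name:>10}: {r['partitions_per_op']:.2f} partitions/op, "
              f"{r['rows_per_partition']:.2f} rows/partition")
```

The timeline run averages about 4 rows per partition, below its LIMIT of 10, and both read-only runs report fewer partitions than operations (about 0.89 per op).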
  3. tjake revised this gist Aug 1, 2014. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion blogpost.yaml
    @@ -37,7 +37,7 @@ columnspec:
    - name: url
    size: uniform(30..300)

   -- name: title #titles shouldn't go beyond 300 chars
   +- name: title #titles shouldn't go beyond 200 chars
    size: gaussian(10..200)

    - name: author
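The title column uses size: gaussian(10..200). According to the distribution notes in the initial profile, GAUSSIAN(min..max,stdvrng) has mean (min+max)/2 and stdev (mean-min)/stdvrng, and GAUSSIAN(min..max,mean,stdev) sets both explicitly. The sketch below reads a columnspec distribution string and reports those parameters. Two things in it are assumptions rather than facts from this gist: a default stdvrng of 3 when it is omitted, and reading the M suffix as millions. The function names are hypothetical.

```python
import re

# optional "~" prefix, distribution name, then everything inside the parentheses
_SPEC = re.compile(r"^(~?)(\w+)\((.*)\)$")
_GAUSSIAN_NAMES = {"gaussian", "gauss", "normal", "norm"}


def _num(tok: str) -> float:
    tok = tok.strip()
    # Assumption: "M" means millions, as in uniform(1..10M) ("10M possible domains")
    if tok.upper().endswith("M"):
        return float(tok[:-1]) * 1_000_000
    return float(tok)


def describe(spec: str) -> dict:
    """Report the parameters of a columnspec distribution such as gaussian(10..200)."""
    m = _SPEC.match(spec.strip())
    if not m:
        raise ValueError(f"unrecognised distribution: {spec!r}")
    inverted, kind, args = m.group(1) == "~", m.group(2).lower(), m.group(3)
    if kind == "fixed":
        value = _num(args)
        return {"kind": kind, "min": value, "max": value, "mean": value, "inverted": inverted}
    bounds, *extra = args.split(",")
    lo, hi = (_num(x) for x in bounds.split(".."))
    info = {"kind": kind, "min": lo, "max": hi, "inverted": inverted}
    if kind in _GAUSSIAN_NAMES:
        if len(extra) == 2:  # GAUSSIAN(min..max,mean,stdev): explicit mean and stdev
            mean, stdev = _num(extra[0]), _num(extra[1])
        else:  # GAUSSIAN(min..max,stdvrng): mean=(min+max)/2, stdev=(mean-min)/stdvrng
            mean = (lo + hi) / 2
            stdvrng = _num(extra[0]) if extra else 3.0  # default of 3 is an assumption
            stdev = (mean - lo) / stdvrng
        info.update(mean=mean, stdev=stdev)
    elif kind == "uniform":
        info["mean"] = (lo + hi) / 2
    return info


if __name__ == "__main__":
    for spec in ("gaussian(10..200)", "uniform(30..300)", "uniform(1..10M)", "fixed(1000)"):
        print(spec, describe(spec))
```

For gaussian(10..200) this gives a mean title length of 105 characters and, under the assumed default, a stdev of 95/3 ≈ 31.7.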
  4. tjake revised this gist Aug 1, 2014. 1 changed file with 0 additions and 5 deletions.
    5 changes: 0 additions & 5 deletions blogpost.yaml
    @@ -64,8 +64,3 @@ insert:
    queries:
    singlepost: select * from blogposts where domain = ? LIMIT 1
    timeline: select url, title, published_date from blogposts where domain = ? LIMIT 10

    #
    # In order to generate data consistently we need something to generate a unique key for this schema profile.
    #
    seed: changing this string changes the generated data. its hashcode is used as the random seed.
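The removed seed line says the string's hashcode becomes the random seed. Assuming this means Java's String.hashCode() (cassandra-stress is written in Java), the value can be reproduced as below. java_string_hash is a hypothetical helper name, and the sketch only handles characters in the Basic Multilingual Plane.

```python
def java_string_hash(s: str) -> int:
    """Mimic Java's String.hashCode(): h = 31*h + c over the characters,
    wrapped to a signed 32-bit int. Assumes BMP characters only, because
    Java hashes UTF-16 code units rather than code points."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF  # keep 32 bits, like Java int overflow
    return h - (1 << 32) if h & 0x80000000 else h  # reinterpret as signed int


if __name__ == "__main__":
    seed_text = ("changing this string changes the generated data. "
                 "its hashcode is used as the random seed.")
    print(java_string_hash(seed_text))
```

Any edit to the seed text changes this integer and therefore, per the comment, the generated data.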
  5. tjake revised this gist Aug 1, 2014. 5 changed files with 0 additions and 0 deletions.
    File renamed without changes.
    File renamed without changes.
    File renamed without changes.
    File renamed without changes.
    File renamed without changes.
  6. tjake revised this gist Aug 1, 2014. 4 changed files with 31 additions and 3 deletions.
    2 changes: 1 addition & 1 deletion gistfile2.txt
    @@ -1,4 +1,4 @@
    Writes
    ./bin/cassandra-stress user profile=blogpost.yaml ops\(insert=1\)

    Results:
    op rate : 8625
    2 changes: 1 addition & 1 deletion gistfile3.txt
    @@ -1,4 +1,4 @@
    Read a single post
    ./bin/cassandra-stress user profile=blogpost.yaml ops\(singlepost=1\)

    Results:
    op rate : 7222
    2 changes: 1 addition & 1 deletion gistfile4.txt
    @@ -1,4 +1,4 @@
    Timeline
    ./bin/cassandra-stress user profile=blogpost.yaml ops\(timeline=1\)

    Results:
    op rate : 7132
    28 changes: 28 additions & 0 deletions gistfile5.txt
    @@ -0,0 +1,28 @@
    ./bin/cassandra-stress user profile=blogpost.yaml ops\(singlepost=2,timeline=1,insert=1\)

    Results:
    op rate : 5938
    partition rate : 5583
    row rate : 10555
    latency mean : 67.6
    latency median : 57.8
    latency 95th percentile : 160.3
    latency 99th percentile : 287.1
    latency 99.9th percentile : 450.6
    latency max : 719.7
    Total operation time : 00:00:43
    Improvement over 271 threadCount: -4%
    Sleeping for 15s
    id, partitions, op/s, pk/s, row/s, mean, med, .95, .99, .999, max, time, stderr
    4 threadCount, 144779 , 3424, 3159, 5527, 1.2, 0.9, 2.1, 3.9, 23.2, 382.1, 45.8, 0.01988
    8 threadCount, 302678 , 4240, 3921, 6907, 1.9, 1.5, 3.8, 7.3, 47.1, 718.7, 77.2, 0.01967
    16 threadCount, 233321 , 5144, 4768, 8547, 3.1, 2.2, 7.3, 13.1, 77.0, 365.5, 48.9, 0.01945
    24 threadCount, 152504 , 5251, 4875, 8808, 4.6, 3.2, 11.0, 19.8, 127.8, 330.0, 31.3, 0.02022
    36 threadCount, 323510 , 5316, 4953, 9017, 6.8, 5.2, 18.6, 39.8, 157.5, 383.8, 65.3, 0.01950
    54 threadCount, 192879 , 5533, 5162, 9368, 9.7, 7.2, 24.1, 50.8, 127.8, 373.5, 37.4, 0.01915
    81 threadCount, 174440 , 5693, 5320, 9804, 14.1, 11.0, 32.8, 63.9, 127.2, 384.2, 32.8, 0.02233
    121 threadCount, 192749 , 5989, 5608, 10436, 20.1, 16.2, 47.1, 83.4, 158.3, 362.0, 34.4, 0.01916
    181 threadCount, 196909 , 6053, 5674, 10633, 29.8, 24.5, 67.8, 111.9, 195.2, 321.1, 34.7, 0.01669
    271 threadCount, 214778 , 6186, 5808, 10962, 43.5, 35.5, 104.1, 177.8, 310.3, 526.3, 37.0, 0.01777
    406 threadCount, 242622 , 5938, 5583, 10555, 67.6, 57.8, 160.3, 287.1, 450.6, 719.7, 43.5, 0.01863
    END
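The summary line "Improvement over 271 threadCount: -4%" can be reproduced from the table: the final row (406 threads, 5938 op/s) is 4% below the row before it (271 threads, 6186 op/s). A sketch that parses sweep rows and recomputes it, assuming the line compares the last row's op/s with the previous row's (parse_sweep and improvement_pct are hypothetical names):

```python
MIXED_SWEEP = """\
181 threadCount, 196909 , 6053, 5674, 10633, 29.8, 24.5, 67.8, 111.9, 195.2, 321.1, 34.7, 0.01669
271 threadCount, 214778 , 6186, 5808, 10962, 43.5, 35.5, 104.1, 177.8, 310.3, 526.3, 37.0, 0.01777
406 threadCount, 242622 , 5938, 5583, 10555, 67.6, 57.8, 160.3, 287.1, 450.6, 719.7, 43.5, 0.01863
"""


def parse_sweep(text: str) -> list:
    """Return (threads, op/s) pairs from 'N threadCount, partitions, op/s, ...' rows."""
    rows = []
    for line in text.strip().splitlines():
        cols = [c.strip() for c in line.split(",")]
        threads = int(cols[0].split()[0])  # "406 threadCount" -> 406
        rows.append((threads, int(cols[2])))  # third column is op/s
    return rows


def improvement_pct(final_op_rate: float, previous_op_rate: float) -> int:
    """Percentage change of the final row over the previous one, rounded."""
    return round(100 * (final_op_rate - previous_op_rate) / previous_op_rate)


if __name__ == "__main__":
    sweep = parse_sweep(MIXED_SWEEP)
    (prev_threads, prev_rate), (_, last_rate) = sweep[-2], sweep[-1]
    best_threads, best_rate = max(sweep, key=lambda row: row[1])
    print(f"Improvement over {prev_threads} threadCount: {improvement_pct(last_rate, prev_rate)}%")
    print(f"peak: {best_rate} op/s at {best_threads} threads")
```

The same formula gives 1% for the writes run (8625 vs 8561), -1% for single-post reads (7222 vs 7299) and -1% for the timeline (7132 vs 7195), matching the lines reported in those runs. In this sweep the peak is at 271 threads, not at the final row.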
  7. tjake revised this gist Aug 1, 2014. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion gistfile1.yml
    @@ -46,7 +46,7 @@ columnspec:
    - name: body
    size: gaussian(100..5000) #the body of the blog post can be long

   -### Column Ratio Distribution Specifications ###
   +### Batch Ratio Distribution Specifications ###

    insert:
    partitions: fixed(1) # Our partition key is the domain so only insert one per batch
  8. tjake revised this gist Aug 1, 2014. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion gistfile1.yml
    @@ -51,7 +51,7 @@ columnspec:
    insert:
    partitions: fixed(1) # Our partition key is the domain so only insert one per batch

   -pervisit: fixed(1)/1000 # We have 1000 posts per domain so 1/1000 will give at least 1 post per batch
   +pervisit: fixed(1)/1000 # We have 1000 posts per domain so 1/1000 will allow 1 post per batch

    perbatch: fixed(1)/1 # With one partition per batch we can set this to 100%
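The insert comments in the initial profile describe pervisit as the share of a partition's rows written per visit, and perbatch as a share of that visit's rows that compounds with (and is capped by) pervisit. With cluster: fixed(1000), pervisit: fixed(1)/1000 and perbatch: fixed(1)/1, that works out to one row in one batch per visit. A sketch of the arithmetic using exact fractions; the rounding rule (round up, at least one row) is an assumption, not taken from cassandra-stress, and the function names are hypothetical.

```python
import math
from fractions import Fraction


def rows_per_visit(rows_in_partition: int, pervisit: Fraction) -> int:
    # pervisit: share of the partition's rows written during one visit
    # (rounding up to at least one row is an assumption)
    return max(1, math.ceil(rows_in_partition * pervisit))


def rows_per_batch(visit_rows: int, perbatch: Fraction) -> int:
    # perbatch is a share of this visit's rows, so it compounds with
    # pervisit and can never exceed visit_rows
    return min(visit_rows, max(1, math.ceil(visit_rows * perbatch)))


def batches_per_visit(visit_rows: int, perbatch: Fraction) -> int:
    return math.ceil(visit_rows / rows_per_batch(visit_rows, perbatch))


if __name__ == "__main__":
    cluster = 1000  # cluster: fixed(1000) posts per domain
    visit = rows_per_visit(cluster, Fraction(1, 1000))  # pervisit: fixed(1)/1000
    print(visit, "row(s) per visit in", batches_per_visit(visit, Fraction(1, 1)), "batch(es)")
```

With partitions: fixed(1), that is one row per insert operation, consistent with the writes run reporting a row rate of 8612 against an op rate of 8625.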

  9. tjake revised this gist Aug 1, 2014. 4 changed files with 98 additions and 12 deletions.
    26 changes: 14 additions & 12 deletions gistfile1.yml
    @@ -16,6 +16,7 @@ table_definition: |
    domain text,
    published_date timeuuid,
    url text,
    author text,
    title text,
    body text,
    PRIMARY KEY(domain, published_date)
    @@ -31,7 +32,7 @@ columnspec:
    population: uniform(1..10M) #10M possible domains to pick from

    - name: published_date
   -cluster: gaussian(1..1000) #under each domain we should have a avg of a few hundred posts
   +cluster: fixed(1000) #under each domain we will have max 1000 posts

    - name: url
    size: uniform(30..300)
    @@ -45,23 +46,24 @@ columnspec:
    - name: body
    size: gaussian(100..5000) #the body of the blog post can be long

    ### Column Ratio Distribution Specifications ###

    insert:
    partitions: uniform(1..50) # number of unique partitions to update in a single operation
    # if perbatch < 1, multiple batches will be used but all partitions will
    # occur in all batches (unless already finished); only the row counts will vary
    pervisit: uniform(1..10)/10 # ratio of rows each partition should update in a single visit to the partition,
    # as a proportion of the total possible for the partition
    perbatch: ~exp(1..3)/4 # number of rows each partition should update in a single batch statement,
    # as a proportion of the proportion we are inserting this visit
    # (i.e. compounds with (and capped by) pervisit)
    batchtype: UNLOGGED # type of batch to use
    partitions: fixed(1) # Our partition key is the domain so only insert one per batch

    pervisit: fixed(1)/1000 # We have 1000 posts per domain so 1/1000 will give at least 1 post per batch

    perbatch: fixed(1)/1 # With one partition per batch we can set this to 100%

    batchtype: UNLOGGED # Unlogged batches


    #
    # A list of queries you wish to run against the schema
    #
    queries:
    simple1: select * from typestest where name = ? and choice = ? LIMIT 100
    range1: select * from typestest where name = ? and choice = ? and date >= ? LIMIT 100
    singlepost: select * from blogposts where domain = ? LIMIT 1
    timeline: select url, title, published_date from blogposts where domain = ? LIMIT 10

    #
    # In order to generate data consistently we need something to generate a unique key for this schema profile.
    28 changes: 28 additions & 0 deletions gistfile2.txt
    @@ -0,0 +1,28 @@
    Writes

    Results:
    op rate : 8625
    partition rate : 8625
    row rate : 8612
    latency mean : 46.8
    latency median : 34.5
    latency 95th percentile : 121.9
    latency 99th percentile : 203.4
    latency 99.9th percentile : 600.4
    latency max : 877.0
    Total operation time : 00:00:42
    Improvement over 271 threadCount: 1%
    Sleeping for 15s
    id, partitions, op/s, pk/s, row/s, mean, med, .95, .99, .999, max, time, stderr
    4 threadCount, 224850 , 4459, 4459, 4463, 0.9, 0.6, 2.0, 3.5, 18.5, 399.9, 50.4, 0.02339
    8 threadCount, 162950 , 5177, 5177, 5186, 1.5, 1.1, 3.9, 6.7, 33.4, 524.5, 31.5, 0.02161
    16 threadCount, 244850 , 6439, 6439, 6425, 2.5, 1.6, 6.4, 11.7, 47.7, 672.4, 38.0, 0.01971
    24 threadCount, 214200 , 6933, 6933, 6928, 3.4, 2.2, 9.4, 16.8, 57.7, 551.1, 30.9, 0.01743
    36 threadCount, 231700 , 7345, 7345, 7334, 4.9, 3.0, 12.5, 22.0, 66.5, 732.5, 31.5, 0.01348
    54 threadCount, 250700 , 7976, 7976, 7982, 6.8, 4.5, 18.4, 34.8, 73.6, 399.1, 31.4, 0.01045
    81 threadCount, 263600 , 8238, 8238, 8207, 9.8, 6.8, 25.4, 45.8, 83.0, 472.1, 32.0, 0.00976
    121 threadCount, 270400 , 8267, 8267, 8220, 14.6, 10.5, 37.3, 66.6, 133.5, 394.9, 32.7, 0.01334
    181 threadCount, 282950 , 8409, 8409, 8398, 21.4, 15.8, 54.4, 85.0, 152.7, 227.8, 33.6, 0.01107
    271 threadCount, 304350 , 8561, 8561, 8537, 31.4, 24.2, 81.3, 119.9, 224.3, 367.0, 35.6, 0.01268
    406 threadCount, 365300 , 8625, 8625, 8612, 46.8, 34.5, 121.9, 203.4, 600.4, 877.0, 42.4, 0.01867
    END
    29 changes: 29 additions & 0 deletions gistfile3.txt
    @@ -0,0 +1,29 @@
    Read a single post

    Results:
    op rate : 7222
    partition rate : 6456
    row rate : 6456
    latency mean : 83.0
    latency median : 72.4
    latency 95th percentile : 180.9
    latency 99th percentile : 307.0
    latency 99.9th percentile : 732.3
    latency max : 1057.8
    Total operation time : 00:00:40
    Improvement over 406 threadCount: -1%
    Sleeping for 15s
    id, partitions, op/s, pk/s, row/s, mean, med, .95, .99, .999, max, time, stderr
    4 threadCount, 86769 , 3212, 2871, 2871, 1.2, 1.1, 2.1, 3.3, 9.1, 39.9, 30.2, 0.00906
    8 threadCount, 122557 , 4461, 3982, 3982, 1.8, 1.5, 3.2, 5.2, 18.2, 871.2, 30.8, 0.02569
    16 threadCount, 157139 , 5769, 5156, 5156, 2.8, 2.4, 5.8, 9.0, 25.1, 47.0, 30.5, 0.00378
    24 threadCount, 167097 , 6099, 5454, 5454, 3.9, 3.3, 8.3, 12.9, 38.0, 329.0, 30.6, 0.00963
    36 threadCount, 162503 , 5894, 5262, 5262, 6.1, 5.0, 14.1, 23.4, 43.9, 196.7, 30.9, 0.01686
    54 threadCount, 179920 , 6482, 5789, 5789, 8.3, 7.2, 16.5, 26.2, 53.5, 101.1, 31.1, 0.01165
    81 threadCount, 195019 , 6967, 6229, 6229, 11.6, 10.2, 22.6, 34.5, 71.1, 372.4, 31.3, 0.00480
    121 threadCount, 200841 , 7026, 6280, 6280, 17.1, 15.6, 31.9, 47.2, 103.7, 200.8, 32.0, 0.00737
    181 threadCount, 209828 , 7267, 6490, 6490, 24.8, 23.1, 45.7, 62.2, 123.2, 156.5, 32.3, 0.00417
    271 threadCount, 220879 , 7243, 6466, 6466, 37.7, 33.5, 74.9, 112.6, 688.4, 771.8, 34.2, 0.00883
    406 threadCount, 238178 , 7299, 6514, 6514, 55.1, 48.7, 113.8, 179.6, 643.5, 916.4, 36.6, 0.00949
    609 threadCount, 261881 , 7222, 6456, 6456, 83.0, 72.4, 180.9, 307.0, 732.3, 1057.8, 40.6, 0.01282
    END
    27 changes: 27 additions & 0 deletions gistfile4.txt
    @@ -0,0 +1,27 @@
    Timeline

    Results:
    op rate : 7132
    partition rate : 6366
    row rate : 25337
    latency mean : 37.7
    latency median : 33.2
    latency 95th percentile : 74.4
    latency 99th percentile : 107.9
    latency 99.9th percentile : 713.3
    latency max : 929.6
    Total operation time : 00:00:35
    Improvement over 181 threadCount: -1%
    Sleeping for 15s
    id, partitions, op/s, pk/s, row/s, mean, med, .95, .99, .999, max, time, stderr
    4 threadCount, 113793 , 4219, 3773, 15040, 0.9, 0.9, 1.5, 2.1, 6.3, 31.5, 30.2, 0.00846
    8 threadCount, 137662 , 5075, 4538, 18106, 1.6, 1.4, 2.9, 5.1, 20.1, 36.0, 30.3, 0.00896
    16 threadCount, 166180 , 6095, 5444, 21701, 2.6, 2.2, 5.5, 8.7, 21.4, 40.3, 30.5, 0.00619
    24 threadCount, 171222 , 6258, 5586, 22224, 3.8, 3.1, 8.1, 13.4, 28.4, 703.5, 30.6, 0.01146
    36 threadCount, 182360 , 6632, 5924, 23571, 5.4, 4.6, 11.5, 19.7, 34.7, 211.3, 30.8, 0.00579
    54 threadCount, 190032 , 6834, 6109, 24323, 7.9, 6.9, 16.3, 26.0, 41.3, 68.8, 31.1, 0.00511
    81 threadCount, 193598 , 6852, 6130, 24397, 11.8, 10.3, 23.9, 35.8, 51.9, 168.2, 31.6, 0.00700
    121 threadCount, 199891 , 6899, 6170, 24551, 17.5, 15.3, 31.9, 45.6, 649.3, 864.2, 32.4, 0.00605
    181 threadCount, 210030 , 7195, 6429, 25578, 25.1, 23.0, 46.1, 67.9, 110.5, 181.0, 32.7, 0.00411
    271 threadCount, 223373 , 7132, 6366, 25337, 37.7, 33.2, 74.4, 107.9, 713.3, 929.6, 35.1, 0.01103
    END
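The final row of each sweep can be sanity-checked with Little's law (requests in flight ≈ throughput × mean latency). Assuming the latency columns are in milliseconds and each client thread keeps exactly one request outstanding, the implied concurrency should be close to the thread count. FINAL_ROWS and implied_concurrency are illustrative names; the numbers are copied from the sweeps above.

```python
# workload: (threads, op/s, mean latency in ms) from the last row of each sweep
FINAL_ROWS = {
    "insert": (406, 8625, 46.8),
    "mixed": (406, 5938, 67.6),
    "singlepost": (609, 7222, 83.0),
    "timeline": (271, 7132, 37.7),
}


def implied_concurrency(op_rate: float, mean_latency_ms: float) -> float:
    """Little's law: average requests in flight = arrival rate * time in system."""
    return op_rate * mean_latency_ms / 1000.0  # ms -> s


if __name__ == "__main__":
    for name, (threads, op_rate, mean_ms) in FINAL_ROWS.items():
        est = implied_concurrency(op_rate, mean_ms)
        print(f"{name:>10}: {threads} threads, about {est:.0f} requests in flight")
```

All four estimates land within about 2% of the thread count. Because op/s is nearly flat at the top of each sweep, the law also implies that mean latency grows roughly in proportion to thread count there; the timeline sweep, for example, goes from 181 to 271 threads (about 1.5x) while mean latency goes from 25.1 to 37.7 ms (about 1.5x).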
  10. tjake revised this gist Jul 31, 2014. 1 changed file with 5 additions and 5 deletions.
    10 changes: 5 additions & 5 deletions gistfile1.yml
    @@ -14,23 +14,23 @@ table: blogposts
    table_definition: |
    CREATE TABLE blogposts (
    domain text,
    published_at timeuuid,
    published_date timeuuid,
    url text,
    title text,
    body text,
    PRIMARY KEY(website, published_at)
    ) WITH compaction = { 'class':'LeveledCompactionStrategy' }
    PRIMARY KEY(domain, published_date)
    ) WITH CLUSTERING ORDER BY (published_date DESC);
    AND compaction = { 'class':'LeveledCompactionStrategy' }
    AND comment='A table to hold blog posts'
    ### Column Distribution Specifications ###

    columnspec:
    - name: domain
    size: gaussian(5..100) #domain names are relatively short
    population: uniform(1..10M) #10M possible domains to pick from

    - name: published_at
    - name: published_date
    cluster: gaussian(1..1000) #under each domain we should have a avg of a few hundred posts

    - name: url
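This revision makes domain the partition key and published_date a clustering column in descending order. A toy in-memory model shows what that layout means for the singlepost and timeline queries used elsewhere in this gist. It is plain Python, not Cassandra: the class and method names are hypothetical, and integers stand in for timeuuid values.

```python
from collections import defaultdict


class ToyBlogPosts:
    """In-memory stand-in for the blogposts table (not Cassandra).

    Partition key: domain. Clustering column: published_date, DESC.
    Integers stand in for timeuuid values; a larger value means a newer post.
    """

    def __init__(self):
        self.partitions = defaultdict(dict)  # domain -> {published_date: row}

    def insert(self, domain, published_date, url, author, title, body):
        # Writing the same primary key again overwrites, like a Cassandra upsert
        self.partitions[domain][published_date] = {
            "domain": domain, "published_date": published_date, "url": url,
            "author": author, "title": title, "body": body,
        }

    def _newest_first(self, domain):
        # CLUSTERING ORDER BY (published_date DESC): newest rows first in a partition
        part = self.partitions.get(domain, {})
        return [part[key] for key in sorted(part, reverse=True)]

    def singlepost(self, domain):
        # select * from blogposts where domain = ? LIMIT 1
        return self._newest_first(domain)[:1]

    def timeline(self, domain):
        # select url, title, published_date from blogposts where domain = ? LIMIT 10
        return [{k: row[k] for k in ("url", "title", "published_date")}
                for row in self._newest_first(domain)[:10]]


if __name__ == "__main__":
    table = ToyBlogPosts()
    for day in range(15):
        table.insert("example.com", day, f"https://example.com/{day}", "author", f"Post {day}", "text")
    print(table.singlepost("example.com")[0]["title"])
    print([row["published_date"] for row in table.timeline("example.com")])
```

Because rows inside a partition are already stored newest first, both queries only need the partition key and a LIMIT; no ORDER BY clause is required to get the latest posts.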
  11. tjake revised this gist Jul 31, 2014. 1 changed file with 5 additions and 0 deletions.
    5 changes: 5 additions & 0 deletions gistfile1.yml
    @@ -29,14 +29,19 @@ columnspec:
    - name: domain
    size: gaussian(5..100) #domain names are relatively short
    population: uniform(1..10M) #10M possible domains to pick from

    - name: published_at
    cluster: gaussian(1..1000) #under each domain we should have a avg of a few hundred posts

    - name: url
    size: uniform(30..300)

    - name: title #titles shouldn't go beyond 300 chars
    size: gaussian(10..200)

    - name: author
    size: uniform(5..20) #author names should be short

    - name: body
    size: gaussian(100..5000) #the body of the blog post can be long

  12. tjake revised this gist Jul 31, 2014. 2 changed files with 25 additions and 53 deletions.
    59 changes: 25 additions & 34 deletions gistfile1.yml
    @@ -1,3 +1,5 @@
    ### DML ###

    # Keyspace Name
    keyspace: stresscql

    @@ -11,44 +13,33 @@ table: blogposts
    # The CQL for creating a table you wish to stress (optional if it already exists)
    table_definition: |
    CREATE TABLE blogposts (
    url text PRIMARY KEY,
    author text,
    published_at timestamp,
    domain text,
    published_at timeuuid,
    url text,
    title text,
    body text,
    ) WITH comment='A table to hold blog posts'
    PRIMARY KEY(website, published_at)
    ) WITH compaction = { 'class':'LeveledCompactionStrategy' }
    AND comment='A table to hold blog posts'
    #
    # Optional meta information on the generated columns from the above table
    #
    # Tags are:
    # name: the name of the column
    # size: the size distribution (this only applies to text and bytes fields length)
    # Population distribution field represents the total unique population
    # distribution of that column across rows. Supported types are
    #
    # EXP(min..max) An exponential distribution over the range [min..max]
    # EXTREME(min..max,shape) An extreme value (Weibull) distribution over the range [min..max]
    # GAUSSIAN(min..max,stdvrng) A gaussian/normal distribution, where mean=(min+max)/2, and stdev is (mean-min)/stdvrng
    # GAUSSIAN(min..max,mean,stdev) A gaussian/normal distribution, with explicitly defined mean and stdev
    # UNIFORM(min..max) A uniform distribution over the range [min, max]
    # FIXED(val) A fixed distribution, always returning the same value
    # Aliases: extr, gauss, normal, norm, weibull
    #
    # If preceded by ~, the distribution is inverted
    #
    # Defaults for all columns are size: uniform(1..256), identity: uniform(1..1024)
    #
    columnspec:
    - name: name
    size: uniform(1..10)
    population: uniform(1..1M) # the range of unique values to select for the field (default is 100Billion)
    - name: date
    cluster: uniform(1..4)
    - name: lval
    population: gaussian(1..1000)
    cluster: uniform(1..4)
    ### Column Distribution Specifications ###

    columnspec:
    - name: domain
    size: gaussian(5..100) #domain names are relatively short
    population: uniform(1..10M) #10M possible domains to pick from
    - name: published_at
    cluster: gaussian(1..1000) #under each domain we should have a avg of a few hundred posts
    - name: url
    size: uniform(30..300)
    - name: title #titles shouldn't go beyond 300 chars
    size: gaussian(10..200)
    - name: author
    size: uniform(5..20) #author names should be short
    - name: body
    size: gaussian(100..5000) #the body of the blog post can be long

    insert:
    partitions: uniform(1..50) # number of unique partitions to update in a single operation
    # if perbatch < 1, multiple batches will be used but all partitions will
    19 changes: 0 additions & 19 deletions gistfile2.yml
    @@ -1,19 +0,0 @@
    # Keyspace Name
    keyspace: stresscql

    # The CQL for creating a keyspace (optional if it already exists)
    keyspace_definition: |
    CREATE KEYSPACE stresscql WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
    # Table name
    table: blogposts

    # The CQL for creating a table you wish to stress (optional if it already exists)
    table_definition: |
    CREATE TABLE blogposts (
    url text PRIMARY KEY,
    author text,
    published_at timestamp,
    title text,
    body text,
    ) WITH comment='A table to hold blog posts'
  13. tjake revised this gist Jul 31, 2014. 1 changed file with 6 additions and 4 deletions.
    10 changes: 6 additions & 4 deletions gistfile1.yml
    @@ -19,9 +19,12 @@ table_definition: |
    ) WITH comment='A table to hold blog posts'
    #
    # Optional meta information on the generated columns in the above table
    # The min and max only apply to text and blob types
    # The distribution field represents the total unique population
    # Optional meta information on the generated columns from the above table
    #
    # Tags are:
    # name: the name of the column
    # size: the size distribution (this only applies to text and bytes fields length)
    # Population distribution field represents the total unique population
    # distribution of that column across rows. Supported types are
    #
    # EXP(min..max) An exponential distribution over the range [min..max]
    @@ -40,7 +43,6 @@ columnspec:
    - name: name
    size: uniform(1..10)
    population: uniform(1..1M) # the range of unique values to select for the field (default is 100Billion)
    - name: choice
    - name: date
    cluster: uniform(1..4)
    - name: lval
  14. tjake created this gist Jul 31, 2014.
    71 changes: 71 additions & 0 deletions gistfile1.yml
    @@ -0,0 +1,71 @@
    # Keyspace Name
    keyspace: stresscql

    # The CQL for creating a keyspace (optional if it already exists)
    keyspace_definition: |
    CREATE KEYSPACE stresscql WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
    # Table name
    table: blogposts

    # The CQL for creating a table you wish to stress (optional if it already exists)
    table_definition: |
    CREATE TABLE blogposts (
    url text PRIMARY KEY,
    author text,
    published_at timestamp,
    title text,
    body text,
    ) WITH comment='A table to hold blog posts'
    #
    # Optional meta information on the generated columns in the above table
    # The min and max only apply to text and blob types
    # The distribution field represents the total unique population
    # distribution of that column across rows. Supported types are
    #
    # EXP(min..max) An exponential distribution over the range [min..max]
    # EXTREME(min..max,shape) An extreme value (Weibull) distribution over the range [min..max]
    # GAUSSIAN(min..max,stdvrng) A gaussian/normal distribution, where mean=(min+max)/2, and stdev is (mean-min)/stdvrng
    # GAUSSIAN(min..max,mean,stdev) A gaussian/normal distribution, with explicitly defined mean and stdev
    # UNIFORM(min..max) A uniform distribution over the range [min, max]
    # FIXED(val) A fixed distribution, always returning the same value
    # Aliases: extr, gauss, normal, norm, weibull
    #
    # If preceded by ~, the distribution is inverted
    #
    # Defaults for all columns are size: uniform(1..256), identity: uniform(1..1024)
    #
    columnspec:
    - name: name
    size: uniform(1..10)
    population: uniform(1..1M) # the range of unique values to select for the field (default is 100Billion)
    - name: choice
    - name: date
    cluster: uniform(1..4)
    - name: lval
    population: gaussian(1..1000)
    cluster: uniform(1..4)

    insert:
    partitions: uniform(1..50) # number of unique partitions to update in a single operation
    # if perbatch < 1, multiple batches will be used but all partitions will
    # occur in all batches (unless already finished); only the row counts will vary
    pervisit: uniform(1..10)/10 # ratio of rows each partition should update in a single visit to the partition,
    # as a proportion of the total possible for the partition
    perbatch: ~exp(1..3)/4 # number of rows each partition should update in a single batch statement,
    # as a proportion of the proportion we are inserting this visit
    # (i.e. compounds with (and capped by) pervisit)
    batchtype: UNLOGGED # type of batch to use

    #
    # A list of queries you wish to run against the schema
    #
    queries:
    simple1: select * from typestest where name = ? and choice = ? LIMIT 100
    range1: select * from typestest where name = ? and choice = ? and date >= ? LIMIT 100

    #
    # In order to generate data consistently we need something to generate a unique key for this schema profile.
    #
    seed: changing this string changes the generated data. its hashcode is used as the random seed.
    19 changes: 19 additions & 0 deletions gistfile2.yml
    @@ -0,0 +1,19 @@
    # Keyspace Name
    keyspace: stresscql

    # The CQL for creating a keyspace (optional if it already exists)
    keyspace_definition: |
    CREATE KEYSPACE stresscql WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
    # Table name
    table: blogposts

    # The CQL for creating a table you wish to stress (optional if it already exists)
    table_definition: |
    CREATE TABLE blogposts (
    url text PRIMARY KEY,
    author text,
    published_at timestamp,
    title text,
    body text,
    ) WITH comment='A table to hold blog posts'