@tjake
Last active September 8, 2024 04:11
### DML ###
# Keyspace Name
keyspace: stresscql
# The CQL for creating a keyspace (optional if it already exists)
keyspace_definition: |
  CREATE KEYSPACE stresscql WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
# Table name
table: blogposts
# The CQL for creating a table you wish to stress (optional if it already exists)
table_definition: |
  CREATE TABLE blogposts (
        domain text,
        published_at timeuuid,
        url text,
        author text,
        title text,
        body text,
        PRIMARY KEY(domain, published_at)
  ) WITH compaction = { 'class':'LeveledCompactionStrategy' }
    AND comment='A table to hold blog posts'
### Column Distribution Specifications ###
columnspec:
  - name: domain
    size: gaussian(5..100)       # domain names are relatively short
    population: uniform(1..10M)  # 10M possible domains to pick from
  - name: published_at
    cluster: gaussian(1..1000)   # under each domain we should have an average of a few hundred posts
  - name: url
    size: uniform(30..300)
  - name: title                  # titles shouldn't go beyond 200 chars
    size: gaussian(10..200)
  - name: author
    size: uniform(5..20)         # author names should be short
  - name: body
    size: gaussian(100..5000)    # the body of the blog post can be long
insert:
  partitions: uniform(1..50)   # number of unique partitions to update in a single operation
                               # if perbatch < 1, multiple batches will be used but all partitions will
                               # occur in all batches (unless already finished); only the row counts will vary
  pervisit: uniform(1..10)/10  # ratio of rows each partition should update in a single visit to the partition,
                               # as a proportion of the total possible for the partition
  perbatch: ~exp(1..3)/4       # number of rows each partition should update in a single batch statement,
                               # as a proportion of the proportion we are inserting this visit
                               # (i.e. compounds with (and capped by) pervisit)
  batchtype: UNLOGGED          # type of batch to use
#
# A list of queries you wish to run against the schema
#
queries:
  simple1: select * from blogposts where domain = ? LIMIT 100
  range1: select * from blogposts where domain = ? and published_at >= ? LIMIT 100
#
# In order to generate data consistently we need something to generate a unique key for this schema profile.
#
seed: changing this string changes the generated data. its hashcode is used as the random seed.
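
The comments on `pervisit` and `perbatch` in the `insert` section describe two ratios that compound. The Python sketch below is only one reading of those comments, not the tool's actual code: the function name `expected_rows` is hypothetical, and the values are left unrounded because the profile does not say how cassandra-stress rounds them.

```python
def expected_rows(total_rows, pervisit, perbatch):
    """Return (rows per visit, rows per batch) as unrounded expectations.

    Assumption: pervisit is a fraction of the partition's total rows, and
    perbatch is a fraction of the rows selected for this visit.
    """
    if not (0 < pervisit <= 1 and 0 < perbatch <= 1):
        raise ValueError("ratios must be in (0, 1]")
    per_visit = total_rows * pervisit   # share of the partition touched this visit
    per_batch = per_visit * perbatch    # share of that visit sent in one batch
    return per_visit, per_batch


# Hypothetical partition of 500 rows, visited at 50%, batched at 25% of the visit.
per_visit, per_batch = expected_rows(total_rows=500, pervisit=0.5, perbatch=0.25)
print(per_visit, per_batch)  # 250.0 62.5
```

Under this reading, a `perbatch` value below 1 means one visit is split across several batch statements, which matches the note next to `partitions`.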

ghost commented Aug 11, 2014

Hi,

If I wanted to just do selects and not inserts, would the following entries be sufficient in my test.yaml?

keyspace: test
table: user

queries:
simple1: SELECT * FROM user where id = ? and name = ?

It looks like when I execute the test, I get a
-- Exception in thread "main" java.lang.NullPointerException --

It seems to be expecting entries for the "columnspec" fields. When I run it with "columnspec" included, I get 0 values for "partitions", "pk/s" and "row/s", which I presume isn't a big deal since I do get everything else.

Thanks

@raju-nuovo

I am planning on using this tool. If I give it a query like "SELECT * FROM user where id = ? and name = ?", how do I pass the values for id and name for the load tests? Or does this tool automatically use the values in the table?

@naishe

naishe commented Oct 19, 2014

I think the semicolon should be at line #25 instead of #23.

@tjake
Author

tjake commented Dec 1, 2014

Updated to reflect profile changes in 2.1.1

@halgrim

halgrim commented Feb 11, 2015

The tool looks awesome, but my company uses the "map < text , text >" type in its schema.
How much effort would be needed to add support for this type?
Where would the changes need to be made?
Is it something that a junior developer can handle?

@infomaven

I'm getting an error when I try to run cassandra-stress. I downloaded the source code and built it today using ant 1.9.4.
Command> infomav:tools infomav:tools$ ./bin/cassandra-stress user profile=blogpost.yaml ops(singlepost=1)

Error> Error: Could not find or load main class org.apache.cassandra.stress.Stress

How do I fix this?

@infomaven

Found the issue. I was using the build command instead of the release command in Ant. This caused it to skip creation of the JAR files and binary tar files.

Once I changed commands, I found the correct tar archive (bin.tar.gz) in build folder and was able to unpack and use it to run cassandra-stress.

@srikanthr341

When I make the changes below:

- name: published_date
  cluster: fixed(100)

select: fixed(1)/100

I see very different results (very high op counts/row counts/pk counts and better mean latencies).
Somehow it appears to me that the tool itself is causing the delay when it generates the rows when the cluster: fixed values are high.

@arsonak47

How can I set the write consistency level when running a cassandra-stress test?

@mshuler

mshuler commented Sep 29, 2015

cl= gives the ability to set read/write consistency level, for example:
cassandra-stress write cl=QUORUM -schema 'replication(factor=3)'

@arunsandu

Hi,
I am trying to pass request_trace.yaml as input to the stress tool as below:
./cassandra-stress user profile=request_trace.yaml n=1000000 ops(likelyquery0=1,likelyquery1=2,insert=1) -node 10.32.100.16
The script works perfectly fine for the other tables, but I get the error below for the request_trace table. Please check the request_trace.yaml file that follows.
Can someone suggest a solution for this?

------------------------------------- request_trace.yaml---------------------------------------------------------
### DML ### THIS IS UNDER CONSTRUCTION!!!
# Keyspace Name
keyspace: autogeneratedtest
# The CQL for creating a keyspace (optional if it already exists)
keyspace_definition: |
  CREATE KEYSPACE autogeneratedtest WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
# Table name
table: request_trace
# The CQL for creating a table you wish to stress (optional if it already exists)
table_definition:
  CREATE TABLE request_trace (
        service_context_id text,
        trace_statement text,
        PRIMARY KEY (service_context_id, trace_statement)
  )
### Column Distribution Specifications ###
columnspec:
  - name: service_context_id
    size: gaussian(10..20)
    population: gaussian(300..500)
  - name: trace_statement
    size: gaussian(5..15)
    population: gaussian(800..1000)
### Batch Ratio Distribution Specifications ###
insert:
  partitions: fixed(1)   # Our partition key is the domain so only insert one per batch
  select: fixed(1)/1000  # We have 1000 posts per domain so 1/1000 will allow 1 post per batch
  batchtype: UNLOGGED    # Unlogged batches
#
# A list of queries you wish to run against the schema
#
queries:
  likelyquery0:
    cql: SELECT * FROM request_trace WHERE service_context_id = ?
    fields: samerow
  likelyquery1:
    cql: SELECT * FROM request_trace WHERE service_context_id = ? AND trace_statement = ?
    fields: samerow
ERROR:
Warming up likelyquery0 with 50000 iterations...
Warming up likelyquery1 with 50000 iterations...
Warming up insert with 50000 iterations...
Generating batches with [1..1] partitions and [0..0] rows (of [1..1] total rows in the partitions)
Exception in thread "main" com.datastax.driver.core.exceptions.SyntaxError: line 1:28 no viable alternative at input 'WHERE' (UPDATE "request_trace" SET  [WHERE]...)
    at com.datastax.driver.core.exceptions.SyntaxError.copy(SyntaxError.java:35)
    at com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:271)
    at com.datastax.driver.core.AbstractSession.prepare(AbstractSession.java:82)
    at org.apache.cassandra.stress.util.JavaDriverClient.prepare(JavaDriverClient.java:84)
    at org.apache.cassandra.stress.StressProfile.getInsert(StressProfile.java:396)
    at org.apache.cassandra.stress.settings.SettingsCommandUser$1.get(SettingsCommandUser.java:82)
    at org.apache.cassandra.stress.settings.SettingsCommandUser$1.get(SettingsCommandUser.java:78)
    at org.apache.cassandra.stress.operations.SampledOpDistributionFactory$1.get(SampledOpDistributionFactory.java:80)
    at org.apache.cassandra.stress.StressAction$Consumer.<init>(StressAction.java:269)
    at org.apache.cassandra.stress.StressAction.run(StressAction.java:204)
    at org.apache.cassandra.stress.StressAction.warmup(StressAction.java:105)
    at org.apache.cassandra.stress.StressAction.run(StressAction.java:61)
    at org.apache.cassandra.stress.Stress.main(Stress.java:114)
Caused by: com.datastax.driver.core.exceptions.SyntaxError: line 1:28 no viable alternative at input 'WHERE' (UPDATE "request_trace" SET  [WHERE]...)
    at com.datastax.driver.core.Responses$Error.asException(Responses.java:123)
    at com.datastax.driver.core.SessionManager$1.apply(SessionManager.java:167)
    at com.datastax.driver.core.SessionManager$1.apply(SessionManager.java:142)
    at com.google.common.util.concurrent.Futures$1.apply(Futures.java:713)
    at com.google.common.util.concurrent.Futures$ChainingListenableFuture.run(Futures.java:861)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

@john19may

Hi all, I am trying to define multiple table definitions and columnspec sections together in a single YAML file:

`table: user_info

table_definition: |
CREATE TABLE IF NOT EXISTS user_info (
user_id text,
password text,
name text,
pic blob,
dept text,
contacts list,
PRIMARY KEY ((user_id))
)

columnspec:
- name: user_id
  size: GAUSSIAN(17..35,27,3)

- name: password
  size: GAUSSIAN(8..200,12,1)

- name: name
  size: GAUSSIAN(5..40,12,3)

- name: pic
  size: UNIFORM(10000..200000)

- name: dept
  size: FIXED(100000)

- name: contacts
  size: UNIFORM(5..100)

table: category_info

table_definition: |
CREATE TABLE IF NOT EXISTS category_info (
user_id text,
category_id uuid,
category_name text,
category_color text,
unreads counter,
PRIMARY KEY ((user_id),category_id)
)

columnspec:
- name: user_id
  size: GAUSSIAN(17..35,27,3)

- name: category_id
  cluster: FIXED(100)

- name: category_name
  size: GAUSSIAN(5..40,14,3)

- name: category_color
  size: FIXED(7)

table: mails_by_category

table_definition: |
CREATE TABLE IF NOT EXISTS user_info (
week timestamp,
category_id uuid,
user_id text,
all_unread text,
time timestamp,
mail_id uuid,
from_id text,
header text,
content_id uuid,
family_id uuid,
is_thread boolean,
is_read boolean,
is_starred boolean,
categories list,
PRIMARY KEY ((week,category_id,user_id),all_unread,time,mail_id)
)
WITH CLUSTERING ORDER BY (all_unread ASC, time DESC, mail_id ASC)

columnspec:
- name: week
  population: FIXED(50000)

- name: category_id
  population: FIXED(100)

- name: user_id
  size: GAUSSIAN(17..35,27,3)

- name: all_unread
  size: FIXED(5)
  population: FIXED(2)

- name: from_id
  size: GAUSSIAN(17..35,27,3)

- name: header
  size: UNIFORM(0..10000)

- name: categories
  size: UNIFORM(8..100)

`

But I am getting the error "unconfigured columnfamily category_info" when I try to run the insert test.

Thanks in advance.

@jagadeesh4u

May I know your CPU, RAM, and cluster size?

@dragon-laurance

Could you tell me what "cluster: uniform(20..40)" does? My English is not so good.
Thanks

@dragon-laurance

How should I understand this?

Cluster distribution - Defines the distribution for the number of clustering prefixes within a given partition (default of FIXED(1))
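
One way to read this definition, for a table with a single clustering column: the cluster distribution decides how many distinct clustering values, and so how many rows, each partition can hold. With "cluster: uniform(20..40)", every partition would get between 20 and 40 rows. The Python sketch below only illustrates that idea under this assumption. The helper `rows_per_partition` is hypothetical, and the tool's own sampling, which is driven by the profile's seed, will produce different numbers.

```python
import random


def rows_per_partition(num_partitions, low, high, seed=0):
    """Draw a row count for each partition from uniform(low..high), inclusive.

    Assumption: one clustering column, so the number of clustering values
    in a partition equals the number of rows in that partition.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs give the same counts
    return [rng.randint(low, high) for _ in range(num_partitions)]


# cluster: uniform(20..40) -> each partition holds 20 to 40 rows
for i, n in enumerate(rows_per_partition(num_partitions=5, low=20, high=40)):
    print(f"partition {i}: {n} rows")

# The default, fixed(1), corresponds to exactly one row per partition.
print(rows_per_partition(num_partitions=3, low=1, high=1))
```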
