| layout | title | version | description |
|---|---|---|---|
default |
Modern T-SQL Style Guide |
0.0.2-alpha |
A guide to writing clean, clear, and consistent T-SQL. |
You can use this set of guidelines, fork them, or make your own - the key here is that you pick a style and stick to it. The odds of making everyone happy are low, so compromise is a guiding principle to achieve cohesion.
These guidelines are designed to make T-SQL statements easy to write, easy to read, and easy to maintain. This guide is opinionated in some areas and relaxed in others.
It is easy to include this guide in Markdown format as a part of a project's code base or reference it here for anyone on the project to freely read.
Modern T-SQL style guide created by Matt McElheny is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Based on various SQL standards including http://www.sqlstyle.guide and Kickstarter guide.
This document is to be used as a guide for anyone who would like to write clean and clear T-SQL code that is meant to be shared.
NOTE: This style guide is written for use with Microsoft SQL Server, but much of it can be applied to any SQL database with some simple modifications.
- We take a disciplined and practical approach to writing code.
- We treat T-SQL like any other source code, which should be checked into source control, peer reviewed, and properly maintained.
- We believe consistency in style is important, and we value craftsmanship, but not to the exclusion of other practical concerns.
- We demonstrate intent explicitly in code, via clear structure and comments where needed.
Before getting into all the rules, here is a quick look at some examples showing well formatted SQL that matches the recommendations in this style guide:
-- basic select example
select u.user_name as employee_user_name
, u.created as created_date
, d.[name] as department_name
from dbo.users as u
join dbo.departments as d on u.department_id = d.id
where u.created >= dateadd(year, -1, getutcdate())
order by created_date desc-- basic insert example
insert into dbo.categories (
category_code
,category_name
,created_at
)
values (
'MOV'
,'Movies'
,getutcdate()
)-- basic update example
update e
set e.salary = e.salary * 1.04
, updated_at = getutcdate()
from dbo.employees as e
where e.department_id in (1, 2, 3)-- basic delete example
delete g
from dbo.gifts as g
where g.is_purchased = 1
and g.updated_at <= dateadd(month, -6, getutcdate())
and g.return_date is null-
Do not SHOUTCASE SQL keywords (e.g., prefer
select, notselect) -
Favor using a "river" for vertical alignment so that a query can be quickly and easily be scanned by a new reader.
-
Comments should appear at the top of your query, and should explain the intent of the query, not the mechanics.
-
Try to comment things that aren't obvious about the query (e.g., why a particular filter is necessary, why an optimization trick was needed, etc.)
-
Favor being descriptive over terseness:
GOOD:
select usr.user_name as customer_user_nameBAD:
select usr.user_name as custnm -
Follow any existing style in the script before applying this style guide. The SQL script should have one clear style, and these rules should not be applied to existing scripts unless the whole script is being changed to adhere to this style guide.
-
Favor storing
datetimeanddatetime2in UTC unless embedding timezone information (datetimeoffset) so that times are clear and convertible. Use ISO-8601 compliant time and date information (YYYY-MM-DD HH:MM:SS.SSSSS) when referring to date/time data.
-
Names should be
underscore_separatedorPascalCasebut do not mix styles.GOOD:
select count(*) as the_tally, sum(*) as the_totalBAD:
select count(*) as TheTally, sum(*) as theTotal
-
Do not use reserved words for table names if possible.
-
Prefer the shortest commonly understood words to name a table.
-
Naming a table as a plural makes the table easier to speak about. (e.g. favor
employeesoveremployee) -
Do not use object prefixes or Hungarian notation (e.g.
sp_,prc_,vw_,tbl,t_,fn_, etc). -
Never give a table the same name as one of its columns and vice versa.
-
Use
_xref_for many-to-many tables rather than concatenating table names:GOOD:
drivers_xref_carsBAD:
drivers_cars -
Tables should always have a primary key. Surrogate, auto-number identity primary keys are preferable in most circumstances. Natural keys can be enforced with unique constraints in lieu of making them a primary key.
- Do not use reserved words for column names if possible.
- Prefer not simply using
idas the primary identifier for the table if possible. - Do not add a column with the same name as its table and vice versa.
- Avoid common words like
name,description, etc. Prefer a descriptive prefix for those words so that they don't require aliases when joined to other tables with similarly named columns.
- Should relate in some way to the object or expression they are aliasing.
- as a rule of thumb the correlation name should be the first letter of each word in the object's name.
- If there is already a correlation with the same name then append a number.
- Always include the
askeyword—makes it easier to read as it is explicit. - For computed data (
sum()oravg()) use the name you would give it were it a column defined in the schema.
- No tabs. Use spaces for indents.
- Configure your editor to 4 spaces per indent, but prefer your SQL to indent to the "river" and not to a set indent increment.
- No trailing whitespace.
- No more than two lines between statements.
- No empty lines in the middle of a single statement.
- One final newline at the end of a file
- Use an .editorConfig file to enforce reasonable whitespace rules if your T-SQL editor supports it:
# .editorConfig is awesome: https://EditorConfig.org
# SQL files
[*.{sql,tsql,ddl}]
charset = utf-8
indent_style = space
indent_size = 4
end_of_line = crlf
trim_trailing_whitespace = true
insert_final_newline = trueSpaces should be used to line up the code so that the root keywords all end on the same character boundary. This forms a "river" down the middle making it easy for the readers eye to scan over the code and separate the keywords from the implementation detail. Rivers are bad in typography, but helpful here. Celko's book describes using a river to vertically align your query. Right align keywords to the river.
-- a river in the 7th column helps vertical readability
select emp.id as employee_id
, dpt.name as department
, dpt.location
, emp.first_name as employee_first_name
, emp.last_name as employee_last_name
, mgr.first_name as manager_first_name
, mgr.last_name as manager_last_name
from dbo.employees as emp
join dbo.departments as dpt on emp.department_id = dpt.id
left join dbo.employees as mgr on emp.manager_id = mgr.id
where dpt.name in ('IT', 'DBA', 'Analytics')
and (emp.title like '%DBA%' or emp.title like '%Database%')
order by dpt.name
, emp.last_name
, emp.first_name-- alternately, a river in the a different column is fine if that is preferred
-- due to longer keywords, but know that indenting can feel off if the `select`
-- is not in the first column for the query
select emp.id as employee_id
, dpt.name as department
, dpt.location
, emp.first_name as employee_first_name
, emp.last_name as employee_last_name
, mgr.first_name as manager_first_name
, mgr.last_name as manager_last_name
from dbo.employees as emp
join dbo.departments as dpt on emp.department_id = dpt.id
left join dbo.employees as mgr on emp.manager_id = mgr.id
where dpt.name in ('IT', 'DBA', 'Analytics')
and (emp.title like '%DBA%' or emp.title like '%Database%')
order by dpt.name
, emp.last_name
, emp.first_name-- Using a river can be tedious, so if this alignment is not preferred by your
-- team, then standard 4 space indenting should be used in place of a river
-- with the major keywords occupying their own line. (i.e. `select`, `from`,
-- `where`, `group by`, `having`, `order by`):
select
emp.id as employee_id
,dpt.name as department
,dpt.location
,emp.first_name as employee_first_name
,emp.last_name as employee_last_name
,mgr.first_name as manager_first_name
,mgr.last_name as manager_last_name
from
dbo.employees as emp
join dbo.departments as dpt on emp.department_id = dpt.id
left join dbo.employees as mgr on emp.manager_id = mgr.id
where
dpt.name in ('IT', 'DBA', 'Analytics')
and (emp.title like '%DBA%' or emp.title like '%Database%')
order by
dpt.name
,emp.last_name
,emp.first_nameselect the first column on the same line, and align all additional columns to the first column on their own line:
select projects.name
, users.email
, projects.country
, count(backings.id) as backings_count
from ...Use commas as a prefix as opposed to a suffix. This is preferred because:
- It makes it easy to add new columns to the end of the column list
- It prevents unintentional aliasing bugs
- It makes commenting out columns much easier
- Column lists are more likely to change at the end than at the beginning
The comma should border the "river".
GOOD:
select user_name as customer_user_name
, created_at
, is_deleted
...BAD:
-- whoops! forgot a trailing comma because it's hard to see, making an
-- accidental alias
select usr.user_name as customer_user_name,
created_at
is_deleted
...Always rename aggregates, case derived columns, and function-wrapped columns:
select product_name
, sum(amount) as sum_amount
from ...Always use table aliases for all columns when querying from more than one table. Single character aliases are fine for a few tables, but are less likely to be clear as a query grows:
select prj.name as project_name
, count(bck.id) as backings_count
from dbo.backings as bck
join dbo.projects as prj
on ...Always use as to rename columns. as statements can be used for additional
vertical alignment but don't have to be:
GOOD:
select projects.name as project_name
, count(backings.id) as backings_count
...BAD:
select prj.name project_name
, count(bck.id) backings_count
...Do not bracket-escape table or column names unless the names contain keyword collisions or would cause a syntax error without properly qualifying them.
GOOD:
-- name and state are keywords
select [name]
, street1
, city
, [state]
, zip
from dbo.peopleBAD:
-- the brackets are messy and unnecessary
select [name]
, [street1]
, [city]
, [state]
, [zip]
from [dbo].[people]Long Window functions should be split across multiple lines: one for each
clause, aligned to the partition keyword. Partition keys can share the same
line, or be split. Ascending order is an intuitive default and thus using an
explicit asc is not necessary whereas desc is. All window functions should
be aliased.
row_number() over (partition by prj.category_id, prj.year
order by pledged desc, last_name, first_name
) as sequence_numbercase statements aren't always easy to format but try to align when, then,
and else together inside case and end.
then can stay on the when line if needed, but aligning with else is
preferable.
select prj.name as project_name
, prj.created_at
, case when prj.category in ('stuff', 'and', 'things')
then 'Amazing!'
else 'Average'
end as opinion
from ...Only one table should be in the from. Never use comma separated from-joins:
GOOD:
select projects.name as project_name
, count(backings.id) as backings_count
from dbo.projects as projects
join dbo.backings as backings on backings.project_id = projects.id
...BAD:
select projects.name as project_name
, count(backings.id) as backings_count
from dbo.projects as projects, dbo.backings as backings
where backings.project_id = projects.id
...Favor not using the extraneous words inner or outer when joining tables.
Alignment is easier without them, they don't add to the understanding of the
query, and the full table list is easier to scan without excessive staggering:
GOOD:
select projects.name as project_name,
, count(backings.id) as backings_count
from dbo.projects as projects
join dbo.backings as backings on ...
join ...
left join dbo.backer_rewards as backer_rewards on ...
left join ...BAD:
select projects.name as project_name,
, count(backings.id) as backings_count
from dbo.projects as projects
inner join dbo.backings as backings on ...
inner join ...
left outer join dbo.backer_rewards as backer_rewards on ...
left outer join ...The on keyword and condition goes on the inner join line:
select projects.name as project_name
, count(backings.id) as backings_count
from dbo.projects as projects
join dbo.backings as backings on projects.id = backings.project_id
...Additional filters in the join go on new indented lines. Line up using the
on keyword as a second river:
select projects.name as project_name
, count(backings.id) as backings_count
from dbo.projects as projects
join dbo.backings as backings on projects.id = backings.project_id
and backings.project_country <> 'US'
...Begin with inner joins and then list left joins, order them semantically,
and do not intermingle left joins with inner joins unless necessary:
GOOD:
join dbo.backings as backings on ...
join dbo.users as users on ...
join dbo.locations as locations on ...
left join dbo.backer_rewards as backer_rewards on ...
left join ...BAD:
left join dbo.backer_rewards as backer_rewards on backings
join dbo.users as users on ...
left join ...
join dbo.locations as locations on ...Avoid right joins as they are usually better written with a left join
BAD:
from dbo.backer_rewards as backer_rewards
right join dbo.users as users on ...GOOD:
from dbo.users as users
left join dbo.backer_rewards as backer_rewards on ...Multiple where clauses should go on different lines and align to the river:
select name
, goal
from dbo.projects as projects
where country = 'US'
and deadline >= '2015-01-01'
...When mixing and and or statements, do not rely on order of operations and
instead always use parenthesis to make the intent clear:
select name
, goal
from dbo.projects as projects
where (country = 'US'
and deadline >= '2015-01-01')
or include = 1
...Always put a semicolon on its own line when using them. This prevents common
errors like adding conditions to a where clause and neglecting to move the
trailing semicolon:
BAD:
-- messing with a where clause, it's easy to miss a trailing semicolon
delete u
from dbo.users u
where created_at < dateadd(year, -1, getutcdate());
and user_name like '%test_user%' -- whoops, missed that trailing semicolon!
...GOOD:
-- semicolons that start a line are easy to spot
delete u
from dbo.users u
where created_at < dateadd(year, -1, getutcdate())
and user_name like '%test_user%'
; -- < if we add a condition, we'll see this semicolon
...Maintain the same column order as the select clause in the group by:
BAD:
-- messing with the 'group by' order makes it hard to scan for accuracy
select p.project_name
, p.country
, sum(p.pledged) as total_pledged
from dbo.projects as p
group by p.country -- out of order
, p.project_nameGOOD:
select p.project_name
, p.country
, sum(p.pledged) as total_pledged
from dbo.projects as p
group by p.project_name
, p.countryDo not use the superfluous asc in order by statements:
BAD:
-- asc is clutter - it's never ambiguous when you wanted to sort ascending
select p.project_name
, p.country
from dbo.projects as p
order by p.country asc
, p.project_name ascGOOD:
-- asc is clutter - it's never ambiguous when you wanted to sort ascending
select p.project_name
, p.country
from dbo.projects as p
order by p.country
, p.project_nameOrdering by column number is okay, but not preferred:
-- This is okay, but not great. It's better when running ad-hoc queries or
-- queries with calculated columns
select p.project_name
, case when p.category in ('Science', 'Technology', 'Engineering')
then 'STEM'
else 'Other'
end as category_group
, p.country
from dbo.projects as p
order by 2, 1

Just a few thoughts about using commas as a prefix as opposed to a suffix.
Modern development tools make these justifications largely irrelevant. I am not using a simple text editor for SQL development and the design tools at my disposal allow me to add columns to whatever position I want with equal ease. There is no perceived need to shave 3 ms off the time it takes me to do so. And in my experience, there is no single, most common column placement. I usually use the WYSIWYG-ish table editor, and then simply move the column if needed in the generated SQL pane. The build and release tooling makes the necessary updates in the correct order to the database, including column placement, when the changes are deployed. Easy peezy.
Syntax errors cannot survive undetected in any reasonably effective SQL editor. And of course, the project will not build, and it will tell me exactly where the problem is. I'm not sure what you mean by "aliasing bugs (missing comma)", but a missing comma is a syntax error and will cause the build to fail. I realize that many developers are not yet using database projects, builds and deployments, but that is a much more pressing problem to fix than style. And I'm only mentioning builds/deployments because they are the natural next step after getting the database schema into a source control system, which you cite above.
Making it easier to comment out code and leave it in the code base is a mistake. Only active, reachable code should be deployed. I can see where being able to comment out sections of code can aid in testing/debugging, but having commas at the end of the line makes no difference except when commenting out the last column. But when the code is finalized and checked in for build & release, no commented out code should remain. And because much of the SQL code in my project, for example, is generated by the tooling itself with trailing commas, I think it makes sense to use that style unless it's just bad, which it isn't.
'... the prefix comma makes it clear when a new column starts'
-- A trailing comma also makes it clear when a new column starts.
You're correct in that it doesn't affect readability; one way or the other.
Leading commas take more time and effort to maintain. To wit, they waste time. And although the format looks nice, it offers no practical benefit for the extra time invested. Over time as code is updated, fragile styles such as this invariably get messed up from time to time and must be fixed. Trailing commas don't have this problem.
Trailing commas are
At the end of the day, developers will use the style they want. I understand that most styles are based on opinions and personal preferences. I intend no disrespect. I only wanted to point out that IMHO, the justifications cited for leading commas do not apply in a modern development environment. Some tools are more suited for styles like this while others are not. I think any style being published for others to consider ought to stick mostly to proven practicality in the development environments most likely in use. IOW, what is the tangible value of the style over the alternatives?
I agree completely with the principles listed above. Source-controlled databases are very good and increasingly becoming the norm. But I would add that once the database schema is source-controlled, it should have first class, versioned (i.e. 2.5.0), automated builds and deployments, to the extent possible. And most of the time, it is.