What does a data source need to do to support SQL Expressions?

It must be a backend data source, because SQL Expressions run server-side.

SQL Expressions supports two general categories of responses:

Tabular data: A single dataframe, with no labels on any of the fields (columns). In short, a dataframe that can be directly mapped to a SQL table.
Labeled Metric Time Data (Timeseries or Numeric): Data that meets the Dataplane spec and has the Frame.Meta.Type property set.

Therefore, data source support is really per query type or response type within a data source, so it is often not all or nothing.

Tabular Data

For tabular data responses, SQL Expressions should work out of the box, so the only thing to do is test a basic SELECT query.

Labeled Metric Time Data (Timeseries or Numeric)

For labeled metric data, SQL Expressions detects metrics based on the Frame.Meta.Type property of the data frame. This property holds the dataplane type.

The supported types are:

timeseries-multi
timeseries-wide
numeric-multi
numeric-wide

(Note: the timeseries-long and numeric-long types fall into the tabular category).
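
As a rough illustration of the list above, the following Go sketch checks whether a frame type string is one of the labeled metric types. The helper name `isLabeledMetricType` and the use of plain strings are assumptions for this example; it is not how SQL Expressions or the plugin SDK implement the check.

```go
package main

import "fmt"

// labeledMetricTypes holds the four dataplane types that this document
// lists as labeled metric data for SQL Expressions.
var labeledMetricTypes = map[string]bool{
	"timeseries-multi": true,
	"timeseries-wide":  true,
	"numeric-multi":    true,
	"numeric-wide":     true,
}

// isLabeledMetricType is a hypothetical helper (not an SDK function).
// It reports whether frameType is one of the labeled metric types.
func isLabeledMetricType(frameType string) bool {
	return labeledMetricTypes[frameType]
}

func main() {
	for _, t := range []string{"numeric-wide", "timeseries-long", ""} {
		fmt.Printf("%q labeled metric: %v\n", t, isLabeledMetricType(t))
	}
}
```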

So if your DS has metric data that matches one of those data types, it should work as long as Frame.Meta.Type (serialized into json as schema.meta.type) is set to that data type, for example:

[
  {
    "schema": {
      "meta": {
        "type": "numeric-wide", // SQL expressions need this property set for labeled metric data
        "typeVersion": [0, 1], // optional for SQL Expressions (so can be default 0.0)
        // TypeVersion > 0.0 should make other SSE operations more deterministic,
        // but if not a new DS, safest path is to do that as a separate task.
        // ...
      }
    },
    "fields": [
        // ...
    ]
  }
]

In Go code:

package main

import (
    "github.com/grafana/grafana-plugin-sdk-go/data"
)

func main() {
    frame := data.NewFrame("")
    // ... add fields and data to create a "NumericWide" frame.
    frame.Meta = &data.FrameMeta{Type: data.FrameTypeNumericWide}
}

When SQL Expressions receives labeled metric data, it converts (flattens) the data into the full-long format for the corresponding kind (timeseries or numeric), because SQL has no notion of labels. The conversion happens once the data source query is selected by RefID from a SQL expression.

Manual Testing

In a dashboard, for each response type your query offers:

Add your data source query
Add Expression -> Type SQL
The default query should be SELECT * from A LIMIT 10 (assuming your query is RefID A). If this works on a few variations of that query type, it should be compatible with SQL Expressions.
If it doesn't work, see the section above and check that you are sending the metadata with one of the data types mentioned there. If you are, or if the response is tabular data and it still isn't working, there may be an issue with SQL Expressions.

Full Long Formats

Two new data types have been introduced with SQL Expressions: timeseries-full-long and numeric-full-long. These are row-oriented tabular formats similar to long (labels are not used), but structured in a way that allows more lossless conversion of information to and from the labeled (wide and multi) formats.

The use case for these formats is converting between labeled and tabular formats with better preservation of the information, whereas converting back and forth with long can create artifacts.

Properties shared by Full Long Formats

They are single frame formats.

There are three reserved fields (columns):

__metric_name__: The name of the metric
__value__: A nullable *float64 numeric field holding the value for the row's combination of metric name and dimensions.
__display_name__: (Optional) When converting from other formats, if the DisplayNameFromDS property is set, it is flattened into this field.
Note: Additional reserved names may be added in the same naming style. This would be to flatten other column metadata such as data links or units.

Dimension Columns

Like long, any dimensions (that would be labels in wide or multi) become their own field. The dimension field's name corresponds to the key of the dimension, and the values are in the rows of that field.

In the case of full-long, they are nullable string fields. When the value on a row is null, the label is considered absent. This helps with more correct conversion to and from the wide and multi formats, because not all of the items have the same set of label keys.
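
To make the null handling concrete, here is a minimal Go sketch that rebuilds a label set from the dimension cells of one full-long row. The function `labelsFromRow` and the map-based row representation are assumptions made for this example, not SDK types or the actual SQL Expressions code.

```go
package main

import "fmt"

// labelsFromRow is a hypothetical helper. dims maps a dimension column name
// to that row's nullable string cell (nil means the cell is null).
// Null cells are skipped, so the label is absent from the result.
func labelsFromRow(dims map[string]*string) map[string]string {
	labels := make(map[string]string)
	for key, val := range dims {
		if val == nil {
			continue // null cell: this item never had the label
		}
		labels[key] = *val
	}
	return labels
}

// strPtr returns a pointer to s, for building nullable cells.
func strPtr(s string) *string { return &s }

func main() {
	// Row with host=b and a null region, as in a frame where only some
	// items carry a region label.
	row := map[string]*string{"host": strPtr("b"), "region": nil}
	fmt.Println(labelsFromRow(row)) // map[host:b]
}
```

With a non-nullable string column, the absent region would have to be encoded as some placeholder value, and a round trip would then produce a label that the original item did not have.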

Numeric Full Long

The numeric kind is for when each item in the response has a value. The numeric kind in the full long format is identified by the data frame type numeric-full-long.

Examples

Single Metric Example

__metric_name__	__value__	host	region
cpu_load	0.82	a	us-east-1
cpu_load	0.61	b	us-east-1

Multi-Metric Example

__metric_name__	__value__	host	region
cpu_load	0.82	a	us-east-1
cpu_load	0.61	b	null
disk_free_bytes	1.23e+12	a	us-east-1
disk_free_bytes	9.80e+11	b	us-east-1

Timeseries Full Long

The timeseries kind shares the same properties as the numeric kind for the full long format in terms of reserved columns and the behavior of dimensions. The timeseries kind in the full long format is identified by the data frame type timeseries-full-long.

The main difference is that there must be a time column that is sorted by time in ascending order (old to new). When this type is produced by converting from the multi or wide formats, the name of the field (column) is kept.
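
The ordering requirement can be sketched in Go as follows. The `tsRow` struct is a simplified, hypothetical stand-in for one row of a timeseries-full-long frame, and the helper names are assumptions for this example only.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// tsRow is a hypothetical stand-in for one row of a timeseries-full-long
// frame (not an SDK type).
type tsRow struct {
	Time       time.Time
	MetricName string
	Value      *float64
	Host       *string // example dimension column
}

// sortByTimeAsc orders rows old to new. The stable sort keeps the relative
// order of rows that share a timestamp.
func sortByTimeAsc(rows []tsRow) {
	sort.SliceStable(rows, func(i, j int) bool {
		return rows[i].Time.Before(rows[j].Time)
	})
}

// isSortedByTimeAsc reports whether rows are already in ascending time order.
func isSortedByTimeAsc(rows []tsRow) bool {
	return sort.SliceIsSorted(rows, func(i, j int) bool {
		return rows[i].Time.Before(rows[j].Time)
	})
}

// exampleRows returns a few out-of-order rows for demonstration.
func exampleRows() []tsRow {
	t0 := time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC)
	return []tsRow{
		{Time: t0.Add(2 * time.Minute), MetricName: "cpu_load"},
		{Time: t0, MetricName: "cpu_load"},
		{Time: t0.Add(time.Minute), MetricName: "disk_free_bytes"},
	}
}

func main() {
	rows := exampleRows()
	fmt.Println("sorted before:", isSortedByTimeAsc(rows))
	sortByTimeAsc(rows)
	fmt.Println("sorted after:", isSortedByTimeAsc(rows))
}
```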

How Full Long differs from the Long format

The long format is partially flattened, whereas the full-long format is fully flattened.

In the full-long formats, the metric name is treated as another dimension. This creates two reserved fields for this purpose in full long: __value__ and __metric_name__.

In the long format, each unique metric name gets a field (column). The field's Name property is the metric name. Like full-long, the dimensions that would be labels in the wide and multi formats get flattened. But in long the metric name is not treated as a dimension and does not get flattened, whereas it does in full-long.

The full-long format also uses nullable strings for dimensions. This means that if the value is null for the row, then when converting to labels, that label should be considered not present. This allows full-long to avoid creating label sets that didn't exist during a round trip conversion from wide or multi.

In the case of the timeseries kind, roundtrip conversion (multi or wide to long, and then back) with long can create false datapoints. Since long does not flatten the metric name portion, there is no way to represent timestamped data points that exist for one set of series sharing a name but not for the other set of series. Since full-long is fully flattened into __value__ and __metric_name__, those missing points don't get falsely created the way they do with long during roundtrip conversion.

In long, the numeric type for the value is preserved. For example, a metric that is a uint64 will stay that way. Because full-long flattens to a single column (__value__), all numeric values become nullable float64 values. In most cases, upstream we effectively convert to something like a float64 (or JS float), so this likely doesn't matter much in current practice.
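
As a small illustration of that type change, the Go sketch below converts a nullable uint64 into a nullable float64. The helper `toNullableFloat64` is hypothetical and not the actual conversion code. It also shows the one edge case worth knowing about: float64 has a 53-bit significand, so very large integers may not survive the conversion exactly.

```go
package main

import "fmt"

// toNullableFloat64 is a hypothetical helper that mirrors the idea of
// flattening any numeric value into a nullable float64 __value__ cell.
func toNullableFloat64(v *uint64) *float64 {
	if v == nil {
		return nil // null stays null
	}
	f := float64(*v)
	return &f
}

func main() {
	small := uint64(42)
	big := uint64(1<<53 + 1) // 9007199254740993, not exactly representable as float64

	fmt.Println(*toNullableFloat64(&small))              // 42
	fmt.Println(uint64(*toNullableFloat64(&big)) == big) // false
}
```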

Conversion between `wide`/`multi` and `full-long` formats

From `wide`/`multi` to `full-long`

Fundamentally, we are flattening a column-oriented structure with labels into a row-oriented structure without labels.
Each label key present becomes a field (column) with the name of the label key.
All the numeric value columns are combined and flattened into one __value__ field. Whatever the numeric type is, it is converted to a nullable *float64 field.
The field.Name from wide/multi gets flattened into the __metric_name__ field.
For the timeseries kind, the first field of type time gets flattened and keeps its name.
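
The steps above can be sketched in Go for the numeric kind. This is a simplified illustration under stated assumptions, not the actual SQL Expressions implementation: `wideField`, `fullLongRow`, and `flattenNumericWide` are hypothetical names, the input values are already float64, and dimension columns are ordered alphabetically only to keep the output deterministic.

```go
package main

import (
	"fmt"
	"sort"
)

// wideField is a hypothetical stand-in for one labeled value field of a
// numeric wide frame: a metric name, its labels, and a single value.
type wideField struct {
	Name   string
	Labels map[string]string
	Value  float64
}

// fullLongRow is a hypothetical stand-in for one numeric-full-long row.
type fullLongRow struct {
	MetricName string             // __metric_name__
	Value      *float64           // __value__
	Dims       map[string]*string // dimension columns; nil means null
}

// flattenNumericWide turns labeled fields into label-free rows.
// It returns the dimension column names and one row per input field.
func flattenNumericWide(fields []wideField) ([]string, []fullLongRow) {
	// Every label key present on any field becomes a dimension column.
	seen := map[string]bool{}
	var dimKeys []string
	for _, f := range fields {
		for k := range f.Labels {
			if !seen[k] {
				seen[k] = true
				dimKeys = append(dimKeys, k)
			}
		}
	}
	sort.Strings(dimKeys) // assumption: alphabetical column order

	rows := make([]fullLongRow, 0, len(fields))
	for _, f := range fields {
		v := f.Value
		row := fullLongRow{MetricName: f.Name, Value: &v, Dims: map[string]*string{}}
		for _, k := range dimKeys {
			if lv, ok := f.Labels[k]; ok {
				row.Dims[k] = &lv
			} else {
				row.Dims[k] = nil // field lacks this label: null cell
			}
		}
		rows = append(rows, row)
	}
	return dimKeys, rows
}

// cell renders a nullable string for printing.
func cell(p *string) string {
	if p == nil {
		return "null"
	}
	return *p
}

func main() {
	keys, rows := flattenNumericWide([]wideField{
		{Name: "cpu_load", Labels: map[string]string{"host": "a", "region": "us-east-1"}, Value: 0.82},
		{Name: "cpu_load", Labels: map[string]string{"host": "b"}, Value: 0.61},
	})
	for _, r := range rows {
		fmt.Print(r.MetricName, "\t", *r.Value)
		for _, k := range keys {
			fmt.Print("\t", cell(r.Dims[k]))
		}
		fmt.Println()
	}
}
```

The second input field has no region label, so its region cell is null rather than an empty string, which is what lets a later conversion back to wide or multi leave that label off.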

Use in SQL expressions

When SQL Expressions receives data of the timeseries or numeric kinds in the wide or multi formats, it converts it to the corresponding full long format. This is because there is no notion of labels in SQL, so the data must be flattened into a tabular format.
