Skip to main content

Documentation Index

Fetch the complete documentation index at: https://docs.formal.ai/llms.txt

Use this file to discover all available pages before exploring further.

Overview

The Formal SQL parser tags every column reference with the list of functions it passes through (column.in_functions in policy input). Each function carries one or more categories sourced from the catalogues below, so policies can branch on category instead of name. The catalogue is split per dialect: a single Common file holds names shared across every supported dialect with the same meaning, and each technology has its own file for entries that are dialect-specific or have a different interpretation in that dialect. There are 766 entries across all dialects.

How policies use this

Read the categories off column.in_functions[].categories in Rego. For example, hash any email column that went through an aggregate:
package formal.v2

import future.keywords.if
import future.keywords.in

in_aggregate(column) if {
  some func in column.in_functions
  "aggregate" in func.categories
}

response := {
  "action": "mask",
  "type": "hash.no_salt",
  "columns": columns,
} if {
  columns := [c |
    c := input.row[_]
    c.data_label == "email_address"
    in_aggregate(c)
  ]
  columns != []
}

Categories

Each function is tagged with one or more categories, organized in two groups.

Shape categories

Describes how the function participates in SQL evaluation. Every catalogued function carries at least one shape category.
CategoryDefinition
scalarOperates on individual values: one input row produces one output value. The default shape for non-aggregate, non-window functions.
aggregateReduces multiple input rows to a single output value (SUM, COUNT, STRING_AGG, ARRAY_AGG).
windowComputes a value per row using a window of related rows, typically with an OVER clause (ROW_NUMBER, LAG, RANK).
tableReturns a set of rows; called in a FROM clause (UNNEST, GENERATE_SUBSCRIPTS, JSON_TABLE).

Domain categories

Describes the kind of data the function operates on or returns. A function can have zero, one, or several domain categories.
CategoryDefinition
numericProduces or operates on numeric values (ABS, ROUND, SQRT, MOD).
string_and_binaryOperates on character strings or byte strings (CONCAT, SUBSTRING, REGEXP_REPLACE, HEX).
date_timeOperates on date, time, or timestamp values (EXTRACT, DATE_PART, AGE, DATE_FORMAT).
conditionalReturns one of several values based on a condition (CASE, COALESCE, NULLIF, IFNULL).
conversionCoerces a value from one type to another (CAST, TO_DATE, CONVERT, INET_ATON).
bitwiseOperates on bit-level representations (BITAND, BITOR, BITXOR).
hashComputes a hash digest of its input (MD5, SHA1, SHA2, Snowflake HASH).
encryptionEncrypts or decrypts data (AES_ENCRYPT, AES_DECRYPT, Snowflake ENCRYPT).
semi_structuredOperates on nested formats: JSON, JSONB, VARIANT, OBJECT, ARRAY, XML.
geospatialOperates on geographic or geometric values (Snowflake ST_*).
vector_similarityComputes similarity or distance between vectors (Snowflake VECTOR_*).
fileOperates on staged-file references (Snowflake FL_*, BUILD_*_FILE_URL).
contextReturns information about the current session, user, role, database, or transaction (CURRENT_USER, CURRENT_TIMESTAMP).
metadataReturns metadata about database objects (Snowflake GET_DDL, GENERATE_COLUMN_DESCRIPTION).
notificationUsed for messaging integrations (Snowflake EMAIL_INTEGRATION_CONFIG, TEXT_PLAIN).
systemEngine-level operations exposed as functions (BENCHMARK, SLEEP, EXPLAIN_JSON, Snowflake SYSTEM$*).
data_generationGenerates random or sequence values (RANDOM, UUID_STRING, RANDOM_BYTES).
lockingManages explicit advisory locks (MySQL GET_LOCK, RELEASE_LOCK).
differential_privacyDifferential-privacy operations (Snowflake ESTIMATE_REMAINING_DP_AGGREGATES).

Catalogue

For canonical upstream documentation see Snowflake, PostgreSQL, and MySQL / MariaDB. Use the search box to find a function by name, and click category pills to narrow the list to entries that carry the selected categories.