Documentation Index
Fetch the complete documentation index at: https://docs.formal.ai/llms.txt
Use this file to discover all available pages before exploring further.
Overview
The Formal SQL parser tags every column reference with the list of functions it passes through (column.in_functions in policy input).
Each function carries one or more categories sourced from the
catalogues below, so policies can branch on category instead of name.
The catalogue is split per dialect: a single Common file holds names
shared across every supported dialect with the same meaning, and each
technology has its own file for entries that are dialect-specific or
have a different interpretation in that dialect.
There are 766 entries across all dialects.
How policies use this
Read the categories offcolumn.in_functions[].categories in Rego.
For example, hash any email column that went through an aggregate:
Categories
Each function is tagged with one or more categories, organized in two groups.Shape categories
Describes how the function participates in SQL evaluation. Every catalogued function carries at least one shape category.| Category | Definition |
|---|---|
scalar | Operates on individual values: one input row produces one output value. The default shape for non-aggregate, non-window functions. |
aggregate | Reduces multiple input rows to a single output value (SUM, COUNT, STRING_AGG, ARRAY_AGG). |
window | Computes a value per row using a window of related rows, typically with an OVER clause (ROW_NUMBER, LAG, RANK). |
table | Returns a set of rows; called in a FROM clause (UNNEST, GENERATE_SUBSCRIPTS, JSON_TABLE). |
Domain categories
Describes the kind of data the function operates on or returns. A function can have zero, one, or several domain categories.| Category | Definition |
|---|---|
numeric | Produces or operates on numeric values (ABS, ROUND, SQRT, MOD). |
string_and_binary | Operates on character strings or byte strings (CONCAT, SUBSTRING, REGEXP_REPLACE, HEX). |
date_time | Operates on date, time, or timestamp values (EXTRACT, DATE_PART, AGE, DATE_FORMAT). |
conditional | Returns one of several values based on a condition (CASE, COALESCE, NULLIF, IFNULL). |
conversion | Coerces a value from one type to another (CAST, TO_DATE, CONVERT, INET_ATON). |
bitwise | Operates on bit-level representations (BITAND, BITOR, BITXOR). |
hash | Computes a hash digest of its input (MD5, SHA1, SHA2, Snowflake HASH). |
encryption | Encrypts or decrypts data (AES_ENCRYPT, AES_DECRYPT, Snowflake ENCRYPT). |
semi_structured | Operates on nested formats: JSON, JSONB, VARIANT, OBJECT, ARRAY, XML. |
geospatial | Operates on geographic or geometric values (Snowflake ST_*). |
vector_similarity | Computes similarity or distance between vectors (Snowflake VECTOR_*). |
file | Operates on staged-file references (Snowflake FL_*, BUILD_*_FILE_URL). |
context | Returns information about the current session, user, role, database, or transaction (CURRENT_USER, CURRENT_TIMESTAMP). |
metadata | Returns metadata about database objects (Snowflake GET_DDL, GENERATE_COLUMN_DESCRIPTION). |
notification | Used for messaging integrations (Snowflake EMAIL_INTEGRATION_CONFIG, TEXT_PLAIN). |
system | Engine-level operations exposed as functions (BENCHMARK, SLEEP, EXPLAIN_JSON, Snowflake SYSTEM$*). |
data_generation | Generates random or sequence values (RANDOM, UUID_STRING, RANDOM_BYTES). |
locking | Manages explicit advisory locks (MySQL GET_LOCK, RELEASE_LOCK). |
differential_privacy | Differential-privacy operations (Snowflake ESTIMATE_REMAINING_DP_AGGREGATES). |