GFQL Language Specification#
Introduction#
GFQL (Graph Frame Query Language) is a DataFrame-native graph query language designed for expressing graph patterns and traversals on tabular data. It operates on node and edge DataFrames, providing a functional, composable approach to graph querying with native GPU acceleration support.
Design Principles#
Dataframe-native: Type-safe, functional bulk operations over dataframe libraries such as pandas and cuDF
Declarative: Focus on what to retrieve, and give the engine freedom to optimize how
Accessible: Designed for both human readability and machine generation, and building on intuitions from popular tabular and graph systems
Performance-oriented: Vectorized operations by default, including GPU acceleration
Embeddable: Like DuckDB, it can be embedded in different host languages; the initial focus is the Python data ecosystem
Compute-tier: Decoupling from storage enables flexible execution, either embedded locally or via remote acceleration servers
Language Forms#
GFQL exists in three complementary forms:
Core Language: Abstract graph pattern matching language defined by this specification
Embedded DSL: Host language implementations (currently Python with pandas/cuDF)
Wire Protocol: JSON serialization for client-server communication (see Wire Protocol spec)
This specification focuses on the core language concepts. Examples use Python syntax for concreteness, but the patterns apply to any embedding.
Language Overview#
Core Concepts#
Graph Model#
Graphs consist of node and edge dataframes:
Edges: DataFrame with source and destination columns
Nodes: DataFrame with unique identifier column
Column names are user-defined globals for the graph:
Node ID attribute: g._node (e.g., "node_id", "id")
Edge source attribute: g._source (e.g., "source", "from")
Edge destination attribute: g._destination (e.g., "destination", "to")
GFQL infers nodes from edge references when only edges are provided
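The inference rule above can be illustrated without any dataframe library. The following is a minimal sketch, assuming edges are plain dicts and using hypothetical column names (`src`, `dst`, `id`); it is not how GFQL implements the step, only the idea that the node table is the set of distinct edge endpoints.

```python
def infer_nodes(edges, source="src", destination="dst", node="id"):
    """Build a node table from the distinct endpoints of an edge table."""
    seen = {}  # a dict keeps first-seen order and drops duplicates
    for edge in edges:
        seen.setdefault(edge[source], None)
        seen.setdefault(edge[destination], None)
    return [{node: node_id} for node_id in seen]


# Hypothetical edge rows; only the source and destination columns matter here.
edges = [
    {"src": "a", "dst": "b"},
    {"src": "b", "dst": "c"},
    {"src": "a", "dst": "c"},
]
nodes = infer_nodes(edges)
```

Each endpoint appears once in `nodes`, no matter how many edges reference it.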
GFQL Programs#
GFQL programs are declarative graph-to-graph transformations:
Enable use cases like search, filter, enrich, and traverse
Express what to find (as in Cypher), not how to find it (as in Gremlin)
Chains#
Path pattern expressions for matching graph structures:
Express graph patterns as sequences of node and edge matching operations
Similar to Cypher patterns but decomposed into composable steps
Define paths through the graph: start nodes → edges → end nodes
Each operation refines the pattern match based on previous results
WHERE (Same-Path Constraints)#
WHERE ties attributes across named steps in a chain. Use it when you need to enforce relationships between nodes/edges on the same path (for example, start.owner_id equals end.owner_id). Multiple WHERE comparisons are conjunctive (AND).
Python example:
from graphistry import n, e_forward, col, compare

# g is an existing graph object with nodes and edges bound
g.gfql(
    [n({"type": "account"}, name="a"), e_forward(), n({"type": "user"}, name="c")],
    where=[compare(col("a", "owner_id"), "==", col("c", "owner_id"))],
)
Wire format (JSON):
{
  "type": "Chain",
  "chain": [
    {"type": "Node", "filter_dict": {"type": "account"}, "name": "a"},
    {"type": "Edge", "direction": "forward"},
    {"type": "Node", "filter_dict": {"type": "user"}, "name": "c"}
  ],
  "where": [{"eq": {"left": "a.owner_id", "right": "c.owner_id"}}]
}
WHERE context boundaries:
Same-path where=[...] uses compare(col(...), op, col(...)) with op in ==, !=, <, <=, >, >=.
Predicate helper calls (for example, gt(...), between(...)) are not used inside same-path where=[...].
Row-table filtering after rows(...) uses where_rows(...):
where_rows(filter_dict=...) supports predicate helpers.
where_rows(expr="...") uses expression comparators =, !=, <>, <, <=, >, >=.
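To make the same-path semantics concrete, here is a minimal pure-Python sketch. It assumes a toy graph held in dicts and tuples, and a hypothetical `match_paths` helper that is not part of GFQL. Each candidate path binds the aliases `a` and `c`, and every comparison must hold on that same path.

```python
import operator

# The six comparison operators accepted by same-path where=[...]
OPS = {"==": operator.eq, "!=": operator.ne, "<": operator.lt,
       "<=": operator.le, ">": operator.gt, ">=": operator.ge}

# Toy data for illustration only
nodes = {
    "acct1": {"type": "account", "owner_id": 1},
    "acct2": {"type": "account", "owner_id": 2},
    "u1": {"type": "user", "owner_id": 1},
    "u2": {"type": "user", "owner_id": 3},
}
edges = [("acct1", "u1"), ("acct1", "u2"), ("acct2", "u2")]


def match_paths(where):
    """Enumerate account -> user paths that satisfy every comparison."""
    kept = []
    for src, dst in edges:
        if nodes[src]["type"] != "account" or nodes[dst]["type"] != "user":
            continue
        bound = {"a": nodes[src], "c": nodes[dst]}  # alias -> matched node
        # Comparisons are conjunctive: all must hold on the same path.
        if all(OPS[op](bound[la][lc], bound[ra][rc])
               for (la, lc), op, (ra, rc) in where):
            kept.append((src, dst))
    return kept


paths = match_paths([(("a", "owner_id"), "==", ("c", "owner_id"))])
```

Only the path whose endpoints share an `owner_id` survives. With an empty `where` list, all three account-to-user paths would be kept.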
Operations#
GFQL supports two operation families:
Graph matchers act on graph entities (nodes and edges).
Row-pipeline operators act on tabular outputs from matched graph entities.
g.gfql([...], where=[...]) filters same-path alias relationships.
where_rows(...) filters the active row table in RETURN/WITH-style pipelines.
Predicates#
Act on attributes of nodes and edges:
Filter based on property values
Comparison, membership, string matching, temporal checks
Composable within operations to build complex conditions
Values#
Type system matching modern data formats:
Scalars: numbers, strings, booleans, null
Temporal: ISO datetimes, dates, times with timezone support
Collections: lists for membership tests
Compatible with JSON, Arrow, and DataFrame type systems
Formal Grammar#
(* Entry point *)
query ::= chain
(* Chain - path pattern expression *)
chain ::= "[" step ("," step)* "]"
step ::= operation | row_operation | call_operation
(* Graph operations *)
operation ::= node_matcher | edge_matcher
(* Node Matcher *)
node_matcher ::= "n(" node_params? ")"
node_params ::= filter_dict ("," name_param)? ("," query_param)?
| name_param ("," query_param)?
| query_param
(* Edge Matchers *)
edge_matcher ::= edge_forward | edge_reverse | edge_undirected
edge_forward ::= "e_forward(" edge_params? ")"
edge_reverse ::= "e_reverse(" edge_params? ")"
edge_undirected ::= ("e" | "e_undirected") "(" edge_params? ")"
(* WHERE (same-path constraints) *)
where_clause ::= "where=" where_list
where_list ::= "[" where_expr ("," where_expr)* "]"
where_expr ::= "compare(" column_ref "," compare_op "," column_ref ")"
compare_op ::= "'=='" | "'!='" | "'<'" | "'<='" | "'>'" | "'>='"
column_ref ::= alias "." column
alias ::= identifier
column ::= identifier
(* Row operations - Cypher RETURN/WITH-style pipeline *)
row_operation ::= rows_op | where_rows_op | select_op | with_op | return_op
| order_by_op | skip_op | limit_op | distinct_op
| unwind_op | group_by_op
call_operation ::= "call(" string ("," params_object)? ")"
params_object ::= "{" (string ":" value_or_expr ("," string ":" value_or_expr)*)? "}"
rows_op ::= "rows(" (rows_arg ("," rows_arg)*)? ")"
rows_arg ::= "table=" ("'nodes'" | "'edges'") | "source=" string
where_rows_op ::= "where_rows(" (where_rows_arg ("," where_rows_arg)*)? ")"
where_rows_arg ::= "filter_dict=" filter_dict | "expr=" string
select_op ::= "select(" (projection_items | "items=" projection_items) ")"
with_op ::= "with_(" (projection_items | "items=" projection_items) ")"
return_op ::= "return_(" (projection_items | "items=" projection_items) ")"
projection_items ::= "[" projection_item ("," projection_item)* "]"
projection_item ::= string | "(" string "," value_or_expr ")"
value_or_expr ::= value | string
order_by_op ::= "order_by(" (order_keys | "keys=" order_keys) ")"
order_keys ::= "[" "(" value_or_expr "," ("'asc'" | "'desc'") ")" ("," "(" value_or_expr "," ("'asc'" | "'desc'") ")")* "]"
skip_op ::= "skip(" (integer | "value=" integer) ")"
limit_op ::= "limit(" (integer | "value=" integer) ")"
distinct_op ::= "distinct()"
unwind_op ::= "unwind(" "expr=" value_or_expr ("," "as_=" string)? ")"
group_by_op ::= "group_by(" "keys=" "[" string ("," string)* "]" "," "aggregations=" "[" aggregation_spec ("," aggregation_spec)* "]" ")"
aggregation_spec ::= "(" string "," string ")" | "(" string "," string "," value_or_expr ")"
(* Parameters *)
edge_params ::= edge_match_params ("," hop_params)? ("," node_filter_params)? ("," name_param)?
filter_dict ::= "{" (property_filter ("," property_filter)*)? "}"
property_filter ::= string ":" (value | predicate)
hop_params ::= hop_bound_params | hop_slice_params | hop_label_params | "hops=" integer | "to_fixed_point=True"
hop_bound_params ::= "min_hops=" integer | "max_hops=" integer
hop_slice_params ::= "output_min_hops=" integer | "output_max_hops=" integer
hop_label_params ::= "label_node_hops=" string | "label_edge_hops=" string | "label_seeds=True"
node_filter_params ::= source_filter ("," dest_filter)?
source_filter ::= "source_node_match=" filter_dict | "source_node_query=" string
dest_filter ::= "destination_node_match=" filter_dict | "destination_node_query=" string
name_param ::= "name=" string
query_param ::= "query=" string
edge_query_param ::= "edge_query=" string
edge_match_params ::= filter_dict | edge_query_param
(* Predicates *)
predicate ::= comparison | membership | range | null_check | string_pred | temporal_pred
comparison ::= ("gt" | "lt" | "ge" | "le" | "eq" | "ne") "(" value ")"
membership ::= "is_in(" "[" value ("," value)* "]" ")"
range ::= "between(" value "," value ("," "inclusive=" boolean)? ")"
null_check ::= "isnull()" | "notnull()" | "isna()" | "notna()"
string_pred ::= string_match | string_check
string_match ::= "contains(" string ("," "case=" boolean)? ("," "regex=" boolean)? ")"
| "match(" string ("," "case=" boolean)? ("," "flags=" integer)? ")"
| "fullmatch(" string ("," "case=" boolean)? ("," "flags=" integer)? ")"
| ("startswith" | "endswith") "(" string ("," "case=" boolean)? ")"
string_check ::= ("isalpha" | "isnumeric" | "isdigit" | "isalnum"
| "isupper" | "islower") "()"
temporal_pred ::= temporal_check "()"
temporal_check ::= "is_month_start" | "is_month_end" | "is_quarter_start"
| "is_quarter_end" | "is_year_start" | "is_year_end" | "is_leap_year"
(* Values *)
value ::= scalar | temporal_value | collection
scalar ::= number | string | boolean | null
temporal_value ::= datetime_value | date_value | time_value
datetime_value ::= "pd.Timestamp(" string ("," "tz=" string)? ")"
| "datetime(" datetime_args ")"
date_value ::= "date(" date_args ")"
time_value ::= "time(" time_args ")"
collection ::= "[" (value ("," value)*)? "]"
(* Primitives *)
string ::= '"' [^"]* '"' | "'" [^']* "'"
number ::= integer | float
integer ::= "-"? [0-9]+
float ::= "-"? [0-9]+ "." [0-9]+
boolean ::= "True" | "False"
null ::= "None"
identifier ::= [A-Za-z_][A-Za-z0-9_]*
datetime_args ::= integer ("," integer)*
date_args ::= integer "," integer "," integer
time_args ::= integer "," integer ("," integer)?
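As a quick sanity check on the primitive productions, the sketch below transcribes `identifier`, `integer`, and `float` into Python regular expressions. It assumes the leading minus sign is optional. The `classify` helper is hypothetical and is not a GFQL parser.

```python
import re

# Patterns transcribed from the identifier, integer, and float productions.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
INTEGER = re.compile(r"-?[0-9]+")
FLOAT = re.compile(r"-?[0-9]+\.[0-9]+")


def classify(token):
    """Return the first primitive production that matches the whole token."""
    candidates = (("float", FLOAT), ("integer", INTEGER), ("identifier", IDENTIFIER))
    for name, pattern in candidates:
        if pattern.fullmatch(token):
            return name
    return None


kinds = [classify(t) for t in ("owner_id", "-42", "3.14", "1.", "2fast")]
```

Tokens such as `1.` or `2fast` match none of the three productions, so `classify` returns `None` for them.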
Operations#
Node Matcher: n()#
Filters nodes based on attributes.
Syntax: n(filter_dict?, name?, query?)
Parameters:
filter_dict: Dictionary of attribute filters
name: Optional string label for results
query: Pandas query string expression
Examples:
n() # All nodes
n({"type": "person"}) # Nodes where type='person'
n({"age": gt(30)}) # Nodes where age > 30
n(name="important") # Label matching nodes
n(query="age > 30 and status == 'active'") # Query string
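The filter_dict behavior in these examples can be sketched in plain Python. This is an illustration only. It assumes that a plain value means equality, that a callable acts as a predicate, and that multiple entries combine with AND. `match_nodes` and `greater_than` are hypothetical stand-ins, not GFQL functions.

```python
def greater_than(threshold):
    """Stand-in for a comparison predicate: true when the value exceeds threshold."""
    return lambda value: value is not None and value > threshold


def match_nodes(nodes, filter_dict=None):
    """Keep rows where every filter_dict entry holds."""
    filter_dict = filter_dict or {}

    def keep(row):
        for column, expected in filter_dict.items():
            actual = row.get(column)
            # Callable entries are predicates; anything else is an equality test.
            ok = expected(actual) if callable(expected) else actual == expected
            if not ok:
                return False
        return True

    return [row for row in nodes if keep(row)]


nodes = [
    {"id": 1, "type": "person", "age": 25},
    {"id": 2, "type": "person", "age": 41},
    {"id": 3, "type": "company", "age": None},
]
people = match_nodes(nodes, {"type": "person"})
over_30 = match_nodes(nodes, {"age": greater_than(30)})
```

An empty filter keeps every row, which mirrors `n()` matching all nodes.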
Edge Matchers#
Forward Traversal: e_forward()#
Traverses edges in forward direction (source → destination).
Syntax: e_forward(edge_match?, hops?, min_hops?, max_hops?, output_min_hops?, output_max_hops?, label_node_hops?, label_edge_hops?, label_seeds?, to_fixed_point?, source_node_match?, destination_node_match?, name?)
Parameters:
edge_match: Edge attribute filters
hops: Number of hops (default: 1; shorthand for max_hops)
min_hops / max_hops: Inclusive traversal bounds (default min=1 unless max=0; max defaults to hops)
output_min_hops / output_max_hops: Optional post-filter slice; defaults keep all traversed hops up to max_hops
label_node_hops / label_edge_hops: Optional hop-number columns; label_seeds=True writes hop 0 for seeds when labeling
to_fixed_point: Continue until no new nodes (default: False)
source_node_match: Filters for source nodes
destination_node_match: Filters for destination nodes
name: Optional label
Examples:
e_forward() # One hop forward
e_forward(hops=2) # Two hops forward
e_forward(min_hops=2, max_hops=4, output_min_hops=3, label_edge_hops="edge_hop") # bounded + sliced + labeled
e_forward(to_fixed_point=True) # All reachable nodes
e_forward({"type": "follows"}) # Only 'follows' edges
e_forward(source_node_match={"active": True}) # From active nodes
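The hop parameters are easier to reason about with a toy traversal. The sketch below is a simplified frontier expansion over edge tuples. It assumes each node is labeled with the hop at which it is first reached and that seeds sit at hop 0. GFQL's actual traversal and labeling rules may differ, and `forward_hops` is a hypothetical helper.

```python
def forward_hops(edges, seeds, max_hops=1, to_fixed_point=False):
    """Return {node: hop number at which it was first reached}; seeds are hop 0."""
    reached = {seed: 0 for seed in seeds}
    frontier = set(seeds)
    hop = 0
    while frontier and (to_fixed_point or hop < max_hops):
        hop += 1
        # One bulk step: follow every edge leaving the current frontier.
        next_frontier = {dst for src, dst in edges
                         if src in frontier and dst not in reached}
        for node in next_frontier:
            reached[node] = hop
        frontier = next_frontier
    return reached


edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "b")]
one_hop = forward_hops(edges, ["a"])
two_hops = forward_hops(edges, ["a"], max_hops=2)
everything = forward_hops(edges, ["a"], to_fixed_point=True)

# A post-filter slice in the spirit of output_min_hops=2
sliced = {node: hop
          for node, hop in forward_hops(edges, ["a"], max_hops=3).items()
          if hop >= 2}
```

The fixed-point run terminates despite the `d -> b` cycle because nodes that were already reached never re-enter the frontier.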
Reverse Traversal: e_reverse()#
Traverses edges in reverse direction (destination → source).
Syntax: Same as e_forward()
Undirected Traversal: e() or e_undirected()#
Traverses edges in both directions.
Syntax: Same as e_forward()
Row-Pipeline Operations#
These operations are encoded as call steps in the chain and are used for
Cypher-style MATCH ... RETURN processing:
rows(table=..., source=...): select active row table (nodes/edges; optional alias scope)
where_rows(filter_dict=..., expr=...): row-level filtering on active row table
select(...) / with_(...) / return_(...): projection and expression shaping
order_by(...), skip(...), limit(...), distinct(): row sorting/paging/dedup
unwind(...): expand list-valued expressions into rows
group_by(...): grouped vectorized aggregations
Row-pipeline operators are part of the chain list itself (not top-level
g.gfql() keyword arguments):
from graphistry import n, e_forward
from graphistry.compute import rows, where_rows, return_, order_by, limit

g.gfql([
    n({"type": "Person"}, name="p"),
    e_forward({"type": "FOLLOWS"}),
    n({"type": "Person"}, name="q"),
    rows(table="nodes", source="q"),
    where_rows(expr="score >= 50"),
    return_([("id", "id"), ("name", "name"), ("score", "score")]),
    order_by([("score", "desc"), ("name", "asc")]),
    limit(25),
])
Equivalent explicit Chain form:
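from graphistry.compute.chain import Chain

# Reuses the n, e_forward, and row-operator imports from the previous example
query = Chain([
    n({"type": "Person"}, name="p"),
    e_forward({"type": "FOLLOWS"}),
    n({"type": "Person"}, name="q"),
    rows(table="nodes", source="q"),
    where_rows(expr="score >= 50"),
    return_(["id", "name", "score"]),
    order_by([("score", "desc"), ("name", "asc")]),
    limit(25),
])
g.gfql(query)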
where=[...] and where_rows(...) are intentionally different:
where=[...] compares values across named path aliases in the MATCH pattern.
where_rows(...) evaluates scalar expressions against the active row table.
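The row-pipeline stages can be mimicked on a list of dicts to show how each step reshapes the active row table. This is a sketch under stated assumptions: the helper names (`filter_rows`, `project`, `sort_rows`, `take`) are hypothetical stand-ins for `where_rows`, `return_`, `order_by`, and `limit`, and the data is invented.

```python
def filter_rows(rows, predicate):
    """Row-level filter on the active table."""
    return [row for row in rows if predicate(row)]


def project(rows, columns):
    """Keep only the listed columns."""
    return [{c: row[c] for c in columns} for row in rows]


def sort_rows(rows, keys):
    """Sort by several (column, direction) keys; the first key has highest priority."""
    # Python's sort is stable, so apply the keys from last to first.
    for column, direction in reversed(keys):
        rows = sorted(rows, key=lambda row: row[column],
                      reverse=(direction == "desc"))
    return rows


def take(rows, value):
    """Keep the first `value` rows."""
    return rows[:value]


table = [
    {"id": 1, "name": "Ann", "score": 70, "team": "x"},
    {"id": 2, "name": "Bob", "score": 45, "team": "y"},
    {"id": 3, "name": "Cy", "score": 70, "team": "x"},
    {"id": 4, "name": "Di", "score": 90, "team": "y"},
]
result = take(
    sort_rows(
        project(filter_rows(table, lambda r: r["score"] >= 50), ["id", "name", "score"]),
        [("score", "desc"), ("name", "asc")],
    ),
    2,
)
```

Unlike same-path `where=[...]`, every step here sees one row at a time and has no notion of path aliases.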
Predicates#
Comparison Predicates#
gt(value) # Greater than
lt(value) # Less than
ge(value) # Greater than or equal
le(value) # Less than or equal
eq(value) # Equal
ne(value) # Not equal
Membership Predicate#
is_in([value1, value2, ...]) # Value in list
Range Predicate#
between(lower, upper, inclusive=True) # Value in range
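The effect of `inclusive` is easiest to see on the boundary values. Below is a minimal sketch using closures. It assumes `inclusive=False` excludes both endpoints, which the text above does not spell out. `between_sketch` and `is_in_sketch` are hypothetical names, not the GFQL predicates themselves.

```python
def between_sketch(lower, upper, inclusive=True):
    """Range test; assumes inclusive=False excludes both endpoints."""
    if inclusive:
        return lambda value: lower <= value <= upper
    return lambda value: lower < value < upper


def is_in_sketch(options):
    """Membership test against a fixed collection."""
    allowed = set(options)
    return lambda value: value in allowed


ages = [17, 18, 40, 65, 66]
closed = [a for a in ages if between_sketch(18, 65)(a)]
open_ = [a for a in ages if between_sketch(18, 65, inclusive=False)(a)]
picked = [a for a in ages if is_in_sketch([17, 66])(a)]
```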
String Predicates#
Pattern matching predicates:
contains(pat, case=True, regex=True) # Contains pattern (substring or regex)
startswith(prefix, case=True) # Starts with prefix
endswith(suffix, case=True) # Ends with suffix
match(pat, case=True, flags=0) # Matches regex from start of string
fullmatch(pat, case=True, flags=0) # Matches regex against entire string
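The difference between `contains`, `match`, and `fullmatch` mirrors the difference between `re.search`, `re.match`, and `re.fullmatch` in Python's standard library. The sketch below applies that analogy to a single string. The `*_sketch` helpers are hypothetical, and mapping `case=False` to `re.IGNORECASE` is an assumption.

```python
import re


def _flags(case, flags=0):
    # Assumption: case=False maps to case-insensitive matching.
    return flags if case else flags | re.IGNORECASE


def contains_sketch(value, pat, case=True, regex=True):
    """True if the pattern occurs anywhere in the string."""
    pattern = pat if regex else re.escape(pat)
    return re.search(pattern, value, _flags(case)) is not None


def match_sketch(value, pat, case=True, flags=0):
    """True if the pattern matches at the start of the string."""
    return re.match(pat, value, _flags(case, flags)) is not None


def fullmatch_sketch(value, pat, case=True, flags=0):
    """True only if the pattern matches the entire string."""
    return re.fullmatch(pat, value, _flags(case, flags)) is not None


email = "alice@example.com"
anywhere = contains_sketch(email, "example")          # pattern found mid-string
at_start = match_sketch(email, "example")             # not at the start
whole = fullmatch_sketch(email, r"\w+@\w+\.com")      # covers the full string
```

With `regex=False`, the pattern is treated as a literal substring, so a pattern of `"."` only matches an actual dot.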
String type checking predicates:
isalpha() # Alphabetic characters only
isnumeric() # Numeric characters only
isdigit() # Digits only
isalnum() # Alphanumeric
isupper() # All uppercase
islower() # All lowercase
Null Predicates#
isnull() # Is null/None
notnull() # Is not null/None
isna() # Is NaN (numeric)
notna() # Is not NaN
Temporal Predicates#
is_month_start() # First day of month
is_month_end() # Last day of month
is_quarter_start() # First day of quarter
is_quarter_end() # Last day of quarter
is_year_start() # First day of year
is_year_end() # Last day of year
is_leap_year() # Is leap year
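For intuition, a few of these checks can be written with Python's standard `calendar` and `datetime` modules. This is a sketch of what the predicates test on a single date value, assuming calendar quarters that start in January, April, July, and October. It is not the vectorized implementation.

```python
import calendar
from datetime import date


def is_month_end(d):
    """True on the last day of the month, accounting for leap years."""
    return d.day == calendar.monthrange(d.year, d.month)[1]


def is_quarter_start(d):
    """True on the first day of a calendar quarter."""
    return d.day == 1 and d.month in (1, 4, 7, 10)


def is_leap_year(d):
    """True when the date falls in a leap year."""
    return calendar.isleap(d.year)


checks = {
    "feb29_month_end": is_month_end(date(2024, 2, 29)),
    "feb28_leap_month_end": is_month_end(date(2024, 2, 28)),
    "apr1_quarter_start": is_quarter_start(date(2024, 4, 1)),
    "2023_leap": is_leap_year(date(2023, 1, 1)),
}
```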
Call Operations and Security#
Call Operations#
GFQL supports calling Plottable methods through the call() operation, providing controlled access to graph transformation and analysis capabilities:
call(function: str, params: dict) -> ASTCall
Call operations enable:
Graph algorithms (PageRank, community detection)
Layout computations (ForceAtlas2, Graphviz)
Data transformations (filtering, collapsing)
Visual encodings (color, size, icons)
Row-pipeline operations (rows, where_rows, select, with_, return_, order_by, skip, limit, distinct, unwind, group_by)
Safelist Architecture#
For security and stability, Call operations are restricted to a predefined safelist of methods. This prevents:
Arbitrary code execution
Access to filesystem or network operations
Modification of global state
Unsafe graph operations
Safelist Categories#
Graph Analysis
get_degrees, get_indegrees, get_outdegrees: Calculate node degrees
compute_cugraph: Run GPU algorithms (pagerank, louvain, etc.)
compute_igraph: Run CPU algorithms
get_topological_levels: Analyze DAG structure
Filtering & Transformation
filter_nodes_by_dict, filter_edges_by_dict: Filter by attributes
hop: Traverse graph with conditions
drop_nodes, keep_nodes: Node selection
collapse: Merge nodes by attribute
prune_self_edges: Remove self-loops
materialize_nodes: Generate node table
Layout
layout_cugraph: GPU-accelerated layouts
layout_igraph: CPU-based layouts
layout_graphviz: Graphviz layouts
fa2_layout: ForceAtlas2 layout
ring_continuous_layout: Radial layout driven by numeric attributes
ring_categorical_layout: Radial layout grouping by categories
time_ring_layout: Time-series radial layout (accepts ISO timestamp bounds)
group_in_a_box_layout: Group-in-a-box community layout
circle_layout: Circular node layout
tree_layout: Sugiyama-style tree layout
mercator_layout: Mercator projection for latitude/longitude node coordinates
modularity_weighted_layout: Community-weighted edge layout preparation
Note
time_ring_layout accepts ISO-8601 strings for time_start / time_end when
sent over the wire. GFQL converts them to numpy.datetime64 before use so the
behavior matches direct Plotter calls.
Visual Encoding
encode_point_color: Color nodes/edges
encode_point_size: Size nodes
encode_point_icon: Set icons
bind: Attach visual attributes
Embeddings & Dimensionality Reduction
umap: UMAP dimensionality reduction for graph embeddings
Validation#
Call operations undergo multiple validation stages:
Safelist Check: Function name must be in the safelist
Parameter Validation: Parameters validated against method signature
Type Checking: Runtime type validation
Schema Validation: Compatibility with graph schema
Error Codes#
E104: Function not in safelist
E105: Missing required parameter
E201: Parameter type mismatch
E303: Unknown parameter
E301: Required column not found (runtime)
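The first three validation stages and their error codes can be sketched as a small checker. The function names in the safelist below come from the categories above, but the parameter specs are invented for illustration and do not reflect the real method signatures. `validate_call`, `error_code`, and `CallValidationError` are hypothetical names. Schema validation (E301) is omitted because it needs a bound graph.

```python
class CallValidationError(Exception):
    """Validation failure carrying one of the error codes listed above."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code


# Hypothetical safelist: parameter specs are illustrative assumptions only.
SAFELIST = {
    "get_degrees": {"required": {}, "optional": {"col": str}},
    "hop": {"required": {"hops": int}, "optional": {"direction": str}},
}


def validate_call(function, params):
    """Run the safelist, parameter, and type checks in order."""
    if function not in SAFELIST:
        raise CallValidationError("E104", f"{function!r} is not in the safelist")
    spec = SAFELIST[function]
    allowed = {**spec["required"], **spec["optional"]}
    for name in spec["required"]:
        if name not in params:
            raise CallValidationError("E105", f"missing required parameter {name!r}")
    for name, value in params.items():
        if name not in allowed:
            raise CallValidationError("E303", f"unknown parameter {name!r}")
        if not isinstance(value, allowed[name]):
            raise CallValidationError("E201", f"parameter {name!r} has the wrong type")
    return True


def error_code(function, params):
    """Return the error code raised for a call, or None if it validates."""
    try:
        validate_call(function, params)
    except CallValidationError as err:
        return err.code
    return None


codes = [
    error_code("eval", {}),                      # not safelisted
    error_code("hop", {}),                       # required parameter missing
    error_code("hop", {"hops": "2"}),            # wrong type
    error_code("hop", {"hops": 2, "depth": 1}),  # unknown parameter
    error_code("hop", {"hops": 2}),              # valid
]
```

Rejecting unknown function names before inspecting any parameters keeps the safelist as the single gate for what can run.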
Type System#
Value Types#
Scalars
number: int, float
string: Text values
boolean: True/False
null: None
Temporal Types
datetime: Timestamp with optional timezone
date: Calendar date
time: Time of day
Collections
list: Ordered sequence of values
Type Coercion#
GFQL performs automatic type coercion:
Python datetime → pandas Timestamp
Numeric types → appropriate precision
Collections → lists for is_in()
Execution Model#
Declarative Pattern Matching#
GFQL follows a declarative execution model similar to Neo4j’s Cypher:
Pattern Declaration: Chains express path patterns in the graph
Users declare graph patterns as sequences of node and edge constraints
Patterns specify what paths to match, not how to find them
The engine optimizes pattern matching based on data characteristics
Row-Pipeline Transformation: Optional call steps shape tabular outputs
rows(...) chooses active table (nodes or edges, optionally alias-scoped)
where_rows(...), projections, sorting, grouping, and paging transform rows
Expressions are validated before execution and unsupported forms fail fast
Set-Based Operations: Graph and row operations run in bulk
No explicit user-managed iteration or traversal order
Results include all matching paths/rows satisfying constraints
Execution is vectorized in supported engines (pandas/cuDF)
Lazy Evaluation: Chains define transformations without immediate execution
Allows engines to optimize path finding and row-table transformations
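The lazy-evaluation idea can be illustrated with a tiny step recorder. This is a generic sketch and not GFQL's internals: the hypothetical `LazyChain` class only stores callables when the chain is built and applies them when `run()` is called. That gap is what gives an engine room to inspect or reorder steps first.

```python
class LazyChain:
    """Records steps at build time and applies them only when run() is called."""

    def __init__(self, steps=()):
        self.steps = tuple(steps)

    def then(self, step):
        # Return a new chain so partially built chains can be reused.
        return LazyChain(self.steps + (step,))

    def run(self, rows):
        for step in self.steps:
            rows = step(rows)
        return rows


calls = []  # records when steps actually execute


def only_active(rows):
    calls.append("only_active")
    return [row for row in rows if row["active"]]


def ids(rows):
    calls.append("ids")
    return [row["id"] for row in rows]


chain = LazyChain().then(only_active).then(ids)
calls_before_run = list(calls)  # still empty: building the chain ran nothing
result = chain.run([{"id": 1, "active": True}, {"id": 2, "active": False}])
```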
Result Access#
Query execution returns graph and/or row-tabular outputs according to the embedding implementation.
result = g.gfql([...])
# accessors are embedding-specific
For Python accessor details (including row-pipeline result materialization), see GFQL Python Embedding.
Named Results#
Operations with a name parameter add boolean columns that mark the matched entities:
result = g.gfql([
n({"type": "person"}, name="people"),
e_forward(name="connections"),
n({"active": True}, name="active_targets")
])
# Access all matched nodes and edges:
all_nodes = result._nodes
all_edges = result._edges
# Access specific matched nodes/edges using pandas filtering:
people_nodes = result._nodes[result._nodes["people"]]
connection_edges = result._edges[result._edges["connections"]]
active_nodes = result._nodes[result._nodes["active_targets"]]
# Or using standard pandas query syntax:
people_nodes = result._nodes.query("people == True")
This pattern is essential for extracting specific subsets from complex graph traversals.
Best Practices#
Use specific filters early: Filter nodes before traversing edges
Limit hops: Use reasonable hop limits to avoid explosion
Name important results: Use the name parameter for analysis
Prefer filter_dict: More efficient than query strings
Use appropriate predicates: Match predicate to column type
See Also#
GFQL Python Embedding - Python implementation details
GFQL Wire Protocol Specification - JSON serialization format
Cypher to GFQL Python & Wire Protocol Mapping - Cypher to GFQL translation with wire protocol
GFQL Quick Reference - Comprehensive examples and usage patterns
GFQL Validation Guide - Learn validation basics