Graphs + SQL

PGQL is a query language for the property graph data model that combines the powers of graph pattern matching and SQL:

SELECT p2.name AS friend_of_friend
  FROM facebook_graph                             /* In the Facebook graph..   */
 MATCH (p1:Person) -/:friend_of{2}/-> (p2:Person) /* ..match two-hop friends.. */
 WHERE p1.name = 'Mark'                           /* ..of Mark.                */

For a detailed specification of PGQL, see PGQL 1.1 Specification.

Graph Pattern Matching

PGQL uses ASCII-art syntax for matching vertices, edges, and paths:

  • (n:Person) matches a vertex (node) n with label Person
  • -[e:friend_of]-> matches an edge e with label friend_of
  • -/:friend_of+/-> matches a path consisting of one or more (+) edges, each with label friend_of

SQL Capabilities

PGQL has the following SQL-like capabilities:

  • DISTINCT to remove duplicates
  • GROUP BY to create groups of solutions, and, HAVING to filter out groups of solutions
  • COUNT, MIN, MAX, AVG and SUM to aggregate over groups of solutions
  • ORDER BY to sort results
  • (NOT) EXISTS subqueries to test whether a graph pattern exists, or, doesn’t exists
  • DATE, TIME, TIMESTAMP, TIME WITH TIMEZONE, and TIMESTAMP WITH TIMEZONE temporal data types

Regular Path Expressions

PGQL has regular path expressions (e.g. *, +, ?, {1,4}) for expressing complex traversals for all sorts of reachability analyses:

    PATH connects_to AS (:Device) <- (x) -> (:Device)                /* Devices are connected by two edges..                     */
                  WHERE has_label(x, 'Connection')                   /* ..and an intermediate Connection vertex..                */
                     OR has_label(x, 'Switch') AND x.status = 'OPEN' /* ..or an intermediate Switch vertex with OPEN status.     */
  SELECT d1.name AS source, d2.name AS destination
    FROM electric_network
   MATCH (d1) -/:connects_to+/-> (d2)                                 /* We match the connects_to pattern one or more (+) times. */
   WHERE d1.name = 'DS'
ORDER BY d2.name
+--------+-------------+
| source | destination | /* The result of above query is a table with columns, like in SQL. */
+--------+-------------+
| DN     | D0          | /* First result row. */
| DN     | D5          |
| DN     | D6          |
| DN     | D7          |
| DN     | D8          |
| DN     | D9          | /* Last result row. */
+--------+-------------+

Temporal Data Types

In addition to numbers, (character) strings, and booleans, PGQL has the following temporal data types:

  • DATE (java.time.LocalDate)
  • TIME (java.time.LocalTime)
  • TIMESTAMP (java.time.LocalDateTime)
  • TIME WITH TIMEZONE (java.time.OffsetTime)
  • TIMESTAMP WITH TIMEZONE (java.time.OffsetDateTime)

PGQL’s Java result set API (see ResultSet.java and ResultAccess.java) is based on the new Java 8 Date and Time Library (java.time.*), offering greatly improved safety and functionality for Java developers.

Querying Multiple Graphs

Through subqueries, PGQL allows for comparing data from different graphs.

For example, the following query finds people who are on Facebook but not on Twitter:

SELECT p1.name
  FROM facebook_graph
 MATCH (p1:Person)                           /* Match persons in the Facebook graph.. */
 WHERE NOT EXISTS (                          /* ..such that there doesn't exists..    */
                    SELECT p2
                      FROM twitter_graph
                     MATCH (p2:Person)       /* ..a person in the Twitter graph..     */
                     WHERE p1.name = p2.name /* ..with the same name.                 */
                  )

Resources