
TigerGraph Platform Knowledge Base and FAQs

Version 1.0
Document Updated:

Copyright © 2015-2017 TigerGraph. All Rights Reserved.
For technical support on this topic, contact support@tigergraph.com with a subject line beginning with "FAQ"


This document provides direct answers to common questions, or directs the reader to a section of another document.

♦ Getting Started and Basics

  • I need help installing the system.

If you have a problem with the procedure described in the TigerGraph Platform Installation Guide v1.0, please contact support@tigergraph.com and summarize your issue in the email subject.


  • What version of the TigerGraph platform am I running?

Use the following command:

$ gsql --version

To see the version numbers of individual components of the platform:

$ gadmin version


  • How do I upgrade from an earlier version?

Each release comes with documentation addressing how to perform an upgrade.  Contact support@tigergraph.com for help in your specific situation. As of this writing (Sept 2017), the TigerGraph Platform is releasing version 1.0.  Prior to v1.0, features may have changed from one version to the next, without full backward compatibility. We recommend always contacting our support staff when upgrading a pre-v1.0 release.


  • I'm not sure how to run the TigerGraph system.

If you correctly installed the system and are now logged in as the TigerGraph system user, you should be able to enter the GSQL shell by typing the gsql command at an operating system prompt. If this command has never worked, the installation probably was not successful.  If it works but you are not sure what to do next, please see the GSQL Tutorial and Demo Examples guide.


  • The system does not seem to be running correctly.

If you believe you have installed the system correctly (e.g., you followed the TigerGraph Platform Installation Guide v1.0 and received no errors, and the gsql and gadmin commands are now recognized), then please contact support@tigergraph.com and summarize your issue in the email subject.


  • Do I need to start the TigerGraph servers (e.g., GPE, GSE) to run the system?

Different servers are needed for different purposes, but the TigerGraph system should automatically turn services on and off as needed. Please be sure that the Dictionary (dict) server is on when using the TigerGraph system.

To check the status of servers:

$ gadmin status



  • Does the TigerGraph system have in-tool help?

Yes. For the GSQL shell and language, first enter the shell (type gsql from an operating system prompt).  Then type the help command, e.g.,

HELP

This gives you a short list of commands.  Note that "help" itself is one of the listed commands; there are help options to get more details about the BASIC and QUERY commands.  For example,

HELP QUERY

lists the command syntax for queries. See the "System Basics" section of the GSQL Language Reference, Part 1: Defining Graphs and Loading Data. The gadmin administration tool also has a help menu and a manual page:

$ gadmin -h

$ man gadmin


  • Is the GSQL language case sensitive?

User-defined identifiers are case-sensitive. For example, the names User and user are different. The GSQL language keywords (e.g., CREATE, LOAD, VERTEX) are not case-sensitive, but in our documentation examples, we generally show keywords in ALL CAPS to make them easy to distinguish.
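
For example, using a hypothetical movie vertex type, the following two statements are interpreted identically, because only the case of the keywords differs:

CREATE VERTEX movie (PRIMARY_ID id UINT, title STRING)
create vertex movie (primary_id id UINT, title STRING)

In contrast, CREATE VERTEX Movie (...) would define a different vertex type, because the identifiers movie and Movie are distinct.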


  • What are the rules for naming identifiers?

An identifier consists of letters, digits, and the underscore.  Identifiers may not begin with a digit.  Identifiers are case-sensitive.  Special naming rules apply to accumulators (see the Query section).
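
For example, total_score_2 is a valid identifier, while 2nd_score (begins with a digit) and total-score (contains a hyphen) are not.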


  • When are quotation marks required? Single or double quotes?

The general rule is that string literals within the GSQL language are enclosed in double quotation marks.  For data that is to be imported (not yet in the GSQL data store), the GSQL loading language lets the user specify how data fields are delimited within the input files. The loading language has an option to specify whether single quotes or double quotes are used to mark strings. For more help on loading, see the "Loading Data" section of this document or of the GSQL Language Reference, Part 1: Defining Graphs and Loading Data.


  • Can I run GSQL shell commands in batch mode?

Yes. You can create a text file containing a sequence of GSQL commands and then execute that file. To execute from outside the shell:
$ gsql filename

To execute the command file from within the shell:

@ filename

See also the "Language Basics" and "System Basics" sections of the <em><strong>GSQL Language Reference, Part 1: Defining Graphs and Loading Data</strong></em> document.


  • I have a long command line. Can I split it into multiple lines?

Yes. Normally, an end-of-line character triggers execution of a line.  You can use the BEGIN and END keywords to mark off a multi-line block of text that should not be executed until END is encountered.

This is an example of a loading statement split into multiple lines using BEGIN and END:

BEGIN
CREATE ONLINE_POST JOB load1 FOR GRAPH LaborForce {
  LOAD TO VERTEX user VALUES ($0, _, _, _),
       TO VERTEX occupation VALUES ($0, _),
       TO EDGE user_occupation VALUES ($0, $1);
}
END

Back to Top


♦ Defining a Graph Schema

  • What are the components of a graph schema?

A TigerGraph graph schema consists of (A) one or more vertex types, (B) one or more edge types, and (C) a graph type.  Each edge type is defined to be either DIRECTED or UNDIRECTED. The graph type is simply the list of vertex types and edge types which may exist in the graph. For more, see the section "Defining a Graph Schema" in the GSQL Language Reference, Part 1: Defining Graphs and Loading Data.  Below is an example of a graph schema containing two vertex types, one edge type, and one graph type:

CREATE VERTEX user (PRIMARY_ID user_id UINT, age UINT, gender STRING, postalCode STRING)
CREATE VERTEX occupation (PRIMARY_ID occ_id STRING, occ_name STRING)
CREATE UNDIRECTED EDGE user_occupation (FROM user, TO occupation)
CREATE GRAPH LaborForce (user, occupation, user_occupation)

Alternately, a generic CREATE GRAPH statement can be used:

CREATE GRAPH LaborForce (*)

  • What data types do you support for vertex and edge attributes?

Each attribute of a vertex or edge has an assigned data type. v0.8 of the TigerGraph platform added support for several more attribute types: DATETIME, UDT, and the container types LIST, SET, and MAP. The following is an abbreviated list.  For a complete list and description, see the section "Attribute Data Types" of the GSQL Language Reference, Part 1: Defining Graphs and Loading Data.

Primitive types: INT, UINT, FLOAT, DOUBLE, BOOL, STRING
Advanced types: STRING COMPRESS, DATETIME, User-Defined Tuple (UDT)
Complex types: LIST, SET, MAP


Discontinued Feature

The UINT_SET and STRING_SET COMPRESS types have been discontinued since there is now equivalent functionality from the more general SET<INT> and SET<STRING> types.



  • Can I define and load multiple graph schemas?

Currently, GSQL supports only one graph schema at a time.  However, you can merge multiple logical graphs into one schema by using distinct vertex types and edge types for the different logical graphs.


  • How do I check the definition of the current schema?

From within the GSQL shell, the ls command lists the catalog: the vertex type, edge type, and graph type definitions, job definitions, query definitions, and some system configuration settings.


  • How do I modify my graph schema?

The GSQL language includes ADD, ALTER, and DROP commands. See the section "Update Your Data" in the GSQL Tutorial and Demo Examples or the section "Modifying a Graph Schema" in the GSQL Language Reference, Part 1: Defining Graphs and Loading Data for details. Note that altering the graph schema will invalidate your old data loading and query jobs.  You should create and install new loading and query jobs.


  • How do I delete my entire graph schema?

To delete your entire catalog, containing not just your vertex, edge, and graph type definitions, but also your loading job and query definitions, use the following command:
GSQL> DROP ALL

To delete just your graph schema, you need to drop the individual components, starting with the graph type, then the edge types, and finally the vertex types. For example:
GSQL> DROP GRAPH g1
GSQL> DROP EDGE e1
GSQL> DROP EDGE e2
GSQL> DROP VERTEX v1
GSQL> DROP VERTEX v2

Deleting the graph schema also erases the contents of the graph store. To erase the graph store without deleting the graph schema, use the following command:
GSQL> CLEAR GRAPH STORE

See also " How do I erase all data? "


  • Why do the graph schema operations (CREATE VERTEX/EDGE/GRAPH and DROP GRAPH/ALL) seem slower than before?

The overall latency to create a graph and load data is the same or faster than before, but some operations have been shifted. As of v0.8, INIT GRAPH STORE is no longer a separate command. The activities it used to perform are now performed by the DROP GRAPH/ALL and CREATE GRAPH commands.

  • DROP GRAPH/ALL now resets the graph store and ID service (which had been done by INIT GRAPH STORE).
  • CREATE GRAPH now initializes the graph store according to the new schema (which had been done by INIT GRAPH STORE).

Also, starting with v0.8, each CREATE VERTEX/EDGE/GRAPH statement updates the dictionary, which adds a minor latency.

Back to Top


♦ Loading Data

  • How do I load data?

See the GSQL Tutorial and Demo Examples for introductory examples. See the GSQL Language Reference, Part 1: Defining Graphs and Loading Data for the complete specifications. We also have a cheat sheet; go to doc.tigergraph.com.


  • Why has Offline loading been deprecated?

Online loading is preferred. Online loading can do everything that offline loading can do, plus it has the following advantages:

  1. Can run while other graph operations are in progress.
  2. Uses multithreaded execution for faster performance.
  3. Does not need to turn the GPE off, which saves time.
  4. Its data source is specified at run time rather than at compile time.
  5. Can add data to an existing graph.

  • What are the syntax differences between online loading and offline loading?

Offline loading is deprecated and is now being emulated by online loading. Therefore, there is no performance difference.

There are some small syntax differences between the two styles:

CREATE JOB statement:
Online: The keyword ONLINE_POST is used:
CREATE ONLINE_POST JOB job_name FOR GRAPH graph_name {...
Offline: The keyword LOADING is used:
CREATE LOADING JOB job_name FOR GRAPH graph_name {...

Input source filename:
Online: The filename appears in the USING clause of the RUN statement, so one job can handle only one input file:
RUN myJob USING FILENAME="myFile"
Offline: The filename appears in the LOAD statement, so one job can handle multiple input files if it has multiple LOAD statements:
LOAD "myFile1" TO ...
LOAD "myFile2" TO ...

HEADER, SEPARATOR, EOL, QUOTE parameters:
Online: These parameters appear at the end of the RUN statement:
RUN myJob USING FILENAME="myFile", SEPARATOR=",", QUOTE="double"
Offline: These parameters appear at the end of the LOAD statement:
LOAD "myFile" TO ... USING SEPARATOR=",", QUOTE="double"


  • How do I convert an offline loading job to an online loading job?

See the offline2online command, described in the GSQL Language Reference, Part 1: Defining Graphs and Loading Data.


Back to Top

  • What types of data can be read?

The GSQL data loader reads text files organized in tabular or JSON format. Each field may represent numeric, boolean, string, or binary data. Each data field may contain a single value or a list of values (see "How do I split (flatten) a data field containing a list of values into separate vertices and edges?").


  • What is the format of a tabular input data file?

Each tabular input data file should be structured as a table, in which each line represents a row, and each row is a sequence of data fields, or columns. A data field can contain string or numeric data. To represent boolean values, 0 or 1 is expected. A header line may be included to associate a name with each column. A designated character separates columns.  For example, if the designated separator character is the comma, this format is commonly called CSV, for Comma-Separated Values.  Below is an example of a CSV file with a header. The uid column is int type, name is string type, avg_score is float type, and is_member is boolean type. See simple examples in Real-Life Data Loading and Querying Examples and a complete specification in the section "Creating a Loading Job" in the GSQL Language Reference, Part 1: Defining Graphs and Loading Data.

uid,name,avg_score,is_member
100,"Lee, Tom",48.5,1
101,"Wu, Ming",33.9,0
102,"Gables, Anne",72.2,1

The loader does not filter out extra white space (spaces or tabs).  The user should filter out extra white space from the files before loading into the TigerGraph system.
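
For example, a minimal pre-processing sketch using standard Unix tools (assuming a pipe-separated file named myFile.dat) removes spaces and tabs around each separator:

$ sed -i 's/[ \t]*|[ \t]*/|/g' myFile.dat

Note that quoted fields which themselves contain the separator character would need more careful handling.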


  • How should data fields be separated?

The data field (or token) separator can be any single ASCII character, including one of the non-printing characters. The separator is specified with the SEPARATOR phrase in the USING clause.  For example, to specify the semicolon as the separator:
USING SEPARATOR=";"

To specify the tab character, use "\t". To specify any ASCII character, use "\nn", where nn is the character's ASCII code, in decimal. For example, to specify ASCII 30, the Record Separator (RS):
USING SEPARATOR="\30"


  • Should fields be enclosed in quotation marks?

TigerGraph does not require fields to be enclosed in quotation marks, but it is recommended for string fields. If the QUOTE option is enabled, and if the loader finds a pair of quotation marks, then the loader treats the text within the quotation marks as one value, regardless of any separator characters that may occur in the value. The user must specify whether strings are marked by single quotation marks or double quotation marks:
USING QUOTE="single"
or
USING QUOTE="double"

For example, if SEPARATOR="," and QUOTE="double" are set, then when the following data are read,

uid,name,avg_score,is_member
100,"Lee, Tom",48.5,1
101,"Wu, Ming",33.9,0
102,"Gables, Anne",72.2,1

"Lee, Tom" will be read as a single field. The comma between Lee and Tom will not separate the field.


  • Does the GSQL Loader automatically interpret quotation marks as enclosing strings?

No. You must specify either QUOTE="single" or QUOTE="double".

Back to Top

  • What are the parameters (in the USING clause) for a loading job?

The following three parameters should be considered for every loading job from a tabular input file:

SEPARATOR
  Meaning: the special character that separates tokens (columns) in the data file
  Allowed values: any single ASCII character ("\t" for tab, "\nn" for ASCII decimal code nn)
  Comments: Required.

HEADER
  Meaning: whether the data file's first line is a header line which assigns names to the columns. In offline loading, the Loader reads the header line to obtain mnemonic names for the columns. In online loading, the Loader just skips the header line.
  Allowed values: "true", "false"
  Comments: Default = "false"

QUOTE
  Meaning: whether strings are enclosed in single quotation marks ('a string') or double quotation marks ("a string")
  Allowed values: "single", "double"
  Comments: Optional; no default value.

The next two parameters, FILENAME and EOL, are required if the job is an ONLINE_POST job:

FILENAME
  Meaning: the name of the input data file
  Allowed values: any valid path to a data file
  Comments: Required for online loading; not allowed for offline loading.

EOL
  Meaning: the end-of-line character
  Allowed values: any ASCII sequence
  Comments: Default = "\n" (the system-defined newline character or character sequence)

All five of these parameters are combined into one USING clause with a list of parameter/value pairs.  The parameters may appear in any order.

USING parameter1="value1", parameter2="value2",... , parameterN="valueN"



The location of the USING clause depends on whether the job is an offline loading job or an online loading job. For offline loading, the USING clause appears at the end of the LOAD statement. For example:

CREATE LOADING JOB load1 FOR GRAPH LaborForce {
  LOAD "jobs.csv" TO VERTEX occupation VALUES ($0, $1)
       USING HEADER="true", SEPARATOR="|", QUOTE="double";
}

For online loading, the USING clause appears at the end of the RUN statement:

CREATE ONLINE_POST JOB load2 FOR GRAPH LaborForce {
  LOAD TO VERTEX occupation VALUES ($0, $1);
}
RUN JOB load2 USING FILENAME="./jobs.csv", HEADER="true", SEPARATOR="|", QUOTE="double", EOL="\n"

  • My data file doesn't have a header but I still want to name the columns.

You can define a header line (a sequence of column names) within a loading job using a DEFINE HEADER statement, such as the following:

DEFINE HEADER head1 = "jobId", "jobName";

This statement must appear before the LOAD statement that wishes to use the header definition.  Then, the LOAD statement must set the USER_DEFINED_HEADER parameter in the USING clause. A brief example is shown below:

CREATE ONLINE_POST JOB load2 FOR GRAPH LaborForce {
  DEFINE HEADER head1 = "jobId", "jobName";
  LOAD TO VERTEX occupation VALUES ($"jobId", $"jobName")
       USING USER_DEFINED_HEADER="head1";
}

  • How do I identify and refer to the input data fields?

Input data fields can always be referenced by position. They can also be referenced by name, if a header has been defined.

    • Position-based reference: The leftmost field is $0, the next one is $1, and so on.
    • Name-based reference: $"name", where name is one of the header column names.

For example, if the header is
abc,def,ghi

then the third field can be referred to as either $2 or $"ghi".

Back to Top

  • How do I split (flatten) a data field containing a list of values into separate vertices and edges?

First, to clarify the task, consider a graph schema with two vertex types, Book and Genre, and one edge type, book_genre:

create_book_schema.gsql
CREATE VERTEX Book (PRIMARY_ID bookcode STRING, title STRING)
CREATE VERTEX Genre (PRIMARY_ID genre_id STRING, genre_name STRING)
CREATE UNDIRECTED EDGE book_genre (FROM Book, TO Genre)
CREATE GRAPH book_rating (Book, Genre, book_genre)

Further, each row of the input data file contains three fields: bookcode, title, and genres, where genres is a list of strings associated with the book. For example, the first few lines of the data file could be the following:

book.dat
bookcode|title|genres
101|"Harry Potter and the Philosopher's Stone"|fiction,adventure,fantasy,young adult
102|"The Three-Body Problem"|fiction,science fiction,Chinese

The data line for bookcode 101 should generate one Book instance ("Harry Potter and the Philosopher's Stone"), four Genre instances ("fiction", "adventure", "fantasy", "young adult"), and four book_genre instances, connecting the Book instance to each of the Genre instances.  This process of creating multiple instances from a list field (e.g., the genres field) is called flattening.

To flatten the data, we use a two-step load. The first LOAD statement uses the flatten() function to split the multi-value field and stores the results in a TEMP_TABLE. The second LOAD statement takes the TEMP_TABLE contents and writes them to the final edge type.

load_books.gsql
CREATE ONLINE_POST JOB load_books FOR GRAPH book_rating {
  LOAD TO VERTEX Book VALUES ($0, $1),
       TO TEMP_TABLE t1(bookcode,genre) VALUES ($0, flatten($2,",",1));
  LOAD TEMP_TABLE t1
       TO VERTEX Genre VALUES($"genre", $"genre"),
       TO EDGE book_genre VALUES($"bookcode", $"genre");
}
RUN JOB load_books USING FILENAME="book.dat", SEPARATOR="|", HEADER="true", QUOTE="double", EOL="\n"

The flatten function has three arguments: (field_to_split, separator, number_of_parts_in_one_field).  In this example, we want to split $2 (genres), the separator is the comma, and each field has only 1 part. So, the flatten function is called with the following arguments: flatten($2, ",", 1).  Using the example data file, TEMP_TABLE t1 will then contain the following:

bookcode genre
101 fiction
101 adventure
101 fantasy
101 young adult
102 fiction
102 science fiction
102 Chinese

The second LOAD statement uses the TEMP_TABLE t1 to generate Genre vertex instances and book_genre edge instances. While there are 7 rows shown in the sample TEMP_TABLE, only 6 Genre vertices will be generated, because there are only 6 unique values; "fiction" appears twice.  Seven book_genre edges will be generated, one for each row in the TEMP_TABLE.

There is another version of the flatten function which has four arguments and which supports a two-level grouping.  That is, the field contains a list of groups, each group composed of N subfields. The arguments are (field_to_split, group_separator, sub_field_separator, number_of_parts_in_one_group). For example, suppose the data line were organized this way instead:

book2.dat
bookcode|title|genres
101|"Harry Potter and the Philosopher's Stone"|FIC:fiction,ADV:adventure,FTS:fantasy,YA:young adult
102|"The Three-Body Problem"|FIC:fiction,SF:science fiction,CHN:Chinese

Then the following loading statements would be appropriate:

load_books2.gsql
CREATE ONLINE_POST JOB load_books2 FOR GRAPH book_rating {
  LOAD TO VERTEX Book VALUES ($0, $1),
       TO TEMP_TABLE t1(bookcode,genre_id,genre_name) VALUES ($0, flatten($2,",",":",2));
  LOAD TEMP_TABLE t1
       TO VERTEX Genre VALUES($"genre_id", $"genre_name"),
       TO EDGE book_genre VALUES($"bookcode", $"genre_id");
}
RUN JOB load_books2 USING FILENAME="book2.dat", SEPARATOR="|", EOL="\n"

Back to Top

  • Can the TigerGraph system load data from a streaming source?

Yes. Use online loading. Specifically, online loading lets you define a general loading process without naming the data source.  Every time you call an online loading job, you name the source file.  It can be a different file each time, or it can be the same file, if the contents of the file are changing over time.  Also, if the loader happens to re-read a data line that it has encountered before, it will simply reload the data. The exception is container attributes (e.g., a LIST attribute) loaded with a reduce() loading function; in that case, re-reading a data line has a cumulative effect.
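
For example, the same online loading job (here a hypothetical job named streamLoad) can be run repeatedly, naming a different file each time:

RUN JOB streamLoad USING FILENAME="/data/events_day1.csv", SEPARATOR=",", EOL="\n"
RUN JOB streamLoad USING FILENAME="/data/events_day2.csv", SEPARATOR=",", EOL="\n"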


  • I want to compute an attribute value. What built-in functions are available?

The GSQL loading language includes some built-in token functions (a token is one column or field of a data input line).  A user can also define custom token functions. Please see the section "Built-In Loader Token Functions" in the GSQL Language Reference, Part 1: Defining Graphs and Loading Data.


  • Do I need a one-to-one correspondence between input files and vertex types and edge types?

No. One of the advantages of the TigerGraph loading system is the flexible relationship between input files and resulting vertex and edge instances. In general, there is a many-to-many relationship: one input file can generate many vertex and edge types.

The general form of a LOAD statement for an online loading job is:

LOAD TO VERTEX vertex_type VALUES (attr_expr...) [WHERE conditions],
     ...,
     TO VERTEX vertex_typeN VALUES (attr_expr...) [WHERE conditions],
     TO EDGE edge_type VALUES (attr_expr...) [WHERE conditions] [OPTION (options)],
     ...,
     TO EDGE edge_typeN VALUES (attr_expr...) [WHERE conditions] [OPTION (options)]
     [Parsing_Conditions];

    • Each LOAD statement refers to one input file.
    • Each LOAD statement can have one or more resulting vertex types and one or more resulting edge types.
    • Hence, one LOAD statement can potentially describe the one-to-many mapping from one input file to many resulting vertex and edge types.
    • It is not necessary for every input line to always generate the same set of vertex types and edge types. The WHERE clause in each TO VERTEX | TO EDGE clause can be used to selectively choose and filter which input lines generate which resulting types, as sketched below.
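
For example, a minimal sketch (assuming a hypothetical input file in which column $1 tags each line with a record type) that routes different input lines to different vertex types:

CREATE ONLINE_POST JOB load_mixed FOR GRAPH LaborForce {
  LOAD TO VERTEX user VALUES ($0, $2, $3, $4) WHERE $1 == "user",
       TO VERTEX occupation VALUES ($0, $2) WHERE $1 == "occupation";
}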

♦ Updating and Modifying Data

  • How can I insert / load more data?

If there is already data in the graph store and you wish to insert more data, you have a few options. First, if you have bulk data stored in a file (on local disk or in remote or distributed storage), you can use online loading.

Second, if you have a few specific insertions, you can use the upsert data command in the RESTPP API User Guide v1.0.  For upsert, the data must be formatted in JSON format.

Third, you can write a query containing INSERT statements. The syntax is similar to SQL INSERT. (See GSQL Language Reference Part 2 - Querying v1.0.)  The advantage of query-based INSERT is that the details (id values and attribute values) can be determined at run time and can even be based on an exploration and analysis of the existing graph.  The disadvantage is that the query-insert job must be compiled first, and data values must either be hardcoded or supplied as input parameters.
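
For example, a minimal sketch of a query-based insert, reusing the LaborForce schema from earlier examples (the query name is hypothetical):

CREATE QUERY addOccupation(STRING occId, STRING occName) FOR GRAPH LaborForce {
  // insert (or overwrite) one occupation vertex, with values supplied at run time
  INSERT INTO occupation (PRIMARY_ID, occ_name) VALUES (occId, occName);
}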

  • How can I modify the graph schema?

You can modify the schema in several ways:

  • Add new vertex or edge types
  • Drop existing vertex or edge types
  • Add or drop attributes from an existing vertex or edge type

Any schema change can invalidate existing loading jobs and queries.

See the section "Modifying a Graph Schema" in GSQL Language Reference Part 1 - Defining Graphs and Loading Data v1.0 .


  • How do I modify data?

To make a known modification of a known vertex or edge:
Option 1) Make a RESTPP endpoint request, to the POST /graph or DELETE /graph endpoint.  See the RESTPP API User Guide v1.0 .

Option 2) The loading language includes an upsert command. The UPSERT statement performs a combined modify-or-add operation, depending on whether the indicated vertex or edge already exists.  Examples of UPSERT are described in the GSQL Tutorial and Demo Examples document.  The GSQL Language Reference Part 1 - Defining Graphs and Loading Data v1.0 provides a full specification.

Option 3) The query language now includes an UPDATE statement which enables sophisticated selection of which vertices and edges to update and how to update them.  Likewise, there is an INSERT statement in the query language. See the GSQL Language Reference Part 2 - Querying v1.0 .


  • How do I selectively delete data?

You can write a query which selects vertices or edges to be deleted. See the DELETE subsections of the "Data Modification Statements" section in GSQL Language Reference Part 2 - Querying v1.0 .
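
For example, a minimal sketch (again reusing the hypothetical LaborForce schema) that deletes all user vertices below a given age:

CREATE QUERY deleteYoungUsers(INT maxAge) FOR GRAPH LaborForce {
  start = {user.*};
  // standalone DELETE statement: remove each selected vertex
  DELETE u FROM start:u WHERE u.age < maxAge;
}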

  • How do I erase all the data?

If you wish to completely clear all the data in the graph store, use the CLEAR GRAPH STORE -HARD command. Be very careful using this command; deleted data cannot be restored (except from a backup). Note that clearing the data does not erase the catalog definitions of vertex, edge, and graph types.  See also "How do I delete my entire graph schema?"

-HARD must be in all capital letters.
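
For example:
GSQL> CLEAR GRAPH STORE -HARD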

Back to Top


♦ Querying

  • Is there more than one TigerGraph query language?

Yes. The GSQL Query Language (GQuery) is a full-featured graph query-and-data-computation language. In addition, there is a small, lightweight set of built-in query commands that can inspect the set of stored vertices and edges, but these built-in commands do not support graph traversal (moving from one vertex to another via edges). We refer to this as the Standard Data Manipulation API or the Built-in Query Language (described in the RESTPP API User Guide v1.0 and the GSQL Tutorial and Demo Examples).


  • What is the basic syntax for the TigerGraph query language?

For a first-time user: See the documents GSQL Tutorial and Demo Examples and then GSQL Language Reference Part 2 - Querying v1.0 .
For users with some experience, a reference card is now available: GSQL Query Language Reference Card.


  • Is GQuery a query language or a programming language?

The GSQL Query Language (GQuery) supports powerful graph querying, but it is also designed to perform powerful computations.  It can be used for simple SQL-like queries, but it also features control flow (IF, WHILE, FOREACH), procedural calls, local and global variables, complex data types, and accumulators to enable much more sophisticated use.
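
For illustration, a minimal sketch of the procedural side of the language (control flow and local variables, with no graph traversal; the query and graph names are placeholders):

CREATE QUERY fibonacci(INT n) FOR GRAPH LaborForce {
  // iteratively compute the n-th Fibonacci number
  INT a = 0;
  INT b = 1;
  INT tmp = 0;
  INT i = 0;
  WHILE i < n DO
    tmp = a + b;
    a = b;
    b = tmp;
    i = i + 1;
  END;
  PRINT a;
}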


  • What types of accumulators are available?

Three new types were introduced in v0.8: GroupByAccum, BitwiseAndAccum, and BitwiseOrAccum. Version 0.8.1 added ArrayAccum. This is a quick summary; for a more detailed explanation, see the "Accumulator Types" section of GSQL Language Reference Part 2 - Querying v1.0.

In the following table, baseType means any of the following: INT, UINT, FLOAT, DOUBLE, STRING, BOOL, VERTEX, EDGE, JSONARRAY, JSONOBJECT, DATETIME

Accumulator: allowed data types

SumAccum: INT, UINT, FLOAT, DOUBLE, STRING
MaxAccum, MinAccum: INT, UINT, FLOAT, DOUBLE, VERTEX
AvgAccum: INT, UINT, FLOAT, DOUBLE (output is DOUBLE)
AndAccum, OrAccum: BOOL
BitwiseAndAccum, BitwiseOrAccum: INT (acting as a sequence of bits)
ListAccum, SetAccum, BagAccum: baseType, TUPLE, STRING COMPRESS
ArrayAccum: any accumulator type other than MapAccum, HeapAccum, or GroupByAccum
MapAccum: key (baseType, TUPLE, STRING COMPRESS); value (baseType, TUPLE, STRING COMPRESS, ListAccum, SetAccum, BagAccum, MapAccum, HeapAccum)
HeapAccum<tuple_type>(heapSize, sortKey [, sortKey_i]*): TUPLE
GroupByAccum: key (baseType, TUPLE, STRING COMPRESS); accumulator (ListAccum, SetAccum, BagAccum, MapAccum)

  • How do I use accumulators?

See the section "Accumulators" in the GSQL Language Reference Part 2 - Querying v1.0 document.


  • How do I reference the ID fields of a vertex or edge in a built-in query?

Vertex and edge IDs (i.e., the unique identifier for each vertex or edge) are treated differently than user-defined attributes. Special keywords must be used to refer to the PRIMARY_ID, FROM, or TO id fields.

Vertices:

In a CREATE VERTEX statement, the PRIMARY_ID is required and is always listed first. User-defined attributes are optional and come after the required ID fields.

CREATE VERTEX Book (PRIMARY_ID bookcode STRING, title STRING)
CREATE VERTEX Genre (PRIMARY_ID genre_id STRING, genre_name STRING)
CREATE UNDIRECTED EDGE book_genre (FROM Book, TO Genre)
CREATE GRAPH book_rating (Book, Genre, book_genre)


In a built-in query, if you wish to select vertices by specifying an attribute value, you use the attribute name (e.g., title):

SELECT * FROM Book WHERE title=="The Three-Body Problem"

In contrast, if you wish to reference vertices by the id value, the lowercase keyword primary_id must be used. Note that the query does not use the id name bookcode.

SELECT * FROM Book WHERE primary_id=="101"

Edges:

In a CREATE EDGE statement, the FROM and TO vertex identifiers are required and are always listed first. The FROM and TO values should match the PRIMARY_ID values of a source vertex and a target vertex. In the example below, rating and date_time are user-defined optional attributes.

CREATE UNDIRECTED EDGE book_genre (FROM Book, TO Genre, rating uint, date_time datetime)


In a query, if you wish to select edges by specifying their FROM or TO vertex values, you must use the lowercase keywords from_id or to_id.

SELECT * FROM Book-(book_genre)->Genre WHERE from_id=="101"

  • What is the format of data returned by a query?

The data are in JSON format. See the section "Output Statements" in the GSQL Language Reference Part 2 - Querying v1.0.

  • Is there an output size limit for a data query?

Yes. The maximum output size for a query is 2GB. If the result of a query would be larger than 2GB, the system may return no data, and no error message is returned.

Also, for built-in queries (using the Standard Data Manipulation REST API), queries return at most 10240 vertices or edges.


  • How and when do I use INSTALL QUERY and INSTALL QUERY -OPTIMIZE?

INSTALL QUERY query_name is required for each GSQL query, after its initial CREATE QUERY query_name statement and before using RUN QUERY query_name.  After INSTALL QUERY has been executed, RUN QUERY can be used.

Anytime after INSTALL QUERY, another statement, INSTALL QUERY -OPTIMIZE, can be executed once.  This operation optimizes all previously installed queries, reducing their run times by about 20%.

Legal:
CREATE QUERY query1...
INSTALL QUERY query1

RUN QUERY query1(...)
...
INSTALL QUERY -OPTIMIZE    # (optional) optimizes run time performance for query1 and query2
RUN QUERY query1(...)      # runs faster than before


Illegal:
INSTALL QUERY -OPTIMIZE query_name

  • Should I run INSTALL QUERY -OPTIMIZE?

Optimize a query if query run time is more important to you than query installation time.

The initial INSTALL QUERY operation runs quickly. This is good for the development phase.

The optional additional operation INSTALL QUERY -OPTIMIZE will take more time, but it will speed up query run time. This makes sense for production systems.


  • Can I make a 2-dimensional (or multi-dimensional) array?

Yes. A ListAccum is like an array, a 1-dimensional array.  If you nest ListAccums as the elements within an outer ListAccum, you have effectively made a 2-dimensional array. Please read the section "Nested Accumulators" in the GSQL Language Reference Part 2 - Querying v1.0 for more details. Here is an example:

CREATE QUERY nestedAccumEx() FOR GRAPH anyGraph {
  ListAccum<ListAccum<INT>> @@_2d_list;
  ListAccum<ListAccum<ListAccum<INT>>> @@_3d_list;
  ListAccum<INT> @@_1d_list;
  SumAccum<INT> @@sum = 4;

  @@_1d_list += 1;
  @@_1d_list += 2;
  // add 1D-list to 2D-list as element
  @@_2d_list += @@_1d_list;
  // add 1D-enum-list to 2D-list as element
  @@_2d_list += [@@sum, 5, 6];
  // combine 2D-enum-list and 2D-list
  @@_2d_list += [[7, 8, 9], [10, 11], [12]];
  // add an empty 1D-list
  @@_1d_list.clear();
  @@_2d_list += @@_1d_list;
  // combine two 2D-lists
  @@_2d_list += @@_2d_list;
  PRINT @@_2d_list;

  // test 3D-list
  @@_3d_list += @@_2d_list;
  @@_3d_list += [[7, 8, 9], [10, 11], [12]];
  PRINT @@_3d_list;
}

  • Can I make nested container Accumulators?

Yes; please read the section "Nested Accumulators" in the GSQL Language Reference Part 2 - Querying v1.0 for more details. There are seven types of container accumulators: ListAccum, SetAccum, BagAccum, MapAccum, ArrayAccum, HeapAccum, and GroupByAccum. Here are the allowed combinations:

  • ListAccum can contain ListAccum.
  • MapAccum and GroupByAccum can contain any container accumulator except HeapAccum.
  • ArrayAccum is always nested.

Here is an example:

CREATE QUERY nestedMap() FOR GRAPH anyGraph {
  MapAccum<String, MapAccum<int, String>> @@testMap;
  @@testMap += ("m1" -> (0 -> "value1"));
  @@testMap += ("m1" -> (1 -> "value2"));
  @@testMap += ("m2" -> (2 -> "value3"));
  IF @@testMap.containsKey("m1") THEN
    PRINT @@testMap.get("m1");
  END;
  // for a map, we can get its value, and then look up a key within that value
  PRINT @@testMap.get("m1").get(0);
}

Back to Top




♦ Testing and Debugging

  • How can I validate a loading job?

To write a loading job, you must know the format of the input data files, so that you can describe to GSQL how to parse each data line and convert it into vertex and edge attributes. To validate a loading job, that is, to check that the actual input data meet your expectations, and that they produce the expected vertices and edges, you can use two features of the RUN JOB command: the -DRYRUN option and loading a specified range of data lines.

The full syntax for an (offline) loading job is the following:

RUN JOB [-DRYRUN] [-n [first_line_num,] last_line_num] job_name

The -DRYRUN option will read input files and process data as instructed by the job, but it does not store data in the graph store.

The -n option limits the loading job to processing only a range of lines of each input data file. The selected data will be stored in the graph store, so the user can check the results. The -n flag accepts one or two arguments. For example,

-n 50 means read lines 1 to 50.
-n 10,50 means read lines 10 to 50.
The special symbol $ is interpreted as "last line", so -n 10,$ means reads from line 10 to the end.
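
For example, to test-parse only the first 100 lines of each input file for a hypothetical job named load1, without storing anything:

RUN JOB -DRYRUN -n 100 load1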


  • Where are the logs?

The following command lists the locations of the log files:

gadmin log

If the platform has been installed with default file locations, so that <TigerGraph_root_dir> = /home/tigergraph/tigergraph, then the output would be the following:

GPE    : /home/tigergraph/tigergraph/logs/gpe/gpe1.out
GPE    : /home/tigergraph/tigergraph/logs/GPE_1_1/log.INFO
GSE    : /home/tigergraph/tigergraph/logs/gse/gse1.out
GSE    : /home/tigergraph/tigergraph/logs/GSE_1_1/log.INFO
RESTPP : /home/tigergraph/tigergraph/logs/restpp/restpp1.out
RESTPP : /home/tigergraph/tigergraph/logs/RESTPP_1_1/log.INFO
RESTPP : /home/tigergraph/tigergraph/logs/RESTPP-LOADER_1_1/log.INFO
GSQL   : /home/tigergraph/tigergraph/dev/gdk/gsql/output/load_output.log
GSQL   : /home/tigergraph/tigergraph/dev/gdk/gsql/logs/GSQL_LOG

  • Where are the log files of loading runs?

Each loading run creates a log file, stored in the folder <TigerGraph_root_dir>/dev/gdk/gsql/output. The filename load_output.log is a link to the most recent log file. This file contains summary statistics on the number of lines read, the vertices created, and various types of errors encountered. Alternatively, you can find the log paths with the shell command gadmin log.


  • Where are the log files for GSQL?

The log file is at <TigerGraph_root_dir>/dev/gdk/gsql/GSQL_LOG