The IBM Graph DB service is one of the many data and analytics services available on Bluemix. If you are not familiar with graph databases, than you can read more about them on Wikipedia. The primary use case for graph databases is to describe relationships between different objects. Graph DBs are composed of two components, vertices and edges. A vertex represents some type of object and an edge represents a relationship between two vertices. The classic example is to represent your social network via a graph database. Each vertex represents a person and an edge between a person represents the fact that those two people know each other. Vertices and edges can have properties associated with them. For example, a person might have a name property associated with the vertex. An edge between two people might have a property representing the date the relationship was established.

The IBM Graph DB service on Bluemix is based on the Apache ThinkerPop stack. You can interact with the service by either using the TinkerPop query language or via a simplified HTTP REST API.

To use the IBM Graph DB you first need to create an instance of the service in a space in your org. After the service is created you can easily access the URL and credentials to connect to the DB by selecting the service in your dashboard. With your credentials in hand you can start issuing REST requests to the DB. For more information see the service’s getting started documentation.

If you are going to be doing more than making a couple of requests to the service I recommend you generate a session token and use that for authentication instead of using basic authentication. If you make to many requests to the service using basic authentication, the service will start rejecting your requests, at this point your only option is to use a session token. Details on how to generate a session token can be found in the documentation.

Before you start creating vertices and edges in the DB, you need to create a schema to represent the data. The schema tells the DB the data types and cardinality of the properties we are going to use to describe our vertices. It also allows us to provide labels for the vertices and edges so we have a way of distinguishing them. Finally the schema tells the DB which properties to index making our traversal of the DB much more efficient. Below is a sample schema from a DB I have been working on that describes the relationship between a developer and the languages they code in.

{
  "propertyKeys": [
    {
      "name": "name",
      "dataType": "String",
      "cardinality": "SINGLE"
    },
    {
      "name": "languages",
      "dataType": "String",
      "cardinality": "SINGLE"
    },
    {
      "name": "picture",
      "dataType": "String",
      "cardinality": "SINGLE"
    },
    {
      "name": "preferred_language",
      "dataType": "String",
      "cardinality": "SINGLE"
    },
    {
      "name": "bytes",
      "dataType": "Integer",
      "cardinality": "SINGLE"
    },
    {
      "name": "github_id",
      "dataType": "String",
      "cardinality": "SINGLE"
    },
    {
      "name": "twitter_id",
      "dataType": "String",
      "cardinality": "SINGLE"
    },
    {
      "name": "language_percentage",
      "dataType": "Float",
      "cardinality": "SINGLE"
    }
  ],
  "vertexLabels": [
    {
      "name": "person"
    },
    {
      "name": "language"
    }
  ],
  "edgeLabels": [
    {
      "name": "codes_in",
      "multiplicity": "MULTI"
    },
    {
      "name": "used_by",
      "multiplicity": "MULTI"
    }
  ],
  "vertexIndexes": [
    {
      "name": "vByName",
      "propertyKeys": [
        "name"
      ],
      "composite": true,
      "unique": false
    },
    {
      "name": "vByPreferredLang",
      "propertyKeys": [
        "preferred_language"
      ],
      "composite": true,
      "unique": false
    },
    {
      "name": "vByLanguages",
      "propertyKeys": [
        "languages"
      ],
      "composite": false,
      "unique": false
    }
  ],
  "edgeIndexes": [
    {
      "name": "eByName",
      "propertyKeys": [
        "name"
      ],
      "composite": true,
      "unique": false
    },
    {
      "name": "eByLanguagePercentage",
      "propertyKeys": [
        "language_percentage"
      ],
      "composite": true,
      "unique": false
    }
  ]
}

To set the schema in your IBM Graph DB service you would issue a POST request to the /schema endpoint of your DB. For example, using cURL the request would look like this

curl -X "POST" "$apiUrl/schema" \
	-u "$username:$password" \
	-H "Content-Type: application/json" \
	-d "{\"propertyKeys\":[{\"name\":\"name\",\"dataType\":\"String\",\"cardinality\":\"SINGLE\"},{\"name\":\"languages\",\"dataType\":\"String\",\"cardinality\":\"SINGLE\"},{\"name\":\"picture\",\"dataType\":\"String\",\"cardinality\":\"SINGLE\"},{\"name\":\"preferred_language\",\"dataType\":\"String\",\"cardinality\":\"SINGLE\"},{\"name\":\"bytes\",\"dataType\":\"Integer\",\"cardinality\":\"SINGLE\"},{\"name\":\"github_id\",\"dataType\":\"String\",\"cardinality\":\"SINGLE\"},{\"name\":\"twitter_id\",\"dataType\":\"String\",\"cardinality\":\"SINGLE\"},{\"name\":\"language_percentage\",\"dataType\":\"Float\",\"cardinality\":\"SINGLE\"}],\"vertexLabels\":[{\"name\":\"person\"},{\"name\":\"language\"}],\"edgeLabels\":[{\"name\":\"codes_in\",\"multiplicity\":\"MULTI\"},{\"name\":\"used_by\",\"multiplicity\":\"MULTI\"}],\"vertexIndexes\":[{\"name\":\"vByName\",\"propertyKeys\":[\"name\"],\"composite\":true,\"unique\":false},{\"name\":\"vByPreferredLang\",\"propertyKeys\":[\"preferred_language\"],\"composite\":true,\"unique\":false},{\"name\":\"vByLanguages\",\"propertyKeys\":[\"languages\"],\"composite\":false,\"unique\":false}],\"edgeIndexes\":[{\"name\":\"eByName\",\"propertyKeys\":[\"name\"],\"composite\":true,\"unique\":false},{\"name\":\"eByLanguagePercentage\",\"propertyKeys\":[\"language_percentage\"],\"composite\":true,\"unique\":false}]}"

Now we can start creating vertices and edges in our graph db. Lets create a person vertex. Again this can be done easily using the /vertices endpoint. The cURL request would look something like

curl -X "POST" "$apiUrl/g/vertices" \
	-u "$username:$password" \
	-H "Content-Type: application/json" \
	-d "{\"name\":\"Ryan Baxter\",\"languages\":\"Java,Node,Python\",\"picture\":\"https://en.gravatar.com/userimage/12148147/46ccae88e5aae747d53e0b1863f72a4e.jpg?size=200\",\"preferred_language\":\"Node\",\"github_id\":\"ryanjbaxter\",\"twitter_id\":\"ryanjbaxter\"}"

Since I code in Java, Node, and Python we will want to create vertices for those languages and then create the edges between the language vertices and the person vertex.

  curl -X "POST" "$apiUrl/vertices" \
	-u "$username:$password" \
	-H "Content-Type: application/json" \
	-d "{\"name\":\"Java\"}"
  curl -X "POST" "$apiUrl/vertices" \
  -u "$username:$password" \
  -H "Content-Type: application/json" \
  curl -X "POST" "$apiUrl/vertices" \
  -u "$username:$password" \
  -H "Content-Type: application/json" \
  -d "{\"name\":\"Python\"}"

Now that we have our language vertices we can create edges between the vertices. The one thing to keep in mind when doing this is what is called the multiplicity of the edges. The multiplicity of the edges puts restrictions on the number of incoming and outgoing edges from a given vertex. The multiplicity of an edge is defined in the schema you use for your DB. In my case, my vertices can have any number of incoming or outgoing edges because a developer can code in several languages and any language will can have multiple users using it. Lets create an edge from me to Java, Node, and Python.

  curl -X "POST" "$apiUrl/edges" \
  -u "$username:$password" \
	-H "Content-Type: application/json" \
	-d "{\"outV\":\"4256\",\"label\":\"codes_in\",\"inV\":\"4216\",\"language_percentage\":0.7}"
  curl -X "POST" "$apiUrl/edges" \
  -u "$username:$password" \
	-H "Content-Type: application/json" \
	-d "{\"outV\":\"4256\",\"label\":\"codes_in\",\"inV\":\"4128\",\"language_percentage\":0.1}"
curl -X "POST" "$apiUrl/edges" \
  -u "$username:$password" \
	-H "Content-Type: application/json" \
	-d "{\"outV\":\"4256\",\"label\":\"codes_in\",\"inV\":\"4328\",\"language_percentage\":\"0.2\"}"

The properties outV and inV are the IDs of the vertices the edge is starting at and going to. The ID of a vertex is returned as part of the response to the POST request you make to the DB to create vertex. The ID 4256 is the ID of the person vertex for Ryan Baxter. The IDs 4216, 4218, and 4328, are the IDs for Java, Node, and Python.

In addition we want to create edges from the language vertex back to the person vertex in order to allow us to query the DB to find all users of a given language.

curl -X "POST" "$apiUrl/edges" \
	-u "$username:$password" \
	-H "Content-Type: application/json" \
	-d "{\"outV\":\"4216\",\"label\":\"used_by\",\"inV\":\"4256\"}"
curl -X "POST" "$apiUrl/edges" \
	-u "$username:$password" \
	-H "Content-Type: application/json" \
	-d "{\"outV\":\"4218\",\"label\":\"used_by\",\"inV\":\"4256\"}"
curl -X "POST" "$apiUrl/edges" \
	-u "$username:$password" \
	-H "Content-Type: application/json" \
	-d "{\"outV\":\"4328\",\"label\":\"used_by\",\"inV\":\"4256\"}"

That is it, we now have a basic graph structure setup in our DB! You can replay most of these cURL commands to add additional vertices and edges to build a more complex graph.

We can now issue queries to the DB to extract information. To do this you use the /gremlin endpoint. The /gremlin endpoint allows you to use Groovy to traverse your graph. You can learn more about how to use Gremlin and the syntax on the TinkerPop website.

If we want to get everyone who uses Java for example we can issue the following POST to the /gremlin endpoint.

curl -X "POST" "https://ibmgraph-alpha.ng.bluemix.net/107a14cc-8f47-422e-a5ed-89d42544fb19/g/gremlin" \
	-H "Authorization: Basic YjRlZDI2NDYtNDEyOS00NjVmLWJhYjctMmI5YWRjOGVmZDc0OmYxZGQ3OWY2LWZlNTEtNDE0Yy1hNzZhLWNhODJhMmIwODgzMw==" \
	-H "Content-Type: text/plain" \
	-d $'{
  "gremlin": "graph.traversal().V().has(\'name\', \'Java\').out(\'used_by\').values(\'name\')"
}'

The Groovy code that is being executed is graph.traversal().V().has('name', 'Java').out('used_by').values('name'). All this code is doing is looking at each vertex in our database that has a name property equal to Java (which is just one vertex) and then gets all out used_by edges and then gets the name values of the vertices those edges point to.

I hope this introduction has piqued your interest in the IBM Graph DB service and graph databases in general. Once you understand how they work you can start to build some pretty powerful features in your applications. The service is still in beta so we appreciate any feedback you might have with regards to using the service. As always questions about the service can be posted to StackOverflow.


Ryan J Baxter

Husband, Father, Software Engineer