Working with Knowledge Graph using AWS Neptune — (Step1) Introduction and Setting Up

Ranjan Debnath
May 31, 2020

Hello everyone,

This is my first story on Medium. The motivation for this series is to make it easier to get started with AWS Neptune for Knowledge Graph representations by working through the practical problems one can face along the way.

The terms mentioned in the title can be explained as follows:

A Knowledge Graph is a representation of real-world entities in which each entity is connected to others through different relationships, so that knowledge can be derived by following the connections into and out of each entity. A sample of the knowledge graph is shown below.
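To make the idea concrete in code, here is a minimal Python sketch that stores a tiny knowledge graph as (subject, relationship, object) triples and looks up the relationships going out of and coming into an entity. The entities and relationship names are made up for illustration, and this in-memory structure is only an assumption for explanation; it is not how Neptune stores data.

```python
from collections import defaultdict

# Hypothetical example facts, each a (subject, relationship, object) triple.
TRIPLES = [
    ("Alice", "works_at", "Acme"),
    ("Bob", "works_at", "Acme"),
    ("Alice", "knows", "Bob"),
    ("Acme", "located_in", "Berlin"),
]


def build_index(triples):
    """Index the triples by source entity (outgoing) and target entity (incoming)."""
    outgoing = defaultdict(list)
    incoming = defaultdict(list)
    for subject, relation, obj in triples:
        outgoing[subject].append((relation, obj))
        incoming[obj].append((relation, subject))
    return outgoing, incoming


outgoing, incoming = build_index(TRIPLES)

# Relationships leaving "Alice" and relationships arriving at "Acme".
print(outgoing["Alice"])
print(incoming["Acme"])
```

A graph database such as Neptune answers the same kind of "what is connected to this entity" question, but at a much larger scale and through a query language instead of Python dictionaries.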

AWS Neptune is a fast, reliable, production-ready graph database. It allows us to store billions of relationships and query them with millisecond latency.

By the end of this series, we will create a Knowledge Graph in Neptune and use Gremlin (the popular graph traversal language of Apache TinkerPop) to query the data. This will allow us to leverage the power of data and their interconnections to store and use real-world knowledge in an efficient manner.

To learn more about Gremlin and Apache TinkerPop, refer here.

In this post, we will discuss setting up Neptune on AWS.

Setting up:

  1. AWS Subscription: First of all, create an AWS subscription. You can refer here to get started with a free subscription.
  2. Before creating the Neptune stack, we need to create a key pair. You can refer to this on how to create the key pair. Remember that at the end of key pair creation a private key in .pem format is downloaded to your system. Save it in a safe place, as it can’t be generated again in the future.
  3. Now let’s create the Neptune stack. For this, we will use a CloudFormation stack, which creates the other services needed alongside Neptune (VPC, EC2, S3, etc.) to complete the necessary operations on Neptune.
  4. Open this link for the CloudFormation stack and click Next.
  5. You are now on the page where you specify your stack details. I will describe the necessary stack parameters and what they do below.

a. Stack name: As the name suggests, this groups all the resources created under this stack. For Azure users, a stack is comparable to a resource group.

Parameters (please keep the defaults unless a change is mentioned below, to keep the stack creation easy):

b. AttachBulkloadIAMRoleToNeptuneCluster: This attaches the IAM role that will be required for bulk load operations in Neptune. Bulk loading is the fastest way of loading data into Neptune.

c. DBClusterPort: The port number on which Neptune will be running.

d. DbInstanceType: This specifies the hardware configuration of the Neptune instance. This chart (https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Concepts.DBInstanceClass.html#Concepts.DBInstanceClass.Summary) lists the configurations. For this exercise, db.t2.large will be sufficient.

e. EC2ClientInstanceType: This specifies the hardware configuration of the EC2 instance. This chart (https://aws.amazon.com/ec2/pricing/on-demand/) lists the available options and their on-demand pricing. For this exercise, t3.large will be sufficient.

f. EC2SSHKeyPairName: Select the key pair you created in step 2.

g. NeptuneQueryTimeout: This defines how long (in milliseconds) Neptune allows a query to run before cancelling it. If your queries run long and you start getting timeout errors, you can increase this value later. We will set it to 30000 as a moderate timeout, given our data and the hardware configuration of Neptune.

h. NotebookInstanceType: Set this if you want to access your Neptune DB from AWS SageMaker, a fully managed machine learning service. To keep the setup minimal and avoid the extra cost, we don’t need it now.

i. SetupGremlinConsole: This is required, as we will be using Gremlin as our traversal language. Change it to True.
Now click Next.

6. This page allows you to specify IAM access, rollback configuration, and notification options for your stack. For personal use we do not need to change anything here. Now click Next.

7. Review page: here we get a summary of all the stack configurations. Check it thoroughly once, tick the 2 acknowledgment checkboxes at the bottom, and click Create stack.
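The console steps above can also be scripted. The following is a minimal sketch using boto3, not the method used in this post: it builds the non-default parameters described above and submits the same template. The template URL, stack name, and region are taken from the CloudFormation link in the references. The key pair name is hypothetical, the lowercase spelling of the SetupGremlinConsole value is an assumption, and so is the mapping of the two acknowledgment checkboxes to the two capabilities shown.

```python
TEMPLATE_URL = (
    "https://s3.amazonaws.com/aws-neptune-customer-samples/v2/"
    "cloudformation-templates/neptune-full-stack-nested-template.json"
)


def build_parameters(key_pair_name):
    """Return the non-default parameters from the walkthrough in the
    list-of-dicts shape that CloudFormation's CreateStack API expects."""
    overrides = {
        "DbInstanceType": "db.t2.large",
        "EC2ClientInstanceType": "t3.large",
        "EC2SSHKeyPairName": key_pair_name,
        "NeptuneQueryTimeout": "30000",
        # Assumption: the template spells this value in lowercase.
        "SetupGremlinConsole": "true",
    }
    return [
        {"ParameterKey": key, "ParameterValue": value}
        for key, value in overrides.items()
    ]


def create_neptune_stack(key_pair_name, stack_name="NeptuneQuickStart",
                         region="us-east-2"):
    """Submit the stack. Needs boto3 and AWS credentials, and incurs cost."""
    import boto3  # imported lazily so build_parameters works without it

    client = boto3.client("cloudformation", region_name=region)
    response = client.create_stack(
        StackName=stack_name,
        TemplateURL=TEMPLATE_URL,
        Parameters=build_parameters(key_pair_name),
        # Assumption: these correspond to the two acknowledgment checkboxes.
        Capabilities=["CAPABILITY_IAM", "CAPABILITY_AUTO_EXPAND"],
    )
    return response["StackId"]


params = build_parameters("my-neptune-key")  # hypothetical key pair name
print(params)
```

Only build_parameters runs here; calling create_neptune_stack("my-neptune-key") would actually start creating billable resources, so treat it as something to run deliberately.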

Wait for the CREATE_COMPLETE status of the CloudFormation stack. Notice that the creation events are stacked with the newest on top. For example, if Logical ID A currently shows a CREATE_IN_PROGRESS status, a later event for the same Logical ID A will show CREATE_COMPLETE above it. Don’t get confused by the many CREATE_IN_PROGRESS events; just make sure every CREATE_IN_PROGRESS event has a CREATE_COMPLETE event for the same Logical ID above it.
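The same check can be expressed in a few lines of Python. This is a sketch under the assumption that the events are ordered newest first, as they appear in the console and, as far as I know, as the DescribeStackEvents API returns them. The helper name unfinished_resources and the sample events are made up for illustration.

```python
def unfinished_resources(events):
    """Return logical IDs whose most recent event is not CREATE_COMPLETE.

    `events` must be ordered newest first, each with the keys
    "LogicalResourceId" and "ResourceStatus".
    """
    latest = {}
    for event in events:
        # The first event seen for a logical ID is its newest one.
        latest.setdefault(event["LogicalResourceId"], event["ResourceStatus"])
    return sorted(
        logical_id
        for logical_id, status in latest.items()
        if status != "CREATE_COMPLETE"
    )


def fetch_events(stack_name, region="us-east-2"):
    """Fetch all stack events via boto3 (needs boto3 and AWS credentials)."""
    import boto3

    client = boto3.client("cloudformation", region_name=region)
    paginator = client.get_paginator("describe_stack_events")
    events = []
    for page in paginator.paginate(StackName=stack_name):
        events.extend(page["StackEvents"])
    return events


# Hypothetical sample: A has completed, B is still being created.
sample_events = [
    {"LogicalResourceId": "A", "ResourceStatus": "CREATE_COMPLETE"},
    {"LogicalResourceId": "B", "ResourceStatus": "CREATE_IN_PROGRESS"},
    {"LogicalResourceId": "A", "ResourceStatus": "CREATE_IN_PROGRESS"},
]
print(unfinished_resources(sample_events))  # ['B']
```

Against a real stack, unfinished_resources(fetch_events("NeptuneQuickStart")) should return an empty list once everything has been created.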

Once you see the CREATE_COMPLETE status in the Stacks panel on the left, your stack creation is complete.
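If you would rather not watch the console, boto3 offers a waiter for this status. The sketch below assumes you already have a CloudFormation client; the function name wait_for_stack and the polling values are my own choices, not something from the template.

```python
def wait_for_stack(client, stack_name, delay_seconds=30, max_attempts=120):
    """Block until the stack reaches CREATE_COMPLETE, then return its status.

    `client` is a boto3 CloudFormation client. The waiter raises an
    exception if creation fails or the attempts run out.
    """
    waiter = client.get_waiter("stack_create_complete")
    waiter.wait(
        StackName=stack_name,
        WaiterConfig={"Delay": delay_seconds, "MaxAttempts": max_attempts},
    )
    stack = client.describe_stacks(StackName=stack_name)["Stacks"][0]
    return stack["StackStatus"]


# Example usage (needs boto3 and AWS credentials):
#   import boto3
#   client = boto3.client("cloudformation", region_name="us-east-2")
#   print(wait_for_stack(client, "NeptuneQuickStart"))
```

With these values the call polls every 30 seconds for up to an hour, which is a guess at a comfortable upper bound, not a measured creation time.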

Thanks for bearing with me through this post. This was the journey of setting up the Neptune stack. In the next blog post, we will access Neptune with Gremlin queries from the EC2 instance, and then populate data to create the Knowledge Graph.

References:

i) https://us-east-2.console.aws.amazon.com/cloudformation/home?region=us-east-2#/stacks/create/template?stackName=NeptuneQuickStart&templateURL=https://s3.amazonaws.com/aws-neptune-customer-samples/v2/cloudformation-templates/neptune-full-stack-nested-template.json

ii) https://aws.amazon.com/neptune/
