Practical-7
Aim:- Basic Graph Queries and Implementations with Neo4j.
Neo4j
Cypher Query Language
Cypher is Neo4j’s graph query language that lets you retrieve data from the graph. It is like
SQL for graphs, and was inspired by SQL, so it lets you focus on what data you want out of
the graph (not how to go get it). Its similarity to SQL and its intuitive, visual syntax make it
relatively easy to learn.
Cypher is unique because it provides a visual way of matching patterns and relationships.
Cypher was inspired by an ASCII-art type of syntax, as in (nodes)-[:ARE_CONNECTED_TO]->(otherNodes),
which uses rounded brackets for circular (nodes) and -[:ARROWS]-> for relationships. When you
write a query, you draw a graph pattern through your data.
Neo4j users use Cypher to construct expressive and efficient queries to do any kind of create,
read, update, or delete (CRUD) operation on their graph, and Cypher is the primary interface for Neo4j.
Once you start Neo4j, you can run the :play cypher command inside Neo4j Browser to get
started.
Neo4j’s developer pages cover the basics of the language by topic area, starting with basic
material and building up towards more complex material.
Cypher provides first-class support for a number of data types. These fall into the following
categories:
Property types: Integer, Float, String, Boolean, Point, Date, Time, LocalTime, DateTime,
LocalDateTime, and Duration.
Structural types: Node, Relationship, and Path.
Composite types: List and Map.
In Cypher, comments start with two forward slashes (//). Everything after the slashes up to the
end of the line is ignored, so a comment can occupy a whole line or follow a clause on the same
line; comments are typically used to explain syntax or query functionality.
In Cypher, nodes are represented by enclosing them in parentheses, mirroring the circles used
for nodes in the visual graph model. Nodes, which signify data entities, are identified by finding
nouns or objects in the data model. For instance, in a small example graph about Sally, John,
graphs, and Neo4j, (Sally), (John), (Graphs), and (Neo4j) are nodes.
1. Node Variables:
- Nodes in Cypher can be assigned variables, such as (p) for person or (t) for thing. These
variables function similarly to programming language variables, allowing you to reference the
nodes by the assigned name later in the query.
2. Node Labels:
- Node labels in Cypher, similar to tags in the property graph data model, help group similar
nodes together. Labels, like Person, Technology, and Company, act as identifiers, aiding in
specifying certain types of entities to look for or create. Using node labels in queries helps
Cypher optimize execution and distinguish between different entities.
() //anonymous node (no label or variable) can refer to any node in the database
(p:Person) //using variable p and label Person
(:Technology) //no variable, label Technology
(work:Company) //using variable work and label Company
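To make the difference between an anonymous node and a labelled node concrete, here is a minimal Python model of single-node pattern matching. It does not talk to Neo4j; the in-memory `NODES` list and the `match_nodes` helper are hypothetical and only mimic how `()` matches every node while `(:Label)` keeps only the nodes that carry that label.

```python
# Hypothetical in-memory stand-in for a graph: each node has a set of labels and a property map.
NODES = [
    {"labels": {"Person"}, "props": {"name": "Sally"}},
    {"labels": {"Person"}, "props": {"name": "John"}},
    {"labels": {"Technology"}, "props": {"type": "Graphs"}},
    {"labels": {"Company"}, "props": {"name": "Neo4j"}},
]


def match_nodes(nodes, label=None):
    """Mimic a single-node pattern: () when label is None, otherwise (:label)."""
    if label is None:
        return list(nodes)  # an anonymous node pattern matches every node
    return [node for node in nodes if label in node["labels"]]


print(len(match_nodes(NODES)))  # 4
print([n["props"]["name"] for n in match_nodes(NODES, "Person")])  # ['Sally', 'John']
```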
In Cypher, relationships are denoted by arrows (--> or <--) between nodes, resembling the
connecting lines in the visual representation. Relationship types and properties can be specified
in square brackets within the arrow. Every relationship stored in Neo4j has a direction, but a
query can ignore it: two dashes without an arrowhead (--) match the relationship in either
direction, which is useful when you are not sure how the data was stored.
// data stored with this direction
CREATE (p:Person)-[:LIKES]->(t:Technology);
// querying the relationship backwards will not return results
MATCH (p:Person)<-[:LIKES]-(t:Technology)
RETURN p, t;
// better to query with an undirected pattern unless sure of the direction
MATCH (p:Person)-[:LIKES]-(t:Technology)
RETURN p, t;
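The effect of the arrow direction can be sketched in plain Python, assuming relationships are stored as hypothetical `(start, type, end)` tuples. The `match_rel` helper is an illustration, not Neo4j code: a directed pattern binds the left node only to the start (or end) of the stored relationship, while the undirected form tries both orientations.

```python
# Hypothetical data: every stored relationship keeps its direction as (start, type, end).
LABELS = {"Sally": "Person", "Graphs": "Technology"}
RELATIONSHIPS = [("Sally", "LIKES", "Graphs")]


def match_rel(rel_type, left_label, right_label, direction):
    """Mimic (left)-[:TYPE]->(right) for '->', (left)<-[:TYPE]-(right) for '<-',
    and (left)-[:TYPE]-(right) for '-'. Returns (left, right) bindings."""
    results = []
    for start, rtype, end in RELATIONSHIPS:
        if rtype != rel_type:
            continue
        candidates = []
        if direction in ("->", "-"):
            candidates.append((start, end))  # left node is the start node
        if direction in ("<-", "-"):
            candidates.append((end, start))  # left node is the end node
        for left, right in candidates:
            if LABELS[left] == left_label and LABELS[right] == right_label:
                results.append((left, right))
    return results


print(match_rel("LIKES", "Person", "Technology", "->"))  # [('Sally', 'Graphs')]
print(match_rel("LIKES", "Person", "Technology", "<-"))  # []
print(match_rel("LIKES", "Person", "Technology", "-"))   # [('Sally', 'Graphs')]
```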
1. Relationship Types:
- Relationship types in Cypher categorize connections between nodes, providing meaning to
the relationships similar to how labels group nodes. Good naming conventions using verbs or
actions are recommended for clarity and readability in Cypher queries.
2. Relationship Variables:
- Like nodes, relationships in Cypher can be assigned variables such as [r] or [rel]. These
variables, whether short or expressive like [likes] or [knows], allow referencing the relationship
later in a query. Anonymous relationships can be specified with two dashes (--, -->, <--) if they
are not needed for reference.
In Cypher, node and relationship properties are represented using curly braces within the
parentheses for nodes and within the square brackets for relationships. For example, a node
property is expressed as `(p:Person {name: 'Sally'})`, and a relationship property is denoted as
`-[rel:IS_FRIENDS_WITH {since: 2018}]->`.
In Cypher, patterns are composed of nodes and relationships, expressing the fundamental
structure of graph data. Patterns can range from simple to intricate, and in Cypher, they are
articulated by combining node and relationship syntax, such as
`(p:Person {name: "Sally"})-[rel:LIKES]->(g:Technology {type: "Graphs"})`.
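A minimal Python sketch of how such a pattern is evaluated, assuming a hypothetical in-memory graph keyed by node id: a relationship matches only if its type and properties match and both end nodes satisfy their label and property constraints. The helper names are illustrative only.

```python
# Hypothetical in-memory graph keyed by node id.
NODES = {
    "n1": {"labels": {"Person"}, "props": {"name": "Sally"}},
    "n2": {"labels": {"Technology"}, "props": {"type": "Graphs"}},
    "n3": {"labels": {"Person"}, "props": {"name": "John"}},
}
RELS = [
    {"start": "n1", "type": "LIKES", "end": "n2", "props": {}},
    {"start": "n1", "type": "IS_FRIENDS_WITH", "end": "n3", "props": {"since": 2018}},
]


def props_match(actual, required):
    """True if every required property is present with the same value."""
    return all(k in actual and actual[k] == v for k, v in required.items())


def node_matches(node_id, label, props):
    node = NODES[node_id]
    return label in node["labels"] and props_match(node["props"], props)


def match_pattern(a_label, a_props, rel_type, rel_props, b_label, b_props):
    """Mimic (a:A {..})-[r:TYPE {..}]->(b:B {..}); returns (start, type, end) triples."""
    return [
        (r["start"], r["type"], r["end"])
        for r in RELS
        if r["type"] == rel_type
        and props_match(r["props"], rel_props)
        and node_matches(r["start"], a_label, a_props)
        and node_matches(r["end"], b_label, b_props)
    ]


print(match_pattern("Person", {"name": "Sally"}, "LIKES", {},
                    "Technology", {"type": "Graphs"}))
# [('n1', 'LIKES', 'n2')]
```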
Creating Data with CREATE Clause:
- In Cypher, the `CREATE` clause is used to add data by specifying patterns representing
graph structures, labels, and properties. For example, `CREATE (:Movie {title: 'The Matrix',
released: 1999})` creates a movie node with specified properties.
- To return created data, the `RETURN` clause is added, referencing variables assigned to
pattern elements. For instance, `CREATE (p:Person {name: 'Keanu Reeves'}) RETURN p`
creates a person node and returns it in the result.
Matching Patterns with MATCH Clause:
- The `MATCH` clause is used for finding patterns in the graph. It enables specifying
patterns similar to `CREATE` but focuses on identifying existing data. For example, `MATCH
(m:Movie) RETURN m` finds all movie nodes.
- The `MERGE` clause combines elements of `MATCH` and `CREATE`, ensuring
uniqueness by checking for existing data before creating. It's useful for creating or matching
nodes and relationships. For instance, `MERGE (m:Movie {title: 'Cloud Atlas'}) ON CREATE
SET m.released = 2012 RETURN m` merges or creates a movie node and returns it.
- Return values can be aliased for better readability using the `AS` keyword. An alias that
contains spaces must be wrapped in backticks, not quotes. For example, the following clause
provides cleaner and more informative result labels:
RETURN tom.name AS name, tom.born AS `Year Born`
Filtering results
The WHERE clause refines results by keeping only the subsets of data that satisfy boolean
expressions, predicates, and comparisons, which can be combined with the logical operators
AND, OR, XOR, and NOT.
MATCH (m:Movie)
WHERE m.title = 'The Matrix'
RETURN m
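The filtering step can be modelled in Python as a predicate applied to each row. This is a sketch over hypothetical movie rows; it ignores Cypher's handling of `null` (where a comparison with `null` is neither true nor false) and only shows how AND, NOT, and XOR combine conditions.

```python
# Hypothetical rows standing in for (m:Movie) nodes.
MOVIES = [
    {"title": "The Matrix", "released": 1999},
    {"title": "Cloud Atlas", "released": 2012},
    {"title": "The Matrix Reloaded", "released": 2003},
]


def where(rows, predicate):
    """Keep only the rows for which the predicate is true, like a WHERE clause."""
    return [row for row in rows if predicate(row)]


def titles(rows):
    return [row["title"] for row in rows]


# WHERE m.title = 'The Matrix'
exact = titles(where(MOVIES, lambda m: m["title"] == "The Matrix"))
# WHERE m.released > 2000 AND NOT m.title = 'Cloud Atlas'
recent = titles(where(MOVIES, lambda m: m["released"] > 2000 and not m["title"] == "Cloud Atlas"))
# WHERE m.title STARTS WITH 'The' XOR m.released > 2000  (exactly one side is true)
either = titles(where(MOVIES, lambda m: m["title"].startswith("The") != (m["released"] > 2000)))

print(exact)   # ['The Matrix']
print(recent)  # ['The Matrix Reloaded']
print(either)  # ['The Matrix', 'Cloud Atlas']
```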
1. Inserting Data:
- Use the `CREATE` keyword to add nodes and relationships to Neo4j.
- `CREATE` always creates the whole pattern, even if identical data already exists. To connect
existing nodes without duplicating them, find them with `MATCH` first (or use `MERGE`).
CREATE (j:Person {name: 'Jennifer'})-[rel:IS_FRIENDS_WITH]->(m:Person {name: 'Mark'})
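The difference between a blind `CREATE` and matching first can be sketched with a hypothetical in-memory store. Running the blind version twice duplicates both people; the match-first version reuses the existing nodes, although it still adds a new relationship on every run (avoiding that as well is what `MERGE` is for).

```python
# Hypothetical in-memory store: a list of nodes and a list of relationships.
def new_graph():
    return {"nodes": [], "rels": []}


def create_node(graph, label, name):
    """CREATE (:label {name: ...}) always appends a new node and returns its index."""
    graph["nodes"].append({"label": label, "name": name})
    return len(graph["nodes"]) - 1


def find_node(graph, label, name):
    """MATCH (:label {name: ...}) returns the index of the first match, or None."""
    for i, node in enumerate(graph["nodes"]):
        if node["label"] == label and node["name"] == name:
            return i
    return None


def create_friendship(graph):
    """Blind CREATE of the whole pattern: new Jennifer, new Mark, new relationship."""
    j = create_node(graph, "Person", "Jennifer")
    m = create_node(graph, "Person", "Mark")
    graph["rels"].append((j, "IS_FRIENDS_WITH", m))


def match_then_create_friendship(graph):
    """MATCH the existing people first, then CREATE only the relationship."""
    j = find_node(graph, "Person", "Jennifer")
    m = find_node(graph, "Person", "Mark")
    if j is None or m is None:
        return  # MATCH found nothing, so the CREATE that follows does not run
    graph["rels"].append((j, "IS_FRIENDS_WITH", m))


g = new_graph()
create_friendship(g)
create_friendship(g)  # running the same CREATE again duplicates both people
print(len(g["nodes"]))  # 4

h = new_graph()
create_friendship(h)
match_then_create_friendship(h)  # reuses the existing nodes
print(len(h["nodes"]))  # 2
```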
2. Updating Data:
- Modify node properties using `SET` after a `MATCH` statement.
- Update relationship properties similarly by specifying the relationship in the `MATCH`
clause.
MATCH (p:Person {name: 'Jennifer'})
SET p.birthdate = date('1980-01-01')
RETURN p
3. Deleting Data:
- Delete relationships with `DELETE` after a `MATCH` specifying the relationship.
MATCH (j:Person {name: 'Jennifer'})-[r:IS_FRIENDS_WITH]->(m:Person {name: 'Mark'})
DELETE r
- Delete nodes that have no relationships using `DELETE` after a `MATCH` specifying the node.
If the node still has relationships, `DELETE` fails with an error.
MATCH (m:Person {name: 'Mark'})
DELETE m
- Use `DETACH DELETE` to delete a node along with its relationships.
MATCH (m:Person {name: 'Mark'})
DETACH DELETE m
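The rule that `DELETE` refuses to remove a connected node, while `DETACH DELETE` removes the relationships first, can be modelled in a few lines. `TinyGraph` is a hypothetical class used only for illustration.

```python
class TinyGraph:
    """Hypothetical in-memory graph used only to mimic DELETE semantics."""

    def __init__(self):
        self.nodes = set()
        self.rels = []  # (start, type, end) tuples

    def delete_node(self, node):
        # DELETE n: refuse to remove a node that still has relationships
        if any(node in (start, end) for start, _, end in self.rels):
            raise ValueError(f"cannot delete {node!r}: it still has relationships")
        self.nodes.discard(node)

    def delete_rel(self, rel):
        # DELETE r: remove only the relationship
        self.rels.remove(rel)

    def detach_delete_node(self, node):
        # DETACH DELETE n: drop the node's relationships first, then the node
        self.rels = [r for r in self.rels if node not in (r[0], r[2])]
        self.nodes.discard(node)


g = TinyGraph()
g.nodes |= {"Jennifer", "Mark"}
g.rels.append(("Jennifer", "IS_FRIENDS_WITH", "Mark"))
try:
    g.delete_node("Mark")
except ValueError as err:
    print(err)  # cannot delete 'Mark': it still has relationships
g.detach_delete_node("Mark")
print(sorted(g.nodes), g.rels)  # ['Jennifer'] []
```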
4. Deleting Properties:
- Remove a property with `REMOVE`, or by setting it to `null` with `SET`. Neo4j does not store
null property values, so both statements have the same effect.
//delete property using REMOVE keyword
MATCH (n:Person {name: 'Jennifer'})
REMOVE n.birthdate
//delete property with SET to null value
MATCH (n:Person {name: 'Jennifer'})
SET n.birthdate = null
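Because a property set to `null` is simply not stored, the two statements above end in the same state. A minimal Python model of a property map, with hypothetical helper names, makes that equivalence explicit.

```python
def remove_property(props, key):
    """REMOVE n.key: drop the key if present."""
    props.pop(key, None)


def set_property(props, key, value):
    """SET n.key = value; a None (null) value removes the key, since nulls are not stored."""
    if value is None:
        props.pop(key, None)
    else:
        props[key] = value


a = {"name": "Jennifer", "birthdate": "1980-01-01"}
b = dict(a)
remove_property(a, "birthdate")
set_property(b, "birthdate", None)
print(a == b == {"name": "Jennifer"})  # True
```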
5. Avoiding Duplicate Data with MERGE:
- Use `MERGE` to perform a "select-or-insert" operation for nodes and relationships.
- `MERGE` checks for the entire pattern's existence and creates it if not found.
MERGE (mark:Person {name: 'Mark'})
RETURN mark
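The "entire pattern" rule has a practical consequence that a small in-memory model can show. In this sketch (hypothetical helpers, not the Neo4j API), merging a single node twice reuses it, but merging a whole path whose relationship does not exist yet creates every element of the path, including new Person nodes, even though Jennifer and Mark already exist. This is why binding the nodes with `MATCH` first and merging only the relationship is the safer form.

```python
# Hypothetical in-memory store used only to mimic MERGE semantics.
nodes = []  # each node: {"label": ..., "name": ...}
rels = []   # each relationship: (start_index, type, end_index)


def find(label, name):
    """Return the index of a matching node, or None."""
    for i, node in enumerate(nodes):
        if node["label"] == label and node["name"] == name:
            return i
    return None


def merge_node(label, name):
    """MERGE (:label {name: ...}): reuse a matching node or create one."""
    i = find(label, name)
    if i is None:
        nodes.append({"label": label, "name": name})
        i = len(nodes) - 1
    return i


def merge_path(a_name, rel_type, b_name):
    """MERGE (:Person {name: a})-[:TYPE]->(:Person {name: b}) as one unit:
    if the complete path is not found, every element of it is created."""
    for start, rtype, end in rels:
        if (rtype == rel_type
                and nodes[start] == {"label": "Person", "name": a_name}
                and nodes[end] == {"label": "Person", "name": b_name}):
            return start, end  # the whole pattern exists, so nothing is created
    nodes.append({"label": "Person", "name": a_name})
    nodes.append({"label": "Person", "name": b_name})
    start, end = len(nodes) - 2, len(nodes) - 1
    rels.append((start, rel_type, end))
    return start, end


merge_node("Person", "Mark")
merge_node("Person", "Mark")       # the second MERGE finds the existing node
merge_node("Person", "Jennifer")
print(len(nodes))  # 2
merge_path("Jennifer", "IS_FRIENDS_WITH", "Mark")
print(len(nodes))  # 4: a second Jennifer and a second Mark were created
```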
6. Handling MERGE Criteria:
- Utilize `ON CREATE SET` and `ON MATCH SET` to specify actions during node or
relationship creation or matching.
- This helps initialize properties when creating and update properties when matching.
MATCH (j:Person {name: 'Jennifer'})
MATCH (m:Person {name: 'Mark'})
MERGE (j)-[r:IS_FRIENDS_WITH]->(m)
  ON CREATE SET r.since = date()
  ON MATCH SET r.updated = date()
RETURN j, r, m
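The branching of `ON CREATE SET` and `ON MATCH SET` can be sketched in Python for a relationship between two already-bound nodes. The `merge_rel` helper and the placeholder strings "run 1" and "run 2" (standing in for `date()`) are hypothetical; the point is that the first run takes the create branch and every later run takes the match branch.

```python
# Hypothetical relationship store keyed by (start, type, end); values are property maps.
rels = {}


def merge_rel(start, rel_type, end, on_create=None, on_match=None):
    """Mimic MERGE (start)-[r:TYPE]->(end) ON CREATE SET ... ON MATCH SET ...
    for two nodes already bound by MATCH; returns (props, created)."""
    key = (start, rel_type, end)
    if key in rels:
        rels[key].update(on_match or {})  # ON MATCH SET runs only for an existing match
        return rels[key], False
    rels[key] = dict(on_create or {})     # ON CREATE SET runs only when creating
    return rels[key], True


first = merge_rel("Jennifer", "IS_FRIENDS_WITH", "Mark",
                  on_create={"since": "run 1"}, on_match={"updated": "run 1"})
second = merge_rel("Jennifer", "IS_FRIENDS_WITH", "Mark",
                   on_create={"since": "run 2"}, on_match={"updated": "run 2"})
print(first[1], second[1])  # True False
print(rels[("Jennifer", "IS_FRIENDS_WITH", "Mark")])
# {'since': 'run 1', 'updated': 'run 2'}
```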
Graph algorithms and their applications
Graph algorithms provide one of the most potent approaches to analyzing connected data
because their mathematical calculations are specifically built to operate on relationships. They
describe steps to be taken to process a graph to discover its general qualities or specific
quantities.
The Neo4j Graph Data Science (GDS) library contains implementations for the following types of algorithms:
Path Finding - these algorithms help find the shortest path or evaluate the availability
and quality of routes
Centrality - these algorithms determine the importance of distinct nodes in a network
Community Detection - these algorithms evaluate how a group is clustered or
partitioned, as well as its tendency to strengthen or break apart
Similarity - these algorithms help calculate the similarity of nodes
Topological link prediction - these algorithms determine the closeness of pairs of nodes
Node Embeddings - these algorithms compute vector representations of nodes in a
graph.
Node Classification - this algorithm uses machine learning to predict the classification
of nodes.
Link prediction - these algorithms use machine learning to predict new links between
pairs of nodes
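To give a feel for what two of these categories compute, here is a plain-Python sketch of path finding (breadth-first search, which finds a fewest-hops path in an unweighted graph) and of degree centrality (counting each node's relationships). It does not call the GDS library, whose procedures run inside Neo4j; the graph is a hypothetical adjacency list.

```python
from collections import deque

# Hypothetical undirected graph as an adjacency list.
GRAPH = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}


def shortest_path(graph, source, target):
    """Path finding: breadth-first search returns a fewest-hops path, or None."""
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:  # walk back to the source
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in graph[node]:
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


def degree_centrality(graph):
    """Centrality: score each node by its number of relationships."""
    return {node: len(neighbours) for node, neighbours in graph.items()}


print(shortest_path(GRAPH, "A", "E"))  # ['A', 'B', 'D', 'E']
centrality = degree_centrality(GRAPH)
print(max(centrality, key=centrality.get))  # D
```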
Neo4j optimization techniques
Memory Configuration Guidelines:
OS Memory Sizing:
- Reserve around 1GB for non-Neo4j server activities.
- Avoid exceeding available RAM to prevent OS swapping, which impacts performance.
Page Cache Sizing:
- Utilize the page cache to cache Neo4j data stored on disk.
- Estimate page cache size by summing the sizes of relevant database files and adding a
growth factor.
- Configure the page cache size with `dbms.memory.pagecache.size` in `neo4j.conf`. If it is not
set, Neo4j uses a heuristic of 50% of the available RAM minus the maximum heap size.
Heap Sizing:
- Configure a sufficiently large heap space for concurrent operations (8G to 16G is often
adequate).
- Adjust heap size using parameters `dbms.memory.heap.initial_size` and
`dbms.memory.heap.max_size` in `neo4j.conf`.
- Set these parameters to the same size to avoid unwanted full garbage collection pauses.
*(Refer to the Neo4j Operations Manual for detailed discussions on heap memory
configuration, distribution, and garbage collection tuning.)*
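The sizing guidelines above can be combined into a small calculator. This is a hypothetical helper, not a Neo4j tool: the 1.2 growth factor, the 8 GB heap default, and the store file sizes are assumptions, and the setting names follow the Neo4j 4.x naming used in this section (`dbms.memory.pagecache.size` is assumed for the page cache). It reserves memory for the OS, gives the heap a fixed equal initial and maximum size, and caps the page cache so that the total never exceeds physical RAM.

```python
def plan_memory(total_ram_gb, store_file_sizes_gb, heap_gb=8, os_reserve_gb=1,
                growth_factor=1.2):
    """Hypothetical sizing helper: OS reserve + heap + page cache must fit in RAM
    so that the OS does not start swapping."""
    wanted_page_cache = sum(store_file_sizes_gb) * growth_factor
    available = total_ram_gb - os_reserve_gb - heap_gb
    if available <= 0:
        raise ValueError("not enough RAM for the OS reserve and the heap")
    page_cache_gb = min(wanted_page_cache, available)
    return {
        "dbms.memory.heap.initial_size": f"{heap_gb}g",
        "dbms.memory.heap.max_size": f"{heap_gb}g",  # same as the initial size
        "dbms.memory.pagecache.size": f"{round(page_cache_gb, 1):g}g",
    }


# 32 GB server with 15 GB of store files: 15 * 1.2 = 18 GB fits in the 23 GB left over.
for key, value in plan_memory(32, [10, 5]).items():
    print(f"{key}={value}")
```

The printed lines have the `key=value` shape used in `neo4j.conf`, so they could be reviewed and copied there by hand.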
Logical Logs:
- Logical transaction logs are crucial for recovery after an unclean shutdown and incremental
backups.
- Log files are rotated after reaching a specified size (e.g., 25 MB).
- Configure log retention policy using the `dbms.tx_log.rotation.retention_policy` parameter
(recommended: 7 days).
Number of Open Files:
- The default open file limit of 1024 may be insufficient, especially with multiple indexes or
high connection volumes.
- Increase the limit to a practical value (e.g., 40000) based on usage patterns.
- Adjust the system-wide open file limit following platform-specific instructions; for the current
shell session only, a command such as `ulimit -n 40000` can raise the limit.
Real-world graph database scenarios
Example #1: Using Neo4j to determine customer preferences
Suppose we need to learn preferences of our customers to create a promotional offer for a
specific product category, such as notebooks. First, Neo4j allows us to quickly obtain a list of
notebooks that customers have viewed or added to their wish lists. We can use this code to
select all such notebooks:
MATCH (:Customer)-[:ADDED_TO_WISH_LIST|VIEWED]->(notebook:Product)-[:IS_IN]->(:Category {title: 'Notebooks'})
RETURN DISTINCT notebook;
Now that we have a list of notebooks, we can easily include them in a promotional offer. Let’s
make a few modifications to the code above:
CREATE (offer:PromotionalOffer {type: 'discount_offer', content: 'Notebooks discount offer...'})
WITH offer
MATCH (:Customer)-[:ADDED_TO_WISH_LIST|VIEWED]->(notebook:Product)-[:IS_IN]->(:Category {title: 'Notebooks'})
MERGE (offer)-[:USED_TO_PROMOTE]->(notebook);
We can track the changes in the graph with the following query:
MATCH (offer:PromotionalOffer)-[:USED_TO_PROMOTE]->(product:Product)
RETURN offer, product;
There is no need to link a promotional offer directly to specific customers, because the graph
structure lets us reach them easily through the promoted products. For example, we can collect
emails for a newsletter by following the relationships from the products in our promotional offer
to the customers interested in them.
When creating a promotional offer, it’s important to know what products customers have
viewed or added to their wish lists. We can find out with this query:
MATCH (offer:PromotionalOffer {type: 'discount_offer'})-[:USED_TO_PROMOTE]->(product:Product)<-[:ADDED_TO_WISH_LIST|VIEWED]-(customer:Customer)
RETURN offer, product, customer;
This example is simple, and we could have implemented the same functionality in a relational
database. But our goal is to show the intuitiveness of Cypher and to demonstrate how simple
it is to write queries in Neo4j.
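The two traversals of this example (notebooks of interest, then the interested customers' emails) can be mirrored in Python with set comprehensions. All products, customers, and emails below are hypothetical sample data, not the dataset behind the article's queries; note that a `BOUGHT` relationship is not counted as interest by the pattern.

```python
# Hypothetical sample data.
CATEGORY = {"Notebook A": "Notebooks", "Notebook B": "Notebooks", "Phone C": "Phones"}
CUSTOMER_EMAIL = {"ann": "ann@example.com", "bob": "bob@example.com", "cid": "cid@example.com"}
INTERACTIONS = [  # (customer, relationship type, product)
    ("ann", "VIEWED", "Notebook A"),
    ("bob", "ADDED_TO_WISH_LIST", "Notebook A"),
    ("bob", "VIEWED", "Notebook B"),
    ("cid", "BOUGHT", "Notebook B"),
    ("cid", "VIEWED", "Phone C"),
]
INTEREST = {"ADDED_TO_WISH_LIST", "VIEWED"}

# (:Customer)-[:ADDED_TO_WISH_LIST|VIEWED]->(notebook)-[:IS_IN]->(:Category {title: 'Notebooks'})
promoted = sorted({p for _, rel, p in INTERACTIONS
                   if rel in INTEREST and CATEGORY[p] == "Notebooks"})

# (product)<-[:ADDED_TO_WISH_LIST|VIEWED]-(customer): whom to email about the offer
emails = sorted({CUSTOMER_EMAIL[c] for c, rel, p in INTERACTIONS
                 if rel in INTEREST and p in promoted})

print(promoted)  # ['Notebook A', 'Notebook B']
print(emails)    # ['ann@example.com', 'bob@example.com']
```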
Example #2: Using Neo4j to devise promotional offers
Now let’s imagine that we need to develop a more efficient promotional campaign. To
increase conversion rates, we should offer alternative products to our customers. For
example, if a customer shows interest in a certain product but doesn’t buy it, we can create a
promotional offer that contains alternative products.
To show how this works, let’s create a promotional offer for a specific customer:
MATCH (alex:Customer {name: 'Alex McGyver'})
MATCH (free_product:Product)
WHERE NOT ((alex)-->(free_product))
MATCH (product:Product)
WHERE ((alex)-->(product))
MATCH (free_product)-[:IS_IN]->()<-[:IS_IN]-(product)
// keep only products priced within 20 percent of a product Alex interacted with
WHERE free_product.price >= product.price * 0.80
  AND free_product.price <= product.price * 1.20
RETURN DISTINCT free_product;
This query first finds products that have no ADDED_TO_WISH_LIST, VIEWED, or BOUGHT
relationship with the customer named Alex McGyver. The next MATCH does the opposite and
finds all products that Alex McGyver has viewed, added to his wish list, or bought.
To narrow down the recommendations, the two sets must contain products in the same
category. Finally, only products whose price is within 20 percent (above or below) of a
product Alex has interacted with are recommended to the customer.
Now let’s check if this query works correctly.
The product variable is supposed to contain the following items:
Xiaomi Mi Mix 2 (price: $420.87). Price range for recommendations: from $336.70 to $505.04.
Sony Xperia XA1 Dual G3112 (price: $229.50). Price range for recommendations: from
$183.60 to $275.40.
The free_product variable is expected to have these items:
Apple iPhone 8 Plus 64GB (price: $874.20)
Huawei P8 Lite (price: $191.00)
Samsung Galaxy S8 (price: $784.00)
Sony Xperia Z22 (price: $765.00)
Note that both the product and free_product variables contain items that belong to the same
category, which means that the -[:IS_IN]->()<-[:IS_IN]- constraint has worked.
As you can see, none of the products except for the Huawei P8 Lite fits in the price range for
recommendations, so only the P8 Lite will be shown on the recommendations list after the
query is executed.
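The price check above can be reproduced with the prices quoted in the example. This Python sketch applies the same 20 percent window as the corrected WHERE clause (bounds are not rounded when filtering, only when printed); the helper names are illustrative.

```python
# Prices quoted in the example above.
ALEX_PRODUCTS = {"Xiaomi Mi Mix 2": 420.87, "Sony Xperia XA1 Dual G3112": 229.50}
CANDIDATES = {
    "Apple iPhone 8 Plus 64GB": 874.20,
    "Huawei P8 Lite": 191.00,
    "Samsung Galaxy S8": 784.00,
    "Sony Xperia Z22": 765.00,
}


def price_range(price, tolerance=0.20):
    """Bounds of the WHERE clause: price * (1 - tolerance) to price * (1 + tolerance)."""
    return price * (1 - tolerance), price * (1 + tolerance)


def recommend(own, candidates, tolerance=0.20):
    """Union of candidates that fall inside the window of any product Alex interacted with."""
    picks = set()
    for price in own.values():
        low, high = price_range(price, tolerance)
        picks |= {name for name, p in candidates.items() if low <= p <= high}
    return sorted(picks)


for name, price in ALEX_PRODUCTS.items():
    print(name, [round(bound, 2) for bound in price_range(price)])
print(recommend(ALEX_PRODUCTS, CANDIDATES))  # ['Huawei P8 Lite']
```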
Now we can create our promotional offer. It’s going to be different from the previous one
(personal_replacement_offer instead of discount_offer), and this time we’re going to store a
customer’s email as a property of the USED_TO_PROMOTE relationship as the products
contained in the free_product variable aren’t connected to specific customers. Here’s the full
code for the promotional offer:
MATCH (alex:Customer {name: 'Alex McGyver'})
MATCH (free_product:Product)
WHERE NOT ((alex)-->(free_product))
MATCH (product:Product)
WHERE ((alex)-->(product))
MATCH (free_product)-[:IS_IN]->()<-[:IS_IN]-(product)
WHERE free_product.price >= product.price * 0.80
  AND free_product.price <= product.price * 1.20
// collect the candidates first so that a single offer node is created
WITH alex, collect(DISTINCT free_product) AS free_products
CREATE (offer:PromotionalOffer {type: 'personal_replacement_offer',
  content: 'Personal replacement offer for ' + alex.name})
WITH offer, free_products, alex
UNWIND free_products AS free_product
MERGE (offer)-[rel:USED_TO_PROMOTE {email: alex.email}]->(free_product)
RETURN offer, free_product, rel;
Let’s take a look at the result of this query:
(Figure: the query result in the form of a graph)
(Figure: the query result in the form of a table)
Example #3: Building a recommendation system with Neo4j
The Neo4j database proves useful for building a recommendation system.
Imagine we want to recommend products to Alex McGyver according to his interests. Neo4j
allows us to easily track the products Alex is interested in and find other customers who also
have expressed interest in these products. Afterward, we can check out these customers’
preferences and suggest new products to Alex.
First, let’s take a look at all customers and the products they’ve viewed, added to their wish
lists, and bought:
MATCH (customer:Customer)-->(product:Product)
RETURN customer, product;
As you can see, Alex has two touch points with other customers: the Sony Xperia XA1 Dual
G3112 (purchased by Allison York) and the Nikon D7500 Kit 18–105mm VR (viewed by Joe
Baxton). Therefore, in this particular case, our product recommendation system should offer
to Alex those products that Allison and Joe are interested in, excluding the products Alex is
already interested in. We can implement this simple recommendation system with the help of
the following query:
MATCH (alex:Customer {name: 'Alex McGyver'})-->(product:Product)<--(customer:Customer)
WHERE customer <> alex
MATCH (customer)-->(customer_product:Product)
WHERE NOT ((alex)-->(customer_product))
RETURN DISTINCT customer, customer_product;
We can further improve this recommendation system by adding new conditions, but the
takeaway is that Neo4j helps you build such systems quickly and easily.
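The same collaborative-filtering idea can be sketched in Python with sets. The customer names and the shared products come from the example; "Product X", "Product Y", "Product Z", and "Customer C" are hypothetical placeholders added for illustration. A customer contributes recommendations only if they share at least one product with Alex, and products Alex has already interacted with are excluded.

```python
# Customer names and shared products come from the example above;
# "Product X/Y/Z" and "Customer C" are hypothetical placeholders.
INTERACTIONS = {
    "Alex McGyver": {"Xiaomi Mi Mix 2", "Sony Xperia XA1 Dual G3112", "Nikon D7500 Kit 18–105mm VR"},
    "Allison York": {"Sony Xperia XA1 Dual G3112", "Product X"},
    "Joe Baxton": {"Nikon D7500 Kit 18–105mm VR", "Product Y"},
    "Customer C": {"Product Z"},
}


def recommend_for(customer, interactions):
    """Find customers who share a product with `customer`, then return their
    other products that `customer` has not interacted with yet."""
    own = interactions[customer]
    recommendations = set()
    for other, products in interactions.items():
        if other == customer or not own & products:
            continue  # skip the customer themself and customers with no shared product
        for product in products - own:  # exclude products the customer already knows
            recommendations.add((other, product))
    return sorted(recommendations)


print(recommend_for("Alex McGyver", INTERACTIONS))
# [('Allison York', 'Product X'), ('Joe Baxton', 'Product Y')]
```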