Friday, 27 December 2013

Neo4j 2.0 - Indexing

NOTE: this post is an unexpected continuation of yesterday's post on Neo4j. You might want to start there.

DISCLAIMER: I'm no expert on this technology; these are personal notes I'm keeping while I play around and learn.



INDEXING

So I kept going through the slides from Alberto's Neo4j workshop to refresh my memory on its features, until I reached a slide that used indexing:

     START tom=node:node_auto_index(name="Tom Hanks")  
     MATCH (tom)-[:ACTED_IN]->()<-[:DIRECTED]-(director)
     RETURN director.name;

When I ran it, I got a nasty error message:

     Index `node_auto_index` does not exist

The problem is clear: the index does not exist. Since node_auto_index is the automatic index, I assumed Neo4j 2.0 no longer does that indexing magic for you. (As far as I can tell, legacy auto-indexing is still there in 2.0, but the index only exists once node auto-indexing is switched on in conf/neo4j.properties through node_auto_indexing and node_keys_indexable, and it is off by default.) Either way, it was now clear to me that if I wanted an index over actor names I would have to create it myself to reproduce what I had in Neo4j 1.9.x, so I started digging through Google to learn more about indexing in 2.0.0.


Creating and using Indexes

The first hit I checked was a great webinar by Michael Hunger on the new features in Neo4j. As far as indexing goes, what I gathered is that the main difference is that indexes are now real indexes: once created, they are maintained automatically whenever data is added or updated. I deduce from that statement that this wasn't the case in previous versions. The index is also maintained transactionally, meaning index updates are committed together with the data changes that cause them.
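A quick way to watch the "automatically maintained" part at work is to create and change a labelled node and look it up straight away. This is only a sketch: it assumes an index on :Actor(name) already exists, "Example Actor" is a made-up placeholder name, and each statement is meant to be run on its own (for example in the web console). It shows that lookups stay in step with the data without any manual re-indexing; it says nothing about the index internals.

     // Placeholder node, used only for this example.
     CREATE (:Actor {name: "Example Actor"});

     // Found immediately: there is no separate re-indexing step.
     MATCH (a:Actor) WHERE a.name = "Example Actor" RETURN a;

     // Renaming the node updates its index entry in the same transaction...
     MATCH (a:Actor) WHERE a.name = "Example Actor" SET a.name = "Renamed Actor";

     // ...so the old name now finds nothing, while the new one finds the node.
     MATCH (a:Actor) WHERE a.name = "Example Actor" RETURN count(a);
     MATCH (a:Actor) WHERE a.name = "Renamed Actor" RETURN count(a);

     // Clean up the placeholder node (it has no relationships).
     MATCH (a:Actor) WHERE a.name = "Renamed Actor" DELETE a;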

So, to create an index you simply need to:

     CREATE INDEX ON :Actor(name)

This approach really simplifies the queries so that my original Cypher query becomes:

     MATCH (actor:Actor)-[:ACTED_IN]->()<-[:DIRECTED]-(director)
     WHERE actor.name="Tom Hanks"
     RETURN director.name;

which is simpler and closer to what a SQL-John might expect. Under the covers, Neo4j detects that I'm filtering on a property (name) of a labelled node (actor:Actor), finds that there is an index on ':Actor(name)', and automatically tries to use it.
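The same lookup can also be written with an inline property map on the labelled node. A sketch, assuming the movie data links directors to movies through DIRECTED relationships (the relationship type is an assumption about the dataset); as far as I understand, in 2.0 the map is shorthand for the equivalent WHERE filter, so the :Actor(name) index should apply in the same way.

     // Inline map instead of WHERE actor.name = "Tom Hanks".
     // DIRECTED is assumed to be the director-to-movie relationship type.
     MATCH (actor:Actor {name: "Tom Hanks"})-[:ACTED_IN]->()<-[:DIRECTED]-(director)
     RETURN director.name;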




But it is flawed

It turns out that when I tried to create the index with:

     CREATE INDEX ON :Actor(name)

it successfully indexed nothing, because my dataset doesn't use labels. So I tried to index every node by name instead:

     CREATE INDEX ON :(name)
     CREATE INDEX ON :*(name)
     CREATE INDEX ON (name)

That was a total waste of time: none of these is valid syntax, because indexes in Neo4j 2.0.x require a label. (Insert sad face here.)
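To see why the index had nothing to cover, it helps to check which labels the data actually carries. A minimal query for that groups nodes by their label list; in an unlabelled dataset like mine, I'd expect every node to fall under the empty list, [].

     // One row per distinct label combination, with the number of nodes.
     // Nodes without any label show up under [].
     MATCH (n)
     RETURN labels(n), count(*);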




INTRODUCING LABELS!

Back on my quest to query through an index, I realised my only option was to create a label and apply it, as Actor, to every node that has an [:ACTED_IN] relationship to another node. It turned out to be quite straightforward:

      MATCH (actor)-[:ACTED_IN]->(movie)
      SET actor :Actor
      RETURN actor;

This finally created my label, which let me create the index and, in turn, run queries that use it.
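Put together, the whole sequence looks roughly like this. It is a sketch against the same movie data; the WITH DISTINCT step is my own addition, so that an actor with several movies is labelled and counted once instead of being returned once per movie. A new index is populated in the background, so on a larger dataset it may take a moment before queries can use it.

     // 1. Label every node with an outgoing ACTED_IN relationship,
     //    visiting each actor once (WITH DISTINCT).
     MATCH (actor)-[:ACTED_IN]->()
     WITH DISTINCT actor
     SET actor :Actor
     RETURN count(actor);

     // 2. Index the property actors are looked up by.
     CREATE INDEX ON :Actor(name);

     // 3. Filter on the labelled property so the index can be used.
     MATCH (actor:Actor)
     WHERE actor.name = "Tom Hanks"
     RETURN actor;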



FUTURE WORK

Some doubts I need to investigate further:
  • The video mentions there's "no unique indexing yet", but it is a few months old now and is based on Neo4j 2.0.0-M0.
  • There's also a mention of 'simple lookups for now', and I wonder what that means.
  • While reading the docs on indexing I noticed it is possible to force a query to use a given index (which is wonderful, and something some SQL-Johns will expect).
  • I read somewhere that it's possible to change the underlying indexing technology. That's definitely worth a look.
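For the first and third points, this is what I understand the 2.0.0 syntax to look like; a sketch reusing the :Actor(name) example, to be double-checked against the manual. Uniqueness constraints did make it into the 2.0.0 release, and a constraint brings its own index on the same label and property, so a plain index there has to be dropped first (creating the constraint will also fail if duplicate names already exist).

     // Replace the plain index with a uniqueness constraint,
     // which maintains its own index on :Actor(name).
     DROP INDEX ON :Actor(name);
     CREATE CONSTRAINT ON (actor:Actor) ASSERT actor.name IS UNIQUE;

     // Index hint: explicitly ask Cypher to use the :Actor(name) index.
     MATCH (actor:Actor)
     USING INDEX actor:Actor(name)
     WHERE actor.name = "Tom Hanks"
     RETURN actor;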

Thursday, 26 December 2013

Neo4j 2.0 - Setup and first impressions

During the Christmas holidays I took some time to play around with Neo4j. This is not the first time I've tinkered with it, but it is definitely the first time I've done it unsupervised. The first time I played with Neo4j was under @albertoperdomo's guidance, and it felt liberating after several years of RDBMS.

DISCLAIMER: I'm a total Neo4j newbie, so don't take anything I write here as advice; this post is more a collection of notes for my future self.

INSTALLING

Installing any version of Neo4j (I needed 2.0.0) is insultingly simple thanks to @thedevel's script, ndm. I even tweeted about it (again, for my own reference). Even without it, installing Neo4j manually is really simple.

IMPORTING MOVIE DATABASE

During Alberto's introductory Neo4j workshop we had lots of small quizzes that kept each of us investigating and putting small concepts into practice. My plan for getting started again was to load that clean movie database, refresh some concepts from the notes I took during the workshop, and move on from there.

The first issue I ran into with Neo4j 2.0.0 was a pair of small syntax changes that made loading a 1.9.3 database fail. It ended up being something quite silly, though. What used to be:

     START n=node(*) MATCH (n)-[r?]-() DELETE r,n;

now becomes:

     START n=node(*)
     OPTIONAL MATCH (n)-[r]-()
     DELETE r,n

There are two differences:

  • The trailing semicolon seems to be unnecessary now. In my case it caused a parsing error when the following line was read, which made the error message misleading: it pointed me in the wrong direction. I eventually noticed the message made no sense, removed the semicolon, and it worked.
  • The '?' that used to mark the relationship 'r' as optional is gone; optional patterns now have their own clause, OPTIONAL MATCH. My first attempt kept 'MATCH (n)--()' and moved the relationship into a separate 'OPTIONAL MATCH ()-[r]-()'. That deleted everything in the movie database, but I doubted it did the same thing as the 1.9.x query, and it doesn't: 'MATCH (n)--()' skips nodes that have no relationships, and an OPTIONAL MATCH that doesn't start from n matches every relationship in the graph once for each n. Starting the optional pattern from n, as in the query above, keeps the original meaning.
(I'm not sure I can freely distribute the movies.cyp database) :-(
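To check whether a delete-all rewrite really emptied the database, and whether a dataset contains nodes with no relationships at all (exactly where a pattern like MATCH (n)--() stops matching), a few counting queries are enough. A sketch in 2.0 syntax:

     // Nodes with no relationships: a MATCH (n)--() pattern never reaches these.
     MATCH (n) WHERE NOT (n)--() RETURN count(n);

     // After the delete query, both of these should return 0.
     MATCH (n) RETURN count(n);
     MATCH ()-[r]->() RETURN count(r);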

FIRST IMPRESSIONS


  1. The web console has improved enormously. It was already a great tool, but it is now beyond awesome. Judge for yourself:
    1. the tabular presentation gives a clearer view of the schemaless data,


    2. and you can now also peek at the results in a graph view.


I'm only scratching the surface of graph databases at the moment. I hope I can get my hands dirty in the coming days...