Friday, 27 December 2013

Neo4j 2.0 - Indexing

NOTE: this post is the unexpected continuation of yesterday's post on Neo4J. You might want to start there.

DISCLAIMER: I'm no expert on the technology; these are personal notes I take while I keep playing around and learning.



INDEXING

So I went on with Alberto's Neo4J workshop slides to refresh my memory on the features, until I reached a slide that used indexing:

     START tom=node:node_auto_index(name="Tom Hanks")  
     MATCH (tom)-[:ACTED_IN]->()<-[:DIRECTED]-(director)
     RETURN director.name;

I then tried to execute it and found a nasty error message:
   Index `node_auto_index` does not exist 
The problem is clear: the index is missing. But since it was the auto index I was trying to use, I assume there's no more indexing magic in Neo4J. If I wanted an index on actor names like the one I had in Neo4J 1.9.x, I would have to create it myself. So I started digging through Google to learn more about indexing in 2.0.0.


Creating and using Indexes

The first hit I checked on indexing was a great webinar by Michael Hunger on the new features in Neo4J. From what I could gather, the main difference is that indexes are now true indexes: once created, they are auto-magically maintained when data is added or updated. I deduce from that statement that this wasn't the case in previous versions. BTW, the index is maintained transactionally, together with the data.

So, to create an index you simply need to:

     CREATE INDEX ON :Actor(name)

This approach really simplifies the queries so that my original Cypher query becomes:

     MATCH (actor:Actor)-[:ACTED_IN]->()<-[:DIRECTED]-(director)
     WHERE actor.name="Tom Hanks"
     RETURN director.name;

which is simpler and also closer to what a SQL-John might expect. What happens under the covers is that Neo4J detects I'm filtering by a property (name) on a labelled node (actor:Actor) and finds there's an index on ':Actor(name)'. So it automagically tries to use it.




But it is flawed

Turns out that when I tried to create the index using:

     CREATE INDEX ON :Actor(name)

it succeeded, but it indexed nothing because my dataset doesn't use labels. So I then tried to index any node by name:

     CREATE INDEX ON :(name)
     CREATE INDEX ON :*(name)
     CREATE INDEX ON (name)

It was a total waste of time, since indexing requires labels in Neo4J 2.0.x. (insert sadface here).




INTRODUCING LABELS!

Back on my quest to query using an index, I realised my only option was to create a label and have every node that [:ACTED_IN] another node labelled as Actor. It turns out to be quite straightforward:

      MATCH (actor)-[:ACTED_IN]->(movie)
      SET actor :Actor
      RETURN actor;

This finally created my label, which unblocked my power to create indexes, which in turn allowed me to query using them.
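
Putting the pieces together, the whole sequence looks roughly like the sketch below. It assumes the workshop's movie dataset (an ACTED_IN relationship and a name property on the nodes) and the Actor label introduced above. Each statement is meant to be run on its own: as far as I know, Neo4j 2.0 refuses to mix schema changes and data changes in a single transaction, and a new index is populated in the background before it can be used.

```cypher
// 1. Label every node that has an outgoing ACTED_IN relationship.
MATCH (actor)-[:ACTED_IN]->()
SET actor :Actor

// 2. Create the index on the freshly labelled nodes (run as a separate statement).
CREATE INDEX ON :Actor(name)

// 3. Query through the label so the planner can pick up the :Actor(name) index.
MATCH (actor:Actor)
WHERE actor.name = "Tom Hanks"
RETURN actor.name
```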



FUTURE WORK

Some doubts I need to investigate further:
  • The video mentions there's "no unique indexing yet", but the video is a few months old now and is based on Neo4J 2.0.0-M0.
  • There's also a mention of 'simple lookups for now' and I wonder what that might mean.
  • While reading the docs on indexing I noticed it is possible to force the usage of a given index when querying (which is wonderful and also expected by some SQL-Johns).
  • I read somewhere that it's possible to change the indexing technology. That's definitely worth a look.
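
Two of these doubts can at least be sketched in Cypher. As far as I can tell from the 2.0 manual, the final 2.0.0 release does ship uniqueness constraints, and an index can be forced with a USING INDEX hint. The statements below assume the Actor label and name property used earlier in this post and are untested against the workshop dataset. In particular, I believe the plain index has to be dropped before the constraint can be created (the constraint brings its own index), and the constraint will fail if two actors share a name.

```cypher
// Drop the plain index first; assumption: a constraint cannot coexist with it.
DROP INDEX ON :Actor(name)

// Uniqueness constraint: should reject a second :Actor node with the same name.
CREATE CONSTRAINT ON (actor:Actor) ASSERT actor.name IS UNIQUE

// Index hint: ask the planner to use the :Actor(name) index for this lookup.
MATCH (actor:Actor)
USING INDEX actor:Actor(name)
WHERE actor.name = "Tom Hanks"
RETURN actor.name
```

As with any schema change, each statement should be run separately.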

Thursday, 26 December 2013

Neo4j 2.0 - Setup and first impressions

During the Christmas holidays I took some time to play around with Neo4J. This is not the first time I've tinkered with it, but it's definitely the first time I've done it unsupervised. The first time I played around with Neo4J was under @albertoperdomo's guidance, and it felt like a liberation after several years of RDBMS.

DISCLAIMER: I'm a total newbie at Neo4j, so don't take anything I write here as advice. This post is more of a compendium of notes for myself to check in the future.

INSTALLING

Installing any version (I needed 2.0.0) of Neo4J is insultingly simple thanks to @thedevel's script: ndm. I even tweeted about it (again, for my own reference). Let me point out that even done manually, installing Neo4j is really simple.

IMPORTING MOVIE DATABASE

During Alberto's intro workshop to Neo4J we had a lot of small quizzes, so that each of us had to keep investigating and putting small concepts into practice. My idea for getting started was to load that clean movie database, refresh some concepts from the notes I took during the workshop, and then move on from there.

The first issue I faced with Neo4J 2.0.0 was a set of small syntax changes that caused the load of a 1.9.3 database to fail. It ended up being something quite silly, though. What used to be:

     START n=node(*) MATCH (n)-[r?]-() DELETE r,n;

now turned into:

     START n=node(*) 
     MATCH (n)--() 
     OPTIONAL MATCH ()-[r]-() 
     DELETE r,n

There seem to be two differences:

  • The trailing semicolon seems to be unnecessary now. It caused a parsing error when reading the following line, which made the error message misleading and pointed me in the wrong direction. I finally noticed the error message made no sense and tried removing the semicolon. It worked.
  • The second is that the '?' character can no longer mark the 'r' relationship as optional. Optional matching now seems to need its own clause: OPTIONAL MATCH. So I replaced the edge in the MATCH with an anonymous one and created an OPTIONAL MATCH for it. It worked, but I really doubt the two queries (old 1.9.x vs new 2.0.x) do exactly the same thing. What I intended was to delete everything and that's what happened, but that's not enough proof to be satisfied with the rewrite.
(I'm not sure I can freely distribute the movies.cyp database) :-(
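
For comparison, the Neo4j 2.0 manual suggests a shorter form for wiping a graph, sketched below. The difference from the rewrite above is that the optional relationship is bound to n, and n itself isn't required to have any relationship, so isolated nodes should be deleted too. This is a sketch and wasn't run against the movie database.

```cypher
// Match every node, with or without relationships.
MATCH (n)
// Pick up each node's relationships, if it has any (r is null otherwise).
OPTIONAL MATCH (n)-[r]-()
// Delete the relationships together with the nodes they connect.
DELETE n, r
```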

FIRST IMPRESSIONS


  1. The web console has improved incredibly. It was a great tool already but it is now beyond awesome. You can judge for yourself:
    1. not only does the tabular data presentation provide a clearer view of the schemaless data,


    2. you can now peek at the results in graph view


I'm only scratching the surface of the graph DB concept at the moment. I hope I can get my hands dirty in the upcoming days...

Friday, 20 September 2013

Parallel collection manipulation in scala

The Scala collections API comes packed with a very cool feature: parallelizing any processing. See this example:

I first create a list (I could use a range or something else too):

scala> List(1,2,3,4,5,6,7,8,9)
res0: List[Int] = List(1, 2, 3, 4, 5, 6, 7, 8, 9)

... and then build the skeleton of my processing. What I want is to multiply each value by 1000 and then divide each value by 500:

scala> res0.map{ 
    i => i*1000
  }.map{
    i => i/500
  } 
res1: List[Int] = List(2, 4, 6, 8, 10, 12, 14, 16, 18)

Nothing fancy so far.

Entering par

In Scala every collection can be automagically wrapped into a counterpart that implements processing with a thread pool. I actually have no clue what the implementation is. Damn! I'll have to look it up. Anyway, insert 'par.' in your code and...

scala> res0.par.map{ 
    i => i*1000
  }.par.map{
    i => i/500
  }
res2: scala.collection.parallel.immutable.ParSeq[Int] = ParVector(2, 4, 6, 8, 10, 12, 14, 16, 18)


... the list becomes a ParVector and keeps all items in their original positions.
Let's try and see it in action (I added a random sleep to 'help' context switching):

scala> import java.util.concurrent.TimeUnit
import java.util.concurrent.TimeUnit

scala> import java.util.Random
import java.util.Random

scala> new Random
res6: java.util.Random = java.util.Random@b9d964d

scala> res0.par.map { 
    i => TimeUnit.MILLISECONDS.sleep(res6.nextInt(1000));
    println(i);
    i*1000
  }.par.map{
    i => TimeUnit.MILLISECONDS.sleep(res6.nextInt(1000));
    println(i); 
    i/500
  } 
3
7
4
5
8
2
1
9
6
7000
3000
5000
4000
1000
6000
8000
9000
2000
res13: scala.collection.parallel.immutable.ParSeq[Int] = ParVector(2, 4, 6, 8, 10, 12, 14, 16, 18)

scala> 

Ta dah! Execution is run in parallel.

For more information on Parallel Collections, see the overviews in the Scala docs.

PS: For the curious...

If I get rid of the first 'par', the first processing is sequential, and the delays add up.


scala> res0.map { i => TimeUnit.MILLISECONDS.sleep( res6.nextInt(1000)  );println(i) ;i*1000}.par . map { i => TimeUnit.MILLISECONDS.sleep(  res6.nextInt(1000)  ); println(i); i/500 } 
1
2
3
4
5
6
7
8
9
5000
2000
1000
6000
3000
4000
9000
7000
8000
res17: scala.collection.parallel.immutable.ParSeq[Int] = ParVector(2, 4, 6, 8, 10, 12, 14, 16, 18)

Thursday, 1 August 2013

SBT and ScalaTest and a strange exception

After a few weeks developing in Play!, at some point today I started getting an exception out of nowhere.

[info] 
Exception in thread "Thread-109" java.io.EOFException
at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2577)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1315)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:369)
at sbt.React.react(ForkTests.scala:98)
at sbt.ForkTests$$anonfun$apply$2$Acceptor$2$.run(ForkTests.scala:66)
at java.lang.Thread.run(Thread.java:722)
[info] Passed: : Total 23, Failed 0, Errors 0, Passed 23, Skipped 0

Strangely enough, it was thrown on every test execution, yet all tests passed (see last line).

Turns out it's a known (and already fixed) issue in sbt 0.12.2, so it was only a matter of updating:

   # sed -i.bak -e 's/0\.12\.2/0\.12\.3/g' project/build.properties

Saturday, 18 May 2013

Word Wrap #katayuno

I finally got the chance to attend one of Softonic's Katayunos.

I love Coding Dojos in general, but those organised at Softonic are special because of their office decoration (out of the ordinary) and because they happen over breakfast.

Once I got to Softonic I must say the atmosphere, even with an empty office, felt different from many other companies I had visited before. The place is clean, spacious and colorful. You also have to consider that we were on the 9th floor, which is above the average building height in Barcelona, so the view was also quite stunning. Yes, you can see the sea from the dining room. And yes, there's a dining room.



Back to work

We got to work and, after fiunchinho's introduction to TDD and red-green-refactor, warmed up with a first 30-minute pomodoro. The problem at hand was the KataWordWrap, which fiunchinho selected specially for its simplicity. It's not that he thinks we are stupid (which we are), it's that he wanted us to complete the kata for once. SPOILER: some of the pairs did complete the kata, so fiunchinho just got a badge unlocked!

I paired twice in Java and, after the break, once in Scala. I'm still not very fluent in Scala but I'm happy to report that we completed the kata in Scala in a little over a pomodoro (and I think we got further than in previous pomodoros too!). Here's the final code:

For this last session I paired with dvillacampa, who is completely new to Scala. I must say he very patiently listened to all my fanboy comments about the language.