SQL2Neo: Moving health-care data from relational to graph databases

The de facto storage model used by health-care information systems is the Relational Database Management System (RDBMS). Although the relational storage model is mature and widely used, it is poorly suited to storing and querying data with a high degree of relationships. Health-care data is heavily annotated with relationships and is therefore a suitable candidate for a specialized data model: the graph database. Graph databases can help health-care professionals discover and manage new and useful relationships, and they also speed up queries over highly related data. To query related data, relational databases rely on large numbers of joins, which are very expensive; in contrast, graph data stores keep direct pointers from each node to its adjacent nodes. This provides the scalability needed to handle the huge volume of medical data being generated at very high velocity. In addition, health-care data is primarily semi-structured or unstructured, which creates the need for a schema-less database.

This work proposes a methodology for converting a relational database to a graph database by exploiting the schema and the constraints of the source. The approach supports the translation of conjunctive SQL queries over the source into graph traversal operations over the target. Experimental results are provided to show the feasibility of the solution and the efficiency of query answering over the target database. Tuples are mapped to nodes and foreign keys are mapped to edges. Software has been implemented in Java to convert a sample medical relational database with 24 tables to a graph database. Constraints were preserved during the transformation. The proposed system, SQL2Neo, was implemented with MySQL as the relational database and Neo4j, a popular graph database, as the target.
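
The tuple-to-node and foreign-key-to-edge mapping described above can be illustrated with a minimal Java sketch. It is not the SQL2Neo implementation: the class Sql2NeoSketch, the ForeignKey helper, the Patient and Visit tables, and the HAS_PATIENT relationship type are all hypothetical names chosen for illustration. The sketch assumes that a table name becomes a node label, that every non-null column becomes a node property, and that the result is emitted as Cypher text rather than sent through a database driver.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.StringJoiner;

// Hypothetical sketch: all names here are illustrative, not taken from SQL2Neo.
final class Sql2NeoSketch {

    /** A foreign key: a column of the source table referencing a key column of a target table. */
    static final class ForeignKey {
        final String column;       // referencing column in the source table
        final String targetTable;  // referenced table
        final String targetKey;    // referenced key column
        final String relType;      // assumed relationship type for the generated edge

        ForeignKey(String column, String targetTable, String targetKey, String relType) {
            this.column = column;
            this.targetTable = targetTable;
            this.targetKey = targetKey;
            this.relType = relType;
        }
    }

    /** Maps one tuple to a Cypher CREATE statement; NULL columns are skipped. */
    static String nodeStatement(String table, Map<String, Object> row) {
        StringJoiner props = new StringJoiner(", ", "{", "}");
        for (Map.Entry<String, Object> e : row.entrySet()) {
            if (e.getValue() != null) {
                props.add(e.getKey() + ": " + literal(e.getValue()));
            }
        }
        return "CREATE (:" + table + " " + props + ")";
    }

    /** Maps one foreign-key value to an edge; returns empty when the foreign key is NULL. */
    static Optional<String> edgeStatement(String table, String keyColumn,
                                          Map<String, Object> row, ForeignKey fk) {
        Object fkValue = row.get(fk.column);
        if (fkValue == null) {
            return Optional.empty();
        }
        return Optional.of(
            "MATCH (a:" + table + " {" + keyColumn + ": " + literal(row.get(keyColumn)) + "}), "
            + "(b:" + fk.targetTable + " {" + fk.targetKey + ": " + literal(fkValue) + "}) "
            + "CREATE (a)-[:" + fk.relType + "]->(b)");
    }

    /** Renders a value as a Cypher literal; strings are quoted and escaped. */
    static String literal(Object value) {
        if (value instanceof Number || value instanceof Boolean) {
            return value.toString();
        }
        return "'" + value.toString().replace("\\", "\\\\").replace("'", "\\'") + "'";
    }

    public static void main(String[] args) {
        Map<String, Object> patient = new LinkedHashMap<>();
        patient.put("id", 1);
        patient.put("name", "Ann");

        Map<String, Object> visit = new LinkedHashMap<>();
        visit.put("id", 10);
        visit.put("patient_id", 1);

        ForeignKey fk = new ForeignKey("patient_id", "Patient", "id", "HAS_PATIENT");

        System.out.println(nodeStatement("Patient", patient));
        System.out.println(nodeStatement("Visit", visit));
        edgeStatement("Visit", "id", visit, fk).ifPresent(System.out::println);
    }
}
```

Under these assumptions, a conjunctive SQL query such as SELECT v.id FROM Visit v JOIN Patient p ON v.patient_id = p.id WHERE p.name = 'Ann' would correspond to the traversal MATCH (v:Visit)-[:HAS_PATIENT]->(p:Patient {name: 'Ann'}) RETURN v.id. Generating Cypher as text keeps the sketch self-contained; a real converter would more likely pass values as query parameters through a driver instead of concatenating literals.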