Graph Analytics on Big Data

Graph Analytics (or network analytics) is an area of analysis with numerous applications that is drawing increasing attention. Its uses range from detecting fraud, money laundering, illegal transactions and other forms of financial crime, to identifying key influencers in social networks and communities of frequently interacting individuals, to route optimisation and even bioinformatics. Graph analytics offers a wide variety of continually evolving solutions that allow experts in various fields to tackle everyday challenges, extract insights and drive decision making.

Graph Analytics is broadly divided into four categories:
* Path Analytics;
* Connectivity Analytics;
* Centrality Analytics; and
* Community Detection Analytics;
each of which relies on different algorithms and addresses different problems.
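
As a rough illustration of how these categories differ, the sketch below runs one textbook algorithm for each of the first three on a tiny hand-made graph. The graph, the function names and the choice of algorithms (breadth-first search, component traversal, degree centrality) are assumptions made for this example only. Community detection is left out because even its simplest algorithms need more machinery than fits in a short sketch.

```python
from collections import deque

# Hypothetical toy graph: undirected, stored as symmetric adjacency lists.
GRAPH = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
    "e": ["f"],
    "f": ["e"],
}


def shortest_path(graph, source, target):
    """Path analytics: fewest-hop path found with breadth-first search."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk the parent links back to the source.
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbour in graph[node]:
            if neighbour not in parents:
                parents[neighbour] = node
                queue.append(neighbour)
    return None  # target is not reachable from source


def connected_components(graph):
    """Connectivity analytics: sets of mutually reachable vertices."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node])
        seen |= component
        components.append(component)
    return components


def degree_centrality(graph):
    """Centrality analytics: share of the other vertices each vertex touches."""
    others = len(graph) - 1
    return {node: len(neighbours) / others for node, neighbours in graph.items()}


if __name__ == "__main__":
    print(shortest_path(GRAPH, "a", "d"))
    print(connected_components(GRAPH))
    print(degree_centrality(GRAPH))
```

All three functions assume the graph fits in memory on one machine, so they say nothing about how the same questions are answered at scale.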

So, how does one apply these techniques effectively in order to drive hypothesis testing and, eventually, the extraction of actionable insights?
What are the steps that should be followed?
What is the impact of visualisation tools in this process?
How should we sample from graphs?
What tools should one use or be familiar with?
Furthermore, how can one implement scalable, high-quality, production-ready solutions that apply Graph Analytics and give direct access to visualisations and on-demand analytics dashboards, which serve as an intuitive and accessible means of interpreting information?

In this talk, we present and discuss the different categories of graph analytics and their areas of application. In addition, to address the above questions, we define a methodology for the application of these analytics technologies through example use-cases, studying the steps that need to be followed before assumptions can be confirmed and insights can be extracted.

Finally, we will discuss how distributed programming models, such as Spargel, have been developed to allow frameworks like Apache Spark and Apache Flink to adopt graph analytics algorithms, the challenges and limitations that come with this adoption, and how one can build scalable distributed graph analytics solutions using these frameworks.
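
To give a feel for the vertex-centric style that such programming models build on, here is a minimal single-process simulation in plain Python. It is not the Spargel, Flink or Spark API. The function name, the superstep loop and the example graph are all assumptions made for illustration. Each vertex holds a label, reads only the messages sent to it in the previous superstep, and notifies its neighbours when its label changes, which here computes connected components by propagating the smallest vertex id.

```python
def vertex_centric_components(graph, max_supersteps=50):
    """Label each vertex with the smallest vertex id in its component.

    `graph` is assumed to be undirected, given as symmetric adjacency lists.
    """
    # Every vertex starts out labelled with its own id.
    labels = {vertex: vertex for vertex in graph}

    # Superstep 0: every vertex sends its label to all of its neighbours.
    inbox = {vertex: [] for vertex in graph}
    for vertex, neighbours in graph.items():
        for neighbour in neighbours:
            inbox[neighbour].append(labels[vertex])

    for _ in range(max_supersteps):
        outbox = {vertex: [] for vertex in graph}
        changed = False
        for vertex, messages in inbox.items():
            if not messages:
                continue  # a vertex with no incoming messages stays idle
            candidate = min(messages)
            if candidate < labels[vertex]:
                # Adopt the smaller label and tell the neighbours about it.
                labels[vertex] = candidate
                changed = True
                for neighbour in graph[vertex]:
                    outbox[neighbour].append(candidate)
        if not changed:
            break  # no vertex updated its label, so the computation halts
        inbox = outbox
    return labels


if __name__ == "__main__":
    # Hypothetical example: two components, {1, 2, 3} and {4, 5}.
    example = {1: [2], 2: [1, 3], 3: [2], 4: [5], 5: [4]}
    print(vertex_centric_components(example))
```

In a distributed engine the vertices would be partitioned across workers and the inboxes would travel over the network, so the cost of each superstep depends on how the graph is partitioned as well as on the algorithm.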