Timeliness

  • Real-time systems process data as it is created, using tools that can process and analyse streams of events as they arrive. They are useful, for example, for in-the-moment recommendations.

  • Batch analysis processes data periodically, which means enough data must have accumulated for the analysis to be relevant. It suits analyses whose results can be used at a later date.

  • Near-real-time analysis gathers data quickly and refreshes analytics every few minutes or seconds. It is useful, for example, for updating recommendations within the same browsing session.
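As a rough illustration of how the three modes differ in *when* results are produced, the sketch below computes the most-viewed item from a stream of view events in each style. The event format, the 60-second window and the function names are hypothetical; a production system would use a dedicated stream or batch processing engine rather than plain Python loops.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical event format: (timestamp in seconds, id of the item a user viewed).
Event = Tuple[float, str]


def real_time(events: Iterable[Event]) -> List[str]:
    """Update the most-viewed item after every single event, as it arrives."""
    counts: Counter = Counter()
    results: List[str] = []
    for _, item in events:
        counts[item] += 1
        results.append(counts.most_common(1)[0][0])
    return results


def near_real_time(events: Iterable[Event], window: float) -> List[str]:
    """Refresh the most-viewed item once per time window (a micro-batch).

    Assumes events arrive in timestamp order; windows with no events produce no refresh.
    """
    counts: Counter = Counter()
    results: List[str] = []
    current_window = None
    for ts, item in events:
        w = int(ts // window)
        if current_window is not None and w != current_window:
            results.append(counts.most_common(1)[0][0])  # refresh at the window boundary
        current_window = w
        counts[item] += 1
    if current_window is not None:
        results.append(counts.most_common(1)[0][0])  # refresh for the final window
    return results


def batch(events: Iterable[Event]) -> str:
    """Process all events collected so far in one periodic run."""
    counts = Counter(item for _, item in events)
    return counts.most_common(1)[0][0]


# Illustrative data only.
events = [(0, "a"), (10, "b"), (20, "b"), (70, "a"), (80, "a"), (130, "c")]
print(real_time(events))           # one result per event
print(near_real_time(events, 60))  # ['b', 'a', 'a']: one result per 60-second window
print(batch(events))               # a: one result for the whole run
```

The trade-off shows up in the number of results: the real-time version does work on every event, the near-real-time version once per window, and the batch version only once, after enough data has been collected.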

Using the MapReduce programming model, big data sets stored in a distributed file system can be processed in parallel: an algorithm such as clustering runs on many machines at the same time, and each data point is assigned to its most similar cluster.
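A minimal single-process sketch of one MapReduce round, assuming the clustering algorithm is k-means (the notes do not name one): the map phase assigns each point to its most similar (nearest) centroid, a shuffle groups the points by cluster, and the reduce phase averages each group into a new centroid. The lists in `splits` stand in for blocks of a file in a distributed file system; a real framework such as Hadoop would run the map calls on many machines at once. All function names and data are illustrative.

```python
from collections import defaultdict
from math import dist
from typing import Dict, Iterable, Iterator, List, Tuple

Point = Tuple[float, float]


def map_phase(points: Iterable[Point], centroids: List[Point]) -> Iterator[Tuple[int, Tuple[Point, int]]]:
    """Map: emit (index of the nearest centroid, (point, 1)) for each point."""
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        yield nearest, (p, 1)  # the 1 is a count, so partial sums could be combined early


def shuffle(pairs: Iterable[Tuple[int, Tuple[Point, int]]]) -> Dict[int, List[Tuple[Point, int]]]:
    """Shuffle: group mapped values by key, as the framework does between phases."""
    groups: Dict[int, List[Tuple[Point, int]]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: int, values: List[Tuple[Point, int]]) -> Tuple[int, Point]:
    """Reduce: average the points assigned to one cluster to get its new centroid."""
    sx = sum(p[0] for p, _ in values)
    sy = sum(p[1] for p, _ in values)
    n = sum(c for _, c in values)
    return key, (sx / n, sy / n)


def kmeans_iteration(splits: List[List[Point]], centroids: List[Point]) -> List[Point]:
    # Each split would be mapped on a different machine in a real cluster.
    mapped = [pair for split in splits for pair in map_phase(split, centroids)]
    new_centroids = list(centroids)  # a centroid with no assigned points keeps its position
    for key, values in shuffle(mapped).items():
        k, c = reduce_phase(key, values)
        new_centroids[k] = c
    return new_centroids


splits = [[(0, 0), (0, 1)], [(10, 10), (10, 11)], [(1, 0), (11, 10)]]
centroids = [(0, 0), (10, 10)]
new_centroids = kmeans_iteration(splits, centroids)
print(new_centroids)
```

Repeating `kmeans_iteration` until the centroids stop moving gives the full clustering; because each map call only needs its own split plus the current centroids, the expensive assignment step parallelises naturally.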