Variability-aware request replication for latency curtailment Conference Poster

abstract

  • Processing time variability is commonplace in distributed systems, where resources display disparate performance due to, e.g., different workload levels, background processes, and contention in virtualized environments. However, it is paramount for service providers to keep variability in response time under control in order to offer responsive services. We investigate how request replication can be used to exploit processing time variability to reduce response times, considering not only mean values but also the tail of the response time distribution. We focus on the distributed setup, where replication is achieved by running copies of requests on multiple servers that otherwise evolve independently, and waiting for the first replica to complete service. We construct models that capture the evolution of a system with replicated requests using approximate methods and observe that highly variable service times offer the best opportunities for replication - reducing the response time tail in particular. Further, the effect of replication is non-uniform over the response time distribution: gains in one metric, e.g., the mean, can be at the cost of another, e.g., the tail percentiles. This is demonstrated in wide range of numerical virtual experiments. It can be seen that capturing service time variability is key to the evaluation of latency tolerance strategies and in their design.

publication date

  • 2016/7/27

edition

  • 2016-July

keywords

  • Experiments
  • Processing
  • Servers