Hiep Doan

Why you should split your Kubernetes-hosted backend app into API and worker mode

May 29, 2019

In a monolithic architecture, it is very common for a backend application not only to receive and respond to API requests, but also to run scheduled jobs such as data cleanup or sending notifications. In addition, the same app can take jobs from a queue and execute them.

This architecture has a tremendous advantage at the beginning: the team can reuse exactly the same code base and infrastructure, and can quickly add a new scheduled job.

For example, to run a daily job that clears expired tokens at 2 a.m., in Spring Boot you can do this:

 @Scheduled(cron = "0 0 2 * * ?")
 public void clearExpiredToken() {
   // ... delete expired tokens ...
 }

This works fine, but it is not optimal. Serving API requests and executing jobs are essentially very different workloads:

  • API requests should normally be fast and not very computationally heavy, while this kind of job can be expensive to execute and can take much longer to complete.
  • In terms of database transactions, API transactions should be similarly fast and mostly run in parallel (basically every API call comes with a database transaction). In contrast, jobs can have much longer transactions, but far fewer of them.
  • In general, the backend API must be up all the time, while a job executor can tolerate more downtime when resources are scarce. Of course, we must guarantee that the jobs are idempotent, but that is another story.
  • For a Kubernetes-hosted application, it is best to keep a pod's resource request and limit within a close range, so that the cluster can effectively schedule the pod onto a node, or kill the pod if it exceeds its limit. If you use the same deployment for both the scheduled jobs and the API backend, that range is going to be wide. By splitting the two backend modes, we can use smaller pods for each, which lets Kubernetes move them between nodes more easily if needed. And if a job needs more resources than its limit, Kubernetes can kill the worker pod without touching the API backend, whose termination would cause downtime.
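
On the idempotency point above, here is a minimal sketch of what an idempotent cleanup job could look like. It uses an in-memory map as a stand-in for the token table; the class name `ExpiredTokenCleaner`, its methods, and the token-to-expiry layout are assumptions made for illustration, not code from the original app. The idea is that the job's effect depends only on the stored data and the reference time, so a retry after a killed pod or an accidental duplicate run changes nothing further.

```java
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

/** In-memory stand-in for a token table; a real job would delete rows in the database. */
final class ExpiredTokenCleaner {

  // Token value -> expiry time (hypothetical layout, for illustration only).
  private final Map<String, Instant> expiryByToken = new HashMap<>();

  void addToken(String token, Instant expiresAt) {
    expiryByToken.put(token, expiresAt);
  }

  /**
   * Removes every token that expired at or before {@code now} and returns
   * how many were removed. Running it again with the same {@code now}
   * removes nothing more, which is what makes the job safe to retry.
   */
  int clearExpired(Instant now) {
    int before = expiryByToken.size();
    expiryByToken.values().removeIf(expiresAt -> !expiresAt.isAfter(now));
    return before - expiryByToken.size();
  }

  int size() {
    return expiryByToken.size();
  }

  public static void main(String[] args) {
    ExpiredTokenCleaner cleaner = new ExpiredTokenCleaner();
    Instant now = Instant.parse("2019-05-29T02:00:00Z");
    cleaner.addToken("old", now.minusSeconds(60));
    cleaner.addToken("fresh", now.plusSeconds(3600));

    System.out.println(cleaner.clearExpired(now)); // prints 1: only "old" has expired
    System.out.println(cleaner.clearExpired(now)); // prints 0: the rerun finds nothing left
    System.out.println(cleaner.size());            // prints 1: "fresh" is still there
  }
}
```

In a database-backed version, the same property would typically come from a single delete statement filtered on the expiry column, rather than from any bookkeeping about whether the job has already run.
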

Enough said: how can we split a Spring Boot backend app on Kubernetes into a worker mode (for executing jobs) and an API mode? It is actually very easy with Spring profiles. API mode uses the normal prod profile, while worker mode uses both the prod and worker profiles. We then enable Spring's scheduling only in the worker profile.

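// Loaded only when the worker profile is active, so API pods never run scheduled jobs.
@Configuration
@EnableScheduling
@Profile("worker")
public class SchedulerConfig {

  private static final int SCHEDULE_THREAD_COUNT = 10;

  @Bean
  public ThreadPoolTaskScheduler threadPoolTaskScheduler() {
    ThreadPoolTaskScheduler threadPoolTaskScheduler = new ThreadPoolTaskScheduler();
    threadPoolTaskScheduler.setPoolSize(SCHEDULE_THREAD_COUNT);
    return threadPoolTaskScheduler;
  }
}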

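To make the profile gating described above more concrete, here is a framework-free sketch of the same idea using only the JDK. It is not how Spring implements profiles: it swaps Spring's `ThreadPoolTaskScheduler` for a plain `ScheduledExecutorService` and the cron expression for a fixed one-day rate. `ModeSelector`, `isWorkerMode` and `startScheduler` are hypothetical names. `SPRING_PROFILES_ACTIVE` is the environment variable Spring Boot reads for active profiles, and the sketch simply reuses it, on the assumption that the worker deployment sets it to something like `prod,worker` and the API deployment to `prod`.

```java
import java.util.Arrays;
import java.util.Optional;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

final class ModeSelector {

  private static final int SCHEDULE_THREAD_COUNT = 10;

  /** Returns true if the comma-separated profile list contains "worker". */
  static boolean isWorkerMode(String activeProfiles) {
    if (activeProfiles == null) {
      return false;
    }
    return Arrays.stream(activeProfiles.split(","))
        .map(String::trim)
        .anyMatch("worker"::equals);
  }

  /** Creates the scheduler only in worker mode; API-mode processes get an empty Optional. */
  static Optional<ScheduledExecutorService> startScheduler(String activeProfiles, Runnable job) {
    if (!isWorkerMode(activeProfiles)) {
      return Optional.empty();
    }
    ScheduledExecutorService scheduler =
        Executors.newScheduledThreadPool(SCHEDULE_THREAD_COUNT);
    // Fixed one-day period for brevity; a cron-style trigger would need extra code.
    scheduler.scheduleAtFixedRate(job, 0, 1, TimeUnit.DAYS);
    return Optional.of(scheduler);
  }

  public static void main(String[] args) {
    // Assumption: the deployment sets this variable, e.g. "prod" or "prod,worker".
    String profiles = System.getenv("SPRING_PROFILES_ACTIVE");
    Optional<ScheduledExecutorService> scheduler =
        startScheduler(profiles, () -> System.out.println("clearing expired tokens"));
    System.out.println(scheduler.isPresent() ? "worker mode" : "API mode");
    // Shut down right away so this demo exits; a real worker would keep running.
    scheduler.ifPresent(ScheduledExecutorService::shutdown);
  }
}
```

The point of the sketch is that both modes ship the same binary, and the only difference between an API pod and a worker pod is one piece of configuration.
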
We also do not expose the worker deployment through a Service, so no user can reach it via the API anyway. In short, this article explains why you should split your monolithic backend into two modes, API and worker, and shows how to do it.

From my own experience, after the change our API latency became much more stable (as it is no longer affected by scheduled jobs) and we were able to reduce the number of Kubernetes nodes. We can also scale the number of workers separately if there are too many jobs in the queue. What do you think? Feel free to share your feedback and comments here.


Written by Hiep Doan, a software engineer living in beautiful Switzerland. I am passionate about building things that matter and sharing my experience along the journey. I also selectively post my articles on Medium. Check it out!