Overview of the Issue

The DRF server sent a webhook POST request to the Django app server, and Django created a Post model instance from it.
Three issues arose during that process.

  1. Webhook Response Timeout
    • After the Post object was created, processing the ManyToMany fields (categories, tags) took additional time
    • This delayed the webhook response, and DRF treated the request as failed

  2. Data Integrity Not Guaranteed
    • When the Celery task was called immediately for follow-up processing, it ran before the tags.add(...) and categories.add(...) operations had completed
    • As a result, Celery processed incomplete data

  3. Empty Data in Celery Even After on_commit()
    • Even after the Celery task was scheduled with on_commit() to guarantee data integrity, queries such as tags.all() still returned an empty list
    • The cause was that Celery sent its read queries to the replica DB, while replication from the master to the replica lagged behind

Resolution Strategy

  1. Return Response Immediately After Post Creation
    • Create the Post instance and immediately return a 202 Accepted response to DRF, which avoids the timeout

  2. Handle Subsequent Tasks in a Separate Thread
    • Use threading.Thread() to move the relationship handling and the Celery call out of the main request flow
    • This keeps the webhook response from being delayed

  3. Guarantee the Transaction within post_process
    • Wrap the entire ManyToMany field processing in transaction.atomic()
    • Schedule the Celery task with transaction.on_commit() once the relationship updates are done, so it is dispatched only after the transaction commits
    • This ensures that Celery runs after the relationship processing is complete

  4. Query the Master DB Directly in Celery
    • To avoid consistency issues caused by replica lag, explicitly call using('default') inside the Celery task:
post = Post.objects.using('default').get(id=post_id)
tags = post.tags.using('default').all()
categories = post.categories.using('default').all()

Alternatively, a DB router can be configured so that queries issued from Celery always go to the master DB.
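
A minimal sketch of such a router follows. It assumes the replica is registered under the alias "replica" and that the Celery worker process is started with an environment variable READ_FROM_PRIMARY=1; both names are hypothetical and not taken from the original setup. Only the router methods defined by Django (db_for_read, db_for_write, allow_relation) are used.

```python
import os


class PrimaryForWorkersRouter:
    """Send reads to the master DB inside Celery workers, to the replica elsewhere."""

    primary_alias = "default"
    replica_alias = "replica"  # assumed alias for the replica DB

    def _force_primary(self):
        # Assumed convention: the worker is launched with READ_FROM_PRIMARY=1.
        return os.environ.get("READ_FROM_PRIMARY") == "1"

    def db_for_read(self, model, **hints):
        if self._force_primary():
            return self.primary_alias
        return self.replica_alias

    def db_for_write(self, model, **hints):
        # Writes always go to the master DB.
        return self.primary_alias

    def allow_relation(self, obj1, obj2, **hints):
        # Both aliases hold the same data, so relations between them are fine.
        return True
```

The router would be registered through the DATABASE_ROUTERS setting, for example with a dotted path such as "myapp.routers.PrimaryForWorkersRouter" (a hypothetical location). With this in place, the explicit using('default') calls in the task become optional rather than required.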


Final Structure Summary

  1. DRF server → Django webhook request
  2. Django:
    • Return response immediately after creating Post object
    • Execute subsequent tasks in a separate thread
  3. Within the thread:
    • Set the ManyToMany relationships inside atomic()
    • Schedule Celery task with on_commit()
  4. Celery:
    • Query data from the master DB (default)
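
The flow above can be sketched roughly as follows. This is an illustration under assumptions, not the original implementation: the module paths myapp.models and myapp.tasks, the task name process_post, and the payload keys title, tag_ids, and category_ids are all hypothetical. It also assumes ATOMIC_REQUESTS is disabled, so the Post row is already committed when the thread starts.

```python
import json
import threading

from django.db import connection, transaction
from django.http import JsonResponse
from django.views.decorators.csrf import csrf_exempt
from django.views.decorators.http import require_POST

from myapp.models import Post         # hypothetical module path
from myapp.tasks import process_post  # hypothetical Celery task


def post_process(post_id, tag_ids, category_ids):
    """Set the ManyToMany relations, then enqueue the Celery task after commit."""
    try:
        with transaction.atomic():
            post = Post.objects.using("default").get(id=post_id)
            post.tags.add(*tag_ids)
            post.categories.add(*category_ids)
            # The callback fires only after this transaction commits.
            transaction.on_commit(lambda: process_post.delay(post_id))
    finally:
        # This thread has its own DB connection; close it explicitly.
        connection.close()


@csrf_exempt
@require_POST
def webhook(request):
    payload = json.loads(request.body)
    post = Post.objects.create(title=payload["title"])  # assumed field
    threading.Thread(
        target=post_process,
        args=(post.id, payload.get("tag_ids", []), payload.get("category_ids", [])),
    ).start()
    # Respond right away; the relations are filled in by the thread.
    return JsonResponse({"id": post.id}, status=202)
```

Because on_commit() is registered inside the atomic() block, the task is not enqueued if the relationship updates raise and the transaction rolls back.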

Conclusion

Each component worked correctly on its own.
Resolving the latency and data integrity issues that arise in a distributed architecture, however,
required a system design that accounts not only for code structure but also for data flow, timing, and DB replica lag.