* Split `shutdown` into two separate methods for better control:
- `shutdown` - signals threads to stop accepting new work
- `wait_for_termination` - waits for threads to finish (with optional timeout)
* Add tracking of busy threads via `@busy_threads` Set
* Make idle_time parameter optional with 30-second default
* Improve thread spawning logic:
- Spawn initial thread immediately when work is posted
- Spawn additional threads when all threads are busy and work is queued
* Fix race condition in work distribution
* Add busy thread count to stats output
* Add test coverage for zero min_threads configuration
This commit makes the ThreadPool more reliable, easier to use, and adds
better visibility into its internal state.
---------
Co-authored-by: Alan Guo Xiang Tan <gxtan1990@gmail.com>
This commit introduces a new ThreadPool class that provides efficient worker
thread management for background tasks. Key features include:
- Dynamic scaling from min to max threads based on workload
- Proper database connection management in multisite setup
- Graceful shutdown with task completion
- Robust error handling and logging
- FIFO task processing with a managed queue
- Configurable idle timeout for worker threads
The implementation is thoroughly tested, including stress tests, error
scenarios, and multisite compatibility.
In some cases in CI env, it seems the AR connection isn’t available and
the `ensure` block is executed. It’s calling `#verify!` on the
connection, so it can fail sometimes. This is probably why
`#clear_active_connections!` was failing too sometimes.
Here, we just check the connection is present before clearing the
connections.
Spec was flaky cause work could still be in pipeline after the defer
length is 0. Our length denotes the backlog, not the in progress
count.
This adds a mechanism for gracefully stopping the queue and avoids
wait_for callse
Firstly, we need to understand that ActiveRecord can be
connected to a role which prevent writes and this happens in Discourse when a
replica database has been setup for failover purposes. When a role
prevent writes from happening, ActiveRecord will raise the
`ActiveRecord::ReadOnlyError` if a write query is attempted.
Secondly, theme fields are baked at runtime within GET requests. The
baking process involves writing the baked value to the
`ThemeField#baked_value` column in the database.
If we combine the two points above, we can see how the writing of the
baked value to the database will trigger a `ActiveRecord::ReadOnlyError`
in a GET requests when the database is connected to a role preventing
writes. However, failing to bake a theme is not the end of the world and
should not cause GET requests to fail. Therefore, this commit adds a rescue
for `ActiveRecord::ReadOnlyError` in the `ThemeField#ensure_baked!`
method.
Constants should always be only assigned once. The logical OR assignment
of a constant is a relic of the past before we used zeitwerk for
autoloading and had bugs where a file could be loaded twice resulting in
constant redefinition warnings.
When the thread is aborted, an exception is raised before the `start` of a job is set, and therefore raises an exception in the `ensure` block. This commit checks that `start` exists, and also adds `abort_on_exception=true` so that this issue would have caused test failures.
This will give us some aggregate stats on the defer queue performance.
It is limited to 100 entries (for safety) which is stored in an LRU cache.
Scheduler::Defer.stats can then be used to get an array that denotes:
- number of runs and completions (queued, finished)
- error count (errors)
- total duration (duration)
We can look later at exposing these metrics to gain visibility on the reason
the defer queue is clogged.
Use a dedicated thread to run Scheduler::Defer
This avoids blocking of a worker during operations that require waiting.
In particular uploads risked blocking a unicorn.
This also add a queue "length" that discourse prometheus consumes.