75 Commits

Author SHA1 Message Date
Sam
c315e26485
FIX: handle more thread pool edge cases (#30392)
* Split `shutdown` into two separate methods for better control:
  - `shutdown` - signals threads to stop accepting new work
  - `wait_for_termination` - waits for threads to finish (with optional timeout)

* Add tracking of busy threads via `@busy_threads` Set
* Make idle_time parameter optional with 30-second default
* Improve thread spawning logic:
  - Spawn initial thread immediately when work is posted
  - Spawn additional threads when all threads are busy and work is queued
* Fix race condition in work distribution
* Add busy thread count to stats output
* Add test coverage for zero min_threads configuration

This commit makes the ThreadPool more reliable, easier to use, and adds 
better visibility into its internal state.

---------

Co-authored-by: Alan Guo Xiang Tan <gxtan1990@gmail.com>
2024-12-20 11:50:00 +11:00
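The split between `shutdown` and `wait_for_termination` described in the commit above could look roughly like the sketch below. It is not the Discourse implementation: the class name `SketchThreadPool`, the `max_threads:` keyword, the `stats` keys, and the use of `Queue#close` to stop accepting work are all assumptions made for illustration, and the idle timeout and `min_threads` handling are left out.

```ruby
require "set"

# Hypothetical sketch; everything except the names shutdown,
# wait_for_termination and @busy_threads is an assumption.
class SketchThreadPool
  def initialize(max_threads: 2)
    @max_threads = max_threads
    @queue = Queue.new
    @mutex = Mutex.new
    @threads = Set.new
    @busy_threads = Set.new
  end

  # Queue a block. Raises ClosedQueueError once shutdown has been called.
  def post(&block)
    @queue << block
    @mutex.synchronize do
      # True when there are no threads yet, or when every thread is working.
      all_busy = @busy_threads.size == @threads.size
      spawn_thread if @threads.size < @max_threads && all_busy
    end
  end

  # Signal that no new work is accepted; queued work still drains.
  def shutdown
    @queue.close
  end

  # Wait for workers to finish. Returns true if all of them exited in time.
  def wait_for_termination(timeout: nil)
    threads = @mutex.synchronize { @threads.to_a }
    deadline = timeout && Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
    threads.all? do |thread|
      remaining = deadline && [deadline - Process.clock_gettime(Process::CLOCK_MONOTONIC), 0].max
      thread.join(remaining)
    end
  end

  def stats
    @mutex.synchronize do
      { threads: @threads.count(&:alive?), busy: @busy_threads.size, queued: @queue.size }
    end
  end

  private

  # Must be called while holding @mutex.
  def spawn_thread
    @threads << Thread.new do
      # pop returns nil once the queue is closed and empty, ending the loop.
      while (work = @queue.pop)
        @mutex.synchronize { @busy_threads << Thread.current }
        begin
          work.call
        rescue => e
          warn "task failed: #{e.message}"
        ensure
          @mutex.synchronize { @busy_threads.delete(Thread.current) }
        end
      end
    end
  end
end

pool = SketchThreadPool.new(max_threads: 2)
results = Queue.new
5.times { |i| pool.post { results << i * 2 } }
pool.shutdown
pool.wait_for_termination(timeout: 5)
```

Keeping the two steps separate lets a caller stop intake first and decide afterwards how long it is willing to wait for in-flight work.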
Sam
efa50a4da2
FEATURE: ThreadPool implementation (#30364)
This commit introduces a new ThreadPool class that provides efficient worker
thread management for background tasks. Key features include:

- Dynamic scaling from min to max threads based on workload
- Proper database connection management in multisite setup
- Graceful shutdown with task completion
- Robust error handling and logging
- FIFO task processing with a managed queue
- Configurable idle timeout for worker threads

The implementation is thoroughly tested, including stress tests, error
scenarios, and multisite compatibility.
2024-12-20 07:37:12 +11:00
Loïc Guitaut
88f1b3b195 DEV: Try fixing flaky spec related to Scheduler::Defer
Checking if a connection is available is probably not enough; when the
connection is available, we should still verify it’s not stale.
2024-11-28 15:30:13 +01:00
Loïc Guitaut
f69f0211df DEV: Fix flaky spec related to Scheduler::Defer
In some cases in CI env, it seems the AR connection isn’t available and
the `ensure` block is executed. It’s calling `#verify!` on the
connection, so it can fail sometimes. This is probably why
`#clear_active_connections!` was failing too sometimes.

Here, we just check the connection is present before clearing the
connections.
2024-11-28 11:46:52 +01:00
Sam
07813ba83c
DEV: fix hanging spec (#29974) 2024-11-28 11:06:19 +08:00
Sam
72132c35fb
DEV: fix flaky spec (#29972)
The spec was flaky because work could still be in the pipeline after the defer
length is 0. Our length denotes the backlog, not the in-progress
count.

This adds a mechanism for gracefully stopping the queue and avoids
wait_for calls.
2024-11-28 11:21:35 +11:00
Loïc Guitaut
fac6147039 DEV: Verify DB connection before trying to clear active connections 2024-11-27 18:12:11 +01:00
Alan Guo Xiang Tan
6bf0ac730f
FIX: Rescue ActiveRecord::ReadOnlyError when baking theme field (#29776)
Firstly, we need to understand that ActiveRecord can be
connected to a role which prevents writes, and this happens in Discourse when a
replica database has been set up for failover purposes. When a role
prevents writes from happening, ActiveRecord will raise
`ActiveRecord::ReadOnlyError` if a write query is attempted.

Secondly, theme fields are baked at runtime within GET requests. The
baking process involves writing the baked value to the
`ThemeField#baked_value` column in the database.

If we combine the two points above, we can see how writing the
baked value to the database will trigger an `ActiveRecord::ReadOnlyError`
in a GET request when the database is connected to a role preventing
writes. However, failing to bake a theme is not the end of the world and
should not cause GET requests to fail. Therefore, this commit adds a rescue
for `ActiveRecord::ReadOnlyError` in the `ThemeField#ensure_baked!`
method.
2024-11-15 10:19:10 +08:00
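The rescue described in the commit above might be sketched as follows. This is a simplified stand-in, not Discourse's `ThemeField`: the class `SketchThemeField`, its `read_only:` flag, the `persist!` helper, and the upcasing "bake" step are hypothetical, and a stub `ActiveRecord::ReadOnlyError` is defined only so the example runs without Rails.

```ruby
# Stand-in so the sketch runs without Rails; the real class ships with ActiveRecord.
unless defined?(ActiveRecord::ReadOnlyError)
  module ActiveRecord
    class ReadOnlyError < StandardError; end
  end
end

# Hypothetical, simplified model used only for illustration.
class SketchThemeField
  attr_reader :baked_value

  def initialize(value, read_only: false)
    @value = value
    @read_only = read_only
  end

  # Bake on demand. A read-only database must not fail the request,
  # so the write error is swallowed and the field stays unbaked.
  def ensure_baked!
    return if @baked_value
    persist!(@value.upcase) # pretend "baking" is just upcasing
  rescue ActiveRecord::ReadOnlyError
    nil
  end

  private

  # Simulates the database write; raises when the role prevents writes.
  def persist!(baked)
    raise ActiveRecord::ReadOnlyError, "write attempted while preventing writes" if @read_only
    @baked_value = baked
  end
end

writable = SketchThemeField.new("body { color: red }")
writable.ensure_baked!

replica = SketchThemeField.new("body { color: red }", read_only: true)
replica.ensure_baked! # does not raise
```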
Alan Guo Xiang Tan
322a3be2db
DEV: Remove logical OR assignment of constants (#29201)
Constants should only ever be assigned once. The logical OR assignment
of a constant is a relic from before we used zeitwerk for
autoloading, when we had bugs where a file could be loaded twice, resulting in
constant redefinition warnings.
2024-10-16 10:09:07 +08:00
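As a small illustration of the two styles the commit above contrasts (the module and constant names here are hypothetical):

```ruby
module SketchLimits
  # Legacy pattern: `||=` keeps the first value and stays silent if this
  # file is evaluated a second time, which hides double-loading bugs.
  LEGACY_MAX ||= 10

  # Preferred pattern: plain assignment. Evaluating the file twice would
  # now produce an "already initialized constant" warning.
  MAX = 10
end
```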
Daniel Waterworth
30922855f2
PERF: Don't allow a single user to monopolize the defer queue (#25593) 2024-02-07 13:47:50 -06:00
Daniel Waterworth
5c92d7da22
FIX: Increase defer queue length (#24200)
It's important that there is a limit, but the current limit is too
restrictive.
2023-11-01 14:02:53 -05:00
Daniel Waterworth
26e267478d
SECURITY: Don't allow a particular site to monopolize the defer queue 2023-07-28 12:53:51 +01:00
David Taylor
29f7ec7090
DEV: Prevent defer stats exception when thread aborted (#19863)
When the thread is aborted, an exception is raised before the `start` of a job is set, and therefore raises an exception in the `ensure` block. This commit checks that `start` exists, and also adds `abort_on_exception=true` so that this issue would have caused test failures.
2023-01-16 09:08:44 +11:00
Sam
7b63c42304
FEATURE: add basic instrumentation to defer queue (#19824)
This will give us some aggregate stats on the defer queue performance.

It is limited to 100 entries (for safety), which are stored in an LRU cache.

Scheduler::Defer.stats can then be used to get an array that denotes:

- number of runs and completions (queued, finished)
- error count (errors)
- total duration (duration)

We can look later at exposing these metrics to gain visibility on the reason
the defer queue is clogged.
2023-01-12 12:29:50 +11:00
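The kind of aggregate the commit above describes could be collected along these lines. This is a sketch under assumptions, not the `Scheduler::Defer` code: the class `SketchDeferStats`, its method names, and the choice to count a failed job as finished are hypothetical, and a plain Hash with re-insertion on access stands in for the LRU cache.

```ruby
# Hypothetical sketch of per-job aggregate stats with a 100-entry cap.
class SketchDeferStats
  MAX_ENTRIES = 100

  def initialize
    @mutex = Mutex.new
    @stats = {}
  end

  def record_queued(name)
    @mutex.synchronize { entry(name)[:queued] += 1 }
  end

  # Run the block, recording errors and elapsed time under the given name.
  def run(name)
    start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield
  rescue => error
    @mutex.synchronize { entry(name)[:errors] += 1 }
    warn "deferred job #{name} failed: #{error.message}"
  ensure
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
    @mutex.synchronize do
      stat = entry(name)
      stat[:finished] += 1 # assumption: failures also count as finished
      stat[:duration] += elapsed
    end
  end

  # Array of [name, counters] pairs, oldest entry first.
  def stats
    @mutex.synchronize { @stats.map { |name, stat| [name, stat.dup] } }
  end

  private

  # Must be called while holding @mutex. Re-inserting the key moves it to
  # the end of the Hash, so the first key is always the least recently used.
  def entry(name)
    stat = @stats.delete(name) || { queued: 0, finished: 0, errors: 0, duration: 0.0 }
    @stats[name] = stat
    @stats.shift while @stats.size > MAX_ENTRIES
    stat
  end
end

defer_stats = SketchDeferStats.new
defer_stats.record_queued("send_email")
defer_stats.run("send_email") { sleep 0.01 }
```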
David Taylor
6417173082
DEV: Apply syntax_tree formatting to lib/* 2023-01-09 12:10:19 +00:00
Sam
057087e0e8 FEATURE: log long running jobs in the defer queue
If a job in the defer queue takes longer than 90 seconds, log an error.
2018-10-12 17:03:47 +11:00
Neil Lalonde
4ad7ce70ce REFACTOR: extract scheduler to the mini_scheduler gem 2018-07-31 17:12:55 -04:00
Guo Xiang Tan
7eff64773c Revert "FIX: Don't clear active connections in defer queue."
This reverts commit c9feadf9ec.
2018-06-19 17:58:21 +08:00
Guo Xiang Tan
df24c51c6f Revert "FIX: Don't try to dequeue an empty queue."
This reverts commit 1af7d4a894.
2018-06-19 15:49:45 +08:00
Guo Xiang Tan
1af7d4a894 FIX: Don't try to dequeue an empty queue. 2018-06-19 15:25:44 +08:00
Guo Xiang Tan
c9feadf9ec FIX: Don't clear active connections in defer queue. 2018-06-19 12:45:16 +08:00
Sam
6974b7d6a8 FIX: run deferred jobs inline in sidekiq 2018-05-23 12:05:37 +10:00
Régis Hanol
93ed8d2522
PERF: defer user notifications (#5827) 2018-05-15 09:51:32 +02:00
Sam
2b8d4508e5 PERF: stop running background work between requests
Use a dedicated thread to run Scheduler::Defer

This avoids blocking of a worker during operations that require waiting.

In particular uploads risked blocking a unicorn.

This also adds a queue "length" that discourse prometheus consumes.
2017-11-23 15:48:47 +11:00
Sam
7ca08216bd FIX: ensure we have no dangling db connections on threads
This corrects 10 second timeouts in dev mode, when reloader kicks in
2017-10-30 14:24:15 +11:00
Sam
55d096ee8b FEATURE: add event for scheduled_job_ran 2017-10-23 17:22:17 +11:00
Guo Xiang Tan
9dcb11f553 Fix the build. 2017-10-11 17:45:19 +08:00
Guo Xiang Tan
36f8697a59 FIX: Exception has to be wrapped in the connection as well. 2017-10-11 17:19:26 +08:00
Guo Xiang Tan
09721090a3 FIX: Ensure that we revert back to default connection after running jobs. 2017-10-11 17:17:03 +08:00
Sam
9b4fd0b26b correct multisite issues with scheduler 2017-10-11 18:46:53 +11:00
Sam
6b4a1af160 FIX: don't attempt to schedule if there is no next run 2017-10-11 14:27:16 +11:00
Sam
233299982f keep time consistent, we always use to_i 2017-10-11 14:26:50 +11:00
Guo Xiang Tan
5012d46cbd Add rubocop to our build. (#5004) 2017-07-28 10:20:09 +09:00
Sam
04eac9f14a let's attempt to get these specs working! 2017-07-24 18:35:20 -04:00
Sam
f67e715ef1 comment out specs that break others
will check in a fixed spec tomorrow
2017-07-24 17:28:24 -04:00
Sam
f97fb7b70c tighten time to stop scheduler 2017-07-24 15:19:54 -04:00
Sam
0c47153808 clean up stop semantics 2017-07-24 15:17:48 -04:00
Sam
c08a7aee8f clean up skipped tests
tighter connection handling in scheduler
2017-07-24 15:06:24 -04:00
Sam
66ef7976ea FIX: don't re-schedule correctly scheduled daily tasks 2017-07-24 14:30:43 -04:00
Guo Xiang Tan
d940166a89 Re-enable skipped Scheduler::ScheduleInfo test. 2017-07-25 00:03:03 +09:00
Guo Xiang Tan
c3b5bca0e8 Log error for all exceptions in scheduler stats. 2017-04-26 09:33:05 +08:00
Guo Xiang Tan
1f6418f907 Track error message in SchedulerStats. 2017-04-26 01:34:25 +08:00
Sam
e9ba6e4e99 clean up formatting reports 2016-05-31 07:57:28 +10:00
Sam
3eec0a83b0 clean up stop semantics and bypass test 2016-05-30 13:59:58 +10:00
Sam
cc088956bc correct some test concurrency bugs 2016-05-30 12:28:05 +10:00
Sam
c9dcffe434 FEATURE: store history for scheduled job execution 2016-05-30 11:38:08 +10:00
Régis Hanol
fbacaab2fc FIX: disable scheduled jobs when in readonly mode 2016-01-11 18:31:28 +01:00
Sam
f85b59b6d4 FIX: you could not manually trigger jobs via sidekiq ui 2015-08-24 16:44:41 +10:00
Sam
5373413102 skip runner params changed 2015-06-26 14:02:17 +10:00
Sam
d6d9a7fa09 FEATURE: per host regular jobs
These are jobs that will run on every host that is running discourse.

If you have multiple hosts running the same site, you get independent
schedules.
2015-06-26 13:37:05 +10:00