Commit Graph

39 Commits

Author SHA1 Message Date
Alan Guo Xiang Tan
e4e5db57f0
DEV: Fix undefined method check_email_sync_heartbeat in unicorn conf (#30360)
This is a follow-up to 9812407f76
2024-12-19 10:10:11 +08:00
Alan Guo Xiang Tan
9812407f76
FIX: Redo Sidekiq monitoring to restart stuck sidekiq processes (#30198)
This commit reimplements how we monitor Sidekiq processes that are
forked from the Unicorn master process. Prior to this change, we relied on
`Jobs::Heartbeat` to enqueue a `Jobs::RunHeartbeat` job every 3 minutes.
The `Jobs::RunHeartbeat` job then set a Redis key with a timestamp. In
the Unicorn master process, we then fetched the timestamp set
by the job from Redis every 30 minutes. If the timestamp had not been
updated for more than 30 minutes, we restarted the Sidekiq process. The
fundamental flaw with this approach is that it fails to consider
deployments with multiple hosts and multiple Sidekiq processes. A
Sidekiq process on a host may be in a bad state but the heartbeat check
will not restart the process because the `Jobs::RunHeartbeat` job is
still being executed by the working Sidekiq processes on other hosts.

In order to properly ensure that stuck Sidekiq processes are restarted,
we now rely on the [Sidekiq::ProcessSet](https://github.com/sidekiq/sidekiq/wiki/API#processes)
API that is supported by Sidekiq. The API provides us with "near real-time (updated every 5 sec)
info about the current set of Sidekiq processes running". The API
provides useful information like the hostname, pid and also when Sidekiq
last did its own heartbeat check. With that information, we can easily
determine if a Sidekiq process needs to be restarted from the Unicorn
master process.
2024-12-18 12:48:50 +08:00
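A minimal sketch of the per-process check described in 9812407f76, not the code from that commit. It assumes each process entry exposes `"hostname"`, `"pid"` and a `"beat"` timestamp (seconds since the epoch), which is the shape `Sidekiq::ProcessSet` entries are understood to have. The `stuck_sidekiq_pids` helper and the 60-second threshold are hypothetical.

```ruby
# Return the pids of Sidekiq processes on `hostname` whose last heartbeat
# is older than `stale_after` seconds.
#
# `processes` is any enumerable of hash-like entries such as
#   { "hostname" => "web-1", "pid" => 1234, "beat" => 1734497330.0 }
# The helper name and the default threshold are assumptions for illustration.
def stuck_sidekiq_pids(processes, hostname:, now: Time.now.to_f, stale_after: 60)
  processes
    .select { |process| process["hostname"] == hostname }
    .select { |process| now - process["beat"].to_f > stale_after }
    .map { |process| process["pid"] }
end
```

Because the check filters by hostname first, a healthy Sidekiq process on another host can no longer mask a stuck one on this host. In an application with Sidekiq loaded, the input would presumably be `Sidekiq::ProcessSet.new`.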
Alan Guo Xiang Tan
c1f25cdf5b
FIX: Unicorn master and Sidekiq reopening logs at the same time (#29137)
In our production environment, we have been seeing Sidekiq processes
getting stuck randomly when a USR1 signal is sent to the Unicorn master
process. We have not been able to identify the root cause of why the
Sidekiq process gets stuck. We however noticed that when the Unicorn
master process receives a USR1 signal, it will reopen the logs for the
Unicorn master process first before sending a USR1 signal for the
Unicorn worker processes to reopen the logs. We figured that we should
do the same for the Sidekiq process as well when a USR1 signal is received.

In this commit, we introduce an arbitrary delay of 1 second before
the Sidekiq process reopens its log files so as to allow enough time for the Unicorn
master to finish reopening its logs first.

We also do not send the reopen-logs signal to the Sidekiq process if the `DISCOURSE_LOG_SIDEKIQ`
env is not present because there is no need to reopen any logs.
2024-10-10 08:01:40 +08:00
Alan Guo Xiang Tan
9fdcdcf58d
DEV: Log error encountered when reopening sidekiq logs (#27411)
We are seeing the following error in our logs when Sidekiq is sent a
`USR1` signal in production when logrotate happens:

```
log writing failed. stream closed in another thread
Error encountered while starting Sidekiq: can't be called from trap context\n/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/unicorn-6.1.0/lib/unicorn/util.rb:71:in `reopen'
```

I'm not quite sure where the error is triggered from so I'm improving
the way we log errors.
2024-06-11 12:29:48 +08:00
Alan Guo Xiang Tan
23c38cbf11
DEV: Log Unicorn worker timeout backtraces to Rails.logger (#27257)
This commit introduces the following changes:

1. Introduce the `SignalTrapLogger` singleton which starts a single
   thread that polls a queue to log messages with the specified logger.
   This thread is necessary because most loggers cannot be used inside
   the `Signal.trap` context as they rely on mutexes which are not
   allowed within the context.

2. Moves the monkey patch in `freedom_patches/unicorn_http_server_patch.rb` to
   `config/unicorn.config.rb` which is already monkey patching
   `Unicorn::HttpServer`.

3. `Unicorn::HttpServer` will now automatically send a `USR2` signal to
   a unicorn worker 2 seconds before the worker is timed out by the
   Unicorn master.

4. When a Unicorn worker receives a `USR2` signal, it will now log only
   the main thread's backtraces to `Rails.logger`. Previously, it was
   `put`ing the backtraces to `STDOUT` which most people wouldn't read.
   Logging it via `Rails.logger` will make the backtraces easily
   accessible via `/logs`.
2024-06-03 12:51:12 +08:00
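The queue-and-thread idea behind item 1 of 23c38cbf11 can be sketched as below. This is a simplified illustration, not Discourse's `SignalTrapLogger`: the class name `QueuedTrapLogger` and its `log`/`stop` methods are hypothetical. The point is that the trap handler only pushes onto a `Queue`, while a normal thread does the actual logging.

```ruby
require "logger"

# Hypothetical, simplified queue-backed logger for use from signal handlers.
class QueuedTrapLogger
  STOP = Object.new # sentinel that tells the worker thread to exit

  def initialize(logger)
    @logger = logger
    @queue = Queue.new
    @thread = Thread.new { drain }
  end

  # Intended to be callable from a Signal.trap block: it only enqueues
  # the message and never touches the logger's mutex.
  def log(severity, message)
    @queue << [severity, message]
  end

  # Flush everything queued so far, then stop the worker thread.
  def stop
    @queue << STOP
    @thread.join
  end

  private

  # Runs on the worker thread, outside any trap context.
  def drain
    loop do
      item = @queue.pop
      break if item.equal?(STOP)

      severity, message = item
      @logger.add(severity, message)
    end
  end
end
```

A trap handler would then call something like `trap_logger.log(Logger::WARN, "worker timing out")` instead of using the logger directly.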
Alan Guo Xiang Tan
e9c8e182d3
DEV: Use Unicorn logger to log Sidekiq signal handling events (#27239)
This commit updates all Sidekiq signal handling event logs to go through
Unicorn's logger instead of logging to STDOUT. Going through a proper logger
means the log messages use the format configured for the logger,
so they carry proper timestamps.
2024-05-29 11:15:20 +08:00
David Taylor
6417173082
DEV: Apply syntax_tree formatting to lib/* 2023-01-09 12:10:19 +00:00
David Taylor
fe5bfc8d3b
DEV: Route Sidekiq logs to Rails logger (#15817)
Most of our logging goes through Rails.logger, and therefore appears in Logster at `/logs` on a site. The Sidekiq logger was bypassing this and writing directly to STDERR.

Unfortunately it's not possible to do `Sidekiq.logger = Rails.logger` because `Sidekiq#logger=` applies a number of patches to the logger instance, causing our whole logging system to break.

Instead, this commit adds a dedicated Logger instance with no output, which is then patched to forward all messages directly to `Rails.logger`
2022-02-04 16:28:20 +00:00
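The forwarding approach from fe5bfc8d3b could look roughly like this. It is a sketch under assumptions, not the commit's code: `build_forwarding_logger` is a hypothetical helper, and `target` stands in for `Rails.logger`. `Logger.new(nil)` creates a logger with no output device, and overriding `add` on that one instance hands each message to the target.

```ruby
require "logger"

# Build a logger that writes nothing itself and forwards every message
# to `target` (a stand-in for Rails.logger in this sketch).
def build_forwarding_logger(target)
  forwarder = Logger.new(nil)

  # Logger#debug/info/warn/error all funnel through #add, so overriding
  # it on this single instance is enough to capture every message.
  forwarder.define_singleton_method(:add) do |severity, message = nil, progname = nil, &block|
    target.add(severity, message, progname, &block)
  end

  forwarder
end
```

Since only the dedicated instance is patched, the target logger is left untouched by anything that later modifies the forwarder.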
David Taylor
dfcb8a72fd
DEV: Ensure Sidekiq warnings are logged to STDERR (#15800)
The default configuration will log to STDOUT, which pollutes the output of scripts/rake-tasks
2022-02-03 14:24:15 +00:00
David Taylor
ed6b3b82bd
FIX: Reopen sidekiq log files after rotation (#9429)
Unicorn uses the USR1 signal to indicate that log files should be reopened. This commit implements the same functionality for our forked sidekiq workers:

- USR1 is intercepted in the unicorn master, and re-issued to all child processes
- USR1 is trapped in the sidekiq processes, and `Unicorn::Util.reopen_logs` is used to re-open log files
2020-04-16 12:13:13 +01:00
Sam Saffron
28292d2759
PERF: avoid shelling to get hostname aggressively
Previously we had many places in the app that called `hostname` to get
the hostname of a server. This commit replaces the pattern in 2 ways:

1. We cache the result in `Discourse.os_hostname` so it is only ever called once

2. We prefer to use Socket.gethostname which avoids spawning a shell command

This improves performance as we are not spawning hostname processes throughout
the app lifetime
2020-02-18 15:13:19 +11:00
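The two points in 28292d2759 can be illustrated with a small memoized lookup. This is a sketch, not the body of `Discourse.os_hostname`; the `HostInfo` module is a hypothetical stand-in.

```ruby
require "socket"

# Hypothetical stand-in for a cached hostname helper.
module HostInfo
  # Socket.gethostname asks the OS directly instead of spawning a
  # `hostname` process; the result is memoized for the process lifetime.
  def self.os_hostname
    @os_hostname ||= Socket.gethostname
  end
end
```

Repeated calls to `HostInfo.os_hostname` return the same cached string object.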
Sam Saffron
f5396e2700 DEV: Sidekiq::Logging is gone use Sidekiq.logger instead
This 6.0 upgrade of sidekiq moved this around.
2019-12-10 15:09:51 +11:00
Krzysztof Kotlarek
35b1185a08 FIX: Revert Demon::DemonBase back to Demon::Base (#8132)
I introduced DemonBase because of a conflict between `demon/base.rb` and `jobs/base.rb`. However, to avoid renaming the base class, it is possible to use a regex on the absolute path in a custom Zeitwerk inflector.
2019-10-02 14:54:08 +10:00
Krzysztof Kotlarek
427d54b2b0 DEV: Upgrading Discourse to Zeitwerk (#8098)
Zeitwerk simplifies working with dependencies in dev and makes reloading class chains easier.

We no longer need to use Rails "require_dependency" anywhere and instead can just use standard 
Ruby patterns to require files.

This is a far reaching change and we expect some followups here.
2019-10-02 14:01:53 +10:00
David Taylor
e2449f9f23 Revert "Revert "Revert "FIX: Heartbeat check per sidekiq process (#7873)"""
This reverts commit c3497559be.
2019-08-30 11:26:16 +01:00
Sam Saffron
c3497559be Revert "Revert "FIX: Heartbeat check per sidekiq process (#7873)""
This reverts commit e805d44965.
We now have mechanisms in place to ensure heartbeat will always
be scheduled even if the scheduler is overloaded per: 098f938b
2019-08-30 10:12:10 +10:00
OsamaSayegh
e805d44965 Revert "FIX: Heartbeat check per sidekiq process (#7873)"
This reverts commit 340855da55.
2019-08-27 11:56:23 +00:00
Osama Sayegh
340855da55
FIX: Heartbeat check per sidekiq process (#7873)
* FIX: Heartbeat check per sidekiq process

* Rename method

* Remove heartbeat queues of previous bootups

* Regis feedback

* Refactor before_start

* Update lib/demon/sidekiq.rb

Co-Authored-By: Régis Hanol <regis@hanol.fr>

* Update lib/demon/sidekiq.rb

Co-Authored-By: Régis Hanol <regis@hanol.fr>

* Expire redis keys after 3600 seconds

* Don't use redis to store the list of queues
2019-08-26 09:33:49 +03:00
Sam Saffron
30990006a9 DEV: enable frozen string literal on all files
This reduces chances of errors where consumers of strings mutate inputs
and reduces memory usage of the app.

Test suite passes now, but there may be some stuff left, so we will run
a few sites on a branch prior to merging
2019-05-13 09:31:32 +08:00
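As an illustration of the class of error 30990006a9 guards against: with frozen string literals, code that mutates a string it was handed raises instead of silently changing the caller's data. The sketch below uses an explicit `.freeze` to stand in for a literal under the `# frozen_string_literal: true` magic comment; `with_suffix` is a hypothetical helper.

```ruby
# Return a new string; the caller's (possibly frozen) input is untouched.
def with_suffix(input, suffix)
  input.dup << suffix
end

name = "sidekiq".freeze # stands in for a frozen string literal
log_name = with_suffix(name, ".log")

begin
  name << ".log" # in-place mutation of a frozen string raises
rescue FrozenError => error
  warn "refused to mutate: #{error.message}"
end
```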
Sam
384135845b FEATURE: introduce ultra_low priority queue
This commit introduces an ultra low priority queue for post rebakes. This
way rebakes can never interfere with regular sidekiq processing for cases
where we perform a large scale rebake.

Additionally it allows Post.rebake_old to be run with rate_limiter: false
to avoid triggering the limiter when rebaking. This is handy for cases
where you want to just force the full rebake and not wait for it to trickle
2019-01-17 14:53:19 +11:00
Sam
df460b4abd PERF: run sidekiq with nice 5
This ensures that unicorn master forks of sidekiq run with a lower priority
than the webs. It means that a busy sidekiq is less likely to impact web
performance
2019-01-09 09:29:14 +11:00
Guo Xiang Tan
470b1a5bc1 Don't print Sidekiq starting message to STDERR. 2017-11-03 21:02:31 +08:00
Guo Xiang Tan
5012d46cbd Add rubocop to our build. (#5004) 2017-07-28 10:20:09 +09:00
Guo Xiang Tan
7ea288140d Allow multiple host when restricting Sidekiq queues. 2017-06-19 14:45:51 +09:00
Guo Xiang Tan
84490c4558 Allow a sidekiq queue to be configured to only run on a certain hostname. 2017-04-27 15:32:16 +08:00
Sam
0b3aec9c94 FEATURE: set UNICORN_STATS_SOCKET_DIR for status socket
eg:

sam@ubuntu stats_sockets % socat - UNIX-CONNECT:9622.sock
gc_stat
{"count":46,"heap_allocated_pages":2459,"heap_sorted_length":2460,"heap_allocatable_pages":0,"heap_available_slots":1002267,"heap_live_slots":647293,"heap_free_slots":354974,"heap_final_slots":0,"heap_marked_slots":503494,"heap_swept_slots":498773,"heap_eden_pages":2459,"heap_tomb_pages":0,"total_allocated_pages":2459,"total_freed_pages":0,"total_allocated_objects":4337014,"total_freed_objects":3689721,"malloc_increase_bytes":6448248,"malloc_increase_bytes_limit":29188387,"minor_gc_count":36,"major_gc_count":10,"remembered_wb_unprotected_objects":19958,"remembered_wb_unprotected_objects_limit":39842,"old_objects":462019,"old_objects_limit":895782,"oldmalloc_increase_bytes":6448696,"oldmalloc_increase_bytes_limit":19350882}
2017-04-21 11:37:03 -04:00
Sam
8ec7fd84fd FEATURE: prioritize sidekiq jobs
This commit introduces 3 queues for sidekiq

"critical" for urgent jobs (weighted at 4x weight)
"default" for standard jobs (weighted at 2x weight)
"low" for less important jobs


"critical jobs"

Reset Password emails have been separated into their own job
Heartbeat which is required to keep sidekiq running
Test email which needs to return real quick


"low priority jobs"

Notify mailing list
Pull hotlinked images
Update gravatar

"default"

All the rest

Note: for people running sidekiq from the command line, use:

bin/sidekiq -q critical,4 -q default,2 -q low
2016-04-07 12:56:43 +10:00
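To make the weights in 8ec7fd84fd concrete, here is a simplified model of weighted queue polling. It is an illustration only, not Sidekiq's fetch code: the helper names are hypothetical, and it assumes a queue given without a weight (`low`) counts as weight 1.

```ruby
# Expand { "critical" => 4, "default" => 2, "low" => 1 } into a pool in
# which each queue name appears once per unit of weight.
def weighted_pool(weights)
  weights.flat_map { |name, weight| [name] * weight }
end

# One polling order: shuffle the pool, then keep the first occurrence of
# each queue. Heavier queues are more likely to land near the front.
def polling_order(weights, random: Random.new)
  weighted_pool(weights).shuffle(random: random).uniq
end
```

Under this model every queue is still checked on each pass, so `low` jobs are delayed by higher-priority work rather than starved outright.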
Sam
65edbb609c Revert "Revert message bus upgrade"
This reverts commit 47e718f5b2.
2015-12-09 11:48:41 +11:00
Sam
47e718f5b2 Revert message bus upgrade 2015-12-09 11:45:11 +11:00
Sam
2cc95af69b Revert "REVERT: message bus changes"
This reverts commit 4820d5c7b0.
2015-12-09 07:36:36 +11:00
Robin Ward
4820d5c7b0 REVERT: message bus changes 2015-12-08 15:32:31 -05:00
Sam
a3ba564b03 missing spot where initializer was renamed 2015-12-08 07:13:29 +11:00
Sam
95159fb82a BUGFIX: Sidekiq could be initialized incorrectly in some cases
Symptom, no jobs run
2014-06-03 17:17:10 +10:00
Sam
dc06401479 PERF: reduce sidekiq worker count to 5 2014-05-14 10:21:11 +10:00
Sam
ead7c52a06 Refactor demonizer in prep for unicorn forking
Upgrade sidekiq
2014-04-17 15:58:00 +10:00
Régis Hanol
b56b11d96a add qunit to autospec 2013-11-01 23:57:50 +01:00
Sam
89f801ac04 fix no sidetiq when using demon 2013-10-24 15:58:28 +11:00
Sam
28a0cb494a rails 4 upgrade
rack lock is trouble, nuke it out of orbit
more aggressive suicide for forked sidekiq
2013-10-10 14:23:24 +11:00
Sam
c4bab8915c fix initialization issues with unicorn
amend unicorn script to demonize sidekiq
create a sidekiq demon that unicorn consumes
correct bug in exec_sql with empty params
2013-10-10 14:23:24 +11:00