59 Commits

Author SHA1 Message Date
Alan Guo Xiang Tan
e4e5db57f0
DEV: Fix undefined method check_email_sync_heartbeat in unicorn conf (#30360)
This is a follow-up to 9812407f76
2024-12-19 10:10:11 +08:00
Alan Guo Xiang Tan
9812407f76
FIX: Redo Sidekiq monitoring to restart stuck sidekiq processes (#30198)
This commit reimplements how we monitor Sidekiq processes that are
forked from the Unicorn master process. Prior to this change, we rely on
`Jobs::Heartbeat` to enqueue a `Jobs::RunHeartbeat` job every 3 minutes.
The `Jobs::RunHeartbeat` job then sets a Redis key with a timestamp. In
the Unicorn master process, we then fetch the timestamp that has been set
by the job from Redis every 30 minutes. If the timestamp has not been
updated for more than 30 minutes, we restart the Sidekiq process. The
fundamental flaw with this approach is that it fails to consider
deployments with multiple hosts and multiple Sidekiq processes. A
sidekiq process on a host may be in a bad state but the heartbeat check
will not restart the process because the `Jobs::RunHeartbeat` job is
still being executed by the working Sidekiq processes on other hosts.

In order to properly ensure that stuck Sidekiq processes are restarted,
we now rely on the [Sidekiq::ProcessSet](https://github.com/sidekiq/sidekiq/wiki/API#processes)
API that is supported by Sidekiq. The API provides us with "near real-time (updated every 5 sec)
info about the current set of Sidekiq processes running". The API
provides useful information like the hostname, pid and also when Sidekiq
last did its own heartbeat check. With that information, we can easily
determine if a Sidekiq process needs to be restarted from the Unicorn
master process.
2024-12-18 12:48:50 +08:00
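A minimal sketch of the per-process check described in 9812407f76, not the code from the commit. It assumes each entry answers `[]` with `"hostname"`, `"pid"` and `"beat"` (epoch seconds of the last heartbeat), which is how the entries yielded by `Sidekiq::ProcessSet` behave as far as the Sidekiq API wiki describes them. The helper name `stuck_sidekiq_pids` and the 30-second threshold are hypothetical.

```ruby
# Hypothetical helper: returns the pids of Sidekiq processes on this host
# whose last heartbeat is older than max_beat_age seconds.
# `processes` is any Enumerable of hash-like entries, e.g. Sidekiq::ProcessSet.new.
def stuck_sidekiq_pids(processes, hostname:, now: Time.now.to_f, max_beat_age: 30)
  processes
    .select { |process| process["hostname"] == hostname }
    .select { |process| now - process["beat"].to_f > max_beat_age }
    .map { |process| process["pid"] }
end

# Stand-in data so the sketch runs without Sidekiq or Redis.
sample_processes = [
  { "hostname" => "web-1", "pid" => 101, "beat" => 1_000.0 },
  { "hostname" => "web-1", "pid" => 102, "beat" => 940.0 },
  { "hostname" => "web-2", "pid" => 201, "beat" => 900.0 },
]

# Only processes on the local host are considered; other hosts look after
# their own Sidekiq processes.
puts stuck_sidekiq_pids(sample_processes, hostname: "web-1", now: 1_005.0).inspect
```

In a Unicorn master, the same helper could presumably be fed `Sidekiq::ProcessSet.new` and the local hostname, with each returned pid then restarted.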
Alan Guo Xiang Tan
322a3be2db
DEV: Remove logical OR assignment of constants (#29201)
Constants should always be only assigned once. The logical OR assignment
of a constant is a relic of the past before we used zeitwerk for
autoloading and had bugs where a file could be loaded twice resulting in
constant redefinition warnings.
2024-10-16 10:09:07 +08:00
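For illustration of 322a3be2db, a small sketch of the two forms side by side. The module and constant names are hypothetical and not taken from the Discourse codebase.

```ruby
# Hypothetical module used only for illustration.
module QueueSettings
  # Assigned exactly once. With an autoloader that loads each file a single
  # time, there is no redefinition to guard against.
  DEFAULT_QUEUE = "default"

  # The older defensive form, which the commit removes, looked like this:
  #   DEFAULT_QUEUE ||= "default"
  # It hides a double load instead of surfacing it as a warning.
end

puts QueueSettings::DEFAULT_QUEUE
```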
Alan Guo Xiang Tan
c1f25cdf5b
FIX: Unicorn master and Sidekiq reopening logs at the same time (#29137)
In our production environment, we have been seeing Sidekiq processes
getting stuck randomly when a USR1 signal is sent to the Unicorn master
process. We have not been able to identify the root cause of why the
Sidekiq process gets stuck. We however noticed that when the Unicorn
master process receives a USR1 signal, it will reopen the logs for the
Unicorn master process first before sending a USR1 signal for the
Unicorn worker processes to reopen the logs. We figured that we should
do the same for the Sidekiq process as well when a USR1 signal is received.

In this commit, we introduce an arbitrary delay of 1 second before
the Sidekiq process reopens its log files so as to allow enough time for the Unicorn
master to finish reopening its logs first.

We also do not send the USR1 signal to the Sidekiq process if the `DISCOURSE_LOG_SIDEKIQ`
env is not present because there are no logs to reopen.
2024-10-10 08:01:40 +08:00
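The delayed reopen from c1f25cdf5b can be sketched roughly as follows. This is an assumption-laden illustration, not the commit's code: the helper `reopen_logs_after` is hypothetical, the block stands in for whatever actually reopens the log files, and running the delay on a separate thread is a choice made here for the sketch.

```ruby
# Hypothetical helper: runs the given block after `delay` seconds on a
# separate thread, so the caller (for example a signal handler) returns
# immediately. The block stands in for the real log-reopening call.
def reopen_logs_after(delay, &reopen)
  Thread.new do
    sleep(delay)
    reopen.call
  end
end

# In the Sidekiq process: wait one second after USR1 so the Unicorn master
# has time to finish reopening its own logs first.
Signal.trap("USR1") do
  reopen_logs_after(1) { warn "reopening Sidekiq logs" }
end

# Direct call with a short delay, to show the helper in isolation.
reopened = []
reopen_logs_after(0.01) { reopened << :done }.join
puts reopened.inspect
```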
Alan Guo Xiang Tan
10ff0ee0cc
FIX: Ensure we dispose of MiniRacer::Context before forking daemons (#28361)
This commit updates `Demon::Base#start` to call `Discourse.before_fork`
before forking. According to the docs in `mini_racer`, we need to
"Dispose manually of all MiniRacer::Context objects prior to forking".

This commit is motivated by a segmentation fault which we are seeing in
production when killing a daemon process. The backtrace of the core dump
includes traces of `mini_racer`, so we think this is the cause. Note that
we are not 100% sure if this will fix the issue.
2024-08-14 12:45:34 +08:00
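A rough sketch of the ordering that 10ff0ee0cc describes: run a cleanup hook in the parent, then fork. `TinyDaemon` is a hypothetical stand-in and not Discourse's `Demon::Base`; the hook here only records that it ran, where the real one would dispose of `MiniRacer::Context` objects.

```ruby
# Hypothetical stand-in for a daemon base class.
class TinyDaemon
  def initialize(before_fork:, &work)
    @before_fork = before_fork
    @work = work
  end

  # Runs the cleanup hook in the parent first, then forks the child.
  # Returns the child's pid.
  def start
    @before_fork.call
    fork do
      @work.call
      exit!(0) # skip at_exit handlers inherited from the parent
    end
  end
end

events = []
daemon = TinyDaemon.new(before_fork: -> { events << :disposed }) do
  # The child sees the parent's memory as it was at fork time, so the hook's
  # effect is visible here only if it ran before the fork.
  exit!(events == [:disposed] ? 0 : 1)
end

_, status = Process.wait2(daemon.start)
puts status.success?
```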
Alan Guo Xiang Tan
9fdcdcf58d
DEV: Log error encountered when reopening sidekiq logs (#27411)
We are seeing the following error in our logs when Sidekiq is sent a
`USR1` signal in production when logrotate happens:

```
log writing failed. stream closed in another thread
Error encountered while starting Sidekiq: can't be called from trap context\n/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/unicorn-6.1.0/lib/unicorn/util.rb:71:in `reopen'
```

I'm not quite sure where the error is triggered from so I'm improving
the way we log errors.
2024-06-11 12:29:48 +08:00
Alan Guo Xiang Tan
23c38cbf11
DEV: Log Unicorn worker timeout backtraces to Rails.logger (#27257)
This commit introduces the following changes:

1. Introduce the `SignalTrapLogger` singleton which starts a single
   thread that polls a queue to log messages with the specified logger.
   This thread is necessary because most loggers cannot be used inside
   the `Signal.trap` context as they rely on mutexes which are not
   allowed within the context.

2. Moves the monkey patch in `freedom_patches/unicorn_http_server_patch.rb` to
   `config/unicorn.config.rb` which is already monkey patching
   `Unicorn::HttpServer`.

3. `Unicorn::HttpServer` will now automatically send a `USR2` signal to
   a unicorn worker 2 seconds before the worker is timed out by the
   Unicorn master.

4. When a Unicorn worker receives a `USR2` signal, it will now log only
   the main thread's backtraces to `Rails.logger`. Previously, it was
   `put`ing the backtraces to `STDOUT` which most people wouldn't read.
   Logging it via `Rails.logger` will make the backtraces easily
   accessible via `/logs`.
2024-06-03 12:51:12 +08:00
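The queue-and-thread idea behind `SignalTrapLogger` in 23c38cbf11 can be sketched as below. This is a simplified illustration under assumptions, not the class from the commit: `TrapSafeLogger`, its `log` and `close` methods, and the nil sentinel are all hypothetical.

```ruby
require "logger"
require "stringio"

# Hypothetical, simplified take on the idea described in the commit message.
class TrapSafeLogger
  def initialize(logger)
    @logger = logger
    @queue = Queue.new
    # The worker thread is the only place that touches the real logger,
    # so the logger's mutex is never taken inside a trap handler.
    @thread = Thread.new do
      while (message = @queue.pop)
        @logger.info(message)
      end
    end
  end

  # Intended to be callable from a Signal.trap block: it only enqueues.
  def log(message)
    @queue.push(message)
  end

  # Pushes a nil sentinel and waits for the worker to drain the queue.
  def close
    @queue.push(nil)
    @thread.join
  end
end

output = StringIO.new
trap_logger = TrapSafeLogger.new(Logger.new(output))
trap_logger.log("worker timed out")
trap_logger.close
puts output.string
```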
Alan Guo Xiang Tan
e9c8e182d3
DEV: Use Unicorn logger to log Sidekiq signal handling events (#27239)
This commit updates all Sidekiq signal handling event logs to go through
Unicorn's logger instead of logging to STDOUT. Going through a proper logger
means the log messages are logged in the format which the logger has configured.
This means we get proper timestamps for the log messages.
2024-05-29 11:15:20 +08:00
Jarek Radosz
694b5f108b
DEV: Fix various rubocop lints (#24749)
These (21 + 3 from previous PRs) are soon to be enabled in rubocop-discourse:

Capybara/VisibilityMatcher
Lint/DeprecatedOpenSSLConstant
Lint/DisjunctiveAssignmentInConstructor
Lint/EmptyConditionalBody
Lint/EmptyEnsure
Lint/LiteralInInterpolation
Lint/NonLocalExitFromIterator
Lint/ParenthesesAsGroupedExpression
Lint/RedundantCopDisableDirective
Lint/RedundantRequireStatement
Lint/RedundantSafeNavigation
Lint/RedundantStringCoercion
Lint/RedundantWithIndex
Lint/RedundantWithObject
Lint/SafeNavigationChain
Lint/SafeNavigationConsistency
Lint/SelfAssignment
Lint/UnreachableCode
Lint/UselessMethodDefinition
Lint/Void

Previous PRs:
Lint/ShadowedArgument
Lint/DuplicateMethods
Lint/BooleanSymbol
RSpec/SpecFilePathSuffix
2023-12-06 23:25:00 +01:00
David Taylor
6417173082
DEV: Apply syntax_tree formatting to lib/* 2023-01-09 12:10:19 +00:00
David Taylor
fe5bfc8d3b
DEV: Route Sidekiq logs to Rails logger (#15817)
Most of our logging goes through Rails.logger, and therefore appears in Logster at `/logs` on a site. The Sidekiq logger was bypassing this and writing directly to STDERR.

Unfortunately it's not possible to do `Sidekiq.logger = Rails.logger` because `Sidekiq#logger=` applies a number of patches to the logger instance, causing our whole logging system to break.

Instead, this commit adds a dedicated Logger instance with no output, which is then patched to forward all messages directly to `Rails.logger`
2022-02-04 16:28:20 +00:00
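One way the forwarding approach from fe5bfc8d3b might look, as a sketch only. The commit patches a dedicated `Logger` instance; this illustration uses a subclass instead, with the hypothetical name `ForwardingLogger`, and a `StringIO`-backed logger stands in for `Rails.logger`.

```ruby
require "logger"
require "stringio"

# Hypothetical class: a Logger with no device of its own that hands every
# message to another logger.
class ForwardingLogger < Logger
  def initialize(target)
    super(nil) # nil device: this logger writes nothing itself
    @target = target
  end

  # Logger#debug, #info, #warn and friends all funnel into #add.
  def add(severity, message = nil, progname = nil, &block)
    @target.add(severity, message, progname, &block)
  end
end

sink = StringIO.new
app_logger = Logger.new(sink) # stand-in for Rails.logger
sidekiq_logger = ForwardingLogger.new(app_logger)
sidekiq_logger.warn("queue is backing up")
puts sink.string
```

Because the target logger is left untouched, any patches applied to the forwarding instance do not reach it.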
David Taylor
dfcb8a72fd
DEV: Ensure Sidekiq warnings are logged to STDERR (#15800)
The default configuration will log to STDOUT, which pollutes the output of scripts/rake-tasks
2022-02-03 14:24:15 +00:00
Peter Zhu
c5fd8c42db
DEV: Fix methods removed in Ruby 3.2 (#15459)
* File.exists? is deprecated and removed in Ruby 3.2 in favor of
File.exist?
* Dir.exists? is deprecated and removed in Ruby 3.2 in favor of
Dir.exist?
2022-01-05 18:45:08 +01:00
Martin Brennan
f34fa999a2
DEV: IMAP debugging improvements (#11784)
Improvements to make console access to IncomingEmail more pleasant, and stopping certain IMAP logs from landing in the DB because they just create too much noise.
2021-01-21 11:37:47 +10:00
David Taylor
1d024f77a6
FEATURE: Allow plugins to register demon processes (#11493)
This allows plugins to call `register_demon_process` with a Class inheriting from Demon::Base. The unicorn master process will take care of spawning, monitoring and restarting the process. This API should be used with extreme caution, but it is significantly cleaner than spawning processes/threads in an `after_initialize` block.

This commit also cleans up the demon spawning logging so that it uses the same format as unicorn worker logging. It also switches to the block form of `fork` to ensure that Demons exit after running, rather than returning execution to where the fork took place.
2020-12-16 09:43:39 +00:00
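The block form of `fork` mentioned in 1d024f77a6 can be illustrated with a short sketch. The helper `run_in_child` is hypothetical; it runs a block in a forked child and returns what the child wrote back over a pipe.

```ruby
# Hypothetical helper for illustration.
def run_in_child
  reader, writer = IO.pipe
  pid = fork do
    # Block form: the child runs only this block and then exits, so it never
    # falls through into the parent's code after the fork call.
    reader.close
    writer.write(yield.to_s)
    writer.close
    exit!(0) # the block would exit anyway; exit! also skips inherited at_exit handlers
  end
  writer.close
  result = reader.read
  reader.close
  Process.wait(pid)
  result
end

puts run_in_child { "hello from child #{Process.pid}" }
puts "parent is #{Process.pid}"
```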
David Taylor
477538bf2d
DEV: setproctitle on demon processes (#11402)
This makes it easier to identify processes in `ps` output
2020-12-04 09:41:17 +00:00
Martin Brennan
7f2f87bf59
DEV: Review fixes (#10641)
See comments in https://review.discourse.org/t/dev-imap-log-to-database-10435/14337/6 for context.
2020-09-10 13:41:46 +10:00
Martin Brennan
4670b62969
DEV: IMAP log to database (#10435)
Convert all IMAP logging to write to a database table for easier inspection. These logs are cleaned up daily if they are > 5 days old.

Logs can easily be watched in dev by setting `DISCOURSE_DEV_LOG_LEVEL="debug"` and running `tail -f development.log | grep IMAP`
2020-08-14 12:01:31 +10:00
Martin Brennan
5a3494b1e1
FIX: IMAP archive fix and group list mailbox code unification (#10355)
* Fixed an issue I introduced in the last PR where I was archiving everything regardless of whether it is actually archived in Discourse
* Refactor group list_mailboxes IMAP code to use providers, add specs, and add provider code to get the correct provider
2020-08-04 14:19:57 +10:00
Martin Brennan
2920988b3a
FIX: IMAP sync email update uniqueness across groups and minor improvements (#10332)
Adds an imap_group_id column to IncomingEmail to deal with an issue where we were trying to update emails in the mailbox, calling IncomingEmail.where(imap_sync: true). However UID and UIDVALIDITY could be the same across accounts. So if group A used IMAP details for Gmail account A, and group B used IMAP details for Gmail account B, and both tried to sync changes to an email with UID of 3 (e.g. changing Labels), one account could affect the other. This even applied to Archiving!

Also in this PR:

* Fix error occurring if we do a uid_fetch and no emails are returned
* Allow for creating labels within the target mailbox (previously we would not do this, only use existing labels)
* Improve consistency for log messages
* Add specs for generic IMAP provider (Gmail specs still to come)
* Add custom archiving support for Gmail
* Only use Message-ID for uniqueness of IncomingEmail if it was generated by us
* Various refactors and improvements
2020-08-03 13:10:17 +10:00
Dan Ungureanu
c72bc27888
FEATURE: Implement support for IMAP and SMTP email protocols. (#8301)
Co-authored-by: Joffrey JAFFEUX <j.jaffeux@gmail.com>
2020-07-10 12:05:55 +03:00
David Taylor
ed6b3b82bd
FIX: Reopen sidekiq log files after rotation (#9429)
Unicorn uses the USR1 to indicate that log files should be reopened. This commit implements the same functionality for our forked sidekiq workers:

- USR1 is intercepted in the unicorn master, and re-issued to all child processes
- USR1 is trapped in the sidekiq processes, and `Unicorn::Util.reopen_logs` is used to re-open log files
2020-04-16 12:13:13 +01:00
Sam Saffron
28292d2759
PERF: avoid shelling to get hostname aggressively
Previously we had many places in the app that called `hostname` to get
hostname of a server. This commit replaces the pattern in 2 ways

1. We cache the result in `Discourse.os_hostname` so it is only ever called once

2. We prefer to use Socket.gethostname which avoids making a shell command

This improves performance as we are not spawning hostname processes throughout
the app lifetime
2020-02-18 15:13:19 +11:00
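The caching pattern from 28292d2759 can be sketched like this. The commit stores the value in `Discourse.os_hostname`; the module `HostInfo` below is a hypothetical stand-in so the example runs on its own.

```ruby
require "socket"

# Hypothetical module for illustration.
module HostInfo
  # Memoized: Socket.gethostname is called at most once per process, and no
  # `hostname` subprocess is spawned.
  def self.os_hostname
    @os_hostname ||= Socket.gethostname
  end
end

puts HostInfo.os_hostname
```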
Sam Saffron
f5396e2700 DEV: Sidekiq::Logging is gone use Sidekiq.logger instead
This 6.0 upgrade of sidekiq moved this around.
2019-12-10 15:09:51 +11:00
Krzysztof Kotlarek
35b1185a08 FIX: Revert Demon::DemonBase back to Demon::Base (#8132)
I introduced DemonBase because of a conflict between `demon/base.rb` and `jobs/base.rb`. However, the base class does not need to be renamed: a Zeitwerk custom inflector can match on the absolute path with a regex.
2019-10-02 14:54:08 +10:00
Krzysztof Kotlarek
427d54b2b0 DEV: Upgrading Discourse to Zeitwerk (#8098)
Zeitwerk simplifies working with dependencies in dev and makes reloading class chains easier.

We no longer need to use Rails "require_dependency" anywhere and instead can just use standard
Ruby patterns to require files.

This is a far reaching change and we expect some followups here.
2019-10-02 14:01:53 +10:00
David Taylor
e2449f9f23 Revert "Revert "Revert "FIX: Heartbeat check per sidekiq process (#7873)"""
This reverts commit c3497559be.
2019-08-30 11:26:16 +01:00
Sam Saffron
c3497559be Revert "Revert "FIX: Heartbeat check per sidekiq process (#7873)""
This reverts commit e805d44965.
We now have mechanisms in place to ensure heartbeat will always
be scheduled even if the scheduler is overloaded per: 098f938b
2019-08-30 10:12:10 +10:00
OsamaSayegh
e805d44965 Revert "FIX: Heartbeat check per sidekiq process (#7873)"
This reverts commit 340855da55.
2019-08-27 11:56:23 +00:00
Osama Sayegh
340855da55
FIX: Heartbeat check per sidekiq process (#7873)
* FIX: Heartbeat check per sidekiq process

* Rename method

* Remove heartbeat queues of previous bootups

* Regis feedback

* Refactor before_start

* Update lib/demon/sidekiq.rb

Co-Authored-By: Régis Hanol <regis@hanol.fr>

* Update lib/demon/sidekiq.rb

Co-Authored-By: Régis Hanol <regis@hanol.fr>

* Expire redis keys after 3600 seconds

* Don't use redis to store the list of queues
2019-08-26 09:33:49 +03:00
Sam Saffron
30990006a9 DEV: enable frozen string literal on all files
This reduces chances of errors where consumers of strings mutate inputs
and reduces memory usage of the app.

Test suite passes now, but there may be some stuff left, so we will run
a few sites on a branch prior to merging
2019-05-13 09:31:32 +08:00
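To make the effect of 30990006a9 concrete, a small sketch of what frozen string literals change for calling code. The explicit `.freeze` mimics what the magic comment does to every literal, so the example behaves the same whether or not the comment is in effect.

```ruby
# frozen_string_literal: true

greeting = "hello".freeze # what every string literal becomes under the magic comment

begin
  greeting << " world" # mutating a frozen string raises
rescue FrozenError => e
  puts "cannot mutate: #{e.class}"
end

# When a mutable buffer is really needed, ask for one explicitly.
buffer = +"hello" # unary plus returns an unfrozen string
buffer << " world"
puts buffer
```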
Sam
384135845b FEATURE: introduce ultra_low priority queue
This commit introduces an ultra low priority queue for post rebakes. This
way rebakes can never interfere with regular sidekiq processing for cases
where we perform a large scale rebake.

Additionally it allows Post.rebake_old to be run with rate_limiter: false
to avoid triggering the limiter when rebaking. This is handy for cases
where you want to just force the full rebake and not wait for it to trickle
2019-01-17 14:53:19 +11:00
Sam
df460b4abd PERF: run sidekiq with nice 5
This ensures that unicorn master forks of sidekiq run with a lower priority
than the webs. It means that a busy sidekiq is less likely to impact web
performance
2019-01-09 09:29:14 +11:00
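A hedged sketch of how a forked child could lower its own priority, as df460b4abd describes. The helper name `fork_with_nice` is hypothetical, and the use of `Process.setpriority` is an assumption about the mechanism rather than a quote from the commit.

```ruby
# Hypothetical helper: forks a child that raises its niceness before doing
# any work, and returns the child's pid.
def fork_with_nice(niceness = 5)
  fork do
    current = Process.getpriority(Process::PRIO_PROCESS, 0)
    # Unprivileged processes may only increase niceness, so never go below
    # the value inherited from the parent.
    Process.setpriority(Process::PRIO_PROCESS, 0, [current, niceness].max)
    yield
    exit!(0) # skip at_exit handlers inherited from the parent
  end
end

reader, writer = IO.pipe
pid = fork_with_nice(5) do
  writer.write(Process.getpriority(Process::PRIO_PROCESS, 0).to_s)
  writer.close
end
writer.close
puts "child niceness: #{reader.read}"
reader.close
Process.wait(pid)
```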
Sam
cc3fc87dd7 DEV: handle termination cleanly in autospec 2018-06-19 16:13:36 +10:00
Sam
fc05164667 demo script for demonizing using fork exec
minor refinements to demon
2018-01-11 13:51:52 +11:00
Guo Xiang Tan
470b1a5bc1 Don't print Sidekiq starting message to STDERR. 2017-11-03 21:02:31 +08:00
Guo Xiang Tan
5012d46cbd Add rubocop to our build. (#5004) 2017-07-28 10:20:09 +09:00
Guo Xiang Tan
7ea288140d Allow multiple host when restricting Sidekiq queues. 2017-06-19 14:45:51 +09:00
Guo Xiang Tan
84490c4558 Allow a sidekiq queue to be configured to only run on a certain hostname. 2017-04-27 15:32:16 +08:00
Sam
0b3aec9c94 FEATURE: set UNICORN_STATS_SOCKET_DIR for status socket
eg:

sam@ubuntu stats_sockets % socat - UNIX-CONNECT:9622.sock
gc_stat
{"count":46,"heap_allocated_pages":2459,"heap_sorted_length":2460,"heap_allocatable_pages":0,"heap_available_slots":1002267,"heap_live_slots":647293,"heap_free_slots":354974,"heap_final_slots":0,"heap_marked_slots":503494,"heap_swept_slots":498773,"heap_eden_pages":2459,"heap_tomb_pages":0,"total_allocated_pages":2459,"total_freed_pages":0,"total_allocated_objects":4337014,"total_freed_objects":3689721,"malloc_increase_bytes":6448248,"malloc_increase_bytes_limit":29188387,"minor_gc_count":36,"major_gc_count":10,"remembered_wb_unprotected_objects":19958,"remembered_wb_unprotected_objects_limit":39842,"old_objects":462019,"old_objects_limit":895782,"oldmalloc_increase_bytes":6448696,"oldmalloc_increase_bytes_limit":19350882}
2017-04-21 11:37:03 -04:00
Sam
8ec7fd84fd FEATURE: prioritize sidekiq jobs
This commit introduces 3 queues for sidekiq

"critical" for urgent jobs (weighted at 4x weight)
"default" for standard jobs (weighted at 2x weight)
"low" for less important jobs


"critical jobs"

Reset Password emails have been separated into their own job
Heartbeat which is required to keep sidekiq running
Test email which needs to return real quick


"low priority jobs"

Notify mailing list
Pull hotlinked images
Update gravatar

"default"

All the rest

Note: for people running sidekiq from command line use

bin/sidekiq -q critical,4 -q default,2 -q low
2016-04-07 12:56:43 +10:00
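To give a feel for what the weights in 8ec7fd84fd mean, here is a simplified illustration. It is not Sidekiq's internal code: the constant, the helper `weighted_polling_order`, and the shuffle-based approach are assumptions made for the sketch, with `low` treated as weight 1 because no weight is given on the command line.

```ruby
# Hypothetical weights mirroring `-q critical,4 -q default,2 -q low`.
QUEUE_WEIGHTS = { "critical" => 4, "default" => 2, "low" => 1 }.freeze

# Builds a list in which each queue appears as many times as its weight and
# shuffles it, so heavier queues tend to be checked earlier more often.
def weighted_polling_order(weights, random: Random.new)
  weights.flat_map { |queue, weight| [queue] * weight }.shuffle(random: random)
end

order = weighted_polling_order(QUEUE_WEIGHTS)
puts order.inspect
puts "first queue to check: #{order.first}"
```

With these weights, "critical" fills four of the seven slots, so it comes first in roughly four out of seven shuffles.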
Sam
65edbb609c Revert "Revert message bus upgrade"
This reverts commit 47e718f5b2.
2015-12-09 11:48:41 +11:00
Sam
47e718f5b2 Revert message bus upgrade 2015-12-09 11:45:11 +11:00
Sam
2cc95af69b Revert "REVERT: message bus changes"
This reverts commit 4820d5c7b0.
2015-12-09 07:36:36 +11:00
Robin Ward
4820d5c7b0 REVERT: message bus changes 2015-12-08 15:32:31 -05:00
Sam
a3ba564b03 missing spot where initializer was renamed 2015-12-08 07:13:29 +11:00
Sam
0e4883a8ae correct regression where monitoring thread crashed out
add logging
2015-06-16 11:16:33 +10:00
Sam
861cd5d9b0 FIX: ensure child demon is correctly terminated from parent on stop 2015-06-15 12:36:47 +10:00
Sam
1721872084 cleanup out-of-memory detection and correction code 2015-03-27 15:44:52 +11:00
Sam
58f3fcbc1a BUGFIX: not terminating self correctly on hangups from parent 2014-06-13 11:15:40 +10:00