Commit Graph

82 Commits

Author SHA1 Message Date
Alan Guo Xiang Tan
9812407f76
FIX: Redo Sidekiq monitoring to restart stuck sidekiq processes (#30198)
This commit reimplements how we monitor Sidekiq processes that are
forked from the Unicorn master process. Prior to this change, we rely on
`Jobs::Heartbeat` to enqueue a `Jobs::RunHeartbeat` job every 3 minutes.
The `Jobs::RunHeartbeat` job then sets a Redis key with a timestamp. In
the Unicorn master process, we then fetch the timestamp that has been set
by the job from Redis every 30 minutes. If the timestamp has not been
updated for more than 30 minutes, we restart the Sidekiq process. The
fundamental flaw with this approach is that it fails to consider
deployments with multiple hosts and multiple Sidekiq processes. A
sidekiq process on a host may be in a bad state but the heartbeat check
will not restart the process because the `Jobs::RunHeartbeat` job is
still being executed by the working Sidekiq processes on other hosts.

In order to properly ensure that stuck Sidekiq processs are restarted,
we now rely on the [Sidekiq::ProcessSet](https://github.com/sidekiq/sidekiq/wiki/API#processes)
API that is supported by Sidekiq. The API provides us with "near real-time (updated every 5 sec)
info about the current set of Sidekiq processes running". The API
provides useful information like the hostname, pid and also when Sidekiq
last did its own heartbeat check. With that information, we can easily
determine if a Sidekiq process needs to be restarted from the Unicorn
master process.
2024-12-18 12:48:50 +08:00
Alan Guo Xiang Tan
2f65f36cae
DEV: Fix duplicated logging of unicorn worker backtraces (#30178)
In 6cafe59c76, we added a monkey patch to
`Unicorn::HtppServer#murder_lazy_workers` to log a message and send a
`USR2` signal to the Unicorn worker process when they Unicorn worker
process is 2 seconds away from being timed out by the Unicorn master
process. However, we ended up loggging multiple messages and sending
multiple USR2 signal during the 2 seconds before the Unicorn worker
process hit the time out.

To overcome this problem, we will now set an instance variable on the
`Unicorn::Worker` instance and use it to ensure that the log message is
only logged once and USR2 signal to the Unicorn worker process is only
sent one as well.
2024-12-10 06:47:33 +08:00
Alan Guo Xiang Tan
c1f25cdf5b
FIX: Unicorn master and Sidekiq reopening logs at the same time (#29137)
In our production environment, we have been seeing Sidekiq processes
getting stuck randomly when a USR1 signal is sent to the Unicorn master
process. We have not been able to identify the root cause of why the
Sidekiq process gets stuck. We however noticed that when the Unicorn
master process receives a USR1 signal, it will reopen the logs for the
Unicorn master process first before sending a USR1 signal for the
Unicorn worker processes to reopen the logs. We figured that we should
do the same for the Sidekiq process as well when a USR1 signal.

In this commit, we introduce an arbitrary delay of 1 second before we
the Sidekiq process reopens its log files so as to allow enough time for the Unicorn
master to finish reopening it logs first.

We also do not send reopen logs for the Sidekiq process if the `DISCOURSE_LOG_SIDEKIQ`
env is not present because there is no need to reopen any logs.
2024-10-10 08:01:40 +08:00
Alan Guo Xiang Tan
10ff0ee0cc
FIX: Ensure we dispose of MiniRacer::Context before forking daemons (#28361)
This commit updates `Demon::Base#start` to call `Discourse.before_fork`
before forking. According to the docs in `mini_racer`, we need to
"Dispose manually of all MiniRacer::Context objects prior to forking".

This commit is motivated by a segmentation fault which we are seeing in
production when killing a daemon process. Backtrace of the core dump
includes traces of `mini_racer` so we think this is the cause. Note that
we are not 100% sure if this will fix the issue.
2024-08-14 12:45:34 +08:00
Alan Guo Xiang Tan
5105fce899
DEV: Recover @timestamp in unicorn logs when logstash logger is enabled (#28008)
This is a regression introduced in 28f5550886
2024-07-22 15:21:41 +08:00
Alan Guo Xiang Tan
28f5550886
DEV: Redo DiscourseLogstashLogger to not rely on logstash-logger (#27759)
This reverts commit 92d7d24d0f.
2024-07-08 14:03:11 +08:00
Alan Guo Xiang Tan
92d7d24d0f
Revert "DEV: Redo DiscourseLogstashLogger to not rely on logstash-logger (#27663)" (#27733)
This reverts commit 8e10878e1a.

Something is broken on a friday so reverting first before I pick this up
again next Monday.
2024-07-05 17:26:58 +08:00
Alan Guo Xiang Tan
8e10878e1a
DEV: Redo DiscourseLogstashLogger to not rely on logstash-logger (#27663)
This commit rewrites `DiscourseLogstashLogger` to not be an instance
of `LogstashLogger`. The reason we don't want it to be an instance of
`LogstashLogger` is because we want the new logger to be chained to
Logster's logger which can then pass down useful information like the
request's env and error backtraces which Logster has already gathered.

Note that this commit does not bother to maintain backwards
compatibility and drops the `LOGSTASH_URI` and `UNICORN_LOGSTASH_URI`
ENV variables which were previously used to configure the destination in
which `logstash-logger` would send the logs to. Instead, we introduce
the `ENABLE_LOGSTASH_LOGGER` ENV variable to replace both ENV and remove
the need for the log paths to be specified. Note that the previous
feature was considered experimental as stated in d888d3c54c
and the new feature should be considered experimental as well. The code
may be moved into a plugin in the future.
2024-07-05 09:41:52 +08:00
Alan Guo Xiang Tan
23c38cbf11
DEV: Log Unicorn worker timeout backtraces to Rails.logger (#27257)
This commit introduces the following changes:

1. Introduce the `SignalTrapLogger` singleton which starts a single
   thread that polls a queue to log messages with the specified logger.
   This thread is necessary becasue most loggers cannot be used inside
   the `Signal.trap` context as they rely on mutexes which are not
   allowed within the context.

2. Moves the monkey patch in `freedom_patches/unicorn_http_server_patch.rb` to
   `config/unicorn.config.rb` which is already monkey patching
   `Unicorn::HttpServer`.

3. `Unicorn::HttpServer` will now automatically send a `USR2` signal to
   a unicorn worker 2 seconds before the worker is timed out by the
   Unicorn master.

4. When a Unicorn worker receives a `USR2` signal, it will now log only
   the main thread's backtraces to `Rails.logger`. Previously, it was
   `put`ing the backtraces to `STDOUT` which most people wouldn't read.
   Logging it via `Rails.logger` will make the backtraces easily
   accessible via `/logs`.
2024-06-03 12:51:12 +08:00
Alan Guo Xiang Tan
e9c8e182d3
DEV: Use Unicorn logger to log Sidekiq signal handling events (#27239)
This commit updates all Sidekiq signal handling event logs to go through
Unicorn's logger instead of logging to STDOUT. Going through a proper logger
means the log messages are logged in the format which the logger has configured.
This means we get proper timestamp for the log messages.
2024-05-29 11:15:20 +08:00
Alan Guo Xiang Tan
47523fa57c
DEV: Use existing loggers for stuff we log in config/unicorn.conf.rb (#27237)
This commit updates various spots in `config/unicorn.conf.rb` which were
doing `STDERR.puts` to either use `server.logger` which is unicorn's
logger or `Rails.logger` which is Rails' logger. The reason we want to
do so is because `STDERR.puts` doesn't format the logs properly and is a
problem especially when custom loggers with structured formatting is
enabled.
2024-05-29 09:34:09 +08:00
Alan Guo Xiang Tan
6cafe59c76
DEV: Add DISCOURSE_DUMP_BACKTRACES_ON_UNICORN_WORKER_TIMEOUT env (#27199)
This commit adds a `DISCOURSE_DUMP_BACKTRACES_ON_UNICORN_WORKER_TIMEOUT`
environment that will allow us to dump all backtraces for all threads of
a Unicorn worker 2 seconds before it times out. In development,
backtraces are dumped to `STDOUT` and in production we will dump it to
`unicorn.stdout.log`.

We want to dump all the backtraces to make it easier to identify the
cause of a Unicorn worker timing out.
2024-05-27 12:20:38 +08:00
Jarek Radosz
694b5f108b
DEV: Fix various rubocop lints (#24749)
These (21 + 3 from previous PRs) are soon to be enabled in rubocop-discourse:

Capybara/VisibilityMatcher
Lint/DeprecatedOpenSSLConstant
Lint/DisjunctiveAssignmentInConstructor
Lint/EmptyConditionalBody
Lint/EmptyEnsure
Lint/LiteralInInterpolation
Lint/NonLocalExitFromIterator
Lint/ParenthesesAsGroupedExpression
Lint/RedundantCopDisableDirective
Lint/RedundantRequireStatement
Lint/RedundantSafeNavigation
Lint/RedundantStringCoercion
Lint/RedundantWithIndex
Lint/RedundantWithObject
Lint/SafeNavigationChain
Lint/SafeNavigationConsistency
Lint/SelfAssignment
Lint/UnreachableCode
Lint/UselessMethodDefinition
Lint/Void

Previous PRs:
Lint/ShadowedArgument
Lint/DuplicateMethods
Lint/BooleanSymbol
RSpec/SpecFilePathSuffix
2023-12-06 23:25:00 +01:00
Martin Brennan
553b4888ba
DEV: Revert syntax-tree line change in unicorn.conf.rb listen (#19874)
Some internal tooling expects this to be one line, see /t/90198/13
2023-01-16 13:17:23 +10:00
David Taylor
7c77cc6a58
DEV: Apply syntax_tree formatting to config/* 2023-01-09 11:13:29 +00:00
Jarek Radosz
5a50f18c0c
DEV: Avoid $ globals (#15453)
Also:
* Remove an unused method (#fill_email)
* Replace a method that was used just once (#generate_username) with `SecureRandom.alphanumeric`
* Remove an obsolete dev puma `tmp/restart` file logic
2022-01-08 23:39:46 +01:00
Peter Zhu
c5fd8c42db
DEV: Fix methods removed in Ruby 3.2 (#15459)
* File.exists? is deprecated and removed in Ruby 3.2 in favor of
File.exist?
* Dir.exists? is deprecated and removed in Ruby 3.2 in favor of
Dir.exist?
2022-01-05 18:45:08 +01:00
Alan Guo Xiang Tan
8775191cf3 Revert "DEV: suppress assets logs from qunit tests (#13871)"
This reverts commit 1b649016d9.

Test is failing with the following message:

```
navigate to  http://localhost:60099/qunit?hidepassed=1&seed=312984199721107128645754962579661839397&qunit_disable_auto_start=1
2021-07-29T02:57:13.303Z - (type: network/error) message: Failed to load resource: the server responded with a status of 403 (Forbidden), url: http://localhost:60099/extra-locales/admin?v=eeeea38a700966a1a7fb600b23f3a222
2021-07-29T02:57:43.823Z - (type: network/error) message: Failed to load resource: net::ERR_EMPTY_RESPONSE, url: http://localhost:60099/assets/discourse/tests/test_helper.js
ReferenceError: define is not defined
    at eval (discourse/tests/acceptance/about-test:1:1)
    at http://localhost:60099/assets/discourse/tests/core_plugins_tests.js:1:1
ReferenceError: require is not defined
    at eval (discourse/tests/test_starter:5:24)
    at http://localhost:60099/assets/discourse/tests/test_starter.js:1:1
2021-07-29T02:57:43.891Z - (type: network/error) message: Failed to load resource: the server responded with a status of 403 (Forbidden), url: http://localhost:60099/extra-locales/admin?v=eeeea38a700966a1a7fb600b23f3a222

Tests timed out
```
2021-07-29 13:28:24 +08:00
Joffrey JAFFEUX
1b649016d9
DEV: suppress assets logs from qunit tests (#13871)
Partially reverts: 956f849250

Note that this way of running tests will soon be deprecated in favor of `ember test` and this shouldn’t matter anymore.
2021-07-28 14:27:45 +02:00
Alan Guo Xiang Tan
cadf5eafe6 DEV: Move Discourse app specific concern out of unicorn conf. 2021-06-04 09:13:34 +08:00
David Taylor
cac7725e28
DEV: Improve bin/unicorn boot time in development environment (#12900)
This commit makes a few changes to improve boot time in development environments. It will have no effect on production boot times.

- Skip the SchemaCache warmup. In development mode, the SchemaCache is refreshed every time there is a code change, so warmup is of limited use.

- Skip warming up PrettyText. This adds ~2s to each web worker's boot time. The vast majority of requests do not use PrettyText, so it is more efficient to defer its warmup until it's needed

- Skip the intentional 1 second pause during Unicorn worker forking. The comment (which also exists in Unicorn's documentation) suggests this works around a Unix signal handling bug, but I haven't been able to locate any more information. Skipping it in dev will significantly speed up boot. If we start to see issues, we can revert this change.

On my machine, this improves `/bin/unicorn` boot time from >10s to ~4s
2021-04-30 11:32:13 +01:00
Jarek Radosz
956f849250 DEV: Enable unicorn logger in test environment 2021-02-11 15:24:15 +01:00
David Taylor
1d024f77a6
FEATURE: Allow plugins to register demon processes (#11493)
This allows plugins to call `register_demon_process` with a Class inheriting from Demon::Base. The unicorn master process will take care of spawning, monitoring and restarting the process. This API should be used with extreme caution, but it is significantly cleaner than spawning processes/threads in an `after_initialize` block.

This commit also cleans up the demon spawning logging so that it uses the same format as unicorn worker logging. It also switches to the block form of `fork` to ensure that Demons exit after running, rather than returning execution to where the fork took place.
2020-12-16 09:43:39 +00:00
Matt Palmer
1ad270965c
FEATURE: Allow the specification of an arbitrary unicorn listen address
Useful if you want to, say, have your unicorn listen on a Unix domain
socket, rather than a TCP port, or you want to be able to bind to a
single address other than 127.0.0.1.
2020-07-28 13:03:17 +10:00
Martin Brennan
a9905ef7e5
FIX: Add enable_email_sync_demon global variable and disable EmailSync demon by default (#10304)
Demon::EmailSync is used in conjunction with the SiteSetting.enable_imap to sync N IMAP mailboxes with specific groups. It is a process started in unicorn.conf, and it spawns N threads (one for each multisite connection) and for each database spans another N threads (one for each configured group).

We want this off by default so the process is not started when it does not need to be (e.g. development, test, certain hosting tiers)
2020-07-24 17:09:29 +10:00
Guo Xiang Tan
bb8f1ce8b1
DEV: Consolidate Unicorn error backtraces when logstash is enabled. 2020-07-21 15:35:41 +08:00
Dan Ungureanu
c72bc27888
FEATURE: Implement support for IMAP and SMTP email protocols. (#8301)
Co-authored-by: Joffrey JAFFEUX <j.jaffeux@gmail.com>
2020-07-10 12:05:55 +03:00
Guo Xiang Tan
b66f2187f1
DEV: Don't use logstash for unicorn if ENV is blank. 2020-06-11 15:58:18 +08:00
Guo Xiang Tan
e4cd4f7e0b
DEV: Avoid reaching for Redis#_client which is considered deprecated. 2020-06-02 11:46:55 +08:00
David Taylor
ed6b3b82bd
FIX: Reopen sidekiq log files after rotation (#9429)
Unicorn uses the USR1 to indicate that log files should be reopened. This commit implements the same functionality for our forked sidekiq workers:

- USR1 is intercepted in the unicorn master, and re-issued to all child processes
- USR1 is trapped in the sidekiq processes, and `Unicorn::Util.reopen_logs` is used to re-open log files
2020-04-16 12:13:13 +01:00
Jarek Radosz
85e03a7f68
DEV: Replace Time.new with Time.now (#9142)
(or `Time.zone.now`)
2020-03-09 17:37:49 +01:00
David Taylor
7b6bafc651 Revert "DEV: Bind to ipv6 loopback address in addition to ipv4 (#8544)"
This reverts commit 1002bf62e5.
2019-12-16 15:51:04 +00:00
David Taylor
1002bf62e5
DEV: Bind to ipv6 loopback address in addition to ipv4 (#8544) 2019-12-16 11:01:02 +00:00
Joffrey JAFFEUX
0d3d2c43a0
DEV: s/\$redis/Discourse\.redis (#8431)
This commit also adds a rubocop rule to prevent global variables.
2019-12-03 10:05:53 +01:00
Jeff Wong
64d51ab45e DEV: add UNICORN_BIND_ALL
listen on local interface by default.
2019-10-28 14:11:19 -07:00
Sam Saffron
8d5f47dded PREF: optimise preloading application
We preload to ensure as much memory as possible is reused from unicorn master
to various workers using copy-on-write (sidekiq, unicorn)

This migrates the preloading code into the Discourse module for easier
reuse and adds 3 notable preloading changes

1. We attempt to localize a string on each site, ensuring we warmup
the i18n

2. We preload all our templates (compiling .erb to class)

3. We warm-up our search tokenizer which uses cppjieba which is a large
memory consumer, this will only cause a warmup on CJK sites or sites with
the special site setting enabled.
2019-10-07 00:33:37 -04:00
Sam Saffron
e0a403edfc PERF: ensure we warm up schema cache in the entire multisite
This makes sure that all processes that fork off the master have a fully
operation schema cache.

In Rails 6, schema cache is now bolted to the connection pool. This change
ensures the cache on all pools is fully populated prior to forking.

The bolting of cache to connection pool does lead to some strange cases
where a connection can "steal" the cache from another connection, which
can cause stuff to possibly hang or deadlock. This change minimizes the risk
of this happening cause it is already primed.

We make a STRONG assumption that the schema is always the same on all sites
when we spin up a multisite cluster.
2019-09-16 17:38:13 +10:00
David Taylor
e2449f9f23 Revert "Revert "Revert "FIX: Heartbeat check per sidekiq process (#7873)"""
This reverts commit c3497559be.
2019-08-30 11:26:16 +01:00
Sam Saffron
c3497559be Revert "Revert "FIX: Heartbeat check per sidekiq process (#7873)""
This reverts commit e805d44965.
We now have mechanisms in place to ensure heartbeat will always
be scheduled even if the scheduler is overloaded per: 098f938b
2019-08-30 10:12:10 +10:00
OsamaSayegh
e805d44965 Revert "FIX: Heartbeat check per sidekiq process (#7873)"
This reverts commit 340855da55.
2019-08-27 11:56:23 +00:00
Osama Sayegh
340855da55
FIX: Heartbeat check per sidekiq process (#7873)
* FIX: Heartbeat check per sidekiq process

* Rename method

* Remove heartbeat queues of previous bootups

* Regis feedback

* Refactor before_start

* Update lib/demon/sidekiq.rb

Co-Authored-By: Régis Hanol <regis@hanol.fr>

* Update lib/demon/sidekiq.rb

Co-Authored-By: Régis Hanol <regis@hanol.fr>

* Expire redis keys after 3600 seconds

* Don't use redis to store the list of queues
2019-08-26 09:33:49 +03:00
Sam Saffron
76173dea87 DEV: ensure we never fork v8 contexts from unicorn
v8 forking is not supported and can lead to memory leaks.

This commit handles the most common case which is the unicorn master forking

There are still some cases related to backup where we fork, however those
forks are usually short lived so the memory leak is not severe, burning
the contexts in the master process could break sidekiq or web process that
do the actual forking
2019-05-16 09:50:34 +10:00
Sam Saffron
30990006a9 DEV: enable frozen string literal on all files
This reduces chances of errors where consumers of strings mutate inputs
and reduces memory usage of the app.

Test suite passes now, but there may be some stuff left, so we will run
a few sites on a branch prior to merging
2019-05-13 09:31:32 +08:00
Guo Xiang Tan
4774633dac DEV: Remove StatsSocket.
Removed in favor of https://github.com/discourse/discourse-prometheus.
2019-03-26 18:16:58 +08:00
Guo Xiang Tan
edbcc992d4 Allow unicorn timeout to be configurable via ENV. 2018-09-04 13:21:41 +08:00
Sam
1172e141cd adjust timeouts in dev 2018-08-15 11:13:43 +10:00
Angus McLeod
6c41b54b2e FIX: create tmp if it doesn't exist when creating tmp/pids
I get this error if I stop a dev server, ``rm -rf tmp`` and start it again:
```
`mkdir': No such file or directory @ dir_s_mkdir - /Users/angusmcleod/discourse/discourse/tmp/pids (Errno::ENOENT)
```
This fixes it.

See: f3549291a3 (diff-26ac62db6c6a4582de3bbf2615790c23R22)
2018-08-08 14:49:09 +10:00
Sam
f3549291a3 DEV: use unicorn in development
This commit also cleans up a bunch of pointless noise each time we boot app

- narrative was loading i18n cause redefinition of consts
- discourse.rb was loaded twice as was auth
- bin/unicorn now does all the smart things and boots unicron in dev
- bin/rails s will boot unicorn with no params
- remove bin/puma which only causes confusion
2018-08-07 17:13:47 +10:00
Guo Xiang Tan
805fd17b23 ActiveRecord in Rails 5.2 discards connection pools after fork. 2018-06-12 09:30:52 +08:00
Guo Xiang Tan
01f126e38f Simplify unicorn config. 2018-06-07 09:03:16 +08:00