discourse

mirror of https://github.com/discourse/discourse.git synced 2024-11-26 01:33:45 +08:00

Author	SHA1	Message	Date
Michael Fitz-Payne	df4a9f96ae	DEV(cache_critical_dns): add additional service runtime variable We'd like to lean on the DNS caching service for more than the standard DB and Redis hosts, but without having to add additional code each time. Define a new environment variable DISCOURSE_DNS_CACHE_ADDITIONAL_SERVICE_NAMES (admittedly a mouthful) which is a list of service names to be added to the static list at process execution time. For example, plugin foo may reference two services that you want to cache the address of. By specifying the following two variables in the process environment, cache_critical_dns will perform the lookup alongside the DB and Redis host variables. ``` DISCOURSE_DNS_CACHE_ADDITIONAL_SERVICE_NAMES='FOO_SERVICE1,FOO_SERVICE2' FOO_SERVICE1='foo.service1.example.com' FOO_SERVICE1_SRV='foo._tcp.example.com' FOO_SERVICE2='foo.service2.example.com' ``` The behaviour when it comes to SRV record lookup is the same as previously implemented for the `DISCOURSE_DB_..` and `DISCOURSE_REDIS_..` variables. For the purposes of the health checks, services defined in the list _are always considered healthy_. This is a compromise for convenience's sake. Defining a dynamic method for health checks at runtime is not practical. See t/88457/32.	2023-01-20 10:03:08 +10:00
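The expansion described in df4a9f96ae can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the helper names `additional_service_vars` and `hosts_to_cache` are hypothetical, and a plain Hash stands in for `ENV` so the example runs anywhere.

```ruby
# Hypothetical helpers illustrating the behaviour described in the commit
# message; the real script's method names and structure may differ.

# Split the comma-separated list of extra service variable names.
def additional_service_vars(env)
  env.fetch("DISCOURSE_DNS_CACHE_ADDITIONAL_SERVICE_NAMES", "")
     .split(",")
     .map(&:strip)
     .reject(&:empty?)
end

# Merge the static variable names with the extra ones and look up each
# hostname, plus its optional "<NAME>_SRV" companion variable.
def hosts_to_cache(env, static_vars)
  (static_vars + additional_service_vars(env)).uniq.filter_map do |var|
    host = env[var]
    next if host.nil? || host.empty?

    { var: var, host: host, srv: env["#{var}_SRV"] }
  end
end

# Example environment taken from the commit message.
env = {
  "DISCOURSE_DNS_CACHE_ADDITIONAL_SERVICE_NAMES" => "FOO_SERVICE1,FOO_SERVICE2",
  "FOO_SERVICE1" => "foo.service1.example.com",
  "FOO_SERVICE1_SRV" => "foo._tcp.example.com",
  "FOO_SERVICE2" => "foo.service2.example.com",
}

hosts_to_cache(env, ["DISCOURSE_DB_HOST"]).each do |entry|
  puts "#{entry[:var]} -> #{entry[:host]} (SRV: #{entry[:srv] || 'none'})"
end
```

Variables that are named in the list but unset in the environment are skipped, which matches the usual treatment of an undefined host variable as a no-op.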
Michael Fitz-Payne	5fdbbe3045	DEV(cache_critical_dns): add caching for MessageBus Redis hostname We are already caching any DB_HOST and REDIS_HOST (and their accompanying replicas), we should also cache the resolved addresses for the MessageBus specific Redis. This is a noop if no MB redis is defined in config. A side effect is that the MB will also support SRV lookup and priorities, following the same convention as the other cached services. The port argument was added to redis_healthcheck so that the script supports a setup where Redis is running on a non-default port. Did some minor refactoring to improve readability when filtering out the CRITICAL_HOST_ENV_VARS. The `select` block was a bit confusing, so the sequence was made easier to follow. We were coercing an environment variable to an int in a few places, so the `env_as_int` method was introduced to do that coercion in one place and for convenience purposes default to a value if provided. See /t/68301/30.	2022-10-12 10:11:22 +10:00
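A helper like the `env_as_int` mentioned in 5fdbbe3045 might look something like the sketch below. The signature is an assumption (the commit message only says it coerces to an int and can fall back to a default); the `env:` keyword is added here purely so the example can be exercised without touching the real process environment.

```ruby
# Assumed shape of an env_as_int-style helper; not copied from the script.
# Returns the integer value of the variable, or `default` when the variable
# is unset, blank, or not a valid base-10 integer.
def env_as_int(name, default = nil, env: ENV)
  value = env[name]
  return default if value.nil? || value.strip.empty?

  Integer(value.strip, 10)
rescue ArgumentError
  default
end

# Example: a Redis port that falls back to a caller-supplied default.
port = env_as_int("DISCOURSE_REDIS_PORT", 6379, env: { "DISCOURSE_REDIS_PORT" => "6380" })
puts port
```

Using `Integer()` rather than `String#to_i` means a blank or malformed value falls back to the default instead of silently becoming 0.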
Michael Fitz-Payne	1867202a4d	DEV(cache_critical_dns): add option to run once and exit There are situations where a container running Discourse may want to cache the critical DNS services without running the cache_critical_dns service, for example running migrations prior to running a full bore application container. Add a `--once` argument for the cache_critical_dns script that will only execute the main loop once, and return the status code for the script to use when exiting. 0 indicates no errors occurred during SRV resolution, and 1 indicates a failure during the SRV lookup. Nothing is reported to Prometheus in run_once mode. Generally this mode of operation would be a part of a Unix pipeline, in which the exit status is a more meaningful and immediate signal than a Prometheus metric. The reporting has been moved into its own method that can be called only when the script is running as a service. See /t/69597.	2022-07-06 14:53:02 +10:00
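The control flow that 1867202a4d describes could be sketched along these lines. The `main` method, its `cycle:` and `report:` callables, and the 30-second default are all hypothetical stand-ins for whatever the script really uses; only the `--once` flag and the 0/1 exit statuses come from the commit message.

```ruby
# Hypothetical sketch of "run once and exit" versus service mode.
# `cycle` performs one resolve-and-cache pass and returns true on success;
# `report` stands in for the metrics-reporting method.
def main(argv, cycle:, report:, refresh_seconds: 30)
  if argv.include?("--once")
    # Single pass, nothing reported: the exit status is the signal.
    return cycle.call ? 0 : 1
  end

  # Service mode: repeat forever, reporting after every pass.
  loop do
    report.call(cycle.call)
    sleep refresh_seconds
  end
end

# In a real script the result would typically be passed to `exit`, e.g.
#   exit main(ARGV, cycle: ..., report: ...)
status = main(["--once"], cycle: -> { true }, report: ->(_ok) {})
puts "exit status: #{status}"
```

Returning the status from `main` rather than calling `exit` inside it keeps the once-only path easy to test.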
Michael Fitz-Payne	aabbc9e63e	DOC(cache_critical_dns): add program description Describes the behaviour and configuration of the cache_critical_dns script, mainly cribbed from commit messages. Tries to make this program a bit less of an enigma.	2022-05-26 14:26:57 +10:00
Michael Fitz-Payne	0553788d3b	DEV(cache_critical_dns): improve postgres_healthcheck The `PG::Connection#ping` method is only reliable for checking if the given host is accepting connections, and not if the authentication details are valid. This extends the healthcheck to confirm that the auth details are able to both create a connection and execute queries against the database. We expect the empty query to return an empty result set, so we can assert on that. If a failure occurs for any reason, the healthcheck will return false.	2022-05-24 08:20:10 +10:00
Gabe Pacuilla	4284ba9c27	FIX(cache_critical_dns): use correct DISCOURSE_DB_USERNAME envvar (#16862)	2022-05-18 13:01:18 -04:00
Gabe Pacuilla	9f246e6969	FIX(cache_critical_dns): use discourse database name and user by default (#16856)	2022-05-17 16:09:32 -04:00
Michael Fitz-Payne	35d5c29e10	DEV(cache_critical_dns): add SRV priority tunables An SRV RR contains a priority value for each of the SRV targets that are present, ranging from 0 - 65535. When caching SRV records we may want to filter out any targets above or below a particular threshold. This change adds support for specifying a lower and/or upper bound on target priorities for any SRV RRs. Any targets returned when resolving the SRV RR whose priority does not fall between the lower and upper thresholds are ignored. For example: Let's say we are running two Redis servers, a primary and cold server as a backup (but not a replica). Both servers would pass health checks, but clearly the primary should be preferred over the backup server. In this case, we could configure our SRV RR with the primary target as priority 1 and backup target as priority 10. The `DISCOURSE_REDIS_HOST_SRV_LE` could then be set to 1 and the target with priority 10 would be ignored. See /t/66045.	2022-05-12 08:08:56 +10:00
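The priority filtering from 35d5c29e10 can be illustrated with a small sketch. The `SrvTarget` struct and `filter_by_priority` method are hypothetical, and the bounds are treated as inclusive, which is an assumption suggested by the `_LE` suffix of `DISCOURSE_REDIS_HOST_SRV_LE`.

```ruby
# Hypothetical stand-in for a resolved SRV target; a real resolver such as
# Ruby's Resolv returns richer records with weight and port as well.
SrvTarget = Struct.new(:target, :priority)

# Keep only the targets whose priority lies within the (assumed inclusive)
# bounds. SRV priorities range from 0 to 65535.
def filter_by_priority(targets, lower: 0, upper: 65_535)
  targets.select { |t| t.priority.between?(lower, upper) }
end

# The example from the commit message: primary at priority 1, cold backup
# at priority 10, and an upper bound of 1.
targets = [
  SrvTarget.new("redis-primary.example.com", 1),
  SrvTarget.new("redis-backup.example.com", 10),
]

kept = filter_by_priority(targets, upper: 1)
puts kept.map(&:target).inspect # => ["redis-primary.example.com"]
```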
Michael Fitz-Payne	1acc4751ff	FIX: remove refresh seconds override on cache_critical_dns (#16572) This removes the option to override the sleep time between caching of DNS records. The override was invalid because `''.to_i` is 0 in Ruby, causing a tight loop calling the `run` method.	2022-04-27 12:42:35 +08:00
Michael Fitz-Payne	0784c28702	FIX: cache_critical_dns - add TLS support for Redis healthcheck For Redis connections that operate over TLS, we need to ensure that we are setting the correct arguments for the Redis client. We can utilise the existing environment variable `DISCOURSE_REDIS_USE_SSL` to toggle this behaviour. No SSL verification is performed for two reasons: - the Discourse application will perform a verification against any FQDN as specified for the Redis host - the healthcheck is run against the _resolved_ IP address for the Redis hostname, and any SSL verification will always fail against a direct IP address If no SSL arguments are provided, the IP address is never cached against the hostname as no healthy address is ever found in the HealthyCache.	2022-04-27 12:27:58 +10:00
Michael Fitz-Payne	c4ea439cc3	DEV: refactor cache_critical_dns for SRV RR awareness Modify the cache_critical_dns script for SRV RR awareness. The new behaviour is only enabled when one or more of the following environment variables are present (and only for a host where the `DISCOURSE__HOST_SRV` variable is present): - `DISCOURSE_DB_HOST_SRV` - `DISCOURSE_DB_REPLICA_HOST_SRV` - `DISCOURSE_REDIS_HOST_SRV` - `DISCOURSE_REDIS_REPLICA_HOST_SRV` Some minor changes in refactor to original script behaviour: - add Name and SRVName classes for storing resolved addresses for a hostname - pass DNS client into main run loop instead of creating inside the loop - ensure all times are UTC - add environment override for system hosts file path and time between DNS checks mainly for testing purposes The environment variable for `BUNDLE_GEMFILE` is set to enable Ruby to load gems that are installed and vendored via the project's Gemfile. This script is usually not run from the project directory as it is configured as a system service (see `71ba9fb7b5/templates/cache-dns.template.yml (L19)`) and therefore cannot load gems like `pg` or `redis` from the default load paths. Setting this environment variable configures bundler to look in the correct project directory during its setup phase. When a `DISCOURSE__HOST_SRV` environment variable is present, the decision for which target to cache is as follows: - resolve the SRV targets for the provided hostname - lookup the addresses for all of the resolved SRV targets via the A and AAAA RRs for the target's hostname - perform a protocol-aware healthcheck (PostgreSQL or Redis pings) - pick the newest target that passes the healthcheck From there, the resolved address for the SRV target is cached against the hostname as specified by the original form of the environment variable. 
For example: The hostname specified by the `DISCOURSE_DB_HOST` record is `database.example.com`, and the `DISCOURSE_DB_HOST_SRV` record is `database._postgresql._tcp.sd.example.com`. An SRV RR lookup will return zero or more targets. Each of the targets will be queried for A and AAAA RRs. For each of the addresses returned, the newest address that passes a protocol-aware healthcheck will be cached. This address is cached so that if any newer address for the SRV target appears we can perform a health check and prefer the newer address if the check passes. All resolved SRV targets are cached for a minimum of 30 minutes in memory so that we can prefer newer hosts over older hosts when more than one target is returned. Any host in the cache that hasn't been seen for more than 30 minutes is purged. See /t/61485.	2022-04-27 10:14:33 +10:00
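The selection and purging rules in c4ea439cc3 can be sketched as below. This is an illustration under stated assumptions, not the script's implementation: `CachedTarget`, `purge_stale`, and `newest_healthy` are hypothetical names, "newest" is taken to mean the most recent first-seen time, and the health check is passed in as a block rather than performing a real PostgreSQL or Redis ping.

```ruby
# Hypothetical record for a resolved address held in memory.
CachedTarget = Struct.new(:address, :first_seen, :last_seen)

# Entries not seen for more than 30 minutes are dropped.
MAX_AGE_SECONDS = 30 * 60

def purge_stale(targets, now: Time.now.utc)
  targets.reject { |t| now - t.last_seen > MAX_AGE_SECONDS }
end

# Walk the targets from newest to oldest and return the first one whose
# address passes the supplied health check (nil if none pass).
def newest_healthy(targets, &healthy)
  targets.sort_by(&:first_seen).reverse_each.find { |t| healthy.call(t.address) }
end

now = Time.utc(2022, 4, 27, 0, 14, 33)
cache = [
  CachedTarget.new("10.0.0.1", now - 3600, now - 10),   # old but still seen
  CachedTarget.new("10.0.0.2", now - 60,   now - 5),    # newest, but unhealthy below
  CachedTarget.new("10.0.0.3", now - 600,  now - 5),    # newer than .1, healthy
  CachedTarget.new("10.0.0.4", now - 7200, now - 1801), # unseen for over 30 minutes
]

live = purge_stale(cache, now: now)
# A stub health check that fails only for 10.0.0.2.
choice = newest_healthy(live) { |address| address != "10.0.0.2" }
puts choice.address
```

Keeping older entries in memory is what lets a newer address be preferred as soon as it passes a health check, while the previous address remains available as a fallback.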
David Taylor	9e43f0303d	DEV: Include DISCOURSE_REDIS_REPLICA_HOST in cache_critical_dns (#15877) This is the replacement for DISCOURSE_REDIS_SLAVE_HOST	2022-02-09 14:41:26 +00:00
Neil Lalonde	530058918e	DEV: Support env var for prometheus port in cache_critical_dns	2020-08-17 15:48:14 -04:00
Michael Brown	7200653e16	FIX: cache_critical_dns was erroring without IPAddr * sometimes cache_critical_dns would error out since "IPAddr" was undefined * sometimes it autoloaded, so no error	2019-12-27 12:39:08 -05:00
Michael Brown	7b1783bae8	FIX: cache_critical_dns was never caching pg replica (#7461) * it's DISCOURSE_DB_REPLICA_HOST not DISCOURSE_DB_BACKUP_HOST	2019-04-30 08:42:51 +08:00
Sam	6acabec423	FIX: script was missing newlines when generating hosts	2018-11-28 15:18:08 +11:00
Sam	6d9d904df5	add missing newline to end of file	2018-11-23 15:43:27 +11:00
Sam	d7b0f0069c	no need to double strip this line	2018-11-23 14:48:02 +11:00
Sam	4c6eeaac15	Followup on `0739c3b1d1` This corrects some minor style issues	2018-11-23 14:43:52 +11:00
Sam	0739c3b1d1	DEV: this introduces a script capable of caching critical DNS locally This is useful for cases where you want to add resiliency to DNS lookups for redis and postgres, so they will continue to work even if there is a DNS outage	2018-11-22 18:46:59 +11:00

20 Commits