Notable changes:
* Imports a lot more tables from core and plugins
* site settings
* uploads with necessary upload references
* groups and group members
* user profiles
* user options
* user fields & values
* muted users
* user notes (plugin)
* user followers (plugin)
* user avatars
* tag groups and tags
* tag users (notification settings for tags / user)
* category permissions
* polls with options and votes
* post votes (plugin)
* solutions (plugin)
* gamification scores (plugin)
* events (plugin)
* badges and badge groupings
* user badges
* optimized images
* topic users (notification settings for topics)
* post custom fields
* permalinks and permalink normalizations
* It creates the `migration_mappings` table which is used to store the mapping for a handful of imported tables
* Detects duplicate group names and renames them
* Pre-cooking for attachments, images and mentions
* Outputs instructions when gems are missing
* Supports importing uploads from a DB generated by `uploads_importer.rb`
* Checks that all required plugins exists and enables them if needed
* A couple of optimizations and additions in `import.rake`
This script preprocesses all uploads within a intermediate DB (output of converters) and uploads those files to S3. It does the same for optimized images. This speeds up migrations when you have to run them multiple times, because you only have to preprocess and upload the files once.
This script is very hacky and mostly undocumented for now. That will change in the future.
This commit introduces the scaffolding for us to easily switch between Ember 3.28 and Ember 5 on the `main` branch of Discourse. Unfortunately, there is no built-in system to apply this kind of flagging within yarn / ember-cli. There are projects like `ember-try` which are designed for running against multiple version of a dependency, but they do not allow us to 'lock' dependency/sub-dependency versions, and are therefore unsuitable for our use in production.
Instead, we will be maintaining two root `package.json` files, and two `yarn.lock` files. For ember-3, they remain as-is. For ember5, we use a yarn 'resolution' to override the version for ember-source across the entire yarn workspace.
To allow for easy switching with minimal diff against the repository, `package.json` and `yarn.lock` are symlinks which point to `package-ember3.json` and `yarn-ember3.lock` by default. To switch to Ember 5, we can run `script/switch ember version 5` to update the symlinks to point to `package-ember5.json` and `package-ember3.json` respectively. In production, and when using `bin/ember-cli` for development, the ember version can also be upgraded using the `EMBER_VERSION=5` environment variable.
When making changes to dependencies, these should be made against the default `ember3` versions, and then `script/regen_ember_5_lockfile` should be used to regenerate `yarn-ember5.lock` accordingly. A new 'Ember Version Lockfiles' GitHub workflow will automate this process on Dependabot PRs.
When running a local environment against Ember 5, the two symlink changes will show up as git diffs. To avoid us accidentally committing/pushing that change, another GitHub workflow is introduced which checks the default Ember version and raises an error if it is greater than v3.
Supporting two ember versions simultaneously obviously carries significant overhead, so our aim will be to get themes/plugins updated as quickly as possible, and then drop this flag.
The priority field in an SRV RR indicates a preferential order at which
the underlying targets should be utilised. We need to prefer healthy
services in order of priority, where 0 is highest.
Prior to this commit, we relied on whatever order the
dnsclient.getresources method returned. As it turns out, this assumption
is incorrect. The order returned is likely whatever order the system
resolver received DNS responses in, which may not be ordered according
to the spec.
This introduces a ResolvedAddress type which holds the priority value
for SRV targets, or a stand-in priority of zero for A/AAAA RRs. This
type is used as a return value from the underlying name resolution
routines in Name and SRVName.
In this manner, all ordering by priority and resolved time can be
performed directly within the ResolverCache class and calling code can
continue to be none-the-wiser.
Before sorting, we still ensure that we only consider targets with a
priority within the given threshold as previously implemented.
See t/115911.
These updates significantly improve IDE tooling for imports across the Discourse core codebase, and also for framework packages. The `@types/ember-*` packages are a temporary solution until we get onto Ember 5, which ships its types in the main package.
The previous approach of having jsconfig files in each package directory did work, but once you start adding all the possible interlinks between them, we hit the file count limit of VSCode's tooling (because it counts every file for every jsconfig its referenced in). Having one file at the root means that a single file can apply to all core packages and plugins.
Long-term, to get the same functionality for all themes/plugins, we may need to look at building/publishing a Discourse types package which can be added to theme/plugin package.json files for development purposes.
Discourse stopped using PostUpload in 9db8f00b3d. Since then, these importers have been writing to the table, but any data was totally unused. This commit updates the easy cases to use UploadReference, and adds an error to the discourse_merger import script, which needs more significant work.
Discourse core now builds and runs with Embroider! This commit adds
the Embroider-based build pipeline (`USE_EMBROIDER=1`) and start
testing it on CI.
The new pipeline uses Embroider's compat mode + webpack bundler to
build discourse code, and leave everything else (admin, wizard,
markdown-it, plugins, etc) exactly the same using the existing
Broccoli-based build as external bundles (<script> tags), passed
to the build as `extraPublicTress` (which just means they get
placed in the `/public` folder).
At runtime, these "external" bundles are glued back together with
`loader.js`. Specifically, the external bundles are compiled as
AMD modules (just as they were before) and registered with the
global `loader.js` instance. They expect their `import`s (outside
of whatever is included in the bundle) to be already available in
the `loader.js` runtime registry.
In the classic build, _every_ module gets compiled into AMD and
gets added to the `loader.js` runtime registry. In Embroider,
the goal is to do this as little as possible, to give the bundler
more flexibility to optimize modules, or omit them entirely if it
is confident that the module is unused (i.e. tree-shaking).
Even in the most compatible mode, there are cases where Embroider
is confident enough to omit modules in the runtime `loader.js`
registry (notably, "auto-imported" non-addon NPM packages). So we
have to be mindful of that an manage those dependencies ourselves,
as seen in #22703.
In the longer term, we will look into using modern features (such
as `import()`) to express these inter-dependencies.
This will only be behind a flag for a short period of time while we
perform some final testing. Within the next few weeks, we intend
to enable by default and remove the flag.
---------
Co-authored-by: David Taylor <david@taylorhq.com>
## What is the context here?
The `docker.rake` Rakefile contains Rake tasks that are meant to be run
in the `discourse/discourse_test:release` Docker image. For example, we
have the `docker:test` Rake task that makes it easier to run the test
suite for a particular Discourse commit.
Why are we introducing a `docker:test:setup` Rake task?
While we have the `docker:test` Rake task, it is very limited in the
test commands that can be executed. It is very useful for automated
testing but not very useful for running tests in the development
environment. Therefore, we are introducing a `docker:test:setup` rake
task that can be used to set up the test environment for running tests.
The envisioned example usage is something like this:
```
docker run -d --name=discourse_test --entrypoint=/sbin/boot discourse/discourse_test:release
docker exec -u discourse:discourse discourse_test ruby script/docker_test.rb --no-tests
docker exec -u discourse:discourse discourse_test bundle exec rake docker:test:setup
docker exec -u discourse:discourse discourse_test bundle exec rspec <path to file>
```
This commit adds some system specs to test uploads with
direct to S3 single and multipart uploads via uppy. This
is done with minio as a local S3 replacement. We are doing
this to catch regressions when uppy dependencies need to
be upgraded or we change uppy upload code, since before
this there was no way to know outside manual testing whether
these changes would cause regressions.
Minio's server lifecycle and the installed binaries are managed
by the https://github.com/discourse/minio_runner gem, though the
binaries are already installed on the discourse_test image we run
GitHub CI from.
These tests will only run in CI unless you specifically use the
CI=1 or RUN_S3_SYSTEM_SPECS=1 env vars.
For a history of experimentation here see https://github.com/discourse/discourse/pull/22381
Related PRs:
* https://github.com/discourse/minio_runner/pull/1
* https://github.com/discourse/minio_runner/pull/2
* https://github.com/discourse/minio_runner/pull/3
It's very simple import script and currently imports only the following content:
* Users
* Messages as Discourse topics/posts
* Attachments
Each channel can be mapped to a category and tags. It uses regular expressions to convert formatted messages ("rich text") into Markdown used by Discourse. In the future we could convert the `blocks` attribute from each message into Markdown instead of applying regular expressions on the `text` attribute.
Various migration scripts define a normalize_raw method to do custom processing of post contents before storing it in the Post.raw and other fields.
They normally do not handle nil inputs, but it's a relatively common occurrence in data dumps.
Since this method is used from various points in the migration script, as it stands, the experience of using a migration script is that it will fail multiple times at different points, forcing you to fix the data or apply logic hacks every time then restarting.
This PR generalizes handling of nil input by returning a <missing> string.
Pros:
no more messy repeated crashes + restarts
consistency
Cons:
it might hide data issues
OTOH we can't print a warning on that method because it will flood the console since it's called from inside loops.
* FIX: zendesk import script: support nil inputs in normalize_raw
* FIX: return '<missing>' instead of empty string; do it for all methods
Before `pg` gem version 1.4.6 was loading `date` as dependency.
Looks like version 1.5.1 is not doing that anymore. Update was done here: d32709a74f
Therefore, we have to load `date` explicitly.
Previously, FETCH_HEAD would always point to tests-passed because our base docker image was configured to only fetch the tests-passed branch. Since https://github.com/discourse/discourse_docker/commit/53bbacc882, we switched to a partial clone which means that `git fetch; git checkout FETCH_HEAD` will checkout whichever remote branch is the first alphabetically. This commit makes the checkout more specific to avoid this issue.
Fixes an assumption that the PostgreSQL port will always be reachable at
the discovered host on the default port when performing the healthcheck.
Instead we should be sourcing this from the same environment variable
that the application will be using to connect to.
Defaults to the standard PG port, 5432.
There are two failure modes that can be expected - no target SRV DNS RRs
found or no healthy service available at target addresses. Prior to this
patch, there was no way to differentiate from log messages between the
two cases.
Introduce an EmptyCache exception which may be raised by either the
ResolverCache or HealthyCache. The exception message contains enough
information about where the exception occurred to troubleshoot issues.
An existing bug was fixed in this commit. Previously if a target address
changed during runtime, an old cached (healthy) address would be
returned.. The behaviour has been corrected to return the newly cached
address.
This commit implements many changes to topic and comments embedding. It
deprecates the class_name field from EmbeddableHost and suggests using
the className parameter. discourse_username parameter has been
deprecated and it will fetch it from embedded site from the author or
discourse-username meta.
See the updated code sample from Admin > Customize > Embedding page.
* FEATURE: Add className parameter for Discourse embed
* DEV: Hide class_name from EmbeddableHost
* DEV: Deprecate class_name field of EmbeddableHost
* FEATURE: Use either author or discourse-username meta tag
* DEV: Deprecate discourse_username parameter
* DEV: Improve embed code sample
* FIX: Use pluralized string
* REFACTOR: Fix misuse of pluralized string
* REFACTOR: Fix misuse of pluralized string
* DEV: Remove linting of `one` key in MessageFormat string, it doesn't work
* REFACTOR: Fix misuse of pluralized string
This also ensures that the URL works on subfolder and shows the site setting link only for admins instead of staff. The string is quite complicated, so the best option was to switch to MessageFormat.
* REFACTOR: Fix misuse of pluralized string
* FIX: Use pluralized string
This also ensures that the URL works on subfolder and shows the site setting link only for admins instead of staff.
* REFACTOR: Correctly pluralize reaction tooltips in chat
This also ensures that maximum 5 usernames are shown and fixes the number of "others" which was off by 1 if the current user reacted on a message.
* REFACTOR: Use translatable string as comma separator
* DEV: Add comment to translation to clarify the meaning of `%{identifier}`
* REFACTOR: Use translatable comma separator and use explicit interpolation keys
* REFACTOR: Don't interpolate lowercase channel status
* REFACTOR: Fix misuse of pluralized string
* REFACTOR: Don't interpolate channel status
* REFACTOR: Use %{count} interpolation key
* REFACTOR: Fix misuse of pluralized string
* REFACTOR: Correctly pluralize DM chat channel titles
The #pluck_first freedom patch, first introduced by @danielwaterworth has served us well, and is used widely throughout both core and plugins. It seems to have been a common enough use case that Rails 6 introduced it's own method #pick with the exact same implementation. This allows us to retire the freedom patch and switch over to the built-in ActiveRecord method.
There is no replacement for #pluck_first!, but a quick search shows we are using this in a very limited capacity, and in some cases incorrectly (by assuming a nil return rather than an exception), which can quite easily be replaced with #pick plus some extra handling.
This bug is actually a Drupal issue where some edited posts have their `created` and `changed` timestamps set to the same value. But even when that happens in Drupal it still maintains the correct post order in an affected thread. This PR makes the Discourse importer also maintain the original Drupal comment order by sorting comments in the source DB by their `cid`, which is sequential and never changes. More details from this post onward:
https://meta.discourse.org/t/large-drupal-forum-migration-importer-errors-and-limitations/246939/24?u=rahim123
We'd like to lean on the DNS caching service for more than the standard
DB and Redis hosts, but without having to add additional code each time.
Define a new environment variable
DISCOURSE_DNS_CACHE_ADDITIONAL_SERVICE_NAMES (admittedly a mouthful)
which is a list of service names to be added to the static list at
process execution time.
For example, plugin foo may reference two services that you want to
cache the address of. By specifying the following two variables in the
process environment, cache_critical_dns will perform the lookup
alongside the DB and Redis host variables.
```
DISCOURSE_DNS_CACHE_ADDITIONAL_SERVICE_NAMES='FOO_SERVICE1,FOO_SERVICE2'
FOO_SERVICE1='foo.service1.example.com'
FOO_SERVICE1_SRV='foo._tcp.example.com'
FOO_SERVICE2='foo.service2.example.com'
```
The behaviour when it comes to SRV record lookup is the same as
previously implemented for the `DISCOURSE_DB_..` and
`DISCOURSE_REDIS_..` variables.
For the purposes of the health checks, services defined in the list _are
always considered healthy_. This is a compromise for conveniences sake.
Defining a dynamic method for health checks at runtime is not practical.
See t/88457/32.
1. Fix bug where we were not waiting for all unicorn workers to start up
before running benchmarks.
2. Fix a bug where headers were not used when benchmarking. Admin
benchmarks were basically running as anon user.
3. Disable rate limits when in profile env. We're pretty much going to
hit the rate limit every time as a normal user.
4. Benchmark against topic with a fixed posts count of 100. Previously profiling script was just randomly creating posts
and we would benchmark against a topic with a fixed posts count of 30.
Sometimes, the script fails because no topics with a posts count of 30
exists.
5. Benchmarks are not run against a normal user on top of anon and
admin.
6. Add script option to select tests that should be run.
This commit promotes all post_deploy migrations which existed in
Discourse v2.8.0 (timestamp <= 20220107014925).
This commit includes a fix to the promote_migrations script to promote
all migrations of the first version of the previous stable version. For
example, if the current stable version is v2.8.13, the version used as
a cutoff for promoting migrations is v2.8.0.
While load testing our user creation code path in production, we
identified that executing the DB statement to update the `Group#user_count` column within a
transaction is creating a bottleneck for us. This is because the
creation of a user and addition of the user to the relevant groups are
done in a transaction. When we execute the DB statement to update
`Group#user_count` for the relevant group, a row level lock is held
until the transaction completes. This row level lock acts like a global
lock when the server is creating users that will be added to the same
group in quick succession.
Instead of updating the counter cache within a transaction which the
default ActiveRecord `counter_cache` option does, we simply update the
counter cache outside of the committing transaction.
Co-authored-by: Rafael dos Santos Silva <xfalcox@gmail.com>
Co-authored-by: Rafael dos Santos Silva <xfalcox@gmail.com>
* Allow taking table prefix from env var
* FIX: remove unused column references
The columns `filedata` and `extension` are not present in a v4.2.4
database, and they aren't used in the method anyways.
* FIX: report progress for tables without imported_id
* FIX: effectively check for AR validation errors
NOTE: other migration scripts also have this problem; see /t/58202
* FIX: properly count Posts when importing attachments
* FIX: improve logging
* Remove leftover comment
* FIX: show progress when exporting Permalink file
* PERF: stream Permalink file
The current way results in tons of memory usage; write once per line instead
* Document fixes needed
* WIP - deduplicate category names
* Ignore non alphanumeric chars for grouping
* FIX: properly deduplicate user emails by merging accounts
* FIX: don't merge empty UserEmails
* Improve logging
* Merge users AFTER fixing primary key sequences
* Parallelize user merging
* Save duplicated users structure for debugging purposes
* Add progress logging for the (multiple hour) user merging step
We now use Ember CLI (core/plugins) and DiscourseJSProcessor (themes) for all Ember and template compilation. This commit removes the remnants of the legacy Sprockets-based Ember compilation system.
Sprockets, and its DiscourseJSProcess-based Babel transformations, is still in use for a few assets. Ideally that will be removed/replaced in the near future.
We are already caching any DB_HOST and REDIS_HOST (and their
accompanying replicas), we should also cache the resolved addresses for
the MessageBus specific Redis. This is a noop if no MB redis is defined
in config. A side effect is that the MB will also support SRV lookup and
priorities, following the same convention as the other cached services.
The port argument was added to redis_healthcheck so that the script
supports a setup where Redis is running on a non-default port.
Did some minor refactoring to improve readability when filtering out the
CRITICAL_HOST_ENV_VARS. The `select` block was a bit confusing, so the
sequence was made easier to follow.
We were coercing an environment variable to an int in a few places, so
the `env_as_int` method was introduced to do that coercion in one place and
for convenience purposes default to a value if provided.
See /t/68301/30.