This SQL tries to insert as much data as possible into the `user_stats` table by either calculating or by approximating stats based on existing. It also fixes an error in the calculation of `reply_count`which mistakenly contained all posts, not just replies.
This change also disables some steps in the `import:ensure_consistency` rake task by setting the `SKIP_USER_STATS` env variable. Otherwise, the rake task will overwrite the calculated data in the `user_stats` table with inaccurate data. I'm not changing or removing the logic from the rake task yet because other bulk import scripts seem to depend on it.
* FEATURE: Import into `category_users` table
* FIX: Failed to import `user_options` unless `timezone` was set
* FIX: Prevent reusing original `id` from intermediate DB in `user_fields`
* FEATURE: Order posts by `post_nuber` if available
* FEATURE: Allow `[mention]` placeholder to reference users by"id" or "name" (username)
* FEATURE: Support `[quote]` placeholders in posts
* FEATURE: Support `[link]` placeholders in posts
* FEATURE: Support all kinds of permalinks and remove support for `old_relative_url`
* PERF: Speed up pre-cooking by removing DB lookups
* Print instructions when the `sqlite3` gem can't be loaded
* Use `display_filename` instead of `filename` if available
* Support uploading for a multisite
Previously only Sidekiq was allowed to generate more than one optimized image at the same time per machine. This adds an easy mechanism to allow the same in rake tasks and other tools.
Notable changes:
* Imports a lot more tables from core and plugins
* site settings
* uploads with necessary upload references
* groups and group members
* user profiles
* user options
* user fields & values
* muted users
* user notes (plugin)
* user followers (plugin)
* user avatars
* tag groups and tags
* tag users (notification settings for tags / user)
* category permissions
* polls with options and votes
* post votes (plugin)
* solutions (plugin)
* gamification scores (plugin)
* events (plugin)
* badges and badge groupings
* user badges
* optimized images
* topic users (notification settings for topics)
* post custom fields
* permalinks and permalink normalizations
* It creates the `migration_mappings` table which is used to store the mapping for a handful of imported tables
* Detects duplicate group names and renames them
* Pre-cooking for attachments, images and mentions
* Outputs instructions when gems are missing
* Supports importing uploads from a DB generated by `uploads_importer.rb`
* Checks that all required plugins exists and enables them if needed
* A couple of optimizations and additions in `import.rake`
This script preprocesses all uploads within a intermediate DB (output of converters) and uploads those files to S3. It does the same for optimized images. This speeds up migrations when you have to run them multiple times, because you only have to preprocess and upload the files once.
This script is very hacky and mostly undocumented for now. That will change in the future.
Discourse stopped using PostUpload in 9db8f00b3d. Since then, these importers have been writing to the table, but any data was totally unused. This commit updates the easy cases to use UploadReference, and adds an error to the discourse_merger import script, which needs more significant work.
* Allow taking table prefix from env var
* FIX: remove unused column references
The columns `filedata` and `extension` are not present in a v4.2.4
database, and they aren't used in the method anyways.
* FIX: report progress for tables without imported_id
* FIX: effectively check for AR validation errors
NOTE: other migration scripts also have this problem; see /t/58202
* FIX: properly count Posts when importing attachments
* FIX: improve logging
* Remove leftover comment
* FIX: show progress when exporting Permalink file
* PERF: stream Permalink file
The current way results in tons of memory usage; write once per line instead
* Document fixes needed
* WIP - deduplicate category names
* Ignore non alphanumeric chars for grouping
* FIX: properly deduplicate user emails by merging accounts
* FIX: don't merge empty UserEmails
* Improve logging
* Merge users AFTER fixing primary key sequences
* Parallelize user merging
* Save duplicated users structure for debugging purposes
* Add progress logging for the (multiple hour) user merging step
`ActiveRecord::Base.connection_config` has been deprecated since Rails
6.1 and was completely removed from Rails 7.
Instead we need to use
`ActiveRecord::Base.connection_db_config.configuration_hash`.
Import scripts were forgotten when we did the Rails 7 upgrade, this
patch fixes them.
We validate the *format* of email addresses in many places with a match against
a regex, often with very slightly different syntax.
Adding a separate EmailAddressValidator simplifies the code in a few spots and
feels cleaner.
Deprecated the old location in case someone is using it in a plugin.
No functionality change is in this commit.
Note: the regex used at the moment does not support using address literals, e.g.:
* localpart@[192.168.0.1]
* localpart@[2001:db8::1]
* File.exists? is deprecated and removed in Ruby 3.2 in favor of
File.exist?
* Dir.exists? is deprecated and removed in Ruby 3.2 in favor of
Dir.exist?
This is a pretty straightforward bulk importer, just tailored to the vBulletin 5 database structure.
Also made a few minor improvements to the base importer -- should be self explanatory in the code.
After running the Discourse merge script, it was pretty evident it held up well after all these years ;)
Made a few fixes:
Included an environment variable for DB_PASS as likely the password will need to be changed if running the import in an official Docker container (recommended)
Set a hard order for imported categories, otherwise sometimes they'd be imported in a weird order making things unpredictable for parent/child category imports
Fixed a couple of instances where we added unique indexes (such as on category slugs)
Set up upload regex to handle AWS URLs better
Fixed the script to work with frozen string literals
This commit adds an additional find_user_by_email hook to ManagedAuthenticator so that GitHub login can continue to support secondary email addresses
The github_user_infos table will be dropped in a follow-up commit.
This is the last core authenticator to be migrated to ManagedAuthenticator 🎉
Adjustments to the base:
1. PG connection doesn't require host - it was broken on import droplet
2. Drop `topic_reply_count` - it was removed here - https://github.com/discourse/discourse/blob/master/db/post_migrate/20200513185052_drop_topic_reply_count.rb
3. Error with `backtrace.join("\n")` -> `e.backtrace.join("\n")`
4. Correctly link the user and avatar to quote block
Adjustments to vanilla:
1. Top-level Vanilla categories are valid categories
2. Posts have `format` column which should be used to decide if the format is HTML or Markdown
3. Remove no UTF8 characters
4. Remove not supported HTML elements like `font` `span` `sub` `u`
- Stream the queries that load the imported_ids
- Use an array instead of a hash for keeping the mapping between imported_ids and new ids
- Ensure we always treat the imported_ids as integers instead of strings
We have the `# frozen_string_literal: true` comment on all our
files. This means all string literals are frozen. There is no need
to call #freeze on any literals.
For files with `# frozen_string_literal: true`
```
puts %w{a b}[0].frozen?
=> true
puts "hi".frozen?
=> true
puts "a #{1} b".frozen?
=> true
puts ("a " + "b").frozen?
=> false
puts (-("a " + "b")).frozen?
=> true
```
For more details see: https://samsaffron.com/archive/2018/02/16/reducing-string-duplication-in-ruby
This is not used in core or official plugins, and has been printing a deprecation notice since v2.3.0beta4. All OpenID 2.0 code and dependencies have been dropped. The user_open_ids table remains for now, in case anyone has missed the deprecation notice, and needs to migrate their data.
Context at https://meta.discourse.org/t/-/113249
We like to stay as close as possible to latest with rubocop cause the cops
get better.
This update required some code changes, specifically the default is to avoid
explicit returns where implicit is done
Also this renames a few rules
This reduces chances of errors where consumers of strings mutate inputs
and reduces memory usage of the app.
Test suite passes now, but there may be some stuff left, so we will run
a few sites on a branch prior to merging
Changes to functionality
- Removed syncing of user metadata including gender, location etc.
These are no longer available to standard Facebook applications.
- Removed the remote 'revoke' functionality. No other providers have
it, and it does not appear to be standard practice in other apps.
- The 'facebook_no_email' event is no longer logged. The system can
cope fine with a missing email address.
Data is migrated to the new user_associated_accounts table.
facebook_user_infos can be dropped once we are confident the data has
been migrated successfully.