While load testing our user creation code path in production, we
identified that executing the DB statement to update the `Group#user_count` column within a
transaction is creating a bottleneck for us. This is because the
creation of a user and addition of the user to the relevant groups are
done in a transaction. When we execute the DB statement to update
`Group#user_count` for the relevant group, a row level lock is held
until the transaction completes. This row level lock acts like a global
lock when the server is creating users that will be added to the same
group in quick succession.
Instead of updating the counter cache within a transaction which the
default ActiveRecord `counter_cache` option does, we simply update the
counter cache outside of the committing transaction.
Co-authored-by: Rafael dos Santos Silva <xfalcox@gmail.com>
Co-authored-by: Rafael dos Santos Silva <xfalcox@gmail.com>
* Allow taking table prefix from env var
* FIX: remove unused column references
The columns `filedata` and `extension` are not present in a v4.2.4
database, and they aren't used in the method anyways.
* FIX: report progress for tables without imported_id
* FIX: effectively check for AR validation errors
NOTE: other migration scripts also have this problem; see /t/58202
* FIX: properly count Posts when importing attachments
* FIX: improve logging
* Remove leftover comment
* FIX: show progress when exporting Permalink file
* PERF: stream Permalink file
The current way results in tons of memory usage; write once per line instead
* Document fixes needed
* WIP - deduplicate category names
* Ignore non alphanumeric chars for grouping
* FIX: properly deduplicate user emails by merging accounts
* FIX: don't merge empty UserEmails
* Improve logging
* Merge users AFTER fixing primary key sequences
* Parallelize user merging
* Save duplicated users structure for debugging purposes
* Add progress logging for the (multiple hour) user merging step
This allows text editors to use correct syntax coloring for the heredoc sections.
Heredoc tag names we use:
languages: SQL, JS, RUBY, LUA, HTML, CSS, SCSS, SH, HBS, XML, YAML/YML, MF, ICS
other: MD, TEXT/TXT, RAW, EMAIL
We validate the *format* of email addresses in many places with a match against
a regex, often with very slightly different syntax.
Adding a separate EmailAddressValidator simplifies the code in a few spots and
feels cleaner.
Deprecated the old location in case someone is using it in a plugin.
No functionality change is in this commit.
Note: the regex used at the moment does not support using address literals, e.g.:
* localpart@[192.168.0.1]
* localpart@[2001:db8::1]
* File.exists? is deprecated and removed in Ruby 3.2 in favor of
File.exist?
* Dir.exists? is deprecated and removed in Ruby 3.2 in favor of
Dir.exist?
This reduces chances of errors where consumers of strings mutate inputs
and reduces memory usage of the app.
Test suite passes now, but there may be some stuff left, so we will run
a few sites on a branch prior to merging
`Upload#url` is more likely and can change from time to time. When it
does changes, we don't want to have to look through multiple tables to
ensure that the URLs are all up to date. Instead, we simply associate
uploads properly to `UserProfile` so that it does not have to replicate
the URLs in the table.
Introduce new patterns for direct sql that are safe and fast.
MiniSql is not prone to memory bloat that can happen with direct PG usage.
It also has an extremely fast materializer and very a convenient API
- DB.exec(sql, *params) => runs sql returns row count
- DB.query(sql, *params) => runs sql returns usable objects (not a hash)
- DB.query_hash(sql, *params) => runs sql returns an array of hashes
- DB.query_single(sql, *params) => runs sql and returns a flat one dimensional array
- DB.build(sql) => returns a sql builder
See more at: https://github.com/discourse/mini_sql
This changes the content of `@categories_lookup` from `Category` objects
to IDs since the category names aren't needed anymore. The lookup
method has been renamed too.
- FEATURE: TopicCreator now supports 'pinned_at' parameter
- FIX: 🐛 FIX TopicQuerySQL to support pinned topic older than 2010
- FIX: 🐛 Properly remove all HTML Entities from Usernames/Titles/Category Names/Groups in vBulletin importer
- FIX: 🐛 Properly handle specific vBulletin BBCode (quotes/mentions)
- FIX: 🐛 Make sure we generate a username from the name of the user instead of a fake email
- FEATURE: Allow for custom timezone in vBulletin importer
- FEATURE: Support for profile pictures/background in vBulletin importer
- FIX: 🐛 merge the categories tree to only 2 levels in vBulletin importer
FEATURE: add backtrace when an exception happen (importers)
FIX: post-processing should also happen on first posts (vBulletin
importer)
PERF: faster topic bypass when already imported