This is an importer I wrote to restore some users that were
accidentally deleted for being purged as old staged users or old
unactivated users.
It reads from CSV files exported from a discourse sql backup.
When running an import script there are many site settings that are
changed but we reset them back to where they were originally before the
import. However, there are two settings that we don't roll back:
```
purge_unactivated_users_grace_period_days
purge_deleted_uploads_grace_period_days
```
which could have some unintended consequences.
My first question is do we *really* have to change these settings? I'm
not a huge fan of changing someones settings without them really knowing
they were changed.
If we really do have to change these settings here is my proposed PR
where we don't alter the `purge_unactivated_users_grace_period_days` if
it has been disabled.
As I'm writing this another change we could make is that we don't change
either of these site settings if we detect that they aren't set to the
default values.
The drive behind this PR is that there is a discourse instance which
relies on staged users as part of their workflow and this setting was
changed by accident via the import script causing users to be deleted
that shouldn't have been.
* FEATURE: Import attachments
* FEATURE: Add support for importing multiple forums in one
* FEATURE: Add support for category and tag mapping
* FEATURE: Import groups
* FIX: Add spaces around images
* FEATURE: Custom mapping of user rank to trust levels
* FIX: Do not fail import if it cannot import polls
* FIX: Optimize existing records lookup
Co-authored-by: Gerhard Schlager <mail@gerhard-schlager.at>
Co-authored-by: Jarek Radosz <jradosz@gmail.com>
* ensure emails don't have spaces
* import banned users as suspended for 1k yrs
* upgrade users to TL2 if they have comments
* topic: import views, closed and pinned info
* import messages
* encode vanilla usernames for permalinks. Vanilla usernames can contain spaces and special characters.
* parse Vanilla's new rich body format
Refactors script to follow conventions of other importers and adds some features including like import, processing of post raw text, and, if needed, SSO import.
FEATURE: new rake task to update first_post_created_at column
The not-equal operator (`<>`) in PostgreSQL does not compare values
with NULL. We should instead use `IS DISTINCT FROM` when comparing
values with NULL.
* PERF: Dematerialize topic_reply_count
It's only ever used for trust level promotions that run daily, or compared to 0. We don't need to track it on every post creation.
* UX: Add symbol in TL3 report if topic reply count is capped
* DEV: Drop user_stats.topic_reply_count column
We have the `# frozen_string_literal: true` comment on all our
files. This means all string literals are frozen. There is no need
to call #freeze on any literals.
For files with `# frozen_string_literal: true`
```
puts %w{a b}[0].frozen?
=> true
puts "hi".frozen?
=> true
puts "a #{1} b".frozen?
=> true
puts ("a " + "b").frozen?
=> false
puts (-("a " + "b")).frozen?
=> true
```
For more details see: https://samsaffron.com/archive/2018/02/16/reducing-string-duplication-in-ruby
I'm not clear why changing only the `wait_for_url` address was necessary and not also the `get` a few lines above, but this change seems to work for me on both literatecomputing.com Groups and a public group.
Checking if all records have been imported uses a temp table in PostgreSQL. This fails when pgbouncer is used unless the temp table is created inside a transaction.
* Detects mostly all attachments and it's a lot faster
* Parses user properties in Ruby instead of the DB, because that's less errorprone
* Imports user avatars
* Imports topic views by users
* Better handling of quotes and YouTube links
* Adds ability to map forums to categories and tags as well as ignore forums.
* Fixes regular expression for detecting attachments in posts.
* Handles "remote attachments" 😮 by inserting a link.
* Imports view counts for topics.
* Handles incorrect references of parent posts.
* Better handling of quotes.
* Finds a lot more attachments by trying to replace various Unicode characters in filenames.
* Customizable email subject prefixes to remove "Re" and "Fwd" as well as localized prefixes.
* Configuration option for prefixes like [FOO] or (BAR) which can be replaced with tags during import.
* Bugfix: Import script might have skipped some users due to missing ORDER BY.
Posts without a user probably shouldn't happen unless there was some direct database tampering, but data like that has been seen in the wild.
The importer will assign those posts to the "system" user.
The following methods have long been deprecated in ruby due to flaws in their implementation per http://blade.nagaokaut.ac.jp/cgi-bin/vframe.rb/ruby/ruby-core/29293?29179-31097:
URI.escape
URI.unescape
URI.encode
URI.unencode
escape/encode are just aliases for one another. This PR uses the Addressable gem to replace these methods with its own encode, unencode, and encode_component methods where appropriate.
I have put all references to Addressable::URI here into the UrlHelper to keep them corralled in one place to make changes to this implementation easier.
Addressable is now also an explicit gem dependency.
We like to stay as close as possible to latest with rubocop cause the cops
get better.
This update required some code changes, specifically the default is to avoid
explicit returns where implicit is done
Also this renames a few rules
Zeitwerk simplifies working with dependencies in dev and makes it easier reloading class chains.
We no longer need to use Rails "require_dependency" anywhere and instead can just use standard
Ruby patterns to require files.
This is a far reaching change and we expect some followups here.