Commit Graph

53 Commits

Author SHA1 Message Date
Gerhard Schlager
7c26d5d084
FIX: Import script was broken after upgrade of sqlite3 gem (#27648) 2024-07-02 17:38:15 +10:00
Loïc Guitaut
2a28cda15c DEV: Update to lastest rubocop-discourse 2024-05-27 18:06:14 +02:00
Selase Krakani
949c70372c
DEV: Add support for various fields in generic bulk importer (#27114)
* user_profiles - `location`
* users - `date_of_birth`
* topics - `pinned_at`, `pinned_until`, `pinned_globally`

This also include changes to correctly import PMs. Currently PM topics
are skipped because of a check in `import_users` step which requires `category_id`
to be present.
2024-05-24 13:46:06 +02:00
Selase Krakani
61e12aaebe
FEATURE: Extend PM recipient bulk imports (#27063)
* FIX: Support multiple topic allowed user imports

* FEATURE: Add topic allowed groups import support
2024-05-17 13:45:20 +02:00
Gerhard Schlager
e3882a0c48
DEV: Add support for user_associated_accounts to import script (#26779) 2024-04-29 19:48:32 +02:00
Gerhard Schlager
bc98740205
DEV: Improve generic import script (#25972)
* FEATURE: Import into `category_users` table
* FIX: Failed to import `user_options` unless `timezone` was set
* FIX: Prevent reusing original `id` from intermediate DB in `user_fields`
* FEATURE: Order posts by `post_nuber` if available
* FEATURE: Allow `[mention]` placeholder to reference users by"id" or "name" (username)
* FEATURE: Support `[quote]` placeholders in posts
* FEATURE: Support `[link]` placeholders in posts
* FEATURE: Support all kinds of permalinks and remove support for `old_relative_url`
* PERF: Speed up pre-cooking by removing DB lookups
2024-03-05 22:23:36 +01:00
Gerhard Schlager
dc8c6b8958 DEV: Lots of improvements to the generic_bulk import script
Notable changes:
* Imports a lot more tables from core and plugins
  * site settings
  * uploads with necessary upload references
  * groups and group members
  * user profiles
  * user options
  * user fields & values
  * muted users
  * user notes (plugin)
  * user followers (plugin)
  * user avatars
  * tag groups and tags
  * tag users (notification settings for tags / user)
  * category permissions
  * polls with options and votes
  * post votes (plugin)
  * solutions (plugin)
  * gamification scores (plugin)
  * events (plugin)
  * badges and badge groupings
  * user badges
  * optimized images
  * topic users (notification settings for topics)
  * post custom fields
  * permalinks and permalink normalizations

* It creates the `migration_mappings` table which is used to store the mapping for a handful of imported tables

* Detects duplicate group names and renames them

* Pre-cooking for attachments, images and mentions

* Outputs instructions when gems are missing

* Supports importing uploads from a DB generated by `uploads_importer.rb`

* Checks that all required plugins exists and enables them if needed

* A couple of optimizations and additions in `import.rake`
2023-12-11 16:23:07 +01:00
Jarek Radosz
694b5f108b
DEV: Fix various rubocop lints (#24749)
These (21 + 3 from previous PRs) are soon to be enabled in rubocop-discourse:

Capybara/VisibilityMatcher
Lint/DeprecatedOpenSSLConstant
Lint/DisjunctiveAssignmentInConstructor
Lint/EmptyConditionalBody
Lint/EmptyEnsure
Lint/LiteralInInterpolation
Lint/NonLocalExitFromIterator
Lint/ParenthesesAsGroupedExpression
Lint/RedundantCopDisableDirective
Lint/RedundantRequireStatement
Lint/RedundantSafeNavigation
Lint/RedundantStringCoercion
Lint/RedundantWithIndex
Lint/RedundantWithObject
Lint/SafeNavigationChain
Lint/SafeNavigationConsistency
Lint/SelfAssignment
Lint/UnreachableCode
Lint/UselessMethodDefinition
Lint/Void

Previous PRs:
Lint/ShadowedArgument
Lint/DuplicateMethods
Lint/BooleanSymbol
RSpec/SpecFilePathSuffix
2023-12-06 23:25:00 +01:00
Gerhard Schlager
0b29dc5d38 DEV: Add experimental generic bulk import script 2023-08-09 20:56:14 +02:00
David Taylor
436b3b392b
DEV: Apply syntax_tree formatting to script/* 2023-01-09 11:13:22 +00:00
Leonardo Mosquera
bfecbde837
Fixes for vBulletin bulk importer (#17618)
* Allow taking table prefix from env var

* FIX: remove unused column references

The columns `filedata` and `extension` are not present in a v4.2.4
database, and they aren't used in the method anyways.

* FIX: report progress for tables without imported_id

* FIX: effectively check for AR validation errors

NOTE: other migration scripts also have this problem; see /t/58202

* FIX: properly count Posts when importing attachments

* FIX: improve logging

* Remove leftover comment

* FIX: show progress when exporting Permalink file

* PERF: stream Permalink file

The current way results in tons of memory usage; write once per line instead

* Document fixes needed

* WIP - deduplicate category names

* Ignore non alphanumeric chars for grouping

* FIX: properly deduplicate user emails by merging accounts

* FIX: don't merge empty UserEmails

* Improve logging

* Merge users AFTER fixing primary key sequences

* Parallelize user merging

* Save duplicated users structure for debugging purposes

* Add progress logging for the (multiple hour) user merging step
2022-11-28 16:30:19 -03:00
Loïc Guitaut
ab6ca78486 FIX: Use proper ActiveRecord method in import scripts
`ActiveRecord::Base.connection_config` has been deprecated since Rails
6.1 and was completely removed from Rails 7.
Instead we need to use
`ActiveRecord::Base.connection_db_config.configuration_hash`.

Import scripts were forgotten when we did the Rails 7 upgrade, this
patch fixes them.
2022-05-09 11:09:27 +02:00
Michael Brown
3bf3b9a4a5 DEV: pull email address validation out to a new EmailAddressValidator
We validate the *format* of email addresses in many places with a match against
a regex, often with very slightly different syntax.

Adding a separate EmailAddressValidator simplifies the code in a few spots and
feels cleaner.

Deprecated the old location in case someone is using it in a plugin.

No functionality change is in this commit.

Note: the regex used at the moment does not support using address literals, e.g.:
* localpart@[192.168.0.1]
* localpart@[2001:db8::1]
2022-02-17 21:49:22 -05:00
Gerhard Schlager
33d6ed60a4
DEV: Don't import year of birth (#15937)
The cakeday plugin doesn't use the year.
2022-02-14 18:10:35 +01:00
Gerhard Schlager
a4d0d866aa
DEV: Bulk imports should find existing users by email (#14468)
Without this change, bulk imports unconditionally create new user records even when a user with the same email address exists.
2021-09-29 00:20:06 +02:00
Justin DiRose
c1517e428e
DEV: Add vBulletin5 bulk importer (#12904)
This is a pretty straightforward bulk importer, just tailored to the vBulletin 5 database structure.

Also made a few minor improvements to the base importer -- should be self explanatory in the code.
2021-04-30 11:03:33 -05:00
Régis Hanol
a85d5edbf1
DEV: set digest_attempted_at during migrations (#11369) 2020-12-14 10:58:14 +11:00
Krzysztof Kotlarek
93ff54e184
FIX: improvements for vanilla bulk import (#10212)
Adjustments to the base:
1. PG connection doesn't require host - it was broken on import droplet
2. Drop `topic_reply_count` - it was removed here - https://github.com/discourse/discourse/blob/master/db/post_migrate/20200513185052_drop_topic_reply_count.rb
3. Error with `backtrace.join("\n")` -> `e.backtrace.join("\n")`
4. Correctly link the user and avatar to quote block

Adjustments to vanilla:
1. Top-level Vanilla categories are valid categories
2. Posts have `format` column which should be used to decide if the format is HTML or Markdown
3. Remove no UTF8 characters
4. Remove not supported HTML elements like `font` `span` `sub` `u`
2020-07-14 15:58:27 +10:00
Régis Hanol
47a1157458 DEV: various bugfixes in bulk importer 2020-06-19 17:53:06 +02:00
Régis Hanol
5143309014 DEV: ensure values are converted to integers in bulk importer 2020-06-18 17:42:14 +02:00
Régis Hanol
823b940b9d PERF: improve loading of indexes in bulk import
Similar strategy as for c52191d in which we stream the results from the database into
an automatically growing array instead of using a hash.
2020-06-18 16:32:27 +02:00
Régis Hanol
c52191d49e PERF: improve loading a imported_ids in bulk imports
- Stream the queries that load the imported_ids
- Use an array instead of a hash for keeping the mapping between imported_ids and new ids
- Ensure we always treat the imported_ids as integers instead of strings
2020-06-16 19:55:08 +02:00
Sam Saffron
d0d5a138c3
DEV: stop freezing frozen strings
We have the `# frozen_string_literal: true` comment on all our
files. This means all string literals are frozen. There is no need
to call #freeze on any literals.

For files with `# frozen_string_literal: true`

```
puts %w{a b}[0].frozen?
=> true

puts "hi".frozen?
=> true

puts "a #{1} b".frozen?
=> true

puts ("a " + "b").frozen?
=> false

puts (-("a " + "b")).frozen?
=> true
```

For more details see: https://samsaffron.com/archive/2018/02/16/reducing-string-duplication-in-ruby
2020-04-30 16:48:53 +10:00
Sam Saffron
0c52537f10 DEV: update rubocop to version 0.77
We like to stay as close as possible to latest with rubocop cause the cops
get better.

This update required some code changes, specifically the default is to avoid
explicit returns where implicit is done

Also this renames a few rules
2019-12-10 11:48:39 +11:00
Penar Musaraj
067696df8f DEV: Apply Rubocop redundant return style 2019-11-14 15:10:51 -05:00
Gerhard Schlager
b788948985 FEATURE: English locale with international date formats
Makes en_US the new default locale
2019-05-20 13:47:20 +02:00
Sam Saffron
30990006a9 DEV: enable frozen string literal on all files
This reduces chances of errors where consumers of strings mutate inputs
and reduces memory usage of the app.

Test suite passes now, but there may be some stuff left, so we will run
a few sites on a branch prior to merging
2019-05-13 09:31:32 +08:00
Arpit Jalan
a20f58554b IMPORT: create category definitions in import:ensure_consistency task 2019-04-11 12:06:37 +05:30
Bianca Nenciu
714f6cde79 FIX: Remove duplicate definition of create_categories. 2019-03-04 10:32:09 +02:00
Arpit Jalan
71a5369fef FIX: do not convert quote tags to markdown 2018-12-11 20:09:46 +05:30
Arpit Jalan
735a48415d FEATURE: option to use ruby-bbcode-to-md in bulk import script
ruby-bbcode-to-md provides better bbcode to markdown conversion
2018-12-10 10:28:07 +05:30
Arpit Jalan
0365d50797 Improve vBulletin bulk import script to support table prefix.
Improve base bulk import script to convert list tags to ul/li.
2018-12-10 10:10:44 +05:30
Régis Hanol
3c9c95ac83 Update Rubocop to 0.60 2018-12-04 10:48:16 +01:00
David Taylor
9248ad1905 DEV: Enable Style/SingleLineMethods and Style/Semicolon in Rubocop (#6717) 2018-12-04 11:48:13 +08:00
Régis Hanol
a0f0bac752
Add a comment to run the 'import:ensure_consistency' rake task after a bulk import 2018-11-21 16:28:35 +01:00
Arpit Jalan
0e04e3990e Improve Vanilla bulk import script 2018-08-16 22:00:26 +05:30
Neil Lalonde
7aa93b84c1 FIX: bulk importers shouldn't insert rows with id less than 1 2018-03-09 14:26:18 -05:00
Neil Lalonde
200c6673f1 FIX: bulk importers wiping all email addresses without warning or errors 2018-03-08 23:36:39 -05:00
Neil Lalonde
1093dacc03 FIX: bulk importers need to create category description topics 2018-03-07 12:10:22 -05:00
Neil Lalonde
0edd386b48 FEATURE: Vanilla bulk importer 2018-02-02 16:28:51 -05:00
Neil Lalonde
a224459960 bulk importer shouldn't try to update primary key sequences to -1 2018-01-19 15:01:00 -05:00
Quangbuu Le
90c14106fa Enhance BulkImport pre_cook (#5015)
* Enhance BulkImport pre_cook

* BulkImport: Trim <br> at begining and ending [quote][quote/]
2017-09-04 11:04:54 +02:00
Arpit Jalan
2d909f7894 new phpBB PostgreSQL bulk import script 2017-08-03 21:21:58 +05:30
Quangbuu Le
bac21d317b Bulk import likes from vBulletin thanks (#5014) 2017-08-01 10:01:45 +02:00
Mohammad AlTawil
7836b064f4 [FIX] invalid byte sequence in UTF-8 (#5003)
Invalid encoding fixed prior to empty check
2017-07-31 15:34:11 -04:00
Quangbuu Le
0daa177805 Enhance bulk import scripts (#5010)
* Enhance bulk import scripts

* Fix: restore running statement of BulkImport::VBulletin
2017-07-31 10:56:57 +02:00
Guo Xiang Tan
5012d46cbd Add rubocop to our build. (#5004) 2017-07-28 10:20:09 +09:00
Arpit Jalan
d89d279416 Update UserEmail primary key sequence when performing bulk import. 2017-07-25 19:15:22 +05:30
Quangbuu Le
5bba959cd5 FIX: vBulletin bulk importer: emails and stats 2017-07-24 19:49:22 +07:00
Régis Hanol
57d6a5dc9c FIX: vBulletin bulk importer 2017-07-24 14:22:00 +02:00