Commit Graph

617 Commits

Author SHA1 Message Date
Justin DiRose
be28fc73a0
DEV: Improvements to Drupal script (#10016)
Refactors script to follow conventions of other importers and adds some features including like import, processing of post raw text, and, if needed, SSO import.
2020-06-10 10:59:17 -05:00
Arpit Jalan
a93d24501c FIX: base import script was not updating first_post_created_at column
FEATURE: new rake task to update first_post_created_at column

The not-equal operator (`<>`) in PostgreSQL does not compare values
with NULL. We should instead use `IS DISTINCT FROM` when comparing
values with NULL.
2020-06-04 11:26:40 +05:30
Kane York
869f9b20a2
PERF: Dematerialize topic_reply_count (#9769)
* PERF: Dematerialize topic_reply_count

It's only ever used for trust level promotions that run daily, or compared to 0. We don't need to track it on every post creation.

* UX: Add symbol in TL3 report if topic reply count is capped

* DEV: Drop user_stats.topic_reply_count column
2020-05-14 15:42:00 -07:00
Francesco Frassinelli
c6f68e4006
Fix syntax error in fluxbb.rb (#9727) 2020-05-11 11:07:57 -04:00
Krzysztof Kotlarek
9bff0882c3
FEATURE: Nokogumbo (#9577)
* FEATURE: Nokogumbo

Use Nokogumbo HTML parser.
2020-05-05 13:46:57 +10:00
Martin Brennan
867bc3b48e
FIX: Change base importer to create new Bookmark records (#9603)
Also add a spec and fixture with a mock importer that we can use to test the create_X methods of the base importer
2020-05-01 11:34:55 +10:00
Sam Saffron
d0d5a138c3
DEV: stop freezing frozen strings
We have the `# frozen_string_literal: true` comment on all our
files. This means all string literals are frozen. There is no need
to call #freeze on any literals.

For files with `# frozen_string_literal: true`

```
puts %w{a b}[0].frozen?
=> true

puts "hi".frozen?
=> true

puts "a #{1} b".frozen?
=> true

puts ("a " + "b").frozen?
=> false

puts (-("a " + "b")).frozen?
=> true
```

For more details see: https://samsaffron.com/archive/2018/02/16/reducing-string-duplication-in-ruby
2020-04-30 16:48:53 +10:00
discoursehosting
094ddb1c1f
VBulletin5 importer improvements (#9477)
- no more hard coded contenttypes
- permalinks for topics, categories, subcategories
- better uploads handling
- tag support
2020-04-22 22:04:59 +02:00
Jay Pfaffman
3c72cbc5de
FIX: Google groups import changed login URL (#9432)
I'm not clear why changing only the `wait_for_url` address was necessary and not also the `get` a few lines above, but this change seems to work for me on both literatecomputing.com Groups and a public group.
2020-04-15 16:45:14 -04:00
Gerhard Schlager
2b2584912a Improve Telligent import script
* Imports private messages
* Replaces internal links for topics and replies
* Allows incremental import of accepted answers
2020-04-03 18:10:52 +02:00
Gerhard Schlager
739430c01e FIX: mbox import failed if no tags were configured 2020-03-26 16:41:11 +01:00
Gerhard Schlager
d216483c53 FIX: Importing with pgbouncer failed
Checking if all records have been imported uses a temp table in PostgreSQL. This fails when pgbouncer is used unless the temp table is created inside a transaction.
2020-03-26 16:41:09 +01:00
Gerhard Schlager
c94b63bc75 DEV: Improve import of attachments from Telligent 2020-03-26 16:37:55 +01:00
Gerhard Schlager
5b2b769eb7 DEV: Ensure uploads aren't deleted during imports
Sidekiq might delete uploads if you, for some reason, create upload records before using them in posts.
2020-03-24 17:14:16 +01:00
Gerhard Schlager
445b35381d Improve Telligent import script
* Detects mostly all attachments and it's a lot faster
* Parses user properties in Ruby instead of the DB, because that's less errorprone
* Imports user avatars
* Imports topic views by users
* Better handling of quotes and YouTube links
2020-03-23 09:18:12 +01:00
Gerhard Schlager
d27ece9ded FIX: Method from Telligent import script was deleted by accident 2020-03-14 22:10:40 +01:00
Gerhard Schlager
e825f47daa DEV: Better handling of incremental scrapes for Google Groups 2020-03-14 00:00:36 +01:00
Gerhard Schlager
0a88232e87 DEV: Improve mbox import script
* Better documentation of settings
* Add option to exclude trimmed parts of emails (enabled by default) to not revail email addresses
2020-03-14 00:00:36 +01:00
Gerhard Schlager
36062f43c8 DEV: Improve Telligent import script
* Adds ability to map forums to categories and tags as well as ignore forums.
* Fixes regular expression for detecting attachments in posts.
* Handles "remote attachments" 😮 by inserting a link.
* Imports view counts for topics.
* Handles incorrect references of parent posts.
* Better handling of quotes.
* Finds a lot more attachments by trying to replace various Unicode characters in filenames.
2020-03-14 00:00:36 +01:00
Gerhard Schlager
ba1b840816 DEV: Don't deactivate suspended users during import
Otherwise a cleanup job might delete those deactivated users.
2020-03-14 00:00:36 +01:00
Justin DiRose
6c948f27ea
FIX: Missing constant in SMF2 importer (#9178) 2020-03-11 10:19:59 -05:00
Gerhard Schlager
0e752db411 DEV: Improve mbox import script
* Customizable email subject prefixes to remove "Re" and "Fwd" as well as localized prefixes.
* Configuration option for prefixes like [FOO] or (BAR) which can be replaced with tags during import.
* Bugfix: Import script might have skipped some users due to missing ORDER BY.
2020-03-09 10:26:45 +01:00
Gerhard Schlager
edc8d58ac3 FEATURE: Add site setting to disable staged user cleanup
... and disabled the cleanup during imports, otherwise a running Sidekiq might delete users before posts are created
2020-03-09 10:26:41 +01:00
Jarek Radosz
f7ea2fdea5
FIX: Import posts of missing users from phpbb3 (#9085)
Posts without a user probably shouldn't happen unless there was some direct database tampering, but data like that has been seen in the wild.

The importer will assign those posts to the "system" user.
2020-03-06 22:54:40 +01:00
Gerhard Schlager
d7ccb58559 FIX: Google Groups scraper failed to login 2020-03-02 17:24:48 +01:00
Brad Morrical
ff5ff8d0d2
fix invalid byte sequence in UTF-8 (ArgumentError) (#9077) 2020-02-28 10:26:18 -05:00
Justin DiRose
f35ee5e887
DEV: Improvements to SMF2 script (#9006) 2020-02-24 12:51:45 -06:00
Régis Hanol
0843e3e6ce FIX: add support for sub-sub-categories in base_importer
Also delegates 'post_already_imported?' and 'user_already_imported?' to the base importer.
2020-02-05 10:40:28 +01:00
AlexP11223
1e4a83cc2a FEATURE: add mybb.ru import script (#8609) 2019-12-20 11:10:18 -05:00
Martin Brennan
edbc356593
FIX: Replace deprecated URI.encode, URI.escape, URI.unescape and URI.unencode (#8528)
The following methods have long been deprecated in ruby due to flaws in their implementation per http://blade.nagaokaut.ac.jp/cgi-bin/vframe.rb/ruby/ruby-core/29293?29179-31097:

URI.escape
URI.unescape
URI.encode
URI.unencode
escape/encode are just aliases for one another. This PR uses the Addressable gem to replace these methods with its own encode, unencode, and encode_component methods where appropriate.

I have put all references to Addressable::URI here into the UrlHelper to keep them corralled in one place to make changes to this implementation easier.

Addressable is now also an explicit gem dependency.
2019-12-12 12:49:21 +10:00
Sam Saffron
0c52537f10 DEV: update rubocop to version 0.77
We like to stay as close as possible to latest with rubocop cause the cops
get better.

This update required some code changes, specifically the default is to avoid
explicit returns where implicit is done

Also this renames a few rules
2019-12-10 11:48:39 +11:00
Gerhard Schlager
c218036107 FIX: Make Google Groups scraper work for G Suite users 2019-11-28 02:09:51 +01:00
Vinoth Kannan
3bb7ad4be1
FEATURE: remove support for 'suppress_from_latest' category setting. (#8308) 2019-11-18 12:28:35 +05:30
Penar Musaraj
067696df8f DEV: Apply Rubocop redundant return style 2019-11-14 15:10:51 -05:00
jelle van der Waa
2d4c9bbaac import_scripts: add fluxbb prefix to missing query (#8163)
Signed-off-by: Jelle van der Waa <jelle@archlinux.org>
2019-10-08 11:46:00 +11:00
Krzysztof Kotlarek
427d54b2b0 DEV: Upgrading Discourse to Zeitwerk (#8098)
Zeitwerk simplifies working with dependencies in dev and makes it easier reloading class chains. 

We no longer need to use Rails "require_dependency" anywhere and instead can just use standard 
Ruby patterns to require files.

This is a far reaching change and we expect some followups here.
2019-10-02 14:01:53 +10:00
Gerhard Schlager
b48ca9dee9 DEV: Simplify username validation in base importer
The `UsernameValidator` does already all the hard work. No need to do any additional checks in the import script. The checks were out-of-date anyway.
2019-10-01 20:33:09 +02:00
Gerhard Schlager
ed1e5ef6cc FIX: By default, don't abort Google Groups crawling on error 2019-09-18 18:14:04 +02:00
Gerhard Schlager
ab96239f2a FIX: Google Groups crawler failed to login
Trying to automate the login into a Google account is quite hard. This makes the crawler use the content of a cookies.txt file instead. It also removes a couple of deprecation warnings and adds some color to the output.
2019-09-18 13:09:20 +02:00
Gerhard Schlager
888b635cfc Import avatars and likes in the Zendesk AP importer
Co-authored-by: Justin DiRose <justin@justindirose.com>
2019-08-14 10:42:52 +02:00
Gerhard Schlager
4ed517a344 DEV: Make Rubocop happy
Follow-up to 6cc9fe42
2019-08-12 23:10:58 +02:00
Mohamad Abras
6cc9fe42ce add mongo adapter to nodebb importer (#8000) 2019-08-12 14:15:11 -04:00
Rishabh
dcb47d902b
REFACTOR: Rename SiteSetting.disable_edit_notifications to disable_system_edit_notifications (#7958)
* REFACTOR: Rename SiteSetting.disable_edit_notifications to disable_system_edit_notifications

- The older name could cause some confusion because the setting does not disable all edit notifications, only system ones.

* FIX: Add frozen_string_literal: true in the migration

* DEV: Deprecate 'disable_edit_notifications'
2019-07-31 20:20:41 +05:30
Gerhard Schlager
fd12c414e7 DEV: Refactor helper methods for upload markdown
Follow-up to a61ff167
2019-07-25 16:36:35 +02:00
Gerhard Schlager
a61ff16740 DEV: Make attachment markdown reusable 2019-07-25 14:04:18 +02:00
Gerhard Schlager
f0fea5991f FIX: Latest Selenium gem broke Google Groups import script
Selenium uses Keep-Alive since version 3.141, so the net-http-persistent gem shouldn't be needed anymore.
2019-07-10 09:45:33 +02:00
Arpit Jalan
6d30be1f94 Improve XenForo import script.
- ensure only active, unbanned users are imported.
- ensure only visible threads/posts are imported.
2019-06-18 15:52:34 +05:30
Arpit Jalan
77f5577e30 DEV: Improvements to AnswerHub import script. 2019-06-13 11:46:17 +05:30
Guo Xiang Tan
36c0cfa890 FIX: Use new attachment markdown format in ImportScripts::Uploader. 2019-06-11 14:49:28 +08:00
Blake Erickson
0955d9ece9 create answerhub importer (#7671) 2019-06-03 12:17:22 +10:00
Gerhard Schlager
0f3c3bc309 Make import scripts work with frozen strings 2019-05-30 22:22:24 +02:00
Gerhard Schlager
c70d0c6659 Use an invalid domain for fake email addresses in importers 2019-05-30 22:22:24 +02:00
Gerhard Schlager
d3ba338144 Make Telligent import script more generic 2019-05-30 22:22:24 +02:00
Sam Saffron
7429700389 FIX: ensure we can download maxmind without redis or db config
This also corrects FileHelper.download so it supports "follow_redirect"
correctly (it used to always follow 1 redirect) and adds a `validate_url`
param that will bypass all uri validation if set to false (default is true)
2019-05-28 10:28:57 +10:00
Sam Saffron
678a9a61c4 DEV: lint importer
commit #f490ed3b introduced a few linting issues, resolved now
2019-05-17 16:37:08 +10:00
Edmond Lepedus
f490ed3bbc FEATURE: Add attachment support to xenforo importer (#7548)
* FEATURE: Add attachment support to XenForo importer

If `ATTACHMENT_DIR` is provided, importer will scan each imported post
for `[GALLERY]` and `[ATTACH]` tags, attempt to import the referenced files
as Discourse uploads and replace the tags with Discourse markup.

References to files which cannot be imported are stripped.

NOTE: This only imports attachments which are referenced in imported
posts. Any XenForo media or files which are not referenced in any post
using `[ATTACH]` or `[GALLERY]` tags will not be imported. The goal is to
ensure that we don't have posts with missing images and unsightly
markup, NOT to ensure that all attachments are migrated.

* FEATURE: Add attachment support to XenForo importer

If `ATTACHMENT_DIR` is provided, importer will scan each imported post
for `[GALLERY]` and `[ATTACH]` tags, attempt to import the referenced files
as Discourse uploads and replace the tags with Discourse markup.

References to files which cannot be imported are stripped.

NOTE: This only imports attachments which are referenced in imported
posts. Any XenForo media or files which are not referenced in any post
using `[ATTACH]` or `[GALLERY]` tags will not be imported. The goal is to
ensure that we don't have posts with missing images and unsightly
markup, NOT to ensure that all attachments are migrated.

* FEATURE: Add attachment support to XenForo importer

If `ATTACHMENT_DIR` is provided, importer will scan each imported post
for `[GALLERY]` and `[ATTACH]` tags, attempt to import the referenced files
as Discourse uploads and replace the tags with Discourse markup.

References to files which cannot be imported are stripped.

NOTE: This only imports attachments which are referenced in imported
posts. Any XenForo media or files which are not referenced in any post
using `[ATTACH]` or `[GALLERY]` tags will not be imported. The goal is to
ensure that we don't have posts with missing images and unsightly
markup, NOT to ensure that all attachments are migrated.
2019-05-17 16:18:28 +10:00
Sam Saffron
30990006a9 DEV: enable frozen string literal on all files
This reduces chances of errors where consumers of strings mutate inputs
and reduces memory usage of the app.

Test suite passes now, but there may be some stuff left, so we will run
a few sites on a branch prior to merging
2019-05-13 09:31:32 +08:00
Guo Xiang Tan
152238b4cf DEV: Prefer public_send over send. 2019-05-07 09:33:21 +08:00
Sam Saffron
9be70a22cd DEV: introduce new API to look up dynamic site setting
This removes all uses of both `send` and `public_send` from consumers of
SiteSetting and instead introduces a `get` helper for dynamic lookup

This leads to much cleaner and safer code long term as we are always explicit
to test that a site setting is really there before sending an arbitrary
string to the class

It also removes a couple of risky stubs from the auth provider test
2019-05-07 11:00:30 +10:00
Gerhard Schlager
74ca49d7cd FIX: Importing of polls from phpBB3 was broken
Follow-up to 24369a81
2019-05-06 12:37:19 +02:00
Guo Xiang Tan
24347ace10 FIX: Properly associate user_profiles background urls via upload id.
`Upload#url` is more likely and can change from time to time. When it
does changes, we don't want to have to look through multiple tables to
ensure that the URLs are all up to date. Instead, we simply associate
uploads properly to `UserProfile` so that it does not have to replicate
the URLs in the table.
2019-05-02 14:58:24 +08:00
MMX
5d4aa256be FIX: category logo upload error in Discuz importer.(#7453) 2019-04-29 17:01:15 +02:00
Michael K Johnson
9fc3de01bb FEATURE: Add import script for Friends+Me Google+ Exporter JSON archives (#7334)
This script has been used to import over 50,000 Google+ posts
and over 300,000 comments from 29 communities into a single
Discourse instance, as well as for at least three other
imports.  Google+ has closed for the public, but it is still
available at this time for GSuite customers. If GSuite customers
decide to migrate from Google+ to Discourse, or if Google
"sunsets" Google+ for GSuite customers, this importer may be
useful.
https://www.reddit.com/r/FMGE_Support/comments/b8sa5h/fmge_for_gsuite/

Development and use of this script has been discussed in detail:
https://meta.discourse.org/t/bounty-google-private-communities-export-screenscraper-importer/108029
2019-04-23 14:04:09 +10:00
Robin Ward
b58867b6e9 FEATURE: New 'Reviewable' model to make reviewable items generic
Includes support for flags, reviewable users and queued posts, with REST API
backwards compatibility.

Co-Authored-By: romanrizzi <romanalejandro@gmail.com>
Co-Authored-By: jjaffeux <j.jaffeux@gmail.com>
2019-03-28 12:45:10 -04:00
Gerhard Schlager
453ba2da7b Make Google Groups scraper work with latest chromedriver 2019-03-25 16:11:22 +01:00
Gerhard Schlager
2349ba3bc4 Improve Google Groups scraper
* Better error detection during login phase
* Experimental support for 2FA and SMS codes
* Detect missing permissions to scrape email addresses
2019-03-24 23:15:13 +01:00
Penar Musaraj
0db2846a5b Add user bios to NodeBB importer 2019-03-20 16:40:26 -04:00
Penar Musaraj
b6a7b851c7 Nodebb importer: add permalinks, exclude disabled categories 2019-03-18 21:59:02 -04:00
Penar Musaraj
9334d2f4f7
FEATURE: add more granular user option levels for email notifications (#7143)
Migrates email user options to a new data structure, where `email_always`, `email_direct` and `email_private_messages` are replace by

* `email_messages_level`, with options: `always`, `only_when_away` and `never` (defaults to `always`)
* `email_level`, with options: `always`, `only_when_away` and `never` (defaults to `only_when_away`)
2019-03-15 10:55:11 -04:00
Gerhard Schlager
941e096df4 Fix error in base import script
Follow-up to 655a08dbbd
2019-03-06 21:58:25 +01:00
maulkin
655a08dbbd FIX: Return actual errors if PostCreator fails (#7096) 2019-03-06 21:29:37 +01:00
Penar Musaraj
b1035cc691 FIX: NodeBB import details
- mark imported users as active

- do not strip @ from usernames in post content

- improve uploads path matching
2019-03-06 12:30:36 -05:00
Gerhard Schlager
c36c9c2ee5 FEATURE: Import script for AnswerBase
Improves the generic database used by some import scripts:
* Adds additional columns for users
* Adds support for attachments
* Allows setting the data type for keys (numeric or string) to ensure correct sorting
2019-02-28 22:08:12 +01:00
Gerhard Schlager
24369a8166 Improve phpBB3 importer
* Log errors when mapping of posts, messages, etc. fails
* Allow permalink normalizations for old subfolder installation
* Disable importing of polls for now. It's broken.
2019-02-17 23:20:20 +01:00
Gerhard Schlager
8d5dfe1e01 FIX: Don't import parts of the email address as name 2019-02-17 22:59:18 +01:00
Régis Hanol
1e67bcb456
PERF: bulk feature topic users & reset topic counters after an import 2019-01-17 21:48:23 +01:00
Régis Hanol
788719d271 DEV: speed up posts base imports 2019-01-04 15:30:17 +01:00
Arpit Jalan
71a5369fef FIX: do not convert quote tags to markdown 2018-12-11 20:09:46 +05:30
Régis Hanol
3c9c95ac83 Update Rubocop to 0.60 2018-12-04 10:48:16 +01:00
David Taylor
9248ad1905 DEV: Enable Style/SingleLineMethods and Style/Semicolon in Rubocop (#6717) 2018-12-04 11:48:13 +08:00
Guo Xiang Tan
5076487eaf Update discuz_x import script to not use Category#logo_url. 2018-11-09 14:15:31 +08:00
Gerhard Schlager
d6f89a85ef Make Rubocop happy 2018-10-31 01:30:14 +01:00
Gerhard Schlager
65db9326b4 FEATURE: Add download script for Google Groups 2018-10-31 01:12:05 +01:00
Gerhard Schlager
efa265cbc8 Rename mbox import script 2018-10-31 01:12:05 +01:00
Gerhard Schlager
edbc004a9a Remove old mbox import script 2018-10-31 01:12:05 +01:00
Régis Hanol
c39a1022cc PERF: user imports would slow down the more users were imported 2018-10-22 11:14:13 +02:00
Régis Hanol
afa22a0c6f REFACTOR: more 'fake_email' to base importer 2018-10-22 11:12:40 +02:00
Régis Hanol
8b20e2500a
Remove unnecessary line 2018-10-19 15:48:48 +02:00
Régis Hanol
637123ff6f Merge users based on their email in vBulletin importer 2018-10-19 15:16:45 +02:00
Régis Hanol
53aa0344bf FIX: properly import vBulletin's hashed password 2018-10-18 10:22:55 +02:00
Régis Hanol
5f2fb0fe33 Show original options when an error happens while importing an user 2018-10-18 10:21:12 +02:00
Gerhard Schlager
341836eb42 Fix the rake task and importer instead 2018-10-17 16:48:09 +02:00
Gerhard Schlager
ee18d9ace0 FIX: mbox importer and rake task were broken 2018-10-17 16:34:18 +02:00
Neil Lalonde
a68032835a FEATURE: XenForo importer can import categories from the xf_node table and convert sub-categories beyond second level to tags 2018-10-11 12:04:15 -04:00
Guo Xiang Tan
71185c13b5
Merge pull request #6377 from tgxworld/remove_tif_tiff
Drop `tif`, `tiff`, `webp` and `bmp` from supported images.
2018-09-12 09:32:32 +08:00
Guo Xiang Tan
e1b16e445e Rename FileHelper.is_image? -> FileHelper.is_supported_image?. 2018-09-12 09:22:28 +08:00
Carsten Brandt
921e2213b8 FEATURE: Updated IPB import script
* IPB import script replace PHP code tags with proper markdown

remove excess newlines in code blocks
decode HTML entities in code blocks
add replacement for list items
proper handling of attachments that are not images
fix typo
improved quote handling
fix code style complaint from travis-ci build
2018-09-12 11:12:28 +10:00
Guo Xiang Tan
434035f167 FIX: Link post to uploads in PostCreator.
* This ensures that uploads are linked to their post on creation
  instead of a background job which may be delayed if Sidekiq
  is facing difficulties.
2018-09-06 11:18:11 +08:00
Guo Xiang Tan
8dc1463ab3 Enable Lint/ShadowingOuterLocalVariable for Rubocop. 2018-09-04 10:16:42 +08:00
Neil Lalonde
15f657309a FEATURE: Zendesk importer that uses its API to get data 2018-08-28 10:21:39 -04:00