Under certain conditions, a recurring automation can end up in a state with no pending automation records, causing it to not execute again until manually triggered.
We use the `RRule` gem to calculate the next execution date and time for recurring automations. The gem takes the interval, frequency, start date, and a time range, and returns all dates/times within this range that meet the recurrence rule. For example:
```ruby
RRule::Rule
.new("FREQ=DAILY;INTERVAL=1", dtstart: Time.parse("2023-01-01 07:30:00 UTC"))
.between(Time.zone.now, Time.zone.now + 2.days)
# => [Sat, 14 Sep 2024 07:30:00.000000000 UTC +00:00, Sun, 15 Sep 2024 07:30:00.000000000 UTC +00:00]
```
However, if the time component of the first point provided to `.between()` is slightly ahead of the start date (e.g., `dtstart`), the first date/time returned by `RRule` can fall outside the specified range by the same subsecond amount. For instance:
```ruby
RRule::Rule
.new("FREQ=DAILY;INTERVAL=1", dtstart: Time.parse("2023-01-01 07:30:00 UTC"))
.between(Time.parse("2023-01-01 07:30:00.999 UTC"), Time.parse("2023-01-03 07:30:00 UTC"))
.first
# => Sun, 01 Jan 2023 07:30:00.000000000 UTC +00:00
```
Here, the start date/time given to `.between()` is 999 milliseconds after 07:30:00, but the first date returned is exactly 07:30:00 without the 999 milliseconds. This causes the next recurring date to fall into the past if the automation executes within a subsecond of the start time, leading to the automation stalling.
I'm not sure why `RRule` does this, but it seems intentional judging by the source of the `.between()` method:
b9911b7147/lib/rrule/rule.rb (L28-L32)
This commit fixes the issue by selecting the first date ahead of the current time from the list returned by `RRule`, rather than the first date directly.
Internal topic: t/138045.
We're seeing a problem where some recurring automations end up in a state where they don't have any `pending_automations` records scheduled which effectively makes the recurring automation dead. We need to add some debugging to figure out what might be causing this problem.
Internal topic: t/138045.
Some combinations of start_date and frequency/interval values can cause a recurring automation rule to either trigger before its start_date or never trigger. Example repros:
- Configure a recurring automation with hourly recurrence and a start_date several days ahead. What this will do is make the automation start running hourly immediately even though the start_date is several days ahead.
- Configure a recurring automation with a weekly recurrence and a start_date several weeks ahead. This will result in the automation never triggering even after the start_date.
These 2 scenarios share the same cause which is that the automation plugin doesn't use the start_date as the date for the first run and instead uses the frequency/interval values from the current time to calculate the first run date.
This PR fixes this bug by adding an explicit check for start_date and using it as the first run's date if it's ahead of the current time.
Prior to this fix, any change to an automation would reset `pending_automations`, now we only do it if any value related to recurrence (start_date, interval, frequency, execute_at...) has been changed.
It means that any trigger creating `pending_automations` now needs to manage them in the `on_update` callback.