Add scheduled workflow to update database by Rylan12 · Pull Request #74 · Homebrew/homebrew-command-not-found · GitHub
This repository was archived by the owner on Sep 23, 2025. It is now read-only.

Conversation

@Rylan12
Member

@Rylan12 Rylan12 commented Apr 14, 2021

This PR attempts to create a scheduled workflow to update the database using brew which-update.

It adds a new --install-missing option to which-update which will install missing formulae (i.e. those that are bottle :unneeded) that wouldn't otherwise be updated.

For now, I decided to run this on both macOS and Linux, once a day on each. It's probably overkill to run it on both, but some formulae that are only available on one platform would otherwise be missed.

Edit: I realized that we probably don't want two jobs doing mostly the same thing at the same time, because they would both push changes. For now, I changed it to run only on macOS. I think this will cover most cases and, as always, Linux-only formulae can be handled manually if needed. If automating on Linux is still desired, let me know and we can brainstorm how to solve the problem.

I also set up commit signing so that these commits are signed by BrewTestBot.

See Homebrew/brew#11137

@Rylan12
Member Author

Rylan12 commented Apr 15, 2021

Something else that I've noticed is that some of the existing entries are outdated. I found this out because running brew which-update locally updated some of the entries for formulae I have installed. In most cases, this probably isn't a huge deal but it may be worth considering how to handle these in a reasonable way.

I'm not sure how long it would take to build the entire database from scratch (I'm assuming a while), but I wonder if we want to rebuild it completely, maybe once a month. Alternatively, we could run more frequent updates that each check only some of the formulae (e.g. once a week, checking ~25% of the formulae each time, probably alphabetically). That way the database stays up to date without requiring a huge job every time.
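The rotating partial update could look something like this minimal Ruby sketch. It assumes formulae are split into four alphabetical chunks and that a weekly job uses the ISO week number (`Date#cweek`) to pick a chunk; `chunk_for_run` is a hypothetical helper, not part of `which-update`.

```ruby
require "date"

# Hypothetical helper (not part of which-update): split the formula list into
# `parts` alphabetical chunks and return the chunk for a given run index.
def chunk_for_run(formula_names, run_index, parts: 4)
  sorted = formula_names.sort
  return [] if sorted.empty?

  # Ceiling division so every formula lands in exactly one chunk.
  size = (sorted.length + parts - 1) / parts
  chunks = sorted.each_slice(size).to_a
  # Small lists can yield fewer than `parts` chunks, so wrap on the real count.
  chunks[run_index % chunks.length]
end

# A weekly job could use the ISO week number as the run index, cycling
# through all four chunks roughly once a month.
names = %w[wget git hello zsh ack curl jq node]
puts chunk_for_run(names, Date.today.cweek).inspect
```

Because week numbers don't divide evenly into a year, the rotation occasionally repeats a chunk; that seems acceptable for a "roughly monthly" full refresh.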

@bfontaine
Contributor

Thank you a lot! ❤️

One way to keep the database up to date would be to have some kind of hook on homebrew-core PRs and update only the formulæ modified by these PR(s).
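As a rough illustration of the PR-hook idea, a hook would first need to map the files a PR touched to formula names. This sketch assumes the changed paths are already available (for example from `git diff --name-only`) and that formulae live under `Formula/`, either flat or in single-letter shard directories; `changed_formulae` is a hypothetical helper.

```ruby
# Hypothetical helper: given file paths changed by a homebrew-core PR,
# return the names of the formulae whose entries should be refreshed.
# Accepts both Formula/<name>.rb and Formula/<shard>/<name>.rb.
FORMULA_PATH = %r{\AFormula/(?:[^/]+/)?(?<name>[^/]+)\.rb\z}

def changed_formulae(changed_paths)
  changed_paths.filter_map do |path|
    match = FORMULA_PATH.match(path)
    match && match[:name]
  end.uniq.sort
end

paths = [
  "Formula/hello.rb",
  "Formula/w/wget.rb",
  ".github/workflows/tests.yml",
  "Aliases/python3",
]
p changed_formulae(paths) # => ["hello", "wget"]
```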

@Rylan12
Member Author

Rylan12 commented Apr 15, 2021

One way to keep the database up to date would be to have some kind of hook on homebrew-core PRs and update only the formulæ modified by these PR(s).

Yeah, that could be an option. To be honest, it would be pretty simple to add it to the existing test workflow. I'd be a little hesitant to do that security-wise, though, since keeping the repos separate (especially when one can push to the other) seems like a good idea. Let's ask some other maintainers for their thoughts.

Comment on lines 51 to 57
    if install_missing
      (Formula.core_names - db.formula_names).each do |formula|
        ohai "Installing #{formula}"
        system HOMEBREW_BREW_FILE, "install", "--formula", formula
      end
      db.update!
    end
Member

How much disk space does this use?

Member Author

Depends on how many are missing, tbh. Currently, there are 55. The idea is that there will only ever be a handful when this runs (i.e. any new formulae added that day), so installing them all won't be a big deal.

Member

@Bo98 Bo98 Apr 15, 2021

OK, I misread how this worked. Do we ever update the entries for existing formulae? The binaries shipped can change from version to version.

Member Author

That's the issue I brought up above. Currently, no.

@Rylan12
Member Author

Rylan12 commented Apr 15, 2021

An alternative suggested by @carlocab would be to host the executables.txt file in the homebrew/core tap instead of here. That way, it wouldn't be a big deal to update the file whenever a homebrew/core PR is merged, and we might be able to support third-party taps as well (if they would like). Another benefit is that the "point of truth" for what a tap's formulae contain would live in that tap rather than in this repo.

Thoughts on this, @bfontaine?

@bfontaine
Contributor

👍 This is a great idea!

@Rylan12
Member Author

Rylan12 commented Apr 16, 2021

Thinking about this a little more, I think I'm a tad hesitant to move this into Homebrew/core. This would add complexity and potential slowdowns to the homebrew/core repo for a feature that exists as a totally separate entity. I wonder if it's still better to handle the automatic updates in this repo. There might be other ways around this (e.g. detecting which formulae have been updated and only working on those). Thoughts @carlocab or @alebcay? (just moving our Slack discussion to here)

@Bo98
Member

Bo98 commented Apr 16, 2021

Store the pkg_version in the database?

@Rylan12
Member Author

Rylan12 commented Apr 16, 2021

Store the pkg_version in the database?

Good idea, thanks. I think that probably makes more sense. That way we can regularly update but still keep this separate from homebrew/core. I'll spend some time on that today.
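Storing the version is what lets a scheduled job tell which entries are stale. Here is a minimal Ruby sketch of one database line, assuming the format described at the end of this thread (`hello(2.10):hello`), that older versionless entries look like `hello:hello`, and that multiple executables are separated by spaces. `Entry`, `parse_entry`, `format_entry` and `stale?` are hypothetical helpers, not the tap's actual code.

```ruby
# One database line: "name(version):exe1 exe2 ...", version optional.
Entry = Struct.new(:name, :version, :executables)

LINE = /\A(?<name>[^(:]+)(?:\((?<version>[^)]*)\))?:(?<exes>.*)\z/

def parse_entry(line)
  m = LINE.match(line.chomp)
  raise ArgumentError, "bad line: #{line.inspect}" unless m

  # A missing "(version)" group yields nil for the version.
  Entry.new(m[:name], m[:version], m[:exes].split)
end

def format_entry(entry)
  version = entry.version ? "(#{entry.version})" : ""
  "#{entry.name}#{version}:#{entry.executables.join(" ")}"
end

# An entry needs refreshing when its stored version differs from the
# current pkg_version; entries without a stored version are always stale.
def stale?(entry, current_pkg_version)
  entry.version != current_pkg_version
end
```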

@carlocab
Member

This would add complexity and potential slowdowns to the homebrew/core repo for a feature that exists as a totally separate entity.

How much slower would it make running a workflow? I imagine we could probably run it as a separate workflow once a day on macos-latest.

@Rylan12
Member Author

Rylan12 commented Apr 16, 2021

How much slower would it make running a workflow? I imagine we could probably run it as a separate workflow once a day on macos-latest.

Yeah, we could. But at that point, why bother putting it in homebrew/core at all when the same can be done here, keeping all of the command-not-found parts in this tap?

@carlocab
Member

I guess third-party taps being able to do it would be nice, but I suppose given there's not much demand for the feature we don't need to do it. Fine with doing it here.

@Bo98
Member

Bo98 commented Apr 17, 2021

On a related note, something I've always thought would be nice is a database of files installed for each formula, like what Debian and Alpine do. Though I'm probably alone in that.

Would be useful for finding conflicts and duplicated dependencies.
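To illustrate the conflict-finding use: with a per-formula file list, inverting the map shows every path claimed by more than one formula. The data and the `conflicts` helper below are hypothetical; no such database exists in this tap.

```ruby
# Hypothetical per-formula file lists, as a files database might store them.
# Returns path => sorted list of formulae, for paths owned by 2+ formulae.
def conflicts(files_by_formula)
  owners = Hash.new { |hash, path| hash[path] = [] }
  files_by_formula.each do |formula, paths|
    paths.each { |path| owners[path] << formula }
  end
  owners.select { |_path, formulae| formulae.length > 1 }
        .transform_values(&:sort)
end

db = {
  "foo" => ["bin/tool", "share/man/man1/tool.1"],
  "bar" => ["bin/tool", "bin/bar"],
  "baz" => ["bin/baz"],
}
conflicts(db).each { |path, formulae| puts "#{path}: #{formulae.join(", ")}" }
# prints "bin/tool: bar, foo"
```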

@Rylan12
Member Author

Rylan12 commented Apr 17, 2021

Yeah, I could see the value in that. I'm not sure this is really the right place for it, though, as it's not directly related to the rest of what command-not-found does.

Although, command-not-found could certainly be used as a starting point for something like this as it already has logic for checking bottles to create a database.

@bfontaine
Contributor

You’re not alone: #37 😉

@Rylan12
Member Author

Rylan12 commented Apr 17, 2021

Unless there are any major concerns, my plan is to merge this later today when I can monitor the workflow run (and quickly revert if needed).

@Rylan12 Rylan12 merged commit 338f092 into Homebrew:master Apr 17, 2021
@Rylan12 Rylan12 deleted the automation branch April 17, 2021 14:36
@Rylan12
Member Author

Rylan12 commented Apr 20, 2021

@bfontaine I think the automation is now in a stable state, so here's a summary of what I've done:

  • I've added pkg_versions to the database. Instead of just storing hello:hello, we now store hello(2.10):hello. brew which-formula still works if those versions aren't present.
  • I've also added three new flags to brew which-update:
    • --update-existing: when passed, which-update will download and check any formulae whose latest version in homebrew/core doesn't match the version in the database. Without this flag, only formulae that aren't in the database yet are downloaded.
    • --install-missing: when passed, which-update will attempt to install any bottle :unneeded formulae that are missing (or outdated if --update-existing is passed) to check them. If an install fails, the command ignores the failure and just continues. This is mainly an option to make CI cleaner.
    • --max-downloads: if set, this limits the number of formulae to download/install during a run. If no number is specified, there's no limit. This was added mainly for CI: when trying to update the entire database, the runner ran out of storage, so this flag lets the large job run in smaller chunks. It probably won't be used much now that the initial database is updated, but it would be helpful for any future mass updates. Note that failed installs do count toward the maximum, because sometimes an install fails after it has already started downloading things (e.g. a dependency fails to install).
  • I've added a workflow that automatically runs brew which-update --commit --update-existing --install-missing and pushes the commit to master once a day. By default, no maximum number of downloads is set.
    • This workflow can also be triggered manually by going to the Actions tab, clicking Scheduled database updates, clicking the Run workflow dropdown, entering an optional maximum number of downloads (can be left blank), and clicking Run workflow.
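The way the three flags combine can be sketched roughly as follows. This is a hypothetical illustration of the selection logic described above, not which-update's real implementation: `process`, the `core`/`db` hashes and the block standing in for a download or install are all assumptions.

```ruby
# Hypothetical sketch, not which-update's real code.
# core: formula name => current pkg_version in homebrew/core
# db:   formula name => version stored in the database (nil if none stored)
# The block stands in for downloading a bottle or, with --install-missing,
# installing the formula; it returns true on success.
def process(core, db, update_existing:, max_downloads: nil, &fetch)
  missing = core.keys - db.keys
  outdated = if update_existing
    core.keys.select { |name| db.key?(name) && db[name] != core[name] }
  else
    []
  end
  queue = (missing + outdated).sort

  attempts = 0
  updated = []
  queue.each do |name|
    # No limit when max_downloads is nil (the flag was not given).
    break if max_downloads && attempts >= max_downloads

    # Failed attempts count toward the limit too, since a failed install
    # may already have downloaded dependencies.
    attempts += 1
    # A failure is simply skipped and the loop moves on.
    updated << name if fetch.call(name)
  end
  updated
end

core = { "ack" => "3.4.0", "hello" => "2.10", "jq" => "1.6", "zsh" => "5.8" }
db = { "ack" => "3.4.0", "hello" => nil, "zsh" => "5.7" }
# "hello" fails but uses one of the two attempts, so "zsh" is never reached.
result = process(core, db, update_existing: true, max_downloads: 2) { |name| name != "hello" }
p result # => ["jq"]
```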

In general, there shouldn't be much manual intervention needed. I think the only reasons would be: a weird case comes up and the automated which-update call fails (you might need to update the database locally in that case), there are too many updates for one run (in which case you can run it manually with a maximum download number), or you're testing a change to the workflow.

If you'd like to test a change to the workflow/command, this seems to be the best way to do so:

  • Open a PR from a branch within this repo (i.e. not from a fork) making the changes you need
    • A tip: if you want to see the output but not actually push the changes to master, you can comment out the last step in the workflow and add some debugging messages
  • Test the workflow by running it manually and choosing the PR branch as the branch to run from
    • This will run the workflow and use the command from the PR branch, but make changes and push to executables.txt on the master branch
    • Also note that this workflow will not be run automatically when you open a PR (so no changes are pushed to master yet)
  • When you're ready, merge the PR.
  • All pushes to master that modify the workflow file will automatically trigger the workflow to run, so merging a PR that modifies the workflow file will trigger a rerun. Otherwise, you can always run it manually.

@bfontaine
Contributor

Thank you @Rylan12 for all your work here and the detailed explanation!

@github-actions github-actions bot locked as resolved and limited conversation to collaborators May 22, 2021