Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

write >> as infix notation for OptimiserChain #139

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

mcabbott
Copy link
Member

@mcabbott mcabbott commented Apr 6, 2023

This lets you chain optimisation rules like setup(ClipGrad(1.0) => WeightDecay() => Descent(0.1), model). It's just notation & printing, there is no functional change to OptimiserChain.

Edit: now uses >> instead of =>.

@CarloLucibello
Copy link
Member

What about using |> instead?

@mcabbott
Copy link
Member Author

mcabbott commented Apr 6, 2023

That's possible. But x |> f normally returns a value... the operation seems more like f ∘ g in that it returns another object like f. In fact Chain is almost exactly , just written left-to-right.

@CarloLucibello
Copy link
Member

f |> g would be the natural candidate for the reversed composition operator. Also, making r1 => r2 not return a Pair is technically breaking, while defining

(r2::AbstractRule)(r1::AbstractRule) = OptimiserChain(r1, r2)

would only break piratical (and unlikely) similar definitions.

@ToucheSir
Copy link
Member

ToucheSir commented Apr 10, 2023

Since chaining rules is so close to f ∘ g as mentioned, can we just overload that operator? Unlike => which is just an alias for Pair and |> which has well-defined (and different) semantics, we are basically composing rules here.

@mcabbott
Copy link
Member Author

mcabbott commented Apr 10, 2023

One problem with is that the order is backwards. OptimiserChain matches Flux.Chain in putting the first operation leftmost.

We could literally use Flux.Chain by moving the struct Chain here, to be shorter. But maybe this is confusing. Rules are a lot like functions which take dx and return modified dx, but not exactly, whereas layers really are.

@ToucheSir
Copy link
Member

ToucheSir commented Apr 10, 2023

I personally don't think asking users to write the order of rules "backwards" is a big deal, but that might just be me. Suggesting OptimiserChain be aliased to something short (e.g. Optax has chain) could be a docs-only solution. If we're ok with unicode, then I don't think \rightarrow and co are bound to anything. Ideally we could find an operator like * which parses to op(a, b, c, ...) so that only one call to OptimiserChain is required.

@mcabbott
Copy link
Member Author

The only other vararg infix operator is ++. Although it isn't a big deal to digest things like :(1 => 2 => 3), indeed :(sin ∘ cos ∘ tan) is binary but prints as if it's not.

@ToucheSir
Copy link
Member

Yes, finding a good operator is the higher priority.

@CarloLucibello
Copy link
Member

I still think |> makes the most sense, it means composition left to right although it is "sourced" by the initial argument in the chain. As an alternative, also \rightarrow (can be typed also \to) is fine.

@darsnack
Copy link
Member

We discussed this during the call. In general, all the options outlined here don't seem worth it. There is very little value-add, and we are either breaking/bending existing semantics or unnecessarily using up an infix operator (the latter being poor form given that Julia is not permissive about these).

There are already workable solutions for users: use the new import ... as ... syntax or use const ... = OptimiserChain (the latter works for infix operators like \rightarrow mentioned above).

Here's some additional context for the current naming: #9 (comment) (which also goes into stuff discussed here like function composition being a bit confusing).

One option we settled on is to have an un-exported verb: Optimisers.chain. This will just make the OptimiserChain for you, but it has the benefit of not clashing with Flux.Chain if someone wants to explicitly import it. Given the generality of the name, we will not export it by default.

@mcabbott
Copy link
Member Author

Building in two distinct names to allow someone to write using Optimisers: chain and afterwards save a few chars doesn't seem great.

The nice thing about infix syntax like WeightDecay() => Descent(0.1) is that it also saves one level of annoying nested brackets. In both typing and printing.

breaking

I am very dubious that one case of Pair(AbstractRule, AbstractRule) exists in the wild. Nevertheless we just missed a golden opportunity to roll this into the 0.3 change.

don't think asking users to write the order of rules "backwards" is a big deal

I do think this is quite confusing. (In fact I remain a little confused why AdamW seems to be backwards, but that's another topic.)

unnecessarily using up an infix operator

This objection is to defining a currently undefined operator, like ++ or \rightarrow. It's awkward when two packages use the same one.

It does not apply to abusing other built-in infix operators. We could use say >>? This has the advantage that, unlike |> and , it does not have a confusing nearby pre-existing meaning.

@ToucheSir
Copy link
Member

(In fact I remain a little confused why AdamW seems to be backwards, but that's another topic.)

That's #46 (comment) and FluxML/Flux.jl#1612. PyTorch inexplicably chooses to do their own thing, and despite reading pytorch/pytorch#21250 (comment) multiple times I still don't understand why.

@mcabbott mcabbott changed the title write => for OptimiserChain write >> as infix notation for OptimiserChain Feb 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants