Improve logfuncdensity behavior and add isdensity #9

Merged
merged 20 commits into from
Nov 16, 2021

oschulz
Collaborator

@oschulz oschulz commented Nov 10, 2021

Updated

Replaces hasdensity by DensityKind (#9)

Also changes behavior of logfuncdensity to always return a density and adds funcdensity.
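The behavior described above can be illustrated with a small self-contained sketch. This is a toy re-implementation, not the package's source: every `toy_`/`Toy` name is hypothetical and only mimics the idea that wrapping a function always yields a density object that can be evaluated on either scale.

```julia
# Toy sketch (hypothetical names), assuming a density wrapper simply stores a function.

# Wraps a function that returns log-density values.
struct ToyLogFuncDensity{F}
    log_f::F
end

# Wraps a function that returns (non-log) density values.
struct ToyFuncDensity{F}
    f::F
end

toy_logfuncdensity(log_f) = ToyLogFuncDensity(log_f)
toy_funcdensity(f) = ToyFuncDensity(f)

# Either wrapper can be evaluated on the log scale or on the linear scale.
toy_logdensityof(d::ToyLogFuncDensity, x) = d.log_f(x)
toy_logdensityof(d::ToyFuncDensity, x) = log(d.f(x))
toy_densityof(d::ToyLogFuncDensity, x) = exp(d.log_f(x))
toy_densityof(d::ToyFuncDensity, x) = d.f(x)

d_log = toy_logfuncdensity(x -> -x^2 / 2)    # unnormalized Gaussian, log scale
d_lin = toy_funcdensity(x -> exp(-x^2 / 2))  # same density, linear scale

@assert toy_logdensityof(d_log, 1.0) ≈ toy_logdensityof(d_lin, 1.0)
@assert toy_densityof(d_log, 1.0) ≈ toy_densityof(d_lin, 1.0)
```

The point of the sketch is only that both constructors return an object, so downstream code can treat the result uniformly as a density.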

@oschulz oschulz marked this pull request as draft November 10, 2021 17:19
@codecov

codecov bot commented Nov 10, 2021

Codecov Report

Merging #9 (e28096a) into master (ccade11) will not change coverage.
The diff coverage is 100.00%.

@@            Coverage Diff            @@
##            master        #9   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files            2         2           
  Lines           37        59   +22     
=========================================
+ Hits            37        59   +22     
Impacted Files Coverage Δ
src/interface.jl 100.00% <100.00%> (ø)
src/interface_test.jl 100.00% <100.00%> (ø)


@oschulz
Collaborator Author

oschulz commented Nov 10, 2021

Closes #5 (well, eventually ...)

@oschulz oschulz changed the title from "WIP Prototype adding measure theory concepts" to "[WIP] Prototype adding measure theory concepts" Nov 10, 2021
@cscherrer
Contributor

Great! Just a note, we'll need a similar PR for MeasureTheory in addition to the one for MeasureBase, to be sure this can extend naturally and all tests can pass. I'll try this out with Soss as well, that can help us check for a wider range of use cases.

Contributor

@phipsgabler phipsgabler left a comment

I now find the semantics of hasdensity and ismeasure a bit complicated for the casual user. And especially it ties things to measures by name; not that I'm against that for the special case, but it precludes other names a priori.

I know you have discussed the return type of hasdensity already, but what about something like IteratorEltype to cover three cases?

abstract type DensityType end
struct NoDensity <: DensityType end
struct HasDensity <: DensityType end
struct IsDensity <: DensityType end
struct IsMeasure <: DensityType end # can perhaps even be left out

DensityType(x) = DensityType(typeof(x))
DensityType(::Type) = NoDensity()
DensityType(::Union{FuncDensity, LogFuncDensity}) = IsDensity()
DensityType(::Union{Distribution, PPLModel}) = HasDensity()
DensityType(::AbstractMeasure) = IsMeasure() # can perhaps even be left out

# some name to cover both "has" and "is"...
isdensitylike(x) = !(DensityType(x) isa NoDensity)

In my opinion this is clearer than conditionals like hasdensity(object) && !ismeasure(object), and users outside measure theory never need to even see the term.
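The proposal above can be tried out in isolation. The following is a minimal sketch under stated assumptions: `ToyLogDensity`, `ToyModel`, and `describe` are hypothetical stand-ins (the real `FuncDensity`, `Distribution`, and `PPLModel` types are not available here), and the module wrapper exists only to keep the names from clashing with other snippets.

```julia
module DensityTypeSketch

abstract type DensityType end
struct NoDensity <: DensityType end
struct HasDensity <: DensityType end
struct IsDensity <: DensityType end

# Hypothetical stand-ins for real density-like objects.
struct ToyLogDensity{F}; log_f::F; end
struct ToyModel; scale::Float64; end

# Trait function: the default is NoDensity, specific types opt in.
DensityType(::Any) = NoDensity()
DensityType(::ToyLogDensity) = IsDensity()
DensityType(::ToyModel) = HasDensity()

# One name that covers both "has" and "is".
isdensitylike(x) = !(DensityType(x) isa NoDensity)

# Dispatch on the trait value instead of chaining Bool checks.
describe(x) = describe(DensityType(x), x)
describe(::IsDensity, x) = "is a density"
describe(::HasDensity, x) = "has a density"
describe(::NoDensity, x) = "not density-like"

end # module

using .DensityTypeSketch: ToyLogDensity, ToyModel, describe, isdensitylike

println(describe(ToyLogDensity(x -> -abs(x))))
println(describe(ToyModel(2.0)))
println(isdensitylike("just a string"))
```

Dispatching on the returned trait object keeps each case in its own method, which is the readability gain the comment argues for.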

@devmotion
Member

I know you have discussed the return type of hasdensity already, but what about something like IteratorEltype to cover three cases?

I like this since it would keep the API simpler and cleaner.

@oschulz
Collaborator Author

oschulz commented Nov 10, 2021

NoDensity <: DensityType, HasDensity <: DensityType, IsDensity <: DensityType, IsMeasure <: DensityType

I don't like that specific version so much, because that way IsMeasure doesn't imply HasDensity, which it should. But we could use abstract types and have a hierarchy among them. This might be the right approach, compared to the Bool functions, now that we need to differentiate between different kinds of things that have a density. There could be other things that can be said to have a density but aren't densities or measures themselves. In physics, for example, one might deal with objects that have a density (e.g. a mass density), and a physical object is neither a density nor a measure itself.

Using types would make this extensible by other packages. We should think a bit about possible implications first, though.
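One possible arrangement of such a hierarchy is sketched below. It is an assumption for illustration only: the thread does not fix the exact tree, and `PlainHasDensity` and `supports_density_eval` are hypothetical names.

```julia
module TraitHierarchySketch

abstract type DensityTrait end
struct NoDensity <: DensityTrait end

# Abstract node: anything that has an associated density.
abstract type HasDensity <: DensityTrait end
struct PlainHasDensity <: HasDensity end  # e.g. a physical object with a mass density
struct IsDensity <: HasDensity end        # the density itself
struct IsMeasure <: HasDensity end        # a measure, which implies having a density

# A method written for the abstract node covers all of its subtypes.
supports_density_eval(::HasDensity) = true
supports_density_eval(::DensityTrait) = false

end # module

using .TraitHierarchySketch: IsMeasure, NoDensity, supports_density_eval

@assert supports_density_eval(IsMeasure())
@assert !supports_density_eval(NoDensity())
```

Because the node types are abstract, another package could add its own leaf (a new `struct ... <: HasDensity`) without touching the existing methods.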

@cscherrer
Contributor

There could be other things that can be said to have a density but aren't densities or measures themselves. In Physics, for example, one might deal with objects that have a (e.g. mass) density, and a physical object is neither a density nor a measure itself.

I don't think this is really a counterexample. Mass density is a local ratio of mass to volume, both of which are measures. The physical object isn't a measure, but it has these measures associated with it, and this "density" is a proper density in the measure-theoretic sense.

@oschulz
Collaborator Author

oschulz commented Nov 10, 2021

The physical object isn't a measure, but it has these measures associated with it, and this "density" is a proper density in the measure-theoretic sense.

Yes, that's what I mean - we'd say the object has a proper density, so hasdensity(object) == true, but neither isdensity(object) == true nor ismeasure(object) == true would make sense.

@mschauer
Member

With a Volume and a CountingMeasure and a fallback (perhaps DynamicDensity) living in DensityInterface, we should encourage people to define basemeasure if it is known. That's just useful information, which might allow me to give you a meaningful error message explaining why your density doesn't work for my sampler... and it's known in most cases except dynamic probabilistic programming, so define

basemeasure(::MyPPLProgram) = DynamicDensity() 

and be done with it. One can wonder whether one of those makes a sensible fallback. It would be useful to provide density function wrappers that do this for you, say logfuncpdf and logfuncpmf.
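A rough sketch of this idea follows. None of it exists in DensityInterface: all `Toy`/`toy_` names are hypothetical, and the wrappers only mirror the suggested `logfuncpdf`/`logfuncpmf` naming.

```julia
# Hypothetical base-measure markers.
struct ToyLebesgue end        # stand-in for a volume (Lebesgue) base measure
struct ToyCounting end        # stand-in for a counting base measure
struct ToyDynamicDensity end  # "base measure not known statically"

# Wrapper that pairs a log-density function with a declared base measure.
struct ToyWrappedLogDensity{F,M}
    log_f::F
    base::M
end

toy_logfuncpdf(log_f) = ToyWrappedLogDensity(log_f, ToyLebesgue())  # continuous case
toy_logfuncpmf(log_f) = ToyWrappedLogDensity(log_f, ToyCounting())  # discrete case

# Fallback for objects that declare nothing, then the informative method.
toy_basemeasure(::Any) = ToyDynamicDensity()
toy_basemeasure(d::ToyWrappedLogDensity) = d.base

# A consumer can reject unsupported inputs with a meaningful message.
function toy_check_sampler_input(d)
    m = toy_basemeasure(d)
    m isa ToyLebesgue ||
        throw(ArgumentError("this sampler needs a Lebesgue base measure, got $(typeof(m))"))
    return true
end

@assert toy_check_sampler_input(toy_logfuncpdf(x -> -x^2 / 2))
@assert toy_basemeasure(toy_logfuncpmf(k -> -k)) isa ToyCounting
```

The fallback keeps every object usable, while the declared base measure gives consumers something concrete to check before running.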

@oschulz
Collaborator Author

oschulz commented Nov 10, 2021

With a Volume and a CountingMeasure

I would call it LebesgueMeasure or VolumeMeasure, not Volume - Volume is far too generic to be exported.

@mschauer
Member

There is a second thing I would like to add (sorry for upsetting the apple cart), but I think a signature basemeasure(d, x) is preferable, because it is conceivable that a package using the density interface has a simple rule where it can infer the type of density just by looking at, say, size(x) or typeof(x). You'd be free to ignore x there, of course.

@oschulz
Collaborator Author

oschulz commented Nov 10, 2021

There is a second thing I would like to add (sorry for upsetting the apple cart), but I think a signature basemeasure(d, x) is preferable

I don't think that would be upsetting at all. :-)

In fact, it may be quite natural, from a software design point of view: We say that densityof(object, x) will return the density value at x. But in a multiple dispatch language, what densityof does can of course be different for different types of x. So it would seem natural that the base measure can depend on x too, in theory even on the actual value, not just the type. (Though in almost all cases it should just depend on the type of x, not the value.)

Maybe this could also help with the ProductOver / IIDDensity construct, semantically? If the base density can depend on the argument, the PDF of ProductOver could be seen to be normalized. That way it could, maybe, be an actual product distribution instead of a density with complex semantics. Here, maybe we would want the base measure to depend on both the type and size(x).
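To make the argument-dependent variant concrete, here is a small sketch. It assumes a hypothetical IID standard-normal object and hypothetical function names; it is not the ProductOver / IIDDensity construct itself.

```julia
# Hypothetical Lebesgue measure on R^n, with n carried as a field.
struct ToyRealMeasure
    ndims::Int
end

# Hypothetical "product of standard normals of any length" object.
struct ToyIIDNormal end

# The base measure is chosen from the argument: a scalar or a vector of any length.
toy_basemeasure_at(::ToyIIDNormal, x::Real) = ToyRealMeasure(1)
toy_basemeasure_at(::ToyIIDNormal, x::AbstractVector{<:Real}) = ToyRealMeasure(length(x))

# Log-density, normalized with respect to that argument-dependent base measure.
toy_logdensity(::ToyIIDNormal, x::Real) = -x^2 / 2 - log(2π) / 2
toy_logdensity(d::ToyIIDNormal, x::AbstractVector{<:Real}) =
    sum(xi -> toy_logdensity(d, xi), x)

d = ToyIIDNormal()
@assert toy_basemeasure_at(d, 0.5) == ToyRealMeasure(1)
@assert toy_basemeasure_at(d, [0.1, 0.2, 0.3]) == ToyRealMeasure(3)
```

In this sketch only the type and length of `x` are inspected, never its values, which matches the "almost all cases" remark in the comment above.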

@oschulz
Collaborator Author

oschulz commented Nov 10, 2021

I have a question: Are primitive measures like VolumeMeasure singletons, or do they carry information like dimensionality? So is it VolumeMeasure() or VolumeMeasure(ndims).

I have a feeling that this is an important design decision: I imagine that it would be useful in many cases, practically, to be able to predict things like array sizes based on a base measure, and so on. On the other hand, do we always have dimensionality information available when we want/need to create a measure, especially a primitive one? If we go with @mschauer's suggestion of having basemeasure(o, x), I think we probably do.

I would tend towards VolumeMeasure(ndims).
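The practical difference between the two options can be sketched as follows. Both types and `toy_alloc_point` are hypothetical and serve only to contrast a singleton with a measure that stores its dimensionality.

```julia
# Variant 1: singleton, no dimensionality information.
struct ToyVolumeSingleton end

# Variant 2: dimensionality stored in the value.
struct ToyVolumeMeasure
    ndims::Int
end

# With variant 2, a consumer can allocate a correctly sized buffer up front.
toy_alloc_point(m::ToyVolumeMeasure) = zeros(Float64, m.ndims)

# With variant 1, the size has to come from elsewhere, e.g. an example point.
toy_alloc_point(::ToyVolumeSingleton, x::AbstractVector) = zeros(Float64, length(x))

@assert length(toy_alloc_point(ToyVolumeMeasure(4))) == 4
@assert length(toy_alloc_point(ToyVolumeSingleton(), [1.0, 2.0])) == 2
```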

@devmotion
Member

I'm sorry to crash the party here but I have serious doubts about integrating basemeasure and measures in DensityInterface. I think @phipsgabler's suggestion of a more fine-tuned trait system for densities can be useful, in particular in packages that work with explicit base measures such as MeasureTheory.

The measure-theoretical parts, however, seem to me like a reduced and more limited version of MeasureBase. I think it would be cleaner, and the interface and the number of measures would be less limited, if packages that want to define and work with explicit base measures just implemented the MeasureBase interface. Why add a reduced version of something to DensityInterface that already exists in another interface package? I would view MeasureBase basically as the measure-theoretical extension of DensityInterface with explicit base measures.

I tried to explain my concerns also in this thread.

@mschauer
Member

basemeasure(o, x) might at least be sufficient to deal with PPL dynamism as in Gen.jl, where each random variable has a named address/site and x would be a corresponding named tuple (correct me if I am wrong about Gen here).

@cscherrer
Contributor

Yes, that's what I mean - we'd say the object has a proper density, so hasdensity(object) == true, but neither isdensity(object) == true nor ismeasure(object) == true would make sense.

Isn't this just a matter of abbreviated language? The object may also have a charge or heat density. At some point, don't you end up saying "mass per unit volume"?


Ok, I have a proposal:

  1. A new package, MeasureInterface, depending on DensityInterface, and exporting basemeasure and some other very minimal primitives
  2. A new DistributionMeasures, depending on MeasureInterface, DensityInterface, and Distributions. This can mostly focus on defining basemeasure for various Distributions

Then

  • MeasureBase can depend on MeasureInterface and DensityInterface
  • MeasureTheory can (at least for now) depend on all packages discussed here, possibly refactored later.

An advantage of this is that we can easily make sure Distributions has basemeasure covered (we can have tests for this), and users who want to can easily add basemeasure methods without the requirement to depend on MeasureTheory. For those packages (assuming basemeasure is implemented well) MeasureTheory should "just work"

@devmotion
Member

devmotion commented Nov 10, 2021

Ok, I have a proposal:

What about:

  • Switching to @phipsgabler's traits
  • Keeping basemeasure and measures in MeasureBase
  • Moving Distributions-specific definitions of basemeasure etc. from MeasureBase/Theory to Distributions

Alternatively, if MeasureBase is not lightweight enough one could extract a more lightweight MeasuresInterface package. But I assume it might not be worth it since, as far as I understand, MeasureBase is already a reduced base package and hence I fear breaking it up might lead to a somewhat incomplete package.

The main principle would be similar to ChainRules:
Packages and users that want to work with base measures etc. have to depend on MeasureBase and should define basemeasure etc.

@oschulz
Collaborator Author

oschulz commented Nov 10, 2021

A new package, MeasureInterface, depending on DensityInterface, and exporting basemeasure and some other very minimal primitives

Sounds good to me. Should we maybe prototype that stuff here in this PR first, though, so we have all the lightweight stuff in context, and then split up? I expect the line where to split will be obvious when the design is done.

A new DistributionMeasures, depending on MeasureInterface, DensityInterface, and Distributions. This can mostly focus on defining basemeasure for various Distributions

I would actually favour having that in Distributions. As long as it's not a lot of code, Distributions can take it, it's not exactly a small package. And having it in Distributions will encourage third-party dist developers to define their base measure. Also, when new dists are added to Distributions, we won't always need to do a separate PR to DistributionMeasures if it was integrated.

@devmotion
Member

The problem that I see with @cscherrer's proposal is the amount of type piracy in e.g. DistributionsMeasures it would create and the overall dependency structure. I would prefer if one just has to depend on MeasureBase (or a more lightweight interface package).

@oschulz
Collaborator Author

oschulz commented Nov 10, 2021

What about ... Switching to @phipsgabler's traits

I agree, it's more flexible that way. I'll add it to this PR, then we can take a look at it in practice.

@oschulz
Collaborator Author

oschulz commented Nov 10, 2021

The problem that I see with @cscherrer's proposal is the amount of type piracy in e.g. DistributionsMeasures it would create and the overall dependency structure. I would prefer if one just has to depend on MeasureBase (or a more lightweight interface package).

I agree - I guess that more lightweight package (MeasureBase is too heavy I think) would be @cscherrer's proposed MeasureInterface (I like the name). Distributions would depend on it and dist-specific measures (like for Dirichlet) would live in Distributions, right next to "their" distribution.

@cscherrer
Contributor

I like the general idea of @phipsgabler's suggestion a lot; we just need some time to think through how this will interface with MT, especially in terms of basemeasure.

* Keeping `basemeasure` and measures in MeasureBase

* Moving Distributions-specific definitions of `basemeasure` etc. from MeasureBase/Theory to Distributions

So Distributions would depend on MeasureBase?

Alternatively, if MeasureBase is not lightweight enough one could extract a more lightweight MeasuresInterface package. But I assume it might not be worth it since, as far as I understand, MeasureBase is already a reduced base package and hence I fear breaking it up might lead to a somewhat incomplete package.

Our compat section is currently

[compat]
ConcreteStructs = "0.2"
ConstructionBase = "1.3"
FillArrays = "0.12"
KeywordCalls = "0.2"
LogExpFunctions = "0.3"
MLStyle = "0.4"
MappedArrays = "0.4"
PrettyPrinting = "0.3"
Tricks = "0.1"
julia = "1.3"

We could probably pare this down some more:

  • ConcreteStructs should be easy to get rid of
  • PrettyPrinting turns out to be really nice, but we can make this a MeasureTheory-only thing if that helps
  • I'll need to check again what we're using MLStyle for, I suspect we can get by without it

The main principle would be similar to ChainRules: Packages and users that want to work with base measures etc. have to depend on MeasureBase and should define basemeasure etc.

Right, that was the original intent of MeasureBase.

The problem that I see with @cscherrer's proposal is the amount of type piracy in e.g. DistributionsMeasures it would create and the overall dependency structure. I would prefer if one just has to depend on MeasureBase (or a more lightweight interface package).

Great catch, thanks

I agree - I guess that more lightweight package (MeasureBase is too heavy I think) would be @cscherrer's proposed MeasureInterface (I like the name). Distributions would depend on it and dist-specific measures (like for Dirichlet) would live in Distributions, right next to "their" distribution.

I'm a little confused now. If Distributions will depend on MeasureInterface, how is that different than just adding basemeasure to DensityInterface?

@oschulz
Collaborator Author

oschulz commented Nov 10, 2021

I agree with @devmotion - in the end, a lightweight measure-oriented package extending DensityInterface will be more flexible than adding too much measure theory to DensityInterface itself.

I like @cscherrer's proposed name MeasureInterface. I would see it living in JuliaMath, eventually.

For convenience though (so we don't have to hop between too many different PRs) and to see how everything will mesh, it might be easiest to prototype this lightweight measures interface within this PR initially (and mark parts of it with a comment) to see how it all fits together.

@phipsgabler
Contributor

Great. I still like densitykind more than densitytype, though.

@oschulz
Collaborator Author

oschulz commented Nov 16, 2021

Great. I still like densitykind more than densitytype, though.

I agree, "type" could be misleading here, I think.

@devmotion
Member

I don't mind 🤷‍♂️ If there's an abstract supertype DensityKind I guess it would be reasonable to use the camel case DensityKind(x) as function name.

@oschulz
Collaborator Author

oschulz commented Nov 16, 2021

How about we make DensityKind a union too, for now? That locks it, and we can always open it up for subtypes later if needed. Edit: If we use DensityKind(x), using an abstract supertype may be cleaner.
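The trade-off mentioned here can be sketched with toy code. This is an illustration under assumptions, not the merged implementation: `ToyMeasure` and `ToyIsMeasure` are hypothetical, and the module only mimics the `DensityKind(x)` calling style discussed in the thread. A `Union` is closed once defined, whereas an abstract supertype lets other code add kinds later.

```julia
module DensityKindSketch

abstract type DensityKind end
struct IsDensity <: DensityKind end
struct HasDensity <: DensityKind end
struct NoDensity <: DensityKind end

# Constructor-style trait function with a conservative default.
DensityKind(::Any) = NoDensity()

end # module

# "Downstream" code: a hypothetical measure type and a new kind for it.
struct ToyMeasure end
struct ToyIsMeasure <: DensityKindSketch.DensityKind end
DensityKindSketch.DensityKind(::ToyMeasure) = ToyIsMeasure()

@assert DensityKindSketch.DensityKind(ToyMeasure()) isa ToyIsMeasure
@assert DensityKindSketch.DensityKind(42) isa DensityKindSketch.NoDensity
```

With `const DensityKind = Union{...}` instead, the `struct ToyIsMeasure <: ...` line would not be possible, which is the "locks it" effect described above.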

@oschulz
Collaborator Author

oschulz commented Nov 16, 2021

Ok, we have DensityKind(object) now. @phipsgabler, are you happy with this? @devmotion is this good to merge from your side?

We should squash when we merge, this PR's commit history is very ugly.

Member

@devmotion devmotion left a comment

It seems you forgot to actually define them as subtypes 🙂

@devmotion
Member

We should squash when we merge, this PR's commit history is very ugly.

👍 Generally, I prefer if PRs are squashed and not all commits end up in the default branch - maybe we could make it the default (and possibly disable other options)?

oschulz and others added 3 commits November 16, 2021 16:42
@oschulz
Collaborator Author

oschulz commented Nov 16, 2021

maybe we could make it the default (and possibly disable other options)

I think in some cases a merge commit can make sense, though (but not in this one!).

@oschulz
Collaborator Author

oschulz commented Nov 16, 2021

Thanks for the fixes @devmotion! Good the way it is now?

Member

@devmotion devmotion left a comment

Looks good, just some minor comment regarding == and ===.
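The review comments themselves are not reproduced in this thread, so the following is only a general illustration of how `==`, `===`, and `isa` behave on field-less trait values. All names are hypothetical.

```julia
abstract type ToyKind end
struct ToyIsDensity <: ToyKind end
struct ToyNoDensity <: ToyKind end

toy_kind(::Any) = ToyNoDensity()
toy_kind(::Function) = ToyIsDensity()   # arbitrary choice, only for this demo

k = toy_kind(sin)

# Field-less structs are singletons, so all three checks agree here.
@assert k == ToyIsDensity()    # generic equality, falls back to === by default
@assert k === ToyIsDensity()   # identity check, cannot be overloaded
@assert k isa ToyIsDensity     # type check
```

For singleton trait values, `===` or `isa` states the intent most directly, since no user-defined `==` method can change the result.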

"""
IsOrHasDensity = Union{IsDensity, HasDensity}

As a return value of [`DensityKind(object)`](@ref), indicates that `object`
Member

Should this ever be a return value of DensityKind?
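The question can be illustrated with a toy version of the trait. In this sketch, which uses hypothetical opt-in choices and a hypothetical `can_evaluate` helper, the union only ever appears on the matching side (`isa` or a method signature); the trait function always returns one of the concrete kinds.

```julia
module IsOrHasSketch

abstract type DensityKind end
struct IsDensity <: DensityKind end
struct HasDensity <: DensityKind end
struct NoDensity <: DensityKind end
const IsOrHasDensity = Union{IsDensity, HasDensity}

DensityKind(::Any) = NoDensity()
DensityKind(::Function) = IsDensity()        # demo choice only
DensityKind(::AbstractDict) = HasDensity()   # demo choice only

# The union is matched against returned values; it is never returned itself.
can_evaluate(x) = DensityKind(x) isa IsOrHasDensity

end # module

using .IsOrHasSketch: can_evaluate

@assert can_evaluate(sin)
@assert can_evaluate(Dict(:a => 1))
@assert !can_evaluate(3)
```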

oschulz and others added 5 commits November 16, 2021 17:13
@oschulz
Collaborator Author

oschulz commented Nov 16, 2021

Thanks for the equality fixes (don't be surprised by the multiple commits; for some reason GitHub didn't let me batch them).

@oschulz
Collaborator Author

oschulz commented Nov 16, 2021

Increased version number to v0.4.0, since this is breaking (I volunteer to take care of the PR to update Distributions).

Member

@devmotion devmotion left a comment

Looks good to me, thanks!

@oschulz
Collaborator Author

oschulz commented Nov 16, 2021

Thanks everyone!

@oschulz oschulz merged commit e71eb2d into master Nov 16, 2021
@oschulz oschulz deleted the measures branch November 16, 2021 16:46
@oschulz
Collaborator Author

oschulz commented Nov 16, 2021

Here's the update for Distributions: JuliaStats/Distributions.jl#1427

5 participants