Implement the sampler interface? #143

If we implement the `Random.Sampler` interface, then we can draw IID samples from the distributions: https://docs.julialang.org/en/v1/stdlib/Random/#Generating-random-values-of-custom-types. Distributions tends to make scalar and array distributions return `Array`s, but should we instead return a vector of draws, e.g. a vector of arrays?
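For reference, here is a minimal sketch of the hook that interface provides, following the pattern in the linked docs; `MyExponential` is purely illustrative, not a MeasureTheory type:

```julia
using Random

# Illustrative stand-in for a scalar measure (not from MeasureTheory).
struct MyExponential
    λ::Float64
end

# Element type of draws, used when rand builds arrays of samples.
Base.eltype(::Type{MyExponential}) = Float64

# Defining this single method gives us rand(d), rand(rng, d), and
# rand(d, n) through the Random fallbacks.
Base.rand(rng::AbstractRNG, d::Random.SamplerTrivial{MyExponential}) =
    randexp(rng) / d[].λ
```

With that, `rand(MyExponential(2.0), 5)` returns a `Vector{Float64}` of IID draws; the question above is what the analogous call should return for multivariate measures.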
Yes, for multivariate distributions we should default to vector-of-`Vector` (respectively vector-of-`SVector`) draws, but with the feature to allow […]
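To make the two layouts concrete (a sketch only, not current MeasureTheory behavior):

```julia
using StaticArrays

# Vector of Vectors: flexible, but each inner Vector is a separate heap allocation.
draws_vv = [rand(3) for _ in 1:10]                    # Vector{Vector{Float64}}

# Vector of SVectors: isbits elements stored inline, one contiguous block.
draws_sv = [rand(SVector{3,Float64}) for _ in 1:10]   # Vector{SVector{3,Float64}}
```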
Offhand I see a few considerations worth... considering. For scalar distributions, there are big advantages in MeasureTheory over Distributions for log-density computations. But for random sampling, this is less clear; for now, a lot of this goes through Distributions.

As for the multivariate case, I think we can have the contiguous-memory benefits of the Distributions approach without the weird semantics. In TupleVectors, you can do

```julia
julia> x = TupleVectors.chainvec(randn(3), 10)
10-element ArraysOfArrays.ArrayOfSimilarArrays{Float64, 1, 1, 2, ElasticArrays.ElasticMatrix{Float64, Vector{Float64}}}:
[0.2543955996319798, 0.5657592060457689, -1.4845945938531642]
[0.0, 0.0, 0.0]
[0.0, 0.0, 0.0]
[0.0, 0.0, 0.0]
[0.0, 0.0, 0.0]
[0.0, 0.0, 0.0]
[0.0, 0.0, 0.0]
[0.0, 0.0, 0.0]
[0.0, 0.0, 0.0]
[0.0, 0.0, 0.0]
julia> x.data
3×10 ElasticArrays.ElasticMatrix{Float64, Vector{Float64}}:
0.254396 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.565759 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
-1.48459 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
```

So in terms of the memory layout it's the same as Distributions, but we can still do e.g.

```julia
julia> x[3]
3-element view(::ElasticArrays.ElasticMatrix{Float64, Vector{Float64}}, :, 3) with eltype Float64:
0.0
0.0
0.0
```

This also allows for preallocation, meaning we need low-level functionality for `rand!`.

@mschauer, by […]?
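Since each `x[i]` is a view into the underlying matrix, in-place sampling could write straight into it; a sketch, with `randn!` standing in for whatever low-level sampling function we'd expose (hypothetical usage, not existing MeasureTheory API):

```julia
using Random, TupleVectors

x = TupleVectors.chainvec(randn(3), 10)  # slot 1 filled, slots 2..10 zeroed
for i in 2:10
    randn!(x[i])   # writes column i of x.data in place, no per-draw allocation
end
```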
Probably not. Distributions.jl implements samplers for at least some distributions that can benefit from it, so for those distributions at least, we could just call their samplers. See https://github.com/JuliaStats/Distributions.jl/tree/master/src/samplers
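For example, `sampler` returns an object that amortizes any setup cost over repeated draws (Gamma chosen arbitrarily here):

```julia
using Distributions, Random

rng = Random.default_rng()
d = Gamma(2.0, 3.0)

# sampler(d) picks a specialized algorithm from src/samplers when one exists.
s = sampler(d)
xs = [rand(rng, s) for _ in 1:10_000]
```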
How can we do something like this without making TupleVectors a dependency?
Great! We were discussing having that in #129 anyway. The challenge we have relative to Distributions is the potential for combinatorial explosion. For example, we might have 3 parameterizations of a distribution, each with its own most efficient way to draw samples. Now if we need to implement 3 […]
👍
The `chainvec` method is no longer specific to chains, so it should be renamed anyway.
Right, there might be one parameterization that's most efficient for random sampling, and we could have the `Sampler` convert to that. Also, affine transformations will cover lots of cases, just as you only need to sample a standard normal to get draws from any normal.
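A sketch of that conversion idea, with hypothetical parameterizations (none of these types are MeasureTheory's; the `Random.Sampler` hook is from the stdlib docs):

```julia
using Random

# Hypothetical parameterizations, for illustration only.
struct MeanScaleNormal
    μ::Float64
    σ::Float64
end

struct MeanPrecisionNormal
    μ::Float64
    λ::Float64
end

Base.eltype(::Type{MeanScaleNormal}) = Float64
Base.eltype(::Type{MeanPrecisionNormal}) = Float64

# The one sampler both parameterizations convert to: an affine
# transform of a standard normal draw.
struct AffineNormalSampler <: Random.Sampler{Float64}
    μ::Float64
    σ::Float64
end

Random.Sampler(::Type{<:AbstractRNG}, d::MeanScaleNormal, ::Random.Repetition) =
    AffineNormalSampler(d.μ, d.σ)

Random.Sampler(::Type{<:AbstractRNG}, d::MeanPrecisionNormal, ::Random.Repetition) =
    AffineNormalSampler(d.μ, 1 / sqrt(d.λ))

# The only actual sampling code; everything reduces to randn.
Base.rand(rng::AbstractRNG, s::AffineNormalSampler) = s.μ + s.σ * randn(rng)
```

With this, `rand(MeanScaleNormal(0.0, 2.0), 1000)` and `rand(MeanPrecisionNormal(0.0, 0.25), 1000)` share one code path, so a new parameterization costs one conversion method rather than a whole new sampler.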
I really dislike the current state of this: sampling should just provide an iterator, and collecting the samples or writing them into a container is a problem of the caller, not the callee. But we have no nice idiom for iterating over vectors and writing them into a matrix or a vector of SVectors efficiently, even though that has nothing to do with sampling, only with collection.
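A sketch of that caller/callee split, and of where it currently hurts (the names here are illustrative):

```julia
using Random

rng = Random.default_rng()

# Callee: a lazy stream of draws, with no container policy imposed.
draws = (randn(rng, 3) for _ in 1:10)

# Caller: chooses the layout, e.g. a preallocated matrix...
out = Matrix{Float64}(undef, 3, 10)
for (j, x) in enumerate(draws)
    out[:, j] = x   # ...but each temporary vector x was still allocated
end
```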
I think the core of the problem here is that, even though we want both cheap iteration and contiguous storage, there are no common idioms for getting both at once. The idea of iterating over vectors and having the caller aggregate them sounds good at first, but without doing this carefully, each Vector created would allocate.

The ideal solution would probably be a purely functional interface for representing array computations. LazyArrays seems close to this, but we'll need nondeterminism with data sharing, and I'm not sure whether LazyArrays can handle that. We'll also need to be able to hand-tune things in some cases, e.g. with LoopVectorization, and I'm not sure about doing that with LazyArrays either. Anyway, there's the usual risk of unknowns, and also the risk that this could become a bigger project than MeasureTheory itself. So, to be less ambitious but keep easier control over what we're doing, the best approach I can see is to do all array sampling in terms of `rand!`.

All of this gets weird if you have, say, a vector of named tuples, and one of the entries in the named tuple is an array. By default those "inner" arrays would all be separately allocated. TupleVectors helps with this using the array-of-arrays representation: if you have a TupleVector where one element is an array, the inner arrays are stored together in one flat block, as in the sketch below.
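That array-of-arrays layout comes from ArraysOfArrays (which TupleVectors builds on); a small sketch of just that piece:

```julia
using ArraysOfArrays

flat = randn(3, 10)      # one contiguous block, Distributions-style layout
xs = nestedview(flat)    # 10-element vector of column views

xs[3][1] = 42.0          # writes through to the flat storage
@assert flat[1, 3] == 42.0
```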
@mschauer does this address your concerns? It's not clear to me what "I really dislike the current state of this" refers to, since I think Distributions etc. has the same problem you describe.
Yeah, that was general frustration, even more directed at […]