ImageIO Benchmarks #21

IanButterworth opened this issue Feb 28, 2021 · 8 comments

@IanButterworth (Member)

Some benchmarks of the FileIO `save` and `load` functions with the different image IO backends. The time axis is logarithmic because ImageMagick can be a lot slower.

All defaults, no kwargs.

This is with
FileIO JuliaIO/FileIO.jl#290
ImageMagick v0.7.6
QuartzImageIO v0.7.3
ImageIO v0.5.1
TiffImages v0.2.2
PNGFiles v0.3.6

[Figure: save/load benchmark results for each backend, image size, eltype, and format; log-scale time axis]

Benchmark code

```julia
import Pkg
Pkg.develop(path = joinpath(@__DIR__, "..", "FileIO.jl"))
Pkg.develop(path = joinpath(@__DIR__, "..", "TiffImages.jl"))
Pkg.add.(["ImageCore", "BenchmarkTools", "DataFrames", "CSV"])

using FileIO, ImageCore, BenchmarkTools, DataFrames, CSV

# All times are in seconds. `save_first`/`load_first` capture the first call
# (including compilation latency) and are only recorded for the smallest image.
res = DataFrame(backend=String[], file_fmt=String[], img_size=NTuple{2,Int}[], img_eltype=Type[],
                save_first=Union{Missing,Float64}[], save=Float64[],
                load_first=Union{Missing,Float64}[], load=Float64[])
tmp, _ = mktemp()
for backend in ["ImageMagick", "QuartzImageIO", "ImageIO"]
    # Only one backend is installed at a time so that FileIO dispatches to it;
    # ideally each backend would be benchmarked in a fresh Julia session.
    Pkg.add(backend)
    @info backend
    for ext in ["tiff", "png"]
        fpath = string(tmp, ".", ext)
        for typ in [Gray{N0f8}, RGB{N0f8}], n in 1:3
            println("$typ $(10^n)x$(10^n) ===")
            img = rand(typ, 10^n, 10^n)

            save_first = n == 1 ? (@elapsed FileIO.save(fpath, img)) : missing
            b = @benchmark FileIO.save($fpath, $img)
            save = median(b).time / 1e9      # BenchmarkTools reports ns; convert to s

            load_first = n == 1 ? (@elapsed FileIO.load(fpath)) : missing
            b = @benchmark FileIO.load($fpath)
            load = median(b).time / 1e9

            push!(res, (backend, ext, size(img), typ, save_first, save, load_first, load))
        end
    end
    Pkg.rm(backend)
end
CSV.write("results.csv", res)
```
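
A rough sketch of how a plot like the one above could be regenerated (illustrative only, not the exact script used for the figure; assumes Plots.jl, and that `res` is the in-memory DataFrame from the script above, since after a CSV round-trip the tuple/type columns come back as strings and would need parsing):

```julia
# Illustrative plotting sketch: median PNG load time vs. image side length,
# one series per backend, on log-log axes. Uses the `res` DataFrame built above.
using DataFrames, ImageCore, Plots

plt = plot(; xscale = :log10, yscale = :log10,
           xlabel = "image side length (px)", ylabel = "median load time (s)")
for sub in groupby(res, :backend)
    rows = filter(r -> r.file_fmt == "png" && r.img_eltype == Gray{N0f8}, sub)
    sides = first.(rows.img_size)      # images are square, so the first dimension suffices
    plot!(plt, sides, rows.load; marker = :circle, label = first(sub.backend))
end
savefig(plt, "imageio_load_benchmarks.png")
```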

cc. @tlnagy @timholy @Drvi

@timholy (Member) commented Mar 1, 2021

That's awesome.

Just this morning I discovered via profiling that the biggest contributor to ImageMagick's slowness on small files is extracting the pixel depth, which is a single ccall. Crazy. There may of course be a way around that, but I'm not in a rush; having an implementation that works and that we can change is so important, and the benchmarks above are really exciting!
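
A quick way to reproduce that kind of finding is to profile repeated loads (a sketch, assuming ImageMagick is the active FileIO backend and `fpath` points to a small test file such as one written by the script above):

```julia
# Profiling sketch: where does the time go when loading a small file repeatedly?
# Assumes ImageMagick is the FileIO backend in use and `fpath` is a small PNG/TIFF.
using FileIO, Profile

FileIO.load(fpath)            # warm up so compilation doesn't dominate the profile
Profile.clear()
@profile for _ in 1:1_000
    FileIO.load(fpath)
end
Profile.print(mincount = 10)  # look for where samples concentrate (e.g. a single ccall)
```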

For the smallest images, JuliaIO/FileIO.jl#295 will improve matters even further.

It's noteworthy that for TIFF, 10x10 and 100x100 are almost the same speed. That might merit some investigation, eventually.

@johnnychen94 (Member) commented Feb 3, 2022

With my recent JpegTurbo.jl development, I noticed that benchmarking only with randomly generated images can be quite misleading; many image-compression tricks only pay off when neighboring blocks and patches share structure. I would therefore suggest adding more test images of the same size, and plotting the median over those samples when we regenerate the graphs.
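
For concreteness, here is a tiny sketch of the effect (illustrative only; assumes TestImages.jl, and the choice of test image and file names is arbitrary):

```julia
# Illustrative sketch: a compression-aware format behaves very differently on
# random noise vs. a structured test image of the same size and eltype.
using FileIO, ImageCore, TestImages, BenchmarkTools

structured = testimage("cameraman")                       # 512×512 Gray{N0f8}
noise      = rand(eltype(structured), size(structured)...)

for (name, img) in ("structured" => structured, "noise" => noise)
    path = tempname() * ".png"
    t = @belapsed FileIO.save($path, $img)
    println(rpad(name, 12), "save: ", round(t * 1e3, digits = 2), " ms",
            "   file size: ", filesize(path), " bytes")
end
```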

@IanButterworth (Member, Author)

Absolutely. Note that PNGFiles.jl now has automated CI benchmarking set up: JuliaIO/PNGFiles.jl#52.

For example, see the report that asserted there was no performance change in this PR: JuliaIO/PNGFiles.jl#51 (comment).

But that currently uses random images, and @Drvi has already suggested they should be replaced with test images.

Perhaps we should set the same thing up for ImageIO, running TestImages against each backend.
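
Something like the following could be a starting point for an ImageIO `benchmark/benchmarks.jl` (a sketch only; it follows the PkgBenchmark/BenchmarkCI convention of a top-level `SUITE`, and the group names and test image are just placeholders):

```julia
# Hypothetical benchmark/benchmarks.jl sketch for ImageIO, in the
# PkgBenchmark/BenchmarkCI style (a top-level BenchmarkGroup named SUITE).
# Assumes TestImages.jl and that FileIO dispatches to ImageIO here.
using BenchmarkTools, FileIO, ImageCore, TestImages

const SUITE = BenchmarkGroup()
SUITE["save"] = BenchmarkGroup()
SUITE["load"] = BenchmarkGroup()

img = testimage("lighthouse")              # structured RGB test image
for ext in ("png", "tiff")
    path = tempname() * "." * ext
    FileIO.save(path, img)                 # ensure a file exists for the load benchmark
    SUITE["save"][ext] = @benchmarkable FileIO.save($path, $img)
    SUITE["load"][ext] = @benchmarkable FileIO.load($path)
end
```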

@johnnychen94 (Member)

We just need to add a few high-resolution test images to TestImages.jl... Among the widely used test-image datasets, I know of DIV2K, but it is licensed for academic purposes only. Do you have any suggestions for where we can find such test images?

@IanButterworth (Member, Author)

NASA?

@tlnagy (Contributor) commented Feb 4, 2022

It's been a goal of mine for a while to add automated CI benchmarking to TiffImages (ref tlnagy/TiffImages.jl#53), but I'm super busy for the foreseeable future. Does it make more sense for the benchmarks to live at the individual package level, or here in ImageIO? Maybe both?

But I agree with @johnnychen94 that it makes sense to use real images in addition to randomly-generated ones.

@johnnychen94 (Member) commented Feb 4, 2022

Performance benchmarks serve two purposes: 1) comparison against similar packages, often written in other languages, and 2) regression testing.

Benchmark CI such as JuliaIO/PNGFiles.jl#52 is used to track whether PRs/releases slow things down.

Benchmark scripts like this issue, JuliaIO/JpegTurbo.jl#15, and the one @timholy created in https://github.com/JuliaImages/image_benchmarks serve an advertising purpose: to convince people that we're doing great work. We also definitely need such benchmarks to prepare for JuliaImages 1.0.

> Does it make more sense for the benchmarks to live at the individual package level or here in ImageIO? Maybe both?

Unless we move all packages into one gigantic monorepo, the benchmark CI used for regression testing should live alongside each package's source code.

On the other hand, I prefer to have the "benchmark against other frameworks" code stay in one repo, as @timholy has already done. I haven't yet committed to https://github.com/JuliaImages/image_benchmarks because the code there is not very extensible/flexible, in the sense that it's not always easy to switch certain cases on or off. If we keep adding more benchmark cases there, we'll soon reach a point where it takes too long to get the result of interest. This is quite similar to the DemoCards framework I made for https://juliaimages.org/stable/examples/: we can easily create an ad-hoc benchmark/demo script that works at first, but it's always a pain to convince and guide others to contribute benchmark/demo cases using an ad-hoc, undocumented framework.
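
For concreteness, a purely hypothetical sketch of a "cases as data" layout, where each case carries tags and the runner filters on them so a subset can be run without editing the runner (all names here are made up):

```julia
# Hypothetical sketch: benchmark cases as plain data with tags, plus a runner
# that can switch cases on/off via tag filtering. Workloads are placeholders.
using BenchmarkTools

cases = [
    (name = "png_load_small",  tags = ["png", "load"],  run = () -> nothing),
    (name = "tiff_save_small", tags = ["tiff", "save"], run = () -> nothing),
]

function run_cases(cases; tags = nothing)
    selected = tags === nothing ? cases : filter(c -> any(in(c.tags), tags), cases)
    results = Dict{String,BenchmarkTools.Trial}()
    for c in selected
        results[c.name] = @benchmark $(c.run)()
    end
    return results
end

run_cases(cases; tags = ["png"])   # run only the PNG cases
```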

Some discussion on this can be found in JuliaImages/Images.jl#947, and I also have a very rough draft experiment in johnnychen94/Workflows.jl#1, but I certainly don't have enough time to finish it... Maybe we can propose this as this year's GSoC project by updating https://julialang.org/jsoc/gsoc/images/?

@timholy (Member) commented Feb 4, 2022

I'm supportive of changes to the architecture of image_benchmarks. That said, in the long run I expect image_benchmarks to share the fate of Julia's own microbenchmarks (repo: https://github.com/JuliaLang/Microbenchmarks): people want them, and lots of folks with different favorite image-processing suites will ask that we compare their framework, but nobody wants to maintain them. Building many different languages' suites on a single machine is a major pain in the neck, and I have delayed doing this precisely because it's no fun. But it's important for long-term growth in our current phase. (I don't really expect to keep them going for 10 years, though; realistically I might imagine maintaining them for a couple of years.)

Consequently, anything that you want to live "forever" and that is primarily focused on within-Julia performance, I would put elsewhere. I'm happy to rename that repo if it would help, e.g., to cross-suite-benchmarks or something.
