[FR] PredictNumItersNeeded() 1.4 correction factor #1848
the multiplier is applied when we've run for less than min time to figure out how many iterations we need to get over the minimum time. we could go with a smaller multiplier, but it will take longer to get up to the minimum time, which (i believe) nets out at the same amount of computation at the end of the day.
The confusing point is that the library will (internally) aim to run the benchmark for "min_time" + 40%. What would happen if the estimation is wrong? Imagine an extreme situation where the first iteration(s) have very low performance due to a cold cache, but it is very fast after that. What would be the implications of setting this number to 1.0? Would the function be called several times until the "min_time" is reached?
if it was set to 1.0 then we would never hit mintime. is it true we'd run for min_time + 40%? i think if the first run we try runs longer than mintime then we'd be done. the only time the multiplier would kick in is if an attempt runs for shorter than that time, at which point we'd increase the time of the next run by 40%. the only time we'd run for a time mintime * 1.4 is if the initial run is almost-but-not-quite mintime.
As benchmark/src/benchmark_runner.cc line 317 (at 08fdf6e) shows […]
So yes, we do need that fudge factor, and yes, we clearly can run for 1.4x of the min time. I'm guessing the factor of 1.4 is "derived" from the statistical distribution of the iteration timings.
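For readers without the source at hand, the step being referenced looks roughly like this. This is a paraphrase based on this discussion, not a verbatim copy of benchmark_runner.cc; the function and parameter names are illustrative:

```cpp
// Paraphrase of the current sizing step (assumed from this discussion, not a
// verbatim copy of benchmark_runner.cc): if the previous attempt covered more
// than 10% of the min time, extrapolate linearly and aim for min_time * 1.4;
// otherwise just grow by 10x.
#include <algorithm>

double PredictMultiplier(double seconds, double min_time) {
  const bool is_significant = (seconds / min_time) > 0.1;
  return is_significant ? (min_time * 1.4) / std::max(seconds, 1e-9) : 10.0;
}
// The next iteration count is then the previous count times this multiplier,
// rounded up and clamped to a maximum.
```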
According to my calculations, in step 4 you would run somewhere between […]. Since […], let's follow the example that we fall slightly short of the min time, 0.9x of the min time. Without any correction factor, we could just hit the 1.0x of min time with (roughly) […].
Right.
But you do see the problem, right? We will then have spent 1.9x min_time.
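To spell out the arithmetic of that example (0.9x is the shortfall assumed above): with a factor of 1.0 the rerun is sized to land right at the minimum, so the total is $0.9\,t_{\min} + 1.0\,t_{\min} = 1.9\,t_{\min}$; with the 1.4 factor the rerun is sized to land at $1.4\,t_{\min}$, so the total is roughly $0.9\,t_{\min} + 1.4\,t_{\min} = 2.3\,t_{\min}$.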
Very quick-n-dirty back-of-the-napkin proof:

```julia
using Unitful
using Distributions
using Random
MIN_TIME = 1u"s"
IDEAL_NUM_ITERATIONS = 1000
ITERATION_MEAN_TIME = MIN_TIME / IDEAL_NUM_ITERATIONS
ITERATION_CoV = 0.01
ITERATION_STDEV = ITERATION_CoV * ITERATION_MEAN_TIME
ITERATION_STEP = 10
# TIME_FUDJE = 1.1
SYSTEM_BACKGROUND_JITTER_LIKELYHOOD = 0.25
SYSTEM_BACKGROUND_JITTER_RUNTIME_IMPACT = 0.10
function run_test(;TIME_FUDJE)
iterations_total = missing
time_total = missing
iterations = missing
time = missing
while true
prev_iterations = iterations
if iterations isa Missing
iterations = 1
elseif (time / MIN_TIME) <= (ITERATION_STEP/100)
iterations *= ITERATION_STEP
else
multiplier = MIN_TIME * TIME_FUDJE / max(time, 1e-9u"s")
@show multiplier
iterations *= multiplier
end
iterations = convert(Int64, ceil(iterations))
@assert (prev_iterations isa Missing) || iterations > prev_iterations
ds = Bernoulli(SYSTEM_BACKGROUND_JITTER_LIKELYHOOD)
di = Normal(ustrip(u"s", ITERATION_MEAN_TIME), ustrip(u"s", ITERATION_STDEV))
background_jitter_scale = [ (!b ? 1 : (1 + SYSTEM_BACKGROUND_JITTER_RUNTIME_IMPACT)) for b in rand(ds, iterations)]
iteration_times = background_jitter_scale .* (rand(di, iterations).*u"s")
time = sum(iteration_times)
time_total = sum(skipmissing([time, time_total]))
iterations_total = sum(skipmissing([iterations, iterations_total]))
if time > MIN_TIME
break
end
end
@assert time > MIN_TIME
return (iterations_total, time_total)
end
@show run_test(TIME_FUDJE=1.4)
@show run_test(TIME_FUDJE=1.0)
```

So […]. NOTE: that model is not meant to be authoritative.
Some plotting:

```julia
module MyUnits
using Unitful
#export pt, mm, cm
@unit fudgeFactor "×" FudgeFactor 1 false
@unit iterations "iters" Iterations 1 false
#const mm = u"mm"
#const cm = u"cm"
Unitful.register(@__MODULE__)
function __init__()
return Unitful.register(@__MODULE__)
end
end
using Unitful
using Distributions
using Random
using Measurements
using Plots
plotlyjs()
MIN_TIME = 1u"s"
IDEAL_NUM_ITERATIONS = 100
ITERATION_MEAN_TIME = MIN_TIME / IDEAL_NUM_ITERATIONS
ITERATION_CoV = 1u"percent"
ITERATION_STDEV = ITERATION_CoV * ITERATION_MEAN_TIME
ITERATION_STEP = 10
AUTHORITATIVE_TIME = 10u"percent"
SYSTEM_BACKGROUND_JITTER_LIKELYHOOD = 25u"percent"
SYSTEM_BACKGROUND_JITTER_RUNTIME_IMPACT = 0.10
NUM_REPETITIONS = 1000
function run_test_repetition(;TIME_FUDJE)
iterations_total = missing
time_total = missing
iterations = missing
time = missing
while true
prev_iterations = iterations
if iterations isa Missing
iterations = 1
elseif ((time / MIN_TIME)*100u"percent") <= AUTHORITATIVE_TIME
iterations *= ITERATION_STEP
else
multiplier = MIN_TIME * TIME_FUDJE / max(time, 1e-9u"s")
iterations *= multiplier
end
iterations = convert(Int64, ceil(iterations))
@assert (prev_iterations isa Missing) || iterations > prev_iterations
ds = Bernoulli(ustrip(u"percent", SYSTEM_BACKGROUND_JITTER_LIKELYHOOD)/100)
di = Normal(ustrip(u"s", ITERATION_MEAN_TIME), ustrip(u"s", ITERATION_STDEV))
background_jitter_scale = [ (!b ? 1 : (1 + SYSTEM_BACKGROUND_JITTER_RUNTIME_IMPACT)) for b in rand(ds, iterations)]
iteration_times = background_jitter_scale .* (rand(di, iterations).*u"s")
time = sum(iteration_times)
iterations_total = sum(skipmissing([iterations, iterations_total]))
time_total = sum(skipmissing([time, time_total]))
if time > MIN_TIME
break
end
end
@assert time > MIN_TIME
return (iterations_total, time_total)
end
function run_test(TIME_FUDJE)
v = [run_test_repetition(TIME_FUDJE=TIME_FUDJE) for r in 1:NUM_REPETITIONS]
all_iterations = getfield.(v, 1)
all_times = getfield.(v, 2)
iterations_mean = mean(all_iterations)
time_mean = mean(all_times)
iterations_std = std(all_iterations; mean=iterations_mean)
time_std = std(all_times; mean=time_mean)
return (measurement(iterations_mean, iterations_std),
measurement(time_mean, time_std))
end
FUDGE_MIN = 1.0
FUDGE_MAX = 1.5
FUDGE_STEP = 0.01
TIME_FUDJES = collect(range(FUDGE_MIN,stop=FUDGE_MAX,length=1+ceil(Int64, (FUDGE_MAX-FUDGE_MIN)/FUDGE_STEP)))
RESULTS = run_test.(TIME_FUDJES)
TIME_FUDJES = TIME_FUDJES.*u"fudgeFactor"
# ITERATIONS = getfield.(RESULTS, 1).*u"iterations"
TIMES = getfield.(RESULTS, 2)
plot(TIME_FUDJES, TIMES, xlabel="fudge factor", ylabel="total time", size=(800,600))
```

Without touching the rest of the params there ^, but with 1ms iterations (x 1000 repetitions), we get: (plot omitted)

Maybe the factor of 1.4 is too much, but i'm certain that 1.0 is just wrong.
And some more fancy-ness (most things are now sampled from distributions):

```julia
module MyUnits
using Unitful
@unit fudgeFactor "×" FudgeFactor 1 false
@unit iterations "iters" Iterations 1 false
Unitful.register(@__MODULE__)
function __init__()
return Unitful.register(@__MODULE__)
end
end
using Base.Threads
using Base.Iterators
using Unitful
using Distributions
using Random
using Measurements
using ProgressMeter
using Plots
plotlyjs()
MIN_TIME = 1u"s"
ITERATION_STEP = 10
AUTHORITATIVE_TIME = 10u"percent"
#dist_ITERATION_MEAN_TIME = LogUniform(ustrip(u"s", 1u"μs"), ustrip(u"s", 10u"s"))
#dist_ITERATION_CoV = LogUniform(0.1/100, 10/100)
#dist_SYSTEM_BACKGROUND_JITTER_LIKELYHOOD = LogUniform(0.1/100, 10/100)
#dist_SYSTEM_BACKGROUND_JITTER_RUNTIME_IMPACT = LogUniform(0.1/100, 10/100)
dist_ITERATION_MEAN_TIME = censored(LogNormal(0, 2.5), ustrip(u"s", 1u"μs"), ustrip(u"s", 1u"s"))
dist_ITERATION_CoV = censored(LogNormal(1, 1), 0.1/100, 10/100)
dist_SYSTEM_BACKGROUND_JITTER_LIKELYHOOD = censored(LogNormal(2.5, 1), 0.1/100, 10/100)
dist_SYSTEM_BACKGROUND_JITTER_RUNTIME_IMPACT = censored(LogNormal(2.5, 1), 0.1/100, 10/100)
function run_test_repetition_impl(;TIME_FUDJE,
ITERATION_MEAN_TIME,
ITERATION_STDEV,
SYSTEM_BACKGROUND_JITTER_LIKELYHOOD,
SYSTEM_BACKGROUND_JITTER_RUNTIME_IMPACT)
dist_SYSTEM_BACKGROUND_JITTER = Bernoulli(SYSTEM_BACKGROUND_JITTER_LIKELYHOOD)
dist_ITERATION_TIME = Normal(ITERATION_MEAN_TIME, ITERATION_STDEV)
iterations_total = missing
time_total = missing
iterations = missing
time = missing
while true
prev_iterations = iterations
if iterations isa Missing
iterations = 1
elseif ((time / MIN_TIME)*100u"percent") <= AUTHORITATIVE_TIME
iterations *= ITERATION_STEP
else
multiplier = MIN_TIME * TIME_FUDJE / max(time, 1e-9u"s")
iterations *= multiplier
end
iterations = convert(Int64, ceil(iterations))
@assert (prev_iterations isa Missing) || iterations > prev_iterations
background_jitter_scale = [ (!b ? 1 : (1 + SYSTEM_BACKGROUND_JITTER_RUNTIME_IMPACT)) for b in rand(dist_SYSTEM_BACKGROUND_JITTER, iterations)]
iteration_times = background_jitter_scale .* (rand(dist_ITERATION_TIME, iterations).*u"s")
time = sum(iteration_times)
iterations_total = sum(skipmissing([iterations, iterations_total]))
time_total = sum(skipmissing([time, time_total]))
if time > MIN_TIME
break
end
end
@assert time > MIN_TIME
real_iteration_mean_time = time_total / iterations_total
ideal_iteration_count = ceil(Int64, MIN_TIME / real_iteration_mean_time)
ideal_run_time = real_iteration_mean_time * ideal_iteration_count
time_overspent = time_total / ideal_run_time
return (iterations_total, time_total, time_overspent)
end
NUM_INNER_REPETITIONS = 100
function run_test_repetition(;TIME_FUDJE)
ITERATION_MEAN_TIME = rand(dist_ITERATION_MEAN_TIME)
ITERATION_CoV = rand(dist_ITERATION_CoV)
SYSTEM_BACKGROUND_JITTER_LIKELYHOOD = rand(dist_SYSTEM_BACKGROUND_JITTER_LIKELYHOOD)
SYSTEM_BACKGROUND_JITTER_RUNTIME_IMPACT = rand(dist_SYSTEM_BACKGROUND_JITTER_RUNTIME_IMPACT)
ITERATION_STDEV = ITERATION_CoV * ITERATION_MEAN_TIME
v = [run_test_repetition_impl(TIME_FUDJE=TIME_FUDJE,
ITERATION_MEAN_TIME=ITERATION_MEAN_TIME,
ITERATION_STDEV=ITERATION_STDEV,
SYSTEM_BACKGROUND_JITTER_LIKELYHOOD=SYSTEM_BACKGROUND_JITTER_LIKELYHOOD,
SYSTEM_BACKGROUND_JITTER_RUNTIME_IMPACT=SYSTEM_BACKGROUND_JITTER_RUNTIME_IMPACT)
for r in 1:NUM_INNER_REPETITIONS]
next!(p)
return v
end
NUM_REPETITIONS = 10^5
function run_test(TIME_FUDJE)
chunk_size = max(1, ceil(Int64, NUM_REPETITIONS // nthreads()))
data_chunks = partition(1:NUM_REPETITIONS, chunk_size)
tasks = map(data_chunks) do chunk
@spawn begin
state = []
for x in chunk
push!(state, run_test_repetition(TIME_FUDJE=TIME_FUDJE))
end
return state
end
end
v = fetch.(tasks)
v = vcat(v...)
v = vcat(v...)
all_iterations = getfield.(v, 1)
all_times = getfield.(v, 2)
all_times_overspent = getfield.(v, 3)
iterations_mean = mean(all_iterations)
time_mean = mean(all_times)
time_overspent_mean = mean(all_times_overspent)
iterations_std = std(all_iterations; mean=iterations_mean)
time_std = std(all_times; mean=time_mean)
time_overspent_std = std(all_times_overspent; mean=time_overspent_mean)
return (measurement(iterations_mean, iterations_std),
measurement(time_mean, time_std),
measurement(time_overspent_mean, time_overspent_std))
end
FUDGE_MIN = 1.0
FUDGE_MAX = 1.5
FUDGE_STEP = 0.01
TIME_FUDJES = collect(range(FUDGE_MIN,stop=FUDGE_MAX,length=1+ceil(Int64, (FUDGE_MAX-FUDGE_MIN)/FUDGE_STEP)))
chunk_size = max(1, ceil(Int64, length(TIME_FUDJES) // nthreads()))
data_chunks = partition(TIME_FUDJES, chunk_size)
p = Progress(NUM_REPETITIONS*length(TIME_FUDJES); dt=1)
tasks = map(data_chunks) do chunk
@spawn begin
state = []
for x in chunk
push!(state, run_test(x))
end
return state
end
end
RESULTS = fetch.(tasks)
RESULTS = vcat(RESULTS...)
TIME_FUDJES = TIME_FUDJES.*u"fudgeFactor"
# ITERATIONS = getfield.(RESULTS, 1).*u"iterations"
TIMES = getfield.(RESULTS, 2)
TIMES_OVERSPENT = getfield.(RESULTS, 3)
scatter(TIME_FUDJES, Measurements.value.(TIMES_OVERSPENT), #series_annotations = ann,
        xticks=TIME_FUDJES, yticks=0:0.005:3, xlabel="fudge factor", ylabel="time overspenditure (factor)", size=(1920,1080))
```

EDIT: the distributions there ^ are wrong, these might be better: (graphs not updated)

```julia
using SpecialFunctions
function lognormal_σ(;μ,v,num_σ=3)
p = erf(num_σ/sqrt(2))
return -(μ - log(v))/(sqrt(2)*erfinv(2*p-1))
end
# 0σ = 10ms, +3σ = 1s
# NOTE: in reality, mean should be 1ms, but that makes this simulation much slower...
μ=log(10/1000)
dist_ITERATION_MEAN_TIME = censored(LogNormal(μ, lognormal_σ(μ=μ,v=1000/1000)), ustrip(u"s", 1u"μs"), ustrip(u"s", 1u"s"))
# 0σ = 1%, +3σ = 25%
μ=log(1/100)
dist_ITERATION_CoV = censored(LogNormal(μ, lognormal_σ(μ=μ,v=25/100)), 0.1/100, 25/100)
# 0σ = 10%, +3σ = 25%
μ=log(10/100)
dist_SYSTEM_BACKGROUND_JITTER_LIKELYHOOD = censored(LogNormal(μ, lognormal_σ(μ=μ,v=25/100)), 0.1/100, 50/100)
# 0σ = 5%, +3σ = 25%
μ=log(5/100)
dist_SYSTEM_BACKGROUND_JITTER_RUNTIME_IMPACT = censored(LogNormal(μ, lognormal_σ(μ=μ,v=25/100)), 0.1/100, 50/100)
```

Again, i'm not quite confident that my "statistical model" is sufficiently precise for anything, […]
I agree that the value 1.0 falls into the theoretical (non-real) world. Looking at your plots, I also agree that something in the 1.02 - 1.10 range might be more sensible. In any case, I think the best option could be to have an environment variable (or a program argument like […]) to make this factor configurable.
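A minimal sketch of the environment-variable idea; the variable name and helper below are hypothetical, not an existing google/benchmark option:

```cpp
// Hypothetical configuration hook, illustration only: read an override for
// the 1.4 correction factor from the environment, falling back to the
// current default when unset or invalid.
#include <cstdlib>

static double GetTimeFudgeFactor() {
  if (const char* s = std::getenv("BENCHMARK_MIN_TIME_FUDGE_FACTOR")) {
    char* end = nullptr;
    const double v = std::strtod(s, &end);
    if (end != s && v >= 1.0) return v;
  }
  return 1.4;  // current hard-coded default
}
```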
@dmah42 question: does google retain benchmark […]? We don't encode nearly enough info into that json (we are missing […]).
I'm guessing you'll want to censor […]. Out of all the benchmarks that are tracked, have all of them been run at least once since #1836? Is that possible?
@dmah42 (consider my previous question to be a theoretical question, not a request.)
That feels very much placebo-y; as far as i can tell, even with that fudge at 1.1, we still spend 1.5x the time:

```julia
module MyUnits
using Unitful
@unit fudgeFactor "×" FudgeFactor 1 false
@unit iterations "iters" Iterations 1 false
Unitful.register(@__MODULE__)
function __init__()
return Unitful.register(@__MODULE__)
end
end
using Base.Threads
using Base.Iterators
using Unitful
using Distributions
using Random
using Measurements
using ProgressMeter
using Plots
using SpecialFunctions
using DataFrames
plotlyjs()
MIN_TIME = 1u"s"
#dist_ITERATION_MEAN_TIME = LogUniform(ustrip(u"s", 1u"μs"), ustrip(u"s", 10u"s"))
#dist_ITERATION_CoV = LogUniform(0.1/100, 10/100)
#dist_SYSTEM_BACKGROUND_JITTER_LIKELYHOOD = LogUniform(0.1/100, 10/100)
#dist_SYSTEM_BACKGROUND_JITTER_RUNTIME_IMPACT = LogUniform(0.1/100, 10/100)
function lognormal_σ(;μ,v,num_σ=3)
p = erf(num_σ/sqrt(2))
return -(μ - log(v))/(sqrt(2)*erfinv(2*p-1))
end
# 0σ = 1ms, +3σ = 1s
# NOTE: in reality, mean should be 1ms, but that makes this simulation much slower...
μ=log(1/1000)
dist_ITERATION_MEAN_TIME = censored(LogNormal(μ, lognormal_σ(μ=μ,v=1000/1000)), ustrip(u"s", 1u"μs"), ustrip(u"s", 1u"s"))
# 0σ = 10%, +3σ = 25%
μ=log(10/100)
dist_ITERATION_CoV = censored(LogNormal(μ, lognormal_σ(μ=μ,v=25/100)), 0.1/100, 25/100)
struct BenchmarkResult
AUTHORITATIVE_TIME::Float64
ITERATION_COUNT_MULTIPLIER::Int64
TIME_FUDJE::Float64
time_overspent::Any
end
function run_test_repetition_impl(; AUTHORITATIVE_TIME,
ITERATION_COUNT_MULTIPLIER,
TIME_FUDJE)
ITERATION_MEAN_TIME = rand(dist_ITERATION_MEAN_TIME)
ITERATION_CoV = rand(dist_ITERATION_CoV)
ITERATION_STDEV = ITERATION_CoV * ITERATION_MEAN_TIME
dist_ITERATION_TIME = Normal(ITERATION_MEAN_TIME, ITERATION_STDEV)
iterations_total = missing
time_total = missing
iterations = missing
time = missing
while true
prev_iterations = iterations
if iterations isa Missing
iterations = 1
elseif (time / MIN_TIME) <= AUTHORITATIVE_TIME
iterations *= ITERATION_COUNT_MULTIPLIER
else
multiplier = MIN_TIME * TIME_FUDJE / max(time, 1e-9u"s")
iterations *= multiplier
end
iterations = convert(Int64, ceil(iterations))
@assert (prev_iterations isa Missing) || iterations > prev_iterations
iteration_times = rand(dist_ITERATION_TIME,iterations).*u"s"
time = sum(iteration_times)
iterations_total = sum(skipmissing([iterations, iterations_total]))
time_total = sum(skipmissing([time, time_total]))
if time >= MIN_TIME
break
end
end
@assert time > MIN_TIME
real_iteration_mean_time = time_total / iterations_total
ideal_iteration_count = ceil(Int64, MIN_TIME / real_iteration_mean_time)
ideal_run_time = real_iteration_mean_time * ideal_iteration_count
time_overspent = time_total / ideal_run_time
return time_overspent
end
NUM_BENCHMARK_REPETITIONS = 10^6
function run_test_configuration(; AUTHORITATIVE_TIME,
ITERATION_COUNT_MULTIPLIER,
TIME_FUDJE)
chunk_size = max(1, ceil(Int64, NUM_BENCHMARK_REPETITIONS // nthreads()))
data_chunks = partition(1:NUM_BENCHMARK_REPETITIONS, chunk_size)
tasks = map(data_chunks) do chunk
@spawn begin
state = []
for x in chunk
push!(state, run_test_repetition_impl(
AUTHORITATIVE_TIME=AUTHORITATIVE_TIME,
ITERATION_COUNT_MULTIPLIER=ITERATION_COUNT_MULTIPLIER,
TIME_FUDJE=TIME_FUDJE))
next!(p)
end
return state
end
end
v = fetch.(tasks)
v = vcat(v...)
v = vcat(v...)
time_overspent = v
time_overspent_mean = mean(time_overspent)
time_overspent_std = std(time_overspent; mean=time_overspent_mean)
time_overspent = measurement(time_overspent_mean, time_overspent_std)
#time_overspent = time_overspent_mean + 3*time_overspent_std
v = BenchmarkResult(AUTHORITATIVE_TIME,
ITERATION_COUNT_MULTIPLIER,
TIME_FUDJE,
time_overspent)
return v
end
function run_test(CONFIGURATIONS)
chunk_size = max(1, ceil(Int64, length(CONFIGURATIONS) // nthreads()))
data_chunks = partition(CONFIGURATIONS, chunk_size)
tasks = map(data_chunks) do chunk
@spawn begin
state = []
for x in chunk
push!(state, run_test_configuration(AUTHORITATIVE_TIME=x[1],ITERATION_COUNT_MULTIPLIER=x[2],TIME_FUDJE=x[3]))
end
return state
end
end
v = fetch.(tasks)
v = vcat(v...)
v = vcat(v...)
return v
end
#AUTHORITATIVE_TIME = [10/100]
#ITERATION_COUNT_MULTIPLIER = [10]
#TIME_FUDGE = [1.4]
AUTHORITATIVE_TIME = [10/100]
ITERATION_COUNT_MULTIPLIER = [10]
TIME_FUDGE = [1.0,1.1,1.4]
#AUTHORITATIVE_TIME = 1/100:1/100:5/100
#ITERATION_COUNT_MULTIPLIER = 2:1:20
#TIME_FUDGE = 1.0:0.01:1.1
CONFIGURATIONS = [(x,y,z) for x in AUTHORITATIVE_TIME
for y in ITERATION_COUNT_MULTIPLIER
for z in TIME_FUDGE]
NUM_CONFIGURATIONS = length(CONFIGURATIONS)
p = Progress(NUM_BENCHMARK_REPETITIONS*NUM_CONFIGURATIONS; dt=1)
R = run_test(CONFIGURATIONS)
X = [r.ITERATION_COUNT_MULTIPLIER for r in R]
Y = [r.AUTHORITATIVE_TIME for r in R]
Y2 = [r.TIME_FUDJE for r in R]
Z = [r.time_overspent for r in R]
df = DataFrame(ITERATION_COUNT_MULTIPLIER=X, AUTHORITATIVE_TIME=Y, TIME_FUDJE=Y2, time_overspent=Z)
sort!(df, [:time_overspent])
sort!(df, [order(:time_overspent), order(:AUTHORITATIVE_TIME), order(:ITERATION_COUNT_MULTIPLIER, rev=true), order(:TIME_FUDJE, rev=true)])
@show first(df, 100)
@assert false
Z = Measurements.value.(Z)
#heatmap(X,Y,Z; size=(800,600),
# xlabel="ITERATION_COUNT_MULTIPLIER (x)", ylabel="AUTHORITATIVE_TIME (%)", zlabel="time_overspent (x)",
# xticks=ITERATION_COUNT_MULTIPLIER, yticks=AUTHORITATIVE_TIME)
scatter(X,Y,Z; zcolor=Z, size=(800,600),
xlabel="ITERATION_COUNT_MULTIPLIER (x)", ylabel="AUTHORITATIVE_TIME (%)", zlabel="time_overspent (x)",
xticks=ITERATION_COUNT_MULTIPLIER, yticks=AUTHORITATIVE_TIME)
#scatter(Y2,Z; size=(800,600),
# xlabel="TIME_FUDGE", ylabel="time_overspent (x)",
# xticks=TIME_FUDGE)
```

We could also lower […]
the right solution to this is to run until we reduce the confidence interval for the iterations within a run to a particular threshold (or max time to avoid infinite runs). anything other than this is just fine-tuning and is unlikely to scale across all the various use-cases. if someone wants to reduce 1.4 to 1.2 or 1.1 or whatever then create a PR but i really don't think it matters all that much (and thank you @LebedevRI for basically proving that :D ).
Yeah, that's my general understanding, too. I haven't yet tried modelling said different convergence approach.
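For concreteness, the convergence-based stopping rule mentioned above could look something like this sketch; names and thresholds are assumptions for illustration, not the library's API:

```cpp
// Sketch of a confidence-interval-based stopping rule (illustration only):
// keep timing batches until the ~95% CI on the mean batch time is narrow
// relative to the mean, or until a hard time cap is hit.
#include <cmath>
#include <vector>

template <typename TimeOneBatch>
double RunUntilConverged(TimeOneBatch&& time_one_batch,
                         double rel_ci_threshold = 0.01,
                         double max_seconds = 5.0) {
  std::vector<double> samples;
  double total = 0.0;
  while (total < max_seconds) {
    const double t = time_one_batch();  // seconds for one batch of iterations
    samples.push_back(t);
    total += t;
    if (samples.size() < 5) continue;  // need a few samples before testing
    const double n = static_cast<double>(samples.size());
    double mean = 0.0;
    for (double s : samples) mean += s;
    mean /= n;
    double m2 = 0.0;
    for (double s : samples) m2 += (s - mean) * (s - mean);
    const double std_err = std::sqrt(m2 / (n - 1.0)) / std::sqrt(n);
    // Stop once the ~95% CI half-width is small relative to the mean.
    if (1.96 * std_err / mean < rel_ci_threshold) break;
  }
  return total;
}
```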
Hello again, I was playing with the internals of the library, and I think I now understand how it works. Let's use an example: […] @LebedevRI That fits with what you mentioned here: […] Is this correct? If so, isn't this approach very inefficient? We have to choose between overestimating or starting from scratch a lot of times.
yes that is correct, we start from scratch. i suppose we could change it to do an incremental run and check the running total of iterations and time instead. it would be a reasonably complex change but would improve the efficiency. efficiency (in terms of how much we run) hasn't been a concern as we've focused on making the inner timing loop as fast as possible to get out of the way of the code under test.
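A rough sketch of that incremental idea (the names below are assumptions for illustration, not the runner's actual types): keep the totals from undersized attempts and size the next attempt to cover only the remaining time, instead of discarding the short attempt and re-predicting from scratch.

```cpp
// Illustration of accumulating across attempts (assumed names, not the
// library's API).
#include <algorithm>
#include <cstdint>

struct RunningTotal {
  int64_t iters = 0;
  double seconds = 0.0;
};

inline void Accumulate(RunningTotal& total, int64_t iters, double seconds) {
  total.iters += iters;
  total.seconds += seconds;
}

// Size the next attempt from the observed per-iteration cost so the grand
// total lands near min_time, rather than min_time * 1.4 per attempt.
inline int64_t NextAttemptIters(const RunningTotal& total, double min_time) {
  const double per_iter =
      total.seconds / static_cast<double>(std::max<int64_t>(total.iters, 1));
  const double remaining = min_time - total.seconds;
  if (remaining <= 0.0) return 0;  // already accumulated enough time
  return std::max<int64_t>(1, static_cast<int64_t>(remaining / per_iter) + 1);
}
```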
That is what i said, yes. I'm still looking at that model, i need better estimates, but i think we might […]
Can we, though? I was under the distinct impression that the current way it's done is intentional, […]
As an idea, why not estimate the number of iterations to get to 10% of the min time? The way it is now, you multiply the iterations by 10 until surpassing the 10% threshold. Let's say you are between 2-9% of your min time. Then, the multiplier will be 10, and you will move to the range of 20 to 90% of your min time. Instead of multiplying by 10 "blindly", what about making an estimation of the iterations to get to, let's say, 15%? In that case, you "wasted" only ~15% of CPU time, but it is big enough to be considered "significant" in the current algorithm and make a better prediction later (reducing the 40% overestimation to something smaller). Maybe not the best solution, but I think this would alleviate the problem and should be easy to implement.
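To put rough numbers on that proposal (illustrative, using the 1 s min time from this thread): if an attempt of $n$ iterations lands at $t_{\text{obs}} = 0.09\,t_{\min}$, the blind ×10 step spends another $\approx 0.9\,t_{\min}$ on what is still only a sizing probe, whereas targeting 15% would use $n_{\text{next}} = \lceil n \cdot 0.15\,t_{\min} / t_{\text{obs}} \rceil \approx 1.7\,n$, spending only $\approx 0.15\,t_{\min}$ before the final, well-informed prediction.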
it's intentional in that we built it that way. but like i said above, if efficiency had been a concern we probably would have built it differently. this was the simplest approach at the time that worked. why do you think the consecutive aspect matters, or reporting from a single run matters? it shouldn't, right?
I don't think it is really relevant in terms of the report, but it is in terms of computation time. It is not only about the CPU time (and energy) spent without a real outcome, but also about the human time you need to wait for the full report. For example, below you can see the output of a little example to prove my point: (output omitted)
You can see that for having a 1s benchmark (actually ~1.4s), we had to run each function up to 2.50s. That's a 150% overhead!
As i have said, i'm still looking at the current approach; we may be able to do better without any drastic changes.
@LebedevRI that's a very good point, in that case the aim should be to reduce the amount of times the function […] is called. I implemented this idea: a 3-phase approach to determine the number of iterations needed.
New implementation of `PredictNumItersNeeded()`:

```cpp
IterationCount BenchmarkRunner::PredictNumItersNeeded(
    const IterationResults& i) const {
  // Try a 3-phase approach to determine the number of iterations needed.
  // 1. Multiply by 10 while we are below 1% of the min time ->
  //    semi-significant time.
  // 2. Find the iterations needed to reach 10% of the min time (+5% buffer) ->
  //    significant time.
  // 3. Find the iterations to reach 100% of the min time (+5% buffer).
  static constexpr auto kSemiSignificantTime = 0.01;
  static constexpr auto kSignificantTime = 0.10;
  static constexpr auto kSafetyBuffer = 0.05;

  const auto significance = i.seconds / GetMinTimeToApply();

  // See how much "iters" should be increased by.
  double multiplier = 1.0;
  if (significance < kSemiSignificantTime) {
    // 1. Multiply by 10 while we are below 1% of the min time.
    multiplier = 10.0;
  } else if (significance < kSignificantTime) {
    // 2. Find the number of iterations needed to reach 10% of the min time.
    //    Use +5% buffer to avoid running short.
    static constexpr auto scale = kSignificantTime + kSafetyBuffer;
    multiplier = (GetMinTimeToApply() * scale) / std::max(i.seconds, 1e-9);
  } else {
    // 3. Find the number of iterations to get to the 100% of the min time.
    //    Use +5% buffer to avoid running short and needing to run again.
    static constexpr auto scale = 1.0 + kSafetyBuffer;
    multiplier = (GetMinTimeToApply() * scale) / std::max(i.seconds, 1e-9);
  }

  // So what seems to be the sufficiently-large iteration count? Round up.
  const IterationCount max_next_iters = static_cast<IterationCount>(
      std::llround(std::max(multiplier * static_cast<double>(i.iters),
                            static_cast<double>(i.iters) + 1.0)));
  // But we do have *some* limits though..
  const IterationCount next_iters = std::min(max_next_iters, kMaxIterations);

  BM_VLOG(3) << "Next iters: " << next_iters << ", " << multiplier << "\n";
  return next_iters;  // round up before conversion to integer.
}
```

With the previous benchmark, it seems that the overhead is reduced from 50-150% to ~20%.
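As a rough cross-check of that ~20% figure, the idealized, noise-free trajectory of the 3-phase approach with 1 ms iterations and a 1 s min time works out to (my arithmetic, not measured output):

- 1 iter → 0.001 s (0.1%, below 1%): multiply by 10 → 10 iters
- 10 iters → 0.01 s (1%): target 15% → 150 iters
- 150 iters → 0.15 s (15%, above 10%): target 105% → 1050 iters
- 1050 iters → 1.05 s ≥ min time: done

Total spent $\approx 0.001 + 0.01 + 0.15 + 1.05 = 1.211$ s, i.e. roughly 21% over the 1 s minimum.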
I would even say that step 1 could use the same kind of approach as the other steps to reduce the amount of times that the iterations have to be recalculated.
Problem description
In the function `PredictNumItersNeeded()` there is this `1.4` correction factor. This causes the time running the experiment to exceed by ~40% the time specified by `--benchmark_min_time`. Of course, `--benchmark_min_time` denotes the minimum amount of time to run the benchmark, but an overrun of 40% seems excessive. This is particularly relevant in supercomputers, where CPU time is expensive.
Suggested solution
I suggest either removing the correction factor or making it configurable (with a default value of 1.0).
Example

As shown in the following output (executed with `--benchmark_min_time=1s`), the real execution time is ~1.4s: $7039 \times 198481 = 1397107759$, $8775 \times 160984 = 1412634600$, ...

When executing the same code with `--benchmark_min_time=0.71s` ($1/1.4 \simeq 0.71$), the execution times are much closer to 1s: $7681 \times 134530 = 1033324930$, $9265 \times 103410 = 958093650$, ...