Benchmarks

Numbers first, then everything else.

At 5,000 simultaneous bullets in the travel-only profile, the serial solver costs 174ms per frame. The parallel solver costs 5ms. That's not a rounding error; that's a 32x difference. On a game with a 16ms frame budget, the serial solver stops being viable well before that point. The parallel solver barely notices.

These aren't cherry-picked. The data below is the raw output of Vetra's own benchmarker, captured on a live Roblox server with 64 shards, 120 frames sampled per cell.


The Setup

Four behavior profiles, tested across 13 bullet counts each. Every cell is 120 Heartbeat frames sampled after a 30-frame warmup, with a keep-alive callback that immediately respawns any terminated bullet so the active count stays stable throughout.

  • Travel-only: bullets fly straight with no callbacks and no bounce resolution. Pure raycast throughput.
  • Bounce (no callback): bullets bounce up to 8 times with no CanBounceFunction. The bounce logic runs but no Lua callback crosses the thread boundary.
  • Bounce (callback): identical to the previous profile, but with a CanBounceFunction that returns true. Callbacks must flush on the main thread after each parallel pass, so this measures the real cost of user-facing code.
  • Pierce (callback): bullets pierce up to 5 times with a CanPierceFunction. Same callback flush model as bounce.

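Frame time is wall-clock Heartbeat duration. Throughput is average active casts x (1000 / avg ms). Ratio is the parallel average divided by the serial average, so anything below 1.00x means parallel is faster. Treat everything as relative, not absolute: Roblox's server scheduler adds its own noise floor.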
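The two derived columns can be reproduced by hand. The sketch below is plain Lua written for illustration, not code taken from the benchmarker; the function names are made up here, and the inputs are the 10-bullet travel-only row.

```lua
-- Throughput as defined above: average active casts x (1000 / avg ms).
local function throughput(avgActiveCasts, avgFrameMs)
    return avgActiveCasts * (1000 / avgFrameMs)
end

-- Parallel/serial ratio: values below 1.0 mean the parallel solver is faster.
local function ratio(parallelMs, serialMs)
    return parallelMs / serialMs
end

-- 10 bullets, travel-only: serial 4.767 ms, parallel 4.334 ms.
print(string.format("%.0f steps/s", throughput(10, 4.334))) -- 2307 steps/s
print(string.format("%.2fx", ratio(4.334, 4.767)))          -- 0.91x
```

Published throughput can differ slightly from bullets x (1000 / avg ms) at higher counts, because the formula uses the average active cast count rather than the nominal bullet count.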


Travel-Only

No callbacks. No bounce resolution. Just bullets moving through space, each firing one raycast per frame. This is the theoretical ceiling of what parallel can do: all work is embarrassingly parallel, and nothing needs to touch the main thread.

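| Bullets | Serial avg | Parallel avg | Ratio (parallel/serial) | Parallel throughput |
|---|---|---|---|---|
| 10 | 4.767 ms | 4.334 ms | 0.91x | 2,307 steps/s |
| 25 | 6.874 ms | 4.317 ms | 0.63x | 5,791 steps/s |
| 50 | 10.345 ms | 4.157 ms | 0.40x | 12,028 steps/s |
| 100 | 10.187 ms | 4.169 ms | 0.41x | 23,988 steps/s |
| 200 | 14.041 ms | 4.159 ms | 0.30x | 48,086 steps/s |
| 500 | 25.885 ms | 4.159 ms | 0.16x | 120,215 steps/s |
| 1,000 | 45.2 ms | 4.168 ms | 0.09x | 239,899 steps/s |
| 2,000 | 74.321 ms | 4.206 ms | 0.06x | 475,481 steps/s |
| 5,000 | 174.671 ms | 5.453 ms | 0.03x | 916,962 steps/s |
| 7,500 | n/a | 8.581 ms | n/a | 874,040 steps/s |
| 10,000 | n/a | 6.617 ms | n/a | 1,511,333 steps/s |
| 15,000 | n/a | 7.888 ms | n/a | 1,901,513 steps/s |
| 20,000 | n/a | 10.286 ms | n/a | 1,944,472 steps/s |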

The parallel solver's frame time is essentially flat from 10 to 2,000 bullets, hovering between roughly 4.15 and 4.35ms. That's the signature of work being distributed across enough cores that adding more bullets just fills unused capacity. Going from 5,000 to 20,000 bullets costs only about 5ms more (5.453ms to 10.286ms). At 20,000 active bullets, the parallel solver is still well within a 60fps frame budget.

The serial solver has no such plateau. At 1,000 bullets it's already at 45ms, nearly 3x the 16ms budget. At 5,000 it's at 174ms. That's more than 10 frames' worth of time spent on one game system.
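The budget arithmetic behind those statements is simple enough to check directly. This is a small illustrative sketch in plain Lua; the helper names are hypothetical, and the 60fps target is the one the text already uses.

```lua
local TARGET_FPS = 60
local BUDGET_MS = 1000 / TARGET_FPS -- about 16.67 ms per frame

-- How many 60fps frame budgets a single measured frame time occupies.
local function framesConsumed(frameMs)
    return frameMs / BUDGET_MS
end

-- True when the measured frame time fits inside one frame budget.
local function fitsBudget(frameMs)
    return frameMs <= BUDGET_MS
end

print(string.format("%.1f", framesConsumed(45.2)))    -- serial, 1,000 bullets: 2.7
print(string.format("%.1f", framesConsumed(174.671))) -- serial, 5,000 bullets: 10.5
print(fitsBudget(10.286))                             -- parallel, 20,000 bullets: true
```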


Bounce (No Callback)

When bullets bounce, each frame involves more work per cast: velocity reflection math, restitution, normal perturbation, corner-trap detection. But because there's no CanBounceFunction, none of that has to flush through the main thread. It all runs in parallel.

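| Bullets | Serial avg | Parallel avg | Ratio (parallel/serial) | Parallel throughput |
|---|---|---|---|---|
| 10 | 4.159 ms | 4.365 ms | 1.05x | 2,291 steps/s |
| 25 | 4.168 ms | 4.165 ms | 1.00x | 6,003 steps/s |
| 50 | 4.364 ms | 4.334 ms | 0.99x | 11,536 steps/s |
| 100 | 4.159 ms | 4.270 ms | 1.03x | 23,421 steps/s |
| 200 | 4.714 ms | 4.386 ms | 0.93x | 45,596 steps/s |
| 500 | 7.432 ms | 4.198 ms | 0.57x | 119,095 steps/s |
| 1,000 | 8.389 ms | 4.298 ms | 0.51x | 232,646 steps/s |
| 2,000 | 13.844 ms | 4.334 ms | 0.31x | 461,467 steps/s |
| 5,000 | 24.159 ms | 4.327 ms | 0.18x | 1,155,485 steps/s |
| 7,500 | n/a | 4.629 ms | n/a | 1,620,096 steps/s |
| 10,000 | n/a | 5.265 ms | n/a | 1,899,199 steps/s |
| 15,000 | n/a | 7.574 ms | n/a | 1,980,502 steps/s |
| 20,000 | n/a | 9.390 ms | n/a | 2,129,909 steps/s |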

Two things to notice. First: at low counts (10–100 bullets), serial and parallel are essentially tied, because the overhead of Actor messaging costs as much as the work being parallelised. This is the crossover zone the docs warn about, and the data shows it clearly.

Second: at 500+ bullets, parallel breaks away hard. The serial solver reaches ~8ms at 1,000 bullets and climbs to 24ms at 5,000. The parallel solver stays under 4.5ms through 5,000 bullets. Even at 20,000 bouncing bullets, it takes only 9.39ms. That's real.
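To locate the crossover in your own results, one option is to scan the rows for the first bullet count where the ratio drops below a chosen cutoff. The sketch below is plain Lua using the serial/parallel pairs from the table above; the function name and the 0.9 cutoff are arbitrary choices for illustration, not part of the benchmarker.

```lua
-- Bounce (no callback) rows where both solvers were measured.
local rows = {
    { bullets = 10,   serialMs = 4.159,  parallelMs = 4.365 },
    { bullets = 25,   serialMs = 4.168,  parallelMs = 4.165 },
    { bullets = 50,   serialMs = 4.364,  parallelMs = 4.334 },
    { bullets = 100,  serialMs = 4.159,  parallelMs = 4.270 },
    { bullets = 200,  serialMs = 4.714,  parallelMs = 4.386 },
    { bullets = 500,  serialMs = 7.432,  parallelMs = 4.198 },
    { bullets = 1000, serialMs = 8.389,  parallelMs = 4.298 },
    { bullets = 2000, serialMs = 13.844, parallelMs = 4.334 },
    { bullets = 5000, serialMs = 24.159, parallelMs = 4.327 },
}

-- Returns the first bullet count whose parallel/serial ratio is at or
-- below maxRatio, or nil if no row qualifies.
local function findCrossover(data, maxRatio)
    for _, row in ipairs(data) do
        if row.parallelMs / row.serialMs <= maxRatio then
            return row.bullets
        end
    end
    return nil
end

print(findCrossover(rows, 0.9)) -- 500
```

A looser cutoff such as 0.95 reports 200 instead, which is a reminder that the low-count rows sit close enough together for noise to move the answer.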


Bounce (Callback) and Pierce (Callback)

Adding a CanBounceFunction or CanPierceFunction means the parallel solver has to flush callback results through the main thread after each physics pass. This is the realistic profile for any production weapon, since you're always going to want some gate logic.

Bounce (callback):

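| Bullets | Serial avg | Parallel avg | Ratio (parallel/serial) |
|---|---|---|---|
| 10–100 | ~4.2 ms | ~4.2 ms | ≈1.00x |
| 200 | 4.391 ms | 4.149 ms | 0.95x |
| 500 | 6.084 ms | 4.176 ms | 0.69x |
| 1,000 | 8.454 ms | 4.161 ms | 0.49x |
| 2,000 | 12.142 ms | 4.260 ms | 0.35x |
| 5,000 | 23.262 ms | 4.164 ms | 0.18x |
| 20,000 | n/a | 10.283 ms | n/a |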

Pierce (callback):

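| Bullets | Serial avg | Parallel avg | Ratio (parallel/serial) |
|---|---|---|---|
| 10–100 | ~4.2 ms | ~4.2 ms | ≈1.00x |
| 200 | 4.639 ms | 4.160 ms | 0.90x |
| 500 | 6.866 ms | 4.275 ms | 0.62x |
| 1,000 | 8.285 ms | 4.167 ms | 0.50x |
| 2,000 | 11.872 ms | 4.165 ms | 0.35x |
| 5,000 | 25.069 ms | 4.204 ms | 0.17x |
| 20,000 | n/a | 9.900 ms | n/a |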

The callback flush cost is nearly invisible in the data. Through 5,000 bullets, bounce-with-callback and bounce-without-callback are within noise of each other, and at 20,000 bullets the gap is under 1ms (10.283ms vs 9.390ms). The parallel solver handles the main-thread sync without meaningful overhead because it's a deferred batch flush, not a per-cast round-trip.
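The difference between a deferred batch flush and a per-cast round-trip can be shown with a toy model. The following plain-Lua sketch is purely illustrative and makes no claim about Vetra's internals: queueBounce and flush are hypothetical names, and the "parallel pass" is simulated with an ordinary loop.

```lua
-- Events recorded during the (simulated) parallel pass.
local pending = {}

-- During the parallel pass: record the event, but do not call user code.
local function queueBounce(castId)
    pending[#pending + 1] = castId
end

-- Once per frame on the main thread: run the user callback for every
-- queued event in a single pass, then clear the queue.
local function flush(canBounce)
    local approved = {}
    for _, castId in ipairs(pending) do
        if canBounce(castId) then
            approved[#approved + 1] = castId
        end
    end
    pending = {}
    return approved
end

-- Five casts hit a surface this frame; the gate allows odd ids only.
for id = 1, 5 do
    queueBounce(id)
end
local approved = flush(function(castId)
    return castId % 2 == 1
end)
print(#approved) -- 3
```

The point of the pattern is that the number of thread handoffs per frame stays constant no matter how many casts queued an event, which would be consistent with the flat parallel timings in the tables.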


What These Numbers Mean in Practice

A typical Roblox shooter might have 10–30 simultaneously live bullets at any moment. The serial solver is completely fine there. You'd use Vetra.new(), ship it, and never think about this page.

A bullet-hell game, a large-scale military simulation, an artillery mode, a scenario where shotguns fire 12 pellets simultaneously and 20 players are all shooting: those are different conversations. At 200+ bullets, the parallel solver is measurably faster and the margin widens every step of the way.

The honest version of the guidance in the API docs: use Vetra.new() until you feel the serial solver's cost in your profiler. Then switch to newParallel and these numbers tell you exactly what to expect.


Running the Benchmarker Yourself

The benchmarker that produced these results is included with Vetra. It's a self-contained ModuleScript you drop into your project and run once. Results print to the Output window in the same format as above.

Setup

  1. Place the VetraBenchmark ModuleScript somewhere accessible; ServerScriptService works fine.
  2. Add an ObjectValue named VetraReference as a child of the script, with its Value pointing at the Vetra ModuleScript.
  3. Require it from a Script and call :Run() inside a task.spawn:

     local VetraBenchmark = require(script.Parent.VetraBenchmark)

     task.spawn(function()
         local Bench = VetraBenchmark.new()
         Bench:Run()
     end)

Configuration

VetraBenchmark.new() accepts an optional config table:

    local Bench = VetraBenchmark.new({
        BulletCounts = { 10, 50, 100, 500, 1000 }, -- which counts to test
        SampleFrames = 120, -- Heartbeat frames sampled per cell
        WarmupFrames = 30, -- frames to discard before sampling
        ShardCount = 8, -- Actor shards for the parallel solver
        ParallelOnlyThreshold = 500, -- skip serial above this count
        Origin = Vector3.new(0, 50, 0), -- fire origin
        SpreadDeg = 25, -- cone spread in degrees
    })

All fields are optional; unset fields fall back to defaults. The default BulletCounts runs the full 13-step sweep from 10 to 20,000 and takes roughly 3–4 minutes.
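For a rough sense of how long a custom sweep might take, you can count cells and frames. The sketch below is a back-of-the-envelope estimator in plain Lua, not part of the benchmarker. It assumes each frame takes 1/60 s, and the ProfileCount and HeartbeatHz fields are hypothetical inputs invented for this example. The default ParallelOnlyThreshold of 5000 is inferred from the tables above, where serial results stop at 5,000 bullets.

```lua
-- Estimate sweep duration in seconds, assuming a fixed Heartbeat rate.
local function estimateSweepSeconds(cfg)
    local framesPerCell = cfg.WarmupFrames + cfg.SampleFrames
    local cells = 0
    for _, count in ipairs(cfg.BulletCounts) do
        cells = cells + 1 -- the parallel solver always runs
        if count <= cfg.ParallelOnlyThreshold then
            cells = cells + 1 -- the serial solver runs too
        end
    end
    return cells * cfg.ProfileCount * framesPerCell / cfg.HeartbeatHz
end

local assumedDefaults = {
    BulletCounts = { 10, 25, 50, 100, 200, 500, 1000, 2000,
                     5000, 7500, 10000, 15000, 20000 },
    SampleFrames = 120,
    WarmupFrames = 30,
    ParallelOnlyThreshold = 5000, -- assumption, inferred from the tables
    ProfileCount = 4,             -- hypothetical field: the four default profiles
    HeartbeatHz = 60,             -- hypothetical field: assumed frame rate
}

print(string.format("%.0f s", estimateSweepSeconds(assumedDefaults))) -- 220 s
```

That lands in the same range as the stated 3–4 minutes. Treat it as a lower bound: cells where the serial solver takes far longer than 1/60 s per frame will stretch the real run time.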

For a quick sanity check, run a narrow sweep:

    local Bench = VetraBenchmark.new({
        BulletCounts = { 50, 200, 500 },
        SampleFrames = 60,
        ParallelOnlyThreshold = 9999, -- never skip serial
    })
    Bench:Run()

Custom Profiles

Pass a second argument to replace the default four profiles with your own:

    local Bench = VetraBenchmark.new(nil, {
        {
            name = "Sniper with drag",
            behavior = {
                MaxDistance = 1500,
                DragCoefficient = 0.003,
                DragModel = "G7",
                MaxPierceCount = 3,
                CanPierceFunction = function(ctx, result, vel)
                    return true
                end,
            },
        },
        {
            name = "Grenade",
            behavior = {
                MaxDistance = 400,
                MaxBounces = 6,
                Restitution = 0.55,
                CanBounceFunction = function(ctx, result, vel)
                    return true
                end,
            },
        },
    })
    Bench:Run()

Each profile runs the full bullet-count sweep independently. If you care specifically about how a particular weapon's physics cost scales, define it here and the benchmarker will tell you exactly when the serial solver starts hurting.

Reading the Output

The benchmarker prints a live row as each cell finishes, so you can watch it progress:

    serial       | Travel-only              |   500 bullets | avg 25.885 ms  min 23.099  max 41.11  σ 2.614 | 19316 cast-steps/s
    parallel     | Travel-only              |   500 bullets | avg 4.159 ms  min 2.19  max 5.828  σ 0.765 | 120215 cast-steps/s
    → parallel/serial ratio: 0.161x [PARALLEL FASTER]

The σ column is standard deviation. High σ means the frame time was inconsistent, often a sign of garbage collection pressure or Roblox scheduler interference during that sample window. If a cell's σ is unusually large relative to its average, treat that row with more skepticism and re-run.
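One way to make "unusually large relative to its average" concrete is to compare σ divided by the average against a cutoff. The sketch below is plain Lua for illustration only. It computes a population standard deviation, which is an assumption (the benchmarker's exact formula isn't stated here), and the 0.25 cutoff is an arbitrary example value.

```lua
-- Mean and population standard deviation of a list of frame times (ms).
local function stats(samples)
    local n = #samples
    local sum = 0
    for _, ms in ipairs(samples) do
        sum = sum + ms
    end
    local mean = sum / n

    local squaredDiffs = 0
    for _, ms in ipairs(samples) do
        squaredDiffs = squaredDiffs + (ms - mean) ^ 2
    end
    return mean, math.sqrt(squaredDiffs / n)
end

-- Flags a cell whose sigma is large relative to its average.
local function isNoisy(mean, sigma, maxRelative)
    return sigma / mean > maxRelative
end

-- The two sample rows above: sigma is about 10% and 18% of the average.
print(isNoisy(25.885, 2.614, 0.25)) -- false
print(isNoisy(4.159, 0.765, 0.25))  -- false

-- A window with one spike: mean 5, sigma about 1.73, roughly 35% of the mean.
local mean, sigma = stats({ 4, 4, 4, 8 })
print(isNoisy(mean, sigma, 0.25))   -- true
```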

After all profiles complete, a summary table is printed with all results side-by-side for easy comparison.

One run per instance

Bench:Run() asserts if called more than once on the same instance. Create a new instance for each run.
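The behavior is the familiar run-once guard. The mock below is a stand-in written in plain Lua to show the pattern and how a second call fails. OneShot is a hypothetical class, not the VetraBenchmark source, and the error message is invented.

```lua
-- A stand-in class with the same one-run-per-instance rule.
local OneShot = {}
OneShot.__index = OneShot

function OneShot.new()
    return setmetatable({ _hasRun = false }, OneShot)
end

function OneShot:Run()
    -- A second call on the same instance trips this assertion.
    assert(not self._hasRun, "Run() may only be called once per instance")
    self._hasRun = true
end

local first = OneShot.new()
first:Run()
print(pcall(first.Run, first))   -- false, followed by the assertion message

-- A fresh instance runs without complaint.
local second = OneShot.new()
print(pcall(second.Run, second)) -- true
```

With the real benchmarker the fix is the same as with the mock: call VetraBenchmark.new() again and run the new instance.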