Performance Guide¶

Ship smooth 60fps 3D and AR experiences on Android. This guide covers profiling, asset optimization, scene tuning, Compose best practices, and device-tier strategies for the SceneView SDK.

Hot Paths & Allocation-Free APIs¶

This is the single most common performance bug in SceneView apps, and the one that AI assistants reintroduce most often. A cross-platform hot-path audit (umbrella #2263) found the exact same anti-pattern on all five platforms: getters that decompose a matrix or allocate are far too easy to call once per frame. They cost almost nothing when called occasionally, so the cost is invisible in code review — but in a render-rate loop they melt your frame budget.

The rule

Never call a decomposing or allocating getter inside onFrame, onSessionUpdated, an update closure, a requestAnimationFrame tick, or any other per-frame loop. Read once, cache outside the loop, or use the allocation-free overload. The frame loop runs 30–60 times per second, so one stray FloatArray(16) per node is up to 60 garbage arrays per second for each node, and thousands per second across a scene with a few dozen nodes.
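The rule can be sketched in plain Kotlin without any SceneView types. `FrameUploader`, `uploadAllocating`, and `uploadReusing` are hypothetical names invented for this illustration; the real allocation-free overloads are the ones listed in the cheat sheet further down.

```kotlin
import kotlin.math.cos
import kotlin.math.sin

// Hypothetical stand-in for a per-frame matrix upload; not SceneView API.
class FrameUploader {
    // Allocated once, outside the frame loop, and overwritten every frame.
    private val scratch = FloatArray(16)

    // Anti-pattern: a fresh 16-float array on every call.
    fun uploadAllocating(angle: Float): FloatArray {
        val m = FloatArray(16)
        writeRotationZ(angle, m)
        return m
    }

    // Allocation-free: the same array is rewritten in place.
    fun uploadReusing(angle: Float): FloatArray {
        writeRotationZ(angle, scratch)
        return scratch
    }

    // Writes a column-major rotation about the Z axis into dst.
    private fun writeRotationZ(angle: Float, dst: FloatArray) {
        val c = cos(angle)
        val s = sin(angle)
        dst.fill(0f)
        dst[0] = c; dst[1] = s
        dst[4] = -s; dst[5] = c
        dst[10] = 1f; dst[15] = 1f
    }
}
```

The trade-off of the reusing variant is that the caller must consume the returned array before the next frame overwrites it.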

Why it matters¶

Three distinct failure modes, all from the same root cause:

Float drift. Decomposing a matrix into components and recomposing it every frame is lossy. The trigger for this whole audit (#2187) was node.quaternion = q round-tripping through the world matrix → measurable rotational drift over time. Writing one TRS component at a time forces a decompose-edit-recompose cycle; set the whole transform once instead.
GC sawtooth. Per-frame allocation produces the classic memory-profiler sawtooth: allocate → GC pause → jank. It is worst on iOS Safari / mobile web, where the JS heap is small and a collection mid-rAF drops a frame outright.
JNI / bridge thunks. On Android each Filament world-matrix read is a JNI round-trip; doing it three times per frame per node (position + rotation + scale) is three thunks where one cached read would do.

Cheat sheet — avoid in the hot path → use instead¶

The tables below cover, in order: Android (Kotlin / Compose), AR (ARCore), KMP collision (sceneview-core), Web (Filament.js), and Apple (SwiftUI / RealityKit).

Avoid per frame	Use instead	Why
Setting `node.position` / `node.quaternion` / `node.rotation` / `node.scale` individually in a loop	Build a `Transform` (`Mat4`) and assign `node.transform = …` once	Each component setter recomposes the matrix; one-at-a-time writes also round-trip and drift (#2187)
`Mat4.toColumnsFloatArray()` for a per-frame uniform / `TransformManager.setTransform` upload	`Mat4.copyColumnsInto(scratch)` into a reused `FloatArray(16)` (also `Mat3.copyColumnsInto`)	`toColumnsFloatArray()` allocates a fresh array every call (#2271)
`slerp(startTransform, endTransform, …)` when you already hold the TRS components	the TRS-tuple overload `slerp(startPosition, startQuaternion, startScale, …)`	The `Transform` overload runs 6 matrix decompositions per call (#2265)
Re-loading a model / re-creating a node per recomposition	`rememberModelInstance(modelLoader, "models/x.glb")`, `rememberNode { … }`	They cache and marshal Filament JNI onto the main thread for you

node.worldPosition / worldQuaternion / worldScale / worldRotation are now cached (#2264) — reading them per frame is fine. They invalidate automatically when the transform or parenting changes.
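One way such a cache can work is a dirty flag that every write sets and the first read clears. The sketch below is an assumption-laden illustration, not SceneView's implementation: `CachedNode`, `recomputeCount`, and the translation-only "world position" are all invented for the example.

```kotlin
// Illustrative only: a node whose world position is recomputed lazily.
class CachedNode(val parent: CachedNode? = null) {
    private val local = floatArrayOf(0f, 0f, 0f)
    private val world = FloatArray(3)
    private var dirty = true
    private val children = mutableListOf<CachedNode>()

    // Counts real recomputations so the effect of the cache is observable.
    var recomputeCount = 0
        private set

    init { parent?.children?.add(this) }

    fun setLocalPosition(x: Float, y: Float, z: Float) {
        local[0] = x; local[1] = y; local[2] = z
        invalidate()
    }

    // A write invalidates this node and everything parented under it.
    private fun invalidate() {
        dirty = true
        for (child in children) child.invalidate()
    }

    // Cheap when clean: returns the cached array without recomputing.
    fun worldPosition(): FloatArray {
        if (dirty) {
            val p = parent?.worldPosition()
            for (i in 0..2) world[i] = local[i] + (p?.get(i) ?: 0f)
            dirty = false
            recomputeCount++
        }
        return world
    }
}
```

Reading `worldPosition()` a hundred times between writes recomputes once; a write on an ancestor forces exactly one more recomputation on the next read.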

Avoid per frame	Use instead	Why
`pose.transform` inside `onSessionUpdated`	`pose.toTransform(scratch)` with a reused `FloatArray(16)`, or just feed the pose to a node/anchor setter	The `Pose.transform` getter allocates a `FloatArray(16)` every read (#2266)
Allocating collections — `frame.getUpdatedPlanes().map { … }`, `listOf(...)`, `filter` — every frame	Hoist the references out of the callback; iterate without building new lists	`onSessionUpdated` runs every frame; per-frame collections are pure GC pressure

Anchor / PoseNode setters already do the matrix work internally — let them, rather than reading pose.transform yourself.
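Hoisting a collection out of the callback might look like the following sketch. `TrackedPlane` is a stand-in for an ARCore plane and `PlaneFilter` is a hypothetical helper; neither is part of ARCore or SceneView.

```kotlin
// Stand-in for an ARCore plane; only the fields this sketch needs.
class TrackedPlane(val id: Int, val isTracking: Boolean)

class PlaneFilter {
    // Hoisted out of the callback: one list, cleared and refilled each frame.
    private val tracking = ArrayList<TrackedPlane>(32)

    // Call from the per-frame callback. An indexed loop replaces filter { },
    // so no new list is built per frame.
    fun update(updated: List<TrackedPlane>): List<TrackedPlane> {
        tracking.clear()
        for (i in updated.indices) {
            val plane = updated[i]
            if (plane.isTracking) tracking.add(plane)
        }
        return tracking
    }
}
```

The returned list is only valid until the next `update` call, so callers should not keep a reference to it across frames.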

Avoid per frame	Use instead	Why
A fresh `Ray(origin, direction)` per pick/cast	Reuse one `Ray` and `setOrigin(...)` / `setDirection(...)`	Each `Ray` and each `getOrigin()` / `getDirection()` returns a new `Vector3` copy
Allocating per triangle in a mesh-intersection loop	Reuse scratch `Vector3` / arrays across the loop	Triangle loops run thousands of iterations per cast

Avoid per frame	Use instead	Why
Fresh `[x, y, z]` / `float3(...)` / mat4 array literals every `requestAnimationFrame` tick	Preallocate scratch arrays once and mutate in place before the call	Filament.js reads the array synchronously, so reuse is safe; fresh literals = GC sawtooth on Safari

SceneView's own OrbitCameraController already does this — it keeps eyeScratch / centerScratch / upScratch and rewrites them per frame instead of allocating (#2268). Follow the same pattern in your render loop.

Avoid per frame	Use instead	Why
Driving `@State` from a per-frame `onFrame` / RealityKit `update` closure	Mutate a reference box (a `class` you hold), or subscribe to scene events, and only flip `@State` when UI-visible state actually changes	A `@State` write every frame churns the SwiftUI view `body` — the per-frame loop should touch entities, not view state

One-line takeaway¶

Per-frame code reads cached values and writes whole transforms. If a getter's name implies a decomposition (world*, *.transform) or returns a fresh array or object, hoist it out of the loop or use its allocation-free overload, unless it is documented as cached (as the Android world* getters are).

Measuring Performance¶

Before optimizing, measure. Guessing where time is spent leads to wasted effort.

Frame budget¶

At 60fps your app has about 16.7ms per frame for everything: CPU logic, GPU rendering, Compose layout, and ARCore tracking. Anything over that budget causes dropped frames and visible jank.

The 16.7ms rule

60fps = 1000ms / 60 ≈ 16.7ms per frame. That includes CPU work, GPU rendering, and any Compose recomposition. Aim for headroom: target 12ms so spikes don't push you over.
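The budget arithmetic as a small Kotlin sketch. The function names and the idea of expressing headroom as a fraction are assumptions made for this example.

```kotlin
// Milliseconds available per frame at a given frame rate.
fun frameBudgetMs(fps: Int): Double = 1000.0 / fps

// Budget left after reserving a fraction of the frame as headroom for spikes.
fun targetMs(fps: Int, headroom: Double): Double =
    frameBudgetMs(fps) * (1.0 - headroom)
```

`frameBudgetMs(60)` is about 16.67ms, and reserving 28% headroom with `targetMs(60, 0.28)` leaves roughly the 12ms target mentioned above. At 30fps the budget doubles to about 33.3ms.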

Android Studio Profiler¶

Use the built-in profiler in Android Studio to identify bottlenecks:

CPU Profiler: look for long onFrame or onSessionUpdated calls, excessive allocations, or blocking I/O on the main thread.
GPU: Android Studio's profiler does not visualize overdraw, so enable Debug GPU overdraw in the device's Developer options (red = 4x or more overdraw). For long fragment shader times or GPU-bound frames, use a GPU tool such as Android GPU Inspector.
Memory Profiler: watch for repeated allocations each frame (GC pauses cause jank). Look for leaked ModelInstance or Material objects.

Filament debug stats¶

Enable Filament's built-in frame statistics to see draw calls, triangle counts, and GPU timing without leaving your app:

SceneView(
    engine = engine,
    modelLoader = modelLoader,
    // ...
) {
    // Access the Filament view for debug options
}

Quick debug overlay

Use adb shell dumpsys gfxinfo <package> for a quick frame-time histogram without any code changes.
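To track frame times from inside the app, one option is a fixed-size ring buffer fed from the per-frame callback. This is a sketch under assumptions: `FrameTimeStats` is a hypothetical helper, and it expects a monotonic nanosecond timestamp once per frame from whatever frame callback your app uses.

```kotlin
// Fixed-size ring buffer of frame durations; record() never allocates.
class FrameTimeStats(capacity: Int = 120) {
    private val samplesMs = FloatArray(capacity)
    private var count = 0
    private var next = 0
    private var lastNanos = -1L

    // Call once per frame with a monotonic timestamp in nanoseconds.
    fun record(frameTimeNanos: Long) {
        if (lastNanos >= 0) {
            samplesMs[next] = (frameTimeNanos - lastNanos) / 1_000_000f
            next = (next + 1) % samplesMs.size
            if (count < samplesMs.size) count++
        }
        lastNanos = frameTimeNanos
    }

    // Number of buffered frames that took longer than budgetMs.
    fun framesOver(budgetMs: Float): Int {
        var over = 0
        for (i in 0 until count) if (samplesMs[i] > budgetMs) over++
        return over
    }

    // Mean duration of the buffered frames, or 0 if none were recorded yet.
    fun averageMs(): Float {
        if (count == 0) return 0f
        var sum = 0f
        for (i in 0 until count) sum += samplesMs[i]
        return sum / count
    }
}
```

Query `framesOver` and `averageMs` from a debug overlay or a periodic log rather than every frame, so the measurement itself stays cheap.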

Model Optimization¶

Models are usually the biggest performance lever. A poorly optimized model can single-handedly destroy your frame rate.

Polygon count¶

Target	Triangle budget
Interactive objects	< 100K triangles
Hero/showcase models	< 200K triangles (high-end only)
Background/environment	< 50K triangles

Triangles add up fast

A single model might be 50K triangles, but if you have 10 in the scene that is 500K — well beyond mobile budgets. Always count total scene triangles.

Reduction tools:

Blender — Decimate modifier (collapse or un-subdivide)
meshoptimizer — meshopt_simplify for automated LOD generation
gltfpack — CLI tool that simplifies, compresses, and optimizes glTF/GLB files

Use LOD (Level of Detail) when available: show high-poly when the camera is close, swap to low-poly at distance. This can cut triangle count by 50-80% for complex scenes.
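Distance-based LOD selection can be sketched as a lookup over sorted thresholds. `LodLevel`, `selectLod`, and the distances and asset paths in the usage note are hypothetical; how you measure camera distance and swap the model depends on your scene.

```kotlin
// One level of detail: use assetPath while the camera is within maxDistance meters.
data class LodLevel(val maxDistance: Float, val assetPath: String)

// Levels must be sorted by ascending maxDistance; the last one is the fallback.
fun selectLod(levels: List<LodLevel>, cameraDistance: Float): LodLevel {
    require(levels.isNotEmpty()) { "at least one LOD level is required" }
    for (i in levels.indices) {
        if (cameraDistance <= levels[i].maxDistance) return levels[i]
    }
    return levels[levels.size - 1]
}
```

For example, with levels at 2m ("models/tree_high.glb"), 10m ("models/tree_mid.glb"), and a catch-all ("models/tree_low.glb"), a camera 5m away selects the mid model. Swapping only when the selected level changes, ideally with a small hysteresis band around each threshold, avoids reloading while the camera hovers near a boundary.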

Textures¶

Textures consume the most GPU memory and bandwidth on mobile.

Rule	Recommendation
Format	KTX2 with Basis Universal compression
Max size	2048x2048 for mobile (1024x1024 for low-end)
Mipmaps	Always enable for objects viewed at varying distances
Channels	Use single-channel textures for roughness/metallic, not full RGBA

KTX2 saves memory and load time

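KTX2 with Basis Universal (ETC1S or UASTC) typically cuts texture memory 4-8x compared to the uncompressed RGBA that PNG/JPEG images decode to, because the texture is transcoded to a GPU-compressed format and stays compressed in GPU memory. Convert with toktx --t2 --bcmp output.ktx2 input.png (toktx takes the output file first).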
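The 4-8x figure follows from bits per pixel. A rough estimator, assuming 32 bits per pixel for uncompressed RGBA8 and 8 or 4 bits per pixel for common GPU-compressed targets (for example ETC2 RGBA or ASTC 4x4 at 8, ETC2 RGB at 4); `textureBytes` is a hypothetical helper and ignores the block padding that compressed formats add to the smallest mip levels:

```kotlin
// Approximate GPU memory for a texture, optionally including its full mip chain.
fun textureBytes(width: Int, height: Int, bitsPerPixel: Int, mipmaps: Boolean): Long {
    var w = width
    var h = height
    var total = 0L
    while (true) {
        total += w.toLong() * h * bitsPerPixel / 8
        if (!mipmaps || (w == 1 && h == 1)) break
        // Each mip level halves both dimensions, clamped at 1 pixel.
        w = maxOf(1, w / 2)
        h = maxOf(1, h / 2)
    }
    return total
}
```

A 2048x2048 RGBA8 texture is 16 MiB without mipmaps and about 21.3 MiB with them; at 8 bits per pixel the base level drops to 4 MiB, and at 4 bits per pixel to 2 MiB.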

File size¶

Smaller files mean faster loading and less memory pressure:

Target	Size
Interactive models	< 10MB
Hero/showcase models	< 50MB
Quick-load previews	< 2MB

Optimization checklist:

Use GLB (binary glTF) instead of glTF + separate .bin/.png files
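Enable geometry compression for mesh data (Draco, or meshopt compression if you use the gltfpack -cc pass below; they are alternative encodings, so pick one)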
Strip unused animations, blend shapes, and extra UV sets
Run gltfpack as a final optimization pass

# Example: optimize a model with gltfpack
gltfpack -i model.glb -o model_optimized.glb -tc -cc -si 0.5
# -tc = convert textures to KTX2 (Basis Universal), -cc = meshopt geometry compression (higher ratio),
# -si 0.5 = simplify meshes to about 50% of the original triangle count

Scene Optimization¶

Limit draw calls¶

Each visible node typically generates one or more draw calls. On mobile, aim for fewer than 100 draw calls per frame.

Fewer separate nodes = fewer draw calls. Merge static geometry in your 3D tool before export.
Use instancing for repeated objects (trees, rocks, particles). Filament supports GPU instancing for identical meshes.
Frustum culling is automatic in Filament — objects outside the camera view are not rendered. But they still cost CPU time if they exist as nodes.

Merge before export

If you have 50 static building meshes in Blender, join them into one object before exporting to GLB. This turns 50 draw calls into 1.

Lights¶

Lights are one of the most expensive parts of a scene. Each additional light increases per-fragment shading cost.

Light type	Cost	Recommendation
Directional (sun)	Low	Use 1 as your main light
Point / Spot	Medium	Limit to 2-3 total
Shadow-casting	High	Limit to 1-2 lights with shadows

val mainLight = rememberMainLightNode(engine) {
    intensity = 100_000f
    // Shadows on the main light only
}

Use IBL instead of many point lights

Image-Based Lighting (IBL) from an HDR environment map provides realistic ambient lighting at nearly zero per-frame cost. One directional light + IBL covers most use cases better than 5+ point lights.

val environment = rememberEnvironment(environmentLoader) {
    createHDREnvironment("environments/studio.hdr")
        ?: createEnvironment(environmentLoader)
}

Post-processing¶

Post-processing effects look great but eat into your frame budget:

Effect	Typical cost	Notes
Bloom	~1-2ms	Acceptable on mid-tier and above
Depth of Field	~1-2ms	Use sparingly, mainly for screenshots
SSAO	~2-3ms	Most expensive — skip on low-end devices
Anti-aliasing (FXAA)	~0.5ms	Cheap, usually worth enabling

SSAO on budget devices

Screen-Space Ambient Occlusion is the most expensive post-process effect. On low-end devices it can take 3ms+ alone — nearly 20% of your frame budget. Disable it on devices below your mid-tier threshold.

Enable effects selectively based on device tier (see Device Tiers below).
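Choosing effects against a millisecond budget can be sketched as a greedy pass over a priority list. The `Effect` enum and `chooseEffects` are hypothetical, and the costs are simply the upper bounds from the table above; measure on your own target devices before relying on them.

```kotlin
// Upper-bound costs in milliseconds, taken from the post-processing table.
enum class Effect(val costMs: Float) {
    FXAA(0.5f), BLOOM(2f), DEPTH_OF_FIELD(2f), SSAO(3f)
}

// Enables effects in priority order while they still fit the remaining budget.
fun chooseEffects(priority: List<Effect>, budgetMs: Float): Set<Effect> {
    val enabled = LinkedHashSet<Effect>()
    var remaining = budgetMs
    for (effect in priority) {
        if (effect.costMs <= remaining) {
            enabled.add(effect)
            remaining -= effect.costMs
        }
    }
    return enabled
}
```

With the priority FXAA, Bloom, SSAO, Depth of Field and a 3ms post-processing budget, this enables FXAA and Bloom only. Run it once at startup or when the tier changes, not per frame.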

Compose Integration¶

SceneView is a Jetpack Compose library, so Compose performance rules apply directly.

Avoid unnecessary recompositions¶

Recomposition during rendering can cause frame drops. Follow these rules:

// BAD — creates new Position every recomposition, triggering node updates
SceneView(/* ... */) {
    ModelNode(
        modelInstance = model,
        position = Position(0f, 1f, 0f)  // new object every recomposition!
    )
}

// GOOD — stable reference, no unnecessary updates
val position = remember { Position(0f, 1f, 0f) }

SceneView(/* ... */) {
    ModelNode(
        modelInstance = model,
        position = position
    )
}

No allocations in the composition body

Never create new Position, Rotation, Scale, or Quaternion objects directly inside SceneView { } without remember. Each recomposition creates a new instance, causing the node to update on every recomposition even though the value has not changed.

Key rules:

Use remember for stable Position, Rotation, and Scale references
Use key on model instances to avoid unnecessary reload when list order changes
Use derivedStateOf when computing values from other state

The Filament Engine is expensive to create. Never create more than one.

// At the app/activity level
val engine = rememberEngine()
val modelLoader = rememberModelLoader(engine)
val materialLoader = rememberMaterialLoader(engine)
val environmentLoader = rememberEnvironmentLoader(engine)

// Share across all scenes via CompositionLocal or parameter passing
SceneView(engine = engine, modelLoader = modelLoader, /* ... */) { }

One Engine per app — not per screen

Creating multiple Engine instances wastes GPU memory and can cause crashes on devices with limited resources. Create one at the top level and pass it down.

Lazy loading¶

Load models on-demand rather than all at startup:

// rememberModelInstance loads asynchronously and returns null while loading
val model = rememberModelInstance(modelLoader, "models/character.glb")

SceneView(/* ... */) {
    if (model != null) {
        ModelNode(modelInstance = model)
    } else {
        // Show a placeholder while loading
        CubeNode(materialInstance = placeholderMaterial)
    }
}

rememberModelInstance handles async loading and main-thread marshalling correctly
Show placeholder geometry (a simple cube or spinner) while the model loads
For imperative code outside Compose, use modelLoader.loadModelInstanceAsync

Threading: Filament calls must be on the main thread

Never call modelLoader.createModel* or materialLoader.* from a background coroutine. rememberModelInstance handles this automatically. For imperative code, use loadModelInstanceAsync.

AR-Specific Optimization¶

AR adds the camera feed and ARCore tracking to your frame budget, leaving less room for rendering.

Camera frame processing¶

onSessionUpdated runs every single frame. Any work you do here directly impacts frame rate.

ARSceneView(
    // ...
    onSessionUpdated = { session, frame ->
        // FAST: read a cached value
        val planes = frame.getUpdatedPlanes()

        // SLOW: don't do this!
        // val bitmap = frame.acquireCameraImage().toBitmap()  // allocation + conversion
    }
)

No allocations in onSessionUpdated

This callback runs 30-60 times per second. Allocating objects here causes GC pressure and frame drops. Cache references outside the callback and reuse them.

Rules for onSessionUpdated:

Do not allocate objects (no listOf, map, filter on every frame)
Cache plane and anchor references outside the callback
If you need heavy processing (image analysis, ML), dispatch to a background thread and read results on the next frame
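The "dispatch to a background thread and read results on the next frame" rule might be sketched like this, using only JVM concurrency primitives. `BackgroundAnalyzer` is a hypothetical helper, not part of SceneView or ARCore, and it assumes the input you hand over is safe to use off the frame thread (for example a copy, not a frame-owned image that gets recycled).

```kotlin
import java.util.concurrent.ExecutorService
import java.util.concurrent.Executors
import java.util.concurrent.atomic.AtomicBoolean
import java.util.concurrent.atomic.AtomicReference

// Runs heavy work off the frame thread; the frame thread only polls the latest result.
class BackgroundAnalyzer<I, O>(
    private val analyze: (I) -> O,
    private val executor: ExecutorService = Executors.newSingleThreadExecutor()
) {
    private val busy = AtomicBoolean(false)
    private val latest = AtomicReference<O?>(null)

    // Called per frame: submits only if the previous job finished, so jobs never queue up.
    fun submitIfIdle(input: I): Boolean {
        if (!busy.compareAndSet(false, true)) return false
        executor.execute {
            try {
                latest.set(analyze(input))
            } finally {
                busy.set(false)
            }
        }
        return true
    }

    // Called per frame: returns the most recent result without blocking.
    fun latestResult(): O? = latest.get()

    fun shutdown() = executor.shutdown()
}
```

Frames that arrive while a job is running are skipped rather than queued, which keeps latency bounded at the cost of analyzing fewer frames.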

Plane rendering¶

The AR plane renderer draws detected surfaces with a semi-transparent overlay. This causes GPU overdraw — especially problematic when multiple planes overlap.

ARSceneView(
    planeRenderer = planeRenderer,
    // ...
)

// Disable after the user has placed their object
LaunchedEffect(objectPlaced) {
    if (objectPlaced) {
        planeRenderer.isVisible = false  // reduces overdraw
    }
}

Disable planes after placement

Once the user has placed an object, disable planeRenderer. This removes overdraw from plane visualization and saves 1-2ms per frame on most devices.

Device Tiers¶

Not all Android devices are equal. Adapt your scene complexity based on hardware capability.

Tier	Example devices	Triangle budget	Post-processing	Shadows
High	Pixel 8 Pro, Samsung S24, OnePlus 12	200K triangles	Full (Bloom, SSAO, DoF)	2 shadow-casting lights
Mid	Pixel 6a, Samsung A54, Pixel 7	100K triangles	Basic (Bloom only)	1 shadow-casting light
Low	Older budget phones, 2GB RAM devices	50K triangles	None	No shadows

Detecting device tier¶

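import android.app.ActivityManager
import android.content.Context

// Classifies the device by total RAM. totalMem reports the memory visible to the
// kernel, which is below the advertised size (a nominal 8GB phone reports less
// than 8 GiB), so the thresholds sit under the nominal marks. The exact margins
// (7.0 and 3.5) are assumptions; tune them against your own device set.
fun getDeviceTier(context: Context): DeviceTier {
    val activityManager = context.getSystemService(Context.ACTIVITY_SERVICE)
        as ActivityManager
    val memInfo = ActivityManager.MemoryInfo()
    activityManager.getMemoryInfo(memInfo)

    val totalRamGb = memInfo.totalMem / (1024.0 * 1024.0 * 1024.0)

    return when {
        totalRamGb >= 7.0 -> DeviceTier.HIGH  // nominal 8GB and up
        totalRamGb >= 3.5 -> DeviceTier.MID   // nominal 4GB and up
        else -> DeviceTier.LOW
    }
}

enum class DeviceTier { HIGH, MID, LOW }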

RAM is a proxy, not a guarantee

Total RAM is a rough proxy for device capability. For more accurate tiering, also consider the GPU model (GLES20.glGetString(GLES20.GL_RENDERER), which must be called on a thread with a current GL context), the Android version, and the Android Performance Tuner library.

Applying tiers¶

val tier = remember { getDeviceTier(context) }

SceneView(
    engine = engine,
    modelLoader = modelLoader,
    // ...
) {
    // Adjust quality based on tier
    val modelPath = when (tier) {
        DeviceTier.HIGH -> "models/character_high.glb"
        DeviceTier.MID -> "models/character_mid.glb"
        DeviceTier.LOW -> "models/character_low.glb"
    }

    val model = rememberModelInstance(modelLoader, modelPath)
    if (model != null) {
        ModelNode(modelInstance = model)
    }
}
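Beyond the model path, the rest of the device-tier table can be encoded as data so every quality decision reads from one place. This is a sketch: `QualitySettings` and `qualityFor` are hypothetical, and the enum is named `Tier` here only to keep the block self-contained; in the app it would be the `DeviceTier` defined earlier.

```kotlin
enum class Tier { HIGH, MID, LOW }

// Values mirror the device-tier table in this guide.
data class QualitySettings(
    val triangleBudget: Int,
    val bloom: Boolean,
    val ssao: Boolean,
    val depthOfField: Boolean,
    val shadowCastingLights: Int
)

fun qualityFor(tier: Tier): QualitySettings = when (tier) {
    Tier.HIGH -> QualitySettings(200_000, bloom = true, ssao = true,
        depthOfField = true, shadowCastingLights = 2)
    Tier.MID -> QualitySettings(100_000, bloom = true, ssao = false,
        depthOfField = false, shadowCastingLights = 1)
    Tier.LOW -> QualitySettings(50_000, bloom = false, ssao = false,
        depthOfField = false, shadowCastingLights = 0)
}
```

Resolve the settings once (for example inside `remember`) and pass them down, so tier checks are not scattered through the scene code.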

Performance Checklist¶

Use this checklist before shipping: