Format Specification
SAIMOX
Spatial Audio Intelligent Meta & Object eXchange — an open, license-free format for immersive 3D audio built from real, discrete objects rather than bed-plus-offset encoding.
version 1.0 · MIME application/vnd.saimox+zip · .smx
01 Overview
Unlike consumer Dolby Atmos delivery, which transmits a downmix plus side information for reconstructing objects, SAIMOX stores genuinely separate audio tracks that can be freely positioned and animated in 3D space.
Core principles
- Real objects. Each track is a discrete audio object — not an offset from a downmix.
- Codec-agnostic. Opus, FLAC, AAC or WAV, depending on the use case.
- Binaural fallback. An optional stereo bed for devices without spatial audio.
- Offline-first. Download, unpack, play — no streaming dependency.
- Cross-platform. iOS, macOS, Android and web compatible.
- License-free. No Dolby license, no patents, fully open.
02 Container
A .smx file is a renamed ZIP archive — like .docx, .epub or .apk.
# meditation.smx (= ZIP archive)
├─ manifest.json        // required: format meta, codec info
├─ spatial.json         // required: tracks, movements, environment
├─ tracks/              // required: audio files
│  ├─ bed_left.opus     // optional: binaural fallback L
│  ├─ bed_right.opus    // optional: binaural fallback R
│  ├─ track_01.opus     // object track
│  └─ track_02.opus
├─ segments/            // optional: sparse audio chunks
└─ assets/              // optional: cover, waveforms
Compression: audio files are stored with STORE (no ZIP compression), since Opus, FLAC and AAC are already compressed and a second pass gains almost nothing. JSON and assets use DEFLATE.
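The compression rule can be sketched with Python's standard zipfile module. This is a minimal illustration, not a reference packer: the helper name write_smx and the extension list are assumptions, and the track bytes are placeholders rather than real Opus data.

```python
import io
import json
import zipfile

# Assumption: audio is recognised by file extension; only .opus appears in the spec.
AUDIO_EXTENSIONS = (".opus", ".flac", ".aac", ".wav")


def write_smx(target, files):
    """Write a {archive_path: bytes} mapping into a .smx (ZIP) container."""
    with zipfile.ZipFile(target, "w") as archive:
        for path, data in files.items():
            is_audio = path.lower().endswith(AUDIO_EXTENSIONS)
            # STORE for audio, DEFLATE for JSON and other assets.
            method = zipfile.ZIP_STORED if is_audio else zipfile.ZIP_DEFLATED
            archive.writestr(path, data, compress_type=method)


files = {
    "manifest.json": json.dumps({"saimox_version": "1.0"}).encode(),
    "spatial.json": json.dumps({"tracks": []}).encode(),
    "tracks/track_01.opus": b"\x00" * 64,  # placeholder bytes, not real Opus
}
buffer = io.BytesIO()
write_smx(buffer, files)
```

Passing a file path instead of the in-memory buffer would write a real meditation.smx to disk.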
03 manifest.json
Global metadata about the package: title, duration, codec and an integrity checksum. The example below omits the checksum field.
{
"saimox_version": "1.0",
"package": {
"title": "Ocean Meditation",
"duration": 300.0,
"category": "meditation",
"license": "CC-BY-4.0"
},
"audio": {
"codec": "opus",
"sample_rate": 48000,
"channels_per_track": 1,
"total_tracks": 5
},
"compatibility": { "binaural_fallback": true }
}
Required fields
| Field | Type | Description |
|---|---|---|
| saimox_version | string | Format version (semver) |
| package.title | string | Title of the piece |
| package.duration | float | Total duration in seconds |
| audio.codec | string | opus, flac, aac or wav |
| audio.total_tracks | int | Number of audio tracks |
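A player could check the required fields before touching any audio. The sketch below is one possible validator, assuming the dotted names in the table map to nested JSON keys; the function name validate_manifest and the wording of the problem messages are hypothetical.

```python
# Required fields from the table, keyed by dotted path, with accepted Python types.
REQUIRED_FIELDS = {
    "saimox_version": str,
    "package.title": str,
    "package.duration": (int, float),
    "audio.codec": str,
    "audio.total_tracks": int,
}
SUPPORTED_CODECS = {"opus", "flac", "aac", "wav"}


def validate_manifest(manifest):
    """Return a list of problems; an empty list means the manifest passed."""
    problems = []
    for dotted, expected in REQUIRED_FIELDS.items():
        value = manifest
        for key in dotted.split("."):
            value = value.get(key) if isinstance(value, dict) else None
        if value is None:
            problems.append(f"missing field: {dotted}")
        elif isinstance(value, bool) or not isinstance(value, expected):
            # bool is rejected explicitly because it is a subclass of int.
            problems.append(f"wrong type for {dotted}")
    audio = manifest.get("audio")
    codec = audio.get("codec") if isinstance(audio, dict) else None
    if isinstance(codec, str) and codec not in SUPPORTED_CODECS:
        problems.append(f"unsupported codec: {codec}")
    return problems


example = {
    "saimox_version": "1.0",
    "package": {"title": "Ocean Meditation", "duration": 300.0},
    "audio": {"codec": "opus", "total_tracks": 5},
}
problems = validate_manifest(example)
```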
04 spatial.json
The heart of the format: every track, its movements over time, and the listening environment.
{
"tracks": [
{ "id":"track_01", "filename":"tracks/track_01.opus",
"type":"spatial_object", "spatial_enabled":true,
"rendering_algorithm":"HRTF",
"initial_position":{"x":0,"y":0,"z":1} }
],
"movements": [
{ "track_id":"track_01",
"keyframes":[
{ "time":0, "position":{"x":-1,"y":0,"z":1}, "volume":0.8 },
{ "time":20, "position":{"x":1, "y":0,"z":-1} }
] }
],
"playback": { "fade_in":2.0, "loop_enabled":true }
}
05 Track types
Two kinds of track: a non-spatial stereo bed, and freely positionable 3D objects.
| Type | Spatial | Role |
|---|---|---|
| binaural_bed | no | Stereo foundation, runs straight to the mixer |
| spatial_object | yes | Mono object, positioned in 3D through the spatial renderer |
Rendering algorithms
| Algorithm | Description | Use |
|---|---|---|
| HRTF | Head-Related Transfer Function | Realistic, CPU-heavy |
| sphericalHead | Simple spherical head model | Fast, less precise |
| equalPowerPanning | Standard panning | Fastest, minimal 3D |
Note: spatial objects must be mono — only mono inputs can be placed in 3D. The stereo bed keeps its full image and never passes through the spatial renderer.
06 Coordinate system
The listener sits at the origin, facing +Z.
| Axis | Range | Meaning |
|---|---|---|
| X | −∞ … +∞ | Left (−) / Right (+) |
| Y | −∞ … +∞ | Down (−) / Up (+) |
| Z | −∞ … +∞ | Back (−) / Front (+) |
| Distance | Meaning |
|---|---|
| 0.0 – 0.5 | Intimate (inside the head) |
| 0.5 – 2.0 | Near (personal space) |
| 2.0 – 10.0 | Mid (room) |
| 10.0 + | Far (environment) |
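To make the axes and distance bands concrete, here is a small sketch that classifies a position and derives its horizontal angle. Both helpers are hypothetical and not part of the format. The sketch assumes each band includes its lower bound, and that azimuth is measured from +Z (front) with positive angles toward +X (right).

```python
import math


def distance_zone(position):
    """Map a position to the distance band it falls in (lower bound inclusive)."""
    d = math.sqrt(position["x"] ** 2 + position["y"] ** 2 + position["z"] ** 2)
    if d < 0.5:
        return "intimate"
    if d < 2.0:
        return "near"
    if d < 10.0:
        return "mid"
    return "far"


def azimuth_degrees(position):
    """Horizontal angle: 0 is straight ahead (+Z), +90 is hard right (+X)."""
    return math.degrees(math.atan2(position["x"], position["z"]))


front = {"x": 0, "y": 0, "z": 1}        # initial_position from the spatial.json example
front_left = {"x": -1, "y": 0, "z": 1}  # first keyframe from the spatial.json example
zone = distance_zone(front)             # "near"
angle = azimuth_degrees(front_left)     # about -45 degrees, i.e. front-left
```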
07 Keyframe parameters
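Each movement is a list of keyframes, which the player interpolates at 60 fps. Only time and position are required. Any other parameter left out of a keyframe keeps its value from the previous keyframe; until it has been set, the default in the table applies.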
| Parameter | Range | Default | Description |
|---|---|---|---|
| time | 0 … duration | (required) | Moment in seconds |
| position | {x,y,z} | (required) | 3D position |
| volume | 0.0 – 1.0 | previous | Loudness |
| pitch_shift | −24 … +24 | 0.0 | Semitones |
| spread | 0.0 – 1.0 | 0.0 | Width (0 point → 1 diffuse) |
| distance | 0.1 – 100 | 1.0 | Simulated distance |
| reverb_blend | 0.0 – 1.0 | 0.0 | Reverb wet/dry |
| occlusion | 0.0 – 1.0 | 0.0 | Damping through material |
| interpolation | smooth / linear / step | smooth | Transition to next keyframe |
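Keyframe inheritance and interpolation can be sketched as follows. Several details are assumptions, because the specification does not define them: "smooth" is taken to be a smoothstep ease, a first keyframe without volume starts at 1.0, and the interpolation mode of the earlier keyframe governs the transition to the next one. The function names are hypothetical.

```python
DEFAULTS = {"volume": 1.0, "pitch_shift": 0.0, "spread": 0.0, "distance": 1.0,
            "reverb_blend": 0.0, "occlusion": 0.0, "interpolation": "smooth"}


def resolve(keyframes):
    """Fill omitted parameters from the previous keyframe, else from DEFAULTS."""
    resolved, current = [], dict(DEFAULTS)
    for frame in sorted(keyframes, key=lambda f: f["time"]):
        current = {**current, **frame}
        resolved.append(current)
    return resolved


def ease(u, mode):
    """Map linear progress u in [0, 1) to eased progress for the given mode."""
    if mode == "step":
        return 0.0                        # hold the earlier value until the next keyframe
    if mode == "smooth":
        return u * u * (3.0 - 2.0 * u)    # smoothstep (assumed meaning of "smooth")
    return u                              # linear


def sample(keyframes, t):
    """Return the interpolated position and volume at time t (seconds)."""
    frames = resolve(keyframes)
    if t <= frames[0]["time"]:
        a = b = frames[0]
    elif t >= frames[-1]["time"]:
        a = b = frames[-1]
    else:
        index = next(i for i, f in enumerate(frames) if f["time"] > t)
        a, b = frames[index - 1], frames[index]
    span = b["time"] - a["time"]
    u = ease((t - a["time"]) / span, a["interpolation"]) if span > 0 else 0.0

    def mix(p, q):
        return p + (q - p) * u

    return {
        "position": {axis: mix(a["position"][axis], b["position"][axis]) for axis in "xyz"},
        "volume": mix(a["volume"], b["volume"]),
    }


# Keyframes from the spatial.json example: front-left to back-right over 20 s.
keyframes = [
    {"time": 0, "position": {"x": -1, "y": 0, "z": 1}, "volume": 0.8},
    {"time": 20, "position": {"x": 1, "y": 0, "z": -1}},
]
halfway = sample(keyframes, 10)  # second keyframe inherits volume 0.8
```

A player running at 60 fps would call sample once per frame with the current playback time and hand the result to its renderer.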
08 Environment
Global reverb and distance attenuation shape the perceived space.
Reverb presets
none · smallRoom · mediumRoom · largeRoom · mediumHall · largeHall · cathedral · plate · chamber
Distance attenuation models
| Model | Formula | Use |
|---|---|---|
| linear | 1 − rolloff × (d − ref) / (max − ref) | Simple |
| inverse | ref / (ref + rolloff × (d − ref)) | Natural |
| exponential | pow(d / ref, −rolloff) | Realistic |
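The three formulas translate directly into code. In this sketch d is the distance to the listener, ref the reference distance at which gain is 1.0, max the distance beyond which the linear model stays silent, and rolloff the falloff strength. The default values and the clamping of d are assumptions, since the table gives neither.

```python
def attenuation(model, d, ref=1.0, max_distance=100.0, rolloff=1.0):
    """Gain factor for a source at distance d under the named model."""
    d = max(d, ref)  # assumption: no boost for sources closer than ref
    if model == "linear":
        d = min(d, max_distance)  # assumption: gain stays at its floor beyond max
        return 1.0 - rolloff * (d - ref) / (max_distance - ref)
    if model == "inverse":
        return ref / (ref + rolloff * (d - ref))
    if model == "exponential":
        return (d / ref) ** -rolloff
    raise ValueError(f"unknown attenuation model: {model}")


# With ref=1 and rolloff=1, doubling the distance halves the gain
# in both the inverse and the exponential model.
inverse_gain = attenuation("inverse", 2.0)
exponential_gain = attenuation("exponential", 2.0)
linear_gain = attenuation("linear", 50.5)
```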
09 Playback
Load, then render — with a clean fallback path for devices without spatial audio.
Load
- Download the .smx file and unpack the ZIP.
- Parse manifest.json → verify codec and track count.
- Parse spatial.json → load tracks and movements.
- Position tracks at their initial coordinates and start playback.
- Interpolate keyframes at 60 fps; update position and volume in real time.
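The parsing and verification steps above can be sketched as a loader. It stops short of audio decoding and playback, which depend on the platform. The function name load_smx is hypothetical, and the sketch assumes that total_tracks counts the entries in the tracks array of spatial.json, which the specification does not state.

```python
import io
import json
import zipfile

SUPPORTED_CODECS = {"opus", "flac", "aac", "wav"}


def load_smx(source):
    """Unpack a .smx container and return (manifest, spatial, start_positions)."""
    with zipfile.ZipFile(source) as archive:
        manifest = json.loads(archive.read("manifest.json"))
        spatial = json.loads(archive.read("spatial.json"))
        names = set(archive.namelist())

    codec = manifest["audio"]["codec"]
    if codec not in SUPPORTED_CODECS:
        raise ValueError(f"unsupported codec: {codec}")

    tracks = spatial["tracks"]
    # Assumption: total_tracks counts the entries in spatial.json "tracks".
    if len(tracks) != manifest["audio"]["total_tracks"]:
        raise ValueError("track count does not match the manifest")
    missing = [t["filename"] for t in tracks if t["filename"] not in names]
    if missing:
        raise ValueError(f"missing audio files: {missing}")

    # Starting coordinates per track; tracks without one (e.g. a bed) are skipped.
    start_positions = {
        t["id"]: t["initial_position"] for t in tracks if "initial_position" in t
    }
    return manifest, spatial, start_positions


# Build a tiny in-memory package to exercise the loader.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as archive:
    archive.writestr("manifest.json", json.dumps(
        {"saimox_version": "1.0", "audio": {"codec": "opus", "total_tracks": 1}}))
    archive.writestr("spatial.json", json.dumps({"tracks": [
        {"id": "track_01", "filename": "tracks/track_01.opus",
         "type": "spatial_object", "initial_position": {"x": 0, "y": 0, "z": 1}}]}))
    archive.writestr("tracks/track_01.opus", b"placeholder")

manifest, spatial, start_positions = load_smx(demo)
```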
Fallback logic
if device.supportsSpatialAudio {
    play(objects)                // spatial objects through HRTF
} else if binaural_fallback {
    play(bed_left, bed_right)    // stereo bed
} else {
    downmixToStereo(objects)     // last resort
}
10 File sizes
Five minutes, five object tracks — SAIMOX stays small because it leans on open, efficient codecs.
| Format | Size | Factor |
|---|---|---|
| ADM BWF (uncompressed) | ~2.76 GB | 1× |
| SAIMOX WAV | ~216 MB | 12× smaller |
| SAIMOX FLAC | ~110 MB | 25× smaller |
| SAIMOX Opus 128k | ~24 MB | 115× smaller |
| SAIMOX Opus 96k, 50% sparse | ~9 MB | 300× smaller |
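Three of the SAIMOX rows can be reproduced with simple arithmetic. The sketch assumes 48 kHz, 24-bit mono PCM for the WAV row (the bit depth is not stated, but it yields the listed figure), reads "50% sparse" as half of the audio being present, and ignores container overhead. The FLAC figure depends on the material and is not computed. The ~2.76 GB baseline is not derived in the source; it happens to equal 64 such PCM channels, which is only a guess at its basis.

```python
DURATION_S = 300  # five minutes
TRACKS = 5


def package_bytes(bits_per_second, sparse_fraction=1.0):
    """Audio payload in bytes for all tracks, ignoring container overhead."""
    per_track = bits_per_second * DURATION_S // 8
    return int(TRACKS * per_track * sparse_fraction)


pcm_bps = 48_000 * 24  # assumption: 48 kHz, 24-bit mono PCM per track
sizes_mb = {
    "wav": package_bytes(pcm_bps) / 1e6,
    "opus_128k": package_bytes(128_000) / 1e6,
    "opus_96k_sparse": package_bytes(96_000, sparse_fraction=0.5) / 1e6,
}
```

Under these assumptions sizes_mb comes out at 216, 24 and 9 MB, matching the table.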
11 Roadmap
SAIMOX 1.0 is deliberately offline-first: download, unpack, play. Streaming is planned, and object-based audio has a real structural advantage here, because discrete objects can be prioritised and loaded independently, which a bed-plus-offset stream cannot do.
Offline playback (now · v1.0)
The full .smx is fetched and unpacked, then played locally. Simple, robust, no network dependency during playback.
Range-request streaming (planned · v1.1)
Audio is stored without ZIP compression (STORE), so each track occupies a contiguous, addressable byte range inside the archive. After reading the central directory once, a client can fetch individual tracks via HTTP range requests. The container stays intact but becomes streamable.
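How a client might locate a track's byte range can be sketched with the standard zipfile and struct modules. This is an illustration under assumptions: an in-memory buffer stands in for the remote file, and the helper name stored_byte_range is hypothetical. A real client would first fetch the end of the file to read the central directory, then the few header bytes read here.

```python
import io
import struct
import zipfile


def stored_byte_range(container, name):
    """Return (start, end) byte offsets of a STORE entry's data, end exclusive."""
    with zipfile.ZipFile(container) as archive:
        info = archive.getinfo(name)  # parsed from the central directory
    if info.compress_type != zipfile.ZIP_STORED:
        raise ValueError(f"{name} is not stored with STORE")
    # The local file header has 30 fixed bytes; the lengths of the file name
    # and extra field sit at offsets 26 and 28 and precede the data.
    container.seek(info.header_offset + 26)
    name_len, extra_len = struct.unpack("<HH", container.read(4))
    start = info.header_offset + 30 + name_len + extra_len
    return start, start + info.compress_size


payload = b"placeholder-audio-bytes"  # not real Opus data
container = io.BytesIO()
with zipfile.ZipFile(container, "w") as archive:
    archive.writestr("manifest.json", "{}", compress_type=zipfile.ZIP_DEFLATED)
    archive.writestr("tracks/track_01.opus", payload, compress_type=zipfile.ZIP_STORED)

start, end = stored_byte_range(container, "tracks/track_01.opus")
range_header = f"bytes={start}-{end - 1}"  # HTTP byte ranges are inclusive
```

The bytes between start and end are the track file itself, so the response to that range request can be passed straight to a decoder.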
Segment streaming (planned · v1.1)
The existing sparse_segments mechanism extends naturally into HLS-style just-in-time buffering: load time-indexed chunks a second or two ahead, with adaptive bitrate per object.
Adaptive object streaming (planned · v2.0)
Each track already carries the essential and genre_role fields. On a weak connection a player can load only the essential objects, so the foundation always plays, and pull optional accent objects when bandwidth allows. This is prioritised, per-object delivery that a single downmixed stream can't match.
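One way a player might apply this priority rule is sketched below. It assumes essential is a boolean on each track entry and that every track costs the same fixed bitrate. Neither is specified, and the function name select_tracks and the sample track list are hypothetical.

```python
def select_tracks(tracks, budget_kbps, kbps_per_track):
    """Pick track ids to load: all essential tracks, then optional ones that fit."""
    essential = [t for t in tracks if t.get("essential")]
    optional = [t for t in tracks if not t.get("essential")]
    chosen = list(essential)  # the foundation always plays, even over budget
    remaining = budget_kbps - kbps_per_track * len(essential)
    for track in optional:
        if remaining >= kbps_per_track:
            chosen.append(track)
            remaining -= kbps_per_track
    return [t["id"] for t in chosen]


tracks = [
    {"id": "track_01", "essential": True},
    {"id": "track_02", "essential": False},
    {"id": "track_03", "essential": False},
]
weak = select_tracks(tracks, budget_kbps=260, kbps_per_track=128)
strong = select_tracks(tracks, budget_kbps=400, kbps_per_track=128)
```

On the weak connection only track_01 and track_02 fit; with more bandwidth all three load.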
12 License
The SAIMOX format is fully open and license-free.
- No patents.
- No royalties.
- No restrictions on commercial use.
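- Built on open codecs: the Opus reference implementation is BSD-licensed. AAC is the exception among the supported codecs, since it carries its own patent licensing terms.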
Real 3D audio. No Dolby. No license. No bullshit.