Format Specification

saimox

Spatial Audio Intelligent Meta & Object eXchange — an open, license-free format for immersive 3D audio built from real, discrete objects rather than bed-plus-offset encoding.

version 1.0 · MIME application/vnd.saimox+zip · .smx

01 Overview

Unlike Dolby Atmos, which encodes a bed plus channel differences, SAIMOX stores genuinely separate audio tracks that can be freely positioned and animated in 3D space.

Core principles

  • Real objects. Each track is a discrete audio object — not an offset from a downmix.
  • Codec-agnostic. Opus, FLAC, AAC or WAV, depending on the use case.
  • Binaural fallback. An optional stereo bed for devices without spatial audio.
  • Offline-first. Download, unpack, play — no streaming dependency.
  • Cross-platform. iOS, macOS, Android and web compatible.
  • License-free. No Dolby license, no patents, fully open.

02 Container

A .smx file is a renamed ZIP archive — like .docx, .epub or .apk.

# meditation.smx  (= ZIP archive)
├─ manifest.json          // required: format meta, codec info
├─ spatial.json           // required: tracks, movements, environment
├─ tracks/                // required: audio files
│   ├─ bed_left.opus      // optional: binaural fallback L
│   ├─ bed_right.opus     // optional: binaural fallback R
│   ├─ track_01.opus      // object track
│   └─ track_02.opus
├─ segments/              // optional: sparse audio chunks
└─ assets/                // optional: cover, waveforms

Compression: audio files are already compressed, so they're stored with STORE (no re-compression). JSON and assets use DEFLATE.
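The compression rule can be sketched with Python's standard zipfile module. This is a minimal illustration, not a reference packager: the helper name write_smx, the suffix list and the placeholder payloads are assumptions made for the example.

```python
import io
import json
import zipfile

# Assumption: audio entries are recognised by file suffix.
AUDIO_SUFFIXES = (".opus", ".flac", ".aac", ".wav")

def write_smx(target, files):
    """Write a {archive_path: bytes} mapping as a .smx (ZIP) archive."""
    with zipfile.ZipFile(target, "w") as archive:
        for path, payload in files.items():
            # Audio is stored as-is (STORE); JSON and assets are deflated.
            is_audio = path.lower().endswith(AUDIO_SUFFIXES)
            method = zipfile.ZIP_STORED if is_audio else zipfile.ZIP_DEFLATED
            archive.writestr(path, payload, compress_type=method)

# Placeholder bytes stand in for real audio data.
buffer = io.BytesIO()
write_smx(buffer, {
    "manifest.json": json.dumps({"saimox_version": "1.0"}).encode(),
    "spatial.json": json.dumps({"tracks": [], "movements": []}).encode(),
    "tracks/track_01.opus": b"\x00" * 64,
})

# Read the archive back and record the method used for each entry.
methods = {i.filename: i.compress_type for i in zipfile.ZipFile(buffer).infolist()}
```

Passing a file path instead of the in-memory buffer would produce a real meditation.smx on disk.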

03 manifest.json

Global metadata about the package — title, duration, codec and an integrity checksum.

{
  "saimox_version": "1.0",
  "package": {
    "title": "Ocean Meditation",
    "duration": 300.0,
    "category": "meditation",
    "license": "CC-BY-4.0"
  },
  "audio": {
    "codec": "opus",
    "sample_rate": 48000,
    "channels_per_track": 1,
    "total_tracks": 5
  },
  "compatibility": { "binaural_fallback": true }
}

Required fields

Field               Type    Description
saimox_version      string  Format version (semver)
package.title       string  Title of the piece
package.duration    float   Total duration in seconds
audio.codec         string  opus, flac, aac or wav
audio.total_tracks  int     Number of audio tracks
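A player might check these required fields before doing anything else. The sketch below is one possible validator; the function name validate_manifest and the wording of the error messages are hypothetical, and it assumes integer durations are acceptable where a float is expected.

```python
# Required fields and their expected Python types, taken from the table above.
REQUIRED_FIELDS = {
    "saimox_version": str,
    "package.title": str,
    "package.duration": (int, float),  # assumption: whole seconds are accepted
    "audio.codec": str,
    "audio.total_tracks": int,
}
ALLOWED_CODECS = {"opus", "flac", "aac", "wav"}

def validate_manifest(manifest):
    """Return a list of problems; an empty list means the manifest passes."""
    problems = []
    for dotted, expected in REQUIRED_FIELDS.items():
        node = manifest
        for key in dotted.split("."):
            node = node.get(key) if isinstance(node, dict) else None
        if node is None:
            problems.append(f"missing field: {dotted}")
        elif isinstance(node, bool) or not isinstance(node, expected):
            problems.append(f"wrong type for {dotted}")
    audio = manifest.get("audio")
    codec = audio.get("codec") if isinstance(audio, dict) else None
    if isinstance(codec, str) and codec not in ALLOWED_CODECS:
        problems.append(f"unsupported codec: {codec}")
    return problems

example = {
    "saimox_version": "1.0",
    "package": {"title": "Ocean Meditation", "duration": 300.0},
    "audio": {"codec": "opus", "total_tracks": 5},
}
```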

04 spatial.json

The heart of the format: every track, its movements over time, and the listening environment.

{
  "tracks": [
    { "id":"track_01", "filename":"tracks/track_01.opus",
      "type":"spatial_object", "spatial_enabled":true,
      "rendering_algorithm":"HRTF",
      "initial_position":{"x":0,"y":0,"z":1} }
  ],
  "movements": [
    { "track_id":"track_01",
      "keyframes":[
        { "time":0,  "position":{"x":-1,"y":0,"z":1}, "volume":0.8 },
        { "time":20, "position":{"x":1, "y":0,"z":-1} }
      ] }
  ],
  "playback": { "fade_in":2.0, "loop_enabled":true }
}
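Because movements point at tracks by id, a loader may want a consistency check before playback. The following sketch is illustrative only: check_spatial is a hypothetical helper, and the requirement that keyframes are sorted by time is an assumption the text above does not state.

```python
def check_spatial(spatial):
    """Return a list of problems found in a parsed spatial.json."""
    track_ids = {track["id"] for track in spatial.get("tracks", [])}
    problems = []
    for movement in spatial.get("movements", []):
        track_id = movement["track_id"]
        if track_id not in track_ids:
            problems.append(f"movement references unknown track: {track_id}")
        times = [keyframe["time"] for keyframe in movement.get("keyframes", [])]
        # Assumption: keyframes are expected in ascending time order.
        if times != sorted(times):
            problems.append(f"keyframes out of order for track: {track_id}")
    return problems

spatial_example = {
    "tracks": [{"id": "track_01", "filename": "tracks/track_01.opus"}],
    "movements": [{
        "track_id": "track_01",
        "keyframes": [
            {"time": 0, "position": {"x": -1, "y": 0, "z": 1}, "volume": 0.8},
            {"time": 20, "position": {"x": 1, "y": 0, "z": -1}},
        ],
    }],
}
```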

05 Track types

Two kinds of track: a non-spatial stereo bed, and freely positionable 3D objects.

Type            Spatial  Role
binaural_bed    no       Stereo foundation, runs straight to the mixer
spatial_object  yes      Mono object, positioned in 3D through the spatial renderer

Rendering algorithms

Algorithm          Description                     Use
HRTF               Head-Related Transfer Function  Realistic, CPU-heavy
sphericalHead      Simple spherical head model     Fast, less precise
equalPowerPanning  Standard panning                Fastest, minimal 3D

Note: spatial objects must be mono — only mono inputs can be placed in 3D. The stereo bed keeps its full image and never passes through the spatial renderer.

06 Coordinate system

The listener sits at the origin, facing +Z.

         +Y up
           │
           │
  ─────────┼─────────▶ +X right
          ╱│
         ╱ │
    +Z front

listener: (0, 0, 0) · facing +Z

Axis  Range      Meaning
X     −∞ … +∞    Left (−) / Right (+)
Y     −∞ … +∞    Down (−) / Up (+)
Z     −∞ … +∞    Back (−) / Front (+)

Distance     Meaning
0.0 – 0.5    Intimate (inside the head)
0.5 – 2.0    Near (personal space)
2.0 – 10.0   Mid (room)
10.0 +       Far (environment)
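As a rough illustration of the distance zones, the sketch below measures how far a position is from the listener at the origin and names the zone it falls in. The helper distance_zone is hypothetical, and treating each range as including its lower bound and excluding its upper bound is an assumption, since the table leaves the boundaries open.

```python
import math

# Upper bounds of each zone, from the distance table (boundary handling assumed).
ZONES = [(0.5, "intimate"), (2.0, "near"), (10.0, "mid")]

def distance_zone(position):
    """Return (distance, zone) for a position relative to the listener at the origin."""
    d = math.sqrt(position["x"] ** 2 + position["y"] ** 2 + position["z"] ** 2)
    for upper, name in ZONES:
        if d < upper:
            return d, name
    return d, "far"
```

For example, the initial position (0, 0, 1) from the spatial.json sample sits one unit in front of the listener, which lands in the near zone.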

07 Keyframe parameters

Each movement is a list of keyframes, interpolated at 60 fps. Only time and position are required; every other parameter keeps its value from the previous keyframe, or its default if it has not been set yet.

Parameter      Range               Default   Description
time           0 … duration        required  Moment in seconds
position       {x,y,z}             required  3D position
volume         0.0 – 1.0           previous  Loudness
pitch_shift    −24 … +24           0.0       Semitones
spread         0.0 – 1.0           0.0       Width (0 point → 1 diffuse)
distance       0.1 – 100           1.0       Simulated distance
reverb_blend   0.0 – 1.0           0.0       Reverb wet/dry
occlusion      0.0 – 1.0           0.0       Damping through material
interpolation  smooth/linear/step  smooth    Transition to next keyframe
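One way to read the keyframe rules is sketched below for position and volume only. Several details are assumptions rather than part of the format text: the helper names ease and sample, the use of a smoothstep curve for "smooth", a starting volume of 1.0 when the first keyframe omits it, and taking the interpolation mode from the keyframe at the start of each transition.

```python
from bisect import bisect_right

def ease(t, mode):
    """Map linear progress t in [0, 1] to a blend weight."""
    if mode == "step":
        return 0.0                      # hold the current keyframe until the next one
    if mode == "linear":
        return t
    return t * t * (3.0 - 2.0 * t)      # "smooth": smoothstep curve (assumption)

def sample(keyframes, time):
    """Return (position, volume) at `time` for keyframes sorted by time."""
    volume = 1.0                        # assumed value before any keyframe sets volume
    resolved = []
    for keyframe in keyframes:
        volume = keyframe.get("volume", volume)   # inherit the previous value
        resolved.append({**keyframe, "volume": volume})
    times = [keyframe["time"] for keyframe in resolved]
    i = bisect_right(times, time) - 1
    if i < 0:                           # before the first keyframe
        return dict(resolved[0]["position"]), resolved[0]["volume"]
    if i >= len(resolved) - 1:          # at or after the last keyframe
        return dict(resolved[-1]["position"]), resolved[-1]["volume"]
    a, b = resolved[i], resolved[i + 1]
    t = (time - a["time"]) / (b["time"] - a["time"])
    w = ease(t, a.get("interpolation", "smooth"))
    position = {axis: a["position"][axis] + w * (b["position"][axis] - a["position"][axis])
                for axis in "xyz"}
    return position, a["volume"] + w * (b["volume"] - a["volume"])

keyframes = [
    {"time": 0, "position": {"x": -1, "y": 0, "z": 1}, "volume": 0.8},
    {"time": 20, "position": {"x": 1, "y": 0, "z": -1}},
]
```

A 60 fps render loop would call sample(keyframes, frame / 60) once per frame and hand the result to the spatial renderer.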

08 Environment

Global reverb and distance attenuation shape the perceived space.

Reverb presets

none · smallRoom · mediumRoom · largeRoom · mediumHall · largeHall · cathedral · plate · chamber

Distance attenuation models

Model        Formula                                Use
linear       1 − rolloff × (d − ref) / (max − ref)  Simple
inverse      ref / (ref + rolloff × (d − ref))      Natural
exponential  pow(d / ref, −rolloff)                 Realistic
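The three formulas translate directly into code. In this sketch the function name attenuation and the default values for ref, max_d and rolloff are assumptions, as is clamping the distance so that sources closer than the reference distance are not boosted.

```python
def attenuation(model, d, ref=1.0, max_d=100.0, rolloff=1.0):
    """Return the gain factor for a source at distance d."""
    d = max(d, ref)  # assumption: no gain boost inside the reference distance
    if model == "linear":
        d = min(d, max_d)  # assumption: gain stays constant beyond max_d
        return 1.0 - rolloff * (d - ref) / (max_d - ref)
    if model == "inverse":
        return ref / (ref + rolloff * (d - ref))
    if model == "exponential":
        return (d / ref) ** -rolloff
    raise ValueError(f"unknown attenuation model: {model}")
```

With these defaults, doubling the distance from 1 to 2 halves the gain under both the inverse and the exponential model, while the linear model barely changes at that range.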

09 Playback

Load, then render — with a clean fallback path for devices without spatial audio.

Load

  • Download the .smx file and unpack the ZIP.
  • Parse manifest.json → verify codec and track count.
  • Parse spatial.json → load tracks and movements.
  • Position tracks at their initial coordinates and start playback.
  • Interpolate keyframes at 60 fps; update position and volume in real time.
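The first four load steps might look roughly like this. The sketch reads entries straight from the archive rather than unpacking to disk, and it makes two assumptions the text does not confirm: that total_tracks counts the entries in the spatial.json tracks list, and that a track without initial_position starts at the origin. The name load_smx is hypothetical.

```python
import io
import json
import zipfile

def load_smx(source):
    """Read both JSON files from a .smx and cross-check them."""
    with zipfile.ZipFile(source) as archive:
        manifest = json.loads(archive.read("manifest.json"))
        spatial = json.loads(archive.read("spatial.json"))
        names = set(archive.namelist())

    tracks = spatial["tracks"]
    # Assumption: total_tracks counts the entries in spatial.json "tracks".
    if len(tracks) != manifest["audio"]["total_tracks"]:
        raise ValueError("track count does not match manifest")
    missing = [t["filename"] for t in tracks if t["filename"] not in names]
    if missing:
        raise ValueError(f"missing audio files: {missing}")

    # Assumption: tracks without an initial position start at the listener.
    origin = {"x": 0, "y": 0, "z": 0}
    positions = {t["id"]: t.get("initial_position", origin) for t in tracks}
    return manifest, spatial, positions

# Usage with a tiny in-memory package (placeholder audio bytes).
package = io.BytesIO()
with zipfile.ZipFile(package, "w") as z:
    z.writestr("manifest.json", json.dumps(
        {"saimox_version": "1.0", "audio": {"codec": "opus", "total_tracks": 1}}))
    z.writestr("spatial.json", json.dumps({
        "tracks": [{"id": "track_01", "filename": "tracks/track_01.opus",
                    "initial_position": {"x": 0, "y": 0, "z": 1}}],
        "movements": []}))
    z.writestr("tracks/track_01.opus", b"\x00" * 16)

manifest, spatial, positions = load_smx(package)
```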

Fallback logic

if device.supportsSpatialAudio {
  play(objects)            // spatial objects through HRTF
} else if binaural_fallback {
  play(bed_left, bed_right) // stereo bed
} else {
  downmixToStereo(objects)  // last resort
}
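The same decision can be written as a small runnable function. This sketch assumes the fallback flag is read from compatibility.binaural_fallback in manifest.json and that the device capability arrives as a boolean; the function name and the returned labels are hypothetical.

```python
def choose_playback_path(supports_spatial_audio, manifest):
    """Return 'objects', 'bed' or 'downmix', mirroring the fallback order."""
    if supports_spatial_audio:
        return "objects"   # spatial objects through the spatial renderer
    compatibility = manifest.get("compatibility", {})
    if compatibility.get("binaural_fallback", False):
        return "bed"       # pre-rendered stereo bed
    return "downmix"       # last resort: fold the objects down to stereo
```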

10 File sizes

Five minutes, five object tracks — SAIMOX stays small because it leans on open, efficient codecs.

Format                       Size      Factor
ADM BWF (uncompressed)       ~2.76 GB  (baseline)
SAIMOX WAV                   ~216 MB   12× smaller
SAIMOX FLAC                  ~110 MB   25× smaller
SAIMOX Opus 128k             ~24 MB    115× smaller
SAIMOX Opus 96k, 50% sparse  ~9 MB     300× smaller
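The WAV and Opus rows can be reproduced with simple arithmetic. The sketch assumes 48 kHz mono tracks at 24 bits per sample for WAV (an inference from the ~216 MB figure, not something the table states), constant bitrates for Opus, and decimal megabytes; the function names are hypothetical.

```python
def pcm_megabytes(seconds, tracks, sample_rate=48000, bytes_per_sample=3):
    """Size of uncompressed mono PCM tracks in MB (24-bit samples assumed)."""
    return seconds * tracks * sample_rate * bytes_per_sample / 1e6

def coded_megabytes(seconds, tracks, kbps, active_fraction=1.0):
    """Size of constant-bitrate tracks in MB; active_fraction models sparse audio."""
    return seconds * tracks * kbps * 1000 / 8 * active_fraction / 1e6

wav_mb = pcm_megabytes(300, 5)                                # 216.0
opus_128_mb = coded_megabytes(300, 5, 128)                    # 24.0
opus_96_sparse_mb = coded_megabytes(300, 5, 96, active_fraction=0.5)  # 9.0
```

Dividing the ~2.76 GB baseline by these sizes gives roughly the factors listed in the table.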

11 Roadmap

saimox 1.0 is deliberately offline-first — download, unpack, play. Streaming is planned, and object-based audio has a real structural advantage here: discrete objects can be prioritised and loaded independently, which a bed-plus-offset stream cannot do.

Offline playback (now · v1.0)

The full .smx is fetched and unpacked, then played locally. Simple, robust, no network dependency during playback.

Range-request streaming (planned · v1.1)

Audio is stored without ZIP-level compression (STORE), so each track occupies a contiguous, addressable byte range inside the archive. After reading the central directory once, a client can fetch individual tracks via HTTP range requests; the container stays intact but becomes streamable.
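To illustrate why STORE matters here, the sketch below computes the byte range of one track's payload inside the archive. It works on a seekable local file object; a streaming client would presumably fetch the central directory and the 30-byte local header over range requests instead. The helper name stored_byte_range is hypothetical, and since v1.1 is only planned, this is a guess at the mechanics rather than the specified behaviour.

```python
import io
import struct
import zipfile

def stored_byte_range(source, member):
    """Return (first_byte, last_byte) of a STORE-d member's payload in the archive."""
    with zipfile.ZipFile(source) as archive:
        info = archive.getinfo(member)          # entry from the central directory
    if info.compress_type != zipfile.ZIP_STORED:
        raise ValueError(f"{member} is not stored uncompressed")
    # The local file header has 30 fixed bytes; the file name length and
    # extra field length are two little-endian uint16 values at offset 26.
    source.seek(info.header_offset + 26)
    name_len, extra_len = struct.unpack("<HH", source.read(4))
    first = info.header_offset + 30 + name_len + extra_len
    return first, first + info.file_size - 1

# Usage: build a small archive in memory and locate the track's bytes.
payload = b"OpusHead" + bytes(range(56))        # placeholder audio bytes
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as z:
    z.writestr("manifest.json", b"{}", compress_type=zipfile.ZIP_DEFLATED)
    z.writestr("tracks/track_01.opus", payload, compress_type=zipfile.ZIP_STORED)

first, last = stored_byte_range(buffer, "tracks/track_01.opus")
range_header = f"bytes={first}-{last}"          # value for an HTTP Range header
```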

Segment streaming (planned · v1.1)

The existing sparse_segments mechanism extends naturally into HLS-style just-in-time buffering: load time-indexed chunks a second or two ahead, with adaptive bitrate per object.

Adaptive object streaming (planned · v2.0)

Each track already carries essential and genre_role. On a weak connection a player can load only the essential objects — the foundation always plays — and pull optional accent objects when bandwidth allows. Prioritised, per-object delivery that a single downmixed stream can't match.

12 License

The SAIMOX format is fully open and license-free.

  • No patents.
  • No royalties.
  • No restrictions on commercial use.
  • Built on open codecs: the Opus reference implementation is BSD-licensed.

Real 3D audio. No Dolby. No license. No bullshit.