Leaving Three.js Behind: Parsing and Rendering glTF Scenes Elegantly with Native WebGPU

2026/7/3 16:33:05

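In the WebGL era, writing a complete glTF/GLB loader directly against the native API was quite tedious. In the WebGPU era, thanks to a more modern pipeline design, clearer memory management, and WGSL, the task is still challenging, but its logical structure is more elegant and intuitive.

This article does not rely on Three.js, Babylon.js, or any other high-level 3D engine. Using the native WebGPU API, it walks through parsing a GLB binary stream from scratch and feeding it into the GPU rendering pipeline.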

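I. Why parse glTF with native WebGPU?

Tight control over performance and size: engines such as Three.js often weigh in at hundreds of kilobytes or more once bundled. A hand-written parser loads only what is needed, can stay within a few kilobytes, and lets you tailor memory reuse to your workload.
Direct memory mapping: WebGPU's GPUBuffer and glTF's bufferView share much of the same design philosophy. With ArrayBuffer and TypedArray views in modern JS, data can flow from file to upload in a nearly "zero-copy" fashion.
Deep control over the pipeline: writing the parser by hand forces you to understand vertex layouts, bind group organization, the transform hierarchy, and other low-level graphics details.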
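
The near zero-copy point can be illustrated with plain typed arrays. This is a minimal sketch: `viewOfBufferView` is a hypothetical helper name, and the bufferView object is assumed to carry the glTF `byteOffset` and `byteLength` fields.

```javascript
// Hypothetical helper: expose a glTF bufferView as a typed-array view.
// No bytes are copied; the view aliases the original ArrayBuffer.
function viewOfBufferView(binaryBuffer, bufferView) {
    const byteOffset = bufferView.byteOffset || 0;
    return new Uint8Array(binaryBuffer, byteOffset, bufferView.byteLength);
}

// Demo data: 16 bytes standing in for a BIN chunk.
const bin = new ArrayBuffer(16);
new Float32Array(bin).set([1.5, 2.5, 3.5, 4.5]);

const view = viewOfBufferView(bin, { byteOffset: 4, byteLength: 8 });
// Reinterpret the same bytes as floats, still without copying.
const floats = new Float32Array(view.buffer, view.byteOffset, view.byteLength / 4);
console.log(Array.from(floats)); // [2.5, 3.5]
```

The only unavoidable copy in this flow is the upload into GPU memory itself.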

II. Core workflow: from file to screen

Parsing and rendering a glTF scene goes through roughly five stages:

[GLB/GLTF file] ──> [Parse binary header & JSON] ──> [Rebuild buffers & mesh topology]
                                                                     │
[Render loop] <── [Build GPU pipeline & bind groups] <───────────────┘

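III. Step 1: Parse the GLB binary container

GLB is glTF's compact binary packaging format. It consists of three parts: a header, a JSON chunk, and a BIN chunk:

Header (12 bytes): the magic number (0x46546C67, i.e. ASCII "glTF"), the version, and the total file length.
Chunk 0 (JSON): stores the scene structure, node tree, material parameters, and so on.
Chunk 1 (BIN): stores the raw binary data for vertices, normals, UVs, indices, and so on.

Below is native JS code that parses a GLB: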

async function parseGLB(arrayBuffer) {
    const dataView = new DataView(arrayBuffer);

    // 1. Validate the header (all fields are little-endian)
    const magic = dataView.getUint32(0, true);
    const version = dataView.getUint32(4, true);
    if (magic !== 0x46546C67) { // ASCII "glTF"
        throw new Error("Invalid GLB file format");
    }
    if (version !== 2) {
        throw new Error(`Unsupported GLB version: ${version}`);
    }

    // 2. Read the chunks in a loop
    let offset = 12;
    let json = null;
    let binaryBuffer = null;

    while (offset < arrayBuffer.byteLength) {
        const chunkLength = dataView.getUint32(offset, true);
        const chunkType = dataView.getUint32(offset + 4, true);
        // slice() copies the chunk so it gets its own ArrayBuffer starting at offset 0
        const chunkData = arrayBuffer.slice(offset + 8, offset + 8 + chunkLength);

        if (chunkType === 0x4E4F534A) { // "JSON" chunk
            const decoder = new TextDecoder("utf-8");
            json = JSON.parse(decoder.decode(chunkData));
        } else if (chunkType === 0x004E4942) { // "BIN" chunk
            binaryBuffer = chunkData;
        }
        // Chunk lengths are already padded to a multiple of 4 bytes
        offset += 8 + chunkLength;
    }

    return { json, binaryBuffer };
}
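
To exercise a parser like the one above without a model file, a tiny GLB can be assembled in memory. The following is a sketch under assumptions: `buildGLB` and `readGLBHeader` are hypothetical helper names, and the layout follows the header and chunk structure described in this section (JSON padded with spaces, BIN padded with zeros).

```javascript
// Hypothetical test helper: pack a JSON object and a binary payload into a GLB.
function buildGLB(json, binBytes) {
    const pad4 = n => (n + 3) & ~3;
    const jsonBytes = new TextEncoder().encode(JSON.stringify(json));
    const jsonLen = pad4(jsonBytes.length);
    const binLen = pad4(binBytes.length);
    const total = 12 + 8 + jsonLen + 8 + binLen;

    const out = new Uint8Array(total);
    const dv = new DataView(out.buffer);
    dv.setUint32(0, 0x46546C67, true);  // magic "glTF"
    dv.setUint32(4, 2, true);           // version
    dv.setUint32(8, total, true);       // total file length

    dv.setUint32(12, jsonLen, true);
    dv.setUint32(16, 0x4E4F534A, true); // chunk type "JSON"
    out.fill(0x20, 20, 20 + jsonLen);   // pad the JSON chunk with spaces
    out.set(jsonBytes, 20);

    const binStart = 20 + jsonLen;
    dv.setUint32(binStart, binLen, true);
    dv.setUint32(binStart + 4, 0x004E4942, true); // chunk type "BIN\0"
    out.set(binBytes, binStart + 8);    // remaining bytes stay zero as padding
    return out.buffer;
}

// Read back just the 12-byte header.
function readGLBHeader(arrayBuffer) {
    const dv = new DataView(arrayBuffer);
    return {
        magic: dv.getUint32(0, true),
        version: dv.getUint32(4, true),
        length: dv.getUint32(8, true)
    };
}

const glb = buildGLB({ asset: { version: "2.0" } }, new Uint8Array([1, 2, 3]));
const header = readGLBHeader(glb);
console.log(header.magic === 0x46546C67, header.version, header.length === glb.byteLength);
```

Passing the resulting buffer to parseGLB should give back the same JSON object and a 4-byte binaryBuffer (three payload bytes plus one byte of padding).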

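IV. Step 2: Mapping WebGPU vertex buffers (GPUBuffer)

glTF's core design has accessors and bufferViews point at different slices of the raw binary data. These slices need to be uploaded accurately into WebGPU GPUBuffers.

One key design decision arises here: create a separate buffer per attribute, or share one large buffer?
In WebGPU, sharing one large buffer and reading from it by offset is the more elegant choice and usually means fewer allocations. The code below takes a simpler middle path and creates one GPUBuffer per bufferView.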

function createGPUBuffers(device, json, binaryBuffer) {
    const gpuBuffers = [];

    // Map each glTF bufferView directly to one WebGPU GPUBuffer.
    // Assumes every bufferView refers to the single GLB BIN chunk (view.buffer is not checked).
    for (const view of json.bufferViews) {
        const byteLength = view.byteLength;
        const byteOffset = view.byteOffset || 0;
        
        // Decide the buffer usage
        let usage = GPUBufferUsage.COPY_DST;
        if (view.target === 34962) { // GL_ARRAY_BUFFER (vertex data)
            usage |= GPUBufferUsage.VERTEX;
        } else if (view.target === 34963) { // GL_ELEMENT_ARRAY_BUFFER (index data)
            usage |= GPUBufferUsage.INDEX;
        } else {
            // bufferViews without a target: allow both vertex and index use
            usage |= GPUBufferUsage.VERTEX | GPUBufferUsage.INDEX;
        }

        const gpuBuffer = device.createBuffer({
            size: (byteLength + 3) & ~3, // a buffer mapped at creation needs a size that is a multiple of 4
            usage: usage,
            mappedAtCreation: true
        });

        // Write the BIN data into the GPUBuffer's mapped memory
        const writeArray = new Uint8Array(gpuBuffer.getMappedRange());
        const sourceArray = new Uint8Array(binaryBuffer, byteOffset, byteLength);
        writeArray.set(sourceArray);
        gpuBuffer.unmap();

        gpuBuffers.push(gpuBuffer);
    }

    return gpuBuffers;
}
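
The shared-buffer alternative mentioned in this section needs an offset plan before anything is uploaded. A minimal sketch, assuming each bufferView is given a 4-byte-aligned slot; `planSharedBuffer` is a hypothetical name, not part of any API.

```javascript
// Hypothetical planner: assign each bufferView a 4-byte-aligned slot
// inside one shared allocation.
function planSharedBuffer(bufferViews) {
    const align4 = n => (n + 3) & ~3;
    const offsets = [];
    let cursor = 0;
    for (const view of bufferViews) {
        offsets.push(cursor);
        cursor += align4(view.byteLength);
    }
    return { totalSize: cursor, offsets };
}

const plan = planSharedBuffer([
    { byteLength: 36 }, // e.g. three float32x3 positions
    { byteLength: 6 },  // e.g. three uint16 indices, padded to 8
    { byteLength: 24 }
]);
console.log(plan.totalSize, plan.offsets); // 68 and offsets 0, 36, 44
```

With such a plan, one GPUBuffer with VERTEX | INDEX | COPY_DST usage could hold all slices, and each recorded offset would then be passed to setVertexBuffer or setIndexBuffer at draw time.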

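V. Step 3: Parse meshes and materials, build render units

In glTF, a mesh contains one or more primitives. Each primitive corresponds to one draw or drawIndexed call in WebGPU.

To render cleanly, the information in each accessor (such as componentType and type) must be converted into formats that WebGPU recognizes (such as float32x3 or uint16).

1. Format mapping table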

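const GL_TYPE_MAP = {
    5120: "sint8",
    5121: "uint8",
    5122: "sint16",
    5123: "uint16",
    5125: "uint32",
    5126: "float32"
};

const TYPE_SIZE_MAP = {
    SCALAR: 1,
    VEC2: 2,
    VEC3: 3,
    VEC4: 4,
    MAT4: 16
};

function getVertexFormat(type, componentType) {
    const baseType = GL_TYPE_MAP[componentType];
    const size = TYPE_SIZE_MAP[type];
    // WebGPU names single-component formats without a suffix ("float32", not "float32x1")
    if (size === 1) return baseType;
    // Note: 8- and 16-bit types have no x3 vertex format, and matrices are not
    // vertex formats at all; such accessors need repacking before use.
    return `${baseType}x${size}`;
}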
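
Alongside the format name, a vertex layout also needs the byte size of one element so that a tightly packed stride can be derived when byteStride is absent. The following is a sketch, not the article's own code; `elementByteSize` and `resolveStride` are hypothetical names.

```javascript
// Byte size of one component, keyed by glTF componentType.
const COMPONENT_BYTES = { 5120: 1, 5121: 1, 5122: 2, 5123: 2, 5125: 4, 5126: 4 };
// Number of components per element, keyed by glTF accessor type.
const COMPONENT_COUNT = { SCALAR: 1, VEC2: 2, VEC3: 3, VEC4: 4, MAT4: 16 };

// Size in bytes of a single accessor element.
function elementByteSize(accessor) {
    return COMPONENT_BYTES[accessor.componentType] * COMPONENT_COUNT[accessor.type];
}

// byteStride, when present, wins; otherwise the data is tightly packed.
function resolveStride(accessor, bufferView) {
    return bufferView.byteStride || elementByteSize(accessor);
}

console.log(resolveStride({ componentType: 5126, type: "VEC3" }, {}));              // 12
console.log(resolveStride({ componentType: 5123, type: "VEC2" }, {}));              // 4
console.log(resolveStride({ componentType: 5126, type: "VEC3" }, { byteStride: 32 })); // 32
```

Deriving the stride from the component size avoids the assumption that every attribute uses 4-byte components.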

2. Building the primitive data structure

Iterate over each mesh's primitives and record their vertex buffer layouts for WebGPU:

function parseMeshes(json, gpuBuffers) {
    const meshes = [];

    for (const mesh of json.meshes) {
        const primitives = mesh.primitives.map(prim => {
            const attributes = prim.attributes;
            const vertexLayouts = [];
            const vertexBuffersBind = [];

            let shaderSlot = 0;
            let vertexCount = 0;

            // 1. Handle vertex attributes such as position, normal, and UV.
            // Note: locations follow the key order of prim.attributes, so that order
            // must match the @location(N) inputs declared in the shader.
            for (const [semantic, accessorIdx] of Object.entries(attributes)) {
                const accessor = json.accessors[accessorIdx];
                const viewIdx = accessor.bufferView;
                const bufferView = json.bufferViews[viewIdx];
                const gpuBuffer = gpuBuffers[viewIdx];

                const format = getVertexFormat(accessor.type, accessor.componentType);
                vertexCount = accessor.count;

                vertexLayouts.push({
                    // Simplification: without byteStride, assume 4-byte components
                    arrayStride: bufferView.byteStride || (TYPE_SIZE_MAP[accessor.type] * 4),
                    attributes: [{
                        shaderLocation: shaderSlot++, // matches @location(N) in WGSL
                        // Simplification: WebGPU requires offset + element size <= arrayStride,
                        // so this only holds when the accessor starts within the first stride
                        offset: accessor.byteOffset || 0,
                        format: format
                    }]
                });

                vertexBuffersBind.push({
                    buffer: gpuBuffer,
                    offset: 0
                });
            }

            // 2. Handle indices, if present.
            // Note: WebGPU has no 8-bit index format; uint8 (5121) indices must be widened before upload.
            let indexData = null;
            if (prim.indices !== undefined) {
                const accessor = json.accessors[prim.indices];
                const viewIdx = accessor.bufferView;
                indexData = {
                    buffer: gpuBuffers[viewIdx],
                    format: accessor.componentType === 5123 ? "uint16" : "uint32",
                    count: accessor.count,
                    offset: accessor.byteOffset || 0
                };
            }

            return {
                vertexLayouts,
                vertexBuffersBind,
                indexData,
                vertexCount
            };
        });

        meshes.push({ name: mesh.name, primitives });
    }

    return meshes;
}
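
The index handling above only distinguishes 16-bit from 32-bit indices, while glTF also allows 8-bit indices (componentType 5121), which WebGPU cannot consume directly. One possible workaround, sketched here with a hypothetical helper name, is to widen them to 16 bits on the CPU before upload.

```javascript
// Hypothetical helper: widen 8-bit indices so they can be uploaded
// as a "uint16" index buffer.
function widenUint8Indices(binaryBuffer, byteOffset, count) {
    const src = new Uint8Array(binaryBuffer, byteOffset, count);
    // Uint16Array.from converts element by element into a new array.
    return Uint16Array.from(src);
}

// Demo: one unrelated leading byte, then four 8-bit indices.
const raw = new Uint8Array([9, 0, 1, 2, 255]).buffer;
const wide = widenUint8Indices(raw, 1, 4);
console.log(Array.from(wide), wide.BYTES_PER_ELEMENT); // [0, 1, 2, 255] 2
```

The widened array lives in its own memory, so it would need its own GPUBuffer, with the size rounded up to a multiple of 4 bytes when the index count is odd.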

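VI. Step 4: Handle the scene graph and matrix alignment

glTF has a strict tree of nodes. Each node can carry a translation, a rotation (quaternion), and a scale, or alternatively a 16-element matrix. We need to traverse this tree and compute the global transform matrix of every mesh node.

1. Computing the matrix hierarchy

A library such as gl-matrix can handle the matrix math. World matrices are computed recursively: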

import { mat4, vec3, quat } from 'gl-matrix';

function computeWorldMatrices(json) {
    const worldMatrices = new Array(json.nodes.length).fill(null).map(() => mat4.create());

    function traverse(nodeIdx, parentMatrix) {
        const node = json.nodes[nodeIdx];
        const localMatrix = mat4.create();

        if (node.matrix) {
            mat4.copy(localMatrix, node.matrix);
        } else {
            const t = node.translation || vec3.fromValues(0, 0, 0);
            const r = node.rotation || quat.fromValues(0, 0, 0, 1);
            const s = node.scale || vec3.fromValues(1, 1, 1);
            mat4.fromRotationTranslationScale(localMatrix, r, t, s);
        }

        const worldMatrix = mat4.create();
        mat4.multiply(worldMatrix, parentMatrix, localMatrix);
        worldMatrices[nodeIdx] = worldMatrix;

        if (node.children) {
            for (const childIdx of node.children) {
                traverse(childIdx, worldMatrix);
            }
        }
    }

    // Start the traversal from the root nodes of the active scene
    const activeScene = json.scenes[json.scene || 0];
    const identity = mat4.create();
    for (const rootNodeIdx of activeScene.nodes) {
        traverse(rootNodeIdx, identity);
    }

    return worldMatrices;
}
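
For readers who want to see what the TRS branch computes, here is a dependency-free sketch of composing a column-major matrix from translation, rotation, and scale. `composeTRS` is a hypothetical name; the formula is the standard quaternion-to-matrix expansion and assumes a unit quaternion in glTF's [x, y, z, w] order.

```javascript
// Compose a column-major 4x4 matrix from translation t, unit quaternion
// q = [x, y, z, w], and scale s (equivalent to T * R * S).
function composeTRS(t, q, s) {
    const [x, y, z, w] = q;
    const x2 = x + x, y2 = y + y, z2 = z + z;
    const xx = x * x2, xy = x * y2, xz = x * z2;
    const yy = y * y2, yz = y * z2, zz = z * z2;
    const wx = w * x2, wy = w * y2, wz = w * z2;
    return new Float32Array([
        (1 - (yy + zz)) * s[0], (xy + wz) * s[0], (xz - wy) * s[0], 0, // column 0
        (xy - wz) * s[1], (1 - (xx + zz)) * s[1], (yz + wx) * s[1], 0, // column 1
        (xz + wy) * s[2], (yz - wx) * s[2], (1 - (xx + yy)) * s[2], 0, // column 2
        t[0], t[1], t[2], 1                                            // column 3
    ]);
}

const m = composeTRS([1, 2, 3], [0, 0, 0, 1], [2, 2, 2]);
console.log(m[0], m[5], m[10], m[12], m[13], m[14]); // 2 2 2 1 2 3
```

The result is a Float32Array of 16 values, which is also the shape that queue.writeBuffer expects for a mat4x4<f32>.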

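2. WebGPU uniform memory alignment (a key pitfall)

When uploading matrices to WebGPU, note that mat4x4<f32> in WGSL is aligned to 16 bytes (WGSL's layout rules for uniform buffers resemble std140 but are not identical to it).
In other words, even if your uniform buffer only holds a couple of matrices and a vec3, their placement in memory must be computed exactly:

projectionViewMatrix: mat4x4<f32> -> 64 bytes (offset 0)
modelMatrix: mat4x4<f32> -> 64 bytes (offset 64)
color: vec3<f32> -> 12 bytes (offset 128) with 16-byte alignment. The struct size is then rounded up to a multiple of 16 (144 bytes), so 4 bytes of padding follow unless a 4-byte member such as an f32 is placed there. This member is only an illustration; the buffer below holds just the two matrices.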

// Create the uniform buffer (holds the camera matrix and the model matrix)
const uniformBufferSize = 64 + 64; // ProjectionView + Model
const uniformBuffer = device.createBuffer({
    size: uniformBufferSize,
    usage: GPUBufferUsage.UNIFORM | GPUBufferUsage.COPY_DST
});
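
The offsets listed above can be derived mechanically from each type's alignment and size. This is a minimal sketch covering only a few WGSL types; `layoutStruct` is a hypothetical helper, and the extra uniform-buffer constraints on nested structs and arrays are deliberately left out.

```javascript
// Alignment and size (in bytes) of a few WGSL types.
const WGSL_TYPES = {
    "f32":         { align: 4,  size: 4 },
    "vec2<f32>":   { align: 8,  size: 8 },
    "vec3<f32>":   { align: 16, size: 12 },
    "vec4<f32>":   { align: 16, size: 16 },
    "mat4x4<f32>": { align: 16, size: 64 }
};

// Round n up to the next multiple of k.
const roundUp = (k, n) => Math.ceil(n / k) * k;

// members: array of [name, wgslType] pairs in declaration order.
function layoutStruct(members) {
    let cursor = 0;
    let structAlign = 1;
    const offsets = {};
    for (const [name, type] of members) {
        const { align, size } = WGSL_TYPES[type];
        cursor = roundUp(align, cursor); // each member starts at its own alignment
        offsets[name] = cursor;
        cursor += size;
        structAlign = Math.max(structAlign, align);
    }
    // The struct size is rounded up to the largest member alignment.
    return { offsets, size: roundUp(structAlign, cursor) };
}

const layout = layoutStruct([
    ["projectionViewMatrix", "mat4x4<f32>"],
    ["modelMatrix", "mat4x4<f32>"],
    ["color", "vec3<f32>"]
]);
console.log(layout.offsets.modelMatrix, layout.offsets.color, layout.size); // 64 128 144
```

Note that a struct of a vec3<f32> followed by an f32 comes out at 16 bytes in total, because the f32 fits in the 4 bytes after the vec3.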

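VII. Step 5: Write the WGSL shader

To match the vertex layout above, the WGSL shader should stay generic and take its data from vertex inputs and a uniform buffer.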

struct Uniforms {
    projectionViewMatrix : mat4x4<f32>,
    modelMatrix : mat4x4<f32>,
};

@group(0) @binding(0) var<uniform> uniforms : Uniforms;

struct VertexInput {
    @location(0) position : vec3<f32>,
    @location(1) normal : vec3<f32>,
};

struct VertexOutput {
    @builtin(position) Position : vec4<f32>,
    @location(0) fragNormal : vec3<f32>,
};

@vertex
fn vs_main(input : VertexInput) -> VertexOutput {
    var output : VertexOutput;
    
    // Compute the position and normal in world space
    let worldPosition = uniforms.modelMatrix * vec4<f32>(input.position, 1.0);
    output.Position = uniforms.projectionViewMatrix * worldPosition;
    
    // Crudely rotate the normal into world space (ignores non-uniform scale)
    output.fragNormal = (uniforms.modelMatrix * vec4<f32>(input.normal, 0.0)).xyz;
    
    return output;
}

@fragment
fn fs_main(input : VertexOutput) -> @location(0) vec4<f32> {
    // Basic N dot L Lambert lighting with a 0.1 ambient floor
    let lightDirection = normalize(vec3<f32>(0.5, 1.0, 0.3));
    let normal = normalize(input.fragNormal);
    let diffuse = max(dot(normal, lightDirection), 0.1);
    
    let objectColor = vec3<f32>(0.8, 0.8, 0.8);
    return vec4<f32>(objectColor * diffuse, 1.0);
}
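
The shader comment notes that normals break under non-uniform scale. The usual remedy is a normal matrix, the inverse transpose of the model matrix's upper-left 3x3, computed on the CPU and passed as an extra uniform. A sketch in JavaScript, with `normalMatrixFromMat4` as a hypothetical name:

```javascript
// Normal matrix = inverse transpose of the upper-left 3x3 of a column-major mat4.
// Returns a 9-element array in the same flat order, or null if the matrix is singular.
function normalMatrixFromMat4(m) {
    const a0 = m[0], a1 = m[1], a2 = m[2];
    const a3 = m[4], a4 = m[5], a5 = m[6];
    const a6 = m[8], a7 = m[9], a8 = m[10];

    // Cofactors of the 3x3 block.
    const c0 = a4 * a8 - a5 * a7;
    const c1 = a5 * a6 - a3 * a8;
    const c2 = a3 * a7 - a4 * a6;
    const c3 = a2 * a7 - a1 * a8;
    const c4 = a0 * a8 - a2 * a6;
    const c5 = a1 * a6 - a0 * a7;
    const c6 = a1 * a5 - a2 * a4;
    const c7 = a2 * a3 - a0 * a5;
    const c8 = a0 * a4 - a1 * a3;

    const det = a0 * c0 + a1 * c1 + a2 * c2;
    if (det === 0) return null;
    // The inverse transpose equals the cofactor matrix divided by the determinant.
    return [c0, c1, c2, c3, c4, c5, c6, c7, c8].map(c => c / det);
}

// A scale of (2, 1, 1) stretches X, so normals must shrink along X.
const stretched = [2, 0, 0, 0,  0, 1, 0, 0,  0, 0, 1, 0,  0, 0, 0, 1];
const n = normalMatrixFromMat4(stretched);
console.log(n[0], n[4], n[8]); // 0.5 1 1
```

If this were added to the uniform struct as a mat3x3<f32>, each of its columns would be padded to 16 bytes, so the upload would need 48 bytes rather than 36.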

VIII. Step 6: Draw and the render loop

Now we tie together the parsed vertex buffers, the world matrices, the render pipeline, and the bind groups:

// Assumes device, context, pipeline, json, worldMatrices and parsedMeshes are already initialized.
// Each mesh node gets its own uniform slot: every queue.writeBuffer call takes effect before the
// submitted pass runs, so a single shared slot would leave all draws with the last matrix written.
const UNIFORM_SLOT = 256; // default minUniformBufferOffsetAlignment
const meshNodes = [...json.nodes.entries()].filter(([, node]) => node.mesh !== undefined);
const nodeUniformBuffer = device.createBuffer({
    size: UNIFORM_SLOT * Math.max(meshNodes.length, 1),
    usage: GPUBufferUsage.UNIFORM | GPUBufferUsage.COPY_DST
});
const bindGroups = meshNodes.map((_, slot) => device.createBindGroup({
    layout: pipeline.getBindGroupLayout(0),
    entries: [{
        binding: 0,
        resource: { buffer: nodeUniformBuffer, offset: slot * UNIFORM_SLOT, size: 128 }
    }]
}));

function render() {
    // 1. Fetch the camera matrix for this frame
    const vpMatrix = getCameraViewProjectionMatrix();

    const commandEncoder = device.createCommandEncoder();
    const renderPassDescriptor = {
        colorAttachments: [{
            view: context.getCurrentTexture().createView(),
            clearValue: { r: 0.1, g: 0.1, b: 0.1, a: 1.0 },
            loadOp: 'clear',
            storeOp: 'store'
        }]
    };

    const passEncoder = commandEncoder.beginRenderPass(renderPassDescriptor);
    passEncoder.setPipeline(pipeline);

    // 2. Walk the glTF mesh nodes and draw them
    meshNodes.forEach(([nodeIdx, node], slot) => {
        // Write the camera matrix (offset 0) and this node's world matrix (offset 64) into its slot
        device.queue.writeBuffer(nodeUniformBuffer, slot * UNIFORM_SLOT, vpMatrix);
        device.queue.writeBuffer(nodeUniformBuffer, slot * UNIFORM_SLOT + 64, worldMatrices[nodeIdx]);
        passEncoder.setBindGroup(0, bindGroups[slot]);

        const parsedMesh = parsedMeshes[node.mesh];
        for (const prim of parsedMesh.primitives) {
            // Bind the vertex buffers
            for (let i = 0; i < prim.vertexBuffersBind.length; i++) {
                const { buffer, offset } = prim.vertexBuffersBind[i];
                passEncoder.setVertexBuffer(i, buffer, offset);
            }

            // Issue the draw
            if (prim.indexData) {
                passEncoder.setIndexBuffer(prim.indexData.buffer, prim.indexData.format, prim.indexData.offset);
                passEncoder.drawIndexed(prim.indexData.count);
            } else {
                passEncoder.draw(prim.vertexCount);
            }
        }
    });

    passEncoder.end();
    device.queue.submit([commandEncoder.finish()]);
    
    requestAnimationFrame(render);
}
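
The render loop assumes that a pipeline already exists. One way it might be assembled from a primitive's vertexLayouts is sketched below. `buildPipelineDescriptor` is a hypothetical helper that only builds the plain descriptor object; the entry point names follow the WGSL in the previous section, and depth testing is left out because the render pass above has no depth attachment.

```javascript
// Hypothetical helper: build a GPURenderPipelineDescriptor as a plain object.
function buildPipelineDescriptor(shaderModule, vertexLayouts, colorFormat) {
    return {
        layout: "auto", // lets pipeline.getBindGroupLayout(0) derive the layout from the shader
        vertex: {
            module: shaderModule,
            entryPoint: "vs_main",
            buffers: vertexLayouts // one entry per setVertexBuffer slot
        },
        fragment: {
            module: shaderModule,
            entryPoint: "fs_main",
            targets: [{ format: colorFormat }]
        },
        primitive: {
            topology: "triangle-list",
            cullMode: "back" // glTF front faces are counter-clockwise, WebGPU's default
        }
    };
}

// Example layouts for a position and a normal stream (float32x3 each).
const exampleLayouts = [
    { arrayStride: 12, attributes: [{ shaderLocation: 0, offset: 0, format: "float32x3" }] },
    { arrayStride: 12, attributes: [{ shaderLocation: 1, offset: 0, format: "float32x3" }] }
];
const descriptor = buildPipelineDescriptor({ label: "stand-in module" }, exampleLayouts, "bgra8unorm");
console.log(descriptor.vertex.buffers.length, descriptor.fragment.targets[0].format); // 2 bgra8unorm
```

In a browser the descriptor would be passed to device.createRenderPipeline, with the module coming from device.createShaderModule and the format from navigator.gpu.getPreferredCanvasFormat(). Without a depth attachment, draw order alone decides which surface ends up visible.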

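IX. Optimization directions for a production-grade renderer

The implementation above covers the core thread of parsing glTF with native WebGPU. In a real production project, matching the rendering quality and performance of a mature engine also means handling the following three advanced topics: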

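1. Loading materials and textures (PBR Standard Material)

glTF materials follow the metallic-roughness PBR (physically based rendering) model. PNG/JPEG images embedded in or referenced by the glTF can be decoded (for example with createImageBitmap) and copied into a WebGPU 2D texture with copyExternalImageToTexture; KTX2 images have to be transcoded first and uploaded another way, such as writeTexture. The WGSL side then needs matching sampler and texture_2d<f32> bindings in a bind group.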
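
As a rough sketch of that flow for an image embedded in the BIN chunk: `embeddedImageBytes` and `uploadImage` are hypothetical names, the glTF fields used (`images`, `bufferView`, `mimeType`) are the standard ones, and `uploadImage` only works in a browser with a WebGPU device.

```javascript
// Locate the encoded bytes of an embedded glTF image without copying them.
// Returns null for images referenced by URI, which need to be fetched instead.
function embeddedImageBytes(json, binaryBuffer, imageIdx) {
    const image = json.images[imageIdx];
    if (image.bufferView === undefined) return null;
    const view = json.bufferViews[image.bufferView];
    return {
        mimeType: image.mimeType,
        bytes: new Uint8Array(binaryBuffer, view.byteOffset || 0, view.byteLength)
    };
}

// Browser-only: decode the bytes and copy the pixels into a 2D texture.
async function uploadImage(device, bytes, mimeType) {
    const bitmap = await createImageBitmap(new Blob([bytes], { type: mimeType }));
    const texture = device.createTexture({
        size: [bitmap.width, bitmap.height, 1],
        format: "rgba8unorm", // base color maps would normally use "rgba8unorm-srgb"
        // copyExternalImageToTexture needs COPY_DST and RENDER_ATTACHMENT on the destination
        usage: GPUTextureUsage.TEXTURE_BINDING | GPUTextureUsage.COPY_DST | GPUTextureUsage.RENDER_ATTACHMENT
    });
    device.queue.copyExternalImageToTexture(
        { source: bitmap },
        { texture },
        [bitmap.width, bitmap.height]
    );
    return texture;
}
```

The returned texture would then be exposed to the shader through texture.createView() together with a sampler from device.createSampler().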

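2. Shared memory and batching (Draw Call Batching)

If a scene contains many repeated geometries, use WebGPU instancing at parse time. Pack the transform matrices of multiple nodes into one large GPUBuffer and index them in WGSL with @builtin(instance_index), so that a single draw call can render thousands of independent objects.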
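
The CPU side of that idea could look like the sketch below, which groups nodes by the mesh they reference and packs their world matrices back to back. `packInstances` is a hypothetical name, and the world matrices are assumed to be 16-element arrays like those produced in Step 4.

```javascript
// Group mesh nodes by mesh index and pack their world matrices contiguously.
function packInstances(nodes, worldMatrices) {
    const byMesh = new Map();
    nodes.forEach((node, idx) => {
        if (node.mesh === undefined) return; // cameras, empties, joints, ...
        if (!byMesh.has(node.mesh)) byMesh.set(node.mesh, []);
        byMesh.get(node.mesh).push(idx);
    });

    const batches = [];
    for (const [mesh, nodeIdxs] of byMesh) {
        const data = new Float32Array(nodeIdxs.length * 16);
        nodeIdxs.forEach((nodeIdx, i) => data.set(worldMatrices[nodeIdx], i * 16));
        batches.push({ mesh, instanceCount: nodeIdxs.length, data });
    }
    return batches;
}

// Demo: nodes 0 and 2 share mesh 0, node 3 uses mesh 1, node 1 has no mesh.
const demoNodes = [{ mesh: 0 }, {}, { mesh: 0 }, { mesh: 1 }];
const demoMatrices = demoNodes.map((_, i) => new Float32Array(16).fill(i));
const batches = packInstances(demoNodes, demoMatrices);
console.log(batches.map(b => [b.mesh, b.instanceCount])); // [[0, 2], [1, 1]]
```

Each batch's data could be uploaded to a storage buffer and drawn with drawIndexed(indexCount, instanceCount), with the vertex shader picking its matrix by instance index.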

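3. Pipeline consolidation and material sorting

Different primitives may have different material properties (for example transparency, double-sided rendering, or the presence of textures). A full GPURenderPipeline should not be created for every single primitive.
A cleaner approach is to generate a set of pipeline templates up front from the material features (for example an opaque pipeline, a translucent pipeline, and a shadow pipeline), then sort the nodes by pipeline at render time to minimize the cost of pipeline state switching.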
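
A small sketch of the sorting idea: derive a key from the material features that would require a different pipeline, then order draw items so that equal keys are adjacent and blended materials come last. `pipelineKey` and `sortDrawItems` are hypothetical names; `alphaMode`, `doubleSided` and `pbrMetallicRoughness.baseColorTexture` are standard glTF material fields.

```javascript
// Build a string key from the features that need a distinct pipeline.
function pipelineKey(material) {
    const alphaMode = material.alphaMode || "OPAQUE";
    const sided = material.doubleSided ? "ds" : "ss";
    const textured = material.pbrMetallicRoughness?.baseColorTexture ? "tex" : "notex";
    return `${alphaMode}|${sided}|${textured}`;
}

// Sort draw items: non-blended first, then by key so equal pipelines are adjacent.
function sortDrawItems(items) {
    const rank = key => (key.startsWith("BLEND") ? 1 : 0);
    const byKey = (a, b) => (a < b ? -1 : a > b ? 1 : 0);
    return [...items].sort((a, b) => rank(a.key) - rank(b.key) || byKey(a.key, b.key));
}

const drawItems = [
    { name: "glass", key: pipelineKey({ alphaMode: "BLEND" }) },
    { name: "wall", key: pipelineKey({}) },
    { name: "leaf", key: pipelineKey({ doubleSided: true }) },
    { name: "floor", key: pipelineKey({}) }
];
console.log(sortDrawItems(drawItems).map(item => item.name)); // leaf, wall, floor, glass
```

A cache such as a Map from key to GPURenderPipeline would then create each pipeline once. Blended items usually also need back-to-front ordering by depth, which this sketch does not attempt.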

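X. Conclusion

Parsing glTF/GLB files with native WebGPU greatly deepens your understanding of GPU memory flow and modern graphics APIs. Although details such as byte alignment, matrix transforms, and bind group count limits must be handled at the low level, WebGPU's design removes implicit conversions and state guessing on the CPU side, which keeps the parser's whole low-level path clear and efficient.

With the code above as a starting point, you have taken the first step toward leaving large third-party 3D engines behind and building lightweight, high-performance graphics applications on the modern web.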

PixelConductor WebGPU glTF Front-end graphics
