Support audio outputs with openai provider #3907
Labels
Comments
Can you provide details on which information you would need to be returned? Currently only audio inputs are supported.
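For context, a minimal sketch of passing audio input to the openai provider, assuming AI SDK 4.x file content parts; the file path and prompt text are placeholders:

```ts
import { readFileSync } from 'node:fs';
import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';

// Audio goes in as a file part; only text comes back out.
const result = await generateText({
  model: openai('gpt-4o-audio-preview'),
  messages: [
    {
      role: 'user',
      content: [
        { type: 'text', text: 'Answer this question out loud.' },
        {
          type: 'file',
          data: readFileSync('./recording.wav'), // placeholder path
          mimeType: 'audio/wav',
        },
      ],
    },
  ],
});

console.log(result.text); // the model's generated audio is not exposed here
```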
Ideally, the AI SDK should formally support audio, just as it supports images and PDFs. Other AI providers may implement audio I/O soon enough as well; for instance, there are rumours of an Anthropic and Hume AI collaboration.
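For reference, the information being discarded is the audio object on the assistant message that the OpenAI Chat Completions API returns when audio output is requested. A sketch of that response shape, with placeholder values:

```ts
// Approximate shape of a chat completion with audio output
// (placeholder values; field names follow OpenAI's audio output docs).
const completion = {
  choices: [
    {
      finish_reason: 'stop',
      message: {
        role: 'assistant',
        content: null,
        audio: {
          id: 'audio_abc123',              // needed to reference this turn in follow-ups
          data: '<base64-encoded audio>',  // the generated speech
          transcript: 'Hello, how can I help?',
          expires_at: 1733300000,
        },
      },
    },
  ],
};
```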
lgrammel changed the title from "When using sdk/openai, important data is discarded by the SDK" to "Support audio outputs with openai provider" on Dec 3, 2024
lgrammel added the enhancement (New feature or request) label and removed the bug (Something isn't working) label on Dec 3, 2024
Description
When I use the gpt-4o-audio-preview model and pass audio through a fetch method I implemented myself, chatting with OpenAI and requesting both the audio and a transcription, the audio in the returned data is discarded, rendering the entire conversation invalid.
Code example
No response
AI provider
@ai-sdk/openai v1.0.4
Additional context
I found that the zodSchema.safeParse method dropped the audio field.
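This matches Zod's default behavior: z.object() strips keys that are not declared in the schema, so a strict response schema silently removes the audio field. A minimal sketch with a hypothetical schema and message:

```ts
import { z } from 'zod';

// Hypothetical schema similar to what a provider might use to validate responses;
// by default, z.object() drops keys it does not know about.
const messageSchema = z.object({
  role: z.string(),
  content: z.string().nullable(),
});

const apiMessage = {
  role: 'assistant',
  content: null,
  audio: { id: 'audio_abc123', transcript: 'Hello!' }, // placeholder values
};

const parsed = messageSchema.safeParse(apiMessage);
if (parsed.success) {
  // The parse succeeds, but the unknown `audio` key is gone.
  console.log(parsed.data); // { role: 'assistant', content: null }
}
```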