
Support audio outputs with openai provider #3907

Open
adolphnov opened this issue Nov 26, 2024 · 3 comments
Labels
ai/provider, enhancement (New feature or request)

Comments


adolphnov commented Nov 26, 2024

Description

When I use the gpt-4o-audio-preview model through my own fetch implementation to request both audio and a transcript from OpenAI, the audio in the returned data is discarded, which makes the entire conversation unusable.

Code example

No response

AI provider

@ai-sdk/openai v1.0.4

Additional context

I found that the zodSchema.safeParse method dropped the audio field.
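For context, here is an illustration (my own sketch, not the SDK's actual code) of why this happens: Zod object schemas by default keep only the keys they declare, so an undeclared `audio` field is silently dropped during parsing. The helper below mimics that behavior without the library:

```typescript
// Illustration only: mimics how a Zod-style object schema keeps just
// the declared keys, so an undeclared `audio` field is dropped.
function parseDeclaredKeys(
  declared: string[],
  input: Record<string, unknown>,
): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const key of declared) {
    if (key in input) out[key] = input[key];
  }
  return out;
}

const rawMessage = {
  role: 'assistant',
  content: null,
  audio: { id: 'audio_abc123', transcript: '...' },
};

// The schema only declares `role` and `content`, so `audio` is gone.
const parsed = parseDeclaredKeys(['role', 'content'], rawMessage);
```

Zod does offer `.passthrough()` to preserve unknown keys, but the schema lives inside the SDK, so the caller cannot change it.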

@adolphnov added the bug (Something isn't working) label on Nov 26, 2024
@lgrammel (Collaborator)

Can you provide details on which information you would need to be returned? Currently only audio inputs are supported.


adolphnov commented Nov 27, 2024

> Can you provide details on which information you would need to be returned? Currently only audio inputs are supported.

I pass in the audio and add the extra parameters myself:

readonly fetch = async (url: RequestInfo | URL, options?: RequestInit): Promise<Response> => {
        const body = JSON.parse(options?.body as string);
        if (body.model === 'gpt-4o-audio-preview') {
            // Request audio output in addition to text.
            body.modalities = ['text', 'audio'];
            body.audio = { voice: 'alloy', format: 'opus' };
        }
        return fetch(url, {
            ...options,
            body: JSON.stringify(body),
        });
    };

The original response from OpenAI contains an audio field:

{
  "index": 0,
  "message": {
    "role": "assistant",
    "content": null,
    "refusal": null,
    "audio": {
      "id": "audio_abc123",
      "expires_at": 1729018505,
      "data": "<bytes omitted>",
      "transcript": "Yes, golden retrievers are known to be ..."
    }
  },
  "finish_reason": "stop"
}

However, the SDK filtered the response internally: zodSchema.safeParse dropped the audio field.

I need the original response, not the trimmed data. The response data I get in the onStepFinish callback is also filtered and does not include the audio field.
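As a workaround sketch (my own code; the names here are my assumptions, not SDK API): since I already wrap fetch, I can clone the raw response and stash the `audio` field before the SDK's schema parsing discards it:

```typescript
// Workaround sketch: clone the raw response inside the fetch wrapper
// and save the `audio` field before the SDK's parsing drops it.
const capturedAudio: { current?: unknown } = {};

const capturingFetch = async (
  url: RequestInfo | URL,
  options?: RequestInit,
): Promise<Response> => {
  const res = await fetch(url, options);
  // Clone so the SDK can still consume the original body stream.
  const json = await res.clone().json();
  capturedAudio.current = json?.choices?.[0]?.message?.audio;
  return res;
};
```

This only works for non-streaming responses; a streaming body would need to be teed instead of parsed as JSON in one go.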

@polesapart

Ideally, the AI SDK should formally support audio output, just as it supports images and PDFs. Other AI providers may implement audio I/O soon as well; e.g., there are rumors of an Anthropic and Hume AI collaboration.

@lgrammel changed the title from "When using sdk/openai, important data is discarded by the SDK" to "Support audio outputs with openai provider" on Dec 3, 2024
@lgrammel added the enhancement (New feature or request) label and removed the bug (Something isn't working) label on Dec 3, 2024