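This demo is a simple voice chat built on the OpenAI API: the text you type is converted to speech and played back, and the LLM's reply is spoken as well, so you hold a voice conversation with the AI.

By combining Blazor's features with the strengths of the actor model, you can build more powerful, customized real-time AI features.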

DEMO

[Screenshot: text-input AI voice chat with Blazor and Akka.NET (image-2025-6-8_17-9-39.png)]

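The conversation is shown as text and played as audio at the same time (an input-driven voice chat for situations where speaking aloud is difficult).
Only TTS and a chat-completion LLM are used, but the gap between spoken turns is kept as short as possible so that the exchange feels real-time.
Instead of sessions or a real-time LLM, the design is actor-based: each user gets an actor that keeps per-user state, so context and replies carry over from turn to turn.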

Application component layout

Let's take a brief look at the core implementation.

Core code

A trick to speed up responses

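// Start the LLM request first, without waiting for it (fire and forget).
_ = Task.Run(() => GetChatCompletion(command.Text));
// While the LLM is working, wait for the TTS audio of the user's own text.
// (await requires an async handler, for example ReceiveAsync in the actor.)
var recVoice = await _openAIService.ConvertTextToVoiceAsync(command.Text, command.Voice);
_blazorCallback?.Invoke("AddMessage", new object[] { command.From, command.Text });
_blazorCallback?.Invoke("PlayAudioBytes", new object[] { recVoice, 0.5f, playType });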

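When text arrives, the LLM request is issued up front on a separate thread, while the TTS audio for that text is still being prepared.
- This works because the LLM's text response comes back faster than the Voice API's audio.
To cut latency further, the audio for the LLM's reply could also be generated ahead of time, as soon as the reply text arrives.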
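
The effect of overlapping the two requests can be illustrated outside the actor code. The following is a minimal sketch in JavaScript (the language of the demo's browser side), not the demo's implementation: `fakeTts` and `fakeLlm` are hypothetical stand-ins that only wait for a fixed delay, and the delays themselves are made-up values.

```javascript
// Hypothetical stand-ins for the real API calls; each resolves after a fixed delay.
const delay = (ms, value) => new Promise(resolve => setTimeout(() => resolve(value), ms));
const fakeTts = text => delay(80, `audio(${text})`);
const fakeLlm = text => delay(50, `reply to ${text}`);

// Sequential: the LLM request starts only after the TTS call has finished.
async function sequential(text) {
    const audio = await fakeTts(text);
    const reply = await fakeLlm(text);
    return { audio, reply };
}

// Overlapped: the LLM request is already in flight while the TTS call runs.
async function overlapped(text) {
    const replyPromise = fakeLlm(text); // started here, awaited later
    const audio = await fakeTts(text);
    const reply = await replyPromise;
    return { audio, reply };
}

// Runs one of the variants and reports how long it took.
async function measure(fn, text) {
    const start = Date.now();
    const result = await fn(text);
    return { result, elapsedMs: Date.now() - start };
}
```

With these delays the sequential variant takes roughly the sum of both waits, while the overlapped variant takes roughly the longer of the two, which is the saving the trick relies on.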

Using the OpenAI API

public async Task<string> GetChatCompletion(string message, List<string> assist)
{
    var completionResult = await _client.CompleteChatAsync(new ChatMessage[]
    {
        ChatMessage.CreateUserMessage(message),
        ChatMessage.CreateAssistantMessage(string.Join("\n", assist))
    });
    string aiResponse = completionResult.Value.Content.FirstOrDefault()?.Text ?? string.Empty;

    return aiResponse;
}


/// <summary>
/// Converts text to speech with the OpenAI TTS endpoint.
/// Voices: alloy, ash, ballad, coral, echo, fable, onyx, nova, sage, shimmer
/// </summary>
/// <param name="text">Text to convert to speech</param>
/// <param name="voice">Voice name (defaults to "alloy")</param>
/// <returns>Decoded PCM samples as a float array</returns>
/// <exception cref="InvalidOperationException">Thrown when the TTS API call fails</exception>
public async Task<float[]> ConvertTextToVoiceAsync(string text, string voice = "alloy")
{
    var requestBody = new
    {
        model = "gpt-4o-mini-tts",  // TTS model name
        input = text,               // text to convert
        voice                       // voice style
    };

    // Only one request is in flight here, so it is awaited directly.
    var response = await _httpClient.PostAsJsonAsync("audio/speech", requestBody);

    if (!response.IsSuccessStatusCode)
    {
        throw new InvalidOperationException($"TTS API call failed: {response.ReasonPhrase}");
    }

    // Read the MP3 response body as a byte array
    var audioBytes = await response.Content.ReadAsByteArrayAsync();

    // Decode the MP3 data to PCM samples
    return ConvertMp3ToFloatArray(audioBytes);
}

Only two APIs are used here, ChatCompletion and the Voice API's TTS; the streaming features could be used to extend this further.

MP3 TO PCM

private float[] ConvertMp3ToFloatArray(byte[] mp3Data)
{
    using var mp3Stream = new MemoryStream(mp3Data);
    using var mp3Reader = new Mp3FileReader(mp3Stream);

    // Expose the decoded PCM data as float samples (interleaved if multi-channel)
    var sampleProvider = mp3Reader.ToSampleProvider();

    // Read in chunks until the stream ends. A buffer sized from TotalTime may
    // come up short, and a single Read call may return fewer samples than requested.
    var samples = new List<float>();
    var buffer = new float[mp3Reader.WaveFormat.SampleRate * mp3Reader.WaveFormat.Channels];
    int samplesRead;
    while ((samplesRead = sampleProvider.Read(buffer, 0, buffer.Length)) > 0)
    {
        samples.AddRange(buffer.Take(samplesRead));
    }

    return samples.ToArray();
}

OpenAI's TTS returns MP3 here; to play it as a stream in the browser with plain web APIs, it has to be converted to PCM stream data.
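
One detail to watch when handing PCM data to the browser: NAudio sample providers deliver multi-channel audio as interleaved samples, while the playback function in this demo creates a single-channel buffer. If the decoded MP3 were stereo, the samples would need to be mixed down first. The sketch below shows one way to do that; `downmixToMono` is a hypothetical helper that is not part of the demo, and whether the TTS output is mono or stereo is not stated in the source.

```javascript
// Hypothetical helper: average interleaved frames down to one channel.
// For 2 channels, `interleaved` is laid out as [L0, R0, L1, R1, ...].
function downmixToMono(interleaved, channels) {
    if (channels <= 1) {
        return Float32Array.from(interleaved); // already mono, just copy
    }
    const frames = Math.floor(interleaved.length / channels);
    const mono = new Float32Array(frames);
    for (let frame = 0; frame < frames; frame++) {
        let sum = 0;
        for (let ch = 0; ch < channels; ch++) {
            sum += interleaved[frame * channels + ch];
        }
        mono[frame] = sum / channels; // simple average of the channels
    }
    return mono;
}
```

The sample rate matters in the same way: `createBuffer` takes the rate as its third argument, so passing the rate of the decoded audio rather than the context's rate keeps pitch and speed unchanged when the two differ.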

Playback in the browser

// Shared AudioContext, created lazily on first playback
let audioContext = null;

async function playAudioBytes(audioBytes, playbackRate, type, dotNetRef) {
    try {
        if (!audioContext || audioContext.state === 'closed') {
            audioContext = new AudioContext();
        }

        // The incoming PCM samples are wrapped in a Float32Array
        const float32Array = new Float32Array(audioBytes);

        // Create a one-channel AudioBuffer at the context's sample rate
        const audioBuffer = audioContext.createBuffer(1, float32Array.length, audioContext.sampleRate);
        audioBuffer.copyToChannel(float32Array, 0);

        // Set up playback
        const bufferSource = audioContext.createBufferSource();
        bufferSource.buffer = audioBuffer;
        bufferSource.playbackRate.value = playbackRate; // playback speed
        bufferSource.connect(audioContext.destination);

        // Handler that runs when playback has finished
        bufferSource.onended = () => {
            // Follow-up work depends on the play type
            console.log(`Audio playback finished, play type: ${type}`);
            if (type === 1) {
                console.log("Type 1: user request played, asking for LLM reply playback");
                if (dotNetRef && typeof dotNetRef.invokeMethodAsync === "function") {
                    dotNetRef.invokeMethodAsync("OnAudioPlaybackCompleted", 1)
                        .catch(err => console.error("Error while calling Blazor method OnAudioPlaybackCompleted:", err));
                }
            } else if (type === 2) {
                console.log("Type 2: AI playback finished");
            } else if (type === 3) {
                console.log("Type 3: user-defined action");
            }
        };

        bufferSource.start();
    } catch (err) {
        console.error("Error during audio playback:", err);
    }
}

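PCM data can be played in the browser, and when playback finishes, the moment it ends can be reported to the server (Blazor) in real time.
- Blazor uses a WebSocket connection by default, which lets code call functions on the remote side in near real time; without Blazor, the same thing can be implemented over a plain WebSocket interface.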
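
For the non-Blazor case, the completion notice could be sent as a small JSON message over a WebSocket. This is only a sketch under assumptions: the function name `notifyPlaybackCompleted`, the event name, and the message shape are all hypothetical, and the server would need a matching handler.

```javascript
// Hypothetical replacement for dotNetRef.invokeMethodAsync when Blazor is not used.
// `socket` is expected to behave like a browser WebSocket (readyState, send).
function notifyPlaybackCompleted(socket, playType) {
    const OPEN = 1; // value of WebSocket.OPEN
    if (!socket || socket.readyState !== OPEN) {
        return false; // not connected, nothing was sent
    }
    // Hypothetical message shape: an event name plus the play type.
    socket.send(JSON.stringify({ event: "audioPlaybackCompleted", playType }));
    return true;
}
```

In the playback function this would be called from the `onended` handler, for example `bufferSource.onended = () => notifyPlaybackCompleted(socket, type);`.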

BlazorPage

<PageTitle>WebRTC</PageTitle>
<h1>WebRTC</h1>
    <!-- Chat panel -->
    <MudItem xs="12">
        <MudPaper Class="pa-3">
            <MudButton OnClick="StartWebRTC" Class="mt-4">
                Start WebRTC
            </MudButton>
        </MudPaper>
        <MudPaper Class="pa-3">
            <MudText Typo="Typo.subtitle1">Chat</MudText>
            <MudStack>
                <!-- Chat message list -->
                <MudList T="string">
                    @foreach (var chat in ChatMessages)
                    {
                        <MudListItem Text="@chat" Icon="@Icons.Material.Filled.Chat" />
                    }
                </MudList>
                <!-- Chat input -->
                <MudTextField @bind-Value="ChatInput" Placeholder="Type a message..." />
                <MudButton OnClick="SendChatMessage" Class="mt-2">Send</MudButton>
            </MudStack>
        </MudPaper>
    </MudItem>
@code {
    // Called from JS when audio playback has finished
    [JSInvokable]
    public Task OnAudioPlaybackCompleted(int option)
    {
        // Ask the actor to play the AI reply. For "AI" commands the actor
        // speaks its last LLM response, so Text is only a label here.
        MyVoiceActor.Tell(new TTSCommand()
        {
            From = "AI",
            Text = "LLM auto playback",
            Voice = "alloy"
        });
        return Task.CompletedTask;
    }

    // Calls the playAudioBytes JS function on the front end
    private void PlayAudioBytes(float[] voice, float speed, int playtype)
    {
        _ = InvokeAsync(async () =>
        {
            var dotNetRef = DotNetObjectReference.Create(this);
            await JSRuntime.InvokeVoidAsync("playAudioBytes", voice, speed, playtype, dotNetRef);
        });
    }
}

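A Blazor page can partially update the front-end page in response to state changes in server code.
- This is not possible with static server rendering; it relies on the InteractiveServer mode, which runs over a WebSocket.
JSInvokable lets you expose remote functions that JS can call, and InvokeVoidAsync lets you call front-end JS functions (calls work in both directions).
WebRTC is not fully used in this demo, but data produced by WebRTC can also be sent to the server and analyzed there.
- The voice audio data and volume data captured through WebRTC are sent over, so the server can use them as voice chat data.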
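
The source does not show how the volume data is computed. One common choice is the root-mean-square level of each block of samples, sketched below; `rmsVolume` is a hypothetical helper and the input is assumed to be PCM samples in the range -1 to 1.

```javascript
// Hypothetical helper: root-mean-square level of one block of samples.
// Returns 0 for silence (or an empty block) and 1 for a full-scale square wave.
function rmsVolume(samples) {
    if (samples.length === 0) {
        return 0;
    }
    let sumSquares = 0;
    for (const sample of samples) {
        sumSquares += sample * sample;
    }
    return Math.sqrt(sumSquares / samples.length);
}
```

A value like this could be sent to the server alongside the audio data, for example to detect when the user starts or stops speaking.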

VoiceChatActor

using Akka.Actor;
using Akka.Event;

public class VoiceChatActor : ReceiveActor, IWithTimers
{
    private readonly ILoggingAdapter logger = Context.GetLogger();

    private List<string> _conversationHistory = new();

    private string lastAiMessage = string.Empty;

    // Not assigned anywhere in this excerpt; until it is set, the
    // null-conditional calls below do nothing.
    private Action<string, object[]> _blazorCallback;

    private OpenAIService _openAIService;

    private int MaxAIWordCount = 100; // Length limit for AI replies, used in the prompt below

    private sealed class TimerKey
    {
        public static readonly TimerKey Instance = new();
        private TimerKey() { }
    }

    public int RefreshTimeSecForContentAutoUpdate { get; set; } = 30;

    public ITimerScheduler Timers { get; set; } = null!;

    public VoiceChatActor(IServiceProvider serviceProvider)
    {
        logger.Info($"VoiceChatActor : Constructor - {Self.Path}");

        _openAIService = new OpenAIService();

        Receive<ContentAutoUpdateCommand>(command =>
        {
            logger.Info("VoiceChatActor : ContentAutoUpdateCommand");
        });

        // ReceiveAsync lets the handler await the TTS call instead of blocking on .Result
        ReceiveAsync<TTSCommand>(async command =>
        {
            logger.Info($"VoiceChatActor : Received Command - {command.GetType().Name}");
            switch (command.From)
            {
                case "Your":
                {
                    int playType = 1;
                    // Start the LLM request without waiting for it. Note that this task
                    // updates the actor's fields outside the actor's message loop.
                    _ = Task.Run(() => GetChatCompletion(command.Text));
                    var recVoice = await _openAIService.ConvertTextToVoiceAsync(command.Text, command.Voice);
                    _blazorCallback?.Invoke("AddMessage", new object[] { command.From, command.Text });
                    _blazorCallback?.Invoke("PlayAudioBytes", new object[] { recVoice, 0.5f, playType });
                    break;
                }
                case "AI":
                {
                    // Speak the most recent LLM reply
                    var msg = lastAiMessage;
                    int playType = 2;
                    var recVoice = await _openAIService.ConvertTextToVoiceAsync(msg, command.Voice);
                    _blazorCallback?.Invoke("AddMessage", new object[] { command.From, msg });
                    _blazorCallback?.Invoke("PlayAudioBytes", new object[] { recVoice, 0.5f, playType });
                    break;
                }
                default:
                    logger.Warning($"Unknown command received: {command.From}");
                    break;
            }
        });
    }

    protected override void PreStart()
    {
        // Each actor has its own periodic scheduler, which can drive proactive
        // behavior rather than only replies to requests.
        // Started in PreStart, as in the Akka.NET timer examples, so that
        // Timers has been assigned by the time it is used.
        Timers.StartPeriodicTimer(
            key: TimerKey.Instance,
            msg: new ContentAutoUpdateCommand(),
            initialDelay: TimeSpan.FromSeconds(10),
            interval: TimeSpan.FromSeconds(RefreshTimeSecForContentAutoUpdate));
    }

    /// <summary>
    /// Creates a ChatCompletion for the given message.
    /// </summary>
    /// <param name="message">Message to send</param>
    /// <returns>ChatCompletion result</returns>
    public async Task<string> GetChatCompletion(string message)
    {
        _conversationHistory.Add($"User:{message}");

        // Take the 20 most recent history entries.
        var recentHistory = _conversationHistory.Skip(Math.Max(0, _conversationHistory.Count - 20)).ToList();

        // Send the prompt together with the recent history
        var aiResponse = await _openAIService.GetChatCompletion(
            $"The request message is: {message}. The attached messages are the history of the current conversation; keep that context when you answer. Always keep your answer under {MaxAIWordCount} characters. The lines marked AI are your own earlier answers, so respond as yourself without mentioning that.",
            recentHistory
        );

        _conversationHistory.Add($"AI:{aiResponse}");
        lastAiMessage = aiResponse;

        return aiResponse;
    }
}

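The actor handles the voice chat events and calls the OpenAI API; when a call completes it notifies Blazor, which updates the UI and finally hands the audio stream to the front end for playback.
A fairly simple LLM ChatCompletion is used here; improving this part could produce a smarter AI voice bot.
- The actor model offers stateful programming (https://getakka.net/articles/actors/finite-state-machine.html), streaming (https://getakka.net/articles/streams/buffersandworkingwithrate.html), and a path to scaling out as a distributed cluster.
The demo takes text input from the user, but a voice input stream could be received and processed directly if needed.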
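
The per-user state that the actor keeps (append each turn, remember the last AI reply, send only the 20 most recent entries) can be sketched in isolation. `ConversationState` is a hypothetical class written for illustration, not part of the demo; it mirrors the actor's `_conversationHistory` and `lastAiMessage` fields and assumes a positive window size.

```javascript
// Hypothetical per-user state holder mirroring the actor's fields.
class ConversationState {
    constructor(windowSize = 20) {
        this.windowSize = windowSize; // assumed to be a positive number
        this.history = [];
        this.lastAiMessage = "";
    }

    // Record a user turn with the same "User:" prefix the actor uses.
    addUser(message) {
        this.history.push(`User:${message}`);
    }

    // Record an AI turn and remember it for the follow-up playback.
    addAi(message) {
        this.history.push(`AI:${message}`);
        this.lastAiMessage = message;
    }

    // Only the most recent entries are sent along with the next request.
    recent() {
        return this.history.slice(-this.windowSize);
    }
}
```

Because each user has a separate instance (one actor per user in the demo), no session store is needed to keep the conversation context between requests.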

Full source of the demo sample

https://github.com/psmon/blazor-voice/blob/main/doc/kr/index.md
- Set your OPENAI_API_KEY as an environment variable and the sample is ready to run.