Product Update: Improved Speaker Diarization and New Export Options
We are excited to announce several improvements to AudioToTextAI. This update brings more accurate speaker diarization, two new export formats (DOCX and PDF), and faster processing.
Improved Speaker Diarization
Our speaker diarization system has been upgraded with a new neural network architecture that delivers better results in challenging scenarios:
- Better overlap handling: The new model is significantly better at separating speakers when they talk simultaneously or interrupt each other.
- Improved accuracy for similar voices: Speakers with similar vocal characteristics are now distinguished more reliably.
- Faster processing: Diarization now adds minimal overhead to transcription time, even for long recordings.
- More consistent labeling: Speaker labels are more stable throughout long recordings, so the same speaker is less likely to be assigned different labels.
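If you want to see how much overlapping speech a recording contains, a check like the one below can be run over diarized segments. This is a minimal sketch: the `Segment` shape (speaker label, start and end in seconds) and the `find_overlaps` helper are assumptions made for illustration, not the AudioToTextAI JSON schema or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    speaker: str  # hypothetical label, e.g. "SPEAKER_1"
    start: float  # segment start, in seconds
    end: float    # segment end, in seconds


def find_overlaps(segments):
    """Return (first, second, overlap_seconds) for every pair of segments
    from different speakers whose time ranges intersect."""
    ordered = sorted(segments, key=lambda s: s.start)
    overlaps = []
    for i, first in enumerate(ordered):
        for second in ordered[i + 1:]:
            if second.start >= first.end:
                # Sorted by start time, so no later segment can overlap `first`.
                break
            if second.speaker != first.speaker:
                overlap = min(first.end, second.end) - second.start
                overlaps.append((first, second, overlap))
    return overlaps
```

Comparing the overlap list for the same recording before and after the update is one way to see how interruptions are now being separated.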
New Export Formats: DOCX and PDF
You can now export your transcriptions as Microsoft Word (DOCX) and PDF files, in addition to TXT, SRT, VTT, and JSON. Both new formats offer:
- Formatted speaker labels and timestamps
- Professional document layout
- Customizable headers with file name, date, and duration
- Easy sharing with colleagues who may not use AudioToTextAI
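To give a rough idea of the information these exports carry, here is a plain-text sketch that builds a header (file name, date, duration) followed by timestamped speaker lines. The function names, the `HH:MM:SS` timestamp style, and the line layout are all assumptions for illustration; the actual DOCX and PDF layout may differ.

```python
def format_timestamp(seconds):
    """Format a duration in seconds as HH:MM:SS (fractions are truncated)."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def render_transcript(file_name, date, duration, segments):
    """Build a header line, a blank line, then one line per segment.

    `segments` is an iterable of (speaker, start_seconds, text) tuples,
    a hypothetical shape chosen for this example.
    """
    lines = [f"{file_name} | {date} | Duration: {format_timestamp(duration)}", ""]
    for speaker, start, text in segments:
        lines.append(f"[{format_timestamp(start)}] {speaker}: {text}")
    return "\n".join(lines)
```

For example, `render_transcript("call.mp3", "2024-01-01", 3725, [("Speaker 1", 65.4, "Hello")])` produces a header ending in `Duration: 01:02:05` and the line `[00:01:05] Speaker 1: Hello`.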
Performance Improvements
We have optimized our GPU processing pipeline to deliver faster results:
- Average processing time reduced by 30% across all models
- Queue times during peak hours improved with better load balancing
- Handling of large files (3+ hours of audio) is now more reliable
- API response times improved for status polling and result retrieval
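If you poll the API for job status, a backoff loop keeps request volume low while still picking up results promptly. The sketch below is generic: `fetch_status` is a function you supply, and the status strings `"completed"` and `"failed"` are assumptions for illustration rather than documented AudioToTextAI values.

```python
import time


def poll_until_done(fetch_status, initial_delay=1.0, max_delay=30.0,
                    timeout=600.0, sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status() until it returns a terminal status.

    fetch_status: caller-supplied function returning the job's current
    status string (hypothetical values: "completed" or "failed" end the loop).
    The wait between calls doubles each time, capped at max_delay seconds.
    """
    deadline = clock() + timeout
    delay = initial_delay
    while True:
        status = fetch_status()
        if status in ("completed", "failed"):
            return status
        if clock() + delay > deadline:
            raise TimeoutError("job did not finish before the timeout")
        sleep(delay)
        delay = min(delay * 2, max_delay)
```

The `sleep` and `clock` parameters exist only so the loop can be exercised in tests without real waiting; in normal use the defaults are fine.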
Other Improvements
- Interactive editor now loads faster for long transcriptions
- Improved word-level timestamp accuracy for the Whisper Turbo model
- Better handling of audio files with unusual encoding settings
- API rate limits increased for all plan tiers
What's Next
We are actively working on further improvements, including real-time streaming transcription, enhanced multilingual support, and deeper integrations with popular tools like Zoom, Google Meet, and Slack. Stay tuned for more updates.
As always, we welcome your feedback. If you have suggestions or encounter any issues, reach out to our support team. Your input directly shapes our development priorities.