Welcome to the community!
This guide covers everything you need to create lifelike synthetic voices. We'll walk through selecting a voice, generating and organizing clips, and pronunciation best practices. Let's dive in and start creating!
Starting a Project
What is a Project?
Projects act as folders to organize your clips easily. To start a new project, click New on the top right of Studio. Name your project, select a project model, and start creating.
View the Voice Library
When starting a new project, you’ll be directed to audition and choose a voice.
To view all Voices:
- Click on a Voice avatar, or select See all voices from the right-hand menu in your project.
- The Voice Library features a search filter that allows you to narrow results based on regional accents, voice characteristics, and performance styles.
- You can also click the microphone icon in the left-hand menu in Studio to view the entire catalog of voices.
Choosing the best voice for your use case
- Start by considering the tone and mood you want your production to have. Do you want someone trustworthy, authoritative, focused, or warm?
- Narrow down your search by using the Voice Library filters.
- Listen to a few different voices and choose your top options.
- Audition your top three choices by copying in a portion of your script—an excerpt that reflects the tone you're aiming for—and create clips for each of your top voices.
Tip: If you find a voice you love creating with, add it to your favorites by clicking the heart icon.
Generating Audio
Entering your script
Enter your script by copying it from your source material and pasting it into the Studio Editor, typing it in directly, or uploading your script by clicking Import Script.
Available rendering options
Studio offers three options to help streamline your workflow:
- Single take: With this option, you can render up to 5,000 characters in one take.
- Render by sentence: Use this option to create a new clip or section for each sentence in your script, up to 98 sentences. This can speed up your workflow by breaking your script into smaller, more manageable pieces.
- Render by paragraph: This option creates a new clip or section for each paragraph in your script, up to 48 paragraphs. Use two line breaks to define a new paragraph.
Note: In Caruso, you can split your script into sections by selecting “Split by paragraph” or “Split by sentence” when uploading — or simply copy and paste your script into sections manually.
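Before pasting a long script, it can help to check it against the limits listed above. The following is a minimal Python sketch, not part of Studio: the function names are hypothetical, the sentence splitter is a naive assumption (Studio may detect sentences differently), and only the three limits and the "two line breaks" paragraph rule come from this guide.

```python
import re

# Limits quoted in this guide.
MAX_SINGLE_TAKE_CHARS = 5000
MAX_SENTENCES = 98
MAX_PARAGRAPHS = 48


def split_paragraphs(script: str) -> list[str]:
    """Split on blank lines (two line breaks define a new paragraph)."""
    parts = re.split(r"\n\s*\n", script.strip())
    return [p.strip() for p in parts if p.strip()]


def split_sentences(script: str) -> list[str]:
    """Naive splitter (assumption): break after ., !, or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", script.strip())
    return [s for s in parts if s]


def rendering_options(script: str) -> dict[str, bool]:
    """Report which rendering options the script fits within."""
    return {
        "single_take": len(script) <= MAX_SINGLE_TAKE_CHARS,
        "by_sentence": len(split_sentences(script)) <= MAX_SENTENCES,
        "by_paragraph": len(split_paragraphs(script)) <= MAX_PARAGRAPHS,
    }


if __name__ == "__main__":
    sample = "Hello there. How are you?\n\nThis is a second paragraph."
    print(rendering_options(sample))
```

Because abbreviations such as "Dr." or "e.g." will trip the naive splitter, treat the sentence count as an estimate rather than an exact match for what Studio produces.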
Creating a clip
In Standard:
- Select a voice from the Voice Library.
- Enter or upload your script into the editor.
- Click Create. You've just made a clip.
In Caruso:
- Select a voice from the Voice Library.
- Enter or upload your script into the editor.
- Click the Play button to generate an audio take.
Tip: You can listen to all the takes you’ve created for a section and recall the text by clicking the Take [#] at the top left of the section.
How to listen, repopulate, download, and delete a clip
In Standard:
Listen to a clip
- Locate the clip you want to listen to and click the play button on the left of the clip.
- You can pause or resume playback by clicking the play/pause button.
Repopulate a clip
- Locate the clip you want to repopulate and select the "T" icon. The clip's text will reappear in the editor, allowing you to make edits.
- Once you've made changes, click the Create button to create a new clip.
Download a clip
- Locate the clip and click the download icon.
- Select the desired file format (MP3, OGG, or WAV).
- Your clip will download based on your internet browser's download settings.
Delete a clip
- Locate the clip and click the "X" icon.
- The clip will be permanently deleted.
In Caruso:
Listen to a section
- Locate the section you want to listen to and click the play button on the top right.
- You can pause or resume playback by clicking the play/pause button.
Regenerate and repopulate a take
- Locate the section you want to retake and click the Retake icon.
- Listen to your new take by clicking the play button.
- To listen to all takes for a section, click the Take [#] at the top of each section, or while your section is in an active state, select Take History in the sidebar.
- Click Use this take to repopulate different takes, and click outside the modal to return.
Download a section
- Locate the section and click the download icon.
- Select the desired file format (MP3, OGG, or WAV).
- Your audio will download based on your internet browser's download settings.
Delete a section
- Locate the section and click the three dots on the right.
- Select Delete.
- The section will be permanently deleted.
Organizing Clips
In Standard:
Renaming a clip
- In your Studio project, click the title of your selected clip.
- Enter a new name for your clip.
- Click anywhere outside the text field to save.
Combining clips
Merge multiple clips into one using our Combine tool.
- Select the checkbox on the left side of each clip you want to combine. Clips will combine in order from bottom to top.
- Select Combine.
- Name your new clip and select the pause length between each clip.
- Click Create Clip.
Note: The maximum combined file size is 70 MB, or about 32 clips, depending on the sample rate.
Moving a clip
You can rearrange clips within a project and from project to project. Learn how to move files to another project here.
In Caruso:
Renaming a section
- In your Studio project, locate the section you want to rename.
- Click on Untitled on the upper left of the section to edit.
- Enter a new name for your section.
- Click anywhere outside the section to save.
Combining sections
Before you begin: Ensure your sections have been rendered.
- Select the sections you want to combine by checking individual sections or clicking Select all at the top of the section list.
- Click Download and choose Download audio as: Combined file. Sections will combine in order from top to bottom, and only the active take in each section will be downloaded.
- Choose a pause length to ensure consistent pauses between sections:
- Short pause (0.3s)
- Natural pause (0.8s)
- Long pause (1.2s)
- Name your file. This name will be used for the downloaded audio file on your computer.
- Enable captions (optional) to download captions alongside your audio. Combined downloads generate a single caption file (SRT, VTT) for the full audio.
Note: If you have Global Download settings turned on, you’ll need to disable them in your settings.
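To see how the pause length affects a combined file, the arithmetic can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the section durations are made-up sample values, and the sketch assumes the pause is inserted between consecutive sections only (not before the first or after the last). The three pause lengths are the ones listed above.

```python
# Pause lengths offered when combining sections (from this guide), in seconds.
PAUSES = {"short": 0.3, "natural": 0.8, "long": 1.2}


def section_start_times(durations: list[float], pause: str = "natural") -> list[float]:
    """Return the start time of each section within the combined audio."""
    gap = PAUSES[pause]
    starts = []
    cursor = 0.0
    for duration in durations:
        starts.append(round(cursor, 3))
        cursor += duration + gap  # next section begins after this one plus the pause
    return starts


def combined_length(durations: list[float], pause: str = "natural") -> float:
    """Total length: all sections plus one pause between each adjacent pair."""
    if not durations:
        return 0.0
    return round(sum(durations) + PAUSES[pause] * (len(durations) - 1), 3)


if __name__ == "__main__":
    takes = [2.0, 3.5, 1.0]  # hypothetical section durations in seconds
    print(section_start_times(takes, "natural"))
    print(combined_length(takes, "natural"))
```

An estimate like this can be handy when planning how a combined file will line up with video or slides before you download it.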
Rearranging a section
- Copy the text from the section you want to move.
- Find the section that should come before the one you’re moving.
- Click New Section under that section.
- Paste your copied text into the new section and click Play to generate a take.
Note: In Studio, you can create as many takes as needed to get the right clip. Learn more about Unlimited Retakes and how clips are counted.
Pronunciation Best Practices
When the voice needs help predicting a word or text, you can guide the output using the following best practices.
Use a Respelling Suggestion
Use the Respelling or Smart Suggestions features to add accurate phonetic replacements for common, industry-specific, uncommon, and complex words, among others.
Double-click or highlight a word in your script to display the toolbar, or simply click New Replacement to see Respelling suggestions.
Create a phonetic Respelling
Respellings are a unique way to format a word by breaking down each syllable and determining which syllables should be emphasized. Create your own or use our Oxford or Smart suggestions.
Create your own Replacement
Use a Replacement to substitute a word, term, or phrase with an alternative spelling when its pronunciation would otherwise be ambiguous.
Example: 1099-MISC, tax form
- Voices will vocalize as “ten ninety-nine M I S C”
- Add a Replacement so the voices will always say “ten ninety-nine Miscellaneous”
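Conceptually, a Replacement is a whole-term substitution applied to your script before the voice reads it. The Python sketch below illustrates that idea only; it is not how Studio implements Replacements, and the table and function names are hypothetical. The single entry is the tax-form example above.

```python
import re

# Hypothetical replacement table; the entry is this guide's own example.
REPLACEMENTS = {
    "1099-MISC": "ten ninety-nine Miscellaneous",
}


def apply_replacements(script: str, replacements: dict[str, str]) -> str:
    """Swap each listed term for its spoken form, matching whole terms only."""
    if not replacements:
        return script
    # Try longer terms first so a short term never clobbers part of a longer one.
    terms = sorted(replacements, key=len, reverse=True)
    alternation = "|".join(re.escape(term) for term in terms)
    # The lookarounds keep a term from matching inside a larger word or number.
    pattern = re.compile(r"(?<!\w)(" + alternation + r")(?!\w)")
    return pattern.sub(lambda match: replacements[match.group(1)], script)


if __name__ == "__main__":
    print(apply_replacements("File your 1099-MISC by January.", REPLACEMENTS))
```

Matching is case-sensitive in this sketch, so "1099-misc" would be left untouched unless you add it to the table as well.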
Emphasizing a word or phrase
Place a word or phrase in quotation marks (""), so the voice pays particular attention to the chosen word or phrase. Quotation marks let you shape the emphasis of your sentence.

Acronym Pronunciation
Some acronyms are pronounced as a word (NASA), while others aren't (NBA). Learn how to guide the voice when pronouncing acronyms correctly here.
Number Pronunciation
Just like real voice actors, AI voices need cues to understand whether a number is a dollar amount, a reference number, a value, an address, a year, a phone number, and so on. Learn how to pronounce numbers here.
Adding a pause
You can add natural pauses by using commas and periods.
- Commas add pauses anywhere you want a slight, subtle pause.
- Periods create a pause with a downward inflection. They are best used to break a long sentence into two pieces, allowing the AI to better predict which words to emphasize.
Adding a longer pause between sentences
- Use an ellipsis (...) to create "breathing room" or a combination of punctuation marks ("...") to create space.
- Press the return or enter key and enter a period a few times for a slightly longer pause.
- You can utilize the Combine feature and choose the length of time between each audio file.
Adjusting questions
Adding inflection to a question requires context, which our AI is still learning. However, you can guide spoken questions to have the inflection you're looking for by using our tips here.
Congratulations, you're now ready to dive into Studio and start bringing your projects to life!
- If you have further questions, please visit our Help Center for additional information and FAQs.
- Want to see Studio in action? View our latest webinars to make the most out of your Studio time.
- If you'd like further assistance, our WellSaid Support team is available to help.
Happy Creating!