Module: `bark_hf_inference_workflow`

# Bark Inference Workflow

This workflow uses Hugging Face's `transformers` library to perform inference with Suno's Bark text-to-speech model.
## Constructor Arguments

- `model_name` (`Optional[str]`): The source of the model. This can be either `suno/bark` or `suno/bark-small`. Default is `suno/bark`.
- `default_voice_preset` (`Optional[str]`): The default voice preset to be used. See the [list of supported presets](https://github.com/suno-ai/bark?tab=readme-ov-file#-voice-presets).
## Additional Installations

Since this workflow uses some additional libraries, you'll need to install `infernet-ml[bark_inference]`. Alternatively, you can install those packages directly; the optional dependencies `[bark_inference]` are provided for your convenience.
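For example, the extras install described above can be run as follows (shown here with `pip`; use your project's package manager of choice):

```shell
# Install infernet-ml together with the optional Bark inference dependencies.
# Quoting the requirement avoids shell globbing on the square brackets.
pip install "infernet-ml[bark_inference]"
```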
## Input Format

Input to the inference workflow is the following pydantic model:

```python
class BarkWorkflowInput(BaseModel):
    # prompt to generate audio from
    prompt: str
    # voice to be used. There is a list of supported presets here:
    # https://github.com/suno-ai/bark?tab=readme-ov-file#-voice-presets
    voice_preset: Optional[str]
```

- `"prompt"`: The text prompt to generate audio from.
- `"voice_preset"`: The voice preset to be used. See the [list of supported presets](https://github.com/suno-ai/bark?tab=readme-ov-file#-voice-presets).
## Output Format

The output of the inference workflow is a pydantic model with the following field:

- `"audio_array"`: The audio array generated from the input prompt.
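As a sketch of the shape of this result, using a hypothetical dataclass stand-in (the real `AudioInferenceResult` is a pydantic model defined in `infernet-ml`), the generated samples are exposed under `audio_array`:

```python
from dataclasses import dataclass

import numpy as np


# Hypothetical stand-in mirroring the documented output shape; the real
# AudioInferenceResult class lives in infernet-ml, not here.
@dataclass
class AudioResultSketch:
    audio_array: np.ndarray


# e.g. one second of silence at Bark's 24 kHz sample rate
result = AudioResultSketch(audio_array=np.zeros(24_000, dtype=np.float32))
print(result.audio_array.shape)  # (24000,)
```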
## Example

In this example, we will use the Bark Inference Workflow to generate audio from a prompt. We will then write the generated audio to a wav file.
```python
from scipy.io.wavfile import write as write_wav  # type: ignore

from infernet_ml.workflows.inference.bark_hf_inference_workflow import (
    BarkHFInferenceWorkflow,
    BarkWorkflowInput,
)

workflow = BarkHFInferenceWorkflow(
    model_name="suno/bark-small", default_voice_preset="v2/en_speaker_0"
)
workflow.setup()

input = BarkWorkflowInput(
    prompt="Hello, my name is Suno. I am a text-to-speech model.",
    voice_preset="v2/en_speaker_5",
)

inference_result = workflow.inference(input)

generated_audio_path = "output.wav"

# write output to a wav file
write_wav(
    generated_audio_path,
    BarkHFInferenceWorkflow.SAMPLE_RATE,
    inference_result.audio_array,
)
```
## BarkHFInferenceWorkflow

Bases: `TTSInferenceWorkflow`

Implementation of Suno's TTS inference workflow.

Source code in `src/infernet_ml/workflows/inference/bark_hf_inference_workflow.py`
### do_postprocessing(input_data, output)

Converts the model output to a numpy array, which can then be used to save the audio file.

Parameters:

Name | Type | Description | Default
---|---|---|---
`input_data` | `Any` | original input data | required
`output` | `Tensor` | output tensor from the model | required

Returns:

Name | Type | Description
---|---|---
`AudioInferenceResult` | `AudioInferenceResult` | audio array
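A minimal sketch of what this conversion step looks like, with a hypothetical helper rather than the actual implementation (a real `torch.Tensor` would first go through `output.detach().cpu().numpy()`):

```python
import numpy as np


# Hypothetical sketch of the post-processing step: coerce the model output to
# a float32 numpy array and squeeze away the batch dimension, yielding a 1-D
# sample array that scipy.io.wavfile.write (or similar) can consume.
def to_audio_array(output) -> np.ndarray:
    arr = np.asarray(output, dtype=np.float32)
    return arr.squeeze()


audio = to_audio_array([[0.0, 0.1, -0.1]])  # a (1, 3) batch becomes a 1-D array
print(audio.shape)  # (3,)
```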
### do_preprocessing(input_data)

Preprocesses the input data.

Parameters:

Name | Type | Description | Default
---|---|---|---
`input_data` | `BarkWorkflowInput` | input data to be preprocessed | required

Returns:

Name | Type | Description
---|---|---
`BatchEncoding` | `BatchEncoding` | batch encoding of the input data
### do_run_model(preprocessed_data)

Runs the model on the preprocessed data.

Parameters:

Name | Type | Description | Default
---|---|---|---
`preprocessed_data` | `BatchEncoding` | preprocessed data | required

Returns:

Type | Description
---|---
`Tensor` | `torch.Tensor`: output tensor from the model
### do_setup()

Downloads the model from Hugging Face.

Returns:

`bool`: True on completion of loading the model
### do_stream(preprocessed_input)

Stream data for inference. Currently not implemented.

Parameters:

Name | Type | Description | Default
---|---|---|---
`preprocessed_input` | `Any` | preprocessed input data | required

Returns:

Type | Description
---|---
`Iterator[Any]` | iterator for streaming data

Raises:

Type | Description
---|---
`NotImplementedError` | if the method is not implemented
### inference(input_data, log_preprocessed_data=True)

Overrides the superclass `inference` method so it is annotated with the correct types.

Parameters:

Name | Type | Description | Default
---|---|---|---
`input_data` | `str` | prompt to generate audio from | required

Returns:

Name | Type | Description
---|---|---
`AudioInferenceResult` | `AudioInferenceResult` | audio array
## BarkProcessor

Bases: `Protocol`

Type for the Suno processor function. Used for type safety.
### __call__(input_data, voice_preset)

Parameters:

Name | Type | Description | Default
---|---|---|---
`input_data` | `str` | prompt to generate audio from | required
`voice_preset` | `str` | voice to be used. There is a list of supported presets [here](https://github.com/suno-ai/bark?tab=readme-ov-file#-voice-presets) | required

Returns:

Name | Type | Description
---|---|---
`BatchEncoding` | `BatchEncoding` | batch encoding of the input data