VHToolkit 2 Preview > Run VH Sandbox Example

Run VH Sandbox Example

Disclaimer: Preview version; content and features subject to change.

Tutorials

Add New NLP Service

Create and Hook Up New Chatbot

Download and Install Binary

Download and Set Up Source Code

Run VH Sandbox Example

Overview

This tutorial is geared towards any researcher and developer who wants to gain familiarity with VHToolkit 2.0.

Overview

Shows how Virtual Humans use a flexible set of backend solutions to demonstrate conversational behavior.

Leverages RIDE’s generic C# interfaces for Speech Recognition (ASR), Natural Language Processing (NLP), Text-To-Speech (TTS) and Non-verbal Behavior Generation (NVBG). Optionally includes sentiment analysis and entity analysis for NLP and Sensing (computer vision) with support for emotion and landmark recognition.

Initially supported services:

ASR	NLP	TTS	NVBG	Sensing
Windows Dictation Recognition (Native Windows service)	Azure QnA	AWS Polly Text To Speech and TTS Voice	NVBG System (External Windows Process)	Azure Face
Azure Speech Recognition (Google Speech SDK)	AWS Lex v1 & v2	Eleven Labs Text To Speech and TTS Voice	NVBG System (Local Integrated Class Library, Windows Only)	AWS Rekognition
Mobile Speech Recognition (KKSpeech Recognizer Asset)	OpenAI GPT3 Turbo GPT (3.5) GPT4	Windows TTS (External Windows Process)	NVBG System (Remote RESTful Service)

Supplemental Virtual Human Conversational Behavior:

Gaze, Saccade
Listening
Emotion Mirroring

Review the Supported Services by Character and Platform section for which combinations of the above are currently expected to function.

Additionally, this scenario has the capability of utilizing custom user created backend implementations for any of the above systems. Requires knowledge of RIDE’s generic C# interfaces.

How to Use

Run the scene and use the sandbox menu to control the scenario parameters.

Environment Tab

Choose between six flat background images, including USC ICT logo, 3D lobby scene, and landscape photos.
Three locations within the 3D SeaView environment.

VH Config Tab

Create VH: In this sub-tab, you can configure the backends you want to test before creating the Virtual Human by selecting the Create button.
- Character ID is prepopulated for each VH and is customizable; note, each VH must have a unique name.
- TTS Voice list is based on the selected TTS backend.
- Select Compare & Contrast to randomize the parameters prior to creation.
- Virtual Humans are created and placed within player FOV.

Edit VH: In this sub-tab, you can swap the backends for already created Virtual Humans.
- VH model template cannot be modified after creation.
- VH position and rotation can be modified here.
- Select between instanced VH characters from the dropdown.
- Delete problematic or unwanted VH characters.

Save/Load VH: In this sub-tab, you can save and load configured Virtual Humans to/from local storage.
- Saved VH configurations are identified by the VH name.
- VH configurations are saved as JSON. If in-scene backend objects have their name changed, the previously saved configuration will not load properly.
- If you load a VH configuration for an already created or loaded VH, the existing VH config will be overwritten by the loaded configuration.

Conversation Tab

This tab acts as conversation analysis/debugging.
Tracks the current VH the user is having a conversation with.
Has 2 sub-tabs, Analysis and Debug Info, for more in-depth information about the current conversation.

User Input

Text Input: Use the designated input field and the Submit button to send a query to the current conversational VH.
Audio Input: Press and hold the Push-To-Talk button and begin speaking.
- Input field will display detected partial speech (if supported by the current ASR backend).
- Query will be sent either once you release the push-to-talk button, or automatically if complete speech is detected, based on the current ASR system.

Toggle Webcam

Turns on the webcam and begins sensing for the backend of the current conversational VH.
Requires a created VH with an assigned Sensing backend.
Webcam overlay displays the following information:
- Detected face rectangle
- Detected landmarks
- Detected head orientation
- Detected emotion

Key bindings:

Key	Action
F1	Toggle Sandbox Menu
V	Push-to-talk shortcut
J	Toggle mouse cursor control
F	Toggle webcam shortcut
Tab	Cycle Sandbox Menu tab
F2	Create Virtual Human shortcut

Scene Location & Name

Assets/Scenes/VHSandbox/VHSandbox.unity

Setup Requirements

The main script name and location: Assets/Scripts/VHSandboxExample.cs.

Below are the steps to add a custom VH backend implementation to the sandbox:

Create your custom backend script, that implements one of the following Ride C# interfaces:
1. ASR – ISpeechRecognitionSystem
2. NLP – INLPQnASystem
3. TTS – ILipsyncedTextToSpeechSystem
4. NVBG – INonverbalGeneratorSystem
5. Sensing – ISensingSystem
Create a Unity gameobject with a user-friendly name and attach the created script.
1. The Gameobject should be placed in the scene under the “VH Backends” object, and should not have the same name as another backend gameobject.
To confirm the previous steps have been completed correctly, play the scene and see if the created gameobject name is displayed under the relevant backend dropdown.

Known Issues and Troubleshooting

Windows

Windows speech recognizer unavailable

Ensure under Settings > Speech, Online speech recognition option is switched to On

Mac

App dependencies blocked from running by versions of macOS

Due to macOS security features, the OS may block certain dependencies at run-time. Use the following workaround for any prompts received:

Open a terminal window at the IVA project folder
Input the following to edit the file attributes: xattr -cr “Assets/Ride_Dependencies (local)/SpeechSDK/Plugins/MacOS/libMicrosoft.CognitiveServices.Speech.core.dylib”
Input the following to edit the file attributes: xattr -cr “Assets/Ride_Dependencies (local)/Oculus/Oculus/LipSync/Plugins/MacOSX/OVRLipSync.bundle”
Play the VHSandbox – Minimal scene again

Speech input and camera feed not available

If prompted by the OS, enable access to your microphone and webcam

General

Config file out of date and errors when running any example scene

Update the config file by launching LevelSelect > ExampleLLM scene
Open the debug menu:
- Windows, press F11 key
- Mac, press (Command +) F11 key; note, may need to enable the option, System Preferences > Keyboard > “Use F1… as standard function keys”
Config menu appears; if not, click header or arrow of debug menu
Click Reset to Defaults button
Return to LevelSelect and choose desired scene again

Services unavailable and various errors when running any example scene

Ensure PC has an active Internet connection; majority of all services require Internet except for “local” or “external” Microsoft/Windows services

Example Scene

VHSandbox

AWS Lex v1 selected as both NLP Main and NLP Sentiment will fail and soft-lock input section of UI; as workaround, create new VH, then delete the problematic VH
- Also occurs with NLP Main as AWS Lex v1 and NLP Sentiment as AWS Lex v2; as workaround, do not use AWS Lex as options in both fields
  - Best practice if using AWS Lex, populate both fields with AWS Lex v2
Character utterance audio may repeat and not match text for ElevenLabs voices
Delay from click & hold until mic input status change to “listening”
Mic input may fail to function with initial click & hold
SeaView environment locations run very slow on lower-end hardware
Rocketbox characters do not display face mirroring
RenderPeople Test character does not display lipsync
Mobile Speech Recognizer does not function on desktop systems
Webcam view may fail to update with creation and/or deletion of multiple characters
AWS Polly TTS, corresponding TTS Voice options may fail to populate
Windows TTS (External Windows Process) may cause strong head nod at start of utterance for characters
NLP Sentiment as AWS Lex v1 or v2 and TTS as ElevenLabs TTS causes VH unresponsiveness and soft-lock input section of UI
Rocketbox characters with NLP Main set as AWS Lex v1 or v2 and TTS as ElevenLabs TTS may cause first utterance to repeat twice
Binary only: NVBG System (Local Integrated Class Library, Windows Only) causes VH unresponsiveness and soft-lock input section of UI

Supported Services by Character and Platform

Note: * denotes a known issue.

		Character Model			Platform
			Rocketbox – Male
		Kevin Civilian	Rocketbox – Female
		Davis OCP	Rocketbox – Male_2	Render People Test	Windows 10/11	MacOS
Speech Recognizer	Windows Speech Recognizer	N/A	N/A	N/A	Yes	N/A
	Azure Speech Recognizer	N/A	N/A	N/A	Yes	Yes
	Mobile Speech Recognizer	N/A	N/A	N/A	N/A	N/A
NLP Main	OpenAI TurboGPT	Yes	Yes	Yes	Yes	Yes
	Azure QnA	Yes	Yes	Yes	Yes	Yes
	AWS Lex v1	Yes*	Yes*	Yes*	Yes*	Yes*
	AWS Lex v2	Yes*	Yes*	Yes*	Yes*	Yes*
	OpenAI GPT3	Yes	Yes	Yes	Yes	Yes
	OpenAI GPT4	Yes	Yes	Yes	Yes	Yes
NLP Sentiment	Azure TA Sentiment	Yes	Yes	Yes	Yes	Yes
	AWS Lex v1	Yes*	Yes*	Yes*	Yes*	Yes*
	AWS Lex v2	Yes*	Yes*	Yes*	Yes*	Yes*
NLP Entities	Azure TA Entities	Yes	Yes	Yes	Yes	Yes
Nonverbal Generation	NVBG System (External Windows Process)	Yes	Yes	Yes	Yes	N/A
	NVBG System (Local Integrated Class Library, Windows Only)	No*	No*	No*	No*	N/A
	NVBG System (Remote RESTful Service)	Yes	Yes	Yes	Yes	Yes
TTS	AWS Polly TTS	Yes	Yes	Yes*	Yes	Yes
	ElevenLabs TTS – Proxy	Yes	Yes	Yes*	Yes	Yes
	ElevenLabs TTS – Auto	Yes	Yes	Yes*	Yes	Yes
	ElevenLabs v2 TTS – Auto	Yes	Yes	Yes*	Yes	Yes
	Windows TTS (External Windows Process)	Yes*	Yes*	Yes*	Yes	N/A
TTS Voice	AWS – Kevin	Yes	Yes	Yes	Yes	Yes
	AWS – Salli	Yes	Yes	Yes	Yes	Yes
	Eleven Labs P – Rachel	Yes	Yes	Yes	Yes	Yes
	Eleven Labs P – Clyde	Yes	Yes	Yes	Yes	Yes
	Eleven Labs P – Arno	Yes	Yes	Yes	Yes	Yes
	Eleven Labs P – Barack_Obama	Yes	Yes	Yes	Yes	Yes
	Eleven Labs A – Rachel	Yes	Yes	Yes	Yes	Yes
	Eleven Labs A – Clyde	Yes	Yes	Yes	Yes	Yes
	Eleven Labs A – Arno	Yes	Yes	Yes	Yes	Yes
	Eleven Labs A – Barack_Obama	Yes	Yes	Yes	Yes	Yes
	Eleven Labs v2 A – Rachel	Yes	Yes	Yes	Yes	Yes
	Eleven Labs v2 A – Clyde	Yes	Yes	Yes	Yes	Yes
	Eleven Labs v2 A – Arno	Yes	Yes	Yes	Yes	Yes
	Eleven Labs v2 A – Barack_Obama	Yes	Yes	Yes	Yes	Yes
	Microsoft – David	Yes	Yes	Yes	Yes	N/A
	Microsoft – Zira	Yes	Yes	Yes	Yes	N/A
Sensing	AWS Rekognition	Yes	No*	No*	Yes	Yes
	Azure Face	Yes	No*	No*	Yes	Yes

Run VH Sandbox Example

Overview

Page Contents

Overview

How to Use

Scene Location & Name

Setup Requirements

Known Issues and Troubleshooting

Windows

Windows speech recognizer unavailable

Mac

App dependencies blocked from running by versions of macOS

Speech input and camera feed not available

General

Config file out of date and errors when running any example scene

Services unavailable and various errors when running any example scene

Example Scene

VHSandbox

Supported Services by Character and Platform

Copyright © 2024 USC Institute for Creative Technologies