Add New NLP Service

Disclaimer: Preview version; content and features subject to change. 

Overview

This tutorial is geared towards researchers and developers who want to add new REST web services to VHToolkit 2.0.

This tutorial shows how developers can add additional NLP web services to RIDE. We’ll use Anthropic Claude as an example.

Background:

  • RIDE communicates with cloud NLP services through their REST APIs
  • RIDE provides a series of C# interfaces that can be implemented and extended
  • RIDE provides a common REST web services framework 
  • See the ExampleNLP scene and associated script for existing NLP solutions

Requirements

  • Existing NLP AI web service (e.g., AWS Lex) account
  • Web service requirements (e.g., authentication method)
  • Web service interaction details (e.g., endpoint URI)

ExampleLLM Unity Scene Introduction

The scene consists of three objects: RideSystems, a Camera, and the ExampleLLM gameObject. The initial game view will look like this:

The ExampleLLM child object hierarchy is as follows:

  • ProviderPanelParent
    • This is where all of the provider UI panels will be placed
    • Contains a PanelLLM prefab for OpenAI GPT – 4 (more on PanelLLM later)
  • BtnAsk
OnClick sends the prompt; optional, since pressing return/enter also sends it
  • InpQuestion
    • The RideInputField used to enter the prompt text

The ExampleLLM component references the two main objects of interest in the scene: InpQuestion, to register the user input, and ProviderPanelParent, to track the provider panels that exist in the scene.

The PanelLLM Unity Prefab

Existing in the hierarchy is the OpenAI GPT – 4 object under ExampleLLM -> ProviderPanelParent -> OpenAI GPT – 4. This is a PanelLLM prefab (Assets -> Ride (local) -> Prefabs -> UI -> PanelLLM.prefab) that has been modified to function as a UI panel for GPT-4.

The PanelLLM.prefab hierarchy:

  • TxtProvider
    • A RideTextTMPro object that is manually updated to showcase the provider name
  • AutoVerticalScrollView
    • A RideScrollView object whose content RideTextTMPro child object is updated with each user input and generated response. It automatically scrolls to the bottom to show the most recently received message
  • TxtResponseTime
    • A RideTextTMPro object that updates to show the response time taken for the most recently generated response
  • TxtMaxTokenCount
    • A RideTextTMPro object that is manually updated to show the max token count of the provider
The PanelLLM component:

In addition to storing references to the RideTextTMPro components used to update the text of their respective sections, this component also allows for customizing the colors of both the user input and the generated responses as they are displayed in the conversation text box.

Finally, the provider field references the NLP system whose responses the panel will showcase.
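As a rough sketch, the PanelLLM component's fields might look like the following (the names here are hypothetical, inferred from the description above; the actual RIDE script may differ):

// Hypothetical sketch of the PanelLLM component's fields, inferred from the
// description above; actual names in the RIDE script may differ.
public class PanelLLM : MonoBehaviour
{
    [SerializeField] private RideTextTMPro m_txtProvider;      // provider name label
    [SerializeField] private RideTextTMPro m_txtResponseTime;  // last response time
    [SerializeField] private RideTextTMPro m_txtMaxTokenCount; // provider token limit
    [SerializeField] private RideScrollView m_scrollView;      // conversation text box

    [SerializeField] private Color m_userInputColor;           // color of user input text
    [SerializeField] private Color m_responseColor;            // color of generated responses

    [SerializeField] private NLPBase m_provider;               // the NLP system this panel showcases
}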

Adding a New Panel for Anthropic

In order to test a new provider, a new panel is needed to show its responses. To add a new panel for the Anthropic – Claude provider, open the ExampleLLM scene, and add the PanelLLM prefab (Assets -> Ride (local) -> Prefabs -> UI -> PanelLLM.prefab) as a child of the ProviderPanelParent:

The game view now looks like this:

Update some text for clarity:

  • Rename the PanelLLM object added from “PanelLLM” to “Anthropic – Claude”
  • Change the text of the TxtProvider object in the Anthropic – Claude panel from “<Provider Name>” to “Anthropic – Claude”
  • Change the text of the TxtMaxTokenCount in the Anthropic – Claude panel from “Max Token Count <# tokens> tokens” to “Max Token Count: 100,000 tokens”

Review Existing GPT-4 Panel and Script

Key script elements:
  • It exists in the Ride.NLP namespace
  • It derives from NLPBase, which is the type expected by the PanelLLM component for the provider reference
  • It contains fields relative to its execution:
    • m_answerSize
    • temperature
    • max_tokens
    • OpenAIMessage objects
      • Note: unused in this tutorial
  • It contains an overridden SystemInit() method
  • It contains overridden methods, AskQuestion and Request, which are required by the NLPBase base class.
    • These are the interface methods used to collect the user input, package the input in a provider specific data structure and request a response

NLPBase Overview

NLPBase is an abstract base class that acts as a common interface for all NLP systems and enforces the implementation of key methods required for the NLP system to function.

It contains the fields:

  • m_uri: to store the endpoint
  • m_authorizationKey: to store the endpoint access key
  • m_responseTime: to store the most recent response time in ms
  • pastUserInputs: to store a history of the user inputs
  • generatedResponses: to store a history of the generated responses from the NLP system
  • stopwatch: for use when determining the response time
  • initialPrompt: to store the initial prompt used when initializing the NLP system

It contains the methods:

  • AddUserInput: used to add a new user input to the pastUserInputs list
  • AddGeneratedResponse: used to add a new generated response to the generatedResponses list
  • GetLatestUserInput: used to get the most recent user input
  • GetLatestGeneratedResponse: used to get the most recent generated response

It contains the abstract methods:

  • AskQuestion: used to package the most recent user input into a context appropriate data structure and send to the NLP system
  • Request: used to request and deserialize a response from the NLP system

These abstract methods are mandatory and are implemented by each derived system.

This abstract base class allows all NLP systems to behave in the same general way by having the same field and method names referring to context specific logic per derived NLP system (more on this with ExampleLLM.cs).
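Putting these pieces together, a minimal sketch of NLPBase might look like the following (reconstructed from the fields and methods listed above; the actual RIDE source may differ in access modifiers and details):

// Sketch of NLPBase, reconstructed from the description above; the actual
// RIDE implementation (including the INLPSystem/INLPQnASystem interfaces it
// implements) may differ.
using System;
using System.Collections.Generic;
using System.Diagnostics;

namespace Ride.NLP
{
    public abstract class NLPBase : RideSystemMonoBehavior
    {
        public string m_uri;                 // endpoint
        public string m_authorizationKey;    // endpoint access key
        public float m_responseTime;         // most recent response time in ms
        public Stopwatch stopwatch = new Stopwatch();
        public string initialPrompt;         // initial prompt used at initialization

        protected List<string> pastUserInputs = new List<string>();
        protected List<string> generatedResponses = new List<string>();

        public void AddUserInput(string input) => pastUserInputs.Add(input);
        public void AddGeneratedResponse(string response) => generatedResponses.Add(response);
        public string GetLatestUserInput() => pastUserInputs[pastUserInputs.Count - 1];
        public string GetLatestGeneratedResponse() => generatedResponses[generatedResponses.Count - 1];

        // Mandatory; implemented by each derived system.
        public abstract void AskQuestion(string question, Action<NLPResponse> onComplete);
        public abstract void Request(string uri, string question, Action<NLPResponse> onComplete, string data = null);
    }
}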

More on the key methods below.

SystemInit():
The SystemInit() method is an override of the RideSystemMonoBehavior method of the same name, inherited through NLPBase, which derives from RideSystemMonoBehavior. This override is optional, but it is a good place to initialize the URI, authorization key, and initial prompt for the provider, if there is one.

AskQuestion(string question, Action<NLPResponse> onComplete):
The AskQuestion method is a required override of the NLPBase abstract method and implements functionality from the INLPQnASystem interface, which NLPBase derives from. The purpose of this method is to package the most recent user input into a context appropriate data structure and pass it to Request. It automatically stores the user input in the pastUserInputs list for future access.

Request(string uri, string question, Action<NLPResponse> onComplete, string data = null):
The Request method is a required override of the NLPBase abstract method and implements functionality from the INLPSystem interface, which NLPBase derives from. The purpose of this method is to request a response, wait for it, and deserialize the response from the NLP system into a value that can be easily referenced and used. It automatically stores the deserialized response in the generatedResponses list for future access.

Setting up the AnthropicClaude System Script

Create a script for the new Anthropic system. For the purposes of this tutorial, it will be named AnthropicClaude.cs.

Note: during the next few implementation steps, comparisons can be made to the GPT4 system.

Script Setup First Steps:

  • Place the system inside of the Ride.NLP namespace
  • Get rid of the Start and Update methods if present
  • Derive from NLPBase instead of MonoBehaviour
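After these first steps, the class is a minimal skeleton along these lines (a sketch; your generated file will vary):

using System;

namespace Ride.NLP
{
    public class AnthropicClaude : NLPBase
    {
    }
}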

Addressing the Error:

There will be an error with the class. In particular, the error stems from failing to implement the methods required when deriving from NLPBase: AskQuestion and Request. Luckily, there is a simple way to fix this: right-click the class name -> Quick Actions and Refactorings… -> Implement abstract class. This sets up the required methods with the proper declarations automatically!
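The generated stubs should look something like this (exact bodies vary by IDE):

public class AnthropicClaude : NLPBase
{
    public override void AskQuestion(string question, Action<NLPResponse> onComplete)
    {
        throw new NotImplementedException();
    }

    public override void Request(string uri, string question, Action<NLPResponse> onComplete, string data = null)
    {
        throw new NotImplementedException();
    }
}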

Anthropic REST Web Service

As with OpenAI GPT-4, Anthropic can receive a REST message and send a response in JSON format.

https://docs.anthropic.com/claude/reference/complete_post provides the required and optional headers and parameters.

In addition to the required authorization key in the header, the required parameters are:

  • Model (e.g., claude-1, claude-2)
  • Prompt (i.e., the user input; ideally the full conversational history of both user and agent)
  • Max_tokens_to_sample (i.e., how many tokens to generate; tokens are word parts)

Known JSON structures are typically (de)serialized with supporting classes. See \Assets\Ride (local)\Systems\NLP\OpenAIGPT3.cs for an example.
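For illustration, a supporting class for the Anthropic request body could look like this (hypothetical; the AskQuestion implementation below serializes an anonymous object instead, so this class is optional):

// Optional request shape for the Anthropic completion endpoint; field names
// mirror the JSON parameters listed above.
public class AnthropicRequest
{
    public string model { get; set; }
    public string prompt { get; set; }
    public int max_tokens_to_sample { get; set; }
}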

Implement AskQuestion() for Anthropic

Key Elements of the AskQuestion Method:

  • AddUserInput is used to store the question
  • A conversation history is being built by combining the user inputs and generated responses in order from the beginning
  • This history is provided alongside the Anthropic model and max tokens inside of a context relevant data structure needed to request a response
  • The user input and context relevant data is passed to Request

Code:

public override void AskQuestion(string question, Action<NLPResponse> onComplete)
{
    // Store the user input for future access.
    AddUserInput(question);

    // Rebuild the full conversation history in Anthropic's Human/Assistant format.
    var conversationHistory = new List<string>();
    for (int i = 0; i < pastUserInputs.Count; i++)
    {
        var userText = pastUserInputs[i];
        string generatedResponse = "";

        // The newest user input has no generated response yet.
        if (i < generatedResponses.Count)
            generatedResponse = generatedResponses[i];

        conversationHistory.Add($"\nHuman: {userText}\nAssistant: {generatedResponse}");
    }

    string historyText = string.Join(" ", conversationHistory);

    // Package the model, full prompt history, and token limit as the JSON body.
    string data = JsonConvert.SerializeObject(new
    {
        model = "claude-1",
        prompt = historyText,
        max_tokens_to_sample = 256
    });

    Request(m_uri, question, onComplete, data);
}

Implement Request() for Anthropic

Key Elements of the Request Method:

  • It is an async override to allow for waiting for a response to come
    • Note: when using the quick actions and refactorings, the declaration is not automatically set to async. The async keyword must be added to the method declaration in order to use await functionality
  • A context relevant UnityWebRequest is made using the m_uri and m_authorizationKey to connect with the NLP system
  • It waits until a response is received
  • If the response is not an error, deserialize the response into usable information. Parse the information and use AddGeneratedResponse to add the parsed info.
    • In this context deserialize to the AnthropicResponse struct
  • Invoke the onComplete Action.
  • Add: using System.Threading.Tasks;

Code:

public override async void Request(string uri, string content, Action<NLPResponse> onComplete, string data = null)
{
    // Build a POST request carrying the JSON body produced in AskQuestion.
    // Note: this uses the system's m_uri rather than the uri parameter.
    using var request = new UnityWebRequest(m_uri, "POST");
    byte[] bodyRaw = Encoding.UTF8.GetBytes(data);
    request.uploadHandler = new UploadHandlerRaw(bodyRaw);
    request.downloadHandler = new DownloadHandlerBuffer();
    request.SetRequestHeader("Content-Type", "application/json");
    request.SetRequestHeader("x-api-key", m_authorizationKey);
    request.SetRequestHeader("anthropic-version", "2023-06-01");

    // Send the request and wait for completion without blocking the main thread.
    var operation = request.SendWebRequest();
    while (!operation.isDone)
    {
        await Task.Yield();
    }

    if (!(request.result == UnityWebRequest.Result.ConnectionError))
    {
        // Deserialize the JSON response, store it, and notify the caller.
        var result = request.downloadHandler.text;
        var res = JsonConvert.DeserializeObject<AnthropicResponse>(result);
        AddGeneratedResponse(res.completion);
        onComplete?.Invoke(new NLPResponse(res.completion));
    }
}

// Minimal class for deserializing Anthropic's JSON response.
private class AnthropicResponse
{
    public string completion { get; set; }
    public string stop_reason { get; set; }
    public string model { get; set; }
}

Note: the stopwatch of the NLPBase can be used in Request to measure the response time and store it in m_responseTime for later use, but ExampleLLM.cs instead uses the stopwatch explicitly outside of the class.

Override and Implement SystemInit() for Anthropic

Now that we have the functionality for asking for and receiving responses, we need to initialize the system to be able to make these requests.

Key Elements of the SystemInit Method:

  • The RIDE config system is used to read the endpoint and the endpointKey.
  • The initialPrompt field is used alongside the AddUserInput method to set the starting user input; the AddGeneratedResponse method then associates a response with that starting input. This keeps the pastUserInputs list the same length as the generatedResponses list, since no response is generated for this input through AskQuestion
  • Calls base.SystemInit()

Code:

public override void SystemInit()
{
    // Read the Anthropic endpoint and access key from the RIDE config system.
    var configSystem = api.systemAccessSystem.GetSystem<RideConfigSystem>();
    m_uri = configSystem.config.anthropicClaude.endpoint;
    m_authorizationKey = configSystem.config.anthropicClaude.endpointKey;

    // Seed the history so pastUserInputs and generatedResponses stay the same length.
    AddUserInput(initialPrompt);
    AddGeneratedResponse("Initial prompt registered successfully");
    base.SystemInit();
}
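For these lookups to succeed, the RIDE config needs matching entries. Assuming a JSON config with the field names used in the code above (the endpoint shown is the completions URL from Anthropic's documentation; the key value is a placeholder):

{
  "anthropicClaude": {
    "endpoint": "https://api.anthropic.com/v1/complete",
    "endpointKey": "YOUR_ANTHROPIC_API_KEY"
  }
}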

Add Script to ExampleLLM Scene / PanelLLM

Back in the ExampleLLM scene, we can now add our AnthropicClaude component to the previously created PanelLLM object and reference it in the provider field of the PanelLLM component.


Add the same prompt as used with GPT4:

You are a helpful assistant as part of the Virtual Human Toolkit 2.0 (VHToolkit 2.0). The VHToolkit 2.0 is developed by the USC Institute for Creative Technologies. It is currently in exclusive preview, only usable by a select lucky few. Any user should feel special about being able to use this preview before its formal release later this year. The VHToolkit 2.0 combines a flexible architecture with a principled API, using both academic and industry technologies. Capabilities include audio-visual sensing, speech recognition, natural language processing, nonverbal behavior generation and realization, and text-to-speech. Initially, many of these are implemented through the use of web services, with local alternatives being added in the future. The main game engine target is Unity, with Unreal Engine and Metahuman support underway. Your answers should be in a conversational tone with single sentence responses, emphasizing brevity while summarizing when necessary. Responses should be in spoken language, no characters that cannot be spoken.

Hit Play and Run the Scene

You’re done! You should now be able to interact with both GPT-4 and Claude simultaneously.

Optional: ExampleLLM.cs Overview

Adding the panel as a child of the ProviderPanelParent allows for the scene to prompt multiple NLP systems at once.

Fields of ExampleLLM:

  • m_uiQuestion: the Input field referenced in the editor. Used for user input
  • providerPanelParent: a transform referenced in the editor, used to get all PanelLLMs in child objects
  • m_providerDict: a dictionary that maps provider names to the PanelLLM components of those providers, for every child in providerPanelParent

Methods of ExampleLLM:

  • Start:
    • Throws an error if the providerPanelParent is not provided
    • If provided, it initializes the m_providerDict so the PanelLLM component of every PanelLLM child of ProviderPanelParent is easily referenceable. This is why adding the PanelLLM as a child object of ProviderPanelParent automatically allows ExampleLLM to send prompts to the Anthropic system we just made
  • Update: used to check if the return/enter key is pressed to send non-empty user inputs to the NLP systems
  • ResetUIFields: Used to reset the conversation and response time text of each PanelLLM
  • AskQuestion:
    • Checks for internet connection. If none is found, it throws an error and returns
    • Checks if the user input is empty. If empty, it logs a message and returns
    • For every panel in the previously initialized dict
      • It starts the stopwatch of the NLPBase system in the PanelLLM component to begin tracking response time
      • Calls AskQuestion with the user input and adds the user input to the panel
      • Uses a lambda expression as the onComplete Action, allowing the same PanelLLM reference to be used in the action that runs after Request completes. In this action, the stopwatch is stopped, the response time is updated, the stopwatch is reset, and the generated response is added to the panel

AskQuestion can function in this polymorphic manner because the NLP systems all derive from NLPBase, which allows the same method and field names to refer to the context specific implementation of any derived NLP system within a single loop.
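As an illustration of that loop, the per-panel logic might look roughly like this (the panel helper methods and the NLPResponse property shown are hypothetical; the actual ExampleLLM.cs will differ):

// Hypothetical sketch of ExampleLLM's per-panel AskQuestion loop; the panel
// helper methods are illustrative and may not match the real script.
void AskQuestion(string userInput)
{
    foreach (var panel in m_providerDict.Values)
    {
        var provider = panel.provider;   // an NLPBase-derived system
        provider.stopwatch.Start();      // begin tracking response time
        panel.AddConversationText(userInput);

        // The lambda captures 'panel' and 'provider', so the correct panel is
        // updated when this provider's Request completes.
        provider.AskQuestion(userInput, response =>
        {
            provider.stopwatch.Stop();
            panel.SetResponseTime(provider.stopwatch.ElapsedMilliseconds);
            provider.stopwatch.Reset();
            panel.AddConversationText(response.text);
        });
    }
}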

Achievement Unlocked!

What other cloud solutions might you implement within VHToolkit 2.0 to benefit your research and simulations?