Guardrails in Globant Enterprise AI are implemented to ensure the system's security, ethical behavior, and reliability. These mechanisms are fundamental to:

  • Prevent inappropriate responses and biases.
  • Avoid disclosure of confidential information.
  • Mitigate undesired behavior by AI models.
  • Ensure compliance with legal regulations and ethical standards.

These controls protect both an organization's users and its reputation.

Guardrails in Globant Enterprise AI can be configured from the Backoffice or via the Assistant API.

In the Backoffice, these settings are configured when defining a new Chat Assistant, or by clicking EDIT PROMPT on an already created Chat Assistant.

When accessing the Assistant Editor, you will find the Security Guardrails section on the right side of the screen. This section will allow you to enable one or more of the following options:

  • Prompt Injection
  • Input Moderation
  • Assistant Output

You can activate any combination of these options according to your needs. The same flexibility is available when configuring the assistant via the Assistant API, as sketched below.
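
As a minimal illustration of what such an API configuration could look like, the following Python snippet sends a hypothetical request body. The endpoint, header names, and the guardrails property names (promptInjection, inputModeration, assistantOutput) are assumptions for illustration only and do not reflect the actual Assistant API schema; consult the Assistant API reference for the real contract.

    import requests

    # Hypothetical endpoint and payload; the actual Assistant API
    # schema may differ.
    BASE_URL = "https://api.example.com/v1/assistants"
    API_KEY = "YOUR_API_KEY"

    payload = {
        "name": "support-assistant",
        "prompt": "You are a helpful support assistant.",
        "guardrails": {                # assumed property names
            "promptInjection": True,   # block injected instructions
            "inputModeration": True,   # screen end-user inputs
            "assistantOutput": True,   # validate generated responses
        },
    }

    response = requests.post(
        BASE_URL,
        json=payload,
        headers={"Authorization": f"Bearer {API_KEY}"},
        timeout=30,
    )
    response.raise_for_status()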

When the three Guardrails are combined, Prompt Injection and Input Moderation run in parallel with the LLM call configured for the assistant, while Assistant Output acts as a final security layer, analyzing the output generated by the assistant before delivering it to the end user.
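
The following is a simplified sketch of that flow, assuming hypothetical check functions that stand in for the platform's internal Guardrail services; none of these names belong to a public API. It shows why the input-side checks add no latency: they run concurrently with the LLM call, and only the output check happens afterward.

    import asyncio

    # Hypothetical stand-ins for the platform's internal services.
    async def check_prompt_injection(text: str) -> bool:
        return "ignore previous instructions" not in text.lower()

    async def check_input_moderation(text: str) -> bool:
        return "offensive" not in text.lower()

    async def check_assistant_output(text: str) -> bool:
        return True

    async def call_llm(text: str) -> str:
        return f"Echo: {text}"

    async def answer(user_input: str) -> str:
        # Input-side Guardrails run concurrently with the LLM call.
        injection_ok, moderation_ok, draft = await asyncio.gather(
            check_prompt_injection(user_input),
            check_input_moderation(user_input),
            call_llm(user_input),
        )
        if not (injection_ok and moderation_ok):
            return "The action cannot be processed: a Guardrail was activated."
        # Assistant Output acts as a final layer before delivery.
        if not await check_assistant_output(draft):
            return "The action cannot be processed: a Guardrail was activated."
        return draft

    print(asyncio.run(answer("Hello!")))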

Additionally, you can configure Guardrails in your RAG Assistants in the Retrieval tab. For more information, go to Guardrails applied to RAG Assistants.

When a Guardrail is activated while the end user is interacting in the Frontend, a message will be displayed indicating that the action cannot be processed due to the activation of a Guardrail.
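
Integrations that call the assistant directly can surface the same situation to their users. A minimal sketch, assuming a hypothetical guardrail_triggered flag and message field in the response body (the real field names depend on the Assistant API):

    # Hypothetical response handling; field names are illustrative.
    def render_reply(api_response: dict) -> str:
        if api_response.get("guardrail_triggered"):
            # Mirror the Frontend behavior: tell the user the action
            # cannot be processed, without exposing internal details.
            return "This action cannot be processed due to a Guardrail."
        return api_response.get("message", "")

    print(render_reply({"guardrail_triggered": True}))
    print(render_reply({"message": "Here is your answer."}))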

Prompt Injection

By enabling this option, your AI assistants are protected against possible threats contained in end-user or system inputs. It runs in parallel with the assistant's LLM, so it adds no extra latency to the response. This Guardrail ensures that:

  • Malicious commands that manipulate the assistant's behavior cannot be injected.
  • The operational context of the model remains intact and is not altered.
  • Risks such as prompt manipulation or attempts to exploit vulnerabilities are reduced.

The Prompt Injection Guardrail evaluates the following categories, assigning a confidence score to each:

  • Prompt Injection
  • Self-Disclosure Attempts
  • Instruction Overrides
  • Code Execution Requests
  • Privacy Violations
  • Disallowed Content
  • Politeness Violations
  • Consistency Violations
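
The documentation does not specify the format in which these confidence scores are returned. As an illustration only, a detection result might be represented and thresholded as follows; the category keys and the 0.75 cutoff are assumptions, not the platform's actual scoring format:

    # Illustrative only: keys and threshold are assumptions.
    scores = {
        "prompt_injection": 0.91,
        "self_disclosure_attempts": 0.04,
        "instruction_overrides": 0.88,
        "code_execution_requests": 0.02,
        "privacy_violations": 0.01,
        "disallowed_content": 0.03,
        "politeness_violations": 0.00,
        "consistency_violations": 0.05,
    }

    THRESHOLD = 0.75  # hypothetical cutoff
    flagged = [name for name, score in scores.items() if score >= THRESHOLD]
    if flagged:
        print(f"Guardrail activated for: {', '.join(flagged)}")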

Input Moderation

Like Prompt Injection, this Guardrail runs in parallel with the call to the assistant's LLM.

This configuration analyzes user inputs in real time. Enabling this Guardrail makes it possible to:

  • Detect and block offensive or inappropriate language.
  • Identify content that breaches internal policies or ethical standards.
  • Ensure respectful interactions that comply with regulations, protecting both end users and the assistant's reputation.

For more details about the categories analyzed by the Input Moderation Guardrail, please check Moderation.

Assistant Output

Unlike the previous ones, this Guardrail analyzes the response after the assistant's LLM has generated it.

Selecting this option ensures that the answers generated by the assistant are safe and appropriate. This Guardrail allows you to:

  • Avoid inappropriate content in the model's outputs.
  • Ensure that responses meet legal and quality standards.
  • Monitor and validate the interactions generated by the assistant, ensuring a reliable experience for end users.

The Assistant Output Guardrail evaluates the following categories, assigning a confidence score to each:

  • Malicious URLs
  • Malicious Code
  • Prohibited Content
  • Language and Tone Issues
  • Instruction Noncompliance
  • Formatting Issues

This Guardrail analyzes the generated output in real time, during generation. When the stream property is enabled, the response is only returned after the validation process is complete. If an error is detected, the entire response is withheld and only the relevant error message is shown.
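
As a minimal sketch of this buffering behavior, the following snippet uses a hypothetical token stream and a hypothetical output_is_safe validator; neither name comes from the actual platform. With the stream property enabled, nothing reaches the user until validation completes, and a blocked response is replaced entirely by an error message:

    from typing import Iterable, Iterator

    # Hypothetical validator standing in for the Assistant Output checks.
    def output_is_safe(text: str) -> bool:
        return "forbidden" not in text.lower()

    def stream_with_guardrail(tokens: Iterable[str]) -> Iterator[str]:
        # Tokens are buffered until validation completes; nothing is
        # sent to the user beforehand.
        buffered = "".join(tokens)
        if not output_is_safe(buffered):
            # The entire response is withheld; only an error is shown.
            yield "This response was blocked by a Guardrail."
            return
        yield buffered

    for chunk in stream_with_guardrail(["Hello, ", "world!"]):
        print(chunk)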
