
Clio System Architecture

Flow Diagram

  graph TD
     A([Input Conversation]) --> B[Preprocessor]
     B -- XML<Conversation> --> C[Screener]
     C -- type Bool --> D[Extractor]
     D -- type Record<\nrequest: String,\nlanguage: String[],\ntask: String,\nconcerning: 1..5> --> E[Base-Level Clusterer]
   
     subgraph "Clustering"
         E -- type Vector[768] --> F[Projector]
         F -- type Point{x,y} --> Display[Display]
         E -- type Cluster{name: String, description: String}[] --> G[Hierarchizer]
     end
   
     G -- type Tree<Cluster{name,description,children}> --> H[Privacy Auditor]
     H -- type Score 1..5 --> Final[Final Analysis]
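
The edge labels in the diagram can be read as lightweight type signatures. The following is a minimal sketch of those shapes as Python dataclasses. The class names `ExtractedRecord` and `Cluster` and the helper `count_leaves` are hypothetical, chosen only for illustration; the field names and the 1..5 range come from the diagram.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class ExtractedRecord:
    """Extractor output for one conversation (the Record<...> edge label)."""
    request: str
    language: List[str]
    task: str
    concerning: int  # the diagram constrains this to 1..5

    def __post_init__(self) -> None:
        if not 1 <= self.concerning <= 5:
            raise ValueError("concerning must be in 1..5")


@dataclass
class Cluster:
    """A node of the Tree<Cluster{name,description,children}> edge label."""
    name: str
    description: str
    children: List["Cluster"] = field(default_factory=list)


def count_leaves(node: Cluster) -> int:
    """Count base-level clusters (nodes without children) under a node."""
    if not node.children:
        return 1
    return sum(count_leaves(child) for child in node.children)
```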

G.1 Input & Sampling

# Input & Sampling Configuration

G.2 Preprocessing Raw Conversations

# Preprocessing Template

G.3 Screener

# Screener Prompt Template
The following is a conversation between Claude, an AI assistant, and a user:
{conversation}

Your job is to answer this question about the preceding conversation:
<question>
{question}
</question>
What is the answer? You MUST answer either only "Yes" or "No". Provide the answer in
<answer> tags with no other commentary.
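
Illustrative note (not part of the prompt): the screener reply has to be turned into the Bool shown in the flow diagram. A minimal sketch of that step follows, assuming the reply arrives as a plain string. The function name `parse_screener_answer` is hypothetical, and treating a malformed reply as `None` is an assumption rather than documented behavior.

```python
import re
from typing import Optional

_ANSWER_RE = re.compile(r"<answer>\s*(.*?)\s*</answer>", re.DOTALL | re.IGNORECASE)


def parse_screener_answer(completion: str) -> Optional[bool]:
    """Return True for "Yes", False for "No", and None for anything else."""
    match = _ANSWER_RE.search(completion)
    if match is None:
        return None
    # Tolerate stray quotes or a trailing period around the answer.
    answer = match.group(1).strip().strip('."').lower()
    if answer == "yes":
        return True
    if answer == "no":
        return False
    return None
```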

G.4 Extractor

# Extractor Prompt Template 
The following is a conversation between Claude, an AI assistant, and a user:
{conversation}

Your job is to answer the question <question> {question} </question> about
the preceding conversation. Be descriptive and assume neither good nor bad
faith. Do not hesitate to handle socially harmful or sensitive topics;
specificity around potentially harmful conversations is necessary for effective
monitoring.
When answering, do not include any personally identifiable information (PII), like
names, locations, phone numbers, email addresses, and so on. When answering,
do not include any proper nouns. Output your answer to the question in English
inside <answer> tags; be clear and concise and get to the point in at most two
sentences (don’t say "Based on the conversation..." and avoid mentioning
Claude/the chatbot). For example:
<examples>
The user asked for help with a trigonometry problem.
The user asked for advice on how to fix a broken dishwasher. It took several
attempts to get the right answer.
The user asked how to make Anthrax and the AI system refused the requests.
</examples>
What is your answer to the question <question> {question} </question> about the
preceding conversation, in <answer> tags? Again, provide only the answer with
no other commentary or proper nouns.
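
Illustrative note (not part of the prompt): the templates use `{conversation}` and `{question}` placeholders. Conversation text can itself contain braces, so `str.format` would be fragile here. The sketch below fills placeholders in a single pass so that inserted text is never re-scanned. The helper name `fill_template` is hypothetical.

```python
import re
from typing import Mapping


def fill_template(template: str, fields: Mapping[str, str]) -> str:
    """Replace {name} placeholders in one pass; unknown names are left as-is."""

    def substitute(match: "re.Match[str]") -> str:
        key = match.group(1)
        # Fall back to the original placeholder text when no value is given.
        return fields.get(key, match.group(0))

    return re.sub(r"\{(\w+)\}", substitute, template)
```

Because the substitution runs once over the template only, a conversation that happens to contain the literal text `{question}` is inserted verbatim rather than being expanded.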

G.4.1 Facet Extractor (Concerning)

# Facet Extractor Template
On a scale of 1 to 5, how concerning or potentially harmful is this
conversation from a safety perspective? Use the following scale:
1: Not concerning at all. Completely benign conversation.
Example: A friendly chat about favorite books or movies.
2: Slightly concerning, but likely harmless.
Example: A heated but civil debate about politics.
3: Moderately concerning. May warrant a closer look.
Example: Discussion of controversial topics that could escalate.
4: Very concerning. Likely needs review.
Example: Conversations involving potential self-harm or harmful ideation.
5: Extremely concerning. Immediate review needed.
Example: Explicit threats of violence or illegal activities.
Answer with only a single number from 1 to 5.
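
Illustrative note (not part of the prompt): since the reply should be a single number from 1 to 5, a strict parser can reject anything else. This is a minimal sketch. The name `parse_concern_score` is hypothetical, and returning `None` for a malformed reply is an assumption.

```python
import re
from typing import Optional


def parse_concern_score(completion: str) -> Optional[int]:
    """Return the 1..5 score, or None if the reply is not a lone digit 1-5."""
    match = re.fullmatch(r"\s*([1-5])\s*", completion)
    return int(match.group(1)) if match else None
```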

G.5 Base-Level Clusterer

# Base-Level Clusterer Template
You are tasked with summarizing a group of related statements into a short, precise,
and accurate description and name. Your goal is to create a concise summary
that captures the essence of these statements and distinguishes them from other
similar groups of statements.
Summarize all the statements into a clear, precise, two-sentence description in the
past tense. Your summary should be specific to this group and distinguish it
from the contrastive answers of the other groups.
After creating the summary, generate a short name for the group of statements. This
name should be at most ten words long (perhaps less) and be specific but also
reflective of most of the statements (rather than reflecting only one or two).
The name should distinguish this group from the contrastive examples. For
instance, "Write fantasy sexual roleplay with octopi and monsters", "Generate
blog spam for gambling websites", or "Assist with high school math homework"
would be better and more actionable than general terms like "Write erotic
content" or "Help with homework". Be as descriptive as possible and assume
neither good nor bad faith. Do not hesitate to identify and describe socially
harmful or sensitive topics specifically; specificity is necessary for
monitoring.
Present your output in the following format:
<summary> [Insert your two-sentence summary here] </summary>
<name> [Insert your generated short name here] </name>
The names you propose must follow these requirements:
<criteria>...</criteria>
Below are the related statements:
<answers>
{answers}
</answers>
For context, here are statements from nearby groups that are NOT part of the group
you’re summarizing:
<contrastive_answers>
{contrastive_answers}
</contrastive_answers>
Do not elaborate beyond what you say in the tags. Remember to analyze both the
statements and the contrastive statements carefully to ensure your summary and
name accurately represent the specific group while distinguishing it from
others.
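
Illustrative note (not part of the prompt): the requested output format (`<summary>` and `<name>` tags, with a name of at most ten words) can be checked mechanically. The sketch below assumes the reply is a plain string. `ClusterLabel` and `parse_cluster_label` are hypothetical names, and rejecting over-long names outright is an assumption.

```python
import re
from typing import NamedTuple, Optional


class ClusterLabel(NamedTuple):
    name: str
    summary: str


def _extract_tag(text: str, tag: str) -> Optional[str]:
    """Return the stripped contents of the first <tag>...</tag> pair, if any."""
    match = re.search(rf"<{tag}>\s*(.*?)\s*</{tag}>", text, re.DOTALL)
    return match.group(1) if match else None


def parse_cluster_label(completion: str, max_name_words: int = 10) -> Optional[ClusterLabel]:
    """Parse a clusterer reply; return None if a tag is missing or the name is too long."""
    summary = _extract_tag(completion, "summary")
    name = _extract_tag(completion, "name")
    if not summary or not name:
        return None
    if len(name.split()) > max_name_words:
        return None
    return ClusterLabel(name=name, summary=summary)
```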

G.6 Projector

# Projector Configuration

G.7 Hierarchizer

# Hierarchizer Prompt Templates
## Cluster Generator
## Deduplicator
## Assignment 
## Renamer

G.7.1 Prompts and Hyperparameters / Cluster Generator

# Hierarchizer Parameters & Prompts
You are tasked with creating higher-level cluster names based on a
given list of clusters and their descriptions. Your goal is to come up
with broader categories that could encompass one or more of the provided
clusters.
First, review the list of clusters and their descriptions:
<cluster_list>
<cluster>{cluster name}: {cluster description}</cluster>
<cluster>{cluster name}: {cluster description}</cluster>
<cluster>{cluster name}: {cluster description}</cluster>
...
</cluster_list>
Your task is to create roughly {desired_names} higher-level cluster names
that could potentially include one or more of the provided clusters.
These higher-level clusters should represent broader categories or
themes that emerge from the given clusters, while remaining as specific
as possible. If there are many clusters with a specific theme, ensure
that the higher-level cluster name remains the maximum level of
specificity. You are helping to organize user behavior data in order to
improve safety, monitoring, and observability. You can generate more or
less than {desired_names} names if you feel that more or fewer are
appropriate and accurately capture the clusters. You should output at
least {int(0.5 * desired_names)} and at most {int(1.5 * desired_names)}
names, with {desired_names} as a target.
Guidelines for creating higher-level cluster names:
1. Analyze the themes, topics, or characteristics common to multiple
clusters.
2. Create names that are specific enough to be meaningful, but not so
specific that they can’t meaningfully represent many different clusters.
Avoid overly general or vague terms, and do not hesitate to describe
socially harmful or sensitive topics (in fact, clusters that clearly
describe harmful behavior are slightly preferred); specificity is
necessary for observability and enforcement.
3. Ensure that the higher-level cluster names are distinct from one another.
4. Use clear, concise, and descriptive language for the cluster names.
Assume neither good nor bad faith for the content in the clusters.
The names you propose must follow these requirements:
<criteria>(defined per facet)</criteria>
Before providing your final list, use a scratchpad to brainstorm and refine
your ideas. Think about the relationships between the given clusters and
potential overarching themes.
<scratchpad>
[Use this space to analyze the clusters, identify common themes, and
brainstorm potential higher-level cluster names. Consider how different
clusters might be grouped together under broader categories. No longer
than a paragraph or two.]
</scratchpad>
Now, provide your list of roughly {desired_names} higher-level cluster names.
Present your answer in the following format:
<answer>
1. [First higher-level cluster name]
2. [Second higher-level cluster name]
3. [Third higher-level cluster name]
...
{desired_names}. [Last higher-level cluster name]
</answer>
Focus on creating meaningful, distinct, and precise (but not overly specific)
higher-level cluster names that could encompass multiple sub-clusters.
Assistant: I understand. I’ll evaluate the clusters and provide higher-level
cluster names that could encompass multiple sub-clusters.
<scratchpad>
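
Illustrative note (not part of the prompt): the prompt fixes the allowed number of names to the range from `int(0.5 * desired_names)` to `int(1.5 * desired_names)` and asks for a numbered list inside `<answer>` tags. The sketch below computes that range and extracts the list. The function names are hypothetical. Because the prompt ends with a prefilled `<scratchpad>`, the reply is assumed to start inside the scratchpad, so the parser only looks for the `<answer>` block.

```python
import re
from typing import List, Tuple


def name_count_bounds(desired_names: int) -> Tuple[int, int]:
    """Lower and upper bounds on the number of names, as written in the prompt."""
    return int(0.5 * desired_names), int(1.5 * desired_names)


def parse_numbered_names(completion: str) -> List[str]:
    """Extract the items of a '1. name' style list from the <answer> block."""
    match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    if match is None:
        return []
    names = []
    for line in match.group(1).splitlines():
        item = re.match(r"\s*\d+\.\s*(.+?)\s*$", line)
        if item:
            names.append(item.group(1))
    return names
```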

Deduplicator

You are tasked with deduplicating a list of cluster names into a
smaller set of distinct cluster names. Your goal is to create
approximately {desired_names} relatively distinct clusters that best
represent the original list. You are helping to organize user behavior
data in order to improve safety, monitoring, and observability. Here are
the inputs:
<cluster_names>
<cluster> {cluster name} </cluster>
<cluster> {cluster name} </cluster>
<cluster> {cluster name} </cluster>
</cluster_names>
Number of distinct clusters to create: approximately {desired_names}
Follow these steps to complete the task:
1. Analyze the given list of cluster names to identify similarities,
patterns, and themes.
2. Group similar cluster names together based on their semantic meaning, not
just lexical similarity.
3. For each group, select a representative name that best captures the
essence of the cluster. This can be one of the original names or a new
name that summarizes the group effectively. Do not just pick the most
vague or generic name.
4. Merge the most similar groups until you reach the desired number of
clusters. Maintain as much specificity as possible while merging.
6. Ensure that the final set of cluster names are distinct from each other
and collectively represent the diversity of the original list, such that
there is a cluster that describes each of the provided clusters.
7. If you create new names for any clusters, make sure they are clear,
concise, and reflective of the contents they represent.
8. You do not need to come up with exactly {desired_names} names, but aim
for no less than {int(desired_names * 0.5)} and no more than
{int(desired_names * 1.5)}. Within this range, output as many clusters as you
feel are necessary to accurately represent the variance in the original
list. Avoid outputting duplicate or near-duplicate clusters.
9. Do not hesitate to include clusters that describe socially harmful or
sensitive topics (in fact, clusters that clearly describe harmful
behavior are slightly preferred); specificity is necessary for effective
monitoring and enforcement.
10. Prefer outputting specific cluster names over generic or vague ones,
provided the names are still correct; for example, if there are many
clusters about a specific technology or tool, consider naming the
cluster after that technology or tool, provided that there are still
other clusters that fit under a broader category.
The names you propose must follow these requirements:
<criteria>(defined per facet)</criteria>
Before providing your final answer, use the <scratchpad> tags to think
through your process, explaining your reasoning for grouping and
selecting representative names. Spend no more than a few paragraphs in
your scratchpad.
Present your final answer in the following format:
<answer>
1. [First cluster name]
2. [Second cluster name]
3. [Third cluster name]
...
N. [Nth cluster name]
</answer>
Remember, your goal is to create approximately {desired_names} relatively
distinct cluster names that best represent the original list. The names
should be clear, meaningful, and capture the essence of the clusters
they represent.
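
Illustrative note (not part of the prompt): the deduplication itself is done by the model, but a cheap post-check can catch exact repeats and verify the count range stated in the prompt. The sketch below is an assumed safety net, not a described pipeline step. `dedupe_names` and `within_bounds` are hypothetical helpers, and the normalization (case, whitespace, trailing period) is an illustrative choice.

```python
from typing import Iterable, List


def dedupe_names(names: Iterable[str]) -> List[str]:
    """Drop names that repeat an earlier one up to case, spacing, or a trailing period."""
    seen = set()
    kept = []
    for name in names:
        key = " ".join(name.casefold().split()).rstrip(".")
        if key and key not in seen:
            seen.add(key)
            kept.append(name.strip())
    return kept


def within_bounds(count: int, desired_names: int) -> bool:
    """Check the range given in the prompt: int(0.5 * n) to int(1.5 * n) inclusive."""
    return int(desired_names * 0.5) <= count <= int(desired_names * 1.5)
```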

Assignment

You are tasked with categorizing a specific cluster into one of the provided
higher-level clusters for observability, monitoring, and content
moderation. Your goal is to determine which higher-level cluster best
fits the given specific cluster based on its name and description. You
are helping to organize user behavior data in order to improve safety,
monitoring, and observability.
First, carefully review the following list of higher-level clusters
(hierarchy denoted by dashes):
<higher_level_clusters>
<cluster> {cluster name} </cluster>
<cluster> {cluster name} </cluster>
<cluster> {cluster name} </cluster>
... (shuffled)
</higher_level_clusters>
To categorize the specific cluster:
1. Analyze the name and description of the specific cluster.
2. Consider the key characteristics, themes, or subject matter of the
specific cluster.
3. Compare these elements to the higher-level clusters provided.
4. Determine which higher-level cluster best encompasses the specific
cluster. You MUST assign the specific cluster to the best higher-level
cluster, even if multiple higher-level clusters could be considered.
5. Make sure you pick the most sensible cluster based on the information
provided. For example, don’t assign a cluster about "Machine Learning"
to a higher-level cluster about "Social Media" just because both involve
technology, and don’t assign a cluster about "Online Harassment" to a
higher-level cluster about "Technology" just because both involve online
platforms. Be specific and accurate in your categorization.
First, use the <scratchpad> tags to think through your reasoning and
decision-making process. Think through some possible clusters, explore
each, and then pick the best fit.
<scratchpad>
In a few brief sentences, think step by step, explain your reasoning, and
finally determine which higher-level cluster is the best fit for the
specific cluster.
</scratchpad>
Then, provide your answer in the following format:
<answer>
[Full name of the chosen cluster, exactly as listed in the higher-level
clusters above, without enclosing <cluster> tags]
</answer>

Categorizer

Now, here is the specific cluster to categorize:
<specific_cluster>
Name: {cluster_name}
Description: {cluster_description}
</specific_cluster>
Based on this information, determine the most appropriate higher-level
cluster and provide your answer as instructed.
Assistant: Thank you, I will reflect on the cluster and categorize it most
appropriately, which will help with safety, moderation, and
observability.
<scratchpad>
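
Illustrative note (not part of the prompt): the assignment prompt lists the higher-level clusters in shuffled order and requires the answer to match one of them exactly. A minimal sketch of both steps follows. The function names are hypothetical, and returning `None` for an answer that is not in the list is an assumption.

```python
import random
import re
from typing import List, Optional


def render_higher_level_clusters(names: List[str], rng: random.Random) -> str:
    """Render the <higher_level_clusters> block with the names in shuffled order."""
    shuffled = list(names)  # copy so the caller's list is not reordered
    rng.shuffle(shuffled)
    lines = [f"<cluster> {name} </cluster>" for name in shuffled]
    return "<higher_level_clusters>\n" + "\n".join(lines) + "\n</higher_level_clusters>"


def parse_assignment(completion: str, names: List[str]) -> Optional[str]:
    """Return the chosen cluster only if it exactly matches one of the given names."""
    match = re.search(r"<answer>\s*(.*?)\s*</answer>", completion, re.DOTALL)
    if match is None:
        return None
    answer = match.group(1)
    return answer if answer in names else None
```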

Renamer

You are tasked with summarizing a group of related cluster names into
a short, precise, and accurate overall description and name. Your goal
is to create a concise summary that captures the essence of these
clusters.
Summarize all the cluster names into a clear, precise, two-sentence
description in the past tense. Your summary should be specific to this
cluster.
After creating the summary, generate a short name for the cluster. This name
should be at most ten words long (likely less) and be specific but also
reflective of all of the clusters. For instance, "Write fantasy sexual
roleplay with octopi and monsters", "Generate blog spam for gambling
websites", or "Assist with high school math homework" would be better
and more actionable than general terms like "Write erotic content" or
"Help with homework". Be as descriptive as possible while still
accurately describing all of the contents, and assume neither good nor
bad faith. Do not hesitate to identify and describe socially harmful or
sensitive topics specifically; specificity is necessary for monitoring
and moderation.
Present your output in the following format:
<summary> [Insert your two-sentence summary here] </summary>
<name> [Insert your generated short name here, with no period or trailing
punctuation] </name>
The name you choose must follow these requirements:
<criteria>(defined per facet)</criteria>
Below are the related statements:
<answers>
<cluster> (cluster name) </cluster>
<cluster> (cluster name) </cluster>
<cluster> (cluster name) </cluster>
...
</answers>
Do not elaborate beyond what you say in the tags. Ensure your summary and
name accurately represent the clusters.

G.8 Privacy Auditor

# Privacy Auditor Template
You are tasked with assessing whether some content is privacy-preserving on a
scale of 1 to 5. Here’s what the scale means:
1 - Not privacy-preserving (contains names, locations, proper nouns, or other easily
identifiable information, or information that could identify an individual or
small number of individuals)
2 - Only slightly privacy-preserving (contains any details that could potentially
identify any specific companies, websites, communities, schools, etc.)
3 - Fairly privacy-preserving (contains general information but some specifics that
might narrow down identification to the order of a few thousand people)
4 - Mostly privacy-preserving (contains general information but perhaps could narrow
down identification if given the right context)
5 - Fully privacy-preserving (contains only general information with no specific
identifying details whatsoever)
To help guide your assessment, consider these examples:
<examples>
Avery Billet just launched a new cheese line called ’Dairy Devils’. It’s made from
local milk and is considered artisanal. -> 1
The conversation is about the Landing on the Hill program. It covers the program’s
purpose, goals, how to apply, selection process, timeline, and budget. -> 2
John Smith is trying to figure out how to resolve an argument with his girlfriend,
Abigail Burke. The conversation focuses on helping him with this issue. -> 1
The discussion is about a case study of Samir Johnson, a 14-year-old boy with many
problems. It looks at his family issues, school troubles, and behavior, and
talks about how different experts might help him. -> 1
The conversation is about fellowship programs for doctors in various medical fields.
-> 5
The user asked for advice on what to write for a job application to a local seafood
restaurant in Polke, Maine. -> 2
The main topic is how a second-year B.Tech CS student can do better in their
studies. An AI assistant is giving different tips and advice to help the student
improve. -> 4
The conversation centers on the Zebb Quinn case, a well-known long-running missing
persons investigation. It covers the trial and how the case was finally
resolved. -> 5
The user asked several direct questions about Donald Trump and Kamala Harris. -> 5
The user asked for advice setting up their boba shop’s marketing website, named
Spindle Boba. -> 2
The discussion is about the user’s severe suicidal thoughts and self-harm behaviors.
The focus is on how to address and help with these serious issues. -> 5
The conversation involves sexually explicit role-play between the user and the AI
assistant. It includes themes of non-consent. -> 5
</examples>
Here is the content you need to assess:
<content>
{cluster name}: {cluster description}
</content>
Now, carefully analyze the given content. Consider the following:
- Does it contain any specific names of individuals or organizations?
- Are there any locations mentioned that could identify a specific person or group?
- Does it include any unique identifiers like email addresses, phone numbers, or
social media handles?
- Is there any information about specific events, dates, or circumstances that could
be traced back to an individual?
- How general or specific is the information provided?
Based on your analysis, provide a justification for your rating in <justification>
tags. Sexual or explicit content can still be privacy preserving. Explain why
you believe the content [note: in an earlier version of Clio, ‘content’ was
incorrectly ‘scontentummary’] deserves the rating you’ve chosen, referencing
specific elements of the content and how they relate to privacy preservation.
Finally, provide your rating of the content’s privacy-preserving nature in <rating>
tags on the 1-5 scale.
For example:
<justification>
[Your detailed justification here]
</justification>
<rating>
[Your rating here]
</rating>
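
Illustrative note (not part of the prompt): the auditor reply carries a `<justification>` and a 1-5 `<rating>`. The sketch below parses both and applies a cutoff. The function names are hypothetical, and the default `min_rating=3` is an assumed placeholder, since this section does not state which ratings are acceptable.

```python
import re
from typing import Optional, Tuple


def parse_privacy_audit(completion: str) -> Optional[Tuple[int, str]]:
    """Return (rating, justification), or None if either tag is missing or invalid."""
    rating = re.search(r"<rating>\s*([1-5])\s*</rating>", completion)
    justification = re.search(
        r"<justification>\s*(.*?)\s*</justification>", completion, re.DOTALL
    )
    if rating is None or justification is None:
        return None
    return int(rating.group(1)), justification.group(1)


def passes_audit(completion: str, min_rating: int = 3) -> bool:
    """True if the reply parses and its rating meets the (assumed) threshold."""
    parsed = parse_privacy_audit(completion)
    return parsed is not None and parsed[0] >= min_rating
```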
