Configuration Reference

Archi deployments are configured via YAML files passed to the CLI with --config. Any fields not specified are populated from the base template at src/cli/templates/base-config.yaml.

Tip: Start from one of the example configs in examples/deployments/ and customize from there.

Top-Level Fields

`name`

Type: string (required)

Name of your deployment. Used for container naming and directory structure.

name: my_deployment

`global`

Global settings shared across all services.

Key	Type	Default	Description
`DATA_PATH`	string	`/root/data/`	Path for persisted data inside containers
`ACCOUNTS_PATH`	string	`/root/.accounts/`	Path for uploader/grader account data
`ACCEPTED_FILES`	list	See below	File extensions allowed for manual uploads
`LOGGING.input_output_filename`	string	`chain_input_output.log`	Pipeline I/O log filename
`verbosity`	int	`3`	Default logging level for services (0-4)

Default accepted files: .pdf, .md, .txt, .docx, .html, .htm, .json, .yaml, .yml, .py, .js, .ts, .jsx, .tsx, .java, .go, .rs, .c, .cpp, .h, .sh

`services`

Configuration for containerized services. Each service has its own subsection.

`services.chat_app`

The main chat interface.

Key	Type	Default	Description
`agent_class`	string	`CMSCompOpsAgent`	Pipeline class to run
`agents_dir`	string	—	Path to agent markdown files
`default_provider`	string	`local`	Default LLM provider
`default_model`	string	`llama3.2`	Default model
`client_timeout_seconds`	number	`600`	Chat request/stream timeout in seconds (sent to frontend as ms)
`tools`	dict	`{}`	Agent-class-specific tool settings (for example `tools.monit.url`)
`trained_on`	string	—	Description shown in the chat UI
`hostname`	string	`localhost`	Public hostname for the chat interface
`port`	int	`7861`	Internal container port
`external_port`	int	`7861`	Host-mapped port
`host`	string	`0.0.0.0`	Network binding
`num_responses_until_feedback`	int	`3`	Responses before prompting for feedback
`auth.enabled`	bool	`false`	Enable authentication

Provider Configuration

services:
  chat_app:
    providers:
      local:
        enabled: true
        base_url: http://localhost:11434
        mode: ollama              # or openai_compat
        default_model: llama3.2
        models:
          - llama3.2
      gemini:
        enabled: true

`services.postgres`

PostgreSQL database settings.

Key	Type	Default	Description
`host`	string	`postgres`	Database hostname
`port`	int	`5432`	Database port
`user`	string	`archi`	Database user
`database`	string	`archi-db`	Database name

`services.vectorstore`

Key	Type	Default	Description
`backend`	string	`postgres`	Vector store backend (only `postgres` supported)

`services.data_manager`

Key	Type	Default	Description
`port`	int	`7871`	Internal port
`external_port`	int	`7871`	Host-mapped port
`host`	string	`0.0.0.0`	Network binding
`enabled`	bool	`true`	Enable data manager service

`services.grafana`

Key	Type	Default	Description
`port`	int	`3000`	Grafana port
`external_port`	int	`3000`	Host-mapped port

`services.grader_app`

Key	Type	Default	Description
`port`	int	`7861`	Internal port
`external_port`	int	`7862`	Host-mapped port
`provider`	string	—	Provider for grading pipelines
`model`	string	—	Model for grading pipelines
`num_problems`	int	—	Number of problems (must match rubric files)
`local_rubric_dir`	string	—	Path to rubric files
`local_users_csv_dir`	string	—	Path to users CSV

Other Services

services.piazza: Requires network_id, agent_class, provider, model
services.mattermost: Requires update_time
services.redmine_mailbox: Requires url, project, redmine_update_time, mailbox_update_time
services.benchmarking: See Benchmarking

`data_manager`

Controls data ingestion, vectorstore behaviour, and retrieval settings.

Core Settings

Key	Type	Default	Description
`collection_name`	string	`default_collection`	Vector store collection name
`embedding_name`	string	`OpenAIEmbeddings`	Embedding backend
`chunk_size`	int	`1000`	Max characters per text chunk
`chunk_overlap`	int	`0`	Overlapping characters between chunks
`parallel_workers`	int	`32`	Parallel ingestion workers
`reset_collection`	bool	`true`	Wipe collection on startup
`distance_metric`	string	`cosine`	Similarity metric: `cosine`, `l2`, `ip`

Retrieval Settings

Key	Type	Default	Description
`retrievers.hybrid_retriever.num_documents_to_retrieve`	int	`5`	Top-k documents per query
`retrievers.hybrid_retriever.bm25_weight`	float	`0.6`	BM25 keyword score weight
`retrievers.hybrid_retriever.semantic_weight`	float	`0.4`	Semantic similarity weight
`stemming.enabled`	bool	`false`	Enable Porter Stemmer for improved matching

Note: use_hybrid_search is a dynamic runtime setting (managed via the configuration API), not a YAML config key.

Sources

data_manager:
  sources:
    links:
      input_lists:
        - miscellanea.list
      scraper:
        reset_data: true
        verify_urls: false
        enable_warnings: false
      selenium_scraper:
        enabled: false
    git:
      enabled: false
    sso:
      enabled: false
    jira:
      url: https://jira.example.com
      projects: []
      anonymize_data: true
      cutoff_date: null
    redmine:
      url: https://redmine.example.com
      project: null
      anonymize_data: true

The visible flag on any source (sources.<name>.visible) controls whether content appears in chat citations (default: true).

Embedding Configuration

data_manager:
  embedding_name: OpenAIEmbeddings
  embedding_class_map:
    OpenAIEmbeddings:
      class: OpenAIEmbeddings
      kwargs:
        model: text-embedding-3-small
      similarity_score_reference: 10

See Models & Providers for all embedding options.

Anonymizer

data_manager:
  utils:
    anonymizer:
      nlp_model: en_core_web_sm
      excluded_words: []
      greeting_patterns: []
      signoff_patterns: []
      email_pattern: '[\w\.-]+@[\w\.-]+\.\w+'
      username_pattern: '\[~[^\]]+\]'

Agent Configuration Model

Archi no longer uses a top-level archi: block in standard deployment YAML.

Agent behavior is defined by:

services.chat_app.agent_class: which pipeline class runs (for example CMSCompOpsAgent)
services.chat_app.agents_dir: where agent spec markdown files live
agent specs (*.md): selected tool subset (tools) and system prompt body
services.chat_app.tools: optional agent-class-specific tool settings

Example:

services:
  chat_app:
    agent_class: CMSCompOpsAgent
    agents_dir: examples/agents
    tools:
      monit:
        url: https://monit-grafana.cern.ch

See Agents & Tools for agent spec format and tool selection.

Complete Example

name: my_deployment

global:
  DATA_PATH: "/root/data/"
  ACCEPTED_FILES: [".txt", ".pdf", ".md"]
  verbosity: 3

services:
  chat_app:
    agent_class: CMSCompOpsAgent
    agents_dir: examples/agents
    default_provider: local
    default_model: llama3.2
    trained_on: "Course documentation"
    hostname: "example.mit.edu"
    external_port: 7861
    providers:
      local:
        enabled: true
        base_url: http://localhost:11434
        mode: ollama
        models:
          - llama3.2
  postgres:
    port: 5432
    database: archi-db
  vectorstore:
    backend: postgres

data_manager:
  sources:
    links:
      input_lists:
        - examples/deployments/basic-gpu/miscellanea.list
      scraper:
        reset_data: true
        verify_urls: false
  embedding_name: OpenAIEmbeddings
  chunk_size: 1000
  chunk_overlap: 0

Tip: For the full base template with all defaults, see src/cli/templates/base-config.yaml in the repository.