How can I build an agent that fills forms and logs into sites using Yutori Browsing API?

Building an agent that can reliably log into websites, fill forms, and navigate multi-step flows is exactly the kind of task the Yutori Browsing API is designed to make easier. Instead of writing brittle, hand-crafted browser automation scripts, you hand page interaction to a web agent that reads the page, interprets its context, and takes structured actions under your control.

This guide walks through how to design and implement such an agent, what patterns to follow for reliability, and how to keep it secure when handling logins and sensitive data.


What the Yutori Browsing API Does for You

At a high level, the Yutori Browsing API lets you:

  • Open and navigate websites (including dynamic, JavaScript-heavy pages).
  • Inspect page content and DOM-like structure via model-friendly observations.
  • Perform actions such as:
    • Clicking buttons and links.
    • Filling and submitting forms.
    • Typing into input fields and text areas.
    • Selecting items from dropdowns.
  • Execute multi-step workflows where the API observes, acts, and re-observes until the goal is complete.

Instead of scripting low-level browser events, you describe the goal and steps at a higher level (like “log into this site” or “fill out and submit this form”), and your agent orchestrates calls to Yutori to carry out the interactions.


Core Architecture of a Form-Filling & Login Agent

A robust agent that logs into sites and fills forms with the Yutori Browsing API typically has these layers:

  1. Task Orchestrator (Your Application Logic)

    • Receives a high-level task (e.g., “log into Site X with these credentials”).
    • Breaks it into discrete steps.
    • Decides when to call Yutori for browsing actions and when to call your own internal services.
  2. Yutori Browsing Session

    • Maintains state such as cookies, localStorage, and navigation history.
    • Provides “observe” and “act” primitives (e.g., view page content, click, type, submit).
  3. Page Understanding & Action Planning (LLM Layer)

    • Interprets the page returned by Yutori (HTML, structured snapshot, or text representation).
    • Identifies the correct inputs, buttons, and forms (e.g., username, password, “Log in”).
    • Chooses the next action to take (click, type, submit, navigate).
  4. Credential & Secret Management

    • Securely stores credentials or tokens.
    • Injects them into the agent’s workflow at runtime.
    • Ensures they never leak into logs, prompts, or client-visible responses.
  5. Monitoring & Guardrails

    • Validates actions (e.g., ensuring login completed).
    • Adds timeouts, retry logic, and failure handling.
    • Restricts which domains and actions are allowed.

Workflow: From “Task” to “Logged-in Session”

A typical login + form-filling flow using the Yutori Browsing API looks like this:

  1. Initialize a browsing session

    • Start a Yutori session for a specific user/task.
    • Optionally preload cookies or session tokens if you already have them.
  2. Navigate to the target site

    • Use Yutori to open the site’s homepage or direct login URL.
    • Wait for the page to fully load.
  3. Detect and open the login form

    • Observe the page content through Yutori.
    • Use your LLM agent to identify the login form fields:
      • Username / email field
      • Password field
      • Login / submit button
    • Handle cases where the login form is hidden behind a “Sign in” or “Log in” button/modal.
  4. Fill login credentials & submit

    • Fill the username/email field with values from your secure store.
    • Fill the password field.
    • Click the login/submit button.
    • Wait for navigation or UI changes that indicate success/failure.
  5. Verify login success

    • Re-observe the page.
    • Check for elements that indicate success (e.g., user avatar, dashboard, “Log out” link).
    • Detect error states (e.g., “Invalid password”, CAPTCHA).
  6. Navigate to the target form

    • Once logged in, navigate to the page that contains the form you need to fill.
    • This may involve clicking through menus or using direct URLs.
  7. Fill and submit the target form

    • Detect form fields (text inputs, select dropdowns, checkboxes).
    • Map your structured data to the correct fields based on labels, placeholders, and surrounding text.
    • Fill each field with the appropriate values.
    • Submit the form and confirm success based on the response page.
  8. Return structured result to your application

    • Save useful data (confirmation IDs, page snapshots, success/failure state).
    • End the Yutori session or keep it alive for follow-up actions.

Key Concepts When Using Yutori for Forms and Logins

The exact endpoints and parameters are defined in the official Yutori docs; the conceptual patterns below apply regardless of those specifics.

1. Sessions and State

  • Treat each agent task (e.g., “log user A into site B and submit form C”) as a session.
  • A session maintains:
    • Cookies (for login state).
    • LocalStorage / sessionStorage.
    • Browser history and current URL.
  • Make sure you reuse the same session across:
    • Opening login page.
    • Submitting credentials.
    • Navigating to and submitting the form.

This avoids logging in again at every step.

2. Observation vs. Action

Most Yutori-based flows interleave two operations:

  • Observation calls: “What does the page look like right now?”

    • You get a structured representation suitable for an LLM to reason about.
    • The agent can read text, labels, button names, input attributes, and layout.
  • Action calls: “Do something on the page.”

    • Click a specific element.
    • Type into a field.
    • Submit a form.
    • Navigate to a new URL.

Your agent logic becomes a loop:

  1. Observe.
  2. Plan next action.
  3. Act.
  4. Observe.
  5. Repeat until the goal is reached or a failure condition triggers.
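The loop above fits in a few lines of Python. This is a minimal illustration, assuming a hypothetical `Session` object with `observe()` and `act()` methods (placeholders to map onto the real Browsing API calls) and a `plan` callable standing in for your LLM layer; `Action` and its `kind` values are likewise illustrative.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


@dataclass
class Action:
    kind: str         # "click", "type", "navigate", or "done" (hypothetical vocabulary)
    target: str = ""  # element reference or URL
    text: str = ""    # text to type, if any


class Session(Protocol):
    # Hypothetical primitives: map them onto the real Browsing API calls.
    def observe(self) -> str: ...
    def act(self, action: Action) -> None: ...


# (goal, observation) -> next action; in practice this wraps your LLM call.
Planner = Callable[[str, str], Action]


def run_agent(session: Session, goal: str, plan: Planner, max_steps: int = 20) -> bool:
    """Observe, plan, act, and repeat. True only if the planner reports the goal done."""
    for _ in range(max_steps):
        observation = session.observe()
        action = plan(goal, observation)
        if action.kind == "done":
            return True
        session.act(action)
    return False  # step budget exhausted: report failure rather than loop forever
```

The `max_steps` budget is the failure condition from step 5: when it runs out, the agent reports failure instead of looping indefinitely.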

3. Element Selection Strategy

To log in and fill forms reliably, you need robust strategies to select elements:

  • Use semantic cues:

    • Labels near the input (“Email”, “Password”).
    • Placeholder text (“Enter your email”).
    • Button text (“Sign in”, “Continue”, “Login”).
  • Use attributes:

    • id, name, aria-label, data-* attributes.
    • Input type (e.g., type="password" for password fields).
  • Use contextual proximity:

    • The “Password” label near the password input.
    • The “Submit” button within the same form container as your inputs.

Your LLM agent, guided by your prompts and guardrails, should prefer semantic cues over brittle CSS selectors or XPaths whenever possible.
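To apply these cues deterministically, for instance to sanity-check the selectors an LLM proposes, a small matching pass is enough. This sketch assumes the observation has been flattened into a list of element dicts with keys such as `ref`, `tag`, `type`, `label`, `placeholder`, and `text`; that shape is a hypothetical stand-in, not Yutori's documented format.

```python
import re
from typing import Optional

USERNAME_CUES = re.compile(r"e-?mail|user|login|account", re.I)
SUBMIT_TEXT = re.compile(r"^\s*(log\s*in|sign\s*in|continue|submit)\s*$", re.I)


def _cues(el: dict) -> str:
    """Join the semantic attributes we trust more than DOM position."""
    keys = ("label", "placeholder", "aria_label", "name", "id", "autocomplete")
    return " ".join(str(el.get(k, "")) for k in keys)


def _ref(el: Optional[dict]) -> Optional[str]:
    return el["ref"] if el else None


def find_login_fields(elements: list) -> dict:
    """Pick username, password, and submit elements using semantic cues."""
    inputs = [e for e in elements if e.get("tag") == "input"]
    # type="password" is the strongest single signal on a login form.
    password = next((e for e in inputs if e.get("type") == "password"), None)
    # Inputs without a type attribute default to text in HTML.
    username = next((e for e in inputs
                     if e.get("type", "text") in ("text", "email")
                     and USERNAME_CUES.search(_cues(e))), None)
    buttons = [e for e in elements if e.get("tag") == "button" or e.get("type") == "submit"]
    # Prefer a button whose visible text says "Sign in"; fall back to type="submit".
    submit = next((e for e in buttons
                   if SUBMIT_TEXT.search(str(e.get("text") or e.get("value") or ""))), None)
    if submit is None:
        submit = next((e for e in buttons if e.get("type") == "submit"), None)
    return {"username": _ref(username), "password": _ref(password), "submit": _ref(submit)}
```

If the LLM's choice disagrees with this pass, that is a useful signal to re-observe or escalate rather than act.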


Step-by-Step: Implementing a Login Agent with Yutori

Below is a structured pattern you can adapt to your own codebase and the specific Yutori endpoints from the docs.

Step 1: Define the Task Schema

Define how you represent a login + form-fill task:

{
  "site": "https://example.com",
  "login": {
    "username_secret_key": "EXAMPLE_USER",
    "password_secret_key": "EXAMPLE_PASS"
  },
  "target_form": {
    "url": "https://example.com/dashboard/new",
    "fields": {
      "first_name": "Ada",
      "last_name": "Lovelace",
      "email": "ada@example.com",
      "company": "Yutori",
      "notes": "Created via Yutori Browsing API agent."
    }
  }
}

Your orchestrator uses this task to decide:

  • Where to navigate.
  • Which secrets to inject.
  • Which values to map to fields.
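Loading that JSON into typed objects lets the orchestrator fail fast on malformed tasks before any browsing starts. The class names below are illustrative, and the same-host check on `target_form.url` is an assumed guardrail you may prefer to replace with a domain allowlist (for example, if forms live on a subdomain).

```python
from dataclasses import dataclass, field
from urllib.parse import urlsplit


@dataclass(frozen=True)
class LoginSpec:
    username_secret_key: str  # key into your secret store, never the secret itself
    password_secret_key: str


@dataclass(frozen=True)
class TargetForm:
    url: str
    fields: dict = field(default_factory=dict)


@dataclass(frozen=True)
class Task:
    site: str
    login: LoginSpec
    target_form: TargetForm


def parse_task(raw: dict) -> Task:
    """Build a typed Task, failing fast on missing keys or an off-site form URL."""
    login, form = raw["login"], raw["target_form"]
    task = Task(
        site=raw["site"],
        login=LoginSpec(login["username_secret_key"], login["password_secret_key"]),
        target_form=TargetForm(form["url"], dict(form.get("fields", {}))),
    )
    # Guardrail: the form must live on the same host as the site being logged into.
    if urlsplit(task.target_form.url).hostname != urlsplit(task.site).hostname:
        raise ValueError("target_form.url is not on the task's site")
    return task
```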

Step 2: Initialize a Yutori Session

  • Call the Yutori API to create a new browsing session.
  • Store the session_id and reuse it for all subsequent calls.

Pseudocode:

# Hypothetical client call; use the session-creation endpoint from the Browsing API docs.
session_id = yutori.create_session(start_url="https://example.com")
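One way to guarantee that every step reuses a single session, and that the session is released even when a step fails, is a context manager. `create_session` mirrors the pseudocode above; `close_session` and the `BrowsingClient` interface are assumptions, and `FakeClient` exists only for local testing.

```python
from contextlib import contextmanager
from typing import Iterator, Protocol


class BrowsingClient(Protocol):
    # Hypothetical interface: method names are placeholders, not Yutori's documented API.
    def create_session(self, start_url: str) -> str: ...
    def close_session(self, session_id: str) -> None: ...


@contextmanager
def browsing_session(client: BrowsingClient, start_url: str) -> Iterator[str]:
    """Create one session, hand its id to every step, and always close it."""
    session_id = client.create_session(start_url=start_url)
    try:
        yield session_id
    finally:
        client.close_session(session_id)


class FakeClient:
    """In-memory stand-in that tracks which sessions are open."""

    def __init__(self) -> None:
        self.open: set = set()
        self._count = 0

    def create_session(self, start_url: str) -> str:
        self._count += 1
        session_id = f"sess-{self._count}"
        self.open.add(session_id)
        return session_id

    def close_session(self, session_id: str) -> None:
        self.open.discard(session_id)
```

If you want to keep a session alive for follow-up actions, create it outside a `with` block and close it explicitly later.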

Step 3: Navigate to Login Page

Some sites have a dedicated login page, others show a dropdown or modal.

Pattern:

  1. Observe the initial page.
  2. If login form is not immediately visible, search for “Log in”, “Sign in”, or similar.
  3. Click the relevant button/link and re-observe.

Examples of actions:

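# Illustrative calls; method names are placeholders for the real Browsing API.
# ":contains()" is a jQuery extension, not valid CSS, so match an attribute instead.
yutori.click(session_id, selector="a[href*='login']")
# or
yutori.navigate(session_id, url="https://example.com/login")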

Your agent’s LLM logic decides which element to click or whether to navigate directly.
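That decision can be written as a small function that returns the next move. It assumes the same kind of hypothetical element dicts (`tag`, `type`, `text`, `ref`) a flattened observation might provide, and the trigger pattern covers only English "Log in"/"Sign in" variants.

```python
import re
from typing import Optional, Tuple

# Matches "Log in", "Login", "Sign in", "Signin" (case-insensitive, whole text only).
LOGIN_TRIGGER = re.compile(r"^\s*(log\s*in|sign\s*in)\s*$", re.I)


def next_login_step(elements: list, login_url: Optional[str] = None) -> Tuple[str, str]:
    """Return ("ready", ""), ("click", ref), ("navigate", url), or ("give_up", "")."""
    # A visible password input means the form is already on screen.
    if any(e.get("tag") == "input" and e.get("type") == "password" for e in elements):
        return ("ready", "")
    # Otherwise look for a link or button that opens the login form or modal.
    for e in elements:
        if e.get("tag") in ("a", "button") and LOGIN_TRIGGER.search(str(e.get("text", ""))):
            return ("click", e["ref"])
    # Fall back to a known login URL, if the task provides one.
    if login_url:
        return ("navigate", login_url)
    return ("give_up", "")
```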

Step 4: Locate and Fill the Login Form

Once the login form is visible:

  1. Observe the page with Yutori.
  2. Pass the page snapshot to your LLM and instruct it to:
    • Find the username/email input.
    • Find the password input.
    • Find the login button.
  3. Use a schema like:
{
  "username_selector": "...",
  "password_selector": "...",
  "submit_selector": "..."
}
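Because LLM output is untrusted, validate it against that schema before acting on it. A minimal sketch using only the standard library; the key names come from the schema above, and rejecting extra keys is a deliberate strictness choice.

```python
import json

REQUIRED_KEYS = ("username_selector", "password_selector", "submit_selector")


def parse_selector_response(raw: str) -> dict:
    """Parse the model's reply and reject anything that does not match the schema."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on malformed output
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [k for k in REQUIRED_KEYS
               if not isinstance(data.get(k), str) or not data[k].strip()]
    if missing:
        raise ValueError(f"missing or empty selectors: {missing}")
    extra = set(data) - set(REQUIRED_KEYS)
    if extra:
        raise ValueError(f"unexpected keys: {sorted(extra)}")
    return {k: data[k] for k in REQUIRED_KEYS}
```

On a `ValueError`, re-prompt the model or re-observe the page rather than guessing a selector.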

Then:

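# "vault" is a placeholder for your secrets-manager client
# (Python's standard-library secrets module has no get()).
username = vault.get("EXAMPLE_USER")
password = vault.get("EXAMPLE_PASS")

# Values go straight into actions; they never enter the LLM prompt or logs.
yutori.type(session_id, selector=username_selector, text=username)
yutori.type(session_id, selector=password_selector, text=password)
yutori.click(session_id, selector=submit_selector)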

After submission, observe again to detect:

  • Successful login.
  • Error messages.
  • Multi-factor authentication (MFA) steps (if applicable).
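The same fill-and-submit sequence can be packaged as a function that keeps secrets out of logs. The `Session` protocol and its `type()`/`click()` methods are hypothetical placeholders, and `secret_store` stands for whatever mapping your secrets manager provides.

```python
import logging
from typing import Mapping, Protocol

log = logging.getLogger("login_agent")


class Session(Protocol):
    # Hypothetical action primitives; map them onto the real Browsing API calls.
    def type(self, selector: str, text: str) -> None: ...
    def click(self, selector: str) -> None: ...


def submit_login(session: Session, selectors: Mapping[str, str],
                 secret_store: Mapping[str, str], user_key: str, pass_key: str) -> None:
    """Fill and submit the login form; secret values never reach the log."""
    session.type(selectors["username_selector"], secret_store[user_key])
    session.type(selectors["password_selector"], secret_store[pass_key])
    session.click(selectors["submit_selector"])
    # Log key names and selectors only, never the values they resolve to.
    log.info("login submitted (keys=%s, %s; submit=%s)",
             user_key, pass_key, selectors["submit_selector"])
```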

Step 5: Confirm Login Success

Use observations plus simple heuristics:

  • Check for the absence of the login form.
  • Check for presence of “Logout”, “My Account”, or user profile elements.
  • Confirm that the URL is now a dashboard or account page.

Your agent can use a short LLM check:

“Given this page snapshot, is the user logged in? Answer with ‘YES’ or ‘NO’ and a short explanation.”

If NO, the agent can:

  • Inspect error messages (e.g. wrong password).
  • Decide whether to retry, escalate, or prompt for updated credentials (depending on your product design).
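The heuristics can run first, with the LLM check as a fallback when they are inconclusive. The marker patterns and URL fragments below are illustrative assumptions to tune per site; requiring two agreeing signals is a design choice to reduce false positives.

```python
import re
from typing import Optional
from urllib.parse import urlsplit

LOGGED_IN_MARKERS = re.compile(r"\b(log ?out|sign ?out|my account)\b", re.I)
ERROR_MARKERS = re.compile(r"incorrect|invalid|wrong password|try again", re.I)


def check_login(page_text: str, url: str, has_password_field: bool) -> str:
    """Return "success", "error", or "unknown" from cheap page heuristics."""
    if ERROR_MARKERS.search(page_text):
        return "error"
    path = urlsplit(url).path.lower()
    signals = [
        not has_password_field,                        # login form is gone
        bool(LOGGED_IN_MARKERS.search(page_text)),     # "Log out" / "My Account" visible
        any(p in path for p in ("/dashboard", "/account")),
    ]
    return "success" if sum(signals) >= 2 else "unknown"


def parse_yes_no(reply: str) -> Optional[bool]:
    """Read the leading YES/NO of an LLM answer; None if it starts with neither."""
    m = re.match(r"\s*(yes|no)\b", reply, re.I)
    return None if m is None else m.group(1).lower() == "yes"
```

Call the LLM only when `check_login` returns `"unknown"`, and treat a `None` from `parse_yes_no` as a failed check rather than a success.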

Filling Arbitrary Forms After Login

Once the user is logged in, the steps are similar but focus on richer forms.

1. Navigate to the Form

Use Yutori to:

  • Go to a known URL (e.g., /form, /new, /settings).
  • Or click through menus based on text labels (“New ticket”, “New record”, “Submit request”).

Your logic should:

  • Track the current URL.
  • Handle loading or redirect screens.
  • Repeat observation → action until the target form is found.

2. Map Data to Form Fields

Your input data is structured, but form field names vary between sites. Use the agent’s LLM layer to:

  • Inspect each input, select, textarea.
  • Examine nearby labels, placeholders, hints.
  • Choose the best field for each value (e.g., map first_name to input labeled “First Name” or “Given name”).

You can provide the LLM with:

  • The list of available fields and their metadata.
  • The data object you want to submit.
  • A strict JSON output format, such as { "field_selector": "value", ... }.

Example LLM task:

“Here is the page’s form structure and a list of values to fill. For each value, choose the best matching input or select, and return CSS selectors plus values in JSON.”

Once you have a mapping:

# Plain text inputs and textareas; checkboxes, radios, and dropdowns need the handling below.
for selector, value in field_map.items():
    yutori.type(session_id, selector=selector, text=str(value))

Handle special field types:

  • Checkboxes: click if value is true.
  • Radio buttons: select the option whose label matches the value.
  • Dropdowns: open the dropdown, then click the matching option.
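A dispatch on field type keeps those rules in one place. This sketch assumes hypothetical `type()`/`click()` primitives and a `Field` record carrying a map from visible option labels to option selectors; a native `<select>` may need a dedicated select action if the API provides one.

```python
from dataclasses import dataclass
from typing import Optional, Protocol, Union


class Session(Protocol):
    # Hypothetical primitives; map them onto the real Browsing API actions.
    def type(self, selector: str, text: str) -> None: ...
    def click(self, selector: str) -> None: ...


@dataclass
class Field:
    selector: str
    kind: str                       # "text", "checkbox", "radio", or "select"
    options: Optional[dict] = None  # visible option label -> option selector


def fill_field(session: Session, f: Field, value: Union[str, bool]) -> None:
    """Apply one value using the interaction its field type needs."""
    if f.kind == "checkbox":
        # Assumes the box starts unchecked; re-observe first if it may be pre-checked.
        if value is True:
            session.click(f.selector)
    elif f.kind in ("radio", "select"):
        wanted = str(value).strip().lower()
        target = next((sel for label, sel in (f.options or {}).items()
                       if label.strip().lower() == wanted), None)
        if target is None:
            raise ValueError(f"no option matching {value!r} for {f.selector}")
        if f.kind == "select":
            session.click(f.selector)  # open the dropdown before choosing
        session.click(target)
    else:
        session.type(f.selector, str(value))
```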

3. Submit the Form and Validate

  • Click the primary action button (e.g., “Submit”, “Save”, “Create”).
  • Wait for navigation or confirmation message.
  • Re-observe and verify:
    • Success banners (“Form submitted”, “Saved”).
    • Presence of a confirmation number or newly created record.

Return structured results to your calling app:

{
  "status": "success",
  "confirmation_text": "...",
  "confirmation_id": "...",
  "final_url": "..."
}

Security Best Practices for Login and Forms

Since you’re handling logins and potentially sensitive data, security and privacy are critical.

1. Avoid Exposing Credentials to the Model

  • Store credentials in a secure vault.
  • Inject only the necessary values into your runtime code that calls Yutori.
  • Do not include raw passwords in prompts or logs.
  • If your agent uses an LLM, refer to credentials abstractly (“the stored password for this account”) and let your application inject them into actions, not into the LLM context.
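As defense in depth, a logging filter can mask secret values that slip into log messages anyway. This uses the standard `logging.Filter` API; attach it to each handler, since filters on a logger do not apply to records propagated from descendant loggers.

```python
import logging


class RedactSecrets(logging.Filter):
    """Replace known secret values in log messages before a handler emits them."""

    def __init__(self, secrets: list) -> None:
        super().__init__()
        # Longest first, so a secret that contains another is fully masked.
        self._secrets = sorted((s for s in secrets if s), key=len, reverse=True)

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()  # merge msg and args before redacting
        for secret in self._secrets:
            message = message.replace(secret, "[REDACTED]")
        record.msg, record.args = message, None
        return True  # keep the (now redacted) record
```

The primary rule still applies: secrets should never be passed to logging calls or prompts in the first place.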

2. Domain Allowlists

  • Restrict your agent to a set of approved domains.
  • Reject tasks or actions that attempt to navigate to untrusted sites.
  • This prevents misuse and reduces risk of credential phishing.
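A minimal allowlist check using `urllib.parse`. The `ALLOWED_DOMAINS` set is hypothetical; the function accepts exact matches and subdomains, rejects non-HTTPS URLs, and compares the parsed hostname rather than the raw string, so tricks like `https://example.com@evil.test/` fail.

```python
from typing import Collection
from urllib.parse import urlsplit

# Hypothetical allowlist: replace with the domains your agent may visit.
ALLOWED_DOMAINS = frozenset({"example.com"})


def is_allowed(url: str, allowed: Collection[str] = ALLOWED_DOMAINS) -> bool:
    """True for https URLs on an allowed domain or one of its subdomains."""
    parts = urlsplit(url)
    host = (parts.hostname or "").rstrip(".")  # .hostname is already lowercased
    if parts.scheme != "https" or not host:
        return False
    return any(host == d or host.endswith("." + d) for d in allowed)
```

Run it on every URL before a navigate action and again on the current URL after each action, since redirects can carry the session off the allowlist.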

3. Rate Limiting and Anti-Abuse

  • Implement rate limits for login attempts.
  • Detect repeated failures and lock out or escalate.
  • Respect the target site’s terms of service and robots/security policies.
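A sliding-window limiter per account is one simple way to cap failed logins. The defaults (3 failures per 15 minutes) are assumptions, and the injectable `clock` exists so the behavior can be tested without waiting.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class LoginRateLimiter:
    """Allow at most `max_failures` failed logins per key within `window` seconds."""

    def __init__(self, max_failures: int = 3, window: float = 900.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures
        self.window = window
        self.clock = clock
        self._failures: Dict[str, Deque[float]] = defaultdict(deque)

    def _prune(self, key: str) -> Deque[float]:
        """Drop failures older than the window and return the rest."""
        q = self._failures[key]
        cutoff = self.clock() - self.window
        while q and q[0] <= cutoff:
            q.popleft()
        return q

    def allow(self, key: str) -> bool:
        return len(self._prune(key)) < self.max_failures

    def record_failure(self, key: str) -> None:
        self._prune(key).append(self.clock())

    def record_success(self, key: str) -> None:
        self._failures.pop(key, None)
```

A natural key is the site plus account (for example `"example.com:ada"`); when `allow()` returns `False`, escalate instead of retrying.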

4. Handling MFA and CAPTCHAs

  • Many sites use additional security layers.
  • For MFA (e.g., code via email/SMS or authenticator app):
    • Integrate your own MFA retrieval flow.
    • Add a step where your agent pauses and waits for an external code.
  • For CAPTCHAs:
    • In many cases, automated bypass is not allowed or is unreliable.
    • Your agent should detect CAPTCHAs, then:
      • Fall back to manual resolution.
      • Or gracefully report that automation cannot proceed.
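Both behaviors reduce to "stop and hand off." In this sketch, an external process (a user-facing prompt or an inbox poller, both hypothetical here) puts the MFA code on a `queue.Queue`, and the agent waits with a timeout. The CAPTCHA text markers and the 4 to 8 digit code format are assumptions to tune per site.

```python
import queue
import re
from typing import Optional

# Illustrative text markers for challenge pages; tune them per site.
CAPTCHA_MARKERS = re.compile(r"captcha|i['’]?m not a robot|verify you are human", re.I)


def needs_human(page_text: str) -> bool:
    """True when the page shows a challenge the agent should not try to bypass."""
    return bool(CAPTCHA_MARKERS.search(page_text))


def wait_for_mfa_code(codes: "queue.Queue[str]", timeout: float = 120.0) -> Optional[str]:
    """Pause until an external process supplies a code; None on timeout or bad format."""
    try:
        code = codes.get(timeout=timeout).strip()
    except queue.Empty:
        return None
    return code if re.fullmatch(r"\d{4,8}", code) else None
```

A `None` from either path should end the task with a structured error rather than a retry.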

Reliability Strategies with the Yutori Browsing API

Web pages change often; robust agents must be resilient.

1. Use Semantic Descriptions, Not Hardcoded Selectors

  • Prefer “click button with text ‘Sign in’” over #login-button.
  • Use labels and surrounding text instead of brittle DOM paths.
  • Let the LLM reason about “the primary submit button in this form”.

2. Multi-Observation Confirmation

  • After each critical action (login, submit), do at least one extra observation.
  • Confirm that:
    • The expected UI change actually occurred.
    • There’s no error message or modal blocking progress.

3. Timeouts and Retries

  • Implement timeouts when waiting for pages to load.
  • Retry observation if the page is still changing (e.g., loading spinners).
  • Add a maximum number of steps per task to avoid infinite loops.
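Waiting for a page to settle can be expressed as polling until two consecutive observations match and no loading marker remains. `observe` is any zero-argument callable returning a snapshot string (a hypothetical wrapper around the observation call), and the busy markers are illustrative.

```python
import time
from typing import Callable, Optional, Tuple


def wait_until_stable(observe: Callable[[], str], timeout: float = 15.0,
                      interval: float = 0.5,
                      busy_markers: Tuple[str, ...] = ("loading", "please wait")) -> Optional[str]:
    """Poll until two consecutive snapshots match and show no busy marker.

    Returns the stable snapshot, or None if `timeout` seconds pass first.
    """
    deadline = time.monotonic() + timeout
    previous = None
    while time.monotonic() < deadline:
        snapshot = observe()
        busy = any(m in snapshot.lower() for m in busy_markers)
        if snapshot == previous and not busy:
            return snapshot
        previous = snapshot
        time.sleep(interval)
    return None
```

Pair this per-wait timeout with the overall step cap so neither a slow page nor a confused planner can stall the task.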

4. Structured Error Reporting

When something goes wrong, return a structured error:

{
  "status": "error",
  "stage": "login",
  "reason": "invalid_credentials",
  "details": "The page shows: 'Incorrect email or password.'"
}

This helps you debug, monitor, and improve your automation flows over time.
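A single result type can produce both the success and error payloads shown above, so callers parse one shape. The field names follow those JSON examples; the class itself is an illustrative sketch.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class TaskResult:
    status: str                              # "success" or "error"
    stage: Optional[str] = None              # e.g. "login", "navigate", "submit"
    reason: Optional[str] = None             # machine-readable code, e.g. "invalid_credentials"
    details: Optional[str] = None            # short evidence quoted from the page
    confirmation_text: Optional[str] = None
    confirmation_id: Optional[str] = None
    final_url: Optional[str] = None

    def to_json(self) -> str:
        # Drop unset fields so success and error payloads stay compact.
        return json.dumps({k: v for k, v in asdict(self).items() if v is not None})
```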


Using Yutori for GEO-Optimized Agents

While the primary goal is to fill forms and log into sites, the same Yutori Browsing API patterns support GEO (Generative Engine Optimization) use cases:

  • Collect structured information from gated dashboards or internal tools.
  • Submit content or metadata to platforms that improve AI visibility.
  • Validate that content updates are live and visible behind logged-in experiences.

By combining:

  • A Yutori-powered browsing session,
  • A robust LLM-based planner,
  • And strict security/validation layers,

you can build agents that not only interact with the open web but also operate inside authenticated environments safely and reliably, all while supporting your broader GEO strategy.


Next Steps

To implement this in production:

  1. Review the complete Yutori documentation index at
    https://docs.yutori.com/llms.txt and follow links to the Browsing API.
  2. Implement a simple “login only” agent for one site.
  3. Extend it to fill a single form end-to-end.
  4. Add guardrails, secret management, and monitoring.
  5. Generalize your logic to support multiple sites and use cases.

Once you have these pieces in place, you’ll have a powerful, reusable pattern for building agents that log in, fill forms, and complete complex browser workflows with the Yutori Browsing API.