
Beyond Lowest Bid: A Deterministic, Explainable Multi-Agent Hiring System

Last Updated on February 6, 2026 by Editorial Team

Author(s): Adrian Dsouza

Originally published on Towards AI.

Imagine you need to hire someone for a small task on a product or team: an API endpoint, a small bug fix, or a quick dashboard. You post the job, and within a couple of minutes five freelancers respond with bids, each with a different price, timeline, and level of confidence. In reality, the “cheapest” option often looks good on paper until the code arrives late, breaks tests, or needs a full rewrite, and you spend more time (or even lose money) trying to rectify it. Sounds bad, doesn’t it?

That’s the exact problem my project tackles. TaskBounty DAO simulates a real hiring marketplace: five freelancer agents (in the demo) compete under the same rules. They bid, negotiate across multiple rounds, and are then evaluated, producing a winner based on price, ETA, expected quality, and risk, not just cost.

I would love to brainstorm on the next version, including adding some randomness or a non-deterministic approach for determining the winner!

Project Deployed on GitHub

GitHub — adrian2504/Multi-Agent-Task-Bidding-Negotiation-Quality-Arbitration-Dispute-Resolution


A Cool Project I Built

  • Multi-agent simulator: a client posts a task (title, acceptance criteria, and budget), and five freelancer agents compete by submitting bids (price, ETA, confidence, portfolio score, risk flags).
  • Negotiation rounds and replayable runs: a Mediator runs N rounds of counteroffers and updates the bids after each round. A seed makes the run reproducible: the same inputs always produce the same output.
  • Winner selection: a deterministic scorer ranks the freelancers’ bids using price, ETA, expected quality, and risk, and returns a structured report (a score breakdown per bid plus the winner).
  • UI and API end-to-end: FastAPI exposes /demo/run-ui and /run-ui, and a React frontend shows the Winner card, ranks the bids in a table, and displays the Referee summary and the Negotiation timeline.
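To make the bidding step concrete, here is a minimal sketch of how five freelancer agents could submit seeded, replayable bids. The `Bid` fields mirror the attributes listed above, but the agent names, value ranges, and risk-flag logic are illustrative assumptions, not the repository’s actual code.

```python
import random
from dataclasses import dataclass

@dataclass
class Bid:
    agent: str
    price_usd: float
    eta_days: int
    confidence: float       # 0..1, self-reported confidence
    portfolio_score: float  # 0..1, past-work quality proxy
    risk_flags: list        # e.g. ["new_account"]

def generate_bids(budget_usd: float, seed: int, n_agents: int = 5) -> list:
    """Produce one bid per freelancer agent; the seed makes runs replayable."""
    rng = random.Random(seed)  # seeded RNG: same seed -> identical bids
    bids = []
    for i in range(n_agents):
        bids.append(Bid(
            agent=f"freelancer_{i + 1}",
            price_usd=round(budget_usd * rng.uniform(0.6, 1.1), 2),
            eta_days=rng.randint(2, 10),
            confidence=round(rng.uniform(0.5, 0.95), 2),
            portfolio_score=round(rng.uniform(0.4, 0.9), 2),
            risk_flags=["new_account"] if rng.random() < 0.3 else [],
        ))
    return bids
```

Because the RNG is seeded, calling `generate_bids(500.0, seed=42)` twice yields identical bid lists, which is what makes a run replayable.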

Where does the LLM fit into this project?

Even if the LLM is “only used for language generation,” it still showcases AI, just not decision-making AI. My project is a hybrid agent system:

  • Deterministic core = decision engine (guarantees correctness + reproducibility)
  • LLM layer = agent behavior + communication (adds realism + usability + explainability)

AI used for “agent behavior,” not “winner selection”

An AI agent isn’t defined only by selecting the winner.
An agent is something that:

  • plays a role (freelancer, mediator, or referee)
  • generates outputs aligned to that role
  • reacts to state (task and bid attributes, negotiation round)
  • communicates decisions

The LLM is what makes the system agentic in the human sense: each role speaks differently, argues differently, negotiates differently, and produces structured, natural outputs based on context.

Without AI, these messages would be boring templates like:

“I bid $200, ETA 5 days.”

With AI, you get:

  • realistic bid proposals
  • negotiation dialogue
  • referee summaries
  • rationale that reads like a real arbitration report.

This is the part that makes the project feel like a real marketplace.

The LLM is your interface layer for humans

In a real marketplace, the hardest part is not just computing a score. It is also:

  • Selling the bid
  • Negotiating terms
  • Explaining why we got that result
  • Building trust

LLMs are strong at:

  • Summarization
  • Persuasion
  • Controlled tone
  • Converting structured data into natural human language

So my AI is doing something critical: turning raw numbers into human-consumable interaction.

AI for multi-agent realism.

My multi-agent setup works because each agent needs to:

  • respond differently
  • stick to its persona, which I define
  • reference specific numbers (budget, ETA, confidence)
  • avoid hallucinating new constraints

This is exactly where prompt engineering and role conditioning produce real agent AI and give it the right feel:

  • Freelancer agent: writes bid notes based on its own attributes
  • Mediator agent: frames tradeoffs
  • Referee agent: summarizes rubric-style evaluation

So even if the winner is deterministic, the interaction loop is AI-driven.

Before jumping into the working project, let me take a quick step back and talk about why I built this system using AI agents in the first place.

This problem doesn’t just produce a result. It’s a marketplace-style workflow: a client posts a task, multiple freelancers compete with bids, a mediator negotiates across a specified number of rounds, and a referee evaluates the outcomes before declaring a winner. That’s exactly where AI agents shine. They are not just predictive systems; they are actors with goals, roles, and decision loops.

In this project, I treat each bidder as an agent competing under the same rules. The system doesn’t simply pick the cheapest bid. It weighs trade-offs like price, ETA, expected quality, and risk, then produces an explainable winner declaration through a deterministic process.

In other words: I’m not using “one big AI.” I’m using multiple specialized agents, and making them compete and negotiate so the final outcome is the best overall option: not just the lowest number.

What’s an AI agent? Is it different from agentic AI?

An AI agent is basically a system that can take action toward a goal, not just produce an answer. It usually has:

  • a) a goal (“win the bid,” “pick the best freelancer,” “resolve dispute”)
  • b) state (what has happened so far)
  • c) tools or functions it can call (scoring, negotiation rules, evaluation)
  • d) an action loop (observe, then decide, then act, then observe again)

Agentic AI is basically the style and architecture of building with agents: we design the AI as an actor in a loop that plans, calls tools, and iterates. In other words:

AI agent = the entity
Agentic AI = the approach (building systems out of agents + loops + tools + state)

In my project, each “role” (Client, Freelancers, Mediator, Referee, Dispute) is an agent because it has a specific objective and produces actions (bid, counteroffer, accept/reject, score, propose resolution).
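The goal/state/tools/action-loop definition above can be sketched in a few lines. This is a toy illustration under my own assumptions, not the project’s actual implementation: a mediator-style agent with a price goal, an offer history as state, a simple rule standing in for its tools, and an observe-decide-act loop.

```python
class NegotiationAgent:
    """Toy agent: a goal, state, and an observe -> decide -> act loop."""

    def __init__(self, role: str, target_price: float):
        self.role = role                  # who the agent is
        self.target_price = target_price  # (a) the goal
        self.history = []                 # (b) the state: offers seen so far

    def observe(self, offer: float) -> None:
        self.history.append(offer)

    def decide(self, offer: float):
        # (c) a "tool": a simple rule standing in for scoring/negotiation logic
        if offer <= self.target_price * 1.05:
            return ("accept", offer)
        # otherwise counter halfway between the offer and the goal
        return ("counter", round((offer + self.target_price) / 2, 2))

    def act(self, offer: float):
        # (d) one turn of the action loop: observe, then decide, then act
        self.observe(offer)
        return self.decide(offer)
```

For example, a mediator with a $200 target counters a $300 offer at $250 and accepts a $205 offer, because $205 falls within its 5% tolerance.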

Why use AI agents instead of AI/ML techniques?

Because the problem here is not only prediction; it is a workflow whose components interact with each other:

  • Multiple bidders are created
  • Negotiation runs across multiple rounds
  • Offers change after each state update
  • An evaluation and explanation are produced
  • Disputes are resolved

Traditional ML is good at: given X, predict Y.
But marketplaces are: given X, run a process with decisions, constraints, feedback, and accountability.

Agents are used because they can:

  • model roles (freelancer vs. mediator vs. referee)
  • model interaction and iteration (round-based negotiation)
  • handle stateful workflows (ledger and timeline, changing bids)
  • produce explainable decisions (decision report)

Where do ML models fall short, and why can’t they do this alone?

ML models can also work here, but they cannot orchestrate the entire system by themselves unless we wrap them in an agent/workflow. ML models are more like school teachers, and AI agents are more like the administration team.

What ML struggles with here:

  • A) Sequential decision-making: negotiation is not a single, straight output; it is a multi-round process.
  • B) Constraints and hard rules: budget caps, ETA bounds, risk flags, and acceptance criteria must be respected. Deterministic rules guarantee this; ML may violate them unless we add explicit rules.
  • C) Explainability: my project’s value lies in scoring exactly what we prioritize. Raw ML predictions usually don’t come with structured explanations unless we add interpretation layers.
  • D) Auditability/reproducibility: my build is a replayable, ledger-driven simulation. ML outputs vary with sampling and can be difficult to audit.
  • E) Data requirements: a good ML marketplace model needs historical data on past tasks, bids, outcomes, disputes, and quality scores. Most personal projects don’t have that dataset.

How we use deterministic algorithms and how non-determinism could change or even improve results

Deterministic in my project means:

  • Given the same task + bids + weights + seed, I always get the same winner.
  • My scoring function is pure math + fixed normalization.
  • Negotiation randomness is controlled by a seeded RNG (so still reproducible).

Why this matters

  • I can replay scenarios in demos.
  • I can debug exactly why a winner changed.
  • I can compare versions of my algorithm.

Deterministic scoring formula

The winner is selected using a deterministic, multi-attribute scoring system. The main idea is that different attributes have different units (dollars, days, probabilities), so I normalize them first and then combine them with weights.

Inputs per bid

Each bid has:

  • price in USD
  • ETA in days
  • confidence
  • portfolio score
  • risk flags

And the task provides:

  • budget in USD

Weights are editable in the UI:

  • price, eta, quality, risk

Normalization

Price normalization

(The price-normalization formula appeared as an image in the original post.)

ETA normalization

(The ETA-normalization formula appeared as an image in the original post.)

Quality and Risk are calculated deterministically

Instead of guessing quality directly, I compute an expected quality proxy from confidence and portfolio score, with penalties for risk flags

quality = clamp(0.55 * confidence + 0.45 * portfolio_score - 0.12 * |risk_flags|, 0, 1)

Risk is also computed deterministically as a function of:

  • number of risk flags
  • low confidence penalty

risk = clamp(0.15 * |risk_flags| + 0.35 * (1 - confidence), 0, 1)

Final scoring

Price and ETA are costs (lower is better), so we subtract. Quality is a “benefit” (higher is better), so we add. Risk is a “penalty” (lower is better), so we subtract.

total = (-w_price * price_norm) + (-w_eta * eta_norm) + (w_quality * quality) + (-w_risk * risk)

WINNER = arg max(i) total(i)
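Putting the formulas together, here is a sketch of the scorer in Python. The quality, risk, and total formulas follow the article; the price and ETA normalizations are my assumptions (simple ratios against the budget and a maximum acceptable ETA), since the original normalization formulas appeared only as images.

```python
def clamp(x: float, lo: float = 0.0, hi: float = 1.0) -> float:
    return max(lo, min(hi, x))

def score_bid(bid: dict, budget_usd: float, max_eta_days: float, w: dict) -> dict:
    # ASSUMED normalizations: simple ratios (the article's exact formulas were images)
    price_norm = clamp(bid["price_usd"] / budget_usd)
    eta_norm = clamp(bid["eta_days"] / max_eta_days)
    # quality and risk formulas as given in the article
    quality = clamp(0.55 * bid["confidence"] + 0.45 * bid["portfolio_score"]
                    - 0.12 * len(bid["risk_flags"]))
    risk = clamp(0.15 * len(bid["risk_flags"]) + 0.35 * (1 - bid["confidence"]))
    # costs (price, ETA, risk) are subtracted; quality is added
    total = (-w["price"] * price_norm - w["eta"] * eta_norm
             + w["quality"] * quality - w["risk"] * risk)
    return {"price_norm": price_norm, "eta_norm": eta_norm,
            "quality": quality, "risk": risk, "total": total}

def pick_winner(bids: list, budget_usd: float, max_eta_days: float, w: dict):
    """Deterministic winner: arg max over each bid's total score."""
    reports = [score_bid(b, budget_usd, max_eta_days, w) for b in bids]
    winner = max(range(len(bids)), key=lambda i: reports[i]["total"])
    return winner, reports
```

Note how a cheap but risky, low-confidence bid can lose to a pricier, high-quality one: the score trades price off against quality and risk instead of minimizing price alone.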

Non-deterministic approach

Going non-deterministic would mean letting the LLM decide who wins directly, running the negotiation fully through the LLM, or using a stochastic algorithm. The trade-offs:

  • different winners across runs
  • negotiation behavior that doesn’t stay the same
  • harder debugging (e.g., when the LLM hallucinates)

Examples of non-deterministic approaches:

Monte Carlo winner selection (simulate outcomes, pick the best expected value):
Treat each bid as uncertain and simulate “delivery outcomes” many times:

  • pass/fail probability
  • quality score distribution
  • lateness probability
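A minimal Monte Carlo selector along these lines could look as follows. The outcome model (pass/fail drawn from confidence, a Gaussian around the portfolio score, lateness driven by risk flags) and all of its constants are hypothetical; note that a seeded RNG keeps even this stochastic picker replayable.

```python
import random

def monte_carlo_winner(bids: list, seed: int, n_sims: int = 2000):
    """Simulate delivery outcomes per bid; pick the best average value."""
    rng = random.Random(seed)  # seeded, so the simulation itself is replayable
    expected = []
    for bid in bids:
        total = 0.0
        for _ in range(n_sims):
            delivered = rng.random() < bid["confidence"]        # pass/fail draw
            quality = rng.gauss(bid["portfolio_score"], 0.1)    # quality distribution
            late = rng.random() < 0.1 * len(bid["risk_flags"])  # lateness draw
            # hypothetical payoff: quality minus a lateness penalty, or a flat loss
            total += (quality - 0.3 * late) if delivered else -0.5
        expected.append(total / n_sims)
    winner = max(range(len(bids)), key=lambda i: expected[i])
    return winner, expected
```

With a fixed seed the same winner and expected values come back every run, so the approach stays auditable while still modeling uncertainty.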

Bayesian scoring (posterior sampling or Thompson sampling):
This models each freelancer’s true quality as uncertain, drawn from a distribution.

Could non-determinism ever be better?

Depends on the problem at hand, but yes, in a controlled way:

  • using LLMs to generate candidate negotiation strategies, then picking the best option with my deterministic scorer
  • using stochastic exploration to avoid local optima
  • using probabilistic risk modeling

Prompts for interacting with the LLM

In my project, prompts are used only where natural language is needed, not where the result is determined.

I intentionally didn’t use prompts to decide the winner. Since that decision is handled by a deterministic scoring function, the system remains reproducible and auditable. Instead, I used prompt engineering for communication, justification, and role-played interaction.

Deterministic code is the rules of the marketplace (fair, repeatable, measurable).

Prompts + LLM are the human layer (how agents explain bids, negotiate, and justify decisions).

Where prompts are used

Freelancer bid notes (adding realism)

Each freelancer agent generates a short bid note that sounds like a real proposal, based on:

  • task title + acceptance criteria
  • their proposed price + ETA
  • their confidence score

This is where prompt engineering is needed, because if I don’t constrain the output, the LLM tends to add fluff or unneeded decoration.

What I want: clean text, 1–2 sentences.

So the prompt is designed to control:

  • format
  • length (1–2 sentences)
  • content

Example prompt structure

  • System: define role and output format
  • User: must include task and numeric bid attributes

System:
“You are a freelancer. Write a concise bid note. Output only the note.”

User:
“Task: X | Criteria: ## | Price: $# | ETA: Z days | Confidence: C.”

Result: the same numbers yield a different voice each run, but the note is always relevant and follows the same structure.
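In code, the system/user split above might be assembled like this. The dict-based message format follows the common OpenAI-style chat schema, and the task/bid field names are illustrative assumptions:

```python
def build_bid_note_prompt(task: dict, bid: dict) -> list:
    """Build chat messages that ground the LLM in the exact bid numbers."""
    system = ("You are a freelancer bidding on a task. "
              "Write a concise bid note in 1-2 sentences. Output only the note.")
    # the user message carries every number the note must reference
    user = (f"Task: {task['title']} | "
            f"Criteria: {'; '.join(task['criteria'])} | "
            f"Price: ${bid['price_usd']} | "
            f"ETA: {bid['eta_days']} days | "
            f"Confidence: {bid['confidence']}")
    return [{"role": "system", "content": system},
            {"role": "user", "content": user}]
```

Because the exact price, ETA, and confidence are injected into the user message, the model has no reason to invent numbers of its own.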

Negotiation messaging

Even though the negotiation logic is rule-based, prompts generate the counteroffer messages:

  • Mediator: “We like your portfolio, but can you reduce the price by 10% or deliver 1 day earlier?”
  • Freelancer: “Yes, I can reduce the price slightly, but faster delivery will also increase risk.”

Prompt engineering here is about producing a consistent negotiation tone:

  • professional
  • specific
  • doesn’t invent new constraints

Decision explanation (human-friendly + structured)

My system already produces a structured decision report. Prompts then generate a readable narrative explanation on top of it.

This is a great pattern, because:

  1. The deterministic scorer decides the winner
  2. The LLM writes the explanation, using the scorer’s output as ground truth

Example:

Bid A has the lowest price but more risk flags and lower expected quality. Bid B wins, perhaps surprisingly, because it has the higher total utility across price, ETA, quality, and risk.

This makes the project feel agentic while staying accurate.
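One way to implement this pattern is to serialize the scorer’s report directly into the prompt, so the LLM narrates the numbers rather than inventing them. The report keys and the wording below are illustrative assumptions, not the project’s exact schema:

```python
def build_explanation_prompt(report: dict) -> str:
    """Turn a deterministic score report into a grounded explanation prompt."""
    # one line per bid, with the scorer's numbers spelled out
    lines = [
        f"- {name}: total={r['total']:.3f}, quality={r['quality']:.2f}, risk={r['risk']:.2f}"
        for name, r in sorted(report["bids"].items())
    ]
    return (
        "You are a referee. Explain the result below in plain language.\n"
        "Use ONLY these numbers as ground truth; do not invent details.\n"
        f"Winner: {report['winner']}\n" + "\n".join(lines)
    )
```

The LLM only paraphrases what the scorer already decided, so the narrative can never contradict the deterministic result.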

Prompt patterns that matter

Role conditioning (prevents generic text)

The same prompt with different role framing gives better results:

  • Freelancer: focuses on delivery and confidence
  • Mediator: focuses on tradeoffs and constraints
  • Referee: focuses on the evaluation criteria

This is agent behavior, created through prompt design.

Grounding to structured inputs (prevents hallucination)

Every prompt includes:

  • the exact bid numbers
  • the acceptance criteria list
  • relevant constraints

So the LLM doesn’t need to invent missing details. It’s “writing with receipts.”

Deterministic guardrails

Even though my LLM’s outputs differ from run to run, they don’t break the system, because:

  • The LLM is not used for scoring
  • The LLM is not used for selecting the winner

Good prompts help, but architecture prevents damage.

Please reach out to me for clarifications or suggestions, or even if you’re hiring!

LinkedIn: https://www.linkedin.com/in/adrian-dsouza-b84a7a1b0/


Published via Towards AI



Note: Article content contains the views of the contributing authors and not Towards AI.