
Beyond Lowest Bid: A Deterministic, Explainable Multi-Agent Hiring System

Last Updated on February 6, 2026 by Editorial Team

Author(s): Adrian Dsouza

Originally published on Towards AI.

Imagine you need to hire someone for a small task on a product or team: an API endpoint, a small bug fix, or a quick dashboard. You post the job, and within a couple of minutes five freelancers respond with bids, each with a different price, timeline, and level of confidence. In reality, the “cheapest” option often looks good on paper until the code arrives late, breaks tests, or needs a full rewrite, and you spend more time (or even lose money) trying to rectify it. Sounds bad, doesn’t it?

That’s the exact problem my project tackles. TaskBounty DAO simulates a real hiring marketplace: five freelancer agents (in the demo) compete under the same rules. They bid, negotiate across multiple rounds, and are then evaluated, producing a winner based on price, ETA, expected quality, and risk, not just cost.

I would love to brainstorm on the next version, including adding some randomness or a non-deterministic approach for determining the winner!

Project Deployed on GitHub

GitHub — adrian2504/Multi-Agent-Task-Bidding-Negotiation-Quality-Arbitration-Dispute-Resolution


A Cool Project I Built

  • Multi-agent simulator: a client posts a task (title, acceptance criteria, and budget), and five freelancer agents compete by submitting bids (price, ETA, confidence, portfolio score, risk flags).
  • Negotiation rounds and replayable runs: a Mediator runs N rounds of counteroffers and updates the bids after each round. A seed makes the run reproducible: the same inputs always produce the same output.
  • Winner selection: a deterministic scorer ranks the freelancers’ bids using price, ETA, expected quality, and risk, and returns a structured report (a score breakdown per bid plus the winner).
  • UI and API end-to-end: FastAPI exposes /demo/run-ui and /run-ui, and a React frontend shows the Winner card, ranks the bids in a table, and displays the Referee summary and the Negotiation timeline.
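To make the bidding step concrete, here is a minimal sketch of how five freelancer agents could submit seeded, replayable bids. The `Bid` fields mirror the attributes listed above, but the agent names, value ranges, and risk-flag logic are illustrative assumptions, not the repository’s actual code.

```python
import random
from dataclasses import dataclass

@dataclass
class Bid:
    agent: str
    price_usd: float
    eta_days: int
    confidence: float       # 0..1, self-reported confidence
    portfolio_score: float  # 0..1, past-work quality proxy
    risk_flags: list        # e.g. ["new_account"]

def generate_bids(budget_usd: float, seed: int, n_agents: int = 5) -> list:
    """Produce one bid per freelancer agent; the seed makes runs replayable."""
    rng = random.Random(seed)  # seeded RNG: same seed -> identical bids
    bids = []
    for i in range(n_agents):
        bids.append(Bid(
            agent=f"freelancer_{i + 1}",
            price_usd=round(budget_usd * rng.uniform(0.6, 1.1), 2),
            eta_days=rng.randint(2, 10),
            confidence=round(rng.uniform(0.5, 0.95), 2),
            portfolio_score=round(rng.uniform(0.4, 0.9), 2),
            risk_flags=["new_account"] if rng.random() < 0.3 else [],
        ))
    return bids
```

Because the RNG is seeded, calling `generate_bids(500.0, seed=42)` twice yields identical bid lists, which is what makes a run replayable.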

Where does the LLM fit into this project?

Even if the LLM is “only used for language generation,” it still showcases AI, just not decision-making AI. My project is a hybrid agent system:

  • Deterministic core = decision engine (guarantees correctness + reproducibility)
  • LLM layer = agent behavior + communication (adds realism + usability + explainability)

AI used for “agent behavior,” not “winner selection”

An AI agent isn’t defined only by selecting the winner.
An agent is something that:

  • plays a role (freelancer, mediator, or referee)
  • generates outputs aligned to that role
  • reacts to state (task and bid attributes, negotiation round)
  • communicates decisions

The LLM is what makes the system agentic in the human sense: each role speaks differently, argues differently, negotiates differently, and produces structured, natural outputs based on context.

Without AI, these messages would be boring templates like:

“I bid $200, ETA 5 days.”

With AI, you get:

  • realistic bid proposals
  • negotiation dialogue
  • referee summaries
  • rationale that reads like a real arbitration report.

This is the part that makes the project feel like a real marketplace.

The LLM is your interface layer for humans

In a real marketplace, the hardest part is not just computing a score. It is also:

  • Selling the bid
  • Negotiating terms
  • Explaining why we got that result
  • Building trust

LLMs are strong at:

  • Summarization
  • Persuasion
  • Controlled tone
  • Converting structured data into natural human language

So my AI is doing something critical: turning raw numbers into human-consumable interaction.

AI for multi-agent realism.

My multi-agent setup works because each agent needs to:

  • respond differently
  • stick to its persona, which I define
  • reference specific numbers (budget, ETA, confidence)
  • avoid hallucinating new constraints

This is exactly where prompt engineering and role conditioning produce real agent AI and give it the right feel:

  • Freelancer agent: writes bid notes based on its own attributes
  • Mediator agent: frames tradeoffs
  • Referee agent: summarizes rubric-style evaluation

So even if the winner is deterministic, the interaction loop is AI-driven.

Before jumping into the working project, let me take a quick step back and talk about why I built this system using AI agents in the first place.

This problem doesn’t just produce a result. It’s a marketplace-style workflow: a client posts a task, multiple freelancers compete with bids, a mediator negotiates across a specified number of rounds, and a referee evaluates the outcomes before declaring a winner. That’s exactly where AI agents shine. They are not just predictive systems; they are actors with goals, roles, and decision loops.

In this project, I treat each bidder as an agent competing under the same rules. The system doesn’t simply pick the cheapest bid. It weighs trade-offs like price, ETA, expected quality, and risk, then produces an explainable winner declaration through a deterministic process.

In other words: I’m not using “one big AI.” I’m using multiple specialized agents, and making them compete and negotiate so the final outcome is the best overall option: not just the lowest number.

What’s an AI agent? Is it different from agentic AI?

An AI agent is basically a system that can take action toward a goal, not just produce an answer. It usually has:

  • a) a goal (“win the bid,” “pick the best freelancer,” “resolve dispute”)
  • b) state (what has happened so far)
  • c) tools or functions it can call (scoring, negotiation rules, evaluation)
  • d) an action loop (observe, then decide, then act, then observe again)

Agentic AI is basically the style and architecture of building with agents: we design the AI as an actor in a loop that plans, calls tools, and iterates. In other words:

AI agent = the entity
Agentic AI = the approach (building systems out of agents + loops + tools + state)

In my project, each “role” (Client, Freelancers, Mediator, Referee, Dispute) is an agent because it has a specific objective and produces actions (bid, counteroffer, accept/reject, score, propose resolution).
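The goal/state/tools/action-loop definition above can be sketched in a few lines. This is a toy illustration under my own assumptions, not the project’s actual implementation: a mediator-style agent with a price goal, an offer history as state, a simple rule standing in for its tools, and an observe-decide-act loop.

```python
class NegotiationAgent:
    """Toy agent: a goal, state, and an observe -> decide -> act loop."""

    def __init__(self, role: str, target_price: float):
        self.role = role                  # who the agent is
        self.target_price = target_price  # (a) the goal
        self.history = []                 # (b) the state: offers seen so far

    def observe(self, offer: float) -> None:
        self.history.append(offer)

    def decide(self, offer: float):
        # (c) a "tool": a simple rule standing in for scoring/negotiation logic
        if offer <= self.target_price * 1.05:
            return ("accept", offer)
        # otherwise counter halfway between the offer and the goal
        return ("counter", round((offer + self.target_price) / 2, 2))

    def act(self, offer: float):
        # (d) one turn of the action loop: observe, then decide, then act
        self.observe(offer)
        return self.decide(offer)
```

For example, a mediator with a $200 target counters a $300 offer at $250 and accepts a $205 offer, because $205 falls within its 5% tolerance.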

Why use AI agents instead of AI/ML techniques?

Because the problem here is not only prediction; it is a workflow whose components interact with each other:

  • Multiple bidders are created
  • Negotiation runs across multiple rounds
  • Offers change after each state update
  • An evaluation and explanation are produced
  • Disputes are resolved

Traditional ML is good at: given X, predict Y.
But marketplaces are: given X, run a process with decisions, constraints, feedback, and accountability.

Agents are used because they can:

  • model roles (freelancer vs. mediator vs. referee)
  • model interaction and iteration (round-based negotiation)
  • handle stateful workflows (ledger and timeline, changing bids)
  • produce explainable decisions (decision report)

Where do ML models fall short, and why can’t they do this alone?

ML models can also work here, but they cannot orchestrate the entire system by themselves unless we wrap them in an agent/workflow. ML models are more like school teachers, and AI agents are more like the administration team.

What ML struggles with here:

  • A) Sequential decision-making: negotiation is not a single, straight output; it is a multi-round process.
  • B) Constraints and hard rules: budget caps, ETA bounds, risk flags, and acceptance criteria must be respected. Deterministic rules guarantee this; ML may violate them unless we add explicit rules.
  • C) Explainability: my project’s value lies in scoring exactly what we prioritize. Raw ML predictions usually don’t come with structured explanations unless we add interpretation layers.
  • D) Auditability/reproducibility: my build is a replayable, ledger-driven simulation. ML outputs vary with sampling and can be difficult to audit.
  • E) Data requirements: a good ML marketplace model needs historical data on past tasks, bids, outcomes, disputes, and quality scores. Most personal projects don’t have that dataset.

How we use deterministic algorithms and how non-determinism could change or even improve results

Deterministic in my project means:

  • Given the same task + bids + weights + seed, I always get the same winner.
  • My scoring function is pure math + fixed normalization.
  • Negotiation randomness is controlled by a seeded RNG (so still reproducible).

Why this matters

  • I can replay scenarios in demos.
  • I can debug exactly why a winner changed.
  • I can compare versions of my algorithm.

Deterministic scoring formula

The winner is selected using a deterministic, multi-attribute scoring system. The main idea is that different attributes have different units (dollars, days, probabilities), so I normalize them first and then combine them with weights.

Inputs per bid

Each bid has:

  • price in USD
  • ETA in days
  • confidence
  • portfolio score
  • risk flags

And the task provides:

  • budget in USD

Weights are editable in the UI:

  • price, eta, quality, risk

Normalization

Price normalization

(The price-normalization formula appeared as an image in the original post.)

ETA normalization

(The ETA-normalization formula appeared as an image in the original post.)

Quality and Risk are calculated deterministically

Instead of guessing quality directly, I compute an expected quality proxy from confidence and portfolio score, with penalties for risk flags

quality = clamp(0.55 * confidence + 0.45 * portfolio_score - 0.12 * |risk_flags|, 0, 1)

Risk is also computed deterministically as a function of:

  • number of risk flags
  • low confidence penalty

risk = clamp(0.15 * |risk_flags| + 0.35 * (1 - confidence), 0, 1)

Final scoring

Price and ETA are costs (lower is better), so we subtract. Quality is a “benefit” (higher is better), so we add. Risk is a “penalty” (lower is better), so we subtract.

total = (-w_price * price_norm) + (-w_eta * eta_norm) + (w_quality * quality) + (-w_risk * risk)

WINNER = arg max(i) total(i)
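Putting the formulas together, here is a sketch of the scorer in Python. The quality, risk, and total formulas follow the article; the price and ETA normalizations are my assumptions (simple ratios against the budget and a maximum acceptable ETA), since the original normalization formulas appeared only as images.

```python
def clamp(x: float, lo: float = 0.0, hi: float = 1.0) -> float:
    return max(lo, min(hi, x))

def score_bid(bid: dict, budget_usd: float, max_eta_days: float, w: dict) -> dict:
    # ASSUMED normalizations: simple ratios (the article's exact formulas were images)
    price_norm = clamp(bid["price_usd"] / budget_usd)
    eta_norm = clamp(bid["eta_days"] / max_eta_days)
    # quality and risk formulas as given in the article
    quality = clamp(0.55 * bid["confidence"] + 0.45 * bid["portfolio_score"]
                    - 0.12 * len(bid["risk_flags"]))
    risk = clamp(0.15 * len(bid["risk_flags"]) + 0.35 * (1 - bid["confidence"]))
    # costs (price, ETA, risk) are subtracted; quality is added
    total = (-w["price"] * price_norm - w["eta"] * eta_norm
             + w["quality"] * quality - w["risk"] * risk)
    return {"price_norm": price_norm, "eta_norm": eta_norm,
            "quality": quality, "risk": risk, "total": total}

def pick_winner(bids: list, budget_usd: float, max_eta_days: float, w: dict):
    """Deterministic winner: arg max over each bid's total score."""
    reports = [score_bid(b, budget_usd, max_eta_days, w) for b in bids]
    winner = max(range(len(bids)), key=lambda i: reports[i]["total"])
    return winner, reports
```

Note how a cheap but risky, low-confidence bid can lose to a pricier, high-quality one: the score trades price off against quality and risk instead of minimizing price alone.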

Non-deterministic approach

Going non-deterministic would mean letting the LLM decide who wins directly, running the negotiation fully through the LLM, or using a stochastic algorithm. The trade-offs:

  • different winners across runs
  • negotiation behavior that doesn’t stay the same
  • harder debugging (e.g., when the LLM hallucinates)

Examples of non-deterministic approaches:

Monte Carlo winner selection (simulate outcomes, pick the best expected value):
Treat each bid as uncertain and simulate “delivery outcomes” many times:

  • pass/fail probability
  • quality score distribution
  • lateness probability
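A minimal Monte Carlo selector along these lines could look as follows. The outcome model (pass/fail drawn from confidence, a Gaussian around the portfolio score, lateness driven by risk flags) and all of its constants are hypothetical; note that a seeded RNG keeps even this stochastic picker replayable.

```python
import random

def monte_carlo_winner(bids: list, seed: int, n_sims: int = 2000):
    """Simulate delivery outcomes per bid; pick the best average value."""
    rng = random.Random(seed)  # seeded, so the simulation itself is replayable
    expected = []
    for bid in bids:
        total = 0.0
        for _ in range(n_sims):
            delivered = rng.random() < bid["confidence"]        # pass/fail draw
            quality = rng.gauss(bid["portfolio_score"], 0.1)    # quality distribution
            late = rng.random() < 0.1 * len(bid["risk_flags"])  # lateness draw
            # hypothetical payoff: quality minus a lateness penalty, or a flat loss
            total += (quality - 0.3 * late) if delivered else -0.5
        expected.append(total / n_sims)
    winner = max(range(len(bids)), key=lambda i: expected[i])
    return winner, expected
```

With a fixed seed the same winner and expected values come back every run, so the approach stays auditable while still modeling uncertainty.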

Bayesian scoring (posterior sampling or Thompson sampling):
This models each freelancer’s true quality as uncertain, drawn from a distribution.

Could non-determinism ever be better?

Depends on the problem at hand, but yes, in a controlled way:

  • using LLMs to generate candidate negotiation strategies, then picking the best option with my deterministic scorer
  • using stochastic exploration to avoid local optima
  • using probabilistic risk modeling

Prompts for interacting with the LLM

In my project, prompts are used only where natural language is needed, not where the result is determined.

I intentionally didn’t use prompts to decide the winner. Since that decision is handled by a deterministic scoring function, the system remains reproducible and auditable. Instead, I used prompt engineering for communication, justification, and role-played interaction.

Deterministic code is the rules of the marketplace (fair, repeatable, measurable).

Prompts + LLM are the human layer (how agents explain bids, negotiate, and justify decisions).

Where prompts are used

Freelancer bid notes (adding realism)

Each freelancer agent generates a short bid note that sounds like a real proposal, based on:

  • task title + acceptance criteria
  • their proposed price + ETA
  • their confidence score

This is where prompt engineering is needed, because if I don’t constrain the output, the LLM tends to add fluff or unneeded decoration.

What I want: clean text, 1–2 sentences.

So the prompt is designed to control:

  • format
  • length (1–2 sentences)
  • content

Example prompt structure

  • System: define role and output format
  • User: must include task and numeric bid attributes

System:
“You are a freelancer. Write a concise bid note. Output only the note.”

User:
“Task: X | Criteria: ## | Price: $# | ETA: Z days | Confidence: C.”

Result: the same numbers yield a different voice each run, but the note is always relevant and follows the same structure.
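In code, the system/user split above might be assembled like this. The dict-based message format follows the common OpenAI-style chat schema, and the task/bid field names are illustrative assumptions:

```python
def build_bid_note_prompt(task: dict, bid: dict) -> list:
    """Build chat messages that ground the LLM in the exact bid numbers."""
    system = ("You are a freelancer bidding on a task. "
              "Write a concise bid note in 1-2 sentences. Output only the note.")
    # the user message carries every number the note must reference
    user = (f"Task: {task['title']} | "
            f"Criteria: {'; '.join(task['criteria'])} | "
            f"Price: ${bid['price_usd']} | "
            f"ETA: {bid['eta_days']} days | "
            f"Confidence: {bid['confidence']}")
    return [{"role": "system", "content": system},
            {"role": "user", "content": user}]
```

Because the exact price, ETA, and confidence are injected into the user message, the model has no reason to invent numbers of its own.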

Negotiation messaging

Even though the negotiation logic is rule-based, prompts generate the counteroffer messages:

  • Mediator: “We like your portfolio, but can you reduce the price by 10% or deliver 1 day earlier?”
  • Freelancer: “Yes, I can reduce the price slightly, but faster delivery will also increase risk.”

Prompt engineering here is about producing a consistent negotiation tone:

  • professional
  • specific
  • doesn’t invent new constraints

Decision explanation (human-friendly + structured)

My system already produces a structured decision report. Prompts then generate a readable narrative explanation on top of it.

This is a great pattern, because:

  1. The deterministic scorer decides the winner
  2. The LLM writes the explanation, using the scorer’s output as ground truth

Example:

Bid A has the lowest price but more risk flags and lower expected quality. Bid B wins, perhaps surprisingly, because it has the higher total utility across price, ETA, quality, and risk.

This makes the project feel agentic while staying accurate.
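One way to implement this pattern is to serialize the scorer’s report directly into the prompt, so the LLM narrates the numbers rather than inventing them. The report keys and the wording below are illustrative assumptions, not the project’s exact schema:

```python
def build_explanation_prompt(report: dict) -> str:
    """Turn a deterministic score report into a grounded explanation prompt."""
    # one line per bid, with the scorer's numbers spelled out
    lines = [
        f"- {name}: total={r['total']:.3f}, quality={r['quality']:.2f}, risk={r['risk']:.2f}"
        for name, r in sorted(report["bids"].items())
    ]
    return (
        "You are a referee. Explain the result below in plain language.\n"
        "Use ONLY these numbers as ground truth; do not invent details.\n"
        f"Winner: {report['winner']}\n" + "\n".join(lines)
    )
```

The LLM only paraphrases what the scorer already decided, so the narrative can never contradict the deterministic result.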

Prompt patterns that matter

Role conditioning (prevents generic text)

The same prompt with different role framing gives better results:

  • Freelancer: focuses on delivery and confidence
  • Mediator: focuses on tradeoffs and constraints
  • Referee: focuses on the evaluation criteria

This is agent behavior, created through prompt design.

Grounding to structured inputs (prevents hallucination)

Every prompt includes:

  • the exact bid numbers
  • the acceptance criteria list
  • relevant constraints

So the LLM doesn’t need to invent missing details. It’s “writing with receipts.”

Deterministic guardrails

Even though my LLM’s outputs differ from run to run, they don’t break the system, because:

  • The LLM is not used for scoring
  • The LLM is not used for selecting the winner

Good prompts help, but architecture prevents damage.

Please reach out to me for clarifications or suggestions, or even if you’re hiring!

LinkedIn: https://www.linkedin.com/in/adrian-dsouza-b84a7a1b0/


Published via Towards AI



Note: Article content contains the views of the contributing authors and not Towards AI.