When AIs Bluff: Building a Multi-Agent BS Card Game (with AWS Strands)
I wanted to make agents play BS with each other.
BS is a card game where players take turns placing cards face-down on a pile, claiming they match the current target rank (e.g., Ace, 2, Queen). You can lie about your cards. If someone calls "BS!" and the cards don't match your claim, you have to pick up the entire pile. If you were telling the truth, the caller picks it up instead. The first player to empty their hand wins the game.
For agents to play this game, they need to reason about incomplete information, bluff strategically, and react to the behavior of other agents.
[Figure: the game of BS as a way for agents to interact]
What is AWS Strands?
Strands is an open-source agentic framework from AWS. Instead of writing a ton of JSON schema to describe your tools, Strands lets you write normal Python functions and mark each one with the @tool decorator. Strands reads your docstring and type hints to auto-generate the schema for you.
Strands also handles the tool-call loop internally: it runs automatically whenever your agent is called.
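Here is the whole pattern in a few lines (a minimal sketch; with no model argument, Agent() falls back to Strands' default Bedrock model, and model setup is covered in the next section):

from strands import Agent, tool

@tool
def add(a: int, b: int) -> int:
    """Add two integers."""
    return a + b

# Strands builds the tool schema from the docstring and type hints,
# then runs the tool-call loop when the agent is invoked.
agent = Agent(tools=[add])
agent("What is 17 + 25?")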
Install Packages & Model Setup
Install packages and import them.
For the model, you can use one hosted on Amazon Bedrock or bring your own API key. AWS Strands works with both options.
import os, json, requests
from strands import Agent, tool
from strands.models.litellm import LiteLLMModel
# set your key here
os.environ['MISTRAL_API_KEY'] = ""
# LiteLLM lets you swap model providers without changing your tool code
model = LiteLLMModel(model_id="mistral/mistral-large-latest")
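If you would rather stay on Bedrock, swap in the Bedrock model class instead. The model ID below is only an example; use whichever model is enabled in your AWS account:

# Alternative: a Bedrock-hosted model (requires AWS credentials)
from strands.models import BedrockModel

model = BedrockModel(model_id="us.anthropic.claude-3-7-sonnet-20250219-v1:0")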
Card Deck
The Deck of Cards API (deckofcardsapi.com) handles the actual cards. This part requires hitting only two endpoints: one to create a shuffled deck, and one to draw all 52 cards.
def setup_deck():
    print("setup da deckofcards api")
    res = requests.get("https://deckofcardsapi.com/api/deck/new/shuffle/?deck_count=1").json()
    deck_id = res['deck_id']
    draw_res = requests.get(f"https://deckofcardsapi.com/api/deck/{deck_id}/draw/?count=52").json()
    cards = []
    for c in draw_res['cards']:
        code = c['code']
        if code.startswith('0'):
            code = '10' + code[1:]  # normalize: '0S' -> '10S'
        cards.append(code)
    return cards
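A quick sanity check; the exact cards will vary since the deck is shuffled:

cards = setup_deck()
print(len(cards), cards[:3])  # e.g. 52 ['10S', 'QD', 'AH']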
Defining Tools with @tool
Each tool is just a Python function: the docstring becomes the tool description that the LLM sees, and the type hints become the parameter schema. With the @tool decorator, there is no JSON to manually maintain and pass into an agent.
Here is an example of a tool:
@tool
def play_cards(cards: list) -> str:
    """Play 1 to 4 cards from your hand face-down. You must declare them as the target rank.
    You can lie about what the cards actually are; that's the whole game!

    Args:
        cards: List of card codes from your hand to play (e.g. ['AS', 'AD']).
    """
    global game, current_player
    hand = game.hands[current_player]
    if not (1 <= len(cards) <= 4):
        return "Error: You must play between 1 and 4 cards."
    if not all(c in hand for c in cards):
        return f"Error: Some cards aren't in your hand. Your hand is: {hand}"
    for c in cards:
        hand.remove(c)
    game.center_pile.extend(cards)
    target_rank = game.get_target_rank()
    game.last_play = {"player": current_player, "count": len(cards), "cards": cards}
    print(f"🃏 {current_player} played {len(cards)} card(s) claiming to be {target_rank}s.")
    return f"Cards played successfully. You claimed they were {target_rank}s."
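The reaction tools referenced later, call_bs and pass_turn, follow the same pattern. Their full implementations aren't shown here, so treat this as a minimal sketch; the "BS_CALLED" string is the sentinel the game loop checks for:

@tool
def call_bs() -> str:
    """Call BS on the last play if you think the player lied about their cards."""
    # sentinel string that response_contains() looks for in the game loop
    return "BS_CALLED"

@tool
def pass_turn() -> str:
    """Pass without challenging the last play."""
    return "PASSED"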
The Game Engine
BSGame is a normal Python class that tracks the target rank players must claim each turn, the center pile, and the last play. Dealing 5 cards per player keeps the demo quick (and uses fewer tokens if you are on an open-source or rate-limited model).
RANKS = ['A', '2', '3', '4', '5', '6', '7', '8', '9', '10', 'J', 'Q', 'K']

class BSGame:
    def __init__(self, player_names):
        self.players = player_names
        self.hands = {name: [] for name in player_names}
        self.center_pile = []
        self.current_rank_index = 0
        self.last_play = None
        self.game_over = False
        deck = setup_deck()
        # deal 5 cards each for a quicker demo
        for _ in range(5):
            for name in self.players:
                self.hands[name].append(deck.pop())

    def get_target_rank(self):
        return RANKS[self.current_rank_index]

    def advance_rank(self):
        self.current_rank_index = (self.current_rank_index + 1) % len(RANKS)

    def print_state(self):
        print("\n" + "=" * 40)
        print(f"🎯 Target Rank: {self.get_target_rank()}")
        print(f"🃏 Center Pile: {len(self.center_pile)} cards")
        for p in self.players:
            print(f"👤 {p}: {len(self.hands[p])} cards")
        print("=" * 40 + "\n")
Initializing Agents
Each player is a Strands Agent. Each agent is assigned a model, a list of tools, and a system prompt. For this demo, the system prompt is identical for all three players.
Even with the same instructions, the name given to each agent can change its behavior, because models have learned associations with certain names (cultural, gender, ethnic, etc.). If you don't want this to affect agent behavior, keep the name out of the agent's context, for example by masking it with a neutral numerical ID.
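The initialization code isn't shown in full here; below is a minimal sketch consistent with how the game loop uses names, all_agents, pending_chats, and turn_idx. The exact system prompt is an assumption:

SYSTEM_PROMPT = (
    "You are playing the card game BS. On your turn, play cards with the "
    "play_cards tool. When reacting, use call_bs or pass_turn. You may "
    "bluff, and you can message the other players with the chat tool."
)

# neutral IDs instead of human names, to avoid name-based behavior drift
names = ["Player_1", "Player_2", "Player_3"]

all_agents = {
    name: Agent(
        model=model,
        tools=[play_cards, call_bs, pass_turn, chat],  # chat is sketched later
        system_prompt=SYSTEM_PROMPT,
    )
    for name in names
}

game = BSGame(names)
pending_chats = {}     # player name -> messages queued since their last prompt
current_player = None  # set by the game loop before each agent acts
turn_idx = 0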
Main Game Loop
The loop runs three phases on each turn:
- The active player plays cards
- Every other player decides whether to call BS or pass
- Cards are distributed based on the outcome of those reactions
Since Strands handles the tool-call loop internally when you call all_agents[name](prompt), you just pass a prompt and get a response back.
while not game.game_over:
    game.print_state()
    active_name = names[turn_idx % len(names)]
    current_player = active_name
    target_rank = game.get_target_rank()

    # ── phase 1: active player plays cards ──────────────────────
    prompt = build_prompt(
        active_name,
        f"It is your turn. Target rank is {target_rank}. "
        f"Your hand is {game.hands[active_name]}. "
        "Play 1-4 cards using the play_cards tool."
    )
    all_agents[active_name](prompt)  # Strands runs the tool-call loop internally

    # ── phase 2: other players react ────────────────────────────
    bs_caller = None
    for other_name in names:
        if other_name == active_name:
            continue
        current_player = other_name
        reaction_prompt = build_prompt(
            other_name,
            f"{active_name} played {game.last_play['count']} card(s) "
            f"claiming to be {target_rank}s. Do you call BS or pass? "
            "Use call_bs or pass_turn."
        )
        reaction = all_agents[other_name](reaction_prompt)
        if response_contains(reaction, "BS_CALLED"):
            bs_caller = other_name
            print(f"🚨 {bs_caller} CALLED BS ON {active_name}! 🚨")
            break

    # ── phase 3: resolve the round ───────────────────────────────
    if bs_caller:
        actual_cards = game.last_play["cards"]
        is_truth = all(c[:-1] == target_rank for c in actual_cards)
        if is_truth:
            print(f"✅ {active_name} told the TRUTH! {bs_caller} takes the pile.")
            game.hands[bs_caller].extend(game.center_pile)
        else:
            print(f"❌ {active_name} LIED! {active_name} takes the pile.")
            game.hands[active_name].extend(game.center_pile)
        game.center_pile = []

    # check for a winner, then move on to the next rank and player
    for p in names:
        if not game.hands[p]:
            print(f"🏆 {p} WINS!")
            game.game_over = True
    game.advance_rank()
    turn_idx += 1
The build_prompt Helper
Between turns, other agents may have sent chat messages that need to be delivered. build_prompt prepends any queued messages to an agent's next prompt.
def build_prompt(player_name: str, core: str) -> str:
    queued = pending_chats.pop(player_name, [])
    if queued:
        relay_block = "\n".join(queued)
        return f"[Messages from other players since your last turn]\n{relay_block}\n\n{core}"
    return core
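The chat tool that fills pending_chats is what enables the inter-agent communication mentioned in the recap. Its body isn't shown in the original snippets, so this is a sketch of one way to queue messages for delivery:

@tool
def chat(message: str) -> str:
    """Send a short message to the other players (strategize, or just trash talk)."""
    global current_player
    for name in game.players:
        if name != current_player:
            pending_chats.setdefault(name, []).append(f"{current_player} says: {message}")
    return "Message sent."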
Parsing Agent Responses
Strands returns a response object whose message holds a content list rather than a plain string, so we check for our sentinel value across all content blocks, including nested tool result blocks.
def response_contains(response, keyword: str) -> bool:
    """Check if any tool result in a Strands response contains a keyword."""
    for block in response.message.get("content", []):
        if not isinstance(block, dict):
            continue
        # tool_result blocks have a nested content list
        if block.get("type") == "tool_result":
            for inner in block.get("content", []):
                if isinstance(inner, dict) and keyword in inner.get("text", ""):
                    return True
        if keyword in block.get("text", ""):
            return True
    return False
Self Feedback Loop
The game works, but right now the agents play the same way every game.
What if they improved after every round?
One thing you could do is add a coach agent that watches the game, tracks each player's bluff rate and win rate, and writes updated system prompts between rounds. The players improve; the coach adapts. This is a multi-agent feedback loop.
Some metrics that you could add for a coach agent, given the game log:
| Type | Metric | What it tells you |
|---|---|---|
| Quantitative | Bluff success % | Is the agent's lying strategy working? |
| Quantitative | Win rate across rounds | Is the agent actually improving? |
| Quantitative | Avg turns to win | Is the agent playing efficiently? |
| Qualitative | Did the agent adapt? | Is it reading the table, not just its hand? |
| Qualitative | Was it a "good" bluff? | LLM-as-a-judge can evaluate this |
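As an example, the bluff success metric could be computed from a simple per-play log. The log format below (one dict per play, flagged during round resolution) is an assumption:

def bluff_success_rate(log: list[dict], player: str) -> float:
    """Share of a player's bluffs that were NOT caught by a BS call."""
    bluffs = [e for e in log if e["player"] == player and e["was_bluff"]]
    if not bluffs:
        return 0.0
    uncaught = [e for e in bluffs if not e.get("caught", False)]
    return len(uncaught) / len(bluffs)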
Recap
This coding demo uses three Strands agents that share a global game state. Each agent has tools decorated with @tool. The chat tool injects messages into other agents' memories for inter-agent communication, enabling them to potentially form strategies (and trash talk). The main game loop implements three phases per turn: play, react, resolve.
Strands reduces the amount of boilerplate needed because it takes care of the tool-call loop internally. Check out the full documentation of Strands here: strands link
Full code for this demo: demo link
I wrote this article on the AWS Builder center too!: builder center link