When AIs Bluff: Building a Multi-Agent BS Card Game (with AWS Strands)

I wanted to make agents play BS with each other.

BS is a card game where players take turns placing cards face-down on a pile, claiming they match the current target rank (which cycles A, 2, 3, ... K). You can lie about your cards. If someone calls "BS!" and the cards don't match your claim, you have to pick up the entire pile. If you were telling the truth, the caller picks it up. The first player to empty their hand wins the game.

To play this game, agents need to reason about incomplete information, bluff strategically, and react to the behavior of other agents.

[Architecture diagram: three player_agents, each with its own system_prompt and tools, share the goal "win a game of BS". A chatting tool allows the agents to interact; the result of each turn feeds back into the game state, and the game terminates when an agent has an empty hand.]

What is AWS Strands?

Strands is an open-source agentic framework from AWS. Instead of writing a ton of JSON schema to describe your tools, Strands lets you write normal Python functions and declare them as tools with the @tool decorator. Strands reads your docstring and type hints to auto-generate the schema for you.

Strands also handles the tool-call loop internally, which runs automatically when your agent is called.
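To see roughly what the @tool decorator extracts, here is a stdlib-only sketch (my own illustration, not Strands' actual code) that derives a tool schema from a function's docstring and type hints:

```python
import inspect
from typing import get_type_hints

def schema_from_function(fn):
    """Sketch of deriving a tool schema from a plain Python function,
    using the same signals @tool reads: the docstring and type hints."""
    hints = get_type_hints(fn)
    hints.pop("return", None)  # only parameters go into the schema
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn),
        "parameters": {name: t.__name__ for name, t in hints.items()},
    }

def draw_cards(count: int) -> str:
    """Draw a number of cards from the deck."""
    return f"drew {count}"

# schema_from_function(draw_cards)
# -> {'name': 'draw_cards',
#     'description': 'Draw a number of cards from the deck.',
#     'parameters': {'count': 'int'}}
```

The real framework produces a richer JSON schema, but the idea is the same: the function itself is the source of truth, so there is no separate schema file to keep in sync.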

Install Packages & Model Setup

Install packages and import them.

For your model, you can use one hosted on Amazon Bedrock or bring your own provider API key; Strands works with both.

import os, json, requests

from strands import Agent, tool
from strands.models.litellm import LiteLLMModel

# set your key
os.environ['MISTRAL_API_KEY'] = ""  #@param

# LiteLLM lets you swap model providers without changing your tool code
model = LiteLLMModel(model_id="mistral/mistral-large-latest")

Card Deck

The Deck of Cards API (deckofcardsapi.com) will be used to handle the actual cards. This part requires hitting only 2 endpoints: one to create a shuffled deck, and one to draw all 52 cards.

def setup_deck():
    print("setting up the deckofcards api")
    res = requests.get("https://deckofcardsapi.com/api/deck/new/shuffle/?deck_count=1").json()
    deck_id = res['deck_id']
    draw_res = requests.get(f"https://deckofcardsapi.com/api/deck/{deck_id}/draw/?count=52").json()

    cards = []
    for c in draw_res['cards']:
        code = c['code']
        if code.startswith('0'):
            code = '10' + code[1:]  # normalize: '0S' -> '10S'
        cards.append(code)
    return cards
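If you want to avoid the network call while developing (or the API is down), the same 52 card codes can be built locally. This local_deck helper is my own addition, not part of the Deck of Cards API:

```python
import random

def local_deck(seed=None):
    """Build and shuffle the 52 standard card codes locally (e.g. 'AS', '10H').

    Uses the same code format as the Deck of Cards API after normalization,
    so it can stand in for setup_deck() offline.
    """
    ranks = ['A', '2', '3', '4', '5', '6', '7', '8', '9', '10', 'J', 'Q', 'K']
    suits = ['S', 'H', 'D', 'C']
    cards = [rank + suit for rank in ranks for suit in suits]
    random.Random(seed).shuffle(cards)  # seed for reproducible test games
    return cards
```

Passing a fixed seed gives reproducible games, which is handy when comparing agent strategies across runs.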

Defining Tools with @tool

Each tool is just a Python function. The docstring becomes the tool description that the LLM sees, and the type hints become the parameter schema. By using the @tool decorator, there is no JSON to manually maintain and pass into an agent.

Here is an example of a tool:

@tool
def play_cards(cards: list[str]) -> str:
    """Play 1 to 4 cards from your hand face-down. You must declare them as the target rank.
    You can lie about what the cards actually are; that's the whole game!

    Args:
        cards: List of card codes from your hand to play (e.g. ['AS', 'AD']).
    """
    global game, current_player

    hand = game.hands[current_player]

    if not (1 <= len(cards) <= 4):
        return "Error: You must play between 1 and 4 cards."
    if not all(c in hand for c in cards):
        return f"Error: Some cards aren't in your hand. Your hand is: {hand}"

    for c in cards:
        hand.remove(c)
    game.center_pile.extend(cards)

    target_rank = game.get_target_rank()
    game.last_play = {"player": current_player, "count": len(cards), "cards": cards}

    print(f"๐Ÿƒ {current_player} played {len(cards)} card(s) claiming to be {target_rank}s.")
    return f"Cards played successfully. You claimed they were {target_rank}s."
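The game loop later expects two more tools, call_bs and pass_turn, which are not shown above. In this demo they can be as simple as returning a sentinel string that the loop scans for. The wording of the returns is my own, but the BS_CALLED sentinel must match what response_contains looks for; the tool stand-in below exists only so this sketch runs without Strands installed:

```python
def tool(fn):
    # Stand-in for strands' @tool so this sketch runs standalone.
    # In the real demo, import it: from strands import tool
    return fn

@tool
def call_bs() -> str:
    """Accuse the player who just played of lying about their cards."""
    return "BS_CALLED: You accused the last player of lying."

@tool
def pass_turn() -> str:
    """Accept the last play without challenging it."""
    return "PASSED: You did not challenge the play."
```

Because tool results flow back into the agent's response, the loop can detect a BS call just by searching the response for the sentinel.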

The Game Engine

BSGame is a normal Python class that tracks the target rank players must claim each turn, the center pile, and the last play. Dealing 5 cards per player keeps the demo quick (and burns fewer tokens if you are using an open-source or rate-limited model).

RANKS = ['A', '2', '3', '4', '5', '6', '7', '8', '9', '10', 'J', 'Q', 'K']

class BSGame:
    def __init__(self, player_names):
        self.players = player_names
        self.hands = {name: [] for name in player_names}
        self.center_pile = []
        self.current_rank_index = 0
        self.last_play = None
        self.game_over = False

        deck = setup_deck()
        # deal 5 cards each for a quicker demo
        for _ in range(5):
            for name in self.players:
                self.hands[name].append(deck.pop())

    def get_target_rank(self):
        return RANKS[self.current_rank_index]

    def advance_rank(self):
        self.current_rank_index = (self.current_rank_index + 1) % len(RANKS)

    def print_state(self):
        print("\n" + "="*40)
        print(f"🎯 Target Rank: {self.get_target_rank()}")
        print(f"🃏 Center Pile: {len(self.center_pile)} cards")
        for p in self.players:
            print(f"👤 {p}: {len(self.hands[p])} cards")
        print("="*40 + "\n")

Initializing Agents

Each player is a Strands Agent. Each agent is assigned a model, a list of tools, and a system prompt. For this demo, the system prompt is identical for all three players.

Even with identical instructions, the name given to each agent can change its behavior, because models carry learned associations with names (cultural, gender, ethnicity, and so on). If you don't want this to affect agentic behavior, keep the real name out of the agent's context, for example by masking it with a neutral numerical ID.
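A minimal sketch of that masking, with hypothetical player names and an alias scheme of my own; only the alias ever reaches the model:

```python
# Map real names to neutral IDs so the model never sees them.
REAL_NAMES = ["Alice", "Bob", "Carol"]  # hypothetical players
aliases = {name: f"Player {i + 1}" for i, name in enumerate(REAL_NAMES)}

SYSTEM_PROMPT = (
    "You are {alias}, playing the card game BS against two opponents. "
    "Your goal is to empty your hand first. You may bluff, call BS, "
    "and chat with other players."
)

def prompt_for(name: str) -> str:
    """Render the shared system prompt with the player's neutral alias."""
    return SYSTEM_PROMPT.format(alias=aliases[name])
```

Each agent would then be constructed with something like Agent(model=model, tools=[...], system_prompt=prompt_for(name)), so the real name only lives in your Python bookkeeping, never in the context window.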

Main Game Loop

The loop runs three phases for each turn:

  1. Active player places down cards
  2. All other players decide to call BS or pass
  3. Cards are distributed (based on the results of the player reaction)

Since Strands handles the tool-call loop internally when you call all_agents[name](prompt), you just pass a prompt and get a response back.

while not game.game_over:
    game.print_state()
    active_name = names[turn_idx % len(names)]
    current_player = active_name
    target_rank = game.get_target_rank()

    # ── phase 1: active player plays cards ──────────────────────
    prompt = build_prompt(
        active_name,
        f"It is your turn. Target rank is {target_rank}. "
        f"Your hand is {game.hands[active_name]}. "
        "Play 1-4 cards using the play_cards tool."
    )
    all_agents[active_name](prompt)  # Strands runs the tool-call loop internally

    # ── phase 2: other players react ────────────────────────────
    bs_caller = None
    for other_name in names:
        if other_name == active_name:
            continue
        current_player = other_name
        reaction_prompt = build_prompt(
            other_name,
            f"{active_name} played {game.last_play['count']} card(s) "
            f"claiming to be {target_rank}s. Do you call BS or pass? "
            "Use call_bs or pass_turn."
        )
        reaction = all_agents[other_name](reaction_prompt)

        if response_contains(reaction, "BS_CALLED"):
            bs_caller = other_name
            print(f"🚨 {bs_caller} CALLED BS ON {active_name}! 🚨")
            break

    # ── phase 3: resolve the round ───────────────────────────────
    if bs_caller:
        actual_cards = game.last_play["cards"]
        is_truth = all(c[:-1] == target_rank for c in actual_cards)

        if is_truth:
            print(f"✅ {active_name} told the TRUTH! {bs_caller} takes the pile.")
            game.hands[bs_caller].extend(game.center_pile)
        else:
            print(f"❌ {active_name} LIED! {active_name} takes the pile.")
            game.hands[active_name].extend(game.center_pile)

        game.center_pile = []

    # advance to the next rank and player, then check for a winner
    game.advance_rank()
    turn_idx += 1
    for name in names:
        if not game.hands[name]:
            print(f"🏆 {name} wins!")
            game.game_over = True

build_prompt helper

Between turns, other agents may have sent chat messages that need to be delivered. build_prompt prepends any queued messages to an agent's next prompt.

def build_prompt(player_name: str, core: str) -> str:
    queued = pending_chats.pop(player_name, [])
    if queued:
        relay_block = "\n".join(queued)
        return f"[Messages from other players since your last turn]\n{relay_block}\n\n{core}"
    return core
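The pending_chats dict is filled by the chatting tool, which is not shown above. A sketch of how it could look, with a stand-in for @tool so the snippet runs standalone (the message format and wording are my own):

```python
from collections import defaultdict

pending_chats = defaultdict(list)  # recipient name -> queued messages
current_player = "Alice"           # set by the game loop before each tool call

def tool(fn):
    # Stand-in for strands' @tool so this sketch runs standalone.
    return fn

@tool
def send_chat(recipient: str, message: str) -> str:
    """Send a short message to another player. It is delivered at the start
    of their next turn.

    Args:
        recipient: Name of the player to message.
        message: What to say (table talk and trash talk welcome).
    """
    pending_chats[recipient].append(f"{current_player} says: {message}")
    return f"Message queued for {recipient}."
```

Because build_prompt pops the queue, each message is delivered exactly once, at the start of the recipient's next prompt.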

Parsing Agent Responses

Strands returns a response object whose message holds a list of content blocks rather than a plain string, so we check for our sentinel value across all blocks, including nested tool-result blocks.

def response_contains(response, keyword: str) -> bool:
    """Check if any tool result in a Strands response contains a keyword."""
    for block in response.message.get("content", []):
        if not isinstance(block, dict):
            continue
        # tool_result blocks have a nested content list
        if block.get("type") == "tool_result":
            for inner in block.get("content", []):
                if isinstance(inner, dict) and keyword in inner.get("text", ""):
                    return True
        if keyword in block.get("text", ""):
            return True
    return False
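You can sanity-check the parser without calling a model by feeding it a stub shaped like a Strands result. The block shapes below mirror what the function above expects, not necessarily every shape Strands can emit; response_contains is repeated so this snippet runs standalone:

```python
from types import SimpleNamespace

def response_contains(response, keyword: str) -> bool:
    """Same logic as above, repeated so this snippet is self-contained."""
    for block in response.message.get("content", []):
        if not isinstance(block, dict):
            continue
        if block.get("type") == "tool_result":
            for inner in block.get("content", []):
                if isinstance(inner, dict) and keyword in inner.get("text", ""):
                    return True
        if keyword in block.get("text", ""):
            return True
    return False

# Stub shaped like a Strands result: a message holding content blocks,
# one of them a nested tool result.
fake = SimpleNamespace(message={
    "content": [
        {"type": "text", "text": "I think you're lying."},
        {"type": "tool_result", "content": [{"text": "BS_CALLED: accusation made"}]},
    ]
})

# response_contains(fake, "BS_CALLED") -> True
# response_contains(fake, "PASSED")    -> False
```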

Self Feedback Loop

The game works, but right now the agents play the same way every game.

What if they improved after every round?

One option is to add a coach agent that watches the game, tracks each player's bluff rate and win rate, and writes updated system prompts between rounds. The players improve; the coach adapts. This is a multi-agent feedback loop.

Some metrics that you could add for a coach agent, given the game log:

Type          Metric                  What it tells you
Quantitative  Bluff success %         Is the agent's lying strategy working?
Quantitative  Win rate across rounds  Is the agent actually improving?
Quantitative  Avg turns to win        Is the agent playing efficiently?
Qualitative   Did the agent adapt?    Is it reading the table, not just its hand?
Qualitative   Was it a "good" bluff?  An LLM-as-a-judge can evaluate this
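A minimal sketch of the first quantitative metric, assuming a per-turn log format I made up for illustration (one dict per play, recording whether the player bluffed and whether they got caught):

```python
# Hypothetical game log: one entry per play.
game_log = [
    {"player": "P1", "bluffed": True,  "caught": False},
    {"player": "P1", "bluffed": True,  "caught": True},
    {"player": "P1", "bluffed": False, "caught": False},
]

def bluff_success_rate(log, player):
    """Share of a player's bluffs that went unchallenged or survived a call."""
    bluffs = [turn for turn in log if turn["player"] == player and turn["bluffed"]]
    if not bluffs:
        return 0.0
    return sum(not turn["caught"] for turn in bluffs) / len(bluffs)

# bluff_success_rate(game_log, "P1") -> 0.5
```

A coach agent could receive numbers like this alongside the raw transcript, then rewrite each player's system prompt ("you bluff too often on low counts") before the next round.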

Recap

This coding demo uses three Strands agents that share a global game state. Each agent has tools decorated with @tool. The chat tool injects messages into other agents' memories for inter-agent communication, enabling them to potentially form strategies (and trash talk). The main game loop implements three phases per turn: play, react, resolve.

Strands reduces the amount of boilerplate needed because it takes care of the tool-call loop internally. Check out the full documentation of Strands here: strands link

Full code for this demo: demo link

I wrote this article on the AWS Builder center too!: builder center link