Quick Start¶
This guide covers the basics of pymahjong: the single-agent and multi-agent environments, the observation and action spaces, and game initialization.
Single-agent Environment¶
The single-agent environment follows the familiar OpenAI Gym pattern of reset() and step(). Your agent controls one seat and plays against three opponents.
Basic Usage¶
import pymahjong
import numpy as np
# Create environment with random opponents
env = pymahjong.SingleAgentMahjongEnv(opponent_agent="random")
# Reset and get initial observation
obs = env.reset()
while True:
    # Get valid actions at current step
    valid_actions = env.get_valid_actions()
    # Select a random valid action
    action = np.random.choice(valid_actions)
    # Step the environment
    obs, reward, done, info = env.step(action)
    if done:
        print(f"Game over! Payoff: {reward}")
        break
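The loop above can be wrapped in a small helper to run several games and average the final payoff. The following is a minimal sketch, not part of pymahjong: `run_episode` and `average_payoff` are hypothetical names, and the code relies only on the `reset`, `get_valid_actions`, and `step` calls shown above.

```python
import random


def run_episode(env, rng):
    """Play one game with uniformly random valid actions; return the final reward."""
    env.reset()
    while True:
        valid_actions = list(env.get_valid_actions())
        # Cast to a plain int in case the environment returns NumPy integers.
        action = int(rng.choice(valid_actions))
        _obs, reward, done, _info = env.step(action)
        if done:
            return reward


def average_payoff(env, num_games, seed=0):
    """Average the final reward over num_games random-play games."""
    rng = random.Random(seed)  # seeds only the action choice, not the game itself
    total = 0.0
    for _ in range(num_games):
        total += run_episode(env, rng)
    return total / num_games
```

With pymahjong installed, passing `pymahjong.SingleAgentMahjongEnv(opponent_agent="random")` as `env` would give a rough random-play baseline to compare a trained agent against.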
Using Pretrained Opponents¶
import pymahjong
# Load environment with pretrained VLOG opponents
env = pymahjong.SingleAgentMahjongEnv(
    opponent_agent="path/to/mahjong_VLOG_CQL.pth"
)
obs = env.reset()
# ... same as above
Multi-agent Environment¶
In the multi-agent environment, 4 agents play against each other.
Basic Usage¶
import pymahjong
import numpy as np
env = pymahjong.MahjongEnv()
for game in range(10):
    env.reset()
    while not env.is_over():
        # Get current player ID
        curr_player_id = env.get_curr_player_id()
        # Get observation and valid actions
        obs = env.get_obs(curr_player_id)
        valid_actions = env.get_valid_actions()
        # Make a random decision
        action = np.random.choice(valid_actions)
        env.step(curr_player_id, action)
    # Get final payoffs
    payoffs = env.get_payoffs()
    print(f"Game {game}, payoffs: {payoffs}")
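To plug in your own agents instead of random play, the same loop can dispatch each decision to a per-seat policy. The sketch below is an illustration only: `play_game` and `first_valid` are hypothetical helpers, and a policy is assumed to be any callable taking `(obs, valid_actions)` and returning one of the valid action indices.

```python
def play_game(env, policies):
    """Play one 4-player game; policies[i](obs, valid_actions) picks seat i's action."""
    env.reset()
    while not env.is_over():
        player_id = env.get_curr_player_id()
        obs = env.get_obs(player_id)
        valid_actions = list(env.get_valid_actions())
        action = policies[player_id](obs, valid_actions)
        # Fail early on an illegal choice instead of passing it to the engine.
        if action not in valid_actions:
            raise ValueError(f"player {player_id} chose invalid action {action}")
        env.step(player_id, action)
    return env.get_payoffs()


def first_valid(obs, valid_actions):
    """Trivial placeholder policy: always take the first valid action."""
    return valid_actions[0]
```

With pymahjong installed, `play_game(pymahjong.MahjongEnv(), [first_valid] * 4)` would run one game with this placeholder policy in every seat.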
Observation Space¶
The observation is a boolean matrix:
Executor observation: Shape (93, 34) - visible game state
Oracle observation: Shape (18, 34) - hidden information (opponents’ hands)
Full observation: Shape (111, 34) - concatenated
# Get different observations
obs = env.get_obs(player_id) # Executor observation (93, 34)
oracle = env.get_oracle_obs(player_id) # Oracle observation (18, 34)
full = env.get_full_obs(player_id) # Full observation (111, 34)
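Since 93 + 18 = 111, a full observation can be split back into its two parts. The sketch below assumes the executor rows come first and the oracle rows follow. That ordering is an assumption and is not confirmed here, so check it against `get_obs` and `get_oracle_obs` before relying on it. `split_full_obs` is a hypothetical helper, not a pymahjong function.

```python
import numpy as np

EXECUTOR_ROWS = 93
ORACLE_ROWS = 18
NUM_TILE_TYPES = 34


def split_full_obs(full):
    """Split a (111, 34) full observation into (executor, oracle) parts.

    Assumes executor rows precede oracle rows (unverified assumption).
    """
    full = np.asarray(full)
    expected = (EXECUTOR_ROWS + ORACLE_ROWS, NUM_TILE_TYPES)
    if full.shape != expected:
        raise ValueError(f"expected shape {expected}, got {full.shape}")
    return full[:EXECUTOR_ROWS], full[EXECUTOR_ROWS:]
```

A quick sanity check with a real environment would be to compare the first returned array against `env.get_obs(player_id)` using `np.array_equal`.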
The 34 columns represent:
Characters (Man): 1-9
Dots (Pin): 1-9
Bamboo (Sou): 1-9
Winds: East, South, West, North
Dragons: White, Green, Red
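When inspecting observations, it helps to turn a column index into a readable tile name. The sketch below assumes the columns are 0-indexed in exactly the order listed above (0-8 Man, 9-17 Pin, 18-26 Sou, 27-30 Winds, 31-33 Dragons). `tile_name` is a hypothetical helper, not part of pymahjong.

```python
SUITS = ("m", "p", "s")  # Man, Pin, Sou
HONORS = ("East", "South", "West", "North", "White", "Green", "Red")


def tile_name(column):
    """Map a column index (0-33) to a tile name such as '5m' or 'East'.

    Assumes 0-indexed columns in the order Man, Pin, Sou, Winds, Dragons.
    """
    if not 0 <= column < 34:
        raise ValueError(f"column must be in 0..33, got {column}")
    if column < 27:
        suit, rank = divmod(column, 9)
        return f"{rank + 1}{SUITS[suit]}"
    return HONORS[column - 27]
```

For example, under this assumption column 4 is the 5 of Characters (`5m`) and column 27 is East.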
Action Space¶
There are 54 discrete actions. Not all actions are valid at each step.
# Get valid action indices
valid_actions = env.get_valid_actions() # e.g., [0, 3, 4, 20, 21]
# Get one-hot mask of valid actions
mask = env.get_valid_actions(nhot=True) # Shape (54,)
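A common use of the mask is to restrict a policy's output to legal moves. The sketch below shows one way to do this with NumPy. `indices_to_mask` and `masked_argmax` are hypothetical helpers, and `scores` stands in for whatever per-action values your own model produces.

```python
import numpy as np

NUM_ACTIONS = 54


def indices_to_mask(valid_actions, num_actions=NUM_ACTIONS):
    """Build a boolean mask of length num_actions from a list of valid indices."""
    mask = np.zeros(num_actions, dtype=bool)
    mask[np.asarray(valid_actions, dtype=int)] = True
    return mask


def masked_argmax(scores, mask):
    """Return the index of the highest-scoring action among the valid ones."""
    scores = np.asarray(scores, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        raise ValueError("no valid actions in mask")
    # Invalid actions get -inf so they can never win the argmax.
    masked = np.where(mask, scores, -np.inf)
    return int(np.argmax(masked))
```

Masking before the argmax (rather than picking an action and then checking it) guarantees the chosen index is always legal, whatever the raw scores are.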
Action Types¶
| Index Range | Action Type |
|---|---|
| 0-33 | Discard tile (0-33) |
| 34-36 | Discard red dora 5m/5p/5s |
| 37-42 | Chi (left/middle/right, with/without red) |
| 43-44 | Pon (with/without red) |
| 45 | AnKan (concealed kan) |
| 46 | MinKan (open kan) |
| 47 | KaKan (added kan) |
| 48 | Riichi |
| 49 | Ron |
| 50 | Tsumo |
| 51 | Push (kyushukyuhai) |
| 52 | Pass (during riichi decision) |
| 53 | Pass (response) |
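For logging or debugging, the index ranges in the table can be turned into a lookup. The sketch below copies the ranges from the table. The short labels such as `"pass_riichi"` and the `action_type` function itself are hypothetical names chosen for illustration, not identifiers from pymahjong.

```python
# (low, high, label) with inclusive bounds, taken from the action table.
ACTION_RANGES = [
    (0, 33, "discard"),
    (34, 36, "discard_red_dora"),
    (37, 42, "chi"),
    (43, 44, "pon"),
    (45, 45, "ankan"),
    (46, 46, "minkan"),
    (47, 47, "kakan"),
    (48, 48, "riichi"),
    (49, 49, "ron"),
    (50, 50, "tsumo"),
    (51, 51, "push"),
    (52, 52, "pass_riichi"),
    (53, 53, "pass_response"),
]


def action_type(index):
    """Return the label of the action type that an action index belongs to."""
    for low, high, label in ACTION_RANGES:
        if low <= index <= high:
            return label
    raise ValueError(f"action index must be in 0..53, got {index}")
```

Printing `action_type(a)` for each entry of `env.get_valid_actions()` gives a quick readable view of which kinds of moves are available at a step.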
Game Initialization¶
# Default initialization
env.reset()
# Custom initialization
env.reset(
    oya=0,             # Dealer (oya) seat, 0-3
    game_wind="east",  # Prevailing (game) wind
    seed=42            # Random seed
)
Rendering¶
For debugging, you can print the game state:
env.render()
This displays each player’s hand and river (discard pile) in text format.
Next Steps¶
API Reference - Detailed API documentation
Examples - More usage examples
Advanced Topics - Custom agents and C++ engine