Observation¶
Observation Encoding¶
Observations are encoded as boolean matrices where each element is 0 or 1.
Executor Observation¶
Shape: (93, 34)
The executor observation contains all visible game state information.
Channel Groups¶
Channels |
Description |
|---|---|
0-3 |
Own hand tiles (counts up to 4) |
4-6 |
Red dora in hand |
7-10 |
Own melds (chi/pon/kan) |
11-13 |
Red dora in melds |
14-17 |
Own river tiles |
18-25 |
Player 1’s visible tiles |
26-33 |
Player 2’s visible tiles |
34-41 |
Player 3’s visible tiles |
42-44 |
Dora indicators |
45-76 |
Remaining tile counts |
77-92 |
Game state flags |
Oracle Observation¶
Shape: (18, 34)
The oracle observation contains hidden information - the opponents’ hands.
Channels |
Description |
|---|---|
0-3 |
Player 1’s hand |
4-7 |
Player 2’s hand |
8-11 |
Player 3’s hand |
12-14 |
Red dora in opponents’ hands |
15-17 |
Hand size information |
Full Observation¶
Shape: (111, 34)
The full observation is the concatenation of executor and oracle observations.
full_obs = np.concatenate([executor_obs, oracle_obs], axis=0)
Tile Encoding¶
The 34 columns represent tile types:
Index |
Tile |
Index |
Tile |
Index |
Tile |
|---|---|---|---|---|---|
0 |
1m |
12 |
1p |
24 |
1s |
1 |
2m |
13 |
2p |
25 |
2s |
2 |
3m |
14 |
3p |
26 |
3s |
3 |
4m |
15 |
4p |
27 |
4s |
4 |
5m |
16 |
5p |
28 |
5s |
5 |
6m |
17 |
6p |
29 |
6s |
6 |
7m |
18 |
7p |
30 |
7s |
7 |
8m |
19 |
8p |
31 |
8s |
8 |
9m |
20 |
9p |
32 |
9s |
9 |
East |
21 |
White |
33 |
(reserved) |
10 |
South |
22 |
Green |
||
11 |
West |
23 |
Red |
Red Dora¶
Red dora (red 5m, 5p, 5s) are represented separately from regular 5 tiles. When using red dora, the corresponding action indices (34-36) should be used.