Inspiration
Training examples carry a label for the result of the game (z), which we use to train the value head. We set the label to -1 or 1 depending on whether the example is from a game that black or white won. We know some percentage of our labels are "wrong" (we disagree on precisely what "wrong" means, but we know that during v9 some games were wrong by both of our definitions).
I set out to measure what effect increasing the number of mislabeled values (z) would have on training.
Experiment
Load 10 "golden chunks" (v9 chunks 250 to 259): 20 million examples taken from ~600,000 self-play games played by strong v9 models.
During training, flip the result (from a white win to a black win and vice versa) in some percentage {1, 2, 5, 10} of examples.
Train really fast on a bunch of TPUs at a couple of different network sizes.
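The flipping step above can be sketched in a few lines of Python. This is a minimal illustration only: the function name `fumble_game_results` and the per-example independent flip are assumptions, not the actual code behind the `--game_result_fumble_prob` flag used in the helper script.

```python
import random


def fumble_game_results(z_labels, fumble_prob, seed=None):
    """Return a copy of z_labels with each label's sign flipped
    independently with probability fumble_prob (hypothetical helper)."""
    rng = random.Random(seed)
    return [-z if rng.random() < fumble_prob else z for z in z_labels]


# Toy data: 100,000 labels, half black wins and half white wins.
labels = [1, -1] * 50_000
noisy = fumble_game_results(labels, fumble_prob=0.05, seed=0)

# Measure how many labels actually changed.
flipped = sum(1 for a, b in zip(labels, noisy) if a != b)
print(f"flipped {flipped / len(labels):.2%} of labels")
```

Note that independent flips at rate p shrink the expected label for a position from E[z] to (1 - 2p) * E[z], so the value head is pushed toward less confident outputs as p grows.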
Results
TL;DR: Flipping 1% or 2% of results doesn't have much impact; flipping 5% or 10% has a much bigger impact on confidence and value accuracy.
The key takeaway:
The AG paper says they aim for a 5% bad-resign rate in their self-play games. This is a trade-off between playing more games and having slightly better-labelled games.
My experiments show that decreasing this to 4%, 2%, or 1% would speed up training.
That said, there are diminishing returns (in value loss) and increasing costs (in computation) to driving this down, given that it might increase average game length by 20% or more. It would also change the corpus of training data, which has an unknown effect on value.
Potential follow ups:
Train with z set to 1 - bad_resign_rate
Many moves in resign-disabled games have value = 1.0 or -1.0, and the MCTS playouts mirror the policy strongly. Maybe we should avoid sampling from these positions.
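One possible reading of the first follow-up is to shrink the magnitude of the training target from 1 to 1 - bad_resign_rate while keeping its sign. The sketch below assumes that reading; the helper name `soften_result` is hypothetical, and whether this should apply to every example or only to resigned games is left open.

```python
def soften_result(z, bad_resign_rate):
    """Scale a hard game result (-1 or 1) down to +/-(1 - bad_resign_rate).

    Hypothetical helper: one interpretation of training with z set to
    1 - bad_resign_rate, not an existing training option.
    """
    if z not in (-1, 1):
        raise ValueError("z must be -1 or 1")
    if not 0.0 <= bad_resign_rate <= 1.0:
        raise ValueError("bad_resign_rate must be in [0, 1]")
    return z * (1.0 - bad_resign_rate)


# With a 5% bad-resign rate, targets shrink slightly toward zero.
print(soften_result(1, 0.05), soften_result(-1, 0.05))
```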
What is z
z represents "goodness of position for black". We often assume that it is linear and actually represents the approximate chance of winning.
Andrew takes a "wrong" z to mean that the engine result was changed by something outside of its control, and really only counts this case.
In v9 we limited games to ~500 moves. If the clearly winning side passes often and fails to clean up dead groups, it might run out of moves before removing a "dead" group, which changes the outcome under Tromp-Taylor scoring.
Seth decides "wrong" by asking: if two strong players (humans or bots) played the game out from this point a hundred times, would their results agree with our result more than half the time? If not, the label is "wrong".
This means that lots of games are "wrong" (i.e. we are teaching the NN something that it will later need to unlearn).
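Seth's definition can be written down as a simple majority check over replayed games. The sketch below is illustrative only: `label_is_wrong` and `replay_results` are hypothetical names, and producing the replays (strong players finishing the game from the position) is outside what the code shows.

```python
def label_is_wrong(z, replay_results):
    """Return True if the replays agree with label z at most half the time.

    z is the recorded result (-1 or 1); replay_results is a list of
    results (-1 or 1) from replaying the game from the same position.
    """
    if not replay_results:
        raise ValueError("need at least one replay result")
    agree = sum(1 for r in replay_results if r == z)
    # "Right" requires agreement in strictly more than half of the replays.
    return 2 * agree <= len(replay_results)


# Recorded result says 1, but only 40 of 100 replays agree.
print(label_is_wrong(1, [1] * 40 + [-1] * 60))
```

Under this definition a perfectly even split also counts as "wrong", since the replays do not agree with the label more than half the time.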
Helper script
sethtroisi@sethtroisi:~/minigo$ cat fumble_analysis
WORK_DIR="gs://$USER-sandbox/model"
DATA_DIR="gs://v9-19/data/golden_chunks"
test () {
BOARD_SIZE=19 python dual_net.py train --use_tpu --tpu_name=sethtroisi --model_dir=$WORK_DIR/fumble_analysis/$1_$2_$3 $DATA_DIR/{250..259}.tfrecord.zz --steps=30720 --iterations_per_loop=128 --summary_steps=256 --trunk_layers=$1 --conv_width=$2 --game_result_fumble_prob=$3
}
test 5 64 0
test 5 64 0.10
test 5 64 0.05
test 5 64 0.02
test 5 64 0.01
test 10 128 0
test 10 128 0.10
test 10 128 0.05
test 10 128 0.02
test 10 128 0.01
test 20 256 0
test 20 256 0.10
test 20 256 0.05
test 20 256 0.02
test 20 256 0.01
test 5 128 0
test 15 128 0
test 20 128 0
test 5 192 0
test 10 192 0
test 15 192 0
test 20 192 0
Data
Holdout

Filling own eye when way ahead

https://cloudygo.com/v9-19x19/000000-unused/full/1532637753-tpu-player-deployment-57c689f568-q26qm-29.sgf?M=450
Unexpected outcomes versus generation
