AlphaGo Zero Taught Itself Go, and You Can Build One in PyTorch | Code Included (6)

2023-05-04 Source: 飞速影视

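All forbidden points have their probability set to zero, and the remaining probabilities are then renormalized so that they sum to 1.
The leaf node then grows child nodes (one for each playable point, i.e. those whose probability is non-zero). The code is as follows:

    def expand(self, probas):
        # Create one child per move whose probability is non-zero
        self.children = [Node(parent=self, move=idx, proba=probas[idx])
                         for idx in range(probas.shape[0]) if probas[idx] > 0]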
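The masking and renormalization step described above is not shown in this part of the article, so here is a minimal sketch of how it might look. The function name `mask_illegal_moves`, the `legal_moves` argument, and the uniform fallback are assumptions made for illustration, not the author's code.

```python
import numpy as np


def mask_illegal_moves(probas, legal_moves):
    """Zero out illegal moves and renormalize so the rest sum to 1."""
    mask = np.zeros_like(probas)
    mask[legal_moves] = 1.0
    masked = probas * mask
    total = masked.sum()
    if total == 0:
        # Assumption: if the network put no mass on any legal move,
        # fall back to a uniform distribution over the legal moves.
        return mask / mask.sum()
    return masked / total


# Toy 4-point "board" on which points 1 and 3 are forbidden.
raw_probas = np.array([0.1, 0.4, 0.3, 0.2])
legal = [0, 2]
probas = mask_illegal_moves(raw_probas, legal)
```

After this step only the entries for points 0 and 2 are non-zero, so an `expand` call like the one above would create exactly two children.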
Updating
Once the children have been created, the statistics stored on the leaf node and on all of its ancestors are updated, using the two snippets below.

    def update(self, v):
        """ Update the node statistics after a rollout """

        self.w = self.w + v
        self.q = self.w / self.n if self.n > 0 else 0

    while current_node.parent:
        current_node.update(v)
        current_node = current_node.parent
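To see how the two snippets fit together, here is a small self-contained sketch. The stripped-down `Node` class and the `backup` helper are hypothetical, and incrementing the visit count `n` inside the loop is an assumption: the excerpt does not show where `n` is updated.

```python
class Node:
    """Minimal stand-in for the article's Node: statistics only."""

    def __init__(self, parent=None):
        self.parent = parent
        self.n = 0    # visit count
        self.w = 0.0  # total value accumulated through this node
        self.q = 0.0  # mean value, w / n

    def update(self, v):
        self.w = self.w + v
        self.q = self.w / self.n if self.n > 0 else 0


def backup(leaf, v):
    """Walk from the leaf towards the root, updating each node on the way."""
    current_node = leaf
    while current_node.parent:
        # Assumption: the visit count is bumped here; the original code
        # may instead do this during the selection phase.
        current_node.n += 1
        current_node.update(v)
        current_node = current_node.parent


root = Node()
child = Node(parent=root)
leaf = Node(parent=child)
backup(leaf, 0.5)
backup(leaf, -1.0)
```

As in the loop above, the walk stops when it reaches a node without a parent, so the root itself is not updated in this sketch.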
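Choosing the Move
With the simulator in place, every possible next move now has its own statistics.
Based on these statistics, the algorithm picks one move to actually play.
There are two ways to choose. The first is to pick the point that was simulated most often; this suits testing and competitive play.
The second is to choose stochastically: the number of times each node was visited is converted into a probability distribution, using the following code:

    total = np.sum(action_scores)
    probas = action_scores / total
    move = np.random.choice(action_scores.shape[0], p=probas)

The latter suits training, since it lets AlphaGo explore more of the possible choices.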
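The two selection modes can be put side by side in one helper. This is a sketch under stated assumptions: the function name `select_move`, the `competitive` flag, and the optional `rng` argument are hypothetical, and `action_scores` is taken to be an array of visit counts.

```python
import numpy as np


def select_move(action_scores, competitive, rng=None):
    """Pick a move index from per-move visit counts."""
    if competitive:
        # Testing / real play: take the most-visited move.
        return int(np.argmax(action_scores))
    if rng is None:
        rng = np.random.default_rng()
    # Training: sample in proportion to the visit counts.
    total = np.sum(action_scores)
    probas = action_scores / total
    return int(rng.choice(action_scores.shape[0], p=probas))


visits = np.array([10.0, 70.0, 20.0])
best = select_move(visits, competitive=True)
sampled = select_move(visits, competitive=False,
                      rng=np.random.default_rng(0))
```

Passing a seeded generator makes the stochastic branch reproducible, which can help when debugging self-play games.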
Three-in-One Training
AlphaGo Zero's training consists of three processes that run asynchronously.
The first is self-play, which generates the data.

    def self_play():
        while True:
            new_player, checkpoint = load_player()
            if new_player:
                player = new_player

            ## Create the self-play match queue of processes
            results = create_matches(player, cores=PARALLEL_SELF_PLAY,
                                     match_number=SELF_PLAY_MATCH)
            for _ in range(SELF_PLAY_MATCH):
                result = results.get()
                db.insert({
                    "game": result,
                    "id": game_id
                })
                game_id += 1
