reinforcement

From Kaggle

Connect Four is a game where two players take turns dropping colored discs into a vertical grid. Each player uses a different color (usually red or yellow), and the objective of the game is to be the first player to get four discs in a row.

In this course, you will build your own intelligent agents to play the game.

  • In the first lesson, you’ll learn how to set up the game environment and create your first agent.
  • The next two lessons focus on traditional methods for building game AI. These agents will be smart enough to defeat many novice players!
  • In the final lesson, you’ll experiment with cutting-edge algorithms from the field of reinforcement learning. The agents that you build will come up with gameplay strategies much like humans do: gradually, and with experience.

Join the competition

Throughout the course, you’ll test your agents’ performance by competing against agents that other users have created.

To join the competition, open the competition page in a new window and click the “Join Competition” button. (If you see a “Submit Agent” button instead of a “Join Competition” button, you have already joined the competition and don’t need to do so again.)

Done.

This takes you to the rules acceptance page. You must accept the competition rules in order to participate. These rules govern how many submissions you can make per day, the maximum team size, and other competition-specific details. Then, click “I Understand and Accept” to indicate that you will abide by the competition rules.

Done.


Getting started

The game environment comes equipped with agents that have already been implemented for you.

To see a list of these default agents, run:

Done.

The "random" agent selects (uniformly) at random from the set of valid moves.

In Connect Four, a move is considered valid if there’s still space in the column to place a disc (i.e., if the board has six rows, the column holds fewer than six discs).
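A minimal sketch of this validity rule, assuming obs.board is the flat, row-major list described later in this post (row 0 is the top row, 0 marks an empty cell) and using types.SimpleNamespace as a stand-in for the real obs and config objects. The helper names valid_moves, valid_moves_by_count and random_agent are hypothetical. Because discs stack from the bottom, "the top cell is empty" and "the column holds fewer than config.rows discs" pick out the same columns.

```python
import random
from types import SimpleNamespace


def valid_moves(obs, config):
    """Columns whose top cell (row 0) is still empty, i.e. not yet full."""
    return [col for col in range(config.columns) if obs.board[col] == 0]


def valid_moves_by_count(obs, config):
    """Same rule, stated as: the column holds fewer than config.rows discs."""
    moves = []
    for col in range(config.columns):
        discs = sum(1 for row in range(config.rows)
                    if obs.board[row * config.columns + col] != 0)
        if discs < config.rows:
            moves.append(col)
    return moves


def random_agent(obs, config):
    """Pick uniformly at random among the valid columns."""
    return random.choice(valid_moves(obs, config))


config = SimpleNamespace(rows=6, columns=7, inarow=4)
# Hypothetical position: column 0 is completely full, all other columns empty.
board = [0] * (config.rows * config.columns)
for row in range(config.rows):
    board[row * config.columns + 0] = 1 if row % 2 == 0 else 2
obs = SimpleNamespace(board=board, mark=1)

print(valid_moves(obs, config))           # [1, 2, 3, 4, 5, 6]
print(valid_moves_by_count(obs, config))  # [1, 2, 3, 4, 5, 6]
print(random_agent(obs, config))          # some column from 1 to 6
```

Checking only the top cell is the cheaper of the two tests, which is why it is the natural choice inside an agent.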

In the code cell below, this agent plays one game round against a copy of itself.

Ran the code.
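The course’s code cell is not reproduced here. As a rough illustration of what one self-play round involves, the plain-Python sketch below plays a random agent against a copy of itself without calling env.run(); play_game, drop_piece and is_win are hypothetical helpers, not part of kaggle_environments. It records every (player, column) pair, so the game can be replayed move by move.

```python
import random
from types import SimpleNamespace


def drop_piece(board, col, mark, config):
    """Place `mark` in the lowest empty cell of column `col` (gravity)."""
    for row in range(config.rows - 1, -1, -1):
        idx = row * config.columns + col
        if board[idx] == 0:
            board[idx] = mark
            return
    raise ValueError(f"column {col} is full")


def is_win(board, mark, config):
    """True if `mark` has config.inarow pieces in a line in any direction."""
    rows, cols, n = config.rows, config.columns, config.inarow
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((0, 1), (1, 0), (1, 1), (1, -1)):
                cells = [(r + i * dr, c + i * dc) for i in range(n)]
                if all(0 <= rr < rows and 0 <= cc < cols
                       and board[rr * cols + cc] == mark for rr, cc in cells):
                    return True
    return False


def random_agent(obs, config):
    valid = [col for col in range(config.columns) if obs.board[col] == 0]
    return random.choice(valid)


def play_game(agent1, agent2, config):
    """Play one round; return (winning mark, or 0 for a draw, list of moves)."""
    board = [0] * (config.rows * config.columns)
    history = []  # every (mark, column) pair, so the game can be replayed
    agents = {1: agent1, 2: agent2}
    mark = 1
    while True:
        col = agents[mark](SimpleNamespace(board=list(board), mark=mark), config)
        if not (isinstance(col, int) and 0 <= col < config.columns) or board[col] != 0:
            return 3 - mark, history  # an invalid move loses the game
        drop_piece(board, col, mark, config)
        history.append((mark, col))
        if is_win(board, mark, config):
            return mark, history
        if 0 not in board:
            return 0, history  # board is full: draw
        mark = 3 - mark


random.seed(0)
config = SimpleNamespace(rows=6, columns=7, inarow=4)
winner, history = play_game(random_agent, random_agent, config)
print("winner:", winner, "moves played:", len(history))
for mark, col in history:  # replay the game move by move
    print(f"player {mark} -> column {col}")
```

Unlike env.run(), this sketch does not render the board; it only keeps the move list, which is enough to step through the game by hand.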

You can use the player above to view the game in detail: every move is captured and can be replayed. Try this now!

As you’ll soon see, this information will prove incredibly useful for brainstorming ways to improve our agents.

Defining agents

To participate in the competition, you’ll create your own agents.

Your agent should be implemented as a Python function that accepts two arguments: obs and config.

It returns an integer indicating the selected column, where indexing starts at zero. So the returned value is one of 0 to 6, inclusive.

We’ll start with a few examples to provide some context. In the code cell below:

  • The first agent behaves identically to the "random" agent above.
  • The second agent always selects the middle column, whether it’s valid or not! Note that if any agent selects an invalid move, it loses the game.
  • The third agent selects the leftmost valid column.

Ran the corresponding code.
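The course’s code cell is not shown in these notes. The three agents described in the list can be sketched as follows, assuming the flat obs.board layout in which indices 0 to config.columns - 1 form the top row (so a column is valid while its top cell is 0). SimpleNamespace objects stand in for the real obs and config, and the board position is hypothetical.

```python
import random
from types import SimpleNamespace


def agent_random(obs, config):
    """Select a random valid column (its top cell is still empty)."""
    valid_moves = [col for col in range(config.columns) if obs.board[col] == 0]
    return random.choice(valid_moves)


def agent_middle(obs, config):
    """Always select the middle column, even if it is already full."""
    return config.columns // 2


def agent_leftmost(obs, config):
    """Select the leftmost valid column."""
    valid_moves = [col for col in range(config.columns) if obs.board[col] == 0]
    return valid_moves[0]


config = SimpleNamespace(rows=6, columns=7, inarow=4)
# Hypothetical position: columns 0 and 3 are full, the rest are empty.
board = [0] * (config.rows * config.columns)
for row in range(config.rows):
    for col in (0, 3):
        board[row * config.columns + col] = 1 + (row + col) % 2
obs = SimpleNamespace(board=board, mark=1)

print(agent_random(obs, config))    # one of 1, 2, 4, 5, 6
print(agent_middle(obs, config))    # 3, an invalid move here since column 3 is full
print(agent_leftmost(obs, config))  # 1
```

In this position agent_middle would lose immediately, which is exactly the weakness the list above warns about.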

So, what are obs and config, exactly?

obs

obs contains two pieces of information:

  • obs.board – the game board (a Python list with one item for each grid location). For a board with 6 rows and 7 columns, this list has 42 items.
  • obs.mark – the piece assigned to the agent (either 1 or 2)

obs.board is a Python list that shows the locations of the discs, where the first row appears first, followed by the second row, and so on. We use 1 to track player 1’s discs, 2 to track player 2’s discs, and 0 for empty locations. For instance, for the example game board shown in the course:

obs.board would be [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 2, 2, 0, 0, 0, 0, 2, 1, 2, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 2, 1, 2, 0, 2, 0]
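To make the row-major layout concrete, the sketch below rebuilds the example board as a grid, assuming 6 rows and 7 columns so that cell (row, col) sits at index row * 7 + col, with row 0 at the top. The helpers to_grid and cell are hypothetical, and computing the opponent’s piece as 3 - obs.mark is a convenience that relies on the marks being exactly 1 and 2.

```python
from types import SimpleNamespace

config = SimpleNamespace(rows=6, columns=7, inarow=4)
# The example obs.board from the text, written one row per line (top row first).
board = [
    0, 0, 0, 0, 0, 0, 0,
    0, 0, 1, 1, 0, 0, 0,
    0, 0, 2, 2, 0, 0, 0,
    0, 2, 1, 2, 0, 0, 0,
    0, 1, 1, 1, 0, 0, 0,
    0, 2, 1, 2, 0, 2, 0,
]
obs = SimpleNamespace(board=board, mark=1)


def to_grid(board, config):
    """Split the flat, row-major list into one list per row (top row first)."""
    return [board[r * config.columns:(r + 1) * config.columns]
            for r in range(config.rows)]


def cell(board, row, col, config):
    """Value at (row, col); row 0 is the top of the board."""
    return board[row * config.columns + col]


grid = to_grid(obs.board, config)
for row in grid:
    print(row)

print(cell(obs.board, 5, 1, config))  # 2: bottom row, second column
opponent = 3 - obs.mark               # the other player's piece (2 here)
print(sum(v == obs.mark for v in obs.board),
      sum(v == opponent for v in obs.board))  # 7 7
```

Both players have seven discs on this board, which is consistent with player 1 being the next to move.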

config

config contains three pieces of information:

  • config.columns – number of columns in the game board (7 for Connect Four)
  • config.rows – number of rows in the game board (6 for Connect Four)
  • config.inarow – number of pieces a player needs to get in a row in order to win (4 for Connect Four)
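As an illustration of how config.inarow might be used, here is a hedged sketch of a win check that reads all three config values instead of hard-coding 6, 7 and 4. has_won is a hypothetical helper, and the inarow=3 configuration is an invented variant used only to show that the same board is scored differently when the target changes.

```python
from types import SimpleNamespace

# The example obs.board from above, one row per line (top row first).
board = [
    0, 0, 0, 0, 0, 0, 0,
    0, 0, 1, 1, 0, 0, 0,
    0, 0, 2, 2, 0, 0, 0,
    0, 2, 1, 2, 0, 0, 0,
    0, 1, 1, 1, 0, 0, 0,
    0, 2, 1, 2, 0, 2, 0,
]


def has_won(board, mark, config):
    """True if `mark` has config.inarow pieces in a horizontal, vertical or diagonal line."""
    rows, cols, n = config.rows, config.columns, config.inarow
    directions = ((0, 1), (1, 0), (1, 1), (1, -1))  # right, down, down-right, down-left
    for r in range(rows):
        for c in range(cols):
            for dr, dc in directions:
                end_r, end_c = r + (n - 1) * dr, c + (n - 1) * dc
                if not (0 <= end_r < rows and 0 <= end_c < cols):
                    continue  # this line would run off the board
                if all(board[(r + i * dr) * cols + (c + i * dc)] == mark
                       for i in range(n)):
                    return True
    return False


connect_four = SimpleNamespace(rows=6, columns=7, inarow=4)
connect_three = SimpleNamespace(rows=6, columns=7, inarow=3)  # hypothetical variant

print(has_won(board, 1, connect_four), has_won(board, 2, connect_four))  # False False
print(has_won(board, 1, connect_three))  # True: three 1s in a row on row 4
```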

Take the time now to investigate the three agents we’ve defined above, and make sure that the code makes sense to you!

Evaluating agents

To have the custom agents play one game round, we use the same env.run() method as before.

The outcome of a single game is usually not enough information to figure out how well our agents are likely to perform.

To get a better idea, we’ll calculate the win percentages for each agent, averaged over multiple games.

For fairness, each agent goes first half of the time.

To do this, we’ll use the get_win_percentages() function, which is defined in a hidden code cell of the course notebook.
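The hidden get_win_percentages() function is not reproduced in these notes. As a rough, self-contained stand-in, the sketch below plays many rounds in plain Python and lets each agent move first in half of them. estimate_win_percentages and play_game are hypothetical names; the course’s own function is built on kaggle_environments and may report its results differently.

```python
import random
from types import SimpleNamespace


def play_game(first, second, config):
    """Play one round. Return 1 if `first` wins, 2 if `second` wins, 0 for a draw.
    An agent that picks an invalid column loses immediately."""
    rows, cols, n = config.rows, config.columns, config.inarow
    board = [0] * (rows * cols)
    agents = {1: first, 2: second}
    mark = 1
    while True:
        col = agents[mark](SimpleNamespace(board=list(board), mark=mark), config)
        if not (isinstance(col, int) and 0 <= col < cols) or board[col] != 0:
            return 3 - mark
        # The disc lands in the lowest empty cell of the column.
        row = max(r for r in range(rows) if board[r * cols + col] == 0)
        board[row * cols + col] = mark
        # Only lines through the new disc can have become winning lines.
        for dr, dc in ((0, 1), (1, 0), (1, 1), (1, -1)):
            count = 1
            for sign in (1, -1):
                r, c = row + sign * dr, col + sign * dc
                while 0 <= r < rows and 0 <= c < cols and board[r * cols + c] == mark:
                    count += 1
                    r, c = r + sign * dr, c + sign * dc
            if count >= n:
                return mark
        if 0 not in board:
            return 0
        mark = 3 - mark


def estimate_win_percentages(agent1, agent2, config, n_rounds=100):
    """Fraction of rounds won by each agent, plus draws.
    agent1 moves first in the first half of the rounds, agent2 in the second half."""
    results = {"agent1": 0, "agent2": 0, "draw": 0}
    for i in range(n_rounds):
        agent1_first = i < n_rounds // 2
        if agent1_first:
            winner = play_game(agent1, agent2, config)
        else:
            winner = play_game(agent2, agent1, config)
        if winner == 0:
            results["draw"] += 1
        elif (winner == 1) == agent1_first:
            results["agent1"] += 1
        else:
            results["agent2"] += 1
    return {k: v / n_rounds for k, v in results.items()}


def agent_random(obs, config):
    return random.choice([c for c in range(config.columns) if obs.board[c] == 0])


def agent_middle(obs, config):
    return config.columns // 2


def agent_leftmost(obs, config):
    return [c for c in range(config.columns) if obs.board[c] == 0][0]


random.seed(0)
config = SimpleNamespace(rows=6, columns=7, inarow=4)
print(estimate_win_percentages(agent_random, agent_leftmost, config))
print(estimate_win_percentages(agent_middle, agent_middle, config))
```

With two copies of the middle-column agent, the middle column fills after six discs and the player who moved first then picks the full column on the seventh move and loses, so each copy wins exactly half the rounds. Use an even n_rounds so that both agents really do go first equally often.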
