Slime-RL
## GOALS
Teach learning slimes (= red turtles, aka "learning-turtles") to aggregate in clusters using **basic Q-learning**.
## TL;DR: quick info about RL here
Roughly, the NetLogo screen is split in two:
* on the left side of the simulation arena are the parameters of the basic slime mold aggregation model (there are more than in the original model, since more parameters are now configurable at run-time directly from the GUI)
* on the right side are the RL-related parameters
**Important**: as the left side parameters are related to basic slime behaviours, they obviously also affect learning (e.g. decreasing the evaporation rate makes learning more difficult). Hence, **you are strongly advised to keep them fixed** once you find suitable behaviour with `setup` & `go` (no RL involved there).
To **keep track of experiments** remember to:
1. Explicitly modify the experiment name in procedure `setup-learning`
2. Configure RL stuff within `ask Learners [...` in procedure `setup-learning`
3. Configure the actions reported by the logging procedure accordingly
4. Explicitly modify lines `"e-greedy"`, `"ACTION SPACE"`, `"OBSERVATION SPACE"`, and `"REWARD"` in procedure `log-params` at the very end of the file
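In the code as shipped, the `"e-greedy"` setting is given as `[0.9 0.9993]`: an initial probability of picking a random action and a decay factor applied after each episode. The sketch below shows how that probability shrinks over episodes, which is worth checking whenever `episodes` changes. The reporter name `epsilon-after` and its parameters are hypothetical; it is not part of the model or of the Q-learning extension, and it assumes the decay is purely multiplicative.

```netlogo
;; Hypothetical helper, not part of the model: exploration probability
;; left after a number of completed episodes, assuming the decay factor
;; is simply multiplied in once per episode.
to-report epsilon-after [ start decay episodes-done ]
  report start * (decay ^ episodes-done)
end

;; Example call from the Command Center:
;;   show epsilon-after 0.9 0.9993 10000
```

With the shipped values the probability of a random action drops below 0.001 after 10000 episodes, so a shorter run probably needs a smaller decay factor to leave exploration in time.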
### NON-RL parameters
As already said, these parameters describe slime behaviour in the original model, but they also indirectly affect learning, making it more difficult or easier (e.g. decreasing the evaporation rate makes learning more difficult).
* `population` controls the number of non-learning slimes (= blue turtles)
* `wiggle-angle` controls how much slimes steer around---no effect on learning
* `look-ahead` controls how far slimes can smell pheromone (higher values enable forming elongated clusters)---no effect on learning
* `sniff-threshold` controls how sensitive slimes are to pheromone (higher values make slimes less sensitive to pheromone)---unclear effect on learning, could be negligible
* `sniff-angle` controls how wide the cone is within which slimes can smell pheromone in nearby patches (higher values make slimes able to smell pheromone in a wider cone)---unclear effect on learning, could be negligible
* `chemical-drop` controls how much pheromone slimes deposit on their patch---unclear effect on learning, could be negligible
* `diffuse-share` controls how much pheromone diffuses in nearby patches (higher values mean more pheromone is diffused)---unclear effect on learning, but **likely lower values make learning more difficult**
* `evaporation-rate` controls how much pheromone is retained over time (higher values mean less pheromone evaporates)---unclear effect on learning, but **likely lower values make learning more difficult**
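To see why lower `evaporation-rate` values starve slimes of pheromone, consider a single patch that receives one deposit per tick and then keeps only the `evaporation-rate` fraction, as the model's `set chemical chemical * evaporation-rate` does. The reporter below is a hypothetical sketch: it ignores diffusion and assumes exactly one slime stands on the patch at every tick.

```netlogo
;; Hypothetical helper, not part of the model: pheromone level of one
;; patch after n ticks of "deposit, then evaporate".
;; Assumes no diffusion and one deposit of size 'drop' per tick.
to-report patch-level-after [ drop rate n ]
  let level 0
  repeat n [
    set level (level + drop) * rate   ;; deposit, then keep the fraction 'rate'
  ]
  report level
end

;; Example calls from the Command Center:
;;   show patch-level-after 2 0.9 500
;;   show patch-level-after 2 0.5 500
```

Under these assumptions the level approaches `drop * rate / (1 - rate)`, i.e. about 18 for a rate of 0.9 but only about 2 for a rate of 0.5, which gives learners a much weaker signal to follow.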
### RL parameters
All the following parameters have a direct effect on Q-learning of learning slimes.
* `cluster-threshold` controls the minimum number of slimes needed to consider an aggregate within `cluster-radius` a cluster (the higher the more difficult to consider an aggregate a cluster)---**the higher the more difficult to obtain a positive reward** for being within a cluster for learning slimes
* `cluster-radius` controls the range considered by slimes to count other slimes within a cluster (the higher the easier to form clusters, as turtles far apart are still counted together)---**the higher the easier it is to obtain a positive reward** for being within a cluster for learning slimes
* `learning-turtles` controls the number of learning slimes (= red turtles)
* `ticks-per-episode` controls how long a learning episode lasts (at the end of an episode, slime positions are randomly reset and pheromone is cleared)---slimes should be given enough time to form clusters, hence it is strongly advisable to set this parameter to **at least 2x the number of ticks non-learning slimes need to form clusters**
* `episodes` controls how many learning episodes are automatically run
* `learning-rate` is the classical Q-learning param, controlling "how fast" slimes learn---higher values cause bigger adjustments to Q-values
* `discount-factor` is the classical Q-learning param, controlling how much future rewards are given value over immediate ones---higher values cause bigger value given to future rewards
* `reward` is the raw reward value considered by the reward function (check code to see how it is used)
* `penalty` is the raw penalty (= negative reward) value considered by the reward function (check code to see how it is used)
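The roles of `learning-rate` and `discount-factor` are easiest to see in the textbook tabular Q-learning update. The reporter below is a sketch of that standard rule; whether the Q-learning extension computes exactly this is an assumption, and the names `q-updated`, `alpha`, and `gamma` are hypothetical (they stand in for `learning-rate` and `discount-factor` to avoid clashing with the sliders).

```netlogo
;; Hypothetical helper, not part of the model or of the extension:
;; textbook Q-learning update for one (state, action) pair.
;;   q-old     current Q-value of the pair
;;   rew       reward just received
;;   best-next highest Q-value available in the next state
;;   alpha     learning rate, gamma discount factor
to-report q-updated [ q-old rew best-next alpha gamma ]
  report q-old + alpha * (rew + (gamma * best-next) - q-old)
end

;; Example call from the Command Center:
;;   show q-updated 5 10 4 0.5 0.9
```

A larger `alpha` moves the Q-value further toward the new estimate at each step, while a larger `gamma` gives more weight to `best-next`, that is, to what can still be earned later.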
### PLOTS
The top plot tracks the average "size" of clusters (in terms of the number of turtles therein) based on two parameters:
* `cluster-threshold` is the minimum number of turtles needed to consider the aggregate a cluster (the higher the more difficult to form legit clusters, hence the more difficult to obtain a positive reward for learning turtles)
* `cluster-radius` is the range considered to count turtles in a cluster (the higher the easier to form clusters, as turtles far apart are still counted together)
This plot is better suited to monitoring the behaviour of non-learning turtles during a `setup` & `go`: the higher the value, the fewer and bigger the clusters produced.
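The interplay of `cluster-radius` and `cluster-threshold` can be sketched with a few reporters. The model's own code comments say a turtle is in a cluster when the number of turtles within `cluster-radius` exceeds `cluster-threshold`; everything else here is an assumption, in particular the reporter names, the choice not to count the turtle itself, and averaging only over turtles that are in a cluster.

```netlogo
;; Hypothetical helpers, not the model's own procedures.

;; Turtle reporter: how many other turtles lie within 'radius'.
to-report neighbours-in-range [ radius ]
  report count (other turtles in-radius radius)
end

;; Turtle reporter: is this turtle part of a cluster?
to-report clustered? [ radius threshold ]
  report (neighbours-in-range radius) > threshold
end

;; Observer reporter: average neighbour count over clustered turtles,
;; or 0 when no turtle is in a cluster.
to-report mean-cluster-size [ radius threshold ]
  let members turtles with [ clustered? radius threshold ]
  ifelse any? members
    [ report mean [ neighbours-in-range radius ] of members ]
    [ report 0 ]
end
```

Raising the radius makes `neighbours-in-range` larger for every turtle, while raising the threshold removes turtles from `members`, which matches the two effects described for the plot.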
The bottom plot is meant to monitor learning, as it plots the average reward per episode (average of the individual rewards of each learning turtle).
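As a sketch of the averaging described here, the reporters below take one list of rewards per learning turtle, average each list, and then average those averages. The names `mean-or-zero` and `episode-average` are hypothetical, and treating an empty list as 0 is an assumption rather than the model's documented behaviour.

```netlogo
;; Hypothetical helpers, not the model's own procedures.

;; Mean of a list of numbers, or 0 for an empty list
;; (NetLogo's 'mean' raises an error on an empty list).
to-report mean-or-zero [ values ]
  ifelse empty? values
    [ report 0 ]
    [ report mean values ]
end

;; Average of the per-turtle average rewards.
to-report episode-average [ reward-lists ]
  report mean-or-zero (map [ r -> mean-or-zero r ] reward-lists)
end

;; Example call from the Command Center:
;;   show episode-average [[1 3] [5]]
```

In the example the two turtles average 2 and 5 respectively, so the reported value is 3.5; note that this differs from the mean of all three rewards pooled together.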
### Other params
TBD
-----
## ORIGINAL INFO BELOW
## WHAT IS IT?
This model is inspired by the aggregation behavior of slime-mold cells.
The slime mold spends much of its life as thousands of distinct single-celled units, each moving separately. Under the right conditions, those many cells will coalesce into a single, larger organism. When the environment is less hospitable, the slime mold acts as a single organism; when the weather turns cooler and the mold enjoys a large food supply, "it" becomes a "they." The slime mold oscillates between being a single creature and a swarm.
This model shows how creatures can aggregate into clusters without the control of a "leader" or "pacemaker" cell. This finding was first described by Evelyn Fox Keller and Lee Segel in a paper in 1970.
Before Keller began her investigations, the conventional belief had been that slime mold swarms formed at the command of "pacemaker" cells that ordered the other cells to begin aggregating. In 1962, Shafer showed how the pacemakers could use cyclic AMP as a signal of sorts to rally the troops; the slime mold generals would release the compounds at the appropriate moments, triggering waves of cyclic AMP that washed through the entire community, as each isolated cell relayed the signal to its neighbors. Slime mold aggregation, in effect, was a giant game of Telephone — but only a few elite cells placed the original call.
For the twenty years that followed the publication of Shafer's original essay, mycologists assumed that the missing pacemaker cells were a sign of insufficient data, or poorly designed experiments. But Keller and Segel took another, more radical approach. They showed that Shafer had it wrong -- that the community of slime mold cells was organizing itself without any need for pacemakers. This was one of the first examples of emergence and self-organization in biology.
Initially, biologists did not accept this explanation. Indeed, the pacemaker hypothesis would continue as the reigning model for another decade. Now, slime mold aggregation is recognized as a classic case study in bottom-up self-organizing behavior.
In this model, each turtle drops a chemical pheromone (shown in green). The turtles also "sniff" ahead, trying to follow the gradient of other turtles' chemicals. Meanwhile, the patches diffuse and evaporate the pheromone. Following these simple, decentralized rules, the turtles aggregate into clusters.
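The patch side of these rules is small enough to try in isolation. The sketch below is meant for a new, empty NetLogo model with default wrapping world settings (it declares its own `chemical` patch variable, so it cannot be pasted into this model as is); the procedure name and the numbers 8, 0.5, and 0.9 are arbitrary choices for illustration.

```netlogo
patches-own [ chemical ]

;; Put some pheromone on one patch, then apply one diffusion step
;; followed by one evaporation step, and print what is left.
to demo-diffuse-evaporate
  clear-all
  ask patch 0 0 [ set chemical 8 ]
  ;; each patch gives away half of its chemical, split equally among
  ;; its 8 neighbours: the centre keeps 4, each neighbour receives 0.5
  diffuse chemical 0.5
  ;; every patch then keeps 90% of what it holds
  ask patches [ set chemical chemical * 0.9 ]
  show [ chemical ] of patch 0 0
  show sum [ chemical ] of patches
end
```

Diffusion only moves pheromone around, so the total stays at 8 until evaporation scales it down to about 7.2, with about 3.6 left on the centre patch.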
## HOW TO USE IT
Click the SETUP button to set up a collection of slime-mold cells. Click the GO button to start the simulation.
The POPULATION slider controls the number of slime mold cells in the simulation. Changes in the POPULATION slider do not have any effect until the next SETUP command.
The other sliders affect the way turtles move. Changes to them will immediately affect the model run.
SNIFF-THRESHOLD -- The minimum amount of chemical that must be present in a turtle's patch before the turtle will look for a chemical gradient to follow. This parameter causes the turtles to aggregate only when there are enough other cells nearby. The default value is 1.0.
SNIFF-ANGLE -- The amount, in degrees, that a turtle turns to the left and right to check for greater chemical concentrations. The default value is 45.
WIGGLE-ANGLE -- The maximum amount, in degrees, that a turtle will turn left or right in its random movements. When WIGGLE-ANGLE is set to zero, the turtle will remain at the same heading until it finds a chemical gradient to follow. The default value is 40.
WIGGLE-BIAS -- The bias of a turtle's average wiggle. When WIGGLE-BIAS = 0, the turtle's average movement is straight ahead. When WIGGLE-BIAS > 0, the turtle will tend to move more right than left. When WIGGLE-BIAS < 0, the turtle will tend to move more left than right. The default value is 0.
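One plausible way these four sliders could drive a turtle is sketched below. This is an assumption about how such rules are commonly written, not code taken from this model: the procedure names are hypothetical, the slider values are passed in as `angle` and `bias`, and the only thing relied on from the model is the `chemical` patch variable.

```netlogo
;; Hypothetical turtle procedures, not the model's own.

;; Random turn of at most 'angle' degrees either way, shifted by 'bias'
;; (a positive bias favours right turns).
to wiggle-sketch [ angle bias ]
  rt ((random-float angle) - (random-float angle) + bias)
end

;; Compare the chemical straight ahead with the chemical 'angle' degrees
;; to the right and to the left, and turn toward the strongest reading.
to turn-toward-sketch [ angle ]
  let ahead   [ chemical ] of patch-ahead 1
  let myright [ chemical ] of patch-right-and-ahead angle 1
  let myleft  [ chemical ] of patch-left-and-ahead angle 1
  ifelse (myright >= ahead) and (myright >= myleft)
    [ rt angle ]
    [ if myleft >= ahead [ lt angle ] ]
end
```

In this sketch a turtle would call `turn-toward-sketch` only when the chemical on its own patch exceeds the sniff threshold, and `wiggle-sketch` otherwise. Note that ties are resolved toward the right, so a turtle surrounded by equal readings still turns.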
There are several other critical parameters in the model that are not accessible by sliders. They can be changed by modifying the code in the procedures window. They are:
- the evaporation rate of the chemical -- set to 0.9
- the diffusion rate of the chemical -- set to 1
- the amount of chemical deposited at each step -- set to 2
## THINGS TO NOTICE
With 100 turtles, not much happens. The turtles wander around dropping chemical, but the chemical evaporates and diffuses too quickly for the turtles to aggregate.
With 400 turtles, the result is quite different. When a few turtles happen (by chance) to wander near one another, they create a small "puddle" of chemical that can attract any number of other turtles in the vicinity. The puddle then becomes larger and more attractive as more turtles enter it and deposit their own chemicals. This process is a good example of positive feedback: the more turtles, the larger the puddle; and the larger the puddle, the more likely it is to attract more turtles.
## THINGS TO TRY
Try different values for the SNIFF-THRESHOLD, SNIFF-ANGLE, WIGGLE-ANGLE, and WIGGLE-BIAS sliders. How do they affect the turtles' movement and the formation of clumps?
Change the SNIFF-ANGLE and WIGGLE-ANGLE sliders after some clumps have formed. What happens to the clumps? Try the same with SNIFF-THRESHOLD and WIGGLE-BIAS.
## EXTENDING THE MODEL
Modify the program so that the turtles aggregate into a single large cluster.
How do the results change if there is more (or less) randomness in the turtles' motion?
Notice that the turtles only sniff for chemical in three places: forward, SNIFF-ANGLE to the left, and SNIFF-ANGLE to the right. Modify the model so that the turtles sniff all around. How does their clustering behavior change? Modify the model so that the turtles sniff in even fewer places. How does their clustering behavior change?
What "critical number" of turtles is needed for the clusters to form? How does the critical number change if you modify the evaporation or diffusion rate?
Can you find an algorithm that will let you plot the number of distinct clusters over time?
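One possible starting point, offered as a sketch rather than as the intended answer: treat two turtles as connected when they are within some radius of each other and count the connected groups. The reporter name `count-clusters` is hypothetical, and the choices to use a fixed radius and to count isolated turtles as clusters of size one are assumptions.

```netlogo
;; Hypothetical observer reporter, not part of the model: number of
;; groups of turtles linked by chains of neighbours within 'radius'.
to-report count-clusters [ radius ]
  let unvisited turtles
  let clusters 0
  while [ any? unvisited ] [
    ;; grow one group outward from an arbitrary unvisited turtle
    let frontier turtle-set one-of unvisited
    let component frontier
    while [ any? frontier ] [
      let reached (turtle-set [ other turtles in-radius radius ] of frontier)
                    with [ not member? self component ]
      set component (turtle-set component reached)
      set frontier reached
    ]
    set unvisited unvisited with [ not member? self component ]
    set clusters clusters + 1
  ]
  report clusters
end
```

Plotting `count-clusters` with a radius of a few patches once per tick gives a curve of distinct groups over time; the scan is not optimised, so it may slow the run down with large populations.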
## NETLOGO FEATURES
Note the use of the `patch-ahead`, `patch-left-and-ahead`, and `patch-right-and-ahead` primitives to do the "sniffing".
## RELATED MODELS
Ants uses a similar idea of creatures that both drop chemical and follow the gradient of the chemical.
## CREDITS AND REFERENCES
Keller, E. & Segel, L. (1970). Initiation of slime mold aggregation viewed as an instability. Journal of Theoretical Biology, 26(3), 399–415.
Wilensky, U., & Resnick, M. (1999). Thinking in levels: A dynamic systems approach to making sense of the world. Journal of Science Education and Technology, 8(1), 3-19.
Johnson, S. (2001). Emergence: The Connected Lives of Ants, Brains, Cities, and Software. New York: Scribner.
Resnick, M. (1996). Beyond the centralized mindset. Journal of the Learning Sciences, 5(1), 1-22.
See also http://www.creepinggarden.com for video of slime mold.
## HOW TO CITE
If you mention this model or the NetLogo software in a publication, we ask that you include the citations below.
For the model itself:
* Wilensky, U. (1997). NetLogo Slime model. http://ccl.northwestern.edu/netlogo/models/Slime. Center for Connected Learning and Computer-Based Modeling, Northwestern University, Evanston, IL.
Please cite the NetLogo software as:
* Wilensky, U. (1999). NetLogo. http://ccl.northwestern.edu/netlogo/. Center for Connected Learning and Computer-Based Modeling, Northwestern University, Evanston, IL.
## COPYRIGHT AND LICENSE
Stefano Mariani (stefano.mariani@unimore.it)
Original copyright info below.
Copyright 1997 Uri Wilensky.
![CC BY-NC-SA 3.0](http://ccl.northwestern.edu/images/creativecommons/byncsa.png)
This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License. To view a copy of this license, visit https://creativecommons.org/licenses/by-nc-sa/3.0/ or send a letter to Creative Commons, 559 Nathan Abbott Way, Stanford, California 94305, USA.
Commercial licenses are also available. To inquire about commercial licenses, please contact Uri Wilensky at uri@northwestern.edu.
This model was created as part of the project: CONNECTED MATHEMATICS: MAKING SENSE OF COMPLEX PHENOMENA THROUGH BUILDING OBJECT-BASED PARALLEL MODELS (OBPML). The project gratefully acknowledges the support of the National Science Foundation (Applications of Advanced Technologies Program) -- grant numbers RED #9552950 and REC #9632612.
This model was developed at the MIT Media Lab using CM StarLogo. See Resnick, M. (1994) "Turtles, Termites and Traffic Jams: Explorations in Massively Parallel Microworlds." Cambridge, MA: MIT Press. Adapted to StarLogoT, 1997, as part of the Connected Mathematics Project.
This model was converted to NetLogo as part of the projects: PARTICIPATORY SIMULATIONS: NETWORK-BASED DESIGN FOR SYSTEMS LEARNING IN CLASSROOMS and/or INTEGRATED SIMULATION AND MODELING ENVIRONMENT. The project gratefully acknowledges the support of the National Science Foundation (REPP & ROLE programs) -- grant numbers REC #9814682 and REC-0126227. Converted from StarLogoT to NetLogo, 2000.
;; CHECK ESPECIALLY CAREFULLY COMMENTS WITH "NB" OR "WARNING"
;; 1) Explicitly modify experiment name in procedure setup-learning
;; 2) Configure 'actions' global variable
;; 3) Configure RL stuff within "ask Learners [..." in procedure setup-learning accordingly
;; 4) Explicitly modify lines "e-greedy", "OBSERVATION SPACE", and "REWARD" in procedure log-params at the very end of file

extensions[qlearningextension table]

globals [
  actions
  action-distribution     ;; table "action -> number of turtles choosing that action"
  turtle-distribution     ;; table "turtle -> [table action -> number of times action chosen]"
  filename                ;; the file where to report simulation results (automatically appended with a timestamp)
  g-reward-list           ;; list with one entry for each turtle, that is the average reward got so far by such turtle
  g-std-reward-list       ;; list with one entry for each turtle, that is the standard deviation of the average reward got so far by such turtle
  g-max-reward-list       ;; list with one entry for each turtle, that is the maximum reward got so far by such turtle
  g-min-reward-list       ;; list with one entry for each turtle, that is the minimum reward got so far by such turtle
  g-mean-distance-vector  ;; list with one entry for each turtle, that is the average distance from that turtle to any other turtle
  g-std-distance-vector   ;; list with one entry for each turtle, that is the standard deviation of the average distance from that turtle to any other turtle
  g-min-distance-vector   ;; list with one entry for each turtle, that is the minimum distance from that turtle to any other turtle
  g-max-distance-vector   ;; list with one entry for each turtle, that is the maximum distance from that turtle to any other turtle
  episode                 ;; progressive number of the currently running episode (hence number of episodes run)
  is-there-cluster        ;; is there at least one cluster in the whole environment? (boolean)
  first-cluster           ;; whether the cluster now formed is the first one of the episode
  cluster-tick            ;; tick number relative to an episode when the first cluster of that episode is formed
]

patches-own [chemical]    ;; amount of pheromone in the patch

Breed[Learners Learner]   ;; turtles that are learning (shown in red)

Learners-own [
  chemical-here           ;; whether there is pheromone on the patch-here (boolean)
  chemical-gradient       ;; direction where gradient is stronger
  cluster-gradient        ;; direction where cluster is stronger
  p-chemical              ;; amount of pheromone on the patch-here
  reward-list             ;; list of rewards got so far
]

turtles-own [             ;; these variables are also inherited by learners
  ticks-in-cluster        ;; how many ticks the turtle has stayed within a cluster
  cluster                 ;; number of turtles within cluster-radius
  in-cluster              ;; whether the turtle is within a cluster (boolean = cluster > cluster-threshold)
  last-action             ;; name of the action taken by the turtle in last tick
  distance-vector         ;; list with one entry for each turtle, that is the distance from this turtle to that turtle
]

;;;;;;;;;;;;;;;;;;;;;;
;; SETUP procedures ;;
;;;;;;;;;;;;;;;;;;;;;;

to setup ;; NO RL here (some RL variables are initialised anyway to avoid errors)
  clear-all
  create-turtles population
  [ set color blue
    set size 2
    setxy random-xcor random-ycor
    set ticks-in-cluster 0
    set cluster 0
    set in-cluster false
    set distance-vector []
    if label?
      [ set label who ] ]
  ask patches [ set chemical 0 ]
  set episode 1
  set is-there-cluster false
  set first-cluster true
  set cluster-tick ticks-per-episode
  reset-ticks

  setup-global-plot "Average cluster size in # of turtles within cluster-radius" "# of turtles" 0

  ;set actions ["move-and-drop" "walk-and-drop"]
  ;set actions ["away-and-drop" "stand-still"]
  set actions ["away-and-drop" "drop-chemical"]
  ;set actions ["away-and-drop" "walk-and-drop"]
  ;set actions ["random-walk" "move-toward-chemical" "drop-chemical"] ;; NB MODIFY ACTIONS LIST HERE
  setup-action-distribution-table actions
  type "Actions distribution: " print action-distribution
  setup-turtle-distribution-table turtles
  type "Turtles distribution: " print turtle-distribution

  if log-data?
  [ set filename (word "BS-scatter01-" date-and-time ".txt") ;; NB MODIFY HERE EXPERIMENT NAME
    print filename
    file-open filename
    log-params-nolearn ]

  ask turtles
  [ if not (breed = Learners)
    [ foreach [self] of turtles
      [ t -> if t != self
        [ set distance-vector lput precision distance t 2 distance-vector ] ]
      set g-mean-distance-vector []
      set g-std-distance-vector []
      set g-min-distance-vector []
      set g-max-distance-vector []
      set g-mean-distance-vector lput precision mean distance-vector 2 g-mean-distance-vector
      set g-std-distance-vector lput precision standard-deviation distance-vector 2 g-std-distance-vector
      set g-min-distance-vector lput precision min distance-vector 2 g-min-distance-vector
      set g-max-distance-vector lput precision max distance-vector 2 g-max-distance-vector ] ]
end

to setup-learning ;; RL
  clear-all
  create-turtles population
  [ set color blue
    set size 2
    setxy random-xcor random-ycor
    set ticks-in-cluster 0
    set cluster 0
    set in-cluster false
    set distance-vector []
    if label?
      [ set label who ] ]
  ask patches [ set chemical 0 ]
  set episode 1
  set is-there-cluster false
  set first-cluster true
  set cluster-tick ticks-per-episode
  reset-ticks

  setup-global-plot "Average cluster size in # of turtles within cluster-radius" "# of turtles" 0

  ;set actions ["random-walk" "stand-still"]
  ;set actions ["random-walk" "move-toward-cluster"]
  ;set actions ["random-walk" "stand-still" "move-toward-cluster"]
  set actions ["move-away-chemical" "random-walk" "drop-chemical" "move-toward-chemical"]
  ;set actions ["move-away-chemical" "random-walk" "drop-chemical"]
  ;set actions ["away-and-drop" "walk-and-drop" "drop-chemical"]
  ;set actions ["random-walk" "drop-chemical" "move-toward-chemical"]
  ;set actions ["move-and-drop" "walk-and-drop"]
  ;set actions ["move-toward-chemical" "random-walk" "move-and-drop" "walk-and-drop" "drop-chemical"] ;; NB MODIFY ACTIONS LIST HERE
  setup-action-distribution-table actions
  type "Actions distribution: " print action-distribution

  create-Learners learning-turtles
  [ set color red
    set size 2
    setxy random-xcor random-ycor
    set chemical-here false
    set chemical-gradient max-one-of neighbors [chemical]
    set cluster-gradient max-one-of neighbors [count turtles-on neighbors]
    set p-chemical 0
    set reward-list []
    set distance-vector []
    if label?
      [ set label who ] ]
  setup-turtle-distribution-table Learners
  type "Turtles distribution: " print turtle-distribution

  if log-data?
  [ set filename (word "manual_switch_reward_turn90-" date-and-time ".txt") ;; NB MODIFY HERE EXPERIMENT NAME
    print filename
    file-open filename
    log-params ]

  set g-reward-list []
  set g-std-reward-list []
  set g-max-reward-list []
  set g-min-reward-list []

  ask Learners
  [ ;qlearningextension:state-def ["cluster-gradient" "in-cluster"]
    ;qlearningextension:state-def ["chemical-gradient" "in-cluster"]
    qlearningextension:state-def ["chemical-gradient"] ;; reporter ;; reporter could report variables that the agent does not own
    ;qlearningextension:state-def ["chemical-here" "in-cluster"]
    ;; WARNING non-boolean state variables make the Q-table explode in size, hence Netlogo crashes 'cause out of memory!
    ;(qlearningextension:actions [random-walk] [stand-still])
    (qlearningextension:actions [move-away-chemical] [random-walk] [drop-chemical] [move-toward-chemical]) ;; admissible actions to be learned in policy WARNING: be sure to not use explicitly these actions in learners!
    ;(qlearningextension:actions [move-away-chemical] [random-walk] [drop-chemical])
    ;(qlearningextension:actions [away-and-drop] [walk-and-drop] [drop-chemical])
    ;(qlearningextension:actions [random-walk] [drop-chemical] [move-toward-chemical])
    ;(qlearningextension:actions [move-toward-chemical] [random-walk] [move-and-drop] [walk-and-drop] [drop-chemical]) ;; NB MODIFY ACTIONS LIST ACCORDING TO "actions" GLOBAL VARIABLE
    ;(qlearningextension:actions [move-and-drop] [walk-and-drop])
    qlearningextension:reward [adaptive01] ;; the reward function used
    qlearningextension:end-episode [isEndState] resetEpisode ;; the termination condition for an episode and the procedure to call to reset the environment for the next episode
    ; 10000 -> .9 .999 / .9993, 5000, 3000 episodes -> .9 .9985, 1500 ep -> .9 .9965, 500 ep -> .9 .985
    qlearningextension:action-selection "e-greedy" [0.9 0.9993] ;; 1st param is chance of random action, 2nd parameter is decay factor applied (after each episode the 1st parameter is updated, the new value corresponding to the current value multiplied by the 2nd param)
    qlearningextension:learning-rate learning-rate
    qlearningextension:discount-factor discount-factor
    foreach [self] of turtles
    [ t -> if t != self
      [ set distance-vector lput precision distance t 2 distance-vector ] ]
    set g-mean-distance-vector []
    set g-std-distance-vector []
    set g-min-distance-vector []
    set g-max-distance-vector []
    set g-mean-distance-vector lput precision mean distance-vector 2 g-mean-distance-vector
    set g-std-distance-vector lput precision standard-deviation distance-vector 2 g-std-distance-vector
    set g-min-distance-vector lput precision min distance-vector 2 g-min-distance-vector
    set g-max-distance-vector lput precision max distance-vector 2 g-max-distance-vector ]

  setup-global-plot "Average reward per episode" "average reward" 0
end

;;;;;;;;;;;;;;;;;;;
;; GO procedures ;;
;;;;;;;;;;;;;;;;;;;

to go ;; NO RL
  if episode <= episodes ;; = learning episodes not finished
  [ ask turtles
    [ check-cluster
      ifelse scatter?
      [ ifelse chemical > sniff-threshold ;; ignore pheromone unless there's enough here
        [ away-and-drop ]
        [ drop-chemical ] ]
      [ ifelse chemical > sniff-threshold ;; ignore pheromone unless there's enough here
        [ move-and-drop ]
        [ walk-and-drop ] ]
      ;drop-chemical ;; drop chemical onto patch
      if table:has-key? action-distribution last-action
        [ let n table:get action-distribution last-action
          table:put action-distribution last-action n + 1 ]
        ;[ type "WARNING: " type who type " choose action " type last-action print " that is NOT intable!" ]
      let turtles-table table:get turtle-distribution who
      if table:has-key? turtles-table last-action
        [ let n table:get turtles-table last-action
          table:put turtles-table last-action n + 1 ]
        ;[ type "WARNING: " type who type " choose action " type last-action print " that is NOT intable!" ]
    ]

    diffuse chemical diffuse-share ;; diffuse chemical to neighboring patches
    ask patches
    [ set chemical chemical * evaporation-rate ;; evaporate chemical
      set pcolor scale-color green chemical 0.1 3 ] ;; update display of chemical concentration

    let c-avg avg-cluster?
    plot-global "Average cluster size in # of turtles within cluster-radius" "# of turtles" c-avg
    log-ticks "average cluster size in # of turtles: " c-avg

    if is-there-cluster
    [ if first-cluster
      [ set cluster-tick ticks - (ticks-per-episode * (episode - 1))
        set first-cluster false
        type "t" type cluster-tick print ") FIRST CLUSTER!" ] ]

    if log-data?
    [ if (((ticks + 1) mod print-every) = 0) ;; log experiment data
      [ ask turtles
        [ foreach [self] of turtles
          [ t -> if t != self
            [ set distance-vector lput precision distance t 2 distance-vector ] ]
          set g-mean-distance-vector lput precision mean distance-vector 2 g-mean-distance-vector
          set g-std-distance-vector lput precision standard-deviation distance-vector 2 g-std-distance-vector
          set g-min-distance-vector lput precision min distance-vector 2 g-min-distance-vector
          set g-max-distance-vector lput precision max distance-vector 2 g-max-distance-vector ]
        let g-mean-distance precision mean g-mean-distance-vector 2
        let g-std-distance precision standard-deviation g-std-distance-vector 2
        let g-min-distance precision min g-min-distance-vector 2
        let g-max-distance precision max g-max-distance-vector 2
        file-open filename
        ;; Episode, Tick, Avg cluster size X tick, Avg reward X episode, Actions distribution until tick (how many turtles choose each available action)
        file-type episode file-type ", "
        file-type ticks file-type ", "
        file-type cluster-tick file-type ", "
        file-type c-avg file-type ", "
        file-type g-mean-distance file-type ", "
        file-type g-std-distance file-type ", "
        file-type g-min-distance file-type ", "
        file-type g-max-distance file-type ", "
        print-table action-distribution ", "
        print-table-table turtle-distribution ", " ] ]

    if (((ticks + 1) mod ticks-per-episode) = 0)
    [ ;; an episode has just ended
      ;if (ticks > 1) and (is-there-cluster = true) [
      clear-patches ;; clear chemical
      set is-there-cluster false ;; reset state variables
      set first-cluster true
      set cluster-tick ticks-per-episode
      ;plot-global "Average reward per episode" "average reward" g-avg-rew
      log-episodes "average reward per episode: " 0
      type "Actions distribution: " print action-distribution
      type "Turtles distribution: " print turtle-distribution
      setup-action-distribution-table actions
      setup-turtle-distribution-table turtles
      set g-reward-list []
      set episode episode + 1
      ask turtles
      [ ;; reset non learners too
        if not (breed = Learners)
        [ setxy random-xcor random-ycor
          set ticks-in-cluster 0
          set cluster 0
          set in-cluster false ] ] ]
    tick ]
  file-close-all
end

to learn ;; RL
  if episode <= episodes ;; = learning episodes not finished
  [ ask turtles
    [ if not (breed = Learners) ;; handle non learning slimes as for 'go' procedure
      [ check-cluster
        ifelse chemical > sniff-threshold
          [ move-toward-chemical ] ;drop-chemical ]
          [ random-walk ] ;drop-chemical ]
        drop-chemical ] ;]
    ]

    ask Learners ;; handle learning slimes
    [ check-cluster
      set p-chemical [chemical] of patch-here
      ifelse chemical > sniff-threshold
        [ set chemical-here true ] ;; set state variables ;move-toward-chemical ]
        [ set chemical-here false ] ;random-walk ]
      set chemical-gradient face-chem-gradient
      set cluster-gradient face-cluster-gradient
      qlearningextension:learning ;; select an action for the current state, perform the action, get the reward, update the Q-table, verify if the new state is an end state and if so will run the procedure passed to the extension in the end-episode primitive
      ;if (ticks > 0) and ((ticks mod ticks-per-episode) = 0) [
      ;type "Q-table: " print(qlearningextension:get-qtable) ]
      ifelse table:has-key? action-distribution last-action
        [ let n table:get action-distribution last-action
          table:put action-distribution last-action n + 1 ]
        [ type "WARNING: " type who type " choose action " type last-action print " that is NOT intable!" ]
      let learner-table table:get turtle-distribution who
      ifelse table:has-key? learner-table last-action
        [ let n table:get learner-table last-action
          table:put learner-table last-action n + 1 ]
        [ type "WARNING: " type who type " choose action " type last-action print " that is NOT in learner table!" ] ]

    diffuse chemical diffuse-share
    ask patches
    [ set chemical chemical * evaporation-rate
      set pcolor scale-color green chemical 0.1 3 ]

    let c-avg avg-cluster?
    plot-global "Average cluster size in # of turtles within cluster-radius" "# of turtles" c-avg
    log-ticks "average cluster size in # of turtles: " c-avg
    let g-avg-rew 0
    let g-std-rew 0
    let g-min-rew 0
    let g-max-rew 0

    if is-there-cluster
    [ if first-cluster
      [ set cluster-tick ticks - (ticks-per-episode * (episode - 1))
        set first-cluster false
        type "t" type cluster-tick print ") FIRST CLUSTER!" ] ]

    if log-data?
    [ if (((ticks + 1) mod print-every) = 0) ;; log experiment data
      [ ask Learners
        [ foreach [self] of turtles
          [ t -> if t != self
            [ set distance-vector lput precision distance t 2 distance-vector ] ]
          set g-mean-distance-vector lput precision mean distance-vector 2 g-mean-distance-vector
          set g-std-distance-vector lput precision standard-deviation distance-vector 2 g-std-distance-vector
          set g-min-distance-vector lput precision min distance-vector 2 g-min-distance-vector
          set g-max-distance-vector lput precision max distance-vector 2 g-max-distance-vector ]
        set g-avg-rew avg? g-reward-list
        set g-std-rew avg? g-std-reward-list
        set g-min-rew avg? g-min-reward-list
        set g-max-rew avg? g-max-reward-list
        let g-mean-distance precision mean g-mean-distance-vector 2
        let g-std-distance precision standard-deviation g-std-distance-vector 2
        let g-min-distance precision min g-min-distance-vector 2
        let g-max-distance precision max g-max-distance-vector 2
        file-open filename
        ;; Episode, Tick, First cluster tick Avg cluster size X tick, Avg reward X episode,
        file-type episode file-type ", "
        file-type ticks file-type ", "
        file-type cluster-tick file-type ", "
        file-type c-avg file-type ", "
        file-type g-avg-rew file-type ", "
        file-type g-std-rew file-type ", "
        file-type g-min-rew file-type ", "
        file-type g-max-rew file-type ", "
        file-type g-mean-distance file-type ", "
        file-type g-std-distance file-type ", "
        file-type g-min-distance file-type ", "
        file-type g-max-distance file-type ", "
        ;; Actions distribution until tick (how many turtles choose each available action)
        print-table action-distribution ", "
        print-table-table turtle-distribution ", " ] ]

    if (((ticks + 1) mod ticks-per-episode) = 0)
    [ ;; an episode has just ended
      ;if (ticks > 1) and (is-there-cluster = true) [
      clear-patches ;; clear chemical
      set is-there-cluster false ;; reset state variables
      set first-cluster true
      set cluster-tick ticks-per-episode
      set g-avg-rew avg? g-reward-list
      set g-std-rew avg? g-std-reward-list
      set g-min-rew avg? g-min-reward-list
      set g-max-rew avg? g-max-reward-list
      plot-global "Average reward per episode" "average reward" g-avg-rew
      log-episodes "average reward per episode: " g-avg-rew
      type "Actions distribution: " print action-distribution
      type "Turtles distribution: " print turtle-distribution
      setup-action-distribution-table actions
      setup-turtle-distribution-table Learners
      set g-reward-list []
      set g-std-reward-list []
      set g-max-reward-list []
      set g-min-reward-list []
      set episode episode + 1
      ask turtles
      [ ;; reset non learners too
        if not (breed = Learners)
        [ setxy random-xcor random-ycor
          set ticks-in-cluster 0
          set cluster 0
          set in-cluster false ] ] ]
    tick ]
  file-close-all
end

;;;;;;;;;;;;;;;;;;;;;;;;;
;; LEARNING procedures ;;
;;;;;;;;;;;;;;;;;;;;;;;;;

to-report rewardFunc1 ;; fixed reward if in cluster, otherwise penalty
  let r penalty
  if in-cluster = true
    [ set r reward ]
  set reward-list lput r reward-list
  report r
end

to-report rewardFunc2 ;; monotonic reward based on ticks-in-cluster
  set reward-list lput ticks-in-cluster reward-list
  report ticks-in-cluster
end

to-report rewardFunc3 ;; reward and penalty based on ticks-in-cluster
  let rew 0
  if (ticks > 0)
  [ set rew ((ticks-in-cluster / ticks-per-episode) * reward) + (((ticks-per-episode - ticks-in-cluster) / ticks-per-episode) * penalty)
    set reward-list lput rew reward-list ]
  report rew
end

to-report rewardFunc4 ;; reward based on both ticks-in-cluster and cluster size, penalty based on ticks-in-cluster
  let rew cluster
  if (ticks > 0)
  [ set rew ((ticks-in-cluster / ticks-per-episode) * (cluster / cluster-threshold) * reward) + (((ticks-per-episode - ticks-in-cluster) / ticks-per-episode) * penalty)
    set reward-list lput rew reward-list ]
  report rew
end

to-report rewardFunc5 ;; additive variation of rewardFunc4
  let rew cluster
  if (ticks > 0)
  [ set rew ((ticks-in-cluster / ticks-per-episode) * reward) + ((cluster / cluster-threshold) * reward) + (((ticks-per-episode - ticks-in-cluster) / ticks-per-episode) * penalty)
    set reward-list lput rew reward-list ]
  report rew
end

to-report rewardFunc6 ;; variation of rewardFunc5: more 'weight' to cluster size, less 'weight' to penalty
  let rew cluster
  if (ticks > 0)
  [ set rew ((ticks-in-cluster / ticks-per-episode) * reward) + ((cluster / cluster-threshold) * reward ^ 2) + ((ticks-per-episode - ticks-in-cluster) * (penalty / 10))
    set reward-list lput rew reward-list ]
  report rew
end

to-report rewardFunc7 ;; no ticks-in-cluster
  let rew cluster
  if (ticks > 0)
  [ set rew ((cluster ^ 2 / cluster-threshold) * reward) + (((ticks-per-episode - ticks-in-cluster) / ticks-per-episode) * penalty)
    set reward-list lput rew reward-list ]
  report rew
end

to-report rewardFunc8 ;; variation of rewardFunc6: ratio of ticks not in cluster, instead of absolute difference
  let rew cluster
  ;if (ticks > 0)
  ;[
  set rew ((ticks-in-cluster / ticks-per-episode) * reward) + ((cluster / cluster-threshold) * (reward ^ 2)) + (((ticks-per-episode - ticks-in-cluster) / ticks-per-episode) * penalty)
  set reward-list lput rew reward-list
  ;]
  report rew
end

to-report rewardFunc9 ;; variation of rewardFunc8: ratio of ticks in cluster give reward only if cluster of correct size
  let rew cluster
  ;if (ticks > 0)
  ;[
  set rew ((ticks-in-cluster / ticks-per-episode) * (cluster / cluster-threshold) * reward) + ((cluster / cluster-threshold) * (reward ^ 2)) + (((ticks-per-episode - ticks-in-cluster) / ticks-per-episode) * penalty)
  set reward-list lput rew reward-list
  ;]
  report rew
end

to-report scatter01 ;; incentivise scattering, not clustering! (essentially, the contrary of rewardFunc8)
  let rew cluster
  ;if (ticks > 0)
  ;[
  set rew ((ticks-in-cluster / ticks-per-episode) * penalty) + ((cluster / cluster-threshold) * (0 - (penalty ^ 2))) + (((ticks-per-episode - ticks-in-cluster) / ticks-per-episode) * reward)
  set reward-list lput rew reward-list
  ;]
  report rew
end

to-report scatter02 ;; explicitly reward no clusters
  let rew cluster
  ;if (ticks > 0)
  ;[
  ifelse in-cluster
    [ set rew ((ticks-in-cluster / ticks-per-episode) * penalty) + ((cluster / cluster-threshold) * (0 - (penalty ^ 2))) + (((ticks-per-episode - ticks-in-cluster) / ticks-per-episode) * reward) ]
    [ set rew reward ]
  set reward-list lput rew reward-list
  ;]
  report rew
end

to-report scatter03 ;; added distance, std dev
  let rew 0
  ;if (ticks > 0)
  ;[
  set rew ((ticks-in-cluster / ticks-per-episode) * penalty) + ((cluster / cluster-threshold) * penalty) + (((ticks-per-episode - ticks-in-cluster) / ticks-per-episode) * (reward)) + (reward) / precision standard-deviation distance-vector 2
  set reward-list lput rew reward-list
  ;]
  report rew
end

to-report scatter04 ;; added distance, min
  let rew 0
  ;if (ticks > 0)
  ;[
  set rew ((ticks-in-cluster / ticks-per-episode) * penalty) + ((cluster / cluster-threshold) * penalty) + (((ticks-per-episode - ticks-in-cluster) / ticks-per-episode) * (reward)) + precision min distance-vector 2 * reward
  set reward-list lput rew reward-list
  ;]
  report rew
end

to-report scatter05 ;; amplifying reward of min distance
  let rew 0
  ;if (ticks > 0)
  ;[
  set rew ((ticks-in-cluster / ticks-per-episode) * penalty) + ((cluster / cluster-threshold) * penalty) + (((ticks-per-episode - ticks-in-cluster) / ticks-per-episode) * (reward)) + (precision min distance-vector 2) ^ 3 * reward
  set reward-list lput rew reward-list
  ;]
  report rew
end

to-report scatter06 ;; amplifying reward of time out cluster
  let rew 0
  ;if (ticks > 0)
  ;[
  set rew ((ticks-in-cluster / ticks-per-episode) * penalty) + ((cluster / cluster-threshold) * penalty) + (((ticks-per-episode
- ticks-in-cluster) / ticks-per-episode) * (reward) ^ 2) + (precision min distance-vector 2) ^ 3 * reward set reward-list lput rew reward-list ;] report rew end to-report scatter07 ;; huge penalty for staying in cluster let rew 0 ;if (ticks > 0) ;[ set rew (ticks-in-cluster / ticks-per-episode) * (0 - reward) + ((cluster / cluster-threshold) * penalty) + (((ticks-per-episode - ticks-in-cluster) / ticks-per-episode) * (reward) ^ 2) + (precision min distance-vector 2) ^ 3 * reward set reward-list lput rew reward-list ;] report rew end to-report adaptive01 ;; toggle reward scheme mid-learning NB: should reset also exploration rate,HOW TO?? ifelse episode < (episodes / 2) [ report rewardFunc8 ] [ report scatter07 ] end to-report isEndState ;if is-there-cluster = true [ if (((ticks + 1) mod ticks-per-episode) = 0) [ ;set is-there-cluster false if switch-reward and episode = round (episodes / 2) [ qlearningextension:action-selection "e-greedy" [0.9 0.993] print "Switched reward function!" ] report true ] report false end to resetEpisode let avg-rew avg? 
reward-list set g-reward-list lput avg-rew g-reward-list if length reward-list > 0 [ set g-std-reward-list lput precision standard-deviation reward-list 2 g-std-reward-list set g-min-reward-list lput precision min reward-list 2 g-min-reward-list set g-max-reward-list lput precision max reward-list 2 g-max-reward-list ] ;set-current-plot-pen (word who) ;plot avg-rew set reward-list [] set ticks-in-cluster 0 set distance-vector [] ;ask patch-here [ set chemical 0 ] ;ask [neighbors] of patch-here [ set chemical 0 ] setxy random-xcor random-ycor end ;;;;;;;;;;;;;;;; ;; RL actions ;; ;;;;;;;;;;;;;;;; to move-toward-cluster ;; turtle procedure ;if breed = Learners set last-action "move-toward-cluster" let ahead count-from-me look-ahead 0 let myright count-from-me look-ahead 1 let myleft count-from-me look-ahead -1 ifelse (myright >= ahead) and (myright >= myleft) [ rt sniff-angle ] [ if myleft >= ahead [ lt sniff-angle ] ] fd 1 end to move-toward-chemical ;; turtle procedure ;if breed = Learners set last-action "move-toward-chemical" ;; examine the patch ahead of you and two nearby patches; ;; turn in the direction of greatest chemical let ahead [chemical] of patch-ahead look-ahead let myright [chemical] of patch-right-and-ahead sniff-angle look-ahead let myleft [chemical] of patch-left-and-ahead sniff-angle look-ahead ifelse (myright >= ahead) and (myright >= myleft) [ rt sniff-angle ] [ if myleft >= ahead [ lt sniff-angle ] ] fd 1 ;; default don't turn end to move-away-chemical ;; turtle procedure ;if breed = Learners set last-action "move-away-chemical" ;; examine the patch ahead of you and two nearby patches; ;; turn in the direction of greatest chemical let ahead [chemical] of patch-ahead look-ahead let myright [chemical] of patch-right-and-ahead sniff-angle look-ahead let myleft [chemical] of patch-left-and-ahead sniff-angle look-ahead ifelse (myright >= ahead) and (myright >= myleft) [ lt 90 ] [ ifelse myleft >= ahead [ rt 90 ] [ lt 180 ] ] fd 1 ;; default don't 
turn end to random-walk ;; turtle procedure ;if breed = Learners set last-action "random-walk" ifelse (random-float 1) > 0.5 [ rt random-float wiggle-angle ] [ lt random-float wiggle-angle ] fd 1 end to drop-chemical ;; turtle procedure ;if breed = Learners set last-action "drop-chemical" set chemical chemical + chemical-drop end to dont-drop-chemical ;; turtle procedure ;if breed = Learners set last-action "dont-drop-chemical" end to move-and-drop ;; turtle procedure (can't reuse code due to last-action saving (would compromise tracking of last actions performed!)) ;if breed = Learners set last-action "move-and-drop" let ahead [chemical] of patch-ahead look-ahead let myright [chemical] of patch-right-and-ahead sniff-angle look-ahead let myleft [chemical] of patch-left-and-ahead sniff-angle look-ahead ifelse (myright >= ahead) and (myright >= myleft) [ rt sniff-angle ] [ if myleft >= ahead [ lt sniff-angle ] ] fd 1 ;; default don't turn set chemical chemical + chemical-drop end to away-and-drop ;; turtle procedure (can't reuse code due to last-action saving (would compromise tracking of last actions performed!)) ;if breed = Learners set last-action "away-and-drop" let ahead [chemical] of patch-ahead look-ahead let myright [chemical] of patch-right-and-ahead sniff-angle look-ahead let myleft [chemical] of patch-left-and-ahead sniff-angle look-ahead ifelse (myright >= ahead) and (myright >= myleft) [ lt 90 ] [ ifelse myleft >= ahead [ rt 90 ] [ lt 180 ] ] fd 1 ;; default don't turn set chemical chemical + chemical-drop end to walk-and-drop ;; turtle procedure (can't reuse code due to last-action saving (would compromise tracking of last actions performed!)) ;if breed = Learners set last-action "walk-and-drop" ifelse (random-float 1) > 0.5 [ rt random-float wiggle-angle ] [ lt random-float wiggle-angle ] fd 1 set chemical chemical + chemical-drop end to stand-still ;if breed = Learners set last-action "stand-still" ;ifelse (random-float 1) > 0.5 ;[ rt random-float 
wiggle-angle ] ;[ lt random-float wiggle-angle ] end ;;;;;;;;;;;;;;;;;;;;; ;; RL observations ;; ;;;;;;;;;;;;;;;;;;;;; to-report count-from-me [howfar direction] let counter 0 let candidates turtles-here while [counter < howfar] [ set counter counter + 1 if direction = 0 [ set candidates (turtle-set candidates turtles-on patch-ahead counter) ] if direction = 1 [ set candidates (turtle-set candidates turtles-on patch-right-and-ahead sniff-angle counter) ] if direction = -1 [ set candidates (turtle-set candidates turtles-on patch-left-and-ahead sniff-angle counter) ] ] report count candidates end to-report face-cluster-gradient ;; turtle procedure ;; examine the patch ahead of you and two nearby patches; ;; turn in the direction of greatest chemical let ahead count-from-me look-ahead 0 let myright count-from-me look-ahead 1 let myleft count-from-me look-ahead -1 ifelse (myright >= ahead) and (myright >= myleft) [ rt sniff-angle ] [ if myleft >= ahead [ lt sniff-angle ] ] report patch-ahead look-ahead end to-report face-chem-gradient ;; turtle procedure ;; examine the patch ahead of you and two nearby patches; ;; turn in the direction of greatest chemical let ahead [chemical] of patch-ahead look-ahead let myright [chemical] of patch-right-and-ahead sniff-angle look-ahead let myleft [chemical] of patch-left-and-ahead sniff-angle look-ahead ifelse (myright >= ahead) and (myright >= myleft) [ rt sniff-angle ] [ if myleft >= ahead [ lt sniff-angle ] ] report patch-ahead look-ahead end to check-cluster ;; turtle procedure set cluster count turtles in-radius cluster-radius ifelse cluster >= cluster-threshold [ set in-cluster true set is-there-cluster true set ticks-in-cluster ticks-in-cluster + 1 ] [ set in-cluster false ] end ;;;;;;;;;;;;;;;;;;;;; ;; SHOW procedures ;; ;;;;;;;;;;;;;;;;;;;;; ;to setup-individual-plot ; set-current-plot "Average cluster size in # of turtles within cluster-radius" ; create-temporary-plot-pen (word who) ; let p-color scale-color one-of 
base-colors who 0 count turtles ; set-plot-pen-color p-color ;end to setup-global-plot [p-name pen-name pen-color] set-current-plot p-name create-temporary-plot-pen pen-name set-plot-pen-color pen-color end ;to plot-individual ; set-current-plot "Average cluster size in # of turtles within cluster-radius" ; set-current-plot-pen (word who) ; plot cluster ;end to plot-global [p-name pen-name what] set-current-plot p-name set-current-plot-pen pen-name plot what end to log-ticks [msg what] if (((ticks + 1) mod print-every) = 0) [ type "t" type ticks type ") " type msg print what ] end to log-episodes [msg what] type "E" type episode type ") " type msg print what end ;;;;;;;;;;;;;;;;;;;;; ;; HELP procedures ;; ;;;;;;;;;;;;;;;;;;;;; ;to l-check-cluster ; set l-cluster count turtles in-radius cluster-radius ; set l-cluster l-cluster + count Learners in-radius cluster-radius ; ifelse l-cluster > cluster-threshold ; [ set l-in-cluster true ; set is-there-cluster true ; set l-ticks-in-cluster l-ticks-in-cluster + 1 ] ; [ set l-in-cluster false ] ;end to-report avg? [collection] let summ 0 let lengthh 0 foreach collection [ i -> set summ summ + i set lengthh lengthh + 1 ] if lengthh > 0 [ report summ / lengthh ] report 0 end to-report avg-cluster? 
let c-sum 0 let c-length 0 foreach sort turtles [ t -> ;if ([cluster] of t) > cluster-threshold ;[ set c-sum c-sum + ([cluster] of t) set c-length c-length + 1 ;] ] let c-avg 0 if not (c-length = 0) [ set c-avg c-sum / c-length ] report c-avg end to setup-action-distribution-table [collection] set action-distribution table:make foreach collection [ c -> table:put action-distribution c 0 ] end to setup-turtle-distribution-table [agentset] set turtle-distribution table:make foreach sort agentset [ c -> ;type [who] of c type " " let turtle-action-distribution table:make foreach actions [ a -> table:put turtle-action-distribution a 0 ] table:put turtle-distribution [who] of c turtle-action-distribution ] end to print-actions [collection sep] ;foreach but-last collection [ c -> foreach collection [ c -> file-type c file-type sep ] ;file-type last collection ;file-print "" end to print-turtle-actions [turtleL actionsL sep] foreach but-last turtleL [ t -> foreach actionsL [ a -> file-type t file-type "-" file-type a file-type sep ] ] foreach but-last actionsL [ a -> file-type last turtleL file-type "-" file-type a file-type sep ] file-type last turtleL file-type "-" file-type last actionsL file-print "" end to print-table [tab sep] ;foreach but-last table:keys tab [ k -> foreach table:keys tab [ k -> file-type table:get tab k file-type sep ] ;file-type table:get tab last table:keys tab ;file-print "" end to print-table-table [tabtab sep] foreach but-last table:keys tabtab [ t-who -> foreach table:keys table:get tabtab t-who [ t-a -> file-type table:get table:get tabtab t-who t-a file-type sep ] ] foreach but-last table:keys table:get tabtab last table:keys tabtab [ t-a -> file-type table:get table:get tabtab last table:keys tabtab t-a file-type sep ] file-type table:get table:get tabtab last table:keys tabtab last table:keys table:get tabtab last table:keys tabtab file-print "" end ;;;;;;;;;;;;;;;;;;;;;;;; ;; LOGGING procedures ;; ;;;;;;;;;;;;;;;;;;;;;;;; to log-params ;; 
NB explicitly modify lines "e-greedy", "OBSERVATION SPACE", and "REWARD" (everything else is logged automatically) file-print "--------------------------------------------------------------------------------" file-type "TIMESTAMP: " file-print date-and-time file-print "PARAMS:" file-type " population " file-print population file-type " wiggle-angle " file-print wiggle-angle file-type " look-ahead " file-print look-ahead file-type " sniff-threshold " file-print sniff-threshold file-type " sniff-angle " file-print sniff-angle file-type " chemical-drop " file-print chemical-drop file-type " diffuse-share " file-print diffuse-share file-type " evaporation-rate " file-print evaporation-rate file-type " cluster-threshold " file-print cluster-threshold file-type " cluster-radius " file-print cluster-radius file-type " learning-turtles " file-print learning-turtles file-type " ticks-per-episode " file-print ticks-per-episode file-type " episodes " file-print episodes file-type " learning-rate " file-print learning-rate file-type " discount-factor " file-print discount-factor file-type " reward " file-print reward file-type " penalty " file-print penalty file-type " e-greedy " file-type 0.9 file-type " " file-type 0.9993 file-print "" ;; NB: CHANGE ACCORDING TO ACTUAL CODE! file-type "ACTION SPACE: " print-actions actions " " file-print "" ;file-type "OBSERVATION SPACE: " file-type "cluster-gradient " file-print "in-cluster" ;file-type "OBSERVATION SPACE: " file-type "chemical-gradient " file-print "in-cluster" ;; NB: CHANGE ACCORDING TO ACTUAL CODE! file-type "OBSERVATION SPACE: " file-print "chemical-gradient " file-type "REWARD: " file-print "adaptive01" ;; NB: CHANGE ACCORDING TO ACTUAL CODE! 
file-print "--------------------------------------------------------------------------------" ;; Episode, Tick, Avg cluster size X tick, Avg reward X episode, file-type "Episode, " file-type "Tick, " file-type "First cluster tick, " file-type "Avg cluster size X tick, " file-type "Avg reward X episode, " file-type "Std dev reward X episode, " file-type "Min reward X episode, " file-type "Max reward X episode, " file-type "Avg distance, " file-type "Std dev distance, " file-type "Min distance, " file-type "Max distance, " ;; Actions distribution until tick (how many turtles choose each available action) print-actions actions ", " print-turtle-actions sort Learners actions ", " end to log-params-nolearn ;; NB explicitly modify lines "e-greedy", "OBSERVATION SPACE", and "REWARD" (everything else is logged automatically) file-print "--------------------------------------------------------------------------------" file-type "TIMESTAMP: " file-print date-and-time file-print "PARAMS:" file-type " population " file-print population file-type " wiggle-angle " file-print wiggle-angle file-type " look-ahead " file-print look-ahead file-type " sniff-threshold " file-print sniff-threshold file-type " sniff-angle " file-print sniff-angle file-type " chemical-drop " file-print chemical-drop file-type " diffuse-share " file-print diffuse-share file-type " evaporation-rate " file-print evaporation-rate file-type " cluster-threshold " file-print cluster-threshold file-type " cluster-radius " file-print cluster-radius file-type " learning-turtles " file-print learning-turtles file-type " ticks-per-episode " file-print ticks-per-episode file-type " episodes " file-print episodes ;file-type " learning-rate " file-print learning-rate ;file-type " discount-factor " file-print discount-factor ;file-type " reward " file-print reward ;file-type " penalty " file-print penalty ;file-type " e-greedy " file-type 0.9 file-type " " file-type 0.999 file-print "" ;; NB: CHANGE ACCORDING TO ACTUAL 
CODE! file-type "ACTION SPACE: " print-actions actions " " file-print "" ;file-type "OBSERVATION SPACE: " file-type "cluster-gradient " file-print "in-cluster" ;file-type "OBSERVATION SPACE: " file-type "chemical-gradient " file-print "in-cluster" ;; NB: CHANGE ACCORDING TO ACTUAL CODE! ;file-type "OBSERVATION SPACE: " file-print "chemical-gradient " ;file-type "REWARD: " file-print "rewardFunc8" ;; NB: CHANGE ACCORDING TO ACTUAL CODE! file-print "--------------------------------------------------------------------------------" ;; Episode, Tick, Avg cluster size X tick, Avg reward X episode, Actions distribution until tick (how many turtles choose each available action) file-type "Episode, " file-type "Tick, " file-type "First cluster tick, " file-type "Avg cluster size X tick, " file-type "Avg distance, " file-type "Std dev distance, " file-type "Min distance, " file-type "Max distance, " print-actions actions ", " print-turtle-actions sort turtles actions ", " end ; Copyright 2022 Stefano Mariani
There are 5 versions of this model.
This model does not have any ancestors.
This model does not have any descendants.