Wednesday, October 10, 2012

Variable Ratio Training and Extinction of Sniffy the Virtual Rat

The purpose of this experiment was to train Sniffy the Rat to press the bar on a VR-50 schedule of reinforcement. A variable ratio schedule delivers reinforcement after a varying number of responses; on VR-50, the average number of responses required to receive reinforcement is 50. Once Sniffy was performing reliably on the VR-50 schedule, extinction was carried out, during which Sniffy received no reinforcement for bar-pressing.
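To make the idea of "an average of 50 responses" concrete, here is a minimal sketch of how a variable ratio schedule could be simulated. The function name `vr_requirements` and the uniform sampling rule are assumptions made for illustration; the way the Sniffy program actually draws its ratios is not described in this report.

```python
import random


def vr_requirements(mean_ratio, n_reinforcers, seed=0):
    """Draw the number of responses required before each reinforcer.

    Assumption: requirements are uniform on 1..(2 * mean_ratio - 1), a range
    whose mean is exactly mean_ratio. This is an illustrative rule only, not
    the sampling rule used by the Sniffy program.
    """
    rng = random.Random(seed)
    return [rng.randint(1, 2 * mean_ratio - 1) for _ in range(n_reinforcers)]


# Simulate many reinforcers on VR-50 and check the long-run average.
requirements = vr_requirements(50, 10_000)
average = sum(requirements) / len(requirements)
print(f"smallest: {min(requirements)}, largest: {max(requirements)}, "
      f"average: {average:.1f}")
```

Any single reinforcer may require only a few presses or nearly a hundred, but over many reinforcers the average settles close to 50.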

Following shaping, Sniffy was moved up to a VR-5 schedule of reinforcement. The ratio was then stretched through VR-10, VR-20, VR-35, and finally VR-50. Sniffy was isolated during each schedule, which let the program run quickly while he learned the behavior on his own. Each schedule was considered learned when the bar-sound association and action strength graphs reached their maximum. Once the VR-50 schedule was learned, Sniffy was exposed to extinction until the graphs representing bar-sound's association with reinforcement and action strength fell to zero.
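The size of each stretching step can be checked with a little arithmetic. The sketch below uses only the schedule values listed above; the variable names are my own.

```python
# Schedule values taken from the procedure: VR-5 up to VR-50.
schedule = [5, 10, 20, 35, 50]

# Pair each schedule with the next one and compute the multiplier between them.
steps = [(old, new, new / old) for old, new in zip(schedule, schedule[1:])]

for old, new, factor in steps:
    print(f"VR-{old} -> VR-{new}: requirement multiplied by {factor:.2f}")

largest_factor = max(factor for _, _, factor in steps)
```

No step more than doubles the average requirement, and the later steps (20 to 35, then 35 to 50) are proportionally smaller than the early ones.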

Sniffy successfully adapted to each VR schedule and reached VR-50 without any issues. The entire process took around 30 minutes in real time, although Sniffy was isolated so that the program ran at an accelerated rate. Each schedule was continued until the program produced a cumulative record indicative of the appropriate VR schedule. Extinction was successful, and Sniffy no longer associated bar-pressing and the bar sound with reinforcement. The action strength of Sniffy's response also fell to zero, indicating that the response was unlikely to occur in the future without additional shaping.

Training Sniffy on the VR schedules was relatively simple because I was able to isolate the experiment and let Sniffy learn on his own. I did not return to the program until the bar-sound and action strength graphs reached their maximum, and I ensured that Sniffy was performing the bar-pressing behavior reliably before stretching the ratio. Sniffy was less likely to explore the box during the variable ratio schedules. I assume this is because Sniffy understood that bar-pressing would yield reinforcement but was not aware of how many bar-presses were required, so he was likely to remain at the bar. On average, the acquisition of a new VR schedule took around 30 minutes in virtual time. Sniffy responded slowly at the beginning of each schedule, but his responses became more consistent as he learned the schedule.

Cumulative record of VR-50 schedule indicating the schedule had been learned by the program.
Extinction of Sniffy was also a very simple task. He pressed the bar repeatedly at the beginning of extinction, but his response rate dropped sharply near the end. Extinction was considered complete when Sniffy was no longer performing the instrumental response.

Cumulative record of extinction. Note that Sniffy reaches a response rate of zero.
Figure 1. Graph of bar-sound and action strength for VR-50 (left) vs. extinction (right).
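For readers unfamiliar with cumulative records, the sketch below shows how one is built from response times and why a flat line at the end means a response rate of zero. The response times are made-up illustrative values, not data exported from Sniffy, and the function names are hypothetical.

```python
def cumulative_record(response_times, session_length, bin_width):
    """Return the cumulative number of responses at the end of each time bin."""
    n_bins = int(session_length // bin_width)
    counts = [0] * n_bins
    for t in response_times:
        index = int(t // bin_width)
        if 0 <= index < n_bins:
            counts[index] += 1
    record, total = [], 0
    for count in counts:
        total += count
        record.append(total)
    return record


def final_rate(record, bin_width, last_bins=3):
    """Responses per unit time over the last few bins (slope of the record)."""
    window = record[-(last_bins + 1):]
    elapsed = (len(window) - 1) * bin_width
    if elapsed == 0:
        return 0.0
    return (window[-1] - window[0]) / elapsed


# Hypothetical bar-press times (in minutes) during a 20-minute extinction
# session: frequent at first, then increasingly sparse. Illustrative only.
press_times = [0.5, 1, 1.5, 2, 2.5, 3, 4, 5, 7, 10]
record = cumulative_record(press_times, session_length=20, bin_width=2)
rate_at_end = final_rate(record, bin_width=2)
print(record)
print(rate_at_end)
```

The record climbs steeply while presses are frequent and then flattens. Once it is flat, the slope over the final bins is zero, which is what a response rate of zero looks like on the cumulative record.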

