The Rate of Reinforcement - The simplest and most applicable training tool you've never heard of

The 1973 oil crisis, brought on by the Arab-Israeli war of that year, led at least indirectly to the development of a series of important concepts within Behavioural Ecology. These concepts apply directly to several aspects of dog training and behaviour modification, yet most trainers don’t know them. They are the subject of this blog article.

At first, this whole thing may seem a bit weird and far-fetched, but I have seen several independent behaviour profs and whatnot making the same connections as I will in this article. So these ideas aren’t mine, but I do consider their direct application to dog training and behaviour modification to be my original work. That includes 3-5 new treat-handling, -placement, and -delivery techniques that I think training geeks like me would find useful and/or interesting, both for bolstering the degrees of freedom available within existing techniques and for possibly offering novel approaches during assessment, conditioning, and proofing.

The 1973 oil crisis began in October when OPEC stopped at least 75% of crude oil shipments to nations who supported Israel during or after the Yom Kippur War, which was fought that same month. This led directly to a sudden and severe drop in the supply of oil relative to “business-as-usual” demand. Many gas stations were without gas for weeks at a time, and many governments were forced into petroleum rationing “for essential services/uses only” to avoid price-gouging, huge lineups, rushes, hoarding, and escalating frustration and violence among those waiting for a fill-up. These situations may have predisposed the academics of the time to think and theorize about the best distribution of limited resources and the implications for the behaviour of animals exposed to these kinds of stresses, which just so happen to occur regularly in the daily challenges most wildlife faces.

Optimal foraging theory (OFT) is a field within Behavioural Ecology that describes how natural selection, acting over time via mutation and “survival of the fittest”, is likely to favour animals who waste as little energy as possible, or who make the smartest decisions about when, and for how long, to hide from predators, search for food, eat, store food for later, socialize, or defend the store. OFT is both a theoretical and an experimental field, each with a rich research canon. Eric Charnov’s Marginal Value Theorem and John Krebs’ Foraging Theory and Behavioural Ecology are all worth looking up. These, plus Central Place Foraging, all fall under OFT, which assumes that evolution selects for behaviour that tends to maximize the rate of prey capture over generations.

The theory goes that animals who are able to extract the most survival and reproduction value from each hour of each day, while minimizing predation risk, will be the ones most likely to survive and reproduce in the long run. Naturally, there is also a good deal of variability in individual behavioural and cognitive biases in most animal populations. In general, though, when we go out to measure animal behaviour in the wild, we should expect to see most animals doing a “pretty good job” of balancing risks against rewards: taking enough chances to make sure they eat sufficiently, but not over-feeding to the point that it over-exposes them to predation risk or reduces their ability to escape if a predator appears.

My master’s research was on Eastern chipmunks, Tamias striatus. My research advisor, Dr. Giraldeau, found good evidence that chipmunks base their foraging decisions on the rate of prey capture. My work showed that the presence of other chipmunks foraging

All of these concepts are based on a rather surprising assumption: that food resources are not distributed uniformly in the environment, but are present in patches of high resource density. This is, in fact, what I saw during my chipmunk behaviour research. The Canadian forest where I did that research has many tree species, perhaps 25, three of which (maple, oak, beech) are considered "masting tree species". That means they produce seeds pretty much every year, but every few years each tree has a "mast" year in which it produces several times the normal number of seeds. The seeds develop over the summer and drop in the autumn, just as the chipmunks are trying to cache seeds in their burrow/larder to survive on during the winter.

It is to each chipmunk's advantage to collect seeds under a tree that has masted in that particular year until the number of seeds under that tree (i.e., in that tree's patch) declines to the number of seeds the average tree produces in a "non-mast" year. At that point it is to the chipmunk's advantage to find another tree that has masted this year and collect seeds there. This attracts chipmunks from neighbouring burrow/larders to forage together under the same trees, encouraging them to socialize together, usually in an "agonistic" manner. It turns out that chipmunks continue to capture seeds from a patch for at least two or three minutes after the rate of prey capture has fallen to or below the average for a tree that has not masted.
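The patch-leaving rule described above can be sketched in a few lines of Python. This is only an illustration, not a model fitted to my chipmunk data. It assumes the capture rate decays exponentially as a patch is depleted, and the function names (capture_rate, minutes_until_leaving) and every number in it are hypothetical.

```python
import math

def capture_rate(initial_rate, depletion, minutes):
    """Seeds captured per minute after foraging `minutes` in one patch.

    Assumption (hypothetical): the rate decays exponentially as the
    patch is depleted.
    """
    return initial_rate * math.exp(-depletion * minutes)

def minutes_until_leaving(initial_rate, background_rate, depletion):
    """Time at which the patch's rate has fallen to the background rate."""
    if initial_rate <= background_rate:
        return 0.0  # patch is no better than average, so move on at once
    return math.log(initial_rate / background_rate) / depletion

# Hypothetical numbers: a masting tree starts at 12 seeds/min, an average
# non-mast tree offers 3 seeds/min, and the depletion constant is 0.05/min.
leave_at = minutes_until_leaving(initial_rate=12.0, background_rate=3.0, depletion=0.05)
print(f"Rate reaches the background level after {leave_at:.1f} minutes")
```

With these made-up numbers the rate reaches the background level after about 27.7 minutes. The chipmunks I watched stayed two or three minutes past that point, so a more realistic sketch would add a short delay before leaving.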

In fact, these types of behaviours are what many researchers have found in a wide variety of species; many different types of animals including fish, octopus, cuttlefish, eels, arthropods, reptiles, birds, and mammals can all detect about a 10% change in the “rate of prey capture” within a few minutes, wherever and whenever it occurs. It appears all of these diverse animal groups have developed neural circuits that behave as if they perform the calculus needed to track the marginal rate of prey capture (in effect, the slope of a logarithmic depletion curve) from the time that passes between prey captures. This suggests dogs can detect a 10% change in “the rate of reinforcement” between two different vocal cues, hand signals, dog handlers, or training situations. I don’t have a reference ready to support this broad claim because I have not had recent access to the scientific literature, but it’s something I believe has been firmly established within the behaviour research canon.

Before we talk about the particulars of what those “rates of reinforcement” are, and are good for, I want to clarify what I mean by “a 10% change in rate of reinforcement”. There are all kinds of ideas about what reinforcement is and how often you should do it, so I want to clarify some of the things I mean, and some of the things I DO NOT mean by this usage of “reinforcement”.

I’ve heard Cesar Millan say something like, “[Reinforcement is not a thing for ‘the alpha dog’. You will never see ‘the alpha dog’ go up to another dog and say, ‘wow, thank you for walking ten miles’]”. And I’ve heard “The Wolf Man” say he reinforces his dog every day at supper time, and if the dog ‘has been bad’ he doesn’t eat. Neither of these opinions makes much sense to me. This isn’t what reinforcement looks like in my book.

I also see a lot of folks who work on getting sits and downs and other tricks twice a minute for twenty minutes, then make and serve tea for 15 minutes, and then take a biscuit to the dog and tell him he was a good boy for the sits and downs he did half an hour ago. This isn’t reinforcement either. Or rather, the biscuit most likely reinforces being on the bed rather than the earlier training. I’m not talking about this either.

I’m talking about an operant-conditioning scenario where the dog is familiar with the operating framework. For example, a dog that knows that if it comes by you and makes a nice sit, you are much more likely to respond with attention than you would if the dog had only approached and stayed standing. The dog has to believe it has the power to make you reinforce it, as long as it behaves according to certain expectations. One of the best ways to go about this is to demonstrate to the dog that it can trigger you to reinforce it whenever it wants, at least temporarily. I use the principle of interruptibility, as I described in my blog article about the Premack Principle, to communicate my interest in reinforcing any particular behaviour.

The way I demonstrate to a dog that it can cue me to reinforce it is simple:

I make a criterion in my head, an idea of what I’m looking for. Let’s say I want to emphasize a sit… I would make sitting the criterion. If the dog sits, I will reinforce the dog for doing so. Further, as soon as I detect that the dog sat, I will interrupt whatever else I am doing in that moment to tell the dog they’ve done well, and promptly deliver them a bit of kibble. You really do have to keep a keen eye out to make sure you reinforce the sit reliably enough to make a lasting impression. There’s one trick in particular I love to use in connection with this that makes it even more effective as a conditioning tool, but you’ll have to contact me to learn about it.

At the beginning of conditioning, your goal rate of reinforcement for the sit in this case is 100%: you want to “capture” as many sits with a reinforcement as you can. A finished sit, by contrast, is usually reinforced with food about 5% of the time. This is one of the things I mean by rate of reinforcement: a measurement of how many times you reinforce the behaviour out of the total number of times the dog actually does the behaviour (measured as a percentage). I will refer to this kind of “rate of reinforcement” as the “operant rate of reinforcement”.
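For anyone who likes to keep a tally, the operant rate of reinforcement is just a percentage. Here is a minimal sketch; the function name operant_rate and the session counts are made up for illustration.

```python
def operant_rate(reinforced, performed):
    """Percentage of performed behaviours that were reinforced."""
    if performed == 0:
        raise ValueError("the dog has not performed the behaviour yet")
    return 100.0 * reinforced / performed

# Hypothetical session tallies (not real data).
early = operant_rate(reinforced=19, performed=20)     # early conditioning
finished = operant_rate(reinforced=1, performed=20)   # finished behaviour
print(f"early: {early:.0f}%, finished: {finished:.0f}%")
```

In this made-up example the early session captures 19 of 20 sits (95%), just short of the 100% goal, and the finished sit is paid once in 20 (5%).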

There’s something else I mean by “rate of reinforcement”... Imagine there are two trainers standing in the park, 25m apart. The trainer in the East can give a dog a treat two times each minute, or every thirty seconds. The trainer in the West can give a dog a treat twenty times each minute, or every three seconds. If there are twenty-two dogs in the park and they distribute themselves in proportion to the treats on offer, two should be in the East and twenty should be in the West (in Behavioural Ecology, this is known as the “Ideal free distribution”). This is the other thing I mean by rate of reinforcement: a measurement of how many reinforcements are provided every second or every minute (measured in treats per second or treats per minute). I will refer to this kind of “rate of reinforcement” as the “immediate rate of reinforcement”.
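The two-trainer example can be checked with a short sketch. It assumes the simplest version of the Ideal free distribution, where dogs split up in proportion to each trainer's treat rate; the function name ideal_free_distribution is my own label for illustration.

```python
def ideal_free_distribution(total_dogs, treat_rates):
    """Split dogs across trainers in proportion to each trainer's treat rate."""
    total_rate = sum(treat_rates.values())
    return {name: total_dogs * rate / total_rate
            for name, rate in treat_rates.items()}

rates = {"East": 2, "West": 20}  # treats per minute, from the example above
dogs = ideal_free_distribution(22, rates)
print(dogs)  # {'East': 2.0, 'West': 20.0}

# At this split every dog earns the same amount, so none gains by moving.
for name, count in dogs.items():
    print(name, rates[name] / count, "treats per minute per dog")
```

The point of the second loop is that each dog ends up with one treat per minute no matter which trainer it chose, which is what makes the split stable.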

I believe dogs are capable of detecting a 10% difference in both the operant rate of reinforcement, and in the immediate rate of reinforcement -- and I am working on an experiment to provide evidence for this claim.

I am currently able to reinforce a dog as fast as 75 kibbles per minute, or 1.25 kibbles per second. This is effectively the maximum rate of reinforcement available during a typical training session with me. I usually don’t reinforce at this rate for longer than 30 seconds at a time (so at maximum, a quarter cup of kibble, assuming a 70lb dog). The minimum rate of reinforcement I would normally offer during a typical training session is one kibble per twenty minutes (usually offered during advanced confinement training). That represents a fifteen-hundred-fold increase in the immediate rate of reinforcement between the minimum and maximum rate I offer during a typical session (I define “a fold” in this case as a 100% increase). I feel this is far beyond what a typical trainer might offer. If I’m correct and dogs are capable of detecting a 10% change in the immediate rate of reinforcement, then counting in steps of 10% of the minimum rate, I have roughly 15,000 different rates to offer any particular dog. If I pull out, say, twenty different rates in a one-hour session, I’m able to communicate a lot more information to the dog about its performance on each individual repetition of the exercise than in a session where I only offer two different rates of reinforcement.
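The arithmetic behind these numbers is easy to check. The sketch below also compares two ways of counting “10% steps”, and the second reading is my own assumption rather than an established result: either every step is 10% of the minimum rate (equal-sized steps), or every step has to be 10% bigger than the rate before it (proportional steps).

```python
import math

max_rate = 75.0        # kibbles per minute (fastest delivery)
min_rate = 1.0 / 20.0  # kibbles per minute (one kibble per twenty minutes)

per_second = max_rate / 60.0
ratio = max_rate / min_rate

# Reading 1: each step is 10% of the minimum rate (equal-sized steps).
absolute_steps = (max_rate - min_rate) / (0.10 * min_rate)

# Reading 2 (assumption): each step is 10% larger than the previous rate.
relative_steps = math.log(ratio) / math.log(1.10)

print(f"{per_second:.2f} kibbles per second")
print(f"{ratio:.0f}-fold range")
print(f"{absolute_steps:.0f} equal-sized steps")
print(f"{relative_steps:.1f} proportional steps")
```

Under the first reading there are roughly 15,000 distinguishable rates. Under the second there are closer to 77, which is still far more than the two or three rates a typical session offers.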

Reinforcement Type

Offering different types of reinforcement simultaneously, in different numbers, in different situations, is another critical component of teaching lessons that last. In this case I’m referring not only to food, but also to reinforcement by voice, eye contact, and touch. Each dog has its own preferences among the times, places, activities, objects, people and styles of reinforcement. Knowing your dog’s preferences intimately only adds to the different rates and types of reinforcement you can use simultaneously during any given session to communicate with your dog(s).

The more different types of reinforcement you use at the same time, the more reinforcing the event will be. The more closely the simultaneous reinforcements match the preferences of the dog, the more reinforcing the event will be.

Using generalization and proofing techniques in conjunction with the rate of reinforcement and reinforcement type is one of the most effective ways of teaching a dog a framework of routines and expectations in a way that helps them understand their roles and responsibilities more and more over time.

See my blog article on the Premack Principle, particularly the “Techniques” section, as many of those techniques are also associated with the rate of reinforcement and its conditioning strategies. Contact Us with any questions or comments.

Sources:

https://en.wikipedia.org/wiki/1973_oil_crisis

https://en.wikipedia.org/wiki/Yom_Kippur_War

https://en.wikipedia.org/wiki/Behavioral_ecology

https://en.wikipedia.org/wiki/Optimal_foraging_theory

https://en.wikipedia.org/wiki/Marginal_value_theorem

https://en.wikipedia.org/wiki/Central_place_foraging

https://en.wikipedia.org/wiki/Eric_Charnov

https://en.wikipedia.org/wiki/John_Krebs,_Baron_Krebs

https://en.wikipedia.org/wiki/Ideal_free_distribution

https://en.wikipedia.org/wiki/Derivative