Lecture 22
REINFORCEMENT
Reinforcement Contingencies
A reinforcement contingency is a consistent relationship between a response and the
changes in the environment that it produces
For example, consider an experiment in which a pigeon's pecking a disk (the response) is
generally followed by the presentation of grain (the corresponding change in the
environment)
This consistent relationship, or reinforcement contingency, will usually be
accompanied by an increase in the rate of pecking
For delivery of grain to increase the probability of pecking alone, it must be
contingent only on the pecking response: the delivery must occur regularly after that
response but not after other responses, such as turning or bowing
Based on Skinner’s work, modern behavior analysts seek to understand behavior in terms of
reinforcement contingencies. These are applied in
o Behaviour modification therapies
o Training
o Management of behavior
A reinforcer is any stimulus that, when made contingent on a behavior, increases the
probability of that behavior over time.
Reinforcement is the delivery of a reinforcer following a response.
Reinforcers are always defined empirically, in terms of their effects on changing the
probability of a response.
There are individual differences in what serves as a reinforcer
Positive and Negative Reinforcers
Stimuli can be positive, neutral, or negative
Positive = appetitive (we want them)
Negative = aversive (we want to avoid them)
A behavior followed by the delivery of an appetitive stimulus is positive
reinforcement
A response followed by the removal of an aversive stimulus is negative reinforcement
What is reinforcing for a child?
And for young persons?
Properties of Reinforcement
Your pet rat will turn circles if a consequence of circle turning is the delivery of
desirable food
Humans will tell jokes if a consequence of their joke telling is a type of laughter they
find pleasurable
A child will stop crying if she is picked up or given candy
Each of these consequences of behaviour acts as a reinforcer
Consequences are learned quickly and serve to strengthen the behaviour or response
But not every reinforcer is beneficial; some come with a cost
Sometimes a child needs comfort
But if we console a child each time she throws a tantrum or is disruptive, consoling
becomes a reinforcer for that behaviour
The cost: the child learns that crying is the only way to get attention or comfort
Remember: “when a behavior is followed by the removal of an aversive stimulus, the event is
called negative reinforcement”
A child is disruptive in class, gets attention
Does not sit still, is given many distracters
A wife panics and gets a lot of attention
A mother gets sick and gets her way
A behaviour we want to stop is being reinforced
Two types of learning circumstances where negative reinforcement applies
In escape conditioning, animals learn that a response will allow them to escape
from an aversive stimulus
You learn to use an umbrella to escape the aversive stimulus of getting wet
In avoidance conditioning, animals learn responses that allow them to avoid aversive
stimuli before they begin
For example, a car buzzer sounds if you do not buckle your seat belt; you learn to buckle up before the buzzer starts
Both positive reinforcement and negative reinforcement increase the probability of the
response that precedes them
Positive reinforcement increases response probability by the presentation of an appetitive
stimulus following a response
Negative reinforcement does the same in reverse, through the removal, reduction, or
prevention of an aversive stimulus following a response
Reinforcers are the power brokers of operant conditioning: they change or maintain
behavior. Reinforcers
o have a number of interesting and complex properties
o can be learned through experience rather than be biologically determined
o can be activities rather than objects
In some situations, even ordinarily powerful reinforcers may not be enough to change a
dominant behavior pattern (in this case, we would say that the consequences were not
actually reinforcers)
Primary and Conditioned Reinforcers
There is a handful of primary reinforcers, such as food and water, whose reinforcing
properties are biologically determined
Over time, neutral stimuli become associated with primary reinforcers and come to
function as conditioned reinforcers for operant responses
Conditioned (secondary) reinforcers: money, cheques, valuables
a great deal of human behavior is influenced less by biologically significant primary
reinforcers than by a wide variety of conditioned reinforcers
Social reinforcers: grades, smiles of approval, gold stars, and various kinds of status
symbols
Virtually any stimulus can become a conditioned reinforcer by being paired with a
primary reinforcer
In one experiment, simple tokens were used with animal learners.
Operant Extinction
If reinforcement is withheld, operant extinction occurs
If a behavior no longer produces predictable consequences, it returns to the level it
was at before operant conditioning; it is extinguished
Example: you put coins in a machine to get a cola, but the machine does not deliver; after
trying a few times, you stop putting in coins, and the response is extinguished
Spontaneous Recovery
You may come back later, or kick the machine and try again after some time
A pigeon no longer receives grain when it pecks at the green light in the Skinner box, so it
stops pecking
The next time the pigeon is put back in the apparatus with the green light on, it
would likely spontaneously peck again
Spontaneous recovery occurs, but the response rate is lower and responding stops very quickly
Punishers
Another technique for decreasing the probability of a response is punishment
A punisher is any stimulus that, when made contingent on a response, decreases the
probability of that response over time
Punishment is the delivery of a punisher following a response
When a behavior is followed by the delivery of an aversive stimulus, the event is called
positive punishment (positive because something is added to the situation); you touch a
hot stove, feel pain, and are not likely to touch the stove next time
When a behavior is followed by the removal of an appetitive stimulus, the event is
referred to as negative punishment (negative because something is subtracted from the
situation); a parent withdraws a child’s allowance when she hits her baby brother.
A way to remember: positive means something is added; negative means something is taken away
A time out is “the contingent withholding of the opportunity to earn reinforcement . . .
from rewarding stimuli including attention from the parent, as a consequence of some form
of misbehavior” (Morawska & Sanders, 2011, p. 2)
Be sure the time out is punishing, not enjoyable; children may learn it is a way to escape a
task they don’t want to do
Make the request again after the time out ends
Time out is effective for children aged 3 to 7 years
Schedules of Reinforcement
A Story
B. F. Skinner: It seems that one weekend he was secluded in his laboratory with not
enough of a food-reward supply for his hardworking rats. He economized by giving the
rats pellets only after a certain interval of time—no matter how many times they
pressed in between, they couldn’t get any more pellets. Even so, the rats responded as much
with this partial reinforcement schedule as they had with continuous reinforcement.
o Fixed-Ratio Schedules: In fixed-ratio (FR) schedules, the reinforcer comes after the
organism has emitted a fixed number of responses
o Many salespeople are on FR schedules: They must sell a certain number of units
before they can get paid
o In a variable-ratio (VR) schedule, the average number of responses between
reinforcers is predetermined
o A VR-10 schedule means that, on average, reinforcement follows every 10th response,
but it might come after only 1 response or after 20 responses. Variable-ratio schedules
generate the highest rate of responding and the greatest resistance to extinction,
especially when the VR value is large
o On a fixed-interval (FI) schedule, a reinforcer is delivered for the first response
made after a fixed period of time
o For variable-interval (VI) schedules, the average interval is predetermined; on a VI-20
schedule, reinforcers are delivered at an average rate of one every 20 seconds. This
schedule generates a moderate but very stable response rate
o Extinction under VI schedules is gradual and much slower than under fixed-interval
schedules
o A pigeon pecked 18,000 times during the first 4 hours after reinforcement stopped and
required 168 hours before its responding extinguished completely (Ferster & Skinner,
1957)
o If a professor gives occasional, irregularly scheduled quizzes, you study your notes
all the time (a variable-interval schedule)
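The four schedule types differ only in the rule that decides when a response earns a reinforcer. A minimal Python sketch of those rules (all function and parameter names here are illustrative, not from the lecture; VI is noted as a variant of FI):

```python
import random

def fixed_ratio(n):
    """FR-n: deliver a reinforcer after every n-th response."""
    count = 0
    def respond():
        nonlocal count
        count += 1
        if count >= n:
            count = 0
            return True   # reinforcer delivered
        return False
    return respond

def variable_ratio(n):
    """VR-n: each response has a 1/n chance of reinforcement,
    so on average reinforcement follows every n-th response."""
    def respond():
        return random.random() < 1.0 / n
    return respond

def fixed_interval(seconds, clock):
    """FI: reinforce the first response made after `seconds` have
    elapsed since the last reinforcer (a VI schedule is the same
    idea with a variable, on-average interval)."""
    last = clock()
    def respond():
        nonlocal last
        now = clock()
        if now - last >= seconds:
            last = now
            return True
        return False
    return respond
```

On an FR-5 schedule, for instance, only every fifth call to `respond()` returns `True`; on VR-10 the reinforced responses are unpredictable but average one in ten, which is what leaves the gambler guessing.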
Schedules of Reinforcement: Applications and Interventions
B. F. Skinner agreed with Thorndike’s view that environmental consequences exert a
powerful effect on behavior. Skinner outlined a program of research to discover, by
systematic variation of stimulus conditions, the ways that various environmental conditions
affect the likelihood that a given response will occur.
A natural datum in a science of behavior is the probability that a given bit of behavior will
occur at a given time. An experimental analysis deals with that probability in terms of
frequency or rate of responding.
The task of an experimental analysis is to discover all the variables of which probability of
response is a function. (Skinner, 1966, pp. 213–214)
Being an experimenter and empiricist, Skinner designed operant procedures to
manipulate the consequences of an animal’s behaviour and observe how these affected
subsequent behaviour
An operant is any behavior that is emitted by an organism and can be characterized
in terms of the observable effects it has on the environment; operant means affecting the
environment, or operating on it
o Continuous reinforcement: each response is rewarded; the pigeon gets food after each
peck
o Partial reinforcement: the reward (food) is delivered on some occasions and not on
others
Partial reinforcement can follow a ratio schedule, after a certain number of responses, or
an interval schedule, after the first response following a specified interval of time. In
each case, there can be either a constant, or fixed, pattern of reinforcement or an
irregular, or variable, pattern.
o Fixed Interval, Fixed ratio
o Variable Interval, Variable Ratio
o Several interesting aspects of these various schedules have been observed
The rats whose lever pressing had been partially reinforced continued to respond
longer and more vigorously than did the rats that had gotten payoffs after every
response
Different schedules of reinforcement appear in everyday life: when you raise your hand
in class, the teacher sometimes calls on you and sometimes does not
Some slot machine players continue to put coins in the one-armed bandits even
though the reinforcers are delivered only rarely
FI schedule when you reheat a slice of pizza. Suppose you set the oven’s timer for 2
minutes. You probably won’t check very much for the first 90 seconds, but in the last
30 seconds, you’ll peek in more often
Gambling would seem to be under the control of VR schedules. The response of
dropping coins in slot machines is maintained at a high, steady level by the
payoff, which is delivered only after an unknown, variable number of coins has
been deposited. VR schedules leave you guessing when the reward will come—you
gamble that it will be after the next response, not many responses later
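The greater resistance to extinction after partial reinforcement (the rats above, the slot-machine players) can be illustrated with a toy simulation based on the discrimination idea: extinction is harder to notice when long unreinforced runs were already normal during training. Everything here, including the stopping rule and all names, is an illustrative assumption rather than the lecture's model:

```python
import random

def longest_dry_run(p_reinforce, n_responses, seed=42):
    """Train on a random-ratio schedule where each response is
    reinforced with probability p_reinforce; return the longest
    run of unreinforced responses experienced during training."""
    rng = random.Random(seed)
    longest = run = 0
    for _ in range(n_responses):
        if rng.random() < p_reinforce:
            run = 0          # reinforcer delivered, run resets
        else:
            run += 1
            longest = max(longest, run)
    return longest

def responses_before_quitting(longest_run, margin=3):
    """Toy stopping rule: during extinction (no reinforcers at all),
    keep responding until the current unreinforced run clearly
    exceeds anything seen in training."""
    return (longest_run + 1) * margin

crf = longest_dry_run(1.0, 500)   # continuous reinforcement: no dry runs
vr = longest_dry_run(0.1, 500)    # VR-10-like partial reinforcement

# The partially reinforced learner persists longer once reward stops:
# responses_before_quitting(vr) > responses_before_quitting(crf)
```

Under continuous reinforcement the longest dry run is zero, so the very first unreinforced responses already look abnormal and responding stops quickly; under a VR-10-like schedule, long dry runs were routine, so extinction takes far longer to detect.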
A thought: waiting for good deeds to be rewarded in the hereafter. Is that reinforcement,
delayed but still effective?