In college I read about the advantages of conjoint analysis over the more intuitive method of using a Likert scale, the familiar rate-this-thing-from-one-to-five-or-whatever scale. It turns out that people get bored with Likert scales and end up reporting everything as either an extreme or the median. It has been shown that you can get a better read on people by asking them to choose between two items at a time. In this post I’d like to share the beginning of a framework for modelling these sorts of situations. Specifically, I’d like to model agents with known, built-in behaviors and see whether those behaviors are apparent through conjoint analysis, i.e., I’d like to test conjoint methods under different controlled circumstances.
We will begin, as we always do, by importing stuff.
import random
import numpy as np
import pandas
import scipy.stats
import matplotlib.pyplot as plt
Next, we’ll build an agent. In this example we’re considering an agent’s preferences over different magazine titles. Given a list of titles, the agent randomly picks one or more of them. We then randomly assign utilities to those titles: first we pick a subinterval of [5,25] dollars, representing the agent’s price range for any one title, then we draw utilities uniformly from that subinterval. Thus some agents will have narrow price ranges, while others will have broad ones. Finally, we map each title to its utility, i.e. the price beyond which the agent would not pay for that title.
class Agent:
def __init__( self, titles ):
# pick some number, n, of titles
self.n = random.randint(1,len(titles))
# pick n specific titles
self.titles = random.sample( titles, self.n )
# pick a subset of [5,25]
u = np.sort( scipy.stats.uniform(5,20).rvs(2) )
# draw uniformly from the subset of [5,25]
u = scipy.stats.uniform(u[0],u[1]-u[0]).rvs( self.n )
# build a dictionary of title -> utility
self.utility = dict( zip( self.titles, u ) )
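Just to make the mechanics concrete, here’s a quick sketch of what instantiating an Agent might look like. The demo_titles list and the printed values are purely illustrative; everything is drawn at random, so your output will differ.

demo_titles = [ "Men's Health", "Field and Stream", "The New Yorker" ]
a = Agent( demo_titles )
print( a.titles )   # e.g. ['The New Yorker', 'Field and Stream']
print( a.utility )  # e.g. {'The New Yorker': 11.3, 'Field and Stream': 14.8}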
Next, we’ll subclass Agent into Hunter and Bro. Both tend to value Field and Stream and Men’s Health, but Hunters usually favor Field and Stream, while Bros typically favor Men’s Health.
class Hunter( Agent ):
'''
Values both Men's Health and Field and Stream,
but more likely to favor Field and Stream
'''
def __init__( self, titles ):
Agent.__init__( self, titles )
# behavior rule
if "Field and Stream" in self.titles:
self.utility["Field and Stream"] *= 1.25
else:
u = scipy.stats.uniform(5,20).rvs()
self.utility["Field and Stream"] = u
if "Men's Health" in self.titles:
self.utility["Men's Health"] *= 1.1
else:
u = scipy.stats.uniform(5,20).rvs()
self.utility["Field and Stream"] = u
class Bro( Agent ):
'''
Values both Men's Health and Field and Stream,
but more likely to favor Men's Health
'''
def __init__( self, titles ):
Agent.__init__( self, titles )
# behavior rule
if "Men's Health" in self.titles:
self.utility["Men's Health"] *= 1.25
else:
u = scipy.stats.uniform(5,20).rvs()
self.utility["Men's Health"] = u
if "Field and Stream" in self.titles:
self.utility["Field and Stream"] *= 1.1
else:
u = scipy.stats.uniform(5,20).rvs()
self.utility["Field and Stream"] = u
Next we’ll define the titles, create some generic Agents, some Hunters, and some Bros, and mix everyone together into one population.
titles = [ "Men's Health", "Field and Stream", "Women's Digest", "Cosmopolitan", "The New Yorker" ] pop = [ Agent( titles ) for i in range(100) ] hunters = [ Hunter( titles ) for i in range(100) ] bros = [ Bro( titles ) for i in range(100) ] pop = pop + hunters + bros random.shuffle( pop )
Next we’ll write a function to assess the ground truth, or gold standard, of the population’s preferences; it returns a pandas DataFrame that we can query later.
def gold_standard( population, titles ):
N, p = len(population), len(titles)
z = np.zeros(( N, p ))
for i in range( N ):
for j in range( p ):
if titles[j] in population[i].titles:
z[i,j] = population[i].utility[ titles[j] ]
z = pandas.DataFrame( z, columns=titles )
return z
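As an example of the sort of query we might run against it (nothing here is used later), the frame makes it easy to see what fraction of agents hold each title and the average utility among those who do; a zero entry means the agent never picked that title.

gs = gold_standard( pop, titles )
print( (gs > 0).mean() )                 # fraction of agents holding each title
print( gs.replace( 0, np.nan ).mean() )  # mean utility among holders only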
Next we’ll define a questionnaire that we pose to the members of our population, asking whether they prefer one title or another. Internally this queries the utilities we assigned to the different titles in the Agent class.
def questionnaire( a, t0, t1 ):
if( t0 in a.titles )and( t1 in a.titles ):
t0_value = a.utility[ t0 ]
t1_value = a.utility[ t1 ]
if t0_value > t1_value:
return t0
elif t0_value < t1_value:
return t1
else:
return "Ambivalent"
elif( t0 in a.titles ):
return t0
elif( t1 in a.titles ):
return t1
else:
return "No Interest"
Next we’ll define a “study” that surveys a random sample of the population with the questionnaire and reports the results as fractions of the sample.
def study( population, size, t0, t1 ):
    # a size below 1 is treated as a fraction of the population
    if size < 1:
        size = int( round( len( population ) * size ) )
    sample = random.sample( population, size )
    ans = list( map( questionnaire, sample, [t0]*size, [t1]*size ) )
    uni = np.unique( ans )
    return { answer: ans.count(answer)/float(size) for answer in uni }
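A single study over a sample of 50 agents might come back looking something like the dictionary below; the numbers shown are made up purely for illustration.

print( study( pop, 50, "Men's Health", "Field and Stream" ) )
# e.g. {"Men's Health": 0.42, "Field and Stream": 0.36, "No Interest": 0.22}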
Next, we can repeatedly apply studies…
N = len( titles )
# four replicate studies, each over a random sample of 50 agents
m = np.zeros(( N, N, 4 ))
for k in range( 4 ):
    for i in range( N ):
        for j in range( i+1, N ):
            outcome = study( pop, 50, titles[i], titles[j] )
            # a missing key means nobody in the sample favored that title
            m[i,j,k] = outcome.get( titles[i], 0.0 )
            m[j,i,k] = outcome.get( titles[j], 0.0 )
And then plot some somewhat unhelpful results.
fig, axs = plt.subplots( 2, 2 )
axs[0,0].matshow( m[:,:,0], vmin=0, vmax=1 )
axs[0,1].matshow( m[:,:,1], vmin=0, vmax=1 )
axs[1,0].matshow( m[:,:,2], vmin=0, vmax=1 )
axs[1,1].matshow( m[:,:,3], vmin=0, vmax=1 )
plt.show()
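As a rough visual aid rather than any real analysis, we might also average the four replicates into a single pairwise-preference matrix and print it next to the (crude) gold-standard column means.

m_avg = m.mean( axis=2 )
print( pandas.DataFrame( m_avg, index=titles, columns=titles ) )
print( gold_standard( pop, titles ).mean() )  # crude comparison; zeros pull the means down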
In the future I’ll do some more work regarding conjoint analysis, but I feel like this is a decent start on building a simulation for testing.