In college I read about the advantages of conjoint analysis over the more intuitive method of using a Likert scale, the familiar rate-this-thing-from-one-to-five (or whatever) scale. It turns out that people get bored with Likert scales and end up reporting everything as either an extreme or the median. It has been shown that you can get a better reading on people by asking them about their preference between two items. In this post I'd like to share the beginning of a framework for modelling these sorts of situations. Specifically, I'd like to model agents with specific behaviors and see whether those behaviors are apparent through conjoint analysis, i.e., I'd like to test conjoint methods under different controlled circumstances.
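To make the pairwise idea concrete before we build anything, here's a minimal sketch (with made-up items and answers, not data from the simulation below) of turning forced-choice responses into a crude ranking by counting wins:

```python
from collections import Counter

# each tuple is one respondent's answer: (winner, loser)
answers = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "B"), ("C", "B")]

# count how often each item wins its pairwise matchups
wins = Counter(winner for winner, _ in answers)

# rank items by win count -- a crude stand-in for a full conjoint model
ranking = [item for item, _ in wins.most_common()]
print(ranking)  # → ['A', 'B', 'C']
```

Real conjoint analysis does much more than count wins, but this is the shape of the data we'll be generating.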
We will begin, as we always do, by importing stuff.
```python
import random

import numpy as np
import pandas
import scipy.stats
import matplotlib.pyplot as plt
```
Next, we'll build an agent. In this example we're considering an agent's preference over different magazine titles. Given a list of titles, the agent will randomly pick one or more of them. Next, we will randomly assign utilities to those titles. We'll first pick a subinterval of [5, 25] dollars, representing the agent's price range for any one title, then we'll draw utilities from that subinterval. Thus, some agents will have narrow price ranges, while others will have broad ones. Finally, we'll map each title to its utility, the price beyond which the agent would not pay for that title.
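The two-step sampling can be sketched with numpy alone (the class below uses `scipy.stats.uniform(5, 20)`, i.e. the uniform distribution on [5, 25], which behaves the same way):

```python
import numpy as np

rng = np.random.default_rng(0)

# step 1: pick a random subinterval [lo, hi] of [5, 25]
lo, hi = np.sort(rng.uniform(5, 25, size=2))

# step 2: draw the agent's per-title utilities from that subinterval
n_titles = 3
utilities = rng.uniform(lo, hi, size=n_titles)

print(lo, hi, utilities)
```

Every utility is guaranteed to land inside the agent's own price range, so the width of [lo, hi] controls how opinionated the agent can be.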
```python
class Agent:
    def __init__(self, titles):
        # pick some number, n, of titles
        self.n = random.randint(1, len(titles))
        # pick n specific titles
        self.titles = random.sample(titles, self.n)
        # pick a subinterval of [5, 25]
        u = np.sort(scipy.stats.uniform(5, 20).rvs(2))
        # draw uniformly from that subinterval of [5, 25]
        u = scipy.stats.uniform(u[0], u[1] - u[0]).rvs(self.n)
        # build a dictionary of title -> utility
        self.utility = dict(zip(self.titles, u))
```
Next, we'll subclass `Agent` into `Hunter` and `Bro`. Both `Hunter` and `Bro` tend to value Field and Stream and Men's Health, but `Hunter`s usually favor Field and Stream more, and `Bro`s typically favor Men's Health more.
```python
class Hunter(Agent):
    '''
    Values both Men's Health and Field and Stream,
    but more likely to favor Field and Stream.
    '''
    def __init__(self, titles):
        Agent.__init__(self, titles)
        # behavior rule: boost Field and Stream strongly, Men's Health mildly;
        # if a title wasn't picked, add it so the behavior rule takes effect
        if "Field and Stream" in self.titles:
            self.utility["Field and Stream"] *= 1.25
        else:
            self.titles.append("Field and Stream")
            self.utility["Field and Stream"] = scipy.stats.uniform(5, 20).rvs()
        if "Men's Health" in self.titles:
            self.utility["Men's Health"] *= 1.1
        else:
            self.titles.append("Men's Health")
            self.utility["Men's Health"] = scipy.stats.uniform(5, 20).rvs()


class Bro(Agent):
    '''
    Values both Men's Health and Field and Stream,
    but more likely to favor Men's Health.
    '''
    def __init__(self, titles):
        Agent.__init__(self, titles)
        # behavior rule: boost Men's Health strongly, Field and Stream mildly
        if "Men's Health" in self.titles:
            self.utility["Men's Health"] *= 1.25
        else:
            self.titles.append("Men's Health")
            self.utility["Men's Health"] = scipy.stats.uniform(5, 20).rvs()
        if "Field and Stream" in self.titles:
            self.utility["Field and Stream"] *= 1.1
        else:
            self.titles.append("Field and Stream")
            self.utility["Field and Stream"] = scipy.stats.uniform(5, 20).rvs()
```
Next we'll define the titles, create some generic `Agent`s, some `Hunter`s, and some `Bro`s, and mix everyone up in a population.
```python
titles = ["Men's Health", "Field and Stream", "Women's Digest",
          "Cosmopolitan", "The New Yorker"]

pop = [Agent(titles) for i in range(100)]
hunters = [Hunter(titles) for i in range(100)]
bros = [Bro(titles) for i in range(100)]

pop = pop + hunters + bros
random.shuffle(pop)
```
Next we'll write a function to compute the ground truth, or gold standard, of the population statistics. This will be a pandas `DataFrame` that we can query later.
```python
def gold_standard(population, titles):
    N, p = len(population), len(titles)
    # z[i, j] is agent i's utility for title j; zero means "doesn't read it"
    z = np.zeros((N, p))
    for i in range(N):
        for j in range(p):
            if titles[j] in population[i].titles:
                z[i, j] = population[i].utility[titles[j]]
    return pandas.DataFrame(z, columns=titles)
```
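Once we have the gold-standard frame we can query it, for instance for the mean utility among readers of each title. A small hand-made example (zeros mark non-readers, matching the convention above):

```python
import pandas

# a toy gold-standard frame: rows are agents, columns are titles
z = pandas.DataFrame(
    [[10.0, 0.0], [12.0, 8.0], [0.0, 6.0]],
    columns=["Field and Stream", "Men's Health"],
)

# mean utility among readers only (mask out the zeros first)
readers_mean = z.where(z > 0).mean()
print(readers_mean)
```

`where` replaces non-readers with NaN, which `mean` then skips, so the zeros don't drag the averages down.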
Next we'll define a questionnaire that we pose to the members of our population, asking them whether they prefer one title or another. This internally queries the utilities we assigned to different titles in the `Agent` class.
```python
def questionnaire(a, t0, t1):
    # if the agent reads both titles, report whichever has higher utility
    if (t0 in a.titles) and (t1 in a.titles):
        t0_value = a.utility[t0]
        t1_value = a.utility[t1]
        if t0_value > t1_value:
            return t0
        elif t0_value < t1_value:
            return t1
        else:
            return "Ambivalent"
    # otherwise report whichever title the agent reads, if either
    elif t0 in a.titles:
        return t0
    elif t1 in a.titles:
        return t1
    else:
        return "No Interest"
```
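As a quick check of the possible outcomes, here's the same decision logic exercised on throwaway stand-in agents (`SimpleNamespace` objects with just the `titles` and `utility` attributes the function touches):

```python
from types import SimpleNamespace

def questionnaire(a, t0, t1):
    # same logic as in the post: compare utilities when both titles are known
    if (t0 in a.titles) and (t1 in a.titles):
        if a.utility[t0] > a.utility[t1]:
            return t0
        if a.utility[t0] < a.utility[t1]:
            return t1
        return "Ambivalent"
    if t0 in a.titles:
        return t0
    if t1 in a.titles:
        return t1
    return "No Interest"

both = SimpleNamespace(titles=["A", "B"], utility={"A": 9.0, "B": 4.0})
only_b = SimpleNamespace(titles=["B"], utility={"B": 4.0})
neither = SimpleNamespace(titles=[], utility={})

print(questionnaire(both, "A", "B"))     # → A
print(questionnaire(only_b, "A", "B"))   # → B
print(questionnaire(neither, "A", "B"))  # → No Interest
```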
Next we’ll define a “study” that surveys a random sample of the population with the questionnaire, and reports the results as percentages.
```python
def study(population, size, t0, t1):
    # a size below 1 is interpreted as a fraction of the population
    if size < 1:
        size = int(round(len(population) * size))
    sample = random.sample(population, size)
    ans = [questionnaire(a, t0, t1) for a in sample]
    uni = np.unique(ans)
    return {answer: ans.count(answer) / float(size) for answer in uni}
```
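The percentage bookkeeping at the end of `study` amounts to tallying answers; the same computation can be sketched with `collections.Counter` on a made-up answer list:

```python
from collections import Counter

ans = ["A", "A", "B", "No Interest", "A"]

counts = Counter(ans)
shares = {answer: count / len(ans) for answer, count in counts.items()}
print(shares)  # → {'A': 0.6, 'B': 0.2, 'No Interest': 0.2}
```

The shares always sum to one, since every respondent produces exactly one answer.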
Next, we can repeatedly apply studies…
```python
N = len(titles)
m = np.zeros((N, N, 4))
for k in range(4):
    for i in range(N):
        for j in range(i + 1, N):
            outcome = study(pop, 50, titles[i], titles[j])
            # a title absent from the outcome was never preferred in the sample
            m[i, j, k] = outcome.get(titles[i], 0.0)
            m[j, i, k] = outcome.get(titles[j], 0.0)
```
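One way to read a matrix like `m`: score each title by its average win share against the other titles, then sort. A sketch on a hand-made 3x3 preference matrix (illustrative numbers, not output from the simulation):

```python
import numpy as np

titles = ["A", "B", "C"]

# toy preference matrix: m[i, j] is the share of respondents
# who preferred titles[i] over titles[j]
m = np.array([
    [0.0, 0.6, 0.7],
    [0.2, 0.0, 0.5],
    [0.1, 0.3, 0.0],
])

# score each title by its average win share against the others
scores = m.sum(axis=1) / (len(titles) - 1)
ranking = [titles[i] for i in np.argsort(-scores)]
print(ranking)  # → ['A', 'B', 'C']
```

Note that `m[i, j]` and `m[j, i]` need not sum to one, since "Ambivalent" and "No Interest" answers count toward neither title.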
And then plot some somewhat unhelpful results.
```python
fig, axs = plt.subplots(2, 2)
axs[0, 0].matshow(m[:, :, 0], vmin=0, vmax=1)
axs[0, 1].matshow(m[:, :, 1], vmin=0, vmax=1)
axs[1, 0].matshow(m[:, :, 2], vmin=0, vmax=1)
axs[1, 1].matshow(m[:, :, 3], vmin=0, vmax=1)
plt.show()
```
In the future I’ll do some more work regarding conjoint analysis, but I feel like this is a decent start on building a simulation for testing.