Object models and data sets for a social network .. a technique for data

Following on from my last post .. Object models and data sets for a social network .. crawlers vs simulation ..

(Read the previous post first to get the context.)

I have been thinking about this since last night, and this may be a solution.

The question is: can we extrapolate social network data based on existing data patterns?

We have three components

a) A knowledge base (the typical size of a friends list, number of blog posts, etc.),

b) Parameterization (the setup configuration to run the generator), and

c) The generation itself

We need a large volume of data to be relevant.

We need parameters that mirror real life.

Here is my plan:

The objective is to ‘clone’ the transactions from a core set first and then apply the parameters (the intelligence).

a) Create configuration tables (for instance, a profiles table).

b) Create transaction tables (blog entries, Facebook pokes, etc.) and populate them with the base entries.
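As a concrete sketch, the two kinds of tables might look like this (the table and column names here are only assumptions for illustration, not a fixed schema):

```sql
-- Hypothetical configuration table: one row per simulated profile
create table profiles (
    profile_id varchar(10) primary key,
    network    varchar(20)    -- e.g. 'facebook', 'myspace'
);

-- Hypothetical transactions table, seeded with a small core set
create table blogs (
    blog_id    varchar(10) primary key,
    body       varchar(200),
    created_at date
);

-- The base entries - a handful of real-looking rows to clone from
insert into profiles values ('P1', 'facebook');
insert into profiles values ('P2', 'facebook');
insert into blogs values ('B1', 'first post', date '2008-01-01');
insert into blogs values ('B2', 'second post', date '2008-01-02');
```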

c) Create a Cartesian join between the profiles table and the blog entries table. In the normal course of events, Cartesian joins are not desirable; however, they are good for creating a massively large number of records very quickly.

For instance:

select profiles.profile_id, blogs.blog_id from profiles, blogs

If profiles has profile_id values P1 and P2, and blogs has blog_id values B1 and B2, this will give 4 rows:

P1, B1
P1, B2
P2, B1
P2, B2

d) We thus get a large number of rows
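The joined rows can be materialized into their own table so the rules in the next step have something to update; profile_blogs is just an assumed name here:

```sql
-- Materialize the Cartesian join; the row count is the product of the
-- table sizes, e.g. 10,000 profiles x 1,000 blogs = 10,000,000 rows
create table profile_blogs as
select profiles.profile_id, blogs.blog_id, blogs.body, blogs.created_at
from profiles, blogs;
```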

e) We then ‘apply’ the rules as a series of update statements on the base data (post Cartesian join).

f) This gives us the ‘real’ data
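Step e) could be sketched as a series of updates like the following, assuming the joined rows were materialized into a table called profile_blogs; the rules themselves are placeholders for parameters that would come from the knowledge base:

```sql
-- Hypothetical rule pass 1: make each cloned entry read as its owning
-- profile's own post rather than an identical copy of the base row
update profile_blogs
set body = body || ' (by ' || profile_id || ')';

-- Hypothetical rule pass 2: trim the volume down to what the knowledge
-- base says is realistic; the cutoff date here is a placeholder
delete from profile_blogs
where created_at < date '2008-01-01';
```

Each rule is one pass over the whole joined table, so even a very large generated set only costs a handful of bulk statements.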

g) To make this work, I plan to ‘open source’ the whole thing - the tables and, more importantly, the knowledge base.

h) So I see many people contributing insights (a typical user on Facebook has 100 friends on average, MySpace has 40 blog entries per week - that sort of thing).
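Those crowd-sourced insights could live in a simple parameters table; the schema and the two sample rows below are only an assumption of what it might look like:

```sql
-- Hypothetical knowledge-base table: one row per contributed parameter
create table knowledge_base (
    network   varchar(20),   -- e.g. 'facebook', 'myspace'
    parameter varchar(50),   -- what the number measures
    value     numeric
);

insert into knowledge_base values ('facebook', 'avg_friend_count', 100);
insert into knowledge_base values ('myspace', 'blog_entries_per_week', 40);
```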

i) So we can now create a set of data based on parameters, and it is all open sourced.


Kind regards