Welcome to the second talk. I think this one will not be as technical, because we haven't been acquired yet and there are a lot of dark secrets in what we do, but I will do my best to go as technical as I can. I'm a data scientist at Codec. We do content marketing, and we're trying to solve one of the biggest problems in our world.

So, how many of you enjoy, love commercials? For example, on YouTube you're trying to watch maybe your favorite metal band's live concert, and before you can get to the video there's a pregnancy test ad, and you have to watch it for those 20 seconds. So, super annoying. People are using ad blockers, they're not watching TV as much anymore, they're not clicking on banners and clickbait, and as a result companies and brands are losing a lot of money, because they're producing content, distributing it, and nobody's interested. Once in a while, very rarely, you do get a video or an article that you understand is sponsored, produced by a brand, but it's interesting, it's captivating, and you actually end up learning something.

This is the kind of content we're trying to help brands create. They come to us with a question: what content should we make? And we are fundamentally changing the way these creative decisions are made, because we tell them a lot about the audiences they're trying to get their content to. We're not telling them, "Okay, you should make a video with a kitten playing with a ball, and then there's a plane and everything is on fire." We're not doing that; we're not creators. We give them a very detailed picture of the people they want to reach, and we actually tell them whether the audience they already have in mind is suitable for them or not, and which audience would possibly be better.

There are many steps in this process; this is just a very simple flow. We first interact a lot with the brand. Then we collect a lot of digital signals, public signals; we're not spying on everyone.
And we identify the tribes within the digital signals: we look at their personality, their interests, and what kind of content they interact with and share. Then we digest all this into a very concise, short report, online or offline, for the brands to see, to distribute within their own teams, and to make creative decisions. So today I'll focus only on this part: discovering the tribes online and learning as much as you can about them.

Before I start on that, and again I'll be preaching to the converted, of course: this is your raw data, and it's so important to learn as much as you can about the source of this data and to think about the problem you're trying to solve, because if you just start throwing algorithms at it, it will come out of the bushes and bite you. So what you do is very careful analysis, thought through, especially during R&D, of course, thought through
a hundred times, and only this will help you create nice, actually actionable data.

So, in terms of tribe discovery, it's easy enough to collect an audience of, say, 250k users; it's small and very easy to find. The first thing we do is run a simple clustering algorithm. Maybe we can find ten big tribes that cover most of it, and we end up with maybe this nice fluffy picture. I don't know how much you can see from there. Nice shades, so it looks nice and feathery, but to be honest it's not very actionable, it's not very useful: you can see all these nice colors, and that's it. So we want to know more about the tribes inside this audience. We want to
know who these people are, how they interact, what they are interested in, who they listen to, and how the whole picture changes over time. Among these five factors there's one lie: we don't actually tell you who these people are. We're not spies, and it's pretty useless to know the names and IDs of two hundred fifty thousand or a million people. But we can certainly tell you who they listen to. For example, we can identify a tribe and look at the influential profiles within it. In a single tribe that can be, for example, a set like this: Metropolitan Museum, Classicism, British Library, and W Magazine. Just looking at these profiles already gives you a picture of what kind of interests these people might have and what kind of culture they might have. So that's already some digestible, concise information.

How do these people interact? Again, the image didn't show up, unfortunately; there was a lovely blob here. Yeah, the projector doesn't show it; I see it on the screen here. But for every one of those tribes we can tell you how many people are in the tribe, how well they interact, how actively, what the path is between every two members, what they share, how many people share, and how many people just listen. All these things. We can also tell you what they're interested in: we do topic extraction and domain extraction, and we can analyze the things they share.
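
The per-tribe interaction numbers listed above (member count, path between members, sharers versus listeners) can be sketched roughly as follows. This is only an illustration: the toy graph, the share counts, and the function names are hypothetical, and the talk does not describe how Codec actually computes these figures.

```python
from collections import deque

def bfs_distances(graph, source):
    """Hop counts from source to every member reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist

def tribe_summary(graph, shares):
    """Size, average path length, and sharer/listener split for one tribe.

    graph:  dict mapping each member to the set of members they interact with
            (assumed undirected for this sketch)
    shares: dict mapping each member to the number of items they shared
    """
    total_hops, pairs = 0, 0
    for member in graph:
        for other, hops in bfs_distances(graph, member).items():
            if other != member:
                total_hops += hops
                pairs += 1
    sharers = sum(1 for member in graph if shares.get(member, 0) > 0)
    return {
        "members": len(graph),
        # Averaged over reachable pairs only; None if nobody is connected.
        "avg_path_length": total_hops / pairs if pairs else None,
        "sharers": sharers,
        "listeners": len(graph) - sharers,
    }

# Hypothetical four-member tribe: a, b, c form a triangle, d only talks to c.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
shares = {"a": 5, "b": 0, "c": 2, "d": 0}

summary = tribe_summary(graph, shares)
print(summary)
```

A short average path and a high share of active sharers would point toward the "well connected, engaged" end of the scale; where to draw the line is a judgment call.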
That is how we extract all these insights. And, very importantly, we can also tell you how this whole picture changes over time. Oh, here are the blobs, wonderful. So one thing is to have a tribe that over the whole year had 23.3K users, for example. But if we take just a certain month, it might be just 7K users. Then in three months the picture changes: there are fewer users now, and moreover only 40% transferred, and the rest came from somewhere else. And then in three months the picture changes again: there are even fewer users now, but more people stayed and fewer people joined. For the brands this is very important information, because they want to know the audience; they want to know how stable these tribes are. We can also see how people transfer from tribe to tribe, how stable these tribes are over time, and how big they are, of course. So this, again, is another useful bit.
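
The stability figures described above ("only 40% transferred", people moving from tribe to tribe) come down to comparing membership snapshots. A minimal sketch, assuming each snapshot is simply a set of user IDs; the function names and the toy IDs are made up for illustration and are not taken from the talk.

```python
from collections import Counter

def tribe_turnover(previous, current):
    """Compare two snapshots of one tribe, each given as a set of user IDs."""
    stayed = previous & current
    joined = current - previous
    size = len(current)
    return {
        "size": size,
        # Share of today's members who were already in the earlier snapshot.
        "stayed_share": len(stayed) / size if size else 0.0,
        # Share of today's members who came from somewhere else.
        "joined_share": len(joined) / size if size else 0.0,
        "left": len(previous - current),
    }

def tribe_transfers(previous_tribe, current_tribe):
    """Count (old tribe, new tribe) moves for users present in both snapshots."""
    return Counter(
        (previous_tribe[user], current_tribe[user])
        for user in previous_tribe
        if user in current_tribe
    )

# Hypothetical snapshots taken three months apart.
earlier = {"u1", "u2", "u3", "u4", "u5", "u6", "u7"}
later = {"u1", "u2", "n1", "n2", "n3"}
stats = tribe_turnover(earlier, later)
print(stats)

# Hypothetical tribe labels per user in each snapshot.
moves = tribe_transfers(
    {"u1": "art", "u2": "art", "u3": "music"},
    {"u1": "art", "u2": "fashion", "u3": "music"},
)
print(moves)
```

In this toy case the tribe shrinks from seven to five members and only two of the five were there before, which mirrors the "fewer users, 40% transferred" situation in the talk.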
But essentially we're just extracting statistics, right? (I know there's a problem with the picture.) Essentially, what we need to give our clients are examples of good tribes, good communities. If you ask Google, say, "What is a good community? What is a bad community? Give me a picture," it gives you these very poor, dramatic pictures. I don't agree with either of them, and this is not what we give to our clients. For our clients, a good community needs to be well connected, engaged, stable over time, and focused on maybe certain topics, around certain areas. A bad community would be something that's disconnected: they don't really interact with each other. It's structurally there, you can find it with a clustering algorithm, but essentially it's very volatile and it's all over the place. It talks about all possible topics, and you can't really figure out what these people are about. So these are our definitions.

The problem is that it's really difficult to put numbers to these parameters, right? How do we say what is volatile enough that we can assign it a "bad" label? Or how focused exactly should they be? So what we did is clustering, simple clustering. People will probably recognize matplotlib here. After all the proper data processing, and even before that, there were always four clusters. Always four: whatever we did, the best silhouette was at four clusters.
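
One way to run the kind of "best silhouette at four clusters" check mentioned above is sketched below. The talk does not say which clustering algorithm or which features were used, so the plain k-means routine, its deterministic seeding, and the toy 2-D points are all assumptions made for illustration.

```python
import math

def farthest_point_init(points, k):
    """Deterministic seeding: take the first point, then keep adding the
    point that is farthest from every center chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )
    return centers

def kmeans(points, k, max_iter=100):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    centers = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels

def silhouette(points, labels):
    """Mean silhouette coefficient; singleton clusters contribute 0."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the nearest other cluster
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        total += (b - a) / max(a, b)
    return total / len(points)

# Toy data (an assumption): four tight, well-separated groups of five points.
group_centers = [(0, 0), (10, 0), (0, 10), (10, 10)]
offsets = [(-0.5, -0.5), (0.5, -0.5), (-0.5, 0.5), (0.5, 0.5), (0.0, 0.0)]
points = [(cx + dx, cy + dy) for cx, cy in group_centers for dx, dy in offsets]

scores = {k: silhouette(points, kmeans(points, k)) for k in range(2, 7)}
best_k = max(scores, key=scores.get)
print(best_k, scores)
```

The silhouette only says how cleanly the points separate for a given number of clusters; whether a cluster is a "good" community still needs the expert judgment described next.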
So what I did is I gave examples from these clusters to our experts, our cultural researchers, who are part of our team, without giving them labels, and said: "What do you think of this tribe? Here are all the parameters, the influential users, all the things that we could show to our clients. Do you think it's a good community? Can we use it for a report?" For some they told me, "Look, no, it doesn't look right, it doesn't feel good," and for some they said yes. Essentially, all we're doing within our data team is trying to capture this amazing expertise on culture and communities and interaction with content, and we're trying to scale it, digitize it, and build models on it. It turns out that this cluster, for example, was stable and of very high quality. So this is something we can use in the future: we can make the data processing path much faster, and our experts don't have to go and search and analyze things manually anymore.

So basically the take-home message of this talk is that community analysis goes far beyond just nodes and edges and interesting feathery graphs. Especially at the R&D stage, please remember what I've learned through sweat and blood and tears: each dataset is unique.
And if you treat it as such, it will help you later in production. Because if you just try the standard methods, they will tell you, "Okay, I see a baby mammal here in this dataset," figuratively speaking. But if you treat it as unique, it can tell you: this is a kitten, give it milk, play with it; this is a puppy, give it milk, take it for a walk; and this is a hedgehog, don't give it milk, it's poisonous for hedgehogs, let it go into the forest. This is the kind of granularity you get if you treat your data well, and you end up not poisoning hedgehogs, right?

So this is just part of our team; we have been expanding recently. We're at Codec; you can reach us if you have any questions. And yeah, I wonder if you have any questions for me now.
Codec helps brands remain relevant to their target audiences. Their content intelligence platform combines big social data with versatile machine learning solutions to tell brands what content resonates with the audiences they want to target, all before the brand spends big on high-production content. The result is interesting content, engaged consumers and increased marketing ROI.