Impressions of ICWSM '09
I attended the International Conference on Weblogs and Social Media (ICWSM) this year. I was left with a few lasting impressions, and some exciting papers to read.
SUMMARY
I wasn't quite sure what to expect when I showed up at ICWSM this year. To be honest, I was expecting presentations similar to what I saw last year at WSDM 2008, which basically consisted of a lot of really heavy math presentations. I tend not to enjoy that style. Instead, I want to know a presenter's findings, so that I can read the paper if they are relevant to me. Fortunately, I found the latter style at ICWSM 2009, which made the conference very enjoyable.
The schedule consisted of presentations broken down into the following themes:
- Community
- Psychology & Users
- Ranking
- Data Mining & Sentiment Analysis
- Panel Discussion (on User Experience Design)
- Modeling Social Dynamics
- Leveraging Diversity
I also had a good discussion with Anton Kast, the VP of R&D at Digg, about Digg's future, tweetmeme, and what he's working on at Digg. It turns out that they've realized that the news slots on their front page are an extremely scarce resource. They are working to personalize what the user sees, so that there is a more diverse and relevant presentation for each individual user (see my last observation below).
Lastly, Jon Kleinberg gave a very thought-provoking talk about meme-tracking and the 24-hour news cycle. You can see some of the results at MemeTracker. The other half of his talk was devoted to tracking chain letters.
OBSERVATIONS
Academia and industry remain somewhat disconnected. It seems as though academia recognizes all of the problems and is actively working to solve them, but it's missing a lot of technologies and perspectives that you only get when you work in industry. Watts made an offhand comment at one point in his presentation about the unwieldy nature of a 200 million node graph, and not being able to fit it into memory. This caught me a bit off guard. I sat at the conference working on Pig and Hadoop during most of the discussions, yet neither of those technologies was mentioned once.
Another example of this disconnect was a conversation that Monica Rogati and I had with a grad student studying machine learning in biotech. During the conversation, we had to explain to him what EC2 was. I should also note that I needed a lot of explaining about the biotech area as well. We just need more cross-pollination all the way around.
One promising sign on this subject was the use of Mechanical Turk, which the sociologists in particular seemed to love because it provided instant test subjects.
In addition, academia seems to be starving for interesting data. They're hungry for it. Hopefully, we, in industry, can find more ways to give them what they want. We can't let anonymizing and other barriers be excuses. I'm convinced that our payback for providing data will more than offset any data preparation effort on our part.
Blogging is still relevant. I was amazed at the use of both Twitter and old-school blogs during the conference. Twitter is really a powerful medium. I followed #icwsm the entire time. Jess Tsai was a maniac micro-blogger during the conference, and became famous enough to make it onto "the big board" (projector) at the conference. Old-school blogs also remain relevant as a great (the best?) way to build and control your identity. I'm increasingly viewing my blog as the center of my online presence.
Sentiment analysis is a hot topic. Lillian Lee opened up the conference with a great presentation on sentiment analysis, and the topic was a recurring theme through all three days. My impression of sentiment analysis remains: it's a very cool idea, everyone recognizes the value, and it's extremely difficult to do well. If you can get 80% precision in a specific genre, you're doing really well.
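To make that 80% figure concrete: precision is the fraction of items a classifier labels as positive that really are positive. Here is a minimal Python sketch. The labels are made up for illustration, and the function name `precision` is my own, not something taken from any of the papers.

```python
def precision(predicted, actual, positive="pos"):
    """Fraction of items predicted as `positive` that are truly `positive`."""
    # Count predictions of the positive class that match the true label.
    true_pos = sum(
        1 for p, a in zip(predicted, actual) if p == positive and a == positive
    )
    # Count every prediction of the positive class, right or wrong.
    pred_pos = sum(1 for p in predicted if p == positive)
    if pred_pos == 0:
        return 0.0  # nothing was predicted positive, so avoid dividing by zero
    return true_pos / pred_pos


# Made-up labels: five reviews predicted "pos", four of them correctly.
predicted = ["pos", "pos", "pos", "pos", "pos", "neg", "neg"]
actual = ["pos", "pos", "pos", "pos", "neg", "pos", "neg"]

print(precision(predicted, actual))  # 4 correct out of 5 predicted -> 0.8
```

Note that precision says nothing about the positive review the classifier missed; that would show up in recall instead.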
A lot of energy is being put into aggregating, summarizing, and dynamically presenting user-created content. There was a really interesting-looking poster on MakeMyPage, which focused on aggregating and combining multiple sources of social media. There were also numerous other presentations on blog aggregation/ranking.
There was no visualization section! This part really bummed me out. Many of the presentations had nice R graphs and charts, but the only really exciting visualizations that I saw were Kleinberg's Baby Names clone on MemeTracker, and an awesome project called Gephi. I haven't had a chance to play with Gephi yet, but it looks very promising!
FAVORITE PAPERS
Gesundheit! Modeling Contagion Through Facebook News Feed
This was one of my favorite presentations. It shows how "Page Fanning" (users becoming fans of a Facebook page) does not spread through "social influencers". The presentation opened up the conference, is related to Watts' discussion on "social influencers", and seemingly contradicts The Tipping Point.
Using Transactional Information to Predict Link Strength in Online Social Networks
Indika Kahanda presented this paper, which has implications for link strength, something I'm always interested in at work. The paper found that network proximity was a great indicator of whether two users would later connect. I can't find the PDF for the paper, so I've linked to Indika's page.
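I don't remember exactly how the paper measured network proximity, so treat the following as an illustration of the general idea and not the authors' method. Two common proxies are the number of friends two users share and the Jaccard overlap of their friend lists. The toy graph and the helper names `common_neighbors` and `jaccard` are my own assumptions.

```python
def common_neighbors(graph, u, v):
    """Return the set of friends that users u and v share."""
    return graph.get(u, set()) & graph.get(v, set())


def jaccard(graph, u, v):
    """Shared friends divided by the size of the combined friend set."""
    nu, nv = graph.get(u, set()), graph.get(v, set())
    union = nu | nv
    return len(nu & nv) / len(union) if union else 0.0


# Toy undirected friendship graph (made up for illustration).
friends = {
    "ann": {"bob", "cat", "dan"},
    "bob": {"ann", "cat"},
    "cat": {"ann", "bob", "eve"},
    "dan": {"ann"},
    "eve": {"cat"},
}

# bob and eve are not connected yet, but they share cat as a friend.
print(common_neighbors(friends, "bob", "eve"))  # {'cat'}
print(jaccard(friends, "bob", "eve"))           # 1 shared / 2 total -> 0.5

# dan and eve share nobody, so both scores bottom out.
print(jaccard(friends, "dan", "eve"))           # 0.0
```

A higher score for an unconnected pair would then be read as a hint that the two users are more likely to connect later.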
You Are Where You Edit: Locating Wikipedia Contributors Through Edit Histories
This paper shows how to geolocate users based on the pages that they edit. This, of course, does not work for everyone. Michael Lieberman gave this talk, which interested me mostly because my old position was as a Software Engineer in PayPal's Risk Group. The idea is conceptually very similar to de-anonymizing AOL's search data, or the NSA's patent on IP geolocation triangulation through socket response times. It was also news to me that you can download the complete Wikipedia library (including all edit history and pictures) for free!
Motivational, Structural and Tenure Factors that Impact Online Community Photo Sharing
This was a great presentation about why users share photos on Flickr. The paper has implications for engagement and adoption. One of the best parts of the talk was when the presenter pointed out that the longer a user has been signed up for your site, the less likely they are to be engaged with it. This seems counterintuitive, but it's completely true. In another presentation that modeled blog posting behavior, the same thing was observed: users are highly likely to post when they first create their blog, and slowly become less likely to do so as time goes on.
LINKS
ICWSM 2009
Twitter #ICWSM Search
Jennifer Neville's Predictive Modeling with Social Networks Tutorial
