Actually, all of this was Day 2 of the conference (Tuesday), because Day 1 was a workshop day. I was technically registered for the conference only; that wouldn’t have stopped me from attending the workshops, but I only arrived around noon on Monday. So the only thing I attended that day was the tutorial “Blind Men and an Elephant: Coalescing Open-source, Academic, and Industrial Perspectives on BigData”. The abstract can be found here; I have to admit it was not as exciting as it sounded, but that’s probably because by that time I was doubly jet-lagged.
And my earlier post was about Tuesday – the first day of the conference. After the opening there was the first keynote, “Data Crowdsourcing: Is it for Real?”, presented by Hector Garcia-Molina. Let me tell you, he is one of the “textbook characters” for me, and when you see in person somebody whose books and papers you’ve studied since you were a PhD student, you definitely want to shake this person’s hand and tell them how you never thought this was going to happen… and I did, but two days later :).
The keynote was absolutely amazing. And watching Garcia-Molina live was equally exciting :). I tried to take some pictures, but ended up just listening. So here is just one picture – but it describes the talk pretty well :)
“This is for real!” – states Garcia-Molina. He says those old Wild West “wanted” ads were the first data crowdsourcing: the sheriff himself could not catch the fugitive, so he would announce a reward for anyone who provided information that would help catch the robber.
In today’s world there are multiple examples of data crowdsourcing. If you look at the picture posted above, Garcia-Molina was talking about one apparently well-known case, when humans were hired to solve CAPTCHAs to create “non-human” Hotmail addresses for porn sites.
He also talked about a crowdsourcing-based system, which I believe his team at Stanford is developing (since I do not have the text of this keynote, I’m not 100% sure). In any case, here is an example: you need to buy a specific type of cable, but you have no idea what it is called and hence what to search for on Amazon. You take a picture of the cable, post it to this service, and “ask the crowd” what it is called. The volunteering crowd responds with suggestions, and then you can search on Amazon using these suggestions and see whether they were correct. The feedback helps improve the quality of the crowdsourcing.
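The feedback loop described above can be sketched very roughly in code. This is purely my own hypothetical illustration (the function names and the reliability-weighting scheme are my assumptions, not anything from the keynote): workers suggest labels, the weighted majority wins, and feedback on the confirmed answer adjusts each worker’s reliability score.

```python
# Hypothetical sketch of a crowd-labeling loop with feedback.
# All names and the weighting scheme here are my own assumptions,
# not the actual system Garcia-Molina described.
from collections import defaultdict

def aggregate_labels(suggestions, reliability):
    """Pick the label with the highest total reliability-weighted vote."""
    scores = defaultdict(float)
    for worker, label in suggestions:
        # Unknown workers start with a neutral weight of 1.0.
        scores[label] += reliability.get(worker, 1.0)
    return max(scores, key=scores.get)

def update_reliability(suggestions, confirmed_label, reliability, step=0.1):
    """Reward workers whose suggestion matched the confirmed answer."""
    for worker, label in suggestions:
        current = reliability.get(worker, 1.0)
        if label == confirmed_label:
            reliability[worker] = current + step
        else:
            reliability[worker] = max(0.0, current - step)
    return reliability

# Example: three workers label a photo of an unknown cable.
suggestions = [("w1", "HDMI cable"), ("w2", "DVI cable"), ("w3", "HDMI cable")]
reliability = {}
best = aggregate_labels(suggestions, reliability)
# The majority suggestion wins; searching Amazon then confirms (or not)
# the label, and that feedback feeds back into worker reliability.
reliability = update_reliability(suggestions, best, reliability)
```

The point of the feedback step is exactly what the keynote hinted at: over time the system learns which contributors are trustworthy, so the crowd’s answers get better.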
After the keynote there was a panel, “Big Data: Old Wine in New Bottle”. The panel description is here. Now, because in this case it was not one but eight great people talking, and again I do not have any slides from it, I am not sure I will be able to reproduce all the details of the discussion. There was one funny moment when Hector Garcia-Molina presented himself as his “evil twin Victor” and put on a fake nose with mustache and glasses whenever he was talking as “Victor”. Here is one picture from this panel:
This is Umesh Dayal giving his 8-minute presentation (he was the only person on the panel whom I knew really well). I was introduced to Rakesh Agrawal a year ago (after reading his papers for years!), but that was just an introduction.
Anyway, the definition of Big Data I liked the most was the one given by David Lomet (another textbook character :)): Big Data is data that is just a little bit too big for us to process without problems at this time. And if you look back and try to remember what exactly you considered to be “big data”, say, 10 years ago – it’s not anymore! The most important thing is not the actual data volume but the techniques we use to manipulate this data – at some point we no longer need a “special technique” for data we previously considered very big.
There were more panels and lots of interesting discussions later, and I will definitely tell you more about everything – most likely over the weekend, though.