Category Archives: publications and discussions

When working remotely does not work

About a week ago, this article from The Atlantic landed in my inbox via one of my many newsletters.

And now, returning home after just three days of working remotely, I find that I agree with many of the points this article makes.
In general, I am very thankful that I have the option of working remotely periodically, but only because otherwise I would have to skip work entirely each time I travel.

I’ve always been a great proponent of remote work, arguing that it can be as efficient as working in the office, if not more so, and using my work with the NY Department of Education as an example. But I have to agree with this article: times have changed. And now I have no doubt that working remotely for me in my current position should be very limited – both for my own benefit and for the benefit of the projects I am working on (which, to be honest, is almost the same thing :))

Filed under publications and discussions, Team and teamwork, Workplace

My presentation at PG Open 2017

I was waiting for the videos to be uploaded, but since I am not sure when this is going to happen, I figured I would at least post my presentation. If you compare it with the one I presented at PG Open 2016, you’ll see lots of improvements.

Here it is:

 My Presentation at PG Open 2017c

Filed under publications and discussions, talks

The data science education panel at ICDE 2017

To keep my own promise to tell more about what happened at ICDE 2017, I am going to write about the panel on data science education. The panel was called “Data Science Education: We’re Missing the Boat, Again”, and I’d say it was probably the most interesting panel I’ve ever attended! By the time the panel was about to start, there was a huge crowd, and people were encouraged to take the dozen or so remaining seats in the first and second rows (do I need to mention that I was at the front five minutes before the panel started?)

The topic of the panel, described in my own words, was the following. Data science is a buzzword, students want to be taught “data science”, and there is a common belief that data science is about machine learning and statistical modeling, while in reality 80% of data scientists’ time is spent on data pre-processing, cleansing, etc.

The panelists were given the questions which I am copying below.

• If data scientists are spending 80% of their time grappling with data, what are they doing wrong? What are we doing wrong? What can we teach them to reduce this cost?
• What should a practicing data scientist learn about systems engineering? What’s the difference between a data engineer and a data scientist?
• Scale is at the heart of what we do, and it’s a daily source of friction for data scientists. How can we teach fundamental principles of scalability (randomized algorithms, for example) in the context of data systems?
• Perhaps data scientists are just consumers of our technology – how much do they really need to know about how things work? Empirically, it appears to be more than we think. There is a black art to making our systems sing and dance at scale, even though we like to pretend everything happens automatically. How can we stop pretending and start teaching the black art in a principled way?
• How can we address emerging issues in reproducibility, provenance, curation in a principled yet practical way as a core part of data engineering and data systems? Consider that the ML community has a vibrant workshop on fairness, accountability, and transparency. These topics are at least as relevant from a database perspective as they are from an ML perspective, maybe more so. Can we incorporate these issues into what we teach?
• How much math do we need to teach in our database-oriented data science courses? How can we expose the underlying rigor while remaining practical for people seeking professional degrees?

Bill Howe from UW was the moderator and the first panelist to give his talk.

The second one was Jeff Ullman, and with that I have nothing more to say :)

Actually, I really liked the fact that he mentioned that math courses, namely linear algebra and calculus, should be included in the database curriculum. I have always said that nobody without Calc BC should be allowed anywhere near any database.

The next panelist was Laura Haas, and again – what else do I need to say, except that I enjoyed each and every moment of her presentation?

One thing from her presentation which I find really important is that data science is not a part of computer science, and not a part of database management. As Laura put it, “we provide the tools”, but that does not mean “we” should teach DS as a part of CS.

The next panelist was Mike Franklin from UC, and I hope this picture is clear enough for you to see the funny example of DS he is showing.

And the last one was the very controversial Tim Kraska from Brown, who started by saying that he was going to disagree with all the rest of the panelists – and he did.

To be honest, it’s very difficult to write about this panel, because each of you can google all these great people, but you would need to see a video recording of this panel to really feel how interesting it was, and how much fun.

After the panel I talked to several conference participants who, like me, are from industry, and asked them what they are looking for when hiring recent grads. And literally everybody said the same thing that I was thinking: they hire smart people with a solid basic education, people who can solve problems, “and we will teach them all the rest”. I couldn’t agree more!

Paradoxically, students think it’s cool to have something about “Data science” in their curriculum, and they often think it will make them more marketable, but real future employers do not care that much!

Filed under Data management, events, People, publications and discussions, talks

Once again about women in science

A friend of mine sent me this link almost a month ago, but it’s only now that I got to writing about this article in The Guardian.

I liked it a lot; the most important thing it stresses is that women are already doing science, so there may be less need at this point “to encourage girls to do science”. The statistics show that in many areas of science there are actually more women than men!

Then the question comes: why, in that case, are fewer women being published?

The New Scientist blames the “choice” to have a family. It points to a study in this month’s American Economic Review that shows women incurring earnings penalties in science if they have children. A recent House of Commons science and technology committee report goes into more detail, saying that scientific research careers are dominated by short-term contracts with poor job security – at the very time of life that women need to have children (if they want them). The female postdoctoral scientist faces difficult decisions while stuck on fixed-term contracts before tenure, with very little in the way of institutional support. Women should not have to choose between career and family, says the science magazine. But surely male scientists face similar choices?

Turns out – not. And what follows is something we have all known for a very long time. I remember how, many years ago, when I was a consultant at the City of Chicago, my single-mom consultant co-worker used to say: I need a stay-at-home wife!

Not a husband, mind you :). So, here is how the article goes on:

Apparently not. European social science research shows that male and female scientists often have different types of partners: male scientists more frequently have a stay-at-home partner looking after the children, while female scientists are more likely to have another scientist as a spouse. So male scientists might not need family-friendly working practices to have a successful career but female scientists do. Hence the loss of women in the “leaky pipeline” of scientific careers. And that is to say nothing of the research that found scientists perceived job applicants to be less competent when they had female names.

Sad, but true.

You know what it made me think about? At ICDE and other conferences of similar caliber, the organizers usually report the stats on submissions and accepted papers by country and region. Why not report by gender as well? Some of my friends, looking at the pictures from the conference, have already asked me: why were there so few women?!

I understand that it’s not always easy to derive gender from a name, and I also understand that you can’t mandate that people submit their gender. But I was thinking that at least when you register for the conference, you might be asked to specify your gender (and you might “prefer not to answer”; there should always be this option).

Ideally, though, I would love to see the stats on something like: how many women are among the authors, how many are the main authors, how many are registered for the conference, how many actually come, and who is presenting :)

Filed under publications and discussions