Tag Archives: teaching

Databases are not sexy?

I’ve heard this line from a former colleague of mine. He was explaining why there are so little database people around, why IT students are not specializing in databases. That was his answer – that it not cool. “WEB-designer” sounds cool, “database developer” does not.

Several months passed since I heard that, and I was thinking: I should write a blog post about it! However, honestly – what else I can say except the things I’ve already said in this blog multiple times? That there is nothing more exciting, than exploring the endless possibilities of SQL, that nothing can be more rewarding than applying your magic to the jobs, which runs for an hour, and all of a sudden it runs in less than a minute and produces the same result:)

I suspect that the general public does not think that there is something behind a web page, and when somebody experiences a website slowness, they refer to it as “the internet is slow.” Also, the buzz words like “Big Data” often send a message, that “the databases are outdated,” and that there is something bigger, and cooler, and more fashionable, than just”databases,” which does not help a bit.

As I always like to be practical, and not only state a problem but come up with a way to solve it, I am now thinking about how to bridge this gap. I want to find ways to reach out to college and high school students and to give them some exposure to the Wonderful World Of Data. A couple of years ago when I was attending one of the Big Conferences, I’ve heard a discussion regarding “what our students want to be taught.” That was a time of Big Data just becoming a Big Deal :). Honestly, my opinion is that the student’s interest should not drive a curriculum entirely 🙂 and that that’s the right place to interfere.

Is anybody interested in joining me in this effort?

Advertisements

Leave a comment

Filed under Data management, SQL

My talk: What is a database?

I’ve presented this talk a week ago as a part of “Braviant Talks”, where people from different departments of our company talk about what their department is doing. It intended to be as non-technical as possible, which I think was achieved, at least to some extent, and … I just like how it turned out:). Enjoy 🙂

Leave a comment

Filed under Companies, events, talks

Can you teach somebody to optimize?

I’ve got a lot of feedback on my last blogpost; one question was posted on another paltform where I’ve reblogged the same text, and this question was so interesting that I’ve decided to write a separate post in reply.

So tell me, Hettie, are these kind of discoveries being reported at the conferences, and would they later become a part of a common knowledgebase? Will this sort of technique be taught in colleges? Overall, in your opinion, are the nowadays CS graduates more knowledgeable in this area. And, by the way, is there any special knowledge which is necessary to be able to resolve problems like this, or it’s just a combination of basic knowledge plus experience?

Great question! I have being teaching optimization for almost 15 years, and in general my optimism on this subject is very modest. You can teach technique, you can show tons of examples, and still there is no guarantee that a student who has attended your class will be able to correctly identify a similar pattern in the real life and to recall what specific technique which was advised for a similar problem. But may be I am just not good in teaching this stuff.

It’s tempting to say, that all that matters are years of practice, but this is also not always the case, since you, and me, and all of us can recall the situations in which years of experience did not help. And to be honest, I do not want to wait for “years of experience”! I want to be able to hire a new grad, who can optimize at a reasonable level. And I am not saying I never met this kind of CS grads, but what I am saying is that whenever it happens, it is due to the person’s individual abilities and/or a desire to excel in this particular skill.

Let’s be clear: the kind of breakthrough as I’ve described in the previous post does not happen often. In fact, you might never get anything like this in your live. But there are still tons of optimizations which can be done almost every day.

I would still argue that knowing the basics is a key. For the thousandth time over – you need to know your math, your calc and your algebra in order to understand how databases work. You might not be aware of some sophisticated indexes, but you should be able to identify, looking at the query, what’s it about< whether it is "short" or "long". And if you try and try, and it does not become faster, you need all your convincing powers to convince yourself, that this query can be optimized. There should be a way.

Another big thing I am trying to teach – to write queries declaratively. This is an extremely challenging task, because most of the requirements are formulated in an imperative manner. So what I am trying to teach, is that even if you find something like “previous three occurrences”, or “return to the previous status”, or “compare the top two” in the requirements, you still can write a declarative statement, carefully using CASE, GROUP BY and sometimes window functions. And it’s amazing, how fast everything starts running right away. Most of the time being able to reduce the number of table scans to one does the trick, except of… well, except of the situation, when you should do exactly the opposite.

I didn’t figure out yet, how to teach to distinguish one from another :). But the more I think about it, there more it seems like that’s what signifies that somebody can optimize, and that skill is the most difficult to teach. Most optimization classes teach you how and when to use some indexes, and how to choose the join order, but they do not teach how to rewrite a query itself.

… For the original question: no, I do not think they teach it in school. But I am trying to promote this idea!

Leave a comment

Filed under Data management, SQL

What I am looking (and not looking) for

Since  I’ve been looking for  database developers and DBAs for quite some time now,  and since virtually everybody knows about this, people often ask me: what are you looking for? What skills and qualifications you are interested at? Who would be your ideal candidate?

Most of the time I reply: please read the job description. I know that all the career advisors tell you “apply even if you do not have some qualifications”, but as for my job postings, I actually need those qualifications which are listed as “required”, and I would really prefer the candidates, who have “is a plus” qualifications.

Also, there are definitely some big DON’Ts, which I wish I would never ever hear again during an interview:

  • when asked for the definition of the foreign key,  starting your answer from “when we need to join two tables”
  • when asked about normalization, starting from “for better performance”
  • when asked about schemas, saying that we use then for storage optimization

Today however, I was asked a different question: why you are saying that you are looking for skilled candidates, and at the same time you admit, that for anybody who will get hired there will be a long learning process? if a candidate does not know something, doesn’t it mean he does not have enough skills? Doesn’t it mean, (s)he is underqualified?

I thought for a while before I’ve responded. When I was first hired as a Postgres DBA, it was a senior position right away, although at that time I did not know any Postgres at all. But people who’ve hired me were confident that not only I can learn fast, but also that I can generalize my existing knowledge and skills and apply it in the new environment.

To build on this example, there are two pre-requisites for success: knowledge and the ability to apply it in the real-life circumstances.

I think, that a person who wants to succeed as a database developer or a DBA should possess a solid knowledge of the relational theory.  But it is not enough to memorize your Ullman or Jennifer Widom, you need to be able to connect this theory to the real-world problems. This is such an obvious thing, that I never thought I will need to write about it, but life proved me wrong :).

Same goes in the situation, when a candidate has a lot of experience with other database, not the one you need. Yes, different database systems may be different, and significantly different. Can somebody who is very skilled  Oracle DBA be qualified for a position of Postgres DBA? Yes, if this person knows how to separate generic knowledge from the systems specifics.

if you know how to read an execution plan in Oracle, and more importantly, why you need to be able to read it, you will have no problem reading execution plans in Postgres. If you used the system tables in Oracle to generate dynamic SQL,  you will know exactly what you want to look for in the Postgres catalog.  And if you know that queries can be optimized, it will help no matter what a specific DBMS is. And it won’t help, if the only thing you know is how to execute utilities.

… No idea, whether this blog post is optimistic or pessimistic, but… we are still hiring 🙂

1 Comment

Filed under SQL, Uncategorized

The Data science education panel on ICDE 2017

In order to keep up with my own promises to tell more about what was happening on ICDE 2017 I am going to write about the panel on data science education. The panel was called “Data Science Education: We’re Missing the Boat, Again”, and I’d say it was probably the most interesting panel I’ve ever attended! By the time the panel was about to start, there was a huge crowd, and people were encouraged to take a dozen of remaining seats in the first and second rows (do I need to mention that I was at the front five minutes before the panel started?)

The topic of the panel described in my own words was the following. The Data science is a buzz word, students want to be taught “data science”, and there is a common believe that data science is about machine learning and statistical modeling while in reality 80% of time of the data scientists is spent on data pre-processing, cleansing, etc.

The panelists were given the questions which I am copying below.

If data scientists are spending 80% of their time grappling with data, what are they doing wrong? What are we doing wrong? What can we teach them to reduce this cost?
• What should a practicing data scientist learn about sys- tems engineering? What’s the difference between a data engineer and a data scientist?
• Scale is at the heart of what we do, and it’s a daily source of friction for data scientists. How can we teach funda- mental principles of scalability (randomized algorithms, for example) in the context of data systems?
• Perhaps data scientists are just consumers of our technol- ogy — how much do they really need to know about how things work? Empirically, it appears to be more than we think. There is a black art to making our systems sing and dance at scale, even though we like to pretend everything happens automatically. How can we stop pretending and start teaching the black art in a principled way?
• How can we address emerging issues in reproducibility, provenance, curation in a principled yet practical way as a core part of data engineering and data systems? Consider that the ML community has a vibrant workshop on fairness, accountability, and transparency. These topics are at least as relevant from a database perspective as they are from an ML perspective, maybe more so. Can we incorporate these issues into what we teach?
• How much math do we need to teach in our database- oriented data science courses? How can we expose the underlying rigor while remaining practical for people seeking professional degrees?

Bill Howe from UW was a moderator and the first panelist to give his talk.

The second one was Jeff Ullman, and thereby I have nothing more to say:)

Actually, i really liked the fact that he mentioned, that the math courses, linear algebra and calculus should be included into the Database curriculum.  I was always saying that nobody without Calc  BC should be allowed anywhere near any database.

The next panelist was Laura Haas, and again – what else I need to say, except of I’ve enjoyed each and every moment of her presentation?

One thing from her presentation which I find really important is that the Data science is not a part of the Computer Science, and not a part of Database management.  As Laura put it, “we provide the tools”, but not like “we” should teach the DS as a part of CS.

Next panelist was Mike Franklin from UC, and I hope this picture is clear enough for you to see a funny example of DS he is showing.

And the last one was very controversial Tim Kraska from Brown, who started with “he is going to disagree with all the rest of panelists” – and he did.

To be honest, it’s very difficult to write about this panel, because each of you can google all these great people, but you would need to see a video recording of this panel to really fell how interesting, and how much fun it was.

After the panel I talked to several conference participants, who like me are from industry and asked them what are they looking for when hiring recent grads. And literally everybody said the same thing that I was thinking about: they said they hire smart people with solid basic education, people who can solve problems, “and we will teach them all the rest”. Which I couldn’t agree more!

Paradoxically, the students think it’s cool to have something about “Data science” in their curriculum, they often think it will make them more marketable, but real future employers do not care that much!

Leave a comment

Filed under Data management, events, People, publications and discussions, talks

ACM Hour of Code Dec 5-11

I am re-posting the ACM newsletter about thee upcoming Hour of code – please consider organizing something in your community!

 

Organize an Hour of Code in Your Community During Computer Science Education Week, December 5-11

Over the past three years, the Hour of Code has introduced over 100 million students in more than 180 countries to computer science. ACM (a partner of Code.org, a coalition of organizations dedicated to expanding participation in computer science) invites you to host an Hour of Code in your community and give students an opportunity to gain the skills needed for creating technology that’s changing the world.

The Hour of Code is a global movement designed to generate excitement in young people. Games, tutorials, and other events are organized by local volunteers from schools, research institutions, and other groups during Computer Science Education Week, December 5-11.

Anyone, anywhere can organize an Hour of Code event, and anyone from ages 4 to 104 can try the one-hour tutorials, which are available in 40 languages. Learn more about how to teach an Hour of Code. Visit the Get Involved page for additional ideas for promoting your event.

Please post activities you are hosting/participating in, pass along this information, and encourage others to post their activities. Tweet about it at #HourOfCode.

Leave a comment

Filed under events, news, Uncategorized

What happens at the Hour of Optimization

I’ve being bragging about our “Hour of optimization” for a while, and now I want to talk about it in more details.

The original idea was that I will try to allocate some time for random optimizations: since people were coming to me with all sorts of optimization questions for a while, and a question could pop up in a chat at the most inconvenient time, like 5 PM on Friday, I’ve decided to have some “office hours”. This attempt was an epic fail, because literally nobody would come with their questions at that time, and everybody continued to bring me their questions at any other day/time of the week.

Then I’ve decided, since I already have this hour on my schedule, and since we’ve got so many newly hired database developers, to use this hour to show some optimization techniques, and how they can be applied for our tasks. This idea was definitely more productive, and I felt like people are learning something. On the other hand I knew that listening to the great optimization stories is not enough, that in order to master any optimization techniques you need to apply it at least ones.

So I’ve started to contemplate making other people talk at the Hour of optimization. First I’ve begun to approach those people about whom I knew for sure they performed some cool optimization task during the week. I’ve started to ask them to talk about these tasks – no formal presentation, just a conversation, and please show your code. After each of those presentations I’ve tried to initiate some exchange of opinions, some discussion: which technique was used, did we talk about it during my tuning classes, did we use it somewhere else recently? I liked it a lot when almost everybody started to participate. I’ve also started to invite people from other departments, since I’ve learned that many optimizations are going on all over the place 🙂

After some time I didn’t even need to know for sure that “somebody did something”. I was just asking: who would like to talk at the next Hour of optimization? Sometimes it’s even better: a developer may approach me and say: I have something for our optimization hour!

Yes, one can say that that’s a job of a database developer – to optimize queries. But I think it’s extremely important to reflect on your work, to think about the patterns you are using, to connect the current problems with the previous similar ones. Also, it gives the new database developers this feeling of almost visible growth of experience. We are definitely growing our database developers on the fertilized soil!

Leave a comment

Filed under SQL, Team and teamwork, Workplace