Asking the right questions

Hard to believe, but it’s the end of Week 7 of the summer in my lab.  We’ve been very busy recently:  earlier this week my students ran some experiments (which they designed completely by themselves!) with human subjects, and they are now busily analyzing the data.  They’re giving a talk, all by themselves, next week.  And I’ve had them start writing up their results, with a conference paper as the eventual end goal.  So things move ever onward.

Brigindo at Dirt and Rocks has a really interesting post from earlier this week about helping (doctoral) students “find the story” in their research:  learning to frame their data correctly, to tell the correct story, and above all to start asking better questions.  This post really resonated with me.  My students are now knowledgeable enough to know what they are doing and why, but they are still learning how to ask the right questions.

Here’s an example.  We have a testbed network set up on which we run our experiments.  The network has a router running netem, which is software that allows us to essentially emulate the Internet:  we tell it how much data to lose, or how long to delay data, or how much data to let through, etc.  As part of the experiments we just ran, we set netem to lose certain percentages of data at different times.  At the same time, at each test computer we measured the actual amount of lost data using a program called ping.

When we looked at the results, my students noted that the measured data loss was off from the data loss percentage we applied.  Their immediate reaction was to assume that either the ping data was wrong, or netem was wrong.

This highlights a couple of thorny aspects of shaping the thinking of undergraduate researchers from “classroom thinking” to “research thinking”.  First, students are used to a more black-and-white view of the universe:  if one answer is right, then all other answers are wrong.  This of course is not how the universe works, and in fact research questions may have many right answers, or none at all.  In this case, both netem and ping may be “right”, or “wrong”, and what they are probably seeing is an artifact of the probabilistic nature of computer networks.
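To make the "probabilistic nature" point concrete, here is a minimal sketch of the effect. It does not call netem or ping; it only assumes that each probe is dropped independently with the applied probability, which is a simplification of what a real emulator does. The function names (`measured_loss`, `simulate_runs`) and the numbers (5% applied loss, 100 probes per run) are made up for illustration and are not taken from our experiments.

```python
import random


def measured_loss(applied_loss, probes, rng):
    """Fraction of probes dropped when each probe is lost
    independently with probability `applied_loss` (an assumption)."""
    dropped = sum(1 for _ in range(probes) if rng.random() < applied_loss)
    return dropped / probes


def simulate_runs(applied_loss, probes, runs, seed=0):
    """Repeat the measurement `runs` times with a seeded generator
    so the illustration is reproducible."""
    rng = random.Random(seed)
    return [measured_loss(applied_loss, probes, rng) for _ in range(runs)]


if __name__ == "__main__":
    # Hypothetical settings: 5% applied loss, 100 probes per run.
    for loss in simulate_runs(applied_loss=0.05, probes=100, runs=10):
        print(f"applied 5.0%, measured {loss:.1%}")
```

Under these assumptions the measured value will usually differ a little from the applied one from run to run, even though neither the "emulator" nor the "measurement" in the sketch is doing anything wrong.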

Second, students are not used to context switching between the big picture (what problem are we trying to solve and why?) and the smaller details (run this experiment, collect this data, do this analysis).  Again, what are students more familiar with?  Doing smaller tasks covering one or maybe a small handful of skills.  In research, you need to have an eye on both the big and small pictures, on the details and the trends.  To do so, you have to be able to take a step back from the data, or your observations/analysis of the data, and ask the right, framing questions.

The latter is probably the most difficult skill for new researchers to learn; heck, even the most experienced researchers can experience such tunnel vision from time to time.  I end up teaching by example in this case:  my students present their observations and analysis, and I ask them leading questions to get them to think more deeply and broadly about the data/results/whatever.  I ask them to think about what the results could mean, to get them to explore different explanations and hypotheses.  My hope is that by seeing me do this, they will eventually learn how to move beyond their initial reaction to and assumptions about the data and be willing to explore it in more depth or with a different lens.

In this case, I directly challenged their assumptions, and asked them to think a bit about the possible reasons for the discrepancies.  I also asked them to consider what an “acceptable” amount of error would be:  if the measured and applied losses are only off by 1%, is this a deal breaker or is it within the realm of plausibility?  I could have just as easily told them “nah, this is normal” and been done with it, but I’d rather have them spend the extra time and come out with a deeper understanding of what’s happening than have them blindly trust what I tell them.  Critical thinking and the willingness to consider multiple possible explanations are vital skills for any researcher, and ones that require practice, practice, practice.
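One hedged way to put a number on "acceptable" error: if each packet is lost independently with probability p (an assumption, not something we verified on the testbed), the loss fraction measured over n probes has a standard error of roughly sqrt(p(1 - p) / n). The sketch below uses that approximation. The helper names and the two-standard-error cutoff are illustrative conventions, not part of our analysis.

```python
import math


def loss_standard_error(applied_loss, probes):
    """Approximate standard error of a measured loss fraction,
    assuming independent losses (binomial model)."""
    return math.sqrt(applied_loss * (1.0 - applied_loss) / probes)


def within_expected_noise(applied_loss, measured_loss, probes, z=2.0):
    """True if the gap between applied and measured loss is no larger
    than `z` standard errors. z=2.0 is a common rule of thumb,
    chosen here only for illustration."""
    gap = abs(measured_loss - applied_loss)
    return gap <= z * loss_standard_error(applied_loss, probes)


if __name__ == "__main__":
    # Hypothetical case: 5% applied, 6% measured (a 1% gap).
    for probes in (100, 10_000):
        ok = within_expected_noise(0.05, 0.06, probes)
        print(f"{probes} probes: 1% gap within expected noise? {ok}")
```

Under this model, whether a 1% gap is a deal breaker depends heavily on how many probes went into the measurement: the same gap that is unremarkable over a hundred probes would deserve a closer look over ten thousand.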

An interesting publishing model

There’s been some renewed discussion in the blogosphere and CS media lately about the broken model of CS publishing. In the latest issue of Communications of the ACM, for instance, Moshe Vardi’s editor’s letter discusses hypercriticality, or the tendency of some reviewers to be overly and needlessly negative, and how this is harmful to our field:

We typically publish in conferences where acceptance rates are 1/3, 1/4, or even lower. [Actually, the top conferences in my field have acceptance rates closer to 10%!---acd] Reviewers read papers with “reject” as the default mode. They pounce on every weakness, finding justification for a decision that, in some sense, has already been made….If the proposal is not detailed enough, then the proposer “does not have a clear enough plan of research,” but if the proposal is rich in detail, then “it is clear that the proposer has already done the work for which funding is sought.”

What is to be done? Remember, we are the authors and we are the reviewers. It is not “them reviewers;” it is “us reviewers.”…This does not mean that we should not write critical reviews! But the reviews we write must be fair, weighing both strengths and weaknesses; they must be constructive, suggesting how the weaknesses can be addressed; and, above all, they must be respectful.

A mailing list I’m on pointed me towards this blog post, lamenting the state of systems-level HCI research (basically a good discussion of what type of work is “valued” by a subfield, and how this plays out in the review cycle—I can certainly relate!), and concluding with the following:

What is the answer? I believe we need a new conference that values HCI systems work. I also have come to agree with Jonathan Grudin that conference acceptance rates need to be much higher so that interesting, innovative work is not left out (e.g., I’d advocate 30-35%), while coupling this conference with a coordinated, prestigious journal that has a fast publication cycle (e.g., electronic publication less than 6 months from when the conference publication first appears). This would allow the best of both worlds: systems publications to be seen by the larger community, with the time (9-12 months) to do additional work and make the research more rigorous.

These are all great questions and valid points, but it’s easy to just wring your hands and say “oh well, I need the publications so I’ll just play by the rules” rather than trying to change the system.

But one conference seems to have been paying attention to the discussion.

VLDB, a databases conference, is trying out a new reviewing model, attempting to combine the best features of journal pubs (multiple reviews and rebuttals) and conference pubs (timely publication, quick(er) turnaround time).

PVLDB uses a novel review process designed to promote timely submission, review, and revision of scholarly results. The process will be carried out over 12 submission deadlines during the year preceding the conference. The basic cycle will operate as follows:

· A Rolling Deadline occurs on the 1st of each month, 5:00 AM Pacific Time (Daylight Savings observed according to US calendar).
· Initial Reviews are intended to be done within one month, and they will include notice of acceptance, rejection, or revision requests.
· Revision Requests are to be specific, and moderate in scope. Authors will be given two months to produce a revised submission.
· Second Reviews are intended to be returned within one month, after which a final decision will be made. Second reviews are to directly address the authors’ handling of the requested revisions.

What’s more, they also address some of the points made in the first two articles I linked to, and common complaints about the review process (emphasis mine):

The revision process is intended to be a constructive partnership between reviewers and authors. To this end, reviewers bear a responsibility to request revisions only in constructive scenarios: when requests can be addressed by specific and modest efforts that can lead to acceptance within the revision timeframe. In turn, authors bear the responsibility of attempting to meet those requests within the stated timeframe, or of withdrawing the paper from submission. At the discretion of the Program Committee, mechanisms may be employed for reviewers and authors to engage in further dialog during the revision period.

This is a really fabulous idea. There are still issues—you trade off between submitting early in the review cycle (and getting more feedback) and having a long turnaround time to publication, and it’s unclear if people will still wait until the “hard” deadline (March 1) to submit their work. But if it works, this could really revolutionize how we think about both conference and journal publishing.

(Oh, and it appears they have a semi-loose connection/agreement with a journal, encouraging submissions of extended versions of the conference papers, though it’s not clear whether these will be “fast-tracked” at all.)

I will be watching what happens, and hope that the VLDB organizers prepare some sort of summary of what worked well and what could be improved (and whether this actually works!) so that other conferences can adopt and adapt this particular model.

What are you searching for?

If you’re visiting this blog, most likely something about writing conference papers or Moodle, apparently.

I love numbers and stats and trends and all that fun stuff, so about once every other week I take a look at my blog stats.  I like to see where people are coming from (referrals), what posts they’re reading, etc.  (My favorite discovery:  If you put “getting things done” in your blog title, you get a lot of hits!)

The most fun part, though, is looking at the search terms people use to get to this here blog.  So, what are the most popular search terms of all time for this blog?*

  1. Searches for me directly (N=59).  This was by far the most popular search item.  I had to laugh, though, about the 2 people who found me by searching for my URL.  Um, if you already know my URL….oh, never mind.
  2. Moodle (N=37).  People came here to complain or learn about font sizing in Moodle.  Well, at least I fulfilled the first wish…
  3. How to write a conference paper (N=34).  Um, yeah, I hope those people weren’t looking for actual advice on that one…
  4. This is what a computer scientist looks like (N=16), and variations thereof like “how to look like a programmer” and “how to look like a scientist”.  I’m probably not the world’s authority on either, since no one believes me anyway when I tell them what I do for a living!
  5. Barbie-related searches round out the top 5 (N=14).

And here are some of my favorite random searches that led people here:

  • why do computer scientists like mountains.  Good question!  I know I like mountains, but I can’t speak for all computer scientists…
  • becoming a computer scientist when older.  Not sure how I am the authority on this, since I went straight through to grad school….although I did take a bit of a break before becoming a professor…
  • good problems to talk about.  I kinda dig this one.  I hope these 2 people found something to talk about!  (But as we all know, writing good problems is hard.)
  • innovative female design.  Again, another cool search term, and I wish I had more interesting links than these—but I guess it’s a start…

However you got here, whatever led you to this blog, thanks for stopping by and reading!

(And on a totally unrelated note:  I’ve been very bad about putting up any sort of blogroll, but I’m going to put one up Real Soon Now.  So if you’re a regular or semi-regular or hey, even a drive-by reader and you’d like to be on the blogroll, leave a comment.  Thanks!)

* I’m not exactly sure how WordPress calculates these—for instance, none of the GTD-related searches are showing up in these stats, and that’s been a fairly popular search term this week.  So these may be a bit inaccurate or out of date.

Publishing calculus

For those of you reading this blog who are not academic computer scientists:  In CS, most of the publishing is done through conference proceedings.  Conference submissions, unlike in many other fields, consist of full-fledged papers which undergo a single cycle of peer review with an up-or-down decision at the end; these full papers are then published as such in the conference proceedings.  The conference paper cycle is preferred because the time-to-publish is much, much faster than for journals, which is much better suited to the fast-paced nature of CS research.  However, journal publications are still required for tenure and promotion at most places.  And, at least according to conventional wisdom, journal articles are seen as more “complete” records of research results (often a journal article will combine and build upon results from several conference papers).

Because the journal review and publication cycle can be so slow compared to the conference review and publication schedule, conferences have become highly competitive—in fact, the top conferences in CS, most would argue, are more selective and more prestigious than most CS journals.  The slow journal publication timeline, some argue, has led to the proliferation of CS conferences (and the reduced value of attending conferences, which in many cases these days consist mostly of those presenting papers), which, some argue, leads to even slower timelines for journal publishing.  (There was a lot of discussion around the blogosphere and in the Communications of the ACM about this very issue last year [see these editorials]—John Dupuis at Confessions of a Science Librarian has a great set of posts summarizing the discussion here and here.)

This leads to some interesting calculus when it comes time to publish and submit some results.  If something is brand-new and never published, clearly it goes to a conference.  Conventional wisdom might say that if it’s building upon something you’ve published at a conference, or building upon several other papers, then send it in to a journal.  Or should you, particularly if you know it might be years before your paper sees the light of day, if at all?  Should a journal still get your best and most complete work, or is it worth instead sending it to a highly competitive conference?

I currently have a journal article under submission.  I originally submitted it in 2008.  It has already gone through three cycles of review (original submission plus 2 revisions), and yet it is no closer to being published today than it was 2 years ago.  The main contributions of the work have already been disseminated via a couple of conference publications, but there is still substantial new work represented too—although this work is now more than 2 years old.  The project has moved well beyond what’s represented in that journal article.  And yet, it continues to live in that special purgatory—not rejected, yet refusing to be accepted.

At this point, I will probably submit it for one more round of review.  I could submit it to another journal, but there are problems there, too.  I’d probably be looking at another couple of years to a decision, and I have no idea if another journal would be more or less likely to accept this article for publication.  Plus I’d have to deal with a whole new set of reviewers and editors, some or all of whom might have much different ideas about how I should present and frame my work for their journal.  Also complicating things is the fact that my work straddles just enough subfields, and is unconventional enough, that finding an appropriate journal is tricky.  (Since my research falls into subfields X, Y, Z, and a bit of Q, X journals often say “this is really Y work”, Y journals say “nope, this is Z work”, etc.)

So the thought has crossed my mind, more than once, that perhaps I should forget about publishing this work in a journal and just repurpose it into one or more conference papers, and target highly selective conferences.  That way I still get “credit” for publishing in a top location without the super-long peer review cycle.  The fact that I’m even considering this shows you how weird and messed up the whole publishing model in CS has become.

The problem is that even though I have tenure, I still do need the journal pubs if I want to be promoted to full professor.  So most likely I’ll continue to jump through the hoops to get my work published in a journal—even though by the time that happens, if it happens, the work will be out-of-date.

The question is:  will this article be published before I’m ready to send out my next journal article?  I wish I had a more definitive answer than “maybe”.

Self-sufficiency vs. getting things done

It’s Week 4 already for my research students, and summer research is in full swing.  My students have spent much of their time so far working on some data collection and data analysis tasks, and had their first opportunities to do some technical writing (which will be good practice for the conference paper they will write at the end of the summer).  They’ve learned a few new programming languages (Java, Perl) and modified code written by others (and not very well commented).  They’ve hit a lot of dead ends and experienced a lot of “huh, that’s odd” moments with the data.

In short, they’ve had a pretty full research experience already!

As a research mentor, I’m facing a couple of key challenges at this point in the summer:

  1. Helping my students to become more self-sufficient, while still encouraging them to ask questions
  2. Carving out time to get my own work done, while still being an effective mentor to my students

The first challenge is particularly tricky when working with students who are not familiar with the whole “doing research” drill.  Doing research means that we’re working with questions without answers, essentially working at the edge of our knowledge of a field.  Students are used to problems that have “correct” solutions, and are used to having someone to turn to (a professor, a classmate, the almighty Wikipedia) for that answer.  Students new to research will sometimes become paralyzed with the unknown, and instead of trying something will either do nothing, or will ask questions too often (confirm everything before trying, for instance).

My responsibility is to get them to try out ideas before asking me questions—in essence, getting them to trust their instincts, and to develop a research instinct in the first place.  So, for instance, when a student comes to me and says “the program doesn’t work and we don’t know why”, instead of running down to the lab, I’ll ask if they’ve isolated the problem to a particular section of the code.  If they’ve isolated the problem, I’ll give them a few things to try—or, better yet, I’ll ask them “what do you think you should try next?”  By not rushing to bail them out, and by forcing them to confront their demons, they become more willing and able to do the initial legwork on their own, and end up coming to me with more interesting questions because they’ve already answered some of them on their own.

The flip side is knowing when to spend the time walking them through the answer.  This afternoon, for instance, I decided to sit down with one of my students and walk her through a particularly tricky section of code which had to be modified.  Sure, I could have said “here is the API, figure it out yourself”, but I sensed that both of us would be more productive if I put in the face time with her.  And it turned out to be the correct thing, because we discovered the code in question is quite inefficient, and put fixing the code on our to-do list.

This leads, though, to the second challenge:  working with 4 students, I am less in control of my schedule than I’d like, because I never know when I’m going to have to spend the afternoon in the lab with them, or when a technical problem they can’t fix is going to crop up.  I am queen of the daily to-do list, but lately more and more items are not getting done and instead migrate from daily list to daily list.  I need to carve out some time during which I know I won’t be interrupted so that I can work on the bigger/more intellectually challenging tasks, and balance those with the smaller, more deadline-driven tasks.  I’m also experimenting with putting specific blocks of time on my calendar for specific tasks; we’ll see if that works better this week.  I think I also have to be a bit more realistic about what can get done in a week, and be satisfied with that.

The challenge in the coming weeks, for me and my students, is maintaining momentum and morale, as the project intensifies and the problems get harder.  Hopefully we’ve laid a solid enough foundation in these first few weeks to maintain both.