Saturday, April 25, 2009

Collaborative coding

I would promote this idea to a higher level than just 'snippet sharing'. I can envision something called collaborative coding.

Collaborative coding embraces uniquely identified code snippets but enhances them with a modification timestamp and user comments.

Say,

/* Calculates the fibonacci number for the parameter
guid: {1c125546-b87c-49ff-8130-a24a3deda659}
date: 2009-04-25 19:00
comm: First version */

int fibonacci(int n) {
    if (n <= 2) return 1;
    else return fibonacci(n - 1) + fibonacci(n - 2);
}

/* Calculates the fibonacci number for the parameter
guid: {1c125546-b87c-49ff-8130-a24a3deda659}
date: 2009-04-25 19:25
comm: Can't use recursion as my stack is too small, changed to an iterative version */

int fibonacci(int n) {
    if (n <= 2) return 1;
    int j = 1;
    int k = 1;
    int ans = 0;
    for (int i = 3; i <= n; i++) {
        ans = j + k;
        j = k;
        k = ans;
    }
    return ans;
}

Tools could then show diffs between the different versions of a snippet over time, along with the comments (if available), and could push the snippets back to the available repositories.

You could actually end up having conversations in code. The identifier ties together all versions of a function, and diff algorithms can show what has changed and, hopefully, why.
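
A minimal sketch of how such a tool could work, assuming the header format above (the Snippet class, the regular expression and the grouping logic are all my own invention, not an existing tool):

using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;

class Snippet
{
    public Guid Id;          // uniquely identifies the function across versions
    public DateTime Date;    // timestamp of this version
    public string Comment;   // the author's comment for this version
    public string Code;      // the code that follows the header
}

static class SnippetHistory
{
    // Matches headers of the form:
    // /* ... guid: {...} date: yyyy-MM-dd HH:mm comm: ... */ followed by code
    static readonly Regex Header = new Regex(
        @"/\*.*?guid:\s*\{(?<guid>[^}]+)\}\s*" +
        @"date:\s*(?<date>[\d-]+ [\d:]+)\s*" +
        @"comm:\s*(?<comm>.*?)\*/\s*" +
        @"(?<code>.*?)(?=/\*|\z)",
        RegexOptions.Singleline);

    // Groups every version of each snippet by GUID and orders them by date,
    // so a diff tool can walk the history chronologically and show the comments.
    public static Dictionary<Guid, List<Snippet>> Parse(string source)
    {
        return Header.Matches(source).Cast<Match>()
            .Select(m => new Snippet
            {
                Id = new Guid(m.Groups["guid"].Value),
                Date = DateTime.ParseExact(m.Groups["date"].Value,
                                           "yyyy-MM-dd HH:mm",
                                           CultureInfo.InvariantCulture),
                Comment = m.Groups["comm"].Value.Trim(),
                Code = m.Groups["code"].Value.Trim()
            })
            .GroupBy(s => s.Id)
            .ToDictionary(g => g.Key, g => g.OrderBy(s => s.Date).ToList());
    }
}

Feed it the text of the two versions above and you would get a single entry keyed by that GUID with two Snippet versions, ready to be diffed and annotated with the comments.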

Friday, April 24, 2009

WWW2009

I was lucky enough to get funding to go (without having submitted a paper or being an active participant in the research community) to the last day of WWW2009 in Madrid!

Lucky indeed, as I had never been to such a global and well-organized conference before. I'm sure there were some problems (the people in charge apologized in the closing ceremony), but I personally didn't see any.

I went to the following talks:

- Invited talk about web infrastructure by Pablo Rodríguez, the Internet Scientific Director at Telefonica. He explained the problems of the current internet architecture, which mainly consist of obsolete, generic protocols that make content distribution hard. He suggested specializing protocols (routing content, not hosts) and pushing things down a layer or two (giving every router a terabyte of storage to do an Akamai-for-the-people).

- Invited talk about web search in web 2.0 by Ricardo Baeza-Yates, VP of Research for Europe and Latin America at Yahoo!. He showed a pretty interesting summary of the recent research done by him and his team on searching text/images using mainly user-generated content (tags, tags of objects in pictures). Summing up, using user-generated content can enhance search results significantly.

And some highlights from the papers presentations:

- Social searching: Porqpine

- Synonym extraction: Efficient ways to detect that queries like "vaio 720 laptop" and "vaio 720" are synonyms

- A framework to manage digital rights: Interesting idea and a good licensing model, but sadly it requires everyone to use a centralized website and to download an extra application to access the actual content. It just doesn't pass the grandma test.

- A reference implementation of the Web Coverage Processing Service standard, Rasdaman (plus some web tools that use it, to justify its presence at WWW)

- Leveraging web search engines to query databases: A clever approach to enhance search results with structured database results, typically useful when searching for entities that usually live in structured DBs (movies, electronic devices and so on)

- YUI 3. I wasn't present at the talk, but it looks nice

All in all, it was a very good experience, and I now have a CD with 215 PDF files to review (between papers and posters)... that's some hard work! If I manage to get through any of it and find something worthwhile, I'll comment on it. Pics here.

Saturday, April 18, 2009

Bach's passion

It's amazing how a good communicator can inspire. Especially a passionate communicator who shows how much he cares about the subject at hand. If you haven't seen this, you should: James Bach's talk @ Google.

What I found really interesting is the combination of subjects he manages to make relevant for his field. We all should follow his lead and find inspiration and guidance from more than a single discipline.

Learn not just about technology, but also about the social sciences that make the technology relevant and make developing it easier. For example, if you could communicate better and understood the limits of what people can perceive, you would be able to gather better requirements, capture bug reports more efficiently, or just develop humane software (*). Not to mention the applications outside the world of software (love life, friends, networking your career).

In fact, this all points to aspiring to become a generalist: the more you can apprehend about different subjects, the more you can then combine in different ways under different contexts. Current schooling does not aim at this, but you can always take care of your own education (in addition to or instead of the regular kind, your call). In the words of the great Paul Lutus, "do not let your schooling interfere with your education".

(*): More on humane software some day.

Monday, March 30, 2009

The case against the case against everything buckets

Alex Payne writes a good piece about how everything buckets are awful.

He defines everything buckets as applications where you can store every imaginable piece of data and then search over the RTF or PDF documents they generate. Or, in his own words:

These applications claim to be “your outboard brain” or “your digital filing cabinet” or similar. They go by many names: Yojimbo, Together, ShoveBox, Evernote, DEVONthink. There may be differences in their implementation and appearance, but these applications are all of the same sinister ilk. They are Everything Buckets.

He then continues on how these kinds of applications suck because they try to do many things at once and fail miserably at most if not all of them. Additionally, he says filesystems already have all the features that let you organize all the information you want (you can even have tags via symlinks, he stresses). There's no need to pay for these lousily implemented, do-many-things-wrongly apps.

Computers like structured data, and so filesystems are a good way to store hierarchical information without the expensive indexing tasks that current operating systems usually perform (Spotlight, Beagle, and so on).

This is all good and well, though there are people who think hierarchical filesystems suck.

Anyhow, even if everybody agreed that current filesystems are Sliced Bread 2.0, this well-constructed argument misses a pretty important consideration, which is that people want the computer to organize their information for them, even if current applications blow. Alex is too quick to dismiss the concept just because there is no good implementation of it.

All these arguments could just as well have been used against web search 12 years ago, and I need not mention what happened to it. A good everything bucket would not prevent you from doing work with your computer when you want to, nor would it corrupt your data; additionally, it would cross-reference the data and automatically file what you throw at it into relevant buckets with little user input.

Alex's advice is right in that, as of now, you are better off trying to organize your information manually via the filesystem (YMMV), but he's wrong that you will never be able to have a good everything bucket application, and even more wrong when he says everything buckets shouldn't exist.

The need certainly exists. The fact that current personal computers suck at managing unstructured information and take too much time creating structure from the chaos is merely accidental.

Let us create a good Everything Bucket soon enough.

UPDATE: As always, I arrive late to the party... and, of course, some people like current everything buckets. I still think they can and should do a lot better.

Monday, March 9, 2009

Excel Interop COMException HRESULT 0x800A03EC via C# (.NET 2.0)

Today I'm writing some code to read Excel files via the interop assemblies. I was reusing working code and it suddenly started failing with a COMException whose error code was HRESULT 0x800A03EC.

I'm using the following code:

// assumes: using Microsoft.Office.Interop.Excel; (for the Workbook and Worksheet types)
string excelLocation = @"C:\test.xls";
Microsoft.Office.Interop.Excel.Application _app = new Microsoft.Office.Interop.Excel.Application();

Workbook wbook = _app.Workbooks.Open(excelLocation,
Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing,
Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing,
Type.Missing, Type.Missing, Type.Missing, Type.Missing);

Worksheet sheet = (Worksheet)wbook.Sheets[1];

/** rest of the code **/

and I get the COMException in the line where I open the workbook.

So I open the file in regular Excel and find out that Excel considers it somewhat corrupt: it tells me it has fixed some font and sheet-name problems with the file and that, if I want to make that fix permanent, I should save the file. I save it and then the code works!

But wait, this is not the proper solution if you want truly automated processing of the files. Given that I don't control the Excel generation process, I cannot fix the corruption on the source side of things, so I'll have to fix it on my side.

I started digging into the Open method parameters and found that the last one is suggestively called 'CorruptLoad' (documented as taking XlCorruptLoad values such as xlRepairFile, though passing true worked for me). So, the solution:

string excelLocation = @"C:\test.xls";
Microsoft.Office.Interop.Excel.Application _app = new Microsoft.Office.Interop.Excel.Application();

Workbook wbook = _app.Workbooks.Open(excelLocation,
Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing,
Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing,
Type.Missing, Type.Missing, Type.Missing, true);

Worksheet sheet = (Worksheet)wbook.Sheets[1];

/** rest of the code **/

And that opens it without throwing an exception and gets all the data in the sheets, at least for this particular type of file corruption.
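
One caveat when automating this (not shown above, and just my usual cleanup rather than part of the fix): if you don't close the workbook and quit the Application object, orphaned EXCEL.EXE processes tend to pile up. Something along these lines at the end of the processing:

/** once done with the data **/

// Close the workbook without saving and shut Excel down
wbook.Close(false, Type.Missing, Type.Missing);
_app.Quit();

// Release the COM wrappers so the Excel process can actually exit
System.Runtime.InteropServices.Marshal.ReleaseComObject(sheet);
System.Runtime.InteropServices.Marshal.ReleaseComObject(wbook);
System.Runtime.InteropServices.Marshal.ReleaseComObject(_app);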

Tuesday, January 13, 2009

Wondered this too?

I had always wondered why you only see about 3 GB. The usual simplified explanation is that you need a 64-bit OS to see the full 4 GB.

The real explanation is here