Databases and Formats in Transportation
Posted: December 12, 2011
I have had quite a few meetings recently and talked to a lot of people, including government and municipal engineers and researchers in the Montréal region, and it is easy to get a bit desperate. Everyone needs data, everyone has datasets and databases, but no one seems to care enough to make the effort of organizing the data so that everyone could exchange it back and forth, enriching common datasets.
Here is what happens instead: for a given project, a researcher will obtain a copy of a dataset from a public organization, say traffic volumes, import it into their own system, organize it, do some analysis, and that’s it. Some thought probably went into organizing the data for that specific goal, and the dataset was probably expanded one way or another for the complete analysis. Meanwhile, the public organization went its own way, collecting more data and putting it (or not) in its own system, maybe even modifying or cleaning up the data it had previously shared. So now you have two diverging datasets, each with added value, typically in different formats, that will never be reconciled. With the next project, it starts all over again. And voilà! Your tax dollars at work…
Yes, everyone is busy, everyone has deadlines, but what kills me is that when you hint at the problem and offer to actually try to do something about it, to think for a second and bring some consistency, you are quickly dismissed as an idealist. “That would be nice, but we have work to do.” But that, organizing datasets and agreeing on data exchange formats and protocols, is the work!
That is why I support open standards and data: choose an open format, clean the data minimally and make it public. Actually engage with the community, find use cases, and iterate to find a data organization and format that would meet most needs. Montréal has made some progress, but there is still a long way to go, and public practices, including academic practices, are unfortunately part of the problem.