12.28.2011 15:01
Research Tools video makes 1000 views
Wow... I can't believe my research
tools video #1, RT-1 emacs keyboard introduction,
just made it to 1000 views. That's not quite the 3 million for the
introduction of Google
Oceans or the 167 thousand views of Jenifer doing the voice
over for New Seafloor in
Google Earth Tour, but I still am really impressed.
So, if you've watched the video... Thank You! It means a lot to know that the work I put into it is being used by many.
12.27.2011 11:51
Beidou navigation system
China Begins Using New Global Positioning Satellites
[slashdot]
Seriously? From the wikipedia entry on the Chinese version of GPS:
- A signal is transmitted skyward by a remote terminal.
- Each of the geostationary satellites receive the signal.
- Each satellite sends the accurate time of when each received the signal to a ground station.
- The ground station calculates the longitude and latitude of the remote terminal, and determines the altitude from a relief map.
- The ground station sends the remote terminal's 3D position to the satellites.
- The satellites broadcast the calculated position to the remote terminal.
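Reading those steps, it is the relief-map altitude that closes the geometry with only two satellites. Purely as a toy illustration of the ground-segment solve (this is not the actual Beidou implementation; the constants, times, and function names below are all made up):

import numpy as np
from scipy.optimize import fsolve

C = 299792458.0      # speed of light (m/s)
R_EARTH = 6371000.0  # mean Earth radius (m)

def locate_terminal(sat1, sat2, t1, t2, t_transmit, altitude):
    """Toy solve: sat1/sat2 are satellite ECEF positions (m), t1/t2 are the
    arrival times of the terminal's signal at each satellite (s), t_transmit
    is when the terminal transmitted (s), and altitude comes from a relief map (m)."""
    range1 = C * (t1 - t_transmit)
    range2 = C * (t2 - t_transmit)
    def residuals(p):
        p = np.asarray(p)
        return [np.linalg.norm(p - sat1) - range1,
                np.linalg.norm(p - sat2) - range2,
                np.linalg.norm(p) - (R_EARTH + altitude)]
    # Start the search from the sub-satellite point of the first satellite.
    guess = sat1 / np.linalg.norm(sat1) * R_EARTH
    return fsolve(residuals, guess)  # ECEF position of the remote terminal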
The general global navigation satellite system (GNSS) (wikipedia)
12.26.2011 14:25
RT Video 20 - Secure Shell (ssh), crontab and Emacs tramp
Grades went in yesterday for the
Research Tools course. I have many more videos that I would like to
make. Once I get to Google, I will likely be making a separate playlist from the Research Tools 2011 YouTube Playlist.
I would kill for feedback (positive, negative, or otherwise) on any part of the class. Especially helpful would be pull requests for things in mercurial!
video/video-20-secure-shell-ssh-sftp-scp.org
12.19.2011 22:50
Dawn Wright GIS classes at OSU
I decided to try to watch at least one class every day of Dawn Wright's class: Geographic Information Systems and Science - GEO 580, OSU Geosciences (2010) (on iTunesU). Dawn is now at ESRI as Chief Scientist. I haven't ever had the chance to meet her. As a totally unrelated point, I finally got to meet Jim Bellingham at AGU this
year. I was supposed to meet him back 12 years ago or so, but AUV
operations and other things caused us to not cross paths till now.
As is typical of AGU, I can't remember where it was or why I met
him.
So far I'm 1.5 classes in and not sure what to think of it. I cowrote a GRASS GIS tutorial at NASA Ames in 1993 and was an Arc/Info user in the 1994-5 time range at the USGS. It might be good for me to think through the material that she presents in her class, especially since she has a very different style than I do.
She also has these two courses on iTunesU: Geographic Information Systems and Science - GEO 465_565 2009 and Geovisualization Lecture Series - Winter Geography/Geology Seminar 2009.
I gave it a go trying to get my Research Tools course on iTunesU for UNH, but all I got was a response from the web team saying that it would take them an entire day and they didn't know who at UNH would pay for it (???). I guess I should just try to set up an iTunes podcast channel when I get some time to put my lectures into someplace they can be searched.
12.17.2011 07:22
Tami's CCOM seminar
Yesterday, Tami presented her master's research project as a part of the CCOM seminar. Over the last couple of months, Tami has gotten really good at using python to process full waveform single beam sonar. I watched her presentation over the IVS 3D GoToMeeting channel and grabbed a couple of screenshots that I am posting to show off what Tami did. (Tami, I hope you don't mind!)
She did a great job of creating an algorithm in python to
pick the top of the eel grass and the "hard" bottom in her 2011
survey lines.
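To be clear, the sketch below is NOT Tami's algorithm. It is only a toy illustration of the general two-pick idea (first strong return for the top of the eelgrass, strongest return after that for the "hard" bottom); the threshold factor, sample rate, and sound speed are made-up placeholders:

import numpy as np

def pick_returns(envelope, sample_rate_hz, sound_speed=1500.0, noise_factor=5.0):
    """Return (eelgrass_top_m, hard_bottom_m) picked from one ping's envelope."""
    envelope = np.asarray(envelope, dtype=float)
    threshold = noise_factor * np.median(envelope)    # crude noise floor estimate
    above = np.nonzero(envelope > threshold)[0]
    if len(above) == 0:
        return None, None
    first = above[0]                                  # first strong return: canopy top
    strongest = first + np.argmax(envelope[first:])   # strongest return: "hard" bottom
    meters_per_sample = sound_speed / (2.0 * sample_rate_hz)  # two-way travel per sample
    return first * meters_per_sample, strongest * meters_per_sample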
12.16.2011 16:21
YouTube machine transcriptions
Okay, this is super awesome! I just
noticed in my YouTube videos that there is now a machine
transcription that you can download. I should be able to edit this
to make it correct and it is way easier to fix something like this
than to create one from scratch. Additionally, this means that I
will end up with the text for my videos. Now I just need to finish
the class so I get some time to work on the transcriptions. Some of
this is highly entertaining!
0:00:01.140,0:00:04.670
pull my name is kurt shillinger and i'll be showing you the game now
0:00:04.670,0:00:06.880
expect senator
0:00:06.880,0:00:10.359
now he max is more than just tech center itself
0:00:10.359,0:00:15.519
pull environment for all sorts of text manipulation passes including
0:00:15.519,0:00:18.250
programming handed nanjing
0:00:18.250,0:00:21.180
have projects in other tasks
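Assuming the downloaded file keeps this simple "start,end" timestamp line followed by a text line, pairing them back up for editing only takes a few lines of python (the filename here is just a placeholder):

def read_captions(filename='rt-1-machine-transcript.txt'):
    """Pair up 'start,end' timestamp lines with the text line that follows."""
    with open(filename) as infile:
        lines = [line.strip() for line in infile if line.strip()]
    captions = []
    for timestamps, text in zip(lines[0::2], lines[1::2]):
        start, end = timestamps.split(',')
        captions.append((start, end, text))
    return captions

if __name__ == '__main__':
    for start, end, text in read_captions():
        print start, '-->', end, ':', text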
12.16.2011 10:37
Rick Brennan's talk at CCOM
From the UNH CCOM/JHC Seminar Series
2011-12. NOAA Corps Officer CDR Richard T. Brennan Presents:
"Coastal and Ocean Mapping: A Nautical Cartographer's Perspective".
Talk was given on Friday, December 9th, 2011 at UNH's Chase Ocean
Engineering Laboratory. Awesome that they are turning around videos
so quickly!
https://www.facebook.com/CCOMJHC
Necessary Map Elements:
- Distance or Scale
- Direction (where's north?)
- Legend (what do the colors mean?)
- Source (who produced the map?)
- Position (graticule/border)
http://www.colorado.edu/geography/gcraft/notes/cartocom/elements.html
Nathaniel Bowditch
12.16.2011 06:39
Using Yolk to build fink packages
I've had the idea of using pypi as
the raw source to create a starter fink package info file or to
help with the update process. I finally took a couple minutes to
give it a go based on the Yolk python package. I got a
proof of concept done, but I'm not sure that I am on the right
track. I posted a ticket with the Yolk developer, Rob Cakebread, asking for examples
of how he thinks the yolk API should be used:
https://github.com/cakebread/yolk/issues/2
In looking around, I realized that perhaps yolk isn't the best platform for this. Rob points to g-pypi:
g-pypi creates ebuilds for Gentoo Linux using information in PyPI (Python Package Index).

That is exactly what I am trying to do for fink and it was a 2010 Google Summer of Code project. Looking into g-pypi, I realized that the code is also based on yolk and written by Rob, so yolk looks to be the way to go. The one thing that I was bummed to see is that he is using the Cheetah templating library rather than python's str ".format" method.
I got quite turned around in my initial inspection of yolk, but after a bit, here is what I came up with for two examples. First with SQLAlchemy:
import yolk.pypi as pypi
cs = pypi.CheeseShop()
cs.package_releases('SQLAlchemy')  # ['0.7.4']
cs.query_versions_pypi('SQLAlchemy')  # ('SQLAlchemy', ['0.7.4'])
sa_rd = cs.release_data('SQLAlchemy','0.7.4')
sa_ru = cs.release_urls('SQLAlchemy','0.7.4')
sa_ru[0]['md5_digest']  # '731dbd55ec9011437a842d781417eae7'
sa_ru[0]['url']

The result of the release_urls call consists of just one entry:
[{'comment_text': '',
  'downloads': 4525,
  'filename': 'SQLAlchemy-0.7.4.tar.gz',
  'has_sig': False,
  'md5_digest': '731dbd55ec9011437a842d781417eae7',
  'packagetype': 'sdist',
  'python_version': 'source',
  'size': 2514647,
  'upload_time': <DateTime '20111209T23:31:43' at 10fb28128>,
  'url': 'http://pypi.python.org/packages/source/S/SQLAlchemy/SQLAlchemy-0.7.4.tar.gz'}]

SQLObject gives more complicated responses:
import yolk.pypi as pypi
cs = pypi.CheeseShop()
cs.package_releases('SQLObject')
# ['1.2.1', '1.1.4']
cs.query_versions_pypi('SQLObject')
# ('SQLObject', ['1.2.1', '1.1.4'])
cs.get_download_urls('SQLObject')
# ['http://pypi.python.org/packages/2.4/S/SQLObject/SQLObject-1.2.1-py2.4.egg',
#  'http://pypi.python.org/packages/2.5/S/SQLObject/SQLObject-1.2.1-py2.5.egg',
#  'http://pypi.python.org/packages/2.6/S/SQLObject/SQLObject-1.2.1-py2.6.egg',
#  'http://pypi.python.org/packages/2.7/S/SQLObject/SQLObject-1.2.1-py2.7.egg',
#  'http://pypi.python.org/packages/source/S/SQLObject/SQLObject-1.2.1.tar.gz',
#  'http://pypi.python.org/packages/2.4/S/SQLObject/SQLObject-1.1.4-py2.4.egg',
#  'http://pypi.python.org/packages/2.5/S/SQLObject/SQLObject-1.1.4-py2.5.egg',
#  'http://pypi.python.org/packages/2.6/S/SQLObject/SQLObject-1.1.4-py2.6.egg',
#  'http://pypi.python.org/packages/2.7/S/SQLObject/SQLObject-1.1.4-py2.7.egg',
#  'http://pypi.python.org/packages/source/S/SQLObject/SQLObject-1.1.4.tar.gz']
cs.get_download_urls('SQLObject',pkg_type='source')
# ['http://pypi.python.org/packages/source/S/SQLObject/SQLObject-1.2.1.tar.gz',
#  'http://pypi.python.org/packages/source/S/SQLObject/SQLObject-1.1.4.tar.gz']
cs.get_download_urls('SQLObject',version='1.2.1', pkg_type='source')
# ['http://pypi.python.org/packages/source/S/SQLObject/SQLObject-1.2.1.tar.gz']
so_rd = cs.release_data('SQLObject','1.2.1')
so_rd['download_url']
# 'http://pypi.python.org/pypi/SQLObject/1.2.1'
so_ru = cs.release_urls('SQLObject','1.2.1')

And the release_urls for just one version contains both source and egg file archives.
[{'comment_text': '',
  'downloads': 76,
  'filename': 'SQLObject-1.2.1-py2.4.egg',
  'has_sig': False,
  'md5_digest': '42bd3bcfd5406305f58df98af9b7ced8',
  'packagetype': 'bdist_egg',
  'python_version': '2.4',
  'size': 354432,
  'upload_time': <DateTime '20111204T15:33:09' at 10fb85518>,
  'url': 'http://pypi.python.org/packages/2.4/S/SQLObject/SQLObject-1.2.1-py2.4.egg'},
 {'comment_text': '',
  'downloads': 62,
  'filename': 'SQLObject-1.2.1-py2.5.egg',
  'has_sig': False,
  'md5_digest': '0303afd1580b6345a88baad91a3a701d',
  'packagetype': 'bdist_egg',
  'python_version': '2.5',
  'size': 350388,
  'upload_time': <DateTime '20111204T15:33:11' at 10fb85050>,
  'url': 'http://pypi.python.org/packages/2.5/S/SQLObject/SQLObject-1.2.1-py2.5.egg'},
 {'comment_text': '',
  'downloads': 129,
  'filename': 'SQLObject-1.2.1-py2.6.egg',
  'has_sig': False,
  'md5_digest': 'df4bfd367a141c6c093766e25dea68f3',
  'packagetype': 'bdist_egg',
  'python_version': '2.6',
  'size': 349829,
  'upload_time': <DateTime '20111204T15:33:13' at 10fb85248>,
  'url': 'http://pypi.python.org/packages/2.6/S/SQLObject/SQLObject-1.2.1-py2.6.egg'},
 {'comment_text': '',
  'downloads': 174,
  'filename': 'SQLObject-1.2.1-py2.7.egg',
  'has_sig': False,
  'md5_digest': '0c3df5991f680a7d7ec0aab2c898a945',
  'packagetype': 'bdist_egg',
  'python_version': '2.7',
  'size': 348455,
  'upload_time': <DateTime '20111204T15:33:16' at 10fb85908>,
  'url': 'http://pypi.python.org/packages/2.7/S/SQLObject/SQLObject-1.2.1-py2.7.egg'},
 {'comment_text': '',
  'downloads': 202,
  'filename': 'SQLObject-1.2.1.tar.gz',
  'has_sig': False,
  'md5_digest': 'a01cfd19da6f19ecaf79c2eed88c3dc3',
  'packagetype': 'sdist',
  'python_version': 'source',
  'size': 259127,
  'upload_time': <DateTime '20111204T15:33:07' at 10fb85a70>,
  'url': 'http://pypi.python.org/packages/source/S/SQLObject/SQLObject-1.2.1.tar.gz'}]

That gave me enough to knock out the proof of concept. There needs to be a huge amount of fine tuning to make this a truly useful tool. For example, I need to be able to detect all the binaries that are installed and create the update_alternatives entries. The initial code:
#!/usr/bin/env python
from yolk.pypi import CheeseShop

template = '''Info3: <<
Package: {fink_name}%type_pkg[python]
Version: {version}
Revision: {revision}
Source: {url}
Source-MD5: {md5}
Type: python ({types})
Depends: python%type_pkg[python]
BuildDepends: distribute-py%type_pkg[python]
CompileScript: true
InstallScript: %p/bin/python%type_raw[python] setup.py install --root=%d --single-version-externally-managed
License: OSI-Approved
Homepage:
Maintainer: Kurt Schwehr <goatbar@users.sourceforge.net>
Description:
DescDetail: <<
<<
# Info3
<<
'''

def pypi_to_fink(package_name):
    cs = CheeseShop()
    release = cs.package_releases(package_name)[0]  # Hope that highest is first
    #
    # FIX: make a function that pulls the first sdist
    # Possibly prioritize by tar.bz2 -> tar.gz -> zip
    url = cs.release_urls(package_name,release)[0]
    fink = {}
    fink['name'] = package_name.lower()
    fink['fink_name'] = fink['name']+'-py'
    fink['version'] = release
    fink['revision'] = 1
    # FIX: remove the package name if need be from the URL
    fink['url'] = url['url']
    fink['md5'] = url['md5_digest']
    py_types = ['2.7', '3.2']  # FIX: figure this out
    fink['types'] = ' '.join(py_types)
    return template.format(**fink)

if __name__ == '__main__':
    import sys
    print pypi_to_fink(sys.argv[1])

Here is the result of running it on SQLAlchemy:
./pypi_to_fink.py SQLAlchemy
Info3: <<
Package: sqlalchemy-py%type_pkg[python]
Version: 0.7.4
Revision: 1
Source: http://pypi.python.org/packages/source/S/SQLAlchemy/SQLAlchemy-0.7.4.tar.gz
Source-MD5: 731dbd55ec9011437a842d781417eae7
Type: python (2.7 3.2)
Depends: python%type_pkg[python]
BuildDepends: distribute-py%type_pkg[python]
CompileScript: true
InstallScript: %p/bin/python%type_raw[python] setup.py install --root=%d --single-version-externally-managed
License: OSI-Approved
Homepage:
Maintainer:
Description:
DescDetail: <<
<<
# Info3
<<

That is not the best info package for fink on the planet, but it's a great start. If I want to keep participating with fink, I need to get smarter about testing and automation.
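The FIX about pulling the first sdist could be a small helper along these lines (a rough, untested sketch; the tar.bz2 -> tar.gz -> zip preference comes straight from the comment in the code above):

def pick_sdist(cheese_shop, package_name, release):
    """Pick a source distribution from release_urls, preferring tar.bz2, then tar.gz, then zip."""
    urls = cheese_shop.release_urls(package_name, release)
    sdists = [u for u in urls if u['packagetype'] == 'sdist']
    for suffix in ('.tar.bz2', '.tar.gz', '.zip'):
        for url in sdists:
            if url['filename'].endswith(suffix):
                return url
    return sdists[0] if sdists else None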
12.14.2011 14:09
Research Tools on Hacker News
I just got a nice note from Jeffrey
Morin at MIT that my Research Tools class made it onto the Hacker News homepage today:
Research
Tools class (unh.edu). I have to admit that while I follow
slashdot and briefly followed digg, I hadn't seen Hacker News
before, so I don't really know what this means.
Thanks to Steven Bedrick for the post. He has an interesting class that is also similar to mine: Scripting for Scientists
For my little corner of the web, this is a lot of hits:
cd /var/log/apache2
grep 'rt/ HTTP\|esci895-researchtools/ HTTP' other_vhosts_access.log | grep '14/Dec' | grep -v icon | wc -l
4841

Not that much compared to the many 10's of millions when I work a spacecraft mission, but that's pretty awesome for a class covering emacs, python and the like.
My class:
http://vislab-ccom.unh.edu/~schwehr/Classes/2011/esci895-researchtools/
12.13.2011 20:27
Beyond (or before?) the mouse
At AGU, Bernie Coakley and I were
discussing education and he pointed me to a class at the University
of Alaska by Ronni
Grapenthin:
Beyond the Mouse 2011 - The (geo)scientist's computational chest. (A Short Course on Programming)
Ronni has published a short article about the class in AGU's EOS. (AGU membership required).
The class is similar to my Research Tools Course that I have now finished teaching and need to finish the grading for.
01_thinking_programs.pdf
12.11.2011 14:11
Open Access to research literature
Request for Information: Public Access to Peer-Reviewed Scholarly
Publications Resulting From Federally Funded Research (Federal
Register)
Danah Boyd posted on g+ and Save Scholarly Ideas, Not the Publishing Industry (a rant)
Q2: What are the five things that you think that other scholars should do to help challenge the status quo wrt scholarly publishing?

So what have I done?
- My papers are online on my site
- I have a paper in progress that is in a public DVCS: Toils of AIS (Sorry ESR! I need to work on it ASAP!)
- I post a good amount of my research on my blog
- I have my Google Scholar Citations and JabRef HTML pub list with BibTex entries
- I'm going to work for Google on Ocean, where I hope to show off all sorts of peoples' research
- I post my source code and class notes online with open source licenses: schwehr on github and schwehr on BitBucket
- My research tools class has videos and audio from lecture under a Creative Commons license
I think that a reasonable alternative to the current closed journals would be for me to retain the copyright and allow the journal exclusive access (e.g. not even my own web page) to my paper for 1 year. After that, I release my paper under a Creative Commons license.
Open Access on Wikipedia
12.09.2011 07:49
Jane Lubchenco at AGU, Scientific Integrity
Jane Lubchenco gave a talk this week
at AGU:
Predicting and Managing Extreme Events (Full text). As a part
of that, she started with "NOAA'S SCIENTIFIC INTEGRITY POLICY" and
there was a press release,
NOAA issues scientific integrity policy, that links to NOAA Scientific
Integrity Commons. That last page links to
Scientific%20Integrity%20Policy_NAO%20202-735_Signed.pdf, which
sadly does not have OCR'ed text to go with it.
12.08.2011 14:08
GeoMapApp was missing from Research Tools
I am bummed that I didn't get a
chance to bring Andrew Goodwillie up to UNH for the traditional
class on GeoMapApp. I found Andrew at AGU today manning the
Integrated Earth Data Applications
(IEDA) booth. Same people at LDEO, but a new umbrella name. I
was hoping to ask him if I could upload the webinar that he, Laura
Wetzel, and a bunch of other folks recorded: MARGINS Data in the
Classroom: Teaching with MARGINS Data and GeoMapApp
(2009).
Instead, Andrew pointed me to a whole slew of videos that he uploaded to YouTube three weeks ago! What excellent timing! He put 22 short videos up on the GeoMapApp channel. I made a playlist of them so that you can watch all 1 hour and 47 minutes of video in one go (if you are up for absorbing that much material in one sitting, I'll be seriously impressed!)
GeoMapApp YouTube Playlist
So get watching! If you are an ocean mapping or marine geology/geophysics student, this is a tool that I consider a "must know."
And I noticed Ramon Arrowsmith at ASU is a subscriber to the GeoMapApp channel. Ramon was finishing his PhD at Stanford while I was an undergrad building user interfaces for finite element and borehole software in Dave Pollard's lab.
It looks like Ramon has some great videos up on his YouTube channel, especially if you are learning ArcGIS. For example:
12.08.2011 07:13
BitTorrent for ships - revisited
Val asked me to explain my thoughts
on why the BitTorrent
protocol might be good for research ships. I'd love to hear
other thoughts about how to most effectively use the limited sat bandwidth without destroying the other critical uses of that link, while allowing for bursts of data over other routes when possible.
I thought quite a bit about this back in 1996 / 1997 when I wrote the first version of a protocol to blast image data back over a double hop geosync satellite link from the Nomad Rover in the Atacama desert of Chile back to 2-3 ground control centers at NASA Ames (California), Carnegie Mellon (Pittsburgh, PA), and I forget the other locations. I very much regret not having written a journal paper back then about what Dan Christian and I got working.
My original design was using multicast UDP with RTI's Network Data Delivery Service (NDDS; now DDS). The robot was running my code on VxWorks and there were 5-10+ Sun Solaris and SGI Irix 5.x workstations consuming the multicast stream. I had a good sized ring buffer of image blocks. As images were captured, they were dropped into the ring buffer. Another thread sent out packets at a good clip, but we made sure to never 100% saturate our dedicated satellite links to reserve space for command and control data. End nodes could then send very small retransmit request packets and the robot would resend data if it was still in the ring buffer. This meant that when the network degraded, we would get some images with unknown (aka black) blocks, but the system was always moving forward. When the data loss rate was low, we spent almost no time on the resend process.
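Purely as a toy sketch of that ring-buffer-plus-retransmit idea (this is not the original NDDS/VxWorks code; the names and the ring size are made up):

import collections

class ImageBlockSender(object):
    """Toy version of the ring-buffer / retransmit-on-request scheme."""

    def __init__(self, ring_size=1024):
        self.ring = collections.OrderedDict()  # block_id -> payload
        self.pending = collections.deque()     # block ids not yet sent
        self.ring_size = ring_size

    def queue_block(self, block_id, payload):
        self.ring[block_id] = payload
        self.pending.append(block_id)
        while len(self.ring) > self.ring_size:
            self.ring.popitem(last=False)      # the oldest block falls out

    def send_some(self, send, max_packets):
        # The caller picks max_packets so the link never fully saturates,
        # leaving headroom for command and control traffic.
        for _ in range(min(max_packets, len(self.pending))):
            block_id = self.pending.popleft()
            if block_id in self.ring:
                send(block_id, self.ring[block_id])

    def handle_retransmit_request(self, block_id, send):
        # A tiny request packet from a ground station: resend only if the
        # block is still in the ring; otherwise that image keeps a hole.
        if block_id in self.ring:
            send(block_id, self.ring[block_id])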
With bigger machines and huge disks on ships, we can now expect things like BitTorrent to get 100% of the data correctly to the end points. But with satellite links, there is a lot of very specific knowledge that can help you tune packet size and transmission rate. Is each link section going over a reliable tunnel (e.g. between a ground station and the satellite)? Can we balance packet overhead with the BER (bit-error-rate) on the link to pick the optimal packet size? Is there packet fragmentation happening at any point that means we might as well send smaller packets?
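A back-of-the-envelope sketch of the packet size question (the 40 byte header and the BER values are made-up examples, and this ignores fragmentation and retransmit costs):

def expected_goodput(payload_bytes, header_bytes, ber):
    """Fraction of raw link capacity that arrives as clean payload."""
    packet_bits = 8 * (payload_bytes + header_bytes)
    p_ok = (1.0 - ber) ** packet_bits                 # the whole packet must arrive clean
    efficiency = payload_bytes / float(payload_bytes + header_bytes)
    return efficiency * p_ok

def best_payload(header_bytes=40, ber=1e-6, sizes=range(64, 9001, 64)):
    return max(sizes, key=lambda n: expected_goodput(n, header_bytes, ber))

if __name__ == '__main__':
    for ber in (1e-7, 1e-6, 1e-5):
        print ber, best_payload(ber=ber), 'byte payload'

Even this crude model pushes you toward much smaller packets as the bit error rate climbs.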
Lots of interesting things going on and trade offs to weigh!
What I wrote:
I look forward to the day when data is never again shipped more than a couple hundred feet by DVD/flashdrive/USB Drive/Tape (or maybe a helicopter flight from a ship in the ice to McMurdo). Our data is not a Washington Mutual armored-truck-sized data center migration (a story I very much enjoyed hearing).
My thoughts so far...
Benefits of a BitTorrent style setup:
- Designed for networks that frequently go up and down. rsync over ssh will work, but not great.
- It's okay with transferring bits and pieces.
- If you hit a high bandwidth spot like wifi or wired in a harbor, it will happily ramp up the bandwidth.
- Think Barrow, AK... it could do a fast dump to a local shore client when a ship came in range, and now you have two seeds getting data back through the system to the national archives.
- This can be hands off... the ship just sends the file definition / seed info over the sat link and then the archive sites can monitor the progress and the ship slowly and sporadically sends the files in no particular order.
- If two ships passed using SWAP, it is possible to pass data to a vessel that will hit higher bandwidth internet sooner.
- Again with SWAP, if you have one ship with a much bigger sat pipe, BitTorrent could totally handle the UUCP like job of bouncing the data to shore.
- What you don't want (that is in the current spec) is to have more than 1 or 2 clients pulling that data. Having clients "swarm" the ship's network needs to be prevented.
- Most IT groups will say "no way" to anything peer-to-peer without ever looking beyond the first mention of it in a proposal.
- Use UDP to send, and I'm not sure if it should be TCP or UDP to "ack" blocks
- Have a configurable or dynamic UDP packet size to best work with the specific satellite configuration and bit error rate
- Can listen to the ship's systems for hints about how much sat bandwidth to try to use
- Being able to know how many clients to send data to based on the route to that client (e.g. over sat versus land/sea based WAN)
- Evaluate if multicast could reduce the number of retransmits needed by increasing the chance that someone would get the packet and ack it
- Automate the seed info handling and distribution
- Can we use this channel to more effectively update ship board software systems? e.g. data delivery to ships
- A priority system to make sure that the most important data files or data types get to shore first (and I would imagine that will always be changing based on the vessel science goals)
I'd be interested to hear what others think about this (good/bad/otherwise :). I don't have available engineering time to work on such a system, but it would be great to flesh out the idea and then we can have it out there with the chance that an enterprising Computer Science grad student in networking might take it on as a thesis project.
Update 2011-12-08: Dan Christian followed up (g+ post) with a number of suggestions, including the UDP tracker protocol - "a high-performance low-overhead BitTorrent tracker protocol" - and DCCP and ECN. Dan has something cool up his sleeve for just about any topic I bring up.
Wi-fi buses drive rural web use (bbc news)
12.07.2011 23:29
IRC via satellite
On Monday, I saw an interesting
poster... running IRC over an iridium satellite link to connect the
team members on a NASA research plane with the ground control team members. Woot! Back in 1996 was the first time I used UNIX talk with someone in the field over a satellite link. But it wasn't until 2011 that I did IRC with someone on a plane (akh or dmacks was in #fink while over the middle of the Atlantic).
12.05.2011 10:47
ArcGIS AIS Data Handler
I just found out about this through a
very roundabout way. I think this is a project that I consulted with a while back, but I'm not totally sure. It's interesting to see that they cite two of my papers. Let me know if you use this
ArcGIS plugin. I'd love to see some blog posts and papers about how
this system actually works.
http://marinecadastre.gov/AIS/
I skimmed through the AISDataHandlerTutorial_DRAFT.pdf and see that they have lots of cleaning tools. It will take some serious thought to see what I think about these processing techniques.
It is very awesome that they released the source code, but the code is not well released. There are no authors, readme, license, etc. It looks like it is written in C# (C-Sharp). It also includes a number of DLLs, which are very much not source.
find . | grep -i parse
./AISFilter/AISParser.cs
./AISFilter/vdm_parser.cs
./aisparser.dll

So yes, this includes Brian's AIS Parser, but only as binary.
Found a readme buried in AISTools/Resources/:
AISTools Resource Files:
-AISTools.mdb: contains codes for translating values to something meaningful
-Clean_FeatureClass.py: Python script to clean feature class
-Clean_Table.py: Python script to clean tables

Take these files and place them in an AISTools directory in the same folder as the ArcMap "Bin" folder. For example, if your installation is C:\Program Files\ArcGIS\Desktop10.0\Bin\ArcMap.exe, put the files in C:\Program Files\ArcGIS\Desktop10.0\AISTools.

And in one of the python files, I found evidence that this was done by ASA, who did indeed ask me a number of questions while writing this code for NOAA.
head -5 Clean_FeatureClass.py
# ---------------------------------------------------------------------------
# MMSI_CountryCode.py
# (generated by Zongbo Shang zshang@asascience.com)
# Description:
# ---------------------------------------------------------------------------

Um... not "MMSI_CountryCode.py", but at least this is some provenance info pointing at ASA. I also found this weirdness:
./AISMenu10/Properties/AssemblyInfo.cs:[assembly: AssemblyCopyright("Copyright © 2011")]
12.05.2011 07:44
QGIS instruction material
I'm at AGU this week. I was supposed
to have a poster this morning, but didn't manage to get it together
in time with everything that is going on. If you would like to talk
to me about my poster topic or anything else, please feel free to
email me and we can set up a time to meet.
Tim Sutton posted A nice QGIS tutorial by Lex Berman, which links to Open Source Desktop GIS with Quantum at Harvard. A very polished looking web site.
He also has 14 QGIS videos on YouTube: HarvardCGA
12.03.2011 16:36
Nick Parlante teaches python
I had Nick for CS106B (part 2 of
introduction to computer science using Pascal) and CS109
(introduction to many programming languages), but Python wasn't around at the time. Here is Nick teaching python at Google. How cool is that! He was one of the best lecturers in computer science that I had as an undergrad back in the day.
Google Python Class in 19 videos for almost 18 hours. And he is only trying to cover python.
12.02.2011 15:21
basemap for matplotlib back in fink!
basemap is back! at least for fink on
10.7... it is using libgeos3.3.1.
from mpl_toolkits.basemap import Basemap
import numpy as np
import matplotlib.pyplot as plt

width = 28000000; lon_0 = -105; lat_0 = 40
m = Basemap(width=width,height=width,projection='aeqd',
            lat_0=lat_0,lon_0=lon_0)
# fill background.
m.drawmapboundary(fill_color='aqua')
# draw coasts and fill continents.
m.drawcoastlines(linewidth=0.5)
m.fillcontinents(color='coral',lake_color='aqua')
# 20 degree graticule.
m.drawparallels(np.arange(-80,81,20))
m.drawmeridians(np.arange(-180,180,20))
# draw a black dot at the center.
xpt, ypt = m(lon_0, lat_0)
m.plot([xpt],[ypt],'ko')
# draw the title.
plt.title('Azimuthal Equidistant Projection')
plt.savefig('aeqd.png')
12.02.2011 09:32
Memory usage in python
Perhaps I need to think about how
this particular python script is written. I expected it to be slow,
but this slow?
PID   COMMAND    %CPU  TIME      #TH  #WQ  #POR  #MREGS  RPRVT   RSHRD  RSIZE   VPRVT  VSIZE  PGRP  PPID  STATE     UID  FAULTS
4742  python2.7  23.6  02:52.91  2    0    24    36939   2744M-  520K   2555M-  14G    16G    4742  1001  sleeping  502  3710024+

But I'm loading a ton of data all at one time and then writing it to a numpy array so that I can work with it much faster. 16G of virtual memory is a fair bit.
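One pattern that might keep the footprint down, assuming the record count can be known or estimated up front (the filename and record layout below are made up), is to preallocate the numpy array and fill it while reading instead of building a giant list of python objects first:

import numpy as np

def load_positions(filename, num_records):
    """Fill a preallocated array row by row rather than accumulating a list."""
    data = np.empty((num_records, 3), dtype=np.float64)  # e.g. time, longitude, latitude
    with open(filename) as infile:
        for row, line in enumerate(infile):
            data[row] = [float(field) for field in line.split(',')[:3]]
    return data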
12.02.2011 04:12
Last research tools class - RT 26 - Copyright and KML/SQLite export
Yesterday, I dashed out the audio
editing and screenshot to Keynote/PDF for Research Tools 26:
html, mp3, pdf and 26-python-binary-files-part-5.org. Any comments you might want to make can go here: Kurt's Backup blog: RT 26
I wish that BitBucket would let people add comments on a particular version of the file, not just the commit. I love the way that DjangoBook.com allows comments. At least you can comment on a commit: comment on changeset db25c6a65fe2. But why not allow comments to be associated with a particular line?
And a little imagemagick + bash. It doesn't work as I want if the image is taller than wide, but typically my image captures are landscape style.
convert -resize 600 {~/Desktop/,}rt-26.png

And an aside... I liked this post for the shell script rewrite with explanation (I'm not going to start using TextExpander).
Erm by Dr. Drang.
He went from this:
#!/bin/tcsh
echo -n `pbpaste` | perl -e 'while (<>) { s/-//g; print $_; }'

to this (I changed bash to sh and added --noprofile for speed).
#!/bin/sh --noprofile
pbpaste | tr -dc '[:digit:]'

I would also say to @macgenie's tweets: first and second... if you are going to use tcsh for scripts, you should at least switch to /bin/csh -f rather than tcsh for everything. Not a big deal in a one off, but if it later morphs into a frequently called script, that will reduce overhead. But why would anyone want to use csh/tcsh? This coming from me, who was a die hard {t,}csh user from 1990 to 1999 until I was forced to really start using bash during Mars Polar Lander.
Trackbacks:
Slashgeo 2011/12/05
12.01.2011 06:23
Last class for Research Tools 2011
I am sitting down to work on the
notes for my final lecture in Research Tools. I set my goals for
the class super high and knew they were definitely not attainable,
but I covered way less than I wanted and with far less depth than I
had hoped. I have big regrets about all the topics I have not
covered that I believe are important for students to understand. I
tried to create the class that I wish I had been able to take as an
undergrad. I created the initial version of the class with the
knowledge that I would be massively upgrading the class each year,
but this is literally the last class I'm teaching where the
CCOM/JHC Research Tools class is mine to craft. Whoever teaches the class next is fully welcome to any of my material that suits them, but I expect them to teach the class in their own style, and that might be radically different from my vision. There is no
"correct way" to teach this class. If we never aim high and accept
that there will be mistakes along the way, we will never get very
far nor learn much.
Come January, I will be at Google with similar, but not identical, goals. So I am very frustrated that I won't be directly doing iteration 2 of research tools. That said, I will be working with the global marine science and education communities. As a part of that, I will be a part of those communities and work with everyone to improve the processes for generating and consuming ocean and lake data from our entire globe. I hope that I am able to push the Research Tools 2011 material to the next level as most of it is incomplete or at least has little or no polish. I will miss the interaction with students in the classroom who challenge me to explain topics more effectively. There is nothing like a puzzled look from the class to let you know that you did not define a word or concept before using it! And I get a motivation boost every time I see an "aha!" moment from a student.
At this point, I'm attempting to create the wrap up class and think about how effective this iteration of research tools was. I have to hand out the course reviews today. While there will be some constructive input in those responses, I really believe that the true test is how this class impacts the 16 students during their degree/certificate programs and in their careers when they leave the academic bubble. Will it matter that they took research tools from me when they are sitting in front of some tools like QGIS, MB-System, Caris, Hypack, ArcGIS or Fledermaus in the field 5 years from now?
What are some of the things that I wish I had covered, but never got to in the lectures? The biggest regret is not getting the students into Mercurial (hg) revision control right away. It takes time to develop a sense of what works and what is a bad idea with version control systems and the distributed variety (DVCS) is something that few in CCOM/JHC have had much time with. Team sizes at CCOM are typically 1 person with the occasional 2-3 person teams. I would really like the students of Research Tools to get comfortable with large team projects and with the idea that those teams can be distributed around the globe (both on land and at sea).
Some of the other topics that we did not get to that come to mind: Generic Mapping Tools (GMT), MB-System (free open source multibeam sonar processing), processing time series data with an example from tides (e.g. raw engineering units from a data logger to cleaned tide data ready and in the Caris input format; using Ben's tide tools!!!), getting data to {matlab, spreadsheets, fusion tables, databases, etc} from the raw sources via python, using LaTeX + BibTex + Beamer to create reports and presentations that rock and do not have to deal with figures that have to live inside a document, and how to write kick ass bug reports that get you help on the tough problems we often face with complicated data collection and processing systems.
On many of the topics we did cover, we hardly brushed the surface. For example, I had plans of using a collection of 2000 ISO XML metadata files. By working through trying to understand the set of data files using only the metadata, I hoped to give people a personal sense of what should be in the metadata. Additionally, this would have sparked discussion on topics like:
Should the type of source(s) be in the metadata for a BAG representation of the sea floor? I desperately want to know if each BAG includes data from Topo Lidar, Bathy Lidar, Single beam, Multi beam, interferometric, visual imagery derived, wave height derived, seismic survey bottom picks and/or lead line methods. Brian Calder brought up that the goal for BAGs is that they are a best representation of the sea floor in an area and that I should be looking at the uncertainty to decide if I should use a data set or not. However, that information is not in the BAG ISO XML metadata (e.g. a min, max, mean, std dev of the bathy and uncertainty plus the fraction of cells that are empty or were purely interpolated between cells with data). That means I need to ask the HDF5 container for each data set about itself and even then, I will not be able to figure out the interpolation type (if any) that was applied to the data. What if, for example, in 5 years, I decide that I need to exclude all BAGs that include bathy lidar from a processing chain when the BAG is older than 2012?
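For what it's worth, asking the HDF5 container about itself might look something like the sketch below. I am assuming the usual BAG_root/elevation and BAG_root/uncertainty dataset names and that NaN (or a fill value) marks empty cells; treat those details as assumptions rather than something checked against the spec here:

import numpy as np
import h5py

def bag_summary(filename):
    """Crude per-BAG stats: min/max/mean/std plus the fraction of empty cells."""
    with h5py.File(filename, 'r') as bag:
        elevation = bag['BAG_root/elevation'][...]
        uncertainty = bag['BAG_root/uncertainty'][...]
    summary = {}
    for name, grid in (('elevation', elevation), ('uncertainty', uncertainty)):
        valid = grid[np.isfinite(grid)]  # assumes NaN marks empty cells
        summary[name] = {'min': valid.min(), 'max': valid.max(),
                         'mean': valid.mean(), 'std': valid.std(),
                         'fraction_empty': 1.0 - float(valid.size) / grid.size}
    return summary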
I owe a huge thank you to all of the students (and the two auditors) who went through the class. I hope that you all stay engaged with the research tools concept long after you've finished the class.