Wednesday, June 29, 2011

express charsets : CHECK! what's next?

Happy Wednesday everybody!

For those receiving my progress report, you know that we have successfully added charset expression in NeXML for TreeBASE. It feels extremely awesome to know I have actually contributed something important to an active and ongoing project.

The most important thing I have learned from this portion of the project: Test-driven Development, when you write a failing test and fix the code so that it satisfies all the tests that you wrote for it. Then you know you have a working program.  I never learned that in my classes and it is very cool to learn an important technique that I will continue to use in the future.

Summary of what we did:

  • We wrote two JUnit tests. testNexmlMatrixConverter() creates a NexmlMatrix within a Nexml document and checks to make sure that it matches up with the appropriate fetched matrix from TreeBASE. testNexmlMatrixCharSets() verifies the the coordinates and name of the charactersets  in the NexmlMatrix actually match up  to those of the TreeBASE matrix.  It also does a check to make sure that the study actually has charsets associated with it. This is important to prevent NullPointerExceptions.
  • After establishing a well thought-out and commented unit test for implementing char set expression, I used the logic from the unit test and reworded it to fit in with the actual methods of the NexmlMatrixConverter () class. The code for this class can be found here: It was pretty exciting when I was able to get everything to pass all of the tests we created for it and I was finally able to commit the code. Unfortunately, I included an unused import and it threw off the entire build of the project, causing problems for a lot of people. So...note to self: don't do that again. Anyways, after that was all sorted out, it is safe to say that TreeBASE now supports charset expression for NeXML.
Now what are the next steps?  I am going to be working on including row-segment metadata for NeXML output. Right now, I am working to express the metadata with Darwin Core terms for the specimen-related information. The sequence information is going to have to be expressed differently to be useful to anybody though. This involves editing the populateXmlMatrix() funcitons.  However, these row-segment annotations involve possibly (worded from Bill Piel) partitioning a row of character states into every possible segment and then assembling sets of row segments as needed. Then each set of partitions, you can express metadata as described above.  I have completed the editing of the populateXmlMatrix() methods, but I need to talk to Rutger now about what we are doing in regards to implementing them.

Alright, that's all folks. Have a good night!

No comments:

Post a Comment