Before anything else, I want to talk about the current unit testing framework that we've set up: In the test/unit directory of the repository, I've included a header "test_parser", which includes a couple of subclasses the WebVTT::FileParser class from libwebvttxx. This simplifies the unit test code quite a bit. There are two versions, so far: ItemCounterParser, which simply keeps a count of errors and cues encountered (but does not save them), and ItemStorageParser, which does the same as the above, but also stores the cue and error objects so that they may be inspected. Rick Eyre has been working on C++ code to wrap the cuetext 'nodes', however it has not landed just yet, he feels that there is more to do. But even so, the ItemStorageParser class alone should make it fairly simple to convert the conformance tests to unit tests, which I've already begun doing (the current tree has tests for the 'vertical' cue-setting) To add unit tests to the build so that make check runs them, we have a list of unit test programs to be built for the 'check' target:
UNIT_TESTS = \ sample_unittest \ sample2_unittest \ csvertical_unittestYou can simply add the name of your program to the list. Node that in the list, newlines are escaped with a backslash, so that the variable definition continues to the next line. Following that, you need to tell automake what source files the program depends on:
# Cue Settings tests csvertical_unittest_SOURCES = csvertical_unittest.cppWhile eventually we might have a script to do this all automatically, for the time being that's all we've got :3 Additionally, if you need to add additional libraries or include directories for your specific test, you can add them by doing something like this:
mytestprogram_CPPFLAGS = <additional preprocessor flags, such as -I<includedir> or -D<definition>> mytestprogram_LDADD = <some library, but probably won't need to do this (however, some of the libwebvttxx stuff requires -lm currently, so yeah...>So, here's an example of unit tests for the vertical cue-setting, using libwebvttxx, gtest, and the ItemStorageParser object:
#include "test_parser"
/**
* Verifies that the parser correctly parses a "vertical" key, followed by U+003A ':',
* followed by 'rl' (indicating that the text be positioned vertically, and grows towards the left)
*
* From http://dev.w3.org/html5/webvtt/#webvtt-vertical-text-cue-setting (09/28/2012):
* 1. The string "vertical".
* 2. A U+003A COLON character (:).
* 3. One of the following strings: "rl", "lr".
*/
TEST(CueSettingVertical,RL)
{
ItemStorageParser parser("cue-settings/vertical/rl.vtt");
parser.parse();
const Cue &cue = parser.getCue(0);
ASSERT_EQ(true, cue.isVerticalRightToLeft());
}
As you can see, it's basically 4 lines of code for the single test (which ensures that the 'vertical:rl' setting is parsed and interpreted correctly.
The first line declares an ItemStorageParser which opens the file "cue-settings/vertical/rl.vtt", and it's parsed in the second line.
In the 3rd line, we try to get const reference to the first WebVTT::Cue object. This should throw an exception if the cue is not available, and gtest treats this case as a failure.
Finally, we assert that the cue is has a direction of RightToLeft, which should indicate that the vertical:rl setting was correctly parsed.
So we can convert a lot of our 'good' tests to work fairly simply like this (However, I do think we probably should store line and column numbers with cues and cue nodes, so that we can check that we're tracking the line/column numbers correctly)
But what about 'bad' tests? That's an area where it gets interesting
/**
* Test if parser fails when an unknown keyword is used for the otherwise correct 'vertical' cue setting.
*
* From http://dev.w3.org/html5/webvtt/#webvtt-vertical-text-cue-setting (09/28/2012):
* 1. The string "vertical".
* 2. A U+003A COLON character (:).
* 3. One of the following strings: "rl", "lr".
*/
TEST(CueSettingVertical,DISABLED_BadKeyword)
{
ItemStorageParser parser("cue-settings/vertical/bad-keyword.vtt");
parser.parse();
const Error& err = parser.getError(0);
/**
* We're expecting a WEBVTT_INVALID_CUESETTING error on the 25th column of the 3rd line
*/
ASSERT_EQ(WEBVTT_INVALID_CUESETTING,err.error());
ASSERT_EQ(3,err.line());
ASSERT_EQ(25,err.column());
}
This above test is known to fail (in gtest, we can mark these with DISABLED_ before the test name, which tells gtest to not try to run the test).
The reason why it fails is actually the column number, somewhere in the parser we are not tracking the column correctly. Other DISABLED_ tests in csvertical_unittest.cpp have even worse problems, and we actually need to redesign the parser somewhat to simplify and solve them.
So, this brings me to my next point, which is "the future of libwebvtt". Obviously, the semester is coming to a close, and I expect most students will probably stop caring about it, but I've had some fun with it and feel like I've learned a lot, so I'd like to keep working on it.
So, here are some things to work on:
- Integrate cue-text parsing into webvtt_parse_chunk
- I don't think it's really very good to make it the browser's responsibility to deal with webvtt cuetext markup (it's not quite html5)
- A parser state stack is kind of needed anyways, because the number of states is getting a bit crazy (when addressing some of my test failures, I needed to add about 3 for each cue-setting type, but this isn't really needed)
- Reducing the number of states is overall a pretty good idea, and we may even be able to unify all those parse 'modes' into one
- Internal string representation
- It doesn't really matter how we internally represent text, however I don't think it's a good idea to have cuetext parser and the main parser dealing with different encodings: unifying around a single encoding will allow us to share useful functions like the timestamp parser. I expect we'll probably switch to using UTF8 all the way through, rather than converting to UTF16, as this will reduce the number of memory allocations performed, and in many cases will result in reduced memory use. Unify, unify, unify!
- Increase number of errors (EG, some users will likely want to ignore certain errors. Why should we die on "vertical : rl", it's pretty close to valid, so some users may wish to ignore it. WEBVTT_UNEXPECTED_WHITESPACE or something is needed)
- A lot more work on making the library thread-safe. This is important. The webvtt_parse_chunk routine is entirely synchronous and simply will not work and 2 threads simply can not talk to the same webvtt_parser as it will break. While this could be a good reason to keep the cuetext parser separate from the main parser, I don't think it's really worth it. If there is some way for us to deal with the file in a more asynchronous way, then that's awesome. But I'm not really aware of what we could do about that. So I expect we'll probably need to add locks to the internal parser structure.
- Some helper functions for turning a cue into HTML markup would be great, like webvtt_cue_to_html(webvtt_cue cue)