Hey, I moved Storkyak to WordPress. I was writing my own command line blogger tool, called makeblog, but it turns out that WordPress gives me the flexibility that I really want - in particular, the tagging.
So hey, now I can get back to the business of assembly language and 3d stuff.
Tags: Uncategorized
I’ve made some changes to Factdiv that are designed to set up me up for some more interesting work.
I’ve added GNU’s GMP library to factdiv and use it for native types. The hype says this library is supposed to be pretty fast but I have to confess that I am completely unimpressed with its performance and reliability. It’s slow, and it has a bug that seems rather silly - it blows up if you do a conversion from a string to a big number with a leading 0 using the C++ mpz_class. The crash does occur in the library, not the wrapper, but the wrapper itself does some creative things trying to improve performance. I will probably write my own less flexible but faster big int routines in x86-64 assembly.
Now, getting GNU gmp is actually easy and installing it is pretty easy to. You grab the tar.gz and do the normal make, make install thing. However, you should read the directions on this one, because by default the C++ wrapper is not actually installed. The installation works well enough, once you read the manual and have some familarity as to the proper convention for installing given a source code and an automake.
Remember that factdiv navigates a search tree, where each node of the tree is a pair of digits, one for the top and one for the bottom, of the factored number. Each level of the depth represents a digit pair position. So, if I am factoring 901 into 53 x 17, 5 & 1 would be the 2nd pair, and 3 and 7 would be the first.
One of the reasons I did use a bigint library was the realization that I could benefit from a trial division at each pair as doing so allows factdiv to blow through small factors rather quickly. Also, checking trial factors is now a trivial multiply. Interesting, I really will have to do my own multiply and division, because the GNU multiply and divisions multiply and divide the entire number, like you would expect. But since I am traversing a tree, I really only need a partial division and a partial multiplication to cover the parts of the number that I have changed. So, there’s still more changes in store for the guts of factdiv.
On a broader front though, a huge problem with factdiv is in fact trying to prune branches from the search space. Essentially, factdiv doubles its search time with each additional digit of the number being factored. So, factoring a 5 digit number is -roughly- 2^5 iterations, whereas factoring a 20 digit number is 2^20 iterations. Without a way to cut down on this exponential explosion, factoring larger numbers is really out of reach.
So, this release has some new options designed to provide a framework to prune this search space, and, to test that pruning.
-s allows for a score to be kept on each digit pair. The idea of scoring is to create a sort of a hash that looks at the number being factored and relates it to it a digit being tried. The algorithm that is there now is not that good, and is really a placeholder for future algorithms to be tried.
-sx is like s, except that it tries to also build up to the number being factored by taking a piece of the string. Like the -s scoring, this algorithm is not good, and it serves as a placeholder for future research.
-p shows gives the number of iterations for factoring sets of 10 numbers. each set adds an additional digit. It also roughly calculates the average number of factoring iterations per digit. The more successful I am with factdiv, the more that number of factoring iterations per digit will remain constant as the number of digits increases. Right now, it basically doubles with each digit. The cynical might call that a total failure, as would I in my darker moments, but, you could also see it as an opportunity for future successes.
Other options, just to document them here (they display in factdiv)
-t test. This just factors 1 - 1000 (the range is easy enough to adjust in the code), checks the results via another algorithm, and gives the total iterations. This is good for mechanically checking factdiv. Right now, there’s an optimization I made that isn’t quite right, so some small number of numbers factdiv claims are prime, in fact, are not.
I’ve committed factdiv to sourceforge, and you can get from that the source. Factdiv in sourceforge
Tags: Factdiv · GCC
I’ve ported my factdiv program to Linux. It is C++ and is designed for x86-64 only, or, really, any system where the sizeof(long) == 64 bits. I’ve checked the source code into source forge using subversion at factdiv. I like the way subversion works better than CVS, but that’s mostly because its easier for me to remember how to check in. I’m not very bright.
As I wrote in my previous post about factdiv, factdiv is designed to work by traversing hunting a decision tree for pairs of digits that accumulate to make the pair of factors. Therefor, to get any sort of performance out of it, I have to be pretty careful to decide exactly when I’m going to recurse down a branch of the tree. I’m not doing a good job of this right now, at all, and so, for that reason, factoring seems to be limited to numbers of about 10-20 digits or so.
I don’t do any trivial factoring first. A lot of other factoring programs might do a few thousand trial divisions just to see if they get lucky and remove the easy factor. For me, this is somewhat counterproductive, as, I’ve found that letting factdiv go through its paces regardless of whether the number is easy or hard, prime or non-prime, is informative. factdiv doesn’t really care too much if the number being factored is factorable by a small number first, anyway. Instead, factdiv works well when the digit pairs happen to line up early in the decision tree branches, and extremely poorly if they do not.
Tags: Factdiv
FACTOR is the problem of taking a product, such as 15, and turning it into its factors, such as 5 x 3. In other words, you look at a number and figure out what two numbers, multiplied together, make that number.
This is a deceptively hard problem, and a lot of finer minds than I have pondered how best to tackle it, and have walked away empty. The problem is a combinatorial explosion. As the number of digits to factor increases, the number of probable solutions rises exponentially. FACTOR is so hard, actually, that, it is used as the basis of many kinds of security both on the internet and off. For a long time, a company called RSA actually held a contest that awarded prizes for factoring progressively large number.
I simply could not pass up so immense a challenge. I have devised my own factoring algorithm that approaches the problem from a different perspective. Right now, this algorithm is frankly not as good as others out there. Number Field Sieve reins supreme, and my algorithm isn’t even close to that! Where they can factor numbers of up to 30-40 digits on a PC, I am only good for about 20! However, it is a different approach and it has some other advantages that I feel worth exploring. Even if it doesn’t get anywhere, the problem will help educate me further, and will also provide a basis for some future discussions around multi-threading, SSE, and decision trees and all sorts of geeky good stuff rolled into one little problem.
My factoring program is called factdiv. You can download the Windows source here Factor Source. A Linux version will be available shortly. My faithful dual Opteron had a sick SATA controller on the motherboard that I just upgraded, but, instead of eagerly porting FACTOR to it, and using the threading library I wrote about earlier, I instead regrettably have to attend a funeral, and from there, make up lost time with my day job. Hang in there!
factdiv works by what I call reverse division. Imagine how a number is multiplied:
1 2 3 4 5 6 ----- 3 2 1 carry 6 2 8 t1 5 0 5 t2 4 8 2 t3 --------- 5 6 0 8 8 result
You can see that every digit is multiplied by every digit. If you look a bit further, though, you can notice two things, if you look at how the result is calculated column wise.
a) The first digit is always the product of two integers mod 10. That is, in the above result of 56088, the 8 is the product of 6×3=18 mod 10=8.
b) We can express the columns as the sum, mod 10, of a simple series. Each column(i) is the sum( top[N-i]*bottom[i] ) + carry % 10.
Armed with this information, we can thus develop a factoring algorithm.
Examine the last digit, and guess a pair of digits A0 and B0 such that A0 * B0 mod 10 = the last digit of the number. Stash that carry, and proceed to the next digit.
For each subsequent digit I, guess a pair of digits AI & BI such that AI * B0 + A0 * BI + the previous carry + previous existing digit sums yields the total at that digit.
Note that, in both cases, we are selecting a pair A & B. The A is for the top set of digits, and the B is for the bottom. Taken together, these will be the two factors. So, consider the example: 243 = 81 x 03. We would first select 3×1, then 8×0.
To simplify things, and to improve performance, factdiv calculates all the possible digit combinations for the first and second cases outlined above. Thus, at the key point in the factoring, if it is necessary to extract a pair of digits, or factroid, that will effect the total of a particular column by 3, one can merely look up a 3 and get a list of choices. For 3, for example, we could have 3×1, 1×3, 9×7 and so on.
It is this list of choices associated with each digit that selection of pairs that concerns us. You see, the longer the list, the greater the combinatorial explosion with each digit being factored. If I need a 3 for the 2 second digit, then, I can pick one of several ways to get a three. Each of those choices effect what I can pick for the subsequent digit, thus, you really can see that this approach to FACTOR is not so different from the likes of TSP or some other difficult problem. If the tree of our decisions merely doubles at the size of every digit, then, it is well impossible to calculate as the weight of digits causes so many more doublings. FACTOR, taken this way, is an exponential problem.
FactDIV right now just barely attacks the exponential side of the equation. It’s smart enough to prune a branch when a number will be larger than the number being factored, and its smart enough to prune a branch if the chain of possible choices yields a dead end. Taken together, those two keep the number of total combinations much less than the size of the number being factored, however, they do not solve the exponential problem. It only takes 9 trials to factor 243, 103 trials to factor 242323, 894 trials to factor 123456789 and so on. But, as we get higher, the weight of possibility leans against us. It’s not sufficient to exhaustively scan all of the possible decisions.
Tags: Factdiv
You can use a DeWalt 14.4V cordless drill battery to power a large scale Bachmann train in a pinch. I have the Bachmann Teetsie Express large scale railroad, but had lost the transformer which powers the track. Determined to get this thing working for my two year old son, I
1. Set up a 48″ ring on thick sheet of MDF, and nailed the track to it.
2. Charged up the battery for a DeWalt cordless drill.
3. Stripped both ends of two wires. One wire is positive (+) and another (-). The top of the Dewalt battery actually has both exposed.
4. Wire the track to the battery.
The train should go along at a fairly good clip, but not top speed. There is no speed control - you need a resistor for that! But, what you can do is switch between forward and reverse by flipping the polarity of the wires. Works like a champ. I wedge the wires into the top of the DeWalt battery, and it will run the train for a good two hours.
Tags: General
Well, its been a long time since I was into my Linux box. First, there was work and a mad, mad schedule. Secondly, my son spilled a bottle of soda on my monitor and computer, and the result was not entirely budgeted to fix. Even now, the DVD player is still dead, so I’m still stuck on SUSE 10. No Ubuntu for me.
I’ll have to figure out how to get Yum running so that I can upgrade to KDE 3.5 (from my existing KDE 3.4). And of course, to the latest KDevelop. And I want XGL too.
Now, in any case, I did do some things to get makeblog more or less almost working. It will actually read in a config file and a bunch of html - like pages, and then create a new directory filled with marked up stuff per the template and style sheet. However, it’s not completely there yet.
Still, some good stuff is in there, and its all Linux C++. I’ve got a simple put directory via FTP wrapper, a copy directory wrapper, a good temporary directory creation wrapper and even a simple HTML parser to a DOM.
Download makeblog here
Over the next few days, I really, really promise to tell how it all works, and um, to get it too work. Most of it does, actually. There’s a lot of automated unit tests.
Tags: makeblog
I grabbed a copy of the FastCGI library for Windows, plus the GNU CGICC classes, put them into a single Visual Studio 2005 project, and got it to work with the new FastCGI ISAPI extension that Microsoft wrote for IIS. I think the result is pretty cool, and presents a viable alternative for web development on Windows instead of the traditional ASP.NET. You get the power of C++ for real systems programming, but you can use IIS to manage your networking code for you. That’s pretty sweet.
Download it Here
There’s some decisions I made… one is, that, the CGICC and FastCGI libraries were written to generate DLLs, so I went and jiggered things to make it more pleasant for static linking. That is, the project to produce the library generates a static library.
Tags: FastCGI
This means that the template you referenced is undefined. The compiler was looking for a template name, but, you gave it the name of something else, therefor, it failed.
I was working on adding top down parsing to the compiler, to provide better error messages. The crux of doing this is to have a mechanism where one may explode out a rule and its definitions. My existing grammar class was written to support bottom up parsing. Some surgery was required. In my previous article, I presented a class called orderedList and its unit test. I now want to use orderedList to help implement a rule list, and then, in turn, compose those list of rules into a rule by dictionary id lookup. Do keep in mind that this is not a bottom up design process. I sketched out the overall design previously, but am doing bottom up development so that I may test each piece in turn.
When completed with these classes, I will be able to quickly recurse down a chain of rules, all tied together by the dictionary id of the word, indexed by an array. Mechanically, it ought to be reasonably quick, although, I am running in O(n^3) time so it will not be -that- quick. The extra complexity comes from all the different trials needed to support a truly generalized top down parser.
In any case, I set out to make my rule_list_type. One of them began as follows:
template <class MYCHAR> class rule_list_type : public ordered_list_type< rule_type<MYCHAR> * > { public:
This would have been all well and good, but the compiler was bitter. It turns out that ordered_list_type is actually spelled orderedList. Thus the compiler howled:
/home/tbandrow/projects/garrett/src/rulelist.h:40: error: expected template-name before ‘lt;’ token/home/tbandrow/projects/garrett/src/rulelist.h:40: error: expected `{' before ‘lt;’ token/home/tbandrow/projects/garrett/src/rulelist.h:40: error: expected unqualified-id before ‘lt;’ token
Once I spelled the class correctly, those error messages went away.
The root cause of this, though, is maintaining two different naming conventions in one project. I really did have a strong spot for the all lower case separate words by underscores convention, (eg some_class_ahoy_type), but it seems the world has turned against me on this one. So, I’ve been using the more scripting language like mixed case (eg someClassAhoy) for all the new stuff. However, I have this old parser library, using the old case, and I have been mulling over what to do about it.
I really do have to bite the bullet and change the class names.
Tags: GCC
Before I begin, let me point out that the link for the article always points to the LATEST garrett tarball. You can also get to Garrett through SourceForge. There, you can get from the Subversion repository, or browse the source files to the project online.
Anyway, I thought I would share with you an example of a simple automated unit test. Rightly or wrongly, I decided to add a class called orderedList to Garrett. orderedList is an insertion sorted singly linked list template, whose order is specified by an integer. It also specifies that the order of items with the same integer order will be as they are inserted. This is going to be used to implement precedence in rules for my top down parser.
So, let’s have a look at my unit test.
static bool unitTest() //!< a unit test for this list, returns true if pass, fail { bool failed = false; orderedList<string> list; orderedList<string>::iterator it;
// ------------------------------------------
if (it.full() || !it.empty()) { failed = true; cerr << "orderedList failed unitTest: newly constructed iterator contained a value." << endl; }
// ------------------------------------------ // test that iterating on an empty list works int count = 0; for (it = list.begin(); it != list.end(); it++) { count++; }
if (count) { failed = true; cerr << "orderedList failed unitTest: empty list did not iterate correctly." << endl; }
// ------------------------------------------ // test that the list correctly inserted elements, and in order // also tests that dereference and order works list.insert( "D", 4 ); list.insert( "B", 2 ); list.insert( "C", 2 ); list.insert( "A", 1 ); list.insert( "E", 5 ); list.insert( "F", 5 ); list.insert( "H", 8 ); list.insert( "G", 7 );
string actualResult(""); for (it = list.begin(); it != list.end(); it++) { actualResult += *it; }
string expectedResult( "ABCDEFGH" ); if (actualResult != expectedResult) { cerr << "orderedList failed unitTest: inserting ordered elements expected " << expectedResult << " and got " << actualResult << endl; }
// ------------------------------------------
if (it.full() || !it.empty()) { failed = true; cerr << "orderedList failed unitTest: iterator after the end of the loop is supposed to be empty." << endl; }
// ------------------------------------------ // test that the list emptied, and that the first element is not good list.clear();
it = list.begin();
if (it.full() || !it.empty()) { failed = true; cerr << "orderedList failed unitTest: begin() after list clear should return an empty iterator." << endl; }
// ------------------------------------------ // test that the list was successfully populated after being emptied not good // and also with the edge case of all elements in order
list.insert( "A", 1 ); list.insert( "B", 2 ); list.insert( "C", 2 ); list.insert( "D", 4 );
expectedResult = "ABCD";
actualResult = ""; for (it = list.begin(); it != list.end(); it++) { actualResult += *it; }
if (actualResult != expectedResult) { cerr << "orderedList failed unitTest: inserting ordered elements after clear expected " << expectedResult << " and got " << actualResult << endl; }
// ------------------------------------------ // test that the list was successfully populated after being emptied not good
list.clear(); list.insert( "D", 4 ); list.insert( "B", 2 ); list.insert( "C", 2 ); list.insert( "A", 1 );
expectedResult = "ABCD";
actualResult = ""; for (it = list.begin(); it != list.end(); it++) { actualResult += *it; }
if (actualResult != expectedResult) { cerr << "orderedList failed unitTest: inserting ordered elements after clear (2) expected " << expectedResult << " and got " << actualResult << endl; }
return !failed;
}
As you can see, I work the list over a little bit. Is this a perfect test? No, but I imagine that if I encounter any bugs in the list, I will add a unit test to the above class to reproduce the bug, so that, once I fix it, I will find that I should pass the unit test. Also, its worth noting that dumping the failures to cerr detracts from the portability of the class a bit. It might be better down the road to pass a stream object to the unitTest method, so the results can go anywhere, be it cout or cerr or a string.
In any case, I then invoke my unit tests from the command line:
case 't': // test switch (arg[1]) { case 'c': // test calendar calendar::unitTest(); break; case 'a': // test assembly curveFilter::unitTest(); break; case 'm': // test monte carlo integration monteCarloTests::integration1(); monteCarloTests::integration2(); break; case 'g': // dump the grammar dumpGrammar = true; break; case 'd': if (collections::orderedList<int>::unitTest()) cout << "orderedList passed unit test" << endl; if (collections::skip_list_type< int *,int *,fdt_copier<int *>,fdt_copier<int *> >::unitTest()) cout << "skip_list_type passed unit test" << endl; break; default: cout << "Unrecognized option after -t" << endl; return EXIT_FAILURE; } break;
You can see that my style of automated unit test invocation is evolving. For my other unit tests, I kick out a lot of output no matter what. But, I think it is better really just to kick out a brief note that the system passed, if it did pass, otherwise, only kick out what failed. I’m going to have to go back and change calendar and curveFilter to be consistent.
Tags: garrett
I’m in the process of adding a top down parser to Garrett and in doing so I’ve gone from being a grudging convert of Automated Unit Testing to becoming a something of a zealot in favor of it. When I say Automated Unit Testing, or AUT, I specifically mean that, as much as practical, every class should have a static method called UnitTest, which creates an object of that type and validates that the object functions as it is going to be used.
Manual testing takes too long. I found, at my previous client, that even if I devoted 50% of my time to testing, I would still wind up in Q/A with bugs. We would then spend even more time on test scripts and our testing coverage was horrific. Of course, when I asked if we should not devote time to automated unit testing, the paradoxical answer was: “We do not have enough time”.
By contrast, with Garrett, I have automated unit tests for increasingly large sections of the program. I have unit tests for the calendaring, the vector routines, and I have just added two more unit tests for some of the underlying collections that I’ve used for Garrett. These tests are by no means complete - I have so much more to do. But, the results continually pay for themselves. If I launch Garrett now with -t and some other options, I immediately get full regression for free on an increasingly large set of Garrett subsystems. Eventually I expect to have fully automated testing for the entire application, so that, you can launch Garrett with a -te command line argument and it will run regression testing for you on everything.
Based on this growing success, I’ve decided to create this automated unit test FAQ.
What is Automated Unit Testing, or AUT.
Every class has a static method, called unitTest, so that it can test the class automatically. The application then has an option to invoke these tests either from the command line, menu, or web page.
What are some of the benefits of Automated Unit Testing
Automated Unit Tests, or AUT, constrains the exponentially rising cost of software development by reducing the need to perform manual regression testing. AUT allows you to make underlying architectural changes and know what the real impact on your system is. AUT also helps promote safe code re-use by providing a convenient place for developers to see how a class is meant to be used. It follows that AUT can take some of the pressure off of the need for really rigid conventions, simply because, code that you can use to see how to use a class will be part of it.
We don’t have time to build automated unit tests.
If you are not building automated unit tests, you are wasting time, because you are doing regression testing by hand. Manual regression testing paralyzes projects by making it impossible to do sweeping architectural changes when they are so often needed. As a result, future changes become more cumbersome and expensive to write.
My design is too complicated to build an automated unit test for.
This means that your design is inadequate. Sorry, but that is just the truth. We are going to need to start designing our systems so that they can be automated tested. We need to get it into our heads that code that is difficult to test is often not good code.
My design depends on certain database data for a unit test, so I have to do it all by hand.
Build SQL statements to populate expected data into your database as part of your unit test.
It is difficult to build automated unit tests for GUIs
Yes it is, but that indicates an area which requires us to innovate solutions, not run from problems. As a practical matter though, any sufficiently complex system will have a thinner GUI layer with the business logic encapsulated somewhere else. So you should be able to automatedly test your middle tier components.
Unit testing is not part of the product
Customers will appreciate the ability to see that you have so much automated testing in your product. People do spend more on certified pre-owned cars, do they not?
I don’t know how to automate this test or manage this new way.
Do not feel stupid! You are not alone, but there is always Google. Automated unit testing is going to be a huge field in its own right, and requires different ways of looking at problems. You will have to learn how to do this just as much as we’ve had to learn new technologies. Being able to write effective automated tests will help you become a better programmer and being able to manage such a project will make you a better manager.
How do I know what to unit test?
Build unit tests to handle the common ways you want your object to be used, then, test the edge cases as much as possible.
Should I build my unit tests to somehow not ship with the program?
NO. If you are writing a command line application, put a unit test switch right in it. If you are building a GUI application, have a Unit Test menu or button. If you are building a web based application, build a unit test page.
My Tech Lead says that the cost of implementing AUT doesn’t pay for itself, and he might be right in this case, what do I do?
Try piloting AUT on a small set of classes - and your lead should come to see the advantages of having a piece of code that tests itself. It is most likely that he or she is underestimating the cost savings.
Try to explain that AUT is part of a good quality culture. Use familiar examples in other industries. Ask her if she would rather drive a 1990 Toyota or a 1990 GM. Thirty years of GM saying it was cheaper to do a recall than it was to design the car right have ultimately created the now-innaccurate perception that their cars are junk. The point is, on any individual case, it might seem like you are losing money on AUT, but, as a whole, the project is worth much less without it.
Do keep in mind the impact on your career. If an organization is not committed to quality, then it is committed to failure. AUT offers hope to improve a rather dismal software quality picture. If your lead is ultimately stubborn, know that you will take the hit when things go wrong before she does. Line up your ducks in a row to try and point Q/A problems at him or her. You want to be in the “lesson’s learned” meeting with your AUT memos handy. In the worst case, get your resume together and find a company that knows how to win.
Should I build AUT for things I already know to work?
Yes, because you might change them, and, you can’t even really say if something works until you AUT it. Obviously, though, its not as important to build AUT for “proven” code as it is for new code, but, when you can, you do want to go back and clean up your old code by adding AUT.
Automated unit testing is going to be big. You can already see it as the basis of XP methodologies, but, you don’t need to follow a specific methodology in your project to get the benefits of automated unit testing. I would imagine that a traditional waterfall methodology could just as easily benefit from AUT. And, I think it is fair to say that broad acceptance of automated unit testing will have an impact on our toolchains. For that reason, I’m going to make a fairly bold prediction:
A completely untype safe language that makes it very easy to write automated unit tests will eventually displace any type safe language that makes it hard to write automated unit tests.
Down the road, when I do write my super language, given the choice, there will be less of an emphasis on rigid type safety and more of an emphasis on supporting automated unit tests. The best way to check a program is to run it. I’m not so sure that compilation matters as much as thought, but I can guarantee you that AUT is more important.
Tags: garrett