Just to get it out of the way -> the new Range API. It is not fully documented yet, but it should give the idea to the curious ones who want to read code. As you can see, the new implementation is significantly smaller than the old range.h and rangepol.h. As mentioned earlier, I dropped a lot of things that go beyond the scope of the Range API's responsibilities, which simplified things a lot.
What I've been pondering about, however, is PartData. The current implementation of PartData has a multitude of problems that need to be addressed. The new one should be easy to learn/use, and provide protection mechanisms against programmer errors. The major source of errors tends to be forgetting to unlock/free used ranges, unlocking/freeing them twice, and so on. This problem could most likely be addressed by introducing Lock objects (similar to the mutex locks in most threading libraries), which free the ranges they refer to during destruction. What makes it complex, though, is that in threading APIs the locks are meant to be kept for short periods of time, usually within a scope, and thus can be stored on the stack. In our case, we need to keep the locks enabled for long periods of time (e.g. until the data has been downloaded, or the source dropped).
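To make the Lock object idea a bit more concrete, here is a minimal sketch. It is not the actual PartData code: `PartDataSketch`, `Lock`, `lock()` and `lockedCount()` are all hypothetical names, and using `std::shared_ptr` for the long-lived ownership is just one possible choice. The lock lives on the heap, so whoever needs the range (e.g. a source object) can hold on to it for as long as needed, and the range is freed exactly once, when the last holder lets go.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <set>
#include <utility>

// Hypothetical stand-in for PartData; it only tracks which ranges are locked.
class PartDataSketch {
public:
    typedef std::pair<std::uint64_t, std::uint64_t> Range; // inclusive [begin, end]

    // Frees its range on destruction, so a range cannot be forgotten
    // or freed twice. Assumption: the owner outlives all of its locks.
    class Lock {
    public:
        Lock(PartDataSketch &owner, Range r) : m_owner(&owner), m_range(r) {
            assert(r.first <= r.second);
        }
        ~Lock() { m_owner->m_locked.erase(m_range); }
        Range range() const { return m_range; }
    private:
        Lock(const Lock&);            // non-copyable
        Lock& operator=(const Lock&); // non-assignable
        PartDataSketch *m_owner;
        Range m_range;
    };

    // Returns a shared lock handle, or a null pointer if the requested
    // range overlaps a range that is already locked.
    std::shared_ptr<Lock> lock(Range r) {
        for (std::set<Range>::const_iterator i = m_locked.begin();
             i != m_locked.end(); ++i) {
            if (r.first <= i->second && i->first <= r.second) {
                return std::shared_ptr<Lock>();
            }
        }
        m_locked.insert(r);
        return std::make_shared<Lock>(*this, r);
    }

    std::size_t lockedCount() const { return m_locked.size(); }

private:
    std::set<Range> m_locked;
};
```

A source would simply store the returned pointer as a member; dropping the source (or resetting the pointer once the data has arrived) unlocks the range without any explicit unlock call.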
Another (even more) important aspect of PartData is the choice of which parts to download. While PartData does not control downloading itself at all, it can indirectly control it by giving out specific ranges, since that's up to PartData to decide. The original system has a lot of flaws in that area, which resulted in a lot of range fragmentation and incomplete chunks until the very end of the download. The new PartData implementation must:
- As first priority, complete any parts that we are capable of hashing, both to validate the data and to make it available for uploading.
- Choose random (hashable) parts (but prefer the first/last parts if those are not downloaded), to avoid sequential downloading.
It's very important to fully randomize the part selection (except perhaps for first/last chunks), because if we download sequentially from begin->end, or completely randomly (disregarding chunks), we cause large-scale problems on networks if hydranode is used a lot. For example, if we download sequentially, the ends of files will become very rare, since people tend to un-share files after completing them. Alternatively, if we download random ranges, disregarding chunkhash boundaries, we won't be able to share the hashed parts until near the end of the download due to chunk fragmentation (since we are missing pieces of the parts, we can't hash them, and thus can't share them), which pretty much breaks a lot.
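The selection rules above could be sketched roughly like this. It is only an illustration under assumptions, not the planned PartData code: `ChunkState` and `pickChunk()` are made-up names, and I read "complete any parts that we are capable of hashing" as "finish already started chunks before starting new ones". Selection happens at the chunk level, never at arbitrary offsets, so every finished chunk can be hashed and shared right away.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical per-chunk download state (one entry per chunkhash).
enum class ChunkState { Empty, Partial, Complete };

// Returns the index of the chunk to hand out next,
// or chunks.size() if there is nothing left to download.
template<typename Rng>
std::size_t pickChunk(const std::vector<ChunkState> &chunks, Rng &rng) {
    std::vector<std::size_t> partial, empty;
    for (std::size_t i = 0; i < chunks.size(); ++i) {
        if (chunks[i] == ChunkState::Partial) {
            partial.push_back(i);
        } else if (chunks[i] == ChunkState::Empty) {
            empty.push_back(i);
        }
    }

    // Uniformly random element of a non-empty index list.
    auto pickRandom = [&rng](const std::vector<std::size_t> &v) {
        assert(!v.empty());
        std::uniform_int_distribution<std::size_t> dist(0, v.size() - 1);
        return v[dist(rng)];
    };

    // 1. Finish started chunks first, so they can be hashed and shared.
    if (!partial.empty()) {
        return pickRandom(partial);
    }
    if (empty.empty()) {
        return chunks.size(); // everything is complete
    }
    // 2. Prefer the first/last chunk if it has not been downloaded yet.
    if (chunks.front() == ChunkState::Empty) {
        return 0;
    }
    if (chunks.back() == ChunkState::Empty) {
        return chunks.size() - 1;
    }
    // 3. Otherwise pick a random untouched chunk; never go sequentially.
    return pickRandom(empty);
}
```

Any standard random engine can be passed in as `rng`, e.g. a `std::mt19937` seeded once at startup.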
Madcat, ZzZz
PS: This blog engine also has comment capabilities (although you have to bear with a nagscreen asking for login/pass - but there IS a "post anonymously" link there), so - comments/thoughts are always welcome :)