main | forum
May 25th, 2024    

CIS 751



Database Notes

Final(Take Home)

Notes 0014

Networks; P2P, etc.


The following is copied from: How BitTorrent Works 101

How BitTorrent Works 101


Why you need to keep your BT client running and hope the tracker keeps running, too

When you want to share something, you create a .torrent for it. You do that with completedir or maketorrent or TorrentWiz. You share the .torrent file with other people, whether on a web page or via email, or whatever. That .torrent file contains a link back to a certain tracker. (And various information about the files that are in the torrent, the number of pieces they have been broken into, and more.) The tracker can be your tracker or someone else's tracker. It doesn't matter because all trackers do the same thing: track.

When someone launches the .torrent file and starts up a BT client, it asks them where to download the file. If you point it at the file you already have, it checks to make sure you have all of it and that it is not corrupt. Whether you have all of it or not, your BT client connects to the tracker. The BT client tells the tracker that it is interested in this specific .torrent file. (That's what that big mess of hex numbers, the info_hash, is for.) The tracker tells the BT client about other people running BT clients that are also interested in that same .torrent file. At that point, the tracker doesn't really do or know anything else, like anything specific about what is in the torrent. All it does is keep a list of who all is interested in which .torrent file. (Hence: tracker.)
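Stripped of the HTTP and bencoding details, the tracker's whole job can be sketched in a few lines of Python. (The `Tracker` class and the `(ip, port)` peer tuples here are my own illustration, not real BitTorrent code.)

```python
class Tracker:
    """Toy tracker: it only maps an info_hash to the peers interested in it."""

    def __init__(self):
        self.swarms = {}  # info_hash -> set of (ip, port) tuples

    def announce(self, info_hash, peer):
        """A client announces interest in a torrent and gets back
        the other peers the tracker already knows about."""
        swarm = self.swarms.setdefault(info_hash, set())
        others = [p for p in swarm if p != peer]
        swarm.add(peer)
        return others
```

The first client to announce gets an empty list back; everyone after that gets the peers who came before them. That really is all the tracker knows.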

From there, your BT client takes that list of interested people, and starts to talk to them. Your BT client talks to other clients and says "I have these parts (x,y,z), and I need these parts (a,b,c), what about you?". Your client then says "I'll trade you part y for part b?" and if the other client is agreeable, you get that piece. The pieces you get are almost random, which is why the BT client starts off by making a full size file (or files), as it will then be filling in random parts as you get them, and why you have to wait for the "Download 100% complete" message, even though it looks like you have the files already. This tit-for-tat exchange is also why your download speed is related to your upload speed, and why torrents start off slow but get progressively faster. (As you have more to share, more people want to trade with you.) The clients keep doing this until everyone gets the entire file.
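A toy version of that piece-for-piece swap, with each peer modeled as a plain Python set of piece numbers. (This is my own sketch of the tit-for-tat idea, not the real choking/unchoking algorithm.)

```python
def trade(peer_a, peer_b):
    """One round of trading: each peer gives the other one piece
    the other is missing. Returns the (given, received) pair from
    peer_a's point of view, or None if no mutually useful trade exists."""
    a_offers = peer_a - peer_b  # pieces only A has
    b_offers = peer_b - peer_a  # pieces only B has
    if a_offers and b_offers:
        give = min(a_offers)
        get = min(b_offers)
        peer_b.add(give)
        peer_a.add(get)
        return give, get
    return None  # nothing to trade: one side has nothing the other wants
```

Note that the more pieces you hold, the more likely `a_offers` is non-empty with any given peer, which is the informal reason downloads speed up as you go.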

This is where the tracker comes back into play. Every 30 minutes, each BT client dials home to the tracker and tells it how the download is going. (That's how you get the nice completion statistics you see on so many tracker home pages.) The client also gets an updated list of people who are interested in the torrent. Even if your client can't get back in touch with the tracker, as long as it got that list once and there are still parts that other people have that you want, you will still be able to keep downloading. But if you run out of parts and can't talk to the tracker to get a new list of people, you're screwed.

That's why you need to keep both running. The tracker has to stay running so that everyone can keep getting new lists of people, and the client has to stay running so that the parts will get shared and actual downloading will occur.

by Knowbuddy

The Filesharing Networks

The following is copied from: The Filesharing Networks

I need an introduction. I don't have one. Deal with it. I'm going to describe the differences between the major internet-based file-sharing networks out there (as well as SpookShare and Freenet), because that's something I know about.

Napster was the first widely used filesharing network out there, so I'll start with that. Napster works by having users download a Napster client which will run on their computer. The client makes a single [TCP] connection to one of many servers run by the Napster company. Over that connection, it sends the server information about itself, such as the names of the files available for download, your connection speed, etc. The server takes that information and puts it in a database. When you type a search into your client, the search parameters are sent along your connection to the central server. That server then looks through its database, and sends back the addresses of the files that match which have been posted by others (said address includes both the address of the host machine and the location of the file on that machine). In order to download the file, your client makes a separate connection directly to the machine hosting that file, and transfers the file using some weird protocol (not HTTP, I think). Note the reliance on the central server for doing searches. That's a Bad Thing, because if there's a power outage or a cable cut or the RIAA doesn't like what you're doing, that central server will fail and nobody will be able to find each other's files anymore.
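If you ignore the wire protocol, that central server is basically a searchable dict: clients register their file lists, searches run entirely on the server, and transfers happen peer-to-peer afterward. A hypothetical sketch (the `NapsterIndex` name and shape are made up for illustration):

```python
class NapsterIndex:
    """Toy central index: the single point of failure the text warns about."""

    def __init__(self):
        self.files = {}  # host address -> list of filenames it shares

    def register(self, host, filenames):
        """A client connects and reports what it has to offer."""
        self.files[host] = list(filenames)

    def search(self, term):
        """Server-side search: return (host, filename) pairs so the
        searcher can connect directly to the host for the transfer."""
        return [(host, name)
                for host, names in self.files.items()
                for name in names if term in name]
```

Kill this one object and nobody can find anything, even though every file is still sitting on someone's machine.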

Some guy at Nullsoft thought he'd make a file sharing network that didn't rely on any central servers at all, and came up with Gnutella. Gnutella works without any central server whatsoever. When you start up your Gnutella client, it connects to a few other Gnutella clients. When you do a search, the search parameters are sent along all of those connections at once, with a TTL of something like 7 to 16. Every client that gets the search request looks through its collection of files and sends the addresses of any matches back to you, then decreases the TTL by one and sends the search to all of its neighbors, who do the same, until the TTL reaches 1. That avoids reliance on a central server, but there are serious scalability issues here. Say you send out a search to 3 neighbor nodes with a TTL of 10. They each forward the search to 3 or 4 other neighbor nodes, and so on. The search ends up being sent over 3^10 (that's 59049) times. That uses up a bunch of bandwidth. When you have more than maybe a hundred people doing this at once, it uses up enough bandwidth that you're not going to have much left over for anything else.
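You can do that back-of-the-envelope math in a few lines. This is the worst case, ignoring the duplicate-message suppression real Gnutella clients do:

```python
def flood_messages(branching, ttl):
    """Worst-case count of query forwards for one flood-style search:
    every node that receives the query forwards it to `branching`
    neighbors until the TTL is used up (no duplicate suppression)."""
    total = 0
    frontier = 1  # just the node that started the search
    for _ in range(ttl):
        sent = frontier * branching
        total += sent
        frontier = sent
    return total
```

With 3 neighbors per node and a TTL of 10, that comes to 88,572 forwards for a single search, of which the last hop alone accounts for the 3^10 = 59,049 mentioned above. Multiply by everyone searching at once and the bandwidth problem is obvious.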

Something else I should mention about Gnutella is that it transfers files over HTTP, the same way your browser retrieves web pages. It also does this directly between the machine with the file, and the machine which initiated the search for the file (if they choose to download it). This makes Gnutella not anonymous unless you only do searches and never download anything, because people can tell who it is that is actually downloading a file.

The next generation of filesharing networks took an approach somewhere between having centralized servers and having everyone be at the same level. Networks like KaZaA have hosts which meet certain criteria, such as having sufficient bandwidth and CPU power, act as 'SuperNodes'. Supernodes do a lot of the dirty work that every Gnutella client had to do. I'm not quite sure how KaZaA works, because it's closed source (bleck), but it works pretty much like Napster, except that there are a lot more servers (the SuperNodes), and they pass information between them. New SuperNodes pop in and out of the network all the time, and they're run by anyone who happens to have enough bandwidth and CPU time, not a single company, making KaZaA almost as hard to shut down as Gnutella, but much more scalable.

KaZaA, like Gnutella, uses plain old HTTP to transfer files. Unfortunately (or possibly fortunately, but I don't like it), KaZaA uses a completely proprietary protocol for organizing the network and doing searches, and their client is Windoze-only, so I'm up a creek if I'm running Linux. It also crashes a lot. I say having a proprietary protocol may be a good thing, because then you don't have every Joe Schmoe and his brother writing a client. You did have that with Gnutella, and it seems like a lot of the clients just didn't work. I haven't been able to download anything off Gnutella for a long time. When I am able to, I usually get 99% of the file and then the download stops. On KaZaA, when I try to download a file, I almost always get it. However, the fact that I have to use their (Win only) software to access the network really turns me off. I was fairly happy with the Gnutella software model (open protocol), but I didn't like the scalability problems.

I invented a little network called SpookShare. It is made up of a bunch of HTTP servers with a little CGI program running on them. All searches are done completely over HTTP. Running all over HTTP does use up more bandwidth than it should, though, because for each search you do, you must make a new TCP connection. It does, however, make for a very open and easy-to-implement protocol.

SpookShare works like this: Someone running an HTTP server (this could be you, as the best HTTP server (apache) is free for Windows or Un*x) wants to share their files. Now that they're running the server, anyone who knows about those files can go get them, but if, say, the server is on a dial-up connection, it'll be hard to give people your address before it changes. That's where SpookShare comes in. As long as there are a few SpookShare nodes on permanent connections, the person who wants to share their files can post their addresses to one of those well-known nodes, where anyone can find out about them. If you do a search on SpookShare (go to a SpookShare server and put in a few words, just like you do with Google or Yahoo), the server will look through all the addresses of files that people have uploaded, like a KaZaA supernode. Searches are done depth-first, until the desired number of results is found, instead of everyone-send-this-to-all-your-friends, so as to avoid Gnutella-style bandwidth wastage. If the server doesn't find the desired number of files, it asks another SpookShare server, until the TTL runs out. The search then 'backs up' and tries again. The client I have written for SpookShare (SWSpookShare, or Spookware SpookShare) is written in perl, and requires a separate HTTP server. It can run on Windows or Un*x.
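Here's roughly what that depth-first, back-up-and-try-again search looks like, with nodes as plain Python dicts. This is a sketch of the idea only, not the actual SWSpookShare perl code:

```python
def spookshare_search(node, query, ttl, want, visited=None):
    """Depth-first search: check this node, then ask one neighbor at a
    time, 'backing up' to try the next neighbor only if a branch doesn't
    yield enough results. Each node is {'files': [...], 'peers': [...]}."""
    if visited is None:
        visited = set()
    visited.add(id(node))
    # local matches first
    results = [f for f in node['files'] if query in f][:want]
    if len(results) >= want or ttl <= 1:
        return results
    # not enough yet: descend into one neighbor at a time
    for peer in node['peers']:
        if id(peer) in visited:
            continue
        results += spookshare_search(peer, query, ttl - 1,
                                     want - len(results), visited)
        if len(results) >= want:
            break  # stop as soon as we have the desired number
    return results
```

The contrast with Gnutella is the `break`: the moment enough results come back, no further nodes are bothered, instead of the query fanning out to everybody regardless.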

(If you would like to try SpookShare, run a permanent node, or help develop (all would be greatly appreciated), check out

I should also mention Freenet. Freenet works completely differently than any of the above networks, because it is meant to do something different. Whereas the networks I mentioned before have a special way to distribute searches, and leave actual transfers of files up to conventional methods (such as HTTP), Freenet is all about moving files around. Freenet was designed as an information storage/retrieval system that would make it almost impossible to remove any specific piece of information, if it was popular. Freenet does not do searches. Instead, you must know the exact 'key' of a resource (a file), in order to download it. To retrieve the resource with that key, you ask a single neighbor about it. Requests are done depth-first, like SpookShare searches. If the file with that key is ever found (doesn't always happen - if a piece of information is not requested enough it will disappear), the file itself - not a message giving the address of the file - is sent back through the chain of nodes to whoever initiated the request. Every node that the file passed through will keep a copy of the file. This will cause files to actually move closer to where they are more popular. A side effect of that is that it makes it nearly impossible to know who initiated the request, making Freenet users, and those who post information on Freenet, almost completely anonymous. Also note that certain countries (China) are trying to block access to the Freenet website (so it must be good, right?).
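The ask-a-neighbor-and-cache-on-the-way-back behavior can be sketched like this. It's my own simplification: real Freenet routes requests by key closeness and the key string below is just an example, and I've assumed an acyclic set of neighbors to keep the sketch short:

```python
def freenet_fetch(node, key, ttl):
    """Depth-first key lookup: if this node doesn't have the data,
    ask neighbors one at a time. When the data comes back, every node
    on the return path keeps a copy, so popular files migrate toward
    where they are requested. Each node is {'store': {...}, 'peers': [...]}."""
    if key in node['store']:
        return node['store'][key]  # found it locally
    if ttl <= 1:
        return None  # request dies here
    for peer in node['peers']:
        data = freenet_fetch(peer, key, ttl - 1)
        if data is not None:
            node['store'][key] = data  # cache a copy on the way back
            return data
    return None
```

Notice the requester only ever talks to its immediate neighbor, and the neighbor can't tell whether the request originated there or was merely forwarded - that's where the anonymity comes from.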

And that's all I have to say about that.

- T.O.G. of Spookware

© 2006, Particle