
Solving the online bandwidth problem (my take)



jasong
05-18-2008, 12:37 AM
I'm sure you're all aware of the doomsayers saying that the Internet is going to slow down to a crawl in the next decade. I wouldn't be one bit surprised if they're proven right, but I don't think added bandwidth is the solution. In a sense, I think LESS bandwidth per person is the solution. Here's what I'm talking about.

A few key points I need to make:

(1) When you download a video file, there's an excellent chance that the read/write heads aren't getting stressed at all. It's a cakewalk for the hard drive to handle the transfer requirements of the vast majority of Internet connections.

(2) Those video files end up being duplicated into millions of packets when they're transferred. In a popular torrent, the same information might get retransferred hundreds of times.

(3) Most videos that people want are also wanted by tens of thousands of other people at the same time. 90% of the bandwidth might be taken up by 2% of the files available.

My solution, or the beginnings of one, is to have a dedicated network for really popular media. Combine thousands of connections' worth of bandwidth and dedicate it to transmitting, and retransmitting, shows that are popular at the time. Instead of an Ugly Betty episode taking up thousands of times as much bandwidth as the actual size of the file, transmit it at about 30-50 times normal speed every few hours. People wouldn't be able to get the show immediately, but it would still be faster than things like BitTorrent, and people would be able to carry it around instead of being tied to their desk by the Web version. I know there are a lot of problems with DRM, but if they come up with a better system than BitTorrent, or at least a faster one, people will flock to it.

Worries about profits are slowing down entertainment innovation. Instead of hanging on to old methods for dear life, the big companies should be figuring out what works AND THEN figuring out how to profit from it.

PCZ
05-19-2008, 04:41 AM
Jasong
There is plenty of spare capacity in the backbone of the internet.
No need to panic.
Stories like this are FUD spread by ISPs to justify capping and price hikes.

It works like this.
Company A peers with company B at 100 Mb/s.
Traffic increases, so they start peering at 1 Gb/s.
Traffic increases again, so they peer at 10 Gb/s.
Looking at projected traffic growth, they plan on upgrading to 40 Gb/s next year.
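
As a rough sketch of how that planning looks (Python; the starting load and the 40% yearly growth are invented purely for illustration, not real figures):

# When does a peering link need the next step of the upgrade ladder?
link_steps_gbps = [0.1, 1, 10, 40, 100]   # 100M -> 1G -> 10G -> 40G -> 100G
traffic_gbps = 3.0                         # assumed current peak traffic
growth_per_year = 1.4                      # assumed 40% growth per year

for year in range(8):
    # pick the smallest step that still leaves headroom (say, under 70% utilised)
    step = next((s for s in link_steps_gbps if traffic_gbps < 0.7 * s), None)
    if step is None:
        print(f"year {year}: beyond the biggest step, time to add links")
        break
    print(f"year {year}: peak {traffic_gbps:.1f} Gb/s -> peer at {step} Gb/s")
    traffic_gbps *= growth_per_year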

The problem is getting this to the end users, as most of the connections into the home are owned by one company.
They have a stranglehold over the industry.

In the UK this is BT.
They are the ones strangling the internet for end users.
Every country has its 'BT'.

Chuck
05-20-2008, 01:47 PM
Jasong
As one who knows how much fiber is in the ground AND how disk drives work (that little thing called cylinder cache), do yourself, and us, a favor:

1. Get the actual data about how much fiber is in the ground and, most importantly, how much of it is still dark (unused). You will find that over 80% of it is still dark even in the most heavily used areas. Stop and think: the cost of fiber is the labor required to put it in the ground. So Company A decides they need 50 pairs of multi-mode, multi-spectral fiber. They put 250-300 pairs in the ground at that time. The reason is the labor. Percentage-wise, getting a crew out to the site and pulling cable through the conduits costs far more than the few pennies per meter the fiber itself costs. In other words, it's cheaper to put in 10-20-50x more than they actually need for the next 4-5 years, because the real cost is putting it in the ground and hooking it up, not the fiber.
So, what do they do? Put the fiber in the ground and THEN light up pairs as they are needed. Perhaps you should read up and post here for us just how much capacity a single pair of fiber can carry. Now compare that to the number of homes in a typical community and the grade of service they need. Show the math of how much is used and how much is still dark, yielding a usage percentage and an unused percentage. Additionally, relating to what PCZ said, the other cost, and the place where ANY issues show up, is how to get it to you, the end customer.
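
Here's a back-of-the-envelope sketch of that math in Python. Every number below is an assumption for illustration (pair counts, DWDM setup, homes, service tier, contention), not real utilisation data:

# Dark-fibre percentage and one pair's capacity versus a town's demand.
pairs_in_ground = 300          # pairs pulled through the conduit
pairs_lit = 50                 # pairs actually carrying traffic
dark_pct = 100 * (pairs_in_ground - pairs_lit) / pairs_in_ground
print(f"dark fibre: {dark_pct:.0f}%")          # -> 83%

# capacity of ONE lit pair with modest DWDM: 40 wavelengths x 10 Gb/s each
pair_capacity_gbps = 40 * 10

# a community of 20,000 homes, each sold 10 Mb/s, at 20:1 contention
homes = 20_000
per_home_mbps = 10
contention = 20
peak_demand_gbps = homes * per_home_mbps / contention / 1000

print(f"one pair carries {pair_capacity_gbps} Gb/s, "
      f"the whole town needs ~{peak_demand_gbps:.0f} Gb/s at peak")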
If you run DSL, then you are typically running PPPoA. That stands for Point-to-Point Protocol over ATM. ATM is the telcos' (telecommunications companies') primary internal protocol... it is how the CORE switch works *AND* it's just the right size for sending little voice packets around without any noticeable digitization problems. I could go on for hours about this, but will leave it at that... telcos use ATM, period.

If you run cable, then you are more like traditional hardline Ethernet (CSMA/CD), where everyone on that segment of hardline cable must share the bandwidth. This is the segment/part of the Internet that actually makes it to your home. If there is ANY bottleneck, it is here. CSMA/CD has, per its specifications, a specific timing requirement, and lots of little packets stress that requirement. This is why cable providers, but not DSL providers, are apt to limit or block things like BitTorrent. The 'uplink' speed of the specification (the speed at which data can be sent back 'up the cable') is lower than the speed at which it can be sent down the cable. Cable is designed, and always has been, as primarily a one-way DELIVERY-TO-CUSTOMER medium.

The 'HEAVY TRAFFIC' you speak of is the number of channels of HDTV or something like that, but that falls far short of the bandwidth available within the cable ISP's capacity. If you'd like, we can do the math here... take 100 cable channels, sent to 100,000 customers. Give each of those cable customers Internet service. Use the following as a guide AND assume that every customer is running Cat 6 (Gigabit) Ethernet on EVERY computer in their home. Use the following 'minimally acceptable teaching table' as your guide (and this one is not all that accurate):

http://www.hometheaternetwork.com/HTN_Cables5.htm
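
And here's that 100-channels-to-100,000-customers math sketched out (19.4 Mb/s is a typical MPEG-2 HD channel rate; everything else is just the numbers from the example above):

# Why broadcast/multicast delivery barely dents the plant's capacity.
channels = 100
customers = 100_000
mbps_per_channel = 19.4

# Multicast: every customer taps the SAME RF signal,
# so the plant carries each channel exactly once.
multicast_gbps = channels * mbps_per_channel / 1000

# If instead every customer pulled their own unicast stream of one channel:
unicast_gbps = customers * mbps_per_channel / 1000

print(f"multicast: ~{multicast_gbps:.1f} Gb/s total on the plant")
print(f"unicast:   ~{unicast_gbps:,.0f} Gb/s if everyone streamed individually")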

Now, keep in mind that EVERY cable provider is using RF coax, digitally or analog coded (most are digital now in preparation for the Feb 2009 switchover), and you will find they have MORE than enough data capacity... it's a mere 3 Gigahertz (100 HDTV channels in MULTICAST mode to everyone). (In regard to your post, that is the equivalent of ONE SATA-2 drive streaming at NORMAL, DESIGN specifications.) When you watch a regular DVD, it is at a mere 6-8 Megabits/sec. THAT's it! If the hard drive on your PC were that slow (1 Megabyte/sec) you would have a fit!
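
The bits-versus-bytes arithmetic, for anyone following along (300 MB/s is the SATA-2 interface ceiling; sustained platter throughput is lower, but still far above what one video stream needs):

dvd_mbit_s = 8                       # upper end of a regular DVD stream
dvd_mbyte_s = dvd_mbit_s / 8         # = 1 MB/s
sata2_mbyte_s = 300                  # SATA-2 interface ceiling

print(f"DVD stream: {dvd_mbyte_s:.0f} MB/s")
print(f"one SATA-2 link could feed ~{int(sata2_mbyte_s / dvd_mbyte_s)} such streams")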

Why use digital RF cable, you ask? Well, your analog 'internet upload' channel typically sits at 15 or 30 MHz. That is where cable users get bottlenecked, if at all, when the provider brings all the cable users together into a switched network (yes, the same type of 'RJ-45' switch that sits on your desk, except one with 48, 96, or MORE ports on it). A good cable provider who knows the business will have taken all this into account. What they must do, and many do it properly, is keep track of and LIMIT how many Internet customers they put on a segment. (Read up on the original hardline (coaxial) 10 megabit Ethernet spec.) The nice part about running multiple speeds is that you run them at different frequencies on the same piece of wire/coax/fibre (the medium makes no difference with respect to timing), just like multi-spectrum (light color) transmission in a piece of fiber. Nothing interferes with anything else.
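
A sketch of that segment-sizing decision (the channel capacities and contention ratio below are assumed, roughly DOCSIS-1.x-era numbers, not any particular provider's):

# How many Internet customers fit on one cable segment?
downstream_mbps = 38      # one 6 MHz 256-QAM downstream channel, roughly
upstream_mbps = 9         # one narrower upstream channel, roughly
sold_down_mbps = 6        # the service each customer is sold
contention = 20           # assume 20:1 oversubscription

max_by_down = downstream_mbps * contention / sold_down_mbps

for real_up_mbps, label in [(0.5, "light upstream use"),
                            (2.0, "heavy torrent seeding")]:
    max_by_up = upstream_mbps * contention / real_up_mbps
    limit = int(min(max_by_down, max_by_up))
    print(f"{label}: ~{limit} customers per segment "
          f"(downstream allows {int(max_by_down)}, upstream allows {int(max_by_up)})")

That's the shape of the reason uploads, not downloads, are what get capped: heavy seeding shrinks the segment the upstream channel can support.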

Now, doing the math, you see how the numbers LOOK large. This is how ISPs *TRY* to justify increasing your rates, because at first glance this is where people get the misunderstanding about how much of the backbone is going to get overloaded... HELL, we haven't even gotten to the backbone yet... we are still in the little cable or telco ISP at the end of the tree... aka the leaf on the branch. When we get to the trunk, then you can see just how much is really in use. This is where you start to see how many different paths there are... Oh, and don't forget the data speed of a downlink from an orbiting satellite! (Why do you think people get their own dishes?)

Until then, realize that I probably put more disk data load on the simulator I just built, running 60 frames/second, rendering at HIGHER than HDTV resolution on 9+ data panels (36 2048x1536-pixel images stitched together in hardware). Oh, did I tell you that we run the simulators over the open Internet (multiple commercial simulators) and over private networks (multiple military simulators)? Does that traffic ever bother you? Have you done the math yet? Have you computed what it takes, data-wise, to put 5000+ objects in a scene and have them 'move' in relation to your point of view each second?
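
For anyone who wants to try that math, a rough sketch (3 bytes per pixel and 64 bytes per object update are assumptions; compression is ignored):

# What a display wall like that pushes per second as raw pixels,
# versus what the network actually has to carry.
panels = 36
w, h = 2048, 1536
fps = 60
bytes_per_pixel = 3

raw_bytes_per_s = panels * w * h * fps * bytes_per_pixel
print(f"raw pixel rate: {raw_bytes_per_s / 1e9:.1f} GB/s "
      f"({raw_bytes_per_s * 8 / 1e9:.0f} Gb/s)")

# which is why simulators send object state (position, orientation, velocity)
# over the network and render locally, instead of shipping pixels
objects = 5000
bytes_per_update = 64
state_bytes_per_s = objects * fps * bytes_per_update
print(f"object-state traffic: ~{state_bytes_per_s * 8 / 1e6:.0f} Mb/s")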

The other 'super load' I put on my own home intranet is my WiFi video server, which allows every TV & PC in the house to watch either TV or one of the movies in my jukebox. All of that is on MY micro-backbone.

Oh, regarding the sims, we did forget to multiply in how many simulators are out there and how many participate in active simulations all the time... OOOPS. :) And look, my internet service, rated at 12 Megabits/sec (10 Mbit DSL on a 15 Mbit capable piece of wire, not even fibre)... I get exactly 1.2 Megabytes/sec at my PC... and that includes the DSL modem, 2 routers, and 3 switches. I could compute packet delay time, but there's no point. Suffice it to say I am getting MORE than I pay for and STILL could get more except the Telco computer would see I'm getting more than I paid for and push me back down. The guys know that running 12 on a line 'quoted' at 10 will be overlooked by the computer.
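
The bits-to-bytes conversion behind that 1.2 Megabytes/sec figure (the overhead fraction here is an assumption; real figures vary with protocol and MTU):

sync_rate_mbps = 12          # DSL line rate
overhead = 0.20              # assumed ATM + PPP + TCP/IP overhead, roughly

goodput_mbyte_s = sync_rate_mbps * (1 - overhead) / 8
print(f"~{goodput_mbyte_s:.1f} MB/s of actual file transfer")   # ~1.2 MB/s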

Now, as for my performance? I can download ALL DAY LONG at full speed and not see ANY interference or delay as I chat with you or anyone else, browse the Internet, or sit here and type this (what should be unnecessary) dissertation. Why? Full duplex: transmitting and receiving at the same time, independently of each other.

Now, your point... tens of thousands of users watching 2% of the YouTube (for example) LOW RESOLUTION, LIMITED WINDOW SIZE videos (see, they know the math and where to be practical). FYI my friend, the heads aren't thrashing at all. Running Linux, you KNOW Linux reads the data once and, if there is a need, keeps it in memory. The next time the data is read, it's already there, and the disk drive's driver software and internal disk buffer driver simply retrieve the block from memory and push it out the wire. The heads only 'start' to get busy if tens of thousands of users EACH wanted a different file. That is when it is justified for a systems architect/engineer (like myself) to have planned for such surges. There is NO single point data source... we've distributed it. We've also put RAID sets in for a) speed (so people like you don't complain), b) reliability... IN CASE we get a bad drive from the manufacturer, and c) longevity. I don't want my engineers changing out 100 drives a week due to excessive use and end-of-life situations. Oh, did you know that Linux/Unix file systems are DESIGNED to 'splatter' the data across the disk to increase data reliability and performance? Figure that one out. How do I know? I used to teach Unix/Linux internals and device drivers on a daily basis.
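
If you want to see the page cache doing exactly that, here's a quick sketch (point it at any large file you have lying around; the path below is only a placeholder):

import time

def timed_read(path):
    start = time.perf_counter()
    with open(path, "rb") as f:
        while f.read(1024 * 1024):      # read in 1 MB chunks until EOF
            pass
    return time.perf_counter() - start

path = "/tmp/bigfile.bin"               # substitute any large local file
cold = timed_read(path)                 # may actually hit the disk
warm = timed_read(path)                 # normally served from the page cache
print(f"first read: {cold:.3f}s, second read: {warm:.3f}s")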


Now, why did I go into all this? Because even at this rudimentary level of explanation, it's an educational session. You know I teach, and you know I've done things which cannot be discussed due to customer Non-Disclosure agreements. The point here is that you UNDERSTAND how things are built and what is going on. And, PLEASE UNDERSTAND, I'm not picking on you, but sharing all this for everyone / anyone who wants to know/learn. That's what we professors (and architects and engineers) do.

I sure hope this has helped.
Don't forget who I work for.....
That little company whose name you see more times than you realize... try doing a 'traceroute' to another user sometime.... you just may find yourself using the backbone and our network.


Cheers,
Chuck

PS: My 2nd thesis is done and published, and a third one has just made itself known. I am going to try to work out the equations and all the proofs ASAP.

Chuck
05-20-2008, 03:10 PM
... I nearly forgot.

Look at the size of a hard drive's cylinder cache. It is reading AHEAD of the game, storing the data in a FIFO (first in, first out) buffer in anticipation of your need.
Whether reading or writing, you are talking to the FIFO first. After that, the drive's onboard controller determines when to commit the new sectors to the media. So head thrashing is minimized even further here (in most cases eliminated), as a cylinder (or track) is written in one 'write' transaction, with the heads already exactly where they need to be.
If anything does go wrong in this process, it's probably because the power failed or you hit reset and the filesystem buffer did not get a chance to update the superblocks, directory block(s), and the actual sectors (data blocks) before the data was lost. That is what fsck (Linux) and CHKDSK (Windows) are for: to look for and fix those errors where possible... and if the data is gone, it's gone. The directory and file size information, as well as the sectors in the chain (each cluster assigned to the file), will be updated to properly reflect the actual data on the disk and where it is. That's why we sometimes lose data when drives go down. They sometimes do not have enough time to flush all the buffers (internal to the drive and to Linux/Unix/Windows) before the power is gone and there's nothing left.
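
A tiny sketch of the window those checkers exist to clean up (Python; the filename is just an example):

import os

with open("journal.log", "ab") as f:
    f.write(b"important record\n")
    # At this point the bytes may exist only in process/OS buffers.
    f.flush()                # push from the process buffer to the kernel
    os.fsync(f.fileno())     # ask the kernel (and, ideally, the drive)
                             # to commit it to the media before moving on
# Lose power between write() and fsync() and the record may simply not be
# on disk; that is the gap fsck/CHKDSK repair around afterwards.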

As a reference: our mainframe, which does have multiple live satellite feeds on it (so you can imagine how much data is being recorded WHILE we are working), has something just over 128 drives in each rack and, I think, 4 or 5 racks.

That supports over 1000 simultaneous users (some 3000 usernames), many of which are running simulations or multi-gigabyte math models (like me) over our intranet (switched Gigabit segments). The average CPU load is less than 10% of the total CPU power and system bandwidth. Only a few simulations (I have a few) push the cpu load above 80%. Those simulations are run at night so they can run faster (8 hours vs 12-14 when regular users are on). These are equivalent to the geothermal type of simulations the oil companies run with their modeling, just a few levels (orders of magnitude) finer resolution.



If you (or anyone) need more on these, I will PM/email or post here if requested by everyone. I've done enough of a uni lecture for one day :) (I'm really not 'harping', but to me, and I thought to all, this whole thing was rather obvious and long since ignored... so NO, I don't bite. :) )


So Jason, less bandwidth per person is not the answer (Portugal tried that, people screamed). Multiple networks are not the answer. How you use what you have is part of the answer, distributing it in your home is highly important, and having the right equipment for the class of service you want/need is critical, as every other part of the equation has been taken care of. The telco/cable ISP can't tell you what to do in your own home; that's up to you. Be educated or ignorant... it's your right.



Hope that helps,
C.

Paratima
05-26-2008, 11:34 AM
Nice. :hifi: