I started 5 EC2 Large 64-bit servers, using the Amazon Linux image ‘ami-221fec4b’ (aka: amzn-ami-2011.02.1.x86_64). One of these was the server, and the other 4 were the clients, each trying to open 250k connections. These are vanilla EC2 Large instances, with the following kernel tuning (credit to the metabrew article):
The following increases the user limit for number of open file descriptors (TCP connections are file descriptors):
echo "* soft nofile 1048576" >> /etc/security/limits.conf
echo "* hard nofile 1048576" >> /etc/security/limits.conf
After the above is done, you have to log out and back in again.
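After logging back in, it is worth confirming that the new limits actually took effect. The following is a minimal sketch, not part of the original setup: `nofile_ok` is a hypothetical helper name, and the check assumes your shell's `ulimit` accepts `-Sn` and `-Hn` (bash and dash do).

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): succeeds when a
# file-descriptor limit is at least the number of descriptors wanted.
#   $1 = limit as printed by `ulimit -n` (a number or "unlimited")
#   $2 = required number of descriptors
nofile_ok() {
    [ "$1" = "unlimited" ] && return 0
    [ "$1" -ge "$2" ]
}

# Compare this shell's soft (S) and hard (H) limits with the 1048576
# configured in limits.conf above.
for kind in S H; do
    limit=$(ulimit -"$kind"n)
    if nofile_ok "$limit" 1048576; then
        echo "-${kind}n = $limit: ok"
    else
        echo "-${kind}n = $limit: too low, check limits.conf and log in again"
    fi
done
```

If either line reports "too low", the most likely cause is that the session was started before limits.conf was edited.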
To tune the kernel to allow 1M connections, I appended the following to /etc/sysctl.conf:
# Settings from http://www.metabrew.com/article/a-million-user-comet-application-with-mochiweb-part-3
# Config needed to have enough tcp stack memory:
net.core.rmem_max = 33554432
net.core.wmem_max = 33554432
net.ipv4.tcp_rmem = 4096 16384 33554432
net.ipv4.tcp_wmem = 4096 16384 33554432
net.ipv4.tcp_mem = 786432 1048576 26777216
net.ipv4.tcp_max_tw_buckets = 360000
net.core.netdev_max_backlog = 2500
vm.min_free_kbytes = 65536
vm.swappiness = 0
# This is for the outgoing connections max:
net.ipv4.ip_local_port_range = 1024 65535
# I added this to set the system wide file max:
fs.file-max = 1100000
# Reduce the time sockets stay in time_wait: http://forums.theplanet.com/lofiversion/index.php/t62399.html
net.ipv4.tcp_fin_timeout = 12
To apply it, you need to do:
sudo sysctl -p
I believe this tuning still needs work. Next time I run the tests I’ll check the kernel log to see if anything in the TCP stack has maxed out.
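One quick sanity check on the settings above: unlike the byte-valued rmem/wmem settings, net.ipv4.tcp_mem is measured in memory pages. The sketch below converts the three tcp_mem values (min, pressure, max) to MiB. `pages_to_mib` is a hypothetical helper, and the 4096-byte page size is an assumption that holds on typical x86_64 systems (`getconf PAGESIZE` reports the real value).

```shell
#!/bin/sh
# Hypothetical helper: convert a page count to MiB (integer division).
#   $1 = number of pages
#   $2 = page size in bytes (optional; 4096 is assumed if omitted)
pages_to_mib() {
    echo $(( $1 * ${2:-4096} / 1048576 ))
}

# The three tcp_mem values from the sysctl.conf settings: min, pressure, max.
for pages in 786432 1048576 26777216; do
    echo "$pages pages = $(pages_to_mib "$pages") MiB"
done
```

Under that assumption the values come out at roughly 3 GiB, 4 GiB and 102 GiB. The top output further down shows about 7.3 GiB of RAM on the instance, so the maximum is far above physical memory and may be worth revisiting when the tuning is reworked.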
To reproduce my tests, you can follow the steps used to configure the vanilla instances:
# Install compiler / tools
sudo yum -y install gcc* git* make

# Install libev
wget http://dist.schmorp.de/libev/libev-4.04.tar.gz
tar -zxvf libev-4.04.tar.gz
cd libev-4.04
./configure && make && sudo make install
sudo sh -c "echo /usr/local/lib > /etc/ld.so.conf.d/usr-local-lib.conf"
sudo ldconfig

# Install MC
cd ~
git clone git://github.com/chrishulbert/MegaComet.git
cd MegaComet

# Now do the kernel tuning as mentioned above

# To run the server:
cd ~/MegaComet
make
./start

# To run the clients:
cd ~/MegaComet/testing
make
./megatest X Y
# (where X is a, b, c or d depending on which testing server this is)
# Also Y is the IP address of the comet server
The clients got up to 142k, 144k, 105k, and 132k connections respectively before attempts to open new connections timed out. This is a total of 523k connections, just over half a million! The RAM and CPU usage on the server was minimal throughout the test. Here’s a screenshot of top while the tests were running at approximately 4000 new connections/second, to give an idea of CPU and memory usage:
top - 11:03:28 up  1:12,  2 users,  load average: 0.25, 0.58, 0.48
Tasks:  77 total,   2 running,  75 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.8%us,  2.7%sy,  0.0%ni, 95.1%id,  0.0%wa,  0.1%hi,  1.1%si,  0.2%st
Mem:   7652552k total,  1441076k used,  6211476k free,    22144k buffers
Swap:        0k total,        0k used,        0k free,   823848k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
22612 ec2-user  20   0  8664  340  264 S  0.0  0.0   0:00.00 megamanager
22614 ec2-user  20   0 25552  17m  460 S  4.7  0.2   0:08.22 megacomet
22615 ec2-user  20   0 25552  17m  460 S  5.0  0.2   0:08.08 megacomet
22616 ec2-user  20   0 25556  17m  460 S  5.0  0.2   0:08.16 megacomet
22617 ec2-user  20   0 25556  17m  460 S  5.0  0.2   0:08.28 megacomet
22618 ec2-user  20   0 25580  17m  460 R  5.0  0.2   0:08.01 megacomet
22619 ec2-user  20   0 25556  17m  460 S  5.0  0.2   0:08.38 megacomet
22620 ec2-user  20   0 25556  17m  460 S  4.7  0.2   0:08.23 megacomet
22621 ec2-user  20   0 25552  17m  460 S  4.7  0.2   0:08.16 megacomet
I forgot to grab a top screenshot when the connections were all opened, but the memory usage was no different, and CPU was zero.
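To cross-check the client-side totals from the server side, one option is to count established TCP connections directly. This is only a sketch and not how the numbers above were gathered: `count_established` is a hypothetical helper, and it assumes `netstat -tn` style output where the connection state is the sixth whitespace-separated column.

```shell
#!/bin/sh
# Hypothetical helper: count ESTABLISHED TCP connections in `netstat -tn`
# style output read from stdin. Header lines are ignored because their
# sixth column is never "ESTABLISHED".
count_established() {
    awk '$6 == "ESTABLISHED" { n++ } END { print n + 0 }'
}

# On the server, assuming netstat is installed:
#   netstat -tn | count_established
```

With hundreds of thousands of sockets open, listing every connection can be slow. `ss -s` prints a summary instead and is likely a better fit for repeated polling.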
I really can’t believe the CPU and RAM usage are so small when the 1/2M connections are live and idle! At this stage, I’m not really testing for performance when passing messages around. I hope to get to 1M (static) open connections, and then start testing messaging. I’m optimistic: it looks promising. Next time I try this, I’ll keep a close eye on the kernel log (/var/log/kern.log) and see if I can find any bottlenecks.
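For that kernel log check, a small filter like the one below could narrow things down. It is a sketch under assumptions: `tcp_warnings` is a hypothetical helper, the pattern list is my guess at messages that commonly show up when the TCP stack hits a limit, and the exact wording differs between kernel versions, so treat a clean result as inconclusive.

```shell
#!/bin/sh
# Hypothetical helper: keep only kernel log lines (read from stdin) that
# often accompany a saturated TCP stack. The patterns are assumptions and
# not exhaustive; matching is case-insensitive.
tcp_warnings() {
    grep -i -E 'out of socket memory|consider tuning tcp_mem|too many.*orphan|syn flooding|conntrack: table full'
}

# Usage on the server (the log path depends on the distribution, and
# dmesg may need root on some systems):
#   dmesg | tcp_warnings
#   tcp_warnings < /var/log/kern.log
```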
http://www.metabrew.com/article/a-million-user-comet-application-with-mochiwe...
http://www.cs.wisc.edu/condor/condorg/linux_scalability.html
Thanks for reading! And if you want to get in touch, I'd love to hear from you: chris.hulbert at gmail.