
More MegaComet testing: Ruling out keepalives

After the last test, which wasn’t much of an improvement, my suspicion was that the TCP keepalives for all those connections were swamping the interface. So I installed iftop and reran the tests to check:

echo Install iftop
sudo yum -y install libpcap*
sudo yum -y install ncurses*
cd ~
wget http://ex-parrot.com/~pdw/iftop/download/iftop-0.17.tar.gz
tar -xzvf iftop-0.17.tar.gz
cd iftop-0.17
./configure && make && sudo make install
sudo ./iftop

I didn’t get any further, but it did confirm that once the sockets are open, there is no load on the network. It seems the keepalives aren’t significant (or are nonexistent).

Another thing I found was ss -s, which shows how many sockets are open at one time. Very handy for screenshots :)
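If you'd rather log the count than screenshot it, something like the following could pull the established-connection figure out of the ss -s summary. This is only a sketch: the helper name estab_count is made up, the sample text is canned rather than taken from the test run, and it assumes the summary contains a line of the form "TCP: N (estab M, ...)", which may differ between iproute2 versions.

```shell
#!/usr/bin/env bash
# estab_count: read `ss -s` output on stdin and print the number of
# established TCP connections.
# Assumption: the summary has a line like "TCP:   5 (estab 2, closed 0, ...)".
estab_count() {
  sed -n 's/^TCP:.*estab \([0-9][0-9]*\).*/\1/p'
}

# Canned example input (illustrative only, not from the test run).
sample='Total: 195 (kernel 0)
TCP:   5 (estab 2, closed 0, orphaned 0, synrecv 0, timewait 0/0), ports 0'
printf '%s\n' "$sample" | estab_count

# On a live box this would be:
#   ss -s | estab_count
```

Running it in a loop with sleep would give a crude connection-count log while the test clients ramp up.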

MegaComet test #4 - This time with more kernel

Summary

This time, I’ll be running the MegaComet tests as per test 3, but with kernel logging enabled to see where I’m pushing the TCP stack too far, so that hopefully I can fix it with some configuration.

Setup

As per test 3: start 5 EC2 servers ‘ami-221fec4b’. Out of curiosity, I priced it this time. Since my tests will take less than an hour, it’ll cost $0.34 (EC2 large instance hourly cost) * 5 instances = $1.70 to run this test. I can handle that. It’s also probably worth mentioning that in EC2, I configure the firewall to leave all the MegaComet ports open. In the real world, you’d restrict the ‘application’ port.

Setup script

echo Configuring TCP stack
sudo bash
echo "# Settings from http://www.metabrew.com/article/a-million-user-comet-application-with-mochiweb-part-3" >> /etc/sysctl.conf
echo "# Config needed to have enough tcp stack memory:" >> /etc/sysctl.conf
echo "net.core.rmem_max = 33554432" >> /etc/sysctl.conf
echo "net.core.wmem_max = 33554432" >> /etc/sysctl.conf
echo "net.ipv4.tcp_rmem = 4096 16384 33554432" >> /etc/sysctl.conf
echo "net.ipv4.tcp_wmem = 4096 16384 33554432" >> /etc/sysctl.conf
echo "net.ipv4.tcp_mem = 786432 1048576 26777216" >> /etc/sysctl.conf
echo "net.ipv4.tcp_max_tw_buckets = 360000" >> /etc/sysctl.conf
echo "net.core.netdev_max_backlog = 2500" >> /etc/sysctl.conf
echo "vm.min_free_kbytes = 65536" >> /etc/sysctl.conf
echo "vm.swappiness = 0" >> /etc/sysctl.conf
echo "# This is for the outgoing connections max:" >> /etc/sysctl.conf
echo "net.ipv4.ip_local_port_range = 1024 65535" >> /etc/sysctl.conf
echo "# I added this to set the system wide file max:" >> /etc/sysctl.conf
echo "fs.file-max = 1100000" >> /etc/sysctl.conf
echo "# Reduce the time sockets stay in time_wait: http://forums.theplanet.com/lofiversion/index.php/t62399.html" >> /etc/sysctl.conf
echo "net.ipv4.tcp_fin_timeout = 12" >> /etc/sysctl.conf
exit
sudo sysctl -p

echo Enlarging user-limits on files
sudo bash
echo "* soft nofile 1048576" >> /etc/security/limits.conf 
echo "* hard nofile 1048576" >> /etc/security/limits.conf
exit

echo Enabling kernel logging
sudo bash
echo "kern.*          /var/log/kern.log" >> /etc/rsyslog.conf
sudo service rsyslog restart
exit

echo Installing build essentials
sudo yum -y install gcc* git* make

echo Installing libev
cd ~
wget http://dist.schmorp.de/libev/libev-4.04.tar.gz
tar -zxvf libev-4.04.tar.gz
cd libev-4.04
./configure && make && sudo make install

echo Adding libev to the library list 
sudo sh -c "echo /usr/local/lib > /etc/ld.so.conf.d/usr-local-lib.conf"
sudo ldconfig

echo Installing MC
cd ~
git clone git://github.com/chrishulbert/MegaComet.git
cd MegaComet
make
cd testing
make

echo Now you have to logout and in again, because you only have a low per-user limit as you can see:
ulimit -n
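One consequence of the ip_local_port_range setting in the script above is worth spelling out. A TCP connection is identified by (source IP, source port, destination IP, destination port), so a client with one IP can hold at most one connection per local port to any single server IP and port. The sketch below works out that ceiling and how many server ports a client would need to reach its target. The function names are made up, and it assumes each client box uses a single source IP.

```shell
#!/usr/bin/env bash
# ports_in_range LOW HIGH: number of local ports in an inclusive range.
ports_in_range() {
  echo $(( $2 - $1 + 1 ))
}

# server_ports_needed CONNS LOW HIGH: minimum number of distinct server
# ports needed for CONNS connections from one client IP (ceiling division).
server_ports_needed() {
  local conns=$1 per_port
  per_port=$(ports_in_range "$2" "$3")
  echo $(( (conns + per_port - 1) / per_port ))
}

ports_in_range 1024 65535               # prints 64512
server_ports_needed 250000 1024 65535   # prints 4
```

So with the 1024 to 65535 range, a client aiming for 250k connections has to spread them over at least 4 server ports (or more source IPs), under the single-IP assumption above.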

Viewing kernel logs

Once MC had started on the first instance, I ran this to view the kernel logs:

sudo tail -f /var/log/kern.log
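Since the kernel log can be noisy, a filter along these lines might help pick out TCP-related complaints. Treat it as a sketch: kern_tcp_hints is a made-up helper, and the keyword list is a guess at the sort of messages the kernel emits under socket pressure, not an exhaustive or verified set.

```shell
#!/usr/bin/env bash
# kern_tcp_hints: read kernel log lines on stdin and keep only those that
# look related to TCP/socket limits.
# Assumption: the keywords below are guesses; adjust them to taste.
kern_tcp_hints() {
  # grep exits non-zero when nothing matches; that is not an error here.
  grep -iE 'socket memory|orphan|conntrack|syn flood' || true
}

# Intended use on the server (root is needed to read the log):
#   sudo tail -f /var/log/kern.log | kern_tcp_hints
```

An empty result only means none of these keywords appeared, so it is still worth glancing at the unfiltered log.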

Starting tests:

On the test instances (2-5):

cd ~/MegaComet/testing
./megatest A 10.40.29.57

Results:

I can only get up to 494k connections. On the server, here is the top output when at maximum. As you can see, there’s plenty of ram free:

top - 11:56:56 up 36 min,  4 users,  load average: 0.00, 0.14, 0.20
Tasks:  84 total,   1 running,  83 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.0%us,  0.1%sy,  0.0%ni, 99.9%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   7652552k total,  2566036k used,  5086516k free,    20492k buffers
Swap:        0k total,        0k used,        0k free,   798960k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                   

 3373 ec2-user  20   0  8664  336  264 S  0.0  0.0   0:00.00 megamanager                                                                
 3374 ec2-user  20   0 27592  18m  460 S  0.0  0.2   0:15.90 megacomet                                                                  
 3375 ec2-user  20   0 26640  17m  460 S  0.0  0.2   0:15.40 megacomet                                                                  
 3376 ec2-user  20   0 26660  17m  460 S  0.0  0.2   0:15.29 megacomet                                                                  
 3377 ec2-user  20   0 26680  18m  460 S  0.0  0.2   0:15.27 megacomet                                                                  
 3378 ec2-user  20   0 27420  18m  460 S  0.0  0.2   0:16.59 megacomet                                                                  
 3379 ec2-user  20   0 27304  18m  460 S  0.0  0.2   0:15.81 megacomet                                                                  
 3380 ec2-user  20   0 26828  17m  460 S  0.0  0.2   0:15.52 megacomet                                                                  
 3381 ec2-user  20   0 27188  18m  460 S  0.0  0.2   0:15.60 megacomet

And the slabtop output:

Active / Total Objects (% used)    : 3713138 / 3713562 (100.0%)
 Active / Total Slabs (% used)      : 154792 / 154792 (100.0%)
 Active / Total Caches (% used)     : 58 / 74 (78.4%)
 Active / Total Size (% used)       : 1479546.63K / 1479691.42K (100.0%)
 Minimum / Average / Maximum Object : 0.01K / 0.40K / 8.00K

  OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME                   
524160 524158  99%    0.19K  24960   21     99840K dentry
496797 496642  99%    0.19K  23657   21     94628K kmalloc-192
496000 496000 100%    0.06K   7750   64     31000K kmalloc-64
495328 495328 100%    0.12K  15479   32     61916K kmalloc-128
495040 495040 100%    0.07K   8840   56     35360K blkdev_ioc
495000 495000 100%    0.62K  41250   12    330000K sock_inode_cache
494950 494950 100%    1.62K  26050   19    833600K TCP
163410 163385  99%    0.10K   4190   39     16760K buffer_head

Nothing appeared in the kernel log. So, for now, I’m not sure what the holdup is: there were no kernel errors and I didn’t hit a memory ceiling. I’m puzzled.
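For what it's worth, the slabtop figures above allow a back-of-envelope estimate of kernel memory per connection. The sketch below assumes that the caches holding roughly 495k objects (TCP, sock_inode_cache, kmalloc-192, kmalloc-128, blkdev_ioc, kmalloc-64) are all one-per-connection, which is an inference from the object counts rather than something the test established. The helper name per_conn_kb is made up.

```shell
#!/usr/bin/env bash
# per_conn_kb CONNS CACHE_KB...: sum the given slab cache sizes (in K, as
# printed by slabtop) and divide by the number of connections.
per_conn_kb() {
  local conns=$1
  shift
  printf '%s\n' "$@" |
    awk -v conns="$conns" '{ kb += $1 } END { printf "%.4f\n", kb / conns }'
}

# Figures copied from the slabtop output: the TCP object count, then the
# CACHE SIZE of TCP, sock_inode_cache, kmalloc-192, kmalloc-128,
# blkdev_ioc and kmalloc-64.
per=$(per_conn_kb 494950 833600 330000 94628 61916 35360 31000)
echo "per connection: ${per} K"

# Projection to 1M connections, converted from K to G (divide by 1024 twice).
awk -v per="$per" 'BEGIN { printf "1M connections: %.2f G\n", per * 1000000 / 1024 / 1024 }'
```

That comes to about 2.8K of slab memory per connection, or roughly 2.67G for a million, which is well inside the 7652552k that top reports. On these assumptions, kernel memory does not look like the ceiling.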

Building People

Building

I’m all about building things. I’ve discovered that is what makes me tick: watching something emerge out of nothing. It’s great. So far, it’s been about building software, building apps, woodworking, that kind of thing. But somehow, I’ve found myself building an entirely new thing.

People are the final destination

Somehow now I’m faced with the task of building not just physical things (woodworking), nor abstract things (software), but people. It came as a realisation recently while chatting with my pastor that what I spend most of my energy and increasing amounts of my time on these days is building people.

Building a person is all about taking them where they need to go. Finding their goal, passion, calling, whatever it is – and poking them, cajoling them, convincing them, pushing them, clearing away the cobwebs from their internal drive, until they get there.

My plan

So I’ve come up with a plan to do this. It’s a basic plan for now, and I’m sure I’ll revisit it and improve on it as the years go by. Let’s call it ‘leadership v0.1’ for now:

  • Connect
  • Find their vision
  • Convince them it’s possible
  • Overcome their fears
  • Hold their hands for baby steps
  • Follow up

Connect

This is all about connecting with the person. There’s no point doing anything with them if they don’t trust you, or don’t even like you. You have to build that bridge. To build the bridge, do whatever it takes to care about that person’s life. It’ll show, they’ll notice, and you’ll connect.

Then you need to make yourself available. You must be in regular contact. Remember: ‘He who spends the most time, wins.’ If your schedule is too busy for people, take a chainsaw to it. People are simply more important.

Find their vision

Everyone has a dream, a vision, a calling. Some people know what it is, and just need to be comfortable with you before they will share it. Others have never been given permission to dream. For those, you need to open their eyes to the possibilities that the world has for their life. Prod, poke, make them dream. Find out what their calling is. Not what you think they should do with their life, but what they truly are here for.

Everyone has something, as vague as it may be. Some people know exactly what it is, and some have a vague dream that will clear as they step into it. But, like a miner pans for gold, sift their thoughts until you find it. And once you’ve found it, you must encourage them to desire the dream.

Convince them it’s possible

Are you a salesman? Well, this is quite possibly the hardest sales job you’ll ever have: to convince someone that their dream lies in the realm of possibility. It is contagious: If you believe in someone, they will catch that belief eventually. It helps immensely to know that someone else believes in you. And so, you must be that person to them.

Show them examples of others who have achieved. Be an example of someone who is living out their dreams. Break down the dream into concrete, achievable steps. Persuade them that they deserve to achieve. Build a belief system into them that they are the type of person that can do great things. Many fear to hope, because of the disappointment of failure. Convince them to take the risk of hoping their dreams will come true.

Overcome their fears

Usually, fear is what holds people back from trying something new. What if I don’t have what it takes? I’ll surely crash and burn. I’m scared of the pain of failure. What if I try the wrong thing? As grown ups, we no longer fear the monster in the closet. Most have exchanged the closet monster for the failure monster.

These fears have to be exposed for what they are: the resistance. Read ‘Do the work’ by Pressfield. Give them the freedom to fail. Once they know that they’re allowed to fail a few times before they start succeeding, they’ll be more inclined to jump in and have a go. Put courage into them so that the sting of failure will not take them out.

Hold their hands for baby steps

Most people don’t know where to start. But they don’t realise that direction comes after action. The best way to convince them of this fact is to hold their hands for the first few steps, until direction kicks in. Do not stress about ‘which direction to take’ at this stage. What they’re doing isn’t as important as, simply: doing. Try many things, until one bears fruit. Teach them to take initiative to do new things related to their sphere.

The main point of this step is to build their confidence, and further convince them that they do have what it takes, that walking in the direction of the goal is possible, and that their fears are smaller than imagined. Also, this is where the nuts and bolts of showing them the ropes come into play. But, like my daughter, once you teach them how to walk, they’ll figure out on their own how to run. So learn when to take the training wheels off.

Follow up

Excitement is a fading thing. The start of any great goal is full of it, but the first setback will halve it. The period of drudgery, as we do the mundane necessities involved, will also take its toll. Once the excitement bank hits zero, they will need their inside drive to push them onwards. It is your job to fuel their inside drive. Keep their eyes on the goal, encourage them, celebrate the small wins, and show them how far they’ve come. Pick them up when they stumble.

People need people. We’re not built to make it alone, we need allies to spur us on to achieve our goals. Be that ally. Spur them on.

Who’s building me?

Which brings me to my final thought. Who’s building me? If it is my job to bring out people’s potential, who will do the same to me?

I struggled with this, until I had a realisation: life isn’t a hierarchy, it’s about each other. Instead of waiting for someone ‘higher up’ to mentor me, I found a group of like-minded friends and said: let’s build/spur on/encourage each other. So rather than entrusting my future to a single mentor, I now have half a dozen of them. Together, success is inevitable.

Half way there: Getting MegaComet to 523,000 concurrent HTTP connections

TL;DR

  • With a bit of kernel tuning, I was able to get up to 523k connections opened simultaneously from 4 client boxes to 1 MegaComet server.
  • Memory and CPU usage was minimal (128M across the server’s processes, maybe 24% CPU at 4000 connections/sec).
  • I’ll try to improve the kernel tuning to get it to 1M by checking the /var/log/kern.log next time.
  • Libev basically runs on the smell of an oily rag.

Setup

I started 5 EC2 Large 64-bit servers, using the Amazon Linux image ‘ami-221fec4b’ (aka: amzn-ami-2011.02.1.x86_64). One of these was the server, and the other 4 were the client servers, each trying to open 250k connections. These are vanilla EC2 Large instances, with the following kernel tuning (credits to the metabrew article):

Tuning

The following increases the user limit for number of open file descriptors (TCP connections are file descriptors):

echo "* soft nofile 1048576" >> /etc/security/limits.conf 
echo "* hard nofile 1048576" >> /etc/security/limits.conf

After the above is done, you have to log out and back in again.
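A quick way to confirm the new limit has actually taken effect after logging back in might look like the following. It's a minimal sketch: have_enough_fds is a made-up helper, and 1048576 is simply the value written to limits.conf above.

```shell
#!/usr/bin/env bash
# have_enough_fds CURRENT WANTED: succeed if the soft open-file limit
# CURRENT (a number, or the word "unlimited") covers WANTED descriptors.
have_enough_fds() {
  local current=$1 wanted=$2
  [ "$current" = "unlimited" ] || [ "$current" -ge "$wanted" ]
}

# `ulimit -n` reports the soft limit for the current shell.
if have_enough_fds "$(ulimit -n)" 1048576; then
  echo "nofile limit OK"
else
  echo "nofile limit too low: $(ulimit -n); log out and back in"
fi
```

Anything started from a shell with the old limit inherits it, so it makes sense to run this check in the same session that will start the server or the test clients.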

To tune the kernel to allow 1M connections, the following was appended to the /etc/sysctl.conf:

# Settings from http://www.metabrew.com/article/a-million-user-comet-application-with-mochiweb-part-3
# Config needed to have enough tcp stack memory:
net.core.rmem_max = 33554432
net.core.wmem_max = 33554432
net.ipv4.tcp_rmem = 4096 16384 33554432
net.ipv4.tcp_wmem = 4096 16384 33554432
net.ipv4.tcp_mem = 786432 1048576 26777216
net.ipv4.tcp_max_tw_buckets = 360000
net.core.netdev_max_backlog = 2500
vm.min_free_kbytes = 65536
vm.swappiness = 0
# This is for the outgoing connections max:
net.ipv4.ip_local_port_range = 1024 65535
# I added this to set the system wide file max:
fs.file-max = 1100000  
# Reduce the time sockets stay in time_wait: http://forums.theplanet.com/lofiversion/index.php/t62399.html
net.ipv4.tcp_fin_timeout = 12

To apply it, run sudo sysctl -p. I believe this tuning still needs work. Next time I run the tests I’ll check the kernel log to see if anything in the TCP stack has maxed out.
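One detail that is easy to miss in the list above: net.ipv4.tcp_mem is measured in memory pages rather than bytes, unlike tcp_rmem and tcp_wmem. The sketch below converts the three tcp_mem values to MiB. The helper name pages_to_mib is made up, and the 4096-byte default assumes the usual x86_64 page size (getconf PAGESIZE reports the real one).

```shell
#!/usr/bin/env bash
# pages_to_mib PAGES [PAGE_SIZE]: convert a page count to whole MiB.
# Assumption: PAGE_SIZE defaults to 4096 bytes.
pages_to_mib() {
  local pages=$1 page_size=${2:-4096}
  echo $(( pages * page_size / 1024 / 1024 ))
}

# The three tcp_mem thresholds from the sysctl.conf settings: low, pressure, max.
for pages in 786432 1048576 26777216; do
  echo "$pages pages = $(pages_to_mib "$pages") MiB"
done
# prints:
#   786432 pages = 3072 MiB
#   1048576 pages = 4096 MiB
#   26777216 pages = 104598 MiB
```

So the pressure threshold sits at 4 GiB, while the maximum is far beyond the 7652552k of RAM on these instances and will never be the limit that bites.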

Steps

To reproduce my tests, you can follow the steps used to configure the vanilla instances:

# Install compiler / tools
sudo yum -y install gcc* git* make

# Install libev
wget http://dist.schmorp.de/libev/libev-4.04.tar.gz
tar -zxvf libev-4.04.tar.gz
cd libev-4.04
./configure && make && sudo make install
sudo sh -c "echo /usr/local/lib > /etc/ld.so.conf.d/usr-local-lib.conf"
sudo ldconfig

# Install MC
cd ~
git clone git://github.com/chrishulbert/MegaComet.git
cd MegaComet

# Now do the kernel tuning as mentioned above

# To run the server:
cd ~/MegaComet
make
./start

# To run the clients:
cd ~/MegaComet/testing
make
./megatest X Y # (where X is a,b,c,d depending on which testing server this is)
# Also Y is the IP address of the comet server

Results

The clients got up to 142k, 144k, 105k, and 132k connections respectively before attempts to open new connections started timing out. This is a total of 523k connections, just over half a million! The RAM and CPU usage on the server was minimal throughout the test. Here’s a screenshot of top while the tests were running at approx 4000 new connections/second, to give an idea of CPU and memory usage:

top - 11:03:28 up  1:12,  2 users,  load average: 0.25, 0.58, 0.48
Tasks:  77 total,   2 running,  75 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.8%us,  2.7%sy,  0.0%ni, 95.1%id,  0.0%wa,  0.1%hi,  1.1%si,  0.2%st
Mem:   7652552k total,  1441076k used,  6211476k free,    22144k buffers
Swap:        0k total,        0k used,        0k free,   823848k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                                                        
22612 ec2-user  20   0  8664  340  264 S  0.0  0.0   0:00.00 megamanager                                                             
22614 ec2-user  20   0 25552  17m  460 S  4.7  0.2   0:08.22 megacomet                                                               
22615 ec2-user  20   0 25552  17m  460 S  5.0  0.2   0:08.08 megacomet                                                               
22616 ec2-user  20   0 25556  17m  460 S  5.0  0.2   0:08.16 megacomet                                                               
22617 ec2-user  20   0 25556  17m  460 S  5.0  0.2   0:08.28 megacomet                                                               
22618 ec2-user  20   0 25580  17m  460 R  5.0  0.2   0:08.01 megacomet                                                               
22619 ec2-user  20   0 25556  17m  460 S  5.0  0.2   0:08.38 megacomet                                                               
22620 ec2-user  20   0 25556  17m  460 S  4.7  0.2   0:08.23 megacomet                                                               
22621 ec2-user  20   0 25552  17m  460 S  4.7  0.2   0:08.16 megacomet

I forgot to grab a top screenshot when the connections were all opened, but the memory usage was no different, and CPU was zero.

Conclusions

I really can’t believe the CPU and RAM usage are so small when the 1/2M connections are live and idle! At this stage, I’m not really testing for performance when passing messages around. I hope to get to 1M (static) open connections, and then start testing messaging. I’m optimistic: it looks promising. Next time I try this, I’ll keep a close eye on the kernel log (/var/log/kern.log) and see if I can find any bottlenecks.

References

http://www.metabrew.com/article/a-million-user-comet-application-with-mochiwe...
http://www.cs.wisc.edu/condor/condorg/linux_scalability.html