Wednesday, August 30, 2006

The missing benchmarks

When purchasing computer hardware, unless you get your hands on the equipment before you buy, you have to rely on past experience, the experience of others, or benchmarks to see whether it has enough capacity for your users' needs.

There is a wealth of information you can use to judge server capacity, and plenty more for storage capacity. One area that is almost always overlooked is network devices. In the past I have seen many people blindly use the vendor's advertised specifications to determine their needs. Having Gig ports on a switch, and a vendor claiming the backplane is "HUGE", does not mean you can assume that the actual performance will be "wire" speed.
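To put a number on what "wire speed" actually demands, here is a rough back-of-the-envelope sketch in C. It assumes Ethernet with minimum-size 64-byte frames plus the 20 bytes of preamble and inter-frame gap that every frame costs on the wire; the function names are made up for illustration, and the 48-port switch in the usage note is hypothetical, not a figure for any vendor's product.

```c
#include <stdint.h>

/* Per-frame overhead on the wire for Ethernet: 8-byte preamble/SFD
 * plus a 12-byte inter-frame gap. */
#define ETH_WIRE_OVERHEAD 20u

/* Frames per second one port must handle to run at "wire speed".
 * link_bps:    link rate in bits per second (1000000000 for Gig)
 * frame_bytes: Ethernet frame size including FCS (64 is the minimum) */
uint64_t wire_speed_pps(uint64_t link_bps, uint32_t frame_bytes)
{
    uint64_t bits_per_frame = (uint64_t)(frame_bytes + ETH_WIRE_OVERHEAD) * 8u;
    return link_bps / bits_per_frame;
}

/* Aggregate forwarding rate a switch needs so that every port can
 * receive at wire speed at the same time. */
uint64_t switch_pps_needed(uint32_t ports, uint64_t link_bps, uint32_t frame_bytes)
{
    return (uint64_t)ports * wire_speed_pps(link_bps, frame_bytes);
}

/* Switching capacity in bits per second, counting both directions
 * of every full-duplex port, the way capacity figures are usually quoted. */
uint64_t switch_bps_needed(uint32_t ports, uint64_t link_bps)
{
    return (uint64_t)ports * link_bps * 2u;
}
```

For a hypothetical 48-port Gig switch this works out to about 1.49 million packets per second per port, roughly 71.4 million packets per second in total, and 96 Gb/s of switching capacity. If the forwarding rate a vendor will commit to is below that, the box cannot run every port at wire speed at once, however "HUGE" the backplane is.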

If you are in the market for network hardware at the moment, before you go too far you should take a sober look at Simon Bullen's blog on Cisco's 3750. I have always had my doubts about Cisco equipment, and about the reasons people choose it over other vendors. It would be nice if a somewhat neutral benchmark standards body like www.spec.org produced a set of network equipment benchmarks. So far my searching only finds vendors (or their proxies) benchmarking their competition, and of course whipping their ass. You would not expect less.

Independent results from people like Simon are fantastic, but it would be nice to see network vendors put their crown jewels on the benchmarking chopping block for all to see. The nice thing about this is that if they don't post results for certain equipment, you can suspect they may be selling you a brumby.

Thursday, August 17, 2006

PR Nightmare

Is there an outbreak of foot-in-mouth disease at IBM lately, or do they just have a drug problem? I know the Solar System is currently being redefined, but what planet are they on?

Their recent comments on Sun's open source credentials do make you wonder just what they are thinking. Thanks to their work on the Linux kernel, Apache, Eclipse etc., IBM have been able to enjoy the admiration of the open source community while at the same time keeping their own products very closed. Until now it has been very clever marketing. The problem they have now is that some commentators are starting to see through this, and are asking why IBM does not open source its software and hardware like Sun Microsystems has been doing.

The obvious reply, and one that would have kept the focus off IBM's closed products, would have been to acknowledge Sun's effort and talk about how long it takes to move a product the size of Solaris or AIX from multiple closed source licenses to an open source license. Instead they have decided to attack the OpenSolaris and OpenSPARC projects, stating that they are not truly open. This is just total bullshit!

Rather than hosing down the question and giving the commentators little to write about, they have handed the press a juicy story. This would be fine if they had a large swag of products based on open source. The fact is they do not, and the press have now been alerted to a renewed battle between IBM and Sun, with IBM standing on very shaky ground, surrounded by mountains of closed source software and hardware.

IBM's current statements are probably an indication that they think Sun's push to open source not only its entire software stack, but also its hardware, is a medium to long term threat to IBM's position. From what they are saying, it looks like at the moment they are scratching around for a solution. Verbal attacks on Sun are just plain counterproductive. I think the future may be tough for companies that dabble in open source while keeping a very closed product line. Hypocrisy does not further their cause.

What can IBM do to get themselves out of this mess? I think a good start would be to keep their mouths shut and begin work on a timetable for opening up their products. They state that their customers are not interested enough in AIX, but I find this very hard to believe. When Solaris was re-released onto x86 hardware, they stated that there was not enough interest from their customers to release software for the platform. Their stance changed very quickly when they got a tap on the shoulder from some of their largest customers. If they really are seeing little interest from their customers, then that is more of a worrying sign for the future of AIX. I hope this is not the case.

Sunday, August 13, 2006

Don't be afraid of mdb - cont

I thought I would quickly show why I did not use any "optimize" flags when I compiled my example code. At the same time I'll show off the er_src program, which is part of Sun Studio.

First I compile the program with the '-g' flag. This will give er_src access to the source code.
doug@bangkok> cc -g -o ./makecore ./makecore.c
Next, run the program and do a pstack on the core file. Everything looks fine.
doug@bangkok> ./makecore
Memory fault(coredump)
doug@bangkok> /bin/pstack core
core 'core' of 102246: ./makecore
08050775 fruitloop (8060958, 804710c, 80507bb, 8047210, 804712c, 80506ca) + 15
08050794 giveitatry (8047210, 804712c, 80506ca, 1, 8047138, 8047140) + 14
080507bb main (1, 8047138, 8047140) + b
080506ca _start (1, 8047278, 0, 8047283, 8047290, 80472ca) + 7a
OK, let's have a look at what er_src does.
doug@bangkok> /opt/SUNWspro/bin/er_src -func ./makecore

Functions sorted in lexicographic order

Load Object:

Address Size Name

0x000005e0 16 @plt
0x000005f0 16 __fpstart
0x000006dc 123 __fsr
0x00000620 16 _exit
0x000007ec 27 _fini
0x00000640 16 _get_exit_frame_monitor
0x000007d0 27 _init
0x00000650 139 _start
0x00000610 16 atexit
0x00000600 16 exit
0x00000630 16 printf
0x00000760 30 fruitloop
0x00000780 38 giveitatry
0x000007b0 29 main

doug@bangkok> /opt/SUNWspro/bin/er_src -disasm fruitloop ./makecore
Annotated disassembly
---------------------------------------
Source file: ./makecore.c
Object file: ./makecore
Load Object: ./makecore
1. #include
2. #include
3.
4. static void
5. fruitloop(){

[ 5] 8050760: pushl %ebp
[ 5] 8050761: movl %esp,%ebp
[ 5] 8050763: subl $4,%esp
6. char *p;
7. p=(char *)NULL;
[ 7] 8050766: movl $0,-4(%ebp)
8. *p='c';
[ 8] 805076d: movl $0x63,%eax
[ 8] 8050772: movl -4(%ebp),%edx
[ 8] 8050775: movb %al,0(%edx)
[ 8] 8050778: jmp .+4 [ 0x805077c ]
[ 8] 805077a: nop
[ 8] 805077b: nop
9. }
[ 9] 805077c: leave
[ 9] 805077d: ret
10.
11. static void
12. giveitatry(){

[12] 8050780: pushl %ebp
[12] 8050781: movl %esp,%ebp
[12] 8050783: subl $4,%esp
13. char *msg="Ahh we made it!\n";
[13] 8050786: leal 0x8060958,%eax
[13] 805078c: movl %eax,-4(%ebp)
14.
15. fruitloop();
[15] 805078f: call fruitloop [ 0x8050760, .-0x2f ]
16. (void)printf(msg);
[16] 8050794: movl -4(%ebp),%eax
[16] 8050797: pushl %eax
[16] 8050798: call printf [ 0x8050630, .-0x168 ]
[16] 805079d: addl $4,%esp
[16] 80507a0: jmp .+4 [ 0x80507a4 ]
[16] 80507a2: nop
[16] 80507a3: nop
17. }
[17] 80507a4: leave
[17] 80507a5: ret
18.
19. int
20. main(int argc, char **argv){

[20] 80507b0: pushl %ebp
[20] 80507b1: movl %esp,%ebp
[20] 80507b3: subl $4,%esp
21. giveitatry();
[21] 80507b6: call giveitatry [ 0x8050780, .-0x36 ]
22. return(0);
[22] 80507bb: movl $0,-4(%ebp)
[22] 80507c2: jmp .+6 [ 0x80507c8 ]
[22] 80507c4: jmp .+4 [ 0x80507c8 ]
[22] 80507c6: nop
[22] 80507c7: nop
23. }
[23] 80507c8: movl -4(%ebp),%eax
[23] 80507cb: leave
[23] 80507cc: ret
Now, that is really nice. The C program is listed alongside the assembly code generated for each line, which makes a very nice assembly language tutorial. Now let's do the same, but compile using the "-fast" flag. (-fast is actually a macro for several other flags. It generally gives the best optimized code for your system with the least effort.)
doug@bangkok> cc -fast -g -o ./makecore ./makecore.c
doug@bangkok> ./makecore
Memory fault(coredump)
doug@bangkok> /bin/pstack core
core 'core' of 102283: ./makecore
08050850 main (1, 8047278, 0, 8047283, 8047290, 80472ca)
Hmmm, this time it stopped in function main. Let's look at what -fast did to the code.
doug@bangkok> /opt/SUNWspro/bin/er_src -disasm fruitloop ./makecore
Annotated disassembly
---------------------------------------
Source file: ./makecore.c
Object file: ./makecore
Load Object: ./makecore
1. #include
2. #include
3.
4. static void
5. fruitloop(){
6. char *p;
7. p=(char *)NULL;
8. *p='c';

[ 8] 8050820: movb $0x63,0
9. }
[ 9] 8050827: ret

[ 8] 8050830: movb $0x63,0
10.
11. static void
12. giveitatry(){
13. char *msg="Ahh we made it!\n";
14.

Function fruitloop inlined from source file ./makecore.c into the code for the following line. 0 loops inlined
15. fruitloop();
16. (void)printf(msg);
[16] 8050837: subl $8,%esp
[16] 805083a: pushl $0x80609f4
[16] 805083f: call printf [ 0x8050630, .-0x20f ]
[16] 8050844: addl $0xc,%esp
17. }
[17] 8050847: ret

[ 8] 8050850: movb $0x63,0
[16] 8050857: subl $8,%esp
[16] 805085a: pushl $0x80609f4
[16] 805085f: call printf [ 0x8050630, .-0x22f ]
[16] 8050864: addl $0xc,%esp
18.
19. int
20. main(int argc, char **argv){

Function giveitatry inlined from source file ./makecore.c into the code for the following line. 0 loops inlined
Function fruitloop inlined from source file ./makecore.c into inline copy of function giveitatry. 0 loops inlined
21. giveitatry();
22. return(0);
[22] 8050867: xorl %eax,%eax
[22] 8050869: ret
23. }
As you can read from the comments, both functions were inlined, so they are now part of the 'main' function. The 'er_src' program is a really neat app. Let's see how the comment changes when we tell the compiler not to inline fruitloop.
doug@bangkok> cc -fast -g -xinline=no%fruitloop -o ./makecore ./makecore.c
doug@bangkok> /opt/SUNWspro/bin/er_src -disasm fruitloop ./makecore
Annotated disassembly
---------------------------------------
Source file: ./makecore.c
Object file: ./makecore
Load Object: ./makecore
.
.
5. fruitloop(){
6. char *p;
7. p=(char *)NULL;
8. *p='c';

[ 8] 8050820: movb $0x63,0
9. }
[ 9] 8050827: ret
10.
11. static void
12. giveitatry(){
13. char *msg="Ahh we made it!\n";
14.

Function fruitloop not inlined because user explicitly requested that it not be inlined
If you are playing around with optimizing code, then er_src is one tool you should use.

Have Fun!

Don't be afraid of mdb

Many Solaris system admins and developers would know that Solaris has some very good debugging tools. Most sysadmins would know there is a command called mdb. Sadly, most have either never used it, or were scared off when they scanned through the documentation. While using mdb does require a good knowledge of Solaris internals and some assembly language skills, there are times when it is probably the only (or best) tool for the job.

Consider the case where you have an application that your company has been using for a long time. Something has changed on the system, and now it crashes when it is run. The person who wrote the application no longer works for your company and nobody knows where the source code is, so you have a problem. To make things worse, when you do a pstack on the core file, you find that they have “stripped” the binary of its symbol table to save a few bytes. Your options for doing any useful debugging are now really limited. Enter 'mdb'....

To simulate this I have created a small C program with a null pointer buried in a couple of functions. I compile the program (without any optimizations, since the compiler would otherwise inline all of the functions because they are so small), and then run the strip command on it. Running the program gives us a not very useful error message and a core dump. Argggg!

Because the binary was stripped, running pstack on the core file returns addresses with “????????” as the function name. Ah, it has now turned into a challenge.
doug@bangkok> /bin/pstack core
core 'core' of 101996: ./makecore
080506d5 ???????? (8060898, 80470fc, 805071b, 804720c, 8047124, 805062a)
080506f4 ???????? (804720c, 8047124, 805062a, 1, 8047130, 8047138)
0805071b main (1, 8047130, 8047138) + b
0805062a _start (1, 8047274, 0, 804727f, 804728c, 80472c6) + 7a
You can get similar output from the “::stack” command within mdb.
doug@bangkok> mdb core
Loading modules: [ libc.so.1 ld.so.1 ]
> ::stack
0x80506d5(8060898, 80470fc, 805071b, 804720c, 8047124, 805062a)
(804720c, 8047124, 805062a, 1, 8047130, 8047138)
main+0xb(1, 8047130, 8047138)
_start+0x7a(1, 8047274, 0, 804727f, 804728c, 80472c6)
Since there is nothing in human readable form, at this point most people would look elsewhere or throw it in the too-hard basket. If you know a little assembly language (32-bit x86 in this case), you should probably continue on. A good starting point is the assembly listing of the function where it bombs out. The first address, “0x80506d5”, is the instruction where we bombed out. Disassembling backwards from this address is tedious, especially if the instruction is a long way from the beginning of the function. The address on the next line, “0x80506f4”, is actually more useful. It is the return address of the function, which should be the next instruction after the function call, so the calling code should be immediately before it. Let's attack it byte by byte with the disassembler built into 'mdb'.
> 80506f4::dis
0x80506f4: movl -0x4(%ebp),%eax
> 80506f3::dis
0x80506f3: decl 0xe850fc45(%ebx)
> 80506f2::dis
0x80506f2: ***ERROR--unknown op code***
> 80506f1::dis
0x80506f1: ***ERROR--unknown op code***
> 80506f0::dis
0x80506f0: int $0x3
> 80506ef::dis
0x80506ef: call -0x34 <0x80506c0>
Bingo! We have a winner: 0x80506c0. You will probably notice that the "call" op-code (1 byte) was followed by a 4-byte displacement, so we could have first tried the address minus 5. In my case the command inside mdb would have been “80506f4-5::dis”.
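The "address minus 5" trick can be written down as a small C helper, which also shows why mdb printed 0x80506c0 as the target. This is a sketch assuming a 32-bit x86 near relative call (opcode 0xE8 followed by a signed 32-bit little-endian displacement counted from the return address); indirect calls through a register or memory are encoded differently and are not handled. The function name is hypothetical, and the byte values in the usage note are inferred from the mdb output above (0xCC shows up as int $0x3 at 0x80506f0, 0xFF as the start of the decl at 0x80506f3), not dumped from the core.

```c
#include <stdint.h>

/* A near relative call on 32-bit x86 is 5 bytes: opcode 0xE8 followed
 * by a signed 32-bit little-endian displacement, measured from the
 * address of the NEXT instruction, i.e. the return address. */
#define CALL_REL32_LEN 5

/* Given the return address found on the stack and the 5 bytes just
 * before it (read from memory at ret_addr - 5), recover the address of
 * the called function. Returns 0 if those bytes are not an E8 call. */
uint32_t call_target_from_return(uint32_t ret_addr,
                                 const unsigned char bytes[CALL_REL32_LEN])
{
    if (bytes[0] != 0xE8)
        return 0;

    /* Assemble the little-endian displacement. */
    uint32_t disp = (uint32_t)bytes[1]
                  | ((uint32_t)bytes[2] << 8)
                  | ((uint32_t)bytes[3] << 16)
                  | ((uint32_t)bytes[4] << 24);

    /* Unsigned addition wraps modulo 2^32, which gives the same result
     * as adding the displacement interpreted as a signed value. */
    return ret_addr + disp;
}
```

With the return address 0x80506f4 and the bytes E8 CC FF FF FF (a displacement of -0x34), the helper returns 0x80506c0, the same answer mdb gave.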

Now that we have an address, we can easily list the function from the start.
> 80506c0::dis
0x80506c0: pushl %ebp
0x80506c1: movl %esp,%ebp
0x80506c3: subl $0x4,%esp
0x80506c6: movl $0x0,-0x4(%ebp)
0x80506cd: movl $0x63,%eax
0x80506d2: movl -0x4(%ebp),%edx
0x80506d5: movb %al,0x0(%edx)
0x80506d8: leave
0x80506d9: ret
0x80506da: nop
We can even add the function to mdb's user-defined symbol table, so that we see symbolic names rather than hex addresses. The rough comments were added by me :)
> 80506c0::nmadd -f -e 80506da badfunc
added badfunc, value=80506c0 size=1a
> badfunc::dis
badfunc: pushl %ebp ; save frame pointer to the stack
badfunc+1: movl %esp,%ebp ; copy stack pointer to frame pointer
badfunc+3: subl $0x4,%esp ; make room for the pointer - char *p
badfunc+6: movl $0x0,-0x4(%ebp) ; initialize pointer to null - p=(char*)NULL;
badfunc+0xd: movl $0x63,%eax ; copy 'c' to %eax register
badfunc+0x12: movl -0x4(%ebp),%edx ; copy pointer to register %edx - now = 0
badfunc+0x15: movb %al,0x0(%edx) ; *p = 'c' - Hmmm copy 'c' to address 0 - BAD!!!
badfunc+0x18: leave ; cleanup function call
badfunc+0x19: ret ; return to calling function
> ::stack
badfunc+0x15(8060898, 80470fc, 805071b, 804720c, 8047124, 805062a)
0x80506f4(804720c, 8047124, 805062a, 1, 8047130, 8047138)
main+0xb(1, 8047130, 8047138)
_start+0x7a(1, 8047274, 0, 804727f, 804728c, 80472c6)
> 80506f4-5::dis
0x80506ef: call -0x34
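To make it clearer what ::nmadd buys us, here is a rough C sketch of address-to-symbol lookup. It is an illustration under assumptions, not how mdb is actually implemented: the struct sym table and the symbolize() function are hypothetical names. The size 0x1a is simply the -e end address 80506da minus the start 80506c0, which matches the "size=1a" mdb reported.

```c
#include <stdint.h>
#include <stdio.h>

/* One entry in a (hypothetical) user-defined symbol table, mirroring
 * what ::nmadd records: a name, a start address and a size in bytes. */
struct sym {
    const char *name;
    uint32_t    value;
    uint32_t    size;
};

/* Format addr as "name+0xoff" if it falls inside a known symbol, or as
 * a bare hex address otherwise, roughly the way ::stack prints frames.
 * Writes into buf (of length len) and returns buf. */
char *symbolize(const struct sym *tab, size_t n, uint32_t addr,
                char *buf, size_t len)
{
    for (size_t i = 0; i < n; i++) {
        uint32_t off = addr - tab[i].value;
        if (addr >= tab[i].value && off < tab[i].size) {
            if (off == 0)
                snprintf(buf, len, "%s", tab[i].name);
            else
                snprintf(buf, len, "%s+0x%x", tab[i].name, (unsigned)off);
            return buf;
        }
    }
    snprintf(buf, len, "0x%x", (unsigned)addr);
    return buf;
}
```

With a single entry { "badfunc", 0x80506c0, 0x80506da - 0x80506c0 }, looking up 0x80506d5 gives "badfunc+0x15", the frame ::stack printed once the symbol was added, while the still unnamed caller at 0x80506f4 comes back as a bare address.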
From a quick look at my disassembled code, it is clear that some idiot created a null pointer and then tried to copy a byte to it. Not very bright, eh! In a real world example you would probably need to run the program in mdb and set a breakpoint at the start of the function. From there you could step through the code to see what it does. It would go something like this:
doug@bangkok> mdb ./makecore
> 80506c0::nmadd -f -e 80506da badfunc ; Add our own symbol from above
added badfunc, value=80506c0 size=1a
> badfunc:b ; Set a breakpoint at the beginning of badfunc
> :r ; run ./makecore in the debugger
mdb: stop at badfunc
mdb: target stopped at:
badfunc: pushl %ebp
> :s ; Step through code, 1 step at a time
mdb: target stopped at:
badfunc+1: movl %esp,%ebp
> :s
mdb: target stopped at:
badfunc+3: subl $0x4,%esp
> :s
mdb: target stopped at:
badfunc+6: movl $0x0,-0x4(%ebp)
> :s
mdb: target stopped at:
badfunc+0xd: movl $0x63,%eax
> ::regs ; Check the registers - Hmmm. %edx = 0 (p itself is only loaded into %edx at badfunc+0x12)
%cs = 0x003b %eax = 0x08060898
%ds = 0x0043 %ebx = 0xfeffa7c0
%ss = 0x0043 %ecx = 0xfefa9768 libc.so.1`_sse_hw
%es = 0x0043 %edx = 0x00000000
%fs = 0x0000 %esi = 0x080470e0
%gs = 0x01c3 %edi = 0x08047204

%eip = 0x080506cd badfunc+0xd
%ebp = 0x080470e4
%kesp = 0x00000000

%eflags = 0x00000202
id=0 vip=0 vif=0 ac=0 vm=0 rf=0 nt=0 iopl=0x0
status=

%esp = 0x080470e0
%trapno = 0x1
%err = 0x0
For the best reference on the guts of Solaris, and on how to make the best use of mdb and other Solaris tools such as DTrace, go and purchase the just released 2nd edition of Solaris Internals and its new companion, Solaris Performance and Tools. You can save 30% by buying them through Sun. While you are stacking your bookshelf, you should also consider Solaris Systems Programming and Sun Performance and Tuning. Some light reading :)

Thursday, August 10, 2006

Filesystem Benchmarks

If you have been reading the zfs-discuss list on OpenSolaris.org recently, you will have seen that Robert Milkowski has been doing some benchmarks using Sun's StorageTek 3510 FC disk arrays. He has been getting some interesting results that suggest using ZFS with the 3510 without its hardware RAID controllers is faster than using it with them. This is very interesting because hardware RAID controllers can be expensive. If it suits your needs, you can save some cash by using Solaris 10 and ZFS, as both are free!

Since I don't have a 3510 sitting around to test on, I decided to do a quick benchmark on a spare partition of my laptop to compare ZFS and UFS. We have all been told that ZFS is faster than UFS, but by how much, and when, is an interesting question.

Using filebench as Robert did, I started with the varmail workload, averaging three runs to produce the graph below. For each filesystem configuration I created the pool (ZFS) and filesystem, did the three benchmark runs of 60 seconds each, and then destroyed it before the next benchmark test.
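Averaging the runs and answering "by how much" is simple arithmetic. This hypothetical pair of C helpers is one way to do it, assuming each run reports a single throughput figure such as filebench's operations per second; it is not the method used to produce the graph, and no results from these runs are reproduced here.

```c
#include <stddef.h>

/* Mean of n benchmark results (e.g. ops/s from each filebench run).
 * Returns 0.0 for an empty set. */
double mean(const double *runs, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += runs[i];
    return n ? sum / (double)n : 0.0;
}

/* How much faster a is than b, as a percentage of b:
 * 100 * (a - b) / b. Positive means a is faster; b must be non-zero. */
double percent_faster(double a, double b)
{
    return 100.0 * (a - b) / b;
}
```

Feeding the three ZFS results and the three UFS results through mean() and then percent_faster(zfs_mean, ufs_mean) gives a single "ZFS is X% faster" figure per configuration, which is easier to compare than three raw numbers each.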

For ZFS this was

root> zpool create -fm /none benchpool /dev/dsk/c0d0s4
root> zfs create benchpool/mnt
root> zfs set mountpoint=/mnt benchpool/mnt
root> # Set any ZFS options for the test, e.g. zfs set atime=off benchpool/mnt
root> /opt/filebench/bin/filebench
filebench> load varmail
filebench> set $dir=/mnt/zfstest
filebench> run 60
root> zfs destroy benchpool/mnt
root> zpool destroy benchpool

For UFS -

root> newfs /dev/dsk/c0d0s4
root> mount -o noatime -F ufs /dev/dsk/c0d0s4 /mnt
root> # -o noatime is the option for this test
root> /opt/filebench/bin/filebench
filebench> load varmail
filebench> set $dir=/mnt/zfstest
filebench> run 60
root> umount /mnt

As you can see, ZFS is indeed faster than UFS for this benchmark. To be fair, and to compare apples with apples, I should have combined UFS with the Solaris Volume Manager (SVM). That would most likely have shown a greater gap between ZFS and UFS. One thing it does show is that an Acer Ferrari 4005 may be a nice laptop, but it makes a horrible mail server :(