Channel: SCN : Blog List - SAP Adaptive Server Enterprise (SAP ASE) for Custom Applications

Upcoming Webcast: Virtualizing SAP Sybase ASE – Tips, Tricks, and Best Practices - Tuesday February 12th


Are you looking to virtualize Sybase products – specifically SAP Sybase ASE? Do you want to get better performance from your virtualized environments?

 

Join Chris Brown, SAP Database and Technologies Group, for an in-depth discussion on virtualizing SAP Sybase ASE, including a sneak preview of upcoming plans for VCE products. In this session, we’ll discuss:

 

  • How to configure VMware vSphere, VMware vFabric Data Director, and VCE products with SAP Sybase ASE to achieve optimal performance
  • Comparative benchmark results from SAP Sybase ASE in a virtualized environment, versus running on bare metal
  • VMware vSphere vMotion, VMware vSphere High Availability (HA), live failovers, and migrations of running guests

 

When: Tuesday, February 12th, 2013 at 11am (US Mountain Time)

 

Click here to register.


ASE 15.7: THREADED VS. PROCESS KERNEL MODE.


So, the threaded or the process?  Which one would you select on your next migration?

 

There is a clear recommendation in the Sybase documentation to use the threaded kernel mode.  This recommendation, though, is precisely the opposite of what Sybase ASE DBAs would have naturally selected.  I have spoken to several customers that are considering a move to 15.7.  Most feel more comfortable moving to 15.7 running in process kernel mode.  Little wonder:  we are used to ASE running this way.  For those who struggled with the new optimizer when upgrading from 12.x to 15.x, facing yet another revolution in the ASE code looks formidable.

 

And yet, setting worries aside, perhaps there are good reasons we are being pushed into the new kernel?  I am not talking about the improved way the threaded kernel mode treats IO requests (which is publicized by the official documentation) or the more steady query performance.  I am talking about things less straightforward.  Isn't the process mode a sort of "compatibility mode" for the new 15.7 kernel - similar to the "compatibility mode" available for the new 15.x optimizer?  The official "compatibility mode" was never really recommended for use.  It was an option to choose only if there was absolutely no way to let the new optimizer do its work (incidentally, it is rather curious why this mode should have been thought of at all:  if the new optimizer makes more intelligent decisions, why preserve something less optimal in the same code?).  What about the process mode kernel?  Is this option really an alternative to the "default" threaded kernel mode (and if it is, say for certain HW platforms, why is it not clearly stated as such)?
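For reference, switching between the two modes is a single configuration change; "kernel mode" is a static parameter, so it takes effect only after a restart.  A minimal sketch (assuming the documented parameter name and the @@kernelmode global):

-- check which mode the server is currently running in
select @@kernelmode
go

-- switch to process mode (static - takes effect after the next restart)
sp_configure "kernel mode", 0, "process"
go

-- and back to the default threaded mode
sp_configure "kernel mode", 0, "threaded"
go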

 

I have set up a test to check.  In fact, this is not a vain intellectual foible.  I am facing a real migration project in the very near future which will involve moving to 15.7 and which will also involve making a tested decision as to which kernel mode to choose.  If I heed the recommendations - I should choose the threaded mode.  If I heed the DBA vox populi - I would choose the process mode (the threaded mode code is quite new, with less than 5 years of real customer experience - how many real issues have been found and fixed, if at all?).

 

I have the following settings for the test.  ASE 15.7 ESD#2 (I'll move on to #3 & #4 later on), running in a Solaris x64 VM environment (not very optimal, but since I compare rather than analyze, it is not so bad).  Really small ASEs:  ~2GB RAM, no separate cache for tempdb, relatively small procedure cache and statement cache, 3 engines or threads out of 4 VM cores available (2 physical chips).  Something that may be set up and tested by anyone.  I am generating load using very simple Java code run from the same host.  The client code selects a couple of rows from syscomments - each time dynamically generating a unique select statement.  I deliberately submit these requests as prepared statements.  "Deliberately" because the Java community loves prepared statements and will continue to use them indiscriminately, whether this is a waste of resources or not.  "Deliberately," also, because I know this code has a strong potential to destabilize ASE 15.x.  What I want to check is whether the 15.7 version is more stable for this type of "bad" client code than its previous 15.x variations.  I also want to check which kernel mode performs better - if there is a difference at all.  At last, I want to check which of the ASE/JDBC configuration parameters should be used or avoided for this type of code.
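For anyone who wants to rebuild a similarly small sandbox, the ASE side of the setup boils down to a handful of sp_configure calls.  A rough sketch - the sizes below are illustrative placeholders, not the exact values of my test server:

-- ~2GB of total server memory (value is in 2K pages)
sp_configure "max memory", 1048576
go
-- deliberately small procedure and statement caches
sp_configure "procedure cache size", 51200      -- ~100MB
go
sp_configure "statement cache size", 10240      -- ~20MB, 0 to disable
go
-- 3 engines for the process mode kernel...
sp_configure "number of engines at startup", 3
go
-- ...or 3 threads in the default pool for the threaded mode kernel
alter thread pool syb_default_pool with thread count = 3
go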

 

Below are the configuration settings (and times) which are behind the graphs that I cite throughout this post.

 

For the threaded mode kernel:

 

"14:56"S157T_2LWP_DYN1_ST0_STR0_PLA0_ST0M
"14:58"S157T_2LWP_DYN1_ST0_STR1_PLA0_ST0M
"15:01"S157T_2LWP_DYN1_ST0_STR1_PLA1_ST0M
"15:03"S157T_2LWP_DYN0_ST0_STR0_PLA0_ST0M
"15:06"S157T_2LWP_DYN0_ST0_STR1_PLA0_ST0M
"15:08"S157T_2LWP_DYN0_ST0_STR1_PLA1_ST0M
"15:12"S157T_2LWP_DYN1_ST1_STR0_PLA0_ST20M
"15:14"S157T_2LWP_DYN1_ST1_STR1_PLA0_ST20M
"15:17"S157T_2LWP_DYN1_ST1_STR1_PLA1_ST20M
"15:19"S157T_2LWP_DYN0_ST1_STR0_PLA0_ST20M
"15:21"S157T_2LWP_DYN0_ST1_STR1_PLA0_ST20M
"15:24"S157T_2LWP_DYN0_ST1_STR1_PLA1_ST20M

 

For the process mode:

 

"12:41"S157P_2LWP_DYN1_ST0_STR0_PLA0_ST0M
"12:44"S157P_2LWP_DYN1_ST0_STR1_PLA0_ST0M
"12:47"S157P_2LWP_DYN1_ST0_STR1_PLA1_ST0M
"12:51"S157P_2LWP_DYN0_ST0_STR0_PLA0_ST0M
"12:54"S157P_2LWP_DYN0_ST0_STR1_PLA0_ST0M
"12:56"S157P_2LWP_DYN0_ST0_STR1_PLA1_ST0M
"13:02"S157P_2LWP_DYN1_ST1_STR0_PLA0_ST20M
"13:05"S157P_2LWP_DYN1_ST1_STR1_PLA0_ST20M
"13:07"S157P_2LWP_DYN1_ST1_STR1_PLA1_ST20M
"13:10"S157P_2LWP_DYN0_ST1_STR0_PLA0_ST20M
"13:13"S157P_2LWP_DYN0_ST1_STR1_PLA0_ST20M
"13:15"S157P_2LWP_DYN0_ST1_STR1_PLA1_ST20M

 

A bit cryptic, but what the table above specifies is that we have 2 LWP-generating processes, running with the JDBC setting of DYNAMIC_PREPARE set to true or false {DYN1/DYN0}, with the statement cache either turned on or off at the connection level {ST1/ST0}, with the streamlined SQL option set on or off {STR1/STR0}, with the plan sharing option turned on or off {PLA1/PLA0} and with the statement cache turned off (0 MB) or configured at 20MB {ST0M/ST20M}.  S157P is Sybase 15.7 process mode.  S157T is Sybase 15.7 threaded mode.
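For clarity, the server-side flags in this legend map onto ordinary configuration options, while DYNx is the jConnect DYNAMIC_PREPARE connection property and STx is a session-level set command issued by the client.  A sketch of the ASE-side switches, with the values shown for the "on" case:

-- STR1 / STR0: streamlined dynamic SQL on or off
sp_configure "streamlined dynamic SQL", 1
go
-- PLA1 / PLA0: plan sharing on or off
sp_configure "enable plan sharing", 1
go
-- ST20M / ST0M: statement cache sized at 20MB (10240 2K pages) or at 0
sp_configure "statement cache size", 10240
go
-- ST1 / ST0: per-connection statement cache usage, issued from the client session
set statement_cache on
go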

 

Again, these are preliminary tests.  I will be performing similar tests on a large SPARC server tomorrow, so more data and more insights will be available with time.  Insights may change.  For now I post only things discovered in this simplistic, home-grown "lab".

 

So here are the numbers:  we are running an endless loop creating unique select statements from the Java client.  We always submit them as prepared statements, but we change settings:  first we run with the statement cache turned off and the "functionality group" parameters turned off.  We turn each of these on and off, then do the same while submitting the statements with DYNAMIC_PREPARE set to false.  Then we run the same tests again but with the statement cache configured at 20MB.

 

CPU load:

 

Process mode:

CPULoad157P

Threaded mode:

CPULoad157T

 

I don't really know what the throughput is here, but I know that the same code generates a slightly greater load on ASE running the threaded kernel (either because it has a greater throughput or because it has a greater weight).  The average for the process mode stands at 28% CPU, while the threaded mode gets to 32%.  The same is true for peaks:  35% vs. 42%.  What was the throughput?  We will compare TPS and procedure requests per second (actually, both TPS and PPS are the result of compiling the prepared statements we send to the ASE into LWPs - we don't have any DML statements in the code and we do not execute procedures either - so this is ASE's way to handle our code).

 

TPS in process mode:

TPS157P

TPS in threaded mode:

TPS157T

 

TPS is more or less the same.  What about procedure requests per second?

 

Process:

ProcRequests157P

Threaded:

ProcRequests157T

 

Actually, the process mode has generated 200 more procedure requests per second.  This is the speed at which the code is executed (the gap in the graph is when both the statement cache and the DYNAMIC_PREPARE setting are turned off).

 

In general, the behavior is the same but there are certain unexpected differences.  From right to left:  we start with the statement cache turned off (and set statement_cache off in the code).  We get ~700 SPS (SP executions per second).  We turn the streamlined option on - the throughput doubles to ~1400 SPS; plan sharing seems to add a bit more.  We turn DYNAMIC_PREPARE off - no LWPs.  Then we do the same with the statement cache turned on on ASE (and set statement_cache on in the code).  The behavior is similar, except that when we set DYNAMIC_PREPARE to off we again generate LWPs at a high rate (incidentally, CPU fares worst in both cases when the prepared statements generated by the code meet the DYNAMIC_PREPARE = false JDBC setting).  And with the statement cache set to zero the CPU load is slightly lower for this type of code.

 

The statement cache data (statements cached per second from sysmon & number of statements in the cache from the monStatementCache).

 

Process mode:

STMT_CACHED_157P

NumSTMT157P

Threaded mode:

STMT_CACHED_157T

NumSTMT157T

 

More or less similar, with some anomaly.
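The counters behind these graphs come from sp_sysmon and the monStatementCache MDA table; sampling them yourself is straightforward, provided monitoring and statement cache monitoring are enabled.  A minimal sketch:

-- prerequisites for the MDA view
sp_configure "enable monitoring", 1
go
sp_configure "enable stmt cache monitoring", 1
go

-- one-minute sp_sysmon sample; the statements-cached-per-second counters
-- appear in its statement cache section
sp_sysmon "00:01:00"
go

-- point-in-time view of what the statement cache currently holds
select * from master..monStatementCache
go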

 

So far, it seems there is no great difference.  Moreover, the process mode fares a bit better.

 

However, this is what we see when we monitor the procedure cache usage with sp_monitorconfig.

 

Process mode:

ProcMonitor157P

Threaded mode:

ProcMonitor157T

 

Oops.  There seems to be a problem here.  With the ASE running in the process kernel mode, the procedure cache gets exhausted as soon as we run prepared statements and turn DYNAMIC_PREPARE off for our JDBC client.  The only way to reclaim space in the procedure cache for the process mode kernel is.... to bounce the server.  Urgh.  Pretty bad.  The procedure cache in the process kernel mode is very sensitive and seems to be wasted on something.
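The graphs above come from repeatedly sampling sp_monitorconfig; if you want to watch the same exhaustion happen on your own server, something along these lines is enough:

-- active vs. maximum-used procedure cache (run periodically during the load)
exec sp_monitorconfig "procedure cache size"
go

-- or one sweep over every configurable resource at once
exec sp_monitorconfig "all"
go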

 

There is also a difference with compiled plan treatment in the two modes:

 

Process mode (query plans spinlock contention):

SpinQPLAN157P

Threaded kernel:

SpinQPLAN157T

 

In the threaded mode this spinlock contention arises only when the statement cache is configured and the statements are submitted with DYNAMIC_PREPARE set to true.  In process mode it climbs all the way up to 20%.

 

Process mode - statements recompiled due to plans being flushed out of the cache:

 

NumSTMTRecomp157P

Threaded mode:

NumSTMTRecomp157T

 

For some reason, statements are recompiled almost twice as frequently in the process kernel mode.

 

To add a note of optimism here:  procedure manager spinlock contention.  This was the nasty spinlock that wrought real havoc in previous 15.x versions (and brought at least one upgrade attempt down).  It is now handled pretty well in 15.7.

 

Process mode:

 

SpinPMGR157P

Threaded mode:

SpinPMGR157T

 

But this, too, seems to be handled a bit better in the threaded kernel mode than in the process kernel mode.

 

So, what have we learned from this simple test, if anything?  The process kernel mode seems to give more steady throughput for this kind of code:  lower CPU load, higher proc/sec ratio.  At the same time, in process kernel mode the procedure cache functions very badly:  sp_monitorconfig either does not report correct data or reports the procedure cache as exhausted if we run prepared statements and set JDBC to DYNAMIC_PREPARE = FALSE.  Moreover, there is no way to clean up the mysteriously wasted procedure cache.  Even when all the activity in the server is halted and the procedure cache is cleaned up using the available dbcc commands, it remains 99% full.  I did not test the same issues with later ESDs, but I have seen that on the SPARC platform too, process mode seems to yield more steady throughput, but often at the price of running into issues that the threaded kernel does not have (I hit stack traces on adding engines online under heavy stress, the procedure cache reported zero utilization by sp_monitorconfig; in threaded kernel mode resource memory was dynamically adjusted by ASE at startup, while in process kernel mode ASE failed to online engines if the kernel resource memory was missing, etc.).

 

So process or threaded kernel, which one would you choose?

 

Tomorrow I will be running similar tests on 15.7 ESD#3 running on SPARC.  I will test to see if similar issues arise there.

 

To be continued....

 

ATM.

ASE 15.7: ESD#4 UPDATES….


ESD#4 has been around for some time.  I did not start testing the conditions I am pushing the ASE into with this EBF for several reasons.  But now it seems to really be a pity I haven’t done so from the start.  After making a few tests with ESD#2 on Solaris x64, and after making similar tests with ESD#3 on Solaris SPARC, I decided to move on.  The reason for this is that I was not really satisfied with what I saw.  True, I am pushing the ASE into an area which is very uncomfortable for it (simulating a situation which a highly respected Sybase engineer has called running “really bad code”).  But this is a real-client situation and I must know how my ASE will handle it.  It is really naive to think that ASE should only have to handle properly formed workloads written by developer teams sensitive to the way the DB operates.  Today more and more code is written which cares very little for DB needs.  Either we face it or the customers move to DBMS systems that cope with “bad code” better than ASE…

So, on the one hand, I was not satisfied with the results of my tests – especially with the high contention on the various spinlocks guarding the procedure / statement cache.  On the other hand, ESD#4 (and the weird fish ESD#3.1, which contains only a small portion of the fixes from ESD#4, but came later – a couple of days ago in fact) claims to have worked on the procedure cache spinlock contention a little more.  Since this is what I was after, I switched direction a bit.

Unfortunately, I could not spend much time on the current tests, and I will not be able to spend any more time on them in the days to come (customer calls).  I did manage, though, to do some initial tests which I would like to share.  Especially since there is a parallel discussion in the ISUG SIG area (now restricted to paid members only) which mentions statement cache issues.  Lovely discussion, but it really deals with how to tune the ASE to handle “good code” without waste of resources and how to monitor the waste.  It still avoids the situation when the ASE is bombarded with “bad code.”

An aside installation note on ESD#4.  I had to truss the installation process since it kept hanging.  As it turned out, the space requirement which the installation checks does not take into account available swap space.  The installer uses /tmp quite freely, and my 800MB of /tmp was not enough for the installation of this ESD.  Had the installation informed me of this requirement in time, it might have saved me some time and pain.  I hope the good code that manages the installation will be made even better in the future…

So here are the settings:

ESD#2          09:30  09:35  09:40  09:43  09:45  09:50  09:55  10:00  10:03
STREAM             0      1      1      1      0      1      0      1      1
PLAN SH            0      0      1      1      0      0      0      0      1
STCACHE(J)         0      0      0      0      1      1      1      1      1
DYNPREP(J)         1      1      1      1      1      1      1      1      1
STCACHE(MB)        0      0      0     20     20     20     20     20     20

ESD#2          10:06  10:12  10:15  10:20  10:23  10:25  10:28  10:30
STREAM             1      1      0      0      1      1      1      1
PLAN SH            1      0      0      0      0      1      1      1
STCACHE(J)         1      1      1      1      1      1      1      1
DYNPREP(J)         1      1      1      0      1      0      0      0
STCACHE(MB)      100    100    100    100    100    100     20    100

ESD#4          12:56  12:58  12:59  13:01  13:02  13:03  13:04  13:06  13:07
STREAM             0      1      1      1      0      1      1      1      1
PLAN SH            0      0      1      1      0      0      1      1      0
STCACHE(J)         0      0      0      0      1      1      1      1      1
DYNPREP(J)         1      1      1      1      1      1      1      1      1
STCACHE(MB)        0      0      0     20     20     20     20    100    100

ESD#4          13:08  13:10  13:11  13:12  13:14
STREAM             0      0      1      1      1
PLAN SH            0      0      0      1      1
STCACHE(J)         1      1      1      1      1
DYNPREP(J)         0      0      0      0      0
STCACHE(MB)      100    100    100    100     20


Really, what it says is that we are playing with 3 DB options and 2 JDBC client options.  On the DB side we are playing with streamlined dynamic SQL, plan sharing and the statement cache size.  On the JDBC client side we are playing with DYNAMIC_PREPARE and the set statement_cache on setting (the latter is not really on the JDBC side, but addresses the needs of that client).  Our aim:  to keep ASE from crumbling beneath the bad code, which in the time before the statement cache refinements was manageable.

We start with ESD#2

1CPU

Statement Cache situation:

1ST_CACHE

Spinlock situation:

1SPINS

The rate of LWP/Dynamic SQLs creation:

1LWPs

monitorconfig:

1MONCONFIG

Now ESD#4:

CPU

Statement Cache situation:

ST_CACHE

Spinlock Situation:

SPINS

The rate of LWPs/Dynamic SQL creation:

LWPs

monitorconfig

PROC_MON

Before anything else, I must add a short note about reducing the size of the statement cache.  Although the configuration parameter is dynamic, both in ESD#2 and ESD#4 there is a real problem reducing its size.  After the statement cache has been utilized (100 MB in my tests), when the statement cache is reduced to a lower value (20 MB or even 0 MB) the memory is not released.  This causes a noticeable spike in CPU utilization and the large SSQLCACHE spinlock contention seen on the graphs above.  Probably a bug.  This is what dbcc prsqlcache reports:

1> sp_configure "statement cache", 0
2> go
 Parameter Name          Default     Memory Used  Config Value  Run Value   Unit              Type
 ----------------------  ----------  -----------  ------------  ----------  ----------------  -------
 statement cache size             0            0             0           0  memory pages(2k)  dynamic

(1 row affected)
Configuration option changed. ASE need not be rebooted since the option is dynamic.
Changing the value of 'statement cache size' does not increase the amount of memory Adaptive Server uses.
(return status = 0)

1> dbcc prsqlcache
2> go

Start of SSQL Hash Table at 0xfffffd7f4890e050

Memory configured: 0 2k pages Memory used: 35635 2k pages

End of SSQL Hash Table

DBCC execution completed. If DBCC printed error messages, contact a user with System Administrator (SA) role.
1>

Now to the rest of the findings.  It seems that ESD#4 handles the situation of client code wasting its statement (and procedure) cache memory structures – uselessly turning over the page chains – much better.  I did not post the data from my tests on a large SPARC host as promised earlier since they were not really better.  With this new information, I will have to rerun the tests next Sunday on ESD#4, SPARC.  Hopefully, I will get much improved performance metrics there as well.  I am eager to see the statement cache/procedure cache saga in ASE 15.x laid to rest.  It has really been a pain in the neck dealing with this stuff.  I hope that at last we may breathe a sigh of relief…  and get back to the optimizer issues….

Next update will probably come on Monday, unless something interesting is discovered before that (and I have time to test and testify on it here).

Yours.

ATM.

ASE 15.7 ESD#4 ON SPARC…


I have just finished the first round of tests of ESD#4 on the Solaris SPARC platform.  I have to confirm:  ESD#4 seems to have finally been vaccinated against wasteful reuse of the statement/procedure cache by “inappropriate” use in client-side code.  It looks like the painful experience of seeing ASE suffocating unexpectedly under the stress generated by code that had been running “more or less smoothly” on an old ASE 12.5.4 is behind us.  I will cross-test these issues again tomorrow (as well as shift my tests a bit to check yet another issue around the misuse of the statement cache).  But so far I must say that I found that ASE handles this situation successfully at last.  Very good news!

Here are the bare facts, again.

The tests I have run on the SPARC host on ASE 12.5.4 version (ESD#10):

S125_6LWP_DYN1_ST0_ST0M_108:50:00
S125_6LWP_DYN0_ST0_ST0M_109:01:00
S125_6LWP_DYN0_ST0_ST20M_109:12:00
S125_6LWP_DYN0_ST1_ST20M_109:22:00
S125_6LWP_DYN1_ST1_ST20M_109:34:00

The tests I have run on the SPARC host on ASE 15.7 – threaded mode kernel (ESD#4):

S157T_6LWP_DYN1_ST0_STR0_PLA0_ST0M_108:50:00
S157T_6LWP_DYN1_ST0_STR1_PLA0_ST0M_109:01:00
S157T_6LWP_DYN1_ST0_STR1_PLA1_ST0M_109:12:00
S157T_6LWP_DYN0_ST0_STR0_PLA0_ST0M_109:22:00
S157T_6LWP_DYN0_ST0_STR1_PLA0_ST0M_109:34:00
S157T_6LWP_DYN0_ST0_STR1_PLA1_ST0M_109:46:00
S157T_6LWP_DYN1_ST0_STR0_PLA0_ST20M_109:56:00
S157T_6LWP_DYN1_ST0_STR1_PLA0_ST20M_110:06:00
S157T_6LWP_DYN1_ST1_STR0_PLA0_ST20M_110:15:00
S157T_6LWP_DYN1_ST1_STR1_PLA0_ST20M_110:24:00
S157T_6LWP_DYN1_ST1_STR1_PLA1_ST20M_110:34:00
S157T_6LWP_DYN0_ST1_STR0_PLA0_ST20M_110:43:00
S157T_6LWP_DYN0_ST1_STR1_PLA0_ST20M_110:52:00
S157T_6LWP_DYN0_ST1_STR1_PLA1_ST20M_111:01:00
S157T_6LWP_DYN0_ST1_STR1_PLA1_ST200M_111:09:00
S157T_6LWP_DYN1_ST1_STR1_PLA1_ST200M_111:17:00

The tests I have run on the SPARC host on ASE 15.7 – process mode kernel (ESD#4):

S157P_6LWP_DYN1_ST0_STR0_PLA0_ST0M_112:24:00
S157P_6LWP_DYN1_ST0_STR1_PLA0_ST0M_112:32:00
S157P_6LWP_DYN1_ST0_STR1_PLA1_ST0M_112:43:00
S157P_6LWP_DYN0_ST0_STR0_PLA0_ST0M_112:52:00
S157P_6LWP_DYN0_ST0_STR1_PLA0_ST0M_113:01:00
S157P_6LWP_DYN0_ST0_STR1_PLA1_ST0M_113:09:00
S157P_6LWP_DYN1_ST0_STR0_PLA0_ST20M_113:18:00
S157P_6LWP_DYN1_ST0_STR1_PLA0_ST20M_113:27:00
S157P_6LWP_DYN1_ST1_STR0_PLA0_ST20M_113:37:00
S157P_6LWP_DYN1_ST1_STR1_PLA0_ST20M_113:45:00
S157P_6LWP_DYN1_ST1_STR1_PLA1_ST20M_113:53:00
S157P_6LWP_DYN0_ST1_STR0_PLA0_ST20M_114:02:00
S157P_6LWP_DYN0_ST1_STR1_PLA0_ST20M_114:11:00
S157P_6LWP_DYN0_ST1_STR1_PLA1_ST20M_114:20:00
S157P_6LWP_DYN0_ST1_STR0_PLA1_ST200M_114:28:00
S157P_6LWP_DYN1_ST1_STR0_PLA1_ST200M_114:37:00
S157P_6LWP_DYN1_ST1_STR1_PLA1_ST200M_114:44:00
S157P_6LWP_DYN0_ST1_STR1_PLA1_ST200M_114:50:00

The performance graphs:

ASE 12.5.4:

CPULOAD

Spinlock Situation (note the way 12.5.4 handles the situation with the statement cache enabled – pure disaster):

SPINS

Procedure Cache:

PCACHE

Dynamic SQL/LWPs creation rate:

LWPS

ASE 15.7 ESD#4 – threaded kernel mode:

Thread Load:

CPULOAD

Spinlock situation:

SPINS

Procedure cache:

PCACHE

Dynamic SQL/LWPs creation rate:

LWPs

Statement Cache (not available on 12.5.4):

STCACHE

ASE 15.7 ESD#4 – process kernel mode:

Engine Load:

CPULOAD

Spinlock situation:

SPINS

Procedure cache:

PCACHE

Dynamic SQL/LWPs creation rate:

LWPs

Statement Cache (not available on 12.5.4):

STCACHE

The threaded kernel mode gives a very steady throughput.  More steady than the process kernel mode.  There is the same “bug” in the process mode whereby sp_monitorconfig at a certain point stops reporting procedure cache utilization (I wonder if the new monMemoryUsage MDA table supplements the missing data).

In general, if you do have a client that generates a large number of fully prepared statements, DON’T turn off the statement cache at the session level and DON’T turn off the DYNAMIC_PREPARE JDBC setting.  In both cases, the thread utilization climbs up (and so does the proccache spinlock).  In addition, if the statement cache is ruthlessly turned over due to a very high volume of unique statements generated by the code, keep the cache as small as possible – 20M was fine here, 200M was pretty bad.
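In other words, for this kind of workload the safer server-side combination looks roughly like the sketch below (20MB expressed in 2K pages; treat the numbers as placeholders):

-- keep the statement cache small when unique statements churn through it
sp_configure "statement cache size", 10240      -- 20MB
go
-- let the prepared statements be shared as LWPs via the statement cache
sp_configure "streamlined dynamic SQL", 1
go
-- client side (jConnect): leave DYNAMIC_PREPARE=true and do not issue
-- "set statement_cache off" on the session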

The threaded kernel mode gives more satisfactory results – more steady performance, slightly better throughput and fewer bugs.

I will be running more and different tests in the following weeks – as well as comparing performance across a wider spectrum of metrics – but from the point of view of running a high volume of unique prepared statements the problem of ASE 15.x seems to have been solved at last.

ATM.

All previous forums rolled into this one?


Do I understand this correctly?

All previous forums for ASE, RepSvr, Tuning, etc are now in this one forum?

And since I don't see any historic entries, I assume we will lose all the info contained in the old forums?

 

 

Thx,

rick_806

ASE 15.7 ESD#4 Unix Domain Sockets


In the past few days, I have fired up a copy of ASE 15.7 ESD#4 to try the new Unix Domain Sockets feature.  I found only a few notes about this feature; the CR listing is the source of my information.  CR 667751 has all of the public information that I have found to date.

 

Here is the heart of the information.

 

"master afunix unused //<hostname>/<pipe>"

 

After many failed attempts, I realized how simple this could be.

The ASE Server creates a socket file at start-up based on the master entry in the interfaces file.

In order for this to work, the ASE dataserver must have write access to a folder.

Hmmm-- $SYBASE sounds like a good choice.

 

I decided I wanted my socket file to be called "ASE.socket" in the /sybase/ folder.

 

Here is an example of the master line that finally worked.


master afunix unused //mytesthost/sybase/ASE.socket

 

I tested the socket with isql & bcp... Both tools worked without a problem.  The point of the new connection type is to reduce the overhead associated with TCP/IP.  The results of my bcp inbound testing proved that the network layer was not my source of contention and slowness.  I expect that bcp outbound will show better results if I can get past other self-inflicted issues on my machine.

 

In conclusion, ASE Unix Domain Sockets works without issues when I use the ASE utilities.


 


ASE 15.7: Prepared Statements and ASE Statement/Procedure Cache, Configuration Impact


So we are back with the same issue:   statement cache/procedure cache behavior under the stress of executing a high volume of prepared statements.  

 

I have just finished another round of tests around this issue (which has caused quite a lot of trouble in past releases of ASE) and I want to share and recapitulate (the more you explain, the more you understand yourself).

 

First, let's quote a bit of documentation - to lay out who the players in our field are.  I will restrict it to the very few aspects that are relevant.

 

[1] Procedure Cache:  {Performance and Tuning Series: Basics. Chapter 5: Memory Use and Performance}

Adaptive Server maintains an MRU/LRU (most recently used/least recently used) chain of stored procedure query plans.  As users execute stored procedures, Adaptive Server looks in the procedure cache for a query plan to use.  If a query plan is available, it is placed on the MRU end of the chain, and execution begins.

...

The memory allocated for the procedure cache holds the optimized query plans (and occasionally trees) for all batches, including any triggers.

 

[2] Statement Cache:  {Performance and Tuning Series: Basics. Chapter 5: Memory Use and Performance}

The statement cache saves SQL text and plans previously generated for ad hoc SQL statements, enabling Adaptive Server to avoid recompiling incoming SQL that matches a previously cached statement. When enabled, the statement cache reserves a portion of the procedure cache

 

[3] Streamlined Dynamic SQL:  {Performance and Tuning Series: Basics. Chapter 5: Memory Use and Performance}

In versions earlier than 15.7, Adaptive Server stored dynamic SQL statements (prepared statements) and their corresponding LWP in the dynamic SQL cache.  Each LWP for a dynamic SQL statement was identified based on the connection metadata.  Because connections had different LWPs associated with the same SQL statement, they could not reuse or share the same LWP.  In addition, all LWPS and query plans created by the connection were lost when the Dynamic SQL cache was released.

In versions 15.7 and later, Adaptive Server uses the statement cache to also store dynamic SQL statements converted to LWPs.  Because the statement cache is shared among all connections, dynamic SQL statements can be reused across connections.

 

[4] DYNAMIC_PREPARE property:

When a client connection executing a prepared statement sends a request to ASE, it may either send a language command as plain SQL text (if DYNAMIC_PREPARE is set to false), or request ASE to create an LWP for it (if DYNAMIC_PREPARE is set to true - something that may be seen in monSysSQLText as "create proc dynXXX as..." & DYNAMIC_SQL dynXXX...).

To quote from Managing Workloads with ASE, "The statement cache reproduces the same benefits as fully prepared statements as it takes language commands from client applications, replaces literals with parameters, creates a statement hash key, compiles and optimizes the statement, and creates a Light Weight Proc for re‐use."
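The easiest way to see which of the two forms a client actually sends is to watch the monSysSQLText table mentioned above while the client runs; the SQL text pipe has to be enabled first.  A minimal sketch, with pipe sizes that are only illustrative:

-- enable the MDA SQL text pipe
sp_configure "enable monitoring", 1
go
sp_configure "sql text pipe active", 1
go
sp_configure "sql text pipe max messages", 1000
go
sp_configure "max SQL text monitored", 4096
go

-- language commands show up as the raw SQL text, while DYNAMIC_PREPARE=true
-- connections show up as "create proc dynXXX as ..." requests
select SPID, SQLText from master..monSysSQLText
go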

 

In plain language, we may describe the playground of our tests in the following way:

 

When a client connection sends an ASE a request, if this request contains a prepared statement, the client connection will either send it as it is (SQLLANG) or convert it into a create procedure request (DYNP/LWP). If the statement cache is enabled on ASE, ASE will store the SQL text & pointer to its LWP in the statement cache, install the plan/tree of the corresponding LWP in the procedure cache and ultimately execute it.

 

The motivation behind all this is to reuse as many resources within ASE as possible. Without the statement cache, each adhoc query has to be "converted" [parsed/normalized/compiled] into a query plan individually and dropped after being executed, thus preventing reuse. With the statement cache enabled, the query is looked up in the statement cache instead, and if a match is found its plan is cloned from the procedure cache and executed.  If it is not there, it will be "converted" and installed for reuse.

 

The same applies to prepared statements. Without the statement cache [and when the streamlined option is turned off], each prepared statement has to be converted [parsed/normalized/compiled] into a query plan and stored in the procedure cache individually - to be released at client disconnect. With the statement cache and the streamlined option enabled, the statement is looked up, cloned and executed (or stored for later reuse).

 

In fact, this "reuse" methodology is a patented feature [2012]. It involves scanning statement/procedure cache memory page chains (holding spinlock) and either installing a new or reusing an old object.

 

StCacheSearch

 

In fact, one may collect quite a lot of information on all this over the web.

 

So far the theory.  We know who the players on our testing grounds are.  What we will see below, based on the tests done, is how ASE reacts to changing the different configuration parameters related to this relatively new, aggressive resource-reuse goal.

I have confined myself in the tests to only four possible settings to play with:  existence of the statement cache (STxM), the streamlined SQL option (STRx), the DYNAMIC_PREPARE option (DYNPx) and configuring the connection to use the statement cache (STx). This results in the following matrix of possible tests:

 

pst_TestTimes

 

I ran 10 Java clients executing unique prepared statements on a 15-thread ASE, 15.7 ESD#4, SPARC.  The "literal autoparam" setting is turned on, as is the "plan sharing" option.

 

The first graph represents the thread busy for all our tests:

 

pst_ThreadBusy

 

The first thing to notice is that whenever a prepared statement meets an ASE with the connection property DYNAMIC_PREPARE turned off, ASE responds with a 5 to 10% leap in thread utilization.  On the one hand, it seems pretty obvious:  we "reuse" LWPs rather than demand that ASE generate (parse/normalize/compile) a plan for each statement over and over again.  This is not so obvious, though, since in our case we generate LWPs wrapping 10 completely unique streams of prepared statements.  Reuse here is pretty minimal.  It seems a bit odd that making an ASE scan its statement cache, install a new LWP + QP and execute it has less CPU impact than just preparing the plan and dropping it.  The numbers, though, are unequivocal.   Even if ASE faces a thick stream of unique prepared statement requests, it handles it much better if the client requests them as procedures (incidentally, this was NOT the case with previous ASE 15 releases).

 

I'd like to share also the following graph:

 

pst_OpenObjects

 

We know from the documentation that each cached statement consumes one object descriptor.  So it makes sense that when the statement cache is turned on, the number of active object descriptors rises [12:25].  The impact, though, is much greater when the streamlined option is turned on.  Something to be kept in mind.
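Object descriptors can be watched the same way as the procedure cache, so a periodic check like the one below (the new ceiling is just an illustrative number) tells you whether the streamlined option is pushing you toward descriptor reuse:

-- Num_active vs. Max_Used for object descriptors
exec sp_monitorconfig "number of open objects"
go
-- raise the ceiling if Max_Used keeps touching the configured value
sp_configure "number of open objects", 40000
go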

 

This one is also interesting:

 

pst_LockRequests

pst_ShraedRow

 

Each time the DYNP is turned on, the number of lock requests is doubled.

 

This one is also telling:

 

pst_TDB_Util

pst_DDC_Util

 

Since we create our LWPs in tempdb, and since we run exclusively the code that causes ASE to generate LWPs, each time the DYNP is turned on, tempdb utilization surges up.

 

Now, let's see how we are handling procedure requests and statements requests in each case:

 

Procedures:

pst_ProcReq

pst_ProcRem

 

Statements:

 

pst_StCached  pst_StDropped  pst_StINCache  pst_StNotCached  pst_StNotINCache

 

We run 12K procedure requests per second.  Demand on the statement cache is highest when we run DYNPs streaming them into the statement cache (streamlined on).

 

Happily enough, this high volume of prepared statements washing through the statement/procedure cache is handled with relatively low spinlock contention (I will test the situation of more client connections than threads in the future).

 

pst_SpinPCH  pst_SpinPMGR  pst_SpinQRYP  pst_SpinSSQL

 

It is important to state here that ASE runs with TF758 turned on.  Without it you would have seen the proc_cache spinlock contention getting into the tens of percent.

 

Let's see the statement cache utilization from another angle (monStatementCache):

 

pst_STCACHE

 

Our cache hit ratio is not impressive (little wonder in our situation of running unique statements from 10 concurrent sessions).   What is interesting to notice, though, is the number of statements the same 20MB cache contains when running with and without the streamlined option.  Turning the streamlined option on caused the number of statements contained in the statement cache to double.

 

The following displays what the client connection requests from ASE in terms of LWP requests:

 

pst_DYNSQLS

 

When we turn DYNP on, we start explicitly requesting ASE to create procedures.  However, the lifetime of the procedures in the cache is reversed:  when DYNP is off, ASE creates procedures implicitly and stores them in the statement cache + procedure cache.  When DYNP is on, ASE creates the procedure and drops it almost instantly - unless the streamlined option is turned on, causing ASE to store them in the statement cache as well - in far greater numbers (incidentally, the names of the DYNP LWPs and ADHOC LWPs also differ).

 

 

So, what have we learned from all this, if at all?

 

From the ASE load perspective, we have seen that configuring the ASE to reuse (streamline) the dynamic SQLs is beneficial even if we face a situation of concurrent client connections generating a high volume of unique prepared statements.  Turning the DYNAMIC_PREPARE option off would only cause an additional 10% surge in CPU utilization (there is a problem in that we do not really know how the throughput is influenced but rather infer it from the rate of procedure requests, which is not precisely accurate - I will have to change the way I perform the load in the future in order to generate a more accurate comparison here).

 

We have learned that forcing ASE to reuse (streamline) the dynamic SQLs will force the procedure cache to handle many more plans/trees since, in addition to adhoc queries landing in the statement/procedure cache until they age out, LWPs and their plans that would otherwise have died as clients disconnect will continue to occupy both the statement and the procedure cache until they age out.  Although we have seen that setting the streamlined option on is beneficial to an ASE in our situation, we must not neglect this fact (after all, we ran each test separately, not mixed - had we mixed, both adhocs and DYNPs would land in the same area).   In addition, since the statement cache is a shared resource, if we know that we have a bad (unique & high volume) DYNP workload threatening to wash out our statement cache, perhaps the best decision would be to turn the streamlined option off and keep these DYNPs out of the statement cache altogether.

 

We have learned that open objects usage surges up with the streamlined option turned on, so it must be addressed in configuration in order to avoid descriptor reuse.

 

We have learned that we have to treat tempdb differently if we know that we will be streamlining.  Its utilization goes way up when ASE has to create LWPs at a high rate, so additional attention is needed here.

 

What we have not understood is the origin of the consistent leap in shared row lock requests when we configure DYNAMIC_PREPARE to true.  Is it because we handle explicit create procedure requests?   To be tested further.

 

What we have also learned is how the configurations affecting the statement cache may influence the procedure cache.  The documentation states that "statement cache memory is taken from the procedure cache memory pool" - a puzzling statement, since the memory consumption of both is added together (logical/physical ASE memory) {System Administration Guide: Volume 2, Configuring Memory}.  What is obvious, though, is that the two have a very high degree of reciprocity:  each statement landing in the statement cache will have to consume space in the procedure cache for the LWP tree/plan.  Which means that the statement cache is not a portion of the procedure cache.  If configured, it will consume a portion of the procedure cache which is directly influenced by its size (the documentation says it is reserved ahead of time).   This is a much more accurate way to explain the relationship between the statement cache and the procedure cache.

 

We still know very little about the procedure cache.  We know that it has its chain (one, or more if that has already been implemented) of procedure query plans/trees.  We do not know, for example, which of the procedure cache modules / allocators each occupies, or whether these have been partitioned.  I did monitor the impact of the tests I ran on both, but the data generated has been pretty confusing/uninformative.  Especially so since there is no documentation whatsoever explaining the procedure cache structure in detail.  We also know very little about ELC and its impact on our tests - although it almost certainly plays a role here (TF758 may testify to this).

 

Which returns us to the famous Socratic saying:   the more you know, the more you know you don't know...

 

Hm,  should we start the testing over?

 

ATM.

ASE 15.7: Threaded vs. Process Kernel Mode - Solaris/RHEL cross-tests


I am back with the comparative tests:  threaded kernel mode versus process kernel mode.  I have seen in my earlier tests that the process kernel mode seems to get slightly better throughput under the same workload.  On the other hand, I have also found that the process kernel mode has shown certain types of unwanted behavior absent from the threaded kernel mode.

 

In order to verify these differences I decided to port the tests from the Solaris (SPARC) platform to Redhat Linux (Xeon). This way I will be able to test ASE behavior on different platforms: Solaris running on SPARC and RHEL running on Xeon.  The probability of an error will be minimized.

 

Having performed identical tests, I have indeed been able to verify that the threaded kernel mode displays up to 10% more CPU utilization.  Behavioral lapses in the process mode, too, were persistent across platforms.

 

What I will report below is a series of 4 tests. Two of them were performed on a SPARC host (32 logical processors) with ASE running in threaded and process mode, and another two were performed on a XEON host (32 logical processors) with ASE running again in threaded and process modes.  The legend of the tests is identical to my previous posts, so I will not repeat it here.  I am still working around the issue of generating prepared statements at a high rate.

 

SPARC_TESTS

 

 

XEON_TESTS

I will post only a limited number of graphs, those that looked most peculiar to me.  I will not comment on how differences in configurations affected the CPU utilization/throughput either (this too is found in earlier posts).

 

First the TPS/CPU Load:

 

SPARC:

TPS.SOL

XEON:

TPS.LX

 

ProcPS/CPU Load:

 

SPARC:

PROC_REQ.SOL

PC_MON.pr.sol  PC_MON.th.sol

XEON:

PROC_REQ.LX

PC_MON.pr.lx  PC_MON.th.lx

 

Statement Cache:

 

SPARC:

ST_CACHED.SOL  ST_DROPPED.SOL  ST_FOUND_IN_CACHE.SOL

XEON:

ST_CACHED.LX  ST_DROPPED.LX  ST_FOUND_IN_CACHE.LX

 

Spinlock Contention:

 

SPARC:

PCACHE_SPIN.SOL  SSQLCACHE_SPIN.SOL

XEON:

PCACHE_SPIN.LX  SSQLCACHE_SPIN.LX

 

There are quite a few interesting things.  The same things were found to be working less well across platforms (such as sp_monitorconfig, which stopped reporting utilization at a certain point on either RHEL or SPARC, or the PC/SC spinlock contention that was consistently higher in the process mode - even with TF758).  What has been pretty surprising is the huge throughput leap in porting the tests from an 8-chip, rather old SPARC VI host (16 cores, 32 threads running at 2.15 GHz) to a 2-chip XEON 2600 host (16 cores, 32 threads running at 2.7 GHz).  I will check SPARC VII (2.66 GHz 4-core) and T4 (2.85 GHz 8-core) chips later on to have more data.  But the throughput difference found here is something worth digging into.

 

Anyway, even with the latest tests there still remained a question:  do we really see greater thread utilization due to a higher throughput that ASE achieves running in the threaded mode, OR does the higher thread utilization bring about a lower throughput instead?  In order to test this I had to modify my tests slightly so that I get a more precise recording of the volume of work the server does in each test.

 

This has brought me to the following numbers (I have tested only the RHEL host - more tests to come):

 

THROUGHPUT.LX

 

The 100% utilization corresponds to the threaded kernel mode - we have a slightly lower number of code loops (I compare here the average number of loops the code in each client connection performs over the same period of time).  The threaded mode gets an average of 240 loops with 100% thread utilization, while the process mode gets 260 loops with 90% engine utilization.  This is not much, but the difference is there.  What is also interesting is the following:

 

Process Mode:

CODE_LOOPS.pr.lx

Threaded Mode:

CODE_LOOPS.th.lx

 

Each bar in the graph corresponds to a client connection executing its code (identical).  Whereas in the process kernel mode the throughput deviation between each client may vary (200 to 300!), for the threaded kernel mode this deviation does not exist (within the range of 10 - which is explained by the fact that the threads are activated serially).

 

So it seems to be true:  the threaded kernel mode does get to a slightly higher thread utilization without an increase in throughput. The difference is not significant, but it is there. On the other hand, the threaded mode yields much more stable performance - both in terms of query response time and across many other counters. If I had to choose, I'd choose the threaded kernel mode over the process mode - based on its performance characteristics (probably hosted on Linux - but I have to check this aspect more thoroughly before I comment on it with any degree of reliability).  If you consider that some process mode clients have suffered a ~15% drop in throughput in comparison to the threaded mode clients, the 35% improvement other process mode clients received is somewhat compromised.

 

My last tests before I leave this topic will be focusing on whether the throughput rises when more threads are brought online for the strangled ASE, thus reversing the balance towards the threaded mode.  I will check this across both RHEL and SPARC hosts to have more sanitized data and report on it here.

 

 

ATM.


ASE 15.7: Threaded Kernel vs. Process Kernel: Throughput Variance


I have just finished performing a controlled analysis of general performance differences for ASE 15.7 running process and threaded kernel mode on various platforms and hosts (RHEL running on Dell Xeon chips, Solaris running on SPARC M, SPARC T  and x64).  Although there are quite a few differences, the following image tells the story of the throughput variance one may expect when switching between different kernel modes:

 

PSS_VS_THR_THROUGHPUT

 

Top to bottom:  the top graph shows response time for the same query run by multiple concurrent client connections (each bar representing an absolute number of query executions a client connection has performed in a period of time - 5 minutes).  Bottom left - an average "throughput" for the client connections running vis-a-vis ASE configured in process kernel mode.  Bottom right - an average "throughput" for the client connections running vis-a-vis ASE configured in threaded kernel mode.   The bottom table(s) describes the type of configuration applied to ASE/JDBC.

 

The image is a bit bulky, but it is pretty eloquent and consistent - across all the platforms tested.  Even though sometimes it looks like ASE running the process kernel mode out-performs ASE running the threaded kernel mode, if you compare throughput more accurately it becomes clear that threaded kernel mode yields very stable, predictable response time, while the process kernel mode exhibits a bursty, unpredictable response time, with some client sessions outperforming others by as much as 100%.  Performing less controlled tests may create a false impression that the process kernel mode runs faster, which in fact is not really true.

 

It is nothing new, however.  Predictable response time has been named as one of the benefits the threaded kernel brings to ASE.  In the tests I ran, both the Java client sessions and ASE itself were located on the same host, competing for OS resources.  Although there is very little I/O involved in the tests, there is still a telling difference in response time across the kernel modes.

 

I will not comment on the type of tests run in detail here.  The only thing I will say is that you should definitely test your JDBC/ODBC client application with different JDBC/ODBC + ASE settings.  As may be seen from the throughput variance, configuring ASE/JDBC/ODBC properly does make a difference (although it is not always possible to configure both for the best performance in a real-life situation).

 

Stay curious and check yourself as much as possible...

 

ATM.

Violin Flash Memory Arrays Certified with SAP Sybase ASE


Violin Memory, Inc., provider of memory-based storage systems, and SAP AG today (April 16th)  announced that Violin 6000 Series Flash Memory Arrays are now certified for interoperability with SAP Sybase Adaptive Server Enterprise (SAP Sybase ASE), delivering reliability and accelerated performance across a range of enterprise applications. Used together, these solutions will provide customers with enhanced and simplified solutions that leverage the power of Violin Flash Memory Arrays and business-critical SAP applications to accelerate transaction-driven environments. Additionally, Violin Memory is now an SAP software solution and technology partner in the SAP PartnerEdge program. It will work with SAP to serve new and existing customers, partners and channels to benefit from solutions from Violin Memory and SAP. Violin products are also available in support of SAP ERP applications.

 

“The Violin Flash Memory Arrays integrated with SAP Sybase ASE are a key solid-state storage component for the acceleration of business-critical applications from SAP,” said Kevin Ichhpurani, senior vice president, Business Development and Ecosystem Innovation, SAP. “The certification process showed that the arrays can deliver significant performance improvement over traditional disk-based systems, delivering substantial value to customers. In fact, during preliminary internal testing, we have seen 10x improvement in the performance of SAP Sybase ASE running on Violin Flash Memory.”

 

Read the full story at - http://www.violin-memory.com/news/press-releases/violin-flash-memory-arrays-now-certified-with-sap-sybase-adaptive-server-enterprise-to-accelerate-customer-applications-in-transaction-driven-environments/

ASE FAQ: What do the numbers at the beginning of each line in the errorlog mean?


Q: What do the numbers at the beginning of each line in the errorlog mean?

 

A: Most entries in the ASE errorlog have a prefix in the form aa:bbbb:ccccc:ddddd:<date> <component> <message>

 

Example:

00:0000:00000:00001:2013/05/03 08:56:55.35 server  Master device size: 110 megabytes, or 56320 virtual pages. (A virtual page is 2048 bytes.)

00:0000:00000:00001:2013/05/03 08:56:55.37 kernel  Setting console to nonblocking mode.

00:0000:00000:00001:2013/05/03 08:56:55.37 kernel  Console logging is enabled. This is controlled via the 'enable console logging' configuration parameter.

 

These prefix fields are:

 

Field      Meaning
---------  --------------------------------------------------------------------
aa         Instance-id.  This is always 0 for symmetric multiprocessing (SMP)
           servers; for Cluster Edition (CE) ASE it identifies the instance.
bbbb       Thread-id (threaded mode) or engine-id (process mode).
ccccc      Family thread id (fid).  If the spid is using worker processes to
           perform parallel processing, this field identifies which one was
           responsible for the message.
ddddd      Logical thread id (spid).
date       Date and time the message was generated.
component  Usually either <server> or <kernel>, indicates which layer of the
           ASE code generated the error.  The server layer implements the
           general database functionality, the kernel layer is the interface
           to the hardware.

Q: Are connections using net password encryption?


Q: Are connections using net password encryption?

 

Turning on the sp_configure setting "net password encryption required" can greatly improve security.  Client applications that have not been programmed to use password encryption send their passwords over the network in plaintext, where the password can be sniffed.  Turning the "net password encryption required" option on prevents such clients from connecting to ASE (though they will still be sending readable passwords over the network while trying to connect).  Presumably the users will contact the SA asking why they can't connect, allowing the applications to be identified and rewritten to use password encryption.  However, the approach of just turning this option on could cause unacceptable service interruptions.  Is there a way to identify such connections from within ASE before turning on the feature?


A: Yes, assuming the applications have current connections to the server.  It isn’t terribly convenient, but you can run a DBCC PSS(uid,spid) command against a connection. There is a bit set in the field named "p6stat" if net password encryption was not used.

 

Note: DBCC PSS is not a formally documented command; its output may change between versions without warning. This example output is from Adaptive Server Enterprise/15.7.0/EBF 20369 SMP ESD#02 /P/Sun_svr4/OS 5.10/ase157esd2/3109/64-bit/FBO/Sat Jul  7 10:07:17 2012


Here I log in without the -X parameter used to turn on net password encryption in ISQL

bret-sun2% isql -Usa -P********
1> select @@spid
2> go

------
     17

(1 row affected)
1> dbcc traceon(3604)
2> go
00:0000:00000:00017:2013/05/03 12:49:27.44 server  DBCC TRACEON 3604, SPID 17
DBCC execution completed. If DBCC printed error messages, contact a user with
System Administrator (SA) role.
1> dbcc pss(1,17)
2> go
{

PSS (any state) for suid 1 - spid 17:

PSS at 0x10006da1a88

PSS Status fields :
pstat=0x10000 (0x00010000 (P_USERPROC))
p2stat=0x1010 (0x00001000 (P2_XLATE), 0x00000010 (P2_DEBUG))
p3stat=0x800 (0x00000800 (P3_PSS_ACTIVE))
p4stat=0x0 (0x00000000)
p5stat=0x8 (0x00000008 (P5_RUSRCONN_USED))
p6stat=0x10 (0x00000010 (P6_NETPWD_NO_ENCRYPT))
p7stat=0x0 (0x00000000)
p8stat=0x0 (0x00000000)
pextstat=0x0 (0x00000000)

In contrast, when net password encryption is used, that bit isn’t set.
In the following example, you can see another bit has been set in p8stat showing which encryption method was used
p8stat=0x2 (0x00000002 (P8_NETPWD_RSA_ENCRYPT3)).
The exact bit set when encryption is used may differ depending on the client and server versions.  For instance, in 15.0.3, the bit set is p6stat=0x40 (0x00000040 (P6_NETPWD_RSA_ENCRYPT)).


bret-sun2% isql -Usa -P******** -X
1> select @@spid
2> go

------
     18

(1 row affected)
1> dbcc traceon(3604)
2> go
00:0000:00000:00018:2013/05/03 12:52:47.79 server  DBCC TRACEON 3604, SPID 18
DBCC execution completed. If DBCC printed error messages, contact a user with
System Administrator (SA) role.
1> dbcc pss(1,18)
2> go
{

PSS (any state) for suid 1 - spid 18:

PSS at 0x10006dba390

PSS Status fields :
pstat=0x10000 (0x00010000 (P_USERPROC))
p2stat=0x1010 (0x00001000 (P2_XLATE), 0x00000010 (P2_DEBUG))
p3stat=0x800 (0x00000800 (P3_PSS_ACTIVE))
p4stat=0x0 (0x00000000)
p5stat=0x8 (0x00000008 (P5_RUSRCONN_USED))
p6stat=0x0 (0x00000000)
p7stat=0x0 (0x00000000)
p8stat=0x2 (0x00000002 (P8_NETPWD_RSA_ENCRYPT3))
pextstat=0x0 (0x00000000)

 

If you are using ISQL version 15.0 ESD 12 or higher, the new pipe feature can get you the results for every active spid at once.
(my thanks to Dan Thrall for pointing out this improvement to the method).

 

In this example, the first 14 spids are system processes so don’t have these bits set.
Spid 43 isn’t using network encryption while spid 44 is using it.

 

1> dbcc pss(0,0)
2> go | egrep "NETPWD|pspid"


pkspid=13434983   pspid=2   pclient_kpid=13434983   parent_spid=2
pkspid=13566056   pspid=3   pclient_kpid=13566056   parent_spid=3
pkspid=13697129   pspid=4   pclient_kpid=13697129   parent_spid=4
pkspid=13828202   pspid=5   pclient_kpid=13828202   parent_spid=5
pkspid=13959275   pspid=6   pclient_kpid=13959275   parent_spid=6
pkspid=14090348   pspid=7   pclient_kpid=14090348   parent_spid=7
pkspid=14221421   pspid=8   pclient_kpid=14221421   parent_spid=8
pkspid=14352494   pspid=9   pclient_kpid=14352494   parent_spid=9
pkspid=14483567   pspid=10   pclient_kpid=14483567   parent_spid=10
pkspid=14614640   pspid=11   pclient_kpid=14614640   parent_spid=11
pkspid=14745713   pspid=12   pclient_kpid=14745713   parent_spid=12
pkspid=14876786   pspid=13   pclient_kpid=14876786   parent_spid=13
pkspid=16711808   pspid=15   pclient_kpid=16711808   parent_spid=15
pkspid=16056443   pspid=20   pclient_kpid=16056443   parent_spid=20
p6stat=0x10 (0x00000010 (P6_NETPWD_NO_ENCRYPT))
pkspid=19071122   pspid=43   pclient_kpid=19071122   parent_spid=43
p8stat=0x2 (0x00000002 (P8_NETPWD_RSA_ENCRYPT3))
pkspid=19202195   pspid=44   pclient_kpid=19202195   parent_spid=44

 

Capture the contents of master..sysprocesses at the same time so you can correlate the spid with application names, user logins, and ip addresses.
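A simple snapshot like the one below, taken at the same moment as the dbcc pss sweep, is enough to map those spids back to logins, hosts and programs (column names are from the 15.x master..sysprocesses table):

select spid,
       suser_name(suid) as login_name,
       hostname,
       program_name,
       ipaddr
  from master..sysprocesses
 where suid > 0          -- skip system tasks, which run with suid 0
 order by spid
go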

 

There is an open feature request, CR 700602, to have the pssinfo() function enhanced to be able to output the pstat fields.

 

Bret Halford

Support Architect, SAP Active Global Support

Sybase, Inc., an SAP Company

385 Interlocken Crescent Suite 300, Broomfield CO 80021, USA

ASE 15.7: ESD#4 UPDATES….

$
0
0

ESD#4 has been around for some time.  I did not start testing the conditions I am pushing the ASE into with this EBF for several reasons.  But now it seems to really be a pity I haven’t done so from the start.  After making a few tests with ESD#2 on Solaris x64, and after making similar tests with ESD#3 on Solaris SPARC I decided to move on.  The reason for this is that I was not really satisfied with what I saw.  True, I am pushing the ASE into an area which is very imcomfortable for it (simulating a situation which a highly respected Sybase engineer has called running a “really bad code”).  But this is a real-client situation and I must know how my ASE will handle it.  It is really naive to think that ASE should handle only properly formed workload written by developer teams sensitive to the way DB operates.  Today more and more code is written which care very little for DB needs.  Either we face it or the customers moves to DBMS systems that cope with “bad code” better than ASE…

So, on the one hand, I was not satisfied with the results of my tests – especially with high spinlock contention for various spinlocks guarding the procedure / statement cache.  On the other, ESD#4 (and the weird fish ESD#3.1, which has only a little portion of fixes out of ESD#4, but came later on – a couple of days ago in fact) says to have worked on the procedure cache spinlock contention a little more.  Since this is what I was after, I switched the direction a bit.

Unfortunately, I could not spend much time on the current tests, and I will not be able to spend any more time on them in days to come (customer calls).  I did succeed, though, to do some initial tests which I would like to share.  Especially since there is a parallel discussion on ISUG SIG area (now restricted to paid members only) which mentions statement cache issues.  Lovely discussion, but it really deals with how to tune the ASE to handle “good code” without waste of resources and how to monitor the waste.  It still avoids the situation when the ASE is bombarded with “bad code.”

An aside installation note on ESD#4:  I had to truss the installation process since it kept hanging.  As it turned out, the space check the installer performs does not take the available swap space into account.  The installer uses /tmp quite freely, and my 800MB of /tmp was not enough for the installation of this ESD.  Had the installer informed me of this requirement in time, it would have saved me some time and pain.  I hope the good code that manages the installation will be made even better in the future…

So here are the settings:

ESD#2          09:30  09:35  09:40  09:43  09:45  09:50  09:55  10:00  10:03
STREAM           0      1      1      1      0      1      0      1      1
PLAN SH          0      0      1      1      0      0      0      0      1
STCACHE(J)       0      0      0      0      1      1      1      1      1
DYNPREP(J)       1      1      1      1      1      1      1      1      1
STCACHE(MB)      0      0      0     20     20     20     20     20     20










ESD#2          10:06  10:12  10:15  10:20  10:23  10:25  10:28  10:30
STREAM           1      1      0      0      1      1      1      1
PLAN SH          1      0      0      0      0      1      1      1
STCACHE(J)       1      1      1      1      1      1      1      1
DYNPREP(J)       1      1      1      0      1      0      0      0
STCACHE(MB)    100    100    100    100    100    100     20    100










ESD#4          12:56  12:58  12:59  13:01  13:02  13:03  13:04  13:06  13:07
STREAM           0      1      1      1      0      1      1      1      1
PLAN SH          0      0      1      1      0      0      1      1      0
STCACHE(J)       0      0      0      0      1      1      1      1      1
DYNPREP(J)       1      1      1      1      1      1      1      1      1
STCACHE(MB)      0      0      0     20     20     20     20    100    100










ESD#4          13:08  13:10  13:11  13:12  13:14
STREAM           0      0      1      1      1
PLAN SH          0      0      0      1      1
STCACHE(J)       1      1      1      1      1
DYNPREP(J)       0      0      0      0      0
STCACHE(MB)    100    100    100    100     20


Really what this says is that we are playing with 3 DB-side and 2 JDBC client-side options.  On the DB side we are playing with streamlined dynamic SQL, plan sharing and the statement cache size.  On the JDBC client side we are playing with DYNAMIC_PREPARE and the set statement_cache on/off session setting (the latter is not really a JDBC option, but it addresses the needs of that client).  Our aim:  to keep ASE from crumbling beneath bad code which, in the days before the statement cache refinements, was manageable.  A sketch of how these knobs are flipped is given right below.
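
For reference, a minimal sketch of the DB-side commands behind the table columns above, with the values used in these runs (the parameter names are as I recall them on 15.7 - double-check them against your own sp_configure output; the statement cache size is configured in 2K pages, so 10240 pages is the 20MB used here).  The DYNAMIC_PREPARE switch is a jConnect connection property and is set on the client, not on ASE:

-- STCACHE(MB): statement cache size (10240 x 2K pages = 20MB)
sp_configure "statement cache size", 10240
go
-- STREAM: streamlined dynamic SQL - store LWPs for prepared statements in the statement cache
sp_configure "streamlined dynamic SQL", 1
go
-- PLAN SH: share cached query plans across connections
sp_configure "enable plan sharing", 1
go
-- STCACHE(J): the session-level switch the Java client issues after connecting
set statement_cache off
go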

We start with ESD#2

1CPU

Statement Cache situation:

1ST_CACHE

Spinlock situation:

1SPINS

The rate of LWP/Dynamic SQLs creation:

1LWPs

monitorconfig:

1MONCONFIG

Now ESD#4:

CPU

Statement Cache situation:

ST_CACHE

Spinlock Situation:

SPINS

The rate of LWPs/Dynamic SQL creation:

LWPs

monitorconfig

PROC_MON

Before anything else, I must add a short note about reducing the size of the statement cache.  Although the configuration parameter is dynamic, both in ESD#2 and ESD#4 there is a real problem reducing its size.  After the statement cache has been utilized (100 MB in my tests), when it is reduced to a lower value (20 MB or even 0 MB) the memory is not released.  This causes a noticeable spike in CPU utilization and the large SSQLCACHE spinlock contention seen on the graphs above.  Probably a bug.   This is what dbcc prsqlcache reports:

1> sp_configure "statement cache", 0
2> go
 Parameter Name        Default  Memory Used  Config Value  Run Value  Unit              Type
 --------------------  -------  -----------  ------------  ---------  ----------------  -------
 statement cache size        0            0             0          0  memory pages(2k)  dynamic

(1 row affected)
Configuration option changed. ASE need not be rebooted since the option is dynamic.
Changing the value of ‘statement cache size’ does not increase the amount of memory Adaptive Server uses.
(return status = 0)

1> dbcc prsqlcache
2> go

Start of SSQL Hash Table at 0xfffffd7f4890e050

Memory configured: 0 2k pages Memory used: 35635 2k pages

End of SSQL Hash Table

DBCC execution completed. If DBCC printed error messages, contact a user with System Administrator (SA) role.
1>

Now to the rest of the findings.  It seems that ESD#4 handles the situation of client code wasting the statement (and procedure) cache memory structures – uselessly turning over the page chains – much better.  I did not post the data from my tests on a large SPARC host as promised earlier, since the results were not really better.  With this new information, I will have to rerun the tests next Sunday with ESD#4 on SPARC.  Hopefully, I will get much improved performance metrics there as well.  I am eager to see the statement cache/procedure cache saga in ASE 15.x laid to rest.  It has really been a pain in the neck dealing with this stuff.  I hope that at last we may breathe a sigh of relief…  and get back to the optimizer issues….

Next update will probably come on Monday, unless something interesting is discovered before that (and I have time to test and testify on it here).

Yours.

ATM.

ASE 15.7 ESD#4 ON SPARC…


I have just finished the first round of tests of ESD#4 on the Solaris SPARC platform.  I have to confirm:  ESD#4 seems to have finally been vaccinated against wasteful turnover of the statement/procedure cache caused by “inappropriate” use in client-side code.  It looks like the painful experience of seeing ASE suffocate unexpectedly under stress generated by code that had been running “more or less smoothly” on an old ASE 12.5.4 is behind us.  I will cross-test these issues again tomorrow (as well as shift my tests a bit to check yet another issue around the misuse of the statement cache).  But so far I must say that ASE handles this situation successfully at last.  Very good news!

Here are the bare facts, again.

The tests I have run on the SPARC host on ASE 12.5.4 version (ESD#10):

S125_6LWP_DYN1_ST0_ST0M_108:50:00
S125_6LWP_DYN0_ST0_ST0M_109:01:00
S125_6LWP_DYN0_ST0_ST20M_109:12:00
S125_6LWP_DYN0_ST1_ST20M_109:22:00
S125_6LWP_DYN1_ST1_ST20M_109:34:00

The tests I have run on the SPARC host on ASE 15.7 – threaded mode kernel (ESD#4):

S157T_6LWP_DYN1_ST0_STR0_PLA0_ST0M_108:50:00
S157T_6LWP_DYN1_ST0_STR1_PLA0_ST0M_109:01:00
S157T_6LWP_DYN1_ST0_STR1_PLA1_ST0M_109:12:00
S157T_6LWP_DYN0_ST0_STR0_PLA0_ST0M_109:22:00
S157T_6LWP_DYN0_ST0_STR1_PLA0_ST0M_109:34:00
S157T_6LWP_DYN0_ST0_STR1_PLA1_ST0M_109:46:00
S157T_6LWP_DYN1_ST0_STR0_PLA0_ST20M_109:56:00
S157T_6LWP_DYN1_ST0_STR1_PLA0_ST20M_110:06:00
S157T_6LWP_DYN1_ST1_STR0_PLA0_ST20M_110:15:00
S157T_6LWP_DYN1_ST1_STR1_PLA0_ST20M_110:24:00
S157T_6LWP_DYN1_ST1_STR1_PLA1_ST20M_110:34:00
S157T_6LWP_DYN0_ST1_STR0_PLA0_ST20M_110:43:00
S157T_6LWP_DYN0_ST1_STR1_PLA0_ST20M_110:52:00
S157T_6LWP_DYN0_ST1_STR1_PLA1_ST20M_111:01:00
S157T_6LWP_DYN0_ST1_STR1_PLA1_ST200M_111:09:00
S157T_6LWP_DYN1_ST1_STR1_PLA1_ST200M_111:17:00

The tests I have run on the SPARC host on ASE 15.7 – process mode kernel (ESD#4):

S157P_6LWP_DYN1_ST0_STR0_PLA0_ST0M_112:24:00
S157P_6LWP_DYN1_ST0_STR1_PLA0_ST0M_112:32:00
S157P_6LWP_DYN1_ST0_STR1_PLA1_ST0M_112:43:00
S157P_6LWP_DYN0_ST0_STR0_PLA0_ST0M_112:52:00
S157P_6LWP_DYN0_ST0_STR1_PLA0_ST0M_113:01:00
S157P_6LWP_DYN0_ST0_STR1_PLA1_ST0M_113:09:00
S157P_6LWP_DYN1_ST0_STR0_PLA0_ST20M_113:18:00
S157P_6LWP_DYN1_ST0_STR1_PLA0_ST20M_113:27:00
S157P_6LWP_DYN1_ST1_STR0_PLA0_ST20M_113:37:00
S157P_6LWP_DYN1_ST1_STR1_PLA0_ST20M_113:45:00
S157P_6LWP_DYN1_ST1_STR1_PLA1_ST20M_113:53:00
S157P_6LWP_DYN0_ST1_STR0_PLA0_ST20M_114:02:00
S157P_6LWP_DYN0_ST1_STR1_PLA0_ST20M_114:11:00
S157P_6LWP_DYN0_ST1_STR1_PLA1_ST20M_114:20:00
S157P_6LWP_DYN0_ST1_STR0_PLA1_ST200M_114:28:00
S157P_6LWP_DYN1_ST1_STR0_PLA1_ST200M_114:37:00
S157P_6LWP_DYN1_ST1_STR1_PLA1_ST200M_114:44:00
S157P_6LWP_DYN0_ST1_STR1_PLA1_ST200M_114:50:00

The performance graphs:

ASE 12.5.4:

CPULOAD

Spinlock Situation (note the way 12.5.4 handles the situation with the statement cache enabled – pure disaster):

SPINS

Procedure Cache:

PCACHE

Dynamic SQL/LWPs creation rate:

LWPS

ASE 15.7 ESD#4 – threaded kernel mode:

Thread Load:

CPULOAD

Spinlock situation:

SPINS

Procedure cache:

PCACHE

Dynamic SQL/LWPs creation rate:

LWPs

Statement Cache (not available on 12.5.4):

STCACHE

ASE 15.7 ESD#4 – process kernel mode:

Engine Load:

CPULOAD

Spinlock situation:

SPINS

Procedure cache:

PCACHE

Dynamic SQL/LWPs creation rate:

LWPs

Statement Cache (not available on 12.5.4):

STCACHE

The threaded kernel mode gives a very steady throughput – steadier than the process kernel mode.  The process mode also exhibits the same “bug” in which sp_monitorconfig at a certain point stops reporting procedure cache utilization (I wonder if the new monMemoryUsage MDA supplements the missing data).

In general, if you do have a client that generates a large number of fully prepared statements, DON’T turn off the statement cache at the session level and DON’T turn off the DYNAMIC_PREPARE JDBC setting.  In both cases, the thread utilization climbs up (and so does the proc cache spinlock contention).  In addition, if the statement cache is ruthlessly turned over due to a very high volume of unique statements generated by the code, keep the cache as small as possible – 20M was fine here, 200M was pretty bad.

The threaded kernel mode gives more satisfactory results – steadier performance, slightly better throughput and fewer bugs.

I will be running more and different tests in the following weeks – as well as comparing performance across a wider spectrum of metrics – but from the point of view of running a high volume of unique prepared statements the problem of ASE 15.x seems to have been solved at last.

ATM.

All previous forums rolled into this one?


Do I understand this correctly?

All previous forums for ASE, RepSvr, Tuning, etc are now in this one forum?

And since I don't see any historic entries, I assume we will lose all the info contained in the old forums?

 

 

Thx,

rick_806


ASE 15.7 ESD#4 Unix Domain Sockets


In the past few days, I have fired up a copy of ASE 15.7 ESD#4 to try the new Unix Domain Sockets feature.  I found only a few notes about this feature; the CR listing is the source of my information.  CR 667751 has all of the public information that I have found to date.

 

Here is the heart of the information.

 

"master afunix unused //<hostname>/<pipe>"

 

After many failed attempts, I realized how simple this could be.

The ASE Server creates a socket file at start-up based on the master entry in the interfaces file.

In order for this to work, the ASE dataserver must have write access to a folder.

Hmmm-- $SYBASE sounds like a good choice.

 

I decided I wanted my socket file to be called "ASE.socket" in /sybase/ folder.

 

Here is an example of the master line that finally worked.


master afunix unused //mytesthost/sybase/ASE.socket

 

I tested the socket with isql & bcp... Both tools worked without a problem.  The point of the new connection type is to reduce the overhead associated with TCP/IP.  The results of my bcp inbound testing showed that the network layer was not my source of contention and slowness.  I expect that bcp outbound will show better results if I can get past the other self-inflicted issues of my machine.  A sketch of the full interfaces entry and the quick test I ran follows below.
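
Here is what the complete setup looked like on my sandbox (the server name, host name, socket path and bcp table are placeholders - adjust them to yours; I added a matching afunix query line, on the assumption that client tools need it to resolve the same socket):

ASE157
    master afunix unused //mytesthost/sybase/ASE.socket
    query afunix unused //mytesthost/sybase/ASE.socket

and then, from the shell on the same host:

isql -Usa -SASE157
bcp mydb..mytable in mytable.bcp -Usa -SASE157 -c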

 

In conclusion, ASE Unix Domain Sockets work without issues when I use the ASE utilities.


 


ASE 15.7: Prepared Statements and ASE Statement/Procedure Cache, Configuration Impact


So we are back with the same issue:   statement cache/procedure cache behavior under the stress of executing a high volume of prepared statements.  

 

I have just finished another round of tests around this issue (which has caused quite a lot of trouble in past releases of ASE) and I want to share and recapitulate (the more you explain, the more you understand yourself).

 

First, let's quote a bit of documentation - to lay out who the players on our field are.  I will restrict the quotes to the few aspects that are relevant.

 

[1] Procedure Cache:  {Performance and Tuning Series: Basics. Chapter 5: Memory Use and Performance}

Adaptive Server maintains an MRU/LRU (most recently used/least recently used) chain of stored procedure query plans.  As users execute stored procedures, Adaptive Server looks in the procedure cache for a query plan to use.  If a query plan is available, it is placed on the MRU end of the chain, and execution begins.

...

The memory allocated for the procedure cache holds the optimized query plans (and occasionally trees) for all batches, including any triggers.

 

[2] Statement Cache:  {Performance and Tuning Series: Basics. Chapter 5: Memory Use and Performance}

The statement cache saves SQL text and plans previously generated for ad hoc SQL statements, enabling Adaptive Server to avoid recompiling incoming SQL that matches a previously cached statement. When enabled, the statement cache reserves a portion of the procedure cache

 

[3] Streamlined Dynamic SQL:  {Performance and Tuning Series: Basics. Chapter 5: Memory Use and Performance}

In versions earlier than 15.7, Adaptive Server stored dynamic SQL statements (prepared statements) and their corresponding LWP in the dynamic SQL cache.  Each LWP for a dynamic SQL statement was identified based on the connection metadata.  Because connections had different LWPs associated with the same SQL statement, they could not reuse or share the same LWP.  In addition, all LWPS and query plans created by the connection were lost when the Dynamic SQL cache was released.

In versions 15.7 and later, Adaptive Server uses the statement cache to also store dynamic SQL statements converted to LWPs.  Because the statement cache is shared among all connections, dynamic SQL statements can be reused across connections.

 

[4] DYNAMIC_PREPARE property:

When a client connection executing a prepared statement sends a request to ASE, it may either send a language command as plain SQL text (if DYNAMIC_PREPARE is set to false), or request ASE to create an LWP for it (if DYNAMIC_PREPARE is set to true - something that may be seen in monSysSQLText as "create proc dynXXX as..." & DYNAMIC_SQL dynXXX...).

To quote from Managing Workloads with ASE, "The statement cache reproduces the same benefits as fully prepared statements as it takes language commands from client applications, replaces literals with parameters, creates a statement hash key, compiles and optimizes the statement, and creates a Light Weight Proc for re‐use."

 

In plain language, we may describe the playground of our tests in the following way:

 

When a client connection sends ASE a request containing a prepared statement, the client will either send it as it is (a language command) or convert it into a create procedure request (DYNP/LWP). If the statement cache is enabled on ASE, ASE will store the SQL text and a pointer to its LWP in the statement cache, install the plan/tree of the corresponding LWP in the procedure cache and ultimately execute it.  (The sketch right below shows one way to see which of the two forms actually arrives at the server.)
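
A hedged sketch of how to peek at what actually arrives:  monSysSQLText is a pipe MDA table, so "sql text pipe active" and a non-zero "sql text pipe max messages" must be enabled first (and rows scroll away quickly).  With DYNAMIC_PREPARE=true you will see the create proc dynXXX... requests; with false, the plain SQL text:

select SPID, BatchID, SequenceInBatch, SQLText
from master..monSysSQLText
order by SPID, BatchID, SequenceInBatch
go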

 

The motivation behind all this is to reuse as many resources within ASE as possible. Without the statement cache, each ad hoc query has to be "converted" [parsed/normalized/compiled] into a query plan individually and dropped after being executed, thus preventing reuse. With the statement cache enabled, the query is looked up in the statement cache instead, and if a match is found its plan is cloned from the procedure cache and executed.  If it is not there, it is "converted" and installed for reuse.

 

The same applies to prepared statements. Without the statement cache [and when the streamlined option is turned off], each prepared statement has to be converted [parsed/normalized/compiled] into a query plan and stored in the procedure cache individually - to be released at client disconnect. With the statement cache and the streamlined option enabled, the statement is looked up, cloned and executed (or stored for later reuse).

 

In fact, this "reuse" methodology is a patented feature [2012]. It involves scanning the statement/procedure cache memory page chains (while holding a spinlock) and either installing a new object or reusing an old one.

 

StCacheSearch

 

In fact, one may collect quite a lot of information on all this over the web.

 

So far the theory.  We know who the players on our testing grounds are.  What we will see below, based on the tests done, is how ASE reacts to changing the various configuration parameters related to this relatively new, aggressive resource-reuse goal.

I have confined myself in these tests to only four settings to play with:  the existence of the statement cache (STxM), the streamlined SQL option (STRx), the DYNAMIC_PREPARE option (DYNPx) and the connection-level statement_cache setting (STx). This results in the following matrix of possible tests:

 

pst_TestTimes

 

I ran 10 Java clients executing unique prepared statements against a 15-thread ASE, 15.7 ESD#4, on SPARC.  The "literal autoparam" setting is turned on, and so is the "plan sharing" option.

 

The first graph represents the thread busy for all our tests:

 

pst_ThreadBusy

 

The first thing to notice is that whenever a prepared statement meets the ASE with the connection property DYNAMIC_PREPARE turned off, ASE responds with a 5 to 10% leap in thread utilization.  On the one hand, this seems pretty obvious:  we "reuse" LWPs rather than demand that ASE generate (parse/normalize/compile) a plan for each statement over and over again.  It is not so obvious, though, since in our case we generate LWPs wrapping 10 completely unique streams of prepared statements.  Reuse here is pretty minimal.  It seems a bit odd that making ASE scan its statement cache, install a new LWP + query plan and execute it has less CPU impact than simply preparing the plan and dropping it.  The numbers, though, are unequivocal.   Even when ASE faces a thick stream of unique prepared statement requests, it handles it much better if the client requests them as procedures (incidentally, this was NOT the case with previous ASE 15 releases).

 

I'd like to share also the following graph:

 

pst_OpenObjects

 

We know from the documentation that each cached statement consumes one object descriptor.  So it makes sense that with the statement cache turned on the number of active object descriptors rises [12:25].  The impact, though, is much greater when the streamlined option is turned on.  Something to be kept in mind (a quick way to check for descriptor reuse is sketched below).
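
A minimal sketch of that check, assuming the usual sp_monitorconfig output - a non-zero reuse count (or an active percentage close to 100) means the descriptor pool is too small and "number of open objects" should be raised:

sp_monitorconfig "open objects"
go
-- raise the pool if descriptors are being recycled; the value here is only an example
sp_configure "number of open objects", 40000
go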

 

This one is also interesting:

 

pst_LockRequests

pst_ShraedRow

 

Each time the DYNP is turned on, the number of lock requests is doubled.

 

This one is also telling:

 

pst_TDB_Util

pst_DDC_Util

 

Since we create our LWPs in tempdb, and since we run exclusively the code that causes ASE to generate LWPs, each time the DYNP is turned on, tempdb utilization surges up.

 

Now, let's see how we handle procedure requests and statement requests in each case:

 

Procedures:

pst_ProcReq

pst_ProcRem

 

Statements:

 

pst_StCached  pst_StDropped  pst_StINCache  pst_StNotCached  pst_StNotINCache

 

We run 12K procedure requests per second.  Demand on the statement cache is highest when we run DYNPs streaming them into the statement cache (streamlined on).

 

Happily enough, this high volume of prepared statements washing over the statement/procedure cache is handled with relatively low spinlock contention (I will test the situation of client connections > threads in the future).

 

pst_SpinPCH  pst_SpinPMGR  pst_SpinQRYP  pst_SpinSSQL

 

It is important to state here that ASE runs with TF758 turned on.  Without it you would have seen the proc_cache spinlock contention getting into the tens of percent.
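
For those who want to watch the same counters themselves, ASE 15.7 exposes per-spinlock statistics in the monSpinlockActivity MDA table - a sketch, assuming "enable spinlock monitoring" is on and that the statement/procedure cache spinlock names on your ESD match these patterns (they may differ slightly between versions):

select SpinlockName, Grabs, Spins, Waits, Contention
from master..monSpinlockActivity
where SpinlockName like "%SSQLCACHE%"
   or SpinlockName like "%proccache%"
   or SpinlockName like "%procmgr%"
order by Contention desc
go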

 

Let's see the statement cache utilization from another angle (monStatementCache):

 

pst_STCACHE

 

Our cache hit ratio is not impressive (little wonder, in our situation of running unique statements from 10 concurrent sessions).   What is interesting to notice, though, is the number of statements the same 20MB cache contains when running with and without the streamlined option.  Turning the streamlined option on caused the number of statements held in the statement cache to double.  (The snapshot queries behind these graphs are sketched below.)
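
They come from the monStatementCache and monCachedStatement MDA tables; a minimal sketch of the snapshots taken between runs (select * keeps the sketch independent of the exact column list, which has shifted a little across 15.x ESDs):

-- cache-wide counters: configured/used size, number of statements, searches and hits
select * from master..monStatementCache
go
-- per-statement detail, including how often each cached statement has been reused
select * from master..monCachedStatement
go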

 

The following displays what the client connection requests from ASE in terms of LWP requests:

 

pst_DYNSQLS

 

When we turn DYNP on, we start explicitly requesting ASE to create procedures.  However, the lifetime of the procedures in the cache is reversed:  when DYNP is off, ASE creates procedures implicitly and stores them in the statement cache + procedure cache.  When DYNP is on, ASE creates the procedure and drops it almost instantly - unless the streamlined option is turned on, causing ASE to store them in the statement cache as well - in far greater numbers (incidentally, the names of the DYNP LWPs and the ADHOC LWPs also differ).

 

 

So, what have we learned from all this, if at all?

 

From the ASE load perspective, we have seen that configuring ASE to reuse (streamline) the dynamic SQLs is beneficial even when concurrent client connections generate a high volume of unique prepared statements.  Turning the DYNAMIC_PREPARE option off only causes an additional 10% surge in CPU utilization (the problem is that we do not really know how the throughput is influenced, but rather infer it from the rate of procedure requests, which is not precisely accurate - I will have to change the way I generate the load in the future in order to produce a more accurate comparison here).

 

We have learned that forcing ASE to reuse (streamline) the dynamic SQLs forces the procedure cache to handle many more plans/trees:  in addition to the ad hoc queries landing in the statement/procedure cache until they age out, LWPs and their plans that would otherwise have died as clients disconnect now continue to occupy both the statement and the procedure cache until they age out.  Although we have seen that turning the streamlined option on is beneficial to ASE in our situation, we must not neglect this fact (after all, we ran each test separately, not mixed - had we mixed them, both the adhocs and the DYNPs would land in the same area).   In addition, since the statement cache is a shared resource, if we know we have a bad (unique & high volume) DYNP workload threatening to wash out our statement cache, perhaps the best decision would be to turn the streamlined option off and keep these DYNPs out of the statement cache altogether.

 

We have learned that open objects usage surges up with the streamlined option turned on, so it must be addressed in configuration in order to avoid descriptor reuse.

 

We have learned that we have to treat tempdb differently if we know that we will be streamlining.  Its utilization goes way up when ASE has to create LWPs at a high rate, so additional attention is needed here.

 

What we have not understood is the origin of the consistent leap in shared row lock requests when we configure DYNAMIC_PREPARE to true.  Is it because we handle explicit create procedure requests?   To be tested further.

 

What we have also learned is how the configuration options affecting the statement cache may influence the procedure cache.  The documentation states that "statement cache memory is taken from the procedure cache memory pool" - a puzzling statement, since the memory consumption of the two is added together in ASE's logical/physical memory (System Administration Guide: Volume 2, Configuring Memory).  What is obvious, though, is that the two have a very high degree of reciprocity:  each statement landing in the statement cache will consume space in the procedure cache for the LWP tree/plan.  Which means that the statement cache is not a portion of the procedure cache.  If configured, it will consume a portion of the procedure cache which is directly influenced by the statement cache size (the documentation says it is reserved ahead).   This is a much more accurate way to explain the relationship between the statement cache and the procedure cache.

 

We still know very little about the procedure cache.  We know that it has its chain (one, or more if that has already been implemented) of procedure query plans/trees.  We do not know, for example, which of the procedure cache modules / allocators each object occupies and whether these have been partitioned.  I did monitor the impact of the tests I ran on both, but the data generated was pretty confusing/uninformative.  Especially so since there is no documentation whatsoever explaining the procedure cache structure in detail.  We also know very little about the ELC and its impact on our tests - although it is pretty certain that it plays a role here (TF758 may testify to this).

 

Which returns us to the famous Socratic saying:   the more you know, the more you know you don't know...

 

Hm,  should we start the testing over?

 

ATM.

ASE 15.7: Threaded vs. Process Kernel Mode - Solaris/RHEL cross-tests


I am back with the comparative tests:  threaded kernel mode versus process kernel mode.  I have seen in my earlier tests that the process kernel mode seems to get slightly better throughput under the same workload.  On the other hand, I have also found that the process kernel mode shows certain types of unwanted behavior absent from the threaded kernel mode.

 

In order to verify these differences I decided to port the tests from the Solaris (SPARC) platform to Redhat Linux (Xeon). This way I will be able to test ASE behavior on different platforms: Solaris running on SPARC and RHEL running on Xeon.  The probability of an error will be minimized.

 

Having performed identical tests, I have indeed been able to verify that the threaded kernel mode displays up to 10% more CPU utilization.  The behavioral lapses in the process mode, too, were persistent across platforms.

 

What I will report below is a series of 4 tests.  Two of them were performed on a SPARC host (32 logical processors) with ASE running in threaded and process mode, and another two on a Xeon host (32 logical processors) with ASE again running in threaded and process mode.  The legend of the tests is identical to my previous posts, so I will not repeat it here.  I am still working around the issue of generating prepared statements at a high rate.

 

SPARC_TESTS

 

 

XEON_TESTS

I will post only a limited number of graphs, those that looked most peculiar to me.  I will not comment on how differences in configurations affected the CPU utilization/throughput either (this too is found in earlier posts).

 

First the TPS/CPU Load:

 

SPARC:

TPS.SOL

XEON:

TPS.LX

 

ProcPS/CPU Load:

 

SPARC:

PROC_REQ.SOL

PC_MON.pr.sol  PC_MON.th.sol

XEON:

PROC_REQ.LX

PC_MON.pr.lx  PC_MON.th.lx

 

Statement Cache:

 

SPARC:

ST_CACHED.SOL  ST_DROPPED.SOL  ST_FOUND_IN_CACHE.SOL

XEON:

ST_CACHED.LX  ST_DROPPED.LX  ST_FOUND_IN_CACHE.LX

 

Spinlock Contention:

 

SPARC:

PCACHE_SPIN.SOL  SSQLCACHE_SPIN.SOL

XEON:

PCACHE_SPIN.LX  SSQLCACHE_SPIN.LX

 

There are quite a few interesting things.  The same things were found to work less well on both platforms (such as sp_monitorconfig, which at a certain point stopped reporting utilization on both RHEL and SPARC, or the procedure/statement cache spinlock contention, which was consistently higher in process mode - even with TF758).  What was pretty surprising is the huge throughput leap in porting the tests from an 8-chip, rather old SPARC VI host (16 cores, 32 threads running at 2.15 GHz) to a 2-chip Xeon 2600 host (16 cores, 32 threads running at 2.7 GHz).  I will check SPARC VII (2.66 GHz, 4-core) and T4 (2.85 GHz, 8-core) chips later on to have more data.  But the throughput difference found here is something worth digging into.

 

Anyway, even with the latest tests there still remained a question:  do we really see greater thread utilization because of the higher throughput ASE achieves running in threaded mode, OR does the higher thread utilization instead bring about a lower throughput?  In order to test this I had to modify my tests slightly so that I get a more precise recording of the volume of work the server does in each test.

 

This has brought me to the following numbers (I have tested only the RHEL host - more tests to come):

 

THROUGHPUT.LX

 

The 100% utilization corresponds to the threaded kernel mode - we get a slightly lower number of code loops (I compare here the average number of loops the code in each client connection performs over the same period of time).  The threaded mode gets an average of 240 loops at 100% thread utilization, while the process mode gets 260 loops at 90% engine utilization.  This is not much, but the difference is there.  What is also interesting is the following:

 

Process Mode:

CODE_LOOPS.pr.lx

Threaded Mode:

CODE_LOOPS.th.lx

 

Each bar in the graph corresponds to a client connection executing its (identical) code.  Whereas in the process kernel mode the throughput of each client may vary widely (200 to 300!), in the threaded kernel mode this deviation practically does not exist (it stays within a range of 10 - which is explained by the fact that the threads are activated serially).

 

So it seems to be true:  the threaded kernel mode does reach a slightly higher thread utilization without an increase in throughput.  The difference is not significant, but it is there.  On the other hand, the threaded mode yields much more stable performance - both in terms of query response time and across many other counters.  If I had to choose, I'd choose the threaded kernel mode over the process mode - based on its performance characteristics (probably hosted on Linux - but I have to check this aspect more thoroughly before I comment on it with any degree of reliability).  If you consider that some process mode clients suffered a ~15% drop in throughput in comparison to the threaded mode clients, the 35% improvement other process mode clients received is somewhat compromised.

 

My last tests before I leave this topic will be focusing on whether the throughput rises when more threads are brought online for the strangled ASE, thus reversing the balance towards the threaded mode.  I will check this across both RHEL and SPARC hosts to have more sanitized data and report on it here.

 

 

ATM.

ASE 15.7: Threaded Kernel vs. Process Kernel: Throughput Variance


I have just finished performing a controlled analysis of general performance differences for ASE 15.7 running process and threaded kernel mode on various platforms and hosts (RHEL running on Dell Xeon chips, Solaris running on SPARC M, SPARC T  and x64).  Although there are quite a few differences, the following image tells the story of the throughput variance one may expect when switching between different kernel modes:

 

PSS_VS_THR_THROUGHPUT

 

Top to bottom:  the top graph shows response time for the same query run by multiple concurrent client connections (each bar representing an absolute number of query executions a client connection has performed in a period of time - 5 minutes).  Bottom left - an average "throughput" for the client connections running vis-a-vis ASE configured in process kernel mode.  Bottom right - an average "throughput" for the client connections running vis-a-vis ASE configured in threaded kernel mode.   The bottom table(s) describes the type of configuration applied to ASE/JDBC.

 

The image is a bit bulky, but it is pretty eloquent and consistent - across all the platforms tested.  Even though sometimes it looks like ASE running the process kernel mode out-performs ASE running the threaded kernel mode, if you compare throughput more accurately it becomes clear that threaded kernel mode yields very stable, predictable response time, while the process kernel mode exhibits a bursty, unpredictable response time, with some client sessions outperforming others by as much as 100%.  Performing less controlled tests may create a false impression that the process kernel mode runs faster, which in fact is not really true.

 

It is nothing new, however.  Predictable response time has been named as one of the benefits the threaded kernel brings to ASE.  In the tests I ran, both the Java client sessions and ASE itself were located on the same host, competing for OS resources.  Although there is very little I/O involved in the tests, there is still a telling difference in response time across the kernel modes.

 

I will not comment on the type of tests run in detail here.  The only thing I will say is that you should definitely test your JDBC/ODBC client application with different JDBC/ODBC + ASE settings.  As may be seen from the throughput variance, configuring ASE/JDBC/ODBC properly does make a difference (although it is not always possible to configure both for the best performance in a real-life situation).

 

Stay curious and check yourself as much as possible...

 

ATM.

Violin Flash Memory Arrays Certified with SAP Sybase ASE


Violin Memory, Inc., provider of memory-based storage systems, and SAP AG today (April 16th)  announced that Violin 6000 Series Flash Memory Arrays are now certified for interoperability with SAP Sybase Adaptive Server Enterprise (SAP Sybase ASE), delivering reliability and accelerated performance across a range of enterprise applications. Used together, these solutions will provide customers with enhanced and simplified solutions that leverage the power of Violin Flash Memory Arrays and business-critical SAP applications to accelerate transaction-driven environments. Additionally, Violin Memory is now an SAP software solution and technology partner in the SAP PartnerEdge program. It will work with SAP to serve new and existing customers, partners and channels to benefit from solutions from Violin Memory and SAP. Violin products are also available in support of SAP ERP applications.

 

“The Violin Flash Memory Arrays integrated with SAP Sybase ASE are a key solid-state storage component for the acceleration of business-critical applications from SAP,” said Kevin Ichhpurani, senior vice president, Business Development and Ecosystem Innovation, SAP. “The certification process showed that the arrays can deliver significant performance improvement over traditional disk-based systems, delivering substantial value to customers. In fact, during preliminary internal testing, we have seen 10x improvement in the performance of SAP Sybase ASE running on Violin Flash Memory.”

 

Read the full story at - http://www.violin-memory.com/news/press-releases/violin-flash-memory-arrays-now-certified-with-sap-sybase-adaptive-server-enterprise-to-accelerate-customer-applications-in-transaction-driven-environments/
