by Craig Stuntz
http://blogs.teamb.com/craigstuntz/articles/IBOptimization1.aspx
This session is part of a three-part series on creating high-performance systems using InterBase as the database back end. Part one explains how to optimize applications, part two (Optimization II: Optimizing InterBase SQL and Metadata) details how to optimize InterBase SQL and metadata. "Optimization III: Selecting and Configuring InterBase Server Hardware" completes the series.
This is a long paper, so here's an overview.
The Golden Rule of Optimization, and its corollary for database applications, are the indispensable rules which every developer must follow.
The remainder of the paper is divided into two parts. The first is a discussion of application design issues; how to create a robust architecture for a new application. The second part examines problems you may find in existing applications, how to fix them, plus some tools which will make your job easier. The second part concludes with a workflow for using the tools and techniques discussed to speed up a poorly performing, existing application.
"Optimize" only the slow stuff; leave the rest alone.
It is important to recognize that in a typical application, a relatively small fraction of the code and metadata design will be responsible for most observed performance problems. Before optimizing anything, ensure that you are working on the slowest part of the system, as this will give you the greatest improvement for your optimization effort. Furthermore, it's senseless to obfuscate code or metadata design which is already fast enough for the job at hand by "optimizing" it in a way that renders the result unreadable; this makes maintenance a nightmare.
In short, you want to optimize use of your own time, as well as your application!
Always develop database applications using a database populated with real data.
Many performance problems will not stand out until an application is used with a production database filled, perhaps even over-filled, with real data. Nearly any application can be fast with an empty database! Most developers spend at least as much time in active development of an application as they do in formal testing, if not more. By using a real, production database during development, you will discover serious performance problems early in the development process. If radical changes are required to fix the problems, such as a dramatic change in database metadata, it is much easier to do this early in the development process.
Furthermore, make sure you're using the application in the same way the customer will. On one project I worked on a while back, I was in the habit of logging in as an administrator because the security in the application frequently interfered with the testing I needed to do -- some steps of the process I was testing required administrator privileges. Certain performance problems, however, only surfaced when logged in as a non-administrative user, because they were caused by confidentiality rules which didn't apply to administrators. Only by using the process as the customer will (in this case doing some parts of the test as an administrator and other parts as a non-administrator) can you be sure that you will find performance problems before your customers do.
Using a populated database forces you to "eat your own dog food" -- if a portion of the system you are developing is too slow to make development productive when used against a large database, your customers are not likely to be productive with it, either. Use of a populated database helps you obey the Golden Rule. Since some tables (imagine a table of U.S. states, for example) are unlikely to ever be large, you don't need to spend time optimizing their use. It's harder to determine which tables are likely to be large when looking at an empty database.
Also, optimizing against a local DB server will disguise lag imposed by network traffic, so I strongly advise optimizing against a DB server running on another machine on your network.
An InterBase application depends upon several pieces of hardware and software to deliver results. Hardware configuration issues are discussed in detail in my separate paper, "InterBase Server Performance Guide: How to Select and Configure Hardware," which can be found on the BorCon 2003 CD. From a software point of view, all InterBase applications use both the InterBase server plus a client application, and some use middle-tier application servers and third-party programs such as Crystal Reports.
Bottlenecks are areas of a system with insufficient bandwidth for their intended application. These can be literal bottlenecks -- there's only so much data you can cram through a dial-up connection at once -- or figurative bottlenecks, such as a query which simply takes too long to run. In accordance with the Golden Rule, the first step in optimizing an existing system is to determine where the bottlenecks are, and which cause the biggest problems. However, when designing a new system we would ideally like to avoid the most common bottlenecks from the outset. So before discussing how to identify trouble spots in existing systems, I will discuss areas of application design which have a critical impact on performance.
One other important issue that I will not discuss in this paper is how to design efficient SQL and database metadata, because this topic is by itself so large that it deserves, and has, its own session. For detailed information on this topic, please attend "Optimization II: Optimizing InterBase SQL and Metadata".
This section will mostly be useful for developers starting a new project. However, it may also be helpful for those looking to migrate legacy applications away from the (now deprecated) BDE.
Traditionally, the term "Client-Server" means an application which talks directly to a database server, and often contains business rules. "n-Tier," also in traditional usage, means a thin client which talks to an application server, which in turn talks to a database server. Business rules are normally enforced on the application server tier.
However, "psuedo-n-tier" programming, where a single executable contains both the application server and thin client tiers, has recently become popular. In this model, there is a clear separation in code between application server (business rule) and client (presentation) functionality. It's outside the scope of this paper to examine the advantages of this model in detail. When I discuss "n-Tier" applications, I'm referring both to "real" n-Tier systems with a separate application server, and "psuedo-n-tier" applications contained in a single EXE. From an optimization point of view, they are very similar.
There are some important differences between client-server and n-tier applications from an optimization point of view. I'm not going to cover how to optimize DCOM, but one of the most important differences originates from the principle that n-tier application servers should be stateless. Stateful n-tier systems do not scale well, which defeats one of the most important reasons for using the n-tier design.
In a client-server application, it's OK to open a query and fetch records a few rows at a time. InterBase has some specific performance issues when a transaction is held open for a long time, which will be discussed in more detail below, but having a query open for a couple of minutes isn't going to cause any problems in a client-server application. Stateless n-tier application servers, on the other hand, require that results of a query sent by the client tier to the application server tier be returned in a single method call.
Delphi's TClientDataset defaults to this behavior by making -1 the default value of PacketRecords. It is possible to page through a large result set a few records at a time in an n-tier system, but each "page" of records must be returned from a separate call to the application server, in much the same way that each page of search results from a Google query is a separate request to the web server. Since the application server must be stateless it is not possible to presume that the same data module instance will supply subsequent pages of data, so you can't simply keep a query open and fetch more records when requests come in for the next page.
In practical terms, what this means is that you can get away with things in a client-server application which you simply cannot do in an n-Tier system. In a client-server application, for example, it's fine to open a query which would return 1000 records, if a FetchAll were performed, and fetch additional records as the user scrolls. In some cases there is no performance penalty for a query which could return millions of records, provided that you don't actually attempt to fetch them from the server at once. (There may be a server-side penalty if the records must be sorted before it can return the first record, but there will be no network overhead). In an n-tier system, on the other hand, the application designer must be careful to ensure that the client cannot make requests to the server which would swamp the system, as fetching even a few thousand records at a time can introduce unacceptable delays in a user interface.
This means that stateless n-tier systems should be designed from the ground up to never require or request large result sets. There's no penalty for doing this in a client-server application, of course, and in my opinion doing so tends to improve the user experience. But in an n-tier system this is a strict rule which must be followed from the outset.
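To make this concrete, here is a minimal sketch of a stateless page-fetch method, assuming IBX components and a version of InterBase recent enough to support the ROWS clause (added around InterBase 6.5; check the exact syntax for your version). GetCustomerPage, the CUSTOMER table, and its columns are hypothetical illustrations, not part of any real API:

// Sketch only; assumes the IBX units IBDatabase and IBQuery, plus Provider and
// SysUtils. Each call is a complete, stateless request: the query is opened,
// one page is packaged, and the transaction is hard-committed before returning.
function GetCustomerPage(ADatabase: TIBDatabase;
  APageNo, APageSize: Integer): OleVariant;
var
  Qry: TIBQuery;
  Trans: TIBTransaction;
  Provider: TDataSetProvider;
begin
  Trans := TIBTransaction.Create(nil);
  Qry := TIBQuery.Create(nil);
  Provider := TDataSetProvider.Create(nil);
  try
    Trans.DefaultDatabase := ADatabase;
    Qry.Database := ADatabase;
    Qry.Transaction := Trans;
    // ROWS limits the result set on the server, so only one page crosses the wire.
    Qry.SQL.Text := Format(
      'SELECT CUST_ID, CUST_NAME FROM CUSTOMER ORDER BY CUST_NAME ROWS %d TO %d',
      [(APageNo - 1) * APageSize + 1, APageNo * APageSize]);
    Provider.DataSet := Qry;
    Trans.StartTransaction;
    try
      Result := Provider.Data;  // one data packet, suitable for a TClientDataSet
    finally
      Trans.Commit;             // nothing is left open between calls
    end;
  finally
    Provider.Free;
    Qry.Free;
    Trans.Free;
  end;
end;

On the client tier, each page request becomes a separate call to a method like this one, mirroring the search-results analogy above.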
Borland's dbExpress components and the ADO.NET architecture practically require an n-tier design for interactive applications, and in my opinion the n-tier design should be used in all but the simplest applications. While the n-tier design does impose design restrictions which can seem burdensome at first, it pays back by allowing your system to scale much larger than a simple client-server design can.
The first decision you must make is whether to use InterBase-specific data access components or database-independent components.
Components such as InterBase Express (IBX), InterBase Objects (IBO), and FIB+ will often deliver better performance, sometimes significantly better, than database-independent components. They also expose InterBase features which are not supported in database-independent components. Some of these features are important -- for example, dbExpress does not, at the time of this writing, provide a way to create a new InterBase database. When used in an optimal manner, InterBase-specific components can deliver performance comparable to direct use of the InterBase API.
The tradeoff, of course, is that if you are asked to support other database servers in addition to InterBase, it's quite a bit more work if you have selected an InterBase-specific component set. This limitation can be significantly eased through the use of an n-tier or pseudo-n-tier architecture, since the user interface is not directly tied to the InterBase-specific components, but it's still harder than if database-independent components are used.
In practice, database-independent components have tended to carry a performance penalty. While dbExpress performs significantly better than the BDE, initial releases of dbExpress had performance problems related to fetching metadata, which have been only partially addressed in subsequent releases.
You can work around some of these problems using the NoMetadata option, but this option is not compatible with some dataset features. The performance of dbExpress is expected to improve in future releases, but for the time being we can say that it's significantly faster than the BDE, though not as fast as the InterBase-specific components.
As mentioned above, InterBase-specific components, used properly, can deliver performance on par with calling the InterBase API functions directly. So in my opinion there is little reason to ever call the API directly unless you are using a feature which is not supported by your component set, or are developing for a platform where the component sets mentioned are not supported (such as Solaris). Use of Embedded SQL falls into the same category as direct use of the InterBase API.
Having settled upon a data access strategy and an application architecture, we must now turn our attention to how we use them. As a general rule, the less data you ask InterBase to send to your client application, the faster your system will be. Of course, you don't want to ask for less data than your users need to see, either, so the first question to ask when designing a client application or middle tier is, "What data do I really need in each process?"
For developers coming from the desktop database world, it is important to break the table-centric mentality. InterBase has tables, yes, but when designing an application you don't see them. What you see are result sets of SQL statements. Of course, it's quite easy to write an SQL statement that returns an entire table (or, at least, all records visible to your transaction) as its result set:
SELECT * FROM MY_TABLE
...but much of the time this is more data than is really needed for an optimal user interface or business process. Such a query also does not exploit some of the useful features of the InterBase server, for example, the ability to JOIN in data from other tables and ORDER the result set.
You may not need all columns in a table for a particular feature. In fact, I rarely use "*" in my SQL statements at all, preferring to list fields explicitly. The reason is that even if I need all columns in a table for a feature, I don't need any columns which will be added in the future, after my feature is completed. Specifying the columns explicitly in the SQL statement guarantees that I will get only the columns I need in the result set, even if the metadata of the table is expanded down the road.
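For illustration, here is a minimal sketch of the idea, assuming an IBX TIBQuery; the EMPLOYEE table and its columns are hypothetical:

// Sketch only; assumes the IBX IBQuery unit. Listing columns explicitly means
// the result set will not grow when columns are added to the table later.
procedure OpenDepartmentList(Query: TIBQuery; DeptId: Integer);
begin
  Query.SQL.Text :=
    'SELECT EMP_ID, LAST_NAME, FIRST_NAME, HIRE_DATE ' +  // only what this feature needs
    'FROM EMPLOYEE ' +
    'WHERE DEPT_ID = :DeptId';
  Query.Params.ParamByName('DeptId').AsInteger := DeptId;
  Query.Open;
end;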
Likewise, even if the user needs access to all rows in a table, they may not need to see them all concurrently. In fact, for tables with more than a few dozen rows, presenting all rows in the same grid may inconvenience the user, as it's difficult to find a particular record in the middle of a long list. Bill Todd's rule of thumb that if you have more than 100 rows you should question your application design seems good to me.
What this all boils down to is that the less data you send over the wire, the faster your application will seem to be, even on very fast networks.
Here are some application design techniques you can use to reduce the amount of data you SELECT. One is to prefer JOINs over Delphi lookup fields whenever possible. Although it's possible to optimize the behavior of the TDataset.Lookup method, even an optimized implementation isn't going to beat an SQL JOIN, and the default implementations are in some cases very slow.
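As a minimal sketch of what this looks like in practice (the ORDERS and CUSTOMER tables and their columns are hypothetical), let the server join in the display value instead of defining a client-side lookup field:

// Sketch only; assumes an IBX TIBQuery. The server resolves the customer name
// with its own indices, instead of the client looking it up row by row.
procedure OpenOrderList(Query: TIBQuery);
begin
  Query.SQL.Text :=
    'SELECT O.ORDER_ID, O.ORDER_DATE, O.TOTAL, C.CUST_NAME ' +
    'FROM ORDERS O ' +
    'JOIN CUSTOMER C ON C.CUST_ID = O.CUST_ID ' +
    'ORDER BY O.ORDER_DATE';
  Query.Open;
end;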
Reducing the amount of traffic between database client and server is just as important for non-interactive processes and applications. With non-interactive processes in particular, warning flags should go up whenever you see code which iterates over a result set:
// begin suspect code
while not MyQuery.Eof do
begin
  DoSomething;
  MyQuery.Next;
end;
// end suspect code
While it is occasionally necessary to iterate over a result set, in most cases there are more efficient ways to do the same task. Alternatives will be discussed later on.
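As a preview of the kind of alternative discussed later, here is a minimal sketch in which a single statement does the work on the server; MY_TABLE, STATUS, and SOME_DATE are hypothetical:

// Sketch only; assumes an IBX TIBQuery. One ExecSQL call replaces the
// client-side loop, and no rows cross the wire.
procedure CloseOldRecords(Query: TIBQuery; Cutoff: TDateTime);
begin
  Query.SQL.Text :=
    'UPDATE MY_TABLE SET STATUS = ''CLOSED'' WHERE SOME_DATE < :Cutoff';
  Query.Params.ParamByName('Cutoff').AsDateTime := Cutoff;
  Query.ExecSQL;
end;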
It seems that many database developers would like to ignore the issue of managing transactions altogether. "I'm only doing a SELECT, why do I need a transaction?" they ask. In fact, transactions are as important for reading as they are for writing, and proper use of transactions can reduce the amount of work that both server and client need to do. Proper use of transactions is a crucial aspect of a well-designed database application. If you don't know how transactions are being used, then your design is incomplete. InterBase gives developers a very large number of options for configuring how transactions behave -- more than any one developer is ever likely to need. I will discuss only the more commonly used options.
While many people think of transactions as simply a unit of work, that is only one piece of the picture. The classical model of transactions in a relational database is summed up by the acronym ACID: Atomicity, Consistency, Isolation, and Durability. The "unit of work" aspect is atomicity. But we should also consider the Isolation aspect, which governs how transactions interact.
Isolation means that concurrent transactions may not interfere with each other. Among other things, this means that no transaction may see uncommitted data from another transaction (sometimes called a "dirty read"). Not all databases successfully enforce this, but InterBase does, always.
One important consequence of the principle of isolation is that every client running a SELECT statement must start it, fetch rows, and terminate the statement within the context of a single transaction. It is impossible to run a SELECT without a transaction context because without it, InterBase wouldn't know which version of the record your transaction should see.
There are several transaction isolation modes, but the most commonly used are read committed and snapshot. In a read committed transaction, a statement should return records as they existed at the instant when the statement was executed, even if they are not fetched until much later on. In a snapshot transaction, a statement should return records as they existed at the instant when the transaction started, even if the statement was executed much later on, and the records fetched later still.
So if Alice starts a read committed transaction, and issues a SELECT * FROM MY_TABLE statement, but does not fetch all of the records, and then Bob deletes a record from MY_TABLE and commits his transaction, Alice's query must return the record that Bob deleted, once she has fetched all records in her result set, even if she had not fetched that record at the time that Bob deleted it. If Alice's result set did not include the record that Bob deleted, then Bob's transaction would have interfered with Alice's, which means that isolation was violated and the "transaction" wasn't really a transaction at all!
This should make it clear why SELECT statements, like all other SQL statements, require a transaction context, and why you cannot start a statement in one transaction context, commit the transaction, and continue fetching records in another transaction context. There is no way to enforce isolation without it.
Now, why bother with isolation at all? InterBase is designed to work elegantly in a multi-user environment. When dozens or hundreds of users are working with the same tables concurrently, there needs to be a predictable and deterministic way to determine what each transaction sees. You need the ability to relate one result set to another, and that can't be done if the result sets are changing in mid-fetch.
Setting a transaction read-only is a way of declaring your intention for what you will do with the transaction. In some cases, this allows the server to make internal optimizations which reduce the amount of work it must do to maintain the transaction's context. So the only thing to say on this subject is to use the read-only option when you don't need to INSERT, UPDATE, or DELETE in a transaction context, and read-write when you do.
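For example, with IBX a read-only, read committed transaction can be configured through TIBTransaction.Params. This is only a sketch; the parameter strings shown are the usual IBX names for the corresponding transaction parameter block (TPB) options, so verify them against the documentation for your component set:

// Sketch only; assumes the IBX IBDatabase unit. The Params strings map to
// InterBase TPB constants (isc_tpb_read, isc_tpb_read_committed, etc.).
function CreateReadOnlyTransaction(ADatabase: TIBDatabase): TIBTransaction;
begin
  Result := TIBTransaction.Create(nil);
  Result.DefaultDatabase := ADatabase;
  Result.Params.Clear;
  Result.Params.Add('read');            // declare intent: no INSERT/UPDATE/DELETE
  Result.Params.Add('read_committed');
  Result.Params.Add('rec_version');
  Result.Params.Add('nowait');
end;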
Table stability is a way of enforcing single-transaction access to a table. Legitimate uses for table stability are very rare. Aside from the fact that it's a bottleneck in a multi-user system (it places very little load on the server itself, but makes it easy for one transaction to block many others), remember that it's important to think in the context of result sets, not tables. My general advice with regards to table stability is to not use it.
There are two normal ways to end a transaction: COMMIT or ROLLBACK. For reasons discussed below, it is important to end transactions when they are no longer needed, even if you'll need to immediately restart them. Note that COMMIT RETAINING and ROLLBACK RETAINING do not end the transaction; they simply make changes made by the transaction visible to other transactions, and allow other transactions to affect records modified by the first transaction prior to the COMMIT/ROLLBACK RETAINING. In general, prefer a "hard" COMMIT or ROLLBACK over a "soft" COMMIT RETAINING or ROLLBACK RETAINING whenever possible. Prior to InterBase 7, the sole advantage of a COMMIT RETAINING is that you can make changes made by your transaction visible while still having a transaction context from which to continue fetching records for active statements, and, in most cases, you can simply use two different transactions if you need this. InterBase 7 adds additional features to COMMIT RETAINING, however.
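In IBX terms, the habit this suggests looks like the following sketch (MyTransaction is a hypothetical TIBTransaction): end the transaction with a hard commit as soon as the work is done, and start a fresh one when the next unit of work begins.

// Sketch only; assumes an IBX TIBTransaction.
if MyTransaction.InTransaction then
  MyTransaction.Commit;          // hard commit, not CommitRetaining
// ... later, when the next unit of work begins:
MyTransaction.StartTransaction;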
Unlike many other database servers, InterBase can execute transaction rollbacks almost instantly, due to its multi-generational architecture, described below.

Using Multiple Transactions in a Single Connection

Unlike some other database servers, InterBase allows a virtually unlimited number of transactions to be active concurrently within the context of a single database attachment. This gives you, the programmer, a lot of freedom to design your use of transactions. You can use one transaction per statement, or many statements can share a single transaction. The table below may help in planning your use of transactions:

Factor: DataSnap
Considerations: Doesn't matter, except for master/detail datasets linked by a TDataSource. Master/detail datasets are required to share a single transaction. DataSnap will control the transaction lifetime and keep it as short as possible. If using IBX, do not try to use the same transaction for datasets hooked up to a TDatasetProvider (either directly or in a master-detail relationship) and datasets not connected to a TDatasetProvider (as you might use for a batch process or anything which doesn't end up in a TClientDataset). IBX requires transactions in these two roles to be configured differently.

Factor: Atomicity/Isolation
Considerations: If a group of statements must succeed or fail as a group, or if one statement must see uncommitted changes made by another statement, they simply must share the same transaction. There is no other way for this to happen.

Factor: Performance
Considerations: One of the key factors to improving performance is to keep transaction lifetimes short. See "Transaction lifetime," below. Beyond that, having an obscene number of transactions active concurrently is probably not a good idea, but if there's a practical limit I've never managed to reach it. If you only allow transactions you're actually using to be active, you're unlikely to run into a problem with "too many."

Factor: Simplicity
Considerations: Choose a scheme which makes sense to you. Some people find a single transaction easier to deal with; others prefer one per statement. Still other people like to group statements together based on their isolation, e.g., "read-only" statements can share one transaction and "read-write" statements another.

Factor: Transaction lifetime
Considerations: The transaction doesn't need to be active any longer than the statement. When using multiple statements per transaction, it can be difficult to find a good time to end the transaction, since statements can start and stop at any time. Doing this requires careful planning or a clearly-defined business process. Long-running transactions can significantly increase the amount of work the server needs to do, as explained below.

Understanding InterBase's Multi-Generational Architecture
It is important to understand that InterBase often makes multiple
copies of a record as it changes over time. InterBase works like a version
control system, making a new copy of a record each time a different
transaction changes it. This is necessary so that InterBase can allow each
transaction to decide what view of the data they'd like to see.

It works like this: When each transaction begins, it's assigned a number, in the order in which it began. The transaction also gets a copy of an internal structure called the Transaction Inventory Page (TIP), which tells the transaction in what state (active, committed, rolled back, etc.) every other transaction was at the time it began. When a transaction affects a row (by inserting, deleting, or updating it), a new version of the row is created, and stamped with the transaction number of the transaction making the change. When another transaction executes a SELECT which returns that row, the server will compare the isolation mode of the transaction executing the SELECT with the state of the transactions which affected each version of the row, and grab the most recent version which should be visible to the transaction executing the SELECT (if any).

Sweeping

This means that the server will accumulate older versions of rows as
you work with a database. Server functions called garbage collection and
sweeping throw away these older versions when they are no longer needed. A
record's old versions are garbage collected when the record is read, and the sweep essentially
garbage collects the entire database. A version is no longer needed when
there is no possibility that a transaction might ever need to see it. So
if you start a snapshot transaction, for example, the server cannot clean
up any old versions of any record from that point onward until the
transaction ends, because the server does not know which records and
tables the transaction will try to read. This consumes some disk space,
but it doesn't create a significant performance problem in most cases.
One thing which does cause a performance problem in older versions of
InterBase is the sweep itself. In InterBase 7, the multithreading in the
server has been significantly enhanced and the sweep will not block user
queries. In older versions of InterBase, however, user queries can be
noticeably slowed when the sweep thread is active. There are several possible workarounds for this, such as upgrading to InterBase 7 or scheduling sweeps to run at off-peak times.

Transaction Overhead

As mentioned above, each transaction gets a copy of the TIP when it starts. The TIP lists the states of all transactions, active or otherwise, back to the earliest-started transaction which was still active when the transaction began (the Oldest Active Transaction, or OAT). This means that if any one transaction is active for a long period of time, this list can get very long. Since each new transaction gets a copy of it, and must consult it when each record is read, the InterBase server starts having to work very hard. You can use performance monitoring to diagnose when this is happening.

However, in some circumstances long-running, read committed transactions will not create this performance problem. Changes were made to InterBase in version 7.0 to reduce the impact of doing this. Here are the results of some tests I have performed against InterBase 7.1 SP 1:

Read committed, read-only: A long-running read-committed, read-only transaction does not stick the Oldest Active or Oldest Snapshot transaction.

Read committed, read-write: A long-running read-committed, read-write transaction will stick the Oldest Active and Oldest Snapshot. However, by doing a COMMIT RETAINING from time to time you can keep both of them moving along without actually ending the transaction.

Snapshot, read-only: A long-running snapshot, read-only transaction will stick both the OAT and Oldest Snapshot for as long as it is open. The only way to get them moving along again is to do a hard COMMIT.

Snapshot, read-write: Same as snapshot, read-only.

Diagnosing Performance Problems
Even a well-designed application may have certain areas with
performance problems. Presuming that the overall design is robust, the
first step in improving performance is to figure out which parts of the
application are causing the problem. In this section of the paper I'll
discuss tools and techniques for diagnosing performance problems in an
application. None of these tools will fix a broken design, however.
They're designed to pinpoint and repair trouble spots, not re-invent the
wheel.

Common Bottlenecks and How To Fix Them

Here are some common errors found in InterBase client applications, and workarounds you can use to fix them.

Excessive Fetching
The easiest way to diagnose this problem is with SQL
monitoring. Note that many SQL monitoring tools do not show fetches by
default, so you may need to turn on this option. If excessive fetching is
slowing down your application you'll spot the problem quickly in the SQL
monitor.
You must keep the size of your result sets small. Paging through result
sets instead of fetching an entire result set at once can help somewhat. Give
your end user tools to help them quickly find the information they're
looking for instead of confronting them with seemingly endless
lists. As an example, instead of displaying thousands of names in
one list, add "index" tabs for the first letter of a last name -- the
result set is now 1/26th the size it was before. If that's not small
enough, have the user access person records via a search feature instead
of picking directly from a list.
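Here is a minimal sketch of the "index tab" idea (the PERSON table and its columns are hypothetical): each tab re-opens the query with a different prefix, so only a fraction of the table crosses the wire.

// Sketch only; assumes an IBX TIBQuery. STARTING WITH can use an index on
// LAST_NAME if one exists.
procedure ShowNamesForTab(Query: TIBQuery; const Prefix: string);
begin
  Query.Close;
  Query.SQL.Text :=
    'SELECT PERSON_ID, LAST_NAME, FIRST_NAME FROM PERSON ' +
    'WHERE LAST_NAME STARTING WITH :Prefix ' +
    'ORDER BY LAST_NAME, FIRST_NAME';
  Query.Params.ParamByName('Prefix').AsString := Prefix;  // e.g. 'A'
  Query.Open;
end;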
If you have a large number of small result sets and you attempt to
fetch them all at once, this can also cause performance issues even if you
are careful to keep the size of your result sets small. Here are some
things which can cause this problem, and how to fix them.

Cause: Use of the BDE.
Solution: Since the BDE can only use a single transaction per connection (even though InterBase itself has no such limitation), and since committing a transaction closes any open cursors, the BDE has to issue a FetchAll on all open datasets when you commit a transaction. In practical terms, this means that calling TDataset.Post in a BDE application causes a FetchAll on every open dataset in the entire application. The best solution is to replace use of the BDE with a set of components which supports InterBase's ability to have multiple concurrent transactions within a single DB connection. Workarounds for the problem if you must use the BDE include minimizing the number of datasets open at any one time and calling FetchAll on datasets as you open them (so that a FetchAll for every dataset doesn't happen all at once when your user edits a record).

Cause: Use of lookup fields.
Solution: When you use a Delphi lookup field instead of a JOIN, the lookup tables (or significant portions of them) must be fetched when the "main" dataset is opened. The lookup itself is performed without the benefit of server-side indices. Always use SQL JOINs when possible. If you must use lookup fields and the lookup tables are mostly static, you can cache them on the client using a TClientDataset.

Long-running Transactions

On the surface, the solution to this problem is quite simple: Just don't do it.
There are a few times when you need a long-running transaction. When
this is the case consider using IB
7 and higher features that remove the performance penalty for doing
this. But sometimes even this workaround won't do. For example, you
might need to flatten records out into a denormalized structure for data
warehousing purposes, and you might need a single snapshot transaction
which lasts the duration of this process. But these cases are rare
exceptions; for the most part there is no need for a long-running
transaction. Very few people have problems with long-running transactions in batch
processes (i.e., transactions used "in code," as opposed to those used to
display data in a UI). The workflow here is obvious, and resource
protection blocks like Delphi's try/finally make it easy to avoid mistakes. When long-running transactions become a problem, it's generally because of data displayed in a UI.

Unfortunately, transaction management is handled in radically different ways by each set of data-access components which support InterBase.
IBX, dbExpress, IBO, and the BDE, for example, all work completely
differently in this regard, and it is simply outside of the scope of this
paper to attempt to cover them all. While each component set does provide
ways to manage transactions, no two are the same, and the best advice I
can give you is to learn the preferred method of work for the components
you use. However, it's worth noting that all of the components above work with
DataSnap, and DataSnap does provide a data-access-component-neutral method of
transaction management which works well, and uses very short transactions.
Using DataSnap, it's even possible to switch data access components
without re-architecting the entire application.
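For illustration, here is a minimal sketch of that DataSnap pattern; ClientData, its STATUS field, and the ShipOrder routine are hypothetical. The client caches edits locally, and the provider applies them inside one short transaction that it opens and commits itself:

// Sketch only; assumes the DBClient and SysUtils units, with ClientData's
// ProviderName pointing at a TDataSetProvider on the application server.
procedure ShipOrder(ClientData: TClientDataSet);
var
  ErrorCount: Integer;
begin
  ClientData.Edit;
  ClientData.FieldByName('STATUS').AsString := 'SHIPPED';
  ClientData.Post;                           // cached locally; no DB transaction yet
  ErrorCount := ClientData.ApplyUpdates(0);  // provider applies and commits
  if ErrorCount > 0 then
    raise Exception.Create('Some updates could not be applied');
end;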
Record Counting

It is important to understand that the very idea of record counting is a bit out of place in a multi-user environment, unless you are using a SNAPSHOT transaction. Without SNAPSHOT isolation, there is no way to count records and SELECT them in the same operation. So if other users are active in the table(s) in question, the "real" count may well change between the time you count and the time you SELECT.
In the Delphi/BCB world, there's a further complication: The contract
for TDataset.RecordCount is weak (it does not completely specify what the
property should do), and so there's no way to deal with the property in a
generic way. In fact, IBX, dbExpress, IBO, and the BDE, all implement it
in different ways; no two are alike!
Finally, it's important to understand that InterBase's multi-generational
architecture, while it has many benefits, makes record counting an
expensive operation on large tables, especially if the sweep has not run
for a while. InterBase must examine each record to determine if it should
be visible to your transaction.
In my opinion there are few if any legitimate uses for TDataset.RecordCount, and only slightly more for the SQL aggregate function COUNT.
It's actually worthwhile to do a search for RecordCount in the source
code of a poorly performing program if you've been asked to improve it and
you didn't write it yourself, as it takes only a minute and may turn up
"interesting" results.
Let's examine some of the alternatives:

Task: Determine if a table is empty in code.
Efficient solution: Use the TDataset.IsEmpty property.
Example:

if not MyTable.IsEmpty then begin
  DoSomething;
end else begin
  raise Exception.Create(NORECORDS);
end;

Task: Determine if a table is empty in SQL or a stored procedure or trigger.
Efficient solution: Use the EXISTS function. This query will return 'Table is not empty' if there is one or more records in SOME_TABLE, and no rows otherwise:

SELECT
  'Table is not empty'
FROM
  RDB$DATABASE
WHERE
  EXISTS (SELECT 1 FROM SOME_TABLE);

There are a couple of useful tricks in this statement. RDB$DATABASE is an InterBase system table which always has one record. In the statement above, this ensures that we will only get one row, containing 'Table is not empty', if there is one or more rows in SOME_TABLE. Selecting a constant, like '1', is a way of getting back a row without reading or caring about the data it contains; we only want to know if the row exists. EXISTS is also useful in a stored proc or trigger:

IF (EXISTS(SELECT 1 FROM SOME_TABLE WHERE FOO = :FOO)) THEN BEGIN...

Task: Iterate to the end of a table in code.
Efficient solution: Iteration in the client should be avoided when possible, but if you need to do this, use the Eof property.
Example:

while not MyTable.Eof do begin
  DoSomething;
  MyTable.Next;
end;

Task: Configure a progress bar for a UI display.
Efficient solution: There's really no alternative to counting records for this, but you can probably make the process much more efficient by breaking it up into a smaller number of steps if the number of records will be large. This is explained in more detail below.

Unnecessary Iteration

Avoid iterating through the result set of a SELECT statement in code whenever possible, because it means that a lot of data will be sent between the client and server. If it's possible at all to perform the same operations on the server, the process will be much faster. The better solution depends upon what you are doing:

Task: Copy several rows from one table to another.
Efficient solution: Use INSERT...SELECT syntax.
Example:

INSERT INTO DESTINATION_TABLE (
  DESTINATION_ID,
  SOURCE_ID,
  DESCRIPTION )
SELECT
  GEN_ID(DESTINATION_G, 1),
  SOURCE_ID,
  DESCRIPTION
FROM
  SOURCE
WHERE
  <WHERE conditions>

Task: Delete several rows at once.
Efficient solution: Use a DELETE statement.
Example:

DELETE FROM MY_TABLE
WHERE
  SOME_DATE BETWEEN :StartDate AND :StopDate

Task: Find aggregate values (averages, sums, minimum or maximum value).
Efficient solution: Use SQL aggregate functions. Note that when using MIN on a large table you should create an ASCENDING INDEX on that column, and when using MAX on a large table you should create a DESCENDING INDEX on that column. SUM and AVG will always require a full table scan and will not be helped by an index.
Example:

SELECT
  SUM(COLUMN1), AVG(COLUMN1), MIN(SOME_DATE), MAX(SOME_DATE)
FROM
  MY_TABLE
WHERE
  <WHERE condition>

Task: Perform an operation not supported in simple SQL on several rows.
Efficient solution: Use a stored procedure. If you need to get more than one value back (for example, an error code and ID for each row on which the operation failed), use a selectable stored procedure.
Example:

CREATE PROCEDURE DO_SOMETHING_TO_FOO (
  STARTDATE DATE,
  STOPDATE DATE
) RETURNS (
  ID INTEGER,
  ERROR_CODE INTEGER
) AS
BEGIN
  FOR
    SELECT
      ID
    FROM
      FOO
    WHERE
      SOME_DATE BETWEEN :StartDate AND :StopDate
    INTO
      :ID
  DO BEGIN
    EXECUTE PROCEDURE DO_SOMETHING(:ID) RETURNING_VALUES (:ERROR_CODE);
    IF (ERROR_CODE < 0) THEN BEGIN
      /* Return a row only if there's an error */
      SUSPEND;
    END
  END
END;

Task: Perform an operation not supported in simple SQL or the stored procedure and trigger language on several rows.
Efficient solution: Use (or write) a user-defined function.
Example:

DECLARE EXTERNAL FUNCTION lower
  CSTRING(80)
  RETURNS CSTRING(80) FREE_IT
  ENTRY_POINT 'IB_UDF_lower' MODULE_NAME 'ib_udf';
COMMIT;

UPDATE MY_TABLE SET NAME_LOWER_CASE = lower(NAME);
One downside of "doing it on the server" is that it's a blocking
operation. In other words, if the process takes a minute to complete, the
thread you execute it from will lock for one minute. Of course, the same
process running on the client application might take ten minutes to
complete. There are a few workarounds you can try for this problem, such as running the call from a background thread in the client application (see the sketch below) or breaking the work into smaller batches.
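Here is a minimal sketch of the background-thread workaround, assuming IBX and a hypothetical stored procedure named DO_SOMETHING_BIG. The thread opens its own connection and transaction, so the connection used by the UI is never tied up:

// Sketch only; assumes the Classes, IBDatabase, and IBSQL units. Error
// logging and progress notification are omitted.
type
  TBatchThread = class(TThread)
  protected
    procedure Execute; override;
  end;

procedure TBatchThread.Execute;
var
  DB: TIBDatabase;
  Trans: TIBTransaction;
  SQL: TIBSQL;
begin
  DB := TIBDatabase.Create(nil);
  Trans := TIBTransaction.Create(nil);
  SQL := TIBSQL.Create(nil);
  try
    DB.DatabaseName := 'server:/data/mydb.ib';   // hypothetical connection string
    DB.Params.Add('user_name=SYSDBA');
    DB.Params.Add('password=masterkey');
    DB.LoginPrompt := False;
    Trans.DefaultDatabase := DB;
    DB.Open;
    Trans.StartTransaction;
    try
      SQL.Database := DB;
      SQL.Transaction := Trans;
      SQL.SQL.Text := 'EXECUTE PROCEDURE DO_SOMETHING_BIG';
      SQL.ExecQuery;
      Trans.Commit;      // hard commit when the batch succeeds
    except
      Trans.Rollback;    // the procedure modifies data, so undo it on failure
    end;
  finally
    SQL.Free;
    Trans.Free;
    DB.Free;
  end;
end;

The UI can create the thread with FreeOnTerminate set to True and show its own progress indication while the server does the work.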
Unnecessary Rollbacks

For some database servers, rollbacks are a difficult and occasionally
impossible operation. InterBase's multi-generational
architecture, on the other hand, makes them instant and generally
painless. So when you do need a rollback (in other words, when you have
made changes to data that you want to undo), don't hesitate to use it.
On the other hand, if you don't have data changes you need to undo, a
rollback is pointless. Since rollbacks do affect when the sweep
runs, there is a small performance implication to using them
unnecessarily. I occasionally see code like this:

// Start of bad code; don't do this!
MyIBDataset.Transaction.StartTransaction;
try
  MyIBDataset.ExecSQL;
  MyIBDataset.Transaction.Commit;
except
  MyIBDataset.Transaction.Rollback;
  raise;
end;
// End of bad code

Since only one SQL statement is being run in this transaction, there is nothing to roll back if it fails. The correct code for this case is:

MyIBDataset.Transaction.StartTransaction;
try
  MyIBDataset.ExecSQL;
finally
  MyIBDataset.Transaction.Commit;
end;
This way the transaction still ends whether or not there is an error.
Of course, if there is an error, no data will be modified.

Note that recent versions of InterBase will attempt to turn a pointless rollback into a commit anyway, so the server tries to cover for you. But it's always good to ask for the correct thing in the first place!

Tools

There are a number of tools which every InterBase developer should have
on hand. This list may look long, but some of these are available for
free, and the right tool will save you a lot of money in the long run even
if you have to pay for it upfront. I'll discuss how to use these tools
later in the paper when I discuss a workflow for optimizing an
application. For the time being, this section will serve as an
introduction to what each tool can do.

Code Profilers

Code profilers time the execution of methods in your application's
source code. Some can even time the execution of individual lines of code.
To use a profiler, you tell the profiler which methods to time, and run
the program as usual. As the program executes, the profiler times and
counts calls to individual methods (or lines). When you exit the program,
the profiler analyzes the data, and shows you immediately where the slow
parts of your application are.

The methods which appear to be slow in the profiler may not be themselves at fault; the slowdown could be caused by something the
profiler isn't measuring, like network traffic. But a profiler can usually
narrow the scope of a speed problem down from "somewhere in this process"
to a particular method, and it tends to be relatively easy to narrow the
cause further from there.
Gstat

Gstat is a command-line tool included with InterBase which reports a number of pieces of information about a particular InterBase database. The same information is also available from Database Statistics in IBConsole. Some of the information pertains to individual tables and indices, and is outside the scope of this paper (see my other paper, "Optimization II: Optimizing InterBase SQL and Metadata," if you're interested in this topic). But the header page statistics are especially valuable for analyzing slow applications, since they can help diagnose problems with long-running transactions.

All of the header page information and more is available from the InterBase 7.0 temporary system tables, which you can conveniently query with InterBase performance monitoring tools, as discussed below. But gstat/IBConsole Database Statistics work with older versions of InterBase, so they can be useful when you have to work with pre-v7 InterBase. The meaning of the numbers displayed in gstat is discussed in a separate article.

InterBase Performance Monitor / InterBase 7 Temporary System Tables

InterBase Performance Monitor is a simple user interface for viewing and
updating the InterBase 7 temporary system tables, plus some additional
features. It is available as a standalone program, and it is also built in
to IBConsole. Some other third-party InterBase administration tools
have similar features available. For a detailed description of the InterBase temporary system tables,
you can refer to the InterBase Language Reference manual. But you
really don't need to do so much study to find this information
useful.
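As an illustration of how approachable these tables are, here is a sketch of querying one of them directly. I am citing the table name TMP$TRANSACTIONS from memory, so confirm it (and the available columns) against the Language Reference before relying on it:

// Sketch only; assumes an IBX TIBQuery connected to an InterBase 7 database.
// The temporary system table name should be verified in the Language Reference.
procedure LogActiveTransactions(Query: TIBQuery);
begin
  Query.SQL.Text := 'SELECT * FROM TMP$TRANSACTIONS';
  Query.Open;
  while not Query.Eof do
  begin
    // Inspect or log each row here; the rows describe the transactions
    // currently known to the server.
    Query.Next;
  end;
  Query.Close;
end;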
InterBase Query Tuning Tools

These tools make it easier to tell what the InterBase server does
internally when you execute your query. They typically parse the PLAN
generated by IB server, and present it in a more "human-readable" format,
and also give you some idea of how many records the server read to run
your query, and whether or not indices were used in the process (however,
at least some of the tools I've tried have given wildly inaccurate results
for the record counts displayed). InterBase PLANalyzer is
a free query analysis tool, and some InterBase administration tools also
include query analysis functionality. Please attend "Optimization II:
Optimizing InterBase SQL and Metadata" for more information about how to
optimize individual statements.
TCP Monitors

TCP monitors allow you to look at the low level network traffic between
the InterBase client and server. Typically, you won't need this much
detail when optimizing an application, as an SQL monitor will give you the
information you need. But a low level tool is useful when you must
optimize an application for an extremely bandwidth-limited connection,
such as when connecting to a server over a dial-up line. A free, InterBase-only
TCP monitor is on the InterBase Companion CD.

SQL Monitors

SQL monitors allow you to see the communications between your
application and the InterBase client, gds32.dll. Typically, they
show you statement prepares, executes, fetches, and other useful
information.
Each component set has a different SQL monitor. For example, IBX
supplies a component which can be used to build an SQL monitor in an
external application, and there are some free IBX SQL monitors you can download.
But you must enable the TIBDatabase.TraceFlags in your application before
monitoring, for performance reasons. With dbExpress, on the other hand,
the TSQLMonitor component must be built into your application.

A Workflow for Optimizing Existing Applications

What follows is a generic series of steps you can use to diagnose
performance problems in an existing application. Chances are, you'll be
familiar enough with the application that you don't need to follow the
entire series of steps every time. Because an application may have more
than one problem which causes poor performance, you may need to follow
these steps again after fixing each problem. Again, as you become familiar
with the application and the causes of performance issues, you won't need
to follow the entire list every time through.

When Does the Performance Problem Happen?
Is it always at a particular point in the application? If so, skip ahead to Profile the Application. If not, skip to Does the Problem Get Worse With Time?

Does the Problem Get Worse With Time?
Is the application fast when the InterBase server is shut down and restarted? Does it get slower as the application is used, and as more people use the server? If so, skip to Diagnosing Long-running Transactions.

Profile the Application
There are two different ways to profile a DB application -- with an SQL monitor and with a traditional code profiler. Start with the SQL monitor, since interaction with the DB server usually takes more time than code execution. Again, we're trying to fix the slowest stuff first, and move on to the less-slow stuff later on. If you see excessive fetching, fix that. If you see queries which are taking a long time to execute, consult my other paper from this same conference (Optimization II: Optimizing InterBase SQL and Metadata).

Diagnosing Long-running Transactions
With older versions of InterBase this was difficult, at best, but with InterBase 7 it's almost easy. Run InterBase Performance Monitor either as a standalone application or choose the performance monitoring option in IBConsole (it's on the right-click menu for an active DB connection). Run your application -- or better still, have a large group of people run multiple instances of the application. Choose the Transactions tab in Performance Monitor. Click the grid headers to re-sort the grid by elapsed time. When you find the long-running transaction, you can click the "Find Attachment" toolbutton to see which user has the transaction open. Now you can either look at what that user is doing, or filter the Statements grid in Performance Monitor by that user. This should give you enough information to find the open transaction in your code.