It's very common to talk about application performance, but unfortunately performance can mean different things to different people. Here is a very useful set of definitions on the subject, found in Patterns of Enterprise Application Architecture by Martin Fowler:
Response time is the amount of time it takes for the system to process a request from the outside. This may be a UI action, such as pressing a button, or a server API call.
Responsiveness is about how quickly the system acknowledges a request, as opposed to processing it. This is important in many systems because users may become frustrated if a system has low responsiveness, even if its response time is good. If your system waits during the whole request, then your responsiveness and response time are the same. However, if you indicate that you've received the request before you complete it, then your responsiveness is better.
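The distinction can be sketched in a few lines of code. This is only an illustration: `handle_request`, `process`, and `on_ack` are hypothetical names, and a background thread stands in for whatever asynchronous mechanism a real system would use.

```python
import threading
import time

def handle_request(process, on_ack):
    """Acknowledge a request immediately, then process it in the background.

    Responsiveness is the time until on_ack fires; response time is the
    time until process() finishes. Acknowledging first improves the former
    without changing the latter.
    """
    on_ack()                              # instant feedback: good responsiveness
    t = threading.Thread(target=process)  # the real work happens asynchronously
    t.start()
    return t                              # caller join()s when it needs the result

events = []
worker = handle_request(
    lambda: (time.sleep(0.1), events.append("done")),
    lambda: events.append("ack"),
)
# "ack" is recorded immediately; "done" only once the slow work finishes
worker.join()
```
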
Latency is the minimum time required to get any form of response, even if the work to be done is nonexistent. As an application developer, I can usually do nothing to improve latency. Latency is also the reason why you should minimize remote calls.
Throughput is how much stuff you can do in a given amount of time. If you are timing the copying of a file, throughput might be measured in bytes per second. For enterprise applications a typical measure is transactions per second (tps), but the problem is that this depends on the complexity of your transaction. For your particular system you should pick a common set of transactions.
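The arithmetic is simple enough to capture in a one-line helper; the figures below are made up for illustration.

```python
def throughput(work_done, seconds):
    """Throughput: how much work completes per unit of time."""
    return work_done / seconds

# copying a file: 40 MB in 5 s -> 8 MB/s
assert throughput(40, 5) == 8.0
# an enterprise app: 500 transactions in 20 s -> 25 tps
assert throughput(500, 20) == 25.0
```
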
In this terminology, performance is either throughput or response time, whichever matters more to you. It can sometimes be difficult to talk about performance when a technique improves throughput but worsens response time, so it's best to use the more precise term. From a user's perspective responsiveness may be more important than response time, so improving responsiveness at a cost of response time or throughput will increase performance.
Load is a statement of how much stress a system is under, which might be measured in how many users are currently connected to it. The load is usually a context for some other measurement, such as response time. Thus, you may say that the response time for some request is 0.5 seconds with 10 users and 2 seconds with 20 users.
Load sensitivity is an expression of how the response time varies with the load. We may also use the term degradation to say that system B degrades more than system A.
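One crude way to compare degradation is to look at how much extra response time each extra unit of load costs. This sketch reuses the 10-user/20-user figures from the load example above; `system_b` and the `degradation` helper are hypothetical.

```python
def degradation(times_by_load):
    """Crude load sensitivity: extra response time (seconds) per extra
    unit of load, measured over the observed range."""
    loads = sorted(times_by_load)
    lo, hi = loads[0], loads[-1]
    return (times_by_load[hi] - times_by_load[lo]) / (hi - lo)

# the example above: 0.5 s at 10 users, 2 s at 20 users
system_a = {10: 0.5, 20: 2.0}
# a hypothetical system whose response time grows faster under load
system_b = {10: 0.5, 20: 4.0}
assert degradation(system_b) > degradation(system_a)  # B degrades more
```
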
Efficiency is performance divided by resources.
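In concrete terms, if performance is throughput and the resource is servers, efficiency is just throughput per server (numbers illustrative):

```python
def efficiency(tps, servers):
    """Efficiency: performance divided by resources, here tps per server."""
    return tps / servers

# 200 tps on 4 servers -> 50 tps per server
assert efficiency(200, 4) == 50.0
```
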
The capacity of a system is an indication of maximum effective throughput or load. This might be an absolute maximum or a point at which the performance dips below an acceptable threshold.
Scalability is a measure of how adding resources (usually hardware) affects performance. A scalable system is one that allows you to add hardware and get a commensurate performance improvement, such as doubling how many servers you have to double your throughput. Vertical scalability, or scaling up, means adding more power to a single server, such as more memory. Horizontal scalability, or scaling out, means adding more servers.
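A simple sanity check for "commensurate improvement" is the ratio between the throughput gain and the hardware gain. The helper and figures below are illustrative, not a real benchmarking method.

```python
def scaling_factor(base_tps, base_servers, new_tps, new_servers):
    """How much of the added hardware turned into added throughput.

    1.0 means perfectly linear horizontal scaling (double the servers,
    double the throughput); real systems usually land somewhere below.
    """
    return (new_tps / base_tps) / (new_servers / base_servers)

# doubling the servers doubled the throughput: perfectly scalable
assert scaling_factor(100, 2, 200, 4) == 1.0
# doubling the servers gained only 40 tps: poor horizontal scalability
assert scaling_factor(100, 2, 140, 4) < 1.0
```
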
When building enterprise systems, it often makes sense to build for hardware scalability rather than capacity or even efficiency. Scalability gives you the option of better performance if you need it. Scalability can also be easier to do. Often designers do complicated things that improve the capacity on a particular hardware platform when it might actually be cheaper to buy more hardware. ... It's fashionable to complain about having to rely on better hardware to make our software run properly, and I join this choir whenever I have to upgrade my laptop just to handle the latest version of Word. But newer hardware is often cheaper than making software run on less powerful systems. Similarly, adding more servers is often cheaper than adding more programmers, providing that a system is scalable.