Hyperledger Fabric is a queueing system, so waiting time increases steeply and nonlinearly as load increases. Transaction latency is therefore quite high under heavy load. However, with GoLevelDB as the state database you should get at least 2000 tps, regardless of the latency.
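To see why latency climbs so quickly near saturation, here is a minimal sketch that uses the textbook M/M/1 queue as a stand-in. This is an assumption for illustration only: Fabric is a pipeline of several queues, not a single M/M/1 server, and the 2000 tps capacity used below is just a hypothetical service rate.

```python
def mm1_mean_latency(arrival_rate: float, service_rate: float) -> float:
    """Mean time in system (waiting + service) of an M/M/1 queue, in seconds.

    Uses the standard result W = 1 / (mu - lambda), valid only while the
    arrival rate (lambda) stays below the service rate (mu).
    """
    if arrival_rate >= service_rate:
        raise ValueError("unstable queue: arrival rate must be below service rate")
    return 1.0 / (service_rate - arrival_rate)


# Hypothetical capacity of the slowest stage, in transactions per second.
SERVICE_RATE = 2000.0

for utilization in (0.5, 0.8, 0.9, 0.95, 0.99):
    latency_s = mm1_mean_latency(utilization * SERVICE_RATE, SERVICE_RATE)
    print(f"utilization {utilization:.0%}: mean latency {latency_s * 1000:.1f} ms")
```

In this toy model the mean latency is about 1 ms at 50% utilization and about 50 ms at 99%, which is the shape you should expect to see in a latency versus load plot.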
From the CPU utilization plot, it looks like there are 36 vCPUs, of which only 16 are fully utilized. Try setting validatorPoolSize in core.yaml on each peer to a value equal to or less than the block size, and then check whether throughput improves.
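As a rough sketch of that rule, the helper below picks a pool size that never exceeds the block size. Capping it at the number of vCPUs as well is my own assumption, not something stated above, and the block size of 100 transactions is a hypothetical value. The result would go under the peer section of core.yaml as validatorPoolSize.

```python
def suggest_validator_pool_size(vcpus: int, block_size: int) -> int:
    """Return a validator pool size that is at most the block size.

    Assumption: there is no benefit in running more validation workers
    than there are vCPUs, so the value is also capped by the vCPU count.
    """
    if vcpus < 1 or block_size < 1:
        raise ValueError("vcpus and block_size must both be positive")
    return min(vcpus, block_size)


# 36 vCPUs as read from the utilization plot; block size of 100 is hypothetical.
pool_size = suggest_validator_pool_size(vcpus=36, block_size=100)
print(f"peer.validatorPoolSize: {pool_size}")
```

Treat the number as a starting point and confirm it by rerunning the benchmark and watching whether more vCPUs become busy.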
Performance, however, depends on several parameters and will vary with them. These parameters are:
Workload (fabcar versus fabcoin)
Disk (HDD versus SSD, local versus network-attached)
Load generator (CLI versus SDK)
Load generation method (open versus closed loop, and the arrival distribution used)
Network bandwidth (should be at least 1.6 Gbps for 2700 tps)
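The bandwidth figure can be turned into a rough per-transaction budget. The sketch below assumes 1 Gbps means 10^9 bits per second and that network traffic scales linearly with throughput. Both are simplifications, and the function names are only illustrative.

```python
def bytes_per_transaction(bandwidth_gbps: float, tps: float) -> float:
    """Average network bytes available per transaction at a given throughput."""
    bits_per_second = bandwidth_gbps * 1e9
    return bits_per_second / 8 / tps


def required_bandwidth_gbps(target_tps: float, bytes_per_tx: float) -> float:
    """Bandwidth needed for a target throughput, assuming linear scaling."""
    return target_tps * bytes_per_tx * 8 / 1e9


# Figures quoted above: 1.6 Gbps for 2700 tps.
budget = bytes_per_transaction(1.6, 2700)
print(f"about {budget / 1000:.0f} kB of network traffic per transaction")

# Hypothetical target of 2000 tps with the same per-transaction traffic.
print(f"about {required_bandwidth_gbps(2000, budget):.2f} Gbps for 2000 tps")
```

If the measured link throughput during a run is already close to the number this gives, the network is a likely bottleneck.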
Also, before interpreting the results, make sure that the load generator itself is not the bottleneck. To locate a bottleneck, break the latency down into endorsement latency, ordering latency, and commit latency. Then collect other resource utilization values, including network and disk. This makes it much easier to identify the bottleneck, if one exists.
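One way to do the latency breakdown is to record four timestamps per transaction and compare the mean time spent in each phase. This is a minimal sketch: the TxTimestamps fields and the sample values are hypothetical, and how you capture the timestamps (client logs, peer logs, block events) depends on your setup.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class TxTimestamps:
    """Hypothetical per-transaction timestamps, in seconds."""
    submitted: float   # client sends the proposal
    endorsed: float    # client has collected the endorsements
    ordered: float     # orderer has put the transaction into a block
    committed: float   # peer has validated and committed the block


PHASES = ("endorsement", "ordering", "commit")


def breakdown(tx: TxTimestamps) -> Dict[str, float]:
    """Split one transaction's end-to-end latency into the three phases."""
    return {
        "endorsement": tx.endorsed - tx.submitted,
        "ordering": tx.ordered - tx.endorsed,
        "commit": tx.committed - tx.ordered,
    }


def mean_breakdown(txs: List[TxTimestamps]) -> Dict[str, float]:
    """Mean latency per phase over all sampled transactions."""
    parts = [breakdown(tx) for tx in txs]
    return {phase: mean(p[phase] for p in parts) for phase in PHASES}


def slowest_phase(txs: List[TxTimestamps]) -> str:
    """Name of the phase with the highest mean latency."""
    means = mean_breakdown(txs)
    return max(means, key=means.get)


# Made-up sample data, for illustration only.
samples = [
    TxTimestamps(submitted=0.00, endorsed=0.02, ordered=0.30, committed=0.95),
    TxTimestamps(submitted=0.10, endorsed=0.13, ordered=0.42, committed=1.20),
]

for phase, seconds in mean_breakdown(samples).items():
    print(f"{phase}: {seconds * 1000:.0f} ms")
print("slowest phase:", slowest_phase(samples))
```

Whichever phase dominates tells you where to look next: compare it against the CPU, disk, and network utilization of the component that serves that phase.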