Speeding up API Response Time



I want to reduce the total time it takes a web server to request and receive data from an API server for a given query.
Assuming MySQL was the bottleneck, I switched the API server's database to Cassandra, but the total time remains the same. Something else may be the bottleneck, but I have not been able to identify it.
Estimated number of requests per minute: 100
Database: MySQL / Cassandra
Hardware: EC2 Small
Server Used: Apache HTTP

Current Observations:
Cassandra query response time: 0.03 seconds
Time between request made and response received: 4 seconds

Time between request made and response received: 1 second

BOTTOM LINE: How can we reduce the total time taken in this case?
Feel free to ask for more details if required. Thanks.


Solution 1:

Summarizing from the chat:

  • Environment:
    • Running on a small Amazon EC2 instance (1 virtual CPU, 1.7GB RAM)
    • Web server is Apache
    • 100 worker threads
    • Python is using Pylons (which implies WSGI)
    • Test client in the EC2
  • Tests:
    • 1.8k requests, single thread
      • Unknown CPU cost
      • Cassandra request time: 0.079s (spread 0.048->0.759)
      • MySQL request time: 0.169s (spread 0.047->1.52)
    • 10k requests, multiple threads
      • CPU runs at 90%
      • Cassandra request time: 2.285s (spread 0.102->6.321)
      • MySQL request time: 7.879s (spread 0.831->14.065)

Observation: 100 threads is probably far too many for your small EC2 instance. Bear in mind that each thread spawns a Python process, which occupies memory and resources even when it is doing nothing. Reducing the number of threads reduces:

  • Memory contention (and memory paging kills performance)
  • CPU cache misses
  • CPU contention
  • DB contention

Recommendation: You should aim to run only as many threads as are needed to max out your CPU (but fewer if they max out on memory or other resources). Running more threads increases overheads and decreases throughput.

Observation: Your best performance time in single-threaded mode gives a probable best-case cost of 0.05 CPU-seconds per request. Assuming some latency (waits for IO), your CPU cost may be quite a lot lower. Assuming CPU is the bottleneck in your architecture, you are probably capable of 20-40 transactions a second on your EC2 server with just thread tuning (0.05 CPU-seconds per request gives 20 per second on one CPU; half that cost gives 40).
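The arithmetic behind the 20-40 figure can be sketched as follows. This is only a rough upper bound that assumes CPU is the sole bottleneck; the function name is made up for illustration, and the 0.025 value is an assumed figure for the case where half of the measured time was IO wait.

```python
def max_requests_per_second(cpu_seconds_per_request, cores=1):
    """Upper bound on throughput when CPU is the only bottleneck."""
    if cpu_seconds_per_request <= 0:
        raise ValueError("cpu_seconds_per_request must be positive")
    return cores / cpu_seconds_per_request


# 0.05 CPU-seconds per request: roughly the best single-threaded time measured.
# 0.025 CPU-seconds per request: assumed, if half of that time was IO wait.
for cost in (0.05, 0.025):
    rate = max_requests_per_second(cost)
    print(f"{cost} CPU-s/request -> about {rate:.0f} requests/s")
```

A small EC2 instance has one virtual CPU, so `cores` stays at its default of 1 here.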

Recommendation: Use a standard Python profiler to profile the system (when running with an optimum number of threads). The profiler will indicate where the CPU spends the most time. Distinguish between waits (i.e. for the DB to return, for disk to read or write data) vs. the inherent CPU cost of the code.
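As a minimal sketch of what profiling could look like, the snippet below uses the standard library's cProfile and pstats modules. The `handle_request` function is a hypothetical stand-in for one of your request handlers (it is not from your code), with a `time.sleep` call standing in for the wait on the DB.

```python
import cProfile
import io
import pstats
import time


def handle_request():
    """Hypothetical stand-in for one API request handler."""
    total = sum(i * i for i in range(200_000))  # inherent CPU cost
    time.sleep(0.05)  # stands in for waiting on the DB
    return total


def profile_call(func, top=10):
    """Run func under cProfile and return a report sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func()
    finally:
        profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(top)
    return out.getvalue()


report = profile_call(handle_request)
print(report)
```

By default cProfile records wall-clock time, so the wait shows up as time spent inside `time.sleep` (in a real handler, inside the DB driver call), while the generator expression and `sum` account for the CPU work.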

  • Where you have a high inherent CPU cost: can you decrease the cost? If this is not in your code, can you avoid that code path by doing something different? Caching? Using another library?
  • Where there is latency: Given your single-threaded results, latency is not necessarily bad, provided the CPU can service another request in the meantime. In fact, you can get a rough idea of the number of threads you need by calculating: total time / (total time - wait time)
    However, check that, while Python is waiting, the DB (for instance) isn't working hard to return a result.
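The thread-count rule of thumb above can be sketched like this. The helper names and the 0.20 s / 0.15 s figures are assumptions for illustration, not measurements from your system; `measure` approximates wait time as wall-clock time minus CPU time for a single-threaded call.

```python
import time


def estimate_threads(total_time, wait_time):
    """Rough thread count per CPU: total time / (total time - wait time)."""
    busy_time = total_time - wait_time
    if busy_time <= 0:
        raise ValueError("wait_time must be smaller than total_time")
    return total_time / busy_time


def measure(func):
    """Return (wall_seconds, cpu_seconds) for one single-threaded call."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    func()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu


# Assumed example: 0.20 s per request, of which 0.15 s is spent waiting.
print(round(estimate_threads(0.20, 0.15), 1))

# Measuring a stand-in handler that mostly waits (sleep plays the DB here).
wall, cpu = measure(lambda: time.sleep(0.05))
print(f"wall={wall:.3f}s cpu={cpu:.3f}s wait~{wall - cpu:.3f}s")
```

With the assumed figures the estimate comes out at about four threads. Treat it as a starting point for tuning rather than a target, since it ignores memory limits and contention on the DB.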

Other thoughts: Consider how the test harness delivers HTTP requests: does it send them as fast as it can (e.g. by trying to open 10k TCP sockets simultaneously)? If so, this may be skewing your results. It may be better to use a different load pattern and tool.
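One alternative load pattern is to send requests at a fixed rate with a cap on how many are in flight, instead of opening everything at once. The sketch below is an illustration only: `run_paced_load` and its parameters are hypothetical names, and `send` is whatever callable issues one HTTP request in your harness (a short sleep stands in for it here).

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_paced_load(send, total_requests, rate_per_second, max_in_flight):
    """Submit requests at a steady rate; return the duration of each send call."""
    interval = 1.0 / rate_per_second

    def timed_send(i):
        start = time.perf_counter()
        send(i)
        return time.perf_counter() - start

    futures = []
    # max_workers caps how many sends run at the same time.
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        next_at = time.perf_counter()
        for i in range(total_requests):
            delay = next_at - time.perf_counter()
            if delay > 0:
                time.sleep(delay)  # pace submissions instead of bursting
            futures.append(pool.submit(timed_send, i))
            next_at += interval
    return [f.result() for f in futures]


def fake_send(i):
    """Stand-in for one HTTP request (assumption for this sketch)."""
    time.sleep(0.001)


latencies = run_paced_load(fake_send, total_requests=20,
                           rate_per_second=200, max_in_flight=4)
print(f"{len(latencies)} requests, slowest {max(latencies):.4f}s")
```

The reported durations cover only the `send` call itself, not any time a request spent queued behind the in-flight cap. For the roughly 100 requests per minute mentioned in the question, `rate_per_second` would be about 100 / 60.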

Solution 2:

Cassandra performs better under high load, and an average time of 3-4 seconds between two systems on opposite sides of the world is OK.