Using HTTP APIs on the Command Line - Part 3 - ganda

This is part 3 of a series of posts on Using HTTP APIs on the command line:

curl - making requests on the command line
jq - a command-line stream editor for JSON
ganda (with a little awk) - this post

What is `ganda`? #

ganda is an open-source golang app on github. It was designed to be able to do up to millions of parallel HTTP/HTTPS requests in just a few minutes from a single box.

It expects to be piped a stream of urls that it will request in parallel:

cat file_of_urls | ganda

or if you’re averse to useless uses of cat:

ganda < file_of_urls

(for the rest of the post, I’m going to use cat, even though it is “useless” as I find it clearer and easier to switch to some other utility like head or grep if I don’t want a full file.)

Why `ganda` and not `curl`? #

I created ganda to fill what I saw as a gap in command-line tooling. We saw in part 1 of this series how curl can be used to make individual requests which could then be saved or piped into tools such as jq.

curl knows how to request a single url, but what if we want to make more than a single request and process all of the results? Well, you can use a utility like xargs to create many instances of curl:

time seq 10000 |\
 awk '{printf "http://localhost:1323/%s\n", $1}' |\
 xargs -I {} -P 8 curl -s {} > /dev/null

This creates a sequence of numbers from 1 to 10,000 and pipes them to awk to create the urls http://localhost:1323/1 to http://localhost:1323/10000. Those urls are then piped to xargs which uses 8 parallel threads to do curl requests to a local server running an echoserver process running on port 1323. We throw away the results by piping to /dev/null for this test.

echoserver simply echos back the path it is passed, so it returns responses from 1 through 10000.

On my 2-core, 3 Ghz mac-mini it takes slightly over 2 minutes, for xargs and curl to make those 10k requests:

seq 10000  0.01s user 0.00s system 84% cpu 0.010 total
awk '{printf "http://localhost:1323/%s\n", $1}'  0.02s user 0.00s system 0% cpu 44.851 total
xargs -I {} -P 8 curl -s {} > /dev/null  110.73s user 84.53s system 237% cpu 1:22.25 total

if we instead pipe those urls to ganda:

time seq 10000 |\
 awk '{printf "http://localhost:1323/%s\n", $1}' |\
 ganda -s > /dev/null

It is able to make all 10,000 requests in 1 second:

seq 10000  0.00s user 0.00s system 87% cpu 0.008 total
awk '{printf "http://localhost:1323/%s\n", $1}'  0.02s user 0.00s system 2% cpu 0.866 total
ganda -s > /dev/null  0.85s user 0.37s system 106% cpu 1.150 total

The one advantage that curl does have over ganda is that it is likely already installed on your system. If you have to make hundreds (or more) requests, you’ll save time by installing ganda first.

Installing `ganda` #

On Macs, you can use homebrew to install the latest version:

brew tap tednaleid/homebrew-ganda
brew install ganda

On linux, you can download a binary from the releases page and put it in your path.

or compile it from source if you have the golang toolchain available:

go get -u github.com/tednaleid/ganda

How to use `ganda` #

ganda wants to be piped urls. You can create a list of urls as a pre-step. This creates /tmp/list_of_urls with 3 httpbin urls and then pipes them to ganda:

cat << EOF > /tmp/list_of_urls
http://httpbin.org/anything/1
http://httpbin.org/anything/2
http://httpbin.org/anything/3
EOF

cat /tmp/list_of_urls | ganda

the output of this is:

Response: 200 http://httpbin.org/anything/3
{
  "args": {},
  "data": "",
  "files": {},
  "form": {},
  "headers": {
    "Accept-Encoding": "gzip",
    "Connection": "close",
    "Host": "httpbin.org",
    "User-Agent": "Go-http-client/1.1"
  },
  "json": null,
  "method": "GET",
  "url": "http://httpbin.org/anything/3"
}
Response: 200 http://httpbin.org/anything/1
{
  "args": {},
  "data": "",
  "files": {},
  "form": {},
  "headers": {
    "Accept-Encoding": "gzip",
    "Connection": "close",
    "Host": "httpbin.org",
    "User-Agent": "Go-http-client/1.1"
  },
  "json": null,
  "method": "GET",
  "url": "http://httpbin.org/anything/1"
}
Response: 200 http://httpbin.org/anything/2
{
  "args": {},
  "data": "",
  "files": {},
  "form": {},
  "headers": {
    "Accept-Encoding": "gzip",
    "Connection": "close",
    "Host": "httpbin.org",
    "User-Agent": "Go-http-client/1.1"
  },
  "json": null,
  "method": "GET",
  "url": "http://httpbin.org/anything/2"
}

As you can see, the results are not guaranteed to come back in the same order that they were requested (on this run they came back in 3, 1, 2 order). The Response: <STATUS_CODE> url line is written to stderr and the body of the response is written to stdout. This lets us use a tool like jq to process the results:

cat /tmp/list_of_urls | ganda | jq -c '[.method, .url]'

Response: 200 http://httpbin.org/anything/2
["GET","http://httpbin.org/anything/2"]
Response: 200 http://httpbin.org/anything/1
["GET","http://httpbin.org/anything/1"]
Response: 200 http://httpbin.org/anything/3
["GET","http://httpbin.org/anything/3"]

The Response line can be silenced with the -s flag so that you only see the info piped to stdout:

cat /tmp/list_of_urls | ganda -s | jq -c '[.method, .url]'

["GET","http://httpbin.org/anything/1"]
["GET","http://httpbin.org/anything/2"]
["GET","http://httpbin.org/anything/3"]

Using `awk` to construct urls in a pipeline to `ganda` #

Say I have a file of IDs (and other values that vary in the url):

cat << EOF > /tmp/list_of_id_and_param
1000 foo
2000 bar
3000 baz
EOF

I can use awk to construct urls out of that file:

cat /tmp/list_of_id_and_param |\
  awk '{printf "http://httpbin.org/anything/%s?value=%s\n", $1, $2}'

outputs:

http://httpbin.org/anything/1000?value=foo
http://httpbin.org/anything/2000?value=bar
http://httpbin.org/anything/3000?value=baz

Instead of creating a file out of that, I’ll just pipe it directly to ganda and then process the results:

cat /tmp/list_of_id_and_param |\
  awk '{printf "http://httpbin.org/anything/%s?value=%s\n", $1, $2}' |\
  ganda -s |\
  jq -c "[.url, .args]"

results in:

["http://httpbin.org/anything/2000?value=bar",{"value":"bar"}]
["http://httpbin.org/anything/3000?value=baz",{"value":"baz"}]
["http://httpbin.org/anything/1000?value=foo",{"value":"foo"}]

Workers and throttling #

WARNING ganda used without some care can easily send more requests than many servers can handle.

Please don’t abuse it or free services like httpbin.org with thousands of requests. Use it on your own servers however you like.

By default, ganda uses 30 parallel workers (goroutines that listen to a channel of incoming requests). This is the maximum number of requests that will be made in parallel. You can adjust the number of workers with the -W flag, using -W 1 will make ganda use a single worker and a single request at a time:

cat /tmp/list_of_urls | ganda -W 1 | grep url
Response: 200 http://httpbin.org/anything/1
  "url": "http://httpbin.org/anything/1"
Response: 200 http://httpbin.org/anything/2
  "url": "http://httpbin.org/anything/2"
Response: 200 http://httpbin.org/anything/3
  "url": "http://httpbin.org/anything/3"

You can also throttle the number of requests that are made per second with the -t <max per second> flag. This throttles it to 1 request per second, even though it is using 30 workers:

time cat /tmp/list_of_urls | ganda -W 30 -t 1 > /dev/null
Response: 200 http://httpbin.org/anything/1
Response: 200 http://httpbin.org/anything/2
Response: 200 http://httpbin.org/anything/3
cat /tmp/list_of_urls  0.00s user 0.00s system 75% cpu 0.004 total
ganda -W 30 -t 1 > /dev/null  0.01s user 0.01s system 0% cpu 3.131 total

Saving results to individual files #

If you want to save the response data rather than using them in a pipe, you can use the -o <directory> flag to specify a directory where the individual results should be saved:

cat /tmp/list_of_urls | ganda -o /tmp/output
Response: 200 http://httpbin.org/anything/1 -> /tmp/output/http-httpbin-org-anything-1
Response: 200 http://httpbin.org/anything/2 -> /tmp/output/http-httpbin-org-anything-2
Response: 200 http://httpbin.org/anything/3 -> /tmp/output/http-httpbin-org-anything-3

It will transform the url into a filename within that directory and save the results:

cat /tmp/output/http-httpbin-org-anything-1
{
  "args": {},
  "data": "",
  "files": {},
  "form": {},
  "headers": {
    "Accept-Encoding": "gzip",
    "Connection": "close",
    "Host": "httpbin.org",
    "User-Agent": "Go-http-client/1.1"
  },
  "json": null,
  "method": "GET",
  "url": "http://httpbin.org/anything/1"
}

If you will be saving more than a few thousand results to files, you likely want to save them in multiple subdirectories otherwise your operating system will be cranky when you ls that directory. ganda can hash the url and put the response in a subdirectory with the -S <subdir length> flag, use 2 for more than about 5k urls, and 4 for more than 5 million urls:

cat /tmp/list_of_urls | ganda -o /tmp/output -S 2
Response: 200 http://httpbin.org/anything/2 -> /tmp/output/64/http-httpbin-org-anything-2
Response: 200 http://httpbin.org/anything/1 -> /tmp/output/99/http-httpbin-org-anything-1
Response: 200 http://httpbin.org/anything/3 -> /tmp/output/b6/http-httpbin-org-anything-3

Notice that the files are now in subdirectories within the /tmp/output output directory.

Authentication with headers #

Like curl, ganda supports the -H <header> flags to send headers with each request:

cat /tmp/list_of_urls | ganda -H "Authorization: Bearer <mytoken>" | grep -E "Authorization|url"
Response: 200 http://httpbin.org/anything/1
    "Authorization": "Bearer <mytoken>",
  "url": "http://httpbin.org/anything/1"
Response: 200 http://httpbin.org/anything/3
    "Authorization": "Bearer <mytoken>",
  "url": "http://httpbin.org/anything/3"
Response: 200 http://httpbin.org/anything/2
    "Authorization": "Bearer <mytoken>",
  "url": "http://httpbin.org/anything/2"

Using other request methods #

By default, ganda makes GET requests. You can use the -X <method> flag to use another request type, ex:

cat /tmp/list_of_urls | ganda -X POST | grep -E "method|url"
Response: 200 http://httpbin.org/anything/2
  "method": "POST",
  "url": "http://httpbin.org/anything/2"
Response: 200 http://httpbin.org/anything/1
  "method": "POST",
  "url": "http://httpbin.org/anything/1"
Response: 200 http://httpbin.org/anything/3
  "method": "POST",
  "url": "http://httpbin.org/anything/3"

adding data to the body of the request #

ganda allows sending a templated request body via the -d flag. If you want the same thing sent with every request you can use a literal string:

cat /tmp/list_of_urls | ganda -X POST -d "foo=bar" | grep -E 'data|url'
Response: 200 http://httpbin.org/anything/1
  "data": "foo=bar",
  "url": "http://httpbin.org/anything/1"
Response: 200 http://httpbin.org/anything/2
  "data": "foo=bar",
  "url": "http://httpbin.org/anything/2"
Response: 200 http://httpbin.org/anything/3
  "data": "foo=bar",
  "url": "http://httpbin.org/anything/3"

but ganda also allows you to parameterize the template body if you’d like to vary the body that gets posted with each request. It will replace each %s in the body template with a space-delimited values after the urls that it is streamed, so with the file:

cat << EOF > /tmp/list_of_urls_and_values
http://httpbin.org/anything/1 bar 111
http://httpbin.org/anything/2 baz 222
http://httpbin.org/anything/3 qux 333
EOF

We have each url followed by values that should be used in the body that we POST to that url:

cat /tmp/list_of_urls_and_values | ganda -X POST -d '{"first": "%s", "second": "%s"}' | grep -E 'data|url'
Response: 200 http://httpbin.org/anything/3
  "data": "{\"first\": \"qux\", \"second\": \"333\"}",
  "url": "http://httpbin.org/anything/3"
Response: 200 http://httpbin.org/anything/2
  "data": "{\"first\": \"baz\", \"second\": \"222\"}",
  "url": "http://httpbin.org/anything/2"
Response: 200 http://httpbin.org/anything/1
  "data": "{\"first\": \"bar\", \"second\": \"111\"}",
  "url": "http://httpbin.org/anything/1"

This can be very useful for hitting things like GraphQL services.

When to use `ganda` #

I tend to use ganda for anything beyond a single url, but still use curl for singular urls as it is more portable. It is useful any time I want to make a number of requests and perform an analysis across urls.

Unix tools are best when they do “one thing well” and ganda’s whole purpose is to make lots and lots of http/https requests. It composes really well with tools like jq, sort, uniq and grep and enables many ad-hoc analysis possibilities when you have access to an HTTP endpoint but not direct access to a database.

What is ganda? #

Why ganda and not curl? #

Installing ganda #

How to use ganda #

Using awk to construct urls in a pipeline to ganda #