TextCaptcha

A source of simple textual CAPTCHA challenges.

tl;dr

The TextCaptcha service provides access to textual CAPTCHA challenges via a simple JSON or XML API over HTTP.

$ curl http://api.textcaptcha.com/myemail@example.com.xml
<captcha>
 <question>If tomorrow is Saturday, what day is today?</question>
 <answer>f6f7fec07f372b7bd5eb196bbca0f3f4</answer>
 <answer>dfc47c8ef18b4689b982979d05cf4cc6</answer>
</captcha>
 
$ curl http://api.textcaptcha.com/myemail@example.com.json
{ "q":"If tomorrow is Saturday, what day is today?",
  "a":["f6f7fec07f372b7bd5eb196bbca0f3f4",
       "dfc47c8ef18b4689b982979d05cf4cc6"] }

The question is the textual challenge to present to the user. The answers are MD5 hashes of the correct answers in lower case, so you can check a real user's response against these checksums by lower-casing and hashing it the same way.

Examples

The service only supports English (UK) question challenges.

Do text CAPTCHAs actually work?

Yes and no.

A text CAPTCHA is more accessible to visually impaired users than image-based alternatives such as reCAPTCHA. Text is also more flexible if, for example, you need to present the challenge over a text-only channel such as SMS or IRC.

The problem with text CAPTCHAs is that they inherently provide more information than a distorted image. They are parseable and, under the right conditions, solvable. With the advent of contextual parsing tools such as Wolfram Alpha, such simple logic puzzles have become easier to solve programmatically.

Usage

Whenever you need a logic question, make a request to the TextCaptcha service at:

http://api.textcaptcha.com/<chooseYourID>.<format>

where <format> is either xml or json, and <chooseYourID> is some reference to yourself (e.g. an email address, domain or similar) so that you can be contacted if there are problems with your usage.

The request returns an XML or JSON response containing a randomly selected question and its answer. Multiple answers may be returned if several responses are acceptable (e.g. '1' and 'one'). The answers are provided as MD5 checksums of the lower-cased answers, which allows you to compare a user's response with the answers without explicitly knowing the answer yourself.
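As a rough illustration of the request and comparison just described, here is a minimal sketch in JavaScript (Node.js). The helper names captchaUrl, parseCaptcha, hashResponse and isCorrect are hypothetical and not part of the service. The only details taken from the text above are the URL pattern, the "q" and "a" keys of the JSON response, and the MD5-of-lower-cased-answer rule.

```javascript
const crypto = require("crypto");

// Hypothetical helper: build the request URL from your chosen ID and a format.
function captchaUrl(id, format) {
  if (format !== "xml" && format !== "json") {
    throw new Error("format must be 'xml' or 'json'");
  }
  return "http://api.textcaptcha.com/" + id + "." + format;
}

// Hypothetical helper: turn a JSON response body into a question and a list of answer hashes.
function parseCaptcha(jsonText) {
  const data = JSON.parse(jsonText);
  return { question: data.q, answers: data.a };
}

// MD5 (hex) of the trimmed, lower-cased response, matching how the answers are hashed.
function hashResponse(response) {
  const normalised = String(response).trim().toLowerCase();
  return crypto.createHash("md5").update(normalised, "utf8").digest("hex");
}

// True if the user's response hashes to any of the acceptable answer hashes.
function isCorrect(response, answerHashes) {
  return answerHashes.includes(hashResponse(response));
}
```

Because only hashes are compared, the calling code never needs to know the plain-text answer.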

Usage Examples

Stateful (Session) PHP Example

Once the form has been submitted, the answer given by the user needs to be validated against the answers that you have stored in the session. Trim, lower-case and MD5 hash the user's response before comparing it directly to the answers stored in the session.
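The session-based check is not specific to PHP. As a hedged sketch, here is the same flow in JavaScript (Node.js), with a plain object standing in for whatever per-user session your framework provides. The function names and the captchaAnswers key are assumptions made for illustration. Discarding the stored answers after one attempt is a design choice of this sketch rather than a requirement of the service.

```javascript
const crypto = require("crypto");

// MD5 (hex) of a string.
function md5Hex(text) {
  return crypto.createHash("md5").update(text, "utf8").digest("hex");
}

// When rendering the form: remember the answer hashes, return the question to display.
// `session` is assumed to be a per-user object kept on the server.
// `captcha` is assumed to be the parsed JSON response, with keys "q" and "a".
function storeChallenge(session, captcha) {
  session.captchaAnswers = captcha.a;
  return captcha.q;
}

// When the form is submitted: trim, lower-case and hash the response, then compare.
function validateSubmission(session, response) {
  const answers = session.captchaAnswers || [];
  // Design choice: each stored challenge can be attempted only once.
  delete session.captchaAnswers;
  const hashed = md5Hex(String(response).trim().toLowerCase());
  return answers.includes(hashed);
}
```

A failed attempt should be followed by a fresh question, since the stored answers have already been discarded.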

Stateless PHP Example

It is possible to embed the answers directly in the form as hidden inputs to remove any dependence on session state. However, embedding the unsalted hash of each answer weakens your implementation. For example, an attacker might try to guess the correct answer by hashing each word of the question and matching the result against the hidden form inputs (which might work for some logic questions). To protect against this, you need to 'salt' the answer hashes with a value known only to your server.
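One way to salt the hashes can be sketched in JavaScript (Node.js) as follows. This is an illustration under stated assumptions, not the service's prescribed method. It swaps in HMAC-SHA256 keyed with a server-side secret instead of a bare concatenation of salt and hash. The SALT value and the function names are hypothetical.

```javascript
const crypto = require("crypto");

// Assumption: a private, random value that stays on the server and is never sent to the client.
const SALT = "replace-with-a-long-random-secret";

// Swapped-in technique: HMAC-SHA256 of the MD5 answer hash, keyed with the salt.
function saltHash(answerHash, salt = SALT) {
  return crypto.createHmac("sha256", salt).update(answerHash, "utf8").digest("hex");
}

// Values to embed as hidden inputs, in place of the raw MD5 answer hashes.
function hiddenInputs(answerHashes, salt = SALT) {
  return answerHashes.map((hash) => saltHash(hash, salt));
}

// On submission: MD5 the normalised response, salt it the same way, then compare.
function checkResponse(response, saltedHashes, salt = SALT) {
  const normalised = String(response).trim().toLowerCase();
  const md5 = crypto.createHash("md5").update(normalised, "utf8").digest("hex");
  return saltedHashes.includes(saltHash(md5, salt));
}
```

Without the secret, an attacker cannot reproduce the hidden values by hashing candidate words from the question.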

In a stateless implementation, you should consider the possibility of an attacker re-using the same captcha tokens repeatedly. For example, if the attacker loads the form and answers the question provided correctly once manually, they can then resend the same form repeatedly and it will pass your captcha test. To prevent this, you will need to lock the captcha provided to a specific form instance, and make sure that form instance can only be utilised once. This is often achieved using form timeouts from hidden inputs, or other techniques designed to avoid CSRF attacks. A full explanation is beyond the scope of this brief example, but the issue needs careful consideration to provide a robust implementation.
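To make the replay concern concrete, here is a minimal sketch in JavaScript (Node.js) of one possible approach. Each form instance gets a random nonce and an expiry time, both of which are signed together with each answer hash, and the server remembers nonces that have already passed. The secret, the ten-minute lifetime, the in-memory Map and all function names are assumptions for illustration. A real deployment would need storage shared between server processes.

```javascript
const crypto = require("crypto");

const SECRET = "replace-with-a-long-random-secret"; // assumption: private server-side value
const TTL_MS = 10 * 60 * 1000;                      // assumption: forms expire after 10 minutes
const usedNonces = new Map();                       // nonce -> expiry time, for accepted forms

// Bind one answer hash to a specific form instance (nonce) and its expiry time.
function sign(nonce, expires, answerHash) {
  return crypto.createHmac("sha256", SECRET)
    .update(nonce + "|" + expires + "|" + answerHash, "utf8")
    .digest("hex");
}

// Values to embed as hidden inputs when rendering the form.
function issueForm(answerHashes, now = Date.now()) {
  const nonce = crypto.randomBytes(16).toString("hex");
  const expires = now + TTL_MS;
  return { nonce, expires, tokens: answerHashes.map((h) => sign(nonce, expires, h)) };
}

// Check a submitted form: not expired, not already used, and the answer matches.
function verifyForm(form, response, now = Date.now()) {
  // Forget nonces whose forms have expired anyway, so the Map does not grow forever.
  for (const [nonce, exp] of usedNonces) {
    if (exp < now) usedNonces.delete(nonce);
  }
  const expires = Number(form.expires);
  if (!Number.isFinite(expires) || now > expires) return false;
  if (usedNonces.has(form.nonce) || !Array.isArray(form.tokens)) return false;
  const normalised = String(response).trim().toLowerCase();
  const md5 = crypto.createHash("md5").update(normalised, "utf8").digest("hex");
  if (!form.tokens.includes(sign(form.nonce, expires, md5))) return false;
  usedNonces.set(form.nonce, expires); // a form instance passes at most once
  return true;
}
```

Single use cannot be enforced without remembering something on the server, so this approach trades the session for a small store of recently used nonces. In this sketch a wrong answer does not consume the nonce. Marking it used on every attempt would be stricter.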

NodeJS Local Mirror Daemon Example

The following Node.js daemon serves a JSON captcha over HTTP on port 3000 to any internal services you may have that need access to this data, and updates the captcha from the TextCaptcha API every 5 seconds. This sort of approach is good for heavy-traffic applications, as it protects your own applications from TextCaptcha API failures and slow responses, and also limits the traffic to my API.
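A minimal sketch of a daemon along these lines, not the author's original code, might look like this. It uses only Node's built-in http module. The createMirror factory, its method names, the 503 response before the first successful fetch and the example ID in API_URL are all assumptions for illustration. Nothing starts until start() is called.

```javascript
const http = require("http");

const API_URL = "http://api.textcaptcha.com/myemail@example.com.json"; // use your own ID
const REFRESH_MS = 5000; // refresh interval from the description above

// Fetch one challenge from the API and resolve with the raw JSON text.
function fetchFromApi(url) {
  return new Promise((resolve, reject) => {
    http.get(url, (res) => {
      if (res.statusCode !== 200) {
        res.resume(); // discard the body
        reject(new Error("unexpected status " + res.statusCode));
        return;
      }
      let body = "";
      res.setEncoding("utf8");
      res.on("data", (chunk) => { body += chunk; });
      res.on("end", () => resolve(body));
    }).on("error", reject);
  });
}

// Hypothetical factory: fetchCaptcha can be replaced, e.g. for testing.
function createMirror(fetchCaptcha = () => fetchFromApi(API_URL)) {
  let cached = null; // last successfully fetched JSON text

  async function refresh() {
    try {
      const body = await fetchCaptcha();
      JSON.parse(body); // throws on invalid JSON, so the old value is kept
      cached = body;
    } catch (err) {
      // API failure: keep serving the previous captcha
    }
  }

  function handle(req, res) {
    if (cached === null) {
      res.writeHead(503, { "Content-Type": "application/json" });
      res.end('{"error":"no captcha cached yet"}');
      return;
    }
    res.writeHead(200, { "Content-Type": "application/json" });
    res.end(cached);
  }

  // Start polling and listening; returns a function that stops both.
  function start(port = 3000) {
    refresh();
    const timer = setInterval(refresh, REFRESH_MS);
    const server = http.createServer(handle).listen(port);
    return () => { clearInterval(timer); server.close(); };
  }

  return { refresh, handle, start };
}

// To run as a daemon: createMirror().start(3000);
```

Internal services then request http://localhost:3000/ instead of the public API. Note that every client receives the same question until the next refresh.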