Discovery

A couple of months ago, I met a woman on OkCupid who was also a developer. She boasted about being able to use OkCupid’s now-removed Visitor feature through their JSON API, polling it throughout the day to see who was looking at her profile. We weren’t a match for each other, but I was intrigued by the API.

The Data

A few weeks ago, I hit the same API using okcupidjs, a library developed by Hung Tran, and I was floored by the data it provided.

Below is an anonymized payload of a visitor to my profile on OkCupid.

{
    "stalkers": [
        {
            "religion": "Agnostic",
            "last_sent_message": 1522203406,
            "online_now": 1,
            "original_username": "OrigJaneDoe",
            "friend_percentage": 60,
            "relationshipstatus": "Single",
            "pic_lower_right_y": 720,
            "pic_lower_right_x": 720,
            "status": "Single",
            "SHOW_IM": 0,
            "match_percentage": 86,
            "education_level": "University",
            "last_received_message": 1522206063,
            "location": "Austin, Texas",
            "realname": "Jane",
            "event_user_thumbnail": null,
            "userid": "00000000000000000",
            "image": "https://k0.okccdn.com/media/img/user/placeholder_2013/pq_60.png",
            "displayname": "JaneDoeCute666",
            "stalk_time": 1522208367,
            "languages": [
                {
                    "fluency": 0,
                    "language": 74
                }
            ],
            "adjectives": [],
            "ethnicity_list": [],
            "picid": "00000000000000000",
            "is_favorite": 0,
            "locals_like": true,
            "thumbnail": "0x0/720x720/2/000000000jpeg",
            "location_detail": {
                "position": {
                    "longitude": -97.74306,
                    "metric": 0,
                    "latitude": 30.26715
                },
                "metro_area": 640,
                "locid": 000000,
                "neighborhood": {
                    "close_type": 0,
                    "distance_from": 0.00678613,
                    "name": "Downtown"
                },
                "state_code": "TX",
                "location": {
                    "postal_code": "78701",
                    "nameid": 00000,
                    "display_state": 1,
                    "locid": 000000,
                    "state_code": "TX",
                    "country_name": "United States",
                    "longitude": -9774306,
                    "popularity": 0,
                    "state_name": "Texas",
                    "country_code": "US",
                    "city_name": "Austin",
                    "metro_area": 640,
                    "latitude": 3026715
                },
                "postal_code": "",
                "locstr": "Austin, Texas",
                "country_code": "US"
            },
            "rev_score": 5,
            "acct_level": 0,
            "gender": "F",
            "score": 5,
            "sign": "Virgo",
            "pic_upper_left_y": 0,
            "height_in": 67,
            "pic_upper_left_x": 0,
            "locals_fan": true,
            "acct_status": "ok",
            "username": "JaneDoe666",
            "bodytype": "Thin",
            "age": 30,
            "enemy_percentage": 18,
            "orientation": "Straight",
            "birthdate": 558210400
        }, ...]
}

Here’s a quick rundown of some of the less obvious data points:

  • birthdate - Exact birthdate of the person. Birthdate is considered PII (personally identifiable information) by many agencies in the US and EU.
  • stalk_time - The time the user visited your profile.
  • userid - ID of the user, which could be used within another attack vector/API.
  • realname - This is usually the first name provided by the user, especially since OkCupid deprecated usernames.
  • last_received_message - Timestamp of when the user last received a message.
  • last_sent_message - Timestamp of when the user last sent a message.
  • location_detail - This is where things get a little creepy, and complicated.
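
To make the timestamp fields concrete, here is a minimal Python sketch that decodes the values from the sample payload. It assumes these fields are Unix timestamps in seconds (UTC), which is an inference from their magnitude and not something OkCupid documents; the helper name decode_timestamp is made up for illustration.

```python
from datetime import datetime, timezone


def decode_timestamp(seconds: int) -> datetime:
    """Interpret a payload value as Unix epoch seconds in UTC (an assumption)."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# Values copied from the anonymized payload above.
visitor = {
    "birthdate": 558210400,
    "stalk_time": 1522208367,
    "last_sent_message": 1522203406,
    "last_received_message": 1522206063,
}

for field, value in visitor.items():
    # Print each field as an ISO 8601 date and time.
    print(f"{field}: {decode_timestamp(value).isoformat()}")
```

Under that assumption, birthdate decodes to a full calendar date in September 1987, not just an age or a birth year, which is consistent with the age and sign fields in the same record.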

This is far more data than OkCupid’s old visitor feature needed. The only data the feature requires is the username, the time they visited you, a thumbnail, their city/state, and the match percentage.
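
As a rough illustration of what a compact response could look like, the Python sketch below filters a visitor record down to an allowlist of those five fields. Mapping the fields named above onto payload keys is my own assumption, and VISITOR_FIELDS and minimal_visitor are hypothetical names, not part of OkCupid’s API or okcupidjs.

```python
# Fields the visitor list arguably needs, per the paragraph above.
# The mapping from that description to payload keys is an assumption.
VISITOR_FIELDS = ("username", "stalk_time", "thumbnail", "location", "match_percentage")


def minimal_visitor(record: dict) -> dict:
    """Return only the allowlisted keys that are present in the record."""
    return {key: record[key] for key in VISITOR_FIELDS if key in record}


full_record = {
    "username": "JaneDoe666",
    "stalk_time": 1522208367,
    "thumbnail": "0x0/720x720/2/000000000jpeg",
    "location": "Austin, Texas",
    "match_percentage": 86,
    # A few of the extra fields from the sample payload:
    "birthdate": 558210400,
    "realname": "Jane",
    "last_sent_message": 1522203406,
}

print(minimal_visitor(full_record))
```

An allowlist is used here because it fails closed: a field added to the record later stays private until someone deliberately exposes it.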

The old visitor feature.

Location Tracking

Using some deductive reasoning and experimentation, I was able to figure out the following:

  • close_type of the neighborhood object represents how the user set their location. If set through the mobile app using your phone’s GPS, it will be 1. If you set it through postal/zip code, it will be 0.
  • Latitude and longitude are never exact. It’s hard to confirm, but my theory is that OkCupid uses geo-fencing and maps you to your nearest area.
  • The only piece of data that is concerning is distance_from, which seems to show your distance from the latitude/longitude described.
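
The observations above can be sketched in Python against the location_detail object from the sample payload. Two things here are assumptions on my part: that the integer coordinates in the nested location object are degrees multiplied by 100,000 (inferred from comparing -9774306 with -97.74306), and that close_type only takes the two values described above. The units of distance_from are unknown, so the sketch passes the value through untouched. describe_location is a hypothetical helper.

```python
import math

COORD_SCALE = 100_000  # assumption: integer coordinates are degrees * 1e5
CLOSE_TYPES = {1: "gps", 0: "postal_code"}  # meanings inferred by experimentation


def describe_location(detail: dict) -> dict:
    """Summarize how a location was set and whether both coordinate pairs agree."""
    position = detail["position"]
    nested = detail["location"]
    neighborhood = detail["neighborhood"]
    same_lat = math.isclose(nested["latitude"] / COORD_SCALE, position["latitude"], abs_tol=1e-6)
    same_lon = math.isclose(nested["longitude"] / COORD_SCALE, position["longitude"], abs_tol=1e-6)
    return {
        "source": CLOSE_TYPES.get(neighborhood["close_type"], "unknown"),
        "coordinates_match": same_lat and same_lon,
        "neighborhood": neighborhood["name"],
        "distance_from": neighborhood["distance_from"],  # units unknown
    }


# Trimmed copy of location_detail from the sample payload.
location_detail = {
    "position": {"longitude": -97.74306, "metric": 0, "latitude": 30.26715},
    "neighborhood": {"close_type": 0, "distance_from": 0.00678613, "name": "Downtown"},
    "location": {"postal_code": "78701", "longitude": -9774306, "latitude": 3026715},
}

print(describe_location(location_detail))
```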

In conclusion, this is not as terrible as what some apps have done in the past, but it’s still unsettling and unnecessary for the visitor feature.

Taking Action

I reached out to Hung Tran, the author of okcupidjs, to get his thoughts. While he had written about discovering the API and its contents on his blog in 2014, he said he did not personally contact OkCupid about the security holes. It’s hard to tell exactly what data the API provided in the past compared to the present.

On March 29th, I contacted OkCupid to report the issue and asked why so much unnecessary data was being provided. OkCupid responded within 6 hours, telling me that they had confirmed the issue and removed access to the visitor API endpoint. This is fast in the world of responsible disclosure, and I give them kudos. However, they gave no answer as to why the unnecessary data was being provided.

Why is this Important?

First off, OkCupid doesn’t seem to be aware of what data is public and what is private. See the screenshot below of the “Your Location” settings screen in OkCupid’s Android app (version 10.10.0).

Are the italics supposed to convey sarcasm?

The statement “Your zip code will not be public” was blatantly false.

birthdate is a data point that is considered personally identifiable information (PII), or personal data, in many countries. I am far from being a lawyer, but OkCupid should be concerned, with regulations like the GDPR fast approaching.

This may not be as appalling as Grindr’s reporting of HIV status to third parties. It might not be as powerful and widespread as the data used in the Cambridge Analytica debacle. But that’s not the point.

We need to hold companies accountable. It is their responsibility to ensure privacy policies clearly state what data is available. Public APIs should be direct and compact, and developers need to fight for the time to implement secure systems.

P.S. If you want to talk more about this, feel free to reach out.