Creating a Giphy Scraper

My first assignment in the LearnElixir curriculum is to create a Giphy scraper with the following requirements:

  • use Giphy’s search endpoint to return 25 results
  • the user must be able to load my project in iex and call GiphyScraper.search(query) to obtain the results
  • the results must be in the following format:
[
  %GiphyScraper.GiphyImage{
    id: "some_id", 
    url: "url_to_gif", 
    username: "username of creator", 
    title: "SomeGif"
  },

  %GiphyScraper.GiphyImage{
    id: "some_other_id", 
    url: "url_to_gif_2", 
    username: "username of creator", 
    title: "MyGif"
  }
]

Here’s how I’m thinking about breaking the problem down

To start, I’m going to use the call to GiphyScraper.search(query) as my API - this is the entrypoint for a user wanting to obtain data. Given that even with this task there’ll be a fair amount of data transformation, I’ll create a primary module that I can delegate to - this module is where the bulk of my functions will live, including requests to the Giphy endpoint.

(This approach will make it easier to add additional ways to interact with the project down the line. When I add a CLI interface, all I have to do is pass the input query from the CLI to the API.)

Additionally, I’ll want a GiphyImage struct that I can parse each giphy result into in order to return the required list of structs.
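For reference, a minimal sketch of what that struct could look like. The field names come straight from the required output format above; the exact definition in my project may differ slightly, so treat this as an assumption rather than the final code.

```elixir
defmodule GiphyScraper.GiphyImage do
  # Fields mirror the required output format.
  # Any field that is not supplied defaults to nil.
  defstruct [:id, :url, :username, :title]
end
```

With this in place, each decoded Giphy result can be turned into a %GiphyScraper.GiphyImage{} that carries only the four fields the assignment asks for.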

Lastly, I’ll want to install and use a couple of libraries to do my request and JSON handling. LearnElixir recommends using the finch library; I decided to go with HTTPoison instead since I’m already familiar with it, but this article outlines some of the benefits of finch, and I’d like to explore it as an alternative once I get everything working.

Let’s get started!

The core logic

The bulk of the functions will live in a module that, in theory, won’t be necessary for an end-user to interact with. When it’s working as intended, I’ll delegate the GiphyScraper.search call to this module’s initial function. I like to get stuff like this out of the way at the start of the project just to make sure that everything is working as intended:

defmodule GiphyScraper do
  alias GiphyScraper.Fetcher
  defdelegate search(query), to: Fetcher, as: :get_gifs_for_query
end

And my core logic will live in a separate module:

defmodule GiphyScraper.Fetcher do
  def get_gifs_for_query(query) do
    IO.puts "you passed in the following query: #{query}"
  end
end

Sure enough, running this in iex results in the expected output:

iex(3)> GiphyScraper.search "hello"
you passed in the following query: hello
:ok

Fast forwarding a bit, and I have my working core module, GiphyScraper.Fetcher, with a few relatively short functions that look as follows:

defmodule GiphyScraper.Fetcher do
  alias GiphyScraper.GiphyImage

  def get_gifs_for_query(query, limit \\ 25) do
    query
    |> get_giphy_request_url(limit)
    |> make_request_and_return_response_data
    |> Enum.map(&parse_response_data_into_image_data/1)
  end

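  def get_giphy_request_url(query, limit) do
    # URI.encode_query/1 escapes spaces and reserved characters in the search term
    params = URI.encode_query(%{api_key: get_api_key(), q: query, limit: limit})
    "https://api.giphy.com/v1/gifs/search?#{params}"
  end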

  def make_request_and_return_response_data(url) do
    HTTPoison.start
    {:ok, response} = HTTPoison.get(url)
    body = response.body |> JSON.decode!
    body["data"]
  end

  def parse_response_data_into_image_data(data) do
    title = get_in(data, ["title"])
    url = get_in(data, ["url"])
    username = get_in(data, ["user", "username"])
    id = get_in(data, ["id"])
    %GiphyImage{
      title: title,
      url: url,
      username: username,
      id: id
    }
  end

  defp get_api_key, do: System.get_env("GIPHY_API_KEY")
end

And indeed, running GiphyScraper.search("cheeseburger") from within iex produces the following results (truncated for brevity’s sake):

[
  %GiphyScraper.GiphyImage{
    id: "3ohs4h1Dt995D5iGA0",
    url: "https://giphy.com/gifs/scoobydoo-cartoon-scooby-doo-3ohs4h1Dt995D5iGA0",
    username: "scoobydoo",
    title: "Hungry Cartoon GIF by Scooby-Doo"
  },
  %GiphyScraper.GiphyImage{
    id: "xTiTnwj1LUAw0RAfiU",
    url: "https://giphy.com/gifs/matthewjocelyn-dancing-dance-burger-xTiTnwj1LUAw0RAfiU",
    username: "matthewjocelyn",
    title: "Dance Dancing GIF by matthewjocelyn"
  },
  ...

For the top-level function (get_gifs_for_query), I chose to use pipes to make the transformation of the data clear as it is received and passed on. I made a standalone function for building the formatted URL to send to the Giphy endpoint, partly because it needs to retrieve an api_key and partly because of the optional limit parameter. I set the default to 25, which is already what the Giphy endpoint defaults to, but I thought it’d be useful to include in case the end user wants to modify it.
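One wrinkle with the default limit: defdelegate is arity-specific, so delegating only search(query) does not expose the two-argument version to the end user. Here is a small sketch of delegating both arities. The Sketch and Sketch.Fetcher modules are stand-ins I made up for illustration; the stub simply echoes its arguments instead of calling Giphy.

```elixir
defmodule Sketch.Fetcher do
  # Stand-in for GiphyScraper.Fetcher: returns its arguments instead of hitting the API.
  # The default argument makes the compiler define both get_gifs_for_query/1 and /2.
  def get_gifs_for_query(query, limit \\ 25), do: {query, limit}
end

defmodule Sketch do
  alias Sketch.Fetcher

  # One defdelegate per arity that should be reachable from the public module.
  defdelegate search(query), to: Fetcher, as: :get_gifs_for_query
  defdelegate search(query, limit), to: Fetcher, as: :get_gifs_for_query
end

Sketch.search("hello")     # {"hello", 25}
Sketch.search("hello", 5)  # {"hello", 5}
```

Whether the second delegate belongs in the public API is a design choice; the assignment only requires search(query).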

When parsing the response data into structs, I decided to use Kernel.get_in from the start. To me, it just looks cleaner than a bunch of subsequent brackets, and it helps to set the expectation in the code of nested maps when decoding JSON.
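As a quick comparison, here is a sketch of both access styles on a decoded map. The NestedAccess module and the sample maps are hypothetical, not part of the project. Both styles return nil when the "user" key is missing, so the choice really is about readability.

```elixir
defmodule NestedAccess do
  # Hypothetical helper module, only for comparing the two access styles.
  def username_with_get_in(data), do: get_in(data, ["user", "username"])
  def username_with_brackets(data), do: data["user"]["username"]
end

with_user = %{"id" => "abc", "user" => %{"username" => "someone"}}
without_user = %{"id" => "abc"}

NestedAccess.username_with_get_in(with_user)      # "someone"
NestedAccess.username_with_brackets(with_user)    # "someone"
NestedAccess.username_with_get_in(without_user)   # nil
NestedAccess.username_with_brackets(without_user) # nil
```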

Some closing thoughts

This exercise was a great way to get used to parsing data from an endpoint. I ran into a few errors (ProtocolError, ArgumentError, etc.) when trying to parse the response received by both the HTTPoison and the finch clients, as well as with Jason and JSON; it took some trial and error before I remembered to match on the {:ok, _} pattern, and to realize that I was dealing with maps and not strings. I’m sure I’ll get used to it.
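To illustrate the {:ok, _} point, here is a rough sketch of handling the request result with pattern matching instead of assuming success. ResponseHandling and extract_data are names I made up, and the decoder is passed in as a function so the sketch does not depend on a particular JSON library. It assumes the HTTP client returns {:ok, response} with status_code and body fields, or {:error, reason}, which is the shape HTTPoison.get/1 uses.

```elixir
defmodule ResponseHandling do
  # Hypothetical helper, not part of the original project.
  # `decode` is any function returning {:ok, term} or {:error, reason}.
  def extract_data({:ok, %{status_code: 200, body: body}}, decode) do
    case decode.(body) do
      # The decoded body is a map, and the results live under the "data" key.
      {:ok, %{"data" => data}} when is_list(data) -> {:ok, data}
      {:ok, _other} -> {:error, :unexpected_body}
      {:error, reason} -> {:error, {:invalid_json, reason}}
    end
  end

  # Any other HTTP status is reported rather than parsed.
  def extract_data({:ok, %{status_code: status}}, _decode) do
    {:error, {:unexpected_status, status}}
  end

  # Transport-level failures are passed through unchanged.
  def extract_data({:error, reason}, _decode), do: {:error, reason}
end
```

With HTTPoison and Jason installed, this could be called as url |> HTTPoison.get() |> ResponseHandling.extract_data(&Jason.decode/1).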

Additionally, I found it fun to explore the different ways of grouping certain functions, as well as deciding when to use default parameters, etc. These are not problems new to Elixir, but they were made more interesting by the different options that Elixir DOES present. Should I try to access nested keys directly, or use the get_in function? Should I use an anonymous function or a named function to pass in as a second argument to Enum.map? Does the procedural breakdown of the data in make_request_and_return_response_data feel too “Python-y”? Should I find a way to re-arrange that data transformation so it can be piped through to the end?
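To make the Enum.map question concrete, here is a small sketch of both styles side by side. The MapStyles module and its title/1 helper are made up for illustration; both functions return the same list, so the trade-off is mostly about naming and reuse.

```elixir
defmodule MapStyles do
  # Hypothetical named helper that can be captured with &title/1.
  def title(gif), do: gif["title"]

  # Named function passed via the capture operator.
  def with_capture(gifs), do: Enum.map(gifs, &title/1)

  # Equivalent anonymous function defined inline.
  def with_anonymous(gifs), do: Enum.map(gifs, fn gif -> gif["title"] end)
end

gifs = [%{"title" => "SomeGif"}, %{"title" => "MyGif"}]

MapStyles.with_capture(gifs)    # ["SomeGif", "MyGif"]
MapStyles.with_anonymous(gifs)  # ["SomeGif", "MyGif"]
```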

It was a fun exercise and exposed me to different parts of working with Elixir code on something relatively nontrivial. You can find the full project in my GitHub repo.

Up next

I’ll add a CLI layer to allow for query input via the command line. I’ll also see if I can get finch working as expected. Lastly, I’d like to add some tests, including finding a way to mock data for specific parts of the pipeline. Stay tuned for Part 2… (coming soon)

Update: You can find the next post here, outlining how to add a CLI interface for querying.

Written by

Leo Rubiano

Reader, programmer, traveler. Experienced back-end dev proficient with Python, Go, Elixir, Ecto, and Postgres.