youtubescraper

Version: v1.0.0
Warning

This package is not in the latest version of its module.

Published: Nov 14, 2022 License: LGPL-2.1 Imports: 10 Imported by: 0

README

Youtube scraper

⚠️ Note: The diagrams are only rendered on GitHub.

With this package you can get the title, description, URL, thumbnail, and tags of YouTube videos, with or without the YouTube Data API v3.

Installation

From your project, run:

go get github.com/PChaparro/go-youtube-scraper

✨ Available functions

Get the URLs using the YouTube API:

The following example obtains 100 URLs for the query "Learn web development" using your YouTube API key.

urls, err := youtubescraper.GetVideosUrlFromApi("youtube_api_key", "Learn web development", 100)

Get the URLs without using the YouTube API:

The following example obtains 100 URLs for the query "Learn web development" without using a YouTube API key.

urls, err := youtubescraper.GetVideosUrlFromSite("Learn web development", 100)

Get the URL, thumbnail, title, description, and tags with or without using the YouTube API:

The following example gets the data of 100 videos found by the search query "Learn web development". It uses your YouTube API key only to obtain the URL of each video, and limits concurrency to 32 goroutines. Note the "youtube_api_key" argument and the final boolean argument, which is true.

videos, err := youtubescraper.GetVideosData("youtube_api_key", "Learn web development", 100, 32, true)

The following example gets the data of 100 videos found by the search query "Learn web development" without using a YouTube API key, again limiting concurrency to 32 goroutines. Note the empty string passed as the youtubeApiKey argument and the final boolean argument, which is false.

videos, err := youtubescraper.GetVideosData("", "Learn web development", 100, 32, false)
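
The concurrency limit (32 in the examples above) caps how many videos are processed at the same time. How the package enforces that limit is not shown in this README; a common Go idiom for it is a buffered channel used as a semaphore. The sketch below illustrates that idiom only. The name `mapLimited` and the `work` callback are hypothetical and stand in for the per-video fetch.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// mapLimited applies work to every item while running at most limit
// goroutines at once. Results keep the order of the input slice.
// Hypothetical helper: it is not part of the youtubescraper package.
func mapLimited(items []string, limit int, work func(string) string) []string {
	if limit < 1 {
		limit = 1
	}
	results := make([]string, len(items))
	sem := make(chan struct{}, limit) // one slot per allowed goroutine
	var wg sync.WaitGroup

	for i, item := range items {
		wg.Add(1)
		sem <- struct{}{} // blocks while limit goroutines are already running
		go func(i int, item string) {
			defer wg.Done()
			defer func() { <-sem }() // free the slot when this item is done
			results[i] = work(item)
		}(i, item)
	}
	wg.Wait()
	return results
}

func main() {
	urls := []string{"a", "b", "c"}
	fmt.Println(mapLimited(urls, 2, strings.ToUpper)) // [A B C]
}
```

A higher limit finishes sooner but sends more simultaneous requests, so the value is a trade-off between speed and the risk of being throttled.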

🔍 Explanation

How the GetVideosUrlFromApi function works in GetVideosData:
sequenceDiagram
  autonumber
  participant Package
  participant Youtube Data API
  participant Youtube.com

  Package ->> Youtube Data API: Search through /v3/search endpoint.
  Youtube Data API ->> Package: JSON response with up to 50 videos.

  loop Until Desired Size
    Package ->> Youtube Data API: Request next page through /v3/search endpoint.
    Youtube Data API ->> Package: JSON response with the new page.
  end

  loop For Each Video URL
    Package ->> Youtube.com: HTTP GET request to obtain the plain HTML.
    Youtube.com ->> Package: Video plain HTML
  end

  Note left of Package: Parse with regular expressions.

As the diagram shows, the YouTube Data API v3 is used only to request the URLs, that is, one request per 50 videos.
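
The "one request per 50 videos" rule can be worked out with a ceiling division, which is useful for estimating quota use before a call. The sketch below also builds a `/v3/search` request URL. The query parameters are standard ones for that endpoint, but which of them the package actually sends is an assumption, and `requestsNeeded` and `searchURL` are hypothetical names.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// maxPageSize is the per-page limit shown in the diagram (50 videos).
const maxPageSize = 50

// requestsNeeded returns how many search requests are required to
// collect size URLs when each page holds at most maxPageSize videos.
func requestsNeeded(size int) int {
	if size <= 0 {
		return 0
	}
	return (size + maxPageSize - 1) / maxPageSize // ceiling division
}

// searchURL builds a /v3/search request URL. pageToken is empty for
// the first page and carries the next-page token afterwards.
// Assumption: the parameter set is illustrative, not the package's own.
func searchURL(apiKey, query, pageToken string) string {
	params := url.Values{}
	params.Set("part", "snippet")
	params.Set("type", "video")
	params.Set("q", query)
	params.Set("maxResults", strconv.Itoa(maxPageSize))
	params.Set("key", apiKey)
	if pageToken != "" {
		params.Set("pageToken", pageToken)
	}
	return "https://www.googleapis.com/youtube/v3/search?" + params.Encode()
}

func main() {
	fmt.Println(requestsNeeded(100)) // 2
	fmt.Println(searchURL("youtube_api_key", "Learn web development", ""))
}
```

For example, asking for 101 videos needs three requests, because the third page is required for the last video.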

How the GetVideosUrlFromSite function works in GetVideosData:
sequenceDiagram
  autonumber
  participant Package
  participant Web driver

  Package ->> Web driver: Open a new page with the search query
  Web driver ->> Package: Page

  loop Until Desired Size
    Package ->> Web driver: Execute JS code to scroll down.
    Web driver ->> Package: Updated videos array length.
  end

  Note left of Package: Get the urls from the videos array.

  loop For Each Video URL
    Package ->> Youtube.com: HTTP GET request to obtain the plain HTML.
    Youtube.com ->> Package: Video plain HTML
  end

  Note left of Package: Parse with regular expressions.
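
Both diagrams end with the same step: the plain HTML of each video page is parsed with regular expressions. The package's actual expressions are not shown in this README, so the sketch below is only an illustration of the technique. It assumes the fields are available as Open Graph `<meta>` tags and runs on a hand-written sample; the patterns, the `Video` type, and `parseVideo` are all hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// Illustrative patterns for Open Graph meta tags.
// Assumption: these are not the package's actual expressions.
var (
	fieldRe = regexp.MustCompile(`<meta\s+property="og:(title|description|image|url)"\s+content="([^"]*)"`)
	tagRe   = regexp.MustCompile(`<meta\s+property="og:video:tag"\s+content="([^"]*)"`)
)

// Video holds the fields extracted from one page (hypothetical type).
type Video struct {
	Title, Description, Thumbnail, URL string
	Tags                               []string
}

// parseVideo pulls the fields out of the raw HTML of one video page.
func parseVideo(html string) Video {
	var v Video
	for _, m := range fieldRe.FindAllStringSubmatch(html, -1) {
		switch m[1] {
		case "title":
			v.Title = m[2]
		case "description":
			v.Description = m[2]
		case "image":
			v.Thumbnail = m[2]
		case "url":
			v.URL = m[2]
		}
	}
	for _, m := range tagRe.FindAllStringSubmatch(html, -1) {
		v.Tags = append(v.Tags, m[1])
	}
	return v
}

// sampleHTML is a hand-written fragment, not a real YouTube response.
const sampleHTML = `<head>
<meta property="og:title" content="Learn web development">
<meta property="og:description" content="A full course.">
<meta property="og:image" content="https://example.com/thumb.jpg">
<meta property="og:url" content="https://example.com/watch?v=abc">
<meta property="og:video:tag" content="html">
<meta property="og:video:tag" content="css">
</head>`

func main() {
	v := parseVideo(sampleHTML)
	fmt.Println(v.Title, v.Tags)
}
```

Regular expressions are fast for this kind of fixed markup, but they break when attribute order or quoting changes, and attribute values may still contain HTML entities that need unescaping.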

🧪 Testing

Note that running the tests consumes part of your YouTube Data API v3 quota. You can edit the test file in the /tests folder to use more or less of it.

  1. Clone this repository:
git clone https://github.com/PChaparro/go-youtube-scraper
  2. Create a .env file inside the /tests folder with the following field:
YOUTUBE_API_KEY=your key here
  3. Install the testing dependencies:
go mod tidy
  4. Run the tests:

From root directory:

go test ./...

From tests directory:

go test
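
The tests expect the API key in a `.env` file. How the repository's tests load that file is not shown here; the sketch below uses only the standard library to show what reading a `KEY=value` file involves. `parseEnv` is a hypothetical helper, not something the package exports.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseEnv reads KEY=value lines, skipping blank lines and # comments.
// Hypothetical helper: the repository may load its .env file differently.
func parseEnv(content string) map[string]string {
	vars := map[string]string{}
	scanner := bufio.NewScanner(strings.NewReader(content))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		key, value, found := strings.Cut(line, "=")
		if !found {
			continue // not a KEY=value line
		}
		vars[strings.TrimSpace(key)] = strings.TrimSpace(value)
	}
	return vars
}

func main() {
	env := parseEnv("# local settings\nYOUTUBE_API_KEY=your key here\n")
	fmt.Println(env["YOUTUBE_API_KEY"])
}
```

Keeping the key in a file that stays out of version control avoids committing it by accident.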

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func GetVideosData

func GetVideosData(youtubeApiKey, criteria string, size, concurrencyLimit int, useYoutubeApi bool) (interfaces.Videos, error)

GetVideosData gets the video URLs with the chosen method (with or without the YouTube API) and then obtains the title, description, thumbnail, tags, and URL of each video.

func GetVideosUrlFromApi

func GetVideosUrlFromApi(youtubeApiKey, searchCriteria string, size int) ([]string, error)

GetVideosUrlFromApi makes a GET request to the YouTube API to find videos matching the search criteria and returns their URLs.

func GetVideosUrlFromSite

func GetVideosUrlFromSite(searchCriteria string, size int) ([]string, error)

GetVideosUrlFromSite opens a headless web browser instance, scrolls the page until the desired number of videos is loaded, and returns their URLs.

Types

This section is empty.
