Social Record
Distributed scraping and analysis pipeline for a range of social media platforms
About
The goal of this project is to raise awareness about data privacy. The means to do so is a tool that scrapes, combines and analyzes public data from multiple social media sources.
The results will be available via an API and used for some kind of art exhibition.
Architectural overview
You can find a more detailed overview here.
Open it in draw.io and have a look at the different tabs "High level overview", "Distributed Scraper" and "Face Search".
Further reading
Detailed documentation
Wanna contribute?
If you want to join us in raising awareness about data privacy, have a look at CONTRIBUTING.md
List of contributors
Deployment
The deployment of this project to Kubernetes is handled in codeuniversity/smag-deploy (this is a private repo!)
Getting started
Requirements
Preparation
If this is your first time running this:
- Add 127.0.0.1 my-kafka and 127.0.0.1 minio to your /etc/hosts file
- Choose a <user_name> for your platform of choice <instagram|twitter> as a starting point and run
$ go run cli/main/main.go <instagram|twitter> <user_name>
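The /etc/hosts step can also be scripted so that repeated runs do not add duplicate entries. The following is only a minimal sketch: `add_host_entry` is a hypothetical helper that is not part of this repository, and it assumes a POSIX shell with `grep` available.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository): map a hostname to
# 127.0.0.1 in a hosts file, but only if the hostname is not mapped yet.
# Usage: add_host_entry <host_name> [hosts_file]
add_host_entry() {
    host_name="$1"
    hosts_file="${2:-/etc/hosts}"

    # Look for the hostname as a whole field on a non-comment line.
    if grep -Eq "^[^#]*[[:space:]]${host_name}([[:space:]]|\$)" "$hosts_file"; then
        echo "${host_name} is already present in ${hosts_file}"
    else
        # Writing to /etc/hosts requires root privileges.
        printf '127.0.0.1 %s\n' "$host_name" >> "$hosts_file"
        echo "Added ${host_name} to ${hosts_file}"
    fi
}
```

Calling `add_host_entry my-kafka` and `add_host_entry minio` from a root shell would cover both entries. The optional second argument lets you try the function against a copy of the file first.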
Scraper
Run the Instagram or Twitter scraper in Docker:
$ make run-<platform_name>