stackexchange-xml-to-csv

command module
v0.0.0-...-ba6b397 Latest Latest
Published: Dec 14, 2020 License: MIT Imports: 3 Imported by: 0

A CLI tool that converts Stack Exchange data dumps from XML to CSV, a format better suited for importing into various databases.

Getting started

Before you begin, make sure you have:

  • A working Go environment with Go version 1.14 or later. Run go version in the console; it should print the installed version of the compiler.
  • An archiver that can extract .7z files, such as 7z.

Download database dump

Choose and download the database dump that you are going to convert.

Important: the Stack Overflow dump is split across 8 separate 7z archives.

Extract

Using 7z or another archiver, extract the files from the archive(s) into the directory you will convert from.

Example with the academia.stackexchange.com.7z dump:

$ mkdir xml csv
$ 7z e academia.stackexchange.com.7z -oxml
$ ls xml/
Badges.xml  Comments.xml  PostHistory.xml  PostLinks.xml  Posts.xml  Tags.xml  Users.xml  Votes.xml
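
Before converting, it can help to confirm that the extraction produced the files you expect. The following is a minimal Go sketch of such a check, not part of the converter itself. It assumes the eight file names from the academia.stackexchange.com listing above (other dumps may differ), and the names expected and missingFiles are hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// expected holds the file names from the academia.stackexchange.com listing.
// Assumption: other dumps may contain a different set of files.
var expected = []string{
	"Badges.xml", "Comments.xml", "PostHistory.xml", "PostLinks.xml",
	"Posts.xml", "Tags.xml", "Users.xml", "Votes.xml",
}

// missingFiles returns the expected names that do not appear in found.
// Entries in found may be bare names or paths; only the base name is compared.
func missingFiles(found []string) []string {
	have := make(map[string]bool, len(found))
	for _, p := range found {
		have[filepath.Base(p)] = true
	}
	var missing []string
	for _, name := range expected {
		if !have[name] {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	// Default to the xml directory created in the example above.
	dir := "xml"
	if len(os.Args) > 1 {
		dir = os.Args[1]
	}
	found, err := filepath.Glob(filepath.Join(dir, "*.xml"))
	if err != nil {
		panic(err)
	}
	if m := missingFiles(found); len(m) > 0 {
		fmt.Println("missing from", dir+":", m)
		return
	}
	fmt.Println("all expected XML files are present in", dir)
}
```

Run it with go run from the directory that contains xml/, or pass another directory as the first argument.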

Building stackexchange-xml-to-csv

Clone and build the stackexchange-xml-to-csv converter:

$ git clone https://github.com/SkobelevIgor/stackexchange-xml-to-csv
$ cd stackexchange-xml-to-csv/
$ go build

Converting XML to CSV

You now have the stackexchange-xml-to-csv executable. Let’s convert the XML files:

$ ./stackexchange-xml-to-csv --source-path=../xml --store-to-dir=../csv

List of possible flags:

  • source-path (Required) Absolute or relative path to a directory containing XML files, or to a single XML file.
  • store-to-dir (Optional) Absolute or relative path to the directory where the resulting CSV files are stored.
  • skip-html-decoding (Optional) Some files (e.g., Posts.xml) contain escaped HTML. By default, the converter decodes it. Use this flag to disable that behavior.
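
To make the conversion step more concrete, here is a minimal Go sketch of the general idea. It is not the tool's actual implementation. It assumes that a dump file is a sequence of row elements whose attributes hold the field values, that the caller supplies the column order, and that HTML decoding amounts to calling html.UnescapeString on each value. The names convertRows, convertString, and decodeHTML are hypothetical.

```go
package main

import (
	"encoding/csv"
	"encoding/xml"
	"fmt"
	"html"
	"io"
	"strings"
)

// convertRows streams <row .../> elements from r and writes a header line
// plus one CSV record per row to w. columns fixes the output order; an
// attribute missing from a row becomes an empty field.
// Assumption: when decodeHTML is true, each value is passed through
// html.UnescapeString, which is one possible reading of "HTML decoding".
func convertRows(r io.Reader, w io.Writer, columns []string, decodeHTML bool) error {
	dec := xml.NewDecoder(r)
	out := csv.NewWriter(w)
	if err := out.Write(columns); err != nil {
		return err
	}
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			break
		}
		if err != nil {
			return err
		}
		start, ok := tok.(xml.StartElement)
		if !ok || start.Name.Local != "row" {
			continue // skip the root element, whitespace, and end tags
		}
		values := make(map[string]string, len(start.Attr))
		for _, a := range start.Attr {
			values[a.Name.Local] = a.Value
		}
		record := make([]string, len(columns))
		for i, col := range columns {
			v := values[col]
			if decodeHTML {
				v = html.UnescapeString(v)
			}
			record[i] = v
		}
		if err := out.Write(record); err != nil {
			return err
		}
	}
	out.Flush()
	return out.Error()
}

// convertString is a small convenience wrapper for in-memory input.
func convertString(xmlText string, columns []string, decodeHTML bool) (string, error) {
	var sb strings.Builder
	err := convertRows(strings.NewReader(xmlText), &sb, columns, decodeHTML)
	return sb.String(), err
}

func main() {
	// Made-up sample data in the assumed row/attribute layout.
	const sample = `<?xml version="1.0" encoding="utf-8"?>
<tags>
  <row Id="1" TagName="phd" Count="42" />
  <row Id="2" TagName="r&amp;amp;d" Count="7" />
</tags>`
	cols := []string{"Id", "TagName", "Count"}
	for _, decode := range []bool{true, false} {
		csvText, err := convertString(sample, cols, decode)
		if err != nil {
			panic(err)
		}
		fmt.Printf("decodeHTML=%v\n%s\n", decode, csvText)
	}
}
```

Reading tokens one at a time keeps memory use flat, which matters for large files such as Posts.xml. The XML decoder already resolves XML entities in attribute values, so the optional html.UnescapeString pass only affects entities that remain after that first step.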

RDBMS schema examples

Here you can find examples of the schema for the different databases:

License

MIT License
