wikiassignment

v0.0.0-...-816ba0d
Published: Oct 23, 2019 License: MIT Imports: 11 Imported by: 0

README

wikiassignment

Description

Package wikiassignment is a Go package that provides utility functions for automatically assigning Wikipedia pages to topics.

Documentation

API documentation can be found in the associated godoc reference.

Topics data can be found in overpedia.

Installation

This package can be installed with the go get command:

go get github.com/negapedia/wikiassignment/...

Requirements

You will need a machine with an internet connection, 16GB of RAM (for the English version) and the docker storage base directory properly set.

This package depends on PETSc. The associated dockerfile provides a complete environment in which to use this package. Otherwise, PETSc can be installed by following the same steps as in the dockerfile or in the PETSc installation page.

Export options

  1. lang: Wikipedia language edition ("nationalization") to parse, or a custom JSON; the default is it.
  2. date: Wikipedia dump date in the format YYYYMMDD; the default is latest.
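As an illustration of the date format, the following sketch checks a value for the date option before it is passed to the image. It is not taken from the wikiassignment command itself: validDumpDate is a hypothetical helper, and whether the tool validates its input this way is an assumption.

```go
package main

import (
	"fmt"
	"time"
)

// dumpDateLayout is Go's reference-time layout for a YYYYMMDD date.
const dumpDateLayout = "20060102"

// validDumpDate reports whether s is the keyword "latest" or a real
// calendar date written as YYYYMMDD.
// Hypothetical helper: it is not part of the wikiassignment package.
func validDumpDate(s string) bool {
	if s == "latest" {
		return true
	}
	_, err := time.Parse(dumpDateLayout, s)
	return err == nil
}

func main() {
	for _, s := range []string{"latest", "20060102", "20061302", "2006-01-02"} {
		fmt.Printf("%q valid: %t\n", s, validDumpDate(s))
	}
}
```

Checking the value up front only avoids starting a long run with a malformed date; it cannot tell whether a dump actually exists for that day.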

Examples of use

  1. docker run negapedia/wikiassignment export -lang en -date 20060102: basic usage. Run the image on the English-language dump dated 2 January 2006 and store the result in the in-container /data folder, which will contain:
     1. semanticgraph.json maps each source page ID to the array of its target page IDs.
     2. partition.json maps each node type (article, category or topic) to the array of page IDs belonging to it.
     3. absorptionprobabilities.csv has one row per page, with its ID and the assignment weight for each topic.
     4. pages.csv lists pages in the form required by wiki2overpediadb.
  2. docker run -d -v /path/2/out/dir:/data negapedia/wikiassignment export -lang en:
     1. run the image as before.
     2. mount the guest /data folder as a volume on the host folder /path/2/out/dir, the output folder, so that when the run finishes /path/2/out/dir contains the result. Any folder of your choice can be used instead.
     3. run the image in detached mode. For further explanations please refer to the docker run reference.
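Once the export has finished, the output files can be consumed from Go. The sketch below decodes an adjacency map of the kind described for semanticgraph.json. The exact on-disk layout is an assumption here (a single JSON object keyed by source page ID), and decodeGraph is a hypothetical helper, so check a real output file before relying on it.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decodeGraph parses a JSON object that maps a source page ID to the
// array of its target page IDs.
// ASSUMPTION: the file is one JSON object with page IDs as keys; the
// real semanticgraph.json layout has not been verified.
func decodeGraph(data []byte) (map[uint32][]uint32, error) {
	var g map[uint32][]uint32
	if err := json.Unmarshal(data, &g); err != nil {
		return nil, err
	}
	return g, nil
}

func main() {
	// Made-up sample data, used only to exercise the decoder.
	sample := []byte(`{"10": [20, 30], "20": []}`)
	g, err := decodeGraph(sample)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(len(g), "source pages;", len(g[10]), "targets for page 10")
}
```

For a full English dump the graph is large, so a streaming json.Decoder reading from the file may be preferable to loading the whole file into memory first.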

Useful commands

  1. docker pull negapedia/wikiassignment: update the image to the latest revision.
  2. docker kill --signal=SIGQUIT $(docker ps -ql): quit the most recently created container and write a stack trace dump to its log.
  3. docker logs -f $(docker ps -ql): follow the logs of the most recently created container.
  4. docker system prune -fa --volumes: remove all unused images and volumes without asking for confirmation.

Documentation

Overview

Package wikiassignment provides utility functions for automatically assigning Wikipedia pages to topics.

Index

Constants

const (
	//TopicNamespaceID represents topic namespace ID
	TopicNamespaceID = 6666
	//CategoryNamespaceID represents category namespace ID in Wikipedia dumps
	CategoryNamespaceID = 14
	//ArticleNamespaceID represents article namespace ID in Wikipedia dumps
	ArticleNamespaceID = 0
)
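A minimal sketch of how these constants might be used to label a page by its namespace. The constants are re-declared locally so that the example compiles without the package (which needs PETSc); namespaceKind is a hypothetical helper, not part of the API.

```go
package main

import "fmt"

// Local copies of the package constants, re-declared so this sketch is
// self-contained.
const (
	TopicNamespaceID    = 6666
	CategoryNamespaceID = 14
	ArticleNamespaceID  = 0
)

// namespaceKind returns a human-readable label for a namespace ID.
// Hypothetical helper: it is not part of the wikiassignment package.
func namespaceKind(id int) string {
	switch id {
	case TopicNamespaceID:
		return "topic"
	case CategoryNamespaceID:
		return "category"
	case ArticleNamespaceID:
		return "article"
	default:
		return "other"
	}
}

func main() {
	for _, id := range []int{0, 14, 6666, 2} {
		fmt.Println(id, namespaceKind(id))
	}
}
```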

Variables

This section is empty.

Functions

func From

func From(ctx context.Context, tmpDir, lang string) (page2Topic map[uint32]uint32, namespaces struct{ Topics, Categories, Articles []uint32 }, err error)

From transforms the semantic graph built from the input into a page-topic assignment.
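The page2Topic result has the shape map[uint32]uint32, so it can be summarised with ordinary map code. The sketch below counts how many pages were assigned to each topic. It does not call From (which needs the PETSc environment); pagesPerTopic is a hypothetical helper and the sample IDs are made up.

```go
package main

import "fmt"

// pagesPerTopic counts how many pages map to each topic ID in a
// page-to-topic assignment.
// Hypothetical helper: it is not part of the wikiassignment package.
func pagesPerTopic(page2Topic map[uint32]uint32) map[uint32]int {
	counts := make(map[uint32]int)
	for _, topic := range page2Topic {
		counts[topic]++
	}
	return counts
}

func main() {
	// Made-up assignment: page ID -> topic ID.
	page2Topic := map[uint32]uint32{1: 100, 2: 100, 3: 200}
	counts := pagesPerTopic(page2Topic)
	fmt.Println("topic 100:", counts[100], "topic 200:", counts[200])
}
```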

Types

type Filter

type Filter struct {
	IsWhitelist bool
	Parents     []uint32
	Dept        int
}

Filter represents a filter to be applied to the semantic graph before the transformation into assignment

type SemanticGraphSources

type SemanticGraphSources struct {
	Dumps            func(string) (io.ReadCloser, error)
	TopicAssignments map[uint32][]uint32
	Filters          []Filter
}

SemanticGraphSources represents the data sources needed to build the Wikipedia semantic graph.

func (SemanticGraphSources) Build

func (p SemanticGraphSources) Build(ctx context.Context) (g map[uint32][]uint32, ids2CatDistance map[uint32]uint32, namespace2Ids map[int]*roaring.Bitmap, err error)

Build returns the semantic graph, the distance in hops from each node to the closest topic, and a map from namespace IDs to page IDs.
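To make "distance in hops to the closest topic" concrete, here is a generic multi-source breadth-first search over an adjacency map of the same shape as g. It is not the package's implementation: hopsFromSources is a hypothetical helper, and it assumes edges are followed from the topic nodes outward, which has not been checked against the real graph orientation.

```go
package main

import "fmt"

// hopsFromSources runs a multi-source BFS and returns, for every
// reachable node, the number of hops from its nearest source.
// ASSUMPTION: edges in g are followed from the sources (topics)
// outward; unreachable nodes are simply absent from the result.
func hopsFromSources(g map[uint32][]uint32, sources []uint32) map[uint32]uint32 {
	dist := make(map[uint32]uint32, len(sources))
	queue := make([]uint32, 0, len(sources))
	for _, s := range sources {
		if _, seen := dist[s]; !seen {
			dist[s] = 0
			queue = append(queue, s)
		}
	}
	for len(queue) > 0 {
		n := queue[0]
		queue = queue[1:]
		for _, next := range g[n] {
			if _, seen := dist[next]; !seen {
				dist[next] = dist[n] + 1
				queue = append(queue, next)
			}
		}
	}
	return dist
}

func main() {
	// Made-up graph: nodes 1 and 5 play the role of topics.
	g := map[uint32][]uint32{1: {2, 3}, 2: {4}, 5: {4}}
	dist := hopsFromSources(g, []uint32{1, 5})
	fmt.Println("hops to node 4:", dist[4])
}
```

Starting the search from all topics at once visits each node a single time, at the moment its nearest topic reaches it, so no per-topic searches are needed.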

Directories

Path Synopsis
cmd
