Documentation
Overview
Package freddie imports the loan-level residential mortgage data provided by Freddie Mac into ClickHouse. The data is available here: https://www.freddiemac.com/research/datasets/sf-loanlevel-dataset.
The final result is a single table with nested arrays for time-varying fields. Features:
- The data is subject to QA. The results are presented as two string fields in a KeyVal format.
- A "DESCRIBE" of the output table provides info on each field.
- New fields created are:
  - vintage (e.g. 2010Q2)
  - standard: Y/N flag, Y = standard process loan
  - loan age based on first pay date
  - numeric dq field
  - reo flag
  - property value at origination
  - file names from which the loan was loaded
  - QA results. There are three sets of fields:
    - The nested table qa that has two arrays:
      - field. The name of a field that has validation issues.
      - cntFail. The number of months for which this field failed qa. For static fields, this value will be 1.
    - allFail. An array of field names which failed for qa. For monthly fields, this means the field failed for all months.
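The relationship between field, cntFail and allFail can be illustrated with a small sketch. This is not the package's code: the QASummary type, the Summarize function and the sample field names are hypothetical, and the sketch assumes that a field appears in qa.field whenever it fails at least once and in allFail when it fails in every month (a static field counts as a single month).

```go
package main

import (
	"fmt"
	"sort"
)

// QASummary mirrors the QA outputs described above. The type itself is
// hypothetical; only the names field, cntFail and allFail come from the text.
type QASummary struct {
	Field   []string // qa.field: fields with at least one failure
	CntFail []int    // qa.cntFail: number of failing months for each field
	AllFail []string // allFail: fields that failed in every month
}

// Summarize takes, for each field, one result per month (true = failed QA).
// A static field is represented by a single-element slice.
func Summarize(failed map[string][]bool) QASummary {
	names := make([]string, 0, len(failed))
	for name := range failed {
		names = append(names, name)
	}
	sort.Strings(names) // deterministic output order

	var out QASummary
	for _, name := range names {
		cnt := 0
		for _, f := range failed[name] {
			if f {
				cnt++
			}
		}
		if cnt == 0 {
			continue // field passed QA everywhere
		}
		out.Field = append(out.Field, name)
		out.CntFail = append(out.CntFail, cnt)
		if cnt == len(failed[name]) {
			out.AllFail = append(out.AllFail, name)
		}
	}
	return out
}

func main() {
	// Hypothetical field names and results, for illustration only.
	s := Summarize(map[string][]bool{
		"upb":  {false, true, true}, // monthly field, failed 2 of 3 months
		"fico": {true},              // static field, failed
		"dq":   {false, false, false},
	})
	fmt.Println(s.Field, s.CntFail, s.AllFail)
}
```

With this sample input the program prints `[fico upb] [1 2] [fico]`: the static field has a count of 1 and is the only entry in allFail.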
Command-line parameters:

    -host      ClickHouse IP address. Default: 127.0.0.1.
    -user      ClickHouse user. Default: default.
    -password  ClickHouse password for user. Default: <empty>.
    -table     ClickHouse table in which to insert the data.
    -create    if Y, then the table is created/reset. Default: Y.
    -dir       directory with Freddie Mac text files.
    -tmp       ClickHouse database to use for temporary tables.
    -concur    # of concurrent processes to use in loading monthly files. Default: 1.
    -memory    max memory usage by ClickHouse. Default: 40000000000.
    -groupby   max_bytes_before_external_group_by ClickHouse parameter. Default: 20000000000.
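As a rough illustration of how these parameters and defaults fit together, the sketch below declares them with Go's standard flag package. It is not the utility's actual source: the Config struct, the ParseArgs function, the empty defaults for -table, -dir and -tmp, and the Y/N check on -create are assumptions made for the example.

```go
package main

import (
	"flag"
	"fmt"
)

// Config holds the documented command-line parameters. The struct is
// hypothetical; the flag names and defaults are taken from the list above.
type Config struct {
	Host, User, Password string
	Table, Create        string
	Dir, Tmp             string
	Concur               int
	Memory, GroupBy      int64
}

// ParseArgs parses args (without the program name) into a Config.
func ParseArgs(args []string) (Config, error) {
	var c Config
	fs := flag.NewFlagSet("freddie", flag.ContinueOnError)
	fs.StringVar(&c.Host, "host", "127.0.0.1", "ClickHouse IP address")
	fs.StringVar(&c.User, "user", "default", "ClickHouse user")
	fs.StringVar(&c.Password, "password", "", "ClickHouse password for user")
	fs.StringVar(&c.Table, "table", "", "ClickHouse table in which to insert the data")
	fs.StringVar(&c.Create, "create", "Y", "if Y, the table is created/reset")
	fs.StringVar(&c.Dir, "dir", "", "directory with Freddie Mac text files")
	fs.StringVar(&c.Tmp, "tmp", "", "ClickHouse database for temporary tables")
	fs.IntVar(&c.Concur, "concur", 1, "# of concurrent processes for monthly files")
	fs.Int64Var(&c.Memory, "memory", 40000000000, "max memory usage by ClickHouse")
	fs.Int64Var(&c.GroupBy, "groupby", 20000000000, "external GROUP BY threshold in bytes")
	if err := fs.Parse(args); err != nil {
		return c, err
	}
	// Assumption: only Y and N are accepted for -create.
	if c.Create != "Y" && c.Create != "N" {
		return c, fmt.Errorf("-create must be Y or N, got %q", c.Create)
	}
	return c, nil
}

func main() {
	// Hypothetical table, directory and database names.
	c, err := ParseArgs([]string{"-table", "mtg.freddie", "-dir", "/data/freddie", "-tmp", "tmp"})
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("%+v\n", c)
}
```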
Since the standard and non-standard datasets have the same format, this utility can be used to create a table from either source. A combined table can be built by running the app twice with the same -table value: set -create Y on the first run and -create N on the second.
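The two-run pattern can be sketched as follows. The helper names, the binary name freddie, and the table, database and directory values are all hypothetical; the sketch only assembles the argument lists and does not launch anything.

```go
package main

import (
	"fmt"
	"strings"
)

// RunArgs builds the arguments for a single load from dir into table.
// create selects -create Y (create/reset the table) or -create N (append).
func RunArgs(table, tmp, dir string, create bool) []string {
	flagVal := "N"
	if create {
		flagVal = "Y"
	}
	return []string{"-table", table, "-tmp", tmp, "-dir", dir, "-create", flagVal}
}

// CombinedRuns returns one argument list per source directory. Only the
// first run creates the table; later runs insert into the same table.
func CombinedRuns(table, tmp string, dirs []string) [][]string {
	runs := make([][]string, 0, len(dirs))
	for i, dir := range dirs {
		runs = append(runs, RunArgs(table, tmp, dir, i == 0))
	}
	return runs
}

func main() {
	// Hypothetical directories holding the standard and non-standard files.
	dirs := []string{"/data/standard", "/data/nonstandard"}
	for _, args := range CombinedRuns("mtg.freddie", "tmp", dirs) {
		fmt.Println("freddie " + strings.Join(args, " "))
	}
}
```

Ordering matters here: if the second run also used -create Y, the table would be reset and the first load lost.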
Look at the example in the joined package for the DESCRIBE output of the table.
Note that the table produced by this package has slightly fewer loans than the check figures provided by Freddie. The difference seems to be that there are some loans in the static file that are not in the monthly file. With data through 2021Q3, this totals 1484 standard loans (HARP and non-HARP), and 207 non-standard loans.
Directories
Path | Synopsis |
---|---|
joined | Package joined joins the static and monthly tables created by the static and monthly packages. |
monthly | Package monthly loads a single quarter of monthly data into ClickHouse. |
static | Package static loads a single quarter of static data into ClickHouse. |