Copyright 2002 Daniel Egnor.  See the LICENSE file.

These were my personal notes kept while building the system, and mostly
have to do with file formats used in the indexing process.  Feel free to
peruse this if it amuses you.

--------------------------------------------------------------------------------

Step 1: Address records converted from TIGER

  record type - 'G'                     0:1
  zipcode                               1:5
  parity - E or O                       6:1
  name - e.g. NE Pennsylvania Ave       7:52
  address - e.g. 350                    59:11
  side - X (invalid), L or R            70:1
  long, e.g. -122123768                 71:10
  lat, e.g. 47164378                    81:9

Step 2: Names extracted from Step 1 and converted from FIPS

  record type - 'N'                     0:1
  name type - see below                 1:1
  name - e.g. NE Pennsylvania Ave       2:52
  start zipcode                         54:5
  start index offset                    59:10
  end zipcode                           69:5
  end index offset                      74:10

Step 2a: (same as step 2, just renamed and with A records)

Step 3: Names extracted from Step 2

  record type - 'R'                     0:1
  name type - see below                 1:1
  name - e.g. NE Pennsylvania Ave       2:52
  range offset                          54:10

Name types:

  E - street even side, e.g. NE Pennsylvania Ave
  O - street odd side, e.g. NE Pennsylvania Ave
  C - city, e.g. Washington
  S - state code, e.g. DC
  A - abbreviation or synonym

--------------------------------------------------------------------------------

Geo index file:

  zip_offset: int -> zip
  name_offset: int -> name
  end_offset: int -> end

  zip -> int[100000] -> addr

  addr -> {
    next_offset: int
    long, lat: int
    addr: int (/2)
    list: {
      addr: short (/2)
      long, lat: char
      side: char
    } []
  } []

  range -> {
    begin, end: int -> addr
  } []

  name -> {
    type: char
    name: char[40]
    range: int -> range
  } []

  end -> EOF

--------------------------------------------------------------------------------

Document

  record type - 'XD'                    0:2
  batch number                          2:9
  document number                       11:9
  URL                                   20:EOL

Location

  record type - 'XL'                    0:2
  batch number                          2:9
  document number                       11:9
  longitude, e.g. -122123768            20:10
  latitude, e.g. 47164378               30:9
  address text                          39:EOL

Region

  record type - 'XR'                    0:2
  scale exponent                        2:2
  east coordinate                       4:6
  north coordinate                      10:6
  batch number                          16:9
  document number                       25:9

Term

  record type - 'XT'                    0:2
  text                                  2:9
  batch number                          11:9
  document number                       20:9

--------------------------------------------------------------------------------

Document index file:

  region_offset: int -> region
  term_offset: int -> term
  end_offset: int -> location

  document -> {                                 1
    loc: int -> location                        1
    href: int -> string                         2
  } []

  # TODO: swap address to the top?
  location -> {                                 1
    longitude, latitude: int (*1000000)         1
    address: int -> string                      2
  } []

  # TODO: swap zone to the top?
  region -> {                                   1
    scale: char                                 1
    east, north: int                            1
    int -> zone                                 2
  } []

  # TODO: swap zone to the top?
  term -> {                                     1
    char[9]                                     1
    int -> zone                                 2
  } []

  string -> char []                             2

  zone -> int [] -> document                    2
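
--------------------------------------------------------------------------------

Reading the record layouts:

The offset:length columns in the record layouts above describe fixed-width
text fields, with EOL meaning the field runs to the end of the line. The
following is a minimal Python sketch of how such a line could be sliced apart,
not the code the system actually uses. The names parse_record, build_line and
to_degrees are hypothetical, the zipcode in the sample is a placeholder, and
space padding is an assumption (the notes do not say how fields are padded).
Coordinates are treated as degrees scaled by 1,000,000, following the
"int (*1000000)" note in the document index layout.

```python
# Field tables copied from the layouts above: name -> (offset, length).
# A length of None stands for "EOL" (field runs to the end of the line).
ADDRESS_FIELDS = {
    "type": (0, 1),
    "zipcode": (1, 5),
    "parity": (6, 1),
    "name": (7, 52),
    "address": (59, 11),
    "side": (70, 1),
    "long": (71, 10),
    "lat": (81, 9),
}

LOCATION_FIELDS = {
    "type": (0, 2),
    "batch": (2, 9),
    "document": (11, 9),
    "longitude": (20, 10),
    "latitude": (30, 9),
    "address_text": (39, None),
}


def parse_record(line, fields):
    """Slice one fixed-width line into a dict of whitespace-stripped strings."""
    line = line.rstrip("\n")
    record = {}
    for name, (offset, length) in fields.items():
        end = None if length is None else offset + length
        record[name] = line[offset:end].strip()
    return record


def build_line(values, fields):
    """Build a sample line (hypothetical helper).

    Assumes the table lists fields in order with no gaps, and pads each
    fixed-width field with trailing spaces (padding style is an assumption).
    """
    parts = []
    for name, (_offset, length) in fields.items():
        text = values[name]
        parts.append(text if length is None else text.ljust(length)[:length])
    return "".join(parts)


def to_degrees(fixed):
    """Convert a coordinate stored as degrees * 1,000,000 into degrees."""
    return int(fixed) / 1_000_000


# Sample values reuse the examples from the notes; the zipcode is a placeholder.
sample_address = {
    "type": "G", "zipcode": "12345", "parity": "E",
    "name": "NE Pennsylvania Ave", "address": "350", "side": "L",
    "long": "-122123768", "lat": "47164378",
}
address_line = build_line(sample_address, ADDRESS_FIELDS)
address = parse_record(address_line, ADDRESS_FIELDS)
print(address["name"], to_degrees(address["long"]), to_degrees(address["lat"]))

sample_location = {
    "type": "XL", "batch": "1", "document": "42",
    "longitude": "-122123768", "latitude": "47164378",
    "address_text": "350 NE Pennsylvania Ave",
}
location_line = build_line(sample_location, LOCATION_FIELDS)
location = parse_record(location_line, LOCATION_FIELDS)
print(location["address_text"])
```

Because parse_record strips whitespace on both sides of each slice, the sketch
works whether numeric fields are left- or right-justified in the real files.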