# gis
A Go service scaffold following [golang-standards/project-layout](https://github.com/golang-standards/project-layout),
with cleanly separated layers: HTTP transport → services → repositories, plus
RabbitMQ messaging and embedded database migrations. Single binary, three
subcommands.
## Layout
```
cmd/gis/                 binary entrypoint
internal/
  cli/                   cobra commands: serve, worker, migrate
  config/                env-based configuration
  app/                   composition root (wires all dependencies)
  domain/                entities, enums, sentinel errors
  repository/postgres/   pgx-backed repositories
  service/               business logic
  transport/http/        chi router, middleware, handlers
  storage/s3/            MinIO/S3 object storage
  messaging/rabbitmq/    connection, publisher, consumer
  platform/logger/       slog setup
pkg/httputil/            generic JSON/validation HTTP helpers
migrations/              embedded goose SQL migrations
configs/                 .env.example
deployments/             docker-compose (postgres, minio, rabbitmq)
build/package/           Dockerfile
api/openapi.yaml         OpenAPI 3.1.1 spec (embedded + served at /openapi.yaml)
```
## Domain
- **Category** — hierarchical (self-referencing `parent_id`). Full CRUD; cycle-safe
on update.
- **Dataset** — a geo file uploaded to S3/MinIO (`file_type`: `vector_with_kato`,
  `vector`, or `raster`), belonging to one Category. Carries
  `code`/`name`/`description`/`unit` metadata, a user-defined `meta` (JSONB) blob,
  an `automated` flag, a `status` lifecycle field (defaults to `pending`),
  `properties` (JSONB, populated from the file's attribute table), and a PostGIS
  `geometry` footprint stored in EPSG:4326 (returned as GeoJSON, with a
  STAC-style `bbox` array for rasters).
  Upload / list / get / download / delete (delete also removes the stored object).

  Uploads are validated three ways before being stored: the `file_type` enum, the
  file **extension** (must be allowed for the type), and a **content** magic-byte
  check (TIFF for `.tif`, ZIP for `.zip`, SQLite for `.gpkg`, JSON for `.geojson`),
  so mislabeled files are rejected with 422 up front.

  Every uploaded file is then processed asynchronously by the worker, dispatched by
  `file_type`:
  - **`vector`** — the attribute table is parsed and stored (as a JSON array of row
    objects) in `properties` (`status`: `processing` → `ready`).
  - **`raster`** — converted to a **Cloud-Optimized GeoTIFF** via
    `gdal_translate -of COG` (`processing` → `ready`); the COG is stored under
    `cog_storage_key` (the original is kept), and the footprint `geometry` and
    `bbox` are read from the raster extent. Requires GDAL in the worker image
    (`gdal-tools`).
  - **`vector_with_kato`** — the column-selection flow below (`parsing` →
    `awaiting_mapping` → `extracting` → `ready`).
- **events** and the example RabbitMQ consumer/publisher form a generic messaging
  scaffold kept alongside the real async flows.
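A minimal sketch of the three-way upload check in Go. The per-`file_type` extension table and the names `validateUpload`, `sniff` and `ErrUnsupportedFile` are assumptions for illustration (this README lists the formats but not the exact mapping), and `head` is taken to be the first bytes of the uploaded part. It needs Go 1.21+ for the `slices` package.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"path/filepath"
	"slices"
	"strings"
)

// ErrUnsupportedFile is a hypothetical sentinel the HTTP layer would map to 422.
var ErrUnsupportedFile = errors.New("unsupported file")

// allowedExt is an assumed per-file_type extension table.
var allowedExt = map[string][]string{
	"vector_with_kato": {".zip", ".geojson", ".gpkg"},
	"vector":           {".zip", ".geojson", ".gpkg"},
	"raster":           {".tif"},
}

// sniff reports whether head (the first bytes of the upload) looks like ext.
func sniff(ext string, head []byte) bool {
	switch ext {
	case ".tif":
		// Classic TIFF (magic 42) or BigTIFF (43), little- or big-endian.
		for _, sig := range []string{"II*\x00", "MM\x00*", "II+\x00", "MM\x00+"} {
			if bytes.HasPrefix(head, []byte(sig)) {
				return true
			}
		}
		return false
	case ".zip":
		return bytes.HasPrefix(head, []byte("PK\x03\x04")) // ZIP local file header
	case ".gpkg":
		return bytes.HasPrefix(head, []byte("SQLite format 3\x00")) // GeoPackage is SQLite
	case ".geojson":
		// Skip an optional UTF-8 BOM and leading whitespace; GeoJSON is a JSON object.
		rest := bytes.TrimLeft(bytes.TrimPrefix(head, []byte("\xEF\xBB\xBF")), " \t\r\n")
		return len(rest) > 0 && rest[0] == '{'
	default:
		return false
	}
}

// validateUpload applies the three checks in order: file_type enum, extension, content.
func validateUpload(fileType, filename string, head []byte) error {
	exts, ok := allowedExt[fileType]
	if !ok {
		return fmt.Errorf("%w: unknown file_type %q", ErrUnsupportedFile, fileType)
	}
	ext := strings.ToLower(filepath.Ext(filename))
	if !slices.Contains(exts, ext) {
		return fmt.Errorf("%w: extension %q not allowed for %s", ErrUnsupportedFile, ext, fileType)
	}
	if !sniff(ext, head) {
		return fmt.Errorf("%w: content does not match %s", ErrUnsupportedFile, ext)
	}
	return nil
}

func main() {
	fmt.Println(validateUpload("raster", "dem.tif", []byte("II*\x00\x08\x00\x00\x00")))
	fmt.Println(validateUpload("vector", "roads.geojson", []byte("PK\x03\x04")))
}
```

Sixteen bytes cover every signature here (the SQLite header is the longest). A ZIP signature only proves the container, so a stricter check might still look for the `.shp` member inside the archive.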
### vector_with_kato two-phase flow
Uploading a `vector_with_kato` file (zipped shapefile, GeoJSON, or GeoPackage)
triggers asynchronous parsing of its attribute table, after which the user maps
the KATO column and the year columns:
1. `POST /datasets` with `file_type=vector_with_kato` → dataset created with
   `status=parsing`; a `dataset.parse` job is published to RabbitMQ.
2. The **worker** consumes the job, parses the file's columns (with sample
   values; CP1251/Cyrillic aware for shapefiles) and stores them in
   `attribute_columns`; `status` → `awaiting_mapping` (or `failed` with
   `parse_error`).
3. The client polls `GET /datasets/{id}` (or long-polls
   `GET /datasets/{id}/status`) until `awaiting_mapping`, then submits
   `POST /datasets/{id}/mapping` with the chosen `kato_column` and a
   `year_columns` array (each entry `{column, date}`). The mapping is validated
   against the detected columns; `status` → `extracting`.
4. A second worker job **unpivots** the attribute table into long-format
   `dataset_observations`: one row per `(kato_code, date)` with a numeric
   `value` (or `value_text` for non-numeric cells); `status` → `ready`. Read
   them via `GET /datasets/{id}/observations` (paginated, optional
   `?kato_code=`).
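The statuses in this flow, together with `processing` for plain vector and raster uploads, behave like a small state machine. The transition table below is a hedged sketch: `Transition` and every edge marked "assumed" are illustrative and not taken from the service, and `ready`/`failed` are treated as terminal by assumption.

```go
package main

import "fmt"

// Status mirrors the dataset status values named in this README.
type Status string

const (
	Pending         Status = "pending"
	Processing      Status = "processing"
	Parsing         Status = "parsing"
	AwaitingMapping Status = "awaiting_mapping"
	Extracting      Status = "extracting"
	Ready           Status = "ready"
	Failed          Status = "failed"
)

// next lists the allowed successor states; Ready and Failed have no entry,
// so they are terminal in this sketch.
var next = map[Status][]Status{
	Pending:         {Processing, Parsing}, // assumed
	Processing:      {Ready, Failed},       // vector and raster jobs; Failed assumed
	Parsing:         {AwaitingMapping, Failed},
	AwaitingMapping: {Extracting},    // entered via POST /datasets/{id}/mapping
	Extracting:      {Ready, Failed}, // Failed assumed
}

// Transition returns an error unless from -> to is an allowed edge.
func Transition(from, to Status) error {
	for _, s := range next[from] {
		if s == to {
			return nil
		}
	}
	return fmt.Errorf("invalid status transition %s -> %s", from, to)
}

func main() {
	// A mapping submitted while the file is still being parsed is rejected.
	fmt.Println(Transition(Parsing, Extracting))
	fmt.Println(Transition(AwaitingMapping, Extracting))
}
```

Guarding the mapping endpoint with `Transition(current, Extracting)` is one way to reject a mapping submitted before parsing has finished.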
```sh
curl -X POST localhost:8080/datasets/<id>/mapping \
  -H 'Content-Type: application/json' \
  -d '{
    "kato_column": "като",
    "year_columns": [
      {"column": "F_2023", "date": "2023-01-01"},
      {"column": "D_2025", "date": "2025-01-01"}
    ]
  }'
```
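Steps 3 and 4 (validating the mapping, then unpivoting) can be sketched as below. The structs mirror the JSON body of the mapping request; representing each parsed row as a `map[string]string` of cell text, skipping empty cells and rows without a KATO code, and the names `Validate`, `Unpivot` and `Observation` are all assumptions for illustration.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

// YearColumn is one entry of the year_columns array in the mapping request.
type YearColumn struct {
	Column string `json:"column"`
	Date   string `json:"date"`
}

// Mapping mirrors the POST /datasets/{id}/mapping body.
type Mapping struct {
	KatoColumn  string       `json:"kato_column"`
	YearColumns []YearColumn `json:"year_columns"`
}

// Observation is one long-format row; exactly one of Value and ValueText is set.
type Observation struct {
	KatoCode  string
	Date      time.Time
	Value     *float64
	ValueText *string
}

const dateLayout = "2006-01-02"

// Validate checks the mapping against the detected attribute_columns.
func (m Mapping) Validate(detected []string) error {
	known := make(map[string]bool, len(detected))
	for _, c := range detected {
		known[c] = true
	}
	if !known[m.KatoColumn] {
		return fmt.Errorf("unknown kato_column %q", m.KatoColumn)
	}
	for _, yc := range m.YearColumns {
		if !known[yc.Column] {
			return fmt.Errorf("unknown year column %q", yc.Column)
		}
		if _, err := time.Parse(dateLayout, yc.Date); err != nil {
			return fmt.Errorf("bad date for %q: %w", yc.Column, err)
		}
	}
	return nil
}

// Unpivot turns wide rows (column name -> cell text) into one observation per
// (kato_code, date). Empty cells and rows without a KATO code are skipped.
func Unpivot(rows []map[string]string, m Mapping) ([]Observation, error) {
	var out []Observation
	for _, row := range rows {
		kato := strings.TrimSpace(row[m.KatoColumn])
		if kato == "" {
			continue
		}
		for _, yc := range m.YearColumns {
			cell := strings.TrimSpace(row[yc.Column])
			if cell == "" {
				continue
			}
			date, err := time.Parse(dateLayout, yc.Date)
			if err != nil {
				return nil, err
			}
			obs := Observation{KatoCode: kato, Date: date}
			if v, err := strconv.ParseFloat(cell, 64); err == nil {
				obs.Value = &v // numeric cell -> value
			} else {
				obs.ValueText = &cell // anything else -> value_text
			}
			out = append(out, obs)
		}
	}
	return out, nil
}

func main() {
	m := Mapping{KatoColumn: "като", YearColumns: []YearColumn{
		{Column: "F_2023", Date: "2023-01-01"},
		{Column: "D_2025", Date: "2025-01-01"},
	}}
	if err := m.Validate([]string{"като", "F_2023", "D_2025"}); err != nil {
		panic(err)
	}
	// Made-up sample row: one numeric cell, one non-numeric cell.
	rows := []map[string]string{{"като": "000000000", "F_2023": "1520.5", "D_2025": "n/a"}}
	obs, err := Unpivot(rows, m)
	if err != nil {
		panic(err)
	}
	for _, o := range obs {
		fmt.Println(o.KatoCode, o.Date.Format(dateLayout), o.Value != nil, o.ValueText != nil)
	}
}
```

`strconv.ParseFloat` only accepts `.` as the decimal separator, so under this sketch a cell like `1,5` would land in `value_text`.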
## Getting started
```sh
cp configs/.env.example .env
docker compose -f deployments/docker-compose.yml up -d postgres minio rabbitmq
go run ./cmd/gis migrate up # apply migrations
go run ./cmd/gis serve # HTTP server on :8080
go run ./cmd/gis worker --publish-example # consume (and seed one message)
```
Health: `GET /healthz` (liveness), `GET /readyz` (DB + S3 + RabbitMQ).
### HTTP API
The API is described by an **OpenAPI 3.1.1** spec at
[`api/openapi.yaml`](api/openapi.yaml), embedded into the binary. While the
server runs it is served at `/openapi.yaml`, with an interactive **Redoc** UI at
`/docs`.

| Method | Path | Description |
|--------|----------------------------|--------------------------------------|
| GET | `/categories` | list (optional `?parent_id=`) |
| POST | `/categories` | create (`name`, `description`, `parent_id?`) |
| GET | `/categories/{id}` | get |
| PUT | `/categories/{id}` | update |
| DELETE | `/categories/{id}` | delete |
| GET | `/datasets` | paginated list of summaries (`?page=`, `?page_size=`, `?category_id=`) |
| POST | `/datasets` | upload (multipart: `file`, `file_type`, `category_id`, `code`, `name`, `description?`, `unit?`, `meta?` (JSON), `automated?` (bool)) |
| GET | `/datasets/{id}` | full dataset (geometry as GeoJSON, `bbox` for rasters) |
| GET | `/datasets/{id}/status` | processing status; long-polls with `?current=<status>` (holds up to `?wait=` secs, default 25, max 60) |
| GET | `/datasets/{id}/download` | download the stored file |
| POST | `/datasets/{id}/mapping` | set KATO column + year→date map (vector_with_kato) |
| GET | `/datasets/{id}/observations` | paginated unpivoted values (`?kato_code=`, `?page=`, `?page_size=`) |
| DELETE | `/datasets/{id}` | delete (row + object) |

Example upload:
```sh
curl -X POST localhost:8080/datasets \
-F file=@sample.geojson -F file_type=vector -F category_id=<uuid> \
-F code=POP_2026 -F name=Population -F description="Resident population" -F unit=people
```
## Migrations
Embedded via goose and run through the binary. The first migration enables the
PostGIS extension (the database runs the `postgis/postgis` image), so a
PostGIS-capable Postgres is required.
```sh
go run ./cmd/gis migrate up|down|status|reset
go run ./cmd/gis migrate fresh # drop everything in the schema and re-run
```
> On Apple Silicon, `postgis/postgis` has no native arm64 build, so the compose
> file pins `platform: linux/amd64` (Docker Desktop emulates it). Remove that line
> on amd64 hosts.
## Development
Common tasks are wrapped in the `Makefile` (run `make help` for the full list):
```sh
make up # start postgres, minio, rabbitmq
make migrate-fresh # drop the schema and re-apply migrations
make run # run the HTTP server
make check # go vet + go test
make lint # golangci-lint (if installed)
```
CI (`.github/workflows/ci.yml`) runs build, vet, `go test -race`, and golangci-lint
on every push and pull request.
## Adding a feature
Each new domain is one vertical slice mirroring Category/Dataset:
`domain/` → `repository/postgres/` → `service/` → `transport/http/`
(+ `messaging/rabbitmq/` if it needs async processing), wired in `internal/app`.