# gis

A Go service scaffold following [golang-standards/project-layout](https://github.com/golang-standards/project-layout),
with cleanly separated layers: HTTP transport → services → repositories, plus
RabbitMQ messaging and embedded database migrations. Single binary, three
subcommands.

## Layout

```
cmd/gis/                 binary entrypoint
internal/
  cli/                   cobra commands: serve, worker, migrate
  config/                env-based configuration
  app/                   composition root (wires all dependencies)
  domain/                entities, enums, sentinel errors
  repository/postgres/   pgx-backed repositories
  service/               business logic
  transport/http/        chi router, middleware, handlers
  storage/s3/            MinIO/S3 object storage
  messaging/rabbitmq/    connection, publisher, consumer
  platform/logger/       slog setup
pkg/httputil/            generic JSON/validation HTTP helpers
migrations/              embedded goose SQL migrations
configs/                 .env.example
deployments/             docker-compose (postgres, minio, rabbitmq)
build/package/           Dockerfile
api/openapi.yaml         OpenAPI 3.1.1 spec (embedded + served at /openapi.yaml)
```

## Domain

- **Category** — hierarchical (self-referencing `parent_id`). Full CRUD; updates are
  cycle-safe (a category cannot be moved under itself or one of its descendants).
- **Dataset** — a geo file uploaded to S3/MinIO (`file_type`: `vector_with_kato |
  vector | raster`), belonging to one Category. Carries `code`/`name`/`description`/
  `unit` metadata, a user-defined `meta` (JSONB) blob, an `automated` flag, a
  `status` lifecycle field (defaults to `pending`), `properties` (JSONB, populated
  from the file's attribute table), and a PostGIS `geometry` footprint stored in
  EPSG:4326 (returned as GeoJSON, with a STAC-style `bbox` array for rasters).
  Upload / list / get / download / delete (delete also removes the stored object).
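
Cycle safety on category update amounts to refusing any `parent_id` that would make a category its own ancestor. A minimal sketch of that check, assuming the parent links are available as an in-memory map (the service itself presumably walks `parent_id` in Postgres, for example with a recursive query, which is not shown here); `checkParent` and `ErrCycle` are hypothetical names:

```go
package main

import (
	"errors"
	"fmt"
)

// ErrCycle is a hypothetical sentinel for a parent change that would create a loop.
var ErrCycle = errors.New("category: parent would create a cycle")

// checkParent reports whether making newParent the parent of id keeps the tree acyclic.
// parents maps each category ID to its current parent ID ("" for a root). IDs missing
// from the map are treated as roots; checking that newParent exists is left to the caller.
func checkParent(parents map[string]string, id, newParent string) error {
	// Walk up from the proposed parent; reaching id means id would become its own
	// ancestor. The seen set also stops the walk if stored data already loops.
	seen := map[string]bool{}
	for cur := newParent; cur != ""; cur = parents[cur] {
		if cur == id || seen[cur] {
			return ErrCycle
		}
		seen[cur] = true
	}
	return nil
}

func main() {
	// Tree: root <- a <- b (b's parent is a, a's parent is root).
	parents := map[string]string{"root": "", "a": "root", "b": "a"}
	fmt.Println(checkParent(parents, "a", "b"))    // rejected: b is a descendant of a
	fmt.Println(checkParent(parents, "b", "root")) // <nil>: moving b up is fine
}
```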

Uploads are validated three ways before being stored: the `file_type` enum, the
file **extension** (must be allowed for the type), and a **content** magic-byte
check (TIFF for `.tif`, ZIP for `.zip`, SQLite for `.gpkg`, JSON for `.geojson`)
so mislabeled files are rejected with 422 up front.
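
The content check only needs the first few bytes of each upload. A sketch of such a sniffer, using the standard signatures for these formats: the classic and BigTIFF headers for `.tif`, the ZIP local-file-header marker for `.zip`, the 16-byte SQLite header string for `.gpkg`, and an opening `{` (after an optional BOM and whitespace) for `.geojson`, since a GeoJSON document is a JSON object. `sniff` and `ErrMismatch` are hypothetical names, and this is not necessarily how the service implements the check.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"path/filepath"
	"strings"
)

// ErrMismatch is a hypothetical error for content that does not match its extension.
var ErrMismatch = errors.New("upload: content does not match file extension")

// sniff checks the leading bytes of an upload against the format its extension claims.
func sniff(filename string, head []byte) error {
	switch strings.ToLower(filepath.Ext(filename)) {
	case ".tif":
		// Classic TIFF ("II*\0" / "MM\0*") or BigTIFF ("II+\0" / "MM\0+").
		for _, sig := range []string{"II*\x00", "MM\x00*", "II+\x00", "MM\x00+"} {
			if bytes.HasPrefix(head, []byte(sig)) {
				return nil
			}
		}
	case ".zip":
		// Local file header signature of a non-empty ZIP archive.
		if bytes.HasPrefix(head, []byte("PK\x03\x04")) {
			return nil
		}
	case ".gpkg":
		// A GeoPackage is an SQLite 3 database file.
		if bytes.HasPrefix(head, []byte("SQLite format 3\x00")) {
			return nil
		}
	case ".geojson":
		// Skip an optional UTF-8 BOM and leading whitespace, then expect a JSON object.
		trimmed := bytes.TrimLeft(bytes.TrimPrefix(head, []byte("\xef\xbb\xbf")), " \t\r\n")
		if len(trimmed) > 0 && trimmed[0] == '{' {
			return nil
		}
	default:
		return fmt.Errorf("upload: unsupported extension %q", filepath.Ext(filename))
	}
	return ErrMismatch
}

func main() {
	fmt.Println(sniff("dem.tif", []byte("II*\x00\x08\x00\x00\x00")))
	fmt.Println(sniff("regions.gpkg", []byte("PK\x03\x04"))) // a ZIP renamed to .gpkg
}
```

A handler would read a small prefix of the uploaded file, pass it to the sniffer, and answer `422` on a mismatch before anything reaches object storage.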

Every uploaded file is then processed asynchronously by the worker, dispatched by
`file_type`:

- **`vector`** — the attribute table is parsed and stored in `properties` as a JSON
  array of row objects (`processing` → `ready`).
- **`raster`** — converted to a **Cloud-Optimized GeoTIFF** via `gdal_translate
  -of COG` (`processing` → `ready`); the COG is stored under `cog_storage_key`
  (the original is kept) and the footprint `geometry` + `bbox` are read from the
  raster extent. Requires GDAL in the worker image (`gdal-tools`).
- **`vector_with_kato`** — the column-selection flow below (`parsing` →
  `awaiting_mapping` → `extracting` → `ready`).

Outside this dispatch, **events** and the example RabbitMQ consumer/publisher are a
generic messaging scaffold kept alongside the real async flows.
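
The per-type status paths above amount to a small state machine. A sketch of a transition table that a service could consult before each status write; the Go names are hypothetical, `pending` → `processing` for `vector`/`raster` is inferred from the `pending` default, and allowing `failed` from every in-flight state except `awaiting_mapping` is an assumption (the README documents `failed` only for parsing):

```go
package main

import "fmt"

// Status values named in this README; the Go type itself is hypothetical.
type Status string

const (
	Pending         Status = "pending"
	Processing      Status = "processing"
	Parsing         Status = "parsing"
	AwaitingMapping Status = "awaiting_mapping"
	Extracting      Status = "extracting"
	Ready           Status = "ready"
	Failed          Status = "failed"
)

// transitions lists the allowed next states; ready and failed have none (terminal).
var transitions = map[Status][]Status{
	Pending:         {Processing, Failed},      // vector, raster
	Processing:      {Ready, Failed},           // vector, raster
	Parsing:         {AwaitingMapping, Failed}, // vector_with_kato, phase 1
	AwaitingMapping: {Extracting},              // mapping submitted by the client
	Extracting:      {Ready, Failed},           // vector_with_kato, phase 2
}

// canTransition reports whether a dataset may move from one status to another.
func canTransition(from, to Status) bool {
	for _, next := range transitions[from] {
		if next == to {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(canTransition(Parsing, AwaitingMapping)) // true
	fmt.Println(canTransition(Ready, Processing))        // false: ready is terminal here
}
```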

### vector_with_kato two-phase flow

Uploading a `vector_with_kato` file (zipped shapefile, GeoJSON, or GeoPackage)
triggers asynchronous parsing of its attribute table, after which the user maps
the KATO column and the year columns:

1. `POST /datasets` with `file_type=vector_with_kato` → dataset created with
   `status=parsing`; a `dataset.parse` job is published to RabbitMQ.
2. The **worker** consumes the job, parses the file's columns (with sample
   values; CP1251/Cyrillic aware for shapefiles) and stores them in
   `attribute_columns`; `status` → `awaiting_mapping` (or `failed` with
   `parse_error`).
3. The client polls `GET /datasets/{id}` (or long-polls `GET /datasets/{id}/status`)
   until the status is `awaiting_mapping`, then submits `POST /datasets/{id}/mapping`
   with the chosen `kato_column` and a `year_columns` array of `{column, date}`
   objects. Both are validated against the detected columns; `status` → `extracting`.
4. A second worker job **unpivots** the attribute table into long-format
   `dataset_observations` — one row per `(kato_code, date)` with a numeric
   `value` (or `value_text` for non-numeric cells); `status` → `ready`. Read
   them via `GET /datasets/{id}/observations` (paginated, optional
   `?kato_code=`).
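
Step 4's unpivot turns each wide row into one observation per mapped year column. A minimal sketch over rows already decoded into column → cell maps; the types are hypothetical, the sample values are made up, and skipping rows without a KATO code and skipping blank cells are assumptions. Cells that do not parse as numbers fall back to `value_text`, as described above.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// YearColumn maps a wide-format column to the date its values refer to.
type YearColumn struct {
	Column string
	Date   string // ISO date, e.g. "2023-01-01"
}

// Observation is one long-format row: a value for one KATO code on one date.
// Exactly one of Value / ValueText is set.
type Observation struct {
	KatoCode  string
	Date      string
	Value     *float64
	ValueText *string
}

// unpivot converts attribute rows (column name -> cell text) into observations.
func unpivot(rows []map[string]string, katoColumn string, years []YearColumn) []Observation {
	var out []Observation
	for _, row := range rows {
		kato := strings.TrimSpace(row[katoColumn])
		if kato == "" {
			continue // assumption: rows without a KATO code are skipped
		}
		for _, yc := range years {
			cell := strings.TrimSpace(row[yc.Column])
			if cell == "" {
				continue // assumption: empty cells produce no observation
			}
			obs := Observation{KatoCode: kato, Date: yc.Date}
			if v, err := strconv.ParseFloat(cell, 64); err == nil {
				obs.Value = &v
			} else {
				text := cell
				obs.ValueText = &text
			}
			out = append(out, obs)
		}
	}
	return out
}

func main() {
	rows := []map[string]string{
		{"като": "751110000", "F_2023": "1523.5", "D_2025": "n/a"},
	}
	years := []YearColumn{{"F_2023", "2023-01-01"}, {"D_2025", "2025-01-01"}}
	for _, o := range unpivot(rows, "като", years) {
		if o.Value != nil {
			fmt.Println(o.KatoCode, o.Date, *o.Value)
		} else {
			fmt.Println(o.KatoCode, o.Date, "text:", *o.ValueText)
		}
	}
}
```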

```sh
curl -X POST localhost:8080/datasets/<id>/mapping -H 'Content-Type: application/json' -d '{
  "kato_column": "като",
  "year_columns": [
    {"column": "F_2023", "date": "2023-01-01"},
    {"column": "D_2025", "date": "2025-01-01"}
  ]
}'
```
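
Validating that request amounts to decoding it and checking every referenced column against the detected `attribute_columns`. A sketch under stated assumptions: the type and function names are hypothetical, dates are expected as `YYYY-MM-DD` as in the example, and rejecting duplicate columns or reuse of the KATO column as a year column are choices made here, not documented behavior.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
	"time"
)

// MappingRequest mirrors the JSON body of POST /datasets/{id}/mapping.
type MappingRequest struct {
	KatoColumn  string        `json:"kato_column"`
	YearColumns []YearMapping `json:"year_columns"`
}

// YearMapping assigns a date to one wide-format column.
type YearMapping struct {
	Column string `json:"column"`
	Date   string `json:"date"` // expected as YYYY-MM-DD
}

// parseMapping decodes a request body.
func parseMapping(body string) (MappingRequest, error) {
	var req MappingRequest
	err := json.Unmarshal([]byte(body), &req)
	return req, err
}

// validateMapping checks a request against the columns detected during parsing
// and reports every problem at once rather than stopping at the first.
func validateMapping(req MappingRequest, detected []string) error {
	known := make(map[string]bool, len(detected))
	for _, c := range detected {
		known[c] = true
	}
	var problems []string
	if !known[req.KatoColumn] {
		problems = append(problems, fmt.Sprintf("unknown kato_column %q", req.KatoColumn))
	}
	if len(req.YearColumns) == 0 {
		problems = append(problems, "year_columns must not be empty")
	}
	seen := make(map[string]bool, len(req.YearColumns))
	for _, yc := range req.YearColumns {
		switch {
		case !known[yc.Column]:
			problems = append(problems, fmt.Sprintf("unknown column %q", yc.Column))
		case yc.Column == req.KatoColumn:
			problems = append(problems, fmt.Sprintf("column %q is already the kato_column", yc.Column))
		case seen[yc.Column]:
			problems = append(problems, fmt.Sprintf("column %q is mapped twice", yc.Column))
		}
		seen[yc.Column] = true
		if _, err := time.Parse("2006-01-02", yc.Date); err != nil {
			problems = append(problems, fmt.Sprintf("column %q: invalid date %q", yc.Column, yc.Date))
		}
	}
	if len(problems) > 0 {
		return errors.New(strings.Join(problems, "; "))
	}
	return nil
}

func main() {
	req, err := parseMapping(`{"kato_column": "като", "year_columns": [
		{"column": "F_2023", "date": "2023-01-01"},
		{"column": "D_2025", "date": "2025-01-01"}]}`)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(validateMapping(req, []string{"като", "F_2023", "D_2025"})) // <nil>
}
```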

## Getting started

```sh
cp configs/.env.example .env
docker compose -f deployments/docker-compose.yml up -d postgres minio rabbitmq

go run ./cmd/gis migrate up        # apply migrations
go run ./cmd/gis serve             # HTTP server on :8080
go run ./cmd/gis worker --publish-example   # consume (and seed one message)
```

Health: `GET /healthz` (liveness), `GET /readyz` (DB + S3 + RabbitMQ).

### HTTP API

The API is described by an **OpenAPI 3.1.1** spec at
[`api/openapi.yaml`](api/openapi.yaml), embedded into the binary. While the
server runs it is served at `/openapi.yaml`, with an interactive **Redoc** UI at
`/docs`.

| Method | Path                       | Description                          |
|--------|----------------------------|--------------------------------------|
| GET    | `/categories`              | list (optional `?parent_id=`)        |
| POST   | `/categories`              | create (`name`, `description`, `parent_id?`) |
| GET    | `/categories/{id}`         | get                                  |
| PUT    | `/categories/{id}`         | update                               |
| DELETE | `/categories/{id}`         | delete                               |
| GET    | `/datasets`                | paginated list of summaries (`?page=`, `?page_size=`, `?category_id=`) |
| POST   | `/datasets`                | upload (multipart: `file`, `file_type`, `category_id`, `code`, `name`, `description?`, `unit?`, `meta?` (JSON), `automated?` (bool)) |
| GET    | `/datasets/{id}`           | full dataset (geometry as GeoJSON, `bbox` for rasters) |
| GET    | `/datasets/{id}/status`    | processing status; long-polls with `?current=<status>` (holds up to `?wait=` secs, default 25, max 60) |
| GET    | `/datasets/{id}/download`  | download the stored file             |
| POST   | `/datasets/{id}/mapping`   | set KATO column + year-column → date list (`vector_with_kato` only) |
| GET    | `/datasets/{id}/observations` | paginated unpivoted values (`?kato_code=`, `?page=`, `?page_size=`) |
| DELETE | `/datasets/{id}`           | delete (row + object)                |

Example upload:

```sh
curl -X POST localhost:8080/datasets \
  -F file=@sample.geojson -F file_type=vector -F category_id=<uuid> \
  -F code=POP_2026 -F name=Population -F description="Resident population" -F unit=people
```

## Migrations

Embedded via goose and run through the binary. The first migration enables the
PostGIS extension (the database runs the `postgis/postgis` image), so a
PostGIS-capable Postgres is required.

```sh
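go run ./cmd/gis migrate up       # apply all pending migrations
go run ./cmd/gis migrate down     # roll back the most recent migration
go run ./cmd/gis migrate status   # show applied and pending migrations
go run ./cmd/gis migrate reset    # roll back all migrations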
go run ./cmd/gis migrate fresh    # drop everything in the schema and re-run
```

> On Apple Silicon, `postgis/postgis` has no native arm64 build, so the compose
> file pins `platform: linux/amd64` (Docker Desktop emulates it). Remove that line
> on amd64 hosts.

## Development

Common tasks are wrapped in the `Makefile` (run `make help` for the full list):

```sh
make up            # start postgres, minio, rabbitmq
make migrate-fresh # drop the schema and re-apply migrations
make run           # run the HTTP server
make check         # go vet + go test
make lint          # golangci-lint (if installed)
```

CI (`.github/workflows/ci.yml`) runs build, vet, `go test -race`, and golangci-lint
on every push and pull request.

## Adding a feature

Each new domain is one vertical slice mirroring Category/Dataset:
`domain/` → `repository/postgres/` → `service/` → `transport/http/`
(+ `messaging/rabbitmq/` if it needs async processing), wired in `internal/app`.