diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml index af4129437b8c45c737584d563896584d391b70cc..26aaffd5b4841ecccb9989f7ac816952d0d4e716 100644 --- a/.github/workflows/main.yml +++ b/.github/workflows/main.yml @@ -153,6 +153,7 @@ jobs: scripts/convert_image.py scripts/download_colmap.bat scripts/download_ffmpeg.bat + scripts/nerfcapture2nerf.py scripts/nsvf2nerf.py scripts/record3d2nerf.py diff --git a/docs/nerf_dataset_tips.md b/docs/nerf_dataset_tips.md index de311e80e227caca34fc73203ef42bcf433001d0..3ed090859368a0d85d952a86a472aa282bc43ab5 100644 --- a/docs/nerf_dataset_tips.md +++ b/docs/nerf_dataset_tips.md @@ -53,12 +53,12 @@ See [nerf_loader.cu](src/nerf_loader.cu) for implementation details and addition ## Preparing new NeRF datasets -To train on self-captured data, one has to process the data into an existing format supported by Instant-NGP. We provide scripts to support two complementary approaches: +To train on self-captured data, one has to process the data into an existing format supported by __instant-ngp__. We provide scripts to support three approaches: - [COLMAP](#COLMAP) to create a dataset from a set of photos or a video you took - - [Record3D](#Record3D) to create a dataset with an iPhone 12 Pro or newer (based on ARKit) +- [NeRFCapture](#NeRFCapture) to create a dataset or stream posed images directly to __instant-ngp__ with an iOS device. -Both require [Python](https://www.python.org/) 3.7 or higher to be installed and available in your PATH. +All require [Python](https://www.python.org/) 3.7 or higher to be installed and available in your PATH. On Windows you can [download an installer from here](https://www.python.org/downloads/). During installation, make sure to check "add python.exe to PATH". @@ -104,14 +104,14 @@ The script will run (and install, if you use Windows) FFmpeg and COLMAP as neede By default, the script invokes colmap with the "sequential matcher", which is suitable for images taken from a smoothly changing camera path, as in a video. The exhaustive matcher is more appropriate if the images are in no particular order, as shown in the image example above. For more options, you can run the script with `--help`. For more advanced uses of COLMAP or for challenging scenes, please see the [COLMAP documentation](https://colmap.github.io/cli.html); you may need to modify the [scripts/colmap2nerf.py](/scripts/colmap2nerf.py) script itself. -The `aabb_scale` parameter is the most important `instant-ngp` specific parameter. It specifies the extent of the scene, defaulting to 1; that is, the scene is scaled such that the camera positions are at an average distance of 1 unit from the origin. For small synthetic scenes such as the original NeRF dataset, the default `aabb_scale` of 1 is ideal and leads to fastest training. The NeRF model makes the assumption that the training images can entirely be explained by a scene contained within this bounding box. However, for natural scenes where there is a background that extends beyond this bounding box, the NeRF model will struggle and may hallucinate "floaters" at the boundaries of the box. By setting `aabb_scale` to a larger power of 2 (up to a maximum of 128), the NeRF model will extend rays to a much larger bounding box. Note that this can impact training speed slightly. If in doubt, for natural scenes, start with an `aabb_scale` of 128, and subsequently reduce it if possible. 
The value can be directly edited in the `transforms.json` output file, without re-running the [scripts/colmap2nerf.py](/scripts/colmap2nerf.py) script. +The `aabb_scale` parameter is the most important __instant-ngp__-specific parameter. It specifies the extent of the scene, defaulting to 1; that is, the scene is scaled such that the camera positions are at an average distance of 1 unit from the origin. For small synthetic scenes such as the original NeRF dataset, the default `aabb_scale` of 1 is ideal and leads to fastest training. The NeRF model makes the assumption that the training images can entirely be explained by a scene contained within this bounding box. However, for natural scenes where there is a background that extends beyond this bounding box, the NeRF model will struggle and may hallucinate "floaters" at the boundaries of the box. By setting `aabb_scale` to a larger power of 2 (up to a maximum of 128), the NeRF model will extend rays to a much larger bounding box. Note that this can impact training speed slightly. If in doubt, for natural scenes, start with an `aabb_scale` of 128, and subsequently reduce it if possible. The value can be directly edited in the `transforms.json` output file, without re-running the [scripts/colmap2nerf.py](/scripts/colmap2nerf.py) script. You can optionally pass in object categories (e.g. `--mask_categories person car`) which runs [Detectron2](https://github.com/facebookresearch/detectron2) to generate masks automatically. __instant-ngp__ will not use the masked pixels for training. This utility is helpful for users who wish to ignore moving or sensitive objects such as people, cars, or bikes. See [scripts/category2id.json](/scripts/category2id.json) for a list of categories. -Assuming success, you can now train your NeRF model as follows, starting in the `instant-ngp` folder: +Assuming success, you can now train your NeRF model as follows, starting in the __instant-ngp__ folder: ```sh instant-ngp$ ./instant-ngp [path to training data folder containing transforms.json] @@ -119,7 +119,7 @@ instant-ngp$ ./instant-ngp [path to training data folder containing transforms.j ### Record3D -With an >=iPhone 12 Pro, one can use [Record3D](https://record3d.app/) to collect data and avoid COLMAP. [Record3D](https://record3d.app/) is an iOS app that relies on ARKit to estimate each image's camera pose. It is more robust than COLMAP for scenes that lack textures or contain repetitive patterns. To train Instant-NGPs with Record3D data, follow these steps: +With an iPhone 12 Pro or newer, one can use [Record3D](https://record3d.app/) to collect data and avoid COLMAP. [Record3D](https://record3d.app/) is an iOS app that relies on ARKit to estimate each image's camera pose. It is more robust than COLMAP for scenes that lack textures or contain repetitive patterns. To train __instant-ngp__ with Record3D data, follow these steps: 1. Record a video and export with the "Shareable/Internal format (.r3d)". 2. Send the exported data to your computer. @@ -130,14 +130,43 @@ With an >=iPhone 12 Pro, one can use [Record3D](https://record3d.app/) to collec ``` If you capture the scene in the landscape orientation, add `--rotate`. -5. Launch Instant-NGP training: +5. Launch __instant-ngp__ training: ``` instant-ngp$ ./instant-ngp path/to/data ``` + +### NeRFCapture + +[NeRFCapture](https://github.com/jc211/NeRFCapture) is an iOS app that runs on any ARKit device. It allows you to stream images directly from your phone to __instant-ngp__, thus enabling a more interactive experience.
It can also collect an offline dataset for later use. + +The following dependencies are needed to run the NeRFCapture script: +``` +pip install cyclonedds +``` + +To stream: +1. Open the NeRFCapture app. +2. Run the script with the `--stream` flag. + ``` + instant-ngp$ python scripts/nerfcapture2nerf.py --stream + ``` +3. Wait for the connection between the app and the script to occur. This will be indicated on the app. +4. Click the send button on the app. The frame captured will be sent to __instant-ngp__. +5. Toggle training. + +To save a dataset: +1. Open the NeRFCapture app. +2. Run the script with the `--save_path` flag. The `--n_frames` flag indicates how many frames to capture before saving the dataset. + ``` + instant-ngp$ python scripts/nerfcapture2nerf.py --save_path "dir1/dir2" --n_frames 20 + ``` +3. Wait for the connection between the app and the script to occur. This will be indicated on the app. +4. Click the send button on the app. The frame captured will be saved to the dataset folder on the computer running the script. + ## Tips for NeRF training data -The NeRF model trains best with between 50-150 images which exhibit minimal scene movement, motion blur or other blurring artefacts. The quality of reconstruction is predicated on COLMAP being able to extract accurate camera parameters from the images. +The NeRF model trains best with between 50 and 150 images which exhibit minimal scene movement, motion blur or other blurring artifacts. The quality of reconstruction is predicated on the scripts above being able to extract accurate camera parameters from the images. Review the earlier sections for information on how to verify this. -The `colmap2nerf.py` script assumes that the training images are all pointing approximately at a shared point of interest, which it places at the origin. This point is found by taking a weighted average of the closest points of approach between the rays through the central pixel of all pairs of training images. In practice, this means that the script works best when the training images have been captured pointing inwards towards the object of interest, although they do not need to complete a full 360 view of it. Any background visible behind the object of interest will still be reconstructed if `aabb_scale` is set to a number larger than 1, as explained above. +The `colmap2nerf.py` and `record3d2nerf.py` scripts assume that the training images are all pointing approximately at a shared point of interest, which they place at the origin. This point is found by taking a weighted average of the closest points of approach between the rays through the central pixel of all pairs of training images. In practice, this means that the scripts work best when the training images have been captured pointing inwards towards the object of interest, although they do not need to complete a full 360 view of it. Any background visible behind the object of interest will still be reconstructed if `aabb_scale` is set to a number larger than 1, as explained above.
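Since `aabb_scale` is just a key in `transforms.json`, it can also be adjusted after the fact without re-running the conversion scripts. A minimal Python sketch (the dataset path is a placeholder):

```py
# Hypothetical example: raise aabb_scale in an existing transforms.json
# without re-running the conversion script. The path is a placeholder.
import json
from pathlib import Path

transforms_path = Path("path/to/data/transforms.json")  # placeholder dataset
transforms = json.loads(transforms_path.read_text())

# Power of two, at most 128; a larger box lets rays reach distant background.
transforms["aabb_scale"] = 16
transforms_path.write_text(json.dumps(transforms, indent=2))
```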
diff --git a/scripts/nerfcapture2nerf.py b/scripts/nerfcapture2nerf.py new file mode 100644 index 0000000000000000000000000000000000000000..60c7e062473f1a9e5d0751ffad0cc17867ef647c --- /dev/null +++ b/scripts/nerfcapture2nerf.py @@ -0,0 +1,243 @@ +#!/usr/bin/env python3 +# Streaming/Dataset capture script for the NeRFCapture iOS App + +import argparse +import cv2 +from pathlib import Path +import json +import shutil + +import cyclonedds.idl as idl +import cyclonedds.idl.annotations as annotate +import cyclonedds.idl.types as types +from dataclasses import dataclass +from cyclonedds.domain import DomainParticipant, Domain +from cyclonedds.core import Qos, Policy +from cyclonedds.sub import DataReader +from cyclonedds.topic import Topic +from cyclonedds.util import duration + +from common import * +import pyngp as ngp # noqa + +def parse_args(): + parser = argparse.ArgumentParser() + parser.add_argument("--stream", action="store_true", help="Stream images directly to InstantNGP.") + parser.add_argument("--n_frames", default=10, type=int, help="Number of frames before saving the dataset. Also used as the number of cameras to remember when streaming.") + parser.add_argument("--save_path", required='--stream' not in sys.argv, type=str, help="Path to save the dataset.") + parser.add_argument("--depth_scale", default=10.0, type=float, help="Depth scale used when saving depth. Only used when saving dataset.") + parser.add_argument("--overwrite", action="store_true", help="Rewrite over dataset if it exists.") + return parser.parse_args() + + +# DDS +# ================================================================================================== +@dataclass +@annotate.final +@annotate.autoid("sequential") +class NeRFCaptureFrame(idl.IdlStruct, typename="NeRFCaptureData.NeRFCaptureFrame"): + id: types.uint32 + annotate.key("id") + timestamp: types.float64 + fl_x: types.float32 + fl_y: types.float32 + cx: types.float32 + cy: types.float32 + transform_matrix: types.array[types.float32, 16] + width: types.uint32 + height: types.uint32 + image: types.sequence[types.uint8] + has_depth: bool + depth_width: types.uint32 + depth_height: types.uint32 + depth_scale: types.float32 + depth_image: types.sequence[types.uint8] + + +dds_config = """<?xml version="1.0" encoding="UTF-8" ?> \ +<CycloneDDS xmlns="https://cdds.io/config" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="https://cdds.io/config https://raw.githubusercontent.com/eclipse-cyclonedds/cyclonedds/master/etc/cyclonedds.xsd"> \ + <Domain id="any"> \ + <Internal> \ + <MinimumSocketReceiveBufferSize>10MB</MinimumSocketReceiveBufferSize> \ + </Internal> \ + <Tracing> \ + <Verbosity>config</Verbosity> \ + <OutputFile>stdout</OutputFile> \ + </Tracing> \ + </Domain> \ +</CycloneDDS> \ +""" +# ================================================================================================== + +def set_frame(testbed, frame_idx: int, rgb: np.ndarray, depth: np.ndarray, depth_scale: float, X_WV: np.ndarray, fx: float, fy: float, cx: float, cy: float): + testbed.nerf.training.set_image(frame_idx = frame_idx, img=rgb, depth_img=depth, depth_scale=depth_scale*testbed.nerf.training.dataset.scale) + testbed.nerf.training.set_camera_extrinsics(frame_idx=frame_idx, camera_to_world=X_WV) + testbed.nerf.training.set_camera_intrinsics(frame_idx=frame_idx, fx=fx, fy=fy, cx=cx, cy=cy) + + +def live_streaming_loop(reader: DataReader, max_cameras: int): + # Start InstantNGP + testbed = ngp.Testbed(ngp.TestbedMode.Nerf) + 
testbed.init_window(1920, 1080) + testbed.reload_network_from_file() + testbed.visualize_unit_cube = True + testbed.nerf.visualize_cameras = True + + camera_index = 0 # Current camera index we are replacing in InstantNGP + total_frames = 0 # Total frames received + + # Create Empty Dataset + testbed.create_empty_nerf_dataset(max_cameras, aabb_scale=1) + testbed.nerf.training.n_images_for_training = 0 + testbed.up_dir = np.array([1.0, 0.0, 0.0]) + + # Start InstantNGP and DDS Loop + while testbed.frame(): + sample = reader.read_next() # Get frame from NeRFCapture + if sample: + print(f"Frame {total_frames + 1} received") + + # RGB + image = np.asarray(sample.image, dtype=np.uint8).reshape( + (sample.height, sample.width, 3)).astype(np.float32)/255.0 + image = np.concatenate( + [image, np.zeros((sample.height, sample.width, 1), dtype=np.float32)], axis=-1) + + # Depth if available + depth = None + if sample.has_depth: + depth = np.asarray(sample.depth_image, dtype=np.uint8).view( + dtype=np.float32).reshape((sample.depth_height, sample.depth_width)) + depth = cv2.resize(depth, dsize=( + sample.width, sample.height), interpolation=cv2.INTER_NEAREST) + + # Transform + X_WV = np.asarray(sample.transform_matrix, + dtype=np.float32).reshape((4, 4)).T[:3, :] + + # Add frame to InstantNGP + set_frame(testbed, + frame_idx=camera_index, + rgb=srgb_to_linear(image), + depth=depth, + depth_scale=1, + X_WV=X_WV, + fx=sample.fl_x, + fy=sample.fl_y, + cx=sample.cx, + cy=sample.cy) + + # Update index + total_frames += 1 + testbed.nerf.training.n_images_for_training = min(total_frames, max_cameras) + camera_index = (camera_index + 1) % max_cameras + + if total_frames == 1: + testbed.first_training_view() + testbed.render_groundtruth = True + +def dataset_capture_loop(reader: DataReader, save_path: Path, overwrite: bool, n_frames: int): + if save_path.exists(): + if overwrite: + # Prompt user to confirm deletion + if (input(f"warning! folder '{save_path}' will be deleted/replaced. continue? 
(Y/n)").lower().strip()+"y")[:1] != "y": + sys.exit(1) + shutil.rmtree(save_path) + else: + print(f"save_path {save_path} already exists") + sys.exit(1) + + print("Waiting for frames...") + # Make directory + images_dir = save_path.joinpath("images") + + manifest = { + "fl_x": 0.0, + "fl_y": 0.0, + "cx": 0.0, + "cy": 0.0, + "w": 0.0, + "h": 0.0, + "frames": [] + } + + total_frames = 0 # Total frames received + + # Start DDS Loop + while True: + sample = reader.read_next() # Get frame from NeRFCapture + if sample: + print(f"{total_frames + 1}/{n_frames} frames received") + + if total_frames == 0: + save_path.mkdir(parents=True) + images_dir.mkdir() + manifest["w"] = sample.width + manifest["h"] = sample.height + manifest["cx"] = sample.cx + manifest["cy"] = sample.cy + manifest["fl_x"] = sample.fl_x + manifest["fl_y"] = sample.fl_y + manifest["integer_depth_scale"] = float(args.depth_scale)/65535.0 + + # RGB + image = np.asarray(sample.image, dtype=np.uint8).reshape((sample.height, sample.width, 3)) + cv2.imwrite(str(images_dir.joinpath(f"{total_frames}.png")), cv2.cvtColor(image, cv2.COLOR_RGB2BGR)) + + # Depth if avaiable + depth = None + if sample.has_depth: + depth = np.asarray(sample.depth_image, dtype=np.uint8).view( + dtype=np.float32).reshape((sample.depth_height, sample.depth_width)) + depth = (depth*65535/float(args.depth_scale)).astype(np.uint16) + depth = cv2.resize(depth, dsize=( + sample.width, sample.height), interpolation=cv2.INTER_NEAREST) + cv2.imwrite(str(images_dir.joinpath(f"{total_frames}.depth.png")), depth) + + # Transform + X_WV = np.asarray(sample.transform_matrix, + dtype=np.float32).reshape((4, 4)).T + + frame = { + "transform_matrix": X_WV.tolist(), + "file_path": f"images/{total_frames}", + "fl_x": sample.fl_x, + "fl_y": sample.fl_y, + "cx": sample.cx, + "cy": sample.cy, + "w": sample.width, + "h": sample.height + } + + if depth is not None: + frame["depth_path"] = f"images/{total_frames}.depth.png" + + manifest["frames"].append(frame) + + # Update index + if total_frames == n_frames - 1: + print("Saving manifest...") + # Write manifest as json + manifest_json = json.dumps(manifest, indent=4) + with open(save_path.joinpath("transforms.json"), "w") as f: + f.write(manifest_json) + print("Done") + sys.exit(0) + total_frames += 1 + + +if __name__ == "__main__": + args = parse_args() + + # Setup DDS + domain = Domain(domain_id=0, config=dds_config) + participant = DomainParticipant() + qos = Qos(Policy.Reliability.Reliable( + max_blocking_time=duration(seconds=1))) + topic = Topic(participant, "Frames", NeRFCaptureFrame, qos=qos) + reader = DataReader(participant, topic) + + if args.stream: + live_streaming_loop(reader, args.n_frames) + else: + dataset_capture_loop(reader, Path(args.save_path), args.overwrite, args.n_frames) diff --git a/scripts/record3d2nerf.py b/scripts/record3d2nerf.py index 5e5f9316d21a0ab1a4fdbb0ea96dfe36a260138a..c148a05cb18414cfe5f33936462c654157376641 100644 --- a/scripts/record3d2nerf.py +++ b/scripts/record3d2nerf.py @@ -20,157 +20,157 @@ from tqdm import tqdm from PIL import Image def rotate_img(img_path, degree=90): - img = Image.open(img_path) - img = img.rotate(degree, expand=1) - img.save(img_path, quality=100, subsampling=0) - + img = Image.open(img_path) + img = img.rotate(degree, expand=1) + img.save(img_path, quality=100, subsampling=0) + def rotate_camera(c2w, degree=90): - rad = np.deg2rad(degree) - R = Quaternion(axis=[0, 0, -1], angle=rad) - T = R.transformation_matrix - return c2w @ T + rad = np.deg2rad(degree) + R = 
Quaternion(axis=[0, 0, -1], angle=rad) + T = R.transformation_matrix + return c2w @ T def swap_axes(c2w): - rad = np.pi / 2 - R = Quaternion(axis=[1, 0, 0], angle=rad) - T = R.transformation_matrix - return T @ c2w + rad = np.pi / 2 + R = Quaternion(axis=[1, 0, 0], angle=rad) + T = R.transformation_matrix + return T @ c2w # Automatic rescale & offset the poses. def find_transforms_center_and_scale(raw_transforms): - print("computing center of attention...") - frames = raw_transforms['frames'] - for frame in frames: - frame['transform_matrix'] = np.array(frame['transform_matrix']) - - rays_o = [] - rays_d = [] - for f in tqdm(frames): - mf = f["transform_matrix"][0:3,:] - rays_o.append(mf[:3,3:]) - rays_d.append(mf[:3,2:3]) - rays_o = np.asarray(rays_o) - rays_d = np.asarray(rays_d) - - # Find the point that minimizes its distances to all rays. - def min_line_dist(rays_o, rays_d): - A_i = np.eye(3) - rays_d * np.transpose(rays_d, [0,2,1]) - b_i = -A_i @ rays_o - pt_mindist = np.squeeze(-np.linalg.inv((np.transpose(A_i, [0,2,1]) @ A_i).mean(0)) @ (b_i).mean(0)) - return pt_mindist - - translation = min_line_dist(rays_o, rays_d) - normalized_transforms = copy.deepcopy(raw_transforms) - for f in normalized_transforms["frames"]: - f["transform_matrix"][0:3,3] -= translation - - # Find the scale. - avglen = 0. - for f in normalized_transforms["frames"]: - avglen += np.linalg.norm(f["transform_matrix"][0:3,3]) - nframes = len(normalized_transforms["frames"]) - avglen /= nframes - print("avg camera distance from origin", avglen) - scale = 4.0 / avglen # scale to "nerf sized" - - return translation, scale + print("computing center of attention...") + frames = raw_transforms['frames'] + for frame in frames: + frame['transform_matrix'] = np.array(frame['transform_matrix']) + + rays_o = [] + rays_d = [] + for f in tqdm(frames): + mf = f["transform_matrix"][0:3,:] + rays_o.append(mf[:3,3:]) + rays_d.append(mf[:3,2:3]) + rays_o = np.asarray(rays_o) + rays_d = np.asarray(rays_d) + + # Find the point that minimizes its distances to all rays. + def min_line_dist(rays_o, rays_d): + A_i = np.eye(3) - rays_d * np.transpose(rays_d, [0,2,1]) + b_i = -A_i @ rays_o + pt_mindist = np.squeeze(-np.linalg.inv((np.transpose(A_i, [0,2,1]) @ A_i).mean(0)) @ (b_i).mean(0)) + return pt_mindist + + translation = min_line_dist(rays_o, rays_d) + normalized_transforms = copy.deepcopy(raw_transforms) + for f in normalized_transforms["frames"]: + f["transform_matrix"][0:3,3] -= translation + + # Find the scale. + avglen = 0. 
+ for f in normalized_transforms["frames"]: + avglen += np.linalg.norm(f["transform_matrix"][0:3,3]) + nframes = len(normalized_transforms["frames"]) + avglen /= nframes + print("avg camera distance from origin", avglen) + scale = 4.0 / avglen # scale to "nerf sized" + + return translation, scale def normalize_transforms(transforms, translation, scale): - normalized_transforms = copy.deepcopy(transforms) - for f in normalized_transforms["frames"]: - f["transform_matrix"] = np.asarray(f["transform_matrix"]) - f["transform_matrix"][0:3,3] -= translation - f["transform_matrix"][0:3,3] *= scale - f["transform_matrix"] = f["transform_matrix"].tolist() - return normalized_transforms + normalized_transforms = copy.deepcopy(transforms) + for f in normalized_transforms["frames"]: + f["transform_matrix"] = np.asarray(f["transform_matrix"]) + f["transform_matrix"][0:3,3] -= translation + f["transform_matrix"][0:3,3] *= scale + f["transform_matrix"] = f["transform_matrix"].tolist() + return normalized_transforms def parse_args(): - parser = argparse.ArgumentParser(description="convert a Record3D capture to nerf format transforms.json") - parser.add_argument("--scene", default="", help="path to the Record3D capture") - parser.add_argument("--rotate", action="store_true", help="rotate the dataset") - parser.add_argument("--subsample", default=1, type=int, help="step size of subsampling") - args = parser.parse_args() - return args + parser = argparse.ArgumentParser(description="convert a Record3D capture to nerf format transforms.json") + parser.add_argument("--scene", default="", help="path to the Record3D capture") + parser.add_argument("--rotate", action="store_true", help="rotate the dataset") + parser.add_argument("--subsample", default=1, type=int, help="step size of subsampling") + args = parser.parse_args() + return args if __name__ == "__main__": - args = parse_args() - dataset_dir = Path(args.scene) - with open(dataset_dir / 'metadata') as f: - metadata = json.load(f) - - frames = [] - n_images = len(list((dataset_dir / 'rgbd').glob('*.jpg'))) - poses = np.array(metadata['poses']) - for idx in tqdm(range(n_images)): - # Link the image. - img_name = f'{idx}.jpg' - img_path = dataset_dir / 'rgbd' / img_name - - # Rotate the image. - if args.rotate: - # TODO: parallelize this step with joblib. - rotate_img(img_path) - - # Extract c2w. - """ Each `pose` is a 7-element tuple which contains quaternion + world position. - [qx, qy, qz, qw, tx, ty, tz] - """ - pose = poses[idx] - q = Quaternion(x=pose[0], y=pose[1], z=pose[2], w=pose[3]) - c2w = np.eye(4) - c2w[:3, :3] = q.rotation_matrix - c2w[:3, -1] = [pose[4], pose[5], pose[6]] - if args.rotate: - c2w = rotate_camera(c2w) - c2w = swap_axes(c2w) - - frames.append( - { - "file_path": f"./rgbd/{img_name}", - "transform_matrix": c2w.tolist(), - } - ) - - # Write intrinsics to `cameras.txt`. 
- if not args.rotate: - h = metadata['h'] - w = metadata['w'] - K = np.array(metadata['K']).reshape([3, 3]).T - fx = K[0, 0] - fy = K[1, 1] - cx = K[0, 2] - cy = K[1, 2] - else: - h = metadata['w'] - w = metadata['h'] - K = np.array(metadata['K']).reshape([3, 3]).T - fx = K[1, 1] - fy = K[0, 0] - cx = K[1, 2] - cy = h - K[0, 2] - - transforms = {} - transforms['fl_x'] = fx - transforms['fl_y'] = fy - transforms['cx'] = cx - transforms['cy'] = cy - transforms['w'] = w - transforms['h'] = h - transforms['aabb_scale'] = 16 - transforms['scale'] = 1.0 - transforms['camera_angle_x'] = 2 * np.arctan(transforms['w'] / (2 * transforms['fl_x'])) - transforms['camera_angle_y'] = 2 * np.arctan(transforms['h'] / (2 * transforms['fl_y'])) - transforms['frames'] = frames - - os.makedirs(dataset_dir / 'arkit_transforms', exist_ok=True) - with open(dataset_dir / 'arkit_transforms' / 'transforms.json', 'w') as fp: - json.dump(transforms, fp, indent=2) - - # Normalize the poses. - transforms['frames'] = transforms['frames'][::args.subsample] - translation, scale = find_transforms_center_and_scale(transforms) - normalized_transforms = normalize_transforms(transforms, translation, scale) - - output_path = dataset_dir / 'transforms.json' - with open(output_path, "w") as outfile: - json.dump(normalized_transforms, outfile, indent=2) \ No newline at end of file + args = parse_args() + dataset_dir = Path(args.scene) + with open(dataset_dir / 'metadata') as f: + metadata = json.load(f) + + frames = [] + n_images = len(list((dataset_dir / 'rgbd').glob('*.jpg'))) + poses = np.array(metadata['poses']) + for idx in tqdm(range(n_images)): + # Link the image. + img_name = f'{idx}.jpg' + img_path = dataset_dir / 'rgbd' / img_name + + # Rotate the image. + if args.rotate: + # TODO: parallelize this step with joblib. + rotate_img(img_path) + + # Extract c2w. + """ Each `pose` is a 7-element tuple which contains quaternion + world position. + [qx, qy, qz, qw, tx, ty, tz] + """ + pose = poses[idx] + q = Quaternion(x=pose[0], y=pose[1], z=pose[2], w=pose[3]) + c2w = np.eye(4) + c2w[:3, :3] = q.rotation_matrix + c2w[:3, -1] = [pose[4], pose[5], pose[6]] + if args.rotate: + c2w = rotate_camera(c2w) + c2w = swap_axes(c2w) + + frames.append( + { + "file_path": f"./rgbd/{img_name}", + "transform_matrix": c2w.tolist(), + } + ) + + # Write intrinsics to `cameras.txt`. + if not args.rotate: + h = metadata['h'] + w = metadata['w'] + K = np.array(metadata['K']).reshape([3, 3]).T + fx = K[0, 0] + fy = K[1, 1] + cx = K[0, 2] + cy = K[1, 2] + else: + h = metadata['w'] + w = metadata['h'] + K = np.array(metadata['K']).reshape([3, 3]).T + fx = K[1, 1] + fy = K[0, 0] + cx = K[1, 2] + cy = h - K[0, 2] + + transforms = {} + transforms['fl_x'] = fx + transforms['fl_y'] = fy + transforms['cx'] = cx + transforms['cy'] = cy + transforms['w'] = w + transforms['h'] = h + transforms['aabb_scale'] = 16 + transforms['scale'] = 1.0 + transforms['camera_angle_x'] = 2 * np.arctan(transforms['w'] / (2 * transforms['fl_x'])) + transforms['camera_angle_y'] = 2 * np.arctan(transforms['h'] / (2 * transforms['fl_y'])) + transforms['frames'] = frames + + os.makedirs(dataset_dir / 'arkit_transforms', exist_ok=True) + with open(dataset_dir / 'arkit_transforms' / 'transforms.json', 'w') as fp: + json.dump(transforms, fp, indent=2) + + # Normalize the poses. 
+ transforms['frames'] = transforms['frames'][::args.subsample] + translation, scale = find_transforms_center_and_scale(transforms) + normalized_transforms = normalize_transforms(transforms, translation, scale) + + output_path = dataset_dir / 'transforms.json' + with open(output_path, "w") as outfile: + json.dump(normalized_transforms, outfile, indent=2)
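The "center of attention" computed by `find_transforms_center_and_scale` above is the point that minimizes its summed squared distance to all camera rays; the poses are then recentered on it and rescaled so that the average camera distance from the origin is about 4 units. A minimal self-contained sketch of that least-squares step, checked on two toy rays with a known intersection:

```py
# Sketch of the closest-point-of-approach solve used by find_transforms_center_and_scale:
# for rays with origins o_i and unit directions d_i, minimize
# sum_i ||(I - d_i d_i^T)(x - o_i)||^2 over the point x.
import numpy as np

def closest_point_to_rays(rays_o, rays_d):
	# rays_o, rays_d: (N, 3, 1) ray origins and unit directions.
	A_i = np.eye(3) - rays_d * np.transpose(rays_d, [0, 2, 1])  # projector orthogonal to d_i
	b_i = -A_i @ rays_o
	return np.squeeze(-np.linalg.inv((np.transpose(A_i, [0, 2, 1]) @ A_i).mean(0)) @ b_i.mean(0))

# Toy check: two cameras looking at the point (1, 2, 3) from different positions.
target = np.array([1.0, 2.0, 3.0])
origins = np.array([[4.0, 2.0, 3.0], [1.0, -2.0, 3.0]])
dirs = target - origins
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
print(closest_point_to_rays(origins[:, :, None], dirs[:, :, None]))  # approximately [1. 2. 3.]
```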