Unverified commit 8268e2ce authored by Thomas Müller, committed by GitHub

Merge pull request #1181 from jc211/patch-1

Add NeRFCapture to docs
parents e53d2ca0 6e3c4501
@@ -153,6 +153,7 @@ jobs:
scripts/convert_image.py
scripts/download_colmap.bat
scripts/download_ffmpeg.bat
scripts/nerfcapture2nerf.py
scripts/nsvf2nerf.py
scripts/record3d2nerf.py
@@ -53,12 +53,12 @@ See [nerf_loader.cu](src/nerf_loader.cu) for implementation details and addition
## Preparing new NeRF datasets
To train on self-captured data, one has to process the data into an existing format supported by __instant-ngp__. We provide scripts to support three approaches:
- [COLMAP](#COLMAP) to create a dataset from a set of photos or a video you took
- [Record3D](#Record3D) to create a dataset with an iPhone 12 Pro or newer (based on ARKit)
- [NeRFCapture](#NeRFCapture) to create a dataset or stream posed images directly to __instant-ngp__ with an iOS device.
All require [Python](https://www.python.org/) 3.7 or higher to be installed and available in your PATH.
On Windows you can [download an installer from here](https://www.python.org/downloads/). During installation, make sure to check "add python.exe to PATH".
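The scripts also rely on a handful of Python packages. Assuming a standard checkout that contains the repository's `requirements.txt`, they can be installed with:

```sh
instant-ngp$ pip install -r requirements.txt
```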
@@ -104,14 +104,14 @@ The script will run (and install, if you use Windows) FFmpeg and COLMAP as needed
By default, the script invokes colmap with the "sequential matcher", which is suitable for images taken from a smoothly changing camera path, as in a video. The exhaustive matcher is more appropriate if the images are in no particular order, as shown in the image example above.
For more options, you can run the script with `--help`. For more advanced uses of COLMAP or for challenging scenes, please see the [COLMAP documentation](https://colmap.github.io/cli.html); you may need to modify the [scripts/colmap2nerf.py](/scripts/colmap2nerf.py) script itself.
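As a sketch of a typical invocation for an unordered photo collection (folder name hypothetical; flag names as currently exposed by `colmap2nerf.py`, confirm with `--help`):

```sh
instant-ngp$ python scripts/colmap2nerf.py --images data/my-scene/images --run_colmap --colmap_matcher exhaustive --aabb_scale 32
```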
The `aabb_scale` parameter is the most important __instant-ngp__ specific parameter. It specifies the extent of the scene, defaulting to 1; that is, the scene is scaled such that the camera positions are at an average distance of 1 unit from the origin. For small synthetic scenes such as the original NeRF dataset, the default `aabb_scale` of 1 is ideal and leads to fastest training. The NeRF model makes the assumption that the training images can entirely be explained by a scene contained within this bounding box. However, for natural scenes where there is a background that extends beyond this bounding box, the NeRF model will struggle and may hallucinate "floaters" at the boundaries of the box. By setting `aabb_scale` to a larger power of 2 (up to a maximum of 128), the NeRF model will extend rays to a much larger bounding box. Note that this can impact training speed slightly. If in doubt, for natural scenes, start with an `aabb_scale` of 128, and subsequently reduce it if possible. The value can be directly edited in the `transforms.json` output file, without re-running the [scripts/colmap2nerf.py](/scripts/colmap2nerf.py) script.
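For illustration, the relevant portion of a generated `transforms.json` might look like the following (values hypothetical, `frames` omitted); only the `aabb_scale` entry needs to change:

```
{
    "camera_angle_x": 1.02,
    "fl_x": 1375.5,
    "fl_y": 1374.9,
    "cx": 960.0,
    "cy": 540.0,
    "w": 1920,
    "h": 1080,
    "aabb_scale": 32,
    "frames": []
}
```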
You can optionally pass in object categories (e.g. `--mask_categories person car`) which runs [Detectron2](https://github.com/facebookresearch/detectron2) to generate masks automatically.
__instant-ngp__ will not use the masked pixels for training.
This utility is helpful for users who wish to ignore moving or sensitive objects such as people, cars, or bikes.
See [scripts/category2id.json](/scripts/category2id.json) for a list of categories.
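For example, assuming the same hypothetical image folder as above and that Detectron2 is installed:

```sh
instant-ngp$ python scripts/colmap2nerf.py --images data/my-scene/images --run_colmap --mask_categories person car
```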
Assuming success, you can now train your NeRF model as follows, starting in the __instant-ngp__ folder:
```sh
instant-ngp$ ./instant-ngp [path to training data folder containing transforms.json]
```
@@ -119,7 +119,7 @@
### Record3D
With an iPhone 12 Pro or newer, one can use [Record3D](https://record3d.app/) to collect data and avoid COLMAP. Record3D is an iOS app that relies on ARKit to estimate each image's camera pose. It is more robust than COLMAP for scenes that lack textures or contain repetitive patterns. To train __instant-ngp__ with Record3D data, follow these steps:
1. Record a video and export with the "Shareable/Internal format (.r3d)".
2. Send the exported data to your computer.
@@ -130,14 +130,43 @@ With an iPhone 12 Pro or newer, one can use [Record3D](https://record3d.app/) to collect
If you capture the scene in the landscape orientation, add `--rotate`.
5. Launch __instant-ngp__ training:
```
instant-ngp$ ./instant-ngp path/to/data
```
### NeRFCapture
[NeRFCapture](https://github.com/jc211/NeRFCapture) is an iOS app that runs on any ARKit-capable device. It allows you to stream images directly from your phone to __instant-ngp__, enabling a more interactive experience, and it can also collect an offline dataset for later use.
The following dependencies are needed to run the NeRFCapture script:
```
pip install cyclonedds
```
To stream:
1. Open the NeRFCapture app
2. Run the script with the `--stream` flag enabled.
```
instant-ngp$ python scripts/nerfcapture2nerf.py --stream
```
3. Wait for the connection between the app and the script to be established. This will be indicated in the app.
4. Click the send button in the app. The captured frame will be sent to __instant-ngp__.
5. Toggle training.
To save a dataset:
1. Open the NeRFCapture app
2. Run the script with the `--save_path` flag. The `--n_frames` flag indicates how many frames to capture before saving the dataset.
```
instant-ngp$ python scripts/nerfcapture2nerf.py --save_path "dir1/dir2" --n_frames 20
```
3. Wait for the connection between the app and the script to be established. This will be indicated in the app.
4. Click the send button in the app. Each captured frame will be saved to the dataset folder on the computer running the script.
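The saved dataset consists of the captured images plus a `transforms.json`, so it can then be trained on like any other capture, e.g. with the path used above:

```sh
instant-ngp$ ./instant-ngp dir1/dir2
```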
## Tips for NeRF training data
The NeRF model trains best with between 50 and 150 images that exhibit minimal scene movement, motion blur, or other blurring artifacts. The quality of reconstruction is predicated on the preceding scripts being able to extract accurate camera parameters from the images.
Review the earlier sections for information on how to verify this.
The `colmap2nerf.py` and `record3d2nerf.py` scripts assume that the training images all point approximately at a shared point of interest, which they place at the origin. This point is found by taking a weighted average of the closest points of approach between the rays through the central pixel of all pairs of training images. In practice, this means that the scripts work best when the training images have been captured pointing inwards towards the object of interest, although they do not need to complete a full 360 view of it. Any background visible behind the object of interest will still be reconstructed if `aabb_scale` is set to a number larger than 1, as explained above.
#!/usr/bin/env python3
# Streaming/Dataset capture script for the NeRFCapture iOS App

import argparse
import json
import shutil
import sys

import cv2
import numpy as np
from pathlib import Path

import cyclonedds.idl as idl
import cyclonedds.idl.annotations as annotate
import cyclonedds.idl.types as types
from dataclasses import dataclass
from cyclonedds.domain import DomainParticipant, Domain
from cyclonedds.core import Qos, Policy
from cyclonedds.sub import DataReader
from cyclonedds.topic import Topic
from cyclonedds.util import duration

from common import *

import pyngp as ngp  # noqa
def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("--stream", action="store_true", help="Stream images directly to InstantNGP.")
    parser.add_argument("--n_frames", default=10, type=int, help="Number of frames before saving the dataset. Also used as the number of cameras to remember when streaming.")
    parser.add_argument("--save_path", required='--stream' not in sys.argv, type=str, help="Path to save the dataset.")
    parser.add_argument("--depth_scale", default=10.0, type=float, help="Depth scale used when saving depth. Only used when saving dataset.")
    parser.add_argument("--overwrite", action="store_true", help="Rewrite over dataset if it exists.")
    return parser.parse_args()
# DDS
# ==================================================================================================
@dataclass
@annotate.final
@annotate.autoid("sequential")
class NeRFCaptureFrame(idl.IdlStruct, typename="NeRFCaptureData.NeRFCaptureFrame"):
    id: types.uint32
    annotate.key("id")
    timestamp: types.float64
    fl_x: types.float32
    fl_y: types.float32
    cx: types.float32
    cy: types.float32
    transform_matrix: types.array[types.float32, 16]
    width: types.uint32
    height: types.uint32
    image: types.sequence[types.uint8]
    has_depth: bool
    depth_width: types.uint32
    depth_height: types.uint32
    depth_scale: types.float32
    depth_image: types.sequence[types.uint8]
dds_config = """<?xml version="1.0" encoding="UTF-8" ?> \
<CycloneDDS xmlns="https://cdds.io/config" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="https://cdds.io/config https://raw.githubusercontent.com/eclipse-cyclonedds/cyclonedds/master/etc/cyclonedds.xsd"> \
<Domain id="any"> \
<Internal> \
<MinimumSocketReceiveBufferSize>10MB</MinimumSocketReceiveBufferSize> \
</Internal> \
<Tracing> \
<Verbosity>config</Verbosity> \
<OutputFile>stdout</OutputFile> \
</Tracing> \
</Domain> \
</CycloneDDS> \
"""
# ==================================================================================================

def set_frame(testbed, frame_idx: int, rgb: np.ndarray, depth: np.ndarray, depth_scale: float, X_WV: np.ndarray, fx: float, fy: float, cx: float, cy: float):
    testbed.nerf.training.set_image(frame_idx=frame_idx, img=rgb, depth_img=depth, depth_scale=depth_scale*testbed.nerf.training.dataset.scale)
    testbed.nerf.training.set_camera_extrinsics(frame_idx=frame_idx, camera_to_world=X_WV)
    testbed.nerf.training.set_camera_intrinsics(frame_idx=frame_idx, fx=fx, fy=fy, cx=cx, cy=cy)
def live_streaming_loop(reader: DataReader, max_cameras: int):
    # Start InstantNGP
    testbed = ngp.Testbed(ngp.TestbedMode.Nerf)
    testbed.init_window(1920, 1080)
    testbed.reload_network_from_file()
    testbed.visualize_unit_cube = True
    testbed.nerf.visualize_cameras = True

    camera_index = 0  # Current camera index we are replacing in InstantNGP
    total_frames = 0  # Total frames received

    # Create Empty Dataset
    testbed.create_empty_nerf_dataset(max_cameras, aabb_scale=1)
    testbed.nerf.training.n_images_for_training = 0
    testbed.up_dir = np.array([1.0, 0.0, 0.0])

    # Start InstantNGP and DDS Loop
    while testbed.frame():
        sample = reader.read_next()  # Get frame from NeRFCapture
        if sample:
            print(f"Frame {total_frames + 1} received")

            # RGB
            image = np.asarray(sample.image, dtype=np.uint8).reshape((sample.height, sample.width, 3)).astype(np.float32)/255.0
            image = np.concatenate([image, np.zeros((sample.height, sample.width, 1), dtype=np.float32)], axis=-1)

            # Depth if available
            depth = None
            if sample.has_depth:
                depth = np.asarray(sample.depth_image, dtype=np.uint8).view(dtype=np.float32).reshape((sample.depth_height, sample.depth_width))
                depth = cv2.resize(depth, dsize=(sample.width, sample.height), interpolation=cv2.INTER_NEAREST)

            # Transform
            X_WV = np.asarray(sample.transform_matrix, dtype=np.float32).reshape((4, 4)).T[:3, :]

            # Add frame to InstantNGP
            set_frame(testbed,
                      frame_idx=camera_index,
                      rgb=srgb_to_linear(image),
                      depth=depth,
                      depth_scale=1,
                      X_WV=X_WV,
                      fx=sample.fl_x,
                      fy=sample.fl_y,
                      cx=sample.cx,
                      cy=sample.cy)

            # Update index
            total_frames += 1
            testbed.nerf.training.n_images_for_training = min(total_frames, max_cameras)
            camera_index = (camera_index + 1) % max_cameras

            if total_frames == 1:
                testbed.first_training_view()
                testbed.render_groundtruth = True
def dataset_capture_loop(reader: DataReader, save_path: Path, overwrite: bool, n_frames: int):
    if save_path.exists():
        if overwrite:
            # Prompt user to confirm deletion
            if (input(f"warning! folder '{save_path}' will be deleted/replaced. continue? (Y/n)").lower().strip()+"y")[:1] != "y":
                sys.exit(1)
            shutil.rmtree(save_path)
        else:
            print(f"save_path {save_path} already exists")
            sys.exit(1)

    print("Waiting for frames...")

    # Directory for images; created once the first frame arrives
    images_dir = save_path.joinpath("images")

    manifest = {
        "fl_x": 0.0,
        "fl_y": 0.0,
        "cx": 0.0,
        "cy": 0.0,
        "w": 0.0,
        "h": 0.0,
        "frames": []
    }

    total_frames = 0  # Total frames received

    # Start DDS Loop
    while True:
        sample = reader.read_next()  # Get frame from NeRFCapture
        if sample:
            print(f"{total_frames + 1}/{n_frames} frames received")

            # Use the first frame to initialize the manifest's shared intrinsics
            if total_frames == 0:
                save_path.mkdir(parents=True)
                images_dir.mkdir()
                manifest["w"] = sample.width
                manifest["h"] = sample.height
                manifest["cx"] = sample.cx
                manifest["cy"] = sample.cy
                manifest["fl_x"] = sample.fl_x
                manifest["fl_y"] = sample.fl_y
                manifest["integer_depth_scale"] = float(args.depth_scale)/65535.0  # args is the module-level parse_args() result

            # RGB
            image = np.asarray(sample.image, dtype=np.uint8).reshape((sample.height, sample.width, 3))
            cv2.imwrite(str(images_dir.joinpath(f"{total_frames}.png")), cv2.cvtColor(image, cv2.COLOR_RGB2BGR))

            # Depth if available
            depth = None
            if sample.has_depth:
                depth = np.asarray(sample.depth_image, dtype=np.uint8).view(dtype=np.float32).reshape((sample.depth_height, sample.depth_width))
                depth = (depth*65535/float(args.depth_scale)).astype(np.uint16)
                depth = cv2.resize(depth, dsize=(sample.width, sample.height), interpolation=cv2.INTER_NEAREST)
                cv2.imwrite(str(images_dir.joinpath(f"{total_frames}.depth.png")), depth)

            # Transform
            X_WV = np.asarray(sample.transform_matrix, dtype=np.float32).reshape((4, 4)).T

            frame = {
                "transform_matrix": X_WV.tolist(),
                "file_path": f"images/{total_frames}",
                "fl_x": sample.fl_x,
                "fl_y": sample.fl_y,
                "cx": sample.cx,
                "cy": sample.cy,
                "w": sample.width,
                "h": sample.height
            }

            if depth is not None:
                frame["depth_path"] = f"images/{total_frames}.depth.png"

            manifest["frames"].append(frame)

            # Write the manifest and stop once enough frames have been collected
            if total_frames == n_frames - 1:
                print("Saving manifest...")
                # Write manifest as json
                manifest_json = json.dumps(manifest, indent=4)
                with open(save_path.joinpath("transforms.json"), "w") as f:
                    f.write(manifest_json)
                print("Done")
                sys.exit(0)

            total_frames += 1
if __name__ == "__main__":
    args = parse_args()

    # Setup DDS
    domain = Domain(domain_id=0, config=dds_config)
    participant = DomainParticipant()
    qos = Qos(Policy.Reliability.Reliable(max_blocking_time=duration(seconds=1)))
    topic = Topic(participant, "Frames", NeRFCaptureFrame, qos=qos)
    reader = DataReader(participant, topic)

    if args.stream:
        live_streaming_loop(reader, args.n_frames)
    else:
        dataset_capture_loop(reader, Path(args.save_path), args.overwrite, args.n_frames)
@@ -20,157 +20,157 @@ from tqdm import tqdm
from PIL import Image
def rotate_img(img_path, degree=90):
    img = Image.open(img_path)
    img = img.rotate(degree, expand=1)
    img.save(img_path, quality=100, subsampling=0)
def rotate_camera(c2w, degree=90):
    rad = np.deg2rad(degree)
    R = Quaternion(axis=[0, 0, -1], angle=rad)
    T = R.transformation_matrix
    return c2w @ T
def swap_axes(c2w):
    rad = np.pi / 2
    R = Quaternion(axis=[1, 0, 0], angle=rad)
    T = R.transformation_matrix
    return T @ c2w
# Automatic rescale & offset the poses.
def find_transforms_center_and_scale(raw_transforms):
    print("computing center of attention...")
    frames = raw_transforms['frames']
    for frame in frames:
        frame['transform_matrix'] = np.array(frame['transform_matrix'])

    rays_o = []
    rays_d = []
    for f in tqdm(frames):
        mf = f["transform_matrix"][0:3,:]
        rays_o.append(mf[:3,3:])
        rays_d.append(mf[:3,2:3])
    rays_o = np.asarray(rays_o)
    rays_d = np.asarray(rays_d)

    # Find the point that minimizes its distances to all rays.
    def min_line_dist(rays_o, rays_d):
        A_i = np.eye(3) - rays_d * np.transpose(rays_d, [0,2,1])
        b_i = -A_i @ rays_o
        pt_mindist = np.squeeze(-np.linalg.inv((np.transpose(A_i, [0,2,1]) @ A_i).mean(0)) @ (b_i).mean(0))
        return pt_mindist

    translation = min_line_dist(rays_o, rays_d)
    normalized_transforms = copy.deepcopy(raw_transforms)
    for f in normalized_transforms["frames"]:
        f["transform_matrix"][0:3,3] -= translation

    # Find the scale.
    avglen = 0.
    for f in normalized_transforms["frames"]:
        avglen += np.linalg.norm(f["transform_matrix"][0:3,3])
    nframes = len(normalized_transforms["frames"])
    avglen /= nframes
    print("avg camera distance from origin", avglen)
    scale = 4.0 / avglen  # scale to "nerf sized"

    return translation, scale
def normalize_transforms(transforms, translation, scale):
    normalized_transforms = copy.deepcopy(transforms)
    for f in normalized_transforms["frames"]:
        f["transform_matrix"] = np.asarray(f["transform_matrix"])
        f["transform_matrix"][0:3,3] -= translation
        f["transform_matrix"][0:3,3] *= scale
        f["transform_matrix"] = f["transform_matrix"].tolist()
    return normalized_transforms
def parse_args():
    parser = argparse.ArgumentParser(description="convert a Record3D capture to nerf format transforms.json")
    parser.add_argument("--scene", default="", help="path to the Record3D capture")
    parser.add_argument("--rotate", action="store_true", help="rotate the dataset")
    parser.add_argument("--subsample", default=1, type=int, help="step size of subsampling")
    args = parser.parse_args()
    return args
if __name__ == "__main__":
    args = parse_args()
    dataset_dir = Path(args.scene)
    with open(dataset_dir / 'metadata') as f:
        metadata = json.load(f)

    frames = []
    n_images = len(list((dataset_dir / 'rgbd').glob('*.jpg')))
    poses = np.array(metadata['poses'])
    for idx in tqdm(range(n_images)):
        # Link the image.
        img_name = f'{idx}.jpg'
        img_path = dataset_dir / 'rgbd' / img_name

        # Rotate the image.
        if args.rotate:
            # TODO: parallelize this step with joblib.
            rotate_img(img_path)

        # Extract c2w.
        """ Each `pose` is a 7-element tuple which contains quaternion + world position.
            [qx, qy, qz, qw, tx, ty, tz]
        """
        pose = poses[idx]
        q = Quaternion(x=pose[0], y=pose[1], z=pose[2], w=pose[3])
        c2w = np.eye(4)
        c2w[:3, :3] = q.rotation_matrix
        c2w[:3, -1] = [pose[4], pose[5], pose[6]]
        if args.rotate:
            c2w = rotate_camera(c2w)
            c2w = swap_axes(c2w)

        frames.append(
            {
                "file_path": f"./rgbd/{img_name}",
                "transform_matrix": c2w.tolist(),
            }
        )

    # Write intrinsics to `cameras.txt`.
    if not args.rotate:
        h = metadata['h']
        w = metadata['w']
        K = np.array(metadata['K']).reshape([3, 3]).T
        fx = K[0, 0]
        fy = K[1, 1]
        cx = K[0, 2]
        cy = K[1, 2]
    else:
        h = metadata['w']
        w = metadata['h']
        K = np.array(metadata['K']).reshape([3, 3]).T
        fx = K[1, 1]
        fy = K[0, 0]
        cx = K[1, 2]
        cy = h - K[0, 2]

    transforms = {}
    transforms['fl_x'] = fx
    transforms['fl_y'] = fy
    transforms['cx'] = cx
    transforms['cy'] = cy
    transforms['w'] = w
    transforms['h'] = h
    transforms['aabb_scale'] = 16
    transforms['scale'] = 1.0
    transforms['camera_angle_x'] = 2 * np.arctan(transforms['w'] / (2 * transforms['fl_x']))
    transforms['camera_angle_y'] = 2 * np.arctan(transforms['h'] / (2 * transforms['fl_y']))
    transforms['frames'] = frames

    os.makedirs(dataset_dir / 'arkit_transforms', exist_ok=True)
    with open(dataset_dir / 'arkit_transforms' / 'transforms.json', 'w') as fp:
        json.dump(transforms, fp, indent=2)

    # Normalize the poses.
    transforms['frames'] = transforms['frames'][::args.subsample]
    translation, scale = find_transforms_center_and_scale(transforms)
    normalized_transforms = normalize_transforms(transforms, translation, scale)

    output_path = dataset_dir / 'transforms.json'
    with open(output_path, "w") as outfile:
        json.dump(normalized_transforms, outfile, indent=2)
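For reference, a typical invocation of the conversion script above might look as follows (path hypothetical; add `--rotate` only for landscape captures, `--subsample` is optional):

```sh
instant-ngp$ python scripts/record3d2nerf.py --scene path/to/data --rotate --subsample 2
```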