Jiwei Interview | Atul Ingle: Why is the CIS increasingly becoming a key lever for improving the smartphone user experience?

Source: ijiwei (爱集微) #Jiwei Interview# #Image Sensors#

JiWei.com reports: In this installment of the Jiwei Interview series, ijiwei had the opportunity to speak with Atul Ingle, assistant professor of computer science at Portland State University. He focuses on computational imaging, computer vision, and signal processing; his current research involves the co-design of imaging hardware and algorithms for single-photon image sensors, and he is one of the contributors to Image Sensors World. The interview covered multi-cell color filter arrays, small pixels, HDR technology, SPAD cameras, and the outlook for image sensors, and drew a series of very instructive answers.

Q: Could you briefly talk about CFA (color filter array) choices?

A: I think part of it is a legacy issue. The RGBG pattern has become so popular and has been around for so long that most people who work on image sensors and the downstream algorithms know how to deal with that type of data. If someone comes along with a completely new pattern, you have to redesign the low-level algorithms that take the raw data and turn it into an image. That is a lot of work, and the gain you get in the end may not be as large as hoped. That said, there are some other patterns, especially for low-light imaging, where in addition to RGB there is a pixel that just captures monochrome light, a white channel if you want to call it that.

There may be specific applications where a non-standard, non-Bayer pattern makes sense. But in general, simply because of the legacy issue, and because there may not be enough gain in image quality to be had, completely changing your low-level processing algorithms to handle a new pattern carries a high cost. In general, it may not make sense to use a non-RGBG pattern.

Q: In recent years, why have more and more mobile image sensors adopted distinctive multi-cell CFAs, such as Sony's Quad Bayer, Samsung's Nonacell (Nonapixel), and OmniVision's 4-cell, instead of the standard Bayer pattern?

A: That's a great question. If you look at these multi-cell CFAs, they still look a lot like the Bayer pattern, because they keep roughly the same RGBG arrangement. What they now do is group several pixels together and put the same color filter over the whole group.

For smartphones and mobile image sensors, there is a constant push to shrink the pixel size for the phone market. You want your phone to stay small, so the image sensor has to be small too, yet you still expect a high-resolution image. The only way to do that is to make the pixels smaller.

Now, if the pixels get smaller, each pixel collects less light, and that means you have to deal with noise; there is no way around it. There is a physical limit: if the pixel is smaller, the final image will be noisy. The way to deal with that noise is what's called binning, where you combine a group of pixels together. It is a way to boost the amount of signal you get back.

Now, if you do that binning with the standard RGBG Bayer pattern, you are essentially combining pixels that carry different color filters. I think that is why people tend to group pixels that share the same color filter; combining their signals that way is more valuable. In the end it is just a trade-off that improves image quality: you reduce noise by paying a small penalty in spatial resolution.
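To make the noise argument concrete, here is a minimal Python sketch (illustrative numbers only, not tied to any specific sensor) comparing one small pixel against a 2x2 same-color binned group under a simple shot-noise plus read-noise model:

```python
# Minimal sketch: why 2x2 same-color binning improves SNR in dim light.
# Shot noise (Poisson) plus read noise (Gaussian); all numbers are assumptions.
import numpy as np

rng = np.random.default_rng(0)

signal_e = 50.0      # mean photoelectrons per small pixel (dim scene)
read_noise_e = 2.0   # assumed read noise per readout, in electrons
n_trials = 100_000

# One small pixel: Poisson shot noise + Gaussian read noise.
single = rng.poisson(signal_e, n_trials) + rng.normal(0, read_noise_e, n_trials)

# 2x2 binning: charge from 4 same-color pixels combined, then read out once.
binned = rng.poisson(4 * signal_e, n_trials) + rng.normal(0, read_noise_e, n_trials)

snr = lambda samples, mean: mean / samples.std()
print(f"single-pixel SNR : {snr(single, signal_e):.1f}")
print(f"2x2 binned SNR   : {snr(binned, 4 * signal_e):.1f}")  # ~2x higher, at 1/4 resolution
```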

Q: Could you briefly talk about the binning process for small pixels?

A: In terms of final image quality, it is a multi-factor optimization problem. It gets fairly complicated because many other parameters play a role in the final image quality. With these multi-cell CFAs, another thing to remember is that the binning is done in the analog domain. I think that is the right way to do it. If you bin after digitization, you have already paid the noise penalty, because the analog-to-digital converter has already added its noise, right?

So you lose part of the gain you could have obtained. If you sum the signal while it is still in the analog domain, I think there is a perceptible improvement, compared with doing it digitally later on.
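A rough sketch of why the analog route matters, under the simplifying assumption that digital binning pays the read/ADC noise once per pixel while analog binning pays it once per group:

```python
# Illustrative comparison (assumed numbers): analog vs digital 2x2 binning.
# Analog binning sums charge before a single readout; digital binning reads each
# pixel separately (paying read/ADC noise four times) and sums afterwards.
import numpy as np

rng = np.random.default_rng(1)
signal_e, read_noise_e, n = 20.0, 3.0, 200_000

photons = rng.poisson(signal_e, (n, 4))                               # 4 pixels per bin

analog  = photons.sum(axis=1) + rng.normal(0, read_noise_e, n)        # 1 readout
digital = (photons + rng.normal(0, read_noise_e, (n, 4))).sum(axis=1) # 4 readouts

print(f"analog-binned noise  : {analog.std():.2f} e-")
print(f"digital-binned noise : {digital.std():.2f} e-")  # higher: read noise added 4x
```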

Q: My next question is about the small pixels you just mentioned. Why are image sensor manufacturers so keen on small-pixel sensors? Sony has developed an industry-first 0.8 micron pixel, Samsung has developed 0.9, 0.7, and 0.64 micron pixels, and OmniVision has developed a 0.61 micron pixel. What are the advantages of small pixels?

A: As I mentioned before, there is a constant push toward smaller pixels for the smartphone market, because there is no other way to increase image sensor resolution while also making the sensor smaller. You have to fit all of those pixels into the same, or an even smaller, physical area while still delivering high resolution. These requirements conflict, and physically there is no other way to do it: you have to shrink the pixel. Take a still camera or a video camera as an example, where the size of the image sensor is not such a tight constraint. The sensor can be, say, 2 cm by 1 cm. With a few hundred square millimeters of area, even a megapixel-class sensor can collect enough light. Compare that with the sensor you have to put in a smartphone, which may have around 10 square millimeters of area, a small fraction of what you have in a video camera or a still camera.

So you pay the penalty of collecting less light, and you have to deal with that one way or another. The biggest challenge with these small pixels is that, because they are so much smaller, the amount of charge they can collect is also much smaller. Their full-well capacity is therefore much smaller, and that limits the native dynamic range you can get out of them. But there is really no other way: if you want to keep that high resolution in a small area, these are the challenges you have to handle, either computationally in post-processing or with some other tricks in hardware.
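As a back-of-the-envelope illustration (assumed read-noise and full-well values, not vendor figures), the native single-exposure dynamic range shrinks roughly as 20*log10(full well / read noise) when the full-well capacity drops:

```python
# Rough illustration: how a smaller full-well capacity (FWC) limits the native
# dynamic range of a single exposure. Values below are assumptions.
import math

read_noise_e = 2.0  # assumed read noise in electrons
for full_well_e in (30_000, 6_000, 4_000):  # large pixel vs ~0.7-0.8 um class pixels
    dr_db = 20 * math.log10(full_well_e / read_noise_e)
    print(f"FWC {full_well_e:>6d} e-  ->  DR ~ {dr_db:5.1f} dB  (~{dr_db / 6.02:.1f} stops)")
```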

For example, the binning approach we just talked about.

My expertise is not so much in the low-level hardware details, so I can't go into too much depth there. But I can certainly see that if your pixels are that small, it becomes a challenge to fit the other processing circuitry inside each pixel. I do think some of the recent advances in 3D stacking can help alleviate these problems, because you can put that circuitry in a second or even a third layer. But you're right, it limits how much you can do inside each pixel when the pixel itself is so tiny.

That is also not easy to do, because from what I have read, some companies have really special expertise in aligning these different wafers to micron-level precision, which is really important when your pixels are that small. And of course that type of technology also benefits slightly larger pixels.

Q: Alright, let's move on to the next topic, on-chip HDR and the logic chip of the image sensor. Why do more and more mobile image sensors support on-chip HDR features? Phone image sensors used to just capture images; today more and more of them include technologies such as staggered HDR or dual-conversion-gain HDR. Why are more and more sensors supporting these features?

A: Consumers clearly demand better and better image quality from their smartphones. Among the different ways of doing HDR imaging, the traditional approach is to capture a burst of images and then apply a post-processing algorithm that runs on the phone's CPU and memory.

But this traditional post-processing approach has several downsides. The first is latency: if you process in post, you pay the extra cost of transferring the image data to the phone's memory or to some intermediate compute module that runs the HDR algorithm. The second downside is noise. It is always advantageous to run your algorithms on the raw data as close to the image sensor as possible, because every subsequent step in the signal processing chain introduces some noise and artifacts. The third downside is motion artifacts, which also ties back to the latency issue: if something in the scene moves while you are capturing multiple images, and the frames are captured with too much delay, you have to deal with that motion in post-processing.

That is often very hard to do, and you end up living with motion artifacts such as ghosting in the final HDR image. These days even an average consumer finds that unacceptable; we have all gotten used to very good image quality on our phones. I think that is why many image sensor manufacturers are now pushing to implement these HDR features as close to the image sensor as possible. It is not really a new trend either: if you trace back to pixel designs with limited full-well capacity, there were designs with a bucket next to the pixel to collect the overflow charge. Those attempts have been around since the early 2000s.

More recently, there are rolling-shutter sensors that use some kind of interlacing scheme, where different rows are captured with different exposure settings, and a post-processing HDR algorithm generates a single high-dynamic-range image while accounting for the exposure variation across rows or groups of pixels on the sensor. In the end this just ensures there are no additional motion or ghosting artifacts, because all of that data is captured at essentially the same time with respect to the scene. Ultimately, all of these techniques make the same trade-off, between the final spatial resolution you can get and the dynamic range.
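As a hedged sketch of the merging step such schemes rely on (a generic two-exposure, linear-domain merge, not any vendor's pipeline):

```python
# Generic sketch: merge a long and a short exposure (e.g., the long/short rows of a
# staggered-HDR readout) into a single radiance estimate. Illustrative only.
import numpy as np

def merge_hdr(long_img, short_img, t_long, t_short, sat_level=0.95):
    """Merge two linear RAW exposures into a radiance estimate (arbitrary units)."""
    long_rad  = long_img  / t_long          # normalize each frame by its exposure time
    short_rad = short_img / t_short
    # Near saturation, trust the short exposure; otherwise prefer the long one
    # (better SNR in the shadows).
    w_long = np.clip((sat_level - long_img) / sat_level, 0.0, 1.0)
    return w_long * long_rad + (1.0 - w_long) * short_rad

# Toy usage with synthetic linear data in [0, 1]:
rng = np.random.default_rng(2)
scene = rng.uniform(0, 50, (4, 4))                 # "true" radiance
long_img  = np.clip(scene * 0.02, 0, 1)            # t_long = 0.02 s, clips highlights
short_img = np.clip(scene * 0.002, 0, 1)           # t_short = 0.002 s
print(merge_hdr(long_img, short_img, 0.02, 0.002))
```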

Q: But some sensors need HDR processing done outside the sensor. For example, Samsung's S5KGN2, used in the Xiaomi Mi 11 Ultra, relies on an additional application processor to merge the staggered HDR frames it has captured, and that is also said to improve power efficiency. So in the future, will computational photography processing be done by the image sensor's on-chip ISP or by an independent application processor, and why?

A: For HDR, and in general for any computational photography algorithm, you can imagine implementing it in a few different ways. You could implement it purely in post-processing, capturing all the image data first and dealing with it later, perhaps even running it on the smartphone's processing unit and memory. Or you could go to the other extreme and start processing the data as soon as the light hits the sensor, doing what is called in-pixel processing. Those are the two extremes, and there is everything in between. In general, the closer you move the processing to the image sensor, the more advantages you get: you can process things faster, so latency is usually lower, and it is more robust against noise, motion, and other artifacts when you work on the information in the native raw domain, as close to the image sensor as possible. But there is a downside: you lose some flexibility. Say tomorrow I want to release some other finely tuned, optimized HDR algorithm. It is much easier for me to do that if it is implemented later in the signal processing chain. Sending someone an app update is far easier than having them do a firmware update that changes low-level code in the ISP or application processor. I think many computational photography algorithms, including HDR, have become so standard that most people simply treat them as default settings on their smartphones.

For algorithms like that, it does make sense to implement them on the on-chip ISP in a heavily optimized way; once they are deployed there, you don't really have to touch or change them much. As for the choice between the ISP and an independent application processor, the devil is in the details. Different applications have different requirements and different trade-offs, and you have to consider them case by case. It may make sense to implement an algorithm on a dedicated ISP that does one thing and does it really well, or it may make sense to keep some flexibility and run it on an application processor, where HDR is not the only thing it does and it handles many other tasks as well.

Q: My next question is about the image sensor market. At Sony's 2022 business segment briefing, the CEO of Sony Semiconductor Solutions said that by 2024 still images from smartphones are expected to exceed the image quality of interchangeable-lens cameras. Coincidentally, Marc Levoy, who led the team that developed the HDR+ technology for Google's Pixel smartphones, has expressed a similar view that phones have the potential to replace DSLRs. What do you think of this view? And from the perspective of image sensors and of software such as computational photography, what are the promising directions in mobile imaging that could achieve this goal?

A: Yes, that is a very interesting question. In some ways it is a bold statement for them to make. But then again, it is true to some extent, right? I am just an amateur photographer. Given my photography skills, the statement already holds: my smartphone takes much better photos than I can take with my DSLR right now, because I am simply not a good photographer.

There are plenty of amateurs for whom this is already the case; they are just taking pictures in conditions that are not particularly challenging. If you look at a high-end smartphone today, it already takes much better photos than a mid- or low-end DSLR. So to some extent the statement is already true. But if you want my honest opinion, I would make a more qualified statement rather than such a broad-brush one.

So I would probably say something like: five years from now, the image quality of a really high-end smartphone with good optics and a good image sensor will be better than that of a low-end interchangeable-lens camera. That would be my toned-down version of what some of these companies are saying today. And I think getting there will take progress on both sides, the hardware and the algorithms.

Again, that may be a slightly biased opinion, because I work in computational photography and computational imaging. But I think it is true, because we need to optimize both of these things together, in some sense, to keep improving the image quality we can get from these small image sensors.

Q: Can you tell us more about computational photography? We find it a very important technology for smartphone imaging, because you can improve image quality rapidly without enlarging the optics or buying a large, expensive camera module (CCM). Can you tell us more about the future of computational photography and some promising development directions?

A: From my perspective, one exciting direction for the future is single-photon image sensors, sensors with extremely high sensitivity.

I put these under the umbrella term of single-photon image sensors; how you actually implement them is up for debate. I think SPADs are one way to do it, and they have shown great promise and very rapid development over the past 5 to 10 years. On the hardware side, the resolution of these SPAD-based image sensors is increasing quite rapidly. But SPADs are not the only way to do single-photon imaging. There are other technologies, such as quanta image sensors, and even conventional CMOS-based image sensors like the small-pixel sensors we discussed earlier. Those are extremely sensitive too. They may not be single-photon sensitive, but they still have read noise of around one electron or even below one electron, which is enough to distinguish a handful of photons.

That kind of extreme sensitivity gives you a completely different way of capturing scene information, because it captures the scene at the finest granularity you could ever capture; physically, there is no finer unit of information than a single photon. In some sense, these image sensors can capture scene information up to the point where there is no additional information about the scene left to capture.

In the end, you have to couple this with smart computation to actually extract the scene information from the raw data. That is where the tight coupling between the hardware and the raw data comes in, and downstream computation will play a huge role in the computational photography and computational imaging advances we need.

Beyond single-photon imaging, there is also progress on the optics side, with a lot of work on designing cameras without lenses: lensless imaging, or cameras with unconventional lens designs such as metalenses. The image quality we get from these techniques today still cannot match that of lens-based cameras.

So I won't put a specific date or year on when a smartphone with a lensless camera will reach the market, but I suspect it is not too far in the future. Once again, computation will play a very important role in image capture. If we start looking at these lensless or meta-optics-based, unconventional sensing methods, the design of the metalenses themselves will have to be done jointly with the image sensor and the image reconstruction algorithm. I think that is an exciting future direction, and a great deal of work is already going on there.

Q: I read a blog post that, in a sense, described using a single-photon sensor and fusing its images with a conventional RGB sensor to improve dynamic range. Can you talk more about that? I find single-photon sensors a very promising direction. They are extremely sensitive; they can sense a single photon, which means low-light performance is very good. And because they count photons rather than accumulating charge, they do not need a physical full-well capacity, so it seems they can have a very high dynamic range. At the moment, though, I think SPAD sensors are still limited by resolution, which is not great. Do you have any comments on that?

A: That's a great point. As you mentioned, even though SPAD sensors have this single-photon sensitivity and may look best suited to low light, they are actually also very good at extremely high light levels. Because of the way they capture photons, they give you compression of the dynamic range essentially for free; as you said, a SPAD pixel does not need a potential well to store photoelectrons. So in theory there is no upper limit on how many photons a single-photon image sensor can count. Eventually you are limited by the readout speed, but that is a limit you only approach gradually; in theory you never really hit it. You can get extremely high dynamic range. However, because SPAD sensors are so new compared with conventional CMOS image sensors, their spatial resolution, fill factor, and photon detection efficiency are still much lower than those of CMOS image sensors.
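One way to see that "soft" readout limit is a standard first-order dead-time model for a free-running SPAD pixel (an illustrative model, not any specific sensor's spec): counts saturate gradually, and the incident flux can still be recovered by inverting the response, which is what yields the very wide dynamic range.

```python
# Sketch of a non-paralyzable dead-time model for a free-running SPAD pixel.
# Detected rate R ~= phi / (1 + phi * tau); inverting it recovers the flux phi,
# so both very dark and very bright pixels remain usable. Numbers are assumptions.
import numpy as np

tau = 100e-9   # assumed dead time: 100 ns
T   = 10e-3    # exposure time: 10 ms

def detected_counts(phi):                 # phi: incident photon rate (photons/s)
    return T * phi / (1.0 + phi * tau)

def estimate_flux(counts):
    r = counts / T
    return r / (1.0 - r * tau)            # invert the dead-time model

for phi in (1e2, 1e5, 1e8):               # six orders of magnitude of brightness
    c = detected_counts(phi)
    print(f"phi={phi:9.0f}/s  counts={c:12.1f}  recovered~{estimate_flux(c):12.0f}/s")
```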

There have been some recent news reports about companies releasing SPAD sensors with megapixel resolution, so that direction looks quite promising. We are still far from a 10- or 20-megapixel SPAD camera with a fill factor close to 90 percent, which is something you can already get with a CMOS camera.

Now, if you want high resolution and also extremely high dynamic range, I think it makes perfect sense to combine the best of both worlds: take the high-dynamic-range scene information from a SPAD-based sensor and the high spatial resolution from a CMOS image sensor. You can combine those two streams of information and get the best of both worlds, high dynamic range and high resolution.
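A toy sketch of that fusion idea (a deliberate simplification, not a published pipeline): use the CMOS frame where it is well exposed and fall back to the upsampled SPAD radiance in the clipped highlights.

```python
# Toy SPAD + CMOS fusion: low-resolution, high-dynamic-range SPAD radiance map
# combined with a high-resolution CMOS frame. Illustrative assumptions only.
import numpy as np

def fuse(spad_lowres, cmos_highres, scale, cmos_exposure, sat=0.98):
    spad_up = np.kron(spad_lowres, np.ones((scale, scale)))  # nearest-neighbor upsample
    cmos_rad = cmos_highres / cmos_exposure                  # linearized CMOS radiance
    valid = cmos_highres < sat                               # True where CMOS is unclipped
    return np.where(valid, cmos_rad, spad_up)

# Usage with synthetic data: 4x4 SPAD map, 16x16 CMOS frame, 4x resolution gap.
rng = np.random.default_rng(3)
spad = rng.uniform(0, 100, (4, 4))
cmos = np.clip(np.kron(spad, np.ones((4, 4))) * 0.01
               + rng.normal(0, 0.005, (16, 16)), 0, 1)
print(fuse(spad, cmos, scale=4, cmos_exposure=0.01).shape)   # (16, 16) HDR estimate
```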

Another reason I think this is promising is that there are already phones on the market with image sensor modules containing multiple cameras, where one of them is a CMOS camera and one of them is a SPAD camera currently used as a LiDAR sensor for 3D imaging applications. It goes to show that you can, in fact, have a multi-camera module that combines different types of image sensors, in this case a SPAD and a CMOS camera in a single module. It is then up to the downstream computation to combine that information in a smart way and deliver what is being captured.

Q: Speaking of SPAD cameras, as you have mentioned, what are the technical challenges for SPAD cameras, and what are the future development directions? I ask because we already talked about resolution, and I believe you also mentioned that these cameras have readout-time or readout-speed issues. Could you expand on that?

A: I don't think SPAD cameras have readout problems per se, but the raw data from a SPAD camera essentially contains a spike for nearly every photon that arrives at every pixel. The volume of raw data coming out of a SPAD camera is far too large to download off the sensor and handle in post-processing. So for SPAD cameras, I think we have to come up with smart methods for in-pixel processing, where we extract some information as quickly as possible so that we don't have to transfer all of the photon data off the image sensor, because that would require far too much bandwidth and would also consume a lot of power.

But that is feasible. You really don't need to send every single photon to the ISP or AP; you can extract some low-level information inside each pixel itself, or by processing groups of pixels, and then send that information to the ISP or AP. There are companies today already working on some really exciting applications that use SPAD cameras for passive imaging, even without fusing them with CMOS cameras. In some challenging scenarios that makes sense.
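The bandwidth argument can be made concrete with rough, assumed numbers: streaming raw binary SPAD frames off-chip versus reading out an accumulated in-pixel photon count once per output frame.

```python
# Back-of-the-envelope bandwidth comparison (all numbers are illustrative assumptions).
n_pixels     = 1_000_000   # 1 MP SPAD array
binary_fps   = 100_000     # binary frames per second (1 bit per pixel per frame)
readout_fps  = 30          # fused frames actually delivered to the ISP/AP
count_depth  = 12          # bits per accumulated in-pixel count

raw_stream_bps     = n_pixels * binary_fps * 1              # ship every binary frame
counted_stream_bps = n_pixels * readout_fps * count_depth   # ship per-pixel counts only

print(f"raw binary stream : {raw_stream_bps / 1e9:8.1f} Gbit/s")
print(f"in-pixel counting : {counted_stream_bps / 1e9:8.3f} Gbit/s")
```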

For example, if the light level is extremely low, single-photon sensitivity absolutely makes sense. And if you combine low light with high-speed motion, that is exactly where conventional CMOS cameras struggle to deliver good scene information, and that is where SPAD cameras really have an advantage.

Q: That point of view is very instructive. My next question: the logic chips in stacked image sensors are becoming more and more powerful, because the process nodes used for the logic die keep advancing; they used to be 65 nm, and the logic wafers for Sony's image sensors have now advanced to the 22 nm node. With stacking becoming more capable, will image sensors take on more and more functions beyond imaging in the future, like Sony's IMX500 with built-in AI processing, which can run edge AI tasks such as face recognition?

A: I think this trend will continue to grow. I feel pretty confident about that, because in many applications of image sensors the image itself is not the end goal; it is just the starting point for some smart processing algorithm that needs to extract meaningful, higher-level information from those images. Think of all the advances we are seeing in computer vision these days, the things we can do with learning-based, data-driven approaches that use deep neural networks trained on large image datasets. For such applications, the image is just the starting point, not the end goal. You can imagine many applications; take, for example, a robot navigating through a building. It really doesn't matter whether I show that robot aesthetically pleasing, beautiful-looking images.

All that matters in the end is: did the robot avoid collisions? Did it reach its destination quickly and smoothly? Beautiful, nice-looking images may not be needed to achieve that task. There are also plenty of industrial automation tasks where you run one very specific thing in a controlled environment. Say you are doing object recognition for a controlled set of objects; it is going to be one out of 10 or 100 objects. Again, in such cases it makes total sense to run the object recognition or detection algorithm right there on the image sensor. There is no need to download all the images and send them to some downstream compute module; running them on the image sensor is perfectly reasonable. That way you also avoid the extra latency and bandwidth cost of transferring the images to a host machine, and you can potentially run at much higher frame rates.

Overall, I think this trend of integrating machine learning algorithms on the image sensor will keep expanding and will be much bigger than it is today.

Q: Or hybrid bonding, these kinds of new technologies. In image sensors for mobile, it seems you already touched on Sony's 3D stacking, which relies on hybrid bonding, because the signals have to be transferred to the next layer of transistors, so hybrid bonding is needed. Are there any other new directions? I remember you mentioned metalenses.

Could you expand on that and tell us more about these approaches? I find them very promising, also for mobile phones, because they could reduce the size of the compact camera module. Nowadays smartphone camera sensors are getting larger and more lens elements are required, and you can see a big camera bump on the phone. If we had lensless camera technology, maybe we could use large sensors and still fit the small form factor of a smartphone. Can you tell us more about metalenses and lensless cameras?

A: I think it is certainly a promising future direction, but the image quality you get from lensless cameras today is still a long way from what a smartphone consumer expects to see; a lens-based camera still delivers much higher image quality.

The image reconstruction algorithms for these lensless imaging techniques can themselves be more complicated than for conventional lens-based cameras, because you are no longer capturing the scene information through a lens. The raw data mixes together light signals coming from many different parts of the scene, and the raw data itself does not look like an image at all. The onus is then on the downstream computational algorithm to unmix that information and produce the final nice-looking image. So the computation needed to produce the final image can be much heavier in that case. But I think lensless cameras could serve other applications.
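As a hedged illustration of that "unmixing" step, under the common simplification that a mask-based lensless measurement is a convolution of the scene with a known point spread function (PSF), a regularized inverse filter already recovers an image in the noise-free case; real systems need calibrated PSFs and much stronger priors.

```python
# Tikhonov-regularized inverse filtering for a toy mask-based lensless camera.
# Assumes the measurement is a convolution of the scene with a known PSF.
import numpy as np

def lensless_reconstruct(measurement, psf, reg=1e-6):
    # reg is tiny here because the toy measurement is noise-free; raise it for noisy data.
    H = np.fft.fft2(np.fft.ifftshift(psf), s=measurement.shape)
    Y = np.fft.fft2(measurement)
    X = np.conj(H) * Y / (np.abs(H) ** 2 + reg)   # regularized deconvolution
    return np.real(np.fft.ifft2(X))

# Toy usage: a block scene "imaged" through a random binary mask PSF, then recovered.
rng = np.random.default_rng(4)
scene = np.zeros((64, 64)); scene[20:40, 25:45] = 1.0
psf = rng.integers(0, 2, (64, 64)).astype(float); psf /= psf.sum()
measurement = np.real(np.fft.ifft2(np.fft.fft2(scene)
                                   * np.fft.fft2(np.fft.ifftshift(psf))))
recon = lensless_reconstruct(measurement, psf)
print(abs(recon - scene).mean())   # close to the scene in this noise-free toy case
```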

As I mentioned earlier, there are applications where the final image is not the main goal, and those could be interesting again for lensless cameras. One thing that comes to mind is that some cameras on today's smartphones essentially only do security-related things, such as face recognition. Maybe a lensless camera would be fine for that kind of application, where what you are looking for is some kind of unique signature of that person's face.

Maybe you don't need to capture a high-resolution, nice-looking photo of that person. There is also a lot of work on privacy-preserving algorithms that use lensless methods: there is no way to invert the data and figure out who the person is, but you can still run some computer vision tasks on it. For instance, you can figure out where the person is, or what other objects might be around them. You can do some object recognition or pose recognition, and tell whether the person is waving, clapping, or performing some other action, that kind of recognition application. So I think those could be interesting applications for metalenses and lensless cameras.

The original interview transcript (in English) follows:

Q:Can you briefly talk about CFA technology selection and sampling method selection?

A:I think partly it's a bit of a legacy issue for sure. I think because our rgbg has become such a popular thing and it's been around for so long. Most people who work on image sensors and downstream algorithms are familiar with how to deal with that type of data. If someone comes around with a completely over the top new pattern, then you need to design basically a new low-level algorithm that takes that raw data and makes an image. I mean that's just way too much work. And perhaps the amount of gain that you get in the end might not be that much in the overall grand scheme of things. Although I think there are some other patterns where especially for low light imaging, there are patterns where in addition to the rgb you also have one that's just capturing monochrome, just like a white channel if you want to call it that way.

So there may be some specific applications where some non-standard non Bayer patterns make sense. But in general, just because of the legacy issue, and just because maybe there isn't enough gain in terms of image quality that is to be had, there is a high cost involved in completely changing your low-level processing algorithm to deal with a new type of pattern. It may not make sense, in general, to use a non rgbg pattern.

Q:Why do more and more mobile sensors today use the unique CFAs called multi-pixel, such as the Quad Bayer by Sony, the Nonacell or Nonapixel from Samsung, and the 4-cell from OmniVision, instead of the Bayer, in recent years?

A:That's a great question. If you look at these multi-cell CFAs, they still look a lot like the Bayer pattern, because they still have the RGB and G, roughly speaking. But what they are doing is they are now grouping some of the pixels together and they're putting the same color filter on the group of pixels.

Now, the reason for that is, for smartphones and mobile image sensors, there is a constant push towards making the pixel size really small for the mobile phone market. That's just because you want your phones to be small. You want the image sensor to be really small, but you still want the high-resolution image, right? The only way you can do that is you have to make the pixel smaller.

Now, if you make the pixel smaller, each pixel will collect less light. And if it collects less light, then you have to deal with noise. There's just no way around it. There is that physical limitation where if the pixel is smaller then there will be noise in your final image. So the way you can deal with that noise is you do what's called binning, where you combine a bunch of pixels together. Right? That's a way to boost the amount of signal that you get back.

Now, if you were to do this process with the standard RGBG Bayer pattern, then you will essentially be combining pixels that have different filters on them. I think that's why there is a push towards using these groups of pixels where they have the same color filter on them. And then it makes more sense to combine them that way. In the end, it's just a trade-off where you're improving your image quality. You're reducing the amount of noise by paying a small amount of penalty in terms of the spatial resolution.

Q:Can you briefly talk about the combined sampling process of small pixels?

A:In terms of the final image quality, it's just like a multi-factor optimization problem. It becomes pretty complicated because you have so many other parameters that are playing a role in the final image quality. But with the multi-cell CFAs, another thing to remember is that you're doing the binning in the analog domain. I think that's the right way to do it. If you do the binning after digitization, then you've already paid the penalty because the A-to-D converter has already added some noise in there, right?

So then you lose the amount of gain that you could potentially get if you combine that signal while you were in the analog domain. And I think that's where there may be some perceptible improvement, if you do it in the analog domain as opposed to doing it digitally later on.

Q:So my next question is about the small pixels you have already mentioned. So why are image sensor manufacturers so keen on small pixel sensors, such as Sony, which has developed the industry-first 0.8 micron pixel, and Samsung, which has developed industry-first 0.9, 0.7 and 0.64 micron pixels, and OmniVision, which has developed the industry-first 0.61 micron pixel. So what are the advantages of the small pixels?

A:I mean, as I mentioned before, there is a constant push towards making the pixel size smaller for the smartphone market. Because there is just no other way of increasing image sensor resolution while also making your image sensor smaller. You have to physically fit all of those pixels in the same amount or an even smaller amount of physical area, while also allowing high resolution. So these are conflicting requirements, and it's physically impossible to do it any other way. You have to make the pixel smaller. Let me take an example. Let's say you have some still camera or like a video camera, right? The size of the image sensor is not as big a constraint. So you can have an image sensor which is maybe like 2 centimeters by 1 centimeter. You've got a large enough area, hundreds of square millimeters, where you can collect enough light even with a megapixel sensor. But compare that with the size of the sensor that you have to put on your smartphone. It's got maybe 10 square millimeters of area, which is a small fraction of how much area you have on a video camera or a still camera.

But then you pay the penalty of collecting less light. You have to deal with that one way or another. Now, one big challenge with these small pixels is that because the pixels are getting so much smaller, the amount of charge that they can collect is also much smaller. Right? So the full well capacity of these pixels is also much smaller. That limits the native dynamic range that you can get from these small pixels. But there's really no other way. I mean, if you want to maintain that high resolution in the small amount of area, those are some of the challenges that you will have to deal with computationally in post processing or by playing some other tricks in hardware.

For example, the binning idea that we just talked about. Okay?

So my expertise is not too much on the low-level hardware details. So I can't really go into too much detail on that. But I can certainly see that if your pixels are just so small, then it will be a challenge to fit the other processing circuitry that needs to go within each pixel. Although I feel like some of the recent advances in 3D stacking could help alleviate some of those problems, where you could fit that circuitry in a second layer or maybe even a third layer. But you're right. So it limits how much you can do inside each pixel when the pixel itself is so small.

That's also not easy to do, because from what I have read, some of the companies have some really special expertise in terms of aligning these different wafers together down to like micron-level precision, which is really important if your pixel sizes are so small. It's certainly true that type of technology has also benefited pixels that are slightly larger.

Q:Alright. Let's move on to the next section. My next section is about the on-chip HDR features or the logic chip of the image sensor. So why do more and more mobile CMOS sensors support on-chip HDR features? The image sensors of mobile phones used to just take images. Today more and more image sensors in mobile devices have features like staggered HDR or dual conversion gain HDR.

So why do more and more image sensors support those features?

A:So there is strong demand for better and better image quality on smartphones for consumer applications. Right? One way you can do is HDR imaging is the traditional way where you capture a bunch of images, and then you apply some post processing algorithm that could run on your smartphone using the cpu or the ram on your phone.

But there are several down sides to doing this kind of traditional post processing approach. One is latency, because if you're doing it in post processing, you're paying the extra penalty of transferring the image data over to either your cell phone memory or some intermediate processing, compute module, which does the HDR algorithm. The other downside is noise. It's always advantageous to run your algorithms as close to the image sensor in the native data format as possible. Because all the subsequent steps you apply to it will introduce some noise and some artifacts in that signal processing chain. The third downside to doing is post processing approach is that we won't be able to deal with motion artifacts, if it's also connects back to the latency issue where if something in the scene has moved, while you're capturing these multiple images, if you're not doing the running the HDR algorithm, fast enough, if your frames are captured after too much of a delay, then you have to deal with that motion in post processing.

And that is often very difficult to do. Then you'll have to live with some motion artifacts like ghosting in your final HDR image. That is increasingly not acceptable even to an average consumer these days, right? We have all gotten used to very good image quality on our phones. I think this is the main reason why many of the image sensor manufacturers are now pushing towards implementing these HDR features as close to the image sensor as possible. And I guess it's not a very new trend, because if you trace back these pixel designs where you had a finite full well capacity, but then the overflow charge used to be captured in like a bucket on the side that has been around since like the early 2000s.

And then more recently, there are these rolling shutter sensors where they use some kind of interlacing schemes where different rows use different exposure settings when the data is being captured. And then the post processing HDR algorithm is able to generate a single high dynamic range image while accounting for those exposure setting variations between either the different rows or different groups of pixels on your image sensor and so on. In the end, it's just making sure that there are no other motion or ghosting artifacts, because all of that data was captured simultaneously at the same time with respect to the scene, right?

At the end of the day, all of these techniques are just making a tradeoff. The tradeoff is between the final spatial resolution that you can get and the dynamic range.

Q:But other sensors require HDR processing outside of the sensor, such as the S5KGN2, which is from Samsung and which is used in the Xiaomi Mi 11 Ultra, and which requires an additional AP to process the staggered HDR images it has captured. And I believe that can also improve power efficiency. So in the future, will the processing for computational photography be completed by the image sensor's on-chip ISP or by independent APs? And why?

A:For HDR, or even in general, for any computational photography algorithm, right? You can imagine implementing it in a few different ways. You could either implement it as purely post processing, where you capture all the image data and then you deal with it later on, maybe even as late as running it on the processing unit and the RAM of your smartphone. Or you could run it on the extreme end, where you start processing the data as soon as the light is hitting the sensor; you do some in-pixel processing, as they call it. These are the two extremes, and then there is everything in between. Now, in general, I believe the closer you move your processing to the image sensor, there are some advantages to it, because then you can process things much faster. So there's low latency. In general, it is more robust to deal with noise and motion and other image artifacts when you deal with that information in the native raw data domain, which is as close to the image sensor as possible. But then there is a downside where you lose some amount of flexibility, right? Let's say tomorrow I want to release some other finely tuned, optimized algorithm for HDR. It's way easier for me to do that if it's implemented later in the signal processing chain, right? It's much easier to send someone an update for their app than having them do, like, a firmware update that has to change some low-level code in their ISP or AP. I think many computational photography algorithms, including HDR, have become so standard that I think most people would just consider many of these as just default settings on their smartphones.

And so for such algorithms, it does make sense to implement them on the on-chip ISP in a very heavily optimized way. Once it's there, you don't really have to touch it or change it that much. As for the final answer to your question, that dichotomy between ISPs and these independent application processors, I think the devil is in the detail there, because there may be many application-specific requirements and many different tradeoffs that you have to consider on a case by case basis. It may make sense to implement that on a dedicated ISP that just does one thing and does it really well, or it may make sense to have a bit of flexibility, and then you run it on an application-specific processor, where HDR is not the only thing it does; it does many other things.

Q:My next question is about the image sensor market, because at Sony's business segment briefings in 2022, the president and CEO of Sony Semiconductor Solutions Corporation said that in 2024 the still images of mobile phones are expected to exceed interchangeable lens camera image quality. And coincidentally, Marc Levoy, who led a team to develop the HDR+ technology for Google Pixel smartphones, also posed a similar opinion that mobile phones have the potential to replace DSLRs. What do you think of such a point of view? And what are the promising development directions in mobile imaging that can achieve this goal, from the perspective of the image sensor and of software such as computational photography?

A:Yeah. So that's a very interesting topic. And I think in some ways they are being bold in making that statement. But then again, it's true to some extent, right? I'm just an amateur photography person. So if you consider my photography skills, that statement is already true; like, my smartphone takes much better photos than what I can take with my DSLR camera right now, because I'm just not a good photographer, right?

There are so many amateurs out there for whom it's already true, and they're just taking pictures with really no challenging scene conditions. If you look at a high-end smartphone today, it's already taking much better photos than a mid- or low-end DSLR. So to some extent that statement is already true. But if you want my honest opinion on that statement, I would make a more qualified statement instead of such a broad brush.

So I would probably say something like, maybe in 5 years from now, the image quality of like a really high-end smartphone with good quality optics and a good quality image sensor will be better than, let's say, a low-end interchangeable lens camera. So I think that would be my toned-down version of what some of these companies are saying today, right? And I think the developments that will achieve that goal, they will come from both sides, both the hardware and the algorithms.

Again, this may be a slightly biased opinion, because I work in computational photography and computational imaging. But I think it's true because we need to optimize both of these things together, in some sense, to keep improving the image quality that we can get with these small size image sensors.

Q:Can you tell us more about computational photography technology? Because we found that it's a very important technology for the smartphone side, I believe, in smartphone imaging technologies. Because you don't need to enlarge your optics or use large, expensive CCMs, and you can improve your image quality rapidly. Can you tell us more about the future of computational photography and some development directions?

A:So I think one exciting direction in the future, from my perspective is image sensors that are capturing individual photons of light, these sensors that have extremely high sensitivity.

So I would put them under the umbrella term of single photon image sensors. How you actually implement it could be up for debate. I think SPADs are one way to do it, and they have shown great promise and very rapid development in the last 5 or 10 years. On the hardware side, the resolutions of these SPAD-based image sensors are increasing quite rapidly. But SPADs are not the only way of doing single photon imaging. There are other techniques. There are quanta image sensors. There are even traditional CIS-based image sensors, like the ones we discussed earlier with the small pixels. Those are extremely sensitive, right? They may not be single-photon sensitive, but they still have single-electron or even sub-electron read noise. They are able to distinguish between just a handful of photons.

That kind of extreme sensitivity gives you a completely different way of capturing scene information, because it captures scene information at the finest granularity that you could ever capture. Because there is no other physically possible information that you can capture beyond the photon. In some sense, these image sensors have the ability to capture scene information to a point where there is no additional information that can be captured about the scene.

So now, in the end, you have to then couple this with smart computation to like, actually extract that scene information from this raw data. I think that's where that tight coupling between the hardware and the raw data. And then the downstream computation will play a huge role in terms of the computational photography, computational imaging advances that we need there.

In addition to single photon imaging, there are also advances in the optics side where there's a lot of work going on, designing cameras that don't have lenses. So lensless imaging or designing cameras with some more unconventional lens designs like use of meta optics. So the image quality that we get right now from these kinds of techniques is still not quite close to what we can get with the lens based camera.

So I think I won't really put a date or a year on when a lensless camera will be available in the market, but I guess it's not too far in the future. But once again, computation will play a very a huge role in image capture. If we start looking at these lensless or meta optics based unconventional, lens based sensing methods, in fact, the design of these meta lenses themselves will have to be done in a joint fashion together with the image sensor and the image reconstruction algorithm. And I think that's an exciting future direction. And there is a ton of work that's already happening in that direction.

Q:I have read an article, maybe in a blog, where, in a sense, they said that using single photon sensors, you can fuse their images with those of the original RGB sensors to improve the dynamic range. Can you talk more about that? Because I find the single photon sensor a very promising direction. And it's very sensitive. It can sense only one photon. So that means its performance will be very good in low light, because it can count the number of photons. So it seems like it has a very large full well capacity, because it doesn't need a physical capacity. It will have a very high dynamic range. But nowadays, I think the SPAD sensor is limited by its resolution. It is not very great. So do you have some comments about that?

A:So that's a great question. As you mentioned, SPAD sensors, even though they have this single photon sensitivity, it may seem like they're good for low light, but they're actually also really good for extremely high light levels, because of the way they capture photons, they give you compression of dynamic range for free in some sense, like the pixel itself, as you said, because there is no well being filled up. In theory, there is no upper limit to how many photons you can count. In the end, you'll be limited by the speed of your readout, but it's like a slowly approaching limit. Like, in theory, you never really get there. You get this extremely high dynamic range, but because SPAD sensors are so new compared to conventional CMOS image sensors, their spatial resolution and their fill factors and their photon detection efficiencies are still much lower than CMOS image sensors.

There have been some recent news articles about some companies releasing like megapixel resolution SPAD sensors. So that direction is looking quite promising. We are still quite far away from having like a 10 or 20 megapixel SPAD camera with like 90% fill factor. But that is something you can get with a CMOS camera.

Now, if you want high resolution, and if you also want extremely high dynamic range, I think it makes perfect sense to combine the best of both worlds and have high dynamic range scene information come in from a SPAD-based sensor and extract high spatial resolution from a CMOS image sensor. And you can combine these two streams of information and get the best of both worlds: you have high dynamic range and high resolution.

The other reason why I think it's promising is because there are phones out there today that already have image sensor modules with multiple cameras on them, where one of them is a CMOS camera. One of them is a SPAD camera that's currently being used for 3D imaging for a LiDAR-like application. But it goes to show that you can, in fact, have a multi-camera module where you can combine different types of image sensors, in this case, the SPAD and the CMOS camera in a single module. And it's then up to the downstream computation to combine that information in a smart way to give you the information that it's capturing.

Q:Talking about SPAD cameras, as you have already said, what are the technical challenges of SPAD cameras, or are there some development directions for the future? I want to ask that because we have already talked about the resolution. I believe you have already mentioned that the camera has a readout time or readout speed issue. So can you just expand on that?

A:So I don't think SPAD cameras have readout issues per se, but the raw data from a SPAD camera will essentially contain a spike for almost every photon that arrives on every pixel, right? The volume of the raw data that will come out of this SPAD camera will be way too big to download off of the sensor and deal with in post processing. Right? So for SPAD cameras, I think it is absolutely essential that we come up with smart methods for doing some kind of in-pixel processing, where we extract some information as quickly as possible, so that we don't have to transfer all of this photon data off the image sensor, because that's going to require way too much bandwidth and will also consume a lot of power.

But that is possible. You really don't need to send every single photon off to your ISP or AP; you can extract some low-level information inside each pixel itself, or maybe by doing some groups-of-pixels processing, and then send that information off to the ISP or the AP. And there are companies out there today that are already working on some really exciting applications in the real world for just passive imaging using SPAD cameras, even without fusing them with CMOS cameras. And it does make sense to do that in some challenging scenarios.

For example, if you have extremely low light, then it absolutely makes sense to have single photon sensitivity there. And if you combine low light with high speed motion, that's really where conventional CMOS cameras do struggle to give good scene information. That's where SPAD cameras really have an advantage.

Q:That point of view is very instructive. So my next question is, the performance of the logic chip in stacked sensors is becoming more and more powerful, because the processing node of the logic chip is more and more advanced. It used to be, I believe, 65 nm nodes, and today the logic wafers for Sony's image sensors have advanced to the 22 nm node.

So the logic chip in the stacking approach is becoming more and more powerful. Will image sensors have more and more functions other than just imaging in the future, such as the Sony IMX500 with AI processing built in, which can do some edge AI processing like face recognition and such?

A:And I think this trend will continue to grow. I feel pretty confident about that, because there are so many applications of image sensors where the image itself is not the end goal. It's just the starting point for some kind of smart processing algorithm that needs to extract some kind of meaningful, higher-level information from these images. Right? Think of all the advances we are seeing in a computer vision these days, things that we can do with these learning based data driven approaches that use deep neural networks on large image data sets. Right? For such applications, the image is just the starting point. The image is not the end goal. You could imagine so many applications where, for example, let's take the application of a robot trying to navigate through a building, right? It doesn't really matter if I show aesthetically pleasing like really beautiful looking images to that robot, just doesn't matter.

All that matters in the end is did the robot avoid making collisions? And if it get to the destination quickly and smoothly, right? And maybe beautiful images, nice looking images are not needed for achieving this task. Then there is tons of industrial automation tasks where you need to run one very specific thing in a very controlled environment. Let's say you're doing object recognition for a very controlled set of objects. It is going to be just one out of these 10 or hundred objects. Again, in such cases, it totally makes sense to just run that object recognition or object detection algorithm right there on the image sensor. Right? There is no need to download all these images and then send them off to some compute module that's downstream. Makes perfect sense to run them on the image sensor. In that way, you also avoid extra latency and bandwidth cost of transferring these images over to a host machine. You can potentially run these even faster, much higher frames per second.

Overall, I think this trend of integrating machine learning algorithms on the image sensor will keep expanding and will be much bigger than what it is right now. Okay.

Q:Or hybrid bonding. Those are some new technologies. And in image sensors for mobile devices, it seems like you already mentioned 3D stacking from Sony, which is also based on hybrid bonding, because you need to transfer the output signals to the next level of transistors. So you need hybrid bonding and such. Are there any new directions? I remember that you mentioned the meta lens.

Can you expand and tell us more details about these methods? Because I found it very promising, also promising in the mobile phone, because it can reduce the size of the compact camera module. Because nowadays smartphone cameras have larger and larger sensor sizes and they require more and more lenses. You can see a giant bump on the smartphone. So if we have lensless camera technologies, maybe we can use large sensors and still fit in the small form factor of the smartphone. So can you tell us more about the meta lenses or the lensless camera?

A:I think it's certainly a really promising future direction, but the type of image quality that you get from meta-optics-based cameras today is still quite far away from the type of image quality that a smartphone consumer will expect to see; like, a lens-based camera still gives you much higher image quality. And the image reconstruction algorithms themselves for all these lensless imaging techniques,

they can be more complicated than conventional lens-based cameras, because you're not capturing the scene information through a lens anymore. The raw data itself is mixing together the light signals that are coming from many different parts of the scene. The raw data itself does not look like the image at all. Now, the onus is on the downstream computational algorithm to unmix this information, if you will, and produce a final nice looking image. Right? So the computation can be much heavier in that case to produce the final image. But I think there could be other applications.

As I was mentioning earlier, there are applications where the final image is not the main goal. Those applications could be interesting again for lensless cameras. one thing that comes to my mind is there are cameras on smartphones today that are essentially just doing security related things like face id or something like that. Maybe a lensless camera might be okay for that kind of application, right? You are looking for is some kind of unique signature for that person's face.

And then maybe you don't need to actually take a high resolution, a nice looking image of that person. There's also a lot of work in these privacy preserving algorithms that are using lensless methods, basically, right? There is no way to invert that data and figure out who the person was. But you're still able to do some computer vision tasks on that data. Like you can maybe figure out where the person is or what other objects there might be around it. You could do some kind of object recognition or pose recognition. You could figure out if the person is waving their hands or clapping or some other action recognition, kind of application. So I think those could be interesting applications for meta lens and lensless cameras.

Editor in charge: Wu Shouzhe (武守哲)
THE END

*This content is original to JiWei.com (ijiwei); all rights reserved by JiWei.com.
