Content-based Projections for Panoramic Images and Videos - PUC-Rio
Contents
1. …the FOV $[\alpha_1,\alpha_2]\times[\phi_1,\phi_2]$ that will be projected does not change across time; only objects are moving. For this case it is expected that the background is always the same and that the objects move in a temporally coherent way. This coherence will be explained further.
• Case 2: Stationary VP, moving FOV and stationary objects. In this case the viewpoint and the whole scene are stationary. The only thing that changes through time is the rectangle $[\alpha_1,\alpha_2]\times[\phi_1,\phi_2]$. This case may be seen as a panoramic visualizer of scenes, in which one could navigate through a scene with panoramic views of it.
• Case 3: Moving VP. This case is the most difficult, since everything in the scene changes through time, even still objects. The separation between scene and objects is harder in this case, and so is the modeling of temporal coherence.
Table 4.1 illustrates the separation we just did.

4.3 Desirable Properties
We divide the requirements for a perceptually good panoramic video into two categories: per-frame requirements and temporal requirements.

Table 4.1: Separation of the panoramic video problem.
                                 Stationary objects   Moving objects
Stationary VP, stationary FOV    Trivial case         Case 1
Stationary VP, moving FOV        Case 2               Cases 1 and 2
Moving VP                        Case 3               Case 3

The per-frame requirements are the properties that each frame must satisfy independently of the other frames. We propose the following two requirements:
1. Each frame must be a good panoramic image. Undesirable dis…
2. …"Creating full view panoramic image mosaics and environment maps," in SIGGRAPH '97: Proceedings of the 24th Annual Conference on Computer Graphics and Interactive Techniques, New York, NY, USA, pp. 251-258, ACM Press/Addison-Wesley Publishing Co., 1997.
[3] M. Brown and D. G. Lowe, "Recognising panoramas," in ICCV '03: Proceedings of the Ninth IEEE International Conference on Computer Vision, Washington, DC, USA, p. 1218, IEEE Computer Society, 2003.
[4] J. Kopf, M. Uyttendaele, O. Deussen, and M. F. Cohen, "Capturing and viewing gigapixel images," ACM Trans. Graph., vol. 26, no. 3, p. 93, 2007.
[5] D. Zorin and A. H. Barr, "Correction of geometric perceptual distortions in pictures," in SIGGRAPH '95: Proceedings of the 22nd Annual Conference on Computer Graphics and Interactive Techniques, New York, NY, USA, pp. 257-264, ACM, 1995.
[6] M. Agrawala, D. Zorin, and T. Munzner, "Artistic multiprojection rendering," in Proceedings of the Eurographics Workshop on Rendering Techniques 2000, London, UK, pp. 125-136, Springer-Verlag, 2000.
[7] L. Zelnik-Manor, G. Peters, and P. Perona, "Squaring the circles in panoramas," in ICCV '05: Proceedings of the Tenth IEEE International Conference on Computer Vision, Washington, DC, USA, pp. 1292-1299, IEEE Computer Society, 2005.
[8] J. Kopf, D. Lischinski, O. Deussen, D. Cohen-Or, and M. Cohen, "Locally adapted projections to reduce panorama distortions," Computer…
3. An example of what one would obtain when minimizing only the line energies is shown in figure 2.16.
Figure 2.16: The method tries to straighten the specified lines, but due to the discontinuous nature of the straight-line constraints the result becomes very unpleasant.

2.5.2 Inverting Bilinear Interpolation
This section is devoted to obtaining the bilinear coefficients $(a,b,c,d)$ used to define the output virtual vertices. As we explained, such coefficients are obtained in the following way:
• Given $A, B, C, D \in S^2$, project them orthogonally onto the plane tangent to $S^2$ passing through $P = (P_1, P_2, P_3)$, say $\Pi$, and obtain $A', B', C', D'$.
• Obtain $(a,b,c,d)$ such that $P = aA' + bB' + cC' + dD'$.
The first step is quite simple: the equation of $\Pi$ is $\langle (x,y,z) - P, P\rangle = 0$, which can be rewritten as $P_1 x + P_2 y + P_3 z = 1$. If $A'$ is the orthogonal projection of $A$ (see figure 2.17), it satisfies $A' = A + \lambda P$ and $\langle A', P\rangle = 1$ for some $\lambda$. Solving the above equations leads to $A' = A + \lambda P$, where $\lambda = 1 - \langle P, A\rangle$. The same holds for $B'$, $C'$ and $D'$. Now we have the situation illustrated in figure 2.18. Let $n$ be the unit vector normal to $\Pi$; we know that in our case $n = P$, but we ignore this for a while in order to obtain a general result.
Figure 2.18: Projected vertices on the tangent plane.
We first look for $\beta$ such that $s_1(\beta) = (1-\beta)A' + \beta C'$, $s_2(\beta) = (1-\beta)B' + \beta D'$ and $P$ are collinear (figure 2.19).
Figure 2.19: $P$, $s_1(\beta)$ and $s_2(\beta)$ must be…
4. …and the center of the image represents the point $(0,0)$ of the equirectangular domain, as illustrated in figure 1.6.
Figure 1.6: Correspondence between the equirectangular domain and the equirectangular image.
The correspondence between $[-\pi,\pi]\times[-\pi/2,\pi/2]$ and the equirectangular image is very simple. It is the affine map
$(\lambda,\phi) \mapsto (x,y) = \big( \tfrac{\lambda+\pi}{2\pi}(n-1),\ \tfrac{\pi/2-\phi}{\pi}(m-1) \big)$,
where $m$ and $n$ are the height and width of the equirectangular image, and the image is assumed to have coordinates in $[0,m-1]\times[0,n-1]$. To keep the proportions of the equirectangular domain we impose $n = 2m$. The inverse correspondence between the equirectangular image and $[-\pi,\pi]\times[-\pi/2,\pi/2]$ is immediate:
$(x,y) \mapsto (\lambda,\phi) = \big( \tfrac{2\pi x}{n-1} - \pi,\ \tfrac{\pi}{2} - \tfrac{\pi y}{m-1} \big)$.
The equirectangular image shows the strong distortions caused by the mapping $r$. For example, regions near $\phi = -\pi/2$ and $\phi = \pi/2$, which correspond to regions near the south and north poles of the sphere, are too stretched. That happens because the sphere has a much smaller area near the poles, for a given variation of $\lambda$, than near the equator ($\phi = 0$) for the same variation of $\lambda$; yet these very different areas are represented by the same number of pixels in the equirectangular image.
To finish this discussion about equirectangular images, it is important to emphasize how popular this image format became during the last years. Today it is possible to find thousands…
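The affine correspondence above translates directly into code. The following minimal C++ sketch converts in both directions; the helper names are ours, and the convention that row 0 corresponds to latitude $\pi/2$ and column 0 to longitude $-\pi$ is an assumption consistent with the description above.

```cpp
#include <cmath>
#include <utility>

// Equirectangular image of size m x n (n = 2m), pixel coords in [0,m-1] x [0,n-1].
std::pair<double,double> sphereToPixel(double lambda, double phi, int m, int n) {
    double col = (lambda + M_PI) / (2.0 * M_PI) * (n - 1);   // longitude -> column
    double row = (M_PI / 2.0 - phi) / M_PI * (m - 1);        // latitude  -> row
    return {col, row};
}

std::pair<double,double> pixelToSphere(double col, double row, int m, int n) {
    double lambda = col / (n - 1) * 2.0 * M_PI - M_PI;
    double phi    = M_PI / 2.0 - row / (m - 1) * M_PI;
    return {lambda, phi};
}
```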
5. …interesting new possibilities for narratives. Special display devices that cover a wider FOV can be developed to better explore this new way of filming and displaying.
• Sports broadcasting. There is a huge difference between watching a soccer game in the stadium and at home on TV. One of the reasons for this is that in the stadium we perceive much better the details of the whole field, while at home we only see a limited FOV, usually a window around the position of the ball. Since the structure of a soccer-game scene is very simple (some lines on the field and a set of known moving objects), one of the first applications of panoramic videos could be sports filming. These considerations extend to other sports such as basketball, baseball, volleyball, and so on.
• Surveillance. Spherical cameras can replace common cameras for surveillance, since they capture much more information. Analysis of spherical videos (for example, people detection) becomes necessary, and informative ways of displaying the video (i.e., the choice of the appropriate projection) may be analyzed.
We intend to work on these, and possibly other, practical applications as soon as we develop the theoretical ideas on the theme further.

Bibliography
[1] R. Carroll, M. Agrawala, and A. Agarwala, "Optimizing content-preserving projections for wide-angle images," ACM Trans. Graph., vol. 28, no. 3, pp. 1-9, 2009.
[2] R. Szeliski and H.-Y. Shum,…
6. …(with $v' = \beta\tan\phi$ when $u = 0$). We show in figure 1.19 a good result obtained by setting $\alpha = 2$ and $\beta = 0.7$.
Figure 1.19: 180-degree longitude, 130-degree latitude.

1.5 Previous Approaches
As pointed out in the last sections, it is not an easy task to obtain an image from a wide field of view. We showed some of the best-known projections from the sphere to an image plane and observed that all of them have advantages and disadvantages. This section is devoted to discussing previous approaches, created during the last years, that deal with the problem of distortions in panoramic images. We do not intend to get into too many details of each approach; we intend to show their key ideas in order to motivate the next chapter. Thus, this section is meant as a review and a motivation.
We chose to discuss three papers that, in our understanding, contain the key ideas in the development of this theme: [5], [6] and [7]. At the end of the section we mention some other important related work.

1.5.1 Correction of Geometric Perceptual Distortions in Pictures
This work by Zorin and Barr [5] is surely one of the most referenced in this area. It is probably the first work to apply perceptual principles to the analysis and construction of planar images of the 3D world. Their theory is even more applicable to panoramic images, where deviations of perception are more pronounced. The authors mention that the most important features that an image should have…
7. …well-defined vertical orientation in this frame (FOV: 180-degree longitude, 120-degree latitude). To correct this problem we use the temporal coherence equations for the object; doing that, we also satisfy desirable property 4. Rewriting the equations, we have
$\frac{\partial V}{\partial\lambda}(\psi^{ob}_{t_1,t_2}(\lambda,\phi,t_1), t_2) = \frac{\partial V}{\partial\lambda}(\lambda,\phi,t_1)$, $\quad \frac{\partial V}{\partial\phi}(\psi^{ob}_{t_1,t_2}(\lambda,\phi,t_1), t_2) = \frac{\partial V}{\partial\phi}(\lambda,\phi,t_1)$,
Figure 4.11: 16th frame of the video produced using only the image energies. Due to the lines near the man, he is very curved in this frame. This demonstrates temporal incoherence, since he remains vertical in the input video.
for all $(\lambda,\phi,t_1) \in S^{ob}\times\{t_1\}$ and all $t_1, t_2 \in [0, t_f]$.
In the discretized version of the equations we only use the transition functions from one frame to the next, i.e., we take $t_2 = t_1 + 1$, and we denote by $(\lambda',\phi')$ the point to which a vertex $(\lambda_{ij},\phi_{ij})$ is transported. We always map vertices to vertices of the discretization; if that does not happen, we map to the closest vertex. The discretization of the equations is then the set of finite-difference constraints
$\frac{V_{i,j+1,k+1} - V_{i,j,k+1}}{\Delta\lambda} = \frac{V_{i,j+1,k} - V_{i,j,k}}{\Delta\lambda}$, $\quad \frac{V_{i+1,j,k+1} - V_{i,j,k+1}}{\Delta\phi} = \frac{V_{i+1,j,k} - V_{i,j,k}}{\Delta\phi}$,
written on the vertices of the object region, with the left-hand sides evaluated at the transported vertices. We use these equations to formulate the object temporal coherence energy $E^{ob}_{TC}$: it is the sum, over the frames and over the vertices of the object region, weighted by $\cos\phi_{ij}$, of the squared differences between the two sides of the constraints above. It is not difficult to rewrite $E^{ob}_{TC}$…
8. …[1]. In our work we wrote the perturbed energy $E_\varepsilon$ explicitly and proved some statements in order to obtain its minimizer. Another example is how to turn the quadratic energies into matrix form and which matrices are obtained in this process. The discussion of the results was richer in our work, and we could conclude that the method is applicable to a good variety of scenes. All this analysis is one of the main contributions of our work.
• Development of the application software. We developed software that implements all the theory discussed about [1]. New features, such as the specification of the FOV and of the numbers of iterations and vertices, were added to the interface of [1], which makes our software a small contribution of our work.
• Methods to detect features in equirectangular images. We applied Computer Vision techniques to develop methods for detecting faces and lines in equirectangular images, which are important for the main reference discussed in the thesis. The line detection method is an original contribution of our work.
• Panoramic videos. We separated the problem into 3 cases, stated desirable properties of panoramic videos, defined the temporal viewing sphere and studied some of its properties, mathematically modeled the panoramic video problem, turned the desirable properties into mathematical equations, obtained energies that measure how temporally incoherent a panoramic video is, and suggested an optimization soluti…
9. …= 1 we have the stereographic projection. Some results are shown in figures 1.15 and 1.16.
Figure 1.15: Both with 150-degree longitude, 150-degree latitude. Left: $K = 0$. Right: $K = 0.7$.
Figure 1.16: Both with 150-degree longitude, 150-degree latitude, for two other values of $K$ (right: $K = 1$).
The advantages of these new projections are the following:
• They give us an intuitive way to control the conformality and the preservation of straight lines in the scene. The greater the $K$, the better the shapes of objects are preserved; the extreme case $K = 1$ leads to the stereographic projection, which is conformal. On the other hand, when lower values of $K$ are used, straight lines become less bent and objects more stretched; the extreme case $K = 0$ leads to the perspective projection. Therefore, the parameter $K$ can be adjusted according to the scene and the FOV that is being projected.
• The value $K$ could depend on the point of the sphere. Thus we would have $K$ as a function of $(\lambda,\phi)$: $K = K(\lambda,\phi)$. This idea allows the possibility of a projection locally adapted to the image content. Possibly $K(\lambda,\phi)$ should have some degree of smoothness; such analysis and results for this approach are left as future work.

1.4.2 Perspective Projection Centered on Other Points
The perspective projection shown in section 1.3.1 preserves well only the shape of objects near the point with $\lambda = \phi = 0$, the center of such projection. As we'll see i…
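As a small illustration of the one-parameter family described above, the following C++ sketch assumes that the perspereographic projection with parameter $K$ projects the unit viewing sphere onto the plane $x = 1$ from the center $(-K, 0, 0)$; with that assumption $K = 0$ reproduces the perspective projection and $K = 1$ the stereographic projection, as stated in the text. Function and type names are ours.

```cpp
#include <cmath>

struct UV { double u, v; };

// Perspereographic projection of the point with longitude lambda, latitude phi.
// Undefined where cos(lambda)*cos(phi) + K <= 0 (points "behind" the center).
UV perspereographic(double lambda, double phi, double K) {
    double x = std::cos(lambda) * std::cos(phi);   // point on the viewing sphere
    double y = std::sin(lambda) * std::cos(phi);
    double z = std::sin(phi);
    double t = (1.0 + K) / (x + K);                // ray from (-K,0,0) meets x = 1
    return { t * y, t * z };
}
```

A spatially varying $K(\lambda,\phi)$, as suggested above, would simply replace the constant parameter by a per-point value.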
10. Both have the same purpose of controlling the conformality and the straight lines in the scene. Which one is better? The choices $\lambda = 0.5$ and $K = 0.5$ are arbitrary.
The key ideas that we can take from this work are:
• a panoramic image should respect the structural requirements;
• no panoramic image (mapping from the viewing sphere to a plane) that satisfies the structural requirements can completely satisfy both desirable properties at the same time;
• an optimization framework is an option for the task of minimizing all the important distortions.

1.5.2 Artistic Multiprojection Rendering
As we have already mentioned, the perspective projection causes too much distortion for wide-angle fields of view. A simple and effective alternative to this problem is to render the most distorted objects in a different way. This alternative was already known and used by painters hundreds of years ago, for example in Raphael's School of Athens (figure 1.23). The humans in the foreground would appear too distorted if rendered with the same perspective projection as the background, so Raphael altered the projections of the humans to give each one a more central perspective projection. Far from hurting the realism of the painting, this choice actually improved it.
This work by Agrawala, Zorin and Munzner [6] suggests using the same method for computer-generated images and animations: a scene is rendered using a set of different cameras. One of these cameras is elected t…
11. …$(C'-A')$ and $(D'-B')$ have a negative orientation, and thus $C'A'$ and $D'B'$ are diverging on $\Pi$, as shown in figure 2.20.
Figure 2.20: $C'A'$ and $D'B'$ diverging on $\Pi$.
Figure 2.20 shows the two possibilities for collinearity between $P$, $s_1(\beta)$ and $s_2(\beta)$. We do not want the smallest root, because it is negative and we need $0 \le \beta \le 1$. Since $a_1 < 0$, the largest (and desired) root is
$\beta = \frac{-a_2 - \sqrt{a_2^2 - 4a_1 a_3}}{2a_1}$.
For $a_1 > 0$ we have the situation illustrated in figure 2.21.
Figure 2.21: $C'A'$ and $D'B'$ converging on $\Pi$.
Now the desired root is the smallest one. Since $a_1 > 0$, we again have
$\beta = \frac{-a_2 - \sqrt{a_2^2 - 4a_1 a_3}}{2a_1}$.
Thus we conclude that the desired root is always obtained by taking the negative sign in the expression for $\beta$.
Now we want to find $\alpha$ such that $P = (1-\alpha)s_1(\beta) + \alpha s_2(\beta)$, as shown in figure 2.22.
Figure 2.22: $P$ as a convex combination of $s_1(\beta)$ and $s_2(\beta)$.
This is achieved by defining $\alpha$ as the length of the projection of $P - s_1(\beta)$ on the unit vector in the direction of $s_2(\beta) - s_1(\beta)$, divided by $\|s_2(\beta) - s_1(\beta)\|$:
$\alpha = \frac{\langle P - s_1(\beta),\ s_2(\beta) - s_1(\beta)\rangle}{\|s_2(\beta) - s_1(\beta)\|^2}$.
Thus
$P = (1-\alpha)s_1(\beta) + \alpha s_2(\beta) = (1-\alpha)[(1-\beta)A' + \beta C'] + \alpha[(1-\beta)B' + \beta D'] = aA' + bB' + cC' + dD'$,
where $a = (1-\alpha)(1-\beta)$, $b = \alpha(1-\beta)$, $c = (1-\alpha)\beta$, $d = \alpha\beta$.

2.6 Smoothness
Joining the conformality and straight-line energies and minimizing them leads to a result like the one in figure 2.23.
Figure 2.23: The m…
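The whole inversion procedure of sections 2.5.2 and above fits in a few lines of C++. The sketch below is ours: it assumes $P, A, B, C, D$ are unit vectors on the viewing sphere forming a non-degenerate quad, and, instead of reproducing the sign analysis of $a_1$, it simply keeps the quadratic root that falls in $[0,1]$, which selects the same geometrically valid root.

```cpp
#include <algorithm>
#include <array>
#include <cmath>

using Vec3 = std::array<double,3>;

static Vec3 add(Vec3 a, Vec3 b)   { return {a[0]+b[0], a[1]+b[1], a[2]+b[2]}; }
static Vec3 sub(Vec3 a, Vec3 b)   { return {a[0]-b[0], a[1]-b[1], a[2]-b[2]}; }
static Vec3 mul(double s, Vec3 a) { return {s*a[0], s*a[1], s*a[2]}; }
static double dot(Vec3 a, Vec3 b) { return a[0]*b[0] + a[1]*b[1] + a[2]*b[2]; }
static Vec3 cross(Vec3 a, Vec3 b) {
    return { a[1]*b[2]-a[2]*b[1], a[2]*b[0]-a[0]*b[2], a[0]*b[1]-a[1]*b[0] };
}

// Orthogonal projection of X onto the plane <x,P> = 1 tangent to the sphere at P.
static Vec3 onTangentPlane(Vec3 X, Vec3 P) { return add(X, mul(1.0 - dot(P, X), P)); }

// Coefficients (a,b,c,d) with P = a A' + b B' + c C' + d D'.
std::array<double,4> invertBilinear(Vec3 P, Vec3 A, Vec3 B, Vec3 C, Vec3 D) {
    Vec3 n  = P;                                       // normal of the tangent plane
    Vec3 Ap = onTangentPlane(A, P), Bp = onTangentPlane(B, P);
    Vec3 Cp = onTangentPlane(C, P), Dp = onTangentPlane(D, P);

    // Collinearity of P, s1(b) = (1-b)A'+bC', s2(b) = (1-b)B'+bD' expands to
    // <n, (s1(b)-P) x (s2(b)-s1(b))> = a1 b^2 + a2 b + a3 = 0.
    Vec3 PA = sub(P, Ap), CA = sub(Cp, Ap), BA = sub(Bp, Ap), DC = sub(Dp, Cp);
    double a1 = dot(n, cross(CA, sub(DC, BA)));
    double a2 = dot(n, add(cross(PA, sub(BA, DC)), cross(CA, BA)));
    double a3 = -dot(n, cross(PA, BA));

    double beta;
    if (std::fabs(a1) < 1e-12) {                       // degenerate (linear) case
        beta = -a3 / a2;
    } else {
        double disc = std::sqrt(std::max(0.0, a2*a2 - 4.0*a1*a3));
        double b1 = (-a2 + disc) / (2.0*a1), b2 = (-a2 - disc) / (2.0*a1);
        beta = (b1 >= -1e-9 && b1 <= 1.0 + 1e-9) ? b1 : b2;  // root in [0,1]
    }

    Vec3 s1 = add(mul(1.0 - beta, Ap), mul(beta, Cp));
    Vec3 s2 = add(mul(1.0 - beta, Bp), mul(beta, Dp));
    Vec3 d12 = sub(s2, s1);
    double alpha = dot(sub(P, s1), d12) / dot(d12, d12);

    return { (1.0 - alpha) * (1.0 - beta),             // a (coefficient of A')
             alpha * (1.0 - beta),                     // b (coefficient of B')
             (1.0 - alpha) * beta,                     // c (coefficient of C')
             alpha * beta };                           // d (coefficient of D')
}
```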
12. …and passes through zero. This also happened for the simple perspective projection: the points $(x,y,z) \in S^2$ with $x \le 0$ could not be projected. The final projection lies in a plane of $\mathbb{R}^3$, and we would like it to have a 2D coordinate system. This is achieved by performing two rotations, in order to transform that plane into the plane $x = 1$, for example. We will not expose the details of this process here. We show in figure 1.18 a result where the center object (the tower) appears less stretched than in the standard perspective projection (right image in figure 1.8).
Figure 1.18: Perspective projection centered on a point away from $(0,0)$.

1.4.3 Recti-Perspective Projection
This projection, also known as the Pannini projection, is a modification of the perspective projection designed to handle wider FOVs and to preserve radial and vertical lines. The other lines appear bent in the final result. Let $u$ and $v$ be the coordinates of the standard perspective projection and $u'$ and $v'$ the coordinates of the recti-perspective projection. The modification
$u' = \alpha \arctan(u/\alpha)$,
where $\alpha$ is a chosen parameter, allows this projection to handle wider FOVs and preserves vertical lines: $\lambda$ constant implies $u'$ constant. In order to preserve radial lines, the $v$ coordinate of the perspective projection must be scaled by a combination of 1 and the factor used to scale the $u$ axis, i.e.,
$v' = v\,\big(1 - \beta + \beta\,u'/u\big)$,
where $\beta$ is a chosen parameter. The above expressions lead to…
13. …by Flickr user Vincent Montibus.
• Field of view: 210-degree longitude, 140-degree latitude.
• Number of vertices: 55,970.
• Number of double iterations: 6.
• Final energy: E = 4.9531 x 10^-7.
• Time to construct the matrices: 17 seconds.
• Time to perform the optimizations: 124 seconds.
• Time to generate the final result: 10 seconds.
Input image and marked lines.
Standard projections: perspective, stereographic and Mercator.
Modified standard projections: perspereographic (K = 0.6) and recti-perspective (alpha = 2.6, beta = 0.6).
Result of the method (uncropped).
Result of the method (cropped).
• Comments for this example: This example illustrates that even when many lines are marked on the input image, the method returns a good result. Sixty-nine lines were marked, covering a good part of the field of view to be projected, and this was well handled. Because of the quantity of green lines, more double iterations were needed to reach visual convergence; thus the method took longer than two minutes to obtain visual convergence.

3.3 Result 3
• Source image: La Caverne Aux Livres, by Flickr user Gadl.
• Field of view: 150-degree longitude, 140-degree latitude.
• Number of vertices: 39,758.
• Number of double iterations: 3.
• Final energy: E = …
14. …$x = \cos\lambda\cos\phi$ and $y = \sin\lambda\cos\phi$ imply $\lambda = \arctan(y/x)$ for $x > 0$. Now consider $x < 0$, $y < 0$: for such points on the sphere we must have $-\pi < \lambda < -\pi/2$, and $\lambda = \arctan(y/x) - \pi$ satisfies this inequality and the relations between $x$, $y$ and $\lambda$. Analogously, for $x < 0$ and $y > 0$ the solution is $\arctan(y/x) + \pi$. For points $(x,y,z) \in S^2$ with $x = 0$, $y < 0$ the corresponding longitude is $\lambda = -\pi/2$, and for $x = 0$, $y > 0$ we have $\lambda = \pi/2$. To summarize, $\lambda = \operatorname{arctan2}(y, x)$, where
$\operatorname{arctan2}(y,x) = \arctan(y/x)$ if $x > 0$; $\arctan(y/x) + \pi$ if $x < 0,\ y \ge 0$; $\arctan(y/x) - \pi$ if $x < 0,\ y < 0$; $\pi/2$ if $x = 0,\ y > 0$; $-\pi/2$ if $x = 0,\ y < 0$.
Besides allowing the user to specify lines in the equirectangular domain, the interface also allows her to set the orientation of the lines by typing "v" for vertical lines, "h" for horizontal lines, or "g" for lines with no specified orientation. As we mentioned in section 2.1, it is important that the final result has orientation consistency. If it were not possible to set line orientations, the final panoramic image would look like figure 2.3.
Figure 2.3: A result produced by the method if the user had only marked lines with no specified orientation. It seems that the tower is falling, for example.
In order to avoid such a problem, we give the user the possibility of specifying the orientation of the lines she wants to be vertical or horizontal in the final result.
To summarize, our interface works in the following way:
• Inpu…
15. …represented with solid lines. For this example, factor = 2. As mentioned before, we do not have any control over the positions that the solution $x$ of the optimization will return. Let
$u_{min} = \min_{i,j} u_{ij}$, $\ u_{max} = \max_{i,j} u_{ij}$, $\ v_{min} = \min_{i,j} v_{ij}$, $\ v_{max} = \max_{i,j} v_{ij}$.
We map all values $(u_{ij}, v_{ij})$ to $(\tilde u_{ij}, \tilde v_{ij})$ such that $(\tilde u_{ij}, \tilde v_{ij}) \in [0, \mathrm{ratio}\cdot\mathrm{res}] \times [0, \mathrm{res}]$, according to the following transformation:
$\tilde u_{ij} = \frac{u_{ij} - u_{min}}{u_{max} - u_{min}}\,\mathrm{ratio}\cdot\mathrm{res}$, $\quad \tilde v_{ij} = \frac{v_{ij} - v_{min}}{v_{max} - v_{min}}\,\mathrm{res}$,
where $\mathrm{ratio} = \frac{u_{max} - u_{min}}{v_{max} - v_{min}}$ and $\mathrm{res}$ is a specified parameter that determines the height of the result image.
Each pixel of the equirectangular image (for example, pixel $P$ in figure A.10) has known surrounding vertices in the discretization of $[\alpha_1,\alpha_2]\times[\phi_1,\phi_2]$ and known bilinear coefficients, say
$P = a(\lambda_{i,j},\phi_{i,j}) + b(\lambda_{i+1,j},\phi_{i+1,j}) + c(\lambda_{i,j+1},\phi_{i,j+1}) + d(\lambda_{i+1,j+1},\phi_{i+1,j+1})$, with $a + b + c + d = 1$.
We simply define that $P$ is mapped to $(u(P), v(P))$ on the final image, where
$u(P) = a\,\tilde u_{i,j} + b\,\tilde u_{i+1,j} + c\,\tilde u_{i,j+1} + d\,\tilde u_{i+1,j+1}$, $\quad v(P) = a\,\tilde v_{i,j} + b\,\tilde v_{i+1,j} + c\,\tilde v_{i,j+1} + d\,\tilde v_{i+1,j+1}$.
Thus the final image is just a bilinear interpolation of the positions of the vertices of the discretization of the viewing sphere. A problem with this approach is that, if the final image has a higher resolution than the input image, holes in the final image may appear. A more appropriate approach would be to use inverted bilinear interpolation (section 2.5.2). We leave this alternative as future work and assume that the relation between…
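A compact C++ sketch of this forward mapping is given below. It is ours, not the thesis' implementation: the flat row-major storage of the vertex positions, the helper names, and the fact that each pixel already comes with its quad index and bilinear coefficients (as described above) are assumptions.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

struct OutputFrame { double umin, umax, vmin, vmax, ratio; int res; };

// Compute the normalization once from the optimized vertex positions u, v.
OutputFrame makeFrame(const std::vector<double>& u, const std::vector<double>& v, int res) {
    auto mu = std::minmax_element(u.begin(), u.end());
    auto mv = std::minmax_element(v.begin(), v.end());
    OutputFrame f{*mu.first, *mu.second, *mv.first, *mv.second, 0.0, res};
    f.ratio = (f.umax - f.umin) / (f.vmax - f.vmin);   // keeps the aspect ratio
    return f;
}

// Position, in the final image, of an equirectangular pixel lying in quad (i,j)
// with bilinear coefficients a, b, c, d for the four surrounding vertices.
std::pair<double,double> mapPixel(const OutputFrame& f,
                                  const std::vector<double>& u, const std::vector<double>& v,
                                  int cols, int i, int j,
                                  double a, double b, double c, double d) {
    auto at = [&](const std::vector<double>& g, int r, int s){ return g[r * cols + s]; };
    double up = a*at(u,i,j) + b*at(u,i+1,j) + c*at(u,i,j+1) + d*at(u,i+1,j+1);
    double vp = a*at(v,i,j) + b*at(v,i+1,j) + c*at(v,i,j+1) + d*at(v,i+1,j+1);
    return { (up - f.umin) / (f.umax - f.umin) * f.ratio * f.res,
             (vp - f.vmin) / (f.vmax - f.vmin) * f.res };
}
```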
16. …we calculate the optimal panoramic image function $u$, $(\lambda,\phi) \mapsto (u(\lambda,\phi), v(\lambda,\phi))$, for the union $\tilde S$ of all the FOVs, and for each $t$ we define
$U: [\alpha_1(t),\alpha_2(t)]\times[\phi_1(t),\phi_2(t)]\times\{t\} \to \mathbb{R}^3$, $\quad (\lambda,\phi,t) \mapsto (u(\lambda,\phi), v(\lambda,\phi), t)$,
i.e., each frame of the panoramic video will be the region of the panoramic image corresponding to the FOV being projected. It is not difficult to see that this construction satisfies the temporal coherence equations: since the projections are the same for different times, the derivatives are also the same. The final video is not an optimal panoramic video, because each frame of the video may not be an optimal panoramic image for the FOV that is being projected; it is optimal only for $\tilde S$. We make available in [22] an example of a panoramic video constructed in the way we just explained. This example may be compared to the temporally incoherent solution. We think the optimization framework would lead to similar results, and this comparison is also left to future work.

4.8 Concluding Remarks
As we mentioned in the beginning of this chapter, what we developed here was just a first step in our research on panoramic videos. But even here we could see how important temporal aspects are in modeling panoramic videos. We left aside in this chapter case 3 of panoramic videos. This case is the most difficult one, since the whole scene is moving, not only the moving objects. Separation between moving objects and scene i…
17. …$\times\{t\}$. The definition above may be extended to subsets $\tilde S\times\{t\} \subset S^2\times\{t\}$.
• Given $(\lambda,\phi,t_0)$, if for all $t \in [0,t_f]$ there is a transition function $\psi_{t_0,t}$ defined at $(\lambda,\phi,t_0)$, the set $\{\psi_{t_0,t}(\lambda,\phi,t_0)\}_{t\in[0,t_f]}$ is the orbit of $(\lambda,\phi,t_0)$.
Other dynamical properties could be derived from the definition of transition functions, but they are not important now.
Figure 4.4: Projection $U$, tangent basis of the temporal viewing sphere and tangent basis of the final panoramic video.
Figure 4.5: Transition function from time $t_1$ to time $t_2$.
• The transition functions consider occlusions: if a point $(\lambda,\phi,t_1)$ of an object is occluded at time $t_1 + \Delta t$, then $\psi_{t_1,t_1+\Delta t}$ is not defined at $(\lambda,\phi,t_1)$.
We consider two kinds of transition functions: the scene transition function, denoted by $\psi^{sc}$, and the object transition functions, denoted by $\psi^{ob}$. $\psi^{sc}_{t_1,t_2}$ is defined on the entire $S^2\times\{t_1\}$ and models the temporal coherence of all points that are being projected. It must depend on the movement of the viewpoint. For example, if the VP is stationary we have
$\psi^{sc}_{t_1,t_2}(\lambda,\phi,t_1) = (\lambda,\phi,t_2)$, $\quad \forall (\lambda,\phi,t_1) \in S^2\times\{t_1\}$,
since all points in $S^2\times\{t_1\}$ are stationary, except for the moving objects.
$\psi^{ob}$ is defined only in object regions and tells the correspon…
18. …$\frac{\partial u}{\partial\lambda} = 1$, $\frac{\partial v}{\partial\phi} = \frac{1}{\cos\phi}$ and $\frac{\partial u}{\partial\phi} = \frac{\partial v}{\partial\lambda} = 0$, so the Cauchy-Riemann equations hold. Thus the projection $(\lambda,\phi) \mapsto (u,v) = (\lambda, \log|\sec\phi + \tan\phi|)$ is cylindrical and conformal. We show in figure 2.8 an example of a wide-angle image generated by the Mercator projection, where we can see that shapes of objects are indeed well preserved.
Figure 2.8: Example of the Mercator projection.
Now we consider the stereographic projection.
Statement 2.2: The stereographic projection is conformal.
Proof: As we saw in section 1.3.2, the coordinates of the stereographic projection are
$u = \frac{2\sin\lambda\cos\phi}{\cos\lambda\cos\phi + 1}$, $\quad v = \frac{2\sin\phi}{\cos\lambda\cos\phi + 1}$.
It is enough to check the two C-R equations in order to prove conformality, but here we check only the first one, since the second is analogous:
$\frac{\partial u}{\partial\lambda} = \frac{2\cos\lambda\cos\phi(\cos\lambda\cos\phi + 1) + 2\sin^2\lambda\cos^2\phi}{(\cos\lambda\cos\phi + 1)^2} = \frac{2\cos\phi(\cos\lambda + \cos\phi)}{(\cos\lambda\cos\phi + 1)^2}$,
$\frac{\partial v}{\partial\phi} = \frac{2\cos\phi(\cos\lambda\cos\phi + 1) + 2\cos\lambda\sin^2\phi}{(\cos\lambda\cos\phi + 1)^2} = \frac{2(\cos\phi + \cos\lambda)}{(\cos\lambda\cos\phi + 1)^2}$.
Thus $\frac{1}{\cos\phi}\frac{\partial u}{\partial\lambda}(\lambda,\phi) = \frac{\partial v}{\partial\phi}(\lambda,\phi)$ for all $(\lambda,\phi) \in (-\pi,\pi)\times(-\pi/2,\pi/2)$.
We show in figure 2.9 an example of a panoramic image produced by the stereographic projection, where the sha…
19. …Figure B.2: Value of the integral image at $(x,y)$.
The integral image can be calculated for the entire image in just one pass. In fact, using the cumulative sum $s(x,y) = s(x,y-1) + i(x,y)$, we have the relation
$ii(x,y) = ii(x-1,y) + s(x,y)$,
and this relation permits the integral image to be evaluated in just one pass. Here we assume $s(x,-1) = 0$ and $ii(-1,y) = 0$. Now it is easy to evaluate a sum inside a rectangle: for example, in figure B.3 the sum within D is easy to obtain using the values of the integral image.
Figure B.3: The sum within D can be computed as 4 + 1 - (2 + 3).
…the obtained detector at different resolutions. More details in [16].
With the rectangular features we can start defining the classifiers. A weak classifier is a function $h(x, f, p, \theta)$, which depends on a feature $f$, a threshold $\theta$ and a polarity $p$:
$h(x, f, p, \theta) = 1$ if $p\,f(x) < p\,\theta$, and $0$ otherwise,
where $x$ is a 24 x 24 sub-window of the image. Now it is necessary to choose which classifiers best characterize face features. This selection is done using a training process based on face and non-face examples (some face examples used for training are shown in figure B.4), as described below. After such classifiers are chosen, a strong classifier is constructed. This process is explained below.
Figure B.4: Examples of frontal upright face images used for training.
AdaBoost
Initialization:
• Given examples…
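The two recurrences above and the four-lookup rectangle sum translate directly into code. The C++ sketch below is ours and assumes a grayscale image stored row-major; function names are illustrative.

```cpp
#include <vector>

// One-pass integral image: s(x,y) = s(x,y-1) + i(x,y), ii(x,y) = ii(x-1,y) + s(x,y),
// with s(x,-1) = 0 and ii(-1,y) = 0 (x = column, y = row).
std::vector<long long> integralImage(const std::vector<unsigned char>& img,
                                     int width, int height) {
    std::vector<long long> ii(static_cast<size_t>(width) * height, 0);
    for (int x = 0; x < width; ++x) {
        long long colSum = 0;                          // s(x, y)
        for (int y = 0; y < height; ++y) {
            colSum += img[y * width + x];
            ii[y * width + x] = colSum + (x > 0 ? ii[y * width + (x - 1)] : 0);
        }
    }
    return ii;
}

// Sum of the pixels in the rectangle [x0,x1] x [y0,y1] (inclusive),
// computed with four lookups as in figure B.3: 4 + 1 - (2 + 3).
long long rectSum(const std::vector<long long>& ii, int width,
                  int x0, int y0, int x1, int y1) {
    auto at = [&](int x, int y) -> long long {
        return (x < 0 || y < 0) ? 0 : ii[y * width + x];
    };
    return at(x1, y1) - at(x0 - 1, y1) - at(x1, y0 - 1) + at(x0 - 1, y0 - 1);
}
```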
20. B.2.1 The Hough Transform
B.2.2 Bilateral Filter
B.2.3 Eigenvalue Processing
B.2.4 The Method
B.2.5 Concluding Remarks
Conclusion
Future Work
Bibliography

Introduction
Motivation and Overview of the Problem
This thesis studies the problem of obtaining perceptually acceptable panoramic images, which are images that represent wide fields of view. One of the motivations for this problem is that common cameras capture just a limited field of view (FOV), usually near 90 degrees of longitude and 90 degrees of latitude, while our eyes see about 150 degrees of longitude and 120 degrees of latitude. When we see a photograph, it is as if we were seeing the world through a limited window. This limitation of common photographs happens because they are produced under a projection that approximates the perspective projection, which stretches objects too much for wide FOVs. Panoramic images can be used to extrapolate our perception, since they can capture FOVs beyond the human eye. Also, a panoramic image allows us to better represent an entire scene. There may be important parts i…
21. Direct view condition: all possible objects in the image should look as if they were viewed directly, as if they appeared in the middle of a photograph.
The authors obtain two functionals, $K$ and $D$, that measure how much a mapping $T_{sphere}$ from the viewing sphere to a plane fails to respect each condition. So, ideally, we should find a map such that
$K(T_{sphere}) = 0$ and $D(T_{sphere}) = 0$.
After a theoretical development, it is shown that there is no $T_{sphere}$ satisfying these conditions and also the structural requirements. They then suggest minimizing the functional
$\mu\,K(T_{sphere}) + (1-\mu)\,D(T_{sphere})$,
where $\mu$ is the desired tradeoff between both desirable conditions. An approximate minimizer for this functional is a perspective projection of the viewing sphere followed by a one-to-one smooth transformation of the image plane, given by a radial map $(r,\psi) \mapsto (\rho(r),\psi)$ that depends on a parameter $\lambda$ (which in turn depends on $\mu$) and on a constant $R$ that depends on the FOV being projected; here $(r,\psi)$ is the polar coordinate system of the perspective image and $(\rho,\psi)$ is the polar coordinate system of the transformed image. Some results for different values of $\lambda$ are shown in figures 1.21 and 1.22.
Figure 1.21: Both 150-degree longitude, 150-degree latitude. Left: $\lambda = 0$, very similar to the stereographic projection. Right: $\lambda = 1$, the perspective projection.
Figure 1.22: Both 150-degree longitude, 150-degree latitude. Left: $\lambda = 0.5$. Right: perspereographic projection for $K$ = …
22. …the remaining nonzero entries are the analogous coefficients $\pm w_{ij}\cos\phi_{ij}$ (divided by the appropriate grid spacings) at the neighboring vertices, for $i = 1, \dots, m-1$ and $j = 0, \dots, n-1$. The other entries are defined as zero. $S$ is called the smoothness matrix. $S$ has at most 4 nonzero entries per row, so it is sparse. Minimizing only the smoothness energy leads to results like the one in figure 2.25.
Figure 2.25: The solution obtained by minimizing only $E_s$.
Joining the smoothness energy to the conformality and straight-line energies produces the expected result for the image shown at the beginning of this section, as illustrated in figure 2.26.

2.7 Spatially Varying Weighting
In this section we present the weights $w_{ij}$ used to define the conformality and smoothness energies. Each $w_{ij}$ is associated to a vertex $(\lambda_{ij},\phi_{ij})$ of the viewing sphere. These weights control shape distortions and the variation of the projection in different regions of the panoramic image.
Figure 2.26: Joining all the energies together leads to a smoother solution. Observe that the undesirable artifacts of figure 2.23 are corrected.
These weights strongly depend on the image content, which was pointed out as a desirable property in section 2.1. We construct the weighting function based on three quantities: proximity to a line endpoint, a local image salience measure, and proximity to a face.
• Line endpoint weights. The straight-line constraints are very discontinuous along the projection: while a vertex may have no such constraint, another one very close to…
23. …about the time of training and detection can be found in [16]. Another important point to emphasize is that the training and detection processes are applicable to the detection of other kinds of objects, not only faces.

B.1.2 Method and Implementation
In this section we show and explain each step of our method for detecting faces in equirectangular images. The method starts by projecting the equirectangular image using the Mercator projection (section 1.3.3) and obtaining its corresponding Mercator image. Next, the face detection process explained in the previous section is applied to the Mercator image, and the coordinates of the detected faces are mapped back to the equirectangular domain. We also show implementation details of some steps. We used the OpenCV library [24] and the C language to implement this method.
Step 1: Obtain the Mercator image. This step is necessary because the faces in the input equirectangular image may be too distorted due to the longitude-latitude parametrization. Moreover, the Mercator projection preserves the vertical orientation of faces that are vertical in the equirectangular domain. Also, the Mercator projection is conformal, i.e., it preserves well the shape of the faces. These two observations make the face detector explained in the previous section applicable to the Mercator image.
Step 2: Process the Mercator image. Obtain its corresponding gray-scale image and equalize…
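A sketch of these first steps, written against the modern OpenCV C++ API rather than the older C API used in the thesis, is given below. The Mercator image is built with cv::remap by inverting the Mercator projection ($\lambda = u$, $\phi = \arctan(\sinh v)$) at each output pixel; the latitude cut-off vMax and the cascade file path are illustrative assumptions, and mapping the detected rectangles back to the equirectangular domain (step 3 of the method) is not shown.

```cpp
#include <opencv2/imgproc.hpp>
#include <opencv2/objdetect.hpp>
#include <cmath>
#include <string>
#include <vector>

std::vector<cv::Rect> detectFacesMercator(const cv::Mat& equirect,
                                          const std::string& cascadePath,
                                          double vMax = 1.5) {
    int mercH = equirect.rows, mercW = equirect.cols;
    cv::Mat mapx(mercH, mercW, CV_32FC1), mapy(mercH, mercW, CV_32FC1);
    for (int r = 0; r < mercH; ++r) {
        double v   = vMax - 2.0 * vMax * r / (mercH - 1);   // Mercator ordinate
        double phi = std::atan(std::sinh(v));               // inverse Mercator
        for (int c = 0; c < mercW; ++c) {
            double lambda = -M_PI + 2.0 * M_PI * c / (mercW - 1);
            mapx.at<float>(r, c) = float((lambda + M_PI) / (2.0 * M_PI) * (equirect.cols - 1));
            mapy.at<float>(r, c) = float((M_PI / 2.0 - phi) / M_PI * (equirect.rows - 1));
        }
    }
    cv::Mat mercator, gray;
    cv::remap(equirect, mercator, mapx, mapy, cv::INTER_LINEAR);  // step 1
    cv::cvtColor(mercator, gray, cv::COLOR_BGR2GRAY);             // step 2
    cv::equalizeHist(gray, gray);

    cv::CascadeClassifier cascade(cascadePath);                   // e.g. a frontal-face cascade
    std::vector<cv::Rect> faces;
    cascade.detectMultiScale(gray, faces, 1.1, 3, 0, cv::Size(24, 24));
    return faces;                                                 // rectangles in Mercator coords
}
```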
24. …background is rendered using the multi-plane projection, and the objects are each rendered using a local perspective projection centered on them. The author of this thesis implemented most of the techniques contained in this article in his Image Processing course final project; the reader can see the details in [13].

1.5.4 Other Approaches
In this section we expose very briefly three other approaches, [14], [4] and [8], that also deal with the problem of constructing panoramic images.
1. Photographing Long Scenes with Multi-Viewpoint Panoramas [14]. This work addresses the problem of making a single image of a long, planar scene (the buildings of one side of a street, for example), having as input a set of photographs taken from different viewpoints. Although this problem is different from the one we are concerned with in this thesis (here the input comes from a single viewpoint), it faces similar difficulties, such as preserving shapes of objects and making the final result a comprehensive representation of the scene.
2. Capturing and Viewing Gigapixel Images [4]. This article presents a viewer that interpolates between the cylindrical and perspective projections as the FOV is increased or decreased. The ability to zoom in and out of panoramas is more useful when the input has high resolution. We think that the perspereographic projection presented in section 1.4.1 is a simpler solution for this task and could also produce good result…
25. …chapter 3. We discuss this example further in the next section, where we illustrate how to obtain the face weights $w^{face}_{ij}$ introduced in section 2.7.
Figure B.9: 6 faces correctly detected and 1 false detection.

B.1.4 Weight Field
We show below the output file face_matrix.txt for the result shown in figure B.9:
1.8316  0.1564  0.0237
0.8419  0.0408  0.0259
0.8340  0.1502  0.0244
2.3340  0.0816  0.0259
0.0346  0.1035  0.0326
1.7247  0.1502  0.0467
1.1058  0.3801  0.0541
Each line corresponds to a detected face, say $f$. The first two numbers are the coordinates $(\lambda_f, \phi_f)$ of the center of the face in the equirectangular image, and the last number, say $\sigma_f$, corresponds to one third of the radius of the face in the Mercator projection. For each $(\lambda_{ij},\phi_{ij})$ in the discretization of $[\alpha_1,\alpha_2]\times[\phi_1,\phi_2]$ we define
$w^f_{ij} = \exp\!\big( -\|M(\lambda_{ij},\phi_{ij}) - M(\lambda_f,\phi_f)\|^2 / (2\sigma_f^2) \big)$,
where $M$ is the Mercator projection. To define the face weights $w^{face}_{ij}$ we just sum the $w^f_{ij}$ over all faces $f$.

B.2 Semiautomatic Line Detection in Equirectangular Images
As we saw in Appendix A, the task of marking lines in equirectangular images may be quite long, depending on the scene. Also, straight lines in the world are curved in the image, and identifying which arcs correspond to lines in the world may be difficult in some cases. For these reasons, we propose in this section a method to semiautomatically detect lines in equirectangular images. Thi…
26. …collinear. Since $n$ is orthogonal to both $s_1(\beta) - P$ and $s_2(\beta) - s_1(\beta)$, this is equivalent to the volume defined by $n$, $s_1(\beta) - P$ and $s_2(\beta) - s_1(\beta)$ being zero, i.e.,
$\langle n,\ (s_1(\beta) - P) \times (s_2(\beta) - s_1(\beta)) \rangle = 0$,
where $\langle\cdot,\cdot\rangle$ denotes the usual inner product and $\times$ denotes the cross product in $\mathbb{R}^3$. Developing the above expression (recall that $s_1(\beta) = (1-\beta)A' + \beta C'$ and $s_2(\beta) = (1-\beta)B' + \beta D'$) and collecting powers of $\beta$, we obtain a quadratic equation
$a_1\beta^2 + a_2\beta + a_3 = 0$,
where $a_1 = \langle n, (C'-A')\times(D'-B')\rangle$ and $a_2$, $a_3$ are analogous triple products involving $P$, $A'$, $B'$, $C'$ and $D'$. The equation has solutions
$\beta = \frac{-a_2 \pm \sqrt{a_2^2 - 4a_1 a_3}}{2a_1}$.
We choose the root by analyzing the sign of $a_1 = \langle n, (C'-A')\times(D'-B')\rangle$. First we observe that $a_1 = 0$ is not possible: since $C'-A'$ and $D'-B'$ lie in the tangent plane, their cross product is parallel to $n$; as $n \neq 0$, $a_1 = 0$ would imply $(C'-A')\times(D'-B') = 0$, which means $C'-A'$ and $D'-B'$ are collinear, and this does not happen by construction.
Now suppose $a_1 < 0$. This means that $(C'-A')$ and…
27. …derivatives using second-order finite differences:
$\frac{\partial^2 u}{\partial\lambda^2}(\lambda_{ij},\phi_{ij}) \approx \frac{u_{i,j-1} - 2u_{i,j} + u_{i,j+1}}{(\Delta\lambda)^2}$, $\quad \frac{\partial^2 v}{\partial\lambda^2}(\lambda_{ij},\phi_{ij}) \approx \frac{v_{i,j-1} - 2v_{i,j} + v_{i,j+1}}{(\Delta\lambda)^2}$,
and analogously in the $\phi$ direction. To obtain the smoothness energy we also multiply the equations by the spatially varying weights (section 2.7), which control the strength of this energy in different areas of the panorama, and again by $\cos\phi$, to make the energy depend on the area of the quad:
$E_s = \sum_{i,j} w_{ij}\cos\phi_{ij}\Big[\Big(\frac{u_{i,j-1} - 2u_{i,j} + u_{i,j+1}}{(\Delta\lambda)^2}\Big)^2 + \Big(\frac{v_{i,j-1} - 2v_{i,j} + v_{i,j+1}}{(\Delta\lambda)^2}\Big)^2\Big] + (\text{the analogous terms in } \phi)$.
As with the other energies, we turn $E_s$ into a matrix form, $E_s = \|Sx\|^2$. Each term of the double summations of the energy corresponds to a row of $S$, so there are four rows per vertex where the differences are defined. The nonzero entries of $S$ in each such row are the coefficients $w_{ij}\cos\phi_{ij}$, $-2w_{ij}\cos\phi_{ij}$ and $w_{ij}\cos\phi_{ij}$, divided by $(\Delta\lambda)^2$ or $(\Delta\phi)^2$, placed at the columns of the three vertices involved in the finite difference;…
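To make the matrix construction concrete, here is a C++ sketch of how rows of a sparse smoothness matrix can be assembled from triplets. It is ours, not the thesis' code: it uses the Eigen library (the thesis used Matlab's sparse routines), a simple weighted second-difference stencil along $\lambda$ as a stand-in for the exact rows, and an assumed unknown ordering $x = (u_{00}, \dots, v_{00}, \dots)$ stored row-major.

```cpp
#include <Eigen/Sparse>
#include <cmath>
#include <vector>

Eigen::SparseMatrix<double> smoothnessMatrix(const std::vector<double>& w,
                                             const std::vector<double>& phi,
                                             int rows, int cols, double dLambda) {
    std::vector<Eigen::Triplet<double>> trip;
    int nv = rows * cols;                          // number of grid vertices
    int r = 0;                                     // current matrix row
    for (int i = 0; i < rows; ++i)
        for (int j = 1; j + 1 < cols; ++j, ++r) {
            int k = i * cols + j;
            double c = w[k] * std::cos(phi[k]) / (dLambda * dLambda);
            trip.emplace_back(r, k - 1,  c);       // u_{i,j-1}
            trip.emplace_back(r, k,  -2 * c);      // u_{i,j}
            trip.emplace_back(r, k + 1,  c);       // u_{i,j+1}
        }
    // Analogous rows for v and for the phi direction would be appended here.
    Eigen::SparseMatrix<double> S(r, 2 * nv);
    S.setFromTriplets(trip.begin(), trip.end());
    return S;
}
```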
28. …described in figure A.8 are performed one at a time. This script is run by the command run.sh, as explained in the previous section. In this section we explain implementation details of windows 1 and 2 and of the Matlab processing, which is the most important module, since it implements most of the theory explained in chapters 2 and 3. The face detection module is detailed in the next section; we placed it in a separate section to emphasize the face detection process in equirectangular images, which is a method in itself.

A.2.1 Window 1
Window 1 was implemented in the C language using FLTK (Fast Light Toolkit), an API (application programming interface). Window 1 was shown in figures A.2 and A.3. It consists basically of a box that shows the marked equirectangular image and a close button.
We developed a class Box that inherits the properties of the class Fl_Box. The difference of this new class is that it handles mouse-click events and the typing of the keys "h", "v" or "g". When two points are clicked and one of the keys "h", "v" or "g" is pressed, the class calls a function named draw_arc that draws the corresponding arc connecting these two points. Finally, the image in the Box is reloaded. The curve drawn is only useful for interaction purposes. The draw_arc function also saves in a text file the coordinates of the endpoints, an index for the line and the specified orientation.
After the process of marking lines is fini…
29. …$d\varphi_p(v_1)\cdot d\varphi_p(v_2) = \Theta(p)\, v_1\cdot v_2$, where $\Theta$ is a differentiable function on $S$ that never vanishes.
The above definition says that $d\varphi$ preserves inner products except for the $\Theta$ factor. The following statement proves that conformal mappings preserve angles.
Statement 2.1: Conformal mappings preserve angles.
Proof: Let $\varphi: S \to \bar S$ be a conformal mapping. Let $\alpha: I \to S$ and $\beta: I \to S$ be two curves in $S$ that intersect at, say, $t = 0$. The angle $\theta$ between them at $t = 0$ is given by
$\cos\theta = \frac{\alpha'(0)\cdot\beta'(0)}{\|\alpha'(0)\|\,\|\beta'(0)\|}$, $\quad 0 \le \theta \le \pi$.
$\varphi$ transforms such curves into curves $\varphi\circ\alpha: I \to \bar S$ and $\varphi\circ\beta: I \to \bar S$ that intersect at $t = 0$, forming an angle $\bar\theta$ given by
$\cos\bar\theta = \frac{d\varphi(\alpha'(0))\cdot d\varphi(\beta'(0))}{\|d\varphi(\alpha'(0))\|\,\|d\varphi(\beta'(0))\|} = \frac{\Theta\,\alpha'(0)\cdot\beta'(0)}{\Theta\,\|\alpha'(0)\|\,\|\beta'(0)\|} = \cos\theta$.
The definition of conformal mappings turns out to be appropriate for modeling preservation of shapes according to definition 2.1: locally, the objects can only be rotated and/or scaled in an equal manner along all directions. As we saw in statement 2.1, this also implies the preservation of angles. We bring the formal discussion into the panoramic image context by taking, in the definition, $S = r((-\pi,\pi)\times(-\pi/2,\pi/2))$, $\bar S = \mathbb{R}^2$ and $\varphi = u$. Let $p \in (-\pi,\pi)\times(-\pi/2,\pi/2)$. The basis of $T_{r(p)}S$ associated to $r$, the longitude-latitude parametrization, is $\{\partial r/\partial\lambda, \partial r/\partial\phi\}$, where
$\frac{\partial r}{\partial\lambda} = (-\sin\lambda\cos\phi,\ \cos\lambda\cos\phi,\ 0)$ and $\frac{\partial r}{\partial\phi} = (-\cos\lambda\sin\phi,\ -\sin\lambda\sin\phi,\ \cos\phi)$.
Assuming $u$ to be a diffeomorphism, $du_p: T_{r(p)}S \to T_{u(p)}\mathbb{R}^2 \cong \mathbb{R}^2$ has the fol…
30. …emanating from the pole opposite to the point of tangency $(1,0,0)$. So it is essentially the perspective projection, but with lines coming from $(-1,0,0)$ instead of from $(0,0,0)$, as shown in figure 1.10.
Figure 1.10: Stereographic projection.
If $(1, \bar y, \bar z)$ is the projection of $(x,y,z) \in S^2$, by similarity of triangles we obtain the following relations:
$\bar y = \frac{2y}{x+1}$, $\quad \bar z = \frac{2z}{x+1}$.
Thus $(x,y,z) \in S^2$ is mapped to $\big(\frac{2y}{x+1}, \frac{2z}{x+1}\big)$. Observe that the mapping is not defined at $(-1,0,0)$, the pole opposite to the tangent plane. In longitude-latitude coordinates we have
$u = \frac{2\sin\lambda\cos\phi}{\cos\lambda\cos\phi + 1}$, $\quad v = \frac{2\sin\phi}{\cos\lambda\cos\phi + 1}$.
So the final formula for the stereographic projection is
$(-\pi,\pi)\times(-\pi/2,\pi/2) \to \mathbb{R}^2$, $\quad (\lambda,\phi) \mapsto (u,v) = \Big(\frac{2\sin\lambda\cos\phi}{\cos\lambda\cos\phi + 1},\ \frac{2\sin\phi}{\cos\lambda\cos\phi + 1}\Big)$.
We show some results of this projection for different scenes in figure 1.11.
Figure 1.11: Left: 180-degree longitude, 180-degree latitude. Right: 180 x 180.
The main advantages and disadvantages of the stereographic projection are:
Advantages:
• Lines that pass through the center of the image are preserved.
• It is conformal, i.e., it preserves the shape of objects locally. Although the objects near the periphery of wide fields of view are stretched, this stretching is the same in all directions, which maintains the conformality of the mapping.
Disadvantages:
• Most of t…
31. …equirectangular images. The Computer Vision and Image Processing techniques we used are explained.
We finish the thesis by discussing what we consider to be the future of this theme: panoramic videos. Chapter 4 is an initial step in this direction. We separate the problem into 3 cases, discuss and model undesirable distortions in panoramic videos, and state the problem as finding a projection from the temporal viewing sphere to the three-dimensional Euclidean space. An optimization solution is proposed for case 1, and other possible solutions are discussed. Some initial results are provided.

Chapter 1
Panoramic Images
A panoramic image (or wide-angle image, or panorama) is an image constructed from a wide field of view. The field of view (FOV) is the angular extent of the observable world that is seen at any given moment. In this chapter we model the field of view as a subset of a unit sphere centered at the viewpoint. From this we derive standard projections from this sphere to an image plane and also show some modifications of them.
Motivated by the distortions that these mappings cause, we present some previous approaches that proposed methods to alleviate such problems. This chapter can be seen as a detailed introduction to the panoramic image problem, and its main goals are:
• Present and explain the necessary formalism (section 1.1).
• Clearly state the panoramic image problem (section 1.2).
• Discuss known projections and previous appr…
32. …in the matrix form $E^{ob}_{TC} = \|B^{ob}x\|^2$, but we do not detail this part here. We join this new energy to the previous one and obtain
$E = \|Ax\|^2 + w_{ob}\|B^{ob}x\|^2$,
where $w_{ob}$ is a weight that controls the strength of this new energy. Using $w_{ob} = 4$ in our example corrects the problem of too much orientation change in the object. We show the new result in figures 4.12 and 4.13.
Figure 4.12: 8th frame of the video produced using the image energies and the object energy.
Now the scene starts to change in order to preserve the shape and orientation of the object. Thus it becomes necessary to impose temporal coherence for the scene, which was stated as desirable property 3 in section 4.3. We rewrite the temporal coherence equations for this case:
$\frac{\partial V}{\partial\lambda}(\lambda,\phi,t_2) = \frac{\partial V}{\partial\lambda}(\lambda,\phi,t_1)$, $\quad \frac{\partial V}{\partial\phi}(\lambda,\phi,t_2) = \frac{\partial V}{\partial\phi}(\lambda,\phi,t_1)$,
for all $(\lambda,\phi,t_1) \in [\alpha_1,\alpha_2]\times[\phi_1,\phi_2]\times\{t_1\}$ and all $t_1, t_2 \in [0, t_f]$.
Using the discretization of the temporal viewing sphere, and only transitions between $t$ and $t+1$, we obtain the following
Figure 4.13: 16th frame of the video produced using the image energies and the object energy. Compared to figure 4.11, the man is less curved, as expected. Now another problem arises: the projection of the entire scene changes too much in order to satisfy the object constraints. One can observe, for example, that the borders of the projection change from the 8th frame to the 16th frame.
discretized equations:
$\frac{V_{i,j+1,k+1} - V_{i,j,k+1}}{\Delta\lambda} = \frac{V_{i,j+1,k} - V_{i,j,k}}{\Delta\lambda}$, …
33. …is too large, the white points lie close to the direction of $v_1$, the eigenvector associated to $\lambda_1$. If $\lambda_1$ and $\lambda_2$ are too close, then there is no predominant direction and the points are too spread in the window. These ideas give us a way of discarding points:
• Given a binary image, a window size and a threshold value $\tau$;
• For each window, if $e < \tau$, discard all the white points of this window.
For the example in figure B.16 we obtain the result in figure B.17. To implement this step we programmed a function named eig_preprocess:
IplImage* eig_preprocess(IplImage* grayscale, float tau, int window_size);

B.2.4 The Method
In this section we explain how we integrate the techniques exposed up to here to detect straight lines in equirectangular images. We explain the steps of the method and illustrate them with an example. We used the OpenCV library and the C language to implement this method.
• Input: An equirectangular image that represents a scene (figure B.18).
• Step 1: Obtain six perspective projections from the equirectangular image, centered on six different points (section 1.4.1). The choice of perspective projections is quite obvious, since we want to detect straight…
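The window-wise eigenvalue test just described can be sketched as follows. This is our modernized version (cv::Mat instead of IplImage), the measure $e$ is assumed to be the eigenvalue ratio $\lambda_1/\lambda_2$ of the covariance matrix of the white-pixel coordinates, and the input is assumed to be an 8-bit binary edge image.

```cpp
#include <opencv2/imgproc.hpp>
#include <algorithm>
#include <cmath>

cv::Mat eigPreprocess(const cv::Mat& binary, float tau, int windowSize) {
    cv::Mat out = binary.clone();
    for (int y0 = 0; y0 < binary.rows; y0 += windowSize)
        for (int x0 = 0; x0 < binary.cols; x0 += windowSize) {
            double sx = 0, sy = 0, sxx = 0, syy = 0, sxy = 0; int n = 0;
            int w = std::min(windowSize, binary.cols - x0);
            int h = std::min(windowSize, binary.rows - y0);
            for (int y = y0; y < y0 + h; ++y)
                for (int x = x0; x < x0 + w; ++x)
                    if (binary.at<unsigned char>(y, x)) {
                        sx += x; sy += y; sxx += x*x; syy += y*y; sxy += x*y; ++n;
                    }
            if (n < 2) continue;
            double mx = sx / n, my = sy / n;
            double cxx = sxx / n - mx*mx, cyy = syy / n - my*my, cxy = sxy / n - mx*my;
            // Eigenvalues of the symmetric 2x2 covariance matrix.
            double tr = cxx + cyy, det = cxx*cyy - cxy*cxy;
            double disc = std::sqrt(std::max(0.0, tr*tr / 4.0 - det));
            double l1 = tr / 2.0 + disc, l2 = tr / 2.0 - disc;
            if (l2 > 1e-12 && l1 / l2 < tau)        // no dominant direction: discard
                out(cv::Rect(x0, y0, w, h)) = 0;
        }
    return out;
}
```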
34. …it may be the beginning of a line, where a strong constraint is imposed. This behavior leads to more distorted quads near line endpoints. We therefore define weights $w^L_{ij}$ that are stronger near line endpoints, so that the conformality and smoothness energies correct this problem.
Let $p = (\lambda_p, \phi_p)$ be an endpoint and suppose $p$ lies in the quad with lower-left vertex $(i_p, j_p)$ (figure 2.27). For each $i = 0, \dots, m$ and $j = 0, \dots, n$ we calculate the value of a Gaussian function centered on $(i_p, j_p)$, with deviation $\sigma = 5$ and height equal to 1:
$w^p_{ij} = \exp\!\big( -\tfrac{(i - i_p)^2 + (j - j_p)^2}{2\sigma^2} \big)$.
To define the final weight $w^L_{ij}$ for the vertex $(\lambda_{ij},\phi_{ij})$, we just sum over all endpoints:
$w^L_{ij} = \sum_{p\ \text{endpoint}} w^p_{ij}$.
Figure 2.27: $p$ and its corresponding quad.
In figure 2.28 we compare a result without and with such weights.
Figure 2.28: Left: without line endpoint weights. Right: with them. Two close line segments in the left image (highlighted in red and green) have too different orientations, a problem corrected in the right image. Also, the face highlighted in yellow in the left image is near a line segment and becomes too stretched, a fact less noticeable in the right image.
• Salience weights. These weights are constructed based on the observation that in areas of the image with many details the projection should be smoother, while in other areas, like skies or walls, the p…
35. …lines in the real world, we have to look for straight lines in an image where such lines are preserved. As we already know, the perspective projection satisfies this requirement. The perspective images for our example are shown in figures B.19, B.20 and B.21.
Figure B.21: Two of the six perspective projections, centered on different points.
• Step 2: Filter each perspective image with the bilateral filter, applied six times. This step depends on two parameters of the bilateral filter and produces results as in figure B.22 (showing just two of the six results).
Figure B.22: Perspective images after applying the bilateral filtering.
• Step 3: Obtain the edges of the filtered images using the Canny edge detector. This step depends on a parameter that sets the thresholds for the filter; the greater this parameter, the smaller the number of detected edges. The results for this step are shown in figure B.23.
Figure B.23: Edge images obtained with the Canny filter.
• Step 4: Process these edge images with the eigenvalue processing. This step depends on two parameters to discard points. The results for this step are shown in figure B.24.
Figure B.24: Results after applying the eigenvalue processing.
• Step 5: Detect lines from the binary images obtained in step 4 using OpenCV's probabilistic Hough transform. Figure B.25 shows the results for this step.
Figure B.25: Detected lines are indicated in green.
• Step 6: …
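Steps 2, 3 and 5 map directly onto OpenCV calls. The sketch below is ours, written against the modern OpenCV C++ API; all parameter values are illustrative guesses rather than the thesis' settings, and the eigPreprocess call (step 4) refers to the hypothetical helper sketched earlier.

```cpp
#include <opencv2/imgproc.hpp>
#include <vector>

std::vector<cv::Vec4i> detectSegments(const cv::Mat& perspectiveBGR) {
    cv::Mat gray, filtered, tmp, edges;
    cv::cvtColor(perspectiveBGR, gray, cv::COLOR_BGR2GRAY);

    // Step 2: bilateral filter applied six times (edge-preserving smoothing).
    filtered = gray.clone();
    for (int k = 0; k < 6; ++k) {
        cv::bilateralFilter(filtered, tmp, 9, 30.0, 7.0);
        filtered = tmp.clone();
    }

    // Step 3: Canny edge detector; both thresholds derived from one parameter.
    double t = 60.0;
    cv::Canny(filtered, edges, t, 2.0 * t);

    // Step 4 would go here: edges = eigPreprocess(edges, tau, windowSize);

    // Step 5: probabilistic Hough transform returns segments as (x1, y1, x2, y2).
    std::vector<cv::Vec4i> segments;
    cv::HoughLinesP(edges, segments, 1.0, CV_PI / 180.0, 50, 40.0, 5.0);
    return segments;
}
```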
36. …of my master's studies not only constructive but also very pleasant. I would like to thank all the people who work at IMPA and contribute to keeping it a place of excellence for studying Mathematics. I am very grateful to the Brazilian Government for giving me the conditions to focus on my studies, and to CNPq for the financial support.

Resumo
Common cameras usually capture a rather limited field of view, around ninety degrees. The reason for this is that, when the field of view becomes larger, the projection these cameras use starts to introduce unnatural, non-trivial distortions. This dissertation studies these distortions with the goal of obtaining panoramic images, that is, images of wide fields of view. After modeling the field of view as a unit sphere, the problem becomes that of finding a projection from a subset of the unit sphere to an image plane with desirable properties. We make an in-depth discussion of Carroll et al. [1], in which preservation of straight lines and of object shapes are set as the main desirable properties and an optimization solution is proposed. Next, we show panoramic images obtained by this method and conclude that it works well on a variety of scenes. This dissertation also makes a novel study of panoramic videos, that is, videos in which each frame is constructed from a wide field of view…
37. …one uses such a detection of lines instead of marking them on the equirectangular image.

B.1 Automatic Face Detection in Equirectangular Images
In this section we explain how we find the weight field $w^{face}_{ij}$, introduced in section 2.7, that depends on the localization of faces in the equirectangular image. In a more general context, we show a method for face detection in equirectangular images. Beyond our motivation of using such detection to correct shape distortion in face regions of panoramic images, there are other motivations for the face detection problem; for more information we recommend the presentation available in [23].
We start by explaining the main details of the standard face detector by Viola and…
38. …positive definite matrix is invertible: in fact, if it were not, there would be a $w \neq 0$ such that $(A^TA + \varepsilon I)w = 0$, and then $w^T(A^TA + \varepsilon I)w = 0$, which contradicts the fact that $A^TA + \varepsilon I$ is positive definite. Thus the unique critical point of $E_\varepsilon$ is
$\tilde x = (A^TA + \varepsilon I)^{-1}\varepsilon y$.
It remains to prove that $\tilde x$ is a minimizer. In fact,
$\nabla E_\varepsilon(x) = 2A^TAx + 2\varepsilon x - 2\varepsilon y \ \Rightarrow\ H(E_\varepsilon)(x) = 2(A^TA + \varepsilon I)$ for all $x$.
Since $A^TA + \varepsilon I$ is positive definite, we conclude that the Hessian $H(E_\varepsilon)(\tilde x)$ is positive definite. This proves that $\tilde x$ is a minimizer of $E_\varepsilon$.
The main advantage of minimizing $E_\varepsilon$ instead of $E$ is that we replace an eigenvalue problem by the solution of a linear system. Furthermore, $A^TA + \varepsilon I$ is sparse (since $A$ is sparse), symmetric and positive definite. For example, the Matlab software that we used to perform the optimizations has specific routines to solve sparse linear systems and produces results much faster.
One point that remains open is which value to choose for $\varepsilon$. On one hand, $\varepsilon \to 0$ implies $E_\varepsilon(x) \to E(x)$, and the minimizer of $E_\varepsilon$ would be very close to the minimizer of $E$. On the other hand, an $\varepsilon$ that is too small leads to instability problems when solving $(A^TA + \varepsilon I)x = \varepsilon y$.
Since we are dealing with a perceptual problem, we can say that a good choice for $\varepsilon$ is a value that produces visually identical results when comparing the minimizers of $E_\varepsilon$ and $E$, and does not cause instability problems. We show with an example in figure 2.31 that $\varepsilon = 10^{-7}$ is a good choice. We used a c…
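Since $A^TA + \varepsilon I$ is sparse, symmetric and positive definite, a sparse Cholesky factorization is a natural way to solve the system in C++. The sketch below uses the Eigen library, which is our choice of "linear algebra routine in C or C++" and is not mentioned in the thesis.

```cpp
#include <Eigen/Sparse>
#include <Eigen/SparseCholesky>

// Minimizer of the perturbed energy: x = (A^T A + eps I)^{-1} (eps y).
Eigen::VectorXd minimizePerturbedEnergy(const Eigen::SparseMatrix<double>& A,
                                        const Eigen::VectorXd& y, double eps) {
    Eigen::SparseMatrix<double> At = A.transpose();
    Eigen::SparseMatrix<double> M  = At * A;
    Eigen::SparseMatrix<double> I(A.cols(), A.cols());
    I.setIdentity();
    M = M + eps * I;

    Eigen::SimplicialLLT<Eigen::SparseMatrix<double>> chol(M);  // sparse Cholesky
    return chol.solve(eps * y);
}
```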
39. …projection is
$P: (-\pi/2,\pi/2)\times(-\pi/2,\pi/2) \to \mathbb{R}^2$, $\quad (\lambda,\phi) \mapsto (u,v) = \big(\tan\lambda,\ \tfrac{\tan\phi}{\cos\lambda}\big)$.
Figures 1.8 and 1.9 show some results of this projection with different fields of view.
Figure 1.8: Left: 90-degree longitude, 90-degree latitude. Right: 120 x 120.
Figure 1.9: Left: 90-degree longitude, 90-degree latitude. Right: 130 x 120.
The main advantages and disadvantages of the perspective projection are:
Advantages:
• Straight lines in the scene appear straight in the final result.
• When the camera is held parallel to the ground, the orientation constancy of the vertical lines is maintained, i.e., they appear vertical in the resulting image.
Disadvantages:
• As the field of view increases, the shape of the objects near the periphery of the image starts to change considerably. This fact is noticeable even for FOVs that are not too wide, such as 120 degrees (see the right images in figures 1.8 and 1.9). The cause of this effect is that the perspective projection is not conformal, a concept that we will formalize further on. Informally, a mapping is conformal if it locally preserves shapes of objects. The non-conformality of the perspective projection is the reason why simple photographs have a small field of view, usually less than 90 degrees.

1.3.2 Stereographic Projection
The geometric construction of the stereographic projection is the following: the viewing sphere is projected onto the $x = 1$ plane, just as in the perspective projection, through lines…
40. …suffered from this problem to different degrees. As we will see in chapter 3, this new approach succeeds in this task, because all the important distortions are considered and well modeled. Also, those methods depend on parameters, and it is not desirable to have to find a set of parameters that works well for each different scene; as we will see in chapter 3, this approach works well with a fixed set of parameters.
• Produce results fast. This is the only property listed here that is not satisfied by this approach. It is the price we have to pay in order to obtain a really precise result. In chapter 3 we show that each result took about one or two minutes to be computed. Although one can see this as a problem, we think it is an incentive to study the numerical details and the implementation of the method, and to develop tools and theory in order to reduce the computation time.

2.2 User Interface
The interface shown here allows the user to identify linear structures in the equirectangular image and mark them. The method will then focus on making only these specified linear structures straight in the final result, which is a more intelligent solution than trying to make all possible lines straight, as the perspective projection does.
A central question here is: given two points $p_1, p_2 \in \mathbb{R}^3$, what is the projection of the line segment connecting them on the viewing sphere? See the illustration in figure 2.2.
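The answer is that the segment projects onto an arc of the great circle through the directions of the two points, since the segment and the viewpoint span a plane whose intersection with the sphere is a great circle. A minimal C++ sketch, ours, samples that arc by spherical linear interpolation and returns longitude/latitude samples (degenerate or antipodal endpoints are not handled):

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <vector>

using Vec3 = std::array<double,3>;

// Longitude/latitude of a unit vector (x = cos(l)cos(p), y = sin(l)cos(p), z = sin(p)).
std::array<double,2> toLonLat(const Vec3& q) {
    return { std::atan2(q[1], q[0]), std::asin(q[2]) };
}

std::vector<std::array<double,2>> sampleArc(Vec3 p1, Vec3 p2, int samples) {
    auto normalize = [](Vec3 v) {
        double n = std::sqrt(v[0]*v[0] + v[1]*v[1] + v[2]*v[2]);
        return Vec3{ v[0]/n, v[1]/n, v[2]/n };
    };
    p1 = normalize(p1); p2 = normalize(p2);
    double c = std::max(-1.0, std::min(1.0, p1[0]*p2[0] + p1[1]*p2[1] + p1[2]*p2[2]));
    double omega = std::acos(c);                       // angle between the directions
    std::vector<std::array<double,2>> arc;
    for (int k = 0; k <= samples; ++k) {
        double t = double(k) / samples;
        double a = std::sin((1 - t) * omega) / std::sin(omega);   // slerp weights
        double b = std::sin(t * omega) / std::sin(omega);
        Vec3 q{ a*p1[0] + b*p2[0], a*p1[1] + b*p2[1], a*p1[2] + b*p2[2] };
        arc.push_back(toLonLat(normalize(q)));
    }
    return arc;
}
```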
41. …$= H(\lambda,\phi,t_1)$ and $K(\psi_{t_1,t_2}(\lambda,\phi,t_1)) = K(\lambda,\phi,t_1)$. The construction we just did holds for any transition function. Thus we obtain the following temporal coherence equations for object movement:
$H(\psi^{ob}_{t_1,t_2}(\lambda,\phi,t_1)) = H(\lambda,\phi,t_1)$, $\quad K(\psi^{ob}_{t_1,t_2}(\lambda,\phi,t_1)) = K(\lambda,\phi,t_1)$.
Assuming conformality for each time, $H(\lambda,\phi,t) = R_{90^\circ}K(\lambda,\phi,t)$, so we use only the equation involving the differential north vector (the first one), which can be rewritten as
$\frac{\partial V}{\partial\lambda}(\psi^{ob}_{t_1,t_2}(\lambda,\phi,t_1), t_2) = \frac{\partial V}{\partial\lambda}(\lambda,\phi,t_1)$, $\quad \frac{\partial V}{\partial\phi}(\psi^{ob}_{t_1,t_2}(\lambda,\phi,t_1), t_2) = \frac{\partial V}{\partial\phi}(\lambda,\phi,t_1)$,
for all $(\lambda,\phi,t_1) \in S^{ob}\times\{t_1\}$ and all $t_1, t_2 \in [0,t_f]$, where $S^{ob}\times\{t_1\}$ is the region that the object occupies at time $t_1$ in the equirectangular domain. These are the temporal coherence equations for moving objects in the scene. An analogous development leads to temporal coherence equations for the entire scene:
$\frac{\partial V}{\partial\lambda}(\psi^{sc}_{t_1,t_2}(\lambda,\phi,t_1), t_2) = \frac{\partial V}{\partial\lambda}(\lambda,\phi,t_1)$, $\quad \frac{\partial V}{\partial\phi}(\psi^{sc}_{t_1,t_2}(\lambda,\phi,t_1), t_2) = \frac{\partial V}{\partial\phi}(\lambda,\phi,t_1)$,
for all $(\lambda,\phi,t_1) \in S^{sc}\times\{t_1\}$ and all $t_1, t_2 \in [0,t_f]$, where $S^{sc}\times\{t_1\}$ is the set of all points of the scene at time $t_1$ that will be projected. In the particular case we are considering (case 1), $\psi^{sc}_{t_1,t_2}(\lambda,\phi,t_1) = (\lambda,\phi,t_2)$, since the viewpoint is stationary. Thus in this case the equations are the following:
$\frac{\partial V}{\partial\lambda}(\lambda,\phi,t_2) = \frac{\partial V}{\partial\lambda}(\lambda,\phi,t_1)$, $\quad \frac{\partial V}{\partial\phi}(\lambda,\phi,t_2) = \frac{\partial V}{\partial\phi}(\lambda,\phi,t_1)$,
for all $(\lambda,\phi,t_1) \in [\alpha_1,\alpha_2]\times[\phi_1,\phi_2]\times\{t_1\}$ and all $t_1, t_2 \in [0,t_f]$.

4.6.2 Discretization of the t…
42. …that represent these properties, propose an optimization solution for a particular case, and point to future directions.
Keywords: viewing sphere, panoramic images, panoramic videos.

Contents
Introduction
  Motivation and Overview of the Problem
  Original Contributions
  Structure of the Thesis (a Time Line)
1 Panoramic Images
  1.1 The Viewing Sphere
  1.2 Problem Statement
  1.3 Standard Projections
    1.3.1 Perspective Projection
    1.3.2 Stereographic Projection
    1.3.3 Mercator Projection
  1.4 Modified Standard Projections
    1.4.1 Perspereographic Projections
    1.4.2 Perspective Projection Centered on Other Points
    1.4.3 Recti-Perspective Projection
  1.5 Previous Approaches
    1.5.1 Correction of Geometric Perceptual Distortions in Pictures
    1.5.2 Artistic Multiprojection Rendering
    1.5.3 Squaring the Circle in Panoramas
    1.5.4 Other Approaches
2 Optimizing Content-Preserving Projections for Wide-Angle Images
  2.1 Desirable Properties…
43. …the specified field of view $\tilde S \subset S^2$, that is, a bilinear interpolation of the values $u_{ij}$ and $v_{ij}$. This process is detailed in section A.2.3. Now that we have a continuous function defined on the equirectangular domain, we just map the texture of the equirectangular image according to that function. The result for this initial iteration is shown in figure 2.33. This partial result, and the other ones shown in this section, were produced with 62,000 vertices and a field of view of 180 x 180.
Figure 2.33: The initial result is not very good, because the initialization of the $s_i$'s is imprecise. For this result the value of the energy is $E = 0.0832$.
From the vector $x$ returned by the initial iteration, we compute the normal vectors $n_i$ for each line, as described in section 2.5, and minimize
$E = w_c E_c + w_s E_s + w_l \sum_{l\in L} E_l = \|Ax\|^2$.
For this iteration and all the following iterations we always use $w_c = 0.4$, $w_s = 0.05$ and $w_l = 1000$; the only iteration that used different weights was the initial one. After obtaining $x$ from the optimization, we again normalize it. From this new solution vector we calculate the projections $s_i$, as described in section 2.5, and minimize
$E' = w_c E_c + w_s E_s + w_l \sum_{l\in L} E'_l = \|A'x\|^2$.
This process of first minimizing $E$ and then minimizing $E'$ is what we call a double iteration. Again we normalize the s…
In the next sections we are going to formulate energy terms that depend on the values u_{ij} and v_{ij} and measure how much a panoramic image contains undesirable distortions. This discretization does not have to be the pixel discretization of the equirectangular image.

Figure 2.6: The discretization of the equirectangular domain induces a discretization of the viewing sphere. The vertices of this discretization of the sphere (or, equivalently, the vertices of the discretization of the equirectangular domain) are mapped to the image plane by the function u.

2.4 Conformality

This section is devoted to mathematically modeling the concept of preservation of shapes, stated as a desirable property for wide-angle images in section 2.1. The reader is assumed to have some notions of differential geometry of surfaces; such notions can be found in [15], sections 2.1 to 2.5. Here a point (λ, φ) is an interior point of the equirectangular domain, i.e., λ ∈ (−π, π) and φ ∈ (−π/2, π/2). This assumption allows us to consider differential properties of the mapping u and turns r into an actual parametrization. Although this parametrization does not cover the entire sphere (it excludes one meridian and the poles), we identify r((−π, π) × (−π/2, π/2)) with S² for convenience.

2.4.1 Differential Geometry and the Cauchy-Riemann Equations

Definition 2.1. A diffeomorphism ϕ : S → S̄ is a conformal mapping if for all p ∈ S and for all v₁, v₂ ∈ T_pS it holds that ⟨dϕ_p(v₁),
45. the user goes to the directory corresponding to the application we named it as panorama There she just types run sh images test image In this example the input image must be in the directory images with the name test_image ppm This command will open window 1 which is shown in figure A 2 Window I Equinettangular image i amp Pe geet se ih Res E 7 oa A ma Figure A 2 Window 1 loads the input image and allows the user to mark lines on it This window corresponds to the user interface explained in section 2 2 where the user mark the lines that she expects to be straight in the final result Clicking on the two 119 endpoints of the line will plot the corresponding curve in black Then the user types v h or g to specify if the line should be vertical horizontal or with general orientation on the final result After marking such lines window 1 is as shown in figure A 3 Winder i Equirectangular image Ge la pesa winem Figure A 3 Window 1 after the process of marking lines Clicking on go to next window button takes the user to window 2 which is shown in figure A 4 Window Equirettangular image alpha o pil alphaz alpha wertices wertices kermoni iterations qo t opmization Figure A 4 Window 2 shows and allows to specify the field of view that is going to be projected This second window was not done in 1 it is a new
v ≡ K₃, which proves that u = (u, v) is a constant mapping. ∎

We have proved that the only mappings that are smooth and conformal in the continuous case are the constant ones. We extend this result to the discrete case, assuming that the discretization is fine enough for the result to hold. Since our optimization method returns a nonconstant solution, which is a desirable property, we conclude that such a solution must have nonzero energy, because the total energy is a sum of conformality, smoothness and line energies.

To finish the discussion about the results, we comment on their computation time. We used the Matlab implementations that will be discussed in section A.2.3 to compute the results. Despite being very practical, Matlab tends to be slower than linear algebra routines in C or C++; a future work would be to convert our implementation to C++. But even in the Matlab context we think the results could be produced faster. It is taking longer than desired to alternate between energies, i.e., to compute the normal vectors n^l and the projections s^l_{ij} as described in section 2.8.4. Also, producing the final result with bilinear interpolation is taking longer than expected, and we think this procedure could be improved. We leave both tasks to future work. Despite these procedures that still have to be improved, we think that about one minute to produce the results is a satisfactory time.
with radius σ and height 1. Since we have only one object, we do not need to sum over all objects to determine the final weights. To plug this weight into our energy we have to change the weights used in the conformality and smoothness blocks C and S inside A_k, leading to different matrices A_k for different times. The effect of these new weights is shown in figure 4.16. We make available all partial results, with and without the energies and weights discussed in this section, in [22].

4.6.4 Implementation Details

All results in the last section were generated using a discretization of the temporal viewing sphere with 60,000 vertices (3,705 vertices per frame). The time to compute the final video was about 8 minutes. The most time-consuming part of the method is solving the final linear system (AᵀA + εI)X = εY.

Figure 4.16: 16th frame of the video produced using image, object and scene energies and object weights. The man is less stretched compared to figure 4.15.

We did a very simple implementation to determine the transition functions for the object in the scene. For each frame we drew a box around the object; we identified the box with the object and assumed it to have the same size in all frames, so the transition functions were only translations. We show in figure 4.17 a frame illustrating what was just said.

Figure 4.17: For each frame, a box around the object was drawn.

4.6.5 Other Solutions
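To make the structure of this solve concrete, the Matlab sketch below assembles per-frame matrices A_k and a temporal-coherence block T into one sparse system and solves the regularized normal equations. The function name, the argument layout and the existence of a prebuilt block T are assumptions made for illustration only; they are not the thesis implementation.

    % Sketch: block-diagonal per-frame image energies plus a temporal block,
    % solved as (A'A + eps*I) X = eps*Y with Matlab's sparse backslash.
    function X = solve_panoramic_video(Aframes, T, Y, epsilon)
        % Aframes: cell array with the sparse matrix A_k of each frame
        % T: sparse rows coupling consecutive frames (temporal coherence)
        % Y: stereographic solution stacked for all frames
        A = blkdiag(Aframes{:});      % per-frame image energies
        A = [A; T];                   % append temporal coherence rows
        n = size(A, 2);
        B = A' * A + epsilon * speye(n);
        X = B \ (epsilon * Y);        % sparse direct solve
    end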
(x₁, y₁), ..., (x_n, y_n), where y_i = 0, 1 for negative and positive examples, respectively.

• Initialize weights w_{1,i} = 1/(2m) for y_i = 0 and w_{1,i} = 1/(2l) for y_i = 1, where m and l are the numbers of negative and positive examples.

Adaboost loop: for t = 1, ..., T:¹

• Normalize the weights: w_{t,i} ← w_{t,i} / Σ_j w_{t,j}.

• Select the weak classifier that minimizes the weighted error
  ε_t = min_{f, p, θ} Σ_i w_i |h(x_i, f, p, θ) − y_i|.
  Obs. 1: h(x, f, p, θ) ∈ {0, 1}.
  Obs. 2: There is a way of efficiently minimizing ε_t, described in [16].

¹ T stands for the number of weak classifiers that will compose the strong classifier. It is a chosen parameter.

• Define h_t(x) = h(x, f_t, p_t, θ_t), where (f_t, p_t, θ_t) are the minimizers of ε_t.

• Update the weights: w_{t+1,i} = w_{t,i} β_t^{1 − e_i}, where e_i = 0 if example x_i is classified correctly, e_i = 1 otherwise, and β_t = ε_t / (1 − ε_t).
  Obs.: This update means that if the error ε_t is low, the examples that were classified correctly will be considered less in the next weak classifier selection.

Adaboost final strong classifier:

• After the T weak classifiers are chosen, the final strong classifier is defined as
  C(x) = 1 if Σ_{t=1}^{T} α_t h_t(x) ≥ (1/2) Σ_{t=1}^{T} α_t, and C(x) = 0 otherwise,
  where α_t = log(1/β_t).
  Obs.: If the error ε_t is low, α_t is higher.

To illustrate the learning process, we show in figure B.5 the features associated with the first two weak classifiers selected by the method.

Figure B.5: The first two features. The first one measures the difference in intensity between
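The loop above can be transcribed almost literally into Matlab. The sketch below uses one-dimensional threshold stumps, h(x, f, p, θ) = 1 if p f(x) < p θ, as weak classifiers and a brute-force search in place of the efficient selection of [16]; all function names are illustrative, not the detector's actual code.

    % X: n-by-d matrix of feature values, y: n-by-1 labels in {0,1}.
    function [stumps, alpha] = adaboost_train(X, y, T)
        n = size(X, 1);
        m = sum(y == 0); l = sum(y == 1);
        w = (y == 0) / (2 * m) + (y == 1) / (2 * l);    % initial weights
        stumps = struct('f', {}, 'p', {}, 'theta', {});
        alpha = zeros(T, 1);
        for t = 1:T
            w = w / sum(w);                              % normalize weights
            [eps_t, f, p, theta] = best_stump(X, y, w);  % weak classifier selection
            h = double((p * X(:, f)) < (p * theta));     % h(x) in {0,1}
            e = double(h ~= y);                          % 1 if misclassified
            beta = eps_t / (1 - eps_t);
            w = w .* beta .^ (1 - e);                    % weight update
            stumps(t) = struct('f', f, 'p', p, 'theta', theta);
            alpha(t) = log(1 / beta);
        end
    end

    function [best_err, best_f, best_p, best_theta] = best_stump(X, y, w)
        % Exhaustive search over features, polarities and thresholds.
        best_err = inf;
        for f = 1:size(X, 2)
            for theta = unique(X(:, f))'
                for p = [1, -1]
                    h = double((p * X(:, f)) < (p * theta));
                    err = sum(w .* abs(h - y));
                    if err < best_err
                        best_err = err; best_f = f; best_p = p; best_theta = theta;
                    end
                end
            end
        end
    end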
49. 2 8807 10 e Time to construct the matrices 4 seconds e Time to perform the optimizations 38 seconds e Time to generate the final result 4 seconds Input image and marked lines Hi in i IHRE He o A 32 Detected faces AR ma Cui ise D eh an f q TP sx Standard projections perspective stereographic and Mercator 83 Modified standard projections perspereographic for K 0 4 and recti perspective a 1 7 G 0 8 Result of the method uncropped 84 Result of the method cropped e Comments for this example This example shows that the face detection process works well All the frontal faces were detected by the method that will be explained in section B 1 The face detection helped to preserve the shapes of the faces of the man in the center and of the woman in the left of the result Their faces would be more stretched without the face detection since many lines are passing near them 9 3 4 Result 4 e Source image Entr e du Parc Floral de Paris by Flickr user Gadl e Field of view 360 degree longitude 180 degree latitude e Number of vertices 78 804 vertices e Number of double iterations 3 e Final energy E 0 0553 e Time to construct the matrices 3 seconds e Time to perform the optimizations 51 seconds e Time to generate the final result 14 seconds Input image and marked lines 86 Standard projections perspe
50. 9924 107 e Time to construct the matrices 4 seconds e Time to perform the optimizations 58 seconds e Time to generate the final result 10 seconds Input image and marked lines 4 Standard projections perspective stereographic and Mercator 19 Modified standard projections perspereographic for K 0 5 and recti perspective a 2 5 G 0 9 Result of the method uncropped 16 Result of the method cropped e Comments for this example The method took about one minute long to produce the result which is average for the method As expected the step that took longer was the optimizations and alternations between energies All the marked lines are straight according to the orientation specified by the user for example the side buildings are all vertical and the front building is horizontal In addition all the shapes are well preserved no stretching is evident All these properties make the result produced by the method better than all standard and modified projections One problem of the final result is the highly detailed floor Since the user did not mark any straight lines on it it looked curved in the final result This will be a common problem the user usually does not mark lines on the ground and if she does it will take too long to mark all the lines V 3 2 Result 2 e Source image Saint Gu nol Church of Batz Surmer Equirectangular 360
(U_{i,j+1,k+1} − U_{i,j,k+1}) − (U_{i,j+1,k} − U_{i,j,k}) ≈ 0,   (V_{i,j+1,k+1} − V_{i,j,k+1}) − (V_{i,j+1,k} − V_{i,j,k}) ≈ 0,
i = 0, ..., m−1, j = 0, ..., n−1, k = 0, ..., l−1.

The above equations lead to the scene temporal coherence energy

E_sc = Σ_{i,j,k} cos φ̃_{ij} [ (U_{i,j+1,k+1} − U_{i,j,k+1} − U_{i,j+1,k} + U_{i,j,k}) / Δφ ]²
     + Σ_{i,j,k} cos φ̃_{ij} [ (V_{i,j+1,k+1} − V_{i,j,k+1} − V_{i,j+1,k} + V_{i,j,k}) / Δφ ]².

We rewrite E_sc in matrix form: E_sc = ‖SC · X‖². The new total energy is E = ‖A X‖², where A stacks the image energy blocks of each frame together with the weighted object and scene temporal coherence blocks OB and SC. We set the scene weight to 12, which corrected the problem of the background shaking while preserving the object. We show some frames of the new result in figures 4.14 and 4.15.

Figure 4.14: 8th frame of the video produced using image, object and scene energies.

We observe now that the object regions lose conformality and smoothness due to the extra constraints imposed on them (see, for example, figure 4.15). In order to correct this problem, and also to satisfy desirable property 2, we increase conformality and smoothness in object regions using spatially and temporally varying weights. For each time t_k, the image weights w_{ij} of section 2.7 are replaced by weights w_{ijk} that add to w_{ij} a term proportional to an object weight w^{ob}_{ijk}. The weights w^{ob}_{ijk} are defined in the following way. Let (λ_c, φ_c, t_k) be the center of the object and σ the radius of the object at time t_k. We define

w^{ob}_{ijk} = exp( −((λ_i − λ_c)² + (φ_j − φ_c)²) / (2σ²) ),

a gaussian centered at (λ_c, φ_c, t_k)

Figure 4.15: 16th frame of the video produced using image, object and scene energies.
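A minimal Matlab sketch of these object weights follows, assuming the object center (λ_c, φ_c) and radius σ of frame k are already known from the tracked box; the helper name and the array layout are ours, and the constant that scales w^{ob} before it is added to the image weights is not reproduced here.

    % Height-1 Gaussian centred on the object of frame k.
    % lambda, phi: vectors with the vertex coordinates of the discretization.
    function w_ob = object_weights(lambda, phi, lc, pc, sigma)
        [L, P] = meshgrid(lambda, phi);      % grid of vertex positions
        d2 = (L - lc).^2 + (P - pc).^2;      % squared distance to the centre
        w_ob = exp(-d2 / (2 * sigma^2));     % Gaussian of radius sigma, height 1
    end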
52. Final energy E 4 0658 107 e Time to construct the matrices 4 seconds e Time to perform the optimizations 43 seconds e Time to generate the final result 4 seconds Input image and marked lines 90 Result of the method described in 7 cropped Result of the method described in this thesis cropped e Comments for this example The method in 7 breaks the lines on the ceiling This is not a problem for the method described in this thesis as can be seen above 91 3 6 Failure cases The first failure case we show is when a scene is covered by multiple parallel lines that cover near 180 An example of such scene is shown in figure 3 1 where long horizontal lines are present in the scene Straightening such lines unavoidably causes distortions in the region that is between them In figure 3 2 we show the final result where the train is distorted Figure 3 1 A scene with two horizontal lines covering near 180 of the equirectangular image Source image Express Shirasagi by Flickr user Vitroid B E a Sy A pi A TA A A A PE a TA AAA de Pat od CE TSM ta a e ae Figure 3 2 Final result Observe how the train is distorted 92 Another failure case happens when the user forgets to mark important lines in the scene and or forgets to specify some important orientation We show in figure 3 3 the same example shown in result 5 but with other lines marked As can be se
53. For each line in each perspective image obtain its endpoints map them to the original equirectangular image and plot the geodesic curve connecting these endpoints on the equirectangular domain The output is shown in figure B 26 Figure B 26 Final result 110 line segments detected Besides the final image the method also produces two files L_endpoints and L2_endpoints as in section A 2 1 with coordinates of the endpoints orientation and index for each line As we mentioned before our method depends on parameters which are passed in command line Below we explain what each parameter means e argv 1 Input equirectangular image 147 e argv 2 Output image e argv 3 Parameter used by the filter Canny step 3 described in the Ist section of this report e argv 4 os space sigma for bilateral filter The greater this parameter the smal ler the number of detected lines because the image is more blurred e argu 5 o range sigma for bilateral filter The greater this parameter is the stronger the edge has to be to not be blurred i e the greater this parameters the smaller the number of detected lines too e argu lb Window size for eigenvalue processing It decides the locality of the eigen g value analysis e argu 7 argu 8 argu 9 argu 10 argu 11 argu 12 T value for eigenvalue processing for the six different perspectives Since all the steps are performed separated in each pers
Graphics Forum (Proceedings of EGSR 2009), vol. 28, no. 4, 2009.

[9] Point Grey. CCD and CMOS digital cameras for industrial, machine and computer vision. URL: http://www.ptgrey.com

[10] Flickr. Equirectangular group. URL: http://www.flickr.com/groups/equirectangular

[11] J. P. Snyder, Map Projections: A Working Manual. Supt. of Docs., 1987.

[12] L. K. Sacht, Multiperspective images from real world scenes. URL: http://w3.impa.br/~leo_ks/s3d

[13] L. K. Sacht, Multi-view multi-plane approach to project the viewing sphere. URL: http://w3.impa.br/~leo_ks/image_processing

[14] A. Agarwala, M. Agrawala, M. Cohen, D. Salesin, and R. Szeliski, "Photographing long scenes with multi-viewpoint panoramas," ACM Trans. Graph., vol. 25, no. 3, pp. 853-861, 2006.

[15] M. P. do Carmo, Differential Geometry of Curves and Surfaces. New Jersey: Prentice-Hall, Inc., 1976.

[16] P. Viola and M. J. Jones, "Robust real-time face detection," Int. J. Comput. Vision, vol. 57, no. 2, pp. 137-154, 2004.

[17] U. Neumann, T. Pintaric, and A. Rizzo, "Immersive panoramic video," in MULTIMEDIA '00: Proceedings of the Eighth ACM International Conference on Multimedia, New York, NY, USA, pp. 493-494, ACM, 2000.

[18] D. Kimber, J. Foote, and S. Lertsithichai, "FlyAbout: spatially indexed panoramic video," in MULTIMEDIA '01: Proceedings of the Ninth ACM International Conference on Multimedia, New Yor
55. INSTITUTO NACIONAL DE MATEM TICA PURA E APLICADA Content based Projections for Panoramic Images and Videos Leonardo Koller Sacht Advisor Paulo Cezar Carvalho Co advisor Luiz Velho Rio de Janeiro April 5 2010 Master thesis committee Paulo Cezar Pinto Carvalho advisor IMPA Luiz Carlos Pacheco Rodrigues Velho co advisor IMPA Marcelo Gattass PUC Rio Luiz Henrique de Figueiredo substitute IMPA Acknowledgements First of all I would like to thank my mother for always supporting me encouraging me and pushing me forward I would like to thank Professor Paulo Cezar Carvalho for guiding my studies during the last years and for giving valuable suggestions and contributions to this work I am grateful to Professor Luiz Velho for receiving me very well in the Visgraf La boratory for introducing me to the theme of this thesis for giving great ideas for the development of this work and for providing key discussions about the theme I would also like to thank Professor Marcelo Gattass for helping me on the Computer Vision aspects of this work A more general acknowledgement to all Professors from IMPA and UFSC that contri buted to my academic growth I want to thank all my colleagues from the Visgraf Lab that somehow helped me in this thesis Thiago Pereira Adriana Schulz Djalma Lucio Gabriel Duarte Marcelo Cicconet Francisco Ganacim and Leandro Cruz Also I want to thank all friends at IMPA for making the years
We fix this problem by multiplying both equations by a fixed multiple of the area of each quad on the sphere. The definition of the area of a part of a surface can be found in [15], page 98. In our case we want to know the area of r([λ_i, λ_i + Δλ] × [φ_j, φ_j + Δφ]) on the viewing sphere, which is given by

Area_{ij} = ∫_{φ_j}^{φ_j+Δφ} ∫_{λ_i}^{λ_i+Δλ} ‖∂r/∂λ × ∂r/∂φ‖ dλ dφ.

Straightforward calculations give us

∂r/∂λ × ∂r/∂φ = (cos λ cos²φ, sin λ cos²φ, sin φ cos φ)   and   ‖∂r/∂λ × ∂r/∂φ‖ = cos φ.

Thus

Area_{ij} = ∫_{φ_j}^{φ_j+Δφ} ∫_{λ_i}^{λ_i+Δλ} cos φ dλ dφ = Δλ ∫_{φ_j}^{φ_j+Δφ} cos φ dφ = Δλ Δφ cos φ̃_{ij},

where φ̃_{ij} is some value in [φ_j, φ_j + Δφ]. Since Δλ Δφ is constant, we conclude that Area_{ij} is proportional to cos φ̃_{ij}, and the discretized equations become

cos φ̃_{ij} (u_{i,j+1} − u_{ij})/Δφ + (v_{i+1,j} − v_{ij})/Δλ ≈ 0,
cos φ̃_{ij} (v_{i,j+1} − v_{ij})/Δφ − (u_{i+1,j} − u_{ij})/Δλ ≈ 0.

We define the conformality energy of u to be a weighted sum of the quadratic errors of both equations:

E_c = Σ_{i=0}^{m−1} Σ_{j=0}^{n−1} w_{ij}² ( cos φ̃_{ij} (u_{i,j+1} − u_{ij})/Δφ + (v_{i+1,j} − v_{ij})/Δλ )²
    + Σ_{i=0}^{m−1} Σ_{j=0}^{n−1} w_{ij}² ( cos φ̃_{ij} (v_{i,j+1} − v_{ij})/Δφ − (u_{i+1,j} − u_{ij})/Δλ )².

The w_{ij} are spatially varying weights that depend on the content of the image and will be defined in section 2.7. The definition of E_c is necessary because we need a way of measuring how conformal a discretized mapping is. In order to produce
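For concreteness, the Matlab sketch below assembles the corresponding sparse conformality matrix row by row. It assumes, for simplicity only, the stacked variable ordering x = [u(:); v(:)], which differs from the interleaved ordering used in the text, approximates φ̃_{ij} by the midpoint of [φ_j, φ_j + Δφ], and uses a helper name of our own.

    % lambda: m+1 longitudes, phi: n+1 latitudes, w: (m+1)-by-(n+1) weights.
    function C = conformality_matrix(lambda, phi, w, dlam, dphi)
        m = numel(lambda) - 1;  n = numel(phi) - 1;
        nv = (m + 1) * (n + 1);                 % number of vertices
        idx = @(i, j) j + 1 + i * (n + 1);      % linear index of vertex (i,j)
        rows = []; cols = []; vals = []; r = 0;
        for i = 0:m-1
            for j = 0:n-1
                c = w(i+1, j+1) * cos(phi(j+1) + dphi/2);   % w_ij * cos(phi~_ij)
                a = w(i+1, j+1) / dlam;
                % row: c*(u_{i,j+1}-u_{ij})/dphi + (v_{i+1,j}-v_{ij})/dlam
                r = r + 1;
                rows = [rows, r r r r];
                cols = [cols, idx(i,j+1), idx(i,j), nv+idx(i+1,j), nv+idx(i,j)];
                vals = [vals, c/dphi, -c/dphi, a, -a];
                % row: c*(v_{i,j+1}-v_{ij})/dphi - (u_{i+1,j}-u_{ij})/dlam
                r = r + 1;
                rows = [rows, r r r r];
                cols = [cols, nv+idx(i,j+1), nv+idx(i,j), idx(i+1,j), idx(i,j)];
                vals = [vals, c/dphi, -c/dphi, -a, a];
            end
        end
        C = sparse(rows, cols, vals, 2*m*n, 2*nv);   % 4 nonzeros per row
    end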
We use these coefficients to define the output virtual vertex

u^l_{q_{ij}} = a u_{i,j} + b u_{i,j+1} + c u_{i+1,j} + d u_{i+1,j+1},   q_{ij} ∈ V^l.

The same is done to define u^l_{start} and u^l_{end}, the virtual vertices corresponding to the endpoints of l. We define our straight line energies as functions of the positions of the output virtual vertices, which are linear combinations of actual output vertices u_{ij}. Thus, in the end, the energies will depend only on the u_{ij}.

In order to have all the u^l_{q} collinear, the distance from each u^l_{q} to the line connecting u^l_{start} to u^l_{end} should be zero, for every q_{ij} ∈ V^l (see figure 2.15).

Figure 2.15: The distance should be zero.

One way of expressing this distance is as the coefficient of the orthogonal projection of u^l_{q} − u^l_{start} on the normal direction to u^l_{end} − u^l_{start}, which is given by

⟨u^l_{q} − u^l_{start}, n^l(u^l_{start}, u^l_{end})⟩,   where   n^l = R₉₀ (u^l_{end} − u^l_{start}) / ‖u^l_{end} − u^l_{start}‖.

Using such a distance to measure how far the u^l_{q} are from being collinear leads to the following energy for line l:

E_l = Σ_{q_{ij} ∈ V^l} ⟨u^l_{q} − u^l_{start}, n^l(u^l_{start}, u^l_{end})⟩².

This expression is not convenient because it involves products of variables (the normal n^l itself depends on u^l_{start} and u^l_{end}); as a consequence, E_l becomes a sum of nonlinear squares. We want it to be a sum of linear squares, since the other energies have that form. We now turn to an alternative way of expressing the distance from
all together into an equirectangular domain that represents such a scene. More specifically, from a set of photographs (common photographs or images obtained with fisheye lenses, for example) taken from the same viewpoint, it became possible to create an image in which every pixel represents a point, and its associated color, on the equirectangular domain. The stitching process itself is a very detailed task and is not going to be discussed in this work. However, it is important to notice that without the development of such techniques it would not be possible to deal with projections from the viewing sphere to an image plane, which is one of the main tasks of this thesis. For additional information about stitching we suggest references [2] and [3].

Such images of the equirectangular domain are called equirectangular images and will be the input information for all the algorithms that we will develop. Examples of equirectangular images are shown in figures 1.3, 1.4 and 1.5.

Figure 1.4: Reboot 8.0 lanus demos Cabinet to Thomas kid, by Flickr user Aldo, taken from [10].

Figure 1.5: Cloud Gate, by Flickr user WemY77T, taken from [10].

The bottom left corner of the images, with coordinates (x, y) = (m − 1, 0), represents the point (−π, −π/2); the top right corner (x, y) = (0, n − 1) of the image represents the
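A minimal Matlab sketch of this pixel/(λ, φ) correspondence follows, assuming rows indexed from the top of the image and the parametrization r(λ, φ) = (cos λ cos φ, sin λ cos φ, sin φ) used throughout the thesis; the exact row and column conventions of the thesis images may differ.

    % x: row index (0-based, 0 = top), y: column index (0-based, 0 = left),
    % for an m-by-n equirectangular image covering [-pi,pi] x [-pi/2,pi/2].
    function [lambda, phi, p] = pixel_to_sphere(x, y, m, n)
        lambda = -pi   + 2 * pi * y / (n - 1);    % longitude
        phi    =  pi/2 - pi     * x / (m - 1);    % latitude
        p = [cos(lambda) * cos(phi), sin(lambda) * cos(phi), sin(phi)];  % point on S^2
    end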
and orientation throughout the whole panoramic video. If an object is moving away from the viewpoint, its size should not be preserved: it should become smaller from one time to another. That is the reason why we model this property on the temporal viewing sphere. The mathematical modeling of requirements 3 and 4 is made with the definition of transition functions, which will be stated in section 4.5.

4.4 The Temporal Viewing Sphere and Problem Statement

In this section we mathematically define the concept of temporal viewing sphere. Temporal viewing spheres will be the input for our methods and contain the information of a scene that varies through time. We also state the panoramic video problem. All of this section is an immediate extension of the definitions we gave for the panoramic image problem.

Let [0, t_f] be a time interval and consider the function

R : [−π, π] × [−π/2, π/2] × [0, t_f] → R⁴,   R(λ, φ, t) = (cos λ cos φ, sin λ cos φ, sin φ, t).

We call the image of R the temporal viewing sphere. It is just a set of viewing spheres with one more coordinate, t, that tells which time they are representing. We denote the temporal viewing sphere by TS. We give in figure 4.1 an illustration of the concept we have just defined.

Figure 4.1: An illustration of the temporal viewing sphere. The variation of the fourth coordinate is represented by the
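The definition can be sampled directly. The sketch below builds a uniform grid of (λ_i, φ_j, t_k) vertices and evaluates R on it; the grid sizes are illustrative only.

    % Discretization of the temporal viewing sphere: an (m+1)x(n+1) grid of
    % (lambda, phi) vertices replicated over l+1 frames.
    m = 60; n = 60; l = 15; tf = 1.0;            % illustrative sizes
    lambda = linspace(-pi,   pi,   m + 1);
    phi    = linspace(-pi/2, pi/2, n + 1);
    t      = linspace(0, tf, l + 1);
    [L, P, T] = ndgrid(lambda, phi, t);          % vertices (lambda_i, phi_j, t_k)
    TS = cat(4, cos(L).*cos(P), sin(L).*cos(P), sin(P), T);   % 4-D points of TS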
60. and techniques related to the following areas Differential Geometry Linear Algebra both theoretical and numerical Optimization Numerical Analysis Analytic Geometry Multivariable Analysis Statistics Computer Vision Image and Video Processing Interesting extensions arose from our study of panoramic images In this work we suggested methods to detect lines and faces in equirectangular images and opened a novel discussion about panoramic videos This work left open many possibilities for future work and we point them in next section 152 Future Work We think the most important future work for the panoramic image problem are e Migrate all the Matlab code to C C We implemented the optimization and the interface in different languages It would be interesting for future applications to integrate both parts to the same language e Integration of line detection to the interface This integration would allow the user to specify the parameters for line detection in a window where she also could include or remove lines e Apply the method to gigapixel images After finding a discretized projection using the method described in chapters 2 and 3 the final panoramic image is produced using bilinear interpolation and this process can be done for any resolution of the input equirectangular image even gigapixel ones High quality results would be produced using these input images For the panoramic video problem the next steps we int
61. ands of them on photo sharing sites To illustrate this point the reader may for example access Flickr group on 10 One can find there a great variety of equirectangular images indoor or outdoor scenes with or without people realistic or with artistic effects from places all around the world in many different resolutions 1 2 Problem Statement With the formalism created in the last section we can formulate the panoramic image problem as the one of finding a mapping il INES R Ap uo with desirable properties Here S is a field of view which may not be the entire 360 by 180 degree entire field of view The set u S can be interpreted as a continuous image each u A 9 u S receives the color that the viewing sphere has at A d Thus we have two ways of thinking of a panoramic image as a mapping u or as a continuous image u S This duality allows us to turn perceptual properties of the continuous image into algebraic expressions that depend on the function u 1 3 Standard Projections There are many known functions that project the viewing sphere or a part of it onto a plane Many of them were developed for cartography purposes since Earth s shape can be approximated by a sphere There are different classifications for these projections equal area conformal cylindrical etc For details about these classifications and many examples of projections we recommend 11 In this section we study the b
62. apping changes too much to satisfy the line constrains We want the solution to be smoother We can observe huge changes of scale and orientation over the image especially near line segments This behavior has an explanation conformality imposes that the differential north vector h is a 90 degree rotation of the differential east vector k i e h Rook Such imposition turns the mapping u locally into a rotation composed with a uniform scale But this rotation and scale can be very different even for two points A1 1 A2 2 that are very close in the equirectangular domain Figure 2 24 illustrates what was just said We avoid such behavior by imposing small variations for both h and k If we have oh oh ay A De T T 1 8 0 VA 9 5 5 x mm 50 h h u A gt do u A 01 k Figure 2 24 Vectors h and k varying too much for small variations of A it follows that Ok w N 00 gd 0 VAG e 5 5 x mm since k R_ooh here we are assuming that u is conformal Thus it is enough to impose the constrains only on h Hence ideally we should have h 2 4 7 ey Au 7 8 v Ou i O A gt ON 00 ETS 0 0 Ou Ov Ou Ov ag o 0 age o 0 agar 9 0 DGD many Oe Uy MANO 2 6 1 Energy Term We impose the last equations on the vertices A dj 7 1 m 1 7 0 n 1 of the discretization of the viewing sphere and approximate the second
63. become more critical Next section presents a modification to the problem of minimizing E x that provides a good quality approximate solution and is faster 2 8 3 Linear System Method We replace E x by other energy E x that is E x plus a small perturbation that tells how much the mapping x deviates from some known mapping y E x E x ellx yl Ax ellx yl where gt 0 is a chosen small value We use the stereographic mapping section 1 3 2 for y The advantages of using such perturbed energy are discussed further Before that consider the following statement Statement 2 5 The minimizer of E in R is z A A el ey Tt turns out that in practice Az gt 0 Then this requirement is always satisfied 69 Proof First we look for critic points of E i e we look for x R such that VE x 0 Rewriting the expression for E leads to E x x AT Ax e x y x y Thus VE x V x At Ax eV x y x y V x A Ax e V x x V x y V y x V y y We know from multivariable Calculus that V x 4 Ax 24 Ax since A A is symme tric Vix x Ve Ix 21x 2x Vix y Vy x y and Viy y 0 Then we have V E x 2A Ax 2ex 2ey and VE x 0 A7A eI x ey Observe that A A el is positive definite Ty AT sara E SE T i 2 x A A eIl x x A Ax exx x A Ax e x gt 0 Vx 4 0 20 gt 0 We know that every
64. ce in order to produce a perceptually good panoramic video Goals This thesis has the following goals e Study and understand the panoramic image problem In our work we do a review of the methods proposed up to the moment which is necessary to understand the difficulties and challenges of the problem e Deeply detail one reference on this topic After doing the review we elected 1 as the main reference of this thesis because it satisfies most of the properties we state as desirable All the details even the ones that were omitted in the reference are explai ned in this thesis e Propose extensions for this reference Beyond detailing 1 we propose two ex tensions for it feature detection on equirectangular images and panoramic videos e Focus on mathematical aspects of the problem All the mathematical tech niques related to the problem are discussed in details in this thesis Perceptual aspects are also discussed and implementation aspects are left as appendix Original Contributions We believe that the two most important contributions of our work are e Statement modeling and solutions for the panoramic video problem As far as we know this thesis is the first work where the problem of obtaining a video where each frame represents a wide FOV is considered We consider desirable properties for this problem that depend on temporal coherence of the objects and of the entire scene we m
65. ce wide angle images should have the following charac teristics e Depend on the scene content As we saw in sections 1 3 and 1 4 global pro jections produce distortions One of the reasons for this fact is that they do not give a special treatment for different regions of the panorama Some previous approaches sec tions 1 5 2 and 1 5 3 tried to do something like this but they did it in a coarse way This approach constructs a wide angle image adapted to the location of lines sections 2 2 2 5 and 2 7 faces sections 2 7 and B 1 and importance of regions on the scene section 2 7 e Handle wide fields of view Some standard projections and previous approaches are only defined for fields of view up to 180 degree and some of them produce bad results even for narrower FOVs This approach does not have this problem and can handle arbi trary FOVs as can be seen in chapter 3 e Satisfy the structural requirements In section 1 5 1 we stated requirements that a wide angle image should have in order to match our perception of the world since they are based on the retinal projections We do not devote any special discussion to them here but we will always be careful to weather the method satisfy them or not e Have a simple user interface Although not emphasized in the last chapter some previous approaches sections 1 5 2 1 5 3 and 1 5 4 3 needed a precise and or tedious interaction with the user in order to yield a good re
66. conclude that LO is sparse The conclusion E LOx is straightforward 2 2 Os LOW se se otis LOx LO ay 147 LO oy 2 LOY x Eio ana Belgo y Ejo Elo LEL We now focus on the lines that have no specified orientation i e the set IAL fo o oat As we explained we alternate between minimizing Ejo and Eja for such lines To turn E into a matrix form we do the same way we did for lines with fixed 9 orientation for each 9 E L Ly k 0 k2 1 we construct the matrix LO in the same way we did before and define LOW LOA LOW a E We call LOA the alternate fixed orientation line matrix Since now we can have ny 4 0 and no 0 each line of LOA has at most 16 nonzero entries So it is still sparse Let L L Developing the expression for Fia we obtain Eul y gig EV 1 ar O Ostet Wipe teen Ir 1 da A RR a 1 ae Bie Otay rae ater ce 1 do SED AD O PANE EE QijUij 0554105541 Cgi Gita jt Uit jH 2 SijlendUicnaend SijbendWienajendtl e SijCendUi ng jend Sij dendUiena l jenatl 2 Y aijuig digtrtigga SigCendionatljena Sig dendUicnat Lena qui Vi 2 Tag ds Sie Usa z A 15O SijCendViena L jend EE A T le SThe system usually has thousands of variables the number of variables is two times the number of vertices 50 where corresponds to the middle 8 terms that are being
From section 2.4, recall that a mapping u : S² → R² is conformal if

∂u/∂φ = −(1/cos φ) ∂v/∂λ,   ∂v/∂φ = (1/cos φ) ∂u/∂λ,

and, from section 2.6, u is smooth if

∂²u/∂λ∂φ = 0,   ∂²v/∂λ∂φ = 0,   ∂²u/∂φ² = 0,   ∂²v/∂φ² = 0.

The following statement shows that the only mappings that satisfy the above six equations are the constant mappings.

Statement 3.1. u : S² → R² is conformal and smooth ⟺ u(λ, φ) ≡ (u₀, v₀) for all (λ, φ).

Proof. (⇐) Constant mappings have all derivatives equal to zero, so the above six equations are trivially verified.

(⇒) We use the above six equations to obtain expressions for u and v.

∂²u/∂φ² = 0 ⟹ ∂u/∂φ = F₁(λ) ⟹ u = F₁(λ) φ + F₂(λ), for some functions F₁ and F₂.

∂²u/∂λ∂φ = 0 ⟹ F₁'(λ) = 0 ⟹ F₁ ≡ K₁ ⟹ u = K₁ φ + F₂(λ), for some constant K₁.

Thus we have the following expression for u: u = K₁ φ + F₂(λ). Analogously, using the equations ∂²v/∂φ² = 0 and ∂²v/∂λ∂φ = 0, one obtains the following expression for v: v = K₂ φ + F₄(λ), for some constant K₂ and some function F₄.

Now, considering the Cauchy-Riemann equations, we have

∂u/∂φ = −(1/cos φ) ∂v/∂λ ⟹ K₁ = −(1/cos φ) F₄'(λ) ⟹ K₁ cos φ = −F₄'(λ).

Since cos φ only depends on φ and F₄'(λ) depends only on λ, the last equation implies K₁ = F₄'(λ) = 0. Hence K₁ = 0, F₄ ≡ K₃, u = F₂(λ) and v = K₂ φ + K₃, for some constant K₃.

∂v/∂φ = (1/cos φ) ∂u/∂λ ⟹ K₂ = (1/cos φ) F₂'(λ) ⟹ F₂'(λ) = K₂ cos φ ⟹ K₂ = 0 and F₂ ≡ K₄, for some constant K₄.

Thus u ≡ K₄ and
68. ctive stereographic and Mercator S7 Modified standard projections perspereographic for K 1 and recti perspective a 50 8 0 7 Result of the method uncropped 88 Result of the method cropped e Comments for this example This example shows that the method is applicable for wide fields of view even the entire viewing sphere The standard and modified projections have the problem of not being defined for such FOVs or infinity stretching For example the Mercator projection for this example was obtained by projecting a subset of the viewing sphere actually 160 degree latitude since the projection stretches to infinity when o ae The possibility of projecting the entire viewing sphere allows the construction of a viewer of panoramas based on the method approached on this thesis The method would return the solution vector x and the user would specify what FOV of the sphere she wants to see Thus the viewer would collect only the positions of the vertices of the chosen FOV construct an image and display on the screen Some topological problems could appear since the borders of the final result are irregular There would not be the possibility of looking all around the viewing sphere for example 89 3 5 Result 5 e Source image Posters by Flickr user Simon S e Field of view 180 degree longitude 100 degree latitude e Number of vertices 44 431 vertices e Number of double iterations 3 e
dence of points of an object between different times. For example, if an object is translated by (Δλ, Δφ) in the equirectangular domain, the transition function for it will be

ψ^{ob}_{t₁,t₂}(λ, φ, t₁) = (λ + Δλ, φ + Δφ, t₂),   for all (λ, φ) in the object at time t₁.

4.6 Case 1: Stationary VP, Stationary FOV and Moving Objects

In this section we propose a solution for the first case of panoramic videos. This solution is strongly based on the solution we studied for images: we derive equations for temporal coherence, discretize the domain, obtain energy terms for the temporal coherence equations and compute a panoramic video using an optimization framework. We simplify this case by assuming there is only one moving object in the scene, but our solution is easily extendable to n objects. Since the FOV is stationary, the part of the temporal viewing sphere that will be projected is a cube of the form [α₁, α₂] × [β₁, β₂] × [0, t_f].

4.6.1 Temporal Coherence Equations

In this section we obtain partial differential equations that model temporal coherence for case 1 of panoramic videos. We obtain these equations for the object in the scene using the transition functions ψ^{ob}, and extend them for the entire scene using the transition functions ψ^{sc}. Let S^{ob}(t₁) be the moving object in the scene at time t₁ and assume ψ^{ob}_{t₁,t₂} to be defined in S^{ob}(t₁). Let (λ₁, φ₁, t₁) ∈ S^{ob}(t₁) and (λ₂, φ₂, t₂) = ψ^{ob}_{t₁,t₂}(λ₁, φ₁, t₁). The perceptual requirement that we
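For a translation like the one above, the transition function is a one-line computation. In the sketch below the displacement (Δλ, Δφ) is assumed to come from tracking the object box between frames; the function name and the longitude wrapping are our additions.

    % Translation-only transition function between frames t1 and t2.
    function [lam2, phi2] = transition_translate(lam1, phi1, dlam, dphi)
        lam2 = lam1 + dlam;                    % translated longitude
        phi2 = phi1 + dphi;                    % translated latitude
        lam2 = mod(lam2 + pi, 2*pi) - pi;      % wrap longitude to [-pi, pi)
    end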
Figure 2.2: Projection of lines from the scene to the viewing sphere. Points P₁ and P₂ are projected to P₁^S and P₂^S on the sphere.

We denote by r̄ the line segment connecting P₁^S to P₂^S. The key observation is that the original segment and r̄ project to the same points on the viewing sphere: the arc connecting P₁^S and P₂^S. So we can work just with the points P₁^S, P₂^S ∈ S², which correspond to the ones marked by the user on the equirectangular image. A very simple parametrization for r̄ is

γ : [0, 1] → R³,   γ(t) = P₁^S + t (P₂^S − P₁^S).

The projection of r̄ on S², say γ^S, also has a simple parametrization:

γ^S : [0, 1] → S²,   γ^S(t) = (P₁^S + t (P₂^S − P₁^S)) / ‖P₁^S + t (P₂^S − P₁^S)‖.

We have to bring these calculations to the equirectangular domain. The user marks two points (λ₁, φ₁) and (λ₂, φ₂) ∈ [−π, π] × [−π/2, π/2], which have the following corresponding points on S²:

P₁^S = (cos λ₁ cos φ₁, sin λ₁ cos φ₁, sin φ₁)   and   P₂^S = (cos λ₂ cos φ₂, sin λ₂ cos φ₂, sin φ₂).

Let γ^S be as above. For each γ^S(t) = (x(t), y(t), z(t)) ∈ S², t ∈ [0, 1], we have to find the corresponding (λ(t), φ(t)) ∈ [−π, π] × [−π/2, π/2]. Let t₀ ∈ [0, 1], γ^S(t₀) = (x, y, z), and let (λ, φ) be the corresponding longitude and latitude on the equirectangular domain. Then we have the relation

(x, y, z) = (cos λ cos φ, sin λ cos φ, sin φ).

Obviously φ = arcsin(z). To obtain λ, we first consider x > 0: in this case we must have λ ∈ (−π/2, π/2), and the relation y = sin λ cos φ
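The whole construction (endpoints to the sphere, chord, normalization and back to (λ, φ)) fits in a few lines of Matlab. In the sketch below, atan2 condenses the case analysis on the sign of x that is carried out by hand in the text; the function name is ours.

    % Sample the geodesic between two marked endpoints (l1,p1) and (l2,p2)
    % and bring it back to the equirectangular domain.
    function [lam, phi] = geodesic_equirect(l1, p1, l2, p2, nsamples)
        P1 = [cos(l1)*cos(p1), sin(l1)*cos(p1), sin(p1)];
        P2 = [cos(l2)*cos(p2), sin(l2)*cos(p2), sin(p2)];
        t  = linspace(0, 1, nsamples)';
        G  = (1 - t) * P1 + t * P2;             % chord gamma(t)
        G  = G ./ sqrt(sum(G.^2, 2));           % projected onto the sphere
        phi = asin(G(:, 3));                    % latitude
        lam = atan2(G(:, 2), G(:, 1));          % longitude, all quadrants
    end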
e midpoint of each quad/line intersection, as described in section 2.5, and then obtaining the bilinear coefficients for each virtual output vertex (section 2.5.2). With these values the matrix LO is constructed. Also, the line matrix used for the initial iteration (LDinit in the code below) is defined. Then the initial iteration (section 2.8.4) is performed using the code below:

    function [x, f_all_energies] = initial_forsyth_minimization_2(L, L2, C, S, LO, LDinit, ...
                                        m2, n2, alpha_1, alpha_2, beta_1, beta_2, y)
        % Stack the weighted energy blocks (weights of the initial iteration)
        A = [0.4 * C; 0.05 * S; 10 * LO; 10 * LDinit];
        epsilon = 10^(-6);
        B = A' * A + epsilon * speye(2 * m2 * n2);   % A'A + eps*I, kept sparse
        x = B \ (epsilon * y);                       % backslash: sparse direct solver
        x = x / norm(x);                             % normalization of section 2.8.4
        f_all_energies = norm(A * x);                % energy value reported to the user
    end

To solve the linear system we used just the backslash tool, x = B \ (epsilon * y), provided by Matlab, which chooses a proper numerical method for such a sparse linear system. The next step is to perform the number of double iterations specified by the user in Window 2, and then the final iteration. The code is almost the same as the one used for the initial iteration.

• Generation of the final result. As we already mentioned, our discretization of [α₁, α₂] × [β₁, β₂] depends on the value factor returned by Window 2. This dependence is illustrated in figure A.10.

Figure A.10: The portion of the equirectangular image corresponding to [α₁, α₂] × [β₁, β₂] has 2 times more pixels (represented with dashed lines) than the number of vertices of the discretization of [α₁, α₂] × [β₁, β₂].
72. e the final result we will join this energy to other ones and the task of looking for a conformal mapping will be replaced by looking for a mapping that minimizes a weighted sum of these energies E can be rewritten as E Cx where C is a matrix and x is a vector since E is a sum of squares of linear terms Let x R2 D where XQ j i n 1 Uij and X2 j i n 1 41 Vij 0 si US J 0 ad Each entry of Cx must correspond to the term that is in each double summation of E So each line of C must correspond to a double summation and C must have 2mn lines 43 Let C RAMXAMEDA 1 The equality E Cx is achieved by defining Wij C2 j 4in 2 j 1 i n 1 1 RO Cajtin 2j i n 1 1 re Co Wiz COS Pig a E SE o Wij COS Qij 2 j in 2 j i 1 n 1 2 j in 2 j i n 1 Ad Wij Co j in 1 2 j 1 i n 1 E Ay Otini Ay _ Wij COS Wi COS Dj Caj in 1 2 G 1 n 1 1 Z Ca j in 1 2 j i n 1 1 Eo AQ 1 0 m 1 7 0 n 1 The other entries of C are defined as 0 We call C the conformality matrix and observe that it is sparse only 4 nonzero entries per row We do not discuss the minimization of E Cx here since E will be minimized among other energies in section 2 8 But in figure 2 10 we show a panoramic image that one obtains when minimizes such energy alone As we can see the object shapes are well preserved but lines that should be strai
73. ed according to figure 3 Chapters 2 and 3 Chapter 1 Appendices A and B Chapter 4 Past Present Future Figure 3 Structure of the thesis Chapter 1 starts with the statement of the panoramic image problem and then reviews many possibilities proposed to solve this problem until last year 2009 That is why we associate it to past time The first solutions considered are the standard projections which were developed centuries ago with other purposes but are applicable to the problem Some modifications of them are also considered Then we analyze previous approaches proposed in the last 15 years 5 6 7 4 and 8 Motivated by chapter 1 we start chapter 2 by making a list of desirable properties a method to produce panoramic images should have and explain why 1 satisfies most 4 of them Since 1 is very recent it was published in SIGGRAPH 2009 we associate it to present time Each section of this reference is discussed and theoretical details are rigorously posed In chapter 3 we show the results produced by the method Good results are shown each one illustrating some interesting feature of the method Also some failure cases are discussed In order to complement the discussion about 1 we provide two appendices In appendix A we show the application software we did to implement the method and give implementation details Appendix B shows the methods we developed to detect faces and straight lines in
In this section we discretize the temporal viewing sphere in a manner analogous to the discretization of the viewing sphere (section 2.3). We discretize the entire temporal viewing sphere S² × [0, t_f] = TS, but all the development is analogous if we use a cube of the form [α₁, α₂] × [β₁, β₂] × [0, t_f], which is the domain of projection for case 1.

We stated the panoramic video problem as the one of finding a function

U : TS → R²,   (λ, φ, t) ↦ (U(λ, φ, t), V(λ, φ, t)),

with desirable properties. We replace this continuous problem by finding U on a discrete mesh (λ_i, φ_j, t_k), where

λ_i = −π + i Δλ, i = 0, ..., m,   φ_j = −π/2 + j Δφ, j = 0, ..., n,   t_k = k t_f / l, k = 0, ..., l.

The parameters m and n can be chosen as we did for panoramic images, and the parameter l is determined by the number of frames of the input equirectangular video. The values of U at (λ_i, φ_j, t_k) will be denoted by U_{ijk} = (U_{ijk}, V_{ijk}), i = 0, ..., m, j = 0, ..., n, k = 0, ..., l.

4.6.3 Total Energy Minimization and Results

In this section we produce a panoramic video using all the theory developed in this chapter. We use as input the equirectangular video shown in figures 4.2 and 4.3, with 16 frames. We mark the lines for the first frame: since the viewpoint is stationary, they do not move across time, so it is enough to mark them only in the first frame. The marked lines are shown in figure 4.9.

Figure 4.9: Marked lines for the example video.

The first step is to obtain an optimal panoramic image for the first frame. Since the back
75. en in figure 3 4 the result produced by the method is very undesirable Figure 3 4 Fimal undesirable result The method shows to be user dependent in some situations It requires precision of the user to specify lines The semiautomatic detection of lines described in section B 2 may help the user in this task 93 3 7 Result Discussion The results shown in the last section prove the precision of the method studied in this thesis In all the cases where the user marked correctly the important lines in the scene the method returned a perceptually great result In this section we finish the discussion about the results presenting some properties that all of them share The first one is that the projection is very uniform far from the lines We plot in figure 3 5 results 2 and 3 together with a base mesh which is not the same mesh used to obtain the result it is a coarser one Figure 3 5 The projection is more distorted near the line segments Another property that all the results share is that the final result never has null energy Since we want a mapping with as least distortions as possible mappings with null energy would be the ideal ones Such behavior has a clear explanation the smoothness conformality and line constraints can not be satisfied at the same time We show in particular that the smoothness and conformality energy can not be null at the same time in the continuous context Remember from se
76. end to pursue are the following e Finish case 1 We intend to implement the other solutions discussed for this case and compare them to the one implemented in this thesis e Case 2 Implement the optimization solution and compare it to the solution presented in this thesis e Integrate cases 1 and 2 Use the solutions of these cases to solve the panoramic video problem with stationary viewpoint e Case 3 Generalize for this case what was developed for the first 2 cases e Investigate numerical methods for the problem The solution we proposed in this thesis took too long to be computed We intend to investigate numerical and com putational methods for the problem in order to produce results faster The development of the panoramic video theme may cause impact for different areas of art and entertainment since it leads to a different way of filming and visualizing scenes We mention below some areas of application e Cinema If a scene is filmed with a spherical camera that produces an equirectangular video there is no need to choose in which direction to point the camera The scene will be much better represented by the equirectangular video and specification of FOVs could be done as a post processing step An interesting future work is to develop an interface where the input is an equirectangular video and the user can specify different FOVs for different times Also the possibility of filming and visualizing wide FOVs can add inter
er. The result of min_{‖x‖=1} E(x) tells us the direction of minimal increase for E. For our purposes the direction is the only thing that matters, since scaling the final mapping will not change its final shape.

Statement 2.3. The solution of min_{‖x‖=1} ‖Ax‖ is e₁, the eigenvector of AᵀA associated with λ₁, its smallest eigenvalue. In addition, E(e₁) = λ₁.

Proof. Since AᵀA is symmetric and positive semidefinite (xᵀAᵀAx = ‖Ax‖² ≥ 0 for all x), it has an orthonormal basis of eigenvectors e_i, i = 1, ..., q, with eigenvalues 0 ≤ λ₁ ≤ ... ≤ λ_q. Thus any unit vector can be written as x = μ₁ e₁ + ... + μ_q e_q, where μ₁² + ... + μ_q² = 1. The first conclusion of the statement comes from

E(x) = ‖Ax‖² = xᵀ AᵀA x = Σ_i μ_i² λ_i ≥ λ₁ (μ₁² + ... + μ_q²) = λ₁ = E(e₁),

so E(x) ≥ E(e₁) for all x with ‖x‖ = 1. The other conclusion is straightforward. ∎

One could think that the above statement solves our problem: it would be enough to find the smallest eigenvalue of AᵀA and its associated eigenvector. But a problem arises.

Statement 2.4. The vectors x such that u_{ij} = k_u and v_{ij} = k_v for all i, j are eigenvectors of AᵀA with corresponding eigenvalue equal to 0.

Proof. We consider A = A₁; the case A = A₂ is analogous. All the energies E_c, E_s, E_{l,θ} and E_{l,θ_A} vanish if we plug into them constant values for the u_{ij} and v_{ij} (this fact is straightforward from the expression of each energy). Thus for u_{ij} = k_u and v_{ij} = k_v we have E₁ = w_c E_c + w_s E_s +
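In Matlab, the smallest eigenpair of AᵀA can be obtained with eigs, and the degenerate constant directions of Statement 2.4 can be checked numerically. The sketch below assumes A has already been assembled and, for illustration only, uses the stacked ordering [u; v] of the variables; nv denotes the number of vertices.

    B = A' * A;
    [e1, lam1] = eigs(B, 1, 'smallestabs');         % eigenvector of the smallest eigenvalue
    x_const = [ones(nv, 1); zeros(nv, 1)];          % constant mapping u_ij = 1, v_ij = 0
    fprintf('lambda_1 = %g,  ||A * x_const|| = %g\n', lam1, norm(A * x_const));

Both printed values are (numerically) zero, which is exactly the degeneracy that motivates the perturbed energy of the next section.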
78. es and the edge detector would not detect them The bilateral filter only blurs pixels on the image that have similar neighbors because it considers the difference of colors between pixels this is appropriate to remove textures that do not have great disparity of colors and preserves the most important edges More formally for some pixel p the filtered result is BF Do Y Go P qEQ p a DGM Lp La where Gs is a gaussian filter centered on p with deviation os and G is a gaussian filter centered on Jp with deviation 0 kp is a normalizing factor the sum of the Gs Gs filter weights and Q is the spatial support of Gs 140 OpenCV has the following function cvSmooth const CvArr src CvArr dst int smoothtype CV GAUSSIAN int parami 3 int param2 0 double param3 0 double param4 0 The parameters mean the following e src dst Source and destiny images e smoothtype Set CV BILATERAL e parami param2 Size of spatial support Q it has to be square e param3 Color sigma 0 e param4 Space sigma 05 For our example the result of bilateral filtering is shown in figure B 14 Figure B 14 Bilateral filtering using os 15 and o 31 A good way of removing more texture of the image is to reapply the bilateral filter We show in figure B 15 the result of applying bilateral filter six times on the example image with the same parameters o 15 and o 31 For the method proposed i
79. est known projections Perspective Stereographic and Mercator and discuss their properties 1 3 1 Perspective Projection The result of a perspective projection is very well known because most of the pho tographs taken with simple cameras are captured by lenses that approximate linear 10 perspective since this projection has many desirable properties that we are going to discuss further The construction of this projection is quite simple the viewing sphere is projected onto a tangent plane we are going to use the plane x 1 through lines emanating form the center of the sphere as shown in figure 1 7 Figure 1 7 Perspective projection Z z Thus x y z S is mapped to 1 A 4 _ which consists in a DD E simple division by the x coordinate Observe that the mapping stretches to infinity when x 0 and is not defined when x 0 So we define the perspective only for points with jae a UE Since we want a mapping from the equirectangular domain to a plane we have to convert the formula above to latitude longitude coordinates given x y z S x gt 0 there is a A x 7 such that x y z cos A cos sin A cos sin and the perspective projection is cos A cos sin A cos sin gt 1 sin A cos d aie cos A cos cos A cos AT E cot O pee ES E ct O RSS Ne Hence the final formula for the perspective
80. esult This approach avoids this problem by allowing the user to specify the orientation of lines section 2 2 and by obtaining an energy that measures how much a line deviates from the assigned direction section 2 5 1 e Preserve shape of objects Object distortion is another very unpleasant effect that may appear in wide angle images Previous approaches sections 1 5 2 and 1 5 3 tried to fix this problem by locally correcting the projection of objects This approach forma lizes the concept of preservation of object shapes through the mathematical concept of conformality Section 2 4 is devoted to explain such concept and obtain an energy that measures how conformal a panoramic image is This energy is minimized together with other energies in section 2 8 e Vary scale and orientation smoothly Discontinuities of scale and orientation may be unpleasant For example when the approach in section 1 5 3 is applied to scenes that do not have some natural discontinuity the result is not good the unnatural discontinui ties can be noticed on the ceiling of the scene in figure 1 27 for example In section 2 6 we describe an energy that measures how smooth a panoramic image is This energy is also minimized among other ones e Avoid restrictions to some particular structure of scene The method should 30 produce good results in a variety of scenes and it should not be restricted to some special kinds of scene All previous approaches
81. ex for each quad on the sphere that is intersected by in the following way Let qi Vi Quads intersected by l except the first and the last Let r Ao r A1 be the endpoints of qi N l see figure 2 12 We define the virtual vertex P as r Ag 5r Ay sr Ao r 44 Il which is approximately the midpoint of qi N l We use the midpoints because they are P Dol FR Dole evenly spaced along l For the first and last quads that l intersects say Gstart and Geng the virtual vertices are the endpoints of themselves regardless of their positions see figure 2 13 Here line stands for marked curves on the equirectangular domain 45 dij r Ai4i541 r A r Aj 541 Figure 2 12 Intersection of q and and the virtual vertex P Ustart dend Figure 2 13 Virtual vertices for start and end quads Now for each virtual vertex we define an output virtual vertex in the following way let A r A B 5 A j 1 C r A D r Aitij4i We project these four points orthogonally on the plane tangent to the sphere passing through P and obtain A B C D The result of this projection is shown in figure 2 14 Figure 2 14 Projection of qi on TpS and corresponding vertices We then obtain the bilinear coefficients a b c d that express P as convex combination of A B C D P aA bB cC dD The details of this process are given in section 2 5 2 Inverting bilinear interpolation
82. feature from our implementation It allows the user to choose the field of view that is going to be projected the number of vertices for the discretization of the viewing sphere and the number of double iterations The buttons alpha 1 alpha 1 alpha 2 alpha 2 beta 1 beta 1 beta 2 and beta 2 allows the user to choose the FOV 01 05 x 81 8 C 120 I 7 7 x 25 5 that is going to be projected When the user clicks these buttons the image changes in order to show the FOV in the rectangle determined by the four green lines If the buttons iterations iterations or vertices vertices are clicked the user sees in the command line the number of iterations or number of vertices that she is specifying figure A 5 Window Equirectanguler bmage El vm amp conven either a Ay Serr Ss leo keer bcm diera por are Ble EM gem Jere Tape Help heno hPa PI pou corta le ere cos N jleo ks darboux dissertacao panoramas run sh images test 55 E lExecuting Window 1 nes J number of vertices 947 i number o 50678 tema Pee ees 214 6 Cd Figure A 5 The user sees the number of vertices she is specifying for example When the user clicks go to optimization Matlab is run to perform the optimizations Each time that an iteration is finished the user sees the energy value in the ter
83. finition of conformality we conclude that u is conformal gt Now suppose that u is conformal i e du v dup v2 O p v1 v2 Yvi v2 DS e Taking v Ay P and vz p leads to du 5 p du 5 p 16 du 2 p and du 2 p are orthogonal O du UN 0 p Taking v v2 260 leads to du Ep p o Thus feto 350 amy 3500 The two above considerations du 5 p and du 5 p are orthogonal and have e Taking v vw Y p we obtain O the same length in R allows only two possibilities du 3 da P Roo du 2 p or du Ep R 90 du 2 p 38 We call Be a p d s p E p or OX Od 0 D Ea 556 0 3 Dy P ag ag the differential north vector and o Ou Ou 1 Or Dy P ag P 1 k du o gt 9 A cos p a n Dy P ag DA 1 P the differential east vector The lemma tells us that u conformal is equivalent to h Rook or h R_ogk Both possibilities are shown in figure 2 7 du or p Ao p oF p Figure 2 7 We exclude the second possibility above We ask u to preserve the orientation of the orthonormal basis of the tangent plane i e we forbid h R_gok This choice avoids a mirroring effect that could appear on the final result With these new considerations we restate the lemma Theorem 2 1 Cauchy Riemann Equations Let p A m m x 3 4 u S R is a c
84. ght are too curved We deal with this problem in next section Figure 2 10 A panoramic image obtained by minimizing Observe that it is the stereographic projection which we already know to be conformal It will be more clear in section 2 8 3 why between all possible conformal mappings the stereographic one is the result for minimizing E using the method proposed in this section 2 5 Straight Lines In this section we approach another important aspect in panoramic images mentioned on section 2 1 the preservation of straight lines In contrast to other energies we do not model straightness of lines in a continuous manner and then discretize it We directly use the discretization of the sphere to formulate 44 straight line energies Here we use the information given by the user section 2 2 and constrain the curves marked by her to be lines in the final result Figure 2 11 shows another example of a marked equirectangular image Figure 2 11 An image produced by the interface presented in section 2 2 We consider three sets of lines L All marked lines Ly Lines with fixed orientation L L Lines with general orientation Thus is the set of green red and blue lines Ly is the set of red and blue lines and L L is the set of green lines Let L In general the vertices A of the discretization of the equirectangular domain do not belong to J We thus define a virtual vert
85. ground is stationary we use the orientations of lines obtained from this frame to use in all frames in the video This avoids the alternation between minimizing Fig and Ej which may be a computational time problem for videos We now introduce some notation The solution vector X will be composed by solution vectors for each frame with the notation of last chapters For example the first entries of X will correspond to entries of the solution of frame k 0 using the order of the previous chapters T X Vow Yooo Umno Vago Doo Yoo Umm Vinnt We also construct a vector Y which is the stereographic mapping y for each instant We start to determine a panoramic video by satisfying the first desirable property each frame should be a good panoramic image The sum of image energies for each 107 frame can be written in matrix form as 2 Uooo Ao Vind Uo Ao Vini We minimize this energy in the same way we did for images X 4 A el eY The solution as it is expected is always the same projection the only thing that changes is the texture that is being projected This solution leads to temporal incoherence for the objects in the scene In figures 4 10 and 4 11 we see that the man walking starts with an orientation and finishes with another which is undesirable since this behavior does not happen in the input video Figure 4 10 Sth frame for the video produced using only image energies The man has a
86. he lines in the scene are bent on the final result 1 3 3 Mercator Projection This projection was presented by the flemish cartographer and geographer Gerardus Mercator in 1569 with only cartography purposes in mind It is a cylindrical projection which means that the u coordinate varies linearly with the longitude A and it is conformal We are going to obtain its formula when we introduce the Cauchy Riemann equations on the next chapter in order to formalize what is a conformal mapping For the moment it is just a cylindrical projection that preserve the shape of the objects Its formula is the following 55 gt R 1 9 u v A log sec d tan M 7 7 x Observe that the mapping tends to infinity when q 5 Figures 1 12 and 2 8 show some results of this projection 14 Figure 1 12 360 degree longitude 150 degree latitude Figure 1 13 360 degree longitude 150 degree latitude The main advantages and disadvantages of the Mercator projection are Advantages e As in all cylindrical projections meridians ES constant are mapped to vertical lines e It is conformal e It handles wide longitude fields of view even 360 degree ones Disadvantages e Just as the stereographic projection most of the straight lines in the scene are bent on the final result 1 4 Modified Standard Projections In the last section we showed the three most known projections from the viewing s
87. here is no transition function pe y Since all the scene is stationary on the temporal viewing sphere the transition function for the scene is Pato SUNS xd gt stdn sta x to A Q t EE A Q to Again imposing the preservation of the differential north vector we obtain the scene temporal coherence equations for this case OU OU ag ag ag A Po ta T VA Ost ESPIAS x t1 Vti t2 I0 tol 115 4 7 2 A solution A naive solution for case 2 would be for each t in the discretization of the tempo ral viewing sphere to obtain the optimal panoramic image for projecting ate as pe ps x tg As the FOV moves marked lines in the scene come in and out the X FOV what leads to different constrains from one time to another and leads to problems of size and orientation in the scene An example of temporally incoherent video constructed in this way is available in 22 Another solution which considers temporal coherence would be to discretize the scene temporal coherence equations obtain an energy term and compute a panoramic video by optimizing an energy that joins panoramic image energies and this new term just as we did for case 1 Due to time constrains we have not implemented this solution and we leave it to future work We propose a simpler solution for case 1 We obtain an optimal panoramic image for a FOV that contains all FOVs a as x gl BEH We take l S lar aa
88. hm eo a ee Sat we Boe ES e 29 2 NC CCN Ce se te cons ew es a es ee a Ae Eek Be A Bike eee 31 2 3 Discretization of the Viewing Sphere 004 35 Ze GOMAS se aas a ds Bs DE AA RR Sd VE ee o 36 2 4 1 Differential Geometry and the Cauchy Riemann Equations 36 222 DMD rara A DP EHO Oo de x 40 DAS Ennery Tenni mma dd Ag E Ama E SEU GOK go pn a no ES 42 220 SO AMES e srs TEL cw ds no he Bes ee A Ee DELE A EE E A ee E 44 Dj MEROS CRIS 4 1 a SR E Boe A SR ee e E ER E 2 9 2 Inverting Bilinear Interpolation 0 20 OMOGLNNESS y sd saga og S re e ook ee ee oo he eee A 20d Eney lema ass maca Pg See a Yo eh ES ae ee 2 7 Spatially Varying Weighting 0 0 0000 eee DO A ai o IEEE o ota AA 2 8 2 Eigenvalue Eigenvector Method 28 9 ear oysten Method sa gm acute ee we eee ah ee Boke See 2 8 4 Results in each iteration 1 Results dd BO a tok eras aaa A OS eG ee ma Dol NC aos A RD he asa ll a me So eS a ty tee ot ty So Boma oe ee ee Re ee RS SS O SS e Sd RUN sro ace Mee Oe E oe eS E EE OAR AG AYE A G do MOSUL 5 bam oran 828244 24834540 EE we SRS Or Failure CASES do god Sep be E a oe EE ee ee SS Sar Result DISCUSSION ums awa SAS Rae DSR eA a da Ee Panoramic Videos dul DUNE do ao tes DIC Be oer hh Sey oh Boe A GE A ee A Ay AMC TES CASES a de a4 ae bab ak bo rr e ERAS 4 3 Desirable Properties e 4 4 44 gs F544 6 4864 844445 4 4 A 4 4 The Te
89. hown in figure 1 24 In the next windows the user specifies which of the perspectives is the best view for each object and the program tries to solve the visibility problem for the master camera An output of the program is the right image in figure 1 25 where I rendered myself with a local camera perspective projection centered on me to correct my distortions on the left image The reader may check the home page with more results and details of this project 12 24 Figure 1 24 First window of the interface of the S3D project Figure 1 25 Both 90 degree longitude 90 degree latitude Left Standard perspective projection Right Me black t shirt corrected with a different perspective 1 5 3 Squaring the Circle in Panoramas The key idea of this article by Zelnik Manor Peters and Perona 7 is the one of constructing a projection that depends more on the structure of the entire scene not only where the objects are like the previous approach we just presented They start by discussing global projections the ones we have already discussed under the name of standard projections and suggest a multi plane projection that is construc ted in the following way multiple tangent planes are positioned around the sphere and each region of the viewing sphere is projected via perspective projection onto its corres ponding tangent plane This construction is illustrated in figure 1 26 29 Figure 1 26 Top view of the mult
90. input and output image is such that the output image has no holes e Saving image If the result is stored in the matrix variable result we use the command imwrite result result ppm PPM to save it as result ppm 127 Appendix B Feature Detection in Equirectangular Images In this appendix we explain the methods we have used to detect faces and lines in equirectangular images For these both detections we adapt standard Computer Vision techniques for the equirectangular image context Such techniques will also be detailed In section B 1 we expose in details the method we used to detect faces which is used to define the face detection weights we mentioned in section 2 7 We apply the standard face detector 16 by Viola et al on the Mercator projection section 1 3 3 of the equirectangular image Implementation details of this process are also provided Section B 2 shows the method we propose to semiautomatically detect lines on equi rectangular images This method will try to replace the task of the user of marking lines which is performed in window 1 of our application section A 1 We apply the Hough transform on perspective projections section 1 3 1 of the equirectangular image to de tect lines It turns out that the results obtained by doing only that are not satisfactory and preprocessing steps will be necessary We also show implementation details for this method and a final panoramic image produced when
91. iplane projection To unfold this projection on a plane without distortions one could think of the inter sections of the planes being fitted with hinges that allow flattening The advantages of this new projection are that it preserves the straight lines that are mapped entirely in one single plane and it uses perspective projections only for limited fields of view which avoids distortions caused by the global perspective projection This process causes discontinuities of orientation along the seams intersections bet ween tangent planes This problem can be well hidden for scenes that naturally have such discontinuities like man made environments Then the tangent planes must be chosen in a way that fits the geometry of the scene usually so that vertical edges of a room project onto the seams and each projection plane corresponds to a single wall A result of the multi plane projection is shown in figure 1 27 Figure 1 27 180 degree longitude 90 degree longitude Source image Posters by Flickr user Simon S taken from 10 Under this projection the objects on the scene still may appear distorted on the final result for two reasons if an object falls on a seam it will have a discontinuity of orientation on it which is very unnatural or even for reduced FOVs the perspective projection still 26 can distort objects The solution adopted for these two kinds of distortion is very similar to the one used on 6 the
92. its histogram We show below the code that performs this 2 steps cvCvtColor mercator_image gray CV_BGR2GRAY cvResize gray img2 CV INTER LINEAR cvEqualizeHist img2 img2 The processed image is saved as img2 Step 3 Detect faces on the Mercator image This step is performed using OpenCV s function cuHaarDetectObjects CvSeq cvHaarDetectObjects const CvArr image CvHaarClassifierCascade cascade CvMemStorage storage double scale_factor int min neighbors int lags CvSize min size This function implements a variation of the method explained in section B 1 1 It includes diagonal features to the rectangular ones but the rest of the process is identical to what we explained e image stands for the input image the Mercator image in our case e cascade stands for a cascade classifier OpenCV makes available classifiers for front face detection The command CvHaarClassifierCascade cascade CvHaarClassifierCascade cvLoad C Program Files OpenCV data haarcascades haarcascade_frontalface_alt xml NULL NULL NULL loads such a classifier for example e storage is a memory space where the detected faces will be stored e scale factor is the jump between resolutions where the detector will look for faces For 134 example we used scale factor 1 05 which means that each scale is 5 larger than the previous scale e min neighbors stands for the number of neighbor wi
93. k NY USA pp 339 347 ACM 2001 19 M Uyttendaele A Criminisi S B Kang S Winder R Szeliski and R Hartley Image based interactive exploration of real world environments IEEE Comput Graph Appl vol 24 no 3 pp 52 63 2004 20 A M d Matos Visualizacao de panoramas virtuais Master s thesis PUC Rio 1998 21 A Agarwala K C Zheng C Pal M Agrawala M Cohen B Curless D Salesin and R Szeliski Panoramic video textures in SIGGRAPH 05 ACM SIGGRAPH 2005 Papers New York NY USA pp 821 827 ACM 2005 22 L K Sacht Content based projections for panoramic images and videos URL http w3 impa br leo ks msc thesis 23 L K Sacht Face detection URL http w3 impa br leo ks cv2009 face detec tion 24 D G R Bradski and A Kaehler Learning opencv 1st edition O Reilly Media Inc 2008 25 F Szenberg Acompanhamento de Cenas com Calibracao Automatica de Cameras PhD thesis PUC Rio 2001 156
94. l method 4 1 Overview We start the study of the panoramic video problem by separating it in three cases This separation is done in section 4 2 and considers if viewpoint field of view and objects are stationary or moving through time In section 4 3 we discuss perceptual properties that we believe to be the most important in wide angle videos Our discussion is not based on any perceptual study it takes into account only intuitive ideas 97 Sections 4 4 and 4 5 are devoted to model the general case through the mathematical definition of temporal viewing sphere panoramic videos and transition functions In section 4 6 we suggest a solution for the first case of panoramic videos when the viewpoint and the FOV are stationary and there are moving objects in the scene Some first results are shown We finish the chapter by briefly discussing solutions for the other cases and making some concluding remarks 4 2 The three cases We separate the panoramic video in three categories according to the movement of the viewpoint VP the field of view and the objects in the scene The general case is a combination of theses three cases This separation was done because we think it is easier to solve first simpler cases in order to get the general solution We list below the three Cases e Case 1 Stationary VP stationary FOV and moving objects In this case the viewpoint and the field of view the rectangle 01 02 x 81 05
95. lowing formt 3 3 ar Bl dup w dy Ov Wide p 28 p Y Dy P Ag p O O where w 2 p 2 p is a vector in T S written in the basis EN o r p are orthogonal but they are not unitary since t s clear that ca and a6 OX cos d cos al we define the following orthonormal basis for T S or Or 0 zP z p iii A 5 p 9 and i or 1 Or ay E cos d By Oe E ai 55 p o dr Lemma 2 1 u S R is conformal if and only if du Rigo du Dy 2 E T T X gt 5 here Roy i R_99 90 y where 2 are p ye 7 90 1 0 2 90 1 0 90 degree rotations Details in 15 pages 84 and 85 37 Proof lt We want to prove that u is conformal Let p 7 7 x gt 5 and Or Or D Ba p It s clear Or or v 02 E T S Suppose v 01 p Aa P and va o Op that U1 U2 Q1Q2 6103 oe Or since TO and 5 P are orthonormal Now consider ae dty v1 dtty v2 ordi 3500 Gida FO cada Gto Badu 50 do EO 01t E sda 8 3 a2 du Ep dui E0 fio atu 30 Qas Applying the hypothesis we have Le ni du E p du 2 p du E 0 du 2 p 0 and e 35 0 Jas 30 and we obtain 2 m 2 du v1 dup v2 du 2 p aiaz 6102 du 5 p 01 V2 Taking O p du 5 p in the de
96. m Us to the line connecting ul to ul as the norm of the difference between ut ul and its start end 19 start orthogonal projection on the line connecting ul UL y u Wiad s u E pai Wend Wend Mar where gt T u Ustart a Uai s U Ustarto Ucnd uu end start E can be rewritten in the following form El u o ee ao o Und Una Ua I i qij Vi Aq It turns out that this way of expressing is also a sum of nonlinear squares We simplify the two expressions for E by fixing the normal vector n on the first expression which leads to 2 Elo e y Cu al Ustart n qij EVI and by fixing the projection s on the second expression which leads to Eia y u Ustart Sij tena Ustart I qijEVI Both these energies are now sums of linear squares They have a geometric interpretation Fio allows points to slide freely along the line while fixing line s orientation defined by the normal vector and Eja allows the line to change its direction while fixing the relative positions of the points defined by the s s Ifl Ly i e l has a specified orientation the vector n is known and fixed so we just minimize Er in this case If l L Ly we use both Ei and Ea in an interactive minimization The steps are described below e Initialize each s using the arc length between r A and r Astart on the viewing sphere e Minimi
97. make property 4 in section 4 3 is that the object changes size and orientation according to such changes in the temporal viewing sphere Consider a ball of radius e gt 0 around A1 1 t in S x t and assume with no loss of generality that B 1 1 t1 is projected on a ball with radius gt 0 in U TS U B A 1 t1 B U A di t1 6 We illustrate what was just said in figure 4 6 Figure 4 6 For our analysis we suppose a ball in the domain is projected on a ball on the final panoramic video Suppose for example that Y4 scales B A1 1 t1 twice and changes its position 104 from A di t1 to As Po to We illustrate this transformation in both tangent planes Toa TS and T a 02 12 TS in figure 4 7 LO OS AS Pta ta Ar ta y gt Figure 4 7 The action of the transition function for this example it doubles the ball and changes its position Observe that yr only transforms textures from one viewing sphere to another it does not change the geometry of any of the spheres In order to transport the scaling for the final panoramic video we have to preserve the tangent vectors H and K and the change of texture between the equirectangular domains will cause the expected result Figure 4 8 shows what we just explained Pt sta t to Figure 4 8 The ball in the panoramic video will be doubled from one time to another if A v4 d
98. mentation time of detection and comparisons with previous methods We focus only on explaining the main ideas of the method For details we recommend the user to read 16 The method is based in three key ingredients which are the three main contributions of 16 e Integral image use an integral graph for fast feature evaluation e Feature selection select important features using a learning method e Focus of attention focus on potential sub windows of the image These sub windows are selected by a cascade of classifiers We assume that the detection is going to be performed in a gray scale image with equalized histogram The features that are going to be used to detect faces are the rectangular features which are illustrated in figure B 1 Figure B 1 The sum of pixels that le within white rectangles are subtracted from the sum of pixels in the gray rectangles Observe that the set of rectangular features is over complete for the base resolution The detector is constructed assuming such base resolution The detection is performed by scaling 129 of the detector that is 24 x 24 pixels there are 160 000 different rectangular features The main advantage of the rectangular features is that they are easy to evaluate using the integral image model The integral image at pixel x y is given by ts DX way a lt a y Ky where 2 is the original image The value of the integral image is illustrated in figure B
99. miliarized to programming in C C and Matlab languages for a complete understanding of this section A 1 Application and user s manual We developed a software application that implements all the theory of chapters 2 and 3 but isolates it from the user i e even people who do not know the theoretical details of the method can use the application We made a video of our interface working which can be found in the home page of this thesis 22 For a better understanding of what will be explained here we recommend the reader to download such video and watch it together with the explanation The reader that is interested on having our application can contact us through the e mail that is also in 22 The application will be sent as a compressed folder containing the binary files corresponding to compiled programs in C and Matlab programs m files To run our program the system must have installed Matlab and OpenCV and FLTK libraries 118 The input is an equirectangular image in the PPM format It is recommended that it has a good resolution 4000 by 2000 pixels for example It will be more clear further why we make this requirement The input image for our example is shown in figure A 1 gt E e Ma ack l Q Rd bE R h a pur pu ue jj P iat IN ll iz Bde sd Wiens oe dvs 7 y e sii aii F ei LA His i i KEPE AO h Figure A 1 Source image Mus e du Louvre by Flickr user Gadl On the terminal
100. minal as shown in figure A 6 leo ks darboux dissertacao panorama File Edit View Terminal Tabs Help leo ks darboux 3 leo ks darboux disser leo ks darboux src fa leo ks darboux disser vertices 58366 energy system 2 7527e 05 energy system 1 6629e 05 Figure A 6 The energy value is displayed at the end of each iteration After the optimization the result is saved as result ppm in the folder of the applica tion panorama in our case The result for the example in this section is shown in figure A T 121 resbtpem One pe Edi jper rage Go Help Gmin e AALA DE MT EA d Ds de A AA JAIR io des Pat A a Em Fo 1 a iz 10 Li poe 54 M8 100 ara Figure A 7 The final result obtained for this example Field of view 220 degree longi tude 90 degree latitude A 2 Implementation Details This section is devoted to explain how we developed the application presented in the last section We try not to get too deep into all details but show the most essential ones for the understanding of our implementation Figure A 8 shows the structure of our application software a Sve Lina Matlab Processing Output Image Figure A 8 Structure of our application software We developed a shell script to perform all these steps in an integrated way When the user runs the script it seems that is only one program running but actually all the steps
101. mpo de vis o N s in troduzimos um modelo matem tico para o problema discutimos propriedades de coer ncia temporal desej veis formulamos equa es que representam estas propriedades propomos uma solu o de otimiza o para um caso particular e apontamos dire es futuras Palavras chave Esfera vis vel imagens panor micas v deos panor micos Abstract Common cameras usually capture a very narrow field of view FOV around ninety degrees The reason for this fact is that when the field of view becomes wider the projection that these cameras use starts introducing unnatural and nontrivial distortions This thesis studies these distortions in order to obtain panoramic images i e images of wide fields of view After modeling the FOV as a unit sphere the problem becomes finding a projection from a subset of the unit sphere to the image plane with desirable properties We provide an in depth discussion of Carroll et al 1 where preservation of straight lines and object shapes are stated as the main desirable properties and an optimization solution is proposed Next we show panoramic images obtained by this method and conclude that it works well in a variety of scenes This thesis also provides a novel study about panoramic videos 1 e videos where each frame is constructed from a wide FOV We introduce a mathematical model for this pro blem discuss desirable temporal coherence properties formulate equations
102. mporal Viewing Sphere and Problem Statement do EransitionPINCHONS sose E cs Eo ee eee Ee OR AAA E 4 6 Case 1 Stationary VP Stationary FOV and Moving Objects 4 6 1 Temporal Coherence Equations 04 4 6 2 Discretization of the temporal viewing sphere 4 6 3 Total energy minimization and results 4 6 4 Implementation Details as gu poa Bodo a OE A OE AO OUNCE Solutions x E a SGA SAE ELSA oe amp See eo a 4 7 Case 2 Stationary VP Moving FOV and Stationary Objects 4 7 1 Temporal Coherence Equations 04 AZ sASSOMIGION id dz E pack Bs EG EE RMR ED BS E 4 8 Concluding Remarks ss wt Be Se we Re ey oe SS ee E ee ee E Application Software A 1 Application and user s manual 0 0 0000020084 A 2 Implementation Details 0 2 0 0 0 0000000084 Posh Nildo spears She Bt a ee oS ee A 2 2 A 2 3 WN IODO Ds sacas pr os a ee ae a ia Se Es eee MablaD Process 2 ad d as ba 48 e Eh we UR B Feature Detection in Equirectangular Images B 1 Automatic Face Detection in Equirectangular IMAGES Ss do a Stree So a dE Oe es ee Ge e EE B 1 1 Robust Real time Face Detection 0 B 1 2 Method and Implementation 0 0 BPA REU LOS ro te dene ot ache yx A Diss A Mies A he SG de oe di Cr PS Dele Went Field 2 4 sea e434 heh ee ew eG Se ae de B 2 Semiautomatic Line Detection in Equirectangular Images B
103. n a scene that could not be seen under a limited FOV The study of this topic became possible only recently with the development of stitching software and equipment 2 3 and 4 With these techniques it is possible to create an image of the entire viewing sphere centered at the viewpoint an image that contains the visual information that is seen from this viewpoint in all possible directions In figure 1 one can see an example of such image which we call equirectangular image Once we have an image that represents the viewing sphere what is left to be done is to find a projection from the sphere to the plane that results in a perceptually good result Some previous works as 5 6 7 4 and 8 considered this problem The main difficulty that arises is to satisfy two important perceptual properties preservation of shapes i e objects in the scene should not appear too stretched in the final panoramic image and preservation of straight lines i e straight lines the scene should be mapped to straight lines in the final panoramic image The paper studied in depth in this thesis named Optimizing content preserving projec tions for wide angle images 1 addresses these two properties by formulating energies that measure how a projection distorts shapes and bends lines The user marks in an interface the lines she wants to be preserved and the method detect regions where the projection should preserve more shapes as face regio
104. n further sections one may want to use perspective projection but preserve shapes of other objects that are not near A 0 0 That leads to constructing perspective projections centered on Ao o 4 0 0 The geometrical construction is analogous to the one presented on section 1 3 1 Let Lo Yo 20 cos Ag cos o sin Ag cos p sin o x y z cos A cos d sin A cos sin A S The tangent plane to S passin x y z cos A cos g g through Zo Yo Zo has the following equation o x Xo Yoly Yo 20 z 20 0 TI LoL Yoy 20 1 since o Yo 20 L We illustrate the projection in figure 1 17 cox yoy 202 1 v H e a sy ToT yoy 202 Og Figure 1 17 Perspective projection centered on Ap dp x 2 We want to project x y z radially in i e we want to find a s t E a a which is equivalent to T y Z To 20 yo yo 20 20 U a a a Solving the above equation leads to a 107 yoy 202 cos p cos 4 cos Ap A sin do sin 18 So the projection x y z e has the following form in A coordinates aaa cos A cos Ase gt Erro cos o e sin do sin d sin A cos 4 sin A cos do cos cos Ap A sin do sin TRA TAO Observe that the projection is not well defined for x y z E S s t vox yoy 202 0 the plane that is parallel to
105. n section B 2 4 the number of applications of the filter is always six The parameters o and o are passed on the command line B 2 3 Eigenvalue Processing After bilateral filtering Canny edge detector is applied to the image In some cases undesirable textures still remain see figure B 16 We applied a method suggested in 25 It consists in spliting the binary images in windows of the same size and perform a local analysis of the position of the points in each window 141 Figure B 15 Result applying six times the bilateral filter Notice how almost all the texture is gone and the important lines still remain Figure B 16 Result of the edge detector applied after bilateral filtering Some textures still remain and blurring more the image would blur too much the important lines If the points lie in the same direction they possibly must belong to a line Otherwise if the points are too spread in the cell they must be discarded because they do not belong to a line For each window a covariance matrix is constructed 142 where gt ui u u v b E gt vi SU gt u vi u y m where n is the number of white points in the window and u and v are the cartesian coordinates of each white point The eigenvalues of A are NA _ AZ a c 7 c er a E c It is clear that A gt As and both are positive because A is symmetric If the ratio Ay between A and
106. n the region of the eyes and a region across the upper cheeks The feature capitalizes on the observation that the eye region is often darker than the cheeks The second feature compares the intensities in the eye regions to the intensity across the bridge of the nose It turns out that a 200 feature strong classifier is fast the results are good but not enough for many real tasks Adding more features increases too much the computational time The solution for this problem is to develop a attentional cascade classifier that we explain briefly below 132 e Simpler strong classifiers are used in the first stages of the cascade to reject non face windows leaving just a few windows to be evaluated by the more complicated strong classifiers in next stages e The illustration of this process is shown in figure B 6 All Sub windows Further Processing Reject Sub window Figure B 6 The windows that are rejected by the simpler classifiers are no longer evalua ted leaving to the more complicated classifier a reduced number of windows to evaluate e The training process to obtain a good cascade classifier is based on detection and performance goals We do not explain this process here for more details see 16 The most computationally expensive part of this method is the training process But once training is completed the detector is ready to be applied to any image and the detection time is very fast The discussion
107. n this section we discuss other possible solutions for case 1 One other solution which is actually an extension to the one we proposed is to in volve more precise object extraction and transition functions Since the background is stationary background subtraction could be used to this end Also the way we modeled object constrains introduce many constrain discontinuities in the equirectangular video A way of making such constrains smoother becomes necessary Other possible solution is to solve separately the projection for the background and foreground and combine them after This separation would lead to two simpler problems The final combination could be achieved by combining the projections or composting the resulting images Some problems like incoherence between background and foreground could appear Also interactions between background and foreground would be difficult to model We intend to implement these different solution soon and compare them to the one we proposed in this thesis 4 7 Case 2 Stationary VP Moving FOV and Sta tionary Objects We propose in this section a solution for case 2 For each time t 0 to we project the set a a x BO B x t SU x t i e the FOV that will be projected depends on time This case may be seen as a viewer of panoramas where each view is a panoramic image 4 7 1 Temporal Coherence Equations For case 2 we do not have moving objects in the scene Thus t
108. nce was already reached Energy value E 0 0609 in energies E and Fe This choice can become a problem if AA Ad thus we decided to use such terms to have more general discretized partial derivatives Other explanation for this difference of weights is that the authors of 1 do not pro vide implementation details of their system hus slight differences with respect to our implementation may exist But the most important thing about the weights we chose is that all results in this thesis were generated with a fixed set of weights we 0 4 ws 0 05 and w 1000 72 Chapter 3 Results This chapter is devoted to show and explain many results produced by the method described in this chapter and in the previous one All results were produced using the application software explained in Appendix A We first show four examples for which we consider the method was successful We compare each one to the three standard projections presented in section 1 3 and to the perspereographic and recti perspective projections presented in section 1 4 For both these projections we chose parameters such that the result was the best possible We give the following information about each example e Field of View The field of view S C S chosen by the user to be projected It is always a rectangular subset of the equirectangular domain e Number of vertices The number of vertices used to discretize S C S according to the discretization explai
109. ndows that has to be detected to a window be reported as a face This parameter uses the observation that usually when a window is detected as a face some other windows near it will be reported as a face too Thus the parameter prevents that false face results happen We used min_neighbors 4 e min size is the minimum size of window where the detector looks for faces We used 1 x 1 pixel window as minimum Step 4 Loop on detected faces For each detected face e Map the coordinates of its center and corners in the Mercator image back to the equirectangular domain e Draw a rectangle around the face on the equirectangular domain e Write in a text file the file face matrix trt mentioned in section A 2 the A amp coordi radius 3 nates of the center of the face and o where radius is the radius of the face on the Mercator domain Example and details about this file we give in section B 1 4 We made the source code of the method available in 23 B 1 3 Results We show in this section some result images figures B 7 B 8 and B 9 of the method described in last section Figure B 7 4 faces correctly detected 1 missing and 1 false detection Source image Sziget 2008 Solving a maze by Flickr user Aldo 139 Figure B 8 7 faces correctly detected 1 missing and 2 false detections Source image eppoi3 by Flickr user popx The detection in figure B 9 was already used in a result shown in
110. ned in section 2 3 e Number of double iterations Number of double iterations necessary to reach visual convergence as explained in section 2 8 4 e Final energy Energy obtained after the final iteration section 2 8 4 e Time to construct the matrices Time necessary to construct the matrices C and S sections 2 4 and 2 6 in a Intel Core 2 Quad Q8400 2 66 Ghz using Matlab Version 7 6 0 324 R2008a All times in this section were obtained in the same computer e Time to perform the optimizations Time required to perform all the iterations detailed in section 2 8 4 e Time to generate the final result After a final solution x is obtained we turn it into an image using bilinear interpolation Details are given in section A 2 In addition for each example we make specific comments emphasizing what properties the example illustrates We also show an example comparing the result of this method with the result produced by 7 method that was explained in section 1 5 3 To finish this chapter we show some results produced by the method that are not as desired For each example we explain why the result showed an unpleasant behavior We also provide a qualitative discussion about the results 13 3 1 Result 1 e Source image Britomart in 360 by Flickr user Craigsydnz e Field of view 220 degree longitude 140 degree latitude e Number of vertices 58 479 vertices e Number of double iterations 4 e Final energy E 9
111. nitial guess for lines which may help the user avoiding her to have to mark all the lines One final remark is that the interface showed here is simpler than the one implemented in 1 Our input is an equirectangular image that represents the entire viewing sphere In 1 the input may have arbitrary FOVs and other formats not only equirectangular Despite simpler our interface serves our purposes well 2 3 Discretization of the Viewing Sphere For the rest of this chapter we assume S S i e the field of view that will be projected is the entire viewing sphere All the development is analogous if restricted to some narrower FOV since such FOV corresponds to a rectangle on the equirectangular domain In section 1 2 we stated the panoramic image problem as the one of finding u S2 gt JR M gt u A p v A gp where A are in the equirectangular domain and u v are cartesian coordinates that represent position on the image plane Instead of finding a function defined in all equirectangular domain we discretize it in a uniform manner and look for the values of u at the vertices More precisely the vertices of the discretization of the domain are where 25 T T MA hae 0 a Di 0 Oe n 2 m and the corresponding values of u at A are U u A Pij renee 9 0 sis ds e 0 MN Figure 2 6 illustrates what was just explained The image of each rectangle by the function r on the sphere is called quad In
112. ns for example Based on this in Figure 1 Example of equirectangular image formation the method formulates the energies and the minimizer of a weighted sum of these energies is the panoramic image that most satisfy these properties An extra term for modeling the smoothness of the projection is necessary to avoid mappings that vary too much to satisfy the constrains An example of panoramic image produced by this method is shown in figure 2 Figure 2 Example of result produced by the method deeply discussed in this thesis This thesis is also concerned about the problem of obtaining perceptually acceptable panoramic videos This theme has the motivations we already mentioned for panoramic images but also it has more interesting practical applications The development of ideas in this field could lead to new ways of filming which could be applicable for cinema and sport broadcasting for example Recently capture devices that film a wide field of view were invented An example of it can be found in 9 These cameras return a video where each frame is an equirectangular image for the respective time Again what is left to be done is to project this set of viewing spheres which we call temporal viewing sphere to a set of images Very little work has been done on this subject The strategy adopted in this thesis is to adapt the theory studied for images and include new desirable properties that model temporal coheren
113. o alternate between minimizing each one at a time Such alternation is detailed in this section where we also show the intermediate results produced by each iteration This process is necessary to to straighten the lines marked by the user with no specified orientation the green ones in figure 2 32 Figure 2 32 Alternating between minimizing E and Eg straighten the general orientation lines The initial iteration consists in minimizing 2 2 2 Y 2 Y 2 Ed we Eo SP wo E P wi Elo P w Eid Agx leL y lEL Ly where A is defined in section 2 8 1 As explained in section 2 5 1 in this initial iteration we define the coefficients s which are necessary to define Eja as the arc length between r A and r Astar in the viewing sphere Since this choice may be imprecise in terms of expected final result we set a lower value for w in this iteration w 10 The other weights are w 0 05 and We 0 4 The minimization process explained in last section produces x R that contains the positions u vij where each vertex A of the viewing sphere is mapped to In other words Gas Uir u A 0 1 iio M q 0 1 adds 68 Since E is multiplicative E Kx KE x VK gt 0 what matters is the direction of x and not its norm We chose each iteration to return a unitary vector Thus we return X x Ra this vector x we produce a function defined on the entire sphere or on
114. o be the master camera which is going to be used to render the background and the other ones are the local cameras which are going to be used to render the objects in the scene The visibility is not a well defined problem in this context They use the visibility 23 m cal A a Je a nt Y Taz fref 5 E l x L Figure 1 23 Raphael s School of Athens ordering of the master camera to solve this problem a point will be rendered if it is visible for the master camera The key idea that we have to take from this paper is that a special treatment can be given to the objects in order to reduce their distortion m panoramic images But the multiprojection rendering has also other applications e Artistic Expression The usage of different viewpoints was used by painters also to express feelings ideas and mood e Best Views A good viewpoint for an object may not be the best viewpoint for other objects By choosing the best viewpoint for each object in the scene it s possible to improve the representation of the scene The author of this thesis in a final course project adapted the techniques described in this article for real world scenes in the following way a set of equirectangular images views is given to the user so he can choose a different perspective camera for each view by setting the FOV and the center point of each perspective A screenshot of the first window of the user interface is s
115. oaches in order to understand what pro perties are desirable in wide angle images sections 1 3 1 4 and 1 5 1 1 The Viewing Sphere In this work any scene observed from a fixed viewpoint at a given moment will be modeled as the unit sphere centered at the viewpoint S z y z ER x y 4 27 1H on which each point has an associated color the color that is seen when one looks toward this point Here we assume that the viewpoint is the origin of R for convenience This sphere we will call the wewing sphere Notice that the viewing sphere represents the whole 360 degree longitude by 180 degree latitude field of view Figure 1 1 shows an example of viewing sphere Figure 1 1 A viewing sphere looked from outside that represents the visible information of some scene A very known and useful representation of S is the one by longitude and latitude coordinates r mm x 2 2 gt amp 4 6 cos A cos sin A cos sin 9 This representation is illustrated in figure 1 2 Figure 1 2 Longitude latitude representation r r gives us a way of representing all the information of a scene from a single viewpoint as the longitude latitude rectangle 7 7 x 5 El which we will call the equirectangular domain Also known as yaw and pitch values or pan and tilt values Recent development of stitching techniques made it possible to take many pictures of a scene and stitch them
116. oarse grid only 1 369 vertices for the viewing sphere because a finer one would turn the minimization of E eigenvalue eigenvector method impractical 4 A E A r N a rr Figure 2 31 Left final result minimizing in each iteration energy E with the eigenva lue eigenvector method Right final result minimizing in each iteration energy E with the linear system method e 107 While the eigenvalue eigenvector method took some minutes long to perform all ite rations and produce a final result the linear system method took just some seconds For all reasons discussed in this section from now on we only use the linear system method to produce the results i e the function to be minimized in each iteration will be E Now we can understand why minimizing only E in section 2 4 3 led to the stereogra phic projection to do such minimization we set w w 0 we 0 in Ea 2 2 2 2 2 Ea w E w Es w X Ewo w Eau in this case 67 Among all possible mappings that vanish E the conformal ones the linear system method chose the stereographic one because it also minimizes the term eljx yl i e the stereographic projection minimizes E x Eq x ellx yl w E x ellx yl 2 8 4 Results in each iteration In the last section we showed how to minimize and Eg separately But as men tioned in section 2 8 1 in order to produce a final result we have t
117. od Figure 2 1 A result produced by the method described in this chapter FOV 285 degree longitude 170 degree latitude Observe how most of the lines in the scene are straight and the shape of the objects is well preserved Motivated by chapter 1 we start this chapter by making a list of desirable properties in panoramic images and how this approach satisfies most of them Then we detail each section of the article in a mathematical way all the necessary definitions and theorems are going to be stated The pre requisites for understanding the 28 theory are going to be mentioned progressively For the moment we assume the reader is familiar with multi variable calculus and linear algebra The only parts of the article that are not explored in this chapter are the results and implementation The first topic is left to Chapter 3 where a discussion about it is provided We leave to Appendix A the details about how we implemented the method we are going to describe here To summarize the main goals of this chapter are e To list the main desirable properties in wide angle images section 2 1 e To make an as complete as possible mathematical explanation for the techniques in 1 all the other sections of this chapter 2 1 Desirable Properties This section is devoted to argument why we consider 1 the best method for dealing with panoramic images and why a study in depth about it is worthy We believe a method to produ
118. odel the problem as the one of finding a projection and we propose an optimization solution for a particular case These contributions are all found in chapter 4 e An in depth conclusive analysis of 1 As we already mentioned 1 omits de tails of their method This thesis makes a complete mathematical analysis of their work and also a conclusive analysis based on the results produced by their method This ana lysis is in chapters 2 and 3 Our work has other contributions of less impact but also important in the context of this thesis e Line detection on equirectangular images We propose a method to semiauto matically detect straight lines of the world in equirectangular images This detection was pointed as future work in 1 and helps the user in the task of marking lines This contribution can be found in appendix B e Application software We propose in this work an application software that has some features that the interface proposed in 1 does not have such as specification of FOV vertices and number of iteration This application is explained in appendix A e Perspereographic Projections We developed a set of projections that interpolates conformality and preservation of straight lines in a very intuitive way It has the same purpose of the projection presented in 5 but is obtained in a much easier way These projections are in section 1 4 1 Structure of the thesis a time line The thesis is structur
119. of this file is matlab_data tzt We show below an example of such file 1 9199 1 9199 0 5236 1 0472 9 3 A 2 3 Matlab processing This module consists in four submodules as illustrated in figure A 9 Load data from Matrices a Windows 1 2 and gt eer gt Optimization ss gt Save image Construction Face Detection Genereation of the final result Figure A 9 The steps performed in Matlab Next we explain each of the submodules briefly e Loading data Files L endpoints txt L2_endpoints tat matlab data txt and face matriz txt are loaded using command load from Matlab Each text archive is transformed in a matrix with the same structure of the text file For example for matlab_data tzt like below 1 9199 1 9199 0 5236 1 0472 9 3 the command returns the following matrix matlab_data 1 9199 1 9199 0 5236 1 0472 9 3 e Matrices construction This step constructs the conformality and smoothness ma trices C and S sections 2 4 and 2 6 and the stereographic mapping y that will be used to minimize as explained in section 2 8 3 At this point it is important to construct the matrices having in mind their sparse structure and how a sparse matrix is stored in Matlab A naive way of constructing them leads this step to take longer than the optimization We fixed this problem but we do not discuss it here e Optimization This step starts by calculating th
120. olution of the optimization X and obtain x The results for the first second and third double iterations are shown in figures 2 34 2 39 and 2 36 respectively Figure 2 34 Left The minimizer of E Energy value E 0 0740 Right The minimizer of Eq Energy value Ea 0 0531 As can be seen minimizing E produces perceptually better intermediate results Thus we chose to use as final iteration the minimization of just E Figure 2 37 shows the final result for this example The number of double iterations needed to obtain convergence varies from case to case but usually three double iterations are enough The weights w w and w were empirically determined The main criterion used was to set parameters such that the results were very similar to the ones obtained in Carroll et al 1 We tried to use the same parameters presented in that article we 1 ws 12 w 1000 but they did not work well in our case One explanation for that is that the authors do not consider the terms AA and Ad used to discretize the partial derivatives 10 Figure 2 35 Left The minimizer of E Energy value E 0 0609 Right The minimizer of Eq Energy value Ey 0 0518 Figure 2 36 Left The minimizer of E Energy value E 0 0609 Right The minimizer of Eq Energy value Ey 0 0518 71 4 ih ddd Figure 2 37 Result for the final iteration Comparing with figure 2 36 left we see that the visual converge
121. omitted by convenience To obtain Eul LD x we associate 2 lines of LD to each quad qi Vi say q th and q 1 th lines and define its entries as 1 1 _ LD it i n lij LD 541 i n 1 bij 1 1 o ad ia a TSijCend L pCa ee De SC id LD 41 26 i n 1 1 Fag LU 4 n 1 1 Vij 1 1 LDA Gena iena DMA AA gt SijCend LD 41 2 jenat tenat1 mt1 1 o Sijdend The other entries of LD are zero If we do the above process for each line in LAL we obtain the matrices 9 g LDO LD LD We define the alternate fixed projection line matrix to be LD LDA LD LDA is also sparse since it has at most 12 nonzero entries per row It turns out that Eid y Eul lEL Ls which can be written as Eja LDAx This conclusion is obtained in the same manner we did before to obtain Ej LOx We now have both energies in a matrix form The minimization for all lines the set L is performed by first minimizing gt Epl gt Eul LOx LDAx leLy lEL L y and then plugging the obtained values into y Erol and minimizing ELAL y Elo I oF y Elo LE Ls lEL L f LOx LOAx The obtained values are plugged into y Eul and the process continues until conver lEL L gence is reached More details of such minimization are left to section 2 8 where such energies are minimized among with other ones
122. on for one of the cases This is the main original contribution of this thesis since almost no material on this topic is known on the literature Discussion At the end of this thesis we can make some qualitative concluding remarks that reflect the main ideas one can get from our work The first one is that the panoramic image problem is now very mature The need for obtaining images of wide fields of view finds its roots centuries ago in paintings and the limitation of perspective projection was already in the Renaissance But only with the development of technology and methods as for example stitching equipment and methods it was possible to bring this theme to a different level In recent years many publications on the theme were published and now it is clear what properties we expect in panoramic image Among all these references we decided to study and implement Carroll et al 1 We think we proved the applicability of the method by showing a variety of situations where it produces good results The explanation for the quality of the method is that it models precisely the undesirable distortions through the mathematical formalization of them Equations and energy terms were used to model the distortions At this point it is valuable to observe how a concrete theme such as producing pa noramic images and videos can amalgamate very different areas of pure and applied ma thematics In this work we used theories
123. onformal mapping that preserves the orientation of the orthonormal basis of T S if and only if Ou 1 Ov Ov 1 Ou iad P pal E 9 0 p cos ay Eram 9 0 p cos d ay P Proof u conformal mapping that preserves orientation h Rook amp 1 Ou Ou o 1 Ov 0 1 cos OX p s da P TT cos g Sr 9 Ao RO Zp A cos p OA p Od p m cos OX p Despite of being a little abuse of nomenclature from now on we consider the fact of u S R being conformal equivalent to u satisfying the Cauchy Riemann Equations The C R equations are an analytical and practical way of checking conformality 99 2 4 2 Examples This section illustrates the theory that we just developed with two mappings u already discussed in chapter 1 there we just used intuition and perception to argument that Mercator section 1 3 3 and Stereographic section 1 3 2 projections are conformal Here we argument it analytically The Mercator projection is a cylindrical projection designed to maintain conformality The imposition u A makes it cylindrical because u is proportional to A To impose conformality we use the C R equations and determine the expression for the v coordinate from the second equation v 1 du 1 do col DX costo 1 From basic calculus we know that one possible solution for the above differential equations 1s v log sec tan Such v also satisfies the first C R equation du qo
124. onformality smoothness and line energies The total energy will be a weighted sum of such energies The fixed values for such weights are defined in section 2 8 4 If the values of the projections s section 2 5 are set the energy to be minimized is Ea wE w E wy y Elo w Ea lELy IELAL y If the normal vectors n section 2 5 are set the energy is Es wE w E wy y Elo wi y Elo leLy ELAL f We thus alternate between minimizing these two energies in the same way described in section 2 9 By defining WU WU Aa Gee and A use w LO w LO wi LDA w LOA where C S LO LDA and LOA are the conformality smoothness and straight line ma trices we can rewrite both energies as Eu Aax and E Aox We use here again the notation established in section 2 4 3 Xa j i n 1 Ui ANd X2j i n 1 41 Vij 2 8 2 Eigenvalue Eigenvector Method For this section and the next one we drop off the indices of Eq and E and study two different ways of minimizing E x Axll Let x 4 0 The equality B x E Ips Alb x 2 X Iate E x O is a minimizer for E but it s an undesirable solution since corresponds to mapping all the 10 viewing sphere to the origin 63 shows us that E is unbounded and has no minimum or maximum for x 0 Since E x is proportional to E it turns out that the set x x 1 is a X Ns x good place to look for a minimiz
125. otu gt E w 0 w 0 w Y 0 w gt 0 0 IEL IELAL lEL y lEL L f which implies Ax Agx 0 and obviously Ax 0 Then we have At Ax A 0 0 0 x and this proves the statement 64 We do not want constant mapping as solutions because they do not satisfy the per ceptual requirements that we discussed in the first chapter The subspace of constant mappings K x u ku vij ky has dimension 2 1 and vv has m 1 n 1 since K span vu vv where vu has entries v 0 and u entries Uij 0 and Vij EE m n If we look for min E x in x x 1 x vu x vv it may happen that E has no minimizer since such set is not compact anymore Thus we restrict our minimization to x x 1 M K Assuming e vu ez vv in the proof of Statement 2 3 we have x x 1 A K ues mes pe 1 In the same way we did in the proof of such statement we conclude that the minimizer is ez and E ez A3 So it is enough that we look for the eigenvector of A A associated to the third smallest eigenvalue Although we found an exact solution for our problem finding eigenvectors numerically is never an easy task There are some reasons for this fact e The problem Ax Ax is nonlinear itself e The methods usually find faster the eigenvector associated to the smallest eigenvalue To find the third one takes longer e Our problem involves thousands of variables and the problems above
126. pective the user may want to discard more points and thus detect less lines in the final result in a different way for the different perspective images B 2 5 Results Figures B 27 B 28 and B 29 show some more results obtained with our method Figure B 27 161 line segments detected Input image Kitahiroshima Station by Flickr user Vitorid 148 Figure B 28 93 line segments detected Input image Microsoft Vodafone Viaduct by Flickr user Craigsydnz Figure B 29 226 line segments detected Input image Nagoya University by Flickr user Vitorid In figure B 30 we show a final panoramic image produced using the lines returned by our method shown in figure B 26 B 2 6 Concluding remarks As can be seen in figures B 26 B 27 B 28 and B 29 many important lines were detected by our method but some other lines were missing Another problem is that there are regions as the rails in figure B 26 where too many lines were detected which leads to too many constrains for our optimization Other point to be mentioned is that when the 149 Figure B 30 Panoramic image produced using lines shown in figure B 26 FOV 140 degree longitude 160 degree latitude Vertices 31 000 user marks lines she has the possibility of marking lines that are not clearly represented in the image as a horizon that she wants to be horizontal in final image All the reasons mentioned above show
127. pes of the objects are well preserved Figure 2 9 285 degree longitude 180 degree latitude 41 2 4 3 Energy Term We now focus on measuring how conformal a mapping from the discretized viewing sphere section 2 3 to the plane is For this task we use the C R equations obtained in section 2 4 1 among with a standard numerical technique for discretizing PDEs named finite differences We approximate the partial derivatives at the vertices of the discretized viewing sphere in the following way Ou Ui l j Uij Ov Uij 1 Vij T POIS Ad A a AA v Uag n Ou o UH Ui Sold e MERE s MEY 2 where Aq E ese m n Replacing the derivatives by their approximations in both C R equations du 1 v dv 1 u db cosdOX dd cosdOX we obtain Ui 1 j7 Uij 1 Uij 1 Vij and Vi 1 j Vij 1 Ur j Uij l E S eee TO E ds e bs 1 Thus a discretized mapping should satisfy the following equations Uitti j Uij Lo Vijyi Vij 70 Ad COS Qij AA and l Wa Ugg Vi4 1 Vi COS Qij AA Ad If we take the left sides of both equations to measure the deviation of conformality we 0 will obtain similar values of deviations for quads with very different areas on the viewing sphere This would cause an effect of biasing conformality in regions of the sphere with high quad densities and trying to minimize these conformality deviations would favor such regions
128. phere to an image plane In this section we show how simple modifications of these projections can generate better results 15 1 4 1 Perspereographic Projections As we mentioned before perspective and stereographic projections have a lot in com mon Actually they are constructed in a very similar way the only difference is that the point from where the rays emanate in the first is 0 0 0 and in the second is 1 0 0 We generalize the geometrical construction of these both projections as follows for each K 0 1 we define the perspereographic projection for K as being the one obtained by projecting points from the sphere on the x 1 plane through rays emanating from K 0 0 Figure 1 14 illustrates this projection Figure 1 14 Perspereographic projection for K If 1 y 2 is the projection of x y z S by similarity of triangles we obtain joy A Z so U hjy o Urk z FRK a2 K 1 K z4K O atk KO Obviously this projection is not defined for x y z S such that r K To simplify we are going to consider just the points s t x gt 0 In longitude latitude coordinates 1 K sin A cos 1 K sin cos A cos sin A cos sin gt 1 cos cos d K cos A cos d K And the final formula is PSx ee x on gt R 2 2 1 K sin A cos 0 1 K sin Ad 0 00 oi ao K Notice that when K 0 we have the perspective projection and for K
129. pter 4 Panoramic Videos In this chapter we study the problem of producing perceptually good panoramic videos A panoramic video is a video where each frame is a panoramic image In order to produce these videos we join the theory developed in previous chapters to novel ideas that model temporal coherence in wide angle videos It is important to emphasize that this chapter is the beginning of a research that probably will last some years In this chapter we model the panoramic video problem discuss cases and desirable properties suggest a solution for one of the cases and point directions to solve the other cases As far as we know the problem that we address in this chapter was not solved yet Previous works as 17 18 19 and 20 produce from a temporally variable viewing sphere which we call in this chapter temporal viewing sphere a video in which each frame has a narrow FOV for immersion purposes The work that has closest goals to the ones we have is 21 They produce a video with wide angle frames from a set of common photographs by transferring textures between frames in a coherent way Our method takes as input a temporal viewing sphere that represents much better an entire scene since it is not limited to some field of view Also we consider geometric distortions in wide angle videos which are not considered in 21 Furthermore their method is very restricted to scenes with particular structures We develop a more genera
In figure 2.30 we compare a result without and with the face weights w_f.

• Total weights. We use the combination of the three kinds of weights suggested in [1] in order to define the total weights.

Figure 2.30: Result without and with face weights. Such weights are the most effective ones, as can be seen in the woman's face, which looks less distorted in the bottom image.

2.8 Minimization

Up to now in this chapter we have modeled some kinds of distortion in panoramic images and derived some energies (conformality, straight-line and smoothness energies) that measure how distorted an image is. Thus, ideally, we would like to find mappings u with null energies.

We already discussed the difficulties of finding distortion-free mappings in Chapter 1. Also, in the result discussion in section 3.7, we prove an additional statement saying that the only mappings that satisfy both the conformality and the smoothness conditions are the constant ones. Therefore, a more reasonable goal is to minimize all the energies in some sense.

In this section we formulate an energy that is a sum of all the energies obtained up to here. Such a quantity takes into account all types of distortions. We then study two ways to minimize it: one that is exact but slow, and another that provides an approximate solution fast.

2.8.1 Total Energy

In order to formulate the total energy, we use weights w_c, w_l and w_s that control the relative importance of the conformality, straight-line and smoothness energies.
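In symbols (the weight names follow the paragraph above; the exact notation in the thesis may differ), the total energy is the weighted sum

E_{total}(u) = w_c\,E_c(u) + w_l\,E_l(u) + w_s\,E_s(u),

which is quadratic in u once the line parameters are held fixed, so minimizing it then amounts to a sparse linear least-squares problem.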
variation of the position of the center of each sphere. We represent the temporal viewing sphere by its corresponding equirectangular video: for each t \in [0, t_0] we associate a frame in the equirectangular video containing the equirectangular image for that time. Thus the equirectangular video is just a set of equirectangular images representing the set [-\pi, \pi] \times [-\pi/2, \pi/2] \times \{t\}, for each t \in [0, t_0].

There are special devices that capture equirectangular videos. The one we used was the Ladybug2. For more information about this and other cameras, see [9]. In figures 4.2 and 4.3 we show some frames of the equirectangular video we captured using this camera.

With the definition of the temporal viewing sphere we can now state the panoramic video problem. We look for a function

U : S^2 \times [0, t_0] \to \mathbb{R}^2, \qquad (\lambda, \phi, t) \mapsto (u(\lambda, \phi, t),\, v(\lambda, \phi, t)),

with desirable properties.

Figure 4.2: First frame of the video we use in this chapter. The camera we used does not capture the lower part of the entire field of view, i.e., the points with latitude near -\pi/2.

Figure 4.3: Last frame of the video we use in this chapter.

We now consider tangent vectors on the temporal viewing sphere and their images under U. Let p = (\lambda, \phi, t) \in [-\pi, \pi] \times [-\pi/2, \pi/2] \times [0, t_0]. A tangent basis for the temporal viewing sphere at p is

R_\lambda = (-\sin\lambda\cos\phi,\; \cos\lambda\cos\phi,\; 0,\; 0), \qquad R_\phi = (-\cos\lambda\sin\phi,\; -\sin\lambda\sin\phi,\; \cos\phi,\; 0), \qquad R_t = (0, 0, 0, 1).
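As a small aside, here is a minimal sketch (an assumption about the frame layout, not the thesis code) of how a point (lambda, phi) of the temporal viewing sphere is looked up in a W x H equirectangular frame:

// Sketch only: maps (lambda, phi) in [-pi, pi] x [-pi/2, pi/2] to a pixel of a
// W x H equirectangular frame; row 0 is assumed to be the top (phi = +pi/2).
#include <cmath>
#include <algorithm>

struct Pixel { int col, row; };

Pixel equirectangularPixel(double lambda, double phi, int W, int H)
{
    const double pi = 3.14159265358979323846;
    int col = static_cast<int>((lambda + pi) / (2.0 * pi) * W);
    int row = static_cast<int>((pi / 2.0 - phi) / pi * H);
    col = std::clamp(col, 0, W - 1);
    row = std::clamp(row, 0, H - 1);
    return { col, row };
}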
projection could have a worse behavior there without it being noticed.

We construct w_s as follows. Suppose the vertex (i, j) is not on the boundary of the grid. We take the mean value of the luminance over the 3-by-3 window centered on the pixel corresponding to this vertex,

mean_{i,j} = \frac{1}{9} \sum_{k=i-1}^{i+1} \sum_{l=j-1}^{j+1} L_{k,l},

where L_{k,l} is the luminance of the pixel of the equirectangular image corresponding to the vertex (k, l). For boundary vertices the construction is similar. Then we obtain the standard deviation of the luminance over the window,

dev_{i,j} = \sqrt{ \frac{1}{9} \sum_{k=i-1}^{i+1} \sum_{l=j-1}^{j+1} \left( L_{k,l} - mean_{i,j} \right)^2 }.

Finally, we define w_s by normalizing the deviations to lie between 0 and 1:

w_s = \frac{dev - \min dev}{\max dev - \min dev}.

In figure 2.29 we show a result without and with these weights.

Figure 2.29: Left: without salience weights. Right: with them. The difference is very subtle.

• Face detection weights. Even small distortions in human faces are very noticeable. Thus we would like the shapes of faces to be especially well preserved. This is achieved by increasing conformality in face regions. To do that, we must be able to detect faces in equirectangular images. This process is detailed in section 2.8, where we discuss the theory and implementation of a standard face detector [16] and describe a way of applying it to equirectangular images. From this face detection process a weight field w_f is obtained. Such a field has higher values in face regions.
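A minimal sketch of the salience-weight construction above (not the thesis code; only interior vertices are handled here, and the luminance is assumed to be already sampled at the grid vertices):

// Sketch only: standard deviation of luminance in a 3x3 window around each
// vertex, normalized to [0, 1]. L is an m x n luminance grid, row-major;
// boundary vertices are skipped for brevity.
#include <vector>
#include <cmath>
#include <algorithm>

std::vector<double> salienceWeights(const std::vector<double>& L, int m, int n)
{
    std::vector<double> dev(m * n, 0.0);
    for (int i = 1; i < m - 1; ++i)
        for (int j = 1; j < n - 1; ++j) {
            double mean = 0.0;
            for (int di = -1; di <= 1; ++di)
                for (int dj = -1; dj <= 1; ++dj)
                    mean += L[(i + di) * n + (j + dj)];
            mean /= 9.0;
            double var = 0.0;
            for (int di = -1; di <= 1; ++di)
                for (int dj = -1; dj <= 1; ++dj) {
                    const double d = L[(i + di) * n + (j + dj)] - mean;
                    var += d * d;
                }
            dev[i * n + j] = std::sqrt(var / 9.0);
        }
    const auto [mn, mx] = std::minmax_element(dev.begin(), dev.end());
    std::vector<double> w(dev.size(), 0.0);
    if (*mx > *mn)
        for (std::size_t k = 0; k < dev.size(); ++k)
            w[k] = (dev[k] - *mn) / (*mx - *mn);   // normalize deviations to [0, 1]
    return w;
}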
A class holding the parameters for the optimization was developed. This class was named alg_parameters:

class alg_parameters {
public:
    float alpha_1;
    float alpha_2;
    float beta_1;
    float beta_2;
    int   iterations;
    int   factor;
    char  input_image[100];
    char  output_image[100];
    int   m;
    char  work_image_name[100];

    alg_parameters(char* input_image2, char* output_image2, int m2); // constructor
    alg_parameters();                                                // empty constructor
    void print();
};

alpha_1, alpha_2, beta_1 and beta_2 determine which field of view [\alpha_1, \alpha_2] \times [\beta_1, \beta_2] \subset [-\pi, \pi] \times [-\pi/2, \pi/2] will be projected. The buttons alpha_1+, alpha_1-, alpha_2+, alpha_2-, beta_1+, beta_1-, beta_2+ and beta_2- update these values, and the image inside the box is also updated. For this window, the box with the image only has the function of loading an updated image whenever the parameters alpha_1, alpha_2, beta_1 and beta_2 are updated.

The parameter iterations stands for the number of double iterations (section 3.3). The buttons iterations+ and iterations- update this value.

factor controls the number of vertices in the following way: if the size, in the equirectangular image, of the field of view that will be projected is m x n pixels, then the discretization of [\alpha_1, \alpha_2] \times [\beta_1, \beta_2] has on the order of (m/factor) x (n/factor) vertices.

The output of this window is a text file containing the parameters alpha_1, alpha_2, beta_1, beta_2, factor and iterations.
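The exact layout of this text file is not shown here; as a hedged illustration, assuming a simple name-value format (which may differ from the one actually used by the application), it could be written as follows:

// Sketch only: writes the window's parameters to a text file in an assumed
// "name value" layout; uses the alg_parameters class declared above.
#include <fstream>

void writeParameterFile(const char* path, const alg_parameters& p)
{
    std::ofstream out(path);
    out << "alpha_1 "    << p.alpha_1    << "\n"
        << "alpha_2 "    << p.alpha_2    << "\n"
        << "beta_1 "     << p.beta_1     << "\n"
        << "beta_2 "     << p.beta_2     << "\n"
        << "factor "     << p.factor     << "\n"
        << "iterations " << p.iterations << "\n";
}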
where a_{i,j}, b_{i,j+1}, c_{i+1,j}, d_{i+1,j+1} and a_{start}, b_{start}, c_{start}, d_{start} are the bilinear coefficients used to define the output virtual vertices.

We can rewrite E_{L_O} as \|LO\,x\|^2, where x stacks the unknowns u_{i,j} and v_{i,j}, by making each quad q_{ij} \in V(l) correspond to one row of LO: the nonzero entries of that row are the bilinear coefficients of q_{ij}, multiplied by the nonzero component of n, together with minus the bilinear coefficients of the quad containing the start endpoint, placed in the columns of the corresponding vertices. All other entries of the row are set to zero.

We want the energy for all lines with specified orientation, i.e.,

E_{L_O} = \sum_{l \in L_O} \sum_{q_{ij} \in V(l)} \left[ n \cdot \left( a_{i,j} u_{i,j} + b_{i,j+1} u_{i,j+1} + c_{i+1,j} u_{i+1,j} + d_{i+1,j+1} u_{i+1,j+1} - u_{start} \right) \right]^2.

The above construction is made for each line l, leading to matrices LO^{(l)} that correspond to the inner sums. We use here the notation of section 2.4.3 to construct each LO^{(l)} and x, and define LO by stacking these matrices. We call LO the fixed-orientation line matrix. Each row of LO has at most 8 nonzero entries, since n_1 = 0 or n_2 = 0 in this case.
3. Locally Adapted Projections to Reduce Panorama Distortions [8]. This work starts with a cylindrical projection of a scene and allows the user to mark regions where he or she wants the projection to be nearly planar. The method then computes a deformation of the projection cylinder that fits such constraints and varies smoothly between different regions, and unfolds the deformed cylinder onto a plane. Although their results are very good in many cases and are produced quickly via optimization, their method has limitations: if some marked region occupies a wide-angle FOV (up to 120 degrees), the final result starts suffering from the same limitations as the perspective projection (stretching of objects); a good solution depends on the precision of the user in marking regions; even inside the marked regions, lines may appear slightly bent; and if two marked regions are too close, orientation discontinuities may appear between them, similarly to the method presented in section 1.5.3.

Chapter 2 Optimizing Content-Preserving Projections for Wide-Angle Images

In this chapter we show and discuss in more depth the ideas presented in Carroll et al. [1], which we believe to be the state-of-the-art reference for the panoramic image problem. Many details that are going to be exposed here do not appear in the original reference, which makes this chapter a good complement to it. We show in figure 2.1 a result produced by this method.
This detection can substitute for, or abbreviate, the task of marking lines performed by the user in window 1 (section A.2.1).

The detection is semi-automatic because it depends on parameters. This set of parameters may vary from case to case, and we did not find a good fixed set. Such parameters are detailed at the end of this section.

We start this section by explaining the Hough transform, the bilateral filter, and a process that we call eigenvalue processing. These three topics will not be explained in the equirectangular image context, because such operations are performed on perspective images. Next we detail our method, which is based on obtaining six different perspective projections from the equirectangular image and searching for lines in each of them. To finish, we show some results of our method and the panoramic images one would obtain by using the detected lines instead of marking them in window 1. We also make some concluding remarks.

B.2.1 The Hough Transform

In this section we present a standard technique for detecting lines in images, named the Hough transform. The example image that we will use in this and the next two sections is shown in figure B.10.

Figure B.10: Example image, a perspective projection taken from an equirectangular image.

The Hough transform is applied to a binary image. We apply the Canny filter to the previous image and obtain the edge image shown in figure B.11.
is less trivial, and it will also be more difficult to model the transition functions. Maybe a way of modeling them is with the help of perspective projections, since the geometry of such projections is well understood. For narrow fields of view, the perspective projection could be used to determine transition functions on neighborhoods of points, and also to determine whether a point is part of a moving object.

Many things in this chapter were pointed out as future work. We make a summary of future work related to panoramic videos in the conclusion of this thesis.

Appendix A Application Software

This chapter intends to complement the theory developed in chapters 2 and 3 through the exposition of an application that implements everything discussed in those chapters and is user-friendly. We provide a manual of how to use the application and give implementation details.

In section A.1 we show our application working. It consists of two windows where the user interacts and a processing step, performed in Matlab, that computes the final panoramic image based on the information passed by the user through the two windows. We also explain how to use the program, so that the reader of this thesis can try it and produce her own results.

Section A.2 is devoted to explaining how the application was implemented. Although we do not show much source code, the details we consider most essential for the overall process are provided. We assume the reader is familiar
All possible lines through (x_0, y_0) form a sinusoidal curve in the Hough plane. Figure taken from [24].

The most voted pairs (\rho, \theta) represent lines with more points in the edge image and are elected as lines. OpenCV implements this process through the function cvHoughLines2:

CvSeq* cvHoughLines2(CvArr* image, void* line_storage, int method, double rho, double theta, int threshold, double param1 = 0, double param2 = 0);

For details about the parameters we refer to [24], pages 156 to 158. For method we used the value CV_HOUGH_PROBABILISTIC, which returns lines with endpoints, a very convenient feature for our application. We do not discuss here which values we used for the other parameters. In figure B.13 we show the detected lines for our example.

Figure B.13: The detected lines are shown in green.

As one can see, many nonexistent lines were detected. This happened because, in highly textured regions such as the ground between the rails, the edge detector found many points that voted too much for nonexistent lines in those regions. A naive solution would be to change the parameters of the edge detector so that fewer edge points are found, but this would cause many lines to be missed. In the next sections we present methods that try to alleviate this problem.

B.2.2 Bilateral Filter

Our first attempt to remove undesirable texture from an image was to use bilateral filtering. Simple blurring would handle this task but would also blur lines.
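Returning to the probabilistic Hough step above, here is a hedged example written against the current OpenCV C++ API (cv::HoughLinesP, the modern counterpart of cvHoughLines2 with CV_HOUGH_PROBABILISTIC); the parameter values below are illustrative only, not the ones used in the thesis.

// Sketch only: detects line segments (with endpoints) in an already-computed
// binary edge image.
#include <opencv2/imgproc.hpp>
#include <vector>

std::vector<cv::Vec4i> detectSegments(const cv::Mat& edgeImage)
{
    std::vector<cv::Vec4i> segments;     // each element: (x1, y1, x2, y2)
    cv::HoughLinesP(edgeImage, segments,
                    1.0,                 // rho: 1-pixel resolution
                    CV_PI / 180.0,       // theta: 1-degree resolution
                    80,                  // accumulator threshold
                    30.0,                // minimum segment length
                    10.0);               // maximum gap between points
    return segments;
}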
finished, the user clicks the close window button, which has a callback to close the window. Then another function is called to separate the points in the generated text file into points corresponding to lines with specified orientation and points corresponding to lines with general orientation. These files are saved as L_endpoints and L2_endpoints. Such files are loaded by Matlab in matrix form. We show below an example of an L_endpoints file, where each row contains the two coordinates of an endpoint, the index of the line it belongs to, and the orientation code (1 stands for vertical lines, 2 for horizontal ones and 3 for general-orientation ones):

0.80423999  0.01256000   1  2
0.36442399  0.01256000   1  2
0.19477800  0.18849500   2  1
0.19477800  0.02513000   2  1
0.54663002  0.18221200   7  1
0.54035002  0.01256000   7  1
0.79795998  0.36442399   8  1
0.77911001  0.01256000   8  1
1.15610003  0.31415901   9  1
1.14981997  0.03141000   9  1
1.57079005  0.49637100  10  1
1.55193996  0.03141000  10  1
1.99804997  0.33929199  11  1
2.01061010  0.03141000  11  1

A.2.2 Window 2

Window 2 was also developed in C++ using the FLTK API. Window 2 was shown in figures A.4 and A.5. It consists of a box, some buttons that regulate the parameters to be passed to the optimization, and a close button.
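For illustration, a minimal sketch (not the thesis code, which loads the file in Matlab) of reading such a file into memory, assuming the four-column layout shown above:

// Sketch only: each row is assumed to contain two coordinates, a line index
// and an orientation code.
#include <fstream>
#include <vector>

struct EndpointRecord { double x, y; int line, orientation; };

std::vector<EndpointRecord> loadEndpoints(const char* path)
{
    std::vector<EndpointRecord> rows;
    std::ifstream in(path);
    EndpointRecord r;
    while (in >> r.x >> r.y >> r.line >> r.orientation)
        rows.push_back(r);
    return rows;
}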
result. The interface proposed in this approach requires the user only to click the endpoints of lines in the real world and set their orientation. Details are presented in sections 2.2 and A.1.

• Mathematically formalize distortions and use an optimization framework. Many previous approaches used only intuitive, classical ideas for minimizing distortions, such as centering projections on objects. A more precise solution is to mathematically formalize these distortions and try to minimize them all, in the way we saw in section 1.5.1. This whole chapter is devoted to developing such an optimization solution.

• Preserve straight lines. It is very unnatural and noticeable if a line that is supposed to be straight appears bent in the final result. That happens because we perceive all straight lines in the real world as straight. This approach handles this task by allowing the user to mark curves on the equirectangular image that should map to straight lines in the final result (section 2.2), by obtaining an energy that measures how far a marked curve is from being straight in the final result (section 2.5), and then by minimizing this energy together with the other energies (section 2.8).

• Have orientation consistency. Another undesirable effect related to lines is when a line that is supposed to have some orientation (for example, the corner between walls or a tower is supposed to be vertical) appears with another orientation in the result.
Input: Equirectangular image.

For each line:

• The user clicks the two endpoints (\lambda_1, \phi_1), (\lambda_2, \phi_2) on the equirectangular image.
• The program computes the curve of points P(t), t \in [0, 1], joining the two endpoints on the viewing sphere, and draws the corresponding curve (\lambda(t), \phi(t)) in black on the equirectangular image.
• The user types v, h or g for the orientation, and the color of the curve changes.

Output: Marked equirectangular image and a list of points (the details of this list are left to section A.2).

We show in figure 2.4 an image produced by the process just explained.

Figure 2.4: Equirectangular image with lines marked by the user. Red lines stand for vertical lines, blue for horizontal ones and green for general-orientation ones.

For such marked lines the method produces the result shown in figure 2.5.

Figure 2.5: Observe that line orientation is correct now. For example, the tower appears vertical in the result because the user specified such behavior.

The implementation details can be found in section A.2. In our opinion this interface satisfies the requirement of being simple and intuitive. The tasks of clicking endpoints and setting orientations are simple, and the procedure of marking all the lines takes about one minute. We try to automate this procedure in section B.2 with the help of Computer Vision techniques. It turns out that the obtained results are a good
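One plausible way of generating the drawn curve, assuming the endpoints are joined by the great-circle arc between them on the viewing sphere (an assumption, not necessarily the thesis implementation), is to interpolate the chord between the two endpoints and re-normalize:

// Sketch only: samples the great-circle arc between two (lambda, phi) endpoints
// using the parametrization x = cos(lambda)cos(phi), y = sin(lambda)cos(phi), z = sin(phi).
#include <cmath>
#include <vector>
#include <utility>

static void toSphere(double lambda, double phi, double p[3])
{
    p[0] = std::cos(lambda) * std::cos(phi);
    p[1] = std::sin(lambda) * std::cos(phi);
    p[2] = std::sin(phi);
}

// samples must be >= 1; endpoints are assumed not to be antipodal.
std::vector<std::pair<double, double>> lineCurve(double l1, double p1,
                                                 double l2, double p2, int samples)
{
    std::vector<std::pair<double, double>> curve;
    double A[3], B[3];
    toSphere(l1, p1, A);
    toSphere(l2, p2, B);
    for (int k = 0; k <= samples; ++k) {
        const double t = static_cast<double>(k) / samples;
        double P[3];
        double norm = 0.0;
        for (int c = 0; c < 3; ++c) { P[c] = (1.0 - t) * A[c] + t * B[c]; norm += P[c] * P[c]; }
        norm = std::sqrt(norm);
        for (int c = 0; c < 3; ++c) P[c] /= norm;                 // back onto the sphere
        curve.emplace_back(std::atan2(P[1], P[0]), std::asin(P[2]));  // back to (lambda, phi)
    }
    return curve;
}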
to be representative are the structural features, such as dimension (whether the image of an object is an area, a curve or a point) and the presence or absence of holes and self-intersections. They make the following statement: "The retinal projections of an image of an object should not contain any structural features that are not present in any retinal projection of the object itself." Since most of the visual information we have is in the images formed on the retina, this statement asks that, when we look at an image, the objects should not contain any structural feature that we would not see if we looked directly at them.

They selected three structural requirements to develop their theory:

• The image of a surface should not be a point.
• The image of a part of a straight line either should not have self-intersections (loops) or else should be a point.
• The image of a plane should not have twists on it, i.e., either each point of the plane is projected to a different point in the image or the whole plane is projected to a curve.

Figure 1.20 illustrates the last two requirements.

Figure 1.20: Mappings forbidden by the last two requirements.

They also state desirable conditions. These are not as essential as the structural ones because they can be relaxed within some intervals of tolerance. The desirable conditions are the following:

• Zero curvature condition: images of all possible straight lines should be straight.
distortions in individual frames would be noticeable in videos, especially if these distortions were transported to other frames using temporal coherence. A state-of-the-art method for computing panoramic images turns out to be necessary, and we use the method discussed in chapters 2 and 3.

2. Moving objects must be well preserved. The regions of a video to which one pays most attention when watching are the moving objects. Thus conformality and smoothness should be increased in such regions when there are moving objects in the scene.

The temporal requirements impose that objects and scene change in a temporally coherent manner. Thus they are requirements to be satisfied by the frames depending on the other frames. We make two temporal requirements:

3. Temporal coherence of the scene. This is an imposition made on all points that are being projected. It depends on the movement of the viewpoint. For example, if the VP and the FOV are stationary, the background should be the same through time.

4. Temporal coherence of the objects. This imposition is made on moving-object regions. It tells us that the size and orientation of objects should only change if such properties change in the temporal viewing sphere. For example, if an object becomes twice as large from one time to another on the temporal viewing sphere, it should become twice as large from one frame to another in the panoramic video. An important observation is that the object should not have the same size
us that our detection should be integrated into window 1 as a preprocessing step. The user could remove, include or extend the detected lines, which is easier than looking for all possible lines and marking them. Also, the user could control the parameters of our line detector in the interface, which is more intuitive. We leave the integration of our line detector into the interface as future work.

Conclusion

Review

In this section we review the main aspects of our work and point out which of them are original contributions.

• Modeling of concepts related to the problem. We modeled some concepts that are part of the physical world as mathematical entities. For example, the field of view was modeled as a part of a sphere, panoramic images as functions, and so on.

• Bibliographic review. We provided a bibliographic review of the panoramic image problem, giving details of the main references on this topic and discussing the pros and cons of each approach. We also drew conclusions on the perceptual properties that are desired in panoramic images, such as conformality and preservation of straight lines. This review can be seen as a small contribution of our work.

• Deep and conclusive analysis of [1]. We discussed this reference in a formal and detailed way, making our work a good complement to it. For example, the perturbed optimization, which we called the linear system method, was mentioned very briefly and no details were provided in
This basis is already orthogonal, but R_\lambda is not unitary. An orthonormal tangent basis is therefore

\tilde{R}_\lambda = \frac{1}{\cos\phi}\,R_\lambda, \qquad \tilde{R}_\phi = R_\phi, \qquad \tilde{R}_t = R_t.

Given a panoramic video function U, its derivative dU, written in the basis \tilde{R}_\lambda, \tilde{R}_\phi, \tilde{R}_t, is represented by the matrix

dU = \begin{pmatrix} \frac{1}{\cos\phi}\frac{\partial u}{\partial \lambda} & \frac{\partial u}{\partial \phi} & \frac{\partial u}{\partial t} \\ \frac{1}{\cos\phi}\frac{\partial v}{\partial \lambda} & \frac{\partial v}{\partial \phi} & \frac{\partial v}{\partial t} \end{pmatrix}.

We define the differential north, east and time vectors, respectively H, K and T, as the images under dU of the orthonormal basis vectors:

H = dU(\tilde{R}_\phi) = \left( \frac{\partial u}{\partial \phi}, \frac{\partial v}{\partial \phi} \right), \qquad K = dU(\tilde{R}_\lambda) = \frac{1}{\cos\phi}\left( \frac{\partial u}{\partial \lambda}, \frac{\partial v}{\partial \lambda} \right), \qquad T = dU(\tilde{R}_t) = \left( \frac{\partial u}{\partial t}, \frac{\partial v}{\partial t} \right).

The three vectors we just defined describe the variation of the projection U with respect to all coordinates \lambda, \phi and t. We illustrate in figure 4.4 the panoramic video projection U, the tangent vectors on the temporal viewing sphere, and their images under U.

4.5 Transition Functions

In order to model temporal coherence, we define transition functions between points at different times on the temporal viewing sphere. In other words, given t_1, t_2 \in [0, t_0], a transition function between t_1 and t_2 is a map

\Phi_{t_1, t_2} : S^2 \times \{t_1\} \to S^2 \times \{t_2\}, \qquad (\lambda, \phi, t_1) \mapsto (\Phi_{t_1, t_2}(\lambda, \phi), t_2).

We illustrate the transition function concept in figure 4.5. We make some comments on the definition above:

• A transition function defined at time t_1 may not be defined on the entire S
Figure B.11: Binary image obtained with the Canny edge detector.

The function cvCanny in OpenCV implements the Canny edge detector. Its syntax is the following:

void cvCanny(const CvArr* img, CvArr* edges, double lowThresh, double highThresh, int apertureSize = 3);

For theoretical details about the detector and information about the parameters of the above function we recommend [24], pages 151 to 153. We do not explain here which parameters we used to obtain the result in figure B.11.

To find straight lines in the image, we analyze each white point (x_0, y_0) in the binary image. Passing through (x_0, y_0) there are infinitely many lines of the form y_0 = a x_0 + b, so there is an infinity of pairs (a, b) corresponding to lines passing through (x_0, y_0). Each pair (a, b) receives a vote, and the most voted pairs correspond to the lines with more points in the image. It turns out that this representation is not convenient for implementation purposes, since a and b range over (-\infty, \infty). For this reason, each line passing through (x_0, y_0) is written in polar coordinates, \rho = x_0 \cos\theta + y_0 \sin\theta, and represented by the pair (\rho, \theta). The set of all (\rho, \theta) representing lines passing through (x_0, y_0) forms a sinusoidal curve in the (\rho, \theta) plane, called the Hough plane, as illustrated in figure B.12.

Figure B.12: Four lines passing through (x_0, y_0) are shown in (b). Each of these lines is represented by a point (\rho, \theta) in the Hough plane (c).
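As a toy illustration of the voting just described (this is not OpenCV's implementation; the accumulator layout and resolutions are arbitrary choices), each edge point adds one vote to every (theta, rho) cell of its sinusoid:

// Sketch only: accumulator is a (numTheta x numRho) vote table; rhoMax bounds |rho|.
#include <cmath>
#include <vector>

void voteForPoint(std::vector<std::vector<int>>& accumulator,
                  double x0, double y0, double rhoMax)
{
    const double pi = 3.14159265358979323846;
    const int numTheta = static_cast<int>(accumulator.size());
    const int numRho   = static_cast<int>(accumulator[0].size());
    for (int ti = 0; ti < numTheta; ++ti) {
        const double theta = pi * ti / numTheta;                     // theta in [0, pi)
        const double rho   = x0 * std::cos(theta) + y0 * std::sin(theta);
        const int ri = static_cast<int>((rho + rhoMax) / (2.0 * rhoMax) * numRho);
        if (ri >= 0 && ri < numRho)
            ++accumulator[ti][ri];                                   // one vote per cell on the sinusoid
    }
}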
Minimize the straight-line energy to obtain new values for u, and thus new values for u_{start}, the u_{i,j} and u_{end}.

• From the new values u_{start} and u_{end}, calculate the normal vector

n = R_{90}\,\frac{u_{end} - u_{start}}{\|u_{end} - u_{start}\|},

where R_{90} denotes the rotation by 90 degrees.

• Minimize the straight-line energy again to obtain new values for u, and thus new values for u_{start}, the u_{i,j} and u_{end}.

• From these new values, recompute the quantities that define the line from u_{start} and u_{end}.

• Return to the 2nd step and repeat the process until convergence.

2.5.1 Energy terms

In the last section we analyzed one single line and obtained the straight-line energies associated with it.⁶

⁶Although we do not have a theoretical proof of convergence, in practice the results always converged visually after repeating the steps above at most 4 times.

We now turn such energies into matrix form and develop the straight-line energies for the whole mapping u. We start by considering the lines with specified (vertical or horizontal) orientation, which is the simplest case. Let n = (n_1, n_2) be the normal vector: n = (1, 0) for vertical lines and n = (0, 1) for horizontal ones. Developing the energy term, we obtain

E_l(u) = \sum_{q_{ij} \in V(l)} \left[ n \cdot \left( a_{i,j} u_{i,j} + b_{i,j+1} u_{i,j+1} + c_{i+1,j} u_{i+1,j} + d_{i+1,j+1} u_{i+1,j+1} - u_{start} \right) \right]^2,

where u_{start} is itself the bilinear combination, with coefficients a_{start}, b_{start}, c_{start}, d_{start}, of the vertices of the quad containing the start endpoint.
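The alternating scheme above can be summarized by the following schematic sketch (the two helpers are passed in as parameters because their implementations, a sparse least-squares solve and the normal update from the projected endpoints, are not part of this sketch and are not the thesis code):

// Sketch only: alternate between solving for u with the line parameters fixed
// and re-estimating each line's normal from the current solution.
#include <vector>
#include <functional>
#include <cstddef>

struct LineParams { double nx, ny; };   // unit normal of a general-orientation line

std::vector<double> alternatingMinimization(
    std::vector<LineParams> lines,
    const std::function<std::vector<double>(const std::vector<LineParams>&)>& solveForU,
    const std::function<LineParams(const std::vector<double>&, std::size_t)>& updateLine,
    int maxIters = 4)   // in practice 4 repetitions were enough for visual convergence
{
    std::vector<double> u;
    for (int it = 0; it < maxIters; ++it) {
        u = solveForU(lines);                          // solve the quadratic problem in u
        for (std::size_t l = 0; l < lines.size(); ++l)
            lines[l] = updateLine(u, l);               // re-estimate line normals from u
    }
    return u;
}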