Audio Analysis and Control


4.2.1 Definition of Audio

The electronic representation or replication of sound waves, often in the form of electrical impulses, digital data, or analog recordings, is referred to as audio. It includes all audible frequencies, allowing for the collection, processing, and playback of sound for a variety of applications such as music, communication, entertainment, and multimedia.

When we hear a sound from any source, our brain analyses it and extracts information from it. If the sound is well formed, the brain can recognize the pattern of each word it contains and decode the waveform into intelligible textual data.

In digital form, the wave yields numerical data, most notably its frequency content.

Within the scope of this thesis, VAG plays a role similar to that of the brain: it receives the data of audio files and processes it in order to manipulate the textures.

4.2.2 Audio File Formats

There are many audio file formats. However, the author uses the two most popular formats in the world as the standard input. These two formats are:

WAV: developed by Microsoft and IBM. It is a lossless, raw file format that applies no compression to the original sound recording.

MP3: created by the Fraunhofer Society in Germany and widely used across the world. It is the most popular file format because it makes music easy to store on portable devices and to transfer over the Internet. Although MP3 compresses the audio, it still provides decent sound quality.

Figure 4.3: The WAV format

Figure 4.4: The MP3 format

4.2.3 Audio Data

Audio data is a digital representation of analog sounds that retains the main qualities of the original. A sound, as we learned in physics class, is a wave of vibrations that travels through a medium such as air or water and eventually reaches our ears. When analyzing audio data, three main aspects must be considered: time period, amplitude, and frequency.

Time period: how long one full cycle of vibration lasts, measured in seconds.

Amplitude: the measure of sound strength, expressed in decibels (dB), which we perceive as loudness.

Frequency: measured in Hertz (Hz), it indicates how many vibration cycles occur per second. People perceive frequency as low or high pitch.
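These quantities are related: the frequency is the reciprocal of the time period, \( f = 1/T \). For example, a 440 Hz tone (concert pitch A) completes one cycle every \( 1/440 \approx 0.00227 \) s, i.e. about 2.27 ms.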

4.2.4 How to Analyze and Extract the Audio Data?

There are two main features which are used to manipulate the textures of 3D objects within the scope of this thesis. They are the Amplitude and the Frequency of the audio file.

The Three.js library provides two classes for this purpose, called AudioAnalyser and Audio. They are built on the Web Audio API, which provides a wide range of audio handling and processing capabilities for web-based applications.
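A minimal sketch of how these two classes can be wired together, assuming an existing camera object and an audio file at the placeholder path 'sounds/track.mp3':

```typescript
import * as THREE from 'three';

// Attach a listener to the camera so the scene can receive audio.
const listener = new THREE.AudioListener();
camera.add(listener); // `camera` is assumed to exist in the scene setup

// A non-positional audio source driven by the listener.
const sound = new THREE.Audio(listener);

// Load the track (the path is a placeholder) and start playback.
new THREE.AudioLoader().load('sounds/track.mp3', (buffer) => {
  sound.setBuffer(buffer);
  sound.setLoop(true);
  sound.play();
});

// Analyser with an FFT size of 32, which yields 16 frequency bins.
const analyser = new THREE.AudioAnalyser(sound, 32);

// Inside the render loop: a single loudness-like value in the range 0..255.
const level = analyser.getAverageFrequency();
```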

Get the Frequency Data

In the Web Audio API, there is an interface called AnalyserNode. This interface extracts the frequency data and feeds it into the texture-manipulation flow.
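At the raw Web Audio API level (the layer that Three.js wraps), the extraction can be sketched as follows; the <audio> element is an assumption:

```typescript
// Minimal sketch of frequency extraction with a raw AnalyserNode.
const ctx = new AudioContext();
const element = document.querySelector('audio')!; // assumed <audio> tag on the page
const source = ctx.createMediaElementSource(element);

const analyserNode = ctx.createAnalyser();
analyserNode.fftSize = 256; // yields 128 frequency bins

source.connect(analyserNode);
analyserNode.connect(ctx.destination);

// Reusable buffer; refill it once per animation frame.
const bins = new Uint8Array(analyserNode.frequencyBinCount);
analyserNode.getByteFrequencyData(bins); // each bin holds a value 0..255
```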

Figure 4.5: AnalyserNode extracts the frequency data

Get the Amplitude Data

In the Web Audio API, there is an interface called GainNode. This interface represents a change in volume. It is an audio-processing module that applies a given gain to the input data before propagating it to the output. A GainNode always has exactly one input and one output, both with the same number of channels.
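A short sketch of amplitude control, continuing the node graph from the previous example (the gain values are placeholders):

```typescript
// Route the signal through a GainNode so its amplitude can be scaled
// before it reaches the speakers.
const gainNode = ctx.createGain();

analyserNode.disconnect();          // detach the direct route to the output
analyserNode.connect(gainNode);     // analyser -> gain
gainNode.connect(ctx.destination);  // gain -> speakers

// gain = 1 leaves the signal untouched; 0.5 halves the amplitude.
gainNode.gain.value = 0.5;

// Ramping avoids audible clicks when a control is moved.
gainNode.gain.linearRampToValueAtTime(1.0, ctx.currentTime + 0.25);
```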

Figure 4.6: GainNode modify the Amplitude data

4.2.5 Audio Control with MIDI Devices

In the scope of this thesis, the author uses a MIDI device called Novation Launchkey MK2, developed by Novation Music.

Figure 4.7: Novation Launchkey MK2

The audio parameters are arranged and mapped onto suitable controls of the Novation Launchkey, so the user can drive them through the CC messages or notes sent by the device.
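One possible way to receive those CC and note messages in the browser is the Web MIDI API; the sketch below simply logs incoming messages, and the mapping from controller numbers to visual parameters is left as an assumption:

```typescript
// Sketch: listen for Launchkey CC and note messages via the Web MIDI API.
// Assumes an async context; browsers also require a secure origin.
const midi = await navigator.requestMIDIAccess();

for (const input of midi.inputs.values()) {
  input.onmidimessage = (event) => {
    const [status, data1, data2] = event.data;
    const kind = status & 0xf0;
    if (kind === 0xb0) {
      // Control Change: data1 = controller number, data2 = value 0..127.
      console.log(`CC ${data1} -> ${(data2 / 127).toFixed(2)}`);
    } else if (kind === 0x90 && data2 > 0) {
      // Note On: data1 = note number, data2 = velocity.
      console.log(`Note ${data1}, velocity ${data2}`);
    }
  };
}
```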

4.3 3D Rendering

From basic advertisements to immersive virtual reality, 3D visualization is ubiquitous. 3D rendering is used by architects, product designers, industrial designers, and branding firms to generate attractive, realistic visuals that resemble real life.

In the scope of a web-based 3D rendering application such as VAG, it is important to keep the scene simple and easy to present. Moreover, the rendering flow has to be smooth and stable for the user to experience. Achieving this requires some basic knowledge of 3D setup and rendering, such as the coordinate system, space, coloring, transparency, shadowing, lighting, and camera setup.

The author presents some of these basic features of a 3D rendering scene below. This is also the default setup of VAG.

4.3.1 Coordinate System

Figure 4.8: The Coordinate System

In geometry, a coordinate system is a system that uses one or more numbers, or coordinates, to define the position of points or other geometric elements on a manifold such as Euclidean space. The order of the coordinates matters, and they are sometimes identified by their position in an ordered tuple and other times by a letter, as in "the x-coordinate." In elementary mathematics the coordinates are assumed to be real numbers, but they can also be complex numbers or members of a more abstract system, such as a commutative ring. The use of a coordinate system allows problems in geometry to be translated into numerical problems and vice versa; this is the foundation of analytic geometry.
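In Three.js, world space is a right-handed Cartesian system with the y-axis pointing up by default, so placing an object amounts to assigning it a coordinate triple; a small sketch (the values are arbitrary):

```typescript
import * as THREE from 'three';

// A point in world space: 2 units right, 1 up, 3 toward the viewer
// (the default camera looks down the negative z-axis).
const point = new THREE.Vector3(2, 1, 3);

// Any object can be positioned by its coordinates.
const mesh = new THREE.Mesh(
  new THREE.BoxGeometry(1, 1, 1),
  new THREE.MeshStandardMaterial({ color: 0x44aa88 })
);
mesh.position.copy(point);
```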

4.3.2 Geometries

Geometries are the fundamental building elements used in 3D rendering to construct 3D objects and shapes within a virtual 3D space. Geometries define the structure, size, and look of 3D objects and are critical in producing realistic and visually appealing 3D representations. Some examples of basic geometries used in 3D rendering include the following (a construction sketch follows Figure 4.9):

Cube/Box: A six-sided object with equal-sized square faces.

Sphere: A perfectly round 3D shape, resembling a ball.

Cylinder: A 3D shape with circular bases and a curved surface connecting them.

Cone: A 3D shape with a circular base and a single curved surface tapering to a point.

Figure 4.9: The Basic Geometries
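In Three.js, these primitives map directly onto built-in geometry classes; a minimal sketch with arbitrary dimensions:

```typescript
import * as THREE from 'three';

// The four primitives from the list above, with placeholder sizes.
const box      = new THREE.BoxGeometry(1, 1, 1);          // width, height, depth
const sphere   = new THREE.SphereGeometry(0.5, 32, 16);   // radius, width/height segments
const cylinder = new THREE.CylinderGeometry(0.5, 0.5, 2); // top radius, bottom radius, height
const cone     = new THREE.ConeGeometry(0.5, 2, 32);      // radius, height, radial segments
```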

4.3.3 Camera

Within the scope of this thesis, Three.js provides two types of camera: the Perspective Camera and the Orthographic Camera. They have distinct properties and are appropriate for different forms of visual representation; a construction sketch follows the comparison below.

Viewing Perspective:

– Perspective Camera: A perspective camera models how human eyesight works in real life. By converging distant objects towards a vanishing point, it provides a sense of depth and realism. Objects appear smaller as they move away from the camera, and the camera’s field of view influences how large they appear.

– Orthographic Camera: The orthographic camera does not exhibit perspective effects. It projects objects onto the view plane without taking their distance from the camera into account. As a result, regardless of their distance, all objects appear the same size in the produced image. This camera is frequently used in technical drawings, engineering, and certain stylized rendering styles.

Depth Perception:

– Perspective Camera: Objects in the foreground appear larger than those in the background due to perspective projection, creating a sense of depth and spatial relationship in the image.

– Orthographic Camera: Lacks depth cues, since all objects appear the same size regardless of their distance from the camera. It shows the scene in an isometric or "flat" perspective.

Usage:

– Perspective Camera: Commonly employed in scenes that demand a sense of realism, immersion, and depth perception. It’s popular in 3D games, films, animations, and architectural visualizations.

– Orthographic Camera: Better suited for scenarios that require an exact and uniform portrayal without distortion, such as technical drawings, CAD models, 2D game views, and some motion graphics.


Rendering Effectiveness:

– Perspective Camera: Rendering with a perspective camera requires more sophisticated calculations, especially for foreshortening and vanishing points, and may therefore be slightly more computationally intensive.

– Orthographic Camera: Rendering with an orthographic camera is comparatively straightforward because it does not account for perspective distortion. It can be more computationally efficient, especially in scenes with many objects.

Controlling the Camera:

– Perspective Camera: The field of view (FOV) of a perspective camera controls the extent of the scene captured. Changing the FOV affects the perceived scale and depth of the produced image.

– Orthographic Camera: The size of the orthographic camera’s view volume controls the scale of the objects in the produced image. The apparent size of objects is unaffected by moving the camera closer to or farther from the scene.

Figure 4.10: Two Types of Camera
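A minimal sketch of constructing both camera types in Three.js; the aspect ratio and frustum bounds are placeholder values:

```typescript
import * as THREE from 'three';

const aspect = window.innerWidth / window.innerHeight;

// Perspective: vertical FOV in degrees, aspect ratio, near and far planes.
const perspective = new THREE.PerspectiveCamera(75, aspect, 0.1, 1000);

// Orthographic: explicit left/right/top/bottom frustum bounds. The size of
// this box, not the camera distance, determines how large objects appear.
const frustumSize = 10;
const orthographic = new THREE.OrthographicCamera(
  (-frustumSize * aspect) / 2, // left
  (frustumSize * aspect) / 2,  // right
  frustumSize / 2,             // top
  -frustumSize / 2,            // bottom
  0.1,                         // near
  1000                         // far
);
```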

4.3.4 Lighting

Lighting is crucial in 3D rendering for creating realistic and visually appealing environments. It models how light interacts with 3D objects and determines how they appear in the produced image or animation. Proper lighting can improve the scene’s depth, atmosphere, and overall realism. In VAG, the two most basic lights are Ambient Light and Directional Light; a setup sketch follows Figure 4.11.

Figure 4.11: Two Types of Lighting
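A minimal sketch of this default lighting setup in Three.js; the colors, intensities, and light position are placeholder values:

```typescript
import * as THREE from 'three';

const scene = new THREE.Scene();

// Ambient light: a uniform base level that reaches every surface equally.
const ambient = new THREE.AmbientLight(0xffffff, 0.4); // color, intensity
scene.add(ambient);

// Directional light: parallel rays, like sunlight, arriving from a direction.
const directional = new THREE.DirectionalLight(0xffffff, 1.0);
directional.position.set(5, 10, 7); // shines from here toward the origin
scene.add(directional);
```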
