ChatGPT Introduces Voice and Image Capabilities

The voice feature, which will be available on iOS and Android, lets users hold a two-way spoken conversation with ChatGPT. OpenAI worked with professional voice actors to develop five voices for users to choose from. Whisper, OpenAI's open-source speech recognition system, transcribes the user's speech into text.

On the visual front, users can now show ChatGPT images to make conversations more context-aware. This helps in scenarios such as discussing meal options based on the contents of a fridge or interpreting a complex graph. Image interpretation is handled by multimodal GPT-3.5 and GPT-4 models.

However, these advancements bring challenges. OpenAI acknowledges the risks, particularly that realistic synthetic voices could be misused for impersonation or fraud. To limit that risk, the voice technology is restricted to voice chat. Similarly, the image input functionality is said to be designed with privacy in mind, focusing on the broader context of an image rather than analysing individuals.