Personalized AI’s Application in Language Model Development and AI System Creation
The discussion featured Shomir Wilson, Associate Professor at Pennsylvania State University, Scott Yih, Research Scientist at Meta FAIR, Minjoon Seo, Assistant Professor at KAIST, and moderator Tejas Srinivasan, a PhD student at the University of Southern California.
User Perspectives on the Benefits of Personalized AI
- Shomir Wilson opened the session with the issue of data collection and user consent. As a Privacy Research expert, he observed that many developers prioritize collecting data over obtaining users' consent to access it, and he emphasized ensuring users understand how their collected data will be used within the technology.
- Scott Yih drew on his experience in industrial research labs at Microsoft, highlighting the importance of making products more personalized during development. He cited shopping platforms that keep showing ads for items users have already purchased, along with data-security concerns, as issues that warrant attention.
- Minjoon Seo used three Large Language Models – GPT, Claude, and Llama – as examples of why he believes AI still has room for improvement: without knowledge of a user's identity or data, these models cannot provide customized answers or information. For instance, when he uses GPT to draft emails, the output often fails to match his writing style – a gap that Personalized AI could close.
Balancing Data Privacy for Users and Developers
Regarding Data Privacy, Shomir Wilson stressed the importance of a collaborative framework: if users feel apprehensive about how their data is handled, usage declines. He also introduced the concept of “Contextual Integrity,” which sets context-dependent boundaries for sharing data with others, distinguishing highly private data from general information. Applying this concept to personal data collection – whether for product design or for targeted ads unrelated to the technology itself – can strengthen users' sense of security about who accesses their personal data.
Scott Yih noted that each new model release from Meta FAIR is accompanied by a paper explaining what data was used for AI training, helping users understand the purpose of data collection. Working in academic fundamental research for the organization, he disclosed that access to user and company data is restricted: anyone needing access must verify their identity and request authorization, which demonstrates the organization's strong emphasis on data security.
The Role of Regulation in Addressing Data Privacy
Shomir Wilson sees the key role of AI Regulation as giving users confidence that their data will be protected and not misused, alongside informing them of the risks these technologies may pose in the future. He acknowledged that even lawmakers cannot foresee every future technical challenge, but those involved in AI governance can create guidelines explaining Data Privacy principles – helping users understand how their data is used and stored securely, which he considers the most challenging aspect of addressing Data Privacy concerns.
Minjoon Seo believes many people worry about AI regulation because they do not fully understand it, yet regulation could also benefit some users, depending on the levels of privacy and consent involved: some might consent to Sensitive Data access while others would not. It is therefore crucial to understand the nuances of personalization in relation to privacy.
Technical Challenges and Strengths in Building Personalized Systems
- From Minjoon Seo's perspective, personalization in Generative AI, particularly Language Models, is achieved by conditioning the model with different prompts. For example, to elicit varied responses from the Language Model, he would provide prompts indicating the desired type of answer and the kinds of questions the user wants to avoid. This reflects the fundamental skill of Prompt Engineering for steering Language Model behavior.
- However, prompting has limits for per-user personalization. Minjoon Seo likens personalization to building Long Context Language Models, which has prompted companies such as Google to pursue context windows of millions of tokens.
- This leads him to ask why not fine-tune Language Models for personalized use cases, since fine-tuning would maximize the utilization of Personal Data.
- The main challenge is that current Language Models make personalized fine-tuning impractical, so developers fall back on modifying prompts instead.
- From a research scientist's technical perspective, Scott Yih finds building Large Models that accommodate personalization demands both challenging and intriguing. Language Models have limitations, and users' Personal Data may be stored on local devices rather than in the cloud, which poses a significant challenge to making personalization effective and user-specific.
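The prompt-based personalization Minjoon Seo describes can be sketched in code. This is a minimal, hypothetical illustration: the `build_personalized_prompt` helper, the preference fields, and the template wording are assumptions for demonstration, not anything shown in the panel.

```python
# Hypothetical sketch: personalizing a Language Model via prompting
# rather than fine-tuning. User preferences are prepended to each
# request so the model can tailor tone and topics per user.

def build_personalized_prompt(user_profile: dict, user_message: str) -> str:
    """Assemble a prompt that states the user's preferences (desired
    style, topics to avoid) before the actual request."""
    preferences = "\n".join(
        f"- {key}: {value}" for key, value in user_profile.items()
    )
    return (
        "You are a personal assistant. Follow these user preferences:\n"
        f"{preferences}\n\n"
        f"User request: {user_message}"
    )

# Illustrative profile: one style preference, one topic to avoid.
profile = {
    "writing style": "concise, no exclamation marks",
    "avoid": "questions about private health data",
}
prompt = build_personalized_prompt(profile, "Draft a reply to my advisor.")
print(prompt)
```

Because the preferences live in the prompt rather than in the model weights, this approach runs into the context-length and cost limits discussed above once per-user data grows large.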
Opportunities and Challenges for Personalization in AI Systems
- Shomir Wilson emphasized making personalization work seamlessly for varied user groups while respecting and valuing users – for example, offering open-ended gender options rather than binary male/female designations. While personalization caters to individual needs, it must not adversely impact diverse user groups.
- Scott Yih sees personalization as a good way to balance Personal Data usage between users and developers, designing models that align with specific use cases.
- Minjoon Seo explained that one way to improve Language Model performance is to avoid providing excessive context. For instance, when training a Language Model on Harry Potter data, have it predict the next token without explicitly stating that the text is about Harry Potter, encouraging the model to infer context on its own.
- He also believes in training models with context for diversity, enabling them to handle more personalized input.
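The training idea in the bullets above – plain next-token prediction with no metadata about what the text is – can be illustrated with a deliberately tiny stand-in for a real Language Model. This bigram counter is an assumption-laden toy, not the panelists' method; it only shows that the training signal is the raw token sequence itself.

```python
# Toy sketch: next-token prediction on raw text. No label such as
# "this is Harry Potter" is ever supplied; any context the model
# "knows" is inferred purely from token co-occurrence.

from collections import Counter, defaultdict


def train_next_token(corpus: str) -> dict:
    """Count token -> next-token transitions from raw text only."""
    tokens = corpus.split()
    model = defaultdict(Counter)
    for current, nxt in zip(tokens, tokens[1:]):
        model[current][nxt] += 1
    return model


def predict(model: dict, token: str) -> str:
    """Return the continuation most often seen after `token`."""
    return model[token].most_common(1)[0][0]


# Illustrative corpus; the model is never told what it is about.
corpus = "the boy who lived the boy who waited"
model = train_next_token(corpus)
print(predict(model, "boy"))  # prints "who"
```

A real Language Model replaces the counting with a neural network, but the objective is the same: predict the next token from the preceding ones, with context left implicit in the data.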
From the perspectives of researchers and Language Model development experts in this session, it's evident that integrating personalization into Language Models and AI Systems empowers users to leverage AI's potential and receive information customized to their needs. However, caution is necessary regarding Data Privacy and developing Language Models that cater to diverse user groups. This marks an exciting and promising step in Language Model training and enhancing personalization effectiveness in the future.
Watch this session on Youtube: https://youtu.be/ZLuA3XTVKHc?si=gIj_cKxFSE9oHpqr