NEWS
Privacy not a blocker for “meaningful” research access to platform data, says report
European lawmakers are eyeing binding transparency requirements for Internet platforms in a Digital Services Act (DSA) due to be drafted by the end of the year. But the question of how to create governance structures that provide regulators and researchers with meaningful access to data so platforms can be held accountable for the content they’re amplifying is a complex one.
Platforms’ own efforts to open up their data troves to outside eyes have been chequered to say the least. Back in 2018, Facebook announced the Social Science One initiative, saying it would provide a select group of academics with access to about a petabyte’s worth of sharing data and metadata. But it took almost two years before researchers got access to any data.
“This was the most frustrating thing I’ve been involved in, in my life,” one of the involved researchers told Protocol earlier this year, after spending some 20 months negotiating with Facebook over exactly what it would release.
Facebook’s political Ad Archive API has similarly frustrated researchers. “Facebook makes it impossible to get a complete picture of all of the ads running on their platform (which is exactly the opposite of what they claim to be doing),” said Mozilla last year, accusing the tech giant of transparency-washing.
Facebook, meanwhile, points to European data protection regulations and privacy requirements attached to its business following interventions by the US’ FTC to justify painstaking progress around data access. But critics argue this is just a cynical shield against transparency and accountability. Plus of course none of these regulations stopped Facebook grabbing people’s data in the first place.
In January, Europe’s lead data protection regulator penned a preliminary opinion on data protection and research which warned against such shielding.
“Data protection obligations should not be misappropriated as a means for powerful players to escape transparency and accountability,” wrote EDPS Wojciech Wiewiorówski. “Researchers operating within ethical governance frameworks should therefore be able to access necessary API and other data, with a valid legal basis and subject to the principle of proportionality and appropriate safeguards.”
Nor is Facebook the sole offender here, of course. Google brands itself a ‘privacy champion’ on account of how tight a grip it keeps on access to user data, heavily mediating data it releases in areas where it claims ‘transparency’. While, for years, Twitter routinely disparaged third party studies which sought to understand how content flows across its platform — saying its API didn’t provide full access to all platform data and metadata so the research couldn’t show the full picture. Another convenient shield to eschew accountability.
More recently the company has made some encouraging noises to researchers, updating its dev policy to clarify rules, and offering up a COVID-related dataset — though the included tweets remains self selected. So Twitter’s mediating hand remains on the research tiller.
A new report by AlgorithmWatch seeks to grapple with the knotty problem of platforms evading accountability by mediating data access — suggesting some concrete steps to deliver transparency and bolster research, including by taking inspiration from how access to medical data is mediated, among other discussed governance structures.
The goal: “Meaningful” research access to platform data. (Or as the report title puts it: Operationalizing Research Access in Platform Governance: What to Learn from Other Industries?)
“We have strict transparency rules to enable accountability and the public good in so many other sectors (food, transportation, consumer goods, finance, etc). We definitely need it for online platforms — especially in COVID-19 times, where we’re even more dependent on them for work, education, social interaction, news and media consumption,” co-author Jef Ausloos tells TechCrunch.
The report, which the authors are aiming at European Commission lawmakers as they ponder how to shape an effective platform governance framework, proposes mandatory data sharing frameworks with an independent EU-institution acting as an intermediary between disclosing corporations and data recipients.
It’s not the first time an online regulator has been mooted, of course — but the entity being suggested here is more tightly configured in terms of purpose than some of the other Internet overseers being proposed in Europe.
“Such an institution would maintain relevant access infrastructures including virtual secure operating environments, public databases, websites and forums. It would also play an important role in verifying and pre-processing corporate data in order to ensure it is suitable for disclosure,” they write in a report summary.
Discussing the approach further, Ausloos argues it’s important to move away from “binary thinking” to break the current ‘data access’ trust deadlock. “Rather than this binary thinking of disclosure vs opaqueness/obfuscation, we need a more nuanced and layered approach with varying degrees of data access/transparency,” he says. “Such a layered approach can hinge on types of actors requesting data, and their purposes.”
A market research purpose might only get access to very high level data, he suggests. Whereas medical research by academic institutions could be given more granular access — subject, of course, to strict requirements (such as a research plan, ethical board review approval and so on).
“An independent institution intermediating might be vital in order to facilitate this and generate the necessary trust. We think it is vital that that regulator’s mandate is detached from specific policy agendas,” says Ausloos. “It should be focused on being a transparency/disclosure facilitator — creating the necessary technical and legal environment for data exchange. This can then be used by media/competition/data protection/etc authorities for their potential enforcement actions.”
Ausloos says many discussions on setting up an independent regulator for online platforms have proposed too many mandates or competencies — making it impossible to achieve political consensus. Whereas a leaner entity with a narrow transparency/disclosure remit should be able to cut through noisy objections, is the theory.
The infamous example of Cambridge Analytica does certainly loom large over the ‘data for research’ space — aka, the disgraced data company which paid a Cambridge University academic to use an app to harvest and process Facebook user data for political ad targeting. And Facebook has thought nothing of turning this massive platform data misuse scandal into a stick to beat back regulatory proposals aiming to crack open its data troves.
But Cambridge Analytica was a direct consequence of a lack of transparency, accountability and platform oversight. It was also, of course, a massive ethical failure — given that consent for political targeting was not sought from people whose data was acquired. So it doesn’t seem a good argument against regulating access to platform data. On the contrary.
With such ‘blunt instrument’ tech talking points being lobbied into the governance debate by self-interested platform giants, the AlgorithmWatch report brings both welcome nuance and solid suggestions on how to create effective governance structures for modern data giants.
On the layered access point, the report suggests the most granular access to platform data would be the most highly controlled, along the lines of a medical data model. “Granular access can also only be enabled within a closed virtual environment, controlled by an independent body — as is currently done by Findata [Finland’s medical data institution],” notes Ausloos.
Another governance structure discussed in the report — as a case study from which to draw learnings on how to incentivize transparency and thereby enable accountability — is the European Pollutant Release and Transfer Register (E-PRTR). This regulates pollutant emissions reporting across the EU, and results in emissions data being freely available to the public via a dedicated web-platform and as a standalone dataset.
“Credibility is achieved by assuring that the reported data is authentic, transparent and reliable and comparable, because of consistent reporting. Operators are advised to use the best available reporting techniques to achieve these standards of completeness, consistency and credibility,” the report says on the E-PRTR.
“Through this form of transparency, the E-PRTR aims to impose accountability on operators of industrial facilities in Europe towards to the public, NGOs, scientists, politicians, governments and supervisory authorities.”
While EU lawmakers have signalled an intent to place legally binding transparency requirements on platforms — at least in some less contentious areas, such as illegal hate speech, as a means of obtaining accountability on some specific content problems — they have simultaneously set out a sweeping plan to fire up Europe’s digital economy by boosting the reuse of (non-personal) data.
Leveraging industrial data to support R&D and innovation is a key plank of the Commission’s tech-fuelled policy priorities for the next five+ years, as part of an ambitious digital transformation agenda.
This suggests that any regional move to open up platform data is likely to go beyond accountability — given EU lawmakers are pushing for the broader goal of creating a foundational digital support structure to enable research through data reuse. So if privacy-respecting data sharing frameworks can be baked in, a platform governance structure that’s designed to enable regulated data exchange almost by default starts to look very possible within the European context.
“Enabling accountability is important, which we tackle in the pollution case study; but enabling research is at least as important,” argues Ausloos, who does postdoc research at the University of Amsterdam’s Institute for Information Law. “Especially considering these platforms constitute the infrastructure of modern society, we need data disclosure to understand society.”
“When we think about what transparency measures should look like for the DSA we don’t need to reinvent the wheel,” adds Mackenzie Nelson, project lead for AlgorithmWatch’s Governing Platforms Project, in a statement. “The report provides concrete recommendations for how the Commission can design frameworks that safeguard user privacy while still enabling critical research access to dominant platforms’ data.”
You can read the full report here.
NEWS
OpenAI Introduces Fine-Tuning for GPT-4 and Enabling Customized AI Models
OpenAI has today announced the release of fine-tuning capabilities for its flagship GPT-4 large language model, marking a significant milestone in the AI landscape. This new functionality empowers developers to create tailored versions of GPT-4 to suit specialized use cases, enhancing the model’s utility across various industries.
Fine-tuning has long been a desired feature for developers who require more control over AI behavior, and with this update, OpenAI delivers on that demand. The ability to fine-tune GPT-4 allows businesses and developers to refine the model’s responses to better align with specific requirements, whether for customer service, content generation, technical support, or other unique applications.
Why Fine-Tuning Matters
GPT-4 is a very flexible model that can handle many different tasks. However, some businesses and developers need more specialized AI that matches their specific language, style, and needs. Fine-tuning helps with this by letting them adjust GPT-4 using custom data. For example, companies can train a fine-tuned model to keep a consistent brand tone or focus on industry-specific language.
Fine-tuning also offers improvements in areas like response accuracy and context comprehension. For use cases where nuanced understanding or specialized knowledge is crucial, this can be a game-changer. Models can be taught to better grasp intricate details, improving their effectiveness in sectors such as legal analysis, medical advice, or technical writing.
Key Features of GPT-4 Fine-Tuning
The fine-tuning process leverages OpenAI’s established tools, but now it is optimized for GPT-4’s advanced architecture. Notable features include:
- Enhanced Customization: Developers can precisely influence the model’s behavior and knowledge base.
- Consistency in Output: Fine-tuned models can be made to maintain consistent formatting, tone, or responses, essential for professional applications.
- Higher Efficiency: Compared to training models from scratch, fine-tuning GPT-4 allows organizations to deploy sophisticated AI with reduced time and computational cost.
Additionally, OpenAI has emphasized ease of use with this feature. The fine-tuning workflow is designed to be accessible even to teams with limited AI experience, reducing barriers to customization. For more advanced users, OpenAI provides granular control options to achieve highly specialized outputs.
Implications for the Future
The launch of fine-tuning capabilities for GPT-4 signals a broader shift toward more user-centric AI development. As businesses increasingly adopt AI, the demand for models that can cater to specific business needs, without compromising on performance, will continue to grow. OpenAI’s move positions GPT-4 as a flexible and adaptable tool that can be refined to deliver optimal value in any given scenario.
By offering fine-tuning, OpenAI not only enhances GPT-4’s appeal but also reinforces the model’s role as a leading AI solution across diverse sectors. From startups seeking to automate niche tasks to large enterprises looking to scale intelligent systems, GPT-4’s fine-tuning capability provides a powerful resource for driving innovation.
OpenAI announced that fine-tuning GPT-4o will cost $25 for every million tokens used during training. After the model is set up, it will cost $3.75 per million input tokens and $15 per million output tokens. To help developers get started, OpenAI is offering 1 million free training tokens per day for GPT-4o and 2 million free tokens per day for GPT-4o mini until September 23. This makes it easier for developers to try out the fine-tuning service.
As AI continues to evolve, OpenAI’s focus on customization and adaptability with GPT-4 represents a critical step in making advanced AI accessible, scalable, and more aligned with real-world applications. This new capability is expected to accelerate the adoption of AI across industries, creating a new wave of AI-driven solutions tailored to specific challenges and opportunities.
This Week in Search News: Simple and Easy-to-Read Update
Here’s what happened in the world of Google and search engines this week:
1. Google’s June 2024 Spam Update
Google finished rolling out its June 2024 spam update over a period of seven days. This update aims to reduce spammy content in search results.
2. Changes to Google Search Interface
Google has removed the continuous scroll feature for search results. Instead, it’s back to the old system of pages.
3. New Features and Tests
- Link Cards: Google is testing link cards at the top of AI-generated overviews.
- Health Overviews: There are more AI-generated health overviews showing up in search results.
- Local Panels: Google is testing AI overviews in local information panels.
4. Search Rankings and Quality
- Improving Rankings: Google said it can improve its search ranking system but will only do so on a large scale.
- Measuring Quality: Google’s Elizabeth Tucker shared how they measure search quality.
5. Advice for Content Creators
- Brand Names in Reviews: Google advises not to avoid mentioning brand names in review content.
- Fixing 404 Pages: Google explained when it’s important to fix 404 error pages.
6. New Search Features in Google Chrome
Google Chrome for mobile devices has added several new search features to enhance user experience.
7. New Tests and Features in Google Search
- Credit Card Widget: Google is testing a new widget for credit card information in search results.
- Sliding Search Results: When making a new search query, the results might slide to the right.
8. Bing’s New Feature
Bing is now using AI to write “People Also Ask” questions in search results.
9. Local Search Ranking Factors
Menu items and popular times might be factors that influence local search rankings on Google.
10. Google Ads Updates
- Query Matching and Brand Controls: Google Ads updated its query matching and brand controls, and advertisers are happy with these changes.
- Lead Credits: Google will automate lead credits for Local Service Ads. Google says this is a good change, but some advertisers are worried.
- tROAS Insights Box: Google Ads is testing a new insights box for tROAS (Target Return on Ad Spend) in Performance Max and Standard Shopping campaigns.
- WordPress Tag Code: There is a new conversion code for Google Ads on WordPress sites.
These updates highlight how Google and other search engines are continuously evolving to improve user experience and provide better advertising tools.
Facebook Faces Yet Another Outage: Platform Encounters Technical Issues Again
Uppdated: It seems that today’s issues with Facebook haven’t affected as many users as the last time. A smaller group of people appears to be impacted this time around, which is a relief compared to the larger incident before. Nevertheless, it’s still frustrating for those affected, and hopefully, the issues will be resolved soon by the Facebook team.
Facebook had another problem today (March 20, 2024). According to Downdetector, a website that shows when other websites are not working, many people had trouble using Facebook.
This isn’t the first time Facebook has had issues. Just a little while ago, there was another problem that stopped people from using the site. Today, when people tried to use Facebook, it didn’t work like it should. People couldn’t see their friends’ posts, and sometimes the website wouldn’t even load.
Downdetector, which watches out for problems on websites, showed that lots of people were having trouble with Facebook. People from all over the world said they couldn’t use the site, and they were not happy about it.
When websites like Facebook have problems, it affects a lot of people. It’s not just about not being able to see posts or chat with friends. It can also impact businesses that use Facebook to reach customers.
Since Facebook owns Messenger and Instagram, the problems with Facebook also meant that people had trouble using these apps. It made the situation even more frustrating for many users, who rely on these apps to stay connected with others.
During this recent problem, one thing is obvious: the internet is always changing, and even big websites like Facebook can have problems. While people wait for Facebook to fix the issue, it shows us how easily things online can go wrong. It’s a good reminder that we should have backup plans for staying connected online, just in case something like this happens again.
-
SEARCHENGINES7 days ago
Daily Search Forum Recap: October 3, 2024
-
WORDPRESS7 days ago
WP Engine sues WordPress co-creator Mullenweg and Automattic, alleging abuse of power
-
SEARCHENGINES6 days ago
Google Ranking Volatility Record, Forbes Advisor Slapped, Bing Generative Search Experience & More
-
WORDPRESS6 days ago
Automattic demanded web host pay $32M annually for using WordPress trademark
-
SEO4 days ago
Google’s AI Overviews Avoid Political Content, New Data Shows
-
SEO7 days ago
YouTube Extends Shorts To 3 Minutes, Adds New Features
-
SEO6 days ago
8% Of Automattic Employees Choose To Resign
-
SEARCHENGINES4 days ago
Google Shopping Researched with AI